Natural Language Processing and the Web (WS 22/23)

The Web contains more than 10 billion indexable web pages, which can be retrieved via search queries. The lecture presents Natural Language Processing (NLP) methods to (1) automatically process large amounts of unstructured text from the Web and (2) analyse how Web data can be used as a resource for other NLP tasks.

Processing of unstructured web content

Introduction
NLP basics - tokenisation, part-of-speech tagging, chunking, stemming, lemmatisation
Web content and its characteristics - diverse genres of web content, e.g. personal websites, news sites, blogs, forums, wikis
Web content and its characteristics - continued

NLP applications for the web

Information retrieval - introduction to the basics
Web information retrieval - natural language interfaces for web information retrieval
Question answering (QA) - factoid QA, knowledge base QA, community QA
Crowdsourcing
Reproducibility

Lecturer: Thomas Arnold
Lecturer: Max Eichler