Course information | Natural Language Processing and the Web (WS 17/18)

Teaching Staff

We currently do not have fixed office hours, so please contact us by email to arrange an appointment.

Organization

Lecture: Tuesday 08:00-09:40, Room S202/C205, starting October 17
Practice class: Thursday 16:15-17:55, Room S202/C120, starting October 26

The learning material is available on the Moodle eLearning platform.

Registration

If you plan to participate in this course, please register on TUCaN.

Requirements

To pass, each student has to take the written exam at the end of the semester.

There will also be a project in the practice class, which will contribute to your overall grade.

Exam

Date/Time: 27 February 2018, 15:00-17:00
Room: S202/C205 - Bosch Hörsaal

Course content

The Web contains more than 10 billion indexable web pages, which can be retrieved via search queries. The lecture will present Natural Language Processing (NLP) methods to (1) automatically process large amounts of unstructured text from the Web and (2) analyse the use of Web data as a resource for other NLP tasks.

Processing of unstructured web content

Introduction
NLP Basics - Tokenisation, Part-of-Speech Tagging, Chunking, Stemming, Lemmatisation
Web contents and their characteristics - diverse genres of web contents, e.g. personal web sites, news sites, blogs, forums, wikis
Web contents and their characteristics - continued

NLP applications for the web

Information retrieval - introduction to the basics of information retrieval
Web information retrieval - natural language interfaces for web information retrieval
Question answering (QA): Factoid QA, Knowledge Base QA, Community QA
Crowdsourcing
Text Structuring

Literature

Kai-Uwe Carstensen, Christian Ebert, Cornelia Endriss, Susanne Jekat, Ralf Klabunde, Computerlinguistik und Sprachtechnologie. Eine Einführung, Heidelberg: Spektrum-Verlag, March 2010 (3rd edition).
T. Götz & O. Suhre, Design and implementation of the UIMA Common Analysis System, IBM Systems Journal, 2004, 43, 476-489.
Adam Kilgarriff & Gregory Grefenstette, Introduction to the special issue on the web as corpus, Computational Linguistics, MIT Press, 2003, 29, 333-347.
Christopher D. Manning, Prabhakar Raghavan & Hinrich Schütze, Introduction to Information Retrieval, Cambridge University Press, 2008.

Lecturer: Thomas Arnold
Lecturer: Hatem Mousselly Sergieh
Lecturer: Christian Stab