The lecture offers an introduction to the perspectives, problems, methods, and techniques of text technology. All examples and tutorials use the Python programming language.

Key topics:

  • Natural language processing (NLP)
    • Tokenization and segmentation
    • Part-of-speech tagging
    • Creating and using text corpora
    • Statistical analysis
    • Syntactic analysis
  • Machine learning
    • Categorization and classification
    • Information extraction
  • Introduction to Python
    • Structured programming
    • Data structures and I/O
    • NLTK library for NLP
    • Use of further libraries such as scikit-learn
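
To give a flavor of the first NLP topics in the list above (tokenization and simple statistical analysis), here is a minimal sketch that uses only the Python standard library. It is not taken from the course material and it does not use NLTK; the function names `tokenize` and `top_tokens`, the regular expression, and the sample sentence are all illustrative assumptions. A real tokenizer, such as the ones NLTK provides, handles many more cases.

```python
import re
from collections import Counter


def tokenize(text):
    """Split text into lowercase word tokens.

    Assumption: a deliberately naive regex tokenizer that keeps runs of
    letters and digits, plus an optional apostrophe suffix (e.g. "don't").
    """
    return re.findall(r"[a-z0-9]+(?:'[a-z]+)?", text.lower())


def top_tokens(text, n=3):
    """Return the n most frequent tokens with their counts."""
    return Counter(tokenize(text)).most_common(n)


# Hypothetical sample text, chosen only for illustration.
sample = "The cat sat on the mat. The dog sat too."

tokens = tokenize(sample)
print(tokens)
print(top_tokens(sample, 2))
```

Counting token frequencies like this is about the simplest form of corpus statistics; tasks such as part-of-speech tagging or classification build on the same token lists.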

The course uses Python together with the Natural Language Toolkit (NLTK), an open-source library. NLTK supports exploratory, problem-oriented learning of theoretical concepts without requiring extensive programming knowledge.

The course assumes familiarity with basic computing concepts but no prior knowledge of Python, which is taught during the course. If you would like to work on your own laptop, please follow the installation instructions at http://www.nltk.org/download.