Can hundreds or thousands of untrained people do the job of an expert? This seminar will explore the benefits of this idea and investigate the techniques needed to make it work.

Natural language processing (NLP) requires vast amounts of annotated text data for training and evaluating algorithms, as well as for analysing language to further the underlying theory. Expert annotators have very limited time, but if we can break an annotation task into many simple sub-tasks, these can be distributed to a large number of non-expert workers. This is the idea behind crowdsourcing, which provides a way to annotate text data cheaply and at much larger scale, with commercial platforms such as Amazon Mechanical Turk providing access to many thousands of workers. However, it is far from trivial to obtain good annotations this way, so crowdsourcing has become an active research topic in fields such as machine learning and human-computer interaction. Intelligent crowdsourcing techniques aim to increase efficiency and improve the quality of results by automatically correcting annotation errors and identifying unreliable workers.
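As a deliberately minimal illustration of this kind of automated quality control, the Python sketch below aggregates redundant crowd labels by majority vote and scores each worker by agreement with the consensus. The data and identifiers are hypothetical, and real pipelines use more sophisticated models than this.

```python
from collections import Counter, defaultdict

# Hypothetical input: (worker_id, item_id, label) triples collected
# from a crowdsourcing platform, with several workers per item.
annotations = [
    ("w1", "sent1", "POS"), ("w2", "sent1", "POS"), ("w3", "sent1", "NEG"),
    ("w1", "sent2", "NEG"), ("w2", "sent2", "NEG"), ("w3", "sent2", "NEG"),
]

def majority_vote(annotations):
    """Aggregate redundant labels per item by simple majority."""
    labels_by_item = defaultdict(list)
    for worker, item, label in annotations:
        labels_by_item[item].append(label)
    return {item: Counter(labels).most_common(1)[0][0]
            for item, labels in labels_by_item.items()}

def worker_agreement(annotations, consensus):
    """Fraction of each worker's labels that match the consensus:
    a crude reliability score for flagging unreliable workers."""
    hits, total = Counter(), Counter()
    for worker, item, label in annotations:
        total[worker] += 1
        hits[worker] += label == consensus[item]
    return {w: hits[w] / total[w] for w in total}

consensus = majority_vote(annotations)
print(consensus)                                  # {'sent1': 'POS', 'sent2': 'NEG'}
print(worker_agreement(annotations, consensus))   # per-worker reliability
```

Collecting several labels per item and down-weighting workers who disagree with the consensus is the simplest form of the quality-control problem this seminar examines.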

This seminar will introduce the fundamentals of text annotation tasks and crowdsourcing, and review in depth the latest research on crowdsourcing methods, including the increasing use of statistical machine learning to optimise the crowdsourcing process. We will investigate how these general-purpose approaches can be adapted to linguistic annotation tasks, and analyse their strengths, limitations and potential improvements.
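To give a flavour of the statistical machine learning methods involved, the sketch below implements one classic model that much of this literature builds on: Dawid and Skene's (1979) EM algorithm, which jointly infers true labels and a per-worker confusion matrix from redundant noisy annotations. This is an illustrative implementation under simplifying assumptions (dense integer worker and class indices, a fixed iteration count), not a method prescribed by the seminar.

```python
import numpy as np

def dawid_skene(labels, n_classes, n_iter=50):
    """Jointly estimate true labels and per-worker confusion matrices
    with EM (Dawid & Skene, 1979). labels[i] maps worker index -> the
    class index that worker assigned to item i."""
    n_items = len(labels)
    n_workers = max(w for item in labels for w in item) + 1

    # Initialise class posteriors from per-item vote proportions.
    post = np.zeros((n_items, n_classes))
    for i, item in enumerate(labels):
        for w, c in item.items():
            post[i, c] += 1
    post /= post.sum(axis=1, keepdims=True)

    for _ in range(n_iter):
        # M-step: class priors and worker confusion matrices,
        # with a small additive constant to avoid log(0).
        prior = post.mean(axis=0)
        conf = np.full((n_workers, n_classes, n_classes), 1e-6)
        for i, item in enumerate(labels):
            for w, c in item.items():
                conf[w, :, c] += post[i]
        conf /= conf.sum(axis=2, keepdims=True)

        # E-step: recompute label posteriors given the new parameters.
        log_post = np.tile(np.log(prior), (n_items, 1))
        for i, item in enumerate(labels):
            for w, c in item.items():
                log_post[i] += np.log(conf[w, :, c])
        log_post -= log_post.max(axis=1, keepdims=True)
        post = np.exp(log_post)
        post /= post.sum(axis=1, keepdims=True)

    return post.argmax(axis=1), conf

# Toy usage: three workers label four items with classes {0, 1}.
labels = [{0: 0, 1: 0, 2: 1}, {0: 1, 1: 1, 2: 1},
          {0: 0, 1: 0, 2: 0}, {0: 1, 1: 0, 2: 1}]
estimates, confusion = dawid_skene(labels, n_classes=2)
print(estimates)
```

Unlike majority voting, this model learns that some workers are systematically more reliable than others and weights their labels accordingly, which is exactly the kind of automated optimisation of the crowdsourcing process the seminar will analyse.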