Large Language Models (LLMs) are currently considered the state of the art in most NLP applications, first garnering acclaim with the introduction of BERT, then rising in popularity with the development of ChatGPT and GPT-4. One of the defining features of LLMs is their versatility: they can adapt to seemingly any task given an appropriate prompt. Moreover, research has shown that LLMs’ abilities improve as their parameter counts and the amount of training data increase (“scaling”). At the same time, LLMs still tend to “hallucinate” (i.e., produce factually incorrect output) and are often sensitive to specific prompt formats.


In this seminar, we will look at the various abilities of LLMs, particularly in the context of their contributing factors, such as scaling and specific training and prompting methods (e.g., instruction tuning and chain-of-thought prompting).


We will also look at weaknesses that current LLMs exhibit, such as hallucination, prompt sensitivity, and bias, and we will explore the topic of LLM safety.