Database management systems (DBMS) in the cloud are the backbone for managing large volumes of data efficiently and thus play a central role in business and science today. To deliver high performance, many of the most complex DBMS components, such as query optimizers and schedulers, must solve non-trivial problems.

To tackle such problems, recent work has outlined a new direction of so-called learned DBMS components, in which AI-based methods replace or enhance core DBMS components and which has been shown to provide significant performance benefits. This route is particularly interesting since cloud vendors such as Google, Amazon, and Microsoft are already applying these techniques to optimize the performance of their cloud data systems.

Besides learned DBMS components, AI has been used to improve many other data management tasks. For example, classical data engineering tasks such as error detection, missing value imputation, and data augmentation typically incur high manual overhead and can be automated with AI. Finally, AI has also been used to extend databases through better data access interfaces (e.g., natural language querying and chatbots for data) or by supporting data beyond structured tabular data (e.g., text and images).

This seminar introduces students to the foundational concepts of using AI for data management. The course includes a mini lecture series that provides the necessary background on AI in data management and prepares students for the seminar tasks. The seminar is divided into two parts, each focusing on one of the key themes introduced above: learned DBMS components and the application of AI to data engineering. Students will engage in practical tasks related to these topics, as outlined below.