Seminar Data-Science I (Methods)
CS 721 Master Seminar (M. Sc. Wirt. Inf., M.Sc. MMDS, Lehramt für Gymnasien)
Lecturer | Marlene Lutz, Maximilian Kreutner |
Course Format | Seminar |
Offering | HWS/ |
Credit Points | 4 ECTS |
Language | English |
Grading | Written report with oral presentations |
Examination date | See schedule below |
Information for Students | The course is limited to 16 participants. Please register centrally via Portal2. |
Contact
For administrative questions, please contact office.strohmaier@uni-mannheim.de

Marlene Lutz
L 15, 1–6
3. OG – Raum 323
68161 Mannheim

Maximilian Kreutner
L 15, 1–6
3. OG – Raum 322
68161 Mannheim
Course Information
Course Description
In this seminar, students perform scientific research, either in the form of a literature review, by conducting a small experiment, or through a mixture of both, and prepare a written report about the results. Topics center on a variety of problems and tasks from the fields of Data Science, Network Science and Text Mining.
Previous participation in the courses “Network Science” and “Text Analytics” is recommended.
Objectives
Expertise: Students will acquire a deep understanding of the research topic. They are expected to describe and summarize a topic in detail in their own words, as well as to judge the contribution of the research papers to ongoing research.
Methodological competence: Students will develop methods and skills to find relevant literature for their topic, to write a well-structured scientific paper and to present their results.
Topics
This seminar will be split into four main topic blocks. Every student will be assigned a research paper from only one of these blocks to work on. Yet, it is expected that students also actively participate in the discussion of papers from other topic blocks after they have been presented.
The four topics we are going to discuss in FSS 2025 are:
- Analyzing the Pre-training Data of LLMs. The pre-training data of Large Language Models (LLMs) is the foundation of their knowledge, shaping their responses, reasoning, and biases. Analyzing this data is essential for understanding how models acquire information and ensuring diversity and representation. However, the data's vast scale and proprietary restrictions present significant challenges. Within this topic block, we will explore methods to connect model behavior to its pre-training data.
- Training Data Attribution. Training Data Attribution (TDA) involves identifying the sources of data that contribute to specific outputs generated by an LLM. TDA is critical for ensuring accountability and for complying with copyright and data usage laws. By attributing outputs to their underlying data, researchers and practitioners can better assess model performance, trace issues like hallucinations, and improve transparency. In this topic block, we will discuss different techniques for TDA as well as their technical and ethical challenges and implications.
- Pluralistic Alignment. As LLMs are increasingly deployed in diverse cultural and professional settings, they should cater to the perspectives, values, and goals of various user groups. Unlike traditional approaches, which often aim for a single alignment objective, pluralistic alignment emphasizes flexibility, enabling LLMs to respond appropriately based on the context and the user's intentions and background. While studying different methods for pluralistic alignment, we will also address various ethical and technical challenges in developing LLMs for real-world applications.
- Multi-agent simulations with Large Language Models. New multi-agent simulations use Large Language Models to model complex interactions, from social behavior to economic systems. These agents can mimic human actions, debate ideas or reveal how knowledge or misinformation spreads. In this seminar, we’ll explore how these simulations work and how they can be used to draw conclusions for the real world.
  - Generative Agents: Interactive Simulacra of Human Behavior
  - Social Simulacra: Creating Populated Prototypes for Social Computing Systems
  - Encouraging Divergent Thinking in Large Language Models through Multi-Agent Debate
  - EconAgent: Large Language Model-Empowered Agents for Simulating Macroeconomic Activities
Schedule
The schedule below is preliminary; dates are subject to change.
Registration period | until 10.02.2025 via Portal2 |
Kick-off meeting | 20.02.25, 11:00–11:45 | L 15 1-6, room 314/315 | General information |
Drop-out | until 23.02.2025 |
Midterm | 03.04.25 or 07.04.25, 09:00–14:00 | L 15 1-6, room 314/315 | Presentations |
Endterm | 12.05.25 or 15.05.25, 09:00–14:00 | L 15 1-6, room 314/315 | Presentations |
Submission deadline | 25.05.2025, 23:59 | | Written report |
Registration
Please register via Portal2.