Seminar Data-Science I (Methods)
CS 721 Master Seminar (M.Sc. Wirt. Inf., M.Sc. MMDS, Lehramt für Gymnasien)
Lecturer | Prof. Dr. Markus Strohmaier, Marlene Lutz, Maximilian Kräutner |
Course Format | Seminar |
Offering | HWS |
Credit Points | 4 ECTS |
Language | English |
Grading | Written report with oral presentations |
Examination date | See schedule below |
Information for Students | The course is limited to 12 participants. Please register centrally via Portal2. |
Course Information
Course Description
In this seminar, students perform scientific research, either in the form of a literature review, a small experiment, or a mixture of both, and prepare a written report on the results. Topics of interest center on a variety of problems and tasks from the fields of Data Science, Network Science, and Text Mining.
Previous participation in the courses “Network Science” and “Text Analytics” is recommended.
Objectives
Expertise: Students will acquire a deep understanding of their research topic. They are expected to describe and summarize the topic in detail in their own words, as well as to judge the contribution of the assigned papers to ongoing research.
Methodological competence: Students will develop methods and skills to find relevant literature for their topic, to write a well-structured scientific paper and to present their results.
Topics
This seminar is split into two main topic blocks. Every student will be assigned a research paper from only one of these blocks to work on. Nevertheless, students are expected to participate actively in the discussion of papers from the other topic block after they have been presented.
The two topics we are going to discuss in HWS 2024 are:
- Adversarial Attacks. Adversarial attacks on Large Language Models (LLMs) represent a critical area of research due to their implications for security, trust, and the reliability of AI systems. These attacks involve manipulating inputs to deceive LLMs into producing incorrect or harmful outputs, posing significant risks in applications where accuracy and safety are important. We will look at different methods to attack LLMs, as well as ways to defend LLMs against such attacks.
- Is LLM-as-a-Judge Robust? Investigating Universal Adversarial Attacks on Zero-shot LLM Assessment
- Efficient Adversarial Training in LLMs with Continuous Attacks
- Adversarial Tuning: Defending Against Jailbreak Attacks for LLMs
- Self-Evaluation as a Defense Against Adversarial Attacks on LLMs
- Certifying LLM Safety against Adversarial Prompting
- An LLM can Fool Itself: A Prompt-Based Adversarial Attack
- Model Editing. Pretrained LLMs serve as the backbone of many downstream applications. As such, we often want to refine them to tailor performance to specific downstream tasks, mitigate bias, or update the model with new information. However, the growing size of language models has made traditional fine-tuning costly, leading to increased interest in alternative refinement methods that avoid gradient updates. We will investigate different methods to edit LLMs with regard to, e.g., knowledge, bias, or linguistic style.
- Word Embeddings Are Steers for Language Models
- Time is Encoded in the Weights of Finetuned Language Models
- The Butterfly Effect of Model Editing: Few Edits Can Trigger Large Language Models Collapse
- Mitigating Biases for Instruction-following Language Models via Bias Neurons Elimination
- Language Models are Homer Simpson! Safety Re-Alignment of Fine-tuned Language Models through Task Arithmetic
- Cross-Lingual Knowledge Editing in Large Language Models