Seminar Data-Science I (Methods)

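CS 721 Master Seminar for the programs M.Sc. Business Informatics (Wirtschaftsinformatik), M.Sc. MMDS, and Lehramt für Gymnasien (teacher education for upper secondary schools)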

Lecturers	Georg Ahnert, Jana Jung, Marlene Lutz, Jens Rupprecht
Course Format	Seminar
Offering	HWS/FSS (fall and spring semesters)
Credit Points	4 ECTS
Language	English
Grading	Written report (40%), report review (10%), oral presentation (40%), and discussion (10%)
Examination date	See schedule below
Information for Students	The course is limited to 16 participants. Please register centrally via Portal2.

Contact

For administrative questions, please contact Georg Ahnert.

Georg Ahnert

Research Associate (Wissenschaftlicher Mitarbeiter)

Universität Mannheim
L 15, 1–6
3rd floor, Room 322
68161 Mannheim

E-Mail: ahnert@uni-mannheim.de
Web: georgahnert.de

Course Information

Course Description
In this seminar, students conduct scientific research in the form of a literature review, a small experiment, or a mix of both, and prepare a written report on the results. Topics cover a variety of problems and tasks from the fields of Data Science, Network Science, and Text Mining.
Prior participation in the courses Network Science and Text Analytics is recommended.
Objectives
Expertise: Students will acquire a deep understanding of their research topic. They are expected to describe and summarize the topic in detail in their own words and to assess the contribution of the research papers to ongoing research.
Methodological competence: Students will develop methods and skills to find relevant literature for their topic, to write a well-structured scientific paper and to present their results.
Topics
This seminar is split into four main topic blocks. Each student will be assigned a research paper from one of these blocks. However, all students are expected to actively participate in the discussion of papers from the other topic blocks after they have been presented.
The four topics we will discuss in FSS 2026 are:
Prompt Attribution. The "black-box" nature of large language models (LLMs) makes it difficult to tell exactly how specific prompts drive model outputs. In this block, we will explore prompt attribution techniques that aim to make LLMs more transparent through methods such as gradient-based saliency, attention visualization, and perturbation analysis. Such transparency can benefit prompt engineering, for instance by enabling evidence-based refinement of prompts; however, different prompt attribution methods may yield conflicting results.
Integrated Directional Gradients: Feature Interaction Attribution for Neural NLP Models
Token-wise Decomposition of Autoregressive Language Model Hidden States for Analyzing Model Predictions
A Diagnostic Study of Explainability Techniques for Text Classification
Interpreting Language Models with Contrastive Explanations
Data Attribution. In this topic block, we explore how generative models rely on and reproduce their training data, raising questions of attribution, legality, and trust. We will examine how models can imitate copyrighted or private content, how to measure and mitigate such behavior, and how to identify what data a model has memorized even without access to its training set. The focus lies on methods for tracing and auditing model behavior and on emerging strategies to keep generative AI systems accountable and compliant.
Fantastic Copyrighted Beasts and How (Not) to Generate Them
HALoGEN: Fantastic LLM Hallucinations and Where to Find Them
How Many Images Does It Take? Estimating Imitation Thresholds in Text-to-Image Models
Information-Guided Identification of Training Data Imprint in (Proprietary) Large Language Models
LLM Jailbreaking. We will investigate how the safety mechanisms of LLMs can be attacked, evaluated, and improved, with a particular focus on red-teaming and jailbreak strategies. This body of work is highly relevant to AI safety research: it exposes fundamental weaknesses in current alignment strategies and motivates more robust, transparent, and resilient defenses for real-world LLM deployments.
MART: Improving LLM Safety with Multi-round Automatic Red-Teaming
Effective Red-Teaming of Policy-Adherent Agents
Foot-In-The-Door: A Multi-turn Jailbreak for LLMs
How Alignment and Jailbreak Work: Explain LLM Safety through Intermediate Hidden States
WordGame: Efficient & Effective LLM Jailbreak via Simultaneous Obfuscation in Query and Response
Long-Context Windows vs. RAG Architectures. Modern LLMs offer ever larger context windows. For example, the recently released Llama 4 Scout supports a context length of up to 10 million tokens. As LLMs evolve, a critical architectural question emerges: should we feed models entire libraries via million-token context windows, or rely on external search mechanisms such as Retrieval-Augmented Generation (RAG)? By analyzing papers on retrieval failures, hallucination risks, and the "Lost in the Middle" effect in generative AI, this block offers a broad overview of this architectural debate.
Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks
Lost in the Middle: How Language Models Use Long Contexts
Retrieval Augmented Generation or Long-Context LLMs? A Comprehensive Study and Hybrid Approach
LongRAG: A Dual-Perspective Retrieval-Augmented Generation Paradigm for Long-Context Question Answering
Precise Zero-Shot Dense Retrieval without Relevance Labels

Schedule

The schedule below is preliminary; dates are subject to change.

Registration period	until 09.02.2026, 23:59	via Portal2
Kick-off meeting	20.02.2026, 09:30–10:15	L 15, 1–6, room 314/315	General information
Drop-out deadline	22.02.2026, 23:59
1st presentation date	20.03.2026, 09:00–12:00 (both groups)	L 15, 1–6, room 314/315	Presentations
2nd presentation date	23.03.2026 or 17.04.2026, 09:00–12:00	L 15, 1–6, room 314/315	Presentations
Report draft due	23.04.2026, 23:59
Peer review of report drafts due	29.04.2026, 23:59
Submission deadline	15.05.2026, 23:59	Written report

Registration
Please register via Portal2.