Seminar Data Science I (Methods)
CS 721 Master Seminar (M.Sc. Wirt. Inf., M.Sc. MMDS, Lehramt für Gymnasien)
| Lecturer | Georg Ahnert, Jana Jung, Marlene Lutz, Jens Rupprecht |
| Course Format | Seminar |
| Offering | HWS/FSS |
| Credit Points | 4 ECTS |
| Language | English |
| Grading | Written report (40%), Report review (10%), Oral presentation (40%) and Discussion (10%) |
| Examination date | See schedule below |
| Information for Students | The course is limited to 16 participants. Please register centrally via Portal2. |
Contact
For administrative questions, please contact Georg Ahnert.

Georg Ahnert
L 15, 1–6
3rd floor, Room 322
68161 Mannheim
Course Information
Course Description
In this seminar, students carry out scientific research, either in the form of a literature review, a small experiment, or a combination of both, and prepare a written report on the results. Topics of interest focus on a variety of problems and tasks from the fields of Data Science, Network Science, and Text Mining.
Previous participation in the courses Network Science and Text Analytics is recommended.
Objectives
Expertise: Students will acquire a deep understanding of the research topic. They are expected to describe and summarize a topic in detail in their own words, as well as to judge the contribution of the research papers to ongoing research.
Methodological competence: Students will develop methods and skills to find relevant literature for their topic, to write a well-structured scientific paper and to present their results.
Topics
This seminar is split into four main topic blocks. Every student will be assigned a research paper from exactly one of these blocks to work on. Nevertheless, students are expected to actively participate in the discussion of papers from the other topic blocks after they have been presented.
The four topics we are going to discuss in FSS 2026 are:
- Prompt Attribution. The "black-box" nature of large language models (LLMs) makes it difficult to tell exactly how specific prompts drive model outputs. In this topic block, we will explore different prompt attribution techniques that aim to make LLMs more transparent, such as gradient-based saliency, attention visualizations, and perturbation analysis (see the illustrative sketch after the topic list). This transparency can benefit, for instance, prompt engineering through evidence-based refinement, but different prompt attribution methods might yield conflicting results.
- Integrated Directional Gradients: Feature Interaction Attribution for Neural NLP Models
- Token-wise Decomposition of Autoregressive Language Model Hidden States for Analyzing Model Predictions
- A Diagnostic Study of Explainability Techniques for Text Classification
- Interpreting Language Models with Contrastive Explanations
- Data Attribution. In this topic block, we explore how generative models rely on and reproduce their training data, raising questions of attribution, legality, and trust. We will examine how models can imitate copyrighted or private content, how to measure and mitigate such behavior, and how to identify what data a model has memorized even without access to its training set. This block will focus on methods for tracing and auditing model behavior, and on emerging strategies to ensure generative AI systems remain accountable and compliant.
- Fantastic Copyrighted Beasts and How (Not) to Generate Them
- HALoGEN: Fantastic LLM Hallucinations and Where to Find Them
- How Many Images Does It Take? Estimating Imitation Thresholds in Text-to-Image Models
- Information-Guided Identification of Training Data Imprint in (Proprietary) Large Language Models
- LLM Jailbreaking. We will investigate how Large Language Models’ safety mechanisms can be attacked, evaluated, and improved, with a particular focus on red-teaming and jailbreak strategies. This body of work is highly relevant for advancing AI safety research, as it exposes fundamental weaknesses in current alignment strategies while motivating more robust, transparent, and resilient defenses for real-world LLM deployment.
- MART: Improving LLM Safety with Multi-round Automatic Red-Teaming
- Effective Red-Teaming of Policy-Adherent Agents
- Foot-In-The-Door: A Multi-turn Jailbreak for LLMs
- How Alignment and Jailbreak Work: Explain LLM Safety through Intermediate Hidden States
- WordGame: Efficient & Effective LLM Jailbreak via Simultaneous Obfuscation in Query and Response
- Long-Context Windows vs. RAG Architectures. Modern LLMs offer ever larger context windows; the recently published Llama 4 Scout, for example, supports a context length of up to 10 million tokens. As Large Language Models evolve, a critical architectural question emerges: should we feed models entire libraries via million-token context windows, or rely on external search mechanisms such as Retrieval-Augmented Generation (RAG)? By analyzing papers on retrieval failures, hallucination risks, and the "Lost in the Middle" effect in generative AI, this block offers a broad overview of this architectural debate (a second sketch after the topic list contrasts the two approaches).
- Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks
- Lost in the Middle: How Language Models Use Long Contexts
- Retrieval Augmented Generation or Long-Context LLMs? A Comprehensive Study and Hybrid Approach
- LongRAG: A Dual-Perspective Retrieval-Augmented Generation Paradigm for Long-Context Question Answering
- Precise Zero-Shot Dense Retrieval without Relevance Labels
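To make the methods named in the Prompt Attribution block more concrete, the following is a minimal sketch of perturbation-based attribution: each prompt word is deleted in turn, and the drop in the log-probability of a fixed completion is taken as that word's importance. It assumes the Hugging Face transformers library and uses "gpt2" only as a placeholder model; it is an illustration, not the method of any of the listed papers.

```python
# Minimal sketch of perturbation-based prompt attribution (leave-one-out).
# Assumes the Hugging Face `transformers` library; "gpt2" is only a
# placeholder model, not a recommendation.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()


def completion_logprob(prompt: str, completion: str) -> float:
    """Sum of log-probabilities the model assigns to `completion` given `prompt`."""
    prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
    completion_ids = tokenizer(completion, return_tensors="pt").input_ids
    input_ids = torch.cat([prompt_ids, completion_ids], dim=1)
    with torch.no_grad():
        logits = model(input_ids).logits
    # Score each token given everything before it.
    log_probs = torch.log_softmax(logits[:, :-1], dim=-1)
    targets = input_ids[:, 1:]
    token_scores = log_probs.gather(2, targets.unsqueeze(-1)).squeeze(-1)
    # Keep only the scores of the completion tokens.
    return token_scores[:, prompt_ids.shape[1] - 1 :].sum().item()


def leave_one_out_attribution(prompt: str, completion: str) -> list[tuple[str, float]]:
    """Attribute the completion to prompt words by deleting one word at a time."""
    words = prompt.split()
    base = completion_logprob(prompt, completion)
    scores = []
    for i in range(len(words)):
        perturbed = " ".join(words[:i] + words[i + 1 :])
        # A large drop in log-probability means the deleted word mattered.
        scores.append((words[i], base - completion_logprob(perturbed, completion)))
    return scores


if __name__ == "__main__":
    for word, score in leave_one_out_attribution("The capital of France is", " Paris"):
        print(f"{word:>10s}  {score:+.3f}")
```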
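Similarly, for the Long-Context Windows vs. RAG Architectures block, the sketch below contrasts the two approaches on a toy example: stuffing every document into one long prompt versus retrieving only the most relevant passage first. The TF-IDF retriever and the call_llm placeholder are assumptions for illustration, not components of any of the listed papers.

```python
# Minimal sketch contrasting a long-context prompt with a RAG-style prompt.
# The retriever is a simple TF-IDF ranker; `call_llm` is a hypothetical
# stand-in for any LLM API.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

documents = [
    "The seminar kick-off meeting takes place in September.",
    "Presentations are scheduled for October and November.",
    "The written report is due at the end of November.",
]
question = "When is the written report due?"


def call_llm(prompt: str) -> str:
    """Hypothetical placeholder for an actual LLM call."""
    return f"<answer based on a {len(prompt)}-character prompt>"


# Long-context approach: concatenate the entire corpus into one prompt.
long_context_prompt = "\n".join(documents) + f"\n\nQuestion: {question}"

# RAG approach: retrieve only the top-k most similar passages first.
vectorizer = TfidfVectorizer()
doc_matrix = vectorizer.fit_transform(documents)
question_vec = vectorizer.transform([question])
ranking = cosine_similarity(question_vec, doc_matrix)[0].argsort()[::-1]
top_k = [documents[i] for i in ranking[:1]]
rag_prompt = "\n".join(top_k) + f"\n\nQuestion: {question}"

print(call_llm(long_context_prompt))
print(call_llm(rag_prompt))
```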
Schedule (for FSS 2026: tba)
The schedule below is preliminary; dates are subject to change.
| Milestone | Date | Location | Notes |
| Registration period | until 01.09.2025, 23:59, via Portal2 | | |
| Kick-off meeting | 10.09.25, 9:00–9:45 | L 15 1-6, room 314/315 | General information |
| Drop-out | until 14.09.25 | | |
| 1st Presentation Date | 13.10. or 20.10.25, 10:15–13:15 | L 15 1-6, room 314/315 | Presentations |
| 2nd Presentation Date | 27.10. or 03.11.25, 10:15–13:15 | L 15 1-6, room 314/315 | Presentations |
| Peer review of report drafts | Week of 10.11.2025 | | |
| Submission deadline | 28.11.2025, 23:59 | | Written report |
Registration
Please register via Portal2.