
IS 616: Large Scale Data Analysis and Visualization

Contents
This course teaches the principles of scientific data visualization using the R and Python programming languages. It begins with an introduction to large-scale data handling and the basics of visualization, then moves on to more advanced visualization methods. It also introduces important libraries and frameworks for data analysis and visualization.

Learning outcomes
On completion of the course, students should be familiar with R and Python libraries that enable them to create professional scientific visualizations. This includes applying these libraries, handling large datasets, and knowing many examples of how challenges in scientific visualization were overcome through creative solutions. Skills:

  • Knowledge of how to integrate scientific visualization into research projects
  • Ability to independently choose how to prepare large-scale data so that visualization methods can be applied to a given problem
  • Knowledge of different libraries and their advantages and disadvantages
  • Data preprocessing, analysis, organisation and visualization

Necessary prerequisites

Recommended prerequisites
Basic knowledge of statistics and either 1) basic knowledge of both R and Python or 2) intermediate knowledge of either Python or R and willingness to learn the other, as yet unfamiliar, language

Forms of teaching and learning | Contact hours | Independent study time
Lecture | 2 SWS | 10 SWS
Exercise class | 2 SWS | 7 SWS
ECTS credits: 6
Graded: yes
Workload: 180h
Language: English
Form of assessment: Written exam (90 min)
Restricted admission: yes
Further information: https://www.bwl.uni-mannheim.de/strohmaier/teaching/
Examiner: Prof. Dr. Markus Strohmaier
Performing lecturer: M. Strohmaier & M. Pellert
Frequency of offering: Fall semester
Duration of module: 1 semester
Range of application: M.Sc. MMM, M.Sc. VWL, M.Sc. Wirt. Inf., MMDS
Preliminary course work: Successful completion of the corresponding exercises
Program-specific competency goals: CG 2
Literature:

  • Claus Wilke: Fundamentals of Data Visualization (https://clauswilke.com/dataviz/)
  • Roger D. Peng & Elizabeth Matsui: The Art of Data Science (https://bookdown.org/rdpeng/artofdatascience/)
  • Julia Silge & David Robinson: Tidy Text Mining (https://www.tidytextmining.com/)
  • Robin Lovelace, Jakub Nowosad & Jannes Muenchow: Geocomputation with R (https://r.geocompx.org/)
  • Kieran Healy: Data Visualization (https://socviz.co/index.html)
  • Winston Chang: ggplot2 cookbook (http://www.cookbook-r.com/Graphs/)
  • Jake VanderPlas: Python Data Science Handbook (https://jakevdp.github.io/PythonDataScienceHandbook/)
  • BBC Visual and Data Journalism team: How the BBC Visual and Data Journalism team works with graphics in R (https://medium.com/bbc-visual-and-data-journalism/how-the-bbc-visual-and-data-journalism-team-works-with-graphics-in-r-ed0b35693535)
Course outline: This course starts with the fundamental concepts of working with data in R and Python and then progresses to methods for working with large datasets. Basic visualization concepts are introduced in parallel. After that, we study selected examples of scientific visualizations, from historical ones to those of the present day. Together, we reconstruct the problems scientists faced when creating these visualizations and study their creative solutions in order to learn by example. While we provide theoretical background where necessary, the strong focus is on implementations that solve practical problems.