IS 616: Large Scale Data Analysis and Visualization
no offering in fall
Contents
This course teaches the principles of scientific data visualization using the R and Python programming languages. It starts with an introduction to large-scale data handling and the basics of visualization, then covers more advanced visualization methods. It also introduces libraries and frameworks that are essential for data analysis and visualization.
Learning outcomes
On completion of the course, students should be familiar with R and Python libraries that enable them to create professional scientific visualizations. This includes applying these libraries, handling large datasets, and knowing a range of examples that show how challenges in scientific visualization were overcome and which creative solutions were found.
Skills:
- Knowledge of how to include scientific visualization in research projects
- Ability to independently choose how to prepare large-scale data so that visualization methods can be applied to a given problem
- Knowledge of different libraries and their advantages and disadvantages
- Data preprocessing, analysis, organisation and visualization
Necessary prerequisites
–
Recommended prerequisites
Basic knowledge of statistics and either 1) basic knowledge of both R and Python or 2) intermediate knowledge of one of the two languages and willingness to learn the other
Forms of teaching and learning | Contact hours | Independent study time |
---|---|---|
Lecture | 2 SWS | 10 SWS |
Exercise class | 2 SWS | 7 SWS |
ECTS credits | 6 |
Graded | yes |
Workload | 180h |
Language | English |
Form of assessment | Written exam (60 min) |
Restricted admission | yes |
Further information | https://www.bwl.uni-mannheim.de/en/information-systems/chairs/prof-dr-strohmaier/teaching/ |
Examiner | Prof. Dr. Markus Strohmaier |
Performing lecturer | M. Strohmaier & M. Pellert |
Frequency of offering | Fall semester |
Duration of module | 1 semester |
Range of application | M.Sc. MMM, M.Sc. VWL, M.Sc. Wirt. Inf., MMDS |
Preliminary course work | Successful completion of the corresponding exercises |
Program-specific Competency Goals | CG 2 |
Literature | Claus Wilke: Fundamentals of Data Visualization (https://clauswilke.com/dataviz/); Roger D. Peng & Elizabeth Matsui: The Art of Data Science (https://bookdown.org/rdpeng/artofdatascience/); Julia Silge & David Robinson: Tidy Text Mining (https://www.tidytextmining.com/); Robin Lovelace, Jakub Nowosad & Jannes Muenchow: Geocomputation with R (https://r.geocompx.org/); Kieran Healy: Data Visualization (https://socviz.co/index.html); Winston Chang: ggplot2 cookbook (http://www.cookbook-r.com/Graphs/); Jake VanderPlas: Python Data Science Handbook (https://jakevdp.github.io/PythonDataScienceHandbook/); BBC Data Journalism team: How the BBC Visual and Data Journalism team works with graphics in R (https://medium.com/bbc-visual-and-data-journalism/how-the-bbc-visual-and-data-journalism-team-works-with-graphics-in-r-ed0b35693535) |
Course outline | This course starts with the fundamental concepts of working with data in R and Python and progresses towards methods for working with large data sets. Concurrently, basic concepts of visualization are introduced. We then study selected examples of scientific visualizations, from historical ones to the present day. Together, we reconstruct the problems that scientists faced when creating these visualizations and study their creative solutions in order to learn by example. While we provide theoretical background where necessary, the focus is strongly on implementations that solve practical problems. |
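
To give a flavour of what preparing large data for visualization can look like in practice, the following is a minimal Python sketch and not part of the official course material. It assumes a CSV input with a numeric column and streams it row by row, so the full file never has to fit in memory, while aggregating the values into bins that can then be plotted. The column name `value`, the bin width, and the helper names `binned_counts` and `text_histogram` are hypothetical and chosen only for illustration.

```python
import csv
import io
from collections import Counter


def binned_counts(csv_file, column, bin_width):
    """Count the values of `column` per bin while streaming the file row by row."""
    counts = Counter()
    for row in csv.DictReader(csv_file):
        value = float(row[column])
        # Lower edge of the bin this value falls into.
        bin_start = (value // bin_width) * bin_width
        counts[bin_start] += 1
    # Return the bins in ascending order of their lower edge.
    return dict(sorted(counts.items()))


def text_histogram(counts, bin_width):
    """Render the binned counts as a plain-text bar chart, one line per bin."""
    lines = []
    for bin_start, n in counts.items():
        label = f"{bin_start:6.1f}-{bin_start + bin_width:6.1f}"
        lines.append(f"{label} | {'#' * n}")
    return "\n".join(lines)


# Tiny made-up dataset standing in for a file too large to load at once.
sample = io.StringIO("value\n1.0\n2.5\n7.2\n3.3\n8.8\n4.9\n")
counts = binned_counts(sample, column="value", bin_width=5.0)
print(text_histogram(counts, bin_width=5.0))
```

Only the small dictionary of bin counts is kept in memory, so the same aggregated result could be passed to any plotting library in R or Python instead of the text rendering used here.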