
MKT 624: Data Scraping for Analytics and AI using R

Contents
Online platforms such as Twitter, Amazon, LinkedIn, TikTok, or Airbnb are invaluable sources for social science research, offering extensive datasets suited to analysis and predictive modeling. This course guides you through extracting, storing, and refining this data so that you are equipped for statistical analysis, predictive modeling, and AI applications. You’ll explore the role of data science in the social sciences and AI, then move on to writing web scrapers in R with packages such as rvest, httr, and RSelenium.
The course covers advanced R techniques, interpreting web formats such as HTML, CSS, JSON, and XML, using regular expressions, and managing diverse data types. You’ll learn to store data in relational databases with (My)SQL and to extract data efficiently through APIs from platforms such as Twitter and Yelp. The course also briefly covers feature and embedding extraction from text and images, enriching your datasets for detailed analysis and AI model development.
A special focus is on raising your R skills to an advanced level and on the basics of building programs, from simple functional programs to Shiny apps, so that you can create interactive web applications that showcase your scraped data.

Learning outcomes
Upon successful completion of this course, students will be able

  • to identify key online data sources,
  • to develop sophisticated scrapers,
  • to process data for analytical and AI applications, and
  • to present their findings through an app.

Necessary prerequisites

Recommended prerequisites
basics in statistics and/or empirical social research
basics in R and/or Python
basics in statistical analysis with R

Forms of teaching and learning: Seminar
Contact hours: 2 SWS
Independent study time: 9 SWS
ECTS credits: 4
Graded: yes
Workload: 120h
Language: English
Form of assessment: oral exam (presentation at the end of the seminar)
Restricted admission: yes
Further information: student portal
Examiner: Prof. Dr. Florian Stahl
Performing lecturer: Prof. Dr. Reto Hofstetter & Prof. Dr. Florian Stahl
Frequency of offering: Spring semester
Duration of module: 1 semester
Range of application: M.Sc. MMM, M.Sc. WiPäd, M.Sc. VWL, M.Sc. Wirt. Inf.
Preliminary course work
Program-specific Competency Goals: CG 1
Literature
  • Please install the following software beforehand: R (latest version), RStudio (latest version), Java, RSelenium.
Course outline: Will be announced at the beginning of the course.