Data Scraping for Analytics and AI using R

MKT 624

Lecturer: Prof. Dr. Reto Hofstetter & Prof. Dr. Florian Stahl
Course Format: Lecture
Credit Points: 4 ECTS
Hours per Week: 4
Semester: FSS
Language: English
Registration: Please register for the course
Accepted Participants: CDSB PhD Students, CDSE PhD Students, Mannheim Master in Business Research (MMBR)

Further Information

  • Brief Description

    Online platforms such as Twitter, Amazon, LinkedIn, TikTok, and Airbnb are invaluable for social science research, offering extensive datasets for analysis and predictive modeling. This course guides you through extracting, storing, and refining this data so that it is ready for statistical analysis, predictive modeling, and AI applications. You’ll explore the role of data science in the social sciences and AI, then move on to building web scrapers in R with packages such as rvest, httr, and RSelenium.

    The course covers advanced R techniques, interpreting web formats such as HTML, CSS, JSON, and XML, using regular expressions, and handling diverse data types. You’ll learn to store data in relational databases with (My)SQL and to extract data efficiently through the APIs of platforms such as Twitter and Yelp. The course also briefly covers feature and embedding extraction from text and images, which enriches your datasets for detailed analysis and AI model development.

    A special focus is on raising your R skills to an advanced level and on the basics of building programs, from simple functional programs to Shiny apps, so that you can create interactive web applications that showcase your scraped data.

    Learning outcomes:

    Upon successful completion of this course, students will have the proficiency

    … to identify key online data sources,

    … to develop sophisticated scrapers,

    … to process data for analytical and AI applications, and

    … to present their findings through an app.

    Prerequisites:

    Formal:

    • Basics of statistics and/or empirical social research

    Content-wise:

    • Basic knowledge of R and/or Python
    • Basic knowledge of statistical analysis with R
  • Lecture

    Lecturer: Prof. Dr. Florian Stahl
    Schedule: Please refer to the latest information on Portal2 and ILIAS
    Assessment: Homework, Written Exam
  • Required Readings

    • Required software (please install beforehand): R (latest version), RStudio (latest version), Java, RSelenium

    No literature is explicitly required; course content will be provided on slides.