Course syllabus

Course schedule and content: [see modules]

Course source code: [see github]

Practical information:

Each day will be divided into two sessions, 10:00 - 13:00 and 14:00 - 17:00. In each session we will introduce and explain a topic, followed by time for practice. See the schedule for each day's location.


The explosion of digital communication and increasing efforts to digitize existing material have produced a deluge of material such as digitized historical news archives, policy and legal documents, political debates, and millions of social media messages by politicians, journalists, and citizens. This makes it possible to put theoretical predictions about the societal roles played by information, and about the development and effects of communication, to rigorous quantitative tests that were impossible before. Besides providing an opportunity, the analysis of such “big data” sources also poses methodological challenges. Traditional manual content analysis does not scale to very large data sets due to its high cost and complexity. For this reason, many researchers turn to automatic text analysis using techniques such as dictionary analysis, automatic clustering and scaling of latent traits, and machine learning.

Using such techniques properly, however, requires a very specific skill set. This course aims to give interested PhDs from the Faculty of Social Sciences an introduction to text analysis. R will be used as the platform and language of instruction, but the basic principles and methods generalize easily to other languages and tools such as Python. Participants will be given handouts with examples based on pre-existing data to follow along, but are encouraged to work on their own data and problems using the techniques offered.

This course is based on the open source modules developed by the organizers, available on GitHub.


Period: 4-8 March, 2019
Lecturers: dr. Wouter van Atteveldt and dr. Kasper Welbers (Department of Communication Science)
Organizers: Nadia Bij de Vaate, Britta Brugman, Ellen Droog, and Felicia Löcherbach (Department of Communication Science)
Credits: 3 EC

Bring your own laptop with R and RStudio installed.


Before the course starts, please install R and RStudio and complete the online Datacamp Introduction to R course. This online course covers the basic R knowledge that you will need to successfully complete this Data Analysis in R course.

Target group

PhDs from the Faculty of Social Sciences that use quantitative text analysis in their projects. The course is also open to others who would like to learn how to work with R. The maximum number of participants is 10.

Course objectives



Upon completion of this course, PhDs should be able to:

  • Understand the R programming language and software environment;
  • Perform web scraping (e.g., news articles, social media responses) with R;
  • Organize, transform and merge data with R;
  • Visualize data as graphs and figures with R;
  • Conduct simple analyses with R (i.e., descriptive statistics, correlations, chi-square tests, (in)dependent-samples t-tests, one-way ANOVA, linear regression);
  • Use R packages to conduct more complex analyses that are relevant to their own project (e.g., factor analysis, multilevel analysis, time series analysis).


PhDs are required (1) to complete the online Datacamp Introduction to R course and read the assigned readings before the start of the course, (2) to participate actively in the seminars, and (3) to submit a final assignment showing that they can independently use R to conduct their own analyses.
