
Category: R

Excess mortality in the Netherlands in 2020

What has been the impact of COVID-19 on mortality in the Netherlands? Using the methods described here, I estimated excess mortality in the country during 2020. The results are not pretty: around 15,000 additional deaths, a 10% increase over the expected mortality for the year, with 25% of the excess not captured by official records of COVID-19-related deaths. The analysis features comparisons of excess mortality over the past 10 years, as well as an exploration of 2020 excess mortality across age and gender. Read it here. You can also check the data and code (in R).
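To make the headline numbers concrete, here is a minimal sketch in R of the core calculation: excess mortality as observed deaths minus expected deaths, with the expected value taken as the same-week average over the five preceding years. The deaths data frame and its columns (year, week, n_deaths) are placeholders for this illustration, not the post's actual objects.

library(dplyr)

# Expected weekly deaths: same-week average over 2015-2019
expected <- deaths %>%
  filter(year %in% 2015:2019) %>%
  group_by(week) %>%
  summarise(expected = mean(n_deaths), .groups = "drop")

# Compare observed 2020 counts against the expectation
excess_2020 <- deaths %>%
  filter(year == 2020) %>%
  left_join(expected, by = "week") %>%
  mutate(excess = n_deaths - expected)

sum(excess_2020$excess, na.rm = TRUE)  # total excess deaths in 2020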

Modeling mortality

To grasp the true impact of COVID-19 on our societies, we need to know the effect of the pandemic on mortality. In other words, we need to know how many deaths can be attributed to the virus, directly and indirectly. It is already popular to visualize mortality in order to gauge the impact of the pandemic in different countries. You might have seen at least some of these graphs and websites: FT, Economist, Our World in Data, CBS, EFTA, CDC, EUROSTAT, and EUROMOMO. But estimating the impact of COVID-19 on mortality is also controversial, with people either misunderstanding or distrusting the way in which the impact is measured and assessed. That's why I put together a step-by-step guide on how we can estimate the impact of COVID-19 on mortality. In the guide, I build a large number of statistical models that we can use to predict expected mortality in 2020. The complexity of the models ranges from the simplest, based only on weekly averages from past years, to what is currently the state of the art. But that is not all: I also review the predictive performance of all of these models, so that we know which ones work best. I run the models on publicly available data from the Netherlands, I use only the open software R, and I share the code, so anyone can check, replicate, and extend the exercise. The guide is available here: http://dimiter.eu/Visualizations_files/nlmortality/Modeling-Mortality.html. I hope this guide will provide some transparency about how expected mortality is and can be estimated…
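As a flavor of the simpler end of that spectrum, the following sketch fits expected weekly mortality as a week-of-year effect plus a linear annual trend on pre-2020 data, and then predicts 2020. The deaths data frame and column names are assumptions for illustration, not the guide's own code.

# Fit on pre-pandemic years: seasonal week effect plus a linear trend
train <- subset(deaths, year < 2020)
m <- lm(n_deaths ~ factor(week) + year, data = train)

# Predicted (expected) deaths for each week of 2020
pred <- data.frame(week = 1:52, year = 2020)
pred$expected <- predict(m, newdata = pred)
head(pred)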

Government positions from party-level Manifesto data (with R)

In empirical research in political science and public policy, we often need estimates of the political positions of governments (cabinets) and of the salience of different issues for those governments. Data on policy positions and issue salience is available, but typically at the level of political parties. One prominent source of such data is the Manifesto Corpus, a database of the electoral manifestos of political parties. To ease the aggregation of government positions and salience from party-level Manifesto data, I developed a set of R functions that accomplish just that, combining the Manifesto data with data on the duration and composition of governments from ParlGov. To see how the functions work, read this detailed tutorial. You can access all the functions at the dedicated GitHub repository, and you can contribute to this project by forking the code on GitHub. If you have questions or suggestions, get in touch. Enjoy!
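For a sense of what such an aggregation involves, here is an illustrative sketch (not the repository's own functions): a cabinet's left-right position computed as the seat-weighted mean of its member parties' manifesto scores. The cabinet_parties and party_positions data frames, with these columns, are assumed for the example.

library(dplyr)

# cabinet_parties: cabinet_id, party, seats (e.g., derived from ParlGov)
# party_positions: party, rile (left-right score from the Manifesto data)
gov_positions <- cabinet_parties %>%
  left_join(party_positions, by = "party") %>%
  group_by(cabinet_id) %>%
  summarise(position = weighted.mean(rile, w = seats, na.rm = TRUE))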

Visualizing asylum statistics

Note: of potential interest to R users for the dynamic Google chart generated via googleVis in R and discussed towards the end of the post. Here you can go directly to the graph.

[Photo: an emergency refugee center, opened in September 2013 in an abandoned school in Sofia, Bulgaria. By Alessandro Penso, Italy, OnOff Picture; first prize at World Press Photo 2013 in the category General News (Single).]

The tragic lives of asylum-seekers make for moving stories and powerful photos. When individual tragedies are aggregated into abstract statistics, the message gets harder to sell. Yet statistics are arguably more relevant for policy and provide a deeper understanding than individual stories, if not as much empathy. In this post, I will offer a few graphs that present some of the major trends and patterns in the numbers of asylum applications and asylum recognition rates in Europe over the last twelve years. I focus on two issues: which European countries take the brunt of the asylum flows, and the link between the share of applications each country receives and its asylum recognition rate.

Asylum applications and recognition rates

Before delving into the details, let's look at the big picture first. Each year between 2001 and 2012, 370,000 people on average applied for asylum protection in one of the member states of the European Union (plus Norway and Switzerland). As can be seen from Figure 1, the number fluctuates between 250,000 and 500,000 per year, with no clear trend. Altogether, during this 12-year period, approximately 4.5 million…
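For readers curious about the mechanics of the dynamic chart, a chart of this kind can be produced in R with googleVis along the following lines; the asylum data frame and its columns are placeholders here, not the post's actual data.

library(googleVis)

# asylum: country, year, applications, recognition_rate
chart <- gvisMotionChart(asylum, idvar = "country", timevar = "year")
plot(chart)  # opens the interactive chart in the browser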

Predicting movie ratings with IMDb data and R

It's Oscars season again, so why not explore how predictable (my) movie tastes are. This has literally been a million-dollar problem, and obviously I am not gonna solve it here, but it's fun and slightly educational to do some number crunching, so why not. Below, I will proceed from a simple linear regression to a generalized additive model to an ordered logistic regression analysis. And I will illustrate the results with nice plots along the way. Of course, all done in R (you can get the script here).

Data

The data for this little project comes from the IMDb website and, in particular, from my personal ratings of 442 titles recorded there. IMDb keeps the movies you have rated in a nice little table which includes information on the movie title, director, duration, year of release, genre, IMDb rating, and a few other less interesting variables. Conveniently, you can export the data directly as a csv file.

Outcome variable

The outcome variable that I want to predict is my personal movie rating. IMDb lets you score movies with one to ten stars. Half-points and other fractions are not allowed. It is a tricky variable to work with: it is obviously not continuous, but at the same time ten ordered categories are a bit too many to treat as a regular categorical variable. Figure 1 plots the frequency distribution (black bars) and density (red area) of my ratings and the density of the IMDb scores (in blue) for the 442 observations…
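The modeling progression described above looks roughly like this in skeleton form; the ratings data frame and its columns stand in for the exported IMDb csv and are assumptions of this sketch, not the script's actual variable names.

library(mgcv)  # gam()
library(MASS)  # polr()

# 1. Simple linear regression
m1 <- lm(my_rating ~ imdb_rating + duration, data = ratings)

# 2. Generalized additive model with smooth terms
m2 <- gam(my_rating ~ s(imdb_rating) + s(year), data = ratings)

# 3. Ordered logistic regression, treating the 1-10 rating as ordinal
ratings$rating_f <- factor(ratings$my_rating, ordered = TRUE)
m3 <- polr(rating_f ~ imdb_rating + duration, data = ratings)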

Swimming in a sea of code

If you are looking for code here, move on. In the beginning, there was only the relentless blinking of the cursor. With the maddening regularity of waves splashing on the shore: blink, blink, blink, blink… Beyond the cursor, the white wasteland of the empty page: vast, featureless, and terrifying as the sea. You stare at the empty page and primordial fear engulfs you: you are never gonna venture into this wasteland, you are never gonna leave the stable, solid, familiar world of menus and shortcuts, icons and buttons. And then you take the first cautious steps. print('Hello world') > Hello world, the sea obliges. 1+1 > 2 2+2 > 4 You are still scared, but your curiosity is aroused. The playful responsiveness of the sea is tempting, and quickly becomes irresistible. Soon, you are jumping around like a child, rolling upside-down and around and around: > a=2 > b=3 > a+b 5 > for (x in 1:60) print(x) 1 2 3 4 5 … 60 The sense of freedom is exhilarating. You take a deep breath and dive: > for (i in 1:10) ifelse(i>5, print("ha"), print("ho")) [1] "ho" [1] "ho" [1] "ho" [1]…