The political geography of human development

The research I did for the previous post on the inadequacy of the widely used term ‘Global South’ led me to some surprising results about the political geography of development.

Although the relationship between latitude and human development is not linear, distance from the equator turned out to have a rather strong link with a country’s development level, as measured by its Human Development Index (HDI). The link is far from deterministic and not necessarily causal. Even more remarkably, once we include indicators (dummy variables) for islands and landlocked countries, as well as the interactions between these and distance from the equator, we can account for more than 55% of the variance in HDI (2017). In other words, with three simple geographic variables and their interactions, we can ‘explain’ more than half of the variation in the level of development of all countries in the world today. Wow! The plot below (pdf) shows these relationships.
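To make the specification concrete, here is a minimal sketch of how one country's row of predictors could be coded. It assumes that ‘distance from the equator’ means absolute latitude in degrees and that each dummy is interacted with distance; the function name `design_row` is hypothetical, and the R script behind the figures may code these variables differently.

```python
def design_row(latitude, is_island, is_landlocked):
    """Build one row of predictors for the geography model (hypothetical coding).

    Assumption: distance from the equator is the absolute latitude in degrees.
    Row layout: intercept, distance, island dummy, landlocked dummy,
    and the two dummy-by-distance interactions.
    """
    distance = abs(latitude)
    island = 1.0 if is_island else 0.0
    landlocked = 1.0 if is_landlocked else 0.0
    return [
        1.0,                    # intercept
        distance,               # distance from the equator
        island,                 # island dummy
        landlocked,             # landlocked dummy
        distance * island,      # lets the latitude slope differ for islands
        distance * landlocked,  # lets the latitude slope differ for landlocked countries
    ]
```

For a hypothetical island country at 41 degrees south, `design_row(-41.0, True, False)` gives `[1.0, 41.0, 1.0, 0.0, 41.0, 0.0]`; the last two entries are what allow the relationship with latitude to differ for islands and landlocked countries.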

 

 

In case you are wondering whether this result is driven by the many small countries with tiny populations: it is not. When we run a weighted linear regression with population size as the weight, the adjusted R-squared of the model still remains just above 0.50. As a side note, including dummies for (former) communist countries and current European Union (EU) member states pushed the R-squared above 0.60. Communist regime or legacy is associated with significantly lower HDI, net of the geographic variables, and EU membership is associated with significantly higher HDI.
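For readers who want to see what ‘weighted by population’ means mechanically, here is a minimal, dependency-free sketch of weighted least squares with a weighted adjusted R-squared. It is an illustration under assumptions, not the author's R script: it assumes the weights are raw population counts and computes R-squared around the weighted mean of the outcome, which, as far as I know, is also how R's `summary.lm` reports it for weighted fits with an intercept. The function names are hypothetical.

```python
def solve(a, b):
    """Solve the square linear system a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        # Move the row with the largest pivot into place for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def weighted_ols(X, y, w):
    """Weighted least squares: minimise sum_i w_i * (y_i - X_i . beta)^2.

    X is a list of predictor rows that already include an intercept column.
    Returns (beta, adjusted R-squared), with R-squared computed around the
    weighted mean of y.
    """
    n, k = len(y), len(X[0])
    # Normal equations: (X' W X) beta = X' W y
    xtwx = [[sum(wi * xi[p] * xi[q] for xi, wi in zip(X, w)) for q in range(k)] for p in range(k)]
    xtwy = [sum(wi * xi[p] * yi for xi, yi, wi in zip(X, y, w)) for p in range(k)]
    beta = solve(xtwx, xtwy)
    fitted = [sum(bj * xij for bj, xij in zip(beta, xi)) for xi in X]
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    ss_res = sum(wi * (yi - fi) ** 2 for wi, yi, fi in zip(w, y, fitted))
    ss_tot = sum(wi * (yi - y_bar) ** 2 for wi, yi in zip(w, y))
    r2 = 1 - ss_res / ss_tot
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k)
    return beta, adj_r2
```

With all weights equal to one, `weighted_ols` reduces to ordinary least squares; with population counts as weights, a very populous country counts far more than a microstate, which is what the robustness check above relies on.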

The next question to consider is whether the relationship between geography and development has grown weaker or stronger over time. There are many plausible ideas we might have about the influence of globalization, the spread of information and communication technologies, wars, and financial crises on the links between geography and development. When we look at the data, however, it turns out that the strength of the link has remained roughly the same since 1990. Wow! Despite all the global social and political transformations of the past 30 years, geography still plays the same, rather large role in constraining and enabling human development. The gif below shows the same plots for 1990, 2000, 2010, and 2017. While overall development grows over time, the relationship with distance from the equator remains roughly the same, as indicated by the slopes of the linear regression lines.

 

 

Note that the way the HDI is constructed makes changes in development over time not quite comparable: the index is capped at 1.0, so an already highly developed country has little scope to improve its index further. Also, the sample of countries with available data is smaller in 1990 (N=144) than in 2017 (N=191).

Since we mentioned population size, let’s consider the link between the population size of a country and its level of HDI. Are small countries more successful? Does it pay off to be a large state? Maybe countries with populations that are neither too big nor too small perform best?

As the plot below (pdf) shows, there is no clear relationship between population size and HDI. The linear regression line slopes slightly downwards, but the ‘effect’ is not significant and it is not really linear. The loess fit meanders up and down without a clear pattern. It turns out there is no sweet spot for population size when it comes to human development. Small populations can be just as good, and just as bad, as bigger ones. There are tiny states that are successful and ones that do pretty badly. The same goes for mid-sized, big, and enormous countries (in terms of population, not area).

 

 

This lack of relationship is quite remarkable, but there is another surprise when we look at the change in development between 2000 and 2017. As the plot below (pdf) shows, more populous countries have been more successful in improving their HDI over this period. The difference is not huge, but given the overall small scale of the observed changes, it is significant and important.

 

 

To sum up, while in general population size is not related to development, during the past two decades more populous countries have been more successful in improving their development index. This is of course good news, as it means that more people live longer, study longer, and enjoy higher standards of living.

For now, this concludes my explorations in political geography, which turned out to harbor more insights than I expected, even though I have explored only five variables in total. If you want to continue from here on your own, the R script for the figures is here and the datafile is here.

Olympic medals, economic power and population size

The 2016 Rio Olympic games being officially over, we can obsess as much as we like over the final medal table, without the distraction of having to actually watch any sports. One of the basic questions to ponder about the medal table is to what extent Olympic glory is determined by the wealth, economic power and population size of countries.

Many news outlets quickly calculated the ratios of 2016 medal counts to economic power and population size and presented rankings of ‘medals won per billion of GDP’ and ‘medals won per million of population’ (for example here and here). But while these rankings are fun, they tell us little about the relationships between economic power and population size, on the one hand, and Olympic success, on the other. Obviously, there are no deterministic links, but there could still be systematic relationships. So let’s see.

Data

I pulled from the Internet the total number of medals each country won at the 2016 Olympic games and assigned each country a score as follows: 5 points for each gold medal, 3 points for each silver, and 1 point for each bronze. (Other ways of transforming medals into points are of course possible.) To measure wealth and economic power, I used the 2015 GDP estimates (at purchasing power parity) provided by the International Monetary Fund, complemented by data from the CIA Factbook (both sets of numbers available here). For population size, I used the Wikipedia list available here.
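The scoring rule is easy to state in code. A minimal sketch; the names `MEDAL_WEIGHTS` and `medal_points` are hypothetical, and the optional `weights` argument is only there to show how an alternative transformation would plug in.

```python
# Points per medal as described in the text: gold 5, silver 3, bronze 1.
MEDAL_WEIGHTS = {"gold": 5, "silver": 3, "bronze": 1}


def medal_points(gold, silver, bronze, weights=MEDAL_WEIGHTS):
    """Convert a country's medal counts into a single medal-points score."""
    return gold * weights["gold"] + silver * weights["silver"] + bronze * weights["bronze"]
```

For the United States, which won 46 gold, 37 silver, and 38 bronze medals in Rio, `medal_points(46, 37, 38)` returns 379, matching the ranking table at the end of the post; under an alternative 3-2-1 scheme, `medal_points(46, 37, 38, {"gold": 3, "silver": 2, "bronze": 1})` returns 250.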

Olympic medals and economic power

The plot below shows how total medal points (Y-axis) vary with GDP (X-axis). Each country is represented by a dot (ok, by a snowflake), and some countries are labeled. Clearly, and not very surprisingly, countries with higher GDP have won more medals in Rio. What is surprising, however, is that the relationship is not too far from linear: the red line added to the plot is the OLS regression line, and it turns out that this line summarizes the relationship as well (or as badly) as more flexible alternatives (like the loess line shown on the plot in grey). The estimated positive linear relationship implies that, on average, each 1,000 billion of GDP brings about 16 more medal points (so ~315 billion earns you another gold medal).
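The arithmetic behind the ‘one more gold medal’ figure can be sketched as follows: fit the OLS line of medal points on GDP, then divide the 5 points of a gold medal by the slope. This is a minimal illustration with hypothetical function names, assuming GDP is measured in billions (as the ‘1,000 billion’ phrasing implies); it is not the author's R code.

```python
def ols_line(x, y):
    """Return (intercept, slope) of the ordinary least squares line of y on x."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    slope = sxy / sxx
    return y_mean - slope * x_mean, slope


def gdp_per_extra_gold(points_per_1000bn, gold_points=5):
    """GDP (in billions) associated, on average, with one more gold medal's worth of points."""
    return gold_points / points_per_1000bn * 1000
```

With the rounded slope of 16 points per 1,000 billion, `gdp_per_extra_gold(16)` returns 312.5, consistent with the approximate ‘~315 billion’ in the text.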

The other thing to note from the plot is that the relationship is between medal points and total GDP, not GDP per capita. In fact, GDP per capita, which measures the relative wealth of a country, has a much weaker relationship with Olympic success, with a number of very wealthy, and mostly very small, countries winning no medals at all. The correlation of Olympic medal points with GDP is 0.80, while the correlation with GDP per capita is only 0.21. So it is absolute rather than relative wealth that matters more for Olympic glory. This would seem to make sense: it is not money but people who compete at the games, and you need a large pool of contenders to have a chance. But let’s examine more closely whether and how population size matters.
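Comparing the two correlations only requires deriving GDP per capita and computing a Pearson coefficient for each pair. A minimal, dependency-free sketch; the function names are hypothetical, and it assumes GDP in billions and population in millions, so that per-capita values come out in thousands of dollars (the post does not state its units).

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def per_capita(gdp_billions, population_millions):
    """GDP per capita in thousands of dollars, from GDP in billions and population in millions."""
    return [g / p for g, p in zip(gdp_billions, population_millions)]
```

With the real data one would compare `pearson(points, gdp)` against `pearson(points, per_capita(gdp, population))`; in a made-up three-country example where points track total GDP exactly, the per-capita correlation is noticeably lower.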

Olympic medals and population size

The following plot shows how the number of 2016 Rio medal points earned by each country varies with population size. Overall, the relationship is positive, but it is not quite linear, and it is not very consistent (the correlation is 0.40). Some very populous countries, like India, Indonesia, and Pakistan, have won very few medals or none, and some very small ones have won at least one. The implied effect of population size is also small in substantive terms: each 10 million people are associated with 1 more medal point (so, a bronze); for reference, three quarters of the countries in the dataset have fewer than 25 million inhabitants.


Putting everything together

Now we can put both GDP and population size in the same statistical model, with the aim of summarizing the observed distribution of medal points as well as we can. In addition to these two predictors, we can add an interaction between them, as well as different non-linear transformations of the individual predictors. Even with only two predictors there are quite a few possible models, so we need a standard for selecting the best one. As the goal is to describe the distribution of medal points, it makes sense to use as a benchmark the sum of the errors the models make, that is, the sum of the absolute differences between the actual and predicted medal score for each country.
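The selection criterion is simple to compute. A minimal sketch with hypothetical names (`sum_abs_errors`, `best_model`); the post does not say whether negative predictions were set to zero before computing this benchmark, so this version uses the predictions exactly as given.

```python
def sum_abs_errors(actual, predicted):
    """Sum of |actual - predicted| across countries: the model-selection benchmark."""
    return sum(abs(a - p) for a, p in zip(actual, predicted))


def best_model(actual, predictions_by_model):
    """Return the name of the model with the smallest sum of absolute errors."""
    return min(
        predictions_by_model,
        key=lambda name: sum_abs_errors(actual, predictions_by_model[name]),
    )
```

Using three rows of the ranking table at the end of the post (Great Britain, Kenya, United States), `sum_abs_errors([221, 49, 379], [68, 0, 431])` returns 254.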

I find that two models describe the data almost equally well. Both use simple OLS linear regression. The first one features population size, GDP, and GDP squared. In this multivariate model, population size turns out to have a negative relationship with Olympic success, net of economic power. GDP has a positive relationship, but the quadratic term implies that the effect is not truly linear but declines in magnitude with higher values of GDP. The substantive interpretation of this model is something along these lines: Olympic success increases at a slightly declining rate with the economic power of a country, but given a certain level of economic power, less populous countries do better. The sum of errors of Model 1 is 1691 medal points.

The second model is similar, but instead of the squared term for GDP it features an interaction between GDP and population size. The interaction turns out to be negative. This implies that economically powerful but populous countries do less well than their level of GDP alone would suggest. This interpretation is a bit strange. Population size is positively associated with GDP, and the negative interaction seems to suggest that it is relative wealth (GDP per capita) that matters. This turns out not to be the case, however: any model that features GDP per capita has a bigger sum of errors than either Model 1 or Model 2.

Term                     Model 1          Model 2
Population size          -0.20            -0.09
GDP                      +0.04            +0.03
GDP squared              -0.00000008      not included
GDP × Population         not included     -0.0000008
Sum of absolute errors   1691             1678
Adjusted R-squared       0.83             0.81
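Reading the signs in the table is easier with the implied marginal effects written out. Here ŷ is predicted medal points, G is GDP, and P is population size. The intercepts are not reported, so they appear as β₀ below, and since the post does not state the units of G and P, only the form of these expressions (not specific numbers) should be taken from them.

```latex
\begin{aligned}
&\text{Model 1:} && \hat{y} = \beta_0^{(1)} - 0.20\,P + 0.04\,G - 0.00000008\,G^2,
  && \frac{\partial \hat{y}}{\partial G} = 0.04 - 0.00000016\,G \\
&\text{Model 2:} && \hat{y} = \beta_0^{(2)} - 0.09\,P + 0.03\,G - 0.0000008\,G P,
  && \frac{\partial \hat{y}}{\partial G} = 0.03 - 0.0000008\,P,
  \quad \frac{\partial \hat{y}}{\partial P} = -0.09 - 0.0000008\,G
\end{aligned}
```

In Model 1 the GDP slope shrinks as GDP grows (the ‘slightly declining rate’ in the text); in Model 2 it shrinks as population grows, which is the sense in which populous economic powers do less well than their GDP alone would suggest.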

Both models presented so far are linear, which is not entirely appropriate given that the outcome variable, medal points, is constrained to be non-negative and is not normally distributed. The models actually predict that some countries, like Kenya, should get a negative number of medal points, which is clearly impossible. To remedy that, we can use statistical models developed specifically for non-negative (count) data: Poisson, negative binomial, or even hurdle or zero-inflated models that can account for the excess number of countries with no medal points at all. I spent a good deal of time experimenting with these models, but I didn’t find any that improved at all on the simple linear models described above (it is actually quite hard even to evaluate the performance of these non-linear models). Let me know if you find a different model that does better than the ones reported here. (But please, no geographical dummies or measures of past Olympic performance; also, Olympic delegation size would be a mediator and so not a proper predictor.)

The only model I could find that outperforms the simple OLS regressions is a generalized additive model (GAM) with a flexible form for the interaction. This model has a sum of errors of 1485, and the interaction surface looks like this:

In conclusion, do the population size, economic power and wealth of countries account for their success at the 2016 Olympic games? Yes, to a large extent. It is economic power, not relative wealth, that matters more, and population size actually has a negative effect once economic power is taken into account. So the relationships are rather complex and, as a reminder, far from deterministic.

 

Here is the data (text file): olypm. Let me know if you are interested in the R script for the analysis, and I will post it.
Finally, here is a ranking of the countries by the size of the model error (based on Model 2; negative predictions have been replaced with zero). It can be read as follows. Among the simple linear models, Model 2 gives the best summary of the distribution of medal points won at the 2016 Rio Olympic games as a function of population size and GDP, and it implies a prediction for each country. Countries with positive errors outperform their predictions: they have achieved more than their GDP and population size would imply. Countries with negative errors underperform: they have achieved less than their GDP and population size would imply.

country 2016 medals 2016 medal points predicted medal points model error
Great Britain 67 221 68 153
Russia 56 168 87 81
Australia 29 83 30 53
France 42 118 68 50
Kenya 13 49 0 49
New Zealand 18 52 4 48
Hungary 15 53 6 47
Netherlands 19 65 22 43
Jamaica 11 41 0 41
Croatia 10 36 2 34
Cuba 11 35 2 33
Azerbaijan 18 36 4 32
Germany 42 130 98 32
Uzbekistan 13 33 2 31
Italy 28 84 54 30
Kazakhstan 17 39 10 29
Denmark 15 35 7 28
Ukraine 11 29 5 24
Serbia 8 24 2 22
North Korea 7 21 0 21
Sweden 11 31 12 19
Belarus 9 21 4 17
Ethiopia 8 16 0 16
Georgia 7 17 1 16
South Korea 21 63 47 16
China 70 210 195 15
South Africa 10 30 15 15
Armenia 4 14 0 14
Greece 6 20 7 13
Slovakia 4 16 4 12
Spain 17 53 41 12
Colombia 8 24 14 10
Czech Republic 10 18 8 10
Slovenia 4 12 2 10
Switzerland 7 23 13 10
Bahamas 2 6 0 6
Bahrain 2 8 2 6
Ivory Coast 2 6 0 6
Belgium 6 18 13 5
Fiji 1 5 0 5
Kosovo 1 5 0 5
Tajikistan 1 5 0 5
Lithuania 4 6 2 4
Burundi 1 3 0 3
Grenada 1 3 0 3
Jordan 1 5 2 3
Mongolia 2 4 1 3
Niger 1 3 0 3
Puerto Rico 1 5 2 3
Bulgaria 3 5 3 2
Canada 22 44 43 1
Moldova 1 1 0 1
Romania 5 11 10 1
Vietnam 2 8 7 1
Afghanistan 0 0 0 0
American Samoa 0 0 0 0
Andorra 0 0 0 0
Antigua and Barbuda 0 0 0 0
Aruba 0 0 0 0
Barbados 0 0 0 0
Belize 0 0 0 0
Benin 0 0 0 0
Bermuda 0 0 0 0
Bhutan 0 0 0 0
British Virgin Islands 0 0 0 0
Burkina Faso 0 0 0 0
Cambodia 0 0 0 0
Cameroon 0 0 0 0
Cape Verde 0 0 0 0
Cayman Islands 0 0 0 0
Central African Republic 0 0 0 0
Chad 0 0 0 0
Comoros 0 0 0 0
Congo 0 0 0 0
Cook Islands 0 0 0 0
Djibouti 0 0 0 0
Dominica 0 0 0 0
DR Congo 0 0 0 0
Eritrea 0 0 0 0
Estonia 1 1 1 0
Gambia 0 0 0 0
Guam 0 0 0 0
Guinea 0 0 0 0
Guinea-Bissau 0 0 0 0
Guyana 0 0 0 0
Haiti 0 0 0 0
Honduras 0 0 0 0
Iceland 0 0 0 0
Kiribati 0 0 0 0
Kyrgyzstan 0 0 0 0
Laos 0 0 0 0
Lesotho 0 0 0 0
Liberia 0 0 0 0
Liechtenstein 0 0 0 0
Madagascar 0 0 0 0
Malawi 0 0 0 0
Maldives 0 0 0 0
Mali 0 0 0 0
Malta 0 0 0 0
Marshall Islands 0 0 0 0
Mauritania 0 0 0 0
Micronesia 0 0 0 0
Monaco 0 0 0 0
Montenegro 0 0 0 0
Mozambique 0 0 0 0
Nauru 0 0 0 0
Nepal 0 0 0 0
Nicaragua 0 0 0 0
Palau 0 0 0 0
Palestine 0 0 0 0
Papua New Guinea 0 0 0 0
Poland 11 25 25 0
Rwanda 0 0 0 0
Saint Kitts and Nevis 0 0 0 0
Saint Lucia 0 0 0 0
Samoa 0 0 0 0
San Marino 0 0 0 0
Sao Tome and Principe 0 0 0 0
Senegal 0 0 0 0
Seychelles 0 0 0 0
Sierra Leone 0 0 0 0
Solomon Islands 0 0 0 0
Somalia 0 0 0 0
South Sudan 0 0 0 0
St Vincent and the Grenadines 0 0 0 0
Suriname 0 0 0 0
Swaziland 0 0 0 0
Tanzania 0 0 0 0
Timor-Leste 0 0 0 0
Togo 0 0 0 0
Tonga 0 0 0 0
Trinidad and Tobago 1 1 1 0
Tunisia 3 3 3 0
Tuvalu 0 0 0 0
Uganda 0 0 0 0
US Virgin Islands 0 0 0 0
Vanuatu 0 0 0 0
Yemen 0 0 0 0
Zambia 0 0 0 0
Zimbabwe 0 0 0 0
Albania 0 0 1 -1
Bangladesh 0 0 1 -1
Bolivia 0 0 1 -1
Bosnia and Herzegovina 0 0 1 -1
Botswana 0 0 1 -1
Brunei 0 0 1 -1
Cyprus 0 0 1 -1
El Salvador 0 0 1 -1
Equatorial Guinea 0 0 1 -1
FYR Macedonia 0 0 1 -1
Gabon 0 0 1 -1
Ghana 0 0 1 -1
Ireland 2 6 7 -1
Latvia 0 0 1 -1
Mauritius 0 0 1 -1
Namibia 0 0 1 -1
Paraguay 0 0 1 -1
Sudan 0 0 1 -1
Syria 0 0 1 -1
Costa Rica 0 0 2 -2
Dominican Rep. 1 1 3 -2
Guatemala 0 0 2 -2
Libya 0 0 2 -2
Luxembourg 0 0 2 -2
Panama 0 0 2 -2
Turkmenistan 0 0 2 -2
Uruguay 0 0 2 -2
Angola 0 0 3 -3
Lebanon 0 0 3 -3
Myanmar 0 0 3 -3
Ecuador 0 0 4 -4
Morocco 1 1 5 -4
Sri Lanka 0 0 4 -4
Argentina 4 18 23 -5
Finland 1 1 6 -5
Israel 2 2 7 -5
Oman 0 0 5 -5
Qatar 1 3 8 -5
Thailand 6 18 23 -5
Norway 4 4 10 -6
Portugal 1 1 7 -6
Algeria 2 6 13 -7
Brazil 19 59 66 -7
Malaysia 5 13 20 -7
Venezuela 3 5 12 -7
Iran 8 22 30 -8
Pakistan 0 0 8 -8
Peru 0 0 8 -8
Philippines 1 3 11 -8
Singapore 1 5 13 -8
Austria 1 1 11 -10
Chile 0 0 10 -10
Hong Kong 0 0 11 -11
Nigeria 1 1 13 -12
India 2 4 17 -13
Iraq 0 0 13 -13
Japan 41 105 119 -14
U.A.E. 1 1 18 -17
Egypt 3 3 21 -18
Turkey 8 18 37 -19
Chinese Taipei 3 7 29 -22
Mexico 5 11 49 -38
Indonesia 3 11 51 -40
Saudi Arabia 0 0 44 -44
United States 121 379 431 -52
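The logic behind the ranking can be sketched in a few lines. This is a minimal illustration, not the author's R code: it assumes you already have each country's raw Model 2 prediction, and `model_errors` is a hypothetical name. Breaking ties alphabetically by country is also an assumption, though it is consistent with the ordering of the table.

```python
def model_errors(rows):
    """Rank countries by model error, as in the table above.

    rows: iterable of (country, medal_points, raw_prediction).
    Negative predictions are replaced with zero, then
    error = actual medal points - prediction.
    Sorted from the largest positive error (over-performers) to the most
    negative (under-performers); ties are broken alphabetically by country.
    """
    results = []
    for country, points, raw_prediction in rows:
        prediction = max(raw_prediction, 0)  # a country cannot win negative medal points
        results.append((country, prediction, points - prediction))
    return sorted(results, key=lambda row: (-row[2], row[0]))
```

For example, with a hypothetical raw prediction of -3 for Kenya, `model_errors([("Kenya", 49, -3)])` returns `[("Kenya", 0, 49)]`, matching Kenya's row in the table.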