
Author: demetriodor

Correlation does not imply causation. Then what does it imply?

‘Correlation does not imply causation’ is an adage that students in all the social sciences are made to recite from a very early age. What is less often discussed systematically is what could actually be going on when two phenomena are correlated but not causally related. Let’s try to make a list:
1) The correlation might be due to chance. T-tests and p-values are generally used to guard against this possibility.
1a) The correlation might be due to coincidence. This is essentially a variant of the previous point, but with a focus on time series. It is especially easy to mistake pure noise (randomness) for patterns (relationships) when one looks at two variables over time. If you look at the numerous ‘correlation is not causation’ jokes and cartoons on the internet, you will note that most concern a spurious correlation between two variables over time (e.g. the number of pirates and global warming): it is simply easier to find such examples in time series than in cross-sectional data.
1b) Another reason to distrust correlations is the so-called ‘ecological inference’ problem. The problem arises when data are available at several levels of observation (e.g. people nested in municipalities nested in states). The correlation of two variables aggregated at a higher level (e.g. states) cannot be used to infer the correlation of these variables at a lower level (e.g. people). Hence, read as a statement about individuals, the higher-level correlation is a statistical artifact, although not necessarily one that comes from mistaking ‘noise’ for ‘signal’.
2) The correlation might be due to a third variable being causally related to the two correlated variables we observe. This is the well-known omitted…
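To make points 1a, 1b and 2 concrete, here is a small simulation sketch in Python (standard library only). Everything in it is hypothetical: the random walks, the five made-up ‘states’ and the common cause `z` are illustrative assumptions, not data from any study. It shows (i) that two independent random walks often correlate strongly, far more often than independent white noise of the same length; (ii) a constructed case where the state-level correlation is +1, the correlation among individuals is only 0.5, and within each state it is -1; and (iii) that a correlation produced entirely by a common cause largely vanishes once that cause is partialled out.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def random_walk(rng, n):
    """Cumulative sum of independent standard-normal steps."""
    level, path = 0.0, []
    for _ in range(n):
        level += rng.gauss(0, 1)
        path.append(level)
    return path


def share_large_correlations(rng, pairs=200, n=100, threshold=0.5, walks=True):
    """Share of independent pairs (random walks or white noise) with |r| > threshold."""
    hits = 0
    for _ in range(pairs):
        if walks:
            a, b = random_walk(rng, n), random_walk(rng, n)
        else:
            a = [rng.gauss(0, 1) for _ in range(n)]
            b = [rng.gauss(0, 1) for _ in range(n)]
        if abs(pearson(a, b)) > threshold:
            hits += 1
    return hits / pairs


def ecological_example():
    """Three hypothetical people in each of five hypothetical 'states' g = 0..4."""
    people = [(g + d, g - d, g) for g in range(5) for d in (-1, 0, 1)]
    within = [pearson([x for x, _, s in people if s == g],
                      [y for _, y, s in people if s == g]) for g in range(5)]
    pooled = pearson([x for x, _, _ in people], [y for _, y, _ in people])
    # Aggregate to state means, as in state-level data.
    means_x = [sum(x for x, _, s in people if s == g) / 3 for g in range(5)]
    means_y = [sum(y for _, y, s in people if s == g) / 3 for g in range(5)]
    return within, pooled, pearson(means_x, means_y)


def confounded_example(rng, n=2000):
    """z causes both x and y; there is no direct link between x and y."""
    z = [rng.gauss(0, 1) for _ in range(n)]
    x = [zi + rng.gauss(0, 1) for zi in z]
    y = [zi + rng.gauss(0, 1) for zi in z]
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    # First-order partial correlation of x and y, controlling for z.
    partial = (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
    return r_xy, partial


rng = random.Random(42)
walk_share = share_large_correlations(rng, walks=True)
noise_share = share_large_correlations(rng, walks=False)
within, pooled, aggregate = ecological_example()
r_xy, partial = confounded_example(rng)

print(f"|r| > 0.5: random walks {walk_share:.0%}, white noise {noise_share:.0%}")
print(f"within-state r {within[0]:.1f}, individual r {pooled:.1f}, state-level r {aggregate:.1f}")
print(f"raw r(x, y) {r_xy:.2f}, partial r given z {partial:.2f}")
```

The exact percentages depend on the seed; the qualitative contrast does not. With real data one would normally control for a suspected confounder in a regression rather than through the partial-correlation formula, but the logic is the same.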

Solve for the equilibrium: Dutch higher education

1) The number of first-year students in the Netherlands has soared from 105 000 in 2000 to 135 000 in 2011. This increase of almost 30% is a direct result of government policy, which links university funding to student numbers. In some programs, student numbers have more than doubled during the last five years. Everyone is encouraged to enter the university system.
2) In general, there is no selection at the gate: programs cannot refuse to admit students.
3) Now, the government’s objectives are to reduce the number of first-year drop-outs and to slash the number of students who do not graduate within four years. Both objectives are backed by financial incentives and penalties for the universities.
Something’s gotta give. I wonder what…
P.S. ‘Solve for the equilibrium’ is the title of a recurring series on Marginal Revolution.

In defense of description

John Gerring has a new article in the British Journal of Political Science [ungated here] which attempts to restore description to its rightful place as a respectable occupation for political scientists. Description has indeed been relegated to the sidelines in favor of causal inference during the last 50 years, and Gerring does a great job of explaining why this is wrong. But he also points out why description is inherently more difficult than causal analysis: ‘Descriptive inference, by contrast, is centred on a judgment about what is important, substantively speaking, and how to describe it. To describe something is to assert its ultimate value. Not surprisingly, judgments about matters of substantive rationality are usually more contested than judgments about matters of instrumental rationality, and this offers an important clue to the predicament of descriptive inference.’ (p.740) Required reading.

Taking stock of Institutionalism

This year’s Nobel Symposium has been on the topic of Growth and Development. Several of the presentations (available here) deal with the impact of institutions on economic growth and development. The contributions by Daron Acemoglu and Andrei Shleifer in particular do a great job of taking stock of what we know about the role institutions play in society and the economy. The discussion is also useful for understanding the methodological challenges of demonstrating the importance of institutions. Highly recommended.

Hyperlinks

It’s been a while since the last post, but I am slowly getting back on track after the triple shock of a new family member, a new house and a new office (all of which arrived within a single week during the summer). For starters, here is a selection of interesting links from the last two months:
Animals have morals. Brought to you by one of my academic heroes.
Abuses of public budgeting for election purposes: 1) Find a black-hole item in the budget. 2) Put all budget cuts there. 3) Brag that you have solved the budget deficit problem.
Statistics bring emotions. Probably faked, but still nice to see.
When do academics do their work? At night, and during weekends, too.
Watercolor your scatterplots. Yummy. Here as well.
Fractals in nature (as seen from Google Earth), by Paul Bourke.

Scatterplots vs. regression tables (Economics professors edition)

I have always considered scatterplots to be the best available device for showing relationships between variables. But it must be even better to have the regression table and a full description of the results in addition, right? Not so fast: a new paper shows that professional economists make largely correct inferences about data when looking at a scatterplot, get confused when they are shown the details of the regressions next to the scatterplot, and totally mess it up when they are shown only the numbers without the plot! Wow! If you needed any more persuasion that graphing your data and your results is more important than those regression tables with zillions of numbers, now you have it.
P.S. The authors of this research could have done a better job of communicating their own findings visually… [via Felix Salmon]
The illusion of predictability: How regression statistics mislead experts
Emre Soyer & Robin M. Hogarth
Abstract: Does the manner in which results are presented in empirical studies affect perceptions of the predictability of the outcomes? Noting the predominant role of linear regression analysis in empirical economics, we asked 257 academic economists to make probabilistic inferences given different presentations of the outputs of this statistical tool. Questions concerned the distribution of the dependent variable conditional on known values of the independent variable. Answers based on the presentation mode that is standard in the literature led to an illusion of predictability; outcomes were perceived to be more predictable than could be justified by the model. In particular, many respondents failed to take…
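The gap the paper describes can be illustrated with a minimal sketch in plain Python. The assumptions are loud ones: the data-generating process (y = x + noise with standard deviation 2, n = 1000) is hypothetical, the intervals use a normal approximation (1.96) rather than the t distribution, and nothing here reproduces the authors’ experimental materials. The point is that the slope can be estimated very precisely (large t-statistic) while R² stays low, so the 95% interval for an individual outcome at a given x is many times wider than the interval for the conditional mean. A standard regression table foregrounds the coefficient and its standard error, whereas questions about the distribution of y given x are governed by the residual standard error.

```python
import math
import random


def ols_with_prediction(xs, ys, x0, z=1.96):
    """Simple OLS of y on x, plus approximate 95% half-widths at x0 for
    the conditional mean E[y | x0] and for one new observation y at x0."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    syy = sum((y - my) ** 2 for y in ys)
    s = math.sqrt(sse / (n - 2))                 # residual standard error
    se_slope = s / math.sqrt(sxx)                # what the regression table reports
    leverage = 1 / n + (x0 - mx) ** 2 / sxx
    return {
        "slope": slope,
        "t": slope / se_slope,
        "r2": 1 - sse / syy,
        "fit": intercept + slope * x0,
        "mean_half": z * s * math.sqrt(leverage),      # CI for the conditional mean
        "pred_half": z * s * math.sqrt(1 + leverage),  # interval for an individual y
    }


rng = random.Random(7)
xs = [rng.gauss(0, 1) for _ in range(1000)]
ys = [1.0 * x + rng.gauss(0, 2) for x in xs]  # hypothetical data-generating process
res = ols_with_prediction(xs, ys, x0=1.0)

print(f"slope {res['slope']:.2f} (t = {res['t']:.1f}), R^2 = {res['r2']:.2f}")
print(f"at x = 1: fitted {res['fit']:.2f}, "
      f"mean CI +/- {res['mean_half']:.2f}, individual +/- {res['pred_half']:.2f}")
```

Plotting `ys` against `xs` makes the same point at a glance: the cloud of points is wide even though the fitted line is estimated very precisely.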