Correlation does not imply causation. Then what does it imply?

‘Correlation does not imply causation’ is an adage students in all the social sciences are made to recite from a very early age. What is less often systematically discussed is what could actually be going on when two phenomena are correlated but not causally related. Let’s try to make a list:

1) The correlation might be due to chance. T-tests and p-values are generally used to guard against this possibility.

1a) The correlation might be due to coincidence. This is essentially a variant of the previous point but with focus on time series. It is especially easy to mistake pure noise (randomness) for patterns (relationships) when one looks at two variables over time. If you look at the numerous ‘correlation is not causation’ jokes and cartoons on the internet, you will note that most concern the spurious correlation between two variables over time (e.g. number of pirates and global warming): it is just easier to find such examples in time series than in cross-sectional data.

1b) Another reason to distrust correlations is the so-called ‘ecological inference’ problem. The problem arises when data is available at several levels of observation (e.g. people nested in municipalities nested in states). A correlation between two variables aggregated at a higher level (e.g. states) cannot be used to infer a correlation between these variables at the lower level (e.g. people). In such cases the higher-level correlation can be a statistical artifact, although not necessarily one due to mistaking ‘noise’ for ‘signal’.

2) The correlation might be due to a third variable being causally related to the two correlated variables we observe. This is the well-known omitted variable problem. Note that statistical significance tests have nothing to contribute to the solution of this potential problem: statistical significance of the correlation (or of the regression coefficient, etc.) is not sufficient to guarantee causality. Another point that gets overlooked is that it is actually pretty uncommon for a ‘third’ (omitted) variable to be so highly correlated with both variables of interest as to induce a strong correlation between them that disappears entirely once we account for the omitted variable. Are there any prominent examples from the history of social science where a purported causal relationship was later discovered to be completely spurious due to an omitted variable (not counting time-series studies)?

3) Even if a correlation is statistically significant and not spurious in the sense of 2), there is still nothing in the correlation that establishes the direction of causality. Additional information is needed to ascertain in which way the causal relationship flows. Lagging variables and process-tracing case studies can be helpful.

All in all, that’s it: a correlation does not imply causation, but unless the correlation is due to noise, a statistical artifact, or a confounder (omitted variable), correlation is pretty suggestive of causation. Of course, causation here means that a variable is a contributing factor to variation in the outcome, rather than that the variable can account for all the changes in the outcome. See my posts on the difference here and here.

Am I missing something?

Is unit homogeneity a sufficient assumption for causal inference?

Is unit homogeneity a sufficient condition (assumption) for causal inference from observational data?

Re-reading King, Keohane and Verba’s bible on research design [lovingly known to all exposed as KKV] I think they regard unit homogeneity and conditional independence as alternative assumptions for causal inference. For example: “we provide an overview here of what is required in terms of the two possible assumptions that enable us to get around the fundamental problem [of causal inference]” (p.91, emphasis mine). However, I don’t see how unit homogeneity on its own can rule out endogeneity (establish the direction of causality). In my understanding, endogeneity is automatically ruled out with conditional independence, but not with unit homogeneity (“Two units are homogeneous when the expected values of the dependent variables from each unit are the same when our explanatory variables takes on a particular value” [p.91]).

Going back to Holland’s seminal article which provides the basis of KKV’s approach, we can confirm that unit homogeneity is listed as a sufficient condition for inference (p.948). But Holland divides variables into pre-exposure and post-exposure before he even gets to discuss any of the additional assumptions, so reverse causality is ruled out altogether. Hence, in Holland’s context unit homogeneity can indeed be regarded as sufficient, but in my opinion in KKV’s context unit homogeneity needs to be coupled with some condition (temporal precedence for example) to ascertain the causal direction when making inferences from data.

The point is minor but can create confusion when presenting unit homogeneity and conditional independence side by side as alternative assumptions for inference.

Inspiring scientific concepts

EDGE asks 159 selected intellectuals: ‘What scientific concept would improve everybody’s cognitive toolkit?’

You are welcome to read the individual contributions, which range from a paragraph to a short essay, here. Many of the entries are truly inspiring, but I see little synergy in bringing 159 of them together. As in a group photo of beauty pageant contenders, the total appeal of the group is less than the sum of the individual attractiveness of its subjects.

But to my point: it is remarkable that so many of the answers (by my count, in excess of 30) deal, more or less directly, with causal inference. What is even more remarkable is that most of the concepts and ideas about causal inference mentioned by the world’s intellectual jet-set (no offense to those left out) are anything but new. Many of the ideas can be traced back to Popper’s The Logic of Scientific Discovery (1934) and Ronald Fisher’s The Design of Experiments (1935). So what is most remarkable of all is how long it takes for these ideas to sink in and diffuse through society.

Several posts focus on the Popperian requirement for falsifiability (Howard Gardner, Tania Lombrozo) and skeptical empiricism more generally (Gerald Holton). The scientific method is further invoked by Richard Dawkins on the double-blind control experiment (see also Roger Schank), Brian Knutson on replicability, and Kevin Kelly on the virtues of negative results. Mark Henderson advocates the use of the scientific method outside science (e.g. in policy) – a plea that strikes a chord with this blog.

A significant sample of contributions relate to probability (Seth Lloyd, John Allen Paulos, Charles Seife), and the difficulties humans have in understanding risk, uncertainty and probabilities (Antony Garrett, Gerd Gigerenzer, Lawrence M. Krauss, Carlo Rovelli, Keith Devlin, Mahzarin Banaji, David Pizarro). W. Daniel Hillis and Keith Devlin mention possibility spaces and base rates respectively as concepts that might help.

Several authors warn of the dangers of anecdotal data (Susan Fiske, Robert Sapolsky), and Christine Finn insists that the absence of evidence is not evidence of absence. Susan Blackmore reminds us that correlation is not a cause, and Diane Halpern critiques the cult of statistical significance. Beatrice Golomb discusses misinterpretations of the placebo effect.

You do want to check out some innovative approaches to causality – causation as an information flow (David Dalrymple), nexus causality (John Tooby) and Rebecca Newberger Goldstein’s ‘best explanation’ – which go beyond the “monocausalitis” disease identified by Ernst Poppel (related argument by Nigel Goldenfeld).

Some highlights from the remaining posts:

– Richard Thaler compares the economic concept of utility to aether.

– Eric R. Weinstein on kayfabe (!) – the fabricated competition in professional wrestling and… the study of economics

– Fiery Cushman on confabulation (“Guessing at plausible explanations for our behavior, and then regarding those guesses as introspective certainties”)

– Joshua D. Greene on supervenience (“The Set A properties supervene on the Set B properties if and only if no two things can differ in their A properties without also differing in their B properties”)

– Stephen M. Kosslyn on constraint satisfaction as a decision mechanism

And Andrian Kreye mentions free jazz: