{"id":526,"date":"2012-10-09T11:31:56","date_gmt":"2012-10-09T11:31:56","guid":{"rendered":"http:\/\/rulesofreason.wordpress.com\/?p=526"},"modified":"2012-10-09T11:31:56","modified_gmt":"2012-10-09T11:31:56","slug":"correlation-does-not-imply-causation-then-what-does-it-imply","status":"publish","type":"post","link":"http:\/\/re-design.dimiter.eu\/?p=526","title":{"rendered":"Correlation does not imply causation. Then what does it imply?"},"content":{"rendered":"<p><strong>&#8216;Correlation does not imply causation&#8217;<\/strong> is an adage students\u00a0from all social sciences are made to recite from a very\u00a0early age. What is less often systematically discussed is what\u00a0could be actually going on so that two\u00a0phenomena are correlated but not\u00a0causally related. Let&#8217;s try to make a list:<\/p>\n<p>1) The correlation might be due to <strong>chance<\/strong>. <a href=\"http:\/\/en.wikipedia.org\/wiki\/Student's_t-test\" target=\"_blank\">T-tests <\/a>and <a href=\"http:\/\/en.wikipedia.org\/wiki\/P-value\" target=\"_blank\">p-values<\/a> are generally used to guard against this possibility.<\/p>\n<p>1a) The correlation might be due to <strong>coincidence<\/strong>. This is essentially a variant of the previous point but with focus on time series. It is especially easy to mistake pure noise (randomness) for patterns (relationships) when one looks at two variables\u00a0over time. If you look at the numerous\u00a0&#8216;correlation is not causation&#8217; jokes and <a href=\"http:\/\/www.google.co.uk\/search?hl=en&amp;q=%27correlation+is+not+causation%27&amp;um=1&amp;ie=UTF-8&amp;tbm=isch&amp;source=og&amp;sa=N&amp;tab=wi&amp;ei=5wh0UMadEOq_0QWihoGQAw&amp;biw=1680&amp;bih=867&amp;sei=Bwl0UOe4IIqp0QWn9IDIAg\" target=\"_blank\">cartoons<\/a>\u00a0on the internet, you will note that most concern the spurious correlation\u00a0between two variables over time (e.g. 
number of pirates and global warming): it is just easier to find such examples in\u00a0time series than in cross-sectional data.<\/p>\n<p>1b) Another reason\u00a0to distrust\u00a0correlations\u00a0is the so-called <strong>&#8216;ecological inference&#8217;<\/strong> problem. The <a href=\"http:\/\/en.wikipedia.org\/wiki\/Ecological_fallacy\" target=\"_blank\">problem<\/a> arises when data is available at several <a href=\"http:\/\/re-design.dimiter.eu\/2012\/01\/27\/unit-of-analysis-vs-unit-of-observation\/\" target=\"_blank\">levels of observation<\/a> (e.g. people nested in\u00a0municipalities nested in states). Correlation of two variables aggregated at a higher level (e.g. states) cannot be used to imply correlation of these variables at the lower level (e.g. people). Hence, read as evidence about the lower level, the higher-level correlation can be a statistical artifact, although not necessarily due to mistaking &#8216;noise&#8217; for &#8216;signal&#8217;.<\/p>\n<p>2) The correlation might be due to a third variable being causally related to the two correlated variables we observe. This is the well-known <strong>omitted variable<\/strong> problem. Note that statistical significance tests have nothing to contribute to the solution of this potential problem. Statistical significance of the correlation (or of the regression coefficient, etc.) is not sufficient to guarantee causality. Another point that gets overlooked is that it is actually pretty uncommon for a &#8216;third&#8217; (omitted)\u00a0variable to be so highly correlated with both variables of interest as to induce a high correlation between them which would disappear entirely once we account for the omitted variable. 
Are there any prominent examples from the history of social science where a purported causal relationship was later discovered to be completely spurious due to an omitted variable (not counting time series studies)?<\/p>\n<p>3) Even if a correlation is statistically significant and not spurious in the sense of 2), there is still nothing in the correlation that establishes the <strong>direction of causality<\/strong>. Additional information is needed to ascertain which way the causal relationship flows. Lagging variables and process-tracing case studies can be helpful.<\/p>\n<p>All in all, that&#8217;s it: a correlation does not imply causation, but unless the correlation is due to noise, a statistical artifact, or a confounder (omitted variable),\u00a0<strong>correlation is pretty suggestive of causation<\/strong>. Of course, causation here means that a variable is\u00a0a contributing factor to variation in the outcome, rather than that the variable can account for all the changes in the outcome. See my posts on the difference <a href=\"http:\/\/re-design.dimiter.eu\/2012\/02\/17\/explanation-and-the-quest-for-significant-relationships-part-i\/\" target=\"_blank\">here<\/a>\u00a0and <a href=\"http:\/\/re-design.dimiter.eu\/2012\/02\/22\/explanation-and-the-quest-for-significant-relationships-part-ii\/\" target=\"_blank\">here<\/a>.<\/p>\n<p>Am I missing something?<\/p>\n","protected":false},"excerpt":{"rendered":"<p>&#8216;Correlation does not imply causation&#8217; is an adage students\u00a0from all social sciences are made to recite from a very\u00a0early age. What is less often systematically discussed is what\u00a0could be actually going on so that two\u00a0phenomena are correlated but not\u00a0causally related. Let&#8217;s try to make a list: 1) The correlation might be due to chance. T-tests and p-values are generally used to guard against this possibility. 1a) The correlation might be due to coincidence. 
This is essentially a variant of the previous point but with focus on time series. It is especially easy to mistake pure noise (randomness) for patterns (relationships) when one looks at two variables\u00a0over time. If you look at the numerous\u00a0&#8216;correlation is not causation&#8217; jokes and cartoons\u00a0on the internet, you will note that most concern the spurious correlation\u00a0between two variables over time (e.g. number of pirates and global warming): it is just easier to find such examples in\u00a0time series than in cross-sectional data. 1b) Another reason\u00a0to distrust\u00a0correlations\u00a0is the so-called &#8216;ecological inference&#8217; problem. The problem arises when data is available at several levels of observation (e.g. people nested in\u00a0municipalities nested in states). Correlation of two variables aggregated at a higher level (e.g. states) cannot be used to imply correlation of these variables at the lower level (e.g. people). Hence, read as evidence about the lower level, the higher-level correlation can be a statistical artifact, although not necessarily due to mistaking &#8216;noise&#8217; for &#8216;signal&#8217;. 2) The correlation might be due to a third variable being causally related to the two correlated variables we observe. This is the well-known omitted&#8230;<\/p>\n<div class=\"more-link-wrapper\"><a class=\"more-link\" href=\"http:\/\/re-design.dimiter.eu\/?p=526\">Continue reading<span class=\"screen-reader-text\">Correlation does not imply causation. 
Then what does it imply?<\/span><\/a><\/div>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"spay_email":"","jetpack_publicize_message":"","jetpack_is_tweetstorm":false},"categories":[8,33],"tags":[120,122,155,449],"jetpack_featured_media_url":"","jetpack_publicize_connections":[],"jetpack_sharing_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/p7g3hj-8u","jetpack-related-posts":[{"id":230,"url":"http:\/\/re-design.dimiter.eu\/?p=230","url_meta":{"origin":526,"position":0},"title":"Hyperlinks","date":"January 18, 2012","format":false,"excerpt":"Migration and unemployment File under 'correlation is not causation'. And 'endogeneity'. And 'instrumental variables that do not make sense'. Equitable decision making has intrinsic value\u00a0Apparently, there is a region in the brain [anterior insula] 'linked to the experience of subjective disutility'. Ah,\u00a0the prospects for utility maximization! Fukuyama on European identities\u00a0Surfing on\u2026","rel":"","context":"In &quot;Hyperlinks&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":227,"url":"http:\/\/re-design.dimiter.eu\/?p=227","url_meta":{"origin":526,"position":1},"title":"Is this a common (ecological) fallacy?","date":"December 20, 2011","format":false,"excerpt":"You have data on two levels\u00a0(individuals and countries) for\u00a0an outcome variable (e.g. 'trust') and a predictor (e.g. 'wealth'). Suppose that the\u00a0pooled and within-country individual-level correlations between the two variables\u00a0are strongly positive\u00a0but the between-country (country-level) correlation is zero. 
You build a regression with individual-level 'trust' as the dependent variable and individual-level\u2026","rel":"","context":"In &quot;Multi-level models&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":220,"url":"http:\/\/re-design.dimiter.eu\/?p=220","url_meta":{"origin":526,"position":2},"title":"Slavery, ethnic diversity and economic development","date":"December 14, 2011","format":false,"excerpt":"What is the impact of the slave trades on economic progress in Africa? Are the modern African states which 'exported' a higher number of slaves more likely to be underdeveloped several centuries afterwards? Harvard economist Nathan Nunn addresses these questions in his chapter for the \"Natural experiments of history\" collection.\u2026","rel":"","context":"In &quot;Development&quot;","img":{"alt_text":"","src":"https:\/\/i0.wp.com\/re-design.dimiter.eu\/wp-content\/uploads\/2011\/12\/slave-trades.jpg?resize=350%2C200","width":350,"height":200},"classes":[]},{"id":969,"url":"http:\/\/re-design.dimiter.eu\/?p=969","url_meta":{"origin":526,"position":3},"title":"The 'Global South' is a terrible term. Don't use it!","date":"November 6, 2018","format":false,"excerpt":"The Rise of the 'Global South' The 'Global South' and 'Global North' are increasingly popular terms used to categorize the countries of the world.\u00a0According to Wikipedia, the term 'Global South' originated in postcolonial studies, and was first used in 1969. 
The Google N-gram chart below shows the rise of the\u2026","rel":"","context":"In &quot;Classification&quot;","img":{"alt_text":"","src":"https:\/\/i2.wp.com\/re-design.dimiter.eu\/wp-content\/uploads\/2018\/11\/f2_hdi_eq.png?resize=350%2C200","width":350,"height":200},"classes":[]},{"id":57,"url":"http:\/\/re-design.dimiter.eu\/?p=57","url_meta":{"origin":526,"position":4},"title":"Inspiring scientific concepts","date":"October 16, 2011","format":false,"excerpt":"EDGE asks 159 selected intellectuals What scientific concept would improve everybody's cognitive toolkit? You are welcome to read the individual contributions which range from a paragraph to a short essay here. Many of the entries are truly inspiring but I see little synergy of bringing 159 of them together. Like\u2026","rel":"","context":"In &quot;Causality&quot;","img":{"alt_text":"","src":"https:\/\/i0.wp.com\/img.youtube.com\/vi\/2kotK9FNEYU\/0.jpg?resize=350%2C200","width":350,"height":200},"classes":[]},{"id":884,"url":"http:\/\/re-design.dimiter.eu\/?p=884","url_meta":{"origin":526,"position":5},"title":"Is interpretation descriptive or explanatory?","date":"February 2, 2017","format":false,"excerpt":"One defining feature of interpretivist approaches to social science is the idea that the goal of analysis\u00a0is to provide interpretations\u00a0of social reality rather than law-based explanations. 
But of course nobody these days believes in law-based causality in the social world anyways, so the question whether interpretation is to be understood\u2026","rel":"","context":"In &quot;Anthropology&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]}],"_links":{"self":[{"href":"http:\/\/re-design.dimiter.eu\/index.php?rest_route=\/wp\/v2\/posts\/526"}],"collection":[{"href":"http:\/\/re-design.dimiter.eu\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/re-design.dimiter.eu\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/re-design.dimiter.eu\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/re-design.dimiter.eu\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=526"}],"version-history":[{"count":0,"href":"http:\/\/re-design.dimiter.eu\/index.php?rest_route=\/wp\/v2\/posts\/526\/revisions"}],"wp:attachment":[{"href":"http:\/\/re-design.dimiter.eu\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=526"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/re-design.dimiter.eu\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=526"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/re-design.dimiter.eu\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=526"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}