
Category: Causality

Explanation and the quest for ‘significant’ relationships. Part II

In Part I I argued that the search for and discovery of statistically significant relationships does not amount to explanation and is often misplaced in the social sciences because the variables which are purported to have effects on the outcome cannot be manipulated. Just to make sure that my message is not misinterpreted – I am not arguing for a fixation on maximizing R-squared and other measures of model fit in statistical work instead of the current focus on the size and significance of individual coefficients. R-squared has been rightly criticized as a standard of how good a model is (see for example here). But I am not aware of any other measure or standard that can convincingly compare the explanatory potential of different models in different contexts. Predictive success might be one way to go, but prediction is altogether different from explanation. I don’t expect much to change in the future with regard to the problem I outlined. In practice, all one could hope for is some clarity on the part of researchers about whether their objective is to explain (account for) the outcome or to find significant effects. The standards for evaluating progress towards the former objective (model fit, predictive success, ‘coverage’ in the QCA sense) should be different from the standards for the latter (statistical & practical significance and the practical possibility to manipulate the exogenous variables). Take the so-called garbage-can regressions, for example. These are models with tens of variables, all of which are interpreted causally if they reach the magic…
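
To make the garbage-can point concrete, here is a minimal simulation (my own sketch, not from the original post): regress a pure-noise outcome on thirty pure-noise variables and watch a coefficient or two clear the conventional significance threshold anyway.

```python
# Sketch of the garbage-can problem, assuming numpy and statsmodels are
# available: none of the regressors has any real effect, yet some will
# look 'significant' at the 5% level by chance alone.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(42)
n_obs, n_vars = 200, 30

X = rng.normal(size=(n_obs, n_vars))  # thirty noise 'determinants'
y = rng.normal(size=n_obs)            # outcome unrelated to all of them

fit = sm.OLS(y, sm.add_constant(X)).fit()
false_hits = int((fit.pvalues[1:] < 0.05).sum())  # skip the intercept
print(f"'Significant' coefficients out of {n_vars}: {false_hits}")
# With a 5% false-positive rate we expect one or two spurious hits per
# run - exactly the coefficients that a garbage-can regression would
# interpret causally once they reach the magic threshold.
```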

Explanation and the quest for ‘significant’ relationships. Part I

The ultimate goal of social science is causal explanation. The actual goal of most academic research is to discover significant relationships between variables. The two goals are supposed to be strongly related – by discovering (the) significant effects of exogenous (independent) variables, one accounts for the outcome of interest. In fact, the working assumption of the empiricist paradigm of social science research is that the two goals are essentially the same – explanation is the sum of the significant effects that we have discovered. Just look at what all the academic articles with ‘explanation’, ‘determinants’, and ‘causes’ in their titles do – they report significant effects, or associations, between variables. The problem is that explanation and collecting significant associations are not the same. Of course they are not. The point is obvious to all uninitiated into the quantitative empiricist tradition of doing research, but seems to be lost on many of its practitioners. We could have discovered a significant determinant of X and still be miles (or even light-years) away from a convincing explanation of why and when X occurs. This is not because of the difficulties of causal identification – we could have satisfied all conditions for causal inference from observational data, and the problem would still remain. Nor would it go away after we pay attention (as we should) to the fact that statistical significance is not the same as practical significance. Even the discovery of convincingly identified causal effects, large enough to be of practical rather than only statistical significance, does not amount to explanation. A successful explanation needs to account for…
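
The gap between statistical and practical significance is easy to demonstrate with a short simulation (again a sketch of mine, not part of the original post): with a million observations, even a substantively trivial effect is estimated with a vanishingly small p-value.

```python
# Sketch, assuming numpy and statsmodels: a true effect of 0.01 standard
# deviations is 'highly significant' at n = 1,000,000 while explaining
# essentially none of the variation in the outcome.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 1_000_000
x = rng.normal(size=n)
y = 0.01 * x + rng.normal(size=n)  # tiny true effect

fit = sm.OLS(y, sm.add_constant(x)).fit()
print(f"beta      = {fit.params[1]:.4f}")
print(f"p-value   = {fit.pvalues[1]:.1e}")
print(f"R-squared = {fit.rsquared:.5f}")
# The coefficient is a certified 'significant determinant' of y, yet
# R-squared is about 0.0001: we are still light-years from explaining y.
```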

Google tries to find the funniest videos

Following my recent post on the project which tries to explain why some video clips go viral, here is a report on Google’s efforts to find the funniest videos: You’d think the reasons for something being funny were beyond the reach of science – but Google’s brain-box researchers have managed to come up with a formula for working out which YouTube video clips are the funniest. The Google researcher behind the project is quoted saying: ‘If a user uses an “loooooool” vs an “loool”, does it mean they were more amused? We designed features to quantify the degree of emphasis on words associated with amusement in viewer comments.’ Other factors taken into account are tags, descriptions, and ‘whether audible laughter can be heard in the background’. Ultimately, the algorithm gives a ranking of the funniest videos (with No No No No Cat on top, since you asked). Now I usually have high respect for all things Google, but this ‘research’ at first appeared to be a total piece of junk. Of course, it turned out that this is just the way it is reported by the Daily Mail (cited above), New Scientist, and countless other more or less reputable outlets. Google’s new algorithm does not provide a normative ranking of the funniest videos ever based on some objective criteria; it is a predictive score of a video’s comedic potential. Google trained the algorithm on a bunch of videos (it’s unclear from the original source what the external ‘fun’ measure used for the…
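
For what it’s worth, the ‘degree of emphasis’ feature is easy to imagine. Here is a toy version (my guess at the idea, in no way Google’s actual code) that scores a comment by how much its laughter tokens are elongated.

```python
# Toy emphasis feature: count the extra letters in elongated 'lol's,
# so 'loooooool' scores higher than 'loool', which scores higher than
# a plain 'lol'. Purely illustrative - Google's real features are unknown.
import re

def emphasis_score(comment: str) -> int:
    """Sum the extra characters across elongated 'lol' tokens."""
    tokens = re.findall(r"\blo+l\b", comment.lower())
    return sum(len(t) - len("lol") for t in tokens)

for c in ["lol nice", "loool", "loooooool", "not funny at all"]:
    print(f"{c!r}: {emphasis_score(c)}")
# Output: 0, 2, 6, 0 - crude, but it orders comments by amusement emphasis.
```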

Is unit homogeneity a sufficient assumption for causal inference?

Is unit homogeneity a sufficient condition (assumption) for causal inference from observational data? Re-reading King, Keohane and Verba’s bible on research design [lovingly known to all exposed as KKV], I think they regard unit homogeneity and conditional independence as alternative assumptions for causal inference. For example: “we provide an overview here of what is required in terms of the two possible assumptions that enable us to get around the fundamental problem [of causal inference]” (p.91, emphasis mine). However, I don’t see how unit homogeneity on its own can rule out endogeneity (establish the direction of causality). In my understanding, endogeneity is automatically ruled out under conditional independence, but not under unit homogeneity (“Two units are homogeneous when the expected values of the dependent variables from each unit are the same when our explanatory variable takes on a particular value” [p.91]). Going back to Holland’s seminal article, which provides the basis of KKV’s approach, we can confirm that unit homogeneity is listed as a sufficient condition for inference (p.948). But Holland divides variables into pre-exposure and post-exposure before he even gets to discuss any of the additional assumptions, so reverse causality is ruled out altogether. Hence, in Holland’s context unit homogeneity can indeed be regarded as sufficient, but in KKV’s context, in my opinion, unit homogeneity needs to be coupled with some condition (temporal precedence, for example) to ascertain the causal direction when making inferences from data. The point is minor but can create confusion when unit homogeneity and conditional independence are presented side by side as alternative assumptions for inference.
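
To see why the definition is silent on causal direction, it helps to write it out (my formalization of the KKV quote above, following Holland 1986):

```latex
% Unit homogeneity: for any two units i and j and any value x of the
% explanatory variable,
\[
  \mathbb{E}\,[\,Y_i \mid X_i = x\,] \;=\; \mathbb{E}\,[\,Y_j \mid X_j = x\,].
\]
% The condition constrains conditional expectations only; it is perfectly
% compatible with Y driving X rather than the reverse, which is why some
% further condition (such as temporal precedence) is needed to fix the
% causal direction.
```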

Inspiring scientific concepts

EDGE asks 159 selected intellectuals: what scientific concept would improve everybody’s cognitive toolkit? You are welcome to read the individual contributions, which range from a paragraph to a short essay, here. Many of the entries are truly inspiring, but I see little synergy in bringing 159 of them together. As in a group photo of beauty pageant contenders, the total appeal of the group is less than the sum of the individual attractiveness of its subjects. But to my point: it is remarkable that so many of the answers (by my count, in excess of 30) deal, more or less directly, with causal inference. What is even more remarkable is that most of the concepts and ideas about causal inference mentioned by the world’s intellectual jet-set (no offense to those left out) are anything but new. Many of the ideas can be traced back to Popper’s The Logic of Scientific Discovery (1934) and Ronald Fisher’s The Design of Experiments (1935). So what is most remarkable of all is how long it takes for these ideas to sink in and diffuse in society. Several posts focus on the Popperian requirement for falsifiability (Howard Gardner, Tania Lombrozo) and skeptical empiricism more generally (Gerald Holton). The scientific method is further evoked by Richard Dawkins on the double-blind control experiment (see also Roger Schank), Brian Knutson on replicability, and Kevin Kelly on the virtues of negative results. Mark Henderson advocates the use of the scientific method outside science (e.g. policy) – a plea that strikes a chord with this…