The problem with scope conditions

tl;dr: Posing arbitrary scope conditions to causal arguments leads to the same problem as subgroup analysis: the ‘results’ are too often just random noise.

Ingo Rohlfing has a very nice post on the importance of specifying what you mean by ‘context’ when you say that a causal relationship depends on the context. In sum, the argument is that ‘context’ can mean two rather different things: (1) scope conditions, so that the causal relationship might (or might not) work differently in a different context, or (2) moderating variables, so that the causal relationship should work differently in a different context, defined by different values of the moderating variables. So we better be explicit which of these two interpretations we endorse when we write that a causal relationship is context-dependent.

This is an important point. But the argument also exposes the structural similarity between scope conditions and moderating variables. Once we recognize this similarity, it is a small step to discover an even bigger issue lurking in the background: posing arbitrary scope conditions leads to the same problem as arbitrary subgroup analysis; namely, we mistake random noise for real relationships in the data.

The problem with subgroup analysis is well-known: We start with a population in which we find no association between two variables. And then we try different subgroups of the original population until we find one where the association between the two variables is ‘significant’. Even when a ‘real’ relationship between the variables does not exist at all, when we try enough subgroups, sooner or later we will get ‘lucky’ and discover a subgroup for which the relationship will look too strong to be due to chance. But it will be just that. (If you are still not persuaded, see the classic XKCD post below that makes the problem rather obvious.)

How are scope conditions similar? Well, we start with a subgroup of a population for which we find evidence for a strong, systematic relationship between some variables. Next, we try to extend the research to the broader population or to different subgroups, where we find no relationship. Then we conclude that the original relationship is context-dependent and suggest some scope conditions that define the context. But, essentially, we have committed the same mistake as the researcher trying out different subgroups before he or she gets ‘lucky’: it’s only that we have been ‘lucky’ on the first try!

When we find that a relationship holds in group A, but not in group B, a common response is to say that the relationship depends on some background scope conditions that are present in A but not in B. But, it is probably more likely that the original result for group A has been a fluke in the first place. After all, a theory that there is no relationship is more parsimonious than a theory that there is a relationship that is context-dependent (at least when we start from assumptions that not everything is connected to everything else by default).

Of course, in some cases, there will be good reasons to conclude that there are scope conditions to a previously-established association or causal relationship. Similarly, in some cases there are certain subgroups in which a relationship holds, while not in others or in the general population. The point is that failing to find a relationship in a new context should make us more sceptical whether the original finding itself was not just a result of chance. Hence, before, or in parallel to, searching for scope conditions, we should go back to the original study and try to ascertain whether the original finding still holds by collecting additional evidence or interpreting the existing evidence with a more sceptical prior.

The search for scope conditions should also be theory-driven, the same way the selection of subgroups should be driven by theoretical considerations. A scope condition would be more likely to be real, if it has been anticipated by theory and explicitly hypothesized as such before seeing the new data. Otherwise, it is too easy to capitalize on chance and elevate any random difference between groups (countries, time periods, etc.) as a scope condition of a descriptive or causal relationship.

While the problem with subgroup analysis is discussed mostly in statistical research, the problem with scope conditions is even more relevant for qualitative, small-N research than for large-N studies. This is because small-N research often proceeds from a single case study, where some relationships are found, to new cases, where often these relationships are not found, with the conclusion typically being that the originally-discovered relationships are real but context-dependent. That could be the case, but it could be also be that there are no systematic relationships in any of these cases at all.

I feel that if qualitative researchers disagree with my diagnosis of the problem with scope conditions, it will be because they often start from very different ontological assumptions about how the social world works. As mentioned above, my analysis holds only if we assume that the multitude of variables characterizing our world are not systematically related, unless we find evidence that they are. But many qualitative researchers seem to assume that everything is connected to everything else, unless we find evidence that it is not. Starting from such a strongly deterministic worldview, posing scope conditions when we fail to extend a result makes more sense. But then so would any subgroup analysis that finds a ‘significant’ relationship, and we seem to agree that this is wrong, at least in the context of statistical work.

To conclude, unless you commit to a strongly deterministic ontology where everything is connected to everything else by default, be careful when posing scope conditions to rationalize a failure to find a previously-established relationship in a different context. Instead, question whether the original result itself still holds. Only then search for more complex explanations that bring in scope conditions or moderating variables.