# What are the effects of COVID-19 on mortality? Individual-level causes of death and population-level estimates of causal impact

Introduction

How many people have died from COVID-19? What is the impact of COVID-19 on mortality in a population? Can we use excess mortality to estimate the effects of COVID-19?

In this text I will explain why the answer to the first two questions need not be the same. That is, the sum of cases where COVID-19 has been determined to be the direct[1] cause of death need not be the same as the population-level estimate of the causal impact of COVID-19. When measurement of the individual-level causes of death is imperfect, using excess mortality (observed minus expected) to measure the impact of COVID-19 leads to an underestimate of the number of individual cases where COVID-19 has been the direct cause of death.

Assumptions

The major assumption on which the argument rests is that some of the people who have died from COVID-19 would have died from other causes, within a specified relatively short time-frame (say, within the month). It seems very reasonable to assume that at least some of the victims of COVID-19 would have succumbed to other causes of death. This is especially easy to imagine given that COVID-19 disproportionately kills the very old, and that the ultimate causes of death that it provokes – respiratory problems, lung failure, etc. – are shared with other common diseases with high mortality among the older population, such as the flu.

Defining individual and population-level causal effects

With this crucial assumption in mind, we can construct the following simple table. Cell A contains the people who would have survived if they had not caught the Coronavirus, but they caught it and died. Cell B contains the people who caught the Coronavirus and died, but would have died from other causes even if they had not caught the virus[2]. Cell C contains the people who caught the virus and survived, and would have survived even if they had not caught it. Cell D contains the people who would have died if they had not caught the virus, but they did and survived. Cell C is of no interest for the current argument, and for now we can assume that cases in Cell D are implausible (although this might change if we consider indirect effects of the pandemic and the policy measures it provoked; for now, we ignore such indirect effects). Cell E contains the people who did not catch the virus and survived (also not interesting for the argument). Cell F contains the people who did not catch the virus and died from other causes. As a matter of definition, total mortality within a period is A + B + F.

The number of individual-level deaths directly caused by COVID-19 that can be observed is the sum of cells A + B. Without further assumptions and specialized knowledge, we cannot estimate the share of this total who would have died anyway (cell B). For now, just assume that this share is positive; that is, such cases exist. The population-level causal impact of COVID-19 is A, or, in words, those who have died from COVID-19 minus those who would have died from other causes within the same period. The population-level causal effect is defined counterfactually. Again, without further assumptions about the ratio of B to A, the population-level causal impact of COVID-19 is not identifiable. An important conclusion is that the population-level causal impact of COVID-19 on mortality need not equal the sum of the individual cases where COVID-19 was the cause of death.

Scenario I: perfect measures of individual-level causes of death

Assume for the moment that all individual cases where COVID-19 was the cause of death are observed and recorded. Under this assumption, what does excess mortality measure? Excess mortality is defined as the difference between the observed (O) and predicted (P) number of deaths within a period, with the prediction (expectation) coming from historical averages, statistical models or anything else[3]. Under our definitions, the observed mortality O in a period contains groups A + B + F. So the difference between observed O and predicted P gives A, or the number of people who have died from COVID-19 but would have survived otherwise. Therefore, excess mortality identifies the population-level causal impact of COVID-19 (see also the figure below).

One implication of this line of reasoning is that under perfect measurement of the individual-level causes of death, and with a positive number of people who would have died from other causes if they had not died from COVID-19 (cell B), the sum of the cases where COVID-19 was recorded as a cause of death should exceed the excess in observed mortality O – P. (See the situation in France where this might be happening.)
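The accounting behind this scenario can be sketched with a few lines of code. The numbers below are invented purely for illustration, and the sketch assumes that the prediction P equals exactly the deaths that would have occurred without the virus (B + F):

```python
# Hypothetical cell counts (illustration only, not real data).
A = 1000  # died from COVID-19, would have survived otherwise
B = 300   # died from COVID-19, but would have died from other causes anyway
F = 5000  # did not catch the virus, died from other causes

O = A + B + F  # observed deaths in the period: O = A + B + F
P = B + F      # expected deaths absent COVID-19 (B is part of the baseline)

excess = O - P                 # excess mortality
recorded_covid_deaths = A + B  # perfect individual-level measurement

print(excess)                  # 1000 -> the population-level causal impact (A)
print(recorded_covid_deaths)   # 1300 -> exceeds excess mortality by exactly B
```

The gap between the two printed numbers is cell B, which is why recorded COVID-19 deaths should exceed excess mortality under perfect measurement.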

Scenario II: imperfect measures of individual-level causes of death

Let’s consider now a more realistic scenario in which determining and recording the individual causes of death is imperfect. Under this assumption, the observed number of deaths in a period still contains O = A + B + F. Excess mortality O – P still identifies the population-level effect A. However, this is not the number of deaths directly caused by COVID-19, which also includes those who would have died anyway (B): a category that is already included in the prediction about mortality during this period [4].

In other words, excess mortality underestimates the sum of individual cases where COVID-19 is the direct cause of death. The size of the underestimate depends on the share of people who died from COVID-19 but would have died from other causes anyway: the larger this share, the larger the underestimate. To put it bluntly, COVID-19 kills more people than excess mortality suggests. This is because the expected number of deaths, on which the calculation of excess mortality depends, includes a share of people who would have died from other causes but were killed by the virus.
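The claim that the underestimate grows with the share of cell B can be made explicit in a small sketch. The function name and the shares are hypothetical, chosen only to illustrate the arithmetic:

```python
def underestimate(total_covid_deaths, share_would_have_died_anyway):
    """How far excess mortality falls short of total COVID-19 deaths.

    Excess mortality identifies only A; the shortfall equals B.
    """
    B = total_covid_deaths * share_would_have_died_anyway
    A = total_covid_deaths - B
    excess = A  # excess mortality identifies only the counterfactual deaths
    return total_covid_deaths - excess  # equals B

# The larger the share of people who would have died anyway,
# the larger the gap between COVID-19 deaths and excess mortality.
for share in (0.1, 0.3, 0.5):
    print(share, underestimate(1000, share))
```

The loop prints a shortfall of 100, 300 and 500 deaths respectively, one-for-one with cell B.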

Conclusions

These are the main conclusions from the analysis:

1. The sum of individual-level cases where COVID-19 was the direct cause of death is not the same as the population-level causal impact of the virus.
2. Excess mortality provides a valid estimate of the population-level causal impact.
3. When measurement of the individual causes of death is imperfect, excess mortality provides an underestimate of the sum of individual cases where COVID-19 was the cause of death.
4. With perfect measurement of the individual causes of death, the excess in mortality should be lower than the sum of the individual cases where COVID-19 was the cause of death.

Notes:

[1] I suspect some will object that the coronavirus and COVID-19 are never the direct causes of death but only provoke other diseases that ultimately kill people. This is irrelevant for the argument: I use ‘COVID-19 as a direct cause of death’ as a shortcut for a death that was caused by COVID-19 provoking some other condition that ultimately kills.

[2] Formally, for people in cell B, COVID-19 is a sufficient but not necessary condition for dying within a certain period. For people in cell A, COVID-19 is both necessary and sufficient. Because of the counterfactual definition of the population-level effect, it only tracks cases where the cause was both necessary and sufficient.

[3] In reality, the models used to predict and estimate the expected mortality are imperfect and incorporate considerable uncertainties. These uncertainties compound the estimation problems discussed in the text, but the problems will exist even if the expected mortality was predicted perfectly.

[4] Extending the analysis to include indirect effects of COVID-19 and the policy responses it led to is interesting and important but very challenging. There are multiple plausible mechanisms for indirect effects, some of which would act to decrease mortality (e.g. less pollution, fewer traffic accidents, fewer crime-related murders, etc.) and some of which would act to increase mortality (e.g. due to stress, not seeking medical attention on time, postponed medical operations, increases in domestic violence, self-medication gone wrong, etc.). The time horizon of the estimation becomes even more important as some of these mechanisms need more time to exert their effects (e.g. reduced pollution). Once we admit indirect effects, the calculation of the direct population-level effect of COVID-19 from excess mortality data becomes impossible without some assumptions about the share and net effect of the indirect mechanisms, and the estimation of the sum of individual-level effects becomes even more muddled.

# The problem with scope conditions

tl;dr: Posing arbitrary scope conditions to causal arguments leads to the same problem as subgroup analysis: the ‘results’ are too often just random noise.

Ingo Rohlfing has a very nice post on the importance of specifying what you mean by ‘context’ when you say that a causal relationship depends on the context. In sum, the argument is that ‘context’ can mean two rather different things: (1) scope conditions, so that the causal relationship might (or might not) work differently in a different context, or (2) moderating variables, so that the causal relationship should work differently in a different context, defined by different values of the moderating variables. So we had better be explicit about which of these two interpretations we endorse when we write that a causal relationship is context-dependent.

This is an important point. But the argument also exposes the structural similarity between scope conditions and moderating variables. Once we recognize this similarity, it is a small step to discover an even bigger issue lurking in the background: posing arbitrary scope conditions leads to the same problem as arbitrary subgroup analysis; namely, we mistake random noise for real relationships in the data.

The problem with subgroup analysis is well-known: We start with a population in which we find no association between two variables. And then we try different subgroups of the original population until we find one where the association between the two variables is ‘significant’. Even when a ‘real’ relationship between the variables does not exist at all, when we try enough subgroups, sooner or later we will get ‘lucky’ and discover a subgroup for which the relationship will look too strong to be due to chance. But it will be just that. (If you are still not persuaded, see the classic XKCD post below that makes the problem rather obvious.)
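The mechanics of this problem are easy to reproduce. The following sketch, with invented variable names and made-up numbers, generates two variables with no relationship whatsoever and then tests for an association in twenty arbitrary subgroups; with a 5% significance threshold, we expect roughly one spurious ‘finding’ by chance alone:

```python
import math
import random

random.seed(42)  # fixed seed so the simulation is reproducible

def p_value(group_x, group_y):
    """Two-sided p-value for a difference in means (normal approximation)."""
    nx, ny = len(group_x), len(group_y)
    mx, my = sum(group_x) / nx, sum(group_y) / ny
    vx = sum((v - mx) ** 2 for v in group_x) / (nx - 1)
    vy = sum((v - my) ** 2 for v in group_y) / (ny - 1)
    z = (mx - my) / math.sqrt(vx / nx + vy / ny)
    return math.erfc(abs(z) / math.sqrt(2))

# A binary 'exposure' and a pure-noise outcome: NO real relationship exists.
n = 2000
exposure = [random.random() < 0.5 for _ in range(n)]
outcome = [random.gauss(0, 1) for _ in range(n)]
# 20 arbitrary subgroups, e.g. defined by an irrelevant covariate.
subgroup = [random.randrange(20) for _ in range(n)]

significant = []
for g in range(20):
    treated = [outcome[i] for i in range(n) if subgroup[i] == g and exposure[i]]
    control = [outcome[i] for i in range(n) if subgroup[i] == g and not exposure[i]]
    if p_value(treated, control) < 0.05:
        significant.append(g)

print(significant)  # the subgroups where noise looks 'significant'
```

Whichever subgroups end up in the list, the ‘relationships’ found there are noise by construction, which is exactly the trap that post-hoc subgroup hunting (and, by analogy, post-hoc scope conditions) walks into.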

How are scope conditions similar? Well, we start with a subgroup of a population for which we find evidence for a strong, systematic relationship between some variables. Next, we try to extend the research to the broader population or to different subgroups, where we find no relationship. Then we conclude that the original relationship is context-dependent and suggest some scope conditions that define the context. But, essentially, we have committed the same mistake as the researcher trying out different subgroups before he or she gets ‘lucky’: it’s only that we have been ‘lucky’ on the first try!

When we find that a relationship holds in group A, but not in group B, a common response is to say that the relationship depends on some background scope conditions that are present in A but not in B. But it is probably more likely that the original result for group A was a fluke in the first place. After all, a theory that there is no relationship is more parsimonious than a theory that there is a relationship that is context-dependent (at least when we start from the assumption that not everything is connected to everything else by default).

Of course, in some cases there will be good reasons to conclude that there are scope conditions to a previously-established association or causal relationship. Similarly, in some cases there are certain subgroups in which a relationship holds, while it does not hold in others or in the general population. The point is that failing to find a relationship in a new context should make us more sceptical about whether the original finding itself was just a result of chance. Hence, before, or in parallel to, searching for scope conditions, we should go back to the original study and try to ascertain whether the original finding still holds, by collecting additional evidence or interpreting the existing evidence with a more sceptical prior.

The search for scope conditions should also be theory-driven, the same way the selection of subgroups should be driven by theoretical considerations. A scope condition would be more likely to be real, if it has been anticipated by theory and explicitly hypothesized as such before seeing the new data. Otherwise, it is too easy to capitalize on chance and elevate any random difference between groups (countries, time periods, etc.) as a scope condition of a descriptive or causal relationship.

While the problem with subgroup analysis is discussed mostly in statistical research, the problem with scope conditions is even more relevant for qualitative, small-N research than for large-N studies. This is because small-N research often proceeds from a single case study, where some relationships are found, to new cases, where often these relationships are not found, with the conclusion typically being that the originally-discovered relationships are real but context-dependent. That could be the case, but it could also be that there are no systematic relationships in any of these cases at all.

I feel that if qualitative researchers disagree with my diagnosis of the problem with scope conditions, it will be because they often start from very different ontological assumptions about how the social world works. As mentioned above, my analysis holds only if we assume that the multitude of variables characterizing our world are not systematically related, unless we find evidence that they are. But many qualitative researchers seem to assume that everything is connected to everything else, unless we find evidence that it is not. Starting from such a strongly deterministic worldview, posing scope conditions when we fail to extend a result makes more sense. But then so would any subgroup analysis that finds a ‘significant’ relationship, and we seem to agree that this is wrong, at least in the context of statistical work.

To conclude, unless you commit to a strongly deterministic ontology where everything is connected to everything else by default, be careful when posing scope conditions to rationalize a failure to find a previously-established relationship in a different context. Instead, question whether the original result itself still holds. Only then search for more complex explanations that bring in scope conditions or moderating variables.

# More on QCA solution types and causal analysis

Following up my post on QCA solution types and their appropriateness for causal analysis, Eva Thomann was kind enough to provide a reply. I am posting it here in its entirety:

## Why I still don’t prefer parsimonious solutions (Eva Thomann)

Thank you very much, Dimiter, for issuing this blog debate and inviting me to reply. In your blog post, you outline why, absent counterevidence, you find it justified to reject applied Qualitative Comparative Analysis (QCA) paper submissions that do not use the parsimonious solution. I think I agree with some but not all of your points. Let me start by clarifying a few things.

It’s good to see that we all seem to agree that “no single criterion in isolation should be used to reject manuscripts during anonymous peer review”. The reviewer practice addressed in the COMPASSS statement is a bad practice. Highlighting this bad reviewer practice is the sole purpose of this statement. Conversely, the COMPASSS statement does not take sides when it comes to preferring specific solution types over others. The statement also does not imply anything about the frequency of this reviewer practice – this part of your post is pure speculation. Personally, I have heard people complaining about getting papers rejected for promoting or using conservative (QCA-CS), intermediate (QCA-IS) and parsimonious solutions (QCA-PS) with about the same frequency. But it is of course impossible for COMPASSS to get a representative picture of this phenomenon.

The term “empirically valid” refers to the, to my best knowledge, entirely undisputed fact that all solution types are (at least) based on the information contained in the empirical data. The disputed question is how we can or should go “beyond the facts” in causally valid ways when deriving QCA solutions.

Having said this, I will take off my “hat” as a member of the COMPASSS steering committee and contribute a few points to this debate. These points represent my own personal view and not that of COMPASSS or any of its bodies. I write as someone who uses QCA sometimes in her research and teaches it, too. Since I am not a methodologist, I won’t talk about fundamental issues of ontology and causality. I hope others will jump in on that.

Point of clarification 2: There is no point in personalizing this debate

In your comment you frequently refer to “the COMPASSS people”. But I find that pointless: COMPASSS hosts a broad variety of methodologists, users, practitioners, developers and teachers with different viewpoints and of different “colours and shapes”, some closer to “case-based” research, others closer to statistical/analytical research. Amongst others, Michael Baumgartner, whom you mention, is himself a member of the advisory board, and he has had methodological debates with his co-authors as well. Just because we can procedurally agree on a bad reviewer practice, it neither means we substantively agree on everything, nor does it imply that we disagree. History has amply shown how unproductive it can be for scientific progress when debates like these become personalized. Thus, if I could make a wish to you and everyone else engaging in this debate, it would be to talk about arguments rather than specific people. In what follows I will therefore refer to different approaches instead, except when referring to specific scholarly publications.

Point of clarification 3: There is more than one perspective on the validity of different solutions

As to your earlier point, which you essentially repeat here, that “but if two solutions produce different causal recipes, e.g. (1) AB -> E and (2) ABC -> E, it cannot be that both (1) and (2) are valid”, my answer is: it depends on what you mean by “valid”.

It is common to look at QCA results as subset relations, here: statements of sufficiency. In a paper that is forthcoming in Sociological Methods & Research, Martino Maggetti and I call this the “approach emphasizing substantive interpretability”. From this perspective, the forward arrow “->” reads “is sufficient for”, and 1) in fact implies 2). Sufficiency means that X (here: AB) is a subset of Y (here: E). ABC is a subset of AB and hence it is also a subset of E, if AB is a subset of E. Logic dictates that any subset of a sufficient condition is also sufficient. Both are valid – they describe the sufficiency patterns in the data (and sometimes, some remainders) with different degrees of complexity.
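The set-relational point can be checked mechanically. Here is a toy sketch with invented case identifiers, where sufficiency is modelled as the subset relation described above:

```python
# Toy illustration: any subset of a sufficient condition is itself sufficient.
# The sets contain hypothetical case identifiers, invented for this example.
E = {"c1", "c2", "c3", "c4"}   # cases exhibiting the outcome E
AB = {"c1", "c2", "c3"}        # cases where both A and B hold
ABC = {"c1", "c2"}             # cases where A, B and C all hold

def sufficient(x, y):
    """X is sufficient for Y in the set-relational sense: X is a subset of Y."""
    return x <= y

assert ABC <= AB            # adding a condition can only shrink the set
assert sufficient(AB, E)    # (1) AB -> E holds in this data
assert sufficient(ABC, E)   # so (2) ABC -> E follows automatically
print("1) implies 2)")
```

This is just the subset logic of the “substantive interpretability” reading; it says nothing by itself about causal relevance, which is exactly where the redundancy-free approach diverges.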

Scholars promoting an “approach emphasizing redundancy-free models” agree with that, if we speak of mere (monotonic) subset relations. Yet they require QCA solutions to be minimal statements of causal relevance. From this perspective, the arrow (it then is <->, see below) reads “is causally relevant for”, and if 1) is true then 2) cannot be true: 2) additionally grants causal relevance to C, but in 1) we said only AB are causally relevant. As a causal statement, we can think of 2) as claiming more than 1).

To proponents of the approach emphasizing substantive interpretability (and I am one of them), it all boils down to the question:

“Can something be incorrect that follows logically and inevitably from a correct statement?”

Their brains shout:

“No, of course it can’t!”

I am making an informed guess here: this fact is so blatantly obvious to most people well-versed in set theory that it does not require a formal reply.

For everyone else, it is important to understand that in order to follow the reasoning you are proposing in your comment, you have to buy into a whole set of assumptions that underlie the method promoted in the publication you are referring to (Baumgartner 2015), called Coincidence Analysis or CNA. Let me illustrate this.

Point of clarification 4: QCA is not CNA

In fact, one cannot accept 2) if 1) is true in the special case when the condition “AB” is both minimally sufficient and contained in a minimally necessary condition for an outcome – which is also the situation you refer to (in your point 3). We have to replace the forward arrow “->” with “<->”. In such a situation, the X set and the Y set are equivalent. Of course, if AB and E are equivalent, then ABC and E are not equivalent at the same time. In reality, this – simultaneous necessity and sufficiency – is a rare scenario that requires a solution to be maximally parsimonious and to have both a high consistency (indicating sufficiency) AND a very high coverage (indicating necessity).

But QCA – as opposed to CNA – is designed to assess necessary conditions and/or sufficient conditions. They don’t have to be both. As soon as we are speaking of a condition that is sufficient but not necessary (or not part of a necessary condition), then, if 1) is correct, 2) also has to be correct. You are acknowledging this when saying that “if A is sufficient for E, AB is also sufficient, for any arbitrary B”.

I will leave it to the methodologists to clarify whether it is ontologically desirable to empirically analyse sufficient but not necessary (or necessary but not sufficient) conditions. As a political scientist, I find it theoretically and empirically interesting. I believe this is in the tradition of much comparative political research. It is clear, and you seem to agree, that what we find to be correct entirely depends on how we define “correct” – there’s a danger of circularity here. At this point, it has to be pointed out that CNA is not QCA. Both are innovative, elegant and intriguing methods with their own pros and cons. I am personally quite fascinated by CNA and would like to see more applications of it, but I am not convinced that we can or need to transfer its assumptions to QCA.

What I like about the recent publications advocating an approach emphasizing redundancy-free models is that they highlight that not all conditions contained in QCA solutions may be causally interpretable, if only we knew the true data-generating process (DGP). That points to the general question of causal arguments made with QCA if there is limited diversity, which has received ample scholarly attention for already quite a while.

Point of agreement 1: We need a cumulative process of rational critique

You argue that “the point about non-parsimonious solutions deriving faulty causal inferences seems settled, at least until there is a published response that rebukes it”. But QCA scholars have long highlighted issues of implausible and untenable counterfactuals entailed in parsimonious solutions (e.g. here, here, here, here, here, here and here). None of the published articles advocating redundancy-free models has so far made concrete attempts to rebuke these arguments. Following your line of reasoning, the points made by previous scholarship about parsimonious solutions deriving faulty causal inferences equally seems settled, at least until there is a published response that rebukes these points.

Indeed, advocates of redundancy-free models seem to either dismiss the relevance of counterfactuals altogether because CNA, so it is argued, does not rely on counterfactuals to derive solutions; or they argue that in the presence of limited diversity all solutions rely on counterfactuals. (Wouldn’t it be contradictory to argue both?) I personally would agree with the latter point. There can be no doubt that QCA (as opposed, perhaps, to CNA) is a set-theoretic, truth-table-based method that, in the presence of limited diversity, involves counterfactuals. Newer algorithms (such as eQMC, used in the QCA package for R) no longer actively “rely on” remainders for minimization, and they exclude difficult and untenable counterfactuals rather than including tenable and “easy” counterfactuals. But the reason why QCA involves counterfactuals remains that intermediate and parsimonious QCA solutions involve configurations of conditions some of which are empirically observed, while others (the counterfactuals) are not. There can be only one conclusion: the question of whether these counterfactuals are valid requires our keen attention.

Where does that leave us? To me, all that certainly does not mean that “the reliance on counterfactuals cannot be used to arbitrate this debate”. It means that different scholars have highlighted different issues relating to the validity of all solution types. None of these points has been conclusively rebuked so far. That, of course, leaves users in an intricate situation. They should not be punished for consistently and correctly following protocols proposed by methodologists of one or another approach.

Point of agreement 2: In the presence of limited diversity, QCA solutions can err in different directions

Parsimonious solutions are by no means unaffected by the problem that limited empirical diversity challenges our confidence in inferences. Indeed, we should be careful not to overlook that they err, too. As Ingo Rohlfing has pointed out, the question in which direction we want to err is a different question from the one about which solution is correct. The answer to the former question probably depends.

Let us return to the above example and assume that we have a truth table characterized by limited diversity. We get a conservative solution

(CS) ABC -> E,

and a parsimonious solution

(PS) A -> E.

Let us further assume that we know (which in reality we never do) that the true DGP is

(DGP) AB -> E.

Neither CS nor PS gives us the true DGP. To recap: to scholars emphasizing redundancy-free models, PS is “correct” because they define as “correct” a solution that does not contain causally irrelevant conditions. But note that PS here is also incomplete: the true result in this example is that, in order to observe the outcome E, A alone is not enough; it has to combine with B. Claiming that A alone is enough involves a counterfactual that could well be untenable. But the evidence alone does not allow us to conclude that B is irrelevant for E. It is usually only by making this type of oversimplification that parsimonious solutions reach the high coverage values required to be “causally interpretable” under an approach emphasizing redundancy-free models.
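This example can be made concrete in a short sketch. The observed rows below are invented, and the DGP is assumed to be E = A AND B as in the text; the point is that under limited diversity (rows with A present but B absent are never observed), A alone looks sufficient in the data even though the parsimonious claim A -> E fails in the full logical space:

```python
from itertools import product

# Assumed true data-generating process (known here only for illustration).
def dgp(a, b, c):
    return a and b  # E occurs exactly when A AND B hold

# Observed rows (limited diversity): the combination A=1, B=0 never appears.
observed = [(1, 1, 1), (1, 1, 0), (0, 1, 1), (0, 0, 0)]
table = {row: dgp(*row) for row in observed}

# Within the observed data, A alone looks sufficient for E ...
a_looks_sufficient = all(e for (a, b, c), e in table.items() if a)
print(a_looks_sufficient)  # True

# ... but over the full logical space the parsimonious claim A -> E fails:
counterexamples = [(a, b, c) for a, b, c in product((0, 1), repeat=3)
                   if a and not dgp(a, b, c)]
print(counterexamples)  # the unobserved rows with A=1, B=0 and no outcome
```

The counterexamples are exactly the unobserved configurations, which is why the parsimonious solution’s claim rests on counterfactuals about rows the data never shows.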

To anyone with some basic training in QCA, this should raise some serious questions: isn’t one of the core assumptions of QCA that we cannot interpret the single conditions in its results in isolation, because they unfold their effect only in combination with other conditions? How, then, does QCA-PS fare when assessed against this assumption? I have not read a conclusive answer to this question yet.

Baumgartner and Thiem (2017) point out that with imperfect data, no method can be expected to deliver complete results. That may well be, but in QCA we deal with two types of completeness: complete AND-configurations, and the inclusion of all substitutable paths or “causal recipes” combined by the logical OR. In order to interpret a QCA solution as a sufficient condition, I want to be reasonably sure that the respective AND-configuration in fact reliably implies the outcome (even if it omits other configurations that may not have been observed in my data). Using this criterion, QCA-PS arguably fares worst (it most often misses out on causally relevant factors) and QCA-CS fares best (though it most often also still includes causally irrelevant factors).

To be sure, QCA-PS is sufficient for the outcome in the dataset under question. But I am unsure how I have to read it: “either X implies Y, or I did not observe X”? Or “X is causally relevant for Y in the data under question, but I don’t know if it suffices on its own”? There may well be specific situations in which all we want to know is whether some conditions are causally relevant subsets of sufficient conditions or not. But I find it misleading to claim that this is the only legitimate or even the main research interest of studies using QCA. I can think of many situations, such as public health crises or enforcing EU law, in which reliably achieving or preventing an outcome would have priority.

Let me be clear. The problem we are talking about is really neither QCA nor some solution type. The elephant in the room is essentially that observational data are rarely perfect and do not obey the laws of logic. But is QCA-PS really the best, or the only, or at all a way out of this problem?

Point of agreement 3: There are promising and less promising strategies for causal assessment

The technical moment of QCA shares with statistical techniques that it is simply a cross-case comparison of data-set observations. As such, of course it also shares with other methods the limited possibility for directly deriving causal inferences from observational data. Most QCA scholars would therefore be very cautious to interpret QCA results causally when using observational data and in the presence of limited diversity. Obviously, set relation does not equal causation. How then, could a specific minimization algorithm alone plausibly facilitate causal interpretability?

QCA (as opposed to CNA) was always designed to be a multi-method approach. This means that the inferences of the cross-case comparison are not just interpreted as such, but strengthened and complemented with additional insights, usually theoretical, conceptual and case knowledge. Or, as Ragin (2008: 173) puts it:

“Social research (…) is built upon a foundation of substantive and theoretical knowledge, not just methodological technique”.

This way, we can combine the advantages offered by different methods and sources. Used in a formalized way, the combination of QCA with process tracing can even help to disentangle causally relevant from causally irrelevant conditions. This, of course, does not preclude the possibility that some solution types may lend themselves more to causal interpretation than others. It does suggest, though, that focusing on specific solution types alone is an ill-suited strategy for making valid causal assessments.

Point of disagreement: Nobody assumes that “everything matters”

Allow me to disagree that an approach emphasizing substantive interpretability assumes “everything is relevant”. Of course that is nonsense. As with any other social science method I know, the researcher first screens the literature and the field in order to identify potentially relevant explanatory factors. The logic of truth table analysis (as opposed to CNA?) is then to start out with the subset of these previously identified conditions that themselves consistently are a subset of the outcome set, and then to search for evidence that they are irrelevant. This is not even an assumption, and it is very far from being “everything”.

In my view it makes sense to have a division of labour: users follow protocols, methodologists foster methodological innovation and progress. I hope the above has made it clear that we are in the midst of a, in my view, welcome and needed debate about what “correctness” and “validity” mean in the QCA context. I find it useful to think of this as a diversity of approaches to QCA. It is important that researchers reflect about the ontology that underlies their work, but we should avoid making premature conclusions as well.

Currently (but I may be proven wrong) I am thinking that each solution type has its merits and limitations. We can’t eliminate limited diversity, but we can use different solution types for different purposes. For example, if policymakers seek to avoid investing public money in potentially irrelevant measures, the parsimonious solution could be best. If they are interested in creating situations that are certain to produce an outcome (e.g. disease prevention), then the conservative solution is best and the parsimonious solution very risky. If we have strong theoretical knowledge or prior evidence available for counterfactual reasoning, intermediate solutions are best. And so on. From this perspective, it is good that we can refer to different solution types with QCA. It forces researchers to think consciously about what the goal of their analysis is, and how it can be adequately reached. It prevents them from just mechanically running some algorithm on their data.

All of the above is why I agree with the COMPASSS statement that …

“the current state of the art is characterized by discussions between leading methodologists about these questions, rather than by definitive and conclusive answers. It is therefore premature to conclude that one solution type can generally be accepted or rejected as “correct”, as opposed to other solution types”.

# QCA solution types and causal analysis

Qualitative Comparative Analysis (QCA) is a relatively young research methodology that has frequently come under attack from all corners, often for the wrong reasons. But there is a significant controversy brewing within the community of people using set-theoretic methods (of which QCA is one example) as well.

Recently, COMPASSS – a prominent network of scholars interested in QCA – issued a Statement on Rejecting Article Submissions because of QCA Solution Type. In this statement they ‘express the concern … about the practice of some anonymous reviewers to reject manuscripts during peer review for the sole, or primary, reason that the given study chooses one solution type over another’. The ‘solution type’ refers to the procedure used to minimize the ‘truth tables’ which collect the empirical data in QCA (and other set-theoretic) research when there are unobserved combinations of conditions (factors, variables) in the data. Essentially, when there are unobserved combinations (which is practically always the case), the solution type, together with the minimization algorithm, determines the inference you draw from the data.

I have not been involved in drawing up the statement (and I am not a member of COMPASSS), and I have not reviewed any articles using QCA recently, so I am not directly involved in this controversy on either side. At the same time, I have been interested in QCA and related methodologies for a while now, I have covered their basics in my textbook on research design, and I remain intrigued both by their promise and their limitations. So I feel like sharing my thoughts on the matter, even if others might have much more experience with QCA.

(1a) First, let me say that no matter what one thinks about the appropriateness of a solution type, no single criterion in isolation should be used to reject manuscripts during anonymous peer review. The reviewer’s recommendation should reflect not only the methodology used, but the original research goal and the types of inferences being made. I can only assume that for COMPASSS to issue such a statement, the problem has been one of systematic rejection due to this one single reason. This is worrisome because the peer review process does not offer possibilities for response, let alone for debate of methodological issues.

(1b) At the same time, if the method that is used is not appropriate for the research goal and does not support the inferences advanced in the manuscript, then rejection is warranted and no further justification is needed.

(2a) So it all depends on whether a single solution type should be used in all QCA analyses. In principle, the answer to this question is ‘No’. There are three main types of solutions (parsimonious, complex, and intermediate), and each can be appropriate in different circumstances.

(2b) However, when it comes to causal analysis, my answer is ‘Yes. Only the parsimonious solution should be used to make causal inferences.’ My answer is based on Michael Baumgartner’s analysis (see the 2015 version here), and I will explain why I find it persuasive below. So, if manuscripts make causal claims based on non-parsimonious solution types, I would see that as sufficient grounds for rejection (or rather for revision), unless the authors explicitly subscribe to a very peculiar social ontology in which everything has causal relevance for an outcome unless we have evidence to the contrary (I will explain this below). In my view, the standard ontology is that no factor has causal relevance for an outcome unless we have evidence that it does.

To sum up so far, the COMPASSS people might be right in general, but for the important class of causal analyses they are wrong.

(3) Why should only the parsimonious solution be used to make causal claims? In short, because the relations of necessity and sufficiency are monotonic (so that if A is sufficient for E, AB is also sufficient, for any arbitrary B). Imagine a causal structure in which the presence of A is necessary and sufficient for the presence of E and B is irrelevant. Further imagine that we only have two empirical observations {ABE} and {aBe} (small letters denote the absence of the condition/outcome). This data is incomplete, as we have no information on what happens under the logically possible configurations {Ab} and {ab} (these would be logical remainders in the truth table), so we have to use some further rules (i.e. a solution type) to derive a formula. The complex solution is AB->E (the presence of both A and B is necessary and sufficient for the outcome E to occur). This solution type assumes that we cannot ignore B: since we have no data on what happens when it is absent, it is prudent to assume that B matters and keep it in the resulting formula (the causal recipe). However, this formula and the conclusion it leads to are wrong, because we posited above that B is irrelevant. The parsimonious solution is A->E (the presence of A is necessary and sufficient for the presence of E). This solution eliminates B on the assumption that it does not matter, since this yields a more parsimonious formula. This is the correct inference in our example.
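The difference between the two solution types can be made concrete with a small sketch. This is not the actual Quine-McCluskey algorithm or the API of any QCA software package, just a brute-force illustration of how the two solutions treat the logical remainders in the example above:

```python
# The example's truth table: rows are (A, B) configurations.
positive   = {(1, 1)}            # observed: A=1, B=1, E occurred  ({ABE})
negative   = {(0, 1)}            # observed: A=0, B=1, E absent    ({aBe})
remainders = {(1, 0), (0, 0)}    # unobserved configurations {Ab} and {ab}

NAMES = ('A', 'B')

def covers(implicant, row):
    """True if the partial assignment (e.g. {'A': 1}) matches the row."""
    return all(row[NAMES.index(k)] == v for k, v in implicant.items())

def solution(treat_remainders_as_dont_care):
    # Candidate implicants, fewest literals (most parsimonious) first.
    candidates = [{'A': 1}, {'B': 1}, {'A': 1, 'B': 1}]
    for imp in candidates:
        if any(covers(imp, r) for r in negative):
            continue  # an implicant must never cover a negative case
        if not treat_remainders_as_dont_care and \
           any(covers(imp, r) for r in remainders):
            continue  # the complex solution refuses to use remainders
        if all(covers(imp, p) for p in positive):
            return imp

print(solution(True))   # parsimonious: {'A': 1}         i.e. A -> E
print(solution(False))  # complex:      {'A': 1, 'B': 1} i.e. AB -> E
```

The only difference between the two calls is whether the unobserved remainders may be covered by an implicant; that single choice flips the conclusion about the causal relevance of B.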

Our example is simple but in no way contrived (Baumgartner has other, more complex examples in the paper). In fact, we can add any number of factors to A and as long as they do not vary across our cases they will appear to be components of the outcome formula (causal recipe). That is, any random factor can be made to appear as causally relevant in the presence of limited diversity in the cases being studied. In the limit, we can make every aspect of a case appear to be causally relevant for an outcome, if we do not have cases combining factors in a way that makes it possible to disprove their illusory relevance.

One could say that this caution is only fair. But it would only be appropriate in a world where everything matters for everything else unless some empirical cases point to the opposite. Such a worldview (ontology) is rare among social scientists, and I have never seen it openly endorsed. Note that the problem is not that we imply a separate, independent effect of B on E: worse, the solution implies that B must be present for the effect of A to obtain.

To sum up so far, only the parsimonious solution type can provide causal inference from QCA data under a standard social ontology, because of the monotonicity of the relationships of necessity and sufficiency.

(4) So what is the response of the COMPASSS people to this analysis? In fact, I do not know. To the best of my knowledge, there has been no published response or critique to Michael Baumgartner’s article. In the Statement, the following arguments are given:

(a) ‘The field of QCA and set-theoretic methods is not quite standardized.’ ‘The field is currently witnessing an ongoing and welcome methodological debate about the correctness of different solution types.’ ‘the current state of the art is characterized by discussions between leading methodologists about these questions’.

All this might be the case, but the point about non-parsimonious solutions deriving faulty causal inferences seems settled, at least until there is a published response that rebuts it. There might be debate, but I have not seen a published response to Baumgartner’s analysis or any other persuasive argument why he is wrong on this point in particular.

(b) ‘users applying [QCA and other set-theoretic] methods who refer to established protocols of good practice must not be made responsible for the fact that, currently, several protocols are being promoted’. Well, users cannot be made responsible, but if the protocols they follow are faulty, their manuscripts cannot be accepted as the analyses would be wrong.

(5) So despite offering no reasons why non-parsimonious solutions are appropriate for causal analysis contra Baumgartner, COMPASSS’ statement finishes with ‘all solutions are empirically valid’.

I am not sure what this means. All solutions cannot be empirically valid, as they can point to contradictory conclusions: either A->E or AB->E; either B is causally relevant or it is not. Technically, any solution might be valid in light of a set of background assumptions, research goals and analytic procedures (in the sense that both 2+2=4 and 2+2=5 are valid under some assumptions). But that’s the crux of the matter: if one has causal goals and uses a non-parsimonious solution, then the solution is only valid if one assumes that in the social world everything causes and conditions everything else unless proven otherwise.

To conclude, if a group of researchers have been systematically sabotaging the work of other scholars for the sole fact of using a certain solution type, that’s bad. But if they have been rejecting manuscripts that have used non-parsimonious solutions to derive causal inferences without clear commitments to an ‘everything matters’ worldview, that seems OK to me, in light of the (published) methodological state of the art.

P.S. The issue of counterfactuals enters this debate quite often.
(a) But in his 2015 analysis Baumgartner does not invoke his/a regularity theory of causality. All he needs for the analysis is a notion of a cause as a difference-maker, which in my understanding is compatible with a counterfactual understanding of causality. So any rejection of his argument against non-parsimonious solutions cannot be derived from differences between regularity and counterfactual notions of causality.
(b) Baumgartner notes that the parsimonious solution sometimes requires one to make counterfactuals about impossible states of the world. With this critique he motivates abandoning the Quine-McCluskey Boolean minimization procedure (in the framework of which one must choose the parsimonious, complex, or intermediate solution type) altogether and adopting the coincidence analysis framework, which has the parsimonious solution ‘built in’ to its algorithms. But this is not a critique against counterfactuals as such.
(c) At the same time, the complex solution also relies on assumption-based counterfactuals, namely that a factor matters unless shown otherwise. So the reliance on counterfactuals cannot be used to arbitrate this debate.

For further discussion of these issues, see Thiem and Baumgartner 2016, Ingo Rohlfing’s blog post (with a response in the comments by Michael Baumgartner), Schneider 2016, and the Standards of Good Practice in QCA.

[addendum 31/08/2017] Michael Baumgartner and Alrik Thiem have published a reply to the COMPASSS statement in which they write: ‘We endorse the prerogative of journal editors and reviewers to favor rejection if they come to the conclusion that a manuscript does not merit publication because of its choice of an unsuitable solution type.’ And they urge for more debate.

# Explanation and the quest for ‘significant’ relationships. Part II

In Part I, I argued that the search for and discovery of statistically significant relationships does not amount to explanation, and is often misplaced in the social sciences because the variables which are purported to have effects on the outcome cannot be manipulated.

Just to make sure that my message is not misinterpreted – I am not arguing for a fixation on maximizing R-squared and other measures of model fit in statistical work, instead of the current focus on the size and significance of individual coefficients. R-squared has been rightly criticized as a standard of how good a model is** (see for example here). But I am not aware of any other measure or standard that can convincingly compare the explanatory potential of different models in different contexts. Predictive success might be one way to go, but prediction is something altogether different from explanation.

I don’t expect much to change in the future with regard to the problem I outlined. In practice, all one could hope for is some clarity on the part of researchers about whether their objective is to explain (account for) an outcome or to find significant effects. The standards for evaluating progress towards the former objective (model fit, predictive success, ‘coverage’ in the QCA sense) should be different from the standards for the latter (statistical and practical significance, and the practical possibility to manipulate the exogenous variables).

Take the so-called garbage-can regressions, for example. These are models with tens of variables, all of which are interpreted causally if they reach the magic 5% significance level. The futility of this approach is matched only by its popularity in political science and public administration research. If the research objective is to explore a causal relationship, one had better focus on that variable and include covariates only if they are suspected to be correlated with both the outcome and the main independent variable of interest. Including everything else that happens to be within easy reach not only leads to inefficiency in the estimation; the significance of these covariates should not be interpreted causally at all. On the other hand, if the objective is to comprehensively explain (account for) a certain phenomenon, then including as many variables as possible might be warranted, but then the significance of individual variables is of little interest.
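A quick simulation illustrates the danger. All numbers here are illustrative assumptions: we regress a pure-noise outcome on 50 pure-noise covariates and count how many clear the 5% threshold by chance alone:

```python
import numpy as np

rng = np.random.default_rng(42)
n, k, reps = 1000, 50, 100     # observations, noise covariates, replications
false_positives = 0

for _ in range(reps):
    X = rng.standard_normal((n, k))
    y = rng.standard_normal(n)                 # unrelated to every covariate
    X1 = np.column_stack([np.ones(n), X])      # add an intercept
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    sigma2 = resid @ resid / (n - k - 1)       # residual variance
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(X1.T @ X1)))
    t = beta[1:] / se[1:]                      # t-stats, intercept dropped
    false_positives += int(np.sum(np.abs(t) > 1.96))

rate = false_positives / (reps * k)
print(f"share of noise covariates 'significant' at 5%: {rate:.3f}")
```

By construction, roughly one in twenty of these entirely irrelevant covariates comes out “significant” in any given run; with fifty variables in the model, a garbage-can regression is almost guaranteed a few spuriously significant coefficients to interpret.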

The goal of research is important when choosing the research design and the analytic approach. Different standards apply to explanation, the discovery of causal effects, and prediction.

**Just one small example from my current work: a model with one dependent and one exogenous time-series variable in levels, with a lagged dependent variable included on the right-hand side of the equation, produces an R-squared of 0.93. The same model in first differences has an R-squared of 0.03, while the regression coefficient of the exogenous variable remains significant in both models. So we can ‘explain’ more than 90% of the variation in the first case by reference to the past values of the outcome. Does this amount to an explanation in any meaningful sense? I guess that depends on the context. Does it provide any leverage to the researcher to manipulate the outcome? Not at all.
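The same pattern is easy to reproduce with simulated data (a stand-in for the series described above, not the actual data): a highly persistent outcome with a genuine but modest effect of an exogenous variable x.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 500
x = rng.standard_normal(T)
y = np.zeros(T)
for t in range(1, T):
    # near-unit-root persistence plus a real but small effect of x
    y[t] = 0.98 * y[t - 1] + 0.2 * x[t] + rng.standard_normal()

def r_squared(cols, target):
    X = np.column_stack([np.ones(len(target))] + cols)
    beta, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ beta
    tss = ((target - target.mean()) ** 2).sum()
    return 1 - (resid ** 2).sum() / tss

# Levels, with the lagged dependent variable on the right-hand side:
r2_levels = r_squared([y[:-1], x[1:]], y[1:])
# The same relationship in first differences:
r2_diff = r_squared([np.diff(x)], np.diff(y))
print(round(r2_levels, 2), round(r2_diff, 2))  # high vs. near zero
```

The levels regression scores an R-squared above 0.9 almost entirely because the lagged outcome ‘explains’ itself; in first differences the fit collapses even though the effect of x is unchanged.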

# Is unit homogeneity a sufficient assumption for causal inference?

Is unit homogeneity a sufficient condition (assumption) for causal inference from observational data?

Re-reading King, Keohane and Verba’s bible on research design [lovingly known to all exposed as KKV], I think they regard unit homogeneity and conditional independence as alternative assumptions for causal inference. For example: “we provide an overview here of what is required in terms of the two possible assumptions that enable us to get around the fundamental problem [of causal inference]” (p.91, emphasis mine). However, I don’t see how unit homogeneity on its own can rule out endogeneity (establish the direction of causality). In my understanding, endogeneity is automatically ruled out under conditional independence, but not under unit homogeneity (“Two units are homogeneous when the expected values of the dependent variables from each unit are the same when our explanatory variable takes on a particular value” [p.91]).

Going back to Holland’s seminal article which provides the basis of KKV’s approach, we can confirm that unit homogeneity is listed as a sufficient condition for inference (p.948). But Holland divides variables into pre-exposure and post-exposure before he even gets to discuss any of the additional assumptions, so reverse causality is ruled out altogether. Hence, in Holland’s context unit homogeneity can indeed be regarded as sufficient, but in my opinion in KKV’s context unit homogeneity needs to be coupled with some condition (temporal precedence for example) to ascertain the causal direction when making inferences from data.
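Holland’s identification argument can be sketched in potential-outcomes terms with hypothetical numbers (the units and values below are invented for illustration):

```python
# Two units assumed homogeneous: same potential outcome under treatment,
# same potential outcome under control.
Y_if_treated = {'u1': 7, 'u2': 7}
Y_if_control = {'u1': 4, 'u2': 4}

# The fundamental problem of causal inference: we observe only one
# potential outcome per unit.
observed = {'u1': Y_if_treated['u1'],   # u1 was exposed to the treatment
            'u2': Y_if_control['u2']}   # u2 was not

# Under unit homogeneity the cross-unit difference identifies the effect:
effect = observed['u1'] - observed['u2']
print(effect)  # 3, the same as Y_if_treated['u1'] - Y_if_control['u1']

# Note: labelling one variable the 'treatment' and the other the 'outcome'
# is done by assumption here; nothing in the homogeneity condition itself
# rules out that causality runs the other way.
```

The sketch makes the point in the text visible: unit homogeneity licenses the cross-unit comparison, but the causal direction is presupposed by the pre-/post-exposure labelling, not delivered by the assumption.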

The point is minor but can create confusion when presenting unit homogeneity and conditional independence side by side as alternative assumptions for inference.