How many people have died from COVID-19? What is the impact of COVID-19 on mortality in a population? Can we use excess mortality to estimate the effects of COVID-19?
In this text I will explain why the answers to the first two questions need not be the same. That is, the sum of cases where COVID-19 has been determined to be the direct cause of death need not equal the population-level estimate of the causal impact of COVID-19. When measurement of the individual-level causes of death is imperfect, using excess mortality (observed minus expected) to measure the impact of COVID-19 leads to an underestimate of the number of individual cases where COVID-19 has been the direct cause of death.
The major assumption on which the argument rests is that some of the people who have died from COVID-19 would have died from other causes within a specified, relatively short time frame (say, within the month). It seems very reasonable to assume that at least some of the victims of COVID-19 would have succumbed to other causes of death. This is especially easy to imagine given that COVID-19 disproportionately kills the very old and that the ultimate causes of death that it provokes (respiratory problems, lung failure, etc.) are shared with other common diseases with high mortality among the older population, such as the flu.
Defining individual and population-level causal effects
With this crucial assumption in mind, we can construct the following simple table. Cell A contains the people who would have survived if they had not caught the coronavirus, but they caught it and died. Cell B contains the people who caught the coronavirus and died, but would have died from other causes even if they had not caught the virus. Cell C contains the people who caught the virus and survived, and would have survived even if they had not caught it. Cell D contains the people who would have died if they had not caught the virus, but they did catch it and survived. Cell C is of no interest for the current argument, and for now we can assume that cases in Cell D are implausible. (This might change if we consider indirect effects of the pandemic and the policy measures it provoked, but for now we ignore such indirect effects.) Cell E contains the people who did not catch the virus and survived (also not interesting for the argument). Cell F contains the people who did not catch the virus and died from other causes. As a matter of definition, total mortality within a period is A + B + F.
| | Caught coronavirus and died | Caught coronavirus and survived | Did not catch coronavirus |
|---|---|---|---|
| Would have survived without catching the coronavirus | A | C | E |
| Would have died even without catching the coronavirus | B | D | F |
The number of individual-level deaths directly caused by COVID-19 that can be observed is the sum of cells A + B. Without further assumptions and specialized knowledge, we cannot estimate what share of these deaths (cell B) would have occurred anyway. For now, just assume that this share is positive; that is, such cases exist. The population-level causal impact of COVID-19 is A, or, in words, those who have died from COVID-19 minus those who would have died from other causes within the same period. The population-level causal effect is defined counterfactually. Again, without further assumptions about the ratio of B to A, the population-level causal impact of COVID-19 is not identifiable from the individual-level counts alone. An important conclusion is that the population-level causal impact of COVID-19 on mortality does not necessarily equal the sum of the individual cases where COVID-19 was the cause of death.
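To make the bookkeeping concrete, here is a minimal Python sketch of the table. The class name `Cells`, its method names and all the counts are my own illustrative assumptions, not real data; the only thing taken from the argument is which cells enter which sum, and that cell D is assumed to be empty.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cells:
    """Counts for the six cells of the table (hypothetical numbers only)."""
    A: int  # caught the virus and died; would otherwise have survived
    B: int  # caught the virus and died; would have died anyway
    C: int  # caught the virus and survived; would have survived anyway
    D: int  # caught the virus and survived; would otherwise have died
    E: int  # did not catch the virus and survived
    F: int  # did not catch the virus and died from other causes

    def deaths_directly_caused(self) -> int:
        # Sum of individual cases where COVID-19 was the direct cause of death.
        return self.A + self.B

    def population_level_impact(self) -> int:
        # Counterfactual impact; the text assumes cell D is empty, so this is A.
        return self.A

    def total_mortality(self) -> int:
        # Everyone who died within the period, whatever the cause.
        return self.A + self.B + self.F


# Made-up counts, chosen only so that cell B is positive and cell D is empty.
cells = Cells(A=700, B=300, C=9_000, D=0, E=89_000, F=1_000)
print(cells.deaths_directly_caused())   # 1000
print(cells.population_level_impact())  # 700
print(cells.total_mortality())          # 2000
```

With these invented numbers the two quantities differ by exactly the size of cell B.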
Scenario I: perfect measures of individual-level causes of death
Assume for the moment that all individual cases where COVID-19 was the cause of death are observed and recorded. Under this assumption, what does excess mortality measure? Excess mortality is defined as the difference between the observed (O) and predicted (P) number of deaths within a period, with the prediction (expectation) coming from historical averages, statistical models or anything else. Under our definitions, the observed mortality O in a period contains groups A + B + F. The prediction P captures the deaths that would have occurred without the virus, that is, groups B + F (cell D being assumed empty). So the difference between observed O and predicted P gives A, or the number of people who have died from COVID-19 but would have survived otherwise. Therefore, excess mortality identifies the population-level causal impact of COVID-19 (see also the figure below).
One implication of this line of reasoning is that, under perfect measurement of individual-level causes of death and a positive number of people who would have died from other causes if they had not died from COVID-19 (cell B), the sum of the cases where COVID-19 was recorded as a cause of death should exceed the excess in observed mortality O – P. (See the situation in France where this might be happening.)
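The arithmetic of Scenario I can be sketched in a few lines of Python. The function names and the counts are hypothetical, and the sketch assumes that the prediction P is perfect and that cell D is empty, as in the argument above.

```python
def observed_deaths(a: int, b: int, f: int) -> int:
    """O: everyone who actually died in the period (cells A + B + F)."""
    return a + b + f


def expected_deaths(b: int, f: int) -> int:
    """P: deaths that would have occurred without the virus (cells B + F).

    Assumes a perfect prediction and an empty cell D.
    """
    return b + f


def excess_mortality(a: int, b: int, f: int) -> int:
    """O - P, which reduces to cell A under the assumptions above."""
    return observed_deaths(a, b, f) - expected_deaths(b, f)


# Made-up counts for cells A, B and F.
a, b, f = 700, 300, 1_000

# Under perfect measurement, every death in cells A and B is recorded as COVID-19.
recorded_covid_deaths = a + b
excess = excess_mortality(a, b, f)

print(excess)                          # 700
print(recorded_covid_deaths)           # 1000
print(recorded_covid_deaths > excess)  # True whenever cell B is positive
```

The gap between the recorded count and the excess is cell B, so the two coincide only if nobody who died from COVID-19 would have died from something else within the period.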
Scenario II: imperfect measures of individual-level causes of death
Let’s now consider a more realistic scenario where determining and recording the individual causes of death is imperfect. Under this assumption, the observed number of deaths in a period is still O = A + B + F. Excess mortality O – P still identifies the population-level effect A. However, this is not the number of deaths directly caused by COVID-19, which also includes those who would have died anyway (B): a category that is already included in the prediction about mortality during this period.
In other words, excess mortality underestimates the sum of individual cases where COVID-19 is the direct cause of death. The size of the underestimate depends on the share of COVID-19 victims who would otherwise have died from other causes: the larger the share, the larger the underestimate. To put it bluntly, COVID-19 kills more people than excess mortality suggests. This is because the expected number of deaths, on which the calculation of excess mortality depends, contains a share of people who would have died from other causes, but were killed by the virus.
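A small Python sketch shows how the underestimate grows with the share of cell B. The helper `split_covid_deaths`, the total of 1,000 deaths and the shares tried are all hypothetical; nothing here is an estimate of the real share, which the argument leaves unknown.

```python
def split_covid_deaths(covid_deaths: int, share_b: float) -> tuple[float, float]:
    """Split deaths directly caused by COVID-19 into cells A and B.

    share_b is the (unknown, here assumed) share of victims who would have
    died from other causes within the same period.
    """
    if not 0.0 <= share_b <= 1.0:
        raise ValueError("share_b must lie between 0 and 1")
    b = covid_deaths * share_b
    a = covid_deaths - b
    return a, b


covid_deaths = 1_000  # made-up true number of deaths directly caused by COVID-19

for share_b in (0.0, 0.1, 0.3, 0.5):
    a, b = split_covid_deaths(covid_deaths, share_b)
    excess = a                         # excess mortality O - P identifies cell A
    shortfall = covid_deaths - excess  # deaths that excess mortality misses (cell B)
    print(f"share of B = {share_b:.0%}: excess = {excess:.0f}, missed = {shortfall:.0f}")
```

Only when the share is zero does excess mortality match the number of deaths directly caused by the virus.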
These are the main conclusions from the analysis:
- The sum of individual-level cases where COVID-19 was the direct cause of death is not the same as the population-level causal impact of the virus.
- Excess mortality provides a valid estimate of the population-level causal impact.
- When measurement of the individual causes of death is imperfect, excess mortality provides an underestimate of the sum of individual cases where COVID-19 was the cause of death.
- With perfect measurement of the individual causes of death, excess mortality should be lower than the sum of the individual cases where COVID-19 was the cause of death.
 I suspect some will object that the coronavirus and COVID-19 are never the direct causes of death but only provoke other diseases that ultimately kill people. This is irrelevant for the argument: I use ‘COVID-19 as a direct cause of death’ as a shortcut for a death that was caused by COVID-19 provoking some other condition that ultimately kills.
 Formally, for people in cell B, COVID-19 is a sufficient but not necessary condition for dying within a certain period. For people in cell A, COVID-19 is both necessary and sufficient. Because of the counterfactual definition of the population-level effect, it only tracks cases where the cause was both necessary and sufficient.
 In reality, the models used to predict and estimate the expected mortality are imperfect and incorporate considerable uncertainties. These uncertainties compound the estimation problems discussed in the text, but the problems would exist even if the expected mortality were predicted perfectly.
 Extending the analysis to include indirect effects of COVID-19 and the policy responses it led to is interesting and important but very challenging. There are multiple plausible mechanisms for indirect effects, some of which would act to decrease mortality (e.g. less pollution, fewer traffic accidents, fewer crime-related murders, etc.) and some of which would act to increase mortality (e.g. due to stress, not seeking medical attention on time, postponed medical operations, increases in domestic violence, self-medication gone wrong, etc.). The time horizon of the estimation becomes even more important as some of these mechanisms need more time to exert their effects (e.g. reduced pollution). Once we admit indirect effects, the calculation of the direct population-level effect of COVID-19 from excess mortality data becomes impossible without some assumptions about the share and net effect of the indirect mechanisms, and the estimation of the sum of individual-level effects becomes even more muddled.