PLOS BLOGS Speaking of Medicine and Health

Estimating Excess Mortality during the COVID-19 Pandemic

By guest contributor Bhramar Mukherjee, Chair and Professor of Biostatistics, University of Michigan

There have been several global studies recently estimating excess mortality during 2020 and 2021 [Wang et al. (IHME), Knutson et al. (WHO), Solstad et al. (The Economist)]. Countries differ in their levels of testing and in their protocols for classifying COVID-19 deaths, and many suffer from data paucity, data opacity (often driven by political dictum), and incomplete death registration systems plagued by backlogs. That makes such studies critical. Excess death estimates help us assess the true toll of the pandemic and ensure a fairer comparison across countries. However, imperfect and incomplete death registration often makes these calculations a tricky “missing data” problem.

What is excess mortality?

Excess mortality is not just “unreported or hidden COVID deaths”. While COVID is a large driver of excess deaths during the pandemic, excess mortality is a more holistic measure of the true toll of the pandemic which is composed of various components as described in Figure 1.

Let us first look at the definition of excess mortality. For a given location and a given time:

All-cause excess mortality due to the COVID-19 pandemic

= Observed all-cause mortality during the pandemic

− Expected all-cause mortality had the pandemic not happened

  Figure 1. Conceptual Framework Underlying Excess Mortality

Where and how are models used to estimate excess deaths?

One may think that a model is needed only for the second term, the counterfactual prediction of what would have been expected had the pandemic not happened. The models used to produce these expected death numbers are typically time series models with seasonal effects and secular trends, built on historic mortality data. These models frequently adjust for incompleteness of the death registration system, which varies across geography and time. Typically, the first term, the observed all-cause mortality, comes directly from the national civil death registration system. This number may also need to be corrected or rescaled for incompleteness in death registration. Auxiliary nationally representative probabilistic sample surveys at the household or family level are often used to assess the fraction of deaths missed by the civil registration/vital statistics system.
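To make the two terms concrete, here is a minimal sketch in Python. It assumes monthly death counts, a baseline with a linear secular trend plus a single annual harmonic fitted by ordinary least squares, and a known registration completeness fraction. The function names and the synthetic numbers are illustrative assumptions; this is not the model used by the WHO, IHME, or The Economist.

```python
import numpy as np

def design_matrix(t, period=12):
    """Columns: intercept, linear (secular) trend, one annual sine/cosine pair."""
    t = np.asarray(t, dtype=float)
    angle = 2 * np.pi * t / period
    return np.column_stack([np.ones_like(t), t, np.sin(angle), np.cos(angle)])

def fit_baseline(historic_deaths, period=12):
    """Least-squares fit of the baseline to pre-pandemic death counts."""
    y = np.asarray(historic_deaths, dtype=float)
    t = np.arange(len(y))
    coef, *_ = np.linalg.lstsq(design_matrix(t, period), y, rcond=None)
    return coef

def excess_deaths(registered, completeness, expected):
    """Rescale registered deaths for incompleteness, then subtract the expected count."""
    observed = np.asarray(registered, dtype=float) / completeness
    return observed - expected

# Synthetic, noise-free example (all numbers are made up for illustration).
months = np.arange(72)                        # 60 pre-pandemic + 12 pandemic months
baseline = 1000 + 2 * months + 100 * np.sin(2 * np.pi * months / 12)
coef = fit_baseline(baseline[:60])            # fit on pre-pandemic months only
expected = design_matrix(months[60:]) @ coef  # counterfactual for pandemic months

completeness = 0.8                            # assume 80% of deaths are registered
registered = completeness * (baseline[60:] + 300)  # 300 true excess deaths per month
excess = excess_deaths(registered, completeness, expected)
print(np.round(excess, 1))                    # each month should come out close to 300
```

With real data the baseline fit would carry noise, so the expected counts, and therefore the excess, should be reported with uncertainty intervals rather than as point values.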

However, more extensive modeling is needed for the first term if the observed all-cause mortality data are incomplete, available only at a subnational level, and different states contribute data for different periods of time. Then, to obtain estimates for locations and times where observed data are not available, one needs to make assumptions. One can assume a relationship between the death counts of observed and unobserved locations based on past or available data, and/or use an ensemble of covariates that are predictive of all-cause mortality. These covariates may include pre-pandemic health and socioeconomic indicators, markers of the acuteness of the pandemic, demographics, public health interventions that were rolled out, meteorological variables and the like. Penalized regression methods and machine-learning approaches have been used to sift through the feature space of potential predictors of mortality. We have now seen several approaches to this kind of prediction at a global scale, by the WHO, by The Economist, and by the Institute for Health Metrics and Evaluation (IHME). It is important to state the assumptions these models make about missing data patterns over space and time, and to properly quantify uncertainty in the predictions and the final estimates.
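As one illustration of the covariate-based approach, the sketch below fits a ridge (L2-penalized) regression on locations where mortality is observed and predicts it for locations where it is not. The three covariates, the coefficients, and the helper names `fit_ridge` and `predict` are hypothetical, and the data are simulated; the published global models are considerably more elaborate.

```python
import numpy as np

def fit_ridge(X, y, lam):
    """Closed-form ridge regression with an unpenalized intercept."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean           # centering removes the intercept
    p = X.shape[1]
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(p), Xc.T @ yc)
    intercept = y_mean - x_mean @ beta
    return intercept, beta

def predict(intercept, beta, X):
    return intercept + X @ beta

# Simulated data: 30 observed locations, 5 unobserved, 3 hypothetical covariates
# (for example a health indicator, a pandemic-severity marker, share aged 65+).
rng = np.random.default_rng(0)
true_beta = np.array([0.5, -0.2, 0.1])
X_observed = rng.normal(size=(30, 3))
y_observed = 2.0 + X_observed @ true_beta     # noise-free log mortality rate
X_unobserved = rng.normal(size=(5, 3))

intercept, beta = fit_ridge(X_observed, y_observed, lam=1e-8)
predicted = predict(intercept, beta, X_unobserved)

# A larger penalty shrinks the coefficients toward zero.
_, beta_shrunk = fit_ridge(X_observed, y_observed, lam=100.0)
```

The penalty `lam` would normally be chosen by cross-validation, and the key assumption, that the covariate-mortality relationship learned from observed locations carries over to unobserved ones, is exactly the kind of missing-data assumption that should be stated explicitly.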

One should also compare age- and sex-specific excess mortality rates wherever possible, rather than total excess mortality, because countries differ in population size and in age-sex composition, and mortality rates vary with these demographics. For example, the median age is 38 years in the US and 28 years in India, and the proportion of the population above 65 years (a high-risk group for COVID-related mortality) is roughly 16% in the US and 6% in India. We need to account for these differences when comparing mortality rates in the two countries.
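A small worked example shows why age structure matters. In the sketch below, two hypothetical populations have identical age-specific death rates but different age compositions. Only the 65+ shares (6% and 16%) echo the figures above; the rates, the other shares, and the standard population are invented for illustration.

```python
import math

def weighted_rate(age_specific_rates, population_shares):
    """Death rate per 100,000 obtained by weighting age-specific rates by population shares."""
    if not math.isclose(sum(population_shares), 1.0):
        raise ValueError("population shares must sum to 1")
    return sum(r * s for r, s in zip(age_specific_rates, population_shares))

# Hypothetical deaths per 100,000 for ages 0-39, 40-64, 65+ (same in both populations).
rates = [50.0, 400.0, 3000.0]

younger_shares = [0.70, 0.24, 0.06]   # 6% aged 65+
older_shares = [0.50, 0.34, 0.16]     # 16% aged 65+
standard_shares = [0.60, 0.30, 0.10]  # an arbitrary common standard population

crude_younger = weighted_rate(rates, younger_shares)   # 35 + 96 + 180 = 311
crude_older = weighted_rate(rates, older_shares)       # 25 + 136 + 480 = 641
standardized = weighted_rate(rates, standard_shares)   # 30 + 120 + 300 = 450

print(crude_younger, crude_older, standardized)
```

The crude rates differ by a factor of about two even though the underlying age-specific risk is identical; applying a common standard population (direct standardization) removes that artifact.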

All models are wrong, but some are useful

Every prediction model makes its own set of assumptions and produces different estimates of excess deaths, with corresponding measures of uncertainty. For example, the WHO estimate of global excess deaths (15M) and the IHME estimate (18M) for 2020-21 differ by nearly 3 million deaths worldwide (Figure 1). While this difference is not inconsequential, the broader takeaway messages from these global studies are clear:

(a) The death toll of this pandemic is far higher than the fewer than 6M reported COVID deaths worldwide.

(b) We need more uniform and reliable weekly mortality surveillance data in all countries.

(c) 41 of the 47 countries in Africa have no death data reported to the WHO during this period. Globally, 85 of the 194 countries have not shared any data with the WHO.

(d) Even for countries for which we have complete or partial data from their official vital statistics systems, we need to understand and characterize the incompleteness of their death registration systems.

(e) Every country should be held accountable for data transparency and data quality. When we compare countries in terms of reported death numbers, the reporting systems and their accuracy should be comparable. Otherwise, we penalize countries that faithfully report true data relative to countries that suppress fatality numbers or simply refuse to share their data.

My own research on this issue

My team and I have been modeling the pandemic in India since March 2020. The recent WHO report has generated animated and antagonistic debates across India. The WHO estimate for India of 4.74M excess deaths during 2020-2021 is perceived as too high by some, fair by some, and a gross underestimate by others. Recall that as of Dec 31, 2021, India had reported 481,000 COVID deaths, so the estimate is nearly 10 times the reported number of COVID deaths. Within a few hours of the release of the WHO report, government officials issued a press release questioning the mathematical models and input data.

Unfortunately, in all three of the above global studies, not just the WHO report, India leads the chart in estimated total excess mortality. This is largely due to the catastrophic second wave in 2021. A synthesis of multiple peer-reviewed, high-profile studies using diverse sources of data puts total excess deaths in India during 2020-2021 somewhere between 3.2M and 4.9M (Table 1), with most deaths occurring in April-June of 2021. Even the lowest of these estimates is more than six times the roughly half a million reported COVID deaths. If we do believe that COVID-19 deaths were in fact adequately captured in the official reporting system in India, then we must investigate alternative explanations for these excess deaths, including but not limited to the collapse of the healthcare system during an active surge in 2021.

India's resistance to accepting these numbers has led to a heated scientific discourse. Contrary to the claims made by the government of India, the WHO model for India, led by Professor Jon Wakefield from the University of Washington, was not transported or adapted from a generic model built for other countries; it was customized for India. Extensive sensitivity analyses were carried out to delineate the influence of the input subnational data. A wonderfully documented open-source Shiny app was provided to ensure the transparency of the work, offering the input data, the methods and visualization tools. As a practicing data scientist for more than 25 years, I find the dissemination and public accessibility of this work to be exemplary.

India has recently released the 2020 civil registration system (CRS) data, claiming that death reporting was 99.9% complete for 2020. Interestingly, the excess mortality estimate of about 9-10% for 2020 obtained from multiple models, which the government finds unacceptable, is quite consistent with what would be obtained from the recently released CRS data. Most of the excess mortality in the models is attributed to April-June of 2021. Access to the all-cause mortality data from the 2021 CRS for India will indeed be illuminating and will help tremendously in settling this apparently irreconcilable argument.

Table 1. Nationwide excess deaths estimates and COVID-19-related mortality in India from 2020-21

| Study | Time Period | Excess Deaths (LL, UL), in Millions | COVID-19 Reported Deaths [1] | Underreporting Factor (LL, UL) [2] | Data Source(s) |
|---|---|---|---|---|---|
| Leffler et al., 2022 [3] | Jan ’20-Aug ’21 | 2.6 (1.9, 3.5) | 438,560 | 6.1 (4.5, 8.1) | CRS |
| Jha et al., 2022 [4] | July ’20-May ’21 [5] | 1.2 (1.0, 1.4) | 204,330 | 6-7 | CRS |
| Jha et al., 2022 [4] | Jun ’20-Jul ’21 [6] | 3.2 (3.1, 3.4) | 450,000 | 6-7 | CVoter |
| Guilmoto, 2022 [4] | Mar ’20-Nov ’21 | 3.2 | 458,900 | 7.0 | Indian Railways, Kerala age & sex-specific death rates |
| Guilmoto, 2022 [4] | Mar ’20-Nov ’21 | 3.7 | 458,900 | 8.6 | MLA, Kerala age & sex-specific death rates |
| Wang et al., 2022 | Jan ’20-Dec ’21 | 4.0 (3.7, 4.3) | 481,080 | 8.3 (7.6, 8.9) | CRS |
| Anand et al., 2021 [3] | Apr ’20-Jun ’21 | 3.4 (1.1, 4.0) | 400,000 | 8.5 (2.7, 10.0) | CRS |
| Anand et al., 2021 [3] | Apr ’20-Jun ’21 | 4.0 | 400,000 | 10.0 | International age-specific infection fatality rates |
| Anand et al., 2021 [3] | Apr ’20-Jun ’21 | 4.9 | 400,000 | 12.2 | CMIE |
| Banaji and Gupta, 2021 [4] | Apr ’20-Jun ’21 | 3.8 (2.8, 5.2) | 399,489 | 9.5 (6.9, 13.0) | CRS |
| World Health Organization [3] | Jan ’20-Dec ’21 | 4.7 (3.3, 6.4) | 481,080 | 9.8 (6.8, 13.4) | Human Mortality Database, World Mortality Dataset, ACM subnational data |
| The Economist [3] | Jan ’20-Dec ’21 | 4.8 (1.2, 8.2) | 481,080 | 10.1 (2.6, 17.2) | Human Mortality Database, World Mortality Dataset, Mumbai estimates |
| Malani and Ramachandran, 2021 [4] | Feb ’20-Aug ’21 | 6.3 | 458,470 | 13 | CMIE |

Notes: N/A=Not available, CRS=Civil Registration System, MLA=Member of the Legislative Assembly sample, CVoter=CVoter India Omnibus telephone survey, HMIS=Health Management Information System, ACM=all-cause mortality, CMIE=Center for Monitoring Indian Economy Consumer Pyramids Household survey. Lower and upper uncertainty bounds for all-cause excess deaths estimates are included in this table, when provided in the study.

[1] COVID-19 Reported Deaths are obtained from, unless otherwise noted.

[2] Underreporting Factor is computed as Excess Deaths divided by COVID-19 Reported Deaths, unless otherwise noted.

[3] Excess Deaths, as well as COVID-19 Reported Deaths, are directly reported in this study.

[4] Underreporting Factor (URF), as well as COVID-19 Reported Deaths are directly reported in this study. Hence, the URF in this table is the precalculated estimate provided.

[5] The COVID-19 Reported Deaths provided in this study are across select states in the Civil Registration System (CRS).

[6] The precalculated Underreporting Factor and COVID-19 Reported Deaths reported in this study are through September 2021.
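The arithmetic behind note [2] can be checked in a few lines. The sketch below recomputes the underreporting factor for three rows of Table 1 from their point estimates; the function name is an illustrative assumption. Because the table's excess death figures are rounded to one decimal place, recomputed bounds and some directly reported factors may differ slightly from the published values.

```python
def underreporting_factor(excess_deaths, reported_covid_deaths):
    """Excess deaths divided by reported COVID-19 deaths (note [2])."""
    if reported_covid_deaths <= 0:
        raise ValueError("reported deaths must be positive")
    return excess_deaths / reported_covid_deaths

# Point estimates taken from Table 1 (excess deaths converted from millions).
urf_guilmoto = underreporting_factor(3.2e6, 458_900)  # Indian Railways row
urf_wang = underreporting_factor(4.0e6, 481_080)
urf_who = underreporting_factor(4.7e6, 481_080)

print(round(urf_guilmoto, 1), round(urf_wang, 1), round(urf_who, 1))  # 7.0 8.3 9.8
```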

Why should we care about all of this now?

Mortality and cause-specific mortality are important public health metrics even when we are not in a crisis. The pandemic has drawn public attention to the relevance of these measures, and the public has a right to accurate information. The call for accurate mortality data is not just about paying respect to the dead; these measures have serious implications for the lives of the living. To prioritize future healthcare resources, to rank and compare the performance and needs of different countries, and to appreciate the magnitude of this pandemic's impact on human life, we need to know more about the true excess mortality rate during the pandemic.

If complete and reliable data were available worldwide, the role of statistical models would be limited. Until we have such comprehensive data systems in place, we need estimates curated by dispassionate data scientists and expert panels that are free of confirmation bias and political agenda. The pandemic has opened an opportunity for all of us to work together to build robust and resilient mortality surveillance systems, with the goal of having death data disaggregated by age and sex available at a weekly level for all countries in the world. We have many good models; we desperately need good data from every country in the world!

Bhramar Mukherjee is John D Kalbfleisch Collegiate Professor and Chair of Biostatistics, and Professor of Epidemiology and Global Public Health, at the University of Michigan School of Public Health. In a faculty career spanning over two decades, Bhramar has co-authored more than 320 papers in Biostatistics, Epidemiology, Public Health and Medicine journals. She and her team have been modeling SARS-CoV-2 transmission in India since March 2020.
