A Comparison of Three Popular Methods for Handling Missing Data: Complete-Case Analysis, Inverse Probability Weighting, and Multiple Imputation
Key takeaways
Useful material on inverse probability weighting (IPW). It also provides an adequate argument for leaving IPW out of my chapter on handling missing data.
Bibliography: Little, R.J., Carpenter, J.R., Lee, K.J., 2022. A Comparison of Three Popular Methods for Handling Missing Data: Complete-Case Analysis, Inverse Probability Weighting, and Multiple Imputation. Sociological Methods & Research 004912412211138. https://doi.org/10.1177/00491241221113873
Authors:: Roderick J. Little, James R. Carpenter, Katherine J. Lee
Collections:: To Read
First-page:
Missing data are a pervasive problem in data analysis. Three common methods for addressing the problem are (a) complete-case analysis, where only units that are complete on the variables in an analysis are included; (b) weighting, where the complete cases are weighted by the inverse of an estimate of the probability of being complete; and (c) multiple imputation (MI), where missing values of the variables in the analysis are imputed as draws from their predictive distribution under an implicit or explicit statistical model, the imputation process is repeated to create multiple filled-in data sets, and analysis is carried out using simple MI combining rules. This article provides a non-technical discussion of the strengths and weakness of these approaches, and when each of the methods might be adopted over the others. The methods are illustrated on data from the Youth Cohort (Time) Series (YCS) for England, Wales and Scotland, 1984–2002.
content: "@littleComparisonThreePopular2022" -file:@littleComparisonThreePopular2022
Reading notes
Annotations
(08/05/2024, 21:21:36)
“CC is often described as biased unless the data are missing completely at random, as defined in the next section, and IPW is widely viewed as” (Little et al., 2022, p. 4)
“for reducing the bias of CC analysis. However, for some problems, CC analysis is actually less biased than IPW.” (Little et al., 2022, p. 5)
“If the complete cases are not a random subsample, CC will give biased answers for simple summary measures (such as mean, sd) and may yield biased answers for regression models, although not in all situations, as discussed below. Secondly, CC discards information in the incomplete cases, which has typically cost non-trivial resources to collect, and which will often contain information for reducing bias or increasing the efficiency of CC estimates.” (Little et al., 2022, p. 6)
“A modification of CC, commonly used to handle unit nonresponse in surveys, is inverse probability weighting (IPW), which weights complete units by the inverse of an estimate of the probability of response (see e.g. Seaman and White 2011).” (Little et al., 2022, p. 7)
“In particular, when estimating a population mean, the sample mean is replaced by the weighted mean. IPW can also be applied to estimators other than means, such as regression coefficients, or more generally, estimators for generalized estimating equations (weighted GEE).” (Little et al., 2022, p. 7)
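My own minimal sketch of the weighted mean described in this quote, not code from the paper. The outcome values `y` and the estimated completeness probabilities `p_hat` are made-up numbers, and `ipw_mean` is a name I chose for illustration.

```python
def ipw_mean(values, response_probs):
    """Mean of the complete cases, each weighted by 1 / P(being complete)."""
    weights = [1.0 / p for p in response_probs]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


# Hypothetical complete cases: outcome y and an estimated probability of
# being complete (assumed to come from some separate response model).
y = [10.0, 12.0, 20.0, 22.0]
p_hat = [0.8, 0.8, 0.4, 0.4]

cc_mean = sum(y) / len(y)         # unweighted complete-case mean
weighted_mean = ipw_mean(y, p_hat)  # up-weights cases that were less likely to be complete
```

In this toy data the cases with higher outcomes were less likely to be complete, so the weighted mean sits above the complete-case mean.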
“With more extensive background information, a generalization of adjustment cell weighting is response propensity stratification, where (a) the indicator for unit nonresponse is regressed on the background variables, using the combined data for respondents and nonrespondents, using a method such as logistic regression appropriate for a binary outcome; (b) a predicted response probability is computed for each respondent based on the regression in (a); and (c) adjustment cells are formed based on a categorized version of the predicted response probability.” (Little et al., 2022, p. 7)
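A rough sketch of steps (a) to (c), for my own understanding rather than as the authors' implementation. Assumptions on my part: a single background variable `x`, a hand-rolled logistic regression fitted by gradient ascent, equal-sized cells formed by ranking all units on predicted probability, and respondent weights set to the inverse of the observed response rate in each cell. All names and data are hypothetical.

```python
import math


def fit_logistic(x, r, steps=5000, lr=0.1):
    """(a) Fit P(r = 1 | x) = sigmoid(b0 + b1 * x) by plain gradient ascent."""
    b0 = b1 = 0.0
    n = len(x)
    for _ in range(steps):
        g0 = g1 = 0.0
        for xi, ri in zip(x, r):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            g0 += ri - p
            g1 += (ri - p) * xi
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1


def propensity_cell_weights(x, r, n_cells=2):
    """Return (cell index per unit, weight per unit; None for nonrespondents)."""
    b0, b1 = fit_logistic(x, r)
    # (b) predicted response probability for every unit
    p_hat = [1.0 / (1.0 + math.exp(-(b0 + b1 * xi))) for xi in x]
    # (c) rank units by predicted probability and cut into equal-sized cells
    order = sorted(range(len(x)), key=lambda i: p_hat[i])
    cell = [0] * len(x)
    for rank, i in enumerate(order):
        cell[i] = min(rank * n_cells // len(x), n_cells - 1)
    # Respondents get the inverse of the observed response rate in their cell.
    weights = [None] * len(x)
    for c in range(n_cells):
        members = [i for i in range(len(x)) if cell[i] == c]
        if not members:
            continue
        rate = sum(r[i] for i in members) / len(members)
        for i in members:
            if r[i] == 1:
                weights[i] = 1.0 / rate
    return cell, weights


# Hypothetical data: x is a fully observed background variable,
# r = 1 for respondents and 0 for nonrespondents.
x = [0, 0, 0, 0, 1, 1, 1, 1]
r = [1, 0, 0, 1, 1, 1, 1, 0]
cell, weights = propensity_cell_weights(x, r)
```

With these cell-based weights the respondent weights sum back to the full sample size, which is a quick sanity check on the construction.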
“Although IPW can be useful for reducing nonresponse bias, it does have serious limitations. First, information in the incomplete cases is only used to determine the weights (i.e. the weight model uses variables that are fully observed on both respondents and non-respondents), and partially observed cases are still discarded in the weighted analysis. This fact means that the method is generally inefficient when, as will often be the case, there is substantial information in these partially observed cases. Therefore, weighted estimates can have unacceptably high variance, especially when extreme values of a variable are given large weights.” (Little et al., 2022, pp. 7–8) Important justification for not using or constructing these weights for NCDS/BCS.
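To make the point about large weights concrete for myself, here is a small sketch using Kish's approximate effective sample size, (Σw)² / Σw². This diagnostic is my addition and is not taken from the paper; it is only a rough guide to how much unequal weights inflate variance. The weight vectors are made up.

```python
def kish_effective_n(weights):
    """Kish's approximate effective sample size: (sum of w)^2 / (sum of w^2)."""
    total = sum(weights)
    return total * total / sum(w * w for w in weights)


# Hypothetical weights for ten complete cases.
equal_weights = [1.0] * 10            # no weighting
skewed_weights = [1.0] * 9 + [20.0]   # one case carries an extreme weight

n_eff_equal = kish_effective_n(equal_weights)
n_eff_skewed = kish_effective_n(skewed_weights)
```

Under this approximation the ten equally weighted cases count as ten, while a single extreme weight cuts the effective sample size to roughly two.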
“Variance estimation for weighted estimates will ideally take into account uncertainty in the estimated weights, otherwise standard errors will be overestimated so inferences will be conservative.” (Little et al., 2022, p. 8)