Evaluation of Multiple Imputation with Large Proportions of Missing Data: How Much Is Too Much?

#Methods #MissingData #MultipleImputation

Key takeaways

Whilst MI outperforms complete-case analysis (CCA) on both absolute bias and RMSE as missingness increases, this does not address whether the data are usable in the first place. A variable with a large share of MAR missingness could be 'worthless' under either CCA or MI. It does, however, show that MI is the better choice whenever there is one.

Bibliography: Hyuk Lee, J., Huber Jr., J.C., 2021. Evaluation of Multiple Imputation with Large Proportions of Missing Data: How Much Is Too Much? Iranian Journal of Public Health 50(7). https://doi.org/10.18502/ijph.v50i7.6626

Authors:: Jin Hyuk Lee, J. Charles Huber Jr.

Collections:: Methods

Abstract

Background: Multiple Imputation (MI) is known as an effective method for handling missing data in public health research. However, it is not clear that the method will be effective when the data contain a high percentage of missing observations on a variable.

Methods: Using data from “Predictive Study of Coronary Heart Disease” study, this study examined the effectiveness of multiple imputation in data with 20% missing to 80% missing observations using absolute bias (|bias|) and Root Mean Square Error (RMSE) of MI measured under Missing Completely at Random (MCAR), Missing at Random (MAR), and Not Missing at Random (NMAR) assumptions.

Results: The |bias| and RMSE of MI was much smaller than of the results of CCA under all missing mechanisms, especially with a high percentage of missing. In addition, the |bias| and RMSE of MI were consistent regardless of increasing imputation numbers from M=10 to M=50. Moreover, when comparing imputation mechanisms, MCMC method had universally smaller |bias| and RMSE than those of Regression method and Predictive Mean Matching method under all missing mechanisms.

Conclusion: As missing percentages become higher, using MI is recommended, because MI produced less biased estimates under all missing mechanisms. However, when large proportions of data are missing, other things need to be considered such as the number of imputations, imputation mechanisms, and missing data mechanisms for proper imputation.
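A minimal Python sketch of how a variable might be made missing at a target proportion under each of the three mechanisms named in the abstract. Everything here is an assumption for illustration: the helper `make_missing`, the synthetic age/SBP data, and the rank-plus-noise deletion scheme are not taken from the paper, whose amputation procedure this note does not describe.

```python
import random
import statistics

def make_missing(values, covariate, prop, mechanism, seed=0):
    """Hypothetical helper: blank out round(prop * n) entries of `values`.

    MCAR: which entries go missing is unrelated to any data.
    MAR:  missingness depends on the fully observed `covariate`.
    NMAR: missingness depends on the (unobserved) value itself.
    """
    rng = random.Random(seed)
    n = len(values)
    k = round(prop * n)
    if mechanism == "MCAR":
        driver = [0.0] * n
    elif mechanism == "MAR":
        driver = list(covariate)
    elif mechanism == "NMAR":
        driver = list(values)
    else:
        raise ValueError(f"unknown mechanism: {mechanism}")
    # Standardise the driver, add noise, then delete the k highest scores,
    # so larger driver values are more likely (not certain) to go missing.
    mu = statistics.fmean(driver)
    sd = statistics.pstdev(driver) or 1.0  # constant driver (MCAR) -> pure noise
    scores = [(d - mu) / sd + rng.gauss(0, 1) for d in driver]
    drop = set(sorted(range(n), key=scores.__getitem__, reverse=True)[:k])
    return [None if i in drop else v for i, v in enumerate(values)]

# Hypothetical data: age fully observed, SBP rising with age.
rng = random.Random(1)
age = [rng.uniform(39, 59) for _ in range(1000)]
sbp = [100 + 0.6 * a + rng.gauss(0, 12) for a in age]

for mech in ("MCAR", "MAR", "NMAR"):
    for prop in (0.2, 0.5, 0.8):
        y = make_missing(sbp, age, prop, mech, seed=2)
        observed = [v for v in y if v is not None]
        print(mech, prop, len(observed), round(statistics.fmean(observed), 2))
```

Under MCAR the observed mean should stay close to the full-data mean; under MAR and NMAR it should drift downward, because records with higher age or higher SBP are preferentially removed.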

Reading notes

Annotations

(08/05/2024, 21:11:08)

“However, it is not clear that the method will be effective when the data contain a high percentage of missing observations on a variable.” (Hyuk Lee and Huber Jr., 2021, p. 1372)

“The |bias| and RMSE of MI was much smaller than of the results of CCA under all missing mechanisms, especially with a high percentage of missing. In addition, the |bias| and RMSE of MI were consistent regardless of increasing imputation numbers from M=10 to M=50. Moreover, when comparing imputation mechanisms, MCMC method had universally smaller |bias| and RMSE than those of Regression method and Predictive Mean Matching method under all missing mechanisms.” (Hyuk Lee and Huber Jr., 2021, p. 1372)
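The stability from M=10 to M=50 is easier to read against Rubin's rules, the standard way of combining results from M imputed datasets. The note does not say how the paper pooled its estimates, so treat this as an assumption; `pool_rubin` is a hypothetical helper. M enters the total variance only through the factor (1 + 1/M), which moves from 1.1 at M = 10 to 1.02 at M = 50, so beyond a moderate M extra imputations change little.

```python
import math
import statistics

def pool_rubin(estimates, variances):
    """Combine M per-imputation estimates with Rubin's rules (hypothetical helper)."""
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two imputations and one variance per estimate")
    q_bar = statistics.fmean(estimates)   # pooled point estimate
    w_bar = statistics.fmean(variances)   # mean within-imputation variance
    b = statistics.variance(estimates)    # between-imputation variance (n - 1 denominator)
    t = w_bar + (1 + 1 / m) * b           # total variance
    return q_bar, t, math.sqrt(t)

# Illustrative numbers only, not from the paper.
q, t, se = pool_rubin([10.0, 12.0, 11.0], [1.0, 1.0, 1.0])
print(q, round(t, 4), round(se, 4))
```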

“As missing percentages become higher, using MI is recommended, because MI produced less biased estimates under all missing mechanisms. However, when large proportions of data are missing, other things need to be considered such as the number of imputations, imputation mechanisms, and missing data mechanisms for proper imputation.” (Hyuk Lee and Huber Jr., 2021, p. 1372)

“As shown in Table 1, MI had a lower |bias| and RMSE than CCA under all missing mechanisms.” (Hyuk Lee and Huber Jr., 2021, p. 1374)

“The |bias| and RMSE were obtained using a true parameter estimate (128.63), the mean of SBP at 0% missing.” (Hyuk Lee and Huber Jr., 2021, p. 1374)
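A sketch of the two performance measures, assuming the usual simulation-study definitions over repeated estimates of mean SBP: |bias| is the absolute gap between the average estimate and the true value, and RMSE is the square root of the average squared error. Only the true value (128.63) comes from the note; the replication scheme, the helper names, and the example estimates are hypothetical.

```python
import math
import statistics

TRUE_SBP_MEAN = 128.63  # mean SBP at 0% missing, as quoted in the note

def abs_bias(estimates, truth=TRUE_SBP_MEAN):
    """|bias|: absolute gap between the average estimate and the true value."""
    return abs(statistics.fmean(estimates) - truth)

def rmse(estimates, truth=TRUE_SBP_MEAN):
    """RMSE: square root of the mean squared deviation from the true value."""
    return math.sqrt(statistics.fmean([(e - truth) ** 2 for e in estimates]))

# Hypothetical estimates from repeated analyses.
est = [127.9, 129.4, 128.1, 129.0]
print(round(abs_bias(est), 4), round(rmse(est), 4))
```

The two measures are linked: RMSE squared equals |bias| squared plus the (population) variance of the estimates, so a method can have near-zero |bias| and still a sizeable RMSE if its estimates are noisy.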

“When CCA was employed for the data, the absolute bias and root mean squared error of the CCA was noticeably larger than those of MI.” (Hyuk Lee and Huber Jr., 2021, p. 1377)