ML Versus MI for Missing Data With Violation of Distribution Conditions

Key takeaways

(file:///C:\Users\scott\Zotero\storage\HRQEQSP2\Yuan%20et%20al.%20-%202012%20-%20ML%20Versus%20MI%20for%20Missing%20Data%20With%20Violation%20of%20Di.pdf)

Bibliography: Yuan, K.-H., Yang-Wallentin, F., Bentler, P.M., 2012. ML Versus MI for Missing Data With Violation of Distribution Conditions. Sociological Methods & Research 41, 598–629. https://doi.org/10.1177/0049124112460373

Authors:: Ke-Hai Yuan, Fan Yang-Wallentin, Peter M. Bentler

Collections:: Missing Data Sim Paper

First-page: 598

Abstract

Normal-distribution-based maximum likelihood (ML) and multiple imputation (MI) are the two major procedures for missing data analysis. This article compares the two procedures with respects to bias and efficiency of parameter estimates. It also compares formula-based standard errors (SEs) for each procedure against the corresponding empirical SEs. The results indicate that parameter estimates by MI tend to be less efficient than those by ML; and the estimates of variance-covariance parameters by MI are also more biased. In particular, when the population for the observed variables possesses heavy tails, estimates of variance-covariance parameters by MI may contain severe bias even at relative large sample sizes. Although performing a lot better, ML parameter estimates may also contain substantial bias at smaller sample sizes. The results also indicate that, when the underlying population is close to normally distributed, SEs based on the sandwich-type covariance matrix and those based on the observed information matrix are very comparable to empirical SEs with either ML or MI. When the underlying distribution has heavier tails, SEs based on the sandwich-type covariance matrix for ML estimates are more reliable than those based on the observed information matrix. Both empirical results and analysis show that neither SEs based on the observed information matrix nor those based on the sandwich-type covariance matrix can provide consistent SEs in MI. Thus, ML is preferable to MI in practice, although parameter estimates by MI might still be consistent.
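Note to self (my own illustration, not from the paper): a minimal sketch of why information-based and sandwich-type SEs can disagree under heavy tails. It uses complete data and a single parameter, the normal-theory ML estimate of a variance, so it is far simpler than the paper's missing-data setting. For this parameter the information-based SE is `var * sqrt(2 / n)` and the sandwich-type SE is `sqrt((m4 - var**2) / n)`, where `m4` is the fourth central moment. Their ratio is `sqrt((kurtosis - 1) / 2)`, which is 1 for a normal population and larger for heavier tails. The function names, seed, sample size, and the choice of a Student t with 5 degrees of freedom are all my assumptions.

```python
import math
import random


def variance_se_pair(x):
    """Return (ml_variance, information_se, sandwich_se) for a sample x."""
    n = len(x)
    mean = sum(x) / n
    centered_sq = [(xi - mean) ** 2 for xi in x]
    var_ml = sum(centered_sq) / n                  # normal-theory ML estimate (divides by n)
    m4 = sum(c ** 2 for c in centered_sq) / n      # sample fourth central moment
    info_se = var_ml * math.sqrt(2.0 / n)          # SE from the inverse information
    sandwich_se = math.sqrt(max(m4 - var_ml ** 2, 0.0) / n)  # sandwich-type SE
    return var_ml, info_se, sandwich_se


def draw_student_t(rng, df):
    """One Student-t draw: standard normal over sqrt(chi-square / df)."""
    z = rng.gauss(0.0, 1.0)
    chi2 = rng.gammavariate(df / 2.0, 2.0)  # chi-square(df) is Gamma(shape=df/2, scale=2)
    return z / math.sqrt(chi2 / df)


def se_ratio(x):
    """Sandwich-type SE divided by information-based SE for the variance estimate."""
    _, info_se, sandwich_se = variance_se_pair(x)
    return sandwich_se / info_se


rng = random.Random(2012)  # arbitrary seed, chosen only for reproducibility
n = 5000
normal_sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
heavy_sample = [draw_student_t(rng, df=5) for _ in range(n)]

print("SE ratio, normal sample:", round(se_ratio(normal_sample), 2))
print("SE ratio, t(5) sample:  ", round(se_ratio(heavy_sample), 2))
```

A ratio well above 1 for the heavy-tailed sample means the information-based SE understates the sampling variability of the variance estimate, which fits the abstract's remark that sandwich-type SEs are more reliable for ML under heavier tails.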

Citations

content: "@yuanMLMIMissing2012" -file:@yuanMLMIMissing2012

Reading notes

Imported on 2025-04-27 17:52

⭐ Important

& This article compares the two procedures with respects to bias and efficiency of parameter estimates. (p. 598)
& The results indicate that parameter estimates by MI tend to be less efficient than those by ML; and the estimates of variance-covariance parameters by MI are also more biased. (p. 598)
& when the population for the observed variables possesses heavy tails, estimates of variance-covariance parameters by MI may contain severe bias even at relative large sample sizes. (p. 598)
& Although performing a lot better, ML parameter estimates may also contain substantial bias at smaller sample sizes. (p. 598)
& Under the assumption of a correctly specified parametric model and that data are missing at random (MAR), both procedures generate consistent parameter estimates and consistent standard errors (SE; e.g., Little and Rubin 2002; Schafer 1997). (p. 599)
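
Note to self (my own illustration, not from the paper): a small sketch of the consistency claim for ML under MAR, using the simplest case I know of. Here `x` is fully observed, `y` is missing only as a function of `x`, and the data are bivariate normal. In that monotone pattern the normal-theory ML estimate of the mean of `y` has a closed form from the factored likelihood: the complete-case mean of `y` plus the complete-case regression slope times the gap between the full-sample and complete-case means of `x`. The simulated population, the cutoff `x < 0.5`, the seed, and all names are my assumptions.

```python
import random


def mean(values):
    return sum(values) / len(values)


def ml_mean_y_monotone(x_all, pairs):
    """Normal-theory ML estimate of E[y] when x is complete and y is partly missing.

    x_all : every x value, from complete and incomplete cases alike
    pairs : (x, y) tuples for the cases where y was observed
    """
    xc = [p[0] for p in pairs]
    yc = [p[1] for p in pairs]
    xbar_c, ybar_c = mean(xc), mean(yc)
    s_xx = sum((a - xbar_c) ** 2 for a in xc)
    s_xy = sum((a - xbar_c) * (b - ybar_c) for a, b in pairs)
    slope = s_xy / s_xx  # regression of y on x, complete cases only
    # Shift the complete-case mean of y by the slope times the gap in x means.
    return ybar_c + slope * (mean(x_all) - xbar_c)


rng = random.Random(598)  # arbitrary seed, chosen only for reproducibility
n = 20000
x_all, pairs = [], []
for _ in range(n):
    x = rng.gauss(0.0, 1.0)
    y = 0.5 * x + rng.gauss(0.0, 0.75 ** 0.5)  # population mean of y is 0
    x_all.append(x)
    if x < 0.5:  # y is observed only when x < 0.5: depends on x alone, so MAR
        pairs.append((x, y))

complete_case_mean = mean([y for _, y in pairs])
ml_mean = ml_mean_y_monotone(x_all, pairs)
print("complete-case mean of y:", round(complete_case_mean, 3))
print("ML mean of y:           ", round(ml_mean, 3))
```

The complete-case mean is pulled below the true value of 0 because cases with large `x` (and so, on average, large `y`) are dropped, while the ML estimate should land close to 0. The model here is correctly specified and normal, so this shows only the baseline case that the quoted sentence describes, not the distribution violations the paper studies.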