Handling Missing Data by Maximum Likelihood
Key takeaways
(file:///C:\Users\scott\Downloads\Allison_2012_312-2012.pdf)
Bibliography: Allison, P., 2012. Handling Missing Data by Maximum Likelihood. SAS Global Forum.
Authors:: Paul Allison
Collections:: To Read, Methods
First-page: 1
Abstract
Multiple imputation is rapidly becoming a popular method for handling missing data, especially with easy-to-use software like PROC MI. In this paper, however, I argue that maximum likelihood is usually better than multiple imputation for several important reasons. I then demonstrate how maximum likelihood for missing data can readily be implemented with the following SAS® procedures: MI, MIXED, GLIMMIX, CALIS and QLIM.
Citations
content: "@allisonHandlingMissingData2012" -file:@allisonHandlingMissingData2012
Reading notes
Imported on 2024-05-07 19:34
⭐ Important
- & software like PROC MI. In this paper, however, I argue that maximum likelihood is usually better than multiple (p. 1)
- & 1. Introduce random variation into the process of imputing missing values, and generate several data sets, each with slightly different imputed values. 2. Perform an analysis on each of the data sets. 3. Combine the results into a single set of parameter estimates, standard errors, and test statistics. (p. 2)
- & For both multiple imputation and maximum likelihood, it is often desirable to incorporate auxiliary variables into the imputation or modeling process. Auxiliary variables are those that are not intended to be in the final model. Ideally, such variables are at least moderately correlated with the variables in the model that have missing data. By including auxiliary variables into the imputation model, we can reduce the uncertainty and variability in the imputed values. This can substantially reduce the standard errors of the estimates in our final model. Auxiliary variables can also reduce bias by getting us to a closer approximation of the MAR assumption. (p. 4)
- & Let W be a measure of annual income and let X be a vector of observed variables that will go into the final model, along with W. Suppose that 30% of the cases are missing income, and suppose that we have reason to suspect that persons with high income are more likely to be missing income. Letting R be a response indicator for W, we can express this suspicion as Pr(R = 1 | X, W) = f(X, W). That is, the probability that W is missing depends on both X and W, which would be a violation of the MAR assumption. (p. 4)
- & With or without missing data, the first step in ML estimation is to construct the likelihood function. Suppose that we have n independent observations (i = 1, …, n) on k variables (y_{i1}, y_{i2}, …, y_{ik}) and no missing data. The likelihood function is L = ∏_{i=1}^{n} f(y_{i1}, y_{i2}, …, y_{ik}; θ) (p. 5)
- & Now suppose that for a particular observation i, the first two variables, y1 and y2, have missing data that satisfy the MAR assumption. (More precisely, the missing data mechanism is assumed to be ignorable.) The joint probability for that observation is just the probability of observing the remaining variables, y_{i3} through y_{ik}. If y1 and y2 are discrete, this is the joint probability above summed over all possible values of the two variables with missing data: f*(y_{i3}, …, y_{ik}; θ) = Σ_{y1} Σ_{y2} f(y_{i1}, y_{i2}, y_{i3}, …, y_{ik}; θ). If the missing variables are continuous, we use integrals in place of summations: f*(y_{i3}, …, y_{ik}; θ) = ∫∫ f(y_{i1}, y_{i2}, y_{i3}, …, y_{ik}; θ) dy_1 dy_2. Essentially, then, for each observation's contribution to the likelihood function, we sum or integrate over the variables that have missing data, obtaining the marginal probability of observing those variables that have actually been observed. (p. 5)
- & Obviously, that’s not possible. But we can get very close to full efficiency with a relatively small number of data sets. As Rubin (1987) showed, for moderate amounts of missing data, you can get over 90% efficiency with just five data sets. (p. 5)
- & Because MI involves random draws, there is an inherent indeterminacy in the results. Every time you apply it to a given set of data, you will get different parameter estimates, standard errors, and test statistics. (p. 5)
- & To implement multiple imputation, you must decide: a. Whether to use the MCMC method or the FCS method. b. If you choose FCS, what models or methods to use for each variable with missing data. c. How many data sets to produce, and whether the number you’ve chosen is sufficient. d. How many iterations between data sets. e. What prior distributions to use. f. How to incorporate interactions and non-linearities. g. Which of three methods to use for multivariate testing. (p. 6)
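The three MI steps highlighted above (impute with random variation, analyze each completed data set, combine the results) can be sketched in Python. The toy data, the regression-based imputation model, and the seed are illustrative assumptions, not from the paper; only the three-step structure and Rubin's combining rules follow the quoted passage:

```python
import numpy as np

rng = np.random.default_rng(2012)

# Toy data: y depends on x; roughly 30% of x is set to missing.
n = 500
x = rng.normal(0, 1, n)
y = 1.0 + 0.5 * x + rng.normal(0, 1, n)
miss = rng.random(n) < 0.3
x_obs = np.where(miss, np.nan, x)

m = 5                      # number of imputed data sets
betas, var_list = [], []
for _ in range(m):
    # Step 1: impute missing x from a regression of x on y among
    # complete cases, adding random noise so each data set differs.
    ok = ~miss
    b = np.polyfit(y[ok], x_obs[ok], 1)
    resid_sd = np.std(x_obs[ok] - np.polyval(b, y[ok]))
    x_imp = x_obs.copy()
    x_imp[miss] = np.polyval(b, y[miss]) + rng.normal(0, resid_sd, miss.sum())

    # Step 2: analyze each completed data set (regress y on x).
    X = np.column_stack([np.ones(n), x_imp])
    beta, res, *_ = np.linalg.lstsq(X, y, rcond=None)
    sigma2 = res[0] / (n - 2)
    cov = sigma2 * np.linalg.inv(X.T @ X)
    betas.append(beta[1])
    var_list.append(cov[1, 1])

# Step 3: combine with Rubin's rules.
betas, var_arr = np.array(betas), np.array(var_list)
point = betas.mean()              # pooled slope estimate
within = var_arr.mean()           # average within-imputation variance
between = betas.var(ddof=1)       # between-imputation variance
total_var = within + (1 + 1 / m) * between
print(point, np.sqrt(total_var))
```

The between-imputation component in Step 3 is what captures the extra uncertainty due to missingness; a single imputation would omit it and understate the standard error.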
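The marginalization described on p. 5 (sum the joint distribution over the missing variables to get each observation's likelihood contribution) can be illustrated with a discrete toy example. The joint pmf over three binary variables below is entirely made up for illustration:

```python
import itertools
import numpy as np

# Invented joint pmf over three binary variables, indexed p[y1, y2, y3].
p = np.array([[[0.10, 0.05],
               [0.15, 0.10]],
              [[0.05, 0.20],
               [0.10, 0.25]]])
assert np.isclose(p.sum(), 1.0)

def contribution(y):
    """Likelihood contribution of one observation: sum the joint
    pmf over every value of the variables marked missing (None)."""
    missing = [j for j, v in enumerate(y) if v is None]
    total = 0.0
    for vals in itertools.product([0, 1], repeat=len(missing)):
        full = list(y)
        for j, v in zip(missing, vals):
            full[j] = v
        total += p[tuple(full)]
    return total

# Observation with y1 and y2 missing: the contribution reduces to
# the marginal probability of the observed y3 alone, Pr(y3 = 1).
print(contribution([None, None, 1]))
```

With continuous variables, the inner loop would become a numerical integral over the missing dimensions, exactly as in the second equation on p. 5.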
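Rubin's (1987) efficiency claim quoted above comes from the approximation RE = (1 + γ/m)^(-1), where γ is the fraction of missing information and m the number of imputed data sets; a quick check with γ = 0.5 and m = 5:

```python
# Relative efficiency of MI with m imputations versus m = infinity,
# per Rubin (1987): RE = (1 + gamma/m)^(-1).
def relative_efficiency(gamma, m):
    return 1.0 / (1.0 + gamma / m)

# Even with half the information missing, five data sets
# already exceed 90% efficiency.
print(relative_efficiency(0.5, 5))   # → 0.9090909090909091
```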
⛔ Weaknesses and caveats
- ! moderate size has some missing data, usually enough to cause serious concern about what methods should be (p. 1)
💡 Main ideas and conclusions
- $ r statistical properties than traditional methods, while at the same time relying on weaker assumptions. The bad news is that these superior methods have not been widely adopted by practicing researchers. The most likely reason is ignorance. Many researchers have barely even heard of modern methods for handling missing data. And if they have heard of them, they have little idea how to go about implementing them. The other likely reason is difficulty. Modern methods can take considerably more time and effort, especially with regard to start-up costs. Nevertheless, with the development of better software, these methods are getting easier to use every year. There are two major approaches to missing data that have good statistical properties: maximum likelihood (ML) and multiple imputation (MI). Multiple imputation is currently a good deal more popular than maximum likelihood. But in this paper, I argue that maximum likelihood is generally preferable to multiple imputation, at least in those situations where appropriate software is available. And many SAS users are not fully aware of the available procedures for using maximum likelihood to handle missing data. In the next section, we’ll examine some assumptions that are commonly used to justify methods for handling missing data. In the subsequent section, we’ll review the b (p. 1)