A Primer on Maximum Likelihood Algorithms Available for Use With Missing Data

Key takeaways

Bibliography: Enders, C.K., 2001. A Primer on Maximum Likelihood Algorithms Available for Use With Missing Data. Structural Equation Modeling: A Multidisciplinary Journal 8, 128–141. https://doi.org/10.1207/S15328007SEM0801_7

Authors:: Craig K. Enders

Collections:: Methods


Reading notes

Annotations

(08/05/2024, 18:28:41)

“3 maximum likelihood algorithms are currently available in existing software packages: the multiple-group approach, full information maximum likelihood estimation, and the EM algorithm.” (Enders, 2001, p. 128)

“Until recently, the analysis of data with missing observations has been dominated by listwise (LD) and pairwise (PD) deletion methods (Kim & Curry, 1977; Roth, 1994).” (Enders, 2001, p. 128)

“three maximum likelihood (ML) estimation algorithms for use with missing data are currently available: the multiple-group approach (Allison, 1987; Muthén, Kaplan, & Hollis, 1987) can be implemented using existing structural equation modeling (SEM) software …” (Enders, 2001, p. 128)

“The theoretical benefits of ML estimation are widely known (Little & Rubin, 1987), and simulation studies have suggested that ML algorithms may be superior to traditional ad hoc missing-data techniques in many cases (Arbuckle, 1996; Enders & Bandalos, in press; Muthén et al., 1987; Wothke, 2000).” (Enders, 2001, p. 129)

“The distinction between MCAR and MAR is important, because the most common missing-data methods (LD and PD) will only yield unbiased parameter estimates when MCAR holds, and this assumption is frequently not met in practice (Muthén et al., 1987). In contrast, ML methods should yield unbiased estimates under both the MCAR and MAR assumptions.” (Enders, 2001, p. 130)
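
To keep the distinction straight, here is my summary in Rubin's (1976) notation (not an equation from the article): R is the missingness indicator, and the data partition into observed and missing parts.

```latex
% Missingness mechanisms (my summary in Rubin's notation,
% not quoted from Enders, 2001):
\text{MCAR:} \quad p(R \mid Y_{\mathrm{obs}}, Y_{\mathrm{mis}}) = p(R)
\qquad
\text{MAR:} \quad p(R \mid Y_{\mathrm{obs}}, Y_{\mathrm{mis}}) = p(R \mid Y_{\mathrm{obs}})
```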

“An early method for obtaining ML parameter estimates in the presence of missing data was given by Hartley and Hocking (1971). The application of this method to SEM analyses was outlined by Allison (1987) …” (Enders, 2001, p. 131)

“In this procedure, a sample is divided into G subgroups, such that each subgroup has the same pattern of missing data. That is, observations within each of the G subgroups have the same set of variables present and missing. A likelihood function is computed for each of the G groups, and the groupwise likelihood functions are accumulated across the entire sample and maximized. Although mathematically unrelated, this algorithm is loosely analogous to PD; a subgroup gi contributes to the estimation of all parameters that involve the observed data points for that group but does not contribute to parameters that involve missing-data points.” (Enders, 2001, p. 132)
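
My reconstruction of the accumulated fit function (in the spirit of Allison, 1987; not copied from the annotation): each missing-data pattern contributes the multivariate-normal log-likelihood of just its observed variables.

```latex
% Multiple-group log likelihood summed over the G missing-data
% patterns (my reconstruction, not an equation from the article).
% n_g: cases in pattern g; S_g, \bar{y}_g: sample moments of the
% variables observed in pattern g; \mu_g, \Sigma_g: matching
% submatrices of the model-implied mean vector and covariance matrix.
\log L = C - \frac{1}{2} \sum_{g=1}^{G} n_g
  \left[ \log\lvert\Sigma_g\rvert
       + \operatorname{tr}\!\left(\Sigma_g^{-1} S_g\right)
       + (\bar{y}_g - \mu_g)' \, \Sigma_g^{-1} (\bar{y}_g - \mu_g) \right]
```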

“Despite the wide availability of the LISREL program at the time, the multiple-group method of missing-data analysis had practical limitations that prevented its widespread use. As pointed out by Arbuckle (1996), the LISREL specification for the multiple-group approach required an exceptional level of expertise and thus was practically limited to situations in which there are only a small number of missing-data patterns.” (Enders, 2001, p. 133)

“Despite the technical difficulties associated with its implementation, the multiple-group approach does have advantages. First, the method can be used to estimate both just-identified (e.g., correlation, regression) and over-identified (e.g., SEM) model parameters.” (Enders, 2001, p. 133)

“Second, it is important to note that the multiple-group approach does not estimate, or impute, missing observations, but yields direct estimates of model parameters and standard errors.” (Enders, 2001, p. 133)

“Third, the multiple-group approach yields the usual chi-square test statistic for model fit, although the degrees of freedom and accompanying p value are incorrect due to the use of dummy values in the input covariance matrices of subsamples with missing variance/covariance elements.” (Enders, 2001, p. 133) Reason not to use this over FIML: the likelihood operates at the group level instead of the individual level.

“FIML approach was originally outlined by Finkbeiner (1979) for use with factor analysis and is similar to the multiple-group method, except that a likelihood function is calculated at the individual, rather than the group, level. For this reason, the FIML approach has been referred to as raw maximum likelihood estimation (Duncan, Duncan, & Li, 1998; Graham, Hofer, & MacKinnon, 1996).” (Enders, 2001, p. 134)

“Like the multiple-group approach, the FIML algorithm is conceptually analogous to PD (although mathematically unrelated) in the sense that all available data is used for parameter estimation. An examination of the individual-level likelihood function illustrates this point. Assuming multivariate normality, the casewise likelihood of the observed data is obtained by maximizing the function” (Enders, 2001, p. 134)
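
The annotation cuts off before the function itself. Reconstructing from the standard multivariate-normal form (my rendering, not a verbatim copy of Enders's equation):

```latex
% Casewise log likelihood under FIML (my reconstruction; the
% annotation omits the equation). K_i is a constant that depends on
% the number of variables observed for case i; y_i, \mu_i, \Sigma_i
% contain only the elements for those observed variables.
\log L_i = K_i - \frac{1}{2}\log\lvert\Sigma_i\rvert
               - \frac{1}{2}(y_i - \mu_i)' \, \Sigma_i^{-1} (y_i - \mu_i)
```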

“Based on the previous examples, the mathematical similarities between the multiple-group and FIML algorithms should be apparent; the primary difference is that FIML fitting function is the sum of n casewise likelihood values, whereas the multiple-group function is the sum of G groupwise likelihood values.” (Enders, 2001, p. 135)
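
A minimal numpy sketch of that casewise summation (mine, not code from the article; `fiml_loglik` and the data layout are my assumptions). Maximizing this over both arguments with a generic optimizer would give saturated-model FIML estimates of the mean vector and covariance matrix.

```python
import numpy as np

def fiml_loglik(data, mu, sigma):
    """Sum of casewise log likelihoods under multivariate normality.

    data : (n, p) array with np.nan marking missing entries
    mu, sigma : candidate mean vector and covariance matrix

    For each case, only the rows/columns of mu and sigma matching the
    observed variables enter the likelihood -- every case contributes
    whatever data it has, and nothing is imputed.
    """
    total = 0.0
    for row in data:
        obs = ~np.isnan(row)            # observed variables for this case
        if not obs.any():
            continue                    # a fully missing case adds nothing
        y_i = row[obs]
        mu_i = mu[obs]
        sigma_i = sigma[np.ix_(obs, obs)]
        diff = y_i - mu_i
        _, logdet = np.linalg.slogdet(sigma_i)
        k = obs.sum()
        total += -0.5 * (k * np.log(2 * np.pi) + logdet
                         + diff @ np.linalg.solve(sigma_i, diff))
    return total
```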

“First, like the multiple-group approach, one of the advantages of the FIML algorithm is its applicability to both just-identified and over-identified models.” (Enders, 2001, p. 135)

“As such, the method is quite general and can be applied to a wide variety of analyses, including the estimation of means, covariance matrices, multiple regression, and SEM.” (Enders, 2001, p. 135)

“Second, when used in SEM applications, FIML yields a chi-square test of model fit. However, the chi-square statistic generated by FIML does not take the usual form F(N – 1), where F is the value of the fitting function. Clearly, the chi-square test cannot be calculated in the normal fashion, as there is no single value of N that is applicable to the entire sample. Also, unlike the usual SEM fitting functions, there is no minimum value associated with the FIML log-likelihood function, although the value of this statistic will increase as model fit worsens. Instead, a chi-square test for model fit is calculated as the difference in log-likelihood functions between the unrestricted (H1) and restricted (H0) models with degrees of freedom equal to the difference in the number of estimated parameters between the two models.” (Enders, 2001, p. 135)
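
In sketch form (my code, assuming the two log-likelihood values and parameter counts are already in hand; the restricted model is the hypothesized SEM, the unrestricted model is the saturated model fit to the same incomplete data):

```python
from scipy import stats

def fiml_chi_square(loglik_restricted, loglik_unrestricted,
                    n_params_restricted, n_params_unrestricted):
    """Likelihood-ratio chi-square test of model fit under FIML."""
    chi2 = 2.0 * (loglik_unrestricted - loglik_restricted)
    df = n_params_unrestricted - n_params_restricted
    p = stats.chi2.sf(chi2, df)
    return chi2, df, p
```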

“Third, although many popular fit indexes can be computed under FIML, the specification of a means structure (required for estimation) renders certain fit indexes undefined (e.g., GFI).” (Enders, 2001, p. 135)

“Fourth, similar to PD, indefinite covariance matrices are a potential byproduct of the FIML approach. However, Wothke (2000) suggested that indefiniteness problems are less pervasive with FIML than with PD.” (Enders, 2001, p. 135)

“Fifth, unlike the EM algorithm (discussed in the following), standard error estimates are obtained directly from the analysis, and bootstrapping is not necessary. Finally, it is important to note that the FIML algorithm does not impute missing values; only model parameters are estimated.” (Enders, 2001, p. 135)

“The EM algorithm uses a two-step iterative procedure where missing observations are filled in, or imputed, and unknown parameters are subsequently estimated.” (Enders, 2001, pp. 135–136)

“In the first step (the E step), missing values are replaced with the conditional expectation of the missing data given the observed data and an initial estimate of the covariance matrix” (Enders, 2001, p. 136)

“Using the observed and imputed values, the sums and sums of squares and cross products are calculated.” (Enders, 2001, p. 136)
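
A compact sketch of both steps for a mean vector and covariance matrix (my implementation of the textbook EM for multivariate-normal data, not code from the article; `em_mvn` is my name). Note the conditional-covariance correction in the sums of squares and cross products, which is what keeps the M step from underestimating variances.

```python
import numpy as np

def em_mvn(data, n_iter=100, tol=1e-6):
    """EM estimates of mu and Sigma for multivariate-normal data.

    data : (n, p) array with np.nan marking missing entries
    """
    n, p = data.shape
    # Start from nan-aware moments.
    mu = np.nanmean(data, axis=0)
    sigma = np.diag(np.nanvar(data, axis=0))

    for _ in range(n_iter):
        filled = np.where(np.isnan(data), 0.0, data)
        # Accumulates the conditional covariance of the missing values
        # given the observed ones (the correction term).
        correction = np.zeros((p, p))

        for i in range(n):
            miss = np.isnan(data[i])
            if not miss.any():
                continue                    # complete case: nothing to do
            obs = ~miss
            if not obs.any():
                filled[i] = mu              # fully missing case
                correction += sigma
                continue
            s_oo = sigma[np.ix_(obs, obs)]
            s_mo = sigma[np.ix_(miss, obs)]
            # E step: conditional expectation of the missing values
            # given the observed values (a regression prediction).
            beta = np.linalg.solve(s_oo, s_mo.T).T
            filled[i, miss] = mu[miss] + beta @ (data[i, obs] - mu[obs])
            correction[np.ix_(miss, miss)] += (
                sigma[np.ix_(miss, miss)] - beta @ s_mo.T
            )

        # M step: re-estimate the moments from the completed data.
        mu_new = filled.mean(axis=0)
        centered = filled - mu_new
        sigma_new = (centered.T @ centered + correction) / n

        delta = np.max(np.abs(sigma_new - sigma))
        mu, sigma = mu_new, sigma_new
        if delta < tol:
            break

    return mu, sigma
```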

“First, unlike the multiple-group and FIML approaches, the EM algorithm cannot be used to obtain direct estimates of linear model parameters (e.g., regression, SEM); as currently implemented, the EM algorithm can only be used to obtain ML estimates of a mean vector and covariance matrix.” (Enders, 2001, p. 137)

“As a result, standard errors from subsequent analyses will be negatively biased to some extent, and bootstrap (Efron, 1981) procedures must be employed to obtain correct estimates.” (Enders, 2001, p. 137)
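
E.g., a generic case-resampling bootstrap (my sketch of the Efron-style procedure; `estimator` is a hypothetical wrapper that would rerun the whole EM-then-model pipeline on each resample):

```python
import numpy as np

def bootstrap_se(data, estimator, n_boot=500, seed=0):
    """Bootstrap standard errors for parameters estimated via EM.

    estimator : function mapping a resampled (n, p) data array to a
        1-D vector of parameter estimates (e.g., run em_mvn on the
        resample, then fit the substantive model to its output).
    """
    rng = np.random.default_rng(seed)
    n = data.shape[0]
    estimates = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)   # resample cases with replacement
        estimates.append(estimator(data[idx]))
    return np.std(np.asarray(estimates), axis=0, ddof=1)
```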

“Despite the difficulties previously noted, the EM algorithm may be preferred in situations where the missing-data mechanism (i.e., the variables are assumed to influence missingness) is not included in the linear model being tested. This is because the MAR assumption discussed previously is defined relative to the analyzed variables in a given data set. For example, if the missing values on a variable Y are dependent on the values of another variable X, the MAR assumption no longer holds if X is not included in the ultimate analysis. This is clearly problematic for the two direct estimation algorithms, as X must be incorporated in the substantive model for MAR to be tenable. However, this is not the case with the EM algorithm, as the input covariance matrix used to estimate substantive model parameters may be a subset of a larger covariance matrix produced from an EM analysis.” (Enders, 2001, p. 137) Distinction to not use this in my case...