Local and Overall Deviance R-Squared Measures for Mixtures of Generalized Linear Models

Key takeaways

(file:///C:\Users\scott\Zotero\storage\BZE9AD5N\Di%20Mari%20et%20al.%20-%202023%20-%20Local%20and%20Overall%20Deviance%20R-Squared%20Measures%20for%20.pdf)

Bibliography: Di Mari, R., Ingrassia, S., Punzo, A., 2023. Local and Overall Deviance R-Squared Measures for Mixtures of Generalized Linear Models. J Classif 40, 233–266. https://doi.org/10.1007/s00357-023-09432-4

Authors:: Roberto Di Mari, Salvatore Ingrassia, Antonio Punzo

Collections:: Methods

First-page: 233

Abstract

In generalized linear models (GLMs), measures of lack of fit are typically defined as the deviance between two nested models, and a deviance-based $R^2$ is commonly used to evaluate the fit. In this paper, we extend deviance measures to mixtures of GLMs, whose parameters are estimated by maximum likelihood (ML) via the EM algorithm. Such measures are defined both locally, i.e., at cluster-level, and globally, i.e., with reference to the whole sample. At the cluster-level, we propose a normalized two-term decomposition of the local deviance into explained and unexplained local deviances. At the sample-level, we introduce an additive normalized decomposition of the total deviance into three terms, where each evaluates a different aspect of the fitted model: (1) the cluster separation on the dependent variable, (2) the proportion of the total deviance explained by the fitted model, and (3) the proportion of the total deviance which remains unexplained. We use both local and global decompositions to define, respectively, local and overall deviance $R^2$ measures for mixtures of GLMs, which we illustrate (for Gaussian, Poisson and binomial responses) by means of a simulation study. The proposed fit measures are then used to assess and interpret clusters of COVID-19 spread in Italy at two time points.

Citations

content: "@dimariLocalOverallDeviance2023" -file:@dimariLocalOverallDeviance2023

Reading notes

Imported on 2024-05-06 13:35

⭐ Important

& In generalized linear models (GLMs), measures of lack of fit are typically defined as the deviance between two nested models, and a deviance-based $R^2$ is commonly used to evaluate the fit. (p. 233)
& In generalized linear models (GLMs), measures of lack of fit are typically defined based on the deviance, which compares the log-likelihoods of two nested GLMs. (p. 233)
& In GLMs, the deviances replace the sums of squares (SS) of ordinary least squares (OLS) regression as the building blocks to define measures of lack of fit to the data of the GLM (p. 235)
& Notably, deviance measures are built from the maximum log-likelihoods of three models: the most parsimonious intercept-only model (null model), (p. 235)
& the model we are interested in (fitted model), and the least parsimonious model, with $n$ parameters, providing a perfect fit (saturated model). The null and saturated models are defined so that $\hat{\mu}_i = \bar{y}$ and $\hat{\mu}_i = y_i$ ($i = 1, \ldots, n$), respectively, with $\bar{y}$ being the sample mean of $Y$. (p. 236)
& Each deviance is a measure of lack of fit, and is obtained as twice the difference between the log-likelihood of one model, compared to the log-likelihood of another (nested) model. (p. 236)
& Therefore, the larger the value of deviance for the nested model, the worse its goodness of fit. The two most used deviances are the null deviance (p. 236)
& The null deviance in Eq. 5 is analogous to the total sum of squares (TSS), that is, the total variation in the dependent variable $Y$ from the OLS regression. This measures the discrepancy between the worst and the best possible models, i.e., all the discrepancy that the (fitted) model can potentially account for. (p. 236)
& The fitted deviance in Eq. 6 is analogous to the residual sum of squares (RSS) from OLS regression. This deviance measures the lack of fit after modeling with $d$ predictors. Even if it is not as widespread in the literature, in principle we could also define a sort of “explained” deviance as (p. 236)
& which compares the null and fitted models, analogously to the explained sum of squares (ESS) from OLS regression. (p. 236)
& Explained and residual deviances allow us to decompose the null deviance as (p. 236)
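
Note to self: the highlights above stop just before the displayed equations. A plausible reconstruction, based only on the notation that survives in Eqs. 9 and 10 and on the "twice the log-likelihood difference" description, is sketched below. The symbol $\ell(\cdot)$ for the log-likelihood evaluated at a vector of means is my own shorthand, and the exact form in the paper (for example how the dispersion $\hat{\phi}$ enters) should be checked against pp. 236-237.

$$
\begin{aligned}
D(\bar{y}, y; \hat{\phi}) &= 2\,[\ell(y) - \ell(\bar{y})] && \text{null deviance (Eq. 5)} \\
D(\hat{\mu}, y; \hat{\phi}) &= 2\,[\ell(y) - \ell(\hat{\mu})] && \text{fitted deviance (Eq. 6)} \\
D(\bar{y}, \hat{\mu}; \hat{\phi}) &= 2\,[\ell(\hat{\mu}) - \ell(\bar{y})] && \text{explained deviance (Eq. 7)} \\
D(\bar{y}, y; \hat{\phi}) &= D(\bar{y}, \hat{\mu}; \hat{\phi}) + D(\hat{\mu}, y; \hat{\phi}) && \text{decomposition (Eq. 8)}
\end{aligned}
$$

Written this way, the decomposition is a telescoping sum of log-likelihood differences, which mirrors TSS = ESS + RSS in OLS.
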
& These pseudo-$R^2$s are based on different definitions of residuals, the most common of which are the raw residuals, the Pearson residuals, and the deviance residuals (Cameron & Windmeijer, 1996). Note that none of these indexes is a goodness of fit measure, in the sense that none can be interpreted as “proportion of variance accounted for,” as in the OLS regression. (p. 236)
& Intuitively, it looks similar to the $R^2 = \mathrm{ESS}/\mathrm{TSS}$ of simple linear regression, (p. 236)
& where the sums of squares are replaced with the deviance measures defined in (5) and (7). Its formula is given by $R^2 = 1 - \frac{D(\hat{\mu}, y; \hat{\phi})}{D(\bar{y}, y; \hat{\phi})}$ (9) $= \frac{D(\bar{y}, \hat{\mu}; \hat{\phi})}{D(\bar{y}, y; \hat{\phi})}$ (10). Due to the two-term decomposition (8), also this index ranges between zero and one. (p. 237)
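
To make the deviance $R^2$ concrete, here is a minimal sketch for a Poisson GLM with a log link and a single predictor. Everything in it is illustrative rather than taken from the paper: the toy data, the function names `poisson_loglik` and `fit_poisson_simple`, and the plain Newton-Raphson fit are all assumptions of this note. It computes the null, fitted, and explained deviances as twice a log-likelihood difference and then evaluates $R^2$ in both the Eq. 9 and the Eq. 10 form.

```python
import math


def poisson_loglik(y, mu):
    """Poisson log-likelihood of counts y under means mu (0 * log(0) treated as 0)."""
    total = 0.0
    for yi, mi in zip(y, mu):
        log_term = yi * math.log(mi) if yi > 0 else 0.0
        total += log_term - mi - math.lgamma(yi + 1)
    return total


def fit_poisson_simple(x, y, n_iter=50):
    """Newton-Raphson ML fit of log(mu_i) = b0 + b1 * x_i; returns fitted means."""
    b0, b1 = math.log(sum(y) / len(y)), 0.0
    for _ in range(n_iter):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Score vector
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum(xi * (yi - mi) for xi, yi, mi in zip(x, y, mu))
        # Fisher information (2 x 2), inverted by hand
        h00 = sum(mu)
        h01 = sum(xi * mi for xi, mi in zip(x, mu))
        h11 = sum(xi * xi * mi for xi, mi in zip(x, mu))
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return [math.exp(b0 + b1 * xi) for xi in x]


# Hypothetical toy data, not from the paper
x = [0, 1, 2, 3, 4, 5, 6, 7]
y = [1, 2, 2, 4, 5, 9, 12, 20]

mu_null = [sum(y) / len(y)] * len(y)   # null model: every mean equals y-bar
mu_fit = fit_poisson_simple(x, y)      # fitted model
mu_sat = [float(yi) for yi in y]       # saturated model: mean equals observation

null_dev = 2 * (poisson_loglik(y, mu_sat) - poisson_loglik(y, mu_null))
fitted_dev = 2 * (poisson_loglik(y, mu_sat) - poisson_loglik(y, mu_fit))
explained_dev = 2 * (poisson_loglik(y, mu_fit) - poisson_loglik(y, mu_null))

r2_eq9 = 1 - fitted_dev / null_dev
r2_eq10 = explained_dev / null_dev
print(f"deviance R^2: {r2_eq9:.4f} (Eq. 9), {r2_eq10:.4f} (Eq. 10)")
```

The two forms agree because the null deviance splits into explained plus fitted deviance. The bound between zero and one relies on the fitted model being the ML fit, so that its log-likelihood sits between those of the null and saturated models.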
& As a general note, the higher the $R^2_j$, the better the $j$th GLM fits the data in the $j$th group. In other words, the larger the fraction of local deviance in group $j$ that is accounted for by the $j$th GLM, the closer the data points are to the fitted cluster’s regression line. With the same principle, it is natural to define the overall deviance $R^2$ as $R^2 = \mathrm{EWD}/\mathrm{WD}$ (30). Intuitively, the overall $R^2$ in Eq. 30 can be interpreted as the proportion of the within deviance explained (accounted for) by the fitted mixture of GLMs. (p. 243)
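
A rough sketch of how local and overall $R^2$ might be computed in the Gaussian case, where the unit deviance reduces to a squared error. The highlights do not record the paper's formulas for the local deviances, so the weighting below is an assumption of this note: each observation contributes to cluster $j$ in proportion to a posterior membership probability, and each component is refitted by weighted least squares. The data, the posterior matrix `post`, and the helper names are hypothetical.

```python
def weighted_line_fit(x, y, w):
    """Weighted least-squares line; returns (intercept, slope, weighted mean of y)."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return ybar - slope * xbar, slope, ybar


def local_deviances(x, y, w):
    """Return (local, explained, residual) weighted deviances for one cluster."""
    intercept, slope, ybar = weighted_line_fit(x, y, w)
    fitted = [intercept + slope * xi for xi in x]
    local = sum(wi * (yi - ybar) ** 2 for wi, yi in zip(w, y))
    explained = sum(wi * (fi - ybar) ** 2 for wi, fi in zip(w, fitted))
    residual = sum(wi * (yi - fi) ** 2 for wi, yi, fi in zip(w, y, fitted))
    return local, explained, residual


# Hypothetical data: two groups with different regression lines
x = [0, 1, 2, 3, 4, 0, 1, 2, 3, 4]
y = [1.1, 2.9, 5.2, 6.8, 9.1, 10.3, 8.7, 8.2, 6.9, 6.1]
# Made-up posterior membership probabilities (rows: observations, columns: clusters)
post = [[0.95, 0.05]] * 5 + [[0.10, 0.90]] * 5

parts = [local_deviances(x, y, [row[j] for row in post]) for j in range(2)]
local_r2 = [explained / local for local, explained, _ in parts]

wd = sum(p[0] for p in parts)    # within deviance: sum of local deviances
ewd = sum(p[1] for p in parts)   # explained within deviance
overall_r2 = ewd / wd            # same ratio as Eq. 30, under the assumptions above

print("local R^2:", [round(r, 4) for r in local_r2])
print("overall R^2:", round(overall_r2, 4))
```

Under these assumptions the overall $R^2$ is a weighted average of the local $R^2_j$, with weights proportional to each cluster's local deviance, so it always lies between the smallest and largest local value.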
& Starting from Eq. 9, and similarly to the adjusted $R^2$ for the OLS regression, the adjusted deviance $R^2$ for GLMs is defined (Guisan & Zimmermann, 2000, p. 167) as $\bar{R}^2 = 1 - \frac{D(\hat{\mu}, y; \hat{\phi}) / [n - (d + 1)]}{D(\bar{y}, y; \hat{\phi}) / (n - 1)} = 1 - \frac{n - 1}{n - (d + 1)} (1 - R^2)$ (32), where $n - (d + 1)$ and $n - 1$ represent the so-called number of degrees of freedom of $D(\hat{\mu}, y; \hat{\phi})$ and $D(\bar{y}, y; \hat{\phi})$, respectively. (p. 243)
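
A small sketch of the adjusted deviance $R^2$ in Eq. 32, written both from the two deviances and from the unadjusted $R^2$. The function names and the numbers in the example are hypothetical; only the formula comes from the highlight above.

```python
def adjusted_deviance_r2(fitted_dev, null_dev, n, d):
    """Adjusted deviance R^2 from the fitted and null deviances (first form of Eq. 32)."""
    df_fitted = n - (d + 1)   # degrees of freedom of the fitted deviance
    df_null = n - 1           # degrees of freedom of the null deviance
    if df_fitted <= 0:
        raise ValueError("need n > d + 1 observations")
    return 1 - (fitted_dev / df_fitted) / (null_dev / df_null)


def adjusted_from_r2(r2, n, d):
    """Same quantity computed from the unadjusted R^2 (second form of Eq. 32)."""
    if n - (d + 1) <= 0:
        raise ValueError("need n > d + 1 observations")
    return 1 - (n - 1) / (n - (d + 1)) * (1 - r2)


# Hypothetical numbers: n = 11 observations, d = 2 predictors
fitted_dev, null_dev, n, d = 5.0, 10.0, 11, 2
r2 = 1 - fitted_dev / null_dev                      # 0.5
adj_a = adjusted_deviance_r2(fitted_dev, null_dev, n, d)
adj_b = adjusted_from_r2(r2, n, d)
print(adj_a, adj_b)  # both 0.375
```

The penalty factor $(n - 1) / [n - (d + 1)]$ grows with $d$, so adding a predictor raises the adjusted value only if it reduces the fitted deviance enough to offset the lost degree of freedom.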
& The primary attractiveness of $\bar{R}^2$ is that it imposes a penalty for adding additional independent variables to the GLM. The second related attractiveness of $\bar{R}^2$ is that it can be used to choose between nested/nonnested GLMs, with the aim of selecting the best set of explanatory variables (variable/model selection). (p. 244)