@Mood2010

Logistic Regression: Why We Cannot Do What We Think We Can Do, and What We Can Do About It

(2010) - C. Mood

Journal: European Sociological Review, 26(1), 67–82 (online publication 9 March 2009)
Link:: https://academic.oup.com/esr/article-lookup/doi/10.1093/esr/jcp006
DOI:: 10.1093/esr/jcp006
Links::
Tags:: #paper #Methods #Logit #OddsRatio #LogOdds
Cite Key:: [@Mood2010]

Abstract

Logistic regression estimates do not behave like linear regression estimates in one important respect: They are affected by omitted variables, even when these variables are unrelated to the independent variables in the model. This fact has important implications that have gone largely unnoticed by sociologists. Importantly, we cannot straightforwardly interpret log-odds ratios or odds ratios as effect measures, because they also reflect the degree of unobserved heterogeneity in the model. In addition, we cannot compare log-odds ratios or odds ratios for similar models across groups, samples, or time points, or across models with different independent variables in a sample. This article discusses these problems and possible ways of overcoming them.

Notes

“They are affected by omitted variables, even when these variables are unrelated to the independent variables in the model. This fact has important implications that have gone largely unnoticed by sociologists. Importantly, we cannot straightforwardly interpret log-odds ratios or odds ratios as effect measures, because they also reflect the degree of unobserved heterogeneity in the model.” (Mood, 2010, p. 67)

“Unobserved heterogeneity is the variation in the dependent variable that is caused by variables that are not observed (i.e. omitted variables).” (Mood, 2010, p. 67)
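The contrast with linear regression can be checked with a small simulation. This is a minimal sketch, not taken from the paper: the coefficients (1.0 and 2.0), the sample size, the seed, and the helper `fit_logit` are all assumptions chosen for illustration. The variable `x2` is drawn independently of `x1`, so omitting it involves no confounding in the usual sense.

```python
import numpy as np

def fit_logit(X, y, n_iter=30):
    """Maximum-likelihood logit via Newton-Raphson; X must include a constant column."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (y - p)                          # score vector
        hess = X.T @ (X * (p * (1.0 - p))[:, None])   # observed information
        beta += np.linalg.solve(hess, grad)
    return beta

rng = np.random.default_rng(0)
n = 100_000
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)   # independent of x1 by construction
const = np.ones(n)

# Binary outcome from a logit model with assumed true coefficients 1.0 (x1) and 2.0 (x2).
p_true = 1.0 / (1.0 + np.exp(-(1.0 * x1 + 2.0 * x2)))
y_bin = (rng.random(n) < p_true).astype(float)
logit_full = fit_logit(np.column_stack([const, x1, x2]), y_bin)
logit_reduced = fit_logit(np.column_stack([const, x1]), y_bin)

# Continuous outcome with the same coefficients, estimated by OLS.
y_lin = 1.0 * x1 + 2.0 * x2 + rng.normal(size=n)
ols_full = np.linalg.lstsq(np.column_stack([const, x1, x2]), y_lin, rcond=None)[0]
ols_reduced = np.linalg.lstsq(np.column_stack([const, x1]), y_lin, rcond=None)[0]

print("logit coefficient on x1, with x2 / without x2:", logit_full[1], logit_reduced[1])
print("OLS coefficient on x1,   with x2 / without x2:", ols_full[1], ols_reduced[1])
```

In this setup the OLS coefficient on `x1` should be close to 1.0 in both specifications, while the logit coefficient shrinks noticeably toward zero once `x2` is left out, even though `x2` is unrelated to `x1`.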

“It is problematic to interpret log-odds ratios (LnOR) or odds ratios (OR) as substantive effects, because they also reflect unobserved heterogeneity.” (Mood, 2010, p. 67)

“It is problematic to compare LnOR or OR across models with different independent” (Mood, 2010, p. 67)

“variables, because the unobserved heterogeneity is likely to vary across models. (iii) It is problematic to compare LnOR or OR across samples, across groups within samples, or over time—even when we use models with the same independent variables—because the unobserved heterogeneity can vary across the compared samples, groups, or points in time.” (Mood, 2010, p. 68)

“None of these curves is inherently ‘wrong’, but they estimate different underlying quantities. Curve (1) represents the population-averaged effect of x1 on P(y = 1), while curves (2), (3), and (4) represent the effect of x1 on P(y = 1) conditional on having a certain value on x2. Hence, in the logistic regression without x2 we obtain the LnOR or OR corresponding to a population-averaged probability curve, while the logistic regression with x2 gives us the LnOR or OR corresponding to a conditional probability curve.” (Mood, 2010, p. 72)
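The distinction between conditional and population-averaged curves can be reproduced without any estimation. The sketch below is an illustration under assumed values, not the paper's own example: `x2` is taken to be binary with equal shares, and the coefficients `b0`, `b1`, `b2` are hypothetical. Averaging the two conditional probability curves over `x2` gives the population-averaged curve, whose slope on the log-odds scale is flatter than the conditional slope `b1`.

```python
import numpy as np

def inv_logit(z):
    return 1.0 / (1.0 + np.exp(-z))

def logit(p):
    return np.log(p / (1.0 - p))

# Hypothetical conditional model: logit P(y=1 | x1, x2) = b0 + b1*x1 + b2*x2,
# with x2 taking the values 0 and 1 with equal probability.
b0, b1, b2 = -1.5, 1.0, 3.0
x1 = np.linspace(-2.0, 2.0, 41)

p_given_x2_0 = inv_logit(b0 + b1 * x1)         # conditional curve for x2 = 0
p_given_x2_1 = inv_logit(b0 + b1 * x1 + b2)    # conditional curve for x2 = 1
p_averaged = 0.5 * p_given_x2_0 + 0.5 * p_given_x2_1   # population-averaged curve

# Numerical slopes of each curve on the log-odds scale.
conditional_slope = np.gradient(logit(p_given_x2_0), x1)
averaged_slope = np.gradient(logit(p_averaged), x1)

mid = len(x1) // 2   # grid index of x1 = 0
print("conditional log-odds slope:", conditional_slope[mid])
print("population-averaged log-odds slope at x1 = 0:", averaged_slope[mid])
```

With these assumed values the averaged slope at x1 = 0 comes out near 0.6, well below the conditional slope of 1, although both curves describe the same data-generating process.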

“In terms of the above example, each individual must be either a boy or a girl and hence the OR or LnOR for IQ conditional on sex comes closer to the individual-level effect than the OR or LnOR for IQ from the bivariate model.” (Mood, 2010, p. 72)

“As a consequence of the above, when using logistic regression, we should be even more cautious to interpret our estimates as causal effects than we are when we use linear regression” (Mood, 2010, p. 72)

“Even if the models include the same variables, they need not predict the outcome equally well in all the compared categories, so different ORs or LnORs in groups, samples, or points in time can reflect differences in effects, but also differences in unobserved heterogeneity.” (Mood, 2010, p. 73)
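The group-comparison problem can be sketched the same way. In the simulation below, which rests entirely on assumed values, two groups share the same true coefficient on `x1` (1.0) but differ in the spread of an omitted variable `u` that is independent of `x1`. The function names, standard deviations, and seed are hypothetical.

```python
import numpy as np

def fit_logit(X, y, n_iter=30):
    """Maximum-likelihood logit via Newton-Raphson; X must include a constant column."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (y - p)
        hess = X.T @ (X * (p * (1.0 - p))[:, None])
        beta += np.linalg.solve(hess, grad)
    return beta

def estimated_slope(rng, n, sd_omitted):
    """Simulate one group and return the logit coefficient on x1 with u left out."""
    x1 = rng.normal(size=n)
    u = rng.normal(scale=sd_omitted, size=n)   # omitted variable, independent of x1
    p = 1.0 / (1.0 + np.exp(-(1.0 * x1 + u)))  # true coefficient on x1 is 1.0 in every group
    y = (rng.random(n) < p).astype(float)
    X = np.column_stack([np.ones(n), x1])
    return fit_logit(X, y)[1]

rng = np.random.default_rng(1)
slope_group_a = estimated_slope(rng, 100_000, sd_omitted=0.5)  # little unobserved heterogeneity
slope_group_b = estimated_slope(rng, 100_000, sd_omitted=3.0)  # much unobserved heterogeneity

print("estimated LnOR for x1, group A:", slope_group_a)
print("estimated LnOR for x1, group B:", slope_group_b)
```

The estimated LnOR in group B should be clearly smaller than in group A, which a naive reading would take as a weaker effect even though the underlying coefficient is identical by construction.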

“Second, if the intention is to use logistic regression or some similar model using a non-linear link function, one must be careful to collect information on variables that are likely to be important for the outcome, even if these are likely to be only weakly, or not at all, related to the independent variables of interest” (Mood, 2010, p. 79)

“complicated by the fact that we often want estimates that simultaneously (i) capture the non-linearity of the relation, (ii) are comparable over groups, samples etc., (iii) are comparable over models, and (iv) indicate conditional effects.” (Mood, 2010, p. 80)
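One quantity often used when comparability across models matters is the average marginal effect (AME) on the probability scale. The sketch below is an assumption-laden illustration rather than a summary of the paper's recommendations: the simulated data, the coefficients, and the helpers `fit_logit` and `average_marginal_effect` are hypothetical. It compares how the logit coefficient and the AME for `x1` react when an independent `x2` is dropped.

```python
import numpy as np

def fit_logit(X, y, n_iter=30):
    """Maximum-likelihood logit via Newton-Raphson; X must include a constant column."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (y - p)
        hess = X.T @ (X * (p * (1.0 - p))[:, None])
        beta += np.linalg.solve(hess, grad)
    return beta

def average_marginal_effect(X, beta, j):
    """Mean over observations of dP(y=1)/dx_j = beta_j * p * (1 - p)."""
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    return float(np.mean(beta[j] * p * (1.0 - p)))

rng = np.random.default_rng(2)
n = 100_000
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)   # independent of x1
p_true = 1.0 / (1.0 + np.exp(-(1.0 * x1 + 2.0 * x2)))
y = (rng.random(n) < p_true).astype(float)

X_full = np.column_stack([np.ones(n), x1, x2])
X_reduced = X_full[:, :2]   # same data with x2 omitted

beta_full = fit_logit(X_full, y)
beta_reduced = fit_logit(X_reduced, y)
ame_full = average_marginal_effect(X_full, beta_full, 1)
ame_reduced = average_marginal_effect(X_reduced, beta_reduced, 1)

print("LnOR for x1, full vs reduced:", beta_full[1], beta_reduced[1])
print("AME for x1,  full vs reduced:", ame_full, ame_reduced)
```

In this simulated setup the two AMEs should come out close to each other while the LnOR differs substantially between the models. The AME is a single average, so it trades away criterion (i), the non-linearity of the relation, in exchange for comparability.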