@holmDealingSelectionBias2011

Dealing with selection bias in educational transition models: The bivariate probit selection model

(2011) - Anders Holm, Mads Meier Jæger

Journal: Research in Social Stratification and Mobility
Link:: https://linkinghub.elsevier.com/retrieve/pii/S0276562411000084
DOI:: 10.1016/j.rssm.2011.02.002
Links::
Tags:: #paper #NCDS #Transition #school-to-work #Family #Attainment
Cite Key:: [@holmDealingSelectionBias2011]

Abstract

This paper proposes the bivariate probit selection model (BPSM) as an alternative to the traditional Mare model for analyzing educational transitions. The BPSM accounts for selection on unobserved variables by allowing for unobserved variables which affect the probability of making educational transitions to be correlated across transitions. The BPSM is easy to estimate with standard software. We use simulated and real data to illustrate how the BPSM improves on the traditional Mare model in terms of correcting for selection bias and providing credible estimates of the effect of family background on educational success. We conclude that models which account for selection on unobserved variables and high-quality data are both required in order to estimate credible educational transition models.

Notes

“proposes the bivariate probit selection model (BPSM) as an alternative to the traditional Mare model for analyzing educational transitions” (Holm and Jæger, 2011)

“BPSM accounts for selection on unobserved variables by allowing for unobserved variables which affect the probability of making educational transitions to be correlated across transitions” (Holm and Jæger, 2011)

“We conclude that models which account for selection on unobserved variables and high-quality data are both required in order to estimate credible educational transition models” (Holm and Jæger, 2011)

“Robert Mare’s (1979, 1980, 1981) model of educational transitions represents one of the major methodological contributions to the literature on family background and educational success” (Holm and Jæger, 2011, p. 1)

“Mare suggested to treat educational attainment as a sequence of discrete transitions from lower to higher educational levels and to use a sequential logit model.” (Holm and Jæger, 2011, p. 1)
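
My own shorthand for the sequential logit described above (the notation is mine, not the paper's): with $y_{ik} = 1$ if person $i$ makes transition $k$, and $x_i$ the family background variables, each transition gets its own logit fitted only on those who made the previous one.

$$
\Pr(y_{ik} = 1 \mid y_{i,k-1} = 1, x_i) = \frac{\exp(x_i'\beta_k)}{1 + \exp(x_i'\beta_k)}, \qquad k = 1, \dots, K
$$

The "waning" pattern discussed in the paper would then show up as $|\beta_k|$ shrinking as $k$ increases.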

“principal advantages of Mare’s educational transition model are that, first, the model is invariant to changes over time in the overall distribution of education, second, the model conforms better to the way most sociologists think about educational attainment (as a sequence of transitions) and, third, it allows researchers to model the effect of family background variables on the probability of making successive educational transitions” (Holm and Jæger, 2011, p. 1)

“One of the consistent findings from applied research using the Mare model is that the effect of family background variables tends to decrease or “wane” across educational transitions.” (Holm and Jæger, 2011, p. 1)

“two influential papers Cameron and Heckman (1998, 2001) argue that the waning coefficients in the Mare model may be artifacts of, first, an arbitrary choice of functional form in the logit model and, second, selection on unobserved variables. Selection on unobserved variables means that the group of individuals “at risk” of making educational transitions becomes increasingly selective at higher transitions due to characteristics that are not observed in the data.” (Holm and Jæger, 2011, p. 1)
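
A small simulation I put together to see the selection mechanism at work (not from the paper; the variable names, equal weights, and sample size are my own assumptions). An observed background variable `x` and an unobserved trait `u` are independent in the full sample, but become negatively correlated among those who pass the first transition, which is what would bias the estimated effect of `x` at the next transition.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000  # hypothetical sample size, large so the pattern is stable

x = rng.standard_normal(n)   # observed family background (assumed standard normal)
u = rng.standard_normal(n)   # unobserved trait, independent of x by construction
e1 = rng.standard_normal(n)  # idiosyncratic noise at the first transition

# First transition: pass if the latent propensity is positive.
passed_first = (x + u + e1) > 0

# Correlation between observed and unobserved traits, before and after selection.
corr_full = np.corrcoef(x, u)[0, 1]
corr_selected = np.corrcoef(x[passed_first], u[passed_first])[0, 1]

print(f"corr(x, u), full sample:         {corr_full:+.3f}")
print(f"corr(x, u), passed transition 1: {corr_selected:+.3f}")
```

In the selected group, people with low `x` tend to have high `u` (otherwise they would not have passed), so a second-transition model that omits `u` would attribute part of its effect to `x` with the wrong sign.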

“differences in the effects of explanatory variables across transitions (for example, waning coefficients) might be driven by differences in error variances rather than reflecting real differences. This type of bias is called a scaling effect.” (Holm and Jæger, 2011, p. 2)
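
To make the scaling effect concrete for myself (my notation, under the usual latent-variable setup, not taken from the paper): if the latent propensity at transition $k$ has an error with scale $\sigma_k$, a binary-response model only identifies the ratio of coefficient to scale.

$$
y_{ik}^{*} = x_i'\beta_k + \sigma_k \varepsilon_{ik}, \qquad
\Pr(y_{ik} = 1 \mid x_i) = F\!\left(\frac{x_i'\beta_k}{\sigma_k}\right)
$$

Here $F$ is the CDF of $\varepsilon_{ik}$ (logistic or standard normal, assumed symmetric). Even if $\beta_1 = \beta_2$, the estimated coefficients differ whenever $\sigma_1 \neq \sigma_2$, so a larger error variance at later transitions would look like a waning effect.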

“The potential drawbacks of our approach are that, in order to be identified, the BPSM requires, first, parametric assumptions and, second, instrumental variables to provide exogenous variation in the probability of making each educational transition” (Holm and Jæger, 2011, p. 3)

“First, there is no technical “magic bullet” for dealing with selection on unobserved variables in educational transition models” (Holm and Jæger, 2011, p. 3)

“Second, selection on unobserved variables always leads to biased estimates of the effect of explanatory variables when analysts use a selective sample. For example, an analyst may be interested in higher education only (a later educational transition) and disregard earlier transitions” (Holm and Jæger, 2011, p. 4) Good point to think about. This is what I was doing.

“Third, in empirical applications it is typically not possible to separate bias from selection on unobserved variables from bias from scaling effects (Mare 2006).” (Holm and Jæger, 2011, p. 4)

“the first transition represents the transition from elementary school to high school (or, equivalent, to upper secondary education such as A levels in the UK), and the second transition represents the transition from high school to higher education (for example, college, university, or university-college education).” (Holm and Jæger, 2011, p. 4)

“We assume that in order for individuals to make the second transition, they must first successfully make the first transition” (Holm and Jæger, 2011, p. 4)

“Our reason for choosing the probit specification over the logit specification is that the probit specification allows us to estimate ρ and thus to take into account that unobserved variables that affect the propensity to make the first transition are likely to be correlated with the unobserved variables that affect the propensity to make the second transition. There is no bivariate logistic distribution and, consequently, it is not possible to estimate ρ with the logit specification used in the traditional Mare model.” (Holm and Jæger, 2011, p. 6)
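
My sketch of the standard two-transition probit model with sample selection, as I understand it (the notation is mine and may differ from the paper's). Here $x_{i1}$ and $x_{i2}$ are the covariates for each transition, including any transition-specific instruments, and $\rho$ is the correlation between the two errors.

$$
\begin{aligned}
y_{i1} &= \mathbf{1}[x_{i1}'\beta_1 + e_{i1} > 0], \\
y_{i2} &= \mathbf{1}[x_{i2}'\beta_2 + e_{i2} > 0] \quad \text{(observed only if } y_{i1} = 1\text{)}, \\
(e_{i1}, e_{i2}) &\sim N_2(0, 0, 1, 1, \rho)
\end{aligned}
$$

$$
\ln L = \sum_{y_{i1} = 0} \ln \Phi(-x_{i1}'\beta_1)
+ \sum_{y_{i1} = 1,\, y_{i2} = 1} \ln \Phi_2(x_{i1}'\beta_1,\, x_{i2}'\beta_2;\, \rho)
+ \sum_{y_{i1} = 1,\, y_{i2} = 0} \ln \Phi_2(x_{i1}'\beta_1,\, -x_{i2}'\beta_2;\, -\rho)
$$

$\Phi$ is the standard normal CDF and $\Phi_2$ the bivariate standard normal CDF. With $\rho = 0$ the likelihood factors into two separate probits, which corresponds to the Mare-style probit setup.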

“In our NCDS sample 38 percent of the respondents complete A levels and, of those who complete A levels, more than 80 percent complete some type of higher education.” (Holm and Jæger, 2011, p. 17)

“We used Stata’s probit command to estimate the Mare probit models and the heckprob command to estimate the BPSM” (Holm and Jæger, 2011, pp. 18-19)

“a model which adds the two instrumental variables GCE performance (for the first transition) and A level performance (for the second transition).” (Holm and Jæger, 2011, p. 19)