@Stopforth2020

Parental Socio-Economic Background and Children’s School-Level GCSE Attainment

(2020) - Sarah Stopforth

Journal::
Link::
DOI::
Links::
Tags:: #paper #Aspirations #SES #Attainment
Cite Key:: [@Stopforth2020]

Abstract

The principal aim of this thesis is to better understand the contemporary relationship between parental socio-economic background and children’s General Certificate of Secondary Education (GCSE) attainment. Previous empirical research has demonstrated that there is a strong, persisting association between parental socio-economic background and educational outcomes, and specifically school GCSE attainment. This thesis directly contributes to the sociology of education in two main ways. First, it presents new empirical evidence about the nature of socio-economic inequalities in young people’s GCSE attainment in England over the course of the 1990s and early 2010s. Second, it builds on previous empirical work and builds a more comprehensive understanding of the effects of socio-economic background. Developing a better understanding of why, or how, those from more advantaged socioeconomic backgrounds achieve more favourable educational outcomes by the end of compulsory schooling is important to enable young people, parents, teachers, schools, and policymakers to help to address the persisting attainment gap observed in school-level qualifications. The thesis is organised into two parts. Part 1 examines the nature of the relationship between parental socio-economic background and children’s school GCSE attainment for synthetic cohorts of English Year 12 pupils (i.e. aged 16 and 17). The analyses examine the role of parental socio-economic background in GCSE attainment using the British Household Panel Survey for young people taking their GCSE examinations in the 1990s and 2000s. A key methodological aspect of this work is sensitivity analyses of the independent variables (i.e. socio-economic background measures) and the functional form of the outcome variable (i.e. GCSE attainment). Particular attention is paid to checking the robustness of results using alternative measures and alternative statistical model specifications. 
The analyses are replicated using the UK Household Longitudinal Study (UKHLS, also known as Understanding Society). Analyses of the UKHLS dataset represent more contemporary cohorts of young people taking their GCSE examinations in the early 2010s. The final section of Part 1 addresses the methodological challenge of missing data in social surveys. It takes a series of principled statistical approaches to help to address the potential distortions caused by missing data in the synthetic cohort analyses. Part 2 of this thesis investigates the relationship between parental socio-economic background and children’s school GCSE attainment in greater depth. The analyses in Part 2 empirically explore three potential explanations for the enduring socio-economic inequalities observed in educational outcomes. The first set of analyses examine the extent to which inequalities in GCSE attainment can be accounted for by prior academic attainment, for example, attainment at age 11. Cognitive and educational outcomes at earlier stages of schooling are stratified by parental socio-economic background, and therefore the inequalities observed at GCSE level may be a continuation of inequalities observed at earlier stages of a young person’s schooling. Path analysis models are used to decompose the effects of parental education and parental social class on attainment at the end of compulsory secondary school. The next set of analyses investigate the role of cultural capital in educational inequalities. The concept of cultural capital is a prominent sociological explanation for persisting educational inequalities. Developing theoretically informed measures of cultural capital using social survey data is especially challenging because there are no clear prescriptions of how to operationalise these measures. A key aspect of this work is the attention to sensitivity analyses of alternative measures.
The candidate measures are compared and contrasted within a series of analyses, with particular attention paid to the effect such measures have on understanding the relationship between parental socio-economic background and GCSE attainment. The final set of analyses explore the role of educational aspirations in educational inequalities. ‘Raising aspirations’ has been at the core of recent UK government rhetoric to help to address the attainment gap between the most disadvantaged and more advantaged young people. The overarching government position has been that the attainment gap has been, in part, attributed to the ‘low’ aspirations held by young people and their parents. The analyses explore the socioeconomic gradient to young people’s aspirations over the course of their secondary school years, before examining the influence of the educational aspirations of young people and their parents on GCSE attainment.

Notes

“long-standing research tradition examining social stratification and the role of socioeconomic background in education (Floud et al., 1961, Blau and Duncan, 1967, Jencks, 1973, Boudon, 1974, Karabel and Halsey, 1977, Halsey et al., 1980, Shavit and Blossfeld, 1993, Crompton, 2008, Platt, 2010).” (Stopforth, 2020, p. 19)

“Empirical evidence consistently demonstrates that socioeconomic inequalities in educational outcomes continue to pervade education systems in the UK (Jackson, 1962, Lacey, 1970, Halsey et al., 1980, Heath and Clifford, 1990, Ball, 2003, Blanden and Machin, 2004, Machin and Vignoles, 2004, Reay, 2017).” (Stopforth, 2020, p. 19)

“GCSEs mark a key branching point for young people, shaping their opportunities and choices for continued education, employment, or training (see, for example, analyses of the Youth Cohort Study in Payne, 1995a, Payne, 1995b, Payne, 2001a, Payne, 2001b, and see Jones et al., 2003, Babb, 2005).” (Stopforth, 2020, p. 19)

“Social stratification can be understood as the persistence of inequalities which occur, or are reproduced, across generations (Bottero, 2005).” (Stopforth, 2020, p. 20)

“Stratification in society can take a variety of forms according to socially-constructed differences of, for example, class, gender, and ethnicity (Payne, 2013b, Grusky, 2014).” (Stopforth, 2020, p. 20)

“Socio-economic stratification measures often involve occupation-based schemas and scales (Lambert et al., 2012).” (Stopforth, 2020, p. 20)

“A focus on occupational structure or relations can provide a more stable base than the more transient, or fluctuating, nature of income in terms of economic security and future prospects (Goldthorpe and McKnight, 2004).” (Stopforth, 2020, p. 20)

“Social distance measures have historically taken a bottom-up approach, whereby the structure is defined by the social relations within it, rather than by a pre-determined class structure (Bottero, 2005).” (Stopforth, 2020, p. 20)

“Individuals occupying a similar NS-SEC category, or social class position, are understood to have similar market and economic power (Rose and Pevalin, 2003). Crompton (2008), for example, suggested that they also share similar lifestyles and social attitudes” (Stopforth, 2020, p. 21)

“NS-SEC is considered as a robust measure of socio-economic position which has been found to play a central role in a range of different outcomes such as health and education (Rose and Pevalin, 2003).” (Stopforth, 2020, p. 21)

“Parental education level will also be included in analyses as a measure of socio-economic background. This follows the suggestion of Bukodi and Goldthorpe (2013) that social class and education exert separate, and independent, influences on outcomes such as educational attainment” (Stopforth, 2020, p. 21) important

“Theoretically, an increase in educational expansion could lead to greater equality in educational outcomes between more and less advantaged young people” (Stopforth, 2020, p. 22) Theoretically is doing a LOT of heavy lifting here

“Maximally maintained inequality (MMI) theory states that class inequality persists at different levels of the education system because the rate of transitions at various levels, contingent on social background, remains the same until demand outstrips supply (Raftery and Hout, 1993). Educational expansion occurs when the current level of education is saturated and opportunities open up at higher transition points. The increase in the rate of enrolment for all social classes at lower transition points is offset by the greater opportunities created at higher transition points. These are typically taken up by more advantaged young people” (Stopforth, 2020, p. 22)

“There has been an overall increase in participation for students from all social backgrounds (Chowdry et al., 2013). As a result, there has been ‘credential inflation’, where the value of undergraduate degrees has lessened due to the higher proportion of people obtaining them. As undergraduate education becomes saturated, there has been a subsequent rise in the numbers gaining postgraduate degrees (Van de Werfhorst and Andersen, 2005).” (Stopforth, 2020, p. 23)

“Paterson and Iannelli (2007) suggested a refinement to the MMI theory which accounts for non-linearity in the expansion of educational opportunity. They argued that inequalities are higher at certain phases, for example, more advantaged children are better placed to take advantage of new opportunities as they arise, but this evens out over time” (Stopforth, 2020, p. 23)

“refinement of the MMI theory is Effectively Maintained Inequality (EMI) (Lucas, 2001). EMI theory states that more advantaged families maintain their advantage even when opportunities become universal. Instead of differentiating through attending higher levels of education when opportunities are not universal, more advantaged families differentiate themselves through the quality of education sought once a particular education level is saturated. For example, more advantaged young people may attend more prestigious institutions than less advantaged young people (Reay et al., 2001, Boliver, 2015).” (Stopforth, 2020, p. 23)

“Bathmaker et al. (2013) commented that as greater numbers of working class students enter university, the ‘rules of the game’ shift. Middle class students may mobilise their resources to advance their employability after graduation” (Stopforth, 2020, p. 23)

“The UK consists of four territories and does not have a single school education system (Paterson and Iannelli, 2007).” (Stopforth, 2020, p. 24)

“Universal secondary schooling was introduced by the 1918 Fisher Act for children between the ages of 5 and 14, which was subsequently raised to age 15 with the 1944 (Butler) Education Act (McKibbin, 1998). The 1944 Act also established a tripartite system of grammar schools, secondary moderns, and technical schools (Halsey et al., 1980). Heath and Clifford (1990) argued that the tripartite school system disproportionately benefited middle class children because they were more likely to get into grammar schools” (Stopforth, 2020, p. 24)

“One explanation for this was that middle class children had the cultural knowledge required to pass the 11-plus examination (McKibbin, 1998)” (Stopforth, 2020, p. 24)

“Widespread comprehensivisation of secondary schools began in the early 1960s, and by the 1970s English schools were mostly non-selective, although pockets of selective schools still existed in some areas (Paterson and Iannelli, 2007, Coldron et al., 2010).” (Stopforth, 2020, p. 24)

“Ball (2003) argued that middle class parents were more able to ‘game’ the system, for example, through greater financial resources to move house or rent a second home in the catchment areas of very good schools.” (Stopforth, 2020, p. 24)

“The expansion of higher education became UK government policy with the 1963 Robbins Report” (Stopforth, 2020, p. 25)

“Participation rates increased over the years, and increased at a much faster rate in the 1990s (Machin and Vignoles, 2004)” (Stopforth, 2020, p. 25)

“[In] 1997, the Dearing Report recommended the introduction of £1000 tuition fees and income contingent loans (Blanden and Machin, 2004). Fees rose after legislation was passed in 2004 (Galindo-Rueda et al., 2004). Although higher education expansion was intended for pupils from all social backgrounds, evidence suggests that there was a disproportionate benefit of higher education expansion for those from more advantaged families compared with their less advantaged peers (Blanden et al., 2003, Machin and Vignoles, 2004, Chowdry et al., 2013).” (Stopforth, 2020, p. 25)

“qualifications gained at the end of compulsory schooling can be important determinants of the young person’s future educational and occupational opportunities and choices (see, for example, analyses of the Youth Cohort Study in Payne, 1995a, Payne, 1995b, Payne, 2001a, Payne, 2001b, and see Jones et al., 2003, Babb, 2005).” (Stopforth, 2020, p. 25)

“GCSE examinations were introduced as part of the reforms in the 1988 Educational Reform Act. GCSEs replaced General Certificate of Education Ordinary (O’) Levels and Certificates of Secondary Education (CSEs) to establish a single system of assessment, with grades ranging from A to G (Department for Education, 1985). In 1994, an additional A* grade was introduced at the highest level (Yang and Woodhouse, 2001).” (Stopforth, 2020, p. 26)

“Young people generally studied eight or nine subjects at GCSE level (Rothon, 2007)” (Stopforth, 2020, p. 26)

“Empirical work has demonstrated that young people from more advantaged backgrounds tend to have more favourable GCSE outcomes than their less advantaged peers (Demack et al., 2000, Sullivan, 2001, Connolly, 2006, Connelly et al., 2013, Gayle et al., 2014, Strand, 2014a, Playford and Gayle, 2015)” (Stopforth, 2020, p. 27)

“social stratification of GCSE attainment (often by class, gender, and ethnicity) has been explored using large-scale datasets such as the Youth Cohort Study (Demack et al., 2000, Babb, 2005, Connolly, 2006, Gayle et al., 2014)” (Stopforth, 2020, p. 27)

“Analyses of GCSE attainment have also been undertaken, to a lesser extent, using the British Household Panel Survey (Murray et al., 2012, Connelly et al., 2013),” (Stopforth, 2020, p. 27)

“The second dataset is the UK Household Longitudinal Study (UKHLS)” (Stopforth, 2020, p. 30)

“Official education data containing GCSE records from the National Pupil Database are linked to the UKHLS (UKHLS-NPD). These data are accessible using the Secure Lab environment provided by the UK Data Service.” (Stopforth, 2020, p. 30)

“Previous studies examining educational attainment for young people in the 1990s have utilised the Youth Cohort Study and the British Household Panel Survey (Gayle, 2005, also see section 2.2 above). The UK Household Longitudinal Study is a relatively under-utilised dataset in the area of GCSE attainment and will contribute contemporary findings in the 21st Century” (Stopforth, 2020, p. 31)

“Household panel surveys are not specialist education datasets but provide a promising opportunity to study educational outcomes for young people in participating households. The BHPS and UKHLS are particularly suited to these analyses because, by design, they provide information on every member of the household, including parents and young people” (Stopforth, 2020, p. 31)

“4.1.1 The British Household Panel Survey” (Stopforth, 2020, p. 31)

“4.1.2 The UK Household Longitudinal Study” (Stopforth, 2020, p. 32)

“The BHPS and UKHLS are complex social surveys. It is important to appropriately represent design and selection strategies when analysing complex surveys (see Longhi and Nandi, 2015). By default, statistical software packages assume that the data it is dealing with has been collected through simple random sampling (Longhi and Nandi, 2015). Conducting analyses of the BHPS and UKHLS without adjusting for the complex survey design would be a naïve approach to inferential data analysis (Gayle and Connelly, 2017). To fail to represent the complex survey design will negatively influence results (Treiman, 2009).” (Stopforth, 2020, pp. 33-34)

“The BHPS had a two-stage stratified sample design. In stage one, primary sampling units (PSUs) of postcodes were identified. In stage two, systematic sampling was used to select addresses for interview (Taylor et al., 2010). When the BHPS started in 1991, the sample was representative of households in Britain south of the Caledonian Canal (Longhi and Nandi, 2015)” (Stopforth, 2020, p. 34)

“UKHLS has a similar (but not identical) complex survey design with stratified, clustered, and equal probability selection of addresses both north and south of the Caledonian Canal (Buck and McFall, 2011). The primary sampling units of postcodes across Great Britain formed the initial stratified sample. These postal sectors were sampled systematically, with equal probability within each strata (Buck and McFall, 2011). The analyses in this thesis use the General Population Sample and Ethnic Minority Boost Sample, which are the key analytical samples.” (Stopforth, 2020, p. 34)

“Within longitudinal studies, non-response and missing data can take the form of unit nonresponse, wave non-response, and attrition (Hawkes and Plewis, 2006). Angrist and Pischke (2009) highlighted the difficulty and complexity of using sample weights in statistical analyses for even the most advanced researchers. Different survey weights are deposited with the BHPS and UKHLS datasets with useful guidance for choosing the most appropriate weight provided in Taylor et al. (2010) for the BHPS and Knies (2018) for the UKHLS” (Stopforth, 2020, p. 34)

“Specialist survey commands in statistical software packages, for example,” (Stopforth, 2020, p. 34)

“the cohorts in the following analyses, the end of Year 11 also marked the end of compulsory schooling” (Stopforth, 2020, p. 37)

“Synthetic cohorts of English school Year 12 pupils are identified in the household panel surveys, because they will have recently received, and reported, their GCSE examination results” (Stopforth, 2020, p. 37)

“The results of GCSE examinations can be important determinants of a young person’s future education, employment, and earnings (Babb, 2005, Leckie and Goldstein, 2009, Croll, 2009, Playford and Gayle, 2015).” (Stopforth, 2020, p. 39)

“Earlier research has found that GCSE attainment is stratified by socio-economic background, gender, and ethnicity (Drew, 1995, Demack et al., 2000, Sullivan, 2001, Connolly, 2006, Connelly et al., 2013, Gayle et al., 2014, Strand, 2014a, Playford and Gayle, 2015).” (Stopforth, 2020, p. 39)

“The analyses in this chapter use the BHPS to construct synthetic cohorts of young people in school Year 12 (or of equivalent age, i.e. aged 16 and 17) in England who have recently sat their GCSE examinations, and reported their results in the BHPS” (Stopforth, 2020, p. 39)

“Sensitivity analyses can be understood as post-analysis robustness checks which reestimate analyses using alternative measures or statistical model specifications (Connelly et al., 2016c, Freese and Peterson, 2017).” (Stopforth, 2020, p. 40)

“Social class has an enduring presence in British sociology (Crompton, 2008, Savage, 2016).” (Stopforth, 2020, p. 41)

“Goldthorpe and Marshall (1992) launched a defence of class analysis which argued that class is useful as an analytical lens to view stratification, rather than as a deterministic concept in a Marxist sociological tradition. Crompton (2008) and Savage (2016) noted a revival in class analysis, with a shift in emphasis from class as an economic and deterministic concept in the structure, consciousness and agency debates to class as a cultural concept drawing on similarities in lifestyle and cultural or material consumption” (Stopforth, 2020, p. 41)

“The first social class schema developed was the Registrar General class schema. This originally classified the British population into a five-fold schema based on indices of occupation and industry (Table 1.1) (Rose et al., 1997).” (Stopforth, 2020, p. 42)

“The Goldthorpe class schema emerged from the Oxford Mobility Study in the 1970s. The schema categorised men into social classes based on their occupations (Goldthorpe et al., 1980).” (Stopforth, 2020, p. 43)

“The theoretical foundations of the Goldthorpe class schema were comprehensively detailed in Erikson and Goldthorpe (1992). There are two elements to the Goldthorpe classification: employment status and employment contract” (Stopforth, 2020, p. 43)

“The Goldthorpe class schema has been further developed by the UK Office for National Statistics (ONS) as the National Statistics Socio-Economic Classification (NS-SEC) (Office for National Statistics, 2010).” (Stopforth, 2020, p. 44)

“The classification system of the NS-SEC is based on the Weberian concept of market situation and life chances derived from occupational position (Crompton, 2008).” (Stopforth, 2020, p. 44)

“[A] more recent development in social class analysis is the micro-class approach. Micro-class analysis uses occupational data to better explain differences in life chances, patterns of behaviour or differential attitudes, than at the level of the big or agglomerate classes (Grusky and Weeden, 2001, Weeden and Grusky, 2004).” (Stopforth, 2020, p. 46)

“There are practical challenges associated with adopting the micro-class approach, for example, in many datasets there may be sparse information in some occupational categories to develop and apply micro-class measures, and there may be associated challenges of sample size and statistical power in analyses (Connelly et al., 2016b).” (Stopforth, 2020, p. 47)

“Moving away from an employment-aggregate approach, Savage et al. (2013) constructed a ‘new model’ of social class based on the concept of capitals, assets, and resources (also see Crompton, 2008).” (Stopforth, 2020, p. 47)

“The new schema attracted much critical reflection, for example, on measurement, sample selection bias, model selection, and classifications (for example, see Bradley, 2014, Mills, 2014)” (Stopforth, 2020, p. 47)

“Connelly et al. (2019) undertook a principled attempt to replicate the ‘new model’ using the UKHLS. The authors noted methodological challenges, for example, the results are likely to be sensitive to the manifest variables available to use” (Stopforth, 2020, p. 48)

“there is not convincing evidence that a capitals, assets, and resources measure improves the explanatory power or theoretical understanding of social class compared with the existing NS-SEC social class schema (Connelly et al., 2019).” (Stopforth, 2020, p. 48)

“In Britain, Hope and Goldthorpe (1974) developed an occupational prestige scale of the general desirability of occupations using data in the Oxford Mobility Study” (Stopforth, 2020, p. 48)

“[The] International Socio-Economic Index of occupational status (ISEI) was developed using measures of occupational prestige alongside education and income for men in 16 countries (Ganzeboom et al., 1992)” (Stopforth, 2020, p. 49)

“Ganzeboom (2019) reinvestigated and reaffirmed the idea of the Treiman constant in empirical work. The Treiman constant is the theoretical idea that occupational prestige rankings remain constant between different countries and over time (Hout and Diprete, 2006).” (Stopforth, 2020, p. 49)

“Stewart et al. (1980) constructed the Cambridge scale to measure advantage based on social associations. The original Cambridge scale was devised based on a study of male workers in the Cambridge area in 1918 (Stewart et al., 1980).” (Stopforth, 2020, p. 49)

“The original Cambridge scale was based on pairwise matching of a respondent’s four closest friendships and multidimensional scaling was used to generate a social distance score (Stewart et al., 1973). Prandy (1999) asserted that the Cambridge scale is preferable to categorical class measures because of the closer affinity to measuring the underlying hierarchy in social relations” (Stopforth, 2020, pp. 49-50)

“CAMSIS is based on occupational information and how this relates to social networks. It is therefore claimed that it is not constrained to a structuralist account of social stratification like class schemas, with categories chosen a priori (Bergman and Joye, 2005, Bottero, 2005). The scale can be extended to apply to different contexts and countries (Prandy and Lambert, 2003).” (Stopforth, 2020, p. 50)

“In the late 1960s and early 1970s, sociologists noted the empirical challenge of including women in class analysis (Stacey 1969, Acker 1973). Acker (1973) highlighted the intellectual sexism in social stratification research that the conventional focus on male occupations represented” (Stopforth, 2020, p. 50)

“Erikson (1984) argued in defence of a dominance approach, which assigned the social class of the household to the person with the highest occupational position and with the longest” (Stopforth, 2020, p. 50)

“Sorensen (1994) conducted a review of the empirical evidence using both the conventional view and a joint classification. Sorensen (1994) concluded that neither approach was inherently more appropriate than the other” (Stopforth, 2020, p. 51)

“[There] is not a single appropriate way of measuring parental socio-economic background (Sorensen, 1994, Beller, 2009). A dominance approach to the NS-SEC schema will be taken in this thesis following the clear guidance from Rose and Pevalin (2003).” (Stopforth, 2020, p. 51)

“The years spent in education is a common measure of education level in economic research and it can reflect the returns to education in the form of human capital (Connelly et al., 2016a).” (Stopforth, 2020, p. 51)

“The successive raising of the school leaving age in the UK will affect easy comparisons across generations. Duration measures can also be compared across educational contexts and are strongly correlated with other measures of education level, such as categorical measures (Schröder and Ganzeboom, 2014). Measuring the number of years can obscure more specific educational attainment, for example, levels of education completed and grades attained. This can be partially addressed by scaling methods to combine information on years spent in education and the time taken to achieve different levels within the education system (Schröder and Ganzeboom, 2014).” (Stopforth, 2020, p. 52)

“Qualifications signify not only an educational transition, but successful completion and certification at a certain level. Qualifications tend to be ordinal in nature” (Stopforth, 2020, p. 52)

“The grades achieved in GCSE examinations tend to be determinants of further education and employment opportunities (Croll, 2009).” (Stopforth, 2020, p. 53)

“There is no single agreed-upon way to measure GCSE attainment in social science research (Connelly et al., 2016a). This is because there are a number of different combinations of GCSE examinations that any one pupil could sit, and because the grading system is alphabetised rather than numeric (from grades A* to G). A common measure is the binary outcome variable of whether the respondent attained 5 GCSEs at grades A*-C. This was the standard attainment benchmark in UK education policy (Gayle et al., 2003, Connolly, 2006, Leckie and Goldstein, 2009).” (Stopforth, 2020, p. 54) Multiple GCSE measures ought to be tested

“Connelly et al. (2013) used categorical measures to examine those with middle attainment, i.e. the achievement of 1-4 A*-Cs. Sullivan et al. (2011) examined low, medium, and high attainment, measured as 1 or more A*-Cs, 5 or more A*-Cs, and 8 or more A*-Cs. Gorard and Taylor (2002: 7) noted the challenge of equivalence with the 5 or more A*-C benchmark measure, which treats an A* in Music, a B in Physics, and a C in Sociology as equivalent.” (Stopforth, 2020, p. 54)
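
The measures quoted above can be sketched in a few lines, which makes it easier to re-run an analysis under several outcome definitions. This is only an illustrative sketch: the function names and band labels are hypothetical and not taken from the thesis, and it assumes each pupil's results arrive as a plain list of letter grades.

```python
from typing import List

# Grades counted as "A*-C" under the lettered grading scheme (A* to G).
HIGHER_GRADES = {"A*", "A", "B", "C"}


def count_a_star_to_c(grades: List[str]) -> int:
    """Count how many GCSE grades fall in the A*-C band."""
    return sum(1 for g in grades if g in HIGHER_GRADES)


def benchmark_5_plus(grades: List[str]) -> bool:
    """Binary policy benchmark: 5 or more GCSEs at grades A*-C."""
    return count_a_star_to_c(grades) >= 5


def attainment_band(grades: List[str]) -> str:
    """Categorical measure; the band labels here are hypothetical."""
    n = count_a_star_to_c(grades)
    if n == 0:
        return "none"
    if n <= 4:
        return "middle (1-4)"
    if n <= 7:
        return "5-7"
    return "8+"


# Example: three passes at A*-C, so below the benchmark and in the middle band.
example = ["A*", "B", "C", "D", "E"]
print(count_a_star_to_c(example), benchmark_5_plus(example), attainment_band(example))
```

Note that every function here treats all subjects as interchangeable, which is exactly the equivalence problem Gorard and Taylor raise.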

“The synthetic cohorts are termed ‘school Year 12 pupils’, but the identification of these individuals does not distinguish between the individuals who have continued with non-compulsory education and those who have left education” (Stopforth, 2020, p. 55)

“Figure 1.1: Synthetic cohorts of Year 12 pupils in BHPS households” (Stopforth, 2020, p. 55)

“Synthetic school year cohorts were developed by grouping young people born in the same academic year, according to their birth months and years. The young people’s data were then linked to their mothers’ and fathers’ data. The school year cohorts were appended together across all 18 waves of the BHPS.” (Stopforth, 2020, p. 56)
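
A rough sketch of how grouping by academic year of birth might work. It assumes the English convention of an academic year running from 1 September to 31 August, and that pupils progress one school year per calendar year with no retention or acceleration. The exact rule used in the thesis is not recorded in these notes, and all function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def academic_year_start(birth_year: int, birth_month: int) -> int:
    """Calendar year in which the academic year of birth began.

    Assumption: the academic year runs from 1 September to 31 August.
    """
    if not 1 <= birth_month <= 12:
        raise ValueError("birth_month must be between 1 and 12")
    return birth_year if birth_month >= 9 else birth_year - 1


def year_12_start(birth_year: int, birth_month: int) -> int:
    """Calendar year in which the pupil would begin school Year 12.

    Assumption: pupils turn 17 during Year 12 and are never held back.
    """
    return academic_year_start(birth_year, birth_month) + 17


def group_into_cohorts(
    people: Iterable[Tuple[str, int, int]]
) -> Dict[int, List[str]]:
    """Group (person_id, birth_year, birth_month) records by birth academic year."""
    cohorts: Dict[int, List[str]] = defaultdict(list)
    for person_id, birth_year, birth_month in people:
        cohorts[academic_year_start(birth_year, birth_month)].append(person_id)
    return dict(cohorts)


# Two pupils born either side of New Year share a cohort; a September 1991 birth does not.
print(group_into_cohorts([("a", 1990, 9), ("b", 1991, 8), ("c", 1991, 9)]))
```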

“Carpenter and Kenward (2013) strongly advised that the first stage in statistical analyses should be to conduct a complete records analysis. Where possible, this should be followed by exploring the effects of missing data.” (Stopforth, 2020, p. 56)

“Figure 1.3 illustrates that there is a clear spike at zero, as 15.3% of respondents do not have any GCSEs at grades A*-C but at least one GCSE at grades D-G. There is also a distinctive spike around the attainment of 9 or 10 GCSEs at grades A*-C, suggesting that these are generally high achieving synthetic cohorts” (Stopforth, 2020, p. 57)

“The national standard policy benchmark of 5 or more GCSEs at grades A*-C is used as the initial outcome variable (Gayle et al., 2003, Connolly, 2006, Leckie and Goldstein, 2009)” (Stopforth, 2020, p. 57)

“Parental education level is measured by highest parental education qualification. The categories are higher education, further education, school-level (including O’ Level and A’ Level), and below school-level (i.e. less than an O’ Level pass) (Lambert, 2012).” (Stopforth, 2020, p. 58)

“Socio-economic background measures are tested in sensitivity analyses below as robustness checks.” (Stopforth, 2020, p. 58)

“The NS-SEC schema does not follow a strict hierarchical structure to easily identify the higher social class position. The guidance in Rose and Pevalin (2003) suggests that the ordering for dominance is NS-SEC 1.2, 1.1, 4, 2, 3, 5, 6, 7. A dominance approach for parental education level is more straight-forward, i.e. using the highest education qualification of mother or father.” (Stopforth, 2020, p. 59)
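
The dominance rule quoted above amounts to picking whichever parental category comes first in a fixed ordering. A minimal sketch follows. The NS-SEC ordering is copied from the quote, while the education labels, the use of `None` for a missing parent, and the function name are assumptions made for illustration.

```python
from typing import Optional, Sequence

# Dominance ordering quoted from Rose and Pevalin (2003), most dominant first.
NSSEC_DOMINANCE = ("1.2", "1.1", "4", "2", "3", "5", "6", "7")

# Hypothetical labels for the four education categories, highest first.
EDUCATION_ORDER = ("higher", "further", "school", "below_school")


def dominant(
    order: Sequence[str], mother: Optional[str], father: Optional[str]
) -> Optional[str]:
    """Return whichever parental category appears earliest in `order`.

    Missing values are passed as None (an assumption of this sketch);
    an unrecognised category raises ValueError.
    """
    known = [c for c in (mother, father) if c is not None]
    if not known:
        return None
    return min(known, key=order.index)


# NS-SEC 4 outranks NS-SEC 2 under the quoted ordering.
print(dominant(NSSEC_DOMINANCE, "2", "4"))
print(dominant(EDUCATION_ORDER, "school", "higher"))
```

Keeping the ordering as data rather than logic makes it straightforward to swap in a different rule for a sensitivity check.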

“Other independent variables included in the models are housing tenure, gender, and ethnicity. Housing tenure has been included in previous analyses of educational attainment (Connelly et al., 2013, Playford and Gayle, 2015).” (Stopforth, 2020, p. 59)

“Demographic factors like gender and ethnicity have been found to have strong influences on educational attainment, whereby girls tend to outperform boys, and those from Indian or Chinese backgrounds tend to outperform children from all other backgrounds (Drew, 1995, Demack et al., 2000, Connolly, 2006, Platt, 2010, Sullivan et al., 2011, Connelly et al., 2013, Strand, 2014a)” (Stopforth, 2020, p. 59)

“in the following analyses is challenging because of the low coverage of ethnic minority groups in the BHPS. Over 95% of the synthetic cohort sample are from white backgrounds. Only 69 individuals (4.3% of the sample) reported that they are from Black Caribbean, Black African, Indian, Pakistani, Bangladeshi, Chinese, mixed, or other ethnic backgrounds. The sample sizes for some of the disaggregated ethnicity groups are very small (for example, many have five or fewer individuals). For this reason, the resultant variable is parameterised as white and non-white. The parameterisation is highly restrictive and makes a very unrealistic assumption of within-group homogeneity. For this reason, ethnicity is included in the models as a control variable, rather than a variable which can be suitably interpreted.” (Stopforth, 2020, p. 59) Super important regarding the inclusion of ethnicity

“There are a number of Pseudo R2 measures available (Smithson, 2003).” (Stopforth, 2020, p. 61)

“logistic regression diagnostic tests demonstrate that there are normally distributed residuals, evidence of homoscedasticity, that the model is correctly specified, that there is no evidence of multicollinearity between the explanatory variables, and there are no influential cases worthy of further investigation (Kohler and Kreuter, 2012, Mehmetoglu and Jakobsen, 2017). There are no significant interactions between the explanatory variables.” (Stopforth, 2020, p. 62)

“The presentation of the model results follow the useful guidance provided in Connelly et al. (2016d).” (Stopforth, 2020, p. 63)

“Quasivariances are presented to help to address the reference category problem by providing quasistandard errors and 95% comparison intervals to compare all contrasts of the categorical variables (Gayle and Lambert, 2007).” (Stopforth, 2020, p. 63)

“The challenges of interpreting effect sizes of logistic regression models using log odd coefficients is well-documented (see, for example, Long, 1997, Treiman, 2009, Long and Freese, 2014).” (Stopforth, 2020, p. 66)

“suitable alternative method is to convert log odds into probabilities, for example, using marginal effects (Long and Freese, 2014).” (Stopforth, 2020, p. 66)
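
A minimal sketch of converting log odds into probabilities, to make the point above concrete. The intercept and coefficient are hypothetical numbers, not values from the thesis tables, and the comparison shown is a simple difference in predicted probabilities rather than the full marginal effects procedure in Long and Freese (2014).

```python
import math


def log_odds_to_probability(log_odds):
    """Inverse logit: map a log odds value to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical logistic regression estimates (assumptions for illustration only).
intercept = -0.5             # log odds for the reference category
coef_graduate_parent = 1.2   # log odds coefficient for having a graduate parent

p_reference = log_odds_to_probability(intercept)
p_graduate = log_odds_to_probability(intercept + coef_graduate_parent)

# Difference in predicted probability between the two groups.
difference = p_graduate - p_reference
```

The same coefficient implies a different change in probability depending on the starting log odds, which is part of why raw log odds are hard to read as effect sizes.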

“It is not statistically appropriate to directly compare log odds coefficients across logistic regression models (see detailed reviews in Allison, 1999, Williams, 2009, Mood, 2010).” (Stopforth, 2020, p. 67) Interesting. Did not know this.

“robustness is assessed by focusing on the substantive conclusions in the alternative logistic regression models, and comparing predicted probabilities for the explanatory variables in each model. The goodness-of-fit of each model is assessed using three BIC measures (based on degrees of freedom, model chi square, and deviance), and a variety of Pseudo R2 measures, as there is not a clear, superior Pseudo R2 measure (Smithson, 2003, Lewis-Beck et al., 2004).” (Stopforth, 2020, p. 67)

“There are strong correlations between the three measures of parental social class. Parental NS-SEC and the Goldthorpe schema has a significant chi square statistic (3300 at 42 degrees of freedom, p<.001) and a Cramer’s V of 0.59. Both parental NS-SEC and Cambridge Scale score, and parental Goldthorpe and Cambridge Scale score yield statistically” (Stopforth, 2020, p. 67)

“significant, strong eta statistics (0.71 and 0.72 respectively). A dominance approach is used to construct the parent measures (Erikson 1984).” (Stopforth, 2020, p. 68)
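
For reference, a small sketch of how Cramér's V (the association measure quoted above for NS-SEC and Goldthorpe) is computed from a contingency table. The function names are my own; the formula is the standard one, V = sqrt(chi2 / (n * (min(rows, cols) - 1))).

```python
import math


def chi_square(table):
    """Pearson chi-square statistic and total count for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat, n


def cramers_v(table):
    """Cramér's V: 0 means no association, 1 means perfect association."""
    stat, n = chi_square(table)
    k = min(len(table), len(table[0])) - 1
    return math.sqrt(stat / (n * k))
```

This assumes every row and column has at least one observation, otherwise an expected count of zero would cause a division error.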

“Three separate logistic regression models are estimated and the results are presented in Table 1.12. The first model uses the measure of parental NS-SEC, and has been described in detail in the previous section. The next model uses the Goldthorpe class schema” (Stopforth, 2020, p. 68)

“final model uses the Cambridge Scale. Parents with higher Cambridge Scale scores are significantly associated with higher log odds of attaining 5 or more GCSEs at grades A*-C.” (Stopforth, 2020, p. 68)

“The BIC statistics demonstrate that the most parsimonious model uses the parental Cambridge Scale. This is unsurprising because the BIC statistic penalises models for estimating additional parameters. The Goldthorpe model would be considered an improvement over the NS-SEC model, i.e. BIC is lower (Raftery, 1995).” (Stopforth, 2020, p. 68)
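
A rough illustration of why BIC tends to favour a single continuous scale over a multi-category class scheme. This uses the textbook form BIC = -2 lnL + k ln(N), which may differ from the three BIC variants reported in the thesis; the log-likelihoods, parameter counts, and sample size are hypothetical.

```python
import math


def bic(log_likelihood, n_params, n_obs):
    """Textbook BIC: lower is better; each extra parameter costs ln(n_obs)."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)


N_OBS = 1600  # hypothetical sample size

# Hypothetical models: a categorical class scheme fits slightly better
# but needs more parameters than a single continuous scale score.
bic_categorical = bic(log_likelihood=-900.0, n_params=12, n_obs=N_OBS)
bic_scale = bic(log_likelihood=-903.0, n_params=6, n_obs=N_OBS)

preferred = "scale" if bic_scale < bic_categorical else "categorical"
```

With these made-up numbers the small gain in fit from the extra categories does not offset the penalty, so the scale model has the lower BIC.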

“on the evidence presented above, NS-SEC is the preferred measure in subsequent analyses. NS-SEC also has the benefit of being widely used in official governmental and social scientific research, and is therefore more ably compared across studies.” (Stopforth, 2020, p. 70)

“Three measures of parental education level are tested. The first measure is the highest UK qualification achieved (a measure collected in the BHPS). Two further international measures of education level are tested, CASMIN and ISCED.” (Stopforth, 2020, p. 70)

“Tabulations demonstrate that there is general consistency in the assignment of individuals to categories in the CASMIN and ISCED frameworks. There is less consistency when tabulating these measures with the UK-specific qualifications” (Stopforth, 2020, p. 71)

“Table 1.14 presents the results of the three alternative logistic regression models using the disaggregated measures for parental education level. The reference category is no qualifications, as this is a consistent category across all three measures and a substantively useful comparison point” (Stopforth, 2020, p. 72)

“The goodness-of-fit measures are almost identical for the CASMIN and ISCED models, with these models having slightly higher Pseudo R2 and slightly lower BIC statistics than the model using UK qualifications” (Stopforth, 2020, p. 72)

“practice, there are no clear statistical or theoretical reasons to prefer any particular measure of parental education level. The aggregated UK qualifications measure is the preferred measure in subsequent analyses” (Stopforth, 2020, p. 74)

“is no single, agreed-upon way to measure GCSE attainment. The policy benchmark of attaining 5 or more A*-Cs has been used so far. The following section provides a series of sensitivity analyses to explore different operationalisations of GCSE attainment.” (Stopforth, 2020, p. 74)

“This section examines a categorical measure of GCSE attainment brackets (Table 1.17). There are several models suited to modelling categorical dependent variables. Multinomial logistic regression models are suitable for nominal dependent variables, and are sometimes applied to ordinal outcomes (Long, 1997, Long and Freese, 2014). Stereotype logistic regression models are parsimonious alternatives to multinomial logistic regression models (Lunt, 2001), and can be used when the assumptions of ordered logistic regression models, such as the proportional odds assumption, are violated (Liu, 2014)” (Stopforth, 2020, p. 78)

“An ordered logistic regression model is more appropriate for GCSE attainment brackets, because there is a definite order and hierarchy to the categories and the distances between the categories are not assumed to be equal (Long, 1997” (Stopforth, 2020, p. 78)

“There are different types of ordered logistic regression models, for example, the proportional odds model (McCullagh, 1980) and the continuation ratio model (Fienberg and Mason, 1979) (also see Berridge, 1992, Gayle, 1996, O'Connell, 2006, Long and Freese, 2014)” (Stopforth, 2020, p. 78)

“Chapter 1 76” (Stopforth, 2020, p. 79)

“There is a practical analytical challenge with estimating the continuation ratio model in Stata. The commands are generally user-written and are not compatible with the svy suite in Stata. In practice, this means that appropriate adjustments for complex survey design cannot be made. The models have been estimated using unweighted data to allow for ready comparison in Table 1.19. The use of unweighted estimates is problematic for robust inferential analysis (Treiman, 2009).” (Stopforth, 2020, p. 79) This needs to be looked into in more detail

“An assumption of the proportional odds model is that the coefficients are the same across the separate logistic regressions, termed the parallel regression assumption (Williams, 2016). The proportional odds model in Table 1.19 does not violate the parallel regression assumption (assessed using the Brant test, see Brant, 1990).” (Stopforth, 2020, p. 79)
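
To remind myself what the proportional odds model implies: one set of coefficients, with only the cutpoints differing across the cumulative splits. A minimal sketch, assuming the common parameterisation P(Y <= j) = logistic(cutpoint_j - xb); the cutpoints and linear predictor values are hypothetical.

```python
import math


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


def category_probabilities(linear_predictor, cutpoints):
    """Category probabilities under a proportional odds model.

    cutpoints must be in ascending order; the same linear predictor
    is used at every cutpoint (the parallel regression assumption).
    """
    cumulative = [logistic(c - linear_predictor) for c in cutpoints] + [1.0]
    probabilities = []
    previous = 0.0
    for value in cumulative:
        probabilities.append(value - previous)
        previous = value
    return probabilities


# Hypothetical cutpoints for four ordered attainment brackets.
CUTPOINTS = [-1.0, 0.5, 2.0]
low_background = category_probabilities(0.0, CUTPOINTS)
high_background = category_probabilities(1.0, CUTPOINTS)
```

A higher linear predictor shifts probability mass towards the top bracket without changing the cutpoints.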

“Continuation ratios can be constrained or unconstrained. The difference between the constrained and unconstrained models is in the number of parameters estimated. Constrained models make the assumption that the effects of the independent variables are constant across the categories of the dependent variable. Constrained models are therefore more parsimonious because they produce one set of coefficients (Long and Freese, 2014). A non-significant likelihood ratio test provides evidence to prefer the constrained model over the unconstrained model (LR 62.03 @ 60 degrees of freedom, p=.4037).” (Stopforth, 2020, p. 81)
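
A quick check of the likelihood ratio test quoted above, using only the standard library. The function relies on the closed-form chi-square tail probability that holds for even degrees of freedom, so it is a convenience sketch rather than a general routine; the function name is my own.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variable, valid for even df only.

    For df = 2k: P(X > x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only holds for positive even df")
    half = x / 2.0
    term = math.exp(-half)  # i = 0 term
    total = term
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return total


# LR statistic and degrees of freedom as quoted above.
p_value = chi2_sf_even_df(62.03, 60)
```

The result should be close to the quoted p = .4037, i.e. non-significant, which is the basis for preferring the constrained model.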

“The substantive conclusions of the proportional odds model and the continuation ratio model in Table 1.19 are very similar” (Stopforth, 2020, p. 81)

“predicted probabilities shown are based on the proportional odds model. The predicted probabilities were re-estimated for the continuation” (Stopforth, 2020, p. 81)

“The final set of models estimate the number of GCSEs attained at grades A*-C (refer to Figure 1.3). A series of regression models suitable for count data are estimated including the Poisson, negative binomial, zero-inflated Poisson, and zero-inflated negative binomial (Cameron and Trivedi, 1998). The estimates across all four models are relatively similar (see Table A1.4 in Appendix 1)” (Stopforth, 2020, p. 83)

“Long and Freese (2014: 507) advise that the Poisson model seldom fits count measures in social survey data because the model does not account for over-dispersion. The negative binomial regression model is usually better suited to dealing with data with overdispersion (Cameron and Trivedi, 1998, Long and Freese, 2014). Comparing the four models, a significant likelihood ratio test provides evidence of over-dispersion and therefore the negative binomial regression model is preferred over a Poisson model (Long, 1997, Long and Freese, 2014).” (Stopforth, 2020, p. 83)
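
A crude descriptive check for over-dispersion, as a reminder of what the problem looks like. This is not the likelihood ratio test used in the thesis: it only compares the raw variance of the counts with their mean, ignoring covariates. The counts are hypothetical.

```python
from statistics import mean, variance

# Hypothetical numbers of GCSEs at A*-C for twelve pupils (note the pile of zeros).
counts = [0, 0, 0, 0, 2, 5, 7, 8, 9, 10, 10, 11]

count_mean = mean(counts)
count_variance = variance(counts)  # sample variance

# A Poisson model assumes variance == mean, so a ratio well above 1
# suggests over-dispersion.
dispersion_ratio = count_variance / count_mean
overdispersed = dispersion_ratio > 1
```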

“Zero-inflated models can account for high proportions of zeros in count data (see Long and Freese, 2014 for a detailed review).” (Stopforth, 2020, p. 83)

“Comparing the negative binomial model and zero-inflated negative binomial model, a significant Vuong test (see Vuong, 1989) provides evidence that a zero-inflated model is most suitable for these data (Long, 1997, Long and Freese, 2014)” (Stopforth, 2020, p. 84)
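
A sketch of the zero-inflation idea. For simplicity this uses a zero-inflated Poisson rather than the zero-inflated negative binomial preferred in the thesis; the mixing logic is the same. The parameter values are hypothetical.

```python
import math


def poisson_pmf(k, lam):
    return math.exp(-lam) * lam ** k / math.factorial(k)


def zip_pmf(k, lam, pi_zero):
    """Zero-inflated Poisson: with probability pi_zero the count is a
    structural zero, otherwise it is drawn from Poisson(lam)."""
    base = (1.0 - pi_zero) * poisson_pmf(k, lam)
    return pi_zero + base if k == 0 else base


# Hypothetical values: mean of 5 passes among those "at risk", 30% structural zeros.
LAM, PI_ZERO = 5.0, 0.3
p_zero_plain = poisson_pmf(0, LAM)
p_zero_inflated = zip_pmf(0, LAM, PI_ZERO)
```

The inflated model puts far more probability on zero than the plain Poisson with the same rate, which is the feature needed when many pupils attain no A*-C grades.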

“Post-estimation marginal effects and expected counts can be calculated to better understand effect sizes (Table 1.21” (Stopforth, 2020, p. 86)

“Expected counts can also be calculated for a combination of characteristics, sometimes called ‘ideal types’ (Long and Freese, 2014). The characteristics of the ‘most’ advantaged can be deduced as those with graduate parents in NS-SEC 1.2 living in owned or privately rented homes. Alternatively, the characteristics of the ‘least’ advantaged can be considered those with parents with below school-level qualifications in NS-SEC 7 living in social housing.” (Stopforth, 2020, p. 86)

“A central methodological contribution of this study is the attention to sensitivity analyses of key parental socio-economic background and GCSE attainment measures.” (Stopforth, 2020, p. 87)

“Government statistics and education reports tend to focus on the attainment gap between the most disadvantaged and all other pupils. Often, proxy measures of disadvantage are used, such as the eligibility for Free School Meals (Department for Education, 2015c). The analyses in this chapter explore the relative gap between young people from a broader range of socioeconomic backgrounds using detailed education and occupation-based parental measures” (Stopforth, 2020, p. 88)

“replication study involves using the same analytical procedures with a different dataset to examine the empirical regularity of the findings in Chapter 1.” (Stopforth, 2020, p. 90)

“Herrnson (1995: 452) stated that replication should necessarily involve new data, which ‘repeats an empirical study in its entirety’. Janz (2016) termed this distinction duplication and replication. Freese and Peterson (2017: 152) outlined four distinct ‘forms of replication’: verifiability, robustness, repeatability, and generalisability. Verification tends to focus on producing the same results when analysing the same data. Robustness checks tend to use alternative specifications using the same data. Repeatability involves using the same analytical approach as the original study with different data.” (Stopforth, 2020, p. 90)

“Chapter 2 88” (Stopforth, 2020, p. 91)

“each wave of data collection in the UKHLS takes place over the course of 24 months. The data collection is overlapping, with a new wave beginning every 12 months. This is largely due to the vast increase in the number of households visited per wave compared with the BHPS” (Stopforth, 2020, p. 91)

“Second, GCSE results are not reported within the UKHLS main survey but are linked through official education records from the National Pupil Database (NPD).” (Stopforth, 2020, p. 91)

“The GCSE results contained within the linked NPD data cover the academic years 2001/02 to 2012/13 (University of Essex, 2015).” (Stopforth, 2020, p. 92)

“There is a higher percentage of young people from ethnic minority backgrounds in the UKHLS-NPD sample compared with the BHPS sample. This is due to the inclusion of an Ethnic Minority Boost sample in the UKHLS survey from Wave 1.” (Stopforth, 2020, p. 93)

“Despite the inclusion of an ethnic minority boost sample in the UKHLS, there continue to be low sample sizes for many ethnic groups in the UKHLS-NPD sample” (Stopforth, 2020, p. 95)

“ethnicity is collapsed into a five-category variable and is not significantly associated with GCSE attainment. The within-group heterogeneity of the aggregated variable may contribute to the overall lack of significance.” (Stopforth, 2020, p. 95)

“It is not possible to conclude that the effect of socio-economic background is stronger in the later cohorts (see Allison, 1999, Williams, 2009, Mood, 2010). There is, however, convincing evidence that parental socio-economic background continues to exert a powerful influence over the average attainment of young people into the early 2010s.” (Stopforth, 2020, p. 100)

“There is a remarkable empirical regularity in the results of the zero-inflated negative binomial regression models across the two synthetic cohort samples in the BHPS and the UKHLS-NPD.” (Stopforth, 2020, p. 104)

“The datasets span over two decades and the analyses illustrate that there are clearly persisting socio-economic background effects in GCSE attainment for the young people sitting their GCSEs in the 1990s, 2000s, and early 2010s.” (Stopforth, 2020, p. 104)

“The NPD collects information at a finer level of detail than is routinely collected in social science surveys. The NPD data provide information on individual grades for each GCSE subject.” (Stopforth, 2020, p. 105)

“overall substantive message emerging from the empirical results is that socio-economic inequalities in GCSE attainment are not limited to the gap between the most disadvantaged and their less disadvantaged peers” (Stopforth, 2020, p. 110)

“The presence of missing data is ubiquitous in social science surveys (Hawkes and Plewis, 2006, Longhi and Nandi, 2015). Missing data has the potential to produce biased estimates in statistical analyses (Treiman, 2009). Carpenter and Kenward (2013) advised that a complete records analysis should be the first step before attempting to address missing data. The authors argued that these analyses can provide valid inferences, but there is a potential for results to be inefficient (Carpenter and Kenward, 2013: 35). Mehmetoglu and Jakobsen (2017) noted that missing data are often dealt with using listwise deletion. The analyses in Chapters 1 and 2 were conducted on complete records, i.e. missing data on any of the variables in the analytical models were dealt with through listwise deletion” (Stopforth, 2020, p. 111)
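
What a complete records analysis (listwise deletion) amounts to in practice, as a minimal sketch. The records and variable names are hypothetical.

```python
# Hypothetical pupil records; None marks a missing value.
records = [
    {"gcse_5ac": 1, "parent_nssec": "1.2", "parent_education": "degree"},
    {"gcse_5ac": 0, "parent_nssec": None, "parent_education": "none"},
    {"gcse_5ac": 1, "parent_nssec": "3", "parent_education": None},
    {"gcse_5ac": 0, "parent_nssec": "7", "parent_education": "school-level"},
]

MODEL_VARIABLES = ["gcse_5ac", "parent_nssec", "parent_education"]


def complete_records(rows, variables):
    """Keep only rows with no missing value on any model variable."""
    return [row for row in rows if all(row.get(v) is not None for v in variables)]


analysis_sample = complete_records(records, MODEL_VARIABLES)
n_dropped = len(records) - len(analysis_sample)
```

A case is dropped if it is missing on any single variable, which is why the analytical sample can shrink quickly as more variables enter the model.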

“There are three general types of missingness: missing completely at random (MCAR), missing at random (MAR), and missing not at random (MNAR). If data are MCAR, missingness is not conditional on any other variable or on the values of the variable itself. Analyses with data MCAR are likely to provide valid estimates but potentially inefficient standard errors. If data are MAR, missingness may be conditional on other variables in the dataset, but not conditional on the values of the variable itself.” (Stopforth, 2020, p. 112)

“Analyses with data MNAR generally require the most comprehensive statistical methods to help to overcome the problems associated with missingness (see Treiman, 2009, Mehmetoglu and Jakobsen, 2017).” (Stopforth, 2020, p. 112)

“Treiman (2009), Carpenter and Kenward (2013), Mehmetoglu and Jakobsen (2017) advise against ad hoc methods of dealing with missing data, such as pairwise deletion, mean substitution, and creating explicit categories of missingness. Treiman (2009: 182) commented that the ‘gold standard’ of treating missing data in the social sciences is the use of multiple imputation methods.” (Stopforth, 2020, p. 112)

“all variables with missing data in the analytical model are multiply imputed by chained equations (Carpenter and Kenward, 2013)” (Stopforth, 2020, p. 113)

“Those not in employment can be considered as structurally missing.” (Stopforth, 2020, p. 114)

“are a further 79 missing values for parental NS-SEC. Of the 79 missing values, 54 are missing because neither mother nor father are in the household, and 25 are missing because the parents did not complete an interview.” (Stopforth, 2020, p. 114)

“Unweighted analyses are naïve because they do not take into account complex survey design, and tend to underestimate standard errors (Treiman, 2009).” (Stopforth, 2020, p. 115)

“This is the complete records logistic regression model presented in Chapter 1. Model 3 presents the results of a logistic regression model adjusting for complex survey design using a 9-category version of NS-SEC, i.e. including those not in employment as a separate group” (Stopforth, 2020, p. 115)

“The substantive conclusions derived from Models 2 and 3 are the same. Model 4 presents the results of a logistic regression model adjusting for complex survey design and using a 9-category NS-SEC variable with the last observation carried forward (LOCF). This means that, in the event that an individual is not in employment in the year that their child reported their GCSEs, their NS-SEC from the previous wave is fed forward to fill in a missing data gap. In practice, very few cases changed as a result of carrying the last observation forward” (Stopforth, 2020, p. 115)
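
A minimal sketch of last observation carried forward as described above: a missing NS-SEC in one wave is filled with the most recent earlier value. The function name and the wave data are hypothetical.

```python
def carry_forward(values):
    """Fill each missing value (None) with the most recent non-missing value.

    Leading missing values stay missing because there is nothing to carry forward.
    """
    filled = []
    last_seen = None
    for value in values:
        if value is not None:
            last_seen = value
        filled.append(last_seen)
    return filled


# Hypothetical NS-SEC of one parent across four waves; None = not in employment.
nssec_by_wave = ["3", None, None, "5"]
nssec_locf = carry_forward(nssec_by_wave)
```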

“The sample size has, however, increased as a consequence of adding in an extra category of NS-SEC for ‘not in employment” (Stopforth, 2020, p. 115)

“The multiple imputation models are estimated using the mi suite in Stata (see mi, StataCorp, 2017a), which is compatible with the svy suite and can therefore adjust for complex survey design” (Stopforth, 2020, p. 117)
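
For my own understanding of what happens after imputation: estimates from the imputed datasets are usually combined with Rubin's rules. The thesis does not spell this out, so treat it as my assumption about the pooling step; the coefficients and variances below are hypothetical.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Combine one coefficient across m imputed datasets using Rubin's rules.

    Returns the pooled estimate and its total variance:
    total = within + (1 + 1/m) * between
    """
    m = len(estimates)
    pooled_estimate = mean(estimates)
    within = mean(variances)        # average squared standard error
    between = variance(estimates)   # sample variance of the m estimates
    total_variance = within + (1.0 + 1.0 / m) * between
    return pooled_estimate, total_variance


# Hypothetical log odds coefficient and squared standard error from five imputations.
coefs = [0.80, 0.85, 0.78, 0.82, 0.90]
squared_ses = [0.040, 0.042, 0.039, 0.041, 0.043]
pooled, total_var = pool_rubin(coefs, squared_ses)
```

The between-imputation term is what makes the pooled standard error larger than a single-imputation analysis would suggest.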

“limitations to using multiple imputation models. For example, goodness-of-fit measures, such as summary statistics of parsimony (BIC) and explanatory power (R2), are not currently estimable using statistical software. It is not possible to compare goodness-of-fit statistics to ascertain the most appropriate model. Further, there is no clear guidance on the optimum number of imputations to estimate” (Stopforth, 2020, p. 117)

“Model 1 presents an unweighted logistic regression model using a 9-category NS-SEC variable. Model 2 presents the logistic regression model results adjusting for complex survey design and using a 9-category NS-SEC variable. The results are fairly consistent between the two models, with more conservative standard error estimates for the model adjusting for complex survey design.” (Stopforth, 2020, p. 117)

“The next three models use variations of multiple imputation by chained equations” (Stopforth, 2020, p. 117)

“The imputation model and analytical model do not use survey weights or adjust for complex survey design. The model does not impute the values of NS-SEC for those structurally missing” (Stopforth, 2020, p. 117)

“The results are consistent with the unweighted complete records model (Model 1). Model 4 presents the results of an unweighted multiple imputation model and weighted analytical model” (Stopforth, 2020, p. 117)

“Model 5 presents the results of a weighted imputation model and a weighted analytical model. This model adjusts for complex survey design in both the imputation and analytical stages of the multiple imputation process” (Stopforth, 2020, p. 119)

“Models 4 and 5 are almost identical.” (Stopforth, 2020, p. 119)

“the results of the multiple imputation by chained equation models are consistent with the complete records analyses in Chapter 1.” (Stopforth, 2020, p. 119)

“Constructing categorical variables with separate categories of missingness is not generally advised in the missing data literature (Carpenter and Kenward, 2013, Mehmetoglu and Jakobsen, 2017).” (Stopforth, 2020, p. 123)

“The following chapters explore the relationship between parental socio-economic background and GCSE attainment in greater depth” (Stopforth, 2020, p. 124)

“Cognitive and educational outcomes at earlier stages of schooling are stratified by parental socio-economic background (Feinstein, 2004, Gregg and Washbrook, 2011, Chowdry et al., 2011, Crawford et al., 2017, Connelly and Gayle, 2019).” (Stopforth, 2020, p. 124)

“Detailed measures of parental education and parental occupation are seldom available in administrative datasets. Linked official records for educational attainment are seldom available in large-scale social science surveys. The advantage of using the UKHLS-NPD data thereby provide an original opportunity to model parental socio-economic background and school attainment using detailed individual measures for contemporary cohorts of young people.” (Stopforth, 2020, p. 125)

“The use of cultural capital in quantitative sociological research provides substantial operationalisation challenges (Lamont and Lareau, 1988, Sullivan, 2002). The UKHLS contains a wealth of candidate measures for the concept of cultural capital.” (Stopforth, 2020, p. 125)

“Introduction to Part 2 123” (Stopforth, 2020, p. 126)

“The use of an additional ‘not in employment’ category in the analyses in Part 2 has been largely motivated by the desire to maintain a suitably large sample size of valid observations and associated statistical power” (Stopforth, 2020, p. 126)

“socio-economic background tends to play a diminished role in explaining differences in higher education participation after controlling for prior educational attainment, due to selection effects and socio-economic inequalities influencing attainment at earlier stages (Galindo-Rueda et al., 2004, Marcenaro-Gutierrez et al., 2007, Broecke and Hamed, 2008, Chowdry et al., 2013, Smith, 2015).” (Stopforth, 2020, p. 128)

“Young people in England are periodically examined throughout their primary and secondary schooling at ages 7, 11, 14, and 16 (Machin and Vignoles, 2006).” (Stopforth, 2020, p. 129)

“Key Stage assessments were initially marked internally by teachers (Wyse and Torrance, 2009). Following the Dearing reforms, Key Stage 2 and 3 tests in English, Mathematics, and Science were to be externally examined (Whetton, 2009).” (Stopforth, 2020, p. 130)

“Eligibility for Free School Meals (FSM) is a standard measure of deprivation or disadvantage in the education system in England (Steele et al., 2007, Department for Education, 2015a, Department for Education, 2015b). FSM eligibility is an indicator of relative poverty, and is based on the receipt of other welfare benefits such as income support (Gorard and Siddiqui, 2019).” (Stopforth, 2020, p. 131)

“Locality-based analyses of pupil attainment in primary and secondary schools have demonstrated that eligibility for Free School Meals is strongly associated with attainment over the Key Stages (Sammons et al., 1997, Sammons and Smees, 1998, Strand, 1999, Strand, 2014b).” (Stopforth, 2020, p. 131)

“The FSM measure has a clear, legal definition and can be assumed to be consistently collected in education datasets for all pupils (Gorard and See, 2009).” (Stopforth, 2020, p. 132)

“Taylor (2018) examined the reliability of FSM eligibility as a measure of socio-economic position using the Millennium Cohort Study and linked administrative data for Wales. The study found that, whilst not a perfect measure, FSM eligibility does provide a pragmatic and reliable approach to measuring disadvantage in educational research. However, using FSM eligibility at one point in time does not take into consideration longer-term effects of relative poverty (Gorard and Siddiqui, 2019” (Stopforth, 2020, p. 132)

“Goldthorpe and McKnight (2004) similarly noted that measures of income are less stable indicators of socio-economic position than occupation-based measures.” (Stopforth, 2020, p. 133)

“NS-SEC can therefore provide a more stable measure to examine socio-economic inequalities than a proxy indicator for low income. Further, FSM eligibility compares a minority of the most disadvantaged pupils, i.e. those living in relative poverty, to a majority of non-disadvantaged pupils (Gorard and See, 2009).” (Stopforth, 2020, p. 133)

“Feinstein (2004) analysed the UK 1958 and 1970 birth cohorts and noted that there was a compounding effect of the educational attainment gap between more and less advantaged pupils throughout primary and secondary schooling.” (Stopforth, 2020, p. 134) Massively important to look into

“Scott (2004) analysed the BHPS and demonstrated that social class effects were larger at A’ Level than GCSE level, emphasising the importance of selection effects” (Stopforth, 2020, p. 134)

“education level and parental NS-SEC are included as the main socioeconomic background measures of interest. Housing tenure, gender, and ethnicity are also included in the following models. Ethnicity is measured as a five-category variable following guidance from the Office for National Statistics (Office for National Statistics, 2013). Information has been omitted (represented with an *) where categories did not meet the minimum threshold for statistical disclosure control.” (Stopforth, 2020, p. 138)

“next step in the analyses is to examine models of GCSE attainment, controlling for prior attainment. This is common practice in the literature (for example, Gregg and Washbrook, 2011, Chowdry et al., 2011, Strand, 2014b, Sullivan et al., 2018). Controlling for prior attainment is a useful starting point before moving to more comprehensive analyses” (Stopforth, 2020, p. 144)

“Modelling GCSE attainment and controlling for Key Stage 2 attainment, even after including an interaction effect, is not satisfactory. First, the models violate the regression assumption of multicollinearity.” (Stopforth, 2020, p. 149)

“Second, the determinants of the outcome measure of GCSE attainment are likely to be similar to those underlying an outcome measure of earlier attainment. For example, as observed in section 5.1 above, the socio-economic measures influencing GCSE attainment are also significantly associated with Key Stage 2 attainment” (Stopforth, 2020, p. 149)

“One conventional approach to addressing endogeneity in statistical models is the use of instrumental variables (Angrist and Pischke, 2009).” (Stopforth, 2020, p. 149)

“Suitable instrumental variables for educational outcomes are not routinely collected in social science surveys and therefore this is not a practicable approach in these analyses” (Stopforth, 2020, p. 149)

“Controlling for prior attainment is an alternative, pragmatic approach in statistical models where suitable instruments do not exist. However, there are limitations to this approach. For example, the effects of residual heterogeneity for Key Stage 2 attainment are likely to also affect the estimates of a model of GCSE attainment using Key Stage 2 attainment as an independent variable” (Stopforth, 2020, p. 149)

“Path analysis models can estimate more than one outcome variable in the same model (Allen, 2017), including outcomes which may be temporally dependent. This is particularly useful in the present case, where parental socio-economic background measures are significantly associated with two related educational outcome variables (attainment at age 11 and attainment at age 16). Path analysis associations tend to be presented in diagrammatic form with standardised coefficients (Wright, 1960, Duncan, 1966). A statistically attractive property of path analysis is the decomposition of effects of the exogenous variables on the endogenous variables into direct, and indirect effects.” (Stopforth, 2020, p. 150)

“This is a recursive model, because the effect is in one direction (Allen, 2017” (Stopforth, 2020, p. 150)

“Figure 4.7: A conceptual path model of parental socio-economic background and” (Stopforth, 2020, p. 151)

“Path models are sometimes considered special types of structural equation models where all variables are manifest, i.e. observed (Kaplan, 2009, Acock, 2013). The path analysis models presented are estimated in a structural equation modelling framework (see sem, StataCorp., 2017a). The path analysis output is presented in Table 4.5. The coefficients have been standardised to follow path model convention (Kaplan, 2009). The standardised coefficients allow for direct comparison of effect sizes across independent variables.” (Stopforth, 2020, p. 151)

“Identifying a well-specified model with suitable goodness-of-fit is an important criteria for structural equation models (Yuan, 2005), of which path models are a special case.” (Stopforth, 2020, p. 153)

“The analyses report absolute fit indices of a chi square test, the root mean squared error of approximation (RMSEA) and the Standardised Root Mean Squared Residual (SRMSR). The analyses also report two measures of relative fit indices, the Comparative Fit Index (CFI) and the Tucker-” (Stopforth, 2020, p. 153)

“Chapter 4 151” (Stopforth, 2020, p. 154)

“To ascertain the full effects of parental socio-economic background, the coefficients are traced back from the main outcome of interest to the exogenous variables (Wright, 1960), i.e. from GCSE attainment to parental education and parental social class.” (Stopforth, 2020, p. 154)
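
A sketch of the tracing logic for a simple recursive path model with one mediator (background to Key Stage 2 to GCSE). The standardised coefficients are hypothetical, not the values in Table 4.5.

```python
def decompose_effects(background_to_ks2, ks2_to_gcse, background_to_gcse):
    """Direct, indirect, and total effect of background on GCSE attainment.

    The indirect effect is the product of the coefficients along the
    mediated path; the total effect is direct plus indirect.
    """
    direct = background_to_gcse
    indirect = background_to_ks2 * ks2_to_gcse
    return {"direct": direct, "indirect": indirect, "total": direct + indirect}


# Hypothetical standardised path coefficients.
effects = decompose_effects(
    background_to_ks2=0.30,
    ks2_to_gcse=0.60,
    background_to_gcse=0.15,
)
```

With these made-up numbers more of the total effect runs through earlier attainment than through the direct path, which is the kind of pattern a model that only controls for prior attainment would hide.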

“The following section re-estimates the path analysis models using separate measures of English and Maths test scores” (Stopforth, 2020, p. 157)

“There is a large literature on school effectiveness which emphasises the role that schools play in young people’s educational outcomes (see Goldstein and Woodhouse, 2000 for a comprehensive review” (Stopforth, 2020, p. 161)

“The UKHLS is a nationally representative study of households in the UK, not of young people of secondary school age in the UK. There are likely to be few pupils in the same schools and strong school clustering effects are less likely to be evident.” (Stopforth, 2020, p. 161)

“If the type of school measure were collected and categorised according to selectiveness, such as comparisons of comprehensive schools to grammar schools or private schools, then the” (Stopforth, 2020, p. 161)

“Cultural capital was originally theorised to explain social class differences in educational outcomes (see Bourdieu, 1986).” (Stopforth, 2020, p. 165)

“Cultural capital can be understood as the accumulation of a set of skills, knowledge, attitudes, and behaviours which are sanctioned by the ‘dominant’ class in society (see Lamont and Lareau, 1988). Access to, or possession of, cultural resources can help individuals access ‘scarce rewards’ (Lareau and Weininger, 2003). In an education setting, rewards might be more favourable examination results, participation in higher education, or access to more prestigious universities. Lareau and Weininger (2003: 587) stressed the exclusionary aspect of cultural capital and its potential to be monopolised, i.e. it is not universally available to all and is transmitted from parents to children so that advantage is passed down between generations.” (Stopforth, 2020, p. 166)

“The concept of cultural capital is most closely associated with Pierre Bourdieu (Davies and Rizk, 2017). Lareau and Weininger (2003: 567) asserted that cultural capital is a ‘signature concept’ of Pierre Bourdieu.” (Stopforth, 2020, p. 166)

“Goldthorpe (2007) suggested that the role of cultural differences in educational outcomes was not new, and was reflective of a longer tradition in the sociology of education. For example, Bernstein noted that children from different social backgrounds were taught different (i.e. elaborated compared with restricted) linguistic codes at home (Bernstein, 1964). Jackson and Marsden’s (1962) ethnographic study in 1950s Huddersfield found that northern working class children struggled to adjust to the cultural environment of grammar schools. The cultural dissonance of the ‘scholarship boy’ in grammar schools was similarly presented in contemporaneous works, such as Hoggart (1957) and Lacey (1970).” (Stopforth, 2020, p. 167)

“Cultural capital has three states: embodied, objectified, and institutionalised (Bourdieu, 1986).” (Stopforth, 2020, p. 167)

“The capital metaphor implies an element of accumulation of time and labour investment for each form of cultural capital before rewards are accessible (see Bourdieu, 1986, Field, 2003). Savage et al. (2005) used the terminology of ‘capitals, assets, and resources’, suggesting that the distinction between resources and capital was accumulation over time” (Stopforth, 2020, p. 168)

“Bourdieu and Passeron (1990) argued that only those who already possess the dominant cultural capital will gain from the education system, notably those with the linguistic and cultural competence of the dominant culture” (Stopforth, 2020, p. 168)

“Sullivan (2002) argued that there is a lack of conceptual clarity in the original theorisation of cultural capital” (Stopforth, 2020, p. 169)

“One of the key consequences of the conceptual vagueness has been varied subsequent operationalisations of cultural capital (DiMaggio, 1979, Sullivan, 2002” (Stopforth, 2020, p. 170)

“Egerton (1997) analysed the National Child Development Study in the UK and demonstrated the importance of cultural capital (highest education level) for respondents entering the same occupational destinations as their parents” (Stopforth, 2020, p. 171) Important to look into

“Sullivan (2002: 154) argued that Bourdieu was ‘not entitled to assume that a high parental level of education reveals a high level of parent cultural capital’. The use of parental education level as a measure of cultural capital assumes that those attaining higher levels of education automatically possess greater levels of cultural capital or engage in classically highbrow cultural practices.” (Stopforth, 2020, p. 172)

“Sullivan (2001) and Jaeger (2009) both noted that researchers tend to test partial components of a cultural capital theory rather than the broader theory of cultural reproduction” (Stopforth, 2020, p. 180)

“De Graaf et al. (2000) provided refinements to testing the role of cultural capital and cultural reproduction. First, evidence in favour of the cultural reproduction thesis would demonstrate that the impact of cultural capital is greater for children from more advantaged than less advantaged backgrounds. Second, evidence for the cultural reproduction thesis would demonstrate that cultural capital mediates the role of social background in educational” (Stopforth, 2020, p. 180)

“The first set of analyses examine the extent to which cultural capital is unequally distributed by socio-economic background and is transmitted from parents to children” (Stopforth, 2020, p. 186)

“De Graaf et al. (2000) and Sullivan (2001) noted that the inclusion of cultural capital measures in models of educational attainment should weaken, i.e. mediate, the effects of parental socio-economic background” (Stopforth, 2020, p. 191)

“In the next stage of the analyses, cultural capital measures are entered separately into linear regression models of GCSE attainment along with parental education or parental social class” (Stopforth, 2020, p. 193)

“Partial mediation occurs where the effect of the independent variable is reduced after controlling for the mediating variable” (Stopforth, 2020, p. 193)

“Perfect mediation occurs where the independent variable is not significantly associated with the dependent variable after controlling for the mediating variable (Baron and Kenny, 1986).” (Stopforth, 2020, p. 193)
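
Note to self: a minimal sketch of the partial-mediation logic quoted above, using synthetic data and plain NumPy. The variable names (`ses`, `culture`, `gcse`) and all effect sizes are invented for illustration and are not taken from the thesis. The sketch only compares coefficients before and after adding the mediator; Baron and Kenny's "perfect mediation" additionally rests on a significance test, which is not implemented here.

```python
import numpy as np


def ols(y, *columns):
    """Return OLS coefficients (intercept first) of y on the given columns."""
    X = np.column_stack([np.ones(len(y)), *columns])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


rng = np.random.default_rng(0)
n = 5000

# Hypothetical data: background affects the outcome directly and via a mediator.
ses = rng.normal(size=n)                         # parental background (assumed)
culture = 0.5 * ses + rng.normal(size=n)         # candidate mediator (assumed)
gcse = 0.6 * ses + 0.4 * culture + rng.normal(size=n)

# Step 1: effect of background without the mediator (total effect).
total_effect = ols(gcse, ses)[1]

# Step 2: effect of background after controlling for the mediator (direct effect).
direct_effect = ols(gcse, ses, culture)[1]

# Partial mediation: the coefficient shrinks but does not vanish.
share_mediated = 1 - direct_effect / total_effect
print(f"total={total_effect:.3f} direct={direct_effect:.3f} "
      f"share mediated={share_mediated:.2%}")
```

In the thesis the corresponding comparison shows only minimal change in the socio-economic coefficients, which is the opposite pattern to this deliberately constructed example.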

“The cultural capital measures do not, however, substantially reduce the socio-economic gradient or remove the persisting association between NS-SEC and overall GCSE score.” (Stopforth, 2020, p. 194)

“analysis is a robust method of data reduction routinely used in social science research (Bartholomew, 2008). This method has been used in previous empirical work to operationalise cultural capital (for example, DiMaggio, 1982, De Graaf, 1986, De Graaf, 1988, Katsillis and Rubinson, 1990, Hartas, 2016). The aggregation of separate measures into a single factor score” (Stopforth, 2020, p. 195)

“The resultant factor score can be interpreted as a quantity measure, whereby the individual has ‘more’ or ‘less’ cultural capital.” (Stopforth, 2020, p. 196)

“Factor analysis is estimated using the correlation matrix of variables (Pett et al., 2003). In this case, the tetrachoric correlation matrix is used for dichotomous variables (Uebersax, 2015).” (Stopforth, 2020, p. 196)

“The results presented use principal components analysis, and the results were also re-estimated using the default (principal factor) method in Stata (see factor, StataCorp., 2017a).” (Stopforth, 2020, p. 196)

“Mehmetoglu and Jakobsen (2017) advise that the number of retained factors should be based on eigenvalues, scree test, and theoretical sense” (Stopforth, 2020, p. 196)

“The measure of parent library use loads weakly on all factors, and the measure of young person theatre attendance loads strongly (just over the 0.4 threshold) on two factors. The results are re-estimated separately without the parent library and child theatre variables, but the substantive results remain the same as in the model presented below” (Stopforth, 2020, p. 197)

“Factor scores are generated and used as covariates in the linear regression models of GCSE attainment. The factor scores are standardised with mean of 0 and standard deviation of 1.” (Stopforth, 2020, p. 198)
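
Note to self: a rough sketch of how a single standardised component score could be built from several binary indicators. This is an assumption-laden illustration, not the thesis procedure: it uses the ordinary Pearson correlation matrix rather than the tetrachoric matrix described above, it uses NumPy rather than Stata's `factor`, and the five indicators are simulated.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000

# Hypothetical latent trait driving five made-up binary participation items.
latent = rng.normal(size=n)
items = np.column_stack(
    [(latent + rng.normal(size=n) > 0).astype(float) for _ in range(5)]
)

# Correlation matrix of the items (Pearson here, NOT tetrachoric).
corr = np.corrcoef(items, rowvar=False)

# eigh returns eigenvalues in ascending order; the last one is the largest.
eigvals, eigvecs = np.linalg.eigh(corr)
first = eigvecs[:, -1]
if first.sum() < 0:          # the sign of an eigenvector is arbitrary
    first = -first

# Project the standardised items onto the first principal component.
z = (items - items.mean(axis=0)) / items.std(axis=0)
raw_score = z @ first

# Standardise the score to mean 0 and standard deviation 1.
score = (raw_score - raw_score.mean()) / raw_score.std()
print(f"largest eigenvalue={eigvals[-1]:.2f}")
```

A score built this way can then enter a regression as a single covariate, read as "more" or "less" of the underlying quantity.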

“Summed scales are often used in empirical work using cultural capital indicators (for example, Roscigno and Ainsworth-Darnell, 1999, Dumais, 2002, Eitle and Eitle, 2002, Kraaykamp and Eijck, 2010).” (Stopforth, 2020, p. 199)

“The summed scale measures were added sequentially into the model and removed if not significant (see Table A1.9 in Appendix 1).” (Stopforth, 2020, p. 200)

“The mediation of highbrow cultural participation after the inclusion of reading behaviours has been demonstrated in previous empirical work (see Crook, 1997, De Graaf et al., 2000, Sullivan, 2001).” (Stopforth, 2020, p. 200)

“The addition of the cultural capital measures improves the explanatory power (Adjusted R2) and parsimony (BIC) of the model. There are minimal differences in the coefficients for parental education and parental social class after the inclusion of cultural capital measures. Therefore, there is a large parental socio-economic background effect in GCSE attainment which persists even after including measures of cultural capital, i.e. parent and child reading behaviours” (Stopforth, 2020, p. 200)

“The overall effect is modest and there is not convincing evidence that cultural capital mediates the role of parental socio-economic background in GCSE attainment. Cultural capital does not explain the socio-economic background effect. These findings do not support a Bourdieusian interpretation of the role of cultural capital in educational outcomes as outlined in section 2.” (Stopforth, 2020, p. 202)

“Bourdieu’s theory of cultural capital, and the broader cultural and social reproduction, has been criticised as being overly deterministic (see Jenkins, 1992, Sullivan, 2002). The concept has been widely criticised for being vague and ill-defined (Lamont and Lareau, 1988, Sullivan, 2002). Goldthorpe (2007) suggested that cultural ‘capital’ is associated with much ‘theoretical baggage’, and that cultural ‘resources’ should be preferred terminology” (Stopforth, 2020, p. 207)

“Youth educational aspirations have often been explored in the form of a desire to continue to study after compulsory schooling (Croll, 2009, Gorard et al., 2012) or to apply to university (Croll and Attwood, 2013, Khattab, 2015, Anders, 2017, McCulloch, 2017). Youth occupational aspirations have often been explored with focus on gender, ethnicity, and social class dimensions (Dumais, 2002, Archer et al., 2014, Platt and Parsons, 2017, Platt and Parsons, 2018).” (Stopforth, 2020, p. 209)

“Baker et al. (2014) argued that the policymaker framing of ‘high’ and ‘low’ aspirations is problematic. Treanor (2017) challenged the language of ‘high’ aspirations, arguing that the alternative, i.e. ‘low’ aspirations, infers a deficit view for those from less advantaged backgrounds. St Clair and Benjamin (2011) remarked that perceived ‘low’ aspirations have been framed as something of a ‘personal shortcoming’ for individuals and their parents. In a speech to the Labour party conference in 2007, Prime Minister Gordon Brown used the phrase ‘poverty of aspiration’. He asked ‘how much talent that could flourish is lost through a poverty of aspiration: wasted not because young talents fail to reach the stars but because they grow” (Stopforth, 2020, p. 210)

“status attainment literature in the USA in the late 1960s suggested that aspirations were central to understanding socio-economic inequalities in young people’s education (Sewell and Shah, 1968a, Sewell et al., 1969). Sewell and Shah (1968b: 559) noted ‘it is a sociological truism [...] that children of higher social class origins are more likely to aspire to high educational and occupational goals than are children of lower social class origins’.” (Stopforth, 2020, p. 211)

“Khattab (2015) disentangled expectations and aspirations, arguing that although linked, the two represent differences between what one hopes to achieve, and what one can realistically achieve. Conceptually, aspirations may operate outside of structural constraints, or ‘socio-economic realities’ (Gorard et al., 2012).” (Stopforth, 2020, p. 211)

“also conceptually plausible that educational aspirations may differ according to social class or education level (Anders, 2017).” (Stopforth, 2020, p. 211)

“The role of parental social class in aspiration formation may be understood in ‘relative risk aversion’ terms (see Breen and Goldthorpe, 1997). Following this theoretical position, undertaking further educational opportunities may reduce the risk of downward social mobility, and ensure that the young person reaches at least the class position of their parents (Breen and Goldthorpe, 1997, Breen and Yaish, 2006, Holm and Jaeger, 2008).” (Stopforth, 2020, p. 212) This is my belief

“The findings demonstrated that young people had similar aspirations of going to university at the age of 14, but those from less advantaged backgrounds were more likely to revise their aspirations downwards (from ‘likely’ to ‘unlikely’), and those from more advantaged backgrounds were more likely to revise their aspirations upwards (Anders, 2017).” (Stopforth, 2020, p. 213)

“McCulloch (2017) used latent class analysis to track young people’s educational trajectories using the LSYPE data, and found evidence of cumulative advantage and disadvantage with regards to socio-economic background, aspiration formation, and achievement.” (Stopforth, 2020, p. 213)

“Croll (2009) analysed intentions to stay on beyond the compulsory school leaving age of 16 using the BHPS and found that lower achieving individuals from more advantaged backgrounds were more likely to stay on in education than lower achieving individuals from less advantaged backgrounds” (Stopforth, 2020, p. 213)

“larger participation gap may be a result of socio-economic differences in access to resources or capitals (Hartas, 2016), or the material circumstances of the family (Hoskins and Barker, 2016).” (Stopforth, 2020, p. 213)

“Gorard et al. (2012: 41) stressed the non-recursive (i.e. not in one direction) nature of the association between aspirations and attainment, as ‘aspirations can be both a predictor ... and an outcome’ of educational achievement.” (Stopforth, 2020, p. 215)

“Anders (2017) commented that when measuring university aspirations, attainment at later levels of schooling, for example GCSE examinations at age 16, are potentially endogenous. This is largely because high attainment at this level is a requirement for A’ Level study, and subsequent university application” (Stopforth, 2020, p. 215)

“process of aspiration formation has been similarly described as ‘dynamic’, involving internal feedback between attainment and aspirations (St Clair and Benjamin, 2011).” (Stopforth, 2020, p. 215)

“Sullivan et al. (2013) used the Millennium Cohort Study to model cognitive outcomes and noted that mothers’ aspirations for their children at age 7 were ‘universally high’ but that such aspirations were likely to vary more as their children progress through education” (Stopforth, 2020, p. 215)

“Academic self-concept can be informed by prior attainment, and can also play an important role in educational and occupational aspirations (Winterton and Irwin, 2012). A young person’s perception of where they fit in the academic hierarchy is important in developing their future educational and occupational aspirations, and are likely to be solidified during secondary school years (Furlong and Biggart, 1999).” (Stopforth, 2020, p. 216)

“First, parents are asked how important they think A’ Levels are for their child” (Stopforth, 2020, p. 218)

“second measure is whether parents would like their child to go to university” (Stopforth, 2020, p. 218)

“First, longitudinal panel models are estimated to examine university aspirations over the secondary school years. Second, cross-sectional regression models are estimated to examine GCSE attainment and the roles of socio-economic background and parent and child aspirations.” (Stopforth, 2020, p. 219)

“Gayle and Lambert (2018) commented that balanced panels are uncommon in social survey datasets” (Stopforth, 2020, p. 219)

“Repeated contacts data are sub-optimally modelled using standard regression models, because they violate the assumption of independence of observations (Gayle and Lambert, 2018). Panel regression models, for example fixed or random effects models, account for repeated contacts within the dataset (Bell et al., 2018).” (Stopforth, 2020, p. 223)

“There are advantages and limitations with both fixed effects and random effects models (Clarke et al., 2010, Clark and Linzer, 2015, Bell et al., 2018, Hill et al., 2019). Fixed effects panel models account for within-person change and tend to theoretically produce consistent estimates. However, fixed effects panel models estimate potentially inefficient standard errors and cannot estimate models with time-constant explanatory variables. Random effects panel models better account for both within-person and between-person change, and can estimate both time-constant and time-varying explanatory variables” (Stopforth, 2020, p. 223)

“Gayle and Lambert (2018) provide a comprehensive review of panel models and their applications using Stata” (Stopforth, 2020, p. 224)

“Fixed effects models drop observations with unchanging outcomes. This is clearly problematic for the current sample which exhibits high levels of within-person stability.” (Stopforth, 2020, p. 224)
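
Note to self: a small sketch of why stable outcomes are a problem for a fixed effects model of a binary outcome. It simply counts which people ever change their answer. The records are invented, and the assumption that only people with within-person change contribute to the estimate reflects the conditional (fixed effects) logit case.

```python
from collections import defaultdict

# Hypothetical panel records: (person_id, wave, wants_university).
records = [
    (1, 1, 1), (1, 2, 1), (1, 3, 1),   # always "yes"
    (2, 1, 0), (2, 2, 1), (2, 3, 1),   # changes once
    (3, 1, 0), (3, 2, 0), (3, 3, 0),   # always "no"
    (4, 1, 1), (4, 2, 0), (4, 3, 1),   # changes twice
]

# Collect each person's sequence of outcomes.
outcomes = defaultdict(list)
for person_id, _wave, aspiration in records:
    outcomes[person_id].append(aspiration)

# People whose outcome varies over time are the only ones that carry
# information for a within-person (fixed effects) comparison.
informative = sorted(pid for pid, ys in outcomes.items() if len(set(ys)) > 1)
dropped = sorted(pid for pid in outcomes if pid not in informative)

print("informative:", informative)
print("dropped:", dropped)
```

With highly stable aspirations, most of the sample would fall into the dropped group.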

“linear panel models, the Mundlak correction can be applied to random effects models to retrieve the coefficients and standard errors of a fixed effects model (Mundlak, 1978). The Mundlak correction has the attractive property of estimating time-constant variables and helps to overcome the strong assumption of error terms being uncorrelated with unobserved variables. The Mundlak correction involves adding the means of time-varying variables into the random effects model. Allison (2009) provides an alternative, the hybrid model, which can be used in a logistic regression framework. The hybrid method involves adding the means and the deviations from the mean of time-varying variables into the random effects model (Allison, 2009).” (Stopforth, 2020, p. 224)
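
Note to self: a linear sketch of the idea behind the Mundlak correction and the hybrid specification. It is simplified on purpose: pooled OLS stands in for the random effects estimator, the data are simulated, and all names and effect sizes are assumptions. The thesis applies the hybrid method within a logistic random effects model in Stata, which this does not reproduce.

```python
import numpy as np

rng = np.random.default_rng(2)
n_people, n_waves = 200, 4
person = np.repeat(np.arange(n_people), n_waves)

# Unobserved person effect that is correlated with the time-varying covariate.
u = rng.normal(size=n_people)
x = 0.8 * u[person] + rng.normal(size=n_people * n_waves)
y = 1.0 * x + u[person] + rng.normal(size=n_people * n_waves)


def person_mean(values):
    """Return each observation's person-specific mean of `values`."""
    sums = np.bincount(person, weights=values)
    counts = np.bincount(person)
    return (sums / counts)[person]


def ols(y, *columns):
    """Return OLS coefficients (intercept first) of y on the given columns."""
    X = np.column_stack([np.ones(len(y)), *columns])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


x_bar = person_mean(x)

# Within (fixed effects) estimator: demean y and x by person.
beta_within = ols(y - person_mean(y), x - x_bar)[1]

# Naive pooled estimate, biased because x is correlated with u.
beta_pooled = ols(y, x)[1]

# Mundlak-style: add the person mean of x as an extra regressor.
beta_mundlak = ols(y, x, x_bar)[1]

# Hybrid-style: enter the deviation from the person mean and the mean itself.
beta_hybrid = ols(y, x - x_bar, x_bar)[1]

print(f"within={beta_within:.3f} pooled={beta_pooled:.3f} "
      f"mundlak={beta_mundlak:.3f} hybrid={beta_hybrid:.3f}")
```

In this linear setting the coefficient on `x` in the Mundlak-style regression, and on the deviation term in the hybrid-style regression, matches the within estimate, while the person-mean term absorbs the between-person association.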

“fixed and random effects models are estimated to compare results. Although tracking individual trajectories is conceptually attractive, the fixed effects model did not converge. This is likely due to the large proportion of observations with unchanging outcomes.” (Stopforth, 2020, p. 224)

“hybrid method is used to compare the fixed effects and random effects estimates. Allison (2009: 3) noted that where time-varying variables do not vary very much, the estimates of the fixed effects model are very imprecise” (Stopforth, 2020, p. 224)

“The measures are theoretically time-varying, but the variation over time is low and the between-effects of parental education and parental social class are much larger than the within-effects. Therefore, the random effects models are preferred in the analyses below” (Stopforth, 2020, p. 225)

“As a binary variable, university aspirations are estimated using a panel logistic regression model (see xt, StataCorp., 2017a). The xt suite is, however, not compatible with the svy suite and therefore the models are unweighted and do not adjust for complex survey design.” (Stopforth, 2020, p. 226)

“A key methodological benefit of using the UKHLS-NPD, repeated contacts, data is the temporal ordering of the measures in these data is very useful to understanding the direction of effect.” (Stopforth, 2020, p. 231)

“The panel analyses in section 3 demonstrated that attainment at age 11 is strongly associated with university aspirations developed over secondary school. To further develop this analytical theme, Model 2 introduces an interaction effect between Key Stage 2 attainment and university aspirations.” (Stopforth, 2020, p. 240)

“The analyses of the panel data indicate that aspirations tend to be very stable over time, particularly for young people expressing interest in continuing their education beyond the compulsory age” (Stopforth, 2020, p. 241)

“HOLM, A. & JAEGER, M. M. 2008. Does relative risk aversion explain educational inequality? A dynamic choice approach.” (Stopforth, 2020, p. 291)

“PAYNE, J. 1995a. Qualifications between 16 and 18: A comparison of achievements on routes beyond compulsory schooling” (Stopforth, 2020, p. 298)

“PAYNE, J. 1995b. Routes beyond compulsory schooling.” (Stopforth, 2020, p. 298)