Rethinking causality and inequality in students’ degree outcomes

Abstract Inequality in students' degree outcomes has been a concern for the higher education sector and the UK government for more than a decade. Since its inception in 2018, the Office for Students in England has prioritised the need for evidence of causality by requiring institutions to evaluate the effectiveness of their initiatives as set out in Access and Participation Plans. This policy development responds to several reports which identify a dearth of evidence-based interventions and scant knowledge of 'what works'. This paper traces the interplay between policy and research, focusing on the assumptions they make about causality. It concludes that unwarranted positions are taken in both spheres of practice, making progress unlikely. A conception of causality situated in extant formal theory on evidential pluralism, and drawing on current practices, would help us address inequality more effectively. Alternative framings of the problem of inequality in students' degree outcomes are offered.


Introduction
In 2020-2021, 85.7 percent of white students received a first/2:1 compared with 67.2 percent of black students, a gap of 18.5 percentage points. There are also significant gaps between white students and Asian students (6.1 percentage points), and between white students and students from all other ethnicities (9.0 percentage points). Moreover, first class degrees are awarded to white students at more than double the rate at which they are awarded to black students (Advance HE 2022). As will become apparent below, these inequalities (and others relating to inequalities in social class indicators) were identified in the mid-1990s, but widespread awareness in England has only accelerated since the Office for Students' regulatory framework of Access and Participation Plans introduced public institutional monitoring and targets to address them in 2018. The sense of urgency to address these disparities has grown as a result of student campaigns and in the context of the Black Lives Matter protests of 2020.
In some ways the outrage that many of us feel at this inequality can cloud our capacity to look dispassionately at this problem. As I will outline, there is no shortage of research on this question, but we are at an impasse. On the one hand, moral grandstanding often stifles the possibilities for open debate; on the other, the deployment of a narrow conception of causality in HE policy ignores the possibilities for change that would come from generating evidence of causal mechanisms that could lead to eliminating this inequality. Fundamentally, we know little from the statistical studies that have been undertaken at a national level about how to address inequality in degree outcomes. We need to take a wider range of evidence into account - such as that produced in many institutional studies - in combination with statistical studies. There are many mechanism hypotheses that operate as 'working knowledge' in the sector - often unarticulated - that underlie current institutional initiatives. As well as articulating our mechanism hypotheses, we need to test them in ways that value both practice-based (usually qualitative) knowledge and associative studies at national and institutional levels. This paper will begin by setting out the policy, research, and institutional contexts within which inequality in degree outcomes has been addressed. In reviewing some of the research in this area, there is no attempt at a comprehensive review of the literature, which has been accomplished elsewhere (see, for example, Wilson and Dauncey 2020; Mountford-Zimdars et al. 2015; Singh 2011). Instead, the purpose of this section is to describe how causation is implicitly or explicitly conceptualised in this work. Similarly, with respect to policy in this area, I select those reports that precede and foreground the most recent articulations of causality in relation to students' degree outcomes.
In the section on institutional contexts, I argue that institutions of higher education limit the scope of their own agency by focusing on protecting against reputational damage and seeking to control the narrative surrounding this problem. HEIs are consumers and commissioners of research in this area, and perhaps the proximity of these functions has produced a potential conflict of interest and therefore a lack of critical perspective and progression in the form of conceptual development in both how the problem is seen as well as the interventions designed to address it. In the third section of this paper, I turn to the concept of causation and draw on sociological and philosophical literatures to set out an approach to evidencing causation that is appropriate to this problem. In the closing section, I set out some directions of inquiry to build our understanding of inequality in degree outcomes, drawing on a concept of causation that is theoretically and practically grounded.

The research and policy contexts
Research on inequality between the degree outcomes of white students and black and minority ethnic students is diverse and disparate in its disciplinary and theoretical underpinnings. As will become apparent in this section, the history of research in this area is marked by periodic high-profile reports from a range of public bodies associated with higher education or social equality. These reports reflect national and institutional interest in markers of social mobility and the potential for education to reduce or mitigate social inequality. In the context of this interest from policy makers, the question has drawn academic researchers from several disciplines including sociology, psychology, and education. Those commissioned to write public reports are usually, but by no means always, themselves academics specialising in this field. There is, therefore, a complex interlocution between the peer-reviewed academic research that reconceptualises or extends our knowledge, and the reports of public bodies that tend to draw together practices, perspectives, and recommendations with the primary intention of influencing public debate and ultimately practices within universities. The relationships between academic research and these policy documents, even when the latter claim to be based on the former, are rarely explicit, and sometimes invisible in the public literature. This relationship has itself been the subject of debate and power struggles, evidenced in academics' responses to the evidence-based policy movement over the last twenty years or so (Oakley 2002; Gorard 2020) and the closely related 'what works' policy agenda (Biesta 2007; Gewirtz and Cribb 2020).
Public bodies that have published reports on inequality in HE students' degree outcomes include the (then) Department for Education and Skills (Connor et al. 1996, 2004; Broecke and Nicholls 2007); Advance HE (Richardson 2008b; Stevenson 2012; Cousin and Cureton 2012; Stevenson and Whelan 2013, among many others); the Office for Students (for example, 2018) and its predecessor, the Higher Education Funding Council for England (e.g. Mountford-Zimdars et al. 2015); the Universities and College Union (Bhopal and Pitkin 2018); the NUS (2011); and Universities UK and the National Union of Students (2019).
The most recent of these, the UUK report, published in partnership with the NUS, states its aim as 'to break down these barriers and accelerate sector-wide progress towards eliminating BAME attainment gaps'. This report, while providing a valuable record of current discussions in support of its aim, also demonstrates three tendencies within the sector which, I argue, limit our capacity to analyse this problem of inequality in causal terms. First, it places an inordinate importance on perceptions in the sector and elevates these to the status of incontestable truth. Its main findings are based on results from two surveys: an institutional survey (44 respondents) and a student survey (69 respondents). While the report is appropriately tentative in claiming these are indicative rather than representative responses, even if the numbers had been higher, it is difficult to see how this fieldwork could result in findings that identify causes of inequality in students' degree outcomes. Respondents were asked to rate different 'contributing factors' and 'barriers within their institution' in order of importance. These ratings cannot be interpreted as anything other than the perceptions of contributing factors and barriers held by a relatively small number of individuals. The report is also informed by a round table event and a series of five 'evidence sessions' attended by 150 people. The overall effect is that it reproduces current discourse within universities. The findings are suggestive of hypotheses relating to cause, but without research that tests these, it is unclear how such a process could further the report's stated aim to 'accelerate sector-wide progress' on these matters.
A significant feature of this current discourse is its condemnation of what are termed 'deficit models' that attribute the cause of inequality in degree outcomes to students' characteristics.
The [deficit] model does not therefore allow for an examination of societal or institutional structures and the discrimination that exists within them. It follows in the deficit model that ownership, accountability and responsibility for the inequalities in attainment similarly are not placed with the institution, only the individual. (2019, 16)

Embedded in this argument are several assumptions that obscure opportunities for open discussions of causality. First, there is an assumption that speaking about students' characteristics necessarily entails the location of accountability and responsibility in the students. Second, causality is seen as being located either with the institution or with the individual. This dichotomy, often expressed in a morally charged way, restricts the possibilities for unfettered inquiry: causality as the product of an interplay between individuals and institutions is precluded. Finally, the condemnatory use of 'deficit model' has a significant discursive function in that it idealises the student, a logical step in keeping with the policy constructions of 'the student' as 'future worker' and 'hard worker' (Brooks 2018) and the elevation and sacralisation of 'the student experience' (Sabri 2011).
The third and final tendency typified in this UUK/NUS report is demonstrated in a series of institutional case studies that set out 'what universities are already doing', but without (yet) evidence that these actions have impact. Its discursive function is to offer protection against reputational damage to the sector and to the named institutions, an unsurprising element given that UUK is a membership organisation of the vice-chancellors and principals of UK universities. All three tendencies are far from unique to this report and exemplify public discourse on this topic within the sector more generally.
The period of public policy research that preceded these most recent reports illuminated the problem with a greater focus on statistical analysis. The focus was on exploring the veracity of what is now known as 'the gap'. Knowledge of inequality in attainment was in the public domain and known to UK policy-makers from at least 1996 through research at the Institute for Employment Studies (Connor et al. 1996), but it was not until the early 2000s that further research was commissioned to confirm, and in particular to establish through statistical analysis, that this inequality really was associated with ethnicity, and that ethnicity was not a proxy for other social characteristics or prior attainment. Commissioned by the DfES, Connor et al. (2004) laid a basis for understanding the nature of the problem in several important ways: they established that there is an effect of ethnicity even after controlling for prior qualifications, and they situated degree outcomes in a broader sweep of the student life-cycle, from access to participation to employment and satisfaction with employment. Importantly, they also disaggregated what has come to be termed 'BAME' by looking separately at the outcomes of students of different ethnicities - using the categorisations that were available to them and only grouping ethnic groups if they shared common trajectories: so they found commonalities between Indian and Chinese origin students; Pakistani and Bangladeshi students; and Black groups, including Black African and Black Afro-Caribbean students. Connor et al. (2004) did not make causal claims but recommended larger-scale research that would explore 'teaching quality' and the possibility of 'racial discrimination' (Connor et al. 2004, 139).
Pointing to the limits of the statistical analysis undertaken to date, Broecke and Nicholls (2007), also writing a DfES report, confirmed crucially that social class (measured through the Index of Multiple Deprivation) and a host of other factors do not account for the correlation between degree outcomes and ethnicity. Much of the statistical analysis since then has continued to focus on establishing that ethnicity is indeed a factor in the probability of a lower degree outcome, for example Richardson (2008a, 2008b, 2015) and HEFCE (2015).
Most statistical work in this field has focused on students' demographic characteristics. Its purpose has been to establish that unequal outcomes are occurring consistently in different circumstances and over time. There has been little statistical work at a national level to explain why ethnicity is associated with this inequality. The statistical analysis conducted so far has simply demonstrated the presence of a problem, repeatedly.
There is an implicit consensus that ethnicity does not in and of itself cause inequality (this would be racist), but that our racialised society and our education system are at play.
Arguably, one possible barrier to investigating the structures of race as causes is that many researchers have in the past been trained to avoid claims of causality (Gorard 2002;Maxwell 2012).
At a smaller scale, sometimes institution-specific, qualitative research about inequality in students' outcomes has been undertaken in combination with statistical evidence (e.g. McDuff et al. 2018; Dhanda 2009; Cousin and Cureton 2012). More often, small-scale qualitative research explores students' perceptions of inequality, and occasionally those of staff, and suggests responses based on those perceptions and with reference to broader educational knowledge (e.g. Bunce et al. 2021). In a wide-ranging and thorough synthesis of research findings, Singh (2011, 24) concludes that a 'complex range of differently connected factors' underlies this inequality rather than any one cause. He lists these factors, which may play out differently according to context, as including 'previous educational experiences, curriculum content and design, teaching, learning and assessment approaches, the learning environment and direct and indirect racism'. This list is so expansive that it is difficult to envision a factor that is not at play in causing the inequality in students' outcomes.
We have reached a plateau. Given the longevity of this problem and the volume of research that has been conducted, it seems timely to explore what kinds of reconceptualisation might enable us to reach more definitive conclusions that could usefully inform the deployment of our resources to eliminate inequality. As the forthcoming sections will demonstrate, it is both our conception of what the problem is, and how we go about understanding its causes, that require a rethink. Before moving to consider alternative conceptions, I summarise below the policy context as it relates specifically to the causes of inequality in degree outcomes.

The policy context in relation to the causes of inequality in degree outcomes
The Office for Students (OfS) has set out a framework for institutions to evaluate their outreach and participation interventions. It sets out the following three kinds of impact evaluation (and it is worth quoting these at length since they comprise a crucial policy instrument in the sector):

Type 1: Narrative. The impact evaluation provides a narrative or a coherent theory of change to motivate its selection of activities in the context of a coherent strategy. Evidence comprises evidence of impact elsewhere and/or in the research literature on access and participation activity effectiveness, or from existing evaluation results. The claim to be made is 'We have a coherent explanation of what we do and why. Our claims are research-based.'

Type 2: Empirical Enquiry. The impact evaluation collects data on impact but does not establish any direct causal effect. Evidence can comprise quantitative and/or qualitative evidence of a pre/post intervention change, or a difference compared to what might otherwise have happened. The claim that can be made is 'We can demonstrate that our interventions are associated with beneficial results.'

Type 3: Causality. The impact evaluation methodology provides evidence of a causal effect of an intervention. Evidence comprises quantitative and/or qualitative evidence of a pre/post treatment change on participants relative to an appropriate control or comparison group who did not take part in the intervention. The claim that can be made is 'We believe our intervention causes improvement and can demonstrate the difference using a control or comparison group.'
First, it is worth noting that the OfS guidance asserts that this schema is 'not hierarchical'. The three types are also seen as nested, such that type 2 encompasses type 1 and type 3 encompasses types 1 and 2. Nevertheless, causality is explicitly excluded as a possible claim in type 2 empirical inquiry, despite such inquiry potentially including mixed methods. We can note also that causality is reserved for type 3, where the key distinction is the presence of a control group. The OfS is suggesting here that a randomised controlled trial (RCT) is a necessary condition for claims of causality, and asserts that claims of causality are not to be made on the basis of other research methodologies. It is difficult to discern how this nested relationship can be construed as anything other than a hierarchy in relation to causal evidence. This seems more puzzling in an environment where the National Institute for Health and Care Excellence (NICE) has lessened its emphasis on RCTs and no longer refers in its guidelines to a hierarchy of evidence (NICE 2014, updated January 2022). The evidence base of these rules for evaluation is not clear. They seem to emanate from a cluster of publicly commissioned reports, and it is worth outlining how these relate to the OfS guidance. The OfS guidance refers to the schema of three types as 'building' on the work of Crawford et al. (2017), but in fact there is nothing in that report that provides a theoretical or empirical basis for this scheme. The report was commissioned primarily by OFFA (the Office for Fair Access, which preceded the broader remit of the Office for Students) with funding from the Department for Education and the Sutton Trust. The report is concerned with describing the current features of outreach activity for access to HE and its evaluation. It is not concerned with degree outcomes or post-graduation progression outcomes, or the evaluation of interventions relating to them.
Its account is based on an empirical overview of perceptions and practices in eight higher education institutions. There is no discussion of the problems and possibilities of generalising from these institutions to all HEIs in England. Its conclusion is that 'there is a very wide variety of contexts in which outreach work is taking place', but with no detailed discussion of the structures and determinants of that variety. Moreover, the thrust of its recommendations (p. 41) is about establishing consistency through a set of standards of evaluation practice, criteria for meeting each standard, and consideration of an 'off the shelf' baseline evaluation methodology.
The following year, Harrison et al. (2018) produced a follow-up to this report in which they described their work as 'positioned within a "social realist" worldview … that seeks to understand the fuzzy nature of the cause-and-effect relationships that exist within complex social fields, where individuals construct their own realities in reference to those around them. There is a particular focus on epistemology - the pathways to creating dependable, if contingent, knowledge - as a vehicle for making meaning from data that is usually incomplete, compromised or mediated through young people's emergent constructions of their worlds' (Harrison et al. 2018, 2). In keeping with its realist underpinnings, the ensuing guidance is well-contextualised within the specific conditions of outreach activity and cognizant of the variations of context within which this activity takes place. This guidance, which is distinguished by its explicit theoretical positioning and its grounding in research about practice in the sector, sits alongside the more wide-ranging OfS standards for evaluation quoted above.
At the same time as publishing this guidance on evaluation, the OfS and the Cabinet Office Behavioural Insights Team set up the Centre for Transforming Access and Student Outcomes in Higher Education (TASO) in 2019, which is now part of the national 'What Works Movement'. TASO's approach is very much aligned with the OfS Standards of Evidence (TASO 2020). In common with other What Works centres, TASO acts as an 'intermediary part of an evidence eco-system' with the aim of enabling engagement between research use and research production (Gough, Maidment, and Sharples 2018, 19). There is no doubt that this is a laudable and worthwhile aim. The question being raised in this paper is about the relationship between, on the one hand, the brokering role of TASO and the enforcing role of the OfS, and on the other, the promotion of the narrow conception of causality that is evident in the guidance on evaluation quoted above. In the next section, I describe the higher education institutional contexts within which these brokering and regulatory functions play out.

The institutional context
Many UK higher education institutions have conducted research in this area: Mountford-Zimdars et al. (2015) received institutional research reports from 26 English institutions in 2015 relating to inequality in students' outcomes, most of which concerned inequality in students' degree outcomes. More recently, Bhopal and Pitkin (2018, 19) found that all 24 institutions in their sample had either conducted research in this area or expressed a desire to do so. Mountford-Zimdars et al. (2015, 54) found institutional responses were characterised by a series of phases. The first, confirming, involved statistical analysis of the institution's own students' degree outcomes. Such analysis was viewed as both confirming the problem and acting as a possible lever for change within institutions. The second phase, exploratory, involved some hypothesising of causes. Typically, this involved more sophisticated statistical analysis, drawing on unit/module level data or other student behaviours, and/or qualitative research with staff and students. The third phase comprised awareness-raising and communication among academics, other professional staff, and students. They note that this can sometimes be limited to formal committee structures. The fourth phase involves testing of strategies and interventions. The fifth and final stage entails review. They note that this stage is 'particularly problematic' and observe that 'relatively few of the interventions that have been initiated [in the sector] have been evaluated systematically' (Mountford-Zimdars et al. 2015, 86). It is not unreasonable to suppose that much of this activity is coloured by the three tendencies exemplified in the UUK report above: the tendency to rely on perception; adherence to an uncritical condemnation of 'the deficit model'; and a need to justify that action has been taken, with a converse fear of reputational damage.
Fear of being associated with 'the deficit model' has influenced the choice of language which has changed over time.
Earlier studies refer to the 'achievement' and 'under-performance' of Black and minority ethnic students; following these, 'the attainment gap' became widely adopted. More recently, a further shift in the discourse focuses attention exclusively on the production of this inequality in university academic processes through the use of 'a degree awarding gap'.
In 2015, Mountford-Zimdars et al. noted that institutional statistical analysis in relation to attainment is often followed up with qualitative enquiry, most frequently in the form of focus groups with BME students. There are limitations to this approach: namely, it assumes that all BME students achieve less well than their white counterparts; that the causes of differential attainment are capable of being perceived by BME students exclusively; and that BME students can readily articulate them in the midst of their study.
I would add that, in addition to assuming that the causes are visible to black students, there is an assumption that these causes are capable of being articulated, and that students would be willing to discuss them in the context of a focus group. These assumptions amount to an implication that cause lies within the exercise of students' agency, which is paradoxical when this method is deployed in a context that eschews 'the student deficit model'.
A further feature of the institutional context in which this work is taking place is that students have themselves become aware of this inequality, and student campaigns have been launched by the NUS and by students at several universities. A video made by students at University College London (Richards 2014) went on to inspire many more at other universities: see, for example, Bristol (Abu El Magd 2016) and Leicester (Rahman 2019). In response, and with the encouragement of higher education agencies such as the Quality Assurance Agency and Advance HE, many universities have launched 'Student Partnership' initiatives that aim to work with students to revise curricula and other institutional structures. There is not space to consider the ethics and efficacy of student partnership here, though both of those issues are in need of analysis. I mention it as an important part of the institutional and national terrain within which the problem of inequality is being tackled. To involve students is seen as an essential feature of how institutions go about addressing the causes of inequality in degree outcomes. A consequence of what has become a moral imperative is that institutions have often adopted interventions that have been the subject of student campaigns: decolonising or diversifying curricula is one such issue. While this focus on reforming curricula has intrinsic value, there is a dearth of evidence as to whether and how addressing it would have an impact on inequality in students' outcomes.
To summarise, debates about causation are political and have profound consequences for what are believed to be worthwhile responses. These debates take place in a highly pressured environment in which there is a moral imperative to eliminate inequality and an urgency to limit reputational damage while demonstrating that 'something is being done' and that it is being done 'in partnership with students'. This creates particular constraints on developing our understanding of causality. First, students are often transitory participants, unable to see their contributions through from conception to impact given the longer cycle of undergraduate courses. Second, there is a risk that short-term interventions are undertaken for symbolic and performative purposes. When these conditions pertain, it is reasonable to assume that the activity of 'being seen to be doing something' actually hampers the longer-term systematic development of evidence-based practices and policies that might eliminate the inequality.

Causality in extant theoretical debates: some salient features
This section sets out some features of the debates about causation in the social sciences and in evidence-based medicine, with a view to considering, in the following section, how these could alter current ways of thinking about inequality in students' degree outcomes. There are multiple seams of literature on causality, each conducting its own debates and constructing historical accounts of those debates without reference to the others. This siloed scholarship is especially demonstrable within the social sciences: for example, mutually excluding accounts can be observed in Barringer, Scott, and Leahey (2014) and Hammersley (2014), though, interestingly, both find a basis for their theorisations in Weber's concepts of adequate causation, ideal types and comparison of probabilities. Some attempt an overview that encompasses multiple disciplines. For example, Johnson, Russo, and Schoonenboom (2019) list seven core ideas of traditional causation theory (probabilistic; counterfactual; regularity; necessary and sufficient; manipulation and invariance; mechanistic; and agency). Pawson (2008) posits three core ideas: successionist, configurationist, and generative. The first five of Johnson's seem broadly to fit within Pawson's first, while Pawson's configurationist refers primarily to the method of causal process tracing and approximates to Johnson's mechanistic and agency. Generative seems to encompass mechanistic and agency but comes with a realist underpinning. What comes across strongly from a review of several overviews of causality is that much less thinking and discussion has taken place about it in qualitative research than in quantitative. Yet at the same time, some quantitative researchers have long seen the limitations of their methods for establishing causality (for example, Goldthorpe 2001).
In 2002, Stephen Gorard observed that social science research methods texts tend to overlook causality except to caution against confusing it with correlation (Gorard 2002, 5). Arguably, this may stem from a tendency to introduce quantitative and qualitative methodologies separately, as though they were tied irrevocably to positivist and non-positivist epistemologies. Certainly, the subject is seldom addressed, and only briefly, in some of the more popular textbooks. Robson (2002, 79-80) and Cohen and Manion (1997, 147-154) present the search for causality as essentially associated with experimental designs and, in the case of Robson, as deriving from the natural sciences. However, at the point of advising on data analysis, Robson introduces the idea of causal networks which can be demonstrated in qualitative data (Robson 2002, 395-399), drawing on the work of Miles and Huberman with reference to 'causal fragments'. These textbooks' approaches to methodological advice focus on arriving at a description of causal mechanism; yet while they consider causation in relation to both statistical correlation and mechanism, there is no linkage between the two.
In discussing the limitations of statistical analysis in establishing causation, Goldthorpe (2001), from a post-positivist rational action perspective, argues for causation as a generative process that draws on subject-matter concerns and theory. He sees this process as going beyond what can be ascertained on the basis of 'robust dependence', complementing statistical analysis and acting as a corrective to the more limited techniques of 'consequential manipulation' through the control of variables in experimental designs. While Goldthorpe is aware of the limitations of statistical correlation for causal claims, his conception of causality remains centred on finding regularities in social life.
In contrast, the position of critical realists such as Sayer is to reject entirely that causal relationships can be established through what he terms 'extensive' research (Sayer 2010, 21), which organises subjects of study into taxonomic groups, assuming commonalities among them and investigating relations between variables as though cause and effect operate in a closed system. The starting point for Sayer and other realists (Sayer 2011, 104) is to inquire what the causal powers of objects, including persons, are, and under what conditions they become activated. There is a distinction here between latent and actual possibilities: 'a causal claim is not about a regularity between separate things or events but about what an object is like and what it can do, and only derivatively what it will do in any particular situation' (Sayer 1992, 105). Furthermore, the relation between cause and effect is contingent on how actors interpret and behave in certain circumstances, and on their wider context of action (Sayer 1992, 107). A further interesting feature of realist inquiry into causation is that rather than studying taxonomic groups, Sayer advocates studying causal groups, the configuration of which may not be obvious at the start. One commonality between Goldthorpe and Sayer is that they draw attention to the importance of defining the phenomenon under study. This seems an obvious point found in most methodology textbooks, but it is worth returning to it in the next section with respect to the issue of inequality in students' degree outcomes.
For Pawson and Tilley (2000), writing about realist evaluation, perhaps the most damning limitation of even rigorously conducted experimental designs is that by pursuing a model that assumes a closed system of constant factors, 'it has tended to overlook the liabilities, powers, and potentialities' of the phenomena it seeks to explain (Pawson and Tilley 2000, 34). Using a series of examples of policy evaluations, they demonstrate the limitations of some such studies in arriving at actionable context-specific recommendations precisely because they are unable to explain the causal mechanisms that underlie their statistical analyses.
From an interpretive perspective, Hammersley (2014, 83-95) critiques critical social realists' tendency to adopt value judgements based on explanatory evidence alone. While there are always limits to value neutrality, interpretivists will tend to seek causal explanations from the perspective of their research participants, and this aim is shared among several methodologies, including interpretive process analysis, where the researcher is able to study 'multi-faceted processes… over time … to explain how social dynamics produce specific outcomes' (Ludvig 2015, 6). Hammersley (2014, 29) argues that causal claims can be supported by a combination of imaginative construction of plausible models of causal processes and analysis of empirical data within and across cases. Within the latter, Hammersley refers to the benefits of using qualitative and quantitative data in a complementary way (Hammersley 2014, 32), but the nature of the interplay between these different sorts of evidence in relation to causation is unelaborated. For Hammersley the key combination is the imaginative construction of possibilities and the empirical analysis, and the latter may take many forms. The notion that mixed evidence is needed to substantiate causal claims is not entirely absent from the social sciences. Indeed, there are claims that mixed-methods approaches are uniquely placed to undertake studies of causality, without attachment to any theoretical standpoint (Johnson, Russo, and Schoonenboom 2019; Shan and Williamson 2021).
Before discussing these ideas in more detail, it is worth reminding ourselves that the 'what works' policy agenda, now a 'movement', entered education via evidence-based medicine (Oakley 2002). Among the clearest critiques of the primacy of randomised controlled trials (RCTs) in medicine is the Russo-Williamson thesis (Russo and Williamson 2007, 158), which holds that causal relations can be inferred '…from mixed evidence: on the one hand, mechanisms and theoretical knowledge, and, on the other, statistics and probabilities. Statistics are used to show that the cause makes a difference to the effect, and mechanisms allow causal relationships to explain the occurrence of an effect.' In later work developing this thesis, Clarke et al. (2014) challenge the validity of a normative hierarchy of evidence. They see 'mechanism' and 'evidence of difference-making' as complementary, observing that even in the construction of an RCT, researchers are likely to deploy their contextual knowledge of mechanisms. To drive home the crucial role of knowledge about mechanisms, they remind us that, 'It is knowledge of mechanism not correlation that stops us from deeming the thermometer to be a cause of temperature or the presence of mud to be the cause of rain' (Clarke et al. 2014, 344).
Most recently, some philosophers of science have found an affinity between the Russo-Williamson thesis and mixed methods research. Johnson et al. (2019) urge mixed methods practitioners to draw on as many conceptualisations of causation as possible and as appropriate to a given research question. Shan and Williamson (2021) clarify that the application of the Russo-Williamson thesis is not in fact simply an injunction to look at all the evidence but a more precise formulation: to systematically assess both evidence of association (from either experimental or observational studies) and evidence of causal mechanism (predominantly but not exclusively from qualitative data). They make a compelling case for the use of this approach, especially for its appropriateness to applied policy work. They argue that their approach is epistemic and makes no commitment to a metaphysical theory of causation. The approach can therefore, potentially, encompass a broad range of theoretical positions, including the array of approaches that may be most common among practitioners (Wilson and Dauncey 2020 give an overview). Clarke et al. (2014) propose several criteria for evaluating evidence of mechanism, some of which are specific to the context of medical practice. However, there is one criterion that seems especially pertinent to the context of higher education institutions tackling inequality in degree outcomes. They caution against over-estimating the significance of stories and especially 'psychologically compelling evidence'. This is particularly relevant in the HE context where, as has been discussed above, there is a tendency to elevate 'the student voice' (Sabri 2011).
The implication of this review of current thinking on causality is that there are sound reasons to revise OfS guidance on evaluation and what constitutes evidence of causation. There is no justification for privileging RCTs and evidence derived from experimental or quasi-experimental designs over other kinds of evidence.
The principles that I propose using from extant work on causality are that:
1. We need to explore how we are defining the problem of inequality in students' outcomes, and consider whether an alternative framing, or multiple framings, might enable systematic research that provides a sound basis for concerted change that eliminates these inequalities.
2. Causes of the problem, in its reframed form(s), should be sought through systematic evaluation of both associative and mechanism studies. Both are necessary, and their integration will vary depending on the causal relationship being investigated.
3. We need to attend to the issue of how students are involved in this work, considering both the ethics of this involvement and its efficacy.
In the following section, I make suggestions in relation to the first principle and explore a series of causal hypotheses, drawing on associative and mechanistic evidence. I do not substantially address the third principle which, as I have observed above, raises questions that have barely begun to be critically appraised.

How do we investigate the causes of inequality in students' degree outcomes?
Reframing the problem
By way of questioning the framing of the problem of inequality in students' degree outcomes, I would like to quote a tutor interviewed as part of an institutional research project (Sabri 2014): 'Some students just don't sign up to the intellectual project that is the course.' This tutor's observation reminds us that HE curricula are culturally and historically situated (though the nature of this situatedness varies in different discipline contexts). The tutor is also observing that some students come with experiences and interests that they find to be at odds with those of the course they have chosen to undertake. Arguably, the observation is corroborated by the student campaigns referred to above. Its relevance for the framing of the problem is that for some students getting a first or upper second-class degree necessitates 'a sell-out' to a dominant culture with which they do not identify. Conversely, there are students who do not experience this dissonance and for whom 'signing up' is taken for granted. So, how well-served are each of these groups of students? The value of a 'good degree' is called into question. Is equality of students' outcomes, as currently defined, necessarily good for students who feel themselves to be on the margins of their curricula? To date 'the attainment gap' has been defined in statistical terms as the difference in the proportion of students of different ethnicities awarded a first or upper-second class degree, as compared with those awarded a lower-second class degree or below. There are also qualitative ways of defining it, one of which may be the relevance of curricula to different students.
If we persist with the statistical definition, one way of eliminating inequality in degree classifications is to abolish the class mark system in favour of pass/fail, a system that currently operates in parts of higher education, notably in the first year of undergraduate study. However, abolishing class marks does not erase the phenomenon under discussion. The essence of the problem is that students of colour tend to experience the form and function (Dewey 1938) of education less favourably than do their white counterparts. Clearly, the class mark is simply one indicator, and a powerful one insofar as it influences prospects in employment and further graduate study. This last point also qualifies the argument in the preceding paragraph about the existential cost of attaining a 'good degree'.
There are other extant statistical indicators too. For some years the National Student Survey (NSS) has demonstrated that fewer Asian, Black and 'Other' students than white students consider their assessment to have been fair. In 2021, 71.7% of white students perceived their assessment to have been fair, while 61.7% of Asian students, 63.9% of Black students and 60.33% of 'Other' students did so. Even though the NSS measures perceptions only, there is reason to pause and consider why these vary so starkly by ethnic group. Other NSS questions show similar differences, and it is probable that institution-level and discipline-level data would yield further insight.
These arguments point to a need to reconceptualise what we mean when we discuss inequality in students' degree outcomes. If the core of this problem is about ensuring that all students fulfil their educative potential, then we need to consider the social processes in which degree outcomes are produced. For example, we might ask: Educative potential in relation to what curricula and whose assessment? And what combination of indicators would provide us with the best means of measuring the problem and subsequent progress in alleviating it?
The foregoing analysis has pointed to some of the ways in which the national policy context, and especially the OfS, structures research in this area. Perhaps the least discussed, and yet most stark, is the omission of students domiciled outside the UK, to whom we commonly refer as 'international students', from the national discourse about equality in degree outcomes. While there has been some institutional research that does include this group (for example, Sabri 2014, 2017), data analysis is hampered by inadequate ethnicity categories and the lack of a system for collecting these data. The impact of the exclusion of these students from the OfS regulatory framework is that resources within institutions that are aimed at equality in students' experience tend to monitor their performance exclusively in relation to UK domiciled or 'home' students. When we consider reframing the problem, we must also ask: how is this exclusion compatible with our claim to be committed to equity and social justice? And pragmatically, what might we gain in our understanding of the causal relationships that we seek to uncover by including this group of students? Furthermore, we cannot ignore the persistent pattern in summaries of the literature described above that point to anything and everything as having an influence on students' degree outcomes. These are not sufficiently discriminating findings in an environment where there is constant pressure on resources and justified demands for change. To use limited resources truly to make a difference, more nuanced analysis is needed of the strength of different causes, their relative importance, and their interaction with each other in particular contexts. Specifically, analysis of association needs to be undertaken, systematically, alongside studies of the mechanisms that may underlie them. None of the national reviews, to date, has adopted a theorised relationship between these kinds of evidence.
An alternative to national analysis across the whole of higher education is to contextualise the problem with a greater degree of specificity, an approach that is central to applied purposes (Flyvbjerg 2009). For example, we might ask, what is the cause of inequality in degree outcomes among students of the humanities or medicine (see Woolf et al. 2013), or those entering higher education with vocational qualifications? There is also a need for further rigorous institution-level research that integrates associative and mechanistic evidence. It is not that this institution-level research is absent, but rather that it is disparate, often kept confidential, and not seen as part of a wider scholarship of micro studies contributing to a complex and nuanced understanding that we need to develop.
To summarise: first, there is a need for a more holistic conceptualisation of the problem, one that includes but also goes beyond proportions of degree classifications and uses a cluster of indicators of educative form and function. Second, we must attend to the cultural and historical situatedness of institutional contexts, including the ways in which they can be structured by national accountabilities, which, evidently, at present, often do not enable Black and minority ethnic students to give of their best. Third, we need to move purposefully and iteratively between macro trends and institution and subject specificities, allowing local accounts to lead to cross-case comparisons that in turn inform national policy, including funding and incentivisation.
Thinking causally and holistically about inequality in students' higher education
I now set out a series of linked causal hypotheses and, following Russo and Williamson (2007), suggest what association studies and mechanism studies would need to be undertaken to test their different parts. Figure 1 summarises the interplay between these causes.
The diachronic cycle of causation is represented in this diagram by the larger arrows. There is no suggestion of a particular starting point because these factors will have grown together, enmeshed in a broader history of the expansion of higher education over the last forty years. PhD students are recruited from cohorts of undergraduate students among whom there is an under-representation of Black and minority ethnic students with 'good degrees', and this selectivity is compounded by these students' under-representation at high-tariff institutions (Boliver 2018). Then, from a disproportionately white population of graduates, a small number are funded and selected to undertake PhDs which train them for academic jobs in a highly competitive market (Williams et al. 2019).
There is frequent reference in student campaigns to the under-representation of Black and minority ethnic staff among academics (Joseph-Salisbury 2018). Obviously equal representation is a necessary good in itself and a fundamental aim of social justice. There is also an assumption in the public discourse that greater representation will narrow inequality in students' outcomes. However, the causes of under-representation (and these may well vary across disciplines and institutions (Bhopal 2016)) bear closer examination. It seems simplistic to suggest that parity of representation would have a linear relationship to narrowing inequality in students' degree outcomes. While the unfolding reality may be more complex, we can identify that embedded in the current discourse is a hypothesis that academics' own expertise and career trajectories (whether in academic research or professional practice) play a significant role in prescribing curricula. If this is true, then it would not be surprising if a gap emerged between HE curricula designed by a relatively demographically stable cohort of mainly white academics and the interests of an increasingly diverse body of students; diversity that is multiplied by the participation of international students as well as home UK students. This could be investigated through a combination of associative data relating levels of diversity among staff and students, and mechanistic study of both the determinants of curricula and the interests of students who are awarded different degree outcomes.
A further set of closely connected hypotheses concerns the structures of pedagogy and interaction with students. It is well-known from school research (Newman et al. 2021) and higher education (Morris, Perry, and Wardle 2021) that formative feedback is frequently correlated with higher attainment. Interventions based on such factors can be evaluated in specific contexts. For example, if there is an association between staff-student ratios (SSR) and the frequency of formative assessment, we might hypothesise that contexts of high SSR enable staff to get to know students' work and keep track of their development, and that students acquire feedback literacy that enables them to fulfil their potential. A comparative study of this mechanism in contexts of relatively high and low SSR would test this hypothesis and produce evidence that would justify investing (or not) in high SSR. The intensity of contact time between academics and students varies enormously between programmes and disciplines: in each case, through what formal and informal processes is the resource of contact with academics distributed? A study of mechanism in this context would begin to reveal the patterns of social action that result in the unequal distribution of those resources that enable students to fulfil their potential.

Conclusion
This paper has brought together an analysis of the policy and institutional contexts within which we grapple with the problem of inequality in students' higher education experiences, with the extant literature on causality. I have argued that the current framework for thinking about causality, promoted through the most dominant policy instrument in this area of higher education work, namely the OfS guidance on evaluation of activity relating to Access and Participation Plans, is reductive and inadequate to support the sector to meet the targets it has set.
The cultural and discursive landscape we have created around this issue militates against systematic and cumulatively illuminating research. I see explicit debate about causal assumptions embedded in our discourse as essential to moving beyond a climate of moral grandstanding and towards an evidence-based set of strategic approaches at both national and institutional levels that would eliminate this inequality. We often use the term 'intervention' which for me is a radical under-estimation of the task at hand. The inequality in degree outcomes is a small but powerful indicator of what is likely to range across several dimensions of social injustice: a historical maldistribution of resources, misrecognition (of value, accomplishments etc.) and, relating to both dimensions, participatory parity (Fraser 2001). The scheme of causal relationships set out in Figure 1 points to the interplay between inequality in degree outcomes and some other manifest inequalities embedded in curriculum design, access to postgraduate research and recruitment in higher education.
The environment within which higher education institutions are operating, and their responses to that environment, effectively stifle a critical appraisal of what the problem is. They approach the framework of accountability offered by the OfS uncritically, for example without questioning the exclusion of international students. They are at once mediating the requirements of the OfS, and the advice of other HE agencies such as Advance HE and TASO, and attending to a need to be seen to be 'doing something'. They have also yet to interrogate the discursive and symbolic power of 'student deficit' and, conversely, 'student partnership'. In this context there is a compelling reason for the OfS to revise its current evaluation guidance in favour of evidential pluralism. Evaluation of any activity, especially when it is externally mandated, has a backwash effect. It is a truism in educational research that the way in which we assess structures the learning activity that comes before assessment. If the OfS truly wants the sector to learn and make headway in narrowing inequality, then the evaluation guidance that governs this activity should be structured to value meaningful cumulative learning to improve, rather than impose a top-down, narrow conception of causality.
Note
1. Significantly, both the regulatory framework of the APP and the way in which these statistics are compiled exclude international students, that is, students who pay higher fees and have a domicile outside the UK.