One size fits all? Using machine learning to study heterogeneity and dominance in the determinants of early-stage entrepreneurship

Despite the vast number of studies exploring the determinants of entrepreneurship, few have been able to distinguish the relative importance of these factors. Traditional regression-based approaches, upon which such studies are based, are unable to fully capture heterogeneous and complex non-linear patterns in the determinants of entrepreneurship. To address these limitations, we adopt a novel approach, using machine learning to study heterogeneity and dominance in the social-cognitive determinants of early-stage entrepreneurship. We apply decision tree algorithms to a large-scale dataset from the Global Entrepreneurship Monitor. Our results reveal that the dominant determinants, irrespective of entrepreneurial pathway, are individual entrepreneurial self-efficacy and networks, with factors such as cultural perceptions being relatively unimportant, despite substantial attention in the literature. The results also show considerable heterogeneity in the factors contributing to entrepreneurship, highlighting the need for academics and policy makers to consider the likelihood that there is no single set of motivating factors.


Introduction
Entrepreneurship plays a crucial role in driving economic growth (Stel et al., 2005), innovation, and social development (Chell et al., 2010). Although a substantial body of literature has focused on the determinants of entrepreneurship (see Audretsch & Erdem, 2005; Walter & Heinrichs, 2015 for reviews), there is still debate over which set of factors has greater explanatory power (Amini Sedeh et al., 2020), the relative importance of predictors, and problems with model uncertainty (Arin et al., 2015). The literature also recognises that there are nonlinear relationships in the determinants of entrepreneurship (Beynon et al., 2016, 2018, 2020; Coduras et al., 2016; Lévesque & Minniti, 2006) and identifies the heterogeneous nature of entrepreneurs (Douglas et al., 2020; Kerr et al., 2018; Van Ness & Seifert, 2016). We contribute to these bodies of literature by adopting a machine learning approach to examine the relative importance of the social-cognitive determinants of total early-stage entrepreneurship (TEA), and to explore heterogeneity in the combinations of factors that lead to TEA.
We draw on social cognitive theory (SCT) as a lens through which to study heterogeneity and dominance of the factors associated with early-stage entrepreneurship. SCT posits that complex bidirectional interactions between individual traits and characteristics, past behaviour, and the environment will result in motivation and subsequent action (Wood & Bandura, 1989). Although the importance of social-cognitive factors has been established in the entrepreneurship literature, there is limited knowledge about the different combinations of such factors that lead to entrepreneurship, as well as the relative importance of these factors.
Advances in machine learning have created opportunities for entrepreneurship researchers to gain new insights, contributing to entrepreneurship theory and practice. This potential has been recognised in the entrepreneurship literature, with calls for more studies to draw on analytics approaches (Obschonka & Audretsch, 2020; Schwab & Zhang, 2019). Obschonka & Audretsch (2020) refer to a 'new era' of entrepreneurship research, suggesting that AI and big data provide opportunities to progress the entrepreneurship field, albeit with some challenges. Despite this interest, relatively few studies in the entrepreneurship literature have adopted methods from machine learning and analytics (Lévesque et al., 2020; Obschonka & Audretsch, 2020), with just a small body of emerging research (Celbiş, 2021; Haworth et al., 1991; Montebruno et al., 2020; Santos et al., 2020; Sohn & Lee, 2013; Wei et al., 2020).
In the wider business literature, opportunities from the application of analytics have resulted in the development of a new 'analytics paradigm' (Delen & Zolbanin, 2018). Within this paradigm, new types of data and methods, such as machine learning approaches, are used to answer interesting research questions and provide new insights. We situate our study within this paradigm, applying a decision tree algorithm to model the heterogeneity and dominance of the social-cognitive determinants of TEA. A large-scale dataset is used in the analysis, drawing on combined data from the 2015-2018 GEM surveys.
The results of the study show that cognitive traits are most important in predicting TEA, with self-efficacy being important for most entrepreneurs. Although factors such as cultural perceptions have received attention in the literature (Shinnar et al., 2012; Thornton et al., 2011), we find that these are relatively less important. Beyond self-efficacy, the results show substantial heterogeneity in the social-cognitive determinants of entrepreneurship and other career choices, highlighting the diversity of entrepreneurs.
In carrying out the study, we contribute to SCT and to the wider entrepreneurship literature on the social-cognitive determinants of entrepreneurship, as well as the emerging literature on the heterogeneity of entrepreneurs (Cieślik & Dvoulety, 2019; Douglas et al., 2020; Kerr et al., 2018; Van Ness & Seifert, 2016). Although substantial progress has been made in understanding the factors that lead to entrepreneurship, debate remains in the literature around the importance of commonly studied variables in predicting entrepreneurship (Arin et al., 2015). Arin et al. (2015) attribute this to model uncertainty, where variable significance is influenced by researcher choices about which variables to include in the model. The decision tree algorithm used in this study helps to overcome this problem by automatically ignoring variables with no predictive value. The decision tree algorithm also produces measures of variable importance, which contributes to the dominance analysis approach in entrepreneurship research (e.g. Arin et al., 2015).
Another advantage of our approach is that the structure of the decision tree can be visualised, which permits the identification of nonlinear relationships, as well as the different combinations of factors that can lead to entrepreneurship. This approach builds on past research which has highlighted non-linearity in the relationships between individual characteristics and entrepreneurship (Beynon et al., 2016, 2018, 2020; Coduras et al., 2016), and the heterogeneity of entrepreneurs (Cieślik & Dvoulety, 2019; Douglas et al., 2020; Kerr et al., 2018; Van Ness & Seifert, 2016). By adopting a novel methodological and theoretical approach, we thus add to this literature, allowing us to demonstrate the heterogeneity in the combinations of social-cognitive factors that are associated with entrepreneurship, as well as non-entrepreneurs.

Theoretical background
The theoretical background for the study draws on two main bodies of literature. The first is the literature on entrepreneurial motivation, with a specific focus on SCT (Wood & Bandura, 1989). The second is the emerging literature on the heterogeneity and segmentation of entrepreneurs (Cieślik & Dvoulety, 2019; Douglas et al., 2020; Dvouletý, 2020). We combine insights from both bodies of literature to develop the theoretical lens for our study, which focuses on the heterogeneity and dominance of cognitive and personal, behavioural, and environmental motivational antecedents of entrepreneurship.

Social cognitive determinants of entrepreneurship
A substantial body of literature has focused on the antecedents that influence the motivation to start a business (Murnieks et al., 2020; Rauch & Frese, 2007). Findings from this literature highlight the importance of a diverse range of factors such as demographics, personal circumstances, self-efficacy, commitment to goals, passion, and fear of failure (Murnieks et al., 2020). SCT has been particularly prominent in this recent literature as a framework to explain why some individuals are motivated to engage in entrepreneurship (Boudreaux et al., 2019; Camelo-Ordaz et al., 2020). SCT proposes that behaviour is influenced by personal, environmental, and behavioural factors (Wood & Bandura, 1989). It explains how people regulate their behaviour through control and reinforcement to achieve goal-directed behaviour. These effects result from reciprocal triadic bidirectional relationships between the three dimensions of cognitive and personal factors, behavioural, and environmental factors (see Fig. 1) (Wood & Bandura, 1989). The inclusion of the personal, behavioural, and environmental factors in SCT therefore provides a more comprehensive framework than focusing on a more limited range of factors (Hmieleski & Baron, 2009).
The first dimension of SCT focuses on individuals' cognitive and personal characteristics. From an SCT perspective, self-efficacy is understood to be central to individual agency, as without the belief that an outcome can be achieved there is no incentive to carry out the behaviour (Bandura, 2002). The second dimension posits the importance of the individual's external environment. Individuals also operate within wider social systems, and the wider 'beliefs and actions' of these systems also influence individual agency (Bandura, 2002, p. 271). Individuals are more likely to use their self-efficacy in social contexts where the behaviour is perceived favourably as culturally legitimate (Klyver & Thornton, 2010). Thus, perceptions of external cultural factors relating to entrepreneurship are also considered to have an important influence on entrepreneurial behaviour (Castaño et al., 2015; Hayton et al., 2002), and it is important to consider these alongside cognitive factors (Siu & Lo, 2013). The third element of SCT is behaviour. Behaviour is guided through the development of knowledge structures (Bandura, 1999). Reflecting the reciprocal bidirectional nature of the relationships, Bandura (1977, p. 191) proposes that 'experiences of mastery' will help to develop self-efficacy, thus shaping behaviour.
A number of studies have adopted SCT as the underlying theoretical framework for explaining entrepreneurial behaviour and performance (Bacq et al., 2017; Boudreaux et al., 2019; Hmieleski & Baron, 2009; Oo et al., 2018). Indeed, there has been widespread support in the empirical literature for the positive impact of cognitive influences in particular, including self-efficacy (Boudreaux et al., 2019; Sequeira et al., 2007), networks (Casson & Della Giusta, 2007; Sequeira et al., 2007; Thornton et al., 2011) and opportunity recognition (Davidsson, 2015; George et al., 2016; Shane & Venkataraman, 2000). Other personal attributes, such as income, education, employment status, and age, have however produced inconsistent findings in the empirical literature (Stephan et al., 2015; Van Ness & Seifert, 2016). Likewise, there have been contrasting findings with regard to the environmental factors that reflect cultural norms, such as perceptions of entrepreneurs' status, consideration of entrepreneurship as a good career choice, and media coverage of successful entrepreneurs (Coduras et al., 2016).
Fig. 1. Bidirectional relationships between SCT determinants. Source: Based on Wood & Bandura (1989).
Despite the use of SCT as a theoretical framework for motivations towards entrepreneurial activity, the conflicting empirical findings regarding some of the personal, behavioural and environmental determinants of entrepreneurship leave open questions around the factors that motivate individuals to start a business. In this paper, we therefore propose an alternative framework and methodology, building on SCT, which allows for an explanation of the contrasting findings. We propose that such inconsistent findings are due to the heterogeneity of the entrepreneurial phenomenon, coupled with the insufficiency of traditional statistical techniques to account for this heterogeneity. We therefore extend the theoretical framework and the modelling technique to account for both these factors. In doing so, we also provide an indication of the relative dominance of the explanatory factors, addressing a limitation of the empirical SCT literature.
Central to our theoretical proposition is that entrepreneurship is a diverse phenomenon across multiple dimensions, exhibiting substantial heterogeneity across the characteristics of the individuals involved (Calderon et al., 2017; Woolley, 2019), their goals (Hoogendoorn et al., 2020; Hurst & Pugsley, 2011), the businesses they form (Chua et al., 2012), and the performance of these ventures (Arbelo et al., 2022). Past entrepreneurship research has conceptualised motivation as a static construct (Murnieks et al., 2020) rather than a non-linear, heterogeneous and dynamic driver of entrepreneurship.
The wider motivation literature also points towards non-linearity and heterogeneity in the factors that lead to motivation and behaviour in a range of contexts such as effort, trust, risk-taking, and strategic reasoning (Jacoby, 2002; Taylor, 2020). Heterogeneity amongst individuals has also been considered in the wider management literature. For example, Lamberti et al. (2022) find that employees respond differently to organisational climate variables, such as empowerment, pay and work conditions. Despite these arguments, past research has not focused on the heterogeneity of social-cognitive factors that lead to entrepreneurship.
Failing to consider heterogeneity and, as a consequence, nonlinearity creates several problems, and can hinder our understanding of a phenomenon rather than improving it (Jacoby, 2002). Focusing on the average effect of the predictor variables on an outcome can hide the existence of different relationships in sample sub-groups (Newbert et al., 2022). Even though a regression model might show a significant positive or negative relationship, the opposite, or no relationship at all, could be present in sub-groups (Woodside, 2014). In these cases, a greater contribution could be made by showing this heterogeneity. Moreover, the use of more complex modelling techniques such as structural equation modelling (SEM), and the inclusion of more complex specifications including controls and moderators, also do not permit a comprehensive analysis of sample heterogeneity (Newbert et al., 2022). Speaking from the perspective of complexity theory, Woodside (2014, p. 2495) therefore proposes the 'necessity of modelling multiple realities using complex antecedent configurations', whilst Newbert et al. (2022, p. 4) caution against a focus on analysing the 'average entrepreneur'. Moreover, it is often the case that there is causal asymmetry, where the causes of different values on the dependent variable are not necessarily the same (Woodside, 2013, 2014). The factors that cause someone to be an entrepreneur are therefore not necessarily the same as the factors that cause someone not to be an entrepreneur.
Although the difficulties of studying non-linear and heterogeneous relationships using regression techniques are well understood, much of the existing literature applies these techniques (Coduras et al., 2016). Woodside (2013) suggests moving beyond the traditional statistical methods which focus on linear regression. Although the use of analytics techniques has been limited in entrepreneurship research, recent work in the field has begun to adopt a wider variety of methods such as fuzzy-set qualitative comparative analysis (FSQCA) (Beynon et al., 2016, 2018, 2020; Coduras et al., 2016), and Bayesian networks (Sohn & Lee, 2013). Decision trees and other analytics approaches have been applied in the wider business literature (Duchessi & Lauría, 2013; Schivinski, 2021), but have not been applied to the study of heterogeneity in the entrepreneurship literature. These methods allow more heterogeneous and non-linear relationships to be uncovered in the data. The use of new tools in entrepreneurship research is important as tools have an influence on the discovery and development of new theories (Gigerenzer, 2009).

Framework and research questions: Heterogeneity and SCT
As shown previously in Fig. 1, SCT recognises the non-linearity of the relationships between personal characteristics, behaviour, and the environment, proposing a series of reciprocal bi-directional relationships. This is coupled with arguments that the three social-cognitive components are not of equal strength (Wood & Bandura, 1989). Moreover, SCT also recognises that even though individuals may have the motivation to undertake a behaviour, they do not necessarily act on this motivation (Wood & Bandura, 1989). Therefore, we expect that a proportion of non-entrepreneurs will report social-cognitive factors that are normally seen as positively related to starting a business, such as self-efficacy, knowledge, and ability to recognise opportunities to start a business, but will be motivated to choose other careers. Similarly, some individuals may decide to start a business even though they are lacking the important social-cognitive factors. Moreover, we expect the social-cognitive factors to differ in terms of their importance within groups of entrepreneurs and non-entrepreneurs. Thus, we expect to observe heterogeneity in terms of the combinations of social-cognitive factors for different sub-groups of entrepreneurs and non-entrepreneurs.
Although the inherent non-linearity of the relationships between these factors is recognised in SCT, most empirical research does not fully account for this. Following the advice of Jacoby (2002) in the wider psychology literature, we take this non-linearity further and conceptualise the relationships between the three dimensions as a Venn diagram, with each circle representing one of the three dimensions, with entrepreneurs possessing one or a combination of the social-cognitive characteristics. In this conceptualisation, presented in Fig. 2, entrepreneurs (and non-entrepreneurs) could have various different combinations of characteristics. Combinations of social-cognitive factors will then determine whether individuals are engaged in TEA.
Based on the above framework, our research questions focus on the two novel additions to SCT, namely dominance and heterogeneity. Question one focuses on dominance, and questions two and three explore heterogeneity:
Q1: Which of the individual elements of SCT are the dominant, and hence necessary, factors that result in early-stage entrepreneurship?
Q2: What combinations of factors are most likely to result in early-stage entrepreneurship?
Q3: What combinations of factors are most likely to result in individuals not becoming entrepreneurs?

Measures of SCT
The Global Entrepreneurship Monitor (GEM) framework and dataset include variables relating to each of the three dimensions of SCT. In this section, we briefly outline each area, relating it back to the wider literature, before proceeding to the methodology.
B. Graham and K. Bonner

Cognitive and other personal factors
The GEM includes data on four cognitive traits: self-efficacy, networks, fear of failure and opportunity recognition. It also includes data on age, gender, household income, and education, which reflect the other personal factors that are said to influence motivation and subsequent behaviour (Wood & Bandura, 1989). We discuss these factors in relation to their potential heterogeneity and importance.
Self-efficacy.
Self-efficacy is defined as 'people's beliefs in their capabilities to mobilize the motivation, cognitive resources, and courses of action needed to exercise control over events in their lives' (Wood & Bandura, 1989, p. 364). Self-efficacy is therefore central to SCT in self-regulating individual motivation and behaviour (Bandura, 1999; Boudreaux et al., 2019; Wood & Bandura, 1989). If individuals do not possess self-efficacy they will not be motivated to act (Bandura, 1999). Given its centrality in terms of such motivation, and its support in the empirical literature (Boudreaux et al., 2019; Sequeira et al., 2007), it is proposed that self-efficacy will be one of the dominant factors with regard to early-stage entrepreneurial activity. Furthermore, the expectation is that it will be common to all entrepreneurs.

Networks.
Learning by observing others is a crucial element of SCT, and one of its differentiating factors compared to earlier theories (Wood & Bandura, 1989).Following Bandura (1977), entrepreneurial role models, mentors, and entrepreneurs within an individual's social network may enhance an individual's desire to become an entrepreneur and boost their entrepreneurial self-efficacy (Van Auken et al., 2006).They may also transmit legitimacy to the perceived social value of entrepreneurship (Kacperczyk, 2013).
We therefore expect that knowing other entrepreneurs will be an important antecedent for entrepreneurial activity, but less important than self-efficacy. Entrepreneurs can form businesses in challenging situations (Serviere, 2010) or where there are few role models (Bosma et al., 2012), or may gain self-efficacy from other sources such as formal education (Piperopoulos & Dimov, 2015), introducing the potential for some heterogeneity in the importance of knowing an entrepreneur. Given its importance from an SCT perspective, however, and empirical support for its significance (Casson & Della Giusta, 2007; Sequeira et al., 2007; Thornton et al., 2011), we expect this variable to be important across most individuals.

Opportunity recognition.
Within the SCT framework, perceived opportunities influence whether individuals will implement what they have learned (Bandura, 1999). Indeed, Davidsson (2015) suggests that opportunity perception not only implies opportunity recognition by individuals, but also their fit, i.e., the individual-opportunity nexus. More fundamentally, entrepreneurship can be viewed as a process of exploiting opportunities (Shapero, 1984), making opportunity perception and identification central in the decision to start a business.
Although there are theoretical arguments identifying the fundamental nature of opportunity for entrepreneurship (Baron, 2006) and support from the empirical literature (Arenius & Minniti, 2005), there are nevertheless wider motivations for starting a business covering a range of personal, social and community dimensions, which also include necessity-based options (Stephan et al., 2015). Overall, we therefore expect that opportunity recognition will be important, but will vary across individuals and will not be a necessary factor for all groups of entrepreneurs.

Fear of failure.
Fear of failure is considered a type of risk perception and is typically viewed as a barrier to entrepreneurship (Cacciotti & Hayton, 2015). Despite empirical support for the negative relationship between fear of failure and entrepreneurship (Boudreaux et al., 2019; Simon et al., 2000), perceptions of risk have also been found to be important only for sub-groups, such as those coming out of employment. This implies heterogeneity in the relationship with entrepreneurship (Caliendo et al., 2009). Given this, we do not expect fear of failure to be highly important for entrepreneurs, but it is likely to be an important barrier, and hence determinant, for non-entrepreneurs.

Other personal factors.
In addition to cognitive traits, SCT also highlights the importance of other personal characteristics (Wood & Bandura, 1989). Demographic variables have been identified as important as they reflect the human and resource capital and necessary experience associated with successful business creation. These factors include income, education, employment status, age, and gender (Arenius & Minniti, 2005; Boudreaux et al., 2019; Koellinger et al., 2013; Pathak & Muralidharan, 2021). Van Ness & Seifert (2016) and Stephan et al. (2015) both discuss the heterogeneous nature of these personal attributes on entrepreneurship. We therefore anticipate that these factors will vary across entrepreneurial individuals, and we expect them to be the least important of the personal elements of SCT.

Environment: Cultural perceptions
SCT highlights the importance of the external environment in governing behaviour, whereby people are both 'products and producers of their environment' (Wood & Bandura, 1989, p. 362). The perceived cultural environment also relates to Bandura's (1977) concept of outcome expectations as it impacts on expectations from self-efficacy (Klyver & Thornton, 2010). For example, entrepreneurship may be positively influenced by culture, particularly where it promotes positive attitudes to business creation and offers social legitimisation to the behaviour (Liñán et al., 2011). Alternatively, individuals might not exercise their self-efficacy if they believe that the environment will be unresponsive or will punish the behaviour (Bandura, 1977).
As with the personal characteristics identified above, there have been mixed findings in the empirical literature as to the impact of cultural norms on entrepreneurial activity. The consideration of entrepreneurship as a good career choice is found to be significantly associated with early-stage entrepreneurship, particularly necessity-driven TEA (Hindle & Klyver, 2007). However, Santos et al. (2017) found it to have a significant negative effect on early-stage entrepreneurial activity, measured before and during the Global Financial Crisis. Indeed, they suggest that social norms have a weaker impact on entrepreneurial activity than individual characteristics.
Media coverage of successful entrepreneurs has been found to exert the highest impact of the socio-cultural factors in a number of studies (Beynon et al., 2016, 2018, 2020; Coduras et al., 2016; Liñán et al., 2011), but others report either no significant impact (Ahmad et al., 2014) or a negative impact on entrepreneurial activity (Santos et al., 2017). Its impact is also found to vary depending on the stage of the business (Arshad et al., 2021; Hindle & Klyver, 2007). Likewise, the impact of perceptions of entrepreneurs' status and respect also has mixed findings (Hindle & Klyver, 2007; Santos et al., 2017).
The ease of starting a business reflects the regulatory and institutional context in which would-be and existing entrepreneurs must operate. Empirical evidence supports the proposition that entrepreneurial activity is higher in economies with less regulation and lower barriers to entry (El-Namaki, 1988; Hong & Sullivan, 2007) and that it influences the rate of general entrepreneurial activity more than any other factor (Stenholm et al., 2013).
Social enterprises are those businesses set up to address a social purpose. Although the number of social enterprises in an economy will be small relative to commercial ventures, it is found that a highly developed social enterprise sector can result in a scenario whereby more commercial entrepreneurial activity arises due to a competition effect, resulting in barriers to entry to social enterprise (Fernández-Laviada et al., 2020).
Given the mixed findings of these cultural aspects, and the fact that both entrepreneurs and non-entrepreneurs operate within the same social and cultural norms within a country, we anticipate that these environmental factors will be less important than the cognitive aspects of SCT for explaining entrepreneurial activity.

Behavioural factors
The third group of factors that shape motivation and outcomes relate to behavioural factors (Wood & Bandura, 1989). This dimension relates to Bandura's (1977) argument that self-efficacy is influenced by past mastery experiences. For this aspect, we draw on the following from the GEM framework: whether the individual has acted as a business angel, whether they already own an established business, and whether they have recently closed a business.
Business angels are thought to make investments with some previous knowledge of entrepreneurship (Maula et al., 2003). They are therefore expected to be favourable towards entrepreneurial activity and have a higher propensity to engage in such behaviour (Ramos-Rodríguez et al., 2012). Having acted as a business angel, particularly in cases where the business has succeeded, or hearing about credible success stories could provide important learning, leading to self-efficacy and motivation to start a business (Veciana, 2007).
In a similar vein, owning and managing an established business could indicate mastery experience in the form of entrepreneurial human capital (Hessels et al., 2011), leading to increased self-efficacy and motivation to start another business. Where the established business is successful, it will enhance business skills and confidence, which may facilitate further new market entry (Shrader et al., 2000). Past business failure, however, also provides a learning opportunity for entrepreneurs. Empirical evidence has shown it to be a significant determinant of entrepreneurial intentions (He et al., 2020) and activity (Koellinger et al., 2013). Previous business ownership reflects useful learning effects and enhances entrepreneurs' experience and skills, so they are more likely to start again (Stam et al., 2008). Furthermore, entrepreneurs are thought to have resilient qualities and abilities to recover from failure, indicating a proclivity towards a career as an entrepreneur and increasing the likelihood of serial entrepreneurship (Tugade & Fredrickson, 2004).
We anticipate that such behavioural factors will be heterogeneous across individuals. Indeed, it is likely that rates of prior business ownership or business angel activity will be relatively low among individuals and therefore we do not expect these to be dominant antecedents for entrepreneurial activity across a majority of groups.

Data
This study draws on the full annual working age (18-64) population dataset from the 2015-2018 GEM surveys. The dataset for analysis includes the most recently published GEM data that contains all of our variables of interest. The combined dataset includes data from 665,233 respondents. The full list of variables included in this study is shown in Table 1. The dependent variable for the study is TEA, which is a widely used measure encompassing nascent and new business ownership. The nascent stage reflects businesses within the first three months of start-up, whereas new businesses are those between 3 and 42 months old. TEA is measured on a binary scale where 0 = no and 1 = yes.
The independent variables are all measured at the individual level, and include variables relating to cognitive and personal characteristics, past behaviour, and environmental perceptions. Cognitive factors include self-perceptions of self-efficacy, networks, opportunities, and fear of failure. As shown in Table 1, these variables are measured on binary scales, where 0 = no and 1 = yes. Other personal characteristics include the demographic variables of age, gender, education, household income and household size. Past behaviour includes whether the person has acted as a business angel, whether they currently own and manage an established business over 42 months old, and whether they have closed a business in the past two years. Environmental perceptions include cultural perceptions of good opportunities for start-up, perception of entrepreneurship as a good career choice, perception of the status of entrepreneurs, prevalence of entrepreneurs in the media, ease of starting a business, and social entrepreneurship. These variables are also measured on binary scales, where 0 = no and 1 = yes. The descriptive statistics for the variables are shown in Table 2.

Machine learning method
This study draws on a machine learning approach to predict and profile TEA. Machine learning involves using algorithms to learn patterns in data. There are two broad categories of machine learning problems: supervised and unsupervised. Supervised learning involves applying an algorithm to learn the relationships between input data and a target outcome. In contrast, unsupervised learning involves grouping observations into categories that emerge from patterns in the data. In the present study, we adopt a supervised learning approach to learn the relationships between the input data and TEA.
The machine learning methodology for this study involves a series of steps, which include data and feature selection; splitting the data into  (Kuhn & Johnson, 2013). These stages are discussed in more detail below.

Machine learning algorithms and model training
The data for training and testing the models is split into two parts using random stratified sampling; 70% of the data is used to train the machine learning models, and 30% of the data is used to test the performance of the model. This split allows for a sufficient quantity of data to train the model, while evaluating accuracy on a holdout test set provides an objective assessment of the models' predictive performance.
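As an illustration only, a random stratified 70/30 split of the kind described above could be sketched in Python as follows. The function name `stratified_split`, the seed, and the toy label vector are assumptions made for this sketch; they are not taken from the study's own code or data.

```python
import random
from collections import defaultdict

def stratified_split(labels, train_fraction=0.7, seed=0):
    """Return (train_idx, test_idx) so that each class keeps roughly the
    same proportion in the training and test partitions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train_idx, test_idx = [], []
    for idx in by_class.values():
        rng.shuffle(idx)                         # random order within the class
        cut = round(len(idx) * train_fraction)   # 70% of this class goes to training
        train_idx.extend(idx[:cut])
        test_idx.extend(idx[cut:])
    return sorted(train_idx), sorted(test_idx)

# Toy labels (illustrative only): 10% coded 1, 90% coded 0.
labels = [1] * 100 + [0] * 900
train_idx, test_idx = stratified_split(labels)
```

Because the split is made within each class, the share of positive cases is the same in both partitions, which matters when the outcome is imbalanced.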
A decision tree algorithm is used to build the model for the study. Decision trees are a widely used approach in machine learning, with several variations such as ID3, C5.0 and CART, as well as ensemble techniques, which model the data by combining multiple decision trees to improve predictive accuracy. Decision trees have been applied in a range of scenarios such as profiling ski resorts' marketing activities (Duchessi & Lauría, 2013), predicting purchase intentions (Martínez et al., 2018), business failure prediction (Gepp et al., 2010), credit risk (Pérez-Martín et al., 2018), and churn prediction (Coussement & De Bock, 2013), amongst others.
In this study, we focus on a commonly applied decision tree algorithm: recursive partitioning (Breiman et al., 1984; Thereneau & Atkinson, 2015). Recursive partitioning can be used to model patterns in the data which best separate observations into groups, such as entrepreneur or non-entrepreneur. The algorithm builds a model by recursively separating observations into groups which are increasingly homogeneous across the target variable. The tree building process begins with a root node, which contains all of the observations. The algorithm then searches across all variables to identify the variable that maximises node purity if it were used to split the data. The data is then split into groups based on this variable. This process is applied at each node until a stopping criterion is reached or splitting the data further does not increase the node purity. This results in a terminal node, which is used to predict the outcome.
The output of this process is a model consisting of a series of "if, then, else" rules which can be used to make predictions about the target outcome in cases where the outcome is unknown. The model resulting from the decision tree algorithm has the advantage of being readily interpretable both graphically and using the variable importance measures. The decision tree algorithm allows us to address different research questions, compared to a more traditional approach such as logistic regression. Specifically, in this study it allows us to identify dominant variables and more nuanced and non-linear relationships. It also has the advantage that it is a non-parametric technique and does not assume a specific distribution, whereas models such as logistic regression assume a linear relationship between the logit of the dependent variable and the independent variables (Coussement & De Bock, 2013). The algorithm also has a built-in feature selection mechanism, in that variables that do not improve the model accuracy are not selected for splits in the tree, which means that irrelevant variables are ignored. The decision tree algorithm also effectively handles missing data through the use of surrogate splits. If an observation has missing data on a splitting variable, the next best variable will be used instead. This means more data is used in the model building process, compared to removing observations with missing values. It is important to note that observations with missing data on a particular variable are not used to calculate node impurity when splitting on that variable (Thereneau & Atkinson, 2015).
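The search for the split that maximises node purity can be sketched as below. This is a minimal illustration for binary (0/1) predictors and a binary outcome, and it assumes the Gini index as the impurity measure; the text above refers only to "node purity", so the choice of Gini, the function names, and the toy feature names (`skills`, `media`) are assumptions of this sketch.

```python
def gini(labels):
    """Gini impurity of a node for a 0/1 outcome: 1 - p^2 - (1 - p)^2."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2 * p * (1 - p)

def best_binary_split(rows, labels):
    """Search all 0/1 features and return (feature, impurity_decrease)
    for the split that most reduces weighted Gini impurity."""
    n = len(labels)
    parent = gini(labels)
    best = (None, 0.0)
    for feat in rows[0]:
        left = [y for r, y in zip(rows, labels) if r[feat] == 0]
        right = [y for r, y in zip(rows, labels) if r[feat] == 1]
        if not left or not right:
            continue  # this feature does not separate the node
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / n
        gain = parent - weighted
        if gain > best[1]:
            best = (feat, gain)
    return best

# Toy node (illustrative only): "skills" separates the outcome, "media" does not.
rows = [
    {"skills": 1, "media": 0},
    {"skills": 1, "media": 1},
    {"skills": 0, "media": 0},
    {"skills": 0, "media": 1},
]
labels = [1, 1, 0, 0]
feature, gain = best_binary_split(rows, labels)
```

A full recursive partitioning routine would apply this search again inside each resulting child node until a stopping rule is met; only the single-node step is shown here.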
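The surrogate-split idea can be illustrated with a simplified routing rule: try the primary splitting variable first, fall back to the next best variable when the value is missing, and use a default direction when every candidate is missing. The variable names below are hypothetical, and the default-direction fallback is an assumption of this sketch rather than a description of the software used in the study.

```python
def route(obs, splits, default_side):
    """Send an observation left or right at one node.

    splits: ordered list of (variable, {value: side}); the primary split
    comes first, followed by surrogate splits in order of preference.
    """
    for var, sides in splits:
        value = obs.get(var)
        if value is not None:      # use the first variable that is observed
            return sides[value]
    return default_side            # every candidate variable was missing

# Hypothetical node: primary split on self-efficacy, one surrogate.
splits = [
    ("self_efficacy", {0: "left", 1: "right"}),        # primary (hypothetical name)
    ("knows_entrepreneur", {0: "left", 1: "right"}),   # surrogate (hypothetical name)
]

complete = route({"self_efficacy": 1, "knows_entrepreneur": 0}, splits, "left")
missing_primary = route({"self_efficacy": None, "knows_entrepreneur": 0}, splits, "left")
all_missing = route({}, splits, "left")
```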

Parameter tuning
Machine learning algorithms often have parameters that need to be tuned to maximise the accuracy of the model. For recursive partitioning, we tune the complexity parameter, which governs the size of the tree by controlling the minimum improvement from a subsequent split in the tree. As the parameters must be learned from the data, we adopt a 10-fold cross validation approach, repeated five times (Kuhn & Johnson, 2013). A grid of plausible tuning parameters is tested on the training data to identify the parameters that result in the most accurate model. To tune the parameters, the training data is split into ten parts, with each model built using nine parts of the data, and tested using the one part that was left out. Each of the candidate parameters is evaluated for accuracy. The process is repeated five times for robustness. The overall process identifies the model parameters that maximise the predictive accuracy of the model.
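The tuning loop described above (10 folds, repeated five times, over a grid of candidate values) can be sketched as follows. The helper names, the candidate grid, and in particular `toy_score` are placeholders introduced for this sketch: in practice the scoring function would fit a tree with the given complexity parameter on the training indices and return its performance on the held-out fold.

```python
import random

def repeated_kfold(n, k=10, repeats=5, seed=0):
    """Yield (repeat, fold, train_idx, val_idx) for repeated k-fold CV."""
    rng = random.Random(seed)
    for r in range(repeats):
        idx = list(range(n))
        rng.shuffle(idx)                      # new random partition per repeat
        folds = [idx[f::k] for f in range(k)]
        for f in range(k):
            val = folds[f]                    # the one part left out
            train = [i for g in range(k) if g != f for i in folds[g]]
            yield r, f, train, val

def tune(n, candidates, score_fn, **kw):
    """Return the candidate with the highest mean validation score."""
    means = {}
    for c in candidates:
        scores = [score_fn(c, tr, va) for _, _, tr, va in repeated_kfold(n, **kw)]
        means[c] = sum(scores) / len(scores)
    return max(means, key=means.get), means

def toy_score(cp, train_idx, val_idx):
    """Placeholder score (hypothetical): stands in for fitting and scoring a tree."""
    return -abs(cp - 0.001)

splits = list(repeated_kfold(100))
best_cp, mean_scores = tune(100, [0.01, 0.001, 0.0001], toy_score)
```

Reusing the same seed for every candidate means all candidates are compared on identical folds, which keeps the comparison fair.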

Dealing with class imbalance
The dependent variable in our study suffers from class imbalance, which biases the predictions towards the majority class, thus decreasing the model's sensitivity to the minority class (Kuhn & Johnson, 2013). In practice, this means the model would be more accurate in identifying non-entrepreneurs, but less accurate in identifying entrepreneurs. To reduce the impact of this problem, up sampling of the minority class is used during the cross-validation process. Each cross-validation fold was up sampled separately, rather than up sampling the entire training dataset prior to the cross validation. This ensures that the individual folds are balanced, and allows cross validation performance to be assessed on the unbalanced data (Kuhn & Johnson, 2013), in effect optimising the model against the original distribution of TEA. The test dataset was not up sampled, allowing us to evaluate the model performance based on the original distribution of the data. Other common resampling techniques were also evaluated during the analysis process, including down sampling, and the more advanced techniques of SMOTE and ROSE (Burez & Van den Poel, 2009; Chawla et al., 2002). SMOTE and ROSE resulted in similar performance to up sampling, and down sampling was slightly worse. We therefore elected to present results from the model built using the up sampled data as it is both intuitive and of comparable accuracy.
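A minimal sketch of up sampling inside a single cross-validation fold is given below. Only the training part of the fold is balanced, by resampling minority-class cases with replacement, while the held-out part keeps its original class distribution. The function name, the seed, and the toy data are assumptions of this sketch.

```python
import random

def upsample_minority(idx, labels, seed=0):
    """Return training indices with minority-class cases resampled
    (with replacement) until both classes are equally frequent."""
    rng = random.Random(seed)
    pos = [i for i in idx if labels[i] == 1]
    neg = [i for i in idx if labels[i] == 0]
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    if not minority:
        return list(idx)           # nothing to resample from
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    return list(idx) + extra

# Toy data (illustrative only): 10 positives, 90 negatives.
labels = [1] * 10 + [0] * 90
val_fold = list(range(0, 100, 10))                     # held out, left unbalanced
train_fold = [i for i in range(100) if i not in val_fold]
balanced_train = upsample_minority(train_fold, labels)
```

Resampling after the fold has been formed ensures that duplicated minority cases never appear in both the training and the held-out part of the same fold.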

Evaluating and interpreting the model
The predictive accuracy of the models is evaluated using the hold out test set. Due to the class imbalance of the dependent variable, the main metric used to evaluate the model accuracy is the area under the curve of the receiver operating characteristic (AUC-ROC) (Bradley, 1997; Burez & Van den Poel, 2009). We also present kappa and the overall accuracy, although we would caution against using accuracy by itself, due to class imbalance. Models are interpreted using the variable importance scores, and by visualising the tree structure. The variable importance measures are calculated based on the reduction in the loss function from the splits and surrogate splits on the variable (Thereneau & Atkinson, 2015).
To enhance the robustness of our results we carried out additional analyses on the data, following the same process discussed previously. This included building models and comparing accuracy for each individual year between 2015 and 2018. As shown in Table 4 of the results presented in Section 4.1, this revealed no substantial differences in accuracy. No substantial differences were observed in the dominant variables over the years. In addition, we also built a model using the 2015-2017 data, and tested the model on the 2018 data. Again, this revealed no substantial differences in accuracy or dominance of variables.
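The reason for preferring AUC-ROC and kappa over raw accuracy under class imbalance can be shown with a small example. The sketch below computes AUC-ROC as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case (ties counted as one half), and Cohen's kappa as chance-corrected agreement. The function names and toy values are assumptions of this sketch, and the pairwise AUC calculation is intended for small illustrations rather than large datasets.

```python
def auc_roc(y_true, scores):
    """AUC-ROC via pairwise comparison of positive and negative scores."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def cohen_kappa(y_true, y_pred):
    """Agreement between predictions and outcomes, corrected for chance."""
    n = len(y_true)
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    expected = sum((y_true.count(c) / n) * (y_pred.count(c) / n) for c in (0, 1))
    return (observed - expected) / (1 - expected)

# Toy imbalanced outcome: a model that always predicts the majority class.
y_true = [0] * 9 + [1]
y_pred = [0] * 10
scores = [0.2] * 10              # constant score carries no ranking information

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
kappa = cohen_kappa(y_true, y_pred)
auc = auc_roc(y_true, scores)
```

In this toy case the accuracy is high simply because the majority class dominates, whereas kappa and AUC-ROC both indicate that the predictions carry no information about the minority class.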

Variable importance and model accuracy
One of the aims of our machine learning approach is to identify the relative importance of the determinants of TEA. This is similar to the dominance analysis approach adopted in the wider management literature (Arin et al., 2015; Hakanen et al., 2021; Kumar et al., 2010). We draw on the variable importance measures produced through the machine learning approach to identify the relative importance of the independent variables. Table 3 presents the variable importance scores for the top 20 features.
The three cognitive traits of self-efficacy, networks, and opportunities are the most important determinants of TEA. This is followed by the person's age and fear of failure. Experience as an established business owner, having discontinued a business, and having acted as a business angel are also important, highlighting the overall prominence of past behaviour as a determinant of entrepreneurship. Aside from age, although the other personal characteristics and cultural variables have been used by the decision tree algorithm to build the model, the variable importance scores indicate that these are of less overall importance.
To assess the model accuracy, predictions are made on the unbalanced hold out test dataset. The results of the model accuracy are presented in Table 4. This shows that the decision tree has an overall accuracy of 70.95%, with an AUC-ROC of 0.7735. Due to the class imbalance in the data, the AUC-ROC provides a more meaningful estimate of the model performance. According to Hosmer & Lemeshow (2000), a model with an AUC-ROC of over 0.7 can be interpreted as having good discriminative ability. The accuracy of the model based on the test data is very similar to the accuracy we observed during the model tuning process (AUC-ROC of 0.7686) for the best complexity parameter (0.0001037794), indicating that overfitting is not an issue.

Profiling entrepreneurs using the tree structure
One of the advantages of the decision tree approach compared with some other machine learning algorithms is that it is readily interpretable by visualising the tree structure, thus allowing us to gain additional insight about the structure of the relationships in the data. The decision tree structure shows the different combinations of factors that can lead to entrepreneurship. The pruned structure of the final decision tree is shown in Fig. 3, with the full tree structure presented in appendix A. This tree structure can be interpreted alongside the variable importance scores to add additional information about the structure of the final model. Because the algorithm splits first on the most important variable, the variables that best separate the observations into entrepreneurs and non-entrepreneurs are shown at the top of the tree, with subsequent tree levels showing relatively less important predictors. The tree shows that the variable that best separates the data is self-efficacy, with individuals less likely to be engaged in early-stage entrepreneurial activity when they feel they do not have the skills to do so (i.e. suskillyes = 0). This is followed by networks on both the left-hand side and right-hand side of the tree. This corresponds to the results of the variable importance scores, where self-efficacy and networks are also found to be the most important predictors.
After the top two levels, the tree becomes more complex, illustrating the multiple groups of factors that categorise people into sub-groups of entrepreneurs and non-entrepreneurs. This highlights that different combinations of factors can lead to entrepreneurship, and there is no 'one size fits all' pathway. On the left-hand side of the tree, beginning at the root node, people who do not have the skills to start a business and who do not know an entrepreneur, and who have not closed a business are classified as non-entrepreneurs.
However, even for people who do not have the skills and knowledge to start a business, there are still combinations of factors that result in entrepreneurship, even though very few observations in the decision tree fall into these groups. For example, having discontinued a business, being aged under 51, and perceiving good opportunities results in an increased probability of starting a business, as does perceiving good opportunities and having discontinued a business.
Focusing on the right-hand side of the decision tree shows the combinations of variables that increase the probability of entrepreneurship. Those most likely to be engaged in TEA have self-efficacy, networks, are not currently established business owners, and do not fear failure. For individuals with self-efficacy and networks, being an established business owner decreases the probability the individual would be predicted to be engaged in TEA, although there are also combinations of factors for business owners that increase the predicted probability of entrepreneurship, such as household size and education level. Although networks are important for individuals with self-efficacy, there are combinations of factors that result in TEA even for individuals who do not know an entrepreneur. For example, individuals within this group who perceive good opportunities for entrepreneurship would be predicted to be engaged in TEA. Although age does not feature at the top of the tree, it is ranked highly in the variable importance scores, and does feature in multiple lower splits. In general, younger people are more likely to be predicted to be engaged in entrepreneurship compared with older individuals, although the cut-off age for the prediction differs depending on the preceding variables in the tree structure, indicating differential effects across sub-groups. This highlights the ability of the decision tree to model complex non-linear relationships in the data. Other personal characteristics and cultural variables do not appear at all in the top levels of the decision tree, indicating that these are less important determinants of TEA. However, the full decision tree model (appendix A) shows that these variables are relevant for smaller sub-groups of individuals.

Discussion
The results presented in the previous section highlight the most important social-cognitive determinants of TEA, as well as the different combinations of factors that lead to entrepreneurship. We discuss the main implications of the results from a theoretical, practical, and methodological perspective.

Theoretical contributions of the results
The results make theoretical contributions to both the entrepreneurship literature and the SCT literature, by extending the dominance analysis approach to the individual level, and by highlighting the heterogeneous nature of the social and cognitive antecedents. We discuss these theoretical contributions further here, in the context of our initial research questions, drawing on insights from the existing literature to explain our findings.

RQ 1: Dominance of social-cognitive determinants of entrepreneurship
Our first research question focuses on which factors are dominant, and hence necessary for entrepreneurship. From a dominance analysis perspective, the variable importance scores highlight the relative importance of self-efficacy in separating entrepreneurs and non-entrepreneurs. This is consistent with arguments from SCT about the importance of self-efficacy in motivation (Wood & Bandura, 1989). However, the SCT literature provides little theoretical guidance as to the relative importance of motivational antecedents beyond self-efficacy. In addressing this gap, we find substantial differences in the relative importance of other social-cognitive determinants of entrepreneurship. We empirically find that networks, opportunities, and fear of failure are the next most important social-cognitive determinants of entrepreneurship. Although this confirms findings from the wider entrepreneurship literature (Arenius & Minniti, 2005), we extend this past work by discerning their relative importance. Our results further highlight the crucial importance of cognitive traits in motivating entrepreneurship, relative to the other dimensions of SCT, particularly when contrasted against the relatively lower importance of external factors.
In terms of the other personal factors, age features as the most important of the demographic variables in both the tree structure and in the variable importance measures. Age exhibits a non-linear relationship with entrepreneurship. This is consistent with the wider literature (Lévesque & Minniti, 2006), but our tree-based approach highlights more detailed nuances about the differential effect of age across different sub-groups. Other personal characteristics, such as gender, educational level, household income and household size are less important overall, and are only important for certain groups of individuals and in combination with other factors. This highlights the relative importance of cognitive traits in predicting TEA compared with other personal demographic factors. One potential reason for the lower importance of gender could be interrelationships between gender and other concepts such as self-efficacy (Chowdhury & Endres, 2005; Gatewood et al., 2003; Kourilsky & Walstad, 1998), which account for the majority of the predictive ability. This would suggest a more complex role for gender as a predictor of entrepreneurship.
Perhaps surprisingly, household income is a relatively less important predictor of TEA. The wider literature does point to conflicting arguments about the effect of income on entrepreneurship (Amidžić, 2019; Block & Sandner, 2009; Stephan et al., 2015). A higher income is thought to either encourage people to remain in paid employment, or provide the necessary resources to start a business and enable those individuals to pursue more opportunities. Alternatively, a lower income is also associated with entrepreneurship, albeit of the necessity-driven type. These conflicting perspectives could both hold true in different scenarios, decreasing the overall predictive ability of this variable. Education is also less important in predicting TEA, which backs up arguments from the literature where no clear relationship emerges between education level and entrepreneurship (Carter et al., 2000). Other authors also highlight a more general indirect relationship between such personal factors and entrepreneurship (Krueger et al., 2000).
The experiential variables are found to be of moderate importance, ranking higher than most of the personal and cultural factors, but lower than the cognitive traits. Although past research has not focused on the relative importance of these factors, our results support the relevance of experience in motivating action as proposed by the SCT literature (Wood & Bandura, 1989), as well as findings about the importance of experience from the entrepreneurship literature (Koellinger et al., 2013; Ramos-Rodríguez et al., 2012; Tugade & Fredrickson, 2004).
Although past research using GEM data and asymmetrical techniques has found cultural perceptions to be important in predicting TEA (Coduras et al., 2016), these do not feature amongst the most important predictors of entrepreneurship, nor do they appear in the top levels of the decision tree. Our findings are thus more consistent with studies that have found these variables to have low odds ratios (Liñán et al., 2011). Another possible explanation is that the cultural variables are indirectly related, via relationships with other social and cognitive factors.
These findings extend the small body of literature which has taken a dominance analysis approach to study the determinants of entrepreneurship at a macro level (Arin et al., 2015), both through a focus on the individual level and through the adoption of a social-cognitive approach.

RQ 2 and 3: Heterogeneity in the social-cognitive determinants
Our second and third research questions focus on heterogeneity, and specifically exploring the combinations of social-cognitive factors that lead to entrepreneurship and non-entrepreneurship. The visual structure of the decision tree provides an effective way of addressing these questions. The results of the decision tree support the proposition from SCT that self-efficacy is central to entrepreneurial behaviour. Beyond this, we find variation in the heterogeneity of social and cognitive motivating factors that lead to entrepreneurship for different groups. Whilst past work has identified heterogeneity in the businesses started by entrepreneurs (Cieślik & Dvoulety, 2019; Dvouletý, 2020), and in other characteristics (Douglas et al., 2020), our findings point towards heterogeneity in the antecedents of entrepreneurial motivation. Although cognitive traits are found to be most important, there are smaller groups of entrepreneurs who start a business even when they do not perceive good opportunities, fear failure, or do not have entrepreneurial networks. There is more diversity in the characteristics of people who start a business when they are lacking one of the most important traits of self-efficacy or knowing an entrepreneur. Even though it is possible to start a business when lacking these traits, it is much less likely than for individuals who possess both traits.
Heterogeneity is particularly evident in terms of the demographic characteristics of entrepreneurs, where factors such as age, gender, education, and household income and size are more important for certain sub-groups compared with others. One reason for this could be the heterogeneous nature of the entrepreneurial process, the drivers of entrepreneurship, and differences in the factors necessary to start different types of businesses. Indeed, the heterogeneous nature of entrepreneurs has been identified by Kerr et al. (2018) as partly due to the type of business ventures created, with the archetypal young tech entrepreneur not expected to have the same traits as an older immigrant entrepreneur setting up a traditional business. They suggest the development of a taxonomy to better understand sub-groups of entrepreneurs. In a similar vein, Van Ness & Seifert (2016) suggest that entrepreneurs should be considered as more than a single category of individuals, recognising that distinctive groups will have diverse motives and objectives and therefore differing characteristics. They highlight the lack of unifying theories, and propose a multidimensional model of characteristics including affect (moods, inferences and perceptions); personality (emotional stability, openness) and work ethic (work centrality, self-reliance).
Our results confirm these heterogeneous and multidimensional aspects, showing that no single factor by itself leads to entrepreneurship; rather, combinations of factors are necessary for entrepreneurship, and these combinations of factors can be different across groups of entrepreneurs. It has further been recognised in the literature that some variables can be positive for certain groups, but negative for others (Douglas et al., 2020). This line of thought draws on arguments from complexity theory (Woodside, 2014), suggesting that the factors leading to entrepreneurship are not necessarily the same as the factors that lead to non-entrepreneurial career choices. Our results empirically support this idea, with differences in the probability of entrepreneurship across different combinations of variables. For example, when focusing specifically on established business owners, the decision tree shows that the predicted probability of entrepreneurship varies depending on other factors that separate established business owners into further subgroups, each with different probabilities of TEA. When considering established entrepreneurs, the tree shows that some sub-groups are predicted to be engaged in TEA, whilst other sub-groups are predicted not to be engaged in TEA. This is also the case for sub-groups who have discontinued a business, highlighting the importance of considering that the same variable has different effects for different groups.
These findings build on past studies adopting techniques that allow for the analysis of asymmetries and more complex relationships. For example, recent studies have drawn on FSQCA to gain new insights about the determinants of entrepreneurship (Douglas et al., 2020), although the FSQCA approach has received some criticism in the literature due to subjective bias, calibration of data and reliance on prior knowledge (Liu et al., 2017). Our findings, alongside those from the wider literature, highlight the importance of considering heterogeneity and complexity that are difficult to study using traditional regression-based approaches (Douglas et al., 2020).

Methodological contributions
From a methodological perspective, the decision tree approach has a number of advantages over traditional regression-based techniques. Traditional regression-based approaches are underpinned by assumptions such as the assumption of a linear relationship, the distribution of errors, independence of observations, and lack of multicollinearity (Douglas et al., 2020), which are not assumed by the decision tree approach. Of particular importance in the study of heterogeneity, recursive partitioning does not require a linear relationship between the independent and dependent variables, which allows us to identify more complex and non-linear relationships in the data. Moreover, traditional regression-based approaches would fail to identify these complexities as they focus on net effects rather than differences between groups (Douglas et al., 2020).
Our approach to examining variable importance also goes beyond existing regression-based approaches, which have well documented limitations (Tonidandel & LeBreton, 2011). For example, the level of significance is influenced by sample size, whilst coefficients depend on the measurement scale, making it difficult to assess the contribution of variables using traditional approaches. The variable importance scores presented in this paper also allow us to study the relative importance of predictors, contributing to the small body of literature that focuses on dominance analysis in entrepreneurship (e.g. Arin et al., 2015). Model uncertainty (Arin et al., 2015) is also reduced as the algorithm selects the optimal set of variables to include.
The machine learning approach also includes a robust mechanism for training models and testing their predictive accuracy on unseen data. In particular, the cross-validation approach used to tune the model, and the use of a separate test dataset to evaluate predictive accuracy both increase the robustness of the approach. Indeed, Woodside (2013) has argued that testing models on unseen data is rarely carried out, but is essential in evaluating the predictive accuracy of the model. One key point arising from the accuracy measures is that it is difficult to predict early-stage entrepreneurship. This is consistent with the wider literature that entrepreneurship is a difficult phenomenon to predict, with multiple determinants (Kuckertz et al., 2015). It also leaves open the possibility of future studies identifying more variables that can enhance the predictive accuracy. Indeed, a further advantage of the decision tree approach is its ability to handle large datasets by ignoring irrelevant variables. Decision trees can also handle missing data effectively using surrogate splits, or by retaining the option to split based on missing data when building the tree structure. This is in contrast to many regression approaches where missing data is imputed or observations with missing values discarded.

Implications for policy and practice
The findings lead to several implications for policy and practice. When designing policies to encourage entrepreneurship, policy makers can consider targeting the most important predictors, such as self-efficacy, networking and opportunity recognition. These traits could be developed through initiatives such as entrepreneurship education programmes, mentoring schemes, and facilitation of networking events (Dodd & Keles, 2014; Mukesh et al., 2020; St-Jean & Audet, 2009). Although factors such as self-efficacy and networks are universally important, the findings also highlight the importance of considering the heterogeneity of entrepreneurs, meaning that policy interventions could also be tailored more closely to different sub-groups, rather than adopting blanket policies to encourage entrepreneurship. For example, heterogeneity amongst entrepreneurs and non-entrepreneurs resulting from past behaviour and other personal factors could suggest policies targeted towards different sub-groups based on these characteristics. Such interventions could focus on encouraging past entrepreneurs to start a business, or could focus on differentiating interventions across different life stages.
Potential entrepreneurs should also be made aware of the heterogeneous backgrounds of others, to understand that it is possible to start a business even when certain motivating factors are not present. For example, entrepreneurship takes place even when individuals fear failure, and past failure can be a learning experience. Potential entrepreneurs should also be aware that their demographic characteristics such as age, gender, general education or income, should not be a deterrent to entrepreneurship, particularly as these are less important determinants than the cognitive factors. Investors could also consider this heterogeneity, recognising the diversity of entrepreneurs when making funding decisions. When considered alongside the other variables, cultural perceptions are less important than cognitive and personal factors. Potential entrepreneurs and entrepreneurial educators can therefore focus on developing cognitive traits, for example through education and networking, rather than focusing on wider cultural issues which are potentially outside of their control and thus difficult to influence.

Limitations and future work
Machine learning techniques are relatively new to the entrepreneurship literature, and their full potential is still being explored. Future work should therefore consider applying machine learning techniques to study entrepreneurship in different contexts and with different datasets. This would help to examine the replicability and generalisability of the findings. We focus on only one machine learning approach: recursive partitioning. Although this allows us to identify the variable importance and to interpret the structure of the model, future research could consider implementing additional machine learning techniques such as ensembles of trees (random forests or gradient boosting). Although these models can increase accuracy, they are also more difficult to interpret.
Although the GEM is a well-established dataset for studying entrepreneurship, there are limitations around the data that should be considered when interpreting the results. One limitation of the data is that the constructs are measured using single item measures. Whilst we focus specifically on heterogeneity of social cognitive determinants, it is also likely that this heterogeneity leads to differences in terms of the types of businesses started. To understand the heterogeneity of entrepreneurs, future research could consider a taxonomy approach, classifying entrepreneurs by the different types of businesses they start in terms of motivation, purpose and activity (e.g. high growth, innovation driven, social enterprise, opportunity/necessity-driven) to help understand and explain some of the variance in importance of the demographic determinants. New analytics techniques have opened up the possibility of examining these factors using different tools, leading to new findings and a better understanding of entrepreneurship.
Our dependent variable, TEA, can also be considered a partial measurement of entrepreneurship which fails to capture wider entrepreneurial activity such as that in existing businesses (Ács & Szerb, 2009). To fully capture the multi-dimensional aspects of entrepreneurship, beyond venture creation, future analysis should consider alternative measurements of entrepreneurship such as the Global Entrepreneurship and Development Index (GEDI) (Ács & Szerb, 2009) or those proposed under the OECD (2009) Entrepreneurship Indicators Programme. Another limitation is that the data is cross sectional, which has limitations for attributing causation. In understanding the relationships, we have therefore drawn on wider theoretical and empirical work, but future studies could consider incorporating time series data into the analysis.

Fig. 2. Conceptual Framework illustrating possible combinations of heterogeneity.P = cognitive and other personal characteristics; B = behaviour; E = Environmental perceptions.

Fig. 3. Decision tree model predicting TEA. The figure has been pruned for presentation purposes; the full model can be found in appendix A.

Table 1
List of Variables.

Table 3
Variable importance scores.

Table 4
Final Model Accuracy and Robustness Checks. All accuracy measures evaluated using the test data.