Factorial validity and measurement invariance of the Athlete Burnout Questionnaire (ABQ)

The Athlete Burnout Questionnaire (ABQ) is the gold standard measure for burnout in athletes. However, previous assessments of factorial validity have: (a) tested overly restrictive measurement models; (b) provided mixed support for factorial validity; and (c) not been applied to assess measurement invariance across gender, sport type, or age. To address these issues, we used ABQ data provided by 914 athletes (M age = 21.75 years, SD = 8.79) and examined factorial validity using confirmatory factor analysis (CFA) and exploratory structural equation modelling (ESEM) techniques. We also examined measurement invariance of the ABQ data across reported gender (female, male), sport type (individual, team), and age (≤ 18 years, > 18 years) groups. The analyses revealed that an ESEM model provided superior fit over the corresponding CFA model. In terms of measurement invariance, support was provided for the equivalence of the ABQ across each group. This means that researchers using the ABQ can collect data across these groups and examine potential differences with confidence that the ABQ is approximately invariant. In all, we provide evidence that the majority of ABQ items are key target construct indicators and the burnout construct (as measured by the ABQ) has the same structure and meaning to different athlete groups.


Introduction
Participation in sport is often an enjoyable, worthwhile, and highly rewarding experience (Kilpatrick et al., 2005). While this positive way of thinking about sport is familiar for many athletes, for others it is far removed from their current thoughts and feelings towards sport. That is, some athletes see sport as an endeavour that is overtaxing, unrewarding, and unenjoyable (Creswell & Eklund, 2006; Gustafsson et al., 2008). This cynical view of sport and its associated value is typified by athletes who experience high levels of athlete burnout (Eklund & DeFreese, 2020). Over the last two decades, researchers have dedicated considerable effort to better understand what athlete burnout is, how it differs from other experiential states, and what factors are most likely to underpin its development (Eklund & DeFreese, 2020; Gustafsson et al., 2017; Pacewicz et al., 2019). This line of research has been made possible by the development of conceptual models of athlete burnout (e.g., Coakley, 1992; Raedeke, 1997; Smith, 1986) and associated domain-specific measures (e.g., Eades, 1990; Isoard-Gautheur et al., 2018; Raedeke & Smith, 2001).

Athlete burnout
The most widely adopted model of athlete burnout was developed by Raedeke and Smith (Raedeke, 1997; Raedeke & Smith, 2001, 2009). As described by Raedeke and Smith, athlete burnout is a psychological syndrome characterised by a constellation of symptoms: reduced sense of accomplishment, RSA; physical and emotional exhaustion, EXH; and sport devaluation, SD. The first burnout symptom, RSA, reflects a sense of low accomplishment and personal inadequacy in sport. The second burnout symptom, EXH, reflects the perceived depletion of physical and emotional resources resulting from sport training and competition. The third burnout symptom, SD, reflects the development of a diminished and cynical view towards the benefits of sport participation (Raedeke & Smith, 2001). This conceptualisation of burnout is specific to athletes and the demands they face in sport (Eklund & DeFreese, 2020).
Athlete burnout is often characterised as an extreme and persistent form of sport disillusionment (Madigan et al., 2019). In this regard, it is unsurprising that burnout has been found to contribute towards diminished physical and psychological well-being among athletes. Researchers have found that burnout is positively associated with negative outcomes (e.g., depressed mood, psychological stress, and negative affect) and negatively associated with adaptive outcomes (e.g., coping skills, hope, perceived control, and optimism) among athletes (Gustafsson et al., 2017). Additionally, researchers have hypothesised that burnout is likely to give rise to long-term performance impairment, illicit substance use, and sleep dysfunction (Eklund & DeFreese, 2020). In these regards, high levels of athlete burnout may confer vulnerability to a host of negative and damaging outcomes for athletes in sport.
While it is difficult to know exactly how many athletes are affected by burnout and potentially vulnerable to burnout-induced problems, some estimates suggest that up to 10% of athletes may experience meaningful levels of burnout (Gustafsson et al., 2007; Raedeke & Smith, 2009). In addition to evidence on worrying rates of burnout in athletes, researchers have found that average levels of athlete burnout have increased over the last two decades. Specifically, adopting a cross-temporal meta-analytical framework, Madigan et al. (2022) traced average levels of reported athlete burnout from 1997 to 2019. In this review of 91 studies (N = 21,012), Madigan and colleagues found that mean levels of RSA and SD have increased over time. This evidence is important as it suggests that athletes are now at greater risk of burnout symptoms and more susceptible to its negative consequences. In this regard, the study of athlete burnout is extremely important.

Conceptualising and measuring athlete burnout
The model of athlete burnout proposed by Raedeke and Smith (Raedeke, 1997; Raedeke & Smith, 2001, 2009) was developed based on Maslach and Jackson's (1981) conceptualisation of burnout in care and human service workers. One major distinction made between sport and human service institutions is that athletes do not provide a service to others (Eklund & DeFreese, 2020). This meant that the three symptom dimensions of athlete burnout included a focus on sport participation and sport performance rather than work with others. This is why RSA captures low accomplishment and personal inadequacy in sport performance rather than in one's work with others. Similarly, SD captures cynical attitudes towards sport participation rather than the recipients of one's healthcare or service. Another important distinction made was that sport is a context in which physical demands are an obvious source of psychosocial stress (Eklund & DeFreese, 2020). This is why EXH includes a focus on both physical and emotional exhaustion.
To operationalise athlete burnout, Raedeke and Smith (2001) developed the Athlete Burnout Questionnaire (ABQ). In line with the conceptual model of athlete burnout outlined above, the ABQ contains three separate 5-item subscales that measure RSA ("I am not achieving much in sport"), EXH ("I am exhausted by the mental and physical demands of sport"), and SD ("I'm not into sport like I used to be"). The process of arriving at this 15-item scale was iterative and involved multiple forms of robust evaluation (e.g., statistical tests of factorial validity and feedback from expert panels on content validity, suitability, and readability). While other measures of athlete burnout have since been developed (e.g., Isoard-Gautheur et al., 2018), the ABQ remains the most widely used and continues to be considered the gold standard measure for burnout in athletes (Eklund & DeFreese, 2020).
One reason for the dominance of the ABQ is that it has a strong conceptual grounding in a well-established model of burnout (Maslach & Jackson, 1981). However, just as importantly, the ABQ has typically performed well under psychometric scrutiny. This is evident in research showing support for convergent and discriminant validity, test-retest reliability, and scale internal consistency (Creswell & Eklund, 2006; Raedeke & Smith, 2001, 2009). While this may be the case, assessment of psychometric properties is, of course, an ongoing process and there are two aspects of validity evidence that require further examination: factorial validity (i.e., the extent to which ABQ items measure the intended burnout construct) and measurement invariance (i.e., the extent to which the burnout construct has the same structure and meaning to different athlete groups).

Factorial validity of the ABQ
In the original scale validation study, as well as all subsequent studies examining the factorial validity of the ABQ, researchers have employed confirmatory factor analysis (CFA). This approach allows researchers to examine the relationships between indicators (e.g., ABQ items) and latent constructs (e.g., symptom dimensions of athlete burnout) in a pre-determined factor structure. In adopting this approach, some researchers have found reasonable support for the original three-factor ABQ. This is evident in studies where the CFA model specification for the ABQ provides acceptable fit to ABQ data (e.g., DeFreese & Smith, 2013; Raedeke & Smith, 2001; Ruser et al., 2020). Despite this support, it is important to note that other researchers have often found that the same CFA model specification provides either marginal or suboptimal fit to ABQ data (e.g., Appleby et al., 2022; Barcza-Renner et al., 2016; Casanova et al., 2023). To better understand why this mixed pattern of support exists, it is important to reflect on limitations of the CFA modelling technique.
While CFA is a strong technique with several modelling capabilities, it is not without limitation (Asparouhov & Muthén, 2009). One major limitation is that indicators in CFA are assumed to be pure indicators of the construct they are developed to measure (i.e., no cross-loadings are permitted). The problem with this assumption is that indicators are imperfect and rarely provide a reflection of a single construct (Morin et al., 2020). Most multidimensional measures include indicators that can be expected (logically, theoretically, or empirically) to present construct-relevant associations with more than one factor (Morin, 2023). If these cross-loadings are incorrectly set to zero, the misspecification can result in poor model fit and biased factor correlations (Tóth-Király et al., 2017). This is problematic as researchers may then reject the model under examination in error and call into question whether the constructs being examined are conceptually distinct (Steenkamp & Maydeu-Olivares, 2023).
In terms of the ABQ, it is realistic to expect that some items will present significant, and even reasonably large, cross-loadings. For example, RSA and SD are frequently the two most strongly correlated symptom dimensions of burnout (r = .47 to .74; Raedeke & Smith, 2009). This overlap partly reflects the fact that the two burnout symptoms are attitudinal in nature and characterised by negative feelings (e.g., cynicism and dissatisfaction; Raedeke & Smith, 2009). We therefore expect RSA items to cross-load on the SD factor (and vice versa). Even if these cross-loadings are only small (e.g., λ ≤ .10), ignoring them could undermine model fit and the discriminant validity of factors (Tóth-Király et al., 2017). This is important to acknowledge given that CFA sometimes provides poor model fit for the ABQ (e.g., Appleby et al., 2022; Barcza-Renner et al., 2016; Casanova et al., 2023) and very high factor correlations, especially between RSA and SD (r > .85; Lower-Hoppe et al., 2022). Even though these results may be a function of the overly restrictive CFA model specification, it is easy to assume the evidence reflects issues with the ABQ and that modifications are required (e.g., removal of items; Casanova et al., 2023; Lower-Hoppe et al., 2022).
One technique that overcomes many of the limitations of CFA is exploratory structural equation modelling (ESEM). Unlike the CFA approach, indicators in ESEM are permitted to load on all factors, allowing for complex structure (i.e., cross-loadings). However, in line with CFA, ESEM enables researchers to examine an a-priori factor structure (using target rotation), obtain overall tests of model fit, and examine standard errors for individual parameter estimates. The key benefit of ESEM is that it helps researchers to achieve a more accurate representation of latent factors and factor correlations (Morin et al., 2020). In addition, ESEM helps researchers to better identify potentially problematic indicators. While cross-loadings are to be expected, they should be easy to explain (based on theory, logic, or empirical evidence) and weaker than corresponding target loadings (Morin et al., 2020). This technique is therefore needed to provide a more comprehensive and accurate assessment of the ABQ and identify any items that may need to be revised or removed.
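To make the contrast between the two specifications concrete, the following minimal sketch builds a target loading pattern for a three-factor, 15-item measure. It assumes, purely for illustration, that items are ordered in consecutive blocks of five per factor (which is not the actual ABQ item order), and the function name `target_pattern` is hypothetical. In CFA the zero entries are fixed at zero, whereas in ESEM with target rotation they are only targets and the corresponding cross-loadings are still estimated.

```python
import numpy as np

FACTORS = ["RSA", "EXH", "SD"]  # the three ABQ symptom dimensions


def target_pattern(n_factors: int = 3, items_per_factor: int = 5) -> np.ndarray:
    """Return an items x factors matrix: 1 = target loading, 0 = cross-loading.

    Assumption: items are grouped in consecutive blocks per factor, which is
    an illustrative ordering only.
    """
    block = np.ones((items_per_factor, 1), dtype=int)
    return np.kron(np.eye(n_factors, dtype=int), block)


pattern = target_pattern(len(FACTORS))

# CFA: only target loadings are estimated; cross-loadings are fixed at zero.
n_loadings_cfa = int(pattern.sum())

# ESEM: every item may load on every factor; zeros in the pattern are
# rotation targets ("as close to zero as possible"), not fixed values.
n_loadings_esem = int(pattern.size)

print(n_loadings_cfa, n_loadings_esem)
```

Under these assumptions the CFA specification reports 15 loadings and the ESEM specification reports 45, which is why ESEM can reveal cross-loadings that CFA forces into the factor correlations.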

Measurement invariance of the ABQ
In the field of sport psychology, researchers often assume that measures (and their indicators) behave in the same manner for athletes from different groups. The problem with this assumption is that some measures function differently across groups (Wells, 2021). An illustrative example of such group-based differences is evident in the measurement of depression. Some depression measures include indicators that may have less relevance to depression in men than women (e.g., items measuring frequency of "crying" as a depressive symptom; Kim & Yoon, 2011). This means it is possible that men and women who are equally depressed could obtain different total depression scores when using such measures. This problem can lead to evidence of group differences that are a product of measurement non-invariance as opposed to true group differences in depression. To avoid such issues, it is important that researchers establish evidence that a construct has the same structure and meaning across different groups, and that responses are not confounded by features of the respondents (i.e., evidence of measurement invariance; Putnick & Bornstein, 2016).
The potential for measurement non-invariance to interfere with group comparisons using ABQ data is an area that requires further examination. While numerous tests of measurement invariance have been conducted on translated versions of the ABQ (e.g., Isoard-Gautheur et al., 2010; Liu et al., 2022; Zhang et al., 2016), the psychometric properties of these scales are specific to the context and language in which the scales were examined. In terms of the original (English, 15-item) version of the ABQ, only one study dedicated to examining measurement invariance has been published. Lonsdale et al. (2006) tested the original ABQ and found evidence to support invariance across groups who differed in their method of reporting ABQ scores (online versus paper-and-pencil). More recently, Casanova et al. (2023) tested the ABQ for invariance but examined a shortened version with some slight wording amendments. In doing so, Casanova et al. found evidence to support invariance across groups who differed in their athlete category (athlete versus dancer), class standing (lower-class versus upper-class) and scholarship status (scholarship versus no scholarship). While this evidence is important, we still do not know if the original ABQ is invariant across other commonly examined groups in sport included in most samples.
When it comes to the ABQ, there has been encouragement from researchers to examine measurement invariance across gender and other sport populations (Gustafsson et al., 2007). In addition to gender, other grouping variables of substantive importance are sport type and age. These three variables are important for at least two major reasons. The first reason is that researchers have found evidence of differences in ABQ scores across each of these variables. For example, Dubuc-Charbonneau et al. (2014) found evidence that female athletes reported higher EXH than male athletes, and Cremades and Wiggins (2008) found evidence that individual sport athletes reported higher RSA than team sport athletes. To properly evaluate such results, it is important to identify whether the measurement properties of the ABQ generalise across these groups (Marsh et al., 2016). If they do not, existing evidence on differences in athlete burnout may be invalid.
The second reason that gender, sport type, and age are important to examine is that researchers often collect and study ABQ data using heterogeneous samples (e.g., multiple genders, sport types, and age ranges). This practice relies on the assumption that the underlying factors are measuring the same construct in the same way across these different groups. If we find that the athlete burnout construct (as measured by the ABQ) has a different structure or meaning to different groups, researchers may have to rethink how they design future ABQ research. To deal with non-invariance, it may be necessary to collect more homogeneous samples or statistically control for variables such as age, gender, and sport type in the planned analysis.

The present study
In line with the evidence presented above, there is a need to further examine the factorial validity of the ABQ using ESEM and measurement invariance approaches. To address these requirements, we conducted both single-group and multi-group analyses. In the single-group analyses, we examined the ABQ using both CFA and ESEM techniques. In the multi-group analyses, we examined measurement invariance of the ABQ across meaningful groups defined based on reported gender, sport type, and age. We hypothesised that: (a) ESEM would provide better model fit for the ABQ (than a corresponding CFA model specification); and (b) the ABQ would be invariant across identified groups.

Participants
Three independent ABQ data sets were utilised in the present study. The three data sets have not been used previously in any published research. Data set one consisted of 575 adult athletes from a range of team and individual sports in the UK (M age = 24.81 years, SD = 9.85, age range = 18-59), data set two consisted of 182 adolescent athletes from a range of team and individual sports in the UK (M age = 17.10 years, SD = 0.54, age range = 16-17), and data set three consisted of 157 male footballers from youth and young adult teams in the UK (M age = 15.96 years, SD = 1.20, age range = 13-19). Data collection for each independent data set received institutional ethical approval. In all cases, approved consent procedures were followed, and appropriate permissions were gained prior to inviting athletes to complete the voluntary paper-and-pencil study questionnaire.
In tests of factorial validity and measurement invariance, large sample sizes are required to obtain accurate parameter estimates and achieve adequate power (Hu et al., 2023). One method that researchers often use to obtain an appropriately large sample size for tests of invariance is to combine extant data sets into one large, pooled data set (van Dijk et al., 2022). This practice is common in tests of measurement invariance in sport and exercise psychology (e.g., Grugan et al., 2021; Vlachopoulos, 2008). In adopting this approach, we pooled the ABQ data. In the combined data set of 914 athletes (M age = 21.75 years, SD = 8.79, age range = 13-59), meaningful groups were coded based on their reported gender (n1 = 377 female athletes, n2 = 532 male athletes), sport type (n1 = 344 individual sport athletes, n2 = 570 team sport athletes), and age (n1 = 416 ≤ 18 years, n2 = 498 > 18 years).
We evaluated the appropriateness of this data set in relation to power-related guidelines and considerations in tests of measurement invariance. Based on a simulation study conducted by Sass et al. (2014), an overall sample size of 600 (300 per group) provides adequate power for detecting large non-invariance using stringent cut-off values under WLSMV estimation and various modelling conditions (average rejection rates for ΔCFI, ΔTLI, ΔRMSEA ≥ 80%). In this regard, our combined data set is reasonable for testing measurement invariance across the groups of interest (i.e., gender, sport type, and age).
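As a simple illustration of this sample size check, the sketch below compares the reported group sizes with the 300-per-group figure taken from Sass et al. (2014). The dictionary layout and the helper name `meets_guideline` are hypothetical and are not part of the original analysis.

```python
MIN_PER_GROUP = 300  # per-group size from the Sass et al. (2014) guideline

# Group sizes as reported for the pooled data set (N = 914).
GROUPS = {
    "gender": {"female": 377, "male": 532},
    "sport type": {"individual": 344, "team": 570},
    "age": {"<=18 years": 416, ">18 years": 498},
}


def meets_guideline(sizes: dict, minimum: int = MIN_PER_GROUP) -> bool:
    """True when every group reaches the minimum per-group sample size."""
    return all(n >= minimum for n in sizes.values())


summary = {name: meets_guideline(sizes) for name, sizes in GROUPS.items()}
totals = {name: sum(sizes.values()) for name, sizes in GROUPS.items()}

for name in GROUPS:
    print(f"{name}: total n = {totals[name]}, guideline met = {summary[name]}")
```

Every group exceeds 300 athletes. The totals also show that the two gender groups sum to 909 rather than 914, while the sport type and age groups each sum to 914.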

Measure
Athlete Burnout. The Athlete Burnout Questionnaire (ABQ; Raedeke & Smith, 2001) was used to measure athlete burnout. The ABQ includes three 5-item subscales: RSA, EXH, and SD. All participants were instructed to think about their current sport involvement and rate how often they experienced the feelings identified in each item using a 5-point (1 = almost never to 5 = almost always) Likert scale.

Data analysis
In the present study, we examined both single-group and multi-group measurement models using WLSMV estimation for categorical variables in Mplus 8.1 (Muthén & Muthén, 1998-2017). When compared to ML estimation, WLSMV is slightly less efficient at handling missing data (Asparouhov & Muthén, 2010). However, this was not considered an issue due to the extremely low level of missing data at the item level (<1% for all items). In addition to screening for missing values, we also checked for imputation errors, recoded the two reverse-scored ABQ items (RSA1 and RSA14), and calculated item statistics and scale reliability estimates. To assess scale reliability, we computed McDonald's omega (ω) estimates for each of the three ABQ subscales.
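The two data preparation steps described above can be sketched as follows. This is a minimal illustration and not the Mplus procedure used in the study: reverse scoring assumes the 5-point response format described in the Measure section, and omega is computed with the standard formula for standardised loadings from a one-factor model with uncorrelated uniquenesses. The loadings are hypothetical values chosen only to demonstrate the calculation, and estimates for categorical indicators may be computed differently in practice.

```python
def reverse_score(response: int, low: int = 1, high: int = 5) -> int:
    """Reverse a Likert response, e.g. 1 -> 5 and 4 -> 2 on a 1-5 scale."""
    return low + high - response


def mcdonald_omega(loadings: list) -> float:
    """omega = (sum of loadings)^2 / ((sum of loadings)^2 + sum of uniquenesses).

    Assumes standardised loadings, so each uniqueness is 1 - loading^2.
    """
    common = sum(loadings) ** 2
    unique = sum(1.0 - lam ** 2 for lam in loadings)
    return common / (common + unique)


# Hypothetical standardised loadings for one 5-item subscale (not study data).
hypothetical_loadings = [0.70, 0.70, 0.70, 0.70, 0.70]
omega = mcdonald_omega(hypothetical_loadings)

print(reverse_score(2))   # a response of 2 becomes 4
print(round(omega, 2))    # about 0.83 for these hypothetical loadings
```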
Single-group Measurement Models. In line with previous research, we initially adopted a CFA approach to examine the factorial validity of the ABQ. In this model, indicators were constrained to load on first-order target factors only and all latent factors were specified to covary. However, as highlighted in the introduction, one issue with the CFA specification is that it can be highly restrictive. To address this issue, we also adopted an ESEM approach to examine the factorial validity of the ABQ. In this model, ABQ items were permitted to load on all first-order factors and all latent factors were specified to covary.
In line with previous psychometric research, we used multiple indices to evaluate overall model fit: χ², CFI, TLI, RMSEA, and SRMR. However, as the χ² is oversensitive to sample size and minor model misspecifications, we predominantly focused on the alternative model fit indices (e.g., CFI, TLI, and RMSEA). We considered models meeting the following criteria to reflect at least adequate fit: CFI and TLI > .90, RMSEA < .08 (90% CI < .05 to < .08), and SRMR < .08 (Marsh et al., 2004). When evaluating the standardised factor loadings in each model, we considered the magnitude of the estimates (≥ .30 was considered meaningful), degree of cross-loading (the number of indicators loading meaningfully on more than one factor), and solution interpretability (Morin et al., 2020).
Multi-group Measurement Models. The first step in this process involved exploring the suitability of combining the three independent data sets into one large, pooled data set. This was achieved using the alignment methodology (Asparouhov & Muthén, 2023) to identify the percentage of approximately invariant parameters (i.e., factor loadings and response thresholds) across the groups of interest (in this case, the groups represent the three independent data sets). When approximate invariance holds for >80% of the parameters, the alignment methodology can be used to reliably compare latent means. We therefore used this threshold to identify whether it is reasonable to combine the three independent data sets.
The second step in this process involved testing the ABQ for measurement invariance across important athlete groups. We tested the following sequential measurement invariance models, as outlined by Morin (2023): configural (equality of measurement model, number of factors, indicators, and indicators-to-factors associations); weak (equality of factor loadings across groups); strong (equality of response thresholds); strict (equality of the indicator uniquenesses); latent variance-covariance (equality of the factor variances and covariances); and latent means (equality of factor means). The Mplus syntax for these models was developed using the ESEM invariance syntax generator of De Beer and Morin (2022).
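The sequence above is cumulative, with each model retaining the equality constraints of the previous one. A minimal sketch of that ordering is shown below. The data structure and the helper name `constraints_up_to` are hypothetical and serve only to illustrate the nesting; they are not the generated Mplus syntax.

```python
# Ordered invariance models and the parameters newly constrained at each step.
INVARIANCE_STEPS = [
    ("configural", "same factor structure (no equality constraints)"),
    ("weak", "factor loadings"),
    ("strong", "response thresholds"),
    ("strict", "indicator uniquenesses"),
    ("latent variance-covariance", "factor variances and covariances"),
    ("latent means", "factor means"),
]


def constraints_up_to(step: str) -> list:
    """List every equality constraint imposed by the named model."""
    names = [name for name, _ in INVARIANCE_STEPS]
    idx = names.index(step)  # raises ValueError for an unknown model name
    # The configural model (index 0) imposes no equality constraints.
    return [constraint for _, constraint in INVARIANCE_STEPS[1 : idx + 1]]


# Each model is compared with the one immediately before it.
nested_comparisons = [
    (less[0], more[0])
    for less, more in zip(INVARIANCE_STEPS, INVARIANCE_STEPS[1:])
]

print(constraints_up_to("strict"))
print(len(nested_comparisons))
```

Six models therefore yield five nested comparisons per grouping variable.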
We examined measurement invariance across the coded gender, sport type, and age groups. In each assessment, the first stage involved examining the overall fit of each model. In the second stage, we examined changes between nested models using both the Mplus DIFFTEST function (MDΔχ²) and changes in the following alternative fit indices: CFI, RMSEA, and SRMR. While the chi-square difference test (MDΔχ²) provides a test of exact invariance, changes in the alternative fit indices provide a test of approximate invariance (Millsap, 2005). In line with common recommendations, we relied predominantly on changes in the alternative fit indices and used the following criteria to identify measurement non-invariance: a decrease in CFI of more than .002 (ΔCFI > -.002), an increase in RMSEA of more than .010 (ΔRMSEA > +.010), or an increase in SRMR of more than .010 (ΔSRMR > +.010; Sass et al., 2014). These cut-off values are more stringent than traditional cut-off values used in tests of measurement invariance (e.g., Cheung & Rensvold, 2002) and are more appropriate for the proposed model specification (ESEM) and estimation method (WLSMV).
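The two evaluation stages can be expressed as a short sketch. The function names are hypothetical, and the RMSEA confidence interval criterion is omitted for brevity. One assumption is labelled in the code: a change that equals a cut-off after rounding is flagged, which mirrors how a ΔRMSEA of +.010 is treated as not falling below the cut-off in the results. The ΔRMSEA and ΔSRMR inputs in the first example are placeholders, because only the ΔCFI value is reported for that comparison.

```python
def adequate_fit(cfi: float, tli: float, rmsea: float, srmr: float) -> bool:
    """Stage 1: overall fit against the adequate-fit criteria used here."""
    return cfi > 0.90 and tli > 0.90 and rmsea < 0.08 and srmr < 0.08


def flag_noninvariance(
    d_cfi: float,
    d_rmsea: float,
    d_srmr: float,
    cfi_cut: float = 0.002,
    rmsea_cut: float = 0.010,
    srmr_cut: float = 0.010,
) -> dict:
    """Stage 2: flag changes between nested models that reach a cut-off.

    Changes are (more restrictive model) minus (less restrictive model).
    Assumption: a change equal to the cut-off (to 3 decimals) is flagged.
    """
    return {
        "CFI": round(-d_cfi, 3) >= cfi_cut,      # CFI worsens by decreasing
        "RMSEA": round(d_rmsea, 3) >= rmsea_cut,  # RMSEA worsens by increasing
        "SRMR": round(d_srmr, 3) >= srmr_cut,     # SRMR worsens by increasing
    }


# Reported dCFI for the gender latent means comparison; the other two
# changes are placeholders set to 0.0 because their values are not reported.
gender_means = flag_noninvariance(d_cfi=-0.005, d_rmsea=0.0, d_srmr=0.0)

# Reported dCFI and dRMSEA for the sport type latent means comparison;
# dSRMR is again a placeholder.
sport_means = flag_noninvariance(d_cfi=-0.009, d_rmsea=0.010, d_srmr=0.0)

print(gender_means)
print(sport_means)
```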
We supplemented the traditional tests of measurement invariance outlined above with the alignment methodology. We used this approach to: (a) explore the percentage of approximately invariant parameters across the groups of interest; and (b) compare latent means across groups in cases where evidence of approximate measurement invariance is satisfied. With fixed alignment, the factor means in each reference group (female athletes, individual sport athletes, and adolescent athletes) are fixed to 0. The methodology produces factor means for the non-reference group and identifies whether the estimates are statistically different from the reference group at the 5% significance level.

Scale reliability estimates
The scale reliability estimates are reported in Table 2. For each of the three athlete burnout subscales, estimates for McDonald's omega were all acceptable (ω = .77 to .85).

Multi-group measurement models
Samples. The ABQ was assessed for measurement invariance across the independent data sets using the alignment methodology. The approach identified that 92% of factor loadings and 95% of response thresholds were approximately invariant. This means that it was reasonable to combine the three independent data sets into one large, pooled data set.
Gender. For gender, the six increasingly restrictive models provided good fit. While all the χ² difference tests (MDΔχ²) were significant (meaning exact invariance was not supported in any model comparison), changes in the alternative fit indices were below the specified cut-off values for four (out of five) of the nested model comparisons (meaning approximate invariance was supported in these cases). The evidence provides support for the equality of factor loadings, response thresholds, indicator uniquenesses, and factor variances and covariances. However, the equality of factor means was not fully supported. While the changes in RMSEA and SRMR were below the identified cut-off values, the change in CFI (ΔCFI = -.005) was not. This evidence suggests that there may be differences in latent burnout scores between the male and female groups.
The results from the alignment methodology were consistent with this evidence. We found that 96% of factor loadings and 100% of response thresholds were approximately invariant. The alignment methodology also provided support for differences in factor means. While there were no significant differences in levels of RSA, we found that: (a) levels of EXH were higher for the male group (M = 0.16, p < .05); and (b) levels of SD were lower for the male group (M = -0.32, p < .05).
Sport Type. For sport type, the six increasingly restrictive models provided good fit. While all the χ² difference tests (MDΔχ²) were significant (meaning exact invariance was not supported in any model comparison), changes in the alternative fit indices were below the specified cut-off values for four (out of five) of the nested model comparisons (meaning approximate invariance was supported in these cases). The evidence provides support for the equality of factor loadings, response thresholds, indicator uniquenesses, and factor variances and covariances. However, the equality of factor means was not fully supported. While the change in SRMR was below the identified cut-off value, the changes in CFI (ΔCFI = -.009) and RMSEA (ΔRMSEA = +.010) were not. This evidence suggests that there may be differences in latent burnout scores between the individual and team sport groups.
The results from the alignment methodology were consistent with this evidence. We found that 84% of factor loadings and 98% of response thresholds were approximately invariant. The alignment methodology also provided support for differences in factor means. We found that: (a) levels of RSA were lower for the team sport group (M = -0.28, p < .05); (b) levels of EXH were higher for the team sport group (M = 0.23, p < .05); and (c) levels of SD were lower for the team sport group (M = -0.33, p < .05).
Age. For age, the six increasingly restrictive models provided good fit. While all the χ² difference tests (MDΔχ²) were significant (meaning exact invariance was not supported in any model comparison), changes in the alternative fit indices were below the specified cut-off values for three (out of five) of the nested model comparisons (meaning approximate invariance was supported in these cases). The evidence provides support for the equality of factor loadings, response thresholds, and indicator uniquenesses. However, the equality of factor variances and covariances was not fully supported, nor was the equality of factor means. In the two final nested models, the changes in CFI exceeded the identified cut-off value (ΔCFI = -.004 and -.010, respectively). This evidence suggests that there may be differences in the structural parameters of the ABQ (e.g., variances, covariances, and/or latent means).
The results from the alignment methodology were consistent with this evidence. We found that 84% of factor loadings and 93% of response thresholds were approximately invariant. The alignment methodology also provided support for differences in factor means. While there were no significant differences in levels of RSA, we found that: (a) levels of EXH were lower for the adult group (M = -0.27, p < .05); and (b) levels of SD were higher for the adult group (M = 0.40, p < .05). The model fit statistics for all invariance models are reported in Table 1.
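The alignment percentages reported for the sample, gender, sport type, and age comparisons can be checked against the >80% threshold adopted in this study with a short sketch. The helper name `supports_comparison` is hypothetical, and applying the threshold separately to factor loadings and response thresholds (rather than to all parameters combined) is an assumption made for illustration.

```python
THRESHOLD = 80.0  # percentage of approximately invariant parameters required

# Percentages of approximately invariant parameters as reported above.
REPORTED = {
    "samples": {"factor loadings": 92, "response thresholds": 95},
    "gender": {"factor loadings": 96, "response thresholds": 100},
    "sport type": {"factor loadings": 84, "response thresholds": 98},
    "age": {"factor loadings": 84, "response thresholds": 93},
}


def supports_comparison(percentages: dict, threshold: float = THRESHOLD) -> bool:
    """True when every parameter type exceeds the threshold.

    Assumption: the threshold is applied to each parameter type separately.
    """
    return all(pct > threshold for pct in percentages.values())


results = {name: supports_comparison(pcts) for name, pcts in REPORTED.items()}

for name, ok in results.items():
    print(f"{name}: latent mean comparison supported = {ok}")
```

Under this reading, all four comparisons exceed the threshold, with the factor loadings for sport type and age (84%) closest to it.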

Discussion
The first aim of the present study was to examine the factorial validity of the ABQ using both CFA and ESEM techniques. In line with our first hypothesis, we found that the ESEM model provided superior fit over the more restrictive CFA model. In the ESEM model, while there was clear evidence of three distinct and discernible factors for the three symptom dimensions of the ABQ, some model misspecification was evident based on cross-loading. The second aim was to examine measurement invariance of the ABQ across important groups defined based on their reported gender (female, male), sport type (individual sport, team sport), and age (≤18 years, >18 years). In line with our second hypothesis, we found evidence that supports the invariance of ABQ measurement properties required to make valid latent mean comparisons across these athlete groups.

Single-group measurement models
While some studies have found support for the original ABQ under a CFA model specification (DeFreese & Smith, 2013; Raedeke & Smith, 2001; Ruser et al., 2020), there are studies in which the same model provides either marginal or suboptimal fit to ABQ data (Appleby et al., 2022; Barcza-Renner et al., 2016; Lonsdale et al., 2006). In the present study, when specifying the ABQ under a CFA model specification, we found evidence of poor model fit. In the CFA model, individual factor loadings did not reveal any signs of potential model misspecification. In all cases, ABQ items provided significant and meaningful loadings on their target factor. However, in line with previous research, potential model misspecification was evident in the strong positive factor correlation between RSA and SD (r > .80; see also Lower-Hoppe et al., 2022). Rather than assuming this result reflects poor discriminant validity, it is important to acknowledge the limitations of CFA (i.e., no cross-loadings are permitted). Given that RSA and SD are highly correlated (Raedeke & Smith, 2009), cross-loading items can logically be anticipated. By fixing these cross-loadings to zero, it is unsurprising that CFA often results in poor model fit and very high factor correlations.
To provide a more accurate representation of the latent factors and factor correlations in the ABQ, and learn more about potentially problematic items, it may be important to adopt an ESEM model specification. In doing so, we found that the ESEM model (which permits cross-loadings) provided better fit and lower factor correlations relative to the corresponding CFA model. For example, improved support for the distinction between RSA and SD was provided (r = .66). This pattern of results is consistent with wider psychometric research in sport showing that ESEM outperforms CFA (e.g., Grugan et al., 2021; Hill et al., 2016; Myers et al., 2011). In addition to good model fit, support for the ABQ was evident in the pattern of ESEM factor loadings. That is, in all except one case (item SD15), items provided a meaningful loading on their target factor. The ESEM evidence therefore provides support for the factorial validity of the ABQ in that the empirical evidence (i.e., factor loadings and factor correlations) closely matches the corresponding conceptual model of athlete burnout.
While there was evidence to support the factorial validity of the ABQ, it is important to highlight that a small number of RSA and SD items were flagged due to meaningful cross-loading. In the case of RSA, we found that item RA7 ("I am not performing up to my ability in sport") made a meaningful non-target loading on the SD factor, while item RA13 ("It seems that no matter what I do, I don't perform as well as I should") made a meaningful non-target loading on the EXH factor. These cross-loadings reflect the presence of construct-relevant associations with non-target factors and can be explained based on theory and logic (Morin et al., 2020). For example, in interviewing athletes with elevated ABQ scores, Gustafsson et al. (2008) found evidence that a lack of personal accomplishment is sometimes an immediate precursor to experiences of SD and EXH. In this regard, it makes sense that items capturing persistent yet futile attempts to perform to an expected standard might share a meaningful association with SD and EXH.
The issue of cross-loading was also evident for two SD items. Item SD3 ("The effort I spend in sport would be better off spent doing other things") made a meaningful non-target loading on the EXH factor, while item SD15 ("I have negative feelings in sport") made a meaningful non-target loading on the RSA factor. This evidence of construct-relevant association can also be logically explained. For example, Gustafsson et al. (2008) found that some athletes with elevated ABQ scores referenced feelings of dissatisfaction with their personal performance. This might explain why item SD15, which references "negative feelings", emerged as a better indicator of RSA than SD. In the case of item SD3, it is conceivable that doubts about sporting participation (a key characteristic of SD) and feelings of lethargy and needing a break from sport (key characteristics of EXH) represent somewhat similar experiential states (Creswell & Eklund, 2006). This similarity would explain why this item made a comparably strong loading on these two factors.
The key point to emphasize is that the majority of ABQ items appear to be key target construct indicators. This means that researchers can be confident in using item scores to measure each of the three latent athlete burnout symptoms. However, there are two important caveats to acknowledge. The first caveat is that item SD15 ("I have negative feelings in sport") failed to make a meaningful target factor loading. In reviewing previous research, we found that SD15 has previously been flagged for issues with clarity and been removed from the ABQ following factor analysis (Casanova et al., 2023; Isoard-Gautheur et al., 2010). If issues with this item persist, it may be advisable to replace "negative feelings" with a type of negative feeling more characteristic of SD (e.g., "unenthusiastic", "cynical", or "pessimistic"; Creswell & Eklund, 2006).
The second caveat to consider when evaluating the support provided for the ABQ is that we found a small number of items that do not distinguish clearly enough between the target factor and other symptoms of burnout. In line with previous research (e.g., Casanova et al., 2023), we found that this issue was particularly apparent for items designed to measure RSA and SD (RA7, RA13, and SD3). The two RSA items capture a sense of reduced accomplishment (e.g., "I am not achieving much in sport"). However, these items could be refined to include the troubling sense of inadequacy or self-imposed verdict of failure that defines this symptom (e.g., "It's painful to say, but I'm not performing up to my ability in sport"). The SD item could also be refined by incorporating a deeper sense of contempt towards sport (e.g., "I feel like the time I spend in my sport is being wasted"). These are the types of change that may be required to: (a) ensure that each item is a better predictor of the target construct; and (b) help achieve a clearer distinction between the factors.

Multi-group measurement models
An important aim in the present study was to examine measurement invariance of the ABQ. This type of assessment is important as many researchers are interested in whether ABQ scores differ across groups of athletes (e.g., Cremades & Wiggins, 2008; Dubuc-Charbonneau et al., 2014). An important prerequisite to this type of research is evidence for equality of factor loadings and response thresholds across groups. Only when this level of measurement invariance is supported can researchers confidently conclude that group differences in latent factor means reflect true group differences (Han et al., 2019). Without this evidence, it is impossible to rule out that such differences arise due to measurement non-invariance (Morin et al., 2011). Even though evidence of measurement invariance is clearly essential, many measures we use in sport and exercise psychology have not been properly examined for their measurement invariance across athlete groups (Pacewicz et al., 2022).
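As a brief formal sketch of what this prerequisite involves, suppose each ordinal item response is modelled through a continuous latent response variable. The notation below is introduced purely for illustration and is an assumption, not notation taken from the analyses reported here: g indexes groups, i athletes, j items, and c response categories (with the lowest and highest thresholds fixed at minus and plus infinity).

```latex
% Illustrative notation (assumed): latent response model for ordinal items in group g.
\begin{align}
  y^{*}_{ijg} &= \boldsymbol{\lambda}_{jg}^{\top}\boldsymbol{\eta}_{ig} + \varepsilon_{ijg}, \\
  y_{ijg} &= c \quad \text{if } \tau_{jc,g} < y^{*}_{ijg} \le \tau_{j(c+1),g}, \\
  \text{equal factor loadings:} \quad \boldsymbol{\lambda}_{jg} &= \boldsymbol{\lambda}_{j} \quad \text{for all } g, \\
  \text{equal response thresholds:} \quad \tau_{jc,g} &= \tau_{jc} \quad \text{for all } g .
\end{align}
```

When both sets of equalities hold at least approximately, a group difference in the mean of the latent factors cannot be attributed to items functioning differently across groups.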
To date, the only study examining the original version of the ABQ (English, 15-item) for measurement invariance focussed on methods of data collection (Lonsdale et al., 2006). To build on this study and carry out research called for by burnout researchers (Gustafsson et al., 2007), we examined measurement invariance across reported gender (female, male), sport type (individual sport, team sport), and age (≤18 years, >18 years) groups. In doing so, we found consistent evidence that the three-factor ABQ operates equivalently across the specified groups. This was evident in that: (a) each increasingly restrictive invariance model (configural, weak, strong, and strict) provided good fit to the data; (b) differences in the alternative fit measures between nested models provided evidence for the approximate invariance of factor loadings, response thresholds, and indicator uniquenesses; and (c) the percentage of approximately invariant parameters (i.e., factor loadings and response thresholds) in the ABQ across gender, sport type, and age groups was very high. These findings suggest that we can reliably make ABQ-based comparisons across these groups.
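The nested-model comparison described in point (b) can be sketched as a simple decision rule. This is a minimal illustration only: the fit values are hypothetical placeholders (not the values reported in Table 1), and the cut-offs (a drop in CFI of no more than .010 and a rise in RMSEA of no more than .015) are commonly used guidelines that we assume here for the example.

```python
from dataclasses import dataclass

@dataclass
class Fit:
    name: str
    cfi: float
    rmsea: float

def compare_nested(fits, max_cfi_drop=0.010, max_rmsea_rise=0.015):
    """Compare each model with the less restrictive model that precedes it.

    Returns one tuple per comparison: (model name, delta CFI, delta RMSEA, supported).
    Cut-off defaults are assumed guideline values, not taken from the source.
    """
    results = []
    for prev, curr in zip(fits, fits[1:]):
        # Round to three decimals so floating-point noise does not affect the decision.
        d_cfi = round(curr.cfi - prev.cfi, 3)
        d_rmsea = round(curr.rmsea - prev.rmsea, 3)
        supported = (d_cfi >= -max_cfi_drop) and (d_rmsea <= max_rmsea_rise)
        results.append((curr.name, d_cfi, d_rmsea, supported))
    return results

# Hypothetical fit statistics for the four invariance models (illustrative only).
models = [
    Fit("configural", 0.970, 0.050),
    Fit("weak", 0.968, 0.048),
    Fit("strong", 0.965, 0.049),
    Fit("strict", 0.961, 0.051),
]

comparisons = compare_nested(models)
for name, d_cfi, d_rmsea, supported in comparisons:
    print(f"{name}: dCFI={d_cfi:+.3f}, dRMSEA={d_rmsea:+.3f}, supported={supported}")
```

Each model is compared with the immediately preceding, less restrictive model, so a failure at one step indicates which set of parameters (loadings, thresholds, or uniquenesses) is the likely source of non-invariance.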
In all three tests of measurement invariance (gender, sport type, and age), we found evidence for differences in the latent means of the ABQ factors. For example, in line with previous research, we found that levels of RSA were lower for team sport athletes in comparison to individual sport athletes (e.g., Cremades & Wiggins, 2008), and levels of SD were higher in adult athletes in comparison to adolescent athletes (e.g., Madigan et al., 2022). We also found some differences that were inconsistent with previous research, such as levels of EXH being higher for male athletes than for female athletes (Dubuc-Charbonneau et al., 2014). Given the limited and inconsistent evidence that exists pertaining to differences in athlete burnout across gender, sport type, and age, more research in this area is clearly required. The evidence reported in the present study is important in this regard as it provides evidence of the necessary invariance required to make valid comparisons between latent ABQ factor means when studying these variables.

Limitations and future research
While the present study has several notable strengths, there are important limitations that warrant consideration. The first point is that we only examined first-order ABQ models in our tests of factorial validity and measurement invariance. This approach is consistent with the original ABQ specification (Raedeke & Smith, 2001). However, as researchers often examine burnout as a global construct, it may be important to examine the applicability of a hierarchical ABQ structure using both single and multi-group analyses. An additional point to highlight is that we used one pooled sample of data to examine factorial validity. Researchers may wish to establish whether the support we found (in addition to the potential areas of model misspecification we identified) is stable across multiple independent samples. This is because the stability of model parameters across samples is an important test of model applicability (Fabrigar & Wegener, 2012).
It is also important for researchers to examine the ABQ for measurement invariance in groups beyond those tested in the present study (e.g., elite versus non-elite athletes). One assessment that is required is measurement invariance across measurement occasions. With researchers using the ABQ to examine changes in burnout following intervention (e.g., Langan et al., 2015) or significant life events (e.g., COVID-19; Woods et al., 2022), this research is a priority. When testing for measurement invariance, if researchers find evidence of non-invariance (e.g., differences in fit measures between nested models that exceed specific cut-off values), they may look to estimate the magnitude of the misfit. This can be achieved by computing effect size measures of non-invariance that inform researchers about the degree of non-invariance and its practical importance (Gunn et al., 2020).

Conclusion
The major finding in the present study is that the athlete burnout construct (as measured using the ABQ) is approximately invariant across the reported gender (female, male), sport type (individual sport, team sport), and age (≤18 years, >18 years) groups. Researchers using the ABQ can collect data across such groups and examine potential differences with confidence that the measure is acceptably invariant. We also found evidence that ABQ data is best modelled using an ESEM specification. While we found support for the factorial validity of the ABQ using this technique, it is important to highlight that some potentially problematic items were identified. We have therefore suggested areas of refinement to help solve these issues and improve the alignment between the ABQ and the associated theoretical framework of athlete burnout.

Table 1
Goodness of fit statistics for CFA, ESEM, and ESEM invariance measurement models.

Table 2
Item statistics, scale reliability estimates, and standardized factor loadings for CFA and ESEM models.