Alexithymia in eating disorders: Systematic review and meta-analyses of studies using the Toronto Alexithymia Scale

Objective: The aim of this review was to synthesise the literature on the use of the Toronto Alexithymia Scale (TAS) in eating disorder populations and Healthy Controls (HCs) and to compare TAS scores in these groups. Method: Electronic databases were searched systematically for studies using the TAS and meta-analyses were performed to statistically compare scores on the TAS between individuals with eating disorders and HCs. Results: Forty-eight studies using the TAS with both a clinical eating disorder group and HCs were identified. Of these, 44 were included in the meta-analyses, separated into: Anorexia Nervosa; Anorexia Nervosa, Restricting subtype; Anorexia Nervosa, Binge-Purge subtype; Bulimia Nervosa and Binge Eating Disorder. For all groups, there were significant differences with medium or large effect sizes between the clinical group and HCs, with the clinical group scoring significantly higher on the TAS, indicating greater difficulty with identifying and labelling emotions. Conclusion: Across the spectrum of eating disorders, individuals report having difficulties recognising or describing their emotions. Given the self-report design of the TAS, research to develop and evaluate treatments and clinician-administered assessments of alexithymia is warranted.


Introduction
Alexithymia, meaning literally "no words for mood" [1], was first coined in the 1970s to define an inability to describe and/or recognise one's own emotions. Since then, research has focused on both understanding alexithymia and on measuring it in both clinical and general populations. Alexithymia is known to be present in several psychiatric disorders, including depression [2]; Obsessive-Compulsive Disorder [3]; Schizophrenia [4]; Post-Traumatic Stress Disorder [5]; Autism Spectrum Disorder [6] and eating disorders (EDs) [e.g., 7]. While alexithymia is described as a stable personality trait [8], it correlates highly with symptoms of both depression and anxiety and may be a predisposing factor for the development of other psychopathologies [9]. Moreover, alexithymia is thought to underlie emotional difficulties in individuals with eating disorders [10] and has been implicated in both the development and maintenance of EDs [11]. It is also related to poorer treatment outcome, making it a relevant treatment target [12].
Prevalence estimates of alexithymia within the general population, as measured by the twenty-item Toronto Alexithymia Scale [TAS-20; 13], range from 5.2 to 18.8%, with a prevalence of 18% being reported in a British undergraduate sample [14]. In this study, alexithymia was found to be more prevalent in females than in males. Alexithymia is also associated with higher levels of sub-clinical disordered eating in undergraduate females [15], mirroring what has been found in ED populations [16][17][18].
One of the main focuses of alexithymia research has been how to effectively measure the concept. The development of the TAS-20 [13] resulted in increased interest in this field as it provides an efficient way to measure alexithymia, allowing for comparability across clinical groups [9]. The TAS-20 is a brief, self-report measure on which participants rate their level of agreement to statements on a five-point Likert scale, yielding a total score as well as subscale scores designed to measure: difficulty identifying feelings (DIF); difficulty describing feelings (DDF) and externally-oriented thinking (EOT). The maximum possible score on the TAS-20 is 100 with a score of 61 or above indicative of high levels of alexithymia [19]. The TAS-20 demonstrates good reliability and factorial validity [20,21]. Despite the TAS being widely used, normative data for ED populations have not yet been reported. Synthesising studies comparing scores on the TAS in ED groups with Healthy Controls (HCs) would therefore aid comparison between existing and future studies.
The TAS-20 has been criticised for not measuring a universal alexithymia construct but perhaps instead measuring concepts such as social shame [22], negative emotional expressivity [23] or negative affect [24]. The factor structure of the TAS-20 may also vary across samples [25], highlighting the need to use the TAS-20 in combination with other measures of alexithymia [26]. Disagreement exists over exactly which constructs instruments are measuring and there is still a need for reliable, objective measures of alexithymia for use across psychiatric populations. Determining whether individuals with EDs consistently score higher on the TAS than controls would be useful for future research, aiming to develop new ways of measuring alexithymia in this population.
In a critical review of the literature on alexithymia in EDs, Nowakowski, McFarlane [18] report that individuals with EDs consistently report higher levels of alexithymia on the TAS than controls. However, as this review did not include meta-analysis of studies, it is not known whether the effect size is the same across the spectrum of EDs, e.g., in Anorexia Nervosa (AN), Bulimia Nervosa (BN) or Binge Eating Disorder (BED), or whether a particular diagnosis is associated with higher levels of alexithymia. Nowakowski, McFarlane [18] report that individuals with EDs score higher on two of the TAS-20 subscales, DDF and DIF, but not on EOT. Performing meta-analyses of subscale scores will help to synthesise this literature further and to determine whether significant differences exist between groups on all subscale scores.

Aims of the study
This review aimed to synthesise the literature on the use of the Toronto Alexithymia Scale to assess alexithymia across the spectrum of eating disorders and to compare total and subscale scores on the Toronto Alexithymia Scale between eating disorder diagnoses.

Method
The systematic review and meta-analysis was conducted according to the PRISMA statement [27]. The quality of each study was assessed using the Critical Appraisal Skills Programme checklist for case-control studies [28]. The tool consists of 11 questions, which yield a mixture of 'yes', 'no' and more qualitative answers. In this review, extra questions were added to more fully appraise the specific qualities of studies addressing alexithymia in EDs. These included whether confounding variables were accounted for in analysis and whether association between the TAS scores and other psychopathologies was examined. To calculate an overall quality rating, several questions were split into sub-questions and a score of 1 was awarded for every 'yes' answered, with a maximum possible score of 17. The quality rating for each study is shown in Table 1.

Eligibility criteria
Studies using either the TAS-20 or TAS-26 with both a clinical ED population and HCs were included in the review. Inclusion criteria were: 1) full text available in English; 2) reporting mean and standard deviation TAS total scores for both groups; 3) published in a peer-reviewed journal.

Information sources and search
The electronic databases PsycINFO, Scopus, PubMed and Web of Science were searched systematically for papers up to and including May 2017. The search combined a diagnostic term (Anorexia Nervosa, Bulimia Nervosa or ED) with a measure term (alexithymia or Toronto Alexithymia Scale). With the exception of being published in a peer-reviewed journal, no other search limits were applied. The reference list of a previously published review [18] was also screened for relevant studies.

Selection
The titles of papers were screened for relevance and the abstracts of those that appeared to meet the criteria were then screened by both the first and second authors. Full texts were retrieved if the abstract indicated that the inclusion criteria had been met or if the details of the study were ambiguous. The first and second authors discussed all full texts and reached consensus about whether to include them in the review. Any full texts which did not meet the inclusion criteria were excluded. The number of papers reviewed at each stage of the review process, including reasons for exclusion at full-text screening, is displayed in Fig. 1.

Data collection and items
The following data were extracted from each included paper: diagnosis of clinical group; mean age; mean BMI; mean ED duration; how the clinical and HC groups were matched; TAS version; mean TAS scores, including subscale scores if the TAS-20 was used; recruitment site; percentage of female participants; diagnostic tool used and any comorbidities which were assessed.

Risk of bias in individual studies
Risk of bias within each study was assessed by considering how the methodology may impact on the results i.e., how clinical groups were matched to HCs, where participants were recruited from and how ED pathology was assessed.

Summary measure
The principal measure used for meta-analysis was the difference in mean scores and standard deviations on the TAS and any reported subscale scores.

Synthesis of data
For the purposes of meta-analyses, studies were split into different ED diagnoses: AN; AN binge-purge type (AN-BP); AN restricting type (AN-R); BN and BED. Studies which included more than one diagnostic group, e.g., AN and BN as well as HCs, were included in each respective meta-analysis separately. The meta-analyses were performed by pooling the standardised effect sizes using a random-effects model. This model assumes that variation in observed effect sizes arises not only from within-study sampling variability but also from genuine differences between studies. Because the random-effects model incorporates between-study heterogeneity, it yields estimates with wider confidence intervals than fixed-effect models. Individual meta-analyses were also run for each of the TAS-20 subscale scores, again split into different diagnostic groups.
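The pooling step described above can be illustrated with a minimal sketch. It assumes the DerSimonian-Laird estimator of between-study variance, which is one common choice for random-effects models; the function name `pool_random_effects` and the input values in the usage note are hypothetical and are not taken from the included studies.

```python
import math

def pool_random_effects(effects, variances):
    """Pool study effect sizes under a random-effects model.

    Assumption: between-study variance (tau^2) is estimated with the
    DerSimonian-Laird method. `effects` are per-study effect sizes and
    `variances` their sampling variances.
    """
    k = len(effects)
    if k < 2:
        raise ValueError("at least two studies are needed")
    # Fixed-effect (inverse-variance) weights and pooled estimate
    w = [1.0 / v for v in variances]
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    # Cochran's Q: weighted squared deviations from the fixed-effect estimate
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    # Between-study variance, truncated at zero
    tau2 = max(0.0, (q - (k - 1)) / c)
    # Random-effects weights add tau^2 to each study's variance
    w_re = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    return {
        "pooled": pooled,
        "se": se,
        "tau2": tau2,
        "q": q,
        "ci": (pooled - 1.96 * se, pooled + 1.96 * se),
        "z": pooled / se,
    }
```

For example, `pool_random_effects([0.5, 1.5], [0.04, 0.04])` gives a non-zero `tau2`, so the standard error is larger than the fixed-effect standard error for the same two hypothetical studies. This is why random-effects confidence intervals are wider.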

Statistical analysis
All meta-analyses were conducted using Review Manager 5.3 [29]. Cohen's d [30] was used to estimate effect sizes using the following interpretation: small (0.2), medium (0.5) and large (0.8). Positive effect sizes indicate that the clinical group scored higher on the TAS than the control group. A p value of < 0.05 indicates a significant difference between the clinical group and HCs. To assess the potential impact of moderator variables on the results of the meta-analysis, meta-regression was performed using Stata 13 [31] with the following user-contributed command: metareg [32].
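As an illustration of the effect size used here, the sketch below computes Cohen's d from group means, standard deviations and sample sizes, together with a standard large-sample approximation of its variance. The function names and the "negligible" label for values below 0.2 are assumptions made for this example; some software additionally applies a small-sample correction (Hedges' g), which is omitted.

```python
import math

def cohens_d(mean_clin, sd_clin, n_clin, mean_hc, sd_hc, n_hc):
    """Return (d, variance of d) for clinical group versus healthy controls.

    A positive d means the clinical group scored higher.
    """
    # Pooled standard deviation across the two groups
    pooled_sd = math.sqrt(
        ((n_clin - 1) * sd_clin ** 2 + (n_hc - 1) * sd_hc ** 2)
        / (n_clin + n_hc - 2)
    )
    d = (mean_clin - mean_hc) / pooled_sd
    # Large-sample approximation of the sampling variance of d
    var_d = (n_clin + n_hc) / (n_clin * n_hc) + d ** 2 / (2 * (n_clin + n_hc))
    return d, var_d

def interpret(d):
    """Label |d| using the 0.2 / 0.5 / 0.8 thresholds given in the text."""
    size = abs(d)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "medium"
    if size >= 0.2:
        return "small"
    return "negligible"  # label assumed for this sketch
```

With hypothetical inputs, `cohens_d(60, 10, 30, 50, 10, 30)` returns a d of 1.0, which `interpret` labels as large.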

Risk of bias across studies
Publication bias was assessed visually by inspection of funnel plots, which plot each study's precision (1/standard error) against its effect size. The absence of studies in the bottom left corner (low precision and small effect sizes) of a funnel plot is usually taken as an indication of publication bias. The Duval and Tweedie [33] non-parametric 'trim and fill' method was also used, which accounts for publication bias in meta-analysis as implemented in Stata's user-written command 'metatrim' [34]. Effect sizes following adjustment for publication bias using the trim and fill method are reported. It was noted that the effect size of one study [35] was extremely large, representing an outlier in the AN meta-analysis. The original data were checked and had been reported incorrectly in the paper (standard deviations of the AN and HC groups were incorrect) and have therefore been updated here accordingly.
Between-study heterogeneity was measured using I² [36], which is based on Cochran's Q test: I² = 100% × (Q - df)/Q, where df is the degrees of freedom. I² ranges between 0%, indicating no heterogeneity, and 100%, indicating high heterogeneity, with the following approximate interpretation: 0 to 40% might not be important; 30 to 60% may represent moderate heterogeneity; 50 to 90% may represent substantial heterogeneity and 75 to 100% represents considerable heterogeneity [37].
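The I² formula above translates directly into code. The sketch below is illustrative only: the function name is hypothetical, and negative values (which occur when Q is smaller than its degrees of freedom) are set to zero, as is conventional.

```python
def i_squared(q, df):
    """Return I² as a percentage from Cochran's Q and its degrees of freedom.

    Implements I² = 100% x (Q - df) / Q, truncated at 0.
    """
    if q <= 0:
        return 0.0
    return max(0.0, 100.0 * (q - df) / q)
```

For a hypothetical meta-analysis of 11 studies (df = 10) with Q = 50, `i_squared(50, 10)` evaluates to 80, which falls in the range described above as substantial to considerable heterogeneity.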

Additional analysis
To examine the potential predictors of between-study heterogeneity, meta-regressions were used to assess whether age or BMI measures could explain some of the variance. Mean age of the clinical and control groups and the mean age difference between the clinical and control groups were used as predictor variables. To assess BMI, BMI of the clinical group and the BMI difference between the clinical and control groups were used. For each domain, three models were assessed. For age, these were: mean age alone; age difference alone; and mean age and age difference together. For BMI, these were: mean clinical BMI alone; BMI difference alone; and mean clinical BMI and BMI difference together.
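To make the idea of a moderator analysis concrete, the sketch below fits a single-moderator meta-regression by weighted least squares. It is a simplified illustration, not the Stata routine used in this review: the function name is hypothetical, and the between-study variance `tau2` is passed in as a known value, whereas dedicated software estimates it from the data.

```python
import math

def meta_regression(effects, variances, moderator, tau2=0.0):
    """Regress study effect sizes on one moderator (e.g., mean age).

    Weights are 1 / (sampling variance + tau2). Returns the slope,
    the intercept and the standard error of the slope.
    """
    w = [1.0 / (v + tau2) for v in variances]
    sw = sum(w)
    # Weighted means of the moderator and of the effect sizes
    x_bar = sum(wi * xi for wi, xi in zip(w, moderator)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, moderator))
    if sxx == 0:
        raise ValueError("moderator has no variation across studies")
    sxy = sum(
        wi * (xi - x_bar) * (yi - y_bar)
        for wi, xi, yi in zip(w, moderator, effects)
    )
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    se_slope = math.sqrt(1.0 / sxx)
    return slope, intercept, se_slope
```

A positive slope would mean that studies with higher moderator values (for instance, older clinical groups) tend to report larger effect sizes. A two-moderator model, such as mean age and age difference together, would need a matrix-based fit and is not shown.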

Study selection
A total of 48 studies were identified through systematic review of the literature. Of these studies, three [24,38,39] included a mixed ED sample, i.e., patients with either AN, BN or BED, and one study [40] included only patients who had recovered from AN. These subgroups of studies were too small to be included in meta-analysis, leaving 44 studies to be included in further analysis. Rommel, Nandrino [41] included an AN-R and a mixed AN-BP/BN group with purging symptoms; therefore, only the AN-R group was included in the corresponding meta-analysis as the mixed group could not be compared to either AN-BP or BN groups in other studies.

Study characteristics
All extracted information included in the systematic review and meta-analyses is presented in Table 1. Generally, the quality of reporting within individual studies was high. All studies included the mean age of participants, aside from Speranza, Corcos [42], who did not report the mean age of HCs. Twelve studies did not report the mean BMI or percentage of ideal body weight (%IBW) for at least one participant group. Half of the studies identified (N = 22) did not report the duration of illness in the clinical ED group. Sixteen studies did not describe how HCs were matched to the clinical group, while one study reported the groups being matched by "sociodemographic characteristics" without being more specific [43]. The most common characteristic on which groups were matched was age (N = 29) and 24 studies matched groups on at least two characteristics. Twenty-six studies included at least one group with a sample size of < 30.
Out of a maximum score of 17 on the quality appraisal, three studies scored 16 [44][45][46]. The study with the lowest overall quality rating was Deborde, Berthoz [39], which did not include any information on how participants were recruited, inclusion/exclusion criteria, any confounding factors which may have influenced the results, the relationship between TAS scores and other variables or the precision or generalisability of the results. All other studies scored between 11 and 15.
Eleven studies used the TAS-26 while the remaining 37 studies used the 20-item version. Of the studies using the TAS-20, 20 reported scores for the three subscales while two [46,47] included subscale scores for DIF and DDF but not EOT, due to poor internal consistency of this factor within the study. Seven studies did not report where at least one of the groups (clinical or HC) was recruited from and 14 studies did not provide details of how EDs were diagnosed. Co-morbid mental health problems were assessed in all but nine of the included studies, with symptoms of depression and anxiety being the most commonly assessed. Except for Marchesi, Ossola [24], whose ED and HC samples were 92.3% and 80.8% female, respectively, and Aloi, Rania [48], whose BED and HC groups were 45% and 81.4% female, respectively, all studies included only female participants. Tchanturia, Davies [49] did not explicitly report the sex of their participants.
Twenty-two studies attempted to control for potential confounding variables within analysis, 19 of which controlled for depression. After controlling for depression, the difference in TAS scores between the clinical group and HCs remained significant in eight studies. In the other studies, the difference was either no longer significant or was only significant on certain subscales of the TAS or for subgroups of participants. Thirty-four studies also examined correlations between TAS scores and other variables, including depression, anxiety, BMI and illness duration, of which 15 studies reported a significant positive correlation between TAS scores and depression in their respective clinical groups. More information on the results of the quality appraisal is presented in the appendix.

Risk of bias
The funnel plots for AN, AN-R, AN-BP, BN and BED studies are shown in Figs. 2-6. In all five study groups, there was some evidence of publication bias as there was a small asymmetrical appearance of the funnel plots with a gap in the left bottom corner of the graph, indicating that smaller effect sized studies with less precision may be missing. The trim and fill method indicated missing studies in all five groups. The re-

Synthesis of results
The forest plot of studies including participants with AN is displayed in Fig. 7. The random-effects analysis with a total sample size of 2332 participants (AN = 944, HC = 1388) from 22 studies revealed a significant difference with a large effect size between AN and HC groups on the total TAS score (d = 1.44 (95% CI 1.2, 1.68), z = 12.01, p < 0.001). For studies in which AN subtype was defined, the forest plots are displayed in Fig. 8. For AN-R studies, the analysis with a total sample size of 1159 participants (AN-R = 441, HC = 718) from 12 studies revealed a significant difference between groups, again with a large effect size (d = 1.18 (95% CI 0.90, 1.46), z = 8.22, p < 0.001). For AN-BP studies, the analysis with a total sample size of 720 participants (AN-BP = 177, HC = 543) from six studies revealed a significant difference with a large effect size (d = 1.25 (95% CI 0.79, 1.72), z = 5.33, p < 0.001). The forest plot of BN studies is displayed in Fig. 9. The analysis included a total of 2391 participants (BN = 858, HC = 1533) from 21 studies and revealed a significant difference with a large effect size between the two groups (d = 1.26 (95% CI 1.02, 1.51), z = 10.07, p < 0.001). The forest plot for BED studies is displayed in Fig. 10. The analysis included a total of 488 participants (BED = 192, HC = 296) from five studies and revealed a significant difference with a medium effect size between the two groups (d = 0.76 (95% CI 0.31, 1.21), z = 3.32, p < 0.001).
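The relationship between a pooled estimate, its 95% confidence interval and the z statistic can be checked with a short calculation. The sketch below, with hypothetical function names, recovers the standard error from the width of a reported interval, assuming a normal approximation with a critical value of 1.96, and divides the estimate by it.

```python
def se_from_ci(lower, upper, z_crit=1.96):
    """Recover the standard error from a symmetric confidence interval."""
    return (upper - lower) / (2 * z_crit)

def z_from_ci(estimate, lower, upper, z_crit=1.96):
    """Approximate the z statistic as estimate divided by its standard error."""
    return estimate / se_from_ci(lower, upper, z_crit)

# Pooled estimates and 95% CIs as reported in the text above
reported = {
    "AN": (1.44, 1.2, 1.68),
    "AN-R": (1.18, 0.90, 1.46),
    "AN-BP": (1.25, 0.79, 1.72),
    "BN": (1.26, 1.02, 1.51),
    "BED": (0.76, 0.31, 1.21),
}
approx_z = {group: z_from_ci(*values) for group, values in reported.items()}
```

Because the published estimates and interval limits are rounded to two decimal places, the values in `approx_z` are close to, but not identical with, the reported z statistics.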

TAS-20 subscale analysis
The forest plots for TAS-20 subscale analysis are shown in Figs. 11-13. For AN studies, the random-effects analysis included a total sample size of 1642 (AN = 635, HC = 1007) from 13 studies for the DIF and DDF subscales. For the DIF subscale, there was a significant difference between groups with a large effect size (d = 1.57 (95% CI 1.33, 1.80), z = 13.19, p < 0.001). For the DDF subscale, there was also a significant difference between groups with a large effect size (d = 1.11 (95% CI 0.93, 1.29), z = 12.14, p < 0.001). For the EOT subscale, the analysis included 1509 participants (AN = 582, HC = 927) from 12 studies and showed a significant difference between groups with a small effect size (d = 0.48 (95% CI 0.23, 0.74), z = 3.73, p < 0.001).
For AN-R studies, the meta-analysis of the DIF and DDF subscales

Additional analysis
There was evidence of substantial heterogeneity in the overall TAS score analysis for all diagnostic groups (AN, I² = 81%; AN-R, I² = 74%; AN-BP, I² = 80%; BN, I² = 82%; BED, I² = 78%). To examine the potential cause of this heterogeneity, meta-regression was performed with BMI of the clinical group, differences in BMI between clinical and control group, mean age and age difference between clinical and control group as moderator variables. Meta-regressions with BMI of the clinical groups or the difference in BMI between clinical and control groups as moderators revealed that only in the AN group was there a significant positive effect of BMI difference on effect size (b = 0.21 (95% CI 0.03,

Discussion
The aim of this review was to synthesise the literature on alexithymia in EDs using the TAS, a widely used self-report measure. A total of 48 studies were identified through systematic review, 44 of which were included in meta-analyses examining group differences in total TAS scores. Twenty-two AN studies, 12 AN-R studies, six AN-BP studies, 21 BN studies and five BED studies were included in the meta-analyses. There were significant differences between all diagnostic groups and HCs on the total TAS score with large effect sizes, with the exception of BED, where the effect size was medium. This indicates that individuals across the spectrum of EDs have more difficulties with identifying and describing their emotions than individuals without EDs. When the individual subscale scores of the TAS-20 were analysed, individuals with AN scored significantly higher on all subscales than HCs. However, in studies where the AN subtype was defined as AN-R and in BN studies, the clinical group only scored higher on the Difficulty Identifying Feelings and Difficulty Describing Feelings subscales. Due to lack of data, it was not possible to examine subscale scores in individuals with AN-BP or BED.
There was evidence of publication bias in all diagnostic groups. This may have been accounted for by the exclusion of any studies that did not report both the mean and standard deviations of TAS scores. Further attempts to contact the authors of such papers may therefore have been beneficial. In addition, methodological bias may have impacted on the results. For example, a large proportion of studies had a relatively small sample size, with very few accounting for this with a power calculation. When adjusting for publication bias, the effect sizes remained significant for all groups except BED, for which the significance disappeared. Further studies examining alexithymia in individuals with BED are therefore needed to confirm whether they have real difficulties with identifying and labelling emotions.
There were high levels of heterogeneity (74-82%) across studies. To examine potential reasons for this heterogeneity, BMI of the clinical group, differences in BMI between groups, age and age differences between groups were entered as moderator variables into meta-regression. The BMI of the clinical groups did not influence the effect size for any of the ED diagnoses. In the AN analysis, there was a significant effect of BMI difference between the clinical and HC group on effect size. There was also a significant effect of mean age of the clinical groups on outcome in AN and BN studies, with older age of the clinical group being associated with a larger effect size.
Fig. 9. Forest plot of mean TAS score: standardized mean effect size for differences (SMD) between Bulimia Nervosa (BN) and Healthy Controls. CI, confidence interval.
Fig. 10. Forest plot of mean TAS score: standardized mean effect size for differences (SMD) between Binge Eating Disorder (BED) and Healthy Controls. CI, confidence interval.
The acute phase of AN is associated with reduced facial expression of emotions compared with recovered patients [85], suggesting that starvation may impact on alexithymia scores. As older patients may be expected to have longer illness durations than younger patients, this could explain why age was associated with a larger effect size. It is also possible that older patients experience more co-morbidities, including Obsessive-Compulsive Disorder and depression, which are also associated with alexithymia [2]. In the general population, alexithymia has also been associated with increasing age [86]. Other reasons for the large heterogeneity between studies include factors such as the level of comorbid psychiatric conditions, such as anxiety or depression, which could not be controlled for in meta-regression. There is also some evidence that treatment outcome, or indeed the treatment which patients receive, may be associated with levels of alexithymia in ED patients [87] and thus the type of treatment patients in each of the studies were receiving may have also accounted for some of the heterogeneity observed.
Despite the large number of cross-sectional studies examining alexithymia in EDs, longitudinal studies examining changes in TAS scores over time are lacking. One study [88] examined the predictive value of alexithymia over three years in patients with EDs. Using the TAS-20, the DIF factor was a significant predictor of treatment outcome, independent of both depression and eating disorder severity. In addition, there was significant improvement in alexithymia scores, along with improvement in clinical severity and depression, suggesting that alexithymia may not be stable in individuals with EDs. The one study identified in this review which included recovered patients [40] found no significant difference on TAS-20 scores between the clinical and HC groups. This suggests that starvation, or the acute phase of AN, may impact on alexithymia. However, as illness duration was not included in meta-regression, it is not possible to draw conclusions about its impact on alexithymia in individuals with EDs.
Interestingly, while BMI of the clinical groups did not influence the effect size, in AN studies, the difference in BMI between the two groups did have a significant positive effect on effect size, with a greater difference in BMI being associated with a larger effect size. This suggests that BMI may be associated with alexithymia in AN although after controlling for this, the main effect was still significant. Three studies included in the AN meta-analysis [47,50,53] controlled for BMI in their analysis and found that group differences remained significant. There is also a possibility that this is a chance result which would disappear after controlling for multiple testing. One previous study [89] found that lower BMI was associated with decreased difficulties with emotion regulation in women with acute AN. While emotion regulation is a separate construct to alexithymia, with the former referring to the ability to respond appropriately to situations with a range of emotions, one might expect the two to be linked. Further research may therefore be warranted to explore the impact of age, BMI and illness duration on both alexithymia and emotion regulation.
Fig. 11. Forest plot of mean TAS subscale scores: standardized mean effect size for differences (SMD) between Anorexia Nervosa (AN) and Healthy Controls (HC). CI, confidence interval.
Around half of the studies attempted to control for the effect of potentially confounding variables within the analysis of group differences in TAS scores. Of the 19 studies which controlled for depression, the differences in TAS scores between clinical and HC groups remained significant in only eight. Of the eleven studies whose results became non-significant after controlling for depression, eight were conducted with individuals with AN (including AN-R and AN-BP). However, six studies [24,43,44,45,54,67] also controlled for anxiety within the analysis, so it is not possible to determine whether group differences were influenced by depression, anxiety or both. Given the previous literature suggesting a link between alexithymia and depression [e.g., 9], research aiming to elucidate the relationship between the two constructs, i.e., with inclusion of a clinical control group with depression, would be beneficial.
These meta-analyses indicate that difficulties with identifying and describing emotions are transdiagnostic across the spectrum of EDs. This is consistent with a previous systematic review of alexithymia [18] and extends previous research by demonstrating that the differences between clinical groups and HCs are of the same magnitude, i.e., large effect sizes, across ED diagnoses. Nowakowski, McFarlane [18] found that individuals with EDs score higher on the DIF and DDF subscales of the TAS-20 but not on the EOT subscale. In this meta-analysis, individuals with AN were found to score significantly higher on all subscales, including the EOT, whereas in AN-R studies and BN studies there was no difference between groups on the EOT subscale. This suggests that difficulties with EOT may be diagnosis-specific, although further research to confirm this is warranted. Cronbach's alpha has been shown to be lower for EOT than for the other two factors and both DDF and DIF have low correlations with EOT, possibly due to low internal consistency of this factor [90]. This may also explain the inconsistent findings across EDs on the EOT subscale in these meta-analyses.
Compared with other ED diagnoses, only five studies using the TAS were identified in individuals with BED and it was not possible to conduct subscale analysis on these studies, due to only two of them [48,70] including subscale scores. Future research is therefore needed to determine whether difficulties across the TAS subscales are present in BED. In addition, studies examining the presence of alexithymia in individuals who have recovered from an ED will help delineate the relationship between alexithymia and ED psychopathology.
Despite individuals with EDs scoring significantly higher than HCs, it is still not clear whether the TAS is measuring a universal construct of alexithymia or whether it is instead measuring other traits such as negative affect, emotional expressivity or social shame [22][23][24]. Differences in effect size on the TAS-20 subscale scores indicate that the TAS may be measuring several different constructs and suggest that the proposed three-factor model may not be suitable for use with ED populations [25]. Self-report measures such as the TAS may not be reliable in that the very nature of alexithymia may make it difficult for individuals to reflect on their emotions, thus giving inaccurate reports. For this reason, the development of tools which accurately assess the nature of difficulties with emotion recognition using more objective measures would be beneficial. This would ensure that co-morbidities such as anxiety or depression can be adequately controlled for in the assessment of alexithymia and would allow for meaningful comparisons between clinical groups.
The Observer Alexithymia Scale [OAS; 91], an informant-report measure, was developed as an alternative way of measuring alexithymia and can be completed by either relatives or clinicians. In a study examining the use of the OAS in an eating disorder population, the measure showed acceptable validity and inter-rater reliability and the OAS was recommended for use alongside the TAS-20 in both research and clinical practice [92]. The psychometric properties of the OAS have been tested across a range of psychiatric disorders and it was found to be psychometrically sound for evaluating observer ratings of alexithymia [93]. Despite this, correlation between the TAS-20 and OAS is reportedly weak [94] and the authors do not recommend its use in clinical settings. Thus, disagreement exists over exactly which constructs the OAS and TAS are measuring. Another informant measure, the Toronto Structured Interview for Alexithymia [TSIA; 95], allows for multi-modal assessment of alexithymia and was designed as a tool for clinicians to elicit information about the extent of a patient's difficulties with DIF, DDF, EOT and fantasy and imaginal processes. When used with females with AN and their parents, there was significant discordance between the two measures, with suggestion that the TSIA may be more sensitive for detecting alexithymia than the self-report TAS-20 [96]. Further studies using the TSIA or other clinician-led assessments across psychiatric populations would therefore help determine whether such tools are measuring independent, related or homogeneous constructs.
The findings from the current meta-analysis add to the wider literature on socio-emotional difficulties in EDs. For example, a review [97] used a multidimensional framework to map out emotion regulation difficulties in AN and BN. The model [98] outlines four dimensions theorised to contribute to the development or maintenance of psychopathology: use of adaptive and situationally appropriate emotion regulation strategies; impulse inhibition and behavioural control when distressed; emotional awareness, clarity, and acceptance; and emotional approach and tolerance. There is evidence to suggest that those with AN and BN have difficulties across all four dimensions; however, most relevant here are difficulties with emotional awareness and acceptance, which include several constructs that overlap with alexithymia. For example, using the Levels of Emotional Awareness Scale [LEAS; 99], which asks individuals to report how they and another person would feel in various scenarios, several studies have found impairments in emotional awareness of the self and others in both EDs [44,100]. Studies using experimental paradigms such as the LEAS while controlling for levels of alexithymia would help elucidate the relationships between different socio-emotional constructs in people with EDs.
Relatedly, emotional theory of mind, the ability to infer the emotional states of others, is also reported to be impaired in AN and BN [101,102], but may improve with recovery [100]. Facial emotion recognition also appears to be impaired in AN (but not BN), although results vary somewhat for different emotions [103,104]. Finally, there is evidence from both self-report and performance-based measures for greater emotional suppression and non-acceptance in AN and BN compared to controls [105][106][107]. Interestingly, this emotional suppression also appears to be reflected in facial emotion expressivity, with individuals with AN showing significantly fewer positive emotions than HCs, and those with BN showing an intermediate profile [108]. Given the difficulties in recognising one's own emotions, it is perhaps not surprising that those with EDs show widespread problems in decoding the emotional states of others.
Given the difficulties that individuals with EDs have in identifying and describing emotions, clinical intervention has recently shifted focus towards addressing these issues. Certain treatment protocols, including Cognitive Remediation and Emotion Skills Training (CREST), have attempted to address these difficulties in both individual and group formats [109,110], with preliminary findings suggesting that such treatment leads to improvements in patients' ability to label emotions and a reduction in social anhedonia. The development of treatments such as CREST is in its infancy and thus further studies assessing their efficacy in reducing alexithymia are needed. Other treatment modalities such as the Maudsley Model of Anorexia Nervosa Treatment in Adults (MANTRA; Schmidt et al., 2015) and Radically Open Dialectical Behavioural Therapy (RO-DBT; Lynch et al., 2013) also include a focus on emotional difficulties. Another relatively new treatment for EDs, namely Emotion Acceptance Behaviour Therapy [EABT; 111], aims to combine standard behavioural therapy with strategies to increase emotional awareness and has been associated with decreased emotion avoidance at follow-up. Measuring alexithymia before and after treatment will help establish the effectiveness of these approaches in improving emotion recognition.

Limitations
The studies varied greatly in terms of the mean age of participants, mean BMI, mean illness duration of the clinical group, matching criteria to HCs, recruitment sites, diagnostic tools and co-morbidities assessed. This heterogeneity made direct comparison between studies difficult. Only six studies [46,50,52,80,112,113] fully reported all data extracted for the purpose of this review. The variety of sociodemographic characteristics, recruitment sites and diagnostic tools may have accounted for some of the between-study heterogeneity found within the analysis, although this could not be accounted for within the analysis. Analysis of risk of bias across studies indicated that studies with smaller effect sizes and less precision may be missing. It is possible that a small number of studies were not identified through the systematic search because they were not published in English or their full texts were unavailable.
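As an illustration of how between-study heterogeneity of the kind described above is commonly quantified, the following minimal Python sketch computes an inverse-variance pooled estimate, Cochran's Q and the I² statistic. It is not the procedure used in this review: the function name `heterogeneity` and the three effect sizes and variances are hypothetical values chosen only to make the arithmetic easy to follow.

```python
def heterogeneity(effects, variances):
    """Return (fixed-effect pooled estimate, Cochran's Q, I^2 in percent).

    effects   -- per-study standardised mean differences
    variances -- per-study sampling variances of those effects
    """
    # Inverse-variance weights: more precise studies count for more.
    weights = [1.0 / v for v in variances]
    pooled = sum(w * g for w, g in zip(weights, effects)) / sum(weights)

    # Cochran's Q: weighted squared deviations from the pooled estimate.
    q = sum(w * (g - pooled) ** 2 for w, g in zip(weights, effects))
    df = len(effects) - 1

    # I^2: share of total variation attributed to between-study differences,
    # truncated at zero when Q is smaller than its degrees of freedom.
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return pooled, q, i2


# Hypothetical inputs, NOT data from the studies included in this review.
effects = [0.5, 1.0, 1.5]
variances = [0.04, 0.04, 0.04]

pooled, q, i2 = heterogeneity(effects, variances)
print(f"pooled = {pooled:.2f}, Q = {q:.1f}, I^2 = {i2:.0f}%")
```

With these made-up inputs the pooled estimate is 1.0, Q is 12.5 on 2 degrees of freedom and I² is 84%, a value that would conventionally be read as substantial heterogeneity.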

Conclusion
Alexithymia, particularly difficulties with identifying and describing emotions, is transdiagnostic across the ED spectrum. Our systematic review of the literature, focusing on the TAS, demonstrated that individuals with AN, BN and BED score consistently higher on the TAS than HCs. Despite this, current instruments which measure alexithymia may be influenced by co-morbid symptoms such as depression or anxiety and may not be measuring a homogeneous construct. Recognising and managing emotions are viable treatment targets. Future research should focus on improving the measurement of this construct and on the development of effective clinical interventions to address difficulties with emotion recognition.

Role of funding sources
This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

Contributors
The first and fourth authors conceived and designed the study. The first and second authors conducted the systematic review and data extraction. The first and third authors conducted the data analysis. The first author prepared the first draft of the manuscript. All authors contributed to and approved the final manuscript.

Conflict of interest
There are no known conflicts of interest associated with this publication and there has been no significant financial support for this work that could have influenced its outcome. We confirm that the manuscript has been read and approved by all named authors and that there are no other persons who satisfied the criteria for authorship but are not listed.
We further confirm that the order of authors listed in the manuscript has been approved by all of us. We confirm that we have given due consideration to the protection of intellectual property associated with this work and that there are no impediments to publication, including the timing of publication, with respect to intellectual property. In so doing we confirm that we have followed the regulations of our institutions concerning intellectual property.
We understand that the Corresponding Author is the sole contact for the Editorial process (including Editorial Manager and direct communications with the office). He/she is responsible for communicating with the other authors about progress, submissions of revisions and final approval of proofs. We confirm that we have provided a current, correct email address which is accessible by the Corresponding Author and which has been configured to accept email from.