Is alexithymia characterised by impaired interoception? Further evidence, the importance of control variables, and the problems with the Heartbeat Counting Task

Interoception, the perception of one's internal state, is commonly quantified using the heartbeat counting task (HCT), which is thought to be a measure of cardiac interoceptive sensitivity (accuracy). Interoceptive sensitivity has been associated with a number of clinical traits and aspects of higher order cognition, including emotion processing and decision-making. It has been proposed that alexithymia (difficulties identifying and describing one's own emotions) is associated with impaired interoceptive sensitivity, but new research questions this association. Problematically, much evidence attesting to the absence of this association has been obtained using the HCT, a measure affected by various physiological and psychological factors. Here, we present novel data (N = 287) examining the relationship between alexithymia and HCT performance, controlling for a number of potential confounds. Inclusion of these control measures reveals the predicted negative relationship between alexithymia and HCT performance. Results are discussed with regard to difficulties quantifying interoceptive sensitivity using the HCT.


Introduction
Interoception is generally defined as the ability to perceive one's internal state. Such a seemingly simple definition hides a great deal of uncertainty as to what constitutes an internal signal. For example, some consider proprioception, or perception of external signals that activate interoceptive pathways such as 'affective touch' (e.g. slow stroking of the forearm), to be interoceptive signals, while others do not (Khalsa & Lapidus, 2016). Further, the degree to which interoceptive signals need to be consciously perceived and/or recognised in order for a process to be described as interoceptive has been debated (see Murphy, Brewer, Catmur, & Bird, 2017, for discussion). The wider nature of interoception is also under debate; Garfinkel, Seth, Barrett, Suzuki and Critchley (2015) have proposed extending the notion of interoception by separating it into a tripartite model, whereby three facets of interoceptive 'ability' exist. Under this model, interoceptive sensitivity refers to one's objective accuracy in perceiving interoceptive states. Interoceptive sensitivity is assessed by comparing the degree to which one's perception of one's internal state aligns with objective measures of that internal state. Interoceptive sensibility, on the other hand, describes subjective beliefs about one's own interoceptive states, including the extent to which one perceives oneself to be a) aware of internal signals, and b) accurate at detecting these internal signals. Finally, interoceptive awareness refers to the degree to which one can accurately assess one's own interoceptive sensitivity (a metacognitive ability). However, other models of interoceptive ability have been proposed, with new approaches advocating a distinction between beliefs (self-report) and objective data concerning a) the ability to perceive the internal state of one's body, and b) the propensity to become aware of, and separately to utilise, interoceptive information.
Several models of higher-order cognition assign a role to interoception, in areas as diverse as emotion processing, learning and decision-making, and the sense of self (e.g., Critchley & Nagai, 2012; Critchley & Harrison, 2013; Dunn et al., 2010; Füstös, Gramann, Herbert, & Pollatos, 2013; Quattrocki & Friston, 2014; Seth, 2013). The study of interoception has also been extended into the clinical domain, because atypical interoception is thought to characterise a number of physical and psychiatric conditions such as eating disorders, anxiety, depression and Autism Spectrum Disorder (see Barrett & Simmons, 2015; Khalsa & Lapidus, 2016; Murphy, Brewer, Catmur, & Bird, 2017). One of the most comprehensive clinically-relevant interoceptive theories was that advanced by Quattrocki and Friston (2014), who suggested an interoceptive deficit was responsible for the symptoms of Autism Spectrum Disorder (henceforth 'autism'). However, the literature on interoception in autism is mixed; while Garfinkel, Tiley et al. (2016) found that adults with autism demonstrated worse performance on the Heartbeat Counting Task (HCT; Dale & Anderson, 1978; Schandry, 1981), a commonly-used measure in which participants are required to count their heartbeats over a specified interval and their count is compared to an objective measure, they were unimpaired on another measure of cardiac interoceptive sensitivity (the heartbeat discrimination task, in which participants are required to judge whether auditory or visual signals are synchronous with their heartbeat). In addition, Schauder, Mash, Bryant, and Cascio (2015) examined HCT performance in autistic children and found them to be unimpaired. In fact, autistic children performed better on the task than typical children over longer durations. Noel, Lytle, Cascio, and Wallace (2018) also found that a small sample of adults with autism performed at a level similar to typical individuals on the HCT.
Brewer, Happé, Cook, and Bird (2015) have argued that the pattern of deficits predicted by Quattrocki and Friston's interoceptive model does not characterise autism, but that some of the deficits instead characterise alexithymia (a sub-clinical condition in which individuals are poor at identifying and describing their emotions, and have an externally-oriented thinking style). The possibility of a link between alexithymia and interoceptive sensitivity was supported by the results of an initial study, which found that increased levels of alexithymia were associated with worse performance on the HCT (Herbert, Herbert, & Pollatos, 2011). As approximately 50-60% of individuals with autism also have alexithymia (e.g., Berthoz & Hill, 2005), it is possible that sampling variance with respect to alexithymia within the autistic population explains the inconsistent findings concerning autism and interoception. It may be the case that when samples of autistic individuals are largely comprised of alexithymic individuals, group level deficits are observed, but when the autism sample has a smaller proportion of alexithymic individuals, the autistic group performs as well as a group of typical individuals. This hypothesis was supported in a study in which the impacts of autistic and alexithymic traits on HCT performance were contrasted. Across two experiments, alexithymia, rather than autism, predicted lower sensitivity to cardiac signals as measured using the HCT (Shah, Hall, Catmur, & Bird, 2016). While the relationship between alexithymia and HCT performance replicated across both experiments, samples were relatively small (N < 50), likely providing an imprecise measure of the true effect size. Indeed, the magnitude of the correlation between alexithymia and performance on the HCT varied considerably across experiments (−0.36 and −0.64).
It is notable, therefore, that the one previous study to examine the relationship between alexithymia and performance on the HCT in a larger (N = 155) non-clinical sample found a correlation of −0.37 (Herbert et al., 2011). More recent indirect studies also support this association in a typical population; Bornemann and Singer (2017) demonstrated that 9 months of meditative training had correlated effects on levels of alexithymia and interoceptive sensitivity, such that the reduction in alexithymia following the meditative training was associated with improvements in interoceptive sensitivity measured using the HCT.
Evidence therefore supports the hypothesis that alexithymia, rather than autism, is associated with poor performance on the HCT and, assuming the HCT is an index of interoceptive sensitivity, that alexithymia is associated with impaired interoceptive sensitivity. However, since these initial studies were published, a small number of papers and conference proceedings have reported a failure to replicate the association between performance on the HCT and alexithymia (e.g. Borhani, Làdavas, Fotopoulou, & Haggard, 2017;Palser, Pellicano, Fotopoulou & Kilner, 2017;Zamariola, Vlemincx, & Luminet, 2018). It is therefore crucial to examine methodological factors that may explain these inconsistent findings, in order to guide both on-going and future studies. This paper therefore presents novel data on the link between alexithymia and performance on the HCT from a larger sample (N = 287) of adult participants, and scrutinises factors that may impact results across studies. In particular, we examine how the inclusion of control variables influences the effect size (and therefore significance) of the relationship between alexithymia and performance on the HCT. Considering how inclusion of appropriate control variables affects the observed results adds to pre-existing concerns regarding the suitability and validity of the HCT as a measure of interoceptive sensitivity. Despite its popularity, the task has received a significant degree of criticism: evidence suggests that the task may be completed using exteroceptive information alone (Khalsa, Rudrauf, Sandesara, Olshansky, & Tranel, 2009), and that the task may index prior knowledge of resting heart rate rather than interoceptive sensitivity (Brener & Ring, 2016). 
Previous studies have also detailed a range of psychological and physiological factors which impact on performance on cardiac-based measures of interoceptive sensitivity and may determine the degree of performance explained by interoceptive and exteroceptive factors, e.g., blood pressure (see O'Brien, Reid, & Jones, 1998), heart rate variability and resting heart rate (Knapp-Kline & Kline, 2005), body mass index (Rouse, Jones, & Jones, 1988), and beliefs and knowledge of resting heart rate (Brener & Ring, 2016; Ring, Brener, Knapp, & Mailloux, 2015; Ring & Brener, 1996; Windmann, Schonecke, Fröhlig, & Maldener, 1999). Further criticism centres on the lack of a suitable control task, and inconsistencies in the implementation of the task across studies. All of these factors are discussed below, and the current results contribute to the debate around the validity and reliability of the HCT as a measure of interoceptive sensitivity.

Participants
299 volunteers took part in this study in exchange for a small honorarium. Participants were recruited via local advertisements and databases of individuals who had expressed an interest in taking part in psychological research. Ethical approval was granted by the local ethics committee. In line with the Declaration of Helsinki, all participants gave informed consent and were fully debriefed upon task completion. 12 participants were removed for extreme scores on control variables (see Analysis Strategy), resulting in 287 valid cases (86 males, M age = 38.07 years, SD age = 21.09 years, range = 18-90 years).

Measures
Data presented here were collected from participants across a period of two years and combined for the purpose of the present analysis. Some participants took part in more than one study using the HCT, therefore duplicate values for participants were removed prior to analyses. During this time period the measures used by our research group have changed. Therefore, for three factors (depression, anxiety, heart rate estimates) the measures utilised differ across participants (detailed below). The method used for the collection of all other variables was the same across all participants.

Alexithymia
Alexithymia was quantified using the Toronto Alexithymia Scale (TAS-20). This measure includes 20 items, rated on a scale from 1 to 5, yielding scores between 20 and 100, with higher scores representative of more severe alexithymic traits. In this sample, total scores ranged from 20 to 82 (M = 45.26, SD = 13.00) with 39 individuals in the sample meeting the cut-off for alexithymia (≥61).

The heartbeat counting task
As is typical during the HCT, participants were asked to silently count their heartbeats over a series of intervals whilst their heartbeat was objectively recorded using a pulse oximeter. Participants were explicitly instructed not to count seconds or guess; if they could not feel their heartbeat at all, they were asked to give a response of zero. Four durations were examined (25, 35, 45, 100 s) with half of participants completing longer durations (28, 38, 48, 103 s). As a control, participants were also asked to complete a time estimation task, in which they were asked to count seconds instead of heartbeats. The durations utilised (e.g., 25 vs. 28) were counterbalanced across the time and heartbeat tasks. Across both tasks, the order of durations was counterbalanced across participants, and half of the participants completed the timing task first, while half completed the HCT first. HCT and time estimation accuracy were estimated on a scale from 0 to 400 using the following equation, where higher scores indicate better cardiac/time estimation accuracy: $\sum \left(1 - \frac{|\text{objective measure} - \text{participant's estimate}|}{\text{objective measure}}\right) \times 100$, with the sum taken over the four intervals. In individuals for whom counterbalancing information was available (N = 271) no order effect (HCT vs time estimation first) was observed for performance on the HCT (t(269) = −0.559, p > .50). Levene's test indicated that the assumption of equal variances was violated for the time estimation task data (F = 11.845, p = .001), therefore a robust method was utilised to analyse these data (see Field & Wilcox, 2017). The Yuen (1974) modified t-test revealed no significant difference in time estimation performance as a function of task order (Mdiff = −10.77 [−28.60, 7.06], Yt = −1.19, p > .20).
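The accuracy score described above can be sketched in a few lines. This is a minimal illustration rather than the authors' analysis code: the function name `counting_accuracy` and the example counts are hypothetical, and it is assumed that one term is summed per interval.

```python
def counting_accuracy(objective_counts, reported_counts):
    """Sum of per-interval accuracy terms, scaled by 100.

    Each term is 1 - |objective - reported| / objective, so four
    perfectly counted intervals give the maximum score of 400.
    """
    if len(objective_counts) != len(reported_counts):
        raise ValueError("need one reported count per interval")
    total = 0.0
    for objective, reported in zip(objective_counts, reported_counts):
        if objective <= 0:
            raise ValueError("objective count must be positive")
        total += 1 - abs(objective - reported) / objective
    return total * 100


# Hypothetical data: recorded vs. reported heartbeats for the 25, 35, 45 and 100 s intervals.
recorded = [30, 42, 54, 120]
reported = [24, 42, 45, 90]
score = counting_accuracy(recorded, reported)
```

Note that a single term becomes negative if a participant reports more than twice the recorded count; the text does not state how such cases were handled, so the sketch leaves them untouched.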

Additional control measures
As performance on the HCT has been found to be influenced by various physiological and psychological factors (e.g., Brener & Ring, 2016; Khalsa, Rudrauf, Sandesara et al., 2009; Knapp-Kline & Kline, 2005; O'Brien et al., 1998; Pollatos, Traut-Mattausch, & Schandry, 2009; Rouse et al., 1988; Wittmann, 2013) a number of control measures were employed. These were available for the majority of participants (see Table 1). Body mass index (BMI), systolic blood pressure and knowledge of the 'typical' resting heart rate were collected post-study for all participants. Depression and anxiety were assessed at the same time as alexithymia with these questionnaires completed in a randomised order.
Systolic blood pressure. Blood pressure was taken using an electronic upper arm monitor (Omron M2) whilst participants were seated. High scores indicate higher systolic blood pressure.

Resting heart rate & heart rate variability. Average resting heart rate was taken as a measure of resting heart rate. For some participants all intervals were included, whereas for others the last 60 s of the longest duration was utilised. The root mean square of successive differences was used as a measure of heart rate variability (HRV). Higher scores indicate higher resting heart rate or increased heart rate variability.
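For readers unfamiliar with the HRV index named above, the following sketch shows how the root mean square of successive differences (RMSSD) and a mean heart rate could be derived. It assumes the recording has already been reduced to a list of inter-beat intervals in milliseconds; that preprocessing step, the function names and the example values are assumptions and are not described in the text.

```python
import math


def rmssd(intervals_ms):
    """Root mean square of successive differences between inter-beat intervals."""
    if len(intervals_ms) < 2:
        raise ValueError("need at least two inter-beat intervals")
    diffs = [b - a for a, b in zip(intervals_ms, intervals_ms[1:])]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


def mean_heart_rate(intervals_ms):
    """Average heart rate in beats per minute from inter-beat intervals in ms."""
    return 60000 / (sum(intervals_ms) / len(intervals_ms))


# Hypothetical inter-beat intervals (ms) from a short resting recording.
ibis = [800, 810, 790, 800]
hrv = rmssd(ibis)
resting_hr = mean_heart_rate(ibis)
```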

Knowledge of average resting heart rate. After the heartbeat counting task participants were asked to estimate the average person's resting heart rate: "how many times do you think the average person's heart beats in 60 s when they are at rest?" (see Murphy, Geary et al., 2017; Murphy et al., 2018). The absolute difference between the participant's estimate and the average resting heart rate reported in large studies of human physiology (72.26 beats per minute; Agelink et al., 2001; Ramaekers, Ector, Aubert, Rubens, & Van de Werf, 1998) was taken as a measure of accuracy. This was favoured over asking participants to estimate their own heart rate to avoid effects of estimation on the HCT and vice versa. High scores on this variable indicate greater deviation between the participant's estimate and average resting heart rate, and therefore greater inaccuracy.
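As a brief illustration of this inaccuracy score, the sketch below takes the absolute deviation of an estimate from the reference value of 72.26 given in the text. The function name and the example estimates are hypothetical.

```python
REFERENCE_RESTING_HR = 72.26  # average resting heart rate cited in the text


def knowledge_inaccuracy(estimate_bpm, reference_bpm=REFERENCE_RESTING_HR):
    """Absolute deviation of an estimate from the reference; larger means less accurate."""
    return abs(estimate_bpm - reference_bpm)


# Hypothetical estimates from two participants.
close_estimate = knowledge_inaccuracy(70)
distant_estimate = knowledge_inaccuracy(120)
```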

Depression. Depressive traits were measured using either the Beck Depression Inventory (BDI; Beck, Ward, Mendelson, Mock, & Erbaugh, 1961; Beck, Steer, & Brown, 1996) or the depression subscale from the Depression, Anxiety and Stress Scale (Lovibond & Lovibond, 1995; Lovibond, 1993). To combine these scores into one variable, scores within the sample reported here were Z-scored and these Z scores were then combined into one variable indexing depressive traits. As such, high scores indicate greater depressive traits.
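One way to read the combination step is that scores are standardised separately within each questionnaire and the resulting Z scores are then pooled. The sketch below follows that reading, which is an assumption, as is the use of the sample (n − 1) standard deviation; the function names, participant identifiers and scores are hypothetical.

```python
from statistics import mean, stdev


def z_scores(values):
    """Standardise values using the sample mean and sample (n - 1) standard deviation."""
    m, s = mean(values), stdev(values)
    if s == 0:
        raise ValueError("cannot standardise a constant variable")
    return [(v - m) / s for v in values]


def combine_instruments(scores_by_instrument):
    """Z-score within each instrument, then pool into one participant -> Z mapping."""
    combined = {}
    for scores in scores_by_instrument.values():
        ids = list(scores)
        for pid, z in zip(ids, z_scores([scores[p] for p in ids])):
            combined[pid] = z
    return combined


# Hypothetical raw scores: each participant completed only one of the two measures.
raw = {
    "BDI": {"p1": 4, "p2": 10, "p3": 16},
    "DASS depression": {"p4": 2, "p5": 6, "p6": 10},
}
depression_z = combine_instruments(raw)
```

Standardising within instrument places both questionnaires on a common scale, at the cost of assuming that the two subsamples have comparable underlying distributions of depressive traits.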
Anxiety. Anxiety was measured using either the anxiety subscale from the Depression, Anxiety and Stress Scale (Lovibond & Lovibond, 1995; Lovibond, 1993) or the trait anxiety subscale from the State-Trait Anxiety Inventory (Spielberger, Gorsuch, & Lushene, 1970). To combine these scores into one variable, scores within the sample reported here were Z-scored and these Z scores were then combined into one variable indexing anxious traits. As such, high scores indicate greater anxiety.
Table 1. Descriptive statistics and correlations between all measured variables. Partial correlations between alexithymia and HCT performance controlling for each variable are also reported. TAS-20 = Alexithymia scores, HCT = heartbeat counting scores, Age = age in years, Gender: 0 = females, 1 = males, Time = scores on the time estimation task, BMI = body mass index, Knowledge = inaccuracy of estimates regarding the average resting heart rate (see text for details), HRV = heart rate variability, Depression = Z-score depression scores, Anxiety = Z-score anxiety scores (see text for details), Systolic BP = Systolic blood pressure, Mean HR = the participant's average resting heart rate. * denotes significant at p < .05, ** denotes significant at p < .01.

Analysis strategy
Initially, the zero-order correlation between alexithymia and score on the HCT is reported (ignoring performance on the time estimation task and without accounting for any physiological or psychological control variables). Zero-order correlations between all variables are also reported, as well as partial correlations between alexithymia and HCT performance controlling for each control variable separately. We then report the results of a series of multiple regressions in which control measures are successively added (see Supplementary Results). These analyses are included for illustrative purposes only and demonstrate how the results change with each added control variable. Whilst directional predictions can be made for all variables, results of two-tailed statistical tests are reported for all analyses. Univariate outliers more than three times the interquartile range were removed. This resulted in exclusion of one outlier on the basis of BMI, four on the basis of HRV, two on their knowledge of the average heart rate, four for extreme depression scores, and one on the basis of systolic blood pressure. The inclusion of these 12 individuals, however, did not alter the pattern observed in the final model reported in Table 2. For each regression, we report the predictor values and number of participants, and the standardised Beta, t value, significance, and partial correlation coefficient for alexithymia in each regression model (see Supplementary Results; Table S1), and the same values for each predictor variable in the full regression model. It is the full, final, regression model that tests the association between alexithymia and performance on the HCT after controlling for all relevant variables (Table 2).
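The outlier rule is stated only briefly, so the sketch below adopts one common reading: a value is excluded if it lies more than three interquartile ranges below the first quartile or above the third quartile. That reading, the quartile method (the default of Python's `statistics.quantiles`), the function names and the BMI values are all assumptions made for illustration.

```python
from statistics import quantiles


def extreme_outlier_bounds(values, k=3.0):
    """Lower and upper fences at Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def drop_extreme_outliers(values, k=3.0):
    """Keep only values that fall inside the fences."""
    low, high = extreme_outlier_bounds(values, k)
    return [v for v in values if low <= v <= high]


# Hypothetical BMI values with one extreme case.
bmi = [19.5, 21.0, 22.4, 23.1, 24.0, 25.2, 26.8, 27.5, 58.0]
kept = drop_extreme_outliers(bmi)
```

Different quartile conventions shift the fences slightly, so borderline cases could be classified differently by other software.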

Results
The zero-order correlation between alexithymia and HCT performance, ignoring performance on the control task and failing to account for any control variable, was not significant (r(285) = −0.079, p = .182). Table 1 presents the descriptive statistics for all measured variables, the zero-order correlations between all variables, and the partial correlation between HCT performance and alexithymia controlling for each variable separately.
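To make the distinction between zero-order and partial correlations concrete, the sketch below computes a first-order partial correlation, that is, the correlation between two variables after removing the linear contribution of a single control variable. It uses the standard textbook formula rather than the authors' software, and the variable names and data are hypothetical.

```python
import math


def pearson_r(x, y):
    """Zero-order Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_r(x, y, z):
    """Correlation between x and y controlling for a single variable z."""
    r_xy, r_xz, r_yz = pearson_r(x, y), pearson_r(x, z), pearson_r(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Hypothetical scores for six participants.
tas20 = [30, 42, 55, 61, 48, 39]
hct = [310, 280, 300, 250, 270, 305]
systolic_bp = [110, 118, 135, 128, 121, 115]

zero_order = pearson_r(tas20, hct)
controlled = partial_r(tas20, hct, systolic_bp)
```

When the control variable is related to both alexithymia and HCT performance, the partial coefficient can be larger or smaller in magnitude than the zero-order coefficient, which is the pattern examined in Table 1.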
The models, and relevant values for alexithymia, for a series of multiple regression models in which alexithymia and an increasing number of control variables are used to predict performance on the HCT are reported in the Supplementary Results (with each predictor entered in the order that maintained maximum statistical power; Table S1). It can be seen that inclusion of the various control variables changes the observed effect size from r(partial) = −0.079 to −0.193. Importantly, however, when the full range of control measures was included (Table 2), alexithymia was a significant predictor of performance on the HCT. The same pattern was observed when missing values were replaced to retain power (see Supplementary Results; Table S3). Moreover, whilst the residuals from these models were normally distributed (see Supplementary Results: Fig. S1), to confirm the final model reported in Table 2 the data were analysed using a robust regression procedure (Field & Wilcox, 2017) implemented in MATLAB (2014) with the default weighting function employed. This analysis confirmed the same pattern of results; alexithymia was a significant predictor of poor performance (p = .021), and accurate knowledge of resting heartrate and male gender were both predictors of good performance (p = .048 and p = .003 respectively).
These results highlight a potential reason for the observed variance in the effect size relating HCT performance to alexithymia across studies: failure to appropriately control for the various non-interoceptive factors that influence performance on the HCT will influence the observed effect size. We now turn to other potential reasons why one may see variance in the observed effect size across studies.

The heartbeat counting task is a poor measure of interoceptive sensitivity
The HCT is commonly used as a measure of interoceptive sensitivity as it is very quick, cheap, and easy to administer, but it is generally recognised as having substantial problems. Approximately 40% of typical individuals report no conscious awareness of their heartbeat at all (Khalsa, Rudrauf, Sandesara et al., 2009), making this task unsuitable for examining interoception at lower ranges of ability. Perhaps most problematic, however, is that heartbeat may be perceived via (exteroceptive) touch receptors due to the vibration of the chest wall (Khalsa, Rudrauf, Sandesara et al., 2009). The extent to which the heartbeat may be perceived exteroceptively depends on factors such as the percentage of body fat (Rouse et al., 1988), systolic blood pressure (O'Brien et al., 1998), resting heart rate, and heart rate variability (Knapp-Kline & Kline, 2005). This is clearly of concern when it comes to comparisons across studies; even if the relationship between interoception and alexithymia is perfectly fixed and unchanging, one may observe large variation in the size of the relationship between performance on the HCT and alexithymia (or any other variable to which HCT performance is related) depending on the particular physical characteristics of the sample tested, and the ratio of interoceptive to exteroceptive information participants were using to perform the task. This is further complicated by the fact that some of these factors may themselves be associated with alexithymia (or autism) (e.g., rates of alexithymia are higher in obese individuals; Pinna et al., 2011). This highlights the need to control for these factors when using the HCT, as failure to do so renders the results extremely hard to interpret. Indeed, in the current data depending on which physical parameters (e.g., BMI, HRV, systolic blood pressure, resting heart rate) are controlled for, the observed r value for the correlation between the HCT and alexithymia varies between −0.079 and −0.167, and in this sample controlling for systolic blood pressure and heart rate variability alone resulted in a significant relationship between alexithymia and HCT performance (Table 1). Whilst the inclusion of all control variables only had a modest influence on the effect size of the relationship between alexithymia and HCT performance, the importance of controlling for these factors may be greater in clinical populations characterised by ill-health (and thus, higher BMI, systolic blood pressure and greater HRV and resting heart rate; e.g., Hert et al., 2011), or at certain stages of development (e.g., knowledge of resting heart rate and beliefs may differ substantially within children or adolescents).
Table 2. Results of the final regression analysis predicting scores on the HCT from alexithymia after the inclusion of all control variables (for each step please see Supplementary Results). After controlling for all variables, a significant relationship between alexithymia and HCT performance was observed. TAS-20 = Alexithymia scores, Age = age in years, Gender: 0 = females, 1 = males, Time = scores on the time estimation task, BMI = body mass index, Knowledge = inaccuracy of estimates regarding the average resting heart rate (see text for details), HRV = heart rate variability, Depression = Z-score depression scores, Anxiety = Z-score anxiety scores (see text for details), Systolic BP = Systolic blood pressure, Mean HR = the participant's average resting heart rate.
Procedural differences in the way the task is administered can also contribute to the discrepant findings across studies.
A time estimation task is often used alongside the HCT to control for nonspecific factors that may influence HCT performance such as motivation, fatigue, etc. (e.g., Ainley, Brass, & Tsakiris, 2014; Murphy, Geary et al., 2017; Shah et al., 2016), and inclusion of this task may be especially crucial for autistic or alexithymic individuals, who may feel anxious or distracted during any experimental task. Failure to include a control task therefore means that any studies reporting the presence or absence of a correlation with the HCT are extremely difficult to interpret: any of these nonspecific factors may artificially inflate the relationship between HCT performance and alexithymia, or mask a real association. It is worth noting, however, that although the time estimation task controls for many nonspecific factors, it does not control for differences in detection thresholds relating to decision parameters. For example, if those with autism or alexithymia require more sensory evidence (regardless of whether this is interoceptive or exteroceptive) than neurotypical individuals in order to decide an event has occurred, this factor would affect the HCT but not the time estimation task. To control for factors such as these, a control task in which exteroceptive stimuli must be counted (matched for detectability with heartbeats at the population level) may be preferable. Several other factors have been shown to significantly impact upon the results obtained using the HCT. For example, one factor relates to the effect of knowledge of one's own, or the average person's, heart rate. Indeed, a body of evidence demonstrates that manipulating participants' beliefs about their own resting heart rate alters heartbeat counting estimates in the HCT (Ring et al., 2015; Ring & Brener, 1996; Windmann et al., 1999).
Likewise, accurate knowledge of average heart rate correlates with improved performance on the HCT (Murphy, Geary et al., 2017; Murphy et al., 2018), as replicated in the current data, and it may do so via at least two routes. The first may be considered interoceptive (depending on one's definition of interoception); knowledge of the frequency of one's heartbeat may allow the heartbeat signal to be distinguished from other interoceptive signals and therefore accurately recognised. The second is not interoceptive and may be evidenced by an interaction with an important procedural factor: whether participants are encouraged to guess if they cannot feel their heartbeat. The particular instructions given to participants in the HCT are rarely reported but an informal survey suggests that participants are often instructed to guess (or 'estimate') if they cannot feel a heartbeat as per early descriptions of the task (Schandry, 1981; see Brener & Ring, 2016). This instruction is not universal however; the same informal survey also found that participants were sometimes (less frequently) instructed to report zero heartbeats if they felt no heartbeats. If participants are instructed to guess (or if they do so regardless of the instruction not to guess) then a sensible strategy is to estimate the duration of the interval over which one is required to count one's heartbeats and use the knowledge of one's resting heart rate (or the average resting heart rate) to arrive at an estimate of the number of heartbeats. It is therefore crucial to control for the accuracy of participants' knowledge concerning their own or the typical resting heart rate, and to (at least) use the time estimation task as a control. It is also important to report the instructions given to participants with respect to guessing. Arguably, it is more appropriate to instruct participants to report zero heartbeats if they can feel zero heartbeats than to ask them to make an arbitrary guess.
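The guessing strategy described above can be illustrated with a short simulation. This is a hypothetical sketch, not an analysis from the present data: it assumes a participant who perceives no heartbeats at all, estimates each interval duration perfectly, and multiplies it by a believed heart rate. All names and values are invented for illustration.

```python
def guessed_count(estimated_duration_s, believed_bpm):
    """Heartbeat 'count' produced by arithmetic alone, with no heartbeat perception."""
    return round(estimated_duration_s * believed_bpm / 60)


def counting_score(objective_counts, reported_counts):
    """Accuracy score on the 0 to 400 scale used for the HCT in the text."""
    return 100 * sum(1 - abs(o - r) / o for o, r in zip(objective_counts, reported_counts))


durations_s = [25, 35, 45, 100]
true_bpm = 68  # hypothetical participant's actual heart rate
recorded = [round(d * true_bpm / 60) for d in durations_s]

# Same strategy, differing only in the believed resting heart rate.
good_knowledge = [guessed_count(d, 72) for d in durations_s]
poor_knowledge = [guessed_count(d, 100) for d in durations_s]

score_good = counting_score(recorded, good_knowledge)
score_poor = counting_score(recorded, poor_knowledge)
```

Under these assumptions the guesser with a near-accurate belief obtains a high score without any interoceptive input, whereas the same strategy with an inaccurate belief scores far lower, which is why knowledge of resting heart rate and time estimation need to be controlled.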
Importantly, these factors may differentially impact upon typical and clinical groups; using a modified version of the HCT, Khalsa et al. (2017) demonstrated that instructions relating to guessing had a significant impact on performance, and that this impact was significantly different in a clinical group (those with Substance Use Disorder) and a typical control group.
Importantly, even if all of the limitations listed above are taken into account and appropriately controlled for, it is still unclear whether the HCT is a measure of interoceptive sensitivity. Due to the lack of work controlling for all potential confounds we cannot currently be sure whether, when administered properly, it is an adequate test of interoceptive sensitivity. It is therefore particularly problematic to relate individual differences in HCT performance to psychological variables or performance on cognitive tasks. Importantly, these difficulties are problematic not only for relating HCT performance to alexithymia, but apply to any individual difference measure. For example, the present data demonstrate that the documented decrease in interoceptive sensitivity with advancing age (Murphy, Geary et al., 2017) is absent when a number of control variables are taken into account. Likewise, well-evidenced relationships between anxiety and interoceptive sensitivity, and depression and interoceptive sensitivity (e.g., Pollatos et al., 2009; see Section 3.3) were not found in the present sample. Thus, whilst the following sections focus on the relationship between alexithymia and HCT performance, it is important to acknowledge that the limitations described above are applicable to all research using the HCT, and so are also likely to impact the debate concerning interoception in autism.
The implications of this relationship for measuring the interoceptive ability (interoceptive sensitivity, sensibility or awareness) of individuals with alexithymia or autism are clear; without accounting for depression and anxiety (either by matching alexithymic/autistic and control groups on levels of depression and anxiety, or through controlling for anxiety and depression statistically) one simply cannot measure the relationship between alexithymia or autism and interoceptive ability. If the group of interest is more depressed than the control group, this may artificially inflate any deleterious effect of the condition of interest on interoception, whereas the opposite will be true if the group of interest is more anxious than the control group. It is therefore very difficult to conclude anything about the relationship between autism or alexithymia and interoception from studies that have not controlled for these factors.

The relationship between Alexithymia and the HCT
Above, we suggested that the relationship between alexithymia and interoceptive sensitivity may be incredibly reliable in actuality, but very unreliable when interoceptive sensitivity is measured using the HCT due to the many problems associated with using the HCT as a measure of interoceptive sensitivity. Of course, it is also true that measurement of alexithymia may be unreliable or invalid. Likewise, it remains a possibility that there may be multiple routes to alexithymia and that sometimes alexithymia may be observed in the absence of interoceptive impairment. Whilst these issues are beyond the scope of discussion of the current results, we have commented on these issues in the Supplementary Discussion.
Beyond problems with the measurement of alexithymia, or multiple types of alexithymia (some being associated with interoceptive impairment and some not), a further explanation for the variability in the strength of the association between alexithymia and performance on the HCT could be that there is no association, and that findings of an association are false positives. If true, it is still unclear what should be concluded. As detailed above, even if all appropriate physiological and psychological factors are controlled for, and an adequate control task employed, it is still not clear that the HCT is actually a measure of interoceptive sensitivity. This is important as relationships between alexithymia and interoception have been found using tests other than the HCT. These tests have measured interoceptive sensitivity in the domains of taste, muscular effort (Van Der Cruijsen, Murphy, Crone & Bird, in prep.), temperature (Borhani et al., 2017) and physiological arousal (Gaigg, Cornell, & Bird, 2018). Furthermore, several studies have reported a relationship between alexithymia and self-reported interoceptive sensibility (Betka et al., 2018; Brewer, Cook, & Bird, 2016; Longarzo et al., 2015; Zamariola et al., 2018) or the objectively-measured propensity to utilise interoceptive information in the respiratory domain (Zhang, Murphy, Bird & Lau, in prep.; adolescent females only). If these relationships are not false positives, and yet the previously observed (and currently observed) relationship between HCT performance and alexithymia is, then the logical conclusion is either that the HCT is not a (good) measure of interoception, or that interoception fractionates, such that performance in some interoceptive domains (and dimensions; e.g., self-report vs objective measures) is predicted by alexithymia, and some not.
The possible fractionation of interoceptive ability is currently a matter of debate; whilst the perception of cardiac signals has received much research attention, with the HCT commonly employed (Dale & Anderson, 1978; Schandry, 1981), in recent years a number of novel measures of interoception have been developed to assess interoceptive ability across multiple domains (e.g., respiratory). In part, these research efforts have been driven by concerns over the validity of cardiac measures of interoception (Khalsa, Rudrauf, Sandesara et al., 2009), and by the need to test the assumption of a unitary interoceptive ability. Whilst some studies support a unitary view, for example moderate correlations have been reported between measures of gastric and cardiac interoception (Herbert, Muth, Pollatos, & Herbert, 2012; Whitehead & Drescher, 1980), others find no relationship across domains (e.g., respiratory and cardiac; Ehlers & Breuer, 1992; Garfinkel, Manassei, & Hamilton-Fletcher, 2016; Pollatos, Herbert, Mai, & Kammer, 2016; Steptoe & Vögele, 1992). Likewise, the HCT is often uncorrelated with self-report measures of interoception (Garfinkel et al., 2015). It is worth noting, however, that even if interoceptive ability does fractionate according to the signal to be perceived and along the dimension measured (e.g., self-report (sensibility) vs. objective accuracy (sensitivity)), it is still the case that performance on the HCT correlates only modestly with the second-most commonly used test of objective interoceptive sensitivity, which also tests the ability to perceive cardiac signals (the Heartbeat Discrimination Task; Whitehead, Drescher, Heiman, & Blackwell, 1977), with reports of small (e.g., Garfinkel et al., 2015) or absent (e.g., Ring & Brener, 2018) relationships between performance on these two tasks.
If the two 'gold-standard' measures of cardiac interoceptive sensitivity correlate only modestly (if at all), then it becomes problematic to conclude anything about cardiac interoceptive sensitivity from either test. Indeed, recent work assessing the test-retest reliability of the HCT suggests that scores at one time point correlate only moderately (r = 0.6) with scores just two months later, i.e., only 36% of the variance in test scores at Time 2 is predicted by scores at Time 1⁴ (Ferentzi, Drew, Tihanyi, & Köteles, 2018). Assuming interoceptive sensitivity does not actually fluctuate to this degree over two months, one may find very different estimates of the true correlation between alexithymia (or autism) and cardiac interoceptive ability due to the unreliability of the HCT.
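The arithmetic behind these two points can be sketched as follows. The first function squares the reported test-retest correlation to give the proportion of shared variance; the second applies Spearman's classical attenuation formula to show how unreliable measures shrink an observed correlation. Treating the test-retest correlation as the reliability of the HCT, and the values assumed for the true correlation and for the reliability of the alexithymia measure, are illustrative assumptions and are not taken from the present data.

```python
import math


def variance_explained(r: float) -> float:
    """Proportion of variance in one measure predicted by another (r squared)."""
    return r ** 2


def attenuated_correlation(true_r: float, rel_x: float, rel_y: float) -> float:
    """Expected observed correlation under Spearman's attenuation formula.

    observed_r = true_r * sqrt(rel_x * rel_y)
    """
    return true_r * math.sqrt(rel_x * rel_y)


# Test-retest correlation of the HCT reported in the text.
hct_retest_r = 0.6
shared_variance = variance_explained(hct_retest_r)

# Hypothetical values for illustration only.
assumed_true_r = -0.35                # assumed true alexithymia-interoception correlation
assumed_alexithymia_reliability = 0.8

# Assumption: the test-retest correlation is used as the HCT's reliability.
expected_observed_r = attenuated_correlation(
    assumed_true_r, hct_retest_r, assumed_alexithymia_reliability
)

print(f"Variance shared between Time 1 and Time 2: {shared_variance:.0%}")
print(f"Expected observed correlation: {expected_observed_r:.2f}")
```

Under these assumptions the expected observed correlation is noticeably weaker than the assumed true value, and sampling error around that attenuated value would add further variability across studies.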

Limitations
Despite the relevance of this evidence for our understanding of the relationship between alexithymia and interoception, it is important to acknowledge certain limitations. First, although a number of control variables were present in the current dataset, no measure of autistic traits was included. Given the relationship between alexithymia and autism (Berthoz & Hill, 2005), it is important that future research assesses the relationship between alexithymia, autism and interoceptive sensitivity using appropriate measures of interoception, coupled with the inclusion of necessary control variables. These considerations are also important for any future research examining the contribution of alexithymia to interoceptive difficulties in other conditions characterised by increased rates of alexithymia and poor interoceptive sensitivity, e.g., eating disorders (Cochrane, Brewerton, Wilson, & Hodges, 1993; Klabunde, Acheson, Boutelle, Matthews, & Kaye, 2013; Pollatos et al., 2008) and schizophrenia (Ardizzi et al., 2016; Henry, Bailey, von Hippel, Rendell, & Lane, 2010). Second, although control variables were present for a number of individuals, data were not available for all participants. Despite these shortcomings, the present data underscore the need for future research to consider the appropriateness of the HCT as a measure of interoceptive sensitivity, and the use of appropriate control variables.

Conclusions
In conclusion, although the HCT is the most commonly used measure of interoceptive sensitivity, results from studies using this task are extremely difficult to interpret, even when those studies control for most, or all, of the factors identified here. Indeed, given the number of limitations of the task, it is unclear why it is used so frequently. Previous concerns over poor correlations with other measures of interoception (self-reported and objective, including other tests purporting to measure cardiac interoceptive sensitivity) and possible exteroceptive compensation strategies may be further exacerbated by inconsistencies in administering the task. Currently, researchers do not provide consistent instructions, utilise the same control tasks, or account for the same set of variables known to have an impact on performance. The current results demonstrate that these factors can substantially influence the relationship between alexithymia and HCT performance, and presumably would also substantially influence the association between autism and HCT performance, or indeed the relationship between HCT performance and any other factor. In particular, knowledge of the average heart rate may have a substantial effect on HCT performance. These results highlight that studies that fail to use an adequate control task, or to account for the full range of relevant psychological and physiological factors, may provide a very inaccurate estimate of relationships with HCT performance.
With respect to the relationship between alexithymia and performance on the HCT, previous findings of a significant relationship were replicated, such that individuals with higher levels of alexithymia performed worse on the HCT. It should be noted, however, that the size of the effect observed in these data was smaller than that observed in previous studies (≅0.20 versus ≅0.35 in previous studies). To our knowledge, the studies failing to replicate this association have not accounted for the complete set of relevant control variables or included an appropriate control task. Given the current results, it is unlikely that these studies can provide an accurate assessment of the relationship between alexithymia and HCT performance.
In order to further our understanding of the relationship between interoception and alexithymia, it is essential that studies of this relationship include a large number of individuals who score above the cut-off for alexithymia, and that they control for co-occurring traits such as depression and anxiety. More broadly, for the study of interoception, it is clearly crucial to reduce the discrepancies in HCT methodology across studies, and to utilise alternative measures of interoception.

Data declaration
A subsample of the HCT data presented in this paper has previously been reported in the following papers; however, the relationship between alexithymia and HCT performance controlling for all possible confounds has not previously been reported.