A multidimensional and multi-feature framework for cardiac interoception

Interoception (the sensing of inner-body signals) is a multi-faceted construct with major relevance for basic and clinical neuroscience research. However, the neurocognitive signatures of this domain (cutting across behavioral, electrophysiological, and fMRI connectivity levels) are rarely reported in convergent or systematic fashion. Additionally, various controversies in the field might reflect the caveats of standard interoceptive accuracy (IA) indexes, mainly based on heartbeat detection (HBD) tasks. Here we profit from a novel IA index (md) to provide a convergent multidimensional and multi-feature approach to cardiac interoception. We found that outcomes from our IA-md index are associated with -and predicted by- canonical markers of interoception, including the hd-EEG-derived heart-evoked potential (HEP), fMRI functional connectivity within interoceptive hubs (insular, somatosensory, and frontal networks), and socio-emotional skills. Importantly, these associations proved more robust than those involving current IA indexes. Furthermore, this pattern of results persisted when taking into consideration confounding variables (gender, age, years of education, and executive functioning). This work has relevant theoretical and clinical implications concerning the characterization of cardiac interoception and its assessment in heterogeneous samples, such as those composed of neuropsychiatric patients.

Mainstream interoceptive tasks require subjects to track their cardiac bumps through silent counting (e.g., Schandry, 1981) or motor tapping (e.g., Canales-Johnson et al., 2015;McFarland, 1975). In this approach, IA is typically calculated as the difference between perceived and actual heartbeats (i.e., Schandry's index). Despite its simplicity, this index has been severely criticized (Brener and Ring, 2016;Zamariola et al., 2018;Murphy et al., 2018a;Khalsa et al., 2009a) mainly because responses may be guided by an estimation of the average heart rate rather than the actual tracking of relevant signals (Windmann et al., 1999;Murphy et al., 2018b;Ring et al., 2015). Furthermore, this index is biased by the total number of responses, such that a higher number of tracked heartbeats leads to a higher IA even if body signals are not actually perceived. Indeed, people with high IA do not show a corresponding high correlation between responses and actual heartbeats, which suggests that they over-report heartbeat perception (Zamariola et al., 2018).
Motor-tracking HBD tasks can yield a more robust IA index based on Signal Detection Theory (SDT) (Macmillan and Creelman, 2004;Killeen, 2015;Gonzalez Campo et al., 2019) -i.e., d' index. This framework allows estimating the subject's sensitivity and specificity in discriminating signal (heartbeats) from noise, penalizing correct responses made by chance. Nevertheless, this method also faces major limitations. In particular, it requires a definition of a window time-locked to the heartbeat to consider a response as correct ('hit') or incorrect ('false alarm'), but heartbeat perception hardly occurs in the same timespan for all individuals (Brener and Ring, 2016).
More importantly, the approaches above share an additional and critical shortcoming: they are blind to the effect of heart rate changes on behavioral responses during the task. Indeed, heart rate modulates heartbeat counting (Zamariola et al., 2018) and detection (Knapp-Kline and Kline, 2005). As explained above, Schandry's index is based on a single number comparing the subject's total perceived and actual heartbeats. For its part, the d' index weighs correct and incorrect motor responses to heartbeats depending on their occurrence in a fixed time-window that remains constant throughout the task. Thus, they both fail to account for on-the-fly behavioral adjustments to heart rate fluctuations, potentially produced by changes in respiration (Novak et al., 1993), temperature (Shin, 2016), or arousal or stress levels (Taelman et al., 2009). Those indexes, then, are suboptimal to determine whether subjects are following their hearts' rhythm or other sensations .
Furthermore, heartbeat perception may also be affected by potential confounding variables, such as demographic (i.e., gender, age, years of education) or domain-general cognitive factors (e.g., executive functioning), which typically modulate results in any task. In fact, some studies have reported higher IA in men than women (Montoya et al., 1993;Grabauskaite et al., 2017), but others have found no evidence for gender-based differences (Pollatos and Schandry, 2004;Khalsa et al., 2009b). Additionally, although aging seems to have a detrimental effect on IA (Khalsa et al., 2009b), the lack of longitudinal data precludes excluding sample-or task-specific confounds (Murphy et al., 2017). In any case, most available research has not accounted for these potentially relevant factors.
In this context, we recently developed a new IA index, called 'mean distance' (md) (de la Fuente et al., 2019), that captures the oscillatory coupling between subjects' responses and cardiac frequency during motor-tracking HBD tasks (Salamone et al., 2018;Yoris et al., 2017aYoris et al., , 2017b. This metric presents important advantages. First, md is mostly uncontaminated by the subjects' beliefs about their average heart rate since it compares motor responses and heartbeat frequencies in multiple overlapping time-windows rather than a single time-span. Second, md is unaffected by the total number of responses because subjects who tap repeatedly do not obtain higher IA unless their response frequency is close to their cardiac frequency. Third, md does not rely on arbitrary time windows to consider a response as correct or incorrect, as it assesses heartbeat frequency rather than individual heartbeats. Finally, unlike all previous IA procedures, md captures dynamic behavioral adjustments driven by cardiac frequency changes. Using this new index, we developed a multidimensional and multifeature approach to robustly characterize cardiac interoception ( Fig. 1A and B). We assessed a large sample of 114 healthy subjects with a validated HBD task (Salamone et al., 2018;Yoris et al., 2017aYoris et al., , 2017b, and tested the association of our md index with canonical neurocognitive markers of interoception, including the heart-evoked potential (HEP) -here derived from high-density electroencephalography (hd-EEG) (Salamone et al., 2018;Canales-Johnson et al., 2015;Couto et al., 2014;Montoya et al., 1993;Pollatos and Schandry, 2004;Yoris et al., 2017a;Garcia-Cordero et al., 2017;Fukushima et al., 2011;Pollatos et al., 2005Pollatos et al., , 2016-and functional connectivity signatures from resting-state functional magnetic resonance imaging (fMRI) (Salamone et al., 2018;Yoris et al., 2017a). Also, given the intimate links between interoception and socio-emotional skills (Critchley and Garfinkel, 2017;Ondobaka et al., 2017;Wiens, 2005;Adolfi et al., 2017;Craig, 2002;Critchley et al., 2004;Zaki et al., 2012), we tested the association of our md index and emotion recognition tasks. Then, for comparison, we repeated all analyses with the two mainstream indexes described above: a modified version of Schandry's index (mSI) (Schandry, 1981) (Supplementary Material 1.1), and a d' score based on SDT (Macmillan and Creelman, 2004;Killeen, 2015;Gonzalez Campo et al., 2019) (Supplementary Material 1.2). Finally, to explore whether the combination of ongoing brain measures (HEP), resting-state interoceptive brain network correlates, and behavioral data (emotion recognition scores) predicts each IA index, we applied a data-driven multivariate computational analysis. Thereupon, we explored whether ensuing predictions were affected when adding potential confounding variables (i.e., gender, age, years of education, and executive functioning) (Fig. 1C), which is critical to evaluate interoception in heterogeneous populations. Based on previous findings, we expected to find significant associations between IA-md and canonical neurocognitive markers of interoception (i.e., HEP, fMRI networks, emotion recognition). Furthermore, we hypothesized that these associations would be stronger for md than standard IA indexes (mSI and d'). Finally, we expected to find null associations between interoceptive markers and exteroceptive accuracy (EA) -the control condition of the motor-tracking HBD task-, which would support the construct validity of IA.

Participants
The study comprised 114 volunteers (59 female; 5.5% left-handed) between 17 and 84 years old (M ¼ 40.81, SD ¼ 20.54). They had a mean of 14.64 years of education (SD ¼ 3.95) and declared no history of psychiatric or neurological conditions, substance abuse disorder or heart diseases. Furthermore, they underwent a standard clinical examination comprising neurological, neuropsychiatric, and neuropsychological assessments by expert professionals -Supplementary Material 2.1. The INECO Frontal Screening (IFS) battery (Torralva et al., 2009), a brief tool to evaluate executive functioning, revealed preserved scores across the sample (N ¼ 108, M ¼ 25.05, SD ¼ 2.82). The IFS assesses three executive functions: response inhibition and set shifting, abstraction capacity, and working memory. Total IFS scores range from 0 to 30 (with higher scores representing better executive functioning) (Torralva et al., 2009) -more details about this test are provided in Supplementary Material 2.2. The discrepancy between the entire sample size (N ¼ 114) and the subsample with IFS scores (N ¼ 108) reflects missing data. All participants signed an informed consent in accordance with the Declaration of Helsinki. The study was approved by the Ethics Committee of the host institution.

Interoceptive performance: heartbeat detection task
We assessed cardiac interoception through a validated HBD task (Garcia-Cordero et al., 2016Salamone et al., 2018;Yoris et al., 2015Yoris et al., , 2017aYoris et al., , 2017bSedeno et al., 2014;de la Fuente et al., 2019;Gonzalez Campo et al., 2019;Couto et al., 2014;Melloni et al., 2013) -available online at http://bit.ly/2EpfGrq. The task comprises two conditions (Salamone et al., 2018;Yoris et al., 2017aYoris et al., , 2017b. The exteroceptive condition provides a control measure assessing the subjects' capacity to attend to external stimuli -i.e., EA. Participants were binaurally presented with an audio of a recorded heartbeat (digitally constructed from an actual electrocardiogram record of a researcher), which they had to follow by pressing a key with their dominant hand. They were given the following instructions: "In this part of the test, you will hear the beating of a heart recorded from another person. You Participants performed a heartbeat detection (HBD) task in which they were instructed to tap a key following their own heartbeats while electrocardiographic (ECG) and high-density electroencephalographic (hd-EEG) signals were recorded. This was done twice (two 2-min blocks). Then, a subsample of participants underwent a resting-state functional magnetic resonance imaging (fMRI) session and a socioemotional skills assessment involving emotion recognition tasks (Ekman-35 and TASIT-EET). B. md calculation. During the HBD task, tapping responses and ECG signals were recorded and logged as marks in time. To calculate IA-md, blocks were subdivided in overlapping 10-s windows starting at each individual heartbeat. The absolute difference between cardiac frequency and response frequency (md) was computed for each individual window and averaged over all windows comprising both blocks. C. Multivariate analysis. Four heart-evoked potential (HEP) modulation metrics from EEG recordings, 16 functional connectivity metrics from fMRI registers, and three emotion recognition scores from the socio-emotional skills assessment were introduced as selected features in a linear regression model to test their power in predicting IA-md score as well as two other indexes for comparison: a modified version of Schandry's index (mSI) and a d' score. The regression was then repeated including four demographic and executive functioning features ('demographics'). For both analyses, data were split in 50-50 train and test partition and optimized over a validation set bootstrapped from train partition. We assessed the coefficient of determination (R 2 ) between the target and the predicted value for data in test partition. must follow every heartbeat by tapping the "z" key on the laptop keyboard. Do not try to anticipate your responses by guessing the recorded heart rhythm; instead, tap as fast as you can after each beat you hear". This condition comprised two blocks lasting 2 min each. In the first block, recorded heartbeats were presented at a constant and regular frequency (60 bpm), while in the second block, recorded heartbeats were manipulated to have the same overall frequency (60 bpm) but at irregular intervals. Both blocks of the exteroceptive condition were always presented in the same order, before moving on to the interoceptive condition.
The interoceptive condition provides an objective measure of the subjects' ability to track their own heartbeats (i.e., IA) (Garfinkel et al., 2015). Participants were asked to tap a key with their dominant hand following their own heartbeats. They were instructed not to use any external cues, as stated in the instructions: "Now, you must follow the beating of your own heart by tapping the "z" key for every beat you feel. You should not guide your responses by checking your arterial pulse in your wrists or neck. If you are unable to feel these sensations, you should appeal to your intuition trying to respond whenever you think your heart is beating". The interoceptive condition also included two blocks of 2 min each, with identical instructions.
While subjects performed the HBD task, we recorded the electrocardiographic signals to register the heartbeats alongside motor responses over time. We also obtained hd-EEG recordings to analyze HEP modulations during the task, as detailed in Section 2.3.2.
To estimate the subjects' accuracy across each condition, we calculated the md index (de la Fuente et al., 2019), which is based on the comparison between the frequencies of heartbeats and motor responses (Fig. 1B). First, for each condition, we subdivided each block in overlapping windows starting at each individual heartbeat and extending for 10 s. Then, for each window, we computed the absolute difference (md) between cardiac frequency (measured as 1/mean R-R) and response frequency (1/mean inter response intervals). This process is represented in the following equation: where fc is the average cardiac frequency in a window of w duration centered at time i, fr is the average response frequency in the same window and time, and N is the number of heartbeats in the block. In addition, to control for possible periods during which subjects may have lost concentration, a coefficient of variation (CV) was estimated to assess the regularity of the motor responses inside each individual 10-s window (de la Fuente et al., 2019). To compute the CV, we calculated the ratio of the standard deviation to the mean (SD/X) of the participant's time-intervals between motor responses. The CV estimate was used for thresholding. Windows with CV > 0.5 were not used in the estimation of md because they would fall above the expected values to reflect delivered signal detection (de la Fuente et al., 2019;Werner and Mountcastle, 1963).
Finally, the absolute difference between cardiac and response frequencies was averaged across all windows comprising each block of each condition. More specifically, the averaged md of the windows that make up blocks one and two resulted in the EA index, while the averaged md of the windows that make up blocks three and four resulted in the IA index. Since md is a distance index, its minimum score is 0, indicating a perfect match between motor responses and cardiac frequencies, with higher scores indicating higher distances, and thus, worse performance.
We also followed canonical procedures to compute other IA indexes for comparison: a modified version of Schandry's index (mSI) (Schandry, 1981), and a d' score calculated by means of the SDT (Macmillan and Creelman, 2004;Killeen, 2015;Gonzalez Campo et al., 2019). These are described in Supplementary Material 1.1 and 1.2, respectively.

Signal acquisition and preprocessing
For all participants (N ¼ 114), we recorded hd-EEG signals during the HBD task using a Biosemi Active-two 128-channel system at 1024 Hz. To acquire electrocardiographic data, two external Ag/Ag-Cl adhesive electrodes placed in lead-II were included as references. Data were bandpass filtered during recording (0.1-100 Hz) and offline (0.5-30 Hz) in order to remove undesired frequency components. The signal was rereferenced offline to averaged mastoids. Ocular movement artifacts were removed through independent components analysis and visual inspection, as done in previous works (Garcia-Cordero et al., 2016Salamone et al., 2018).
To analyze the HEP, we implemented a PeakFinder function on Matlab (Kruczyk et al., 2013) to detect the R-wave-electrocardiographic values, allowing to segment continuous EEG data (Garcia-Cordero et al., 2016Salamone et al., 2018;de la Fuente et al., 2019;Couto et al., 2014Couto et al., , 2015Yoris et al., 2017aYoris et al., , 2017b. Epochs were segmented from 300 ms prior to the onset of the R-wave onset to 500 ms after, and baseline-corrected relative to a À300 to À200 ms time window. Noisy epochs were rejected using an automated procedure, which excludes data points as artifacts if the probability of the epoch exceeds a threshold of 2.5 SDs from the mean probability distribution calculated from all trials or by measuring the kurtosis of probability distribution (de la Fuente et al., 2019;Zich et al., 2015) and visual inspection.
To explore the association of IA indexes (md, mSI and d') and HEP modulations in selected ROIs, we performed non-parametric correlation tests (Spearman's rho). Results were considered significant using a statistical threshold of p < 0.05. In order to show the specificity of the IA construct, analyses were repeated to test the expected null association between EA indexes (md, mSI and d') and HEP modulation.

fMRI data
As in previous works (Salamone et al., 2018;Yoris et al., 2017a), we explored the association between the IA indexes (md, mSI and d') and the patterns of fMRI co-activation of key interoceptive regions, namely the insula, the postcentral cortex, and the anterior cingulate cortex (ACC), which are proposed to subserve interoceptive processing (Adolfi et al., 2017;Critchley et al., 2004;Schulz, 2016). We also tested the expected null associations among functional connectivity and EA indexes (md, mSI and d').

Image acquisition and preprocessing
The fMRI acquisition protocol and the description of preprocessing steps are reported in accordance with the practical guide from the Organization for Human Brain Mapping (Nichols et al., 2017;Poldrack et al., 2017). We obtained 10-min resting-state fMRI recordings from a subsample of 72 participants (see Supplementary Table 1 for demographics and executive functioning information about this subsample,  and Supplementary Table 2 for overlap between subsamples). Images were acquired in a 1.5 T Phillips Intera scanner with a standard head coil (8 channels). We acquired functional spin echo volumes in a sequentially ascending order, parallel to the anterior-posterior commissures, covering the whole brain. The following parameters were used: TR ¼ 2777 ms; TE ¼ 50 ms; flip angle ¼ 908; 33 slices, matrix dimension ¼ 64 x 64; voxel size in plane ¼ 3.6 mm Â 3.6 mm; slice thickness ¼ 4 mm; number of volumes ¼ 209. Participants were instructed to lying still, keep their eyes closed, avoid falling asleep, and not to think about anything in particular.
Before preprocessing, we discarded the first five volumes of each subject's resting-state recording to ensure that magnetization achieved a steady state. Images were then preprocessed using the Data Processing Assistant for Resting-State fMRI (DPARSF V2.3) (Chao-Gan and Yu-Feng, 2010), an open-access toolbox that generates automatic pipeline for fMRI analysis. DPARFS works by calling the Statistical Parametric Mapping (SPM 12) and the Resting-State fMRI Data Analysis Toolkit (REST V.1.7). As in previous studies (Salamone et al., 2018;Yoris et al., 2017a), preprocessing steps included slice-timing correction (using middle slice of each volume as the reference scan) and realignment to the first scan of the session to correct head movement (SPM functions). We regressed out six motion parameters, CFS, and WM signals to reduce the effect of motion and physiological artifacts such as cardiac and respiration effects (REST V1.7 toolbox). Motion parameters were estimated during realignment, and CFS and WM masks were derived from the tissue segmentation of each subject's T1 scan in native space with SPM12 (after co-registration of each subject's structural image with the functional image). Then, images were normalized to the MNI space using the echo-planar imaging (EPI) template from SPM (Ashburner and Friston, 1999), smoothed using a 8-mm full-width-at-half-maximum isotropic Gaussian kernel (SPM functions), and bandpass filtered between 0.01 and 0.08 Hz. None of the participants showed movements greater than 3 mm (M ¼ 0.1, SD ¼ 0.06) and/or rotations higher than 3 (M ¼ 0.08, SD ¼ 0.07).

Seed analysis
To explore the association between IA indexes (md, mSI and d') and the functional connectivity of interoceptive hubs, we selected a-priori six spherical 5-mm seeds based on MNI space: left insula (x ¼ À40, y ¼ 10, z ¼ 0) (Schulz, 2016) (Critchley et al., 2004), left postcentral cortex (x ¼ À58, y ¼ À14, z ¼ 24) (Adolfi et al., 2017), and right postcentral cortex (x ¼ 56, y ¼ À24, z ¼ 36) (Adolfi et al., 2017) -see Fig. 2B. For each participant, we extracted the temporal course of the BOLD signal of the voxels comprising each seed region and correlated these data with the temporal course of the BOLD signal of every voxel of the rest of the brain (Pearson's correlation coefficient; DPARSF toolbox). Then, we performed a Fisher z-transformation. The resulting connectivity maps for each seed were used to perform multiple regression analyses in SPM 12, including IA score as the regressor of interest and age as a nuisance covariate. To further account for aging effects in fMRI results (e.g., Varangis et al., 2019), the main analysis (i.e., the association between IA-md and the functional connectivity of the seeds) was also performed in the subsample of subjects < 55 years old (N ¼ 46), with a mean age of 29.26 (SD ¼ 13.43, range ¼ 17-54).
In order to show the specificity of the IA construct, analyses were repeated to test the expected null associations between EA indexes (md, mSI and d') and the functional connectivity within interoceptive hubs.
2.5. Socio-emotional tasks 2.5.1. Facial emotion recognition task (Ekman-35) A subsample of 50 participants completed this task ( Supplementary  Tables 1 and 2), which consists in identifying basic facial emotional expressions in static pictures from the Ekman series (Bertoux et al., 2014). Stimuli were displayed on a computer screen, and participants were given the following instructions: "I will present you with various faces, one by one, expressing one of the following emotions: happiness, surprise, sadness, fear, disgust, or anger. You have to tell me which emotion is expressed by each face. You may respond "neutral" when no emotion can be identified. This is not a speed test, but try not to dwell on your answer for too long". The seven possible response options were written at the bottom of the screen in each trial. Stimuli remained static until the participant gave a verbal response, which the examiner had to write down. Answers given at latencies longer than 12 s were omitted from the analyses. In total, 35 different face stimuli were presented, five corresponding to each of the six basic emotion categories (sadness, fear, anger, disgust, surprise, happiness), and an additional five corresponding to neutral expressions. One point was given for each correct response.
To perform correlational analyses with IA indexes (md, mSI and d'), we computed three global scores: a negative emotion recognition score (corresponding to the sum of sadness, fear, anger, and disgust scores), a positive emotion recognition score (the sum of surprise and happiness), and a total score (the sum of all correct responses). The associations between IA indexes and the described global scores were performed using non-parametric correlation tests (Spearman rho), considering an alpha threshold of p < 0.05. Correlations between EA indexes (md, mSI and d') and the global scores were also performed to test the specificity of these markers.

The awareness of social inference test (TASIT)emotion evaluation test (EET)
Forty-seven participants performed this task (Supplementary Tables 1  and 2), which assesses the ability to infer basic emotions in videotaped vignettes representing actors interacting in naturalistic situations (McDonald et al., 2003). Given that the verbal scripts are neutral in content, the emotions must be inferred from a combination of various clues, including prosody, facial expressions, body language, and the social situation surrounding the emotional expression. This particularity makes the TASIT-EET a more ecological task than picture-based ones (such as Ekman's), since it resembles more precisely the types of interactions people encounter in real life situations. Some scenes depict only one actor talking (on the telephone or directly to the camera), while others show two actors and instructions are given to focus on one of them. Before visualizing each tape, the following instructions were given: "I will show you some short scenes. Please observe each one carefully. After each scene, I will write down the emotion that you tell me that best describes the feeling of the person in the scene. You have to select 1 of 5 emotions from the list that will appear on the screen after each scene. The first will be a practice trial". Thus, the participant was asked to verbally identify the emotion displayed by the target actor within five options that appeared written in the computer screen at the offset of the video: sadness, fear, anger, disgust, surprise, obtaining one point for each correct response. In total, ten short (15-60 s) videos were presented, two per each emotion category.
For correlational analysis, we computed a negative emotion recognition score (corresponding to the sum of sadness, fear, anger, and disgust scores) and a total score (the sum of all correct responses). We tested the association between IA indexes (md, mSI and d') and the global scores through non-parametric correlation tests (Spearman rho), considering an alpha level of p < 0.05. Correlations between EA indexes (md, mSI and d') and the global scores were also performed to test the specificity of these markers.

Multivariate analysis
After univariate analysis, we explored how robustly the different IA indexes (md, mSI, d') were predicted by the combination of measures tapping ongoing brain markers (hd-EEG-HEP), resting-state functional connectivity, and socio-emotional skills. To this end, we used a datadriven multidimensional and multi-feature computational analysis using the subsample that included the cases that completed all sessions of the experimental design (i.e., EEG, fMRI, and socio-emotional skills assessments) (n ¼ 29) (Fig. 1C). For each target variable (IA-md, IA-mSI, IA-d'), we performed a linear regression with an L2 regularization (Alpaydin, 2009) using as input all experimental features that yielded significant associations with any IA index in the previous analyses (i.e., HEP modulation in the extended frontal ROI and its subdivisions, the average functional connectivity of each seed associated with each IA index, and Ekman-35 and TASIT-EET scores) -Section 3.5.1 for details. We used the statistical criteria as filter method of feature selection because this is a standard practice in machine learning studies (Gonzalez Campo et al., 2019;Kassraian-Fard et al., 2016;Anderson et al., 2011;Dottori et al., 2017).
Then, to explore how confounding variables influenced the predictions, we implemented another linear regression with an L2 regularization (Alpaydin, 2009) for each target (IA-md, IA-mSI, IA-d'), adding demographic (gender, age, and years of education) and executive functioning (total IFS score) measures to the previously mentioned features (Section 3.5.1).
For both analyses, we split the data in 50-50 train and test partition. Regardless of the regularization parameter, the process was optimized over a validation set (20%) bootstrapped from train partition. We assessed the coefficient of determination (R 2 ) between the target and the predicted value for data in test partition. To get a more realistic estimation, we performed the regression 30 times and informed the mean and standard deviation.
Although our sample size is small (N ¼ 29), as recommended (Friedman et al., 2001;Kvålseth, 1985), we explicitly avoided using the leave-one-out cross-validation (LOOCV) method, since the coefficient of determination (R 2 ) -the models' performance score-needs a large set of test samples to be computed. While it would be possible to accumulate the dependent variable's prediction over the LOOCV procedure and then compute the R 2 , this would not allow us to assess the variance of the score (the standard deviation) due to changes in the training data. Thus, to know how precise the model's performance score is when we change the data used to train it, we opted for a random sampling procedure, training with one partition and testing in other, various times, always sampling from different random partitions (Gonzalez Campo et al., 2019; Donnelly-Kehoe et al., 2019).

Heartbeat detection task results and associations with sample demographics and executive functioning profiles
The md index was estimated including only 'good windows' (those that met the requirement of CV < 0.5 in the regularity of motor responses) -see Section 2.2 for details about this procedure. Analyses revealed that the mean percentage of good windows was 96% (SD ¼ 0.06) for the interoceptive condition, and 97% (SD ¼ 0.07) for the exteroceptive condition, with no significant difference between them (t ¼ À1.666; p ¼ 0.097). This result indicates that subjects maintained a comparable level of concentration in both conditions of the HBD task.
Regarding performance, as expected for interoceptive measures, IA-md scores (M ¼ 0.43; SD ¼ 0.25) were higher (and thus, worse) than EA-md scores (M ¼ 0.06; SD ¼ 0.09) across the sample (t ¼ 15.196; p ¼ 0.000). This result was also found for the comparison indexes (mSI and d') -see Supplementary Table 3 for details. In addition, subjects' IA-md scores were more variable (IQR ¼ 1.61) than EA-md scores (IQR ¼ 0.50). This variability pattern was also captured by mSI, but not d' (Supplementary Table 3). Regarding demographic information, there were no gender differences in either IA-md (t ¼ 1.075; p ¼ 0.285) or EA-md (t ¼ À0.242; p ¼ 0.810). Null results were also found for mSI and d' (Supplementary  Table 4). Lastly, the IA-md index was not associated with age (rs ¼ À0.036; p ¼ 0.702), years of education (rs ¼ À0.135; p ¼ 0.153) or executive functions as tapped by the IFS (rs ¼ À0.040; p ¼ 0.683), indicating that interoceptive performance could not be explained by these confounding factors when taken separately. Similar results were obtained for IA-mSI and IA-d' (Supplementary Table 5). On the other hand, the number of years of education and the total IFS score were significantly correlated with EA-md (rs ¼ À0.288; p ¼ 0.003 and rs ¼ À0.255; p ¼ 0.013, respectively), possibly reflecting the demands of attending to external stimuli. These results were replicated for EA-d', but not for the EA-mSI index (Supplementary Table 5).

HEP results
As expected, we found a significant positive correlation between IAmd scores and HEP amplitude in a window of 300-400 ms after the Rwave peak in the defined extended ROI comprising 30 fronto-central electrodes (rs ¼ 0.281; p ¼ 0.002) ( Fig. 2A). Since the md index is an error score, this result indicates that lower (thus, better) IA-dm scores are associated with more negative HEP modulations. Similar results were obtained when tested in the subdivisions of that ROI (Supplementary  Table 6). However, IA-md was not associated with HEP amplitude in the earlier 200-300-time window (rs ¼ 0.148; p ¼ 0.117). In addition, no significant association was found between EA-md and HEP modulation. Finally, IA and EA scores derived from mSI and d' did not correlate with HEP modulation in any window or ROI (Supplementary Table 6; Supplementary Figures 1A and 2.A).

Functional connectivity results
Seed analysis revealed significant associations between IA-md and the functional connectivity of key interoceptive hubs, mainly in the left hemisphere (Fig. 2B). More specifically, md was negatively associated with the strength of the correlation between the temporal course of the BOLD signal of the selected seeds (bilateral insula, ACC, and postcentral cortex) and the temporal course of the BOLD signal in insular, frontal, temporal, postcentral, precentral, and inferior parietal cortical regions (Supplementary Table 7). Repeating this analysis in the subsample of subjects <55 years old yielded a consistent though more widespread pattern of results (Supplementary Table 8 and Supplementary Fig. 3). Results were also replicated for IA-mSI (Supplementary Table 9 and Supplementary Figure 1B), although the strength of association was significantly lower than that for IA-md (t ¼ À9.14; p ¼ 0.000) - Supplementary Fig. 4. For its part, the IA-d' index correlated with the functional connectivity between the seeds and ACC, precentral, postcentral, frontal and temporal regions (Supplementary Table 10 and Supplementary Figure 2B). In contrast, no significant associations were found for EA measured as md and mSI ( Supplementary Figs. 5 and 6). Lastly, while the functional connectivity of some seeds appeared significantly correlated with EA-d', these do not belong to interoceptive networks, but comprise occipital, precuneus, and cerebellar regions (Supplementary Table 11 and Supplementary Fig. 7). All fMRI results were considered significant with a statistical threshold of p < 0.001, uncorrected, extent threshold ¼ 30 voxels (Moguilner et al., 2018;Loitfelder et al., 2012).

Socio-emotional skills results
The subjects' performance in emotion recognition tasks is displayed in Supplementary Table 12. We found significant associations between IA-md scores and measures of negative emotion recognition. More specifically, better performance (lower IA-md scores) correlated with higher scores in the recognition of negative emotions in the two tasks administered: Ekman-35 (rs ¼ À0.323; p ¼ 0.022) and TASIT-EET (rs ¼ À0.328; p ¼ 0.034). For visualization purposes, Fig. 2C displays the correlation between IA-md and a composite negative emotion recognition score, comprised by the sum of the subjects' performance in both tasks. In addition, we found a significant negative correlation between IA-md and TASIT-EET total score (rs ¼ À0.403; p ¼ 0.005), and a trend toward significance in the association between IA-md and Ekman-35 total score (rs ¼ À0.263; p ¼ 0.065). In contrast, IA-md was not correlated with positive emotion recognition -as measured with Ekman-35 (rs ¼ 0.088; p ¼ 0.543). Results concerning TASIT (negative emotion recognition and total scores) were replicated for IA-d', but not for mSI. Additionally, IA-mSI and IA-d' were not associated with positive emotion recognition. Furthermore, no significant associations were found between EA -as measured by md, mSI and d'-and emotion recognition measures (All these results are provided in Supplementary  Table 13).

Feature selection
For our first multivariate regression architecture (Section 2.6 and Fig. 1C, bottom left diagram), we included as predictor features the experimental variables that yielded significant associations with any IA index in the previous analyses. In total, we included: -Four EEG metrics: HEP amplitude values in the 300-400 ms-window after the R-wave peak in the extended ROI comprising 30 frontocentral electrodes, and in the left-frontal, central-frontal, and rightfrontal subdivisions of that ROI (since all these variables were significantly associated with IA-md); -Sixteen fMRI metrics: the average functional connectivity of each seed that showed a significant association with each IA index (i.e., 6 features corresponding to the functional connectivity of the 6 seeds that showed significant associations with IA-md -Supplementary Tables 7 and 5 features corresponding to the functional connectivity of the 5 seeds that showed significant associations with IA-mSI -Supplementary Tables 9 and 5 features corresponding to the functional connectivity of the 5 seeds that showed significant associations with IA-d' - Supplementary Table 10); and Multivariate analysis results. Combined HEP, fMRI, and socio-emotional skills metrics (i.e., experimental features) yielded a greater coefficient of determination for IA-md than for IA-mSI and IA-d' (left panel), and these results persisted when adding demographic features, even improving for IA-md (right panel). Regressor performance is shown on test data.
-Three socio-emotional skills metrics: Ekman-35 negative emotion recognition score (since this variable was significantly correlated with IA-md), and TASIT-EET negative emotion recognition and total scores (since these last two variables were significantly correlated with IAmd and IA-d') - Supplementary Table 13.
For our second multivariate regression architecture (Section 2.6 and Fig. 1C, bottom right diagram), we added to the previously mentioned features three demographic variables (gender, age, and years of education) and one executive functioning variable (total IFS score) -collectively called 'demographics'.

Multiple linear regressions results
The combined experimental features (HEP, fMRI, and socioemotional skills metrics) resulted in a higher coefficient of determination for IA-md than for the comparison indexes, IA-mSI and IA-d' (Table 1 and Fig. 2D, left panel). When adding demographics to the experimental features, the coefficient of determination for IA-md improved, and it remained higher than for IA-mSI -which also improved-and IA-d' (Table 1 and Fig. 2D, right panel).

Discussion
This work provides, for the first time, a systematic multidimensional approach to cardiac interoception in combination with a dynamic and sensitive IA index (i.e., md) during a validated motor-tracking HBD task (de la Fuente et al., 2019). We showed that this metric is associated with canonical neurocognitive markers of interoception, including the HEP, functional connectivity within interoceptive hubs, and socio-emotional skills. Furthermore, using a multivariate regression model, we showed that IA-md can be predicted by those markers better than by mainstream IA indexes (mSI and d'). Lastly, while IA-md was not directly associated with the sample's demographic variables (age, gender, and years of education) and overall executive functioning, adding these features to the multivariate regression model increased predictive precision, suggesting that IA-md is more sensitive to non-interoceptive variables that may partially account for subjects' performance in the HBD task. Therefore, our approach represents a robust framework for the field, since the IA-md index overcomes several methodological limitations of mainstream alternatives, including Schandry's index and the d' index.
First, we assessed whether our md index yielded predictable behavioral results by discriminating between interoceptive and exteroceptive abilities. We found poorer performance in the former condition when measured with md, but also with mSI and d'. Note, in this sense, that the interoceptive condition of the HBD task (where participants are asked to follow their own heartbeats without taking their pulse) involves high uncertainty, usually resulting in floor-level scores regardless of the method used to quantify IA (Khalsa et al., 2009a). In addition, IA-md scores were more variable than EA-md scores, again reflecting the high degree of uncertainty of the interoceptive condition and the dispersion found in interoceptive ability in the general population (Marshall et al., 2017;Garfinkel et al., 2015;Tsakiris et al., 2011).
Regarding the relationship between md and neurophysiological markers of interoception, we found a significant correlation between IA scores and HEP modulation (the better IA-md score, the more negative the amplitude of the HEP). The negative-going modulation of the HEP is considered a canonical marker of interoception since it (i) captures allocation of attention to body signals (Montoya et al., 1993;Yuan et al., 2007;Petzschner et al., 2019), (ii) distinguishes between good and bad heartbeat perceivers (Pollatos and Schandry, 2004;Pollatos et al., 2005), and (iii) has sources in interoceptive hubs (Pollatos et al., 2005). However, the association between HEP amplitude and behavioral performance in HBD tasks have proven elusive (Salamone et al., 2018;Terhaar et al., 2012;Schulz et al., 2015). Similarly, in our study, HEP modulation was not significantly associated with either IA-mSI or IA-d' outcomes. Importantly, EA was not related to HEP amplitude regardless of the method used, highlighting the specificity of the result for the IA-md index.
Results concerning hemodynamic markers of interoception also support the sensitivity of our md index. Indeed, IA-md was related to functional connectivity among interoceptive networks. Specifically, we found that, the better IA-md score, the stronger the resting-state functional connectivity among insular, somatosensory (i.e., postcentral), frontal, temporal, and ACC regions. These results are in line with previous studies from active (Critchley et al., 2004;Zaki et al., 2012;Schulz, 2016) and resting-state (Garcia-Cordero et al., 2016;Salamone et al., 2018) fMRI experiments consistently implicating those cortical structures in interoception. Particularly, the insular and somatosensory cortices play a key role in mapping the physiological condition of the body and in using that information to generate subjective feeling states (Craig, 2002;Critchley et al., 2004). Connections within interoceptive seeds and frontal regions (i.e., middle and superior frontal gyrus) may reflect the allocation of attention to endogenous stimuli needed for decision making (i.e., tapping responses) during the task (Farb et al., 2013). In contrast to previous evidence (Adolfi et al., 2017;Critchley et al., 2004), the involvement of the ACC was minor in the present study. However, this is not surprising since this region might be more relevant for top-down executive monitoring (Bush et al., 2000), while a primary tracking of bodily changes would occur in insular and somatosensory cortices (Critchley et al., 2001).
It is worth noting that our functional connectivity results showed a bilateral but more left-lateralized insular involvement. This finding would seem to clash with previous reports of predominantly right-sided insular activity in the processing of interoceptive signals (Critchley et al., 2004;Schulz, 2016;Pollatos et al., 2007). However, meta-analytic evidence of interoception (Adolfi et al., 2017;Schulz, 2016;Salvato et al., 2019) has revealed a significant engagement of the left insula, slightly below that of the right insula. Moreover, in Adolfi's study (Adolfi et al., 2017) while the greatest likelihood of activation was found within the right insular cortex (BA13), additional significant clusters in the left insula (BA13) comprised a greater number of voxels, suggesting a greater spatial extent in that region. Bilateral modulations of the insula (Critchley et al., 2004;Pollatos et al., 2007;Kleint et al., 2015;Caseras et al., 2013;Ernst et al., 2013;Simmons et al., 2013;Tan et al., 2018;Yao et al., 2018) and the neighboring Rolandic operculum (Blefari et al., 2017) have also been consistently reported during active cardiac interoceptive tasks. In fact, motor-tracking HBD tasks similar to ours have yielded activations not only in the right anterior insula/frontal operculum (Zaki et al., 2012), but also (and exclusively) in the left insula (Klabunde et al., 2019). Finally, and more pertinent to our results, previous associations between resting-state fMRI connectivity and IA in HBD tasks have yielded mixed results. Chong et al. (2017) reported a significant correlation between heartbeat counting scores and salience network connectivity in the right posterior insula, but also a trend towards a positive association in the left posterior insula, suggesting the involvement of a bilateral insular pattern in cardiac monitoring. More specifically, using the same motor-tracking HBD task as ours, positive Experimental variables þ demographics (gender, age, years of education, and executive functioning) fMRI: functional magnetic resonance imaging; HEP: heart-evoked potential; IA: interoceptive accuracy. associations have been found between IA scores and the functional connectivity of the left or bilateral insula (Garcia-Cordero et al., 2016;de la Fuente et al., 2019;Yoris et al., 2017a). Taken together, all this evidence supports the bilateral involvement of the insula in cardiac interoception, even in experimental settings very similar to the present one.
In particular, the specific left (and bilateral) insula involvement during motor-tracking HBD performance could be interpreted in light of the embodied predictive interoception coding (EPIC) model (Barrett and Simmons, 2015), which proposes an active inference account of interoception. According to the EPIC model, the interoceptive system in the brain is composed by agranular visceromotor regions (e.g., anterior insula, posterior ventromedial prefrontal cortex, cingulate cortex) that generate interoceptive predictions and prediction errors from actual sensory signals (related from the body to the granular layer IV of the primary insular interoceptive cortex). The prediction errors can in turn act as a forward model to prime motor responses. Thus, the mid-posterior insula would compute the interoceptive prediction error and propagate it back to the deep layers of the visceromotor regions where the predictions originated. In this context, we propose a forward model based on intra-hemispheric insular-motor system connections: Insular hubs may convey information from interoceptive predictions errors to adjust motor actions (here, tapping responses to heartbeats). Since the majority of our subjects were right-handed, the lateralization of results to the left insula could be explained by the intra-hemispheric connections with the left motor system corresponding to the dominant hand-movements. However, further research is required to directly test the hypothesis of this forward model.
The pattern of functional connectivity results described above was replicated when excluding older adults (>55 years old) from the analysis (Supplementary Table 8 and Supplementary Fig. 3), suggesting common mechanisms across a very large age-range. Results were also replicated for IA-mSI, although less robustly. Regarding the functional connectivity associated with IA-d', it did not involve the insular cortex, a key interoceptive hub (Craig, 2002). Thus, fMRI results favor our IA-md index. Importantly, all reported associations were specific for IA (as opposed to EA) scores, supporting the construct validity of IA-md index as a measure of interoceptive ability.
The link between interoception and socio-emotional processing is grounded in strong theoretical frameworks (Wiens, 2005;Craig, 2002;James, 2013;Damasio, 2006;Barrett and Russell, 2014), with embodied simulation accounts suggesting that individuals might be able to recognize others' emotions by means of body resonance and by interpreting the corresponding interoceptive signals (Gallese and Sinigaglia, 2011). However, these ideas have received sparse empirical support from HBD tasks, with some studies reporting associations between IA and the sensitivity to facial emotions (Terasawa et al., 2014), empathy (Grynberg and Pollatos, 2015), or affective theory of mind (Shah et al., 2017), and others providing incongruent findings regarding emotion perception (Ferguson and Katkin, 1996) and various socio-emotional skills (Ainley et al., 2015). We suggest this might be due to the index used to quantify interoception. In fact, here we found significant associations between interoceptive ability and socio-emotional skills when IA was measured with md, but not with mSI or d'. More specifically, IA-md correlated with the recognition of negative emotions in others in two tasks: one consisting on identifying facial emotions in static pictures (i.e., Ekman-35) (Bertoux et al., 2014), and another with greater contextual load, consisting in recognizing emotions in naturalistic social scenarios (i.e., TASIT-EET) (McDonald et al., 2003), which implicates social cognition skills in general, and theory of mind in particular. Additionally, associations with interoception were specific for negative (as opposed to positive) emotion recognition, in accordance to previous research (Critchley et al., 2004). This specificity may reflect common neural substrates between interoception and the processing of negative affective states, such as disgust (Wicker et al., 2003), pain processing (Lamm et al., 2011), empathy for pain (Lamm et al., 2011), envy (Takahashi et al., 2009), and social exclusion (Eisenberger and Lieberman, 2004), among others, all of which converge in the insular cortex and the ACC. Thus, the md index may be more sensitive to capture the theoretical role of interoception in the vicarious experience of emotional states. Note that we found no relationship between EA and emotion recognition, underscoring the specificity of results for our IA-md index.
After univariate analysis, we aimed to test how the combination of multiple dimensions (i.e., electrophysiological, hemodynamic, behavioral) explained the variance in the sample's IA scores when measured by each index (md, mSI and d'). Thus, we performed a data-driven multivariate regression including as selected features all the variables yielding significant associations with IA in the previous analyses. Results revealed that prediction was more accurate for md, indicating that our measure better captures interoceptive features across dimensions. Furthermore, these results persisted (and even increased for md and mSI) when adding confounding variables to the model, including demographics and executive functioning information. Thus, domain-general factors may interact with specific interoceptive dimensions in explaining the variance in IA scores. Indeed, interoceptive performance might prove better in male (Montoya et al., 1993;Grabauskaite et al., 2017) and young (Khalsa et al., 2009b) subjects, in relation to mediating factors such as body composition (percentage of body fat) (Cameron, 2001). In addition, cognitive abilities (indexed here as executive functioning) may also impact on HBD performance. Educational level can also influence interoceptive outcomes through its relationship with cognitive functioning (Hackman et al., 2010). Importantly, we did not find associations between IA and any of those factors when assessed with univariate methods (i.e., Spearman correlations). In contrast, EA was related to executive performance and years of education, reflecting the capacity to attend to external stimuli, as expected. However, when these variables were included in the multivariate model alongside interoceptive markers, they increased predictions for IA-md, suggesting our measure outperforms other measures in capturing interoceptive variability induced by confounding factors. This finding has relevant implications concerning the assessment of interoception in heterogeneous samples, such as those composed of neuropsychiatric patients.
In sum, this work represents a robust approach combining different dimensions (i.e., electrophysiological, hemodynamic, behavioral) to evaluate HBD-derived IA with different measures. Results also support the validity of our newly developed index (i.e., md), which overcomes major limitations of other widely used alternatives. As this measure is based on capturing synchrony, it is less contaminated by confounding factors such as heart rate estimations (which affects Schandry's index), and it avoids arbitrary definitions of time-lapses to determine correct responses (which affects the d' index). More importantly, in contrast to other metrics, IA-md accounts for heart changes effects in subjects' online performance during motor-tracking HBD tasks. This aspect might be crucial in making IA-md a more sensible index of interoceptive ability. Indeed, interoceptive stimuli (i.e., heartbeats) are variable and temporally inconsistent by nature. As the literature on action-perception coupling shows, expert individuals are indeed more efficient at tracking unexpected changes in task-relevant exteroceptive stimuli (e.g., a ball moving in a sport context) (Mallek et al., 2017). Analogously, individuals with good interoceptive abilities could prove better at detecting the changing rhythm of inner stimuli (i.e., their heart rate), and IA-md is designed to capture such ability.
Also, our results are relevant for the assessment of interoception with clinical aims. In fact, the literature concerning interoceptive alterations in neuropsychiatry are partially inconsistent (e.g., 26, 27), contrasting its theoretical relevance and therapeutic potential (Khalsa and Lapidus, 2016). The md index, whose validity and sensitivity are supported by its associations with multiple dimensional canonical markers of interoception, could be helpful in this regard.
Future works should also assess whether our results reflect the neurocognitive correlates of interoception beyond the cardiac domain, and whether our measure (md) is sensitive to tap interoceptive abilities related to other systems. Indeed, interoception has been mainly studied through HBD tasks because heartbeats are discrete and frequent internal events that can be easily, non-invasively, and objectively measured (Garfinkel et al., 2015) and/or manipulated (Khalsa et al., 2009a). However, interoception is not limited to cardiac sensations, but also includes the monitoring of other internal signals, such as thermoceptive, nociceptive, respiratory, and gastrointestinal (GI) stimuli (Craig, 2002;Khalsa and Lapidus, 2016;Cameron, 2001;H€ olzl et al., 1996;Azzalini et al., n.d.;Alfonsi et al., 2016). Based on evidence showing an overlap between cardiac and non-cardiac -particularly GI-interoceptive abilities (Herbert et al., 2012;Whitehead and Drescher, 1980), we hope our results could be extrapolated to other interoceptive modalities. Notwithstanding, more research is needed to effectively test the assumption that interoceptive signal detection and awareness work in a coherent and coordinated fashion across different systems (see, for example, (Steptoe and Noll, 1997), (Kollenbaum et al., 1996;Mauss et al., 2005;Ferentzi et al., 2018)). Here we have provided a systematic framework that, although based on heartbeat detection, has the potential to be used in other contexts. In principle, our index can be implemented in any setting involving self-detectable organs' signals. To illustrate, the GI system, as the heart, also generates its own rhythm (Azzalini et al., n.d.;Sanders et al., 2006), which can be measured through non-invasive electrogastrography (e.g., 127).
Moving forward from the cardiovascular system to study other interoceptive modalities -and how they influence and are influenced by cognition and emotion-is necessary to create "interoceptive profiles" (Khalsa and Lapidus, 2016) and expand our knowledge about the mechanisms by which individuals sense their physiological condition in health and disease (Khalsa and Lapidus, 2016). Moreover, since heartbeat detection is itself difficult (with approximately 40% of subjects reporting not being able to consciously register their heartbeats at all) (Khalsa et al., 2009a), the development of experimental paradigms aimed at assessing other interoceptive modalities would be promising.
Some limitations must be acknowledged. First, its correlational approach prevents us from making causal claims. Future studies should include experimental manipulations to directly assess the impact of cardiac frequency changes in HBD performance. Second, our fMRI analysis was based on resting-state spontaneous fluctuations of the BOLD signal, which constitute only indirect evidence of the neural correlates of interoception. The use of active fMRI tasks would be useful to more precisely detect the cortical regions involved in online interoceptive processing. Finally, note that we used a permissive alpha value for our functional connectivity analyses (p < 0.001 uncorrected, extent threshold ¼ 30 voxels) (Moguilner et al., 2018;Loitfelder et al., 2012). However, our analyses were hypothesis-driven and results actually align with previous literature, suggesting that we found a true effect that could have been missed with a more conservative approach (Lieberman and Cunningham, 2009).
In conclusion, here we provided evidence for a multidimensional and multi-feature framework to interoception combined with a new IA index (md) capturing oscillatory couplings between heartbeats and responses during a validated HBD task. Comparisons of this index with other commonly used ones, alongside multivariate analysis, suggest the IA-md index would constitute a better proxy of interoceptive dynamics, even in highly heterogeneous samples. These results pave the way for new theoretical and clinical breakthroughs in the study of interoception.