Heartbeat-related spectral perturbation of electroencephalogram reflects dynamic interoceptive attention states in the trial-by-trial classification analysis

Attending to heartbeats for interoceptive awareness initiates distinct electrophysiological responses synchronized with the R-peaks of an electrocardiogram (ECG), such as the heartbeat-evoked potential (HEP). Beyond HEP, this study proposes heartbeat-related spectral perturbation (HRSP), a time–frequency map of the R-peak locked electroencephalogram (EEG), and explores its characteristics in identifying interoceptive attention states using a classification approach. HRSPs of EEG brain components specified by independent component analysis (ICA) were used for the offline and online classification of interoceptive states. A convolutional neural network (CNN) designed specifically for HRSP was applied to publicly available data from a binary-state experiment (attending to self-heartbeats and white noise) and data from our four-state classification experiment (attending to self-heartbeats, white noise, time passage, and toe) with diverse input feature conditions of HRSP. From the dynamic state perspective, we evaluated the primary frequency bands of HRSP and the minimal number of averaging epochs required to reflect changing interoceptive attention states without compromising accuracy. We also assessed the utility of group ICA and models for classifying HRSP in new participants. The CNN for trial-by-trial HRSP with actual R-peaks demonstrated significantly higher classification accuracy than HRSP with sham R-peaks.


Introduction
Interoception, the preconscious or intentional awareness of internal bodily states such as heart rate and respiratory rate, plays a critical role in regulating brain states and maintaining homeostasis, a central concept in allostasis (Barrett et al., 2016; Sennesh et al., 2022). This ability is fundamental for experiencing and interpreting emotions (Schachter and Singer, 1962; Craig, 2002; Garfinkel et al., 2015; Critchley and Garfinkel, 2017). Alterations in interoceptive awareness, commonly assessed with the heartbeat detection task, are observed in individuals with mental health conditions such as anxiety and depression (Avery et al., 2014; Eggart et al., 2019; Paulus and Stein, 2010). Consequently, enhancing interoceptive awareness through interoceptive attention is increasingly recognized for its therapeutic potential, offering significant benefits in the treatment of various psychiatric disorders (Quadt et al., 2021; Burrows et al., 2022; Torregrossa et al., 2022; Schmitz et al., 2023).
Attending to heartbeats, as in the heartbeat detection task, induces specific electrophysiological responses, such as the heartbeat-evoked potential (HEP), which synchronizes with the R-peaks of an electrocardiogram (ECG) (Schandry et al., 1986; Montoya et al., 1993; Pollatos and Schandry, 2004; Yuan et al., 2007). HEP offers a direct means to explore how the brain processes interoceptive signals. For instance, Montoya et al. (1993) observed differences in electroencephalogram (EEG) channels in the frontal area between 350-550 ms after the R-peak when participants focused on their heartbeat. Similarly, Petzschner et al. (2019) found significant increases in HEP amplitude in the Cz and frontocentral electrode regions between 524-620 ms during heartbeat focus compared to while listening to white noise. Mai et al. (2018) noted that HEP during a heartbeat detection task displayed higher peak amplitudes in the frontocentral and central areas compared to a resting state. These studies indicate that focusing on one's heartbeat can effectively modulate typical brain responses to interoception.
However, a major challenge with HEP analysis is removing ECG artifacts. Given that HEP is derived from averaging epochs marked by R-peaks, selectively removing cardiac field artifacts is not straightforward (Dirlich et al., 1997; Devuyst et al., 2008). A meta-analysis by Coll et al. (2021), which reviewed 42 HEP-related studies, underscored that research findings could be significantly influenced by factors such as cardiac field artifacts and the methods used to preprocess these noises. Therefore, enhancing the reliability and reproducibility of HEP research necessitates effective handling of artifacts, along with the development of standardized measurement and analysis methodologies.
Due to these significant challenges in removing ECG artifacts from HEP analyses, our study introduces trial-by-trial analysis of heartbeat-related spectral perturbation (HRSP) as a novel method to investigate heartbeat-related brain responses. Unlike HEP, which primarily focuses on amplitude characteristics prone to contamination by ECG artifacts, HRSP focuses on the time-frequency characteristics of EEG signals. We hypothesize that these characteristics are less susceptible to ECG artifacts, thus providing a clearer and more distinct measure of interoceptive processing.
HRSP is a time-frequency map of an R-peak locked EEG, derived by applying wavelet transforms to the EEG. HRSP aligns with the event-related spectral perturbation (ERSP) (Makeig, 1993), with the distinction that R-peaks serve as the event onsets (Kern et al., 2013; Park et al., 2018). HRSP analysis, akin to induced gamma analysis (Tallon-Baudry and Bertrand, 1999), involves averaging time-frequency maps of EEG epochs partitioned by the onset of R-peaks, in contrast to the time-frequency map of the averaged R-peak onset EEG (i.e., HEP). This method of HRSP evaluation diverges from the inter-trial coherence (ITC) analysis used in previous studies (Kern et al., 2013; Park et al., 2018), which assesses the trial-by-trial consistency of HRSP across entire sessions or experiments.
The primary goal of the current study is to propose HRSP as a feature or biomarker reflecting interoceptive attention states and to explore its characteristics for identifying dynamic brain states. We examined various properties of HRSP, including the impact of different averaging epoch sizes under dynamic state conditions, the selection of frequency bands, and consistency within and between individuals in representing interoceptive attention states.
While conventional methods concentrate on stationary features of heartbeat-evoked responses throughout a session or an experiment, they may miss state-dependent variations within individuals. Thus, our study adopts a dynamic perspective on state changes in interoceptive attention, contrasting with conventional approaches like HEP, ERSP, or induced gamma studies that compile data from all epochs.
The challenge of adopting a dynamic state perspective lies in the reliable detection of signals amid the inherently noisy background of EEG data. Traditionally, to isolate event-related potentials (ERP) or event-related spectral perturbations (ERSP) from background EEG, inter-trial averaging has been employed (Jung et al., 2000; Gwin et al., 2010). This method typically involves averaging across entire epochs, assuming that EEG signals or time-frequency spectra are stationary. However, for effective classification of dynamic brain states, it is crucial to average HRSPs over a small number of epochs to balance noise reduction and sensitivity to changes in brain states. Consequently, determining the minimal number of averages necessary to accurately represent and respond to changes in brain states remains a significant yet unresolved challenge.
Furthermore, given the spectral nature of HRSP, determining which frequency bands are most effective at distinguishing between different interoceptive attention states remains an area of active investigation.
To assess the potential of HRSP for representing dynamic interoceptive attention states, we employed a convolutional neural network (CNN) (LeCun et al., 1998; Tran et al., 2015) specifically tailored for HRSP analysis. This decision was informed by the advantages of multivariate approaches, which are particularly effective in characterizing brain states under conditions of endogenous fluctuations, such as during resting states (Park et al., 2017). Unlike univariate analyses that evaluate single variables independently, a CNN leverages a multivariate framework capable of capturing complex interactions across spectral activities in both time and frequency domains.
To enhance the stability of HRSP, we utilized time series of independent components (ICs) extracted through independent component analysis (ICA) of EEG data for HRSP analysis, instead of relying on signals from EEG channels (Chen et al., 2013; Whitmer et al., 2010; Maruyama et al., 2020). By focusing on ICs within the cortex and subcortex (which we refer to as brain components), we aim to capture interoception-relevant features of brain activity, which could significantly improve the reliability and accuracy of HRSP-based assessments.
The secondary goal of this study is to establish a foundation for real-time HRSP classification of dynamic interoceptive attention states. This capability could greatly enhance neurofeedback training, providing a robust tool for improving interoceptive awareness and thereby potentially benefiting mental health interventions. Implementations like the Aligning Dimensions of Interoceptive Experience (ADIE) protocol have shown improvements in interoceptive accuracy and anxiety outcomes in autism spectrum disorder patients compared to traditional treatments (Quadt et al., 2021). Furthermore, short-term mindfulness training focusing on bodily sensations has also shown significant reductions in anxiety (Lima-Araujo et al., 2022), underscoring the potential of interoception training to improve mental health (Bornemann et al., 2015; Martínez et al., 2021; Iatridi et al., 2021).
For effective real-time classification, it is crucial to rapidly and reliably extract brain components from EEG data and dynamically estimate HRSP from a few EEG epochs.Another critical issue is the rapid and reliable ICA and CNN model training for individual HRSP classification with limited data size for online applications.
To address these challenges, employing group-level brain components and classification models for new individuals can mitigate the need for extensive new EEG data collection, thus enhancing the clinical applicability of interoception training. We therefore explored the efficacy of using group ICA, instead of individual ICA, to unmix an individual's EEG and derive time series for group brain components. The rationale behind this approach is that group brain components are sufficiently common across individuals, potentially offering greater resistance to noise than components identified from small individual EEG samples. To validate this, we assessed classification accuracy to determine how effectively these methods could identify distinct interoceptive attention states within individuals. Successful application of HRSP classification models derived from group data to new individuals would suggest that certain HRSP features consistently reflect brain states across different people.
This paper is organized into four main sections. First, we introduce the experimental setup, detailing the procedures for generating HRSP and the structure of the CNN model designed for HRSP analysis. We then conduct an offline classification of interoceptive attention states using HRSPs, employing both individual and group-level models across two experimental datasets. Following this, we evaluate the specificity of HRSP by comparing classification performance between real and sham R-peaks and performing a Gradient-weighted Class Activation Mapping (Grad-CAM) analysis (Selvaraju et al., 2017) to pinpoint the primary features used in classifying interoception states. Finally, we explore the online application of our HRSP classification system, discussing its potential implications for real-time neurofeedback and personalized medical interventions.

Participants and experimental tasks
In this study, we analyzed EEG data from two distinct interoception experiments: a binary-state experiment and a four-state experiment.

Participants
EEG data were initially collected from 19 participants, but due to incomplete records for two individuals in the open dataset, analyses were conducted using data from the remaining 17 participants. These participants had an average age of 23.68 ± 3.61 years and met the following inclusion criteria: aged 18 to 40 years; free from chronic disorders, brain-related injuries, surgeries, neurological or psychiatric illnesses, or a history of drug abuse; and abstained from medication or drugs for seven days and alcohol for 24 h prior to the experiment. The protocol was approved by the Cantonal Ethics Committee Zurich (PB_2016-01717). Further details can be found in Petzschner et al. (2019).
The four-state experiment included 12 healthy participants (5 males, 7 females) with an average age of 24.16 ± 3.01 years, all right-handed and Korean. Participants were selected based on criteria including age between 19 and 39 years and good general health. Exclusion criteria included corrected or uncorrected vision less than 0.8 and any neurological or psychiatric disorders, including substance and alcohol use disorders or depression. The study was approved by the Yonsei University Health System Institutional Review Board, and all participants provided written informed consent (4-2022-1026).

Experimental tasks
In the binary-state experimental task (Petzschner et al., 2019), participants alternated between two types of task blocks: attending to their heartbeat and listening to external white noise. For the heartbeat attention task block (H), participants focused on their heartbeat for 20 s upon seeing a heart image. For the sound attention task block (S), signaled by a headphone image, participants concentrated on changes in the white noise for 20 s. Every task block was followed by an evaluation step, during which participants rated their level of attention on a scale from 1 to 10 in response to the question, ''How much would you associate your perceived heartbeat in the previous block with the color red?''. The 20-block experiment sequence was H-S-H-S-S-H-H-S-H-S-H-S-S-H-S-H-H-S-H-S. EEG was recorded using a 64-channel setup with a nose reference, alongside simultaneous EOG and ECG recordings, at a 500 Hz sampling rate. For further details, see Petzschner et al. (2019).
In the four-state experiment, each participant performed randomized tasks of concentrating on self-heartbeats, white noise, the passage of time, and the toe for 30 s each, repeated five times. Each task sequence consisted of an intro (2 s), one of the four tasks (30 s), an evaluation (8 s), and a 10-s inter-trial interval. Every task block was followed by an evaluation step, during which participants rated their level of attention on a scale from 1 to 10 using a keyboard, prompted by ''How focused were you?''. The task paradigm was created and presented using Psychtoolbox (http://psychtoolbox.org). EEG was recorded at 500 Hz using a BrainAmp amplifier (Brain Products GmbH) and a 64-channel BrainCap, with AFz as ground and FCz as reference. The T9 and T10 electrodes were repurposed to each ankle for ECG recording.

ECG R-peak detection
To detect R-peaks in the ECG signals recorded concurrently with the EEG data, we employed the HEPLAB tool (Perakakis, 2019), which implements the Pan-Tompkins algorithm (Pan and Tompkins, 1985). This algorithm is well known for its efficacy in real-time, continuous QRS complex detection, making it highly suitable for the precise analysis demanded by our study.
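The study uses HEPLAB's MATLAB implementation of Pan-Tompkins. As a rough illustration of the same idea (band-pass to emphasize the QRS complex, differentiate, square, integrate over a short window, then peak-pick with a refractory period), the following Python sketch uses SciPy as a stand-in; the function name, filter orders, and thresholds are illustrative assumptions, not HEPLAB's parameters.

```python
import numpy as np
from scipy.signal import butter, filtfilt, find_peaks

def detect_r_peaks(ecg, fs=500):
    """Simplified Pan-Tompkins-style R-peak detector (illustrative only)."""
    # Band-pass 5-15 Hz to emphasize the QRS complex
    b, a = butter(2, [5 / (fs / 2), 15 / (fs / 2)], btype="band")
    filt = filtfilt(b, a, ecg)
    # Differentiate, square, and integrate over a 150 ms moving window
    win = int(0.15 * fs)
    energy = np.convolve(np.diff(filt) ** 2, np.ones(win) / win, mode="same")
    # Peaks at least 250 ms apart (refractory period) above a simple threshold
    peaks, _ = find_peaks(energy, distance=int(0.25 * fs), height=0.3 * energy.max())
    return peaks

# Synthetic ECG: one spike per second plus a little noise
fs = 500
t = np.arange(0, 10, 1 / fs)
ecg = np.zeros_like(t)
ecg[fs // 2::fs] = 1.0                                  # "R-peaks" at 0.5 s, 1.5 s, ...
ecg += 0.01 * np.random.default_rng(0).standard_normal(t.size)
r_peaks = detect_r_peaks(ecg, fs)
```

In practice the HEPLAB output would be used directly; this sketch only conveys the shape of the algorithm.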

EEG preprocessing
For EEG preprocessing, we used the EEGLAB (https://eeglab.org/) and FieldTrip (https://www.fieldtriptoolbox.org/) toolboxes on MATLAB (R2023b, MathWorks Inc.). Fig. 1 presents the entire procedure. After downsampling to 250 Hz, we applied 0.3 Hz high-pass and 30 Hz low-pass filtering to the EEG. As described in Petzschner et al. (2019), no bad channels were identified in the binary-state experimental data; thus, no interpolation was required for these datasets. However, for the four-state experimental data, an average of 2.42 ± 2.39 bad channels were detected among the 62 EEG channels (excluding the two ECG channels) using the clean_rawdata function (Kothe and Makeig, 2013). These identified bad channels were then interpolated using the spherical method described in Perrin et al. (1989) and Kang et al. (2015). We adopted the common average reference (CAR) scheme used in Ludwig et al. (2009) and Yao et al. (2019).
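The preprocessing chain (downsample to 250 Hz, 0.3-30 Hz filtering, common average reference) was run in EEGLAB/FieldTrip; the following minimal NumPy/SciPy sketch illustrates the same operations under assumed filter orders. It is not the authors' code, and bad-channel detection and interpolation are omitted.

```python
import numpy as np
from scipy.signal import butter, decimate, filtfilt

def preprocess(eeg, fs=500):
    """eeg: [channels x samples]. Downsample, band-pass, and re-reference (CAR)."""
    eeg = decimate(eeg, 2, axis=1)                   # 500 Hz -> 250 Hz (anti-alias included)
    fs = fs // 2
    # 0.3 Hz high-pass and 30 Hz low-pass realized as one band-pass (order is an assumption)
    b, a = butter(2, [0.3 / (fs / 2), 30 / (fs / 2)], btype="band")
    eeg = filtfilt(b, a, eeg, axis=1)
    # Common average reference: subtract the across-channel mean at every sample
    return eeg - eeg.mean(axis=0, keepdims=True)

rng = np.random.default_rng(0)
out = preprocess(rng.standard_normal((62, 5000)))    # 62 channels, 10 s at 500 Hz
```

After CAR, the channel-wise mean of every sample is zero by construction.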

Independent Component Analysis (ICA)
To minimize diverse artifacts, we applied ICA to the preprocessed EEG signals, focusing on the time series of brain components rather than EEG channel signals. We employed the infomax ICA algorithm (Cardoso, 1997) to extract individual ICs specific to each participant and group ICs common across the entire cohort.
For group ICA of N individuals' EEG data, we adopted a method used in Huster et al. (2015) and Eichele et al. (2011). In brief, ICA was applied to the group-concatenated data X assembled from all individuals' EEG data X_i, i ∈ {1, …, N}, each having dimensions [C × T_i] (i.e., EEG channels × samples). The concatenated data were first decomposed into principal components along the channel axis using Principal Component Analysis (PCA) with a basis matrix W_PCA. Group ICA was then applied to these principal components to estimate the unmixing matrix W_ICA. By multiplying W_ICA and W_PCA, we calculated the combined unmixing matrix W, enabling the transformation of each participant's EEG data, X_i, into common independent component data, S_i = W X_i, of dimensions [K × T_i] (i.e., components × samples).
This approach differs from the original group ICA methods (Huster et al., 2015; Eichele et al., 2011), which perform PCA on each participant's EEG data before temporal concatenation of the EEG time series. Applying PCA directly to the group-concatenated EEG data, without computing individual PCA weights, allows a precomputed unmixing matrix to be projected onto individual EEG data, significantly enhancing processing speed for real-time applications.
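The concatenate-then-PCA-then-ICA pipeline can be sketched as follows. This is an illustrative Python stand-in, assuming scikit-learn's FastICA in place of the infomax algorithm actually used, with simulated data and arbitrary dimensions; the matrix names mirror the description above.

```python
import numpy as np
from sklearn.decomposition import PCA, FastICA

rng = np.random.default_rng(0)
n_chan, n_comp = 62, 20
# Simulated EEG from two participants: [channels x samples] each
X = [rng.standard_normal((n_chan, 3000)) for _ in range(2)]

# 1) Temporally concatenate, then PCA along the channel axis
Xcat = np.hstack(X)                               # [channels x total samples]
pca = PCA(n_components=n_comp).fit(Xcat.T)
W_pca = pca.components_                           # basis matrix: [components x channels]

# 2) Group ICA on the principal-component time series
#    (FastICA here is a stand-in for the infomax ICA used in the paper)
ica = FastICA(n_components=n_comp, random_state=0, max_iter=500)
ica.fit((W_pca @ Xcat).T)
W_ica = ica.components_                           # unmixing in PC space

# 3) Combined unmixing matrix, projectable onto any participant's EEG
W = W_ica @ W_pca                                 # [components x channels]
S0 = W @ X[0]                                     # participant 0's group-IC time series
```

Because W is computed once from the group data, applying it to a new participant is a single matrix multiplication, which is what makes the real-time projection cheap.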
To ensure the extraction of ICs corresponding to brain sources, we identified brain-related ICs (i.e., brain components) using the ICLabel function (Pion-Tonachini et al., 2019). This refinement ensures the unmixing matrices represent only brain components, which is crucial for accurate brain state classification.

Heartbeat Related Spectral Perturbation (HRSP)
In prior research, ERSP analysis typically involves applying a discrete wavelet transform to EEG data within a specified epoch. However, this approach is limited by the potential distortion of low-frequency band signals due to boundary effects at the start and end of each epoch. To address this limitation, our study employs a continuous wavelet transform (CWT) with Morlet wavelets (using the FieldTrip function ft_freqanalysis.m, width = 7) across the entire task duration of the signals. This method produces time-frequency spectral maps for the complete EEG dataset before epoching. We set the frequency range at 1 Hz intervals from 5 Hz to 20 Hz, excluding frequencies below 5 Hz to reduce the influence of slowly varying artifacts, in line with previous studies (Park et al., 2018; Kern et al., 2013).
We then segmented these time-frequency maps into epochs synchronized with the R-peaks, spanning from −0.2 s to 0.6 s around each R-peak, designated as single-trial HRSPs in this study (Petzschner et al., 2019; Coll et al., 2021; Park et al., 2018). An illustration of this single-trial HRSP construction is shown in Fig. 1. Petzschner et al. (2019) excluded R-peaks with an RR interval (RRI) of less than 0.7 s and used the remaining R-peaks to epoch EEG data from −0.1 s to 0.652 s to analyze HEP. In the binary-state experimental dataset, the mean RRI was 0.92 ± 0.13 s, allowing most epochs to be analyzed under this criterion. However, in the four-state experimental dataset, the mean RRI was 0.81 ± 0.074 s. Notably, one participant had a mean RRI of 0.66 ± 0.024 s, below the 0.7 s threshold, necessitating a more flexible approach to analyzing HRSP epochs than the fixed epoch length traditionally used.
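The transform-then-epoch order described above (CWT over the whole recording, then R-peak-locked segmentation) can be sketched in Python. The paper uses FieldTrip's ft_freqanalysis with Morlet width 7; the hand-rolled Morlet convolution below is an illustrative stand-in with simulated data and R-peak positions.

```python
import numpy as np

def morlet_tfr(sig, fs, freqs, width=7):
    """Time-frequency power via complex Morlet wavelets (width = number of cycles)."""
    tfr = np.empty((len(freqs), sig.size))
    for i, f in enumerate(freqs):
        dur = width / f                                  # wavelet duration in seconds
        t = np.arange(-dur / 2, dur / 2, 1 / fs)
        sigma_t = width / (2 * np.pi * f)                # temporal width of the Gaussian
        wavelet = np.exp(2j * np.pi * f * t) * np.exp(-(t ** 2) / (2 * sigma_t ** 2))
        wavelet /= np.abs(wavelet).sum()
        tfr[i] = np.abs(np.convolve(sig, wavelet, mode="same")) ** 2
    return tfr

fs = 250
freqs = np.arange(5, 21)                                 # 5-20 Hz in 1 Hz steps
sig = np.sin(2 * np.pi * 10 * np.arange(0, 20, 1 / fs))  # 20 s of 10 Hz activity
tfr = morlet_tfr(sig, fs, freqs)                         # CWT of the whole recording first

# R-peak-locked epoching afterwards: -0.2 s to 0.6 s around each R-peak (sample indices)
r_peaks = np.arange(2 * fs, 18 * fs, fs)                 # one simulated beat per second
pre, post = int(0.2 * fs), int(0.6 * fs)
epochs = np.stack([tfr[:, r - pre:r + post] for r in r_peaks])
```

Each entry of `epochs` is one single-trial HRSP of shape [frequencies × time points]; because the wavelets see the continuous signal, there are no epoch-boundary artifacts at low frequencies.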
To adapt to shorter RRIs (under 0.8 s), we employed an interpolation scheme. For these shorter epochs, we divided the RR interval into four segments, designating the last quarter before the R-peak as the pre-HRSP and the first three quarters after the R-peak as the post-HRSP. These segments were then interpolated to standardize the HRSP duration across all epochs. To preserve the R-peak location before and after interpolation, bilinear interpolation was applied to the pre-HRSP and post-HRSP segments separately. Specifically, the pre-HRSP segment was stretched to span from −0.2 s to the R-peak, and the post-HRSP segment was stretched from the R-peak to 0.6 s. These stretched segments were then concatenated to form single-trial HRSPs, ensuring uniform time points across all HRSPs, independent of the RRI length (Supplementary Figure 1).
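The quarter-split stretching can be sketched as below: the last RRI quarter before the R-peak is mapped onto [−0.2 s, 0] and the three quarters after it onto [0, 0.6 s], with linear interpolation along time applied independently at each frequency (function name and sample counts are illustrative assumptions).

```python
import numpy as np

def stretch_epoch(tfr, r_idx, rri_samples, fs=250):
    """Map a short-RRI epoch onto the standard [-0.2 s, 0.6 s] HRSP grid.
    tfr: [freqs x samples] time-frequency map; r_idx: R-peak sample index."""
    q = rri_samples // 4
    pre = tfr[:, r_idx - q:r_idx]            # last quarter of the RRI before the R-peak
    post = tfr[:, r_idx:r_idx + 3 * q]       # first three quarters after the R-peak
    n_pre, n_post = int(0.2 * fs), int(0.6 * fs)

    def stretch(seg, n_out):
        # Linear interpolation of each frequency row onto the target sample count
        x_old = np.linspace(0.0, 1.0, seg.shape[1])
        x_new = np.linspace(0.0, 1.0, n_out)
        return np.stack([np.interp(x_new, x_old, row) for row in seg])

    # Concatenate so the R-peak stays at the pre/post boundary
    return np.hstack([stretch(pre, n_pre), stretch(post, n_post)])

tfr = np.random.default_rng(0).standard_normal((16, 1000))
epoch = stretch_epoch(tfr, r_idx=400, rri_samples=160)   # RRI of 0.64 s at 250 Hz
```

Because pre- and post-segments are stretched separately and joined at the R-peak, the peak's position in the standardized epoch is identical for every heartbeat regardless of RRI.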
Each HRSP underwent single-trial normalization using a z-transformation with the mean and standard deviation of the entire time period at each frequency, aiming to reduce variability between trials (Grandchamp and Delorme, 2011).
Furthermore, we implemented baseline correction of the time series at each frequency for each trial to minimize the influence of low-frequency noise and enhance the signal-to-noise ratio, facilitating smoother data analysis (Freeman et al., 2000; Slotnick et al., 2002; Oostenveld et al., 2011). We utilized the z-score method for baseline correction, which is well regarded in time-frequency analyses similar to HRSP, as evidenced by findings in studies such as Roehri et al. (2016) and Young et al. (2019). This process involved a z-transformation using the mean and standard deviation determined during the baseline period from −0.2 s to −0.1 s. This period was chosen because it is minimally influenced by the R-peak, enhancing the accuracy and consistency of our spectral analysis (Coll et al., 2021; Park et al., 2018).
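The two normalization steps above (full-epoch z-transform per frequency, then z-score baseline correction against −0.2 s to −0.1 s) can be sketched as one short function; the name and sampling parameters are illustrative.

```python
import numpy as np

def normalize_hrsp(epoch, fs=250):
    """Single-trial normalization of one HRSP epoch.
    epoch: [freqs x time], time axis spanning -0.2 s to 0.6 s."""
    # 1) Full-epoch z-transform at each frequency to reduce trial-to-trial variability
    z = (epoch - epoch.mean(axis=1, keepdims=True)) / epoch.std(axis=1, keepdims=True)
    # 2) z-score baseline correction using -0.2 s to -0.1 s (first 0.1 s of the epoch)
    base = z[:, : int(0.1 * fs)]
    return (z - base.mean(axis=1, keepdims=True)) / base.std(axis=1, keepdims=True)

hrsp = normalize_hrsp(np.random.default_rng(0).standard_normal((16, 200)))
```

After baseline correction, values express deviations from the pre-R-peak baseline in baseline standard-deviation units at each frequency.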

HRSP CNN model
We constructed a deep learning framework for classifying a single HRSP using a 3D CNN architecture, as shown in Fig. 2. The CNN model has a basic feature extraction module that processes a defined set of HRSP frequency bands.
The feature extraction process follows a series of structured steps. Initially, 3D convolutional layers (LeCun et al., 1998; Tran et al., 2015) perform convolution operations with sliding filters on the time-frequency characteristics of each component, effectively extracting relevant features. These features are then normalized using batch normalization, and a ReLU activation is applied to introduce nonlinearity. Following this, 3D max-pooling layers reduce the dimensionality of the time-frequency characteristics for each component, retaining only the most salient features. This sequence of steps is repeated two more times to enhance the feature extraction process. Subsequently, the extracted features are converted into one-dimensional data through a flatten layer. Classification is then performed using a multilayer perceptron classifier, which includes two fully connected layers, a dropout layer (Srivastava et al., 2014), and a softmax output layer.
This design efficiently handles the three-dimensional HRSP data with a size of components (C) × frequency (F) × time (T), ensuring the extraction of distinct time-frequency characteristics while maintaining channel independence. After flattening, the model addresses feature interdependencies across channels.
For model training, we employed the Adam optimizer (Kingma and Ba, 2014), with an initial learning rate of 0.001 and a maximum of 30 epochs. To mitigate overfitting, an early stopping criterion was applied based on the validation loss, ceasing training if no improvement was observed for five consecutive epochs. Given the relatively small dataset size (about 400 samples for individual datasets), the mini-batch size was set to 32. For larger group datasets comprising 6800 samples (400 × 17), the batch size was increased to 128 to optimize learning from the more extensive data collection. Validation frequency was maintained at every 50 steps throughout all training phases. The models were configured and trained in the MATLAB (R2023b) environment.
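The architecture described above (three repetitions of 3D convolution, batch normalization, ReLU, and max pooling that spares the component axis, followed by flatten, two fully connected layers, dropout, and softmax) was built in MATLAB. A PyTorch sketch of the same structure is shown below; the filter counts, kernel sizes, and hidden width are illustrative assumptions, since only the layer ordering is specified in the text.

```python
import torch
import torch.nn as nn

def conv_block(cin, cout):
    """Conv3d -> BatchNorm -> ReLU -> MaxPool over frequency and time only."""
    return nn.Sequential(
        nn.Conv3d(cin, cout, kernel_size=3, padding=1),
        nn.BatchNorm3d(cout),
        nn.ReLU(),
        nn.MaxPool3d(kernel_size=(1, 2, 2)),  # keep the component axis intact
    )

class HRSPCNN(nn.Module):
    """3D CNN over HRSP input of size components (C) x frequency (F) x time (T)."""
    def __init__(self, n_classes=2):
        super().__init__()
        self.features = nn.Sequential(conv_block(1, 8), conv_block(8, 16), conv_block(16, 32))
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.LazyLinear(64), nn.ReLU(), nn.Dropout(0.5),
            nn.Linear(64, n_classes), nn.Softmax(dim=1),
        )

    def forward(self, x):                   # x: [batch, 1, C, F, T]
        return self.classifier(self.features(x))

model = HRSPCNN().eval()
x = torch.randn(4, 1, 10, 16, 200)          # batch of 4; 10 components, 16 freqs, 200 time points
y = model(x)
```

Pooling only along frequency and time preserves per-component feature maps until the flatten layer, which is where cross-component interdependencies are first combined, matching the design rationale above.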

Offline evaluation of HRSP properties for interoceptive attention state classification
In our study, we focused on evaluating three key aspects of HRSP for classifying interoceptive attention states. First, we applied two types of ICA: individual ICA on individual EEG data to extract unique brain components for each participant, and group ICA on a combined dataset to identify common components across participants. Both serve as input data for a CNN model after HRSP transformation. We expected that group ICA would enhance classification robustness and reduce the amount of individual data needed for training. Second, we analyzed which frequency bands within the HRSP were most indicative of specific brain states, aiding more accurate state classification. Lastly, we examined the minimal number of epochs to average in the HRSP calculation to find a balance between reducing noise for higher classification accuracy and maintaining the HRSP's dynamic representation. Our analyses were grounded in a binary classification experiment, distinguishing between heartbeat-focused and sound-focused conditions. We also evaluated classification accuracy using the four-state experimental data.

Application of individual ICA and individual CNN model for classification of individual brain states
When employing the individual ICA and model, participant-specific independent brain components and their respective time series were used for HRSP analysis. We assessed the classification accuracy of each individual's CNN model using a 5-fold cross-validation (CV) method. The methodologies and experimental approaches using individual unique components and models are depicted in Fig. 3B(1).

Application of group ICA and group CNN model for classification of individual brain states
For group ICA and model, we evaluated classification performance using a leave-one-out strategy.In this approach, we trained the model on the data from 16 participants and tested it on the remaining participant.
The training for the 16 participants involved concatenating their data and performing a 5-fold CV.This procedure was repeated for all 17 participants to determine the average classification accuracy when applying a group-trained model to new individual data.The methodologies used are depicted in Fig. 3B(2).

Frequency bands and number of averaging epochs for classification
We explored which frequency bands of HRSP most effectively reflect specific brain states by conducting a comparative analysis of classification accuracy across five frequency bands. This selection was informed by prior research (Makeig et al., 2002; Park et al., 2018) and included: theta (5-8 Hz), alpha (9-12 Hz), low beta (13-20 Hz), combined theta+alpha (5-12 Hz), and combined theta+alpha+low beta (5-20 Hz).
To find the minimal number of averaging epochs that reliably represent brain states in the midst of ongoing EEG activity, we evaluated the classification accuracy of HRSP by averaging different numbers of consecutive epochs: 1, 3, 5, 7, and 9.
We also conducted a temporal similarity analysis among all HRSPs across all epochs of the experiment to explore the frequency of state changes reflected in the similarity of consecutive HRSPs. Similarity was defined as the correlation coefficient between HRSPs after vectorizing all brain components and frequencies (5-20 Hz) of each HRSP. Let each individual have a total of N consecutive heartbeats over the entire experiment, resulting in N consecutive HRSPs in temporal order. We evaluated the correlation coefficient as a similarity index between all pairs of the N HRSPs, resulting in an N × N similarity matrix. We hypothesized that temporally neighboring HRSPs would exhibit higher similarity than those more temporally distant. Additionally, we anticipated that states would persist for some duration; within the same state, the similarity across consecutive HRSPs should be high, decreasing at transitions to different states. This temporal similarity matrix is therefore expected to reveal the frequency of state transitions.
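The N × N similarity matrix above is straightforward to compute once the HRSPs are vectorized. The NumPy sketch below illustrates it with simulated data in which two "states" alternate, so that within-state similarity exceeds between-state similarity; all sizes and the two-state structure are illustrative.

```python
import numpy as np

def hrsp_similarity_matrix(hrsps):
    """hrsps: [n_epochs x components x freqs x time].
    Returns the n x n matrix of correlation coefficients between vectorized HRSPs."""
    n = hrsps.shape[0]
    return np.corrcoef(hrsps.reshape(n, -1))

rng = np.random.default_rng(0)
# Two simulated states: epochs 0-49 share one pattern, epochs 50-99 another
state_a, state_b = rng.standard_normal((2, 5, 16, 200))
hrsps = np.concatenate([
    state_a + 0.5 * rng.standard_normal((50, 5, 16, 200)),
    state_b + 0.5 * rng.standard_normal((50, 5, 16, 200)),
])
sim = hrsp_similarity_matrix(hrsps)

within = sim[:50, :50][np.triu_indices(50, 1)].mean()   # same-state pairs
between = sim[:50, 50:].mean()                          # cross-state pairs
```

In the real data, blocks of high similarity along the diagonal would mark state persistence, and drops in similarity would mark the transitions the analysis is designed to reveal.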
The selection of frequency bands and the minimal number of averaging epochs were simultaneously assessed to identify the combinations that maximize classification accuracy between attending to the heartbeat and external sound.This assessment utilized components derived from both individual ICA and group ICA, aiming to optimize the parameters for accurate real-time classification of brain states.

Sham HRSP with sham R peaks
To determine whether the classification of interoceptive attention states using HRSP was due to brain responses time-locked to R-peaks, we generated sham R-peaks by adding random jitters ranging from −500 ms to 500 ms to the original R-peak positions, following the methodology of previous studies (Kim et al., 2017; Simor et al., 2021). Using these sham R-peaks, we created sham HRSPs with the same procedure as for the original R-peak-based HRSPs. Classification was then carried out using the input frequency band conditions and the number of averages that previously yielded the highest classification accuracy in offline tests. This process was replicated ten times to ensure the consistency and reliability of the findings.
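The sham-R-peak generation (uniform ±500 ms jitter of each R-peak) can be sketched as follows; the function name and sampling rate are illustrative.

```python
import numpy as np

def make_sham_r_peaks(r_peaks, fs=250, jitter_s=0.5, seed=0):
    """Shift each R-peak index by a uniform random jitter in [-500 ms, +500 ms]."""
    rng = np.random.default_rng(seed)
    jitter = rng.uniform(-jitter_s, jitter_s, size=len(r_peaks))
    # Sort so the sham markers remain in temporal order for epoching
    return np.sort(r_peaks + np.round(jitter * fs).astype(int))

r_peaks = np.arange(500, 10000, 200)   # simulated beats, 0.8 s apart at 250 Hz
sham = make_sham_r_peaks(r_peaks)
```

The sham markers keep the overall heartbeat rate and count but destroy the precise time-locking to the true R-peaks, which is exactly the property the control analysis tests.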

Grad-CAM analysis
To examine the key factors influencing the classification of brain states (specifically, group brain components, frequency bands, and temporal ranges), we utilized the Grad-CAM technique (Selvaraju et al., 2017). We trained and tested a model using concatenated HRSP data from all participants, organized into five folds for 5-fold CV. Each fold was randomly designated as test data once, while 20% of the remaining data was used for validation during model training and accuracy assessment. This evaluation was not intended for predicting individuals but for evaluating the model's classification characteristics.
We applied Grad-CAM to each model and dataset configuration to visualize which features were most influential in the classification decisions. This analysis was also extended to the classification tasks involving sham HRSP data to compare the impact of genuine versus randomized R-peak alignments on model interpretability. The methodological flow, including the Grad-CAM application, is outlined in Fig. 3B(3).

Classification of heartbeat attention levels
In the binary-state experiment, we conducted a classification analysis to determine whether varying levels of attention to heartbeats could be discerned using the HRSP features that were previously effective in the binary-state classification. This aimed to assess whether these features could not only differentiate between interoceptive and non-interoceptive attention states but also identify different degrees of interoceptive engagement within the heartbeat-focused tasks.
Participants in the original study by Petzschner et al. (2019) were asked to associate their perceived heartbeat intensity with the color red after a 20-s task block, which ostensibly aimed to enhance task engagement. We interpreted these responses as indicative of the participants' attention levels toward their heartbeat. For analytical purposes, we categorized each participant's responses across all session blocks (10 in total) into three attention levels: high, medium, and low. This categorization was done by sorting the ten scores and dividing them into three high, four medium, and three low blocks. We applied an offline group ICA approach and used leave-one-out CV to test all ten heartbeat blocks of a new participant.
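The 3/4/3 sort-and-split labeling described above can be sketched as follows; the function name and the example scores are hypothetical.

```python
import numpy as np

def categorize_blocks(scores):
    """Sort ten block scores and label the top 3 'high', the middle 4
    'medium', and the bottom 3 'low' (the 3/4/3 attention-level split)."""
    scores = np.asarray(scores)
    order = np.argsort(scores)          # indices from lowest to highest score
    labels = np.empty(len(scores), dtype=object)
    labels[order[:3]] = "low"
    labels[order[3:7]] = "medium"
    labels[order[7:]] = "high"
    return labels

labels = categorize_blocks([4, 9, 1, 7, 5, 8, 2, 6, 3, 10])
```

Each block's HRSP then inherits the label of its block for the three-class attention-level classification.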
Due to the limited number of evaluation scores (only five) in the four-class experiment, we did not extend this attention-level analysis to that dataset, as the small sample size could undermine the reliability of the findings.

Online classification
The current study evaluates the online classification accuracy of brain states based on HRSP, highlighting the distinction between online and offline classification approaches. Offline classification leverages the entire EEG dataset for estimating individual ICA and training the classifier model, while online classification requires only the early parts of experimental data for ICA and model training, which are then applied to later experimental data. Thus, unlike offline methods where training and testing datasets are shuffled to evaluate classification accuracy, online classification strategically uses data sequentially to train the model, subsequently testing the model's accuracy on new participants' EEG data.
In the online classification process, we evaluated two methods: individual ICA with a model trained specifically for each participant, and group ICA with a group-trained model. For the individual approach, ICA and CNN model training were conducted using the early segments of a 20-block session for each participant, with subsequent blocks used to assess the model's accuracy. In contrast, the group model approach involved conducting ICA and model training using data combined from all participants, and then applying this model to the same later blocks of data used in the individual approach. This evaluation provided insights into the effectiveness of a generalized group model versus a tailored individual model for online classification.
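The offline/online distinction drawn above amounts to two different ways of splitting blocks; a minimal sketch (block counts follow the 20-block session and the 8-block training segment reported later, but the helper names are hypothetical):

```python
import numpy as np

def offline_split(n_blocks, seed=0):
    """Offline protocol: blocks are shuffled before being divided into folds
    for cross-validation."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_blocks)
    return np.array_split(order, 5)           # 5-fold CV

def online_split(n_blocks, n_train=8):
    """Online protocol: the first n_train blocks train ICA and the CNN; all
    later blocks are strictly held out for testing (no shuffling)."""
    idx = np.arange(n_blocks)
    return idx[:n_train], idx[n_train:]

folds = offline_split(20)
train, test = online_split(20)
```

The key design point is that the online split preserves temporal order, so the classifier never sees data recorded after its training segment.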

The effects of EEG sample sizes for online individual ICA
When group data is unavailable, individual EEG data must be collected for each participant to apply ICA and train a classification model. The method involves using an unmixing matrix derived from the initial segments of each participant's EEG data.
To establish the optimal length for effective ICA, we conducted ICA on the first ten blocks of each individual's EEG, removing nonbrain components to produce HEP and HRSP. These outputs were then compared with those generated from the entire EEG dataset to determine the most suitable block length for ICA.
Once the optimal block length was determined, ICA was performed only on these initial blocks to create an unmixing matrix. This matrix was then used to generate HRSPs across the full dataset. The HRSPs derived from these initial blocks served as training data for the model, while HRSPs from subsequent blocks were used as test data to evaluate classification accuracy. This method allows us to assess the minimum EEG data length necessary for reliable online individual ICA applications, as illustrated in Fig. 3B(4).
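Applying an early-segment unmixing matrix to the full recording is a single matrix product. The sketch below uses random stand-ins for the EEG and for the unmixing matrix W; in a real pipeline W would come from ICA (e.g., Infomax) fitted to the early blocks only, and the channel/component counts are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
eeg_train = rng.standard_normal((32, 5000))   # channels x samples, early blocks
eeg_full = rng.standard_normal((32, 20000))   # channels x samples, full session

# Hypothetical unmixing matrix (components x channels); a real one would be
# estimated by ICA on eeg_train alone, never on the later test blocks.
W = rng.standard_normal((12, 32))

sources_train = W @ eeg_train   # component time series for training HRSPs
sources_full = W @ eeg_full     # the same W projects the whole session
```

Because W is fixed after the training segment, the later blocks can be unmixed sample-by-sample as they arrive, which is what makes the online application feasible.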

Evaluation of group ICA and group CNN model for online classification of each individual HRSP
In scenarios where pre-acquired group data is available, we consider using group brain components along with a group CNN model for online classification, rather than deriving individual-specific components from small initial EEG samples. This approach assumes that brain components are consistent across individuals and reflect similar brain states.
To assess the applicability of the group ICA and group CNN model, we first perform group ICA using data from all participants except one. The unmixing matrix derived from this group data is then applied to the EEG of the excluded individual to extract that individual's brain-component time series for HRSPs. Model training is similarly conducted on the selected group data, excluding one individual, using 5-fold CV, resulting in five different models. We then evaluate the average classification accuracy of these models for each new individual who was not included in the model training, without any individual-specific adjustments.
These methods and evaluations are detailed in Fig. 3B(5), which illustrates the process of employing group ICA and CNN models for the online classification of individual HRSPs.
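The leave-one-out scheme for the group evaluation can be sketched as a simple generator; participant IDs and the helper name are hypothetical.

```python
def leave_one_out_folds(participants):
    """Yield (training group, held-out participant) pairs: each participant is
    excluded once, and the rest form the group used for group ICA and for
    training the group CNN."""
    for i, test_p in enumerate(participants):
        train_ps = participants[:i] + participants[i + 1:]
        yield train_ps, test_p

folds = list(leave_one_out_folds(["P01", "P02", "P03"]))
```

For each fold, group ICA and the 5-fold-CV model training run only on `train_ps`, and the resulting unmixing matrix and models are applied unchanged to `test_p`.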

Results
The self-evaluation scores for attention to tasks in 17 participants in the binary-state experiment were 4.22 (SD = 2.48) in the heartbeat task (three participants each had one missing value, which was replaced with the mean of that participant's remaining values) and 3.92 (SD = 3.05) in the sound task. A paired t-test conducted on the average scores of the heartbeat and sound tasks for each participant revealed no significant difference (p = 0.30).
The self-evaluation scores for attention to tasks measured for 12 participants in the four-state experiment were as follows: attention to heartbeats, 7.40 (SD = 1.01); white noise, 8.10 (SD = 0.58); time passage, 7.89 (SD = 0.49); and toe, 7.19 (SD = 0.79). One-way ANOVA results indicated that the differences in attention scores across tasks were statistically significant, F(3, 44) = 3.81, p = 0.016. Post hoc paired t-tests revealed significant differences between heartbeats and white noise (p = 0.037), white noise and toe (p = 0.0024), and time passage and toe (p = 0.016), while no significant differences were found in other comparisons.

Full-epoch average HRSPs versus trial-based HRSPs
Fig. 4A and B illustrate examples of HRSPs based on a smaller number of averaging epochs rather than averaging across entire epochs. Fig. 4C displays similarity maps that compare HRSPs across different epochs of an entire session for varying numbers of averaging epochs. Visual inspection of the 5- and 7-epoch averages in Fig. 4C shows that some blocks can be segmented into three or four subblocks. This partitioning indicates the presence of three or more distinct states within each 20-s block, with each state lasting approximately five to seven epochs. These observations underscore the variation in HRSP throughout a session within a single participant. They highlight the necessity of adopting a dynamic approach to HRSP analysis and of determining the appropriate number of averaging epochs to capture nuanced changes in brain states effectively.
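A similarity map of the kind shown in Fig. 4C can be sketched as a moving average over consecutive HRSP epochs followed by pairwise correlation; the toy dimensions and the function name below are illustrative, not the study's actual parameters.

```python
import numpy as np

def similarity_map(hrsp_epochs, n_avg):
    """Average n_avg consecutive HRSP epochs (moving window), then return the
    pairwise Pearson correlation between the flattened averaged maps."""
    n = len(hrsp_epochs) - n_avg + 1
    avgs = np.stack([hrsp_epochs[i:i + n_avg].mean(axis=0).ravel()
                     for i in range(n)])
    return np.corrcoef(avgs)            # (n, n) similarity matrix

rng = np.random.default_rng(0)
epochs = rng.standard_normal((30, 8, 16))   # epochs x freq x time (toy sizes)
sim = similarity_map(epochs, n_avg=5)
```

Blocks of high off-diagonal similarity in such a matrix correspond to stretches of epochs sharing the same underlying state, which is how the five-to-seven-epoch subblocks become visible.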
The paired t-test between the heartbeat and sound conditions across all individuals showed no significant differences in any components or within any specific time-frequency domains in the full-epoch average HRSPs after FDR correction (corrected p < 0.05) (Supplementary Figure 2).

Offline classification performance of an individual CNN model with HRSP for individual ICA components
In the binary-state experiment, the offline application of individual ICA revealed an average of 11.71 ± 6.29 brain components per participant. Classification using the combined theta and alpha bands of HRSPs, averaged over nine epochs, yielded a high accuracy of 98.39 ± 0.96%. Specifically, the true positive rate (TPR) for accurately predicting the attention state to self-heartbeats was 98.18 ± 1.43%, and the TPR for white noise was 98.64 ± 1.17%. These results are presented in Table 1.

Table 1
Offline classification accuracy of binary-state experimental data (N = 17) after 5-fold CV using individual models and individual components (chance level = 50%). Each cell shows accuracy (%) and standard deviation as a function of the number of averaging epochs.

In the four-state experiment (Table 2), individual ICA yielded an average of 8.08 ± 4.12 brain components. The highest classification accuracy, 91.01 ± 5.01%, was achieved when using the combined theta and alpha band with averaging over five epochs. Specifically, this configuration yielded a TPR of 99.67 ± 1.15% for self-heartbeats, 90.13 ± 7.60% for white noise, 78.50 ± 14.02% for time passage, and 87.13 ± 10.87% for toe.
The number of averaging epochs does not significantly impact performance in offline classification using individual models. Even single-trial HRSPs achieve relatively high classification accuracy when distinguishing between attending to heartbeats and attending to white noise.

Offline classification performance of group ICA and group CNN model in individual brain state classification
In the binary-state experiment, upon implementing group ICA, 11 common brain components were identified. The average classification accuracy following a leave-one-out CV approach, using group ICA and group models (trained on 16 participants and tested on one), demonstrated that using an average of nine consecutive HRSPs in the theta frequency band yielded the highest accuracy, recorded at 97.60 ± 1.71% (Table 3). This setting achieved a TPR of 97.34 ± 3.80% for self-heartbeats and 97.91 ± 2.37% for sound stimuli.
For the four-state experiment, group ICA identified nine brain components. Using the group ICA and group model, the theta band averaged over seven epochs achieved the highest classification accuracy of 85.77 ± 4.77%. This yielded a TPR of 98.59 ± 1.66% for self-heartbeats, 92.23 ± 4.64% for white noise, 75.94 ± 15.87% for time passage, and 76.76 ± 26.41% for toe. These results are detailed in Table 4.
In summary, configurations incorporating the theta band consistently exhibited the highest classification accuracies as documented in Tables 1 and 3.
The classification models' accuracies using group information indicated that averaging more HRSP epochs generally resulted in higher classification accuracies than using a single HRSP epoch. Among the averaging epoch options of five to nine, which demonstrated maximal performance across various conditions (Table 5), we selected an averaging method of five epochs for online classification to balance responsiveness and accuracy. This choice effectively accommodates the frequent transitions between brain states while maintaining high classification performance.

Specificity results of HRSP
Fig. 5A shows the classification accuracies for classifiers based on real and sham HRSPs. Using the binary-state data with the theta frequency band and averaging nine epochs, the classification accuracy (5-fold CV) for sham HRSPs reached 87.93 ± 10.09%. In contrast, for the four-state data with the theta band and averaging seven epochs, the accuracy for sham HRSPs was only 61.76 ± 7.28%. These results are notably lower than those obtained using actual HRSPs, which recorded 97.60 ± 1.58% (binary state) and 85.77 ± 4.93% (four state), with significant differences (p = 0.002 and p < 0.001, respectively), as noted in Tables 3 and 4.
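The real-versus-sham comparison rests on a paired t statistic; a minimal sketch with toy accuracy values (not the study's data) illustrates the computation.

```python
import numpy as np

def paired_t(x, y):
    """Paired t statistic: t = mean(d) / (sd(d) / sqrt(n)), where d = x - y
    and sd uses the n - 1 (sample) denominator."""
    d = np.asarray(x, float) - np.asarray(y, float)
    return d.mean() / (d.std(ddof=1) / np.sqrt(len(d)))

# toy per-fold accuracies: real HRSP vs. sham HRSP (illustrative numbers)
t = paired_t([98, 97, 99, 96], [88, 85, 90, 87])
```

The p-value then follows from the t distribution with n − 1 degrees of freedom (e.g., via `scipy.stats.ttest_rel` in a full pipeline).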
The Grad-CAM results for real and sham HRSPs are illustrated in Fig. 6. For real HRSPs, the time-frequency maps of the central regions, including the supplementary motor area and middle cingulate cortex, predominantly utilized the time range from 0.2 s to 0.6 s post-R-peak, focusing mainly on the theta band (Component 3). The superior parietal area (Component 8) was primarily active up to 0.35 s post-R-peak, engaging both theta and alpha bands. Additionally, the precuneus area (Component 6) demonstrated a notable increase in theta band activity from 0.45 to 0.6 s post-R-peak. The right centro-lateral area, extending to the insula (Component 5), was activated in the alpha and theta frequency bands, suggesting these regions' critical roles in processing interoceptive signals (Fig. 6A).
Conversely, the analysis of sham HRSPs consistently involved the left lateral occipital area (Component 2), precuneus area (Component 6), left central area (Component 7), and superior parietal area (Component 8) across the entire time domain. This diffuse activity across various brain regions in the sham condition (Fig. 6B) suggests a generalized brain response that contrasts sharply with the more localized (in the temporal and frequency domains) and specific activations observed in the real HRSP condition.
These findings underscore the specificity of real HRSP in capturing R-peak-locked brain responses associated with interoceptive attention states compared to the broader, less specific activations in sham HRSPs.

Classification of heartbeat attention levels
The experiment to classify attention scores was conducted using two methods: one encompassing the full frequency spectrum (5-20 Hz) and the other focusing on the theta+alpha bands (5-12 Hz), excluding the low beta band, which generally exhibited the lowest task classification accuracy. The overall average classification accuracies across 17 participants were 77.00 ± 7.92% for the full frequency spectrum and 80.75 ± 8.26% for the theta+alpha bands. The TPR for high attention states reached 93.89 ± 6.80% and 99.30 ± 1.10%, respectively. However, the TPR varied considerably among participants for medium states (68.81 ± 28.40%) and low states (56.90 ± 37.58%). The classification results for different attention levels are depicted in Fig. 7.

Online classification performance of using HRSP for individual ICA components
Prior to conducting online individual ICA analysis, the effect of EEG sample size on ICA reliability was explored. As illustrated in Fig. 5B, compared to using the entire dataset for ICA, the greatest reduction in the average MSE (mean square error) occurred when using seven blocks, dropping from 0.29 to 0.14.
Therefore, when employing individual ICA, at least 35% of the total EEG data (7 blocks out of 20) should be utilized to achieve decomposition performance similar to that obtained using the full EEG dataset in the current experiment. In this study, eight blocks, rather than seven, were chosen for online ICA in consideration of the four-condition block structure of the experimental design.
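The MSE criterion used to compare partial-data and full-data ICA outputs is straightforward; a minimal sketch with toy maps (the helper name and values are illustrative):

```python
import numpy as np

def mse(a, b):
    """Mean square error between two HRSP/HEP maps of identical shape."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    return float(np.mean((a - b) ** 2))

# toy maps: output from full-data ICA vs. output from a partial-data ICA
full = np.ones((8, 16))
partial = np.full((8, 16), 1.5)
err = mse(partial, full)   # constant offset of 0.5 -> MSE of 0.25
```

Sweeping the number of training blocks and plotting this error against the full-data reference is what identifies the seven-block elbow in Fig. 5B.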
In the online classification process, two frequency input methods (theta alone, and combined theta and alpha) were evaluated using an average of five HRSP epochs.
Results of the online individual classification showed that when using the theta band alone with binary-state experimental data, the classification accuracy was 95.51 ± 3.90%.

Online classification performance of using HRSP for group ICA components
In the online classification process of the binary-state experiment, as depicted in Fig. 5C, the classification accuracy of the individual model, when applied to the later part of the EEG, achieved 95.51 ± 3.90% using the theta band. Meanwhile, the group ICA and group model, without fine-tuning for individual application, showed an accuracy of 97.02 ± 1.62% using the theta band, indicating no significant difference in utilizing group versus individual information.
For the four-state experimental data, training a CNN model with the theta band of HRSPs from the first eight blocks out of a total of 20 and testing on the latter 12 blocks using individual ICA components and individual CNN models resulted in a classification accuracy of 77.31 ± 5.23%. However, employing group ICA and a group model for classifying an individual's EEG significantly increased accuracy to 88.02 ± 3.30% for the latter 12 blocks (Fig. 5D).
The repeated measures ANOVA revealed significant main effects of frequency band and model type, F(3, 33) = 38.17, p < 0.001. Additionally, post hoc paired t-tests showed that the theta band alone was more advantageous for classification than the combined theta and alpha frequency bands in both the individual method (p = 0.0029) and the group method (p < 0.001). Furthermore, when using only the theta band, a comparison between the individual and group methods revealed that the accuracy of the group method was significantly higher (p < 0.001).
It should be noted that the classifier using the theta frequency band in both individual and group models, as shown in the confusion matrices detailing the percentage distribution of prediction outcomes for each actual category (Fig. 7C, D), achieved TPRs of 96.48 ± 9.59% and 98.21 ± 1.75% for the concentration state focused on self-heartbeats. However, it exhibited lower TPRs for white noise (88.14 ± 9.63% and 91.20 ± 6.46%), time passage (49.02 ± 24.88% and 76.38 ± 13.41%), and toe (75.44 ± 21.72% and 86.77 ± 10.80%).

Discussion
In this study, we explored how attention to heartbeats modulates heartbeat-related neural synchrony (Petzschner et al., 2019; Wang et al., 2019), termed HRSP, in the time-frequency domain using a CNN for classification analysis. This analysis aimed to evaluate the characteristics and practical utility of HRSP for online applications.
Key questions addressed in our analysis of HRSP characteristics include: (1) Does the classifier predominantly utilize brain responses that are synchronized with R-peaks? (2) Are these R-peak-locked brain responses specifically associated with interoceptive processes?

Brain state classification using R-peak locked HRSP
The experimental tasks in this study involve attending to self-heartbeats and external stimuli, each differing in cognitive demand. Notably, attending to one's own heartbeat typically requires more cognitive effort than responding to external stimuli, monitoring time, or attending to bodily sensations (Hina and Aspell, 2019; Vig et al., 2021; Murphy, 2024). Given that each task lasts 30 s, it is plausible that the sustained cognitive processes differ between tasks. These processes might not align directly with R-peaks but could manifest as prolonged brain rhythms.
This consideration leads to a crucial analysis point: the possibility that the differences in classification accuracy between task types could stem from ongoing brain responses, rather than being strictly tied to R-peak-locked processing. This insight prompts a reevaluation of our HRSP-based CNN approach, questioning its reliance on features beyond those that are R-peak locked.
To investigate this, we performed classification tests using HRSPs generated with sham R-peaks, following the previous literature (Park et al., 2016; Kim et al., 2017; Simor et al., 2021). These tests yielded significantly reduced classification accuracy compared to real HRSPs, supporting the idea that HRSP captures brain responses that are time-locked to R-peaks and linked to heartbeat detection. This conclusion aligns with previous studies (Schandry et al., 1986; Montoya et al., 1993; Petzschner et al., 2019; Coll et al., 2021) documenting heartbeat-evoked potentials (HEPs) within the 200-600 ms range post-R-peak, which our HRSP results reflect in later time ranges (Figs. 5A and 6A).
However, we must also consider the presence of oscillatory components that persist over extended periods, potentially related to states of alertness (Stevner et al., 2019; McCormick et al., 2020; Huang et al., 2023). The fact that classification accuracy with sham HRSPs remained above chance suggests that the CNN may be detecting features indicative of sustained differences in brain states across tasks. This observation implies that HRSP encompasses both R-peak-induced responses and more sustained neural activity patterns. Grad-CAM analysis further illustrates this point (Fig. 6B). The model trained on sham HRSPs highlights features distributed broadly across all time ranges, particularly in higher frequency bands (Components 1, 2, 4, 6) and nearly the entire frequency spectrum (Component 7). This result suggests that sham HRSP captures some ongoing brain activity patterns that help differentiate interoceptive brain states from other states. In contrast, the Grad-CAM results for the model trained on real HRSPs emphasize its ability to pinpoint R-peak-locked responses primarily within the theta and alpha frequency bands, alongside broader sustained neural oscillations across higher or whole frequency bands. Notably, contributions from specific brain components (e.g., Components 3, 6, and 8) within particular time windows were observed, likely corresponding to R-peak-related brain responses. At the same time, components such as 4 and 5, which span more extended periods within an HRSP trial, show a periodic pattern of brain activity, contributing to sustained attention or alertness states.

R-peak locked HRSP and interoceptive processing
The second question, concerning whether attention to heartbeats modulates interoceptive processing, remains partially unresolved in this study. Interoception and its neural correlates often involve preconscious processing (Critchley and Garfinkel, 2017; Schmitt and Schoen, 2022). We hypothesized that attending to one's own heartbeat could influence this preconscious R-peak-locked processing, which the CNN then distinguishes from tasks not centered on heartbeat attention.
Source localization of brain components showing localized spectra in the Grad-CAM analysis, together with their time-frequency characteristics, suggests that interoceptive processing involves regions such as the insula and somatosensory cortex (Critchley et al., 2004; Farb et al., 2013; Stern et al., 2017; Petzschner et al., 2019; Klabunde et al., 2019; Wang et al., 2019; Haruki and Ogawa, 2022; Burrows et al., 2022), which are known to be engaged in such processing. The precuneus is also implicated in accessing interoceptive information and forming the subjective experience of emotional states (Terasawa et al., 2013; Kerr et al., 2016).
Grad-CAM analysis highlighted brain components implicated in interoceptive processing: Component 3, covering the supplementary motor area and middle cingulate cortex, associated with the sensorimotor area and the medial part of the salience network (Ketai et al., 2016; Stern et al., 2017; Schimmelpfennig et al., 2023); right centro-lateral Component 5, extending into the insula (Critchley et al., 2004; Farb et al., 2013; García-Cordero et al., 2017; Burrows et al., 2022; Haruki and Ogawa, 2022), part of the sensorimotor and salience networks; and Component 6, overlapping the precuneus (Terasawa et al., 2013; Kerr et al., 2016), part of the default mode network. These components were crucial for the classification. The superior parietal cortex (Component 8) (Klabunde et al., 2019; Haruki and Ogawa, 2022; Balconi and Angioletti, 2023) also showed early responses to R-peaks, followed by notable spectral increases around 200-600 ms post-R-peak during the interoceptive state, consistent with the integration of interoceptive sensations into awareness.
Our experiment effectively classified attention levels into three categories based on participants' self-reported attention states. However, since these levels derive from subjective self-reports rather than direct measures of interoceptive engagement, the high predictive performance of our HRSP-based classification system should not be viewed as definitive evidence of HRSP's ability to discern detailed interoceptive attention states. Nonetheless, HRSP may be used to reflect attention levels that participants can consciously recognize.

Dynamic nature of HRSP
We adopt a dynamic perspective on interoceptive attention states, which led us to evaluate the minimal number of averaging epochs. Averaging across many epochs improves the signal-to-noise ratio (Jung et al., 2000; Gwin et al., 2010), yet it may reduce sensitivity to quick changes in brain states. In our study, we chose averaging over five epochs for online classification, as it maintains classification accuracy while capturing rapid shifts in brain states. This choice, although it may be specific to the current experiment, is further supported by the similarity maps in Fig. 4C, which show variability in HRSPs across different epochs within a session. These similarity maps reveal that brain states may shift dynamically every five to seven epochs, emphasizing the importance of a flexible approach to averaging epochs for capturing these quick transitions. This observation underscores the dynamic nature of brain states during interoceptive attention tasks (Fukushima et al., 2011; Martínez et al., 2021) and illustrates the capability of HRSP to reflect these transitions over relatively short intervals, in line with our aim to characterize the temporal dynamics of interoceptive processing.

Methodological novelties of HRSP CNN
The current study introduces the use of a convolutional neural network (CNN) for classifying interoceptive attention states, adopting a multivariate approach that provides a robust framework for analyzing complex interactions across spectral activities in both time and frequency domains. This contrasts with traditional univariate analyses, which may not capture the entirety of spectral dynamics.
From a methodological standpoint, this research introduces trial-based HRSP analysis, diverging from conventional methods that emphasize mean amplitude evaluations across entire epochs, such as HEP (Schandry et al., 1986; Montoya et al., 1993; Petzschner et al., 2019) or phase coherence across trials (Kern et al., 2013; Park et al., 2018), summarized in Table 6. Instead, HRSP is analyzed as a time-frequency map, offering a significant advantage by mitigating the influence of cardiac-induced electrical fields that often contaminate HEP measurements. Furthermore, this study enhances analytical precision by utilizing the time series of brain components identified through ICA, rather than direct EEG channel signals (Chen et al., 2013; Whitmer et al., 2010; Maruyama et al., 2020). This method leverages individual components to effectively minimize noise and confounding influences from various brain areas within each channel. This strategic choice increases the clarity and reliability of the findings.
It is important to clarify that the primary focus of this study was not to engineer an optimal CNN architecture for maximum classification performance. Instead, the CNN served primarily as a tool to investigate the properties of HRSP. Although deep feature engineering can significantly enhance a model's adaptability to new datasets, we concentrated on evaluating HRSP features in the input layer of a relatively simple model that are more interpretable and directly associated with the interoceptive attention state. Grad-CAM analysis also supports the interpretability of the model's decision-making process by visualizing which aspects of the HRSP were most influential in the CNN's classifications. These approaches allowed us to delve into how HRSPs reflect the brain's response to interoceptive stimuli, sidestepping the complexities associated with fine-tuning deep learning models.

Benefits of HRSP CNN for online applications
The current study presents a practical strategy for the real-time classification of heartbeat-evoked brain responses modulated by interoceptive attention brain states, employing ICs identified through group analysis for individual applications.
Moreover, we advocate for utilizing a group-trained CNN model to classify individual HRSP data. This approach bypasses the need for extensive individual dataset collection and training, thereby saving significant time and resources. The primary advantage of employing group ICA (Eichele et al., 2011; Huster et al., 2015) and a group-trained model lies in their ability to overcome the instability often associated with models trained on limited data samples. Group ICA provides a robust set of brain components that represent diverse brain functions more effectively than individual ICAs performed on small samples. Moreover, a model trained on group data can leverage a broader spectrum of data, enabling more effective learning and representation. These benefits were demonstrated by the higher classification accuracy of the group model using group components compared to that of an individual model trained with a shorter individual dataset, as illustrated in our four-class experiment results (Fig. 5D).
This research underscores the potential of HRSP for clinical applications aimed at enhancing sensitivity to internal bodily signals. Interoceptive-sensation-based training has been shown to significantly improve treatment outcomes for various mental disorders and enhance emotional regulation (Martínez et al., 2021; Sugawara et al., 2020; Quadt et al., 2021; Lima-Araujo et al., 2022). Effective management of such training benefits immensely from real-time monitoring and timely feedback, which are essential for adapting treatment strategies effectively.
For instance, neurofeedback, which involves training individuals to regulate their brain activity through real-time feedback, has been shown to improve interoceptive accuracy in patients with Alzheimer's disease, frontotemporal dementia, and fronto-insular stroke, thereby aiding in better management of these diseases (García-Cordero et al., 2016; Sun et al., 2022; Hazelton et al., 2023).
The methodologies developed for analyzing HRSP parameters, coupled with the application of group ICA and group models, are especially valuable for neurofeedback applications, providing a robust framework for enhancing participant awareness and engagement in their own treatment processes.
W. Lee et al.

Limitations and further directions
The current study has several limitations that warrant consideration. First, the interpretations drawn from our results are based on the specific experimental setup used in this study and thus necessitate further validation to solidify these findings. Specifically, our analysis does not conclusively differentiate between the modulation of interoceptive processes by attending to heartbeats and the conscious awareness of interoceptive body signals. Further studies are required to distinctly separate these aspects of interoceptive processing.
Secondly, our focus was not on optimizing the CNN architecture for maximal classification performance. Consequently, future improvements to the neural network design could benefit the final application, potentially enhancing classification accuracy and robustness.
Thirdly, our study did not assess HRSPs across different levels of interoceptive processing. While we conducted a classification analysis using three categorized levels of attention based on participants' responses and demonstrated high predictive performance, the questionnaire used does not directly measure interoceptive awareness. Consequently, the high classification accuracy does not conclusively establish HRSP as a definitive marker of interoceptive attention states. Predicting and understanding the nuances of distinct levels of interoception and their implications remains an important area for future research.
Fourth, interoception operates as a closed-loop process in which the brain responds to and modulates the heartbeat. This is prominent in emotion processing (Critchley and Garfinkel, 2017; Garfinkel and Critchley, 2013). However, the current study is not directly associated with emotional processing, which regulates the heartbeat via the hypothalamus. The apparent top-down cognitive process in this study is interoceptive attention. Therefore, we viewed interoceptive attention as a modulator of bottom-up cardiac signal processing: it operates continuously across the entire epoch, not R-peak-locked by itself, but interacting with post-R-peak processing. Nevertheless, the distinction between bottom-up and top-down influences within specific components or frequency bands remains to be determined and is a subject for future research.

Conclusion
This study introduces HRSP as a valuable tool for exploring interoceptive brain states, thereby enhancing our understanding of the time-frequency characteristics of EEG signals induced by heartbeat attention. Additionally, we propose a dynamic approach for monitoring changes in brain states, facilitating the real-time classification of HRSP for various brain conditions. The potential utility of real-time HRSP classification is promising for applications in neurofeedback and personalized medical interventions. However, empirical testing in these applications is essential to validate its effectiveness. Pursuing this research direction is crucial for fully realizing the potential of HRSP in clinical and therapeutic settings.

Declaration of competing interest
None

Fig. 1. Overview of Heartbeat-Related Spectral Perturbation (HRSP) extraction steps. The sequence of preprocessing steps applied to EEG data for HRSP extraction is presented. The process begins with filtering and common average referencing to standardize the EEG signal. Subsequently, individual Independent Component Analysis (ICA) is performed on each dataset separately, extracting unique brain components for each participant. In contrast, group ICA involves temporally concatenating all individual datasets, followed by Principal Component Analysis (PCA) and ICA to identify common brain components across participants. The extracted time series of the brain components are then subjected to continuous wavelet transformation using a Morlet filter, followed by R-peak-locked epoching. The final steps include trial-by-trial normalization and baseline correction to reduce variance and enhance consistency across trials, culminating in the generation of HRSP.
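The wavelet-transform and epoching stages of this pipeline can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the sampling rate, number of Morlet cycles, epoch window, and R-peak spacing below are assumed values chosen for the example, and the input is a synthetic independent-component time series rather than real EEG.

```python
import numpy as np

fs = 250.0                                  # sampling rate in Hz (assumed)
rng = np.random.default_rng(0)
t = np.arange(0, 60, 1 / fs)
ic = np.sin(2 * np.pi * 10 * t) + 0.5 * rng.standard_normal(t.size)  # toy IC

freqs = np.arange(5, 21)                    # 5-20 Hz band used in the study
n_cycles = 6                                # Morlet width (assumed)

# Complex Morlet convolution per frequency -> time-frequency power map
power = np.empty((freqs.size, t.size))
for i, f in enumerate(freqs):
    sigma = n_cycles / (2 * np.pi * f)
    wt = np.arange(-3 * sigma, 3 * sigma, 1 / fs)
    wavelet = np.exp(2j * np.pi * f * wt) * np.exp(-wt**2 / (2 * sigma**2))
    wavelet /= np.abs(wavelet).sum()        # unit-gain normalization
    power[i] = np.abs(np.convolve(ic, wavelet, mode="same")) ** 2

# R-peak-locked epoching (simulated, evenly spaced R-peaks) with a
# pre-R baseline window, followed by baseline correction in dB
r_peaks = np.arange(int(0.5 * fs), t.size - int(0.6 * fs), int(0.8 * fs))
pre, post = int(0.1 * fs), int(0.5 * fs)
epochs = np.stack([power[:, r - pre:r + post] for r in r_peaks])
baseline = epochs[:, :, :pre].mean(axis=2, keepdims=True)
hrsp = 10 * np.log10(epochs / baseline)     # epochs x freqs x time
```

The result is one HRSP map per heartbeat, which can then be normalized trial by trial and averaged as described in the caption.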

Fig. 2. 3D CNN architecture for HRSP classification. This diagram illustrates the CNN model architecture specifically designed to handle the 3D input shape of HRSP, which consists of dimensions representing brain components (C) × frequency (F) × time (T). Due to variability in the number of brain components per participant, the model is adapted to process this dimensional diversity. The architecture includes three main modules for feature extraction: a convolution layer, a BR module (batch normalization and ReLU), and a max-pooling layer. Following feature extraction, the architecture incorporates a multilayer perceptron consisting of two fully connected layers, a dropout layer, and a softmax layer for classification. The convolution layers are characterized by specified filter sizes, strides, and numbers of filters, which are crucial for optimal feature detection. The features extracted from each set are concatenated at the flattening layer, yielding a comprehensive feature set that feeds into the final classification layers.
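A minimal PyTorch sketch of this conv / BN+ReLU / max-pool architecture is shown below. It is a stand-in, not the paper's model: the filter counts, kernel sizes, hidden width, and a fixed number of components are assumed values, and the variable-C handling described in the caption is omitted for brevity.

```python
import torch
import torch.nn as nn

class HRSPNet(nn.Module):
    """Illustrative CNN for HRSP maps of shape (batch, C, F, T):
    two conv -> BN+ReLU -> max-pool modules, then a two-layer MLP
    with dropout and a softmax output. Hyperparameters are assumed."""
    def __init__(self, n_components=8, n_classes=2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(n_components, 16, kernel_size=3, padding=1),
            nn.BatchNorm2d(16), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.BatchNorm2d(32), nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.LazyLinear(64), nn.ReLU(),   # infers input size at first call
            nn.Dropout(0.5),
            nn.Linear(64, n_classes),
            nn.Softmax(dim=1),
        )

    def forward(self, x):                   # x: (batch, C, F, T)
        return self.classifier(self.features(x))

torch.manual_seed(0)
net = HRSPNet()
probs = net(torch.randn(4, 8, 16, 150))    # 16 freqs x 150 time points
```

Each row of `probs` sums to one, matching the softmax classification head described in the caption.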
Fig. 3. Structure of all evaluations and methodologies for HRSPs. (A) The comprehensive assessment of HRSP classification examines optimal features, such as frequency bands and the number of averaging epochs for online classification, determined through offline classification accuracy using both individual ICA and group ICA methods. Offline classifications are evaluated through 5-fold cross-validation (CV) for individual models and leave-one-out CV for group models, which involves excluding an individual from the group data during training. For online classification, features identified as optimal in offline settings are applied to real-time scenarios, where the individual model, trained on early data, is tested on the latter part of the same individual's data. Similarly, the group model, trained excluding the test individual, is applied to the latter part of that individual's data. The importance of HRSP features in classification decisions is visualized through Grad-CAM, enhancing the interpretability of how specific time-frequency elements and brain components contribute to the classification outcomes. (B) This section delineates the methodologies for each evaluation in (A).
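The leave-one-out evaluation over participants can be sketched as follows. This is a schematic, assuming synthetic feature vectors and a placeholder nearest-mean classifier in place of the CNN; only the CV structure (train on all-but-one subject, test on the held-out subject) mirrors the caption.

```python
import numpy as np

rng = np.random.default_rng(1)
n_subj, n_trials, n_feat = 6, 40, 30            # assumed toy dimensions
X = rng.standard_normal((n_subj, n_trials, n_feat))
y = rng.integers(0, 2, size=(n_subj, n_trials))  # binary state labels

# Leave-one-out CV over participants: fit on all-but-one subject,
# evaluate on the held-out subject's trials.
accs = []
for held_out in range(n_subj):
    train = [s for s in range(n_subj) if s != held_out]
    Xtr = X[train].reshape(-1, n_feat)
    ytr = y[train].ravel()
    # placeholder model: nearest class mean in feature space
    means = np.stack([Xtr[ytr == c].mean(axis=0) for c in (0, 1)])
    d = np.linalg.norm(X[held_out][:, None, :] - means, axis=2)
    pred = d.argmin(axis=1)
    accs.append((pred == y[held_out]).mean())

acc = float(np.mean(accs))   # near chance (~0.5) on random data, as expected
```

Substituting the CNN and real HRSP features for the placeholder classifier recovers the group-model evaluation of panel (A).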

Fig. 4. Epoch-averaged HRSP examples and the inter-HRSP similarity matrix according to the number of averaging epochs. HRSPs for the central component (Component 3) of a participant are displayed for different numbers (1, 3, 5, 7, and 9) of averaging epochs during tasks focused on heartbeats (A) and sounds (B). The baseline period within each HRSP is highlighted by white dots. (C) Similarity matrices for different numbers (1, 3, 5, 7, and 9) of averaging epochs display the correlation coefficients among all HRSPs during 10 blocks of the heartbeat attention task, each lasting 20 s (approximately 22 heartbeats on group average; white dashed lines). The x-axis and y-axis of each matrix label the sequence of heartbeat epochs, with colors indicating the degree of similarity (defined by the correlation coefficient) between HRSPs of different epochs. These matrices provide insights into the stability and consistency of HRSP patterns over time, illustrating brain state transitions within the same task block through magnified examples in (C).
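The similarity matrices in panel (C) amount to pairwise correlations between vectorized, window-averaged HRSP maps. The sketch below illustrates this with random stand-in HRSPs; the epoch counts and map dimensions are assumed, and the sliding-window averaging is one plausible reading of "number of averaging epochs".

```python
import numpy as np

rng = np.random.default_rng(0)
n_epochs, n_freq, n_time = 20, 16, 150           # assumed toy dimensions
hrsps = rng.standard_normal((n_epochs, n_freq, n_time))

def moving_average(x, k):
    """Average each epoch with its k-1 successors (sliding window)."""
    return np.stack([x[i:i + k].mean(axis=0) for i in range(len(x) - k + 1)])

for k in (1, 3, 5):
    avg = moving_average(hrsps, k).reshape(-1, n_freq * n_time)
    sim = np.corrcoef(avg)                       # inter-HRSP similarity matrix
    # larger k -> fewer, smoother windows -> generally higher off-diagonal
    # similarity, as seen across the panels of (C)
    assert sim.shape == (n_epochs - k + 1, n_epochs - k + 1)
```

Each matrix entry is the Pearson correlation between two averaged HRSP maps, matching the color scale described in the caption.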

Fig. 5. Evaluation results. (A) Classification accuracies from leave-one-out CV of group models using both actual HRSPs and sham HRSPs are compared. Models were trained using the configuration that showed the highest accuracy with actual HRSPs (refer to Table 3). The CV for sham HRSPs was repeated ten times with different sets of jittered R-peaks. (B) The reliability of Independent Component Analysis (ICA) within an individual is analyzed by displaying relative mean squared error (MSE) values. These values compare HRSP and HEP generation from various numbers of EEG blocks against using the full length of EEG data. The x-axis shows the number of EEG blocks used, while the y-axis indicates relative MSE values, ranging from 0 (no error) to 1 (maximum error). (C) Online classification accuracies in the binary-state experiment for a new individual, comparing two frequency bands, are displayed. For individual components and models, the first eight blocks of EEG data were used for ICA and training the classification model, with the subsequent 12 blocks used for testing (labeled as ''Individual components & Model''). The ''Group components & Model'' category displays classification accuracy using a group model with group ICA for the same later 12 blocks without individual-specific training. Black dots represent outliers, and a repeated measures ANOVA found no significant differences in classification accuracies (p > 0.05). (D) Online classification accuracies in the four-state experiment for a new individual, evaluated similarly to (C), are presented. Significance levels are marked as ''ns'' for no significant differences, ** for p < 0.01, and *** for p < 0.001, illustrating the statistical relevance of the findings.

Fig. 6. Independent brain components of group ICA and Grad-CAM results. For the group brain components, source localization was conducted using the LORETA method with the pop_dipfit_loreta.m function. (A) illustrates Grad-CAM results for each brain component primarily utilized in classifying between heartbeat and sound conditions with actual R-peaks. Components that significantly influence classification are marked in red font. The hot colors in the time-frequency map of Grad-CAM indicate the degree of importance for classification. (B) illustrates the Grad-CAM results used to classify heartbeat and sound conditions with sham R-peaks, providing insight into how classification features differ with altered peak timing. Note that the sources for the brain components in the actual and sham conditions are identical; the only difference lies in the R-peaks used for epoching in each condition.
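The Grad-CAM maps in this figure weight convolutional feature maps by the gradient of the target class score and rectify the result. The toy sketch below shows that computation on a deliberately tiny network; the network, input size, and target class are all placeholder assumptions, not the study's model.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
conv = nn.Conv2d(1, 4, kernel_size=3, padding=1)   # toy feature extractor
fc = nn.Linear(4, 2)                               # toy 2-class head

x = torch.randn(1, 1, 8, 8)
acts = F.relu(conv(x))                 # feature maps, shape (1, 4, 8, 8)
acts.retain_grad()                     # keep gradients of this non-leaf tensor
logits = fc(acts.mean(dim=(2, 3)))     # global average pooling -> class scores
logits[0, 1].backward()                # gradient of the target class score

# Grad-CAM: channel weights = spatially averaged gradients,
# map = ReLU of the weighted sum of feature maps
weights = acts.grad.mean(dim=(2, 3), keepdim=True)
cam = F.relu((weights * acts).sum(dim=1)).detach()  # (1, 8, 8) heat map
```

Applied to the HRSP CNN, the same weighting over time-frequency feature maps yields the per-component importance maps shown in the figure.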

Fig. 7. Mean classification accuracies for the three-attention-level classification experiment and the online four-state classification experiment. This figure presents the mean classification accuracies for three attention levels, determined by self-reported evaluation scores in the binary-state experiment, across the entire frequency range (5-20 Hz) in (A) and within the theta+alpha frequency band (5-12 Hz) in (B). Additionally, it shows the mean classification accuracies achieved during the online classification of theta-band HRSPs in the binary-state experiment, using individual-specific ICA components and CNN models tailored to each participant's data in (C), and employing group-derived ICA and a CNN model trained on a collective dataset but applied to individuals without specific tuning in (D). Each row in the matrices sums to 100%, reflecting the distribution of classification results across different classes for each modeling approach.

Table 4
Offline classification accuracy of four-state experimental data (N = 12) after leave-one-out CV using group models and group components. The best results in the table are highlighted in boldface.

Table 5
Summary of classification analyses. The table displays the best accuracy for each condition, detailing the evaluation methods used. The leave-one-out method involves training a model using the data from all but one individual and then testing it on the remaining individual. A 5-fold CV approach is applied during training. The individual train & test method involves training a model using the early part of an individual's EEG data and testing it on the later part of the same individual's EEG. For online processing, an average of 5 epochs (indicated in parentheses) is used to accommodate rapid changes in brain states.

Table 6
Summary of the interoceptive marker variables, analysis methods, and manipulation conditions used in the current study and in previous HEP and heartbeat-evoked response (HER) studies. iEEG: intracranial EEG; ECoG: electrocorticography; HB: heartbeat; ICs: independent components; PCs: principal components.