Modeling physiological responses induced by an emotion recognition task using latent class mixed models

Correctly recognizing emotions is an essential skill to manage interpersonal relationships in everyday life. Facial expression represents the most powerful mean to convey important information on emotional and cognitive states during interactions with others. In this paper, we analyze physiological responses triggered by an emotion recognition test, which requires the processing of facial cues. In particular, we evaluate the modulation of several Heart Rate Variability indices, collected during the Reading the Mind in the Eyes Test, accounting for test difficulty (derived from a Rasch analysis), test performances, demographic and psychological characteristics of the participants. The main idea is that emotion recognition is associated with the Autonomic Nervous System and, as a consequence, with the Heart Rate Variability. The principal goal of our study was to explore the complexity of the collected measures and their possible interactions by applying a class of flexible models, i.e., the latent class mixed models. Actually, this modelling strategy allows for the identification of clusters of subjects characterized by similar longitudinal trajectories. Both univariate and multivariate latent class mixed models were used. In fact, while the interpretation of the Heart Rate Variability indices is very difficult when considered individually, a joint evaluation provides a better description of the Autonomic Nervous System state.


Introduction
The ability to correctly recognize own and others' emotions has been acknowledged as crucial for successful interaction with others. To assess the ability in understanding others' mental states, psychometric tools, affective picture database and facial expression database have been developed (see for an overview, [1]) and used in combination with physiological monitoring [2].
Cognitive stress triggered by emotion recognition task affects the Autonomic Nervous System (ANS) and, as a consequence, Heart Rate Variability (HRV). PLOS  334 subjects taken from the general population completed a computerized version of the original "pencil and paper" RMET test. Out of these, 174 were females and 160 were males, with an average age of 30.2 years (sd = 10.34, range = 18-68ys). 91 subjects out of the 334 subjects completed the online version of the RMET in a laboratory (44 females and 47 males, average age 26.78 years, ranging from 18 to 52ys). During the experimental session, physiological reactivity has been monitored at rest and while completing RMET. Various indices of HRV, skin conductance and blood volume pulse have been collected and derived. In this work, we focus on HRV modulation induced by the emotion recognition task. In particular, biosignals in the time-domain, e.g. mean and standard deviation of beat to beat (R-R) intervals and various spectral indices of HRV, were extracted.
Experimental sessions were organized in the morning at 11 a.m. (±1 h) and in the afternoon at 3 p.m (±1 h). Electrodes were placed in the non-dominant hand and forearms and physiological baseline was recorded for 5 minutes (at rest). Biosignals were measured, amplified, and recorded using Procomp InfinityTM (Thougth Technology, USA). After baseline measurement, RMET was administered: RMET items, i.e., 1 trial + 36 target pictures, were shown on a monitor according to the original sequence. Before showing each new stimulus, a slide with a fixation point was displayed for 3 seconds. Immediately after the presentation of the stimulus and the selection of the emotional state conveyed by the eyes, a black slide appeared and remained on the screen for 5 seconds before the fixation point slide.
Several self-report questionnaires for the assessment of anxiety, depression severity, alexithymia and the presence of psychopathological traits related to obsessive-compulsive or eating disorders have been also administered once completing the test. These factors have been showed to have an impact on physiological activation [18][19][20][21][22].
In particular, the State-Trait Anxiety Inventory (STAY-Y, [23]) has been administered to measure trait (STAI-1) and state anxiety (STAI-2). Anxiety Sensitivity (AS) construct was assessed by means of the Anxiety Sensitivity Index (ASI, [24]). Depression severity was evaluated by means of the Beck Depression Inventory (BDI-II, [25]). Psychopathological traits related to obsessive-compulsive or eating disorders were assessed using respectively the Padua Inventory (PI, [26]) and the Eating Disoder Inventory-2 (EDI-2, [27]). Actually, PI was designed to measure four factors, namely "Becoming Contaminated", "Checking Behaviours", "Impaired Control of Mental Activities", "Urges and Worries of Losing Control". Moreover, alexithymia, i.e., the difficulty in identifying and describing emotions, was measured by means of the Toronto Alexitimia Scale (TAS-20, [28]). The questionnaire has a three-factor structure. The first factor (F1) assesses the ability to identify feelings and to distinguish them from the somatic sensations that accompany emotional arousal. The second factor (F2) assesses the ability to describe feelings to other people, while the third (F3) evaluates externally oriented thinking.

Ethical statement
All the procedures performed in this study involving human subjects were conducted in accordance with the ethical standards of the San Raffaele Hospital and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards. With reference to the online administration of the RMET, the questionnaire was completed anonymously without collecting any sensitive data compromising identities of the respondents. The participation to the experimental sessions was voluntary and, prior to study participation, participants gave their written informed consent. The entire FIRB project, of which the study presented in the paper is a part, was approved by the Ethics Committee of San Raffaele Hospital (CE 1129 register 213/2014).

HRV analysis
HRV refers to the variability of the length of beat to beat (R-R) intervals in electrocardiograms. It can be quantified by descriptive statistics of R-R interval duration and its variation over time, i.e. range, mean and standard deviation. Beat-by-beat series of R-R intervals were obtained by ECG recordings by applying the freely available "eplimited" software [29]. Each recording was subdivided into several segments: the baseline epoch, i.e. the 5 minutes before the starting of the questionnaire, and the segments of variable length associated to the period of time following the vision of the stimulus. Only when 85% of beats resulted normal according to ECG quality and R-R physiological value range, the segments were analyzed. The spectral analysis was performed for the baseline recording only. For each segment and the baseline epoch we computed the following time-domain parameters [30]: mean of beat to beat (R-R) intervals (msec), the average heart rate in beat per minute (bpm), the standard deviation of beat to beat (R-R) intervals commonly named SDNN, the square root of the sum of the squares of differences between adjacent R-R intervals (RMSSD), the number of pairs of adjacent R-R intervals differing by more than 50 ms in the sequence (NN50), the sample asymmetry represented by the ratio R1/R2 [31], the SD1 and SD2 parameters from the Poincaré plot [32]. Poincaré plot is actually a diagram in which each R-R interval of tachogram is plotted against the previous R-R interval, where the values of each pair of successive R-R interval define a point in the plot.
All the collected time domain measures of HRV are summarized in Table 1.

Identifying difficult RMET questions
Differently from previous works which evaluate the association between HRV and emotion recognition indexed by RMET total score (e.g., [2]), we decided to use the information provided by each item of the test and, in particular, to examine the effect of item difficulty on HRV modulation. Hence, Rasch model was used, on the total sample, just to estimate item difficulty, thus allowing for a classification of items as "difficult" or "easy". The model has been proposed in the psychometric field to study the ability of a subject to overcome or fail a test.
The key hypothesis underlying the Rasch model is that the probability of a correct answer depends on two parameters: a parameter for items and a parameter for the subject. For the sake of simplicity, a binary Rasch model without accounting for guessing was used. Incorrect answers were aggregated into a single category. The probability that an individual with a particular trait level will correctly answer an item characterized by a particular difficulty is where β j are the item parameters measuring the difficulty of the item and θ i represent the person parameter measuring the ability of the respondent [33]. Estimated difficulty parameters were used to classify items into two categories, respectively as "difficult" or "easy", if their values were larger or smaller than zero, which is the mean of the latent trait.

Latent class mixed models
LCMMs provide a flexible framework to model Gaussian or non-Gaussian (curvilinear) quantitative and even ordinal longitudinal outcomes. LCMMs generalize traditional LMEs, assuming that the population is heterogeneous and G unobserved sub-populations (latent classes), with their own mean profiles of trajectories, may be identified. Hence these models allow to account for common fixed effects over classes, for class-specific fixed effects and for sources of unobserved heterogeneity by specifying random effects. Following the notation provided by Proust-Lima et al. (2015) [17,34] and consistently with the literature on latent variable modelling, the approach requires the specification of a structural latent model, i.e., a standard linear mixed model without measurement errors, along with a measurement model, linking the latent process to the outcome of interest. When heterogeneous population is assumed, for a subject i belonging to the class c i equal to g (g = 1, . . ., G), a latent class-specific process can be defined as • t ij denotes the time of measurement for subject i (i = 1, . . ., N) at occasion j (j = 1, . . ., n i ) • X 1i (t ij ) and X 2i (t ij ) are vectors of time-dependent covariates respectively with common fixed effects β over classes and class-specific fixed effects γ g is a vector of time-dependent covariates associated with individual class-specific random effects u ig • w i (t ij ) represents an autocorrelated process.
Then a measurement model ruling the relationship between the longitudinal outcome variable, observed at time t ij , and the latent process is defined as it follows where H is a parametrized monotonic increasing link function, defining linear/nonlinear transformations, � ij are independent normally distributed errors and represents a noisy latent process at time. Every subject is assigned to one latent class only. For each subject, the latent class membership is described by a latent variable c i that equals g if i belongs to class g and probability of latent class membership is modeled using a multinomial logistic regression: where ξ 0g is the intercept for class g and ξ 1g is the vector of class-specific parameters related to the time-independent covariates X 3i .

LCMM for multivariate outcomes case
LCMM framework has been generalized to the case of multiple outcomes measuring the same latent process [34]. Let us assume that K longitudinal outcomes, indicators of the same underlying construct, are available. For each subject i, i = 1, . . ., N, and outcome variable k, k = 1, . . ., K, the set of measurements y ik = (y i1k , . . ., y ijk , . . ., y in ik k ) 0 is collected at times t i1k , . . ., t ijk , . . ., t in ik k . This model specification allows for a number of measurements and related time records varying within subjects and outcomes.
Observed outcomes actually should provide information on the true common latent process L i ðtÞ t2R .
As for the univariate case, a measurement model for each collected outcome and a structural model for the latent process can be defined.
The first describes the association between the observed outcome and the underlying latent process, the second allows to examine changes in the latent trait over time, thus identifying variables modulating the construct of interest.
The measurement model can be specified as it follows where is the true common latent process at time t ijk ; X 2i (t ijk ) 0 and γ k are, respectively, time-dependent covariates and contrasts accounting for differential effects of covariate/time on outcomes after adjustment for the latent process level. Finally, α ik are subject-and test-specific random effects and � ijk are measurements errors. Several different link functions (linear, splines, thresholds, etc.) can be chosen depending on the type of the longitudinal markers. Curvilinear as well as bounded quantitative longitudinal outcomes can be analyzed within this modeling framework.
The structural model accounts for the dynamic nature of the latent trait, embodying information on collected covariates and time. Actually, X 1i (t ijk ) 0 are time-dependent covariates associated with fixed effects β, Z i (t) 0 are other time-dependent covariates associated with random effects b i . Autocorrelated process w i (t) may be also defined. As in the univariate LCMM, also in the multivariate framework, latent classes of subjects may be hypothesized. A sketch showing the idea underlying this modeling procedure is provided in Fig 1, where multivariate outcomes are represented by different HRV indices.

Model specification
Among all the HRV indices, we chose a measure of central tendency and one of variability of beat to beat (R-R) intervals since other measures resulted correlated with each others and therefore redundant. To account for possible individual-specific physiology, the baseline values of the collected indices were subtracted to the actual values recorded while administering the RMET, thus allowing to highlight the activation induced by the task itself (if any exists).
We first fitted separate LCMM models modelling to separately examine the changes, with respect to baseline values, of the average heart rate expressed as beat per minute (Δmean.bpm) and of the standard deviations of R-R intervals (Δstd.msec). Then, we model them jointly. In all models, we estimated the impact of the following variables on the physiological response: • item (i.e., picture) sequence as time variable • performance on the RMET test (correct/wrong answer) and item difficulty (as categorical variable) • demographics and clinical characteristics (age, gender, state anxiety, depression, alexithymia) Moreover, we included interaction terms to evaluate gender-specific effects of clinical variables on physiology. In particular, we tested for differential effects of anxiety, depression and alexithymia depending on respondent's gender. All the above mentioned covariates were set as fixed in the model. Random intercept and random slope models were specified. RMET pictures sequence was set as random effect in the latent process mixed model. Random effects were grouped by subject's ID.
Flexible splines link functions were considered to account for nonlinearities in the longitudinal response.
LCMMs were estimated with a number of latent classes ranging from 1 to 4 in order to ensure an adequate sample size in each class and thus allowing for accurate parameter estimates [35]. Bayesian information criterion (BIC, [36]) was used to choose the optimal number of latent classes, thus following traditional mixture modeling approaches [37]. Example of structural and measurement models in a multivariate modeling framework, where several outcomes are expected to measure the same phenomenon. In a very general experimental setting where elicited physiological reactions are measured, one may assume to have different HRV indices measured in several occasions, e.g., while administering emotionally charged stimuli. These multivariate outcomes potentially underlie a common latent trait that could be described as an "overall physiological activation", which in turn is affected and modulated by demographic and clinical characteristics. Whenever the best model included at least two latent classes, we test, by applying appropriate inferential procedures, whether subjects assigned to different latent classes were different respect to clinical or demographic characteristics. We used this strategy to reduce the number of variables to be included in the class membership model. We aim so at reducing computational burden and improving model convergence. Covariates effectively distinguishing among latent classes were included to estimate the model and BIC was used for model comparison. Hence, we compared simpler intercept-only model with more complex models including also covariates in the class-membership multinomial logistic model.
All the analyses were performed using R Statistical Software [38], version 3.3.1. In particular lcmm [34] package was used to estimate latent class mixed models.

Results
When examining the simplest LCMM model with 1 latent class, we found that item difficulty, BDI and TAS play a significant role in modulating the change with respect to baseline of average beats per minute (Δmean.bpm) during the emotion recognition task. In particular, in presence of difficult items, physiological response significantly increases (see Table 2).
Moreover, in males, as TAS increases, Δmean.bpm significantly increases, while increasing depression levels measured through BDI significantly decreases the variation in the physiological response.
To address the second goal of our research, that is the identification of clusters of homogeneous subjects, we estimated models with a number of latent classes greater than 1. When increasing the number of latent classes, we found that the two latent classes model is the best in terms of BIC, with 51 subjects assigned to class 1 and 35 to class 2 (BIC = 15114.07). These latent classes significantly differ on total levels of alexithymia (average class 1 TAS was 43.10± 9.08 while in class 2 it was 47.8± 10.45, Mann-Whitney test p-value = 0.038) and in terms of the third TAS subscale measuring "Externally-Oriented Thinking" (average class 1 TAS-F3 was 17.69± 4.32 while in class 2 it was 19.54± 4.71, Mann-Whitney test p-value = 0.031). When including these covariates in the class membership model, we found that the best model is the Table 2. Separate simple LCMM models, with 1 latent class, for the index Δmean.bpm and Δstd.msec. BDI.TOT indicates the total score in the Beck Depression Inventory, i.e. the questionnaire administered to evaluate depression severity, STAI.Y.1 indicates the state anxiety measured by the State-Trait Anxiety Inventory, TAS.TOT is the total score in the Toronto Alexitimia Scale used to measure alexithymia, i.e., the difficulty in identifying and describing emotions. "Easy" category was chosen as reference in the item difficulty variable derived from Rasch model and "wrong" as reference for the item answer.

Parameter
Average one with two latent classes and total alexithymia score as covariate in the class membership model (BIC = 15113.02). Results are shown in Table 3. TAS presents a significant effect on class membership, with more alexithymic subjects less likely belonging to class 1. Actually, subjects assigned to class 1 show, on average, a higher variation in the physiological response, with respect to the baseline, in the longitudinal process. In the longitudinal model, we found that item difficulty and increasing TAS levels significantly increases Δmean.bpm. On the other side, with increasing levels of depression and anxiety, in men, the change in the physiological response decreases significantly.
Average posterior probabilities of falling into the class in which the subjects were classified are equal to 0.9464 and 0.9739 (Table 4), thus suggesting an unambiguous classification. In addition, the non-diagonal terms indicate that subjects classified in class 1 have a non-negligible probability of belonging to class 2 (mean of 0.0536) and conversely (mean of 0.0261).
In the model with 1 latent class, evaluating changes, with respect to baseline, in the standard deviations of R-R intervals (Δstd.msec), we found that the presentation of items classified as difficult and the rating task itself positively affect the outcome: in presence of difficult pictures to be rated and, as the number of presented pictures increases, also the standard deviations of R-R intervals increases. Clinical covariates do not significantly modulate this signal (see Table 2). When increasing the number of latent classes, we found that a three latent classes model is the best choice in terms of BIC, with 32 subjects assigned to class 1, 40 to class 2 and 12 to class 3 (BIC = 25214.13). Examining differences among these classes, it emerged that they significantly differ on the PADUA subscale reflecting "contamination fear" (average PADUA F2 score in class 1 was 9.22± 5.77, in class 2 was 7.15± 5.62 and in class 3 was 4.83±4.06, Kruskal-Wallis test pvalue = 0.035, with class 3 resulting significantly different from class 1 in the post-hoc pairwise comparison). However, when including this covariate in the class membership model, we found that the best model is the only-intercept three latent class model, being the latter model associated with higher BIC.
We found that Δstd.msec significantly increases as difficult items were shown and as the number of presented and rated items increased (significant time effect). Clinical characteristics do not play a significantly role in the physiological response modulation (Table 5).
Average posterior probabilities are equal to 0.9236, 0.9308 and 0.9816, thus suggesting again an unambiguous classification (see Table 6).
When jointly modeling the two physiological indices (Δmean.bpm and Δstd.msec), we found that a three latent classes model is the best in terms of BIC, with 32 subjects assigned to class 1, 41 to class 2 and 13 to class 3 (BIC = 41980.85).
If we assume that the latent trait measured by these two indices is a "global activation", we may conclude that the length of the task (the increasing number of presented pictures) and the presentation of difficult items have a positive and significant impact on the latent trait (see Evaluating physiological responses during emotion recognition Table 7). Inspecting the posterior probabilities table, we can observe an unambiguous classification (see Table 8).

Discussion
In this work we investigate the role of an emotion recognition task on HRV, accounting also for demographic and clinical characteristics. LCMMs provide an appealing framework to analyse univariate and multivariate longitudinal psychophysiological outcomes, due to their flexibility in handling non-Gaussian continuous outcomes. Differently from standard models for longitudinal data, the proposed approach may account for a heterogeneous population. The HRV is a well-established marker of the ANS activity. However, the interpretation of single HRV indices may be complex. Alternatively, a set of indices may provide a better description of the ANS activation or state. The typical example is given by the rest tilt test, which is a provocative sympathetic stimulus (see Figure 5 in [30]) where the physiological response is described as an increase of the heart rate along with a decrease of the standard deviation. However, when the heart rate increases, without a joint reduction of standard deviation, the interpretation of an exclusive activation of the sympathetic nervous system (SNS) could not still hold.
Our modelling strategy highlighted that in presence of complex stimuli, reflecting complex mental states to recognize, physiological response significantly increases. In fact, in both the Evaluating physiological responses during emotion recognition models for Δmean.bpm (Average Beats Per Minute) and Δstd.msec (standard deviations of R-R intervals), as well as in the joint model, we found a significant and positive effect of item difficulty. This result confirms previous findings reported by Park et al. (2013) [39], which suggested how cardiac vagal tone is associated with more adaptive top-down and bottom-up modulation of emotional attention to faces. In this perspective, our results highlight how the recognition of more complex faces (e.g., the recognition of more complex emotional states) require more physiological activation. Future studies should investigate the performance and difficulties of subjects with psychopathologies once exposed to these complex stimuli. Moreover, we found that alexithymic features are associated with an increase in Δmean.

bpm.
These results lead to some speculation from the psychological perspective. Grynberg et al. (2012) [40] reviewed and supported the hypothesis that alexithymia is linked to deficits in processing and labeling emotional facial expressions. In particular, the Authors suggested that alexithymia would be associated with processing deficits already at the perceptual level. Our data could deepen this hypothesis since it supports the belief that a bottom-up activation is present during the evaluation of an emotion inducing facial expression. Hence, a deficient perceptual level would be associated with a bottom-up physiological activation.
Finally, in line with the current literature, we found that anxiety and mood state influenced the physiological response during the task in males. Neuroimaging studies have well documented a gender dimorphism for what concerns limbic system in humans. More in detail, amygdala is modulated by both vasopressin (involved in anxiety and stress) and oxytocin [41] and it is able to influence the autonomic control [42,43]. Moreover, lots of evidences [44][45][46] supported a female superiority in the ability to read others mental states, linking this capability with the action of gender-related hormones. Taken together these evidences suggest that it is crucial to control for depression and anxiety when analyzing ANS arousal during an emotion recognition task as proposed by Quintana and colleagues in their previous work. Future studies should be conducted to deepen the knowledge on the complex interrelationships among sex hormones, anxiety, depression, the ability to understand other mental states and ANS arousal. The modeling approach here proposed can be applied to provide an insight into this mechanism.
Our work provides evidence for a relationship between emotion recognition and HRV during RMET, by evaluating longitudinal trajectories of physiological responses during the task. Our methodology should be considered a first step to provide clinicians the chance to investigate theory of mind performances and its physiological components. In fact, even if it is well documented the role of theory of mind as mediator of effectiveness in psychotherapy treatments [47] as well as it is known the role of ANS in the maintenance of psychopathological features [48], nowadays there is lack of methodologies able to assess their relationship and possible interactions.
To conclude, some directions for future works can be suggested. From the statistical point of view, we would rather consider a more complex Rasch model appropriate for multiplechoice test thus obtaining a more precise estimate of item difficulty. Moreover, the collection of more "stress-coping related" covariates could improve the characterization of the latent classes.
Finally, we focused mainly on HRV parameters. However, electrodermal activity could be also used in evaluating stress levels induced by the task. Actually, skin electrical conductivity depends on the activity of eccrine sweat glands which, in turn, is controlled by the nervous system and is involved in thermoregulation processes or varies in response to stressful situations. Also skin conductance response reflects the ANS activity and has been widely used as a marker of emotional states [49,50].