Contributions of diagnostic, cognitive, and somatovisceral information to the prediction of fear ratings in spider phobic and non-spider-fearful individuals

Background: Physiological responding is a key characteristic of fear responses. Yet, it is unknown whether the time-consuming measurement of somatovisceral responses improves the prediction of individual fear responses beyond the accuracy reached by considering diagnostic (e.g., phobic vs. non-phobic) and cognitive (e.g., risk estimation) factors, which can be assessed more easily. Method: We applied a machine learning approach to data from an experiment in which spider phobic and non-spider-fearful participants (diagnostic factor) faced pictures of spiders. For each experimental trial, participants specified their personal risk of encountering the spider (cognitive factor), as well as their subjective fear (outcome variable) on quasi-continuous scales, while diverse somatovisceral responses were registered (heart rate, electrodermal activity, respiration, facial muscle activity). Results: The machine learning analyses revealed that fear ratings were predominantly predictable by the diagnostic factor. Yet, when allowing for learning of individual patterns in the data, somatovisceral responses contributed additional information on the fear ratings, yielding a prediction accuracy of 81% explained variance. Moreover, heart rate prior to picture onset, but not heart rate reactivity, increased predictive power. Limitations: Fear was solely assessed by verbal reports, only 27 females were considered, and no generalization to other anxiety disorders is possible. Conclusions: After training the algorithm to learn about individual-specific responding, somatovisceral patterns can be successfully exploited. Our findings further point to the possibility that the expectancy-related autonomic state throughout the experiment predisposes an individual to experience specific levels of fear, with less influence of the actual visual stimulation.


Introduction
The question of whether somatovisceral responses may inform about mental states has been debated for a long time, especially in the field of emotion psychology (see Pace-Schott et al., 2019, for an overview). While it remains unclear whether emotion-specific somatovisceral response profiles exist (Kreibig, 2010), the majority of recent theories on emotion converge on the assumption that affective experiences go hand in hand with significant bodily changes (e.g., Critchley and Garfinkel, 2017; Damasio and Carvalho, 2013; Lang et al., 2017). Unsurprisingly, therefore, prevailing classification systems (American Psychiatric Association, 2013; World Health Organization, 1992) specify diverse somatovisceral responses as characteristic symptoms of phobias and panic attacks (see Roth, 2005, for details). Those symptoms include changes in sympathetic tone (sweating, trembling or shaking, palpitations, pounding heart, or accelerated heart rate) or respiration (sensations of shortness of breath or smothering, feeling of choking, chest pain or discomfort, feeling dizzy, unsteady, lightheaded or faint, paresthesia). Thus, phobic fear responses go along with substantial changes in objective and subjective physiological responding.
In an earlier publication investigating the sensitivity of physiological measures to variations in subjective fear intensity in spider phobic and non-spider-fearful individuals, we hypothesized that (a) parts of the inconsistencies relate to differences in existing fear levels between the experiments, and (b) somatovisceral measures may be characterized by different onset thresholds and ceiling levels when indexing fear and phobia. To this end, we looked more closely into variations in fear levels and their associations with physiological responding when spider phobic and non-spider-fearful participants faced pictures of spiders (and snakes). We observed that autonomic and respiratory responses, as well as facial muscle activity, changed with varying degrees of subjective fear. Most importantly, those variables were differentially sensitive to different fear levels. Specifically, respiration differed between phobic and nonphobic participants but not between fear-level variations within each group of participants. Skin conductance captured only very high levels of fear (i.e., high fear in phobic participants), whereas activity at the M. Corrugator supercilii (related to frowning) was capable of distinguishing phobic from nonphobic participants, and additionally low from high fear in the phobic population. Notably, activity at the M. Zygomaticus major (related to a fear grimace) and heart rate were most sensitive to variations in subjective fear. Apart from distinguishing the two populations, they also discriminated between high and low fear within both groups of participants.
While suggestive, these data do not inform about (a) the degree to which psychophysiological data acquired in a given set of situations support the prediction of subjective fear responses in a new situation, and (b) the relative importance of the different somatovisceral variables in doing so (i.e., the different measures had not been considered simultaneously, although analyzing them in combination may considerably increase predictive power). Furthermore, our earlier study did not provide insights into (c) the importance of psychophysiological responses compared with diagnostic (e.g., diagnosis of being phobic vs. not phobic) and cognitive (e.g., subjective threat or risk in a given situation) factors. Accordingly, the added value of the usually time-consuming somatovisceral assessments remains to be demonstrated. Finally, nothing is known about (d) the relative importance of baseline (i.e., prestimulus, thus referring to habitual responding or expectancies) vs. stimulus-driven physiological responding in the prediction of subjective fear responding. The current study addressed those four open issues by applying machine learning.
The utilization of machine learning approaches in the study of emotion has become increasingly popular (Cowen, Sauter et al., 2019; Hui and Sherratt, 2018; Izquierdo-Reyes et al., 2018; Kukolja et al., 2014; Song et al., 2019). In addition to various facial and voice parameters and brain activity, analyses aiming at the differentiation of discrete affective states (e.g., fear and disgust) have included somatovisceral measures derived from the electrocardiogram, electrodermal activity, and respiration. In the clinical field, machine learning has been used to assist diagnosis, predict treatment outcome, and inform individual-specific tailoring of therapeutic procedures for various anxiety disorders (Deckert and Erhardt, 2019; Frick et al., 2014; Hahn et al., 2015; Mansson et al., 2015; Nicholson et al., 2019; Schwarzmeier et al., 2020).
While machine learning in the above fields of research mostly tested whether physiological variables can successfully classify events into qualitatively distinct categories, it may also be used for quantitative predictions, such as fear level classification. Bȃlan and colleagues (Bȃlan et al., 2019, 2020) adopted a machine learning approach to estimate fear levels (two or four categories ranging from no fear to high fear) from electrophysiological and peripheral responding (e.g., heart rate, skin conductance) in acrophobic patients. Based on such estimation, they (Bȃlan et al., 2019) applied an individually tailored virtual reality task designed to decrease phobic symptoms. The authors of the two studies reported good fear level classification accuracy (up to 90%) with different machine learning methods. Electroencephalogram (EEG) signals in the beta frequency range, skin conductance, and heart rate turned out to be the most important features for fear level classification in one of their studies.
Because these earlier publications on subjective fear intensity included phobic individuals only, it remains to be determined whether somatovisceral responses also help in the prediction of fear responses in mixed populations (i.e., healthy and phobic individuals). Moreover, it is still unclear whether somatovisceral responses add predictive value beyond cognitive (e.g., evaluations of subjective risk) and diagnostic (e.g., being diagnosed as a phobic vs. non-phobic individual) factors. If physiology does not add to those cognitive and diagnostic predictors, it may not be worthwhile to include time-consuming and work-intensive physiological assessment procedures in the prognosis of individual fear.
Accordingly, in the current investigation we examined whether the inclusion of somatovisceral variables increases predictive power beyond the consideration of solely cognitive and diagnostic factors. Specifically, we re-analyzed data from a previous investigation of variations in phobic and non-phobic fear displayed toward spiders and snakes. In this experiment, spider phobic and control participants (diagnostic factor) looked at pictures of spiders, snakes, and birds, while various somatovisceral responses were assessed (heart rate, electrodermal activity, muscle activity at the M. corrugator supercilii [frowning] and the M. zygomaticus major [fear grimace], respiration rate, and tidal volume). Participants additionally specified their subjective risk of encountering the animal displayed (cognitive factor) as well as their fear of the respective animal displayed (predicted outcome variable, quasi-continuous rating).
In the present approach, we used a machine learning-based regression analysis to estimate the predictive power of somatovisceral responses beyond diagnostic and cognitive factors. When identifying the variables that best predicted subjective fear of spiders (responses to snakes and birds were not considered), we further distinguished between baseline responses (i.e., responses before the display of spider pictures; not considered in our earlier publication) and reactivity (i.e., changes arising during the viewing of the spider pictures). Baseline responses describe bodily states in the absence of visual stimulation, thereby referring to a more basic state throughout an experiment that may reflect, among others, general tension in the experimental setting or expectations regarding subsequently presented animals, without the participants knowing whether or not a spider will ultimately be shown. Reactivity, by contrast, describes responses toward specific spider pictures and therefore clearly relates to (visual) processing of biological threat.
We hypothesized that diagnostic and cognitive factors would be characterized by greater predictive power with respect to individual fear levels than would somatovisceral responding. This is because both diagnostic and cognitive factors relied on verbal reports, as did the assessment of subjective fear in the current study (see Method section for details). Nonetheless, we assumed somatovisceral responses to aid in the fine-tuning of fear-level predictions, and hence to contribute independent predictive power. This prediction is based on the observation that mind and body are intimately linked, mutually influence each other, and hence do not function as separate units (see Critchley et al., 2013, for details). In addition, we expected somatovisceral reactivity measures to be better predictors than baseline measures because the former (but not the latter) relate to stimulus-specific processing. These hypotheses were tested with two different machine learning approaches: one version permitted learning of participant-specific patterns (a given participant's data were allowed to be partly in the training and partly in the test set), whereas the other prevented it (a given participant's data were included in either the training or the test set, but not both).

Participants
Our machine learning approach relied on data from 34 female participants (17 spider phobic) of an earlier publication. Because artifacts led to missing values in some somatovisceral variables in some participants, and because we needed a complete set of somatovisceral responses for each participant, 7 of those 34 participants could not be included in our analyses, leaving 27 participants (for details on excluded measures and participants, see that publication).

Experimental task, procedure, and included somatovisceral measures
Participants viewed 30 pictures displaying spiders, 30 pictures displaying snakes, and 30 pictures displaying birds (only responses to spiders were considered in our main analyses; see Supplementary Materials for responses to snakes). In each experimental trial, they first saw a picture of a forest location (presented for 1s) and imagined being there. They next viewed a picture of an animal (presented for 4s) and rated their personal risk of encountering the animal displayed (available time: 4s; detailed findings on encounter expectancies are presented elsewhere). This risk evaluation constituted our cognitive factor. Subsequently, participants rated their fear at facing the possibility of encountering the animal displayed on a 17-point scale (from 0% [no fear at all] to 100% [extreme, paralyzing fear], increasing in steps of 6.25%), constituting the to-be-predicted quasi-continuous outcome in our machine learning approach.
While the participants performed the task, their somatovisceral responses were registered continuously with AcqKnowledge 4.1 (Biopac, Goleta, CA). In the machine learning approach, we considered the following measures: heart rate during picture presentation, mean skin conductance during the entire experimental trial (~15s; because of its high latency), respiration rate and maximal respiratory amplitude (as an estimate of tidal volume) for the entire experimental trial, and facial muscle activity at the M. Zygomaticus major and M. Corrugator supercilii during picture presentation. Additionally, we included the baselines of these variables (i.e., activity before picture onset; 2s for all variables except the rapidly changing muscle activity, for which 1s was used). Further details about the experimental task, setting, procedure, and included somatovisceral measures can be found in the earlier publication.

Multivariable regression of fear ratings
A machine learning analysis using a multivariable regression model was performed in Python (v3.8.3) to identify factors that predict the individual fear ratings of participants. For the machine learning model, "Extra Trees" (ExtraTreesRegressor, scikit-learn library v0.23.1; Pedregosa et al., 2011) was chosen, because it is computationally efficient, highly accurate, and able to model linear as well as non-linear relationships between factors and the prediction target, in our case the fear ratings (Geurts et al., 2006; Hastie et al., 2009). Extra Trees implement an ensemble of "extremely randomized trees" (Geurts et al., 2006). Generally, ensemble methods improve the performance of base predictors (decision trees in the case of the Extra Trees regressor) by averaging the predictions of all base predictors and using this average as the final prediction of the ensemble. However, to obtain diverse predictions from the same type of base predictor, it is necessary to use processes that introduce randomness when building the base predictors; hence the name "randomized trees".
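The averaging principle described above can be illustrated with a minimal sketch. It uses synthetic stand-in data (not the study data), and the number of trees and the random seed are arbitrary assumptions chosen for speed; only ExtraTreesRegressor itself is taken from the text.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor

# Synthetic stand-in data (an assumption, NOT the study data):
# 200 "trials" with 3 features and a partly non-linear target.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 0.5 * X[:, 0] + np.sin(X[:, 1]) + rng.normal(scale=0.1, size=200)

# Small ensemble for illustration; the tree count here is arbitrary.
model = ExtraTreesRegressor(n_estimators=100, random_state=0)
model.fit(X, y)

# Each fitted base predictor (a randomized decision tree) predicts on its own ...
per_tree = np.stack([tree.predict(X[:5]) for tree in model.estimators_])

# ... and the ensemble prediction is the average over all trees.
ensemble_pred = model.predict(X[:5])
print(np.allclose(per_tree.mean(axis=0), ensemble_pred))
```

The individual trees disagree with each other because their split thresholds are drawn at random; averaging over them is what stabilizes the final prediction.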
The model performance (i.e., the prediction accuracy) was estimated using a nested cross-validation (CV) procedure (Cawley and Talbot, 2010; Hastie et al., 2009). CV makes it possible to assess the performance that can be expected of the model on new, unseen data, that is, the generalizability of the model. CV implements repeated train-test splits of the data; a separate model is trained and tested in each CV repetition. In the main CV loop, a shuffle-split data partitioning with 33% of the data in the testing-set was repeated 100 times, resulting in 100 Extra Trees models (1000 trees per model). Feature scaling (z-scoring) and hyper-parameter tuning were carried out within the main CV loop, using only the training data of the current CV repetition. Hyper-parameter tuning is necessary to control model complexity and, as a consequence, to avoid overfitting the data. The hyper-parameter tuning was implemented in an inner (nested) CV procedure. Hence, a separate CV was carried out for each repetition of the outer CV loop. The inner CV loops again used a shuffle-split partitioning scheme, but this time with only 50 repetitions, to save computation time. To control model complexity, we restricted the maximum number of possible interactions of a decision tree in the Extra Trees ensembles by limiting the maximum number of leaf nodes per tree. Candidate values for the maximum number of leaf nodes were randomly drawn between 2 and 512 (50 random draws, RandomizedSearchCV, scikit-learn, v0.23.1). The maximum number of leaf nodes that led to the lowest squared error was subsequently used in the outer CV loop.
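A nested CV of this kind might be set up roughly as follows. This is a sketch under stated assumptions, not the original analysis code: the function name nested_cv, the pipeline step names, the inner test-set proportion (0.33), and the synthetic demo data are all hypothetical, and the demo call uses far fewer repetitions and trees than the values reported in the text so that it runs quickly.

```python
import numpy as np
from scipy.stats import randint
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import RandomizedSearchCV, ShuffleSplit
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


def nested_cv(X, y, n_outer=100, n_inner=50, n_iter=50, n_trees=1000, seed=0):
    """Return one (MAE, R^2) pair per outer CV repetition."""
    outer = ShuffleSplit(n_splits=n_outer, test_size=0.33, random_state=seed)
    scores = []
    for train_idx, test_idx in outer.split(X):
        # Scaling lives inside the pipeline, so it is fitted on training data only.
        pipe = Pipeline([
            ("scale", StandardScaler()),
            ("trees", ExtraTreesRegressor(n_estimators=n_trees, random_state=seed)),
        ])
        # Inner CV: random draws of max_leaf_nodes from 2..512 (upper bound exclusive).
        # The inner test_size of 0.33 is an assumption; the text does not state it.
        search = RandomizedSearchCV(
            estimator=pipe,
            param_distributions={"trees__max_leaf_nodes": randint(2, 513)},
            n_iter=n_iter,
            scoring="neg_mean_squared_error",
            cv=ShuffleSplit(n_splits=n_inner, test_size=0.33, random_state=seed),
            random_state=seed,
        )
        search.fit(X[train_idx], y[train_idx])
        # The hold-out set is touched only here, after tuning has finished.
        pred = search.predict(X[test_idx])
        scores.append((mean_absolute_error(y[test_idx], pred),
                       r2_score(y[test_idx], pred)))
    return np.array(scores)


# Tiny demo on synthetic data with reduced settings (assumptions, for speed only).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(120, 4))
y_demo = X_demo[:, 0] + 0.5 * X_demo[:, 1] ** 2 + rng.normal(scale=0.1, size=120)
scores = nested_cv(X_demo, y_demo, n_outer=3, n_inner=2, n_iter=3, n_trees=20)
print(scores.mean(axis=0))  # average MAE and R^2 over the outer repetitions
```

Wrapping the scaler and the regressor in one pipeline is a design choice that keeps the z-scoring parameters from leaking information out of the test data, in both the inner and the outer loop.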
The obtained model was tested on the respective hold-out set of the main CV loop. The hold-out set (33% of the data) was explicitly not used in the inner CV loop. In each repetition of the main CV loop, the following two model performance metrics were computed: (1) the mean absolute error (MAE), and (2) the prediction coefficient of determination (prediction R²) of the model (Hastie et al., 2009). For the MAE, smaller values represent a better model fit, whereas for R², higher values indicate a better model fit. MAE values lie between 0 (perfect model fit, no error at all) and infinity (badly performing model), and are not scale invariant; hence, the values of the MAE depend on the scale of the target (fear rating). To establish a baseline for determining the statistical significance of the MAE, we additionally computed the chance MAE based on the predictions of a model with exactly the same parameters but trained with shuffled target data in each CV repetition. R² values lie between minus infinity and 1 (perfect model, where all predictions are exactly the true values of the target). R² is scaled such that 0 means that the model performs as well as using the average target value as predictor (this is referred to as the trivial predictor); negative values indicate a model that performs worse than the trivial predictor, and 1 means no error at all. Statistical significance of the MAE and of the R² was determined using bootstrap tests (100,000 bootstrap samples; Efron, 1992). The null-hypothesis for the MAE was that the difference between MAE and chance MAE is smaller than or equal to zero. The null-hypothesis for the prediction R² was that the prediction R² is smaller than or equal to zero.
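The two metrics and the logic of a one-sided bootstrap test can be sketched as follows. The text does not specify the exact bootstrap procedure, so the percentile-style test on the mean below is an assumption, as are the function name bootstrap_p_value, the reduced number of bootstrap samples, and all numbers in the demo (which are made up, not study results).

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, r2_score


def bootstrap_p_value(values, n_boot=10_000, seed=0):
    """One-sided bootstrap p-value for H0: mean(values) <= 0.

    Resamples the per-repetition values with replacement and returns the
    fraction of bootstrap means that are <= 0 (an assumed test variant).
    """
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    idx = rng.integers(0, len(values), size=(n_boot, len(values)))
    boot_means = values[idx].mean(axis=1)
    return float(np.mean(boot_means <= 0))


# The trivial predictor (always predicting the mean target) has R^2 of zero.
y_true = np.array([0.0, 0.25, 0.5, 0.75, 1.0])
trivial = np.full_like(y_true, y_true.mean())
r2_trivial = r2_score(y_true, trivial)
mae_trivial = mean_absolute_error(y_true, trivial)

# Made-up per-repetition errors for 100 CV repetitions (NOT study results):
# model MAE versus MAE of a model trained on shuffled targets (chance MAE).
rng = np.random.default_rng(1)
mae = rng.normal(0.19, 0.01, size=100)
chance_mae = rng.normal(0.35, 0.02, size=100)

# Testing whether the chance error exceeds the model error.
p_mae = bootstrap_p_value(chance_mae - mae)
print(r2_trivial, mae_trivial, p_mae)
```

In this sketch, a small p-value means that the model's error is reliably below the error obtained after destroying the feature-target relationship.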
This analysis (computing the models in a nested CV) was carried out four times. First, two times with only non-physiological factors (small model). These factors were whether a participant was classified as phobic or not (i.e., diagnostic factor) and the perceived risk of encountering a spider (i.e., cognitive factor). Second, two times with the same factors and the somatovisceral measurements (big model). The factors included in this model were: whether a participant was classified as phobic or not (i.e., diagnostic factor), the perceived risk of encountering a spider (i.e., cognitive factor), baseline heart rate from electrocardiogram (ECG) of 2s before stimulus presentation, heart rate change from baseline to stimulus presentation (ECG of 4s after stimulus presentation minus ECG baseline), baseline electrodermal activity (EDA) of 2s before stimulus presentation, EDA change from baseline to stimulus presentation (EDA of 15s after stimulus presentation minus EDA baseline), baseline electromyogram (EMG) at the M. Corrugator supercilii of 1s before stimulus presentation, Corrugator EMG change from baseline to stimulus presentation (Corrugator EMG of 4s after stimulus presentation minus Corrugator EMG baseline), baseline EMG at the M. Zygomaticus major of 1s before stimulus presentation, Zygomaticus EMG change from baseline to stimulus presentation (Zygomaticus EMG of 4s after stimulus presentation minus Zygomaticus EMG baseline), baseline respiration amplitude (RESP_amp) of 2s before stimulus presentation, respiration amplitude change from baseline to stimulus presentation (RESP_amp of 15s after stimulus presentation minus RESP_amp baseline), baseline respiration rate (RESP_freq) of 2s before stimulus presentation, respiration rate change from baseline to stimulus presentation (RESP_freq of 15s after stimulus presentation minus RESP_freq baseline). 
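To make the distinction between baseline and change features concrete, the following sketch derives both from a single trial's signal. It assumes that each feature is a simple window mean, which the text does not confirm; the function name baseline_and_change, the sampling rate, and the toy heart rate trace are hypothetical.

```python
import numpy as np


def baseline_and_change(signal, fs, onset_s, baseline_s, response_s):
    """Return (baseline mean, response mean minus baseline mean).

    signal     : 1-D array of samples for one trial
    fs         : sampling rate in Hz
    onset_s    : stimulus onset in seconds from trial start
    baseline_s : length of the pre-stimulus window in seconds
    response_s : length of the post-onset window in seconds
    """
    onset = int(round(onset_s * fs))
    n_base = int(round(baseline_s * fs))
    n_resp = int(round(response_s * fs))
    baseline = float(np.mean(signal[onset - n_base:onset]))
    response = float(np.mean(signal[onset:onset + n_resp]))
    return baseline, response - baseline


# Toy heart rate trace (hypothetical): 70 bpm for 2 s, then 75 bpm for 4 s.
fs = 10
heart_rate = np.concatenate([np.full(2 * fs, 70.0), np.full(4 * fs, 75.0)])
hr_bsl, hr_change = baseline_and_change(
    heart_rate, fs, onset_s=2.0, baseline_s=2.0, response_s=4.0
)
print(hr_bsl, hr_change)
```

With the window lengths listed above (1s or 2s for baselines, 4s or 15s for responses), the same helper would yield one baseline and one change feature per measure and trial.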
The separation into small and big models allows us to analyze the contribution of the physiological measures to the respective models' performance.
For the big models only, we further analyzed the contributions of single factors to the models' performances. For non-linear models, as used here, this is not as straightforward as for linear models. One cannot analyze model weights, as they usually do not exist in non-linear methods such as the Extra Trees model. However, one analysis technique that can also be applied with non-linear methods is permutation feature importance testing, which works as follows: (1) A baseline R² score is recorded by passing a testing-set through the model. (2) The values of a single factor are permuted, and the testing-set is passed again through the model. (3) The R² score is recomputed. (4) The importance of a factor is the drop in the overall R² score caused by permuting that factor's values, that is, the difference between the baseline score and the score obtained after permutation (Molnar, 2019). The permutation thus breaks the relationship between a factor and the prediction target (fear rating), i.e., the drop in the model score is indicative of how much the model depends on that factor (Molnar, 2019). We report the drop in R² score for each factor normalized to the baseline R² score. Hence, permutation feature importance values range from minus infinity (contradictory, misleading information) through 0 (not important, because the R² score does not change) to 1 (very important, because R² changes to zero). To determine whether a feature's contribution was statistically significant, we tested the null-hypothesis that the median drop in R² of a feature is smaller than or equal to zero with a bootstrap test (100,000 bootstrap samples per feature; Efron, 1992). Subsequently, the obtained p-values were Bonferroni corrected for multiple comparisons.
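The four steps of permutation feature importance testing can be written out in a few lines. This is an illustrative sketch rather than the original implementation: the function name normalized_permutation_importance, the number of permutations per feature, the averaging over permutations, and the synthetic data are assumptions.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.metrics import r2_score


def normalized_permutation_importance(model, X_test, y_test, n_repeats=10, seed=0):
    """Drop in R^2 after permuting each feature, divided by the baseline R^2."""
    rng = np.random.default_rng(seed)
    # Step 1: baseline score on the untouched testing-set.
    baseline = r2_score(y_test, model.predict(X_test))
    importances = np.zeros(X_test.shape[1])
    for j in range(X_test.shape[1]):
        drops = []
        for _ in range(n_repeats):
            # Step 2: permute the values of one feature only.
            X_perm = X_test.copy()
            X_perm[:, j] = rng.permutation(X_perm[:, j])
            # Steps 3 and 4: recompute R^2 and record the drop from baseline.
            drops.append(baseline - r2_score(y_test, model.predict(X_perm)))
        importances[j] = np.mean(drops) / baseline
    return importances


# Synthetic data (an assumption): feature 0 matters most, feature 2 is pure noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 3))
y = 2.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=300)
X_train, X_test, y_train, y_test = X[:200], X[200:], y[:200], y[200:]

model = ExtraTreesRegressor(n_estimators=100, random_state=0).fit(X_train, y_train)
importances = normalized_permutation_importance(model, X_test, y_test)
print(importances)
```

scikit-learn also offers sklearn.inspection.permutation_importance, which returns the raw (non-normalized) score drops; the manual loop is used here only to keep the four steps visible.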
As mentioned above, the analyses were run twice with the small model and twice with the big model, hence four times in total. In the first run of the small and the big model, all fear ratings were assumed to be independent. However, it is very likely that the fear ratings (as well as the risk ratings and somatovisceral responses) of individual participants exhibit specific patterns (e.g., generally high or low ratings) and that, consequently, the fear ratings of a given participant form clusters and are not statistically independent. This is a kind of repeated-measures problem, where a participant is measured several times. Therefore, we ran a second analysis with the small as well as with the big model, where the CV procedure had an additional constraint per CV repetition, namely that the data of one participant were not allowed to be in both the testing-set and the training-set. This strict participant-based data separation was implemented to avoid an information flow between training-set and testing-set due to the above-mentioned participant-specific patterns or clusters of fear ratings. Hence, the first analysis of the small and the big models reflects how well one can predict the fear ratings if participant-specific patterns in the data are known (learning of participant-specific patterns permitted), whereas the second analysis reflects how well one can predict the fear ratings if participant-specific patterns in the data are not known (in other words, how well the prediction would perform if we tried to predict the fear ratings of new, previously unseen participants; learning of participant-specific patterns prevented). Furthermore, the difference between these analyses shows whether such clusters are present and how much they contribute to prediction accuracy.
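The difference between the two splitting schemes can be demonstrated with scikit-learn's ShuffleSplit and GroupShuffleSplit. Whether the original analysis used GroupShuffleSplit is not stated, so its use here is an assumption, as are the helper name shared_participants and the idealized layout of 27 participants with 30 spider trials each.

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit, ShuffleSplit

# Idealized layout (an assumption): 27 participants x 30 spider trials each.
n_participants, n_trials = 27, 30
participants = np.repeat(np.arange(n_participants), n_trials)
X = np.zeros((participants.size, 1))  # placeholder features, values irrelevant here


def shared_participants(splitter, groups=None):
    """Count, per split, how many participants occur in both train and test."""
    counts = []
    for train_idx, test_idx in splitter.split(X, groups=groups):
        overlap = set(participants[train_idx]) & set(participants[test_idx])
        counts.append(len(overlap))
    return counts


# Trial-level splitting: a participant's trials end up on both sides.
trial_level = shared_participants(
    ShuffleSplit(n_splits=5, test_size=0.33, random_state=0)
)

# Participant-level splitting: each participant is entirely in train OR test.
participant_level = shared_participants(
    GroupShuffleSplit(n_splits=5, test_size=0.33, random_state=0),
    groups=participants,
)
print(trial_level, participant_level)
```

Note that with group-based splitting the test proportion refers to participants rather than trials, so the number of test trials can deviate slightly from 33% when participants contribute unequal numbers of trials.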

Results
We applied a machine learning algorithm to predict fear ratings from diagnostic information (phobic), cognitive information (perceived risk), and, additionally, physiological information (based on ECG, EDA, EMG, RESP). In total, we computed four analyses: two with the small model (only phobic and perceived risk as features) and two with the big model (all described features). Within each pair, the analyses differed in whether learning of participant-specific patterns was permitted or not. To prevent learning of participant-specific patterns, the CV data splitting was modified so that samples of a given participant were only available during the model's training or testing, but never in both.
First, we computed a correlation matrix (Fig. 1) of all factors and the target by using Pearson product moment correlation coefficients. On the one hand, this revealed a high correlation (0.7) between participants' fear ratings and being classified as phobic (vs. control; i.e., diagnostic factor), as well as a medium negative correlation (-0.5) between RESP_freq changes and the respective baselines. On the other hand, except for phobic status, we found only low correlations between features and the fear ratings (Fig. 1, bottom row). Due to its sensitivity to outliers, however, the Pearson product moment correlation coefficient can fail to detect the existence of meaningful relationships between variables (Rousselet and Pernet, 2012). In particular, it cannot uncover non-linear relationships or more complex multivariate patterns of interdependent relationships. Thus, we used machine learning in the next step to predict fear ratings from the variables (factors) to establish which ones (if any) contribute to the prediction.
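A toy example illustrates why a low Pearson coefficient does not rule out a strong relationship. The data are synthetic and chosen only to make the point: a perfectly deterministic but symmetric, non-linear dependence yields a coefficient of essentially zero.

```python
import numpy as np

# Synthetic illustration (not study data): y is fully determined by x,
# but the relationship is quadratic and symmetric around zero.
x = np.linspace(-1.0, 1.0, 201)
y = x ** 2

# Pearson correlation for the non-linear relationship is essentially zero ...
r_nonlinear = np.corrcoef(x, y)[0, 1]

# ... whereas a linear relationship of the same x yields a coefficient of one.
r_linear = np.corrcoef(x, 2.0 * x + 1.0)[0, 1]
print(r_nonlinear, r_linear)
```

A tree-based regressor, by contrast, can pick up such a dependence, which motivates moving from the correlation matrix to the machine learning models.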
Second, we estimated the multivariate models and determined the models' performances. The Extra Trees models provided a good fit in accordance with conventional ranges of cutoff values (Table 1). In analysis one (small model, learning of participant-specific patterns permitted), the average absolute error is 0.19, within a fear rating range of 0 to 1. The prediction-based coefficient of determination is 0.56, which indicates that the model explains on average 56% of the variance in the fear ratings. In the second analysis (small model, learning of participant-specific patterns prevented), the average absolute error is larger than in analysis one (0.21, within the same fear rating range of 0 to 1). Consistent with this finding, the prediction coefficient of determination is smaller than in the first analysis (0.43, corresponding to 43% explained variance in the fear ratings). This observation supports the existence of participant-specific patterns in the data. In the third analysis (big model, learning of participant-specific patterns permitted), the average absolute error is 0.12, within a fear rating range of 0 to 1. The prediction-based coefficient of determination is 0.81, which indicates that the model explains on average 81% of the variance in the fear ratings. In the fourth analysis (big model, learning of participant-specific patterns prevented), the average absolute error is 0.21, within a fear rating range of 0 to 1. The prediction-based coefficient of determination is 0.40, which indicates that the model explains on average 40% of the variance in the fear ratings.
Third, we analyzed the contributions of single factors to the big models' performance, i.e., which factors are important to predict fear ratings in the big models. For that, we applied permutation importance calculations and, subsequently, bootstrap significance tests. In analysis three (big model, learning of participant-specific patterns permitted; Fig. 2), the most relevant features were, in descending order: phobic, somatovisceral measurements (baseline heart rate, baseline activity at the M. Zygomaticus major, reactivity at the M. Zygomaticus major), and the perceived risk of encountering a spider. Other factors did not contribute significantly. Notably, the feature phobic (diagnostic factor) was by far the most important feature. It had a median importance of 0.66, whereas the second most relevant feature (ECG_BSL) had a median importance of 0.09. In the fourth analysis (big model, learning of participant-specific patterns prevented; Fig. 3), the most relevant feature was whether a participant was classified as being phobic or not, with a median importance of 0.93. None of the other features was significant.
Taken together, the machine learning analyses revealed that fear ratings were predominantly predictable by whether a participant was phobic or not. However, when learning of individual patterns in the data is possible, somatovisceral measurements become important, too, and contribute additional information on the fear ratings. When participant-specific patterns in the data are known (third analysis), the prediction is very accurate, with 81% explained variance. After removing the participant-specific pattern information (fourth analysis), the prediction accuracy drops to 40% explained variance.

Discussion
We investigated how the prediction of subjective fear responses is informed by somatovisceral responding beyond the level of diagnostic interviews and rating scales assessing perceived threat. We hypothesized that diagnostic and cognitive factors would be characterized by greater predictive power than the somatovisceral factor(s), while the latter would still aid in the fine-tuning of the prediction. Our findings are predominantly consistent with these hypotheses. The analyses conducted demonstrate that the diagnostic factor is by far the most important when predicting subjective fear. Yet, under certain circumstances, the other factors considered are also relevant.
The third model (big model, learning of participant-specific patterns permitted) incorporates all investigated factors and has the lowest MAE and highest R² of all models. Its comparison with the first model (small model, learning of participant-specific patterns permitted) shows that somatovisceral measures carry additional information about the fear ratings that cannot be provided by the diagnostic and cognitive factors, because predictive power dropped from an R² of 0.81 in the third model to 0.56 in the first model. Consistent with our earlier publication, heart rate and muscle activity at the M. Zygomaticus major contribute significantly to this higher predictive power. The other somatovisceral measures included did not add independent information.
Our findings are in line with the widespread assumption that heart rate is an important constituent of fear responses. For instance, established classification systems (American Psychiatric Association, 2013; World Health Organization, 1992) state palpitations, pounding heart, and accelerated heart rate as characteristic indicators of specific phobia. Furthermore, various theoretical considerations and empirical findings include heart rate changes as important indexes of fear responses (Aue et al., 2016; Aue et al., 2007; Hamm, 2020; Lang and Bradley, 2013; Lang et al., 2017). Somewhat surprisingly though, and contrary to our expectations, in the current investigation baseline heart rate (which we did not consider in our earlier publication on the same data), but not heart rate reactivity, was sensitive to variations in fear levels. Hence, with respect to the prediction of situation-specific subjective fear responses to spiders, heart rate prior to the onset of the spider picture (rather than heart rate accompanying processing of the spider stimulus) is more informative. This finding points to the possibility that the expectancy-related autonomic state throughout the experiment predisposes an individual to experience specific levels of fear, with less influence of the actual visual stimulation. Such an interpretation of the data is corroborated by the observation that this did not only hold for responses to spiders (i.e., phobic stimulus material for half of our participants), but also for responses to snakes (see Supplementary Materials for details). The present finding is of particular interest for the clinical context, because it suggests that imagined (rather than real) circumstances as well as the associated uncertainty determine subsequently expressed fear levels. This implies that successful therapies should have a strong focus on problematic expectancies that may be hard to overcome.
Because empirical studies commonly do not consider both baseline heart rate and heart rate reactivity, these results remain to be confirmed. Future investigations should do so and further examine the possibility of differential predictive power of baseline and reactivity features.
That facial muscle activity at the M. Zygomaticus major (including muscles located close by) is able to capture fear and very negative stimulation has been reported before (Aue et al., 2007; Bradley and Lang, 2007; Ekman, 2003; Elgee, 2003; Larsen et al., 2003). Such activity may relate to the existence of a so-called fear grimace or bared-teeth display (Van Hooff, 1972) and may signal appeasement and submissive behavior in primates (Parr et al., 2016; Waller and Dunbar, 2005). Interestingly, we found both baseline activity and reactivity to yield independent information for the prediction of subjective fear. As for heart rate, therefore, findings for muscle activity at the M. Zygomaticus major suggest that subsequent research should carefully differentiate the two.
For respiratory and electrodermal measurements, we did not observe independent contributions to the prediction of the participants' fear levels. Yet, our findings do not necessarily suggest that respiration and electrodermal measures are uninformative overall. Notably, respiration measures included in the current study differed significantly between phobic and control participants, and electrodermal activity differed between high and low fear in phobic participants (see , for details). The present approach goes a step further and shows that respiratory and electrodermal measures do not add information beyond the diagnostic, cognitive, and already available somatovisceral information (provided by heart rate and facial muscle activity at the M. Zygomaticus major). Thus, if we know that a given participant belongs to the phobic or control group, specified a certain perceived risk, and showed specific heart rate and fear grimace responses, additional consideration of the participant's respiratory and electrodermal responses does not improve the prediction accuracy for subjective fear. It has yet to be considered that, because the current study design relied on a rather quick succession of stimuli, it may not have been possible for these comparatively slow signals to capture rapidly occurring psychological changes. It therefore remains to be determined whether a study involving only slowly changing events (and corresponding psychological states) replicates our finding of limited predictive value of electrodermal and respiratory responses.
While our data show that the consideration of some somatovisceral measures may assist the prediction of phobic fear (in addition to diagnostic and cognitive factors), we also see that their predictive power disappears if an individual's data cannot appear in both the training and test sets (i.e., learning of participant-specific patterns is prevented). A direct comparison of the MAE and R² values of the two respective models reveals that the fourth model (big model, learning of participant-specific patterns prevented) is only about as good as the second (small model, learning of participant-specific patterns prevented). Moreover, we see a big drop in model performance (R² roughly halved) from model three (big model, learning of participant-specific patterns permitted) to model four (big model, learning of participant-specific patterns prevented). Because models three and four differ only in whether participant clusters were controlled for, this drop in model performance suggests that somatovisceral information is highly individual and does not generalize between participants. Hence, somatovisceral measures included in the current study are of little value in the prediction of an unknown individual's subjective fear responses. In other words: knowing how an individual responds to a given situation improves prediction substantially. By contrast, if a new individual is considered, the algorithm needs to learn about the individual (e.g., whether or not there is a general tendency to report high or low fear).
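The difference between permitting and preventing the learning of participant-specific patterns comes down to how trials are split into training and test sets. The sketch below illustrates this with a deliberately simple toy setup; it is not the pipeline used in this study, which relied on Extra Trees regressors. All data, the function names, and the stand-in predictor (a participant's mean training rating if the participant was seen during training, otherwise the overall training mean) are hypothetical assumptions for illustration.

```python
from collections import defaultdict

# Hypothetical participant-specific rating levels (illustration only).
OFFSETS = [10, 25, 40, 55, 70, 85]


def make_trials(n_trials=12):
    """Build toy (participant_id, fear_rating) pairs with deterministic jitter."""
    trials = []
    for pid, offset in enumerate(OFFSETS):
        for t in range(n_trials):
            noise = (t * 7) % 11 - 5  # jitter in [-5, 5]
            trials.append((pid, offset + noise))
    return trials


def pooled_folds(trials, k=3):
    """Trial-level folds: a participant may appear in training AND test data."""
    n = len(trials)
    return [([i for i in range(n) if i % k != f],
             [i for i in range(n) if i % k == f]) for f in range(k)]


def grouped_folds(trials, k=3):
    """Participant-level folds: a participant is in training OR test, never both."""
    n = len(trials)
    return [([i for i in range(n) if trials[i][0] % k != f],
             [i for i in range(n) if trials[i][0] % k == f]) for f in range(k)]


def cross_validated_mae(trials, folds):
    """Toy predictor: participant's training mean if seen, else overall training mean."""
    total_error, n_test = 0.0, 0
    for train_idx, test_idx in folds:
        by_pid = defaultdict(list)
        for i in train_idx:
            pid, rating = trials[i]
            by_pid[pid].append(rating)
        overall = sum(trials[i][1] for i in train_idx) / len(train_idx)
        for i in test_idx:
            pid, rating = trials[i]
            seen = by_pid.get(pid)
            pred = sum(seen) / len(seen) if seen else overall
            total_error += abs(rating - pred)
            n_test += 1
    return total_error / n_test


trials = make_trials()
mae_pooled = cross_validated_mae(trials, pooled_folds(trials))
mae_grouped = cross_validated_mae(trials, grouped_folds(trials))
print(f"MAE, participant patterns accessible: {mae_pooled:.2f}")
print(f"MAE, unknown participants:            {mae_grouped:.2f}")
```

In this toy example, the error is much larger under the participant-level split, because the individual rating level cannot be learned for participants who are absent from the training data. This mirrors, in simplified form, the contrast between models three and four.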
An aspect that needs attention in this regard is that people can be distinguished with respect to the degree of affective-autonomic response dissociation displayed (Brosschot and Janssen, 1998). According to this conception, in some individuals there can be a divergence between the levels of fear revealed in verbal reports, on the one hand, and in somatovisceral responses, on the other hand. Whereas a considerable number of people show coherence between verbal reports and psychophysiology in that they display high or low expressions in both types of measures, other individuals do not. For instance, so-called repressors are characterized by low subjective fear levels while demonstrating high sympathetic arousal (Asendorpf and Scherer, 1983). Sensitizers, on the other hand, are individuals who report high fear with concurrent low sympathetic arousal (Derakshan and Eysenck, 1997). Unsurprisingly, therefore, investigations involving machine learning approaches may sometimes reveal only weak correspondence between subjective and physiological indicators of fear (e.g., Taschereau-Dumouchel et al., 2019). Applied to our own observations, these reflections suggest that predictions of subjective fear levels from somatovisceral responding may be of limited use if nothing is known about an individual's personal affective-autonomic response relationship.
Importantly, the very same applies to the cognitive factor (i.e., estimated risk). Thus, cognitive factors do not appear to be per se more effective in the prediction of subjective fear than are somatovisceral measures. Subsequent investigations should explore a greater variety of cognitive (e.g., expectancies of being harmed (Aue and Okon-Singer, 2015); reappraisal of a potentially threatening situation (Everaert and Joormann, 2019; Kamphuis and Telch, 2000)) and somatovisceral (e.g., blood pressure and ECG T wave amplitude; Globisch et al., 1999; Sarlo et al., 2002) measures to gain closer insight into this point. Other sources of information worth considering relate to electrophysiological (e.g., Bȃlan et al., 2019; Bȃlan et al., 2020) and speech parameters (e.g., pitch; Koduru et al., 2020; Seng et al., 2016). Those sources have been repeatedly examined in emotion research that relied on machine learning approaches and may be of interest when predicting subjective fear responses as well.

Fig. 2. Distribution of permutation-based feature importance for predicting fear ratings with Extra Trees regressors over 100 cross-validation repetitions. For the cross-validation, all fear ratings were treated as being independent, hence not depending on a specific participant. Therefore, data of a participant were present in the training set as well as in the testing set. This data partitioning reflects how well one can predict the fear ratings in the case that participant-specific patterns in the data are accessible. ACT = Activity (relating to activity following picture onset); amp = Amplitude; BSL = Baseline; corr = M. Corrugator supercilii; ECG = Electrocardiogram; EDA = Electrodermal activity; EMG = Electromyogram; freq = Frequency; RESP = Respiration; zygo = M. Zygomaticus major.

Fig. 3. Distribution of permutation-based feature importance for predicting fear ratings with Extra Trees regressors over 100 cross-validation repetitions. For the cross-validation, fear ratings were treated as depending on specific participants. Therefore, data of a participant were present in the training set or in the testing set, but never in both. This data partitioning reflects how well one can predict the fear ratings in the case that participant-specific patterns in the data are not accessible, hence, how well the prediction generalizes to unknown participants. ACT = Activity (relating to activity following picture onset); amp = Amplitude; BSL = Baseline; corr = M. Corrugator supercilii; ECG = Electrocardiogram; EDA = Electrodermal activity; EMG = Electromyogram; freq = Frequency; RESP = Respiration; zygo = M. Zygomaticus major.
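The idea behind the permutation-based feature importance shown in Figs. 2 and 3 can be sketched as follows: one feature at a time is shuffled across trials, and the resulting loss in prediction quality is taken as that feature's importance. The sketch below is a simplified illustration under stated assumptions. The stand-in model (`toy_model`), the toy data, and the choice of MAE increase as the importance score are hypothetical and do not reproduce the Extra Trees regressors or the scoring used in the present analyses.

```python
import random


def mean_absolute_error(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, X, y, n_repeats=20, seed=0):
    """Mean increase in MAE when a single feature column is shuffled across rows."""
    rng = random.Random(seed)
    baseline = mean_absolute_error(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the outcome
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            permuted = mean_absolute_error(y, [predict(row) for row in X_perm])
            increases.append(permuted - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


def toy_model(row):
    """Hypothetical stand-in for a fitted model: uses feature 0, ignores feature 1."""
    return 2.0 * row[0]


# Toy data: feature 0 determines the outcome, feature 1 is irrelevant.
X = [[float(i), float((i * 3) % 10)] for i in range(10)]
y = [toy_model(row) for row in X]

importances = permutation_importance(toy_model, X, y)
print(importances)
```

A feature that the model does not rely on receives an importance of zero, because shuffling it leaves the predictions unchanged. A feature that the model relies on receives a positive importance.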
Finally, it may seem that the Bȃlan et al. (2019) models (achieving classification accuracy of up to 90%), which included EEG measures in addition to our somatovisceral parameters, outperformed our own when predicting subjective fear. However, these authors' models relied on (few) phobic participants only. Moreover, subjective fear responses in these earlier models were classified into only two (no fear; fear) or four categories (no fear; low fear; medium fear; high fear), while here we employed a quasi-continuous variable. In addition, these authors did not investigate methods that prevented the learning of participant-specific patterns. The specific value of EEG measures with respect to our own models, therefore, remains to be determined.

Limitations
Part of the high correlation between subjective fear and the diagnostic factor in the current investigation can possibly be explained by the fact that both variables were obtained from verbal reports. A limitation of the study, therefore, is that it did not assess clinically relevant features of fear other than verbal reports (e.g., behavioral avoidance). In addition, the present number of participants (n = 27) may be considered comparatively low (note, however, that, due to the repeated-measures design, our overall sample size was n = 803). In light of these limitations, subsequent studies need to examine whether the effects observed replicate in new participants with multiple indices of fear.
Future studies further need to address whether the results can be generalized (a) to men; (b) to other anxiety disorders (e.g., social anxiety disorder, agoraphobia, or generalized anxiety disorder); and (c) across different experimental settings. In particular, the demonstration of commonalities and divergences across different anxiety disorders (related to point b) may yield meaningful insights about possible shared mechanisms between these disorders. Furthermore, identification of the key predictors for these different forms of anxiety disorders will likely highlight important starting points for the therapeutic context (Deckert and Erhardt, 2019; Frick et al., 2014; Hahn et al., 2015; Lueken, Hilbert, et al., 2015; Lueken, Straube, et al., 2015; Mansson et al., 2015; Nicholson et al., 2019; Schwarzmeier et al., 2020), with these starting points possibly varying across anxiety disorders.

Summary and Conclusions
The current investigation demonstrates the general utility of machine learning in the prediction of subjective fear. Combining diagnostic, cognitive, and somatovisceral factors can achieve high levels of prediction accuracy when learning of participant-specific patterns is allowed for, with heart rate and facial muscle activity at the M. Zygomaticus major being the most effective somatovisceral predictors. Thus, physiological assessments may help to fine-tune fear level predictions. Yet, the utility of somatovisceral and cognitive risk factors fades when no prior corresponding information about an individual is available. Hence, the algorithm needs to learn about the individual (e.g., about a potential affective-autonomic response dissociation) in order to successfully integrate cognitive and somatovisceral information into fear level prediction for subsequent trials. Once such learning has taken place, the inclusion of somatovisceral predictors may be beneficial if one does not want to continuously interrupt an experiment and let participants consciously assess their fear levels. Future studies may replicate our findings with the consideration of additional physiological measures and further distinguish between baseline and reactivity responses.

Author statement
T.A. and D.S. developed the theory. T.A. and M.E.H. carried out the experiment. D.S. performed the computations. T.A. and F.S. verified the analytical methods. T.A. and D.S. wrote the manuscript with input from all authors.

Declaration of Competing Interest
The authors declare the absence of competing interests.