Change in heart rate variability with increasing time-on-task as a marker for mental fatigue: A systematic review

Fatigue-specific changes in the autonomic nervous system are often assumed to underlie the development of mental fatigue caused by prolonged cognitive tasks (i.e. Time-on-Task). Therefore, several previous studies have chosen to investigate the Time-on-Task related changes in heart rate variability (HRV). However, previous studies have used many different HRV indices, and their results often show inconsistencies. The present study, therefore, systematically reviewed previous empirical HRV studies with healthy individuals in which mental fatigue was induced by prolonged cognitive tasks. Articles relevant to the objectives were systematically searched and selected by applying the PRISMA guidelines. We screened 360 records found on 4 databases and found that 19 studies were eligible for full review in accordance with the inclusion criteria. With the exception of two studies, all studies reviewed found significant changes in HRV with increasing Time-on-Task, suggesting that HRV is a reliable autonomic marker for Time-on-Task induced fatigue. The most conclusive HRV indices that showed a consistent Time-on-Task effect were the low frequency component of HRV and the time domain indices, particularly the root mean square of successive differences. Time-on-Task typically induced an increasing trend in both types of measures.


Introduction
Mental fatigue (henceforth fatigue) is a subjective state, characterized by a reduced motivation to exert effort, which is often accompanied by fluctuations and impairments in cognitive performance (Ackerman, 2011; Hancock et al., 2012; Robert & Hockey, 1997). Fatigue is sometimes also referred to as the 'Stop-emotion' because it seems to function as an urge to discontinue further investment of resources into the task at hand (Van der Linden, 2011). The aversive feeling and detrimental effects associated with fatigue are frequently induced by long periods of continuous performance on demanding cognitive tasks, which is commonly referred to as Time-on-Task (ToT; e.g. Lorist, 2008; Faber et al., 2012). Mental fatigue due to Time-on-Task has been identified as a source of serious accidents, especially when the safe performance of a task requires continuous sustained attention (Bendak & Rashid, 2020; Matthews & Desmond, 2002). The high prevalence of mental fatigue in daily life and its potentially severe consequences have inspired much previous research. An important conclusion of research on the underlying mechanisms is that mental fatigue is a complex biopsychological state that includes motivational factors (Herlambang et al., 2019; Hopstaken et al., 2015a) as well as compensatory mechanisms that counteract the affective and cognitive effects of fatigue (Nakagawa et al., 2013; Robert & Hockey, 1997). Of the many physiological indices that reflect ToT-induced fatigue, a commonly used one is heart rate variability (HRV). There are numerous HRV indices, though, and their interpretation often differs across studies. Moreover, studies vary in the tasks they use to induce fatigue. Therefore, it is relevant to review the current literature in this field in order to examine whether conclusions can be drawn regarding which are the most useful HRV indices in fatigue research and what they may imply.
Recent theories on the topic have emphasized that cost-benefit evaluations play an important role in mental fatigue (Boksem & Tops, 2008). That is, the level of fatigue one experiences is partly under the influence of trade-offs between the immediate or potential rewards of an activity versus its costs, such as the amount of effort or time that has to be invested. For example, in the ToT paradigms that are typically used to study fatigue, it is assumed that with increasing ToT, the benefits of continuing the same activity get devalued (i.e. it is no longer worth the cost of performance) and, therefore, the motivation to exert further effort into that activity declines (Hopstaken et al., 2015b; Kok, 2022). In general, decreased motivation is associated with task disengagement. Yet, in many fatiguing task situations, a strong task disengagement is not a desirable option (e.g., for safety reasons) and thus needs to be avoided. This implies that a person will try to uphold adequate levels of task performance, despite experiencing relatively high levels of subjective fatigue. Accordingly, instead of reducing the level of effort, the person, in fact, has to increase compensatory effort in order to suppress the urge to stop paying attention to the task at hand (Hockey, 2011). In line with this notion, fatigue due to prolonged performance may involve task disengagement, compensatory effort, or a combination of both, depending on task and person characteristics. These two antagonistic processes might, however, be associated with opposing autonomic neural regulations: whereas task disengagement is more likely to be related to an enhanced parasympathetic activation (e.g. Matuz et al., 2021; Pattyn et al., 2008), compensatory effort is often hypothesized to induce distress accompanied by enhanced sympathetic activity (e.g. Hockey, 2011).
Due to the potentially dual autonomic character of Time-on-Task induced fatigue, several previous studies have chosen to investigate the Time-on-Task related changes in HRV (e.g. Gergelyfi et al., 2015a; Matuz et al., 2019; Redondo et al., 2019). HRV refers to the variability in intervals between successive heartbeats (i.e. R-R intervals), and is determined by the rhythm of sinoatrial node (SAN) activity, which is strongly influenced by interactions between the parasympathetic and the sympathetic inputs to the SAN (Draghici & Taylor, 2016). The association between HRV and fatigue-related changes in cognitive performance seems to be related to direct functional and structural connections between vagal control and multiple brain areas such as the limbic and prefrontal areas (e.g. Smith et al., 2017). These brain areas constitute important networks of emotional and cognitive regulation, and have been shown to change in activity level during fatiguing mental operations (Lorist et al., 2005; Darnai et al., 2023). HRV analysis can be performed in the frequency domain (quantifying the distribution of power into frequency ranges) and the time domain (quantifying HRV during certain monitoring periods), as well as by using non-linear analyses (quantifying the unpredictability of time series). In the context of fatigue research, HRV measurement is considered to be a valuable tool as HRV indices have been found to be predictive of fatigue in several studies (e.g. Melo et al., 2017; Matuz et al., 2021). In addition, the various HRV indices may differ in the extent to which they reflect the functioning of the parasympathetic and sympathetic systems (Martinmäki et al., 2006; Kiyono et al., 2017). Therefore, comparisons of HRV indices became a widely used method to estimate changes over time in parasympathetic (i.e. vagal) and sympathetic activity (Pagani et al., 1984, 1986). One example of the use of HRV measures is to look at the ratio of the low frequency component (LF) of HRV, presumed to partially reflect sympathetic activation, and the high frequency component (HF), which primarily reflects parasympathetic activation. This LF/HF ratio has been suggested as a marker of sympatho-vagal balance (Pagani et al., 1984).
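As a rough illustration of how an LF/HF ratio is obtained, the sketch below integrates a power spectral density (PSD) estimate over two frequency bands and divides the results. It assumes that a PSD has already been estimated from the R-R interval series; the function names and the default band limits (0.04-0.15 Hz for LF, 0.15-0.40 Hz for HF) are illustrative assumptions and are not taken from any particular study.

```python
def band_power(freqs, psd, low, high):
    """Integrate a PSD over [low, high] with the trapezoidal rule.

    freqs: ascending frequencies in Hz; psd: power density at each frequency.
    Only spectral segments lying fully inside the band are counted.
    """
    power = 0.0
    for i in range(len(freqs) - 1):
        f0, f1 = freqs[i], freqs[i + 1]
        if f0 >= low and f1 <= high:
            power += 0.5 * (psd[i] + psd[i + 1]) * (f1 - f0)
    return power


def lf_hf_ratio(freqs, psd, lf_band=(0.04, 0.15), hf_band=(0.15, 0.40)):
    """Return LF power divided by HF power (band limits are assumed defaults)."""
    lf = band_power(freqs, psd, *lf_band)
    hf = band_power(freqs, psd, *hf_band)
    if hf == 0:
        raise ValueError("HF power is zero; the ratio is undefined")
    return lf / hf


# Hypothetical flat spectrum sampled every 0.01 Hz, used only as a sanity check:
# with constant density the ratio reduces to the ratio of the band widths.
example_freqs = [i / 100 for i in range(51)]
example_psd = [1.0] * len(example_freqs)
example_ratio = lf_hf_ratio(example_freqs, example_psd)
```

The ratio depends directly on the chosen band limits and on the PSD estimator, so values are only comparable across studies that share both.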
HRV approaches such as those described above have been challenged, however, based on evidence showing that most of the HRV components are under a highly mixed sympathetic and parasympathetic influence (e.g. Laborde et al., 2017). Consequently, it was argued that they do not allow a clear differentiation of the activities of the two subbranches of the autonomic nervous system (ANS). For example, LF HRV has been linked to baroreflex rather than to cardiac sympathetic tone, suggesting that the LF/HF ratio cannot be interpreted as a marker of sympatho-vagal balance (Goldstein et al., 2011; Heathers, 2012; Rahman et al., 2011). Recent recommendations, therefore, suggest that when interpreting HRV findings, one should focus on the indices whose physiological background is identified as indicative of vagal tone (Sassi et al., 2015; Laborde et al., 2017).
Several measures of HRV have been suggested as indices of parasympathetic nervous system activity. For example, among the time domain measures, the root mean square of successive differences (RMSSD) and the percentage of interbeat intervals that differ by more than 50 ms (pNN50) have been assumed to reflect vagal tone. Other examples are the high frequency component (HF) of the frequency domain HRV measures, and the SD1 (a Poincaré plot component used to quantify short-term heart rate variability) of the non-linear measures (Malik et al., 1996; Berntson et al., 1997). However, it is important to note that the relationship between HRV and vagal function is complex and HRV should not be considered a direct measure of vagal activity. More specifically, HRV does not reflect full vagal outflow to the heart, because a substantial component of cardiac vagal tone seems to arise independently of HRV (Farmer et al., 2016). Furthermore, HRV is only driven by cardiac vagal activity and therefore may not be representative of overall vagal activity (e.g. Marmerstein et al., 2021). In addition, respiration substantially influences HRV (especially its HF component) irrespective of vagal activation (Ritz et al., 2012).
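The vagally mediated indices named above can be made concrete with a minimal sketch that computes RMSSD, pNN50, and SD1 from a list of R-R intervals in milliseconds. The function names and the example series are hypothetical; SD1 is computed here as the square root of half the sample variance of the successive differences, which is one common formulation, and the use of the sample (n - 1) variance is an assumption.

```python
import math


def successive_differences(rr_ms):
    """Differences between consecutive R-R intervals (ms)."""
    return [b - a for a, b in zip(rr_ms, rr_ms[1:])]


def rmssd(rr_ms):
    """Root mean square of successive differences (ms)."""
    diffs = successive_differences(rr_ms)
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


def pnn50(rr_ms):
    """Percentage of successive differences larger than 50 ms in absolute value."""
    diffs = successive_differences(rr_ms)
    return 100.0 * sum(1 for d in diffs if abs(d) > 50) / len(diffs)


def sd1(rr_ms):
    """Poincaré SD1: sqrt of half the sample variance of successive differences."""
    diffs = successive_differences(rr_ms)
    mean = sum(diffs) / len(diffs)
    variance = sum((d - mean) ** 2 for d in diffs) / (len(diffs) - 1)
    return math.sqrt(0.5 * variance)


# Hypothetical R-R series (ms), for illustration only.
example_rr = [800, 810, 790, 850, 840]
example_rmssd = rmssd(example_rr)
example_pnn50 = pnn50(example_rr)
example_sd1 = sd1(example_rr)
```

All three indices are built from the same successive differences, which is why they tend to move together and are usually grouped as short-term, vagally mediated measures.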
Furthermore, although LF HRV has long been considered an index of sympathetic activity, several studies have drawn attention to its unclear physiological background (e.g. Billman, 2009; Reyes del Paso et al., 2013). In line with this, it has been suggested that the LF/HF ratio should no longer be interpreted as an index of the sympatho-vagal balance (Billman, 2013; Eckberg, 1997).
From the line of reasoning above, it may become apparent that the interpretation of HRV has been changing. As such, it is relevant to review previous studies on the association between HRV and fatigue in light of the more recent recommendations and concepts regarding the functional interpretation of HRV. This is even more relevant as the previous studies often found opposite trends in relation to mental fatigue: some of the studies observed an increase in HRV (e.g. Matuz et al., 2021; Delliaux et al., 2019a) while others reported a decreased HRV as a function of time spent on the cognitive task (e.g. Melo et al., 2017). In addition, the indices used to assess changes in HRV also show a high variability across studies. Therefore, in the present study we systematically review previous fatigue studies that used HRV measurement. This review allows us to address questions regarding the nature of trends in HRV during prolonged, fatiguing performance of cognitive tasks, and whether there are task characteristics associated with different HRV findings. We also synthesize previous findings on the relationship between cardiac autonomic regulation and Time-on-Task induced fatigue, mainly based on those HRV indices that are supposed to have a 'clean physiological background', that is, which seem to be clearly related to vagal tone. To the best of our knowledge, no such systematic review has been published before. Although there are previous reviews on the associations between HRV and determinants of fatigue, such as self-regulatory mechanisms (Holzman & Bridgett, 2017) and mental workload (Charles & Nixon, 2019), a systematic review on the associations of HRV with Time-on-Task induced fatigue is lacking.

Search protocol
Articles relevant to the objectives outlined above were systematically searched and selected by applying the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines (Liberati et al., 2009; Page et al., 2021). As the first step, we searched the databases using pre-defined search terms (see Table 1). Three groups of search terms were defined: [1] terms relating to Heart Rate Variability, [2] terms most often used to describe the type of fatigue induced by prolonged cognitive tasks, and [3] terms used in association with fatigue and ToT.
The terms were entered as search query strings created by combining them with Boolean operators (i.e. OR, AND) in accordance with the specific syntax rules required by the databases. The general syntax was as follows (please see the Supplementary material, S1, for the database-specific search strings): (mental OR cognitive OR attentional OR motivational) AND (fatigue OR exhaustion OR tiredness OR vigilance OR "ToT" OR "prolonged task" OR "extended task") AND ("heart rate variability" OR "inter beat interval" OR "beat to beat interval" OR "R-R variability" OR "cardiac chronotropy").
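The way the general search string is assembled from the three term groups can be sketched as follows. This is only an illustration of the OR-within-group, AND-between-groups logic; the names `TERM_GROUPS` and `build_query` are hypothetical, and the database-specific strings in the Supplementary material may differ in syntax.

```python
# Term groups as given in the general syntax; quoted phrases keep their quotes.
TERM_GROUPS = [
    ["mental", "cognitive", "attentional", "motivational"],
    ["fatigue", "exhaustion", "tiredness", "vigilance",
     '"ToT"', '"prolonged task"', '"extended task"'],
    ['"heart rate variability"', '"inter beat interval"',
     '"beat to beat interval"', '"R-R variability"', '"cardiac chronotropy"'],
]


def build_query(term_groups):
    """Join terms with OR inside each group and join the groups with AND."""
    clauses = ["(" + " OR ".join(group) + ")" for group in term_groups]
    return " AND ".join(clauses)


general_query = build_query(TERM_GROUPS)
```

Keeping the groups as data makes it straightforward to split the query into smaller parts for a database with a character limit, by submitting subsets of a group in separate searches.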
The databases searched were PubMed, Web of Science, Scopus, and Google Scholar. Please note that Google Scholar has a character limit for searches; therefore, the above search string was applied in several smaller parts (see again the Supplementary material, S1). If the database (e.g. Scopus) provided a built-in filtering option to select document type and language, then the filters 'English' and 'Article' were applied. The 'All fields' search descriptor was selected for each database. Articles returned as the results of the search process were reviewed for eligibility using the exclusion and inclusion criteria (see Table 2): they were first reviewed based on the titles and the abstracts, and afterwards based on the full text. Following recommendations and previous systematic review studies, in the cases of Google Scholar and Scopus the 600 most relevant results were reviewed for eligibility (Haddaway et al., 2015; Bramer et al., 2017). In Google Scholar and Scopus, all relevant records screened for eligibility were found among the first 400 records. The search process was carried out in November 2023.

Eligibility criteria
Articles were considered eligible for review if they met the inclusion and exclusion criteria as listed in Table 2. In case of intervention (e.g. pharmacological intervention), stimulation (e.g. TMS), and patient studies, we always evaluated the eligibility of the control group or condition (if there was any) for being reviewed in this study. If the participants in the control conditions were not under any intervention and stimulation, that is, if the procedure of the control was in line with the eligibility criteria, then the results of that control were presented in the current review. To determine whether the ToT manipulation in a specific study was successful in inducing fatigue, we adopted the criteria often used in the literature. That is, fatigue studies usually acknowledge that mental fatigue has a subjective component (i.e., subjective sensation of fatigue) and an objective component (i.e., declined performance over time). Compromised performance is often considered an indication of fatigue. Importantly, however, it is also widely acknowledged that compromised performance is not necessarily a consequence of fatigue (Balkin & Wesensten, 2011; Fischler, 1999). For example, Robert and Hockey (1997) clearly made a case for compensatory strategies in which people can uphold adequate performance for a considerable time, despite high levels of fatigue. Therefore, we considered as eligible for review not only those studies that showed a performance deficit with ToT but also those that showed an increase in subjective fatigue while maintaining or even improving cognitive performance (see also Table 2).

Assessment of methodological quality and risk of bias
Methodological quality and risk of bias of the individual studies were assessed using a shortened version of the modified Downs and Black Checklist (Downs & Black, 1998). This Checklist is widely used in systematic reviews and is suited for randomised as well as non-randomised studies, covering quality domains including reporting, external validity, internal validity, and statistical power. For the present review, we created a shortened checklist (see Supplementary material, S2) selecting 13 of the total of 27 items, because we found many of the items not relevant for reviewing experimental studies testing healthy human participants. For similar reasons, further modifications were made in the criteria and/or scoring of several items (see below). Although the modifications were extensive, the adaptability limitations of the Downs and Black Checklist have been reduced through similar modifications in several previous studies (e.g. Gorber et al., 2007; Feijó et al., 2019), including systematic reviews of experimental studies (e.g. Groeber et al., 2019). With these modifications, using this checklist is beneficial in terms of the wide range of quality domains covered and the comparability of specific quality items across the reviewed studies. In the 13 items from the Downs and Black Checklist (see S2), further modifications were made in the criteria and/or the scoring of 5 items. The items originally relating to patients were interpreted as referring to participants in experiments (i.e. items 3 and 11), and item 4, relating to intervention/treatment, was interpreted as referring to the fatigue induction procedures. In addition, as in many other studies (e.g. Littlewood et al., 2020; Hwang et al., 2015), a simplified criterion for item 27 was used: instead of calculating a range of potential study powers, the power was scored by whether or not a power calculation was reported in the study. Except for item 25, the items were always dichotomously scored with either 0 (no, or unable to determine) or 1 (yes); the maximum possible score was 13. In item 25, which refers to the adjustments for confounding factors in analyses, we again made a modification in scoring. As there are a number of confounding variables potentially affecting mental fatigue and HRV (e.g. BMI, sleep quality, gender, age etc.), a partial score (0.5 points) was also used in scoring. That is, if the analyses were not controlled for any confounding variable, a score of 0 was given; if the analyses were controlled for one confounding variable only, we gave 0.5 points; and if multiple confounding variables were considered in the analyses, the score was 1 point. Each item in this modified Checklist was rated independently by two authors (AC, AM). Any disparities in the scores between the authors were resolved through discussion. The concept of the scoring method was extensively discussed with a non-author person (without any conflict of interest with the present review) who agreed with the concept and independently evaluated two of the selected articles that had partly the same authors as the present review. In assessing the quality of the studies, only those outcome variables and analyses were evaluated that were relevant to the present systematic review, that is, the reporting and analyses of cognitive performance, subjective fatigue, and HRV. The final Downs and Black checklist scores for each article selected are presented in the Supplementary materials (S3).

Note to Table 1. Where it was appropriate (e.g. 'R-R variability' and 'ToT'), terms were implemented in the search strings in two forms: with and without hyphens.

Table 2
Inclusion and exclusion criteria.

Inclusion criteria
• Empirical research papers published in peer-reviewed scientific journals in English.
• A human study was conducted.
• Fatigue was induced by a cognitive or perceptual task.
• In the results, there were clear indications of enhanced fatigue induced by ToT: subjective fatigue reported by participants was higher after the experiment than before and/or cognitive performance declined with increasing ToT.
• ToT analyses were performed and reported for HRV indices calculated on ECG signals measured during active task performance periods.
• HRV components analysed in the study were specified (e.g. the specific frequency range of HRV).

Exclusion criteria
• We excluded studies defined as pilot or preliminary studies.

Note. ToT: Time-on-Task; * In case of intervention, stimulation, and patient studies we always considered the eligibility of the control groups or conditions for being reviewed in this study. If the control conditions were in line with the eligibility criteria, then the results of that condition were presented in the current review.
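The scoring scheme of the shortened checklist, including the partial score for item 25, can be sketched as follows. The function names are hypothetical, and the 12 dichotomous items are represented simply as a list of yes/no ratings because the full item list is given only in the Supplementary material.

```python
def score_item_25(n_confounders_controlled):
    """Item 25: 0 for no confounder, 0.5 for exactly one, 1 for several."""
    if n_confounders_controlled < 0:
        raise ValueError("number of confounders cannot be negative")
    if n_confounders_controlled == 0:
        return 0.0
    if n_confounders_controlled == 1:
        return 0.5
    return 1.0


def total_quality_score(dichotomous_ratings, n_confounders_controlled):
    """Sum the 12 dichotomous items (True = 1, False = 0) and item 25."""
    if len(dichotomous_ratings) != 12:
        raise ValueError("expected ratings for the 12 dichotomous items")
    binary_part = sum(1 for rating in dichotomous_ratings if rating)
    return binary_part + score_item_25(n_confounders_controlled)


# Hypothetical study: 8 of the 12 dichotomous items met, one confounder controlled.
example_score = total_quality_score([True] * 8 + [False] * 4, 1)
```

With all 12 dichotomous items met and several confounders controlled, the function returns the stated maximum of 13 points.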

Data extraction and synthesis
We extracted the following 7 different types of data from the selected articles.
[1] General information: author, title and publication year etc.;
The primary objective of the present review was to provide an update on the direction and magnitude of changes in HRV induced by fatiguing prolonged performance of cognitive tasks. The diversity in methodological approaches and study designs used to induce fatigue precluded a formal meta-analysis; therefore, we conducted a qualitative synthesis of the study outcomes.

Study selection
Fig. 1 summarizes the results of the selection procedure performed in accordance with the PRISMA guidelines. The database search on the 4 databases yielded 2083 records in total, taking into account that for two databases (i.e. Google Scholar, Scopus) only the 600 most relevant records were considered for review (see also Method section above). Across the different databases, there were many duplicates among these records (n = 1723) that were removed before screening. After removing the duplicates, 360 records remained and were screened for eligibility based on the titles and abstracts. Of these articles, 146 seemed to be in line with the selection criteria and were further assessed for eligibility based on the full text. This deeper inspection resulted in 19 papers that met the inclusion criteria and did not meet the exclusion criteria. Articles were excluded for various reasons (see Fig. 1 for details), including non-eligibility of the type of study (e.g. it was a clinical study with only patient participants), no ToT analyses, and lack of indication of mental fatigue. In sum, 19 studies were included in this systematic review to synthesize the results of previous studies in relation to the direction and magnitude of changes in HRV induced by ToT.
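The record counts reported above can be checked with simple arithmetic, as in the sketch below. The function name is hypothetical; the two exclusion counts are derived here by subtraction and are not figures reported in the text.

```python
def prisma_flow(identified, duplicates, full_text_assessed, included):
    """Derive the screening counts of a PRISMA-style flow from reported totals."""
    screened = identified - duplicates
    if not (0 <= included <= full_text_assessed <= screened):
        raise ValueError("counts are not consistent with a selection funnel")
    return {
        "screened": screened,
        # Derived by subtraction, not reported in the text.
        "excluded_on_title_abstract": screened - full_text_assessed,
        "excluded_on_full_text": full_text_assessed - included,
        "included": included,
    }


# Counts as reported in the study selection paragraph.
flow = prisma_flow(identified=2083, duplicates=1723,
                   full_text_assessed=146, included=19)
```

The check confirms that removing 1723 duplicates from 2083 records leaves the 360 records that were screened.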

Sample characteristics
Table 3 summarizes the studies selected for review, including the characteristics of the samples, methods and main findings reported. The sample size of the studies ranged from 8 to 50 with a mean of 21 and a median of 19. The studies differed in the proportion of male and female participants, but overall, calculating across the studies, more male than female participants were tested (average proportion of males/females: 58.10% / 41.90%). Each of the studies tested relatively young adults with an age range of 18-40 years. In one study the age range of participants was not specified, but the authors refer to them as first year students (Pattyn et al., 2008). In two studies, the age of participants was reported for all the invited participants, but not specified for those finally selected for testing (Karthikeyan et al., 2022; Nann et al., 2021). In 10 of the 19 studies, it was explicitly reported that the participants were recruited from a university student population (Herlambang et al., 2021; Herlambang et al., 2019; Karthikeyan et al., 2022; Matuz et al., 2021; Matuz et al., 2019; Melo et al., 2017; Pattyn et al., 2008; Qin et al., 2021; Redondo et al., 2019; Shi et al., 2022; Zhao et al., 2012). In each study, participants were healthy, which was assessed mostly by self-report (one study also tested patients, but we included only the results of the healthy control group; Brouwer & van Wolffelaar, 1985). Taking medication prior to testing was an exclusion criterion in 9 studies (Delliaux et al., 2019a; Gergelyfi et al., 2015a; Karthikeyan et al., 2021; Matuz et al., 2019; Nann et al., 2021; Qin et al., 2021; Redondo et al., 2019; Smith et al., 2019; Zhao et al., 2012).

Table 3
Summary of the characteristics and results of the reviewed studies.

The reviewed studies used different protocols to instruct the participants before the experiments, by which they aimed to minimize the effect of potential confounding factors on cognitive performance and HRV (see Table 4 for a summary of instructions given to participants prior to the experiments). More specifically, it was common to instruct the participants to avoid alcohol (9 studies; for clarity, references are shown in Table 5 only) and caffeine consumption (12 studies), to have normal sleep (12 studies), and to restrict physical activity (5 studies) before the experiment. In contrast, only 2 studies instructed participants to avoid excessive mental activity before the experiments, and similarly, only 2 studies put an emphasis on controlling participants' pre-experiment meal.

Characteristics of cognitive tasks assessed for fatigue induction
The studies reviewed showed considerable variation in the duration of the task (i.e. ToT period) administered to induce mental fatigue. Specifically, the ToT periods ranged between 8.5 and 182 min with a mean of 80.86 min (median = 90 min). Typically, the studies using a psychomotor vigilance task applied shorter periods (10-45 min) for fatigue induction than the studies in which participants needed to discriminate between different stimulus conditions (e.g. Stroop task), or to find the target in a stimulus series (e.g. n-back tasks).
In line with the criteria and fatigue conceptions of the study selection, the fatigue manipulations used in each study were successful: participants reported higher (subjective) fatigue after task performance than before and/or participants' performance was impaired with increasing ToT. Specifically, 16 studies showed a significant decline in performance by the end of the ToT period. In 13 studies subjective fatigue was also assessed and the results indicated enhanced subjective fatigue induced by ToT. Only 2 of the 19 studies reviewed observed either no change in task performance (Matuz, 2021) or even an improvement in performance while participants nevertheless reported enhanced subjective fatigue (Delliaux et al., 2019b).
In studies where task conditions varied with respect to difficulty or reward (e.g. rewarded and non-rewarded conditions), subjective fatigue as well as objective performance could differ between the conditions. For example, in Nann et al. (2021), performance was impaired in the difficult condition only; in Herlambang et al. (2019), performance impairment was found (both in reaction time and accuracy) only when participants were not rewarded.
The tasks assessed across studies were also found to be highly variable in terms of the cognitive operations required. The tasks required cognitive functions such as vigilance (psychomotor vigilance tasks, monotonous driving tasks), working memory (n-back tasks), cognitive control (Stroop task, Go-NoGo task, AX-CPT), and complex problem-solving (Sudoku games). Except for Gergelyfi et al. (2015b), in each study the task used to induce fatigue was the same as the task used to examine the effects of the fatigue manipulation. Thus, these studies used a 'time on the same task' paradigm. In Gergelyfi et al. (2015b), prolonged performance on a Sudoku puzzle game was used to induce fatigue, whereas a working memory task and a simple reaction time task were used for evaluation of the ToT effects. Finally, the task modalities were predominantly visual, with the exception of a few studies using either auditory stimuli (Brouwer & van Wolffelaar, 1985) or bimodal (visual-auditory) stimuli (Matuz et al., 2021; Matuz et al., 2019).

Characteristics of HRV measurements
In each study reviewed, ECG was recorded for the entire duration of the task, which was divided into time windows (i.e. time bins) within which the HRV was calculated. The time windows varied from intervals of a few minutes to intervals of 30 minutes (see Table 3). More specifically, the time windows ranged between 1.66 and 30 min with a mean of 8.18 min, but the most frequently used time window was 5 min (i.e. the median was 5 min and 9 studies chose this time window to calculate HRV). Regarding the HRV indices computed, only a few studies analysed the very low frequency (VLF, 0.003-0.04 Hz; Smith et al., 2019; Delliaux et al., 2019b) and mid-frequency components of the HRV spectrum (MF, 0.07-0.14 Hz; Brouwer & van Wolffelaar, 1985; Herlambang et al., 2021; Herlambang et al., 2019), as well as the non-linear indices of HRV (SD1, SD2, entropy indices; Delliaux et al., 2019b; Gergelyfi et al., 2015b; Matuz et al., 2021). In contrast, most of the studies focused on ToT-related changes in the low (LF, 0.04-0.14 Hz) and high frequency components (HF, 0.15-0.4 Hz) of the HRV spectrum as well as the time domain indices (e.g., RMSSD, pNN50 etc.; for references see Table 3). In addition, 4 studies calculated the LF/HF ratio as a supposed marker of sympatho-vagal balance (Melo et al., 2017; Qin et al., 2021; Shi et al., 2022; Smith et al., 2019). Similarly, to compute proportional sympathetic and parasympathetic activities, numerous studies calculated normalized LF (LFnu) and HF (HFnu) power, respectively (Delliaux et al., 2019b; Qin et al., 2021; Shi et al., 2022; Smith et al., 2019; Tran et al., 2009). The studies, however, did not always clarify which normalization method was used: whether, for example, LFnu was calculated as LF power / total power, or as LF power / (total power - VLF power). A short description of each of the HRV indices used in the studies reviewed is provided in the Supplementary material (S4).
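The practical consequence of the unclear normalization can be illustrated with a small sketch that computes normalized power under both candidate definitions. The function name and the power values are hypothetical and serve only to show that the two definitions yield different numbers for the same recording.

```python
def normalized_power(band_power, total_power, vlf_power=0.0):
    """Band power divided by total power, optionally excluding VLF power.

    With vlf_power=0.0 this is band / total; otherwise band / (total - VLF).
    """
    denominator = total_power - vlf_power
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    return band_power / denominator


# Hypothetical spectral powers in ms^2 (VLF + LF + HF = total).
vlf, lf, hf = 400.0, 600.0, 1000.0
total = vlf + lf + hf

lfnu_over_total = normalized_power(lf, total)                 # LF / total power
lfnu_excluding_vlf = normalized_power(lf, total, vlf_power=vlf)  # LF / (total - VLF)
```

In this example the same LF power corresponds to a proportion of 0.30 under the first definition and 0.375 under the second, so normalized values should only be compared across studies that state and share the same definition.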

Methodological quality and risk of bias
Using the modified Downs and Black checklist (see description above), the quality check of the studies revealed that, of the maximum score available (i.e. 13 points), the studies reviewed scored 8.68 points on average with a range from 7 to 10.5 (median = 9, SD = 1.30). The results of the quality check for each individual study are shown in the Supplementary material (S3) and summarized for each quality criterion in Fig. 2. Scores on the quality criteria differed: for example, while there were criteria that were met by all studies, we also found criteria that none of the studies met. More specifically, each study clearly described the intervention of interest (item 4, Reporting quality), precisely reported the main findings (item 6, Reporting quality), and had valid outcome measures (item 20, Internal validity). In contrast, it could not be determined whether the participants were representative of the entire population from which they were recruited (items 11 and 12, External validity): because all articles reported experimental studies, relatively small sample sizes were used. The chosen sample size was justified with an a-priori power analysis in only 6 of the 19 studies (item 27, Power).

Note to Table 4. x: the instruction was reported in the study; -: information about the instruction is not available in the study. The period to which the instructions were applied varied across studies. q. specific instruction about mental activity was not mentioned in the study, but the participants reported that they did not have a mental overactivity.

Frequency domain
The VLF component of the HRV spectrum showed no change with ToT in a psychomotor vigilance task (Smith et al., 2019), and in two tasks assessing cognitive control: the AX-CPT (Smith et al., 2019) and a task-switching task (Delliaux et al., 2019b). In contrast, prolonged performance on the Stroop task, which is widely used to measure cognitive control, induced a significant increase in VLF as a function of ToT (Smith et al., 2019). In the studies reviewed, the findings for VLF were not explicitly linked to changes in specific psychological functions (e.g. cognitive effort, motivation) or sympatho-vagal balance.
The few studies that included measures of MF HRV consistently found a ToT-related increment in this component. Performance on a vigilance task (Brouwer & van Wolffelaar, 1985), a working memory task (Herlambang et al., 2019), and a complex problem-solving task (i.e. Sudoku; Herlambang et al., 2021) were all associated with increased MF as a function of ToT. In each of these studies, MF was considered a marker for mental effort and the authors interpreted their findings as evidence for an impaired sustained effort with increasing ToT. Please note that MF was defined as the upper range of the LF HRV spectrum in the reviewed studies, implying that the autonomic profile underlying MF may be similar to that of the LF. That is, MF may also reflect baroreflex activity (see e.g. Hippel et al., 2001).
LF or its normalized or transformed measures (e.g. LFnu, lnLF; henceforth LF) were analysed in 10 studies with various tasks (i.e. 10 different tasks in total, but Smith et al., 2019 used more than one type of task). In seven tasks, LF was observed to show an increase over time (Delliaux et al., 2019b; Karthikeyan et al., 2021; Karthikeyan et al., 2022; Matuz et al., 2021; Qin et al., 2021; Smith et al., 2019; Zhao et al., 2012; Shi et al., 2022). In three other tasks, however, no relationship with ToT was observed (Pattyn et al., 2008; Tran et al., 2009; Smith et al., 2019). The tasks associated with LF increment were highly variable (e.g. different simulator tasks, response selection tasks, memory tasks, etc.), and could be considered relatively complex regarding their underlying cognitive operations. Only one study reported LF-related findings for the less cognitively complex psychomotor vigilance test that mainly requires sustained attention: in Smith et al.'s (2019) study, a 45-min-long vigilance performance did not cause a significant increase in LF. Most studies interpreted LF either as a marker of sympathetic activity (e.g. Pattyn et al., 2008; Zhao et al., 2012) or otherwise as an HRV component under dual sympathetic and parasympathetic influence (Delliaux et al., 2019b; Qin et al., 2021). The finding of an increased LF with ToT, especially in combination with vagally mediated components of HRV showing different trends (e.g. HF decreased or unchanged), led the authors of the reviewed studies to conclude that mental fatigue is particularly associated with increased sympathetic activity (e.g. Tran et al., 2009; Zhao et al., 2012). Only one study made a direct link between LF and baroreflex sensitivity (Matuz et al., 2021).
In each of the studies, HF was considered a predominantly parasympathetic component of HRV. The trends found for HF, however, were far from conclusive. While an HF (or HFnu) decrement was observed in four studies (Nann et al., 2021; Zhao et al., 2012; Qin et al., 2021; Tran et al., 2009), HF (or HFnu) showed no ToT-related change in six studies (Smith et al., 2019; Delliaux et al., 2019b; Qin et al., 2021; Shi et al., 2022; Tran et al., 2009; Herlambang et al., 2019). In one study (Karthikeyan et al., 2022) a mixed trend was observed: HF first increased until about the mid-duration of the task and then significantly decreased. Moreover, four studies actually reported an increment in HF over time (Smith et al., 2019; Delliaux et al., 2019b; Matuz et al., 2021; Matuz et al., 2019). Finally, the LF/HF ratio, a widely used but often criticized index of sympatho-vagal balance, was found to relate to ToT-induced fatigue in only two of the six studies. These two studies used prolonged simulator tasks to induce fatigue and found an increased LF/HF ratio over time (Qin et al., 2021, flight simulator with in-built vigilance task; Shi et al., 2022, monotonous simulated driving task with speed limit). The other four studies, which used highly variable task settings (see e.g. Smith et al., 2019, testing LF/HF in three different tasks), found no significant effect of ToT on the LF/HF ratio (Melo et al., 2017; Smith et al., 2019; Tran et al., 2009; Delliaux et al., 2019b). The lack of change in the LF/HF ratio is in line with the finding that LF and HF do not necessarily vary proportionally and in opposite directions with ToT (see e.g. Matuz et al., 2021; Karthikeyan et al., 2022).
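The point that LF and HF need not move in opposite directions can be illustrated with a minimal sketch. It assumes that normalized units are computed as each band's share of LF + HF power, which is a common simplification of dividing by total power minus VLF. The function name and the band powers are hypothetical and are not taken from any reviewed study.

```python
def spectral_ratios(lf_power, hf_power):
    """Return LFnu, HFnu (percent) and LF/HF from absolute band powers (ms^2)."""
    if lf_power <= 0 or hf_power <= 0:
        raise ValueError("band powers must be positive")
    # Assumption: LF + HF stands in for (total power - VLF).
    total = lf_power + hf_power
    return {
        "lf_nu": 100.0 * lf_power / total,
        "hf_nu": 100.0 * hf_power / total,
        "lf_hf": lf_power / hf_power,
    }


# Hypothetical band powers early and late in a task: both bands rise by 50%.
early = spectral_ratios(lf_power=400.0, hf_power=400.0)
late = spectral_ratios(lf_power=600.0, hf_power=600.0)

# Absolute LF increased, yet the ratio and the normalized units did not move.
print(early["lf_hf"], late["lf_hf"])
```

In this constructed case, absolute LF rises with ToT while LF/HF, LFnu and HFnu stay constant, which is one way a study could report an LF increase together with an unchanged LF/HF ratio.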

Time domain
In each of the eleven studies that used time-domain HRV indices (e.g. RMSSD, pNN50, SDNN, SD of HR), these indices were interpreted as markers of parasympathetic activity (i.e. vagal tone). Seven out of the eleven studies reported increased time-domain indices as a function of ToT (Qin et al., 2021; Shi et al., 2022; Delliaux et al., 2019b; Karthikeyan et al., 2021; Mascord and Heath, 1992; Matuz et al., 2021; Matuz et al., 2019). One study (Karthikeyan et al., 2022) observed an increasing trend of RMSSD and SDNN over three 12-min-long ToT phases, followed by a plateau and a decline in these measures in the last two phases. In contrast, a decrement over time was observed in one study (Melo et al., 2017), and two studies found no significant changes in these indices (Redondo et al., 2019; Tran et al., 2009).
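For readers less familiar with these indices, the following minimal sketch shows how three of them are conventionally computed from a series of RR intervals. It assumes artefact-free RR intervals in milliseconds and uses the sample standard deviation for SDNN. The function name and the example values are hypothetical, and the reviewed studies may have used different software or preprocessing.

```python
import math


def time_domain_hrv(rr_ms):
    """Compute SDNN, RMSSD and pNN50 from RR intervals given in milliseconds."""
    n = len(rr_ms)
    if n < 3:
        raise ValueError("need at least three RR intervals")
    mean_rr = sum(rr_ms) / n
    # SDNN: standard deviation of all RR intervals (sample SD assumed here).
    sdnn = math.sqrt(sum((x - mean_rr) ** 2 for x in rr_ms) / (n - 1))
    # Successive differences between adjacent RR intervals.
    diffs = [b - a for a, b in zip(rr_ms, rr_ms[1:])]
    # RMSSD: root mean square of the successive differences.
    rmssd = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    # pNN50: percentage of successive differences larger than 50 ms.
    pnn50 = 100.0 * sum(1 for d in diffs if abs(d) > 50) / len(diffs)
    return {"sdnn": sdnn, "rmssd": rmssd, "pnn50": pnn50}


# Hypothetical RR series (ms), for illustration only.
example = time_domain_hrv([800, 810, 790, 850, 800])
print(example)
```

A ToT analysis would typically apply such a function to consecutive time windows of the task and compare the resulting values across windows.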

Non-linear domain
Only 4 studies reported changes in non-linear HRV indices as a function of ToT (Delliaux et al., 2019b; Matuz et al., 2021; Zhao et al., 2012; Gergelyfi et al., 2015b): each of these studies found an increased SD1 and/or SD2 over time. Of the different entropy indices, approximate entropy (ApEn) was examined in two studies, but contradictory results were found: while ApEn decreased with ToT in Delliaux et al. (2019b), it increased in Zhao et al. (2012). None of the other non-linear indices (e.g. the other entropy indices: SampEn, ShanEn) changed significantly with increasing ToT.

Discussion
The present study systematically reviewed empirical studies that have measured HRV and used Time-on-Task to induce mental fatigue in healthy individuals. In line with the selection criteria, in each study reviewed, we found a clear indication of increased mental fatigue induced by ToT: higher fatigue was reported by participants after the experiment than before and/or cognitive performance declined as a function of ToT. The studies clearly specified the HRV indices calculated on the ECG signals.
After screening 360 studies, we found 19 studies that met all the inclusion criteria. This suggests that while the view that HRV is a fatigue marker is widely accepted, so far there have been remarkably few controlled (e.g. minimizing environmental noise) experimental studies that have directly tested the relationship between fatigue and HRV in healthy participants. Although the number of studies was low, the findings were generally consistent: except for two studies, all studies reported significant changes in HRV with increasing ToT. The consistency of this finding under highly varying task conditions (e.g. task duration, type of task) yields the general conclusion that ToT frequently affects HRV. Importantly, however, the trends and sensitivities of HRV indices as a function of ToT were markedly different across the studies. We found LF to be one of the most frequently calculated indices (i.e. analysed in 10 studies), showing a consistent increase with ToT (i.e. an increment was found in 7 studies, and no significant change in 2 studies). In the reviewed studies, sympatho-vagal balance was commonly quantified by comparing LF, assumed to be a sympathetic source (or a mixed sympathetic-parasympathetic source), with HF, assumed to be a parasympathetic source. In the literature, however, the proportional comparison of LF and HF has been seriously challenged and is often considered to be an oversimplification of the complex and non-linear relationship between the sympathetic and parasympathetic systems (e.g. Billman, 2013; Houle & Billman, 1999).
In addition, the LF/HF ratio was found to be inconsistent and thus inconclusive in terms of its association with ToT. Only two of the six studies calculating LF/HF found this ratio indicative of ToT-induced mental fatigue. A similar inconsistency in the relationship between the LF/HF ratio and fatigue was reported in a recent systematic review that specifically addressed drivers' fatigue (Lu et al., 2022).
Besides LF, all other frequency components of HRV were investigated in only a few studies (e.g. VLF, MF) or otherwise showed mixed results across the studies in terms of the direction of change over time. For example, HF decreased in 4 studies, increased in 4 studies, and showed no ToT-related changes in 6 studies. Oscillation in HF is usually attributed to parasympathetic outflow but has also been linked to high sensitivity of respiratory functions (Laborde et al., 2017; Billman, 2013). HF may reflect a cardiorespiratory interaction that modulates energy expenditure in accordance with the changing demands (Grossman & Taylor, 2007).
In contrast to the frequency measures discussed above, the review of the time domain indices (e.g. RMSSD, SDNN) revealed more conclusive results. These indices were analysed in 11 studies, and nine showed a consistent HRV increase over the ToT. Time domain indices are typically interpreted as a vagally mediated component of HRV (e.g. Laborde et al., 2017) and thus their increment with ToT indicates that mental fatigue is mainly associated with an enhanced parasympathetic tone (please see the potential caveats of this interpretation in the Introduction). Similarly to the present review, Hidalgo-Muñoz et al. (2018), who investigated HRV during prolonged performance in a realistic flight simulator, also reported increased time domain HRV indices in association with ToT. The study of Hidalgo-Muñoz et al. (2018) was not included in the present review because subjective fatigue was not measured and no ToT-related changes in cognitive performance were reported.
One important inference that can be drawn from results on the time domain indices is that the increasing parasympathetic dominance with ToT seems to indicate that mental fatigue is not so much associated with activation of stress systems but rather involves a disengagement from the task at hand.
Although the non-linear HRV components (e.g. SD1, SD2) were calculated in only a fraction of the studies reviewed here (i.e. four studies), there are at least two reasons to nevertheless discuss these findings here. First, each of the three studies calculating SD1 and SD2 observed a clear increase in these components as a function of ToT. Second, it was previously demonstrated that RMSSD is mathematically and empirically identical to SD1: SD1 equals RMSSD multiplied by 1/√2 (Ciccone et al., 2017). This indicates that the increasing SD1 found in the studies should be interpreted similarly to the increasing trend observed for RMSSD. That is, both should be considered indications of enhanced parasympathetic function with ToT. Future research may want to consider the non-linear indices as reliable markers of fatigue, and SD1 as an indicator of parasympathetic function.
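The relationship between SD1 and RMSSD can be checked numerically with a short sketch. It assumes the common definition of SD1 as the standard deviation of the successive differences divided by √2, using the population variance. Under this definition the identity SD1 = RMSSD/√2 is exact when the mean successive difference is zero, a condition that longer recordings without a strong trend approximate closely. The function names and RR values are hypothetical.

```python
import math


def successive_diffs(rr_ms):
    """Differences between adjacent RR intervals (ms)."""
    return [b - a for a, b in zip(rr_ms, rr_ms[1:])]


def rmssd(rr_ms):
    """Root mean square of successive differences."""
    d = successive_diffs(rr_ms)
    return math.sqrt(sum(x * x for x in d) / len(d))


def sd1(rr_ms):
    """Poincare SD1: SD of successive differences divided by sqrt(2)."""
    d = successive_diffs(rr_ms)
    mean_d = sum(d) / len(d)
    var_d = sum((x - mean_d) ** 2 for x in d) / len(d)  # population variance
    return math.sqrt(var_d / 2)


# Hypothetical series whose first and last RR are equal (mean difference = 0).
stationary = [800, 810, 790, 850, 800]
# Hypothetical series with a pure linear trend (every difference = 20 ms).
trending = [800, 820, 840, 860]

print(sd1(stationary), rmssd(stationary) / math.sqrt(2))
print(sd1(trending), rmssd(trending))
```

For the first series the two quantities coincide. For the second, a pure trend, RMSSD is 20 ms while SD1 is zero, which shows why the equivalence is usually stated for detrended or approximately stationary segments.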
The generalisability of the conclusions drawn above is reduced by the fact that, overall, the external validity of the studies was low or could not be determined (see the quality check results). Specifically, all studies used university students or young people as participants and did not use samples representative of the wider population. This is, however, a typical limitation in human experimental studies; the studies reviewed here therefore did not differ in this respect from other research in the field. In addition, as a limitation of our conclusions, we need to note that this study reviewed published articles only and did not consider unpublished or grey data, so there is a possibility that the conclusions are affected by publication bias. On the other hand, the risk of such a bias seems to be very low based on the methodological study conducted by Schmucker et al. (2017).
Finally, it is worth mentioning that the studies in this review all compared HRV during active task performance at different stages of ToT. Studies that compared passive (non-task related) HRV before and after the task were not included in this review. The HRV indices analysed also seem to vary in these studies. For example, Mizuno et al. (2014; 2011) found a higher LF/HF ratio after the cognitively demanding task performance than before and concluded that an increase in sympathetic activation underlies the enhanced level of fatigue. In contrast, a different conclusion seems to be provided by Van Cutsem et al. (2022), who analysed resting HRV before and after a cognitively demanding task and found a clear increase in the parasympathetic measures of HRV (RMSSD, pNN50).

Conclusion
To conclude, although HRV measures are considered indicators of fatigue and effort expenditure, there is currently only a limited number of studies that have directly assessed HRV during active task performance in ToT paradigms. Of the various HRV measures used in the studies, the low frequency component (LF) and time-domain indices seem to show the most consistent results regarding ToT. The general picture emerging from these indices is that increasing ToT is associated with an increase in LF and time-domain measures. According to recent interpretations, while the physiological background of LF remains rather unclear, the time domain measures can be linked to parasympathetic activity (e.g. Laborde et al., 2017). This also seems to suggest that when participants work on a cognitively demanding task for a prolonged period, they are likely to disengage from the task, whereas there are no clear physiological signs (i.e. based on HRV) of compensatory effort.
In general, knowing the physiological and cognitive processes that are affected by mental fatigue may help to further elucidate the nature of this state, and may yield applications for better detection or prevention of fatigue in real-life settings. As such, the current review may be relevant as it provides an overview of the current state of research in this field, and may also guide future research.
[2] Aim: research aims of the individual studies; [3] Sample: participant characteristics including gender, age, and sample size; [4] Procedure: details of the experimental sessions including the type and duration (ToT period) of the cognitive task performed as well as the presence or absence of control conditions; [5] HRV measurement: details of HRV recording and analysis; [6] Findings: study outcomes in relation to the ToT-related changes in HRV, performance measures, and subjective fatigue; [7] Interpretation: authors' interpretation of the results including the changes found for HRV.

Fig. 1. Flow diagram showing the study selection. *For two databases, only the 600 most relevant records were screened for eligibility.

Fig. 2. Quality check results based on the Modified Downs and Black checklist for the assessment of methodological quality. Items (i1 to i27) are numbered in accordance with the original numbering of the Checklist. For the detailed description of each Checklist item see Supplementary material (S2). Numbers in the bars represent the number of studies that met (Yes), did not meet (No), or partially met (Partially yes) a quality check criterion, or whose quality could not be determined for that criterion (Unable to determine).

Table 1
Pre-defined search terms.

Table 4 (Continued)
Summary of the characteristics and results of the reviewed studies.

Table 5
Summary of instructions that participants needed to follow before the experiments.