HRT assessment reviewed: a systematic review of heart rate turbulence methodology

Heart rate turbulence (HRT) is a biphasic reaction to a ventricular premature contraction (VPC), mainly mediated by the baroreflex. It can be used for risk stratification in different disease patterns. Despite existing standards, HRT is measured and calculated in many different ways, which complicates research and application. Objective: This systematic review outlines and evaluates the methodological spectrum of HRT research, especially filtering criteria, parameter calculation and thresholds. Approach: The analysis includes all research papers written in English that were published before 12.10.2018, are listed on PubMed and involve the calculation of HRT parameter values. Main results: HRT assessment is still performed in various ways, and many articles omit important specifications of the methodology. Nevertheless, some suggestions regarding HRT methodology can be made: a normalised turbulence slope should be used to uncouple the parameter from heart rate and from the frequency of extrasystoles. The filtering criteria reviewed in the guidelines should be applied and reported. The minimal number of VPC snippets (VPCSs) as well as new cut-off values for different risks need to be evaluated further. Most importantly, the exact and complete methodology must be described to ensure reproducibility and comparability. Significance: Methodological variation hinders the comparability of research and medical application. The open research questions we formulate can help to further standardise the measurement and calculation of HRT and increase its value for medical risk stratification.


PRISMA: Preferred Reporting Items for Systematic reviews and Meta-Analyses
PVS: Programmed ventricular stimulation
refI: Reference interval
RMSSD: Square root of the mean of the squared successive differences between adjacent intervals
ROC: Receiver operating characteristics
SCD: Sudden cardiac death
SDNNI: SDNN index, mean of the standard deviations of all normal sinus rhythm intervals for all 5-min segments
SDNN: Standard deviation of all normal sinus rhythm intervals
TO: Turbulence onset
#TSRR: The number of intervals in which TS is calculated
TS: Turbulence slope
TT: Turbulence timing
VLoF: Very low frequency power
VPCI: VPC interval
VPCS: VPC snippet
VPC: Ventricular premature contraction
VT: Ventricular tachyarrhythmia

1. Introduction

Rationale
Heart rate turbulence is the naturally occurring fluctuation of heart rate (HR) after a ventricular premature contraction (VPC) (see figure 1) (Schmidt et al 1999). While the blood pressure disturbed by the VPC is being restored, a characteristic heart rate pattern occurs (see figure 2): as the VPC is a premature beat, it leads to a shortened interval, called the coupling interval (couplI). This interval is followed by a compensatory interval (compI) that is much longer than a normal cycle, because the ectopic beat suppresses one contraction that would regularly have been triggered by the sinus signal. After these two irregular intervals, a characteristic pattern can be observed, consisting of an initial fast increase of HR followed by a smooth HR decrease and a later return to baseline. The term heart rate turbulence (HRT) refers to this fluctuation of the HR after the compI. The turbulence depends mainly on the baroreflex and is thus an indirect marker of the condition of the autonomic nervous system (Cygankiewicz et al 2013). Two HRT parameters, turbulence onset (TO) and turbulence slope (TS), were developed to quantify HRT.
TO quantifies the initial increase of the heart rate after the compI and is suggested to reflect vagal inhibition (Lombardi et al 2011). The VPC is hemodynamically inefficient compared to a normal sinus beat: it has a lower contractile strength because of the missing atrial contraction and incomplete electrical recovery, and thus less synchronisation, and it ejects less blood volume because of reduced diastolic filling and higher afterload. Hence, the systolic blood pressure drops. The resulting lack of afferent baroreflex input leads to vagal inhibition (Bauer et al 2008). To quantify this fast reaction, the relative difference between the arithmetic mean of the two intervals after the compI (RR1 and RR2) and the arithmetic mean of the last two intervals before the couplI (RR−2 and RR−1) is calculated. TO is given as a percentage: it is first calculated for each VPC and then averaged to obtain a subject's overall TO value. TS is the steepest slope of the increase of interval length following the compI and reflects vagal activation (Lombardi et al 2011): after the initial inhibition, vagal activity recovers and the interval lengths increase. The slope is measured as the steepest regression line over any 5 consecutive intervals within the first 20 intervals after the compI and is given in ms/RR interval. In contrast to TO, an averaged tachogram of all suitable extrasystoles is calculated first, and TS is then assessed once per subject.
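The two definitions above translate directly into code. The following Python sketch is illustrative only: it assumes that the RR intervals (in ms) of every VPCS have already been extracted and filtered, that the preRRs are ordered in time (the last element is RR−1) and that at least 20 postRRs are available, starting with RR1. All function names are hypothetical. TO is computed per VPCS and then averaged, whereas TS is computed once on the beat-by-beat averaged tachogram.

```python
from statistics import mean


def turbulence_onset(pre_rr, post_rr):
    """TO in percent for a single VPC snippet.

    pre_rr:  sinus RR intervals (ms) before the couplI; pre_rr[-1] is RR-1.
    post_rr: sinus RR intervals (ms) after the compI; post_rr[0] is RR1.
    """
    before = (pre_rr[-2] + pre_rr[-1]) / 2  # mean of RR-2 and RR-1
    after = (post_rr[0] + post_rr[1]) / 2   # mean of RR1 and RR2
    return (after - before) / before * 100.0


def regression_slope(values):
    """Least-squares slope of the values against their beat index (ms/RR)."""
    n = len(values)
    x_mean = (n - 1) / 2
    y_mean = mean(values)
    num = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(values))
    den = sum((x - x_mean) ** 2 for x in range(n))
    return num / den


def turbulence_slope(post_rr_snippets, window=5, search=20):
    """TS in ms/RR: steepest regression slope over any `window` consecutive
    intervals within the first `search` intervals of the averaged tachogram."""
    # Average the post-compI tachograms of all snippets beat by beat.
    averaged = [mean(beat) for beat in zip(*(s[:search] for s in post_rr_snippets))]
    return max(regression_slope(averaged[i:i + window])
               for i in range(len(averaged) - window + 1))


def subject_hrt(snippets):
    """snippets: list of (pre_rr, post_rr) pairs from one subject."""
    to = mean(turbulence_onset(pre, post) for pre, post in snippets)  # per VPC, then averaged
    ts = turbulence_slope([post for _, post in snippets])              # once, on the average
    return to, ts


if __name__ == "__main__":
    pre = [800] * 5
    post = [760 + 10 * i for i in range(20)]  # synthetic: 10 ms lengthening per beat
    print(subject_hrt([(pre, post)]))  # TO about -4.4 %, TS 10.0 ms/RR
```

Averaging before the slope search follows the description above; computing TS per VPC and averaging afterwards would give a different (typically larger) value.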
The workflow of HRT assessment begins with a 24-h Holter electrocardiogram (ECG) recording. The recording has to be cleaned of erratic data, i.e. artefacts and noise. Afterwards, all VPC snippets (VPCSs) have to be singled out. A VPCS contains several intervals: the couplI between a sinus beat and the early occurring VPC, the long compI between the VPC and the following sinus beat, and a number of regular intervals preceding (preRRs) and following (postRRs) these two VPC intervals (VPCIs). The intervals of the VPCS are filtered based on a set of established criteria. These criteria concern the length of the intervals; they discard ectopy within the regular intervals and ensure a sufficient prematurity of the VPC. Afterwards, HRT can be calculated from the suitable VPCSs as described above. In healthy volunteers, TO ranges from −2.7% to −2.3% and TS from 11.0 to 19.2 ms/RR interval (Diaz et al 2003). Finally, patients can be classified into risk groups. As suggested by Schmidt et al (1999), TO values of 0% or higher and TS values of 2.5 ms/RR interval or lower are established as representing a high risk.

Figure 1. Example of a ventricular premature contraction (VPC): a stimulus originating from the ventricles leads to a broad QRS complex with an abnormal morphology, here shaped like a right bundle branch block pattern. The extrasystole appears prematurely, overrides the naturally occurring sinus signal and causes two specific intervals: the coupling interval (couplI) and the compensatory interval (compI), which are separated by the VPC. Together, these two intervals are as long as two of the surrounding sinus RR intervals. The intervals before (preRR) and after (postRR) the VPC intervals are framed by regular sinus beats.
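The risk classification mentioned above can be sketched as a simple dichotomisation. This Python example assumes the convention that TO ≥ 0% and TS ≤ 2.5 ms/RR interval count as abnormal; counting the abnormal parameters then gives the commonly used HRT categories 0, 1 and 2. The function name is hypothetical, and the cut-offs are parameters because studies differ (section 5 discusses cut-off values).

```python
def hrt_category(to_percent, ts_ms_per_rr, to_cutoff=0.0, ts_cutoff=2.5):
    """Return 0, 1 or 2: the number of abnormal HRT parameters.

    Assumed convention: TO >= to_cutoff (%) and TS <= ts_cutoff (ms/RR)
    count as abnormal. Both cut-offs are adjustable because studies differ.
    """
    to_abnormal = to_percent >= to_cutoff
    ts_abnormal = ts_ms_per_rr <= ts_cutoff
    return int(to_abnormal) + int(ts_abnormal)


if __name__ == "__main__":
    print(hrt_category(-2.5, 15.0))  # 0: both parameters normal
    print(hrt_category(0.4, 15.0))   # 1: only TO abnormal
    print(hrt_category(0.4, 1.8))    # 2: both abnormal (highest-risk group)
```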
In many studies, the two HRT parameters have been identified as feasible markers of all-cause mortality after myocardial infarction or in chronic heart failure, as reviewed by Cygankiewicz et al (2013). In recent studies, HRT has been shown to contribute significantly as a predictive factor in different clinical settings. Combined with traditional heart rate variability, HRT increases the sensitivity of the diagnosis of early cardiac autonomic neuropathy in diabetic patients to 98% (Lin et al 2017). Including HRT in a predictive model of death associated with chronic heart failure improved the accuracy of the model (Ramírez et al 2017). HRT increased the predictive sensitivity and specificity for life-threatening ventricular tachyarrhythmias (VTA) (Frolov et al 2017). The predictive value of HRT has also been shown in subjects with Marfan syndrome (Schaeffer et al 2015), chronic obstructive pulmonary disease (Gunduz et al 2009), metabolic syndrome or obstructive sleep apnea syndrome (D'Addio et al 2013).
In contrast to heart rate variability (HRV), which shows autonomic activity during sinus rhythm, HRT reflects autonomic responses to an endogenous disturbance. Cardiac diseases can significantly depress autonomic function and thus attenuate HRV (Malik et al 1996). Since they can also increase the occurrence of VPCs (Gorenek et al 2020), HRT may be able to detect residual autonomic activity that HRV cannot. Therefore, it can be a valuable addition to risk prediction, which usually incorporates many different indicators.
In 2008, the International Society for Holter and Noninvasive Electrophysiology consensus was published (Bauer et al 2008). Although it is referred to as a standard for HRT calculation, it did not define norms but rather summarised commonly used methods. Despite the common ground established in this paper, HRT methodology varies widely regarding data recording, filtering of measurements and the calculation itself. Different methodologies lead to different results. This lack of comparability hinders the understanding of HRT, e.g. of how strongly different factors influence the phenomenon. It also complicates finding suitable standards for the application of HRT in the medical field.

Objectives
We systematically assess the methodology of determining HRT parameter values in research since its original description. The scope of the study is not to review the clinical use of HRT but the assessment itself. The aim is to outline the different types of HRT assessment as well as their usefulness in order to create a basis for determining precise guidelines that will lead to higher comparability between studies.
When defining guidelines, the aims of ECG recording and HRT assessment have to be considered: recording ECGs specifically to assess HRT is rare and limited to scientific research. In most cases, HRT is calculated from recordings made during clinical practice without a focus on HRT. Therefore, guidelines have to take into account what can be implemented in daily clinical routine.

Figure 2. Schematic visualisation of heart rate turbulence (HRT): the ventricular premature contraction (VPC) creates a short coupling interval (couplI) between a sinus beat and the VPC and a long compensatory interval (compI) between the VPC and the following sinus beat, while an intermediate sinus signal is skipped. The actual HRT occurs afterwards, beginning with the first interval after the compI: the interval lengths show an initial drop-off, a steep increase and a subsequent decline. Turbulence onset (TO) represents the initial drop-off in interval length (i.e. the HR increase), while turbulence slope (TS) represents the subsequent increase in interval length.

Reviewing details
Our approach followed the Preferred Reporting Items for Systematic reviews and Meta-Analyses (PRISMA) statement (Liberati et al 2009) wherever possible. The PRISMA statement provides guidelines for systematic reviews and meta-analyses in the medical field. Since the scope of our study differs from that of common reviews, not all guidelines were applicable.
In a PubMed search conducted on 12.10.2018, we gathered all articles containing the term 'heart rate turbulence'. After checking for HRT assessment and language, we examined the remaining 240 articles regarding the exact HRT methodology used in the respective study. As baseline methods we used the filtering criteria of , the calculation rules by Schmidt et al (1999) and the minimum number of required VPCSs as reviewed by Bauer et al (2008). A detailed description of our methods with all examined aspects, baseline methods and resulting numbers of articles can be found in the appendix 'Methodological details'.
We want to give an overview of the most common practices in HRT assessment as well as a critical evaluation, to provide the reader with guidelines. Therefore, we structured our findings into chapters following a customary data analysis workflow:
• Section 2 'Data collection' summarises the variety of ECG recording durations, different approaches to assess HRT after other arrhythmias and the technique of inducing VPCs to circumvent the difficulty of obtaining enough suitable VPCSs. It also includes approaches that adapt TS to address several biasing influences related to different recording lengths.
• Section 3 'Data preparation' assesses the application of filtering criteria for VPCSs by . Furthermore, it covers the number of VPCSs needed to calculate HRT and the different approaches to handle measurements with fewer VPCSs.
• Section 4 'Data analysis' covers variations of the calculation techniques by Schmidt et al (1999) and a third established HRT parameter.
• Section 5 'Classification' discusses cut-off values for risk stratification and classification systems.
• Section 6 'Description of HRT assessment' covers the methodological details that need to be given in publications.

Data collection
The ECG recording duration is very important for HRT calculation for three reasons: first, enough suitable VPCSs must be available for HRT calculation. This can be a challenge, since VPCs can be very rare in healthy hearts (Kostis et al 1981). Secondly, TS is correlated with the number of VPCSs used for calculation and therefore with the length of the measurement (Cygankiewicz et al 2004, Hallstrom et al 2004). And finally, since HRT is based on the baroreflex and therefore on the autonomic nervous system, it depends on the circadian rhythm (Chen 2011, Cygankiewicz et al 2004, Hallstrom et al 2005).

Common recording durations
Although most inspected articles state the planned recording duration, the actual recording duration (e.g. given as the mean over the population) is mostly missing: in 198 articles, either Holter monitoring or 24 h is reported as the planned recording duration. Some studies used short recordings with a planned duration of less than 1 h (Raj et al 2005, Berkowitsch et al 2004, Davos et al 2009, Malberg et al 2004, Davies et al 2001, Jeron et al 2003, Wichterle et al 2006, Segerson et al 2007, Tekelioglu et al 2013) or between 2 and 8 h (Rich et al 2012, Gil et al 2013, Tekelioglu et al 2013, D'Addio et al 2013, D'Addio et al 2014, Watanabe et al 2008). In several studies, 24-h Holter recordings were cut into 1, 2, 4 or 6 h snippets (Cygankiewicz et al 2004, Hallstrom et al 2004, Hallstrom et al 2005, Lewis et al 2011, Watanabe et al 2007). In contrast, two studies used longer recordings, namely 28 h (Ovreiu et al 2008) and 7 days (Manzano-Fernández et al 2011).

Methods to enable HRT assessment
Naturally, HRT can only be calculated when recordings contain VPCSs. Depending on the study population, obtaining a sufficient number of suitable VPCSs can be challenging, since the prevalence of VPCs increases with age (Kostis et al 1981) and with cardiovascular risk (Latchamsetty et al 2015, Lown et al 1971). The number of VPCSs is likely to decrease even further after filtering. Options to address a low incidence of VPCs for HRT assessment are extending the recording time, inducing VPCs, or using other heartbeats to calculate HRT.

Extending the recording duration
The HRT guidelines (Bauer et al 2008) (Iwasa et al 2005). This is to be expected owing to different measurement conditions and lengths: first, the number of VPCs used for calculation is correlated with TS, as mentioned before (Cygankiewicz et al 2013). Depending on the incidence of spontaneous VPCs in subjects, TS may differ significantly. Secondly, hemodynamic changes depend on the subject's position and movement: values can be assumed to differ between Holter ECG (with varying, but mostly upright position and movement) and induced HRT (in supine position at rest). Makai et al showed that TS differed significantly between supine and upright position and correlated significantly with baroreceptor sensitivity in both positions. However, HR, which is negatively correlated with TS, also differed significantly but was not taken into account as a covariate in this study (Makai et al 2008). Thirdly, sedation can affect the cardiovascular system depending on the sedative and its dose (Lau et al 1993, Kanaya et al 2003, Frölich et al 2011). Unfortunately, the article by Iwasa et al specifies neither the sedative nor differences in other cardiovascular parameters (Iwasa et al 2005). In summary, although HRT can be triggered by electrical stimuli, the validity of HRT from VPCs induced at rest compared with HRT from VPCs in Holter ECG remains to be investigated. Again, this approach is unsuitable for clinical practice, because it entails additional stress and effort for both patient and medical staff.

HRT based on other arrhythmias
The turbulence following a VPC is mainly mediated by the baroreflex responding to the blood pressure turbulence (Wichterle et al 2006, Davies et al 2001, Lin et al 2002). It can be assumed that the baroreflex reacts similarly to other arrhythmic events and, thus, that HRT also occurs after them. This has been studied for atrial premature contractions (APCs), premature sinus beats and ventricular tachyarrhythmia (VT). To prevent ambiguity, HRT and its parameters are marked in this section with the subscripts V (ventricular), A (atrial), S (sino-atrial) or T (tachycardial).
Atrial premature complexes: Heddle et al analysed the HR dynamics after an atrial premature contraction (APC) and found no turbulence but continuously decreasing interval lengths. The first interval after the APC was lengthened, and the interval lengths decreased for about five intervals (Heddle et al 1985). In contrast, Lindgren et al compared HRT after APCs and VPCs and found a one-beat-delayed HRT_A: the first interval after the compI was relatively long and was followed by a shortened second interval (Lindgren et al 2003). A similar pattern of a one-beat delay followed by HR acceleration and deceleration was shown by Wichterle et al (2007).
HRT_A was milder, meaning that TS_A was lower and TO_A higher, but still negative (Lindgren et al 2003). Similar results, but with a positive TO_A, were found by Savelieva et al, Wichterle et al and Ovreiu et al (Savelieva et al 2003, Ovreiu et al 2008). The abnormal TO_A may depend on the study populations, which consisted of patients referred for VT evaluation or after cardiac surgery (Ovreiu et al 2008). However, about half of the patients examined by Schwab et al had sudden cardiac death (SCD), yet the median TO_A was negative. Such a population effect should affect both TO and TS, but Savelieva et al described a strong correlation between TS_A and TS_V, though not between TO_A and TO_V. One explanation is the one-beat delay in HRT_A (Lindgren et al 2003). Thus, an adjusted TO_A calculation using the second and third intervals after the APC compI may be more suitable and better reflect TO_A. While Savelieva et al and Schwab et al found no correlation between the prematurity (namely the couplI length) and HRT_A, except for TO_A induced in the sinus node, another study showed that TS_A and TS_V both correlate inversely and significantly with the couplI. Wichterle et al later found a correlation between APC prematurity and TO_A, but not TS_A.
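The adjustment proposed above amounts to shifting the post-compensatory intervals that enter TO. In this hypothetical Python sketch, `offset=0` gives the standard TO (RR1 and RR2), while `offset=1` uses the second and third intervals after the compI, as suggested for APCs. The synthetic snippet only mimics a one-beat delay and is not real data.

```python
def turbulence_onset(pre_rr, post_rr, offset=0):
    """TO in percent; `offset` shifts the two post-compI intervals used.

    pre_rr[-1] is the last sinus interval before the couplI (RR-1);
    post_rr[0] is the first interval after the compI (RR1).
    offset=0 -> RR1 and RR2 (standard), offset=1 -> RR2 and RR3 (shifted).
    """
    before = (pre_rr[-2] + pre_rr[-1]) / 2
    after = (post_rr[offset] + post_rr[offset + 1]) / 2
    return (after - before) / before * 100.0


if __name__ == "__main__":
    pre = [800, 800]
    # Synthetic one-beat-delayed response: RR1 is still long, shortening starts at RR2.
    post = [820, 760, 770, 790, 800]
    print(turbulence_onset(pre, post))            # standard TO, about -1.25 %
    print(turbulence_onset(pre, post, offset=1))  # shifted TO, about -4.375 %
```

With a delayed response, the standard TO mixes the still-long first interval into the onset and therefore looks attenuated; the shifted variant captures the actual acceleration.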
Three studies investigated the usefulness of HRT_A: TS_A could be used as a risk stratifier for all-cause mortality after myocardial infarction (MI), although it was a weaker predictor than TS_V. In their study, Wichterle et al found an optimal dichotomy cut-off of 0.8 ms/RR interval for TS_A. In another study, patients with atrial fibrillation (AF) after cardiac surgery had a significantly higher TS_A, while there was no difference in TO_A (Ovreiu et al 2008). Finally, while no other measurements showed significant differences between non-AF and AF episodes, TO_A was impaired within one hour before atrial fibrillation (Vikman et al 2005).
Sinus beats: Voss et al first proposed HRT after sinus beats, using 'most premature normal beats' defined as having a couplI length <80% and a compI length >120% of the mean of all intervals in the measurement. None of the HRT parameters (TO_S, TS_S, CI/COMPP_S) differed significantly between dilated cardiomyopathy (DCM) patients and controls. TO_S was positive and TS_S attenuated in comparison to HRT_V. The same approach was used by Jochum et al to find differences in patients before and during clomethiazole therapy. Except for a significant change of TS_S between before medication and after 2 h of medication, no changes in HRT after ventricular or sinus beats were found. Again, TO_S was positive while TO_V was negative. TS_S and TS_V showed similar results (Jochum et al 2012).
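The selection of 'most premature normal beats' described above can be sketched as a scan over an RR series. This Python example is an illustration only: it assumes that all intervals are sinus intervals and uses the arithmetic mean as the 'mean of all intervals' (the type of mean is an assumption here); the function name is hypothetical.

```python
from statistics import mean


def most_premature_normal_beats(rr_ms, coupl_ratio=0.8, comp_ratio=1.2):
    """Indices i where rr_ms[i] can serve as couplI and rr_ms[i + 1] as compI.

    Criteria as described for HRT after sinus beats: couplI < 80 % and
    compI > 120 % of the mean of all intervals (assumed: arithmetic mean).
    """
    ref = mean(rr_ms)
    return [i for i in range(len(rr_ms) - 1)
            if rr_ms[i] < coupl_ratio * ref and rr_ms[i + 1] > comp_ratio * ref]


if __name__ == "__main__":
    rr = [800] * 10 + [600, 1000] + [800] * 10
    print(most_premature_normal_beats(rr))  # [10]
```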
In conclusion, a sinus beat with suitable prematurity can trigger turbulence, but HRT_S lacks the typical initial HR increase. This is similar to HRT_A, but neither Jochum et al nor Voss et al describe a one-beat delay (Jochum et al 2012).

Ventricular tachycardia: The third approach was to calculate HRT after spontaneous or induced VT (Flevari et al 2007, Raj et al 2005). Both TO_T and TS_T were present after VT (Raj et al 2005, Flevari et al 2007). Flevari et al report that HRT_V and HRT_T correlated significantly, although TO_T was significantly higher than TO_V (but still marginally negative) (Flevari et al 2007). Additionally, both HRT_T parameters correlated significantly with HRV parameters: TO_T correlated with SDNN (Flevari et al 2007). Analogously, TO_V has been shown to correlate with SDNN (Koyama et al 2002, Sestito et al 2004, Ghuran et al 2002, Lindgren et al 2003). TS_T correlated with SDNN, left ventricular ejection fraction (LVEF), SDNNI, VLoF and HiF (Flevari et al 2007). Similarly, TS_V correlated with LVEF (Koyama et al 2002), SDNN (Cygankiewicz et al 2004, Sestito et al 2004, Ghuran et al 2002, Koyama et al 2002), SDNNI (Cygankiewicz et al 2004, Sestito et al 2004), LoF (Cygankiewicz et al 2004, Sestito et al 2004) and HiF (Cygankiewicz et al 2004, Koyama et al 2002). Further information about HRV parameters can be found in the standards of measurement, physiological interpretation and clinical use of HRV (Malik et al 1996).
A turbulence pattern exists after VT, but the results on HRT_T are ambiguous: different results have been reported regarding the correlation of HRT with the mean interval length of the VT (Flevari et al 2007) and with the number of tachycardic beats (Flevari et al 2007, Raj et al 2005).
In conclusion, ventricular tachycardia triggers HRT_T, which may be usable for HRT analysis. Like TS_V, TS_T is diminished in subjects with reduced LVEF (Flevari et al 2007), while TO_T is not (Flevari et al 2007). However, the studies included only subjects with cardiac disease (VT, heart failure (HF)) and no control groups, so the prognostic value of HRT_T remains to be analysed. Furthermore, the ambiguous results do not allow a sound conclusion on whether the number and interval lengths of the VT beats influence HRT_T (see section 2).

Methods to reduce bias on parameter values
HRT can be influenced by various factors. Apart from diseases associated with impairment of the autonomic nervous system, HRT is biased by age, HR and the number of VPCSs used for the calculation, as reviewed by Bauer et al (2008) and Cygankiewicz et al (2013). The latter two influences have to be taken into account during data collection, because they have implications for the desired ECG recording duration.

Influences related to recording duration
Heart rate: The baroreflex correlates with HR (Melenovsky et al 2005); consequently, HRT correlates with HR as well: TS has been shown to decrease with higher HR (Cygankiewicz et al 2004, Kowalewski et al 2007). Zaza et al reason that the non-linear neural modulation of HR leads to a rate dependency of all autonomic markers (Zaza et al 2001). However, TO does not seem to be influenced by HR (Cygankiewicz et al 2004). In contrast, Schwab et al report a correlation of TO with HR as well as a correlation of TS with HR in men, but not in women. The authors speculate that this sex-dependent variation is based on a higher sympathetic tone in men, as described by Ramaekers et al (1998). An effect of HR on TO is less likely, since TO characterises sudden changes while TS is based on a slow change in HR (Cygankiewicz et al 2004).
Circadian rhythm: Since HR shows a circadian rhythm and HRT depends on HR, the circadian rhythm can bias HRT parameter values when recording durations shorter or longer than 24 h are used (Hallstrom et al 2004). Even after normalisation with respect to HR, TS has been shown to retain a weaker circadian pattern (Watanabe et al 2007). In this article, even TO showed this rhythm (Watanabe et al 2007), which had not been shown in earlier studies analysing circadian patterns in HRT (Hallstrom et al 2005, Hallstrom et al 2004, Cygankiewicz et al 2004). A circadian rhythm that is not based solely on HR can be assumed, since the incidence of VPCs is higher during the day than at night (Lown et al 1973, Chen 2011).

Number of VPCs: HRT is attenuated with an increasing number of VPCSs (Chen 2009). Some studies show only a correlation between TS and VPCSs (Cygankiewicz et al 2004, Hallstrom et al 2005). Secondly, Chen proposed a baroreflex fatigue caused by a high VPC burden and suggested that longer pauses between VPCSs should be used for HRT calculation (Chen 2009). Last, the relationship between TS and the number of VPCSs is mathematically induced (Hallstrom et al 2004): even for a sequence of 15 normal beats with random variation, the maximising step will find a slope proportional to the standard deviation of that variation. This standard deviation in turn depends on the number of VPCSs, because averaging the tachogram reduces it by a factor of 1/√#VPCSs.
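The mathematically induced part of this dependency can be reproduced with a small simulation. The Python sketch below is an illustration under stated assumptions, not the analysis of Hallstrom et al: postRRs are modelled as a constant 800 ms plus independent Gaussian noise (sd 30 ms, an arbitrary choice), so the true TS is zero. Because the averaged tachogram has a noise standard deviation of sd/√#VPCSs, the spurious TS found by the maximising step should shrink in proportion to 1/√#VPCSs.

```python
import math
import random


def average(values):
    return math.fsum(values) / len(values)


def max_regression_slope(values, window=5):
    """Steepest least-squares slope over any `window` consecutive values."""
    x_mean = (window - 1) / 2
    den = sum((x - x_mean) ** 2 for x in range(window))
    best = -math.inf
    for start in range(len(values) - window + 1):
        seg = values[start:start + window]
        y_mean = average(seg)
        num = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(seg))
        best = max(best, num / den)
    return best


def simulated_ts(n_vpcs, sd_ms=30.0, n_post=20, baseline_ms=800.0, trials=200, seed=1):
    """Mean TS obtained from pure noise (no turbulence at all).

    Each simulated VPCS contributes n_post intervals of baseline + Gaussian
    noise; the snippets are averaged beat by beat before the slope search.
    """
    rng = random.Random(seed)
    results = []
    for _ in range(trials):
        snippets = [[rng.gauss(baseline_ms, sd_ms) for _ in range(n_post)]
                    for _ in range(n_vpcs)]
        averaged = [average(beat) for beat in zip(*snippets)]
        results.append(max_regression_slope(averaged))
    return average(results)


if __name__ == "__main__":
    for n in (1, 4, 16, 64):
        # Expected: roughly halving per fourfold increase in #VPCSs.
        print(n, round(simulated_ts(n), 2))
```

The positive TS obtained from noise alone illustrates why a TS computed from few VPCSs is biased towards higher values.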

Adjustment of turbulence slope
An approach to bypass these biases on TS is to normalise it with regard to HR and the number of VPCSs used. Such a parameter would be independent of the circadian pattern and of the recording length. Several adaptations have been proposed, which are discussed below. To reduce confusion, we adopted the parameter nomenclature of the respective original papers. Cygankiewicz et al adapt TS to HR (Cygankiewicz et al 2004) by assuming a power-law relationship between TS and the RR interval, where TS_c is the 'corrected' TS, i.e. the part of TS that is independent of HR. The exponent x is population-specific. It can be determined by a linear regression of log TS on log RR, since the power law becomes linear on a logarithmic scale. For their population, Cygankiewicz et al found a value of x = 3.4, from which TS_c is obtained. Hallstrom et al used two normalisation steps, one for HR and one for the number of VPCSs. First, they rescaled the tachogram to an average HR of 75 bpm before the calculation, which yields nTS. In a second step, they addressed the fact that TS (and also nTS) is biased towards higher values for a low number of VPCSs. For a set of intervals that should yield a TS of zero, the maximising step in the calculation actually introduces a slope proportional to the standard deviation of the local variation in the intervals. The averaging step before the calculation of this slope reduces this standard deviation, which can be assumed to equal RMSSD, by a factor of 1/√#VPCSs. This yields an expression for the expected bias, in which k is the number of intervals in which TS is calculated (#TSRR). Hallstrom et al determined the numerical parameters of this expression by fitting it to simulated data. The new unbiased estimator vnTS is then derived from nTS and this expected bias. This approach was reused by Yang et al (2005) and D'Addio et al (2014). The adjustments of TS to HR and to the number of VPCSs abolished the dependency on circadian patterns for both parameters, as well as the dependency on recording duration for vnTS (Hallstrom et al 2004).
Like TS, the new parameter vnTS is correlated with age (Hallstrom et al 2004).
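The two HR adjustments can be sketched as follows. This Python example rests on assumptions that go beyond the text above: 'an average HR of 75 bpm' is implemented as scaling the tachogram to a mean RR interval of 800 ms, and the Cygankiewicz-type correction assumes the power law TS = TS_c · RR^x, with the corrected value expressed at a hypothetical reference interval of 800 ms (a constant multiple of TS_c) to keep the numbers readable. Hallstrom's bias term for vnTS is not reproduced, since its fitted constants are not given here. All function names are hypothetical.

```python
import math


def rescale_to_75_bpm(rr_ms):
    """Scale a tachogram so that its mean RR interval is 800 ms (75 bpm).

    Assumption: 'average HR of 75 bpm' is read as a mean RR of 800 ms.
    TS computed on the rescaled tachogram then corresponds to nTS.
    """
    factor = 800.0 / (math.fsum(rr_ms) / len(rr_ms))
    return [r * factor for r in rr_ms]


def fit_rr_exponent(mean_rr_ms, ts_values):
    """Slope x of the least-squares regression of log(TS) on log(RR),
    assuming TS = TS_c * RR**x across a population."""
    lx = [math.log(r) for r in mean_rr_ms]
    ly = [math.log(t) for t in ts_values]
    mx, my = math.fsum(lx) / len(lx), math.fsum(ly) / len(ly)
    num = math.fsum((a - mx) * (b - my) for a, b in zip(lx, ly))
    den = math.fsum((a - mx) ** 2 for a in lx)
    return num / den


def hr_corrected_ts(ts, mean_rr_ms, x, ref_rr_ms=800.0):
    """TS expected at the hypothetical reference interval under the power law."""
    return ts * (ref_rr_ms / mean_rr_ms) ** x


if __name__ == "__main__":
    rr = [600, 700, 800, 900, 1000]
    ts = [1e-9 * r ** 3.4 for r in rr]  # synthetic data following the power law exactly
    x = fit_rr_exponent(rr, ts)
    print(round(x, 3))  # 3.4
    print([round(hr_corrected_ts(t, r, x), 3) for t, r in zip(ts, rr)])  # all equal
```

In practice, x would be estimated from the study population (Cygankiewicz et al report 3.4 for theirs); the synthetic data above only check that the fit recovers a known exponent.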
Hallstrom's vnTS showed the same clinical value as the standard TS: D'Addio et al found a significant difference in both TS and vnTS in obstructive sleep apnea syndrome (OSAS) patients between normal breathing and obstructive apnea breathing (D'Addio et al 2014). Similarly, Yang et al used vTS (adapted only to the number of VPCSs) and found a significant decrease of both TS and vTS in patients with moderate or severe OSAS compared to patients with mild OSAS, with both parameters correlating identically and inversely with the apnea-hypopnea index. The positive and negative predictive values of TS, though, were higher than those of vTS. Still, TS and vnTS were both univariate predictors of survival after MI (Hallstrom et al 2005). Cygankiewicz' adjusted TS_c, too, showed the same results as TS and decreased significantly after CABG surgery (Cygankiewicz et al 2004). Nevertheless, a TS adjusted to both HR and the number of VPCSs is less variable and therefore considerably more comparable.

6 The HRV parameter RMSSD is the square root of the mean of the squared successive differences between adjacent intervals (Malik et al 1996).
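RMSSD, which Hallstrom et al use as an estimate of the local interval variation, and the 1/√#VPCSs reduction described above can be computed as follows. This is a minimal Python sketch; the function names are hypothetical, and the second function only expresses the scaling assumption stated in the text, not Hallstrom's full bias formula.

```python
import math


def rmssd(rr_ms):
    """Square root of the mean of the squared successive differences (ms)."""
    diffs = [b - a for a, b in zip(rr_ms, rr_ms[1:])]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


def residual_variation(rmssd_ms, n_vpcs):
    """Local variation left after averaging n_vpcs tachograms, assuming it
    equals RMSSD and shrinks by a factor of 1/sqrt(#VPCSs)."""
    return rmssd_ms / math.sqrt(n_vpcs)


if __name__ == "__main__":
    r = rmssd([800, 810, 800, 810])  # successive differences: +10, -10, +10
    print(r)                          # 10.0
    print(residual_variation(r, 4))   # 5.0
```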

Conclusion: data collection
Most articles use 24-h recordings for HRT assessment. However, a minimal number of VPCSs has to be recorded for parameter calculation, which may be challenging in some populations. Longer recordings, HRT induced by electrical stimuli and HRT calculated from other types of contractions could all increase the amount of suitable data. While extended recording durations and induced VPCs are not feasible in clinical routine, calculating HRT based on other contractions might increase the accessible data. HRT after APCs is the best studied, but further research is needed to determine the prognostic value of these types of HRT. In particular, we suggest investigating whether a shifted onset of HRT after APCs compared with VPCs can be verified and, if so, adapting the HRT parameter calculation to take this shifted onset into account. For HRT assessment where longer recordings are possible, the extent of day-to-day HRT variability remains to be studied and should be taken into account when determining the optimal recording duration.
Furthermore, HR and the number of VPCSs bias TS. This makes the parameter dependent on the recording duration as well as on the circadian rhythm. To cancel these influences, we suggest using a TS adjusted to HR and the number of VPCSs. In addition, to improve comparability, we suggest reporting the arithmetic mean of the ECG recording duration, the time of day if the recording durations are not multiples of 24 h, and the mean number of VPCSs used for HRT calculation per patient.

Research questions:
• Can HRT after other arrhythmias be used with a similar prognostic value?
• Is there a day-to-day variability of HRT?
• How long is the optimal recording duration to calculate valid HRT parameter values?

Data preparation
As a first step after data collection, the recorded data has to be filtered for suitability. Apart from checking the quality (e.g. noise), data is scanned for two aspects: Firstly, VPCSs have to be suitable for HRT calculation meaning that enough sinus beats have to be present before and after the VPCI and all intervals have a suitable length. Secondly, records must contain a minimal number of suitable VPCSs.

Filtering criteria
Only snippets that are as free of bias as possible, e.g. from other arrhythmias, and that contain effective VPCs with sufficient prematurity should be included in the HRT calculation. Therefore, the VPCs and their surrounding intervals have to be filtered. Since the filtering criteria determine the input for the calculation, the exact procedure is crucial. The original publication of HRT suggested no filtering criteria (Schmidt et al 1999). The first set of criteria was introduced by  (see the filtering criteria summarised in section 1.1 or detailed in the appendix). These criteria were repeated almost identically in the HRT guidelines (Bauer et al 2008). Nevertheless, there are many different approaches to filtering the interval series before parameter calculation. Some studies report excluding VPCSs containing abnormal beats (arrhythmia or ectopy) or erratic data (artefacts and noise). Commonly, a quantitative filtering is performed regarding the length and the range of the inspected intervals.
Length of intervals in a VPCS: The minimal prematurity as well as the minimal length of compI varies in different studies and is sometimes not explicitly stated. The surrounding preRRs and the regular intervals in a VPCS after the compI (postRRs) are either filtered via absolute values or proportionally in comparison to preceding intervals or a calculated reference interval (refI) (see table 1).
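The proportional length criteria can be made concrete with a small filter. The sketch below is a minimal illustration, assuming the baseline thresholds listed in the appendix (refI as the arithmetic mean of the five intervals before couplI, couplI at most 80% of refI, compI at least 120% of refI, normal intervals between 300 and 2000 ms, successive differences below 200 ms, deviation from refI below 20%). The function name `passes_baseline_filter` and its argument layout are hypothetical and not taken from any published HRT software; `n_pre` allows the number of checked preRRs to be raised from 2 to 5.

```python
from statistics import mean


def passes_baseline_filter(pre, coupl, comp, post, n_pre=2, n_post=15):
    """Check one VPC snippet (VPCS) against baseline length criteria.

    pre   -- sinus RR intervals before couplI in ms, oldest first
    coupl -- coupling interval (couplI) in ms
    comp  -- compensatory interval (compI) in ms
    post  -- sinus RR intervals after compI in ms (postRRs)
    """
    if len(pre) < max(5, n_pre) or len(post) < n_post:
        return False                        # not enough sinus beats around the VPC
    ref = mean(pre[-5:])                    # refI: mean of the 5 intervals before couplI
    if coupl > 0.8 * ref:                   # couplI must be <= 80 % of refI
        return False
    if comp < 1.2 * ref:                    # compI must be >= 120 % of refI
        return False
    # preRRs and postRRs are checked separately, because couplI and compI lie
    # between them and are not "succeeding normal intervals".
    for group in (pre[-n_pre:], post[:n_post]):
        for rr in group:
            if not 300 < rr < 2000:         # absolute plausibility range
                return False
            if abs(rr - ref) >= 0.2 * ref:  # deviation from refI must stay below 20 %
                return False
        for a, b in zip(group, group[1:]):
            if abs(b - a) >= 200:           # successive difference must be < 200 ms
                return False
    return True
```

Since the checked postRRs must cover the TS range, `n_post` should be at least #TSRR.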
Range of checked intervals: The first range in which intervals were checked was introduced by Davies et al and consisted of 20 intervals before and after the VPCI, respectively (Davies et al 2001). The range of 2 preRRs and 15 postRRs, which was proposed by Bauer et al, was first defined by Grimm et al (2003) and propagated on www.h-r-t.org/.com. Most authors do not explicitly specify the checked range before and/or after the VPCIs (176 articles), though some cite Bauer's guidelines (Bauer et al 2008) or use the algorithm from www.h-r-t.org/.com. Most authors who state the range explicitly use 2 preRRs and 15 postRRs, 5 preRRs and 15 postRRs, or 20 intervals before and after the VPCI, respectively (see table 2).

Table 1. Filtering criteria regarding the length of all intervals in a VPC snippet (VPCS). How many intervals are included as surrounding intervals varies (see paragraph Range of checked intervals). NS: not specified in article. a, b The phrasing in the papers is 'mean', which probably refers to the arithmetic mean. c The exact type of mean is not specified in the articles.
Although only the two intervals before the VPC are needed for HRT calculation, many studies check more intervals before the VPC. This is consistent with the results of Chen, who found a negative correlation of TO and a positive correlation of TS with the number of VPCs within the 2 minutes preceding the VPC used for HRT calculation (Chen 2009). However, these results may be biased for several reasons: the differing sample sizes of HRTs with and without preceding VPCs in the study (Chen 2009), the circadian variation in the incidence of VPCs (Lown et al 1973, Chen 2011), and the correlation of diminished HRT with a high incidence of VPCs (Cygankiewicz et al 2004).
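One way to exclude VPCSs whose preceding intervals may be disturbed by an earlier VPC is a time-based look-back window. The sketch assumes VPC times in seconds from the start of the recording and borrows Chen's 2-minute window purely as an illustrative default; neither the hypothetical function name `isolated_vpcs` nor the window length is an established standard.

```python
def isolated_vpcs(vpc_times_s, window_s=120.0):
    """Return the VPC times (s) with no other VPC within window_s before them.

    The first VPC of a recording is kept, because no earlier VPC is known.
    """
    times = sorted(vpc_times_s)
    kept = []
    for i, t in enumerate(times):
        # compare with the immediately preceding VPC, whether it was kept or not
        if i == 0 or t - times[i - 1] > window_s:
            kept.append(t)
    return kept
```

An interval-count rule (e.g. requiring 5 clean preRRs) would be the beat-based alternative; which rule removes the bias more effectively is one of the open questions raised here.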

Minimal number of VPCSs
Number of VPCSs needed for HRT analysis: The minimal number of suitable VPCSs needed for HRT analysis (#minVPCS) is not explicitly stated in most articles (173 articles). Sometimes a minimal number of VPCs is given as a criterion for including a patient in the study; however, these VPCs are not explicitly required to be part of VPCSs suitable for analysis. While Bauer et al suggested 5 VPCSs for HRT calculation (Bauer et al 2008), various numbers between 1 and 100 have been used, most often 5, 1 and 2 (in descending order of frequency, see table 3). All numbers used for #minVPCS are merely suggestions, because the optimal #minVPCS has yet to be identified systematically. So far, only two studies have focused on #minVPCS: neither found a difference in the predictive value of HRT when comparing the whole dataset with only the patients having a minimum of 2 (Osman et al 2004) or 4 (Berkowitsch et al 2004) VPCs. However, their approach compares two groups of which one is a subset of the other, which inherently makes finding a difference less probable. Cygankiewicz et al found that TS may be correlated with the number of VPCSs only for low absolute numbers of VPCSs, while the dependency of HRT on the number of VPCs vanished in patients with more than 10 VPCSs (Cygankiewicz et al 2004). A systematic study to find #minVPCS for reliable risk stratification should focus not only on significant differences in HRT parameters between study groups but also on their variances, and should cover a wider range of numbers of VPCSs.
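A variance-oriented analysis of #minVPCS could start from repeated random subsets of a subject's VPCSs. The following is a minimal sketch, assuming each VPCS is represented by its postRR tachogram in ms and TS is computed from the averaged tachogram; the function names and the number of draws are hypothetical choices, not a published protocol.

```python
import random
from statistics import mean, pstdev


def slope(y):
    """Least-squares slope of y against beat index (ms per RR interval)."""
    n = len(y)
    x_bar = (n - 1) / 2
    y_bar = mean(y)
    num = sum((i - x_bar) * (v - y_bar) for i, v in enumerate(y))
    den = sum((i - x_bar) ** 2 for i in range(n))
    return num / den


def ts_from_average(tachograms, n_tsrr=15, window=5):
    """TS of the averaged tachogram: steepest 5-interval slope in the first n_tsrr postRRs."""
    avg = [mean(col) for col in zip(*(t[:n_tsrr] for t in tachograms))]
    return max(slope(avg[k:k + window]) for k in range(len(avg) - window + 1))


def ts_spread_by_n(tachograms, ns, draws=200, seed=0):
    """Population SD of TS over random subsets of n VPCSs, for each n in ns."""
    rng = random.Random(seed)
    spread = {}
    for n in ns:
        if n > len(tachograms):
            continue  # cannot draw more VPCSs than the subject has
        values = [ts_from_average(rng.sample(tachograms, n)) for _ in range(draws)]
        spread[n] = pstdev(values)
    return spread
```

Plotting the spread against n across a large sample would show where additional VPCSs stop reducing the variability of TS.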
Data with insufficient number of VPCs: Most authors do not specify how measurements not meeting the #minVPCS criterion are handled. In most cases it can only be assumed that these subjects were excluded from analysis. The second most common approach is to assign subjects with too few VPCs to a low-risk group. Some studies assign these subjects to a separate category. Some authors differentiate between too few VPCSs and a sufficient number of VPCSs that, however, are not measurable (Carney et al 2007). It is reasonable to classify subjects with fewer VPCSs than #minVPCS as low risk, since a high number of VPCs is correlated with a high risk of SCD in chronic heart disease (Lown et al 1971) and with poor prognosis after MI, as reviewed by Latchamsetty et al (2015). However, the data are insufficient and contradictory. For risk stratification in clinical practice, a standardised workflow needs to be established. It must be investigated whether an insufficient number of VPCSs indicates a low risk with regard to HRT or whether HRT should be removed from the risk model in that case.

Conclusion: data preparation
After the common step of ensuring the quality of a recording, suitable VPCSs need to be selected. For this purpose, filtering criteria specify the length of the intervals and the range that has to be checked. These criteria are mostly not mentioned in the literature and, if mentioned, vary widely. For the time being, we suggest using Grimm's filtering criteria. One exception to these criteria should be the number of preRRs, which should be 5 instead of just 2, because these intervals are used as reference for other filtering criteria. It should be analysed how many intervals before the couplI are needed to exclude VPCSs that are directly preceded by a VPC and are therefore biased. Note that the number of checked postRRs should be greater than or equal to #TSRR.
The #minVPCS used most often is 1, 2 or 5. The optimal #minVPCS still needs to be determined systematically in a large sample, balancing the low incidence of VPCs on the one hand against the statistical validity of a sufficient number of data points on the other. Subjects with fewer VPCSs are either excluded from the studies or classified as having low risk. It must be investigated which procedure is optimal for risk stratification in the clinical setting.

Research questions:
• What is the minimal and optimal number of VPCSs to calculate valid HRT parameter values?
• How many preRRs are needed to avoid bias by preceding arrhythmias?
• What is the optimal handling of subjects with too few VPCSs?

Data analysis
After a minimal number of suitable VPCSs has been collected, HRT parameters can be calculated. The most common parameters are TO and TS as defined by Schmidt et al (1999), plus the newer parameter turbulence timing (TT). Although these parameters are calculated in the same way in most of the inspected articles, both the number of intervals in which TS is calculated and the order in which the parameters are calculated vary.

Turbulence onset
Most studies (203 articles) follow the calculation process suggested by Schmidt et al (1999): in 136 articles the method is explicitly stated, and in 67 a calculation reference is given. In 31 studies TO is used, but neither a calculation nor a reference is given. TO is consistently calculated from the two intervals before and after the VPCI, respectively, and is rarely calculated with a different approach.
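For reference, Schmidt's TO definition (the relative difference between the mean of the two intervals after compI and the mean of the two intervals before couplI, in %) and the per-VPCS-then-average order can be written as the short sketch below. Snippets are assumed to be given as `(pre, post)` lists of RR intervals in ms; the function names are illustrative.

```python
from statistics import mean


def turbulence_onset(pre, post):
    """TO in % for one VPCS.

    pre  -- RR intervals before couplI (ms); the last two are used
    post -- RR intervals after compI (ms); the first two are used
    """
    before = pre[-2] + pre[-1]
    after = post[0] + post[1]
    return (after - before) / before * 100.0


def subject_to(snippets):
    """Subject-level TO: TO per VPCS first, then the arithmetic mean."""
    return mean(turbulence_onset(pre, post) for pre, post in snippets)
```

A negative TO indicates early acceleration of the sinus rate after the VPC; with the standard cut-off, a TO of 0% or above is considered abnormal.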

Turbulence slope
In most studies #TSRR is either 15, as first suggested by Barthel et al (2003), or 20 intervals, as suggested by Schmidt et al (1999). In 129 articles the range is explicitly stated; 77 provide only a calculation reference. Apart from this, the calculation process remains the same. Some studies examine a TS adjusted for several biases, which we describe in section 2.3.2. For higher comparability, we suggest using this adjusted TS as defined by Hallstrom et al (2004) instead of the original TS.
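The TS calculation with a configurable #TSRR can be sketched as follows, assuming each VPCS is given as its list of postRRs in ms, a 5-interval regression window, and the steepest (maximum) slope as TS; `n_tsrr=15` versus `n_tsrr=20` corresponds to the two ranges discussed above. The helper names are hypothetical.

```python
from statistics import mean


def slope(y):
    """Least-squares slope of y against beat index (ms per RR interval)."""
    n = len(y)
    x_bar = (n - 1) / 2
    y_bar = mean(y)
    num = sum((i - x_bar) * (v - y_bar) for i, v in enumerate(y))
    den = sum((i - x_bar) ** 2 for i in range(n))
    return num / den


def turbulence_slope(post_series, n_tsrr=15, window=5):
    """TS in ms/RR in Schmidt's order: average the tachograms, then take the
    steepest regression slope over any `window` consecutive postRRs within
    the first n_tsrr intervals after compI."""
    # zip truncates to the shortest series, so every VPCS needs >= n_tsrr postRRs
    avg = [mean(col) for col in zip(*(s[:n_tsrr] for s in post_series))]
    return max(slope(avg[k:k + window]) for k in range(len(avg) - window + 1))
```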

Turbulence timing
The most commonly calculated HRT parameter apart from Schmidt's original ones is turbulence timing (TT). It was first defined by Watanabe et al as the index of the first interval of the interval sequence whose regression line has the steepest slope and which is therefore used for the TS calculation. It has since been used in nine of the studies inspected here (e.g. Schwab et al). To date, no study has applied a TT cut-off to a healthy study group to evaluate its specificity.
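TT as defined above can be obtained from the same sliding-window regression used for TS. In this sketch, TT is reported as the 1-based index of the first postRR of the steepest window, which is an assumption about the indexing convention; ties resolve to the earliest window. The function names are illustrative.

```python
from statistics import mean


def slope(y):
    """Least-squares slope of y against beat index (ms per RR interval)."""
    n = len(y)
    x_bar = (n - 1) / 2
    y_bar = mean(y)
    num = sum((i - x_bar) * (v - y_bar) for i, v in enumerate(y))
    den = sum((i - x_bar) ** 2 for i in range(n))
    return num / den


def turbulence_timing(tachogram, n_tsrr=15, window=5):
    """TT: 1-based index of the first postRR of the window with the steepest slope."""
    y = tachogram[:n_tsrr]
    slopes = [slope(y[k:k + window]) for k in range(len(y) - window + 1)]
    return slopes.index(max(slopes)) + 1  # list.index returns the earliest maximum
```

With `n_tsrr=15` the largest possible TT is 11, which illustrates how the choice of #TSRR caps TT.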

Numbers of intervals used for TS calculation
Until now, the optimal #TSRR has not been studied. A single standardised range is crucial for comparability, because differing #TSRR can lead to different filters and therefore different VPCS sets, as well as diverging TS values for the same VPCs. The optimal range could possibly be deduced from TT, since TT describes the first interval of the sequence used to calculate TS. An analysis of the descriptive statistics of this parameter might indicate the optimal value of #TSRR. Table 4 shows values for TT that have been obtained by the articles covered in this work. The arithmetic mean of TT ranges from 3.6 ± 1.0 to 7.3 ± 3.0 (Średniawa et al 2010), which suggests that in most cases 15 intervals are suitable as #TSRR. Note that the choice of #TSRR limits the maximal value of TT and can thus bias the results. Apart from that, a short #TSRR is favourable because it increases the number of available VPCSs. Since turbulence appears in proximity to the VPC, we hypothesise that a TS with a TT of 7 or more, which requires a #TSRR of more than 10, merely represents random fluctuation rather than HRT. Watanabe et al found 6 to be the maximum TT of all subjects, so 10 was defined as the minimum number of intervals needed after a VPC to calculate HRT. According to a theory proposed in 2002, the intra-subject variability of TT could also be used to distinguish between a TS caused by random fluctuation and one based on autonomic regulation. For this, TT must be calculated before the tachogram is averaged. If the turbulence is based on a regulated mechanism, it is likely to be similar for each VPCS and thus to show little intra-subject variance. If autonomic regulation fails, however, and TT reflects random fluctuations, the variance should be considerably higher, while TS can still lie in the normal range. However, apart from Cebula et al, who found no correlation between any of the HRT parameters (Cebula et al 2012), no results of a correlation analysis between TT and TS have been published to date.
Thus, the optimal #TSRR remains to be determined.
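The idea of using intra-subject TT variability requires TT per VPCS, computed before any averaging. A minimal sketch, assuming each VPCS is given as its postRR tachogram in ms and TT is the 1-based index of the first interval of the steepest 5-interval window; the function names are hypothetical, and no threshold for "high" variability is implied.

```python
from statistics import mean, stdev


def slope(y):
    """Least-squares slope of y against beat index (ms per RR interval)."""
    n = len(y)
    x_bar = (n - 1) / 2
    y_bar = mean(y)
    num = sum((i - x_bar) * (v - y_bar) for i, v in enumerate(y))
    den = sum((i - x_bar) ** 2 for i in range(n))
    return num / den


def turbulence_timing(tachogram, n_tsrr=15, window=5):
    """1-based index of the first postRR of the steepest window."""
    y = tachogram[:n_tsrr]
    slopes = [slope(y[k:k + window]) for k in range(len(y) - window + 1)]
    return slopes.index(max(slopes)) + 1


def tt_variability(tachograms, n_tsrr=15):
    """Mean and sample SD of per-VPCS TT (requires at least two VPCSs)."""
    tts = [turbulence_timing(t, n_tsrr) for t in tachograms]
    return mean(tts), stdev(tts)
```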

Calculation order
TO as well as TS are influenced by the order of calculation: Schmidt et al propose first calculating TO for each VPCS and averaging the results afterwards, while TS should be calculated from the averaged tachogram (Schmidt et al 1999). In a few studies, TO was calculated from the averaged tachogram instead. Chen et al showed that the order of the workflow has an impact on the parameter values: although the results of both methods were significantly correlated for both parameters, the arithmetic mean of TS obtained with Schmidt's method was considerably lower than TS obtained by averaging afterwards. Since TT differs between VPCSs, averaging the tachograms first smooths the slope, which decreases the overall slope in the tachogram and therefore decreases TS. This difference is most notable when TS values are very low (Chen et al 2011). Thus, averaging the TS results afterwards leads to classification into lower risk groups compared to Schmidt's method, as shown by Chen et al (2011) and Soguero et al (2013). Note that Soguero et al used a different approach to assess HRT categories, so results may change when the data are assessed as usual (Soguero et al 2013). Nevertheless, the order of calculating and averaging the HRT parameters affects the results and should therefore be uniform. While averaging the tachogram first reduces noise, assessing TS from the separate tachograms takes into account the variable onset of the HR acceleration.
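The effect of the calculation order can be reproduced with two synthetic tachograms whose acceleration starts at different positions. The sketch compares Schmidt's order (average the tachograms, then compute TS) with the reversed order (TS per VPCS, then the arithmetic mean); the data are invented for illustration only, and the function names are hypothetical.

```python
from statistics import mean


def slope(y):
    """Least-squares slope of y against beat index (ms per RR interval)."""
    n = len(y)
    x_bar = (n - 1) / 2
    y_bar = mean(y)
    num = sum((i - x_bar) * (v - y_bar) for i, v in enumerate(y))
    den = sum((i - x_bar) ** 2 for i in range(n))
    return num / den


def steepest_slope(y, n_tsrr=15, window=5):
    """Maximum regression slope over `window` consecutive intervals in the first n_tsrr."""
    y = y[:n_tsrr]
    return max(slope(y[k:k + window]) for k in range(len(y) - window + 1))


def ts_average_first(tachograms, n_tsrr=15):
    """Schmidt's order: average the tachograms, then compute TS once."""
    avg = [mean(col) for col in zip(*(t[:n_tsrr] for t in tachograms))]
    return steepest_slope(avg, n_tsrr)


def ts_average_last(tachograms, n_tsrr=15):
    """Reversed order: TS per VPCS, then the arithmetic mean."""
    return mean(steepest_slope(t, n_tsrr) for t in tachograms)


# Two synthetic VPCSs with the same acceleration but a different onset (TT 1 vs TT 9)
early = [800, 820, 840, 860, 880] + [880] * 10
late = [800] * 8 + [800, 820, 840, 860, 880] + [880] * 2
print(ts_average_first([early, late]), ts_average_last([early, late]))  # 10.0 20.0
```

Averaging spreads the acceleration over more beats, so the steepest 5-interval slope of the averaged tachogram is flatter; this matches the direction reported by Chen et al, although the size of the effect in real data depends on how much TT varies between VPCSs.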

Mean vs. median
In the original article by Schmidt et al, the method description uses the general term 'average', but the numbers given in the article fit the definition of the arithmetic mean (Schmidt et al 1999). Accordingly, the arithmetic mean is mostly used as the descriptor for HRT. A few studies use the median (e.g. Witham et al 2012) or a trimmed mean instead.
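The choice of descriptor matters when single VPCSs produce outlying values. The sketch below compares the arithmetic mean, the median and a symmetric trimmed mean for a set of per-VPCS TO values; the 20% trimming proportion and the example values are arbitrary assumptions, since the trimming used in the cited studies is not specified here.

```python
from statistics import mean, median


def trimmed_mean(values, proportion=0.2):
    """Arithmetic mean after removing `proportion` of the values from each end."""
    if not 0 <= proportion < 0.5:
        raise ValueError("proportion must be in [0, 0.5)")
    data = sorted(values)
    k = int(len(data) * proportion)  # number of values cut from each tail
    return mean(data[k:len(data) - k])


to_values = [-3.0, -2.0, -1.0, 0.0, 20.0]  # hypothetical per-VPCS TO in %, one outlier
print(mean(to_values), median(to_values), trimmed_mean(to_values))  # 2.8 -1.0 -1.0
```

A single outlying VPCS shifts the arithmetic mean across the 0% cut-off while the median and trimmed mean stay negative, which is why the descriptor used should always be reported.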

Conclusion: data analysis
TO is almost always calculated as described by Schmidt et al. TS is calculated with either 15 or 20 intervals in which the slope is measured. The new parameter TT might help to determine the optimal #TSRR. It is defined as the index of the first interval from which TS is calculated and therefore provides information about the timing of TS after the VPC. A first analysis points to a #TSRR of 10, but this needs in-depth investigation. Moreover, TT is easy to assess without additional cost and has been shown to be a feasible risk stratifier. Therefore, we suggest further investigating its usefulness and possible cut-offs. Another variation in TO and TS calculation is the order of the calculation steps, which some studies reverse relative to Schmidt's original approach. Whether the tachogram itself or the parameter values of single VPCSs should be averaged has not yet been studied comprehensively. If parameters are calculated from single VPCSs, they should be given as arithmetic mean.

Research questions:
• How many intervals are sufficient for #TSRR?
• In which order should TO and TS be calculated?

Classification
The main use of HRT is risk stratification. For this purpose, every HRT parameter is dichotomised on the basis of given cut-off values. As proposed by Schmidt et al, many studies used the cut-offs 0% for TO (138 articles) and 2.5 ms/RR for TS (136 articles). Other articles proposed alternative cut-offs. Depending on the dichotomisation of each parameter, the record can be classified into risk or non-risk categories. Different classification systems with two to five categories have been established.
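Dichotomisation with cut-offs reduces to two comparisons. In this sketch, TO is treated as abnormal at or above its cut-off and TS as abnormal at or below its cut-off, following Schmidt's convention; the defaults are the standard values 0% and 2.5 ms/RR, and passing another TS cut-off (e.g. the 3.0 ms/RR used in the Cardiovascular Health Study) shows how an alternative value changes the result. The function name is illustrative.

```python
def dichotomise(to_pct, ts_ms_per_rr, to_cutoff=0.0, ts_cutoff=2.5):
    """Return (TO abnormal, TS abnormal) for one subject."""
    to_abnormal = to_pct >= to_cutoff         # missing early acceleration
    ts_abnormal = ts_ms_per_rr <= ts_cutoff   # missing late deceleration
    return to_abnormal, ts_abnormal
```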

Cut-off values
Alternative cut-off values: Some authors suggest other cut-off values (see tables 5 and 6). These values are either statistical descriptors like the median or determined with receiver operating characteristic (ROC) analysis. Accordingly, Cygankiewicz et al used the quartiles of their coronary artery disease (CAD) population, -0.37% for TO and 4.25 ms/RR for TS, because Schmidt's cut-offs were determined for postinfarction patients (Cygankiewicz et al 2004, Cygankiewicz et al 2004, Cygankiewicz et al 2003, Cygankiewicz et al 2003). Likewise, quartiles were used in stable CAD patients by Sestito et al (2004). Medians were also used as cut-offs, namely 0.1% (TO) and 2.0 ms/RR (TS) for patients with cardiomyopathy after implantable cardioverter defibrillator implantation (Seegers et al 2016), and 0.005% (TO) for risk stratification of disease deterioration in patients with liver cirrhosis (Jansen et al 2018).
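Population-specific descriptors such as quartiles can be derived with the Python standard library. The sketch assumes, as a hypothetical convention, that the upper quartile of TO and the lower quartile of TS of a reference population serve as cut-offs (the direction in which each parameter indicates risk); the cited studies may have chosen quartiles differently, and Python's default 'exclusive' quantile method is only one of several interpolation rules.

```python
from statistics import quantiles


def quartile_cutoffs(to_values, ts_values):
    """Return (TO cut-off, TS cut-off) from a reference population.

    TO: upper quartile, since higher TO indicates impaired HRT.
    TS: lower quartile, since lower TS indicates impaired HRT.
    """
    to_q = quantiles(to_values, n=4)  # three cut points: Q1, Q2, Q3
    ts_q = quantiles(ts_values, n=4)
    return to_q[2], ts_q[0]
```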
The following studies performed ROC analysis to determine optimal cut-offs: Schaeffer et al identified 3.95 ms/RR as the optimal cut-off to stratify the risk of cardiac events in patients with Marfan syndrome (Schaeffer et al 2015). Karakurt et al used 1.2 ms/RR for TS and found a significant correlation between TS and mortality rate in children with DCM (Karakurt et al 2007). In patients with acute decompensated HF, TS with a cut-off of 1.695 ms/RR was predictive of cardiac events (Yamada et al 2018). Yuan et al did not find any prognostic power of TO and TS with the cut-off values -1.17% and 12.1 ms/RR, respectively, in patients with acute coronary syndrome (ACS) (Yuan et al 2015).
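ROC-based cut-off selection is commonly done by maximising Youden's index (sensitivity + specificity - 1); the cited studies do not necessarily use this criterion. A minimal sketch for TS, assuming that values at or below a candidate cut-off count as test-positive (impaired HRT) and that `events` marks subjects with the outcome; the function name is hypothetical.

```python
def youden_cutoff(ts_values, events):
    """Return (cut-off, Youden index) maximising sensitivity + specificity - 1.

    ts_values -- TS per subject (ms/RR); TS <= cut-off is classified abnormal
    events    -- 1 if the subject had the outcome, else 0
    """
    n_pos = sum(events)
    n_neg = len(events) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need subjects with and without events")
    best = None
    for c in sorted(set(ts_values)):  # every observed value is a candidate
        tp = sum(1 for v, e in zip(ts_values, events) if v <= c and e)
        tn = sum(1 for v, e in zip(ts_values, events) if v > c and not e)
        j = tp / n_pos + tn / n_neg - 1
        if best is None or j > best[1]:
            best = (c, j)
    return best
```

An optimum found this way is specific to the sample it was derived from, which is one reason why ROC-derived cut-offs rarely transfer between populations.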
Some studies compared their suggested cut-offs with the original values and showed a similar performance of both cut-off types: Schmidt's cut-off values as well as quartiles (0.025% and 1.27 ms/RR) and continuous values were all useful to predict all-cause mortality in patients with CHF caused by ischemic cardiomyopathy (Cygankiewicz et al 2006). The quartiles 0.22% for TO and 1.42 ms/RR for TS were used for CHF patients, but no difference was found compared to the results using Schmidt's cut-off values (Cygankiewicz et al 2008). Quartiles 0.31% and 1.5 ms/RR were used for risk stratification in MI patients, but showed no significant results (Berkowitsch et al 2004). A cut-off of 3.2 ms/RR for TS was proposed in postinfarctional patients with malignant ventricular arrhythmias, though its performance was similar to Schmidt's cut-off regarding sensitivity and specificity (Szydlo et al 2011).
Some new cut-off values yielded better results than Schmidt's values. In the Cardiovascular Health Study, which investigated cardiovascular disease in older adults, the cut-off 3.0 ms/RR for TS was used (Patel et al 2017, Kop et al 2010, Stein et al 2010). In contrast to Schmidt's settings, this value showed a significant correlation between cardiac death and TS. The predictive capability of TS with the new cut-off has also been shown in other studies, but without comparison to the traditional values (Stein et al 2010, Koyama et al 2002). A comparison of the cut-off values by Stein et al and Schmidt et al with data from the Eplerenone Post-Acute Myocardial Infarction Heart Failure Efficacy and Survival Study showed that Stein's TS cut-off had a slightly higher sensitivity but slightly lower specificity than Schmidt's value. Other cut-offs showing improved performance were -1.52% (4th decile) and 4.9 ms/RR (6th decile). In contrast to Schmidt's cut-offs, these values allowed HRT parameters to …

Table 5. Cut-off values for TO in %. New cut-offs were determined in different populations on the basis of statistical descriptors or ROC analysis. The performance was tested in comparison to the traditional cut-off value 0%. +: better performance. =: no difference in performance.

It is difficult to find a single cut-off value that is suitable for different datasets, even with the same patient background. Only three new cut-off values have been used in more than one study (see tables 5 and 6). However, most of these studies are either based on the same dataset (Patel et al 2017, Kop et al 2010) or do not compare the new cut-off with Schmidt's original values (Cygankiewicz et al 2004, Cygankiewicz et al 2004, Cygankiewicz et al 2003, Cygankiewicz et al 2003).

Different populations: It is important to note that cut-off values were first set for postinfarction patients (Schmidt et al 1999).
Subsequently, HRT has been used for patients with other pathological backgrounds without setting new cut-off values. For example, the standard cut-offs were used in chronic obstructive pulmonary disease (Gunduz et al 2009), diabetes mellitus (Lin et al 2017, Balcioglu et al 2007), metabolic syndrome (Yilmaz et al 2006) and even in apparently healthy subjects (Poręba et al 2011).
Additionally, Yilmaz et al analysed children, while Schmidt et al analysed elderly patients (Yilmaz et al 2006, Schmidt et al 1999). Due to the age dependency of HRT, however, different values can be expected to be needed for different age groups. Stein et al took this into account, because their population was older than the population used for Schmidt's cut-offs, and determined a new cut-off of 3.0 ms/RR for TS. This is interesting, since heart rate turbulence decreases with increasing age, so a lower cut-off would have been expected in elderly people. The cut-off values are also used in the context of medical treatments, e.g. to show an effect of elective gynecologic surgery on autonomic function (Tekelioglu et al 2013) or to compare different types of hemodialysis (Kaplan et al 2016). Since HRT is a feasible marker of autonomic function, it might be used for these purposes, but specialised cut-offs should be determined for every population.
Number of false positives: Although the standard cut-off values for high-risk patients after infarction were established on the basis of a large population, there are indications that these values still need optimisation. Some studies have reported false positives when examining patients referred for assessment of supraventricular arrhythmias (Vukajlovic et al 2006) or apparently healthy subjects. This especially relates to TO. Therefore, Grimm et al even recommended using only TS, because TO is not specific enough.
Customised cut-offs: Instead of a fixed value, a formula could determine an adaptive cut-off value. This would adjust the cut-off to the background of the given dataset and avoid the need to determine a large number of cut-off values for different populations and use cases. However, this approach has not been implemented in the literature and is impractical for risk stratification in medical routine.

Classification systems
Based on the cut-offs, subjects are categorised as having normal HRT (with low risk of adverse outcome) or impaired HRT (with high risk). The most common categorisation system was defined by Ghuran et al and consists of three categories: (1) TO and TS normal, (2) either parameter abnormal, and (3) both parameters abnormal (Ghuran et al 2002). The categories are generally called HRT0-2 and are used in 23 inspected articles (e.g. Maeda et al). Stein et al and D'Addio et al used the categories HRT0-2 but subdivided HRT1 into 'TO abnormal' and 'TS abnormal' or into HRT1a (TO abnormal) and HRT1b (TS abnormal) (D'Addio et al 2013), respectively. One system considers the number of VPCs: Carney et al suggested two new classes, 'a' with fewer than 5 VPCs and 'b' with a sufficient number of VPCs but HRT not calculable because of artefacts etc.; the standard classes HRT0-2 are called c-e, respectively (Carney et al 2007). It should be investigated whether a simplified classification system yields sufficient results or whether a finer differentiation provides added value. Only Barthel et al reported analysing the discriminative power of the categories used, which resulted in combining HRT0 and HRT⊘ (Barthel et al 2003).
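The relation between the HRT0-2 categories and Carney's extended classes can be sketched as a single function. It assumes the standard cut-offs (TO of 0% or above and TS of 2.5 ms/RR or below are abnormal), Carney's threshold of fewer than 5 VPCs for class 'a', and `None` for parameters that could not be calculated; the function itself is illustrative.

```python
def hrt_class(n_vpcs, to_pct=None, ts=None, to_cutoff=0.0, ts_cutoff=2.5):
    """Return (Carney class, HRT category or None).

    'a': fewer than 5 VPCs; 'b': enough VPCs but HRT not measurable;
    'c', 'd', 'e' correspond to HRT0, HRT1 and HRT2.
    """
    if n_vpcs < 5:
        return "a", None
    if to_pct is None or ts is None:
        return "b", None
    n_abnormal = int(to_pct >= to_cutoff) + int(ts <= ts_cutoff)
    return "cde"[n_abnormal], f"HRT{n_abnormal}"
```

Splitting HRT1 into HRT1a and HRT1b would only require returning which of the two comparisons was true instead of their sum.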
When TT is analysed in addition to TO and TS, the three categories A (all normal), B (at least one abnormal) and C (all abnormal) have been used (Średniawa et al 2016, Cebula et al 2012). The classification into HRT0-2 reflects HRT at a single point in time. To show HRT changes, Kurpesa et al grouped patients into the categories type I (improvement of both parameters), II (worsening of both) and III (improvement of one parameter and worsening of the other) (Kurpesa et al 2007).
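Kurpesa's change categories compare HRT between two recordings. In this sketch an improvement means a decrease in TO and an increase in TS; how unchanged values were handled is not described here, so the function returns `None` in that case as a hypothetical convention.

```python
def hrt_change_type(to_before, ts_before, to_after, ts_after):
    """Return 'I' (both improve), 'II' (both worsen) or 'III' (one improves, one worsens)."""
    to_change = (to_after < to_before) - (to_after > to_before)  # +1 improved, -1 worsened
    ts_change = (ts_after > ts_before) - (ts_after < ts_before)  # +1 improved, -1 worsened
    if to_change == 0 or ts_change == 0:
        return None  # an unchanged parameter is not covered by the three types
    if to_change == ts_change == 1:
        return "I"
    if to_change == ts_change == -1:
        return "II"
    return "III"
```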

Conclusion: classification
If subjects are dichotomised, Schmidt's cut-off values are mostly used. Different cut-offs for several populations have been proposed with equal or better results than the original cut-offs, but none can clearly be recommended. The cut-off 0% of TO leads to more values being classified abnormal and therefore a higher false-positive rate than the cut-off 2.5 ms/RR of TS. Cut-off values should be reviewed with the result of either new cut-off values for different population backgrounds like medical condition and age or mathematical formulas for adaptive cut-off calculations.
On the basis of these cut-offs, subjects can be assigned to different risk classification systems. The most common is a classification into the three groups HRT0 (both parameters normal, low risk), HRT1 (one parameter abnormal, intermediate risk) and HRT2 (both parameters abnormal, high risk). Other systems consider subjects with too few suitable VPCSs, the new parameter TT, or the change in HRT between two points in time. The usefulness of these classification systems with more or fewer groups than HRT0-2 should be studied. If subjects are categorised into groups, the handling of subjects with an insufficient number of VPCSs should be described.

Research questions:
• Which cut-off values are optimal for which population and question?
• Is there an alternative for hard cut-off values?
• Are other classification systems more suitable than the standard HRT0-2?
• Are the prognoses of subjects in categories HRT0 and HRT⊘ comparable?

Description of HRT assessment
When reviewing the methodology used in the 240 inspected articles we encountered missing, unspecific or conflicting descriptions. Transparency and reproducibility depend on full disclosure of the used methodology of a study. Therefore, we summarise the steps of HRT assessment that most commonly have not been sufficiently described.

Recording duration
Some articles provide no recording duration at all (Yang et al 2013, Sandberg et al 2014, Lenis et al 2013, Lenis et al 2013). As mentioned, most articles only state the planned recording duration (see section 2.1), not the actual recording duration, which could help estimate the bias on TS. Some authors report the average actual recording duration as the arithmetic mean (Lammers et al 2006, Kowalewski et al 2007, Bienias et al 2015, Liu et al 2014, Secemsky et al 2011, Balcioglu et al 2015, Lindgren et al 2003), others as the median (Bienias et al 2010a, Bienias et al 2010b, Bienias et al 2010c, Bonnemeier et al 2003, Zhong et al 2007). Some articles by the same first authors report different types of averages (Balcioglu et al 2015, Bienias et al 2010a, Bienias et al 2010b, Bienias et al 2010c, Bienias et al 2015).

Number of VPCSs
As with the recording duration, the arithmetic mean of the number of VPCSs actually used for HRT calculation is rarely provided. If a number is mentioned, it is mostly unclear whether it includes different kinds of ectopic beats, all VPCs or only the VPCs used for analysis. Additionally, the number may be absolute or extrapolated to 24 h. Forty-six articles give the mean number of VPCs per hour or day, partly even before excluding patients without suitable VPCs, while 129 articles give no number of VPCs at all. Only 27 of the 240 articles studied state the average absolute number of VPCs per patient in the recording, as either arithmetic mean (17), median (9) or both (1). Again, the majority of these articles report the overall number of VPCs, not the number of VPCSs suitable for HRT calculation, or the number is not clearly described. In contrast, Wichterle et al stated the numbers of both recorded VPCs and APCs and those suitable for analysis, listing four medians.

Filtering criteria
In contrast to the calculation methods, most articles explain their filtering criteria either very briefly or not at all. Examples include vague descriptions such as 'very short or long cycle lengths' (Cano et al 2008), no 'too long' couplI or 'too short' compI (Manzano-Fernández et al 2011) or no 'inappropriate RR intervals' (Manzano-Fernández et al 2011). Some articles describe filtering only as 'manual review' (Arslan et al 2008) or 'visual examination' (Stein et al 2010).
The refI is often defined ambiguously, for example as '5 consecutive sinus intervals' (Golukhova et al 2016), 'normal RR-interval' (Jochum et al 2012), 'the sinus RR interval' (Jeron et al 2003) or 'normal interval' (La et al 2011, La et al 2012), without giving the range from which refI is calculated.
Some articles give the HRT analysis software HRT View or the algorithm from www.h-r-t.org/.com as their filtering reference, but the number of intervals they report for filtering differs from the default values of these software tools (2 preRRs and 15 postRRs) (Wustmann et al 2009, Dursun et al 2015, Celik et al 2011, Karakurt et al 2007, Yorgun et al 2012, Celik et al 2011b, Schwab et al 2011a, Celik et al 2012, Iwasa et al 2005).

Calculation
As mentioned, #TSRR is often stated explicitly or given with a calculation reference. However, 43 of the articles stating this parameter explicitly use a different range than the respective calculation reference. In three articles, a range of 15 to 20 intervals is given without explaining which number is used in which case (Szymanowska et al 2008, Soguero et al 2013, Średniawa et al 2016).

Conclusion: description of HRT assessment
Mostly, only the planned recording duration is provided in the literature, not the actual one, and averages are reported either as arithmetic mean or as median. Similarly, the number of VPCSs used for calculation is mostly missing. Since TS is correlated with the number of VPCs and therefore with the length of the recording, studies are only comparable if the mean actual recording duration, or better the number of VPCSs actually used, is stated.
Filtering criteria are usually not given or only partially given. The #TSRR, like all calculation steps, is usually provided; however, many studies use a different range than the respective reference for the calculation workflow. Unclear definitions, and the resulting variation in filtering criteria, lead to the selection of different sets of VPCSs from the same data and consequently to different results. References can of course be given for the workflow used, but HRT assessment must then be carried out exactly as stated there.

Conclusion
We studied the methodology of assessing HRT parameter values in 240 articles published since the original description of HRT (Schmidt et al 1999). Since 1999 the methodology has barely changed; the most substantial adjustment was the description of filtering criteria in 2003. Despite the summary of the methodology then current in 2008 (Bauer et al 2008), many different approaches have been used both before and after the publication of these guidelines.
HRT is mostly assessed from 24-h recordings. To increase the number of suitable VPCSs, the recording duration can be extended, VPCs can be triggered or HRT can be calculated after other contractions, namely APCs, VT and premature sinus beats. For clinical practice only increasing the data base through other triggers is feasible, for which APCs are most promising. However, more research is needed to determine the usability of HRT after APCs.
The recording duration and the circadian rhythm influence TS, because the parameter is correlated with HR and the number of VPCs. Therefore, we suggest using a TS adjusted to HR and the number of VPCSs (Hallstrom et al 2004). Several adjusted TS parameters have shown the same prognostic value as TS while increasing comparability between studies.
Filtering is the step that is mostly not disclosed in the literature and that varies most. With minor changes, we suggest using Grimm's filtering criteria. The number of suitable VPCSs needed for calculation also varies widely. The optimal #minVPCS still needs to be determined systematically, as does the handling of subjects with fewer VPCSs.
The calculation workflows vary considerably less than the filtering. Only the range in which the regression line for TS calculation is determined and the calculation order of TO and TS vary; both need further investigation. The new parameter TT, the index of the first interval of the TS regression line, is easy to assess without additional cost and has been shown to be a feasible risk stratifier. Therefore, we suggest adding it to future HRT assessments.
HRT is mostly used for risk stratification with the cut-off values proposed by Schmidt et al. However, cut-off values are needed for different population backgrounds. The most common classification consists of the three groups HRT0-2. The usefulness of other classification systems with more or fewer groups than HRT0-2 should be studied.
Many methodological details were not given, or were given incompletely, in the literature, including the actual recording duration, the number of VPCSs used for calculation and the filtering criteria. Regarding the number of intervals used to calculate TS, the descriptions and references given in some articles were contradictory. We recommend reporting the HRT assessment steps precisely (see Suggested methodology as template).
The current state of HRT research lacks comparability, transparency and thus reproducibility, as well as control for confounding variables. Although this systematic review covers all published papers on the topic that are listed on PubMed, it cannot constitute conclusive evidence of how HRT should be assessed to achieve reliable classification of subjects. Many contradictory results may be due not to limited predictive power but to the varying assessment of HRT. We believe that with further research and uniformly used assessment standards, HRT can become a more useful and common risk stratifier in clinical practice.

Limitations
Our study has some limitations. First, there is a risk of bias in assessing how frequently a given approach is used in the literature: some groups work intensively on the subject, so their particular methodologies appear more frequently than those of other researchers. Second, we included only original papers listed on PubMed.
• COTO: cut-off for TO, if used
• COTS: cut-off for TS, if used
• #TORR: number of intervals used before and after the VPC to calculate TO
• #TSRR: number of intervals within which the 5-interval sequence for TS calculation was assessed (see calculation of HRT below)
• ø VPCSs: arithmetic mean number of VPCSs included in the calculation per subject
• Other calculation methods deviating from Schmidt's original methods (Schmidt et al 1999), Grimm's filtering criteria or Bauer's guidelines (Bauer et al 2008).
The baseline methods accepted as standards are as follows:
• Filtering of VPCSs:
    • During filtering, the intervals are compared to a refI, which is the arithmetic mean of the five intervals preceding the couplI
    • couplI must be no more than 80% of refI
    • compI must be at least 120% of refI
    • The normal intervals (intervals before couplI and after compI) used for filtering are the two intervals before couplI and the 15 intervals after compI
    • Normal intervals must be longer than 300 ms and shorter than 2000 ms
    • The difference between succeeding normal intervals must be less than 200 ms
    • The normal intervals must not differ by 20% or more from refI
• Calculation (Schmidt et al 1999):
    • TO is the relative difference between the arithmetic mean of the two beats after compI and the arithmetic mean of the last two beats before couplI, given as a percentage
    • TO is first calculated for each VPC and averaged afterwards to obtain a subject's overall TO value
    • TS is the steepest slope of a regression line over any 5 succeeding intervals within the first 20 intervals after compI
    • First, an averaged tachogram of all suitable VPCSs is calculated; TS is then assessed once per subject
• Other standards (Bauer et al 2008):
    • All beats in the VPCS (apart from the VPC itself) must be normal sinus contractions
    • A minimum of 5 VPCSs is needed to calculate a subject's overall HRT parameters

Only explicitly stated data were extracted from the articles. We assumed that the calculation methodology of a referenced article was used only if that article was explicitly named as a calculation resource. Furthermore, if a reference was given for the calculation methods, we inferred only that these exact calculation methods were used, not the filtering criteria of the cited article. If an article was cited for, e.g., 'HRT assessment', both its calculation and filtering methods were assumed to have been adopted.
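The baseline filtering criteria listed above can be sketched as a single check per VPC snippet. This is a minimal illustration, not the implementation used in any cited study: the function name and argument layout are hypothetical, and it assumes that the successive-difference rule is applied within the intervals before couplI and within those after compI, without comparing across the VPC itself.

```python
from statistics import mean


def is_valid_vpcs(rr_before, coupl, comp, rr_after, n_before=2, n_after=15):
    """Check one VPC snippet (VPCS) against the baseline filtering criteria.

    rr_before: RR intervals (ms) preceding couplI, oldest first (>= 5 needed)
    coupl, comp: coupling interval (couplI) and compensatory interval (compI), ms
    rr_after: RR intervals (ms) following compI
    n_before, n_after: normal intervals checked before couplI / after compI
    """
    if len(rr_before) < max(5, n_before) or len(rr_after) < n_after:
        return False
    ref = mean(rr_before[-5:])            # refI: mean of the 5 intervals before couplI
    if coupl > 0.8 * ref:                 # couplI must be no more than 80% of refI
        return False
    if comp < 1.2 * ref:                  # compI must be at least 120% of refI
        return False
    pre = list(rr_before[-n_before:])
    post = list(rr_after[:n_after])
    for rr in pre + post:
        if not 300 < rr < 2000:           # longer than 300 ms, shorter than 2000 ms
            return False
        if abs(rr - ref) >= 0.2 * ref:    # must not differ by 20% or more from refI
            return False
    # Successive differences are checked within each side only (assumption:
    # couplI and compI themselves are not part of this comparison).
    for seq in (pre, post):
        if any(abs(b - a) >= 200 for a, b in zip(seq, seq[1:])):
            return False
    return True


before = [800, 810, 790, 805, 800]
after = [820] + [800] * 14
print(is_valid_vpcs(before, 560, 1100, after))   # True
print(is_valid_vpcs(before, 700, 1100, after))   # False: couplI > 80% of refI
```

Setting n_before=5 reproduces the number of surrounding intervals used in the Suggested methodology; that methodology also uses inclusive bounds (at least 300 ms, no more than 2000 ms, no more than 200 ms), which would require changing the comparisons accordingly.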
Two further assumptions were made. Firstly, if no filtering criteria were mentioned, filtering was assumed not to have been performed. Secondly, if the algorithm implemented by Schmidt's TMU working group was used, the standard filtering criteria were assumed. These criteria could be accessed at www.h-r-t.org (or .com). Unfortunately, the websites have been offline for some months, but snapshots can be found via the Internet Archive's Wayback Machine: web.archive.org.
A PubMed search performed on 12.10.2018 yielded 339 records. After screening of the abstracts, 88 records were removed because of duplication (1), language (21), studied organism (8) or article type (comments (18), reviews (37), meta-studies (3)), leaving 251 articles for detailed analysis. The full texts of two articles could not be obtained, either from databases or by contacting the authors. In the remaining articles, the methods described above were assessed. Of these, seven had to be removed because they did not involve HRT calculation. Another two were removed because no HRT methodology was described and the reported HRT values and units did not match. Finally, 240 articles were analysed. A list of all articles can be found in the supplement. The numbers are shown in the PRISMA flow diagram (figure A1).
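As a quick consistency check, the study-selection counts reported above can be recomputed. The snippet uses only the numbers stated in that paragraph; the variable names are illustrative.

```python
# Study-selection counts as reported in the text
records = 339
excluded_at_screening = {
    "duplication": 1, "language": 21, "studied organism": 8,
    "comments": 18, "reviews": 37, "meta-studies": 3,
}
screened_out = sum(excluded_at_screening.values())
full_text = records - screened_out

not_accessible = 2          # full texts could not be obtained
no_hrt_calculation = 7      # articles without HRT calculation
no_method_mismatch = 2      # no methodology described, values/units did not match
analysed = full_text - not_accessible - no_hrt_calculation - no_method_mismatch

print(screened_out, full_text, analysed)  # 88 251 240
```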
Unless stated otherwise, the authors gave no reasons for the deviations from the standard criteria discussed in this analysis.

Suggested methodology
We suggest the following methodology, similar to the established methods, until the questions raised in this review have been sufficiently analysed:
• Recording:
    • The recording duration should be a multiple of 24 h, optimally 72 h
• Filtering of VPCs:
    • All beats (apart from the VPC itself) must be normal sinus contractions
    • The length of the refI is the arithmetic mean of the 5 intervals preceding the couplI
    • The couplI must be no more than 80% of refI
    • The compI must be at least 120% of refI
    • Surrounding intervals (intervals before couplI and after compI) used for filtering are the 5 intervals before couplI and the 15 intervals after compI
    • Surrounding intervals must be at least 300 ms and no more than 2000 ms
    • The difference between succeeding intervals must be no more than 200 ms
    • The intervals must not differ by 20% or more from refI
• Calculation:
    • A minimum of 5 VPCSs is needed to calculate a subject's overall HRT parameters
    • TO:
        • TO is the relative difference between the arithmetic mean of the two beats after compI and the arithmetic mean of the last two beats before couplI, given as a percentage (see equation (A1))
        • TO is first calculated for each VPC and then averaged as the arithmetic mean to obtain a subject's overall TO value
    • Adjusted TS (aTS):
        • TS is the steepest slope of a regression line over any 5 succeeding intervals within the first 15 intervals after compI
        • First, an averaged tachogram of all suitable VPCSs is calculated by taking the arithmetic mean of all intervals with the same index; this tachogram is then adjusted to a HR of 75 bpm (i.e. an RR interval of 800 ms)
        • TS is then assessed from the averaged, normalised tachogram
        • Finally, TS is adjusted for the number of VPCSs and for heart rate variability: $aTS = TS - 0.283 \cdot \mathrm{RMSSD} / \sqrt{\#\mathrm{VPCSs}}$
    • TT:
        • TT is the index of the interval from which the TS regression line is calculated, with the first interval after compI numbered one, and so forth
• Description:
    • The actual recording duration
    • The exact filtering criteria for VPCSs
    • The number of VPCSs actually used for the calculation
    • The exact calculation steps in their order
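The calculation steps of the suggested methodology can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a reference implementation: the snippet layout and function names are hypothetical, RMSSD is passed in as a precomputed value from the subject's recording, and the adjustment to 75 bpm is implemented as a simple multiplicative rescaling of the averaged tachogram so that its mean pre-VPC interval equals 800 ms, because the exact normalisation procedure is not specified here. TO follows the usual formula TO = ((RR1 + RR2) − (RR−2 + RR−1)) / (RR−2 + RR−1) · 100%, with RR−2 and RR−1 the last two intervals before couplI and RR1 and RR2 the first two after compI; TS is taken as the maximum regression slope.

```python
from math import sqrt
from statistics import mean


def slope(y):
    """Least-squares slope of y against its index (ms per RR interval)."""
    xs = range(len(y))
    x_bar, y_bar = mean(xs), mean(y)
    num = sum((x - x_bar) * (v - y_bar) for x, v in zip(xs, y))
    den = sum((x - x_bar) ** 2 for x in xs)
    return num / den


def turbulence_onset(pre2, post2):
    """TO (%) of one VPCS from the last two intervals before couplI (pre2)
    and the first two intervals after compI (post2)."""
    return (sum(post2) - sum(pre2)) * 100 / sum(pre2)


def rmssd(rr):
    """Root mean square of successive differences of RR intervals (ms)."""
    return sqrt(mean((b - a) ** 2 for a, b in zip(rr, rr[1:])))


def hrt_parameters(snippets, rmssd_ms, n_ts=15, window=5, target_rr=800.0):
    """snippets: list of (pre, post) pairs of filtered VPCSs, where pre holds
    the refI intervals before couplI (oldest first) and post the intervals
    after compI. Returns TO (%), TS and aTS (ms/RRI) and TT (1-based index)."""
    if len(snippets) < 5:
        raise ValueError("at least 5 VPCSs are needed")
    if any(len(post) < n_ts or len(pre) < 2 for pre, post in snippets):
        raise ValueError("snippet too short")
    # TO: calculated for each VPCS first, then averaged (arithmetic mean)
    to = mean(turbulence_onset(pre[-2:], post[:2]) for pre, post in snippets)
    # Averaged tachogram: arithmetic mean of all intervals with the same index
    avg_pre = [mean(col) for col in zip(*(pre for pre, _ in snippets))]
    avg_post = [mean(col) for col in zip(*(post[:n_ts] for _, post in snippets))]
    # HYPOTHETICAL normalisation: rescale so the mean pre-VPC interval equals
    # target_rr (800 ms = 75 bpm); target_rr=None skips the normalisation
    scale = target_rr / mean(avg_pre) if target_rr else 1.0
    norm_post = [rr * scale for rr in avg_post]
    # TS: maximum slope over any `window` consecutive intervals in the first n_ts
    slopes = [slope(norm_post[i:i + window]) for i in range(n_ts - window + 1)]
    ts = max(slopes)
    tt = slopes.index(ts) + 1  # TT: first interval after compI counted as 1
    ats = ts - 0.283 * rmssd_ms / sqrt(len(snippets))
    return {"TO": to, "TS": ts, "TT": tt, "aTS": ats}


post = [800, 790, 800, 810, 820, 830, 840, 845, 850, 850, 850, 850, 850, 850, 850]
res = hrt_parameters([([800] * 5, post)] * 5, rmssd_ms=25.0)
print(res["TO"], res["TS"], res["TT"])  # -0.625 10.0 2
```

Calling `hrt_parameters(..., n_ts=20, target_rr=None)` and using the returned TS instead of aTS approximates the baseline calculation of Schmidt et al (1999) described earlier.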

HRT in animals
HRT was measured not only in humans but also in animals. We found ten such studies during our literature search but excluded them because they did not fit the scope of the review. One full text could not be obtained (Liu et al). The standard HRT methodology is used in studies investigating HRT in animals, with the following variations. In the three studies using mice and cut-offs, the HR was normalised to a cycle length of 800 ms (75 bpm) in order to reuse the human standard cut-off values (Stöckigt et al 2015, Stöckigt et al 2014, Mersmann et al 2010). To account for the different characteristics of mice, Stöckigt et al introduced the parameter TS3, which uses 3 instead of 5 intervals to calculate the steepest slope (Stöckigt et al 2015, Stöckigt et al 2014). Additionally, Stöckigt et al calculated TO as the mean of 3 instead of 2 intervals to take a delayed acceleration onset into account (Stöckigt et al 2014). Petric et al calculated TO as the difference between the two intervals before the VPC and any two consecutive intervals after it, up to the sixth and seventh intervals after the VPCI (Petric et al 2012). As in some studies of humans, two of the three studies in dogs calculated individual TS values before averaging (Harris et al 2017, Noszczyk-Nowak 2012b), while the third did not describe the exact methodology (Noszczyk 2012a). Harris et al used the HRT View program and adapted it to dogs, but did not explain the nature of the adaptation (Harris et al 2017).
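Because the mouse variants mainly change window sizes, they can be written as parameters of the standard definitions. The following is a minimal sketch with hypothetical function names; it assumes that the three-interval TO of Stöckigt et al averages three intervals on both sides of the VPC, which is not stated above, and it does not cover the variant of Petric et al.

```python
from statistics import mean


def regression_slope(y):
    """Least-squares slope of y against its index (ms per RR interval)."""
    xs = range(len(y))
    x_bar, y_bar = mean(xs), mean(y)
    num = sum((x - x_bar) * (v - y_bar) for x, v in zip(xs, y))
    return num / sum((x - x_bar) ** 2 for x in xs)


def turbulence_slope_k(post, k=5, n_ts=20):
    """Maximum slope over any k consecutive intervals within the first n_ts
    intervals after compI; k=3 corresponds to the mouse parameter TS3."""
    seg = list(post[:n_ts])
    if len(seg) < k:
        raise ValueError("not enough intervals after compI")
    return max(regression_slope(seg[i:i + k]) for i in range(len(seg) - k + 1))


def turbulence_onset_n(pre, post, n=2):
    """TO (%) from the mean of the last n intervals before couplI and the
    first n intervals after compI; n=3 is the assumed mouse variant."""
    before, after = mean(pre[-n:]), mean(post[:n])
    return (after - before) * 100 / before


pre = [100, 101, 99, 100, 100]                 # RR intervals in ms
post = [100, 98, 100, 104, 108, 110, 111, 111]
print(turbulence_slope_k(post, k=3))           # TS3
print(turbulence_slope_k(post, k=5))           # standard 5-interval TS
print(turbulence_onset_n(pre, post, n=2))      # -1.0
print(turbulence_onset_n(pre, post, n=3))      # three-interval TO
```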
The usage of HRT in animals is similar to that in humans. HRT was used as a cardiac marker to describe implantable cardioverter defibrillator therapy (Wang et al 2011) and to compare healthy dogs with dogs with DCM (Harris et al 2017, Noszczyk-Nowak 2012b) or subaortic stenosis (Noszczyk 2012a). Furthermore, it was used to compare healthy mice with mice with heart disease, specifically transverse aortic constriction or myocardial cryoinfarction (Stöckigt et al 2015), and to compare wild-type and mutant mice (Mersmann et al 2010, Petric et al 2012). Analogous to the finding of Iwasa et al that HRT measured during PVS and during Holter monitoring differed significantly (Iwasa et al 2005), Petric et al described a difference in TO between HRT measured in 24 h recordings and induced HRT in mice (Petric et al 2012).

New HRT parameters
Since the first description of the two HRT parameters TS and TO, many enhancements and new parameters have been developed. Adjustments of TS and the parameter TT are discussed in the review. Another parameter, turbulence dynamics (TD), is measured in a sliding window over a coordinate system with TS on the y-axis and the HR of the three intervals before the couplI on the x-axis. The method used to average these three intervals is not stated in the paper; presumably, their arithmetic mean is used. TD is defined as the steepest slope of the regression line over the part of the plot with the most negative correlation. The sliding window is 10 bpm wide, at least five data points are needed to calculate a regression line, and TD is measured in ms/RRI per bpm. TD reflects the relationship between TS and HR. It was found to be an independent predictor of late mortality in patients after myocardial infarction and with low LVEF (≤ 40%).
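The TD procedure described above can be sketched as follows. This is a hedged reading of the description, not the original implementation: the helper names are hypothetical, the window is assumed to slide in 1 bpm steps over half-open HR ranges, HR is taken as 60000 divided by the arithmetic mean of the three pre-couplI intervals (the averaging method is not stated in the paper), and TD is returned as the regression slope in the window with the most negative Pearson correlation. How the individual TS values plotted against HR are obtained is left to the caller.

```python
from statistics import mean


def hr_before_couplI(three_rr_ms):
    """HR (bpm) from the three intervals before couplI (assumed: arithmetic mean)."""
    return 60000.0 / mean(three_rr_ms)


def _fit(xs, ys):
    """Least-squares slope and Pearson r of ys against xs, or None if degenerate."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None
    return sxy / sxx, sxy / (sxx * syy) ** 0.5


def turbulence_dynamics(points, width=10.0, step=1.0, min_points=5):
    """points: (hr_bpm, ts) pairs. Returns the slope (ms/RRI per bpm) of the
    regression line in the HR window with the most negative correlation."""
    if not points:
        return None
    hrs = [hr for hr, _ in points]
    best = None                      # (r, slope) of the best window so far
    start, stop = min(hrs), max(hrs)
    while start <= stop:
        sel = [(hr, ts) for hr, ts in points if start <= hr < start + width]
        if len(sel) >= min_points:
            fit = _fit([hr for hr, _ in sel], [ts for _, ts in sel])
            if fit is not None and (best is None or fit[1] < best[0]):
                best = (fit[1], fit[0])
        start += step
    return None if best is None else best[1]


pts = [(hr, 20 - 0.5 * (hr - 60)) for hr in range(60, 80)]
print(turbulence_dynamics(pts))      # approximately -0.5
```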
Lenis et al characterised four parameters. The damping coefficient d measures the stability of the system. ω0, the 'resonance frequency', measures how rapidly the system responds to an external influence. The two 'morphology HRT' parameters MTO and MTS are calculated analogously to TO and TS but are based on T-wave morphology; that is, they compare morphological features of the T-waves of the beats before and after the VPC (Lenis et al 2013, Lenis et al 2013).
The IPFM (integral pulse frequency modulation) model, a model of the cardiac pacemaker, was adapted for HRT (Solem et al 2007). Based on this, Martínez et al introduced a detection statistic that discriminates between the presence and absence of HRT better than TO and TS. The parameter TΣ, which characterises the HRT shape, and Tµ performed better than TO and TS in predicting cardiac death in ischemic heart failure patients, needed fewer VPCSs and could predict mortality very early (within a few months). In contrast, when Gil et al used TΣ to compare HRT calculated from ECG and photoplethysmography (PPG), it did not show any improvement over TO and TS (Gil et al 2013).
The parameter CI/COMPP was defined by Voss et al as the ratio of the lengths of couplI and compI. CI/COMPP did not show significant differences between DCM patients and controls.
Other parameters, namely Turbulence Frequency, Turbulence Jump and Correlation Coefficient as reviewed by Watanabe (2003), were defined but not used in the articles inspected in this review.
The performance of the new parameters summarised here has either been tested only once (Neyman-Pearson detector, CI/COMPP, TΣ and Tµ, TD) or not at all (MHRT, ω0 and d). Even though some parameters showed better results than the standard parameters, more studies in different populations are needed to determine their prognostic value.