Low sensitivity of the new FIGO classification system for electronic fetal monitoring to identify fetal acidosis in the second stage of labor

Highlights
• Cardiotocography interpretation guidelines evaluated during second stage of labor.
• Case-control study including neonates with cord artery acidosis at vaginal delivery.
• Low sensitivity of FIGO intrapartum monitoring guidelines to detect acidosis.
• The Swedish 2009 template had a high sensitivity.
• The Swedish 2017 template had a high sensitivity with cut-off set at suspicious.


Introduction
Surveillance with cardiotocography (CTG) aims to detect signs of fetal hypoxia, enabling intervention before asphyxia occurs. Randomised studies from 1976 to 1993 comparing CTG with intermittent auscultation indicated that CTG monitoring lowered the incidence of neonatal seizures [1,2] but only one study showed a reduction of perinatal mortality [3]. Since then, the knowledge about CTG interpretation has grown [4].
In 1987 FIGO presented the first international guidelines for CTG [5], which have since been modified into several different guidelines. The systems differ in many aspects [6], leading to different classifications of the same tracings [7], and likely to different clinical decisions. In Sweden a national classification system for CTG interpretation (SWE-09), modified from FIGO-1987 and from the STAN template from 2007, was in use from 2009 to 2016 [8].
In 2010 Ayres-De-Campos and Bernardes concluded that the FIGO guidelines from 1987 had limitations and called for a simpler and more objective guideline [6]. Not having an internationally accepted guideline was thought to lower the effectiveness of CTG [6]. In 2015 a new guideline and classification template on intrapartum fetal monitoring was introduced, FIGO-15 [9]. In Sweden, FIGO-15 was adjusted to SWE-17 [10], replacing SWE-09. CTG patterns are often markedly different in the first and second stage of labor. The second stage is the period of highest risk of hypoxia, and the fetus is affected by the higher intrauterine pressure [11,12].
This study was undertaken to evaluate the sensitivity and specificity for the three templates in detecting hypoxia during the second stage of labor, as indicated by acidosis at birth after vaginal delivery or after cesarean section in the second stage of labor.
The primary objective of this study was to compare the sensitivity and specificity of FIGO-15, SWE-09 and SWE-17, in identifying cases with acidosis at birth in the second stage of labor.

Materials and methods
This is a retrospective case-control study including neonates with acidemia (case group), defined as cord artery or cord vein pH < 7.05, and controls, defined as neonates with cord artery and cord vein pH ≥ 7.15 and Apgar scores of 9 or 10 at five and ten minutes, all with CTG monitoring during the second stage of labor.
A power calculation estimated that 215 cases were needed to detect a difference in sensitivity between 80 % and 90 %, with 80 % power and a significance level of 0.05. To detect a difference in specificity between 60 % and 70 %, 386 controls were needed.
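The text does not state which formula or software was used for the power calculation. As a rough illustration only, the sketch below uses the common normal-approximation formula for comparing two independent proportions, without continuity correction. The function name and the choice of formula are assumptions, so the output is of the same order as the reported figures but need not reproduce them exactly.

```python
from math import ceil
from statistics import NormalDist


def sample_size_two_proportions(p1, p2, alpha=0.05, power=0.80):
    """Per-group n for a two-sided comparison of two independent proportions.

    Assumption: unpooled normal-approximation formula without continuity
    correction. The source does not name the method it used.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)           # quantile for the desired power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    n = (z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2
    return ceil(n)


# Sensitivity 80 % vs 90 % (cases) and specificity 60 % vs 70 % (controls)
n_cases = sample_size_two_proportions(0.80, 0.90)
n_controls = sample_size_two_proportions(0.60, 0.70)
print(n_cases, n_controls)
```

Applying a continuity correction or a pooled-variance variant of the formula gives somewhat larger numbers, which may explain part of any gap to the reported 215 and 386.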
Cases and controls were collected from births at Skåne University Hospital in Malmö and Lund from April 23rd 2013 to October 31st 2017, and at Helsingborg Hospital from March 13th 2012 to December 31st 2016. Inclusion criteria for both groups were singleton pregnancy and an available CTG tracing of at least 30 min, ending at vaginal birth or within 30 min of second stage cesarean delivery. The case group consisted of newborns with cord artery or cord vein pH < 7.05 after vaginal birth or cesarean delivery in the second stage of labor. As controls, we included the first two neonates born consecutively after each case at the same hospital who fulfilled the inclusion criteria above, except for acidosis, and who had both cord artery and cord vein pH ≥ 7.15, with the two values at least 0.02 apart, and Apgar scores of 9 or 10 at five and ten minutes.
In the international guidelines [9], preterm birth is not mentioned, but in the Swedish national guidelines [10], it is stated that after 34 weeks, guidelines for full term are used. We therefore excluded births prior to 34 full weeks.
Clinical data was gathered from patient files and computerized CTG tracings were evaluated. The last 30 min, and when available up to 80 min, of the tracings before birth were assessed. The tracings were anonymized and randomized.
The CTG tracings were interpreted independently by three professionals, representing trained obstetric staff (midwives, residents and obstetricians) with different levels of experience of electronic fetal monitoring from their daily work. All had completed educational programs covering both the previous and the current classification templates. Each of the 886 traces was assessed by 3 of a total of 21 interpreters. Each interpreter received a portfolio with tracings, information about the study and classification forms. The only additional clinical information provided was that it was a singleton pregnancy ending in vaginal birth or cesarean delivery in the second stage of labor. The tracings were presented at a scale of 1 cm/min. Each interpreter completed a form including the assessment of all the variables relevant for classification. This was done twice: in 2017 with a protocol for SWE-09 when it was in clinical use, and in 2018 with a protocol for FIGO-15 and SWE-17 when the interpreters had been re-educated and used SWE-17 in clinical practice. Each interpreter's description of the included variables was then used to classify each trace, strictly according to each template, as normal, suspicious or pathological.
The final classification was derived from the three professionals' assessments of the variables in each trace and the resulting classification for each interpreter under each template, and thus represents the majority assessment of obstetric staff with different experience. If at least two out of three assessments agreed, that was the final classification. If all three classifications differed, a fourth assessor, an experienced obstetrician, also classified the trace, so that a final judgement was attained for all traces.
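The majority rule described above can be sketched as follows. This is a minimal illustration, not the authors' procedure: the function and variable names are hypothetical, and it assumes exactly three categories (with the SWE-09 category preterminal already mapped to pathological).

```python
from collections import Counter

# Assumption: three categories only; "preterminal" is treated as "pathological".
CATEGORIES = ("normal", "suspicious", "pathological")


def final_classification(assessments, fourth_assessment=None):
    """Return the final classification for one trace under one template.

    With three categories, three assessments either contain a majority
    (at least two agree) or are all different. In the latter case the fourth
    assessor's classification matches one of the three and decides the result.
    """
    if len(assessments) != 3 or any(a not in CATEGORIES for a in assessments):
        raise ValueError("expected three assessments drawn from CATEGORIES")
    label, votes = Counter(assessments).most_common(1)[0]
    if votes >= 2:
        return label
    if fourth_assessment not in CATEGORIES:
        raise ValueError("all three differ: a fourth assessment is required")
    return fourth_assessment


print(final_classification(["normal", "suspicious", "suspicious"]))
print(final_classification(["normal", "suspicious", "pathological"], "pathological"))
```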

Outcomes
The main outcome was the sensitivity for the classification pathological to identify cases with acidosis at birth, and the specificity for the classification normal or suspicious together to rule out acidosis. The classification preterminal using SWE-09 was regarded as pathological in the final analyses. The interpretation suspicious mandates continued surveillance combined with additional active management to correct reversible causes and to evaluate the fetal condition in all three templates. Therefore, we also evaluated the sensitivity of suspicious and pathological patterns combined in identifying acidosis, and the specificity for the classification normal to rule out acidosis.
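To make the two cut-offs concrete, the sketch below computes sensitivity and specificity from a list of final classifications, once with only pathological counted as test-positive and once with suspicious and pathological combined. The records are hypothetical toy data, not study data, and all names are illustrative.

```python
def sensitivity_specificity(records, positive_classes):
    """Compute (sensitivity, specificity) for a chosen cut-off.

    records: iterable of (classification, has_acidosis) pairs.
    positive_classes: classifications counted as a positive test.
    """
    tp = fn = tn = fp = 0
    for classification, has_acidosis in records:
        test_positive = classification in positive_classes
        if has_acidosis:
            tp += test_positive
            fn += not test_positive
        else:
            fp += test_positive
            tn += not test_positive
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical toy data: four cases (acidosis) and four controls.
records = [
    ("pathological", True), ("pathological", True),
    ("suspicious", True), ("normal", True),
    ("normal", False), ("normal", False),
    ("suspicious", False), ("pathological", False),
]

# Cut-off at pathological: normal and suspicious together rule out acidosis.
strict = sensitivity_specificity(records, {"pathological"})
# Cut-off at suspicious: only normal rules out acidosis.
lenient = sensitivity_specificity(records, {"suspicious", "pathological"})
print(strict, lenient)
```

Moving the cut-off from pathological to suspicious can only raise sensitivity and lower specificity, or leave them unchanged, which is the trade-off evaluated for each template.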

Statistical analyses
The information was gathered in Stat View 1 computer software. The sensitivity and specificity with 95 % CI for the final classification were calculated for the three classification systems, using www.sample-size.net/confidence-interval-proportion provided by UCSF. The chi-square test was used to determine whether sensitivity or specificity differed significantly between classification systems, and a p-value < 0.05 was considered statistically significant.
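The calculations above were done with an online calculator and a statistics package. For readers who want to reproduce this kind of analysis, a minimal sketch follows. It makes two assumptions not confirmed by the text: the Wilson score interval stands in for whatever interval method the calculator applies, and the chi-square test is the plain Pearson statistic for a 2×2 table without continuity correction. The counts in the usage lines are hypothetical.

```python
from math import erfc, sqrt
from statistics import NormalDist


def wilson_interval(successes, total, confidence=0.95):
    """Wilson score confidence interval for a proportion (assumed method)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    p = successes / total
    denom = 1 + z ** 2 / total
    centre = (p + z ** 2 / (2 * total)) / denom
    half = z * sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    return centre - half, centre + half


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (1 df, no continuity correction) for [[a, b], [c, d]].

    Returns (statistic, p_value). For one degree of freedom the upper tail
    probability of the chi-square distribution equals erfc(sqrt(x / 2)).
    """
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return stat, erfc(sqrt(stat / 2))


# Hypothetical counts: template A flags 80 of 100 cases, template B flags 60 of 100.
low, high = wilson_interval(80, 100)
stat, p_value = chi_square_2x2(80, 20, 60, 40)
print(f"95% CI {low:.3f}-{high:.3f}; chi-square {stat:.2f}, p = {p_value:.4f}")
```

Because every tracing was classified under all three templates, a paired test such as McNemar's could also be considered; the sketch follows the chi-square approach named in the text.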

Ethical approval
Ethical approval was obtained from the Regional Ethical Review Board in Lund, Dnr 2016/371, 2016-05-24.

Results
During the study period 57,582 neonates were born at the two hospitals. A total of 296 cases and 592 controls fulfilling the inclusion criteria were included. One case and one control were excluded due to birth before 34 gestational weeks, leaving 295 cases and 591 controls in the study (Fig. 1). Background data are summarized in Table 1.
Rates of agreement between the classifications determined by the assessment of the included variables of the three interpreters are shown for cases and controls in Table 2. For classification of cases, the agreement was highest for SWE-09, whereas for controls agreement was higher for FIGO-15 and SWE-17.
The results of the final classifications are summarized in Table 3, and the sensitivity and specificity for the different templates in Table 4. The sensitivity for the classification pathological to identify cases with acidosis differed significantly between the classification systems: 87.1 % for SWE-09, 62.0 % for SWE-17 and 50.2 % for the FIGO-15 classification system. The corresponding specificity was higher for FIGO-15 (87.5 %) and SWE-17 (84.8 %) than for SWE-09 (55.5 %).
When combining suspicious and pathological patterns the sensitivity for SWE-17 increased to 83.4 %, which was not significantly lower than the sensitivity for pathological patterns with SWE-09 (p = 0.26), whereas the specificity at 67.7 % was significantly higher than for pathological patterns with SWE-09 (p < 0.001). For FIGO-15, combining suspicious and pathological patterns also led to a high sensitivity (96.6 %), but the specificity declined to a very low level (22.5 %).

Interpretation of the main results
In this study we found that during the second stage of labor the FIGO-15 template had the lowest sensitivity, 50.2 %, to detect fetal acidosis when the cut-off was pathological patterns. When the cut-off was suspicious patterns, it had the lowest specificity, 22.5 %. The template led to a high rate of patterns classified as suspicious in cases (46.4 %) as well as in controls (65.0 %).
The SWE-09 template had the highest sensitivity to detect acidotic fetuses (87.1 %), whereas the specificity was low (55.5 %). Combining pathological and suspicious increased the sensitivity to 97.6 % with a concomitant decrease of specificity to 28.3 %.
The SWE-17, modelled on the FIGO-15, had a sensitivity of 62.0 % for pathological patterns. When the cut-off was set at suspicious patterns the sensitivity was 83.4 %. This is similar to the sensitivity for pathological patterns with SWE-09, but the specificity was higher, 67.7 %.
We consider the safety of SWE-17 and SWE-09 to be similar, provided that suspicious patterns are also acted upon with SWE-17. Acting does not always mean delivery, but may include diagnostic measures (fetal blood sampling) as well as other therapeutic measures (alleviating oxytocin overstimulation). The FIGO-15 classification results in too many suspicious tracings to be discriminative. The sensitivity of the classification pathological for FIGO-15 was low, and the specificity with a cut-off at suspicious was also low. This finding raises doubts about its validity in clinical practice.
It is not clear to us why the interpretation using the SWE-17 template resulted in a higher rate of normal classification in acidotic fetuses (16.

Comparison of the present results with previous studies
A few previous studies have compared different CTG interpretation templates [13,14] and many studies have analyzed the association between specific CTG patterns and acidemia [15,16]. Coletta et al. evaluated a 3-tier and a 5-tier classification system in a study including 30 cases with pH < 7.00 and 24 controls with pH > 7.20 [17]. They found a 79 % sensitivity and a 100 % specificity of the two worst categories in the 5-tier system to detect acidosis at birth, whereas the 3-tier system, which was similar to FIGO-15, had a low sensitivity (12.5 %) since most cases and controls were categorized as category 2. They did not present confidence intervals for sensitivity and specificity, and the number of cases and controls was limited.
Bhatia et al. compared FIGO-15 with the NICE guidelines from 2007 and 2014 [18]. They found that FIGO-15 offered favorable agreement scores, was easy to use and led to a moderate rate of interventions. However, that study did not address the validity of the classification.
Olofsson et al. compared the STAN classification system from 2007 (similar to SWE-09) with FIGO-15. They found that the two systems classified tracings differently [19] and that the FIGO-15 had a lower sensitivity (43 %) than the STAN template (73 %) [20]. Martí Gamboa et al. [21] compared the FIGO-15 classification form with a 5-tier classification [22], in 102 cases with pH 7.10 and base deficit > 8 mmol/l, and 100 controls. The FIGO classification had a sensitivity of 43.6 %, and a specificity of 82.5 %, and the 5-tier system a sensitivity of 36 %, and 88 % specificity for acidosis at birth. The present study confirms the results of these studies, with a low sensitivity for the FIGO classification system to detect fetal acidosis.

Strengths and weaknesses of study design and methods
The case group was defined as neonates born with a pH < 7.05. Jonsson et al. reported pH < 7.05 at birth as a useful variable for quality control of management of the second stage of labor [23]. A cord artery pH of 7.01-7.05 has been associated with a 10-fold risk of encephalopathy with early neonatal seizures [24]. Thus, it is desirable for our monitoring methods to detect hypoxia resulting in this degree of acidosis.
Each trace was interpreted by at least three different individuals, and the final classification was that of the majority (or two out of four). This might result in a higher sensitivity and specificity than if only one person assesses a trace. We chose the design with triple interpretation of each trace by clinicians with different experience both to reflect clinical practice and to achieve more accurate interpretations of the fetal heart rate parameters than from assessments of a single individual.
The interpreters in the study were blinded to outcome, which is necessary to avoid ascertainment bias [25,26]. In the clinical management of labor, CTG is just one part of the management, but since our purpose was to assess the CTG templates as such, and not clinical management, we considered it proper to keep the interpreters blinded to clinical data, minimizing the risk of bias.
To eliminate bias caused by differing experience with the classification systems, one of the authors systematically converted the assessments of the individual fetal heart rate parameters (heart rate, decelerations etc.) into the classifications normal, suspicious or pathological according to each classification form.

Concluding discussion
The results of the study indicate a pressing need for improved interpretation protocols to ensure safe labor care. The purpose of intrapartum fetal monitoring is to detect fetal hypoxia in time to prevent asphyxia. With a low sensitivity of monitoring, fetal asphyxia may not be possible to avoid, whereas a low specificity may lead to unnecessary interventions. We consider a high sensitivity of CTG to be more important than a high specificity, since identification of fetuses at risk is indispensable if asphyxia is to be avoided, and since secondary diagnostic methods can be used to improve specificity.
Further studies are needed to analyze how differences between the different parameters in the templates affect sensitivity and specificity. Improvements of SWE-09 to increase specificity or of SWE-17 to increase sensitivity might be possible, but before introducing new templates for fetal monitoring in clinical practice, studies of the validity of such templates should be performed.

Conclusion
We consider the FIGO-15 classification to be too restrictive in classifying tracings as pathological, and too liberal in classifying tracings as merely suspicious, to discriminate properly between fetuses with and without acidosis. The sensitivity of the classification "pathological" to detect acidosis was low and insufficient for safe surveillance. The classification "suspicious", however, indicated fetal acidosis with too low a specificity to be clinically useful or to warrant clinical action in every case of a suspicious trace. SWE-17 provided the best combination of sensitivity (83.4 %) and specificity (67.7 %) if the cut-off for indicating fetal acidosis was set at a suspicious pattern, whereas SWE-09 provided the best sensitivity (87.1 %) for pathological patterns and had the lowest false negative rate of tracings classified as normal in cases with acidosis (2.4 %).

Funding
This work was supported by research grants from Region Skåne and funding from LÖF, the national Swedish patient insurance company. The funders played no role in planning or conducting the research or writing of the paper.

Declaration of Competing Interest
The authors declare that they have no competing financial interests that have influenced the work reported in this paper. Andreas Herbst has contributed to the development of the Swedish classification systems from 2009 and 2017.