The vast majority of pregnant women in the United States undergo electronic fetal heart monitoring during labor. There is limited evidence to support its benefit compared with intermittent auscultation. In addition, there is significant variability in interpretation and its false-positive rate is high. The latter may have contributed to the rise in operative deliveries. To address the critical need for better approaches to intrapartum monitoring, the MFMU Network has completed two large multi-site randomized trials, one to evaluate fetal pulse oximetry and the other to evaluate fetal ECG ST segment analysis (STAN). Both of these technologies had been approved for clinical use in the U.S. based on prior smaller trials. These technologies were evaluated in laboring women near term, and the primary outcomes were overall cesarean delivery for the oximetry trial and a composite adverse neonatal outcome for the STAN trial. Neither trial showed a benefit of the technology, either in the rates of operative deliveries or in the rates of adverse neonatal outcomes. The experience with these trials, summarized in this report, highlights the need for rigorous evidence before new technology is introduced into clinical practice and provides a blueprint for future trials aimed at better intrapartum monitoring approaches.
When first introduced, electronic fetal heart rate monitoring was used primarily in complicated pregnancies, but gradually it came to be used during most labors. In 1978 it was estimated that nearly two-thirds of American women were being monitored electronically during labor.1 By 1998, nearly 3.3 million American women, comprising 84 percent of all live births, underwent electronic fetal heart rate monitoring.2
By the end of the 1970s, however, questions about the efficacy, safety, and costs of electronic monitoring were being voiced by the Office of Technology Assessment, the United States Congress, and the Centers for Disease Control and Prevention. Banta and Thacker1 analyzed 158 reports and concluded that “the technical advances required in the demonstration that reliable recording could be done seems to have blinded most observers to the fact that this additional information will not necessarily produce better outcomes”. They attributed the apparent lack of benefit to the imprecision of electronic monitoring to identify fetal distress. Moreover, increased usage was linked to more frequent cesarean delivery. They estimated that additional costs of childbirth in the United States, if half of labors had electronic monitoring, were approximately $400 million per year in 1979 dollars.
The Eunice Kennedy Shriver National Institute of Child Health and Human Development (NICHD) appointed a task force to study these concerns, and a consensus report was published in 1979.3 After an exhaustive review of the electronic fetal heart rate monitoring literature, the group concluded that the evidence only suggested a trend toward improved infant outcome in complicated pregnancies. They emphasized that few scientifically rigorous investigations had been done to address perinatal benefits. A subsequent NICHD consensus panel,4 convened to address the dramatic increase in cesarean births in the United States, concluded that use of electronic fetal heart rate monitoring was a contributing factor.
Almost 20 years later, the NICHD Fetal Monitoring Workshop5 formulated research recommendations intended to assess the reliability and validity of fetal heart rate patterns in the prevention of asphyxial brain damage. The workshop participants concluded that the effectiveness of fetal heart rate monitoring still remains to be established despite widespread use in the United States. Another reason for the failure of fetal heart rate monitoring technology to be proven beneficial is the now well-accepted non-specificity of fetal heart rate patterns to predict fetal compromise. This poor specificity of fetal heart rate pattern interpretation has resulted in a continuing search for adjunctive tests that could be used to distinguish false-positive fetal heart rate patterns.
A number of adjunctive measures have been proposed, including fetal scalp sampling for pH, fetal scalp stimulation, fetal lactate measurement, determination of fetal oxygen saturation, and monitoring of fetal ECG. The Eunice Kennedy Shriver National Institute of Child Health and Human Development Maternal-Fetal Medicine Units (MFMU) Network identified intrapartum monitoring as one of the areas in need of more research, especially given that one of the major aims of the Network is to “evaluate maternal and fetal interventions for efficacy, safety, and cost-effectiveness”.6 In this review, we present the rationale, the findings, and our experience with two large randomized trials conducted by the Network to measure the efficacy and safety of two promising adjuncts to electronic fetal monitoring – fetal pulse oximetry and fetal ECG. Although neither of these trials showed benefit, we believe that these ambitious efforts helped ensure that interventions were not introduced before their efficacy and safety had been validated, and we hope that this information will be valuable for future research designed to improve intrapartum fetal assessment.
In May 2000, the United States Food and Drug Administration (FDA) granted conditional approval of the Nellcor OxiFirst Fetal Pulse Oximetry System for use as an adjunct to electronic fetal monitoring.7 This technology was designed to improve knowledge of the intrapartum condition of the fetus in the presence of a non-reassuring fetal heart-rate pattern by continuously measuring fetal oxygen saturation. With this technology, a specialized sensor is inserted through the dilated cervix after ruptured membranes and positioned against the fetal face. Once in contact with the fetal skin, the device permits measurement of fetal oxygen saturation during labor.8
The fetal pulse oximetry system was designed based upon principles of spectrophotometry and plethysmography.8 The sensor contains two low-voltage, light-emitting diodes as light sources and one photodetector. One light-emitting diode emits red light (735 nm), and the other emits infrared light (890 nm). When light from each light-emitting diode passes through fetal tissue at the sensor application site, a fraction is absorbed. The photodetector measures the light that was not absorbed – that is, the light that is reflected. Because oxyhemoglobin and deoxyhemoglobin have different light-absorption characteristics – relatively less red light is absorbed by oxyhemoglobin compared with deoxyhemoglobin and relatively more infrared light is absorbed by oxyhemoglobin compared with deoxyhemoglobin – pulse oximetry employs the ratio of these differences to calculate fetal oxygen saturation during each arterial pulse.8,9
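The ratio-based calculation described above can be illustrated with a minimal sketch. It assumes the conventional pulse-oximetry formulation in which each wavelength contributes a pulsatile (AC) and a steady (DC) light level; these terms, the function names, and the linear calibration are illustrative assumptions and are not taken from the device documentation. In practice the mapping from the ratio to a saturation value comes from an empirically derived, device-specific calibration curve.

```python
def ratio_of_ratios(red_ac, red_dc, ir_ac, ir_dc):
    """Return R = (AC_red / DC_red) / (AC_ir / DC_ir) for one arterial pulse."""
    if red_dc <= 0 or ir_dc <= 0 or ir_ac <= 0:
        raise ValueError("DC levels and the infrared AC amplitude must be positive")
    return (red_ac / red_dc) / (ir_ac / ir_dc)


def saturation_percent(r, calibration):
    """Map R to a saturation estimate with a caller-supplied calibration curve,
    clamped to the physically meaningful range 0-100 percent."""
    return max(0.0, min(100.0, calibration(r)))


def demo_calibration(r):
    # Hypothetical linear curve for illustration only; NOT the device's calibration.
    return 110.0 - 25.0 * r


# Relatively more pulsatile red absorption (higher R) maps to lower saturation.
r = ratio_of_ratios(red_ac=0.02, red_dc=1.0, ir_ac=0.01, ir_dc=1.0)
print(r, saturation_percent(r, demo_calibration))
```

The calibration is passed in as a function so that the sketch does not imply any particular curve for the 735 nm and 890 nm wavelengths used by the fetal sensor.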
A number of published observational studies in both animals and humans demonstrated a correlation between fetal metabolic acidosis and increasing duration of fetal pulse oximetry saturations below 30 percent in the setting of a non-reassuring fetal heart rate pattern.10–12 These studies were followed by the first randomized controlled trial of fetal pulse oximetry, which was published by Garite and colleagues in 2000.13 In this trial, a total of 1010 women with term pregnancies in active labor with an abnormal fetal heart rate pattern were randomly assigned to electronic fetal monitoring alone (the control group) or to electronic fetal monitoring plus continuous fetal pulse oximetry (the study group). The primary outcome was a reduction in cesarean deliveries for the indication of non-reassuring fetal status.
As shown in Table 1, the frequency of the primary outcome was significantly lower in the study group compared with the control group (5% vs. 10%, P<0.001).
However, as also shown in Table 1, there were no significant differences in the overall cesarean rate, the overall operative vaginal delivery rate, or the rate of operative vaginal delivery for non-reassuring fetal status. Looking at the results from a different angle, although the rate of cesarean delivery for non-reassuring fetal status was halved in the fetal pulse oximetry arm, the rate of cesarean delivery for dystocia more than doubled in this same group (9% vs. 19%, P<0.001). This increase in the rate of cesareans for dystocia was unexpected, and the results of the study raised several key questions. For example, were the discrepancies in the effect of the oximeter according to the indication for cesarean delivery reproducible? Was the sample size sufficient to permit assessment of infant safety in circumstances when an obstetrician withholds cesarean delivery in the presence of an abnormal fetal heart rate because fetal oxygenation is deemed to be normal? Such questions prompted the American College of Obstetricians and Gynecologists to withhold endorsement of the oximeter for use in clinical practice until additional studies were completed.14 Similarly, final FDA approval was contingent upon the results of post-approval studies.
In 2001, the investigators of the NICHD MFMU Network began an ambitious trial to test the efficacy and safety of fetal pulse oximetry.15 The study protocol required that 10,000 women with term pregnancies all have their labors monitored with both an electronic fetal monitor and a fetal pulse oximeter. Women were randomly assigned to a group in which the fetal oxygen saturation data were made available to the managing clinicians or to a group in which the fetal oxygen saturation data were masked. This design and the large sample size strengthened the study in several ways. First, it was hypothesized that, like electronic fetal monitoring, fetal pulse oximetry, if widely adopted, would inevitably be used not only in labors complicated by non-reassuring fetal heart rate patterns but also in those with either reassuring or “less non-reassuring” fetal heart rate patterns. The sample size ensured that both a large number of labors complicated by non-reassuring fetal heart rate patterns and a large number of labors with less complicated fetal heart rate patterns would be available for study. Second, the masked design ensured that the experiment actually tested the clinical utility of fetal oxygen saturation data and helped eliminate other potential confounding variables. For example, some patients with non-reassuring fetal heart rate patterns that developed in the second stage of labor could have gone on to deliver vaginally during the occasionally time-consuming minutes that it took to insert the device. Such vaginal deliveries would be less a result of the utility of the fetal oxygen saturation data and more a result of the additional time required to obtain such data. Third, the masked design permitted blinded ascertainment and recording of fetal oxygen saturation data in approximately 5000 laboring patients. These archived data would help elucidate the natural history of fetal oxygen saturation during labor and refine the parameters used to define abnormal. Fourth, the sample size permitted a statistically valid assessment of the safety of fetal pulse oximetry for both mother and fetus.
Following a comprehensive training phase – including centralized and local on-site training of Network physicians and nurses by educators from the manufacturer, as well as passing of both written and practical examinations by the study personnel – recruitment began in May 2002. At the third, planned interim analysis, the data and safety monitoring committee halted the trial early after concluding that, given the data collected up to that point, an adequate number of subjects had already been enrolled to address the major study outcomes.
At the conclusion of the study, a total of 27,571 nulliparous women had been screened at 13 participating academic medical centers. Of the eligible women, 5553 consented to the study and sensor insertion was attempted. Of these, 212 did not undergo randomization because in 170 the device could not be successfully placed and in 42 attempted insertion of the device was associated with a previously unreported finding of prolonged fetal heart rate decelerations. The remaining 5341 women were randomly assigned to either the open or masked group.
As shown in Table 2, the availability of intrapartum fetal oxygen saturation data had no impact on the route of delivery.
Specifically, the overall cesarean rate was similar in the open and masked groups (26.3% vs. 27.5%, P = 0.31) as were the rates of cesarean for non-reassuring fetal heart rate (7.1% vs. 7.9%, P = 0.30) and dystocia (18.6% vs. 19.2%, P = 0.59). The findings were the same for the 2168 women with labors complicated by a non-reassuring fetal heart rate pattern prior to randomization (Table 2). There were also no differences in neonatal outcomes between the two groups including a composite measure comprised of one or more of the following: 5-minute Apgar score of 3 or less, umbilical–artery blood pH value of less than 7.0, seizures, intubation in the delivery room, stillbirth, neonatal death, or admission to the neonatal intensive care unit for more than 48 hours.
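The composite neonatal measure listed above is an "any of" definition, which a short sketch can make explicit. The record type and field names below are hypothetical and chosen only for illustration; the thresholds are those stated in the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NeonatalRecord:
    # Hypothetical field names; None means the value was not recorded.
    apgar_5min: Optional[int] = None
    cord_artery_ph: Optional[float] = None
    seizures: bool = False
    intubated_in_delivery_room: bool = False
    stillbirth: bool = False
    neonatal_death: bool = False
    nicu_hours: float = 0.0


def has_composite_outcome(rec):
    """True if one or more components of the composite outcome are present."""
    return any((
        rec.apgar_5min is not None and rec.apgar_5min <= 3,         # 5-minute Apgar of 3 or less
        rec.cord_artery_ph is not None and rec.cord_artery_ph < 7.0,  # pH less than 7.0
        rec.seizures,
        rec.intubated_in_delivery_room,
        rec.stillbirth,
        rec.neonatal_death,
        rec.nicu_hours > 48,                                         # NICU stay of more than 48 hours
    ))


print(has_composite_outcome(NeonatalRecord(apgar_5min=9, cord_artery_ph=7.25)))
print(has_composite_outcome(NeonatalRecord(apgar_5min=8, cord_artery_ph=6.95)))
```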
As reviewed by Dildy,16 there have been at least four other smaller randomized studies of fetal pulse oximetry published around the same time as or after the Network trial.17–20 At the University of Mississippi Medical Center, Klauser and colleagues randomly assigned 360 laboring women with a non-reassuring fetal heart rate pattern to electronic fetal monitoring alone or electronic fetal monitoring plus fetal pulse oximetry.17 Like the Network trial, the overall rate of cesarean delivery, as well as the rates of cesarean for non-reassuring fetal heart-rate tracing and for dystocia were similar between the two groups. And although the decision-to-incision time was approximately 10 minutes shorter in the fetal pulse oximetry arm, there were no statistically significant differences in neonatal outcome.
Conducted at four maternity hospitals in Australia, the FOREMOST trial included 600 laboring women with non-reassuring fetal status who were randomly assigned to electronic fetal monitoring alone or electronic fetal monitoring plus fetal pulse oximetry.18 Similar to the Garite trial,13 these investigators also found that fetal pulse oximetry was associated with a lower rate of cesarean delivery for non-reassuring fetal status compared with electronic fetal monitoring alone (13.8% vs. 20.2%, P = 0.042). However, the overall cesarean rate was comparable between the two groups (45.9% vs. 48.1%, P = 0.584), and there were no differences in neonatal outcomes.
Two smaller randomized studies – one from Turkey with 230 patients19 and one from Germany with 146 patients20 – demonstrated significant reductions in both cesarean for non-reassuring fetal status and overall cesareans or overall operative delivery in those women randomly assigned to fetal pulse oximetry. These two trials were later included in a synthesis of the available data in 2014 by the Cochrane Collaboration.21 The Cochrane authors concluded that “the addition of fetal pulse oximetry does not reduce overall cesarean section rates” and that “a better method than pulse oximetry is required to enhance the overall evaluation of fetal well-being in labour”.
Fetal electrocardiogram (ECG) ST segment analysis (STAN) was approved by the FDA in 2005 as an adjunct to electronic fetal heart rate (FHR) monitoring to determine whether obstetrical intervention is warranted when there is an increased risk for developing metabolic acidosis.22 Clinical application of STAN to fetal monitoring began in 1979 and further refinements between 1979 and 1989 to improve signal processing capability then made it possible for clinical trials to be conducted.23 STAN has now been available in Europe for more than two decades; and there have been several randomized controlled trials (RCTs)24–28 that have suggested its utility in decreasing (i) cord blood acidosis, (ii) need for fetal scalp blood sampling during labor, and (iii) need for operative vaginal delivery and emergency cesarean delivery for fetal indications. In 2006 the Cochrane Library29 endorsed the use of STAN “when a decision has been made to undertake continuous electronic FHR monitoring during labour”. However, despite these endorsements, enthusiasm for this technology in the United States was muted, and there were concerns that needed to be addressed before widespread adoption of another electronic fetal monitoring modality could be considered.30 None of the randomized trials published prior to 2006 had been performed in the United States, and given the different patient case mix, different health care delivery models, and different obstetrical practices in the United States and Europe, direct extrapolation of the European data to the U.S. population was not thought to be appropriate. As a result the largest randomized controlled trial of ST analysis ever undertaken was performed by the NICHD MFMU Network and the results were published in 2015.31
The fetal STAN system uses established principles of ECG analysis and select components of the fetal ECG signal are specifically isolated to determine the presence of myocardial ischemia.32–35 Fetal ECG does not provide the same perspective as an adult ECG recording because only one lead is used (scalp lead) to provide a global electrical picture of what is happening within the fetal heart as a whole, relying on the principle that the fetal ECG is a surface representation of changes in action potentials within the myocardium.
While a detailed description of myocardial electrophysiology is beyond the scope of this article, some description of the ST analysis methodology used in the STAN system may be helpful. A normal fetal ST interval is made up of a horizontal or upward-sloping ST segment and a T wave with a constant and stable amplitude. A normal ST interval usually indicates a positive energy balance and aerobic myocardial function. Dawes and colleagues36 showed that during hypoxia, fetal myocardial function and survival depend on myocardial glycogenolysis. As glycogenolysis increases, so does the amplitude of the T wave in the fetal ECG,33 and this relationship has been demonstrated to be linear.34 The amplitude of the QRS complex remains relatively stable until quite late in the hypoxia/acidosis process, providing a metric against which the change in the height of the T wave is standardized: the T/QRS ratio.
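The T/QRS ratio can be illustrated with a minimal sketch. The amplitudes, the function names, and the use of a median over baseline beats are assumptions made for illustration; the text does not describe the monitor's actual signal-processing algorithm.

```python
import statistics


def t_qrs_ratio(t_amplitude, qrs_amplitude):
    """Normalize T-wave height by QRS amplitude (both in the same units)."""
    if qrs_amplitude <= 0:
        raise ValueError("QRS amplitude must be positive")
    return t_amplitude / qrs_amplitude


def rise_over_baseline(baseline_ratios, current_ratio):
    """Difference between the current T/QRS ratio and a baseline summary.
    The median is an illustrative choice, not the monitor's documented method."""
    return current_ratio - statistics.median(baseline_ratios)


# Illustrative values: the QRS amplitude stays stable while the T wave grows.
baseline = [t_qrs_ratio(t, 200.0) for t in (19.0, 20.0, 21.0)]
current = t_qrs_ratio(30.0, 200.0)
print(round(rise_over_baseline(baseline, current), 3))
```

Because the QRS amplitude is used as the denominator, a change in the ratio reflects a change in T-wave height rather than a change in overall signal strength.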
In fetal guinea pigs, progressive hypoxia results in changes in the ST segment and T-wave that are clearly seen and occur within a few minutes of the initiation of hypoxia.37 The fetus responds to moderate hypoxemia with a catecholamine surge, beta-adrenergic activation and myocardial glycogenolysis, all of which stimulate an increase in the T-wave amplitude. Repolarization of the myocardium (as reflected by the ST segment and T wave) is an energy-consuming process. When there is hypoxia, the energy balance within the myocytes becomes negative and the cells resort to beta-adrenoceptor–mediated anaerobic glycolysis. This pathway produces both lactate and potassium ions, and these ions affect the myocyte cell membrane potential and cause an elevation in the T-wave amplitude.38 When energy balance cannot be maintained by the compensatory mechanisms (vasodilatation and anaerobic metabolism), the endocardium becomes ischemic, altering the sequence of repolarization and direction of electrical flow. This imbalance between the endocardium and epicardium causes depression of the ST segment, with or without inversion of the T wave.39 In some fetuses, certain conditions prevent the usual myocardial response of ST segment elevation and T-wave amplitude increase, and there is the development of a biphasic shape to the ST segment, with progressive depression of this biphasic ST segment as the hypoxia worsens.
Because the perfusion pressure in the endocardium is always lowest when the mechanical strain is highest and because the response of the myocardium (beta-receptor activation and enhanced Frank-Starling curve) to an increased volume load is not instantaneous, there can be delays in the repolarization, which is manifested as ST changes. Thus, any stimulus that substantially alters the balance and performance characteristics of the myocardium may result in ST depression. Conditions that may be associated with ST depression and biphasic fetal ECG waveforms include prematurity, infection, maternal fever, myocardial dystrophy, cardiac malformations,40 chronic hypoxia, and the initial phase of acute hypoxia (when the fetus has not had enough time to develop the classic response). In addition, Yli and colleagues41 showed that ST depression occurs more frequently in fetuses of mothers with diabetes mellitus (possibly related to the higher prevalence of myocardial dystrophy in such babies).
An important concept that must be taken into account is that the STAN system is based on the ability to detect changes in the ST interval when the fetus mobilizes its compensatory mechanisms against hypoxia. A fetus that is already hypoxic, or one that has a significantly decreased capability to mount a response to hypoxia, may not show a change in the T-wave amplitude with further hypoxia. In this case, even progressive metabolic acidosis does not elicit a T-wave response. This fact highlights the importance of using STAN only in a fetus that is initially deemed to be capable of mounting a hypoxic response and underlines the adjunct nature of this system. In other words, the system can only be relied upon to act as designed when there has been accurate interpretation of the standard electronic fetal heart pattern. Therein lies a very real source of bias – the STAN technology depends upon another technology (electronic fetal heart rate monitoring and its interpretation) – a technology that in and of itself is subjective and open to interpretation.
The STAN system includes a fetal ECG electrode, a maternal skin reference electrode, and a microprocessor-based monitor that identifies the fetal ST segment and T-wave changes and compares them with the normal baseline values that have been individually established for that patient using the fetal heart rate monitoring strip as the basis for the normal reference indicator. This is an important distinction between this technology and others that have been used to evaluate fetal condition in labor. The STAN monitor requires that all fetuses monitored have had a normal heart rate pattern for at least 500 consecutive heartbeats, which equates to approximately 4 to 5 minutes of monitoring. This initial period of signal acquisition is used to generate the baseline parameters against which the algorithms will compare all subsequent changes. This stipulation means that, despite its capability of discerning ST segment and T-wave changes, the STAN technology depends completely on the initial subjective decision by the clinical team as to whether a fetal heart rate tracing is reassuring or not. This requirement has probably created a significant source of confounding and bias since a number of different classification systems have been used to categorize the tracings in the published studies.42 In addition, different providers in different medical systems have differing opinions as to what is reassuring or normal. In fact, most of the published trials have used different classification systems and have not used a standardized objective measure to classify the fetal heart rate tracing. Most published randomized, controlled trials,24–28 used inconsistent interpretation of the 1987 FIGO-CTG categories43 and were not constant in their definition of, or response to, ST-segment events. In addition, there was inconsistent use of fetal scalp sampling and intermittent monitoring of the fetal heart rate. 
These inconsistencies in study design, equipment, and clinical practice patterns between Europe and the U.S., underlined the need for the NICHD MFMU network trial. There was significant resistance by U.S. clinicians and medical systems to the introduction of a new technology that was not proven in an appropriately designed study using a U.S. population and practice patterns. In addition, the manufacturer had modified the color-coded fetal heart rate classification system from the four color classification used in Europe (Normal [green], Intermediary [yellow], Abnormal [orange], and Preterminal [red]), to a three color classification (green, yellow and red, based on the NICHD three category system)44,45 for their FDA application and for use in the U.S.46 To add to the confusion, most recently (after completion of the U.S. trial), the latest (2015) FIGO Classification system47 has introduced yet another classification that is different from that used in the STAN guidelines on which the prior European studies were based (Table 3).
The U.S. study by Belfort et al31 used the STAN guidelines that were approved by the FDA46 and which were promoted by and used in the manufacturer’s training program. It was felt that using the four color classification coding system promoted by the manufacturer in Europe would not have made any sense in the U.S. study given that the marketed product in the U.S. used a three color coding system. The subjective nature of the eligibility criteria for use of ST analysis technology was considered by many to be a major drawback, and the concern was raised that if the interpretation of STAN could be so heavily swayed by the fetal-heart interpretation, was the technology clinically reliable and efficacious in the U.S. setting? This skepticism was the basis for the U.S. trial which is discussed later in this article.
During labor, fetuses essentially fall into three categories:43,44 those who are tolerating labor without any issue and the monitoring strip is strongly predictive of normal acid-base balance (NICHD category I); those who are clearly in trouble, with a strip predictive of abnormal fetal acid-base balance with a need for urgent intervention of some sort, if not urgent delivery (category III); and those who are neither a category I nor category III (category II). The category II monitoring strip is regarded as indeterminate, not predictive of an acidotic fetus but without adequate evidence to classify the strip as category I or category III. The recommendation for the management of a category II strip is to evaluate, continue surveillance, and reevaluate, taking into account the entire clinical circumstances. Category II tracings are clearly the arena in which most of the more difficult decision making occurs. This was also the area where STAN has been reported to be potentially helpful in Europe, and where it required evaluation in the U.S. context. As mentioned above, the STAN system is based on a combination of FHR interpretation and adjunctive ST analysis. Because the entire premise of the system is based on an assurance that the fetus is not acidotic at the baseline, the ability to identify an abnormal FHR tracing is the rate limiting step of this technology. Competence in the assessment and management of FHR monitoring is an important component of any usage of the STAN system and this underscored one of the important reasons why STAN had to be studied in the U.S. context before any widespread use could be promulgated or endorsed. The system had never been tested using the NICHD FHR monitoring guidelines44 (as opposed to International Federation of Gynecology and Obstetrics (FIGO) and local European guidelines) and had not been extensively studied using the FDA-approved STAN guidelines.48
In the U.S., STAN uses an FHR classification that separates the FHR tracing into one of three zones: green, yellow, or red. Much of the European work done on STAN has used a four-color classification that includes an additional abnormal [orange] zone between the yellow and red zones. Heart rate tracings classified as green zone do not require any intervention and may be watched expectantly regardless of any ST change. Tracings classified as red zone need expeditious delivery regardless of ST changes. Yellow-zone tracings in the U.S. (and in Europe yellow- and orange-zone tracings) have a more complex management schema that relies on the presence or absence of ST changes and the degree of baseline T/QRS increase. The color zones in the FDA-approved STAN system are roughly analogous to the NICHD categories but with two significant differences. (i) The green zone allows the presence of variable decelerations as long as they are less than 60 seconds in duration and less than 60 beats per minute in depth. NICHD category I does not allow any variable decelerations, and as such, the FDA-approved STAN system may be slightly more lenient than the NICHD classification in terms of what is regarded as acceptable to continue to monitor without any concern for acidosis. (ii) In contrast, NICHD category II is less stringent than the FDA-approved STAN system in what it regards as indeterminate, in that NICHD category II tracings allow absent variability without recurrent decelerations, whereas the STAN system classifies such a strip as red zone needing expeditious delivery. The four-tier system used in Europe required a greater degree of abnormality in the intermediary [European yellow] and abnormal [European orange] zones than was required in the FDA-approved guidelines for intervention to be initiated. For example:
These differences could result in later intervention in the European context and may explain some of the differences seen in the European studies and the U.S. study.
The promise of the STAN system lies in helping to manage those patients who are classified in the yellow zone (in the U.S.) or in the yellow and orange zones elsewhere: ST changes in these zones are believed to indicate progressive hypoxia and the initiation of anaerobic metabolism, with its potential for metabolic acidosis.
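The three-zone logic described for the U.S. system can be summarized in a short sketch. It is a simplification for illustration only, not clinical guidance: the function names and return strings are hypothetical, and the yellow-zone branch is deliberately left coarse because the specific ST-event thresholds are not given in the text.

```python
from enum import Enum


class Zone(Enum):
    GREEN = "green"
    YELLOW = "yellow"
    RED = "red"


def green_zone_allows_variable_deceleration(duration_s, depth_bpm):
    """Green zone tolerates variable decelerations shorter than 60 seconds
    and shallower than 60 beats per minute, as stated in the text."""
    return duration_s < 60 and depth_bpm < 60


def suggested_action(zone, st_event):
    """Coarse summary of how the zone and ST changes interact."""
    if zone is Zone.GREEN:
        return "expectant management"   # regardless of any ST change
    if zone is Zone.RED:
        return "expeditious delivery"   # regardless of ST changes
    # Yellow zone: management depends on ST changes; thresholds are not modeled here.
    if st_event:
        return "assess ST event against guideline thresholds"
    return "continue monitoring"


for zone in Zone:
    print(zone.value, "->", suggested_action(zone, st_event=True))
```

The sketch makes the central point visible: ST information changes the suggested action only in the yellow zone.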
As mentioned above, the recent changes to the FIGO classification system47 have introduced yet another potential confounder to use of the STAN system. There are some important differences to the interpretation of what is accepted as normal between the two (FIGO and STAN) classifications. The main differences between the 2015 FIGO CTG classification and the STAN CTG classification are: (i) A baseline fetal heart rate of 150–160 bpm is classified as normal by FIGO but as intermediary (yellow zone) by STAN. (ii) Variability 5–25 bpm: accelerations are not needed for classifying a pattern as a normal CTG by FIGO, but are required by STAN, and (iii) Absent variability (silent pattern) and pre-terminal patterns are not classified by FIGO, but constitute a fourth CTG class (pre-terminal CTG) in the STAN CTG classification system.
Approved training, certification, and credentialing (as mandated by the FDA) are needed before use of the STAN system. In addition, as has been noted in most publications, there is a definite learning curve to the use of this system and indiscriminate use without adequate supervision is not advised. The system is FDA approved only for use in singleton pregnancies in which the fetus is more than 36 0/7 weeks of gestation, the membranes are ruptured, and the mother is in the first stage of labor without active or involuntary pushing. There should also be no contraindication to a fetal scalp electrode analysis or STAN.
The Plymouth Trial24 was the first large randomized controlled trial in which intervention rates and neonatal outcomes in fetuses monitored with cardiotocography (CTG) alone were compared with those in fetuses monitored using the combination of ST waveform analysis plus CTG. In this study of 2434 patients, there was a 43% reduction in operative interventions for fetal distress in the ST + CTG group. There was also a trend toward fewer cases of cord artery metabolic acidosis and low Apgar score. The second large-scale randomized controlled trial to compare outcomes following use of CTG alone and CTG + ST waveform analysis was the rather controversial Swedish randomized controlled trial.25 The study included 4966 term fetuses in three large labor and delivery wards in Sweden. After exclusion of inadequate recordings and fetuses with malformations (n = 574), the findings showed a 61% decrease in the number of fetuses born with umbilical cord arterial metabolic acidosis in the CTG + ST group. There was also a 28% decrease in operative interventions because of fetal distress in the STAN-monitored group. These findings were consistent with the results of the Plymouth trial.24 It was concluded that intrapartum monitoring with CTG combined with automatic ST waveform analysis increased the ability to identify fetal hypoxia and to intervene more appropriately, resulting in an improved perinatal outcome. There were a number of problems with this trial, and concerns were raised regarding patient selection, exclusion, and statistical analysis.49 Errors in the analysis were subsequently confirmed by a committee of the Swedish Research Council.50 A revised modified intention-to-treat analysis confirmed that, after correction for errors in data collection, there was still a 52% reduction in metabolic acidosis, but the P value of the difference was now 0.038.51
The Finnish randomized controlled study aimed to examine whether STAN could reduce the rate of neonatal acidemia and the rate of operative intervention during labor compared with fetal heart rate (FHR) monitoring alone.26 A total of 1483 women in active labor with a singleton term fetus in cephalic presentation were randomly assigned to STAN + FHR monitoring or to FHR monitoring alone. Fetal scalp sampling was optional in both groups. The main outcome measures were neonatal acidemia (umbilical artery pH <7.10), neonatal metabolic acidosis (umbilical artery pH <7.05 and base excess <–12 mmol/L) and operative interventions: cesarean delivery rate, vacuum delivery rate, and fetal scalp sampling rate. There were no statistically significant differences between the STAN group and FHR monitoring group in the incidence of neonatal acidemia (5.8% vs 4.7%) or metabolic acidosis (1.7% vs 0.7%). The cesarean delivery rate (6.4% vs 4.7%) and the vacuum delivery rate (9.5% vs 10.7%) were also similar in the STAN and FHR monitoring groups. The incidence of fetal scalp sampling was lower (P<.001) in the STAN group (7.0%) than in the FHR monitoring group (15.6%). The investigators concluded that intrapartum fetal monitoring with STAN did not improve the neonatal outcome or decrease the cesarean rate. However, the need for fetal blood sampling (FBS) during labor was lower in the STAN group.
The French randomized controlled trial was conducted in two French maternity centers.27 Its objective was to assess whether knowledge of ST-segment analysis was associated with a reduction in operative deliveries for non-reassuring fetal status (NRFS) or with a need for at least one scalp pH during labor. A total of 799 women in labor at 36 weeks or more, with a single fetus with cephalic presentation, and either abnormal cardiotocographic tracing or thick meconium-stained amniotic fluid were randomized to either monitoring with CTG alone (n = 400) or CTG + STAN (n = 399). Scalp pH sampling was an option in both groups. There was no difference between the groups in the primary outcome of operative delivery for NRFS. The proportion of patients who had at least one scalp pH measurement during labor was substantially lower in the CTG + STAN group as compared with CTG alone: 27% compared with 62% (relative risk, 0.44; 95% confidence interval [CI], 0.36–0.52). There was no significant difference in a composite abnormal neonatal outcome between the groups (pH <7.05 or base deficit in extracellular fluid [BDecf] >12 mmol/L or 5-minute Apgar score <7 or neonatal intensive care unit admission or convulsions or neonatal death).
The Dutch randomized controlled trial was a multicenter randomized pragmatic trial in three academic and six nonacademic teaching hospitals in The Netherlands.28 The objective of the trial was to estimate the effectiveness of intrapartum fetal monitoring by CTG plus ST analysis using a strict protocol for performance of fetal blood sampling. Performance of fetal blood sampling was restricted to three situations: (1) start of STAN registration with an intermediary or abnormal CTG tracing; (2) abnormal CTG tracing for more than 60 minutes during the first stage without ST events; and (3) poor ECG signal quality in the presence of an intermediary or abnormal CTG tracing. The reason for this protocol was to control performance of scalp pH, a factor that may have affected the results of prior trials in which scalp pH use was left up to the discretion of the provider. The primary outcome measure was the incidence of metabolic acidosis, defined as an umbilical cord artery blood pH below 7.05 and a base deficit calculated in the extracellular fluid compartment (BDecf) above 12 mmol/L according to the Siggaard-Andersen acid-base chart algorithm. Although base deficit in blood is the value most commonly reported by umbilical cord blood analyzers, and the one often used in clinical practice, the investigators in this trial thought that BDecf better reflects the true metabolic component of acidosis. There were 2832 women randomly assigned to monitoring by CTG with ST analysis (index) and 2849 to CTG only (control). The fetal blood sampling rate was 10.6% in the index group compared with 20.4% in the control group (relative risk 0.52; 95% CI 0.46–0.59). The primary outcome occurred in 0.7% of the index group compared with 1.1% of the control group (relative risk 0.70; 95% CI 0.38–1.28; number needed to treat 252). Using metabolic acidosis calculated in blood, these rates were 1.6% and 2.6%, respectively (relative risk 0.63; 95% CI 0.42–0.94; number needed to treat 100).
The number of operative deliveries, low Apgar scores, neonatal admissions, and newborns with hypoxic-ischemic encephalopathy was comparable in both groups. Although this study did not show a difference in the primary outcome, it did show a lower rate of metabolic acidosis defined using base deficit in blood, the most common method used in the United States. And this benefit was achieved despite performing 48% less fetal blood sampling when ST segment analysis was used. It should also be noted that the actual rates of metabolic acidosis using base deficit in extracellular fluid (1.1% in the control and 0.7% in the index group) were much lower than the 3.5% assumed in the sample size calculations.
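The numbers needed to treat quoted for the Dutch trial follow from the absolute difference in metabolic acidosis rates between the control and index groups. The short sketch below is illustrative only: the function name is hypothetical, and it uses the rounded percentages quoted in the text rather than the trial's underlying counts, which are not given here.

```python
def number_needed_to_treat(control_rate: float, index_rate: float) -> float:
    """Return 1 / absolute risk reduction; both rates are proportions (0 to 1)."""
    absolute_risk_reduction = control_rate - index_rate
    if absolute_risk_reduction <= 0:
        raise ValueError("No absolute risk reduction; NNT is undefined")
    return 1.0 / absolute_risk_reduction


# Rounded rates as quoted in the text (assumption: exact counts are not available here).
nnt_bdecf = number_needed_to_treat(0.011, 0.007)  # acidosis defined with BDecf
nnt_blood = number_needed_to_treat(0.026, 0.016)  # acidosis defined with base deficit in blood

print(round(nnt_bdecf), round(nnt_blood))
```

With the rounded rates this gives approximately 250 and 100. The small gap between 250 and the reported 252 for the BDecf definition is presumably due to rounding of the published percentages.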
All of the trials described above raised the question of how ST segment analysis would perform when fetal blood sampling was not performed. It became clear to clinicians in the U.S. that whereas data seemed encouraging in the European system, the utility of this new method of fetal monitoring had yet to be confirmed in a U.S. population. There were two main reasons why another randomized controlled trial (RCT) was needed before widespread general use of STAN could be endorsed in the United States. First, STAN had never been evaluated with an RCT under the specific conditions of obstetric health care delivery in this country. STAN is not simply a stand-alone technology but rather a system of management. How the U.S. obstetric community would adapt to this technology and its associated protocol was not something that could be assumed to mirror the experience in Europe. The U.S. model for obstetric care delivery is different from that in European countries and the direct extrapolation of data from these countries to a U.S. population is not appropriate. Second, although metabolic acidosis and some aspects of neonatal outcome had been studied, no RCT had specifically addressed a composite neonatal outcome as the primary outcome of interest. Although a decrease in operative delivery is a very important outcome in obstetric practice, the reduction in adverse neonatal outcomes is the ultimate goal for a monitoring system designed to reduce fetal and neonatal acidosis.
This then was the rationale for the U.S. RCT that was published in 2015.31 The study, which consisted of a pilot phase and a randomized trial, was conducted at 16 university-based clinical centers of the NICHD MFMU Network, each comprising one to five delivery hospitals (26 total). The level of attention paid to ensuring that participating providers were adequately trained and experienced prior to enrolling patients in the RCT was unprecedented. Participating care providers and research personnel were trained and certified in the correct use of the fetal ECG ST analysis system to a level exceeding FDA requirements, in a program in which the manufacturer participated and had oversight to ensure that correct training and usage of its equipment were in place. Each hospital participated in a pilot phase consisting of enrollment and management of at least 50 patients monitored with fetal ECG ST analysis, with central review of labor management decisions and re-training as needed, before being approved to start the trial. Once approved to begin the RCT, each site enrolled women with a singleton gestation attempting vaginal delivery at more than 36 weeks’ gestation who had cervical dilation between 2 and 7 cm (inclusive). Patients were randomly assigned to “open” or “masked” fetal ST analysis monitoring. The masked system functioned as a normal fetal heart rate monitor. The open system displayed additional information for use with uncertain (yellow zone) fetal heart rate patterns. The primary outcome was a composite of fetal death, neonatal death, Apgar score ≤3 at 5 minutes, neonatal seizure, cord artery pH ≤7.05 with base deficit ≥12 mmol/L, intubation for ventilation at delivery, or neonatal encephalopathy.
There were 11,108 patients randomized (5,532 open; 5,576 masked). The primary outcome occurred in 52 patients (0.9%) in the open arm and 40 (0.7%) in the masked arm (relative risk 1.31; 95% CI 0.87 to 1.98; P=0.20). Among the individual components of the primary outcome, only the frequency of 5-minute Apgar score ≤3 differed significantly between the open and masked arms (0.31% versus 0.11%, respectively; P=0.02). There were no significant differences between groups in cesarean delivery (16.9% versus 16.2%; P=0.31) or any operative delivery (22.8% versus 22.0%; P=0.31). Adverse events were rare and occurred with similar frequency in the two groups.
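The relative risk and confidence interval for the primary outcome can be reproduced from the event counts and arm sizes given above. The sketch below uses the standard large-sample (Wald) interval on the log scale. This is an assumption about the method, since the text does not state how the trial computed its interval, and the function name is hypothetical.

```python
import math


def relative_risk_ci(events_a: int, n_a: int, events_b: int, n_b: int, z: float = 1.96):
    """Relative risk of arm A versus arm B with a Wald interval on the log scale."""
    rr = (events_a / n_a) / (events_b / n_b)
    # Standard error of log(RR) for two independent binomial proportions.
    se_log_rr = math.sqrt(1 / events_a - 1 / n_a + 1 / events_b - 1 / n_b)
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper


# Primary outcome counts from the text: 52 of 5,532 (open) versus 40 of 5,576 (masked).
rr, lower, upper = relative_risk_ci(52, 5532, 40, 5576)
print(f"RR {rr:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

Under these assumptions the calculation prints RR 1.31 (95% CI 0.87 to 1.98), which agrees with the reported values. Because the interval includes 1, the data are compatible with no difference between the open and masked arms.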
The U.S. trial finding of no improvement in neonatal outcomes or reduction in cesarean delivery rates is consistent with the results of an individual patient data meta-analysis of ST analysis trials.52 However, that meta-analysis showed a reduction in the frequency of fetal blood sampling (not routine in the U.S.) and operative vaginal delivery in women who had electronic fetal monitoring and adjunctive ST analysis, compared with those who had electronic fetal monitoring alone.41
Concerns about the failure of STAN to predict cases of intrapartum metabolic acidosis have been raised over the years.53 Three cases of intrapartum metabolic acidosis were documented by the Dutch group.54 A paper from St. George’s Hospital in London reported on 14 cases of neonatal encephalopathy in the first 1052 patients who were monitored using STAN and stated that in only 7 of these 14 had there been a significant ST event.55 It was subsequently established that STAN monitoring was being started when the fetal heart rate tracing was already abnormal, a finding that reinforces the importance of only starting STAN monitoring after a reassuring pattern has been established (and maintained until the baseline ST analysis is completed). In the same study STAN also failed to identify seven (30%) of 23 cases of metabolic acidosis. Another paper published in 2008 reported on approximately 5000 patients monitored using STAN, and showed that ST events coincided with CTG abnormalities in only two of three cases (67%) with severe metabolic acidemia and in 20 of 48 cases (42%) with moderate metabolic acidemia.56
Seven randomized trials were included in the most recent Cochrane review (September 2015) on the topic.57 The objective of this review was to compare the effects of analysis of fetal electrocardiogram (ECG) waveforms during labor with alternative methods of fetal monitoring. Seven trials (27,403 women) were included: six trials of ST waveform analysis (26,446 women) and one trial of PR interval analysis (957 women).58 The reviewers felt that the trials were generally at low risk of bias for most domains and the quality of evidence for ST waveform analysis trials was graded moderate to high. The major finding was that in comparison with continuous electronic fetal heart rate monitoring alone, the use of adjunctive ST waveform analysis made no obvious difference to primary outcomes: births by cesarean (risk ratio (RR) 1.02, 95% confidence interval (CI) 0.96 to 1.08; six trials, 26,446 women; high quality evidence); the number of babies with severe metabolic acidosis at birth (cord arterial pH less than 7.05 and base deficit greater than 12 mmol/L) (average RR 0.72, 95% CI 0.43 to 1.20; six trials, 25,682 babies; moderate quality evidence); or babies with neonatal encephalopathy (RR 0.61, 95% CI 0.30 to 1.22; six trials, 26,410 babies; high quality evidence). There were, however, on average fewer fetal scalp samples taken during labor (average RR 0.61, 95% CI 0.41 to 0.91; four trials, 9671 babies; high quality evidence) although the findings were heterogeneous and there were no data from the largest trial (from the U.S.). There were marginally fewer operative vaginal births (RR 0.92, 95% CI 0.86 to 0.99; six trials, 26,446 women); but no obvious difference in the number of babies with low Apgar scores at five minutes or babies requiring neonatal intubation, or babies requiring admission to the special care unit (RR 0.96, 95% CI 0.89 to 1.04, six trials, 26,410 babies; high quality evidence). 
The authors concluded that the modest benefits of fewer fetal scalp sampling procedures during labor (in settings in which this procedure is performed) and fewer instrumental vaginal births have to be considered against the disadvantages of needing to use an internal scalp electrode, after membrane rupture, for ECG waveform recordings. They found little strong evidence that ST waveform analysis had an effect on the primary outcome measures in their systematic review.
As the rate of cesarean delivery remains high in the U.S., there is a need for better methods to evaluate fetal well-being intrapartum. The MFMU Network has evaluated two of the most promising methods, without finding a benefit. Although the results were negative, the studies prevented the adoption of these methods without appropriate evidence, which is what happened with the original introduction of fetal heart rate monitoring.
Without the unique setup of the MFMU Network, these studies would likely never have been performed in the U.S. Given that both of these technologies were already approved for use in the U.S., they would likely have gone the way of fetal heart rate monitoring, entering routine practice without adequate evidence, had the MFMU Network trials not been performed. Our experience in these two situations also highlights the importance of performing trials that are adequately powered to address the appropriate outcomes and that are generalizable to the U.S. population and management. These two case studies clearly illustrate that hurried adoption of new technologies based on surrogate outcomes or on studies in clinical settings other than the U.S. may not be a good approach to health care. They also highlight the importance of having infrastructures such as the MFMU Network, which allow such large, complex and rigorous trials to be conducted and the impact of new technologies on maternal and child health to be rigorously measured.
The project described was supported by grants from the Eunice Kennedy Shriver National Institute of Child Health and Human Development (NICHD) [HD21410, HD27860, HD27869, HD27915, HD27917, HD34116, HD34136, HD34208, HD53097, HD40545, HD40560, HD27869, HD40485, HD40500, HD40512, HD40544, M01 RR00080 (NCRR); HD68282, HD68268, HD27917, HD36801]. Comments and views of the authors do not necessarily represent views of the NICHD.