To evaluate Michigan newborn screening for congenital hypothyroidism (CH) protocol changes.
This population-based study includes infants born and screened in Michigan (January 1, 1994–June 30, 2010). Screening performance is compared across 4 periods defined by the dried blood spot testing method: (1) thyroxine (T4) with backup thyrotropin, (2) tandem T4 and thyrotropin, (3) primary thyrotropin testing without serial testing, and (4) primary thyrotropin plus serial testing for births weighing <1800 g. Logistic regression is used to test for differences across periods.
Thyrotropin testing exhibited greater specificity overall and greater likelihood of detection with serial testing relative to primary T4 testing. Tandem T4 and thyrotropin testing appeared more sensitive relative to other protocols, yet it produced significantly more false-positives, and detection may have been affected by overdiagnosis and misclassification. Central CH was no longer detected once T4 testing ceased.
Primary thyrotropin plus serial testing for infants at risk for later rising thyrotropin outperformed other newborn screening strategies for classic CH, although 2 false-negatives occurred among normal birth weight infants admitted to the NICU during this testing period. Tandem T4 and thyrotropin screening outperformed other strategies for detection of both classic and central CH combined, although it is associated with increased operating costs. Additional research is necessary to weigh the benefits of increased sensitivity against additional program operating costs.
Significant variation in congenital hypothyroidism screening operations/performance has been observed in the United States. The origin of this variation remains unknown, in part because of a lack of evaluation. Accordingly, debates persist about optimal screening operations including laboratory testing methods.
Four distinct screening protocols applied to Michigan resident infants are compared in detecting congenital hypothyroidism overall and specific to cases characterized by high initial thyrotropin concentrations thought to have a more severe form of the disease.
Newborn screening (NBS) for congenital hypothyroidism (CH), a clinically defined group of thyroid disorders observed at birth, began in the mid-1970s after the development of a radioimmunoassay capable of measuring thyroxine (T4) in dried blood spotted on filter paper.1–5 Based on findings from the first million infants screened, the NBS Committee of the American Thyroid Association recommended broad establishment and expansion of NBS programs for CH in 1977.6 By 1992, it was estimated that 50 million infants were screened annually for CH worldwide.7 NBS programs around the world initially reported detection rates ranging from 1:3000 to 1:4000 infants screened and a typical 2:1 ratio of female to male cases.8–11 More recently, US NBS programs have reported an increase in the birth prevalence of CH from 1:3985 in 1987 to 1:2274 in 2002 not fully explained by changes in laboratory methods or potential misclassification of transient disease; significant interstate variation has also been observed.12–16 The origin of this variation remains largely unknown, perhaps because there has been a lack of emphasis on evaluating screening system components.17 Accordingly, debates persist about optimal CH screening operations, particularly dried blood spot testing methods.
Previous evidence of the comparative effectiveness of dried blood spot testing protocols for CH NBS is heterogeneous. Greater sensitivity and specificity have been reported among primary thyrotropin relative to T4 testing programs and vice versa.15,18–24 Several studies estimated that 4% to 10% of cases missed by primary T4 testing are appropriately detected by primary thyrotropin testing20–23; others conclude that primary thyrotropin testing fails to detect cases of central CH and those exhibiting a later rise in thyrotropin.18,19 Serial thyrotropin testing protocols that rescreen selected infants generally during the first month of life have emerged to address later rising thyrotropin25–27; however, detection of central CH remains an issue. Estimates of the birth prevalence of central CH range widely (~1:20000–1:125000 live births), and some believe it is adequately detected clinically amid diagnosis of concomitant pituitary hormone deficiencies. Others note that although it is true that most (~75%) cases of central hypothyroidism also have pituitary hormone deficiencies,28 diagnosis of either condition can often be delayed beyond 3 months of age and may result in severe hypoglycemia, neonatal hepatitis, or death.29–31
Heterogeneity of previous evidence is difficult to interpret in the context of interprogram variation in screening protocols, performance, and population characteristics.32 In an attempt to investigate the impact of changes in NBS for CH among a fixed population, we assessed NBS for CH performance metrics in Michigan during 4 successive periods in which different dried blood spot testing protocols were used. This study adds to previous literature by (1) comparing the effectiveness of 4 distinct screening protocols in a reasonably stable and homogeneous population of infants, (2) reporting findings generated in a program that collects virtually all initial bloodspot specimens between 24 and 36 hours of life, and (3) comparing NBS protocols based on their ability to detect, in addition to overall CH, cases characterized by high initial thyrotropin concentrations (≥100 uIU/mL), which are thought to represent a more severe form of CH.
This population-based retrospective cohort study was approved by the Michigan Department of Community Health Institutional Review Board and includes Michigan resident infants born and screened in Michigan from January 1, 1994, through June 30, 2010. The primary exposure of interest is the method of dried blood spot testing defined based on infant date of birth (DOB) as follows: (1) T4 backup thyrotropin testing (DOB: 1/1/1994–12/31/1997); (2) tandem T4 and thyrotropin testing for all infants, no serial testing (DOB: 1/1/1998–9/30/2003); (3) primary thyrotropin testing, no serial testing (DOB: 10/1/2003–2/28/2007); and (4) primary thyrotropin testing, serial testing for infants weighing <1800 g at birth (DOB: 3/1/2007–6/30/2010). T4 backup thyrotropin testing involves making referrals for confirmatory testing based on thyrotropin determinations obtained from dried blood spots only in newborns whose T4 concentrations are below the 10th centile. Tandem T4 and thyrotropin testing involves making referrals for confirmatory testing based on either low T4 or elevated thyrotropin concentration measured in newborn dried blood. Primary thyrotropin testing involves making referrals for confirmatory testing based only on thyrotropin concentration; the addition of serial testing involves rescreening among infants at elevated risk of later rising thyrotropin.
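The exposure definition above amounts to a lookup on date of birth. A minimal sketch of that assignment follows; the date boundaries and the 1800 g threshold come from the text, whereas the function names and period labels are hypothetical and are not drawn from the study's own analysis code.

```python
from datetime import date

# Period boundaries as stated in the Methods; the label strings are illustrative.
PERIODS = [
    (date(1994, 1, 1), date(1997, 12, 31), "T4 backup thyrotropin"),
    (date(1998, 1, 1), date(2003, 9, 30), "tandem T4 and thyrotropin"),
    (date(2003, 10, 1), date(2007, 2, 28), "primary thyrotropin, no serial testing"),
    (date(2007, 3, 1), date(2010, 6, 30), "primary thyrotropin plus serial testing"),
]

SERIAL_LABEL = "primary thyrotropin plus serial testing"
SERIAL_WEIGHT_CUTOFF_G = 1800  # serial testing applied to births weighing <1800 g


def screening_period(dob):
    """Return the protocol period label for a date of birth, or None if out of range."""
    for start, end, label in PERIODS:
        if start <= dob <= end:
            return label
    return None


def eligible_for_serial_testing(dob, birth_weight_g):
    """True only for infants born in period 4 who weighed <1800 g at birth."""
    return screening_period(dob) == SERIAL_LABEL and birth_weight_g < SERIAL_WEIGHT_CUTOFF_G
```

Because the periods are contiguous and non-overlapping, each date of birth in the study window maps to exactly one protocol.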
Confirmatory testing is usually based on serum tests of venipuncture blood samples combined with some measure of binding proteins (ie, T3 resin uptake) used to differentiate free (active) from total T4.24,64 Blood samples for confirmatory testing are ideally obtained at ~2 to 3 weeks of life, when the upper range of thyrotropin falls to ~10 mU/L. Reference ranges for free T4, total T4, and thyrotropin concentrations measured in serum at 2 to 4 weeks of life are ~10 to 26 pmol/L, 90 to 206 nmol/L, and <10 mU/L, respectively.22 Infants having ≥2 serum thyrotropin concentrations >20 mU/L are expected to have permanent primary CH.13 If a defect in thyroid hormone synthesis is suspected, perchlorate washout testing is sometimes performed to test the ability of the thyroid to transform iodine into organically bound iodine.65 Other tests including scintigraphy and ultrasound are also useful during the process of diagnosing CH. Table 1 reports cutoff values used in referring infants for confirmatory testing over time. Outcomes of interest include screening performance metrics: detection rate, false-positive rate (FPR), positive predictive value (PPV), sensitivity, and specificity.
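The performance metrics named above can be illustrated from the 4 cells of a screening outcome table. The sketch below uses the conventional epidemiologic definitions, which are an assumption here: the exact denominators used in Table 3 are not reproduced in this text, and the function name and the counts in the usage note are hypothetical.

```python
def screening_metrics(tp, fp, fn, tn):
    """Compute screening performance metrics from outcome counts.

    tp: screen-positive infants diagnosed with CH
    fp: screen-positive infants not diagnosed with CH
    fn: screen-negative infants later identified as CH cases
    tn: screen-negative infants without CH
    Conventional definitions are assumed; the study's denominators may differ.
    """
    screened = tp + fp + fn + tn
    return {
        "detection_rate": tp / screened,          # cases detected per infant screened
        "births_per_case_detected": screened / tp,  # the N in a "1:N" detection rate
        "false_positive_rate": fp / (fp + tn),
        "ppv": tp / (tp + fp),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }
```

For example, `screening_metrics(90, 910, 10, 98990)` describes a made-up cohort of 100000 infants. Under these definitions specificity and the FPR always sum to 1, which is why a protocol with a higher FPR necessarily shows lower specificity.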
Demographic and perinatal information collected on the NBS, laboratory screening results, and medical management data were used to identify and characterize infants screened from January 1, 1994 through June 30, 2010. Aside from rescreens because of early specimen collection, infants identified by newborn dried blood spot screen for additional testing are considered screen positive; those who are classified as CH and are treated at the conclusion of confirmatory testing are considered diagnosed cases. Reports actively and passively ascertained from pediatric endocrinologists by the NBS Follow-up Program are used in this study to identify false-negative screening results.
Descriptive and analytical techniques include tabulation and trending of newborn characteristics by NBS outcomes of interest during 4 exposure periods. Logistic regression analysis is used to investigate whether the overall likelihood of detection, likelihood of severe CH detection, and likelihood of false-positive determination changed significantly across periods after adjusting for differences in the distribution of selected newborn demographic and perinatal characteristics. Cases are categorized as severe CH if their initial thyrotropin concentration reached or exceeded 100 uIU/mL based on the work of Mitchell et al.33 Adjusted models include covariates that are both significantly associated with the dependent variable (overall detection or severe case detection) and varied significantly during the 4 exposure periods. We were unable to assess area under the receiver operating characteristic curve associated with each protocol because of the lack of analyte concentration data among normal screens during T4 testing periods.
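To make the analytic quantities concrete, the sketch below flags severe CH using the ≥100 uIU/mL threshold stated above and computes a crude odds ratio with a Wald 95% confidence interval from a 2×2 table. For a single binary exposure, this is the same estimate an unadjusted logistic regression would return. The function names and the counts in the usage note are hypothetical; the adjusted models described in the text would additionally require a regression library and the covariate data, neither of which is shown here.

```python
import math

SEVERE_TSH_THRESHOLD = 100.0  # uIU/mL, initial dried blood spot thyrotropin


def is_severe_ch(initial_thyrotropin):
    """Severe CH: initial thyrotropin reached or exceeded 100 uIU/mL."""
    return initial_thyrotropin >= SEVERE_TSH_THRESHOLD


def crude_odds_ratio(cases_exposed, noncases_exposed, cases_ref, noncases_ref):
    """Crude odds ratio and Wald 95% CI comparing an exposure period with a reference period."""
    odds_ratio = (cases_exposed * noncases_ref) / (noncases_exposed * cases_ref)
    # Standard error of log(OR) from the four cell counts.
    se_log_or = math.sqrt(
        1 / cases_exposed + 1 / noncases_exposed + 1 / cases_ref + 1 / noncases_ref
    )
    lower = math.exp(math.log(odds_ratio) - 1.96 * se_log_or)
    upper = math.exp(math.log(odds_ratio) + 1.96 * se_log_or)
    return odds_ratio, lower, upper


def percent_change_in_odds(odds_ratio):
    """Express an odds ratio as a percent change in odds relative to the reference."""
    return (odds_ratio - 1.0) * 100.0
```

As a usage example with made-up counts, `crude_odds_ratio(20, 80, 10, 90)` gives an odds ratio of 2.25, which `percent_change_in_odds` expresses as a 125% increase in the odds relative to the reference period.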
More than 2 million infants are included; Table 2 reports the distributions of demographic and perinatal characteristics across the 4 exposure periods. Population characteristics did not meaningfully differ over time, although, due to the large sample size, observed differences were statistically significant.
Table 3 reports screening performance metrics by dried blood spot testing protocol. During the T4 backup thyrotropin testing period, the detection rate, positive predictive value, and specificity were each lower than those observed during the primary thyrotropin testing periods, both with and without serial testing. Conversely, the FPR was more than twofold greater during the primary T4 relative to primary thyrotropin testing periods. The greatest rate of overall detection was observed during the tandem T4 and thyrotropin testing period (1:1271); however, the FPR (4.45%) was far greater than during other periods of observation. Accordingly, the PPV and specificity were significantly lower during the tandem T4 and thyrotropin testing period than during the other periods observed in this study. Of note, the expected gender dimorphism of more female than male cases was reversed only during the tandem T4 and thyrotropin testing period, suggesting potential misclassification; otherwise, more female than male infants were diagnosed as expected.
Primary thyrotropin testing protocols were more specific than primary T4 testing protocols, yielding greater PPVs; however, the detection rate observed during primary thyrotropin testing periods was less than that observed during the tandem T4 and thyrotropin testing period. Overall, primary thyrotropin testing with serial testing for infants born weighing <1800 g yielded fewer false-positive results, a greater PPV, and greater overall detection than either primary thyrotropin testing without serial testing or primary T4 backup thyrotropin testing protocols. However, the 2 false-negative results observed during this study occurred during the primary thyrotropin plus serial testing period among infants admitted to the NICU who had later rising thyrotropin but were not included in the serial testing protocol because of their normal birth weights. A single case of central hypothyroidism was detected during the T4 backup thyrotropin testing period (1:542945), 6 such cases were detected during the tandem T4 and thyrotropin testing period (1:125787), and none were detected during either primary thyrotropin testing period.
Although the overall rate of CH detection and subsequent screening performance metrics varied considerably by screening protocol period, the birth prevalence of severe CH, characterized by an initial thyrotropin concentration ≥100 uIU/mL, was far more stable (Fig 1). The number of severe CH cases detected per 100000 live births screened increased after the introduction of thyrotropin into the dried blood spot testing protocol relative to the T4 backup thyrotropin testing period and remained relatively stable across the tandem T4 and thyrotropin and primary thyrotropin testing periods (with and without serial testing).
Overall, after adjusting for potential confounding factors (race, gender, twin status, birth weight), tandem T4 and thyrotropin testing and primary thyrotropin plus serial testing for infants born weighing <1800 g are associated with an 89% and a 58% increase in the odds of detection, respectively, compared with primary T4 backup thyrotropin testing. Primary thyrotropin testing was more specific than primary T4 testing, yet was associated with a greater likelihood of detection only after introduction of serial testing.
Although tandem T4 and thyrotropin testing was associated with a near twofold increase in overall detection compared with primary T4 testing, it was also associated with a near threefold increase in the rate of false-positives, as shown in Table 4. Alternatively, the FPR was significantly reduced during both primary thyrotropin testing periods relative to the primary T4 backup thyrotropin and tandem testing periods in both crude (unadjusted) and adjusted models. To compare the trade-offs of tandem thyrotropin and T4 testing versus primary thyrotropin plus serial testing for infants born weighing <1800 g, we applied the detection rates and FPRs to a hypothetical birth population of 125000 infants and estimated that an additional 297 false-positive determinations would be incurred for each additional case detected if Michigan were to switch from primary thyrotropin plus serial testing back to tandem T4 and thyrotropin testing for all births.
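The trade-off estimate above follows from simple arithmetic on a hypothetical cohort. The sketch below shows the form of that calculation under a simplifying assumption that both the detection rate and the FPR are applied to all births. The tandem inputs (1:1271 and 4.45%) appear in the text, but the comparison protocol's rates are reported only in Table 3, so the values used for it here are placeholders and the result will not reproduce the published figure of 297.

```python
def extra_false_positives_per_extra_case(births, det_rate_a, fpr_a, det_rate_b, fpr_b):
    """Additional false-positives incurred per additional case detected when
    switching from protocol B to protocol A in a cohort of `births` infants.

    Rates are proportions (e.g., 1:1271 is 1 / 1271). Simplifying assumption:
    both rates are applied to the full birth cohort.
    """
    extra_false_positives = births * (fpr_a - fpr_b)
    extra_cases = births * (det_rate_a - det_rate_b)
    if extra_cases <= 0:
        raise ValueError("protocol A detects no additional cases relative to protocol B")
    return extra_false_positives / extra_cases


# Tandem T4 and thyrotropin testing: rates reported in the text.
tandem_detection = 1 / 1271
tandem_fpr = 0.0445

# Primary thyrotropin plus serial testing: PLACEHOLDER values, not study results.
serial_detection = 1 / 1500
serial_fpr = 0.01

ratio = extra_false_positives_per_extra_case(
    125000, tandem_detection, tandem_fpr, serial_detection, serial_fpr
)
print(f"Additional false-positives per additional case (placeholder inputs): {ratio:.0f}")
```

Because the cohort size multiplies both the numerator and the denominator, the ratio does not depend on the choice of 125000 births; the cohort size matters only when reporting absolute numbers of cases and false-positives.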
As indicated in Table 5, the crude likelihood of severe CH detection was greatest during the thyrotropin plus serial testing period and was significantly elevated in each screening protocol period relative to the T4 backup thyrotropin testing strategy. After adjustment for race and gender distributions, the difference in likelihood of severe CH detection between T4 backup thyrotropin and primary thyrotropin without serial testing protocols was not statistically significant. Severe CH cases were 38% and 35% more likely to be detected during tandem T4 and thyrotropin and primary thyrotropin plus serial testing periods respectively relative to the T4 backup thyrotropin testing period after adjusting for race and gender distributions.
Although the overall detection rate was greatest during the tandem T4 and thyrotropin testing period in this study, this finding is likely affected by misclassification and overdiagnosis based on the elevated birth prevalence, reversal of the expected gender dimorphism, and stable rate of severe CH observed in this period relative to primary thyrotropin testing periods. Furthermore, a surprising 72% of cases detected by primary thyrotropin exhibited normal T4 concentrations, far greater than the expected 4% to 10%,20–23 suggesting that cases of hyperthyrotropinemia may have been classified and treated as CH during this period. Primary thyrotropin testing plus serial testing among infants born <1800 g yielded fewer false-positives and accordingly had lesser operating costs than either tandem T4 and thyrotropin or primary T4 backup thyrotropin protocols. Primary thyrotropin plus serial testing was also associated with a greater likelihood of detection relative to primary T4 testing and was equally able to detect severe CH relative to the tandem testing approach, although no cases of central CH were detected during this period.
On average, 1 case of central CH was detected per year in Michigan before removing T4 from the NBS protocol; none were detected after. It is possible that ≥1 case of central CH was missed by primary thyrotropin testing strategies and was either not identified because of mortality or migration before clinical detection or not reported because of our reliance on passive surveillance of false-negatives. It remains unclear how the additional operating costs associated with tandem thyrotropin and T4 testing compare with the benefit of early central CH detection, and additional research is necessary to quantify this benefit.
Additional investigation is also necessary to determine whether there is benefit to increased detection of marginal cases, including hyperthyrotropinemia/subclinical CH and hypothyroxinemia, particularly in the context of significant increases in the number of diagnoses in the United States over the past 20 years. Currently, little evidence exists about the cognitive outcomes of permanent or transient forms of hyperthyrotropinemia and subclinical hypothyroidism.34–37 Two small studies reported an average decrement of 7 to 8 IQ points among children having hyperthyrotropinemia compared with euthyroid children;38,39 another reported subclinical hypothyroidism after age 5 years among such cases.40 Alternatively, other small studies have reported normal mental and physical development among untreated hyperthyrotropinemia and subclinical CH cases.41–44 Several investigators have also reported potential harm including iatrogenic hyperthyroidism associated with treatment of hyperthyrotropinemia patients.45,46
It is similarly unclear whether cases of hypothyroxinemia, a condition common among preterm infants and characterized by low T4 concentrations and normal thyrotropin concentrations not associated with CH, should be treated. Although NBS programs have traditionally considered positive screening results associated with hypothyroxinemia as being false, evidence suggests that these children are at elevated risk for neurodevelopmental disorders47 and developmental delay48; trials are underway to determine whether there is a benefit from treatment, and the results may have implications for future NBS operations.49,50
Future research efforts would be greatly advanced by application of a standardized operational case definition for CH across NBS programs, similar to efforts made in surveillance for cerebral palsy in Europe.51 Absent a standardized operational case definition, it is difficult to make meaningful comparisons between and within screening programs over time. This definition should lay out the criteria for diagnosing CH in terms of necessary tests and how to interpret them and should attempt to differentiate classic CH from other congenital thyroid abnormalities using operational terms. Standardized age-adjusted analyte thresholds are recommended for both dried blood and serum measurements. It is also recommended that all suspected cases of CH undergo thyroid imaging to facilitate differentiation of likely transient from permanent cases. Finally, expansion of long term follow-up and data collection activities including neurodevelopmental assessment would also facilitate future investigation of cost-benefit.
This study is limited by missing data, although it appeared to occur at random based on similar distributions among tabulations by overall CH detection, severe CH detection, and false-positive screening determination. The small proportion of cases that underwent thyroid imaging (15%) hindered our ability to investigate further whether transient or milder forms of CH were more likely to be detected during any of the observed exposure periods. Our definition of severe CH is also imprecise; 56% of infants included in this study who exhibited an initial thyrotropin concentration ≥100 uIU/mL were not diagnosed with CH. However, use of Mitchell et al.’s definition of severe CH revealed an interesting trend in detection across protocols and led us to similarly believe it is unlikely that the true birth prevalence of classic CH has increased over time. Our findings are also negatively affected by reliance on passive reporting to identify false-negative screening results; accordingly, our count of false-negative determinations should be interpreted as a minimum. Finally, this study is limited by the lack of universal long-term follow-up beyond age 3 years, meaning we are unable to differentiate permanent from transient CH.
Overall, our findings suggest that primary thyrotropin plus serial testing for infants at risk for later rising thyrotropin is an effective NBS strategy for classic CH (characterized by elevated thyrotropin and low T4), although the 2 false-negatives observed in this study occurred during this period among normal birth weight infants admitted to the NICU who had later rising thyrotropin. Michigan now rescreens all children admitted to the NICU at 30 days of life or discharge in lieu of retesting at 14 days and again at 28 days of life only among children born weighing <1800 g. Additional evaluations are underway to determine whether the revised serial testing protocol adequately addresses the false-negatives observed in this study. Tandem T4 and thyrotropin screening outperformed other strategies for detection of both classic and central CH combined, although it is associated with increased operating costs owing to additional laboratory infrastructure and increased false-positive determinations, primarily among preterm infants. Additional research is necessary to weigh the benefits of increased sensitivity against additional program operating costs; this research should support future guidelines directly addressing whether central CH should be included in the recommended panel of NBS conditions.
Each author made substantial contributions to conception and design, acquisition of data, or analysis and interpretation of data; participated in drafting the article or revising it critically for important intellectual content; and provided final approval of the version to be published.
FINANCIAL DISCLOSURE: The authors have indicated they have no financial relationships relevant to this article to disclose.
FUNDING: This research was supported in part by the Perinatology Research Branch, Division of Intramural Research, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, Department of Health and Human Services. Funded by the National Institutes of Health (NIH).