Universal newborn hearing screening for bilateral permanent congenital hearing impairment is standard practice in many developed economies, but until there is clear evidence of cost-effectiveness, it remains a controversial use of limited health care resources. We conducted a formal systematic review of studies of newborn hearing screening that considered both costs and outcomes to produce a summary of the available evidence and to determine whether there was a need for further research.
A search of medical and nursing databases and grey literature websites was conducted using multiple keywords. The titles and abstracts of studies were examined for preliminary inclusion if reference was made to newborn hearing screening, and to both costs and outcomes. Studies of potential relevance were independently assessed by 2 health economists for final inclusion in the review. Studies that met inclusion criteria were appraised using existing guidelines for observational studies, economic evaluations, and decision analytic models, and reported in a narrative literature review.
There were 22 distinct observational or modeled evaluations of which only 2 clearly compared universal newborn hearing screening to risk factor screening for bilateral permanent congenital hearing impairment. Of these, the single evaluation that examined long-term costs and outcomes found that universal newborn hearing screening could be cost-saving if early intervention led to a substantial reduction in future treatment costs and productivity losses.
There are only a small number of economic evaluations that have examined the long-term cost-effectiveness of universal newborn hearing screening. This is partly attributable to ongoing uncertainty about the benefits gained from the early detection and treatment of bilateral permanent congenital hearing impairment. There is a clear need for further research on long-term costs and outcomes to establish the cost-effectiveness of universal newborn hearing screening in relation to other approaches to screening, and to establish whether it is a good long-term investment.
Bilateral permanent congenital hearing impairment (PCHI) of 40 decibel hearing level (dBHL) or greater1 is a significant public health issue because of its relatively high incidence of 1 to 2 per 1000 live births and its adverse impact on development. In a number of longitudinal studies1–3 investigators have suggested that very early identification and intervention before the infant reaches 6 months of age are associated with better language outcomes. Unfortunately, without screening, PCHI is usually not diagnosed until well after a child has completed infancy.4–6
The 2 main approaches to screening for bilateral PCHI are risk factor screening and universal newborn hearing screening (UNHS). In a risk factor screening program, newborns with an identifiable risk factor, such as an admission to a neonatal intensive care unit or a birth weight of less than 1500 g, are identified and screened.6 However, the correct identification of all at-risk newborns is very difficult to achieve in practice7 and, even when completely successful, risk factor screening programs are unable to identify all newborns with PCHI, as 40% to 60% do not have an identifiable risk factor.6
Therefore, once it became technically feasible, UNHS for bilateral PCHI rapidly became the standard of care in developed economies, largely on the basis of expert recommendation.8 Universal newborn hearing screening programs offer all newborns an automated hearing screen with the use of otoacoustic emissions (OAE), automated auditory brainstem responses (AABR), or both.8 A systematic review of the evidence for UNHS for the detection of moderate-to-severe bilateral PCHI found that it was associated with an earlier age of referral, diagnosis and treatment than other approaches.9 In the absence of randomized controlled trials, the majority of the evidence for favorable child outcomes from UNHS comes from observational studies and the 2 quasi-randomized controlled trials.2,10
In the UK retrospective cohort study,2 exposure to UNHS was associated with better receptive language scores, both in terms of individual measures and aggregate results (receptive language as compared with nonverbal ability adjusted mean difference in aggregate z score, 0.60; 95% confidence interval, 0.07–1.13). The impact on expressive language and speech was less clear. The clinically relevant gains in speech and expressive language garnered from UNHS were reflected in the small-to-medium effect sizes, but these failed to reach statistical significance, possibly because of small sample sizes.
Unfortunately, serious gaps in the evidence for UNHS remain,11 and it is now becoming clear that small population benefits have been overstated. In Australia, as in other developed economies, UNHS programs have been widely implemented despite the lack of strong evidence for their efficacy and cost-effectiveness. Given this, there is no guarantee that UNHS programs will continue to receive government funding.
The objective of this systematic review was to summarize the available evidence for the cost-effectiveness of UNHS programs for bilateral PCHI 40 dBHL or greater and to determine whether there was a need for further research. To our knowledge, this is the first formal systematic literature review that has focused on economic evaluations of screening programs for newborn hearing impairment.
Between January and March 2011, we conducted a systematic search of medical and nursing databases and grey literature websites for papers that examined both the costs and outcomes of a comparison of different approaches to screening for PCHI, one of which could be “no screening.” No restrictions were placed on the severity of disease, the date, or language of publication, with the aim of finding as many studies as possible. We used the following terms in topic and title searches of MEDLINE, EMBASE, CINAHL, and the Cochrane Library: sensorineural hearing loss; congenital hearing loss; neonatal hearing loss; newborn hearing loss; hearing loss; neonatal screening; newborn screening; selective screening; risk factor screening; mass screening; universal screening; newborn hearing screening; universal newborn hearing screening; cost-benefit; cost benefit; cost-effectiveness; cost effectiveness; and economic evaluation. See Table 1 for an example of the search strategy. The results of the searches were entered into an EndNote database. A subset of these terms was also used in a grey literature search of OpenGrey12 and the health economics sites listed in Grey Matters, a search tool developed by the Canadian Agency for Drugs and Technologies in Health.13 The review protocol was not registered, but a copy can be obtained on request from the first author.
One author (S.C.) scanned the titles and abstracts of all papers located through systematic searching to identify studies of potential relevance to the review. Full copies of these papers were retrieved and independently assessed by 2 health economists (S.C. and L.G.) for inclusion in the review, with any disagreements resolved through discussion. Data were extracted from the included studies by the same 2 health economists, who used standard National Health Service Economic Evaluation Database extraction forms.14 Quality assessment followed a 3-step process recommended by the Economics Methods Group of the Campbell and Cochrane Collaborations,15 using: 1) Helfand et al11 to assess the methodological quality of cohort studies; 2) Drummond et al16 to assess the quality of economic evaluations; and 3) Philips et al17 to assess the quality of decision analytic modeling.
The initial search returned 2769 papers of which 72 (2.60%) were of possible relevance to the review. Copies were obtained of 71 of these 72 papers (98.61%; Fig. 1). Papers published in English (n = 62) were independently assessed by 2 reviewers (S.C. and L.G.), and papers in other languages (n = 10) were assessed by a single bilingual reviewer for each language. Of these, 27 English language papers and 2 papers published in other languages were included in the review. There were no randomized controlled trials.
The 29 papers contained 22 distinct observational or modeled evaluations (summarized in Table 2)16,18–46: 13 that compared different approaches to UNHS; 2 that compared different approaches to risk factor screening; and 7 that compared UNHS to risk factor screening.1 There were 8 evaluations (36.36%) that examined screening for bilateral hearing impairment, and 12 (54.55%) that examined screening for bilateral or unilateral impairment. In the remaining 2 evaluations (9.09%), the target condition was unclear.
The studies were further classified as: type A evaluations that derive all of their clinical and epidemiological evidence from a single study; and type B evaluations, which are based on data from multiple data sources, such as published studies, unpublished reports, hospital records, and expert opinion (summarized in Table 3).14,18–20,22,23,25,27,32–48 There were 5 type A and 17 type B evaluations. Many of the type B evaluations used earlier type B evaluations as their primary source of data. Diagnostic criteria, where specified, ranged between 30 dBHL or greater and 40 dBHL or greater in one or both ears (summarized in Table 4).19,20,23–46 The majority of the evaluations included screening fees or resource use in their cost estimates, although the inclusion of overheads varied, and only a few studies included costs to families (summarized in Table 5).18–46 The 3 studies that included long-term costs and/or outcomes18–20 were not directly comparable because of their use of different screening criteria and other model parameters.
The 13 studies that compared different approaches to UNHS variously examined the use of OAE, AABR, or OAE followed by AABR. The number of screening tests performed, and whether these tests were conducted before or after discharge, varied between studies, as did the hearing threshold (from ≥30 dBHL to ≥40 dBHL). Five studies screened for bilateral hearing impairment,20–24 7 studies25–32 screened for unilateral or bilateral hearing impairment, and in 1 study33 it was not clear what type of hearing impairment was being screened for. Two studies, those of Grill et al22 (type B) and Uus et al24 (type A), compared UNHS to community-based distraction testing. Of the 2 type B evaluations that examined different approaches to risk factor screening, only one clearly screened for bilateral hearing impairment.34
The 7 type B studies18,34–45 that compared UNHS to risk factor screening used different diagnostic criteria and approaches to screening. The primary measure of cost-effectiveness was the cost per case detected within a set time period, either explicitly defined as 6 months of age, or implicitly as the period of time taken until diagnosis. Of these evaluations, only Keren et al18 and Kemper and Downs35 examined the cost-effectiveness of screening for bilateral PCHI. In their study, Kemper and Downs35 compared UNHS to risk factor screening for the detection of significant bilateral PCHI (indirectly defined as at least 30–40 dBHL). They found that for a hypothetical cohort of 100 000 American newborns, the incremental cost per extra case detected by UNHS was $23 930 (U.S. dollars, reference year not stated). In sensitivity analysis, the greater yield and extra cost of the UNHS protocol were maintained over the entire range of probability and other model estimates.
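The cost per extra case detected is an incremental ratio: the difference in program cost divided by the difference in cases detected. The following is a minimal sketch of that arithmetic only. The function name and all input values are hypothetical illustrations and are not taken from Kemper and Downs or any other included study.

```python
def incremental_cost_per_extra_case(cost_a, cases_a, cost_b, cases_b):
    """Incremental cost per additional case detected by strategy A versus B."""
    extra_cases = cases_a - cases_b
    if extra_cases <= 0:
        # The ratio is only meaningful when A detects more cases than B.
        raise ValueError("strategy A detects no additional cases")
    return (cost_a - cost_b) / extra_cases


# Hypothetical totals for a cohort of 100 000 newborns (illustrative only).
unhs = {"cost": 3_000_000.0, "cases": 100}
risk_factor = {"cost": 1_000_000.0, "cases": 50}

ratio = incremental_cost_per_extra_case(
    unhs["cost"], unhs["cases"], risk_factor["cost"], risk_factor["cases"]
)
print(f"Incremental cost per extra case detected: ${ratio:,.0f}")
```

A ratio of this kind says nothing about whether the extra cases detected translate into better long-term outcomes, which is why it is treated in this review as an intermediate measure.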
Keren et al18 was the only evaluation in the review that examined the long-term costs and outcomes of the use of UNHS for the detection of bilateral PCHI 40 dBHL or greater. Universal newborn hearing screening was modeled in the evaluation as predischarge OAE, followed by AABR for first-stage failures. Risk factor screening was modeled in the evaluation as pre-discharge AABR followed by AABR for first-stage failures. The probability of newborns achieving a language quotient of >80 if diagnosed by 6 months of age was estimated at 0.70 (range, 0.40–1) for low-risk newborns and 0.50 (0.28–1) for high risk newborns. For newborns diagnosed later than 6 months of age, the probabilities were 0.40 (0–0.70) and 0.28 (0–0.50), respectively. The probability of bilateral PCHI was 0.0006 (0.00005–0.0013) for low-risk and 0.0083 (0.001–0.05) for high-risk newborns. In the no-screening arm of the model the proportion diagnosed with PCHI by 6 months of age was 0.20 (0.10–0.20) for low risk and 0.25 (0.10–0.25) for high-risk newborns. The probabilities of events and outcomes used in the model were mostly derived from the literature, with expert opinion only being used when published evidence was lacking.
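To show how these base-case probabilities combine, the sketch below computes the probability that a child with PCHI achieves a language quotient of >80 in the no-screening arm. It uses only the base-case values quoted above. The simple two-branch structure (diagnosed by 6 months, or later) is an assumption made for illustration and is not a reconstruction of the published model.

```python
# Base-case probabilities quoted above for the no-screening arm.
P_DIAGNOSED_BY_6_MONTHS = {"low_risk": 0.20, "high_risk": 0.25}
# Probability of a language quotient > 80, by timing of diagnosis.
P_LQ80_IF_EARLY = {"low_risk": 0.70, "high_risk": 0.50}  # by 6 months
P_LQ80_IF_LATE = {"low_risk": 0.40, "high_risk": 0.28}   # after 6 months


def p_language_quotient_over_80(p_early, p_good_if_early, p_good_if_late):
    """Probability-weighted chance of LQ > 80 for a child with PCHI.

    Assumption: every child is eventually diagnosed, either by 6 months
    (probability p_early) or later (probability 1 - p_early).
    """
    return p_early * p_good_if_early + (1.0 - p_early) * p_good_if_late


no_screening = {
    group: p_language_quotient_over_80(
        P_DIAGNOSED_BY_6_MONTHS[group],
        P_LQ80_IF_EARLY[group],
        P_LQ80_IF_LATE[group],
    )
    for group in ("low_risk", "high_risk")
}

for group, probability in no_screening.items():
    print(f"{group}: {probability:.3f}")
```

Under these assumptions the no-screening arm gives roughly 0.46 for low-risk and 0.335 for high-risk newborns. A screening arm would raise the proportion diagnosed by 6 months, and the size of that shift is what drives the modeled benefit.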
The evaluation found that UNHS could yield long-term cost savings (2001, U.S.$ discounted at 3% per year) but that this was reliant on the proportion of low-risk newborns who develop normal language skills, and the associated life-time productivity gains. If the proportion of newborns that developed normal language skills was less than 60%, or if the life-time productivity gain was less than 64% of the base case estimate, then risk factor screening was cost saving. No-screening became dominant when the lifetime productivity gain from risk factor screening fell to below 15% of the base case estimate.18
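Discounting at 3% per year converts future costs and productivity gains to present values, so that amounts arising decades after screening carry less weight. The sketch below shows the standard present-value calculation. The function names and the example cash flows are hypothetical and are not drawn from Keren et al.

```python
def present_value(amount, years_from_now, annual_rate=0.03):
    """Discount a single future amount back to today."""
    return amount / (1.0 + annual_rate) ** years_from_now


def discounted_total(yearly_amounts, annual_rate=0.03):
    """Sum a stream of yearly amounts, where index 0 is the current year."""
    return sum(
        present_value(amount, year, annual_rate)
        for year, amount in enumerate(yearly_amounts)
    )


# Hypothetical example: a gain of 1000 per year for 40 years.
undiscounted = 1000.0 * 40
discounted = discounted_total([1000.0] * 40)
print(f"Undiscounted: {undiscounted:,.0f}; discounted at 3%: {discounted:,.0f}")
```

Because lifetime productivity gains accrue far in the future, the choice of discount rate can materially change whether a screening strategy appears cost-saving.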
Significant heterogeneity precluded formal meta-analysis. The 22 unique studies differed in terms of the type and size of their cohorts, the screening approaches used, the forms of technology evaluated, the conditions being screened for, the diagnostic criteria used, the cost categories examined, and the outcome measures that were used (Tables 2–5). The assessment of the risk of bias in the observational studies, on which the evaluations were based, was only undertaken for type A evaluations because of the use of data from multiple data sources in the type B evaluations. The quality of the primary studies used in the type A evaluations was assessed as generally poor, with little or no follow-up after screening. The quality of the economic evaluations ranged from fair to good and generally improved over time. The methodological quality of the costing in the evaluations was highly variable, with the level of detail ranging from brief35 to extremely detailed.18 Study perspective was generally not clearly stated, which made it difficult to judge whether all relevant costs had been included. There was little commonality in the cost and outcome data sources in the model-based evaluations, with widespread use of researcher estimates and expert opinion. Discounting was not always used in evaluations that had time horizons of greater than 1 year.
In general the effectiveness of the screening approaches was not well demonstrated. Outcome measures were often surrogates, such as the cost per newborn screened, rather than final endpoints such as the cost per case detected. Follow-up often ended at the point of referral for diagnostic testing (eg, Vohr et al23) with an implied assumption that all newborns referred for diagnostic testing would be later diagnosed with PCHI. The quality of decision analytic modeling was assessed as fair to good. The following indicators were generally poorly addressed: 1) the alternatives examined; 2) the time horizon of the evaluation; 3) the methods to identify important model parameters, and the quality of that data; 4) the structural validity of the models, that is, their internal, external and predictive validity; 5) sensitivity analysis; 6) heterogeneity, that is, failing to model outcomes for subgroups such as individuals with unilateral PCHI as opposed to bilateral PCHI; and 7) the comparison of the results produced by the models to other models with comparable approaches, diagnostic criteria, populations etc.
Despite its worldwide adoption, UNHS is yet to be established as a cost-effective investment. A major impediment is the lack of comparative effectiveness studies in which authors collect long-term prospective outcomes data on quality of life and on the educational, social, and employment opportunities of individuals with bilateral PCHI 40 dBHL or greater.9,49,50
The review’s main strengths are its completeness, the use of established protocols for the conduct of systematic reviews, the inclusion of published and grey literature, and the extraction of study data using standardized forms. Copies were obtained of 71 of the 72 studies (98.61%) that met provisional inclusion criteria and no study was excluded on the basis of the date or language of publication. The studies that met final inclusion criteria were critiqued by the use of existing guidelines for the critical appraisal of observational studies (for type A evaluations), economic evaluations, and decision analytic models.
One limitation of the review is that the identification of studies of potential relevance was undertaken by a single researcher. In a systematic review, these decisions are normally made by 2 researchers, working independently, who through discussion and consensus arrive at an agreed list of included studies, from which they independently extract review data. The aim of this process is to improve the reliability and reproducibility of the review.51 The main limitation of the review was the significant heterogeneity of the included studies. This made it difficult, other than in broad terms, to summarize the studies and to make meaningful comparisons between their results. The majority of the included studies screened for unilateral or bilateral PCHI. Screening for unilateral PCHI is controversial because there is a lack of clear evidence of the need to screen for, and to provide early intervention for this condition.4 Including both forms of congenital hearing impairment in a single program increases the prevalence of the target condition, increasing the positive predictive value of the screening test,49 which in itself can act to reduce the cost per case detected.
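The link between prevalence and positive predictive value follows from Bayes' theorem. The sketch below illustrates it for a single-stage screening test. The sensitivity, specificity, and prevalence values are hypothetical and are not estimates from any included study.

```python
def positive_predictive_value(prevalence, sensitivity, specificity):
    """Probability that a newborn who fails the screen truly has the condition."""
    true_positives = sensitivity * prevalence
    false_positives = (1.0 - specificity) * (1.0 - prevalence)
    return true_positives / (true_positives + false_positives)


SENSITIVITY = 0.90  # hypothetical test characteristic
SPECIFICITY = 0.98  # hypothetical test characteristic

# Hypothetical prevalences: a narrower versus a broader target condition.
for prevalence in (0.001, 0.002):
    ppv = positive_predictive_value(prevalence, SENSITIVITY, SPECIFICITY)
    print(f"prevalence {prevalence:.3f}: PPV {ppv:.3f}")
```

With test characteristics held fixed, a higher prevalence means fewer false-positive referrals for each true case, which lowers the diagnostic follow-up cost attributed to each case detected.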
The single study18 in which investigators compared the long-term cost-effectiveness of UNHS to risk factor screening found that UNHS could be cost-saving if early intervention led to a substantial reduction in future treatment costs and productivity losses. This finding was highly dependent on a series of favorable assumptions about the effects of UNHS on language and educational outcomes that have yet to be confirmed in large-scale population-based studies. The impact that early detection and intervention may have on the long-term outcomes achieved by UNHS may also be attenuated by factors such as the severity of hearing impairment52 and the presence of cognitive impairment, a common additional disability in children with PCHI.4 Consequently, the proportion of newborns with PCHI who go on to gain normal language skills may fall well below the levels estimated by Keren et al18 that are necessary for UNHS to be cost-saving.
The pathway of events in future economic evaluations of UNHS programs should ideally reflect the Joint Committee on Infant Hearing guidelines8,53 of screening before 1 month, diagnosis before 3 months, and intervention before 6 months of age. A societal perspective should be used, with the evaluation tracking the lifetime events, outcomes, and costs that are triggered by the program.9 To answer the question of whether UNHS is value for money, these evaluations need to use another approach to newborn hearing screening, such as risk factor screening or opportunistic screening (“no-screening”), as a comparator, as opposed to another form of UNHS. Because it is now unlikely that a randomized controlled trial of UNHS will ever be conducted, these evaluations will inevitably need to use some form of decision-analytic modeling.9,10 To maximize their value to decision-makers, they need to be conducted and reported in accordance with current best practice guidelines for decision-analytic modeling, the economic evaluation of screening programs, and the examination and presentation of uncertainty.
The evidence for the long-term effectiveness of early intervention in these evaluations, and the probability of key events (ie, screened before 1 month; intervention before 6 months) should be obtained from a formal systematic review of population-based studies,16 such as that of Korver et al.10 The outcomes examined in the evaluations should include school age language ability and utility-weighted quality of life. The measurement of utility-weighted quality of life will make it possible to calculate a cost per quality-adjusted life year. The use of this common metric will mean that meaningful comparisons can be made between the costs and benefits of UNHS programs, and those of other interventions and therapies.16 The information gained through this comparison can be used to inform government and private sector decisions about the efficient use of scarce healthcare resources.
The aim of economic evaluation is not necessarily to save money but to obtain the greatest possible benefit from the use of scarce healthcare resources. Preventative interventions such as newborn screening do not need to save money (eg, through reducing long-term productivity losses) to be cost-effective. An intervention to achieve a given outcome is cost-effective if it achieves this outcome at a lower cost than a comparator, or when the additional cost per unit of additional benefit falls below a pre-determined threshold that is acceptable to decision-makers and/or funding bodies (eg, $50 000 per quality-adjusted life year).54
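The decision rule described above can be sketched as a short function. This is a simplified illustration only: the function name is hypothetical, the default threshold is the $50 000 per quality-adjusted life year example from the text, and the handling of interventions with worse outcomes is an assumption made here for brevity.

```python
def is_cost_effective(delta_cost, delta_qalys, threshold=50_000.0):
    """Apply a simple cost-effectiveness rule to an intervention.

    delta_cost:  intervention cost minus comparator cost
    delta_qalys: intervention QALYs minus comparator QALYs
    """
    if delta_qalys < 0:
        # Worse outcomes: outside the rule sketched here, treated as not cost-effective.
        return False
    if delta_cost <= 0:
        # Same or better outcomes at no additional cost.
        return True
    if delta_qalys == 0:
        # Additional cost with no additional benefit.
        return False
    # Additional cost per additional QALY, compared with the threshold.
    return delta_cost / delta_qalys <= threshold


# Hypothetical comparisons (illustrative values only).
print(is_cost_effective(delta_cost=-1_000.0, delta_qalys=0.1))  # cost-saving
print(is_cost_effective(delta_cost=20_000.0, delta_qalys=1.0))  # below threshold
print(is_cost_effective(delta_cost=80_000.0, delta_qalys=1.0))  # above threshold
```

The second case shows the point made in the text: an intervention that costs more than its comparator can still be cost-effective when the extra cost per quality-adjusted life year is acceptable to the funder.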
This review summarized the available evidence for the cost effectiveness of UNHS for the detection of bilateral PCHI. There is a lack of evidence for the longer-term costs and outcomes of these programs, and it is therefore premature to draw any conclusions about their cost effectiveness. There is a clear need for further research before any policy recommendations can be made. At this time the fundamental question remains unanswered: whether, at a population level, UNHS is a good long-term investment54 and a worthwhile use of limited health care resources.
There are very few economic evaluations of universal newborn hearing screening for bilateral permanent congenital hearing impairment that consider long-term costs and outcomes. There is a clear need for further research to establish the cost-effectiveness of the practice.
The following authors are supported by the Australian National Health and Medical Research Council (NHMRC): M.W. (Population Health Career Development Grants 284556 and 546405); and L.G. (Population Health Capacity Building Grant 425855). Murdoch Children’s Research Institute research is supported by the Victorian Government’s Operational Infrastructure Support Program. This project was fully funded by NHMRC Project Grant 491228; the research was independent of the funder.