Following the outbreaks of 2009 pandemic H1N1 infection, rapid influenza diagnostic tests have been used to detect H1N1 infection. However, when this manuscript was drafted, no meta-analysis had been undertaken to assess their diagnostic accuracy.
The literature was systematically searched to identify studies that reported the performance of rapid tests. Random effects meta-analyses were conducted to summarize the overall performance.
Seventeen studies were selected with 1879 cases and 3477 non-cases. The overall sensitivity and specificity estimates of the rapid tests were 0.51 (95%CI: 0.41, 0.60) and 0.98 (95%CI: 0.94, 0.99). Studies reported heterogeneous sensitivity estimates, ranging from 0.11 to 0.88. If the prevalence was 30%, the overall positive and negative predictive values were 0.94 (95%CI: 0.85, 0.98) and 0.82 (95%CI: 0.79, 0.85). The overall specificities from different manufacturers were comparable, while there were some differences for the overall sensitivity estimates. BinaxNOW had a lower overall sensitivity of 0.39 (95%CI: 0.24, 0.57) compared to all the others (p-value < 0.001), whereas QuickVue had a higher overall sensitivity of 0.57 (95%CI: 0.50, 0.63) compared to all the others (p-value = 0.005).
Rapid tests have high specificity but low sensitivity and thus limited usefulness.
Real-time reverse-transcriptase polymerase chain reaction (rRT-PCR) is the most accurate method for detecting influenza A (H1N1) virus infection in respiratory specimens. However, the facilities and expertise for performing rRT-PCR are not uniformly available, and rRT-PCR results are generally not immediately accessible, which poses challenges in establishing a diagnosis, especially in patients presenting late in their clinical course (1). Rapid influenza diagnostic tests (henceforth, rapid tests) that detect influenza viral antigens produce quick results that can be used to screen patients with suspected influenza. Although some new rapid tests were developed as the 2009 H1N1 pandemic progressed, the rapid tests used in the majority of studies were already in use and had not been developed specifically to detect H1N1. In particular, at the beginning of the pandemic, their performance for the detection of 2009 pandemic H1N1 was not known. The lack of specific, rapid and accurate diagnostics for H1N1 has been a major concern for monitoring and controlling outbreaks of 2009 pandemic influenza A (H1N1) infection. When they were developed, rapid influenza diagnostic tests were introduced as a promising novel approach to detecting this virus. Several commercial antigen assays, although not specifically designed for diagnosing 2009 pandemic H1N1, were quickly introduced to the market. However, rapid test performance has been less than optimal (1). Several previous studies that compared rapid tests with rRT-PCR for detecting 2009 H1N1 virus infection in upper respiratory specimens reported consistently high specificity but inconsistent estimates of sensitivity (2;3). When this manuscript was drafted, no meta-analysis of the diagnostic accuracy of rapid tests for diagnosing 2009 H1N1 had been reported, although Babin et al. (4) have since published one.
Here we use a comprehensive search strategy and meta-analytic methods to determine the accuracy of existing rapid tests for diagnosing 2009 H1N1 virus infection.
The literature was systematically searched using predetermined inclusion criteria. Studies were included if they reported the sensitivity and/or specificity of an influenza rapid test for detecting 2009 pandemic influenza (H1N1) infection, or contained sufficient information to calculate the sensitivity and specificity based on diagnosis of clinical specimens using rRT-PCR as the gold standard reference test. No language restrictions were applied. Eligible studies were identified by searching the databases MEDLINE (NLOM, Bethesda, MD) and EMBASE (Elsevier, Amsterdam, the Netherlands) using the PUBMED and OVID interfaces, respectively. Publication dates were restricted to between January 1, 2009 and January 15, 2010, inclusive. Search terms for each database included: "influenza diagnostic", "influenza rapid test", "rapid test H1N1" and "influenza rapid". Subsequently, the title and abstract of each potential study were screened to determine potential eligibility, which was then confirmed by a review of the full text. References from eligible studies were also examined for additional potential studies, and papers referencing eligible studies were identified using Google Scholar and considered for inclusion.
Data synthesis was performed according to guidelines on systematic reviews of diagnostic accuracy studies (7;8). Bivariate logit-normal random effects meta-analyses were conducted to summarize the overall sensitivity and specificity of rapid tests (9–14). Compared to fixed effects models, random effects models typically provide conservative estimates with wider confidence intervals because they assume that the meta-analysis includes only a sample of all possible studies. In addition, random effects models appropriately account for differences in study sample sizes and for both within-study variability (random error) and between-study variability (heterogeneity) (15;16). In general, the bivariate approach offers some advantages over separate univariate random effects meta-analyses by accounting for the correlation between sensitivity and specificity (17–19). This correlation will exist if the different studies use different test thresholds and thus operate at different points along the underlying receiver operating characteristic (ROC) curve for the test. However, one study reported, based on extensive simulations, that the differences between univariate and bivariate random effects models for summarizing pooled sensitivity and specificity are trivial (20). Thus, we used univariate logit-normal random effects meta-analyses to generate forest plots (i.e., graphical displays that illustrate the relative strength of the results of multiple quantitative studies addressing the same question) with overall and rapid test-specific pooled estimates for both sensitivity and specificity.
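As a rough illustration of univariate random effects pooling on the logit scale, the sketch below uses the DerSimonian-Laird moment estimator. This is a simpler stand-in chosen for brevity, not necessarily the estimator used by the R meta package in this analysis. The study counts are made up, and the function names are hypothetical.

```python
import math


def logit(p):
    return math.log(p / (1.0 - p))


def expit(x):
    return 1.0 / (1.0 + math.exp(-x))


def pool_logit_random_effects(events, totals):
    """Pool proportions on the logit scale (DerSimonian-Laird sketch).

    events[i] / totals[i] is, e.g., true positives / rRT-PCR-confirmed cases.
    Returns (pooled proportion, 95% CI lower, 95% CI upper, tau^2, Cochran's Q).
    """
    y, v = [], []
    for e, n in zip(events, totals):
        # 0.5 continuity correction keeps the logit finite when e == 0 or e == n
        if e == 0 or e == n:
            e, n = e + 0.5, n + 1.0
        y.append(logit(e / n))
        v.append(1.0 / e + 1.0 / (n - e))  # approximate variance of the logit

    k = len(y)
    w = [1.0 / vi for vi in v]  # inverse-variance (fixed effects) weights
    fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, y))  # Cochran's Q
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    # Between-study variance, truncated at zero
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0

    w_re = [1.0 / (vi + tau2) for vi in v]  # random effects weights
    mu = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    return expit(mu), expit(mu - 1.96 * se), expit(mu + 1.96 * se), tau2, q


# Hypothetical counts: rapid-test positives among confirmed cases in four studies
tp = [20, 45, 12, 60]
cases = [50, 70, 60, 100]
est, lo, hi, tau2, q = pool_logit_random_effects(tp, cases)
print(f"pooled sensitivity {est:.2f} (95% CI {lo:.2f}, {hi:.2f}), tau^2 {tau2:.2f}, Q {q:.1f}")
```

Because the between-study variance is added to every study's variance, the random effects weights are more even across studies than the fixed effects weights, which is why the resulting interval is typically wider.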
Parameters used to summarize diagnostic accuracy include: sensitivity and specificity directly estimated from the univariate and/or bivariate random effects models; positive and negative likelihood ratio, positive and negative predictive values, and the diagnostic odds ratio (DOR) derived from parameter estimates from the bivariate random effects models accounting for potential correlation between sensitivity and specificity estimates. In addition to reporting pooled sensitivity and specificity, which are often regarded as intrinsic properties of a diagnostic test, we also report other metrics because they are clinically more meaningful in some settings. Sensitivity is estimated by the proportion of positive tests among those with the disease of interest, whereas specificity is estimated by the proportion of negative tests among those without the disease. The positive (or negative) likelihood ratio is estimated by the ratio of the proportion of positive (or negative) tests in the diseased versus non-diseased subjects. The positive (or negative) predictive value is estimated by the proportion of subjects with a positive (or negative) test who have (or do not have) the disease. The DOR, commonly considered a global measure of test performance, is estimated by the ratio of the odds of a positive test result in diseased subjects to the odds of a positive test result in non-diseased subjects.
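The definitions above can be written out for a single 2×2 table. The sketch below is illustrative only: the counts are hypothetical and the function name is an assumption. Note that predictive values computed directly from a table reflect the proportion of cases in that sample.

```python
import math


def _ratio(a, b):
    # Return infinity instead of raising when the denominator is zero
    return math.inf if b == 0 else a / b


def diagnostic_metrics(tp, fp, fn, tn):
    """Accuracy measures from a 2x2 table against the reference test."""
    se = tp / (tp + fn)                # positive tests among the diseased
    sp = tn / (tn + fp)                # negative tests among the non-diseased
    return {
        "sensitivity": se,
        "specificity": sp,
        "lr_pos": _ratio(se, 1 - sp),  # P(test+ | diseased) / P(test+ | non-diseased)
        "lr_neg": _ratio(1 - se, sp),  # P(test- | diseased) / P(test- | non-diseased)
        "ppv": _ratio(tp, tp + fp),    # diseased among test-positives
        "npv": _ratio(tn, tn + fn),    # non-diseased among test-negatives
        "dor": _ratio(tp * tn, fp * fn),  # odds of test+ in diseased / in non-diseased
    }


# Hypothetical study: 100 rRT-PCR-confirmed cases and 200 non-cases
metrics = diagnostic_metrics(tp=51, fp=4, fn=49, tn=196)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

In this table the DOR equals the positive likelihood ratio divided by the negative likelihood ratio, which follows directly from the definitions.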
The Begg and Mazumdar adjusted rank correlation test (21) and the Egger et al. regression asymmetry test (22) were each used to assess publication bias for sensitivity and for specificity. Cochran's Q-test was used to detect heterogeneity (23). Location (US versus non-US) and rapid test manufacturer were included as covariates to examine whether they explained the heterogeneity. Tests for small-study effects were employed only when at least four studies were available. The univariate logit-normal random effects meta-analyses were implemented with the meta package (24;25) in R version 2.12.1 (http://cran.r-project.org/), and the bivariate random effects models were fitted using the NLMIXED procedure in SAS version 9.2 (SAS Institute, Cary, NC, USA). The summary ROC curve was plotted based on the regression line of sensitivity on the false-positive rate (1–Sp) on the logit scale using the estimates from the bivariate random effects models (12) rather than the line proposed by Rutter and Gatsonis (26;27).
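To make the regression asymmetry idea concrete, the sketch below regresses each study's standardized effect (effect divided by its standard error) on its precision (one over the standard error) by ordinary least squares. An intercept far from zero relative to its standard error suggests small-study effects. This is a minimal sketch of the general technique, not the routine used in this analysis. The inputs are hypothetical logit-scale sensitivities, the function name is an assumption, and the p-value (a t-test with n - 2 degrees of freedom) is left out to keep the example dependency-free.

```python
import math


def egger_intercept(effects, std_errors):
    """Egger-style regression of effect/SE on 1/SE.

    Returns (intercept, standard error of the intercept, degrees of freedom).
    """
    y = [e / s for e, s in zip(effects, std_errors)]  # standardized effects
    x = [1.0 / s for s in std_errors]                 # precisions
    n = len(x)
    if n < 3:
        raise ValueError("need at least three studies")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("standard errors must not all be equal")
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    resid_var = rss / (n - 2)
    se_intercept = math.sqrt(resid_var * (1.0 / n + x_bar ** 2 / sxx))
    return intercept, se_intercept, n - 2


# Hypothetical logit sensitivities and standard errors from five studies
logit_se = [-0.4, 0.6, -1.4, 0.4, 0.1]
std_err = [0.29, 0.25, 0.32, 0.20, 0.15]
a, se_a, df = egger_intercept(logit_se, std_err)
print(f"intercept {a:.2f} (SE {se_a:.2f}), df {df}")
```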
We identified 2054 citations from MEDLINE and 775 citations from EMBASE, with some overlap, from the initial search. After screening titles and abstracts, 85 articles were eligible for full-text review. Of these, 68 articles were excluded, and 17 (11) articles on the sensitivity (specificity) of rapid influenza H1N1 diagnostic tests were included, as presented in Table 1. Three studies contributed results for multiple rapid tests (28–30), producing a total of 22 sensitivity estimates and 12 specificity estimates. Specifically, six (three) studies reported sensitivity (specificity) estimates of BinaxNOW Influenza A & B (2;28–32); seven (four) studies reported sensitivity (specificity) estimates of QuickVue Influenza A + B (28;30;33–37); four (two) studies reported sensitivity (specificity) estimates of the BD Directigen EZ Flu A + B test (28;30;38;39); two (one) studies reported sensitivity (specificity) estimates of Espline Influenza A & B (29;40); and one study reported sensitivity and specificity estimates of Xpect Flu A & B (41). The seven (four) sensitivity (specificity) estimates for BD Directigen, Espline and Xpect were grouped together because of the small number of studies for each of these tests. One study reported sensitivity and specificity estimates for either the BinaxNOW Influenza A & B test or the 3M Rapid Detection Flu A + B test (42), and one study reported a sensitivity estimate for either the QuickVue Influenza A + B or the SD Bioline Influenza Antigen test. These two studies were excluded from the analyses of pooled sensitivities and specificities of the QuickVue Influenza A + B test and the BinaxNOW Influenza A & B test because we could not calculate the number of false positives, true negatives, false negatives, and true positives for either test. However, we included them in the analyses of pooled overall sensitivity and specificity of rapid tests.
The average sample size of the 17 included studies was 315 (range 17 – 1831), with a total of 1879 cases and 3477 non-cases confirmed by rRT-PCR. The majority (82% = 14 out of 17) of the studies were prospective. The overall sensitivity and specificity estimates were 0.51 (95% CI: 0.41, 0.60; range 0.11 – 0.88) and 0.99 (95% CI: 0.94, 0.99; range 0.80 – 1.00) from the univariate random effects models. Figures 1 and 2 show the diagnostic accuracy measures from all the studies, stratified by the rapid test manufacturer using the bivariate random effects models. Based on the Q statistics, both the sensitivity and specificity showed highly significant between-study heterogeneity in the summary results (p-value < 0.001).
Specificity appeared to be more consistent than sensitivity from different manufacturers. The overall specificities from different manufacturers were comparable as seen in Figure 2. However, there were some differences for the overall sensitivity estimates from different manufacturers. BinaxNOW had a lower overall sensitivity (0.39 with 95%CI: 0.24, 0.57) compared to all the others (p-value < 0.001), whereas QuickVue had a higher overall sensitivity (0.57 with 95%CI: 0.50, 0.63) compared to all the others (p-value = 0.005) from the bivariate random effects model.
Begg’s adjusted rank correlation test (p-value = 0.40 and 0.53) showed no evidence of publication bias for either sensitivity or specificity, whereas Egger’s regression asymmetry test (p-value = 0.07 and 0.06) suggested that some publication bias may exist for both sensitivity and specificity. Because we had a total of 22 sensitivity estimates but only 12 specificity estimates, we did not consider the modified Begg and Mazumdar adjusted rank correlation test or the modified Egger et al. regression asymmetry test to detect publication bias on the log DOR scale, which simulations have shown to perform slightly better when equal numbers of sensitivity and specificity estimates are available (43).
Based on the bivariate logit-normal random effects models, the correlation between sensitivity and specificity was only 0.32 (95%CI −0.64, 0.89) on the logit scale, suggesting no evidence of strong correlation. The overall positive likelihood ratio was 34.5 (95% CI: 12.7, 93.6) and the overall negative likelihood ratio was 0.48 (95%CI: 0.39, 0.60). The DOR was 71.6 (95%CI: 26.3, 194.6). Study location (US versus non-US) was not associated with sensitivity or specificity (p-value = 0.41 and 0.86, respectively). Sample type (nasopharyngeal versus other) was not associated with sensitivity (p-value = 0.95), but was associated with specificity (p-value = 0.03). Nasopharyngeal samples had a specificity of 0.97 (95%CI: 0.90, 0.99) and the other samples had a specificity of 1.00 (95%CI: 0.98, 1.00).
Figure 3a presents the summary receiver operating characteristic curve (44). The area under the curve was 0.68 (95%CI: 0.20, 0.92). Figure 3b shows the estimated positive and negative predictive values with their point-wise 95% confidence intervals based on the overall estimates of sensitivity and specificity. For example, when the prevalence was 30%, the estimated overall positive and negative predictive values were 0.94 (95%CI: 0.85, 0.98) and 0.82 (95%CI: 0.79, 0.85), suggesting limited usefulness.
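The dependence of predictive values on prevalence follows from Bayes' rule and can be sketched as below. The inputs are the rounded overall estimates quoted in the abstract (sensitivity 0.51, specificity 0.98), and the function name is an assumption. With these rounded inputs the positive predictive value at 30% prevalence comes out near 0.92 rather than the reported 0.94, presumably because the reported figures are derived from the unrounded bivariate model estimates.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) by Bayes' rule; intended for 0 < prevalence < 1."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Rounded overall estimates from the abstract, evaluated over a range of prevalences
for prev in (0.05, 0.10, 0.20, 0.30, 0.50):
    ppv, npv = predictive_values(0.51, 0.98, prev)
    print(f"prevalence {prev:.2f}: PPV {ppv:.2f}, NPV {npv:.2f}")
```

As prevalence rises, the negative predictive value falls, because the low sensitivity leaves a growing share of true cases among those who test negative.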
An extensive literature search identified 17 articles that reported rapid test results from clinical specimens. Meta-analysis results showed that the specificity estimates for existing commercial rapid tests are high and relatively consistent, ranging from 0.80 to 1.00. However, the sensitivity is low and highly variable, ranging from 0.11 to 0.88. A lack of sensitivity is of particular concern in the present setting. Rapid tests are useful as a screening device to the extent that they identify possible cases. Therefore, high sensitivity is essential.
Rapid tests with improved performance are needed. Alternatively, testing strategies that employ multiple rapid tests may improve sensitivity. For example, use of two different rapid tests on sequential biologic samples from the same individual may provide partially independent information. If an individual is defined as positive when at least one of the rapid tests is positive, the upper bound on the improved sensitivity is the complement of the probability that both tests yield false negative results. Using the overall sensitivity estimates for QuickVue and the other manufacturers, this would yield a possibly acceptable sensitivity of 0.80 = 1–(1–0.57)×(1–0.53) if the tests work independently. However, this strategy would double the cost of testing and would also require the collection of a second sample, delaying time to results.
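The arithmetic for the two-test strategy can be checked with a short sketch. It assumes the two tests err independently, as stated above. The specificity function is an addition not discussed in the text: under the same independence assumption, calling a patient positive when either test is positive lowers specificity to the product of the two specificities, and the 0.98 used for both tests is an assumed value. The function names are hypothetical.

```python
def parallel_sensitivity(se_a, se_b):
    """Sensitivity when a patient is called positive if either test is positive.

    Assumes the two tests miss cases independently of each other.
    """
    return 1 - (1 - se_a) * (1 - se_b)


def parallel_specificity(sp_a, sp_b):
    """Specificity under the same rule: both tests must be negative."""
    return sp_a * sp_b


# 0.57 and 0.53 are the sensitivities used in the text; 0.98 is an assumed specificity
combined_se = parallel_sensitivity(0.57, 0.53)
combined_sp = parallel_specificity(0.98, 0.98)
print(f"combined sensitivity {combined_se:.2f}, combined specificity {combined_sp:.2f}")
```

If the two tests tend to miss the same patients, for example those with low viral load, the realized gain would be smaller than this upper bound.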
In conclusion, real-time reverse-transcriptase polymerase chain reaction remains the most accurate method for detecting 2009 pandemic influenza A (H1N1) virus infection. However, rRT-PCR results are not immediately accessible, and a laboratory with the equipment and expertise needed to avoid common rRT-PCR technical errors may not be available. Rapid procedures with adequate diagnostic test characteristics are therefore needed, and existing rapid tests are inadequate. Alternative solutions to address poor test sensitivity are needed.
The authors would like to thank Drs. Dennis Faix, Thomas Sandora and Alex McAdam for their contribution of additional data, as well as Drs. Hugo Lopez-Gatell, Guido Schwarzer and Loic Desquilbet for expert advice. Dr. Haitao Chu was supported in part by the U.S. Department of Health and Human Services Agency for Healthcare Research and Quality Grant R03HS020666 and P01CA142538 from the U.S. National Cancer Institute.
Financial Disclosures: None reported.
Conflicts of Interests: None reported.