The emergence and rapid spread of the 2009 H1N1 pandemic influenza virus showed that many diagnostic tests were unsuitable for detecting the novel virus isolates. In most countries the probe-based TaqMan assay developed by the U.S. Centers for Disease Control and Prevention was used for diagnostic purposes. The substantial sequence data that became available during the course of the pandemic created the opportunity to utilize bioinformatics tools to evaluate the unique sequence properties of this virus for the development of diagnostic tests. We used a comprehensive computational approach to examine conserved 2009 H1N1 sequence signatures that are at least 20 nucleotides long and contain at least two mismatches compared to any other known H1N1 genome. We found that the hemagglutinin (HA) and neuraminidase (NA) genes contained sequence signatures that are highly conserved among 2009 H1N1 isolates. Based on the NA gene signatures, we used Visual-OMP to design primers with optimal hybridization affinity and we used ThermoBLAST to minimize amplification artifacts. This procedure resulted in a highly sensitive and discriminatory 2009 H1N1 detection assay. Importantly, we found that the primer set can be used reliably in both a conventional TaqMan and a SYBR green reverse transcriptase (RT)-PCR assay with no loss of specificity or sensitivity. We validated the diagnostic accuracy of the NA SYBR green assay with 125 clinical specimens obtained between May and August 2009 in Chile, and we showed diagnostic efficacy comparable to the CDC assay. Our approach highlights the use of systematic computational approaches to develop robust diagnostic tests during a viral pandemic.
The emergence of a novel strain of influenza virus circulating in humans was announced in April 2009 by the U.S. Centers for Disease Control and Prevention (CDC) (4). This novel virus demonstrated an exceptionally rapid geographical spread with sustained human-human transmission. This led the World Health Organization (WHO) to declare this event the first influenza virus pandemic of the 21st century on 11 June 2009. Since the outbreak of the 2009 H1N1 pandemic, to date more than 214 countries have reported laboratory-confirmed human cases, including at least 18,449 deaths worldwide (http://www.who.int/csr/don/2010_08_06/en/index.html). Currently, the 2009 H1N1 pandemic strain continues to be the dominant influenza A virus in circulation around the world, with sporadic detection of the previous seasonal H1N1 and H3N2 influenza A viruses. Most healthy individuals infected with the pandemic H1N1 virus develop an uncomplicated influenza-like illness, with full recovery within a week, even in the absence of medical treatment. However, high infection rates have been seen in the younger population (<35 years of age) (3, 5, 7), and recently the CDC estimated that in the United States alone, the 2009 H1N1 influenza virus has infected between 43 and 89 million people, resulting in an estimated 195,000 to 403,000 hospitalizations and 8,870 to ~18,300 deaths (http://www.cdc.gov/h1n1flu/pdf/graph_April%202010N.pdf). Thus, early detection of the virus in humans during disease onset has proven crucial in many cases, due to the increased pathogenesis observed in the young and in a portion of the population infected with the pandemic strain (6, 20).
Although several assays for the detection and diagnosis of influenza virus exist, many of the established diagnostic tests at the time of the emergence of the 2009 H1N1 virus were unsuitable for properly characterizing the pandemic strain (2, 10, 21). The novel sequence identity of the 2009 H1N1 virus rendered the established quantitative reverse transcriptase PCR (qRT-PCR) assays for the seasonal strain inadequate for subtyping the novel strain (21). In addition, the more generic but less sensitive assays based on antigen detection performed poorly overall (9, 12, 13). The rapid spread of the virus, and the lack of a specific assay for detecting it, vastly challenged the ability of clinical laboratories around the world to diagnose it. This led to the rapid development of a fluorescent probe-based assay by the CDC that was made available to public health laboratories and distributed to other national laboratories worldwide within weeks of the outbreak (7). Similarly, a number of researchers and companies in diverse parts of the world also developed real-time assays (16, 22). Nevertheless, as sequences of the novel strain began to arise, a number of mismatches began to appear that rendered some of these assays less sensitive for human diagnosis (8, 14).
The unprecedented mass sequence analysis of the pandemic virus genomes from laboratories around the world provided early and precise information of the genotypic characteristics of this new virus and permitted a rapid evaluation of the pathogenic potential of this novel strain (23). Nevertheless, although several full genome sequences were available within a month of the outbreak, a systematic and extensive analysis of the sequence variability for development of a robust diagnostic assay has not been done. The immense amount of sequence data available to date offers the unique opportunity to exploit computational tools to evaluate the unique sequence properties of this virus and their potential use for the development of diagnostic tests for the pandemic 2009 H1N1 virus, as well as establishing specific bioinformatics tools for improving current assays and the rapid development of future diagnostic tests.
Using a comprehensive bioinformatics approach, we sought to identify unique signature sequence islands conserved across 2009 H1N1 sequences obtained early during the pandemic (as of July 2009) and to evaluate whether such signature sequences could be used to design highly specific gene-specific primers to diagnose the pandemic 2009 H1N1 virus. To validate our approach, we assessed the specificity and accuracy of a TaqMan and a SYBR green-based qRT-PCR assay to distinguish the pandemic 2009 H1N1 influenza virus from any other influenza virus strain and compared it to other available assays. Finally, we evaluated the accuracy and robustness of the SYBR green assay for use as a diagnostic tool by utilizing 2009 H1N1 human clinical samples obtained during the pandemic outbreak in Chile.
All influenza virus genomic sequences, including types A, B, and C, that were available as of 20 July 2009 were downloaded from GenBank (Table 1). For clarity and ease in subsequent analyses, the type A influenza virus sequences were grouped into subgroups based on known serotypes (Table 2). To confirm that the signature sequences identified using the 20 July 2009 data collection were still conserved, we extended the collection to all influenza virus genomic sequences available from GenBank as of 30 April 2010. In addition, to avoid misidentifying non-flu virus sequences during the data analysis, an additional 23 sequences of viruses that produce symptoms similar to those of influenza virus infections and that were available on 20 July 2009 were also added to our sequence database (Table 3).
Target sequences were defined as the set of sequences of interest (i.e., the 2009 pandemic H1N1 isolates). In contrast, background sequences were defined as all other sequences in the collection that are not part of the target set (including seasonal H1N1 variants from previous years and other influenza virus strains). Table 4 shows the original list of sequences (downloaded on 20 July 2009) used as a target or background in each case. The analysis was revalidated with an updated list of sequences used as a target or background downloaded on 30 April 2010. All computational analyses were completed using a collection of in-house software developed at the University of Houston Center for BioMedical and Environmental Genomics (CBMEG) with the assistance of the publicly available BioEdit alignment software (11) in both Windows and Linux environments. The identification of signature islands was completed using an SGI Altix 3700 cluster running SUSE Linux Enterprise 10 with approximately 60 1.3-GHz Itanium2 processors and 512 GB RAM running as a single-system image. The first step used for finding sequence signatures specific to the target, the 2009 H1N1 virus, was to identify all regions of a given length (i.e., subsequences) that are conserved in the target set. The second step was to check the conserved sequences to confirm that they are not found in the background set, with the required number of mismatches (all possible combinations of insertions, deletions, and substitutions for all possible positions). We performed rigorous analysis of all signature sequences of lengths ranging from 16 to 22 nucleotides. Generally, shorter lengths produce more target signatures with fewer background mismatches, while longer lengths produce fewer target signatures with more mismatches in the background.
The results (not shown) indicated that a length of 20 nucleotides produced the desired balance between the quantity and quality of signatures obtained with two or more mismatches. Further analysis confirmed that only segment 4 and segment 6 were sufficient in this approach to design a signature specific to the 2009 H1N1 pandemic influenza virus subtype (Table 5). Once the signature sequences were identified, those that were next to each other were grouped into signature islands extending beyond 20 nucleotides. A total of 42 islands were identified in segments 4 and 6. These 42 islands were later revalidated using additional influenza virus sequences (collected on 30 April 2010) to confirm that they were still conserved across the set of target sequences and still composed of sequences with two or more mismatches away from the background.
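The two-step signature search and the grouping into islands can be illustrated with a minimal Python sketch. This is not the in-house CBMEG software, which is not described in detail here; the function names are hypothetical, the first target sequence is arbitrarily used as the coordinate reference, "conserved" is taken to mean an exact match in every target sequence, and "next to each other" is read as consecutive start positions. Mismatches against the background are counted as the smallest edit distance (substitutions, insertions, and deletions) between a candidate and any background subsequence.

```python
def min_edit_distance_to_substring(pattern, text):
    """Smallest edit distance between `pattern` and any substring of `text`.

    Approximate string matching by dynamic programming: the row for the
    empty pattern is all zeros, so a match may start anywhere in `text`.
    """
    prev = [0] * (len(text) + 1)
    for i, p in enumerate(pattern, 1):
        cur = [i] + [0] * len(text)
        for j, t in enumerate(text, 1):
            cur[j] = min(prev[j] + 1,              # pattern base left unmatched
                         cur[j - 1] + 1,           # extra base in the text
                         prev[j - 1] + (p != t))   # match or substitution
        prev = cur
    return min(prev)


def find_signatures(targets, backgrounds, k=20, min_mismatches=2):
    """Start positions (on targets[0]) of k-mers that pass both steps."""
    ref = targets[0]
    starts = []
    for s in range(len(ref) - k + 1):
        kmer = ref[s:s + k]
        # Step 1: the k-mer must occur in every target sequence.
        if not all(kmer in t for t in targets[1:]):
            continue
        # Step 2: every background sequence must be >= min_mismatches away.
        if all(min_edit_distance_to_substring(kmer, b) >= min_mismatches
               for b in backgrounds):
            starts.append(s)
    return starts


def group_islands(starts, k):
    """Merge signatures with consecutive starts into (start, end) islands.

    `end` is exclusive, so an island's length is end - start.
    """
    islands = []
    for s in sorted(starts):
        if islands and s == islands[-1][1] - k + 1:
            islands[-1] = (islands[-1][0], s + k)
        else:
            islands.append((s, s + k))
    return islands


# Toy sequences (hypothetical, far shorter than real genome segments).
toy_targets = ["GATTACAGGC", "GATTACAGGC"]
toy_background = ["GATTTTTTTT"]
toy_starts = find_signatures(toy_targets, toy_background, k=5)
toy_islands = group_islands(toy_starts, k=5)
```

This exhaustive scan is quadratic in sequence length for every candidate and is meant only to make the definitions concrete; an analysis over tens of thousands of segments, as performed here, would require indexing and parallel execution.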
The primers and fluorescent probe for the SYBR green and TaqMan assays for the NA gene of 2009 H1N1 were designed using Visual OMP and ThermoBLAST software (DNA Software Inc., Ann Arbor, MI). We decided to focus our primer design on the longest signature sequence islands (57 and 55 nucleotides long) found in the NA gene of segment 6, which were identified using the bioinformatics approach outlined above. We selected these long signature islands since they allow flexibility in choosing the length and position of primer sequences. By chance, the two islands happen to be 49 nucleotides apart and thus are amenable to producing an appropriately sized amplicon to allow for efficient amplification for real-time PCR using either SYBR green or TaqMan probe. Candidate primers and probes of various lengths within the signature islands of the NA gene of 2009 H1N1 were automatically generated by utilizing Visual OMP. Next, primer pairs were assembled from the candidate lists for reverse and forward primers so that they matched in hybridization affinity to ensure efficient amplification of sense and antisense target strands (18).
To ensure specificity, the best candidate primer and probe sets were then scanned against a series of sequence databases using the ThermoBLAST algorithm (18). ThermoBLAST also automatically identifies all potential false amplicons, which are a source of background and false-positive PCR assay results. The first databases used by ThermoBLAST are listed in Tables 1 and 2. The results were used to select the primer and probe set that was most specific and only amplified 2009 H1N1. The ThermoBLAST experiments provided an additional check, ensuring compatibility with all the 314 original H1N1 2009 NA genomes (Table 2) that were intended to be amplified by these primers. Lastly, ThermoBLAST was also used to verify that the designed primers would not produce false amplicons due to the genomes of 23 common respiratory tract flora and pathogens (15) and 22 gut flora (Tables 6 and 7, respectively) and the human genome, which would result in potential false-positive assay results or decreased assay sensitivity due to the presence of background amplification.
Primer sequences for amplification of segment 7 corresponding to the M gene were obtained from CDC through the WHO website (http://www.who.int/csr/resources/publications/swineflu/sequencing_primers/en/index.html). The primer sequences were checked for consensus by alignment of all available H1N1 sequences in the GenBank database. The final set of primers and probes for all assays is shown in Table 8.
Virus stocks used for experiments were all grown in Madin-Darby canine kidney (MDCK) cells that were maintained in minimum essential medium supplemented with 10% fetal bovine serum (HyClone, Logan, UT) and penicillin-streptomycin (Cellgro, Manassas, VA). All other reagents for cell culture were purchased from Gibco Life Technologies (Invitrogen, Carlsbad, CA). Viruses used in this study for testing assay specificity and sensitivity consisted of the following older human seasonal and swine H1N1 influenza viruses: A/Swine/Iowa/30 (Sw/30), A/Puerto Rico/8/1934-MSSM (PR8), A/Weiss/1943 (Wei/43), A/FLW/1952 (FLW/52), A/Denver/1957 (Den/57), A/New Jersey/8/1976 (NJ/76), A/USSR/92/1977 (USSR/77), A/Finland/13/1980 (Fin/80), A/Houston/20593/1984 (Hou/84), A/Colorado/1/1989 (Col/89), A/Texas/36/1991 (Tx/91), A/New Caledonia/20/1999 (NewCal/99), and A/Brisbane/59/2007 (Bris/59/07). Three pandemic 2009 H1N1 influenza viruses were also used: A/Mexico/4108/2009 (Mex/09); A/California/04/2009 (Cal/09), distributed by the U.S. CDC as a reference isolate; and A/Netherlands/602/2009 (Neth/09), which was used as a European reference isolate. A/Northern Territory/60/1968 (NT/68) and A/Brisbane/10/2007 (Bris/10/07) were used as H3N2 controls. All experiments involving in vitro work with live 2009 H1N1 viruses were conducted under biosafety level 2 (BSL-2) with BSL-3 practices, in accordance with the guidelines of the Centers for Disease Control and Prevention.
Primary human samples were obtained from 125 subjects who presented with influenza-like illness to the Pontificia Universidad Católica de Chile Hospital between 26 May and 3 August 2009 and from whom specimens were collected through nasopharyngeal swabs or tracheal aspirates. Both types of samples were resuspended in 1 to 2 ml of viral transport media (VTM; 7.5% sucrose, 0.05% potassium acid phosphate, 0.12% sodium acid phosphate, 0.07% glutamic acid, 0.05% bovine serum albumin, and 50 μg/ml gentamicin) and stored for 24 h at 4°C before being further processed. This retrospective study included 117 samples that tested positive for influenza virus A, based on the LightMix Kit influenza A virus M2 assay, and were randomly selected from an anonymous repository at Laboratorio de Infectología y Virología Molecular at Pontificia Universidad Católica de Chile, without any bias regarding age or gender of the donor. Of these 117 samples, 21 samples tested negative for the 2009 H1N1 virus using the LightMix kit Inf A Swine H1 (Roche, Santiago, Chile). Additionally, eight samples from the same repository that tested negative by both of these assays were included as negative controls. The limit of detection for these assays was set to cycle 35 (~10^3 copies of influenza A virus M2 cDNA, provided in the LightMix kit), based on the manufacturer's protocol and on empirical tests conducted at the Laboratorio de Infectología y Virología Molecular at Pontificia Universidad Católica de Chile.
Viral RNA present in 200 μl of the respiratory human samples above was extracted with the High Pure viral nucleic acid kit (Roche, Santiago, Chile) according to the manufacturer's instructions. We prepared standard curve samples utilizing a negative swab suspended in 1 ml of phosphate-buffered saline spiked with 1 × 10^7 PFU/ml of the Neth09 strain, which was then serially diluted from 1 × 10^7 to 1 × 10^1 PFU/ml. Next, 280-μl aliquots of culture supernatant containing viruses grown on MDCK cells or of diluted standard curve samples were used to extract viral RNA by using the QIAamp viral RNA extraction kit (Qiagen, Valencia, CA) following the instruction manual.
One to 5 μl of viral RNA extracted from culture supernatants with the QIAamp viral RNA extraction kit was used in conventional RT-PCR assays to amplify the NA gene. We used the Access Quick RT-PCR kit (Promega, Madison, WI) and 400 nM concentration (each) of 2009 H1N1 NA signature primers (Table 8) to reverse transcribe and amplify 2 μl of viral RNA, in a final volume of 25 μl, in a Mastercycler Pro S thermocycler (Eppendorf, Hauppauge, NY) under the following cycling conditions: an RT step at 48°C for 30 min and 94°C for 2 min; 30 cycles of 94°C for 30 s, 60°C for 30 s, and 72°C for 30 s; and a final extension step of 5 min at 72°C and a soak step at 4°C. Products were run on a 1.2% agarose gel and visualized using ethidium bromide staining and UV light. For the M gene, we used the same cycling conditions described above for the NA gene amplification, except that the extension step was done for 1 min. The 5′ ends of the forward and reverse primers contain the M13 forward and reverse sequences, respectively. Thus, all M segment PCR products were excised from the gel, extracted with a QIAquick gel extraction kit (Qiagen, Valencia, CA), and sequenced directly with these two primers to confirm virus strain identity.
A 1.5-μl aliquot of viral RNA extracted from human samples was reverse transcribed using the Moloney murine leukemia virus reverse transcriptase (Invitrogen, Carlsbad, CA) and random hexamers according to the manufacturer's instructions. We utilized 2.5 μl of cDNA in a final volume of 20 μl to perform PCR amplification of the M2 and H1 genes, using the LightMix Kit influenza A virus M2 and LightMix kit Inf A Swine H1 (Roche, Santiago, Chile), respectively, in a LightCycler 2.0 real-time PCR system using reagents and cycling conditions recommended by the manufacturer: 95°C for 10 min; 40 cycles of 90°C for 5 s, 60°C for 5 s, and 72°C for 15 s; followed by a step at 72°C for 5 min and a final melting curve analysis.
The SYBR green assays using the 2009 H1N1 NA signature primers were performed in two formats: a 96-well plate assay was done using a CFX96 real-time PCR detection system (Bio-Rad, Hercules, CA) in a final volume of 25 μl, and a 384-well plate assay was performed in a LightCycler 480 real-time PCR system (Roche, Indianapolis, IN) in a 10-μl final reaction volume. Both assays were conducted using the SuperScript III Platinum SYBR green one-step qRT-PCR kit (Invitrogen, Carlsbad, CA), 1 to 2 μl of viral RNA, 200 nM each primer, and the following cycle conditions: an RT step at 55°C for 3 min, 95°C for 5 min, 40 cycles of 95°C for 15 s, 60°C for 30 s, and 40°C for 1 min, followed by a melting curve to confirm product specificity. The fluorescent probe-based 2009 H1N1 NA signature-specific assay was conducted with the same cycling conditions and volumes described above for either the CFX96 real-time PCR detection system (Bio-Rad, Hercules, CA) or the LightCycler 480 real-time PCR system, using the SuperScript III Platinum one-step quantitative RT-PCR kit (Invitrogen, Carlsbad, CA), with 200 nM each primer and 100 nM fluorescently labeled probe. The limit of detection for these assays was evaluated empirically and set to cycle 35 (i.e., samples were deemed positive when the cycle threshold [CT] was ≤35). The CDC 2009 H1 assay (7) and the 2009 H1N1 N1 assay (22) were performed as described previously, except that a final reaction volume of 10 μl was used for direct comparison across all assays performed. Briefly, the CDC 2009 H1 assay was run using 800 nM each primer, 200 nM fluorescent probe, and the following cycle conditions: 50°C for 30-min hold, 95°C for 2-min hold; 45 cycles of 95°C for 15 s and 55°C for 30 s. The limit of detection for this assay was set to cycle 37 (CT, ≤37) as recommended in the CDC assay protocol (7).
The 2009 H1N1 N1 assay was carried out with 600 nM each primer, 200 nM fluorescent probe, and cycle conditions as follows: 60°C for 30-min hold, 95°C for 5-min hold; 45 cycles of 95°C for 20 s and 60°C for 1 min. Primers used for the qRT-PCR assays are listed in Table 8. In all instances individual samples were run in triplicate for each qRT-PCR assay described, and assays were performed by a single highly trained researcher. Except for standard samples, experiments assessing the clinical accuracy of the assays with human clinical samples were performed blinded. All assays were evaluated for reproducibility by assessing the slopes obtained with a standard curve that was included on each run.
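As an illustration of how a standard curve slope can be assessed, the following Python sketch fits CT against log10 of the input titer by ordinary least squares and converts the slope into an amplification efficiency with the commonly used relation E = 10^(-1/slope) - 1. The function names and the CT values are hypothetical and are not data from this study; they were chosen only so that the example has a slope of -3.32, which corresponds to roughly 100% efficiency.

```python
import math


def fit_standard_curve(titers, ct_values):
    """Least-squares fit of CT = slope * log10(titer) + intercept.

    Returns (slope, intercept, r_squared).
    """
    xs = [math.log10(c) for c in titers]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ct_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (slope * x + intercept)) ** 2
                 for x, y in zip(xs, ct_values))
    ss_tot = sum((y - mean_y) ** 2 for y in ct_values)
    return slope, intercept, 1.0 - ss_res / ss_tot


def amplification_efficiency(slope):
    """Efficiency per cycle; 1.0 means perfect doubling."""
    return 10 ** (-1.0 / slope) - 1.0


def titer_from_ct(ct, slope, intercept):
    """Invert the standard curve to estimate the titer of an unknown."""
    return 10 ** ((ct - intercept) / slope)


# Hypothetical 10-fold dilution series (PFU/ml) and CT values.
example_titers = [1e3, 1e4, 1e5, 1e6, 1e7]
example_cts = [32.00, 28.68, 25.36, 22.04, 18.72]
slope, intercept, r_squared = fit_standard_curve(example_titers, example_cts)
efficiency = amplification_efficiency(slope)
```

Comparing the slope and r² values from the curve included on each run gives a simple numerical check of run-to-run reproducibility.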
To validate and assess the performance of the 2009 H1N1 NA signatures SYBR green assay compared to the CDC 2009 H1 assay, we performed a two-tailed unpaired Student t test (α = 0.05) to evaluate whether the CT values obtained for the clinical samples by the two assays were statistically different.
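For readers unfamiliar with the test, a two-tailed unpaired Student t test with pooled variance can be sketched in plain Python as follows. The function names and the sample CT values are hypothetical; the P value is obtained by numerically integrating the t density with Simpson's rule, which is an implementation choice made here to avoid external dependencies and is not a description of the software used in this study.

```python
import math
import statistics


def t_pdf(x, df):
    """Density of Student's t distribution with `df` degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def two_tailed_p(t, df, steps=2000):
    """P(|T| >= |t|), using Simpson's rule on [0, |t|] (steps must be even)."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = total * h / 3          # P(0 <= T <= |t|)
    return max(0.0, 1.0 - 2.0 * central_area)


def unpaired_t_test(a, b):
    """Equal-variance (pooled) two-sample t test; returns (t, df, p)."""
    n_a, n_b = len(a), len(b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * statistics.variance(a)
                  + (n_b - 1) * statistics.variance(b)) / df
    se = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t = (statistics.mean(a) - statistics.mean(b)) / se
    return t, df, two_tailed_p(t, df)


# Hypothetical CT values for two assays (not data from this study).
assay_a = [1.0, 2.0, 3.0, 4.0, 5.0]
assay_b = [2.0, 3.0, 4.0, 5.0, 6.0]
t_stat, dof, p_value = unpaired_t_test(assay_a, assay_b)
significant = p_value < 0.05   # compare against alpha = 0.05
```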
To develop a sensitive and specific assay for 2009 pandemic H1N1, we employed a combination of comprehensive sequence databases; algorithms for locating signature sequences, designing primers, and scanning for false-positive amplicons; and empirical optimization. In our approach, we combined sequence-based and thermodynamics-based bioinformatics methods in both our positive and negative design of specificity determinations. In this study, a signature sequence was defined as a stretch of 20 nucleotides (i.e., a subsequence) that was conserved among 2,419 sequence segments of 2009 pandemic H1N1 but that was distinct by at least two mismatches from the closest subsequence(s) present in 69,370 nonpandemic 2009 H1N1 influenza virus segments available in GenBank (as of 20 July 2009) (Table 4). These signature sequences were later further validated with sequences obtained on 30 April 2010 that corresponded to 15,777 target sequence segments of 2009 pandemic H1N1 and 74,420 background segments that were nonpandemic 2009 H1N1 virus. Multiple neighboring signature sequences were identified and grouped together into signature islands that were longer than 20 nucleotides. Notably, we determined that segments 4 and 6, which correspond to the HA and the NA genes, respectively, contained several unique signature islands that are conserved across all 2009 H1N1 sequences analyzed (Table 5). These signature islands were confirmed in an identical secondary analysis conducted with sequences downloaded on 30 April 2010 that included 2,582 HA and 2,300 NA sequences, validating the accuracy of the initial analysis.
To design optimal gene-specific primers and a TaqMan fluorescent probe, we used Visual OMP and employed the ThermoBLAST software tool to check that the designed primers did not inadvertently amplify regions of common contaminating DNA, such as human genome, respiratory tract flora, and gut flora among others (Tables 6 and 7). This comprehensive approach allowed the design and development of unique qRT-PCR assays using specific 2009 H1N1 NA signature primers and probes (Table 8), with great potential for high specificity for detecting and diagnosing the novel 2009 H1N1 influenza virus.
To establish whether the designed 2009 H1N1 NA gene signature primers allowed sufficient specificity to discriminate novel pandemic 2009 H1N1 strains from those of other H1N1 viruses previously circulating in the human population, we used a conventional RT-PCR to amplify representative H1N1 strains spanning from 1930 to 2009. Amplification of the expected product (129 bp) was only achieved for the three reference isolates (Mex09, Cal09, and Neth09) of pandemic 2009 H1N1 used. The signature primers did not amplify any of the previously circulating H1N1 strains or the H3N2 strains used as negative controls (Fig. 1A). A parallel experiment in which the same viral RNAs were subjected to RT-PCR with influenza virus strain generic primers to the full-length M gene showed the correct amplification product for all the virus strains, confirming that the NA signature primers are highly discriminating and only detect pandemic H1N1 isolates (Fig. 1A). To evaluate the suitability of the primers in a real-time qRT-PCR assay without loss of specificity for the pandemic 2009 H1N1, we performed a SYBR green one-step qRT-PCR assay with our panel of H1N1 virus strains and the H3N2 controls. The 2009 H1N1 NA signature primers showed high specificity for the 2009 H1N1 strains and no specific amplification of product for any of the other strains (Fig. 1B). This was confirmed by a melting curve run at the end of the assay (Fig. 1C), which only showed a corresponding curve for the 2009 H1N1 reference strains. A minor level of background amplification was observed for the negative samples on cycles 36 to 40 (Fig. 1B). Altogether, these data validated the high level of specificity of the primers designed.
We assessed the sensitivity of the SYBR green assay by comparing its performance to an assay based on the same primer set but with the addition of a matching internal labeled probe (Table 8). We ran both assays with a set of standard RNA samples (1 × 10^1 to 1 × 10^7 PFU/ml) of the Neth09 strain and tested the linearity of the reactions. With the SYBR green assay, amplification of the correct product was obtained with a dilution as low as 1 × 10^3 PFU/ml (equivalent to ~4.6 PFU when using 2 μl of the extracted RNA in the reaction mixture), and consistent linearity was observed with samples ranging from 1 × 10^3 to 1 × 10^7 PFU/ml (Fig. 2A and B). The amplification curves obtained for the same standards with the probe-based assay resulted in higher CT values. However, virtually the same results were obtained with this assay, since amplification of the standards also showed a linear relation (Fig. 2C and D). In addition, results of a relative quantification of a Neth09 control sample by both the SYBR green and the TaqMan probe-based methods were in close agreement with each other (3.75 × 10^8 and 4.09 × 10^8 PFU/ml, respectively). These data thus suggested no apparent loss of sensitivity when the NA sequence signature primers were used on their own; we then sought to evaluate this formally by comparing the performance to other available assays.
The CDC 2009 H1 probe-based assay is the only Food and Drug Administration-authorized assay for which the primer and probe sequences are publicly available (7); we thus used this assay to assess the sensitivity of our assay. For additional comparison, we also used a recently published sensitive probe-based diagnostic assay for the NA gene of the 2009 H1N1 pandemic strain (22) (Table 8, 2009 H1N1 N1 assay). We utilized a set of 10-fold serially diluted standard samples (as above), as well as the three reference strains, to evaluate the sensitivity of each assay and its ability to quantify viral RNA samples of known titers. As before, we found that although lower CT values were obtained with the 2009 H1N1 NA signatures SYBR green assay, indicating high discriminating sensitivity for the three reference strains used, the sensitivity was equivalent to that of any of the probe-based assays used (Table 9). The reliable limit of detection for all assays was 1 × 10^3 PFU/ml, which demonstrates equivalent detection capabilities for all assays under these settings, and linearity of the reactions was observed for all assays and conditions used.
To assess the clinical diagnostic performance of the 2009 H1N1 NA signatures SYBR green assay conducted with the sequence signature primers, we used a panel of RNAs extracted from 125 human nasopharyngeal swabs or tracheal aspirates obtained from individuals who presented with influenza-like symptoms during the 2009 H1N1 pandemic virus outbreak in Chile, between 26 May and 3 August 2009. We found that the 2009 H1N1 NA signatures SYBR green assay was highly efficient in diagnosing positive samples of the 2009 H1N1 virus (Fig. 3), as determined by the CDC 2009 H1 assay. Indeed, the 2009 H1N1 NA signatures SYBR green assay detected 95/125 positive and 30/125 negative samples, compared to 92/125 positive and 33/125 negative samples that were detected by the CDC 2009 H1 assay, when the limit of detection was set to CT values of 35 and 37, respectively (Fig. 3). In general, with the 2009 H1N1 NA signatures SYBR green assay, positive samples had CT values (detection mean ± standard deviation of the CT, 29.48 ± 2.5) that were similar to those obtained with the CDC probe-based assay (detection mean ± standard deviation of CT, 29.95 ± 3.18), allowing accurate discrimination of positive amplicons. Statistical analysis of the results obtained with both assays showed no significant differences among the assays (P = 0.2574), indicating that the 2009 H1N1 NA signatures SYBR green assay performed as well as the probe-based CDC 2009 H1 assay.
By using a combination of comprehensive sequence databases, algorithms, primer design tools, scanning tools for avoiding false-positive amplicons, and empirical optimization in this study, we are able to report the design, development, and validation of a robust qRT-PCR assay to diagnose and distinguish the pandemic 2009 H1N1 influenza virus from other influenza virus strains. We identified signature sequence stretches of 20 nucleotides conserved among 15,777 sequence segments of 2009 pandemic H1N1 but that are distinct by at least two mismatches from the closest subsequence(s) present in 74,397 nonpandemic 2009 H1N1 influenza virus segments available in GenBank (as of 30 April 2010). Notably, we showed that segments 4 and 6 contained several unique signature islands conserved across all 2009 H1N1 sequences analyzed and that the NA gene contained two proximal islands suitable for primer design and PCR amplification. Using Visual OMP we designed optimal gene-specific primers and a TaqMan fluorescent probe and utilized ThermoBLAST to prevent the inadvertent amplification of regions of common contaminating DNA (e.g., the human genome, respiratory tract flora, and gut flora), resulting in a highly specific and reliable TaqMan assay to diagnose the pandemic 2009 H1N1 virus; our assay performed similarly to the CDC 2009 H1 assay and other published assays (7, 22).
A critical decision for primer design is the choice of the region of the analyte to detect, which is best accomplished using a bioinformatics approach to deduce signature sequences (17). Similarly, optimal design of an assay requires “positive design” for the desired analytes (i.e., all variants of 2009 pandemic H1N1) and “negative design” against detection of false analytes, such as previous seasonal H1N1 strains, other influenza A virus strains, such as H3N2, influenza virus B and C strains, and other viruses, respiratory and gut flora, and the human genome. In this study the H1N1 2009 specific signature islands were identified using data sets obtained in July 2009, at the early stages of the pandemic (Tables 1 and 2). Our design was focused on the NA gene, since it contained two proximal islands that were suitable for PCR amplification, resulting in a product size compatible for the development of a qRT-PCR assay. Primers were designed to be H1N1 2009 specific: their 3′ end or penultimate nucleotide was purposefully designed to hybridize to a nucleotide that is mutated in the set of background sequences. This ensures the primers will only be extended by polymerase and therefore form an amplicon if the target is NA H1N1 2009. Within the protein coding regions, the third base of a codon is the most variable during viral evolution; thus, the 3′ ends of the primers were also purposefully designed to not hybridize to any codon's third base. This strategy ensures that these primers will likely work well for future evolving strains of H1N1 2009. The strength of this strategy was demonstrated by our validation analyses conducted with data sets obtained in April 2010 that included seven times more NA segment sequences than the original data set. We assembled the primer pairs from the candidate lists for reverse and forward primers so that they matched in hybridization affinity to ensure efficient amplification of sense and antisense target strands (18).
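The two 3′-end design rules described above can be expressed as simple checks. The Python sketch below is an illustration only and does not reproduce the Visual OMP design procedure; the function names, the coordinate convention (0-based positions on the coding strand, with the reading frame starting at position 0 unless stated otherwise), the assumption of a gap-free pairwise alignment between target and background, and the toy sequences are all hypothetical.

```python
def three_prime_index(start, length, strand="+"):
    """Coding-strand coordinate of the primer's 3'-terminal base.

    `start` is the leftmost coding-strand position covered by the primer.
    A forward ("+") primer ends at its rightmost position; a reverse ("-")
    primer, which anneals to the coding strand, ends at its leftmost one.
    """
    return start + length - 1 if strand == "+" else start


def codon_position(index, cds_start=0):
    """Return 1, 2, or 3: the position within its codon of a coding base."""
    return (index - cds_start) % 3 + 1


def avoids_third_base(start, length, strand="+", cds_start=0):
    """True if the 3'-terminal base does not pair with a codon third base."""
    end = three_prime_index(start, length, strand)
    return codon_position(end, cds_start) != 3


def discriminates_at_3prime(target, background, start, length, strand="+"):
    """True if target and background differ at the 3' or penultimate base."""
    end = three_prime_index(start, length, strand)
    penultimate = end - 1 if strand == "+" else end + 1
    return (target[end] != background[end]
            or target[penultimate] != background[penultimate])


# Toy aligned coding sequences differing only at position 8 (hypothetical).
toy_target = "ATGGCTAAACCC"
toy_background = "ATGGCTAAGCCC"

# A forward primer covering positions 3..9 ends on a codon first base and
# places the variable position at its penultimate nucleotide.
ok_codon = avoids_third_base(3, 7)
ok_mismatch = discriminates_at_3prime(toy_target, toy_background, 3, 7)
```

In practice, such checks would be applied against every background sequence in the collection rather than a single one, and they complement rather than replace the thermodynamic screening.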
For optimal sensitivity of PCR amplification, we also considered the thermodynamics of primer hybridization, primer dimers, and competing secondary structures (18, 19). Most researchers have attempted to optimize the two-state melting temperature (Tm) of primers. However, such an approach neglects the effects of competing target secondary structure, formation of primer and probe secondary structure (i.e., hairpins), and formation of competing self-dimers, primer-dimers, or strong probe-primer interactions. The algorithms in Visual OMP rigorously account for these effects in the design (18, 19). The thermodynamic contributions of the 5′ 6-carboxyfluorescein (FAM) fluorophore and 3′ Black hole quencher 1 (BHQ-1) labels are also fully accounted for in the algorithms in Visual OMP and thus were considered during the design of the TaqMan probe. The effective Tm values (which account for competing secondary structures) of the selected primers are 67°C and 68°C; this leads to >99% hybridization of primers to intended target strands at the annealing temperature of 60°C, even for targets at very low initial concentrations. The probe was designed to hybridize to the amplicon generated by the primer set. The effective Tm of the probe was designed to be 72°C. In addition, mishybridization events to other parts of the NA gene are automatically prevented by the design algorithm. Finally, the ThermoBLAST algorithm also allows for one or more oligonucleotides to be rapidly scanned against a database of genomic sequences by utilizing a scoring function that is based on the thermodynamics of Watson-Crick matched or mismatched hybridization rather than sequence similarity, as is done with the traditional BLAST program (1). 
We utilized this tool to screen for potential nonspecific hybridization to the human genome (e.g., human cells) and common microorganisms that are typically present in nasopharyngeal samples collected from individuals for influenza virus diagnostics (Tables 3, 6, and 7).
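The relationship between a primer's Tm and the fraction of target hybridized at the annealing temperature can be illustrated with a two-state van't Hoff calculation. This is a deliberately simplified sketch and not the multistate model implemented in Visual OMP: it treats the effective Tm as the midpoint of a two-state transition with primer in excess, and the hybridization enthalpy of -160 kcal/mol is an assumed, illustrative value for a primer of roughly this length rather than a value from the study.

```python
import math

R_CAL = 1.987  # gas constant in cal/(mol*K)


def fraction_bound(tm_c: float, anneal_c: float, delta_h_cal: float) -> float:
    """Two-state estimate of the fraction of target strands bound by primer.

    With primer in excess, K*[primer] = 1 at Tm, so the van't Hoff relation
    gives ln(K*[primer]) = -(dH/R) * (1/T - 1/Tm) at temperature T.
    delta_h_cal is the hybridization enthalpy in cal/mol (negative).
    """
    tm_k = tm_c + 273.15
    t_k = anneal_c + 273.15
    ln_ratio = -(delta_h_cal / R_CAL) * (1.0 / t_k - 1.0 / tm_k)
    ratio = math.exp(ln_ratio)  # bound:unbound ratio
    return ratio / (1.0 + ratio)


# Assumed enthalpy (illustrative only, not from the study).
ASSUMED_DELTA_H = -160_000.0  # cal/mol

if __name__ == "__main__":
    for tm in (67.0, 68.0):
        print(tm, fraction_bound(tm, 60.0, ASSUMED_DELTA_H))
```

Under this assumed enthalpy, a Tm of 67°C or 68°C gives a bound fraction above 0.99 at 60°C, consistent with the >99% figure quoted above; a smaller enthalpy magnitude would give a lower fraction.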
The experimental validation of the designed primer and probe set demonstrated a high level of specificity and accuracy for amplification of the 2009 H1N1 pandemic influenza virus strain (Fig. 1). Importantly, we found that the designed primer set alone (i.e., without the fluorescent probe) can also be used reliably in a SYBR green assay with no loss of specificity and sensitivity compared to the TaqMan assay (Fig. 1 and 2). A minor level of background amplification was observed toward the last cycles (>36 cycles) of the SYBR green assay. However, melting curves conducted to confirm the PCR product indicated this was likely due to primer-dimer formation toward the end of the run, and thus, for further evaluation of the primers, the limit of detection for all assays was set to cycle 35. Of note, the CT values obtained for the same standard curve samples when the 2009 H1N1 NA signatures probe assay was used were higher than when the SYBR green assay was performed. Nevertheless, linearity and sensitivity were not affected for these assays under the conditions used and were also similar when both the CDC 2009 H1 and 2009 H1N1 N1 assays were performed. Interestingly, in contrast to the results obtained with the 2009 H1N1 NA signature primers in the SYBR green assay, relative quantification of the Mex09 isolate was poor when any of the probe-based assays were used. This might indicate a loss of sensitivity for amplifying this particular isolate by those assays, and thus an isolate-specific standard curve will be needed to achieve a more accurate quantification.
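Linearity of a qRT-PCR standard curve is commonly summarized by regressing CT on the log10 of the template amount and converting the slope to an amplification efficiency. The following is a minimal sketch of that standard calculation; the function name is hypothetical, and the dilution series used in the example is synthetic and purely illustrative, not data from this study.

```python
import math
from typing import List, Tuple


def fit_standard_curve(copies: List[float],
                       ct: List[float]) -> Tuple[float, float, float, float]:
    """Least-squares fit of CT against log10(template copies).

    Returns (slope, intercept, r_squared, efficiency), where
    efficiency = 10**(-1/slope) - 1 (1.0 means perfect doubling per cycle).
    """
    x = [math.log10(c) for c in copies]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(ct) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in ct)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, ct))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy ** 2) / (sxx * syy)
    efficiency = 10 ** (-1.0 / slope) - 1.0
    return slope, intercept, r_squared, efficiency


if __name__ == "__main__":
    # Synthetic 10-fold dilution series (illustrative values only).
    synthetic_copies = [1e2, 1e3, 1e4, 1e5]
    synthetic_ct = [40.0 - 3.32 * math.log10(c) for c in synthetic_copies]
    print(fit_standard_curve(synthetic_copies, synthetic_ct))
```

A slope near -3.32 corresponds to an efficiency near 100%, and an isolate-specific curve of this kind is what the quantification of a divergent isolate would require.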
Validation of the 2009 H1N1 NA signatures SYBR green assay with a set of 125 human isolates collected during the outbreak of 2009 H1N1 in Chile (between May and August 2009) showed equivalent performance to the CDC 2009 H1 assay. The assay was highly accurate and sensitive for diagnosing 2009 H1N1 clinical samples, including the detection of samples that were not identified by the CDC 2009 H1 assay (Fig. 3). This assay therefore also provides an additional tool for corroborating negative samples and minimizing false-negative diagnoses.
Overall, our data indicate that both the 2009 H1N1 NA signatures SYBR green assay and the TaqMan assays are highly sensitive and specific for use in clinical diagnosis of the 2009 H1N1 pandemic influenza virus. Thus, these data validate the significance of our combined bioinformatics approach in the wake of a pandemic, where sequence signatures were used to identify optimal regions to design specific and sensitive primers for establishing a reliable, accurate, and robust qRT-PCR assay suitable for human clinical diagnosis of an important viral pathogen. Additionally, the availability of such a highly sensitive 2009 H1N1 NA SYBR green assay for influenza virus diagnosis provides a novel and cheaper alternative to the more expensive fluorescent probe-based assay that could be widely used for human diagnosis, screening, and surveillance of pandemic 2010-2011 H1N1 influenza A viruses, particularly in underdeveloped regions of the world or in research and clinical settings with limited resources.
We thank the CDC for providing us with influenza A/California/04/09 and the A/Mexico/4108/2009 viruses and R. Fouchier for providing us with influenza A/Netherlands/602/09 virus. We are grateful to Richard Cadagan and Andres Rivero for invaluable and excellent technical assistance during the course of this study.
This work was partially supported by CRIP, an NIAID-funded Center for Research in Influenza Pathogenesis (contract number HHSN266200700010C), and by NIAID grants P01AI058113 and U54AI057158. Development of Visual OMP and ThermoBLAST was supported by NIH grants R44 HG002555, R44 HG GM086968, and R44 HG003923 and Department of Homeland Security and Technology Directorate contract NBCHC070096. The design approach and bioinformatics analysis were partially supported by the Department of Homeland Security Science and Technology Directorate, awards NBCHC070063 and NBCHC070054, a training fellowship from the Keck Center's Training Program in Biomedical Informatics, the National Science Foundation through the Rice-Houston Alliance for Graduate Education and Professoriate, and a Science, Mathematics and Research for Transformation scholarship.
This published material represents the position of the authors and not necessarily that of the Department of Homeland Security.
Published ahead of print on 17 November 2010.
†The authors have paid a fee to allow immediate free access to this article.