For influenza viruses, pyrosequencing has been successfully applied to the high-throughput detection of resistance markers in genes encoding the drug-targeted M2 protein and neuraminidase. In this study, we expanded the utility of this assay to the detection of multiple receptor binding variants of the hemagglutinin protein of influenza viruses directly in clinical specimens. Specifically, a customized pyrosequencing protocol that permits detection of virus variants with the D, G, N, or E amino acid at position 222 in the hemagglutinin of the 2009 pandemic influenza A (H1N1) virus was developed. This customized pyrosequencing protocol was applied to the analysis of 241 clinical specimens. The use of the optimized nucleotide dispensation order allowed detection of mixtures of variants in 10 samples (4.1%) which the standard cyclic nucleotide dispensation protocol failed to detect. The optimized pyrosequencing protocol is expected to provide a more accurate tool in the analysis of virus variant composition.
The influenza virus is an enveloped single-stranded negative-sense RNA virus that is an important respiratory pathogen which causes annual epidemics and occasional pandemics (27). As a result of a high mutation rate of viral RNA polymerase (4, 19), the influenza virus exists as a quasispecies, a complex mixture of closely related genomes. When selective pressure is applied, variants carrying favorable changes gain growth advantage and establish dominance (16). Influenza virus hemagglutinin (HA) is responsible for binding to cellular receptors and for membrane fusion during the virus entry into the host cell (23). Mutations in HA can allow the virus to adapt to a new host and evade the human host immune system (25) and can be responsible for altered receptor specificity, virulence, and other traits. Therefore, the ability to detect HA variants, even when they are present at a low frequency within the virus population, is desirable.
Currently, there are several assays that allow the monitoring of mutations at specific nucleotides. Sanger sequencing is considered the “gold standard” for analysis of virus genome variations (26). However, the interpretation of sequencing chromatograms can be difficult for mixed-genome variants containing insertions, deletions, or multiple nucleotide substitutions (23). Sanger sequencing (31), high-resolution melting curve (HRM) analysis (14), real-time quantitative reverse transcriptase PCR (qRT-PCR) (2), rolling circle amplification (RCA) (28), single-strand conformation polymorphism (SSCP) (10), resequencing array (29), and, recently, pyrosequencing (3, 7, 18, 20) have been employed for the analysis of specific mutations in genomes of different viruses.
Pyrosequencing technology is based on luminometric detection of the light signal produced by luciferase (21). The reliability and flexibility in assay development have made pyrosequencing a widely used method for various screening and diagnostic applications (13). Pyrosequencing platforms have been useful for high-throughput sample screening for molecular markers associated with drug resistance or other traits (3, 7, 12). Recently, pyrosequencing was used for the timely detection of resistance to adamantanes and oseltamivir during the 2009 influenza A virus (H1N1) pandemic (pH1N1) (6).
Although pyrosequencing is efficient and well suited for the analysis of short sequences, the technology is still fairly new and there are challenges for certain applications. For example, it was shown for the KRAS oncogene that the presence of a mixture containing more than two major genomic variants can complicate pyrogram interpretation (26). Such mixtures can also cause inaccurate identification of single nucleotide polymorphisms (SNPs).
Identification of molecular markers of virulence in influenza viruses remains a challenging task. The emergence and rapid global spread of a novel pH1N1 influenza virus in 2009 highlighted the need for high-throughput analysis to detect new variants of this rapidly evolving pathogen. Overall, the pH1N1 virus caused mild illness in most cases; however, numerous severe and fatal cases were also reported (30). Analysis of the HA gene of the 2009 pH1N1 virus revealed several mutations (Table 1) at the aspartic acid (D) residue at position 222 (225 in H3 numbering). An association of clinically severe cases with the D222G substitution (GAT → GGT) in the HA1 subunit was reported (11), although no such association was found in other studies (1). Also, a considerable frequency of N222 variants (GAT → AAT) was found among severe and fatal cases (11, 17). Notably, changes at D222 have been associated with adaptation of a seasonal, nonpandemic human H1N1 influenza A virus to a new environment (e.g., adaptation to mice and embryonated chicken eggs) because this amino acid is a part of the receptor binding site (8, 9, 24). To further investigate the potential role of amino acid substitutions at position 222, it was highly desirable to develop an assay that would allow accurate, high-throughput detection of multiple variants within a clinical specimen. In this report, we describe the use of a customized dispensation order of nucleotides for pyrosequencing that enables the accurate detection of HA variants in pH1N1 viruses.
Seven pandemic A (H1N1) influenza virus isolates and 241 clinical specimens (throat swabs, nasal swabs and washes, nasopharyngeal swabs, and sputum specimens) submitted to the Centers for Disease Control and Prevention (CDC) between November 2009 and January 2010 as part of the U.S. virus surveillance activity were used for this study.
To detect mutations at the 222 position of the HA, three primers were designed with the use of PSQ Assay Design software, version 1.0.6 (Qiagen, Valencia, CA). The primers SW-HA-F696 (5′-CAAGAAGTTCAAGCCGGAAATAGC-3′, forward primer) and SW-HA-R799b (biotin-5′-ATTGCGAATGCATATCTCGGTAC-3′, reverse biotinylated primer) were used to amplify DNA fragments by RT-PCR. The primer SW-HA-F715 (5′-CAATAAGACCCAAAGTGAGG-3′, sequencing primer) was applied for detecting the target region.
Viral RNA was extracted from 100 μl of sample using either a MagNA Pure Compact or MagNA Pure LC 2.0 nucleic acid isolation system (Roche Diagnostics, Basel, Switzerland). RNA was eluted in 50 μl and stored at −30°C. A SuperScript III one-step RT-PCR system with Platinum Taq High Fidelity enzyme (Invitrogen, Carlsbad, CA) was used for cDNA synthesis and amplification as described by the manufacturer. Briefly, the reaction was initiated at 50°C for 30 min. Then, Taq DNA polymerase was heat activated at 94°C for 2 min. This was followed by 45 cycles at 94°C for 15 s, 50°C for 30 s, and 68°C for 1 min. The final elongation time was 5 min. Primers SW-HA-F696 and SW-HA-R799b were used for RT-PCR at a final concentration of 0.4 μM. The amplified RT-PCR products were visualized by electrophoresis on 2% agarose E-gels (Invitrogen, Carlsbad, CA).
Primers designed for the RT-PCR step of pyrosequencing (SW-HA-F696 and nonbiotinylated SW-HA-R799) were used to amplify the cDNA fragment for Sanger sequencing. Amplified RT-PCR products were purified using ExoSAP-IT reagent (USB, Cleveland, OH). Sequence template was synthesized with an ABI Prism BigDye Terminator kit (Applied Biosystems, Foster City, CA). Products of sequencing reactions were treated with XTerminator solution (Applied Biosystems, Foster City, CA), according to the manufacturer's protocol. Sequences generated in an ABI Prism 3730 genetic analyzer (Applied Biosystems, Foster City, CA) were analyzed using Lasergene software, version 7.0 (DNAStar, Madison, WI).
Pyrosequencing reactions were performed on a PyroMark Q96 ID instrument as described by the manufacturer (Qiagen, Valencia, CA). The SW-HA-F715 sequencing primer was used for all reactions at a final concentration of 0.45 μM. Both the sequence analysis (SQA) and the SNP modes of the PyroMark Q96 ID instrument were utilized for analysis of substitutions at the 222 position in the HA. Two different nucleotide dispensation orders were used for pyrosequencing in SQA mode: the cyclic dispensation order (GATC)6 and the customized dispensation order ATGTAT(CAGT)6 (see Results). SNP analysis was conducted to determine the percentage of each variant within a mixture containing two virus variants. These percentages were calculated by the PyroMark ID software. The dispensation order for SNP analysis was generated by PyroMark ID software on the basis of the target sequence and the site for SNP analysis. For a mixture containing GAT and AAT variants, the generated dispensation was CGAGTCAGA.
To confirm the presence of virus variants within mixtures, cloning of RT-PCR products amplified with primers SW-HA-F696 and nonbiotinylated SW-HA-R799 was performed using a TOPO TA cloning kit as described by the manufacturer (Invitrogen, Carlsbad, CA). One hundred individual bacterial colonies containing recombinant plasmids carrying an individual RT-PCR fragment were randomly picked and analyzed by pyrosequencing of the PCR product amplified with primer SW-HA-F696 and biotinylated primer SW-HA-R799b. Pyrosequencing analysis was performed as described above.
t-test analysis was used to determine whether there was a statistically significant difference between the percentage calculated for virus variants in a mixture tested with the customized nucleotide dispensation order in SQA mode or in SNP mode. Statistical significance was set at a P value of <0.05. All probabilities were two-tailed.
Pandemic H1N1 influenza viruses with changes at each nucleotide of the triplet for amino acid 222 of the HA protein were detected by Sanger sequencing. Specifically, a single nucleotide substitution at the first (GAT → AAT), the second (GAT → GGT), or the third (GAT → GAA) position (Fig. 1A to D) was seen. In addition, sequencing showed that some viruses contained a mixture of variants (Fig. 1E to G). For example, the chromatogram in Fig. 1E showed a polymorphism for the first nucleotide (G/A), which suggests the presence of two variants: GAT and AAT. Another virus had two polymorphisms, at the first and second positions (Fig. 1F and G), which might reflect the presence of up to four variants: GAT, AAT, GGT, and AGT, encoding aspartic acid (D), asparagine (N), glycine (G), and serine (S), respectively. However, the actual variants cannot be reliably ascertained on the basis of Sanger sequencing. Therefore, isolates (n = 7) with an individual GAT, AAT, GGT, or GAA variant as well as mixtures were selected to be analyzed using the pyrosequencing assay.
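The ambiguity described above can be illustrated with a minimal Python sketch that enumerates every codon consistent with the bases called at each position of a Sanger chromatogram. The function name `possible_codons` and the small lookup table are illustrative assumptions, not part of the study's workflow; the table covers only the five codons named in the text.

```python
from itertools import product

# Codons discussed in the text and the amino acids they encode
# (standard genetic code).
CODON_TO_AA = {"GAT": "D", "AAT": "N", "GGT": "G", "AGT": "S", "GAA": "E"}


def possible_codons(calls):
    """Return every codon consistent with the bases seen at each position.

    calls: one collection of observed bases per codon position.
    """
    return ["".join(bases) for bases in product(*[sorted(c) for c in calls])]


# Polymorphisms at the first (G/A) and second (A/G) positions, as in the
# sample with two polymorphic sites; the third position is T only.
codons = possible_codons([{"G", "A"}, {"A", "G"}, {"T"}])
for codon in codons:
    print(codon, CODON_TO_AA[codon])
```

The chromatogram alone cannot say which of these four combinations are actually present, which is why a method that resolves whole codons is needed.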
First, the viruses were analyzed in the pyrosequencing assay using a standard cyclic dispensation order, (GATC)6, in SQA mode. With this approach, the single variants were readily identified (Fig. 1A to D, cyclic dispensation). However, results for the samples containing mixtures of variants were not in agreement with the sequences determined by the Sanger method. For example, a mixture of two variants, GAT and AAT, was interpreted as a single wild-type variant, GAT, by the PyroMark ID software analysis (Fig. 1E). Visual inspection of the corresponding pyrogram (Fig. 1E) suggested the presence of the additional variant (AAT), on the basis of the increased height of the first A peak in relation to the first G and T peaks.
In another instance, Sanger sequencing indicated the presence of GAT and GGT variants, while the PyroMark ID software still identified only the wild-type variant, GAT (Fig. 1F). In contrast, the increased height of the first G peak in comparison to the height of the first A peak in the pyrogram suggested that in addition to the wild type, the variant GGT was present. However, in another sample (Fig. 1G), even close inspection of pyrograms did not provide any evidence for the presence of additional variants detected using the Sanger method.
Since pyrosequencing using the standard cyclic nucleotide dispensation was unable to accurately determine the variants in the mixtures, a customized dispensation order was designed with the intention to improve the resolution of pyrograms. Because all three variants of interest (GAT, GGT, and AAT) share a T in the third position, we proposed a new customized order: ATGTAT(CAGT)6. Consequently, when the first A nucleotide was dispensed, both A nucleotides for an AAT variant were incorporated. The subsequently dispensed T would then be incorporated as the T of an AAT variant. Therefore, the peak height for the A nucleotide would be twice that of the T peak and the T peak would reflect the total portion of the AAT virus variant in the mixture. The next dispensation of a G would be incorporated as the first nucleotide of a GAT variant or the first and second nucleotides of a GGT variant. The presence of the T peak for the T nucleotide dispensed after the G would confirm the presence of a GGT variant. In this case, the height of the T peak was used to calculate the final proportion of the GGT variant. The following A nucleotide dispensation would extend a GAT variant, and the final T dispensed prior to the cyclic (CAGT)6 portion of the dispensation order would indicate the last peak for a GAT variant. This T could also be used to analyze the proportion of a GAT variant present in a mixture. Thus, the proposed order of dispensation allowed separation of each of the three variants on the basis of a location of the T nucleotide on a pyrogram.
For a theoretical sample containing equal amounts of three variants (AAT, GGT, and GAT), T peaks at the second, fourth, and sixth dispensation positions would indicate the presence of the AAT, GGT, and GAT variants, respectively (Fig. 2B). With the standard cyclic order of nucleotide dispensation, this theoretical sample would be incorrectly identified as containing only the GAT variant, as depicted in Fig. 2A.
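The behavior of the two dispensation orders on this theoretical sample can be reproduced with a small idealized simulation in Python. This is a sketch under stated assumptions, not the PyroMark ID algorithm: only the three nucleotides of codon 222 are modeled, sequencing is assumed to start at the first nucleotide of the codon, and each peak height is taken to be exactly proportional to the number of nucleotides incorporated. The names `run_length` and `simulate_pyrogram` are hypothetical.

```python
def run_length(read, start, base):
    """Count consecutive occurrences of `base` in `read` beginning at `start`."""
    n = 0
    while start + n < len(read) and read[start + n] == base:
        n += 1
    return n


def simulate_pyrogram(mixture, dispensation):
    """Return idealized peak heights for a mixture of variants.

    mixture: dict mapping the sequence to be synthesized -> population fraction.
    dispensation: string of nucleotides in the order they are dispensed.
    """
    position = {read: 0 for read in mixture}
    peaks = []
    for base in dispensation:
        signal = 0.0
        for read, fraction in mixture.items():
            n = run_length(read, position[read], base)
            position[read] += n          # homopolymer runs extend in one step
            signal += fraction * n
        peaks.append(round(signal, 6))
    return peaks


CYCLIC = "GATC"      # first cycle of (GATC)6
CUSTOM = "ATGTAT"    # leading part of ATGTAT(CAGT)6

equal_mix = {"AAT": 1 / 3, "GGT": 1 / 3, "GAT": 1 / 3}
pure_gat = {"GAT": 1.0}

cyclic_mix = simulate_pyrogram(equal_mix, CYCLIC)
cyclic_gat = simulate_pyrogram(pure_gat, CYCLIC)
custom_mix = simulate_pyrogram(equal_mix, CUSTOM)

print("cyclic, equal mixture:", cyclic_mix)
print("cyclic, pure GAT:     ", cyclic_gat)
print("custom, equal mixture:", custom_mix)
```

Under these assumptions the cyclic order yields the same peak pattern for the equal three-variant mixture as for pure GAT, whereas the customized order places one T peak per variant at dispensations 2, 4, and 6.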
Since the respective peak heights in a pyrogram were proportional to the amount of incorporated nucleotide, the customized dispensation proposed here allowed the quantification of each of the three variants in a mixture based on the analysis of T peak heights. The following algorithm was derived to calculate the proportion of each variant: (Tn − TBkg)/(T2 + T4 + T6 − 3 × TBkg) × 100% = SNPn, where Tn (n = 2, 4, or 6) is the peak height of the T nucleotide dispensed at position 2, 4, or 6; SNPn is the percentage of the corresponding variant; and TBkg is the average background of T calculated from the peak heights of the T nucleotides dispensed at positions 10, 14, and 18, where no T residues exist in the sequence.
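The formula can be written as a short Python function. This is a sketch for illustration: the function name `variant_proportions` and the peak heights in the example are hypothetical and are not data from the study. The mapping of T2, T4, and T6 to the AAT, GGT, and GAT variants follows the customized dispensation order ATGTAT(CAGT)6.

```python
def variant_proportions(t2, t4, t6, background_t_peaks):
    """Percentages of the AAT, GGT, and GAT variants from T peak heights.

    t2, t4, t6: heights of the T peaks at dispensations 2, 4, and 6.
    background_t_peaks: heights of T peaks where no T is incorporated
        (dispensations 10, 14, and 18 in the customized order).
    """
    t_bkg = sum(background_t_peaks) / len(background_t_peaks)
    corrected = {"AAT": t2 - t_bkg, "GGT": t4 - t_bkg, "GAT": t6 - t_bkg}
    total = sum(corrected.values())   # equals T2 + T4 + T6 - 3 * TBkg
    if total <= 0:
        raise ValueError("T peaks do not rise above background")
    return {variant: 100.0 * h / total for variant, h in corrected.items()}


# Hypothetical peak heights in arbitrary light units, for illustration only.
result = variant_proportions(t2=12.0, t4=2.0, t6=92.0,
                             background_t_peaks=[2.0, 2.0, 2.0])
print(result)
```

Because the denominator is the sum of the three background-corrected peaks, the three percentages always add up to 100%.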
The next step was to compare the results using customized nucleotide dispensation order, SNP analysis, and cloning analysis for the same specimen. A virus isolate which contained two virus variants, GAT and AAT, was used. The pyrogram generated with the customized nucleotide dispensation order clearly showed the presence of both variants (Fig. 3).
Using the customized nucleotide dispensation, the proportion of the sample with the GAT and AAT variants was calculated using the peak height data described in the algorithm (Table 2) and was found to contain 86.8% ± 0.2% of the GAT variant and 13.2% ± 0.2% of the AAT variant. The analysis of this specimen in SNP mode showed that GAT and AAT variants were present in proportions of 87.7% ± 0.1% and 12.3% ± 0.1%, respectively. The similarity of the data obtained by the two approaches was statistically significant (P < 0.02) (Table 2).
For the same specimen, DNA cloning showed the proportions of GAT and AAT variants to be 84% and 16%, respectively (Table 2). Thus, the customized nucleotide dispensation order for pyrosequencing could accurately identify the proportions of the virus variants in mixtures.
To assess the ability of the customized pyrosequencing assay to detect variants at position 222, 241 clinical specimens positive for pH1N1 virus were analyzed. Among those tested, GAT (n = 225), GGT (n = 3), AAT (n = 2), and GAA (n = 1) were individual variants (Table 3). In addition, 10 specimens (4.1%) contained more than a single variant (Table 3). These mixtures would not have been identified using the cyclic dispensation order.
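As a quick consistency check, the specimen counts reported above can be tallied in a few lines of Python. The dictionary simply restates the numbers in the text; the variable names are illustrative.

```python
# Specimen counts by codon 222 result, as reported in the text.
counts = {"GAT": 225, "GGT": 3, "AAT": 2, "GAA": 1, "mixture": 10}

total = sum(counts.values())
mixed_pct = 100 * counts["mixture"] / total

print(f"{total} specimens, {mixed_pct:.1f}% with more than one variant")
```

The categories account for all 241 specimens, and the share containing mixtures rounds to the stated 4.1%.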
In this study we developed a sensitive assay, utilizing the quantitative nature of pyrosequencing technology, for the detection of individual and mixed variants at position 222 in HA of the pH1N1 2009 influenza virus. Pyrosequencing technology measures a light signal produced after a nucleotide is incorporated into the sequence by the polymerase (21). When a nucleotide is dispensed but not incorporated, a gap is formed in the pyrogram (Fig. 1). We demonstrated that the SQA protocol with a customized nucleotide dispensation order, designed to determine virus variants and their proportions, was more informative than the traditional Sanger sequencing method as well as pyrosequencing with the cyclic nucleotide dispensation order. Although the SNP pyrosequencing assay is a powerful tool for polymorphism quantification, it cannot be applied to unknown nucleotide polymorphisms. Furthermore, the PyroMark ID software does not permit the SNP analysis of identical nucleotide polymorphisms present at two or more consecutive positions. The advantage of the customized dispensation order was seen when mixtures that would be misinterpreted on the basis of pyrosequencing with the cyclic nucleotide dispensation order were clearly identified in the clinical specimens. No AGT (S) variants were detected in this study with either dispensation order, although Sanger sequencing alone could not rule out this variant.
The quantitative nature of pyrosequencing allowed the calculation of virus variant proportions with the use of the peak heights within a pyrogram generated using the customized dispensation order. The results from three different approaches, SNP analysis, cloning, and SQA analysis, were in good agreement and supported the use of the formula (Tn − TBkg)/(T2 + T4 + T6 − 3 × TBkg) × 100% = SNPn for the calculation of the virus variant proportions based on peak heights in SQA pyrograms.
The customized nucleotide dispensation order approach has already proven useful in detecting the emergence of variants in other viruses, e.g., in analyzing the transition of wild-type to lamivudine-resistant variants in hepatitis B viruses (15). In this study, we were able to further expand the application of a customized nucleotide dispensation order to the quantitation of each of the virus variants in a mixture.
Pyrosequencing technology has been actively employed for the analysis of the influenza virus genome and has proven to be a robust platform for screening mutations associated with drug resistance in M2 and NA genes (5). The new approach extends the usefulness of pyrosequencing in the detection of multiple virus genomic variants for position 222 in HA. In addition, it was shown to be more informative than Sanger sequencing for quantitative analysis of a polymorphic site. It allows the circumvention of the time-consuming and costly cloning of RT-PCR products from specimens containing virus variants. This approach can also be useful in identifying the presence of multiple variants within influenza viruses and other pathogens. In the future, this assay could be used to rapidly screen for mutations at position 222 of HA or customized to screen for other mutations.
We thank our colleagues from Sequencing Activity, Influenza Division, CDC, for their valuable technical assistance on the project. We also thank Alicia Fry and Marie-Joelle Miron for their useful discussions and contributions.
The findings and conclusions of this report are those of the authors and do not necessarily represent the views of the Centers for Disease Control and Prevention.
Published ahead of print on 9 February 2011.