Pandemic H1N1 influenza viruses with changes at each nucleotide of the triplet for amino acid 222 of the HA protein were detected by Sanger sequencing. Specifically, a single nucleotide substitution at the first (GAT → AAT), the second (GAT → GGT), or the third (GAT → GAA) position (A to D) was seen. In addition, sequencing showed that some viruses contained a mixture of variants (E to G). For example, the chromatogram in E showed a polymorphism for the first nucleotide (G/A), which suggests the presence of two variants: GAT and AAT. Another virus had two polymorphisms, at the first and second positions (F and G), which might reflect the presence of up to four variants: GAT, AAT, GGT, and AGT, encoding aspartic acid (D), asparagine (N), glycine (G), and serine (S), respectively. However, the actual variants cannot be reliably ascertained on the basis of Sanger sequencing. Therefore, isolates (n = 7) with an individual GAT, AAT, GGT, or GAA variant as well as mixtures were selected to be analyzed using the pyrosequencing assay.
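The ambiguity described above can be made concrete with a short enumeration. The following is a minimal sketch, not part of the original analysis: the function name `possible_codons` and the partial codon table are illustrative assumptions, and only the codons mentioned in the text are included.

```python
from itertools import product

# Partial codon table (assumption: only codons discussed in the text are listed).
CODON_TO_AA = {
    "GAT": "D",  # aspartic acid (wild type)
    "AAT": "N",  # asparagine
    "GGT": "G",  # glycine
    "AGT": "S",  # serine
    "GAA": "E",  # glutamic acid
}


def possible_codons(position_calls):
    """Expand per-position base calls into every codon they could represent."""
    return ["".join(bases) for bases in product(*position_calls)]


# Hypothetical Sanger calls: G/A at the first position, A/G at the second, T at the third.
calls = [("G", "A"), ("A", "G"), ("T",)]
for codon in possible_codons(calls):
    print(codon, CODON_TO_AA[codon])
```

Two polymorphic positions are compatible with anywhere from two to four codons, which is why the chromatogram alone cannot establish which variants are actually present.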
Fig. 1. Analysis of virus isolates containing an individual virus variant (A to D) or a mixture of virus variants (E to G) at position 222 of hemagglutinin protein. Sanger sequencing chromatograms are shown in the first column (A nucleotides, green; C nucleotides, ...
Analysis of virus isolates using the pyrosequencing assay (cyclic nucleotide dispensation).
First, the viruses were analyzed in the pyrosequencing assay using a standard cyclic dispensation order, (GATC)6, in SQA mode. With this approach, the single variants were readily identified (A to D, cyclic dispensation). However, results for the samples containing mixtures of variants were not in agreement with the sequences determined by the Sanger method. For example, a mixture of two variants, GAT and AAT, was interpreted as a single wild-type variant, GAT, by the PyroMark ID software analysis (E). Visual inspection of the corresponding pyrogram (E) suggested the presence of the additional variant (AAT), on the basis of the increased height of the first A peak in relation to the first G and T peaks.
In another instance, Sanger sequencing indicated the presence of GAT and GGT variants, while the PyroMark ID software still identified only the wild-type variant, GAT (F). In the pyrogram, by contrast, the increased height of the first G peak relative to the first A peak suggested that the GGT variant was present in addition to the wild type. In a third sample (G), however, even close inspection of the pyrograms provided no evidence for the additional variants detected by the Sanger method.
Target-specific customization of nucleotide dispensation.
Since pyrosequencing with the standard cyclic nucleotide dispensation could not accurately determine the variants in the mixtures, a customized dispensation order was designed to improve the resolution of the pyrograms. Because all three variants of interest (GAT, GGT, and AAT) share a T in the third position, we proposed a new customized order: ATGTAT(CAGT)6. When the first A nucleotide is dispensed, both A nucleotides of an AAT variant are incorporated. The subsequently dispensed T is then incorporated as the T of the AAT variant. The A peak is therefore twice the height of the T peak, and the T peak reflects the total proportion of the AAT variant in the mixture. The next dispensation, G, is incorporated as the first nucleotide of a GAT variant or as the first and second nucleotides of a GGT variant. A T peak for the T dispensed after the G confirms the presence of a GGT variant, and its height is used to calculate the proportion of the GGT variant. The following A dispensation extends the GAT variant, and the final T dispensed before the cyclic (CAGT)6 portion of the order gives the last peak for the GAT variant; this T peak can be used to determine the proportion of the GAT variant in a mixture. Thus, the proposed dispensation order separates the three variants on the basis of the location of the T peak in the pyrogram.
For a theoretical sample containing equal amounts of the three variants (AAT, GGT, and GAT), T peaks at the second, fourth, and sixth dispensation positions would indicate the presence of the AAT, GGT, and GAT variants, respectively (B). With the standard cyclic order of nucleotide dispensation, this theoretical sample would be incorrectly identified as containing only the GAT variant, as depicted in A.
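The behavior of the two dispensation orders on this theoretical sample can be reproduced with a small idealized simulation. This is a sketch under stated assumptions, not the PyroMark software's method: the function `simulate_pyrogram` is hypothetical, only the three nucleotides of the codon are modeled (downstream sequence is ignored), and signals are noise free and exactly proportional to the number of bases incorporated.

```python
def simulate_pyrogram(templates, dispensation):
    """Return the idealized peak height for each dispensed nucleotide.

    templates: dict mapping a sequence (in read direction) to its fraction
        of the mixture.
    dispensation: string of nucleotides in the order they are dispensed.
    A dispensed nucleotide extends a strand through every consecutive
    matching base, so a homopolymer run gives a proportionally taller peak.
    """
    positions = {seq: 0 for seq in templates}  # next unread base per template
    peaks = []
    for nt in dispensation:
        signal = 0.0
        for seq, fraction in templates.items():
            i = positions[seq]
            run = 0
            while i < len(seq) and seq[i] == nt:
                i += 1
                run += 1
            positions[seq] = i
            signal += fraction * run
        peaks.append(signal)
    return peaks


# Theoretical sample: equal amounts of the three variants.
mixture = {"AAT": 1 / 3, "GGT": 1 / 3, "GAT": 1 / 3}

for order in ("GATC", "ATGTAT"):
    heights = simulate_pyrogram(mixture, order)
    print(order, [round(h, 2) for h in heights])
```

Under these assumptions, the cyclic order gives equal G, A, and T peaks for the mixture, the same pattern as a pure GAT sample, whereas the customized order gives a separate T peak of one third at dispensations 2, 4, and 6.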
Fig. 2. Histogram comparison of two theoretical pyrograms for cyclic (GATC)6 (A) and customized ATGTAT(CAGT)6 (B) nucleotide dispensation orders. The x axis shows the nucleotide dispensed. The y axis represents the proportion of each nucleotide relative to the ...
Because peak heights in a pyrogram are proportional to the amount of incorporated nucleotide, the customized dispensation order proposed here allowed each of the three variants in a mixture to be quantified from the T peak heights. The following algorithm was derived to calculate the proportion of each variant:

$$\mathrm{SNP}_n = \frac{T_n - T_{\mathrm{Bkg}}}{T_2 + T_4 + T_6 - 3\,T_{\mathrm{Bkg}}} \times 100\%$$

where $T_n$ (n = 2, 4, or 6) is the peak height of the T nucleotide dispensed at position 2, 4, or 6 and $T_{\mathrm{Bkg}}$ is the average T background, calculated from the peak heights of the T nucleotides dispensed at positions 10, 14, and 18, where no T residues exist in the sequence.
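As an illustration of this calculation, the sketch below applies the background-corrected formula to a list of peak heights. The function name `variant_proportions`, its parameters, and the peak heights in the example are all hypothetical and chosen only to make the arithmetic easy to follow; they are not measured data.

```python
# Dispensation position of the diagnostic T peak for each variant
# in the customized order ATGTAT(CAGT)6.
T_POSITION = {"AAT": 2, "GGT": 4, "GAT": 6}
BACKGROUND_POSITIONS = (10, 14, 18)  # T dispensations with no T in the template


def variant_proportions(peaks):
    """Return the percentage of each variant from 1-indexed T peak heights.

    peaks: list of peak heights, where peaks[p - 1] is the height at
        dispensation position p (at least 18 positions are required).
    """
    background = sum(peaks[p - 1] for p in BACKGROUND_POSITIONS) / len(BACKGROUND_POSITIONS)
    corrected = {v: peaks[p - 1] - background for v, p in T_POSITION.items()}
    total = sum(corrected.values())
    if total <= 0:
        raise ValueError("no T signal above background")
    return {v: 100.0 * c / total for v, c in corrected.items()}


# Hypothetical peak heights (arbitrary units); only the T positions matter here.
example = [0.0] * 18
example[2 - 1], example[4 - 1], example[6 - 1] = 30.0, 10.0, 70.0
example[10 - 1] = example[14 - 1] = example[18 - 1] = 10.0

print(variant_proportions(example))
```

With a background of 10 units, the corrected T peaks in this made-up example are 20, 0, and 60, so the sketch reports 25% AAT, 0% GGT, and 75% GAT. Because the denominator is the sum of the three corrected peaks, the percentages always total 100%.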
Comparison of customized nucleotide dispensation order, SNP analysis, and cloning analysis.
The next step was to compare the results obtained with the customized nucleotide dispensation order, SNP analysis, and cloning analysis for the same specimen. A virus isolate that contained two virus variants, GAT and AAT, was used. The pyrogram generated with the customized nucleotide dispensation order clearly showed the presence of both variants.
Fig. 3. Detection of mixture of GAT and AAT virus variants at amino acid position 222 of hemagglutinin. (A) The theoretical histogram for customized nucleotides was made on the basis of the variant proportion detected for virus mixtures GAT and AAT by cloning ...
Using the customized nucleotide dispensation, the proportions of the GAT and AAT variants in the sample were calculated from the peak height data with the algorithm described above; the sample contained 86.8% ± 0.2% of the GAT variant and 13.2% ± 0.2% of the AAT variant. Analysis of this specimen in SNP mode showed that the GAT and AAT variants were present in proportions of 87.7% ± 0.1% and 12.3% ± 0.1%, respectively. The similarity of the data obtained by the two approaches was statistically significant (P < 0.02).
Comparison of virus variant proportions in clinical specimens containing two virus variants, GAT and AAT
For the same specimen, DNA cloning showed the proportions of GAT and AAT variants to be 84% and 16%, respectively. Thus, the customized nucleotide dispensation order for pyrosequencing could accurately identify the proportions of the virus variants in mixtures.
High-throughput analysis of clinical specimens using pyrosequencing.
To assess the ability of the customized pyrosequencing assay to detect variants at position 222, 241 clinical specimens positive for pH1N1 virus were analyzed. Of these, 231 contained a single variant: GAT (n = 225), GGT (n = 3), AAT (n = 2), or GAA (n = 1). The remaining 10 specimens (4.1%) contained more than one variant. These mixtures would not have been identified using the cyclic dispensation order.
Comparison of the number of HA variants detected between cyclic and customized nucleotide dispensation