Similar to other segmented RNA viruses, influenza viruses can exchange genome segments and form a wide variety of reassortant strains upon coreplication within a host cell. Therefore, the mapping of genome segments of influenza viruses is essential for understanding their phenotypes. In this work, we have developed an oligonucleotide microarray hybridization method for simultaneous genotyping of all genomic segments of two highly homologous strains of influenza B virus. A few strain-specific oligonucleotide probes matching each of the eight segments of the viral genomes of the B/Beijing/184/93 and B/Shangdong/7/97 strains were hybridized with PCR-amplified fluorescently labeled single-stranded DNA. Even though there were a few mismatches among the genomes of the studied virus strains, microarray hybridization showed highly significant and reproducible discrimination ability and allowed us to determine the origins of individual genomic segments in a series of reassortant strains prepared as vaccine candidates. Additionally, we were able to detect the presence of at least 5% of mixed genotypes in virus stocks even when conventional sequencing methods failed, for example, for the NS segment. Thus, the proposed microarray method can be used for (i) rapid and reliable genome mapping of highly homologous influenza B viruses and (ii) extensive monitoring of influenza B virus reassortants and the mixed genotypes. The array can be expanded by adding new oligoprobes and using more quantitative assays to determine the origin of individual genomic segments in series of reassortant strains prepared as vaccine candidates or in mixed virus populations.
Influenza viruses A and B possess segmented genomes consisting of eight separate RNA molecules, each coding for one or more viral proteins (15, 19). Upon coreplication in a cell, the viruses can exchange segments, leading to diversity of reassortant strains. Together with accumulation of point mutations, segment reassortment is the basis for evolution and maintenance of diversity for many viral pathogens. It provides them with the ability to rapidly adapt to the pressure of the host immune system and leads to the continuous emergence of new virus variants that cause seasonal outbreaks of influenza (17). Because of this ability, segmented viruses can exist in numerous genotypes and serotypes, presenting a challenge to the creation of protective vaccines. On the other hand, it also provides the basis for a rational approach to the development of influenza vaccines (2, 4, 21, 23).
Gradual antigenic drift and intermittent occurrences of more substantial antigenic shifts make the creation of new prospective seed virus strains for both inactivated and live attenuated influenza vaccines a continuous process that must be completed within a short period before each epidemic season. To be effective, prospective vaccine strains must not only possess the antigenic properties of the currently circulating strains but also replicate well in the substrate used for their manufacture. Some naturally occurring influenza virus isolates do not grow well in substrates used for vaccine production (e.g., embryonated chicken eggs). Adaptation of influenza viruses for optimal growth by passaging them in a target substrate is a time-consuming process that may also allow the accumulation of random mutations that can alter antigenic specificity. An alternative strategy is to create reassortants between the currently circulating strains and another strain previously adapted for high growth in vitro (2, 21). The reassortants should contain genes coding for the protective antigens (hemagglutinin [HA] and neuraminidase [NA]), as well as genes determining the optimal replicative properties of the virus. Selection of appropriate reassortants is aided by reverse transcription (RT)-PCR and restriction fragment length polymorphism (RFLP) tests of multiple candidate clones (13, 18) and multiplex RT-PCR with single-strand conformation polymorphism analysis (1). However, none of these methods is fully adequate, and the creation of new rapid, highly specific, sensitive, and reproducible segment-mapping techniques remains highly desirable.
Hybridization with microarrays of immobilized oligonucleotides is a rapid alternative to DNA sequence analysis (6, 12) and has been used for the genotyping of viruses, bacteria, and genes of higher organisms (7, 9, 10, 20). Recently, a microarray method has been used for the successful genotyping of the VP7 gene of rotaviruses (8) and for analysis of intermolecular recombinants of poliovirus (5), as well as for detection and discrimination of orthopoxviruses (14). The potential of microarray hybridization for detection and discrimination between types and subtypes of influenza A viruses has also been shown by using 24 long oligoprobes (average, 500 nucleotides [nt]). Shorter oligoprobe microarrays (between 4 and 128 nt) have been developed for the detection of influenza viruses (22). However, in the latter study, the intensities of probe spots completely complementary to target sequences lacked definitive boundaries between significant hybridization and nonspecific background binding, and most of the monospecific NA and HA subtype probes were not discriminated by the array.
For better discrimination of closely related segmented RNA viruses, we report a new high-throughput oligonucleotide microarray-based method to map the origins of all segments of two closely related influenza B viruses, B/Beijing/184/93 and B/Shangdong/7/97, which were used as the models for development of the method. The two influenza B viruses were chosen for the study because B/Shangdong/7/97 is a naturally occurring reassortant with three segments (PA, NP, and M) derived from a precursor virus phylogenetically related to B/Beijing/184/93 and therefore presented a challenge by requiring the ability to distinguish between single-base substitutions. In addition, the strains possess different growth characteristics in embryonated chicken eggs (23) and may serve as a model for the reassortment approach to the creation of influenza B vaccines. The proposed microarray approach enabled us to successfully discriminate all eight segments of the Beijing and Shangdong strains despite minimal differences between them.
The strains used to generate virus reassortants were previously described (23). The potential vaccine candidate influenza viruses Beijing/184/93 and Shangdong/7/97 passaged only in embryonated chicken eggs were received from the Centers for Disease Control and Prevention, Atlanta, Ga. The viruses were grown in 11-day-old embryonated chicken eggs. The allantoic cavities of the embryonated eggs were inoculated with the original allantoic fluids diluted 1:1,000 in phosphate-buffered saline containing 10 μg of gentamicin/ml. Each egg was inoculated with 0.2 ml of diluted allantoic fluid and incubated at 33°C for 72 h. To make reassortants, eggs were coinoculated with 1,000 50% egg infective doses (each) of the B/Beijing/184/93 and B/Shangdong/7/97 virus strains/0.2 ml. The mixed progeny from reassortment passage were harvested and neutralized with postinfection ferret anti-B/Shangdong/7/97 serum (1 volume of a 1:10 dilution of virus added to 1 volume of a 1:10 dilution of serum) for 1 h at room temperature. Subsequently, the embryonated eggs were inoculated with 10-fold serial dilutions of the virus-serum mixture. Viruses recovered from this passage that had HA titers of 1:128 to 1:256 were selected for cloning by the terminal-dilution method in two further passages in eggs.
Viral RNA was extracted from allantoic fluid using the Viral RNA Mini kit (QIAGEN, Valencia, Calif.), and cDNA was synthesized using Thermoscript RT polymerase (ThermoScript RT-PCR System; Invitrogen, Carlsbad, Calif.) with an AGCAGAAGC primer complementary to the conserved 3′ ends of all influenza B virus genome segments, either alone or in a mixture with a random hexanucleotide (dN6). The resulting cDNA was used for multiplex PCR amplification.
The nucleotide sequences of all eight segments of B/Beijing/184/93 were obtained from GenBank. The sequences of genome segments of B/Shangdong/7/97 were determined in our laboratory (23).
cDNA samples were prepared by 30 cycles of PCR amplification (HotStarTaq Master Mix; QIAGEN) using primers specific for partial sequences of eight genome segments. Amplification products were separated by electrophoresis in 1% agarose gel, extracted with the QIAquick Gel Extraction kit (QIAGEN), and sequenced using a dRhodamine Terminator Cycle Sequencing kit (PE Biosystems, Warrington, United Kingdom) and an ABI Prism model 377 (Applied Biosystems, Foster City, Calif.).
Amplicons of all eight influenza B virus genome segments were prepared in two multiplex PCRs; the primer pair for the PB2, PB1, PA, and HA segments in one multiplex PCR was B1-S1, and the primer pair for the NP, NA, M, and NS segments in the other multiplex PCR was B2-S2 (B and S refer to Beijing and Shangdong, respectively). The sequences of the PCR primers are listed in Table 1. One primer in each pair was 5′ biotinylated for strand separation and preparation of Cy5 fluorescently labeled single-stranded DNA samples as previously described (8). Briefly, 30 μl of a reaction mixture containing 1× AmpliTaq PCR buffer with 1.5 mM MgCl2; a 200 nM concentration of each primer; 20 μM Cy5-dCTP; 20 μM dCTP; 200 μM (each) dATP, dGTP, and dTTP; and 0.8 U of HotStarTaq DNA Polymerase (QIAGEN) was used. Amplification was performed by 35 PCR cycles consisting of 30 s at 94°C, 30 s at 45°C, and 60 s at 72°C. Single-stranded DNA was prepared by the separation of DNA strands using streptavidin-coated magnetic beads (Dynal ASA, Oslo, Norway) according to the manufacturer's protocol. Fluorescently labeled single-stranded DNA was eluted from the magnetic beads by washing them with 50 μl of 0.1 M NaOH, purified through CentriSep spin columns (Princeton Separation, Adelphia, N.J.), concentrated by vacuum drying, and finally resuspended in water.
Oligoprobes (17 to 22 nt) were designed to match the regions of each segment that were previously found to contain one or more nucleotide differences between the RNA sequences of the B/Beijing/184/93 and B/Shangdong/7/97 strains. The selected oligoprobe pairs differed by 1 to 7 nt (Table 2). The 5′ end of each oligoprobe contained an aminolink group (TFA Aminolink CE reagent; PE Applied Biosystems) added during the synthesis. The probes were covalently immobilized on the surfaces of aldehyde-coated glass slides. Additional spots of oligonucleotides complementary to one of the primers for PCR amplification were used as qualitative positive controls for sample preparation and hybridization (Table 2). These controls were not used for the normalization of array signals and therefore were not reflected in the results (i.e., they are not shown in Fig. 1).
Microarray fabrication and hybridization were described in detail earlier (8). The concentration of oligonucleotides in 0.25 M acetic acid was adjusted to 75 mM. After the oligonucleotides were spotted using a robotic arrayer (Cartesian Technologies Inc., Ann Arbor, Mich.), the slides were dried for 20 min at 84°C and treated with a 0.25% aqueous solution of NaBH4 for 5 min. Single-stranded Cy5-labeled influenza B virus DNA samples were mixed with equal volumes of 2× hybridization buffer, denatured for 1 min at 95°C, and chilled on ice for 1 min. The final concentration of each fluorescent sample in hybridization solution was typically >0.1 μM; 3-μl aliquots from each sample were applied to the array area. Each array was covered with a 4- by 7-mm plastic coverslip, and hybridization was allowed to proceed for 30 min at 42°C. After hybridization, the slides were washed sequentially with 6× SSC (1× SSC is 0.15 M NaCl plus 0.015 M sodium citrate), 2× SSC, and 1× SSC and dried in an air stream.
Figure 1 shows the layout of oligoprobes on the microarray. Each array consisted of four identical subarrays, and each subarray contained 20 oligoprobes redundantly representing eight segments of the B/Beijing/184/93 strain and 20 oligoprobes representing eight segments of the B/Shangdong/7/97 strain. Each subarray consisted of four areas containing oligoprobes complementary to one of four fluorescently labeled cDNA samples designated B1, B2, S1, and S2 (see “Multiplex PCR and synthesis of Cy-5-labeled hybridization samples” above).
Eight identical arrays were printed on each slide. Therefore, each of the four fluorescently labeled cDNA samples could be hybridized twice on each slide. This design enabled us to produce eight replicates of the homotypic hybridization signal and eight replicates of the heterotypic hybridization signal (used as the negative control) for each of the 20 unique oligoprobes spotted on a slide. For example, the Beijing sample B1 could hybridize to eight homotypic Beijing-specific oligoprobes (PB2-B-591) and to eight heterotypic Shangdong-specific oligoprobes (PB2-S-591).
Microchip images were captured using a ScanArray 5000 confocal fluorescent scanner (Perkin-Elmer, Boston, Mass.) with two HeNe lasers (632 nm for excitation of Cy5 and 543 nm for Cy3). The fluorescent images were analyzed using QuantArray software (Perkin-Elmer).
Three micrograms of each NS segment amplicon was digested at 30°C for 65 min in 20 μl of reaction mixture containing 2 U of the HgaI restriction enzyme (New England BioLabs Inc., Beverly, Mass.). The products of digestion were separated by electrophoresis in 9.5% polyacrylamide gels.
Let A represent the value of the intensity of the hybridization signal of a given fluorescently labeled cDNA sample hybridized with the homotypic oligoprobe on the array; let B represent the value of the intensity of the hybridization signal of the sample hybridized with the heterotypic oligoprobe on the same array. Let K be the value of the background intensity of the signal measured in mock spots on this array (printed without oligoprobe) (Fig. 1). We measured K values in 64 mock spots on each array.
To reduce the effect of background (nonspecific binding of the fluorescently labeled cDNA sample to the array) and to compare the homotypic hybridization signals across arrays and different slides, we estimated the relative level of homotypic hybridization signal for a given oligoprobe by the following formula:
where IA = (A − K̄)/(Ā − K̄) and IB = (B − K̄)/(Ā − K̄). K̄ is the mean value of background intensities averaged over the 64 signals measured in mock spots on the array (printed without oligoprobe [Fig. 1]). Ā is the mean value of the homotypic signal averaged over four subarrays on the array. Because the oligoprobes of the array were hybridized to only one of the four labeled cDNA samples, the local (on the array) normalization of the hybridization signal by the factor Ā − K̄ allowed us to compare the interarray and interslide variabilities of the specific signals for a given oligoprobe.
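The background subtraction and normalization defined above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' analysis code: the function name `normalize_signals` and all intensity values are hypothetical, and only the definitions of IA, IB, K̄, and Ā are taken from the text.

```python
from statistics import mean


def normalize_signals(homotypic, heterotypic, mock_spots):
    """Return (I_A, I_B) for one oligoprobe pair on one array.

    homotypic   -- raw signals A, one per subarray
    heterotypic -- raw signals B, one per subarray
    mock_spots  -- raw background signals K from spots printed without probe
    """
    k_bar = mean(mock_spots)   # mean background over the mock spots
    a_bar = mean(homotypic)    # mean homotypic signal over the subarrays
    scale = a_bar - k_bar      # local normalization factor
    if scale <= 0:
        raise ValueError("mean homotypic signal does not exceed background")
    i_a = [(a - k_bar) / scale for a in homotypic]
    i_b = [(b - k_bar) / scale for b in heterotypic]
    return i_a, i_b


# Hypothetical raw intensities (arbitrary units) for four subarrays
# and 64 mock spots; these are illustration values, not measured data.
A = [5200.0, 4800.0, 5100.0, 4900.0]
B = [900.0, 1100.0, 1000.0, 1000.0]
K = [500.0] * 64

i_a, i_b = normalize_signals(A, B, K)
print([round(x, 3) for x in i_a])
print([round(x, 3) for x in i_b])
```

By construction the normalized homotypic values average to 1 on each array, which is what makes the values comparable across arrays and slides.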
The four pairs of IA and IB values on each array were compared by Wilcoxon's signed-rank pair test. We also used the Kruskal-Wallis test for multiple comparisons of the arrays in the replicated experiments and a paired t test and a one-way analysis of variance test to compare the IA and IB values in the replicated experiments. Estimation of the number of replicates of arrays and the confidence probabilities of the false-negative rate for hybridizations is described in the appendix.
The MLAB (Civilized Software, Inc., Silver Spring, Md.), Statistica-5 (StatSoft, Inc., Tulsa, Okla.), and JMP (SAS Institute Inc.) programs were used for statistical analysis.
The sequences of genome segments of B/Shangdong/7/97 determined in our laboratory (23) were deposited in GenBank under accession numbers AF484967, AY044173, AF486835, AF486836, AY044169, AY044172, AY044171, AY044170, AF101974, AF101992, AF102009, AF050061, AF100367, AF100381, and AF100399.
Figure 2 shows typical hybridization patterns for the B/Beijing/184/93 and B/Shangdong/7/97 strains (only one of four identical subarrays is shown). All segments were unambiguously identified as either Beijing or Shangdong, even though for some oligoprobe pairs there was some cross-hybridization.
We also obtained several hybridization patterns for reassortant samples. Representative examples are shown in Fig. 3. In reassortant A, the segments PB2, HA, and NP were derived from the Beijing parent while the other five segments were derived from the Shangdong parent. In reassortant B, the segments PB2, HA, and M were derived from the Beijing parent while the other five segments were derived from the Shangdong parent. In reassortant C, the segments PB1, PA, HA, and M were derived from the Beijing parent while the other four segments were derived from the Shangdong parent. These results are in complete agreement with the sequence analysis data (not shown) and thus demonstrate that microarray hybridization of amplicons of reassortants between the Beijing and Shangdong strains can reveal the origins of the segments. However, in some instances, identification of the segment origin was not as clear. One such example is shown in Fig. 3: reassortant D consisted of segments PB2, PB1, PA, NP, NA, and M derived from the Beijing parent, while segment HA was derived from the Shangdong parent. However, the hybridization intensities of all Beijing- and Shangdong-specific oligoprobes binding the NS segment were comparable. This pattern could be a result of the mixture of two NS segment genotypes in the sample. To test this possibility and to determine the limits of the ability of the microarray to detect genetic heterogeneity, we performed a detailed quantitative analysis of microarray hybridization results (see “Identification of mixed genotype samples: microarray hybridization is superior to conventional nucleotide sequencing analysis” below). To do this, we had to determine the statistical significance of the microarray hybridization data.
Seven independent hybridization experiments were performed and analyzed for evaluation of the discriminating power of individual oligoprobes. Each experiment included eight replicates of homotypic and heterotypic hybridization signals (fluorescence intensity) obtained from 20 pairs of oligoprobes (spotted on the slide) specific to B/Beijing/184/93 and B/Shangdong/7/97. Relative hybridization intensities were calculated by background subtraction and normalization as described in Materials and Methods. For each of the 40 different oligoprobes, the normalized signal intensities did not differ significantly for the matching spot pairs located on the same slide (P > 0.1 by the Wilcoxon and paired t tests) and for all seven slides (P > 0.1 by the Wilcoxon and Kruskal-Wallis tests). Hence, the background subtraction and normalization procedures allowed us to treat specific hybridization signals for the same oligoprobes on a slide and across different slides as samples chosen from the same population. The statistical comparisons of the 56 combined hybridization experiments (7 hybridization experiments, each including eight replicates) imply that the microarray method is highly sensitive, specific, and reproducible (Table 3). Comparison of homotypic and heterotypic signals by the t and F tests showed that the P values for homotypic signal intensities were statistically significant (0.000001 < P < 0.01) for 40 oligoprobes.
Sample replication within each slide and redundancy of oligoprobes for each segment of the influenza B virus genome increase the reliability of the method. In particular, performance of the seven hybridization experiments reduces the likelihood of making false-negative determinations of the presence of the segment from a given strain (Table 4). The maximum probabilities of false-negative determinations for each segment were calculated as described in the appendix. For all eight influenza B virus segments, the probabilities ranged from 0.027 to 0.00014, indicating the high power of the influenza B virus microarray to discriminate between the Beijing and Shangdong viruses.
We compared the discriminating abilities of microarray hybridization and conventional nucleotide sequencing analysis. We tested the origins of all eight segments of six reassortant strains. Table 5 shows that for 35 of the 44 comparisons both methods provided consistent results. In the remaining nine instances, the results of hybridization revealed the presence of two genotypes similar to pattern D in Fig. 3 for the NS segment. To obtain an independent confirmation that more than one genotype was present in these samples and to rule out possible cross-hybridization, we used the RFLP method. The HgaI restriction enzyme is expected to cut the NS segment of the Shangdong amplicon, producing two fragments of 730 and 300 bp, but not to cut the NS segment of the Beijing amplicon. Figure 4 shows the restriction patterns for the reassortant strain analyzed by microarray hybridization. The pattern of DNA bands in Fig. 4 indicates that both Beijing and Shangdong sequences of the NS gene segment were present. The RFLP analysis suggests that the microarray hybridization multiple pattern was not the result of excessive cross-hybridization but was due to the presence of a mixture of NS gene segments from the Beijing and Shangdong strains. Therefore, microarray hybridization could be more sensitive than conventional nucleotide sequencing for detecting mixed influenza B virus genotypes.
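The logic of reading the HgaI digest can be illustrated with a short Python sketch. It is only an illustration of the expected band patterns described above, under stated assumptions: the uncut amplicon length of 1,030 bp is inferred here as the sum of the two fragments (it is not given in the text), and the function names and the 5% sizing tolerance are hypothetical.

```python
# Fragment sizes expected after HgaI digestion of the Shangdong NS amplicon
# (from the text).
CUT_FRAGMENTS_BP = (730, 300)
# Assumption: the uncut NS amplicon length equals the sum of the fragments.
UNCUT_BP = sum(CUT_FRAGMENTS_BP)


def has_band(bands, size, tolerance=0.05):
    """True if any observed band lies within a relative tolerance of size."""
    return any(abs(b - size) <= tolerance * size for b in bands)


def classify_ns_pattern(bands, tolerance=0.05):
    """Interpret observed band sizes (bp) from an HgaI digest of NS amplicon."""
    uncut = has_band(bands, UNCUT_BP, tolerance)
    cut = all(has_band(bands, s, tolerance) for s in CUT_FRAGMENTS_BP)
    if uncut and cut:
        return "mixed"        # both uncut and cut products are visible
    if cut:
        return "Shangdong"    # amplicon fully cut
    if uncut:
        return "Beijing"      # amplicon not cut
    return "undetermined"


print(classify_ns_pattern([1025, 735, 298]))
```

In practice an incomplete digest could produce the same three-band pattern as a true mixture, so a "mixed" call of this kind would still depend on appropriate digestion controls.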
The ability of microarray hybridization to detect the presence of mixed genotypes suggests that it can be used for quantitative assessment of the relative abundances of segment variants in strain mixtures. To determine the sensitivity of the microarray for detection of mixed genotypes, we spiked an NS segment sample prepared from the Beijing strain (175 μM) with 5, 10, or 50% of the NS segment of the Shangdong strain. Figure 5 shows that for all three NS oligoprobe pairs, the signals produced by a Beijing sample spiked with 5% Shangdong DNA significantly exceeded those produced by the unspiked sample. This result suggests that microarray hybridization is sensitive to the presence of at least a 5% alternative genotype. However, despite the trend seen in Fig. 5, the difference between spikes with 5 and 10% and with 5 and 50% (not shown) was not statistically significant, meaning that quantification of genotype mixtures was unreliable. This may be due either to saturation of the probe or to a nonlinear dose-response of hybridization intensities.
Segment reassortment is an essential feature of many viral pathogens, including influenza viruses. More than one type of reassortant virus can be present simultaneously in natural populations, further complicating their analysis. Detailed genome segment mapping of virus isolates is needed to elucidate the mechanisms leading to the emergence and evolution of viral pathogens and to facilitate the creation of new protective vaccines.
The rational design of vaccines against influenza includes the generation of reassortants between a recent field isolate and a reference strain adapted for efficient growth in the substrate used for vaccine production. After a variety of reassortant strains is generated in vitro by growing two parental strains together, a prospective vaccine strain is selected by genotyping all relevant genomic segments to identify the right combination of segments coding for immunogenic proteins and those determining high-growth characteristics. Traditionally, this time-critical task of selection of prospective vaccine strains has been accomplished by RFLP, sequencing of cDNA, or other approaches (1, 13, 18). The recently developed high-throughput microarray analysis method appears to be ideally suited for one-step mapping of segmented viruses.
Microarray hybridization was previously used to genotype individual segments of influenza A virus (16, 22). In our work, we have developed and used an improved oligonucleotide microarray method to discriminate viruses with high levels of genomic homology through identification of minor genetic differences, including single-nucleotide variations, simultaneously in all segments of influenza B viruses. We observed unambiguous discrimination between the B/Beijing/184/93 and B/Shangdong/7/97 influenza strains. These strains were discriminated with statistical certainty by using different oligoprobes, including PA-1979, PA-2034, NP-188, and M-1004, differing at only 1 nucleotide and representing the highly homologous segments PA, NP, and M of both viruses. The high level of nucleotide homology between the PA, NP, and M segments of B/Shangdong/7/97 and B/Beijing/184/93 has been an obstacle to the genotyping of reassortants derived from the two strains by RT-PCR and RFLP because of the lack of suitable restriction sites (23).
Microarray hybridization, coupled with statistical interpretation of the results, allowed us to estimate the effects of cross-hybridization and to account for nonspecific sample binding (background). Our statistical analysis demonstrated high levels of sensitivity and reproducibility for all individual oligoprobes to all influenza B virus segments; P values for specific hybridization signals for all of the 40 selected oligoprobes were highly significant (P < 0.01).
The statistical planning of the number of hybridization experiments developed in this work can be used in the future in the design of similar oligonucleotide microchip experiments to assess the discriminating power of individual oligoprobes and to select only those that produce the clearest results. Identification of the origin of each segment based on hybridization with two or more specific oligoprobes significantly increased the reliability of identification. To this end, we estimated the maximum probability of false-negative identification of each segment based on multiple results for individual oligoprobes in 56 hybridization experiments. The P values for segments of different origin were calculated by multiplication of P values for homotypic hybridization with oligoprobes specific to each segment. They were statistically significant and demonstrated high overall discriminating power of the microarray for all eight segments of influenza B virus.
Note that our approaches for estimating the minimum number of replicate experiments and for estimating the false-negative error rate for a given gene segment seem novel for microarray genotyping. However, depending on the specific designs of microarray experiments, other statistical approaches can also be useful (22a).
Interestingly, the influenza B virus microarray can detect the presence in mixed viral stocks of as little as 5% of a second closely related influenza B virus strain (Fig. 4). The greater ability to detect mixed viral populations is an important advantage of the method over conventional methods based on nucleotide sequencing, since natural isolates often contain more than one genotype. Although the restriction pattern of DNA shown in Fig. 4 indicates that the NS segments of both the Beijing and Shangdong strains were present, similar analyses of other segments and other reassortant strains prepared as vaccine candidates need to be studied in the future. Our preliminary experiments presented here showed that the method could not be used to quantify the components of a mixture without further refinement. The ability to detect as little as 5% of a second strain in a mixture on one hand and the lack of a statistically significant difference between the values obtained for 5% and increasing spike concentrations on the other suggest that the dose-response curve may be nonlinear. There may be a number of reasons for this, including saturation of immobilized probes during hybridization and nonlinear response of the fluorescence detector. Further experiments are needed to make this method more quantitative and to accurately assess the detection limit for mixtures.
The data presented in this paper suggest that the microarray method can be used for high-throughput screening of influenza B virus reassortants, substantially reducing the time needed for vaccine development. The influenza B virus array described in this paper contains only a few oligoprobes specific to each segment of two influenza B virus strains, Beijing and Shangdong. The microarray method can be expanded by adding new oligoprobes, including those specific to other strains, possibly covering the entire repertoire of currently circulating influenza B virus strains. Such extension of the array might allow us to use a more quantitative assay to determine the origins of individual genomic segments in a series of reassortant strains prepared as vaccine candidates or mixed virus populations.
Our preliminary analysis of the repertoire of vaccine reassortants suggests a highly biased (nonrandom) generation of certain segment combinations. Extensive studies of a large number of reassortant strains may help to reveal the preferred combination of segments and to examine the reason for this nonrandomness of the reassortment process.
It is known that rates of homologous recombination in negative-sense RNA viruses, including influenza virus, are substantially lower than their point mutation rates (24). However, patterns of sequence variations compatible with the action of recombination were recently observed in several RNA viruses, including influenza A virus and influenza B virus (3). A search for the evidence of within-segment recombination in influenza viruses could be done by a microarray method similar to the one developed to analyze recombinants in poliovirus (5).
The ability to perform rapid and comprehensive genomic analysis of multiple field isolates can help in efforts to recognize and respond to the emergence of new pandemic strains of influenza virus by providing a better understanding of the processes of viral evolution in populations of susceptible hosts. Similar approaches can also be developed for analysis of influenza A viruses and for other viruses with segmented genomes and can substantially advance our understanding of their evolution and help to create new effective vaccines.
We thank David Asher for critical reading of the manuscript, Gennady Rezapkin for his helpful suggestions in planning the experiments, and Valery Pickalov for creation of a database of microarray experiments.
The work was supported by grants from the Defense Advanced Research Projects Agency (DARPA) and the National Vaccine Program Office (NVPO).
To design an optimal microarray experiment, we need to know how large a sample size n (i.e., the number of replicate hybridizations [or experiments] for a homotypic oligoprobe) should be to give 95% confidence that <5% of specific hybridizations will produce false-negative results. In other words, we consider the question of how large the sample size n must be so that we can be at least 95% sure that the further use of the microarray gives false-negative values of index I (index IA or IB in equation 1) in ≤5% of homotypic hybridizations.
Let us plan to draw random samples X1, X2,…, Xn of the n hybridizations for a given homotypic oligoprobe. Let A represent the homotypic hybridizations showing a positive value of I (true event). Let B represent the homotypic hybridizations showing a negative or zero value of I (false-negative homotypic hybridization). We assume that the events B are rare or unlikely events. Let p represent the probability of event B, and let (1 − p) represent the probability of event A. These A and B events can be considered mutually independent and identically distributed events. Let α represent the level of significance for maximum probability of rejection of a null hypothesis, which is ascribed to a confidence probability of the events B after n true binomial experiments having probability (1 − p) in each of the n experiments. Let pB represent a maximum value of a probability of the event B for a given oligoprobe. Then, using a binomial probability distribution function BIN(0;n,pB) (11), we have P(zero events B in n homotypic hybridizations) = (1 − pB)^n.
Then, assuming that P = α = 1 − β, we obtain pB = 1 − (1 − β)^(1/n) (equation 2) and, equivalently, n = log(1 − β)/log(1 − pB) (equation 3).
Let α = 1 − β = 1 − 0.95 = 0.05; then, using equation 3 with pB = 0.05, we obtain n = log(0.05)/log(0.95) ≈ 58. Thus, we expect that ~58 replicated hybridization experiments for each homotypic oligoprobe should be sufficient to get an expected reliability for the purpose of planning experiments. In actuality, we analyzed seven slides containing 56 replicates of each of the 40 distinct homotypic oligonucleotides.
When no error is observed in all 56 of the replicated experiments, we can estimate the maximum probability of false-negative events (homotypic hybridizations), called B, by equation 2: pB = 1 − (1 − β)^(1/n). Then, pB = 1 − (1 − 0.95)^(1/56) = 0.05209.
When 1, 2, …, k, … errors are found, the maximum probability of false-negative events, pB, can be estimated by the binomial probability distribution BIN(k;n,pB) (11):
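The two zero-error calculations above (the planned number of replicates and the maximum false-negative probability after 56 error-free replicates) can be checked numerically. The sketch below is a minimal illustration; the function names are hypothetical, and it covers only the case with no observed errors, where (1 − pB)^n = 1 − β.

```python
import math


def replicates_needed(p_b, confidence=0.95):
    """Replicates n such that (1 - p_b)**n equals 1 - confidence."""
    return math.log(1.0 - confidence) / math.log(1.0 - p_b)


def max_false_negative_rate(n, confidence=0.95):
    """Upper bound p_B when zero false negatives are seen in n replicates."""
    return 1.0 - (1.0 - confidence) ** (1.0 / n)


n_planned = replicates_needed(0.05)       # about 58 replicates
p_b_56 = max_false_negative_rate(56)      # about 0.05209

print(round(n_planned, 1))
print(round(p_b_56, 5))
```

The two functions are inverses of each other, so feeding the bound obtained for 56 replicates back into `replicates_needed` returns 56.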
We can estimate pB (k = 0) = 0.05209; pB (k = 1) = 0.0781; pB (k = 2) = 0.104; pB (k = 3) = 0.1432; pB (k = 4) = 0.1532; pB (k = 5) = 0.1829 for k = 0, 1, 2, 3, 4, 5, which have been observed in all slides.
We selected from two to four homotypic oligoprobes for each of the eight genomic segments to use in hybridization experiments. Let pB1, pB2,… represent the observed frequencies of false-negative events B1, B2,…associated with homotypic hybridization on the first, second, etc. distinct oligoprobes representing a given genome segment. Analysis of hybridization with two or more distinct oligoprobes covering a given segment in the same microarray experiment allowed us to increase the reliability of genomic-segment identification by estimating the joint probability function of false-negative events for a given genomic segment by the formula Ps = pB1pB2. … For example, when two oligoprobes used for the identification of a given segment had pB1 = 0.0781 and pB2 = 0.0108, respectively, the maximum probability of false-negative identification of the given segment was estimated to be Ps = pB1pB2 = 0.0079.
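The effect of probe redundancy in the formula Ps = pB1pB2… can be illustrated with a short Python sketch. It assumes, as the product formula does, that false-negative events on different oligoprobes are independent; the function name is hypothetical, and the input uses the zero-error bound pB (k = 0) = 0.05209 given above for each probe.

```python
import math


def segment_false_negative_bound(per_probe_bounds):
    """Joint bound for a segment: product of the per-probe bounds p_B."""
    return math.prod(per_probe_bounds)


# Illustration: a segment covered by two or three oligoprobes, each with
# no false negatives observed in 56 replicates (p_B = 0.05209).
two_probes = segment_false_negative_bound([0.05209, 0.05209])
three_probes = segment_false_negative_bound([0.05209, 0.05209, 0.05209])

print(f"{two_probes:.5f}")
print(f"{three_probes:.6f}")
```

Each additional error-free probe lowers the joint bound by roughly a factor of 20, which is why segments represented by more oligoprobes reach the smallest false-negative probabilities.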