Most human cancers are characterized by genomic instability, the accumulation of multiple genetic alterations and allelic imbalance throughout the genome. Loss of heterozygosity (LOH) is a common form of allelic imbalance and the detection of LOH has been used to identify genomic regions that harbor tumor suppressor genes and to characterize tumor stages and progression. Here we describe the use of high-density oligonucleotide arrays for genome-wide scans for LOH and allelic imbalance in human tumors. The arrays contain redundant sets of probes for 600 genetic loci that are distributed across all human chromosomes. The arrays were used to detect allelic imbalance in two types of human tumors, and a subset of the results was confirmed using conventional gel-based methods. We also tested the ability to study heterogeneous cell populations and found that allelic imbalance can be detected in the presence of a substantial background of normal cells. The detection of LOH and other chromosomal changes using large numbers of single nucleotide polymorphism (SNP) markers should enable identification of patterns of allelic imbalance with potential prognostic and diagnostic utility.
Neoplastic progression is generally characterized by the accumulation of multiple genetic alterations including loss of tumor suppressor gene function. Identification of the alterations involved in initiation and progression of premalignant conditions to cancer will help address many questions concerning the mechanisms of neoplastic progression in vivo and facilitate the discovery of diagnostic and prognostic markers and potential therapeutic targets.
The classic mechanism of tumor suppressor gene inactivation is described by the two-hit model in which one allele is mutated and the other allele is lost through a number of possible mechanisms, resulting in loss of heterozygosity (LOH) at multiple loci (Knudson 1985; Hansen and Cavenee 1987; Brown 1997). LOH can arise by a variety of genetic mechanisms, including physical deletion, chromosome nondisjunction, mitotic nondisjunction followed by reduplication of the remaining chromosome, mitotic recombination and gene conversion. LOH is one example of allelic imbalance. Allelic imbalance can arise from the complete loss of an allele or from an increase in copy number of one allele relative to the other. Allelic imbalances can be detected by measuring the proportion of one allele relative to the other in cells from individuals that are constitutionally heterozygous at a given locus. LOH involves complete loss of one of the two alleles at a locus, but normal cell contamination can confound the distinction between true LOH and other mechanisms of allelic imbalance. However, studies using flow-cytometrically purified samples have shown that complete LOH can be clearly detected in tissue samples (Barrett et al. 1996; Boige et al. 1997; Paulson et al. 1999). Studies have shown that neoplastic progression is often associated with the accumulation of somatic-cell genetic changes as the tumor progresses to advanced stages (Vogelstein et al. 1989; Fults et al. 1990; Sato et al. 1990; Stanbridge 1990; Tsuchiya et al. 1992; Yamaguchi et al. 1992; Thrash-Bingham et al. 1995; Reid et al. 1996). Thus, characterization of genome-wide patterns of allelic imbalance may provide a molecular basis for prognosis as well as aid in the identification of specific regions that harbor tumor suppressor genes.
Large-scale LOH measurements are difficult to perform with conventional approaches that employ restriction fragment length polymorphism (RFLP) or polymorphic microsatellite markers (short tandem repeats or STRs). RFLP markers have low heterozygosity rates and are available in small numbers. Gel-based microsatellite assays are difficult to automate and are not readily scalable (Gruis et al. 1993). As a result, most genome-wide scans for LOH have been conducted at low resolution with a relatively small number of polymorphic markers. For example, an average of 120 STRs was used to determine the allelotypes of multiple different human neoplasms in a series of studies since 1995, and the highest-density STR allelotype used ~280 polymorphic markers (Field et al. 1995; Hahn et al. 1995; Takeuchi et al. 1995; Califano et al. 1996; Johns et al. 1996; Tamura et al. 1996; Baccichet et al. 1997; Boige et al. 1997; Gleeson et al. 1997; Kawanishi et al. 1997; Mori et al. 1997; Chambon-Pautas et al. 1998; Hatta et al. 1998; Piao et al. 1998; Shih et al. 1998; Mao et al. 1999; Yustein et al. 1999). Comparative genomic hybridization (CGH) and cDNA microarrays can be useful for measuring genome-wide increases or decreases in DNA copy number (Forozan et al. 1997; Pollack et al. 1999). However, beginning with the seminal study by Cavenee et al. (1983), several reports have indicated that LOH can occur by genetic mechanisms (e.g., mitotic recombination, mitotic nondisjunction followed by chromosome reduplication, gene conversion) that do not lead to changes in DNA copy number. For example, it has been shown that a large number of LOH events result from mitotic recombination (Gupta et al. 1997; Hagstrom and Dryja 1999), which does not change DNA copy number but can be detected as LOH by use of genetic polymorphisms such as single nucleotide polymorphisms (SNPs).
The recent identification of large numbers of SNPs in the human genome provides a rich set of markers that can be used in a wide variety of genetic studies. Biallelic SNPs are highly abundant, estimated at more than 3 × 10⁶ in the human genome (Kruglyak 1997). In addition, SNPs can be amplified by multiplex PCR (Wang et al. 1998), whereas microsatellite markers generally require individual amplification reactions. The amplification step makes it possible to use only small amounts of genomic DNA, which is often essential when working with limited clinical material. Furthermore, SNP analysis can be performed on high-density oligonucleotide arrays (Wang et al. 1998), eliminating the need for gel-based analysis. This study describes the use of SNPs combined with oligonucleotide probe array technology (Fodor et al. 1991, 1993; Pease et al. 1994; Chee et al. 1996; Lockhart et al. 1996; Wodicka et al. 1997; Gunderson et al. 1998) to detect changes in allelic representation in human tumors in a reproducible, accurate, sensitive, scalable, and efficient manner.
The arrays were designed to genotype up to 600 biallelic SNPs (Figs. 1 and 2; a list of markers is available from the authors on request). On the basis of a previous study (Wang et al. 1998), we estimated that ~440 of the 600 loci are truly polymorphic. These polymorphic loci are distributed across all human chromosomes and have an average heterozygosity of 0.33. The basic approach to genotyping these markers is similar to that described by Wang et al. (1998), but the SNP array design and analysis algorithms used here are different. For each locus, the SNP array interrogates not only the polymorphic base (position 0) but also four additional bases for each allele, two on each side of the polymorphic position (positions −4, −1, +1, and +4; Fig. 1A). This probe redundancy improves the confidence of the genotype calls. As shown in Figure 1A, the probe set for each interrogated base includes four oligonucleotide probes that differ only at the central position (referred to collectively as a tile and shown as four squares in the figure). Separate tiles are constructed for the A allele and the B allele at positions −4, −1, +1, and +4. At position 0 (the polymorphic base), both alleles share a single tile (Fig. 1B). To increase accuracy further, both sense and antisense strands are queried on the array using the same type of probe set. Genotypes for each locus were determined by calculating the fraction of the A allele in target samples, and chromosomal changes were assessed by measuring the difference in A-allele fraction between normal and tumor samples from the same individual (see Methods).
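The probe organization described above can be tallied with a short Python sketch. It assumes, as the text states, that every tile holds four probes differing only at their central base and that the sense and antisense strands use identical layouts; the names `probe_layout` and `FLANK_POSITIONS` are hypothetical, and any control or grid-alignment probes on the physical array are ignored.

```python
from itertools import product

# Positions interrogated around each SNP, as described in the text.
FLANK_POSITIONS = (-4, -1, 1, 4)
POLYMORPHIC_POSITION = 0
BASES = "ACGT"  # the four probes in a tile differ only at their central base
STRANDS = ("sense", "antisense")


def probe_layout():
    """Enumerate (strand, position, tile, central_base) for one SNP locus.

    Flanking positions get separate A-allele and B-allele tiles; the
    polymorphic position gets a single tile shared by both alleles.
    """
    probes = []
    for strand in STRANDS:
        for pos in FLANK_POSITIONS:
            for allele, base in product(("A", "B"), BASES):
                probes.append((strand, pos, allele, base))
        for base in BASES:
            probes.append((strand, POLYMORPHIC_POSITION, "shared", base))
    return probes


layout = probe_layout()
# 2 strands x (4 positions x 2 alleles + 1 shared tile) x 4 probes per tile
print(len(layout))  # 72
```

Under these assumptions each locus is covered by 18 tiles (72 probes), which is the redundancy the genotype calls draw on.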
The ability to detect allelic imbalance was first demonstrated in a family case study with two unaffected parents and a child with two separate neurofibromatosis type 2 (NF-2) tumors. This case had been studied previously using conventional RFLP markers (Wolff et al. 1992), but the information about tumor type and the results of RFLP analysis were blinded prior to the SNP array experiments described here. The SNP-containing loci were amplified by multiplex PCR from genomic DNA derived from blood and genomic DNA from tumor tissues. PCR products were subsequently labeled with biotin and hybridized to the SNP arrays. As shown in Figure 3, one parent is heterozygous (AB) and the other is homozygous (BB) at one locus on chromosome 22, while the child is a heterozygote (AB). Tumor samples from two independent tumors taken from the child showed a clear loss of the A allele at this locus. The analysis identified only three SNPs that showed clear evidence of LOH. Those three SNPs were all located on chromosome 22, consistent with the previous RFLP analysis that also identified LOH only on chromosome 22 (Wolff et al. 1992; Seizinger et al. 1986).
We tested the reproducibility of the SNP array-based allelic imbalance analysis by performing triplicate experiments with purified aneuploid DNA obtained from a patient with an esophageal adenocarcinoma. Three independent amplification and labeling reactions for 558 SNPs were performed on DNA derived from the patient’s normal cells and from a purified aneuploid cell population that had been separated from the normal cells by DNA content flow cytometry. The three independent preparations for the two cell populations were hybridized to six separate SNP arrays. The genotypes for the triplicate experiments were determined by use of an algorithm that calculates the fraction of the A allele for each marker in the target samples. A-allele fractions were calculated only for loci that passed the quality analysis, indicating sufficient signal and a clear hybridization pattern (see Methods). A total of 470 loci consistently passed the quality analysis for both the normal and the aneuploid samples across three independent preparations. One hundred and fifty loci were informative (i.e., clearly heterozygous in the normal sample) for this individual. The independently obtained A-allele fractions were highly correlated (linear correlation coefficient, 0.99) for both the normal replicates (Fig. 4A) and the aneuploid replicates (Fig. 4B). In contrast, the A-allele fractions were significantly different between the normal and aneuploid samples for a number of loci (Fig. 4C,D). Loci whose A-allele fraction shifted from the heterozygous range in the normal sample to the homozygous range in the aneuploid sample were scored as showing a change in allelic representation. Of the 470 loci that passed the quality analysis in the triplicates, 33 were consistently scored as showing allelic imbalance and 434 were consistently scored either as showing no allelic imbalance (117) or as not informative (317).
Thus, 22% of the informative loci showed allelic imbalance [fractional locus loss (FLL) of 0.22], which is similar to previously published fractional allelic loss (FAL) values of 0.22, 0.28, and 0.29 for esophageal adenocarcinoma (Barrett et al. 1996; Hammoud et al. 1996; Dolan et al. 1998). Only 3 out of the 470 loci (0.64%) gave inconsistent scoring across the three pairs of samples. The highly consistent results demonstrate that the SNP array-based analysis is reproducible with minimal variation introduced at each experimental step.
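The fractional locus loss is the number of informative loci with imbalance divided by all informative loci (imbalanced plus retained); the three inconsistently scored loci fall outside the denominator. A minimal Python check of the reported figures, using a hypothetical helper name:

```python
def fractional_locus_loss(n_imbalanced: int, n_retained: int) -> float:
    """Fraction of informative loci that show allelic imbalance."""
    informative = n_imbalanced + n_retained
    if informative == 0:
        raise ValueError("no informative loci")
    return n_imbalanced / informative


# Counts reported for the triplicate esophageal adenocarcinoma experiment.
imbalanced, retained, not_informative, inconsistent = 33, 117, 317, 3
total = imbalanced + retained + not_informative + inconsistent

print(total)                                                  # 470
print(round(fractional_locus_loss(imbalanced, retained), 2))  # 0.22
print(round(100 * inconsistent / total, 2))                   # 0.64 (% inconsistent)
```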
The extent of genome-wide chromosomal changes detected in the aneuploid population from esophageal adenocarcinoma (triplicate experiment) can be contrasted to that seen for the NF-2 tumor (Fig. 5). The significant difference in the number and location of events between the two tumor types may reflect the underlying biological differences between the benign NF-2 tumor and the malignant esophageal adenocarcinoma.
To confirm the array-based observations, we performed an independent analysis with polymorphic short tandem repeats (STRs) on the same aneuploid and normal DNA samples. We selected 81 STRs that mapped within or flanked SNP loci scored as showing allelic imbalance in the triplicate experiment (for detailed criteria for scoring allelic imbalance, see Methods). The SNP arrays identified loci with allelic imbalance on nine chromosomes (4, 5, 6, 7, 8, 11, 12, 13, and 18; Fig. 5), and allelic imbalance was confirmed by STR analysis in eight of the nine chromosome regions (Fig. 6). On multiple chromosomes, the losses extended across large regions. For example, on chromosome 7 the loss region identified by SNP analysis extended at least 92 cM, and STR analysis confirmed that the loss was contiguous throughout this entire region. In the single unconfirmed case (chromosome 13), the STR markers used in this region were not informative for this individual; therefore, the event identified by the SNP array could not be confirmed by STR analysis. For the STR analysis, rigorous criteria were used for calling allelic imbalance (see Methods). Although we cannot rule out chromosome copy number changes for some loci with allelic imbalance, the majority (80%) of STR loci with allelic imbalance showed complete loss of one allele (Fig. 6). These data strongly suggest that, in most cases, the observed allelic imbalance resulted from an LOH event.
We performed a genome-wide analysis with the SNP arrays on 10 patients with either high-grade dysplasia (HGD), the precursor to esophageal adenocarcinoma, or esophageal adenocarcinoma. For each patient, the normal DNA was derived from control gastric tissues, whereas the tumor DNA was extracted from flow-cytometrically purified aneuploid populations. The aneuploid cell populations comprised, on average, 67% of the cells per biopsy, but after flow-cytometric cell sorting, the aneuploid populations were >95% pure. Figure 7 shows the SNPs with allelic imbalance for a subset of the aneuploid populations. In general, more chromosomal events were observed in patients who had developed cancer than in those with HGD, consistent with data from previous studies (Barrett et al. 1996). Previously published data suggest that premalignant tissues typically contain fewer chromosomal aberrations than cancers and that losses frequently involve regions on chromosomes 9p and 17p, which were detected with the SNP arrays (Paulson et al. 1999).
Next, we compared the array-based results with those obtained with a previously designed set of STR markers, composed primarily of tetranucleotide repeats. We performed an independent analysis on three chromosomes (9, 17, and 18) in the same 10 aneuploid populations. A high frequency of LOH, as evidenced by complete loss of one allele, on these three chromosomes is known to be associated with esophageal cancer (Reid et al. 1996), and the STR markers were previously selected to increase the sensitivity of detection in targeted regions on these chromosomes. The SNP markers, on the other hand, were chosen randomly with no bias toward targeted regions. In addition, because the STRs were not selected to be in regions covering or flanking the SNPs used on the array, we expected to see some degree of discordance. Nonetheless, the SNP array and the STR analysis gave concordant results on 24 of 30 chromosomes (Fig. 8). For 5 of these chromosomes (patients 2 and 7 on chromosome 9; patients 2, 4, and 5 on chromosome 18), no loss was detected by either technique, even though there were many informative markers. On 4 of 30 chromosomes (13%), allelic imbalance was detected in the STR analysis but not by the SNPs, as a result of either the absence of informative markers (patients 4 and 9 on chromosome 17; patient 8 on chromosome 18) or a false negative (patient 1 on chromosome 9). On 2 of 30 chromosomes (6.7%), allelic imbalance was detected by a single SNP marker but was not confirmed by the STR analysis (patients 2 and 9 on chromosome 17). It is possible that the SNPs were mapped incorrectly or that the STR analysis missed the events. Interstitial losses were also detected by both techniques on chromosome 18 (patient 9). Many examples of partial chromosomal losses were identified by both techniques (e.g., patients 3, 6, and 10 on chromosome 17).
The comparison between the standard gel-based results and the SNP array-based results shows that, given a sufficient number of polymorphic markers, the SNP arrays can be used to screen for both small and large chromosomal losses. A higher density of SNP markers will help increase coverage and resolution, allowing a greater fraction of the genome to be checked simultaneously for somatic cell chromosome abnormalities.
Because premalignant and tumor samples are often heterogeneous, containing normal cells as well as neoplastic cell populations, it is important to be able to detect chromosomal changes in nonhomogeneous cell populations. Known loci with allelic imbalance (identified in the triplicate experiment and validated by STR analysis) were used to detect allelic imbalance in simulated heterogeneous samples. DNA from the aneuploid population, separated from normal cells by flow cytometry, was mixed into DNA from the same patient’s normal control sample in increasing amounts (0%, 5%, 10%, 25%, 50%, 75%, 90%, 95%, and 100%) to simulate the heterogeneity of biopsy samples. DNA was mixed either before or after the locus-specific multiplex amplification and labeling reactions to determine whether the amplification procedures affected the relative representation of the two alleles (Fig. 9A,B). Two sets of samples, nine mixed before and nine mixed after the PCR steps, were applied to 18 separate SNP arrays and hybridized under identical conditions. The same aneuploid population was used in this experiment as in the previous triplicate experiments, in which we identified 33 loci with allelic imbalance by comparing normal (0% aneuploid) and aneuploid (100%) samples. The mixing experiment was repeated three times, and 28 of the 33 loci passed the quality test for all 18 mixtures. Figure 9A shows that the A-allele fraction for one of the markers changes linearly as a function of the percentage of aneuploid DNA in the sample. As expected, the genotype for this marker gradually shifts from clearly heterozygous in the pure normal sample to homozygous as the proportion of aneuploid DNA increases (Fig. 9A). To show the overall behavior of all 28 loci, the A-allele fractions were averaged for the 13 loci shifting to an AA genotype and for the 15 loci shifting to a BB genotype from their initially heterozygous state (Fig. 9B).
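The linear behavior seen in Figure 9A is what a simple mixing model predicts. The sketch below assumes that the aneuploid and normal DNA contribute equal numbers of copies of the locus per unit of DNA (as for copy-neutral LOH) and that amplification preserves the input allele ratio; for a hemizygous deletion the response would generally not be strictly linear. `mixed_a_fraction` is a hypothetical helper, and A-allele fractions are on the 0 to 100 scale used in the text.

```python
def mixed_a_fraction(normal: float, tumor: float, tumor_share: float) -> float:
    """Expected A-allele fraction (0-100) of a normal/aneuploid DNA mixture.

    Assumes equal locus copy number per unit DNA in both sources and no
    allele bias during PCR, so the mixture is a weighted average.
    """
    if not 0.0 <= tumor_share <= 1.0:
        raise ValueError("tumor_share must be between 0 and 1")
    return (1.0 - tumor_share) * normal + tumor_share * tumor


# Mixing series used in the experiment; heterozygous normal (50) and a
# tumor that has lost the B allele (100).
shares = [0.0, 0.05, 0.10, 0.25, 0.50, 0.75, 0.90, 0.95, 1.0]
for s in shares:
    print(f"{s:4.0%}  {mixed_a_fraction(50.0, 100.0, s):5.1f}")
```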
Figure 9C compares the difference scans for the 50% mixture and the 100% aneuploid sample. As expected, the Δ values for the 50% mixed sample are smaller than those for the 100% aneuploid sample. If the same difference threshold (Δ = 20, dashed line) is applied to the 50% mixed sample, 18 of the 28 loci show differences above the threshold. If the threshold is lowered to 15, 26 of the 28 loci are scored as allelic imbalance in the 50% mixed sample (Fig. 9C). However, lowering the threshold also caused three additional loci in the 100% aneuploid sample to be scored as allelic imbalance. Further tests are required to determine whether these three additional calls reflect real events and to determine the best threshold for investigations of heterogeneous samples.
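Under an equal-copy mixing assumption (both DNA sources contribute the same number of copies of the locus per unit DNA, and PCR preserves allele ratios), the shrinkage of Δ in the mixed sample follows directly. Let p be the fraction of aneuploid DNA, N and T the A-allele fractions of the pure normal and pure aneuploid samples, and M(p) the A-allele fraction of the mixture:

```latex
\begin{aligned}
M(p)      &= (1 - p)\,N + p\,T \\
\Delta(p) &= \lvert N - M(p) \rvert = p\,\lvert N - T \rvert = p\,\Delta(1) \\
\Delta(0.5) &= 0.5 \times \lvert 50 - 100 \rvert = 25
\end{aligned}
```

A locus with complete loss (N = 50, T = 100 or 0) therefore drops from Δ = 50 to Δ = 25 in the 50% mixture. Under this model, a locus passes a threshold of 20 at p = 0.5 only if its pure-sample Δ exceeds 40 (or 30 for a threshold of 15), which is consistent with fewer loci being called in the 50% mixture.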
We have demonstrated the feasibility of using SNPs and high-density oligonucleotide arrays in genome-wide screening for allelic imbalance in human tumors. The SNP array used here yielded ~150 informative loci per patient, comparable with the number of STRs used in current genome-wide LOH screens. For example, in 17 different genome-wide allelotype studies conducted over the past 5 years, the average number of STRs used was 120, and the study using the largest number of loci for LOH analysis included 280 STR polymorphisms (Field et al. 1995; Hahn et al. 1995; Takeuchi et al. 1995; Califano et al. 1996; Johns et al. 1996; Tamura et al. 1996; Baccichet et al. 1997; Boige et al. 1997; Gleeson et al. 1997; Kawanishi et al. 1997; Mori et al. 1997; Chambon-Pautas et al. 1998; Hatta et al. 1998; Piao et al. 1998; Shih et al. 1998; Mao et al. 1999; Yustein et al. 1999). However, for prognostic and diagnostic utility, genome-wide analysis will require a greater number of SNP markers that are more evenly distributed throughout the genome. In addition, because the average heterozygosity of SNPs (0.33) is lower than that of STRs, approximately three times as many SNPs are required for equivalent resolution (Kruglyak 1997). Higher density SNP arrays should greatly increase the ability to detect small regions of chromosomal change and will provide more information about the boundaries of loss regions. More markers also increase confidence in a detected event: if multiple adjacent SNPs all show a consistent change, the confidence in the call is much higher than if it is based on only a single SNP. Increasing the density of SNP markers is clearly feasible, as SNPs are abundant in the human genome and SNP discovery and mapping are advancing rapidly (Wang et al. 1998; Cargill et al. 1999; Halushka et al. 1999).
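The link between heterozygosity and marker count is simple expected-value arithmetic: each marker is informative in a given individual with probability equal to its heterozygosity. A minimal sketch, assuming every marker has the average heterozygosity of 0.33 and ignoring loci that fail the quality analysis; the helper names are hypothetical. With ~440 polymorphic loci this gives an expected ~145 informative loci, roughly in line with the ~150 observed per patient.

```python
import math


def expected_informative(n_markers: int, heterozygosity: float) -> float:
    """Expected number of markers heterozygous (informative) in one person."""
    return n_markers * heterozygosity


def markers_needed(target_informative: float, heterozygosity: float) -> int:
    """Smallest marker count whose expected informative yield meets the target."""
    return math.ceil(target_informative / heterozygosity)


print(round(expected_informative(440, 0.33), 1))  # 145.2
print(markers_needed(150, 0.33))                  # 455
```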
Because the array-based readout is parallel and scalable, larger numbers of markers can be assayed simultaneously without significant increases in time or labor.
SNP arrays have many advantages for LOH detection compared with traditional techniques. PCR products containing SNP loci are typically shorter than STR amplicons and are more readily amplified in parallel, and they may be better suited to amplifying DNA from formalin-fixed or otherwise compromised tissues. Also, the amount of cellular DNA required to interrogate a SNP on an array is significantly less than that required for standard STR analysis, making it possible to evaluate limited clinical samples.
Surgically removed tumor tissues often contain some normal cells that can interfere with the detection of changes in tumor cells. Therefore, it is important to be able to detect chromosomal changes in heterogeneous samples in which the tumor cells may represent only a portion of the sampled cell population. We simulated a heterogeneous cell population by preparing a mixture of purified aneuploid DNA with normal control DNA from the same patient. With the SNP arrays, we were able to detect chromosomal changes in heterogeneous samples, and changes can be clearly and reproducibly identified in samples with a background of up to 50% normal DNA (Fig. 9A–C). As described previously, high sample purity is required to distinguish true LOH from other types of allelic imbalance because of the confounding effects of normal cell contamination (Barrett et al. 1996; Boige et al. 1997; Paulson et al. 1999). Our mixing experiments reinforce the importance of working with purified samples to distinguish between true LOH and other mechanisms of allelic imbalance.
At present the SNP-based method cannot distinguish between loss and gain of alleles. With higher density SNP arrays, it may be possible to use signal intensity differences between tumor and normal samples to indicate chromosomal loss or gain. In a recent study, 3360 mapped cDNAs were used in a microarray hybridization assay (Pollack et al. 1999). This technique provides an approach for the detection of DNA copy number changes, which is complementary to a SNP-based method that detects changes in allelic representation.
The identification and mapping of additional SNP markers is rapidly advancing, and array-based methods provide a scalable approach to the simultaneous genotyping of thousands of markers in parallel. The availability of more markers and higher capacity array designs will allow efficient, genome-wide, high-resolution searches for chromosomal changes associated with tumor initiation and progression. The patterns of chromosomal alterations may be useful for diagnostic purposes and to follow disease progression and guide patient care.
Frozen endoscopic or surgical biopsies were processed by DNA content flow cytometry to purify aneuploid cells from normal cells as described previously (Paulson et al. 1999). Aneuploid populations separated by this method have a high degree of purity and typically represent clonal populations (Barrett et al. 1999). The use of purified aneuploid populations allows for detection of near 100% LOH at some loci, making these samples ideal for comparing different LOH detection methodologies in human biopsy samples. DNA was extracted using the Puregene DNA Isolation Kit (Gentra Systems, Inc.). The STR polymorphisms used consisted primarily of tetranucleotide repeats shown previously to have a high degree of reproducibility for scoring LOH (Paulson et al. 1999). Locus-specific PCR reagents and conditions for STR amplification and analysis were described previously (Paulson et al. 1999). PCR products were analyzed on an ABI 377 DNA Sequencer, and data were processed with Genotyper software (PE Applied Biosystems). Allelic imbalance was assessed by measuring the ratio of fluorescence intensity for the shorter allele A to that of the longer allele B (A/B) in the aneuploid sample, compared with that in a normal constitutive control. Ratios <0.4 or >2.5 (depending on which allele was lost) were considered indicative of allelic imbalance.
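The STR scoring rule can be written as a small predicate. The sketch assumes that "compared with a normal constitutive control" means the tumor A/B ratio is divided by the normal A/B ratio; the function name and the handling of a completely absent B peak are illustrative choices, not taken from the original analysis.

```python
def str_allelic_imbalance(tumor_a: float, tumor_b: float,
                          normal_a: float, normal_b: float,
                          low: float = 0.4, high: float = 2.5) -> bool:
    """Flag allelic imbalance at an STR locus from peak fluorescence.

    A is the shorter allele and B the longer one. The tumor A/B ratio is
    normalized to the normal A/B ratio (an assumption about how the
    comparison was made); values below `low` or above `high` are called.
    """
    if normal_a <= 0 or normal_b <= 0:
        raise ValueError("locus must be heterozygous in the normal sample")
    if tumor_b == 0:
        # B allele completely absent: the ratio is unbounded.
        return tumor_a > 0
    ratio = (tumor_a / tumor_b) / (normal_a / normal_b)
    return ratio < low or ratio > high
```

Note that 2.5 is the reciprocal of 0.4, so the rule treats loss of either allele symmetrically.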
SNP-containing loci were amplified by locus-specific multiplex PCR from both tumor and normal genomic DNA. The multiplex PCR was performed with 46 PCR primer pairs in a single reaction (Wang et al. 1998). Forward and reverse primers contained T7 and T3 sequences, respectively (Wang et al. 1998). The PCR was performed under conditions similar to those described by Wang et al. (1998). Each 20-μl reaction contained 7 ng of genomic DNA, 0.1 μM of each primer, 1 unit of AmpliTaq Gold (Perkin-Elmer), 1 mM deoxynucleotide triphosphates (dNTPs), 10 mM Tris-HCl (pH 8.3), 50 mM KCl, and 5 mM MgCl2. Thermocycling was performed with initial denaturation at 96°C for 10 min, followed by 30 cycles of denaturation at 96°C for 30 sec, primer annealing at 55°C for 2 min, and primer extension at 65°C for 2 min. After 30 cycles, a final extension reaction was carried out at 65°C for 5 min. The amplified PCR products ranged from 100 to 150 bp in length. An aliquot (2 μl) of the multiplex PCR products was subjected to a second round of PCR with biotinylated T7 and T3 primers. These reactions were performed with 0.1 μM labeled primer, 1 unit of AmpliTaq Gold, 100 μM dNTPs, 10 mM Tris-HCl (pH 8.3), 50 mM KCl, and 1.5 mM MgCl2. Thermocycling was carried out with initial denaturation at 96°C for 10 min, followed by 25 cycles of denaturation at 96°C for 30 sec, primer annealing at 55°C for 1 min, and primer extension at 72°C for 1 min. After 25 cycles, a final extension reaction was carried out at 72°C for 5 min. The biotin-labeled products were pooled, denatured at 99°C for 15 min, and chilled on ice for 3 min before being added to a hybridization solution [3 M tetramethylammonium chloride, 10 mM Tris (pH 7.8), 0.01% Triton X-100, and 0.1 mg/ml herring sperm DNA]. A biotin-labeled control oligonucleotide was also added to the hybridization solution to produce fluorescence signals at the corners of the image for proper grid alignment and image analysis.
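The two cycling programs differ only in annealing/extension times, extension temperature, and cycle count, which is easier to see when they are encoded as data. The sketch below uses hypothetical `Step` and `Program` classes holding only the temperatures, hold times, and cycle counts given above; it ignores ramp rates, so the computed totals understate real instrument run time.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Step:
    temp_c: float
    seconds: int


@dataclass(frozen=True)
class Program:
    initial: Step
    cycle: Tuple[Step, ...]  # steps repeated in every cycle
    cycles: int
    final: Step

    def hold_minutes(self) -> float:
        """Total programmed hold time in minutes, ignoring ramp times."""
        per_cycle = sum(step.seconds for step in self.cycle)
        total = self.initial.seconds + self.cycles * per_cycle + self.final.seconds
        return total / 60


# First round: locus-specific multiplex PCR.
multiplex = Program(
    initial=Step(96, 600),
    cycle=(Step(96, 30), Step(55, 120), Step(65, 120)),
    cycles=30,
    final=Step(65, 300),
)
# Second round: labeling PCR with biotinylated T7/T3 primers.
labeling = Program(
    initial=Step(96, 600),
    cycle=(Step(96, 30), Step(55, 60), Step(72, 60)),
    cycles=25,
    final=Step(72, 300),
)
print(multiplex.hold_minutes(), labeling.hold_minutes())  # 150.0 77.5
```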
A 200-μl aliquot of the hybridization mixture was added to the flow cell of the SNP array. Hybridization was carried out at 40°C for 16 hr on a rotisserie (50 rpm). Following hybridization, arrays were washed with 6× SSPE buffer [0.9 M NaCl, 60 mM NaH2PO4, 6 mM EDTA (pH 7.4)] at room temperature. The arrays were then stained with phycoerythrin-conjugated streptavidin (Molecular Probes; 2 μg/ml in 6× SSPE buffer, 0.01% Triton, 0.5 mg/ml BSA) on a rotisserie (50 rpm) for 10 min at room temperature. The arrays were washed again with 6× SSPE buffer and scanned with a custom-made scanning confocal microscope at a resolution of 3.4 μm per pixel (Trulson et al. 1997).
Typical data analysis consisted of three sequential steps. First, the data underwent a quality analysis to reject loci lacking sufficiently strong and specific hybridization patterns. Second, the A-allele fraction was calculated for loci that passed the quality analysis. Third, significant changes were assessed by calculating the difference in A-allele fraction between the tumor sample and the corresponding normal sample at each locus.
The quality analysis was designed to identify and ignore loci that do not yield sufficiently clear hybridization patterns. This analysis is based on the idea that if a SNP marker is present in a target sample, it should hybridize to its complementary sequences tiled on the array and produce a specific hybridization pattern in which perfect match (PM) probes have higher intensity than mismatch (MM) probes. The intensity difference (PM − MM) and ratio (PM/MM) are calculated for each allele at each locus. For a given allele, if PM − MM > Difference Threshold (DT) and PM/MM > Ratio Threshold (RT), the allele is scored as present.
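The per-allele presence test reduces to two inequalities. In this sketch `dt` and `rt` stand for the Difference Threshold and Ratio Threshold; the text does not report their optimized values, so any numbers passed in are placeholders, and rejecting non-positive MM intensities is an added assumption.

```python
def allele_present(pm: float, mm: float, dt: float, rt: float) -> bool:
    """Score one allele as present from its perfect-match/mismatch intensities.

    Present requires PM - MM > dt AND PM / MM > rt; dt and rt are tuning
    parameters whose optimized values are not given in the text.
    """
    if mm <= 0:
        raise ValueError("mismatch intensity must be positive")
    return (pm - mm) > dt and (pm / mm) > rt


# Placeholder thresholds, for illustration only.
print(allele_present(1000.0, 200.0, dt=100.0, rt=1.5))
```

Requiring both a difference and a ratio guards against two failure modes: a bright probe pair with little specificity (large difference, small ratio) and a dim pair whose ratio is inflated by near-background MM (large ratio, small difference).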
The appropriate values of DT and RT were determined and optimized through a series of analyses with a set of known control samples. In these experiments, 558 SNP markers from three individuals were amplified in individual (single-locus) PCRs. In all three individuals, 510 loci yielded a single specific product of the expected size. These 510 loci served as controls for false negative scores because they should give a specific hybridization pattern and be scored as present. To test for false positive scores, 42 SNPs were deliberately not amplified; they should therefore give no hybridization pattern and be scored as absent. The products of these single-locus PCRs for the three individuals were pooled and hybridized to three separate SNP arrays. False positive and false negative rates were measured for different combinations of DT and RT values, and the thresholds that gave the lowest overall false positive and false negative rates were selected.
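This optimization amounts to a grid search over candidate (DT, RT) pairs scored on the present and absent control loci. A minimal sketch, assuming each control locus is summarized by a single (PM, MM) intensity pair with positive MM and that "lowest overall" means the smallest sum of the two error rates; the grids and function names are hypothetical.

```python
from itertools import product


def error_rates(present_controls, absent_controls, dt, rt):
    """False-negative and false-positive rates for one (dt, rt) pair.

    Each control is a (pm, mm) pair. Present controls should be scored
    present; absent (deliberately unamplified) controls should not.
    """
    def called(pm, mm):
        return (pm - mm) > dt and pm / mm > rt

    fn = sum(not called(pm, mm) for pm, mm in present_controls) / len(present_controls)
    fp = sum(called(pm, mm) for pm, mm in absent_controls) / len(absent_controls)
    return fn, fp


def choose_thresholds(present_controls, absent_controls, dt_grid, rt_grid):
    """Return the first (dt, rt) pair minimizing the summed error rates."""
    return min(
        product(dt_grid, rt_grid),
        key=lambda pair: sum(error_rates(present_controls, absent_controls, *pair)),
    )
```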
To score a locus, we also analyzed all probes that represented the marker. As shown in Fig. 1B, the A- and B-allele tiles at each position together define a miniblock. If the signals for both alleles failed the DT and RT criteria, the miniblock was ignored. Each strand of the marker was represented by five miniblocks (positions −4, −1, 0, +1, and +4), which together form a block. If three of the five miniblocks failed, the block was ignored. Both strands are queried for each marker using the same block structure; if both the sense and antisense blocks failed, the marker was ignored.
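The miniblock, block (one strand), and marker filters nest as three checks. This sketch reads "three of the five miniblocks failed" as three or more failures out of five; the data layout (a dict of per-allele pass/fail flags for each miniblock) and the function names are assumptions.

```python
from typing import Dict, Sequence

Miniblock = Dict[str, bool]  # allele ("A" or "B") -> passed the DT/RT test


def miniblock_usable(mb: Miniblock) -> bool:
    # A miniblock is ignored only if both alleles fail.
    return mb["A"] or mb["B"]


def block_usable(miniblocks: Sequence[Miniblock], max_failed: int = 2) -> bool:
    # One strand = five miniblocks (positions -4, -1, 0, +1, +4).
    # Assumption: three or more failed miniblocks discard the block.
    failed = sum(not miniblock_usable(mb) for mb in miniblocks)
    return failed <= max_failed


def marker_usable(sense: Sequence[Miniblock], antisense: Sequence[Miniblock]) -> bool:
    # The marker is ignored only if both strand blocks fail.
    return block_usable(sense) or block_usable(antisense)
```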
To determine the genotype, we used an algorithm that estimates the fraction of the A allele for each marker. The average percentage fraction of the A allele is defined as
where a and b represent the A and B allele, respectively, and MM is the average of the MM values as shown in Fig. 1B. Ideally, the A-allele fraction is 100 (homozygous AA), 50 (heterozygous AB), or 0 (homozygous BB). To define the experimental deviation from ideality, a reference range for each genotype was determined empirically by hybridizing samples from 39 unrelated individuals of known genotypes as described previously (Wang et al. 1998). The range of A-allele fractions for each marker was defined by the presence of three distinct clusters representing the three genotypes. Although the absolute genotype calls are not crucial for the difference analysis, the A-allele fractions for heterozygous calls in normal samples must be clearly distinguishable from those for homozygous calls in tumor samples.
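The defining equation is not reproduced in this text, so the following uses only an assumed form consistent with the surrounding description: per miniblock, the background-corrected A signal (PM_a − MM) divided by the summed background-corrected A and B signals, averaged across miniblocks and scaled to 0 to 100. The genotype call then looks the fraction up in marker-specific reference ranges supplied by the caller; the ranges in the usage line are made up for illustration.

```python
from statistics import mean
from typing import Dict, Optional, Sequence, Tuple


def a_allele_fraction(miniblocks: Sequence[Tuple[float, float, float]]) -> float:
    """Average percentage A-allele fraction over miniblocks (ASSUMED formula).

    Each miniblock is (pm_a, pm_b, mm_avg). Background-corrected signals
    are clipped at zero; miniblocks with no signal are skipped.
    """
    fractions = []
    for pm_a, pm_b, mm in miniblocks:
        a = max(pm_a - mm, 0.0)
        b = max(pm_b - mm, 0.0)
        if a + b > 0:
            fractions.append(a / (a + b))
    if not fractions:
        raise ValueError("no miniblock with signal above background")
    return 100.0 * mean(fractions)


def call_genotype(fraction: float,
                  ranges: Dict[str, Tuple[float, float]]) -> Optional[str]:
    """Return the genotype whose reference range contains the fraction, if any."""
    for genotype, (low, high) in ranges.items():
        if low <= fraction <= high:
            return genotype
    return None  # falls between clusters: no call


# Hypothetical reference ranges for one marker.
ranges = {"AA": (80, 100), "AB": (30, 70), "BB": (0, 20)}
print(call_genotype(a_allele_fraction([(600.0, 600.0, 200.0)]), ranges))  # AB
```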
Allelic imbalance was assessed by measuring the difference in A-allele fraction between normal and tumor samples from the same individual. The difference value is defined as Δ = |N − T|, where N and T denote the A-allele fractions of the normal and tumor samples, respectively. Criteria for scoring allelic imbalance were developed with a training data set containing two normal samples and two tumor samples with known deletions (data not shown) and confirmed by the triplicate experiment (Figs. 4 and 5). First, to be considered potentially informative, a marker had to have an A-allele fraction in the heterozygous range in the normal sample (75 ≥ N ≥ 25). For a marker to be considered changed, the A-allele fraction in the tumor sample had to be in the homozygous range, and Δ had to exceed 20. Finally, a change had to be consistent across the five miniblocks of each probe set, and if both strands of a marker passed the quality analysis, the change had to be called consistently for both strands.
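These criteria translate into a per-pair rule plus a consistency check. The heterozygous window (25 to 75) and the Δ > 20 cutoff come from the text; the homozygous ranges are marker-specific and are not stated here, so the caller passes them in, and any example values are hypothetical. Function names are illustrative.

```python
from typing import Iterable, Sequence, Tuple

Range = Tuple[float, float]


def score_pair(normal: float, tumor: float,
               homozygous_ranges: Sequence[Range],
               threshold: float = 20.0) -> str:
    """Score one normal/tumor pair of A-allele fractions (0-100 scale)."""
    if not 25.0 <= normal <= 75.0:  # heterozygous range in the normal sample
        return "not informative"
    in_homozygous = any(lo <= tumor <= hi for lo, hi in homozygous_ranges)
    if in_homozygous and abs(normal - tumor) > threshold:  # delta = |N - T|
        return "imbalance"
    return "no imbalance"


def score_marker(pairs: Iterable[Tuple[float, float]],
                 homozygous_ranges: Sequence[Range],
                 threshold: float = 20.0) -> str:
    """Require the same call for every miniblock or strand that passed QC."""
    calls = {score_pair(n, t, homozygous_ranges, threshold) for n, t in pairs}
    if not calls:
        raise ValueError("no pairs passed the quality analysis")
    return calls.pop() if len(calls) == 1 else "inconsistent"
```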
We thank David Wang and Eric Lander for providing multiplex primer pools, PCR conditions, and the SNP map, and Don Morris for designing the SNP array. This work was partially supported by National Institutes of Health Grants R01 CA61202 and RFA CA78855 to P.C.G. and B.J.R.