In many plant species, high-resolution mapping of genes is limited by a lack of sufficient DNA markers. This limitation is especially significant when a trait is controlled by quantitative trait loci (QTLs), because QTLs may remain undetected, or their effects may be underestimated, when marker density is low. Linkage disequilibrium (LD) maps and association mapping also require many markers, genotyped at very high density across many different individuals. Marker-assisted breeding is another application that requires abundant markers, in this case for integrating genes or traits into modern crop varieties.
Single nucleotide polymorphisms (SNPs) are abundant and provide a rich source of potential DNA markers. Individual SNPs may also contribute directly to phenotypic variation if they lie in an intragenic or promoter region [1], and can then serve as perfect markers for genes or traits of interest. In addition to being abundant, SNPs can be assayed on several high-throughput genotyping platforms that substantially reduce the cost per data point. In soybean, resequencing of sequence-tagged sites derived from ESTs led to the discovery of SNPs, and a map of 1,141 SNP loci was generated using three RIL populations [3]. Similarly, a barley map of 300 SNP loci was constructed using SNPs developed by resequencing unigenes [4].
Although bread wheat (Triticum aestivum L.) is also a major world food crop, progress on SNP discovery has been slow compared with soybean and model organisms such as Arabidopsis and rice [5]. The wheat genome has not yet been sequenced because of its large size (~17,000 Mb) and because about 80% of it consists of repetitive sequences [7]. Wheat is an allohexaploid with 21 pairs of chromosomes, forming seven homoeologous groups with one chromosome from each of three ancestral genomes (A, B, and D). The three genomes are closely related, which complicates SNP analysis of homoeologous gene sequences [8]. Wheat generally has low sequence polymorphism as a consequence of the bottlenecks it passed through during polyploidization and domestication [9]. Large expressed sequence tag (EST) databases have been developed for wheat, and these have been successfully mined for SNPs using contig alignments and/or resequencing [8]. However, the number of SNPs available for genotyping in wheat is still relatively small, and many SNPs are polymorphic only in wild wheat relatives [11]. New technologies that detect genome-wide polymorphisms are needed to discover large numbers of new markers for genomic research and breeding in wheat.
SNPs and insertions or deletions of one or more nucleotides (indels) are DNA polymorphisms that can affect hybridization of DNA or cRNA to a probe on an array. Affymetrix GeneChip arrays are well suited to detecting such variation because each gene is represented by a set of eleven 25-bp probes, which are sensitive to target mismatches owing to their short length. A target sequence that perfectly matches a probe binds with much greater affinity than one containing a mismatch. The resulting difference in hybridization intensity between two genotypes for an individual probe is called a single feature polymorphism (SFP), where a feature is a probe on the array. An SFP may be caused by a SNP, a multiple nucleotide polymorphism, or an indel. However, if cRNA is used for hybridization, gene expression markers (GEMs), which reflect differences in expression level, may also be detected [12].
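To make the SFP idea concrete, here is a minimal Python sketch for a single probe set, under simplifying assumptions: intensities are already log2-transformed and normalized, each genotype has replicate arrays, and a probe is flagged when its genotype difference departs from the probe-set median by a hypothetical cutoff of 1.0 (two-fold). Subtracting the median is one simple way to separate a probe-specific drop in signal (a candidate SFP) from a shift shared by all probes in the set (more likely an expression difference, i.e. a GEM). The function name `call_sfps` and the cutoff are illustrative only; the studies cited here used more formal statistical models.

```python
from statistics import mean, median


def call_sfps(geno_a, geno_b, threshold=1.0):
    """Flag candidate SFPs within one probe set (illustrative sketch).

    geno_a, geno_b: replicate arrays for each genotype; every array is a
        list of log2 intensities with one value per probe in the set.
    threshold: hypothetical cutoff on the residual log2 difference.
    Returns (probe_index, residual) pairs for flagged probes; a positive
    residual means genotype B hybridized unusually weakly to that probe.
    """
    n_probes = len(geno_a[0])
    # Mean log2 difference (A minus B) for each probe across replicates.
    diffs = [
        mean(rep[i] for rep in geno_a) - mean(rep[i] for rep in geno_b)
        for i in range(n_probes)
    ]
    # A shift common to the whole probe set behaves like an expression
    # difference (GEM), so remove it before looking at single probes.
    shift = median(diffs)
    residuals = [d - shift for d in diffs]
    return [(i, r) for i, r in enumerate(residuals) if abs(r) >= threshold]


# Hypothetical data: 11 probes, two replicates per genotype. Genotype B is
# uniformly 1 log2 unit lower (an expression difference), except probe 4,
# which is 3 units lower, as expected if B carries a mismatch there.
geno_a = [[10.0] * 11, [10.0] * 11]
geno_b = [[9.0] * 4 + [7.0] + [9.0] * 6 for _ in range(2)]
print(call_sfps(geno_a, geno_b))  # [(4, 2.0)]
```

A uniform shift across all probes, with no probe-specific drop, yields no calls, which is the intended behavior for a pure expression difference.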
Winzeler et al. first described the method for detecting SFPs by hybridizing DNA from different yeast strains to high-density oligonucleotide arrays [14]. They identified 3,714 markers, which were used for high-resolution mapping of five loci in yeast. Using the same approach, about 4,000 SFPs were identified between two A. thaliana strains with the AtGenome1 GeneChip [15]. Alignment of AtGenome1 feature sequences with publicly available Arabidopsis sequence data confirmed that 117 of 121 predicted SFPs contained sequence variants. In addition, a known mutation was mapped by bulked segregant analysis, in which pools of mutant and wild-type plants were hybridized to the microarray [15]. Singer et al. used an array-based hybridization method to construct an Arabidopsis SFP map containing 676 markers [16]. In barley, more than 10,000 SFPs were discovered using the Affymetrix Barley1 GeneChip [17]. Of 450 barley SFPs, 270 were verified to contain SNPs by comparison with barley sequence datasets [17]. A study by Kumar et al. detected 5,376 SFPs in rice between two japonica cultivars and 25,325 SFPs between japonica
Microarray hybridization with genomic DNA may be unsatisfactory for SFP discovery in species with large genomes [19], largely because the gene sequences targeted by the probes make up only a small fraction of the labeled DNA. Several studies have instead successfully hybridized labeled cRNA to the array to reduce background and enrich for expressed gene sequences [12]. In wheat, 297 SFPs were identified between near-isogenic lines contrasting in stripe rust resistance using the Affymetrix GeneChip Wheat Genome Array [21]. Gore et al. compared different target preparation methods for reducing target complexity [22]. They tested the Affymetrix Maize Genome Array for SFP detection using a set of 13,000 probes of known sequence. Their results indicated that, with the best enrichment method, the full Maize Genome Array should detect about 10,000 SNPs in maize at a 20% false discovery rate. One shortcoming of using cRNA as the target is that the transcripts represented in cRNA pools can vary greatly between tissues, developmental stages, and treatments, so SFPs can be detected only in genes that are expressed in the sampled material.
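A false discovery rate (FDR) such as the 20% quoted above is the expected fraction of called SFPs that are not real polymorphisms. When probes of known sequence are available, as in the Gore et al. study, one straightforward way to estimate it is empirically: count how many calls fall on probes whose target sequence carries no polymorphism. The Python sketch below does this and picks the most permissive score cutoff that keeps the estimated FDR at or below a target. The function names, the scoring scale, and the example data are hypothetical, and this is not necessarily how Gore et al. computed their estimate.

```python
def empirical_fdr(scores, has_snp, cutoff):
    """Estimate FDR at a cutoff using probes of known sequence.

    scores: SFP test statistic per probe (higher = stronger evidence).
    has_snp: True if the probe's known target sequence is polymorphic.
    Returns (fdr, number_of_calls).
    """
    called = [snp for score, snp in zip(scores, has_snp) if score >= cutoff]
    if not called:
        return 0.0, 0
    false_calls = sum(1 for snp in called if not snp)
    return false_calls / len(called), len(called)


def most_permissive_cutoff(scores, has_snp, max_fdr=0.20):
    """Lowest cutoff whose estimated FDR is <= max_fdr, or None."""
    best = None
    # Walk from strict to permissive; keep the last cutoff that qualifies.
    for cutoff in sorted(set(scores), reverse=True):
        fdr, n_calls = empirical_fdr(scores, has_snp, cutoff)
        if fdr <= max_fdr:
            best = (cutoff, n_calls, fdr)
    return best


# Hypothetical validation probes: scores and whether a SNP is known.
scores = [5.0, 4.0, 3.0, 2.0, 1.0]
has_snp = [True, True, False, True, False]
print(most_permissive_cutoff(scores, has_snp))  # (4.0, 2, 0.0)
```

Because the estimated FDR need not fall steadily as the cutoff tightens, the loop scans every candidate cutoff rather than stopping at the first one that fails.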
This research was designed to explore the utility of the Affymetrix GeneChip Wheat Genome Array for discovering and mapping SFPs in the large and complex hexaploid wheat genome. RNA pooled from two tissues was used to increase the diversity of transcripts, and cRNA rather than DNA was used as the target to minimize the problems posed by the large genome size and repetitive DNA. A greater concern was the potential for interference between homoeologous or paralogous gene copies in probe hybridizations, because signal from perfectly matched copies can mask a mismatch in one copy. Mochida et al. estimated that fewer than half of homoeologous genes were expressed from only one genome and that many were expressed from all three genomes [8]. Akhunov et al. reported that one quarter of all wheat gene motifs were present in two or more paralogous copies [23]. The current study used two different strategies to reduce the problem of interference. In the first experiment, a panel of six diverse wheat varieties was analyzed for probe × variety interactions; cluster analysis was used to filter the results, and only biallelic SFPs with intermediate frequencies were retained. In the second experiment, SFPs were analyzed in 71 recombinant inbred lines (RILs) from the cross Ning 7840/Clark, and SFPs that gave clear allele calls in at least 60 RILs were selected for map construction.
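The two filtering strategies can be sketched in Python as follows. This is an illustration under stated assumptions, not the procedure used in this study. For the RILs, each line's value for a candidate SFP is assigned to the nearer parent (P1 or P2, standing for Ning 7840 and Clark), with no call inside a hypothetical ambiguity zone around the parental midpoint; this nearest-parent rule is a simple substitute for cluster-based calling. The SFP is kept if at least 60 of the 71 lines receive a clear call. For the six-variety panel, "intermediate frequency" is interpreted, hypothetically, as the minor allele occurring in at least two varieties. All function names and the `margin` and `min_minor` parameters are illustrative.

```python
from collections import Counter


def call_allele(value, parent_a, parent_b, margin=0.25):
    """Assign a RIL value to the nearer parent, or None if ambiguous.

    margin: hypothetical width of the no-call zone, as a fraction of the
    parental difference, centered on the parental midpoint.
    """
    midpoint = (parent_a + parent_b) / 2
    half_zone = abs(parent_a - parent_b) * margin / 2
    if abs(value - midpoint) <= half_zone:
        return None
    return "P1" if abs(value - parent_a) < abs(value - parent_b) else "P2"


def keep_for_mapping(ril_values, parent_a, parent_b, min_calls=60):
    """Keep an SFP only if enough RILs receive a clear allele call."""
    calls = [call_allele(v, parent_a, parent_b) for v in ril_values]
    n_clear = sum(call is not None for call in calls)
    return n_clear >= min_calls, calls


def biallelic_intermediate(variety_alleles, min_minor=2):
    """True if exactly two alleles occur and the rarer is in >= min_minor lines."""
    counts = Counter(variety_alleles)
    return len(counts) == 2 and min(counts.values()) >= min_minor


# Hypothetical RIL data: parents at 0.0 and 2.0; 60 clear values and
# 11 values sitting at the midpoint (no call) give 71 RILs in total.
ril_values = [0.1, 1.9] * 30 + [1.0] * 11
keep, calls = keep_for_mapping(ril_values, 0.0, 2.0)
print(keep, calls.count("P1"), calls.count("P2"), calls.count(None))  # True 30 30 11

# Hypothetical cluster labels (0 or 1) for the six-variety panel.
print(biallelic_intermediate([0, 0, 1, 1, 0, 0]))  # True
print(biallelic_intermediate([0, 0, 0, 0, 0, 1]))  # False
```

Requiring a clear call in most RILs discards probes whose signal sits between the parental values, which is one way that interference from other gene copies could show up.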