Bread wheat (Triticum aestivum) has a large and highly repetitive genome, which poses major technical challenges for its study. To aid map-based cloning and future genome sequencing projects, we constructed a BAC-based physical map of the short arm of wheat chromosome 1A (1AS). Assembly of 25,918 fingerprints generated by high-information-content fingerprinting (HICF) of a 1AS-specific BAC library produced 715 physical contigs that cover almost 99% of the estimated size of the chromosome arm. The 3,414 BAC clones constituting the minimum tiling path were end-sequenced. Using a gene microarray containing ~40,000 NCBI UniGene EST clusters, PCR marker screening and BAC end sequences, we arranged 160 physical contigs (97 Mb, or 35.3% of the chromosome arm) in a virtual order based on synteny with Brachypodium, rice and sorghum. BAC end sequences and microarray hybridisation data were used to anchor 3.8 Mbp of Illumina sequences from flow-sorted chromosome arm 1AS to BAC contigs. Comparison of the genetic and synteny-based physical maps indicated that ~50% of all genetic recombination is confined to a distal region covering 14% of the physical length of the chromosome arm. The 1AS physical map provides a framework for future genetic mapping projects as well as the basis for complete sequencing of chromosome arm 1AS.
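The recombination figures above translate into relative recombination rates with simple arithmetic. The sketch below treats the reported approximations (~50% of recombination in 14% of the arm) as exact fractions, which is an assumption; it also back-calculates the arm size implied by "97 Mb, or 35.3% of the chromosome arm". The function name `relative_rates` is hypothetical.

```python
def relative_rates(recomb_frac: float, phys_frac: float) -> tuple[float, float]:
    """Recombination rate inside a region and in the rest of the arm,
    both relative to the arm-wide average (1.0 by definition)."""
    inside = recomb_frac / phys_frac
    outside = (1 - recomb_frac) / (1 - phys_frac)
    return inside, outside


# ~50% of recombination in the distal 14% of 1AS (treated as exact here)
distal, rest = relative_rates(0.50, 0.14)
print(f"distal 14%: {distal:.2f}x the arm average")        # 3.57x
print(f"remaining 86%: {rest:.2f}x the arm average")       # 0.58x
print(f"distal vs. remainder: {distal / rest:.1f}-fold")   # 6.1-fold

# Arm size implied by "97 Mb, or 35.3% of the chromosome arm"
print(f"implied 1AS size: {97 / 0.353:.0f} Mb")           # 275 Mb
```

Because the inputs are rounded, the outputs indicate only an order of magnitude: the distal region recombines several-fold more often per megabase than the rest of the arm.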
Diploid Aegilops umbellulata and Ae. comosa and their natural allotetraploid hybrids Ae. biuncialis and Ae. geniculata are important wild gene sources for wheat. With the aim of assisting alien gene transfer, this study provides gene-based conserved orthologous set (COS) markers for the U and M genome chromosomes. Of the 140 markers tested on a series of wheat-Aegilops chromosome introgression lines and flow-sorted subgenomic chromosome fractions, 100 were assigned to Aegilops chromosomes, and six and seven duplications were identified in the U and M genomes, respectively. The marker-specific EST sequences were compared by BLAST with Brachypodium and rice genomic sequences to investigate macrosyntenic relationships between the U and M genomes of Aegilops, wheat and the model species. Five syntenic regions of Brachypodium revealed genome rearrangements differentiating the U genome from the M genome and from the D genome of wheat. All of them appear to have evolved at the diploid level and to have been modified differentially in the polyploid species Ae. biuncialis and Ae. geniculata. Some degree of wheat–Aegilops homology was detected for group 1, 2, 3 and 5 chromosomes, whereas the group 4, 6 and 7 Aegilops chromosomes showed a clearly rearranged structure relative to wheat. The COS markers assigned to Aegilops chromosomes promise to accelerate gene introgression by facilitating the identification of alien chromatin. The syntenic relationships between the Aegilops species, wheat and model species will facilitate the targeted development of new markers specific for U and M genomic regions and will contribute to the understanding of molecular processes related to allopolyploidization.
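Macrosynteny analyses of this kind usually begin by keeping one best hit per marker sequence and then grouping markers by the model chromosome they hit. The sketch below is one way to do that from BLAST+ tabular output (`-outfmt 6`, default twelve columns). The e-value cutoff, the marker IDs and the chromosome names are hypothetical, and the source does not describe the authors' actual filtering rules.

```python
import csv
import io
from collections import defaultdict

# Column order of BLAST+ tabular output (-outfmt 6, default fields)
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(handle, max_evalue=1e-10):
    """Keep, for each query (marker EST), the hit with the highest bit score."""
    best = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        rec = dict(zip(FIELDS, row))
        if float(rec["evalue"]) > max_evalue:
            continue
        score = float(rec["bitscore"])
        # minus-strand hits have sstart > send, so take the smaller coordinate
        start = min(int(rec["sstart"]), int(rec["send"]))
        query = rec["qseqid"]
        if query not in best or score > best[query][2]:
            best[query] = (rec["sseqid"], start, score)
    return best


def order_by_model_chromosome(best):
    """Group markers by the model chromosome of their best hit, ordered by position."""
    groups = defaultdict(list)
    for marker, (chrom, start, _score) in best.items():
        groups[chrom].append((start, marker))
    return {chrom: [m for _, m in sorted(hits)] for chrom, hits in groups.items()}


# Hypothetical BLAST output: marker IDs and chromosome names are invented
example = io.StringIO(
    "COS_001\tBd2\t91.2\t420\t30\t2\t1\t420\t1500200\t1500620\t1e-150\t600\n"
    "COS_001\tBd4\t80.0\t200\t40\t3\t1\t200\t900\t1100\t1e-20\t150\n"
    "COS_002\tBd2\t88.0\t300\t30\t1\t1\t300\t800300\t800000\t1e-90\t400\n"
    "COS_003\tBd1\t75.0\t100\t25\t0\t1\t100\t5000\t5100\t1e-5\t60\n"
)
print(order_by_model_chromosome(best_hits(example)))
# {'Bd2': ['COS_002', 'COS_001']}
```

COS_003 is dropped by the e-value filter, and COS_001 is placed on its higher-scoring hit. A real analysis would also have to handle duplicated markers, which can carry several near-equal hits.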
Bread wheat (Triticum aestivum L.) is one of the most important crops worldwide and its production faces pressing challenges, the solution of which demands genome information. However, the large, highly repetitive hexaploid wheat genome has been considered intractable to standard sequencing approaches. Therefore the International Wheat Genome Sequencing Consortium (IWGSC) proposes to map and sequence the genome on a chromosome-by-chromosome basis.
We have constructed a physical map of the long arm of bread wheat chromosome 1A from chromosome-specific BAC libraries fingerprinted by High Information Content Fingerprinting (HICF). Two alternative programs (FPC and LTC) were used to assemble the fingerprints into a high-resolution physical map of the chromosome arm. A total of 365 molecular markers were added to the map, in addition to 1122 putative unique transcripts identified by microarray hybridization. The final map consists of 1180 contigs when assembled with FPC, or 583 contigs when assembled with LTC.
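FPC decides whether two fingerprinted clones overlap with a coincidence score: the probability that their shared bands match only by chance. The sketch below implements the Sulston-style score in the form commonly described for FPC. The band counts, the tolerance and the sizing range are hypothetical illustration values, not the settings used for this map, and LTC uses a different, network-based procedure that is not shown.

```python
from math import comb


def sulston_score(n_low: int, n_high: int, shared: int,
                  tolerance: float, gel_length: float) -> float:
    """Probability that two unrelated clones share `shared` or more bands by chance.

    n_low, n_high: band counts of the clone with fewer and with more bands.
    tolerance, gel_length: matching window and sizing range, in the same units.
    """
    b = 2 * tolerance / gel_length   # chance that one band lies within tolerance of a given band
    p = (1 - b) ** n_high            # chance that a band of the smaller clone matches no band
    # binomial tail: at least `shared` of the n_low bands match by coincidence
    return sum(comb(n_low, j) * (1 - p) ** j * p ** (n_low - j)
               for j in range(shared, n_low + 1))


# Hypothetical clone pair: band counts, tolerance and sizing range are illustrative only
for shared in (10, 25, 40):
    print(shared, f"{sulston_score(80, 90, shared, tolerance=3, gel_length=4000):.2e}")
```

Smaller scores indicate stronger evidence of a true overlap. The assembler joins clones whose score falls below a user-chosen cutoff, so the choice of cutoff directly affects how many contigs result.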
The physical map presented here marks an important step forward in mapping of hexaploid bread wheat. The map is orders of magnitude more detailed than previously available maps of this chromosome, and the assignment of over a thousand putative expressed gene sequences to specific map locations will greatly assist future functional studies. This map will be an essential tool for future sequencing of and positional cloning within chromosome 1A.
The assembly of the bread wheat genome sequence is challenging due to allohexaploidy and extreme repeat content (>80%). Isolating single chromosome arms by flow sorting can overcome the polyploidy problem, but the repeat content causes extreme assembly fragmentation even at the level of a single chromosome. Long-jump paired-end sequencing data (mate pairs) can help reduce assembly fragmentation by joining multiple contigs into single scaffolds. The aim of this work was to assess how mate-pair data generated from multiple displacement amplified DNA of flow-sorted chromosomes affect the fragmentation of shotgun assemblies of wheat chromosomes.
Three mate-pair (MP) libraries (2 kb, 3 kb and 5 kb) were sequenced to a total coverage of 89x and 64x for the short and long arms of chromosome 7B, respectively. Scaffolding with SSPACE improved the contiguity of the 7B assembly and decreased gene space fragmentation, but the degree of improvement depended strongly on the scaffolding stringency applied. At the lowest stringency the assembly N50 increased ~7-fold, whereas at the highest stringency it increased only ~1.5-fold. Furthermore, a strong positive correlation was observed between estimated scaffold reliability and scaffolding stringency. A 7BS scaffold assembly with reduced MP coverage showed that assembly contiguity was affected only slightly as coverage was reduced to ~50% of the original.
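N50 is the contiguity statistic behind the fold changes above. It is the length L such that sequences of length L or longer together contain at least half of the assembled bases. The sketch below computes it and shows how joining contigs into a scaffold raises the value. The contig lengths are hypothetical, and gap Ns inserted during scaffolding are ignored for simplicity.

```python
def n50(lengths):
    """Length L such that sequences of length >= L together hold at least half the total."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty assembly


# Hypothetical example: five contigs, then the first four joined into one scaffold
contigs = [1000, 2000, 3000, 4000, 5000]
scaffolds = [1000 + 2000 + 3000 + 4000, 5000]    # gap Ns ignored for simplicity
print(n50(contigs), n50(scaffolds))               # 4000 10000
print(f"{n50(scaffolds) / n50(contigs):.1f}-fold")  # 2.5-fold
```

Low-stringency scaffolding accepts more joins, which inflates N50, but some of those joins may be wrong. This is consistent with the reported trade-off between N50 gain and scaffold reliability.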
The effect of integrating MP data into paired-end shotgun assemblies of wheat chromosomes was moderate, possibly because of poor contig contiguity, the extreme repeat content of wheat, and the use of amplified chromosomal DNA for MP library construction.
Wheat; Assembly; Scaffold; Mate-pair; MDA; Improvement
Polyploidization is considered one of the main mechanisms of plant genome evolution. The presence of multiple copies of the same gene reduces selection pressure and permits sub-functionalization and neo-functionalization leading to plant diversification, adaptation and speciation. In bread wheat, polyploidization and the prevalence of transposable elements resulted in massive gene duplication and movement. As a result, the number of genes which are non-collinear to genomes of related species seems markedly increased in wheat.
We used next-generation sequencing (NGS) to sequence a megabase-sized region of wheat chromosome arm 3DS. Assembly of sequences from 24 BAC clones resulted in two scaffolds of 1,264,820 and 333,768 bases. The sequence was annotated and compared with the homoeologous region on wheat chromosome 3B and with orthologous loci of Brachypodium distachyon and rice. Of the 39 coding sequences in the 3DS scaffolds, 32 have a homoeolog on chromosome 3B. In contrast, only fifteen and fourteen orthologs were identified in the corresponding regions of rice and Brachypodium, respectively. Interestingly, five pseudogenes were identified among the non-collinear coding sequences at the 3B locus, while none was found at the 3DS locus.
Direct comparison of two megabase-sized regions of the B and D genomes of bread wheat revealed similar rates of non-collinear gene insertion in both genomes, with most gene duplications occurring before the two genomes diverged. A relatively low proportion of pseudogenes was identified among the non-collinear coding sequences. Our data suggest that the pseudogenes did not originate from insertion of non-functional copies but were formed later during the evolution of hexaploid wheat. Some evidence was found for gene erosion along the B genome locus.
Wheat; BAC sequencing; Homoeologous genomes; Gene duplication; Non-collinear genes; Allopolyploidy
Nuclear genomes of humans, animals and plants are organized into subunits called chromosomes. When isolated into aqueous suspension, mitotic chromosomes can be classified by flow cytometry according to their light scatter and fluorescence parameters. Chromosomes of interest can be purified by flow sorting if they can be resolved from the other chromosomes in a karyotype. Analysis and sorting are carried out at rates of 10²–10⁴ chromosomes per second, and for complex genomes such as wheat, flow sorting has been ground-breaking in reducing genome complexity for genome sequencing. The high sample rate makes the technique attractive for karyotype analysis (flow karyotyping) and for purifying chromosomes in large numbers. When characterizing the chromosome complement of an organism, the large number of chromosomes that can be measured by flow cytometry allows a statistically accurate analysis. Chromosome sorting plays a particularly important role in analysing nuclear genome structure and in studying specific and aberrant chromosomes. Other attractive but less explored applications include the analysis of chromosomal proteins, chromosome ultrastructure, and high-resolution mapping using FISH. Recent results demonstrate that chromosome flow sorting can be coupled seamlessly with DNA array and next-generation sequencing technologies for high-throughput analyses. The main advantages are that the analysis can be targeted to a genome region of interest and that sample complexity is significantly reduced. Because flow sorters can also sort single copies of chromosomes, shotgun sequencing of DNA amplified from them enables the production of haplotype-resolved genome sequences. This review explains the principles of flow cytometric chromosome analysis and sorting (flow cytogenetics), discusses the major uses of this technology in genome analysis, and outlines future directions.
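Whether a chromosome can be "resolved from other chromosomes" comes down to how much its fluorescence peak overlaps its neighbours. The toy sketch below applies a one-dimensional sort gate to simulated DAPI signals and measures the purity of the gated fraction. The event values and chromosome labels are hypothetical, and real sorters usually gate on bivariate plots with instrument-specific software.

```python
from dataclasses import dataclass


@dataclass
class Event:
    dapi: float   # DAPI fluorescence pulse (arbitrary units), roughly tracks DNA content
    label: str    # true chromosome identity; known here only because the data are simulated


def gate(events, low, high):
    """Events whose DAPI signal falls inside the sort window [low, high]."""
    return [e for e in events if low <= e.dapi <= high]


def purity(selected, target):
    """Fraction of gated events that are the target chromosome."""
    if not selected:
        return 0.0
    return sum(e.label == target for e in selected) / len(selected)


# Two hypothetical chromosomes whose fluorescence peaks partly overlap
events = [Event(98, "chrA"), Event(100, "chrA"), Event(102, "chrA"), Event(105, "chrA"),
          Event(104, "chrB"), Event(110, "chrB"), Event(120, "chrB"), Event(125, "chrB")]
print(purity(gate(events, 95, 106), "chrA"))   # 0.8: one chrB event falls in the window
print(purity(gate(events, 95, 103), "chrA"))   # 1.0, but one chrA event is lost
```

Narrowing the gate trades yield for purity. This is why chromosomes that fall into composite peaks can often be sorted only as groups.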
Chromosome sorting; Chromosome-specific BAC libraries; Chromosome sequencing; Chromosome genomics; Genome complexity reduction; Flow cytometry; Physical mapping
Bread wheat, one of the world’s staple food crops, has the largest genome among the cereal crops, and this genome is highly repetitive and polyploid. The wheat genome holds the key to crop genetic improvement against challenges such as climate change, environmental degradation, and water scarcity. To unravel the complex wheat genome, the International Wheat Genome Sequencing Consortium (IWGSC) is pursuing a chromosome- and chromosome arm-based approach to physical mapping and sequencing. Here we report on the use of a BAC library made from flow-sorted telosomic chromosome 3A short arm (t3AS) for marker development and for analysis of the sequence composition and comparative evolution of the homoeologous genomes of hexaploid wheat.
End-sequencing of 9,984 random BACs from a chromosome arm 3AS-specific library (TaaCsp3AShA) generated 11,014,359 bp of high-quality sequence from 17,591 BAC ends, with an average read length of 626 bp. The sequence represents 3.2% of t3AS, corresponding to an average of one sequence read every 19 kb. Overall, 79% of the sequence consisted of repetitive elements, 1.38% of coding regions (an estimated 2,850 genes) and a further 19% of sequence of unknown origin. Comparative sequence analysis suggested that 70–77% of the genes present on both 3A and 3B were syntenic with model species. Among the transposable elements, gypsy/sabrina (12.4%) was the most abundant repeat and was significantly more frequent on 3A than on homoeologous chromosome 3B. Twenty novel repetitive sequences were also identified by de novo repeat identification. The BAC end sequences (BESs) were screened to identify simple sequence repeats (SSRs) and transposable element junctions. A total of 1,057 SSRs were identified, a density of one per 10.4 kb, and 7,928 junctions between transposable elements (TEs) and other sequences were identified, a density of one per 1.39 kb. To increase the marker density of chromosome 3AS, oligonucleotide primers were designed for 758 SSRs and 695 Insertion Site Based Polymorphisms (ISBPs). Of the 96 ISBP primer pairs tested, 28 (29%) were 3A-specific, compared with 17 (18%) of 96 SSR primer pairs.
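SSR screening of BAC-end reads is often done with a tandem-repeat search. The sketch below uses a regular expression: the lazy `{2,6}?` group picks the shortest motif that repeats, and the backreference `\1` requires further copies of it. The motif-length and minimum-copy thresholds, and the example read, are hypothetical; the source does not name the tool or criteria used. The last lines check that the reported densities follow from the totals above.

```python
import re


def find_ssrs(seq, min_unit=2, max_unit=6, min_repeats=5):
    """Return (start, motif, copies) for tandem repeats of 2-6 bp motifs."""
    # lazy {min,max}? picks the shortest repeating motif; \1{n,} demands further copies
    pattern = re.compile(r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_repeats - 1))
    hits = []
    for m in pattern.finditer(seq.upper()):
        motif = m.group(1)
        if len(set(motif)) == 1:
            continue  # e.g. "AA" x 6 is really a mononucleotide run; skip it
        hits.append((m.start(), motif, len(m.group(0)) // len(motif)))
    return hits


seq = "GGC" + "AT" * 8 + "GGTC" + "CAG" * 6 + "TTA"   # hypothetical 44-bp read
print(find_ssrs(seq))   # [(3, 'AT', 8), (23, 'CAG', 6)]

# Densities reported for the 3AS BAC-end data set
total_bp = 11_014_359
print(f"SSRs: one per {total_bp / 1_057 / 1000:.1f} kb")           # 10.4 kb
print(f"TE junctions: one per {total_bp / 7_928 / 1000:.2f} kb")   # 1.39 kb
```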
This work reports on the use of a wheat chromosome arm 3AS-specific BAC library for the targeted generation of sequence data from a particular region of the huge wheat genome. A large quantity of sequence was generated from the A genome of hexaploid wheat for comparative analysis with the homoeologous B and D genomes and with other model grass genomes. Hundreds of molecular markers were developed from the 3AS arm-specific sequences; these and other sequences will be useful in gene discovery and physical mapping.
The purpose of this study is to elucidate the sequence composition of the short arm of rye (Secale cereale) chromosome 1, with special focus on its gene content, because this portion of the rye genome is an integral part of several hundred bread wheat varieties worldwide.
Multiple displacement amplification (MDA) of DNA from 1RS arms flow-sorted from a 1RS ditelosomic wheat-rye addition line, followed by Roche 454 FLX sequencing, yielded 195,313,589 bp of sequence. This corresponds to 0.43× coverage of the 1RS chromosome arm, which permits identification of genes with an estimated probability of 95%. A detailed analysis revealed that more than 5% of the 1RS sequence consisted of gene space, identifying at least 3,121 gene loci representing 1,882 different gene functions. Repetitive elements comprised about 72% of the 1RS sequence, with Gypsy/Sabrina (13.3%) being the most abundant. More than four thousand simple sequence repeat (SSR) sites, mostly located in gene-related sequence reads, were identified for possible marker development. The presence of chloroplast insertions in 1RS was verified by identifying chimeric chloroplast-genomic sequence reads. Synteny analysis of 1RS against the full genomes of Oryza sativa and Brachypodium distachyon revealed that about half of the 1RS genes correspond to the distal end of the short arm of rice chromosome 5 and the proximal region of the long arm of Brachypodium distachyon chromosome 2. Comparison of the gene content of 1RS with that of barley chromosome arm 1HS revealed high conservation of genes related to rice chromosome 5.
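The source does not state how the 95% gene-detection probability was derived. One common way to reason about low-coverage surveys is the Lander-Waterman (Poisson) model: a base is hit with probability 1 - e^(-c), and a longer feature such as a gene is hit more often than a single base because any read overlapping it counts. The sketch below encodes that model. The 400-bp read length and the gene lengths are hypothetical assumptions, so the outputs are not expected to reproduce the authors' figure.

```python
from math import exp


def p_base_covered(coverage):
    """Lander-Waterman: expected fraction of bases hit by at least one read."""
    return 1 - exp(-coverage)


def p_gene_detected(coverage, read_len, gene_len):
    """Probability that at least one read overlaps a gene of length gene_len.

    Reads are assumed to start uniformly at random, so the number of reads whose
    start lies in the (gene_len + read_len - 1)-bp window that yields an overlap is
    Poisson with mean coverage * (gene_len + read_len - 1) / read_len.
    """
    expected = coverage * (gene_len + read_len - 1) / read_len
    return 1 - exp(-expected)


c = 0.43          # coverage reported for 1RS
read_len = 400    # hypothetical mean 454 FLX read length
print(f"bases covered: {p_base_covered(c):.2f}")
for gene_len in (1000, 3000, 5000):   # hypothetical gene lengths
    print(gene_len, f"{p_gene_detected(c, read_len, gene_len):.3f}")
```

Under this model only about a third of individual bases are sequenced at 0.43×. Most multi-kilobase genes are nevertheless touched by at least one read, which is why such low coverage can still support gene-content estimates.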
The present study revealed the gene content and potential gene functions of this chromosome arm and identified numerous sequence elements, such as SSRs and gene-related sequences, that can be utilised in future research as well as in wheat and rye breeding.
This study evaluates the potential of flow cytometry for chromosome sorting in two wild diploid wheats, Aegilops umbellulata and Ae. comosa, and their natural allotetraploid hybrids Ae. biuncialis and Ae. geniculata. Flow karyotypes obtained by analysing DAPI-stained chromosomes were characterized, and the chromosome content of each peak was determined. Peaks of chromosome 1U could be discriminated in the flow karyotypes of Ae. umbellulata and Ae. biuncialis, and this chromosome could be sorted with purities exceeding 95%. The remaining chromosomes formed composite peaks and could be sorted only in groups of two to four. Twenty-four wheat SSR markers were tested for their position on chromosomes of Ae. umbellulata and Ae. comosa using PCR on DNA amplified from flow-sorted chromosomes and on genomic DNA of wheat-Ae. geniculata addition lines, respectively. Six SSR markers were located on particular Aegilops chromosomes using sorted chromosomes, confirming the usefulness of this approach for physical mapping. The SSR markers are suitable for marker-assisted selection of wheat-Aegilops introgression lines. The results of this work provide new opportunities for dissecting the genomes of wild relatives of wheat, with the aim of assisting alien gene transfer and the discovery of novel genes for wheat improvement.
Wheat is one of the world's most important crops and is characterized by a large polyploid genome. One way to reduce genome complexity is to isolate single chromosomes using flow cytometry. Low-coverage DNA sequencing can provide a snapshot of individual chromosomes, allowing fast characterization of their main features and comparison with other genomes. We used massively parallel 454 pyrosequencing to obtain 2x coverage of wheat chromosome 5A. The resulting sequence assembly was used to identify transposable elements (TEs), genes and miRNAs, and to infer a virtual gene order based on synteny with other grass genomes. Repetitive elements account for more than 75% of the chromosome 5A sequence. Gene content was estimated from non-redundant reads showing at least one match to ESTs or proteins. The results indicate that the coding fraction represents 1.08% and 1.3% of the short and long arms, respectively, projecting the number of genes on the whole chromosome to approximately 5,000. A total of 195 candidate miRNA precursors belonging to 16 miRNA families were identified. The 5A genes were used to search for syntenic relationships between grass genomes. The short arm is closely related to Brachypodium chromosome 4, sorghum chromosome 8 and rice chromosome 12; the long arm is related to regions of Brachypodium chromosomes 4 and 1, sorghum chromosomes 1 and 2 and rice chromosomes 9 and 3. From these similarities it was possible to infer the virtual order of 392 (5AS) and 1,480 (5AL) genes on chromosome 5A, and this order was largely congruent with the available physical map of the chromosome.
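A virtual gene order can be sketched as a sort: each wheat gene inherits the position of its ortholog in a model genome, and the model chromosomes (or segments) are laid end to end in the order assumed to follow along the wheat chromosome. The gene IDs, chromosome names and positions below are invented. Published approaches also handle segment orientation and combine several model genomes, which this sketch does not do.

```python
def virtual_order(anchors, chromosome_order):
    """Order wheat genes by the position of their orthologs in a model genome.

    anchors: {gene_id: (model_chromosome, position_bp)}
    chromosome_order: model chromosomes (or segments) in the order they are assumed
    to follow along the wheat chromosome; genes anchored elsewhere are left unplaced.
    """
    rank = {chrom: i for i, chrom in enumerate(chromosome_order)}
    placed = sorted((rank[chrom], pos, gene)
                    for gene, (chrom, pos) in anchors.items() if chrom in rank)
    return [gene for _, _, gene in placed]


# Hypothetical anchors; gene IDs, chromosome names and positions are invented
anchors = {"g1": ("Bd4", 3_200_000), "g2": ("Bd1", 150_000), "g3": ("Bd4", 1_100_000),
           "g4": ("Bd3", 42_000), "g5": ("Bd1", 90_000)}
print(virtual_order(anchors, ["Bd4", "Bd1"]))   # ['g3', 'g1', 'g5', 'g2']
```

Gene g4 maps to a model chromosome outside the syntenic set, so it stays unplaced. Such genes are the non-syntenic cases that a comparison with a physical map can help resolve.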
Positional cloning in bread wheat is a tedious task because of its huge genome size and hexaploid nature. BAC libraries are an essential tool for positional cloning. However, whole-genome wheat BAC libraries comprise more than a million clones, which makes their screening very laborious. Here we present a targeted approach based on chromosome-specific BAC libraries. Such libraries were constructed from flow-sorted arms of wheat chromosome 7D. A library from the short arm (7DS), consisting of 49,152 clones with a mean insert size of 113 kb, represented 12.1 arm equivalents, whereas a library from the long arm (7DL) comprised 50,304 clones with a mean insert size of 116 kb, providing 14.9-fold arm coverage. The 7DS library was screened by PCR with markers linked to the Russian wheat aphid resistance gene DnCI2401, and the 7DL library was screened by hybridization with a probe linked to the greenbug resistance gene Gb3. The small number of clones combined with high coverage made screening highly efficient and cost-effective.
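Library coverage (arm equivalents) is the number of clones multiplied by the mean insert size, divided by the target size. For a single-copy locus, the Clarke-Carbon relation 1 - (1 - f)^N ≈ 1 - e^(-c) gives the chance that at least one clone carries the locus, where f is the fraction of the arm contained in one clone. The sketch below uses the reported coverages directly, because arm sizes are not given in this abstract. The function name is hypothetical.

```python
from math import exp


def p_locus_missing(arm_equivalents):
    """Clarke-Carbon (Poisson) probability that a locus is absent from every clone."""
    return exp(-arm_equivalents)


# Library figures reported for the 7D arm-specific libraries
libraries = {"7DS": (49_152, 113, 12.1), "7DL": (50_304, 116, 14.9)}
for arm, (clones, insert_kb, cov) in libraries.items():
    print(f"{arm}: {clones:,} clones of ~{insert_kb} kb; "
          f"~{cov:.0f} positive clones expected per single-copy locus; "
          f"P(locus missing) = {p_locus_missing(cov):.1e}")
```

At this depth, a marker screen of roughly 50,000 clones is expected to return about a dozen positive clones and almost never to miss the locus. This illustrates why the arm-specific libraries are easier to screen than million-clone whole-genome libraries.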
The presence of closely related genomes in polyploid species makes assembling the total genomic sequence from shotgun reads produced by current sequencing platforms exceedingly difficult, if not impossible. Genomes of polyploid species could instead be sequenced by the ordered-clone approach, employing contigs of bacterial artificial chromosome (BAC) clones and BAC-based physical maps. Although BAC contigs can currently be constructed for virtually any diploid organism with the SNaPshot high-information-content-fingerprinting (HICF) technology, it is unknown whether this is also true for polyploid species. BAC clones from orthologous regions of homoeologous chromosomes might share numerous restriction fragments and therefore be included in common contigs. Because of this and other concerns, physical mapping with SNaPshot HICF of BAC libraries of polyploid species has not been pursued, and its feasibility has not been assessed. The sole exception has been common wheat, an allohexaploid in which single-chromosome or single-chromosome-arm BAC libraries can be constructed from DNA of flow-sorted chromosomes, bypassing the obstacles created by polyploidy.
The potential of SNaPshot HICF technology for physical mapping of polyploid plants from global BAC libraries was evaluated by assembling contigs of fingerprinted clones from an in silico merged BAC library. This merged library was composed of the single-chromosome libraries of two wheat homoeologous chromosome arms, 3AS and 3DS, and of complete chromosome 3B. Because the chromosome arm origin of each clone was known, the fidelity of contig assembly could be estimated. On average, 97.78% or more of the clones in a contig, depending on the library, originated from a single chromosome arm. A large portion of the remaining clones was shown to be library contamination from other chromosomes, which is unavoidable during the construction of single-chromosome BAC libraries.
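Because every clone in the merged library carries a known source arm, assembly fidelity can be scored per contig as the share of clones coming from the contig's majority arm. The sketch below shows this bookkeeping. The contig compositions are hypothetical, and the source does not specify exactly how its averages were weighted, so the unweighted mean here is an assumption.

```python
from collections import Counter
from statistics import mean


def contig_fidelity(contigs):
    """Per contig: (majority arm, fraction of clones from that arm).

    contigs: {contig_id: [source arm of each clone]}. The source arm is known
    because every clone came from an arm-specific library before the in silico merge.
    """
    result = {}
    for contig_id, arms in contigs.items():
        arm, n = Counter(arms).most_common(1)[0]
        result[contig_id] = (arm, n / len(arms))
    return result


# Hypothetical contigs from a merged 3AS + 3DS + 3B assembly
contigs = {
    "ctg1": ["3AS"] * 49 + ["3B"],
    "ctg2": ["3DS"] * 20,
    "ctg3": ["3B"] * 30 + ["3AS"] * 2,
}
fidelity = contig_fidelity(contigs)
print(fidelity)
print(f"mean fidelity: {mean(f for _, f in fidelity.values()):.4f}")   # 0.9725
```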
The negligibly low level of incorporation of clones from homoeologous chromosome arms into a contig during contig assembly suggested that it is feasible to construct contigs and physical maps using global BAC libraries of wheat and almost certainly also of other plant polyploid species with genome sizes comparable to that of wheat. Because of the high purity of the resulting assembled contigs, they can be directly used for genome sequencing. It is currently unknown but possible that equally good BAC contigs can be also constructed for polyploid species containing smaller, more gene-rich genomes.
Rye (Secale cereale L.) belongs to the tribe Triticeae and is an important temperate cereal. It is one of the parents of the man-made species triticale and has been used as a source of agronomically important genes for wheat improvement. The short arm of rye chromosome 1 (1RS), in particular, is rich in useful genes. Because it may increase yield, protein content and resistance to biotic and abiotic stress, it has been introgressed into wheat as the 1BL.1RS translocation. Better knowledge of the rye genome could facilitate rye improvement and increase the efficiency of utilizing rye genes in wheat breeding.
Here, we report on BAC end sequencing of 1,536 clones from two 1RS-specific BAC libraries. We obtained 2,778 (90.4%) useful sequences with a cumulative length of 2,032,538 bp and an average read length of 732 bp. These sequences represent 0.5% of 1RS arm. The GC content of the sequenced fraction of 1RS is 45.9%, and at least 84% of the 1RS arm consists of repetitive DNA. We identified transposable element junctions in BESs and developed insertion site based polymorphism markers (ISBP). Out of the 64 primer pairs tested, 17 (26.6%) were specific for 1RS. We also identified BESs carrying microsatellites suitable for development of 1RS-specific SSR markers.
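The yield figures above follow from two ends per clone, and GC content is a simple base count over the reads. The sketch below reproduces the reported percentage and mean read length from the stated totals, and computes GC content over unambiguous bases only. Excluding N and other ambiguity codes is an assumption about how such a figure would typically be computed. The example reads are hypothetical.

```python
def gc_content(seqs):
    """GC fraction over unambiguous bases (A, C, G, T); N and other codes are ignored."""
    gc = acgt = 0
    for s in seqs:
        s = s.upper()
        g_c = s.count("G") + s.count("C")
        gc += g_c
        acgt += g_c + s.count("A") + s.count("T")
    return gc / acgt if acgt else 0.0


# Summary figures reported for the 1RS BAC-end sequencing
clones, useful_reads, total_bp = 1_536, 2_778, 2_032_538
print(f"useful reads: {useful_reads / (2 * clones):.1%}")     # 90.4% (two ends per clone)
print(f"mean read length: {total_bp / useful_reads:.0f} bp")  # 732 bp

print(gc_content(["GGCCAT", "ATNNGC"]))   # hypothetical reads -> 0.6
```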
This work demonstrates the utility of chromosome arm-specific BAC libraries for targeted analysis of large Triticeae genomes and provides new sequence data from the rye genome and molecular markers for the short arm of rye chromosome 1.
Flow cytometry facilitates sorting of single chromosomes and chromosome arms which can be used for targeted genome analysis. However, the recovery of microgram amounts of DNA needed for some assays requires sorting of millions of chromosomes which is laborious and time consuming. Yet, many genomic applications such as development of genetic maps or physical mapping do not require large DNA fragments. In such cases time-consuming de novo sorting can be minimized by utilizing whole-genome amplification.
Here we report a protocol, optimized in barley, for amplifying DNA from only ten thousand chromosomes, which can be isolated in less than one hour. Flow-sorted chromosomes were treated with proteinase K and amplified by Phi29 multiple displacement amplification (MDA). Overnight amplification in a 20-microlitre reaction produced 3.7–5.7 micrograms of DNA, with most products between 5 and 30 kb. To determine the purity of sorted fractions and any amplification bias, we used quantitative PCR for specific genes on each chromosome. To extend the analysis to the whole-genome level, we performed an oligonucleotide pool assay (OPA) interrogating 1524 loci, of which 1153 had known genetic map positions. Analysis of unamplified genomic DNA of barley cv. Akcent with this OPA resulted in 1426 markers with present calls. Comparison with three replicates of amplified genomic DNA revealed >99% concordance. DNA samples from amplified chromosome 1H and from a fraction containing chromosomes 2H–7H were examined. In addition to loci with known map positions, 349 loci with unknown map positions were included. Based on this analysis, 40 new loci were mapped to 1H.
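Two computations underlie this validation: call concordance between unamplified and amplified DNA, and the assignment of loci that are detected in the sorted 1H fraction but not in the 2H–7H fraction. The sketch below shows both under simplifying assumptions. Each locus has at most one call per sample, missing calls are `None`, and "present" means the locus was detected. The assay's actual calling rules are not described here, and all locus IDs are hypothetical.

```python
def concordance(calls_a, calls_b):
    """Fraction of identical genotype calls among loci called in both samples."""
    shared = [locus for locus in calls_a.keys() & calls_b.keys()
              if calls_a[locus] is not None and calls_b[locus] is not None]
    if not shared:
        return float("nan")
    return sum(calls_a[locus] == calls_b[locus] for locus in shared) / len(shared)


def assign_to_sorted_chromosome(present_in_target, present_in_rest):
    """Loci detected in the sorted target chromosome but absent from the other fraction."""
    return sorted(set(present_in_target) - set(present_in_rest))


# Hypothetical locus IDs and calls
unamplified = {"L1": "A", "L2": "B", "L3": "A", "L4": None}
amplified = {"L1": "A", "L2": "B", "L3": "B", "L4": "A"}
print(f"{concordance(unamplified, amplified):.2f}")                     # 0.67
print(assign_to_sorted_chromosome({"L1", "L2", "L3"}, {"L3", "L4"}))   # ['L1', 'L2']
```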
The results indicate significant potential for using this approach in physical mapping. Moreover, the study showed that multiple displacement amplification of flow-sorted chromosomes is highly efficient and representative, which considerably expands the potential of chromosome flow sorting in plant genomics.
Genomics of rye (Secale cereale L.) is impeded by its large nuclear genome (1C ≈ 7,900 Mbp), which is dominated by DNA repeats (>90%). An attractive possibility is to dissect the genome into small parts by flow sorting particular chromosomes and chromosome arms. To test this approach, we chose the 1RS chromosome arm, which represents only 5.6% of the total rye genome. The 1RS arm is an attractive target because it carries many important genes and because it became part of the wheat gene pool as the 1BL.1RS translocation.
We demonstrate that it is possible to sort the 1RS arm from a wheat-rye ditelosomic addition line. Using this approach, we isolated over 10 million 1RS arms by flow sorting and used their DNA to construct a 1RS-specific BAC library, which comprises 103,680 clones with an average insert size of 73 kb. The library comprises two sublibraries constructed using HindIII and EcoRI and provides deep coverage, about 14-fold, of the 1RS arm (442 Mbp). We present preliminary results obtained during positional cloning of the stem rust resistance gene SrR, which confirm the potential of the library to speed up isolation of agronomically important genes by map-based cloning.
We present a strategy that enables sorting of the short arms of several rye chromosomes. Using flow-sorted chromosomes, we have constructed a deep-coverage BAC library specific for the short arm of chromosome 1R (1RS). This is the first subgenomic BAC library available for rye, and we demonstrate its potential for positional gene cloning. We expect that the library will facilitate development of a physical contig map of 1RS and comparative genomics of the homoeologous chromosome group 1 of wheat, barley and rye.