Bread wheat (Triticum aestivum) has a large and highly repetitive genome, which poses major technical challenges for its study. To aid map-based cloning and future genome sequencing projects, we constructed a BAC-based physical map of the short arm of wheat chromosome 1A (1AS). From the assembly of 25,918 high information content (HICF) fingerprints from a 1AS-specific BAC library, 715 physical contigs were produced that cover almost 99% of the estimated size of the chromosome arm. The 3,414 BAC clones constituting the minimum tiling path were end-sequenced. Using a gene microarray containing ∼40 K NCBI UniGene EST clusters, PCR marker screening and BAC end sequences, we arranged 160 physical contigs (97 Mb or 35.3% of the chromosome arm) in a virtual order based on synteny with Brachypodium, rice and sorghum. BAC end sequences and information from microarray hybridisation were used to anchor 3.8 Mbp of Illumina sequences from flow-sorted chromosome 1AS to BAC contigs. Comparison of genetic and synteny-based physical maps indicated that ∼50% of all genetic recombination is confined to 14% of the physical length of the chromosome arm in the distal region. The 1AS physical map provides a framework for future genetic mapping projects as well as the basis for complete sequencing of chromosome arm 1AS.
Structural changes of chromosomes are a primary mechanism of genome rearrangement over the course of evolution and detailed knowledge of such changes in a given species and its close relatives should increase the efficiency and precision of chromosome engineering in crop improvement. We have identified sequences bordering each of the main translocation and inversion breakpoints on chromosomes 4A, 5A and 7B of the modern bread wheat genome. The locations of these breakpoints allow, for the first time, a detailed description of the evolutionary origins of these chromosomes at the gene level. Results from this study also demonstrate that, although the strategy of exploiting sorted chromosome arms has dramatically simplified the efforts of wheat genome sequencing, simultaneous analysis of sequences from homoeologous and non-homoeologous chromosomes is essential in understanding the origins of DNA sequences in polyploid species.
A previous study provided an in-depth understanding of molecular population genetics of European and Asian wheat gene pools using a sequenced 3.1-Mb contig (ctg954) on chromosome 3BS. This region is believed to carry the Fhb1 gene for response to Fusarium head blight (FHB). In this study, 266 wheat accessions were evaluated in three environments for Type II FHB response based on the single floret inoculation method. Hierarchical clustering (UPGMA) based on a Manhattan dissimilarity matrix divided the accessions into eight groups according to five FHB-related traits, which were highly correlated with one another; Group VIII comprised six accessions with FHB response levels similar to variety Sumai 3. Based on the compressed mixed linear model (MLM), association analysis between five FHB-related traits and 42 molecular markers along the 3.1-Mb region revealed 12 significant association signals at a threshold of P<0.05. The highest proportion of phenotypic variation (6.2%) in number of diseased spikelets (NDS) occurred at locus cfb6059, and the physical distance between umn10 and this marker was about 2.9 kb. Haplotype block (HapB) analysis using a sliding window LD of 5 markers detected six HapBs in the 3.1-Mb region at r²>0.1 and P<0.001 between random closely linked markers. F-tests among Haps with frequencies >0.05 within each HapB at r²>0.1 and P<0.001 showed significant differences between the Hap carried by FHB resistant resources, such as Sumai 3 and Wangshuibai, and susceptible genotypes in HapB3 and HapB6. These results suggest that Fhb1 is located within HapB6, with the possibility that another gene is located at or near HapB3. SSR markers and Haps detected in this study will be helpful in further understanding the genetic basis of FHB resistance, and provide useful information for marker-assisted selection of Fhb1 in wheat breeding.
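The LD statistic used to delimit haplotype blocks can be illustrated with a minimal sketch. It assumes biallelic markers coded 0/1 on phased haplotypes, which simplifies the multi-allelic SSR data of the study, and it interprets the 5-marker sliding window as all marker pairs lying within five consecutive positions. The function names and this interpretation are hypothetical and are not taken from the original analysis.

```python
def r_squared(a, b):
    """LD r^2 between two biallelic markers given as 0/1 lists over the same haplotypes."""
    if len(a) != len(b) or not a:
        raise ValueError("markers must be non-empty and of equal length")
    n = len(a)
    p_a = sum(a) / n                      # frequency of allele 1 at marker a
    p_b = sum(b) / n                      # frequency of allele 1 at marker b
    p_ab = sum(1 for x, y in zip(a, b) if x == 1 and y == 1) / n
    d = p_ab - p_a * p_b                  # coefficient of linkage disequilibrium D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:                        # a monomorphic marker carries no LD information
        return 0.0
    return d * d / denom


def windowed_r_squared(markers, window=5):
    """Return (i, j, r^2) for every marker pair that fits inside a window of `window` markers."""
    results = []
    for i in range(len(markers)):
        for j in range(i + 1, min(i + window, len(markers))):
            results.append((i, j, r_squared(markers[i], markers[j])))
    return results
```

Pairs whose r² exceeds a chosen threshold (0.1 in the abstract) could then be chained into candidate blocks; the significance test on each pair is omitted here.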
Polyploidization is considered one of the main mechanisms of plant genome evolution. The presence of multiple copies of the same gene reduces selection pressure and permits sub-functionalization and neo-functionalization leading to plant diversification, adaptation and speciation. In bread wheat, polyploidization and the prevalence of transposable elements resulted in massive gene duplication and movement. As a result, the number of genes which are non-collinear to genomes of related species seems markedly increased in wheat.
We used new-generation sequencing (NGS) to generate sequence of a Mb-sized region from wheat chromosome arm 3DS. Sequence assembly of 24 BAC clones resulted in two scaffolds of 1,264,820 and 333,768 bases. The sequence was annotated and compared to the homoeologous region on wheat chromosome 3B and orthologous loci of Brachypodium distachyon and rice. Among 39 coding sequences in the 3DS scaffolds, 32 have a homoeolog on chromosome 3B. In contrast, only fifteen and fourteen orthologs were identified in the corresponding regions in rice and Brachypodium, respectively. Interestingly, five pseudogenes were identified among the non-collinear coding sequences at the 3B locus, while none was found at the 3DS locus.
Direct comparison of two Mb-sized regions of the B and D genomes of bread wheat revealed similar rates of non-collinear gene insertion in both genomes with a majority of gene duplications occurring before their divergence. A relatively low proportion of pseudogenes was identified among non-collinear coding sequences. Our data suggest that the pseudogenes did not originate from insertion of non-functional copies, but were formed later during the evolution of hexaploid wheat. Some evidence was found for gene erosion along the B genome locus.
Wheat; BAC sequencing; Homoeologous genomes; Gene duplication; Non-collinear genes; Allopolyploidy
The uneven distribution of recombination across the length of chromosomes results in inaccurate estimates of genetic to physical distances. In wheat (Triticum aestivum L.) chromosome 3B, it has been estimated that 90% of the crossover events occur in distal sub-telomeric regions representing 40% of the chromosome. Radiation hybrid (RH) mapping, which does not rely on recombination, is a strategy to map genomes that has been widely employed in animal species and more recently in some plants. RH maps have been proposed to provide i) higher and ii) more uniform resolution than genetic maps, and iii) to be independent of the distribution patterns observed for meiotic recombination. An in vivo RH panel was generated for mapping chromosome 3B of wheat in an attempt to provide a complete scaffold for this ~1 Gb segment of the genome and compare the resolution to previous genetic maps.
A high density RH map with 541 marker loci anchored to chromosome 3B spanning a total distance of 1871.9 cR was generated. Detailed comparisons with a genetic map of similar quality confirmed that i) the overall resolution of the RH map was 10.5 fold higher and ii) six fold more uniform. A significant interaction (r = 0.879 at p = 0.01) was observed between the DNA repair mechanism and the distribution of crossing-over events. This observation could be explained by accepting the possibility that the DNA repair mechanism in somatic cells is affected by the chromatin state in a way similar to the effect that chromatin state has on recombination frequencies in gametic cells.
The RH data presented here support for the first time in vivo the hypothesis of non-casual interaction between recombination hot-spots and DNA repair. Further, two major hypotheses are presented on how chromatin compactness could affect the DNA repair mechanism. For the first time since the initial RH application 37 years ago, we were able to show that the third hypothesis (iii) of RH mapping might not be entirely correct.
Non homologous end joining; Physical mapping; Gamma radiation; Deletion mutant; Chromatin; Wheat chromosome 3B; Radiation hybrid
Sequencing projects using a clone-by-clone approach require the availability of a robust physical map. The SNaPshot technology, based on pair-wise comparisons of restriction fragment sizes, has been used recently to build the first physical map of a wheat chromosome and to complete the maize physical map. However, restriction fragment sizes shared randomly between two non-overlapping BACs often lead to chimeric contigs and mis-assembled BACs in such large and repetitive genomes. Whole Genome Profiling (WGP™) was developed recently as a new sequence-based physical mapping technology and has the potential to limit this problem.
A subset of the wheat 3B chromosome BAC library covering 230 Mb was used to establish a WGP physical map and to compare it to a map obtained with the SNaPshot technology. We first adapted the WGP-based assembly methodology to cope with the complexity of the wheat genome. Then, the results showed that the WGP map covers the same length as the SNaPshot map but with 30% fewer contigs and, more importantly, with 3.5 times fewer mis-assembled BACs. Finally, we evaluated the benefit of integrating WGP tags in different sequence assemblies obtained after Roche/454 sequencing of BAC pools. We showed that while WGP tag integration improves assemblies performed with unpaired reads and with paired-end reads at low coverage, it does not significantly improve sequence assemblies performed at high coverage (25x) with paired-end reads.
Our results demonstrate that, with a suitable assembly methodology, WGP builds more robust physical maps than the SNaPshot technology in wheat and that WGP can be adapted to any genome. Moreover, WGP tag integration in sequence assemblies improves low quality assembly. However, to achieve a high quality draft sequence assembly, a sequencing depth of 25x paired-end reads is required, at which point WGP tag integration does not provide additional scaffolding value. Finally, we suggest that WGP tags can support the efficient sequencing of BAC pools by enabling reliable assignment of sequence scaffolds to their BAC of origin, a feature that is of great interest when using BAC pooling strategies to reduce the cost of sequencing large genomes.
In support of the international effort to obtain a reference sequence of the bread wheat genome and to provide plant communities dealing with large and complex genomes with a versatile, easy-to-use online automated tool for annotation, we have developed the TriAnnot pipeline. Its modular architecture allows for the annotation and masking of transposable elements, the structural and functional annotation of protein-coding genes with an evidence-based quality indexing, and the identification of conserved non-coding sequences and molecular markers. The TriAnnot pipeline is parallelized on a 712 CPU computing cluster that can run a 1-Gb sequence annotation in less than 5 days. It is accessible through a web interface for small scale analyses or through a server for large scale annotations. The performance of TriAnnot was evaluated in terms of sensitivity, specificity, and general fitness using curated reference sequence sets from rice and wheat. In less than 8 h, TriAnnot was able to predict more than 83% of the 3,748 CDS from rice chromosome 1 with a fitness of 67.4%. On a set of 12 reference Mb-sized contigs from wheat chromosome 3B, TriAnnot predicted and annotated 93.3% of the genes, among which 54% were perfectly identified in accordance with the reference annotation. It also allowed the curation of 12 genes based on new biological evidence, increasing the percentage of perfect gene prediction to 63%. TriAnnot systematically showed a higher fitness than other annotation pipelines that are not improved for wheat. As it is easily adaptable to the annotation of other plant genomes, TriAnnot should become a useful resource for the annotation of large and complex genomes in the future.
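The kind of benchmark described above can be sketched as follows. The abstract does not define its metrics, so this sketch assumes the convention common in gene-prediction evaluation, where sensitivity is TP/(TP+FN) and specificity is TP/(TP+FP), and it counts a prediction as correct only when its coordinates match a reference model exactly. The "fitness" score is not reproduced because its definition is not given here. All names and the tuple layout are hypothetical and do not reflect TriAnnot's actual code.

```python
def evaluate_predictions(predicted, reference):
    """Compare predicted and reference gene models given as (seq_id, start, end, strand) tuples.

    Returns (sensitivity, specificity) under an exact-coordinate match criterion.
    """
    predicted = set(predicted)
    reference = set(reference)
    true_pos = len(predicted & reference)    # models found in both sets
    false_pos = len(predicted - reference)   # predicted but absent from the reference
    false_neg = len(reference - predicted)   # reference models that were missed
    sensitivity = true_pos / (true_pos + false_neg) if reference else 0.0
    specificity = true_pos / (true_pos + false_pos) if predicted else 0.0
    return sensitivity, specificity
```

With four reference models and three predictions, two of which match exactly, the function returns a sensitivity of 0.5 and a specificity of 2/3.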
cluster; gene models; pipeline; plant genome; structural and functional annotation; transposable elements; wheat
Transposable elements (TEs) are mobile, repetitive DNA sequences that are almost ubiquitous in prokaryotic and eukaryotic genomes. They have a large impact on genome structure, function and evolution. With the recent development of high-throughput sequencing methods, many genome sequences have become available, making possible comparative studies of TE dynamics at an unprecedented scale. Several methods have been proposed for the de novo identification of TEs in sequenced genomes. Most begin with the detection of genomic repeats, but the subsequent steps for defining TE families differ. High-quality TE annotations are available for the Drosophila melanogaster and Arabidopsis thaliana genome sequences, providing a solid basis for the benchmarking of such methods. We compared the performance of specific algorithms for the clustering of interspersed repeats and found that only a particular combination of algorithms detected TE families with good recovery of the reference sequences. We then applied a new procedure for reconciling the different clustering results and classifying TE sequences. The whole approach was implemented in a pipeline using the REPET package. Finally, we show that our combined approach highlights the dynamics of well defined TE families by making it possible to identify structural variations among their copies. This approach makes it possible to annotate TE families and to study their diversification in a single analysis, improving our understanding of TE dynamics at the whole-genome scale and for diverse species.
Because of its size, allohexaploid nature and high repeat content, the wheat genome has always been perceived as too complex for efficient molecular studies. We recently constructed the first physical map of a wheat chromosome (3B). However, gene mapping is still laborious in wheat because of high redundancy between the three homoeologous genomes. In contrast, in the closely related diploid species, barley, numerous gene-based markers have been developed. This study aims at combining the unique genomic resources developed in wheat and barley to decipher the organisation of gene space on wheat chromosome 3B.
Three-dimensional pools of the minimal tiling path of the wheat chromosome 3B physical map were hybridised to a barley Agilent 15K expression microarray. This led to the fine mapping of 738 barley orthologous genes on wheat chromosome 3B. In addition, comparative analyses revealed that 68% of the genes identified were syntenic between wheat chromosome 3B and barley chromosome 3H and 59% between wheat chromosome 3B and rice chromosome 1, together with some wheat-specific rearrangements. Finally, it indicated an increasing gradient of gene density from the centromere to the telomeres, positively correlated with the number of genes clustered in islands on wheat chromosome 3B.
Our study shows that novel structural genomics resources now available in wheat and barley can be combined efficiently to overcome specific problems of genetic anchoring of physical contigs in wheat and to perform high-resolution comparative analyses with rice for deciphering the organisation of the wheat gene space.
Multi-allelic microsatellite markers have become the markers of choice for the determination of genetic structure in plants. Synteny across cereals has allowed the cross-species and cross-genera transferability of SSR markers, which constitute a valuable and cost-effective tool for the genetic analysis and marker-assisted introgression of wild related species. Hordeum chilense is one of the wild relatives with a high potential for cereal breeding, due to its high crossability (both interspecies and intergenera) and polymorphism for adaptation traits. In order to analyze the genetic structure and ecogeographical adaptation of this wild species, it is necessary to increase the number of polymorphic markers currently available for the species. In this work, the possibility of using syntenic wheat SSRs as a new source of markers for this purpose has been explored.
Of the 98 wheat EST-SSR markers tested for transferability and polymorphism in the wild barley genome, 53 primer pairs (54.0%) gave cross-species transferability and 20 primer pairs (20.4%) showed polymorphism. The latter were used for further analysis in the H. chilense germplasm. The H. chilense-Triticum aestivum addition lines were used to test the chromosomal location of the new polymorphic microsatellite markers. The genetic structure and diversity were investigated in a collection of 94 H. chilense accessions, using a set of 49 SSR markers distributed across the seven chromosomes. Microsatellite markers showed a total of 351 alleles over all loci. The number of alleles per locus ranged from two to 27, with a mean of 7.2 alleles per locus and a mean Polymorphic Information Content (PIC) of 0.5.
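The per-locus statistics reported above can be illustrated with a short sketch. The abstract does not say which PIC estimator was used; the code assumes the widely used formula of Botstein et al. (1980), PIC = 1 - sum(p_i^2) - sum over i<j of 2 p_i^2 p_j^2, and the function name is hypothetical.

```python
def polymorphic_information_content(allele_counts):
    """PIC of one locus from allele counts (or frequencies), assuming the Botstein et al. formula."""
    total = sum(allele_counts)
    if total <= 0:
        raise ValueError("allele counts must sum to a positive number")
    freqs = [count / total for count in allele_counts]
    homozygosity = sum(p * p for p in freqs)
    # Correction term for matings that are uninformative despite being heterozygous
    cross_term = 0.0
    for i in range(len(freqs)):
        for j in range(i + 1, len(freqs)):
            cross_term += 2 * freqs[i] ** 2 * freqs[j] ** 2
    return 1.0 - homozygosity - cross_term


# Mean number of alleles per locus from the totals given in the abstract
mean_alleles_per_locus = 351 / 49
```

A locus with two equally frequent alleles gives a PIC of 0.375, and a monomorphic locus gives 0. The quotient 351/49 rounds to the reported mean of 7.2 alleles per locus.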
According to the results, the germplasm can be divided into two groups, with morphological and ecophysiological characteristics being key determinants of the population structure. Geographic and ecological structuring was also revealed in the analyzed germplasm. A significant correlation between geographical and genetic distance was detected in the Central Chilean region for the first time in the species. In addition, significant ecological influence in genetic distance has been detected for one of the population structure groups (group II) in the Central Chilean region. Finally, the association of the SSR markers with ecogeographical variables was investigated and one marker was found significantly associated with precipitation. These findings have a potential application in cereal breeding.
Physical maps are the substrate of genome sequencing and map-based cloning and their construction relies on the accurate assembly of BAC clones into large contigs that are then anchored to genetic maps with molecular markers. High Information Content Fingerprinting has become the method of choice for large and repetitive genomes such as those of maize, barley, and wheat. However, the high level of repeated DNA present in these genomes requires the application of very stringent criteria to ensure a reliable assembly with the FingerPrinted Contig (FPC) software, which often results in short contig lengths (of 3-5 clones before merging) as well as an unreliable assembly in some difficult regions. Difficulties can originate from a non-linear topological structure of clone overlaps, low power of clone ordering algorithms, and the absence of tools to identify sources of gaps in Minimal Tiling Paths (MTPs).
To address these problems, we propose a novel approach that: (i) reduces the rate of false connections and Q-clones by using a new cutoff calculation method; (ii) obtains reliable clusters robust to the exclusion of a single clone or clone overlap; (iii) explores the topological contig structure by considering contigs as networks of clones connected by significant overlaps; (iv) performs iterative clone clustering combined with ordering and order verification using re-sampling methods; and (v) uses global optimization methods for clone ordering and Band Map construction. The elements of this new analytical framework, called Linear Topological Contig (LTC), were applied to datasets used previously for the construction of the physical map of wheat chromosome 3B with FPC. The performance of LTC vs. FPC was also compared on simulated BAC libraries based on the known genome sequences of chromosome 1 of rice and chromosome 1 of maize.
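Points (ii) and (iii) above can be pictured with a small graph sketch. It treats clones as nodes, keeps an edge when a pairwise overlap score passes a cutoff, reports connected components as candidate clusters, and flags clones whose removal would split a cluster. This is an illustration only, not the LTC implementation: the input format (clone pairs with a p-value-like score where smaller means more significant), the cutoff value, and all function names are assumptions.

```python
from collections import defaultdict


def build_overlap_graph(overlaps, cutoff):
    """Build an undirected clone graph from (clone_a, clone_b, p_value) triples."""
    graph = defaultdict(set)
    for clone_a, clone_b, p_value in overlaps:
        graph[clone_a]                    # accessing the key registers the clone as a node
        graph[clone_b]
        if p_value <= cutoff:             # keep only significant overlaps
            graph[clone_a].add(clone_b)
            graph[clone_b].add(clone_a)
    return dict(graph)


def connected_components(graph, excluded=None):
    """Return the clusters of the graph, optionally ignoring one clone."""
    seen = set() if excluded is None else {excluded}
    components = []
    for start in sorted(graph):
        if start in seen:
            continue
        stack, members = [start], set()
        seen.add(start)
        while stack:
            clone = stack.pop()
            members.add(clone)
            for neighbour in graph[clone]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        components.append(members)
    return components


def fragile_clones(graph):
    """Clones whose exclusion increases the number of clusters (brute-force check)."""
    baseline = len(connected_components(graph))
    fragile = []
    for clone in sorted(graph):
        if not graph[clone]:
            continue                      # isolated clones cannot hold a cluster together
        if len(connected_components(graph, excluded=clone)) > baseline:
            fragile.append(clone)
    return fragile
```

A cluster with no fragile clones stays connected when any single clone is dropped, which is one simple way to read the robustness criterion in (ii). The brute-force check is quadratic in the number of clones and is meant for illustration rather than for chromosome-scale data.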
The results show that compared to other methods, LTC enables the construction of highly reliable and longer contigs (5-12 clones before merging), the detection of "weak" connections in contigs and their "repair", and the elongation of contigs obtained by other assembly methods.
The complexity of the wheat genome has resulted from waves of retrotransposable element insertions. Gene deletions and disruptions generated by the fast replacement of repetitive elements in wheat have resulted in disruption of colinearity at a micro (sub-megabase) level among the cereals. In view of genomic changes that are possible within a given time span, conservation of genes between species tends to imply an important functional or regional constraint that does not permit a change in genomic structure. The ctg1034 contig completed in this paper was initially studied because it was assigned to the Sr2 resistance locus region, but detailed mapping studies subsequently assigned it to the long arm of 3B and revealed its unusual features.
BAC shotgun sequencing of the hexaploid wheat (Triticum aestivum cv. Chinese Spring) genome has been used to assemble a group of 15 wheat BACs from the chromosome 3B physical map FPC contig ctg1034 into a 783,553 bp genomic sequence. This ctg1034 sequence was annotated for biological features such as genes and transposable elements. A three-gene island was identified among >80% repetitive DNA sequence. Bioinformatic analysis revealed no observable similarity among the functions of the three genes. The ctg1034 gene island also displayed complete conservation of gene order and orientation with syntenic gene islands found in publicly available genome sequences of Brachypodium distachyon, Oryza sativa, Sorghum bicolor and Zea mays, even though the intergenic space and introns were divergent.
We propose that ctg1034 is located within the heterochromatic C-band region of deletion bin 3BL7 based on the identification of heterochromatic tandem repeats and presence of significant matches to chromodomain-containing gypsy LTR retrotransposable elements. We also speculate that this location, among other highly repetitive sequences, may account for the relative stability in gene order and orientation within the gene island.
Sequence data from this article have been deposited with the GenBank Data Libraries under accession no. GQ422824
Rye (Secale cereale L.) belongs to the tribe Triticeae and is an important temperate cereal. It is one of the parents of the man-made species Triticale and has been used as a source of agronomically important genes for wheat improvement. The short arm of rye chromosome 1 (1RS), in particular, is rich in useful genes, and as it may increase yield, protein content and resistance to biotic and abiotic stress, it has been introgressed into wheat as the 1BL.1RS translocation. A better knowledge of the rye genome could facilitate rye improvement and increase the efficiency of utilizing rye genes in wheat breeding.
Here, we report on BAC end sequencing of 1,536 clones from two 1RS-specific BAC libraries. We obtained 2,778 (90.4%) useful sequences with a cumulative length of 2,032,538 bp and an average read length of 732 bp. These sequences represent 0.5% of the 1RS arm. The GC content of the sequenced fraction of 1RS is 45.9%, and at least 84% of the 1RS arm consists of repetitive DNA. We identified transposable element junctions in BESs and developed insertion site-based polymorphism (ISBP) markers. Out of the 64 primer pairs tested, 17 (26.6%) were specific for 1RS. We also identified BESs carrying microsatellites suitable for development of 1RS-specific SSR markers.
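The summary figures above can be checked with a few lines of arithmetic. The sketch assumes that each clone was sequenced from both ends, giving 2 x 1,536 attempted reads, which is consistent with the reported 90.4% but is not stated explicitly in the abstract. The GC-content helper is a generic illustration with a hypothetical name, not the authors' code.

```python
CLONES = 1536
USEFUL_READS = 2778
TOTAL_BASES = 2_032_538

# Assumption: two end reads were attempted per BAC clone
attempted_reads = 2 * CLONES
success_rate_pct = 100 * USEFUL_READS / attempted_reads
mean_read_length = TOTAL_BASES / USEFUL_READS


def gc_content(sequence):
    """Percentage of G and C among the unambiguous bases (A, C, G, T) of a sequence."""
    sequence = sequence.upper()
    called = sum(sequence.count(base) for base in "ACGT")
    if called == 0:
        return 0.0
    return 100 * (sequence.count("G") + sequence.count("C")) / called
```

Under the two-reads-per-clone assumption, the success rate rounds to 90.4% and the mean read length to 732 bp, matching the reported values.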
This work demonstrates the utility of chromosome arm-specific BAC libraries for targeted analysis of large Triticeae genomes and provides new sequence data from the rye genome and molecular markers for the short arm of rye chromosome 1.