Brassica rapa (AA) contains highly diverse forms, including oleiferous types and many vegetable types. The genome sequence of B. rapa line Chiifu (ssp. pekinensis), a leafy vegetable type, was published in 2011. Using this knowledge, it is important to develop genomic resources for the oleiferous types of B. rapa. This will allow more detailed molecular mapping, in-depth study of the molecular mechanisms underlying important agronomic traits and introgression of traits from B. rapa to the major oilseed crops B. juncea (AABB) and B. napus (AACC). The study explores the availability of SNPs in RNA-seq generated contigs of three oleiferous lines of B. rapa - Candle (ssp. oleifera, turnip rape), YSPB-24 and Tetra (ssp. trilocularis, Yellow sarson) - and their use in genome-wide linkage mapping and specific-region fine mapping using a RIL population derived from a cross between Chiifu and Tetra.
RNA-seq was carried out on RNA isolated from young inflorescences containing unopened floral buds, floral axis and small leaves, using Illumina paired-end sequencing technology. Sequence assembly was carried out using the Velvet de novo programme and the assembled contigs were organised against Chiifu gene models available in the BRAD-CDS database. RNA-seq confirmed the presence of more than 17,000 single-copy gene models described in the BRAD database. The assembled contigs and the BRAD gene models were analyzed for the presence of SSRs and SNPs. While the number of SSRs was limited, more than 0.2 million SNPs were observed between Chiifu and the three oleiferous lines. Assays for SNPs were designed using KASPar technology and tested on an F7-RIL population derived from a Chiifu x Tetra cross. The design of the SNP assays was based on three considerations: the 50 bp flanking regions of the SNP should be strictly similar, the SNP should have a read depth of ≥7, and no exon/intron junction should be present within the 101 bp target region. Using these criteria, a total of 640 markers (580 for genome-wide mapping and 60 for specific-region mapping), marking as many genes, were tested for mapping. Of the 640 markers tested, 594 could be mapped unambiguously, which included 542 markers for genome-wide mapping and 42 markers for fine mapping of the tet-o locus, which is involved in the tetralocular ovary trait of the line Tetra.
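The three assay-design considerations above can be sketched as a simple filter. This is a minimal illustration, not the authors' pipeline: the `SnpCandidate` record, its field names and the function name are hypothetical, and "strictly similar" flanks are interpreted here as identical between the two parental sequences, which is an assumption.

```python
from dataclasses import dataclass

FLANK = 50       # bp on each side of the SNP (101 bp target region in total)
MIN_DEPTH = 7    # minimum read depth stated in the text


@dataclass
class SnpCandidate:
    """Hypothetical record for one candidate SNP (not from the source)."""
    position: int            # 0-based SNP position in the aligned gene model
    depth: int               # read depth at the SNP
    ref_seq: str             # reference parent sequence (e.g. Chiifu gene model)
    alt_seq: str             # aligned sequence from the other parent
    junctions: tuple = ()    # 0-based indices j marking a boundary between bases j-1 and j


def passes_assay_criteria(snp: SnpCandidate) -> bool:
    start, end = snp.position - FLANK, snp.position + FLANK + 1
    # The full 101 bp window must lie inside both sequences.
    if start < 0 or end > min(len(snp.ref_seq), len(snp.alt_seq)):
        return False
    # Criterion 2: read depth of at least 7.
    if snp.depth < MIN_DEPTH:
        return False
    # Criterion 1 (assumed to mean identity): only the SNP base may differ.
    left_same = snp.ref_seq[start:snp.position] == snp.alt_seq[start:snp.position]
    right_same = snp.ref_seq[snp.position + 1:end] == snp.alt_seq[snp.position + 1:end]
    if not (left_same and right_same):
        return False
    # Criterion 3: no exon/intron junction inside the target window.
    return not any(start < j < end for j in snp.junctions)
```

A candidate that fails any one of the three checks is rejected; the order of the checks only affects speed, not the outcome.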
A large number of SNPs and PSVs are present in the transcriptome of B. rapa lines for genome-wide linkage mapping and specific-region fine mapping. Criteria used for SNP identification delivered markers, more than 93% of which could be successfully mapped to the F7–RIL population of Chiifu x Tetra cross.
Brassica rapa; RNA-seq; Next generation sequencing; Single nucleotide polymorphism (SNP); Paralog specific variation (PSV); Coding DNA Sequences (CDS); KASPar assays
MITE, TRIM and SINEs are miniature form transposable elements (mTEs) that are ubiquitous and dispersed throughout entire plant genomes. Tens of thousands of members cause insertion polymorphism at both the inter- and intra- species level. Therefore, mTEs are valuable targets and resources for development of markers that can be utilized for breeding, genetic diversity and genome evolution studies. Taking advantage of the completely sequenced genomes of Brassica rapa and B. oleracea, characterization of mTEs and building a curated database are prerequisite to extending their utilization for genomics and applied fields in Brassica crops.
We have developed BrassicaTED as a unique web portal containing detailed characterization information for mTEs of Brassica species. At present, BrassicaTED has datasets for 41 mTE families, including 5894 and 6026 members from 20 MITE families, 1393 and 1639 members from 5 TRIM families, 1270 and 2364 members from 16 SINE families in B. rapa and B. oleracea, respectively. BrassicaTED offers different sections to browse structural and positional characteristics for every mTE family. In addition, we have added data on 289 MITE insertion polymorphisms from a survey of seven Brassica relatives. Genes with internal mTE insertions are shown with detailed gene annotation and microarray-based comparative gene expression data in comparison with their paralogs in the triplicated B. rapa genome. This database also includes a novel tool, K BLAST (Karyotype BLAST), for clear visualization of the locations for each member in the B. rapa and B. oleracea pseudo-genome sequences.
BrassicaTED is a newly developed database of information regarding the characteristics and potential utility of mTEs including MITE, TRIM and SINEs in B. rapa and B. oleracea. The database will promote the development of desirable mTE-based markers, which can be utilized for genomics and breeding in Brassica species. BrassicaTED will be a valuable repository for scientists and breeders, promoting efficient research on Brassica species. BrassicaTED can be accessed at http://im-crop.snu.ac.kr/BrassicaTED/index.php.
Brassica; Miniature inverted-repeat transposable element (MITE); Terminal-repeat retrotransposon in miniature (TRIM); Miniature form transposable elements (mTEs); Short interspersed elements (SINEs); TE Database
Brassica oleracea is a morphologically diverse species in the family Brassicaceae and contains a group of nutrition-rich vegetable crops, including common heading cabbage, cauliflower, broccoli, kohlrabi, kale and Brussels sprouts. This diversity, along with its phylogenetic membership in a group of three diploid and three tetraploid species and the recent availability of genome sequences within Brassica, provides an unprecedented opportunity to study intra- and inter-species divergence and evolution in this species and its close relatives.
We have developed a comprehensive database, Bolbase, which provides access to the B. oleracea genome data and comparative genomics information. The whole genome of B. oleracea is available, including nine fully assembled chromosomes and 1,848 scaffolds, with 45,758 predicted genes, 13,382 transposable elements, and 3,581 non-coding RNAs. Comparative genomics information is available, including syntenic regions among B. oleracea, Brassica rapa and Arabidopsis thaliana, synonymous (Ks) and non-synonymous (Ka) substitution rates between orthologous gene pairs, gene families or clusters, and differences in quantity, category, and distribution of transposable elements on chromosomes. Bolbase provides useful search and data mining tools, including a keyword search, a local BLAST server, and a customized GBrowse tool, which can be used to extract annotations of genome components, identify similar sequences and visualize syntenic regions among species. Users can download all genomic data and explore comparative genomics in a highly visual setting.
Bolbase is the first resource platform for the B. oleracea genome and for genomic comparisons with its relatives, and thus it will help the research community to better study the function and evolution of Brassica genomes as well as enhance molecular breeding research. This database will be updated regularly with new features, improvements to genome annotation, and new genomic sequences as they become available. Bolbase is freely available at http://ocri-genomics.org/bolbase.
Brassica oleracea; Database; Genome sequence; Synteny; Comparative genomics
The Brassica species include an important group of crops and provide opportunities for studying the evolutionary consequences of polyploidy. They are related to Arabidopsis thaliana, for which the first complete plant genome sequence was obtained and their genomes show extensive, although imperfect, conserved synteny with that of A. thaliana. A large number of EST sequences, derived from a range of different Brassica species, are available in the public database, but no public microarray resource has so far been developed for these species.
We assembled unigenes using ~800,000 EST sequences, mainly from three species: B. napus, B. rapa and B. oleracea. The assembly was conducted with the aim of co-assembling ESTs of orthologous genes (including homoeologous pairs of genes in B. napus from each of the A and C genomes), but resolving assemblies of paralogous, or paleo-homoeologous, genes (i.e. the genes related by the ancestral genome triplication observed in diploid Brassica species). 90,864 unique sequence assemblies were developed. These were incorporated into the BAC sequence annotation for the Brassica rapa Genome Sequencing Project, enabling the identification of cognate genomic sequences for a proportion of them. A 60-mer oligo microarray comprising 94,558 probes was developed using the unigene sequences. Gene expression was analysed in reciprocal resynthesised B. napus lines and the B. oleracea and B. rapa lines used to produce them. The analysis showed that significant expression could consistently be detected in leaf tissue for 35,386 unigenes. Expression was detected across all four genotypes for 27,355 unigenes, genome-specific expression patterns were observed for 7,851 unigenes and 180 unigenes displayed other classes of expression pattern. Principal component analysis (PCA) clearly resolved the individual microarray datasets for B. rapa, B. oleracea and resynthesised B. napus. Quantitative differences in expression were observed between the resynthesised B. napus lines for 98 unigenes, most of which could be classified into non-additive expression patterns, including 17 that showed cytoplasm-specific patterns. We further characterized the unigenes for which A genome-specific expression was observed and cognate genomic sequences could be identified. Ten of these unigenes were found to be Brassica-specific sequences, including two that originate from complex loci comprising gene clusters.
We succeeded in developing a Brassica community microarray resource. Although expression can be measured for the majority of unigenes across species, there were numerous probes that reported in a genome-specific manner. We anticipate that some proportion of these will represent species-specific transcripts and the remainder will be the consequence of variation of sequences within the regions represented by the array probes. Our studies demonstrated that the datasets obtained from the arrays can be used for typical analyses, including PCA and the analysis of differential expression. We have also demonstrated that Brassica-specific transcripts identified in silico in the sequence assembly of public EST database accessions are indeed reported by the array. These would not be detectable using arrays designed using A. thaliana sequences.
Plant disease resistance (R) genes with the nucleotide binding site (NBS) play an important role in conferring resistance to pathogens. The availability of complete genome sequences of Brassica oleracea and Brassica rapa provides an important opportunity for researchers to identify and characterize NBS-encoding R genes in Brassica species and to compare them with analogues in Arabidopsis thaliana using a comparative genomics approach. However, little is known about the evolutionary fate of NBS-encoding genes in the Brassica lineage after its split from A. thaliana.
Here we present a genome-wide analysis of NBS-encoding genes in B. oleracea, B. rapa and A. thaliana. Through HMM search and manual curation, we identified 157, 206 and 167 NBS-encoding genes in the B. oleracea, B. rapa and A. thaliana genomes, respectively. Phylogenetic analysis among the three species classified NBS-encoding genes into 6 subgroups. Tandem duplication and whole genome triplication (WGT) analyses revealed that after WGT of the Brassica ancestor, NBS-encoding homologous gene pairs on triplicated regions in the Brassica ancestor were deleted or lost quickly, but NBS-encoding genes in Brassica species experienced species-specific gene amplification by tandem duplication after divergence of B. rapa and B. oleracea. Expression profiling of NBS-encoding orthologous gene pairs indicated differential expression patterns of retained orthologous gene copies in B. oleracea and B. rapa. Furthermore, evolutionary analysis of CNL type NBS-encoding orthologous gene pairs among the three species suggested that orthologous genes in B. rapa have undergone stronger negative selection than those in B. oleracea. For the TNL type, however, there were no significant differences in the orthologous gene pairs between the two species.
This study is the first identification and characterization of NBS-encoding genes in B. rapa and B. oleracea based on whole genome sequences. Through tandem duplication and whole genome triplication analysis in the B. oleracea, B. rapa and A. thaliana genomes, our study provides insight into the evolutionary history of NBS-encoding genes after divergence of A. thaliana and the Brassica lineage. These results, together with expression pattern analysis of NBS-encoding orthologous genes, provide a useful resource for functional characterization of these genes and genetic improvement of relevant crops.
Brassica species; Disease resistance gene; Nucleotide binding site; Tandem duplication; Whole genome duplication
The Brassicaceae family includes the model plant Arabidopsis thaliana as well as a number of agronomically important species such as oilseed crops (in particular Brassica napus, B. juncea and B. rapa) and vegetables (e.g. B. rapa and B. oleracea).
Separated by only 10-20 million years, Brassica species and Arabidopsis thaliana are closely related, and it is expected that knowledge obtained relating to Arabidopsis growth and development can be translated into Brassicas for crop improvement. Moreover, certain aspects of plant development are sufficiently different between Brassica and Arabidopsis to warrant studies being carried out directly in the crop species. However, mutating individual genes in the amphidiploid Brassicas such as B. napus and B. juncea may not give rise to the expected phenotypes, as the genomes of these species can contain up to six orthologues per single-copy Arabidopsis gene. In order to elucidate and possibly exploit the function of redundant genes for oilseed rape crop improvement, it may therefore be more efficient to study the effects in one of the diploid Brassica species such as B. rapa. Moreover, the ongoing sequencing of the B. rapa genome makes this species a highly attractive model for Brassica research and genetic resource development.
Seeds from the diploid Brassica A genome species, B. rapa were treated with ethyl methane sulfonate (EMS) to produce a TILLING (Targeting Induced Local Lesions In Genomes) population for reverse genetics studies. We used the B. rapa genotype, R-o-18, which has a similar developmental ontogeny to an oilseed rape crop. Hence this resource is expected to be well suited for studying traits with relevance to yield and quality of oilseed rape. DNA was isolated from a total of 9,216 M2 plants and pooled to form the basis of the TILLING platform. Analysis of six genes revealed a high level of mutations with a density of about one per 60 kb. This analysis also demonstrated that screening a 1 kb amplicon in just one third of the population (3072 M2 plants) will provide an average of 68 mutations and a 97% probability of obtaining a stop-codon mutation resulting in a truncated protein. We furthermore calculated that each plant contains on average ~10,000 mutations and due to the large number of plants, it is predicted that mutations in approximately half of the GC base pairs in the genome exist within this population.
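The link between an expected number of mutations in an amplicon and the chance of recovering at least one truncating allele can be illustrated with a Poisson model. This is only a sketch under stated assumptions: the Poisson model and the fraction of EMS mutations that create stop codons (5% below) are illustrative choices that do not come from the text, and the function name is hypothetical.

```python
import math


def prob_at_least_one(expected_mutations: float, fraction_of_interest: float) -> float:
    """Probability of observing at least one mutation of the class of interest.

    Assumes mutations of that class occur as a Poisson process with mean
    expected_mutations * fraction_of_interest.
    """
    lam = expected_mutations * fraction_of_interest
    return 1.0 - math.exp(-lam)


# 68 expected mutations is the figure given in the text;
# 0.05 is an ASSUMED share of stop-codon (nonsense) changes.
p_stop = prob_at_least_one(68, 0.05)
print(f"P(at least one stop-codon mutation) = {p_stop:.3f}")
```

Under these assumptions the result is roughly 0.97, which is in line with the probability quoted above, but the 5% share is a placeholder and the authors' own calculation may differ.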
We have developed the first EMS TILLING resource in the diploid Brassica species, B. rapa. The mutation density in this population is ~1 per 60 kb, which makes it the most densely mutated diploid organism for which a TILLING population has been published. This resource is publicly available through the RevGenUK reverse genetics platform http://revgenuk.jic.ac.uk.
Brassica rapa is an economically important crop species. During its long breeding history, a large number of morphotypes have been generated, including leafy vegetables such as Chinese cabbage and pakchoi, turnip tuber crops and oil crops.
To investigate the genetic variation underlying this morphological variation, we re-sequenced, assembled and annotated the genomes of two B. rapa subspecies, a turnip crop (turnip) and a rapid-cycling line. We then analysed the two resulting genomes together with the Chinese cabbage Chiifu reference genome to obtain an impression of the B. rapa pan-genome. The number of genes with protein-coding changes between the three genotypes was lower than that among different accessions of Arabidopsis thaliana, which can be explained by the smaller effective population size of B. rapa due to its domestication. Based on orthology to a number of non-Brassica species, we estimated the date of divergence among the three B. rapa morphotypes at approximately 250,000 YA, far predating Brassica domestication (5,000-10,000 YA).
By analysing genes unique to turnip we found evidence for copy number differences in peroxidases, pointing to a role for the phenylpropanoid biosynthesis pathway in the generation of morphological variation. The estimated date of divergence among the three B. rapa morphotypes implies that prior to domestication there was already considerable divergence among B. rapa genotypes. Our study thus provides two new B. rapa reference genomes, delivers a set of computer tools to analyse the resulting pan-genome and uses these to shed light on genetic drivers behind the rich morphological variation found in B. rapa.
MicroRNAs (miRNAs) are one of the functional non-coding small RNAs involved in the epigenetic control of the plant genome. Although plants contain both evolutionary conserved miRNAs and species-specific miRNAs within their genomes, computational methods often only identify evolutionary conserved miRNAs. The recent sequencing of the Brassica rapa genome enables us to identify miRNAs and their putative target genes. In this study, we sought to provide a more comprehensive prediction of B. rapa miRNAs based on high throughput small RNA deep sequencing.
We sequenced small RNAs from five types of tissue: seedlings, roots, petioles, leaves, and flowers. By analyzing 2.75 million unique reads that mapped to the B. rapa genome, we identified 216 novel and 196 conserved miRNAs that were predicted to target approximately 20% of the genome’s protein coding genes. Quantitative analysis of miRNAs from the five types of tissue revealed that novel miRNAs were expressed in diverse tissues but their expression levels were lower than those of the conserved miRNAs. Comparative analysis of the miRNAs between the B. rapa and Arabidopsis thaliana genomes demonstrated that redundant copies of conserved miRNAs in the B. rapa genome may have been deleted after whole genome triplication. Novel miRNA members seemed to have spontaneously arisen from the B. rapa and A. thaliana genomes, suggesting the species-specific expansion of miRNAs. We have made this data publicly available in a miRNA database of B. rapa called BraMRs. The database allows the user to retrieve miRNA sequences, their expression profiles, and a description of their target genes from the five tissue types investigated here.
This is the first report to identify novel miRNAs from Brassica crops using genome-wide high throughput techniques. The combination of computational methods and small RNA deep sequencing provides robust predictions of miRNAs in the genome. The finding of numerous novel miRNAs, many with few target genes and low expression levels, suggests the rapid evolution of miRNA genes. The development of a miRNA database, BraMRs, enables us to integrate miRNA identification, target prediction, and functional annotation of target genes. BraMRs will represent a valuable public resource with which to study the epigenetic control of B. rapa and other closely related Brassica species. The database is available at the following link:
Brassica rapa; Genome; miRNA; miRNA target; Small RNA sequencing; Database
Brassica juncea (AABB) is an allotetraploid species containing genomes of B. rapa (AA) and B. nigra (BB). It is a major oilseed crop in South Asia, and grown on approximately 6–7 million hectares of land in India during the winter season under dryland conditions. B. juncea has two well defined gene pools – Indian and east European. Hybrids between the two gene pools are heterotic for yield. A large number of qualitative and quantitative traits need to be introgressed from one gene pool into the other. This study explores the availability of SNPs in RNA-seq generated contigs, and their use for general mapping, fine mapping of selected regions, and comparative arrangement of gene blocks on B. juncea A and B genomes.
RNA isolated from two lines of B. juncea – Varuna (Indian type) and Heera (east European type) – was sequenced using Illumina paired-end sequencing technology, and assembled using the Velvet de novo programme. A and B genome specific contigs were identified in two steps: first, by aligning contigs against the B. rapa protein database (available at BRAD), and second, by comparing percentage identity at the nucleotide level with the B. rapa CDS and the B. nigra transcriptome. 135,693 SNPs were recorded in the assembled partial gene models of Varuna and Heera, 85,473 in the A genome and 50,236 in the B. Using KASPar technology, 999 markers were added to an earlier intron polymorphism marker based map of a B. juncea Varuna x Heera DH population. Many new gene blocks were identified in the B genome. A number of SNP markers covered single copy homoeologues of the A and B genomes, and these were used to identify homoeologous blocks between the two genomes. Comparison of the block architecture of the A and B genomes revealed extensive differences in gene block associations and block fragmentation patterns.
Sufficient SNP markers are available for general mapping and specific-region fine mapping of crosses between lines of the two diverse B. juncea gene pools. Comparative gene block arrangement and block fragmentation patterns between the A and B genomes support the hypothesis that the two genomes evolved from independent hexaploidy events.
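The second step described above, separating A and B genome contigs by nucleotide identity, can be sketched as a comparison of two percentage identities. This is a hypothetical illustration rather than the authors' procedure: the function name and both thresholds (`min_identity`, `min_gap`) are assumptions introduced here, since the text does not state the cut-offs used.

```python
def assign_genome(identity_to_a: float, identity_to_b: float,
                  min_identity: float = 90.0, min_gap: float = 2.0) -> str:
    """Assign a contig to the A or B genome from two percentage identities.

    identity_to_a: % nucleotide identity to the best B. rapa CDS hit.
    identity_to_b: % nucleotide identity to the best B. nigra transcript hit.
    min_identity and min_gap are ASSUMED thresholds, not values from the source.
    """
    best = max(identity_to_a, identity_to_b)
    # Leave the contig unassigned when neither hit is convincing
    # or when the two identities are too close to separate.
    if best < min_identity or abs(identity_to_a - identity_to_b) < min_gap:
        return "unassigned"
    return "A" if identity_to_a > identity_to_b else "B"


print(assign_genome(99.0, 93.0))  # contig much closer to B. rapa
print(assign_genome(92.0, 98.5))  # contig much closer to B. nigra
```

Keeping an explicit "unassigned" class avoids forcing ambiguous contigs into either genome, which matters when the SNPs are later used for genome-specific mapping.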
Brassica species; RNA-seq; SNP; Linkage map; Comparative genomics; Evolution
Chromosomal synteny analysis is important in genome comparison to reveal the genomic evolution of related species. Shared synteny describes genomic fragments from different species that originated from an identical ancestor. Syntenic genes are orthologs located in these syntenic fragments, so they often share similar functions. Syntenic gene analysis is very important in Brassicaceae species to share gene annotations and investigate genome evolution. Here we designed and developed a direct and efficient tool, SynOrths, to identify pairwise syntenic genes between genomes of Brassicaceae species. SynOrths determines whether two genes are a conserved syntenic pair based not only on their sequence similarity, but also on the support of homologous flanking genes. Syntenic genes between Arabidopsis thaliana and Brassica rapa, Arabidopsis lyrata and B. rapa, and Thellungiella parvula and B. rapa were then identified using SynOrths. The occurrence of genome triplication in B. rapa was clearly observed: many genes that were evenly distributed in the genomes of A. thaliana, A. lyrata, and T. parvula had three syntenic copies in B. rapa. Additionally, there were many B. rapa genes that had no syntenic orthologs in A. thaliana, but some of these had syntenic orthologs in A. lyrata or T. parvula. Only 5,851 genes in B. rapa had no syntenic counterparts in any of the other three species. These 5,851 genes could have originated after B. rapa diverged from these species. We developed SynOrths, a tool for syntenic gene analysis between species of Brassicaceae, which can be used to accurately identify syntenic genes in differentiated but closely related genomes. With this tool, we identified syntenic gene sets between B. rapa and each of A. thaliana, A. lyrata and T. parvula. Syntenic gene analysis is important not only for the gene annotation of newly sequenced Brassicaceae genomes, by bridging them to the model plant A. thaliana, but also for the study of genome evolution in these species.
synteny; ortholog; Brassica rapa; Arabidopsis thaliana; Arabidopsis lyrata; Thellungiella parvula; Brassicaceae
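The idea of combining sequence similarity with support from homologous flanking genes can be sketched as follows. This is an illustrative simplification, not the SynOrths implementation: the function name, the `window` and `min_support` parameters and the input format (ordered gene lists plus a set of homologous gene-ID pairs) are all assumptions made for the example.

```python
def is_syntenic_pair(gene_a, gene_b, order_a, order_b, homologs,
                     window=10, min_support=2):
    """Decide whether gene_a and gene_b form a syntenic pair.

    order_a, order_b: gene IDs in chromosomal order for each genome.
    homologs: set of (gene_in_a, gene_in_b) pairs judged similar by sequence.
    window, min_support: HYPOTHETICAL parameters, not SynOrths defaults.
    """
    # Requirement 1: the two genes themselves must be sequence homologs.
    if (gene_a, gene_b) not in homologs:
        return False
    ia, ib = order_a.index(gene_a), order_b.index(gene_b)
    # Collect up to `window` neighbours on each side of both genes.
    flank_a = order_a[max(0, ia - window):ia] + order_a[ia + 1:ia + 1 + window]
    flank_b = set(order_b[max(0, ib - window):ib] + order_b[ib + 1:ib + 1 + window])
    # Requirement 2: enough neighbours of gene_a have a homolog near gene_b.
    support = sum(
        1 for fa in flank_a
        if any((fa, fb) in homologs for fb in flank_b)
    )
    return support >= min_support


order_a = ["a1", "a2", "a3", "a4", "a5"]
order_b = ["b1", "b2", "b3", "b4", "b5"]
homologs = {("a2", "b2"), ("a3", "b3"), ("a4", "b4")}
print(is_syntenic_pair("a3", "b3", order_a, order_b, homologs, window=2))
```

Requiring flanking support is what distinguishes a syntenic ortholog from a dispersed paralog that merely shares sequence similarity, which is especially relevant in a triplicated genome such as B. rapa.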
The species Brassica rapa (2n=20, AA) is an important vegetable and oilseed crop, and serves as an excellent model for genomic and evolutionary research in Brassica species. With the availability of whole genome sequence of B. rapa, it is essential to further determine the activity of all functional elements of the B. rapa genome and explore the transcriptome on a genome-wide scale. Here, RNA-seq data was employed to provide a genome-wide transcriptional landscape and characterization of the annotated and novel transcripts and alternative splicing events across tissues.
RNA-seq reads were generated using the Illumina platform from six different tissues (root, stem, leaf, flower, silique and callus) of the B. rapa accession Chiifu-401-42, the same line used for whole genome sequencing. First, these data detected widespread transcription of the B. rapa genome, leading to the identification of numerous novel transcripts and the definition of 5'/3' UTRs of known genes. Second, 78.8% of the total annotated genes were detected as expressed and 45.8% were constitutively expressed across all tissues. We further defined several groups of genes: housekeeping genes, tissue-specific expressed genes and co-expressed genes across tissues, which will serve as a valuable repository for future crop functional genomics research. Third, alternative splicing (AS) is estimated to occur in more than 29.4% of intron-containing B. rapa genes, and 65% of them were commonly detected in more than two tissues. Interestingly, genes with a high rate of AS were over-represented in GO categories relating to transcriptional regulation and signal transduction, suggesting that AS may play an important regulatory role in these genes. Further, we observed that intron retention (IR) is predominant among the AS events and seems to occur preferentially in genes with short introns.
The high-resolution RNA-seq analysis provides a global transcriptional landscape as a complement to the B. rapa genome sequence, which will advance our understanding of the dynamics and complexity of the B. rapa transcriptome. The atlas of gene expression in different tissues will be useful for accelerating research on functional genomics and genome evolution in Brassica species.
Brassica rapa; RNA-seq; Alternative splicing; Transcriptome
Biofuels extracted from the seeds of Camelina sativa have recently been used successfully as environmentally friendly jet-fuel to reduce greenhouse gas emissions. Camelina sativa is genetically very close to Arabidopsis thaliana, and both are members of the Brassicaceae. Although public databases are currently available for some members of the Brassicaceae, such as A. thaliana, A. lyrata, Brassica napus, B. juncea and B. rapa, there are no public Expressed Sequence Tags (EST) or genomic data for Camelina sativa. In this study, a high-throughput, large-scale RNA sequencing (RNA-seq) of the Camelina sativa transcriptome was carried out to generate a database that will be useful for further functional analyses.
Approximately 27 million clean “reads” (2.42 gigabase pairs), filtered from raw reads by removal of adaptors, ambiguous reads and low-quality reads, were generated by Illumina paired-end RNA-seq technology. All of these clean reads were assembled de novo into 83,493 unigenes and 103,196 transcripts using SOAPdenovo and Trinity, respectively. The average length of the transcripts generated by Trinity was 697 bp (N50 = 976 bp), which was longer than the average length of the unigenes (319 bp, N50 = 346 bp). Nonetheless, the assembly generated by SOAPdenovo produced a similar number of non-redundant hits (22,435) to that of Trinity (22,433) in BLASTN searches of the Arabidopsis thaliana CDS sequence database (TAIR). Four public databases, the Kyoto Encyclopedia of Genes and Genomes (KEGG), Swiss-Prot, NCBI non-redundant protein (NR), and the Cluster of Orthologous Groups (COG), were used for unigene annotation; 67,791 of 83,493 unigenes (81.2%) were finally annotated with gene descriptions or conserved protein domains that were mapped to 25,329 non-redundant protein sequences. We mapped 27,042 of 83,493 unigenes (32.4%) to 119 KEGG metabolic pathways.
This is the first report of a transcriptome database for Camelina sativa, an environmentally important member of the Brassicaceae. We showed that C. sativa is closely related to Arabidopsis spp. and more distantly related to Brassica spp. Although the majority of annotated genes had high sequence identity to those of A. thaliana, a substantial proportion of disease-resistance genes (NBS-encoding LRR genes) were instead more similar to the genes of other Brassicaceae; these genes included BrCN, BrCNL, BrNL, BrTN and BrTNL in B. rapa. As plant genomes are under long-term selection pressure from environmental stressors, conservation of these disease-resistance genes in the C. sativa and B. rapa genomes implies that they are exposed to threats from closely related pathogens in their natural habitats.
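The N50 values quoted for the two assemblies follow the usual definition: the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the total assembled bases. A minimal sketch of that calculation is given below; the function name and the toy input are illustrative and are not taken from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or transcript lengths."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence downwards until half the bases are covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length


# Toy example: total = 54, and 10 + 9 + 8 = 27 reaches half of it.
print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))  # prints 8
```

Because N50 is weighted by length, it can differ substantially from the mean length, as it does for both assemblies reported here.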
Brassicaceae; Camelina sativa; Transcriptome; de novo; Paired-end sequencing; NBS-LRR
Euchromatic regions of the Brassica rapa genome were sequenced and mapped onto the corresponding regions in the Arabidopsis thaliana genome.
Brassica rapa is one of the most economically important vegetable crops worldwide. Owing to its agronomic importance and phylogenetic position, B. rapa provides a crucial reference for understanding polyploidy-related crop genome evolution. The high degree of sequence identity and remarkably conserved genome structure between the Arabidopsis and Brassica genomes enable comparative tiling sequencing, in which Arabidopsis sequences are used as references to select the counterpart regions in B. rapa, a major challenge in structural and comparative crop genomics.
We assembled 65.8 megabase-pairs of non-redundant euchromatic sequence of B. rapa and compared this sequence to the Arabidopsis genome to investigate chromosomal relationships, macrosynteny blocks, and microsynteny within blocks. The triplicated B. rapa genome contains only approximately twice the number of genes as in Arabidopsis because of genome shrinkage. Genome comparisons suggest that B. rapa has a distinct organization of ancestral genome blocks as a result of recent whole genome triplication followed by a unique diploidization process. A lack of the most recent whole genome duplication (3R) event in the B. rapa genome, atypical of other Brassica genomes, may account for the emergence of B. rapa from the Brassica progenitor around 8 million years ago.
This work demonstrates the potential of using comparative tiling sequencing for genome analysis of crop species. Based on a comparative analysis of the B. rapa sequences and the Arabidopsis genome, it appears that polyploidy and chromosomal diploidization are ongoing processes that collectively stabilize the B. rapa genome and facilitate its evolution.
Although much research has been conducted, the pattern of microsatellite distribution has remained ambiguous, and the development and utilization of microsatellite markers in Brassica have remained limited and inefficient, due to the lack of genome sequences. In view of this, we conducted genome-wide microsatellite characterization and marker development in three recently sequenced Brassica crops: Brassica rapa, Brassica oleracea and Brassica napus. The analysed microsatellite characteristics of these Brassica species were highly similar or almost identical, which suggests that the pattern of microsatellite distribution is likely conserved in Brassica. The genomic distribution of microsatellites was highly non-uniform and was positively correlated with genes and negatively correlated with transposable elements. Of the total of 115 869, 185 662 and 356 522 simple sequence repeat (SSR) markers developed with high frequencies (408.2, 343.8 and 356.2 per Mb or one every 2.45, 2.91 and 2.81 kb, respectively), most represented new SSR markers, the majority had determined physical positions, and a large number were genic or putative single-locus SSR markers. We also constructed a comprehensive database for the newly developed SSR markers, which was integrated with public Brassica SSR markers and annotated genome components. The genome-wide SSR markers developed in this study provide a useful tool to extend the annotated genome resources of sequenced Brassica species to genetic studies and breeding in different Brassica species.
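The two ways of expressing SSR marker frequency in the text (markers per Mb and kb per marker) are reciprocals of each other, which can be checked with a short calculation. The function name below is illustrative; the three densities are the values quoted above, in the order B. rapa, B. oleracea and B. napus.

```python
def kb_per_marker(markers_per_mb: float) -> float:
    """Convert a marker density (per Mb) into the average spacing in kb."""
    return 1000.0 / markers_per_mb


densities = {"B. rapa": 408.2, "B. oleracea": 343.8, "B. napus": 356.2}
spacing = {species: round(kb_per_marker(d), 2) for species, d in densities.items()}
print(spacing)
```

The rounded results (2.45, 2.91 and 2.81 kb) agree with the spacings reported in the text.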
brassica; microsatellite; distribution; marker; database
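The marker densities and spacings quoted above are two views of the same quantity: a density of d markers per Mb corresponds to one marker every 1000/d kb. A minimal Python sketch of that conversion follows. It uses only the densities given in the abstract and assumes they are listed in the same order as the species (B. rapa, B. oleracea, B. napus); the function and variable names are illustrative and do not come from the study.

```python
# Convert SSR marker density (markers per Mb) into average spacing (kb per marker).
# Densities are the values quoted in the abstract; the species order is an assumption.

def spacing_kb(density_per_mb: float) -> float:
    """Average distance between markers in kb, given markers per Mb (1 Mb = 1000 kb)."""
    return 1000.0 / density_per_mb

densities = {
    "Brassica rapa": 408.2,
    "Brassica oleracea": 343.8,
    "Brassica napus": 356.2,
}

# Round to two decimals, matching the precision used in the abstract.
spacings = {species: round(spacing_kb(d), 2) for species, d in densities.items()}

for species, kb in spacings.items():
    print(f"{species}: one SSR marker every {kb} kb")
```

The rounded results reproduce the 2.45, 2.91 and 2.81 kb spacings reported in the text.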
Brassica rapa, which is closely related to Arabidopsis thaliana, is an important crop and a model plant for studying genome evolution via polyploidization. We report the current understanding of the genome structure of B. rapa and efforts for the whole-genome sequencing of the species. The tribe Brassiceae, which comprises ca. 240 species, descended from a common hexaploid ancestor with a basic genome similar to that of Arabidopsis. Chromosome rearrangements, including fusions and/or fissions, resulted in the present-day “diploid” Brassica species with variation in chromosome number and phenotype. Triplicated genomic segments of B. rapa are collinear to those of A. thaliana with InDels. The genome triplication has led to an approximately 1.7-fold increase in the B. rapa gene number compared to that of A. thaliana. Repetitive DNA of B. rapa has also been extensively amplified and has diverged from that of A. thaliana. For its whole-genome sequencing, the Brassica rapa Genome Sequencing Project (BrGSP) consortium has developed suitable genomic resources and constructed genetic and physical maps. Ten chromosomes of B. rapa are being allocated to BrGSP consortium participants, and each chromosome will be sequenced by a BAC-by-BAC approach. Genome sequencing of B. rapa will offer a new perspective for plant biology and evolution in the context of polyploidization.
Improving crop species by breeding for salt tolerance or introducing salt-tolerant traits is one method of increasing crop yields in saline-affected areas. Extensive studies of the model plant species Arabidopsis thaliana have led to the availability of substantial information regarding the function and importance of many genes involved in salt tolerance. However, the identification and characterization of A. thaliana orthologs in species such as Brassica napus (oilseed rape) can prove difficult due to the significant genomic changes that have occurred since their divergence approximately 20 million years ago (MYA). The recently released Brassica rapa genome provides an excellent resource for comparative studies of A. thaliana and the cultivated Brassica species, and facilitates the identification of Brassica species orthologs which may be of agronomic importance. Sodium hydrogen antiporter (NHX) proteins transport a sodium or potassium ion in exchange for a hydrogen ion in the other direction across a membrane. In A. thaliana there are eight members of the NHX family, designated AtNHX1-8, that can be sub-divided into three clades based on their subcellular localization: plasma membrane (PM), intracellular class I (IC-I) and intracellular class II (IC-II). In plants, many NHX proteins are primary determinants of salt tolerance and act by transporting Na+ out of the cytosol, where it would otherwise accumulate to toxic levels. Significant work has been done to determine the role of both PM and IC-I clade members in salt tolerance in a variety of plant species, but relatively little analysis has been described for the IC-II clade. Here we describe the identification of B. napus orthologs of AtNHX5 and AtNHX6, using the B. rapa genome sequence, macro- and micro-synteny analysis, comparative expression and promoter motif analysis, and highlight the value of these multiple approaches for identifying true orthologs in closely related species with multiple paralogs.
Arabidopsis; NHX; antiporter; Brassica; sodium transport; potassium transport; pH; cation transport
Sinapis arvensis is a weed with strong biological activity. Despite being a problematic annual weed that contaminates agricultural crop yield, it is a valuable alien germplasm resource. It can be utilized for broadening the genetic background of Brassica crops with desirable agricultural traits like resistance to blackleg (Leptosphaeria maculans), stem rot (Sclerotinia sclerotiorum) and pod shatter (associated with the FRUITFULL gene). However, few genetic studies of S. arvensis have been reported because of the lack of genomic resources. In the present study, we performed de novo transcriptome sequencing to produce a comprehensive dataset for S. arvensis for the first time. We used Illumina paired-end sequencing technology to sequence the S. arvensis flower transcriptome and generated 40,981,443 reads that were assembled into 131,278 transcripts. We de novo assembled 96,562 high-quality unigenes with an average length of 832 bp. A total of 33,662 complete full-length ORF sequences were identified, and 41,415 unigenes were mapped onto 128 pathways using the KEGG Pathway database. The annotated unigenes were compared against Brassica rapa, B. oleracea, B. napus and Arabidopsis thaliana. Among these unigenes, 76,324 were identified as putative homologs of annotated sequences in the public protein databases, of which 1194 were associated with plant hormone signal transduction and 113 were related to gibberellin homeostasis/signaling. Unigenes that did not match any of those sequence datasets were considered to be unique to S. arvensis. Furthermore, 21,321 simple sequence repeats were found. Our study will enhance the currently available resources for Brassicaceae and will provide a platform for future genomic studies for genetic improvement of Brassica crops.
Affymetrix GeneChip® arrays are used widely to study transcriptional changes in response to developmental and environmental stimuli. GeneChip® arrays comprise multiple 25-mer oligonucleotide probes per gene and retain certain advantages over direct sequencing. For plants, there are several public GeneChip® arrays whose probes are localised primarily in 3′ exons. Plant whole-transcript (WT) GeneChip® arrays are not yet publicly available, although WT resolution is needed to study complex crop genomes such as Brassica, which are typified by segmental duplications containing paralogous genes and/or allopolyploidy. Available sequence data were sampled from the Brassica A and C genomes, and 142,997 gene models were identified. The assembled gene models were then used to establish a comprehensive public WT exon array for transcriptomics studies. The Affymetrix GeneChip® Brassica Exon 1.0 ST Array is a 5 µm feature size array, containing 2.4 million 25-base oligonucleotide probes representing 135,201 gene models, with 15 probes per gene distributed among exons. Discrimination of the gene models was based on an E-value cut-off of 1E−5, with ≤98% sequence identity. The 135 k Brassica Exon Array was validated by quantifying transcriptome differences between leaf and root tissue from a reference Brassica rapa line (R-o-18), and by categorisation by Gene Ontologies (GO) based on gene orthology with Arabidopsis thaliana. Technical validation involved comparison of the exon array with a 60-mer array platform using the same starting RNA samples. The 135 k Brassica Exon Array is a robust platform. All data relating to the array design and probe identities are available in the public domain and are curated within the BrassEnsembl genome viewer at http://www.brassica.info/BrassEnsembl/index.html.
Extensive mapping efforts are currently underway for the establishment of comparative genomics between the model plant, Arabidopsis thaliana and various Brassica species. Most of these studies have deployed RFLP markers, the use of which is a laborious and time-consuming process. We therefore tested the efficacy of PCR-based Intron Polymorphism (IP) markers to analyze genome-wide synteny between the oilseed crop, Brassica juncea (AABB genome) and A. thaliana and analyzed the arrangement of 24 (previously described) genomic block segments in the A, B and C Brassica genomes to study the evolutionary events contributing to karyotype variations in the three diploid Brassica genomes.
IP markers were highly efficient and generated easily discernible polymorphisms on agarose gels. Comparative analysis of the segmental organization of the A and B genomes of B. juncea (present study) with the A and B genomes of B. napus and B. nigra respectively (described earlier) revealed a high degree of colinearity, suggesting minimal macro-level changes after polyploidization. The ancestral block arrangements that remained unaltered during evolution and the karyotype rearrangements that originated in the Oleracea lineage after its divergence from the Rapa lineage were identified. Genomic rearrangements leading to the gain or loss of one chromosome each between the A-B and A-C lineages were deciphered. Complete homoeology in terms of block organization was found between three linkage groups (LG) each for the A-B and A-C genomes. Based on the homoeology shared between the A, B and C genomes, a new nomenclature for the B genome LGs was assigned to establish uniformity in the international Brassica LG nomenclature code.
IP markers were highly effective in generating comparative relationships between Arabidopsis and various Brassica species. Comparative genomics between the three Brassica lineages established the major rearrangements, translocations and fusions pivotal to karyotype diversification between the A, B and C genomes of Brassica species. The inter-relationships established between the Brassica lineages vis-à-vis Arabidopsis would facilitate the identification and isolation of candidate genes contributing to traits of agronomic value in crop Brassicas and the development of unified tools for Brassica genomics.
Chinese cabbage (Brassica rapa ssp. pekinensis) is one of the most important leaf vegetables grown worldwide and has undergone thousands of years of cultivation and artificial selection. The entire Chinese cabbage genome sequence and more than forty thousand proteins have been obtained to date. The genome has undergone triplication events since its divergence from Arabidopsis thaliana (13 to 17 Mya); however, a high degree of sequence similarity and conserved genome structure remain between the two species. Arabidopsis is therefore a viable reference species for comparative genomics studies. Variation in the number of members in gene families due to genome triplication may contribute to the broad range of phenotypic plasticity and the increased tolerance to environmental extremes observed in Brassica species. Transcription factors are important regulators involved in plant developmental and physiological processes. The AP2/ERF proteins, one of the most important families of transcriptional regulators, play a crucial role in plant growth and in the response to biotic and abiotic stressors. Our analysis will provide resources for understanding the tolerance mechanisms in Brassica rapa ssp. pekinensis.
In the present study, 291 putative AP2/ERF transcription factor proteins were identified from the Chinese cabbage genome database, and compared with proteins from 15 additional species. The Chinese cabbage AP2/ERF superfamily was classified into four families, including AP2, ERF, RAV, and Soloist. The ERF family was further divided into DREB and ERF subfamilies. The AP2/ERF superfamily was subsequently divided into 15 groups. The identification, classification, phylogenetic reconstruction, conserved motifs, chromosome distribution, functional annotation, expression patterns, and interaction networks of the AP2/ERF transcription factor superfamily were predicted and analyzed. Distribution mapping results showed AP2/ERF superfamily genes were localized on the 10 Chinese cabbage chromosomes. AP2/ERF transcription factor expression levels exhibited differences among six tissue types based on expressed sequence tags (ESTs). In the AP2/ERF superfamily, 214 orthologous genes were identified between Chinese cabbage and Arabidopsis. Orthologous gene interaction networks were constructed, and included seven CBF and four AP2 genes, primarily involved in cold regulatory pathways and ovule development, respectively.
The evolution of the AP2/ERF transcription factor superfamily in Chinese cabbage resulted from genome triplication and tandem duplications. A comprehensive analysis of the physiological functions and biological roles of AP2/ERF superfamily genes in Chinese cabbage is still required to fully elucidate the superfamily; the results presented here provide rich resources and opportunities to understand crop stress tolerance mechanisms.
Chinese cabbage; AP2/ERF; Stress tolerance; Gene expression; Interaction network; Protein annotation
Molecular genetic maps provide a means to link heritable traits with underlying genome sequence variation. Several genetic maps have been constructed for Brassica species, yet to date, there has been no simple means to compare this information or to associate mapped traits with the genome sequence of the related model plant, Arabidopsis.
We have developed a comparative genetic map database for the viewing, comparison and analysis of Brassica and Arabidopsis genetic, physical and trait map information. This web-based tool allows users to view and compare genetic and physical maps, search for traits and markers, and compare genetic linkage groups within and between the amphidiploid and diploid Brassica genomes. The inclusion of Arabidopsis data enables comparison between Brassica maps that share no common markers. Analysis of conserved syntenic blocks between Arabidopsis and collated Brassica genetic maps validates the application of this system. This tool is freely available over the internet on .
This database enables users to interrogate the relationship between Brassica genetic maps and the sequenced genome of A. thaliana, permitting the comparison of genetic linkage groups and mapped traits and the rapid identification of candidate genes.
As part of a research programme focused on flavonoid biosynthesis in the seed coat of Brassica napus L. (oilseed rape), orthologs of the BANYULS gene, which encodes anthocyanidin reductase, were cloned in B. napus as well as in the related species Brassica rapa and Brassica oleracea. The B. napus genome contained four functional copies of BAN, two originating from each diploid progenitor. Amino acid sequences were highly conserved among the Brassicaceae, including B. napus, B. rapa, B. oleracea and the model plant Arabidopsis thaliana. Over the 200 bp upstream (5′) of the ATG codon, the Bna.BAN promoters (ProBna.BAN) were conserved with the AtANR promoter and contained putative cis-acting elements. In addition, transgenic Arabidopsis and oilseed rape plants carrying the first 230 bp of ProBna.BAN fused to the UidA reporter gene were generated. In the two Brassicaceae backgrounds, ProBna.BAN activity was restricted to the seed coat. In B. napus seed, ProBna.BAN was activated in procyanidin-accumulating cells, namely the innermost layer of the inner integument and the micropyle-chalaza area. At the transcriptional level, the four Bna.BAN genes were expressed in the seed. Laser microdissection assays of the seed integuments showed that Bna.BAN expression was restricted to the inner integument, which was consistent with the activation profile of ProBna.BAN. Finally, Bna.BAN genes were mapped onto oilseed rape genetic maps and potential co-localisations with seed colour quantitative trait loci are discussed.
Anthocyanidin reductase; BANYULS genes; Brassica; Flavonoid metabolism; Seed coat-specific promoter
The Brassicaceae family is an exemplary model for studying plant polyploidy. The Brassicaceae knowledge-base includes the well-annotated Arabidopsis thaliana reference sequence; well-established evidence for three rounds of whole genome duplication (WGD); and the conservation of genomic structure, with 24 conserved genomic blocks (GBs). The recently released Brassica rapa draft genome provides an ideal opportunity to update our knowledge of the conserved genomic structures in Brassica, and to study evolutionary innovations of the mesohexaploid plant, B. rapa.
Three chronological B. rapa genomes (recent, young, and old) were reconstructed with sequence divergences, revealing a trace of recursive WGD events. A total of 636 fast-evolving genes were unevenly distributed throughout the recent and young genomes. The representative Gene Ontology (GO) terms for these genes were ‘stress response’ and ‘development’, both through a change in protein modification or signaling rather than by enhancing signal recognition. In the retention pattern analysis, 98% of B. rapa genes were retained as collinear gene pairs; 77% of those were singly retained in recent or young genomes resulting from death of the ancestral copies, while others were multi-retained as long retention genes. GO enrichments indicated that single-retention genes mainly function in the interpretation of genetic information, whereas multi-retention genes were biased toward signal response, especially regarding development and defense. In the recent genome, 13,302, 5,790, and 20 gene pairs were multi-retained following Brassica whole genome triplication (WGT) events with 2, 3, and 4 homoeologous copies, respectively. Enriched GO-slim terms from B. rapa homoeologues imply that a major effect of the B. rapa WGT may have been to acquire environmental adaptability or to change the course of development. These homoeologues seem to undergo subfunctionalization with spatial expression patterns more frequently than other possible events, including nonfunctionalization and neofunctionalization.
We refined Brassicaceae GB information using the latest genomic resources, and distinguished three chronologically ordered B. rapa genomes. B. rapa genes were categorized into fast evolving, single- and multi-retention genes, and long retention genes by their substitution rates and retention patterns. Representative functions of the categorized genes were elucidated, providing better understanding of B. rapa evolution and the Brassica genus.
Electronic supplementary material
The online version of this article (doi:10.1186/1471-2164-15-606) contains supplementary material, which is available to authorized users.
Brassica rapa; Chronological genomes; Fast-evolving genes; Single-retention genes; Multi-retention genes
The Multinational Brassica rapa Genome Sequencing Project (BrGSP) has developed valuable genomic resources, including BAC libraries, BAC-end sequences, genetic and physical maps, and seed BAC sequences for Brassica rapa. An integrated linkage map between the amphidiploid B. napus and diploid B. rapa will facilitate the rapid transfer of these valuable resources from B. rapa to B. napus (Oilseed rape, Canola).
In this study, we identified over 23,000 simple sequence repeats (SSRs) from 536 sequenced BACs. A total of 890 SSR markers (designated as BrGMS) were developed and used for the construction of an integrated linkage map for the A genome in B. rapa and B. napus. Two hundred and nineteen BrGMS markers were integrated into an existing B. napus linkage map (BnaNZDH). Among these mapped BrGMS markers, 168 were distributed only on the A genome linkage groups (LGs), 18 were distributed on both the A and C genome LGs, and 33 were distributed only on the C genome LGs. Most of the A genome LGs in B. napus were collinear with the homoeologous LGs in B. rapa, although minor inversions or rearrangements occurred on A2 and A9. The mapping of these BAC-specific SSR markers enabled assignment of 161 sequenced B. rapa BACs, as well as the associated BAC contigs, to the A genome LGs of B. napus.
The genetic mapping of SSR markers derived from sequenced BACs in B. rapa enabled direct links to be established between the B. napus linkage map and a B. rapa physical map, and thus the assignment of B. rapa BACs and the associated BAC contigs to the B. napus linkage map. This integrated genetic linkage map will facilitate exploitation of the B. rapa annotated genomic resources for gene tagging and map-based cloning in B. napus, and for comparative analysis of the A genome within Brassica species.
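SSR identification of the kind described above is commonly done by scanning sequence for short motifs repeated in tandem. The Python sketch below illustrates the idea with a regular expression. It is not the pipeline used in the study: the motif length range (2 to 6 bp), the minimum repeat count and the helper name `find_ssrs` are all assumptions made for illustration, and only perfect repeats are reported.

```python
import re

def find_ssrs(seq, min_motif=2, max_motif=6, min_repeats=4):
    """Return (start, motif, repeat_count) for perfect tandem repeats in seq.

    All thresholds are illustrative assumptions, not values from the study.
    """
    seq = seq.upper()
    # ([ACGT]{m,n}?) lazily captures the shortest candidate motif;
    # \1{k,} then requires at least k further copies of that motif.
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_motif, max_motif, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(seq):
        motif = match.group(1)
        # Skip motifs made of a single base (e.g. "AA"), which are homopolymer runs.
        if len(set(motif)) == 1:
            continue
        hits.append((match.start(), motif, len(match.group(0)) // len(motif)))
    return hits

# Toy sequence containing (AG)5 and (CGG)4; not real B. rapa BAC sequence.
example = "TTGAC" + "AG" * 5 + "CCT" + "CGG" * 4 + "TA"
print(find_ssrs(example))  # → [(5, 'AG', 5), (18, 'CGG', 4)]
```

Start positions are 0-based. Turning such hits into markers would additionally require designing primers in the flanking sequence, which this sketch does not attempt.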
Brassica rapa includes several important leaf vegetable crops whose production is often damaged by high temperature. Cis-natural antisense transcripts (cis-NATs) and cis-NATs-derived small interfering RNAs (nat-siRNAs) play important roles in plant development and stress responses. However, genome-wide cis-NATs in B. rapa are not known. The NATs and nat-siRNAs that respond to heat stress have never been well studied in B. rapa. Here, we took advantage of RNA-seq and small RNA (sRNA) deep sequencing technology to identify cis-NATs and heat responsive nat-siRNAs in B. rapa.
Analyses of four RNA sequencing datasets revealed 1031 cis-NATs in B. rapa ssp. chinensis cv. Wut and B. rapa ssp. pekinensis cv. Bre. Based on sequence homology between Arabidopsis thaliana and B. rapa, 303 conserved cis-NATs in B. rapa were found to correspond to 280 cis-NATs in Arabidopsis; the remaining 728 novel cis-NATs were identified as Brassica-specific. Using six sRNA libraries, 4846 nat-siRNAs derived from 150 cis-NATs were detected. Differential expression analysis revealed that nat-siRNAs derived from 12 cis-NATs were responsive to heat stress, and most of them showed strand bias. Real-time PCR indicated that most of the transcripts generating heat-responsive nat-siRNAs were upregulated under heat stress, while the transcripts from the opposite strands of the same loci were downregulated.
Our results provide the first subsets of genome-wide cis-NATs and heat-responsive nat-siRNAs in B. rapa; these sRNAs are potentially useful for the genetic improvement of heat tolerance in B. rapa and other crops.
cis-NATs; nat-siRNAs; Heat response; Genomic comparison; Brassica rapa
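The counts reported for the cis-NAT analysis can be cross-checked with a few lines of arithmetic. The Python sketch below uses only the numbers quoted in the abstract; the variable names and the `percent` helper are illustrative and not part of the study.

```python
# Counts quoted in the abstract; names are illustrative.
total_cis_nats = 1031
conserved = 303          # correspond to Arabidopsis cis-NATs
novel = 728              # reported as Brassica-specific
sirna_producing = 150    # cis-NATs from which nat-siRNAs were detected
heat_responsive = 12     # cis-NATs whose nat-siRNAs responded to heat stress

# The conserved and novel classes should partition the full set.
assert conserved + novel == total_cis_nats

def percent(part, whole):
    """Percentage of whole represented by part, rounded to one decimal."""
    return round(100.0 * part / whole, 1)

summary = {
    "conserved": percent(conserved, total_cis_nats),
    "novel": percent(novel, total_cis_nats),
    "sirna_producing": percent(sirna_producing, total_cis_nats),
    "heat_responsive_of_sirna_producing": percent(heat_responsive, sirna_producing),
}
print(summary)
```

On these figures, roughly 29% of the cis-NATs are conserved with Arabidopsis and about 71% are novel, while 8% of the nat-siRNA-producing cis-NATs are heat responsive.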