Carotenoids are isoprenoid compounds synthesized by all photosynthetic organisms. Despite much research on carotenoid biosynthesis in the model plant Arabidopsis thaliana, there is a lack of information on the carotenoid pathway in Brassica rapa. To better understand its carotenoid biosynthetic pathway, we performed a systematic analysis of carotenoid biosynthetic genes at the genome level in B. rapa.
We identified 67 carotenoid biosynthetic genes in B. rapa, which were orthologs of the 47 carotenoid genes in A. thaliana. A high level of synteny was observed for carotenoid biosynthetic genes between A. thaliana and B. rapa. Out of 47 carotenoid biosynthetic genes in A. thaliana, 46 were successfully mapped to the 10 B. rapa chromosomes, and most of the genes retained more than one copy in B. rapa. The gene expansion was caused by the whole-genome triplication (WGT) event experienced by Brassica species. An expression analysis of the carotenoid biosynthetic genes suggested that their expression levels differed in root, stem, leaf, flower, callus, and silique tissues. Additionally, the paralogs of each carotenoid biosynthetic gene, which were generated from the WGT in B. rapa, showed significantly different expression levels among tissues, suggesting differentiated functions for these multi-copy genes in the carotenoid pathway.
This first systematic study of carotenoid biosynthetic genes in B. rapa provides insights into the carotenoid metabolic mechanisms of Brassica crops. In addition, a better understanding of carotenoid biosynthetic genes in B. rapa will contribute to the development of conventional and transgenic B. rapa cultivars with enriched carotenoid levels in the future.
Electronic supplementary material
The online version of this article (doi:10.1186/s12864-015-1655-5) contains supplementary material, which is available to authorized users.
Biosynthetic pathway; Carotenoid biosynthetic genes; Comparative genomics; Expression analysis; Brassica rapa
This data article reports the establishment of the first pan-transcriptome resources for the Brassica A and C genomes. These were developed using existing coding DNA sequence (CDS) gene models from the now-published Brassica oleracea TO1000 and Brassica napus Darmor-bzh genome sequence assemblies representing the chromosomes of these species, along with preliminary CDS models from an updated Brassica rapa Chiifu genome sequence assembly. The B. rapa genome sequence scaffolds required splitting and re-ordering to match the expected genome organisation based on a high density SNP linkage map, but the B. oleracea assembly was used unchanged. The resulting B. rapa (A genome) pseudomolecules contained 47,656 ordered CDS models and the B. oleracea (C genome) pseudomolecules contained 54,766 ordered CDS models. Interpolation of B. napus CDS models not already represented by orthologues resulted in 52,790 and 63,308 ordered CDS models in the A and C pan-transcriptomes, an increase of 13,676 overall. Comparison of the organisation of this resource with publicly available genome sequences for B. napus showed excellent consistency for the B. napus Darmor-bzh resource, but more breakdown of collinearity for the B. napus ZS11 resource. CDS datasets comprising the pan-transcriptomes are available with this article (B. rapa) or from public repositories (B. oleracea and B. napus).
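The CDS model counts quoted above can be cross-checked with a few lines of arithmetic. The sketch below uses only the numbers stated in the abstract; the variable names are illustrative and not part of the published resource.

```python
# CDS model counts quoted in the abstract above.
base_a, base_c = 47_656, 54_766   # ordered CDS models: B. rapa (A) and B. oleracea (C) pseudomolecules
pan_a, pan_c = 52_790, 63_308     # ordered CDS models: A and C pan-transcriptomes

# Models added by interpolating B. napus CDS models lacking an orthologue.
added_a = pan_a - base_a
added_c = pan_c - base_c
added_total = added_a + added_c

print(f"A genome: +{added_a}, C genome: +{added_c}, overall: +{added_total}")
```

The overall difference reproduces the reported increase of 13,676 models.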
The tapetum plays an important role in anther development by providing necessary enzymes and nutrients for pollen development. However, it is difficult to identify tapetum-specific genes on a large scale because of the difficulty of separating tapetum cells from other anther tissues. Here, we report the identification of tapetum-specific genes by comparing the gene expression patterns of four male sterile (MS) lines of Brassica oleracea. The abortive phenotypes of the four MS lines revealed different defects in tapetum and pollen development but normal anther wall development when observed by transmission electron microscopy. The tapeta of these lines displayed continuous defective characteristics throughout the anther developmental stages. The transcriptome from flower buds, covering all anther developmental stages, was analyzed, and bioinformatics analyses exploring tapetum development-related genes were performed. We identified 1,005 genes differentially expressed in at least one of the MS lines, of which 104 were non-pollen expressed genes (NPGs). Most of the identified NPGs were considered tapetum-specific genes, given that anther walls developed normally in all four MS lines. Among the 104 NPGs, 22 genes were previously reported as being involved in tapetum development. We further separated the expressed NPGs into different developmental stages based on the MS defects. The data obtained in this study are not only informative for research on tapetum development in B. oleracea, but are also useful for genetic pathway research in other related species.
Electronic supplementary material
The online version of this article (doi:10.1007/s11103-015-0287-0) contains supplementary material, which is available to authorized users.
Brassica oleracea; Tapetum; Gene expression; Male sterility (MS); Microarray
Methylthioalkylmalate synthases (MAMs) encoded by MAM genes are central to the diversification of the glucosinolates, which are important secondary metabolites in Brassicaceae species. However, the evolutionary pathway of MAM genes is poorly understood. We analyzed the phylogenetic and synteny relationships of MAM genes from 13 sequenced Brassicaceae species. Based on these analyses, we propose that the syntenic loci of MAM genes, which underwent frequent tandem duplications, divided into two independent lineage-specific evolution routes and were driven by positive selection after the divergence from Aethionema arabicum. In the lineage I species Capsella rubella, Camelina sativa, Arabidopsis lyrata, and A. thaliana, the MAM loci evolved three tandem genes encoding enzymes responsible for the biosynthesis of aliphatic glucosinolates with different carbon chain-lengths. In lineage II species, the MAM loci encode enzymes responsible for the biosynthesis of short-chain aliphatic glucosinolates. Our proposed model of the evolutionary pathway of MAM genes will be useful for understanding the specific function of these genes in Brassicaceae species.
glucosinolates; MAM genes; syntenic; evolution; Brassicaceae
Cabbage Fusarium wilt is a major disease worldwide that can cause severe yield loss in cabbage (Brassica oleracea). Although markers linked to the resistance gene FOC1 have been identified, no candidate gene for it has been determined so far. In this study, we report the fine mapping and analysis of a candidate gene for FOC1 using a doubled haploid (DH) population with 160 lines and an F2 population of 4000 individuals derived from the same parental lines.
We confirmed that the resistance to Fusarium wilt was controlled by a single dominant gene based on the resistance segregation ratio of the two populations. Using InDel primers designed from whole-genome re-sequencing data for the two parental lines (the resistant inbred-line 99–77 and the highly susceptible line 99–91) and the DH population, we mapped the resistance gene to a 382-kb genomic region on chromosome C06. Using the F2 population, we narrowed the region to an 84-kb interval that harbored ten genes, including four probable resistance genes (R genes): Bol037156, Bol037157, Bol037158 and Bol037161 according to the gene annotations from BRAD, the genomic database for B. oleracea. After correcting the models of these genes, we re-predicted two R genes in the target region: re-Bol037156 and re-Bol0371578. The latter was excluded after we compared the two genes’ sequences between ten resistant materials and ten susceptible materials. For re-Bol037156, we found high identity among the sequences of the resistant lines, while among the susceptible lines, there were two types of InDels (a 1-bp insertion and a 10-bp deletion), each of which caused a frameshift and a premature termination mutation in the cDNA sequences. Further sequence analysis of the two InDel loci from 80 lines (40 resistant and 40 susceptible) also showed that none of the 40 resistant lines carried an InDel mutation, while 39 of the 40 susceptible lines carried one of the two InDel types. Thus re-Bol037156 was identified as a likely candidate gene for FOC1 in cabbage.
This work may lay the foundation for marker-assisted selection as well as for further functional analysis of the FOC1 gene.
Brassica oleracea; Fusarium wilt; Resistance gene; FOC1; Map-based cloning
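The single-dominant-gene inference in the FOC1 abstract above rests on segregation ratios: a dominant gene is expected to segregate 3:1 (resistant:susceptible) in an F2 population and 1:1 in a DH population. The sketch below shows a chi-square goodness-of-fit check under those expectations. The abstract reports only the population sizes (4000 F2 individuals, 160 DH lines), so the phenotype counts used here are hypothetical placeholders, not the study's data.

```python
def chi_square_gof(observed, ratio):
    """Chi-square goodness-of-fit statistic for observed counts against an expected ratio."""
    total = sum(observed)
    ratio_sum = sum(ratio)
    expected = [total * r / ratio_sum for r in ratio]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Critical value of the chi-square distribution with 1 degree of freedom at alpha = 0.05.
CRITICAL_DF1_P05 = 3.841

# HYPOTHETICAL counts (resistant, susceptible); only the totals come from the abstract.
f2_counts = (3010, 990)   # 4000 F2 individuals, expected 3:1
dh_counts = (82, 78)      # 160 DH lines, expected 1:1

chi_f2 = chi_square_gof(f2_counts, (3, 1))
chi_dh = chi_square_gof(dh_counts, (1, 1))

for name, chi in (("F2 (3:1)", chi_f2), ("DH (1:1)", chi_dh)):
    verdict = "consistent with" if chi < CRITICAL_DF1_P05 else "deviates from"
    print(f"{name}: chi-square = {chi:.3f}, {verdict} the expected ratio")
```

A statistic below the critical value means the observed counts do not deviate significantly from the ratio expected for a single dominant gene.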
Brassica rapa displays enormous morphological diversity, with leafy vegetables, turnips and oil crops. Turnips (Brassica rapa subsp. rapa) represent one of the morphotypes, which form tubers and can be used to study the genetics underlying storage organ formation. In the present study we investigated several characteristics of an extensive turnip collection comprising 56 accessions from both Asia (mainly Japanese origin) and Europe. Population structure was calculated using data from 280 evenly distributed SNP markers over 56 turnip accessions. We studied the anatomy of turnip tubers and measured carbohydrate composition of the mature turnip tubers of a subset of the collection. The variation in 16 leaf traits, 12 tuber traits and flowering time was evaluated in five independent experiments for the entire collection. The effect of vernalization on flowering and tuber formation was also investigated. SNP marker profiling basically divided the turnip accessions into two subpopulations, with admixture, generally corresponding with geographical origin (Europe or Asia). The enlarged turnip tuber consists of both hypocotyl and root tissue, but the proportion of the two tissues differs between accessions. The ratio of sucrose to fructose and glucose differed among accessions, while generally starch content was low. The evaluated traits segregated in both subpopulations, with leaf shape, tuber colour and number of shoots per tuber explaining most variation between the two subpopulations. Vernalization resulted in reduced flowering time and smaller tubers for the Asian turnips whereas the European turnips were less affected by vernalization.
Kaposi’s sarcoma-associated herpesvirus (KSHV) is the causal agent of all forms of Kaposi’s sarcoma (KS), including AIDS-KS, endemic KS, classic KS and iatrogenic KS. Based on open reading frame (ORF) K1 sequence analysis, KSHV has been classified into seven major molecular subtypes (A, B, C, D, E, F and Z). The distribution of KSHV strains varies according to geography and ethnicity. Xinjiang is a unique region where the seroprevalence of KSHV is significantly higher than in other parts of China. The genotyping of KSHV strains in this region has not been thoroughly studied. The present study aimed to evaluate the frequency of KSHV genotypes isolated from KS tissues in Classical KS and AIDS KS patients from Xinjiang, China. ORF-K1 of KSHV from tissue samples of 28 KS patients was amplified and sequenced. Two subtypes of KSHV were identified according to K1 genotyping. Twenty-three of the strains belonged to subtype A, while five were subtype C. More genotype A than genotype C strains were found in both Classical KS and AIDS KS. No significant difference was found in the prevalence of the different genotypes between Classical KS and AIDS KS.
Kaposi’s sarcoma-associated herpesvirus (KSHV); genotyping; K1 gene; Xinjiang
Studies of metabolic variation in Brassica rapa have largely focused on profiling the diversity of metabolic compounds in specific crop types or regional varieties, but none has aimed to identify genes with regulatory function in metabolite composition. Here we followed a genetical genomics approach to identify regulatory genes for six biosynthetic pathways of health-related phytochemicals, i.e. carotenoids, tocopherols, folates, glucosinolates, flavonoids and phenylpropanoids. Leaves from six-week-old plants of a Brassica rapa doubled haploid population, consisting of 92 genotypes, were profiled for their secondary metabolite composition, using both targeted and LC-MS-based untargeted metabolomics approaches. Furthermore, the same population was profiled for transcript variation using a microarray containing EST sequences mainly derived from three Brassica species: B. napus, B. rapa and B. oleracea. The biochemical pathway analysis was based on the network analyses of both metabolite QTLs (mQTLs) and transcript QTLs (eQTLs). Co-localization of mQTLs and eQTLs led to the identification of candidate regulatory genes involved in the biosynthesis of carotenoids, tocopherols and glucosinolates. We subsequently focused on the well-characterized glucosinolate pathway and revealed two hotspots of co-localization of eQTLs with mQTLs in linkage groups A03 and A09. Our results indicate that such a large-scale genetical genomics approach combining transcriptomics and metabolomics data can provide new insights into the genetic regulation of metabolite composition of Brassica vegetables.
Linkage maps enable the study of important biological questions. The construction of high-density linkage maps appears more feasible since the advent of next-generation sequencing (NGS), which eases SNP discovery and high-throughput genotyping of large populations. However, the marker number explosion and genotyping errors from NGS data challenge the computational efficiency and linkage map quality of linkage study methods. Here we report the HighMap method for constructing high-density linkage maps from NGS data. HighMap employs an iterative ordering and error correction strategy based on a k-nearest neighbor algorithm and a Monte Carlo multipoint maximum likelihood algorithm. A simulation study shows that HighMap can create a linkage map with three times as many markers as ordering-only methods while offering more accurate marker orders and stable genetic distances. Using HighMap, we constructed a common carp linkage map with 10,004 markers. The singleton rate was less than one-ninth of that generated by JoinMap4.1. Its total map distance was 5,908 cM, consistent with reports on low-density maps. HighMap is an efficient method for constructing high-density, high-quality linkage maps from high-throughput population NGS data. It will facilitate genome assembly, comparative genomic analysis, and QTL studies. HighMap is available at http://highmap.biomarker.com.cn/.
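The singleton rate mentioned above can be illustrated with a short sketch. It assumes a common working definition, which may differ from the one HighMap uses: a singleton is a genotype call that differs from both flanking markers while those two flanking markers agree with each other. The function name, the `-` code for missing data and the toy genotypes are all illustrative assumptions.

```python
def singleton_rate(genotypes, missing="-"):
    """Fraction of interior genotype calls that disagree with two concordant flanking calls.

    genotypes: one string per individual, one character per marker, in map order.
    """
    singletons = 0
    scored = 0
    for calls in genotypes:
        for i in range(1, len(calls) - 1):
            left, mid, right = calls[i - 1], calls[i], calls[i + 1]
            if missing in (left, mid, right):
                continue  # skip positions with missing data
            scored += 1
            if left == right and mid != left:
                singletons += 1
    return singletons / scored if scored else 0.0

# Toy data (hypothetical): three individuals scored at seven ordered markers.
toy = ["AAABAAA", "BBBBBBB", "AABBBAA"]
rate = singleton_rate(toy)
print(f"singleton rate = {rate:.3f}")
```

In the toy data only the isolated `B` in the first individual counts as a singleton; the block of three `B` calls in the third individual is treated as a genuine double recombinant.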
Anthocyanins are a group of flavonoid compounds. As a group of important secondary metabolites, they perform several key biological functions in plants. Anthocyanins also play beneficial health roles as potentially protective factors against cancer and heart disease. To elucidate the anthocyanin biosynthetic pathway in Brassica rapa, we conducted comparative genomic analyses between Arabidopsis thaliana and B. rapa on a genome-wide level.
In total, we identified 73 genes in B. rapa as orthologs of 41 anthocyanin biosynthetic genes in A. thaliana. In B. rapa, the anthocyanin biosynthetic genes (ABGs) have expanded and most genes exist in more than one copy. The anthocyanin biosynthetic structural genes have expanded through whole genome and tandem duplication in B. rapa. More structural genes located upstream of the anthocyanin biosynthetic pathway have been retained than downstream. More negative regulatory genes are retained in the anthocyanin biosynthesis regulatory system of B. rapa.
These results will promote an understanding of the genetic mechanism of anthocyanin biosynthesis, as well as help the improvement of the nutritional quality of B. rapa through the breeding of high anthocyanin content varieties.
Electronic supplementary material
The online version of this article (doi: 10.1186/1471-2164-15-426) contains supplementary material, which is available to authorized users.
Comparative genomics; Anthocyanin biosynthetic genes; Whole genome duplication; Brassica rapa; Cruciferae
Increasing evidence has revealed that humid heat stress (HHS) causes considerable damage to human health. The cardiovascular system has been suggested to be the primary target of heat stress, which results in serious cardiovascular diseases. However, there is still a lack of effective approaches for the prevention and treatment of cardiovascular diseases induced by HHS.
Heat-shock proteins (Hsps), especially Hsp70, are reported to provide effective cytoprotection under various stress stimuli. In the present study, we evaluated the cytoprotective effect of geranylgeranylacetone (GGA), which has previously been reported to induce Hsp70 expression in cardiomyocytes under HHS.
Methods and Principal Findings
Using a mouse model of HHS, we showed that pretreatment with GGA enhanced Hsp70 expression under HHS, as examined by quantitative real-time polymerase chain reaction (qRT-PCR) and Western blot. We then examined the effect of GGA pretreatment on the cardiomyocyte apoptosis induced by HHS using terminal deoxynucleotidyl transferase-mediated nick end labeling (TUNEL) staining, and found that GGA pretreatment inhibited mitochondria-mediated apoptosis. GGA pretreatment could reverse the effect of HHS on cell apoptosis by increasing expression of Bcl-2, decreasing cytochrome c in cytosol, and increasing cytochrome c in mitochondria. However, GGA pretreatment had no effect on the oxidative stress induced by HHS as determined by levels of superoxide dismutase (SOD), malondialdehyde (MDA), and glutathione (GSH).
We have demonstrated that GGA pretreatment suppressed HHS-induced apoptosis of cardiomyocytes through the induction of Hsp70 overexpression.
Brassica rapa is an economically important crop species. During its long breeding history, a large number of morphotypes have been generated, including leafy vegetables such as Chinese cabbage and pakchoi, turnip tuber crops and oil crops.
To investigate the genetic variation underlying this morphological variation, we re-sequenced, assembled and annotated the genomes of two B. rapa genotypes: a turnip and a rapid cycling line. We then analysed the two resulting genomes together with the Chinese cabbage Chiifu reference genome to obtain an impression of the B. rapa pan-genome. The number of genes with protein-coding changes between the three genotypes was lower than that among different accessions of Arabidopsis thaliana, which can be explained by the smaller effective population size of B. rapa due to its domestication. Based on orthology to a number of non-brassica species, we estimated the date of divergence among the three B. rapa morphotypes at approximately 250,000 YA, far predating Brassica domestication (5,000-10,000 YA).
By analysing genes unique to turnip we found evidence for copy number differences in peroxidases, pointing to a role for the phenylpropanoid biosynthesis pathway in the generation of morphological variation. The estimated date of divergence among the three B. rapa morphotypes implies that prior to domestication there was already considerable divergence among B. rapa genotypes. Our study thus provides two new B. rapa reference genomes, delivers a set of computer tools to analyse the resulting pan-genome and uses these to shed light on genetic drivers behind the rich morphological variation found in B. rapa.
Electronic supplementary material
The online version of this article (doi:10.1186/1471-2164-15-250) contains supplementary material, which is available to authorized users.
The species Brassica rapa (2n=20, AA) is an important vegetable and oilseed crop, and serves as an excellent model for genomic and evolutionary research in Brassica species. With the availability of the whole genome sequence of B. rapa, it is essential to further determine the activity of all functional elements of the B. rapa genome and explore the transcriptome on a genome-wide scale. Here, RNA-seq data were employed to provide a genome-wide transcriptional landscape and to characterize the annotated and novel transcripts and alternative splicing events across tissues.
RNA-seq reads were generated using the Illumina platform from six different tissues (root, stem, leaf, flower, silique and callus) of the B. rapa accession Chiifu-401-42, the same line used for whole genome sequencing. First, these data detected the widespread transcription of the B. rapa genome, leading to the identification of numerous novel transcripts and definition of 5'/3' UTRs of known genes. Second, 78.8% of the total annotated genes were detected as expressed and 45.8% were constitutively expressed across all tissues. We further defined several groups of genes: housekeeping genes, tissue-specific expressed genes and co-expressed genes across tissues, which will serve as a valuable repository for future crop functional genomics research. Third, alternative splicing (AS) is estimated to occur in more than 29.4% of intron-containing B. rapa genes, and 65% of them were commonly detected in more than two tissues. Interestingly, genes with a high rate of AS were over-represented in GO categories relating to transcriptional regulation and signal transduction, suggesting that AS may play an important regulatory role in these genes. Further, we observed that intron retention (IR) is predominant among the AS events and seems to occur preferentially in genes with short introns.
The high-resolution RNA-seq analysis provides a global transcriptional landscape as a complement to the B. rapa genome sequence, which will advance our understanding of the dynamics and complexity of the B. rapa transcriptome. The atlas of gene expression in different tissues will be useful for accelerating research on functional genomics and genome evolution in Brassica species.
Brassica rapa; RNA-seq; Alternative splicing; Transcriptome
Brassica oleracea is a morphologically diverse species in the family Brassicaceae and contains a group of nutrition-rich vegetable crops, including common heading cabbage, cauliflower, broccoli, kohlrabi, kale, and Brussels sprouts. This diversity, along with its phylogenetic membership in a group of three diploid and three tetraploid species and the recent availability of genome sequences within Brassica, provides an unprecedented opportunity to study intra- and inter-species divergence and evolution in this species and its close relatives.
We have developed a comprehensive database, Bolbase, which provides access to the B. oleracea genome data and comparative genomics information. The whole genome of B. oleracea is available, including nine fully assembled chromosomes and 1,848 scaffolds, with 45,758 predicted genes, 13,382 transposable elements, and 3,581 non-coding RNAs. Comparative genomics information is available, including syntenic regions among B. oleracea, Brassica rapa and Arabidopsis thaliana, synonymous (Ks) and non-synonymous (Ka) substitution rates between orthologous gene pairs, gene families or clusters, and differences in quantity, category, and distribution of transposable elements on chromosomes. Bolbase provides useful search and data mining tools, including a keyword search, a local BLAST server, and a customized GBrowse tool, which can be used to extract annotations of genome components, identify similar sequences and visualize syntenic regions among species. Users can download all genomic data and explore comparative genomics in a highly visual setting.
Bolbase is the first resource platform for the B. oleracea genome and for genomic comparisons with its relatives, and thus it will help the research community to better study the function and evolution of Brassica genomes as well as enhance molecular breeding research. This database will be updated regularly with new features, improvements to genome annotation, and new genomic sequences as they become available. Bolbase is freely available at http://ocri-genomics.org/bolbase.
Brassica oleracea; Database; Genome sequence; Synteny; Comparative genomics
Large-scale genotyping plays an important role in genetic association studies. It has provided new opportunities for gene discovery, especially when combined with high-throughput sequencing technologies. Here, we report an efficient solution for large-scale genotyping. We call it specific-locus amplified fragment sequencing (SLAF-seq). SLAF-seq technology has several distinguishing characteristics: i) deep sequencing to ensure genotyping accuracy; ii) reduced representation strategy to reduce sequencing costs; iii) pre-designed reduced representation scheme to optimize marker efficiency; and iv) double barcode system for large populations. In this study, we tested the efficiency of SLAF-seq on rice and soybean data. Both sets of results showed strong consistency between predicted and practical SLAFs and considerable genotyping accuracy. We also report the highest density genetic map yet created for any organism without a reference genome sequence, common carp in this case, using SLAF-seq data. We detected 50,530 high-quality SLAFs with 13,291 SNPs genotyped in 211 individual carp. The genetic map contained 5,885 markers with 0.68 cM intervals on average. A comparative genomics study between common carp genetic map and zebrafish genome sequence map showed high-quality SLAF-seq genotyping results. SLAF-seq provides a high-resolution strategy for large-scale genotyping and can be generally applicable to various species and populations.
Chromoplasts are unique plastids that accumulate massive amounts of carotenoids. To gain a general and comparative characterization of chromoplast proteins, this study performed proteomic analysis of chromoplasts from six carotenoid-rich crops: watermelon, tomato, carrot, orange cauliflower, red papaya, and red bell pepper. Stromal and membrane proteins of chromoplasts were separated by 1D gel electrophoresis and analysed using nLC-MS/MS. A total of 953–2262 proteins from chromoplasts of different crop species were identified. Approximately 60% of the identified proteins were predicted to be plastid localized. Functional classification using MapMan bins revealed large numbers of proteins involved in protein metabolism, transport, amino acid metabolism, lipid metabolism, and redox in chromoplasts from all six species. Seventeen core carotenoid metabolic enzymes were identified. Phytoene synthase, phytoene desaturase, ζ-carotene desaturase, 9-cis-epoxycarotenoid dioxygenase, and carotenoid cleavage dioxygenase 1 were found in almost all crops, suggesting their relative abundance among the carotenoid pathway enzymes. Chromoplasts from different crops contained abundant amounts of ATP synthase and adenine nucleotide translocator, which indicates an important role of ATP production and transport in chromoplast development. Distinctive abundant proteins were observed in chromoplasts from different crops, including capsanthin/capsorubin synthase and fibrillins in pepper, superoxide dismutase in watermelon, carrot, and cauliflower, and glutathione S-transferase in papaya. The comparative analysis of chromoplast proteins among six crop species offers new insights into the general metabolism and function of chromoplasts as well as the uniqueness of chromoplasts in specific crop species. This work provides reference datasets for future experimental study of chromoplast biogenesis, development, and regulation in plants.
Carrot; cauliflower; chromoplast; papaya; pepper; proteomics; tomato; watermelon
Flowering time is an important trait in Brassica rapa crops. FLOWERING LOCUS C (FLC) is a MADS-box transcription factor that acts as a potent repressor of flowering. Expression of FLC is silenced when plants are exposed to low temperature, which activates flowering. There are four copies of FLC in B. rapa. Analyses of different segregating populations have suggested that BraA.FLC.a (BrFLC1) and BraA.FLC.b (BrFLC2) play major roles in controlling flowering time in B. rapa.
We analyzed the BrFLC2 sequence in nine B. rapa accessions, and identified a 57-bp insertion/deletion (InDel) across exon 4 and intron 4 resulting in a non-functional allele. In total, three types of transcripts were identified for this mutated BrFLC2 allele. The InDel was used to develop a PCR-based marker, which was used to screen a collection of 159 B. rapa accessions. The deletion genotype was present only in oil-type B. rapa, including ssp. oleifera and ssp. trilocularis, and not in other subspecies. The deletion genotype was significantly correlated with variation in flowering time. In contrast, the reported splicing site variation in BrFLC1, which also leads to a non-functional locus, was detected but not correlated with variation in flowering time in oil-type B. rapa, although it was correlated with variation in flowering time in vegetable-type B. rapa.
Our results suggest that the naturally occurring deletion mutation across exon 4 and intron 4 in BrFLC2 gene contributes greatly to variation in flowering time in oil-type B. rapa. The observed different relationship between BrFLC1 or BrFLC2 and flowering time variation indicates that the control of flowering time has evolved separately between oil-type and vegetable-type B. rapa groups.
Polyploidization, both ancient and recent, is frequent among plants. A “two-step theory” was proposed to explain the meso-triplication of the Brassica “A” genome: Brassica rapa. By accurately partitioning this genome, we observed that genes in the less fractioned subgenome (LF) were dominantly expressed over the genes in more fractioned subgenomes (MFs: MF1 and MF2), while the genes in MF1 were slightly dominantly expressed over the genes in MF2. The results indicated that the dominantly expressed genes tended to be resistant against gene fractionation. By re-sequencing two B. rapa accessions, a vegetable turnip (VT117) and a Rapid Cycling line (L144), we found that genes in LF had fewer non-synonymous or frameshift mutations than genes in MFs; however, mutation rates were not significantly different between MF1 and MF2. The differences in gene expression patterns and on-going gene death among the three subgenomes suggest that “two-step” genome triplication and differential subgenome methylation played important roles in the genome evolution of B. rapa.
The well supported gene dosage hypothesis predicts that genes encoding proteins engaged in dose–sensitive interactions cannot be reduced back to single copies once all interacting partners are simultaneously duplicated in a whole genome duplication. The genomes of extant flowering plants are the result of many sequential rounds of whole genome duplication, yet the fraction of genomes devoted to encoding complex molecular machines does not increase as fast as expected through multiple rounds of whole genome duplications. Using parallel interspecies genomic comparisons in the grasses and crucifers, we demonstrate that genes retained as duplicates following a whole genome duplication have only a 50% chance of being retained as duplicates in a second whole genome duplication. Genes which fractionated to a single copy following a second whole genome duplication tend to be the member of a gene pair with less complex promoters, lower levels of expression, and to be under lower levels of purifying selection. We suggest the copy with lower levels of expression and less purifying selection contributes less to effective gene-product dosage and therefore is under less dosage constraint in future whole genome duplications, providing an explanation for why flowering plant genomes are not overrun with subunits of large dose–sensitive protein complexes.
polyploidy; gene dosage; gene loss; genome evolution; comparative genomics; crucifers; grasses
Chromosomal synteny analysis is important in genome comparison to reveal genomic evolution of related species. Shared synteny describes genomic fragments from different species that originated from an identical ancestor. Syntenic genes are orthologs located in these syntenic fragments, so they often share similar functions. Syntenic gene analysis is very important in Brassicaceae species to share gene annotations and investigate genome evolution. Here we designed and developed a direct and efficient tool, SynOrths, to identify pairwise syntenic genes between genomes of Brassicaceae species. SynOrths determines whether two genes are a conserved syntenic pair based not only on their sequence similarity, but also by the support of homologous flanking genes. Syntenic genes between Arabidopsis thaliana and Brassica rapa, Arabidopsis lyrata and B. rapa, and Thellungiella parvula and B. rapa were then identified using SynOrths. The occurrence of genome triplication in B. rapa was clearly observed, many genes that were evenly distributed in the genomes of A. thaliana, A. lyrata, and T. parvula had three syntenic copies in B. rapa. Additionally, there were many B. rapa genes that had no syntenic orthologs in A. thaliana, but some of these had syntenic orthologs in A. lyrata or T. parvula. Only 5,851 genes in B. rapa had no syntenic counterparts in any of the other three species. These 5,851 genes could have originated after B. rapa diverged from these species. A tool for syntenic gene analysis between species of Brassicaceae was developed, SynOrths, which could be used to accurately identify syntenic genes in differentiated but closely-related genomes. With this tool, we identified syntenic gene sets between B. rapa and each of A. thaliana, A. lyrata, T. parvula. Syntenic gene analysis is important for not only the gene annotation of newly sequenced Brassicaceae genomes by bridging them to model plant A. thaliana, but also the study of genome evolution in these species.
synteny; ortholog; Brassica rapa; Arabidopsis thaliana; Arabidopsis lyrata; Thellungiella parvula; Brassicaceae
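The two-part criterion described above for SynOrths (sequence similarity of the gene pair plus support from homologous flanking genes) can be illustrated with a minimal Python sketch. This is not the SynOrths implementation: the function names, the `window` and `min_support` parameters and their default values, and the representation of homology as a set of gene-ID pairs are all assumptions made for illustration.

```python
from typing import List, Set, Tuple


def flanking(order: List[str], gene: str, window: int) -> List[str]:
    """Return up to `window` genes on each side of `gene` in an ordered gene list."""
    i = order.index(gene)
    return order[max(0, i - window):i] + order[i + 1:i + 1 + window]


def is_syntenic_pair(
    query: str,
    target: str,
    query_order: List[str],
    target_order: List[str],
    homologs: Set[Tuple[str, str]],
    window: int = 20,      # hypothetical flank size, not taken from the source
    min_support: int = 2,  # hypothetical threshold, not taken from the source
) -> bool:
    """Toy version of a 'similar gene pair with homologous neighbours' test."""
    # Criterion 1: the two genes themselves must be sequence-similar.
    if (query, target) not in homologs:
        return False
    q_flank = flanking(query_order, query, window)
    t_flank = set(flanking(target_order, target, window))
    # Criterion 2: count query-side neighbours that have a homolog
    # among the target-side neighbours.
    support = sum(1 for q in q_flank if any((q, t) in homologs for t in t_flank))
    return support >= min_support


# Toy data: a3 is similar to both b3 and x3, but only b3 sits in a
# neighbourhood whose genes are homologous to the neighbours of a3.
query_order = ["a1", "a2", "a3", "a4", "a5"]
target_order = ["b1", "b2", "b3", "b4", "b5"]
other_order = ["x1", "x2", "x3", "x4", "x5"]
homologs = {("a1", "b1"), ("a2", "b2"), ("a3", "b3"), ("a4", "b4"), ("a3", "x3")}

print(is_syntenic_pair("a3", "b3", query_order, target_order, homologs, window=2))
print(is_syntenic_pair("a3", "x3", query_order, other_order, homologs, window=2))
```

In the toy data, the pair with homologous neighbours passes and the isolated similar pair does not, which is the distinction that flanking-gene support is meant to capture.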
Whole genome duplication (WGD) and tandem duplication (TD) are both important modes of gene expansion. However, how WGD influences tandemly duplicated genes is not well studied. We used Brassica rapa, which has undergone an additional whole-genome triplication (WGT) and shares a common ancestor with Arabidopsis thaliana, Arabidopsis lyrata, and Thellungiella parvula, to investigate the impact of genome triplication on tandem gene evolution. We identified 2,137, 1,569, 1,751, and 1,135 tandem gene arrays in B. rapa, A. thaliana, A. lyrata, and T. parvula, respectively. Among them, 414 conserved tandem arrays are shared by the three species without WGT and were therefore considered to have existed in the diploid ancestor of B. rapa. If all of these had been retained after genome triplication, B. rapa would have 1,242 tandem arrays (3 × 414). We found that 400 of the 414 tandems had at least one syntenic ortholog in the B. rapa genome. Furthermore, 294 of these 400 maintained tandem arrays (more than one gene for each syntenic hit) in B. rapa. For the 294 tandem arrays, we obtained 426 copies of syntenic paralogous tandems in the triplicated genome of B. rapa. In this study, we demonstrated that tandem arrays in B. rapa were dramatically fractionated after WGT when compared either to non-tandem genes in the B. rapa genome or to the tandem arrays in closely related species that have not experienced a recent whole genome polyploidization event.
whole genome duplication; tandem duplication; tandem gene evolution; Brassica rapa; Arabidopsis thaliana; Arabidopsis lyrata; Thellungiella parvula
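The counts reported above can be turned into simple retention fractions. The short Python sketch below only recomputes ratios from the numbers stated in the abstract. The function name and dictionary keys are hypothetical, and treating 426 out of 1,242 as an overall retention fraction is an interpretation made here, not a statistic reported by the authors.

```python
def tandem_retention(
    conserved: int = 414,      # conserved tandem arrays in the three non-WGT species
    with_ortholog: int = 400,  # of those, arrays with >= 1 syntenic ortholog in B. rapa
    still_tandem: int = 294,   # of those, arrays still tandem in B. rapa
    tandem_copies: int = 426,  # syntenic paralogous tandem copies found in B. rapa
    subgenomes: int = 3,       # copies expected per array after triplication
) -> dict:
    """Recompute expected array count and retention ratios from reported counts."""
    expected = conserved * subgenomes
    return {
        "expected_arrays_after_wgt": expected,
        "frac_with_syntenic_ortholog": with_ortholog / conserved,
        "frac_still_tandem": still_tandem / with_ortholog,
        # Assumption: compare observed tandem copies with the full expectation.
        "frac_of_expected_copies": tandem_copies / expected,
    }


for name, value in tandem_retention().items():
    print(name, round(value, 3))
```

Under these assumptions, roughly a third of the tandem copies expected from a perfectly retained triplication are still present as tandems, which is in line with the abstract's conclusion that tandem arrays were strongly fractionated after WGT.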
Brassica species include both vegetable and oilseed crops that are important in everyday human life. They also represent an excellent system for studying numerous aspects of plant biology, particularly genome evolution following polyploidy, and are therefore important for scientific research. Now that the genome of Brassica rapa has been assembled, it is time to mine the genome data in depth.
BRAD, the Brassica database, is a web-based resource focusing on genome-scale genetic and genomic data for important Brassica crops. BRAD was built on the first whole-genome sequence of the Brassica A genome species, Brassica rapa (Chiifu-401-42), and on further analysis of those data. It provides datasets such as the complete genome sequence of B. rapa, which was de novo assembled from Illumina GA II short reads and BAC clone sequences; predicted genes and associated annotations; non-coding RNAs; transposable elements (TEs); B. rapa genes orthologous to those in A. thaliana; and genetic markers and linkage maps. BRAD offers searching and data-mining tools, including searches across annotation datasets, searches for syntenic or non-syntenic orthologs, and searches of the flanking regions of a given target, as well as BLAST and GBrowse. Users can query BRAD with almost any kind of information, such as a B. rapa or A. thaliana gene ID, a physical position, or a genetic marker.
BRAD, a new database focusing on the genetics and genomics of Brassica plants, has been developed. It aims to help scientists and breeders make full and efficient use of Brassica genome data. BRAD will be continuously updated and can be accessed at http://brassicadb.org.
Brassica rapa is an economically important crop and a model plant for studies concerning polyploidization and the evolution of extreme morphology. The multinational B. rapa Genome Sequencing Project (BrGSP) was launched in 2003. In 2008, next generation sequencing technology was used to sequence the B. rapa genome. Several maps concerning B. rapa pseudochromosome assembly have been published but their coverage of the genome is incomplete, anchoring approximately 73.6% of the scaffolds on to chromosomes. Therefore, a new genetic map to aid pseudochromosome assembly is required.
This study concerns the construction of a reference genetic linkage map for Brassica rapa, forming the backbone for anchoring sequence scaffolds of the B. rapa genome resulting from recent sequencing efforts. One hundred and nineteen doubled haploid (DH) lines derived from microspore cultures of an F1 cross between a Chinese cabbage (B. rapa ssp. pekinensis) DH line (Z16) and a rapid cycling inbred line (L144) were used to construct the linkage map. PCR-based insertion/deletion (InDel) markers were developed by re-sequencing the two parental lines. The map comprises a total of 507 markers including 415 InDels and 92 SSRs. Alignment and orientation using SSR markers in common with existing B. rapa linkage maps allowed ten linkage groups to be identified, designated A01-A10. The total length of the linkage map was 1234.2 cM, with an average distance of 2.43 cM between adjacent marker loci. The lengths of linkage groups ranged from 71.5 cM to 188.5 cM for A08 and A09, respectively. Using the developed linkage map, 152 scaffolds were anchored on to the chromosomes, encompassing more than 82.9% of the B. rapa genome. Taken together with the previously available linkage maps, 183 scaffolds were anchored on to the chromosomes and the total coverage of the genome was 88.9%.
The development of this linkage map is vital for the integration of genome sequences and genetic information, and provides a useful resource for the international Brassica research community.
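The reported average marker spacing can be checked against the map length and marker count given above. The sketch below is a small Python calculation; the function name and the `n_groups` parameter are hypothetical, and the abstract does not state which averaging convention was used. Dividing total length by the number of markers reproduces the reported 2.43 cM, whereas counting only the intervals within each of the ten linkage groups gives a slightly larger value.

```python
def mean_marker_spacing(total_cm: float, n_markers: int, n_groups: int = 0) -> float:
    """Average spacing in cM.

    With n_groups=0 the total length is divided by the marker count.
    With n_groups>0 it is divided by the number of within-group intervals
    (markers minus linkage groups), assuming every group holds >= 2 markers.
    """
    intervals = n_markers - n_groups
    if intervals <= 0:
        raise ValueError("need more markers than linkage groups")
    return total_cm / intervals


# Values taken from the abstract: 1234.2 cM, 507 markers, ten linkage groups.
per_marker = mean_marker_spacing(1234.2, 507)
per_interval = mean_marker_spacing(1234.2, 507, n_groups=10)
print(round(per_marker, 2), round(per_interval, 2))
```

The difference between the two conventions is small for a map of this density, so either reading supports the description of the map as having markers every 2 to 3 cM on average.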
The Cucurbitaceae includes important crops such as cucumber, melon, watermelon, squash and pumpkin. However, few genetic and genomic resources are available for plant improvement. Some cucurbit species such as cucumber have a narrow genetic base, which impedes construction of saturated molecular linkage maps. We report herein the development of highly polymorphic simple sequence repeat (SSR) markers derived from whole genome shotgun sequencing and the subsequent construction of a high-density genetic linkage map. This map includes 995 SSRs in seven linkage groups spanning a total of 573 cM, and defines ∼680 recombination breakpoints with an average of 0.58 cM between two markers. These linkage groups were then assigned to the seven corresponding chromosomes using fluorescent in situ hybridization (FISH). FISH assays also revealed a chromosomal inversion between Cucumis subspecies [C. sativus var. sativus L. and var. hardwickii (R.) Alef], which resulted in marker clustering on the genetic map. A quarter of the mapped markers showed relatively high polymorphism levels among 11 inbred lines of cucumber. Among the 995 markers, 49%, 26% and 22% were conserved in melon, watermelon and pumpkin, respectively. This map will facilitate whole genome sequencing, positional cloning, and molecular breeding in cucumber, and enable the integration of gene and trait knowledge across cucurbits.