Whole genome duplication (WGD) and tandem duplication (TD) are both important modes of gene expansion. However, how WGD influences tandemly duplicated genes has not been well studied. We used Brassica rapa, which has undergone an additional whole genome triplication (WGT) and shares a common ancestor with Arabidopsis thaliana, Arabidopsis lyrata, and Thellungiella parvula, to investigate the impact of genome triplication on tandem gene evolution. We identified 2,137, 1,569, 1,751, and 1,135 tandem gene arrays in B. rapa, A. thaliana, A. lyrata, and T. parvula, respectively. Among them, 414 conserved tandem arrays are shared by the three species without WGT and were therefore considered to have existed in the diploid ancestor of B. rapa. Thus, if all 414 conserved arrays had been retained after the triplication, B. rapa would be expected to carry 1,242 (3 × 414) of them. We found that 400 of the 414 arrays had at least one syntenic ortholog in the B. rapa genome. Furthermore, 294 of these 400 syntenic orthologs maintain tandem arrays (more than one gene for each syntenic hit) in B. rapa. For the 294 tandem arrays, we obtained 426 copies of syntenic paralogous tandems in the triplicated B. rapa genome. In this study, we demonstrated that tandem arrays in B. rapa were dramatically fractionated after WGT, compared either with non-tandem genes in the B. rapa genome or with tandem arrays in closely related species that have not experienced a recent whole genome polyploidization event.
whole genome duplication; tandem duplication; tandem gene evolution; Brassica rapa; Arabidopsis thaliana; Arabidopsis lyrata; Thellungiella parvula
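The expected-versus-observed comparison above is simple arithmetic, and the Python sketch below reproduces it from the counts in the abstract. The null model (every ancestral array triplicated, none lost) is an assumption used only for illustration; the ratios it prints are derived from the reported counts, not quoted from the paper.

```python
# Counts reported in the abstract.
CONSERVED_ARRAYS = 414   # shared by A. thaliana, A. lyrata and T. parvula
WITH_SYNTENIC_HIT = 400  # at least one syntenic ortholog in B. rapa
STILL_TANDEM = 294       # syntenic ortholog still forms a tandem array
TANDEM_COPIES = 426      # syntenic paralogous tandem copies in B. rapa

# Null model (assumption): each ancestral array is triplicated and none is lost.
WGT_FACTOR = 3


def expected_copies(n_arrays: int, factor: int = WGT_FACTOR) -> int:
    """Tandem arrays expected after the triplication if nothing were lost."""
    return n_arrays * factor


def summarize() -> dict:
    """Retention ratios implied by the reported counts."""
    expected = expected_copies(CONSERVED_ARRAYS)
    return {
        "expected_arrays": expected,
        "fraction_with_syntenic_hit": WITH_SYNTENIC_HIT / CONSERVED_ARRAYS,
        "fraction_still_tandem": STILL_TANDEM / WITH_SYNTENIC_HIT,
        "fraction_of_expected_copies": TANDEM_COPIES / expected,
        # At most 3 under the null model.
        "copies_per_retained_array": TANDEM_COPIES / STILL_TANDEM,
    }


for name, value in summarize().items():
    print(f"{name}: {value:.3f}" if isinstance(value, float) else f"{name}: {value}")
```

Under this null model, the 426 syntenic tandem copies amount to roughly 34% of the 1,242 expected, and the 294 arrays that remain tandem average about 1.45 copies rather than three.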
Gene duplication is an important mechanism for the origination of functional novelties in organisms. We performed a comparative genome analysis to systematically estimate recent lineage-specific gene duplication events in Arabidopsis thaliana and to investigate whether and how these new duplicate genes (NDGs) play a functional role in the evolution and adaptation of A. thaliana. We accomplished this using syntenic relationships among four closely related species: A. thaliana, A. lyrata, Capsella rubella, and Brassica rapa. We identified 100 NDGs with clear origination patterns, whose parental genes are located in syntenic regions and/or have clear orthologs in at least one of the three outgroup species. All 100 NDGs were transcribed and under functional constraint, while 24% of them showed expression patterns that differ from those of their parental genes. We explored the evolutionary forces acting on these paralogous pairs by conducting neutrality tests with sequence divergence and polymorphism data. The evolution of about 15% of NDGs appeared to be driven by natural selection. Moreover, 3 NDGs not only altered their expression patterns relative to their parental genes but also evolved under positive selection. We investigated the mechanisms driving the differential expression of NDGs and their parents, and found that a number of NDGs had cis-elements and methylation patterns that differ from those of their parental genes. Overall, we demonstrated that NDGs acquired divergent cis-elements and methylation patterns and may undergo subfunctionalization or neofunctionalization, influencing the evolution and adaptation of A. thaliana.
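The abstract does not name the neutrality tests used. As one standard example of a test that contrasts divergence with polymorphism, the following Python sketch implements a McDonald-Kreitman (MK) test scored with a two-sided Fisher's exact test. The function names and the input counts in the usage line are hypothetical and are not taken from the study.

```python
from math import comb


def _table_prob(a: int, row1: int, col1: int, n: int) -> float:
    """Hypergeometric probability of a 2x2 table whose top-left cell is `a`."""
    return comb(col1, a) * comb(n - col1, row1 - a) / comb(n, row1)


def fisher_exact_two_sided(table) -> float:
    """Two-sided Fisher's exact test: sum over tables no more probable than observed."""
    (a, b), (c, d) = table
    row1, col1, n = a + b, a + c, a + b + c + d
    p_obs = _table_prob(a, row1, col1, n)
    lo, hi = max(0, row1 - (n - col1)), min(row1, col1)
    p = sum(
        q for q in (_table_prob(x, row1, col1, n) for x in range(lo, hi + 1))
        if q <= p_obs * (1 + 1e-7)  # small tolerance for floating-point ties
    )
    return min(1.0, p)


def mcdonald_kreitman(dn: int, ds: int, pn: int, ps: int) -> dict:
    """MK test from fixed (D) and polymorphic (P) nonsynonymous (n) and synonymous (s) sites."""
    if dn == 0 or ds == 0 or ps == 0:
        raise ValueError("NI is undefined when Dn, Ds or Ps is zero")
    ni = (pn / ps) / (dn / ds)  # neutrality index
    return {
        "NI": ni,
        "alpha": 1 - ni,  # estimated proportion of adaptive substitutions
        "p_value": fisher_exact_two_sided([[dn, ds], [pn, ps]]),
    }


print(mcdonald_kreitman(dn=7, ds=17, pn=2, ps=42))  # illustrative counts only
```

An NI below 1 (positive alpha) indicates an excess of nonsynonymous divergence relative to polymorphism, the pattern expected under positive selection; with small counts, the p-value deserves more weight than the point estimate.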
Brassica rapa, which is closely related to Arabidopsis thaliana, is an important crop and a model plant for studying genome evolution via polyploidization. We report the current understanding of the genome structure of B. rapa and efforts toward whole-genome sequencing of the species. The tribe Brassiceae, which comprises ca. 240 species, descended from a common hexaploid ancestor with a basic genome similar to that of Arabidopsis. Chromosome rearrangements, including fusions and/or fissions, resulted in the present-day “diploid” Brassica species with variation in chromosome number and phenotype. Triplicated genomic segments of B. rapa are collinear with those of A. thaliana, albeit with insertions/deletions (InDels). The genome triplication has led to an approximately 1.7-fold increase in the B. rapa gene number compared to that of A. thaliana. Repetitive DNA of B. rapa has also been extensively amplified and has diverged from that of A. thaliana. For whole-genome sequencing, the Brassica rapa Genome Sequencing Project (BrGSP) consortium has developed suitable genomic resources and constructed genetic and physical maps. The ten chromosomes of B. rapa are being allocated to BrGSP consortium participants, and each chromosome will be sequenced by a BAC-by-BAC approach. Genome sequencing of B. rapa will offer a new perspective on plant biology and evolution in the context of polyploidization.
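A rough way to read the 1.7-fold figure is to compare it with the threefold increase expected if no triplicated copy had been lost. The sketch below assumes, for illustration only, that the A. thaliana gene count approximates the pre-triplication count. Lineage-specific losses in A. thaliana and new duplications in B. rapa both violate this assumption.

```python
def implied_retention(fold_increase: float, ploidy_factor: int = 3) -> float:
    """Fraction of triplicated gene copies still present.

    Assumes (for illustration) that the A. thaliana gene count approximates
    the ancestral, pre-triplication gene count.
    """
    if not 0 < fold_increase <= ploidy_factor:
        raise ValueError("fold increase must lie in (0, ploidy_factor]")
    return fold_increase / ploidy_factor


retained = implied_retention(1.7)
print(f"retained: {retained:.1%}, lost: {1 - retained:.1%}")  # retained: 56.7%, lost: 43.3%
```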
To identify genes responsible for varietal differences in flowering time and leaf morphological traits, we constructed a linkage map of Brassica rapa comprising 170 EST-based markers, 12 SSR markers, and 59 BAC sequence-based markers, of which 151 are single nucleotide polymorphism (SNP) markers. BLASTN searches showed that 223 markers have homologous regions in Arabidopsis thaliana, and these homologous loci covered nearly the whole A. thaliana genome. Synteny analysis between B. rapa and A. thaliana revealed 33 large syntenic regions. Three quantitative trait loci (QTLs) for flowering time were detected. BrFLC1 and BrFLC2 were linked to the QTLs for bolting time, budding time, and flowering time. We identified three SNPs in the promoter that may cause the low expression of BrFLC2 in the early-flowering parental line. For leaf lobe depth and leaf hairiness, we detected one major QTL corresponding to a syntenic region containing GIBBERELLIN 20 OXIDASE 3 and one major QTL containing BrGL1, respectively. Analysis of the nucleotide sequences and expression of these genes suggested that they may be involved in leaf morphological traits.
DNA markers; synteny; bolting time; leaf lobe; leaf hairiness
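The synteny analysis above relies on ordering BLASTN hits of mapped markers. A minimal way to group such hits into candidate syntenic regions is to walk along the B. rapa linkage map and start a new block whenever the best A. thaliana hit jumps to another chromosome or moves too far. This is a simplified heuristic, not the authors' procedure. The `max_gap_mb` and `min_markers` thresholds and the marker names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MarkerHit:
    marker: str     # B. rapa marker name (hypothetical examples below)
    br_cm: float    # position on the B. rapa linkage group (cM)
    at_chrom: str   # A. thaliana chromosome of the best BLASTN hit
    at_mb: float    # position of that hit on the A. thaliana chromosome (Mb)


def syntenic_blocks(hits, max_gap_mb=2.0, min_markers=3):
    """Group consecutive map markers whose best hits stay on one A. thaliana
    chromosome within `max_gap_mb` of each other (assumed thresholds)."""
    ordered = sorted(hits, key=lambda h: h.br_cm)
    blocks, current = [], []
    for hit in ordered:
        if current and (hit.at_chrom != current[-1].at_chrom
                        or abs(hit.at_mb - current[-1].at_mb) > max_gap_mb):
            blocks.append(current)
            current = []
        current.append(hit)
    if current:
        blocks.append(current)
    # Keep only blocks supported by enough markers.
    return [block for block in blocks if len(block) >= min_markers]


example = [
    MarkerHit("m1", 0.0, "At1", 10.0),
    MarkerHit("m2", 3.5, "At1", 10.5),
    MarkerHit("m3", 7.0, "At1", 11.2),
    MarkerHit("m4", 12.0, "At5", 3.0),
    MarkerHit("m5", 15.0, "At5", 3.4),
    MarkerHit("m6", 20.0, "At1", 20.0),
]
for block in syntenic_blocks(example):
    print([h.marker for h in block])  # ['m1', 'm2', 'm3']
```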
Brassica rapa is an important crop species that produces vegetables, oilseed, and fodder. Although many studies have reported quantitative trait loci (QTL) mapping, the genes governing most of its economically important traits are still unknown. In this study, we report QTL mapping for morphological and yield component traits in B. rapa and comparative map alignment between B. rapa, B. napus, B. juncea, and Arabidopsis thaliana to identify candidate genes and conserved QTL blocks between them. A total of 95 QTL were identified in different crucifer blocks of the B. rapa genome. Through synteny analysis with A. thaliana, B. rapa candidate genes and intronic and exonic single nucleotide polymorphisms in the parental lines were detected from whole-genome resequencing data, and a few of these were validated by mapping them to the QTL regions. Semi-quantitative reverse transcriptase PCR analysis showed differences in the expression levels of a few genes between the parental lines. Comparative mapping identified five key, major evolutionarily conserved crucifer blocks (R, J, F, E, and W) harbouring QTL for morphological and yield component traits across the A, B, and C subgenomes of B. rapa, B. juncea, and B. napus. The identified candidate genes could be used for breeding B. rapa and other related Brassica species.
Brassica rapa; quantitative trait loci (QTL); morphological traits; single nucleotide polymorphism (SNP); conserved genome blocks
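The core QTL statistic can be illustrated with single-marker analysis: the LOD score compares the residual sum of squares of models without and with a marker genotype effect, LOD = (n/2) log10(RSS0/RSS1). The abstract does not specify the mapping method used, and interval or composite interval mapping would differ in detail. The sketch below is only the single-marker test, with genotype classes given as arbitrary labels (an assumption about input coding).

```python
import math
from statistics import fmean


def single_marker_lod(genotypes, phenotypes):
    """LOD for a genotype-class effect at one marker (one-way ANOVA form)."""
    if len(genotypes) != len(phenotypes) or not phenotypes:
        raise ValueError("need one genotype per phenotype value")
    n = len(phenotypes)
    grand_mean = fmean(phenotypes)
    rss0 = sum((y - grand_mean) ** 2 for y in phenotypes)  # null model: no QTL

    groups = {}
    for g, y in zip(genotypes, phenotypes):
        groups.setdefault(g, []).append(y)
    rss1 = 0.0  # alternative model: one mean per genotype class
    for values in groups.values():
        m = fmean(values)
        rss1 += sum((y - m) ** 2 for y in values)

    if rss1 == 0:
        return math.inf  # genotype explains all variation
    return (n / 2) * math.log10(rss0 / rss1)


print(single_marker_lod(["A", "A", "B", "B"], [1.0, 3.0, 5.0, 7.0]))
```

Scanning this statistic along every marker and comparing the peaks with a permutation-derived threshold is the simplest form of a genome-wide QTL scan.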
Genome evolution is a continuous process, and genomic rearrangements occur both within and between species. With the sequencing of the Arabidopsis thaliana genome, comparative genetics and genomics offer new insights into plant biology. The genus Brassica offers excellent opportunities for comparing genomic synteny to reveal genome evolution. During a previous genetic analysis of clubroot resistance in Brassica rapa, we identified a genetic region that is highly collinear with Arabidopsis chromosome 4. This region corresponds to a disease resistance gene cluster in the A. thaliana genome. Relying on synteny with Arabidopsis, we fine-mapped the region and found that the location and order of the markers corresponded well with those in Arabidopsis. Microsynteny on a physical map indicated an almost parallel correspondence, with a few rearrangements such as inversions and insertions. The results show that this genomic region of Brassica is extensively conserved with that of Arabidopsis and has potential as a disease resistance gene cluster, although the two genera diverged 20 million years ago.
microsynteny; genome evolution; genome organization; genomic collinearity; BAC library
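Inversions detected by microsynteny appear as runs of markers whose A. thaliana positions decrease while the Brassica order increases. The sketch below splits a marker series into co-linear and inverted runs. It assumes distinct A. thaliana positions on a single chromosome, and the function name is hypothetical.

```python
def oriented_runs(at_positions):
    """Split A. thaliana positions, listed in Brassica map order, into maximal
    runs that are co-linear (+1) or inverted (-1). Adjacent runs share their
    boundary marker. Returns (start_index, end_index, orientation) tuples."""
    if len(at_positions) < 2:
        return []
    runs = []
    start = 0
    sign = 1 if at_positions[1] > at_positions[0] else -1
    for i in range(2, len(at_positions)):
        step = 1 if at_positions[i] > at_positions[i - 1] else -1
        if step != sign:
            runs.append((start, i - 1, sign))
            start, sign = i - 1, step
    runs.append((start, len(at_positions) - 1, sign))
    return runs


# Markers 4-6 of a co-linear region appear in reverse order in Brassica.
print(oriented_runs([1, 2, 3, 6, 5, 4, 7, 8]))
```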
As part of a research programme focused on flavonoid biosynthesis in the seed coat of Brassica napus L. (oilseed rape), orthologs of the BANYULS (BAN) gene, which encodes anthocyanidin reductase, were cloned in B. napus as well as in the related species Brassica rapa and Brassica oleracea. The B. napus genome contains four functional copies of BAN, two originating from each diploid progenitor. Amino acid sequences were highly conserved among the Brassicaceae, including B. napus, B. rapa, B. oleracea, and the model plant Arabidopsis thaliana. Within the 200 bp upstream (5′) of the ATG codon, the Bna.BAN promoters (ProBna.BAN) were conserved with the AtANR promoter and contained putative cis-acting elements. In addition, transgenic Arabidopsis and oilseed rape plants carrying the first 230 bp of ProBna.BAN fused to the UidA reporter gene were generated. In both Brassicaceae backgrounds, ProBna.BAN activity was restricted to the seed coat. In B. napus seed, ProBna.BAN was activated in procyanidin-accumulating cells, namely the innermost layer of the inner integument and the micropyle-chalaza area. At the transcriptional level, all four Bna.BAN genes were expressed in the seed. Laser microdissection assays of the seed integuments showed that Bna.BAN expression was restricted to the inner integument, consistent with the activation profile of ProBna.BAN. Finally, Bna.BAN genes were mapped onto oilseed rape genetic maps, and potential co-localisations with seed colour quantitative trait loci are discussed.
Anthocyanidin reductase; BANYULS genes; Brassica; Flavonoid metabolism; Seed coat-specific promoter
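Searching a promoter for putative cis-acting elements amounts to scanning both strands of the upstream window for known motifs. The sketch below extracts the window upstream of the ATG and reports exact motif matches on each strand. The motif names and sequences are placeholders; real analyses usually rely on curated motif databases and degenerate (IUPAC) patterns.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def upstream_window(seq: str, atg_index: int, length: int = 200) -> str:
    """The `length` bases immediately 5' of the ATG starting at `atg_index`."""
    return seq[max(0, atg_index - length):atg_index]


def find_motifs(promoter: str, motifs: dict):
    """Exact matches of each motif on the + strand and of its reverse
    complement (i.e. the motif on the - strand). Palindromic motifs are
    reported once per strand. Returns (name, strand, start) sorted by start."""
    promoter = promoter.upper()
    hits = []
    for name, motif in motifs.items():
        for strand, pattern in (("+", motif.upper()), ("-", revcomp(motif))):
            start = promoter.find(pattern)
            while start != -1:
                hits.append((name, strand, start))
                start = promoter.find(pattern, start + 1)  # allow overlaps
    return sorted(hits, key=lambda h: (h[2], h[1]))


print(find_motifs("CCCAAAGGGTTTGGG", {"placeholder_motif": "TTTGGG"}))
```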
The Brassicaceae family includes the model plant Arabidopsis thaliana as well as a number of agronomically important species such as oilseed crops (in particular Brassica napus, B. juncea and B. rapa) and vegetables (e.g., B. rapa and B. oleracea).
Separated by only 10-20 million years, Brassica species and Arabidopsis thaliana are closely related, and knowledge of Arabidopsis growth and development is expected to be translatable to Brassicas for crop improvement. Moreover, certain aspects of plant development differ sufficiently between Brassica and Arabidopsis to warrant studies carried out directly in the crop species. However, mutating individual genes in amphidiploid Brassicas such as B. napus and B. juncea may not give rise to the expected phenotypes, as the genomes of these species can contain up to six orthologues per single-copy Arabidopsis gene. To elucidate, and possibly exploit, the function of redundant genes for oilseed rape crop improvement, it may therefore be more efficient to study their effects in a diploid Brassica species such as B. rapa. In addition, the ongoing sequencing of the B. rapa genome makes this species a highly attractive model for Brassica research and genetic resource development.
Seeds of the diploid Brassica A-genome species B. rapa were treated with ethyl methanesulfonate (EMS) to produce a TILLING (Targeting Induced Local Lesions In Genomes) population for reverse genetics studies. We used the B. rapa genotype R-o-18, which has a developmental ontogeny similar to that of an oilseed rape crop. Hence, this resource is expected to be well suited for studying traits relevant to the yield and quality of oilseed rape. DNA was isolated from a total of 9,216 M2 plants and pooled to form the basis of the TILLING platform. Analysis of six genes revealed a high mutation density of about one per 60 kb. This analysis also showed that screening a 1 kb amplicon in just one third of the population (3,072 M2 plants) will provide an average of 68 mutations and a 97% probability of obtaining a stop-codon mutation resulting in a truncated protein. We further calculated that each plant carries on average ~10,000 mutations and, given the large number of plants, predict that mutations at approximately half of the GC base pairs in the genome exist within this population.
We have developed the first EMS TILLING resource in the diploid Brassica species, B. rapa. The mutation density in this population is ~1 per 60 kb, which makes it the most densely mutated diploid organism for which a TILLING population has been published. This resource is publicly available through the RevGenUK reverse genetics platform http://revgenuk.jic.ac.uk.
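The abstract does not say how the 68-mutation and 97% figures were derived. A common way to reason about such screens is a Poisson model, in which the probability of at least one event is 1 - exp(-λ) for λ expected events. Under that assumption, the sketch below shows that a 97% chance of at least one stop codon corresponds to roughly 3.5 expected stop-codon mutations in the screened amplicon, and it reproduces the population-level totals from the reported numbers.

```python
import math


def prob_at_least_one(expected: float) -> float:
    """Poisson probability of >= 1 event when `expected` events are expected."""
    return 1.0 - math.exp(-expected)


def expected_needed(target_prob: float) -> float:
    """Expected event count needed to reach `target_prob` under the Poisson model."""
    return -math.log(1.0 - target_prob)


POPULATION = 9216             # M2 plants in the TILLING population
SCREENED = POPULATION // 3    # one third of the population
MUTATIONS_PER_PLANT = 10_000  # average reported in the text

print(SCREENED)                             # 3072
print(POPULATION * MUTATIONS_PER_PLANT)     # 92160000
print(round(expected_needed(0.97), 2))      # 3.51
```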
The completion and release of the Brassica rapa genome is of great benefit to researchers of the Brassicas, Arabidopsis, and genome evolution. While its lineage is closely related to that of the model organism Arabidopsis thaliana, the Brassicas experienced a whole genome triplication after the two lineages diverged. This event simultaneously created three copies of the ancestral genome, which subsequently diploidized through the process of homeologous gene loss known as fractionation. Because both homeologous gene content and regulatory binding sites have been fractionated, the Brassica genome is well suited to comparative genomic techniques for identifying syntenic regions, homeologous gene duplications, and putative regulatory sequences. Here, we use the comparative genomics platform CoGe to perform several genomic analyses that study structural changes in the genome and the dynamics of various genetic elements. Starting with whole genome comparisons, we characterize the Brassica paleohexaploidy, identify syntenic regions with A. thaliana, and use the A. thaliana circadian rhythm gene TOC1 to find duplicated orthologs in B. rapa. These TOC1 genes are further analyzed to identify conserved non-coding sequences that contain cis-acting regulatory elements and promoter sequences previously implicated in circadian rhythmicity. Each “cookbook style” analysis includes a step-by-step walk-through with links to CoGe to quickly reproduce each step of the analytical process.
comparative genomics; synteny; CoGe; Brassica rapa; syntenic dotplot; Arabidopsis; TOC1; conserved non-coding sequences
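Fractionation can be summarized by asking, for each ancestral gene, how many of its three post-triplication copies survive. The sketch below tallies that retention profile and the per-subgenome retention rate from a presence/absence table. The gene identifiers and presence data are hypothetical; in practice the table would come from syntenic analyses such as those described above.

```python
from collections import Counter


def retention_profile(presence):
    """presence: ancestral gene -> 3 booleans (copy retained in subgenome 1, 2, 3).
    Returns a Counter mapping number of retained copies -> number of genes."""
    return Counter(sum(flags) for flags in presence.values())


def per_subgenome_retention(presence):
    """Fraction of ancestral genes retained in each of the three subgenomes."""
    n = len(presence)
    return [sum(flags[i] for flags in presence.values()) / n for i in range(3)]


# Hypothetical presence/absence calls for four ancestral genes.
example = {
    "g1": (True, True, True),
    "g2": (True, False, False),
    "g3": (True, True, False),
    "g4": (False, False, True),
}
print(retention_profile(example))
print(per_subgenome_retention(example))
```

Unequal per-subgenome rates, as opposed to a uniform random loss, are the kind of signal usually interpreted as biased fractionation.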
Brassicaceae is an important plant family that includes several species of major economic importance. The genomes of Brassica spp. and Arabidopsis share highly conserved collinearity, which can be exploited for genomic research in Brassicaceae crops. In this study, 131,286 ESTs of five Brassicaceae species were assembled into unigene contigs and compared with Arabidopsis gene indices. Almost all the unigenes of the Brassicaceae species showed high similarity to Arabidopsis genes, except those of B. napus, of which 90% were similar. A total of 9,699 SSRs were identified in the unigenes. PCR primers were designed on the basis of this information and amplified across species for validation. Functional annotation of the unigenes showed that the majority of genes fall into the metabolism and energy functional classes. Comparative genome analysis between Arabidopsis and related crop species is expected to expedite research in the more complex Brassica genomes. This would be helpful for genomic as well as evolutionary studies, and the DNA markers developed can be used for mapping, tagging, and cloning important genes in Brassicaceae.
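SSR discovery in unigenes is typically done by pattern matching for short tandem repeats. The following sketch uses Python regular expressions to find perfect repeats of 1 to 6 bp motifs. The minimum repeat numbers are assumed thresholds (published studies use various cut-offs), and motifs that are themselves repeats of a shorter unit (e.g. `ATAT`) are skipped to avoid reporting the same SSR again at a longer period.

```python
import re

# Minimum number of repeat units per motif length (assumed thresholds).
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_primitive(motif: str) -> bool:
    """True if `motif` is not a repeat of a shorter unit ('AT' yes, 'ATAT' no)."""
    return motif not in (motif + motif)[1:-1]


def find_ssrs(seq: str):
    """Return (start, end, motif, repeat_count) for perfect SSRs in `seq`."""
    seq = seq.upper()
    hits = []
    for k, min_rep in MIN_REPEATS.items():
        # A k-bp unit followed by at least (min_rep - 1) copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if is_primitive(motif):
                hits.append((m.start(), m.end(), motif, (m.end() - m.start()) // k))
    return sorted(hits)


print(find_ssrs("GG" + "AT" * 7 + "CC" + "A" * 12 + "G"))
```

Primer design for the SSR markers would then use the flanking sequence on either side of each reported interval.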
Comparative genomics has provided valuable insights into the nature of gene sequence variation and chromosomal organization of closely related bacterial species. However, questions about the biological significance of gene order conservation, or synteny, remain open. Moreover, few comprehensive studies have been reported for rhizobial genomes.
We analyzed the genomic sequences of four fast-growing Rhizobiales (Sinorhizobium meliloti, Agrobacterium tumefaciens, Mesorhizobium loti and Brucella melitensis). We made a comprehensive gene classification to define chromosomal orthologs, genes with homologs in other replicons such as plasmids, and species-specific genes. About two thousand genes were predicted to be orthologs in each chromosome, and about 80% of these were syntenic. A striking gene collinearity was found in pairs of organisms, and a large fraction of the microsyntenic regions and operons were similar. Syntenic products showed higher identity levels than non-syntenic ones, suggesting resistance to sequence variation due to functional constraints; in addition, an unusually high fraction of syntenic products contained membrane-spanning segments. Syntenic genes encode a high proportion of essential cell functions, show a high level of functional relationships, and have a very low horizontal gene transfer rate. The sequence variability of the proteins can be considered the species signature in response to specific niche adaptation. A comparable analysis of Enterobacteriales genomes revealed a different gene organization but gave similar results regarding synteny conservation, the essential role of syntenic genes, and higher functional linkage among the genes of the microsyntenic regions.
Syntenic bacterial genes represent a commonly evolved group. They not only reveal the core chromosomal segments present in the last common ancestor and determine the metabolic characteristics shared by these microorganisms, but also show resistance to sequence variation and rearrangement, possibly due to their essential character. In Rhizobiales and Enterobacteriales, syntenic genes encode a high proportion of essential cell functions and show a high level of functional relationships.
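The proportion of syntenic orthologs can be estimated by checking whether an ortholog's neighbors are themselves orthologous to neighbors of its partner. The sketch below applies that conserved-neighborhood rule with a window of one gene. It is an illustration with hypothetical gene names, treats chromosomes as linear although bacterial chromosomes are usually circular, and is not the classification procedure used in the study.

```python
def syntenic_orthologs(order_a, order_b, orthologs, window=1):
    """Return genes of genome A whose ortholog in genome B keeps at least one
    neighbor pair within `window` genes (chromosomes treated as linear)."""
    pos_a = {g: i for i, g in enumerate(order_a)}
    pos_b = {g: i for i, g in enumerate(order_b)}
    syntenic = set()
    for ga, gb in orthologs.items():
        if ga not in pos_a or gb not in pos_b:
            continue
        ia, ib = pos_a[ga], pos_b[gb]
        for offset in range(-window, window + 1):
            j = ia + offset
            if offset == 0 or not 0 <= j < len(order_a):
                continue
            partner = orthologs.get(order_a[j])
            if partner in pos_b and abs(pos_b[partner] - ib) <= window:
                syntenic.add(ga)
                break
    return syntenic


order_a = ["a1", "a2", "a3", "a4", "a5", "a6"]
order_b = ["b1", "b2", "b4", "b5", "q1", "q2", "b9", "q3", "b3"]
orthologs = {"a1": "b1", "a2": "b2", "a3": "b9", "a4": "b4", "a5": "b5", "a6": "b3"}
found = syntenic_orthologs(order_a, order_b, orthologs)
print(sorted(found), len(found) / len(orthologs))
```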
The species Brassica rapa includes important vegetable and oil crops. It also serves as an excellent model system to study polyploidy-related genome evolution because of its paleohexaploid ancestry and its close evolutionary relationships with Arabidopsis thaliana and other Brassica species with larger genomes. Therefore, its genome sequence will be used to accelerate both basic research on genome evolution and applied research across the cultivated Brassica species.
We have determined and analyzed the sequence of B. rapa chromosome A3. We obtained 31.9 Mb of sequence, organized into nine contigs, which incorporated 348 overlapping BAC clones. Annotation revealed 7,058 protein-coding genes, with an average density of one gene per 4.6 kb. Analysis of chromosome collinearity with the A. thaliana genome identified conserved synteny blocks encompassing the whole of B. rapa chromosome A3 and sections of four A. thaliana chromosomes. The frequency of tandem duplication of genes differed between the conserved genome segments in B. rapa and A. thaliana, indicating differential rates of occurrence/retention of such duplicate copies of genes. Analysis of 'ancestral karyotype' genome building blocks enabled the development of a hypothetical model for the derivation of B. rapa chromosome A3.
We report the near-complete chromosome sequence from a dicotyledonous crop species. This provides an example of the complexity of genome evolution following polyploidy. The high degree of contiguity afforded by the clone-by-clone approach provides a benchmark for the performance of whole genome shotgun approaches presently being applied in B. rapa and other species with complex genomes.
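Tandem arrays like those compared above are commonly defined as homologous genes lying next to each other or separated by only a few unrelated genes. The sketch below groups such genes with a union-find structure. The `max_gap` parameter (number of intervening genes allowed) and the source of the homologous pairs, for example an all-against-all BLASTP search, are assumptions; published studies use different gap and similarity cut-offs.

```python
def tandem_arrays(gene_order, homologous_pairs, max_gap=1):
    """Group homologous genes on one chromosome into tandem arrays.

    gene_order: genes in chromosomal order.
    homologous_pairs: iterable of (gene1, gene2) judged homologous.
    Two homologs are linked if at most `max_gap` genes lie between them.
    """
    index = {g: i for i, g in enumerate(gene_order)}
    parent = {g: g for g in gene_order}

    def find(g):
        # Union-find root lookup with path halving.
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for a, b in homologous_pairs:
        if a in index and b in index and 0 < abs(index[a] - index[b]) <= max_gap + 1:
            parent[find(a)] = find(b)

    groups = {}
    for g in gene_order:
        groups.setdefault(find(g), []).append(g)
    return [group for group in groups.values() if len(group) > 1]


order = ["g1", "g2", "g3", "g4", "g5", "g6", "g7"]
pairs = [("g1", "g2"), ("g2", "g4"), ("g6", "g7"), ("g1", "g7")]
print(tandem_arrays(order, pairs))
```

With `max_gap=1`, the distant pair g1/g7 is ignored, while g4 joins the g1/g2 array across one intervening gene.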
Following the successful completion of the Brassica rapa sequencing project, the next step is to investigate the functions of individual genes/proteins. For Arabidopsis thaliana, large amounts of protein–protein interaction (PPI) data are available from the major PPI databases (DBs). Brassica crop species are known to be closely related to A. thaliana, which provides an opportunity to infer the B. rapa interactome from A. thaliana PPI data. In this paper, we present an inferred B. rapa interactome based on A. thaliana PPI data from two resources: (i) A. thaliana PPI data from three major DBs, BioGRID, IntAct, and TAIR; and (ii) ortholog-based A. thaliana PPI predictions. B. rapa genes were linked to A. thaliana genes in three complementary ways: (i) ortholog predictions, (ii) identification of gene duplication based on synteny and collinearity, and (iii) BLAST sequence similarity searches. An additional approach used known/predicted domain–domain interaction data: because the two species are closely related, we used A. thaliana PPI data to predict interacting domains that might be conserved between them. To demonstrate its usability, we examined the component of the predicted interactome that contains known A. thaliana meiotic proteins.
Brassica rapa; Arabidopsis thaliana; interactome; protein–protein interaction; domain–domain interaction; meiosis
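The ortholog-based transfer described above is often called interolog mapping: if two A. thaliana proteins interact, their B. rapa counterparts are predicted to interact as well. The minimal sketch below maps each A. thaliana interaction through a many-to-many gene correspondence, which matters because the triplicated B. rapa genome can hold several copies of one A. thaliana gene. The gene identifiers are hypothetical, and the sketch omits the domain-based and BLAST-based steps.

```python
from itertools import product


def infer_interactome(at_ppis, at_to_br):
    """Transfer A. thaliana interactions to B. rapa through an ortholog map.

    at_ppis: iterable of (at_gene1, at_gene2) interactions.
    at_to_br: dict mapping an A. thaliana gene to a list of B. rapa genes.
    Returns undirected B. rapa pairs as sorted tuples. A homodimer in
    A. thaliana yields self-pairs and paralog-paralog pairs (an assumption).
    """
    inferred = set()
    for a, b in at_ppis:
        for x, y in product(at_to_br.get(a, []), at_to_br.get(b, [])):
            inferred.add(tuple(sorted((x, y))))
    return inferred


at_ppis = [("AT_A", "AT_B"), ("AT_C", "AT_C")]
at_to_br = {"AT_A": ["Br_A1", "Br_A2"], "AT_B": ["Br_B1"], "AT_C": ["Br_C1", "Br_C2"]}
print(sorted(infer_interactome(at_ppis, at_to_br)))
```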
Molecular genetic maps provide a means to link heritable traits with underlying genome sequence variation. Several genetic maps have been constructed for Brassica species, yet to date, there has been no simple means to compare this information or to associate mapped traits with the genome sequence of the related model plant, Arabidopsis.
We have developed a comparative genetic map database for the viewing, comparison and analysis of Brassica and Arabidopsis genetic, physical and trait map information. This web-based tool allows users to view and compare genetic and physical maps, search for traits and markers, and compare genetic linkage groups within and between the amphidiploid and diploid Brassica genomes. The inclusion of Arabidopsis data enables comparison between Brassica maps that share no common markers. Analysis of conserved syntenic blocks between Arabidopsis and collated Brassica genetic maps validates the application of this system. This tool is freely available over the internet on .
This database enables users to interrogate the relationship between Brassica genetic maps and the sequenced genome of A. thaliana, permitting the comparison of genetic linkage groups and mapped traits and the rapid identification of candidate genes.
Arabidopsis belongs to the Brassicaceae family and plays an important role as a model plant for which researchers have developed fine-tuned genome resources. Genome sequencing projects have been initiated for other members of the Brassicaceae family. Among these projects, research on Chinese cabbage (Brassica rapa subsp. pekinensis) started early because of strong interest in this species. Here, we report the development of a library of Chinese cabbage full-length cDNA clones, the RIKEN BRC B. rapa full-length cDNA (BBRAF) resource, to accelerate research on Brassica species. We sequenced 10 000 BBRAF clones and confirmed 5476 independent clones. Most of these cDNAs showed high homology to Arabidopsis genes, but we also obtained more than 200 cDNA clones that lacked any sequence homology to Arabidopsis genes. We also successfully identified several possible candidate marker genes for plant defence responses from our analysis of the expression of the Brassica counterparts of Arabidopsis marker genes in response to salicylic acid and jasmonic acid. We compared gene expression of these markers in several Chinese cabbage cultivars. Our BBRAF cDNA resource will be publicly available from the RIKEN Bioresource Center and will help researchers to transfer Arabidopsis-related knowledge to Brassica crops.
Arabidopsis; Brassica rapa; full-length cDNA; jasmonic acid; salicylic acid
Fragmentary conservation of synteny has been reported between map-anchored Prunus sequences and Arabidopsis. With the availability of genome sequence for fellow rosid I members Populus and Medicago, we analyzed the synteny between Prunus and the three model genomes. Eight Prunus BAC sequences and map-anchored Prunus sequences were used in the comparison.
We found well-conserved synteny among the Prunus species (peach, plum, and apricot) and Populus using a set of homologous Prunus BACs. Conversely, we could not detect any synteny with Arabidopsis in this region. Other peach BACs also showed extensive synteny with Populus. The syntenic regions detected spanned up to 477 kb in Populus. Two syntenic regions between Arabidopsis and these BACs were much shorter, around 10 kb. We also found syntenic regions that are conserved between the Prunus BACs and Medicago. The pattern of synteny corresponded to the proposed whole genome duplication events in Populus and Medicago. Using map-anchored Prunus sequences, we detected many syntenic blocks with several gene pairs between Prunus and Populus or Arabidopsis. We observed a more complex network of synteny between Prunus and Arabidopsis, indicative of multiple genome duplications and subsequent gene loss in Arabidopsis.
Our results show striking microsynteny between the Prunus BACs and the genomes of Populus and Medicago. In the macrosynteny analysis, more distinct Prunus regions were syntenic to Populus than to Arabidopsis.
Recent phylogenetic analyses have identified Amborella trichopoda, an understory tree species endemic to the forests of New Caledonia, as sister to a clade including all other known flowering plant species. The Amborella genome is a unique reference for understanding the evolution of angiosperm genomes because it can serve as an outgroup to root comparative analyses. A physical map, BAC end sequences and sample shotgun sequences provide a first view of the 870 Mbp Amborella genome.
Analysis of Amborella BAC ends sequenced from each contig suggests that the density of long terminal repeat retrotransposons is negatively correlated with that of protein coding genes. Syntenic, presumably ancestral, gene blocks were identified in comparisons of the Amborella BAC contigs and the sequenced Arabidopsis thaliana, Populus trichocarpa, Vitis vinifera and Oryza sativa genomes. Parsimony mapping of the loss of synteny corroborates previous analyses suggesting that the rate of structural change has been more rapid on lineages leading to Arabidopsis and Oryza compared with lineages leading to Populus and Vitis. The gamma paleohexaploidy event identified in the Arabidopsis, Populus and Vitis genomes is shown to have occurred after the divergence of all other known angiosperms from the lineage leading to Amborella.
When placed in the context of a physical map, BAC end sequences representing just 5.4% of the Amborella genome have facilitated reconstruction of gene blocks that existed in the last common ancestor of all flowering plants. The Amborella genome is an invaluable reference for inferences concerning the ancestral angiosperm and subsequent genome evolution.
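Parsimony mapping of synteny loss can be illustrated with Fitch's small-parsimony algorithm, which finds the minimum number of state changes on a fixed tree. In the sketch below, state 1 means that a gene block is still syntenic in a lineage and 0 means it has been broken. The tree is a simplified topology and the states are hypothetical, not data from the study.

```python
def fitch(tree, states):
    """Fitch small parsimony on a binary tree of nested 2-tuples.

    tree: a leaf name (str) or a (left, right) tuple.
    states: leaf name -> character state (e.g. 1 = synteny retained).
    Returns (set of most-parsimonious root states, minimum number of changes).
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    s1, c1 = fitch(left, states)
    s2, c2 = fitch(right, states)
    common = s1 & s2
    if common:
        return common, c1 + c2
    return s1 | s2, c1 + c2 + 1  # disagreement costs one change


# Simplified topology with Amborella as sister to all other taxa (assumption).
TREE = ("Amborella", ("Oryza", ("Vitis", ("Populus", "Arabidopsis"))))
STATES = {"Amborella": 1, "Oryza": 0, "Vitis": 1, "Populus": 1, "Arabidopsis": 0}
print(fitch(TREE, STATES))
```

Here the retained state is inferred at the root and two independent losses (on the Oryza and Arabidopsis branches) explain the pattern.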
Euchromatic regions of the Brassica rapa genome were sequenced and mapped onto the corresponding regions in the Arabidopsis thaliana genome.
Brassica rapa is one of the most economically important vegetable crops worldwide. Owing to its agronomic importance and phylogenetic position, B. rapa provides a crucial reference for understanding polyploidy-related crop genome evolution. The high degree of sequence identity and the remarkably conserved genome structure between the Arabidopsis and Brassica genomes enable comparative tiling sequencing, in which Arabidopsis sequences serve as references to select the counterpart regions in B. rapa, a task that poses a major challenge in structural and comparative crop genomics.
We assembled 65.8 megabase pairs of non-redundant euchromatic B. rapa sequence and compared it with the Arabidopsis genome to investigate chromosomal relationships, macrosynteny blocks, and microsynteny within blocks. Because of genome shrinkage, the triplicated B. rapa genome contains only about twice as many genes as Arabidopsis. Genome comparisons suggest that B. rapa has a distinct organization of ancestral genome blocks as a result of a recent whole genome triplication followed by a unique diploidization process. The absence of the most recent whole genome duplication (3R) event from the B. rapa genome, which is atypical of other Brassica genomes, may account for the emergence of B. rapa from the Brassica progenitor around 8 million years ago.
This work demonstrates the potential of using comparative tiling sequencing for genome analysis of crop species. Based on a comparative analysis of the B. rapa sequences and the Arabidopsis genome, it appears that polyploidy and chromosomal diploidization are ongoing processes that collectively stabilize the B. rapa genome and facilitate its evolution.
The Brassica species include an important group of crops and provide opportunities for studying the evolutionary consequences of polyploidy. They are related to Arabidopsis thaliana, for which the first complete plant genome sequence was obtained, and their genomes show extensive, although imperfect, conserved synteny with that of A. thaliana. A large number of EST sequences, derived from a range of different Brassica species, are available in public databases, but no public microarray resource has so far been developed for these species.
We assembled unigenes using ~800,000 EST sequences, mainly from three species: B. napus, B. rapa and B. oleracea. The assembly was conducted with the aim of co-assembling ESTs of orthologous genes (including homoeologous pairs of genes in B. napus from each of the A and C genomes), but resolving assemblies of paralogous, or paleo-homoeologous, genes (i.e. the genes related by the ancestral genome triplication observed in diploid Brassica species). A total of 90,864 unique sequence assemblies were developed. These were incorporated into the BAC sequence annotation for the Brassica rapa Genome Sequencing Project, enabling the identification of cognate genomic sequences for a proportion of them. A 60-mer oligo microarray comprising 94,558 probes was developed using the unigene sequences. Gene expression was analysed in reciprocal resynthesised B. napus lines and the B. oleracea and B. rapa lines used to produce them. The analysis showed that significant expression could consistently be detected in leaf tissue for 35,386 unigenes. Expression was detected across all four genotypes for 27,355 unigenes, genome-specific expression patterns were observed for 7,851 unigenes, and 180 unigenes displayed other classes of expression pattern. Principal component analysis (PCA) clearly resolved the individual microarray datasets for B. rapa, B. oleracea and resynthesised B. napus. Quantitative differences in expression were observed between the resynthesised B. napus lines for 98 unigenes, most of which could be classified into non-additive expression patterns, including 17 that showed cytoplasm-specific patterns. We further characterized the unigenes for which A genome-specific expression was observed and cognate genomic sequences could be identified. Ten of these unigenes were found to be Brassica-specific sequences, including two that originate from complex loci comprising gene clusters.
We succeeded in developing a Brassica community microarray resource. Although expression can be measured for the majority of unigenes across species, there were numerous probes that reported in a genome-specific manner. We anticipate that some proportion of these will represent species-specific transcripts and the remainder will be the consequence of variation of sequences within the regions represented by the array probes. Our studies demonstrated that the datasets obtained from the arrays can be used for typical analyses, including PCA and the analysis of differential expression. We have also demonstrated that Brassica-specific transcripts identified in silico in the sequence assembly of public EST database accessions are indeed reported by the array. These would not be detectable using arrays designed using A. thaliana sequences.
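Classifying resynthesised B. napus expression as additive or non-additive typically starts from the mid-parent value, the mean of the two parental expression levels. The sketch below uses a simple fold-change band around that value. The 1.5-fold tolerance is an assumption, and a real analysis would use replicate-based statistical tests. The sketch also checks that the reported unigene classes sum to the 35,386 unigenes with detectable expression.

```python
def classify_expression(parent1: float, parent2: float, hybrid: float,
                        tolerance: float = 1.5) -> str:
    """Compare hybrid expression with the mid-parent value (MPV).

    `tolerance` is an assumed fold-change band around the MPV inside which
    expression is called additive.
    """
    if min(parent1, parent2, hybrid) < 0 or tolerance < 1:
        raise ValueError("expression must be non-negative and tolerance >= 1")
    mid_parent = (parent1 + parent2) / 2
    if mid_parent == 0:
        raise ValueError("mid-parent value is zero; ratio undefined")
    ratio = hybrid / mid_parent
    if ratio > tolerance:
        return "above mid-parent"
    if ratio < 1 / tolerance:
        return "below mid-parent"
    return "additive"


# Consistency check on the unigene classes reported in the text.
REPORTED_CLASSES = {"all four genotypes": 27_355, "genome-specific": 7_851, "other": 180}
assert sum(REPORTED_CLASSES.values()) == 35_386

print(classify_expression(10.0, 30.0, 20.0))  # additive
```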
Orthologous relationships between genes are routinely inferred from bidirectional best hits (BBH) in pairwise genome comparisons. However, to our knowledge, it has never been quantitatively demonstrated that orthologs form BBH. To test this “BBH-orthology conjecture,” we take advantage of the operon organization of bacterial and archaeal genomes and assume that, when two genes in compared genomes are flanked by two BBH and show statistically significant sequence similarity to one another, these genes are bona fide orthologs. Under this assumption, we tested whether middle genes in “syntenic orthologous gene triplets” form BBH. We found that this was the case in more than 95% of the syntenic gene triplets in all genome comparisons. A detailed examination of the exceptions to this pattern, including maximum likelihood phylogenetic tree analysis, showed that some of these deviations involved artifacts of genome annotation, whereas very small fractions represented random assignment of the best hit to one of several closely related in-paralogs, paralogous displacement in situ, or, even less frequently, genuine violations of the BBH–orthology conjecture caused by accelerated evolution in one of the orthologs. We conclude that, at least in prokaryotes, genes for which independent evidence of orthology is available typically form BBH and, conversely, BBH can serve as a strong indication of gene orthology.
orthology; bidirectional best hit; genome comparison; synteny
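The BBH definition and the syntenic-triplet test described above can be sketched as follows. This is a minimal illustration, assuming that pairwise similarity scores (for example BLAST bit scores) have already been computed and filtered so that only statistically significant hits appear in the `scores` dictionary, and that each genome is a single ordered list of genes. The function names and toy gene identifiers are hypothetical and do not reproduce the authors' pipeline.

```python
from typing import Dict, List, Set, Tuple

# (gene in genome 1, gene in genome 2) -> similarity score; assumed pre-filtered for significance
Scores = Dict[Tuple[str, str], float]


def best_hits(scores: Scores, forward: bool = True) -> Dict[str, str]:
    """Best-scoring partner for each query gene; ties keep the first pair seen."""
    best: Dict[str, Tuple[str, float]] = {}
    for (g1, g2), score in scores.items():
        query, target = (g1, g2) if forward else (g2, g1)
        if query not in best or score > best[query][1]:
            best[query] = (target, score)
    return {query: target for query, (target, _) in best.items()}


def bidirectional_best_hits(scores: Scores) -> Set[Tuple[str, str]]:
    """Pairs (a, b) where b is a's best hit and a is b's best hit."""
    fwd = best_hits(scores, forward=True)
    rev = best_hits(scores, forward=False)
    return {(a, b) for a, b in fwd.items() if rev.get(b) == a}


def syntenic_triplets(order1: List[str], order2: List[str], scores: Scores,
                      bbh: Set[Tuple[str, str]]) -> List[Tuple[str, str, bool]]:
    """Middle-gene pairs flanked by two BBH (in either orientation) that share a
    significant hit, each reported with whether the middle pair is itself a BBH."""
    results = []
    for i in range(1, len(order1) - 1):
        left, mid, right = order1[i - 1], order1[i], order1[i + 1]
        for j in range(1, len(order2) - 1):
            l2, m2, r2 = order2[j - 1], order2[j], order2[j + 1]
            same = (left, l2) in bbh and (right, r2) in bbh
            inverted = (left, r2) in bbh and (right, l2) in bbh
            if (same or inverted) and (mid, m2) in scores:
                results.append((mid, m2, (mid, m2) in bbh))
    return results


# Toy example: b1/b2 sit between two BBH and are themselves a BBH.
scores = {("a1", "a2"): 100.0, ("b1", "b2"): 80.0, ("c1", "c2"): 90.0, ("b1", "c2"): 10.0}
bbh = bidirectional_best_hits(scores)
print(syntenic_triplets(["a1", "b1", "c1"], ["a2", "b2", "c2"], scores, bbh))
# [('b1', 'b2', True)]
```

A triplet reported with `False` corresponds to one of the exceptions discussed above, for instance a paralog elsewhere in the genome outscoring the syntenic middle gene.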
Yellow seed (i.e., a yellow seed coat) is one of the most important agronomic traits of Brassica plants and is correlated with seed oil and meal quality. Previous studies on the Brassicaceae, including Arabidopsis and Brassica species, proposed that, at the molecular level, seed color is correlated with flavonoid and lignin biosynthesis. In Arabidopsis thaliana, the oxidative polymerization of flavonoids and the biosynthesis of lignin have been shown to be catalyzed by laccase 15, a functional enzyme encoded by the AtTT10 gene. In this study, eight Brassica TT10 genes (three from B. napus, three from B. rapa and two from B. oleracea) were isolated and their roles in flavonoid oxidation/polymerization and lignin biosynthesis were investigated. Based on our phylogenetic analysis, these genes could be divided into two groups with clear structural and functional differentiation. Expression studies showed that Brassica TT10 genes are active in developing seeds, but with differential expression patterns in yellow- and black-seeded near-isogenic lines. For functional analyses, three black-seeded B. napus cultivars were chosen for transgenic studies. Transgenic B. napus plants expressing antisense TT10 constructs exhibited retarded pigmentation in the seed coat. Chemical composition analysis revealed increased levels of soluble proanthocyanidins and decreased extractable lignin in the seed coats of these transgenic plants compared with those of the controls. These findings indicate a role for the Brassica TT10 genes in proanthocyanidin polymerization and lignin biosynthesis, as well as in seed coat pigmentation in B. napus.
The Brassica species, related to Arabidopsis thaliana, include an important group of crops and represent an excellent system for studying the evolutionary consequences of polyploidy. Previous studies have led to a proposed structure for an ancestral karyotype and models for the evolution of the B. rapa genome by triplication and segmental rearrangement, but these have not been validated at the sequence level.
We developed computational tools to analyse the public collection of B. rapa BAC end sequences in order to identify candidate collinearity discontinuities between the genomes of B. rapa and A. thaliana. For each putative discontinuity, one of the BACs was sequenced and analysed for collinearity with the genome of A. thaliana. Additional BAC clones were identified and sequenced as part of ongoing efforts to sequence four chromosomes of B. rapa. Strikingly few of the 19 inter-chromosomal rearrangements corresponded to the set of collinearity discontinuities anticipated on the basis of previous studies. Our analyses revealed numerous previously undetected collinearity blocks. For B. rapa linkage group A8, we were able to develop a model for the derivation of the chromosome from the ancestral karyotype. We were also able to identify a rearrangement event in the ancestor of B. rapa that was not shared with the ancestor of A. thaliana and is represented in triplicate in the B. rapa genome. In addition to inter-chromosomal rearrangements, we identified and analysed 32 BACs containing the end points of segmental inversion events.
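The screening idea can be pictured with a minimal sketch. It assumes that each BAC end sequence has already been mapped (for example by a similarity search) to an A. thaliana chromosome and coordinate, and flags a clone as a candidate discontinuity when its two ends land on different chromosomes or implausibly far apart. The `EndHit` type, the chromosome labels and the 500 kb `max_span` threshold are hypothetical choices for illustration; the source does not describe how its tools made this call.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class EndHit:
    chromosome: str  # A. thaliana chromosome the end sequence maps to (hypothetical labels)
    position: int    # mapped coordinate in bp


def is_discontinuity_candidate(end1: Optional[EndHit], end2: Optional[EndHit],
                               max_span: int = 500_000) -> bool:
    """Flag a clone whose two ends are not collinear in A. thaliana."""
    if end1 is None or end2 is None:
        return False  # collinearity cannot be judged without both ends mapped
    if end1.chromosome != end2.chromosome:
        return True   # ends on different chromosomes: inter-chromosomal break
    return abs(end1.position - end2.position) > max_span  # ends too far apart


def candidates(clones: Dict[str, Tuple[Optional[EndHit], Optional[EndHit]]]) -> List[str]:
    """Names of clones flagged for sequencing, in input order."""
    return [name for name, (e1, e2) in clones.items() if is_discontinuity_candidate(e1, e2)]


# Toy clones, not data from the study.
clones = {
    "BAC1": (EndHit("At1", 1_000_000), EndHit("At1", 1_080_000)),  # collinear
    "BAC2": (EndHit("At1", 2_000_000), EndHit("At4", 300_000)),    # different chromosomes
    "BAC3": (EndHit("At2", 100_000), EndHit("At2", 5_000_000)),    # ends far apart
    "BAC4": (EndHit("At3", 10), None),                             # one end unmapped
}
print(candidates(clones))  # ['BAC2', 'BAC3']
```

Flagged clones are only candidates; as described above, each still had to be sequenced and its collinearity with A. thaliana checked directly.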
Our results show that previous studies of segmental collinearity between the A. thaliana, Brassica and ancestral karyotype genomes, although very useful, represent over-simplifications of their true relationships. The presence of numerous cryptic collinear genome segments and the frequent occurrence of segmental inversions mean that inferring the positions of genes in B. rapa from the locations of their orthologues in A. thaliana can be misleading. Our results will be relevant to a wide range of plants with polyploid genomes, many of which are currently being analysed under the paradigm that they comprise conserved synteny blocks with respect to sequenced, related genomes.
Comparative sequence analysis is widely used to infer gene function and study genome evolution, and it requires proper ortholog identification across different genomes. We have developed a program for the Identification of Orthologs in one-to-one relationship by Neighborhood and Similarity (IONS) between closely related species. The algorithm combines two levels of evidence to determine co-ancestrality at the genome scale: sequence similarity and shared neighborhood. The method was initially designed to provide anchor points for syntenic blocks within the Génolevures project, which concerns nine hemiascomycetous yeasts (about 50,000 genes), and is applicable to different input databases. A comparison based on the Rand index shows that the results are highly consistent with the pillars of the Yeast Gene Order Browser, a manually curated database. Compared with SYNERGY, another algorithm reporting homology relationships, our method’s main advantages are its automation and the absence of dataset-dependent parameters, which facilitate consistent integration of newly released genomes.
ortholog; synteny; shared neighborhood; hemiascomycete; yeast
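The Rand index used for that comparison measures agreement between two partitions of the same items: over all pairs of items, it is the fraction of pairs that are either grouped together in both partitions or separated in both. The sketch below uses the standard definition and assumes that each ortholog assignment is represented as a mapping from gene to group identifier; how the study built those groups (which genes were included, how unassigned genes were handled) is not specified here, and the toy gene names are hypothetical.

```python
from itertools import combinations
from typing import Dict, Hashable


def rand_index(labels1: Dict[str, Hashable], labels2: Dict[str, Hashable]) -> float:
    """Fraction of item pairs on which two partitions agree (same group vs. different groups)."""
    items = sorted(labels1.keys() & labels2.keys())  # only genes present in both assignments
    if len(items) < 2:
        raise ValueError("need at least two shared items")
    agree = 0
    total = 0
    for x, y in combinations(items, 2):
        same1 = labels1[x] == labels1[y]
        same2 = labels2[x] == labels2[y]
        agree += same1 == same2  # counts pairs together in both or apart in both
        total += 1
    return agree / total


# Toy ortholog groupings, not data from IONS or YGOB.
method_a = {"g1": "A", "g2": "A", "g3": "B", "g4": "B"}
reference = {"g1": "X", "g2": "X", "g3": "X", "g4": "Y"}
print(rand_index(method_a, method_a))   # 1.0
print(rand_index(method_a, reference))  # 0.5
```

Only group co-membership matters, so the two inputs may use unrelated group labels.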
Chinese cabbage (Brassica rapa ssp. pekinensis) is one of the most important leaf vegetables grown worldwide and has undergone thousands of years of cultivation and artificial selection. The complete Chinese cabbage genome sequence and more than forty thousand proteins have been obtained to date. The genome has undergone triplication events since its divergence from Arabidopsis thaliana (13 to 17 Mya); however, a high degree of sequence similarity and conserved genome structure remain between the two species. Arabidopsis is therefore a viable reference species for comparative genomics studies. Variation in gene family size resulting from genome triplication may contribute to the broad phenotypic plasticity and increased tolerance to environmental extremes observed in Brassica species. Transcription factors are important regulators of plant developmental and physiological processes. The AP2/ERF proteins, one of the most important families of transcriptional regulators, play a crucial role in plant growth and in responses to biotic and abiotic stressors. Our analysis will provide resources for understanding the tolerance mechanisms in Brassica rapa ssp. pekinensis.
In the present study, 291 putative AP2/ERF transcription factor proteins were identified from the Chinese cabbage genome database and compared with proteins from 15 additional species. The Chinese cabbage AP2/ERF superfamily was classified into four families: AP2, ERF, RAV, and Soloist. The ERF family was further divided into DREB and ERF subfamilies. The AP2/ERF superfamily was subsequently divided into 15 groups. The identification, classification, phylogenetic reconstruction, conserved motifs, chromosome distribution, functional annotation, expression patterns, and interaction networks of the AP2/ERF transcription factor superfamily were predicted and analyzed. Chromosome mapping showed that AP2/ERF superfamily genes are distributed across the 10 Chinese cabbage chromosomes. Expression levels of AP2/ERF transcription factors, inferred from expressed sequence tags (ESTs), differed among six tissue types. In the AP2/ERF superfamily, 214 orthologous genes were identified between Chinese cabbage and Arabidopsis. Orthologous gene interaction networks were constructed; they included seven CBF and four AP2 genes, primarily involved in cold regulatory pathways and ovule development, respectively.
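The source does not state the criteria behind the family and subfamily split, so the following is only a sketch of a widely used convention, assumed here: proteins with two AP2/ERF domains go to the AP2 family, one AP2/ERF domain plus a B3 domain to RAV, a single highly divergent domain to Soloist, and the remaining single-domain proteins to the ERF family, where valine at position 14 and glutamic acid at position 19 of the domain indicate the DREB subfamily and alanine-14 with aspartic acid-19 the ERF subfamily. The `DomainSummary` type and its fields are hypothetical, and domain detection itself (for example with a profile HMM search) is assumed to have been done beforehand.

```python
from dataclasses import dataclass


@dataclass
class DomainSummary:
    ap2_domains: int         # number of AP2/ERF domains detected (assumed precomputed)
    has_b3: bool = False     # B3 DNA-binding domain present
    divergent: bool = False  # single, highly divergent AP2/ERF domain (Soloist-like)
    domain_seq: str = ""     # AP2/ERF domain sequence; positions below are 1-based


def classify(p: DomainSummary) -> str:
    """Simplified family/subfamily assignment under the assumed convention."""
    if p.ap2_domains >= 2:
        return "AP2"
    if p.ap2_domains == 1 and p.has_b3:
        return "RAV"
    if p.ap2_domains == 1 and p.divergent:
        return "Soloist"
    if p.ap2_domains == 1:
        if len(p.domain_seq) >= 19:
            r14, r19 = p.domain_seq[13], p.domain_seq[18]  # residues 14 and 19 of the domain
            if (r14, r19) == ("V", "E"):
                return "ERF family / DREB subfamily"
            if (r14, r19) == ("A", "D"):
                return "ERF family / ERF subfamily"
        return "ERF family / unassigned subfamily"
    return "not AP2/ERF"


# Placeholder domain strings carrying only the diagnostic residues.
dreb_like = "X" * 13 + "V" + "XXXX" + "E"
print(classify(DomainSummary(1, domain_seq=dreb_like)))  # ERF family / DREB subfamily
print(classify(DomainSummary(1, has_b3=True)))           # RAV
```

Real classifications usually confirm such rule-based calls with phylogenetic analysis, which is also how finer groupings such as the 15 groups mentioned above are typically defined.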
The evolution of the AP2/ERF transcription factor superfamily in Chinese cabbage resulted from genome triplication and tandem duplications. This analysis provides rich resources and opportunities for understanding crop stress tolerance mechanisms, although a comprehensive analysis of the physiological functions and biological roles of AP2/ERF superfamily genes in Chinese cabbage is still required to fully elucidate the superfamily.
Chinese cabbage; AP2/ERF; Stress tolerance; Gene expression; Interaction network; Protein annotation
Biofuels extracted from the seeds of Camelina sativa have recently been used successfully as environmentally friendly jet-fuel to reduce greenhouse gas emissions. Camelina sativa is genetically very close to Arabidopsis thaliana, and both are members of the Brassicaceae. Although public databases are currently available for some members of the Brassicaceae, such as A. thaliana, A. lyrata, Brassica napus, B. juncea and B. rapa, there are no public Expressed Sequence Tags (EST) or genomic data for Camelina sativa. In this study, a high-throughput, large-scale RNA sequencing (RNA-seq) of the Camelina sativa transcriptome was carried out to generate a database that will be useful for further functional analyses.
Illumina paired-end RNA-seq generated approximately 27 million clean reads (2.42 gigabase pairs), obtained by removing adaptors, ambiguous reads and low-quality reads from the raw reads. All of these clean reads were assembled de novo into 83,493 unigenes and 103,196 transcripts using SOAPdenovo and Trinity, respectively. The average length of the transcripts generated by Trinity was 697 bp (N50 = 976 bp), longer than the average length of the unigenes (319 bp, N50 = 346 bp). Nonetheless, the SOAPdenovo assembly produced a similar number of non-redundant hits (22,435) to the Trinity assembly (22,433) in BLASTN searches of the Arabidopsis thaliana CDS sequence database (TAIR). Four public databases, the Kyoto Encyclopedia of Genes and Genomes (KEGG), Swiss-prot, NCBI non-redundant protein (NR), and the Cluster of Orthologous Groups (COG), were used for unigene annotation; 67,791 of 83,493 unigenes (81.2%) were annotated with gene descriptions or conserved protein domains that were mapped to 25,329 non-redundant protein sequences. We mapped 27,042 of 83,493 unigenes (32.4%) to 119 KEGG metabolic pathways.
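The N50 values quoted above summarize assembly contiguity: N50 is the length L such that sequences of length L or longer together contain at least half of all assembled bases. The sketch below implements that standard definition, together with the simple proportion behind the annotation rates; the toy length list is illustrative and is not taken from the Camelina assemblies.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Length of the sequence at which the cumulative sum (longest first) reaches half the total."""
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    if total == 0:
        raise ValueError("empty assembly")
    running = 0
    for length in ordered:
        running += length
        if 2 * running >= total:  # integer comparison avoids rounding at exactly 50%
            return length
    raise AssertionError("unreachable when total > 0")


def percent(part: int, whole: int) -> float:
    """Percentage of `whole` represented by `part`."""
    return 100 * part / whole


print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))   # 8
print(round(percent(67_791, 83_493), 1))  # 81.2 (annotated unigenes)
print(round(percent(27_042, 83_493), 1))  # 32.4 (unigenes mapped to KEGG pathways)
```

Because N50 weights long sequences, an assembly can have both a higher N50 and a similar number of BLAST hits, as observed for the Trinity and SOAPdenovo assemblies.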
This is the first report of a transcriptome database for Camelina sativa, an environmentally important member of the Brassicaceae. We showed that C. sativa is closely related to Arabidopsis spp. and more distantly related to Brassica spp. Although the majority of annotated genes had high sequence identity to those of A. thaliana, a substantial proportion of disease-resistance genes (NBS-encoding LRR genes) were instead more similar to genes of other Brassicaceae, such as BrCN, BrCNL, BrNL, BrTN and BrTNL in B. rapa. As plant genomes are under long-term selection pressure from environmental stressors, the conservation of these disease-resistance genes in the C. sativa and B. rapa genomes implies that both species are exposed to threats from closely related pathogens in their natural habitats.
Brassicaceae; Camelina sativa; Transcriptome; de novo; Paired-end sequencing; NBS-LRR