The Brassica species include an important group of crops and provide opportunities for studying the evolutionary consequences of polyploidy. They are related to Arabidopsis thaliana, for which the first complete plant genome sequence was obtained and their genomes show extensive, although imperfect, conserved synteny with that of A. thaliana. A large number of EST sequences, derived from a range of different Brassica species, are available in the public database, but no public microarray resource has so far been developed for these species.
We assembled unigenes using ~800,000 EST sequences, mainly from three species: B. napus, B. rapa and B. oleracea. The assembly was conducted with the aim of co-assembling ESTs of orthologous genes (including homoeologous pairs of genes in B. napus from each of the A and C genomes), but resolving assemblies of paralogous, or paleo-homoeologous, genes (i.e. the genes related by the ancestral genome triplication observed in diploid Brassica species). 90,864 unique sequence assemblies were developed. These were incorporated into the BAC sequence annotation for the Brassica rapa Genome Sequencing Project, enabling the identification of cognate genomic sequences for a proportion of them. A 60-mer oligo microarray comprising 94,558 probes was developed using the unigene sequences. Gene expression was analysed in reciprocal resynthesised B. napus lines and the B. oleracea and B. rapa lines used to produce them. The analysis showed that significant expression could consistently be detected in leaf tissue for 35,386 unigenes. Expression was detected across all four genotypes for 27,355 unigenes, genome-specific expression patterns were observed for 7,851 unigenes and 180 unigenes displayed other classes of expression pattern. Principal component analysis (PCA) clearly resolved the individual microarray datasets for B. rapa, B. oleracea and resynthesised B. napus. Quantitative differences in expression were observed between the resynthesised B. napus lines for 98 unigenes, most of which could be classified into non-additive expression patterns, including 17 that showed cytoplasm-specific patterns. We further characterized the unigenes for which A genome-specific expression was observed and cognate genomic sequences could be identified. Ten of these unigenes were found to be Brassica-specific sequences, including two that originate from complex loci comprising gene clusters.
We succeeded in developing a Brassica community microarray resource. Although expression can be measured for the majority of unigenes across species, there were numerous probes that reported in a genome-specific manner. We anticipate that some proportion of these will represent species-specific transcripts and the remainder will be the consequence of variation of sequences within the regions represented by the array probes. Our studies demonstrated that the datasets obtained from the arrays can be used for typical analyses, including PCA and the analysis of differential expression. We have also demonstrated that Brassica-specific transcripts identified in silico in the sequence assembly of public EST database accessions are indeed reported by the array. These would not be detectable using arrays designed using A. thaliana sequences.