The whitefly Bemisia tabaci (Gennadius) is a genetically diverse complex containing some of the most destructive invasive pests of many ornamental and glasshouse crops worldwide [1]. The species complex colonizes more than 600 plant species, feeds on phloem sap, transmits many plant viruses, and promotes the growth of damaging fungi on the honeydew it excretes onto plants [3]. Recent phylogenetic analysis, combined with a pattern of reproductive isolation among genetic groups within B. tabaci, indicates that the complex contains at least 24 cryptic species, some of which have been referred to as "biotypes" over the last 20 years [7]. Because separation at the species level within the B. tabaci complex is yet to be fully resolved, we have retained the commonly used term "biotype" to link this study with the existing literature. The predominant and most damaging biotypes of B. tabaci are the B and Q biotypes [9]. While the former is known for its high fitness parameters, the Q biotype has a unique ability to develop and maintain high levels of resistance to major classes of insecticides owing to biological and genetic factors [11].
Despite the whitefly's global importance, genomic sequence resources for it are scarce, especially for the Q biotype. As of March 30, 2010, about 9110 ESTs and 762 nucleotide sequences are available in NCBI for the B biotype, whereas only 683 nucleotide sequences have been deposited for the Q biotype. Previous EST sequencing efforts for the B biotype have allowed the development of small-scale microarrays for gene expression analysis in the context of insecticide resistance and parasitoid-whitefly interactions [13]. While these studies have highlighted the utility of cDNA sequencing for candidate gene discovery in the absence of a genome sequence, a comprehensive description of the genes expressed in the insecticide-resistant Q biotype remains unavailable.
Over the past several years, next-generation sequencing technology has emerged as a cutting-edge approach for high-throughput sequence determination and has dramatically improved the efficiency and speed of gene discovery [16]. For example, Illumina sequencing technology can generate over one billion bases of high-quality DNA sequence per run at less than 1% of the cost of capillary-based methods [18]. Furthermore, next-generation sequencing has significantly accelerated gene-expression profiling and improved its sensitivity, and is expected to boost collaborative and comparative genomics studies [19]. Previously, Illumina sequencing of transcriptomes of organisms with completed genomes confirmed that the relatively short reads produced can be effectively assembled and used for gene discovery and comparison of gene expression profiles [21]. Despite their obvious potential, next-generation sequencing methods have not yet been applied to whitefly research.
In this study, we generated over three billion bases of high-quality DNA sequence with Illumina technology and demonstrated the suitability of short-read sequencing for de novo assembly and annotation of genes expressed in a eukaryote without prior genome information. In a single run, we identified 168,900 distinct sequences, including hundreds of insecticide target and metabolism genes. Furthermore, we compared the gene expression profiles of whiteflies at different developmental stages using a digital gene expression system. The assembled, annotated transcriptome sequences and gene expression profiles provide an invaluable resource for identifying whitefly genes involved in insecticide resistance, development and virus transmission.