A rich chapter in the history of insect endocrinology has focused on hormonal control of diapause, especially the major roles played by juvenile hormones (JHs), ecdysteroids, and the neuropeptides that govern JH and ecdysteroid synthesis. More recently, experiments with adult diapause in Drosophila melanogaster and the mosquito Culex pipiens, and pupal diapause in the flesh fly Sarcophaga crassipalpis provide strong evidence that insulin signaling is also an important component of the regulatory pathway leading to the diapause phenotype. Insects produce many different insulin-like peptides (ILPs), and not all are involved in the diapause response; ILP-1 appears to be the one most closely linked to diapause in C. pipiens. Many steps in the pathway leading from perception of daylength (the primary environmental cue used to program diapause) to generation of the diapause phenotype remain unknown, but the role for insulin signaling in mosquito diapause appears to be upstream of JH, as evidenced by the fact that application of exogenous JH can rescue the effects of knocking down expression of ILP-1 or the Insulin Receptor. Fat accumulation, enhancement of stress tolerance, and other features of the diapause phenotype are likely linked to the insulin pathway through the action of a key transcription factor, FOXO. This review highlights many parallels for the role of insulin signaling as a regulator in insect diapause and dauer formation in the nematode Caenorhabditis elegans.
diapause; dauer; insulin signaling; FOXO; Culex pipiens
Life-history plasticity is widespread among organisms. However, an important question is whether this plasticity is adaptive, enhancing the organism’s fitness. Most models for plasticity in life-history timing predict that once they have reached the minimal nutritional threshold animals under poor conditions will accelerate timing to development or reproduction. Adaptive delays in reproductive timing are not common, especially in short-lived species. Examples of adaptive reproductive delays exist in mammalian populations experiencing strong interspecific (e.g. predation) and intraspecific (e.g. infanticide) competition. But are there other environmental factors that may trigger an adaptive delay in reproductive timing? We show that the short-lived flesh fly Sarcophaga crassipalpis will delay reproductive timing under nutrient poor conditions, even though it has already met the minimal nutritional threshold for reproduction. We test if this delay strategy is consistent with an adaptive response allowing the scavenger time to locate more resources by providing additional protein pulses (early, mid and late) throughout the reproductive delay period. Flies receiving additional protein produced more eggs and larger eggs, demonstrating a benefit of the delay. In addition, by tracking the allocation of carbon from the pulses using stable isotopes, we show that flies receiving earlier pulses incorporated more carbon into eggs and somatic tissue than those provided a later pulse. These results indicate that the reproductive delay in S. crassipalpis is consistent with adaptive post-threshold plasticity, a nutritionally-linked reproductive strategy that has not been previously reported in an invertebrate species.
resource allocation; phenotypic plasticity; stable isotopes; adaptation; evolutionary physiology
Aggression, costly in both time and energy, is often expressed by male animals in defense of valuable resources such as food or potential mates. Here we present a new insect model system for the study of aggression, the male flesh fly Sarcophaga crassipalpis, and ask whether there is an ontogeny of aggression that coincides with reproductive maturity. After establishing that reproductive maturity occurs by day 3 of age (post-eclosion), we examined the behavior of socially isolated males from different age cohorts (days 1, 2, 3, 4, and 6) upon introduction, in a test arena, with another male of the same age. The results show a pronounced development of aggression with age. The change from relative indifference to heightened aggression involves a profound increase in the frequency of high-intensity aggressive behaviors between days 1 and 3. Also noteworthy is an abrupt increase in the number of statistically significant transitions involving these full-contact agonistic behaviors on day 2. This elevated activity is trimmed back somewhat by day 3 and appears to maintain a stable plateau thereafter. No convincing evidence was found for escalation of aggression nor the establishment of a dominance relationship over the duration of the encounters. Despite the fact that aggressive interactions are brief, lasting only a few seconds, a major reorganization in the relative proportions of four major non-aggressive behaviors (accounting for at least 96% of the total observation time for each age cohort) accompanies the switch from low to high aggression. A series of control experiments, with single flies in the test arenas, indicates that these changes occur in the absence of the performance of aggressive behaviors. This parallel ontogeny of aggressive and non-aggressive behaviors has implications for understanding how the entire behavioral repertoire may be organized and reorganized to accommodate the needs of the organism.
Body condition affects the timing and magnitude of life history transitions. Therefore, identifying proximate mechanisms involved in assessing condition is critical to understanding how these mechanisms affect the expression of life history plasticity. Nutrient storage is an important body condition parameter, likely playing roles in both attaining minimum body-condition thresholds for life history transitions and expression of life history traits. We manipulated protein availability for females of the flesh fly Sarcophaga crassipalpis to determine whether reproductive timing and output would remain plastic or become fixed. Liver was provided for 0, 2, 4, or 6 days of adult pre-reproductive development. Significantly, liver was removed after the feeding threshold had been attained and females had committed to producing a clutch. We also identified the major storage proteins and monitored their abundances, because protein stores may serve as an index of body condition and therefore may play an important role in life history transitions and plasticity. Flesh flies showed clear post-threshold plasticity in reproductive timing. Females fed protein for 2 days took ~30% longer to provision their clutch than those fed for 4 or 6 days. Observations of oogenesis showed the 2-day group expressed a different developmental program including slower egg provisioning. Protein availability also affected reproductive output. Females fed protein for 2 days produced ~20% fewer eggs than females fed 4 or 6 days. Six-day treated females provisioned larger eggs than 4-day treated females, followed by 2-day treated females with the smallest eggs. Two storage proteins were identified, LSP-1 and LSP-2. LSP-2 accumulation differed across feeding treatments.
The 2- and 4-day treatment groups accumulated LSP-2 stores but depleted them during provisioning of the first clutch, whereas the 6-day group accumulated the greatest quantity of LSP-2 and had substantial LSP-2 stores remaining at the end of the clutch. This pattern of accumulation and depletion suggests that LSP-2 could play roles in both provisioning the current clutch and future clutches, making it a good candidate molecule for affecting reproductive timing and allotment. LSP-1 was not associated with post-threshold plasticity; it was carried over from larval feeding into adulthood and depleted uniformly across all feeding groups.
reproductive timing; reproductive threshold; hexameric storage protein; phenotypic plasticity
The full power of modern genetics has been applied to the study of speciation in only a small handful of genetic model species - all of which speciated allopatrically. Here we report the first large expressed sequence tag (EST) study of a candidate for ecological sympatric speciation, the apple maggot Rhagoletis pomonella, using massively parallel pyrosequencing on the Roche 454-FLX platform. To maximize transcript diversity we created and sequenced separate libraries from larvae, pupae, adult heads, and headless adult bodies.
We obtained 239,531 sequences which assembled into 24,373 contigs. A total of 6810 unique protein coding genes were identified among the contigs and long singletons, corresponding to 48% of all known Drosophila melanogaster protein-coding genes. Their distribution across GO classes suggests that we have obtained a representative sample of the transcriptome. Among these sequences are many candidates for potential R. pomonella "speciation genes" (or "barrier genes") such as those controlling chemosensory and life-history timing processes. Furthermore, we identified important marker loci including more than 40,000 single nucleotide polymorphisms (SNPs) and over 100 microsatellites. An initial search for SNPs at which the apple and hawthorn host races differ suggested at least 75 loci warranting further work. We also determined that developmental expression differences remained even after normalization; transcripts expected to show different expression levels between larvae and pupae in D. melanogaster also did so in R. pomonella. Preliminary comparative analysis of transcript presences and absences revealed evidence of gene loss in Drosophila and gain in the higher dipteran clade Schizophora.
These data provide a much needed resource for exploring mechanisms of divergence in this important model for sympatric ecological speciation. Our description of ESTs from a substantial portion of the R. pomonella transcriptome will facilitate future functional studies of candidate genes for olfaction and diapause-related life history timing, and will enable large scale expression studies. Similarly, the identification of new SNP and microsatellite markers will facilitate future population and quantitative genetic studies of divergence between the apple and hawthorn-infesting host races.
The aim of this study was to determine the development time and thermal requirements of three myiasis flies: Chrysomya albiceps, Lucilia sericata, and Sarcophaga sp.
The rate of development (ROD) and accumulated degree days (ADD) of three important forensic flies in Iran, Chrysomya albiceps, Lucilia sericata, and Sarcophaga sp., were calculated by rearing individuals at a single constant temperature (28 °C) and applying specific formulae to four developmental events: egg hatching, larval stages, pupation, and eclosion.
Rates of development decreased step by step as the flies grew from egg to larva and then to the adult stage; however, the rate was higher for the blowflies (C. albiceps and L. sericata) than for the flesh fly Sarcophaga sp. Egg hatching, larval stages, and pupation took about one fourth and half of the time of the total pre-adult development time for all of the three species. In general, the flesh fly Sarcophaga sp. required more heat for development than the blowflies. The thermal constants (K) were 130–195, 148–222, and 221–323 degree-days (DD) for egg hatching to adult stages of C. albiceps, L. sericata, and Sarcophaga sp., respectively.
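The degree-day reasoning behind these thermal constants can be illustrated with a short sketch. It assumes the commonly used linear model, in which accumulated degree days equal the rearing temperature minus a lower developmental threshold, multiplied by the development time in days, and it takes the rate of development as the reciprocal of development time. The abstract does not state the formulae, thresholds, or durations actually used, so the 10 °C threshold and the 12-day duration below are hypothetical; only the 28 °C rearing temperature comes from the study.

```python
def accumulated_degree_days(temp_c, base_temp_c, days):
    """Linear degree-day model: heat above the threshold, summed over days."""
    effective = max(temp_c - base_temp_c, 0.0)  # no accumulation below threshold
    return effective * days


def rate_of_development(days):
    """Rate of development taken as the reciprocal of development time (per day)."""
    return 1.0 / days


# Hypothetical example: 28 °C rearing (from the study), an ASSUMED 10 °C lower
# threshold and an ASSUMED 12-day egg-to-adult duration.
add = accumulated_degree_days(28.0, 10.0, 12.0)
rod = rate_of_development(12.0)
print(f"ADD = {add:.1f} degree-days, ROD = {rod:.3f} per day")
```

Under this model a range of thermal constants for one species, as reported above, would follow from evaluating the same development time against different assumed thresholds.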
This is the first report on the thermal requirements of three forensic flies in Iran. The data of this study provide preliminary information for forensic entomologists to estimate the postmortem interval (PMI) in the study area.
Degree Day; Forensic Entomology; Larval development; Myiasis; PMI
Horned beetles, in particular in the genus Onthophagus, are important models for studies on sexual selection, biological radiations, the origin of novel traits, developmental plasticity, biocontrol, conservation, and forensic biology. Despite their growing prominence as models for studying both basic and applied questions in biology, little genomic or transcriptomic data are available for this genus. We used massively parallel pyrosequencing (Roche 454-FLX platform) to produce a comprehensive EST dataset for the horned beetle Onthophagus taurus. To maximize sequence diversity, we pooled RNA extracted from a normalized library encompassing diverse developmental stages and both sexes.
We used 454 pyrosequencing to sequence ESTs from all post-embryonic stages of O. taurus. Approximately 1.36 million reads assembled into 50,080 non-redundant sequences encompassing a total of 26.5 Mbp. The non-redundant sequences match over half of the genes in Tribolium castaneum, the most closely related species with a sequenced genome. Analyses of Gene Ontology annotations and biochemical pathways indicate that the O. taurus sequences reflect a wide and representative sampling of biological functions and biochemical processes. An analysis of sequence polymorphisms revealed that SNP frequency was negatively related to overall expression level and the number of tissue types in which a given gene is expressed. The most variable genes were enriched for a limited number of GO annotations whereas the least variable genes were enriched for a wide range of GO terms directly related to fitness.
This study provides the first large-scale EST database for horned beetles, a much-needed resource for advancing the study of these organisms. Furthermore, we identified instances of gene duplications and alternative splicing, useful for future study of gene regulation, and a large number of SNP markers that could be used in population-genetic studies of O. taurus and possibly other horned beetles.
The genus Silene is widely used as a model system for addressing ecological and evolutionary questions in plants, but advances in using the genus as a model system are impeded by the lack of available resources for studying its genome. Massively parallel sequencing of cDNA has recently developed into an efficient method for characterizing the transcriptomes of non-model organisms, generating massive amounts of data that enable the study of multiple species in a comparative framework. The sequences generated provide an excellent resource for identifying expressed genes, characterizing functional variation and developing molecular markers, thereby laying the foundations for future studies on gene sequence and gene expression divergence. Here, we report the results of a comparative transcriptome sequencing study of eight individuals representing four Silene species and one Dianthus species as an outgroup. All sequences and annotations have been deposited in a newly developed and publicly available database called SiESTa, the Silene EST annotation database.
A total of 1,041,122 EST reads were generated in two runs on a Roche GS-FLX 454 pyrosequencing platform. EST reads were analyzed separately for all eight individuals sequenced and were assembled into contigs using TGICL. These were annotated with results from BLASTX searches and Gene Ontology (GO) terms, and thousands of single-nucleotide polymorphisms (SNPs) were characterized. Unassembled reads were kept as singletons and together with the contigs contributed to the unigenes characterized in each individual. The high quality of unigenes is evidenced by the proportion (49%) that have significant hits in similarity searches with the A. thaliana proteome. The SiESTa database is accessible at http://www.siesta.ethz.ch.
The sequence collections established in the present study provide an important genomic resource for four Silene and one Dianthus species and will help to further develop Silene as a plant model system. The genes characterized will be useful for future research not only in the species included in the present study, but also in related species for which no genomic resources are yet available. Our results demonstrate the efficiency of massively parallel transcriptome sequencing in a comparative framework as an approach for developing genomic resources in diverse groups of non-model organisms.
cDNA library; database; EST; SNP; Silene
Plants of the Huperziaceae family, which comprise the two genera Huperzia and Phlegmariurus, produce various types of lycopodium alkaloids that are used to treat a number of human ailments, such as contusions, swellings and strains. Huperzine A, which belongs to the lycodine type of lycopodium alkaloids, has been used as an anti-Alzheimer's disease drug candidate. Despite their medical importance, little genomic or transcriptomic data are available for the members of this family. We used massive parallel pyrosequencing on the Roche 454-GS FLX Titanium platform to generate a substantial EST dataset for Huperzia serrata (H. serrata) and Phlegmariurus carinatus (P. carinatus) as representative members of the Huperzia and Phlegmariurus genera, respectively. H. serrata and P. carinatus are important plants for research on the biosynthesis of lycopodium alkaloids. We focused on gene discovery in the areas of bioactive compound biosynthesis and transcriptional regulation as well as genetic marker detection in these species.
For H. serrata, 36,763 unique putative transcripts were generated from 140,930 reads totaling over 57,028,559 base pairs; for P. carinatus, 31,812 unique putative transcripts were generated from 79,920 reads totaling over 30,498,684 base pairs. Using BLASTX searches of public databases, 16,274 (44.3%) unique putative transcripts from H. serrata and 14,070 (44.2%) from P. carinatus were assigned to at least one protein. Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) orthology annotations revealed that the functions of the unique putative transcripts from these two species cover a similarly broad set of molecular functions, biological processes and biochemical pathways.
In particular, a total of 20 H. serrata candidate cytochrome P450 genes, which are more abundant in leaves than in roots and might be involved in lycopodium alkaloid biosynthesis, were found based on the comparison of H. serrata and P. carinatus 454-ESTs and real-time PCR analysis. Four unique putative CYP450 transcripts (Hs01891, Hs04010, Hs13557 and Hs00093) which are the most likely to be involved in the biosynthesis of lycopodium alkaloids were selected based on a phylogenetic analysis. Approximately 115 H. serrata and 98 P. carinatus unique putative transcripts associated with the biosynthesis of triterpenoids, alkaloids and flavones/flavonoids were located in the 454-EST datasets. Transcripts related to phytohormone biosynthesis and signal transduction as well as transcription factors were also obtained. In addition, we discovered 2,729 and 1,573 potential SSR-motif microsatellite loci in the H. serrata and P. carinatus 454-ESTs, respectively.
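The abstract does not say which tool or thresholds were used to detect the SSR loci. As a rough illustration of what such a search involves, the following sketch scans a sequence for perfect tandem repeats of 2 to 6 bases with a regular expression. The minimum repeat counts in `MIN_REPEATS` and the function name `find_ssrs` are assumptions made for this example, not parameters from the study.

```python
import re

# ASSUMED minimum repeat counts per motif length (not taken from the study).
MIN_REPEATS = {2: 6, 3: 5, 4: 4, 5: 4, 6: 4}


def find_ssrs(seq):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs in seq."""
    seq = seq.upper()
    hits = []
    for motif_len, min_rep in MIN_REPEATS.items():
        # One motif of motif_len bases followed by at least min_rep - 1 more copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (motif_len, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are a shorter unit repeated (e.g. "AA" or "ACAC").
            if any(motif == motif[:k] * (motif_len // k)
                   for k in range(1, motif_len) if motif_len % k == 0):
                continue
            hits.append((m.start(), motif, len(m.group(0)) // motif_len))
    return sorted(hits)


# Toy sequence containing seven copies of the dinucleotide "AC".
print(find_ssrs("GG" + "AC" * 7 + "TT"))
```

Dedicated SSR tools additionally handle compound and imperfect repeats, which this sketch ignores.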
The 454-EST resource allowed for the first large-scale acquisition of ESTs from H. serrata and P. carinatus, which are representative members of the Huperziaceae family. We discovered many genes likely to be involved in the biosynthesis of bioactive compounds and transcriptional regulation as well as a large number of potential microsatellite markers. These results constitute an essential resource for understanding the molecular basis of developmental regulation and secondary metabolite biosynthesis (especially that of lycopodium alkaloids) in the Huperziaceae, and they provide an overview of the genetic diversity of this family.
The genetic basis of host preference has been investigated in only a few species. It is relevant to important questions in evolutionary biology, including sympatric speciation, generalist versus specialist adaptation, and parasite-host co-evolution. Here we show that a major locus strongly influences host preference in Nasonia. Nasonia are parasitic wasps that utilize fly pupae; N. vitripennis is a generalist that parasitizes a diverse set of hosts whereas N. giraulti specializes on Protocalliphora (bird blowflies). In laboratory choice experiments using Protocalliphora and Sarcophaga (flesh flies), N. vitripennis shows a preference for Sarcophaga while N. giraulti shows a preference for Protocalliphora. Through a series of interspecies crosses we have introgressed a major locus affecting host preference from N. giraulti into N. vitripennis. The N. giraulti allele is dominant and greatly increases preference for Protocalliphora pupae in the introgression line relative to the recessive N. vitripennis allele. Through the utilization of a Nasonia genotyping microarray, we have identified the introgressed region as 16 megabases of chromosome 4, although a more complete analysis is necessary to determine the exact genetic architecture of host preference in the genus. To our knowledge, this is the first introgression of the host preference of one parasitoid species into another, as well as one of the few cases of introgression of a behavioral gene between species.
host preference; genetic basis; parasitic wasps; Nasonia; generalists; specialists
Cucumber, Cucumis sativus L., is an economically and nutritionally important crop of the Cucurbitaceae family and has long served as a primary model system for sex determination studies. Recently, the sequencing of its whole genome has been completed. However, transcriptome information of this species is still scarce, with a total of around 8,000 Expressed Sequence Tag (EST) and mRNA sequences currently available in GenBank. In order to gain more insights into molecular mechanisms of plant sex determination and provide the community a functional genomics resource that will facilitate cucurbit research and breeding, we performed transcriptome sequencing of cucumber flower buds of two near-isogenic lines, WI1983G, a gynoecious plant which bears only pistillate flowers, and WI1983H, a hermaphroditic plant which bears only bisexual flowers.
Using Roche-454 massively parallel pyrosequencing technology, we generated a total of 353,941 high quality EST sequences with an average length of 175 bp, among which 188,255 were from gynoecious flowers and 165,686 from hermaphroditic flowers. These EST sequences, together with ~5,600 high quality cucumber EST and mRNA sequences available in GenBank, were clustered and assembled into 81,401 unigenes, of which 28,452 were contigs and 52,949 were singletons. The unigenes and ESTs were further mapped to the cucumber genome and more than 500 alternative splicing events were identified in 443 cucumber genes. The unigenes were further functionally annotated by comparing their sequences to different protein and functional domain databases and were assigned Gene Ontology (GO) terms. A biochemical pathway database containing 343 predicted pathways was also created based on the annotations of the unigenes. Digital expression analysis identified ~200 differentially expressed genes between flowers of WI1983G and WI1983H and provided novel insights into the molecular mechanisms of the plant sex determination process. Furthermore, a set of SSR motifs and high confidence SNPs between WI1983G and WI1983H were identified from the ESTs, which provided the material basis for future genetic linkage and QTL analysis.
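Digital expression analysis compares how many ESTs each unigene contributes to each library. The abstract does not describe the normalization or statistical test used, so the sketch below shows only one simple, assumed approach: scale counts to ESTs per million using the two library sizes reported above, then take a log2 ratio with a pseudocount. The function names and the example counts are hypothetical.

```python
import math

# Library sizes reported in the abstract (high-quality ESTs per flower type).
GYNOECIOUS_TOTAL = 188_255
HERMAPHRODITE_TOTAL = 165_686


def per_million(count, library_total):
    """Scale a raw EST count to ESTs per million sequenced in that library."""
    return count * 1_000_000 / library_total


def log2_fold_change(count_g, count_h, pseudocount=1.0):
    """log2 ratio of normalized abundance (gynoecious / hermaphroditic).

    The pseudocount (an ASSUMPTION of this sketch) avoids division by zero
    for unigenes absent from one library.
    """
    g = per_million(count_g + pseudocount, GYNOECIOUS_TOTAL)
    h = per_million(count_h + pseudocount, HERMAPHRODITE_TOTAL)
    return math.log2(g / h)


# Hypothetical unigene with 40 ESTs in one library and 9 in the other.
print(round(log2_fold_change(40, 9), 2))
```

A real analysis would pair such a ratio with a significance test and a multiple-testing correction before calling a gene differentially expressed.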
A large set of EST sequences was generated from cucumber flower buds of two different sex types. Differentially expressed genes between flowers of these two sex types, as well as putative SSR and SNP markers, were identified. These EST sequences provide valuable information for further understanding the molecular mechanisms of the plant sex determination process and form a rich resource for future functional genomics analysis, marker development and cucumber breeding.
White mold, caused by Sclerotinia sclerotiorum, is one of the most important diseases of pea (Pisum sativum L.); however, little is known about the genetics and biochemistry of this interaction. Identification of genes underlying resistance in the host or pathogenicity and virulence factors in the pathogen will increase our knowledge of the pea-S. sclerotiorum interaction and facilitate the introgression of new resistance genes into commercial pea varieties. Although the S. sclerotiorum genome sequence is available, no pea genome is available, due in part to its large genome size (~3500 Mb) and extensive repeated motifs. Here we present an EST data set specific to the interaction between S. sclerotiorum and pea, and a method to distinguish pathogen and host sequences without a species-specific reference genome.
A total of 10,158 contigs were obtained by de novo assembly of 128,720 high-quality reads generated by 454 pyrosequencing of the pea-S. sclerotiorum interactome. A method based on the tBLASTx program was modified to distinguish pea and S. sclerotiorum ESTs. To test this strategy, a mixture of known ESTs (18,490 pea and 17,198 S. sclerotiorum ESTs) from public databases was pooled and parsed; the tBLASTx method successfully separated 90.1% of the artificial EST mix with 99.9% accuracy. The tBLASTx method successfully parsed 89.4% of the 454-derived EST contigs, as validated by PCR, into pea (6,299 contigs) and S. sclerotiorum (2,780 contigs) categories. Two thousand eight hundred and forty pea ESTs and 996 S. sclerotiorum ESTs were predicted to be expressed specifically during the pea-S. sclerotiorum interaction as determined by homology search against 81,449 pea ESTs (from flowers, leaves, cotyledons, epi- and hypocotyl, and etiolated and light treated etiolated seedlings) and 57,751 S. sclerotiorum ESTs (from mycelia at neutral pH, developing apothecia and developing sclerotia). Among those ESTs specifically expressed, 277 (9.8%) pea ESTs were predicted to be involved in plant defense and response to biotic or abiotic stress, and 93 (9.3%) S. sclerotiorum ESTs were predicted to be involved in pathogenicity/virulence. Additionally, 142 S. sclerotiorum ESTs were identified as secretory/signal peptides, of which only 21 were previously reported.
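The abstract does not spell out the decision rule used to parse contigs. One plausible way to sort mixed host and pathogen sequences from similarity searches is to compare each contig's best hit against a plant reference set with its best hit against a fungal reference set. The sketch below illustrates that idea only; the function `assign_origin`, the score thresholds, and the example bit scores are all hypothetical and are not taken from the study.

```python
def assign_origin(host_score, pathogen_score, min_score=50.0, min_margin=10.0):
    """Label a contig from its best bit scores against two reference sets.

    Scores are None when no hit was found. The thresholds are illustrative
    ASSUMPTIONS, not values from the study.
    """
    host = host_score or 0.0
    pathogen = pathogen_score or 0.0
    if max(host, pathogen) < min_score:
        return "unassigned"  # no convincing hit in either reference set
    if abs(host - pathogen) < min_margin:
        return "ambiguous"   # hits too similar to separate (e.g. conserved genes)
    return "pea" if host > pathogen else "S. sclerotiorum"


# Hypothetical best-hit bit scores: contig -> (vs. plant set, vs. fungal set).
best_hits = {
    "contig_1": (310.0, 42.0),
    "contig_2": (None, 128.5),
    "contig_3": (95.0, 91.0),
    "contig_4": (None, None),
}
labels = {name: assign_origin(*scores) for name, scores in best_hits.items()}
print(labels)
```

Keeping explicit "ambiguous" and "unassigned" classes mirrors the fact that not every contig could be parsed, as the percentages above indicate.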
We present and characterize an EST resource specific to the pea-S. sclerotiorum interaction. Additionally, the tBLASTx method used to parse S. sclerotiorum and pea ESTs was demonstrated to be a reliable and accurate method to distinguish ESTs without a reference genome.
Pisum sativum; Sclerotinia sclerotiorum; Transcriptome; Parsing of host-pathogen sequences; Non-model organism
Anopheles sinensis is a major malaria vector in China and other Southeast Asian countries, and it is becoming increasingly resistant to the insecticides used for agriculture, net impregnation, and indoor residual spray. Very limited genomic information on this species is available, which has hindered the development of new tools for resistance surveillance and vector control. We used the 454 GS FLX system and generated expressed sequence tag (EST) databases of various life stages of An. sinensis, and we determined the transcriptional differences between deltamethrin resistant and susceptible mosquitoes.
The 454 GS FLX transcriptome sequencing yielded a total of 624,559 reads (average length of 290 bp) from pooled An. sinensis mosquitoes across various developmental stages. The de novo assembly generated 33,411 contigs with an average length of 493 bp. A total of 8,057 ESTs were generated with Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) annotation. A total of 2,131 ESTs were differentially expressed between deltamethrin resistant and susceptible mosquitoes collected from the same field site in Jiangsu, China. Among these differentially expressed ESTs, a total of 294 pathways were mapped to the KEGG database, with the predominant ESTs belonging to metabolic pathways. Furthermore, a total of 2,408 microsatellites and 15,496 single nucleotide polymorphisms (SNPs) were identified.
The annotated EST and transcriptome databases provide a valuable genomic resource for further genetic studies of this important malaria vector species. The differentially expressed ESTs associated with insecticide resistance identified in this study lay an important foundation for further functional analysis. The identified microsatellite and SNP markers will provide useful tools for future population genetic and comparative genomic analyses of malaria vectors.
Electronic supplementary material
The online version of this article (doi:10.1186/1471-2164-15-448) contains supplementary material, which is available to authorized users.
Transcriptome; Expressed sequence tag; Pyrethroid resistance; Gene expression; Anopheles sinensis
Expressed Sequence Tags (ESTs) have played significant roles in gene discovery and gene functional analysis, especially for non-model organisms. For organisms with no full genome sequences available, ESTs are normally assembled into longer consensus sequences for further downstream analysis. However, current de novo EST assembly programs often generate a large number of assembly errors that negatively affect the downstream analysis. In order to generate more accurate consensus sequences from ESTs, tools are needed to reduce or eliminate errors from de novo assemblies.
We present iAssembler, a pipeline that can assemble large-scale ESTs into consensus sequences with significantly higher accuracy than existing assemblers. iAssembler employs the MIRA and CAP3 assemblers to generate initial assemblies, followed by identifying and correcting two common types of transcriptome assembly errors: 1) ESTs from different transcripts (mainly alternatively spliced transcripts or paralogs) are incorrectly assembled into the same contigs; and 2) ESTs from the same transcripts fail to be assembled together. iAssembler can be used to assemble ESTs generated using the traditional Sanger method and/or the Roche-454 massively parallel pyrosequencing technology.
We compared the performance of iAssembler and several other de novo EST assembly programs using both Roche-454 and Sanger EST datasets. The comparison demonstrated that iAssembler generated significantly more accurate consensus sequences than the other assembly programs.
Cultivated watermelon [Citrullus lanatus (Thunb.) Matsum. & Nakai var. lanatus] is an important agricultural crop worldwide. The fruit of watermelon undergoes distinct stages of development with dramatic changes in its size, color, sweetness, texture and aroma. In order to better understand the genetic and molecular basis of these changes and significantly expand the watermelon transcript catalog, we have selected four critical stages of watermelon fruit development and used Roche/454 next-generation sequencing technology to generate a large expressed sequence tag (EST) dataset and a comprehensive transcriptome profile for watermelon fruit flesh tissues.
We performed half a Roche/454 GS-FLX run for each of the four watermelon fruit developmental stages (immature white, white-pink flesh, red flesh and over-ripe) and obtained 577,023 high quality ESTs with an average length of 302.8 bp. De novo assembly of these ESTs together with 11,786 watermelon ESTs collected from GenBank produced 75,068 unigenes with a total length of approximately 31.8 Mb. Overall, 54.9% of the unigenes showed significant similarities to known sequences in the GenBank non-redundant (nr) protein database and around two-thirds of them matched proteins of cucumber, the most closely-related species with a sequenced genome. The unigenes were further assigned gene ontology (GO) terms and mapped to biochemical pathways. More than 5,000 SSRs were identified from the EST collection. Furthermore, we carried out digital gene expression analysis of these ESTs and identified 3,023 genes that were differentially expressed during watermelon fruit development and ripening, which provided novel insights into watermelon fruit biology and a comprehensive resource of candidate genes for future functional analysis. We then generated profiles of several interesting metabolites that are important to fruit quality including pigmentation and sweetness. Integrative analysis of metabolite and digital gene expression profiles helped elucidate molecular mechanisms governing these important quality-related traits during watermelon fruit development.
We have generated a large collection of watermelon ESTs, which represents a significant expansion of the current transcript catalog of watermelon and a valuable resource for future studies on the genomics of watermelon and other closely-related species. Digital expression analysis of this EST collection allowed us to identify a large set of genes that were differentially expressed during watermelon fruit development and ripening, which provide a rich source of candidates for future functional analysis and represent a valuable increase in our knowledge base of watermelon fruit biology.
Contemporary studies in forensic entomology exhaustively evaluate gene sequences because these constitute the fastest and most accurate method of species identification. For this purpose, single gene segments, cytochrome oxidase subunit I (COI) in particular, are commonly used. However, the limitations of such sequences in identification, especially of closely related species and populations, demand a multi-gene approach. This raises the question of which group of genes can best fulfill the identification task. In this context, the utility of five gene segments was explored among blowfly species from two distinct geographic regions, China and Pakistan. COI, cytochrome b (CYTB), NADH dehydrogenase 5 (ND5), and the nuclear internal transcribed spacers (ITS1 and ITS2) were sequenced for eight blowfly species including Chrysomya megacephala F. (Diptera: Calliphoridae), Ch. pinguis Walker, Lucilia sericata Meigen, L. porphyrina Walker, L. illustris Meigen, Hemipyrellia ligurriens Wiedemann, Aldrichina grahami Aldrich, and the housefly, Musca domestica L. (Muscidae), from Hangzhou, China, while COI, CYTB, and ITS2 were sequenced for four species, i.e. Ch. megacephala, Ch. rufifacies, L. cuprina, and the flesh fly, Sarcophaga albiceps Meigen (Sarcophagidae), from Dera Ismail Khan, Pakistan. The results demonstrate a universal utility of these gene segments in the molecular identification of flies of forensic importance.
Calliphoridae; cytochrome b; molecular identification; NADH dehydrogenase 5; nuclear internal transcribed spacers
The striped venus clam Chamelea gallina fishery is among the oldest and largest in the Mediterranean Sea, particularly in the inshore waters of the northern Adriatic Sea. The high fishing pressure has led to a strong decline in stock abundance, worsened by several irregular mortality events. The nearly complete lack of molecular characterization limits the available genetic resources for C. gallina. We produced the first transcriptome of this species with the aim of identifying an informative set of expressed genes, potential markers to assess the genetic structure of natural populations, and molecular resources for detecting pathogenic contamination.
The 454-pyrosequencing of a normalized cDNA library from a pool of C. gallina adult individuals yielded 298,494 raw reads. Successive steps of read assembly and filtering produced 36,422 high-quality contigs, one half of which (18,196) were annotated by similarity. A total of 111 microsatellites and 20,377 putative SNPs were identified. A panel of 13 polymorphic transcript-linked microsatellites was developed and their variability assessed in 12 individuals. Remarkably, a scan for contaminating sequences of infectious origin indicated the presence of several Vibrionales species reported to be among the most frequent clam pathogens. Results reported in this study were included in a dedicated database available at http://compgen.bio.unipd.it/chameleabase.
This study represents the first attempt to sequence and de novo annotate the transcriptome of the clam C. gallina. The availability of this transcriptome opens new perspectives in the study of the biochemical and physiological roles of gene products and their responses to large- and small-scale environmental stress in C. gallina, with high-throughput experiments such as custom microarrays or targeted re-sequencing. Molecular markers, such as the already optimized EST-linked microsatellites and the discovered SNPs, will be useful to estimate the effects of demographic processes and to detect minute levels of population structuring.
The combination of high-throughput transcript profiling and next-generation sequencing technologies is a prerequisite for genome-wide comprehensive transcriptome analysis. Our recent innovation of deepSuperSAGE is based on an advanced SuperSAGE protocol and its combination with massively parallel pyrosequencing on Roche's 454 sequencing platform. As a demonstration of the power of this combination, we have chosen the salt stress transcriptomes of roots and nodules of the third most important legume crop chickpea (Cicer arietinum L.). While our report is more technology-oriented, it nevertheless addresses a major world-wide problem for crops generally: high salinity. Together with low temperatures and water stress, high salinity is responsible for crop losses of millions of tons of various legume (and other) crops. Continuously deteriorating environmental conditions will combine with salinity stress to further compromise crop yields. As a good example for such stress-exposed crop plants, we started to characterize salt stress responses of chickpeas on the transcriptome level.
We used deepSuperSAGE to detect early global transcriptome changes in salt-stressed chickpea. The salt stress responses of 86,919 transcripts representing 17,918 unique 26 bp deepSuperSAGE tags (UniTags) from roots of the salt-tolerant variety INRAT-93 two hours after treatment with 25 mM NaCl were characterized. Additionally, the expression of 57,281 transcripts representing 13,115 UniTags was monitored in nodules of the same plants. From a total of 144,200 analyzed 26 bp tags in roots and nodules together, 21,401 unique transcripts were identified. Of these, only 363 and 106 specific transcripts, respectively, were commonly up- or down-regulated (>3.0-fold) under salt stress in both organs, indicating a differential, organ-specific response to stress.
Building on recent pioneering work on massive cDNA sequencing in chickpea, more than 9,400 UniTags could be linked to UniProt entries. Additionally, over-representation analysis of gene ontology (GO) categories enabled us to filter out enriched biological processes among the differentially expressed UniTags. Subsequently, the gathered information was cross-checked against stress-related pathways.
From several filtered pathways, we focus here, by way of example, on transcripts associated with the generation and scavenging of reactive oxygen species (ROS), as well as on transcripts involved in Na+ homeostasis. Although both processes are already very well characterized in other plants, the information generated in the present work is of high value. Information on expression profiles and sequence similarity for several hundred transcripts of potential interest is now available.
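The >3.0-fold criterion used above can be illustrated with a minimal sketch. It assumes that raw tag counts are scaled to tags per million before comparison and that a pseudocount of 1 guards against division by zero; the function names, the pseudocount and the toy counts are hypothetical and are not taken from the study.

```python
def tags_per_million(count, library_size):
    """Scale a raw tag count to tags per million for a library of the given size."""
    return count * 1_000_000 / library_size


def fold_change(control, treated, control_size, treated_size, pseudocount=1.0):
    """Signed fold change of treated versus control after scaling.

    Positive values indicate up-regulation, negative values down-regulation.
    The pseudocount is an assumption that avoids division by zero.
    """
    c = tags_per_million(control, control_size) + pseudocount
    t = tags_per_million(treated, treated_size) + pseudocount
    return t / c if t >= c else -(c / t)


def classify(fc, threshold=3.0):
    """Label a signed fold change using a symmetric threshold (3.0-fold here)."""
    if fc > threshold:
        return "up"
    if fc < -threshold:
        return "down"
    return "unchanged"


# Toy counts for one UniTag in two libraries of 100,000 tags each (hypothetical).
label = classify(fold_change(10, 40, 100_000, 100_000))
```

A tag would count as commonly regulated in both organs only if this label agreed between the root and the nodule comparison.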
This report demonstrates that the combination of the high-throughput transcriptome profiling technology SuperSAGE with one of the next-generation sequencing platforms allows deep insights into the first molecular reactions of a plant exposed to salinity. Cross-validation with recent reports enriched the information about the salt stress dynamics of more than 9,000 chickpea ESTs, and enlarged their pool of alternative transcript isoforms.
As an example of the high resolution of the employed technology, which we term deepSuperSAGE, we demonstrate that ROS-scavenging and -generating pathways undergo strong global transcriptome changes in chickpea roots and nodules as early as 2 hours after the onset of moderate salt stress (25 mM NaCl). Additionally, a set of more than 15 candidate transcripts is proposed as potential components of the salt overly sensitive (SOS) pathway in chickpea.
Newly identified transcript isoforms are potential targets for breeding novel cultivars with high salinity tolerance. We demonstrate that these targets can be integrated into breeding schemes by micro-arrays and RT-PCR assays downstream of the generation of 26 bp tags by SuperSAGE.
Yellow lupin (Lupinus luteus L.) is a minor legume crop characterized by its high seed protein content. Although it is grown in several temperate countries, its orphan condition has limited the generation of genomic tools to aid breeding efforts to improve yield and nutritional quality. In this study, we constructed 454 expressed sequence tag (EST) libraries, carried out comparative studies between L. luteus and model legume species, developed a comprehensive set of EST-simple sequence repeat (SSR) markers, and validated their utility in diversity studies and their transferability to related species.
Two runs of 454 pyrosequencing yielded 205 Mb and 530 Mb of sequence data for the L1 (young leaves, buds and flowers) and L2 (immature seeds) EST libraries. A combined assembly (L1L2) yielded 71,655 contigs with an average contig length of 632 nucleotides. L1L2 contigs were clustered into 55,309 isotigs. 38,200 isotigs translated into proteins and 8,741 of them were full length. Around 57% of L. luteus sequences had significant similarity with at least one sequence of Medicago, Lotus, Arabidopsis, or Glycine, and 40.17% showed positive matches with all of these species. L. luteus isotigs were also screened for the presence of SSR sequences. A total of 2,572 isotigs contained at least one EST-SSR, with a frequency of one SSR per 17.75 kbp. Empirical evaluation of the EST-SSR candidate markers resulted in 222 polymorphic EST-SSRs. Two hundred and fifty four (65.7%) and 113 (30%) SSR primer pairs were able to amplify fragments from L. hispanicus and L. mutabilis DNA, respectively. Fifty polymorphic EST-SSRs were used to genotype a sample of 64 L. luteus accessions. Neighbor-joining distance analysis detected the existence of several clusters among L. luteus accessions, strongly suggesting the existence of population subdivisions. However, no clear clustering patterns followed the accessions' origin.
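The SSR screening and the "one SSR per N kbp" frequency reported above can be sketched as follows. This is a minimal illustration using regular expressions, not the tool or settings used in the study: the minimum repeat counts in MIN_REPEATS, the function names and the toy sequence are all hypothetical.

```python
import re

# Assumed minimum number of repeats per motif length (hypothetical thresholds).
MIN_REPEATS = {2: 6, 3: 5, 4: 4, 5: 4, 6: 4}


def find_ssrs(seq):
    """Return (start, motif, repeat_count) for perfect SSRs found in seq."""
    seq = seq.upper()
    hits = []
    for motif_len, min_rep in MIN_REPEATS.items():
        # A motif of motif_len bases followed by at least min_rep - 1 copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (motif_len, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are repeats of a shorter unit (e.g. "AAA", "ATAT").
            if any(
                motif == motif[:k] * (motif_len // k)
                for k in range(1, motif_len)
                if motif_len % k == 0
            ):
                continue
            hits.append((m.start(), motif, len(m.group(0)) // motif_len))
    return hits


def kbp_per_ssr(total_bp, n_ssrs):
    """Average sequence length, in kbp, that contains one SSR."""
    return total_bp / n_ssrs / 1000


# Toy sequence with an (AT)7 and an (AAG)5 repeat (hypothetical).
toy = "GGC" + "AT" * 7 + "CCGT" + "AAG" * 5 + "TTC"
ssrs = find_ssrs(toy)
```

Applied to a full set of isotigs, kbp_per_ssr would be called with the summed isotig length and the total number of hits.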
L. luteus deep transcriptome sequencing will facilitate the further development of genomic tools and lupin germplasm. Massive sequencing of cDNA libraries will continue to produce raw materials for gene discovery, identification of polymorphisms (SNPs, EST-SSRs, INDELs, etc.) for marker development, anchoring sequences for genome comparisons and putative gene candidates for QTL detection.
Lupinus luteus; EST-SSR; Orphan crop; Microsynteny
The olive fruit fly Bactrocera oleae has a unique ability to cope with olive flesh, and is the most destructive pest of olives worldwide. Its control has been largely based on the use of chemical insecticides; however, resistance to several insecticides has evolved. The study of detoxification mechanisms, which allow the olive fruit fly to defend against insecticides and/or phytotoxins possibly present in the mesocarp, has been hampered by the lack of genomic information for this species. In the NCBI database, fewer than 1,000 nucleotide sequences have been deposited, with fewer than 10 detoxification gene homologues in total. We used 454 pyrosequencing to produce, for the first time, a large transcriptome dataset for B. oleae. A total of 482,790 reads were assembled into 14,204 contigs. More than 60% of those contigs (8,630) were larger than 500 base pairs, and almost half of them matched genes from the order Diptera. Analysis of the Gene Ontology (GO) distribution of unique contigs suggests that, compared to other insects, the assembly is broadly representative of the B. oleae transcriptome. Furthermore, the transcriptome was found to contain 55 P450, 43 GST, 15 CCE and 18 ABC transporter genes. Several of those detoxification genes may putatively be involved in the ability of the olive fruit fly to deal with xenobiotics, such as plant phytotoxins and insecticides. In summary, our study has generated new data and genomic resources, which will substantially facilitate molecular studies in B. oleae, including elucidation of the mechanisms of xenobiotic detoxification, as well as other important aspects of olive fruit fly biology.
Orchids are among the most diverse angiosperms, but few genomic resources are available for these non-model plants. In addition to its ecological significance, Phalaenopsis is economically important to the floriculture industry worldwide. We aimed to use massively parallel 454 pyrosequencing for a global characterization of the Phalaenopsis transcriptome.
To maximize sequence diversity, we pooled RNA from 10 samples of different tissues, various developmental stages, and biotic- or abiotic-stressed plants. We obtained 206,960 expressed sequence tags (ESTs) with an average read length of 228 bp. These reads were assembled into 8,233 contigs and 34,630 singletons. The unigenes were searched against the NCBI non-redundant (NR) protein database. Based on sequence similarity with known proteins, these analyses identified 22,234 different genes (E-value cutoff, e-7). Assembled sequences were annotated with Gene Ontology, Gene Family and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways. Among these annotations, over 780 unigenes encoding putative transcription factors were identified.
Pyrosequencing was effective in identifying a large set of unigenes from Phalaenopsis. The informative EST dataset we developed constitutes a much-needed resource for discovery of genes involved in various biological processes in Phalaenopsis and other orchid species. These transcribed sequences will narrow the gap between study of model organisms with many genomic resources and species that are important for ecological and evolutionary studies.
Benefits from high-throughput sequencing using 454 pyrosequencing technology may be most apparent for species with high societal or economic value but few genomic resources. Rapid means of gene sequence and SNP discovery using this novel sequencing technology provide a set of baseline tools for genome-level research. However, it is questionable how effectively sequencing large numbers of short reads for species with essentially no prior gene sequence information will support contig assembly and sequence annotation.
With the purpose of generating the first broad survey of gene sequences in Eucalyptus grandis, the most widely planted hardwood tree species, we used 454 technology to sequence and assemble 148 Mbp of expressed sequences (ESTs). ESTs were generated from a normalized cDNA pool comprising multiple tissues and genotypes, promoting discovery of homologues to almost half of Arabidopsis genes, and a comprehensive survey of allelic variation in the transcriptome. By aligning the sequencing reads from multiple genotypes we detected 23,742 SNPs, 83% of which were validated in a sample. Genome-wide nucleotide diversity was estimated for 2,392 contigs using a modified theta (θ) parameter, adapted for measuring genetic diversity from polymorphisms detected by randomly sequencing a multi-genotype cDNA pool. Diversity estimates at non-synonymous nucleotides were on average 4x smaller than at synonymous nucleotides, suggesting purifying selection. The ratio of non-synonymous to synonymous substitutions (Ka/Ks) among 2,001 contigs averaged 0.30 and was skewed to the right, further supporting the conclusion that most genes are under purifying selection. Comparison of these estimates among contigs identified major functional classes of genes under purifying and diversifying selection, in agreement with previous research.
In providing an abundance of foundational transcript sequences where limited prior genomic information existed, this work created part of the foundation for the annotation of the E. grandis genome that is being sequenced by the US Department of Energy. In addition, we demonstrated that SNPs sampled on a large scale with 454 pyrosequencing can be used to detect evolutionary signatures among genes, providing one of the first genome-wide assessments of nucleotide diversity and Ka/Ks for a non-model plant species.
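The study describes a modified θ adapted to a pooled, multi-genotype cDNA sample; its details are not given here. For orientation only, the sketch below computes the standard per-site Watterson estimator, θ = S / (a_n · L) with a_n = 1 + 1/2 + ... + 1/(n − 1). It is not the authors' modified estimator, and the function name and toy numbers are hypothetical.

```python
def watterson_theta(segregating_sites, n_sequences, n_sites):
    """Standard per-site Watterson estimator of nucleotide diversity.

    segregating_sites: number of polymorphic positions (S)
    n_sequences: number of sampled sequences (n), at least 2
    n_sites: number of aligned positions examined (L)
    """
    if n_sequences < 2:
        raise ValueError("at least two sequences are required")
    # Harmonic-number correction for sample size.
    a_n = sum(1.0 / i for i in range(1, n_sequences))
    return segregating_sites / (a_n * n_sites)


# Toy contig: 10 SNPs among 5 sequences over 1,000 aligned sites (hypothetical).
theta = watterson_theta(10, 5, 1000)
```

In a pooled read set the number of sampled chromosomes varies by position, which is presumably why a modified estimator was needed.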
The reptiles, characterized by both diversity and unique evolutionary adaptations, provide a comprehensive system for comparative studies of metabolism, physiology, and development. However, molecular resources for ectothermic reptiles are severely limited, hampering our ability to study the genetic basis for many evolutionarily important traits such as metabolic plasticity, extreme longevity, limblessness, venom, and freeze tolerance. Here we use massively parallel sequencing (454 GS-FLX Titanium) to generate a transcriptome of the western terrestrial garter snake (Thamnophis elegans) with two goals in mind. First, we develop a molecular resource for an ectothermic reptile; and second, we use these sex-specific transcriptomes to identify differences in the presence of expressed transcripts and potential genes of evolutionary interest.
Using sex-specific pools of RNA (one pool for females, one pool for males) representing 7 tissue types and 35 diverse individuals, we produced 1.24 million sequence reads, which averaged 366 bp in length after cleaning. Assembly of the cleaned reads from both sexes with NEWBLER and MIRA resulted in 96,379 contigs containing 87% of the cleaned reads. Over 34% of these contigs and 13% of the singletons were annotated based on homology to previously identified proteins. From these homology assignments, additional clustering, and ORF predictions, we estimate that this transcriptome contains ~13,000 unique genes that were previously identified in other species and over 66,000 transcripts from unidentified protein-coding genes. Furthermore, we use a graph-clustering method to identify contigs linked by NEWBLER-split reads that represent divergent alleles, gene duplications, and alternatively spliced transcripts. Beyond gene identification, we identified 95,295 SNPs and 31,651 INDELs. From these sex-specific transcriptomes, we identified 190 genes that were only present in the mRNA sequenced from one of the sexes (84 female-specific, 106 male-specific), and many highly variable genes of evolutionary interest.
This is the first large-scale, multi-organ transcriptome for an ectothermic reptile. This resource provides the most comprehensive set of EST sequences available for an individual ectothermic reptile species, increasing the number of snake ESTs 50-fold. We have identified genes that appear to be under evolutionary selection and those that are sex-specific. This resource will assist studies on gene expression and comparative genomics, and will facilitate the study of evolutionarily important traits at the molecular level.
The striped bass and its relatives (genus Morone) are important fisheries and aquaculture species native to estuaries and rivers of the Atlantic coast and Gulf of Mexico in North America. To open avenues of gene expression research on reproduction and breeding of striped bass, we generated a collection of expressed sequence tags (ESTs) from a complementary DNA (cDNA) library representative of their ovarian transcriptome.
Sequences of a total of 230,151 ESTs (51,259,448 bp) were acquired by Roche 454 pyrosequencing of cDNA pooled from ovarian tissues obtained at all stages of oocyte growth, at ovulation (eggs), and during preovulatory atresia. Quality filtering of ESTs allowed assembly of 11,208 high-quality contigs ≥ 100 bp, including 2,984 contigs 500 bp or longer (average length 895 bp). BLASTx comparisons revealed 5,482 gene orthologues (E-value < 10^-3), of which 4,120 (36.7% of total contigs) were annotated with Gene Ontology terms (E-value < 10^-6). There were 5,726 remaining unknown unique sequences (51.1% of total contigs). All of the high-quality EST sequences are available in the National Center for Biotechnology Information (NCBI) Short Read Archive (GenBank: SRX007394). Informative contigs were considered to be abundant if they were assembled from groups of ESTs comprising ≥ 0.15% of the total short read sequences (≥ 345 reads/contig). Approximately 52.5% of these abundant contigs were predicted to have predominant ovary expression through digital differential display in silico comparisons to zebrafish (Danio rerio) UniGene orthologues. Over 1,300 Gene Ontology terms from Biological Process classes of Reproduction, Reproductive process, and Developmental process were assigned to this collection of annotated contigs.
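The abundance criterion above (≥ 0.15% of 230,151 reads, stated as ≥ 345 reads/contig) can be reproduced with a short calculation. The sketch assumes the fractional cutoff is rounded down to a whole read count, which matches the stated value of 345; the function names are hypothetical.

```python
import math


def abundance_cutoff(total_reads, min_fraction=0.0015):
    """Smallest whole read count treated as abundant.

    Rounding down is an assumption chosen to match the stated 345 reads.
    """
    return math.floor(total_reads * min_fraction)


def is_abundant(reads_in_contig, total_reads, min_fraction=0.0015):
    """True if a contig was assembled from enough reads to count as abundant."""
    return reads_in_contig >= abundance_cutoff(total_reads, min_fraction)


cutoff = abundance_cutoff(230_151)
```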
This first large reference sequence database available for the ecologically and economically important temperate basses (genus Morone) provides a foundation for gene expression studies in these species. The predicted predominance of ovary gene expression and assignment of directly relevant Gene Ontology classes suggests a powerful utility of this dataset for analysis of ovarian gene expression related to fundamental questions of oogenesis. Additionally, a high definition Agilent 60-mer oligo ovary 'UniClone' microarray with 8 × 15,000 probe format has been designed based on this striped bass transcriptome (eArray Group: Striper Group, Design ID: 029004).
Big sagebrush (Artemisia tridentata) is one of the most widely distributed and ecologically important shrub species in western North America. This species serves as a critical habitat and food resource for many animals and invertebrates. Habitat loss due to a combination of disturbances followed by establishment of invasive plant species is a serious threat to big sagebrush ecosystem sustainability. Lack of genomic data has limited our understanding of the evolutionary history and ecological adaptation in this species. Here, we report on the sequencing of expressed sequence tags (ESTs) and detection of single nucleotide polymorphism (SNP) and simple sequence repeat (SSR) markers in subspecies of big sagebrush.
cDNA of A. tridentata sspp. tridentata and vaseyana were normalized and sequenced using the 454 GS FLX Titanium pyrosequencing technology. Assembly of the reads resulted in 20,357 contig consensus sequences in ssp. tridentata and 20,250 contigs in ssp. vaseyana. A BLASTx search against the non-redundant (NR) protein database using 29,541 consensus sequences obtained from a combined assembly resulted in 21,436 sequences with significant blast alignments (≤ 1e-15). A total of 20,952 SNPs and 119 polymorphic SSRs were detected between the two subspecies. SNPs were validated through various methods including sequence capture. Validation of SNPs in different individuals uncovered a high level of nucleotide variation in EST sequences. EST sequences of a third, tetraploid subspecies (ssp. wyomingensis) obtained by Illumina sequencing were mapped to the consensus sequences of the combined 454 EST assembly. Approximately one-third of the SNPs between sspp. tridentata and vaseyana identified in the combined assembly were also polymorphic within the two geographically distant ssp. wyomingensis samples.
We have produced a large EST dataset for Artemisia tridentata, which contains a large sample of the big sagebrush leaf transcriptome. SNP mapping among the three subspecies suggests the origin of ssp. wyomingensis via mixed ancestry. A large number of SNP and SSR markers provide the foundation for future research to address questions in big sagebrush evolution, ecological genetics, and conservation using genomic approaches.