A rich chapter in the history of insect endocrinology has focused on hormonal control of diapause, especially the major roles played by juvenile hormones (JHs), ecdysteroids, and the neuropeptides that govern JH and ecdysteroid synthesis. More recently, experiments with adult diapause in Drosophila melanogaster and the mosquito Culex pipiens, and pupal diapause in the flesh fly Sarcophaga crassipalpis provide strong evidence that insulin signaling is also an important component of the regulatory pathway leading to the diapause phenotype. Insects produce many different insulin-like peptides (ILPs), and not all are involved in the diapause response; ILP-1 appears to be the one most closely linked to diapause in C. pipiens. Many steps in the pathway leading from perception of daylength (the primary environmental cue used to program diapause) to generation of the diapause phenotype remain unknown, but the role for insulin signaling in mosquito diapause appears to be upstream of JH, as evidenced by the fact that application of exogenous JH can rescue the effects of knocking down expression of ILP-1 or the Insulin Receptor. Fat accumulation, enhancement of stress tolerance, and other features of the diapause phenotype are likely linked to the insulin pathway through the action of a key transcription factor, FOXO. This review highlights many parallels for the role of insulin signaling as a regulator in insect diapause and dauer formation in the nematode Caenorhabditis elegans.
diapause; dauer; insulin signaling; FOXO; Culex pipiens
Aggression, costly in both time and energy, is often expressed by male animals in defense of valuable resources such as food or potential mates. Here we present a new insect model system for the study of aggression, the male flesh fly Sarcophaga crassipalpis, and ask whether there is an ontogeny of aggression that coincides with reproductive maturity. After establishing that reproductive maturity occurs by day 3 of age (post-eclosion), we examined the behavior of socially isolated males from different age cohorts (days 1, 2, 3, 4, and 6) upon introduction, in a test arena, to another male of the same age. The results show a pronounced development of aggression with age. The change from relative indifference to heightened aggression involves a profound increase in the frequency of high-intensity aggressive behaviors between days 1 and 3. Also noteworthy is an abrupt increase in the number of statistically significant transitions involving these full-contact agonistic behaviors on day 2. This elevated activity is trimmed back somewhat by day 3 and appears to maintain a stable plateau thereafter. No convincing evidence was found for escalation of aggression or for the establishment of a dominance relationship over the duration of the encounters. Although aggressive interactions are brief, lasting only a few seconds, a major reorganization in the relative proportions of four major non-aggressive behaviors (accounting for at least 96% of the total observation time for each age cohort) accompanies the switch from low to high aggression. A series of control experiments, with single flies in the test arenas, indicates that these changes occur in the absence of the performance of aggressive behaviors. This parallel ontogeny of aggressive and non-aggressive behaviors has implications for understanding how the entire behavioral repertoire may be organized and reorganized to accommodate the needs of the organism.
Life-history plasticity is widespread among organisms. However, an important question is whether this plasticity is adaptive, enhancing the organism’s fitness. Most models for plasticity in life-history timing predict that once they have reached the minimal nutritional threshold animals under poor conditions will accelerate timing to development or reproduction. Adaptive delays in reproductive timing are not common, especially in short-lived species. Examples of adaptive reproductive delays exist in mammalian populations experiencing strong interspecific (e.g. predation) and intraspecific (e.g. infanticide) competition. But are there other environmental factors that may trigger an adaptive delay in reproductive timing? We show that the short-lived flesh fly Sarcophaga crassipalpis will delay reproductive timing under nutrient poor conditions, even though it has already met the minimal nutritional threshold for reproduction. We test if this delay strategy is consistent with an adaptive response allowing the scavenger time to locate more resources by providing additional protein pulses (early, mid and late) throughout the reproductive delay period. Flies receiving additional protein produced more eggs and larger eggs, demonstrating a benefit of the delay. In addition, by tracking the allocation of carbon from the pulses using stable isotopes, we show that flies receiving earlier pulses incorporated more carbon into eggs and somatic tissue than those provided a later pulse. These results indicate that the reproductive delay in S. crassipalpis is consistent with adaptive post-threshold plasticity, a nutritionally-linked reproductive strategy that has not been previously reported in an invertebrate species.
resource allocation; phenotypic plasticity; stable isotopes; adaptation; evolutionary physiology
Body condition affects the timing and magnitude of life history transitions. Therefore, identifying proximate mechanisms involved in assessing condition is critical to understanding how these mechanisms affect the expression of life history plasticity. Nutrient storage is an important body condition parameter, likely playing roles in both attaining minimum body-condition thresholds for life history transitions and expression of life history traits. We manipulated protein availability for females of the flesh fly Sarcophaga crassipalpis to determine whether reproductive timing and output would remain plastic or become fixed. Liver was provided for 0, 2, 4, or 6 days of adult pre-reproductive development. Significantly, liver was removed after the feeding threshold had been attained and females had committed to producing a clutch. We also identified the major storage proteins and monitored their abundances, because protein stores may serve as an index of body condition and therefore may play an important role in life history transitions and plasticity. Flesh flies showed clear post-threshold plasticity in reproductive timing. Females fed protein for 2 days took ~30% longer to provision their clutch than those fed for 4 or 6 days. Observations of oogenesis showed the 2-day group expressed a different developmental program including slower egg provisioning. Protein availability also affected reproductive output. Females fed protein for 2 days produced ~20% fewer eggs than females fed 4 or 6 days. Six-day treated females provisioned larger eggs than 4-day treated females, followed by 2-day treated females with the smallest eggs. Two storage proteins were identified, LSP-1 and LSP-2. LSP-2 accumulation differed across feeding treatments.
The 2- and 4-day treatment groups accumulated LSP-2 stores but depleted them during provisioning of the first clutch, whereas the 6-day group accumulated the greatest quantity of LSP-2 and had substantial LSP-2 stores remaining at the end of the clutch. This pattern of accumulation and depletion suggests that LSP-2 could play roles in both provisioning the current clutch and future clutches, making it a good candidate molecule for affecting reproductive timing and allotment. LSP-1 was not associated with post-threshold plasticity; it was carried over from larval feeding into adulthood and depleted uniformly across all feeding groups.
reproductive timing; reproductive threshold; hexameric storage protein; phenotypic plasticity
The full power of modern genetics has been applied to the study of speciation in only a small handful of genetic model species - all of which speciated allopatrically. Here we report the first large expressed sequence tag (EST) study of a candidate for ecological sympatric speciation, the apple maggot Rhagoletis pomonella, using massively parallel pyrosequencing on the Roche 454-FLX platform. To maximize transcript diversity we created and sequenced separate libraries from larvae, pupae, adult heads, and headless adult bodies.
We obtained 239,531 sequences which assembled into 24,373 contigs. A total of 6810 unique protein coding genes were identified among the contigs and long singletons, corresponding to 48% of all known Drosophila melanogaster protein-coding genes. Their distribution across GO classes suggests that we have obtained a representative sample of the transcriptome. Among these sequences are many candidates for potential R. pomonella "speciation genes" (or "barrier genes") such as those controlling chemosensory and life-history timing processes. Furthermore, we identified important marker loci including more than 40,000 single nucleotide polymorphisms (SNPs) and over 100 microsatellites. An initial search for SNPs at which the apple and hawthorn host races differ suggested at least 75 loci warranting further work. We also determined that developmental expression differences remained even after normalization; transcripts expected to show different expression levels between larvae and pupae in D. melanogaster also did so in R. pomonella. Preliminary comparative analysis of transcript presences and absences revealed evidence of gene loss in Drosophila and gain in the higher dipteran clade Schizophora.
These data provide a much needed resource for exploring mechanisms of divergence in this important model for sympatric ecological speciation. Our description of ESTs from a substantial portion of the R. pomonella transcriptome will facilitate future functional studies of candidate genes for olfaction and diapause-related life history timing, and will enable large scale expression studies. Similarly, the identification of new SNP and microsatellite markers will facilitate future population and quantitative genetic studies of divergence between the apple and hawthorn-infesting host races.
Sarcophagidae are an important element of the carrion insect community. Unfortunately, results on larval and adult Sarcophagidae from forensic carrion studies are virtually absent, mostly due to the taxonomic problems with species identification of females and larvae. The impact of this taxon on decomposition of large carrion has not been reliably evaluated. During several pig carcass studies in Poland, a large body of data on adult and larval Sarcophagidae was collected. We determined (1) assemblages of adult flesh flies visiting pig carrion in various habitats, (2) species of flesh flies which breed in pig carcasses, and (3) temporal distribution of flesh fly larvae during decomposition. Because the complete material, including larvae, females, and males, was identified to species, it was possible for the first time to reliably answer several questions related to the role of Sarcophagidae in decomposition of large carrion and hence define their forensic importance. Fifteen species of flesh flies were found to visit pig carcasses, with higher diversity and abundance in grasslands as compared to forests. A female-biased sex ratio was observed only for Sarcophaga argyrostoma, S. caerulescens, S. similis and the S. carnaria species group. Gravid females and larvae were collected only in the case of S. argyrostoma, S. caerulescens, S. melanura and S. similis. Sarcophaga caerulescens and S. similis bred regularly in carcasses, while S. argyrostoma was recorded only occasionally. First instar larvae of flesh flies were recorded on carrion earlier than or concurrently with first instar larvae of blowflies. Third instar larvae of S. caerulescens were usually observed before the appearance of the third instar blowfly larvae. These results contest the view that flesh flies colonise carcasses later than blowflies. Sarcophaga caerulescens is designated as a good candidate for broad forensic use in Central European cases.
Sarcophagidae; Europe; Succession; Carrion decomposition; Forensic entomology
Horned beetles, in particular in the genus Onthophagus, are important models for studies on sexual selection, biological radiations, the origin of novel traits, developmental plasticity, biocontrol, conservation, and forensic biology. Despite their growing prominence as models for studying both basic and applied questions in biology, little genomic or transcriptomic data are available for this genus. We used massively parallel pyrosequencing (Roche 454-FLX platform) to produce a comprehensive EST dataset for the horned beetle Onthophagus taurus. To maximize sequence diversity, we pooled RNA extracted from a normalized library encompassing diverse developmental stages and both sexes.
We used 454 pyrosequencing to sequence ESTs from all post-embryonic stages of O. taurus. Approximately 1.36 million reads assembled into 50,080 non-redundant sequences encompassing a total of 26.5 Mbp. The non-redundant sequences match over half of the genes in Tribolium castaneum, the most closely related species with a sequenced genome. Analyses of Gene Ontology annotations and biochemical pathways indicate that the O. taurus sequences reflect a wide and representative sampling of biological functions and biochemical processes. An analysis of sequence polymorphisms revealed that SNP frequency was negatively related to overall expression level and the number of tissue types in which a given gene is expressed. The most variable genes were enriched for a limited number of GO annotations whereas the least variable genes were enriched for a wide range of GO terms directly related to fitness.
This study provides the first large-scale EST database for horned beetles, a much-needed resource for advancing the study of these organisms. Furthermore, we identified instances of gene duplications and alternative splicing, useful for future study of gene regulation, and a large number of SNP markers that could be used in population-genetic studies of O. taurus and possibly other horned beetles.
The aim of this study was to determine the development time and thermal requirements of three myiasis flies: Chrysomya albiceps, Lucilia sericata, and Sarcophaga sp.
The rate of development (ROD) and accumulated degree-days (ADD) of three important forensic flies in Iran, Chrysomya albiceps, Lucilia sericata, and Sarcophaga sp., were calculated by rearing individuals at a single constant temperature (28 °C) and applying specific formulas to four developmental events: egg hatching, larval stages, pupation, and eclosion.
Rates of development decreased step by step as the flies grew from egg to larva and then to the adult stage; however, the rate was higher for the blowflies (C. albiceps and L. sericata) than for the flesh fly Sarcophaga sp. Egg hatching, larval stages, and pupation took about one fourth and half of the time of the total pre-adult development time for all three species. In general, the flesh fly Sarcophaga sp. required more heat for development than the blowflies. The thermal constants (K) were 130–195, 148–222, and 221–323 degree-days (DD) for egg hatching to adult stages of C. albiceps, L. sericata, and Sarcophaga sp., respectively.
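The rate of development and accumulated degree-days reported above can be illustrated with a minimal sketch. The formulas actually used in the study are not given here, so the sketch assumes the standard definitions: ROD as the reciprocal of the stage duration, and ADD as the duration multiplied by the difference between the rearing temperature and a lower developmental threshold. The function names, the 10-day duration, and the 10 °C threshold are hypothetical and are not taken from the study.

```python
def rate_of_development(duration_days: float) -> float:
    """Rate of development as the reciprocal of the stage duration (1/days)."""
    if duration_days <= 0:
        raise ValueError("duration_days must be positive")
    return 1.0 / duration_days


def accumulated_degree_days(duration_days: float,
                            rearing_temp_c: float,
                            base_temp_c: float) -> float:
    """ADD = duration x (rearing temperature - lower developmental threshold)."""
    if rearing_temp_c <= base_temp_c:
        # Assumption: no development accrues at or below the threshold.
        return 0.0
    return duration_days * (rearing_temp_c - base_temp_c)


# Hypothetical example: a stage lasting 10 days at 28 °C with an assumed
# threshold of 10 °C (neither the duration nor the threshold is from the study).
example_rod = rate_of_development(10.0)
example_add = accumulated_degree_days(10.0, 28.0, 10.0)
```

Because ADD depends directly on the threshold temperature, any comparison between species under this definition is only as reliable as the thresholds chosen for them.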
This is the first report on the thermal requirements of three forensic flies in Iran. The data from this study provide preliminary information for forensic entomologists to estimate the postmortem interval (PMI) in the study area.
Degree Day; Forensic Entomology; Larval development; Myiasis; PMI
Plants of the Huperziaceae family, which comprise the two genera Huperzia and Phlegmariurus, produce various types of lycopodium alkaloids that are used to treat a number of human ailments, such as contusions, swellings and strains. Huperzine A, which belongs to the lycodine type of lycopodium alkaloids, has been used as an anti-Alzheimer's disease drug candidate. Despite their medical importance, little genomic or transcriptomic data are available for the members of this family. We used massive parallel pyrosequencing on the Roche 454-GS FLX Titanium platform to generate a substantial EST dataset for Huperzia serrata (H. serrata) and Phlegmariurus carinatus (P. carinatus) as representative members of the Huperzia and Phlegmariurus genera, respectively. H. serrata and P. carinatus are important plants for research on the biosynthesis of lycopodium alkaloids. We focused on gene discovery in the areas of bioactive compound biosynthesis and transcriptional regulation as well as genetic marker detection in these species.
For H. serrata, 36,763 unique putative transcripts were generated from 140,930 reads totaling over 57,028,559 base pairs; for P. carinatus, 31,812 unique putative transcripts were generated from 79,920 reads totaling over 30,498,684 base pairs. Using BLASTX searches of public databases, 16,274 (44.3%) unique putative transcripts from H. serrata and 14,070 (44.2%) from P. carinatus were assigned to at least one protein. Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) orthology annotations revealed that the functions of the unique putative transcripts from these two species cover a similarly broad set of molecular functions, biological processes and biochemical pathways.
In particular, a total of 20 H. serrata candidate cytochrome P450 genes, which are more abundant in leaves than in roots and might be involved in lycopodium alkaloid biosynthesis, were found based on the comparison of H. serrata and P. carinatus 454-ESTs and real-time PCR analysis. Four unique putative CYP450 transcripts (Hs01891, Hs04010, Hs13557 and Hs00093) which are the most likely to be involved in the biosynthesis of lycopodium alkaloids were selected based on a phylogenetic analysis. Approximately 115 H. serrata and 98 P. carinatus unique putative transcripts associated with the biosynthesis of triterpenoids, alkaloids and flavones/flavonoids were located in the 454-EST datasets. Transcripts related to phytohormone biosynthesis and signal transduction as well as transcription factors were also obtained. In addition, we discovered 2,729 and 1,573 potential SSR-motif microsatellite loci in the H. serrata and P. carinatus 454-ESTs, respectively.
The 454-EST resource allowed for the first large-scale acquisition of ESTs from H. serrata and P. carinatus, which are representative members of the Huperziaceae family. We discovered many genes likely to be involved in the biosynthesis of bioactive compounds and transcriptional regulation as well as a large number of potential microsatellite markers. These results constitute an essential resource for understanding the molecular basis of developmental regulation and secondary metabolite biosynthesis (especially that of lycopodium alkaloids) in the Huperziaceae, and they provide an overview of the genetic diversity of this family.
The genus Silene is widely used as a model system for addressing ecological and evolutionary questions in plants, but advances in using the genus as a model system are impeded by the lack of available resources for studying its genome. Massively parallel sequencing of cDNA has recently developed into an efficient method for characterizing the transcriptomes of non-model organisms, generating massive amounts of data that enable the study of multiple species in a comparative framework. The sequences generated provide an excellent resource for identifying expressed genes, characterizing functional variation and developing molecular markers, thereby laying the foundations for future studies on gene sequence and gene expression divergence. Here, we report the results of a comparative transcriptome sequencing study of eight individuals representing four Silene species and one Dianthus species as an outgroup. All sequences and annotations have been deposited in a newly developed and publicly available database called SiESTa, the Silene EST annotation database.
A total of 1,041,122 EST reads were generated in two runs on a Roche GS-FLX 454 pyrosequencing platform. EST reads were analyzed separately for all eight individuals sequenced and were assembled into contigs using TGICL. These were annotated with results from BLASTX searches and Gene Ontology (GO) terms, and thousands of single-nucleotide polymorphisms (SNPs) were characterized. Unassembled reads were kept as singletons and together with the contigs contributed to the unigenes characterized in each individual. The high quality of unigenes is evidenced by the proportion (49%) that have significant hits in similarity searches with the A. thaliana proteome. The SiESTa database is accessible at http://www.siesta.ethz.ch.
The sequence collections established in the present study provide an important genomic resource for four Silene and one Dianthus species and will help to further develop Silene as a plant model system. The genes characterized will be useful for future research not only in the species included in the present study, but also in related species for which no genomic resources are yet available. Our results demonstrate the efficiency of massively parallel transcriptome sequencing in a comparative framework as an approach for developing genomic resources in diverse groups of non-model organisms.
cDNA library; database; EST; SNP; Silene
Cultivated watermelon [Citrullus lanatus (Thunb.) Matsum. & Nakai var. lanatus] is an important agricultural crop worldwide. The fruit of watermelon undergoes distinct stages of development with dramatic changes in its size, color, sweetness, texture and aroma. In order to better understand the genetic and molecular basis of these changes and significantly expand the watermelon transcript catalog, we have selected four critical stages of watermelon fruit development and used Roche/454 next-generation sequencing technology to generate a large expressed sequence tag (EST) dataset and a comprehensive transcriptome profile for watermelon fruit flesh tissues.
We performed a half Roche/454 GS-FLX run for each of the four watermelon fruit developmental stages (immature white, white-pink flesh, red flesh and over-ripe) and obtained 577,023 high quality ESTs with an average length of 302.8 bp. De novo assembly of these ESTs together with 11,786 watermelon ESTs collected from GenBank produced 75,068 unigenes with a total length of approximately 31.8 Mb. Overall, 54.9% of the unigenes showed significant similarities to known sequences in the GenBank non-redundant (nr) protein database and around two-thirds of them matched proteins of cucumber, the most closely-related species with a sequenced genome. The unigenes were further assigned gene ontology (GO) terms and mapped to biochemical pathways. More than 5,000 SSRs were identified from the EST collection. Furthermore, we carried out digital gene expression analysis of these ESTs and identified 3,023 genes that were differentially expressed during watermelon fruit development and ripening, which provided novel insights into watermelon fruit biology and a comprehensive resource of candidate genes for future functional analysis. We then generated profiles of several interesting metabolites that are important to fruit quality including pigmentation and sweetness. Integrative analysis of metabolite and digital gene expression profiles helped elucidate molecular mechanisms governing these important quality-related traits during watermelon fruit development.
We have generated a large collection of watermelon ESTs, which represents a significant expansion of the current transcript catalog of watermelon and a valuable resource for future studies on the genomics of watermelon and other closely-related species. Digital expression analysis of this EST collection allowed us to identify a large set of genes that were differentially expressed during watermelon fruit development and ripening, which provide a rich source of candidates for future functional analysis and represent a valuable increase in our knowledge base of watermelon fruit biology.
Anopheles sinensis is a major malaria vector in China and other Southeast Asian countries, and it is becoming increasingly resistant to the insecticides used for agriculture, net impregnation, and indoor residual spray. Very limited genomic information on this species is available, which has hindered the development of new tools for resistance surveillance and vector control. We used the 454 GS FLX system and generated expressed sequence tag (EST) databases of various life stages of An. sinensis, and we determined the transcriptional differences between deltamethrin resistant and susceptible mosquitoes.
The 454 GS FLX transcriptome sequencing yielded a total of 624,559 reads (average length of 290 bp) with the pooled An. sinensis mosquitoes across various development stages. The de novo assembly generated 33,411 contigs with average length of 493 bp. A total of 8,057 ESTs were generated with Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) annotation. A total of 2,131 ESTs were differentially expressed between deltamethrin resistant and susceptible mosquitoes collected from the same field site in Jiangsu, China. Among these differentially expressed ESTs, a total of 294 pathways were mapped to the KEGG database, with the predominant ESTs belonging to metabolic pathways. Furthermore, a total of 2,408 microsatellites and 15,496 single nucleotide polymorphisms (SNPs) were identified.
The annotated EST and transcriptome databases provide a valuable genomic resource for further genetic studies of this important malaria vector species. The differentially expressed ESTs associated with insecticide resistance identified in this study lay an important foundation for further functional analysis. The identified microsatellite and SNP markers will provide useful tools for future population genetic and comparative genomic analyses of malaria vectors.
Electronic supplementary material
The online version of this article (doi:10.1186/1471-2164-15-448) contains supplementary material, which is available to authorized users.
Transcriptome; Expressed sequence tag; Pyrethroid resistance; Gene expression; Anopheles sinensis
The genetic basis of host preference has been investigated in only a few species. It is relevant to important questions in evolutionary biology, including sympatric speciation, generalist versus specialist adaptation, and parasite-host co-evolution. Here we show that a major locus strongly influences host preference in Nasonia. Nasonia are parasitic wasps that utilize fly pupae; N. vitripennis is a generalist that parasitizes a diverse set of hosts whereas N. giraulti specializes on Protocalliphora (bird blowflies). In laboratory choice experiments using Protocalliphora and Sarcophaga (flesh flies), N. vitripennis shows a preference for Sarcophaga while N. giraulti shows a preference for Protocalliphora. Through a series of interspecies crosses we have introgressed a major locus affecting host preference from N. giraulti into N. vitripennis. The N. giraulti allele is dominant and greatly increases preference for Protocalliphora pupae in the introgression line relative to the recessive N. vitripennis allele. Through the utilization of a Nasonia genotyping microarray, we have identified the introgressed region as 16 megabases of chromosome 4, although a more complete analysis is necessary to determine the exact genetic architecture of host preference in the genus. To our knowledge, this is the first introgression of the host preference of one parasitoid species into another, as well as one of the few cases of introgression of a behavioral gene between species.
host preference; genetic basis; parasitic wasps; Nasonia; generalists; specialists
The reptiles, characterized by both diversity and unique evolutionary adaptations, provide a comprehensive system for comparative studies of metabolism, physiology, and development. However, molecular resources for ectothermic reptiles are severely limited, hampering our ability to study the genetic basis for many evolutionarily important traits such as metabolic plasticity, extreme longevity, limblessness, venom, and freeze tolerance. Here we use massively parallel sequencing (454 GS-FLX Titanium) to generate a transcriptome of the western terrestrial garter snake (Thamnophis elegans) with two goals in mind. First, we develop a molecular resource for an ectothermic reptile; and second, we use these sex-specific transcriptomes to identify differences in the presence of expressed transcripts and potential genes of evolutionary interest.
Using sex-specific pools of RNA (one pool for females, one pool for males) representing 7 tissue types and 35 diverse individuals, we produced 1.24 million sequence reads, which averaged 366 bp in length after cleaning. Assembly of the cleaned reads from both sexes with NEWBLER and MIRA resulted in 96,379 contigs containing 87% of the cleaned reads. Over 34% of these contigs and 13% of the singletons were annotated based on homology to previously identified proteins. From these homology assignments, additional clustering, and ORF predictions, we estimate that this transcriptome contains ~13,000 unique genes that were previously identified in other species and over 66,000 transcripts from unidentified protein-coding genes. Furthermore, we use a graph-clustering method to identify contigs linked by NEWBLER-split reads that represent divergent alleles, gene duplications, and alternatively spliced transcripts. Beyond gene identification, we identified 95,295 SNPs and 31,651 INDELs. From these sex-specific transcriptomes, we identified 190 genes that were only present in the mRNA sequenced from one of the sexes (84 female-specific, 106 male-specific), and many highly variable genes of evolutionary interest.
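The presence/absence comparison between the two sex-specific pools can be sketched as a set operation. This is only an illustration of the idea, not the pipeline used in the study; the function name and the gene identifiers below are hypothetical.

```python
def sex_specific(female_genes, male_genes):
    """Partition gene identifiers by the pool(s) in which they were detected."""
    female = set(female_genes)
    male = set(male_genes)
    return {
        "female_only": female - male,   # detected only in the female pool
        "male_only": male - female,     # detected only in the male pool
        "shared": female & male,        # detected in both pools
    }


# Hypothetical identifiers, used only to show the shape of the result.
result = sex_specific(["g1", "g2", "g3"], ["g2", "g3", "g4", "g5"])
```

Applied to the counts reported above, the female-only and male-only sets would hold 84 and 106 genes, which together give the 190 genes detected in only one sex.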
This is the first large-scale, multi-organ transcriptome for an ectothermic reptile. This resource provides the most comprehensive set of EST sequences available for an individual ectothermic reptile species, increasing the number of snake ESTs 50-fold. We have identified genes that appear to be under evolutionary selection and those that are sex-specific. This resource will assist studies on gene expression and comparative genomics, and will facilitate the study of evolutionarily important traits at the molecular level.
Expressed Sequence Tags (ESTs) have played significant roles in gene discovery and gene functional analysis, especially for non-model organisms. For organisms with no full genome sequences available, ESTs are normally assembled into longer consensus sequences for further downstream analysis. However, current de novo EST assembly programs often generate a large number of assembly errors that negatively affect the downstream analysis. In order to generate more accurate consensus sequences from ESTs, tools are needed to reduce or eliminate errors from de novo assemblies.
We present iAssembler, a pipeline that can assemble large-scale ESTs into consensus sequences with significantly higher accuracy than existing assemblers. iAssembler employs the MIRA and CAP3 assemblers to generate initial assemblies, and then identifies and corrects two common types of transcriptome assembly errors: 1) ESTs from different transcripts (mainly alternatively spliced transcripts or paralogs) are incorrectly assembled into the same contigs; and 2) ESTs from the same transcripts fail to be assembled together. iAssembler can be used to assemble ESTs generated using the traditional Sanger method and/or the Roche-454 massively parallel pyrosequencing technology.
We compared the performance of iAssembler and several other de novo EST assembly programs using both Roche-454 and Sanger EST datasets. The comparison demonstrated that iAssembler generated significantly more accurate consensus sequences than the other assembly programs.
Pythium is an agriculturally important genus of plant pathogens, yet its species are not well understood at the molecular, genetic, or genomic level. They are closely related to other oomycete plant pathogens such as Phytophthora species and are ubiquitous in their geographic distribution and host range. To gain a better understanding of its gene complement, we generated Expressed Sequence Tags (ESTs) from the transcriptome of Pythium ultimum DAOM BR144 (= ATCC 200006 = CBS 805.95) using two high throughput sequencing methods, Sanger-based chain termination sequencing and pyrosequencing-based sequencing-by-synthesis.
A single half-plate pyrosequencing (454 FLX) run on adapter-ligated cDNA from a normalized cDNA population generated 90,664 reads with an average read length of 190 nucleotides following cleaning and removal of sequences shorter than 100 base pairs. After clustering and assembly, a total of 35,507 unique sequences were generated. In parallel, 9,578 reads were generated from a library constructed from the same normalized cDNA population using dideoxy chain termination Sanger sequencing, which upon clustering and assembly generated 4,689 unique sequences. A hybrid assembly of both Sanger- and pyrosequencing-derived ESTs resulted in 34,495 unique sequences, of which 1,110 (3.2%) were derived solely from Sanger sequencing. A high degree of similarity was seen between P. ultimum sequences and other sequenced plant pathogenic oomycetes, with 91% of the hybrid assembly derived sequences > 500 bp having similarity to sequences from plant pathogenic Phytophthora species. An analysis of Gene Ontology assignments revealed a similar representation of molecular function ontologies in the hybrid assembly in comparison to the predicted proteomes of three Phytophthora species, suggesting that a broad representation of the P. ultimum transcriptome was present in the normalized cDNA population. P. ultimum sequences with similarity to oomycete RXLR and Crinkler effectors, Kazal-like and cystatin-like protease inhibitors, and elicitins were identified. Sequences with similarity to thiamine biosynthesis enzymes that are lacking in the genome sequences of three Phytophthora species and one downy mildew were identified and could serve as useful phylogenetic markers. Furthermore, we identified 179 candidate simple sequence repeats that can be used for genotyping strains of P. ultimum.
Through these two technologies, we were able to generate a robust set (~10 Mb) of transcribed sequences for P. ultimum. We were able to identify known sequences present in oomycetes as well as novel sequences. An ample number of candidate polymorphic markers were identified in the dataset, providing resources for phylogenetic and diagnostic marker development for this species. On a technical level, in spite of the depth possible with the 454 FLX platform, the Sanger and pyro-based sequencing methodologies were complementary, as each method generated sequences not recovered by the other.
The Colorado potato beetle (Leptinotarsa decemlineata) is a major pest and a serious threat to potato cultivation throughout the northern hemisphere. Despite its high importance for invasion biology, phenology and pest management, little is known about L. decemlineata from a genomic perspective. We subjected European L. decemlineata adult and larval transcriptome samples to 454-FLX massively-parallel DNA sequencing to characterize a basal set of genes from this species. We created a combined assembly of the adult and larval datasets including the publicly available midgut larval Roche 454 reads and provided basic annotation. We were particularly interested in diapause-specific genes and genes involved in pesticide and Bacillus thuringiensis (Bt) resistance.
Using 454-FLX pyrosequencing, we obtained a total of 898,048 reads which, together with the publicly available 804,056 midgut larval reads, were assembled into 121,912 contigs. We established a repository of genes of interest, with 101 out of the 108 diapause-specific genes described in Drosophila montana; and 621 contigs involved in insecticide resistance, including 221 CYP450, 45 GSTs, 13 catalases, 15 superoxide dismutases, 22 glutathione peroxidases, 194 esterases, 3 ADAM metalloproteases, 10 cadherins and 98 calmodulins. We found 460 putative miRNAs and we predicted a significant number of single nucleotide polymorphisms (29,205) and microsatellite loci (17,284).
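As a quick consistency check on the figures above, the per-family contig counts can be summed and compared with the reported total of 621 insecticide-resistance contigs. This is a minimal sketch; the variable name and dictionary keys are illustrative labels, not identifiers from the study.

```python
# Contig counts per gene family, copied from the abstract text.
# The dictionary name and keys are illustrative labels only.
resistance_contigs = {
    "CYP450": 221,
    "GST": 45,
    "catalase": 13,
    "superoxide dismutase": 15,
    "glutathione peroxidase": 22,
    "esterase": 194,
    "ADAM metalloprotease": 3,
    "cadherin": 10,
    "calmodulin": 98,
}

# Sum of all listed families; the abstract reports 621 contigs in total.
total_resistance_contigs = sum(resistance_contigs.values())
print(total_resistance_contigs)
```

The listed families account for the full reported total, so no resistance-related category appears to be omitted from the list.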
This report of the assembly and annotation of the transcriptome of L. decemlineata offers new insights into diapause-associated and insecticide-resistance-associated genes in this species and provides a foundation for comparative studies with other species of insects. The data will also open new avenues for researchers using L. decemlineata as a model species, and for pest management research. Our results provide the basis for performing future gene expression and functional analysis in L. decemlineata and improve our understanding of the biology of this invasive species at the molecular level.
Cucumber, Cucumis sativus L., is an economically and nutritionally important crop of the Cucurbitaceae family and has long served as a primary model system for sex determination studies. Recently, the sequencing of its whole genome has been completed. However, transcriptome information of this species is still scarce, with a total of around 8,000 Expressed Sequence Tag (EST) and mRNA sequences currently available in GenBank. In order to gain more insights into molecular mechanisms of plant sex determination and provide the community a functional genomics resource that will facilitate cucurbit research and breeding, we performed transcriptome sequencing of cucumber flower buds of two near-isogenic lines, WI1983G, a gynoecious plant which bears only pistillate flowers, and WI1983H, a hermaphroditic plant which bears only bisexual flowers.
Using Roche-454 massively parallel pyrosequencing technology, we generated a total of 353,941 high quality EST sequences with an average length of 175 bp, among which 188,255 were from gynoecious flowers and 165,686 from hermaphroditic flowers. These EST sequences, together with ~5,600 high quality cucumber EST and mRNA sequences available in GenBank, were clustered and assembled into 81,401 unigenes, of which 28,452 were contigs and 52,949 were singletons. The unigenes and ESTs were further mapped to the cucumber genome, and more than 500 alternative splicing events were identified in 443 cucumber genes. The unigenes were functionally annotated by comparing their sequences to different protein and functional domain databases and were assigned Gene Ontology (GO) terms. A biochemical pathway database containing 343 predicted pathways was also created based on the annotations of the unigenes. Digital expression analysis identified ~200 differentially expressed genes between flowers of WI1983G and WI1983H and provided novel insights into the molecular mechanisms of the plant sex determination process. Furthermore, a set of SSR motifs and high confidence SNPs between WI1983G and WI1983H were identified from the ESTs, providing the material basis for future genetic linkage and QTL analysis.
A large set of EST sequences was generated from cucumber flower buds of two different sex types. Differentially expressed genes between these two sex types, as well as putative SSR and SNP markers, were identified. These EST sequences provide valuable information for further understanding the molecular mechanisms of the plant sex determination process and form a rich resource for future functional genomics analysis, marker development and cucumber breeding.
White mold, caused by Sclerotinia sclerotiorum, is one of the most important diseases of pea (Pisum sativum L.); however, little is known about the genetics and biochemistry of this interaction. Identification of genes underlying resistance in the host or pathogenicity and virulence factors in the pathogen will increase our knowledge of the pea-S. sclerotiorum interaction and facilitate the introgression of new resistance genes into commercial pea varieties. Although the S. sclerotiorum genome sequence is available, no pea genome is available, due in part to its large genome size (~3500 Mb) and extensive repeated motifs. Here we present an EST data set specific to the interaction between S. sclerotiorum and pea, and a method to distinguish pathogen and host sequences without a species-specific reference genome.
A total of 10,158 contigs were obtained by de novo assembly of 128,720 high-quality reads generated by 454 pyrosequencing of the pea-S. sclerotiorum interactome. A method based on the tBLASTx program was modified to distinguish pea and S. sclerotiorum ESTs. To test this strategy, a mixture of known ESTs (18,490 pea and 17,198 S. sclerotiorum ESTs) from public databases was pooled and parsed; the tBLASTx method successfully separated 90.1% of the artificial EST mix with 99.9% accuracy. The tBLASTx method successfully parsed 89.4% of the 454-derived EST contigs, as validated by PCR, into pea (6,299 contigs) and S. sclerotiorum (2,780 contigs) categories. A total of 2,840 pea ESTs and 996 S. sclerotiorum ESTs were predicted to be expressed specifically during the pea-S. sclerotiorum interaction, as determined by homology search against 81,449 pea ESTs (from flowers, leaves, cotyledons, epi- and hypocotyl, and etiolated and light treated etiolated seedlings) and 57,751 S. sclerotiorum ESTs (from mycelia at neutral pH, developing apothecia and developing sclerotia). Among those specifically expressed ESTs, 277 (9.8%) pea ESTs were predicted to be involved in plant defense and response to biotic or abiotic stress, and 93 (9.3%) S. sclerotiorum ESTs were predicted to be involved in pathogenicity/virulence. Additionally, 142 S. sclerotiorum ESTs were identified as secretory/signal peptides, of which only 21 were previously reported.
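The percentages reported above follow directly from the contig and EST counts. The short sketch below recomputes them as a sanity check; the helper `percent` and the variable names are assumptions introduced for illustration and are not part of the published method.

```python
def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)

# Counts copied from the abstract text; names are illustrative.
pea_contigs = 6299
sclerotinia_contigs = 2780
all_contigs = 10158

# Share of 454-derived contigs assigned to either organism.
parsed_share = percent(pea_contigs + sclerotinia_contigs, all_contigs)

# Shares of interaction-specific ESTs in the reported functional classes.
pea_defense_share = percent(277, 2840)
pathogenicity_share = percent(93, 996)

print(parsed_share, pea_defense_share, pathogenicity_share)
```

Under these counts, 9,079 of the 10,158 contigs were assigned to one of the two organisms, which leaves 1,079 contigs that the method did not assign to either category.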
We present and characterize an EST resource specific to the pea-S. sclerotiorum interaction. Additionally, the tBLASTx method used to parse S. sclerotiorum and pea ESTs was demonstrated to be a reliable and accurate method to distinguish ESTs without a reference genome.
Pisum sativum; Sclerotinia sclerotiorum; Transcriptome; Parsing of host-pathogen sequences; Non-model organism
The olive fruit fly Bactrocera oleae has a unique ability to cope with olive flesh, and is the most destructive pest of olives worldwide. Its control has been largely based on the use of chemical insecticides; however, resistance to several insecticides has evolved. The study of detoxification mechanisms, which allow the olive fruit fly to defend against insecticides and/or phytotoxins possibly present in the mesocarp, has been hampered by the lack of genomic information for this species. In the NCBI database, fewer than 1,000 nucleotide sequences have been deposited, with fewer than 10 detoxification gene homologues in total. We used 454 pyrosequencing to produce, for the first time, a large transcriptome dataset for B. oleae. A total of 482,790 reads were assembled into 14,204 contigs. More than 60% of those contigs (8,630) were larger than 500 base pairs, and almost half of them matched genes from the order Diptera. Analysis of the Gene Ontology (GO) distribution of unique contigs suggests that, compared to other insects, the assembly is broadly representative of the B. oleae transcriptome. Furthermore, the transcriptome was found to contain 55 P450, 43 GST, 15 CCE and 18 ABC transporter genes. Several of those detoxification genes may be involved in the ability of the olive fruit fly to deal with xenobiotics, such as plant phytotoxins and insecticides. In summary, our study has generated new data and genomic resources, which will substantially facilitate molecular studies in B. oleae, including elucidation of xenobiotic detoxification mechanisms, as well as other important aspects of olive fruit fly biology.
Big sagebrush (Artemisia tridentata) is one of the most widely distributed and ecologically important shrub species in western North America. This species serves as a critical habitat and food resource for many animals and invertebrates. Habitat loss due to a combination of disturbances followed by establishment of invasive plant species is a serious threat to big sagebrush ecosystem sustainability. Lack of genomic data has limited our understanding of the evolutionary history and ecological adaptation in this species. Here, we report on the sequencing of expressed sequence tags (ESTs) and detection of single nucleotide polymorphism (SNP) and simple sequence repeat (SSR) markers in subspecies of big sagebrush.
cDNA of A. tridentata sspp. tridentata and vaseyana were normalized and sequenced using the 454 GS FLX Titanium pyrosequencing technology. Assembly of the reads resulted in 20,357 contig consensus sequences in ssp. tridentata and 20,250 contigs in ssp. vaseyana. A BLASTx search against the non-redundant (NR) protein database using 29,541 consensus sequences obtained from a combined assembly resulted in 21,436 sequences with significant blast alignments (≤ 1e-15). A total of 20,952 SNPs and 119 polymorphic SSRs were detected between the two subspecies. SNPs were validated through various methods including sequence capture. Validation of SNPs in different individuals uncovered a high level of nucleotide variation in EST sequences. EST sequences of a third, tetraploid subspecies (ssp. wyomingensis) obtained by Illumina sequencing were mapped to the consensus sequences of the combined 454 EST assembly. Approximately one-third of the SNPs between sspp. tridentata and vaseyana identified in the combined assembly were also polymorphic within the two geographically distant ssp. wyomingensis samples.
We have produced a large EST dataset for Artemisia tridentata, which contains a large sample of the big sagebrush leaf transcriptome. SNP mapping among the three subspecies suggests the origin of ssp. wyomingensis via mixed ancestry. A large number of SNP and SSR markers provide the foundation for future research to address questions in big sagebrush evolution, ecological genetics, and conservation using genomic approaches.
The razor clam Sinonovacula constricta is a benthic intertidal bivalve species with important commercial value. Despite its economic importance, knowledge of its transcriptome is scarce. Next generation sequencing technologies offer rapid and efficient tools for generating large numbers of sequences, which can be used to characterize the transcriptome, to develop effective molecular markers and to identify genes associated with growth, a key breeding trait.
Total RNA was isolated from the mantle, gill, liver, siphon, gonad and muscular foot tissues. High-throughput deep sequencing of S. constricta using 454 pyrosequencing technology yielded 859,313 high-quality reads with an average read length of 489 bp. Clustering and assembly of these reads produced 16,323 contigs and 131,346 singletons with average lengths of 1,376 bp and 458 bp, respectively. Based on transcriptome sequencing, 14,615 sequences had significant matches with known genes encoding 147,669 predicted proteins. Subsequently, previously unknown growth-related genes were identified. A total of 13,563 microsatellites (SSRs) and 13,634 high-confidence single nucleotide polymorphism loci (SNPs) were discovered, of which almost half were validated.
De novo sequencing of the razor clam S. constricta transcriptome on the 454 GS FLX platform generated a large number of ESTs. Candidate growth factors and a large number of SSRs and SNPs were identified. These results will impact genetic studies of S. constricta.
The striped bass and its relatives (genus Morone) are important fisheries and aquaculture species native to estuaries and rivers of the Atlantic coast and Gulf of Mexico in North America. To open avenues of gene expression research on reproduction and breeding of striped bass, we generated a collection of expressed sequence tags (ESTs) from a complementary DNA (cDNA) library representative of their ovarian transcriptome.
Sequences of a total of 230,151 ESTs (51,259,448 bp) were acquired by Roche 454 pyrosequencing of cDNA pooled from ovarian tissues obtained at all stages of oocyte growth, at ovulation (eggs), and during preovulatory atresia. Quality filtering of ESTs allowed assembly of 11,208 high-quality contigs ≥ 100 bp, including 2,984 contigs 500 bp or longer (average length 895 bp). Blastx comparisons revealed 5,482 gene orthologues (E-value < 10⁻³), of which 4,120 (36.7% of total contigs) were annotated with Gene Ontology terms (E-value < 10⁻⁶). There were 5,726 remaining unknown unique sequences (51.1% of total contigs). All of the high-quality EST sequences are available in the National Center for Biotechnology Information (NCBI) Short Read Archive (GenBank: SRX007394). Informative contigs were considered to be abundant if they were assembled from groups of ESTs comprising ≥ 0.15% of the total short read sequences (≥ 345 reads/contig). Approximately 52.5% of these abundant contigs were predicted to have predominant ovary expression through digital differential display in silico comparisons to zebrafish (Danio rerio) UniGene orthologues. Over 1,300 Gene Ontology terms from Biological Process classes of Reproduction, Reproductive process, and Developmental process were assigned to this collection of annotated contigs.
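The abundance cutoff quoted above can be reproduced from the total read count. The sketch below is a hedged illustration: the function name `min_reads_for_share` is hypothetical, and rounding the exact value (about 345.2 reads) down to a whole number is an assumption made here to match the reported figure of 345 reads per contig.

```python
import math
from fractions import Fraction

def min_reads_for_share(total_reads: int, share_percent: str) -> int:
    """Reads per contig corresponding to a percentage of all reads.

    The percentage is passed as a string so that Fraction parses it exactly.
    Rounding down is an assumption chosen to match the reported cutoff.
    """
    exact = Fraction(share_percent) / 100 * total_reads
    return math.floor(exact)

# 0.15% of the 230,151 ESTs reported in the abstract.
abundance_cutoff = min_reads_for_share(230_151, "0.15")
print(abundance_cutoff)
```

Exact rational arithmetic is used instead of floating point so that the result does not depend on binary rounding of 0.15.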
This first large reference sequence database available for the ecologically and economically important temperate basses (genus Morone) provides a foundation for gene expression studies in these species. The predicted predominance of ovary gene expression and the assignment of directly relevant Gene Ontology classes suggest a powerful utility of this dataset for analysis of ovarian gene expression related to fundamental questions of oogenesis. Additionally, a high definition Agilent 60-mer oligo ovary 'UniClone' microarray with 8 × 15,000 probe format has been designed based on this striped bass transcriptome (eArray Group: Striper Group, Design ID: 029004).
Lycoris aurea, also called Golden Magic Lily, is an ornamentally and medicinally important species of the Amaryllidaceae family. To date, no whole-genome sequence is available for this non-model organism. Transcriptomic information is also scarce for this species. In this study, we performed de novo transcriptome sequencing to produce the first comprehensive expressed sequence tag (EST) dataset for L. aurea using high-throughput sequencing technology.
Methodology and Principal Findings
Total RNA was isolated from leaves treated with sodium nitroprusside (SNP), salicylic acid (SA), or methyl jasmonate (MeJA), from stems, and from flowers at the bud, blooming, and wilting stages. Equal quantities of RNA from each tissue and stage were pooled to construct a cDNA library. Using 454 pyrosequencing technology, a total of 937,990 high quality reads (308.63 Mb) with an average read length of 329 bp were generated. Clustering and assembly of these reads produced a non-redundant set of 141,111 unique sequences, comprising 24,604 contigs and 116,507 singletons. GO analysis assigned all of the unique sequences to the biological process, cellular component and molecular function categories. Potential genes and their functions were predicted by KEGG pathway mapping and COG analysis. Based on our sequence analysis and the published literature, many putative genes involved in Amaryllidaceae alkaloid synthesis, including PAL, TYDC, OMT, NMT, P450, and other potentially important candidate genes, were identified for the first time in this Lycoris species. Furthermore, 6,386 SSRs and 18,107 high-confidence SNPs were identified in this EST dataset.
The transcriptome provides invaluable new data as a functional genomics resource for future biological research in L. aurea. The molecular markers identified in this study will provide a material basis for future genetic linkage and quantitative trait loci analyses, and will provide useful information for functional genomic research in the future.
Although the tropical climate of Thailand supports a rich biodiversity of insects, flies of medical importance have not been well investigated. Using information from a literature search, fly surveys and specialist experience, we review current knowledge of Sarcophaga (Liosarcophaga) dux Thomson (Diptera: Sarcophagidae), one of the priority flesh fly species of medical importance in Thailand.
This review deals with morphology, bionomics and medical involvement. Important morphological characteristics of the egg, larva, puparium and adult are highlighted with illustrations and/or micrographs. Research pertaining to molecular analysis used for fly identification and to the developmental rate of larvae is included. Regarding medical involvement, the larvae are not only myiasis-producing agents in humans and animals but are also associated with human death investigations.
This information will enable accurate identification of this species and emphasizes its increasing medical importance in Thailand.
Sarcophaga dux; Review literature; Thailand; Forensic entomology; Myiasis; Morphology; Adult; Immature stages