1.  In depth annotation of the Anopheles gambiae mosquito midgut transcriptome 
BMC Genomics  2014;15(1):636.
Genome sequencing of Anopheles gambiae was completed more than ten years ago and has accelerated research on malaria transmission. However, annotation needs to be refined and verified experimentally, as most predicted transcripts have been identified by comparative analysis with genomes from other species. The mosquito midgut—the first organ to interact with Plasmodium parasites—mounts effective antiplasmodial responses that limit parasite survival and disease transmission. High-throughput Illumina sequencing of the midgut transcriptome was used to identify new genes and transcripts, contributing to the refinement of An. gambiae genome annotation.
We sequenced ~223 million reads from An. gambiae midgut cDNA libraries generated from susceptible (G3) and refractory (L35) mosquito strains. Mosquitoes were infected with either Plasmodium berghei or Plasmodium falciparum, and midguts were collected after the first or second Plasmodium infection. In total, 22,889 unique midgut transcript models were generated from both An. gambiae strain sequences combined, and 76% are potentially novel. Of these novel transcripts, 49.5% aligned with annotated genes and appear to be isoforms or pre-mRNAs of reference transcripts, while 50.5% mapped to regions between annotated genes and represent novel intergenic transcripts (NITs). Predicted models were validated for midgut expression using qRT-PCR and microarray analysis, and novel isoforms were confirmed by sequencing predicted intron-exon boundaries. Coding potential analysis revealed that 43% of total midgut transcripts appear to be long non-coding RNA (lncRNA), and functional annotation of NITs showed that 68% had no homology to current databases from other species. Reads were also analyzed using de novo assembly and predicted transcripts compared with genome mapping-based models. Finally, variant analysis of G3 and L35 midgut transcripts detected 160,742 variants with respect to the An. gambiae PEST genome, and 74% were new variants. Intergenic transcripts had a higher frequency of variation compared with non-intergenic transcripts.
This in-depth Illumina sequencing and assembly of the An. gambiae midgut transcriptome doubled the number of known transcripts and tripled the number of variants known in this mosquito species. It also revealed the existence of a large number of lncRNAs and opens new possibilities for investigating the biological function of many newly discovered transcripts.
Electronic supplementary material
The online version of this article (doi:10.1186/1471-2164-15-636) contains supplementary material, which is available to authorized users.
PMCID: PMC4131051  PMID: 25073905
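The percentages reported in entry 1 can be turned into approximate transcript and variant counts. The sketch below is illustrative only: the abstract gives percentages, not these counts, so every derived number is a rounded estimate, and the helper name `approx_count` is hypothetical.

```python
# Totals and percentages exactly as stated in the abstract of entry 1.
TOTAL_TRANSCRIPTS = 22_889      # unique midgut transcript models
NOVEL_FRACTION = 0.76           # "76% are potentially novel"
ISOFORM_FRACTION = 0.495        # novel transcripts aligned with annotated genes
INTERGENIC_FRACTION = 0.505     # novel intergenic transcripts (NITs)
TOTAL_VARIANTS = 160_742        # variants relative to the PEST genome
NEW_VARIANT_FRACTION = 0.74     # "74% were new variants"


def approx_count(total, fraction):
    """Return the nearest whole count for a reported fraction (an estimate only)."""
    return round(total * fraction)


# Derived values are approximations because the source percentages are rounded.
novel = approx_count(TOTAL_TRANSCRIPTS, NOVEL_FRACTION)
isoforms = approx_count(novel, ISOFORM_FRACTION)
nits = approx_count(novel, INTERGENIC_FRACTION)
new_variants = approx_count(TOTAL_VARIANTS, NEW_VARIANT_FRACTION)

print(f"approx. novel transcripts:      {novel}")
print(f"approx. isoforms or pre-mRNAs:  {isoforms}")
print(f"approx. intergenic (NITs):      {nits}")
print(f"approx. new variants:           {new_variants}")
```

Because the inputs are rounded to at most one decimal place, the derived counts should be read as order-of-magnitude guides rather than values from the paper.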
2.  A deep insight into the sialotranscriptome of the mosquito, Psorophora albipes 
BMC Genomics  2013;14:875.
Psorophora mosquitoes are exclusively found in the Americas and have been associated with transmission of encephalitis and West Nile fever viruses, among other arboviruses. Mosquito salivary glands represent the final route of differentiation and transmission of many parasites. They also secrete molecules with powerful pharmacologic actions that modulate host hemostasis, inflammation, and immune response. Here, we employed next-generation sequencing and proteome approaches to investigate for the first time the salivary composition of a mosquito member of the Psorophora genus. We additionally discuss the evolutionary position of this mosquito genus within the Culicidae family by comparing the identity of its secreted salivary compounds to other mosquito salivary proteins identified so far.
Illumina sequencing resulted in 13,535,229 sequence reads, which were assembled into 3,247 contigs. All families were classified according to their in silico-predicted function/activity. Annotation of these sequences allowed classification of their products into 83 salivary protein families, twenty (24.39%) of which were confirmed by our subsequent proteome analysis. Two protein families were deorphanized from Aedes and one from Ochlerotatus, while four protein families were described as novel to the Psorophora genus because they had no match with any other known mosquito salivary sequence. Several protein families described as exclusive to Culicines were present in Psorophora mosquitoes, while we did not identify any member of the protein families already known as unique to Anophelines. Also, the Psorophora salivary proteins had better identity to homologs in Aedes (69.23%), followed by Ochlerotatus (8.15%), Culex (6.52%), and Anopheles (4.66%).
This is the first sialome (from the Greek sialo = saliva) catalog of salivary proteins from a Psorophora mosquito, which may be useful for better understanding the lifecycle of this mosquito and the role of its salivary secretion in arboviral transmission.
PMCID: PMC3878727  PMID: 24330624
3.  An insight into the sialome of Simulium guianense (Diptera: Simuliidae), the main vector of River Blindness Disease in Brazil 
BMC Genomics  2011;12:612.
Little is known about the composition and function of the saliva in black flies such as Simulium guianense, the main vector of river blindness disease in Brazil. The complex salivary potion of hematophagous arthropods counteracts their host's hemostasis, inflammation, and immunity.
Transcriptome analysis revealed ubiquitous salivary protein families--such as the Antigen-5, Yellow, Kunitz domain, and serine proteases--in the S. guianense sialotranscriptome. Insect-specific families were also found. About 63.4% of all secreted products revealed protein families found only in Simulium. Additionally, we found a novel peptide similar to kunitoxin with a structure distantly related to serine protease inhibitors. This study revealed a relative increase of transcripts of the SVEP protein family when compared with Simulium vittatum and S. nigrimanum sialotranscriptomes. We were able to extract coding sequences from 164 proteins associated with blood and sugar feeding, the majority of which were confirmed by proteome analysis.
Our results contribute to understanding the role of Simulium saliva in transmission of Onchocerca volvulus and the evolution of salivary proteins in black flies. They also constitute a platform for mining novel anti-hemostatic compounds, vaccine candidates against filariasis, and immuno-epidemiologic markers of vector exposure.
PMCID: PMC3285218  PMID: 22182526
4.  Novel transposable elements from Anopheles gambiae 
BMC Genomics  2011;12:260.
Transposable elements (TEs) are DNA sequences, present in the genome of most eukaryotic organisms, whose key characteristic is the ability to mobilize and increase their copy number within chromosomes. These elements are important for eukaryotic genome structure and evolution and lately have been considered as potential drivers for introducing transgenes into pathogen-transmitting insects as a means to control vector-borne diseases. The aim of this work was to catalog the diversity and abundance of TEs within the Anopheles gambiae genome using the PILER tool and to consolidate a database in the form of a hyperlinked spreadsheet containing detailed and readily available information about the TEs present in the genome of An. gambiae.
Here we present the spreadsheet named AnoTExcel, which constitutes a database with detailed information on most of the repetitive elements present in the genome of the mosquito. Despite previous work on this topic, our approach permitted the identification and characterization of both previously described and novel TEs, which are further described in detail.
Identification and characterization of TEs in a given genome is important as a way to understand the diversity and evolution of the whole set of TEs present in a given species. This work contributes to a better understanding of the landscape of TEs present in the mosquito genome. It also presents a novel platform for the identification, analysis, and characterization of TEs on sequenced genomes.
PMCID: PMC3212995  PMID: 21605407
5.  A further insight into the sialome of the tropical bont tick, Amblyomma variegatum 
BMC Genomics  2011;12:136.
Ticks--vectors of medical and veterinary importance--are themselves also significant pests. Tick salivary proteins are the result of adaptation to blood feeding and contain inhibitors of blood clotting, platelet aggregation, and angiogenesis, as well as vasodilators and immunomodulators. A previous analysis of the sialotranscriptome (from the Greek sialo, saliva) of Amblyomma variegatum is revisited in light of recent advances in tick sialomes and provides a database to perform a proteomic study.
The clusterized data set has been expertly curated in light of recent reviews on tick salivary proteins, identifying many new families of tick-exclusive proteins. A proteome study using salivary gland homogenates identified 19 putative secreted proteins within a total of 211 matches.
The annotated sialome of A. variegatum allows its comparison to other tick sialomes, helping to consolidate an emerging pattern in the salivary composition of metastriate ticks; novel protein families were also identified. Because most of these proteins have no known function, the task of functional analysis of these proteins and the discovery of novel pharmacologically active compounds becomes possible.
PMCID: PMC3060141  PMID: 21362191
6.  An insight into the sialotranscriptome of the brown dog tick, Rhipicephalus sanguineus 
BMC Genomics  2010;11:450.
Rhipicephalus sanguineus, known as the brown dog tick, is a common ectoparasite of domestic dogs and can be found worldwide. R. sanguineus is recognized as the primary vector of the etiological agent of canine monocytic ehrlichiosis and canine babesiosis. Here we present the first description of a R. sanguineus salivary gland transcriptome by the production and analysis of 2,034 expressed sequence tags (EST) from two cDNA libraries, one constructed using mRNA from dissected salivary glands from female ticks fed for 3-5 days (early to mid library, RsSGL1) and the other from ticks fed for 5 days (mid library, RsSGL2), identifying 1,024 clusters of related sequences.
Based on sequence similarities to nine different databases, we identified transcripts of genes that were further categorized according to function. The category of putative housekeeping genes contained ~56% of the sequences and had on average 2.49 ESTs per cluster; the secreted protein category contained 26.6% of the ESTs and had 2.47 ESTs per cluster; and 15.3% of the ESTs, mostly singletons, were not classifiable and were annotated as "unknown function". The secreted category included genes that coded for lipocalins, protease inhibitors, disintegrins, metalloproteases, immunomodulatory and anti-inflammatory proteins such as Evasins and Da-p36, as well as basic-tail and 18.3 kDa proteins, cement proteins, mucins, defensins and antimicrobial peptides. Comparison of the abundance of ESTs from similar contigs of the two salivary gland cDNA libraries allowed the identification of differentially expressed genes, such as genes coding for Evasins and a thrombin inhibitor, which were overexpressed in RsSGL1 (early to mid library) versus RsSGL2 (mid library), indicating their role in inhibition of inflammation at the tick feeding site from the very beginning of the blood meal. Conversely, sequences related to cement (64P), whose function has been correlated with tick attachment, were largely expressed in the mid library.
Our survey provided an insight into the R. sanguineus sialotranscriptome, which can assist the discovery of new targets for anti-tick vaccines, as well as help to identify pharmacologically active proteins.
PMCID: PMC3091647  PMID: 20650005
7.  An insight into the sialome of Glossina morsitans morsitans 
BMC Genomics  2010;11:213.
Blood feeding evolved independently in worms, arthropods and mammals. Among the adaptations to this peculiar diet, these animals developed an armament of salivary molecules that disarm their host's anti-bleeding defenses (hemostasis), inflammatory and immune reactions. Recent sialotranscriptome analyses (from the Greek sialo = saliva) of blood feeding insects and ticks have revealed that the saliva contains hundreds of polypeptides, many unique to their genus or family. Adult tsetse flies feed exclusively on vertebrate blood and are important vectors of human and animal diseases. Thus far, only limited information exists regarding the Glossina sialome, or any other fly belonging to the Hippoboscidae.
As part of the effort to sequence the genome of Glossina morsitans morsitans, several organ specific, high quality normalized cDNA libraries have been constructed, from which over 20,000 ESTs from an adult salivary gland library were sequenced. These ESTs have been assembled using previously described ESTs from the fat body and midgut libraries of the same fly, thus totaling 62,251 ESTs, which have been assembled into 16,743 clusters (8,506 of which had one or more EST from the salivary gland library). Coding sequences were obtained for 2,509 novel proteins, 1,792 of which had at least one EST expressed in the salivary glands. Despite library normalization, 59 transcripts were overrepresented in the salivary library indicating high levels of expression. This work presents a detailed analysis of the salivary protein families identified. Protein expression was confirmed by 2D gel electrophoresis, enzymatic digestion and mass spectrometry. Concurrently, an initial attempt to determine the immunogenic properties of selected salivary proteins was undertaken.
The sialome of G. m. morsitans contains over 250 proteins that are possibly associated with blood feeding. This set includes alleles of previously described gene products, reveals new evidence that several salivary proteins are multigenic, and identifies at least seven new polypeptide families unique to Glossina. Most of these proteins have no known function and thus provide a discovery platform for the identification of novel pharmacologically active compounds, innovative vector-based vaccine targets, and immunological markers of vector exposure.
PMCID: PMC2853526  PMID: 20353571
8.  Transcriptome analysis of reproductive tissue and intrauterine developmental stages of the tsetse fly (Glossina morsitans morsitans) 
BMC Genomics  2010;11:160.
Tsetse flies, vectors of African trypanosomes, undergo viviparous reproduction (the deposition of live offspring). This reproductive strategy results in a large maternal investment and the deposition of a small number of progeny during a female's lifespan. The reproductive biology of tsetse has been studied on a physiological level; however, the molecular analysis of tsetse reproduction requires deeper investigation. To build a foundation on which to base molecular studies of tsetse reproduction, a cDNA library was generated from female tsetse (Glossina morsitans morsitans) reproductive tissues and the intrauterine developmental stages. A total of 3438 expressed sequence tags were sequenced and analyzed.
Analysis of a nonredundant catalogue of 1391 contigs resulted in 520 predicted proteins, 475 of which were full length. We predict that 412 of these represent cytoplasmic proteins while 57 are secreted. Comparison of these proteins with other tissue-specific tsetse cDNA libraries (salivary gland, fat body/milk gland, and midgut) identified 51 that are unique to the reproductive/immature cDNA library. Eleven unique proteins were homologous to uncharacterized putative proteins within the NR database, suggesting the identification of novel genes associated with reproductive functions in other insects (hypothetical conserved). The analysis also yielded seven putative proteins without significant homology to sequences present in the public database (unknown genes). These proteins may represent unique functions associated with tsetse's viviparous reproductive cycle. RT-PCR analysis of hypothetical conserved and unknown contigs was performed to determine basic tissue and stage specificity of the expression of these genes.
This paper identifies 51 putative proteins specific to a tsetse reproductive/immature EST library. 11 of these proteins correspond to hypothetical conserved genes and 7 proteins are tsetse specific.
PMCID: PMC2846916  PMID: 20214793
9.  An insight into the sialotranscriptome of the West Nile mosquito vector, Culex tarsalis 
BMC Genomics  2010;11:51.
Saliva of adult female mosquitoes helps sugar and blood feeding by providing enzymes and polypeptides that help sugar digestion, control microbial growth and counteract their vertebrate host's hemostasis and inflammation. Mosquito saliva also potentiates the transmission of vector-borne pathogens, including arboviruses. Culex tarsalis is a bird-feeding mosquito vector of West Nile Virus closely related to C. quinquefasciatus, a mosquito relatively recently adapted to feed on humans, and the only mosquito of the genus Culex to have its sialotranscriptome so far described.
A total of 1,753 clones randomly selected from an adult female C. tarsalis salivary glands (SG) cDNA library were sequenced and used to assemble a database that yielded 809 clusters of related sequences, 675 of which were singletons. Primer extension experiments were performed in selected clones to further extend sequence coverage, allowing for the identification of 283 protein sequences, 80 of which code for putative secreted proteins.
Comparison of the C. tarsalis sialotranscriptome with that of C. quinquefasciatus reveals accelerated evolution of salivary proteins as compared to housekeeping proteins. The average amino acid identity among salivary proteins is 70.1%, while that for housekeeping proteins is 91.2% (P < 0.05), and the codon volatility of secreted proteins is significantly higher than that of housekeeping proteins. Several protein families previously found exclusively in mosquitoes, including some found only in the Aedes genus, have been identified in C. tarsalis. Interestingly, a protein family so far unique to C. quinquefasciatus, with 30 genes, is also found in C. tarsalis, indicating it was not a specific C. quinquefasciatus acquisition in its evolution to optimize mammal blood feeding.
PMCID: PMC2823692  PMID: 20089177
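Entry 9 compares average amino acid identity between salivary and housekeeping proteins. One common way to compute such a figure is sketched below, assuming the sequences are already pairwise aligned and that identity is counted over non-gap columns only. This is not necessarily the authors' procedure (identity definitions differ in their denominator), the function names are hypothetical, and the sequences in the usage lines are invented toy strings, not data from the paper.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of non-gap aligned columns in which both residues are identical."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # columns with a gap in either sequence are skipped
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no aligned columns to compare")
    return 100.0 * matches / compared


def mean_identity(aligned_pairs):
    """Average percent identity over a list of (aligned_a, aligned_b) pairs."""
    values = [percent_identity(a, b) for a, b in aligned_pairs]
    return sum(values) / len(values)


# Toy, made-up alignments used only to exercise the functions.
toy_pairs = [("AAAA", "AAAA"), ("AAAA", "AAGG")]
print(mean_identity(toy_pairs))
```

Skipping gap columns is a design choice; dividing by full alignment length or by the shorter sequence length would give lower values for the same alignment.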
10.  The salivary gland transcriptome of the neotropical malaria vector Anopheles darlingi reveals accelerated evolution of genes relevant to hematophagy 
BMC Genomics  2009;10:57.
Mosquito saliva, consisting of a mixture of dozens of proteins affecting vertebrate hemostasis and having sugar digestive and antimicrobial properties, helps both blood and sugar meal feeding. Culicine and anopheline mosquitoes diverged ~150 MYA, and within the anophelines, the New World species diverged from those of the Old World ~95 MYA. While the sialotranscriptome (from the Greek sialo, saliva) of several species of the Cellia subgenus of Anopheles has been described thoroughly, no detailed analysis of any New World anopheline has been done to date. Here we present and analyze data from a comprehensive salivary gland (SG) transcriptome of the neotropical malaria vector Anopheles darlingi (subgenus Nyssorhynchus).
A total of 2,371 clones randomly selected from an adult female An. darlingi SG cDNA library were sequenced and used to assemble a database that yielded 966 clusters of related sequences, 739 of which were singletons. Primer extension experiments were performed in selected clones to further extend sequence coverage, allowing for the identification of 183 protein sequences, 114 of which code for putative secreted proteins.
Comparative analysis of sialotranscriptomes of An. darlingi and An. gambiae reveals significant divergence of salivary proteins. On average, salivary proteins are only 53% identical, while housekeeping proteins are 86% identical between the two species. Furthermore, An. darlingi proteins were found that match culicine but not anopheline proteins, indicating loss or rapid evolution of these proteins in the Old World Cellia subgenus. On the other hand, several well-represented salivary protein families in Old World anophelines are not expressed in An. darlingi.
PMCID: PMC2644710  PMID: 19178717
11.  cDNA sequences reveal considerable gene prediction inaccuracy in the Plasmodium falciparum genome 
BMC Genomics  2007;8:255.
The completion of the Plasmodium falciparum genome represents a milestone in malaria research. The genome sequence allows for the development of genome-wide approaches such as microarray and proteomics that will greatly facilitate our understanding of the parasite biology and accelerate new drug and vaccine development. The design and application of these genome-wide assays, however, require accurate information on gene prediction and genome annotation. Unfortunately, the genes in the parasite genome databases were mostly identified using computer software that could make some erroneous predictions.
We aimed to obtain cDNA sequences to examine the accuracy of gene prediction in silico. We constructed cDNA libraries from mixed blood stages of P. falciparum parasite using the SMART cDNA library construction technique and generated 17332 high-quality expressed sequence tags (EST), including 2198 from primer-walking experiments. Assembly of our sequence tags produced 2548 contigs and 2671 singletons versus 5220 contigs and 5910 singletons when our EST were assembled with EST in public databases. Comparison of all the assembled EST/contigs with predicted CDS and genomic sequences in the PlasmoDB database identified 356 genes with predicted coding sequences fully covered by EST, including 85 genes (23.6%) with introns incorrectly predicted. Careful automatic software and manual alignments found an additional 308 genes that have introns different from those predicted, with 152 new introns discovered and 182 introns with sizes or locations different from those predicted. Alternative spliced and antisense transcripts were also detected. Matching cDNA to predicted genes also revealed silent chromosomal regions, mostly at subtelomere regions.
Our data indicated that approximately 24% of the genes in the current databases were predicted incorrectly, although some of these inaccuracies could represent alternatively spliced transcripts, and that more genes than currently predicted have one or more additional introns. It is therefore necessary to annotate the parasite genome with experimental data, although obtaining complete cDNA sequences from this parasite will be a formidable task due to the high AT nature of the genome. This study provides valuable information for genome annotation that will be critical for functional analyses.
PMCID: PMC1978503  PMID: 17662120
12.  An insight into the sialome of the oriental rat flea, Xenopsylla cheopis (Rots) 
BMC Genomics  2007;8:102.
The salivary glands of hematophagous animals contain a complex cocktail that interferes with the host hemostasis and inflammation pathways, thus increasing feeding success. Fleas represent a relatively recent group of insects that evolved hematophagy independently of other insect orders.
Analysis of the salivary transcriptome of the flea Xenopsylla cheopis, the vector of human plague, indicates that gene duplication events have led to a large expansion of a family of acidic phosphatases that are probably inactive, and to the expansion of the FS family of peptides that are unique to fleas. Several other unique polypeptides were also uncovered. Additionally, an apyrase-coding transcript of the CD39 family appears as the candidate for the salivary nucleotide hydrolysing activity in X. cheopis, the first time this family of proteins has been found in any arthropod salivary transcriptome.
Analysis of the salivary transcriptome of the flea X. cheopis revealed the unique pathways taken in the evolution of the salivary cocktail of fleas. Gene duplication events appear as an important driving force in the creation of salivary cocktails of blood feeding arthropods, as was observed with ticks and mosquitoes. Only five other flea salivary sequences exist at this time at NCBI, all from the cat flea C. felis. This work accordingly represents the only relatively extensive sialome description of any flea species. Sialotranscriptomes of additional flea genera will reveal the extent to which these novel polypeptide families are common throughout the Siphonaptera.
PMCID: PMC1876217  PMID: 17437641
13.  An annotated catalogue of salivary gland transcripts in the adult female mosquito, Ædes ægypti 
BMC Genomics  2007;8:6.
Saliva of blood-sucking arthropods contains a cocktail of antihemostatic agents and immunomodulators that help blood feeding. Mosquitoes additionally feed on sugar meals and have specialized regions of their glands containing glycosidases and antimicrobials that might help control bacterial growth in the ingested meals. To expand our knowledge on the salivary cocktail of Ædes ægypti, a vector of dengue and yellow fevers, we analyzed a set of 4,232 expressed sequence tags from cDNA libraries of adult female mosquitoes.
A nonredundant catalogue of 614 transcripts (573 of which are novel) is described, including 136 coding for proteins of a putative secretory nature. Additionally, a two-dimensional gel electrophoresis of salivary gland (SG) homogenates followed by tryptic digestion of selected protein bands and MS/MS analysis revealed the expression of 24 proteins. Analysis of tissue-specific transcription of a subset of these genes revealed at least 31 genes whose expression is specific or enriched in female SG, whereas 24 additional genes were expressed in female SG and in males but not in other female tissues. Most of the 55 proteins coded by these SG transcripts have no known function and represent high-priority candidates for expression and functional analysis as antihemostatic or antimicrobial agents. An unexpected finding is the occurrence of four protein families specific to SG that were probably a product of horizontal transfer from prokaryotic organisms to mosquitoes.
Overall, this paper identifies 573 new transcripts, or nearly 3% of the Æ. ægypti proteome assuming a 20,000-protein set, and contributes to the best-described sialome of any blood-feeding insect.
PMCID: PMC1790711  PMID: 17204158
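The roughly 3% figure in entry 13 follows directly from the numbers in the abstract. The short check below uses the 20,000-protein proteome size that the authors themselves state as an assumption; the variable names are illustrative.

```python
# Figures taken from the abstract of entry 13.
NOVEL_TRANSCRIPTS = 573
CATALOGUE_SIZE = 614
ASSUMED_PROTEOME_SIZE = 20_000  # the abstract's own stated assumption

# Share of the assumed proteome represented by the novel transcripts.
proteome_pct = 100.0 * NOVEL_TRANSCRIPTS / ASSUMED_PROTEOME_SIZE

# Share of the nonredundant catalogue that is novel.
catalogue_pct = 100.0 * NOVEL_TRANSCRIPTS / CATALOGUE_SIZE

print(f"{proteome_pct:.2f}% of the assumed proteome")
print(f"{catalogue_pct:.1f}% of the catalogue is novel")
```

The first value lands just under 3%, matching the abstract's wording; a different assumed proteome size would shift it proportionally.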

