This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Mosquito saliva, consisting of a mixture of dozens of proteins affecting vertebrate hemostasis and having sugar digestive and antimicrobial properties, helps both blood and sugar meal feeding. Culicine and anopheline mosquitoes diverged ~150 MYA, and within the anophelines, the New World species diverged from those of the Old World ~95 MYA. While the sialotranscriptome (from the Greek sialo, saliva) of several species of the Cellia subgenus of Anopheles has been described thoroughly, no detailed analysis of any New World anopheline has been done to date. Here we present and analyze data from a comprehensive salivary gland (SG) transcriptome of the neotropical malaria vector Anopheles darlingi (subgenus Nyssorhynchus).
A total of 2,371 clones randomly selected from an adult female An. darlingi SG cDNA library were sequenced and used to assemble a database that yielded 966 clusters of related sequences, 739 of which were singletons. Primer extension experiments were performed in selected clones to further extend sequence coverage, allowing for the identification of 183 protein sequences, 114 of which code for putative secreted proteins.
Comparative analysis of sialotranscriptomes of An. darlingi and An. gambiae reveals significant divergence of salivary proteins. On average, salivary proteins are only 53% identical, while housekeeping proteins are 86% identical between the two species. Furthermore, An. darlingi proteins were found that match culicine but not anopheline proteins, indicating loss or rapid evolution of these proteins in the Old World Cellia subgenus. On the other hand, several salivary protein families that are well represented in Old World anophelines are not expressed in An. darlingi.
Saliva of hematophagous arthropods contains a vast array of compounds that disarm their hosts' hemostasis and inflammation, thus helping to obtain a blood meal [1,2]. In the case of mosquitoes and other blood-sucking Nematocera, saliva also aids ingestion of sugar meals through carbohydrate-hydrolysing enzymes. Antimicrobial products, in the form of pattern recognition proteins, serine proteases, and antimicrobial peptides (AMPs), are also routinely found in the saliva of hematophagous arthropods; these may protect the blood or sugar meal from harmful microbial growth.
Detailed sialotranscriptomes of several mosquito species [4-13] are revealing their salivary composition to include a number of proteins of previously known families as well as completely novel families unique to mosquitoes or their close relatives among the hematophagous Nematocera. In particular, studies done with Culex quinquefasciatus, Aedes aegypti, and Anopheles gambiae, for which the genomes are known, indicate that the mosquito salivary cocktail consists of 60–100 secreted proteins, several of which are members of multigene families. In these studies, Aedes-, Anopheles-, and Culex-specific proteins were discovered. Most of the salivary proteins do not have a known function but presumably affect hemostasis, inflammation, and sugar digestion or have antimicrobial activity.
Within the Anopheles genus, sialotranscriptomes were described for An. gambiae [11-13], An. funestus, and An. stephensi, all members of the same subgenus Cellia. These studies allowed the discovery of species-specific proteins and, importantly, showed that the salivary proteins among members of the same subgenus are very divergent when compared to housekeeping proteins, perhaps due to immune pressure of their vertebrate hosts, in the case of antihemostatic or antiinflammatory proteins, or of microbial resistance, in the case of antimicrobial products. An. darlingi (subgenus Nyssorhynchus) is an important vector of human malaria in Central and South America, and, like all non-autogenous mosquitoes, adult females absolutely require a blood meal to develop eggs, preferring humans to other blood sources. Preliminary studies with An. darlingi salivary glands identified one salivary lysozyme, and a limited proteomic study identified three additional salivary proteins. Additionally, a salivary transcriptome of An. darlingi was previously described, but no protein sequences were extracted from that expressed sequence tag (EST) set. In the present work, we increased the An. darlingi salivary EST set from 593 to 2,371 and extracted and deposited 183 protein sequences to GenBank, 114 of which represent putative salivary secreted proteins (inclusive of alleles). This new set of proteins reveals novel proteins as well as protein families that were previously found only in Culex, thus pointing to their existence ~150 MYA in the common ancestor of culicines and anophelines, and to their loss in the genus Aedes and the Cellia anopheline subgenus. Accordingly, the complex and varied evolution of salivary proteins in mosquitoes is being revealed at the same time that new protein families with potentially novel pharmacologic activities are being discovered.
A total of 2,371 cDNA clones were used to assemble a database [see additional file 1] that yielded 966 clusters of related sequences, 739 of which contained only one EST. This dataset included the 593 sequences used in our previous work. The 966 clusters were compared, using the programs blastx, blastn, or RPS-BLAST, to the nonredundant (NR) protein database of the National Center for Biotechnology Information (NCBI), National Library of Medicine, NIH, to a gene ontology database, to the conserved domains database of the NCBI, and to a custom-prepared subset of the NCBI nucleotide database containing either mitochondrial or rRNA sequences.
Three categories of expressed genes were derived from the manual annotation of the contigs (Fig. 1). The putatively secreted (S) category contained 50% of the sequences, the housekeeping (H) category had 34%, and 16% of the ESTs could not be classified and belong to the unknown (U) class. The transcripts of the U class could represent novel proteins or derive from the less conserved 3' or 5' untranslated regions of genes, as was indicated for the sialotranscriptome of An. gambiae.
The 797 ESTs attributed to H genes expressed in the salivary glands (SGs) of An. darlingi were further characterized into 19 subgroups according to function (Table 1 and additional file 1). Transcripts associated with the protein synthesis machinery represented 53% of all transcripts associated with a housekeeping function, an expected result for the secretory nature of the organ. Energy metabolism accounted for 10% of the transcripts. Twenty percent of the transcripts were classified as either 'Unknown conserved' or 'Conserved secreted' proteins. These represent highly conserved proteins of unknown function, presumably associated with cellular function but still uncharacterized. These sets may help functional identification of the 'Conserved hypothetical' proteins, as previously reviewed.
A total of 1,188 ESTs represent putative An. darlingi salivary components (Table 2 and Supplemental Table S1). These include previously known gene families as well as novel proteins. Table 2 also indicates our degree of knowledge, or ignorance, regarding these protein families, for 22 of which we have no hint of function. Many of these putatively secreted protein families of unknown function are multigenic, such as the SG1 and antigen-5 families. The D7/OBP-like and aegyptin/30-kDa families contribute 30% of all transcripts associated with secreted products. This is in line with these proteins accounting for the most intensely stained bands in SDS gels of mosquito salivary homogenates [4,7-10]. The identification of 8% of the transcripts with antimicrobial polypeptides is exceptional. Possibly this high level of expression, when compared with previous mosquito sialotranscriptomes, derives from the fact that the An. darlingi used in this work were captured from the field and, as such, could have been more exposed to pathogens than the laboratory-reared insects used to originate other mosquito salivary transcriptomes. Mosquito age could be another variable, as the laboratory-reared mosquitoes had their glands removed in the first two days after emergence, while the ages of captured An. darlingi could not be specified but were most likely older than two days.
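The category percentages quoted above can be checked against the EST counts given in the text. The following is a minimal sketch, not the authors' procedure: the S and H counts (1,188 and 797) and the total (2,371) come from the text, while the U count is not stated and is assumed here to be the remainder.

```python
# EST counts stated in the text. The unknown (U) count is NOT stated in the
# text; this sketch assumes it is simply the remainder.
TOTAL_ESTS = 2371
SECRETED_ESTS = 1188
HOUSEKEEPING_ESTS = 797


def category_percentages(total, secreted, housekeeping):
    """Return the percentage of ESTs in the S, H, and U categories."""
    unknown = total - secreted - housekeeping  # assumed remainder
    counts = {"S": secreted, "H": housekeeping, "U": unknown}
    return {name: 100.0 * n / total for name, n in counts.items()}


percentages = category_percentages(TOTAL_ESTS, SECRETED_ESTS, HOUSEKEEPING_ESTS)
for name, pct in percentages.items():
    print(f"{name}: {pct:.1f}%")
```

Under that assumption, the rounded values agree with the 50%, 34%, and 16% reported for the S, H, and U classes.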
From the sequenced cDNAs, a total of 183 novel An. darlingi protein sequences were derived, 114 of which code for putative secreted products (Table 2, Table 3, and additional file 2). Table 3 presents a summary of the secreted subset, with links to GenBank.
The first D7 protein was cloned from a cDNA library from adult female Ae. aegypti SGs. It had an appropriately cryptic name because, at the time, it did not match other known proteins and its function was thus unknown. Additional members of this family were later described in An. gambiae, other mosquito species, and also in sand flies [11,23,24]. In these insects, salivary D7 proteins are encoded by multiple genes, and short and long versions of this protein family were recognized. The D7 protein family was then identified as a member of the odorant-binding protein (OBP) superfamily, the long versions containing two OBP domains and the short versions containing one. Because insect OBPs are known to bind and carry lipophilic compounds such as odorants and pheromones, the potential function of D7 proteins was proposed to be related to binding one or more agonists of hemostasis and thus helping blood feeding. This prediction was confirmed when the short D7 proteins from An. gambiae and the carboxy terminal domain of the long D7 of Ae. aegypti were found to bind biogenic amines with high affinity. More recently, the amino terminal OBP domain of a D7 long form of Ae. aegypti was shown to bind peptidic leukotrienes with high affinity. The crystal structures of a short D7 protein from An. gambiae and a long D7 protein from Ae. aegypti revealed that the D7 OBP domains have seven alpha helices, two more than the canonical OBP family. In addition to these inflammatory agonist-binding functions, a short D7 protein from An. stephensi, named hamadarin, was shown to inhibit bradykinin formation by inhibiting the FXII/Kallikrein pathway.
An. gambiae has three genes coding for long D7 proteins and five coding for the short proteins, arranged in a single contiguous gene cassette in chromosome 3R. We will refer below to these proteins from An. gambiae by the transcriptional order in which their genes appear in chromosome 3R. Twelve An. darlingi proteins exhibiting sequence similarity to proteins from the D7 family were identified (Table 2 and Supplemental Table S2). These include five pairs that are more than 95% identical to each other and are probably alleles. Accordingly, at least six unique products from the D7 family are identifiable in the An. darlingi salivary transcriptome. The alignment and phylogram of these protein sequences with all the D7 protein sequences of An. gambiae reveal i) the existence of An. darlingi proteins that are uniquely shorter, indicated by the bar above the alignment (Fig. 2A), which form a robust clade named 'Short AD clade' in Figure 2B. This clade is most closely related to the short D7 proteins 1 and 4 from An. gambiae (Fig. 2B), as indicated by strong bootstrap support; ii) homologues of An. gambiae short proteins 2 and 3 are identifiable (indicated as s2/s3 homologue in Fig. 2B), as well as the ortholog of the fifth short protein of An. gambiae; and iii) AD-118 represents an An. darlingi long D7 protein that is related to An. gambiae long D7 proteins 1 and 2.
AD-1 and AD-3, which possibly derive from a polymorphic gene, are similar to the D7s2 and D7s3 of An. gambiae. These proteins have in common a similar size as well as being the most transcribed D7 proteins in both species. AD-1 and AD-3, but not the other An. darlingi D7 sequences, share an amino acid (aa) pattern, included in a cysteine framework, that is known from crystal structures to make contact with biogenic amines [27,29]. The high transcription of these gene products is in line with the large amounts of protein needed to scavenge biogenic amines that accumulate to the order of one micromolar in the host tissues, suggesting that these An. darlingi proteins, like their An. gambiae homologues, function as biogenic amine scavengers.
D7s1 from An. gambiae, the homologue of An. stephensi hamadarin, has an alkaline pI of 9.22, contrasting with the neutral or acidic pI of the remaining short D7 proteins. To the extent this basic pI is associated with hamadarin function, it is worth noting that AD-81 and AD-31 (Fig. 2) also have pIs above 8.5, whereas the more distantly related AD-97 does not. These three An. darlingi proteins are members of the novel short AD clade (Fig. 2B), which shares the same tree branch where D7s1 from An. gambiae is located, suggesting they could have a function similar to that of hamadarin.
This protein family, found exclusively in the SGs of adult female mosquitoes, was first identified as a salivary antigen in Ae. aegypti and later found in salivary transcriptomes and proteomes of both culicine and anopheline mosquitoes [4,6-9,13,31,32], where it was named GE-rich protein. Proteomic work also indicated that this is one of the most abundant proteins in the SGs of mosquitoes. Its gene promoter has been used to specifically drive abundant gene expression in the SGs of transgenic mosquitoes. More recently, proteins of this family from Aedes and Anopheles were shown to prevent platelet aggregation by collagen [34,35], indicating conservation of function after the split of the Culicidae into the culicines and anophelines, ~150 MYA.
Analysis of the sialotranscriptome of An. darlingi allowed the identification of 8 protein sequences from this family, all represented by 2–17 ESTs found in the library. These protein sequences most probably reflect alleles from a single polymorphic gene, as they all share at least 95% identity. This degree of polymorphism is paralleled in the An. darlingi D7 proteins but is greater than that determined in sialotranscriptomes of other mosquitoes. Possibly this high degree of sequence variability reflects our material deriving from field-caught insects, whereas previous sialotranscriptomes were made with more genetically uniform mosquito colonies.
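The reasoning used here and for the D7 family, in which sequences sharing about 95% identity or more are treated as probable alleles of one gene, can be illustrated with a small grouping routine. This is a sketch under stated assumptions, not the method used in this work: identity is computed over pre-aligned sequences of equal length, grouping is single-linkage, and the function names and toy sequences are hypothetical.

```python
from itertools import combinations


def percent_identity(a, b):
    """Percent identity between two pre-aligned, equal-length sequences.

    Columns where both sequences carry a gap ('-') are ignored.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    columns = [(x, y) for x, y in zip(a, b) if not (x == "-" and y == "-")]
    if not columns:
        return 0.0
    matches = sum(1 for x, y in columns if x == y and x != "-")
    return 100.0 * matches / len(columns)


def group_putative_alleles(aligned, threshold=95.0):
    """Single-linkage grouping of sequences whose identity meets the threshold."""
    parent = {name: name for name in aligned}

    def find(name):
        # Follow parent links to the group representative (with path halving).
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for a, b in combinations(aligned, 2):
        if percent_identity(aligned[a], aligned[b]) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for name in aligned:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in groups.values())


# Toy data only (hypothetical names, not sequences from this study).
base = "ACDEFGHIKLMNPQRSTVWY" * 2          # 40 residues
variant = base[:5] + "A" + base[6:]        # one substitution, 97.5% identical
toy = {"toy_a": base, "toy_b": variant, "toy_c": "Y" * 40}
print(group_putative_alleles(toy))
```

With the toy data, the two near-identical sequences fall into one group and the unrelated sequence stays on its own. Each group would then count as one candidate gene product.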
Alignment of all known members of this family, excluding those that are more than 95% identical and of the same species, shows their structure clearly to be dominated by three domains: the signal secretion peptide, a Gly/Glu-rich region, and a more conserved and organized region where the block T-x(29,30)-Q-x(5)-P-x(13,15)-I-x(2)-C-F-x(20)-C-x(8,10)-C-x(19,21)-C can be identified (Fig. 3A). This block was used with the seedtop program (http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/seedtop.html) to search over 6 million sequences of the NR database, retrieving only mosquito proteins. The phylogram (Fig. 3B) obtained from the alignment produces strong bootstrap support for three genus-specific clades, containing three genes for Ae. albopictus and Ae. aegypti, two for Culex quinquefasciatus, and one for each anopheline, An. stephensi, An. funestus, An. gambiae, An. darlingi, An. albimanus, and An. dirus. The An. darlingi protein groups, as expected, with that of An. albimanus, another American species from the Nyssorhynchus subgenus. Near the amino terminal of the mature sequences, the Nyssorhynchus-derived 30-kDa antigen/GE-rich sequences have an RGD motif, as pointed out before for the An. albimanus sequence; this triad is not found in similar proteins of other mosquitoes. RGD-containing peptides are commonly found in snake venoms and tick saliva, and the motif itself is usually found surrounded by two relatively close Cys residues that allow the RGD to be at the edge of a loop. This conformational feature permits the aa of the RGD motif to interact with integrins, disrupting platelet aggregation. It is unknown, however, whether the RGD domain present in the 30-kDa antigen/GE-rich proteins of Nyssorhynchus mosquitoes is structurally capable of interacting with integrins.
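For readers who want to scan their own sequences for the conserved block quoted above, the PROSITE-style notation can be translated into a regular expression. This is a minimal sketch and not a reimplementation of seedtop: the helper name is hypothetical, only single residues and x with optional repeat ranges are handled (no bracketed residue classes), and the test sequence is synthetic.

```python
import re


def prosite_like_to_regex(pattern):
    """Translate a block such as 'C-x(2)-F-x(8,10)-C' into a regex string.

    Supports single-letter residues and 'x' (any residue), each optionally
    followed by a repeat count '(n)' or range '(n,m)'.
    """
    parts = []
    for element in pattern.split("-"):
        match = re.fullmatch(r"([A-Zx])(?:\((\d+)(?:,(\d+))?\))?", element)
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        residue, low, high = match.groups()
        token = "." if residue == "x" else residue
        if low is None:
            parts.append(token)
        elif high is None:
            parts.append(f"{token}{{{low}}}")
        else:
            parts.append(f"{token}{{{low},{high}}}")
    return "".join(parts)


# Block quoted in the text for the 30-kDa antigen/GE-rich family.
GE_RICH_BLOCK = (
    "T-x(29,30)-Q-x(5)-P-x(13,15)-I-x(2)-C-F-x(20)-C-x(8,10)-C-x(19,21)-C"
)
ge_rich_regex = prosite_like_to_regex(GE_RICH_BLOCK)

# Synthetic sequence built to satisfy the block (not a real protein).
synthetic = (
    "T" + "A" * 29 + "Q" + "A" * 5 + "P" + "A" * 13 + "I" + "AA"
    + "CF" + "A" * 20 + "C" + "A" * 8 + "C" + "A" * 19 + "C"
)
print(ge_rich_regex)
print(re.search(ge_rich_regex, synthetic) is not None)
```

The same helper applies to the other blocks quoted in this article, since they use the same notation.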
The salivary anticlotting agent of An. albimanus, named anophelin, was previously characterized as a short acidic peptide with strong thrombin inhibitory activity [40,41]. Despite extensive sequencing of the salivary transcriptomes of many hematophagous arthropods, similar sequences are found only in sialotranscriptomes of anopheline mosquitoes. Two similar An. darlingi cDNAs, probably corresponding to alleles of a single gene, were identified. Conceptual translation of the gene results in acidic peptides of 6.3 kDa and pI of 3.9, which are 86% identical to An. albimanus anophelin.
The gSG7 family is also unique to anophelines. In An. gambiae, it has two genes coding for gSG7 and gSG72, both of which are highly enriched in female SGs. More recently, the An. stephensi homologue was determined to inhibit kallikrein and production of bradykinin, a pain-producing substance. Four putative alleles representing the homologue(s) of gSG7/Anophensin in An. darlingi were identified. These An. darlingi SG transcripts, though, have no more than 45% identity to the An. gambiae gSG7 and An. stephensi anophensin.
The Kazal domain is ubiquitously found in proteins of metazoan organisms and, accordingly, peptides containing this domain have been identified in studies of sialotranscriptomes and proteomes of tabanids [45,46], triatomine bugs [47,48], Culicoides sonorensis, and mosquitoes [4,7,8]. In Ae. aegypti and Ae. albopictus, the transcripts encoding Kazal domain proteins were ubiquitously expressed in all major organs analyzed, suggesting their function was not specific to blood feeding [4,7]. Kazal domain peptides have also been isolated and biochemically characterized from the midgut of triatomines, where they act as anticlotting agents [50-52], and from leech saliva, where they inhibit mast cell tryptase and plasmin [53-55]. Midgut transcriptomes of sand flies have also uncovered transcription of genes encoding peptides of this class [56,57]. In addition to their classical function as protease inhibitors, Kazal domain-containing peptides were identified as the salivary vasodilator of the horse flies Hybomitra bimaculata and Tabanus yao [45,46]. In An. darlingi, transcripts coding for three peptides with a Kazal domain were found, yielding predicted mature MW of 7.2–8.1 kDa and basic pI (8.3–9.4). AD-417 and AD-257 best match An. gambiae peptides when subjected to blastp against the NR database, albeit at only 45% and 44% identity. AD-350 best matches Aedes and Culex peptides at 47% and 51% identity. The function of these salivary peptides in mosquitoes remains to be discovered.
Serine- and threonine-rich proteins are commonly found in sialotranscriptomes. These proteins are generally modified post-translationally, and their mature forms have N-acetyl galactosamine residues, typical of mucins. They probably function to lubricate the food canals and may also have antimicrobial function. Several protein families are represented in this group, including those previously described as SG3, gSG10, and 13.5-kDa families. Peritrophins are proteins with a chitin-binding domain that are often found in sialotranscriptomes and may be related to the maintenance of the structure of the mouthparts and/or salivary canal.
The SG3 family in An. darlingi is highly expressed, with four proteins accounting for 90 ESTs found in the cDNA library. They may be alleles or splice variants of a single gene, containing 29% to 32% Ser + Thr and over 47 predicted galactosylation sites in a mature 17-kDa protein framework. The An. darlingi SG3 has similarities only to other anopheline salivary proteins, having only 46% identity and 56% similarity to the closest relative, from An. funestus. Compared to the Old World anophelines, the An. darlingi SG3 has a long GH repeat, which may confer zinc chelation capability and hence a putative antimicrobial activity for these proteins, because zinc chelation is characteristic of histidine-rich antimicrobial agents that act by sequestration of this essential microbial growth factor [65-67].
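The Ser + Thr percentages quoted for the putative mucins are simple composition statistics. The following sketch shows one way to compute them, assuming the percentage is taken over the whole sequence supplied (the text does not say whether the signal peptide was excluded). The function name and the toy sequence are hypothetical and are not data from this study.

```python
def residue_percentage(sequence, residues):
    """Percentage of positions in `sequence` occupied by any of `residues`."""
    sequence = sequence.upper()
    if not sequence:
        return 0.0
    hits = sum(1 for aa in sequence if aa in residues)
    return 100.0 * hits / len(sequence)


# Toy 20-residue peptide with a short Gly-His repeat and a Ser/Thr-rich stretch.
toy_mucin = "GHGHGHSTSTSTAAPPSTTS"

ser_thr = residue_percentage(toy_mucin, "ST")
his = residue_percentage(toy_mucin, "H")
print(f"Ser + Thr: {ser_thr:.0f}%  His: {his:.0f}%")
```

The same function, called with "H" or "GH", gives a quick indication of whether a sequence is histidine-rich, which is relevant to the zinc chelation hypothesis mentioned above.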
The gSG10 family, containing three peptides (Supplemental Table S2), is represented by mature products with MW of 18 kDa, 22% to 23% Ser + Thr, and 15–20 predicted galactosylation sites. They also may be products of a single polymorphic and/or differentially spliced gene. An. darlingi gSG10 members match both anopheline and culicine sequences of salivary origin, having a unique signature block that characterizes these distinctive mosquito proteins.
The 13.5-kDa protein family is also represented in An. darlingi by the products of two or three genes. Most mosquito 13.5-kDa family members have over 30 predicted galactosylation sites. An. funestus, An. gambiae, and An. stephensi have recognizable relatives; however, those proteins show only 41% to 44% identity over most of the length of the protein to the An. darlingi 13.5-kDa products. Culicine proteins that display only conservation of the stretch of threonine residues have been identified, but they may not be true homologues.
Two other putative mucins were found, AD-11 being a hypothetical secreted peptide of predicted mature MW of 3.8 kDa, 25% Ser + Thr, and ten potential glycosylation sites. No significant matches are found with other known proteins. AD-91, on the other hand, with 20% Ser + Thr content and 20 potential O-glycosylation sites, is 71% identical to an An. gambiae protein that is related to a previously identified Aedes salivary protein and to a Drosophila protein annotated in the Gene Ontology database as associated with defense response to virus.
A single transcript in the An. darlingi sialotranscriptome codes for a peritrophin with a typical chitin-binding domain and 69% sequence identity to an An. gambiae protein annotated as peritrophin A, which was cloned from the mosquito midgut.
The SG3, SG10, and 13.5-kDa families were found abundantly expressed in sialotranscriptomes of adult male An. gambiae, indicating their function is likely not related specifically to blood feeding.
Enzymes associated with both blood (apyrase and peroxidase) and sugar (amylase and maltase) feeding are known to occur in mosquito saliva; accordingly, their corresponding transcripts have been found in mosquito sialotranscriptomes. Serine protease-encoding transcripts also are regularly found, but their proposed functions in helping blood feeding by interacting with host proteins or as participants in immune proteolytic cascades have not been validated.
Apyrase, which hydrolyses ATP and ADP to AMP and orthophosphates, has been a ubiquitous finding in the saliva of blood-sucking arthropods, where it destroys these important agonists of inflammation and platelet aggregation [2,79]. Mosquitoes have co-opted the 5' nucleotidase family to achieve this function [80-82]. Two genes of this family are expressed in the SGs of An. gambiae, named putative 5' nucleotidase and salivary apyrase, although both may function redundantly as apyrases. The sialotranscriptome of An. darlingi presents evidence for the two orthologues: IS07-44, a full-length orthologue of the salivary 5' nucleotidase of An. gambiae, to which it is 66% identical, and AD-101, which is a 5' truncated clone best matching the An. gambiae salivary apyrase.
A peroxidase was previously identified as the vasodilator for norepinephrine-induced aortic contractions found in An. albimanus SGs [85,86]. AD-573 encodes the full-length sequence of an An. darlingi salivary peroxidase that is 86% identical to An. albimanus and 52% identical to An. gambiae salivary peroxidases. This type of salivary vasodilator is so far unique to anopheline mosquitoes.
Maltase and amylases, as well as their transcripts, have been regularly found in the saliva and sialotranscriptomes of mosquitoes [88-91]. The first cloned gene from the SGs of any mosquito was actually a member of this family. Ae. aegypti and An. gambiae express both genes in their SGs. Transcripts coding for both enzymes were found in the sialotranscriptome of An. darlingi [see additional file 1]. The full-length sequence for the orthologue of An. gambiae salivary maltase (68% identity) is presented in Supplemental Table S2.
Transcripts coding for at least two different serine proteases were found in the An. darlingi sialotranscriptome [see additional file 1]. Supplemental Table S2 presents a truncated sequence of a CLIP domain serine protease expressed in An. darlingi SGs, 86% identical to the closest An. gambiae match.
Antimicrobial peptides, lysozyme, and pathogen pattern recognition polypeptides are commonly found in the sialotranscriptome of blood-sucking arthropods. Among the AMPs found in the sialotranscriptome of An. darlingi, a gambicin, a defensin, and three different cecropins are described in their full-length form. A peptidoglycan recognition protein, 94% identical to an An. gambiae protein, is also reported as a full-length protein. Additionally, this study [see additional file 1] provides evidence for An. darlingi transcripts coding for C-type lectins and ficolins, and an odd transcript having a full PMEI Pfam domain normally found in plant proteins associated with inhibition of microbial pathogens' pectin methyl esterase. Two similar lysozyme cDNAs, probably products of alleles, are also described as full-length polypeptides, with 57% identity to the closest An. gambiae protein. Another identified lysozyme, contig 443, corresponds to a previously described salivary An. darlingi lysozyme. The occurrence of multiple lysozymes in the An. darlingi sialome is not surprising, as two lysozymes are expressed in the An. gambiae SGs.
With less certainty, we include in the immunity-related products the full-length sequence for a Gly-His-rich peptide that might have antimicrobial function by zinc chelation, as explained above. This protein matches a C. quinquefasciatus salivary peptide that also contains Gly repeats and a poly His in the amino terminus.
This is a ubiquitous protein family found in animals and plants and in all sialotranscriptomes of blood-sucking Diptera analyzed so far. The function of these proteins in mosquito saliva is not known, although they were implicated in a proteolytic function in the venom of the marine snail Conus textile, in toxic functions in the saliva of a venomous lizard and snake venoms [105-109], and in an antifungal function in plants. Remarkably, a member of this family acquired a typical RGD domain surrounded by Cys residues and acts as a main platelet aggregation inhibitor in the horsefly Tabanus yao. Several genes from the AG5 family are transcribed in the SGs of mosquitoes, including some specific to the adult females and thus possibly associated with a specific function in blood feeding [4,7,13]. We present evidence, in the form of full-length transcripts, for the expression of at least two members of the AG5 family in An. darlingi SGs. AD-38 matches with 67% identity the putative gVAG protein precursor of An. gambiae, a transcript enriched in the adult female SGs when compared with expression in other tissues. AD-430 matches An. gambiae AG5-related 2 protein, which was shown to be ubiquitously expressed in adult female tissues. The function(s) of this protein family in mosquitoes remain to be determined.
Transcripts coding for the gSG5 protein were first discovered in the SGs of An. gambiae and shown to be exclusively expressed in the adult female SGs [13,115]. This protein shows weak similarity to a salivary protein of Ae. aegypti and better similarity to other Aedes and Culex proteins, indicating this is a mosquito-specific protein. Six transcripts coding for this protein were found in the sialotranscriptome of An. darlingi. AD-196 is 46% identical to the An. gambiae orthologue and only 26% and 23% identical to the culicine proteins. The function of this mosquito-specific protein remains unknown, but its tissue- and sex-specific expression profile suggests it is possibly related to blood feeding.
The gSG8 is a highly divergent family, with members only from An. gambiae and Ae. aegypti. Alignment of the three sequences displays a conserved motif L-C-W-A-x-K-x(2)-P-T-A-x(6)-C-x(5)-K, which might help identify new members of this family. In An. gambiae, this protein is specifically expressed in female SGs, suggesting a likely role in blood feeding.
AD-216 and AD-217 represent two similar proteins deduced from two and three ESTs, respectively. They may represent splicing variants or alleles of the same gene. The predicted mature peptides have a mass of 11.2 kDa and solely match proteins found in other mosquito sialotranscriptomes or other hypothetical mosquito proteins. The basic tail name derives from a conserved Lys-X-X-Lys or Lys-X-X-Arg found in the carboxy terminus of proteins derived from the genus Aedes but lacking in the anopheline sequences. The alignment indicates a conserved backbone and the absence of cysteine residues, from which the block pattern L-x-H-x-L-x-Y-L-x-D-x(17,18)-A-x(2)-Y-x(3)-A-x(3)-G can be deduced (Fig. 4A). The derived phylogram (Fig. 4B) follows the expected mosquito phylogeny. Ae. aegypti transcripts coding for the basic tail peptide were enriched in adult female SGs.
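The presence or absence of the basic tail can be checked directly on a sequence. The sketch below makes an assumption that the text does not state, namely that "carboxy terminus" means the last ten residues; the `window` parameter, the function name, and the toy sequences are hypothetical.

```python
import re

# Lys-X-X-Lys or Lys-X-X-Arg, where X is any residue.
BASIC_TAIL = re.compile(r"K..[KR]")


def has_basic_tail(sequence, window=10):
    """Return True if K-x-x-[KR] occurs within the last `window` residues.

    The window size is an assumption of this sketch, not a value from the text.
    """
    tail = sequence.upper()[-window:]
    return BASIC_TAIL.search(tail) is not None


# Toy sequences only (not proteins from this study).
with_tail = "LAHALAYLADGGGGKAAK"
without_tail = "LAHALAYLADGGGGSAAS"
print(has_basic_tail(with_tail), has_basic_tail(without_tail))
```

Restricting the search to a C-terminal window avoids counting internal K-x-x-K matches, which occur by chance in many lysine-containing proteins.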
AD-476 represents the peptide sequence of a mature protein of 4.1 kDa having significant similarities only to other polypeptides found previously in culicine mosquito sialotranscriptomes or predicted proteomes of mosquitoes. This is the first time a protein of this family has been found in an anopheline sialotranscriptome. Alignment and phylogram of the mature predicted peptides show that Ae. aegypti and C. quinquefasciatus have two such peptides, those of Anopheles matching the slightly smaller version (Fig. 5A). The derived phylogram indicates two clades grouping the short and the large forms. In Ae. aegypti, transcripts coding for a member of this family were shown to be enriched in the adult female SGs.
The sialotranscriptome of Ae. aegypti identified a protein named proline-rich salivary secreted peptide, close homologues of which were never found in other sialotranscriptomes. Transcripts for this protein were found exclusively in the adult female SGs of Ae. aegypti, indicating a function related to acquisition of the blood meal. The sialotranscriptome of An. darlingi provided three ESTs, which when assembled yield the sequence AD-267, matching this Aedes protein at 47% identity and also, weakly, a smaller region of a salivary protein from An. stephensi of the same size. AD-267 was subjected to psiblast analysis against the NR database, which converged after two iterations and retrieved only sequences from Ae. aegypti. The presence of AD-267 in An. darlingi, its homology to the Ae. aegypti protein, and its absence in An. gambiae suggest that the gene for this family existed in the ancestor of culicines and anophelines but was lost or modified beyond recognition in Culex and the Cellia subgenus of Anopheles.
The first 41.9-kDa family member was characterized in the sialotranscriptome of Ae. aegypti and later found in C. quinquefasciatus and in Ae. albopictus [4,7,8,10]. It has never been found in any anopheline sialotranscriptome, nor does it have any similar protein predicted from the An. gambiae genome. AD-114, however, produces similarities to 41.9-kDa family members when subjected to blastp analysis against the NR database. Interestingly, the blast results also retrieve other salivary proteins from hematophagous Diptera, such as gSG10, gSG9, and other mucins, although AD-114 itself has only three potential galactosylation sites. The alignment of the An. darlingi protein with the 41.9-kDa proteins from Ae. aegypti and C. quinquefasciatus shows extensive similarities over the whole length of the sequences, including a conserved cysteine framework, despite AD-114 having less than 30% identity with the culicine proteins (Fig. 6). AD-114 thus appears to be a "missing link" joining salivary protein families from culicines and anophelines that were previously thought unrelated. To further investigate this possibility, we used psiblast to search AD-114 against the NR database, retrieving mostly proteins found before in sialotranscriptomes of blood-sucking Diptera, including Culicoides and sand flies [129,130]. In addition to the known 41.9-kDa members from culicines, the anopheline proteins annotated as gSG10 and gSG9 are also retrieved, as is a group of proteins annotated as salivary mucins from mosquitoes, including the non-bloodfeeding species Toxorhynchites amboinensis. Exceptionally, two bacterial proteins are retrieved, as well as one from the wasp Nasonia vitripennis. The alignment of the proteins from Diptera plus the two bacterial proteins by the Clustal tool does not reveal any region of common conservation among all proteins (not shown), but the derived bootstrapped phylogram (Fig. 7) is informative.
Strong support is obtained for four clades, as indicated in Figure 7. The first clade includes sequences from both anopheline and culicine mosquitoes annotated as gSG10, gSG9, and mucins, together with the An. darlingi sequence. A second clade includes Culex and Aedes proteins annotated as mucins. This second clade roots with strong bootstrap support to the previous clade. A third clade includes Aedes proteins annotated as 41-kDa protein, or a short version, annotated as 30.3-kDa protein. This clade also roots strongly with the two previous clades. The sole C. quinquefasciatus sequence shown in Figure 7 (gi|170045863), the 41.9-kDa basic salivary protein, does not group significantly with any other sequence. Finally, a fourth clade groups together the bacterial and sand fly proteins. This clade does not root with strong bootstrap support to the previous clades. The presence of the bacterial proteins in this clade is puzzling, and suggests that the Nematocera proteins could have derived from bacterial contaminants. However, the proteins deriving from Ae. aegypti, C. quinquefasciatus, and An. gambiae map to assembled chromosomes or supercontigs, and their respective genes contain introns, indicating they are of eukaryotic origin. Together, these results support the argument that the 41.9-kDa protein family of mosquitoes derives from a common salivary ancestor that predates the split of anophelines and culicines and is represented in An. darlingi by AD-114; in the Cellia subgenus, the 41.9-kDa protein family has evolved to produce shorter proteins, the members of the gSG10 and gSG9 families. Sand flies express related salivary proteins that might have been acquired by convergent evolution or share a distant common ancestor that can no longer be recognized with the available sequences.
Six genes coding for proteins of this unique protein family were found in An. gambiae salivary transcriptomes [11,12,115], four of which are located as a contiguous gene cluster in chromosome X. Remarkably, all these genes are uniexonic, which is unusual for eukaryotic genes coding for relatively large proteins with a mature molecular weight above 40 kDa and suggests acquisition by horizontal transfer. This gene family appears to be specifically associated with SG function. The transcripts coding for the Trio, SG1, and SG1b proteins appear to be exclusively expressed in the female SGs, while SG1-like3, gSG1-2, and gSG1a are enriched in the female glands but also present in lower amounts in male glands and not observed in other tissues. When these proteins were subjected to blastp against the NR database, only other anopheline sequences were retrieved. Sixty-three ESTs were found in the An. darlingi sialotranscriptome coding for proteins of this family, from which six full-length clones were sequenced. Of these six sequences, two possibly derive from alleles or splice variants. When full-length protein sequences from all known members of this family are aligned by the Clustal tool, very few conserved aa are identified (Fig. 8A); however, the deduced phylogram shows strong bootstrap support for five clades (Fig. 8B), named for the An. gambiae proteins, as follows: Clade SG1/SG1a contains these two proteins from An. gambiae and also one sequence each from An. stephensi, An. dirus, and An. darlingi. Clade SG1-like3 contains two sequences from An. darlingi that could be the result of a recent gene duplication or of polymorphism and splice variation. These two sequences cluster with strong bootstrap support, as expected, with the sole sequence from An. albimanus. The Trio clade also has AD-153 from An. darlingi. The clade SG1-2 is the only clade not having An. darlingi representatives. The function of these proteins remains to be determined.
The SG2 protein was deduced from salivary An. gambiae cDNAs and shown to be expressed in female glands and adult males but not in other tissues. It derives from a single gene in chromosome 2L and is abundantly transcribed in sialotranscriptomes of male An. gambiae. Related, but very divergent, sequences were obtained solely from salivary transcriptomes of other anopheline species. The sialotranscriptome of An. darlingi indicates that at least two different genes exist coding for proteins of this family. One gene codes for mature proteins of 8.5 kDa, from which four alleles or splice variants are derived. A second gene may have produced another five different alleles or splice variants coding for shorter (5.6- to 6.1-kDa) peptides, but it is more likely that these derive from two closely related genes. Comparison of these proteins with other anopheline sequences displays sequence identities varying from only 26% to 31%. Because this protein family is expressed in both male and female An. gambiae [11,135], and due to its relatively small size, it may display antimicrobial function.
The hyp 15 and hyp 17 proteins, previously identified in sialotranscriptomes of An. gambiae, have an alkaline pI and a molecular weight of ~4.7 kDa. Their genes reside as a tandem repeat in chromosome X and are preferentially expressed in adult female SGs. Homologues were additionally found in An. stephensi and An. funestus. The An. darlingi sialotranscriptome presents evidence of three transcripts that may derive from splice variants of a single gene, which are 41% and 39% identical to the An. funestus and An. gambiae homologues, respectively.
In An. gambiae, the genes coding for the hyp 8.2 and hyp 6.2 proteins are found as a tandem repeat in chromosome arm 2L. These proteins have a mature molecular weight of 6–9 kDa, do not have sequence similarity, and are grouped together solely by virtue of being chromosomal neighbours. Transcripts coding for these two polypeptides are similarly enriched in An. gambiae adult female SGs. An. stephensi and An. funestus also have members of these protein families. In An. darlingi, two quite divergent protein sequences deduced from the sialotranscriptome are similar to hyp 8.2, and one is similar to hyp 6.2.
An. darlingi protein AD-269 has a predicted molecular weight of 6.5 kDa and matches the carboxyterminus of a salivary peptide named hyp 5.6, previously described in the An. gambiae sialotranscriptome. Members of this family have not been found previously in other sialomes. In An. gambiae the transcript coding for hyp 5.6 was ubiquitously transcribed, suggesting a housekeeping or antimicrobial role.
A protein cryptically named hypothetical protein was previously identified in a cDNA library of An. gambiae, but homologues were never found in other sialotranscriptomes of either anopheline or culicine mosquitoes. This An. gambiae protein produces matches to other unrelated sequences in the NR database by virtue of repeated acidic amino acids. The sialotranscriptome of An. darlingi produced 60 transcripts matching this An. gambiae protein, distributed into six putative protein sequences deriving from possibly two genes, of which AD-18 represents a shorter form of the family (Fig. 9). The five remaining deduced sequences may result from alleles. These proteins have a predicted mature molecular weight of 14–17 kDa and a pI of 4.2. They are 41% to 50% identical to the An. gambiae homologue. Alignment of two of the An. darlingi sequences with the An. gambiae homologue identifies a region of Ser [Asp/Glu] [Asp/Glu] repeats (identified with a bar labelled I in Fig. 9) and a region of two WIRRP repeats in the An. gambiae sequence (identified with a bar labelled II in Fig. 9), which provides the family name, 2WIRRP.
Two An. darlingi protein sequences, never before evidenced in mosquito sialotranscriptomes, are described here; each has a clear signal peptide indicative of secretion. These are AD-136, which significantly matches only hypothetical proteins of An. gambiae, Ae. aegypti, and C. quinquefasciatus, and AD-119, which has no significant matches to any known protein in the NR database. Seven and 15 transcripts were found coding for each protein, respectively. AD-136 has an allele, AD-138, derived from two transcripts.
In a previous sialotranscriptome analysis of An. gambiae, 92 transcripts from a total of 4,066 coded for a protein named gSG6, orthologues of which were found in An. stephensi and An. funestus sialotranscriptomes. Considering that we have sequenced in the present work 2,371 ESTs from An. darlingi, some 53 ESTs would have been expected for this protein. None were found, suggesting this family to be specific for the Cellia subgenus. Similarly, the related An. gambiae proteins named hyp 10 and hyp 12 had 37 and 12 corresponding ESTs, but none were found in the An. darlingi cDNA library, also suggesting this family to be Cellia-specific.
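The expected count quoted above is a simple proportional scaling, which can be sketched as follows. The EST counts are those given in the text; treating the hyp 10 and hyp 12 counts as coming from the same 4,066-EST library, and modelling the chance of finding no ESTs with a Poisson distribution, are assumptions made only for this illustration and are not part of the original analysis.

```python
import math

# Counts quoted in the text.
GAMBIAE_TOTAL_ESTS = 4066    # An. gambiae library size
DARLINGI_TOTAL_ESTS = 2371   # An. darlingi library size

# ESTs per family in An. gambiae. Assumption: the hyp 10 and hyp 12
# counts refer to the same 4,066-EST library as gSG6.
gambiae_counts = {"gSG6": 92, "hyp 10": 37, "hyp 12": 12}


def expected_count(hits, source_total, target_total):
    """Scale an observed EST frequency to a library of a different size."""
    return hits / source_total * target_total


def prob_zero(expected):
    """Poisson probability of sampling no ESTs when `expected` are anticipated."""
    return math.exp(-expected)


expected = {
    family: expected_count(hits, GAMBIAE_TOTAL_ESTS, DARLINGI_TOTAL_ESTS)
    for family, hits in gambiae_counts.items()
}

for family, value in expected.items():
    print(f"{family}: expected {value:.1f} ESTs, P(zero) = {prob_zero(value):.2e}")
```

Under these assumptions the complete absence of such transcripts would be very unlikely if expression levels were similar in the two species, which is the reasoning behind the inference of family loss or extreme divergence.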
Seventy-seven deduced protein sequences coding for putative housekeeping (H) products are presented in Supplemental Table S2. These proteins allow the evolutionary rate of the S proteins to be compared with that of the H proteins, using the An. gambiae proteome as a reference set, as done before for comparing An. stephensi salivary proteins with those of An. gambiae. For this comparison, we used only protein sequences from An. darlingi that had at least 100 aa of alignment to an An. gambiae protein, as identified by blastp with the filter for low complexity set to off. The protein identities in the two groups, 86% for the H and 53% for the S group, were significantly different (P < 0.001, Mann-Whitney rank sum test) (Table 4), supporting the concept that mosquito salivary-secreted proteins evolve at a faster pace than housekeeping proteins.
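For readers unfamiliar with the rank sum test, the following is a minimal sketch of how such a comparison could be computed. The identity values are invented placeholders, not the data of Table 4, and the two-sided P value uses the normal approximation without a tie correction, which is a simplification; the software actually used by the authors is not stated in the text.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney(sample_a, sample_b):
    """Return (U for sample_a, two-sided P) using the normal approximation."""
    n_a, n_b = len(sample_a), len(sample_b)
    ranks = average_ranks(list(sample_a) + list(sample_b))
    rank_sum_a = sum(ranks[:n_a])
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    mean_u = n_a * n_b / 2
    sd_u = math.sqrt(n_a * n_b * (n_a + n_b + 1) / 12)  # no tie correction
    z = (u_a - mean_u) / sd_u
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return u_a, p_two_sided


# Hypothetical percent identities to the best An. gambiae match.
housekeeping = [88, 91, 79, 95, 84, 90, 86, 82]
secreted = [55, 48, 61, 39, 70, 52, 45, 58]

u_stat, p_value = mann_whitney(housekeeping, secreted)
print(f"U = {u_stat}, two-sided P = {p_value:.4f}")
```

A rank-based test is a natural choice here because percent identities are bounded and need not be normally distributed.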
Anophelines diverged from culicine mosquitoes approximately 150 MYA. Within anophelines, the New World species diverged from the Old World forms concomitantly with or before the breakup of Gondwanaland, at ~95 MYA. Within the anophelines, detailed sialotranscriptome analyses have been made only from members of the Cellia subgenus (An. gambiae, An. stephensi, and An. funestus). In addition, detailed sialotranscriptomes and proteome data are available for three culicines, Ae. aegypti, Ae. albopictus, and C. quinquefasciatus, and one mosquito of the subfamily Toxorhynchitinae, T. amboinensis. The addition of a neotropical anopheline (subgenus Nyssorhynchus) fills a gap of information and helps to explain mosquito evolution with regard to adaptation to blood feeding through their salivary proteins.
From a conservative perspective, the sialotranscriptome of An. darlingi confirms the presence of ubiquitous salivary mosquito protein families, such as the D7, 30-kDa antigen/aegyptin, mucins, AG5, gSG5, gSG8, basic tail, the enzymes apyrase/5' nucleotidase and amylase/maltase, and the immunity-related proteins lysozyme, defensin, cecropin, and Gly-His-rich peptides; most of these proteins are uniquely found in mosquitoes. From another standpoint, the An. darlingi sialotranscriptome has confirmed the presence of proteins so far known exclusively in anopheline mosquitoes, such as the antithrombin anophelin and the SG1, SG2, hyp 15/hyp 17, hyp 8.2/hyp 6.2, hyp 5.6, and 2WIRRP families. In the last two cases, 2WIRRP and hyp 5.6, the An. darlingi sequences represent the second member of a family previously discovered in An. gambiae but never before found in other anophelines.
Of interest, the An. darlingi sialotranscriptome also produced protein sequences with similarity to polypeptides previously found exclusively in culicine sialotranscriptomes, such as the proline-rich secreted protein, Kazal domain-containing peptides, and the 41.9-kDa family. Psiblast analysis of the An. darlingi member of the 41.9-kDa family allowed identification of related Cellia anopheline sequences previously known as gSG10 and gSG9, indicating these two families may have evolved quite rapidly from 41.9-kDa ancestors that are now absent not only from the known An. gambiae sialotranscriptome, but also from the proteins predicted from this mosquito's genome (Fig. 7). On the other hand, An. darlingi lacks transcripts coding for proteins abundantly transcribed in An. gambiae and other Cellia mosquitoes, indicating the loss (or evolution beyond recognition) of these protein families in An. darlingi evolution.
Finally, the rapid divergence of salivary proteins allows the possibility of using such An. darlingi proteins as specific markers of vector exposure, as is now being attempted for An. gambiae and Ae. aegypti [155-158]. Additionally, to the extent that the rapid divergence of the salivary proteins is not associated with divergence of function, the differences between orthologous salivary proteins of An. gambiae and An. darlingi, and also among anophelines of the different subfamilies, represent a natural site-directed mutagenesis experiment that will help identify structural determinants of function in such bioactive proteins [159-161].
The sequences utilized in this study originated from the same cDNA library used in our previous publication. This cDNA library was derived from SGs dissected from adult female An. darlingi of unknown ages that were field caught in Porto Velho, Rondonia, Brazil. PolyA+ RNA was extracted from 60 dissected pairs of SGs using the Micro-FastTrack mRNA isolation kit (Invitrogen) and was then used to make a PCR-based cDNA library using the SMART™ cDNA library construction kit (BD Biosciences-Clontech) as described before.
The SG cDNA library was plated on LB/MgSO4 plates containing X-gal/IPTG to an average of 250 plaques per 150-mm Petri plate. Recombinant (white) plaques were randomly selected and transferred to 96-well Microtest™ U-bottom plates (BD BioSciences) containing 100 μl of SM buffer (0.1 M NaCl; 0.01 M MgSO4·7H2O; 0.035 M Tris HCl [pH 7.5]; 0.01% gelatin) per well. The plates were covered and placed on a gyrating shaker for 30 min at room temperature. The phage suspension was either immediately used for PCR or stored at 4°C for future use.
To amplify the cDNA by PCR, 4 μl of the phage sample was used as a template. The primers were sequences from the λ TriplEx2 vector and named pTEx2 5seq (5' TCC GAG ATC TGG ACG AGC 3') and pTEx2 3LD (5' ATA CGA CTC ACT ATA GGG CGA ATT GGC 3'), positioned at the 5' end and the 3' end of the cDNA insert, respectively. The reaction was carried out in 96-well flexible PCR plates (Applied Biosystems) using FastStart Taq polymerase (Roche) on a GeneAmp® PCR system 9700 (Perkin Elmer Corp.). The PCR conditions were: one hold of 95°C for 3 min; 25 cycles of 95°C for 1 min, 61°C for 30 sec; 72°C for 5 min. The amplified products were analysed on a 1.5% agarose/EtBr gel. cDNA library clones were PCR amplified, and those showing a single band were selected for sequencing. Approximately 200–250 ng of each PCR product was transferred to ThermoFast 96-well PCR plates (ABgene Corp.) and frozen at -20°C before cycle sequencing using an ABI3730XL machine. The obtained sequences were submitted to dbEST and have the GenBank accession numbers FK703778–FK705605.
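As a quick sanity check on the two vector primers quoted above, their length and GC content can be computed with a few lines of code. This is only an illustrative sketch; the helper names `normalize` and `gc_fraction` are hypothetical and not part of any published pipeline.

```python
# Primer sequences as printed in the text (5' to 3', grouped in triplets).
PRIMERS = {
    "pTEx2 5seq": "TCC GAG ATC TGG ACG AGC",
    "pTEx2 3LD": "ATA CGA CTC ACT ATA GGG CGA ATT GGC",
}


def normalize(primer):
    """Remove spacing and upper-case the sequence."""
    return "".join(primer.split()).upper()


def gc_fraction(sequence):
    """Fraction of bases that are G or C."""
    return sum(base in "GC" for base in sequence) / len(sequence)


summary = {}
for name, raw in PRIMERS.items():
    seq = normalize(raw)
    summary[name] = (len(seq), gc_fraction(seq))
    print(f"{name}: {len(seq)} nt, GC = {gc_fraction(seq):.1%}")
```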
Primer extension experiments were performed using sequencing primers designed by the Primer3 program, aimed at a region ~100 bp upstream (5') of the end of the previously obtained high-quality sequence. The process was repeated until full-length information was obtained. The primer extension sequences were submitted to dbEST and have the accession numbers FL688077–FL688134. The sequences representing the open reading frames shown in Supplemental Table S2 have been deposited in GenBank and have the accession numbers EU934251–EU934432.
ESTs were trimmed of primer and vector sequences. The BLAST suite of programs, the CAP3 assembler, and ClustalW software were used to compare, assemble, and align sequences, respectively. Phylogenetic analysis and statistical neighbour-joining (NJ) bootstrap tests of the phylogenies were done with the Mega package. For functional annotation of the transcripts we used blastx to compare the nucleotide sequences with the NR protein database of the NCBI and with the Gene Ontology (GO) database. The program reverse position-specific BLAST (RPS-BLAST) was used to search for conserved protein domains in the Pfam, SMART, KOG, and conserved domains databases (CDD). We have also compared the transcripts with other subsets of mitochondrial and rRNA nucleotide sequences downloaded from NCBI and with several organism proteomes downloaded from NCBI, ENSEMBL, or VectorBase. Segments of the three-frame translations of the EST (because the libraries were unidirectional, six-frame translations were not used) starting with a methionine found in the first 300 predicted aa, or the predicted protein translation in the case of complete coding sequences, were submitted to the SignalP server to help identify translation products that could be secreted. O-glycosylation sites on the proteins were predicted with the program NetOGlyc. Functional annotation of the transcripts was based on all the comparisons above. Following inspection of all these results, transcripts were classified as either secretory (S), housekeeping (H), or of unknown (U) function, with further subdivisions based on function and/or protein families.
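The extraction of candidate segments for signal peptide prediction can be illustrated with a short sketch. It shows one possible reading of the step described above: translate the three forward frames with the standard genetic code, then keep every segment that starts at a methionine within the first 300 residues and runs to the next stop codon. The function names, the handling of ambiguous codons as "X", and the choice to report every qualifying methionine are assumptions for illustration, not the authors' actual implementation.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(sequence, frame):
    """Translate one forward frame (0, 1, or 2); unknown codons become 'X'."""
    sequence = sequence.upper()
    codons = (sequence[i:i + 3] for i in range(frame, len(sequence) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def met_segments(protein, window=300):
    """Segments starting at each Met within the first `window` residues,
    each ending just before the next stop codon (or at the sequence end)."""
    return [
        protein[pos:].split("*", 1)[0]
        for pos, aa in enumerate(protein[:window])
        if aa == "M"
    ]


def candidate_segments(est):
    """Collect Met-initiated segments from the three forward frames."""
    return {frame: met_segments(translate(est, frame)) for frame in range(3)}


# Toy sequence, not a real EST.
toy_est = "GGATGAAACTGTAAATGCC"
for frame, segments in candidate_segments(toy_est).items():
    print(frame, translate(toy_est, frame), segments)
```

In practice the resulting segments would be written out in FASTA format and submitted to a signal peptide predictor; that step is omitted here.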
aa: amino acid; AMP: antimicrobial peptide; AG5: antigen 5 family; EST: expressed sequence tag; H class: housekeeping; NR: nonredundant; OBP: odorant-binding protein; S class: secreted; SG: salivary gland; SMART: switching mechanism at 5' end of RNA transcript; U class: unknown function.
EC and JFA helped with library manufacture, sequencing, and data analysis, and contributed to the manuscript. VMP participated in sequencing the NIH library. OM helped with experiment design and contributed to the manuscript. JMCR performed data analysis and contributed to the manuscript. All authors read and approved the final manuscript.
Assembled and annotated sialotranscriptome of An. darlingi female mosquitoes. Hyperlinked Excel spreadsheet and associated files with EST assembly results. This is a compressed ZIP file that should be expanded to a new directory. After this is done, start Excel and then open the file ending in .xls so the hyperlinks will work.
Annotated sialotranscriptome of An. darlingi female mosquitoes. Hyperlinked Excel spreadsheet with deduced protein sequences. See description above.
This work was supported by the Intramural Research Program of the Division of Intramural Research, National Institute of Allergy and Infectious Diseases, National Institutes of Health. We thank Dr. Bruno Arcà for valuable discussions, and NIAID intramural editor Brenda Rae Marshall for assistance.
Because EC, VMP, JFA, and JMCR are government employees and this is a government work, the work is in the public domain in the United States. Notwithstanding any other agreements, the NIH reserves the right to provide the work to PubMedCentral for display and use by the public, and PubMedCentral may tag or modify the work consistent with its customary practices. You can establish rights outside of the U.S. subject to a government use license.