Ticks are mites specialized in acquiring blood from vertebrates as their sole source of food and are important disease vectors to humans and animals. Among the specializations required for this peculiar diet, ticks evolved a sophisticated salivary potion that can disarm their host’s hemostasis, inflammation, and immune reactions. Previous transcriptome analysis of tick salivary proteins has revealed many new protein families indicative of fast evolution, possibly due to host immune pressure. The hard ticks (family Ixodidae) are further divided into two basal groups, of which the Metastriata have 11 genera. While salivary transcriptomes and proteomes have been described for some of these genera, no tick of the genus Hyalomma has been studied so far. The analysis of 2,084 expressed sequence tags (EST) from a salivary gland cDNA library allowed an exploration of the proteome of this tick species by matching peptide ions derived from MS/MS experiments to this data set. We additionally compared these MS/MS derived peptide sequences against the proteins from the bovine host, finding many host proteins in the salivary glands of this tick. This annotated data set can assist the discovery of new targets for anti-tick vaccines as well as help to identify pharmacologically active proteins.
Ticks are specialized mites, divided into two large families, the Argasidae (soft ticks) and Ixodidae (hard ticks), and the monotypic Nuttalliellidae. Soft ticks take relatively fast meals on their hosts, usually lasting less than one hour, while hard ticks stay attached to their hosts for days or weeks. The Ixodidae are further subdivided into the basal Prostriata group, with the single genus Ixodes, and the Metastriata, with 11 recognized genera organized into 4 subfamilies.
Among the several adaptations to blood feeding, ticks evolved a complex saliva consisting of a mixture of pharmacologically active components that affects their host’s hemostasis, inflammation, and immunity and also contains antimicrobial factors [3–8]. Perhaps due to their host’s immune response, which could neutralize such activities, these salivary proteins appear to evolve quickly, as indicated by the discovery of unique protein families among different tick genera and large sequence diversity within protein families that are common to all ticks, such as the lipocalin or Kunitz superfamilies . Gene duplications are also common, leading to the existence of many multigene families within individual tick species, as exemplified by the Kunitz, lipocalin, basic tail, and ixodegrin families [3, 9].
In the past 8 years, salivary transcriptomes, or sialomes (from the Greek sialo = saliva), have been described from several tick species, including the soft ticks Argas monolakensis [10, 11], Ornithodoros parkeri, and Ornithodoros coriaceus; the prostriates Ixodes scapularis [14, 15], Ixodes pacificus, and Ixodes ricinus; the metastriates Amblyomma americanum, Amblyomma cajennense, and Amblyomma variegatum, belonging to the Amblyomminae subfamily; and Dermacentor andersoni and Rhipicephalus appendiculatus, members of the larger Rhipicephalinae subfamily. Within this last subfamily, the genera Anomalohimalaya, Cosmiomma, Hyalomma, Margaropus, Nosomma, and Rhipicentor remain unexplored.
To investigate the diversity of the sialome of a member of the Hyalomma genus, we analyzed the sialotranscriptome and sialoproteome of adult female Hyalomma marginatum rufipes, which is a common three-host tick found in Africa and Europe, and also a competent vector of Crimean-Congo hemorrhagic fever [23–27]. Immature stages of H. m. rufipes feed on small vertebrates, including mammals but mostly birds, while adults feed on large mammals, including cattle, from which our samples were obtained [28–32].
Ticks were removed from zebu cows located on Point G in Bamako, Mali, in December 2008. The SGs were dissected by one of us (JMA) and transferred to RNAlater (Ambion, Austin, Texas, USA). The vials were kept at 4°C for 24 hours then stored at 30°C until use. Tick carcasses were saved and analyzed by Dr. Dmitry A. Apanaskevich (Assistant Curator, U.S. National Tick Collection, Institute of Arthropodology and Parasitology, Georgia Southern University, Statesboro, Georgia, USA). They were all identified to be adult female specimens of H. m. rufipes Koch, 1844.
H. m. rufipes mRNA from one pair of SGs was isolated using the Micro-FastTrack mRNA isolation kit (Invitrogen, San Diego, California, USA). The PCR-based cDNA library was made following the instructions for the SMART cDNA library construction kit (Clontech, Palo Alto, California, USA). This system utilizes an oligoribonucleotide (SMART IV) to attach an identical sequence at the 5′ end of each reverse-transcribed cDNA strand. This sequence is then utilized in subsequent PCR reactions and restriction digests.
First-strand synthesis was carried out using PowerScript reverse transcriptase at 42°C for 1 hour in the presence of the SMART IV and CDS III (3′) primers. Second-strand synthesis was performed using a long distance (LD) PCR-based protocol, using Advantage™ Taq polymerase (Clontech) mix in the presence of the 5′ PCR primer and the CDS III (3′) primer. The cDNA synthesis procedure resulted in the creation of SfiI A and B restriction enzyme sites at the ends of the PCR products that are used for cloning into the phage vector. PCR conditions were as follows: 95°C for 20 sec; 24 cycles of 95°C for 5 sec and 68°C for 6 min. A small portion of the cDNA obtained by PCR was analyzed on a 1.1% agarose gel to check the quality and size range of the cDNA synthesized. Double-stranded cDNA was immediately treated with proteinase K (0.8 μg/ml) at 45°C for 20 min, and the enzyme was removed by ultrafiltration through a Microcon YM-100 centrifugal filter device (Amicon Inc., Beverly, California, USA). The cleaned, double-stranded cDNA was then digested with SfiI at 50°C for 2 hours, followed by size fractionation on a ChromaSpin–400 column (Clontech). The profile of the fractions was checked on a 1.1% agarose gel, and fractions containing cDNAs of more than 400 bp were pooled and concentrated using a Microcon YM-100.
The cDNA mixture was ligated into the λ TriplEx2 vector (Clontech), and the resulting ligation mixture was packaged using the GigaPack® III Plus packaging extract (Stratagene, La Jolla, California, USA) according to the manufacturer’s instructions. The packaged library was plated by infecting log-phase XL1-Blue Escherichia coli cells (Clontech). The percentage of recombinant clones was determined by blue-white selection screening on LB/MgSO4 plates containing X-gal/IPTG. Recombinants were also determined by PCR, using vector primers (5′ and 3′ λ TriplEx2 sequencing primers) flanking the inserted cDNA, with subsequent visualization of the products on a 1.1% agarose/EtBr gel.
The H. m. rufipes SG cDNA library was plated on LB/MgSO4 plates containing X-gal/IPTG to an average of 250 plaques per 150-mm Petri plate. Recombinant (white) plaques were randomly selected and transferred to 96-well MICROTEST™ U-bottom plates (BD BioSciences, Franklin Lakes, New Jersey, USA) containing 100 μl of SM buffer [0.1 M NaCl; 0.01 M MgSO4·7H2O; 0.035 M Tris-HCl (pH 7.5); 0.01% gelatin] per well. The plates were covered and placed on a gyrating shaker for 30 min at room temperature. The phage suspension was either immediately used for PCR or stored at 4°C for future use.
To amplify the cDNA using a PCR reaction, 4 μl of the phage sample was used as a template. The primers were sequences from the λ TriplEx2 vector and named pTEx2 5seq (5′-TCC GAG ATC TGG ACG AGC-3′) and pTEx2 3LD (5′-ATA CGA CTC ACT ATA GGG CGA ATT GGC-3′), positioned at the 5′ and the 3′ end of the cDNA insert, respectively. The reaction was carried out in 96-well flexible PCR plates (Fisher Scientific, Pittsburgh, Pennsylvania, USA) using the TaKaRa EX Taq polymerase (TAKARA Mirus Bio, Madison, Wisconsin, USA), on a Perkin Elmer GeneAmp® PCR system 9700 (Perkin Elmer Corp., Foster City, California, USA). The PCR conditions were: one hold of 95°C for 3 min; 25 cycles of 95°C for 1 min, 61°C for 30 sec; 72°C for 6 min. Approximately 200–250 ng of each PCR product was transferred to Thermo-Fast 96-well PCR plates (ABgene Corp., Epsom, Surrey, UK) and frozen at −20°C. Samples were shipped on dry ice to the Rocky Mountain Laboratories Genomics Unit with primer and template combined together in an ABI 96-well Optical Reaction Plate (P/N 4306737) following the manufacturer’s recommended concentrations. Sequencing reactions were set up as recommended by Applied Biosystems BigDye® Terminator v3.1 cycle sequencing kit by adding 1 μl ABI BigDye® Terminator ready reaction mix v3.1 (P/N 4336921), 3 μl 5× ABI sequencing buffer (P/N 4336699), and 2 μl of water for a final volume of 10 μl. Cycle sequencing was performed at 96°C for 10 sec, 50°C for 5 sec, 60°C for 4 min for 27 cycles on either a Bio-Rad Tetrad 2 (Bio-Rad Laboratories, Hercules, California, USA) or ABI 9700 (Applied Biosystems, Inc., Foster City, California, USA) thermal cycler. Fluorescently labeled extension products were purified following Applied Biosystems BigDye® XTerminator™ purification protocol and subsequently processed on an ABI 3730xL DNA Analyzer (Applied Biosystems, Inc.). 
The AB1 file generated for each sample from the 3730xL DNA analyzer was provided to researchers in Rockville, Maryland, USA, through a secure network drive for all subsequent downstream sequencing analysis. In addition to the sequencing of the cDNA clones, primer extension experiments were performed in selected clones to further extend sequence coverage.
ESTs were trimmed of primer and vector sequences. The BLAST suite of programs, CAP3 assembler, and ClustalW software were used to compare, assemble, and align sequences, respectively. For functional annotation of the transcripts, we used blastx to compare the nucleotide sequences with the NR protein database of the NCBI and with the Gene Ontology database. The program reverse position-specific BLAST (RPS-BLAST) was used to search for conserved protein domains in the Pfam, SMART, KOG, and conserved domains databases. We also compared the transcripts with subsets of mitochondrial and rRNA nucleotide sequences downloaded from NCBI, with several organism proteomes downloaded from NCBI, ENSEMBL, or VectorBase, and with the assembled EST salivary database described before, found at http://exon.niaid.nih.gov/transcriptome/tickreview/Sup-Table-1.xls; the fasta set can also be recovered at http://exon.niaid.nih.gov/transcriptome/tick_review/tick_proteins_fasta.zip. Segments of the three-frame translations of the EST (because the libraries were unidirectional, six-frame translations were not used) starting with a methionine found in the first 300 predicted amino acids, or the predicted protein translation in the case of complete coding sequences, were submitted to the SignalP server to help identify translation products that could be secreted. O-glycosylation sites on the proteins were predicted with the program NetOGlyc. Functional annotation of the transcripts was based on all the comparisons above.
For sequence comparisons and phylogenetic analysis, we retrieved tick sequences from GenBank, and we also deduced protein sequences from ESTs deposited in dbEST, as described and made accessible in a previous review article. Phylogenetic analysis and statistical neighbour-joining bootstrap tests of the phylogenies were done with the Mega package after sequence alignment performed by Clustal. Codon volatility was calculated as previously described.
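The selection of methionine-initiated segments from three-frame translations can be illustrated with a minimal Python sketch. It is not the program used in this study: the function names (translate, met_segments), the max_start parameter, and the choice to cut each segment at the next stop codon are assumptions made for illustration, and the submission of the resulting segments to SignalP is not shown.

```python
# Standard genetic code, indexed in the conventional T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(seq, frame):
    """Translate one forward frame (0, 1 or 2); unknown codons become 'X'."""
    seq = seq.upper().replace("U", "T")
    codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def met_segments(est, max_start=300):
    """Return (frame, start, peptide) for each stop-free segment that
    contains a methionine within the first max_start translated residues.
    Assumption: the segment runs from that methionine to the next stop."""
    found = []
    for frame in range(3):  # forward frames only (unidirectional library)
        protein = translate(est, frame)
        offset = 0
        for segment in protein.split("*"):
            met = segment.find("M")
            if met != -1 and offset + met < max_start:
                found.append((frame, offset + met, segment[met:]))
            offset += len(segment) + 1  # +1 skips the stop codon
    return found
```

For a toy sequence such as CCATGAAACTGTAAGG, only the third frame contains a methionine, and the sketch returns the peptide MKL from that frame.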
The soluble protein fraction from salivary gland homogenates from H. marginatum corresponding to approximately 200 μg of protein was brought up in reducing Laemmli gel-loading buffer. The sample was boiled for 10 min and applied to two lanes (~50 and ~150 μg in each lane) and resolved on a NuPAGE 4–12% Bis-Tris precast gel. The separated proteins were visualized by staining with SimplyBlue (Invitrogen). The gel was sliced into 20 individual sections that were destained and digested overnight with trypsin at 37°C. Peptides were extracted and desalted using ZipTips (Millipore, Bedford, MA) and resuspended in 0.1% TFA prior to MS analysis.
Nanoflow reversed-phase liquid chromatography tandem MS (RPLC-MS/MS) was performed using an Agilent 1100 nanoflow LC system (Agilent Technologies, Palo Alto, CA) coupled online with a linear ion-trap (LIT) mass spectrometer (LTQ, ThermoElectron, San José, CA). NanoRPLC columns were slurry-packed in-house with 5 μm, 300-Å pore size C-18 phase (Jupiter, Phenomenex, CA) in a 75-μm i.d. × 10-cm fused silica capillary (Polymicro Technologies, Phoenix, AZ) with a flame-pulled tip. After sample injection, the column was washed for 30 min with 98% mobile phase A (0.1% formic acid in water) at 0.5 μL/min, and peptides were eluted using a linear gradient of 2% mobile phase B (0.1% formic acid in acetonitrile) to 42% mobile phase B in 40 min at 0.25 μL/min, then to 98% B for an additional 10 min. The LIT-mass spectrometer was operated in a data-dependent MS/MS mode in which each full MS scan was followed by seven MS/MS scans where the seven most abundant molecular ions were dynamically selected for collision-induced dissociation (CID) using a normalized collision energy of 35%. Dynamic exclusion was applied to minimize repeated selection of peptides previously selected for CID.
Tandem mass spectra were searched using SEQUEST on a 20-node Beowulf cluster against the H. marginatum rufipes protein database described in this paper and the Bos taurus proteome (downloaded from ftp://ftp.ncbi.nih.gov/genomes/Bos_taurus/protein/), with methionine oxidation included as a dynamic modification. Only tryptic peptides with up to two missed cleavage sites meeting specific SEQUEST scoring criteria [delta correlation (ΔCn) ≥ 0.08 and charge-state-dependent cross correlation (Xcorr) ≥ 1.9 for [M + H]1+, ≥ 2.2 for [M + 2H]2+, and ≥ 3.5 for [M + 3H]3+] were considered legitimate identifications. The peptides identified by MS were converted to Prosite block format by a program written in Visual Basic. This database was used to search for matches in the Fasta-formatted database of salivary proteins, using the program Seedtop, which is part of the Blast package. The result of the Seedtop search is piped into the hyperlinked spreadsheet to produce a text file as shown in supplemental table S2. Notice that the ID lines indicate, for example, 18_73, which means that one match was found for fragment number 73 from gel band 18. Because the same tryptic fragment can be found in many gel bands, another program was written to count the number of fragments for each gel band, displaying a summarized result in an Excel table. A summary of the form 11 →18 | 12 →18 | 13→2 | indicates that 18 fragments were found in fraction 11, while 18 and 2 peptides were found in fractions 12 and 13, respectively. Furthermore, this summary included a protein identification only when two or more peptide matches to the protein were obtained from the same gel slice.
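The score filter and the per-fraction summary described above can be sketched in a few lines of Python. This is an illustrative reconstruction, not the Visual Basic programs used in the study: the dictionary keys (protein, fraction, peptide, charge, xcorr, delta_cn, missed_cleavages) and the function names are hypothetical, and counting each distinct peptide once per protein and fraction is an assumption.

```python
from collections import Counter, defaultdict

# Thresholds quoted in the text: minimum Xcorr per precursor charge state.
MIN_XCORR = {1: 1.9, 2: 2.2, 3: 3.5}
MIN_DELTA_CN = 0.08
MAX_MISSED_CLEAVAGES = 2


def passes_sequest(hit):
    """True if a peptide hit meets the scoring criteria stated in the text."""
    threshold = MIN_XCORR.get(hit["charge"])
    return (
        threshold is not None
        and hit["delta_cn"] >= MIN_DELTA_CN
        and hit["xcorr"] >= threshold
        and hit["missed_cleavages"] <= MAX_MISSED_CLEAVAGES
    )


def summarize(hits, min_peptides=2):
    """Count accepted peptides per protein and gel fraction, keeping only
    fractions with at least min_peptides matches to the same protein."""
    counts = defaultdict(Counter)
    seen = set()
    for hit in hits:
        key = (hit["protein"], hit["fraction"], hit["peptide"])
        if passes_sequest(hit) and key not in seen:
            seen.add(key)  # assumption: a peptide counts once per fraction
            counts[hit["protein"]][hit["fraction"]] += 1
    summary = {}
    for protein, per_fraction in counts.items():
        kept = [(f, n) for f, n in sorted(per_fraction.items()) if n >= min_peptides]
        if kept:
            summary[protein] = " | ".join(f"{f} → {n}" for f, n in kept)
    return summary
```

The output string imitates the "fraction → count" layout of the summary table; a protein with a single accepted peptide in a given gel slice is omitted for that slice, mirroring the two-peptide rule.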
A total of 2,084 cDNA clones were used to assemble a database (Additional file 1 [Supplemental Table S1]) to yield 1,167 clusters of related sequences, 993 of which contained only one EST. The 1,167 clusters were compared, using the programs blastx, blastn, or RPS-BLAST, to the nonredundant (NR) protein database of the National Center for Biotechnology Information (NCBI), National Library of Medicine, NIH, to a gene ontology database, to the conserved domains database of the NCBI, and to a custom-prepared subset of the NCBI nucleotide database containing either mitochondrial or rRNA sequences.
Manual annotation of the contigs resulted in four broad categories of expressed genes (Table 1 and Figure 1). The putatively housekeeping (H) category contained 29% of the sequences and had on average 1.59 sequences per cluster, and the secreted (S) category had 42% of the ESTs with an average of 3.51 ESTs per cluster, while 28% of the ESTs, mostly singletons, were not classifiable, constituting the Unknown (U) group. The transcripts of the U class could represent novel proteins or derive from the less conserved 3′ or 5′ untranslated regions of genes, as was indicated for the sialotranscriptome of Anopheles gambiae. Sequences deriving from transposable elements (TE) accounted for the remaining sequences, mostly singletons. TE-related sequences may indicate either the presence of active transposition in the tick or, more likely, the expression of sequences suppressing transposition. Low-level expression of TE sequences has been a relatively common finding in previous sialotranscriptomes.
The 594 ESTs attributed to H-class genes expressed in the salivary glands (SGs) of H. m. rufipes were further characterized into 20 subgroups according to function (Table 2 and Additional file 1 [Supplemental Table S1]). Transcripts associated with the protein synthesis machinery represented 44% of all transcripts associated within the H class, an expected result given the secretory nature of the organ. Energy metabolism accounted for the second most abundant H class, with 10.8% of the transcripts. Another 10.3% of the transcripts were classified as either “hypothetical conserved” or “conserved secreted” proteins. These represent highly conserved proteins of unknown function, presumably associated with cellular function but still uncharacterized. This functional distribution is typical of previous sialotranscriptomes.
A total of 894 ESTs contributed to 255 contigs and singletons associated with putative H. m. rufipes salivary-secreted components (Table 3 and Additional file 1). These include previously known gene families such as metalloproteases, lipocalins, protease inhibitor domain-containing peptides, immunomodulators, antimicrobial peptides, basic tail, and glycine-rich peptides. Several other deduced sequences code for proteins that have some sequence similarity to other known proteins or to proteins not previously described in tick sialotranscriptomes.
From the sequenced cDNAs, a total of 249 protein sequences were derived, 101 of which code for putative secreted products (Additional file 2 [Supplemental Table S2]). This set of 101 proteins includes 74 that are presumably full length, while the remaining 27 are truncated. With this transcriptome-derived protein database, we characterized the tick salivary proteome via analysis of salivary gland homogenates fractionated by electrophoresis on SDS-polyacrylamide gels, bands of which were digested with trypsin and fractionated by reversed-phase chromatography, followed by in-line electrospray into a mass spectrometer for tandem mass spectrometry analysis (Figure 2). A description of the protein families deduced from the transcriptome analysis follows, with information from the proteomic experiment as summarized in Table 4 and supplemental files 1 and 2.
Transcripts coding for metalloproteases have been commonly found in tick sialotranscriptomes [3, 10, 49], and these may be associated with fibrinogenolytic activity as previously found in I. scapularis. HEX-267 is a truncated sequence coding for the carboxy-terminal region of a metalloprotease. It matches a Haemaphysalis metalloprotease sequence found in GenBank with only 25% similarity over 228 residues, but has 68% identity to a homologue deduced from Rhipicephalus microplus ESTs. It also displays the CDD domain for arthropod metalloproteases.
HEX-920 codes for a 5′ truncated endonuclease that may or may not be secreted in saliva. Although not found in Ixodes sialotranscriptomes, these types of transcripts have been found in transcriptomes of Rhipicephalus and Amblyomma, all with a signal peptide indicative of secretion . DNAse activity has not been described in tick saliva but is present in saliva of mosquitoes of the Culex genus , where it may work in concert with hyaluronidases to decrease the viscosity of the extracellular matrix and help the formation of the feeding lesion. Endonuclease transcripts are also commonly found in sand fly and tsetse sialotranscriptomes [52, 53].
A total of 62 ESTs from the sialotranscriptome of H. m. rufipes code for proteins containing signatures of proteins previously associated with a protease inhibitory function, which are either ubiquitous or particular to ticks. A more detailed analysis of these transcripts follows.
The Kunitz domain acquired its name from the Kunitz pancreatic trypsin inhibitor, also known as aprotinin, and was later found to be ubiquitous. Tick sialotranscriptomes, as well as those of the hematophagous flies of the genera Culicoides [55, 56] and Simulium, abound with transcripts coding for members of this family. Proteins containing single or multiple Kunitz domains were described in ticks, where Ixolaris, a double Kunitz protein, and Penthalaris, containing five domains, have been functionally characterized [58–62]. The Kunitz fold can also perform functions beyond protease inhibition, such as ion channel inhibition [63–66]; indeed, recently a modified Kunitz domain peptide from R. appendiculatus was shown to activate maxiK channels in an in vitro system, suggesting a vasodilator function. Fifteen ESTs were found in the H. m. rufipes sialotranscriptome coding for members of the Kunitz family, allowing the deduction of two full-length coding sequences (CDS), one (Hex-1093) from a single and the other (Hex-13) from a double Kunitz family. Both polypeptides have less than 50% identity to their closest matches in the NR database and in the assembled dataset described in Francischetti, et al.
The TIL (for trypsin inhibitor-like) domain typically contains ten cysteines forming five disulphide bonds and is found in many protease inhibitors. It belongs to the family I8 of the MEROPS database. These polypeptides may also exert antimicrobial function. Members of this family have been found ubiquitously in blood-feeding insect and tick sialomes, but very few have been characterized. A tick hemolymph antimicrobial peptide (AMP) was previously reported to be a member of this family. More recently, tick proteins containing TIL domains were characterized from R. microplus as subtilisin inhibitors with antimicrobial activity and expressed in various tick organs, including the SGs. Hex-1007 is an interesting member of this family, having three TIL domains in tandem starting at positions 91, 147, and 209. It has over 58% identity to proteins deduced from tick sialome ESTs and to proteins from Amblyomma deposited in GenBank. Two ions matching the Hex-1007 sequence were obtained by MS/MS from gel fraction 18 (Fig 2, table 4). This region of the gel is near the 19 kDa marker, a smaller MW than the 28 kDa predicted for this protein. However, it is common for proteins containing many disulfide bonds to appear more compact and thus move faster when submitted to electrophoresis. Alternatively, this protein may be processed into shorter peptides.
Hex-55 is a shorter peptide that actually does not show a typical TIL domain, yet produces weak matches to proteins that are typical members of the family.
This protein family was so named due to a stretch of lysine residues in the carboxy terminus region of an expanded family of salivary proteins found in the I. scapularis sialome [14, 15]. They are unique to ticks  and can be identified by the PFAM domain PF07771, although many members are so divergent that they do not register it. Many members also have the conserved block C-x(13,21)-Y-Y-C-x(16,19)-C. Some members of this family have been characterized in I. scapularis as anti-clotting , thus their inclusion in this section. Hex-449 is a typical member of the family, having the characteristic PFAM domain and the YF-YF block. Its closest known relative is a protein reconstructed from D. andersoni ESTs, to which it has 37% identity and 50% similarity. Hex-390 presents 72% identity to a salivary protein from Hyalomma asiaticum named P18, and to basic tail proteins from Ixodes and Ornithodoros. It does not have the characteristic PFAM domain, but has the YF-YF signature, as well as a poly lysine tail. Hex-238 appears to be a very divergent member, presenting weak similarity to the P18 protein when compared to the NR database. Alignment of the H. m. rufipes sequences with those of other ticks suggests that Hex-238 may be a truncated member of the family or a protein resulting from a missing exon (Figure 3A). The phylogenetic tree shows that HEX-449 is a canonical basic tail protein within clade I as shown in Figure 3B. This clade has strong bootstrap support, indicating a common origin for this protein family in metastriates and prostriates. Clade II, however, does not group with the remaining proteins and does not have strong bootstrap support, suggesting it may derive from a different ancestor, or more likely have evolved beyond recognition of a common ancestor. Future inclusion of novel members of this family may lead to the merging of these clades.
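A conserved block written in PROSITE-style notation, such as the one quoted above, can be turned into a regular expression for scanning deduced protein sequences. The sketch below is illustrative only and is not the Seedtop-based procedure used in this work; the function name prosite_to_regex is hypothetical, and only a small subset of the PROSITE syntax is handled (single residues, x wildcards, bracketed alternatives, braces for exclusions, and repeat counts).

```python
import re

# One PROSITE element: a residue, wildcard, [alternatives] or {exclusions},
# optionally followed by a repeat count such as (13,21) or (3).
ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern):
    """Convert a simple PROSITE-style pattern into a Python regex string."""
    parts = []
    for element in pattern.split("-"):
        match = ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError(f"unsupported PROSITE element: {element!r}")
        residue, low, high = match.groups()
        if residue == "x":
            regex = "."                         # any amino acid
        elif residue.startswith("{"):
            regex = "[^" + residue[1:-1] + "]"  # excluded residues
        else:
            regex = residue                     # literal or [alternatives]
        if low is not None:
            regex += "{" + low + ("," + high if high else "") + "}"
        parts.append(regex)
    return "".join(parts)


# The block quoted in the text for the basic tail family.
BASIC_TAIL_BLOCK = prosite_to_regex("C-x(13,21)-Y-Y-C-x(16,19)-C")
```

A sequence can then be scanned with re.search(BASIC_TAIL_BLOCK, protein). The text also refers to a "YF-YF" block; if either tyrosine or phenylalanine is accepted at those two positions, the pattern could be written with [YF]-[YF] instead of Y-Y, but that reading is an assumption here.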
Madanins is the name given to related small polypeptides (~6 kDa) isolated from the tick Haemaphysalis longicornis that possess anti-thrombin activity. Later another peptide, named chimadanin, was isolated from the same tick. Additionally, variegin was isolated from A. variegatum as a novel anti-thrombin peptide. Recently, these peptides were suggested to be part of an exclusive metastriate superfamily. The transcriptome of H. m. rufipes indicates the presence of at least four genes that are possibly polymorphic. Alignment of the Ha. longicornis sequences with those of H. m. rufipes and deduced sequences from Dermacentor indicates three regions on these peptides: the signal peptide region, a second region with predominantly negatively charged peptides, and a third proline/serine/threonine-enriched region (marked as 1, 2 and 3 on Figure 4A). The phylogram provides strong bootstrap support for a common origin between H. m. rufipes and D. andersoni sequences, both members of the Rhipicephalinae subfamily (marked with I in Figure 4A), while the Haemaphysalis madanins are so divergent as to constitute a separate clade (marked II in Figure 4B), with the chimadanin (HAELO 67968373) possibly being a link between the two clades. Additional sequencing within the Rhipicephalinae and Haemaphysalinae subfamilies may uncover more detailed phylogenetic relationships of these proteins. Notice that these mature peptides are small, with nearly 60 amino acids, and contain no cysteines, making them relatively straightforward targets for direct chemical synthesis. The anti-thrombin function of these peptides in Hyalomma and Dermacentor remains to be confirmed.
The lipocalin family is extremely diverse in ticks, where it serves multiple functions, as chelators of agonists (kratagonists) of hemostasis and inflammation, and other unrelated functions, such as anti-complement. A previous review characterized 301 tick salivary lipocalins into 10 major groups. The sialotranscriptome of H. m. rufipes yielded 18 ESTs that are similar to previously described tick lipocalins (Table 3). From these ESTs, five lipocalins can be derived. Two of these lipocalins are very divergent, forming a clade of their own with the Amblyomma proteins: HEX-614, which is 22% identical and 41% similar to an A. americanum salivary protein (and less so to other tick lipocalins), and HEX-938, which is most probably a splice variant of the same gene coding for HEX-614. HEX-133 produces a best match to another Amblyomma protein, previously classified in the Metastriate-specific group III. HEX-497 matches an R. microplus lipocalin with only 34% identity and belongs to Group I, subgroup B of lipocalins. HEX-497 appears abundantly expressed, as indicated by the finding of 14 MS/MS ions producing a coverage of 99% of the protein found in gel band 16, near the 28 kDa marker. Finally, HEX-43 matches a Ha. longicornis protein at 29% identity, belonging to the Group IIa of lipocalins. The specific functions of any of these lipocalins remain to be identified.
Glycine-rich protein is a generic name encompassing a diverse group of proteins, including short and long proteins. Some of these have many GY repeats that are found in small antimicrobial peptides but may also be found in cuticle proteins, where the tyrosine residue may be involved in crosslinking reactions. Very long glycine-rich proteins are found in metastriate ticks, are similar to spider-silk proteins, and may function as cement proteins to attach the tick mouthparts to their hosts [77, 78]. A total of 217 ESTs from the H. m. rufipes sialotranscriptome were classified as possibly coding for glycine-rich proteins (Table 3), from which 21 CDS were derived. Some of these coding sequences derive from abundantly expressed transcripts such as HEX-1069, a protein containing GY repeats deriving from 30 ESTs, and a homologue of the protein annotated as cement protein 64P-BA1 from R. appendiculatus. HEX-1069, as well as HEX-1143 and HEX-20, were identified by MS/MS in the gel shown in Fig 2 with 6 to 7 ions each at gel fractions 18 (HEX-1069) and 16 (HEX-1143 and HEX-20). HEX-235 is also abundantly expressed, with 42 ESTs, and is similar to the R. appendiculatus protein annotated as putative cement protein RIM36 and found to produce a strong antibody response in cattle. It was identified in the proteome experiment (Fig 2 and table 4) at fraction 8, a region of the gel between the markers for 97 and 64 kDa. The glycine-rich proteins HEX-1057 and HEX-1043 were also identified in the same gel fraction, and those coded by HEX-750 and HEX-1117 were found in band 15, between the 28 and 39 kDa markers.
Mucins are serine- and/or threonine-rich proteins, usually of low complexity, and having the motifs for being linked to N-acetyl-galactosamine residues . Some of these invertebrate mucins also contain chitin-binding domains, suggesting they may coat the feeding channels of blood-sucking arthropods. Transcripts coding for mucins are commonly found in sialotranscriptomes of blood-sucking arthropods. HEX-930 and HEX-264 are probable alleles, deriving from four and three ESTs, respectively. They have a chitin-binding domain and are 50% identical to a Haemaphysalis protein annotated as a mucin.
HEX-826 represents the sequence of a threonine-rich protein with seven predicted galactosylation sites and a mature MW of 7.8 kDa. It is similar to a D. andersoni protein deduced from salivary ESTs but not to other proteins in the NR database.
Twenty-seven ESTs in the H. m. rufipes sialotranscriptome code for proteins assigned to an immunity function (Table 3). Coding sequence for a peptidoglycan recognition protein, an ixoderin/ficolin, also involved in microbial pattern recognition and possibly associated with the activation of the invertebrate complement system , and a typical lysozyme were deduced from these ESTs. This lysozyme (coded by HEX-896) was identified in Fig 2 gel fraction 19 in a region of the gel consistent with its expected MW.
We previously characterized 60 tick salivary proteins as members of the uniquely Ixodidae protein family named 8.9-kDa family, of unknown function. The H. m. rufipes sialotranscriptome provides evidence for five members of this family. Alignment of these proteins with their relatives allows for detection of a conserved framework of cysteines, including a double Cys-Cys in their carboxy terminals and a few other conserved residues (Figure 5) indicative of a fast divergence of these proteins from a common ancestor. The 8.9 kDa protein coded by HEX-1038 was identified in gel fractions 18, 19 and 20 (Fig 2 and table 4).
We have previously characterized a D. andersoni-specific family, based on five protein sequences, named Dermacentor-specific 9-kDa expansion, due to their inability to significantly match any protein in the NR database, but being related among themselves, indicating gene duplications in D. andersoni followed by fast divergence. Somewhat surprisingly, the sialotranscriptome of H. m. rufipes produced 400 ESTs coding for members of this unique family, from which 14 coding sequences were derived. This is the most abundantly expressed family in H. m. rufipes. The coding sequences match various D. andersoni proteins from 36 to 49% identity, over nearly 100% of their lengths. These 14 proteins are possibly the product of at least 4 genes, two of which may be polymorphic, one producing the proteins HEX-902, HEX-272, HEX-275, HEX-277, HEX-274, and HEX-273, and the other producing proteins HEX-874, HEX-353, HEX-300, HEX-303, and HEX-176, as these proteins are within 10% identity from each other. The remaining proteins, HEX-1077 and HEX-1038, appear to derive from different genes. Alignment of the Hyalomma with the Dermacentor proteins shows a framework of conserved cysteine residues as well as two leucines and a serine residue. Phylogenetic analysis reveals two main mono-specific clades (marked I and II on Figure 6C) consisting of Hyalomma and Dermacentor proteins.
We have previously catalogued 917 tick proteins within 19 protein families as orphans, because they did not produce significant matches to proteins outside their own original genus. The sialotranscriptome of H. m. rufipes contains transcripts that allow us to “de-orphanize” a few of these families, as follows: HEX-434 is similar to an R. microplus protein, while HEX-421 is similar to a monospecific family within A. americanum, within which they share a common framework of six cysteines, three glycines, and two additional sites with hydrophobic amino acids (Figure 7). Finally, a group of Hyalomma proteins with a common polylysine stretch in their mid-region, thus named “basic belly” proteins, matches an Ornithodoros protein. The basic belly protein coded by HEX-550 was identified in gel band 16 (Fig 2 and table 4), near the 28 kDa marker. However, HEX-550 has a predicted MW of 7.8 kDa. Although many of the basic belly proteins have a signal peptide, the deduced polylysine stretch is coded by a polyA region, suggesting that these CDS could be artifacts derived from a 3′ untranslated region.
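The concern about polyA-coded polylysine follows from the standard genetic code, in which AAA (and AAG) encode lysine, so a run of adenines reads as lysine in every forward frame. A minimal sketch of such a check is given below; the transcript is invented, the function names are hypothetical, and the codon table lists only the codons needed for the illustration.

```python
# Partial codon table (standard genetic code); codons not listed become 'X'.
CODONS = {"AAA": "K", "AAG": "K", "ATG": "M", "GCT": "A", "TAA": "*"}


def translate(dna, frame=0):
    """Translate dna starting at the given forward frame (0, 1 or 2)."""
    return "".join(CODONS.get(dna[i:i + 3], "X")
                   for i in range(frame, len(dna) - 2, 3))


def longest_lys_run(protein):
    """Length of the longest uninterrupted run of lysine (K)."""
    best = run = 0
    for aa in protein:
        run = run + 1 if aa == "K" else 0
        best = max(best, run)
    return best


# Hypothetical transcript: a short coding start followed by a poly(A) run.
cdna = "ATGGCT" + "A" * 30
for frame in range(3):
    protein = translate(cdna, frame)
    print(frame, protein, longest_lys_run(protein))
```

A long lysine run that coincides with a run of adenines in the nucleotide sequence, whatever the frame, is a hint that the deduced stretch may come from a polyadenylated or A-rich untranslated region rather than from a genuine coding sequence.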
Additional file 2 presents 22 protein sequences coding for secreted products without any significant similarities to known proteins. Most of these putative polypeptides are small, and their CDS could derive from the 3′ region of truncated transporters that produce intramembrane helices that are interpreted as signal peptides.
The EST set acquired in this study allowed for the description of 144 coding sequences associated with housekeeping functions, including a set of conserved hypothetical proteins that might be related to protein synthesis or protein modification. Many of these products were identified in various electrophoresis gel bands (Fig 2 and table 4), including various ribosomal proteins, products associated with protein modification such as glutathione S-transferases, and proteins associated with energy metabolism. A strong signal for tubulin was also found in fraction 12 and neighbouring fractions. Two class I transposon sequence fragments were also extracted from the dataset (Additional file 2).
We and others have previously reported that host proteins appear in tick saliva [15, 81]. Indeed, some tick lipocalins are postulated to carry host immunoglobulins from the tick hemolymph to tick saliva [81–84], with a possible role in detoxifying host proteins that may cross from the midgut to the hemolymph. We have previously identified host albumin, hemoglobin and immunoglobulin chains in the saliva of Ixodes scapularis. That study was done nearly 10 years ago, when no host mammalian proteomes were available, and relied on low-sensitivity Edman degradation of proteins. In the present study we searched for host proteins in the salivary gland homogenates of H. marginatum rufipes by supplying the predicted proteome of Bos taurus to the Sequest program, which searches the MS/MS-generated ions against a target database (Table 5 and Supplemental file S3). To properly analyze the bovine proteome, we organized the proteins in a hyperlinked spreadsheet, which was compared by BLAST against the available predicted proteins of the tick Ixodes scapularis (downloaded from http://iscapularis.vectorbase.org/Ixodes_scapularis/Info/Index), the only tick genome known, and, to facilitate protein annotation, against the SWISSPROT protein database and the Gene Ontology database. MS/MS-derived peptides originating from the study reported in Fig 2 were mapped to this spreadsheet as indicated in the methods section. We thus obtained matches to 425 bovine proteins (supplemental file S3, worksheet named Bos matches). However, many of these matches are to very conserved proteins, such as histones, tubulins or ribosomal proteins, that are 100% or nearly 100% identical to tick proteins. These matches could derive from tick as well as bovine proteins. We conservatively excluded from the bovine set those proteins producing more than 50% identity to tick proteins, as well as all myosins, obtaining a list of 77 bovine proteins (Supplemental file S3, worksheet 2).
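The conservative exclusion step described above (drop bovine matches with more than 50% identity to a tick protein, and drop all myosins) can be sketched as a simple filter. The record layout, the function name `keep_as_host`, and all identity values below are hypothetical and serve only to illustrate the rule; they are not taken from Supplemental file S3.

```python
from dataclasses import dataclass


@dataclass
class BovineMatch:
    name: str             # bovine protein description
    tick_identity: float  # best % identity to a tick protein (0.0 if no hit)


def keep_as_host(match, max_identity=50.0):
    """True when the match is unlikely to derive from a conserved tick protein.
    Myosins are always excluded, mirroring the rule stated in the text."""
    if "myosin" in match.name.lower():
        return False
    return match.tick_identity <= max_identity


# Hypothetical records with made-up identity values.
matches = [
    BovineMatch("Serum albumin", 0.0),
    BovineMatch("Tubulin beta chain", 98.0),
    BovineMatch("Histone H4", 100.0),
    BovineMatch("Myosin-9", 45.0),
    BovineMatch("Hemoglobin subunit beta", 0.0),
]
host_only = [m.name for m in matches if keep_as_host(m)]
print(host_only)
```

The threshold is deliberately strict: it sacrifices genuinely bovine but highly conserved proteins in order to avoid attributing tick-derived peptides to the host.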
Several of these 77 proteins were related to each other, being either splice variants or members of closely related gene families, such as hemoglobin. We thus removed these redundancies to produce table 4, with 22 bovine proteins that appear in the tick salivary gland proteome. The table is ordered by the fraction number shown in fig 2, from higher to lower MW. The predicted mature masses of the proteins (Table 4) are in accordance with the gel order, except for complement C3 and fibrinogen. C3 appears in fraction 10, between the markers for 64 and 51 kDa, which is incompatible with the predicted C3 mass of 185 kDa and indicates C3 cleavage; further C3 fragments appear in fractions 14 and 17. Fibrinogen is most abundantly covered in fraction 14, under the 39 kDa marker, while the mature protein has a predicted mass of 53 kDa, indicating fibrinogen cleavage. Notice that the list of bovine proteins includes abundant ions for serum albumin, hemoglobin and immunoglobulin chains, as well as for alpha-2-macroglobulin. Proteins abundant in red cells such as band 3 anion transport protein and carbonic anhydrase were also found, as well as the leukocyte-derived products azurocidin and the antimicrobial cathelicidin.
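The reasoning used to infer cleavage, comparing the predicted mature mass with the nearest marker above the gel fraction, can be written as a small check. The masses and markers for C3 and fibrinogen are those quoted in the text; the third entry, the function name `looks_cleaved`, and the 1.2 tolerance factor are assumptions added for illustration, since gel migration is only an approximate measure of mass.

```python
def looks_cleaved(predicted_kda, upper_marker_kda, tolerance=1.2):
    """True when the predicted mass exceeds the marker just above the gel
    fraction by more than the (assumed) tolerance factor."""
    return predicted_kda > upper_marker_kda * tolerance


# (protein, predicted mature mass in kDa, marker just above the fraction in kDa)
observations = [
    ("complement C3", 185.0, 64.0),              # fraction 10, from the text
    ("fibrinogen", 53.0, 39.0),                  # fraction 14, from the text
    ("hypothetical intact protein", 66.0, 64.0), # invented control
]
flagged = [name for name, predicted, marker in observations
           if looks_cleaved(predicted, marker)]
print(flagged)
```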
The appearance of host proteins in tick salivary gland homogenates could be considered an artefact of contamination during dissection, possibly from the tick gut. However, our samples were carefully collected and no EST produced matches to bovine sequences, as could be expected if the salivary glands (SG) were contaminated with bovine blood. Host Ig in tick hemolymph and saliva was also previously characterized in detailed studies. As indicated before, it is interesting to speculate whether these host proteins, while passing through the tick salivary glands, may be subjected to the tick protein glycosylation machinery, although no significant increase in mass was found for any product. Incorporation of these tick epitopes into self molecules may be a tick strategy for suppressing host immunity against carbohydrate antigens.
One hundred and fifteen contigs were identified by the proteomic data. The distribution of the matched proteins among functional classes, considering only those that obtained at least two ion matches in one gel slice (supplemental file S1, worksheet named “proteome analysis” and Table 6), shows members of the protein synthesis machinery as the most abundantly detected, followed by secreted proteins, protein modification machinery and energy metabolism; these classes account for over 90% of the identified proteins. No correlation was found between transcript abundance (measured by the number of ESTs) and the number of matching MS/MS ions for each contig (R = 0.13) (Supplemental data S1, worksheet “proteome analysis”).
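The two computations mentioned here, the detection filter (at least two ion matches in one gel slice) and the correlation between EST counts and ion counts, can be sketched as below. The text does not state which correlation coefficient was used, so Pearson's r is an assumption; the contig names and counts are invented and do not reproduce the reported R = 0.13.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical contigs: EST count and MS/MS ion matches per gel slice.
contigs = {
    "contig_A": {"ests": 40, "ions_by_slice": {12: 1, 13: 1}},
    "contig_B": {"ests": 3, "ions_by_slice": {18: 5, 19: 2}},
    "contig_C": {"ests": 12, "ions_by_slice": {7: 2}},
    "contig_D": {"ests": 1, "ions_by_slice": {10: 4}},
}

# Keep contigs with at least two ion matches in a single gel slice.
detected = {name: c for name, c in contigs.items()
            if max(c["ions_by_slice"].values()) >= 2}

ests = [c["ests"] for c in detected.values()]
ions = [sum(c["ions_by_slice"].values()) for c in detected.values()]
r = pearson_r(ests, ions)
print(sorted(detected), round(r, 2))
```

Note that contig_A is dropped even though it has two ions in total, because no single slice reaches the two-ion minimum.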
Several protein families previously found in tick salivary transcriptomes were identified in H. m. rufipes, such as the Kunitz, basic tail, madanin, lipocalin, glycine-rich, mucin, immunity-related, and 8.9-kDa families, as well as protein families previously found only in the metastriate Dermacentor genus, such as the 9-kDa family. Most of these proteins have no known function. Many orphan proteins were found that do not match known proteins but have signal peptides indicative of secretion, suggesting these are Hyalomma-specific proteins. This annotated dataset can assist in the discovery of new targets for anti-tick vaccines, as well as help to identify pharmacologically active proteins. In the current study, these annotated transcript data were used to identify salivary protein expression in a proteomic experiment. We additionally identified bovine host proteins in salivary homogenates, reinforcing the idea that host proteins are recycled back to the host after ingestion.
Hyperlinked Excel spreadsheet containing annotated assembled ESTs (Supplementary Table 1): http://exon.niaid.nih.gov/transcriptome/Hmrufipes/Hmr-S1.xls
Hyperlinked Excel spreadsheet containing annotated coding sequences (Supplementary Table 2): http://exon.niaid.nih.gov/transcriptome/Hmrufipes/Hmr-S2.xls
Hyperlinked Excel spreadsheet containing annotated bovine proteins (Supplementary Table 3): http://exon.niaid.nih.gov/transcriptome/Hmrufipes/Hmr-S3.xls
This work was supported by the Intramural Research Program of the Division of Intramural Research, National Institute of Allergy and Infectious Diseases, National Institutes of Health. We thank NIAID intramural editor Brenda Rae Marshall for assistance. We are grateful for the support of the Research Technologies Section, Rocky Mountain Laboratories, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Hamilton, MT 59840, USA, for DNA sequencing.
Because all authors are government employees and this is a government work, the work is in the public domain in the United States. Notwithstanding any other agreements, the NIH reserves the right to provide the work to PubMedCentral for display and use by the public, and PubMedCentral may tag or modify the work consistent with its customary practices. You can establish rights outside of the U.S. subject to a government use license.
Authors’ contributions: IMBF and JMA helped with data analysis, sample preparation, and manuscript editing. NM helped with tick collection and manuscript editing. VMP helped with DNA sequencing. JMCR did most of the data analysis and wrote the bulk of the manuscript. All authors read and approved the final manuscript.
Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.