Spider venom is a complex mixture of bioactive peptides that spiders use to subdue their prey. Early estimates suggested that each species produces over 400 venom peptides. To investigate the mechanisms responsible for this impressive diversity, transcriptomics based on second-generation high-throughput sequencing was combined with peptidomic assays to characterize the venom of the tarantula Haplopelma hainanum. The genes expressed in the venom glands were identified, and the bioactivity of their protein products was analyzed using the patch clamp technique. A total of 1,136 potential toxin precursors were identified; these clustered into 90 toxin groups, of which 72 were novel. The toxin peptides clustered into 20 cysteine scaffolds containing between 4 and 12 cysteines, and 14 of these scaffolds were newly identified in this spider. Highly abundant toxin transcripts were present, and their variants arose from hypermutation and/or fragment insertion/deletion. Combined with variable post-translational modifications, this genetic variability explains how a limited set of genes can generate hundreds of toxin peptides in the venom glands. Furthermore, the intraspecies venom variability illustrates the dynamic nature of spider venom and reveals how its complex components work together to generate diverse bioactivities that facilitate adaptation to changing environments, types of prey, and milking regimes in captivity.
Spiders are among the most ancient venomous animals; they have existed for about 300 million years and number nearly 40,000 species. Tarantulas are a group of large, often hairy arachnids comprising more than 860 species. Like other venomous animals, these predators use a complex venom arsenal to paralyze and kill their prey, and many of their toxins have proved to be invaluable tools for pharmacological studies of voltage-sensitive and ligand-gated ion channels (1–4). The vast majority of spider toxins are cysteine-rich polypeptides, and their properties and structures have been reviewed in detail (4–7). The molecular diversity of spider toxins has also been investigated: they appear to be based on a limited set of structural scaffolds, such as the inhibitor cysteine knot (ICK) and the disulfide-directed β-hairpin, whose cysteines form 3–5 characteristic disulfide bonds. This repertoire of scaffolds is much smaller than that of toxins from other animals, such as marine cone snails. The lack of a complete genome sequence may be responsible for our currently limited knowledge of the diversity of cysteine patterns in tarantula toxins, and improving this knowledge will be challenging.
Expressed sequence tags (ESTs) are short sequence reads derived from cDNA libraries and are useful for identifying transcripts in species without a fully sequenced genome (8, 9). In the past few years, all reports on tarantula toxin transcriptomics have used classical cloning and Sanger sequencing strategies (9). In previous work, 420 peptides were detected by mass spectrometry, but few could be matched to peptide precursors identified by cDNA and genomic DNA sequencing (10). Furthermore, these limited data focused mainly on highly abundant, shorter toxin precursors, whereas less prevalent and longer gene sequences were largely overlooked, which has hindered research on the molecular diversity and genetic mechanisms of toxin evolution (11). The relatively new 454 Life Sciences pyrosequencing technology has been successfully applied to a number of species, including spiders (12–16). Compared with traditional Sanger sequencing, this approach provides a more comprehensive picture of the transcriptomic content of venom glands, with improved capabilities for identifying longer sequences, a wider range of sensitivity, and greater accuracy (17, 18). The long reads (>300 bp on average) generated by 454 pyrosequencing can cover full-length precursors (60–120 amino acids), which allows spider toxin precursors to be identified directly and avoids the errors inherent in assembling reads into contigs, as is typically required for other second-generation technologies with shorter read lengths (19). In this study, 1,136 toxin precursors were identified using 454 pyrosequencing and classified into 15 classes (classes A–O). Sequence analysis revealed that extensive hypermutation and fragment insertion/deletion dramatically increased the molecular diversity of toxin transcripts, and that this diversity was further enhanced by highly variable post-translational modifications.
The high intraspecies variability may explain the transcriptome differences between this work and previous work. Patch clamp analysis of the bioactivity of 14 of the identified toxins against ion channels in rat dorsal root ganglion (DRG) neurons revealed functional diversity. In conclusion, variable peptide processing and selective expression explain how a limited set of gene transcripts can generate hundreds of toxin peptides in spider venom that have diverse activities and cooperate to subdue potential prey.
Dithiothreitol (DTT), iodoacetamide, and trifluoroacetic acid (TFA) were obtained from Sigma. Acetonitrile (chromatographic grade) was obtained from a domestic supplier. Milli-Q water was used to prepare all buffers. All other chemicals were of analytical grade.
Tarantulas were collected in Hainan, China. Venom glands of Haplopelma hainanum were obtained 2 days after the spiders had been milked by electrical stimulation and were ground to a fine powder in liquid nitrogen. Total RNA was extracted with TRIzol (Invitrogen) and used to construct a cDNA library. Full-length double-stranded cDNA (5 μg) was then processed by the standard Genome Sequencer library preparation method using the 454 DNA library preparation kit (Titanium chemistry) to generate single-stranded DNA ready for emulsion PCR (emPCR™). The cDNA library was sequenced on the GS FLX platform (454/Roche Applied Science).
Sequence reads were trimmed to remove adapters and low-quality regions using the NGen module of the DNASTAR Lasergene software suite. Assembly was then performed with SeqMan Pro (DNASTAR) using high-stringency de novo transcriptome assembly (100% identity between reads with a 50-nucleotide overlap). Similar reads were assembled into contigs using CLC Genomics Workbench 3 with its default parameters. All assembled contigs and the remaining single reads were searched against the NCBI subset of the EST database with BLASTn. All of these sequences were also searched against the non-redundant protein database with BLASTx in all six reading frames to determine their correct translation products. For both searches, BLAST version 2.2.25+ was used, and a hit with an E-value ≤ 10⁻³ and a bit score >40 was recorded as a significant match for each query sequence. We analyzed the BLAST results and used an in-house Perl script to classify representative sequences into five categories (“toxin-like,” “putative toxins,” “cellular proteins,” “unknown function,” and “no hit”).
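The significance rule and the five-way classification can be sketched as follows. This is a minimal illustration, not the in-house Perl script used in the study: the `Hit` record, the keyword rules, and the cysteine cut-off `CYS_RICH_MIN` are hypothetical assumptions; only the E-value and bit-score thresholds come from the text.

```python
from dataclasses import dataclass

E_VALUE_MAX = 1e-3    # from the text: E-value <= 10^-3
BIT_SCORE_MIN = 40.0  # from the text: bit score > 40
CYS_RICH_MIN = 6      # hypothetical cut-off for calling a translation "cysteine-rich"


@dataclass
class Hit:
    """One BLAST hit for a query (hypothetical, simplified record)."""
    description: str
    evalue: float
    bitscore: float


def is_significant(hit: Hit) -> bool:
    # Both thresholds must be met for a hit to count as significant.
    return hit.evalue <= E_VALUE_MAX and hit.bitscore > BIT_SCORE_MIN


def classify(hits: list[Hit], peptide: str) -> str:
    """Assign a query to one of the five categories using illustrative keyword rules."""
    significant = [h for h in hits if is_significant(h)]
    if not significant:
        # No significant match: cysteine-rich translations are kept as putative toxins.
        return "putative toxins" if peptide.upper().count("C") >= CYS_RICH_MIN else "no hit"
    best = max(significant, key=lambda h: h.bitscore)
    desc = best.description.lower()
    if "toxin" in desc:
        return "toxin-like"
    if any(word in desc for word in ("hypothetical", "uncharacterized", "unknown")):
        return "unknown function"
    return "cellular proteins"


if __name__ == "__main__":
    print(classify([Hit("U1-theraphotoxin-like peptide", 1e-20, 85.0)], "ACKC"))
    print(classify([], "GCCKCDCSCRC"))
```

Ranking by bit score rather than E-value is one reasonable choice for picking the best hit; the study does not say which criterion its script used.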
Functional characteristics of the transcriptome were predicted from the contigs using BLAST2GO software (20) against the NCBI non-redundant protein database (E-value cut-off ≤10⁻⁵). For each contig, the NCBI GI accessions of the significant hits were retrieved and used to assign GO terms under the molecular function, biological process, and cellular component ontologies, at a level that provided the most abundant category numbers (21).
Toxin peptide sequences were identified from the raw data using tBLASTn, and signal peptides were predicted with SignalP version 3.0 (available from the SignalP server). Propeptide cleavage sites were inferred from the known start sites of previously characterized mature toxins. As noted above, such long sequence reads are likely to contain the full nucleotide sequences of toxin precursors. The toxin-like sequences, the no-hit sequences, and sequences with an abundance of cysteine residues may encode new toxin peptides. Candidate toxin precursors were selected according to three criteria: the protein was translated from a complete open reading frame (ORF) using Geneious software (22); it contained more than 4 Cys residues (23); and it contained more than 45 amino acids. The selected precursors were then clustered into toxin groups according to sequence similarity. All precursor sequences were aligned using ClustalX, and the resulting alignment was imported into MEGA to construct a phylogenetic tree by the neighbor-joining method (24), with bootstrap values estimated from 500 replicates.
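The three selection criteria can be sketched in Python as an illustrative stand-in for the Geneious-based workflow. The sketch assumes that each read is scanned in all six frames, that a "complete ORF" means an ATG-initiated stretch ending in a stop codon, and that the standard genetic code applies; the function names are hypothetical.

```python
import re

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))


def reverse_complement(seq: str) -> str:
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def complete_orfs(read: str):
    """Yield proteins encoded by complete ORFs (ATG ... stop) in all six frames."""
    read = read.upper()
    for strand in (read, reverse_complement(read)):
        for frame in range(3):
            codons = (strand[i:i + 3] for i in range(frame, len(strand) - 2, 3))
            # Ambiguous codons (e.g. containing N) are translated as 'X'.
            protein = "".join(CODON_TABLE.get(c, "X") for c in codons)
            # From the first Met up to the next stop symbol.
            for match in re.finditer(r"M[^*]*\*", protein):
                yield match.group()[:-1]  # drop the stop symbol


def passes_filters(protein: str) -> bool:
    # Criteria from the text: more than 45 residues and more than 4 Cys residues.
    return len(protein) > 45 and protein.count("C") > 4


def candidate_precursors(read: str) -> list[str]:
    return [p for p in complete_orfs(read) if passes_filters(p)]
```

Clustering, ClustalX alignment, and neighbor-joining tree construction would follow as separate steps on the selected precursors.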
Toxins were purified and identified as described previously (10). DRG neurons were acutely dissociated from 30-day-old Sprague-Dawley rats and maintained in short-term primary culture according to the method described by Xiao and Liang (25). Ionic currents were recorded from DRG cells in the whole-cell patch clamp configuration using an Axon 700B patch clamp amplifier (Axon Instruments, Irvine, CA) at room temperature (20–25 °C). Patch pipettes with a DC resistance of 2–3 MΩ were fabricated from borosilicate glass tubing (VWR micropipettes, VWR Co., West Chester, PA) using a two-stage vertical microelectrode puller (PC-10, Narishige, Tokyo, Japan) and fire-polished with a heater (Narishige). Data were acquired and analyzed using Clampfit version 10.0 (Axon Instruments) and SigmaPlot version 9.0 (Sigma). For Na+ current recording, the pipette solution contained 145 mM CsCl, 4 mM MgCl2, 10 mM HEPES, 10 mM EGTA, 10 mM glucose, and 2 mM ATP (pH 7.2), and the external solution contained 145 mM NaCl, 2.5 mM KCl, 1.5 mM CaCl2, 2 mM MgCl2, 10 mM HEPES, and 10 mM D-glucose (pH 7.4). For K+ current recording, the pipette solution contained 135 mM KCl, 25 mM KF, 9 mM NaCl, 0.1 mM CaCl2, 1 mM MgCl2, 1 mM EGTA, 10 mM HEPES, and 3 mM ATP-Na2, adjusted to pH 7.4 with 1 M KOH, and the external bath solution contained 150 mM NaCl, 30 mM KCl, 5 mM CaCl2, 4 mM MgCl2, 0.3 mM tetrodotoxin (TTX), 10 mM HEPES, and 10 mM D-glucose, adjusted to pH 7.4 with 1 M NaOH. For Ca2+ current recording, the internal solution contained 110 mM Cs-methanesulfonate, 14 mM phosphocreatine, 10 mM HEPES, 10 mM EGTA, and 5 mM ATP-Mg, adjusted to pH 7.3 with CsOH, and the external solution contained 10 mM BaCl2, 125 mM tetraethylammonium chloride, 0.3 mM TTX, and 10 mM HEPES, adjusted to pH 7.4 with tetraethylammonium hydroxide.
The mRNAs of six venom glands from the tarantula H. hainanum (Ornithoctonus hainana) were extracted and sequenced on the GS FLX platform (454/Roche Applied Science) following the manufacturer's protocol. Sequencing yielded a total of 249,549 reads (~757 Mb) with an average length of ~328 bases/read (maximum 830 bp, minimum 40 bp; 6.72% of reads <100 bp, 84.6% 100–500 bp, and 8.63% >500 bp). The raw sequencing data can be downloaded from the NCBI SRA under accession number SRP040123. After removal of low-quality sequences, a total of 215,640 reads were assembled into 65,432 contiguous sequences (contigs) with an average length of 625 bp (36.3 reads/contig), with the rest remaining as single reads. Although this study focused mainly on toxin peptides, numerous other protein sequences were identified and will be described elsewhere. As outlined under “Experimental Procedures,” we searched for toxin peptide sequences directly in the sequencing reads, because the average read length of >200 bp allowed full-length toxin precursors to be identified. Toxin peptides were also searched for in the contigs, but no additional toxin sequences were found. In total, 52,570 reads displayed similarity to known peptide toxins or toxin-like sequences. The putative toxin category (5%) includes cysteine-rich sequences resembling toxins or ICK motif-containing proteins that were not identified by a BLAST search; the cellular protein category (44%) includes transcripts coding for proteins involved in cellular processes; the unknown function category includes reads that shared sequence identity with previously described sequences lacking functional assessment or with hypothetical genes; and the no hit category comprises reads with no match to currently known sequences. The results are summarized in Fig. 1.
A search against publicly available databases (nr/NCBI, Swiss-Prot + TrEMBL/EMBL) revealed that 8,773 high-confidence proteins were associated with GO terms; these were further grouped into the molecular function, biological process, and cellular component categories at the second level of the standard gene ontology. Based on the GO annotations (Fig. 2), transcripts were categorized into 2,610 groups using an 80% identity threshold, comprising 816 biological process (BP), 798 molecular function (MF), and 996 cellular component (CC) categories. Highly expressed transcripts were enriched in metabolism and translation processes, indicating that venom glands are metabolically active and engaged in the intensive protein synthesis and processing required for venom production. Transcripts associated with binding, catalysis, and channel regulation were also highly represented, consistent with venom peptides inactivating prey by binding to ion channels. GO terms related to redox homeostasis and proteolysis were also enriched, which may be related to the extensive post-translational modification of spider toxins reported previously (9).
Although spider toxins contain diverse disulfide bridge patterns and fold into a variety of three-dimensional structures, cysteine-rich domains are a common feature shared by many toxin sequences (26, 27). Overall, 1,136 toxin precursors were identified in the venom gland transcriptome, and 65.8% of the mature peptides contained two adjacent cysteine residues. On the basis of previous proteomic results (10), precursor endoproteolytic and amidation sites could be predicted with high confidence, and 90 toxins and their variants accounted for 95% of all toxin precursors. These were divided into 14 classes (classes A–N) according to the number of cysteine residues (Table 1), and the 70 sequences that did not fall into these categories were assigned to class O (Fig. 3). Of the 18 previously characterized H. hainanum peptide toxins, 11 belonged to class G (along with 252 variants), four to class E (along with 39 variants), and the remaining three to classes C, L, and M (along with 19, 25, and 4 variants, respectively).
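The cysteine-pattern notation used in the class descriptions that follow can be derived mechanically from a mature sequence: each run of adjacent cysteines becomes one token, and a leading or trailing dash marks flanking non-cysteine residues. The helpers below are a hypothetical illustration of that convention, and the example sequence is a made-up toy peptide, not an H. hainanum toxin.

```python
import re


def cysteine_pattern(mature: str) -> str:
    """Return the C/CC notation, e.g. '-C-C-CC-C-C-', for a mature peptide."""
    seq = mature.upper()
    runs = re.findall(r"C+", seq)  # adjacent cysteines stay in one token
    if not runs:
        return ""
    lead = "" if seq.startswith("C") else "-"
    tail = "" if seq.endswith("C") else "-"
    return lead + "-".join(runs) + tail


def cysteine_summary(mature: str) -> tuple[int, bool]:
    """Cys count (the basis for assigning classes) and whether an adjacent CC pair occurs."""
    seq = mature.upper()
    return seq.count("C"), "CC" in seq


toy = "ACDCEFCCGHCKLCM"        # hypothetical toy sequence
print(cysteine_pattern(toy))   # -C-C-CC-C-C-
print(cysteine_summary(toy))   # (6, True)
```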
Class A contained seven novel transcripts: HN-Aa and its variants (Fig. 4). The signal peptides and propeptides of these precursors are identical, with cleavage sites LFA/ED and LESEK, respectively. The mature peptide has 12 Cys residues, the highest number in any toxin peptide scaffold reported to date in H. hainanum, although two other spider toxins also have 12 cysteines (28, 29). Superfamily VI TX-L precursors from Lycosa singoriensis have the cysteine arrangement C-CC-C-C-C-C-C-C-C-C-C, whereas in HN-Aa the double cysteine is shifted to the left. The double-knot toxin from the earth tiger tarantula (Selenocosmia huwena) has the pattern C-C-CC-C-C-C-C-CC-C-C and shares 87% similarity with HN-Aa. Double-knot toxin selectively and irreversibly activates the capsaicin- and heat-sensitive channel TRPV1 (29), which suggests that HN-Aa may also be a TRPV1 channel activator. The six variants showed high similarity to the mature peptide, and differences in mature peptide length add further potential for toxin variability.
Class B contains only four members, the fewest of all classes (Fig. 4). HN-Ba and its three variants contain 11 Cys residues. The HN-Ba precursors are longer than 100 residues and include the consensus VIAYA cleavage signal, whereas the propeptide is only four residues long. The mature peptide contains a large number of Gln residues, and its C terminus is rich in Thr and Ser, which distinguishes it strikingly from the other toxins of this species. The cysteine pattern is shared with U1-hexatoxin-Hsp201a from the funnel web spider (30), but apart from this highly conserved cysteine pattern, HN-Ba shows no obvious sequence similarity to other peptide toxins.
Class C contains 29 transcripts that clustered into the known toxin HN-Ca (HNTX-XIV) and two novel toxins (HN-Cb and HN-Cc; Fig. 4), all containing 10 Cys residues; class C toxins were further divided into two clades based on their cysteine patterns. HN-Ca and HN-Cb share the pattern -C-C-CC-C-C-C-C-C-C- and form the first clade; these toxins lack a propeptide but share a signal peptide containing the cleavage site ELVSC. The mature HN-Cb peptide is rich in Val residues and has a C terminus ~20 residues longer than that of HN-Ca (HNTX-XIV), suggesting functional divergence. Further research is needed to determine whether these additional residues are removed during post-translational processing or are retained and have functional significance. HN-Cc has the cysteine pattern -C-C-C-CC-C-C-C-C-C-, and its signal peptide, propeptide, and mature peptide are distinct from those of HN-C[a~b]; HN-Cc shares sequence similarity with HNTX-XVIII but includes an additional cysteine not present in that toxin.
The four toxin precursors in class D contain 9 cysteine residues and exhibit two different cysteine patterns (Fig. 4). HN-Da shares significant similarity with HNTX-IV (31) but has the cysteine pattern -C-C-CC-C-C-C-C-C-, with three additional cysteines in a C-terminal region that is also particularly rich in Ser. HN-Db has the cysteine pattern -C-C-C-CC-C-C-C-C- and shares an identical signal peptide and propeptide with HNTX-XVIII, but its mature peptide differs, notably by the additional cysteine. The additional C-terminal residues might be removed during maturation to produce a more canonical spider toxin.
The mature peptides of class E display two distinct cysteine patterns, and the precursors were classified into nine groups (Fig. 4). HN-E[a~d, f, h~i] share the cysteine pattern -C-C-CC-C-C-C-C-, and the other two toxins share the pattern -C-C-C-CC-C-C-C- (Table 1). HN-E[a~c] share high sequence similarity with HNTX-I, HNTX-III, and HNTX-IV, respectively, and include a 30-residue C-terminal extension. Interestingly, most of these sequences contain the CIC motif, which is typical of the μ-agatoxins (μ-AGTXs) from the American funnel web spider (Agelenopsis aperta) and of PnTx3-3 from the Brazilian armed spider (32–35). We speculate that the cysteines in this motif are not involved in disulfide bond formation, because PnTx3-3 has only two disulfide bonds, with the fourth and fifth Cys residues excluded (32). Members of HN-Ed share sequence similarity with HNTX-VII but lack the propeptide and have a longer mature peptide. Toxin precursors lacking propeptides are common in scorpions but rare in spiders. Four known toxins, HN-Ef (HNTX-XVII), HN-Eg (HNTX-XVIII), HN-Ee (HNTX-XV), and HN-Eh (HNTX-XX), were chemically and functionally characterized in our previous work (10). HN-Ef (HNTX-XVII) and HN-Eh (HNTX-XX) have the shortest mature peptides in this class. The motif C(V)X1C(VI)X5C(VII)X1C(VIII), where C(V)–C(VIII) denote the fifth to eighth cysteines and X is any residue except cysteine, was present in the precursors; this motif was first identified in μ-agatoxins from A. aperta (32). Numerous variants differ by only one or two amino acids, which may make them ideal for studying structure-function relationships. Moreover, the HN-Ei mature peptide is identical to HWTX-XV (Table 2). Although interesting, it is not unprecedented for toxins from different venomous animals to share identical sequences.
Class F includes 15 toxins (149 precursors) divided into three clades based on the seven-cysteine patterns of these peptides (Fig. 4). Ten (HN-F[a~g, k, n~o]) share the -C-C-CC-C-C-C- pattern, and four (HN-F[f~j, m]) have the -C-C-C-CC-C-C- pattern, whereas only HN-Fl has the pattern -C-CC-C-C-C-C-. The pattern of HN-Fa is shared with U1-theraphotoxin-Pc1a, a toxin with strong in vitro antiplasmodial activity against the intraerythrocytic stage of Plasmodium falciparum (33, 36). HN-F[b~c] share high homology with HNTX-I and HNTX-III, including the conserved sequence NEINACSPVF in the C-terminal region. HN-Fd shares high sequence identity with HNTX-IV but includes a much longer, Ser-rich conserved C-terminal sequence, “MRSMYPVQFSNWMYLANGGIMSSTSSACQLMSINK.” Although HN-Fe has the same signal peptide as HNTX-VIII, its C terminus is much longer. The mature peptide of HN-Ff shares minimal sequence identity with HNTX-XIV, and its key Cys residues are replaced by other amino acids. HN-F[g~j] share high sequence similarity with HNTX-XV and HNTX-XVIII but cannot form four disulfide bonds because a cysteine is missing from the C-terminal region. We therefore speculate that, despite their high identity with known toxins, these peptides adopt a novel cysteine pattern. HN-F[k~m] share sequence similarity with toxins from Ornithoctonus huwena: HN-Fk resembles HWTX-VIII, including a high proportion of basic residues in the mature peptide, and HN-Fm resembles HWTX-XVIII. HN-Fl also contains a large number of basic residues, but a BLAST search identified no homologs, so it appears to be a novel toxin.
Class G contains the greatest number of peptides, accounting for 36% of all toxins, and 11 of these (HN-Ga (HNTX-I), HN-Gc (HNTX-III), HN-Ge (HNTX-IV), HN-Gg (HNTX-IX), HN-Gj (HNTX-VII), HN-Gm (HNTX-XIII), HN-Gn (HNTX-X), HN-Gr (HNTX-XIII), HN-Gs (HNTX-XIX), HN-Gu (HNTX-XVI), and HN-Go (HNTX-XII)) were chemically and functionally characterized in our previous work. These 11 toxins contain 6 cysteine residues forming three disulfide bonds. Based on sequence similarity, class G peptides were further divided into 25 distinct toxin groups (Fig. 5). HN-G[b, d, f, i, l, q] are similar to the known toxins HNTX-I, HNTX-III, HNTX-IV, HNTX-IX, HNTX-VII, and HNTX-XII but have a longer C-terminal region. The three disulfides have 1–4, 2–5, and 3–6 connectivity, forming the distinctive ICK motif. Most members of this class are variants that differ by only one or two amino acids, implying that they are natural mutants. In contrast to the traditional ICK motif-containing spider toxins, HN-Gt and HN-Gv are similar to HNTX-XVI and HNTX-XVIII and adopt a -C-C-C-CC-C- cysteine pattern. HN-G[w~y] share high sequence similarity with HWTX-Xa, HWTX-XIII, and HWTX-XVI, respectively, each differing from its counterpart by only two amino acid residues. The signal peptides and propeptides of these toxins include a signal-peptide cleavage site similar to the CYASE sequence.
Although most known spider toxins contain more than six cysteine residues, the members of classes H and I contain only four or five. Class H includes five transcripts with five cysteines in a distinctive -CC-CC-C- pattern comprising two contiguous Cys pairs (Fig. 5). This cysteine pattern is novel, and homology between HN-Ha and other toxins is very low. The HN-Ha group accounts for only about 0.2% of the toxin transcripts and is therefore unlikely to play a key role in the venom function of H. hainanum.
All 28 class I transcripts were classified into six toxin groups according to sequence homology. Three of these toxins, accounting for a quarter of the class I sequences, generated no hits in a BLAST search (Fig. 8). HN-Id shares homology with HWTX-XV from O. huwena, a TTX-sensitive voltage-gated sodium channel (VGSC) inhibitor. The signal peptide and propeptide of HN-Id are highly variable, but its mature peptide is highly conserved and shares the -C-CC-C- cysteine arrangement with HN-Ic. HN-I[a~b] also share this pattern but likewise generated no BLAST hits. HN-Ie differs from HNTX-III in having a C-terminal mutation and lacking Cys-5 and Cys-6. In the mature peptide of HN-If, the first and second Cys residues are replaced by other residues.
Class J contains a set of eight homologous transcripts (HN-Ja) with the cysteine pattern X3CX3CX8CX4CX5CX4CX5CX1CXn, where X is any amino acid, and n is an undefined number (Fig. 6). In contrast to some other classes, these peptides do not contain the double-cysteine (-CC-) motif, and this cysteine pattern is consistent with that of LSTX-M1 and ω-agatoxin-1A (28, 37), in which 10 Cys residues form four intrachain disulfide bonds and one interchain disulfide bond. In contrast, HN-Ja did not show sequence similarity with LSTX-M1 or ω-agatoxin-1A but is similar to HNTX-II. Class K contains a set of six transcripts that share the cysteine pattern -C-C-C-C-C-C-C-. The C-terminal region is elongated compared with HNTX-II family peptides and includes an additional Cys residue (Fig. 9). The signal peptide and propeptide were the same for all members of these classes, but the mature peptides were variable in C-terminal regions.
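Spacing notation such as X3CX3CX8CX4CX5CX4CX5CX1CXn translates directly into a regular expression, which makes it easy to screen translated precursors for a given scaffold. The converter below is a hypothetical sketch; by default it treats each X as any residue other than cysteine (an assumption, so that the pattern fixes the exact number of cysteines), whereas passing `gap="[A-Z]"` reproduces the looser "any amino acid" reading given for class J.

```python
import re


def spacing_to_regex(notation: str, gap: str = "[^C]") -> re.Pattern:
    """Compile spacing notation such as 'X3CX3CX8C...CXn' into a regular expression.

    Xk -> exactly k gap residues, X -> one residue, Xn -> any number, C -> cysteine.
    The default gap excludes cysteine (an assumption, not part of the notation).
    """
    parts = []
    for token in re.findall(r"C|X(?:\d+|n)?", notation):
        if token == "C":
            parts.append("C")
        elif token == "Xn":
            parts.append(gap + "*")  # undefined number of residues
        elif token == "X":
            parts.append(gap)        # a single residue
        else:
            parts.append(f"{gap}{{{int(token[1:])}}}")  # exactly k residues
    return re.compile("".join(parts))


class_j = spacing_to_regex("X3CX3CX8CX4CX5CX4CX5CX1CXn")
print(class_j.pattern)
```

A candidate mature peptide then matches the scaffold if `class_j.fullmatch(peptide)` returns a match object.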
Class L comprises 100 transcripts that share the cysteine pattern -C-C-C-C-C-C- (Fig. 6). HN-La (HNTX-II) is the major component of this class. The signal peptides and propeptides of HN-L[b~d] are identical to those of HNTX-II, their mature peptides are also very similar, and all include the conserved NHHDKIRNRKV motif in the C-terminal region. HN-Le shares high sequence similarity with HWTX-VII, but the third Cys of its mature peptide is replaced by Leu, and the conserved sequence VLKCR is appended to the C-terminal region. HN-L[f~i] may be narrowly distributed or even species-specific, because no homologs from other spiders have been reported to date. HN-Lg and HN-Li have identical signal peptides, both lack a propeptide, and their mature peptides are similar, sharing the Cys pattern X2CX5CXCX12CX11CXn. A Lys-rich motif is also present in the mature peptide of HN-L (Fig. 6).
Mature peptides of class M share the cysteine pattern -C-C-C-C-C-, which is characteristic of Kunitz-type sequences and is found in HWTX-XI and HWTX-VII from O. huwena (38). HN-M[a~b] share high sequence similarity with HNTX-II, with the third Cys replaced by Asp, Leu, or Ile in HN-Ma. Similarly, the fifth cysteine of mature HN-Mb peptides is replaced by other residues, and there are several additional Cys residues in the C-terminal region. HN-M[c, h] share sequence homology with HNTX-VIII but lack a double cysteine from the cysteine pattern (Fig. 10), whereas HN-Md shares high sequence similarity with HNTX-XI, except that the HN-Md3 variant has an extended C-terminal region enriched in basic residues. The three toxins HN-M[e~g] are novel and generated no hits in a BLAST search (Fig. 6).
Class N toxins exhibited a -C-C-C-C- cysteine pattern, and HN-N[a~h] shared some similarity with HNTX-II, huwentoxin II, and HWTX-VII. HN-Ni exhibited no obvious sequence similarity with any known sequences (Fig. 6).
The abundance of toxin ESTs reflects venom composition and provides information on toxin diversity and evolution. Major differences in transcription levels between toxin groups and between toxin classes were investigated (Fig. 7). Unsurprisingly, most venom peptides were expressed at relatively high levels and were present in a variety of forms at the transcriptional level, suggesting that more abundant toxin transcripts are mirrored by a greater number of peptide toxins. Moreover, gene classes containing more precursors also tended to produce the most total reads. Overall, class G produced the most reads and the largest number of precursors, whereas classes E, F, L, M, and N were of intermediate abundance (Fig. 7A). The 30 most abundant toxin peptides accounted for 84.7% of the total toxin transcripts (Fig. 7B). HN-Gc (HNTX-III) had the highest number of transcripts, which encoded 82 different HN-Gc variants. A BLAST search revealed that HN-Gc shares considerable sequence identity with 20 toxins from spiders and cone snails (Fig. 8), suggesting that HN-Gc may be an ancient ancestral toxin from which others evolved. Most toxin precursors had a number of variants that enrich toxin diversity, although HN-Ee, HN-Md, and HN-Gn differ by only three mutations, suggesting that spiders could further improve their venom selectively through greater sequence diversification.
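A statement such as "the 30 most abundant toxin peptides accounted for 84.7% of the total toxin transcripts" is a top-n share of read counts. The sketch below shows that calculation on hypothetical toy counts, not the study's data; the function name is an illustrative assumption.

```python
def top_n_share(read_counts: dict[str, int], n: int) -> float:
    """Percentage of all toxin reads contributed by the n most abundant toxins."""
    total = sum(read_counts.values())
    if total == 0:
        raise ValueError("no reads counted")
    top = sorted(read_counts.values(), reverse=True)[:n]
    return 100.0 * sum(top) / total


# Hypothetical toy counts, not the study's data.
counts = {"toxin_a": 50, "toxin_b": 30, "toxin_c": 15, "toxin_d": 5}
print(top_n_share(counts, 2))  # 80.0
```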
Classes A–N included multiple variants of each toxin sequence, owing to random mutations in both the prepro and mature toxin regions. Because the variants have similar molecular weights, they were broadly distributed across chromatographic fractions, reflecting the diversity of their structures and physicochemical properties. Rapid gene duplication accompanied by focal hypermutation of residues encoded in the mature peptide sequence, together with high conservation of cysteine residues due to strong codon bias, can explain the diversity of the variants (10, 11, 39). Alignment of the amino acid sequences of HN-Ge and 18 homologous peptides identified several mutations (Fig. 9). All variants except HN-Ge15 and HN-Ge18 share the same propeptide sequence, and some sequences are highly similar. HN-Gb and HN-Gd have an extended C-terminal region compared with HN-Gc (Fig. 5). Alteration of the termination codon, point mutations, insertions and deletions, and shuffling of sections of oligonucleotide sequence are responsible for the diversification of peptides (Fig. 10); these mechanisms are common to the transcriptomes of both snakes and spiders (40–42). Mutations appear to occur randomly and generate evident sequence diversity without disrupting the ICK scaffold. In total, 321 precursor variants were derived from the 18 known toxins, but only 23 of these overlapped. The molecular evolutionary process is thus 2-fold: most residues mutate randomly, generating numerous variants, while the molecular mechanisms of transcription preserve cysteine residues, resulting in high conservation of the molecular scaffold (43). Although a high number of toxin variants were present in the venom gland transcriptome of H. hainanum, many of the gene products may not be functionally active but may play a role in the evolution of future toxin variants.
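Identifying variants that "differ by only one or two amino acids" while keeping the cysteine scaffold intact amounts to listing substitutions between aligned sequences and checking that every reference cysteine is retained. A minimal sketch follows, assuming the sequences are already aligned to equal length (e.g. from the ClustalX alignment); the toy sequences and function names are hypothetical.

```python
def substitutions(reference: str, variant: str) -> list[tuple[int, str, str]]:
    """List (1-based position, reference residue, variant residue) for aligned sequences."""
    if len(reference) != len(variant):
        raise ValueError("sequences must be aligned to the same length")
    return [(i + 1, a, b) for i, (a, b) in enumerate(zip(reference, variant)) if a != b]


def cysteines_conserved(reference: str, variant: str) -> bool:
    """True if every cysteine in the reference is still a cysteine in the variant."""
    return all(b == "C" for a, b in zip(reference, variant) if a == "C")


ref = "ECKGFWVKCDSEHCCEGLVC"  # hypothetical aligned toy sequences
var = "ECKGFWVKCDSQHCCEGLVC"
print(substitutions(ref, var))        # [(12, 'E', 'Q')]
print(cysteines_conserved(ref, var))  # True
```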
Comparison of the proteomic and transcriptomic data revealed that only 15 fully sequenced and 9 partially sequenced venom peptides were identified in the 454 transcriptome (Fig. 11), and the vast majority of transcripts encoding cysteine knot toxins did not appear as translated peptides in the venom proteome. A targeted mutagenic mechanism generating high variability, followed by diversifying selection acting on highly expressed variants, might explain this hypervariability (44). Apart from critically important components, most transcripts may not be translated. Such a mechanism might spare the venom gland the cost of producing toxins unnecessarily, making venom production more systematic and efficient (45). Studying toxin diversity is therefore important for the discovery of novel toxins and will help to explain toxin evolution.
The variety of toxin variants in spider venom arises partly from the diversity of toxin gene products and partly from post-translational modifications, sequence homologues, and protein degradation (38, 41, 46, 47). For example, F1-29.54-4123 and F1-30.82-4112.8 share the same precursor, but several residues are modified in the C-terminal regions of their mature peptides (Fig. 11), and amidation provides further diversity. A single C-terminal glycine is a common amidation signal, as is Gly followed by a single or dibasic endoproteolytic site at the C terminus (48). Amidation signals (G-K) were identified at the C termini of three toxins (HN-Ga, -Gc, and -Ge). During post-translational processing, precursors can also be truncated to yield mature toxins. A single residue (K) or a tripeptide (RMD) was removed from the C terminus of the deduced peptides by the precursor-processing enzyme in HN-Gg (HNTX-IX) and HN-Go (F3-24.71-4057.6), respectively. F3-30.36-3538.0 and F3-25.85-3351.9 are truncated forms of HN-Gm lacking a C-terminal dipeptide (RR) or tripeptide (WRR), whereas F6-25.12-3998.8 and F3-24.71-4057.6 are truncated forms of HN-Go lacking a C-terminal tetrapeptide (GRMD) or tripeptide (RMD). We speculate that the precursors were alternatively spliced and modified during post-translational processing, as observed in venom from O. huwena (41). HNTX-IX has the same amino acid sequence as HNTX-IX-2, but the two differ in molecular mass by 10 Da; preliminary sequence analysis suggested that this may be due to an unknown modification, as observed in O. huwena (49). Despite the very low expression levels of most of these variants, they contribute substantially to the overall diversity of spider toxins.
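The amidation rule cited above (a C-terminal Gly, alone or followed by one or two basic residues) can be expressed as a regular expression applied to a deduced C terminus. This is a simplified sketch under that assumption only: real processing also depends on enzyme specificity, not every terminal Gly is processed, and the example sequences are hypothetical.

```python
import re

# A C-terminal Gly, optionally followed by one or two basic residues (K or R).
AMIDATION_SIGNAL = re.compile(r"G[KR]{0,2}$")


def predicted_c_terminus(deduced: str) -> str:
    """Strip a C-terminal amidation signal and mark the new terminus as amidated."""
    match = AMIDATION_SIGNAL.search(deduced)
    if match is None:
        return deduced  # no signal: C terminus left as a free acid
    return deduced[: match.start()] + "-NH2"


print(predicted_c_terminus("ACDKWGK"))  # ACDKW-NH2
print(predicted_c_terminus("ACDKW"))    # ACDKW
```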
To explore the functions of these novel spider toxins, we investigated the activities of 14 representative toxins on Nav (voltage-activated sodium), Kv (voltage-activated potassium), and Cav (voltage-activated calcium) channels in rat DRG neurons using the whole-cell patch clamp technique. In total, 11 six-cysteine-containing toxins were chosen from class G. HN-Ef (HNTX-XVII) has four disulfide bonds, HN-La (F8_17.06) includes the DHH motif, and HN-O38 has three cysteine residues, which can form only one disulfide bond. All 14 tested toxins showed weak or no inhibition of voltage-gated potassium channels, although they exhibited variable inhibitory effects on sodium or calcium channels. Some toxins with high sequence similarity exhibited similar bioactivity against the ion channels (Fig. 12). HN-Gc (HNTX-III), HN-Ge (HNTX-IV), and HNTX-V all contain the ICK motif, and all inhibited TTX-sensitive (TTX-S) VGSCs by >70%. However, HN-Ga (HNTX-I) also shares this motif but did not affect TTX-S VGSCs. The hydrophilic Asn-19 and basic His-27 are conserved among the first three toxins but are replaced by Gly and acidic Asp in HN-Ga, which may be responsible for the difference in channel inhibition. This result suggests that mutation of key residues can profoundly affect toxin activity. HN-Gu (HN-XVI) strongly inhibited TTX-resistant VGSCs; this toxin has an ICK motif, but there are 14 residues between its fourth and fifth cysteines, many more than in other ICK-containing toxins. Moreover, toxins with diverse cysteine patterns can have similar functions. For example, at a concentration of 100 nM, HN-Ge (HNTX-IV) inhibited calcium channels by >50% and TTX-S sodium channels by >30%.
Similarly, HN-O38 has only three cysteine residues, whereas F8_17.06 has a complete disulfide-directed β-hairpin motif; both toxins have cysteine patterns distinct from that of ICK motif toxins, yet all three toxins (HN-Ge, HN-O38, and F8_17.06) inhibited calcium currents by >30%.
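Inhibition values such as ">30%" or ">70%" are conventionally expressed as the fractional reduction of the peak current after toxin application. The sketch below assumes that convention, inhibition (%) = (1 − I_toxin / I_control) × 100; the helper names and the current values are hypothetical and are not taken from the authors' analysis.

```python
def percent_inhibition(i_control: float, i_toxin: float) -> float:
    """Percent reduction of peak current after toxin application.

    Works for inward (negative) and outward (positive) currents because only
    the ratio of the two peak amplitudes is used. A negative result means the
    current increased in the presence of the toxin.
    """
    if i_control == 0:
        raise ValueError("control peak current must be non-zero")
    return (1.0 - i_toxin / i_control) * 100.0


def exceeds(threshold_pct: float, i_control: float, i_toxin: float) -> bool:
    """True if inhibition is strictly greater than `threshold_pct` percent."""
    return percent_inhibition(i_control, i_toxin) > threshold_pct


# Hypothetical inward peak currents (nA) from one cell before and after toxin.
control_peak, toxin_peak = -4.0, -1.0
print(percent_inhibition(control_peak, toxin_peak))
print(exceeds(70.0, control_peak, toxin_peak))
```

Using the ratio of peak amplitudes keeps the calculation independent of current direction, so the same helper applies to inward sodium and calcium currents and to outward potassium currents.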
In summary, the activity of spider toxins on ion channels was highly variable. Toxins with conserved sequence patterns can share similar functions or behave very differently. Toxins with low activity against DRG ion channels may still be important for incapacitating prey, but further functional studies are needed.
Spider toxins form a huge reservoir of chromosomally encoded short peptides that exhibit remarkable structural diversity. Many of the more abundant toxins have been studied, but the mechanisms of toxin diversity and evolution are far from well understood. In this work, 454 pyrosequencing was used because it offers the throughput, accuracy, and relatively long reads (>300 bp on average) required to cover the full length of spider toxin precursors (60–120 amino acids). A total of 1,136 toxin precursors were detected that clustered into 90 toxin groups: 18 had been reported previously (Fig. 13), 44 were novel but shared sequence similarity with previously reported H. hainanum toxins, 14 shared sequence similarity with known toxins from other spiders, and 14 were novel and retrieved no hits in a BLAST search (Fig. 14). These toxins contained 20 different cysteine scaffolds with 4–12 Cys residues; six of these scaffolds had been reported previously, and 14 were novel. The cysteine pattern -C-C-CC-C-C-, which folds into the highly stable ICK motif, was the most abundant. This scaffold is found in a wide range of bioactive peptides from spiders, snakes, and other venomous animals (4, 50,–54), suggesting that this structure evolved in an ancient venomous ancestor.
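Cysteine patterns such as -C-C-CC-C-C- are obtained by collapsing a mature sequence to its cysteine framework, and the spacing between cysteines (for example, the 14-residue loop between the fourth and fifth cysteines of HN-Gu) falls out of the same positions. The sketch below is illustrative only: the helper names and the toy sequences are hypothetical, and it considers linear Cys spacing only, whereas scaffold classification in practice may also weigh disulfide connectivity.

```python
import re
from collections import Counter


def cysteine_pattern(seq: str) -> str:
    """Collapse a mature toxin sequence into its cysteine framework.

    Runs of non-Cys residues become a single '-', adjacent cysteines stay
    together, and the result is padded with '-' at both ends, e.g. '-C-C-CC-C-C-'.
    """
    framework = re.sub(r"[^C]+", "-", seq.upper())
    if not framework.startswith("-"):
        framework = "-" + framework
    if not framework.endswith("-"):
        framework += "-"
    return framework


def loop_lengths(seq: str) -> list[int]:
    """Residues between consecutive cysteines (loop i lies between Cys i and Cys i+1)."""
    positions = [i for i, aa in enumerate(seq.upper()) if aa == "C"]
    return [b - a - 1 for a, b in zip(positions, positions[1:])]


# Hypothetical toy sequences built from segments; toy_b has a 14-residue
# loop between its fourth and fifth cysteines.
toxins = {
    "toy_a": "".join(["A", "C", "GKGFWS", "C", "SGKPGD", "CC", "KKGN", "C", "KSR", "C", "N"]),
    "toy_b": "".join(["C", "GKGFWS", "C", "SGKPGD", "CC", "KKGNWRSKTGAHLE", "C", "KSR", "C"]),
    "toy_c": "".join(["GS", "C", "RYLFGG", "C", "KTTAD", "C", "KHLGW", "C", "R"]),
}

# Tally how many sequences share each cysteine scaffold.
scaffolds = Counter(cysteine_pattern(s) for s in toxins.values())
print(scaffolds)
for name, seq in toxins.items():
    print(name, cysteine_pattern(seq), seq.upper().count("C"), loop_lengths(seq))
```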
At the transcriptional level, many toxin variants were identified that arose from a limited set of genes through hypermutation and fragment insertion/deletion. Although the mutations occurred randomly, the sequences generally diverged while the common cysteine scaffold was conserved. Moreover, alternative cleavage sites, heterogeneous post-translational modifications, and highly variable N- and C-terminal truncations further increase peptide diversity (10).
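The observation that variant sequences diverge while the cysteine scaffold stays fixed can be checked column by column on an alignment of variants. The sketch below assumes the variants are already aligned to equal length and uses per-column Shannon entropy as the variability measure; that choice of measure, the helper names, and the toy alignment are assumptions for illustration, not the method used in the study.

```python
from collections import Counter
from math import log2


def column_entropy(column: str) -> float:
    """Shannon entropy (bits) of one alignment column; 0.0 means fully conserved."""
    counts = Counter(column)
    n = len(column)
    return sum((c / n) * log2(n / c) for c in counts.values())


def summarize_alignment(alignment: list[str]) -> dict[str, object]:
    """Report variable positions and whether every Cys-containing column is invariant.

    Assumes the sequences are already aligned to equal length ('-' marks a gap).
    """
    if len({len(s) for s in alignment}) != 1:
        raise ValueError("sequences must be aligned to equal length")
    columns = ["".join(col) for col in zip(*alignment)]
    entropies = [column_entropy(col) for col in columns]
    cys_columns = [i for i, col in enumerate(columns) if "C" in col]
    return {
        "variable_positions": [i for i, h in enumerate(entropies) if h > 0],
        "cysteines_conserved": all(set(columns[i]) == {"C"} for i in cys_columns),
        "entropy": entropies,
    }


# Hypothetical aligned variants of one toxin: residues differ, Cys positions do not.
variants = [
    "GCKAFGCSR",
    "GCKSFDCSK",
    "ACRAFGCNR",
]
report = summarize_alignment(variants)
print(report["variable_positions"], report["cysteines_conserved"])
```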
The abundance of different toxin transcripts was highly variable and was reflected in the composition of venom peptides, which is important for understanding toxin diversity and evolution. In general, there was a positive correlation between mRNA levels and peptide abundance. Class G contained both the highest number of transcripts and the largest number of peptide precursors; this important toxin class may target channels that are conserved across various types of prey animals. Toxins of intermediate abundance may also be involved in hunting. A surprisingly large number of transcript sequences were expressed at relatively low levels, many of which differed from more abundant peptides by single amino acid changes, deletions, frameshifts, and stop codon shifts. Other toxin variants arose from alternative cleavage sites or from interrupted or elongated cysteine patterns. Low-abundance toxins might be regarded as traces of toxin evolution; that is, natural selection might drive the spider to retain, as abundant venom fractions, the components it uses in prey capture and/or defense.
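The positive correlation between transcript levels and peptide abundance could be quantified with a rank correlation, which does not assume a linear relationship between read counts and peptide signal. The sketch below assumes Spearman's rank correlation (Pearson correlation of average ranks); the text does not state which statistic, if any, was used, and the read counts and peak areas are hypothetical.

```python
from math import sqrt


def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks, with tied values sharing the mean of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a block of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation undefined for constant input")
    return cov / (sx * sy)


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman's rho: Pearson correlation of the average ranks."""
    if len(x) != len(y):
        raise ValueError("inputs must have equal length")
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical per-toxin transcript read counts and peptide peak areas.
reads = [520, 140, 60, 12, 3]
areas = [88.0, 9.1, 20.5, 2.2, 0.4]
print(round(spearman(reads, areas), 3))
```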
Only 18 of the identified toxin genes have been described in previous work. We hypothesize that toxin variants arose through rapid gene duplication followed by random mutation and focal hypermutation, and that post-translational modifications provide further structural diversity. These processes allow the venom transcriptome to adapt rapidly to changes in the environment or type of prey (55, 56). It has been suggested that the lethality of spider venom results from the cumulative effect of individual toxins; moreover, different toxin groups exist in different spiders.
Analysis of toxin activity on DRG ion channels showed that toxins with similar sequence patterns generally exhibited similar functions, although in some cases such toxins displayed distinct functions. Moreover, peptides with more divergent sequences can also share similar functions if key residues are conserved. The diversity of venom toxins allows many peptides to work together, improving the efficiency with which prey are incapacitated. In conclusion, a combination of transcriptomics, peptidomics, and electrophysiology revealed the impressive diversity of spider toxin peptides at the transcriptional, structural, and functional levels.
We thank Dr. Zhonghua Liu and Ying Wang for assistance with manuscript writing.
*This work was supported by the National Basic Research Program of China (973) Grant 2010CB529801 and by the Cooperative Innovation Center of Engineering and New Products for Developmental Biology of Hunan Province Grant 20134486.
4 The abbreviations used are: