Subcellular trafficking is a hallmark of eukaryotic cells. Because of their pivotal role in the process, a great deal of attention has been paid to the SNARE proteins. Most R-SNAREs, or "longins", however, also possess a highly conserved, N-terminal fold. This "longin domain" is known to play multiple roles in regulating SNARE activity and targeting via interaction with other trafficking proteins. However, the diversity and complement of longins in eukaryotes is poorly understood.
Our comparative genome survey identified a novel family of longin-related proteins, dubbed the "Phytolongins" because they are specific to land plants. Phytolongins share with longins the N-terminal longin domain and the C-terminal transmembrane domain; however, in the central region, the SNARE motif is replaced by a novel region. Phylogenetic analysis pinpoints the Phytolongins as a derivative of the plant specific VAMP72 longin sub-family and allows elucidation of Phytolongin evolution.
"Longins" have been defined as R-SNAREs composed of both a longin domain and a SNARE motif. However, expressed gene isoforms and splice variants of longins are examples of longins that lack the SNARE motif. The discovery of Phytolongins, a family of non-SNARE longin domain proteins, together with recent evidence on the conservation of the longin-like fold in proteins involved in both vesicle fusion (e.g. the Trs20 tether) and vesicle formation (e.g. σ and μ adaptin), highlights the importance of the longin-like domain in protein trafficking and suggests that it was one of the primordial building blocks of the eukaryotic membrane-trafficking machinery.
Membrane-trafficking is a crucial process in eukaryotic cells. In recent years, the combination of structural biology, molecular cell biology and bioinformatics has allowed the definition of many of the key protein families involved. Genome-wide analyses of both animals and plants, known to possess complex and tightly regulated protein-trafficking systems, have shown extensive sets of such membrane-trafficking protein machinery [1,2]. Among these, the soluble NSF attachment protein receptors (SNAREs) play a central role in the control of membrane fusion and of protein and lipid traffic [3,4]. SNAREs have been divided into major groups based either on their presence in the vesicle (v-SNAREs) or target membrane (t-SNAREs), or on the presence of a conserved critical residue in the 0 polar layer, either arginine (R-SNAREs) or glutamine (Q-SNAREs).
Despite being best characterised in animals, plants and fungi, SNAREs are, in fact, conserved features of the eukaryotic membrane-trafficking system. Comparative genomics and molecular phylogenetics have shown that the four major SNARE super-families (see  for a recent update on SNAREs classification) were already present in the Last Common Eukaryotic Ancestor (LCEA) . The syntaxins or Qa-SNARE super-family has been examined in detail, demonstrating that even the five major organelle and pathway specific families had already evolved before the emergence of the current eukaryotic super-groups [8,9].
The cytoplasmic region of some R-SNAREs, the short VAMPs or "Brevins" (e.g. animal synaptobrevins, yeast Snc1/2), consists of simply the SNARE motif. However, many R-SNAREs also possess a conserved amino-terminal Longin Domain (LD), thus characterizing a large family of long VAMPs or "Longins" . The longins are divided in three main families based on homology to prototypical proteins Ykt6p, Sec22b and TI-VAMP/VAMP7; the LD of Ykt6 and Sec22b show the same globular fold, based on a five-stranded β-sheet core sandwiched by one α-helix on one side and two α-helices on the other . The LD of Ykt6p contains a hydrophobic patch that can inhibit the formation of a fusion complex by intramolecular binding to the coiled-coil domain (SNARE motif); mutation of a conserved Phe residue within this patch abrogates this interaction . Recently, residues in the SNARE motif that are crucial to bind the LD have been identified for Sec22b . Many of these residues are conserved and the same as those involved in SNARE motif binding in TI-VAMP/VAMP7 . Intriguingly, the LD of human TI-VAMP/VAMP7 is capable of playing a dual role because, in addition to negatively regulating the ability of either TI-VAMP/VAMP7 or a LD-synaptobrevin chimera to participate in SNARE complexes, it is also able to target TI-VAMP/VAMP7 to the late endosomal compartment by interacting with the δ subunit of the AP3 adaptor complex  and to interact with the ArfGAP HRB in retrieval from the plasma membrane [14,16]. Such capacity to regulate subcellular localization (SCL) is shown also by the LD of the Arabidopsis thaliana VAMP7 proteins  and of mammalian Ykt6 [18,19]. In mammals, the LD seems also to play a relevant role in regulating neuronal development, as it is crucial to the control of neurite outgrowth [20-22].
Several lines of evidence suggest LD proteins play a central role in trafficking. Firstly, longins are the prototypical R-SNAREs and are essential in eukaryotes, whereas brevins are limited to opisthokonts and synaptobrevins are even more limited taxonomically. Secondly, the LD sensu stricto can also be present in non-SNARE proteins: e.g. mammals have - in addition to the SNARE longin Sec22b - two homologous proteins, Sec22a and Sec22c, which lack the SNARE portion but are involved in early secretory trafficking. In addition, alternative splicing of the SYBL1 gene produces the SNARE longin TI-VAMP/VAMP7 and two isoforms showing reverse domain architecture: isoform "c" (with the regular SNARE motif but missing the LD) and isoform "b" (with the regular LD but missing the SNARE motif). Finally, the longin-like fold is not limited to members of the SNARE protein family but rather is shared by other important trafficking protein families, such as the σ and μ subunits in clathrin adaptor complexes, the SEDL/Trs20p subunit of the TRAPP complex [26,27], the SRX domain of the SRα subunit of the signal recognition particle (SRP) [28,29], as well as the CHiPS and DUF254 proteins. Very recently, the syndecan-binding protein synbindin, involved in neuronal membrane trafficking, has been found to show a "special" LD-like fold, structurally related to SEDL and split by a loop insertion corresponding to an atypical PDZ domain.
Although the three longin families (Ykt6, VAMP7 and Sec22) have been identified in comparative genomic analyses of SNARE proteins from many eukaryotes, their evolution and diversity have not been fully explored. It is thus not entirely clear whether they represent robust clades that branched before the extant eukaryotic supergroups, or whether there are any as-yet unreported longin families. In order to analyze the complement of LD proteins both in number and genomic structure, we have undertaken a thorough bioinformatic analysis of publicly available completed genomes from diverse eukaryotes, with special emphasis on plant genomes, from both land plants and algae. Trafficking in plants is involved not only in canonical cellular processes but also in the regulation of cytokinesis, gravitropism, and responses to pathogens and abiotic stress. As such, plants provide an important handle for shedding light on the pivotal role of trafficking in regulating (and mediating) cell function and differentiation.
Here, the three major longin families are demonstrated to be robustly monophyletic and to each contain the diversity of eukaryotes, thus confirming that the gene duplications giving rise to these families pre-date the LCEA . In addition to the known longin families, however, our analysis has allowed the definition of a novel, plant-specific, LD protein family, the Phytolongins. We here characterise this family in silico in terms of genomic complement and structure, protein domain architecture and topology and structural modeling: this shows that a well-conserved N-terminal LD is present in members of this family, as is a predicted C-terminal trans-membrane region. Moreover, the unique central region of Phytolongins - showing neither detectable homology to the SNARE motif nor conservation of hydrophobic heptad repeats - is putatively able to intramolecularly bind the longin domain through a short, SNARE-like motif. Phylogenetic analysis pin-points the Phytolongins as a derivative of the plant specific VAMP72 longin family and allows elucidation of Phytolongin family evolution.
In order to address the evolution and diversity of longins and LD proteins in eukaryotes, we scanned available completed genomes from across eukaryotic diversity. Our sampling was intentionally broad and shallow in most lineages in order to obtain a tractable dataset of LD family proteins for analysis. This sampling included at least one representative of each of the five eukaryotic supergroups for which genome sequences are publicly available. However, we sampled the Plant lineage in considerable depth. This included representatives of dicots (Arabidopsis thaliana and Populus trichocarpa), monocots (Oryza sativa), moss (Physcomitrella patens), as well as the multicellular chlorophyte alga Volvox carteri (http://www.jgi.doe.gov/Volvox, 2007) and single-celled chlorophyte and prasinophyte algae (Chlamydomonas reinhardtii and Ostreococcus tauri).
Genomes, transcriptomes and corresponding inferred proteomes of such organisms were scanned by iterative homology searching. Originally, we used the sequences of all known longin proteins from Arabidopsis thaliana as probes to scan genomes/transcriptomes/proteomes of the organisms listed above. Homologous extracted hits were in turn used as probes for iterative scanning steps: this process stopped when the search resulted in extracting no further homologous sequences. As a next step, all non-Arabidopsis candidate homologues were used as blast query sequences to be compared to Arabidopsis thaliana longins in order to group them based on classification of the main longin subfamilies (Ykt6, Sec22b and VAMP7)  and further division of plant VAMP7 proteins in two classes: VAMP71 and VAMP72 . In accordance with previous studies, homologues of the three major LD family proteins were identified from the vast majority of eukaryotic genomes (Additional file 1).
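The iterative scanning described above amounts to expanding a seed set until a round of searching returns nothing new. The following is a minimal Python sketch of that stopping rule only; `iterative_search`, `toy_search` and `TOY_HITS` are hypothetical names introduced for illustration, and the toy lookup table stands in for a real homology search, which is not reproduced here.

```python
from typing import Callable, Iterable, Set


def iterative_search(seeds: Iterable[str],
                     search: Callable[[str], Set[str]],
                     max_rounds: int = 50) -> Set[str]:
    """Expand a seed set until a search round yields no new hits."""
    found: Set[str] = set(seeds)
    frontier: Set[str] = set(found)       # probes for the current round
    for _ in range(max_rounds):           # safety cap (an assumption)
        new_hits: Set[str] = set()
        for probe in frontier:
            new_hits |= search(probe) - found
        if not new_hits:                  # convergence: nothing new extracted
            break
        found |= new_hits
        frontier = new_hits               # only new hits become probes
    return found


# Hypothetical stand-in for a homology search: maps a probe to its hits.
TOY_HITS = {"A": {"B"}, "B": {"C"}, "C": {"A"}, "D": {"E"}}


def toy_search(probe: str) -> Set[str]:
    return TOY_HITS.get(probe, set())


print(sorted(iterative_search({"A"}, toy_search)))
```

Using only the newly extracted hits as probes in each round avoids repeating searches that have already been run, while the `found` set guarantees termination once no unseen sequence is returned.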
The distribution and organization of the "classic" plant longins is presented in Additional file 2. Similar to animals, algae genomes have single Ykt6 and Sec22b genes. However, duplication of Ykt6 is conserved in all land plants, which also show two to four Sec22b-like genes. In plants, which indeed lack orthologues of animal brevins , a progressive amplification of the VAMP7 longin subfamily is observed . We found that - in all scanned complete genomes - the VAMP72 complement is larger than VAMP71; moreover, the single VAMP7 gene of Ostreococcus tauri belongs to the VAMP72 group (Additional file 2). In general, land plants show a 2-4 fold amplification of the complement of classical longins with respect to algae: 12-18 (Physcomitrella patens, Populus trichocarpa) vs. 3-7 (Ostreococcus tauri, Chlamydomonas reinhardtii) genes. This detailed examination of the longin superfamily organisation emphasizes the increased trafficking complexity that has accompanied the colonization of land by the streptophytes and also allowed us to identify several unusual plant longin proteins.
Since Arabidopsis thaliana VAMP727 [UniProt: Q9M376] shows an insertion of several amino acids in the LD sequence, which is unique amongst VAMP7 proteins, we performed a comparative sequence and structural analysis of this region in plant longins. Modeling of the LDs of Arabidopsis thaliana VAMP727 and of its closest homologue VAMP725 [UniProt: O48850] shows that the insertion sequence corresponds to an acidic extension of the loop between helices α-2 and α-3 of the LD (Figure 1). Intriguingly, this loop in the LD of Sec22b is part of a conserved interaction surface involved in binding to Sec24 within the Sec23/24/22b complex and in binding and packaging Sec22b by COPII [PDB: 2nut]. Considering that such LD-complex binding is crucial to subcellular targeting, the acidic loop is likely to mediate/regulate the specific SCL of VAMP727 by steric hindrance and/or polar/charge interactions. VAMP727 proteins are present only in seed plants (Spermatophyta). In more ancient divisions of streptophytes (e.g. Coniferophyta, Gnetophyta) the polar loop is already apparent; however, it is shorter and less acidic than in flowering plants (Magnoliophyta). It is particularly well conserved in Magnoliids, Monocotyledons and Eudicotyledons (Additional file 3).
A few non-SNARE LD proteins have been reported, including mammalian Sec22 gene isoforms Sec22a and Sec22c [11,24]; we report here that plants also have non-SNARE Sec22 genes. A Sec22-like rice protein [UniProt: Q6UU98] - confirmed by FLcDNA [GenBank: AK240832] and by ESTs [GenBank: AK240832, CB632349 and AU057789] - shows a complete LD sequence but lacks both the SNARE motif and the C-terminal TMD. When comparing the transcript to the corresponding genomic sequence (Chromosome 8), it is clear that this results from genomic deletion of the region encoding the SNARE motif in Sec22 paralogues. Although the exon encoding the TMD is conserved, this domain is lost because of a frame shift resulting from the new exon-intron boundary. Hence this Sec22-like protein from rice is expected to correspond to a longin domain, with no further regions. This is not surprising, when considering that single-domain proteins based on the longin fold (e.g. σ adaptin, SEDL) are known to play important roles in trafficking multi-subunit complexes.
Overall, our comparative genomic survey identified several unusual aspects of longin proteins in plants. However most surprisingly, in addition to members of the three well-known longin families, land plant genomes encode a family of previously unreported LD proteins which - based on in silico characterization (see below) - were named "Phytolongins". A first set of Phytolongins was originally identified using VAMP7 sequences from each species as sequence probes. Extracted hits, used as probes in iterated search cycles, allowed for the identification of further homologous sequences. Phytolongins share, with all longins, the N-terminal LD sequence and, with VAMP7-like and Sec22b-like longins, the C-terminus. Topology and TMD predictions (see methods), as well as presence of highly conserved residues in the C-terminus identify a putative TMD, suggesting that most probably Phytolongins are integral membrane proteins sharing topology with longins.
However, the R-SNARE motif of longins is replaced in all Phytolongins by a central region (PhyL region) of unknown function consisting of roughly 60-90 amino acids (Figure 2). When using whole Phytolongin sequences or sequence fragments corresponding to their PhyL regions as probes to scan non-redundant protein or DNA sequence databases, no similarity to either SNARE motifs or any other domain was found. Further attempts, performed optimizing BLAST parameters in order to extract weakly similar sequences, confirmed that PhyL sequences are unique and specific to Phytolongins. Moreover, all homology searches confirmed the absence of Phytolongin orthologues in organisms other than land plants.
In order to assess the conservation of genomic organisation of the plant longins, comparison of genomic structures (i.e. exon-intron splitting of paralogues and orthologues) was performed, with the verified genomic structure of each longin gene from the scanned complete plant genomes determined by comparing genomic vs. cDNA sequence. Figure 3 illustrates conservation and variation of gene splitting patterns in plant longins. Color-coding in the figure emphasizes that some exon patterns between land plants and algae are better conserved in some longin subfamilies than in others. For example, in land plants, a four-exon pattern is fully conserved in all VAMP71 genes (i.e. in both paralogues and orthologues), whereas the single VAMP71 genes from algae show a different eight-exon pattern and do not share exon-intron junctions with land plant orthologues. Similarly, all Ykt6 genes from land plants share the same six-exon pattern, which is quite different from the mono/bi-exonic pattern of algae genes. Sec22 genes from land plants show a conserved gene-splitting organization (except for the non-SNARE Sec22 gene described above); however, the three-exon organization of their 3' halves (roughly encoding SNARE motif and TMD) is conserved also in algae. The picture of VAMP72 gene organisation is more complex: most land plant genes show a five-exon division of the coding sequences, but three VAMP72 genes are monoexonic in moss and one of the Arabidopsis thaliana VAMP72 genes shows merging of the last two exons (yellow and grey in Figure 3). Comparison with algal VAMP72 genes shows conservation of some splitting points: for instance, division between first (light green) and second (pale red) exon. Deeper sequence comparison confirms conservation also in splice junction sequence boundaries. Two of the three longins of Ostreococcus tauri are monoexonic, and the third is biexonic.
Finally, the Phytolongin genes are monoexonic in both dicots and monocots (this was confirmed by extending the analysis to Phytolongins from further species as well), whereas moss Phytolongins are biexonic. Overall this analysis confirmed transcription of several, but not all, predicted genes and identified novel, unreported gene structures. It also confirmed expression of Phytolongins from four plant taxa, validating the predicted genes.
Since the profile for the LD [PROSITE: PS50859] was detected in several, but not all Phytolongin sequences, structural modeling of both profile-positive and profile-negative Phytolongins was performed.
Figure 4 shows a model of the putative LD of a representative Arabidopsis thaliana Phytolongin [UniProt: Q9SN26]. Homology modeling was performed using Geno3D; as a template, the NMR structure of human TI-VAMP/VAMP7 LD [PDB: 2dmw] was found to be better than LD structures from either Sec22b [PDB: 1ifq] or Ykt6p [PDB: 1h8m]. Intriguingly, structural variation was found in the α1 side of the LD, which is involved in intramolecular binding to the SNARE motif in both Ykt6p and Sec22b.
In order to obtain a model including both the LD and PhyL regions, whole Phytolongins were used as sequence probes in fold recognition based modeling. Phyre [43,44] confirmed that the LD of TI-VAMP/VAMP7 is the best available template for a Phytolongin LD; in addition, however, it was also able to propose a model superimposed onto the recently solved structure of subunit Sec22b of the COPII complex [PDB: 2nup, chain c]. In particular, the model in Figure 5a shows that a short peptide from the PhyL region (magenta) is close to the α1-β3 region (blue) of the LD, i.e. to the SNARE-binding site [12,13].
Threading predictions were iterated and the presence of the putative LD binding motif was confirmed for the PhyL regions of all Phytolongins (data not shown). When considering that the α1-β3 region is also a binding partner for the SNARE-like region of Hrb, it is not surprising to see that the putative LD-binding peptides of the PhyL regions are aligned in the model to the LD binding motif of the template and that the putative interaction is based on polar rather than hydrophobic interactions (Figure 5, panels b and c). Figure 5c also shows that the NT half of the PhyL region, including its putative LD binding motif, shares with SNARE motifs some heptadic, hydrophobic layers (whereas the CT half does not - data not shown). Absence of overall homology to the SNARE motif, presence of a putative LD-binding motif and conservation of the heptadic layers only in the NT half suggest that the PhyL region might share with the SNARE motif capacity to bind the LD, but not to participate in SNARE bundles, thus resembling the SNARE-like region of Hrb.
The PhyL region is likely to have strongly diverged from the SNARE motif by point mutations and/or sequence insertions. High divergence between the PhyL region and SNARE motif, together with α1 sequence divergence between Phytolongin and longin LDs, suggests that different longin domain proteins may show different binding properties. Indeed, even among SNARE longins from the same organism - e.g. yeast - the intramolecular binding mechanism can be either clearly apparent (Ykt6p) or not detected (Nyv1p). Putative binding of the PhyL region to the LD is in agreement with evidence that non-SNARE proteins can also bind the LD [14,15].
In order to obtain further functional predictions, PhyL region sequences from all identified Phytolongins were scanned for the presence of PROSITE motifs/signatures (see methods for details). When searching for degenerate patterns, putative calcium binding regions were consistently found (data not shown) but no positional conservation of these putative sites in multiple alignment was observed. While false positives among degenerate versions of low complexity motifs are quite common, this low confidence prediction is reported because of the special significance of calcium binding in trafficking proteins .
Overall, the domain modeling shows that, despite no detectable sequence homology with SNARE motifs, Phytolongins are bona fide longin proteins with conserved longin domain structure and a potentially conserved binding mechanism between the LD and PhyL motif.
Having established that the Phytolongins are LD proteins, we wanted to establish the longin family from which they are derived. A variety of datasets were created to address this question and were analysed using Bayesian and two methods of protein maximum-likelihood phylogeny. Initial analyses of longins from diverse eukaryotes clearly resolve the Phytolongins as a monophyletic group to the exclusion of all other sequences. The overall analysis (Additional file 4) did not resolve the placement of this clade but did resolve the Ykt6 sequences as monophyletic (0.99/92/90 posterior probabilities/PhyML/RAxML bootstrap support, respectively) indicating that the Phytolongins are not derived from within this family. Subsequent analysis further excluded the Sec22 family as a source of the Phytolongins, with a strongly supported node resolving the Sec22 family and allowing the establishment of the Phytolongins as embedded within the plant specific VAMP72 clade (Figure 6).
In order to further investigate the internal evolution of the Phytolongin family, a final dataset was analysed (Figure 7). Independent clades of Phytolongins were observed in the bryophytes (Physcomitrella patens), gymnosperms (Pinus taeda) and the angiosperms. Although the node separating the bryophytes from the other plant Phytolongins is poorly resolved in Figure 7, subsequent analyses provided more robust support (Additional file 5: 1.00/56/80). Within the angiosperms, two major clades are apparent. Although the inclusion of the monocot sequences in these clades is unclear, the nodes supporting the dicot sequences in each clade are very well supported (Figure 7).
Figure 8 illustrates our hypothesis of Phytolongin evolution. The ancestor of streptophytes possessed a single Phytolongin gene, as did the ancestor of tracheophytes, with subsequent independent gene family expansions in the descendant lineages. It is difficult to deduce whether the duplication giving rise to the two major clades of angiosperm Phytolongins predates the separation of monocots and dicots. However, based on the observed topology, this appears to be the best-supported scenario. Nonetheless, with the two well-resolved clades of rosid Phytolongins, it is clear that the duplication had already occurred at this point (Figure 8). Further expansions of the Phytolongin gene family are also observed in the Populus trichocarpa and Arabidopsis thaliana genomes, as well as in the ancestor of Sorghum bicolor and Oryza sativa.
Preliminary data from subcellular location prediction software applied to the Arabidopsis thaliana Phytolongins gave results inconsistent between the different algorithms and, for the Arabidopsis thaliana VAMPs, results inconsistent with experimentally established location of the proteins (data not shown). Consequently this method of analysis was not pursued. Nonetheless, it is possible to speculate on the possible SCL of Phytolongins and their involvement in plant subcellular trafficking based on their similarity and derivation from the plant specific clade of VAMP72 proteins.
We performed an analysis of percent identity between the animal TI-VAMP/VAMP7, Arabidopsis thaliana VAMP homologues and the four Arabidopsis Phytolongins, considering (i) the full-length sequence, (ii) the LD region only and (iii) the CT region only (i.e. the SNARE motif/PhyL region + TMD). Animal VAMP7 proteins are more similar to the four VAMP71 than to the seven VAMP72 and, intriguingly, this difference depends on divergence in the LD sequence. In the CT region, the VAMP71 and VAMP72 proteins share a similar range of similarity to the animal homologues (38-42%), as do LDs from animal VAMP7 compared to plant VAMP71 LDs. However, similarity between the animal TI-VAMP/VAMP7 and VAMP72 LDs is roughly ten percent lower (31-34%). It is therefore noteworthy that all four Phytolongin LDs are more similar to LDs from VAMP72 proteins than to LDs from VAMP71. It has to be stressed here that subcellular targeting of longins is mediated by the LD [12-19], acting as a dominant signal in chimeric constructs combining domains from VAMP7 proteins with different SCL. Moreover, in addition to a similar LD, VAMP72 proteins and Phytolongins are likely to share a conserved intramolecular binding mechanism resulting in a closed conformation in the conformational epitope mediating subcellular targeting.
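To make the comparison above concrete, the following minimal Python sketch computes percent identity between two pre-aligned sequences, optionally restricted to a column range such as the LD or CT region. The definition used here (identical residues divided by columns in which neither sequence has a gap) is an assumption, since the text does not state which definition was applied; `percent_identity` and the toy sequences are hypothetical.

```python
from typing import Optional


def percent_identity(aligned_a: str, aligned_b: str,
                     start: int = 0, end: Optional[int] = None,
                     gap: str = "-") -> float:
    """Percent identity over alignment columns [start, end).

    Assumption: columns with a gap in either sequence are skipped and
    do not count towards the denominator.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for x, y in zip(aligned_a[start:end], aligned_b[start:end]):
        if x == gap or y == gap:
            continue
        compared += 1
        if x.upper() == y.upper():
            identical += 1
    if compared == 0:
        return 0.0
    return 100.0 * identical / compared


# Toy aligned pair (not real longin sequences).
seq_a = "MKV-LA"
seq_b = "MRVGLA"
print(percent_identity(seq_a, seq_b))        # full alignment
print(percent_identity(seq_a, seq_b, 0, 3))  # first three columns only
```

Restricting the calculation to column ranges of a single alignment keeps the full-length, LD-only and CT-only values directly comparable, because all three are derived from the same residue correspondences.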
While the VAMP71 homologues are localized to the Golgi body and vacuole, all VAMP72 proteins localise to the PM/endosomal compartment, apart from VAMP723 (ER) and VAMP727 (prevacuolar compartment). Since the Phytolongins share higher similarity with the VAMP72 family, we tentatively speculate that the Phytolongins might be involved in events at the PM/endosomes as well. However, given that multiple linear and often short, cryptic motifs and conformational epitopes, as well as binding partners and post-translational modifications, can finely tune subcellular sorting, experimental evidence is expected to shed light on the SCL, interactions and role in trafficking of this novel protein family.
Our bioinformatic analysis of longin proteins has both verified the ancient nature of the three R-SNARE longin subfamilies and identified the Phytolongins, a previously undescribed LD protein family, specific to plants. That Phytolongins are present in multiple plant genomes, spanning the diversity of land plants, and that Phytolongin transcripts are available from several plant EST projects speak to the validity of the predicted novel genes. The expanded nature of this gene family in many taxa speaks to its potential importance in plant biology.
In addition to this new family of non-SNARE longin proteins, we identified several splice-variants of canonical longins, missing the SNARE motif. These, together with the presence of other non-SNARE longin proteins, and the conserved longin-like fold in a variety of other trafficking proteins, all suggest that the longin domain may be a more central structural feature to membrane-trafficking in eukaryotic cells than is currently recognised. Since the longin-like fold is present in diverse trafficking machinery, involved in vesicle fusion, vesicle formation and even the signal recognition particle, we propose that the longin-like domain should join other prominent structural protein elements, such as the alpha-solenoid, and beta-propeller domains  and monomeric GTPases, in the list of the primordial building blocks that were involved in the earliest evolution of a eukaryotic membrane-trafficking system.
Genome-wide searches were performed using BLAST with default scoring parameters and with the low-complexity filter disabled. Nucleotide, protein and translated BLAST programs were used to search for homologous genes, transcripts or proteins in the NCBI and EBI databases as well as at the JGI genome portal http://genome.jgi-psf.org/. Searches against the complete, non-redundant NCBI and EBI databases were restricted to Eukaryota (taxid:2759); in addition, several searches at the JGI portal were limited to specific model organisms.
Evidence regarding the conservation and variation of the intron/exon structure was obtained using available transcripts (FLcDNAs and/or ESTs) from EBI, NCBI and JGI databases as sequence queries in BLAST searches vs. genomic scaffolds. Alignment of transcript regions to genomic sequences provided a preliminary exon map of each gene. The map was then manually curated and optimized comparing corresponding translated protein fragments and taking into account splice consensus sequences.
Scanning of canonical PROSITE motifs and signatures was performed using the ScanProsite tool available at the ExPASy server http://www.expasy.org, whereas scanning for degenerate patterns was performed using PROSITE scan available on-line at the IBCP-PBIL server http://npsa-pbil.ibcp.fr, allowing for 2 mismatches or setting a 65% similarity threshold.
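As an illustration of what scanning for a degenerate pattern with a mismatch tolerance involves, the sketch below slides a pattern along a sequence and reports every window with at most a given number of mismatching positions. It is not the PBIL implementation: it handles only a fixed-length subset of PROSITE syntax (single residues, `x`, `[..]`, `{..}` and fixed repeats such as `x(2)`), it does not model the 65% similarity option, and the pattern shown is a toy example rather than a real PROSITE entry. All function names are hypothetical.

```python
import re
from typing import Callable, List, Tuple

Matcher = Callable[[str], bool]


def parse_pattern(pattern: str) -> List[Matcher]:
    """Turn a fixed-length PROSITE-style pattern into per-position tests."""
    matchers: List[Matcher] = []
    for element in pattern.rstrip(".").split("-"):
        m = re.fullmatch(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)\))?",
                         element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, repeat = m.group(1), int(m.group(2) or 1)
        if core == "x":                       # any residue
            matcher: Matcher = lambda aa: True
        elif core.startswith("["):            # one of the listed residues
            allowed = set(core[1:-1])
            matcher = lambda aa, allowed=allowed: aa in allowed
        elif core.startswith("{"):            # any residue except those listed
            banned = set(core[1:-1])
            matcher = lambda aa, banned=banned: aa not in banned
        else:                                 # exactly this residue
            matcher = lambda aa, core=core: aa == core
        matchers.extend([matcher] * repeat)
    return matchers


def scan(sequence: str, pattern: str,
         max_mismatches: int = 2) -> List[Tuple[int, int]]:
    """Return (1-based start, mismatch count) for each window within tolerance."""
    matchers = parse_pattern(pattern)
    width = len(matchers)
    hits: List[Tuple[int, int]] = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        mismatches = sum(not test(aa) for test, aa in zip(matchers, window))
        if mismatches <= max_mismatches:
            hits.append((start + 1, mismatches))
    return hits


# Toy pattern and sequence, for illustration only.
print(scan("AADKDAGAA", "D-x-[DNS]-{ILV}-G", max_mismatches=0))
```

Because every tolerated mismatch enlarges the set of accepted windows, short or low-complexity patterns quickly accumulate chance hits, which is consistent with the caution expressed in the text about false positives among degenerate matches.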
Prediction of TMD and topology was performed using PSORT, DAS, TMPRED http://www.ch.embnet.org/software/TMPRED_form.html, SOSUI and HMMTOP.
Homology modeling and superposition of models to templates was performed using the Geno3D tool available on-line at the IBCP-PBIL server http://geno3d-pbil.ibcp.fr. Fold recognition was performed using Phyre [43,44,54]. 3D representation of molecular structures was obtained using the PyMOL Molecular Graphics System http://www.pymol.org.
Sequences were aligned initially using Clustal X  and then adjusted manually based on known secondary structural features of the predicted longin domain. For phylogenetic analysis only regions of unambiguous homology were retained. For all datasets, details of taxon numbers, positions and models of sequence evolution are listed in Additional file 6. All alignments are available upon request and a list of abbreviations and accession numbers for all sequences used in the analyses is provided in Additional file 1.
In all analyses, the model of sequence evolution was established using the program ProtTest v. 1.3. Datasets were then processed using three methods of protein phylogenetic analysis. The optimal topology and Bayesian posterior probability values were obtained using MrBayes version 3.1.2 with two independent runs each of 1,000,000 generations. The burn-in value was estimated graphically and all trees prior to the plateau were excluded from the consensus. In all cases the splits frequency was below 0.1, indicating that the two runs had converged. Protein Maximum Likelihood (ML) bootstrap support values were calculated using PHYML and RAxML with the appropriate models of sequence evolution and correction for variation of rates among sites. Phylogenetic analyses were performed on the CamGrid cluster at the University of Cambridge or the bioinfo cluster at the University of Alberta.
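The convergence check mentioned above presumably refers to the average standard deviation of split frequencies between the two runs. The sketch below shows one plausible way to compute such a statistic after discarding burn-in samples; it is an illustration under stated assumptions, not MrBayes' own code. Assumptions: each tree is represented as a set of splits, each split as a set of taxon names on one side of a branch; the sample standard deviation is used; and splits below a minimum frequency in both runs are ignored (`min_freq` is a hypothetical parameter).

```python
from collections import Counter
from statistics import stdev
from typing import Dict, FrozenSet, Sequence

Split = FrozenSet[str]     # taxa on one side of a branch
Tree = FrozenSet[Split]    # a tree reduced to its set of splits


def split_frequencies(trees: Sequence[Tree], burnin: int) -> Dict[Split, float]:
    """Frequency of each split among the post-burn-in trees of one run."""
    kept = trees[burnin:]
    if not kept:
        raise ValueError("no trees left after discarding burn-in")
    counts: Counter = Counter()
    for tree in kept:
        counts.update(tree)
    return {split: n / len(kept) for split, n in counts.items()}


def average_sd_of_split_frequencies(run_a: Sequence[Tree],
                                    run_b: Sequence[Tree],
                                    burnin: int,
                                    min_freq: float = 0.1) -> float:
    """Mean, over splits, of the SD of their frequencies in the two runs."""
    freq_a = split_frequencies(run_a, burnin)
    freq_b = split_frequencies(run_b, burnin)
    splits = [s for s in set(freq_a) | set(freq_b)
              if max(freq_a.get(s, 0.0), freq_b.get(s, 0.0)) >= min_freq]
    if not splits:
        return 0.0
    total = sum(stdev([freq_a.get(s, 0.0), freq_b.get(s, 0.0)])
                for s in splits)
    return total / len(splits)


# Toy example: two tiny "runs" that disagree on one split.
t1: Tree = frozenset({frozenset({"A", "B"})})
t2: Tree = frozenset({frozenset({"A", "C"})})
print(average_sd_of_split_frequencies([t1, t1], [t1, t2], burnin=0))
```

A value near zero indicates that both runs are sampling the same splits at similar frequencies; the threshold of 0.1 quoted in the text is the authors' criterion and is not derived here.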
CT: carboxy (C)-terminal; FLcDNA: full-length cDNA; LCEA: last common eukaryotic ancestor; LD: Longin domain; NT: amino (N)-terminal; PDZ: PSD95, DlgA and ZO-1; SNARE: soluble N-ethylmaleimide sensitive factor (NSF) attachment protein (SNAP) receptor; SRP: signal recognition particle; TI-VAMP: tetanus neurotoxin insensitive VAMP; TMD: transmembrane domain; TRAPP: transport protein particle; VAMP: vesicle associated membrane protein
MV collected the dataset for the analysis, performed structural modeling and some of the phylogenetic analyses. VR participated in data mining and performed the genomic structure analyses. JBD conceived of and supervised the phylogenetic analyses, as well as performed the analyses on the Phytolongin-specific dataset. FF initiated and coordinated the work and participated in the structural modeling and sequence analyses. All authors have read and approved of the final manuscript.
Table S1. Accession numbers and names of all sequences identified in our comparative genomic searches and used in phylogenetic analyses.
Table S2. Whole complements of "classic" longins from a number of model plant species.
Alignment of the VAMP727 loop region in diverse plants. This figure shows a multiple alignment of the conserved VAMP727 acidic loop region including a few adjacent C-terminal and N-terminal residues, corresponding to regions 94-121 and 96-124 of VAMP727 proteins from Arabidopsis thaliana and Oryza sativa, respectively. Loop regions are 98-116 (At) and 100-119 (Os), respectively.
Longin phylogeny. This figure demonstrates that the Phytolongins form a well-resolved clade within the longin family and that they are unlikely to have been derived from within the Ykt6 clade of longins. The vertical bars highlight the Ykt6 and Phytolongin (PL) clades respectively.
Phytolongin phylogeny (altered taxon sampling). This figure shows the robust separation of bryophyte and gymno/angiosperm sequences. This dataset included a homologue from Selaginella moellendorffii but excluded a divergent sequence from Sorghum bicolor and resulted in a more robust resolution of the Physcomitrella patens sequences from the other plant Phytolongins (vertical bar).
Table S3. Characteristics of datasets used in phylogenetic analysis. Datasets are listed by name, number of taxa, number of amino acid positions, the model of sequence evolution deduced by ProtTest and the figure in the paper in which the results are illustrated.
We thank Gary Eitzen for critical reading of the manuscript as well as Rosario Dioguardi and Barbara Pasqualato for technical assistance. We acknowledge the CamGrid computational resource on which some of these analyses were performed and financial support to JBD (from the Wellcome Trust, the Parke Davis Foundation and a start-up grant from the University of Alberta) and to FF (from the MIUR-PRIN2005 and the Padua University, PRAT2007 project CPDA077345/07). MV was supported by a PhD fellowship from Ca.Ri.Pa.Ro. Foundation.