The number of structural solutions available to a polypeptide chain is limited, making protein structures multiply convergent (Cheng et al., 2008; Krishna and Grishin, 2004; Salem et al., 1999), while the combinatorial possibilities in sequence space are nearly endless. For this reason, sequence similarity is considered the primary marker of homology. We thus used sensitive sequence comparisons, as implemented in HHpred, to find homologs of the SMP domain in a database concatenating several complete genomes with PDB70 (see ‘Methods’ section). The search was seeded with the SMP domain from Mmm1. The best hits were to other proteins that have previously been described to contain this domain (). In addition, we detected a hitherto unknown SMP domain in the ERMES protein Mdm34. This protein has been reported as an integral outer membrane protein (Youngman et al., 2004), but we were unable to identify any sequence motifs in it that would indicate membrane insertion. HHpred searches with other representatives and reciprocal searches with Mdm34 confirmed the presence of an SMP domain, raising the number of SMP domains in ERMES to three ().
We also found statistically significant matches to many eukaryotic proteins from the bactericidal/permeability-increasing protein-like (BPI-like) family (), including two with known structures: BPI (1EWF) and cholesteryl ester transfer protein (CETP; 2OBD). Other members of this family are lipopolysaccharide-binding protein (LPSBP), lipid-binding serum glycoprotein (LBSGP), phospholipid transfer protein (PLTP) and long and short paralogs of palate, lung and nasal epithelium carcinoma-associated protein (PLUNC). Some of these proteins have been shown to bind lipids, e.g. CETP facilitates lipid transport between different lipoproteins (Qiu et al., 2007).
BPI and CETP have similar structures, each containing two tandem domains that adopt the same fold, comprising a long α-helix wrapped in a highly curved anti-parallel β-sheet. All BPI-like proteins contain these two domains, the only exception being short PLUNC, which has only one. The domains show little sequence identity (<15%) and sequence comparisons do not yield significant matches between them. Instead, the C-terminal domain only shows matches to the Aha1 protein, a co-chaperone of Hsp90 in eukaryotes which shares the same fold (1USU; d.83.2). Nevertheless, the N- and C-terminal domains of BPI-like proteins are thought to have a common ancestry based on their structural similarity (Kleiger et al., 2000). The Structural Classification of Proteins database (SCOP; Murzin et al., 1995) also considers them to be homologous and classifies them into the same family (d.83.1.1).
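As an illustration of what a figure such as "<15% sequence identity" measures, the following is a minimal sketch that computes percent identity from two pre-aligned sequences. It assumes that identity is counted over columns in which both sequences have a residue; other conventions for the denominator exist and give different values. The function name `percent_identity` and the toy sequences are hypothetical and are not taken from the analysis described here.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over alignment columns where neither sequence has a gap.

    Assumption: the two strings are rows of the same alignment and therefore
    have equal length. Columns containing a gap in either row are skipped.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")

    compared = 0   # columns with a residue in both rows
    identical = 0  # compared columns with the same residue
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue
        compared += 1
        if a == b:
            identical += 1

    # Avoid division by zero when no column could be compared.
    return 100.0 * identical / compared if compared else 0.0


if __name__ == "__main__":
    # Toy alignment (hypothetical sequences, for illustration only).
    print(percent_identity("ACDE-G", "ACKEFG"))
```

At identities this low, pairwise comparison alone cannot separate homologs from unrelated sequences, which is why the searches described in this section rely on profile-based comparison instead.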
HHpred searches with SMP domains yielded many statistically significant matches to the N-terminal domain of BPI-like proteins (), but not to the C-terminal domain. We confirmed these findings with reciprocal searches using both domains of BPI-like proteins. From the statistical significance of these matches, we conclude that SMP domains are homologous to BPI-like proteins and therefore predict that they share the same tubular fold and lipid-binding properties.
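The logic of confirming a match by reciprocal search can be sketched as follows. This is a simplified illustration and not the procedure used by HHpred itself: it assumes that search results have already been parsed into a mapping from each query to its targets and their scores, and that a single score cutoff decides significance. The names `reciprocal_pairs` and `min_score`, the cutoff value and the placeholder identifiers are all hypothetical.

```python
from typing import Dict, Set, Tuple

# query -> {target: score}; higher score means a more confident match.
# Assumption: scores are comparable across searches (e.g. a probability in %).
Hits = Dict[str, Dict[str, float]]


def reciprocal_pairs(hits: Hits, min_score: float) -> Set[Tuple[str, str]]:
    """Return pairs (a, b) where a finds b AND b finds a, both at >= min_score."""
    pairs: Set[Tuple[str, str]] = set()
    for query, targets in hits.items():
        for target, score in targets.items():
            if query == target or score < min_score:
                continue
            # Look up the reverse search; absent if the target was never
            # used as a query or did not retrieve the original query.
            back = hits.get(target, {}).get(query)
            if back is not None and back >= min_score:
                a, b = sorted((query, target))
                pairs.add((a, b))
    return pairs


if __name__ == "__main__":
    # Placeholder identifiers and scores, for illustration only.
    toy_hits: Hits = {
        "domain_A": {"domain_B": 98.0, "domain_C": 40.0},
        "domain_B": {"domain_A": 95.0},
        "domain_C": {"domain_A": 97.0},
    }
    print(reciprocal_pairs(toy_hits, min_score=90.0))
```

In the toy input, `domain_C` retrieves `domain_A` but not the reverse, so only the `domain_A`/`domain_B` pair counts as reciprocally confirmed.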
Further searches with the N-terminal domain of BPI-like proteins retrieved three more proteins of known structure: dust mite allergen Der p 7 (3H4Z), a juvenile hormone-binding protein from Galleria mellonella (JHBP, 2RCK), and a Takeout 1 protein from Epiphyas postvittana (3E8T). These proteins are exclusively found in arthropods and are involved in binding hydrophobic ligands. They are composed of a single domain homologous to the N-terminal domain of BPI-like proteins (Supplementary Fig. S1), a relationship that has been described previously (Hamiaux et al., 2009; Kolodziejczyk et al., 2008; Mueller et al., 2010). In view of their similarities in sequence and structure, we propose to group the arthropod proteins together with the BPI-like family into the TULIP superfamily.
To confirm the membership of SMP domains in the TULIP superfamily, we generated fold predictions for 16 representative sequences using the servers Phyre (Kelley and Sternberg, 2009), MULTICOM (Wang et al., 2010) and MUSTER (Wu and Zhang, 2008), all of which performed very well in the most recent Critical Assessment of Structure Prediction experiment, CASP8 (Kryshtafovych et al., 2009). For all three methods, many of the highest-scoring matches were to TULIP domains ( and Supplementary Fig. S2). Additionally, we queried the fold prediction metaserver I-TASSER (Roy et al., 2010), which was the top-performing server in CASP7 and CASP8 (Zhang, 2007, 2009). This server returned a TULIP domain as the top match for 10 of 16 queries and as one of the top three matches for all but one query (Supplementary Fig. S3). These matches included both BPI-like and Takeout-like proteins. A structure-assisted multiple sequence alignment of SMP domains to TULIP domains of known structure highlights the basis for these matches (). All sequences have similar length, distribution of (predicted) secondary structure, and pattern of hydrophobic residues. However, there are no conserved sequence motifs, which is unsurprising, as such motifs are not even detectable within individual families (Beamer et al., 1997; Kolodziejczyk et al., 2008).
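Summaries of the form "top match for N of M queries" and "among the top three for all but one" can be tallied as in the sketch below. It assumes that each server's output has been reduced to a ranked list of template identifiers per query and that membership of a template in the superfamily of interest is known beforehand. The function `top_k_counts` and the placeholder identifiers are hypothetical, and the toy input does not reproduce the results reported above.

```python
from typing import Dict, Iterable, List, Set


def top_k_counts(
    ranked_hits: Dict[str, List[str]],
    family: Set[str],
    ks: Iterable[int] = (1, 3),
) -> Dict[int, int]:
    """For each k, count queries with at least one family member in the top k.

    ranked_hits maps a query name to its templates, best match first.
    """
    ks = tuple(ks)
    counts = {k: 0 for k in ks}
    for templates in ranked_hits.values():
        for k in ks:
            # Slicing is safe even if fewer than k templates were returned.
            if any(t in family for t in templates[:k]):
                counts[k] += 1
    return counts


if __name__ == "__main__":
    # Placeholder queries and templates, for illustration only.
    toy_ranked = {
        "query_1": ["fam_T1", "other_X"],
        "query_2": ["other_X", "fam_T2", "other_Y"],
        "query_3": ["other_X", "other_Y", "other_Z", "fam_T1"],
    }
    print(top_k_counts(toy_ranked, family={"fam_T1", "fam_T2"}))
```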
To explore the relative positions in sequence space of proteins of the TULIP superfamily, we searched for homologs of SMP domains, N-terminal domains of BPI-like proteins, as well as allergens and Takeout proteins in the nr database using HHsenser, and clustered the obtained sequences in CLANS (see ‘Methods’ section). The resulting cluster map () shows three distinct but connected regions corresponding to SMP, BPI and Takeout-like domain families, confirming the proposed homology between them. In addition to the SMP groups described by Lee and Hong (2006) and the group of Mdm34 proteins described in this article, the clustering revealed a further SMP group, the uncharacterized transmembrane 24 proteins. It also yielded a number of additional groups of BPI-like proteins, including the expression site-associated gene 5 proteins (ESAG5) from trypanosomes, whose homology to BPI-like proteins has been reported previously (Barker et al., 2008). BPI and Takeout-like domains are connected by the arthropod allergens, one form of which is unique in containing tandem domains with clear sequence similarity, indicating a domain duplication that occurred in insects (yellow and orange clusters in ).
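The idea that groups of sequences are "distinct but connected" can be illustrated with a much simpler stand-in for a cluster map: sequences are linked whenever their pairwise P-value falls below a cutoff, and the linked groups are read off as connected components. This single-linkage sketch is an assumption made for illustration and is not the layout algorithm used by CLANS, which arranges sequences by force-directed placement based on pairwise similarities. The function `linked_groups`, the cutoff and the placeholder names are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def linked_groups(
    names: List[str],
    pairs: Iterable[Tuple[str, str, float]],
    max_pvalue: float,
) -> List[List[str]]:
    """Group sequences connected by pairwise P-values <= max_pvalue.

    names: all sequence identifiers; every name used in pairs must be listed.
    pairs: (name_a, name_b, pvalue) tuples from all-against-all comparison.
    """
    parent: Dict[str, str] = {n: n for n in names}

    def find(x: str) -> str:
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, pvalue in pairs:
        if pvalue <= max_pvalue:
            parent[find(a)] = find(b)  # merge the two groups

    groups: Dict[str, List[str]] = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(sorted(g) for g in groups.values())


if __name__ == "__main__":
    # Placeholder sequences and P-values, for illustration only.
    toy_pairs = [("s1", "s2", 1e-10), ("s2", "s3", 1e-6), ("s3", "s4", 0.5)]
    print(linked_groups(["s1", "s2", "s3", "s4"], toy_pairs, max_pvalue=1e-5))
```

Note that `s1` and `s3` end up in the same group although they are linked only through `s2`, which mirrors how an intermediate family can connect two otherwise separate regions of a cluster map.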