Results 1-5 (5)

1.  A Bacteriophage Tailspike Domain Promotes Self-Cleavage of a Human Membrane-Bound Transcription Factor, the Myelin Regulatory Factor MYRF 
PLoS Biology  2013;11(8):e1001624.
Myelination of the central nervous system (CNS) is critical to vertebrate nervous systems for efficient neural signaling. CNS myelination occurs as oligodendrocytes terminally differentiate, a process regulated in part by the myelin regulatory factor, MYRF. Using bioinformatics and extensive biochemical and functional assays, we find that MYRF is generated as an integral membrane protein that must be processed to release its transcription factor domain from the membrane. In contrast to most membrane-bound transcription factors, MYRF proteolysis seems constitutive and independent of cell- and tissue-type, as we demonstrate by reconstitution in E. coli and yeast. The apparent absence of physiological cues raises the question as to how and why MYRF is processed. By using computational methods capable of recognizing extremely divergent sequence homology, we identified a MYRF protein domain distantly related to bacteriophage tailspike proteins. Although occurring in otherwise unrelated proteins, the phage domains are known to chaperone the tailspike proteins' trimerization and auto-cleavage, raising the hypothesis that the MYRF domain might contribute to a novel activation method for a membrane-bound transcription factor. We find that the MYRF domain indeed serves as an intramolecular chaperone that facilitates MYRF trimerization and proteolysis. Functional assays confirm that the chaperone domain-mediated auto-proteolysis is essential both for MYRF's transcriptional activity and its ability to promote oligodendrocyte maturation. This work thus reveals a previously unknown key step in CNS myelination. These data also reconcile conflicting observations of this protein family, different members of which have been identified as transmembrane or nuclear proteins. 
Finally, our data illustrate a remarkable evolutionary repurposing between bacteriophages and eukaryotes, with a chaperone domain capable of catalyzing trimerization-dependent auto-proteolysis in two entirely distinct protein and cellular contexts, in one case participating in bacteriophage tailspike maturation and in the other activating a key transcription factor for CNS myelination.
Author Summary
Membrane-bound transcription factors are synthesized as integral membrane proteins, but are proteolytically cleaved in response to relevant cues, untethering their transcription factor domains from the membrane to control gene expression in the nucleus. Here, we find that the myelin regulatory factor MYRF, a major transcriptional regulator of oligodendrocyte differentiation and central nervous system myelination, is also a membrane-bound transcription factor. In marked contrast to most well-known membrane-bound transcription factors, cleavage of MYRF appears to be unconditional. Surprisingly, this processing is performed by a protein domain shared with bacteriophages in otherwise unrelated proteins, where the domain is critical to the folding and proteolytic maturation of virus tailspikes. In addition to revealing a previously unknown key step in central nervous system myelination, this work also illustrates a remarkable example of evolutionary repurposing between bacteriophages and eukaryotes, with the same protein domain capable of catalyzing trimerization-dependent auto-proteolysis in two completely distinct protein and cellular contexts.
PMCID: PMC3742443  PMID: 23966832
2.  A flaw in the typical evaluation scheme for pair-input computational predictions 
Nature Methods  2012;9(12):1134-1136.
PMCID: PMC3531800  PMID: 23223166
3.  Revisiting the negative example sampling problem for predicting protein–protein interactions 
Bioinformatics  2011;27(21):3024-3028.
Motivation: A number of computational methods have been proposed that predict protein–protein interactions (PPIs) based on protein sequence features. Since the number of potential non-interacting protein pairs (negative PPIs) is very high both in absolute terms and in comparison to that of interacting protein pairs (positive PPIs), computational prediction methods rely upon subsets of negative PPIs for training and validation. Hence the need to sample subsets of negative PPIs.
Results: We clarify that there are two fundamentally different types of subset sampling for negative PPIs. One is subset sampling for cross-validated testing, where one desires unbiased subsets so that predictive performance estimated with them can be safely assumed to generalize to the population level. The other is subset sampling for training, where one desires the subsets that best train predictive algorithms, even if these subsets are biased. We show that confusion between these two fundamentally different types of subset sampling led one study recently published in Bioinformatics to the erroneous conclusion that predictive algorithms based on protein sequence features are hardly better than random in predicting PPIs. Rather, both protein sequence features and the ‘hubbiness’ of interacting proteins contribute to effective prediction of PPIs. We provide guidance for appropriate use of random versus balanced sampling.
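To make the distinction between the two sampling schemes concrete, here is a minimal Python sketch. It assumes proteins are identified by strings and positive PPIs are given as unordered pairs. The hypothetical `random_negatives` draws non-interacting pairs uniformly from all pairs not known to interact; the hypothetical `balanced_negatives` picks proteins in proportion to how often they occur in the positive set, so that hub proteins are as over-represented among negatives as among positives. The degree weighting is an approximation of balanced sampling, not the exact procedure of any published study.

```python
import random

def random_negatives(proteins, positives, k, seed=0):
    """Hypothetical helper: draw k negative pairs uniformly from all
    protein pairs that are not in the positive set."""
    rng = random.Random(seed)
    pos = {frozenset(p) for p in positives}
    n = len(proteins)
    # Assumes positives are distinct pairs of distinct proteins from `proteins`.
    available = n * (n - 1) // 2 - len(pos)
    if k > available:
        raise ValueError("k exceeds the number of possible negative pairs")
    negatives = set()
    while len(negatives) < k:
        a, b = rng.sample(proteins, 2)  # two distinct proteins, uniformly
        pair = frozenset((a, b))
        if pair not in pos:  # rejection keeps the draw uniform over negatives
            negatives.add(pair)
    return negatives

def balanced_negatives(positives, seed=0, max_tries=100_000):
    """Hypothetical helper: draw as many negatives as positives, picking each
    protein with probability proportional to its degree in the positive set.
    May return fewer pairs if max_tries is exhausted."""
    rng = random.Random(seed)
    pos = {frozenset(p) for p in positives}
    # A protein with d known interactions appears d times in this pool.
    pool = [protein for pair in positives for protein in pair]
    negatives = set()
    for _ in range(max_tries):
        if len(negatives) == len(positives):
            break
        a, b = rng.choices(pool, k=2)  # degree-weighted, with replacement
        pair = frozenset((a, b))
        if a != b and pair not in pos:
            negatives.add(pair)
    return negatives
```

Following the abstract's distinction, only the uniform sample supports unbiased cross-validated estimates; the degree-weighted sample is biased by construction and is, at most, a candidate for training.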
Availability: The datasets used for this study are available at
Supplementary Information: Supplementary data are available at Bioinformatics online.
PMCID: PMC3198576  PMID: 21908540
4.  Critical assessment of sequence-based protein-protein interaction prediction methods that do not require homologous protein sequences 
BMC Bioinformatics  2009;10:419.
Protein-protein interactions underlie many important biological processes. Computational prediction methods can usefully complement experimental approaches for identifying protein-protein interactions. Recently, a distinct category of sequence-based prediction methods has been put forward, distinct in that it does not require homologous protein sequences. This makes these methods applicable to any protein sequence, unlike many previous sequence-based prediction methods. If they are as effective as claimed, these new sequence-based, universally applicable prediction methods would have far-reaching utility in many areas of biological research.
On close examination, I found that many of these new methods had been inadequately tested. In addition, newer methods were often published without performance comparisons against earlier ones. Thus, it is not clear how good they are or whether there are significant performance differences among them. In this study, I implemented and thoroughly tested four different methods on large-scale, non-redundant data sets. The analysis reveals several important points. First, the methods differ significantly in performance. Second, data sets typically used for training prediction methods appear significantly biased, limiting the general applicability of prediction methods trained with them. Third, there is still ample room for further development. In addition, my analysis illustrates the importance of complementary performance measures coupled with appropriately sized data sets for meaningful benchmark tests.
The current study reveals the potential and limitations of the new category of sequence-based protein-protein interaction prediction methods, which in turn provides firm ground for future work in this important area of contemporary bioinformatics.
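The point about complementary performance measures can be illustrated with a small Python sketch. The functions below compute accuracy, precision and recall at a score threshold, plus ROC AUC as the probability that a randomly chosen interacting pair outscores a randomly chosen non-interacting one. The abstract does not list which measures the study used, so this particular set, the function names and the 0.5 threshold are assumptions for illustration.

```python
def confusion(labels, scores, threshold):
    """Count TP, FP, TN, FN when score >= threshold is predicted as interacting."""
    tp = fp = tn = fn = 0
    for y, s in zip(labels, scores):
        predicted = s >= threshold
        if predicted and y:
            tp += 1
        elif predicted:
            fp += 1
        elif y:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn

def metrics(labels, scores, threshold=0.5):
    """Threshold-dependent measures; undefined ratios are reported as 0.0."""
    tp, fp, tn, fn = confusion(labels, scores, threshold)
    return {
        "accuracy": (tp + tn) / len(labels),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }

def roc_auc(labels, scores):
    """Threshold-free measure: fraction of (positive, negative) pairs in which
    the positive scores higher; ties count as one half."""
    pos = [s for y, s in zip(labels, scores) if y]
    neg = [s for y, s in zip(labels, scores) if not y]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Toy example: 2 interacting pairs, 8 non-interacting pairs.
labels = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.3, 0.6, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1]
print(metrics(labels, scores))  # {'accuracy': 0.8, 'precision': 0.5, 'recall': 0.5}
print(roc_auc(labels, scores))  # 0.9375
```

On this toy data, accuracy looks respectable while precision and recall are only 0.5, and AUC tells yet another story; with the much larger share of negatives typical of PPI data, accuracy alone becomes even less informative.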
PMCID: PMC2803199  PMID: 20003442
5.  Prediction of the burial status of transmembrane residues of helical membrane proteins 
BMC Bioinformatics  2007;8:302.
Helical membrane proteins (HMPs) play a crucial role in diverse cellular processes, yet it still remains extremely difficult to determine their structures by experimental techniques. Given this situation, it is highly desirable to develop sequence-based computational methods for predicting structural characteristics of HMPs.
We have developed TMX (TransMembrane eXposure), a novel method for predicting the burial status (i.e. buried in the protein structure vs. exposed to the membrane) of transmembrane (TM) residues of HMPs. TMX derives positional scores for TM residues from their profiles and conservation indices, and then uses a support vector classifier to predict their burial status. Its prediction accuracy is 78.71% on a benchmark data set, a considerable improvement over the 68.67% and 71.06% achieved by previously proposed methods. Importantly, unlike the previous methods, TMX automatically yields confidence scores for its predictions. In addition, the feature selection step incorporated in TMX reveals interesting insights into the structural organization of HMPs.
A novel computational method, TMX, has been developed for predicting the burial status of TM residues of HMPs. Its prediction accuracy is much higher than that of previously proposed methods. It will be useful in elucidating structural characteristics of HMPs as an inexpensive, auxiliary tool. A web server for TMX is established at and freely available to academic users, along with the data set used.
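As a rough illustration of deriving positional scores from conservation, the Python sketch below scores each column of a multiple sequence alignment with a conservation index (here, one minus the normalized Shannon entropy) and collects the scores in a window around a TM residue into a feature vector. The abstract does not give TMX's formulas, so the entropy-based index, the window size and the function names are hypothetical stand-ins; the profile-based scores and the support vector classifier that TMX applies on top are not shown.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def conservation_index(column):
    """Hypothetical index: 1 - normalized Shannon entropy of one alignment
    column; gaps and non-standard symbols are ignored. 1.0 = fully conserved."""
    residues = [c for c in column if c in AMINO_ACIDS]
    if not residues:
        return 0.0
    n = len(residues)
    entropy = -sum((k / n) * math.log(k / n) for k in Counter(residues).values())
    return 1.0 - entropy / math.log(len(AMINO_ACIDS))

def window_features(alignment, position, half_window=2):
    """Conservation indices for the columns within half_window of `position`,
    zero-padded beyond the alignment ends (hypothetical feature layout)."""
    length = len(alignment[0])
    features = []
    for j in range(position - half_window, position + half_window + 1):
        if 0 <= j < length:
            features.append(conservation_index([seq[j] for seq in alignment]))
        else:
            features.append(0.0)
    return features

# Toy alignment of a short TM segment, made up for illustration.
alignment = ["LLAV", "LIAV", "LVA-"]
print(window_features(alignment, 1, half_window=1))
```

Vectors like these, one per TM residue, are the kind of input a support vector classifier could then be trained on to separate buried from lipid-exposed positions.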
PMCID: PMC2000914  PMID: 17708758