Short linear motifs (SLiMs) are functional stretches of protein sequence that are crucial to numerous biological processes because they mediate protein–protein interactions. These motifs often comprise fewer than 10 amino acids. While well characterized in eukaryotic intracellular signaling, their role in prokaryotic signaling is less well understood. We surveyed the distribution of known motifs in prokaryotic extracellular and virulence proteins across a range of bacterial species and conducted searches for novel motifs in virulence proteins. Many known motifs in virulence effector proteins mimic eukaryotic motifs and enable the pathogen to control the intracellular processes of its host. Novel motifs were detected by finding those that had evolved independently in three or more unrelated virulence proteins. The search returned several significantly over-represented linear motifs, of which some were known motifs and others are novel candidates with potential roles in bacterial pathogenesis. A putative C-terminal G[AG].$ motif found in type IV secretion system proteins was among the most significant detected. A KK$ motif, previously identified in a plasminogen-binding protein, was shown to be enriched across a number of adhesion proteins and lipoproteins. While there is some potential to develop peptide drugs against bacterial infection based on bacterial peptides that mimic host components, this could have unwanted effects on host signaling. Thus, novel SLiMs in virulence factors that do not mimic host components but are crucial for bacterial pathogenesis, such as those in the type IV secretion system, may be more useful to develop as leads for anti-microbial peptides or drugs.
short linear motifs (SLiMs); virulence factor; motif mimicry; antibacterial; bioinformatics; pathogen
The purpose of this study was to investigate the blood stage of the malaria causing parasite, Plasmodium falciparum, to predict potential protein interactions between the parasite merozoite and the host erythrocyte and design peptides that could interrupt these predicted interactions. We screened the P. falciparum and human proteomes for computationally predicted short linear motifs (SLiMs) in cytoplasmic portions of transmembrane proteins that could play roles in the invasion of the erythrocyte by the merozoite, an essential step in malarial pathogenesis. We tested thirteen peptides predicted to contain SLiMs, twelve of them palmitoylated to enhance membrane targeting, and found three that blocked parasite growth in culture by inhibiting the initiation of new infections in erythrocytes. Scrambled peptides for two of the most promising peptides suggested that their activity may be reflective of amino acid properties, in particular, positive charge. However, one peptide showed effects which were stronger than those of scrambled peptides. This was derived from human red blood cell glycophorin-B. We concluded that proteome-wide computational screening of the intracellular regions of both host and pathogen adhesion proteins provides potential lead peptides for the development of anti-malarial compounds.
Bioactive peptides in the juxtamembrane regions of proteins are involved in many signaling events. The juxtamembrane regions of cadherins were examined to identify bioactive regions. Several peptides spanning the cytoplasmic juxtamembrane regions of E- and N-cadherin were synthesized and assessed for the ability to influence TGFβ responses in epithelial cells at the gene expression and protein levels. Peptides from regions closer to the membrane appeared to be more potent inhibitors of TGFβ signaling, blocking Smad3 phosphorylation and thereby inhibiting nuclear translocation of phosphorylated Smad complexes and subsequent transcriptional activation of TGFβ signal-propagating genes. The peptides demonstrated a peptide-specific potential to inhibit other TGFβ superfamily members, such as BMP4.
BMP4; E-cadherin; Jagged; N-cadherin; Smad; TGFβ; adherens junctions; palmitic acid; peptide
Identifying effective therapeutic drug combinations that modulate complex
signaling pathways in platelets is central to the advancement of effective
anti-thrombotic therapies. However, there is no systems model of the platelet
that predicts responses to different inhibitor combinations. We developed an
approach which goes beyond current inhibitor-inhibitor combination screening to
efficiently consider other signaling aspects that may give insights into the
behaviour of the platelet as a system. We investigated combinations of platelet
inhibitors and activators. We evaluated three distinct strands of information,
namely: activator-inhibitor combination screens (testing a panel of inhibitors
against a panel of activators); inhibitor-inhibitor synergy screens; and
activator-activator synergy screens. We demonstrated how these analyses may be
efficiently performed, both experimentally and computationally, to identify
particular combinations of most interest. Robust tests of activator-activator
synergy and of inhibitor-inhibitor synergy required combinations to show
significant excesses over the double doses of each component. Modeling
identified multiple effects of an inhibitor of the P2Y12 ADP receptor, and
complementarity between inhibitor-inhibitor synergy effects and
activator-inhibitor combination effects. This approach accelerates the mapping
of combination effects of compounds to develop combinations that may be
therapeutically beneficial. We integrated the three information sources into a
unified model that predicted the benefits of a triple drug combination targeting
ADP, thromboxane and thrombin signaling.
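The double-dose criterion mentioned in the abstract above (a combination must exceed the double dose of each component) can be illustrated with a small calculation. This is a sketch under stated assumptions: the function name and the replicate values are hypothetical, responses are taken to be on a common scale where larger means a stronger effect, and the significance testing that the study required is omitted here.

```python
from statistics import mean

def excess_over_double_dose(combo, double_a, double_b):
    """Excess of the A+B combination over the stronger of the two double doses.

    Each argument is a list of replicate responses in the same units.
    A positive value is the pattern a synergy test of this kind looks for;
    a real analysis would also test whether the excess is significant.
    """
    return mean(combo) - max(mean(double_a), mean(double_b))

# Hypothetical replicate responses (arbitrary units), for illustration only.
combo = [0.80, 0.90, 0.85]      # A and B together, single dose each
double_a = [0.50, 0.55, 0.60]   # A alone at double dose
double_b = [0.40, 0.45, 0.35]   # B alone at double dose
print(excess_over_double_dose(combo, double_a, double_b) > 0)
```

Comparing against the double dose of each single agent, rather than the single dose, guards against calling a plain dose effect a synergy.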
Drugs are often used in combinations, but establishing the best combinations is a
considerable challenge for basic and clinical research. Anti-platelet therapies
reduce thrombosis and heart attacks by lowering the activation of platelet
cells. We wanted to find good drug combinations, but a full systems model of the
platelet is absent, so we had no good predictions of how particular combinations
might behave. Instead, we put together three sources of knowledge. The first
concerned what inhibitors act on what activators; the second concerned what
pairs of activators synergise together (having a bigger effect than expected);
and the third concerned what pairs of inhibitors synergise together. We
implemented an efficient experimental approach to collect this information from
experiments on platelets. We developed a statistical model that brought these
separate results together. This gave us insights into how platelet inhibitors
act. For example, an inhibitor of an ADP receptor showed multiple effects. We
also worked out from the model what further (triple) combinations of drugs may
be most efficient. We predicted, and then tested experimentally, the effects of
a triple drug combination. This simultaneously inhibited the platelet’s
responses to three stimulants that it encounters during coronary thrombosis,
namely ADP, thromboxane and thrombin.
Human milk is known to contain several proteases, but little is
known about whether these enzymes are active, which proteins they
cleave, and their relative contribution to milk protein digestion
in vivo. This study analyzed the mass spectrometry-identified protein
fragments found in pooled human milk by comparing their cleavage sites
with the enzyme specificity patterns of an array of enzymes. The results
indicate that several enzymes are actively taking part in the digestion
of human milk proteins within the mammary gland, including plasmin
and/or trypsin, elastase, cathepsin D, pepsin, chymotrypsin, a glutamyl
endopeptidase-like enzyme, and proline endopeptidase. Two proteins
were most affected by enzyme hydrolysis: β-casein and polymeric
immunoglobulin receptor. In contrast, other highly abundant milk proteins
such as α-lactalbumin and lactoferrin appear to have undergone
no proteolytic cleavage. A peptide sequence containing a known antimicrobial
peptide is released in breast milk by elastase and cathepsin D.
hydrolysate; human milk digestion; milk; nutrition; proteolytic enzymes; bioactive peptide
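The comparison of peptide cleavage sites with enzyme specificity patterns, described in the abstract above, can be sketched as follows. The P1 rules below are a textbook-level simplification (trypsin-like enzymes cutting after K or R, chymotrypsin-like enzymes after F, Y or W) and are an assumption of this sketch, not the specificity patterns used in the study. The function name and the example sequence are hypothetical.

```python
# Simplified P1 preferences (P1 = residue immediately N-terminal to the cut).
# These rules are an illustrative simplification, not the study's patterns.
P1_RULES = {"plasmin/trypsin-like": set("KR"), "chymotrypsin-like": set("FYW")}

def explain_cleavages(protein, peptide):
    """Map each terminus of `peptide` within `protein` to enzymes whose P1 rule fits."""
    start = protein.find(peptide)
    if start < 0:
        raise ValueError("peptide not found in protein")
    end = start + len(peptide)
    sites = {}
    if start > 0:
        # N-terminal cut: P1 is the residue just before the peptide.
        sites["N-term"] = protein[start - 1]
    if end < len(protein):
        # C-terminal cut: P1 is the last residue of the peptide.
        sites["C-term"] = protein[end - 1]
    return {
        terminus: [enz for enz, residues in P1_RULES.items() if p1 in residues]
        for terminus, p1 in sites.items()
    }

# Hypothetical sequence and fragment, for illustration only.
print(explain_cleavages("MAKSLEDVFTGNP", "SLEDVF"))
```

A terminus that coincides with the start or end of the parent protein is skipped, since no proteolytic cut is needed to produce it.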
Large portions of higher eukaryotic proteomes are intrinsically disordered, and abundant evidence suggests that these unstructured regions of proteins are rich in regulatory interaction interfaces. A major class of disordered interaction interfaces are the compact and degenerate modules known as short linear motifs (SLiMs). As a result of the difficulties associated with the experimental identification and validation of SLiMs, our understanding of these modules is limited, advocating the use of computational methods to focus experimental discovery. This article evaluates the use of evolutionary conservation as a discriminatory technique for motif discovery. A statistical framework is introduced to assess the significance of relatively conserved residues, quantifying the likelihood a residue will have a particular level of conservation given the conservation of the surrounding residues. The framework is expanded to assess the significance of groupings of conserved residues, a metric that forms the basis of SLiMPrints (short linear motif fingerprints), a de novo motif discovery tool. SLiMPrints identifies relatively overconstrained proximal groupings of residues within intrinsically disordered regions, indicative of putatively functional motifs. Finally, the human proteome is analysed to create a set of highly conserved putative motif instances, including a novel site on translation initiation factor eIF2A that may regulate translation through binding of eIF4E.
Epistasis (synergistic interaction) among SNPs governing gene expression is likely to arise within transcriptional networks. However, the power to detect it is limited by the large number of combinations to be tested and the modest sample sizes of most datasets. By limiting the interaction search space firstly to cis-trans and then cis-cis SNP pairs where both SNPs had an independent effect on the expression of the most variable transcripts in the liver and brain, we greatly reduced the size of the search space.
Within the cis-trans search space we discovered three transcripts with significant epistasis. Surprisingly, all interacting SNP pairs were located near each other on the chromosome (within 290 kb–2.16 Mb). Despite their proximity, the interacting SNPs were outside the range of linkage disequilibrium (LD), which was absent between the pairs (r² < 0.01). Accordingly, we redefined the search space to detect cis-cis interactions, where a cis-SNP was located within 10 Mb of the target transcript. The results show evidence for the epistatic regulation of 50 transcripts across the tissues studied. Three transcripts, namely HLA-G, PSORS1C1 and HLA-DRB5, share common regulatory SNPs in the pre-frontal cortex and their expression is significantly correlated. This pattern of epistasis is consistent with mediation via long-range chromatin structures rather than the binding of transcription factors in trans. Accordingly, some of the interactions map to regions of the genome known to physically interact in lymphoblastoid cell lines while others map to known promoter and enhancer elements. SNPs involved in interactions appear to be enriched for promoter markers.
In the context of gene expression and its regulation, our analysis indicates that the study of cis-cis or local epistatic interactions may have a more important role than interchromosomal interactions.
Electronic supplementary material
The online version of this article (doi:10.1186/s12864-015-1300-3) contains supplementary material, which is available to authorized users.
Little is known about the digestive process in infants. In particular, the chronological activity of enzymes across the course of digestion in the infant remains largely unknown. To create a temporal picture of how milk proteins are digested, enzyme activity was compared between intact human milk samples from three mothers and the gastric samples from each of their 4–12 day postpartum infants, 2 h after breast milk ingestion. The activities of 7 distinct enzymes are predicted in the infant stomach based on their observed cleavage pattern in peptidomics data. We found that the same patterns of cleavage were evident in both intact human milk and gastric milk samples, demonstrating that the enzyme activities that begin in milk persist in the infant stomach. However, the extent of enzyme activity is found to vary greatly between the intact milk and gastric samples. Overall, we observe that milk-specific proteins are cleaved at higher levels in the stomach compared to human milk. Notably, the enzymes we predict here only explain 78% of the cleavages uniquely observed in the gastric samples, highlighting that further investigation of the specific enzyme activities associated with digestion in infants is warranted.
milk enzymes; enzyme activity; digestive enzymes; infant digestion; proteolytic enzymes; human milk; indigenous enzymes
Statins effectively lower LDL cholesterol levels in large studies and the observed interindividual response variability may be partially explained by genetic variation. Here we perform a pharmacogenetic meta-analysis of genome-wide association studies (GWAS) in studies addressing the LDL cholesterol response to statins, including up to 18,596 statin-treated subjects. We validate the most promising signals in a further 22,318 statin recipients and identify two loci, SORT1/CELSR2/PSRC1 and SLCO1B1, not previously identified in GWAS. Moreover, we confirm the previously described associations with APOE and LPA. Our findings advance the understanding of the pharmacogenetic architecture of statin response.
Statins are effectively used to prevent and manage cardiovascular disease, but patient response to these drugs is highly variable. Here, the authors identify two new genes associated with the response of LDL cholesterol to statins and advance our understanding of the genetic basis of drug response.
Background and Purpose
Visit-to-visit variability in BP is associated with ischemic stroke. We sought to determine whether such variability has a genetic aetiology and whether genetic variants associated with BP variability are also associated with ischemic stroke.
A GWAS for loci influencing BP variability was undertaken in 3,802 individuals from the Anglo-Scandinavian Cardiac Outcome Trial (ASCOT) study where long-term visit-to-visit and within-visit BP measures were available. Since BP variability is strongly associated with ischemic stroke, we genotyped the sentinel SNP in an independent ischemic stroke population comprising 8,624 cases and 12,722 controls and in 3,900 additional (Scandinavian) participants from the ASCOT study in order to replicate our findings.
The ASCOT discovery GWAS identified a cluster of 17 correlated SNPs within the NLGN1 gene (3q26.31) associated with BP variability. The strongest association was with rs976683 (p=1.4×10⁻⁸). Conditional analysis on rs976683 provided no evidence of additional independent associations at the locus. Analysis of rs976683 in ischemic stroke patients found no association for overall stroke (OR 1.02; 95% CI 0.97-1.07; p=0.52) or its sub-types: CE (OR 1.07; 95% CI 0.97-1.16; p=0.17), LVD (OR 0.98; 95% CI 0.89-1.07; p=0.60) and SVD (OR 1.07; 95% CI 0.97-1.17; p=0.19). No evidence for association was found between rs976683 and BP variability in the additional (Scandinavian) ASCOT participants (p=0.18).
We identified a cluster of SNPs at the NLGN1 locus showing significant association with BP variability. Follow up analyses did not support an association with risk of ischemic stroke and its subtypes.
Blood pressure variability; stroke; GWAS; gene; polymorphism
Bioactive cyclic peptides derived from natural sources are well studied, particularly those derived from non-ribosomal synthetases in fungi or bacteria. Ribosomally synthesised bioactive disulphide-bonded loops represent a large, naturally enriched library of potential bioactive compounds, worthy of systematic investigation.
We examined the distribution of short cyclic loops on the surface of a large number of proteins, especially membrane or extracellular proteins. Available three-dimensional structures highlighted a number of disulphide-bonded loops responsible for the majority of the likely binding interactions in a variety of protein complexes, due to their location at protein-protein interfaces. We find that disulphide-bonded loops at protein-protein interfaces may, but do not necessarily, show biological activity independent of their parent protein. Examining the conservation of short disulphide bonded loops in proteins, we find a small but significant increase in conservation inside these loops compared to surrounding residues. We identify a subset of these loops that exhibit a high relative conservation, particularly among peptide hormones.
We conclude that short disulphide-bonded loops are found in a wide variety of biological interactions. They may retain biological activity outside their parent proteins. Such structurally independent peptides may be useful as biologically active templates for the development of novel modulators of protein-protein interactions.
Electronic supplementary material
The online version of this article (doi:10.1186/1471-2105-15-305) contains supplementary material, which is available to authorized users.
Cyclic peptide; Protein loop; Protein interface; Bioactive peptide; Ribosomal cyclic peptide
Elevated resting heart rate is associated with greater risk of cardiovascular disease and mortality. In a 2-stage meta-analysis of genome-wide association studies in up to 181,171 individuals, we identified 14 new loci associated with heart rate and confirmed associations with all 7 previously established loci. Experimental downregulation of gene expression in Drosophila melanogaster and Danio rerio identified 20 genes at 11 loci that are relevant for heart rate regulation and highlight a role for genes involved in signal transmission, embryonic cardiac development and the pathophysiology of dilated cardiomyopathy, congenital heart failure and/or sudden cardiac death. In addition, genetic susceptibility to increased heart rate is associated with altered cardiac conduction and reduced risk of sick sinus syndrome, and both heart rate–increasing and heart rate–decreasing variants associate with risk of atrial fibrillation. Our findings provide fresh insights into the mechanisms regulating heart rate and identify new therapeutic targets.
Disordered regions of proteins often bind to structured domains, mediating interactions within and between proteins. However, it is difficult to identify a priori the short disordered regions involved in binding. We set out to determine if docking such peptide regions to peptide-binding domains would assist in these predictions. We assembled a redundancy-reduced dataset of short linear motif (SLiM)-containing proteins from the ELM database. We selected 84 sequences that had an associated PDB structure showing the SLiM bound to a protein receptor, where the SLiM was found within a 50-residue region of the protein sequence that was predicted to be disordered. First, we investigated the Vina docking scores of overlapping tripeptides from the 50-residue SLiM-containing disordered regions of the protein sequence to the corresponding PDB domain. We found only weak discrimination of docking scores between peptides involved in binding and adjacent non-binding peptides in this context (AUC 0.58). Next, we trained a bidirectional recurrent neural network (BRNN) using as input the protein sequence, predicted secondary structure, Vina docking score and predicted disorder score. The results were very promising (AUC 0.72), showing that multiple sources of information can be combined to produce results that are clearly superior to any single source. We conclude that the Vina docking score alone has only modest power to define the location of a peptide within a larger protein region known to contain it. However, combining this information with other knowledge (using machine learning methods) clearly improves the identification of peptide binding regions within a protein sequence. This approach combining docking with machine learning is primarily a predictor of binding to peptide-binding sites, and is not intended as a predictor of specificity of binding to particular receptors.
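Two steps in the abstract above lend themselves to a short illustration: splitting a region into overlapping tripeptides, and summarising how well a score separates binding from non-binding peptides as an AUC. The sketch below makes several assumptions: the function names, the toy region, the labels and the score values are all hypothetical, no docking is performed, and the AUC is computed by direct pairwise comparison rather than by any particular package. Docking energies are negated before scoring on the assumption that more negative values indicate better predicted binding.

```python
def overlapping_peptides(region, k=3):
    """All overlapping k-mers of a sequence region (k=3 gives tripeptides)."""
    return [region[i:i + k] for i in range(len(region) - k + 1)]

def auc(scores, labels):
    """AUC as the probability that a positive outscores a negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical region, binding labels and docking energies, for illustration only.
peptides = overlapping_peptides("ACDEFG")
labels = [0, 1, 1, 0]                 # 1 = tripeptide overlaps the binding motif
energies = [-3.0, -5.0, -4.0, -4.5]   # made-up docking energies (lower = better)
scores = [-e for e in energies]       # negate so that higher means better binding
print(peptides, auc(scores, labels))
```

An AUC of 0.5 corresponds to no discrimination and 1.0 to perfect separation, which gives a scale for reading the 0.58 and 0.72 values reported above.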
YAP (Yes-associated protein) is a potent oncogene and a major effector of the mammalian Hippo tumor suppressor pathway. In this review, our emphasis is on the structural basis of how YAP recognizes its various cellular partners. In particular, we discuss the role of LATS kinase and AMOTL1 junction protein, two key cellular partners of YAP that bind to its WW domain, in mediating cytoplasmic localization of YAP and thereby playing a key role in the regulation of its transcriptional activity. Importantly, the crystal structure of an amino-terminal domain of YAP in complex with the carboxy-terminal domain of TEAD transcription factor was only recently solved at atomic resolution, while the structure of WW domain of YAP in complex with a peptide containing the PPxY motif has been available for more than a decade. We discuss how such structural information may be exploited for the rational development of novel anti-cancer therapeutics harboring greater efficacy coupled with low toxicity. We also embark on a brief discussion of how recent in silico studies led to identification of the cardiac glycoside digitoxin as a potential modulator of WW domain-ligand interactions. Conversely, dobutamine was identified in a screen of known drugs as a compound that promotes cytoplasmic localization of YAP, thereby resulting in growth suppressing activity. Finally, we discuss how a recent study on the dynamics of WW domain folding on a biologically critical time scale may provide a tool to generate repertoires of WW domain variants for regulation of the Hippo pathway toward desired, non-oncogenic outputs.
TEAD transcription factor; WW domain; PDZ domain; Nuclear localization; Digitoxin; Dobutamine
Computational protein short linear motif discovery can use protein interaction information to search for motifs among proteins which share a common interactor. Cytoscape provides a visual interface for protein networks but there is no streamlined way to rapidly visualize motifs in a network of proteins, or to integrate computational discovery with such visualizations.
We present SLiMScape, a Cytoscape plugin, which enables both de novo motif discovery and searches for instances of known motifs. Data is presented using Cytoscape’s visualization features, thus providing an intuitive interface for interpreting results. The distribution of discovered or user-defined motifs may be selectively displayed and the distribution of protein domains may be viewed simultaneously. To facilitate this, SLiMScape automatically retrieves domains for each protein.
SLiMScape provides a platform for performing short linear motif analyses of protein interaction networks by integrating motif discovery and search tools in a network visualization environment. This significantly aids in the discovery of novel short linear motifs and in visualizing the distribution of known motifs.
We carried out a genome-wide association study (GWAS) of LDL-c response to statin
using data from participants in the Collaborative Atorvastatin Diabetes Study (CARDS;
n = 1,156), the Anglo-Scandinavian Cardiac Outcomes Trial (ASCOT; n =
895), and the observational phase of ASCOT (n = 651), all of whom were
prescribed atorvastatin 10 mg. Following genome-wide imputation, we combined data
from the three studies in a meta-analysis. We found associations of LDL-c response to
atorvastatin that reached genome-wide significance at rs10455872 (P
= 6.13 × 10⁻⁹) within the LPA gene and
at two single nucleotide polymorphisms (SNPs) within the APOE region
(rs445925; P = 2.22 × 10⁻¹⁶ and
rs4420638; P = 1.01 × 10⁻¹¹) that are
proxies for the ε2 and ε4 variants, respectively, in APOE. The novel
association with the LPA SNP was replicated in the PROspective Study
of Pravastatin in the Elderly at Risk (PROSPER) trial (P =
0.009). Using CARDS data, we further showed that atorvastatin therapy did not alter
lipoprotein(a) [Lp(a)] and that Lp(a) levels accounted for all of the associations of
SNPs in the LPA gene and the apparent LDL-c response levels. However, statin therapy
had a similar effect in reducing cardiovascular disease (CVD) in patients in the top
quartile for serum Lp(a) levels (HR = 0.60) compared with those in the lower
three quartiles (HR = 0.66; P = 0.8 for interaction).
The data emphasize that high Lp(a) levels affect the measurement of LDL-c and the
clinical estimation of LDL-c response. Therefore, an apparently lower LDL-c response
to statin therapy may indicate a need for measurement of Lp(a). However, statin
therapy seems beneficial even in those with high Lp(a).
genetics; low density lipoprotein; LDL/metabolism; lipoprotein(a); statins
Intrinsically disordered regions in eukaryotic proteomes contain key signaling and regulatory modules and mediate interactions with many proteins. Many viral proteomes encode disordered proteins and modulate host factors through the use of short linear motifs (SLiMs) embedded within disordered regions. However, the degree of viral protein disorder across different viruses is not well understood, so we set out to establish the constraints acting on viruses, in terms of their use of disordered protein regions. We surveyed predicted disorder across 2,278 available viral genomes in 41 families, and correlated the extent of disorder with genome size and other factors. Protein disorder varies strikingly between viral families (from 2.9% to 23.1% of residues), and also within families. However, this substantial variation did not follow the established trend among their hosts, with increasing disorder seen across eubacteria, archaebacteria, protists, and multicellular eukaryotes. For example, among large mammalian viruses, poxviruses and herpesviruses showed markedly differing disorder (5.6% and 17.9%, respectively). Viral families with smaller genome sizes have more disorder within each of five main viral types (ssDNA, dsDNA, ssRNA+, dsRNA, retroviruses), except for negative single-stranded RNA viruses, where disorder increased with genome size. However, surveying over all viruses, which compares tiny and enormous viruses over a much bigger range of genome sizes, there is no strong association of genome size with protein disorder. We conclude that there is extensive variation in the disorder content of viral proteomes. While a proportion of this may relate to base composition, to extent of gene overlap, and to genome size within viral types, there remain important additional family and virus-specific effects.
Differing disorder strategies are likely to impact on how different viruses modulate host factors, and on how rapidly viruses can evolve novel instances of SLiMs subverting host functions, such as innate and acquired immunity.
The conventional wisdom is that certain classes of bioactive peptides have specific structural features that endow their particular functions. Accordingly, predictions of bioactivity have focused on particular subgroups, such as antimicrobial peptides. We hypothesized that bioactive peptides may share more general features, and assessed this by contrasting the predictive power of existing antimicrobial predictors as well as a novel general predictor, PeptideRanker, across different classes of peptides.
We observed that existing antimicrobial predictors had reasonable predictive power to identify peptides of certain other classes, i.e. toxin and venom peptides. We trained two general predictors of peptide bioactivity, one focused on short peptides (4–20 amino acids) and one focused on long peptides (>20 amino acids). These general predictors had performance that was typically as good as, or better than, that of specific predictors. We noted some striking differences in the features of short peptide and long peptide predictions; in particular, high-scoring short peptides favour phenylalanine. This is consistent with the hypothesis that short and long peptides have different functional constraints, perhaps reflecting the difficulty for typical short peptides in supporting independent tertiary structure.
We conclude that there are general shared features of bioactive peptides across different functional classes, indicating that computational prediction may accelerate the discovery of novel bioactive peptides and aid in the improved design of existing peptides, across many functional classes. An implementation of the predictive method, PeptideRanker, may be used to identify among a set of peptides those that may be more likely to be bioactive.
Intracellular juxtamembrane regions of transmembrane proteins play pivotal roles in cell signalling, mediated by protein-protein interactions. Disordered protein regions, and short conserved motifs within them, are emerging as key determinants of many such interactions. Here, we investigated whether disorder and conserved motifs are enriched in the juxtamembrane area of human single-pass transmembrane proteins. Conserved motifs were defined as short disordered regions that were much more conserved than the adjacent disordered residues. Human single-pass proteins had higher mean disorder in their cytoplasmic segments than their extracellular parts. Some, but not all, of this effect reflected the shorter length of the cytoplasmic tail. A peak of cytoplasmic disorder was seen at around 30 residues from the membrane. We noted a significant increase in the incidence of conserved motifs within the disordered regions at the same location, even after correcting for the extent of disorder. We conclude that elevated disorder within the cytoplasmic tail of many transmembrane proteins is likely to be associated with enrichment for signalling interactions mediated by conserved short motifs.
Short linear protein motifs are attracting increasing attention as functionally independent sites, typically 3–10 amino acids in length that are enriched in disordered regions of proteins. Multiple methods have recently been proposed to discover over-represented motifs within a set of proteins based on simple regular expressions. Here, we extend these approaches to profile-based methods, which provide a richer motif representation.
The profile motif discovery method MEME performed relatively poorly for motifs in disordered regions of proteins. However, when we applied evolutionary weighting to account for redundancy amongst homologous proteins, and masked out poorly conserved regions of disordered proteins, the performance of MEME was equivalent to that of regular expression methods. Nevertheless, the two approaches returned different subsets within both a benchmark dataset and a more realistic discovery dataset.
Profile-based motif discovery methods complement regular expression based methods. Whilst profile-based methods are computationally more intensive, they are likely to discover motifs currently overlooked by regular expression methods.
Protein-protein interactions; Motif discovery; Peptide binding; Short linear motifs; Mini-motifs; SLiMs
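The regular-expression representation and the masking step mentioned in the abstract above can be illustrated with a short sketch. It is not MEME or any published discovery tool: the function name, the toy sequences and the example pattern are all hypothetical, and it only counts which proteins contain a given motif once masked positions are hidden.

```python
import re


def motif_support(pattern, proteins, masks=None):
    """List the proteins that contain at least one match to a regex motif.

    `proteins` maps a protein name to its sequence. `masks`, if given, maps a
    name to a set of residue indices to hide (for example poorly conserved
    positions). Hidden residues become 'X', so they cannot satisfy a defined
    position, although they can still fill a wildcard.
    """
    regex = re.compile(pattern)
    supported = []
    for name, seq in proteins.items():
        hidden = masks.get(name, set()) if masks else set()
        masked = "".join("X" if i in hidden else aa
                         for i, aa in enumerate(seq))
        if regex.search(masked):
            supported.append(name)
    return supported


# Toy sequences and pattern, invented purely for illustration.
toy_proteins = {
    "protA": "MSTPPLPPRKAA",
    "protB": "GGPALPPKPSS",
    "protC": "MKKLLVVAAGG",
}
print(motif_support("P.LP.[KR]", toy_proteins))
print(motif_support("P.LP.[KR]", toy_proteins, masks={"protA": {5}}))
```

A discovery method would additionally weight each supporting protein by its evolutionary relatedness to the others, which this sketch does not attempt.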
Autism spectrum disorder (ASD) is a highly heritable disorder of complex and heterogeneous aetiology. It is primarily characterized by altered cognitive ability including impaired language and communication skills and fundamental deficits in social reciprocity. Despite some notable successes in neuropsychiatric genetics, overall, the high heritability of ASD (~90%) remains poorly explained by common genetic risk variants. However, recent studies suggest that rare genomic variation, in particular copy number variation, may account for a significant proportion of the genetic basis of ASD. We present a large-scale analysis to identify candidate genes which may contain low-frequency recessive variation contributing to ASD while taking into account the potential contribution of population differences to the genetic heterogeneity of ASD. Our strategy, homozygous haplotype (HH) mapping, aims to detect homozygous segments of identical haplotype structure that are shared at a higher frequency amongst ASD patients than amongst parental controls. The analysis was performed on 1,402 Autism Genome Project trios genotyped for 1 million single nucleotide polymorphisms (SNPs). We identified 25 known and 1,218 novel ASD candidate genes in the discovery analysis including CADM2, ABHD14A, CHRFAM7A, GRIK2, GRM3, EPHA3, FGF10, KCND2, PDZK1, IMMP2L and FOXP2. Furthermore, 10 of the previously reported ASD genes and 300 of the novel candidates identified in the discovery analysis were replicated in an independent sample of 1,182 trios. Our results demonstrate that regions of HH are significantly enriched for previously reported ASD candidate genes and the observed association is independent of gene size (odds ratio 2.10). Our findings highlight the applicability of HH mapping in complex disorders such as ASD and offer an alternative approach to the analysis of genome-wide association data.
Electronic supplementary material
The online version of this article (doi:10.1007/s00439-011-1094-6) contains supplementary material, which is available to authorized users.
Intrinsically disordered regions are enriched in short interaction motifs that play a critical role in many protein-protein interactions. Since new short interaction motifs may easily evolve, they have the potential to rapidly change protein interactions and cellular signaling. In this work we examined the dynamics of gain and loss of intrinsically disordered regions in duplicated proteins to test whether changes after genome duplication can create functional divergence. For this purpose we used Saccharomyces cerevisiae and the outgroup species Lachancea kluyveri.
We find that genes duplicated as part of a genome duplication (ohnologs) are significantly more intrinsically disordered than singletons (p<2.2e-16, Wilcoxon), reflecting a preference for retaining intrinsically disordered proteins in duplicate. In addition, there have been marked changes in the extent of intrinsic disorder following duplication. A large number of duplicated genes have more intrinsic disorder than their L. kluyveri ortholog (29% for duplicates versus 25% for singletons) and an even greater number have less intrinsic disorder than the L. kluyveri ortholog (37% for duplicates versus 25% for singletons). Finally, we show that the number of physical interactions is significantly greater in the more intrinsically disordered ohnolog of a pair (p = 0.003, Wilcoxon).
This work shows that intrinsic disorder gain and loss in a protein is a mechanism by which a genome can also diverge and innovate. The higher number of interactors for proteins that have gained intrinsic disorder compared with their duplicates may reflect the acquisition of new interaction partners or new functional roles.
Gene and protein interactions are commonly represented as networks, with the genes or proteins forming the nodes and the relationships between them forming the edges. Motifs, or small local configurations of edges and nodes that arise repeatedly, can be used to simplify the interpretation of networks.
We examined triplet motifs in a network of quantitative epistatic genetic relationships, and found a non-random distribution of particular motif classes. Individual motif classes were found to be associated with different functional properties, suggestive of an underlying biological significance. These associations were apparent not only for motif classes, but for individual positions within the motifs. As expected, NNN (all negative) motifs were strongly associated with previously reported genetic (i.e. synthetic lethal) interactions, while PPP (all positive) motifs were associated with protein complexes. The two other motif classes (NNP: a positive interaction spanned by two negative interactions, and NPP: a negative spanned by two positives) showed very distinct functional associations, with physical interactions dominating for the former but alternative enrichments, typical of biochemical pathways, dominating for the latter.
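The four triplet classes described above can be enumerated with a small sketch. It assumes the network is supplied as a mapping from gene pairs to non-zero quantitative epistasis scores, with only significant interactions included. The function name and the toy scores are hypothetical.

```python
from itertools import combinations


def classify_triplets(edges):
    """Group fully connected gene triplets by the signs of their three edges.

    `edges` maps a frozenset of two gene names to a non-zero epistasis score;
    negative scores are labelled 'N' and positive scores 'P'.
    """
    genes = sorted({g for pair in edges for g in pair})
    classes = {"NNN": [], "NNP": [], "NPP": [], "PPP": []}
    for trio in combinations(genes, 3):
        pairs = [frozenset(p) for p in combinations(trio, 2)]
        if not all(p in edges for p in pairs):
            continue  # skip triplets that are not fully connected
        # Sorting the three labels puts every triplet in a canonical class.
        signs = "".join(sorted("N" if edges[p] < 0 else "P" for p in pairs))
        classes[signs].append(trio)
    return classes


# Toy scores invented for illustration only.
toy_edges = {
    frozenset(("a", "b")): -2.0,
    frozenset(("b", "c")): -1.5,
    frozenset(("a", "c")): 0.8,
    frozenset(("a", "d")): 1.2,
    frozenset(("c", "d")): 0.9,
}
print(classify_triplets(toy_edges))
```

In this toy network, genes a, b and c form an NNP triplet (one positive edge spanned by two negatives) and genes a, c and d form a PPP triplet. Enumerating all triples is cubic in the number of genes, so a genome-scale network would need a neighbour-based search instead.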
We present a model showing how NNP motifs can be used to recognize supportive relationships between protein complexes, while NPP motifs often identify opposing or regulatory behaviour between a gene and an associated pathway. The ability to use motifs to point toward underlying biological organizational themes is likely to be increasingly important as more extensive epistasis mapping projects in higher organisms begin.
Milk proteins must pass through a variety of conditions of radically varying pH, which are not identical across mammalian digestive systems. We wished to investigate whether shifts in these requirements have resulted in marked changes in the isoelectric point and charge of milk proteins during evolution.
We investigated nine major milk proteins in 13 mammals. In comparison with a group of orthologous non-milk proteins, we found that three proteins (κ-casein, lactadherin, and muc1) have undergone the highest change in isoelectric point during evolution. The pattern of non-synonymous substitutions indicates that selection has played a role in the isoelectric point shift, since residues that show significant evidence of positive selection are much more likely to be charged (p = 0.03 for κ-casein; p < 10⁻⁸ for muc1). However, this selection does not appear to be solely due to adaptation to the diversity of mammalian digestive systems, since striking changes are seen among species that resemble each other in terms of their digestion.
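A theoretical isoelectric point can be estimated from sequence alone, and the sketch below shows one common way of doing so. It is not necessarily the procedure used in the study: it sums Henderson-Hasselbalch charges over the ionisable groups and bisects on pH until the net charge is zero. The pKa values are approximate textbook figures adopted here as an assumption (published scales differ), and the function names are hypothetical.

```python
# Approximate pKa values, assumed for illustration; published scales differ.
POSITIVE_PKA = {"K": 10.5, "R": 12.5, "H": 6.0, "N_term": 9.0}
NEGATIVE_PKA = {"D": 3.9, "E": 4.1, "C": 8.3, "Y": 10.1, "C_term": 2.0}


def net_charge(seq, ph):
    """Net charge of a peptide at a given pH (Henderson-Hasselbalch)."""
    # Terminal groups: protonated amine is +1, deprotonated carboxyl is -1.
    charge = 1.0 / (1.0 + 10 ** (ph - POSITIVE_PKA["N_term"]))
    charge -= 1.0 / (1.0 + 10 ** (NEGATIVE_PKA["C_term"] - ph))
    for aa in seq:
        if aa in POSITIVE_PKA:
            charge += 1.0 / (1.0 + 10 ** (ph - POSITIVE_PKA[aa]))
        elif aa in NEGATIVE_PKA:
            charge -= 1.0 / (1.0 + 10 ** (NEGATIVE_PKA[aa] - ph))
    return charge


def isoelectric_point(seq, tol=1e-4):
    """Bisect on pH in [0, 14] for the point where the net charge is zero."""
    low, high = 0.0, 14.0
    while high - low > tol:
        mid = (low + high) / 2.0
        if net_charge(seq, mid) > 0:
            low = mid   # still positively charged: pI lies at higher pH
        else:
            high = mid
    return (low + high) / 2.0
```

Because net charge falls monotonically as pH rises, bisection is sufficient. Replacing a neutral residue with a charged one shifts the estimate, which is why an excess of charged residues among positively selected sites bears on the isoelectric point.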
The changes in charge are most likely due to changes in other protein functions, rather than an adaptation to the different mammalian digestive systems. These functions may include differences between mammals in the release of bioactive peptides in the gut; such peptides are known to be a major contributing factor in the functional and nutritional value of mammalian milk. This raises the question of whether bovine milk is optimal, in terms of particular protein functions, for human nutrition and possibly disease resistance.
This article was reviewed by Fyodor Kondrashov, David Liberles (nominated by David Ardell), and Christophe Lefevre (nominated by Mark Ragan).