It has long been known that methylated cytosines deaminate at higher rates than unmodified cytosines and constitute mutational hotspots in mammalian genomes. The repertoire of naturally occurring cytosine modifications, however, extends beyond 5-methylcytosine to include its oxidation derivatives, notably 5-hydroxymethylcytosine. The effects of these modifications on sequence evolution are unknown. Here, we combine base-resolution maps of methyl- and hydroxymethylcytosine in human and mouse with population genomic, divergence and somatic mutation data to show that hydroxymethylated and methylated cytosines exhibit distinct patterns of variation and evolution. Surprisingly, hydroxymethylated sites are consistently associated with elevated C to G transversion rates at the level of segregating polymorphisms, fixed substitutions, and somatic mutations in tumors. Controlling for multiple potential confounders, we find derived C to G SNPs to be 1.43-fold (1.22-fold) more common at hydroxymethylated sites compared to methylated sites in human (mouse). Increased C to G rates are evident across diverse functional and sequence contexts and, in cancer genomes, correlate with the expression of Tet enzymes and specific components of the mismatch repair pathway (MSH2, MSH6, and MBD4). Based on these and other observations we suggest that hydroxymethylation is associated with a distinct mutational burden and that the mismatch repair pathway is implicated in causing elevated transversion rates at hydroxymethylated cytosines.
Most cytosines that occur in a CpG context in mammalian genomes are methylated. Methylation has important functional consequences in the cell but also affects genome evolution. Notably, methylated cytosines are prone to deaminate and constitute mutational hotspots in mammalian genomes. Recently, a series of other modifications, derived from the oxidation of methylated cytosines, was shown to exist in various mammalian cell types including embryonic stem cells. The most abundant of these modifications is 5-hydroxymethylcytosine. In this work, we ask whether methylated and hydroxymethylated cytosines are subject to the same mutational biases or lead to distinct patterns of genome evolution. To do so, we examine differences between individuals, between species, and between normal and cancer tissues alongside high-resolution maps of DNA methylation and hydroxymethylation in the human and mouse genomes. Unexpectedly, we find that hydroxymethylated cytosines are associated with more cytosine to guanine changes in both human and mouse populations, in closely related species, and in the context of somatic evolution in tumors. Based on multiple lines of evidence, we suggest that the different patterns of sequence evolution at methylated and hydroxymethylated sites are owing to differences in how these sites are handled by the DNA repair machinery.
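Fold differences such as the 1.43-fold excess of derived C to G SNPs at hydroxymethylated sites are ratios of per-site event rates between two site classes. The abstract does not say how confounders were controlled, so the sketch below is an assumption: it stratifies by a single hypothetical variable (for example, a GC-content bin) and uses the Mantel–Haenszel rate ratio, a standard epidemiological estimator that is not necessarily the method used in this study. All counts are invented for illustration.

```python
from typing import List, Tuple

# One stratum = (events_a, sites_a, events_b, sites_b).
# Class A vs class B could be hydroxymethylated vs methylated CpGs within one
# stratum of a hypothetical confounder (e.g. a local GC-content bin).
Stratum = Tuple[int, int, int, int]


def crude_rate_ratio(strata: List[Stratum]) -> float:
    """Pooled per-site event rate in class A divided by that in class B."""
    events_a = sum(s[0] for s in strata)
    sites_a = sum(s[1] for s in strata)
    events_b = sum(s[2] for s in strata)
    sites_b = sum(s[3] for s in strata)
    return (events_a / sites_a) / (events_b / sites_b)


def mantel_haenszel_rate_ratio(strata: List[Stratum]) -> float:
    """Mantel-Haenszel rate ratio: a weighted combination of per-stratum
    ratios that removes confounding by the stratification variable."""
    numerator = 0.0
    denominator = 0.0
    for events_a, sites_a, events_b, sites_b in strata:
        total_sites = sites_a + sites_b
        numerator += events_a * sites_b / total_sites
        denominator += events_b * sites_a / total_sites
    if denominator == 0:
        raise ValueError("no class-B events; the ratio is undefined")
    return numerator / denominator


# Invented counts: within each stratum class A has exactly twice the per-site
# rate of class B, but most class-A sites sit in the lower-rate stratum.
toy = [(10, 100, 5, 100), (40, 1000, 2, 100)]
print(round(crude_rate_ratio(toy), 3))            # 1.299
print(round(mantel_haenszel_rate_ratio(toy), 3))  # 2.0
```

The pooled ratio understates the within-stratum twofold difference because class A is concentrated in the stratum with the lower baseline rate; the stratified estimator recovers it.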
The genetic code is redundant, meaning that most amino acids can be encoded by more than one codon. Highly expressed genes tend to use optimal codons to increase the accuracy and speed of translation. Thus, codon usage biases provide a signature of the relative expression levels of genes, which can, uniquely, be quantified across the domains of life.
Here we describe a general statistical framework to exploit this phenomenon and to systematically associate genes with environments and phenotypic traits through changes in codon adaptation. By inferring evolutionary signatures of translation efficiency in 911 bacterial and archaeal genomes while controlling for confounding effects of phylogeny and inter-correlated phenotypes, we linked 187 gene families to 24 diverse phenotypic traits. A series of experiments in Escherichia coli revealed that 13 of 15, 19 of 23, and 3 of 6 gene families with changes in codon adaptation in aerotolerant, thermophilic, or halophilic microbes, respectively, confer specific resistance to hydrogen peroxide, heat, and high salinity. Further, we demonstrate experimentally that changes in codon optimality alone are sufficient to enhance stress resistance. Finally, we present evidence that multiple genes with altered codon optimality in aerobes confer oxidative stress resistance by controlling the levels of iron and NAD(P)H.
Taken together, these results provide experimental evidence for a widespread connection between changes in translation efficiency and phenotypic adaptation. As the number of sequenced genomes increases, this novel genomic context method for linking genes to phenotypes based on sequence alone will become increasingly useful.
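Codon adaptation can be quantified in several ways, and the abstract does not specify which measure underlies the framework. A minimal sketch, assuming the codon adaptation index (CAI) of Sharp and Li as a stand-in: each codon's relative adaptiveness w is its count divided by the count of the most used synonymous codon in a reference set presumed to be highly expressed, and a gene's CAI is the geometric mean of w over its codons. The reference set and toy sequences are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Iterable, List

# Standard genetic code (NCBI translation table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODE = dict(zip(CODONS, AMINO_ACIDS))


def split_codons(seq: str) -> List[str]:
    """Split an in-frame coding sequence into codons (trailing bases dropped)."""
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]


def relative_adaptiveness(reference_genes: Iterable[str]) -> Dict[str, float]:
    """w(codon) = count / count of the most used synonymous codon."""
    counts = Counter()
    for gene in reference_genes:
        counts.update(split_codons(gene))
    weights = {}
    for aa in set(CODE.values()):
        if aa in "*MW":  # stops and single-codon amino acids offer no choice
            continue
        synonyms = [codon for codon, a in CODE.items() if a == aa]
        top = max(counts[c] for c in synonyms)
        if top == 0:
            continue  # amino acid absent from the reference set
        for codon in synonyms:
            # Unobserved codons get a count of 0.5 so that log(w) stays finite.
            weights[codon] = max(counts[codon], 0.5) / top
    return weights


def codon_adaptation_index(gene: str, weights: Dict[str, float]) -> float:
    """Geometric mean of w over the gene's codons that have a defined weight."""
    logs = [math.log(weights[c]) for c in split_codons(gene) if c in weights]
    if not logs:
        raise ValueError("gene has no codons with defined weights")
    return math.exp(sum(logs) / len(logs))


w = relative_adaptiveness(["GCTGCTGCTGCC"])  # toy "highly expressed" reference
print(round(codon_adaptation_index("GCTGCT", w), 3))  # 1.0
print(round(codon_adaptation_index("GCTGCC", w), 3))  # 0.577
```

Comparing such scores for a gene family across genomes, and relating the differences to a trait while accounting for phylogeny, is the kind of analysis the abstract describes; the phylogenetic correction is not shown here.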
Nucleosomes, the basic repeat units of eukaryotic chromatin, have been suggested to influence the evolution of eukaryotic genomes, both by altering the propensity of DNA to mutate and by selection acting to maintain or exclude nucleosomes in particular locations. Contrary to the popular idea that nucleosomes are unique to eukaryotes, histone proteins have also been discovered in some archaeal genomes. Archaeal nucleosomes, however, are quite unlike their eukaryotic counterparts in many respects, including their assembly into tetramers (rather than octamers) from histone proteins that lack N- and C-terminal tails. Here, we show that despite these fundamental differences the association between nucleosome footprints and sequence evolution is strikingly conserved between humans and the model archaeon Haloferax volcanii. In light of this finding we examine whether selection or mutation can explain concordant substitution patterns in the two domains of life. Unexpectedly, we find that neither the mutation nor the selection model is sufficient to explain the observed association between nucleosomes and sequence divergence. Instead, we demonstrate that nucleosome-associated substitution patterns are more consistent with a third model where sequence divergence results in frequent repositioning of nucleosomes during evolution. Indeed, we show that nucleosome repositioning is both necessary and largely sufficient to explain the association between current nucleosome positions and biased substitution patterns. This finding highlights the importance of considering the direction of causality between genetic and epigenetic change.
Genome sequences as well as epigenetic states, such as DNA methylation or nucleosome binding patterns, change during evolution. But what is the causal relationship between the two? We already know that nucleotide variation within and between species is distributed unevenly around nucleosome footprints, but does this mean that sequence evolution follows a biased course because the presence of nucleosomes affects mutation and DNA repair dynamics? Or is it, in fact, the other way around, i.e. changes happen at the DNA level and prompt shifts in nucleosome positioning? To investigate the direction of causality in genetic versus epigenetic evolution, we analyze substitution patterns in eukaryotes as well as the archaeon Haloferax volcanii in the context of genome-wide nucleosome binding maps. We demonstrate that the relationship between nucleosome positions and between-species divergence patterns, strikingly similar in eukaryotes and archaea, can be explained in large part by nucleosomes shifting positions in response to substitution, although both mutation and selection biases might still exist. Our results illustrate that it is important to consider the direction of causality between epigenetic and genetic change when analyzing patterns of sequence divergence and using sequence conservation to infer selection on epigenetic states.
Harmful epistatic (genetic) interactions not only occur between mutations, but also when genes change in expression. Gene expression dynamics in yeast suggests that this ‘epigenetic’ epistasis constrains evolution, with the tight regulation of network hubs promoting a robust, ‘canalized’ phenotype.
Yeast genes with many negative genetic interaction partners tend to have expression that is stable between cells, across conditions, and through evolution. This low expression variation is linked to the use of alternative promoter architectures. The stable expression of genetic interaction network hubs suggests that epigenetic epistasis confers a constraint on evolution.
Reduced activity of two genes in combination often has a more detrimental effect than expected. Such epistatic interactions not only occur when genes are mutated but also due to variation in gene expression, including among isogenic individuals in a controlled environment. We hypothesized that these ‘epigenetic’ epistatic interactions could place important constraints on the evolution of gene expression. Consistent with this, we show here that yeast genes with many epistatic interaction partners typically show low expression variation among isogenic individuals and low variation across different conditions. In addition, their expression tends to remain stable in response to the accumulation of mutations and only diverges slowly between strains and species. Yeast promoter architectures, the retention of gene duplicates, and the divergence of expression between humans and chimps are also consistent with selective pressure to reduce the likelihood of harmful epigenetic epistatic interactions. Based on these and previous analyses, we propose that the tight regulation of epistatic interaction network hubs makes an important contribution to the maintenance of a robust, ‘canalized’ phenotype. We further propose that epigenetic epistatic interactions may contribute substantially to fitness defects when single genes are deleted.
epigenetics; epistasis; evolution; gene expression; genetic interaction
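Calling a combined effect "more detrimental than expected" requires a null expectation. A common choice, assumed here rather than taken from the study, is the multiplicative model: with fitness normalized so that the unperturbed strain equals 1, the expected double-perturbation fitness is the product of the single-perturbation fitnesses, and the interaction score is the deviation from that product. The sketch also counts each gene's negative interaction partners, the quantity used above to identify network hubs; the cutoff and toy fitness values are hypothetical.

```python
from collections import Counter
from typing import Dict, Tuple


def epistasis_score(w_a: float, w_b: float, w_ab: float) -> float:
    """Deviation of double-perturbation fitness from the multiplicative
    expectation w_a * w_b; negative values mean 'worse than expected'."""
    return w_ab - w_a * w_b


def negative_interaction_degree(
    pairs: Dict[Tuple[str, str], Tuple[float, float, float]],
    threshold: float = -0.1,  # hypothetical cutoff for a negative interaction
) -> Counter:
    """Number of negative interaction partners per gene."""
    degree = Counter()
    for (gene_a, gene_b), (w_a, w_b, w_ab) in pairs.items():
        if epistasis_score(w_a, w_b, w_ab) < threshold:
            degree[gene_a] += 1
            degree[gene_b] += 1
    return degree


# Toy relative fitness values (w_a, w_b, w_ab) for three gene pairs.
toy_pairs = {
    ("A", "B"): (0.9, 0.8, 0.5),  # expected 0.72, observed 0.5 -> negative
    ("A", "C"): (0.9, 1.0, 0.9),  # matches the expectation -> no interaction
    ("B", "C"): (0.8, 1.0, 0.6),  # expected 0.8, observed 0.6 -> negative
}
print(negative_interaction_degree(toy_pairs))
```

Gene B, with two negative partners, would be the "hub" in this toy network.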
The binding of proteins can shield DNA from mutagenic processes but also interfere with efficient repair. How the presence of DNA-binding proteins shapes intra-genomic differences in mutability and, ultimately, sequence variation in natural populations, however, remains poorly understood. In this study, we examine sequence evolution in Escherichia coli in relation to the binding of four abundant nucleoid-associated proteins: Fis, H-NS, IhfA, and IhfB. We find that, for a subset of mutations, protein occupancy is associated with both increased and decreased mutability in the underlying sequence depending on when the protein is bound during the bacterial growth cycle. On average, protein-bound DNA exhibits reduced mutability compared to protein-free DNA. However, this net protective effect is weak and can be abolished or even reversed during stages of colony growth where binding coincides – and hence likely interferes with – DNA repair activity. We suggest that the four nucleoid-associated proteins analyzed here have played a minor but significant role in patterning extant sequence variation in E. coli.
Mutations can be more or less likely to occur depending on whether DNA is naked or bound by proteins. On the one hand, DNA-binding proteins can shield the DNA from certain mutagenic processes. On the other hand, the very same proteins can interfere with efficient DNA repair. In this study, we reconstruct the history of mutations across 54 E. coli genomes and ask whether mutation risk is higher or lower in regions occupied by proteins that help organize bacterial DNA into chromatin. Intriguingly, we find that the effect of binding depends on its timing. When we consider genomic regions bound during stationary phase, we observe that binding is associated with lower mutation risk for some mutation classes compared to naked DNA, albeit weakly. However, when binding occurs during exponential phase, bound regions actually experience more mutations on average. We argue that this is because, during exponential phase, the major effect of binding is that it interferes with efficient DNA repair, whereas in stationary phase – when many repair pathways are inactive – the protective effect of binding dominates. Our results suggest that the four DNA-binding proteins considered here have a small but significant growth phase-specific effect on mutation dynamics in E. coli.
More than 50% of human genes initiate transcription from CpG dinucleotide-rich regions referred to as CpG islands. These genes show differences in their patterns of transcription initiation, and have been reported to have higher levels of some activation-associated chromatin modifications.
Here we report that genes with CpG island promoters have a characteristic transcription-associated chromatin organization. This signature includes high levels of the transcription elongation-associated histone modifications H4K20me1, H2BK5me1 and H3K79me1/2/3 in the 5' end of the gene, depletion of the activation marks H2AK5ac, H3K14ac and H3K23ac immediately downstream of the transcription start site (TSS), and characteristic epigenetic asymmetries around the TSS. The chromosome organization factor CTCF may be bound upstream of RNA polymerase in most active CpG island promoters, and an unstable nucleosome at the TSS may be specifically marked by H4K20me3, the first example of such a modification. H3K36 monomethylation is only detected as enriched in the bodies of active genes that have CpG island promoters. Finally, as expression levels increase, peak modification levels of the histone methylations H3K9me1, H3K4me1, H3K4me2 and H3K27me1 shift further away from the TSS into the gene body.
These results suggest that active genes with CpG island promoters have a distinct step-like series of modified nucleosomes after the TSS. The identity, positioning, shape and relative ordering of transcription-associated histone modifications differ between genes with and without CpG island promoters. This supports a model where chromatin organization reflects not only transcription activity but also the type of promoter in which transcription initiates.
A central challenge in genetics is to understand when and why mutations alter the phenotype of an organism. The consequences of gene inhibition have been systematically studied and can be predicted reasonably well across a genome. However, many sequence variants important for disease and evolution may alter gene regulation rather than gene function. The consequences of altering a regulatory interaction (or “edge”) rather than a gene (or “node”) in a network have not been as extensively studied. Here we use an integrative analysis and evolutionary conservation to identify features that predict when the loss of a regulatory interaction is detrimental in the extensively mapped transcription network of budding yeast. Properties such as the strength of an interaction, location and context in a promoter, regulator and target gene importance, and the potential for compensation (redundancy) are each associated to some extent with interaction importance. Combined, however, these features predict quite well whether the loss of a regulatory interaction is detrimental across many promoters and for many different transcription factors. Thus, despite the potential for regulatory diversity, common principles can be used to understand and predict when changes in regulation are most harmful to an organism.
The genomes of individuals differ in sequence at thousands of base pairs. Some of these polymorphisms affect the sequence of proteins, but many are likely to alter how genes are regulated. When are changes in gene regulation detrimental to an organism? We have used an integrative analysis of transcription factor binding site conservation in budding yeast to address the extent to which different features predict when potential changes in gene regulation are detrimental. We found that, despite the diversity of transcription factors and regulatory regions in a genome, a few simple properties can be used to predict and understand when changes in regulation are most harmful.
Chromatin in sperm is different from that in other cells, with most of the genome packaged by protamines not nucleosomes. Nucleosomes are, however, retained at some genomic sites, where they have the potential to transmit paternal epigenetic information. It is not understood how this retention is specified. Here we show that base composition is the major determinant of nucleosome retention in human sperm, predicting retention very well in both genic and non-genic regions of the genome. The retention of nucleosomes at GC-rich sequences with high intrinsic nucleosome affinity accounts for the previously reported retention at transcription start sites and at genes that regulate development. It also means that nucleosomes are retained at the start sites of most housekeeping genes. We also report a striking link between the retention of nucleosomes in sperm and the establishment of DNA methylation-free regions in the early embryo. Taken together, this suggests that paternal nucleosome transmission may facilitate robust gene regulation in the early embryo. We propose that chromatin organization in the male germline, rather than in somatic cells, is the major functional consequence of fine-scale base composition variation in the human genome. The selective pressure driving base composition evolution in mammals could, therefore, be the need to transmit paternal epigenetic information to the zygote.
In most cells, DNA is packaged by protein complexes called nucleosomes. In sperm, however, nucleosomes are only retained at a small fraction of the genome, particularly at the start sites of genes. In this work, we show that the sites at which nucleosomes are retained in sperm are specified by variation in the base composition of the human genome. At a fine scale, the human genome varies extensively in the content of GC versus AT base pairs, and we find that in both genic and non-genic regions this predicts very well where nucleosomes are retained in mature sperm. These regions include transcription start sites, especially for genes that are expressed in all cells and for genes that regulate development. We also report that regions that retain nucleosomes in sperm are likely to be protected from DNA methylation in the early embryo, suggesting a further connection between the presence of nucleosomes on the paternal genome and the establishment of gene regulation in the embryo. Based on these results, we propose that an important selective pressure on base composition evolution in mammalian genomes may be the requirement to organize chromatin in sperm in a way that facilitates gene regulation in the early embryo.
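The claim that fine-scale base composition "predicts very well" where nucleosomes are retained can be made concrete by scoring genomic windows by GC content and measuring how well that score separates windows that retain nucleosomes from those that do not. A minimal sketch, assuming a hypothetical window size, step, and set of retention labels; ROC AUC (computed here through the Mann–Whitney statistic) is one common way to quantify predictive performance and is not necessarily the metric used in the study.

```python
from typing import List, Optional, Sequence


def gc_fraction_windows(seq: str, window: int, step: int) -> List[Optional[float]]:
    """GC fraction of each window, computed over unambiguous A/C/G/T bases.
    Windows with no unambiguous bases yield None."""
    seq = seq.upper()
    fractions: List[Optional[float]] = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        acgt = sum(base in "ACGT" for base in chunk)
        gc = sum(base in "GC" for base in chunk)
        fractions.append(gc / acgt if acgt else None)
    return fractions


def auc(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """ROC AUC via the Mann-Whitney statistic: the probability that a random
    positive (retained) window outscores a random negative one; ties count half."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos_scores) * len(neg_scores))


print(gc_fraction_windows("GGGGAAAACGNN", window=4, step=4))  # [1.0, 0.0, 1.0]
```

In practice the positive and negative scores would be the GC fractions of windows labeled as retaining or not retaining nucleosomes in sperm; an AUC near 1 indicates that GC content alone ranks retained windows above the rest.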
In 1905, Albert Einstein proposed that the forces that cause the random Brownian motion of a particle also underlie the resistance to macroscopic motion when a force is applied. This insight, of a coupling between fluctuation (stochastic behavior) and responsiveness (non-stochastic behavior), founded an important branch of physics. Here we argue that his insight may also be relevant for understanding evolved biological systems, and we present a ‘fluctuation–response relationship’ for biology. The relationship is consistent with the idea that biological systems are similarly canalized to stochastic, environmental, and genetic perturbations. It is also supported by in silico evolution experiments, and by the observation that ‘noisy’ gene expression is often both more responsive and more ‘evolvable’. More generally, we argue that in biology there is (and always has been) an important role for macroscopic theory that considers the general behavior of systems without concern for their intimate molecular details.
Electronic supplementary material
The online version of this article (doi:10.1007/s00018-010-0589-y) contains supplementary material, which is available to authorized users.
Noise; Plasticity; Canalization; Evolvability; Fluctuation; Response; Genetic assimilation; Gene expression
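For reference, Einstein's 1905 insight is usually written as the Einstein relation, which links the diffusion coefficient D of a Brownian particle (a measure of fluctuation) to its mobility μ, the drift velocity per unit applied force (a measure of response), at temperature T with Boltzmann constant k_B; in one dimension D also sets the growth of the mean squared displacement over time t. The second line is an assumption about how a biological analogue can be written, not an equation given in the abstract: for a phenotype x (for example, a gene's expression level) and a parameter a (for example, an environmental condition), the response of the mean to a small change Δa is taken to scale with the variance of x.

```latex
\begin{aligned}
D &= \mu \, k_B T, \qquad \langle x^2(t) \rangle = 2 D t, \\
\frac{\langle x \rangle_{a + \Delta a} - \langle x \rangle_{a}}{\Delta a} &\propto \langle (\delta x)^2 \rangle_{a}
\end{aligned}
```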
Gene expression responds to changes in conditions but also stochastically among individuals. In budding yeast, both expression responsiveness across conditions (“plasticity”) and cell-to-cell variation (“noise”) have been quantified for thousands of genes and found to correlate across genes. It has been argued therefore that noise and plasticity may be strongly coupled and mechanistically linked. This is consistent with some theoretical ideas, but a strong coupling between noise and plasticity also has the potential to introduce cost–benefit conflicts during evolution. For example, if high plasticity is beneficial (genes need to respond to the environment), but noise is detrimental (fluctuations are harmful), then strong coupling should be disfavored. Here, evidence is presented that cost–benefit conflicts do occur and that they constrain the evolution of gene expression and promoter usage. In contrast to recent assertions, coupling between noise and plasticity is not a general property, but one associated with particular mechanisms of transcription initiation. Further, promoter architectures associated with coupling are avoided when noise is most likely to be detrimental, and noise and plasticity are largely independent traits for core cellular components. In contrast, when genes are duplicated, noise–plasticity coupling increases, consistent with reduced detrimental effects of expression variation. Noise–plasticity coupling is, therefore, an evolvable trait that may constrain the emergence of highly responsive gene expression and be selected against during evolution. Further, the global quantitative data in yeast suggest that one mechanism that relieves the constraints imposed by noise–plasticity coupling is gene duplication, providing an example of how duplication can facilitate escape from adaptive conflicts.
Gene expression needs to respond to changes in conditions, but also varies stochastically among cells in a homogeneous environment. It has been argued that these two levels of expression variation may be coupled, relating to the same underlying molecular mechanisms. However, such a strong coupling between expression “plasticity” and expression “noise” may introduce cost–benefit conflicts during evolution. For example, if plasticity is beneficial, but noise is detrimental, then coupling will be disfavored. In this work, evidence is presented that such cost–benefit conflicts do occur and that they constrain the evolution of gene expression in yeast. In contrast to recent conclusions, it is shown that noise–plasticity coupling is not a general result, but rather one associated with particular mechanisms of transcription initiation. Promoter architectures associated with coupling are avoided when noise is detrimental, and noise and plasticity are not coupled for core cellular components. Noise–plasticity coupling is therefore not a general property of gene expression, but an evolvable trait that may constrain the evolution of gene expression and be selected against during evolution. Further, gene duplication may facilitate escape from the adaptive conflict imposed by coupling.
Gene inactivation often has little or no apparent consequence for the phenotype of an organism. This property, genetic (or mutational) robustness, is pervasive and has important implications for disease and evolution, but is not well understood. Dating back to at least Waddington, it has been suggested that mutational robustness may be related to the requirement to withstand environmental or stochastic perturbations. Here I show that global quantitative data from yeast are largely consistent with this idea. Considering the effects of mutations in all nonessential genes shows that genes that confer robustness to environmental or stochastic change also buffer the effects of genetic change, and with similar efficacy. This means that selection during evolution for environmental or stochastic robustness (also referred to as canalization) may frequently have the side effect of increasing genetic robustness. A dynamic environment may therefore promote the evolution of phenotypic complexity. It also means that “hub” genes in genetic interaction (synthetic lethal) networks are generally genes that confer environmental resilience and phenotypic stability.
A protein interaction network describes a set of physical associations that can occur between proteins. However, within any particular cell or tissue only a subset of proteins is expressed and so only a subset of interactions can occur. Integrating interaction and expression data, we analyze here this interplay between protein expression and physical interactions in humans. Proteins only expressed in restricted cell types, like recently evolved proteins, make few physical interactions. Most tissue-specific proteins do, however, bind to universally expressed proteins, and so can function by recruiting or modifying core cellular processes. Conversely, most ‘housekeeping’ proteins that are expressed in all cells also make highly tissue-specific protein interactions. These results suggest a model for the evolution of tissue-specific biology, and show that most, and possibly all, ‘housekeeping’ proteins actually have important tissue-specific molecular interactions.
human; protein interaction networks; tissue-specific evolution
Gene expression, like many biological processes, is subject to noise. This noise has been measured on a global scale, but its general importance to the fitness of an organism is unclear. Here, I show that noise in gene expression in yeast has evolved to prevent harmful stochastic variation in the levels of genes that reduce fitness when their expression levels change. Therefore, there has probably been widespread selection to minimise noise in gene expression. Selection to minimise noise, because it results in gene expression that is stable to stochastic variation in cellular components, may also constrain the ability of gene expression to respond to non-stochastic variation. I present evidence that this has indeed been the case in yeast. I therefore conclude that gene expression noise is an important biological trait, and one that probably limits the evolvability of complex living systems.
evolution; gene expression; noise; robustness
The functions of a eukaryotic cell are largely performed by multi-subunit protein complexes that act as molecular machines or information processing modules in cellular networks. An important problem in systems biology is to understand how, in general, these molecular machines respond to perturbations.
In yeast, genes that inhibit growth when their expression is reduced are strongly enriched amongst the subunits of multi-subunit protein complexes. This applies to both the core and peripheral subunits of protein complexes, and the subunits of each complex normally have the same loss-of-function phenotypes. In contrast, genes that inhibit growth when their expression is increased are not enriched amongst the core or peripheral subunits of protein complexes, and the behaviour of one subunit of a complex is not predictive for the other subunits with respect to over-expression phenotypes.
We propose the principle that the overall activity of a protein complex is in general robust to an increase, but not to a decrease in the expression of its subunits. This means that whereas phenotypes resulting from a decrease in gene expression can be predicted because they cluster on networks of protein complexes, over-expression phenotypes cannot be predicted in this way. We discuss the implications of these findings for understanding how cells are regulated, how they evolve, and how genetic perturbations connect to disease in humans.
Invertebrate conserved noncoding elements (CNEs) are associated with the same core set of genes as vertebrate CNEs, and may reflect the parallel evolution of enhancers in the gene regulatory networks that define alternative animal body plans.
The human genome contains thousands of non-coding sequences that are often more conserved between vertebrate species than protein-coding exons. These highly conserved non-coding elements (CNEs) are associated with genes that coordinate development, and have been proposed to act as transcriptional enhancers. Despite their extreme sequence conservation in vertebrates, sequences homologous to CNEs have not been identified in invertebrates.
Here we report that nematode genomes contain an alternative set of CNEs that share sequence characteristics, but not identity, with their vertebrate counterparts. CNEs thus represent a very unusual class of sequences that are extremely conserved within specific animal lineages yet are highly divergent between lineages. Nematode CNEs are also associated with developmental regulatory genes, and include well-characterized enhancers and transcription factor binding sites, supporting the proposed function of CNEs as cis-regulatory elements. Most remarkably, 40 of 156 human CNE-associated genes with invertebrate orthologs are also associated with CNEs in both worms and flies.
A core set of genes that regulate development is associated with CNEs across three animal groups (worms, flies and vertebrates). We propose that these CNEs reflect the parallel evolution of alternative enhancers for a common set of developmental regulatory genes in different animal groups. This 're-wiring' of gene regulatory networks containing key developmental coordinators was probably a driving force during the evolution of animal body plans. CNEs may, therefore, represent the genomic traces of these 'hard-wired' core gene regulatory networks that specify the development of each alternative animal body plan.
High-throughput combinatorial RNAi demonstrates that many duplicated genes in C. elegans can retain redundant functions for more than 80 million years
Systematic analyses of loss-of-function phenotypes have been carried out for most genes in Saccharomyces cerevisiae, Caenorhabditis elegans, and Drosophila melanogaster. Although such studies vastly expand our knowledge of single gene function, they do not address redundancy in genetic networks. Developing tools for the systematic mapping of genetic interactions is thus a key step in exploring the relationship between genotype and phenotype.
We established conditions for RNA interference (RNAi) in C. elegans to target multiple genes simultaneously in a high-throughput setting. Using this approach, we can detect the great majority of previously known synthetic genetic interactions. We used this assay to examine the redundancy of duplicated genes in the genome of C. elegans that correspond to single orthologs in S. cerevisiae or D. melanogaster and identified 16 pairs of duplicated genes that have redundant functions. Remarkably, 14 of these redundant gene pairs were duplicated before the divergence of C. elegans and C. briggsae 80-110 million years ago, suggesting that there has been selective pressure to maintain the overlap in function between some gene duplicates.
We established a high throughput method for examining genetic interactions using combinatorial RNAi in C. elegans. Using this technique, we demonstrated that many duplicated genes can retain redundant functions for more than 80 million years of evolution. This provides strong support for evolutionary models that predict that genetic redundancy between duplicated genes can be actively maintained by natural selection and is not just a transient side effect of recent gene duplication events.
Mutations in lin-35, the worm ortholog of a mammalian tumor suppressor gene, and other synMuv B genes result in an increased sensitivity to RNAi and enhanced somatic transgene silencing.
Genome-wide RNA interference (RNAi) screening is a very powerful tool for analyzing gene function in vivo in Caenorhabditis elegans. The effectiveness of RNAi varies from gene to gene, however, and neuronally expressed genes are largely refractory to RNAi in wild-type worms.
We found that C. elegans strains carrying mutations in lin-35, the worm ortholog of the tumor suppressor gene p105Rb, or a subset of the genetically related synMuv B family of chromatin-modifying genes, show increased strength and penetrance for many germline, embryonic, and post-embryonic RNAi phenotypes, including neuronal RNAi phenotypes. Mutations in these same genes also enhance somatic transgene silencing via an RNAi-dependent mechanism. Two genes, mes-4 and zfp-1, are required both for the vulval lineage defects resulting from mutations in synMuv B genes and for RNAi, suggesting a common mechanism for the function of synMuv B genes in vulval development and in regulating RNAi. Enhanced RNAi in the germline of lin-35 worms suggests that misexpression of germline genes in somatic cells cannot alone account for the enhanced RNAi observed in this strain.
A worm strain with a null mutation in lin-35 is more sensitive to RNAi than any other previously described single mutant strain, and so will prove very useful for future genome-wide RNAi screens, particularly for identifying genes with neuronal functions. As lin-35 is the worm ortholog of the mammalian tumor suppressor gene p105Rb, misregulation of RNAi may be important during human oncogenesis.
A report on the joint Keystone Symposia on Systems Biology and Proteomics and Bioinformatics, Keystone, USA, 8-13 April 2005.
Using data from model organisms, the authors have generated a large-scale human protein-protein interaction map. The map can be used to predict the function of human proteins.
Protein-interaction maps are powerful tools for suggesting the cellular functions of genes. Although large-scale protein-interaction maps have been generated for several invertebrate species, projects of a similar scale have not yet been described for any mammal. Because many physical interactions are conserved between species, it should be possible to infer information about human protein interactions (and hence protein function) using model organism protein-interaction datasets.
Here we describe a network of over 70,000 predicted physical interactions between around 6,200 human proteins generated using the data from lower eukaryotic protein-interaction maps. The physiological relevance of this network is supported by its ability to preferentially connect human proteins that share the same functional annotations, and we show how the network can be used to successfully predict the functions of human proteins. We find that combining interaction datasets from a single organism (but generated using independent assays) and combining interaction datasets from two organisms (but generated using the same assay) are both very effective ways of further improving the accuracy of protein-interaction maps.
The complete network predicts interactions for a third of human genes, including 448 human disease genes and 1,482 genes of unknown function, and so provides a rich framework for biomedical research.
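Predicting human interactions from model-organism maps rests on transferring each observed interaction to the corresponding human orthologs (often called "interologs"), and the abstract also notes that agreement between independent datasets improves accuracy. A minimal sketch, assuming hypothetical protein identifiers and ortholog assignments: it maps interactions through orthologs, counts how many datasets support each predicted human pair, and predicts a protein's function by a simple majority vote over its network partners' annotations. This voting rule is one simple guilt-by-association scheme, not necessarily the one used by the authors.

```python
from collections import Counter
from itertools import product
from typing import Dict, Iterable, List, Optional, Set, Tuple

Pair = Tuple[str, str]


def map_interologs(interactions: Iterable[Pair],
                   orthologs: Dict[str, Set[str]]) -> Set[Pair]:
    """Transfer each model-organism interaction to every pair of human orthologs."""
    predicted: Set[Pair] = set()
    for p, q in interactions:
        for hp, hq in product(orthologs.get(p, ()), orthologs.get(q, ())):
            a, b = sorted((hp, hq))  # store pairs unordered
            predicted.add((a, b))
    return predicted


def support_counts(datasets: List[List[Pair]],
                   orthologs: Dict[str, Set[str]]) -> Counter:
    """Number of source datasets that predict each human pair."""
    support = Counter()
    for interactions in datasets:
        support.update(map_interologs(interactions, orthologs))
    return support


def predict_function(protein: str, edges: Iterable[Pair],
                     annotations: Dict[str, Set[str]]) -> Optional[str]:
    """Most common annotation among the protein's network partners."""
    votes = Counter()
    for a, b in edges:
        if protein == a:
            votes.update(annotations.get(b, ()))
        elif protein == b:
            votes.update(annotations.get(a, ()))
    return votes.most_common(1)[0][0] if votes else None


# Hypothetical worm and fly datasets and ortholog assignments.
worm = [("w1", "w2")]
fly = [("f1", "f2")]
orthologs = {"w1": {"H1"}, "w2": {"H2"}, "f1": {"H1"}, "f2": {"H2", "H3"}}
support = support_counts([worm, fly], orthologs)
annotations = {"H2": {"DNA repair"}, "H3": {"DNA repair", "transport"}}
print(support[("H1", "H2")])                             # 2
print(predict_function("H1", support, annotations))      # DNA repair
```

Pairs supported by more than one dataset, such as ("H1", "H2") here, could be given higher confidence, mirroring the observation that combining datasets improves accuracy.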