It has been thought that functional loss of a gene due to null mutation can often be compensated by its paralog(s). Indeed, the genome-wide single-gene knockout or knockdown data in yeast and worm showed that the proportion of essential genes (PE) in singletons is substantially greater than that in duplicates1, 2; we consider a gene to be ‘essential’ if its deletion leads to lethality or sterility. However, the mouse knockout data3 collected from individual experimental studies showed similar PE values for singletons and for duplicates4, 5. This puzzling observation has attracted much attention6, 7. Here we propose an explanation.
Recently, Makino et al.6 found that developmental genes tend to be more essential than other genes and are highly enriched in the mouse knockout dataset. Here, we show that this enrichment does not cause a significant bias in the relative PE values for singletons and duplicates at the genome level, because the enrichment exists for both singletons and duplicates. From the dataset of Makino et al., we calculate the PE values for singletons and for duplicates in the mouse genome after adjusting for the over-representation of developmental genes (Supplemental Materials online). Interestingly, although both genome-wide PE values become substantially lower than those in the knockout dataset (singletons: from 42.2% to 35.6%; duplicates: from 41.4% to 32.8%; Table S1), they remain similar to each other (p = 0.09, χ² = 2.9, Table S2). Next, we consider another bias in the knockout dataset6, namely the enrichment of duplicate genes that arose from whole-genome duplications. After adjusting for the functional bias, this factor changes the genome-wide PE estimate by less than 1% (Supplemental Materials).
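The comparison above is a test of independence on a 2×2 table (singleton or duplicate by essential or non-essential). The following is a minimal sketch in Python, assuming a Pearson chi-square test with one degree of freedom and no continuity correction; the text does not state which variant was used. The function names are hypothetical and no gene counts from the tables are reproduced here.

```python
import math

def chi2_p_value_df1(chi2):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    # For 1 df, P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2.0))

def chi2_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Rows could be singletons and duplicates; columns essential and non-essential.
    All four marginal totals must be non-zero.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    # Closed form for a 2x2 table: n * (ad - bc)^2 / (product of the margins).
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    return chi2, chi2_p_value_df1(chi2)

# Under the 1-df assumption, the reported statistic of 2.9 maps to p of about 0.09.
print(round(chi2_p_value_df1(2.9), 2))
```

Under the same assumption, the other statistics quoted in the text (1.5 and 6.4) map to p-values of about 0.22 and 0.01, in agreement with the reported values.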
Why is the PE for mouse duplicates similar to that for singletons even at the genome level? One possible reason is unequal functional partitioning between singletons and duplicates (i.e. a higher proportion of developmental genes and a lower proportion of unannotated genes in duplicates than in singletons). However, the observation6 that among mouse developmental genes the PE for duplicates is even higher than that for singletons suggests that there are other confounding factors.
From the perspective of systems biology, the centrality of a gene in a biological network can affect gene essentiality8. Previously, we found that mouse duplicates tend to have more interacting partners in the protein interaction network and that genes encoding hub proteins are more likely to be essential4. Thus, is the higher essentiality of developmental duplicates due in part to their higher centrality in the network? We use the high-quality protein interaction dataset9 from a systematic examination of all binary interactions among ~7,200 human proteins; this dataset is less biased than those collected from individual studies. For mouse developmental genes with phenotypic data, we use their human orthologs and the human protein interaction data to calculate the connectivity and betweenness of mouse proteins, two frequently used indices for quantifying the centrality of a node in a network. Indeed, we find that developmental duplicates have higher network centrality than developmental singletons; the same trend also holds for all the duplicates and singletons in the dataset (Table 1). We obtain similar results using other protein interaction datasets (data not shown).
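The two centrality indices named above can be illustrated on a toy network. This is a minimal sketch, assuming an undirected, unweighted interaction graph with self-interactions removed, stored as a symmetric adjacency dictionary. The text does not specify the software or normalization used, so the betweenness here is the unnormalized shortest-path count computed with Brandes' algorithm, and the node names are hypothetical.

```python
from collections import deque

def connectivity(adj):
    """Number of interacting partners (degree) of each node."""
    return {v: len(nbrs) for v, nbrs in adj.items()}

def betweenness(adj):
    """Unnormalized shortest-path betweenness (Brandes' algorithm).

    adj must be symmetric: if w is in adj[v], then v is in adj[w].
    """
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma).
        order = []
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:              # w seen for the first time
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:   # v lies on a shortest path to w
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate pair dependencies, farthest nodes first.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each unordered pair was counted from both endpoints.
    return {v: b / 2.0 for v, b in bc.items()}

# Hypothetical toy network: a hub protein H with three partners.
toy = {"H": {"A", "B", "C"}, "A": {"H"}, "B": {"H"}, "C": {"H"}}
print(connectivity(toy)["H"], betweenness(toy)["H"])
```

In this toy network the hub has connectivity 3 and lies on the shortest path between all three pairs of its partners, so its betweenness is 3, whereas each partner has connectivity 1 and betweenness 0.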
Thus, both functionality and network centrality can influence PE estimation. To control for these two factors, we compare the PE values for sets of singletons and duplicates that have the same functionalities and the same connectivities. In total, there are 1,847 mouse genes with both interaction and knockout phenotypic data, and we classify them into three centrality groups: low-, medium- and high-connectivity (centrality refers to connectivity here and in the remaining text). We calculate PE for each group of duplicates with the same functional classification and centrality classification; to obtain the averaged PE of mouse duplicates in the dataset, we weight each group by the proportion of the corresponding group among singletons (Supplemental Materials).
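The weighting scheme described above amounts to a direct standardization of the duplicate PE to the singleton distribution. The following is a minimal sketch of that calculation; the function name, the stratum labels and all counts are hypothetical illustrations, not values from the Supplemental Materials, and the handling of strata that contain no duplicates is not specified in the text, so the sketch simply requires every stratum to be non-empty.

```python
def adjusted_pe(duplicate_strata, singleton_counts):
    """PE of duplicates, re-weighted to the singleton stratum proportions.

    duplicate_strata: {stratum: (n_essential, n_total)} for duplicates
    singleton_counts: {stratum: n_total} for singletons
    A stratum is a (functional class, connectivity class) pair.
    """
    total_singletons = sum(singleton_counts.values())
    adjusted = 0.0
    for stratum, n_single in singleton_counts.items():
        essential, total = duplicate_strata[stratum]  # assumes total > 0
        weight = n_single / total_singletons          # singleton proportion
        adjusted += weight * (essential / total)      # stratum-specific PE
    return adjusted

# Hypothetical counts: duplicates are concentrated in the stratum with high PE.
duplicates = {("developmental", "high"): (30, 50), ("other", "low"): (10, 50)}
singletons = {("developmental", "high"): 10, ("other", "low"): 30}
print(round(adjusted_pe(duplicates, singletons), 3))
```

In this made-up example the crude PE of duplicates is 40 of 100 genes (40%), but after re-weighting to the singleton distribution it drops to 30%, which shows how unequal functional and centrality composition can mask a lower stratum-matched essentiality.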
Although in the original dataset the PE values for singletons and duplicates are similar (45.7% vs. 42.4%, p = 0.22, χ² = 1.5), after controlling for both functionality and centrality biases, the adjusted PE for duplicates is 39.0% (Fig. 1), ~7 percentage points lower than that for singletons (p = 0.01, χ² = 6.4, Table S6). This number implies that ~15% of the single-gene deletions that otherwise would be lethal (or infertile) are viable (or fertile) due to functional compensation by duplicates. Thus, the contribution of functional compensation by duplicates seems significant. Su and Gu7 recently showed that young duplicates are under-represented in the knockout dataset. Because the backup role of young duplicates is likely to be stronger than that of old duplicates, the functional compensation by duplicates for the genome is likely to be higher than we estimated above.
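One reading of the ~15% figure, assuming that the singleton PE (45.7%) is taken as the essentiality expected in the absence of compensation, is the following worked calculation; the symbols are introduced here for illustration only, and the Supplemental Materials may define the quantity differently.

```latex
\[
\Delta = PE_{\text{singleton}} - PE_{\text{duplicate}}^{\text{adj}}
       = 45.7\% - 39.0\% = 6.7 \text{ percentage points},
\qquad
\frac{\Delta}{PE_{\text{singleton}}} = \frac{6.7}{45.7} \approx 0.147 \approx 15\%.
\]
```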
We have provided a systems-level explanation for the observation that developmental genes are more essential6. Our results highlight the importance of controlling for confounding factors in studying the role of duplicates in genetic robustness. Conventionally, the contribution of duplicates to functional compensation is inferred by directly comparing the PE value for duplicates with that for singletons, and similar PE values are usually taken as evidence that duplicate genes make no contribution. However, the functional partitioning and network centrality of duplicates might differ from those of singletons. It should be emphasized that even when genome-wide phenotypic data for single-gene deletions are available, correcting for such intrinsic differences is still necessary.
Of course, our analyses represent only an initial effort to adjust for confounding factors. Because current protein interaction datasets are incomplete, an estimation of functional compensation by duplicates in the whole mouse genome is still not feasible. Moreover, other potential confounding factors remain to be explored. Nevertheless, our study provides a general framework for estimating the contribution of duplicate genes to functional compensation by integrating functional genomic data.
We thank Dr. Aoife McLysaght for providing us with their dataset and for her valuable comments on our manuscript. This study was supported by NIH grants (GM30998 and GM081724) to W.H.L.