2R-WGD occurred more than 450 million years ago, and most resulting gene duplicates were lost, leading to rediploidisation. Here we set out to functionally characterize retained 2ROs. First, we found that signal transduction was the most enriched GO term (in stark contrast to tandem or segmental duplications, where this term was underrepresented). In total, 74% of human signalling genes were descendants of 2ROs. Foreshadowing later findings, several GO terms were associated with the nervous system: neurogenesis, synaptic transmission, axon guidance, nervous system development and neuron differentiation (Additional file 2, Table S2_bp). Next, we searched for protein domains enriched among 2ROs and found many classic signalling domains, along with well-known protein interaction (PI) domains, such as Src homology 2 (SH2), Src homology 3 (SH3), phosphotyrosine-binding domain (PTB) and PDZ (reviewed in [24]). The PI domains aid signalling by enabling dynamic formation of signalling protein complexes. For example, SH2 and PTB selectively recognise phosphorylated tyrosines, while SH3 binds proline-rich sequences with a characteristic motif Pro-X-X-Pro. SH2 proteins frequently form membrane-attached signal-processing complexes at autophosphorylated receptors and participate in positive and negative feedback loops of phosphorylation cascades. PTB-bearing proteins, in turn, are predominantly adaptors and docking stations, frequently anchored in the cell membrane (sometimes by means of a lipid-binding PH domain), and promoting assembly of large signalling complexes at autophosphorylated tyrosine kinases. Finally, PDZ domains recognise C-terminal valine or leucine residues and are abundant in synapses, serving as scaffolds for the assembly of large signalling complexes involved in neurotransmission.
To better understand the evolutionary dynamics of 2R-WGD, we investigated the relationship between relative timing of gene duplication and spatial expression domain of progeny genes. The heatmap in Figure revealed 2R's expression signature in the broader context of animal evolution. Significantly, a trend could be observed for brain and nervous tissue expression (amygdala, thalamus, caudate nucleus, corpus callosum, spinal cord, fetal brain, cerebellum, cortex and whole brain) to map to the taxonomic cluster (b), Bilateria, Chordata and Vertebrata, while being excluded from younger clusters (c) and (d). These expression patterns, taken together with the results of GO analysis, suggested that the molecular machinery of the vertebrate neuron was defined in the 2R event and strongly conserved thereafter. A previous focused study of fly and mouse noted that vertebrate synapses were far more complex than those of invertebrates [25], but the scale, the mechanism and the precise timing of this key evolutionary transition were hitherto unknown.
Development of large multicompartmentalised vertebrate brains is shaped by three layers of control [26]: (1) establishment of patterning centres that secrete diffusible signalling ligands, such as WNTs, bone morphogenetic proteins (BMPs) and soluble BMP antagonists; (2) brain-specific transcriptional regulatory networks involving TFs such as paired box proteins (PAX) and forkhead box protein (FOXP); and (3) extensive neuronal apoptosis shaping the fine detail of brain structures and compartments. For example, mice deficient in cysteine-aspartic acid protease 3 (CASP3) exhibited decreased neuronal apoptosis and hyperplasia, resulting in gross brain abnormalities [27]. How important was 2R-WGD for the definition of this developmental toolkit? We found that multiple WNT ligands (TF105310), PAX2/5/8 and PAX1/9 (TF315397), FOXP1/2/3/4 (TF326978) and CASP3/7 (TF102023) were 2ROs. Previously, we showed that the evolution of the BMP/TGF-β pathway was guided almost entirely by 2R-WGD [28]. In conclusion, we identified many key components of the vertebrate brain developmental toolkit as 2ROs.
The exclusion of nervous tissue from the expression domain of newly formed mammalian and primate genes contradicts intuition. However, anatomical differences between vertebrate nervous systems can be sufficiently explained by changes in developmental expression patterns of existing regulatory and structural genes of the neuron. Higher complexity of mental functions in certain vertebrate lineages (for example, in primates, some birds, and dolphins) is likely to stem from these anatomical differences, as well as from more complex ways in which neurons are connected, a subject explored by the emerging field of connectomics.
Uniquely in animal evolution, and in stark contrast to other basic cellular functions, 2R-WGD expanded the cell cycle machinery, in particular cyclins A and B, and the interface with signalling made up by cyclins D1-D3, CDK4/6, p21/p27 and p18/p19. Cyclin D levels (unlike those of cyclins A and B) correlate not with cell-cycle phases but with extracellular mitogens, cytokines, hormones and juxtacrine ligands. Signalling pathways induce expression of cyclin D, which pairs with cyclin-dependent kinases (CDKs) of types 4 and 6, stimulating the cell to enter the cycle from G1. (This progression can be inhibited by cyclin-dependent kinase inhibitors p21/p27 and p18/p19.) We identify all four sets of genes involved (that is, cyclins D1-D3, CDK4/6, p21/p27 and p18/p19) as 2ROs.
Arguably, the cyclin/CDK engine might be a relatively late evolutionary invention, taking over from ancient kinases [29], and with the inherent tendency for redundancy characteristic of an integrating system [30]. However, cyclin/CDK signalling is very well documented in yeast. Regardless of the controversy regarding the nature of primordial cell-cycle regulators, the results presented here suggest that control over cell cycles became more important in large and long-lived animals and that expansion of the cyclin/CDK network, which occurred through genome duplication, facilitated fine-tuning of that control. No such regulatory upgrade was required for other basic cellular functions (such as translation, replication, splicing and recombination). We hope to open a new area of investigation into the differences in cell-cycle control between vertebrates and model species such as fly, worm and yeast, with important consequences for both basic and applied science. As cyclin D1-D3/CDK4/6 complexes have at least partially overlapping phosphorylation targets, the apparent functional redundancy serves to integrate multiple upstream signals. In other words, 2R-WGD most likely resulted in retention of duplicates with different signalling inputs but similar outputs. Kinetic modeling, protein interaction and target screens focused on differences between invertebrate and vertebrate cyclin/CDK networks should yield the first clues.
We next asked whether signalling network nodes linked with 2ROs exhibited characteristic features, such as the degree (that is, the number of interaction partners) or betweenness centrality (that is, the amount of network traffic, or information, flowing through a given node). The degree of human 2RO nodes was significantly increased, with the strongest effect on outdegree of negative regulation (Table ). This suggested that highly connected nodes, that is, network hubs, in particular those involving negative regulators, were preferentially retained. Enrichment of 2ROs in PI domains, as shown by Pfam analysis, also suggested higher interconnectedness of the post-2R network. The likely biological result of this trend towards greater network complexity was increased signalling robustness and cross-talk. Negative feedback loops, on the other hand, were likely to mediate inducible and temporary biological responses invoked by external stimuli or network oscillations facilitating spatiotemporal patterning during vertebrate development.
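The two node statistics discussed above can be illustrated with a minimal Python sketch. It is not the analysis pipeline used in this study: the toy network, the node names and the `+`/`-` sign encoding are hypothetical, and betweenness is computed with the standard Brandes algorithm on an unweighted directed graph, without normalisation.

```python
from collections import defaultdict, deque

# Hypothetical toy network (not from the HCSM): (source, target, sign),
# where "+" denotes activation and "-" denotes inhibition.
EDGES = [
    ("LIGAND", "RECEPTOR", "+"),
    ("RECEPTOR", "KINASE", "+"),
    ("KINASE", "TF", "+"),
    ("PHOSPHATASE", "KINASE", "-"),
    ("TF", "PHOSPHATASE", "+"),
    ("INHIBITOR", "RECEPTOR", "-"),
    ("INHIBITOR", "KINASE", "-"),
]

def signed_degrees(edges):
    """Count in- and out-degree per node, split by the sign of regulation."""
    deg = defaultdict(lambda: {"out+": 0, "out-": 0, "in+": 0, "in-": 0})
    for src, dst, sign in edges:
        deg[src]["out" + sign] += 1
        deg[dst]["in" + sign] += 1
    return dict(deg)

def betweenness(edges):
    """Unnormalised betweenness centrality (Brandes), directed and unweighted."""
    adj = defaultdict(list)
    nodes = set()
    for src, dst, _ in edges:
        adj[src].append(dst)
        nodes.update((src, dst))
    score = dict.fromkeys(nodes, 0.0)
    for s in nodes:
        # Breadth-first search from s, counting shortest paths (sigma).
        sigma = dict.fromkeys(nodes, 0)
        sigma[s] = 1
        dist = {s: 0}
        preds = defaultdict(list)
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Back-propagate pair dependencies, farthest nodes first.
        delta = dict.fromkeys(nodes, 0.0)
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    return score

if __name__ == "__main__":
    print(signed_degrees(EDGES)["INHIBITOR"])
    print(sorted(betweenness(EDGES).items(), key=lambda kv: -kv[1]))
```

In this toy example the node with the highest negative outdegree (`INHIBITOR`) is not the node with the highest betweenness (`KINASE`), which is why the two measures are treated as separate features of a node.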
However, was high connectedness driving preferential retention, or was it merely a consequence of rediploidisation? If only we could sequence the genome of the AP2R animal! This is, of course, impossible, but some features can be inferred from extant species. To this end, we compared fly and human and found that hubs were already enriched in genes ancestral to 2ROs. High connectedness was therefore a factor contributing towards preferential retention. Interestingly, ancestral nodes associated with mammalian and chordate duplications exhibited even higher connectivity biases, but the progeny of these genes were not associated with human hubs. This could be explained by the evolutionary model in which all duplications preferentially target highly connected nodes but WGDs preserve their status as hubs, while tandem and segmental duplications remodel them towards reduced connectivity.
Do gene duplications conserve interactions or rewire duplicates with novel interaction partners? We must first define a few concepts which will help us approach network topology from the evolutionary perspective, with a focus on gene duplication. Let us define shared edges as a pair of edges extending between two nodes and an identical third node. A conserved edge, on the other hand, corresponds to an interaction in the ancestral network that is still present in the extant network. We can see that shared edges between a pair of 2ROs are parsimoniously explained as conserved edges (derived from an ancestral interaction in the AP2R network), as the probability of gaining shared edges through convergent evolution is extremely low. Finally, a bridging edge is an edge directly linking the paralogous node pair, suggesting sophisticated forms of regulatory feedback and information processing between duplicates [31]. The bridging edge is an evolutionary novelty created as a consequence of duplication, possibly but not necessarily associated with ancestral proteins prone to homodimerisation.
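The definitions of shared and bridging edges can be made concrete with a short Python sketch. The toy network and the paralog names `A1` and `A2` are hypothetical, edges are treated as undirected for simplicity, and the `fraction_shared` statistic (shared partners divided by all partners of either paralog) is an illustrative assumption rather than the exact measure reported in this study.

```python
# Hypothetical toy network; A1 and A2 stand for a paralogous pair (for example, two 2ROs).
EDGES = {("A1", "X"), ("A2", "X"), ("A1", "Y"), ("A2", "Z"), ("A1", "A2")}

def neighbours(edges, node):
    """Return all interaction partners of `node`, ignoring edge direction."""
    return {b if a == node else a for a, b in edges if node in (a, b)}

def classify_pair(edges, p1, p2):
    """Summarise shared and bridging edges for a paralogous node pair."""
    n1 = neighbours(edges, p1) - {p2}
    n2 = neighbours(edges, p2) - {p1}
    shared_partners = n1 & n2  # third nodes contacted by both paralogs
    union = n1 | n2
    return {
        "shared_partners": shared_partners,
        # A bridging edge links the two paralogs directly.
        "bridged": (p1, p2) in edges or (p2, p1) in edges,
        # Assumed summary statistic, not necessarily the one used in the paper.
        "fraction_shared": len(shared_partners) / len(union) if union else 0.0,
    }

if __name__ == "__main__":
    print(classify_pair(EDGES, "A1", "A2"))
```

Under the parsimony argument given above, each shared partner found this way would be read as a conserved edge inherited from the ancestral network, whereas the bridging edge would be counted as a post-duplication novelty.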
When the concepts of shared, conserved and bridging edges are applied to HCSM (Table ), a number of observations emerge: (1) the fraction of conserved edges is higher for 2ROs than for paralogs mapping to Chordates or Bilaterians, (2) the fraction of conserved regulatory edges with negative impact is higher than that of edges with positive impact, and (3) complex novel network motifs are formed by bridged hubs (Figure ). Figure 5 shows a graph representation of an HCSM subnetwork focusing on the apoptosis pathway featuring three bridged 2RO pairs. Overall, bridged pairs are extremely rich in signalling hubs, with twice the average number of interacting partners. In terms of the broader evolutionary impact, we propose that 450 million years ago, at the time of 2R, instantaneous doubling of the signalling network through WGD not only immediately expanded the available space of network states but also kick-started rapid coevolution of nodes into novel topologies. The cumulative effect was that of greatly increased phenotype space, enabling adaptation to an expanded range of physiological parameters, such as temperature, osmotic pressure, availability of nutrients and growth factors. Greater organismal adaptability facilitated, in turn, colonisation of novel environments or ecological niches. 2R-WGD was most likely an instantaneous speciation, in itself an extraordinary evolutionary event, somewhat contrary to the classic Darwinian view of gradual evolution. It probably took place under stress conditions on the fringes of the normal ecological range of the parental species. The few "hopeful monsters" with duplicated genomes must have had an instant adaptability advantage to compete with AP2R parental populations, despite the increased costs of DNA replication, chromatin remodeling and chromosome segregation associated with polyploidy. For example, Conant and Wolfe [32] proposed that yeast WGD conferred an immediate selective advantage for growth in high-glucose environments through the increase of dosage of genes in the glycolytic pathway. In the longer term, as suggested by our GO analysis, 2R-WGD likely also provided a drive for increased morphological complexity [33] and conferred greater evolvability, facilitating the emergence of vertebrate novelties.
Figure 5. Bridged pairs and shared edges between 2ROs in the apoptosis pathway. A graph representation of a subnetwork of the human cancer signalling map (HCSM) focusing on the apoptosis pathway is shown. There are three pairs of 2ROs in the subnetwork: BCL2-like ...