As a central metabolic pathway, glycolysis has been highly conserved across species from archaea to humans. The ubiquity of the glycolytic enzymes makes them a crude but standardized genomic measuring stick and an ideal platform for studying pseudogenes.
Despite the high degree of conservation of the glycolytic enzymes, their pseudogene abundances vary far more. Some genomes, such as those of chicken, zebrafish, pufferfish, fruit fly, and worm, contain very few or none, whereas others, such as mouse and rat, contain hundreds. These differences in pseudogene abundance alone suggest significant differences in the processes of gene expression, duplication, and retrotransposition among the genomes. Previous studies have suggested that the difference lies in the prolonged lampbrush stage of oogenesis in mammals compared with non-mammalian organisms [48,49].
Most glycolytic pseudogenes are processed and can be assumed to have been retrotransposed from an mRNA intermediate. It is possible that certain sequences intrinsic to the GAPDH and LDH genes predispose them to be preferentially reverse-transcribed, inserted, and preserved in the genome. Because these pseudogenes are classified as processed rather than duplicated, their formation is attributed to a retrotransposition event of the parent gene rather than to a duplication event. However, we must also consider the possibility that a processed pseudogene formed by retrotransposition was subsequently duplicated, giving rise to so-called "duplicated-processed" pseudogenes. Thus, whereas duplicated pseudogenes result from duplication of the parent gene, duplicated-processed pseudogenes result from duplication of a processed pseudogene [50,51]. One way to distinguish processed pseudogenes from duplicated-processed pseudogenes is to check whether the genomic segments surrounding a pair of processed pseudogenes are also similar. Hence, we checked whether any of the 60 processed pseudogenes of human GAPDH lie within duplicated regions of the genome called segmental duplications [52]. A pair of processed pseudogenes located in the two copies of a segmental duplication indicates that one of the pseudogenes was likely formed by duplication of the other and hence is a duplicated-processed pseudogene (Figure ). We identified eight duplicated-processed pseudogenes by this analysis; they are listed in Additional File 1. However, six of these eight pseudogenes occupy more than 77% of their duplicated segments and could instead be the result of independent retrotransposition events. In that scenario, the high sequence similarity of these segments, which consist largely of the pseudogenes themselves, may have led to their annotation as segmental duplications.
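The segmental-duplication check can be sketched as a simple interval test. This is a minimal illustration, assuming pseudogene and segmental-duplication coordinates are already available as 0-based, half-open intervals; the names `Interval` and `duplicated_processed_candidates` are hypothetical, and requiring each pseudogene to lie entirely within a segment is our assumption rather than necessarily the criterion used in this study. The sketch also reports how much of each segment the pseudogene occupies, the quantity behind the 77% caveat above.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Interval:
    """A genomic interval on one chromosome (0-based, half-open)."""
    chrom: str
    start: int
    end: int

    def __len__(self) -> int:
        return self.end - self.start

    def contains(self, other: "Interval") -> bool:
        """True if `other` lies entirely within this interval."""
        return (self.chrom == other.chrom
                and self.start <= other.start
                and other.end <= self.end)


def duplicated_processed_candidates(
    pseudogenes: Dict[str, Interval],
    sd_pairs: List[Tuple[Interval, Interval]],
) -> List[Tuple[str, str, float, float]]:
    """For each segmental-duplication pair, report processed pseudogenes found
    in both partner segments, together with each pseudogene's occupancy
    (pseudogene length / segment length)."""
    hits = []
    for seg_a, seg_b in sd_pairs:
        in_a = [name for name, iv in pseudogenes.items() if seg_a.contains(iv)]
        in_b = [name for name, iv in pseudogenes.items() if seg_b.contains(iv)]
        for a in in_a:
            for b in in_b:
                occ_a = len(pseudogenes[a]) / len(seg_a)
                occ_b = len(pseudogenes[b]) / len(seg_b)
                hits.append((a, b, occ_a, occ_b))
    return hits
```

A pair in which both occupancies are high (for example above 0.77) is the ambiguous case discussed above: the "duplication" may simply reflect two independent insertions of the same parent sequence.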
As an incidental observation, GAPDH has many more biological roles outside glycolysis than the other glycolytic enzymes do. For example, GAPDH functions in DNA repair, telomeric DNA binding, transcriptional regulation, nuclear RNA export, apoptosis, membrane fusion, phosphorylation, tubulin bundling, and sperm motility [53-59]. Because the molecular processes of retrotransposition are separate from these enzymatic functions, we can only speculate that the preponderance of non-glycolytic roles may be correlated with the enrichment of GAPDH pseudogenes.
In an intergenomic analysis, GAPDH pseudogenes are about five to six times more abundant in the rodent genomes than in the primate genomes, even though the mouse genome as a whole has been found to contain about half as many pseudogenes as the human genome [3]. The mouse genome has higher rates of nucleotide substitution, insertion, and deletion than the human genome [33], leading to faster pseudogene decay. Nevertheless, this faster decay appears to have preferentially spared the GAPDH pseudogenes.
To further characterize the molecular history of pseudogenes in the human, chimpanzee, mouse, and rat genomes, we needed to identify the pseudogenes most likely present before the primate-rodent divergence. We used orthologous genes to identify regions of synteny between primate-rodent genome pairs. This approach assumes that coding regions are much less variable than intergenic regions because of functional constraints and can therefore be matched more reliably between genomes.
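One simple way to make the synteny criterion concrete is to ask whether two pseudogenes sit between orthologous flanking genes. The sketch below assumes each chromosome is supplied as a list of (gene id, start) pairs sorted by coordinate and that a one-to-one ortholog map is available; the function names and the nearest-flanking-gene rule are illustrative assumptions, not necessarily the exact procedure used in this study.

```python
from bisect import bisect_left
from typing import Dict, List, Optional, Tuple

# Hypothetical representation: one chromosome as (gene_id, start) pairs,
# sorted by start coordinate.
Gene = Tuple[str, int]


def flanking_genes(genes: List[Gene], pos: int) -> Optional[Tuple[str, str]]:
    """Return the ids of the nearest genes on either side of `pos`,
    or None if `pos` lies before the first or after the last gene."""
    starts = [start for _, start in genes]
    i = bisect_left(starts, pos)
    if i == 0 or i == len(genes):
        return None
    return genes[i - 1][0], genes[i][0]


def is_syntenic(
    pos_a: int,
    genes_a: List[Gene],
    pos_b: int,
    genes_b: List[Gene],
    orthologs: Dict[str, str],
) -> bool:
    """True if the pseudogene at `pos_a` (genome A) and the one at `pos_b`
    (genome B) are flanked by orthologous gene pairs."""
    flank_a = flanking_genes(genes_a, pos_a)
    flank_b = flanking_genes(genes_b, pos_b)
    if flank_a is None or flank_b is None:
        return False
    mapped = {orthologs.get(flank_a[0]), orthologs.get(flank_a[1])}
    # Compare as sets so that a local inversion in genome B still matches.
    return None not in mapped and mapped == set(flank_b)
```

Anchoring on flanking genes rather than on the intergenic sequence itself reflects the assumption stated above: coding regions are more conserved and therefore easier to match across genomes.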
The scarcity of GAPDH pseudogenes syntenic between the primate and rodent genomes suggests an increase in retrotranspositional activity after the primate-rodent divergence about 91 million years ago, consistent with the findings of previous investigators [6]. To refine this timeline and provide further corroboration, we used Kimura's two-parameter model of nucleotide substitution to estimate the number of substitutions separating the GAPDH genes and their pseudogenes and thereby calculate the insertion date of each pseudogene. The inferred insertion dates formed three distinct distributions centered at 42.0, 36.3, and 25.9 million years ago in the human, mouse, and rat genomes, respectively, suggesting a burst of retrotranspositional activity around those times. Kimura's model assumes neutrally evolving sequences, which is the case for many pseudogenes [42]; however, some pseudogenes may initially be subject to natural selection [12], in which case their ages may be underestimated. In the human genome, the burst in retrotranspositional activity may coincide with the "Alu burst" that occurred about 40 million years ago in primate genomes [60,1,5,61]. By examining the sensitivity of our pseudogene pipeline, as described under Methods, we found that the number of pseudogenes identified does not vary significantly with the threshold for sequence identity or BLAST score relative to the parent gene. Thus, we believe this dating method reflects essentially all GAPDH pseudogenes and is not significantly biased toward longer, and therefore younger, pseudogenes.
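To make the dating step concrete, the sketch below computes Kimura's two-parameter (K2P) distance, K = -(1/2) ln(1 - 2P - Q) - (1/4) ln(1 - 2Q), where P and Q are the proportions of transitional and transversional differences between an aligned pseudogene and its parent gene. Converting K into an insertion date requires a substitution rate; for illustration only, we assume the pseudogene has evolved neutrally since insertion and take its age as K divided by a user-supplied neutral rate (`rate_per_site_per_myr`, a hypothetical parameter). The calibration used in this study may differ, and all function names are hypothetical.

```python
import math
from typing import Tuple

PURINES = set("AG")
PYRIMIDINES = set("CT")


def transition_transversion_fractions(seq1: str, seq2: str) -> Tuple[float, float]:
    """Return (P, Q): fractions of comparable aligned sites that differ by a
    transition (A<->G, C<->T) or a transversion (purine<->pyrimidine)."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to equal length")
    sites = transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguous bases
        sites += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if sites == 0:
        raise ValueError("no comparable sites")
    return transitions / sites, transversions / sites


def kimura_distance(p: float, q: float) -> float:
    """Kimura two-parameter distance K, in substitutions per site."""
    w1 = 1 - 2 * p - q
    w2 = 1 - 2 * q
    if w1 <= 0 or w2 <= 0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * math.log(w1) - 0.25 * math.log(w2)


def insertion_age_myr(pseudogene: str, parent: str,
                      rate_per_site_per_myr: float) -> float:
    """Hypothetical age estimate: K divided by an assumed neutral rate."""
    p, q = transition_transversion_fractions(pseudogene, parent)
    return kimura_distance(p, q) / rate_per_site_per_myr
```

As the text cautions, if a pseudogene was under selection for some time after insertion, fewer substitutions accumulate than under neutrality, so K, and hence the estimated age, will be too small.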