1.  Positive and Purifying Selection on the Drosophila Y Chromosome 
Molecular Biology and Evolution  2014;31(10):2612-2623.
Y chromosomes, with their reduced effective population size, lack of recombination, and male-limited transmission, present a unique collection of constraints for the operation of natural selection. Male-limited transmission may greatly increase the efficacy of selection for male-beneficial mutations, but the reduced effective size also inflates the role of random genetic drift. Together, these defining features of the Y chromosome are expected to influence rates and patterns of molecular evolution on the Y as compared with X-linked or autosomal loci. Here, we use sequence data from 11 genes in 9 Drosophila species to gain insight into the efficacy of natural selection on the Drosophila Y relative to the rest of the genome. Drosophila is an ideal system for assessing the consequences of Y-linkage for molecular evolution in part because the gene content of Drosophila Y chromosomes is highly dynamic, with orthologous genes being Y-linked in some species but autosomal in others. Our results confirm the expectation that the efficacy of natural selection at weakly selected sites is reduced on the Y chromosome. In contrast, purifying selection on the Y chromosome against strongly deleterious mutations does not appear to be compromised. Finally, we find evidence of recurrent positive selection for 4 of the 11 genes studied here. Our results thus highlight the variable nature of the mode and impact of natural selection on the Drosophila Y chromosome.
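The contrast between weakly and strongly selected sites follows from how fixation probability scales with the number of gene copies. The sketch below is a minimal illustration, not the authors' analysis: it uses Kimura's diffusion approximation and assumes an idealized population of 1,000 individuals with an equal sex ratio and effective size equal to census size, so the Y has one quarter as many copies as an autosome. The population size and selection coefficients are hypothetical.

```python
import math


def fixation_probability(copies: int, s: float) -> float:
    """Kimura's diffusion approximation for the fixation probability of one new
    mutant copy with additive (per-copy) selection coefficient s, in a population
    of `copies` gene copies (assumes effective size equals census size)."""
    if s == 0:
        return 1.0 / copies  # neutral case: fixation probability equals initial frequency
    # u = (1 - exp(-2s)) / (1 - exp(-2 * copies * s)); expm1 keeps precision for small s
    return math.expm1(-2.0 * s) / math.expm1(-2.0 * copies * s)


def relative_substitution_rate(copies: int, s: float) -> float:
    """Substitution rate relative to the neutral rate: copies * u(s)."""
    return copies * fixation_probability(copies, s)


# Hypothetical population: 1,000 individuals with an equal sex ratio.
N = 1000
AUTOSOMAL_COPIES = 2 * N  # every individual carries two autosomal copies
Y_COPIES = N // 2         # only males carry a Y, one copy each

for label, s in [("weakly deleterious", -0.0005), ("strongly deleterious", -0.05)]:
    rate_a = relative_substitution_rate(AUTOSOMAL_COPIES, s)
    rate_y = relative_substitution_rate(Y_COPIES, s)
    print(f"{label}: autosome {rate_a:.3g}, Y {rate_y:.3g}")
```

With these numbers, a weakly deleterious mutation fixes at about 0.31 times the neutral rate on an autosome but about 0.77 times on the Y, whereas a strongly deleterious mutation is effectively never fixed in either compartment, matching the direction of the pattern reported above. The sketch ignores linkage effects such as background selection, which further reduce the Y's effective size in practice.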
2.  The Globin Gene Repertoire of Lampreys: Convergent Evolution of Hemoglobin and Myoglobin in Jawed and Jawless Vertebrates 
Molecular Biology and Evolution  2014;31(10):2708-2721.
Agnathans (jawless vertebrates) occupy a key phylogenetic position for illuminating the evolution of vertebrate anatomy and physiology. Evaluation of the agnathan globin gene repertoire can thus aid efforts to reconstruct the origin and evolution of the globin genes of vertebrates, a superfamily that includes the well-known model proteins hemoglobin and myoglobin. Here, we report a comprehensive analysis of the genome of the sea lamprey (Petromyzon marinus), which revealed 23 intact globin genes and two hemoglobin pseudogenes. Analyses of the genome of the Arctic lamprey (Lethenteron camtschaticum) identified 18 full-length and five partial globin gene sequences. The majority of the globin genes in both lamprey species correspond to the known agnathan hemoglobins. Both genomes harbor two copies of globin X, an ancient globin gene that has a broad phylogenetic distribution in the animal kingdom. Surprisingly, we found no evidence for an ortholog of neuroglobin in the lamprey genomes. Expression and phylogenetic analyses identified an ortholog of cytoglobin in the lampreys; in fact, our results indicate that cytoglobin is the only orthologous vertebrate-specific globin that has been retained in both gnathostomes and agnathans. Notably, we also found two globins that are highly expressed in the heart of P. marinus, thus representing functional myoglobins. Both genes have orthologs in L. camtschaticum. Phylogenetic analyses indicate that these heart-expressed globins are not orthologous to the myoglobins of jawed vertebrates (Gnathostomata), but originated independently within the agnathans. The agnathan myoglobin and hemoglobin proteins form a monophyletic group to the exclusion of functionally analogous myoglobins and hemoglobins of gnathostomes, indicating that specialized respiratory proteins for O2 transport in the blood and O2 storage in the striated muscles evolved independently in both lineages.
This dual convergence of O2-transport and O2-storage proteins in agnathans and gnathostomes involved the convergent co-option of different precursor proteins in the ancestral globin repertoire of vertebrates.
3.  Evidence for Increased Levels of Positive and Negative Selection on the X Chromosome versus Autosomes in Humans 
Molecular Biology and Evolution  2014;31(9):2267-2282.
Partially recessive variants under positive selection are expected to go to fixation more quickly on the X chromosome as a result of hemizygosity, an effect known as faster-X. Conversely, purifying selection is expected to reduce substitution rates more effectively on the X chromosome. Previous work in humans contrasted divergence on the autosomes and X chromosome, with results tending to support the faster-X effect. However, no study has yet incorporated both divergence and polymorphism to quantify the effects of both purifying and positive selection, which are opposing forces with respect to divergence. In this study, we develop a framework that integrates previously developed theory addressing differential rates of X and autosomal evolution with methods that jointly estimate the level of purifying and positive selection via modeling of the distribution of fitness effects (DFE). We then use this framework to estimate the proportion of nonsynonymous substitutions fixed by positive selection (α) using exome sequence data from a West African population. We find that varying the female-to-male breeding ratio (β) has minimal impact on the DFE for the X chromosome, especially when compared with the effect of varying the dominance coefficient of deleterious alleles (h). Estimates of α range from 46% to 51% for the X chromosome and from 4% to 24% for the autosomes. Although the magnitude of the difference between the X-chromosome and autosomal α estimates depends on h, the difference is highly statistically significant over a range of biologically realistic parameter values, suggesting that faster-X has been operating in humans.
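The α reported here comes from a DFE-based framework that is not reproduced below. As a simpler point of reference, this sketch shows the classic McDonald–Kreitman-based estimator, α = 1 − (Ds·Pn)/(Dn·Ps), together with the standard ratio of X-linked to autosomal effective size as a function of the female-to-male breeding ratio β. Both are textbook formulas rather than the paper's method, and the counts are hypothetical. Unlike the DFE approach, this estimator does not correct for slightly deleterious polymorphisms, which tend to bias it downward.

```python
def alpha_mk(dn: int, ds: int, pn: int, ps: int) -> float:
    """Classic McDonald-Kreitman-based estimate of the proportion of
    nonsynonymous substitutions fixed by positive selection:
    alpha = 1 - (Ds * Pn) / (Dn * Ps)."""
    if dn == 0 or ps == 0:
        raise ValueError("Dn and Ps must be positive")
    return 1.0 - (ds * pn) / (dn * ps)


def x_to_autosome_ne_ratio(beta: float) -> float:
    """Ratio of X-linked to autosomal effective size for beta = Nf / Nm, using
    Ne(A) = 4*Nm*Nf / (Nm + Nf) and Ne(X) = 9*Nm*Nf / (4*Nm + 2*Nf)."""
    if beta <= 0:
        raise ValueError("beta must be positive")
    return 9.0 * (1.0 + beta) / (8.0 * (2.0 + beta))


# Hypothetical divergence (D) and polymorphism (P) counts, for illustration only.
print(round(alpha_mk(dn=60, ds=40, pn=20, ps=30), 3))  # 0.556
print(x_to_autosome_ne_ratio(1.0))  # 0.75 with an even breeding ratio
```

An excess of breeding females (β > 1) raises the X/A ratio toward its limit of 9/8, which is one reason β enters the X-versus-autosome comparison at all.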
4.  New Gene Evolution in the Bonus-TIF1-γ/TRIM33 Family Impacted the Architecture of the Vertebrate Dorsal–Ventral Patterning Network 
Molecular Biology and Evolution  2014;31(9):2309-2321.
Uncovering how a new gene acquires its function and understanding how the function of a new gene influences existing genetic networks are important topics in evolutionary biology. Here, we demonstrate nonconservation of the embryonic functions of Drosophila Bonus and its newest vertebrate relative TIF1-γ/TRIM33. We showed previously that TIF1-γ/TRIM33 functions as a ubiquitin ligase for the Smad4 signal transducer and antagonizes the Bone Morphogenetic Protein (BMP) signaling network underlying vertebrate dorsal–ventral axis formation. Here, we show that Bonus functions as an agonist of the Decapentaplegic (Dpp) signaling network underlying dorsal–ventral axis formation in flies. The absence of conservation of the roles of Bonus and TIF1-γ/TRIM33 reveals a shift in the dorsal–ventral patterning networks of flies and mice, systems that were previously considered wholly conserved. The shift occurred when the new gene TIF1-γ/TRIM33 replaced the function of the ubiquitin ligase Nedd4L in the lineage leading to vertebrates. Evidence of this replacement is our demonstration that Nedd4 performs the function of TIF1-γ/TRIM33 in flies during dorsal–ventral axis formation. The replacement allowed vertebrate Nedd4L to acquire novel functions as a ubiquitin ligase of vertebrate-specific Smad proteins. Overall, our data reveal that the architecture of the Dpp/BMP dorsal–ventral patterning network continued to evolve in the vertebrate lineage, after separation from flies, via the incorporation of new genes.
5.  Timing and Order of Transmission Events Is Not Directly Reflected in a Pathogen Phylogeny 
Molecular Biology and Evolution  2014;31(9):2472-2482.
Pathogen phylogenies are often used to infer spread among hosts. However, the pathogen phylogeny does not exactly match the host transmission history. Here, we examine the limitations of this relationship in detail. First, all splits in a pathogen phylogeny that spans more than one host occur within hosts, not at the moment of transmission; they therefore predate the transmission events, by an amount described by the pretransmission interval. Second, the order in which nodes occur in a phylogeny may reflect within-host dynamics rather than epidemiologic relationships. To investigate these phenomena, and motivated by within-host diversity patterns, we developed a two-phase coalescent model that includes a transmission bottleneck followed by linear outgrowth to a maximum population size, followed in turn by either stabilization or decline of the population. The model predicts that the pretransmission interval is shorter than predicted under a constant population size or a simple transmission bottleneck. Because lineages coalesce faster in a small population, the probability that a pathogen phylogeny resembles the transmission history depends on how long after infection a donor transmits to a new host. We also show that the probability of inferring the incorrect order of multiple transmissions from the same host is high. Finally, we compare time of HIV-1 infection inferred from genetic distances in phylogenies with independent biomarker data, and show that the pretransmission interval does bias phylogeny-based estimates of when transmissions occurred. We describe situations in which caution is needed to avoid misinterpreting which parts of a phylogeny indicate outbreaks and tight transmission clusters.
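The shortening of the pretransmission interval under within-host growth can be explored numerically. The sketch below is a deliberately simplified version of the idea, not the authors' two-phase model. It assumes the donor was infected by a single lineage, the donor's pathogen population grows linearly to a cap with no decline phase, time is measured in generations, and only the two lineages that separate at the transmission are followed. All parameter names and values are hypothetical.

```python
import math


def expected_pretransmission_interval(transmission_time: float,
                                      n0: float = 1.0,
                                      growth: float = 1.0,
                                      n_max: float = 1000.0,
                                      steps: int = 10_000) -> float:
    """Expected time (generations) from a transmission at `transmission_time`
    back to the within-donor coalescence of the two lineages separating there.

    Simplifying assumptions: the donor was infected at time 0 by one lineage,
    its pathogen population is N(t) = min(n0 + growth * t, n_max), and two
    lineages coalesce at rate 1 / N(t). Lineages that have not coalesced by
    time 0 must coalesce at the founding bottleneck.
    """
    dt = transmission_time / steps
    survival = 1.0  # probability that the two lineages have not yet coalesced
    expected = 0.0  # running integral of survival over backward time
    for i in range(steps):
        # Population size at the midpoint of this backward-time step.
        t = transmission_time - (i + 0.5) * dt
        n = min(n0 + growth * t, n_max)
        # Exact integral of survival over a step with constant rate 1 / n.
        expected += survival * n * -math.expm1(-dt / n)
        survival *= math.exp(-dt / n)
    # E[min(coalescence time, transmission_time)] is the integral of survival up to
    # transmission_time, so lineages reaching the bottleneck are already counted.
    return expected


print(round(expected_pretransmission_interval(500), 1))
print(round(expected_pretransmission_interval(500, n0=1000, growth=0), 1))
```

Under these assumptions, a transmission 500 generations after the donor's infection has an expected pretransmission interval of about 250 generations with linear growth from one lineage, versus about 393 (that is, 1000 × (1 − e^(−0.5))) if the donor's population had been constant at 1,000 throughout.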
6.  Limited Utility of Residue Masking for Positive-Selection Inference 
Molecular Biology and Evolution  2014;31(9):2496-2500.
Errors in multiple sequence alignments (MSAs) can reduce accuracy in positive-selection inference. It has therefore been suggested that MSAs be filtered before further analysis. One widely used filter, Guidance, allows users to remove MSA positions aligned with low confidence. However, Guidance’s utility in positive-selection inference has been disputed in the literature. We have conducted an extensive simulation-based study to characterize fully how Guidance impacts positive-selection inference, specifically for protein-coding sequences of realistic divergence levels. We also investigated whether novel scoring algorithms that apply a phylogenetic correction to confidence scores, and a new gap-penalization score-normalization scheme, improved Guidance’s performance. We found that no filter, including original Guidance, consistently benefited positive-selection inferences. Moreover, any improvements detected were very small, and in certain circumstances, Guidance-based filters worsened inferences.
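Residue masking, as evaluated above, replaces individual low-confidence residues rather than deleting whole alignment columns. A minimal sketch, assuming per-residue confidence scores have already been computed by a tool such as Guidance; the function name, cutoff, and mask character are illustrative and are not Guidance's own code or defaults.

```python
def mask_low_confidence(alignment: dict[str, str],
                        scores: dict[str, list[float]],
                        cutoff: float,
                        mask_char: str = "?") -> dict[str, str]:
    """Replace residues whose confidence score is below `cutoff` with `mask_char`.
    Gap characters are left untouched, so scores at gap positions are ignored."""
    masked = {}
    for name, seq in alignment.items():
        seq_scores = scores[name]
        if len(seq_scores) != len(seq):
            raise ValueError(f"score length mismatch for {name}")
        masked[name] = "".join(
            mask_char if ch != "-" and score < cutoff else ch
            for ch, score in zip(seq, seq_scores)
        )
    return masked


# Hypothetical two-sequence alignment with made-up per-residue scores.
aln = {"s1": "MK-LV", "s2": "MKALV"}
conf = {"s1": [0.99, 0.40, 0.00, 0.95, 0.90],
        "s2": [0.99, 0.97, 0.30, 0.95, 0.50]}
print(mask_low_confidence(aln, conf, cutoff=0.9))  # {'s1': 'M?-LV', 's2': 'MK?L?'}
```

Masking keeps the alignment length and the remaining residues intact, which is why it is often preferred to column removal before codon-model analyses; the study's point is that even this gentler filter rarely improves inference.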
7.  Simultaneous Bayesian Estimation of Alignment and Phylogeny under a Joint Model of Protein Sequence and Structure 
Molecular Biology and Evolution  2014;31(9):2251-2266.
For sequences that are highly divergent, there is often insufficient information to infer accurate alignments, and phylogenetic uncertainty may be high. One way to address this issue is to make use of protein structural information, since structures generally diverge more slowly than sequences. In this work, we extend a recently developed stochastic model of pairwise structural evolution to multiple structures on a tree, analytically integrating over ancestral structures to permit efficient likelihood computations under the resulting joint sequence–structure model. We observe that the inclusion of structural information significantly reduces alignment and topology uncertainty, and reduces the number of topology and alignment errors in cases where the true trees and alignments are known. In some cases, the inclusion of structure results in changes to the consensus topology, indicating that structure may contain additional information beyond that which can be obtained from sequences. We use the model to investigate the order of divergence of cytoglobins, myoglobins, and hemoglobins and observe a stabilization of phylogenetic inference: although a sequence-based inference assigns significant posterior probability to several different topologies, the structural model strongly favors one of these over the others and is more robust to the choice of data set.
8.  Evolutionary Origins of Human Herpes Simplex Viruses 1 and 2 
Molecular Biology and Evolution  2014;31(9):2356-2364.
Herpesviruses have been infecting and codiverging with their vertebrate hosts for hundreds of millions of years. The primate simplex viruses exemplify this pattern of virus–host codivergence, at a minimum, as far back as the most recent common ancestor of New World monkeys, Old World monkeys, and apes. Humans are the only primate species known to be infected with two distinct herpes simplex viruses: HSV-1 and HSV-2. Human herpes simplex viruses are ubiquitous, with over two-thirds of the human population infected by at least one virus. Here, we investigated whether the additional human simplex virus is the result of ancient viral lineage duplication or cross-species transmission. We found that standard phylogenetic models of nucleotide substitution are inadequate for distinguishing among these competing hypotheses: the high level of synonymous substitution (i.e., saturation at synonymous sites) causes substantial underestimation of the lengths of some branches in the phylogeny, consistent with observations in other viruses (e.g., avian influenza, Ebola, and coronaviruses). To estimate ancient viral divergence times more accurately, we applied a branch-site random effects likelihood model of molecular evolution that allows the strength of natural selection to vary across both the viral phylogeny and the gene alignment. This selection-informed model favored a scenario in which HSV-1 is the result of ancient codivergence and HSV-2 arose from a cross-species transmission event from the ancestor of modern chimpanzees to an extinct Homo precursor of modern humans, around 1.6 Ma. These results provide a new framework for understanding human herpes simplex virus evolution and demonstrate the importance of using selection-informed models of sequence evolution when investigating viral origin hypotheses.
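Branch-length underestimation at heavily substituted sites is easiest to see under the simplest substitution model. The sketch below uses the Jukes–Cantor (JC69) formulas purely to illustrate saturation; it is not the branch-site random effects likelihood model used in the study.

```python
import math


def jc69_expected_p(d: float) -> float:
    """Expected proportion of differing sites after d substitutions per site (JC69)."""
    return 0.75 * (1.0 - math.exp(-4.0 * d / 3.0))


def jc69_distance(p: float) -> float:
    """JC69-corrected distance from an observed proportion of differing sites p."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75); the JC69 correction diverges at saturation")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


for d in (0.1, 0.5, 2.0, 5.0):
    print(f"true d = {d:>4}: observed p = {jc69_expected_p(d):.3f}")
```

Between d = 2 and d = 5 substitutions per site, the expected proportion of differing sites rises only from about 0.70 to about 0.75, so small errors in the observed proportion near saturation translate into large errors in estimated branch lengths.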
9.  Prospects for Building Large Timetrees Using Molecular Data with Incomplete Gene Coverage among Species 
Molecular Biology and Evolution  2014;31(9):2542-2550.
Scientists are assembling sequence data sets from increasing numbers of species and genes to build comprehensive timetrees. However, data are often unavailable for some species and gene combinations, and the proportion of missing data is often large for data sets containing many genes and species. Surprisingly, there has not been a systematic analysis of the effect of the degree of sparseness of the species–gene matrix on the accuracy of divergence time estimates. Here, we present results from computer simulations and empirical data analyses to quantify the impact of missing gene data on divergence time estimation in large phylogenies. We found that estimates of divergence times were robust even when sequences from a majority of genes for most of the species were absent. From the analysis of such extremely sparse data sets, we found that the most egregious errors occurred at nodes for which no pair of species, one drawn from each of the node's immediate descendant clades, shared any gene. These problematic nodes can be easily detected prior to computational analyses based only on the input sequence alignment and the tree topology. We conclude that it is best to use larger alignments, because adding both genes and species to the alignment augments the number of genes available for estimating divergence events deep in the tree and improves their time estimates.
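The problematic nodes described above can be flagged from the tree and the species–gene coverage alone. A minimal sketch, assuming a fully bifurcating rooted tree written as nested tuples of species names and a mapping from each species to the set of genes sequenced for it; the representation and function name are hypothetical. It relies on the fact that some pair of species drawn from the two child clades shares a gene exactly when the union of genes in one clade intersects the union in the other.

```python
from typing import Union

Tree = Union[str, tuple]  # a leaf name, or a (left, right) pair of subtrees


def find_problematic_nodes(tree: Tree, coverage: dict[str, set[str]]) -> list[tuple[str, ...]]:
    """Return the leaf sets of internal nodes whose two child clades share no gene."""
    flagged = []

    def walk(node):
        # Returns (leaves in this clade, union of genes sequenced in this clade).
        if isinstance(node, str):
            return {node}, set(coverage.get(node, set()))
        if len(node) != 2:
            raise ValueError("this sketch assumes a fully bifurcating tree")
        left, right = node
        left_leaves, left_genes = walk(left)
        right_leaves, right_genes = walk(right)
        # No cross-clade species pair shares a gene iff the gene unions are disjoint.
        if not (left_genes & right_genes):
            flagged.append(tuple(sorted(left_leaves | right_leaves)))
        return left_leaves | right_leaves, left_genes | right_genes

    walk(tree)
    return flagged


# Hypothetical five-species example.
tree = ((("A", "B"), "C"), ("D", "E"))
coverage = {"A": {"g1"}, "B": {"g1", "g2"}, "C": {"g2"}, "D": {"g3"}, "E": {"g3", "g4"}}
print(find_problematic_nodes(tree, coverage))  # [('A', 'B', 'C', 'D', 'E')]
```

Here only the root is flagged, because no gene sampled in (A, B, C) was also sampled in (D, E); the check runs in one pass over the tree before any dating analysis.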
10.  Evolution in an Ancient Detoxification Pathway Is Coupled with a Transition to Herbivory in the Drosophilidae 
Molecular Biology and Evolution  2014;31(9):2441-2456.
Chemically defended plant tissues present formidable barriers to herbivores. Although mechanisms to resist plant defenses have been identified in ancient herbivorous lineages, adaptations to overcome plant defenses during transitions to herbivory remain relatively unexplored. The fly genus Scaptomyza is nested within the genus Drosophila and includes species that feed on the living tissue of mustard plants (Brassicaceae), yet this lineage is derived from microbe-feeding ancestors. We found that mustard-feeding Scaptomyza species and microbe-feeding Drosophila melanogaster detoxify mustard oils, the primary chemical defenses in the Brassicaceae, using the widely conserved mercapturic acid pathway. This detoxification strategy differs from other specialist herbivores of mustard plants, which possess derived mechanisms to obviate mustard oil formation. To investigate whether mustard feeding is coupled with evolution in the mercapturic acid pathway, we profiled functional and molecular evolutionary changes in the enzyme glutathione S-transferase D1 (GSTD1), which catalyzes the first step of the mercapturic acid pathway and is induced by mustard defense products in Scaptomyza. GSTD1 acquired elevated activity against mustard oils in one mustard-feeding Scaptomyza species in which GstD1 was duplicated. Structural analysis and mutagenesis revealed that substitutions at conserved residues within and near the substrate-binding cleft account for most of this increase in activity against mustard oils. Functional evolution of GSTD1 was coupled with signatures of episodic positive selection in GstD1 after the evolution of herbivory. Overall, we found that preexisting functions of generalized detoxification systems, and their refinement by natural selection, could play a central role in the evolution of herbivory.
11.  The Coevolutionary Period of Wolbachia pipientis Infecting Drosophila ananassae and Its Impact on the Evolution of the Host Germline Stem Cell Regulating Genes 
Molecular Biology and Evolution  2014;31(9):2457-2471.
The endosymbiotic bacterium Wolbachia pipientis is known to infect a wide range of arthropod species, yet less is known about its coevolutionary history with its hosts. The presence of nearly identical W. pipientis strains in evolutionarily divergent hosts suggests horizontal transfer between hosts. For example, Drosophila ananassae is infected with a W. pipientis strain that is nearly identical in sequence to a strain that infects both D. simulans and D. suzukii, suggesting recent horizontal transfer among these three species. However, it is unknown whether the W. pipientis strain recently invaded all three species or whether more complex infection dynamics underlie the horizontal transfers. Here, we have examined the coevolutionary history of D. ananassae and its resident W. pipientis to infer its period of infection. Phylogenetic analysis of D. ananassae mitochondrial DNA and W. pipientis DNA sequence diversity revealed that the current W. pipientis infection is not recent. In addition, we examined the population genetics and molecular evolution of several germline stem cell (GSC) regulating genes of D. ananassae. These studies reveal significant evidence of recent and long-term positive selection at stonewall in D. ananassae, whereas pumilio showed patterns of variation consistent with only recent positive selection. Previous studies had found evidence for adaptive evolution of two key germline differentiation genes, bag of marbles (bam) and benign gonial cell neoplasm (bgcn), in D. melanogaster and D. simulans and proposed that the adaptive evolution at these two genes was driven by an arms race between the host GSC and W. pipientis. However, we did not find any statistical departures from a neutral model of evolution for bam and bgcn in D. ananassae despite our new evidence that this species has been infected with W. pipientis for a period longer than the most recent infection in D. melanogaster.
In the end, analyzing the GSC regulating genes individually showed evidence of selection at two of the seven genes. However, when the data for all genes were combined and a specific population genetic model was fitted, a significant proportion of the nonsynonymous substitutions across the GSC regulating genes were inferred to have been driven to fixation by positive selection. The GSC system is clearly evolving rapidly, and multiple drivers may be causing this rapid evolution.
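The single-gene analyses above test for departures from neutrality. The abstract does not list the specific tests, but a standard one for coding sequences is the McDonald–Kreitman test, which compares nonsynonymous and synonymous changes fixed between species with those polymorphic within a species. A minimal sketch using a two-sided Fisher's exact test on the 2×2 table; function names are illustrative.

```python
from math import comb


def fisher_exact_two_sided(table: list[list[int]]) -> float:
    """Two-sided Fisher's exact test p-value for a 2x2 table [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / total

    observed = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more probable than the observed one
    # (a small relative tolerance guards against floating-point ties).
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= observed * (1 + 1e-9))
    return min(1.0, p)


def mk_test(dn: int, ds: int, pn: int, ps: int) -> float:
    """McDonald-Kreitman test: rows are fixed (D) vs polymorphic (P) changes,
    columns are nonsynonymous (n) vs synonymous (s)."""
    return fisher_exact_two_sided([[dn, ds], [pn, ps]])


# Hypothetical counts for one gene.
print(round(mk_test(dn=12, ds=4, pn=3, ps=9), 4))
```

Pooling counts across genes, as in the combined analysis described above, increases power but assumes the genes share a common proportion of adaptive substitutions.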
12.  Cis-Regulatory Changes Associated with a Recent Mating System Shift and Floral Adaptation in Capsella 
Molecular Biology and Evolution  2015;32(10):2501-2514.
The selfing syndrome constitutes a suite of floral and reproductive trait changes that have evolved repeatedly across many evolutionary lineages in response to the shift to selfing. Convergent evolution of the selfing syndrome suggests that these changes are adaptive, yet our understanding of the detailed molecular genetic basis of the selfing syndrome remains limited. Here, we investigate the role of cis-regulatory changes during the recent evolution of the selfing syndrome in Capsella rubella, which split from the outcrosser Capsella grandiflora less than 200 ka. We assess allele-specific expression (ASE) in leaves and flower buds at a total of 18,452 genes in three interspecific F1 C. grandiflora × C. rubella hybrids. Using a hierarchical Bayesian approach that accounts for technical variation using genomic reads, we find evidence for extensive cis-regulatory changes. On average, 44% of the assayed genes show evidence of ASE; however, only 6% show strong allelic expression biases. Flower buds, but not leaves, show an enrichment of cis-regulatory changes in genomic regions responsible for floral and reproductive trait divergence between C. rubella and C. grandiflora. We further detected an excess of heterozygous transposable element (TE) insertions near genes with ASE, and TE insertions targeted by uniquely mapping 24-nt small RNAs were associated with reduced expression of nearby genes. Our results suggest that cis-regulatory changes have been important during the recent adaptive floral evolution in Capsella and that differences in TE dynamics between selfing and outcrossing species could be important for rapid regulatory divergence in association with mating system shifts.
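The study's hierarchical Bayesian model borrows information across genes; the sketch below is a much simpler per-gene stand-in that conveys the core idea of using genomic reads to set the expected allele ratio. It assumes read counts for each parental allele in the F1 hybrid's RNA and DNA at a heterozygous site (or summed over a gene), treats the genomic fraction as known, and applies an exact two-sided binomial test. All names are hypothetical.

```python
import math


def binomial_two_sided_p(k: int, n: int, p: float) -> float:
    """Exact two-sided binomial test: total probability of outcomes no more likely than k."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")

    def log_pmf(x: int) -> float:
        return (math.lgamma(n + 1) - math.lgamma(x + 1) - math.lgamma(n - x + 1)
                + x * math.log(p) + (n - x) * math.log1p(-p))

    observed = log_pmf(k)
    total = sum(math.exp(log_pmf(x)) for x in range(n + 1)
                if log_pmf(x) <= observed + 1e-9)
    return min(1.0, total)


def ase_test(rna_ref: int, rna_alt: int, dna_ref: int, dna_alt: int) -> tuple[float, float, float]:
    """Compare the RNA reference-allele fraction with the genomic-read fraction,
    which serves as the null expectation and absorbs mapping or technical bias.
    Returns (RNA fraction, genomic baseline, p-value)."""
    baseline = dna_ref / (dna_ref + dna_alt)
    n = rna_ref + rna_alt
    return rna_ref / n, baseline, binomial_two_sided_p(rna_ref, n, baseline)


# Hypothetical counts: 90 of 100 RNA reads from one parent, balanced genomic reads.
print(ase_test(rna_ref=90, rna_alt=10, dna_ref=100, dna_alt=100))
```

Using the genomic fraction rather than a fixed 0.5 as the null is what lets reference-mapping bias cancel out; the Bayesian model in the study additionally propagates uncertainty in that baseline.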
13.  Erasing Errors due to Alignment Ambiguity When Estimating Positive Selection 
Molecular Biology and Evolution  2014;31(8):1979-1993.
Current estimates of diversifying positive selection rely on first having an accurate multiple sequence alignment. Simulation studies have shown that under biologically plausible conditions, relying on a single estimate of the alignment from commonly used alignment software can lead to unacceptably high false-positive rates in detecting diversifying positive selection. We present a novel statistical method that eliminates excess false positives resulting from alignment error by jointly estimating the degree of positive selection and the alignment under an evolutionary model. Our model treats both substitutions and insertions/deletions as sequence changes on a tree and allows site heterogeneity in the substitution process. We conduct inference starting from unaligned sequence data by integrating over all alignments. This approach naturally accounts for ambiguous alignments without requiring ambiguously aligned sites to be identified and removed prior to analysis. We take a Bayesian approach and conduct inference using Markov chain Monte Carlo to integrate over all alignments on a fixed evolutionary tree topology. We introduce a Bayesian version of the branch-site test and assess the evidence for positive selection using Bayes factors. We compare two models of differing dimensionality using a simple alternative to reversible-jump methods. We also describe a more accurate method of estimating the Bayes factor using Rao-Blackwellization. We then show using simulated data that jointly estimating the alignment and the presence of positive selection solves the problem with excessive false positives from erroneous alignments and has nearly the same power to detect positive selection as when the true alignment is known. We also show that samples taken from the posterior alignment distribution using the software BAli-Phy have substantially lower alignment error compared with MUSCLE, MAFFT, PRANK, and FSA alignments.
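The abstract does not give the details of its Bayes factor estimator, so the following is a generic sketch under stated assumptions: the MCMC sampler includes an indicator for the positive-selection model, and in each sample the conditional probability that the indicator equals 1, given all other sampled quantities, can be computed. Averaging those conditional probabilities (Rao–Blackwellization) rather than counting indicator values typically reduces Monte Carlo variance, and the Bayes factor is the resulting posterior odds divided by the prior odds. Function names are hypothetical.

```python
import math


def bayes_factor(posterior_prob: float, prior_prob: float) -> float:
    """Bayes factor for model 1 vs model 0: posterior odds divided by prior odds."""
    if posterior_prob >= 1.0:
        return math.inf
    posterior_odds = posterior_prob / (1.0 - posterior_prob)
    prior_odds = prior_prob / (1.0 - prior_prob)
    return posterior_odds / prior_odds


def bf_from_indicators(indicators: list[int], prior_prob: float) -> float:
    """Naive estimate: fraction of MCMC samples in which the model indicator is 1."""
    return bayes_factor(sum(indicators) / len(indicators), prior_prob)


def bf_rao_blackwell(conditional_probs: list[float], prior_prob: float) -> float:
    """Rao-Blackwellized estimate: average over MCMC samples of
    P(model = 1 | all other sampled parameters)."""
    return bayes_factor(sum(conditional_probs) / len(conditional_probs), prior_prob)
```

If every sample has the indicator set to 1, the naive estimate returns an infinite Bayes factor, whereas the Rao–Blackwellized estimate stays finite; for example, conditional probabilities of 0.8 in every sample with a prior probability of 0.5 give a Bayes factor of 4.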
14.  Expression Evolution Facilitated the Convergent Neofunctionalization of a Sodium Channel Gene 
Molecular Biology and Evolution  2014;31(8):1941-1955.
Ion channels have played a substantial role in the evolution of novel traits across all of the domains of life. A fascinating example of a novel adaptation is the convergent evolution of electric organs in the Mormyroid and Gymnotiform electric fishes. The regulated currents that flow through ion channels directly generate the electrical signals that have evolved in these fish. Here, we investigated how the expression evolution of two sodium channel paralogs (Scn4aa and Scn4ab) influenced their convergent molecular evolution following the teleost-specific whole-genome duplication. We developed a reliable assay to accurately measure the expression stoichiometry of these genes and used this technique to analyze relative expression of the duplicate genes in a phylogenetic context. We found that before a major shift in expression from skeletal muscle and neofunctionalization in the muscle-derived electric organ, Scn4aa was first downregulated in the ancestors of both electric lineages. This indicates that the convergent evolution of this gene was underlain by a greater propensity toward neofunctionalization, owing to its decreased expression relative to its paralog Scn4ab. We investigated another derived muscle tissue, the sonic organ of Porichthys notatus, and showed that, as in the electric fishes, Scn4aa again shows a radical shift in expression away from the ancestral muscle cells into the evolutionarily novel muscle-derived tissue. This study presents evidence that expression downregulation facilitates neofunctionalization after gene duplication, a pattern that may often set the stage for novel trait evolution.
15.  A Small System—High-Resolution Study of Metabolic Adaptation in the Central Metabolic Pathway to Temperate Climates in Drosophila melanogaster 
Molecular Biology and Evolution  2014;31(8):2032-2041.
In this article, we couple geographic variation in the frequencies of 127 single-nucleotide polymorphisms (SNPs) in genes of 46 enzymes of central metabolism with their associated cis-expression variation to predict latitudinal or climate-driven gene expression changes in the metabolic architecture of Drosophila melanogaster. Forty-two percent of the SNPs in 65% of the genes show statistically significant clines in frequency with latitude across the 20 local population samples collected from southern Florida to Ontario. A number of SNPs in the screened genes are also associated with significant expression variation within the Raleigh population from North Carolina. A principal component analysis of the full variance–covariance matrix of latitudinal changes in SNP-associated standardized gene expression allows us to identify those major genes in the pathway and its associated branches that are likely targets of natural selection. When these apparent targets are placed in the context of central metabolism, we find that they are concentrated in the genes of the upper glycolytic pathway and pentose shunt, those controlling glycerol shuttle activity, and finally those enzymes associated with the utilization of glutamate and pyruvate. These metabolites possess high connectivity and thus may be the points where flux balance can best be shifted. We also propose that these are conserved points associated with coupling energy homeostasis and energy sensing in mammals. We speculate that the modulation of gene expression at specific points in central metabolism that are associated with shifting flux balance or possibly energy-state sensing plays a role in adaptation to climatic variation.
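A latitudinal cline in SNP frequency can be summarized by regressing allele frequency on the latitude of each population sample. The sketch below is a plain least-squares fit on hypothetical data; the abstract does not describe the study's own cline test, and a careful analysis would also account for binomial sampling variance in the frequency estimates.

```python
import math


def cline_fit(latitudes: list[float], freqs: list[float]) -> tuple[float, float, float]:
    """Ordinary least-squares fit freq = intercept + slope * latitude.
    Returns (slope, intercept, Pearson r)."""
    n = len(latitudes)
    if n != len(freqs) or n < 3:
        raise ValueError("need at least three paired observations")
    mean_x = sum(latitudes) / n
    mean_y = sum(freqs) / n
    sxx = sum((x - mean_x) ** 2 for x in latitudes)
    syy = sum((y - mean_y) ** 2 for y in freqs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(latitudes, freqs))
    slope = sxy / sxx                     # change in allele frequency per degree latitude
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)        # strength of the linear cline
    return slope, intercept, r


# Hypothetical SNP frequencies from five population samples along a latitudinal transect.
lats = [25.5, 30.4, 35.8, 39.9, 44.1]
freqs = [0.22, 0.31, 0.35, 0.46, 0.52]
print(cline_fit(lats, freqs))
```

The slope gives the per-degree change in allele frequency; combining such slopes with per-allele expression effects is the kind of input a latitudinal expression covariance analysis would build on.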
16.  Accelerated Evolution of Morph-Biased Genes in Pea Aphids 
Molecular Biology and Evolution  2014;31(8):2073-2083.
Phenotypic plasticity, the production of alternative phenotypes (or morphs) from the same genotype due to environmental factors, results in some genes being expressed in a morph-biased manner. Theoretically, these morph-biased genes experience relaxed selection, the consequence of which is the buildup of slightly deleterious mutations at these genes. Over time, this is expected to result in increased protein divergence at these genes between species and a signature of relaxed purifying selection within species. Here we test these theoretical expectations using morph-biased genes in the pea aphid, a species that produces multiple morphs via polyphenism. We find that morph-biased genes exhibit faster rates of evolution (in terms of dN/dS) relative to unbiased genes and that divergence generally increases with increasing morph bias. Further, genes with expression biased toward rarer morphs (sexual females and males) show faster rates of evolution than genes expressed in the more common morph (asexual females), demonstrating that the amount of time a gene spends being expressed in a morph is associated with its rate of evolution. Finally, we show that genes expressed in the rarer morphs experience decreased purifying selection relative to unbiased genes, suggesting that relaxed purifying selection contributes to their faster rates of evolution. Our results provide an important empirical test of the impact of phenotypic plasticity on gene evolution.
17.  Clawing through Evolution: Toxin Diversification and Convergence in the Ancient Lineage Chilopoda (Centipedes) 
Molecular Biology and Evolution  2014;31(8):2124-2148.
Despite the staggering diversity of venomous animals, there seems to be remarkable convergence in regard to the types of proteins used as toxin scaffolds. However, our understanding of this fascinating area of evolution has been hampered by the narrow taxonomic range studied, with entire groups of venomous animals remaining almost completely unstudied. One such group is centipedes, class Chilopoda, which emerged about 440 Ma and may represent the oldest terrestrial venomous lineage next to scorpions. Here, we provide the first comprehensive insight into the chilopod “venome” and its evolution, which has revealed novel and convergent toxin recruitments as well as entirely new toxin families among both high- and low-molecular-weight venom components. The ancient evolutionary history of centipedes is also apparent from the differences between the venoms of Scolopendromorpha and Scutigeromorpha, orders that diverged over 430 Ma and appear to employ substantially different venom strategies. The presence of a wide range of novel proteins and peptides in centipede venoms highlights these animals as a rich source of novel bioactive molecules. Understanding the evolutionary processes behind these ancient venom systems will not only broaden our understanding of which traits make proteins and peptides amenable to neofunctionalization but may also aid in directing bioprospecting efforts.
18.  Studying Tumorigenesis through Network Evolution and Somatic Mutational Perturbations in the Cancer Interactome 
Molecular Biology and Evolution  2014;31(8):2156-2169.
Cells govern biological functions through complex biological networks. Perturbations to networks may drive cells to new phenotypic states, for example, tumorigenesis. Identifying how genetic lesions perturb molecular networks is a fundamental challenge. This study used large-scale human interactome data to systematically explore the relationship among network topology, somatic mutation, evolutionary rate, and evolutionary origin of cancer genes. We found that cancer proteins have a distinctive network centrality that is largely independent of gene essentiality. Cancer genes have likely experienced lower evolutionary rates and stronger purifying selection than noncancer, Mendelian disease, and orphan disease genes. Cancer proteins tend to have ancient histories, likely originating in early metazoans, although they are younger than proteins encoded by Mendelian disease genes, orphan disease genes, and essential genes. We found that protein evolutionary origin (age) positively correlates with protein connectivity in the human interactome. Furthermore, we investigated the network-attacking perturbations due to somatic mutations identified from 3,268 tumors across 12 cancer types in The Cancer Genome Atlas. We observed a positive correlation between protein connectivity and the number of nonsynonymous somatic mutations, but only a weak or nonsignificant correlation between protein connectivity and the number of synonymous somatic mutations. These observations suggest that somatic mutational network-attacking perturbations to hub genes play an important role in tumor emergence and evolution. Collectively, this work has broad biomedical implications for both basic cancer biology and the development of personalized cancer therapy.
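The abstract reports correlations between protein connectivity and somatic mutation counts without naming the correlation measure. Because both connectivity and mutation counts are typically skewed, a rank-based Spearman correlation is a common choice; the sketch below computes it with average ranks for ties. The inputs would be per-protein degree and mutation counts; names and data here are hypothetical.

```python
import math


def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x: list[float], y: list[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman's rho: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical per-protein interactome degree and nonsynonymous somatic mutation counts.
degree = [3, 12, 45, 7, 150, 22]
nonsyn = [1, 4, 9, 2, 30, 4]
print(round(spearman(degree, nonsyn), 3))
```

A rank correlation is insensitive to a few extreme hubs dominating the result, which matters for interactome degree distributions.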
19.  Relocation Facilitates the Acquisition of Short Cis-Regulatory Regions that Drive the Expression of Retrogenes during Spermatogenesis in Drosophila 
Molecular Biology and Evolution  2014;31(8):2170-2180.
Retrogenes are functional processed copies of genes that originate via the retrotranscription of an mRNA intermediate and often exhibit testis-specific expression. Although this expression pattern appears to be favored by selection, the origin of such expression bias remains unexplained. Here, we study the regulation of two young testis-specific Drosophila retrogenes, Dntf-2r and Pros28.1A, using genetic transformation and the enhanced green fluorescent protein reporter gene in Drosophila melanogaster. We show that two different short (<24 bp) regions upstream of the transcription start sites (TSSs) act as testis-specific regulatory motifs in these genes. The Dntf-2r regulatory region is similar to the known β2 tubulin 14-bp testis motif (β2-tubulin gene upstream element 1 [β2-UE1]). Comparative sequence analyses reveal that this motif was already present before the Dntf-2r insertion and was likely driving the transcription of a noncoding RNA. We also show that the β2-UE1 occurs in the regulatory regions of other testis-specific retrogenes and is functional in either orientation. In contrast, the Pros28.1A testis regulatory region in D. melanogaster appears to be novel. Only Pros28.1B, an older paralog of the Pros28.1 gene family, seems to carry a similar regulatory sequence. It is unclear how the Pros28.1A regulatory region was acquired in D. melanogaster, but it might have evolved de novo from within a region that may have been preprimed for testis expression. We conclude that relocation is critical for the evolutionary origin of male germline-specific cis-regulatory regions of retrogenes because expression depends on either the site of the retrogene insertion or the sequence changes close to the TSS thereafter. As a consequence, we infer that positive selection will play a role in the evolution of these regulatory regions and can often act from the moment of the retrocopy insertion.
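A motif that is functional in either orientation can be located by scanning an upstream region for the motif and for its reverse complement. A minimal sketch follows; the motif `GATTACA` and the upstream sequence are hypothetical placeholders, not the β2-UE1 sequence, which the abstract does not give.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_motif_both_strands(seq, motif):
    """Return (start, strand) for each exact motif match on either strand.

    Start positions are 0-based on the given (+) strand. A palindromic motif
    equals its own reverse complement and is reported once, as "+".
    """
    seq, motif = seq.upper(), motif.upper()
    rc = reverse_complement(motif)
    k = len(motif)
    hits = []
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window == motif:
            hits.append((i, "+"))
        elif window == rc:
            hits.append((i, "-"))
    return hits


# Hypothetical motif and upstream region, not real Drosophila sequence.
upstream = "CCGATTACATTTGTAATCGG"
print(find_motif_both_strands(upstream, "GATTACA"))  # [(2, '+'), (11, '-')]
```

Exact matching keeps the sketch simple; real regulatory motifs are usually searched with a tolerance for mismatches or with a position weight matrix.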
20.  Alternative Splice in Alternative Lice 
Molecular Biology and Evolution  2015;32(10):2749-2759.
Genomic and transcriptomic analyses have revealed human head and body lice to be almost genetically identical; although conspecific, they nevertheless occupy distinct ecological niches and have differing feeding patterns. Most importantly, while head lice are not known to be vector competent, body lice can transmit three serious bacterial diseases: epidemic typhus, trench fever, and relapsing fever. In order to gain insights into the molecular bases for these differences, we analyzed alternative splicing (AS) using next-generation sequencing data for one strain of head lice and one strain of body lice. We identified a total of 3,598 AS events that were specific to either head or body lice. Exon skipping AS events were overrepresented among both head and body lice, whereas intron retention events were underrepresented in both. However, both the enrichment of exon skipping and the underrepresentation of intron retention are significantly stronger in body lice than in head lice. Genes containing body louse-specific AS events were found to be significantly enriched for functions associated with development of the nervous system, salivary gland, trachea, and ovarian follicle cells, as well as regulation of transcription. In contrast, no functional categories were overrepresented among genes with head louse-specific AS events. Together, our results constitute the first evidence for transcript pool differences in head and body lice, providing insights into molecular adaptations that enabled human lice to adapt to clothing, and representing a powerful illustration of the pivotal role AS can play in functional adaptation.
21.  Extraordinary Genetic Diversity in a Wood Decay Mushroom 
Molecular Biology and Evolution  2015;32(10):2775-2783.
Populations of different species vary in the amount of genetic diversity they possess. Nucleotide diversity π, the fraction of nucleotides that differ between two randomly chosen genotypes, is known to range in eukaryotes from 0.0001 in Lynx lynx to 0.16 in Caenorhabditis brenneri. Here, we report the results of a comparative analysis of 24 haploid genotypes (12 from the United States and 12 from European Russia) of the split-gill fungus Schizophyllum commune. The diversity at synonymous sites is 0.20 in the American population of S. commune and 0.13 in the Russian population. This exceptionally high level of nucleotide diversity also leads to extreme amino acid diversity of protein-coding genes. Using whole-genome resequencing of 2 parental and 17 offspring haploid genotypes, we estimate that the mutation rate in S. commune is high, at 2.0 × 10⁻⁸ (95% CI: 1.1 × 10⁻⁸ to 4.1 × 10⁻⁸) per nucleotide per generation. Therefore, the high diversity of S. commune is primarily determined by its elevated mutation rate, although a high effective population size likely also plays a role. Its small genome size, ease of cultivation and completion of the life cycle in the laboratory, free-living haploid life stages, and exceptionally high variability make S. commune a promising model organism for population, quantitative, and evolutionary genetics.
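The definition of π used in this abstract (the fraction of nucleotides that differ between two randomly chosen genotypes) can be computed from aligned sequences as the mean per-site pairwise difference over all pairs. This sketch assumes gap-free, equal-length alignments and uses hypothetical toy sequences; real analyses typically also handle gaps, missing data, and site classes such as the synonymous sites reported above.

```python
from itertools import combinations


def nucleotide_diversity(seqs):
    """Mean proportion of differing sites over all pairs of aligned sequences."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to equal length")
    pairs = list(combinations(seqs, 2))
    differences = sum(
        sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs
    )
    return differences / (len(pairs) * length)


# Three hypothetical haploid genotypes, 10 aligned sites each.
genotypes = ["ACGTACGTAC", "ACGTACGTAA", "ACCTACGTAA"]
print(nucleotide_diversity(genotypes))  # 4 differences / (3 pairs * 10 sites)
```

Averaging over all pairs is equivalent to the expected difference between two genotypes drawn at random without replacement from the sample, which matches the verbal definition.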
22.  Limited Gene Misregulation Is Exacerbated by Allele-Specific Upregulation in Lethal Hybrids between Drosophila melanogaster and Drosophila simulans 
Molecular Biology and Evolution  2014;31(7):1767-1778.
Misregulation of gene expression is often observed in interspecific hybrids and is generally attributed to regulatory incompatibilities caused by divergence between the two genomes. However, it has been challenging to distinguish effects of regulatory divergence from secondary effects including developmental and physiological defects common to hybrids. Here, we use RNA-Seq to profile gene expression in F1 hybrid male larvae from crosses of Drosophila melanogaster to its sibling species D. simulans. We analyze lethal and viable hybrid males, the latter produced using a mutation in the X-linked D. melanogaster Hybrid male rescue (Hmr) gene, and compare them with their parental species and with public data sets of gene expression across development. We find that Hmr has drastically different effects on the parental and hybrid genomes, demonstrating that hybrid incompatibility genes can exhibit novel properties in the hybrid genetic background. Additionally, we find that D. melanogaster alleles are preferentially affected between lethal and viable hybrids. We further determine that many of the differences between the hybrids result from developmental delay in the Hmr+ hybrids. Finally, we find surprisingly modest expression differences in hybrids when compared with the parents, with only 9% and 4% of genes deviating from additivity or being expressed outside the parental range, respectively. Most of these differences can be attributed to developmental delay and differences in tissue types. Overall, our study suggests that hybrid gene misexpression is prone to overestimation and that even between species separated by approximately 2.5 Ma, regulatory incompatibilities are not widespread in hybrids.
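The two categories in this abstract, deviation from additivity and expression outside the parental range, can be illustrated with a simple per-gene rule. In this sketch, additivity is judged against the midparent value using a hypothetical 25% tolerance; the study presumably used statistical tests on RNA-Seq counts, which are not reproduced here, and the gene values are invented.

```python
def classify_hybrid_expression(mel, sim, hybrid, tolerance=0.25):
    """Classify one gene's hybrid expression against its two parents.

    mel, sim, hybrid: normalized expression levels for the gene.
    tolerance: hypothetical allowed relative deviation from the midparent.
    Returns (nonadditive, outside_parental_range).
    """
    midparent = (mel + sim) / 2
    nonadditive = abs(hybrid - midparent) > tolerance * midparent
    outside_range = hybrid < min(mel, sim) or hybrid > max(mel, sim)
    return nonadditive, outside_range


# Hypothetical genes: (D. melanogaster, D. simulans, hybrid) levels.
genes = {
    "geneA": (10.0, 20.0, 15.0),  # at midparent: additive, within range
    "geneB": (10.0, 20.0, 19.0),  # nonadditive but still within range
    "geneC": (10.0, 20.0, 25.0),  # nonadditive and above both parents
}
calls = {g: classify_hybrid_expression(*v) for g, v in genes.items()}
frac_nonadditive = sum(n for n, _ in calls.values()) / len(calls)
frac_outside = sum(o for _, o in calls.values()) / len(calls)
print(calls)
print(frac_nonadditive, frac_outside)
```

The toy genes show why the two percentages in the abstract differ: a gene can deviate from additivity while still falling between the parental values, so the outside-range set is the smaller, stricter category.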
23.  Inferring the Evolutionary History of Primate microRNA Binding Sites: Overcoming Motif Counting Biases 
Molecular Biology and Evolution  2014;31(7):1894-1901.
The first microRNAs (miRNAs) were identified as essential, conserved regulators of gene expression, targeting the same genes across nearly all bilaterians. However, there are also prominent examples of conserved miRNAs whose functions appear to have shifted dramatically, sometimes over very brief periods of evolutionary time. To determine whether the functions of conserved miRNAs are stable or dynamic over evolutionary time scales, we have defined the neutral turnover rates of short sequence motifs in predicted primate 3′-UTRs. We find that commonly used approaches to quantify motif turnover rates, which use presence/absence scoring in extant lineages to infer ancestral states, are inherently biased to infer the accumulation of new motifs, leading to the false inference of continually increasing regulatory complexity over time. Using a maximum likelihood approach to reconstruct individual ancestral nucleotides, we observe that binding sites of conserved miRNAs in fact have roughly equal numbers of gain and loss events relative to ancestral states and turn over extremely slowly relative to nearly identical permutations of the same motif. Contrary to case studies showing examples of functional turnover, our systematic study of miRNA binding sites suggests that in primates, the regulatory roles of conserved miRNAs are strongly conserved. Our revised methodology may be used to quantify the mechanism by which regulatory networks evolve.
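Counting gains and losses relative to an ancestral sequence, rather than scoring motif presence or absence in extant lineages, can be sketched as a comparison of motif start positions at the site level. This example assumes the ancestral sequence has already been reconstructed (the maximum likelihood step is not shown) and is aligned without gaps to the descendant; the 6-nt motif and the sequences are hypothetical.

```python
def motif_positions(seq, motif):
    """0-based start positions of exact motif matches in seq."""
    k = len(motif)
    return {i for i in range(len(seq) - k + 1) if seq[i:i + k] == motif}


def gains_and_losses(ancestor, descendant, motif):
    """Motif sites gained and lost in descendant relative to ancestor.

    Assumes both sequences are gap-free and aligned position by position.
    """
    if len(ancestor) != len(descendant):
        raise ValueError("sequences must be aligned to equal length")
    anc = motif_positions(ancestor, motif)
    desc = motif_positions(descendant, motif)
    return sorted(desc - anc), sorted(anc - desc)


# Hypothetical motif and sequences, for illustration only.
ancestor = "GCACTTAAAAAAAA"
descendant = "GCACTAAAGCACTT"
gained, lost = gains_and_losses(ancestor, descendant, "GCACTT")
print(gained, lost)  # [8] [0]
```

In this toy case the UTR contains one site both before and after, so a whole-UTR presence/absence score records no change, while the site-level comparison recovers one loss and one gain.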
24.  Why Human Disease-Associated Residues Appear as the Wild-Type in Other Species: Genome-Scale Structural Evidence for the Compensation Hypothesis 
Molecular Biology and Evolution  2014;31(7):1787-1792.
Many human disease-associated amino acid residues (DARs) appear as the wild-type in other species. This phenomenon is commonly explained by the presence of compensatory residues in these other species that alleviate the deleterious effects of the DARs. The general validity of this hypothesis, however, is unclear, because few compensatory residues have been identified. Here we test the compensation hypothesis by assembling and analyzing 1,077 DARs located in 177 proteins with known crystal structures. Because destabilizing protein structures is a primary reason why DARs are deleterious, we focus on protein stability in this analysis. We discover that, in species where a DAR represents the wild-type, the destabilizing effect of the DAR is generally lessened by the observed amino acid substitutions in the spatial proximity of the DAR. This and other findings provide genome-scale evidence for the compensation hypothesis and have important implications for understanding epistasis in protein evolution and for using animal models of human diseases.
25.  Signatures of Natural Selection on Mutations of Residues with Multiple Posttranslational Modifications 
Molecular Biology and Evolution  2014;31(7):1641-1645.
Posttranslational modifications (PTMs) regulate molecular structures and functions of proteins by covalently binding to amino acids. Hundreds of thousands of PTMs have been reported for the human proteome, with multiple PTMs known to affect tens of thousands of lysine (K) residues. Our molecular evolutionary analyses show that K residues with multiple PTMs exhibit greater conservation than those with a single PTM, but the difference is rather small. In contrast, short-term evolutionary trends revealed in an analysis of human population variation exhibited a much larger difference. Lysine residues with three PTMs show 1.8-fold enrichment of Mendelian disease-associated variants when compared with K residues with two PTMs, with the latter showing 1.7-fold enrichment of these variants when compared with K residues with one PTM. Rare polymorphisms in humans show a similar trend, which suggests much greater negative selection against mutations of K residues with multiple PTMs within populations. Conversely, common polymorphisms are overabundant at unmodified K residues and at K residues with fewer PTMs. The observed difference between inter- and intraspecies patterns of purifying selection on residues with PTMs suggests extensive species-specific drifting of PTM positions. These results suggest that the functionality of a protein is likely conserved, without necessarily conserving the PTM positions over evolutionary time.
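The reported fold enrichments chain multiplicatively, since (r3/r2)(r2/r1) = r3/r1, where r_k is the rate of disease-associated variants per K residue carrying k PTMs. This sketch shows the ratio-of-rates calculation with hypothetical counts and the implied three-PTM versus one-PTM comparison, derived here from the rounded values in the abstract rather than reported by the authors.

```python
def fold_enrichment(hits_a, total_a, hits_b, total_b):
    """Rate of disease-associated variants in class A relative to class B."""
    return (hits_a / total_a) / (hits_b / total_b)


# Hypothetical counts: disease variants / lysine residues in each class.
print(fold_enrichment(18, 1000, 10, 1000))  # approximately 1.8

# Ratios of rates chain: (r3 / r2) * (r2 / r1) = r3 / r1.
three_vs_two = 1.8  # reported in the abstract
two_vs_one = 1.7    # reported in the abstract
three_vs_one = three_vs_two * two_vs_one
print(round(three_vs_one, 2))  # 3.06
```

Because the inputs are rounded to one decimal place, the implied roughly 3.1-fold enrichment is approximate.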
