Hybrid sterility (HS) belongs to reproductive isolation barriers that safeguard the integrity of species in statu nascendi. Although hybrid sterility occurs almost universally among animal and plant species, most of our current knowledge comes from the classical genetic studies on Drosophila interspecific crosses or introgressions. With the house mouse subspecies Mus m. musculus and Mus m. domesticus as a model, new research tools have become available for studies of the molecular mechanisms and genetic networks underlying HS. Here we used QTL analysis and intersubspecific chromosome substitution strains to identify a 4.7 Mb critical region on Chromosome X (Chr X) harboring the Hstx2 HS locus, which causes asymmetrical spermatogenic arrest in reciprocal intersubspecific F1 hybrids. Subsequently, we mapped autosomal loci on Chrs 3, 9 and 13 that can abolish this asymmetry. Combination of immunofluorescent visualization of the proteins of synaptonemal complexes with whole-chromosome DNA FISH on pachytene spreads revealed that heterosubspecific, unlike consubspecific, homologous chromosomes are predisposed to asynapsis in F1 hybrid male and female meiosis. The asynapsis is under the trans-control of Hstx2 and Hst1/Prdm9 hybrid sterility genes in pachynemas of male but not female hybrids. The finding concurred with the fertility of intersubspecific F1 hybrid females homozygous for the Hstx2Mmm allele and resolved the apparent conflict with the dominance theory of Haldane's rule. We propose that meiotic asynapsis in intersubspecific hybrids is a consequence of cis-acting mismatch between homologous chromosomes modulated by the trans-acting Hstx2 and Prdm9 hybrid male sterility genes.
Genomes of newly emerging species restrict their gene exchange with related taxa in order to secure integrity. Hybrid sterility is one of the reproductive isolation mechanisms restricting gene flow between closely related, sexually reproducing organisms. We showed that hybrid sterility between two closely related mouse subspecies is executed by a failure of meiotic synapsis of orthologous chromosomes in F1 hybrid males. The asynapsis of orthologous chromosomes occurred in meiosis of male and female hybrids, though only males were sterile due to trans-acting male-specific hybrid sterility genes. We located one of the two major hybrid sterility genes to a 4.7 Mb interval on Chromosome X and showed that it controls male sterility by modulating the extent of meiotic asynapsis. Using the inter-subspecific chromosome substitution strains, we refuted the simple interpretation of the dominance theory of Haldane's rule. A new working hypothesis posits male sterility of mouse inter-subspecific F1 hybrids as a consequence of meiotic chromosome asynapsis caused by the cis-acting mismatch between orthologous chromosomes modulated by the trans-acting hybrid male sterility genes.
The ability to survey polymorphism on a genomic scale has enabled genome-wide scans for the targets of natural selection. Theory that connects patterns of genetic variation to evidence of natural selection most often assumes a diallelic locus and no recurrent mutation. Although these assumptions are suitable to selection that targets single nucleotide variants, fundamentally different types of mutation generate abundant polymorphism in genomes. Moreover, recent empirical results suggest that mutationally complex, multiallelic loci including microsatellites and copy number variants are sometimes targeted by natural selection. Given their abundance, the lack of inference methods tailored to the mutational peculiarities of these types of loci represents a notable gap in our ability to interrogate genomes for signatures of natural selection. Previous theoretical investigations of mutation-selection balance at multiallelic loci include assumptions that limit their application to inference from empirical data. Focusing on microsatellites, we assess the dynamics and population-level consequences of selection targeting mutationally complex variants. We develop general models of a multiallelic fitness surface, a realistic model of microsatellite mutation, and an efficient simulation algorithm. Using these tools, we explore mutation-selection-drift equilibrium at microsatellites and investigate the mutational history and selective regime of the microsatellite that causes Friedreich’s ataxia. We characterize microsatellite selective events by their duration and cost, note similarities to sweeps from standing point variation, and conclude that it is premature to label microsatellites as ubiquitous agents of efficient adaptive change. Together, our models and simulation algorithm provide a powerful framework for statistical inference, which can be used to test the neutrality of microsatellites and other multiallelic variants.
microsatellites; fitness landscape; natural selection; population genetic inference; Friedreich’s ataxia; tandem repeats
The human Y chromosome exhibits surprisingly low levels of genetic diversity. This could result from neutral processes if the effective population size of males is reduced relative to females due to a higher variance in the number of offspring from males than from females. Alternatively, selection acting on new mutations, and affecting linked neutral sites, could reduce variability on the Y chromosome. Here, using genome-wide analyses of X, Y, autosomal and mitochondrial DNA, in combination with extensive population genetic simulations, we show that low observed Y chromosome variability is not consistent with a purely neutral model. Instead, we show that models of purifying selection are consistent with observed Y diversity. Further, the number of sites estimated to be under purifying selection greatly exceeds the number of Y-linked coding sites, suggesting the importance of the highly repetitive ampliconic regions. While we show that purifying selection removing deleterious mutations can explain the low diversity on the Y chromosome, we cannot exclude the possibility that positive selection acting on beneficial mutations could have also reduced diversity in linked neutral regions, and may have contributed to lowering human Y chromosome diversity. Because the functional significance of the ampliconic regions is poorly understood, our findings should motivate future research in this area.
The human Y chromosome is found only in males, and exhibits surprisingly low levels of genetic diversity. This low diversity could result from neutral processes, for example, if there are fewer males successfully mating (and thus fewer Y chromosomes being inherited) relative to the number of females who successfully mate. Alternatively, natural selection may act on mutations on the Y chromosome to reduce genetic diversity. Because there is no recombination across most of the Y chromosome, all sites on the Y are effectively linked together. Thus, selection acting on any one site will affect all sites on the Y indirectly. Here, studying the X, Y, autosomal and mitochondrial DNA, in combination with population genetic simulations, we show that low observed Y chromosome variability is consistent with models of purifying selection removing deleterious mutations and linked variation, although positive selection may also be acting. We further infer that the number of sites affected by selection likely includes some proportion of the highly repetitive ampliconic regions on the Y. Because the functional significance of the ampliconic regions is poorly understood, our findings should motivate future research in this area.
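The neutral alternative described above (fewer breeding males than females) can be made concrete with the standard textbook expectations for effective population size at each chromosome type. The sketch below is only an illustration of that baseline, not the simulation framework used in the study; the function names and the breeding numbers are hypothetical, and the formulas assume random mating with breeding males and females as the only source of unequal inheritance.

```python
def neutral_ne(n_males: float, n_females: float) -> dict:
    """Textbook neutral effective sizes given breeding males and females.

    Assumption: standard two-sex expressions; not taken from the paper.
    """
    nm, nf = n_males, n_females
    return {
        "autosome": 4 * nm * nf / (nm + nf),
        "X": 9 * nm * nf / (4 * nm + 2 * nf),
        "Y": nm / 2,       # paternally inherited, haploid
        "mtDNA": nf / 2,   # maternally inherited, haploid
    }


def ratios_to_autosome(n_males: float, n_females: float) -> dict:
    """Expected neutral diversity of each compartment relative to autosomes."""
    ne = neutral_ne(n_males, n_females)
    return {k: v / ne["autosome"] for k, v in ne.items()}


# Hypothetical breeding numbers: equal sexes, then a 4-fold excess of females.
equal = ratios_to_autosome(1000, 1000)
skewed = ratios_to_autosome(250, 1000)
print(equal)
print(skewed)
```

With equal numbers of breeding males and females, the expected Y-to-autosome ratio is one quarter; reducing the number of breeding males lowers it further. A purely neutral explanation has to account for the X, Y, autosomal and mitochondrial ratios at the same time, which is why comparing all four compartments is informative.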
The contribution of regulatory versus protein change to adaptive evolution has long been controversial. In principle, the rate and strength of adaptation within functional genetic elements can be quantified on the basis of an excess of nucleotide substitutions between species compared to the neutral expectation or from effects of recent substitutions on nucleotide diversity at linked sites. Here, we infer the nature of selective forces acting in proteins, their UTRs and conserved noncoding elements (CNEs) using genome-wide patterns of diversity in wild house mice and divergence to related species. By applying an extension of the McDonald-Kreitman test, we infer that adaptive substitutions are widespread in protein-coding genes, UTRs and CNEs, and we estimate that there are at least four times as many adaptive substitutions in CNEs and UTRs as in proteins. We observe pronounced reductions in mean diversity around nonsynonymous sites (whether or not they have experienced a recent substitution). This can be explained by selection on multiple, linked CNEs and exons. We also observe substantial dips in mean diversity (after controlling for divergence) around protein-coding exons and CNEs, which can also be explained by the combined effects of many linked exons and CNEs. A model of background selection (BGS) can adequately explain the reduction in mean diversity observed around CNEs. However, BGS fails to explain the wide reductions in mean diversity surrounding exons (encompassing ∼100 Kb, on average), implying that there is a substantial role for adaptation within exons or closely linked sites. The wide dips in diversity around exons, which are hard to explain by BGS, suggest that the fitness effects of adaptive amino acid substitutions could be substantially larger than substitutions in CNEs. We conclude that although there appear to be many more adaptive noncoding changes, substitutions in proteins may dominate phenotypic evolution.
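For orientation, the basic McDonald-Kreitman logic that the analysis extends can be sketched as follows. This is the simple, unextended estimator of the proportion of adaptive substitutions, α = 1 − (Ds·Pn)/(Dn·Ps), where D denotes divergence counts, P denotes polymorphism counts, and the subscripts n and s denote selected (for example nonsynonymous) and putatively neutral (for example synonymous) sites. The function name and all counts are hypothetical, and the extension applied in the study, which accounts for slightly deleterious polymorphism, is not reproduced here.

```python
def mk_alpha(dn: int, ds: int, pn: int, ps: int) -> float:
    """Simple McDonald-Kreitman estimate of the fraction of adaptive substitutions.

    dn, ds: substitutions between species at selected / neutral sites.
    pn, ps: polymorphisms within species at selected / neutral sites.
    Assumption: the basic estimator only; no correction for slightly
    deleterious variants, unlike the extended test named in the abstract.
    """
    if dn == 0 or ps == 0:
        raise ValueError("dn and ps must be non-zero")
    return 1.0 - (ds * pn) / (dn * ps)


# Hypothetical counts for illustration only.
alpha = mk_alpha(dn=40, ds=100, pn=20, ps=100)
print(f"alpha = {alpha:.2f}")
```

An excess of divergence relative to polymorphism at the selected class gives a positive α; values near zero or below indicate no detectable excess under this simple estimator.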
We present an analysis of the genome sequences of multiple wild house mice. Wild house mice are about ten times more genetically diverse than humans, reflecting the large effective population size of the species. This manifests itself as more effective natural selection acting against deleterious mutations and favouring advantageous mutations in mice than in humans. We show that there are strong signals of adaptive evolution at many sites in the genome. We estimate that 80% of adaptive changes in the genome are in gene regulatory elements and only 20% are in protein-coding genes. We find that nucleotide diversity is markedly reduced close to gene regulatory elements and protein-coding gene sequences. The reductions around regulatory elements can be explained by selection purging deleterious mutations that occur in the elements themselves, but this process only partially explains the diversity reductions around protein-coding genes. Recurrent adaptive evolution, which can also cause local reductions in diversity via selective sweeps, may be necessary to fully explain the patterns in diversity that we observe surrounding genes. Although most adaptive molecular evolution appears to be regulatory, adaptive phenotypic change may principally be driven by structural change in proteins.
Profound knowledge of demographic history is a prerequisite for the understanding and inference of processes involved in the evolution of population differentiation and speciation. Together with new coalescent-based methods, the recent availability of genome-wide data enables investigation of differentiation and divergence processes at unprecedented depth. We combined two powerful approaches, full Approximate Bayesian Computation analysis (ABC) and pairwise sequentially Markovian coalescent modeling (PSMC), to reconstruct the demographic history of the split between two avian speciation model species, the pied flycatcher and collared flycatcher. Using whole-genome re-sequencing data from 20 individuals, we investigated 15 demographic models including different levels and patterns of gene flow, and changes in effective population size over time. ABC provided high support for recent (mode 0.3 my, range <0.7 my) species divergence, declines in effective population size of both species since their initial divergence, and unidirectional recent gene flow from pied flycatcher into collared flycatcher. The estimated divergence time and population size changes, supported by PSMC results, suggest that the ancestral species persisted through one of the glacial periods of middle Pleistocene and then split into two large populations that first increased in size before going through severe bottlenecks and expanding into their current ranges. Secondary contact appears to have been established after the last glacial maximum. The severity of the bottlenecks at the last glacial maximum is indicated by the discrepancy between current effective population sizes (20,000–80,000) and census sizes (5–50 million birds) of the two species. The recent divergence time challenges the supposition that avian speciation is a relatively slow process with extended times for intrinsic postzygotic reproductive barriers to evolve. 
Our study emphasizes the importance of using genome-wide data to unravel tangled demographic histories. Moreover, it constitutes one of the first examples of the inference of divergence history from genome-wide data in non-model species.
Demographic processes leave specific and detectable signatures within species genomes. Analysis of patterns of variation within and between closely related species can be used to unravel their divergence history and is crucial for understanding evolutionary processes such as speciation. We applied a set of novel population-genomic tools to investigate patterns of natural variation and infer demographic history of two avian speciation model species: pied flycatcher and collared flycatcher. The analysis supported a scenario consistent with allopatric speciation with recent, postglacial secondary contact. Most likely the ancestral species persisted through one of the glacial periods of the middle Pleistocene and then split into two large descendent populations that appear to have increased in size before experiencing severe bottlenecks during expansion into their current ranges. The two species established secondary contact after the last glacial maximum. This resulted in unidirectional gene flow from pied flycatcher to collared flycatcher. The results are consistent with a scenario where pied flycatcher recolonized northern Europe more rapidly than collared flycatcher. Our study increases the knowledge about the dynamics of the speciation process and constitutes one of the first examples of the inference of complex demographic history using information from genome-wide data in non-model species.
The Dobzhansky-Muller model of speciation posits that defects in hybrids between species are the result of negative epistatic interactions between alleles that arose in independent genetic backgrounds. Tests of one important prediction from this model, that incompatibilities “snowball”, have relied on comparisons of the number of incompatibilities between closely related pairs of species separated by different divergence times. How incompatibilities accumulate along phylogenies, however, remains poorly understood. We extend the Dobzhansky-Muller model to multi-species clades to describe the mathematical relationship between tree topology and the number of shared incompatibilities among related pairs of species. We use these results to develop a statistical test that distinguishes between the snowball and alternative incompatibility accumulation models, including non-epistatic and multi-locus incompatibility models, in a phylogenetic context. We further demonstrate that patterns of incompatibility sharing across species pairs can be used to estimate the relative frequencies of different types of incompatibilities, including derived-derived vs. derived-ancestral incompatibilities. Our results and statistical methods should motivate comparative genetic mapping of hybrid incompatibilities to evaluate competing models of speciation.
Dobzhansky-Muller incompatibilities; phylogenetic comparison; reproductive isolation; speciation
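The "snowball" prediction mentioned above can be illustrated with the classical two-lineage expectation: if each pair of substitutions that has never been tested together by selection is incompatible with a small probability p, the expected number of pairwise incompatibilities after K substitutions is p·K(K−1)/2, which grows roughly with the square of divergence. The sketch below contrasts this with a linear, non-epistatic alternative. It is a minimal illustration of the pairwise case only, not the multi-species test developed in the paper; the function names and parameter values are hypothetical.

```python
def snowball_expected(k: int, p: float) -> float:
    """Expected pairwise incompatibilities after k substitutions (snowball)."""
    return p * k * (k - 1) / 2


def linear_expected(k: int, rate: float) -> float:
    """Expected incompatibilities if each substitution contributes independently."""
    return rate * k


# Hypothetical parameter values chosen only to show the contrast in shape.
for k in (10, 20, 40):
    print(k, snowball_expected(k, p=0.01), linear_expected(k, rate=0.05))
```

Doubling the number of substitutions roughly quadruples the snowball expectation but only doubles the linear one, which is the contrast that comparisons across species pairs aim to detect.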
X chromosome inactivation (XCI) is the mammalian mechanism of dosage compensation that balances X-linked gene expression between the sexes. Early during female development, each cell of the embryo proper independently inactivates one of its two parental X-chromosomes. In mice, the choice of which X chromosome is inactivated is affected by the genotype of a cis-acting locus, the X-chromosome controlling element (Xce). Xce has been localized to a 1.9 Mb interval within the X-inactivation center (Xic), yet its molecular identity and mechanism of action remain unknown. We combined genotype and sequence data for mouse stocks with detailed phenotyping of ten inbred strains and with the development of a statistical model that incorporates phenotyping data from multiple sources to disentangle sources of phenotypic variance in X inactivation in natural female populations. We have reduced the Xce candidate interval 10-fold to a 176 kb region located approximately 500 kb proximal to Xist. We propose that structural variation in this interval explains the presence of multiple functional Xce alleles in the genus Mus. We have identified a new allele, Xcee, present in Mus musculus, and a possible sixth functional allele in Mus spicilegus. We have also confirmed a parent-of-origin effect on X inactivation choice and provide evidence that maternal inheritance magnifies the skewing associated with strong Xce alleles. Based on the phylogenetic analysis of 155 laboratory strains and wild mice, we conclude that Xcea is either a derived allele that arose concurrently with the domestication of fancy mice but prior to the derivation of most classical inbred strains or a rare allele in the wild. Furthermore, we have found that, despite the presence of multiple haplotypes in the wild, Mus musculus domesticus has only one functional Xce allele, Xceb. Lastly, we conclude that each mouse taxon examined has a different functional Xce allele.
Although mammalian females have two X chromosomes in each cell, only one is functional, while gene expression from the other is silenced through a process called X chromosome inactivation. Little is known about the early stages of this process, including how one parental X chromosome is inactivated over the other on a cell-by-cell basis. It has been shown, however, that certain inbred mouse strains are functionally different at a locus that controls this choice, which provides an opportunity to identify the locus and determine its molecular mechanism. This has been the goal of many researchers over the past 40 years, with incremental success. Here we took advantage of new mouse genotype and whole genome sequencing data to pinpoint the locus controlling choice. Our results identified a smaller region on the X chromosome that contains large duplicated sequences. We propose an explanation for multiple functional alleles in mouse and provide insight into the possible molecular mechanism of X chromosome inactivation choice. Our evolutionary analysis reveals why functional diversity at this locus appears to be common in laboratory mice and offers an explanation as to why we do not see this level of diversity in humans.
Large-scale population sequencing studies provide a complete picture of human genetic variation within the studied populations. A key challenge is to identify, among the myriad alleles, those variants that have an effect on molecular function, phenotypes, and reproductive fitness. Most non-neutral variation consists of deleterious alleles segregating at low population frequency due to incessant mutation. To date, studies characterizing selection against deleterious alleles have been based on allele frequency (testing for a relative excess of rare alleles) or ratio of polymorphism to divergence (testing for a relative increase in the number of polymorphic alleles). Here, starting from Maruyama's theoretical prediction (Maruyama T (1974), Am J Hum Genet USA 6:669–673) that a (slightly) deleterious allele is, on average, younger than a neutral allele segregating at the same frequency, we devised an approach to characterize selection based on allelic age. Unlike existing methods, it compares sets of neutral and deleterious sequence variants at the same allele frequency. When applied to human sequence data from the Genome of the Netherlands Project, our approach distinguishes low-frequency coding non-synonymous variants from synonymous and non-coding variants at the same allele frequency and discriminates between sets of variants independently predicted to be benign or damaging for protein structure and function. The results confirm the abundance of slightly deleterious coding variation in humans.
A key challenge in human genetics is to identify, among the multitude of genetic differences between individuals, those that have an effect on traits. Even though new genetic variants arise through mutation in each generation, most are present only in a small proportion of individuals because they have slightly negative effects on fitness. Detecting such slightly deleterious variants is a key challenge in analyzing how genetics influences human characteristics. In this paper, we test a theoretical prediction by Takeo Maruyama from 1974 that a slightly deleterious variant is, on average, younger than a neutral (not affecting fitness) variant present at the same population frequency. Thus our method detects selection by using the estimated age of variants. We applied our method to human data from the Genome of the Netherlands Project, and we show that it distinguishes low-frequency protein-modifying variants from silent variants at the same population frequency and discriminates between sets of variants predicted to be benign or damaging for protein structure and function. Our results confirm the abundance of slightly deleterious protein-coding variation in humans.
Sperm and egg proteins constitute a remarkable paradigm in evolutionary biology: despite their fundamental role in mediating fertilization (suggesting stasis), some of these molecules are among the most rapidly evolving ones known, and their divergence can lead to reproductive isolation. Because of strong selection to maintain function among interbreeding individuals, interacting fertilization proteins should also exhibit a strong signal of correlated divergence among closely related species. We use evidence of such molecular co-evolution to target biochemical studies of fertilization in North Pacific abalone (Haliotis spp.), a model system of reproductive protein evolution. We test the evolutionary rates (dN/dS) of abalone sperm lysin and two duplicated egg coat proteins (VERL and VEZP14), and find a signal of co-evolution specific to ZP-N, a putative sperm binding motif previously identified by homology modeling. Positively selected residues in VERL and VEZP14 occur on the same face of the structural model, suggesting a common mode of interaction with sperm lysin. We test this computational prediction biochemically, confirming that the ZP-N motif is sufficient to bind lysin and that the affinities of VERL and VEZP14 are comparable. However, we also find that on phylogenetic lineages where lysin and VERL evolve rapidly, VEZP14 evolves slowly, and vice versa. We describe a model of sexual conflict that can recreate this pattern of anti-correlated evolution by assuming that VEZP14 acts as a VERL mimic, reducing the intensity of sexual conflict and slowing the co-evolution of lysin and VERL.
Interacting sperm and egg proteins must co-evolve to maintain compatibility at fertilization, so their divergence among species should be correlated: lineages with rapidly evolving sperm proteins should have rapidly evolving egg proteins. We use this expectation to target biochemical studies of fertilization in a model system (abalone). We study a discrete functional domain (ZP-N) found in a pair of duplicated egg coat proteins, and we find that the ZP-N motifs from both proteins bind sperm lysin (a protein important for sperm passage of the egg coat) in a similar fashion. ZP-N is a feature of vertebrate and invertebrate egg coat proteins, as well as yeast mating recognition proteins, demonstrating its broad significance in sexual reproduction. Unexpectedly, we find that the ZP-N motifs of VEZP14 and VERL exhibit inverse patterns of co-evolution with lysin, suggesting that these duplicates may have opposite functions in fertilization. Using computer simulations, we model a novel explanation for this pattern whereby VEZP14 acts as a decoy of VERL in order to decrease the effective amount of sperm lysin and slow the rate of fertilization. Such molecular mimicry could complement other well-established fertilization blocks that females use to control rates of fertilization and limit polyspermy.
Mitochondrial transcription, translation, and respiration require interactions between genes encoded in two distinct genomes, generating the potential for mutations in nuclear and mitochondrial genomes to interact epistatically and cause incompatibilities that decrease fitness. Mitochondrial-nuclear epistasis for fitness has been documented within and between populations and species of diverse taxa, but rarely has the genetic or mechanistic basis of these mitochondrial–nuclear interactions been elucidated, limiting our understanding of which genes harbor variants causing mitochondrial–nuclear disruption and of the pathways and processes that are impacted by mitochondrial–nuclear coevolution. Here we identify an amino acid polymorphism in the Drosophila melanogaster nuclear-encoded mitochondrial tyrosyl–tRNA synthetase that interacts epistatically with a polymorphism in the D. simulans mitochondrial-encoded tRNATyr to significantly delay development, compromise bristle formation, and decrease fecundity. The incompatible genotype specifically decreases the activities of oxidative phosphorylation complexes I, III, and IV that contain mitochondrial-encoded subunits. Combined with the identity of the interacting alleles, this pattern indicates that mitochondrial protein translation is affected by this interaction. Our findings suggest that interactions between mitochondrial tRNAs and their nuclear-encoded tRNA synthetases may be targets of compensatory molecular evolution. Human mitochondrial diseases are often genetically complex and variable in penetrance, and the mitochondrial–nuclear interaction we document provides a plausible mechanism to explain this complexity.
The ancient symbiosis between two prokaryotes that gave rise to the eukaryotic cell has required genomic cooperation for at least a billion years. Eukaryotic cells respire through the coordinated expression of their nuclear and mitochondrial genomes, both of which encode the proteins and RNAs required for mitochondrial transcription, translation, and aerobic respiration. Genetic interactions between these genomes are hypothesized to influence the effects of mitochondrial mutations on disease and drive mitochondrial–nuclear coevolution. Here we characterize the molecular cause and the cellular and organismal consequences of a mitochondrial–nuclear interaction in Drosophila between naturally occurring mutations in a mitochondrial tRNA and a nuclear-encoded tRNA synthetase. These mutations have little effect on their own; but, when combined, they severely compromise development and reproduction. tRNA synthetases attach the appropriate amino acid onto their cognate tRNA, and this reaction is required for efficient and accurate protein synthesis. We show that disruption of this interaction compromises mitochondrial function, providing hypotheses for the variable penetrance of diseases associated with mitochondrial tRNAs and for which pathways and processes are likely to be affected by mitochondrial–nuclear interactions.
The pseudoautosomal region (PAR) is essential for the accurate pairing and segregation of the X and Y chromosomes during meiosis. Despite its functional significance, the PAR shows substantial evolutionary divergence in structure and sequence between mammalian species. An instructive example of PAR evolution is the house mouse Mus musculus domesticus (represented by the C57BL/6J strain), which has the smallest PAR among those that have been mapped. In C57BL/6J, the PAR boundary is located just ~700 kb from the distal end of the X chromosome, whereas the boundary is found at a more proximal position in Mus spretus, a species that diverged from house mice 2–4 million years ago. Here, we use a combination of genetic and physical mapping to document a pronounced shift in the PAR boundary in a second house mouse subspecies, Mus musculus castaneus (represented by the CAST/EiJ strain), ~430 kb proximal of the M. m. domesticus boundary. We demonstrate molecular evolutionary consequences of this shift, including a marked lineage-specific increase in sequence divergence within Mid1, a gene that resides entirely within the M. m. castaneus PAR but straddles the boundary in other subspecies. Our results extend observations of structural divergence in the PAR to closely related subspecies, pointing to major evolutionary changes in this functionally important genomic region over a short time period.
pseudoautosomal region; house mouse; Mid1
Despite advances in genetic mapping of quantitative traits and in phylogenetic comparative approaches, these two perspectives are rarely combined. The joint consideration of multiple crosses among related taxa (whether species or strains) not only allows more precise mapping of the genetic loci (called quantitative trait loci, QTL) that contribute to important quantitative traits, but also offers the opportunity to identify the origin of a QTL allele on the phylogenetic tree that relates the taxa. We describe a formal method for combining multiple crosses to infer the location of a QTL on a tree. We further discuss experimental design issues for such endeavors, such as how many crosses are required and which sets of crosses are best. Finally, we explore the method’s performance in computer simulations, and we illustrate its use through application to a set of four mouse intercrosses among five inbred strains, with data on HDL cholesterol.
quantitative trait loci (QTL); phylogenetic tree; evolution; multiple crosses; combining crosses
We report genome sequences of 17 inbred strains of laboratory mice and identify almost ten times more variants than previously known. We use these genomes to explore the phylogenetic history of the laboratory mouse and to examine the functional consequences of allele-specific variation on transcript abundance, revealing that at least 12% of transcripts show a significant tissue-specific expression bias. By identifying candidate functional variants at 718 quantitative trait loci we show that the molecular nature of functional variants and their position relative to genes vary according to the effect size of the locus. These sequences provide a starting point for a new era in the functional analysis of a key model organism.
Multiple endocrine neoplasia type 2B (MEN2B) is a highly aggressive thyroid cancer syndrome. Since almost all sporadic cases are caused by the same nucleotide substitution in the RET proto-oncogene, the calculated disease incidence is 100–200 times greater than would be expected based on the genome average mutation frequency. In order to determine whether this increased incidence is due to an elevated mutation rate at this position (true mutation hot spot) or a selective advantage conferred on mutated spermatogonial stem cells, we studied the spatial distribution of the mutation in 14 human testes. In donors aged 36–68, mutations were clustered with small regions of each testis having mutation frequencies several orders of magnitude greater than the rest of the testis. In donors aged 19–23 mutations were almost non-existent, demonstrating that clusters in middle-aged donors grew during adulthood. Computational analysis showed that germline selection is the only plausible explanation. Testes of men aged 75–80 were heterogeneous with some like middle-aged and others like younger testes. Incorporating data on age-dependent death of spermatogonial stem cells explains the results from all age groups. Germline selection also explains MEN2B's male mutation bias and paternal age effect. Our discovery focuses attention on MEN2B as a model for understanding the genetic and biochemical basis of germline selection. Since RET function in mouse spermatogonial stem cells has been extensively studied, we are able to suggest that the MEN2B mutation provides a selective advantage by altering the PI3K/AKT and SFK signaling pathways. Mutations that are preferred in the germline but reduce the fitness of offspring increase the population's mutational load. Our approach is useful for studying other disease mutations with similar characteristics and could uncover additional germline selection pathways or identify true mutation hot spots.
Multiple endocrine neoplasia type 2B (MEN2B) is a highly aggressive thyroid cancer syndrome. MEN2B offspring with unaffected parents almost always received a new mutation from the father. Moreover, this mutation is almost always at the same nucleotide in the RET proto-oncogene. Thus MEN2B's incidence should equal the average single nucleotide mutation frequency, but the observed incidence is 100–200 times greater. One explanation is that the mutation rate at the causal nucleotide is significantly elevated above the genome average. Another is that human testis stem cells acquiring this mutation have a selective advantage over non-mutated ones and this advantage increases the mutation's frequency in the testis. Computational analysis of our testis dissection and mutation assay data rejects the hot spot but not the selective advantage explanation. Because the normal RET gene is known to be critical for mouse testis stem cell function, we now have an important insight into what biochemical pathways are altered by the MEN2B mutation to provide this selective advantage in humans. Germline selection explains the unexpectedly high incidence of MEN2B, why the mutation's origin is almost always in the father, and why the probability a child is born with this disease increases with the father's age.
Recently diverged taxa may continue to exchange genes. A number of models of speciation with gene flow propose that the frequency of gene exchange will be lower in genomic regions of low recombination and that these regions will therefore be more differentiated. However, several population-genetic models that focus on selection at linked sites also predict greater differentiation in regions of low recombination simply as a result of faster sorting of ancestral alleles even in the absence of gene flow. Moreover, identifying the actual amount of gene flow from patterns of genetic variation is tricky, because both ancestral polymorphism and migration lead to shared variation between recently diverged taxa. New analytic methods have been developed to help distinguish ancestral polymorphism from migration. Along with a growing number of datasets of multi-locus DNA sequence variation, these methods have spawned a renewed interest in speciation models with gene flow. Here, we review both speciation and population-genetic models that make explicit predictions about how the rate of recombination influences patterns of genetic variation within and between species. We then compare those predictions with empirical data of DNA sequence variation in rabbits and mice. We find strong support for the prediction that genomic regions experiencing low levels of recombination are more differentiated. In most cases, reduced gene flow appears to contribute to the pattern, although disentangling the relative contribution of reduced gene flow and selection at linked sites remains a challenge. We suggest fruitful areas of research that might help distinguish between different models.
genetic hitchhiking; background selection; gene flow; Mus musculus; Oryctolagus cuniculus
Rapid advances in DNA sequencing and genotyping technologies are beginning to reveal the scope and pattern of human genomic variation. Although single nucleotide polymorphisms (SNPs) have been intensively studied, the extent and form of variation at other types of molecular variants remain poorly understood. Polymorphism at the most variable loci in the human genome, microsatellites, has rarely been examined on a genomic scale without the ascertainment biases that attend typical genotyping studies. We conducted a genomic survey of variation at microsatellites with at least three perfect repeats by comparing two complete genome sequences, the Human Genome Reference sequence and the sequence of J. Craig Venter. The genomic proportion of polymorphic loci was 2.7%, much higher than the rate of SNP variation, with marked heterogeneity among classes of loci. The proportion of variable loci increased substantially with repeat number. Repeat lengths differed in levels of variation, with longer repeat lengths generally showing higher polymorphism at the same repeat number. Microsatellite variation was weakly correlated with regional SNP number, indicating modest effects of shared genealogical history. Reductions in variation were detected at microsatellites located in introns, in untranslated regions, in coding exons, and just upstream of transcription start sites, suggesting the presence of selective constraints. Our results provide new insights into microsatellite mutational processes and yield a preview of patterns of variation that will be obtained in genomic surveys of larger numbers of individuals.
microsatellites; tandem repeats; population genomics; mutation; human genome
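The survey above rests on locating microsatellites with at least three perfect repeats in a genome sequence. A minimal sketch of that detection step follows. The function name, the motif-length limit of 6 bp, and the inclusion of mononucleotide runs are illustrative assumptions, not the authors' pipeline.

```python
import re

def find_perfect_microsatellites(seq, min_repeats=3, max_unit=6):
    """Return sorted (start, motif, n_repeats) tuples for perfect tandem repeats.

    Hypothetical helper: max_unit=6 and the treatment of mononucleotide
    runs are assumptions made for illustration only.
    """
    hits = []
    for unit_len in range(1, max_unit + 1):
        # A motif of unit_len bases followed by at least (min_repeats - 1) exact copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_repeats - 1))
        for m in pattern.finditer(seq):
            unit = m.group(1)
            # Skip motifs that are themselves repeats of a shorter motif
            # (e.g. "TT" or "ATAT"), which are reported at the shorter length.
            if any(unit == unit[:k] * (unit_len // k)
                   for k in range(1, unit_len) if unit_len % k == 0):
                continue
            hits.append((m.start(), unit, len(m.group(0)) // unit_len))
    return sorted(hits)

print(find_perfect_microsatellites("GGCACACACATTT"))
# → [(2, 'CA', 4), (10, 'T', 3)]
```

Running the same function on two aligned genome sequences and comparing repeat counts at matching loci would give a per-locus polymorphism call, although matching loci between genomes is a separate problem not addressed here.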
Theoretical work focused on microsatellite variation has produced a number of important
results, including the expected distribution of repeat sizes and the expected squared
difference in repeat size between two randomly selected samples. However, closed-form
expressions for the sampling distribution and frequency spectrum of microsatellite
variation have not been identified. Here, we use coalescent simulations of the stepwise
mutation model to develop gamma and exponential approximations of the microsatellite
allele frequency spectrum, a distribution central to the description of microsatellite
variation across the genome. For both approximations, the parameter of biological
relevance is the number of alleles at a locus, which we express as a function of
θ, the population-scaled mutation rate, based on simulated data.
Discovered relationships between θ, the number of alleles, and the
frequency spectrum support the development of three new estimators of microsatellite
θ. The three estimators exhibit roughly similar mean squared
errors (MSEs) and all are biased. However, across a broad range of sample sizes and
θ values, the MSEs of these estimators are frequently lower than
all other estimators tested. The new estimators are also reasonably robust to mutation
that includes step sizes greater than one. Finally, our approximation to the
microsatellite allele frequency spectrum provides a null distribution of microsatellite
variation. In this context, a preliminary analysis of the effects of demographic change on
the frequency spectrum is performed. We suggest that simulations of the microsatellite
frequency spectrum under evolutionary scenarios of interest may guide investigators to the
use of relevant and sometimes novel summary statistics.
microsatellite; allele frequency spectrum; θ (theta); stepwise mutation model
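The approach described above, coalescent simulation under the stepwise mutation model, can be sketched in a few lines. This is a simplified illustration under stated assumptions: a single locus, a constant-size population, symmetric single-step mutations, and time measured in units of 2N generations so that each lineage mutates at rate θ/2. The function names are hypothetical and this is not the authors' simulation code.

```python
import random
from collections import Counter

def simulate_smm_sample(n, theta, rng):
    """Simulate net repeat-size changes for n sampled chromosomes at one locus."""
    lineages = [[i] for i in range(n)]   # sample leaves below each active lineage
    offsets = [0] * n                    # net change in repeat number per leaf
    while len(lineages) > 1:
        k = len(lineages)
        t = rng.expovariate(k * (k - 1) / 2)   # waiting time to the next coalescence
        for leaves in lineages:
            # Mutations arrive as a Poisson process with rate theta/2 per lineage.
            elapsed = rng.expovariate(theta / 2)
            while elapsed < t:
                step = rng.choice((-1, 1))     # single-step gain or loss of a repeat
                for leaf in leaves:
                    offsets[leaf] += step
                elapsed += rng.expovariate(theta / 2)
        a, b = rng.sample(range(k), 2)         # pick two lineages to coalesce
        merged = lineages[a] + lineages[b]
        lineages = [l for i, l in enumerate(lineages) if i not in (a, b)] + [merged]
    return offsets

def allele_frequency_spectrum(offsets):
    """Allele frequencies in the sample, sorted from most to least common."""
    counts = Counter(offsets)
    n = len(offsets)
    return sorted((c / n for c in counts.values()), reverse=True)

rng = random.Random(1)
sample = simulate_smm_sample(20, 5.0, rng)
print(len(set(sample)), allele_frequency_spectrum(sample))
```

Repeating the simulation many times for a given θ and recording the number of distinct alleles gives the kind of simulated relationship between θ and allele number that the abstract refers to.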
Pathogens are believed to drive genetic diversity at host loci involved in immunity to infectious disease. To date, studies exploring the genetic basis of pathogen resistance in the wild have focussed almost exclusively on genes of the Major Histocompatibility Complex (MHC); the role of genetic variation elsewhere in the genome as a basis for variation in pathogen resistance has rarely been explored in natural populations. Cytokines are signalling molecules with a role in many immunological and physiological processes. Here we use a natural population of field voles (Microtus agrestis) to examine how genetic diversity at a suite of cytokine and other immune loci impacts the immune response phenotype and resistance to several endemic pathogen species. By using linear models to first control for a range of non-genetic factors, we demonstrate strong effects of genetic variation at cytokine loci both on host immunological parameters and on resistance to multiple pathogens. These effects were primarily localized to three cytokine genes (Interleukin 1 beta (Il1b), Il2, and Il12b), rather than to other cytokines tested, or to membrane-bound, non-cytokine immune loci. The observed genetic effects were as great as for other intrinsic factors such as sex and body weight. Our results demonstrate that genetic diversity at cytokine loci is a novel and important source of individual variation in immune function and pathogen resistance in natural populations. The products of these loci are therefore likely to affect interactions between pathogens and help determine survival and reproductive success in natural populations. Our study also highlights the utility of wild rodents as a model of ecological immunology, to better understand the causes and consequences of variation in immune function in natural populations including humans.
Much of what we know about the genetic basis of immunity to infection has come from studies of laboratory animals. However, these animals are kept in conditions very different from those experienced in the natural environment. In order to improve our understanding of the genetic determinants of disease susceptibility, it is therefore important to examine how genetic variation impacts on immunity in natural populations. So far, studies into the genetic basis of pathogen resistance in the wild have focussed almost exclusively on genes of the Major Histocompatibility Complex (MHC). The MHC is undoubtedly important in immunity to infection, but there are many other genes involved in the immune response that are yet to be investigated. Here we examine genetic variation in cytokines, signalling molecules crucial in the induction and regulation of the different effector arms of the immune response. We use a natural population of field voles, wild rodents related to common laboratory species, and show that variation within cytokine genes is linked to differences between individuals in their immune response and in resistance to multiple pathogens. Cytokines are then likely to be an important source of genetic variation to help individuals combat infection and survive in the wild.
Although growing numbers of single nucleotide polymorphisms (SNPs) and microsatellites (short tandem repeat polymorphisms or STRPs) are used to infer population structure, their relative properties in this context remain poorly understood. SNPs and STRPs mutate differently, suggesting multi-locus genotypes at these loci might differ in ability to detect population structure. Here, we use coalescent simulations to measure the power of sets of SNPs and STRPs to identify population structure. To maximize applicability of our results to empirical studies, we focus on the popular STRUCTURE analysis and evaluate the role of several biological and practical factors in the detection of population structure. We find that: (1) fewer unlinked STRPs than SNPs are needed to detect structure at divergence times <0.3 Ne generations; (2) accurate estimation of the number of populations requires many fewer STRPs than SNPs; (3) for both marker types, declines in power due to modest gene flow (Nem=1.0) are largely negated by increasing marker number; (4) variation in the STRP mutational model affects power modestly; (5) SNP haplotypes (θ=1, no recombination) provide power comparable to STRP loci (θ=10); (6) ascertainment schemes that select highly variable STRP or SNP loci increase power to detect structure, though ascertained data may not be suitable to other inference; and (7) when samples are drawn from an admixed population and one parent population, the reduction in power to detect two populations is greater for STRPs than SNPs. These results should assist the design of multi-locus studies to detect population structure in nature.
population structure; microsatellite; single nucleotide polymorphism; ascertainment bias; statistical power; short tandem repeat
The rate of meiotic recombination varies markedly between species and among individuals. Classical genetic experiments demonstrated a heritable component to population variation in recombination rate, and specific sequence variants that contribute to recombination rate differences between individuals have recently been identified. Despite these advances, the genetic basis of species divergence in recombination rate remains unexplored. Using a cytological assay that allows direct in situ imaging of recombination events in spermatocytes, we report a large (∼30%) difference in global recombination rate between males of two closely related house mouse subspecies (Mus musculus musculus and M. m. castaneus). To characterize the genetic basis of this recombination rate divergence, we generated an F2 panel of inter-subspecific hybrid males (n = 276) from an intercross between wild-derived inbred strains CAST/EiJ (M. m. castaneus) and PWD/PhJ (M. m. musculus). We uncover considerable heritable variation for recombination rate among males from this mapping population. Much of the F2 variance for recombination rate and a substantial portion of the difference in recombination rate between the parental strains is explained by eight moderate- to large-effect quantitative trait loci, including two transgressive loci on the X chromosome. In contrast to the rapid evolution observed in males, female CAST/EiJ and PWD/PhJ animals show minimal divergence in recombination rate (∼5%). The existence of loci on the X chromosome suggests a genetic mechanism to explain this male-biased evolution. Our results provide an initial map of the genetic changes underlying subspecies differences in genome-scale recombination rate and underscore the power of the house mouse system for understanding the evolution of this trait.
Homologous recombination is an indispensable feature of the mammalian meiotic program and an important mechanism for creating genetic diversity. Despite its central significance, recombination rates vary markedly between species and among individuals. Although recent studies have begun to unravel the genetic basis of recombination rate variation within populations, the genetic mechanisms of species divergence in recombination rate remain poorly characterized. In this study, we show that two closely related house mouse subspecies differ in their genomic recombination rates by ∼30%, providing an excellent model system for studying evolutionary divergence in this trait. Using quantitative genetic methods, we identify eight genomic regions that contribute to divergence in global recombination rate between these subspecies, including large effect loci and multiple loci on the X-chromosome. Our study uncovers novel genomic loci contributing to species divergence in global recombination rate and offers simple genetic explanations for rapid phenotypic divergence in this trait.
Patterns of population structure provide insights into evolutionary processes and help identify groups of individuals for genotype–phenotype association studies. With increasing availability of polymorphic molecular markers across genomes, the examination of population structure using large numbers of unlinked loci has become a common practice in evolutionary biology and human genetics. The two classes of molecular variation most widely used for this purpose, short tandem repeat polymorphisms (STRPs) and single-nucleotide polymorphisms (SNPs), differ in mutational properties expected to affect population structure. To measure the relative ability of these loci to describe population structure, we compared diversity at neighboring STRPs and SNPs from 720 genomic regions in the four populations that comprise the Human HapMap. Comparing loci from the same genomic regions allowed us to focus on the contribution of mutational differences (rather than variation in genealogical history) to disparities in population structure between STRPs and SNPs. Relative to average values for SNPs from the same regions, STRPs had lower Fst, but higher Gst′ and In values. STRP–SNP correlations in population structure across genomic regions were statistically significant but weak in magnitude. Separate analyses by repeat type showed that these correlations were driven primarily by tetranucleotide and trinucleotide STRPs; measures of population structure at dinucleotides and SNPs were not significantly correlated. Pairwise comparisons among populations revealed effects of divergence time on differences in population structure between STRPs and SNPs. Collectively, these results confirm that individual STRPs can provide more information about population structure than individual SNPs, but suggest that the difference in structure at STRPs and SNPs depends on local genealogical history. Our study motivates theoretical comparisons of population structure at loci with different mutational properties.
SNP; microsatellite; recurrent mutation; population structure; marker informativeness; human genome
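The contrast reported above, lower Fst but higher Gst′ at STRPs, follows from how within-population heterozygosity caps the unstandardized statistic. The sketch below illustrates this with Nei's Gst and Hedrick's standardized G′st computed from allele frequencies. It assumes equal population weights and known frequencies with no sampling correction. Whether these are the exact estimators used in the study is an assumption, and the function names are hypothetical.

```python
def heterozygosity(freqs):
    """Expected heterozygosity, 1 - sum(p_i^2), for one set of allele frequencies."""
    return 1.0 - sum(p * p for p in freqs)

def gst(pop_freqs):
    """Nei's Gst from a list of per-population allele-frequency lists.

    Returns (Gst, Hs). Assumes the locus is polymorphic overall (Ht > 0).
    """
    k = len(pop_freqs)
    n_alleles = len(pop_freqs[0])
    hs = sum(heterozygosity(f) for f in pop_freqs) / k           # mean within-population
    mean_freqs = [sum(f[a] for f in pop_freqs) / k for a in range(n_alleles)]
    ht = heterozygosity(mean_freqs)                              # pooled population
    return (ht - hs) / ht, hs

def gst_standardized(pop_freqs):
    """Hedrick's G'st: Gst divided by its maximum given Hs and k populations."""
    k = len(pop_freqs)
    g, hs = gst(pop_freqs)
    return g * (k - 1 + hs) / ((k - 1) * (1 - hs))

# Made-up frequencies: a diverged SNP, and an STRP with no shared alleles.
snp = [[0.9, 0.1], [0.1, 0.9]]
strp = [[0.25] * 4 + [0.0] * 4, [0.0] * 4 + [0.25] * 4]
print(gst(snp)[0], gst_standardized(snp))
print(gst(strp)[0], gst_standardized(strp))
```

In this toy case the STRP has a much lower Gst than the SNP (about 0.14 versus 0.64) even though its two populations share no alleles, while its standardized value reaches the maximum of 1.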
The high genomic density of the single-nucleotide polymorphism (SNP) sets that are typically surveyed in genome-wide association studies (GWAS) now allows the application of haplotype-based methods. Although the choice of haplotype-based vs. individual-SNP approaches is expected to affect the results of association studies, few empirical comparisons of method performance have been reported on the genome-wide scale in the same set of individuals. To measure the relative ability of the two strategies to detect associations, we used a large dataset from the North American Rheumatoid Arthritis Consortium to: 1) partition the genome into haplotype blocks, 2) associate haplotypes with disease, and 3) compare the results with individual-SNP association mapping. Although some associations were shared across methods, each approach uniquely identified several strong candidate regions. Our results suggest that the application of both haplotype-based and individual-SNP testing to GWAS should be adopted as a routine procedure.
Population genetic theory predicts discordance in the true phylogeny of different genomic regions when studying recently diverged species. Despite this expectation, genome-wide discordance in young species groups has rarely been statistically quantified. The house mouse subspecies group provides a model system for examining phylogenetic discordance. House mouse subspecies are recently derived, suggesting that even if there has been a simple tree-like population history, gene trees could disagree with the population history due to incomplete lineage sorting. Subspecies of house mice also hybridize in nature, raising the possibility that recent introgression might lead to additional phylogenetic discordance. Single-locus approaches have revealed support for conflicting topologies, resulting in a subspecies tree often summarized as a polytomy. To analyze phylogenetic histories on a genomic scale, we applied a recently developed method, Bayesian concordance analysis, to dense SNP data from three closely related subspecies of house mice: Mus musculus musculus, M. m. castaneus, and M. m. domesticus. We documented substantial variation in phylogenetic history across the genome. Although each of the three possible topologies was strongly supported by a large number of loci, there was statistical evidence for a primary phylogenetic history in which M. m. musculus and M. m. castaneus are sister subspecies. These results underscore the importance of measuring phylogenetic discordance in other recently diverged groups using methods such as Bayesian concordance analysis, which are designed for this purpose.
The phylogenetic history of individual genes can differ strongly from the species history if taxa are recently derived, making inferences of a species history from only a handful of genes especially difficult in these cases. Genome-scale data sets now allow phylogenetic histories to be reconstructed from a large number of genes. Although data sets of this size are becoming more common, few studies have characterized variation in phylogenetic history across whole genomes. We summarize fine scale variation in phylogenetic history across the genome of house mice, a recently derived group of subspecies, using a method that combines phylogenetic uncertainty among gene trees. We document substantial variation in phylogenetic history among 14,081 loci and describe a primary history in the face of this variation. These results support the use of genome-scale datasets and methods that accommodate phylogenetic discordance in attempts to reconstruct the history of closely related groups.
Genome-wide association studies (GWAS) for quantitative traits and disease in humans and other species have shown that there are many loci that contribute to the observed resemblance between relatives. GWAS to date have mostly focussed on discovery of genes or regulatory regions harbouring causative polymorphisms, using single SNP analyses and setting stringent type-I error rates. Genome-wide marker data can also be used to predict genetic values and therefore predict phenotypes. Here, we propose a Bayesian method that utilises all marker data simultaneously to predict phenotypes. We apply the method to three traits: coat colour, %CD8 cells, and mean cell haemoglobin, measured in a heterogeneous stock mouse population. We find that a model that contains both additive and dominance effects, estimated from genome-wide marker data, is successful in predicting unobserved phenotypes and is significantly better than a prediction based upon the phenotypes of close relatives. Correlations between predicted and actual phenotypes were in the range of 0.4 to 0.9 when half of the number of families was used to estimate effects and the other half for prediction. Posterior probabilities of SNPs being associated with coat colour were high for regions that are known to contain loci for this trait. The prediction of phenotypes using large samples, high-density SNP data, and appropriate statistical methodology is feasible and can be applied in human medicine, forensics, or artificial selection programs.
Results from recent genome-wide association studies indicate that for most complex traits, there are many loci that contribute to variation in observed phenotype and that the effect of a single variant (single nucleotide polymorphism, SNP) on a phenotype is small. Here, we propose a method that combines the effects of multiple SNPs to make a prediction of a phenotype that has not been observed. We apply the method to data on mice, using phenotypic and genomic data from some individuals to predict phenotypes in other, either related or unrelated, individuals. We find that correlations between predicted and actual phenotypes are in the range of 0.4 to 0.9. The method also shows that the SNPs used in the prediction appear in regions that are known to contain genes associated with the traits studied. The prediction of unobserved phenotypes from high-density SNP data and appropriate statistical methodology is feasible and can be applied in human medicine, forensics, or artificial breeding programs.