Equine recurrent laryngeal neuropathy (RLN) is a bilateral mononeuropathy
with an unknown pathogenesis that significantly affects performance in
Thoroughbreds. A genetic contribution to the pathogenesis of RLN is
suggested by the higher prevalence of the condition in offspring of
RLN-affected than unaffected stallions. To better understand RLN
pathogenesis and its genetic basis, we performed a genome-wide association
study (GWAS) of 282 RLN-affected and 268 control Thoroughbreds.
We found a significant association of RLN with the
LCORL/NCAPG locus on ECA3 previously shown to affect
body size in horses. Using height at the withers of 505 of these horses, we
confirmed the strong association of this locus with body size, and
demonstrated a significant phenotypic and genetic correlation between height
and RLN grade in this cohort. Secondary genetic associations for RLN on
ECA18 and ECAX did not correlate with withers height in our cohort, but did
contain candidate genes likely to influence muscle physiology and growth:
myostatin (MSTN) and integral membrane protein 2A (ITM2A).
This linkage between body size and RLN suggests that selective breeding to
reduce RLN prevalence would likely reduce adult size in this population.
However, our results do not preclude the possibility of modifier loci that
attenuate RLN risk without reducing size or performance, or that the RLN
risk allele is distinct but tightly linked to the body size locus on ECA3.
This study is both the largest body size GWAS and the largest RLN GWAS
within Thoroughbred horses to date, and suggests that improved understanding
of the relationship between genetics, equine growth rate, and RLN prevalence
may significantly advance our understanding and management of this disease.
Recurrent laryngeal neuropathy (RLN); Thoroughbred; Horse; Equus caballus; Genome-wide association study (GWAS); Haplotype; Body size
To identify the causative mutations in two early-onset canine retinal degenerations, crd1 and crd2, segregating in the American Staffordshire terrier and the Pit Bull Terrier breeds, respectively.
Retinal morphology of crd1- and crd2-affected dogs was evaluated by light microscopy. DNA was extracted from affected dogs and related unaffected controls. Association analysis was undertaken using the Illumina Canine SNP array and PLINK (crd1 study), or the Affymetrix Version 2 Canine array, the “MAGIC” genotype algorithm, and Fisher's exact test for association (crd2 study). Positional candidate genes were evaluated for each disease.
Structural photoreceptor abnormalities were observed in crd1-affected dogs as young as 11 weeks old. Rod and cone inner segments (IS) and outer segments (OS) were abnormal in size, shape, and number. In crd2-affected dogs, rod and cone IS and OS were abnormal as early as 3 weeks of age, progressing with age to severe loss of the OS and thinning of the outer nuclear layer (ONL) by 12 weeks of age. Genome-wide association study (GWAS) identified association at the telomeric end of CFA3 in crd1-affected dogs and on CFA33 in crd2-affected dogs. Candidate gene evaluation identified a three-base deletion in exon 21 of PDE6B in crd1-affected dogs, and a cytosine insertion in exon 10 of IQCB1 in crd2-affected dogs.
Identification of the mutations responsible for these two early-onset retinal degenerations provides new large animal models for comparative disease studies and evaluation of potential therapeutic approaches for the homologous human diseases.
We describe two genome-wide association studies in two closely related dog breeds affected by retinal degeneration, the pathology of the diseases, and the discovery of a novel deletion mutation in PDE6B and an insertion mutation in IQCB1 as the causes of these diseases.
retina; mutation; GWAS
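The crd2 association analysis above names Fisher's exact test. As a minimal sketch of what that test computes at a single SNP, the Python below takes a 2×2 table of allele counts (rows: cases and controls; columns: the two alleles, a layout assumed here for illustration) and sums the hypergeometric probabilities of every table with the same margins that is no more likely than the observed one. It is a stdlib-only illustration with hypothetical function names, not the MAGIC-based pipeline the study used; in practice a library routine such as `scipy.stats.fisher_exact` would normally be preferred.

```python
from math import comb


def hypergeom_prob(a: int, row1: int, col1: int, n: int) -> float:
    """Probability of a 2x2 table with top-left cell `a`, given fixed margins."""
    return comb(col1, a) * comb(n - col1, row1 - a) / comb(n, row1)


def fisher_exact_two_sided(table: list[list[int]]) -> float:
    """Two-sided Fisher's exact test p-value for a 2x2 table [[a, b], [c, d]].

    Assumed layout (hypothetical): rows = cases / controls,
    columns = counts of allele 1 / allele 2 at one SNP.
    """
    (a, b), (c, d) = table
    row1, col1, n = a + b, a + c, a + b + c + d
    p_obs = hypergeom_prob(a, row1, col1, n)
    # Range of the top-left cell that keeps every cell non-negative.
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    p = 0.0
    for x in range(lo, hi + 1):
        px = hypergeom_prob(x, row1, col1, n)
        if px <= p_obs * (1 + 1e-9):  # small tolerance for floating-point ties
            p += px
    return min(p, 1.0)


print(fisher_exact_two_sided([[30, 10], [12, 28]]))
```

In a GWAS this test is repeated for every SNP on the array, so the resulting p-values have to be judged against a multiple-testing threshold rather than the nominal 0.05.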
Obsessive-compulsive disorder (OCD), a severe mental disease manifested in time-consuming repetition of behaviors, affects 1 to 3% of the human population. While highly heritable, complex genetics has hampered attempts to elucidate OCD etiology. Dogs suffer from naturally occurring compulsive disorders that closely model human OCD, manifested as an excessive repetition of normal canine behaviors that only partially responds to drug therapy. The limited diversity within dog breeds makes identifying underlying genetic factors easier.
We use genome-wide association of 87 Doberman Pinscher cases and 63 controls to identify genomic loci associated with OCD and sequence these regions in 8 affected dogs from high-risk breeds and 8 breed-matched controls. We find 119 variants in evolutionarily conserved sites that are specific to dogs with OCD. These case-only variants are significantly more common in high OCD risk breeds compared to breeds with no known psychiatric problems. Four genes, all with synaptic function, have the most case-only variation: neuronal cadherin (CDH2), catenin alpha2 (CTNNA2), ataxin-1 (ATXN1), and plasma glutamate carboxypeptidase (PGCP). In the 2 Mb gene desert between the cadherin genes CDH2 and DSC3, we find two different variants, each found only in dogs with OCD, that disrupt the same highly conserved regulatory element. These variants cause significant changes in gene expression in a human neuroblastoma cell line, likely due to disrupted transcription factor binding.
The limited genetic diversity of dog breeds facilitates identification of genes, functional variants and regulatory pathways underlying complex psychiatric disorders that are mechanistically similar in dogs and humans.
To identify genetic changes underlying dog domestication and reconstruct their early evolutionary history, we generated high-quality genome sequences from three gray wolves, one from each of the three putative centers of dog domestication, two basal dog lineages (Basenji and Dingo) and a golden jackal as an outgroup. Analysis of these sequences supports a demographic model in which dogs and wolves diverged through a dynamic process involving population bottlenecks in both lineages and post-divergence gene flow. In dogs, the domestication bottleneck involved at least a 16-fold reduction in population size, a much more severe bottleneck than estimated previously. A sharp bottleneck in wolves occurred soon after their divergence from dogs, implying that the pool of diversity from which dogs arose was substantially larger than that represented by modern wolf populations. We narrow the plausible range for the date of initial dog domestication to an interval spanning 11–16 thousand years ago, predating the rise of agriculture. In light of this finding, we expand upon previous work regarding the increase in copy number of the amylase gene (AMY2B) in dogs, which is believed to have aided digestion of starch in agricultural refuse. We find standing variation in amylase copy number in wolves and little or no copy number increase in the Dingo and Husky lineages. In conjunction with the estimated timing of dog origins, these results provide additional support for archaeological finds suggesting that the earliest dogs arose alongside hunter-gatherers rather than agriculturists. Regarding the geographic origin of dogs, we find that, surprisingly, none of the extant wolf lineages from putative domestication centers is more closely related to dogs, and, instead, the sampled wolves form a sister monophyletic clade.
This result, in combination with dog-wolf admixture during the process of domestication, suggests that a re-evaluation of past hypotheses regarding dog origins is necessary.
The process of dog domestication is still poorly understood, largely because no studies thus far have leveraged deeply sequenced whole genomes from wolves and dogs to simultaneously evaluate support for the proposed source regions: East Asia, the Middle East, and Europe. To investigate dog origins, we sequence three wolf genomes from the putative centers of origin, two basal dog breeds (Basenji and Dingo), and a golden jackal as an outgroup. We find that none of the wolf lineages from the hypothesized domestication centers is supported as the source lineage for dogs, and that dogs and wolves diverged 11,000–16,000 years ago in a process involving extensive admixture and that was followed by a bottleneck in wolves. In addition, we investigate the amylase (AMY2B) gene family expansion in dogs, which has recently been suggested as being critical to domestication in response to increased dietary starch. We find standing variation in AMY2B copy number in wolves and show that some breeds, such as Dingo and Husky, lack the AMY2B expansion. This suggests that, at the beginning of the domestication process, dogs may have been characterized by a more carnivorous diet than their modern day counterparts, a diet held in common with early hunter-gatherers.
The identification of the H3K4 trimethyltransferase PRDM9 as the gene responsible for recombination hotspot localization has provided considerable insight into the mechanisms by which recombination is initiated in mammals. However, uniquely amongst mammals, canids appear to lack a functional version of PRDM9 and may therefore provide a model for understanding recombination that occurs in the absence of PRDM9, and thus how PRDM9 functions to shape the recombination landscape. We have constructed a fine-scale genetic map from patterns of linkage disequilibrium assessed using high-throughput sequence data from 51 free-ranging dogs, Canis lupus familiaris. While broad-scale properties of recombination appear similar to those of other mammalian species, our fine-scale estimates indicate that, in dogs, highly elevated recombination rates occur in the vicinity of CpG-rich regions, including gene promoter regions, but show little association with H3K4 trimethylation marks identified in spermatocytes. By comparison with genomic data from the Andean fox, Lycalopex culpaeus, we show that biased gene conversion is a plausible mechanism by which the high CpG content of the dog genome could have arisen.
Recombination in mammalian genomes tends to occur within highly localized regions known as recombination hotspots. These hotspots appear to be a ubiquitous feature of mammalian genomes, but tend not to be shared between closely related species despite high levels of DNA sequence similarity. This disparity has been largely explained by the discovery of PRDM9 as the gene responsible for localizing recombination hotspots via recognition and binding to specific DNA motifs. Variation within PRDM9 can lead to changes in the recognized motif, and hence changes in the location of recombination hotspots throughout the genome. Multiple studies have shown that PRDM9 is under strong selective pressure, apparently leading to a rapid turnover of hotspot locations between species. However, uniquely amongst mammals, PRDM9 appears to be dysfunctional in dogs and other canids. In this paper, we investigate how the loss of PRDM9 has affected the fine-scale recombination landscape in dogs and contrast this with patterns seen in other species.
Advances in genome technology have facilitated a new understanding of the historical and genetic processes crucial to rapid phenotypic evolution under domestication1,2. To understand the process of dog diversification better, we conducted an extensive genome-wide survey of more than 48,000 single nucleotide polymorphisms in dogs and their wild progenitor, the grey wolf. Here we show that dog breeds share a higher proportion of multi-locus haplotypes unique to grey wolves from the Middle East, indicating that Middle Eastern wolves, rather than wolves from east Asia as suggested by mitochondrial DNA sequence data3, are a dominant source of genetic diversity for dogs. Furthermore, we find a surprising correspondence between genetic and phenotypic/functional breed groupings, but there are exceptions that suggest phenotypic diversification depended in part on the repeated crossing of individuals with novel phenotypes. Our results show that Middle Eastern wolves were a critical source of genome diversity, although interbreeding with local wolf populations clearly occurred elsewhere in the early history of specific lineages. More recently, the evolution of modern dog breeds seems to have been an iterative process that drew on a limited genetic toolkit to create remarkable phenotypic diversity.
Since the beginnings of domestication, the craniofacial architecture of the domestic dog has morphed and radiated to human whims. By beginning to define the genetic underpinnings of breed skull shapes, we can elucidate mechanisms of morphological diversification while presenting a framework for understanding human cephalic disorders. Using intrabreed association mapping with museum specimen measurements, we show that skull shape is regulated by at least five quantitative trait loci (QTLs). Our detailed analysis using whole-genome sequencing uncovers a missense mutation in BMP3. Validation studies in zebrafish show that Bmp3 function in cranial development is ancient. Our study reveals the causal variant for a canine QTL contributing to a major morphologic trait.
As a result of selective breeding practices, modern dogs display a multitude of head shapes. Breeds such as the Pug and Bulldog popularize one of these morphologies, termed “brachycephaly.” A short, upward-pointing snout, a massive and rounded head, and an underbite typify brachycephalic breeds. Here, we have coupled the phenotypes collected from museum skulls with the genotypes collected from dogs and identified five regions of the dog genome that are associated with canine brachycephaly. Fine mapping at one of these regions revealed a causal mutation in the gene BMP3. Bmp3's role in regulating cranial development is evolutionarily ancient, as zebrafish require its function to generate a normal craniofacial morphology. Our data begin to expose the genetic mechanisms unknowingly employed by breeders to create and diversify the cranial shape of dogs.
The domestic dog genome - shaped by domestication, adaptation to human-dominated environments and artificial selection - encodes tremendous phenotypic diversity. Recent developments have improved our understanding of the genetics underlying this diversity, unleashing the dog as an important model organism for complex-trait analysis.
Oryza sativa, or Asian cultivated rice, is one of the major cereal grass species domesticated for human food use during the Neolithic. Domestication of this species from the wild grass Oryza rufipogon was accompanied by changes in several traits, including seed shattering, percent seed set, tillering, grain weight, and flowering time. Quantitative trait locus (QTL) mapping has identified three genomic regions in chromosome 3 that appear to be associated with these traits. We sought to determine whether these regions show signatures of selection and whether the same genetic basis underlies the domestication of different rice varieties. Fragments of 88 genes spanning these three genomic regions were sequenced from multiple accessions of two major varietal groups in O. sativa—indica and tropical japonica—as well as the ancestral wild rice species O. rufipogon. In tropical japonica, the levels of nucleotide variation in these three QTL regions are significantly lower compared to genome-wide levels, and coalescent simulations based on a complex demographic model of rice domestication indicate that these patterns are consistent with selection. In contrast, there is no significant reduction in nucleotide diversity in the homologous regions in indica rice. These results suggest that there are differences in the genetic and selective basis for domestication between these two Asian rice varietal groups.
Deep resequencing of functional regions in human genomes is key to identifying potentially causal rare variants for complex disorders. Here, we present the results from a large-sample resequencing (n = 285 patients) study of candidate genes coupled with population genetics and statistical methods to identify rare variants associated with Autism Spectrum Disorder and Schizophrenia. Three genes, MAP1A, GRIN2B, and CACNA1F, were consistently identified by different methods as having significant excess of rare missense mutations in either one or both disease cohorts. In a broader context, we also found that the overall site frequency spectrum of variation in these cases is best explained by population models of both selection and complex demography rather than neutral models or models accounting for complex demography alone. Mutations in the three disease-associated genes explained much of the difference in the overall site frequency spectrum among the cases versus controls. This study demonstrates that genes associated with complex disorders can be mapped using resequencing and analytical methods with sample sizes far smaller than those required by genome-wide association studies. Additionally, our findings support the hypothesis that rare mutations account for a proportion of the phenotypic variance of these complex disorders.
It is widely accepted that genetic factors play important roles in the etiology of neurological diseases. However, the nature of the underlying genetic variation remains unclear. Critical questions in the field of human genetics relate to the frequency and effect sizes of genetic variants associated with disease. For instance, the common disease–common variant model is based on the idea that sets of common variants explain a significant fraction of the variance found in common disease phenotypes. On the other hand, rare variants may have strong effects and therefore contribute substantially to disease phenotypes. Due to their high penetrance and reduced fitness, such variants are maintained in the population at low frequencies, thus limiting their detection in genome-wide association studies. Here, we used a resequencing approach on a cohort of 285 Autism Spectrum Disorder and Schizophrenia patients and performed several analyses, enhanced with population genetic approaches, to identify variants associated with both diseases. Our results demonstrate an excess of rare variants in these disease cohorts and identify genes with negative (deleterious) selection coefficients, suggesting an accumulation of variants of detrimental effects. Our results present further evidence for rare variants explaining a component of the genetic etiology of autism and schizophrenia.
Balancing selection is potentially an important biological force for maintaining advantageous genetic diversity in populations, including variation that is responsible for long-term adaptation to the environment. By serving as a means to maintain genetic variation, it may be particularly relevant to maintaining phenotypic variation in natural populations. Nevertheless, its prevalence and specific targets in the human genome remain largely unknown. We have analyzed the patterns of diversity and divergence of 13,400 genes in two human populations using an unbiased single-nucleotide polymorphism data set, a genome-wide approach, and a method that incorporates demography in neutrality tests. We identified an unbiased catalog of genes with signatures of long-term balancing selection, which includes immunity genes as well as genes encoding keratins and membrane channels; the catalog also shows enrichment in functional categories involved in cellular structure. Patterns are mostly concordant in the two populations, with a small fraction of genes showing population-specific signatures of selection. Power considerations indicate that our findings represent a subset of all targets in the genome, suggesting that although balancing selection may not have an obvious impact on a large proportion of human genes, it is a key force affecting the evolution of a number of genes in humans.
overdominance; frequency-dependent selection; heterosis; human evolution; population genetics; human diversity
Quantifying the number of deleterious mutations per diploid human genome is of critical concern to both evolutionary and medical geneticists1–3. Here, we combine genome-wide polymorphism data from PCR-based exon re-sequencing, comparative genomic data across mammalian species, and protein structure predictions to estimate the number of functionally consequential mutations carried by each of 15 African American (AA) and 20 European American (EA) individuals. We find that AAs show significantly higher levels of nucleotide heterozygosity than do EAs for all categories of functional mutations considered, including synonymous, nonsynonymous, predicted “benign”, predicted “possibly damaging” and predicted “probably damaging” mutations. This result is wholly consistent with previous work showing higher overall levels of nucleotide variation in African populations as compared to Europeans4. EA individuals, on the other hand, have significantly more genotypes homozygous for the derived allele at synonymous and nonsynonymous SNPs and for the damaging allele at “probably damaging” SNPs than AAs do. Surprisingly, for SNPs segregating only in one population or the other, the proportion of nonsynonymous SNPs is significantly higher in the EA sample (55.4%) than in the AA sample (47.0%; P < 2.3 × 10^−37). We observe a similar proportional excess of SNPs that are inferred to be “probably damaging” (15.9% EA; 12.1% AA; P < 3.3 × 10^−11). Using extensive simulations, we show that this excess proportion of segregating damaging alleles in Europeans is likely a consequence of a bottleneck that Europeans experienced around the time of the migration out of Africa.
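The comparison above rests on per-individual counts of heterozygous and derived-homozygous genotypes within each functional category. A minimal Python sketch of that bookkeeping follows. It assumes genotypes are coded as derived-allele counts (0, 1, 2) with `None` for missing calls; the function name and category labels are hypothetical, and "heterozygosity" here is simply the fraction of called sites that are heterozygous, which may differ from the normalization used in the study.

```python
from typing import Optional, Sequence


def summarize_by_category(
    genotypes: Sequence[Optional[int]],
    categories: Sequence[str],
) -> dict[str, tuple[float, int]]:
    """Per-category (heterozygosity, derived-homozygote count) for one individual.

    genotypes:  derived-allele count at each SNP (0, 1 or 2); None = missing call.
    categories: functional label of each SNP, e.g. "synonymous" or
                "probably damaging" (labels are illustrative).
    """
    if len(genotypes) != len(categories):
        raise ValueError("genotypes and categories must have the same length")
    # category -> [called sites, heterozygous sites, derived homozygotes]
    counts: dict[str, list[int]] = {}
    for g, cat in zip(genotypes, categories):
        if g is None:
            continue  # skip missing calls
        tally = counts.setdefault(cat, [0, 0, 0])
        tally[0] += 1
        if g == 1:
            tally[1] += 1
        elif g == 2:
            tally[2] += 1
    return {cat: (het / called, hom) for cat, (called, het, hom) in counts.items()}
```

Running this per individual and then comparing the per-category values between the two population samples mirrors the structure of the comparison described in the abstract.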
The largest genetic study to date of morphology in domestic dogs identifies genes
controlling nearly 100 morphological traits and identifies important trends in
phenotypic variation within this species.
Domestic dogs exhibit tremendous phenotypic diversity, including a greater
variation in body size than any other terrestrial mammal. Here, we generate a
high-density map of canine genetic variation by genotyping 915 dogs from 80
domestic dog breeds, 83 wild canids, and 10 outbred African shelter dogs across
60,968 single-nucleotide polymorphisms (SNPs). Coupling this genomic resource
with external measurements from breed standards and individuals as well as
skeletal measurements from museum specimens, we identify 51 regions of the dog
genome associated with phenotypic variation among breeds in 57 traits. The
complex traits include average breed body size and external body dimensions and
cranial, dental, and long bone shape and size with and without allometric
scaling. In contrast to the results from association mapping of quantitative
traits in humans and domesticated plants, we find that across dog breeds, a
small number of quantitative trait loci (≤3) explain the majority of
phenotypic variation for most of the traits we studied. In addition, many
genomic regions show signatures of recent selection, with most of the highly
differentiated regions being associated with breed-defining traits such as body
size, coat characteristics, and ear floppiness. Our results demonstrate the
efficacy of mapping multiple traits in the domestic dog using a database of
genotyped individuals and highlight the important role human-directed selection
has played in altering the genetic architecture of key traits in this important species.
Dogs offer a unique system for the study of genes controlling morphology. DNA
from 915 dogs from 80 domestic breeds, as well as a set of feral dogs, was
tested at over 60,000 points of variation and the dataset analyzed using novel
methods to find loci regulating body size, head shape, leg length, ear position,
and a host of other traits. Because each dog breed has undergone strong
selection by breeders to have a particular appearance, there is a strong
footprint of selection in regions of the genome that are important for
controlling traits that define each breed. These analyses identified new regions
of the genome, or loci, that are important in controlling body size and shape.
Our results, which feature the largest number of domestic dogs studied at such a
high level of genetic detail, demonstrate the power of the dog as a model for
finding genes that control the body plan of mammals. Further, we show that the
remarkable diversity of form in the dog, in contrast to some other species
studied to date, appears to have a simple genetic basis dominated by genes of major effect.
Coat color and type are essential characteristics of domestic dog breeds. Although the genetic basis of coat color has been well characterized, relatively little is known about the genes influencing coat growth pattern, length, and curl. We performed genome-wide association studies of more than 1000 dogs from 80 domestic breeds to identify genes associated with canine fur phenotypes. Taking advantage of both inter- and intrabreed variability, we identified distinct mutations in three genes, RSPO2, FGF5, and KRT71 (encoding R-spondin–2, fibroblast growth factor–5, and keratin-71, respectively), that together account for most coat phenotypes in purebred dogs in the United States. Thus, an array of varied and seemingly complex phenotypes can be reduced to the combinatorial effects of only a few genes.
By targeting SNPs contained in both coding and non-coding areas of the genome, we are able to identify genetic differences and characterize genome-wide patterns of variation among individuals, populations and species. We investigated the utility of 454 sequencing and MassARRAY genotyping for population genetics in natural populations of the teleost, Fundulus heteroclitus as well as closely related Fundulus species (F. grandis, F. majalis and F. similis).
We used 454 pyrosequencing and MassARRAY genotyping technology to identify and type 458 genome-wide SNPs and determine genetic differentiation within and between populations and species of Fundulus. Specifically, pyrosequencing identified 96 putative SNPs across coding and non-coding regions of the F. heteroclitus genome: 88.8% were verified as true SNPs with MassARRAY. Additionally, putative SNPs identified in F. heteroclitus EST sequences were verified in most (86.5%) F. heteroclitus individuals; fewer were genotyped in F. grandis (74.4%), F. majalis (72.9%), and F. similis (60.7%) individuals. SNPs were polymorphic and showed latitudinal clinal variation separating northern and southern populations and established isolation by distance in F. heteroclitus populations. In F. grandis, SNPs were less polymorphic but still established isolation by distance. Markers differentiated species and populations.
In total, these approaches were used to quickly determine differences within the Fundulus genome and provide markers for population genetic studies.
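Isolation by distance is commonly tested by correlating pairwise genetic distances with geographic distances and assessing significance by permutation (a Mantel test). The abstract does not say which test was used, so the sketch below is an assumption: a stdlib-only, one-sided Mantel test in Python over two symmetric distance matrices that list populations in the same order. The names `mantel_test`, `permutations` and `seed` are hypothetical.

```python
import math
import random
from typing import Sequence

Matrix = Sequence[Sequence[float]]


def _upper(m: Matrix, order: Sequence[int]) -> list[float]:
    # Upper-triangle entries (i < j) of a square matrix after reordering rows/cols.
    n = len(order)
    return [m[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def _pearson(x: Sequence[float], y: Sequence[float]) -> float:
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mantel_test(gen_dist: Matrix, geo_dist: Matrix,
                permutations: int = 999, seed: int = 1) -> tuple[float, float]:
    """One-sided Mantel test: is genetic distance positively correlated with
    geographic distance? Returns (observed r, permutation p-value)."""
    n = len(gen_dist)
    identity = list(range(n))
    geo = _upper(geo_dist, identity)
    r_obs = _pearson(_upper(gen_dist, identity), geo)
    rng = random.Random(seed)
    hits = 0
    for _ in range(permutations):
        perm = identity[:]
        rng.shuffle(perm)  # relabel populations in the genetic matrix only
        if _pearson(_upper(gen_dist, perm), geo) >= r_obs:
            hits += 1
    # Count the observed arrangement itself, as is usual for permutation p-values.
    return r_obs, (hits + 1) / (permutations + 1)
```

Permuting whole rows and columns together, rather than individual matrix cells, keeps the non-independence of pairwise distances intact, which is the point of the Mantel design.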
To identify the causative mutation in a canine cone-rod dystrophy (crd3) that segregates as an adult onset disorder in the Glen of Imaal Terrier breed of dog.
Glen of Imaal Terriers were ascertained for crd3 phenotype by clinical ophthalmoscopic examination, and in selected cases by electroretinography. Blood samples from affected cases and non-affected controls were collected and used, after DNA extraction, to undertake a genome-wide association study using Affymetrix Version 2 Canine single nucleotide polymorphism chips and the 250K Sty Assay protocol. Positional candidate gene analysis was undertaken for genes identified within the peak-association signal region. Retinal morphology of selected crd3-affected dogs was evaluated by light and electron microscopy.
A peak association signal exceeding genome-wide significance was identified on canine chromosome 16. Evaluation of genes in this region suggested A Disintegrin And Metalloprotease domain, family member 9 (ADAM9), identified concurrently elsewhere as the cause of human cone-rod dystrophy 9 (CORD9), as a strong positional candidate for canine crd3. Sequence analysis identified a large genomic deletion (over 20 kb) that removed exons 15 and 16 from the ADAM9 transcript, introduced a premature stop, and would remove critical domains from the encoded protein. Light and electron microscopy established that, as in ADAM9 knockout mice, the primary lesion in crd3 appears to be a failure of the apical microvilli of the retinal pigment epithelium to appropriately invest photoreceptor outer segments. By electroretinography, retinal function appears normal in very young crd3-affected dogs, but by 15 months of age, cone dysfunction is present. Subsequently, both rod and cone function degenerate.
Identification of this ADAM9 deletion in crd3-affected dogs establishes this canine disease as orthologous to CORD9 in humans, and offers opportunities for further characterization of the disease process, and potential for genetic therapeutic intervention.
Understanding the genetic structure of human populations is of fundamental interest to medical, forensic and anthropological sciences. Advances in high-throughput genotyping technology have markedly improved our understanding of global patterns of human genetic variation and suggest the potential to use large samples to uncover variation among closely spaced populations1–5. Here we characterize genetic variation in a sample of 3,000 European individuals genotyped at over half a million variable DNA sites in the human genome. Despite low average levels of genetic differentiation among Europeans, we find a close correspondence between genetic and geographic distances; indeed, a geographical map of Europe arises naturally as an efficient two-dimensional summary of genetic variation in Europeans. The results emphasize that when mapping the genetic basis of a disease phenotype, spurious associations can arise if genetic structure is not properly accounted for. In addition, the results are relevant to the prospects of genetic ancestry testing6; an individual’s DNA can be used to infer their geographic origin with surprising accuracy—often to within a few hundred kilometres.
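The "map of Europe" described above comes from a principal component analysis of a genotype matrix. The sketch below illustrates the idea on a toy scale: it centres and scales each SNP, builds the individual-by-individual relationship matrix, and extracts the leading eigenvector by power iteration. The function name, the standardization, the start vector and the iteration count are assumptions for illustration; a study of thousands of individuals at half a million sites would use an SVD-based tool rather than this O(n²m) loop.

```python
import math
from typing import Sequence


def first_pc_axis(genotypes: Sequence[Sequence[int]], iters: int = 200) -> list[float]:
    """Leading eigenvector of the individual-by-individual genotype relationship matrix.

    genotypes: rows = individuals, columns = SNPs, values = allele counts (0, 1, 2).
    Entries give each individual's position on the first principal component,
    up to sign and overall scale.
    """
    n, m = len(genotypes), len(genotypes[0])
    cols = []
    for j in range(m):
        col = [genotypes[i][j] for i in range(n)]
        mean = sum(col) / n
        sd = math.sqrt(sum((x - mean) ** 2 for x in col) / n)
        if sd > 0:  # drop monomorphic SNPs, which carry no information
            cols.append([(x - mean) / sd for x in col])
    if not cols:
        raise ValueError("no polymorphic SNPs")
    k = len(cols)
    # G = X X^T / k, where X holds the standardized genotypes.
    g = [[sum(c[i] * c[j] for c in cols) / k for j in range(n)] for i in range(n)]
    # Centring puts the all-ones vector in G's null space, so start from 1, 2, ..., n
    # (assumed not to be orthogonal to the leading eigenvector).
    v = [float(i + 1) for i in range(n)]
    for _ in range(iters):
        w = [sum(g[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            raise ValueError("power iteration collapsed to zero")
        v = [x / norm for x in w]
    return v
```

A second axis can be obtained by deflating G and repeating the iteration; plotting individuals on the first two axes gives the kind of two-dimensional summary the abstract describes.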
Analysis of polymorphism and divergence in the non-coding portion of the human genome yields crucial information about factors driving the evolution of gene regulation. Candidate cis-regulatory regions spanning more than 15,000 genes in 15 African Americans and 20 European Americans were re-sequenced and aligned to the chimpanzee genome in order to identify potentially functional polymorphism and to characterize and quantify departures from neutral evolution. Distortions of the site frequency spectra suggest a general pattern of selective constraint on conserved non-coding sites in the flanking regions of genes (CNCs). Moreover, there is an excess of fixed differences that cannot be explained by a Gamma model of deleterious fitness effects, suggesting the presence of positive selection on CNCs. Extensions of the McDonald-Kreitman test identified candidate cis-regulatory regions with high probabilities of positive and negative selection near many known human genes, the biological characteristics of which exhibit genome-wide trends that differ from patterns observed in protein-coding regions. Notably, there is a higher probability of positive selection in candidate cis-regulatory regions near genes expressed in the fetal brain, suggesting that a larger portion of adaptive regulatory changes has occurred in genes expressed during brain development. Overall we find that natural selection has played an important role in the evolution of candidate cis-regulatory regions throughout hominid evolution.
It has been suggested that changes in gene expression may have played a more important role in the evolution of modern humans than changes in protein-coding sequences. In order to identify signatures of natural selection on candidate cis-regulatory regions, we examined single nucleotide polymorphisms obtained from the complete re-sequencing of conserved non-coding sites (CNCs) in the flanking regions of over 15,000 genes in 35 humans. Patterns of allele frequencies in CNCs indicate the presence of both positive and negative selection acting on standing variation within these candidate cis-regulatory regions, particularly for the 5′ and 3′ UTRs of genes. Gene-specific tests comparing levels of polymorphism and divergence identify several genes with strong signatures of selection on candidate cis-regulatory regions and suggest that the biological characteristics of genes subject to selection are different between coding and candidate cis-regulatory regions with respect to gene expression and function. For example, we find stronger signatures of positive selection in candidate cis-regulatory regions near genes expressed in the fetal brain, which we do not observe in a concurrent analysis on protein-coding regions. Our results suggest that both positive and negative selection have acted on candidate cis-regulatory regions and that the evolution of non-coding DNA has played an important role throughout hominid evolution.
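The McDonald-Kreitman comparison mentioned above contrasts polymorphism and divergence in a putatively selected class of sites against a putatively neutral reference class. The study used extensions of the test that are not reproduced here; the Python sketch below only computes the classic point summaries, the neutrality index and α, from the four counts of an MK table. Treating CNCs as the selected class, and the `MKCounts` name itself, are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class MKCounts:
    """Counts for a McDonald-Kreitman table.

    pn, dn: polymorphic / fixed (divergent) sites in the putatively selected
            class (assumed here to be conserved non-coding sites, CNCs).
    ps, ds: polymorphic / fixed sites in the putatively neutral reference class.
    """
    pn: int
    ps: int
    dn: int
    ds: int

    def neutrality_index(self) -> float:
        # NI = (Pn/Ps) / (Dn/Ds). NI > 1: excess polymorphism in the selected class,
        # e.g. slightly deleterious variants kept at low frequency by purifying
        # selection. NI < 1: excess divergence, consistent with positive selection.
        return (self.pn / self.ps) / (self.dn / self.ds)

    def alpha(self) -> float:
        # Estimated proportion of fixed differences driven by positive selection:
        # alpha = 1 - (Ds * Pn) / (Dn * Ps), algebraically equal to 1 - NI.
        return 1.0 - (self.ds * self.pn) / (self.dn * self.ps)


mk = MKCounts(pn=5, ps=10, dn=10, ds=10)
print(mk.neutrality_index(), mk.alpha())
```

Significance is usually assessed separately with a 2×2 contingency test on the same four counts. Both methods raise `ZeroDivisionError` when Ps or Dn is zero, and the neutrality index also when Ds is zero.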
What evolutionary forces shape genes that contribute to the risk of human disease? Do similar selective pressures act on alleles that underlie simple vs. complex disorders [1-3]? Answers to these questions will shed light on the origin of human disorders and help to predict the population frequencies of alleles that contribute to disease risk, with important implications for the efficient design of mapping studies [5-7]. As a first step towards addressing them, we created a hand-curated version of the Online Mendelian Inheritance in Man database (OMIM). We then examined selective pressures on Mendelian disease genes, genes that contribute to complex disease risk and genes known to be essential in mouse, by analyzing patterns of human polymorphism and of divergence between human and rhesus macaque. We find that Mendelian disease genes appear to be under widespread purifying selection, especially when the disease mutations are dominant (rather than recessive). In contrast, the class of genes that influence complex disease risk shows little sign of evolutionary conservation, possibly because this category includes targets of both purifying and positive selection.
Quantifying the distribution of fitness effects among newly arising mutations in the human genome is key to resolving important debates in medical and evolutionary genetics. Here, we present a method for inferring this distribution using single nucleotide polymorphism (SNP) data from a population with a non-stationary demographic history (such as that of modern humans). Application of our method to 47,576 coding SNPs found by direct resequencing of 11,404 protein-coding genes in 35 individuals (20 European Americans and 15 African Americans) allows us to assess the relative contribution of demographic and selective effects to patterning amino acid variation in the human genome. We find evidence of an ancient population expansion in the sample with African ancestry and a relatively recent bottleneck in the sample with European ancestry. After accounting for these demographic effects, we find strong evidence for great variability in the selective effects of new amino acid replacing mutations. In both populations, the patterns of variation are consistent with a leptokurtic distribution of selection coefficients (e.g., gamma or log-normal) peaked near neutrality. Specifically, we predict 27–29% of amino acid changing (nonsynonymous) mutations are neutral or nearly neutral (|s|<0.01%), 30–42% are moderately deleterious (0.01%<|s|<1%), and nearly all the remainder are highly deleterious or lethal (|s|>1%). Our results are consistent with 10–20% of amino acid differences between humans and chimpanzees having been fixed by positive selection, with the remainder of differences being neutral or nearly neutral. Our analysis also predicts that many of the alleles identified via whole-genome association mapping may be selectively neutral or (formerly) positively selected, implying that deleterious genetic variation affecting disease phenotype may be missed by this widely used approach for mapping genes underlying complex traits.
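The three selection classes quoted above are the mass of the distribution of fitness effects falling between fixed cut-offs on |s|. Assuming, for illustration, that |s| follows a gamma distribution with hypothetical shape and scale parameters (not the estimates from this study, whose inference also accounts for demography), the fractions follow from the gamma CDF evaluated at 0.01% and 1%. The sketch computes the regularized incomplete gamma function with the standard series and continued-fraction expansions so that it needs only the Python standard library.

```python
# Sketch: fraction of new mutations in each |s| class under a gamma DFE.
# Assumption: |s| ~ Gamma(shape, scale); the parameters used below are
# hypothetical, not the estimates inferred in the study.
from math import exp, lgamma, log

_EPS, _FPMIN, _ITMAX = 1e-14, 1e-300, 10_000


def _reg_lower_gamma(a, x):
    """Regularized lower incomplete gamma P(a, x)."""
    if x <= 0.0:
        return 0.0
    log_prefactor = -x + a * log(x) - lgamma(a)
    if x < a + 1.0:
        # Series expansion: converges quickly when x < a + 1.
        ap, term = a, 1.0 / a
        total = term
        for _ in range(_ITMAX):
            ap += 1.0
            term *= x / ap
            total += term
            if abs(term) < abs(total) * _EPS:
                break
        return total * exp(log_prefactor)
    # Lentz's continued fraction for the upper tail Q(a, x); P = 1 - Q.
    b = x + 1.0 - a
    c, d = 1.0 / _FPMIN, 1.0 / b
    h = d
    for i in range(1, _ITMAX):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        d = _FPMIN if abs(d) < _FPMIN else d
        c = b + an / c
        c = _FPMIN if abs(c) < _FPMIN else c
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < _EPS:
            break
    return 1.0 - exp(log_prefactor) * h


def gamma_cdf(x, shape, scale):
    """CDF of a gamma distribution with the given shape and scale."""
    return _reg_lower_gamma(shape, x / scale)


def dfe_bins(shape, scale, neutral_cut=1e-4, strong_cut=1e-2):
    """Split a gamma DFE into the |s| < 0.01%, 0.01%-1% and > 1% classes."""
    f_neutral = gamma_cdf(neutral_cut, shape, scale)
    f_below_strong = gamma_cdf(strong_cut, shape, scale)
    return {
        "nearly_neutral": f_neutral,
        "moderate": f_below_strong - f_neutral,
        "strong": 1.0 - f_below_strong,
    }


# Hypothetical parameters, for illustration only.
for label, frac in dfe_bins(shape=0.2, scale=0.05).items():
    print(f"{label}: {frac:.3f}")
```

A small shape parameter places much of the mass close to zero while retaining a long tail of strongly deleterious mutations, which is the leptokurtic, near-neutral-peaked pattern the abstract describes.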
Although mutations are known to cause varying degrees of harmful effects, it is difficult to quantify the distribution that best describes the variation of fitness effects of these mutations. Here, we present a new method for inferring this distribution and inferring population history using single nucleotide polymorphism (SNP) data from human populations. Using 47,576 SNPs discovered in 11,404 genes by sequencing 35 individuals (20 European Americans and 15 African Americans), we find evidence of an ancient population expansion in the sample with African ancestry and a relatively recent bottleneck in the sample with European ancestry. In both populations, the patterns of variation are consistent with a leptokurtic distribution of selection coefficients (e.g., gamma or log-normal) peaked near neutrality. Specifically, we predict 27–29% of amino acid changing (nonsynonymous) mutations are neutral or nearly neutral, 30–42% are moderately deleterious, and nearly all the remainder are highly deleterious or lethal. Furthermore, we infer that 10–20% of amino acid differences between humans and chimpanzees were fixed by positive selection, with the remainder of differences being neutral or nearly neutral.
Domesticated Asian rice (Oryza sativa) is one of the oldest domesticated crop species in the world, having fed more people than any other plant in human history. We report the patterns of DNA sequence variation in rice and its wild ancestor, O. rufipogon, across 111 randomly chosen gene fragments, and use these to infer the evolutionary dynamics that led to the origins of rice. There is a genome-wide excess of high-frequency derived single nucleotide polymorphisms (SNPs) in O. sativa varieties, a pattern that has not been reported for other crop species. We developed several alternative models to explain contemporary patterns of polymorphism in rice, including (i) a selectively neutral population bottleneck model, (ii) a bottleneck plus migration model, (iii) a multiple selective sweeps model, and (iv) a bottleneck plus selective sweeps model. We find that a simple bottleneck model, which has been the dominant demographic model for domesticated species, cannot explain the derived nucleotide polymorphism site frequency spectrum in rice. Instead, a bottleneck model that incorporates selective sweeps, or a more complex demographic model that includes subdivision and gene flow, are more plausible explanations for patterns of variation in domesticated rice varieties. If selective sweeps are indeed the explanation for the observed nucleotide data of domesticated rice, this suggests that strong selection can leave its imprint on genome-wide polymorphism patterns, contrary to expectations that selection results only in a local signature of variation.
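The excess of high-frequency derived SNPs described above is a statement about the unfolded (derived-allele) site frequency spectrum. A minimal sketch, assuming derived states have already been polarized against an outgroup and that every site is sampled in the same number of chromosomes, tabulates that spectrum, compares it with the standard neutral expectation (proportional to 1/i), and summarizes the skew with Fay and Wu's H. Fay and Wu's H is swapped in here as a familiar summary statistic; the abstract does not state that it was used, and the allele counts are hypothetical.

```python
# Sketch: unfolded site frequency spectrum (SFS) and Fay and Wu's H.
# Assumptions: derived states are already polarized with an outgroup, every
# site is sampled in the same n chromosomes, and the counts are hypothetical.
from collections import Counter


def unfolded_sfs(derived_counts, n):
    """xi[i] = number of segregating sites carrying i derived copies (0 < i < n)."""
    counts = Counter(c for c in derived_counts if 0 < c < n)
    return [counts.get(i, 0) for i in range(n + 1)]


def neutral_expected_sfs(n, num_segregating):
    """Standard neutral model, constant size: E[xi_i] is proportional to 1/i."""
    a_n = sum(1.0 / i for i in range(1, n))
    return [0.0] + [num_segregating / (i * a_n) for i in range(1, n)] + [0.0]


def fay_wu_h(xi, n):
    """Unnormalized Fay and Wu's H = theta_pi - theta_H.

    Negative values indicate an excess of high-frequency derived variants.
    """
    denom = n * (n - 1)
    theta_pi = sum(2 * xi[i] * i * (n - i) for i in range(1, n)) / denom
    theta_h = sum(2 * xi[i] * i * i for i in range(1, n)) / denom
    return theta_pi - theta_h


# Hypothetical derived-allele counts at 12 SNPs in n = 10 chromosomes.
n = 10
derived = [1, 1, 2, 9, 9, 8, 9, 7, 1, 9, 8, 3]
xi = unfolded_sfs(derived, n)
expected = neutral_expected_sfs(n, sum(xi))
print("observed:", xi[1:n])
print("neutral expectation:", [round(e, 2) for e in expected[1:n]])
print("Fay and Wu's H:", round(fay_wu_h(xi, n), 3))
```

Under the standard neutral model the expected H is zero; an excess of high-frequency derived variants drives it negative, although, as the abstract notes, both selective sweeps and more complex demography (subdivision and gene flow) can produce that signature.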
Domesticated Asian rice is one of the oldest and most important crops in the world. Two main rice evolutionary lineages have been identified, and are thought to have been independently domesticated in Asia. We have examined patterns of DNA sequence variation in the genomes of rice and its wild ancestor to make inferences about the origin of domesticated rice. Population bottlenecks (reductions in the size of the founding population) in the evolutionary transition from wild to cultivated species have long been thought to be the dominant force shaping patterns of molecular evolution during domestication. We find that the nucleotide variation patterns in rice are inconsistent with a simple bottleneck model. Rice genetic variation, however, can be explained either by a model that incorporates both a bottleneck and migration among rice variety groups, or by a model that incorporates a bottleneck and multiple rounds of artificial selection on rice. Selection by humans is believed to have played an important role during crop domestication, and these results may suggest that strong, recurrent selection can leave a signal that can be observed throughout the genomes of domesticated species.