Phylogenetic Evidence: Accelerated Evolution of a Functional Element in the Human PDYN Promoter
To understand the evolutionary basis for the functional variation, we sequenced 3 kilobases (kb) of PDYN regulatory DNA from 74 human chromosomes and 32 chromosomes from seven species of non-human primates, experimentally determining haplotypic phase by cloning each allele. The non-human primates bear a single copy of the 68-bp regulatory element, and the pattern of substitutions implies that the duplication of the element is specific to the human lineage. All human copies of the element carry five substitutions that differentiate them from the sequence inferred for the last common ancestor of humans and chimpanzees. A sixth difference is variable among repeats in some human haplotypes ().
The five substitutions fixed on the human lineage are dramatically more than expected for 68 bp of neutrally evolving sequence. Under a model of spatially random mutation, the expected number of substitutions is fewer than 0.5, and the observed number is extremely improbable (Poisson p < 0.0001), whether we calculate the expectation from the density of substitutions across the PDYN promoter, the average divergence between human and chimpanzee on a chromosomal scale [43], or the estimated great ape substitution rate [44].
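The Poisson argument above can be sketched numerically. The snippet below is a minimal illustration, not the authors' calculation: the text states only that the expectation is "fewer than 0.5", so the values of `lam` tried here are assumed for illustration, and the function name `poisson_tail` is hypothetical.

```python
import math

def poisson_tail(k_obs: int, lam: float) -> float:
    """Return P(X >= k_obs) for X ~ Poisson(lam)."""
    # Sum the probability mass below k_obs, then take the complement.
    cdf_below = sum(math.exp(-lam) * lam**k / math.factorial(k)
                    for k in range(k_obs))
    return 1.0 - cdf_below

# Assumed expectations; the source gives only an upper bound of 0.5.
for lam in (0.5, 0.4, 0.3):
    print(f"expected = {lam}: P(at least 5 substitutions) = {poisson_tail(5, lam):.2e}")
```

The tail probability shrinks quickly as the expectation falls below the stated bound, which is why the choice among the three ways of estimating the expectation matters little to the conclusion.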
The elevated number of substitutions may be due to a locally elevated mutation rate or to positive selection increasing the probability of fixation of new mutations. If the local mutation rate is intrinsically elevated, other species should also exhibit rapid evolution in the 68-bp region. We therefore tested a molecular clock, using phylogenetic likelihood ratio tests [45], to ask whether the 68-bp region is evolving rapidly due to an elevated mutation rate. The evolution of the 68-bp element is significantly accelerated exclusively along the branch leading to humans from our last common ancestor with chimpanzees (p = 0.005). The other branches of the evolutionary tree show no departure from rate constancy (p = 0.657), and the remainder of the promoter region and the coding sequence of PDYN also show no acceleration (). To control for possible lineage-specific rate variation, we applied the relative ratio test [46], which allows for lineage-specific rates and for DNA region-specific rates, and tests for lineage-by-region interactions. Here again, the human 68-bp repeat exhibits a significant departure from the neutral expectation (p = 0.001 for proportionality to the rest of the promoter, p = 0.015 for proportionality to the coding sequence; ); the remaining lineages and regions exhibit no such departures. The phylogenetic data imply that the rapid evolution of the human 68-bp element is due to positive selection.
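For readers unfamiliar with likelihood ratio tests of a molecular clock, the following sketch shows the generic arithmetic. It assumes, as an illustration only, that the alternative model adds a single free rate parameter (one branch allowed its own rate), so that twice the log-likelihood difference is compared with a chi-square distribution with one degree of freedom. The log-likelihood values are hypothetical; the source does not report them, and the authors' software and exact models are not specified here.

```python
import math

def lrt_p_value_1df(lnl_null: float, lnl_alt: float) -> float:
    """P-value of a likelihood ratio test with one degree of freedom.

    lnl_null: log-likelihood under the clock (constrained) model.
    lnl_alt:  log-likelihood when one branch has its own rate.
    """
    stat = 2.0 * (lnl_alt - lnl_null)
    if stat <= 0.0:
        return 1.0
    # For 1 df, the chi-square survival function is erfc(sqrt(stat / 2)).
    return math.erfc(math.sqrt(stat / 2.0))

# Hypothetical log-likelihoods, for illustration only.
print(lrt_p_value_1df(lnl_null=-1234.0, lnl_alt=-1230.0))
```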
The molecular evolution of the PDYN protein sequence, unlike the regulatory DNA, is consistent with a history dominated by negative selection. In a sample of the complete coding sequences from multiple chromosomes of eight primate species, 25 of the 254 amino acids in the PDYN protein vary, but none of the variants affect the 56 amino acids that comprise the neuroactive peptides. The exclusion of variation from the mature opioid peptides (Poisson p = 0.004) implies negative selection to maintain function. Phylogenetic likelihood ratio tests found no support for positive selection shaping the amino acid sequence of the remainder of the preprotein (model 1 versus model 2 of Yang et al. [47], p = 0.62). Nielsen et al. [13], in a genome scan of human-chimpanzee orthologs, also found no evidence for selection on the PDYN protein. No amino acid polymorphisms are known among humans, and we found none by directly sequencing the coding regions from chromosomes bearing each of the four repeat-number alleles of the promoter.
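One way to arrive at a Poisson probability of the size quoted above for the mature peptides is to assume that the 25 variable residues fall uniformly at random across the 254 residues and to ask how likely it is that none lands among the 56 peptide residues. This is a sketch under that assumption; the source does not spell out the calculation, and the function name is hypothetical.

```python
import math

def prob_no_variants(variable_sites: int, total_sites: int, region_sites: int) -> float:
    """Poisson probability of zero variable sites in a region,
    assuming variable sites are spread uniformly over the sequence."""
    expected_in_region = variable_sites * region_sites / total_sites
    return math.exp(-expected_in_region)

# Counts taken from the text: 25 variable residues of 254, 56 peptide residues.
p_zero = prob_no_variants(variable_sites=25, total_sites=254, region_sites=56)
print(f"P(no variable residues among the peptides) = {p_zero:.4f}")
```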
Population Genetic Evidence: An Excess of High-Frequency-Derived Mutations Flanking the Selected Element
Positive selection alters the frequency spectrum of linked neutral mutations. As the selected mutations are driven rapidly to fixation, linked alleles are dragged along to high frequency [48]. The linked alleles may be dragged to fixation, but they may also be driven to high frequency and then decoupled from the selected mutation by recombination or allelic gene conversion. As a result, an excess of high-frequency-derived mutations flanking a fixed difference provides evidence for positive selection [49]. Our sample of 74 experimentally phased haplotypes from an Austrian population exhibits such a pattern (Table S1). Fay and Wu's H statistic is −8.13, strongly supporting a departure from neutrality and consistent with positive selection (p = 0.004). The three polymorphisms nearest to the 68-bp element have derived allele frequencies greater than 0.95 in all repeat-number allelic classes, consistent with a selective sweep that fixed mutations in the 68-bp region, and that thus predated the origin of different repeat alleles by tandem duplication. As the 68-bp element is tandemly repeated in all sampled human populations (), the signature of selection in all Austrian repeat-number allelic classes also implies that the selective events predate the global human diaspora. A sample of 20 chimpanzee haplotypes, though exhibiting many more polymorphic sites than the human haplotypes, and hence more power to detect a departure from neutrality, shows no such departure (H = 1.62).
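The sign of Fay and Wu's H can be made concrete with a short sketch. H is the difference between two estimators of θ: one based on pairwise diversity, and one that weights each site by the square of its derived-allele count, so high-frequency derived alleles push H negative. The derived-allele counts below are invented for illustration and are not the Austrian or chimpanzee data; the function name is hypothetical.

```python
def fay_wu_h(derived_counts, n):
    """Fay and Wu's H from derived-allele counts at segregating sites.

    derived_counts: one integer per segregating site, the number of
                    sampled chromosomes carrying the derived allele (1..n-1).
    n:              number of sampled chromosomes.
    """
    denom = n * (n - 1)
    # Pairwise-diversity estimator of theta.
    theta_pi = sum(2 * i * (n - i) for i in derived_counts) / denom
    # Estimator weighted toward high-frequency derived alleles.
    theta_h = sum(2 * i * i for i in derived_counts) / denom
    return theta_pi - theta_h

# Hypothetical spectra for a sample of 10 chromosomes.
print(fay_wu_h([9, 9, 8], n=10))   # mostly high-frequency derived alleles
print(fay_wu_h([1, 1, 2], n=10))   # mostly rare derived alleles
```

Significance is not computed here; in practice it is assessed against a neutral expectation, for example by coalescent simulation.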
| Table 3. PDYN Repeat Allele Frequencies |
The human-specific accelerated evolution of the 68-bp element is best explained as the result of positive selection favoring the fixation of mutations. Although the rate is elevated by a factor of more than ten over the neutral expectation, the selection intensity required to explain this excess is quite modest. The rate of substitution ($k$) is equal to the rate at which new mutations arise in the population ($2N_e\mu$) times the probability that a new mutation will become fixed [50], which is $1/(2N_e)$ for neutral mutations and approximately $2s$ for advantageous mutations in a population of constant size [51], where $N_e$ is the effective population size, $\mu$ is the mutation rate, and $s$ is the selective advantage of the mutant allele. If we let $f_a$ be the fraction of non-deleterious mutations in the 68-bp element that are advantageous, then the rate acceleration ($k_{\text{observed}}/k_{\text{neutral}}$) is the ratio of the substitution rate in the human 68-bp element ($[1 - f_a]\mu + 4N_e s \mu f_a$) to that expected in the absence of positive selection ($\mu$). We can place bounds on $f_a$ by recognizing that there are only 204 base-substituting mutations possible in a 68-bp sequence. For the usual estimate [52] of long-term human effective population size, $N_e$ = 10,500, $s$ falls in the range 0.0002 to 0.045; for $f_a$ greater than 2.2% (e.g., if all five fixed mutations were advantageous) $s$ is less than 0.01 (Figure S1), well below the estimated selection coefficients of lactase persistence in Northern Europe [53] and G6PD deficiency [54] in regions of endemic malaria.
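The bounds on the selection coefficient follow from rearranging the rate-acceleration expression for s. The sketch below assumes an acceleration of exactly ten (the text says "more than ten", so the resulting values are approximate lower bounds) and the effective population size of 10,500 quoted above; the function name is hypothetical.

```python
def required_s(f_a: float, acceleration: float = 10.0, n_e: float = 10_500) -> float:
    """Selection coefficient needed for a given rate acceleration.

    Solves  acceleration = (1 - f_a) + 4 * n_e * s * f_a  for s,
    where f_a is the fraction of non-deleterious mutations that are advantageous.
    """
    return (acceleration - (1.0 - f_a)) / (4.0 * n_e * f_a)

# f_a ranges from a single advantageous mutation (1/204) to all of them (1.0).
for label, f_a in [("1 of 204", 1 / 204), ("5 of 204", 5 / 204), ("all", 1.0)]:
    print(f"f_a = {label}: s = {required_s(f_a):.5f}")
```

Because s scales inversely with the fraction of advantageous mutations, even the most restrictive case (a single advantageous change among the 204 possible) requires only a selection coefficient of a few percent.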
Functional Evidence: The Selected Element Increases Inducible PDYN Expression
To determine the effect of the selected nucleotide substitutions on PDYN transcription, we transiently transfected the human neural cell line SH-SY5Y with constructs bearing 3 kb of human or chimpanzee PDYN cis-regulatory DNA linked to a luciferase reporter. Downstream of the 68-bp repeat, in the non-coding first exon of PDYN, is a downstream regulatory element (DRE), a binding site for the repressor protein DRE-antagonist modulator (DREAM) [55]. A single nucleotide substitution in humans alters the DRE from the sequence found in the other primate species. To isolate the effect of the substitutions in the 68-bp element from the effect of the substitution in the DRE, we generated chimeric constructs containing either the human DRE or the human 68-bp element in the context of the chimpanzee promoter ().
We found that the human DRE sequence conferred slightly elevated expression of the reporter under basal conditions, though the effect was not significant (B; analysis of variance p = 0.11). The sequence of the 68-bp element has no effect under these conditions (p = 0.66). When the effect of DREAM is removed by stimulating the cells to release intracellular Ca2+, which binds DREAM and causes it to release from DNA, the effect of the substitutions in the 68-bp element is conspicuous. Under these conditions, the human 68-bp element drives significantly higher expression than the chimpanzee sequence, regardless of the source of flanking sequence (B; p = 0.002). Each relevant pairwise contrast is significant by t-test (human versus chimpanzee, 120%, p = 0.006; chimpanzee with human element versus chimpanzee, 115%, p = 0.037; human versus chimpanzee with human DRE, 120%, p = 0.007). In a three-factor analysis of variance, incorporating Ca2+ stimulation and the sequences of the DRE and the 68-bp element, the main effects of Ca2+ (p < 0.001) and the 68-bp element (p = 0.011) are significant and the interaction between Ca2+ and the 68-bp element is nearly significant (p = 0.054).
In contrast to the SH-SY5Y results, we observed no difference between chimpanzee and human constructs in the non-neural JAR cell line (C), which serves as a control for the biological relevance of the cis-regulatory differences. Because PDYN is expressed in a broad range of neural and endocrine cell types and is induced by a diverse array of stimuli, our limited survey of potential functional consequences of human-specific regulatory substitutions is unlikely to have identified all such changes. Although transient transfection entails the removal of the regulatory DNA from its chromosomal context and the possible loss of biologically important interactions, the experimental results imply that the substitutions in the 68-bp element are visible to the cell.
Continuing Selection: PDYN Exhibits Elevated Differentiation among Populations and Reduced Variation within Them
The evidence for positive selection on the functional 68-bp element, and hence for increased PDYN expression in humans, raises the possibility that selection has also acted more recently, following the origin of modern humans, on the alleles that differ in the number of tandem repeats of the element. Intraspecific PDYN variation is a plausible target for selection because variation in the number of repeats has been shown to affect inducibility by the phorbol ester TPA [36] and has been associated with protection against cocaine dependency [38] and with neurological disease [37, 39]. Moreover, evidence for selection among human populations would corroborate the functional importance of the 68-bp element, and hence support the inference of selection in human origins.
Population genetics predicts that recent selection in human populations will leave two types of signatures in patterns of genetic variation: departures from neutral expectations in the pattern of differentiation among populations, and departures from neutral expectations in the pattern of variation within populations. These predictions have given rise to a battery of statistical tests: FST-based tests to examine differentiation among populations, and θ-based tests to examine diversity within populations [56].
We initially genotyped the repeat polymorphism in six Old World populations and compared differentiation among populations (measured by FST) at the repeat locus to the differentiation expected at loci evolving neutrally. Elevated FST is a signature of geographically heterogeneous positive selection, which drives allele frequencies to differ among populations more rapidly than they would if only genetic drift and migration were acting [57]. We estimated the neutral distribution of FST values from a set of 18 mutually unlinked candidate neutral single nucleotide polymorphisms (SNPs) typed in the same individuals [58]. Each of the candidate neutral SNPs was selected for this preliminary screening on the basis of its high heterozygosity in Europe and its distance (more than 200 kb) from known genes. FST values are constrained by the overall level of variation at a locus, so high heterozygosity is a useful filter for a pool of informative marker SNPs. Similarly, because genes and their regulatory elements are more likely to be under selection than arbitrary non-coding DNA, SNPs distant from genes are good candidates for neutral mutations.
Alleles with one or four copies of the PDYN repeat element are rare in every population we examined, but the frequencies of the two- and three-repeat alleles differ dramatically among populations (). The three-repeat allele ranges in frequency from less than 10% in China and New Guinea to more than 60% in Italy and Ethiopia. The differentiation at the repeat locus is higher than at all 18 neutral markers for four of 15 pairwise comparisons (), and the degree of elevation is substantial (A-D). Although the small number of loci in our neutral proxy dataset makes it difficult to estimate precise significance values, we may approximate a denser probability distribution by bootstrapping over loci [58]. In this test, the difference between the PDYN FST and the 18-locus estimate of FST is significantly higher than the bootstrapped differences (p < 0.001) in the four comparisons. Moreover, PDYN has the second or third highest FST in four more comparisons; the sum of FST ranks across all 15 comparisons is significantly low (p = 0.01), although this p-value cannot be taken at face value due to the non-independence of the pairwise comparisons.
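To give a feel for the scale of differentiation implied by the allele frequencies above, the sketch below computes a simple heterozygosity-based FST for two populations at a biallelic locus. This is an illustration under stated assumptions, not the authors' estimator: the repeat locus is collapsed to "three-repeat allele versus all others", the frequencies 0.10 and 0.60 are round numbers suggested by the "less than 10%" and "more than 60%" figures in the text, and sample-size corrections are ignored.

```python
def fst_biallelic(p1: float, p2: float) -> float:
    """FST = (H_T - H_S) / H_T for two equally weighted populations.

    p1, p2: frequency of the same allele in each population.
    """
    p_bar = (p1 + p2) / 2.0
    h_total = 2.0 * p_bar * (1.0 - p_bar)                        # pooled heterozygosity
    h_within = (2.0 * p1 * (1.0 - p1) + 2.0 * p2 * (1.0 - p2)) / 2.0
    return (h_total - h_within) / h_total if h_total > 0 else 0.0

# Assumed round-number frequencies for illustration.
print(f"FST = {fst_biallelic(0.10, 0.60):.3f}")
```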
| Table 4. Pairwise FST at PDYN and at Neutral Markers |
If the elevated FST at PDYN is due to positive selection favoring different alleles in different populations, the signature of selection should also be visible in nearby variants, whose evolutionary fates are tied to the selected variant by linkage. We therefore asked whether the PDYN locus falls within an extended region of elevated FST. We investigated only Chinese-European FST, the population contrast for which our data suggested elevated FST (A) and for which a genomic dataset was available. We used a dataset of 1,236,401 autosomal SNPs genotyped in African-, European-, and Chinese-Americans [59]. Because SNP ascertainment can influence the distribution of polymorphism statistics, we limited ourselves to SNPs ascertained by a single scheme: specifically, array-based resequencing of chromosomes from the National Institutes of Health Polymorphism Discovery Resource, a global sample. Because the 1.2 million SNPs share a common ascertainment bias, variation in FST along the chromosomes will reflect only variation in the demographic and selective history of genomic regions.
As an initial screen, we generated a 15-SNP sliding window plot of FST, considering only SNPs whose expected global heterozygosity exceeds 0.30. This filter is necessary to remove the dependence of FST on heterozygosity; otherwise, the plot would primarily reflect variation in the allele frequencies of the genotyped SNPs rather than differentiation among populations. As E shows, PDYN falls within a tall and broad peak in FST. A finer-scale sliding window plot (F) indicates that the region of elevated FST encompasses two genes, PDYN and a serine/threonine kinase (STK35) implicated in cytoskeletal regulation [60]. These genes are divergently transcribed, and their intergenic region therefore likely contains the majority of cis-regulatory DNA for both genes. The 3′ flanking regions of each gene also exhibit elevated FST.
The genome-wide empirical distribution of FST is shaped by both demography and selection, and therefore the tail probabilities of SNPs estimated from the empirical distribution represent a very conservative test for selection. Nevertheless, the SNPs within the PDYN-STK35 FST peak exhibit significantly elevated FSTs. In G, we plot FST versus expected global heterozygosity for all 52 genotyped SNPs in the 170-kb interval defined by PDYN and STK35 (i.e., excluding the 3′ flanking SNPs). We also plot the contours of the genome-wide FST distribution conditioned on heterozygosity; note that the median FST is below 0.06 for all heterozygosities. Six of the 52 SNPs in this region (12%) have FSTs in the top 0.5% of the genome-wide distribution, and 20 of the 52 (38%) are in the top 5%.
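The sliding-window screen described above can be sketched as follows. The window size of 15 SNPs and the heterozygosity threshold of 0.30 come from the text; averaging FST within each window, reporting the position of the middle SNP, and the tuple layout of the input are assumptions made for illustration, since the source does not state how each window was summarized.

```python
def sliding_window_fst(snps, window=15, min_het=0.30):
    """Mean FST in overlapping windows of consecutive SNPs.

    snps: list of (position, fst, heterozygosity) tuples sorted by position.
    Returns a list of (middle_position, mean_fst), one entry per window.
    """
    # Keep only SNPs whose expected global heterozygosity exceeds the threshold.
    kept = [(pos, fst) for pos, fst, het in snps if het > min_het]
    results = []
    for start in range(len(kept) - window + 1):
        chunk = kept[start:start + window]
        middle_pos = chunk[window // 2][0]
        mean_fst = sum(fst for _, fst in chunk) / window
        results.append((middle_pos, mean_fst))
    return results

# Hypothetical input: 20 evenly spaced SNPs with identical FST and heterozygosity.
demo = [(i * 1000, 0.1, 0.4) for i in range(20)]
print(sliding_window_fst(demo)[:2])
```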
The number and location of selected variants driving elevation of FST remain unclear. However, neither PDYN nor STK35 is known to contain any non-synonymous variants, and neither protein sequence exhibits evidence of positive selection during human evolution [13]. The target or targets of selection are therefore likely to be cis-regulatory and to include the alleles of the 68-bp element.
Positive selection driving differentiation between populations should also decrease variation within populations; as a selected allele increases in frequency, its haplotype replaces other haplotypes before accumulating new variation. Microsatellites are particularly sensitive monitors of linked selection because of their high levels of polymorphism and high mutation rate. We asked whether the microsatellite nearest the PDYN promoter 68-bp element, a (CA)13–27 dinucleotide microsatellite 1.3 kb further upstream, exhibits the predicted signatures of selection. We genotyped the microsatellite in our panel of six populations (A), and we used repeat-number variance and expected heterozygosity as summary statistics ().
| Table 5. PDYN Microsatellite Summary Statistics |
Repeat-number variance and heterozygosity are functions of θ = 4Neμ [61]. Because microsatellites vary in their mutation rates (μ) and recombinational contexts (which influence Ne), we used test statistics that control for these effects. For a given microsatellite, mutation rate and recombinational context are expected to be shared among populations, so they cancel out in a ratio. The ratio, Rθ, therefore estimates the relative effective sizes of two populations controlling for locus-specific phenomena; remaining variation among neutral microsatellites is attributable to stochastic variation in the outcomes of a neutral coalescent process [62, 63]. Positive selection in one population will reduce heterozygosity and repeat-number variance at a linked microsatellite, causing it to appear in the tails of the estimated distributions of lnRV and lnRH (where repeat-number variance and heterozygosity are used in place of θ).
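The two ratio statistics can be sketched from raw repeat counts. The code below is an illustration under assumptions that the source does not spell out: a stepwise mutation model is assumed when converting heterozygosity to a θ estimate, unbiased sample estimators are used for variance and heterozygosity, and the allele lists are invented. Standardizing the statistics against a genome-wide distribution, as the text describes, is not shown.

```python
import math
from collections import Counter

def repeat_variance(alleles):
    """Unbiased sample variance of repeat number."""
    n = len(alleles)
    mean = sum(alleles) / n
    return sum((a - mean) ** 2 for a in alleles) / (n - 1)

def expected_heterozygosity(alleles):
    """Unbiased expected heterozygosity (gene diversity)."""
    n = len(alleles)
    homozygosity = sum((c / n) ** 2 for c in Counter(alleles).values())
    return n / (n - 1) * (1.0 - homozygosity)

def theta_from_het(h):
    """Theta estimate from heterozygosity, assuming stepwise mutation (h < 1)."""
    return 0.5 * ((1.0 / (1.0 - h)) ** 2 - 1.0)

def ln_rv(test_pop, ref_pop):
    """ln ratio of repeat-number variances (test / reference)."""
    return math.log(repeat_variance(test_pop) / repeat_variance(ref_pop))

def ln_rh(test_pop, ref_pop):
    """ln ratio of heterozygosity-based theta estimates (test / reference)."""
    return math.log(theta_from_het(expected_heterozygosity(test_pop))
                    / theta_from_het(expected_heterozygosity(ref_pop)))

# Hypothetical repeat counts: a low-diversity test sample and a diverse reference.
test_pop = [20] * 8 + [21] * 2
ref_pop = [18, 19, 20, 21, 22, 23, 18, 20, 22, 24]
print(ln_rv(test_pop, ref_pop), ln_rh(test_pop, ref_pop))
```

Both statistics are negative when the test population carries less variation than the reference, which is the direction expected under a recent sweep in the test population.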
We estimated lnRθ distributions empirically from a genome-wide dataset of 337 autosomal loci [64, 65]. Because our FST data do not indicate recent selection in the sample from Cameroon, we used Cameroon as the denominator in all ratios, and we tested for positive selection in the other populations. Those in which positive selection has acted are predicted to exhibit significantly negative lnRθ at the PDYN microsatellite, unless the Cameroon sample has experienced equal or more extreme positive selection at a PDYN-linked locus.
We found a significant reduction in repeat-number variance at the PDYN microsatellite (B) in three populations (Italy, p = 0.031; India, p = 0.034; China, p = 0.021), but not in Ethiopia (p = 0.103) or Papua New Guinea (p = 0.209). The sum of lnRV ranks across populations places PDYN in the 2.5% tail of lowest sums among all the microsatellites. The reduction in heterozygosity at PDYN (C) is even more extreme (p < 0.003 for Italy and India, p < 0.006 for Ethiopia, p = 0.016 for China, and p = 0.072 for Papua New Guinea). The PDYN microsatellite is the locus with the lowest lnRH rank summed over populations.
The relationship between the events reducing variation at the PDYN microsatellite and the events elevating FST at the 68-bp repeat is most obvious when the haplotypic phase between the two elements is considered. We calculated expected heterozygosity and repeat-number variance in subsets of our experimentally determined haplotypes from an Austrian population. As shown in , the overall reduction in microsatellite heterozygosity and repeat-number variance is driven by the rapid elevation in frequency of the three-repeat allele at the 68-bp element.
The combination of elevated FSTs and reduced lnRθs implies that the selection occurred in multiple populations, favoring the two-repeat allele in China and India, and the three-repeat allele in Italy and Ethiopia. However, it remains possible that the 68-bp element in the PDYN promoter is not itself the target of selection, as the entire PDYN-STK35 region bears the signature of recent positive selection.