|Home | About | Journals | Submit | Contact Us | Français|
Oxytocin is a short peptide with multiple functions in human biology and has been implicated in the disorder autism. We set out to determine the normal pattern of variation around the oxytocin gene and have resequenced it and its flanking regions in 91 individuals from four HapMap populations and one chimpanzee. We identified 14 SNPs, all non-coding, including eight that were novel. Population-genetic analyses were largely consistent with a neutral evolutionary history, but an HKA test revealed more variation within the human population than expected from the level of chimpanzee-human divergence.
The nine amino acid peptide oxytocin is both a hormone and neurotransmitter in mammals, including humans. Its peripheral actions include stimulating cervical dilation and uterine contraction during childbirth, milk release in lactating mothers and orgasm during sexual activity. In the brain, it influences cognitive, emotional and social functions, including maternal and sexual behaviours (Caldwell and Young 2006). Particularly striking illustrations of these are the findings that intranasal oxytocin administration increases ‘mind-reading’ ability (Domes et al. 2007) and trust, the latter by specifically increasing an individual's willingness to accept social risks arising through interpersonal interactions (Kosfeld et al. 2005). Such characteristics might have been particularly important during the recent evolution of large complex human societies, so oxytocin regulation might have undergone selection during recent evolution.
In addition, oxytocin has been linked to the disorder autism (OMIM 209850) (Carter 2007). This complex phenotype is characterised by poor verbal communication, decreased social interaction and stereotyped patterns of behaviour. It shows high heritability but a complex aetiology that is poorly understood, although it includes de novo mutations in a small proportion of autism patients. Among these was a deletion that removed a ~1.1 Mb region containing 27 genes from chromosome 22 (Sebat et al. 2007), including that coding for oxytocin, OXT. This was of particular interest because of oxytocin's influence on social interactions noted above (Kosfeld et al. 2005; Domes et al. 2007), because oxytocin plasma levels are decreased in autistic subjects (Modahl et al. 1998) and because intravenous oxytocin has been reported to improve some autistic symptoms such as the retention of social information (Hollander et al. 2007). If oxytocin level is relevant to autism, could genetic variants that decrease expression increase the chances of developing the disorder in non-deleted patients, when combined with other genetic or environmental factors?
Oxytocin is synthesised as a precursor that also contains a signal peptide and the protein neurophysin I, which are together encoded by the OXT gene. The role of neurophysin I is unclear, but it may be important for protection or packaging of oxytocin (Caldwell and Young 2006). We have undertaken a study of the genetic variation in and around the OXT gene in healthy individuals from four populations in order to discover the full spectrum of variation, and assess the selective pressures that have shaped the region during recent human evolution.
Human DNA samples were from the HapMap populations (The International HapMap Consortium 2005): 23 Yoruba from Ibadan, Nigeria (YRI), 23 Chinese Han from Beijing (CHB), 22 CEPH Utah residents with ancestry from northern and western Europe (CEU) and 23 Luhya from Webuye, Kenya (LWK). DNAs were purchased from the Coriell Institute for Medical Research (Camden, New Jersey, USA). In addition, one chimpanzee (Pan troglodytes) sample from the ECACC (Salisbury, Wiltshire, UK) was included as an outgroup.
A ~2.2 kb region (Chr20: 2999590–3001789, NCBI build 36) was amplified from genomic DNA with the primer pair 1F and 4R (Table 1). The region is highly GC-rich, and successful amplification required the addition of DMSO to the reaction. The 25μl PCR reaction mixture contained 1 × Taq buffer (Invitrogen), 200 μM dNTPs, 2 mM MgSO4, 0.4 μM 1F primer, 0.4 μM 4R primer, 1.5 U of Platinum High Fidelity Taq DNA polymerase (Invitrogen), 5% DMSO and 200 ng template DNA. The reaction was initiated by incubating at 94°C for 15 min, followed by 35 cycles of 94°C for 30 s, 55°C for 30 s, and 68°C for 3 min, and included a final extension at 68°C for 7 min 30 s. 10 μl PCR products were purified using Exonuclease I (0.067 units) and Shrimp Alkaline Phosphatase (0.67 units) for 1 hr at 37°C followed by 15 min at 85°C to denature the enzymes, and then sequenced by the Faculty Small Sequencing Projects Group at The Wellcome Trust Sanger Institute with all four pairs of primers (Table 1) to produce overlapping reads covering both strands.
All potentially polymorphic positions were flagged by the Mutation Surveyor v. 2.0 software (SoftGenetics, PA, USA) and checked manually. Variable positions were compared in overlapping and complementary reads in all individuals, so that most variants were confirmed on both strands in multiple reads. In addition, four blind replicate samples were included in the experiment, and showed complete concordance. As a quality check against external data, genotype calls were compared with HapMap calls for the three SNPs from this region typed by HapMap (rs2740210, rs2770378, rs2740208) in the 68 common samples (The International HapMap Consortium 2005). Six discrepancies were found (Supplementary Table 1), mostly corresponding to heterozygote calls in one dataset called as homozygotes in the other (overall 1.7% allele difference). Our calls at these positions appeared clear and were used in our analyses. All novel SNPs were confirmed by genotyping using a SNaPshot primer extension protocol (Applied Biosystems; primers in Supplementary Table 2, results in Supplementary Figure 1).
Derived alleles were identified from the chimpanzee and macaque sequences (The Chimpanzee Sequencing and Analysis Consortium 2005; Gibbs et al. 2007) and derived allele frequencies were determined by direct counting in the four populations. The following statistics were calculated using DnaSP 4.0 (Rozas et al. 2003): (1) Ka/Ks – the ratio of the number of nonsynonymous substitutions per nonsynonymous site (Ka) to the number of synonymous substitutions per synonymous site (Ks); (2) nucleotide diversity; (3) sequence divergence from chimpanzee; (2) and (3) were also calculated using a sliding window with window length 200 bp and step size 100 bp; (4) allele frequency spectrum summary statistics: Tajima's D (Tajima 1989), Fu and Li's D and F (Fu and Li 1993) and Fay and Wu's H (Fay and Wu 2000); (5) coalescent simulations to evaluate the significance of the results from (4) under a simple neutral model; (2), (4) and (5) were calculated for the whole region and for the OXT gene alone; and (6) the HKA test (Hudson et al. 1987) was used to compare divergence and diversity at OXT with corresponding values from a neutral 498 kb ENCODE region ENr321 (Chr8:118882220-119382220) from chromosome 8q24 (Birney et al. 2007); ENCODE data were only available from the YRI, CHB and CEU populations, and so the LWK data were omitted from this analysis. Since no SNPs were found in the OXT coding region, we omitted the coding region of the single gene annotated in ENr321 from this analysis, but similar results were obtained if it was retained. The chimpanzee-human alignment needed for this test was downloaded from Ensembl (http://www.ensembl.org/Homo_sapiens/alignsliceview?c=8:119132220.5;w=500000;align=opt_align_228) and the divergence calculated by DnaSP. The population differentiation at individual sites was measured using FST (Schneider et al. 2000). Linkage disequilibrium was investigated using Haploview 3.32 (Barrett et al. 2005). Haplotypes were inferred from the genotype data using PHASE 2.1 (Stephens et al. 2001) and a median-joining network was constructed from the resulting haplotypes using NETW18.104.22.168 (Bandelt et al. 1999) (http://www.fluxus-engineering.com/sharenet.htm).
We first evaluated the evolutionary pressures acting on the OXT coding region over the last few million years by examining the Ka/Ks ratio calculated from the human, chimpanzee (The Chimpanzee Sequencing and Analysis Consortium 2005) and rhesus macaque (Gibbs et al. 2007) reference sequences. This value will be dominated by the neurophysin I sequence because of its length, and was 0.51. This is higher than the genomewide average of 0.25 (Gibbs et al. 2007), but lower than the neutral value of 1. It therefore indicates constrained evolutionary change, although with less constraint than the average protein, and supports the idea that the amino acid sequence of neurophysin I is functionally important.
We then determined the sequence of a 2.1 kb region encompassing the gene (Chr20: 2999650-3001750, Figure 1) in 91 individuals from four populations, and one chimpanzee. We found 14 variable positions in humans, all of which were base substitutions and lay in the flanking or intronic regions; none were found in the coding region (Table 2; Supplementary Table 3). Six of the SNPs were present in dbSNP, but the other eight were novel, and all of these were confirmed by genotyping (Supplementary Figure 1). As expected, the novel SNPs were lower in frequency than the known ones, but included two present at >10% frequency in individual population samples. Six dbSNP entries were not found in our sample: one of these was rare, and the others have not been reported in population surveys, so may either represent other rare variants or erroneous database records (Table 2). The average nucleotide diversity for the region was 8.6 × 10−4, and was slightly lower for the OXT gene alone (4.5 × 10−4) (Table 3), but both these values are within the normal range (e.g. Akey et al. 2004). Strikingly, both nucleotide diversity and divergence from chimpanzee varied across the region, being low upstream of the gene and in the exons, but higher in intron 1 and downstream of the gene (Figure 1). On this timescale, the 5′ region is almost as constrained as the coding region.
We next investigated whether the patterns of variation within the four populations were consistent with a neutral evolutionary history, or indicated the action of positive or balancing selection. The allele frequency spectrum statistics examined (Tajima's D, Fu and Li's D, Fu and Li's F and Fay and Wu's H) can show unusually low values in response to positive selection or high values in response to balancing selection, but were all consistent with neutrality, both in individual populations and in the worldwide sample (Table 3). The degree of population differentiation (FST value), which can be high if positive selection has acted differentially on populations, was not observed to be unusually high, ranging from zero to 0.17 (Supplementary Table 4). Eighteen inferred haplotypes were present and linkage disequilibrium was extensive (D′ = 1) through most of the region (Supplementary Figure 2). A median-joining network linking the haplotypes was therefore constructed and showed a compact structure with limited reticulation as expected from the LD pattern (Figure 2). The three most common haplotypes were all present in all populations, consistent the lack of population differentiation shown by individual SNPs.
Finally, we assessed whether the level of variation in humans was as expected from the amount of divergence from chimpanzee, as it would be under a strictly neutral model of evolution. For this, we used an HKA test in which OXT was compared with other neutral resequenced regions. In order to provide power, we used ~500 kb resequenced by the ENCODE project (Birney et al. 2007) in three of the same populations. More variation was found at OXT than expected (P = 0.024; Table 4), and in view of the large comparative region analysed, this is a robust result.
We have carried out a thorough investigation of the genetic variation in and around the OXT gene. We were particularly interested in two aspects of this: whether or not variants likely to influence function were present, and whether or not the region showed a neutral evolutionary history.
Functional variants are located most obviously in the protein-coding regions of a gene, but evolutionary analyses show that many non-coding sequences are conserved to the same extent as coding sequences, and thus are also likely to be functionally important. No OXT coding variants were found, but two novel low-frequency SNPs were discovered in the 5′ flanking region in African populations, one of which (at 3,000,148) lies in the most highly conserved segment outside the exons (Figure 1b). Comparison with a species such as platypus shows that the downstream region is overall as conserved as the upstream (44% downstream, 46% upstream; Table 5), although this conservation is broadly distributed and it lacks highly conserved segments (Figure 1b). It is therefore possible that the SNPs in one or both regions might contribute to variation in the expression of the gene.
Most of the evolutionary tests suggested a neutral evolutionary history of the region, ruling out strong recent positive or balancing selection. The HKA test, however, revealed a larger number of variants than expected, when the evolutionary divergence rate was taken into account. This finding is in one respect conservative, because the evolutionary divergence may be over-estimated when a single chimpanzee is used for comparison. The result can be viewed in two ways. It could be argued that since we applied five tests (Tables (Tables33 and and4),4), a strict Bonferroni correction would require a P value of 0.01 for significance, and thus our observed P value of 0.024 is not significant: the excess variants are within the range expected by chance. However, a Bonferroni correction is probably too conservative in this case, because the tests in Table 3 all examine related aspects of the allele frequency spectrum and so are not independent. According to this view, the excess of variants is above that expected by chance and, to speculate further, increased diversity of 3′ regulatory elements might have been selected for, perhaps leading to a range of OXT expression levels in the population.
It is now possible to design in vitro experiments to study the regulation of OXT expression, and association studies to relate its variation to phenotypes of interest such as plasma oxytocin level or autism. It is striking that LD in the 100 kb region surrounding OXT is very low in all the HapMap populations, and is even incomplete for position 3,001,514 within the short region resequenced, so tagging would be inefficient and all SNPs need to be identified and genotyped directly in association studies. The new SNPs discovered in this study provide additional material for such projects.
Genotyping of novel variants. For each SNP, traces illustrate the common homozygous genotype on the left, and the novel variant in heterozygous form on the right.
Linkage disequilibrium in the resequenced region. SNPs are listed in Table 2. Red: D′ = 1, LOD ≥ 2. Blue: D′ = 1, LOD < 2. White: D′ < 1, LOD < 2.
We thank the Sanger sequencing team for generating the sequence data. Y Xu and LW were supported by the National Natural Science Foundation of China (grant no. 30771817), and by Harbin Medical University (grant); Y Xue, AD and CTS were supported by The Wellcome Trust.