|Home | About | Journals | Submit | Contact Us | Français|
The process of domestication can exert intense trait-targeted selection on genes and regulatory regions. Specifically, rapid shifts in the structure and sequence of genomic regulatory elements could provide an explanation for the extensive, and sometimes extreme, variation in phenotypic traits observed in domesticated species. Here, we explored methylation differences from >24,000 cytosines distributed across the genomes of the domesticated dog (Canis familiaris) and the gray wolf (C. lupus). PCA and model-based cluster analyses identified two primary groups, domestic versus wild canids. A scan for significantly differentially methylated sites (DMSs) revealed species-specific patterns at 68 sites after correcting for cell heterogeneity, with weak yet significant hyper-methylation typical of purebred dogs when compared to wolves (59% and 58%, p<0.05, respectively). Additionally, methylation patterns at eight genes significantly deviated from neutrality, with similar trends of hyper-methylation in purebred dogs. The majority (>66%) of differentially methylated regions contained or were associated with repetitive elements, indicative of a genotype-mediated trend. However, DMSs were also often linked to functionally relevant genes (e.g. neurotransmitters). Finally, we utilized known genealogical relationships among Yellowstone wolves to survey transmission stability of methylation marks, from which we found a substantial fraction that demonstrated high heritability (both H2 and h2>0.99). These analyses provide a unique epigenetic insight into the molecular consequences of recent selection and radiation of our most ancient domesticated companion, the dog. These findings suggest selection has acted on methylation patterns, providing a new genomic perspective on phenotypic diversification in domesticated species.
As a result of intense trait-targeted selection over a relatively small number of generations, the process of domestication may result in extreme phenotypic diversity (Darwin 1868; Clutton-Brock 1992; Clutton-Brock 1999; Diamond 2002). Recently, the molecular underpinnings of this diversification has been the subject of intense investigation (Ostrander & Wayne 2005; Dobney & Larson 2006; Wiener & Wilkinson 2011; Albert et al. 2012; Andersson 2013; Larson et al. 2014). Various approaches have been used to unravel the genetic changes that occur during domestication, most notably genome-wide SNP and haplotype analyses to map regions under selection and associate those regions with phenotypes (Andersson & Georges 2004; Doebley et al. 2006; Wiener & Wilkinson 2011; Wayne & vonHoldt 2012).
Changes in genomic regulatory elements or transcriptional machinery could explain a significant component of the dramatic evolution of phenotypic diversity in dogs and other domesticated species (Andersson & Georges 2004). Past studies are limited and have generally been transcriptome-based, focusing on differential expression analyses of brain tissues implicated in behavior and brain development (Saetre et al. 2004; Kukekova et al. 2011; Albert et al. 2012), as well as the impact of captivity on genes involved in the stress response (Kennerly et al. 2008). More recently, the use of epigenome-wide association studies (EWAS) have provided a new profile of the epigenomic changes that occur during domestication (Feeney et al. 2014).
Cytosine methylation serves as one mechanism for pre-transcriptional regulatory control and is involved in essential cellular processes, such as cell differentiation, X-inactivation, and genomic imprinting (Bernstein et al. 2007; Suzuki & Bird 2008; Richards et al. 2010). Methylated cytosines in 5’ promoter regions act as transcriptional suppressors by inhibiting the transcriptional complex from binding to the promoter sequence (Bird 2007). New sequencing approaches can now interrogate the methylation state of each cytosine, specifically cytosines in CpG islands (CGIs) comprising putatively regulatory elements and representing a potentially functional fraction of the mammalian genome (Saxonov et al. 2006). In mammals, these islands are often found in association with promoter elements in genes (Deaton & Bird 2011) and epigenetic modifications most frequently occur within the CG dinucleotide motif (Bernstein et al. 2007).
Here, we investigate the methylation variation in the domesticated dog and its wild predecessor, the gray wolf. The dog is considered the oldest domesticated species, providing a rich system for investigation of the evolutionary changes that occur during the domestication process (Sutter & Ostrander 2004; Boyko 2011; Shearman & Wilton 2011; Wayne & vonHoldt 2012; Freedman et al. 2014). Remarkably, the extreme genetic and phenotypic variation observed within dog breeds arose in a short time period, initiating 15,000 to 40,000 years ago (Sablin & Khlopachev 2002; Savolainen et al. 2002; Germonpré et al. 2009; Boyko et al. 2010; Vonholdt et al. 2010; Ovodov et al. 2011; Larson et al. 2012; Freedman et al. 2014; Skoglund et al. 2015). Most modern dog breeds, however, have emerged from controlled breeding practices of the last 200 years involving narrow founding, genetic isolation, and strong phenotypic selection according to breed standards (Parker et al. 2004; Vonholdt et al. 2010). This unique history has created a genetic architecture that is extremely effective for unraveling quantitative phenotypic traits such as body size (Parker et al. 2004; Wayne & Ostrander 2007).
We begin by comparing domestic dogs to the pedigreed population of gray wolves in Yellowstone National Park (YNP) (Vonholdt et al. 2008). Recent studies have revealed many aspects of the Yellowstone population, from their genealogical history and trait evolution (Vonholdt et al. 2008; Anderson et al. 2009), to disease epidemiology (Almberg et al. 2009), life history (MacNulty et al. 2009a; MacNulty et al. 2009b; MacNulty et al. 2011; Stahler et al. 2013; Cubaynes et al. 2014), and population viability (Coulson et al. 2011). We build upon the rich literature on domestic dogs (e.g. heritability estimates, linkage analyses, genome-wide association studies, candidate gene approaches) through the addition of the canine epigenome, an ideal tool to investigate the molecular signature of recent rapid radiation and intense selective pressure. In addition, the samples we analyze include a wide selection of modern and ancient dogs breeds, as well as village dogs, which are dogs whose breeding is not well controlled and represent a mix of modern and ancient lineages (Boyko et al. 2010). Finally, using the Yellowstone wolf genealogy, we estimate the heritability of each methylation mark (Bell et al. 2012).
Genomic DNA (1μg) was extracted from whole blood from 50 domestic dogs from 19 recognized American Kennel Club (AKC) purebred dog breeds (further referred to as purebred dogs), two Boz breed dogs from the Turkish Caucasus, two aboriginal island breeds (dingoes, n=2; New Guinea Singing Dogs, ngsd, n=4), and three village dogs. Similarly, we included 35 gray wolves (C. lupus) from YNP (Supplemental Table S1). All DNA was prepared using QIAamp DNA mini kits following manufacturer's protocol (Qiagen, DNeasy blood and Tissue kit). Whole blood is both a reliable proxy for organism-level transcriptional activity (Liew et al. 2006; Kaminsky et al. 2009; Rakyan et al. 2011) and methylation marks are preserved and stable over time in frozen blood, unlike RNA collections (Ferrer et al. 2008; Lofton-Day et al. 2008; Van der Auwera et al. 2009; Wieczorek et al. 2009). DNA was quantified using a Qubit 2.0 Fluorometer and checked on a 2% agarose gel for degradation.
We used reduced representation bisulfite sequencing (RRBS) to explore methylation patterns at CGIs, a single-base resolution methylation analysis from the sequencing of bisulfite converted DNA of CpG-rich regions (Meissner et al. 2005; Saxonov et al. 2006). We constructed RRBS genomic libraries for sequencing on an Illumina HiSeq2500. High purity genomic DNA was digested with Msp1, ends were repaired with adenine on the 3’ ends followed by ligation of adapters with unique 6mer barcode sequence tags using the TruSeq Sample Preparation kit's standard protocol (Illumina Inc). Library purification and size selection were completed using Agencourt AMPure XP beads (Beckman Coulter). We selected fragments between 100-400bp in size then treated them with bisulfite using an EpiTect kit (Qiagen) for conversion of unmethylated cytosines, following the manufacturer's procedure for low DNA input (1 ng-2 μg) with a double incubation time of 10 hours to ensure a high conversion rate. Our converted DNA was then subjected to 12 PCR cycles of amplification for enrichment of adapter-ligated fragments with MyTaq Mix (Bioline Inc). Library quality was assessed with an Agilent 2100 BioAnalyzer (Agilent) and concentration estimated using a Qubit 2.0 Fluorometer. Libraries were then standardized to 10nM, and then pooled on average four samples per lane for sequencing (minimum 2, maximum 6). To avoid generating redundant methylation information, single-end reads were chosen over paired-end reads for sequencing. Single-end DNA sequencing (1×101nt) was performed using the Illumina HiSeq 2500.
We used bowtie2 in BS-Seeker 2 to map our reads to the reference dog genome (canfam3.1, (Lindblad-Toh et al. 2005)) for read lengths bounded from 50-300 bp (Chen et al. 2010). Per barcode pool, there were 3.25 mismatches (min=3, max=5). Therefore, we deplexed pooled sequence data with a custom perl script with a maximum of two mismatches between expected and observed barcode sequence tags. FASTQ files were then trimmed for low quality reads and adapter sequences clipped using Trim_Galore v0.3.7, discarding reads that were <20 bp in length (Krueger 2013). Trimmed reads were mapped to the reference dog genome using bowtie2 (Langmead 2010; Langmead & Salzberg 2012) and methylation called using BS-Seeker2 (Chen et al. 2010). Bowtie aligns C/T converted reads to C/T converted strands. During post processing, the number of user-defined mismatches is counted with the exception of those between read Ts and genomic Cs. Low-quality mappings with user-defined mismatches are then removed.
We calculated the methylation frequency (MF) as the proportion of methylated cytosines out of the total number of methylated and unmethylated cytosine (sequenced as thymine) reads per site (Chen et al. 2010). For a given frequency, we used a 95% confidence interval threshold value of 0.33 to calculate frequency confidence. Our confidence interval threshold (the maximum range we accept for the 95% confidence calculation) was a function of the number of counts at a particular position, so that for every frequency we used, we were 95% confident that our frequency was real within a window of 0.33 around that number. We also excluded cytosines within the CHG and CHH motifs (Zemach et al. 2010). Cytosines were annotated as being either intergenic or in a promoter (within 5Kb of transcriptional start site; (Agirre et al. 2015)), exon, or intron. Additionally, we used RepeatMasker (Smit et al. 1996) to classify cytosines within canine repeat elements (e.g. LTRs, simple repeats).
To convert methylation frequency into a diploid genotype, we discretized methylation values into epigenotypes: sites with little or no methylation (MF<0.33) were annotated as AA epigenotypes, sites with high or fixed methylation (MF>0.66) were annotated as BB epigenotypes, and sites with intermediate methylation levels (0.33≤MF≤0.66) were annotated as AB epigenotypes (Heyn et al. 2013; Wang & Fan 2015). This allowed for the downstream application of traditional population genetic analyses (e.g. Principal Components Analysis, PCA, Bayesian-clustering, association scans) on methylation data through their conversion to diploid genotypes where each epiallele retains some broad level of methylation information (e.g. A epiallele is hypo-methylated; B epiallele is hyper-methylated).
To assess the samples for quality and general patterns, we employed two methods on our set of 96 samples: 1) hierarchical cluster analysis on methylation frequency data and 2) PCA of SMP data. First, we constructed a Gower pairwise dissimilarity matrix among all samples for autosomes and X chromosome separately with the R function daisy in the cluster package (Gower 1971; Struyf et al. 1997; Kaufman & Rousseeuw 2009). From this matrix, we then performed hierarchical clustering using Ward's method with the hclust function in R (Gordon 1999). Similarly, we conducted a principal component analysis (PCA). However, the nature of RRBS means that we sequence contiguous blocks of nucleotides, thus containing a high degree of redundant genetic information. To avoid violating the assumption of unlinked and independent loci when analyzing SMPs, we chose to use genotypic correlation as a measure of intra-locus linkage through non-random association (e.g. physical linkage). We used PLINK (Purcell et al. 2007) to prune SMPs in linkage disequilibrium (LD) with a genotype correlation of r2>0.2 using the parameter flag --indep-pairwise 50 5 0.2. We used flashPCA (Abraham & Inouye 2014) for PCA as our primary method to identify outlier samples for exclusion in subsequent analyses.
To explore the patterns of SMP variation, we analyzed all canid samples after outliers were removed. We surveyed structure using a clustering method with an explicit population genetics model based on Hardy-Weinberg expectations with the program ADMIXTURE and conducted a cross validation with the --cv flag (Alexander et al. 2009). With this method, each individual is assigned an ancestry proportion based on the number of assumed genetic clusters (K=2-8). Additionally, it assumes individuals are unrelated and loci are unlinked. We used ADMIXTURE to explore the relatedness of the YNP wolves. The purebred dogs are known to be unrelated at the grandparent level, while all family history data are lacking for the non-purebred dog samples. To identify loci that are associated with the domesticated dog phenotype, we conducted an association scan using PLINK and 1,307 unlinked SMPs across the purebred dogs (n=47) and YNP wolves (n=33) with the --assoc flag and a case/control phenotype. This is only a small fraction of the methylation dataset due to the nature of the RRBS approach to capture physical clusters of cytosines in CpG islands. Association p-values were converted into -log10(p) scores and displayed as Manhattan plots using the qqman R function (Turner 2014).
We estimated methylation-specific neutrality statistics with a recently developed method based on a Tajima's D framework with the specific application to SMP data. We first estimated the statistic Dm (Wang & Fan 2015) which detects methylation mutations likely under selection at a particular coding locus. For each gene body represented by sites located within exons and/or introns, we calculated a Dm test to assess potential selection on methylation mutations. /standard deviation . and are two estimators of the mutation parameter, . is the value estimated by the average methylation state difference per cytosine site. is the value estimated by the proportion of methylation segregation sites. When there is an excess of intermediate-frequency SMPs, will have a large value, and Dm tends to be positive. When there is an excess of low-frequency SMPs, will have a small value, and Dm tends to be negative. By theoretic analysis and simulation, it has been shown that directional selection in DNA methylation could generate an excess of intermediate-frequency epialleles (positive Dm) or rare epialleles (negative Dm) dependent on epimutation rate, whereas balancing selection in DNA methylation could generate an excess of intermediate-frequency epialleles (positive Dm). Namely, extreme positive Dm could suggest directional or balancing selection on epialleles, while extreme negative Dm may imply directional selection (Wang & Fan 2015).
Purebred dogs and YNP wolves (n=47 and 33, respectively) were analyzed separately for each gene body tested. In order to examine whether gene bodies in purebred dogs and YNP wolves were under different selection pressures, we calculated the difference between purebred dog and YNP wolves’ Dm values (YNP wolves Dm– purebred dog Dm=DmDiff) for each gene body with polymorphism in both groups (nboth=38). We then performed permutation tests to determine p-values for DmDiff. Under the null hypothesis, there should be no difference between dogs and wolves, i.e. DmDiff=0. Therefore, we performed 10,000 permutations, randomly assigning dog and wolf labels without replacement while keeping the vectors of gene bodies intact to preserve linkage disequilibrium. We then performed a gene ontology (GO) analysis as described below.
Finally, we examined the polymorphic gene bodies in purebred dogs to identify deviations from neutrality through a bootstrap analysis with 10,000 replicates, sampling with replacement while keeping the individual gene body vectors intact. This procedure allowed us to estimate the 99.5% bootstrap confidence interval of the Dm range for each gene body in purebred dogs (ngene_bodies=63). We again conducted a GO enrichment analyses on genes that significantly deviated from neutrality as described below.
To measure the genomic signals of purebred dog domestication, we analyzed methylation data from 33 YNP gray wolves and 47 AKC purebred dogs. We report MF in addition to Delta Δ, the absolute difference between the MF of wolves and purebred dogs. All cytosines located within the mitochondrial genome were excluded, as it does not exhibit a strong signal of DNA methylation in vertebrates due to the underrepresentation of CpG motifs in mitochondrial genes (Cardon et al. 1994).
DNA methylation patterns can vary by biological processes, but also artificially by batch and cell heterogeneity (Christensen et al. 2009; Ji et al. 2010; Bock 2012). For example, blood samples consist of heterogeneous population of cell type in varying proportions, each potentially with a distinct DNA methylation signature that may confound association scans (Reinius et al. 2012). To address this issue, we used a reference-free correction approach in the program EWASher, implemented as part of the FaST-LMM Python v 0.2 package (Lippert et al. 2011; Zou et al. 2014). EWASher computes methylome similarity for every pair of samples. These similarities were then used as the covariance in a mixed model as an implicit proxy for cell-type composition. A reference-based cell-type corrected analysis was performed using the estimated cell-type composition as covariates in a linear regression, in addition to other covariates added to the analysis. The analysis begins by converting raw methylation data into β values. We considered a site to be constitutively methylated if the average probe β value across all samples (cases and controls) was MF>0.8 and to be constitutively unmethylated when MF<0.2. Linear regression was used to correct for all covariates, including batch, computing residuals used in the main association analyses. Using the similarity matrix as the covariance component in the LMM, an associated p-value was generated for each site. If the genomic control factor λ was still inflated (i.e. genome-wide type 1 error rate; (Devlin & Roeder 1999)), EWASher computed the PCs across all samples, included the top PC as a covariate, and then repeated the process. Parameters in the linear mixed model were estimated by maximum likelihood (ML). Association P values were obtained by using a 1 degree-of-freedom likelihood-ratio test and all Bonferroni-significant tests were performed at the α = 0.05 significance level.
We observed only small deviations from the 95% error bars in the quantiles of the theoretical null distribution when plotted against the observed, concordant with our expectation that only a small proportion of methylation sites would be significantly different. To control for batch effects, we added batch as a covariate in our regression analysis. In addition, we included sex of the individual and breed as covariates in our analysis. We applied a Bonferroni-adjusted significance threshold of p<0.005. To survey a larger list of sites, we conducted a GO enrichment analysis on sites that had an adjusted p<0.05 and were found in coding regions (n=8). Last, we compared the list of DMSs between the uncorrected SMP list of 1,307 sites (p<0.05) and the corrected methylation (adjusted p<0.05) association scan of 68 sites to determine if there was overlap between the two types of analyses.
We utilized the known genealogical relationships among the 35 Yellowstone gray wolves to assess overall heritability of methylation marks and their mode of inheritance (Vonholdt et al. 2008). To limit the analysis to relevant members and thus reduce computation time, the pedigree was reduced to include only wolves with methylation values, along with their relatives by using the statistical genetics software Mendel's pedigree trimming option (Lange & Sinsheimer 2004; Lange et al. 2013). Wolves in singleton pedigrees post trimming were removed from the analysis (three wolves total). Thus, the final pedigree consisted of 63 wolves, 30 of which had methylation data.
To estimate heritability of methylation, we used a variance components analysis, which takes into account known genealogical relationships between wolves by partitioning the phenotypic variance (Lange 2002). The variance components considered in our analyses included additive genetic, dominance, and environmental variance. As discussed in detail in Lange (2002), the variance component model posits that a quantitative trait such as methylation levels is controlled by multiple loci (polygenic model). In this model, the additive genetic variance captures the effect of alleles at multiple loci as if they were acting independently on methylation. Dominance genetic variation results from deviation from this allelic independence. Environmental variation is any residual variation in the trait values after accounting for genetic and shared environmental variation.
Using the autosomal sites that remain after filtering, methylation levels were standardized and using Mendel, models with different sets of variance components were fit to the data (Lange et al. 2005; Lange et al. 2013). All models included batch and sex as covariates. In addition, the first 5 PCs from EWASher were included in our analysis to correct for cell-type composition. In order to parsimoniously detect evidence for the heritability of methylation marks, we built models in a stepwise fashion. In analysis 1 we fit the simplest model, which includes additive genetic and environmental variance components only. In analysis 2 we reanalyzed those methylation marks that had non-zero estimates of additive genetic variance from analysis 1 by including dominance genetic variance along with the additive genetic and environmental variances. Significance of heritability estimates for all analyses was determined using a 1-sided z-test. Results were termed genome-wide significant if the p-value was less than the Bonferroni-adjusted significance threshold of p<2.08 × 10−6.
To test if any gene list was enriched for specific functional categories or ontological types, we used WebGestalt (Zhang et al. 2005; Wang et al. 2013b) on a list of genes that were found to be significant outliers. As a background for our GO analyses, we used all of our unique genes from our RRBS data (n=4580). We applied a Benjamini-Hochberg FDR correction (Benjamini & Hochberg 1995).
Four samples were dropped due to low coverage, or extremely high variation in coverage (Supplemental Table S1). Of the remaining 92 samples (nwolf=35, npurebred=47, nBoz=2, nsemi=6, nvillage=2), cytosine-specific coverage was calculated by pooling on average four samples per lane (average 27.5× ±4.8 SD) having 17,024,508 ±2,329,082 cytosines, prior to any filtering (Supplemental Table S2). After conservative filtering, 24,205 sites remained (nAuto=23,997; nXchr=208); 33% of the cytosines were located in intergenic regions, 40% were in coding regions and 26% were in promoters (Supplemental Table S3). Further, 19% (n=4,688) of sites were found within repeat elements, with the majority in LINEs (21%, n=978) and SINEs (43%, n=2,002) (Supplemental Table S3). Compared to the 34% TE distribution in the canine genome (Lindblad-Toh et al. 2005), the proportion of methylated loci within all repeat elements for our 24,205 sites was approximately 19% (χ2=0.058, df=1, p=0.81).
The average methylation (± Standard Deviation; SD) was moderate across 92 samples (MFwolf=0.38 ±0.4; MFpurebred =0.34 ±0.4; MFBoz=0.42 ±0.4; MFsemi=0.35 ±0.4; MFvillage=0.40 ±0.4). However, methylation was significantly higher on the X chromosome than on autosomes (MFautosomes=0.34±0.4; MFXchr=0.75±0.4; 1-tailed t-test unequal variance p<<0.0001). To identify outlier samples, we conducted a PCA on 23,997 autosomal and 208 X-linked cytosines that were converted to SMP genotypes. Using a genotype correlation inclusion threshold of r2<0.2, we retained 1,291 unlinked autosomal and 16 unlinked X chromosome SMPs, of which 8% were found in repeat regions (n=106, where 48 were found within LINEs and SINEs), 32% were within coding regions (n=413), 53% intergenic (n=692), and 15% regulatory (n=202). Across the genome, we identified seven samples that were subsequently excluded from all analyses due to their extremely polarizing effect on the PCA (Supplemental Fig. S1). After removal of outlier samples, the genome-wide average methylation (SD) remained moderate (MFwolf=0.34 ±0.4; MFpurebred =0.34 ±0.4; MFBoz=0.39 ±0.4; MFsemi=0.36 ±0.4; MFvillage=0.38 ±0.4; Supplemental Table S4).
In addition, after removal of outlier samples, a hierarchical cluster analysis of autosomal SMPs (nSMPs=23,997) identified three distinct groups: wolves, dogs, and a “mixed” cluster containing two subgroups of wolves, ancient dogs breeds, and aboriginal island dogs (dingoes removed as outliers) (Fig. 1A). Across the autosomes, the wolves formed a distinct group to that of dogs, with >85% accuracy of species assignment (29/33 wolves; 39/46 purebred dogs) (Fig. 1A). Although there was subdivision of sexes within each cluster, this was less absolute (11/15 female wolves; 10/18 male wolves; 17/25 female purebred dogs; 8/21 male purebred dogs). However, despite a smaller set of X-linked SMPs (nSMPs=208) with variable branch lengths, overall cluster analysis produced five distinct groups: male dogs (15/21), female dogs (14/25), male wolves (18/18), female wolves (12/15), and a “mixed” cluster of aboriginal island/village dogs and wolves (Fig. 1B). The primary axis of divergence for the X-linked SMPs was sex (26/40 females; 33/39 males), followed by species assignments (31/33 wolves; 32/46 purebred dogs). In both dendrograms, there were outlier assignments with wolves processed in a different batch (n=4), aboriginal island dogs (n=4), boz breed dog (n=1), village dogs (n=2), and some putatively ancient dog breeds (bsji, chcr, husk, peke, samo; n=6) (Fig. 1). PC1 also confirmed that the predominant divergence among the samples was a species-specific pattern explaining 7% of the variation (Fig. 2). Relative to the wolves, all dog samples displayed a wider spatial variation along axis 2 (3.8% variation explained).
We then evaluated population subdivision using the program ADMIXTURE for K=2-8 genetic groups for 1,307 unlinked SMPs (Alexander et al. 2009)(Supplemental Fig. S2; Supplemental Table S5). The best fit at K=2 clearly defined two epigenetic groups, gray wolves and all dogs (purebred, aboriginal island, and village). As K increased, subdivision was found within both groups, though it was not consistent with breed grouping. However, three main groups were distinct at higher K values: wolves, purebred dogs, and the cluster containing the aboriginal island and village dogs (Supplemental Fig. S2).
An association scan of 1,307 unlinked SMPs identified 36 significant sites associated with domestication (p<10−5, -log10(p)>5) (Fig. 3; Supplemental Table S6A), of which 47% (17/36) were sites located within a LINE or SINE element, 67% (24/36) intergenic, 25% (9/36) coding, and 6% (2/36) promoter. Both LINES and SINES were significantly enriched compared to the 1,307 unlinked SMPs dataset (Supplemental Table S3). Eleven of the 36 significant sites were located within genes (nexons=1; nintron=8; nprom=2), and the majority of the associated sites (27/36) were hyper-methylated in purebred dogs (average MFwolf=0.62; MFpurebred=0.75). We found an identical pattern of hyper-methylation of transposons in purebred dogs (15/17 hyper-methylated in dogs; average MFwolf=0.61; MFpurebred=0.81). Using the OMIM database (Hamosh et al. 2005), we annotated the significant DMSs associated with genes (Supplemental Table S6A).
EWASher retained 962 cytosines with 0.2<MF<0.8 between 45 purebred dogs and 33 YNP wolves, of which 68 were significant (corrected p<0.05; Supplemental Table S3) and 14 were highly significant (p<0.005; Fig. 4, Supplemental Table S6B & S7; Supplemental Fig. S3). Out of the 68 significant cytosines, the majority was located within intergenic regions (n=53). The repeat elements were located predominantly in unknown locations (n=33). The majority of the associated sites (36/68) were significantly hyper-methylated in purebred dogs (adjusted p<0.05; average MFwolf=0.58; MFpurebred=0.59). The average methylation frequency difference between purebred dogs and gray wolves was 0.009 (SD±0.018). Among theses significantly associated DMSs, 51% (30/68) consisted of repetitive DNA of which 30 were transposons. We conducted a GO analysis of eight out of 68 sites that were non-overlapping and had an associated annotation. After Benjamini-Hochberg FDR correction, none of our GO categories under Biological Process, Molecular Function, and Cellular Component were significantly enriched (Supplemental Table S8). Further, we found 19 of 68 sites overlapped with the uncorrected SMP association scan, with the majority of sites annotated as intergenic and located within SINES (Table 1). In addition, we found one out of our 14 highly significant, top-ranking sites that overlapped with the uncorrected SMP association scan (Table 1). Out of our 19 overlapping sites, three sites deviated from Hardy-Weinberg Equilibrium (two sites exhibited an excess of heterozygotes and a single site exhibited an excess of homozygotes; Table 1).
To obtain a perspective at the level of the gene and determine a possible correlation with methylation status, we asked if any repetitive element was within 3Kb of each DMS. We hypothesized that if proximal repetitive elements are epigenetically silenced, their methylation status may spread to nearby sites (Vonholdt et al. 2012). We found that 66% (n=45) of the 68 DMSs were located within or were proximal to repetitive elements. Of the 15 genic DMSs, the majority (67%, n=10) was associated with repeat elements.
Estimating methylation-specific neutrality statistics using a Dm test revealed that out of 3,229 annotated gene bodies (i.e. the gene bodies in the genome covered by our methylome mapping approach), 121 (ndog= 63; nwolf= 96; nshared= 38) exhibited SMP variation (Supplemental Table S9A). The mean observed gene body Dm values we calculated for purebred dogs was −0.5598 and −0.144571 for YNP wolves, eight of which were significantly different (p<0.05) between dogs and wolves (DmDiff different from 0) as assessed by permutation (Table 2). Of these eight genes, SLC17A8, a glutamate transport gene, overlapped the list of significant outliers from both the uncorrected SMP association and corrected methylation scan (Table 1; Table 2). GO analyses revealed that none of our GO categories under Biological Process, Molecular Function, and Cellular Component were significantly enriched after Benjamini-Hochberg FDR correction (Supplemental Table S8). Further, 75% (6/8) of dog genes that deviated from neutrality contained repeat elements within their boundaries (Table 2).
We also assessed if gene body methylation in purebred dogs deviated from neutrality through a bootstrap analyses performed on polymorphic SMPs. We found that 11 of 63 gene bodies had 99.5% Bootstrap Confidence Intervals (BCIs) that did not overlap the mean Dm value (Supplemental Table S9B), consistent with substantial deviation from neutrality. From the list of 63 gene bodies, we again found neurotransmitter-related genes (GRBRB1, SLC17A8, and ADCY1). GO analyses did not yield significant results (Supplemental Table S8).
In YNP wolves, pedigree-assisted estimates of heritability were calculated using 23,997 autosomal methylation sites. Analysis 1, which includes fixed effect covariates for batch, sex, and cell-type composition and incorporates variance components for additive genetic and environmental variance, resulted in 4,504 sites with non-zero variance component estimates, 2,282 of which exhibited significant narrow sense heritability after accounting for multiple testing (p<2.08×10−6) (Table 3). Most of the sites exhibited low narrow sense heritability, 81.25% of the methylation marks exhibited virtually zero narrow sense heritability (h2<0.01), whereas a sizable minority (8.89%) had extremely high narrow sense heritability (h2>0.99) with the remaining 2,367 sites showing moderate levels of heritability (median h2=0.536) (Supplemental Fig. S4A; Supplemental Table S6). In addition, the methylation marks with the least variability (<5th percentile) had standard deviations ranging between 0.0004 and 0.0129. After adjusting for multiple testing (p<2.09×10−6), 7% of these 1,200 least variable methylation marks were significantly heritable. When the 1,200 least variable sites were removed, 9.64% of the 22,797 remaining autosomal methylation marks were significantly heritable, demonstrating that the percentage of sites that were significantly heritable differed when considering non-variable sites compared to the rest of the autosomal sites (p= 0.002).
Because the wolf pedigree was large and highly detailed, we were able to further explore the broad sense heritability by looking for evidence of dominance genetic variance. In Analysis 2, we included only sites with non-zero additive genetic variance estimates from Analysis 1 (n=4,505) to avoid building on a model in which the additive genetic variance was already estimated to be zero, and extended the model to include dominance variance. As a result, Analysis 2 identified 1,362 sites with significant broad sense heritability using the same fixed effect covariates and genome-wide significance threshold (p=2.08×10−6) (Table 3). In addition, 309 of these methylation marks with significant broad sense heritability also had significant narrow sense heritability. As expected from the narrow sense heritability, the vast majority of methylation marks had near-zero broad sense heritability (81.26% with H2<0.01). This percentage represented only two additional methylation marks than were observed to have near-zero narrow sense heritability. A sizable minority of methylation marks were estimated to have either extremely high narrow or broad sense heritability (14.71% with either h2>0.99 or H2>0.99), an additional 1,397 methylation marks over the number estimated when only allowing for additive genetic variance (Supplemental Fig. S4B). Some of these estimates of high heritability were likely due to the small sample size, making it difficult to accurately estimate the heritability; however, some of these estimates were likely due to methylation marks that were inherited in an essentially Mendelian fashion. The remaining 968 sites had moderate levels of broad sense heritability (median H2=0.50). Manhattan plots confirmed that the methylation marks with significant narrow or broad sense heritability were distributed throughout the genome (Supplemental Fig. S5).
From a genome-wide scan of methylation at CpG islands across 33 Yellowstone wolves and 45 American Kennel Club registered purebred dogs, we found that the regions exhibiting differential methylation are enriched for transposons and also contain several intriguing categories of functionally relevant genes (e.g. neurotransmitters). In order to minimize differences in methylation values resulting from cell composition, we used a correction-based approach that estimates the degree of genomic inflation (i.e. genome-wide type 1 error rate) with a subsequent correction for cell-type composition using a linear mixed model conjointly with PCA. In addition, we corrected our analyses for multiple covariates including batch, sex, and breed, when applicable. To measure methylation differences between purebred dogs and YNP wolves, we uncovered the signature of selection for polymorphic genes and determined trends in the overall heritability of methylation, providing a revealing picture on the epigenetic changes that occurred during the domestication of the dog from the wolf.
An association scan revealed that differentially methylated sites were generally hyper-methylated in dogs and frequently found within repetitive DNA, primarily transposons. In canines, transposons make up 34% of the canine genome (Lindblad-Toh et al. 2005), but nearly half of other mammalian genomes (~45%) (Smit 1999; Whitelaw & Martin 2001). Transposons play a variety of roles for genome functioning and stability, which include chromosome breakage and rearrangement, in addition to epigenetic transcriptional regulators (Deininger et al. 2003; Slotkin & Martienssen 2007). Due to their large copy number variation and insertion site polymorphism, transposons can serve as the raw material for evolution and influence local transcriptional patterns as they are targeted by epigenetic mechanisms for silencing (McClintock 1984; Kidwell 2002; Deininger et al. 2003). Insertion sites may also disrupt gene function and alter linked phenotypes, serving as a mechanism that can rapidly generate phenotypic variation, an aspect often noted during species domestication. For example, miniature inverted-repeat transposons (MITEs) experienced a rapid amplification and sequence diversification over a relatively short period of time during rice domestication (Feschotte et al. 2002; Jiang et al. 2003; Naito et al. 2006). After transposition, newly inserted elements are often targeted for epigenetic silencing, which frequently suppresses transcription of linked genes (Kaasik & Lee 2004; Zhanget al. 2006; Lister et al. 2008; Hollister et al. 2011). In maize, a Hopscotch retro-element approximately 60Kb upstream of the tb1 coding region is associated with increased tb1 expression, resulting in apical dominance (i.e. changes in branch structure) in maize compared to teosinte. In addition, molecular dating of the Hopscotch retro-element provides suggestive evidence that it predates the domestication of maize and existed in standing genetic variation of teosinte prior to artificial selection (Studer et al. 2011).
Transposons, specifically in SINES, have been previously associated with genome evolution of the domestic dog (Wang & Kirkness 2005; Kirkness 2006). SINES were originally characterized as having a genome-wide distribution (Minnick et al. 1992; Wang & Kirkness 2005) with insertion mutation mapped to phenotypes: narcolepsy in Doberman Pinschers (Lin et al. 1999), autosomal recessive centronuclear myopathy in Labrador retrievers (Pele et al. 2005), and merle patterning in multiple breeds (Clark et al. 2006). There are ~170,000 SINEC_Cf elements, a subfamily of canine-specific SINES, with approximately half of all genes containing a SINEC_Cf insertion (Wang & Kirkness 2005). When these elements are transcribed, they can provide splice receptor sites that can result in the addition of novel exons to the genome.
A recent study revealed the evolutionary history of an intronic SINE insertion located within IGF1 of purebred dogs was lacking in a variety of canid species, including the gray wolf (Gray et al. 2010). Taken together, these results provide a unique perspective of canine domestication where ancient transposons present at a low copy number in the ancestral genome could have experienced rapid amplification as a response to strong artificial selective pressures during domestication, and have since contributed towards functional variation. The “stress hypothesis” can provide an intuitive context for transposon expansion and genome restructuring (Capy et al. 2000; Chénais et al. 2012). As it states, under certain conditions of stress that impact fitness, transposons may become activated as a potential genomic mechanism for novel phenotype production. Dogs, unlike their wild counterparts, recently experienced significant genomic stress due to strong artificial selection and line breeding (e.g. inbreeding; Capy et al. 2000). Physiological stress, for example, is easily imagined as changes in temperature, nutrition, pesticides, or infection. Genomic or molecular stress often include selection or inbreeding which elicits responses, either at the genotypic or phenotypic level (Capy et al. 2000; Bijlsma & Loeschcke 2005). This transposon expansion process, however, cannot continue uncontrolled. Methylation may have served as a flexible epigenetic control of transposition, with targeted methylation acting to suppress rapidly expanding transposon families and marking them for subsequent purging (Smit 1999; Matzke et al. 2007; Slotkin & Martienssen 2007). This speculative mechanism may have contributed to the rapid production of phenotypic variation evident in the history of the domestic dog.
Similar to single nucleotide changes at the genome level, spontaneous epigenetic mutations can occur at the level of the methylome (Becker et al. 2011; Schmitz et al. 2011; Schmitz et al. 2013) and contribute to fitness differences on which natural selection can act (Rapp & Wendel 2005; Geoghegan & Spencer 2012; Hirsch et al. 2012; Geoghegan & Spencer 2013). Specifically, these epialleles can accumulate in an evolutionary lineage in which the linked phenotype shows non-zero heritability and is thus expected to contribute toward individual fitness and the action of natural selection (Wang & Fan 2015).
We identified eight genes with neutrality estimates that are under different selection pressure or demography effects between purebred dogs and YNP wolves, indicating that both selection and demography have played a role in shaping species-specific methylation patterns. Of these eight genes, SLC17A8, a glutamate transport gene, was an outlier in multiple analyses, exhibiting high heritability and containing repetitive elements. In a surveillance of non-neutral methylation patterns within the purebred dog genome, we again found neurotransmitter-related genes (GRBRB1, SLC17A8, and ADCY1). ADCY1 is a neurotransmitter that is believed to play a role in memory acquisition and learning (Wang et al. 2004). Interestingly, ADCY1 is found within the same gene family as ADCY8, a gene previously identified to be under positive selection in the dog genome with a similar role in functional memory (Vonholdt et al. 2010).
Epigenetic mechanisms, such as DNA methylation, can directly affect gene expression (Jaenisch & Bird 2003; Szyf et al. 2005; Ong & Corces 2011; Jones 2012). To date, few studies have investigated transgenerational epigenetic variation in natural animal populations (Peaston & Whitelaw 2006; Ho & Burggren 2010; Feil & Fraga 2011). Fitness-related phenotypes have been documented across generations in Yellowstone wolves, including reproductive success, fecundity, lifespan, inbreeding coefficients, behavior, and coat color (Coulson et al. 2011; Stahler et al. 2013; Hedrick et al. 2014). To account for inbreeding effects that can result from a founding bottleneck, we used the Yellowstone wolf genealogy to estimate the heritability of each methylation mark (Bell et al. 2012). We did not have relatedness information for dogs; therefore, dogs were not included in heritability analyses.
More specifically, we examined the heritability of methylation marks using a variance component model, a classic approach used for many years with quantitative phenotype traits (Lange 2002). As with humans, a minority of the methylation marks in the wolves showed highly significant evidence of heritability (Bell et al. 2012; Grundberg et al. 2013; McRae et al. 2014; Shah et al. 2014). When using a simple model where the level of methylation was determined by multiple genes each acting independently, a fraction (9.51%) of 23,997 autosomal methylation marks was significantly heritable after adjusting for multiple testing (p<2.08×10−6).
With the advantage of having an extensive and accurate pedigree for the Yellowstone wolves, we could further examine whether any of these methylation marks showed evidence of dominance genetic variance (broad sense heritability). We found 1,362 methylation marks (5.68%) that showed evidence of significant broad sense heritability. As our sample size for the heritability analysis was too small to reliably dissect the contribution of random household effects from genetic effects, the effects of common environment may inflate our estimates of the percentage of significantly heritable sites. However, they are similar to that observed in humans and, as in humans, these methylation marks seem to be uniformly distributed throughout the chromosomes.
While we detected transgenerational stability of differentially methylated sites, a necessary requirement for evolution, the scope of our study cannot address the functionality of these sites. However, we demonstrate the co-occurrence of such sites with repetitive elements. This relationship may indicate genotype-dependent epigenetic activity and, thus the patterns we describe may actually represent the underlying genotype that is both inherited and selected. The prevalence of repetitive elements found among the outlier sites support the notion that transposons, in particular, experienced a recent expansion during dog domestication with rapid and subsequent silencing through an epigenetic mechanism. Therefore, while we observe the stability of favorable epialleles, we cannot determine the cause of transmission or their association.
In our study, we found both GABA (GABRB1) and glutamate (SLC17A8) to be significantly associated with the differences between dogs and wolves, both of which are essential inhibitory and excitatory neurotransmitters in the brain, respectively. More specifically, GABA receptors are a family of proteins involved in the GABAergic neurotransmission of the mammalian central nervous system (Johnston 1996; Bormann 2000). GABA-A receptors are the site of action of a number of important pharmacologic agents including barbiturates, benzodiazepines, and ethanol (Whiting et al. 1999). Glutamate transporters are located on the surface of the cells expressing them in the central nervous system, as well as peripheral organs and tissues whereby they control glutamate signaling, and affect a wide-array of functions including cognition, memory, learning, and cell migration, differentiation, and death (Danbolt 2001). Due to the fact that GABA is inhibitory and glutamate is excitatory, both neurotransmitters conjointly control many processes, including brain excitation (Petroff 2002). A recent study found two glutamate receptors (GRIK3 and GRIK2) that were differentially expressed in the frontal cortex of dogs and wolves, suggesting a role of increased excitatory plasticity during domestication (Li et al. 2014).
Alterations in neural signaling and pathways due to changes in density of receptors, biosynthetic pathway alteration, or gene expression differences are indicative of neural plasticity, or flexible changes in the neural system, as a result of recent adaptation, such as changes that may occur during the process of domestication (Popova et al. 1997; Cheng 2010; Kukekova et al. 2011). Behavioral selection is thought to be one of the primary factors that contributed to canid domestication, in which selection acted directly on the genes involved in behavioral development (Trut et al. 2004). Not surprisingly, changes in brain gene expression have been shown in studies of domestication. Specifically, wild canids show a distinct signature of gene expression of a number of genes potentially targeted during behavioral selection in the hypothalamus, prefrontal cortex, and other brain regions compared to domestic dogs (Saetre et al. 2004; Li et al. 2013).
Similarly, genetic polymorphisms in neurotransmitters have been observed both between wolves and dogs and within dog breeds (Hejjas et al. 2007). A study using whole genome sequencing in dogs and wolves uncovered 19 genes related to brain function, with approximately half of the genes associated specifically with nervous system development (Axelsson et al. 2013). Similarly, another study using whole genome sequencing generated a candidate list of genes thought to be under positive selection during dog domestication and found multiple genes categorized as being involved in neurological processes (Wang et al. 2013a).
Our results on neurotransmitter-related genes must be interpreted with caution, however, as we did not collect methylation values directly from brain tissue. Although other studies have shown that methylation in blood is a reliable proxy for methylation in brain tissue (Davies et al. 2012), future studies are needed to verify whether our aforementioned neurotransmitter-related genes are differentially methylated in brain-specific tissue, such as tissue originating from the cortex.
In conclusion, our analysis of DNA methylation provides novel insights into the processes (e.g. demography, selection) that shape the epigenetic patterns in the genome of wild and domesticated canids. We found distinct methylation patterns between dogs and wolves, with a prevalence of differentially methylated sites located in transposons. A substantial fraction of these methylated sites demonstrated high heritability. We develop a speculative hypothesis regarding canine methylome evolution that may have enabled phenotypic diversification. Future studies will be needed to measure the effect of specific transposons on gene functionality, particularly genes involved in neurotransmission.
The authors would like to thank the purebred dog owners for providing blood samples, and Brian Peckinpaugh for providing Boz samples. In addition, we would like to thank Jennifer Listgarten for providing support for the program FaST-LMM-EWASher. We would also like to thank the Princeton University Computational Science & Engineering Support (CSES) group for providing computational assistance for multiple components of our work. This study was supported in part by the National Science Foundation grants DEB-0613730 and DEB-1245373, Yellowstone National Park, and many donors through the Yellowstone Park Foundation. This work was also partially funded through the Intramural Program of the National Human Genome Research Institute and these additional grants: AKC OAK (CHF #1822), NIH T32 HG002536, National Science Foundation DMS-1264153, and NIH GM053275.
IJK and BMV designed the study, performed analyses, and wrote the manuscript. BMV and LR constructed the RRBS libraries. MJT analyzed RRBS data for quality, and assisted with mapping and computing cytosine methylation levels. MMC and JSS conducted heritability analyses and contributed to the discussion. LD assisted with heritability analyses. KDM and JW conducted neutrality analyses and contributed to the discussion; KDM also assisted with mapping samples to the dog genome. JSS provided statistical approaches to assess the neutrality analyses. GEG and ELM conducted initial analyses of differential methylation in dogs and wolves. DRS, EAO, and RKW provided blood samples and manuscript edits. MP, DRS, EAO, and RKW were an integral part of interpretation of analyses and discussion of results.
BS-Seeker2 log files, epigenotype variant files (used for ADMIXTURE, dendrograms, and PCA), and GO results/input files are accessioned onto Dryad (doi:10.5061/dryad.q2g6h). Raw Illumina Reads are accessioned onto NCBI's Sequence Read Archive (SRA) SRP065319: BioSample Accessions numbers 4217607-4217698. Bam files are accessioned onto SRA SRP065666: BioSample Accessions numbers 4217607-4217698.