|Home | About | Journals | Submit | Contact Us | Français|
Advantages offered by canine population substructure, combined with clinical presentations similar to human disorders, makes the dog an attractive system for studies of cancer genetics. Cancers that have been difficult to study in human families or populations are of particular interest. Histiocytic sarcoma is a rare and poorly understood neoplasm in humans that occurs in 15–25% of Bernese Mountain Dogs (BMD).
Genomic DNA was collected from affected and unaffected BMD in North America (NA) and Europe. Both independent and combined genome wide association studies (GWAS) were used to identify cancer-associated loci. Fine mapping and sequencing narrowed the primary locus to a single gene region.
Both populations shared the same primary locus, which features a single haplotype spanning MTAP and part of CDKN2A and is present in 96% of affected BMD. The haplotype is within the region homologous to human chromosome 9p21, which has been implicated in several types of cancer.
We present the first GWAS for HS in any species. The data identify an associated haplotype in the highly cited tumor suppressor locus near CDKN2A. These data demonstrate the power of studying distinctive malignancies in highly predisposed dog breeds.
Here, we establish a naturally-occurring model of cancer susceptibility due to CDKN2 dysregulation, thus providing insight regarding this cancer-associated, complex, and poorly understood genomic region.
Although many genes have been associated with rare, high-penetrance cancer syndromes in humans, such syndromes account for only a fraction of familial cancer risk (1). A recent explosion of genome wide association studies (GWAS) has identified several putative cancer-associated risk alleles, many of which are located near known cancer genes, although not within classic exonic boundaries (reviewed in:(2)). These non-coding, low-penetrance, cancer-susceptibility alleles likely contribute to quantitative changes in gene expression and, as such, are difficult to find.
Dogs are particularly well-suited to studies of malignancy (3) as cancer is the most frequent cause of disease-associated death in dogs, and naturally occurring, cancers are well-described in several breeds (4, 5). The high incidence of breed-specific cancers offers opportunities to identify sequence variants leading to disease susceptibility that have been difficult to find in humans. Application of the canine system is particularly efficacious when multiple closely related breeds or pure breeding populations of the same breed exist, each with predisposition to the same disease and as such, are likely to segregate the same founder mutation (6, 7), (8).
Histiocytic sarcoma (HS) is a highly aggressive and lethal dendritic cell neoplasm that occurs in 15–25% of BMD (9–12). Localized HS most commonly develops in the skin or subcutis of an extremity. The tumor is locally invasive with metastasis to lymph nodes and/or blood vessels. Disseminated HS is a multi-system disease with tumors appearing in numerous organs including the spleen, liver and lungs. Progression to death is rapid (12). Almost no information is known about the genetic underpinnings HS in humans or animals (13), largely due to the lack of a well-characterized biological system for study. In this paper we summarize findings from two independent histiocytic sarcoma GWAS in BMDs, offering insights into this poorly understood class of neoplasms as well as establishing a foundation for future studies of histiocytoses in humans.
All dog owners provided informed consent consistent with Animal Care and Use Committees at their collecting institution. DNA was isolated from 475 BMD blood samples. Two hundred forty, 95 and 140 were provided from NA, France and The Netherlands, respectively. All dogs with available pedigree data were unrelated at the grandparent level. For detailed collection information see Supplemental Materials S1. Whole blood was collected with either EDTA or ACD anticoagulant. In NA and The Netherlands, genomic DNA was isolated using a standard phenol-chloroform protocol (14). In France the Nucleon BACC Genomic DNA Extraction Kit was used (GE Healthcare, Piscataway, NJ). All samples were stripped of identifiers, numerically coded, and aliquoted for long-term storage at −80°C.
Samples were genotyped using the Canine SNP20 Beadchip panel (Illumina, San Diego, CA) which included ~22,000 SNPs. After removing SNPs with a minor allele frequency <0.01 and genotyping rate <80%, 17,218 SNPs remained. The final data set included 114 and 120 case and control dogs, respectively, from North America (NA), and 128 and 112, respectively from Europe. More than 96% of European dogs came from France and The Netherlands. Two rounds of principal components analysis (PCA) were performed on the dataset using EIGENSTRAT (15). The first removed genetic outliers and the second determined the amount of stratification within the dataset. Eight dogs were removed because they were >6 standard deviations from the average across the top 10 PCs. The remaining 466 dogs (228 NA and 238 European) were clustered according to the top 10 PCs. The process was repeated in the NA and the European samples independently, and two additional outliers were removed. Fst values and inbreeding coefficients were calculated by continents, countries, and case/control status. The complete dataset of genotypes and phenotypes is available by request sent to dog_genome/at/mail.nih.gov.
In the first of two GWAS, 111 case and 117 control BMD from NA were analyzed using PLINK v1.07 (16). Standard χ2 values of association were calculated (17). Spurious missing data was imputed using BEAGLE and the analysis repeated, correcting for multiple testing with 10,000 permutations using PRESTO (18, 19). The data were analyzed using EMMA to correct for stratification and cryptic relatedness (20).
In the second GWAS, 125 European cases and 111 controls were analyzed using the same methods. The two NA and European GWAS results were compared and the datasets combined. Association was calculated without correction in PLINK, stratified by continent and permuted 10,000 times in PRESTO, and corrected for population structure using an additive kinship matrix implemented by EMMA. Loci that were significantly associated with disease in both populations were considered further.
A custom SNPlex genotyping assay was run on 175 case and 162 control BMDs from NA (213) and Europe (124) (Applied Biosystems, Foster City, CA). Selected SNPs spanned 9.7 Mb (chr11:39,072,460-48,846,456) and surrounded highly associated markers from the combined GWAS. After removing failed and uninformative SNPs, 229 remained (Supplemental Table S2).
Genotypes from SNPlex were imputed on all 466 dogs from the original GWAS using BEAGLE (18, 21). Because the datasets were divided by continent before imputation to account for differences in population structure, an additional 212 SNPs were added to the original GWAS data and association was calculated, as previously described. Phasing and association was performed using BEAGLE. Finally, the datasets were combined and association calculated with correction for population structure across the genome, including and excluding chromosome 11. Frequency of the associated haplotype was calculated in cases and controls from each population.
Amplicons were sequenced within a 300 kb region (chr11:44,001,369-44,331,631) that included the 198kb (chr11:44,133,881-44,331,630) associated haplotype and all predicted exons of CDKN2A, CDKN2B, and MTAP. Primers were designed using Primer3 v0.4.0 (22) (Supplemental Table S2). Segments were amplified from 24 case and 20 control NA BMD using standard protocols and sequenced using BigDye Terminator v3.1 on an ABI 3730xl DNA Analyzer (Applied Biosystems). Sequencing 306 amplicons revealed 133 SNPs. The complete sequence of the INK4A transcript and the genomic sequence of CDKN2A exon1a and promoter have been submitted to NCBI GenBank (accession numbers JN086563 and JN086564, respectively).
Sequences were analyzed using Phred/Phrap/Consed (23–25) with SNPs identified by Polyphred (26). BEAGLE was used to estimate haplotypes, impute genotypes, and calculate Fisher’s exact association for SNPs and haplotypes after 10,000 permutations (21, 27). The sequenced SNPs were imputed on the NA SNPlex dataset to calculate association. Markers with >25% missing data were removed prior to imputing. Pairwise LD and haplotype block analysis was performed using Haploview v4.1 (28).
To confirm the relative strength of SNP associations from imputed data, nine SNPs were genotyped in an additional 109 cases and 89 controls (Supplemental Table S3). Also, ten dogs from seven non-HS breeds were genotyped with the same SNPs.
Four ml blood samples were obtained from 53 healthy BMD, randomly selected from ~500. Peripheral mononuclear cells (PBMC) were isolated, uniformly plated, and allowed to expand in the presence of IL-4 (50 ng/ml) and GM-CSF (33ng/ml) in order to select for dendritic precursor cells (29, 30). At 19 days, the cells were harvested and DNA/RNA were extracted using standard methods.
Samples were assigned haplotypes based on their genotypes at chr11:44,201,923 and 44,215,162. Pre-designed Taqman assays were obtained for the B2M (endogenous control) and MTAP genes while primers and probe were designed for CDKN2A and CDKN2B using Primer Express (Applied Biosystems). Real-time PCR was performed on 24 ng of cDNA for each assay using standard protocols. Each sample was run in triplicate and CT values averaged. Relative quantities of the transcripts and average fold change were calculated using the ΔCt method compared to an endogenous control and a reference tissue (testis) and then corrected for amplification efficiency (31, 32). Data was collected for six dogs homozygous for the CA haplotype, six heterozygous dogs, and five dogs lacking the CA haplotype. P-values were calculated for the differences in distributions of transcript quantities using both the two-tailed student’s t-test and the non-parametric Wilcoxon rank-sum test.
PCA of the entire dataset of 474 dogs and 17,218 SNPs revealed significant stratification among the populations of BMD from NA and EU (Figure 1c). Plots of PCs 1 and 2 show separation of NA and EU populations (Figure 1a) while cases and controls are fully integrated (Figure 1b). Calculations of Fst averaged over all loci show that divergence between cases and controls is an order of magnitude lower than between geographic localities (average Fst=0.001 and 0.015, respectively). Overall, NA dogs demonstrated a higher level of inbreeding than either of the EU populations. However, none of the case groups were significantly more inbred than the controls (Supplemental Table S4).
A GWAS was conducted using 111 affected (case) and 117 unaffected (control) BMD from NA revealing >20 markers within a single peak of association on CFA11 spanning ~9 Mb from 38.5–47.1 Mb (praw=1.41×10−9, pemp<1×10−4, 10k permutations) (Figure 2a). After correcting for population stratification and cryptic relatedness, the most associated marker was CFA11:47,179,346 (pcorrected=5.6×10−6).
A second GWAS, performed using 125 case and 111 control European BMD, revealed HS loci on CFA11 at 47.1 Mb (praw=1.5×10−6, pemp=0.0064) and CFA14 from 10.9–14.0 Mb (praw=9.8×10−8,pemp=0.0003). After correction with EMMA, both loci remained significant (pcorrected=1.50×10−7 and pcorrected=6.59×10−6, respectively) (Figure 2b).
The datasets were combined and association analysis with correction for population structure revealed the same two loci as above (Figure 2c), however only the CFA11 locus was associated in both the individual and combined GWAS. The SNP at CFA11:47,179,346 had the strongest association with disease susceptibility by all methods with praw=1.11×10−11, pemp<1.00×10−4, and pcorrected =1.76×10−8. Quantile-quantile plots, showing the distribution of p-values before and after population correction, are shown in Supplemental figure S5. The top ten associated SNPs from each dataset and analysis method are listed in Supplemental Table S6, followed by a list of possible candidate genes from the locus on CFA14 for future studies (Supplemental Table S7).
To refine the locus on CFA11 we genotyped an additional 229 SNPs spanning 9.7 Mb (Supplemental Table S2) in 327 dogs from the combined BMD dataset and imputed the genotypes using all 468 dogs. In all populations two markers in complete LD with one another showed the highest association with pcorrected =4.15×10−12, 3.15×10−8, and 9.90×10−21, in NA, EU, and the combined set, respectively (Supplemental Figure S8). These markers were located at 44,191,398 and 44,215,162 in the CanFam2 assembly.
Genotypes across the region were phased and multi-marker association computed. All dogs carrying the case-associated allele at position 44,191,398 carried an identical three SNP haplotype at positions 44,191,398, 44,215,162, and 44,254,083. The haplotype was common in BMD, however, and comprised 80% of case haplotypes in all populations, but ranged from 49% to 64% in controls (Table 1). Strikingly, 65% of cases were homozygous for the CA haplotype within each population, with >95% of cases carrying the CA haplotype on at least one chromosome. By comparison, only 18–39% of the controls were homozygous for the CA haplotype (Table 1).
These three SNPs define a 198kb region (44,133,881-44,331,630) that spans methylthioadenosine phosphorylase gene (MTAP), and the cyclin dependent kinase inhibitors 2A (CDKN2A), and 2B (CDKN2B).
We identified 139 informative SNPs by sequencing, including 115 within the 198 kb haplotype (Supplemental Table S2). Two coding mutations were found in CDKN2A; a silent mutation in exon 1a and a mutation in exon 2 that is silent in p14ARF but changes an asparagine to a histidine in p16 INK4a. The altered amino acid is not conserved across species and the SNP does not segregate with the disease (Fishers exact p=0.09893). A SNP 88 bases upstream of exon 1a, that is likely within the 5′UTR, showed an association with HS (Fishers exact p=1.09×10−6, pemp=0.00029). However, this SNP alone is unlikely to be causative as the associated allele was found in eight of ten dogs from breeds in which HS is rare (Supplemental Table S3).
Thirty SNPs spanning positions 44,191,314 - 44,293,447 were in complete LD with the two most highly associated SNPs from the combined GWAS (Figure 3b). The associated haplotype was reduced to 75,920 bases (44,177,956-44,253,875) including SNPs at positions 44,177,978-44,251,174 surrounding the MTAP gene and ending within intron 2 of CDKN2A. This haplotype is broken by a single SNP at position 44,232,491 that appears to have arisen on the CA haplotype. Haplotypes on either side of this SNP are nearly identical in frequency with only one dog out of 228 being a possible recombinant.
More than 65kb of the 75kb haplotype have been sequenced in the discovery set. The remaining 10kb is divided among 25 loci ranging from <10 to nearly 2000 bps and is largely comprised of repetitive elements. Thus far, no single marker or combination of markers within the 75.9 kb haplotype conveyed significantly more risk than any other (Figure 3a). LD in dog breeds can be expansive, extending over one Mb at some loci (33, 34). Because of the near perfect LD found within this disease-associated region, and the lack of coding mutations, finding the causative mutation remains outside the scope of this paper. However, functional approaches can be applied to determine the most probable effect of the elusive mutation(s).
The disease-associated haplotype lies across MTAP and continues through the last exon of CDKN2A. We performed quantitative real time PCR across the region to determine if there were changes in transcript levels that correlated with the CA haplotype. Expression was measured on total RNA from histiocytes cultured from whole blood samples of healthy BMDs carrying zero, one, or two copies of the CA haplotype. No significant changes in MTAP expression were observed. However, individuals with two copies of the CA haplotype produced significantly higher amounts of both CDKN2A and CDKN2B transcripts, averaging 16 (p=.0173, Wilcoxon rank sum) and four times (p=0.00866) higher, respectively, compared to those lacking a CA haplotype (Table 2). Heterozygotes had approximately half the homozygote level of transcript, but the differences were not significant given the small sample sizes tested. These data suggest that there are variants within the CA haplotype that affect the expression of the CDKN2A and CDKN2B in HS susceptible dogs.
Dissecting the genetic underpinnings of dendritic cell neoplasms presents unique challenges to canine and human researchers alike, due to confusion regarding the origin of these immune cell tumors. While human disorders, such as Langerhans cell histiocytosis, have been well characterized clinically, etiologies remain elusive. We hypothesized that identification of susceptibility loci in the BMD would likely reveal genes of interest for both canine and human disorders, thus leading to a better understanding of the genetic underpinnings of this complex family of neoplasms.
Our data set was comprised of dogs from one breed but two major geographic areas. Average Fst values show that these BMD populations differ at a level similar to human populations from unique European countries (35). This is an order of magnitude lower than differences found between breeds, yet significant to find global alterations in haplotype and allele frequencies (8, 36).
There is slightly higher population-wide heterozygosity in the NA BMD compared to European BMD; however, individuals are characterized by reduced heterozygosity. The effect of such population differences is apparent in a complex disorder such as HS. In the NA population, only one locus segregates with the disease. However, the European population shows at least two. The second European locus may not be important in the NA population, or the underlying mutation may be present at such a high frequency it is approaching fixation. A brief examination of markers across the region shows that all NA dogs share allele frequencies similar to the cases from Europe, supporting the latter alternative.
This GWAS study unambiguously localized the major HS locus to a 9.7 Mb region on CFA11. The advantages of genetic mapping in dogs, where loci are quickly identified with small numbers of samples, can be offset by the potentially difficult transition from disease-associated haplotype to causative mutation (37). We compared mapping results from two populations of the same breed to reduce LD. The European population showed overlapping association with the NA population in a relatively small region of <200 kb, compared to the ~10 Mb identified in the original GWAS. Extensive sequencing revealed a single 75 kb disease associated haplotype.
Over 85% of the CA haplotype has been sequenced. The majority of the unsequenced segments are within the large third intron of MTAP. Expression levels of the CDKN2 genes show a much greater range in cells from dogs carrying the CA haplotype (SD = 0.0052) compared to those with control haplotypes (SD = 8.5×10−5). It is formally possible that this is a consequence of small sample size, more likely it indicates that the causal variant is present on only a subset of CA haplotypes, and has yet to be discovered. The second explanation also accounts for the relatively high incidence of risk-associated haplotypes in unaffected dogs.
Expression analysis suggests that HS is caused by a regulatory mutation(s). Unfortunately, little is known about regulatory elements in the dog. However, we can compare the canine locus to the corresponding human region and predict regulatory potential. For example, based on ENCODE chromatin state predictions from human ChIP-seq data (build NCBI36/hg18) (38, 39, 40), there is a strong enhancer immediately downstream of MTAP. The homologous region in the dog contains at least one of the 28 highly associated HS SNPs, and a region containing a series of SINE elements and repeats that may be amenable to deletion, insertion or rearrangement in addition to base pair changes, providing an attractive site for further investigation. Another of the highly associated SNPs, at position 44,215,162, lies within a second predicted enhancer region.
Our data suggest that variants on a risk-associated haplotype surrounding MTAP and continuing through exon 3 of CDKN2A affect the expression of both CDKN2A and CDKN2B. All three of the proteins transcribed from the CDKN2 genes; p16INK4A, p14ARF, and p15INK4B have unique promoters, but share regulatory elements (reviewed in: (41)). Loss of the CDKN2A-CDKN2B region through mutation, deletion or silencing is among the most frequent alterations found in human cancers, including HS (42, 43). In addition, CGH analysis shows that a region of at least one Mb centered on the CDKN2 locus is lost in ~60% of HS tumors in BMD (44). Though unexpected, increased expression of p16INK4a and p14ARF has been noted in multiple cancers including prostate, ovarian, cervical and mammary (45–47) and is typically associated with poor prognosis (48–50). Studies have suggested that, in these neoplastic cells, p16 inhibits apoptosis, particularly in response to DNA damage. Further investigation of CDNK2 gene regulation in BMD with and without HS may better illuminate the roles of these common cancer-associated genes.
MTAP is important for the salvage of methionine and adenine, encoding an enzyme that plays a role in polyamine metabolism (51). Recently it has been suggested that MTAP may also be a tumor suppressor (52). In our study, variants within or near the MTAP gene are associated with altered expression of CDKN2A/CDKN2B, but not changes in MTAP expression. Thus, our data offer a new perspective on MTAP’s role in cancer. Specifically, mutations within MTAP likely lead to dysregulation of CDKN2A/B.
The established importance of the MTAP/CDKN2A/CDKN2B locus in multiple cancer types, in combination with our finding that naturally occurring sequence variants in BMDs are associated with expression changes in these genes, suggests that the CA haplotype could be relevant for susceptibility to multiple cancers. Some 16.9% of U.S. BMD reportedly die of HS-related causes (Figure 4). Since 38% of a random sample of U.S. BMD (n=53) were homozygous for the CA haplotype (Supplemental Table S9), we hypothesize that multiple types of BMD cancer may be related to variants within the MTAP-CDKN2A region. This concept mimics what has been observed at human chromosome region 9p21, which is associated with susceptibility to several types of human cancer as well as other complex disorders (53).
Here we present the first GWAS of HS in any species. Using a population-guided mapping approach, followed by sequencing, we have identified a 75.9 kb haplotype found in 96% of all HS affected dogs. This haplotype contains features that affect expression of the CDKN2A and CDKN2B genes, which may be a primary contributor to histiocytic sarcoma susceptibility in the BMD. The CA haplotype overlies the MTAP gene and likely contains one or more variants that alter the expression of INK4A/ARF/INK4B but do not affect MTAP expression. It is plausible that numerous cancers developed by BMD are associated with sequence variants in this region. These findings lead us to hypothesize that BMDs are an excellent system for the study of cancer susceptibility due to INK4A/ARF/INK4B dysregulation, allowing for systematic studies regarding the role of naturally occurring sequence variants in this increasingly important locus.
We thank the Berner Garde Foundation, Bernese Mountain Dog Club of America, French Association for Swiss dogs, and the Swiss, Italian, and Belgium Bernese Mountain Dog Associations for providing data and distributing information. We thank also the many breeders, owners, and clinicians who collected data and samples. We thank Dr. James Rocco for supplying biological reagents and Dr. Erik Teske for cytology review,
This work was supported by the Intramural Program of the National Human Genome Research Institute at NIH. Additional support was received from the AKC-Canine Health Foundation grants 2667 and 760 (M.B.), 336 and 935 (E.A.O. and C.A.); the CNRS and French Association for Swiss dogs (C.A.); NIH NCI R01 CA69069, NIH U01 AI07033, and the Harvard Breast Cancer SPORE P50 CA89393 (E.V.S.); the Alberto Vittoni Award (B.H., E.C. and G.R); the Committee of Preventive Health Care of the Netherlands Royal Society of Veterinary Medicine and the breed societies for Bernese mountain dogs in the Netherlands, Belgium, Germany and Austria (G.R.).