Dupuytren's disease (DD) has a strong genetic component, as suggested by population studies and family clustering. However, genetic studies have yet to identify the gene(s) involved in DD. The purpose of this study was to identify regions across the entire genome (chromosomes 1 to 23) associated with the disease by performing a genome-wide association scan (GWAS) on DD patients and controls.
Genomic DNA (gDNA) was isolated from saliva collected from 40 unrelated DD patients and 40 unaffected controls. Genotyping was conducted using the CytoSNP™ Infinium® HD Ultra genotyping assay on the Illumina platform. The single nucleotide polymorphism (SNP) genotyping data were analyzed using both logistic regression and mapping by admixture linkage disequilibrium (MALD).
Single-SNP analysis revealed significant associations in regions of chromosomes 1, 3, 4, 5, 6, 11, 16, 17 and 23. MALD analysis showed ancestry-associated regions on chromosomes 2, 6, 8, 11, 16 and 20, which may harbor DD susceptibility genes. Both analysis methods revealed associated loci on chromosomes 6, 11 and 16.
Our data suggest that chromosomes 6, 11 and 16 may contain genes for DD and that multiple genes may be involved in the disease. Future genetic studies of DD should focus on these regions of the genome.
Dupuytren's disease (DD) is the most common heritable disorder of connective tissue, although sporadic cases are often encountered. The disease is characterized by progressive fibroblastic proliferation of components of the palmar fascial complex, leading to digital contracture that most frequently affects the ring and small fingers. In the advanced stages of the disease, the flexion deformities interfere with hand function and the ability to grasp and manipulate objects (1). Palmar fascial proliferation may also occur in non-Dupuytren's disease (2), a distinct clinical entity that should not be confused with Dupuytren's disease; in non-Dupuytren's disease, the palmar fascial proliferation usually follows trauma or surgery to the hand. Patients with non-Dupuytren's disease can be of any age, gender, or race, may be diabetic, and have no family history of DD. The condition is non-progressive and usually unilateral, affecting only one hand, with minimal digital involvement and no flexion deformity.
Typically, DD affects men of Northern European heritage, with a peak incidence at around 50 years of age. The condition is usually bilateral, with digital contracture progressing at variable rates. More than one digit is usually involved, and patients may have ectopic disease (3). Onset at a younger age is associated with a more aggressive disease course and an increased risk of recurrence after treatment (4),(5). Prevalence is highest among elderly men from Scotland, Norway and Iceland, where it can be as high as 40%, but DD has been reported at lower prevalence rates in all ethnic groups (6), including Black Africans (7) and Japanese (8). Prevalence is higher in men, with a male-to-female ratio of approximately 6 to 1 (5;9;10). With advancing age, the incidence among women increases (11); however, the disease can be milder in women (12).
Dupuytren's disease is a familial disorder with a strong genetic predisposition; autosomal dominant inheritance with variable penetrance is the most likely pattern of inheritance. DD can also be influenced by environmental factors, including alcoholism, diabetes, and smoking (5;10;13).
Numerous surgical and non-operative treatment options have been used. These treatments are effective in controlling, but not curing, the disease. Understanding the molecular pathogenesis of DD is necessary for the development of new, more curative therapeutic alternatives.
No single gene responsible for the development of DD has been identified so far, suggesting that DD may have a complex multifactorial etiology (that is, one arising from a combination of environmental and multiple genetic factors). Complex disorders such as systemic lupus erythematosus, diabetes, and certain cancers result from the combined action of environmental factors and alleles of more than one gene. The inheritance pattern of such disorders is usually more complex than that of monogenic disorders and depends on the simultaneous influence of multiple alleles. DD is the most common hereditary connective tissue disorder among Caucasians (14); hence, locating the gene(s) for this disease is of utmost importance, because their identification would provide insight into the fundamental pathogenesis of the disease and suggest targets for prevention or medical intervention. Investigating the genetic nature of DD by localizing regions that may harbor DD susceptibility genes will aid in designing more detailed genetic studies that may identify the precise gene(s) and the causative variants within them. The purpose of this study was to identify regions that may harbor DD susceptibility genes by scanning the entire genome in a group of DD patients.
Two groups were selected for this study: DD patients and controls. All subjects in both groups were Caucasians of European ancestry. DD patients were required to show the following minimum phenotypic characteristics of the condition: unequivocal palmar and digital cords, bilateral disease, multiple affected digits, and progressive digital contractures. The control group included volunteers who had no family history of DD and whose hands were normal on examination, with no evidence of palmar cutaneous thickening, nodules or cords. Saliva samples were obtained from 40 unrelated DD patients (26 males and 14 females) and 40 unaffected controls (34 males and 6 females), and genomic DNA was extracted from these samples. The study was approved by the Institutional Review Boards (IRBs) of both INTEGRIS Baptist Medical Center and the Oklahoma Medical Research Foundation (OMRF), and all samples were obtained with the written informed consent of the subjects.
Genotyping experiments to determine genetic association with DD were conducted on 80 participants, all of whom resided in the same geographic area (Oklahoma City). The clinical characteristics of both groups are summarized in Table 1. Half of the patients in this group had a family history of the disease. The unaffected control participants ranged from 20 to 60 years of age.
Saliva for genomic DNA was collected using the Oragene•DNA kit (DNA Genotek). Oragene•DNA stabilizes DNA in saliva immediately upon mixing, and the collected Oragene•DNA/saliva samples are stable at room temperature for years without processing. Samples were processed no later than 4 months after collection. Saliva collection and genomic DNA extraction were performed according to the manufacturer's instructions. The quality of the extracted DNA was evaluated using a NanoDrop ND-1000 Spectrophotometer (Fisher/Thermo Scientific), and the 260/280 and 260/230 absorbance ratios as well as the genomic DNA concentration were calculated. The accepted ranges were 1.5 to 2.0 for the 260/280 ratio (protein) and 0.0 to 3.0 for the 260/230 ratio (organic material such as phenol). The median yield of genomic DNA from 2 mL of saliva captured in 2 mL of Oragene•DNA was 100 micrograms per sample. The extracted DNA from the unrelated DD patients and unaffected controls was stored at -20°C until genotyping.
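The purity criteria above amount to a simple range check on two absorbance ratios. The sketch below is illustrative only: the function names are hypothetical, the ranges are assumed to be inclusive (the text does not say), and the concentration helper uses the standard approximation of 50 ng/µL of double-stranded DNA per A260 unit, which is not stated in the original.

```python
def passes_purity_qc(a260, a280, a230):
    """True if both NanoDrop absorbance ratios fall inside the accepted ranges."""
    ratio_260_280 = a260 / a280  # low values suggest protein contamination
    ratio_260_230 = a260 / a230  # out-of-range values suggest organic contaminants (e.g. phenol)
    # Assumption: the range limits are treated as inclusive.
    return 1.5 <= ratio_260_280 <= 2.0 and 0.0 <= ratio_260_230 <= 3.0


def dsdna_ng_per_ul(a260, dilution_factor=1.0):
    """Estimate double-stranded DNA concentration (ng/uL), assuming 1 A260 unit ~ 50 ng/uL."""
    return a260 * 50.0 * dilution_factor


print(passes_purity_qc(1.85, 1.0, 1.0))  # True
print(passes_purity_qc(1.3, 1.0, 1.0))   # False (260/280 ratio too low)
```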
Samples were genotyped using the Illumina HumanCytoSNP-12™ array with the Infinium® HD Ultra genotyping assay (~300,000 single nucleotide polymorphisms (SNPs)) according to the manufacturer's instructions at the genotyping facility at OMRF (Oklahoma City, OK). More information on Illumina genotyping is available at http://www.illumina.com/. Genotype data were used only from samples with a call rate (the number of SNPs receiving a genotype call, i.e., AA, AB or BB, divided by the total number of SNPs for each sample) (15) greater than 90%. The average call rate across all samples was 97.18%.
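A minimal sketch of the per-sample call-rate filter, assuming each sample's calls are stored as strings and that anything other than 'AA', 'AB' or 'BB' (for example '--') counts as a no-call. The function names and sample IDs are hypothetical. Applied to one SNP's calls across samples instead of one sample's calls across SNPs, the same function gives the SNP call rate.

```python
VALID_CALLS = {"AA", "AB", "BB"}


def call_rate(calls):
    """Fraction of entries that received a genotype call (AA, AB or BB)."""
    return sum(1 for g in calls if g in VALID_CALLS) / len(calls)


def passing_samples(samples, min_rate=0.90):
    """Return IDs of samples whose call rate is strictly greater than min_rate."""
    return [sample_id for sample_id, calls in samples.items() if call_rate(calls) > min_rate]


samples = {
    "case_01": ["AA", "AB", "BB", "AB", "AA", "BB", "AB", "AA", "AB", "BB"],  # 10/10 called
    "ctrl_01": ["AA", "--", "BB", "--", "AA", "BB", "AB", "AA", "AB", "BB"],  # 8/10 called
}
print(passing_samples(samples))  # ['case_01']
```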
Quality control (QC) is a necessary step to avoid false-positive or false-negative results caused by poor-quality DNA samples or genotyping errors. QC for the statistical analysis methods described below was performed as follows. For the single-SNP analysis, each tested SNP was assessed against predetermined inclusion criteria: minor allele frequency (MAF) > 1%; SNP call rate (the number of samples receiving a genotype call, i.e., AA, AB or BB, divided by the total number of samples for that SNP) > 90%; individual genotyping rate (the number of SNPs receiving a genotype call divided by the total number of SNPs for that individual) > 90%; and Hardy-Weinberg equilibrium (HWE) P > .001 among all samples. For the mapping by admixture linkage disequilibrium (MALD) analysis, QC included individuals with a genotyping rate > 90% and SNPs with a call rate > 95%, MAF > 1% and HWE P > 0.001 (16). HWE is the equilibrium state of a locus in which both allele and genotype frequencies in a population remain constant under appropriate conditions, including random mating, no migration, no inbreeding, no mutation, no natural selection, and large population size (17). Deviations from HWE can be due to genotyping error, chance, violations of these assumptions, or a gene-disease association (18).
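The SNP-level inclusion criteria for the single-SNP analysis can be sketched as follows. This is an illustration, not the authors' pipeline: it uses the asymptotic 1-df chi-square test for HWE (PLINK also offers an exact test, which is better suited to small samples such as this one), genotype counts are assumed to be ordered as (AA, AB, BB), and the function names are hypothetical.

```python
import math


def hwe_p_value(n_aa, n_ab, n_bb):
    """Asymptotic 1-df chi-square test of Hardy-Weinberg equilibrium for one SNP."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    return math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df


def snp_passes_qc(n_aa, n_ab, n_bb, n_missing,
                  min_maf=0.01, min_call_rate=0.90, min_hwe_p=0.001):
    """Apply the single-SNP inclusion criteria: MAF, SNP call rate and HWE."""
    n_called = n_aa + n_ab + n_bb
    if n_called == 0:
        return False
    call_rate = n_called / (n_called + n_missing)
    p = (2 * n_aa + n_ab) / (2 * n_called)
    maf = min(p, 1.0 - p)
    # MAF is checked first, so monomorphic SNPs never reach the HWE test.
    return (maf > min_maf
            and call_rate > min_call_rate
            and hwe_p_value(n_aa, n_ab, n_bb) > min_hwe_p)
```

For example, counts of 25/50/25 match HWE exactly and pass, whereas 50/0/50 (no heterozygotes) fails the HWE criterion.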
Allele and genotype frequencies were calculated for each locus and tested for HWE in controls. Case-control association was analyzed by chi-square tests using 2 × 3 contingency tables of genotype frequencies and 2 × 2 contingency tables of allele frequencies. Odds ratios and P-values were calculated using PLINK (19), a free, open-source whole-genome association analysis toolset designed to perform a range of basic, large-scale analyses in a computationally efficient manner. The P-value reflects the strength of evidence for an association between a SNP and the disease; P < 1 × 10⁻⁴ was considered statistically significant.
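The two contingency-table tests described above can be sketched in plain Python. The sketch assumes genotype counts ordered as (AA, AB, BB) and reports the odds ratio for allele A; it is not PLINK's implementation, the function names are hypothetical, and zero cells are not handled. It relies on two closed-form upper-tail probabilities: for 1 degree of freedom (2 × 2 allele table) P = erfc(√(χ²/2)), and for 2 degrees of freedom (2 × 3 genotype table) P = exp(−χ²/2).

```python
import math


def chi2_stat(table):
    """Pearson chi-square statistic for an r x c contingency table given as a list of rows."""
    row_tot = [sum(r) for r in table]
    col_tot = [sum(c) for c in zip(*table)]
    n = sum(row_tot)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            exp = row_tot[i] * col_tot[j] / n
            if exp > 0:
                stat += (obs - exp) ** 2 / exp
    return stat


def allelic_test(case_geno, ctrl_geno):
    """2 x 2 allele test. Returns (odds ratio for allele A, chi-square, P on 1 df)."""
    def allele_counts(g):
        return 2 * g[0] + g[1], g[1] + 2 * g[2]  # (A alleles, B alleles)
    a, b = allele_counts(case_geno)
    c, d = allele_counts(ctrl_geno)
    odds_ratio = (a * d) / (b * c)
    chi2 = chi2_stat([[a, b], [c, d]])
    return odds_ratio, chi2, math.erfc(math.sqrt(chi2 / 2))


def genotypic_test(case_geno, ctrl_geno):
    """2 x 3 genotype test. Returns (chi-square, P on 2 df)."""
    chi2 = chi2_stat([list(case_geno), list(ctrl_geno)])
    return chi2, math.exp(-chi2 / 2)


or_a, chi2, p = allelic_test((20, 15, 5), (5, 15, 20))
print(round(or_a, 2), chi2)  # 4.84 22.5
print(p < 1e-4)              # True: significant at the study's threshold
```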
Admixture mapping (MALD) with the ADMIXMAP program (20;21) was used to localize disease-causing genetic variants that differ in frequency across ancestral populations. A difference in the proportion of admixture for a particular chromosomal segment between cases and controls can implicate that region in disease. Case-only analysis is also possible, by looking for differences in admixture proportions between specific regions and the rest of the genome within the same individuals. ADMIXMAP uses both Bayesian and classical approaches to perform admixture-mapping analyses. Markov Chain Monte Carlo (MCMC) simulation (22) is used to calculate the distribution of all unobserved variables given the observed genotypes, trait data, and prior ancestral allele frequencies. These unobserved variables include the ancestry at each locus and the ancestry-specific allele frequencies at each locus (17). ADMIXMAP compares observed with expected ancestry across the genome. Its readout is a Z-score, a test statistic for association with ancestry at each locus obtained by comparing the observed and expected proportions of gene copies of each ancestry at that locus. A Z-score was considered significant at |Z-score| > 3.
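To illustrate the idea of an observed-versus-expected ancestry Z-score, the sketch below computes a simplified case-only statistic at a single locus. It assumes that each case's number of gene copies from the reference ancestry at the locus is known, and that under no association the expected count is twice that individual's genome-wide admixture proportion, with binomial variance. This is a deliberate simplification: ADMIXMAP does not treat locus ancestry as known but samples it by MCMC, and its score tests account for that uncertainty. The function name is hypothetical.

```python
import math


def ancestry_z_score(observed_copies, admixture_props):
    """
    Simplified case-only admixture association Z-score at one locus.

    observed_copies[i]: gene copies (0, 1 or 2) at the locus assumed to descend from
        the reference ancestry (e.g. Northern European) in case i.
    admixture_props[i]: genome-wide proportion of that ancestry in case i, so the
        expected number of copies under no association is 2 * q_i.
    """
    excess = sum(x - 2 * q for x, q in zip(observed_copies, admixture_props))
    variance = sum(2 * q * (1 - q) for q in admixture_props)  # binomial variance of 2 copies
    return excess / math.sqrt(variance)


z = ancestry_z_score([2] * 10, [0.5] * 10)
print(abs(z) > 3)  # True: excess reference ancestry at this locus
```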
To complete the ADMIXMAP analysis, we provided founder (Northern European) and non-founder (Southern European) population allele frequencies using publicly available control data from the Illumina iControlDB Hap550v1 and Hap550v3 (http://www.illumina.com/science/icontroldb.ilmn). We clustered the Northern and Southern European samples and conducted an association analysis to identify SNPs that distinguish them. We identified 3,133 SNPs that are informative for separating Northern and Southern European samples. These SNPs, termed ancestry-informative markers (AIMs), have large allele frequency differences between the two populations. We then calculated the Northern and Southern European allele frequencies of these AIMs from 432 Northern European and 121 Southern European samples in the same iControlDB Hap550v1 and Hap550v3 data. We hypothesized that disease-causing genetic variants could be localized by comparing allele frequencies between these populations (17).
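AIM selection can be sketched as a filter on allele-frequency differences between the two reference clusters. The authors selected AIMs through an association analysis between Northern and Southern European samples; the sketch below substitutes a simpler absolute frequency-difference threshold, and both the threshold value (`min_delta`) and the data layout (allele-A dosages 0/1/2, with `None` for missing) are hypothetical. The returned per-population frequencies play the role of the prior allele frequencies supplied to ADMIXMAP.

```python
def allele_freq(dosages):
    """Frequency of allele A from 0/1/2 dosages, ignoring missing values (None)."""
    called = [g for g in dosages if g is not None]
    return sum(called) / (2 * len(called))


def select_aims(snp_dosages, min_delta=0.2):
    """
    snp_dosages: {snp_id: (north_dosages, south_dosages)}.
    Returns {snp_id: (freq_north, freq_south)} for SNPs whose absolute
    frequency difference is at least min_delta (hypothetical threshold).
    """
    aims = {}
    for snp_id, (north, south) in snp_dosages.items():
        f_north, f_south = allele_freq(north), allele_freq(south)
        if abs(f_north - f_south) >= min_delta:
            aims[snp_id] = (f_north, f_south)
    return aims


data = {
    "snp_1": ([2, 2, 1, 2], [0, 0, 1, 0]),  # strongly differentiated
    "snp_2": ([1, 1, 1, 1], [1, 1, 1, 1]),  # no difference
}
print(select_aims(data))  # {'snp_1': (0.875, 0.125)}
```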
The DD dataset comprised 80 samples (40 cases and 40 controls) genotyped at ~300,000 SNPs, from which the 3,133 AIMs were selected for admixture mapping. Standard data QC led to the removal of one control with > 10% missing genotypes. To increase the number of controls, we added two publicly available control sets: Illumina iControls (432 Northern Europeans and 121 Southern Europeans) and HapMap CEU (60 samples). HapMap CEU is the International HapMap Project sample collection of Utah residents with Northern and Western European ancestry, drawn from the CEPH (Centre d'Etude du Polymorphisme Humain) collection. ADMIXMAP analyses were then performed on the Dupuytren's data alone and on the combined Dupuytren's + iControl + CEU data, for both case-only and case-control analyses across the entire genome (chromosomes 1 to 23).
To evaluate genetic variation associated with the DD phenotype, single-SNP analysis was performed on 40 unrelated cases and 40 unaffected controls. The participants showed no problematic population stratification; however, 3 affected individuals were identified as outliers when principal component analysis was conducted to correct for population stratification (23). Four controls were removed because of low genotyping rates (> 10% missing genotypes per individual), leaving 37 cases and 36 controls for the analysis. Of the 301,232 SNPs genotyped, 251,837 remained after quality control.
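The principal component outlier check can be sketched without external libraries. This minimal version is an assumption rather than the method of reference (23): it standardizes each SNP, finds the first principal component by power iteration on the sample-by-sample matrix, and flags samples whose PC1 score lies more than `sd_threshold` standard deviations from the mean. Tools commonly used for this step examine several components and remove outliers iteratively; the single-component, single-pass version here is a simplification, and the function name is hypothetical.

```python
import math


def pc1_outliers(genotypes, sd_threshold=6.0, n_iter=100):
    """
    genotypes: list of samples, each a list of allele dosages (0, 1 or 2) over the same SNPs.
    Returns indices of samples whose PC1 score lies more than sd_threshold SDs from the mean.
    """
    n = len(genotypes)
    # Standardize each SNP as (g - 2p) / sqrt(2p(1 - p)); skip monomorphic SNPs.
    columns = []
    for j in range(len(genotypes[0])):
        col = [sample[j] for sample in genotypes]
        p = sum(col) / (2 * n)
        if 0 < p < 1:
            scale = math.sqrt(2 * p * (1 - p))
            columns.append([(g - 2 * p) / scale for g in col])
    # Sample-by-sample matrix K = X X^T; its leading eigenvector is proportional to PC1 scores.
    K = [[sum(c[i] * c[k] for c in columns) for k in range(n)] for i in range(n)]
    # Non-constant start: columns are centered, so a constant vector lies in K's null space.
    v = [float(i + 1) for i in range(n)]
    for _ in range(n_iter):
        w = [sum(K[i][k] * v[k] for k in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            return []  # no variation to summarize
        v = [x / norm for x in w]
    mean = sum(v) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in v) / n)
    return [i for i, x in enumerate(v) if abs(x - mean) > sd_threshold * sd]


toy = [[0, 0, 0, 0]] * 9 + [[2, 2, 2, 2]]  # nine identical samples, one divergent sample
print(pc1_outliers(toy, sd_threshold=2.5))  # [9]
```

With few samples, the largest attainable deviation is bounded (at most √(n − 1) standard deviations for n samples), so a threshold of 6 can only be exceeded when n is at least 38.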
The genome-wide association scan (GWAS) output is shown in Figure 1. The Y-axis shows the P-values (significance of association), and each dot represents a SNP evaluated in the study. Any SNP above the red line is considered significant (P < 1 × 10⁻⁴), because this indicates that the association at that SNP is less likely to be due to chance. The X-axis shows the SNP locations on the 25 chromosomes.
The most significant SNPs and their nearest genes are listed in Table 2. The associated SNPs are located on chromosomes 1, 3, 4, 5, 6, 11, 16, 17 and 23.
Because DD is most prevalent in the Northern European population, we used the ADMIXMAP/MALD method to evaluate genetic association with DD in the genotype data based on ancestry. For this purpose, the Illumina iControl data were used to determine Northern European and Southern European population allele frequencies. Results of the ADMIXMAP analysis, shown in Figure 2, indicated that ancestry-associated regions with |Z-score| > 3 on chromosomes 2, 6, 8, 11, 16 and 20 may contain Dupuytren's disease susceptibility genes.
Three chromosomes, 6, 11 and 16, were identified by both analysis methods. On chromosome 6, the ancestry association and the single-SNP association lie within 10,000 base pairs of each other. This observation is significant because SNPs in close proximity (typically 50 kb apart or closer) are more likely than those farther apart to have alleles that are passed together as a block from parent to child. This phenomenon is termed linkage disequilibrium (24).
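The proximity argument can be made concrete with the standard r² measure of linkage disequilibrium between two biallelic SNPs, r² = D² / (p_A(1 − p_A) p_B(1 − p_B)) with D = p_AB − p_A p_B. The sketch assumes phased two-locus haplotype counts (AB, Ab, aB, ab) are available, which unphased GWAS genotypes do not provide directly, and uses the 50 kb distance mentioned above as an illustrative window; the function names are hypothetical.

```python
def ld_r2(n_AB, n_Ab, n_aB, n_ab):
    """r^2 between two biallelic SNPs from phased two-locus haplotype counts."""
    n = n_AB + n_Ab + n_aB + n_ab
    p_AB = n_AB / n
    p_A = (n_AB + n_Ab) / n  # frequency of allele A at SNP 1
    p_B = (n_AB + n_aB) / n  # frequency of allele B at SNP 2
    d = p_AB - p_A * p_B     # disequilibrium coefficient D
    return d * d / (p_A * (1 - p_A) * p_B * (1 - p_B))


def within_ld_window(pos1, pos2, window_bp=50_000):
    """True if two SNP positions (in base pairs) are within the given window."""
    return abs(pos1 - pos2) <= window_bp


print(ld_r2(50, 0, 0, 50))   # 1.0: alleles always travel together
print(ld_r2(25, 25, 25, 25)) # 0.0: alleles assort independently
```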
Dupuytren's disease is believed to have a strong genetic component, yet the gene(s) responsible have eluded researchers for generations. Identifying the gene(s) responsible for DD will elucidate the nature of the disease, identify at-risk individuals, distinguish between genetic and sporadic cases, validate clinical observations about non-Dupuytren's disease, and usher in new methods of treatment, offering hope for a cure rather than the disease control achieved with current treatment methods.
Our study provides strong evidence for the genetic nature of DD. Using a GWAS approach, we searched for genomic regions or genes that may play a major role in the pathogenesis of DD. Although our study did not identify precise gene loci, analysis by both the ancestry-based and SNP-based methods presented here revealed that regions on chromosomes 6, 11 and 16 may contain DD susceptibility genes. The associated region on chromosome 6 is near the gene HCG27, which is located in the HLA locus. This region was the most intriguing finding because it was associated with DD by both the ancestry-based and SNP-based analysis methods. The most significant DD-associated SNPs in our study, on chromosome 16p13, lie 36 cM (centimorgans; on average, one cM corresponds to about 1 million base pairs in humans) from the main associated region on chromosome 16q previously reported in a Swedish family linkage analysis (5). None of the DD-associated polymorphisms in TGF-beta receptor 1 and ZF9 proposed by Bayat et al (9) were present in our study. A limitation of our study is the relatively small number of patients. However, the fact that our data and those of others have identified several regions on multiple chromosomes as associated or linked with DD strongly suggests that DD is an oligogenic disorder resulting from the combined action of alleles of more than one gene. This must be taken into consideration when pursuing future genetic studies of DD.
The data reported here were obtained from unrelated DD patients, half of whom had no known family history of the disease. It is uncertain whether the patients without a family history are truly sporadic cases or are simply unaware of an existing family history. Genetic testing of a more homogeneous group of DD patients with a known family history and strong diathesis, together with sampling of the proband and affected family members, may reveal the precise location of the culprit genes.
The work reported here is encouraging because significant associations were detected even with this small sample size, and we believe that future experiments using additional unrelated DD patients and family samples will have enough power to identify DD-associated genes unequivocally. Future genetic studies should focus on the areas of the genome most likely to contain genes for DD, in particular chromosomes 6, 11 and 16. Dupuytren's disease is one of the few remaining autosomal dominant diseases without known causative gene(s), and we believe that the results of this work, in combination with others already published, will significantly aid in designing more detailed genetic studies to identify the precise underlying genes and the causative variants within them.
The authors would like to thank Kim L. Nguyen for preparation of DNA, Dr. Kenneth M. Kaufman, Adam Adler and Mai Li Zhu for help with genotyping, and Celi Sun for data analysis. This work was supported in part by the National Institutes of Health (NO1-AR62277, PO1-AI 083194, R37-AI 24717, RO1-AR42460, PO1-AR049084, P20 RR020143, R01 AI045050 and P30 AR053483), the US Department of Veterans Affairs, the Alliance for Lupus Research, and the Lupus Family Registry and Repository (LFRR). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.