Early onset disease is frequently examined in genetic studies because it is presumed to contain a more severe subset of patients under a higher influence of genetic effects. In light of the dramatic success of Crohn’s disease (CD) gene discovery efforts, we aimed to characterize the contribution of established common risk variants to pediatric CD. Using 35 confirmed CD risk alleles, we genotyped 384 parent-child trios (mean age of onset 11.7 years) along with 321 healthy controls. We performed association tests on the independent pediatric cohort and compared results to those previously published(1). We also computed a weighted CD genetic risk score for each affected person. Six variants not previously validated in children (at 5q33, 1q24, 7p12, 12q12, 8q24 and 1q32) were significantly associated with pediatric CD (P<0.03). We detected no significant association between risk score and age at onset through age 30. This analysis illustrates that the genetic effect of established CD risk variants is similar in early and later onset CD. These results motivate joint analyses of genome-wide association data in early and late onset cohorts and suggest that, rather than established risk variants, independent variants or environmental exposures should be sought as modulators of age of onset.
Studying early-onset presentations of complex disease is appealing to geneticists because of the expectation that these efforts will increase the probability of finding novel risk variants. Implicit in this strategy is the assumption that these patients represent a more severe, more genetically influenced group of affected individuals. Some studies have identified specific genes that predispose to early onset—others, aided by the enrichment in gene burden, have discovered general risk variants. The discovery by linkage and fine mapping of BRCA1 on chromosome 17q21 in early onset, familial breast cancer(2, 3), for example, provides encouragement that there is a genetic basis to common age of onset phenotypes(4–6). The yields of this approach have been particularly illustrative in early-onset forms of Alzheimer’s disease(7), Huntington’s disease(4) and myocardial infarction(8).
CD has provided a highlight of the recent efforts to implicate genetic variation in complex disease pathogenesis. Genome-wide association studies (GWAS) and a subsequent meta-analysis performed in thousands of predominantly adult onset CD cases have led to confirmation of more than 30 risk alleles explaining approximately 20% of the genetic variance in CD(1, 9–11). Through these recent efforts it has now become possible to study the collective influence of many risk variants in CD pathogenesis.
Phenotypic heterogeneity between adult and pediatric onset CD is well documented but the causal mechanisms underlying these differences are unclear. Different anatomical distributions, responses to medical therapy, and prognoses(12–17) suggest a physiologic basis for these observations. This diversity is likely heritable, suggested by increased familial aggregation(12, 18, 19), higher concordance in disease location(20), and genotype-phenotype correlations(21–27) seen in early onset disease.
Whether genetic variation can explain observed differences between pediatric and adult onset CD is largely unexplored. Several CD susceptibility alleles are confirmed as common to both adult and pediatric populations(27–31) but the majority have not been explored in children. Characterizing the role of DNA variation in influencing earlier onset CD has important implications for drug development, diagnostic testing, and risk stratification.
Although unlikely, pediatric CD may represent a distinct disease entity from later onset disease, with unique genetic risk factors—as is the case with early onset Alzheimer’s disease and breast cancer. In the other extreme, pediatric CD and later onset disease could have identical genetic architecture, but with earlier onset patients inheriting a larger dose of genetic risk factors. An intermediate hypothesis is that environmental exposures, genetic variation outside the CD causal pathway, or rare variants in common risk loci modulate the age at which disease presents.
We aimed to test the hypothesis that the timing of CD might be influenced by the overall burden of common genetic risk or through the action of common variation at individual risk alleles.
Through ongoing IRB-approved genetic studies at Children’s Hospital Boston (CHB) and Milwaukee Children’s Hospital (MCH), we collected detailed phenotypic and demographic information on 384 trios (both parents and an affected child) in which the child was age 19 or younger at the time of diagnosis; 189 trios were enrolled at CHB and 195 at MCH. We refer to this sample as the PED cohort. These samples were not involved in any previous linkage or genome-wide association analysis contributing to the discovery of the risk loci being examined. We had previously collected peripheral blood and extracted genomic DNA on all individuals. Physicians at the home institutions made the diagnosis of CD, which was based on a combination of clinical, radiographic, and gross endoscopic findings, as well as on a review of intestinal mucosal biopsies. Patients with CD in any area of the intestine were eligible for analysis. We excluded patients with indeterminate colitis from the cohort.
We collected phenotype and genotype information from 70 patients with pediatric ulcerative colitis (UC) to compare their risk scores with those of the PED cohort. These patients, diagnosed under age 20, were enrolled at CHB under the same protocol as the PED cohort.
To study the age of onset across children and adults, we used previously genotyped data from 1295 parent-parent-affected offspring trios with onset of disease at any age (the REP cohort). This sample had been obtained to replicate CD risk loci identified in the recent meta-analysis of CD GWAS. Details of this dataset are described elsewhere(1).
We genotyped 321 healthy adult controls from the Boston area (the CON cohort) to calculate a control group risk score. This cohort had not been typed previously for any other CD analysis.
The above cohorts and sample sizes reflect the Caucasian-only subset of the data. Non-Caucasians were removed prior to analysis because of the known population differences in CD genetic architecture.
Age of onset was defined as the age at which the patient was initially diagnosed with CD. Pediatric onset disease was defined as a CD diagnosis before age 20. A genetic risk score was calculated to describe the overall burden of CD risk alleles for each individual. Using recently published ORs at all confirmed CD risk variants(1), we multiplied the number of risk alleles at each locus (0, 1, or 2) by the log(OR) of that risk allele. We took the average score across all successfully genotyped loci as a summary score for each individual. The scores were normalized to reflect a Z distribution. Since the five independent risk alleles across CARD15 and IL23R have relatively large effect sizes, we only scored individuals with 100% genotyping at all of these loci.
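The scoring procedure described above can be sketched roughly as follows. This is a minimal illustration and not the authors' code: the locus names, odds ratios, and genotypes are hypothetical placeholders, natural logarithms are assumed for log(OR), and standardizing against a reference group (for example, controls) is an assumption about how the scores were normalized.

```python
import math
from statistics import mean, stdev


def raw_risk_score(genotypes, odds_ratios, required_loci=()):
    """Mean of (risk-allele count * ln(OR)) over successfully genotyped loci.

    genotypes: dict mapping locus -> 0, 1, 2, or None if genotyping failed.
    odds_ratios: dict mapping locus -> published OR for the risk allele.
    required_loci: loci that must be genotyped for the person to be scored
        (the text requires complete CARD15 and IL23R genotypes).
    Returns None when the individual cannot be scored.
    """
    if any(genotypes.get(locus) is None for locus in required_loci):
        return None
    terms = [count * math.log(odds_ratios[locus])
             for locus, count in genotypes.items() if count is not None]
    if not terms:
        return None
    return sum(terms) / len(terms)


def standardize(scores, reference_scores):
    """Express scores as Z values relative to a reference group (assumed: controls)."""
    mu, sd = mean(reference_scores), stdev(reference_scores)
    return [(s - mu) / sd for s in scores]


# Hypothetical example values, for illustration only.
example_ors = {"locus_a": 2.0, "locus_b": 1.5, "locus_c": 1.2}
person = {"locus_a": 2, "locus_b": 0, "locus_c": None}
score = raw_risk_score(person, example_ors, required_loci=("locus_a",))
```

Averaging over genotyped loci, rather than summing, keeps scores comparable between individuals with different numbers of missing genotypes, which is consistent with the description in the text.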
We designed primers for the polymerase chain reaction (PCR)-based Sequenom genotyping platform (San Diego, CA) to amplify and genotype all 35 confirmed independent CD risk alleles or a nearby proxy SNP with r² = 1. These 35 independently acting alleles represent 32 different loci across the genome.
The dataset was filtered to exclude SNPs with a Hardy-Weinberg p-value <0.001 or a call rate <95%. Individuals with <80% genotyping were excluded.
Clinical and genotype data from CHB samples were read into R, PLINK, and Haploview statistical packages for analysis. Because quality control measures disrupted some intact trios, we evaluated a small subset of affected patients in case-control analysis using the CON dataset as the control group. Family based association testing employed the transmission disequilibrium test (TDT). Association tests employed a chi-squared test with one degree of freedom.
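For readers unfamiliar with the TDT, its standard one-degree-of-freedom form can be sketched as below. This is only an illustration with hypothetical counts; the study itself used established software (R, PLINK, Haploview), and the function name is our own.

```python
import math


def tdt(transmitted, untransmitted):
    """Transmission disequilibrium test for one allele.

    transmitted / untransmitted: number of times heterozygous parents did or
    did not pass the risk allele to an affected child.
    Returns the chi-squared statistic (1 df) and its p-value.
    """
    total = transmitted + untransmitted
    if total == 0:
        raise ValueError("no informative transmissions")
    chi2 = (transmitted - untransmitted) ** 2 / total
    # Upper tail of a chi-squared distribution with 1 df, using
    # P(Z^2 > x) = erfc(sqrt(x / 2)) for a standard normal Z.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts, for illustration only.
chi2, p_value = tdt(transmitted=60, untransmitted=40)
```

Only heterozygous parents are informative, because a homozygous parent transmits the same allele regardless of association.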
Using observed and expected allele counts and variance estimates for both TDT samples and case-control samples, we combined the association tests into a single meta Z score for each SNP. ORs were calculated separately for TDT and for case-control samples and were then combined using a stratified approach(32). Alleles were deemed ‘significantly associated’ with pediatric CD if the P value was ≤0.03. We chose this threshold because, across the 34 tested SNPs, it corresponds to approximately one association (34 × 0.03 ≈ 1) expected to fall below the threshold by chance alone (i.e., a false positive).
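One common way to combine family-based and case-control evidence from observed counts, expected counts, and variances is sketched below. The exact formulas used in the study are not given in the text, so the expected values and variances here (a binomial null for transmissions and a hypergeometric null for the allele-count table) are assumptions, and all counts are hypothetical.

```python
import math


def tdt_stratum(transmitted, untransmitted):
    """Observed, expected, and variance of risk-allele transmissions under the null."""
    n = transmitted + untransmitted
    return transmitted, n / 2, n / 4


def case_control_stratum(case_risk, case_other, control_risk, control_other):
    """Observed, expected, and variance of the risk-allele count in cases.

    Expected value and variance follow the hypergeometric null for a 2x2
    table of allele counts (an assumption, as in Mantel-Haenszel-type tests).
    """
    n_cases = case_risk + case_other
    n_controls = control_risk + control_other
    n_risk = case_risk + control_risk
    n_other = case_other + control_other
    total = n_cases + n_controls
    expected = n_cases * n_risk / total
    variance = n_cases * n_controls * n_risk * n_other / (total ** 2 * (total - 1))
    return case_risk, expected, variance


def combined_z(strata):
    """Pool (observed, expected, variance) triples into one Z score and two-sided p."""
    numerator = sum(obs - exp for obs, exp, _ in strata)
    variance = sum(var for _, _, var in strata)
    z = numerator / math.sqrt(variance)
    return z, math.erfc(abs(z) / math.sqrt(2))


# Hypothetical counts, for illustration only.
z_tdt_only, _ = combined_z([tdt_stratum(60, 40)])
z_both, p_both = combined_z([tdt_stratum(60, 40),
                             case_control_stratum(50, 50, 50, 50)])
```

With a single TDT stratum, the squared Z score reduces to the usual TDT chi-squared statistic, so the pooled statistic is consistent with the family-based test.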
Risk scores were compared using the Student’s t-test. Age of onset was compared to risk score by computing the Pearson correlation coefficient. Power analysis was performed using an online genetic power calculator(33).
The home institution’s institutional review board (IRB) approved the recruitment of patients and families at CHB and MCH. Patients and/or parents gave written, informed consent before enrollment. We de-identified patient information before samples were analyzed by the authors. Ethical considerations in patient recruitment for the de-identified REP dataset are described elsewhere(1).
The PED cohort had a higher proportion of children with colonic, ileo-colonic, and peri-anal CD than the children in the REP cohort (Table 1a and 1b). Since PED is entirely a hospital-based cohort, it is possible that these individuals had more severe and/or extensive disease. Despite these clinical differences, their mean genetic risk scores were similar (see below).
To determine if predominantly adult-identified risk variants are valid and act with similar effect sizes in children, we genotyped the PED cohort at all 35 independent CD risk alleles. All SNPs passed QC filters except for rs17622378, representing the IBD5 locus at 5q31, which was excluded from the analysis. Of the remaining 34 SNPs, 15 were associated with pediatric onset CD with our predetermined significance threshold of P<0.03 (Table 2). This observation is extremely unlikely to occur by chance (P=1.5 × 10⁻¹⁴). For six of these alleles, this is the first validation in pediatric CD. Of note, all independently acting CARD15, IL23R, and ATG16L1 alleles replicated in our cohort. Although association with pediatric CD is well established for lesions in these important pathogenic mechanisms, replication in our cohort provides reassurance that our samples are representative of pediatric CD in the larger population.
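The text does not state how this chance probability was obtained. One calculation that gives a value of the same order is a binomial tail, assuming the 34 tests are independent and each has a 3% probability of a false positive; the sketch below makes that assumption explicit.

```python
from math import comb


def binomial_tail(n, k, p):
    """Probability of at least k successes in n independent trials with success rate p."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))


n_snps = 34          # SNPs that passed quality control
n_significant = 15   # SNPs below the significance threshold
alpha = 0.03         # per-test significance threshold

# Expected number of false positives if no SNP were truly associated.
expected_false_positives = n_snps * alpha

# Chance of seeing 15 or more significant SNPs under that same null.
chance_probability = binomial_tail(n_snps, n_significant, alpha)
print(f"{expected_false_positives:.2f} expected by chance; "
      f"P(>= {n_significant}) = {chance_probability:.2e}")
```

The independence assumption is only approximate here, since several of the tested alleles lie in the same gene.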
In 28 of 34 SNPs tested, the OR 95% confidence bounds include the previously established values(1). Thirty of 34 ORs reflect effect sizes in the same direction as reported ORs. Furthermore, roughly half of the observed ORs are larger than published values. These data argue for similar genetic determinants with similar effect sizes in pediatric and later onset CD.
Under the assumption that all adult-identified CD risk loci participate in the pathogenesis of pediatric CD, we calculated a genetic risk score for each individual in the PED sample as well as the REP and CON samples. Of the 34 genotyped SNPs, 30 were used in the risk score calculation (or a proxy SNP with r² > 0.89). Four variants (rs7130588, rs10508815, rs2858331, and rs2872507) were not sufficiently genotyped in all three samples and were therefore excluded from the risk score calculation. The international meta-analysis reported modest effect sizes for these SNPs (ORs 1.17–1.23).
The risk score distributions from PED and CON overlap to some degree (figure 1); however, the mean score was significantly higher in affected patients than in controls (0.701 vs. 0, respectively; P=1.74 × 10⁻¹⁰).
In order to address whether the risk scores in the REP sample were artificially enriched for allelic dosage due to its previous use in replicating the CD meta-analysis findings, we compared the risk scores in the REP children under age 20 at diagnosis to those of the independent PED sample. We found no difference in mean score (0.707 and 0.739 respectively, P=0.635), suggesting that the samples can be combined for further analyses.
Risk score distributions, stratified by the Montreal classification for age at diagnosis(34), reveal a small deflation in risk scores in individuals diagnosed over the age of 40 (figure 2). We detected a very weak negative correlation between age of CD onset and risk score (i.e., lower gene dosage is associated with later CD onset), with a Pearson r=−0.088 (P=0.002). Of note, there is a disproportionate number of outlying data points among older onset patients (over age 30). When we restrict the analysis to patients diagnosed up to age 30, there is no longer a detectable correlation between age of onset and risk score.
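To put the size of this correlation in perspective, the square of a Pearson coefficient gives the share of variance in one variable that is linearly accounted for by the other. The sketch below computes a Pearson coefficient from scratch on hypothetical toy data and then squares the reported value; it is an illustration only, not a reanalysis of the study data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical toy data: a perfectly decreasing relationship.
toy_r = pearson_r([1, 2, 3, 4], [8, 6, 4, 2])

# Value reported in the text; its square is the share of variance explained.
reported_r = -0.088
variance_explained = reported_r ** 2
print(f"r = {reported_r}, r squared = {variance_explained:.4f}")
```

A squared correlation below 0.01 means that the risk score accounts for under 1% of the variation in age of onset in this sample.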
We also subanalyzed the very early onset group of patients (diagnosed under age 9) and compared them to older onset children (diagnosed age 9 through 19). There was no difference in the mean risk score between these two groups. Additionally, males and females with CD had similar mean risk scores.
To test for single alleles that might exert a disproportionate influence on the age of onset of CD, we modeled age of onset against all 30 alleles in the risk score acting as independent variables using linear regression. We detected no significant action of a single risk variant on age of onset.
We undertook an initial analysis of how the collective burden of CD risk alleles differs in pediatric CD and UC patients. We computed risk scores in 70 cases of pediatric UC and compared this distribution to that of the 371 CD patients in the PED cohort (figure 3). We found a significantly deflated risk score in the UC compared with CD patients (0.197 vs. 0.739, P=4.8 × 10⁻⁷). The score in UC patients was similar to that in controls (P=0.067). Finally, we noted that when repeating the score calculation using only the 10 alleles that are associated with both CD and UC(35), the mean scores in CD and UC were similar (P=0.4), reflecting the loss of discriminating information (i.e., CD-specific alleles) from the analysis. Not surprisingly, in this analysis of shared alleles, all IBD patients combined (UC and CD) still had higher mean scores than the controls (P=0.008), because ten highly significant IBD loci remained in the analysis.
This is the first comprehensive analysis of the contribution of common genetic risk to pediatric and adult-onset CD, two heterogeneous presentations of IBD. Previous attempts to compare the genetic influences on age of CD onset examined either individual genes such as NOD2 and IL23R, or a small combination of genetic variants(21–27). Here, we show that the overall burden of confirmed risk alleles, estimated to explain 20% of the genetic variance of CD(1), is at most a very minor factor in the age at which CD presents. The findings argue for heritable but as yet undiscovered factors as the important determinants of an early presentation of CD. Rare, penetrant genetic variation and differences in the timing of host-environment interactions are attractive mechanisms that deserve additional attention in light of these findings.
We also report that at common DNA variants, similar loci act with similar effect sizes in early and later onset CD. Through this effort, we confidently validate six such alleles in children for the first time. Fifteen of 34 tested alleles were significantly associated with pediatric CD by conservative thresholds, reflecting our limited power to detect true associations at all tested alleles. It should be noted that we did not have adequate power to confidently rule out any adult-identified locus as associated with pediatric onset CD.
We further observed that children with UC have a marked deflation in CD risk scores and that their scores are similar to those of controls. Even without any UC-specific alleles used to calculate the risk score, there was a notable separation in the distributions for UC and CD patients. This raises the possibility that with further refinement (perhaps by accounting for both CD-specific and UC-specific associations in the score), genotype-based approaches could have utility in distinguishing IBD subtypes in clinically ambiguous cases.
This report is timely as the IBD genetics field explores how best to analyze the vast amounts of data generated from genome-wide analyses. Our results provide reassurance that genotype data from children and adults can be combined for more powerful analyses. Deep sequencing of risk loci in search of rare variants that may be active in early onset CD should also be encouraged. It is unlikely that further discovery of common CD risk alleles will increase the ability to detect a major effect of ‘overall genetic load’ in age of onset since current known risk variants explain less than 10% of the variance in this phenotype.
We defined the age of onset phenotype as ‘the age of CD diagnosis by a physician’, which could cause a spurious association between age of onset and risk score. The initial presentation of CD can be insidious, and upon subsequent reflection, patients may recall symptoms for several years preceding their presentation to a physician. Subtle symptoms are more likely to be ignored, resulting in a delay in presentation to a physician. Thus, if highly insidious (but active) disease is associated with both a decrease in CD gene dosage and with the age at which one presents to a physician, there would be an artificial association between genetic dose and our ‘age of onset’ phenotype. This systematic error might be less active in young patients, where parents and pediatricians are more likely to seek prompt referral for any subtle signs or symptoms in their children. In a secondary analysis, we removed all individuals reportedly diagnosed over the age of 30 and we detected no correlation between genetic risk score and age of onset, lending further support for information bias as a source for the weak observed correlation.
We thank the IBD patients and families recruited at Children’s Hospital Boston and at Milwaukee Children’s Hospital. We also thank the General Clinical Research Center at Children’s Hospital and the genotyping platform at the Broad Institute for support, supplies, and technical advice. The NIDDK IBD Genetics Consortium, which collected and curated a large dataset for our analysis, includes the following principal investigators: John D Rioux, Ramnik J. Xavier, Kent D Taylor, Mark S Silverberg, Philippe Goyette, Alan Huett, Stephan R Targan, A Hillary Steinhart, Jerome I Rotter, Richard H Duerr, Judy H Cho, Mark J Daly, and Steven R Brant.