To investigate the genetic architecture of severe obesity, we performed a genome-wide association study of 775 cases and 3197 unascertained controls at ~550 000 markers across the autosomal genome. We found convincing association to the previously described locus including the FTO gene. We also found evidence of association at a further six of 12 other loci previously reported to influence body mass index (BMI) in the general population and at one of three loci previously associated with severe childhood and adult obesity, and we found that cases carry a higher proportion of risk-conferring alleles than controls. We found no evidence that homozygosity due to identity-by-descent at any locus was associated with phenotype, which would have been indicative of rare, penetrant alleles, nor was there excess genome-wide homozygosity in cases relative to controls. Our results suggest that variants influencing BMI also contribute to severe obesity, a condition at the extreme of the phenotypic spectrum rather than a distinct condition.
The prevalence of extreme obesity, defined as a body mass index (BMI) above 40, is increasingly high in developed countries: the most recent estimates of prevalence in the USA are between 2.2 and 4.4%, with the 20- to 39-year-old age group showing the greatest increase from previous studies (1). Rare mutations in genes encoding the melanocortin 4 receptor, the leptin receptor, small nuclear ribonucleoprotein polypeptide N/necdin and other genes (2,3) have been shown to underlie extreme obesity in family studies, but are not sufficiently frequent to explain the high prevalence in the general population. To search for additional, more frequent genetic variants that could contribute to extreme obesity, we carried out a genome-wide association analysis of 775 bariatric surgery patients (mean BMI = 50.6) compared with 3197 publicly available controls from the general population. We find that variants near the gene FTO, previously associated with increased BMI, are associated with increased risk of extreme obesity with genome-wide statistical significance; six of 12 other loci previously shown to affect BMI in population samples also have nominal associations to extreme obesity, suggesting an overlap between the genetic contributors to BMI in the general population and the variants that predispose to extreme obesity.
We performed genome-wide genotyping of 655 130 single nucleotide polymorphisms (SNPs) in 972 individuals, including 841 self-identified Caucasians, with BMI > 33.3 (mean 50.61, 722 Caucasians with BMI > 40; see Supplementary Material, Fig. S1) recruited at the Massachusetts General Hospital Weight Center. We obtained publicly available genotypes for 3294 anonymous Caucasian samples genotyped on a comparable Illumina platform (see Materials and Methods). After removing markers and individuals with low-quality data and individuals with appreciable non-European ancestry, and correcting for residual stratification, we performed an association study on 775 Caucasian cases and 3197 unascertained controls using 457 251 SNPs. We observed only a modest inflation of association statistics, with λGC = 1.05 (4), equivalent to 1.04 in a symmetric design totaling 2000 samples (5), suggesting that little systematic bias remains in our analysis and that our cases and controls were well matched by ancestry. We calculate that our case–control design of extreme cases and unascertained controls is equivalent to a quantitative trait analysis in a population sample of ~4300 (see Materials and Methods).
We found significant association of rs9941349 to extreme obesity in the previously described (6,7) locus intronic to the FTO gene on chromosome 16 (P = 6.09 × 10⁻¹², odds ratio OR = 1.48; see Table 1 for top results). This SNP is in strong linkage disequilibrium (LD) with the originally reported rs9939609 (r² = 0.87 in HapMap/CEU) (8). A second SNP, rs8050136, is in complete LD with the previously reported marker and also shows very strong association (P < 2.83 × 10⁻¹¹, OR = 1.46). These odds ratios are consistent with the proportion of variance explained in the common population (0.34% in Willer et al. (9)), given our selection of extreme phenotypes as cases in a threshold model (see Supplementary Material, Fig. S1 and Materials and Methods for details), and with other reports of this association in severe obesity (10–12). No other loci were associated at a genome-wide significance level, suggesting that the remaining common variants interrogated by this genotyping platform are unlikely to have effects of comparable magnitude.
This result, and the recent identification of common alleles that increase BMI in the general population, led us to ask whether our data could distinguish between two extreme models for the genetic basis of extreme obesity. In one case, extreme obesity might simply represent the tail of the population-wide BMI distribution, and these individuals would therefore have a greater dose of common BMI-increasing alleles than more modestly obese individuals. In the other, extreme obesity might constitute a distinct, Mendelian-like condition, and the most extremely obese individuals might then have a reduced burden of these common alleles. Thus we asked whether previously reported BMI-associated variants also modulate risk in our cohort (9,13). We are able to capture all 12 BMI variants (including the FTO locus) either directly or through highly correlated markers, and we find nominal evidence of association with extreme obesity for six of these 12 (P < 0.05; Table 2). Strikingly, 10 alleles previously shown to increase BMI in the general population were more frequent in extremely obese individuals than in controls, suggesting that, as a group, these variants influence both common and extreme obesity (Table 2). More compellingly, we find that cases carry significantly more BMI-increasing alleles of these SNPs than controls (Welch’s one-sided P < 1 × 10⁻¹⁴). Further, we find some evidence that the more extreme 50% of cases also carry more risk alleles than the remainder (Welch’s one-sided P = 0.038); this observation is recapitulated when regressing BMI on BMI risk allele count (ANOVA P = 0.033). We therefore suggest that risk allele ‘burden’ may influence BMI even within the extreme tail of the phenotype.
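The burden comparison above can be illustrated with a minimal sketch of Welch's unequal-variance t statistic, computed on per-individual risk allele counts. The function name and the counts below are hypothetical and chosen only for illustration; they are not data from this study. A one-sided P value would be the upper tail of a t distribution with the returned degrees of freedom (for example via `scipy.stats.ttest_ind` with `equal_var=False` and `alternative='greater'`, if SciPy is available).

```python
from math import sqrt
from statistics import mean, variance


def welch_t(cases, controls):
    """Return Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    n1, n2 = len(cases), len(controls)
    # Squared standard errors of the two group means (sample variance / n).
    v1, v2 = variance(cases) / n1, variance(controls) / n2
    t = (mean(cases) - mean(controls)) / sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Hypothetical counts of BMI-increasing alleles per individual.
case_counts = [14, 13, 15, 12, 14, 16]
control_counts = [11, 12, 10, 13, 11, 12]
t_stat, dof = welch_t(case_counts, control_counts)
print(f"t = {t_stat:.3f}, df = {dof:.2f}")
```

A large positive t indicates that cases carry more risk alleles on average than controls; the test does not assume equal variances in the two groups.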
To further search for rare, penetrant recessive alleles that could underlie extreme obesity, we asked whether our cases tended to have more homozygosity-by-descent than our control population (see Materials and Methods). Overall, our cases did not appear more homozygous than controls, and there was no association between homozygosity at any location and extreme obesity (data not shown). Our case cohort thus appears to represent a phenotypic extreme of the normal population rather than a collection of familial disease patients.
Our results suggest that most loci known to influence BMI in the general population also contribute to extreme obesity; this predicts that severely obese individuals should carry more BMI-increasing alleles than expected by chance across the population, which we find is true in this cohort. These observations support the hypothesis that extreme obesity in the general population is part of the BMI continuum rather than a distinct condition. We fully expect that studies of rarer variation should uncover higher-penetrance alleles that are strongly enriched in the extremely obese population; however, rather than defining distinct Mendelian subtypes, these alleles will likely behave as risk factors across the spectrum of BMI and, like the common alleles, show increasing enrichment with more extreme phenotype. Dissecting the biological processes affected by all of these genetic variants should increase our understanding of the genetics underlying obesity, one of the largest sources of ill-health in the developed world.
The cohort studied includes men and women, aged 18–75 years, who were recruited from the population of patients undergoing Roux-en-Y gastric bypass (RYGB) surgery at the Massachusetts General Hospital Weight Center. Patients at this center routinely undergo extensive preoperative clinical evaluation and phenotyping and are followed closely during the post-operative period. DNA was obtained from 1008 patients who underwent RYGB between 2000 and 2007 and consented to participate (consent rate 97%). Each operation was performed by one of four surgeons using a standardized operative technique for open (41%) or laparoscopic (59%) RYGB. Liver biopsies were performed at the time of surgery, and samples were rapidly frozen and stored at −80°C.
Nine hundred and seventy-two case samples, including 841 self-identifying as ‘white’, were genotyped on the Illumina HumanHap 650Y product at Rosetta Inpharmatics. We converted data to the PLINK format and used this software for subsequent QC analyses unless otherwise indicated (15).
We first removed poorly performing samples and markers. One individual had more than 10% missing data and was excluded from further analysis. Of the 655 130 SNPs genotyped in the remaining samples, 6402 had more than 10% missing data and 33 570 had less than 1% minor allele frequency (MAF). A further 15 769 sex chromosome and mitochondrial SNPs were also excluded to give an initial dataset of 600 173 autosomal SNPs genotyped in 971 individuals.
We assessed cryptic relatedness in our samples by calculating identity-by-descent (IBD) coefficients for all pairwise combinations of individuals. We were able to detect seven isolated pairs and five extended pedigrees of more than two individuals who appeared to be at least first cousins from their IBD patterns. We also identified pairs with an absolute proportion of IBD πhat > 0.15 (equivalent to second cousins in an outbred population). In each case, we retained the sample with the least missing data and discarded the others, eventually removing 50 individuals from our data.
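The relatedness filter described above can be sketched as follows. This is a minimal illustration, not the procedure actually run in PLINK: it assumes that πhat is computed as P(IBD = 2) + 0.5 × P(IBD = 1), that related individuals are grouped into clusters by linking every pair above the threshold, and that one sample per cluster is kept. The function names and sample identifiers are hypothetical.

```python
def pi_hat(z1, z2):
    """Proportion of the genome shared IBD, from P(IBD=1) and P(IBD=2)."""
    return z2 + 0.5 * z1


def prune_related(pair_pihat, missing_rate, threshold=0.15):
    """Return the set of sample ids to discard.

    pair_pihat:   {(id_a, id_b): pihat} for pairwise comparisons
    missing_rate: {sample_id: fraction of missing genotypes}
    """
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Link every pair whose sharing exceeds the threshold.
    for (a, b), value in pair_pihat.items():
        if value > threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for sample in list(parent):
        clusters.setdefault(find(sample), []).append(sample)

    discarded = set()
    for members in clusters.values():
        # Keep the member with the least missing data, discard the rest.
        keep = min(members, key=lambda s: missing_rate[s])
        discarded.update(m for m in members if m != keep)
    return discarded


pairs = {("A", "B"): pi_hat(1.0, 0.0), ("B", "C"): 0.25, ("D", "E"): 0.05}
missing = {"A": 0.02, "B": 0.01, "C": 0.03, "D": 0.0, "E": 0.0}
print(sorted(prune_related(pairs, missing)))
```

In this toy example A, B and C form one extended cluster, so only B (the sample with the least missing data) is retained, while D and E fall below the threshold and are both kept.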
We next addressed population structure in our data: of the 921 samples remaining in our analysis, 806 self-identified as ‘white’ and were kept in the analysis. Recognizing that self-reported ethnicity may still mask considerable population heterogeneity, we calculated principal components of ancestry using EIGENSTRAT (16). Through this analysis, we identified and removed 25 outlying samples (more than six standard deviations from the mean along a principal component).
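As an illustration of the outlier criterion, the sketch below flags samples whose score on any principal component lies more than six standard deviations from the mean. It assumes the principal component scores have already been computed elsewhere, performs a single pass only (the software itself may iterate), and uses hypothetical sample identifiers and scores.

```python
from statistics import mean, pstdev


def flag_outliers(pc_scores, n_sd=6.0):
    """Return ids of samples exceeding n_sd standard deviations on any PC.

    pc_scores: {sample_id: [score on PC1, score on PC2, ...]}
    """
    ids = list(pc_scores)
    n_pcs = len(pc_scores[ids[0]])
    flagged = set()
    for k in range(n_pcs):
        column = [pc_scores[s][k] for s in ids]
        mu, sd = mean(column), pstdev(column)
        if sd == 0:
            continue  # no variation on this component
        flagged.update(s for s in ids if abs(pc_scores[s][k] - mu) > n_sd * sd)
    return flagged


# Hypothetical scores: 99 tightly clustered samples and one distant sample.
scores = {f"s{i}": [(-1.0) ** i, 0.0] for i in range(99)}
scores["out"] = [100.0, 0.0]
print(flag_outliers(scores))
```

Because the outlier itself inflates the standard deviation, a threshold of six standard deviations can only be exceeded in reasonably large samples, which is one reason such filters are usually applied to whole cohorts rather than small batches.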
We performed two further data integrity checks using PLINK analyses. First, we assessed the extent of heterozygosity in these samples and found four samples with either unusually high heterozygosity (suggesting admixture or sample contamination) or unusually low heterozygosity (suggesting inbreeding). We then performed an identity-by-missingness analysis (15), in which samples are clustered on the basis of missing genotypes in common. Two samples had suspiciously high rates of concordance for missing data, indicating a possible technical artifact in the genotyping process.
Finally, we filtered SNPs more aggressively in these samples: we excluded 3020 markers failing Hardy–Weinberg equilibrium testing (HWE P < 0.001); 8040 markers with more than 5% missing data; 61 633 markers with MAF < 5%; and 3972 markers with missing data biased by genotype (P < 1 × 10⁻⁶). Our final dataset thus comprises 775 samples genotyped at 525 054 markers across the autosomal genome.
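The per-marker thresholds above can be sketched as a simple filter on genotype counts. Note the substitution: this sketch uses a 1-degree-of-freedom χ² goodness-of-fit test for Hardy–Weinberg equilibrium, which is an assumption made for brevity and may differ from the test implemented in PLINK. The function names and example counts are hypothetical, and the test for missingness biased by genotype is not covered.

```python
from math import erfc, sqrt


def hwe_chi2_p(n_aa, n_ab, n_bb):
    """P value of a 1-df chi-square test for Hardy-Weinberg equilibrium."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    return erfc(sqrt(chi2 / 2))


def passes_qc(n_aa, n_ab, n_bb, n_missing,
              min_hwe_p=0.001, max_missing=0.05, min_maf=0.05):
    """Apply the missingness, MAF and HWE thresholds to one marker."""
    called = n_aa + n_ab + n_bb
    if called == 0:
        return False
    missing_rate = n_missing / (called + n_missing)
    freq_a = (2 * n_aa + n_ab) / (2 * called)
    maf = min(freq_a, 1 - freq_a)
    return (missing_rate <= max_missing
            and maf >= min_maf
            and hwe_chi2_p(n_aa, n_ab, n_bb) >= min_hwe_p)


print(passes_qc(100, 200, 100, n_missing=4))   # balanced marker
print(passes_qc(200, 0, 200, n_missing=0))     # no heterozygotes at all
print(passes_qc(380, 20, 0, n_missing=0))      # rare minor allele
```

The three calls illustrate a marker that passes, one that fails the HWE test because heterozygotes are absent, and one that fails the MAF threshold.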
We obtained publicly available genotypes for 3294 population control samples from the repository created by Illumina for this purpose (see http://www.illumina.com/pages.ilmn?ID?231). All samples were of self-reported Caucasian ancestry and genotyped on the Illumina HumanHap 550Y product. After filtering as described above (more than 5% MAF, less than 5% missing data per SNP and individual, HWE P > 0.001, and excluding related samples at IBD πhat > 0.15), we merged these data with our bariatric case data, retaining only the 457 251 SNPs that passed QC in both sets independently.
We investigated population stratification in the merged dataset with EIGENSTRAT: we identified 80 controls as outliers, which we removed prior to calculating the top ten principal components of population variation. We then used logistic regression to calculate association using these principal components as covariates. This approach allows us to estimate odds ratios and their confidence intervals.
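As a brief illustration of how a logistic regression coefficient translates into an odds ratio and confidence interval, the sketch below exponentiates the per-allele log-odds estimate and its Wald interval. The function name and the standard error are hypothetical; only the odds ratio of 1.48 echoes a value reported in this paper, and the interval printed is not a result of this study.

```python
from math import exp, log


def odds_ratio_ci(beta, se, z=1.96):
    """Odds ratio and Wald confidence interval from a log-odds coefficient.

    beta: per-allele log-odds estimate from the logistic regression
    se:   its standard error
    z:    normal quantile (1.96 for an approximate 95% interval)
    """
    return exp(beta), exp(beta - z * se), exp(beta + z * se)


# Hypothetical standard error, for illustration only.
or_est, lower, upper = odds_ratio_ci(beta=log(1.48), se=0.06)
print(f"OR = {or_est:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

Because the interval is symmetric on the log-odds scale, it is asymmetric around the odds ratio itself.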
One way to gauge residual stratification is to measure the overall inflation in association statistics. The genomic control factor λGC, the ratio of the observed to the expected median association statistic, should be 1 in the absence of inflation (4). Without correcting for stratification, we observed λGC = 1.125 (an inflation of 12.5%); after correcting for the top ten axes of variation with EIGENSTRAT, we observed λGC = 1.05, a value in line with other published reports. We therefore adjusted for this residual inflation by dividing each association statistic by λGC to give our final association test statistics. We note that λGC scales with sample size, so that our final value of 1.05 in 775 cases and 3204 controls is equivalent to 1.04 in a dataset of 1000 cases and 1000 controls.
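The genomic control calculation and the sample-size rescaling can be sketched as below. The rescaling formula is a commonly used convention, λ1000 = 1 + (λ − 1) × (1/n_cases + 1/n_controls) / (1/1000 + 1/1000), and is assumed here to be what reference (5) describes; it does reproduce the value of 1.04 quoted above. The function names and the toy statistics are hypothetical.

```python
from statistics import median

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def lambda_gc(chi2_stats):
    """Ratio of the observed median statistic to its expected value."""
    return median(chi2_stats) / CHI2_1DF_MEDIAN


def adjust(chi2_stats, lam):
    """Genomic-control correction: divide every statistic by lambda."""
    return [x / lam for x in chi2_stats]


def rescale_lambda(lam, n_cases, n_controls, ref_cases=1000, ref_controls=1000):
    """Express lambda for a reference design (default 1000 cases, 1000 controls)."""
    scale = (1 / n_cases + 1 / n_controls) / (1 / ref_cases + 1 / ref_controls)
    return 1 + (lam - 1) * scale


print(round(rescale_lambda(1.05, 775, 3204), 2))  # 1.04

# Toy statistics: after adjustment the median matches the null expectation.
toy = [0.1, 0.3, 0.5, 0.9, 2.0]
print(round(lambda_gc(adjust(toy, lambda_gc(toy))), 6))
```

Dividing by λGC leaves the ranking of markers unchanged but makes every P value slightly more conservative.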
Any SNP either within a 250 kb window around the most associated (or index) SNP or in linkage disequilibrium with the index SNP (r² > 0.5) was considered not to be independent of the index SNP and was not reported.
To determine the relative power of the bariatric case samples, we examined the observed χ² statistics for the nine known and replicable associations with BMI. Treating these effects as positive controls, we compared the observed non-centrality parameter (NCP) with the theoretical expectation for the NCP in a population-based quantitative trait sample of 1000 individuals, for each effect size from the GIANT meta-analysis. The median of the nine observed/expected ratios gives us an approximation of the effective quantitative population size (in thousands) equivalent to our case/control cohort.
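A minimal sketch of this calculation follows. It assumes that the expected NCP for a variant explaining a fraction h² of trait variance in N individuals is approximately N × h² / (1 − h²), and that the observed χ² statistic is used directly as the estimate of the observed NCP; both are simplifying assumptions, and the variance-explained values and statistics below are hypothetical rather than taken from the GIANT meta-analysis or from this study.

```python
from statistics import median


def expected_ncp(var_explained, n=1000):
    """Approximate NCP of a 1-df test in a quantitative trait sample of size n."""
    return n * var_explained / (1 - var_explained)


def effective_population_size(observed_chi2, var_explained):
    """Median observed/expected NCP ratio, scaled to individuals."""
    ratios = [obs / expected_ncp(h2)
              for obs, h2 in zip(observed_chi2, var_explained)]
    return 1000 * median(ratios)


# Hypothetical variance explained per locus and matching observed statistics.
h2_values = [0.003, 0.001, 0.0005]
chi2_values = [4 * expected_ncp(0.003),
               5 * expected_ncp(0.001),
               3 * expected_ncp(0.0005)]
print(round(effective_population_size(chi2_values, h2_values)))
```

Using the median rather than the mean keeps the estimate robust to a single locus whose observed statistic is unusually large or small by chance.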
Our goal is to estimate an individual's autozygosity across the autosomes by maximum likelihood, adapting a previously described model (17). We assume that homozygosity, especially in long stretches, is likely to be identical-by-descent, and hence autozygous, rather than identical-by-state. The process of identity by descent for each genomic region, k, has two states: either homozygous with two alleles identical by descent (Xk = 1) or not (Xk = 0). We approximate this process along the genome by a first-order Markov chain, as this approximation has been shown to hold for a variety of inbred relationships (18). Because the identity by descent state is not directly observed, we invoke a hidden Markov model, using the SNP data and marker map to infer the unknown identity by descent state at each site along the genome, and to estimate, by numerical maximization across the genome, the underlying inbreeding coefficient (F) and identity by descent switch parameter (a), which together govern the coverage of autozygosity genome-wide.
To complete the model, we need to specify the identity by descent probability at a specific locus and the transition probabilities between loci. These quantities depend on how the high density of SNP data is handled. To minimize marker linkage disequilibrium, we partitioned the SNP map into segments flanked by recombination hotspots, requiring a minimum of 20 SNPs per segment (as this was shown to improve parameter estimation, data not shown). This maintains tight linkage within segments but allows transitions to occur at hotspots, where most recombination occurs (19). Each segment has an associated identity by descent probability, based on the frequency of the phased haplotypes in the region. Transition probabilities in either case are based on these probabilities as well as the fine-scale recombination rate (8). This work is described in more detail in a manuscript in preparation by B.F.V. and others.
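The structure of such a model can be sketched with a two-state forward algorithm. This is an illustration under explicit assumptions, not the authors' implementation: the transition probabilities use a common parameterization in terms of F and a (assumed here to resemble the cited model), the emission probabilities are a simplified haplotype-frequency model with a hypothetical error rate `eps`, maximization is a coarse grid search, and all inputs are made up.

```python
from math import exp, log


def transition(F, a, t):
    """T[i][j] = P(X_next = j | X = i) across genetic distance t."""
    stay = exp(-a * t)            # probability of no switch of IBD process
    to_auto = (1 - stay) * F      # switch, then land in the autozygous state
    return [[1 - to_auto, to_auto],
            [1 - (stay + to_auto), stay + to_auto]]


def emission(segment, eps=1e-3):
    """Return [P(data | X=0), P(data | X=1)] for one segment.

    segment: (freq_hap1, freq_hap2, haplotypes_identical)
    """
    h1, h2, identical = segment
    if identical:
        return [h1 * h1, (1 - eps) * h1]
    # Differing haplotypes are only compatible with autozygosity via error.
    return [2 * h1 * h2, eps * 2 * h1 * h2]


def log_likelihood(segments, distances, F, a):
    """Forward algorithm with rescaling at every segment."""
    e = emission(segments[0])
    alpha = [(1 - F) * e[0], F * e[1]]
    total = 0.0
    for k in range(len(segments)):
        if k > 0:
            T = transition(F, a, distances[k - 1])
            e = emission(segments[k])
            alpha = [(alpha[0] * T[0][j] + alpha[1] * T[1][j]) * e[j]
                     for j in (0, 1)]
        scale = alpha[0] + alpha[1]
        total += log(scale)
        alpha = [alpha[0] / scale, alpha[1] / scale]
    return total


def fit(segments, distances, F_grid, a_grid):
    """Grid-search maximum likelihood estimate of (F, a)."""
    return max(((F, a) for F in F_grid for a in a_grid),
               key=lambda fa: log_likelihood(segments, distances, *fa))


# Hypothetical individual: 20 segments, each homozygous for a rare haplotype.
autozygous_like = [(0.05, 0.05, True)] * 20
gaps = [0.01] * 19  # genetic distance between consecutive segments
print(fit(autozygous_like, gaps, F_grid=[0.01, 0.1, 0.5, 0.9], a_grid=[1, 10, 100]))
```

Long runs of segments homozygous for rare haplotypes push the estimate of F upward, whereas an individual whose haplotypes differ in most segments yields an estimate near the bottom of the grid.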
This work was supported by grants to L.M.K. from Merck Research Laboratories and the National Institutes of Health [grant numbers DK046200, DK043351 and DK057478]. C.C. is supported by an MGH ECOR/FMD fellowship. E.K.S. was supported by the National Institutes of Health [grant numbers DK07191-32, DK079466-01 and DK080145-01].
We are indebted to the study subjects for their participation, without which this research would be impossible.
Conflict of Interest statement. None declared.