|Home | About | Journals | Submit | Contact Us | Français|
Human adiposity is highly heritable, but few of the genes that predispose to obesity in most humans are known. We tested candidate genes in pathways related to food intake and energy expenditure for association with measures of adiposity.
We studied 355 genetic variants in 30 candidate genes in 7 molecular pathways related to obesity in two groups of adult subjects: 1,982 unrelated European Americans living in the New York metropolitan area drawn from the extremes of their body mass index (BMI) distribution and 593 related Yup'ik Eskimos living in rural Alaska characterized for BMI, body composition, waist circumference, and skin fold thicknesses. Data were analyzed by using a mixed model in conjunction with a false discovery rate (FDR) procedure to correct for multiple testing.
After correcting for multiple testing, two single nucleotide polymorphisms (SNPs) in Ghrelin (GHRL) (rs35682 and rs35683) were associated with BMI in the New York European Americans. This association was not replicated in the Yup'ik participants. There was no evidence for gene × gene interactions among genes within the same molecular pathway after adjusting for multiple testing via FDR control procedure.
Genetic variation in GHRL may have a modest impact on BMI in European Americans.
Human adiposity resolves complex interactions among genetic, developmental, behavioral, and environmental influences. Obesity is primarily the result of a net imbalance of caloric intake over energy expenditure. Even small differences resulting in positive energy balance – when integrated over long periods of time – can produce increased adiposity. In the United States, 65% of adults are overweight (BMI 25.0–29.9) and more than 30% of the adult population is obese (BMI ≥30) . The problem also affects children in whom the percentage with BMI >95 percentile between the ages of 6–19 defined by the CDC growth charts is now 16% . Increasing in parallel with these trends in obesity are the frequently associated comorbidities of diabetes, hypertension, and cardiovascular disease .
Evidence for potent genetic contributions to human obesity is provided by familial clustering of increased adiposity, including a 3–7 fold increased relative risk (λs) among siblings , estimates of heritability for fat mass between 40 and 70% in twin studies [5, 6], and rare monogenic causes of syndromic and nonsyndromic obesity . Severe loss-of-function mutations in genes causing monogenic forms of obesity do not account for the heritable component of adiposity in the majority of individuals, although mutations in the Melanocortin 4 Receptor (MC4R) may account for obesity in 1–5% of the extremely obese . The phenotypic differences among individuals at the extremes of adiposity presumably reflects – to a substantial degree – allelic variation at genes that affect energy intake, expenditure and the chemical form in which excess calories are stored (‘partitioning’). The nature of these genes and their interactions is of obvious interest.
Both linkage and association studies have been undertaken to identify obesity candidate genes. The 2005 Human Obesity Gene Map update reports over 1,100 studies including 500 genes, markers, and chromosomal regions apparently linked to, or associated with, human obesity phenotypes in specific cohorts . Yet only 22 genes have been reproducibly implicated in ≥5 studies as statistically significant contributors to common obesity. For monogenic obesity, 176 mutations have been identified in 11 genes, and 50 loci have been mapped corresponding to Mendelian syndromes related to obesity . Genome-wide association studies have and will likely continue to identify additional genes associated with obesity, but often identify genetic variants that are not themselves functionally significant and genes which were not previously known to be implicated in energy homeostasis. In most individuals, the genetic basis for obesity is complex and likely to involve the interaction of multiple genes as well as gene-by-environment interactions.
In most candidate-based genetic analyses of complex disease phenotypes, fewer than five candidate genes are interrogated in a population. Moreover, the coverage of coding and non-coding polymorphisms genotyped within a given candidate gene may be sparse, and pathways of functionally-related genes are rarely simultaneously investigated. In the present study, we evaluated seven groups of functionally-related obesity candidate genes (30 total genes) in two racially divergent population groups (European Americans and Alaskan Yup'ik Eskimos). Many of the candidate genes were selected from hypothalamic arcuate pathways since the hypothalamus integrates complex neuroendocrine, autonomic, and behavioral signaling pathways that determine food intake, energy expenditure, and nutrient partitioning (fig. (fig.11).
The seven groups of genes were included amongst the candidate genes:
Five genes that encode proteins involved in the leptin signaling pathway (LEP, LEPR, JAK2, STAT3, SOCS3) were selected for investigation since leptin is a key hormone secreted in proportion to peripheral fat mass as a signal to the hypothalamus regarding the state of long-term energy stores . Central regulation of energy homeostasis begins as leptin binds receptors on (at least) two sets of neurons in the arcuate nucleus to decrease transcription of Agouti Related Peptide (AgRP) and Neuropeptide Y (NPY) or to increase transcription of Pro-Opiomelanocortin (POMC) and Cocaine and Amphetamine Related Transcript (CARTPT) that act reciprocally to increase and decrease food intake, respectively . AgRP is a naturally occurring inverse agonist of MC3R and MC4R that stimulates food intake. Agouti Related Peptide (AGRP), Melanocortin 4 Receptor (MC4R), POMC and Carboxypeptidase E (CPE) represent orexigenic (AGRP) and anorexigenic (POMC) signaling pathways and downstream effector molecules (MC4R). CPE processes prohormones in this pathway including POMC. Neuropeptide Y receptors (NPY1R and NPY5R) are expressed on downstream effector neurons responding to the orexigenic NPY signaling events originating in the arcuate nucleus. The orexigenic peptide ghrelin (GHRL) is secreted from the stomach and duodenum and binds to the ghrelin receptor (GHSR) in the arcuate nucleus activating AGRP/NPY neurons to stimulate food intake. Glucagon-like peptide 1 (GCG) is produced by the small intestine and neurons in the nucleus of tractus solitarius post-prandially and decreases food intake and slows gastric emptying upon binding to the glucagon-like peptide 1 receptor (GLP1R) in the brain. Serotonin is involved in central signaling of satiety in hypothalamic downstream effector neurons and acts through the serotonin receptor genes (HTR2A and HTR2C). Lastly, included in the list of candidate genes are eight genes for Bardet-Biedl syndrome (BBS1, BBS2, ARL6, BBS4, BBS5, MKKS, BBS7, TTC8), a syndromic, oligogenic form of obesity resulting from mutations in at least twelve genes, some of which are apparent components of a ciliary motor that can individually, or in some cases in combination, produce a dysmorphic phenotype that includes obesity, polydactyly, and retinopathy .
In summary, our candidate gene list (table (table1)1) includes genes encoding signaling peptides, receptors and downstream signaling molecules expressed in central nervous system and/or peripheral tissues, including adipocytes and gastrointestinal tract, involved in food intake, energy expenditure or energy storage. Bardet-Biedl syndrome is also caused by mutations within a common molecular pathway (ciliary structure), and hence also provides a paradigm for genetic variants in multiple candidate genes within the same pathway interacting to affect adiposity.
This study was designed to prioritize the relative contributions of individual sequence variants, as well as to examine interactions among functionally related candidate genes and pathways. In an effort to identify more universally relevant genes/alleles, two racially divergent populations were genotyped in this study: a population of European American adults living in the New York City tri-state region, and a population of Yup'ik Eskimos from southwest Alaska. GHRL was associated with BMI in the NY European Americans, even after correction for multiple testing, suggesting that genetic variation in GHRL may contribute to relative adiposity in this population.
European American participants were recruited from the New York Health Project (NYHP), a prospective cohort study undertaken under the auspices of the Academic Medicine Development Corporation (AMDeC) of New York to analyze general health and cancer-related epidemiology . Approximately 18,000 subjects were enrolled between January, 2000 and December, 2002 at fourteen sites across the five boroughs of New York City. The institutional review boards at each of the fourteen sites and Columbia University approved the protocol. Enrollees had to be 30 years of age or older, reside in New York, New Jersey, or Connecticut, and have a literacy level sufficient to complete a consent and simple follow-up questionnaire. All subjects signed a written informed consent. A trained interviewer administered a questionnaire regarding health status and behaviors, drew blood, and measured standing height on a stadiometer and weight on a scale to the nearest pound with light clothing. Written informed consent was obtained from all subjects. Race/ethnicity was assessed by self report using the categories of the 2000 Census (http://www.census.gov/main/www/cen2000.html). A total of 17,709 subjects with valid data were recruited. Of these, 10,260 (57.9%) were European Americans (non-Hispanic white). Individuals at the extremes of the BMI distribution are potentially the most informative in this context and were, therefore, selected for initial analysis. Cancer history was by self-report and confirmed in state cancer registries when possible. All subjects with a history of cancer were eliminated from the study groups to eliminate the possible effect of cancer and its treatment on body weight. European American subjects were sorted by ascending BMI. Subjects below the 10th percentile for BMI were not included to avoid the confounding effects of medical conditions that themselves might cause weight loss. Gender-and age-matched 1,003 pairs of European Americans with BMI between the 10th and 30th percentile or above the 80th percentile were selected for genotyping. In the European American subjects living in New York, with the sampling method we used we had 80% power at α level of 5% to detect a genetic variant accounting for 0.27% of the variance in BMI. This sampling method involved identifying the 1,003 lean subjects between 10th and 30th percentiles for BMI and the 1,003 obese above the 80th percentile of the total sample BMI distribution. A random selection of 2,006 subjects from the same 10,260 individuals would have 80% power to detect a genetic variant accounting for 0.39% of the phenotypic variance. Using all of the available (10,260) subjects would have provided 80% power to detect a gene accounting for 0.077% of the phenotypic variance. Therefore, this strategy of selective genotyping of subjects at the extremes of phenotype can be used to reduce genotyping costs without substantially diminishing power .
Recruitment of Yup'ik Eskimo study participants started in December, 2003 and continued through May, 2005. All residents 14 years of age and older from seven communities were invited to participate . All seven communities were located in rural southwest Alaska. On average, approximately 43% of eligible community members participated in the research project. The age distribution of our study population reflects the age distribution among eligible participants. All participants were consented using protocols approved by the University of Alaska Institutional Review Board, the National and Alaska Area Indian Health Service Institutional Review Boards, and the Yukon Kuskokwim Human Studies Committee.
593 Yup'ik Eskimo participant samples were included in this study. The analysis for this report is based upon participants who identified themselves as non-pregnant Alaska Native Yup'ik Eskimos, ≥18 years, and as having fasted at least 8 h. Anthropometric measurements included height, weight, four circumferences (triceps, waist, hip, and thigh), four skinfolds (triceps, subscapular, abdominal and thigh), and were obtained by trained observers using previously published protocols implemented in the NHANES III survey . Percent body fat was estimated by electrical bioimpedance using a Tanita TBF-300A body fat analyzer. Participants were selected from pedigrees based on availability of DNA for individuals within the family, individuals phenotyped for at least five obesity related traits, and a composite obesity score that is a weighted average of the primary phenotype (BMI) and secondary phenotypes (percent body fat, waist circumference, and four skinfold thicknesses) and weights were a function of the correlations between primary and secondary phenotypes (See the details of derivation of composite score in the Appendix A). The Yup'ik data set consists of 157 families with 731 individuals. 593 individuals were included in the present study based on the following selection strategy. First we selected families with available DNA and at least five phenotypes within a family. We ordered the families with respect to the smallest amount of missing data. The family highest in this ranking is the family with the smallest amount of missing data among all individuals in the family. Then, we selected a minimum number of families that would enable us to achieve at least 80% power at significance level of 5% for traits with a broad-sense heritability of 50% and 5% locus heritability. Using these criteria, we identified 593 individuals belonging to the top 53 families (see Appendix A, table tableA1A1).
Candidate genes were selected according to their physiological function and/or reported association/linkage with obesity or a related endophenotype. The 30 candidate genes are listed in table table1.1. The relationships between genes in the respective molecular pathways is depicted in figure figure11.
Single nucleotide polymorphisms (SNPs) were gathered from multiple public databases, NCBI (http://www.ncbi.nlm.nih.gov/projects/SNP/), Celera (ABI) (now publicly available through dbSNP), HapMap (http://www.hapmap.org), the literature, and our local resequencing data. For each locus, SNPs were selected based upon the following criteria: (1) All SNPs with a minor allele frequency >0.05 between 10,000 base pair (bp) upstream and 10,000 bp downstream of the gene's coding sequence were considered; (2) only one SNP from a group of SNPs in linkage disequilibrium with r2 > 0.80 using the aggressive multimarker tagging method of the Haploview software (http://www.broad.mit.edu/mpg/haploview/download.php); (3) coding SNPs and non-coding SNPs with previous evidence of linkage or association with obesity in the literature or from our local data with a minor allele frequency >0.05 were included; (4) non-coding SNPs within putative transcription factor binding sites or sequences conserved across species with a minor allele frequency >0.05 were included; (5) for large genes with large numbers of non-coding SNPs such as LEPR, we limited SNPs to within 5,000 bp upstream and downstream of the coding sequence, and (6) all nonsynonymous coding SNPs (nsSNPs) with an allele frequency of at least 0.01 were included.
Genomic DNA was extracted from whole blood and stored at 4°C. SNPs were genotyped on an Illumina BeadArray® assay according to the manufacturer's directions. Due to technical characteristics of the Illumina BeadArray® assay , we could not use SNPs within 60 bp of another SNP or SNPs with more than 3 alleles or insertion/deletion polymorphisms. 384 SNPs in 30 genes were selected for a custom-designed BeadArray (supplemental table S1; for online supplementary material, see www.karger.com/doi/10.1159/000181158).
Eighteen subjects from the NYHP and 14 Yup'iks were blindly genotyped in duplicate to assess assay reproducibility. In addition, 28 Illumina controls (4/plate) were included as positive and negative controls to assess bead assay performance. Twenty-five additional subjects for whom genotypes had been independently determined for a subset of the candidate genes were also included to independently determine accuracy of the bead array genotyping method.
For the NYHP BMI data, the BMI distribution for 1,982 individuals with valid genotypic data was evaluated, and were log-transformed to normalize the residual distribution. The sample selected from 2 extremes of population distribution was highly skewed. The log transformation made it a distribution of 2 extremes of comparatively normal distribution. The Ordinary Least Squares (OLS) regression method is quite robust to the distribution. Past research has shown that when OLS regression is used, extreme sampling not only provides more power than random sampling on a per subject basis, but also that the tests are robust to the marked non-normality induced by the extreme sampling [21,22,23]. Similarly, for the Yup'ik data, BMI, subscapular skinfold thickness, and waist circumference phenotypes were log transformed; thigh-skinfold phenotype was square root transformed to render the residuals approximately normally distributed after adjusting for age, age2, and sex. The phenotypes were then tested for univariate and multivariate outliers. Multivariate outliers were identified by chi-square testing using Mahalanobis distance . For the NYHP data, the bivariate distribution of BMI and age was used to calculate Mahalanobis distance; for the Yup'ik data, multivariate distributions of all 8 phenotypes were used to identify possible outliers. None of the individuals in the NYHP study was found to be an outlier. Thirteen individuals were identified as outliers in the Yup'ik data, and their phenotypic data were not included in our final analyses. We did, however, perform an association analysis of all phenotypes in Yup'ik data including the 13 outlier individuals. The results did not vary significantly in comparison to those excluding the outliers. In single SNP, gene, and gene-group association analysis, Pearson's correlation between the resulting p values corresponding to the significance of the test of the association for all phenotypes, including and excluding outliers were found to be between 0.65 and 0.95. The phenotypes in both populations were adjusted for the covariates sex, age, age2, and their respective interactions.
The 355 SNP markers were tested for Hardy-Weinberg equilibrium in both population data sets. For 17 SNPs in the NYHP data and 21 SNPs in the Yup'ik data, all subjects were homozygous for the major allele and were thus not polymorphic. These SNPs were excluded from further analysis. 43 SNPs in the NYHP data and 73 SNPs in the Yup'ik data were found to be in Hardy-Weinberg disequilibrium. In NYHP data there are 17 SNPs that are not in HWE among the entire sample but are in HWE among lean subjects. There would be 11 more SNPs in HWE if the threshold were moved from 0.05 to 0.01. We performed the association analyses including and excluding the SNPs that were in HW disequilibrium. None of the SNPs in HW disequilibrium demonstrated statistical significance at the 0.05 significance level.
For the NYHP data, associations between BMI and both additive and dominance effects of each of 355 SNPs were tested using the Ordinary Least Squares (OLS) regression method. The Yup'ik data, comprised of families, could not be analyzed using the OLS regression method because of correlations present among related individuals. Before analyzing the Yup'ik data set, we compared Within Cluster Resampling (WCR) and Mixed Model procedures to determine the validity of these methods suited for Yup'ik familial data set (see Appendix B for more details). We found that the WCR method gave more false positive results than expected. The false positive rate for the WCR method increased at lower significance thresholds. The mixed model always gave valid type I error rates. Thus, association analysis on these data was performed using a mixed model approach in which we used family membership as a random effect by fitting a mixed model with phenotype as a dependent variable and family ID as a random effect, and SNPs and covariates as fixed effects using the PROC MIXED procedure in the Statistical Analysis System (SAS 9.1, SAS Institute Inc. Cary, N.C., USA). Because of multiple testing (for 355 SNPs) in both data sets, association results were tested at threshold significance level of 0.05 using the false discovery rate method of corrected tests .
To assess the overall effect of multiple genetic variants in a particular gene on relevant phenotypes, each gene was tested for association with each phenotype by linear regression of each phenotype on additive and dominance effects of all SNPs in the gene. For the Yup'ik family data, we used a similar method of regression using a mixed model in which family membership was treated as a random effect. As noted, the candidate genes were included (non-exclusively) in 7 groups based upon their membership in molecular/physiological pathways mediating energy homeostasis. As was done for the individual genes, the effects of these groups on relevant phenotypes were tested using linear regression modeling all SNPs in the group.
The haplotype analysis was performed using a sliding window of size 2, 3, and greater until all of the markers within each gene were analyzed . For each haplotype analysis, the score test, was used in the R package Haplo Stats [27, 28]. In the haplotype association analyses, age and gender were included as covariates. Multiple testing adjustment was performed following the strategy proposed in .
We also searched for interactions among SNPs by fitting a logistic regression model with at most nine parameters: intercept, additive and dominance effects at each locus, and four interaction effects. We used the log-likelihood ratio test statistic (LRT) computed as χ2df = 2(Lfull – Lreduced) where Lfull is the log-likelihood of the data computed under a fully specified model and Lreduced is the log-likelihood of the data computed under the constraint that one or more parameters in the model are equal to zero, to explore presence of epistatic interactions. Since testing for interaction among several SNPs tends to increase multiple testing error rates, which are largely responsible for false positive results and failure to replicate numerous phenotype-genotype associations , we chose to minimize the impact of multiple testing adjustment by estimating the number of truly null hypotheses from observed p values by a finite mixture model approach . Posterior probabilities of true positives (PTP), true negatives (PTN) and false negatives (PFN) were also estimated. Mix-o-matic approach uses maximum likelihood (ML) to fit a mixture model to observed p values, which represent two hypotheses groups: the null (modeled as uniformly distributed) and alternative (modeled as beta distributed).
Phenotypic characteristics of the 1,982 New York European American NYHP subjects with valid genotype data are summarized in table table2a.2a. Obese and lean subjects were matched for sex (56.7% female) and age. The mean age ± standard deviation of obese patients was 47.7 ± 10.2 years, and the lean subjects were 46.5 ± 10.7 years. The BMI of the obese subjects was 36.3 ± 4.6 kg/m2; the lean subjects had a BMI of 22.7 ± 1.4 kg/m2.
The phenotypes of the 593 Yup'ik participants (333 females and 260 males) are summarized in table table2b.2b. They had a mean age of 38.3 years, mean BMI of 27.6 kg/m2, mean percent body fat of 29.4%, and mean waist circumference of 90.2 cm (table (table2b).2b). The mean triceps, subscapular, abdominal and thigh skinfold thicknesses were 21.0, 21.2, 27.5, and 22.9 mm, respectively.
355 of the original 384 SNPs were used in these analyses (suppl. table S1). The 29 SNPs that were not used were not polymorphic in European Americans and/or Yup'ik Eskimos. For the 18 replicate European American subjects and 14 replicate Yup'ik Eskimo participants, all of the genotypes were 100% concordant for the 355 SNPs genotyped. There was 100% concordance of genotypes determined by the bead array and other methods of genotyping including dideoxy sequencing and TaqMan.
When each of the SNPs was analyzed individually, 17 SNPs in 9 genes (GLP1R, GHRL, JAK2, GHSR, HTR2A, LEP, LEPR, NPY5R, and SOCS3) demonstrated association with BMI in the NYHP European Americans sample at significance level of 0.05 (table (table3).3). In the Yup'ik participants, 142 SNPs in 21 genes (ADIPOQ, ARL6, BBS4, BBS5, CNR1, CPE, GHRL, GCG, GHSR, GLP1R, HTR2A, HTR2C, JAK2, LEPR, MC4R, NPY1R, NPY5R, PLIN, SOCS3, STAT3, and TUB) were associated with measures of adiposity at significance level of 0.05 (suppl. table 2). The complete distribution of observed p values in both data sets is given in figure figure2.2. After correction for multiple testing by False Discovery Rate (FDR) controlling procedure, two SNPs (rs35682 and rs35683) in GHRL within 600 base pairs of each other had FDR values below 0.05 in the NYHP European American sample. However, after such correction, none of the associations in the Yup'ik participants remained significant.
When all SNPs within a gene for each of the 30 candidate genes were evaluated jointly, polymorphisms in GHRL, GLP1R, and JAK2 were significantly associated with BMI in the NYHP European American sample, with each gene accounting for approximately 1% of the variance. Table Table44 provides the p values associated with genes significant at α level of ≤5%, R2, and adjusted R2 values. Note that none of the genes were significant after correcting for multiple testing procedures by FDR. R2 is the proportion of variability in the phenotypic data that is accounted for by a statistical model, which is a combined additive and dominant effect of one SNP or group of SNPs on a gene. Adjusted R2 is a modification of R2 that adjusts for the number of predictors (SNPs in this context) in a model. The adjusted R2 value increases only if the newly added predictor improves the model more than would be expected by chance. In the Yup'ik participants, four genes were associated with obesity phenotypes (suppl. table 3), but when corrected for multiple testing, none of the associations with all the SNPs in any of the genes remained significant.
Haplotype analysis demonstrated an association of BMI with AGRP (p = 0.03), GHRL (p = 0.04), and NPY1R-5R (0.008), in the NYHP European Americans after correction for multiple testing (table (table5).5). No haplotypes were associated with any measures of adiposity in the Yup'ik participants after multiple testing correction.
As noted, the candidate genes were selected in part based upon the molecular pathways in which they interact (fig. (fig.1).1). Genes were analyzed in the groups shown in table table66 by modeling all SNPs within the genes in that pathway. The regression analyses of SNPs for genes within a pathway showed significant associations with BMI in the NYHP European American subjects, in particular, the GHRL/GHSR pathway (p = 0.03) and the GLP/GLP1R pathway (p = 0.01) (table (table6).6). However, after correcting for multiple testing by the FDR procedure, neither of these associations had FDR values below 0.05. GHRL/GHSR and GLP1R pathways both account for approximately 1% of the attributable variance in BMI. Similar analyses revealed that none of the gene pathways in the Yup'ik participants had FDR values below 0.05.
The results of interaction analyses are presented in suppl. fig. 1 and suppl. table 4. Of the over 50,000 interaction hypotheses tested, we estimate that for 16.4% (95% confidence interval, 13.5–17.5%) the null hypotheses were estimated by finite mixture models  to be false, which implies that a model including epistatic interactions will better characterize the available data. The ten pair-wise interactions with the lowest observed p values i.e., ≤10–3 and the highest posterior probabilities of being true positives (>30%) are presented in suppl. table 4.
The SNP with the most significant association with sex-, age- and diabetes-adjusted BMI in NYHP European Americans was rs35683 in GHRL with A/A genotypes having an adjusted BMI of 29.47 ± 0.01 kg/m2, A/C genotypes having an adjusted BMI of 29.50 ± 0.008 kg/m2, and C/C genotypes having the lowest adjusted BMI of 29.44 ± 0.01 kg/m2.
We studied two independent groups of subjects for association with 30 candidate genes for obesity involved in 7 molecular pathways regulating energy homeostasis. Of the 30 genes analyzed, two SNPs within GHRL (rs35682 and rs35683 within 600 base pairs of each other) are associated with BMI in the New York European American population. Polymorphisms in GHRL have previously been associated with obesity, impaired glucose tolerance, and the metabolic syndrome in several previous studies in European American children and adults [31,32,33,34,35,36,37,38,39,40]. Both rs35682 and rs35683, the two SNPs in GHRL most highly associated with BMI, are intronic and are not located in regions that are conserved across species. The two SNPs themselves are not predicted to result in changes in splicing or expression. Therefore, the biological effect of these SNPs is likely through another variant in linkage disequilibrium with rs35682 and rs35683. Notably, the GHRL SNPs rs696217 (Leu72Met) and rs27647 (5′UTR polymorphism) previously associated with adiposity in children and adults from Japan, Finland, Italy, and North America [33,34,35,36, 38, 40] were not the SNPs most highly associated with BMI in our study. The phenotypic effect of the GHRL variants is modest, and account for a difference in BMI of only 0.06 kg/m2, with the heterozygotes having a higher BMI than either of the homozygous genotypes. This is an example of heterosis at a single locus conferring an increased BMI for the heterozygotes.
Our studies include two very different racial groups in two very different environments, European Americans living in the New York metropolitan area and Yup'ik Eskimos living in rural Alaska. Only one SNP in one gene, rs560994 in GHSR, was found to be associated with BMI in both populations, but was not significant in the Yup'iks or European Americans after correction for multiple testing. The lack of replication may be in part based upon genetic and environmental differences between the groups, and is not surprising. Our study underscores the need to study large numbers of subjects in multiple populations.
We analyzed the family data (Yup'ik participants) using a mixed model with correction for multiple testing using the FDR procedure. This methodology is conservative and protects against false positives. Our results that did not meet rigorous FDR cutoffs might still be the basis for testing these genes in other populations.
The genetics of complex traits such as obesity are likely to result from interactions among genes and between genes and the environment. We used the molecular physiology of the control of energy homeostasis to cluster genes within molecular pathways and to examine interactions among these genes. Bardet Biedel syndrome represents an oligogenic example of interacting genes  and provides a rationale for this approach. By using prior knowledge of genes within networks/pathways, the number of tests for interactions that must be performed to test hypotheses of gene × gene interactions can be reduced. Given the number of genes we studied, testing for all possible combinations of gene × gene interactions would have resulted in a prohibitive penalty for multiple testing. Even using our strategy of testing only for interactions of genes within common pathways, we were unable to demonstrate significance for the interactions within the ghrelin/ghrelin receptor and GLIP1R pathways in the European Americans after correcting using the false discovery method. Our experience highlights the need for extremely large studies to adequately power analyses of complex traits for gene × gene and gene × environment interactions.
We would like to thank Josue Martinez for help with manuscript preparation. These studies were supported by NIH DK 52431, and NIHDKP30-26687 (NYORC). The Center for Alaska Native Health Research is funded by a grant from the National Center for Research Resources at the National Institutes of Health (RR16430).
Often we are required to select the individuals from large data sets with a multivariate phenotype such that the sub-sample is best representative of the sample and preserves the correlation structure among the multivariate phenotype as well as provides optimal chance of finding putative genetic loci. In the Yup'ik sample, we have adiposity related multivariate phenotypic data consisting of BMI, skinfolds (subscapular, suprailiac, thigh, triceps), waist circumference, and percent body fat. However, it is not clear which phenotype or combination of phenotypes should be used in determining the sub-sample. In most situations investigators use a primary phenotype of interest to create a sub-sample by using an extreme sampling design (i.e. selecting only individuals from both extremes of the distribution using fixed threshold). This sampling design does provide optimal power for the primary phenotype, but not necessarily for other correlated phenotypes which may also be of interest. We used the selection strategy using all correlated phenotypes and incorporating the correlation structure in the pedigrees. Here, we provide a short description of a potentially useful measure to estimate individual phenotypic value based on multivariate data following Allison and Franklin .
Consider k correlated phenotypes Yi measured on individuals in a sample. Then the minimum variance estimator of combined phenotypes Yi for j-th individual is given by
where k = the number of observed variables for individual j, Yij = the standardized i-th variable for the j-th individual, Wj = the weight for the i-th variable.
Without loss of generality, we can assume Y1 is the primary trait. Let ρ1i = genetic correlation between i-th secondary variable and primary variable Y1 (for example, BMI can be considered as a primary variable and percent body fat (PBF) or skin fold measurements can be considered as secondary variables). Then following Allison and Franklin , we can define and the weight for the correlated secondary n-th variable is defined as Note that weights as function of heritability of primary trait and/or genetic correlations between primary and secondary traits will allow us to capture the genetic variance of each trait in the composite score. We can further refine the combined score by using variance shrinkage method, and the new updated combined score is given by . Note that this score can easily accommodate missing data and also preserves the correlation between primary trait and secondary traits.
After calculating a composite score for each individual in the pedigrees, we calculated the total variance of the composite score for each family. The distribution of the composite score is given in figure figureA1A1 and the power estimates using selected pedigrees are given in table tableA1.A1. The power was calculated using simulated data set those mimic the Alaska pedigrees. We simulated 1,000 replicates of pedigrees keeping structure of the pedigrees exactly same as Alaska pedigrees and also only simulating phenotypes and genotypes of individuals with available DNA and obesity related traits as in real data. The analysis was performed using ASSOC of SAGE suite of programs. We have sufficient power to detect putative loci with locus heritability of 0.05 at 5% α level.
Before analyzing the Yup'ik data set that consists of genotypes and multivariate phenotypes in pedigrees, we compared the Within Cluster Resampling and Mixed-Model procedure to determine the validity of these methods suited for Yup'ik familial data set. We briefly describe both methods.
WCR was first introduced by Hoffman et al.  and subsequently expanded upon by Follmann et al. . WCR is an elegant method for analyzing correlated clustered data and is a simple but computationally intensive estimation method. The steps of the methods are described below.
Step 1: Select one individual from each cluster with replacement. Repeat this m times to create m data sets.
Step 2: Analyze each of the m data sets using standard complete-data methods such as SAS or SPSS.
Step 3: Integrate or combine m analyses to get a single estimate of parameters and their variances. This involves averaging the values of the parameter estimates across the m samples to produce a single point estimate and variance. Formally, we can describe it as follows:
Let m = the number of data sets analyzed, = Estimate of the parameter of interest from the i-th set, = Variance Estimate of the from the i-th set.
The point estimate from the WCR method is the average of the estimates from m analyses and is given by
and the total variance estimate of the point estimate is the difference of within replicate variance
and between replicate variance
and is given by
is approximately distributed as t with νm degrees of freedom, where
The original method was used for binary correlated clusters where cluster size is informative, but it could easily be extended to continuous phenotypes.
The linear mixed effects model may be viewed as a generalization of the variance component and regression analysis models in which SNPs are modeled as fixed effects and family ID as random effect. In particular, we can write the model as
where is the outcome variable (phenotype) such as BMI, α is overall mean of the phenotype, β1 is treated as fixed effect, β2 is treated as random effect, and ε as non-shared random effect.
We simulated 100,000 replicates each with 50 unequal size families shown in figure figureB1B1.
We simulated a mixed model quantitative trait for each individual in the family structures described above assuming no major locus effect to determine the behavior of the null distribution. The polygenic component was assumed to be normally distributed with mean 0 and variance 0.30 and the random environmental component was assumed to be normally distributed with mean 0 and variance 0.70.
Marker with 2 alleles was simulated for testing the null hypothesis of no association with allele frequency of 0.2. Genotypes of founders were simulated by two random draws from binomial distribution with p = 0.2 and n = 2 to get both alleles in the genotype. The non-founders genotypes were randomly assigned following Mendelian inheritance.
The Type I error rate of both WCR and Mixed Model is given in the table tableB1B1 at nominal α levels at 0.05, 0.01, 0.001, and 0.0001, respectively. Clearly, WCR produces inflated type I error rates compared to mixed model. Mixed model always gave valid type I error rates. Thus, we have shown that mixed model is an attractive choice as a method to test association in pedigree data since it represents a good balance between quality of results and ease of use.