Results 1-24 (24)

1.  Heritability of pulmonary function estimated from pedigree and whole-genome markers 
Frontiers in Genetics  2013;4:174.
Asthma and chronic obstructive pulmonary disease (COPD) are major worldwide health problems. Pulmonary function testing is a useful diagnostic tool for these diseases, and is known to be influenced by genetic and environmental factors. Previous studies have demonstrated that a substantial proportion of the variation in pulmonary function phenotypes can be explained by familial relationships. The availability of whole-genome single nucleotide polymorphism (SNP) data enables us to further evaluate the extent to which genetic factors account for variation in pulmonary function and to compare pedigree- to SNP-based estimates of heritability. Here, we employ methods developed in the animal breeding field to estimate the heritability of forced expiratory volume in one second (FEV1), forced vital capacity (FVC), and the ratio of these two measures (FEV1/FVC) among subjects in the Framingham Heart Study dataset. We compare heritability estimates based on pedigree-based relationships to those based on genome-wide SNPs. We find that, in a family-based study, estimates of heritability using SNP data are nearly identical to estimates based on pedigree information, and range from 0.50 for FEV1 to 0.66 for FEV1/FVC. Therefore, we conclude that genetic factors account for a sizable proportion of inter-individual differences in pulmonary function, and that estimates of heritability based on SNP data are nearly identical to estimates based on pedigree data. Finally, our findings suggest a higher heritability for FEV1/FVC compared to either FEV1 or FVC.
PMCID: PMC3766834  PMID: 24058366
FEV1; FVC; FEV1/FVC; heritability; pulmonary function; genetic
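SNP-based heritability estimates of this kind typically replace the pedigree relationship matrix with a genomic relationship matrix (G) computed from marker genotypes. The sketch below builds G with the VanRaden method-1 formula, G = ZZ'/(2 Σ p(1 − p)). That choice is our assumption, because the abstract does not say which construction the authors used. The variance-component (e.g. REML) step that turns G into a heritability estimate is also not shown.

```python
import numpy as np


def genomic_relationship_matrix(genotypes):
    """Genomic relationship matrix (VanRaden method 1; assumed, not taken from the paper).

    genotypes: array of shape (n_individuals, n_snps) with allele counts 0/1/2.
    """
    X = np.asarray(genotypes, dtype=float)
    p = X.mean(axis=0) / 2.0               # sample frequency of the counted allele
    keep = (p > 0.0) & (p < 1.0)           # monomorphic SNPs carry no information
    Z = X[:, keep] - 2.0 * p[keep]         # centre each SNP on its expected count
    scale = 2.0 * np.sum(p[keep] * (1.0 - p[keep]))
    return Z @ Z.T / scale


# Toy example: two individuals who are opposite homozygotes at two SNPs.
G_toy = genomic_relationship_matrix([[0, 2], [2, 0]])
print(G_toy)
```

In practice G would be computed over many thousands of SNPs and passed to a mixed-model solver in place of, or alongside, the pedigree-based matrix.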
2.  Indigenous American ancestry is associated with arsenic methylation efficiency in an admixed population of northwest Mexico 
Many studies provide evidence linking lower human arsenic (As) methylation efficiency, represented by a high percentage of urinary monomethylarsonic acid (MMA(V)), to several arsenic-induced diseases, possibly because MMA(V) serves as a proxy for MMA(III), the most toxic arsenic metabolite. Some epidemiological studies have suggested that indigenous Americans (AME) methylate As more efficiently; however, the supporting data have been equivocal. The aim of this study was to characterize the association between AME ancestry and arsenic methylation efficiency, using a panel of ancestry-informative genetic markers to determine individual ancestry proportions in an admixed population (one derived from two or more previously isolated ancestral populations) of 746 individuals environmentally exposed to arsenic in northwest Mexico. Total urinary As (TAs) had a mean of 170.4 μg/L (range 2.3–1053.5 μg/L), and %AME had a mean of 72.4 (range 23–100). A multiple regression model adjusted for gender, age, AS3MT 7388/M287T haplotypes, body mass index (BMI), and TAs showed that higher AME ancestry is associated with lower %uMMA excretion in this population (p < 0.01). The data also showed a significant interaction between BMI and gender: the negative association between BMI and %uMMA was stronger in women than in men (p < 0.01). Moreover, age and the AS3MT variants 7388 (intronic) and M287T (non-synonymous) were also significantly associated with As methylation efficiency (p = 0.01). This study highlights the role of BMI and indigenous American ancestry in some of the observed variability in As methylation efficiency, underscoring the need to consider these factors in epidemiological studies, particularly those carried out in admixed populations.
PMCID: PMC3572940  PMID: 22047162
arsenic; arsenic metabolism; ancestry; admixture; body mass index; BMI; AS3MT
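The adjusted analysis and the BMI-by-gender interaction can be expressed as an ordinary least-squares model with a product term. This is a minimal sketch on simulated, noise-free data. The variable names and coefficient values are hypothetical, and leaving out the AS3MT haplotype and TAs terms is a simplification of ours, not the authors' specification.

```python
import numpy as np


def fit_ols(y, covariates):
    """Least-squares fit of y on an intercept plus named covariate columns."""
    names = ["intercept"] + list(covariates)
    X = np.column_stack(
        [np.ones(len(y))] + [np.asarray(v, dtype=float) for v in covariates.values()]
    )
    beta, *_ = np.linalg.lstsq(X, np.asarray(y, dtype=float), rcond=None)
    return dict(zip(names, beta))


# Simulated data purely to show the model structure (all values hypothetical).
rng = np.random.default_rng(0)
n = 200
ame = rng.uniform(23, 100, n)          # % indigenous American ancestry
bmi = rng.normal(28, 5, n)
female = rng.integers(0, 2, n)         # 1 = woman, 0 = man
age = rng.uniform(18, 80, n)
pct_mma = 14 - 0.05 * ame - 0.10 * bmi - 0.08 * bmi * female + 0.02 * age

coef = fit_ols(pct_mma, {
    "ame": ame,
    "bmi": bmi,
    "female": female,
    "bmi_x_female": bmi * female,      # interaction term
    "age": age,
})
print(coef)
```

In this sketch, a negative `bmi_x_female` coefficient means the BMI slope is more negative in women than in men, which matches the direction of the pattern reported above.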
3.  Genetic admixture, social-behavioral factors, and body composition are associated with blood pressure differently by racial-ethnic group among children. 
Journal of Human Hypertension  2011;26(2):98-107.
Cardiovascular disease has a progressively earlier age of onset, and disproportionately affects African Americans in the US. It has been difficult to establish the extent to which group differences are due to physiological, genetic, social, or behavioral factors. In this study, we examined the association between blood pressure and these factors among a sample of 294 children, identified as African-, European-, or Hispanic-American. We use body composition, behavioral (diet and physical activity), and survey-based measures (socio-economic status and perceived racial discrimination), as well as genetic admixture based on 142 ancestry informative markers (AIM) to examine associations with systolic and diastolic blood pressure. We find that associations differ by ethnic/racial group. Notably, among African Americans, physical activity and perceived racial discrimination, but not African genetic admixture, are associated with blood pressure, while the association between blood pressure and body fat is nearly absent. We find an association between blood pressure and an AIM near a marker identified by a recent genome-wide association study. Our findings shed light on the differences in risk factors for elevated blood pressure among ethnic/racial groups, and the importance of including social and behavioral measures to grasp the full genetic/environmental etiology of disparities in blood pressure.
PMCID: PMC3172395  PMID: 21248781
blood pressure; racial/ethnic disparities; children; genetic admixture; social and behavioral risk factors
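Individual admixture proportions of this kind are commonly estimated by maximum likelihood from genotypes at ancestry-informative markers, given allele frequencies in the parental populations. The sketch below assumes a simplified two-population model with known parental frequencies and searches a grid of ancestry fractions. The study itself involved three groups, and the abstract does not name its estimator, so this only illustrates the idea.

```python
import numpy as np


def admixture_proportion(genotype, p_pop1, p_pop2, grid_size=1001):
    """Grid-search maximum-likelihood estimate of the population-1 ancestry fraction q.

    genotype: counts (0/1/2) of the reference allele at each marker.
    p_pop1, p_pop2: reference-allele frequencies in the two parental populations.
    """
    g = np.asarray(genotype, dtype=float)
    qs = np.linspace(0.0, 1.0, grid_size)
    # Expected allele frequency for each candidate q, kept away from 0 and 1 for the logs.
    p = np.outer(qs, p_pop1) + np.outer(1.0 - qs, p_pop2)
    p = np.clip(p, 1e-9, 1.0 - 1e-9)
    # Binomial(2, p) log-likelihood summed over markers; the constant term is dropped.
    loglik = (g * np.log(p) + (2.0 - g) * np.log(1.0 - p)).sum(axis=1)
    return float(qs[np.argmax(loglik)])


# Simulated individual with 70% population-1 ancestry at 2,000 hypothetical markers.
rng = np.random.default_rng(3)
m = 2000
p1 = np.full(m, 0.9)
p2 = np.full(m, 0.1)
true_q = 0.7
geno = rng.binomial(2, true_q * p1 + (1 - true_q) * p2)
q_hat = admixture_proportion(geno, p1, p2)
print(q_hat)
```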
4.  Ancestry informative markers on chromosomes 2, 8 and 15 are associated with insulin-related traits in a racially diverse sample of children 
Human Genomics  2011;5(2):79-89.
Type-2 diabetes represents an increasing health burden. Its prevalence is rising among younger age groups and differs among racial/ethnic groups. Little is known about its genetic basis, including whether there is a genetic basis for racial/ethnic disparities. We examine a multiethnic sample of 253 healthy children to evaluate associations between insulin-related phenotypes and 142 ancestry informative markers (AIMs), while adjusting for sex, age, Tanner stage, genetic admixture, total body fat, height and socio-economic status. We also evaluate the effect of measurement errors in estimation of the individual ancestry proportions on the regression results. We find that European genetic admixture is positively associated with insulin sensitivity (SI), and negatively associated with acute insulin response to glucose, fasting insulin, and homeostasis model assessment of insulin resistance. Our analysis reveals associations between individual AIMs on chromosomes 2, 8, and 15 and these phenotypes. Most notably, marker rs3287 at chromosome 2p21 was found to be associated with SI (p = 5.8 × 10⁻⁵). This marker may be in admixture linkage disequilibrium with nearby loci (THADA and BCL11A) that have previously been reported to be associated with diabetes and diabetes-related phenotypes in several genome-wide association and linkage studies. Our results provide further evidence that variation in the 2p21 region containing THADA and BCL11A is associated with type-2 diabetes. Importantly, we have implicated this region in the early development of diabetes-related phenotypes, and in the genetic etiology of population differences in these phenotypes.
PMCID: PMC3146800  PMID: 21296741
insulin sensitivity; genetic admixture; type-2 diabetes; genetic association; ancestry informative marker
5.  Identification of Allelic Heterogeneity at Type-2 Diabetes Loci and Impact on Prediction 
PLoS ONE  2014;9(11):e113072.
Although over 60 single nucleotide polymorphisms (SNPs) have been identified by meta-analysis of genome-wide association studies for type-2 diabetes (T2D) among individuals of European descent, much of the genetic variation remains unexplained. There are likely many more SNPs that contribute to variation in T2D risk, some of which may lie in the regions surrounding established SNPs, a phenomenon often referred to as allelic heterogeneity. Here, we use the summary statistics from the DIAGRAM consortium meta-analysis of T2D genome-wide association studies, along with linkage disequilibrium patterns inferred from a large reference sample, to identify novel SNPs associated with T2D surrounding each of the previously established risk loci. We then examine the extent to which the use of these additional SNPs improves prediction of T2D risk in an independent validation dataset. Our results suggest that multiple SNPs at each of 3 loci contribute to T2D susceptibility (TCF7L2, CDKN2A/B, and KCNQ1; p < 5 × 10⁻⁸). Using a less stringent threshold (p < 5 × 10⁻⁴), we identify 34 additional loci with multiple associated SNPs. The addition of these SNPs slightly improves T2D prediction compared to the use of only the respective lead SNPs, when assessed using an independent validation cohort. Our findings suggest that some currently established T2D risk loci likely harbor multiple polymorphisms that contribute independently and collectively to T2D risk. This opens a promising avenue for improving prediction of T2D, and for a better understanding of the genetic architecture of T2D.
PMCID: PMC4231111  PMID: 25393876
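The key step in separating independent signals from LD-induced ones can be illustrated with standardized effects. If marginal (single-SNP) effects b are available for SNPs with LD correlation matrix R, approximate joint effects β solve R β = b. This is the idea behind conditional and joint analysis of summary statistics. The abstract does not say that the authors used exactly this formulation, and the optional ridge term is our own addition to stabilise nearly collinear LD matrices.

```python
import numpy as np


def joint_effects(marginal_beta, ld_corr, ridge=0.0):
    """Approximate joint SNP effects from marginal effects and an LD correlation matrix.

    Assumes standardized genotypes and phenotype, so that marginal = R @ joint.
    """
    R = np.asarray(ld_corr, dtype=float)
    b = np.asarray(marginal_beta, dtype=float)
    return np.linalg.solve(R + ridge * np.eye(len(b)), b)


R = np.array([[1.0, 0.8],
              [0.8, 1.0]])            # two SNPs in strong LD (hypothetical)
true_joint = np.array([0.10, 0.00])   # only the lead SNP has a real effect
marginal = R @ true_joint             # the second SNP still looks associated
print(marginal, joint_effects(marginal, R))
```

When both SNPs carry independent signal, both joint estimates stay non-zero. That is the pattern of allelic heterogeneity described above.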
6.  Evidence for novel genetic loci associated with metabolic traits in Yup’ik people 
To identify genomic regions associated with fasting plasma lipid profiles, insulin, glucose, and glycosylated hemoglobin in a Yup’ik study population, and to evaluate whether the observed associations between genetic factors and metabolic traits were modified by dietary intake of marine-derived omega-3 polyunsaturated fatty acids (n-3 PUFA).
A genome-wide linkage scan was conducted among 982 participants of the Center for Alaska Native Health Research study. n-3 PUFA intake was estimated using the nitrogen stable isotope ratio (δ15N) of erythrocytes. All genotyped SNPs located within genomic regions with LOD scores > 2 were subsequently tested for individual SNP associations with metabolic traits using linear models that account for familial correlation as well as age, sex, community group and n-3 PUFA intake. Separate linear models were fit to evaluate interactions between the genotype of interest and n-3 PUFA intake.
We identified several chromosomal regions linked to serum apolipoprotein A2, high density lipoprotein-, low density lipoprotein-, and total cholesterol, insulin, and glycosylated hemoglobin. Genetic variants found to be associated with total cholesterol mapped to a region containing previously validated lipid loci on chromosome 19, and additional novel peaks of biological interest were identified at 11q12.2-11q13.2. We did not observe any significant interactions between n-3 PUFA intake, genotypes, and metabolic traits.
We have completed a whole genome linkage scan for metabolic traits in Native Alaskans, confirming previously identified loci, and offering preliminary evidence of novel loci implicated in chronic disease pathogenesis in this population.
PMCID: PMC3785243  PMID: 23907821
Alaska Native; metabolism; multi-point linkage genome scan
7.  Obesity polymorphisms identified in genome-wide association studies interact with n-3 polyunsaturated fatty acid intake and modify the genetic association with adiposity phenotypes in Yup’ik people 
Genes & Nutrition  2013;8(5):495-505.
n-3 Polyunsaturated fatty acids (n-3 PUFAs) have anti-obesity effects that may modulate risk of obesity, in part, through interactions with genetic factors. Genome-wide association studies (GWAS) have identified genetic variants associated with body mass index (BMI); however, the extent to which these variants influence adiposity through interactions with n-3 PUFAs remains unknown. We evaluated 10 highly replicated obesity GWAS single nucleotide polymorphisms (SNPs) for individual and cumulative associations with adiposity phenotypes in a cross-sectional sample of Yup’ik people (n = 1,073) and evaluated whether genetic associations with obesity were modulated by n-3 PUFA intake. A genetic risk score (GRS) was calculated by summing the BMI-increasing alleles across all 10 SNPs. Dietary intake of n-3 PUFAs was estimated using the nitrogen stable isotope ratio (δ15N) of red blood cells, and genotype–phenotype associations were tested in linear models accounting for familial correlations. GRS was positively associated with BMI (p = 0.012), percent body fat (PBF; p = 0.022), ThC (p = 0.025), and waist circumference (WC; p = 0.038). The GRS explained 0.7 % of the variance in BMI, 0.3 % in PBF, 0.7 % in ThC, and 0.5 % in WC. GRS interactions with n-3 PUFAs modified the association with adiposity and accounted for more than twice the phenotypic variation (~1–2 %) explained by GRS associations alone. Obesity GWAS SNPs contribute to adiposity in this study population of Yup’ik people, and interactions with n-3 PUFA intake potentiated the risk of fat accumulation among individuals with a high obesity GRS. These data suggest that the anti-obesity effects of n-3 PUFAs among Yup’ik people may, in part, depend on an individual’s genetic predisposition to obesity.
Electronic supplementary material
The online version of this article (doi:10.1007/s12263-013-0340-z) contains supplementary material, which is available to authorized users.
PMCID: PMC3755132  PMID: 23526194
BMI; Adiposity; Alaska Native; SNP; δ15N; rs9939609; rs7647305; FTO; ETV5; Genetic risk score; CANHR; Gene-by-environment interactions
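The GRS described above is an unweighted count of BMI-increasing alleles, and the gene-by-diet question amounts to adding a GRS × exposure product term. The sketch below uses ordinary least squares on simulated, noise-free data. Unlike the study's models, it does not account for familial correlation, and the allele frequencies, δ15N values and effect sizes are all hypothetical.

```python
import numpy as np


def genetic_risk_score(risk_allele_counts):
    """Unweighted GRS: total BMI-increasing alleles (0, 1 or 2 per SNP) per person."""
    return np.asarray(risk_allele_counts).sum(axis=1)


def fit_interaction(y, grs, exposure):
    """OLS of y on GRS, exposure and their product (ignores family structure)."""
    X = np.column_stack([np.ones(len(y)), grs, exposure, grs * exposure])
    beta, *_ = np.linalg.lstsq(X, np.asarray(y, dtype=float), rcond=None)
    return dict(zip(["intercept", "grs", "exposure", "grs_x_exposure"], beta))


rng = np.random.default_rng(1)
counts = rng.binomial(2, 0.4, size=(500, 10))   # 10 SNPs, hypothetical frequencies
grs = genetic_risk_score(counts)
d15n = rng.normal(9.0, 1.0, 500)                # hypothetical delta-15N values
bmi = 22 + 0.3 * grs - 0.2 * d15n + 0.05 * grs * d15n   # noise-free illustration
coef = fit_interaction(bmi, grs, d15n)
print(coef)
```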
8.  Associations of the lactase persistence allele and lactose intake with body composition among multiethnic children 
Genes & Nutrition  2013;8(5):487-494.
Childhood obesity is a worldwide health concern with a multifaceted and sometimes confounding etiology. Dairy products have been implicated as both pro- and anti-obesogenic, perhaps due to the confounding relationship between dairy, lactose consumption, and potential genetic predisposition. We aimed to understand how lactase persistence influences obesity-related traits by examining the relationships among lactose consumption, a single nucleotide polymorphism (SNP) near the lactase (LCT) gene, and body composition parameters in a sample of multiethnic children (n = 296, 7–12 years old). We hypothesized that individuals with the lactase persistence (LP) allele of the LCT SNP (rs4988235) would exhibit a greater degree of adiposity and that this relationship would be mediated by lactose consumption. Body composition variables were measured using dual-energy X-ray absorptiometry, and a registered dietitian assessed dietary lactose intake. Statistical models were adjusted for sex, age, pubertal stage, ethnic group, genetic admixture, socio-economic status, and total energy intake. Our findings indicate a positive, significant association between the LP allele and body mass index (p = 0.034), fat mass index (FMI) (p = 0.043), and waist circumference (p = 0.008), with associations being stronger in males than in females. Our results also reveal a positive association between lactose consumption and FMI, although this association only approached statistical significance.
PMCID: PMC3755137  PMID: 23479116
Lactose; Obesity; Children; Genetics; Lactase persistence; Body composition
9.  Prediction of Complex Human Traits Using the Genomic Best Linear Unbiased Predictor 
PLoS Genetics  2013;9(7):e1003608.
Despite important advances from Genome-Wide Association Studies (GWAS), for most complex human traits and diseases a sizable proportion of genetic variance remains unexplained and prediction accuracy (PA) is usually low. Evidence suggests that PA can be improved using Whole-Genome Regression (WGR) models, in which phenotypes are regressed on hundreds of thousands of variants simultaneously. The Genomic Best Linear Unbiased Prediction (G-BLUP, a ridge-regression-type method) is a commonly used WGR method and has shown good predictive performance when applied to plant and animal breeding populations. However, breeding and human populations differ greatly in a number of factors that can affect the predictive performance of G-BLUP. Using theory, simulations, and real data analysis, we study the performance of G-BLUP when applied to data from related and unrelated human subjects. Under perfect linkage disequilibrium (LD) between markers and quantitative trait loci (QTL), the prediction R-squared (R²) of G-BLUP asymptotically reaches the trait heritability. However, under imperfect LD between markers and QTL, the prediction R² of G-BLUP has a much lower upper bound. We show that the minimum decrease in prediction accuracy caused by imperfect LD between markers and QTL is given by (1 − b)², where b is the regression of marker-derived genomic relationships on those realized at causal loci. For pairs of related individuals, within-family disequilibrium makes the patterns of realized genomic similarity similar across the genome; therefore b is close to one, inducing only a small decrease in R². However, with distantly related individuals, b reaches very low values, imposing a very low upper bound on prediction R². Our simulations suggest that, for the analysis of data from unrelated individuals, the asymptotic upper bound on R² may be of the order of 20% of the trait heritability. We show how PA can be enhanced with the use of variable selection or differential shrinkage of estimates of marker effects.
Author Summary
Despite great advances in genotyping technologies, the ability to predict complex traits and diseases remains limited. Increasing evidence suggests that many of these traits may be affected by a large number of small-effect genes that are difficult to detect in single-variant association studies. Whole-Genome Regression (WGR) methods can be used to confront this challenge and have exhibited good predictive power when applied to animal and plant breeding populations. WGR is receiving increased attention in the field of human genetics. However, human and breeding populations differ greatly in factors that can affect the performance of WGRs. Using theory, simulation and real data analysis, we study the predictive performance of the Genomic Best Linear Unbiased Predictor (G-BLUP), one of the most commonly used WGR methods. We derive upper bounds for the prediction accuracy of G-BLUP under perfect and imperfect LD between markers and genotypes at causal loci and validate such upper bounds using simulation and real data analysis. Imperfect LD between markers and causal loci can impose a very low upper bound on the prediction accuracy of G-BLUP, especially when data involve unrelated individuals. In this context, we propose and evaluate avenues for improving the predictive performance of G-BLUP.
PMCID: PMC3708840  PMID: 23874214
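G-BLUP is equivalent to ridge regression on centred marker genotypes, so its predictions can be computed either from the genomic relationship matrix or from marker effects. The sketch below computes both and checks that they agree. It assumes a known heritability `h2`, an intercept-only fixed-effects model and VanRaden-style scaling of G. In real analyses the variance components would be estimated (for example by REML), and the simulated data here are hypothetical.

```python
import numpy as np


def _centre(X_train, X_test):
    """Centre 0/1/2 genotypes on training allele frequencies; drop monomorphic SNPs."""
    p = X_train.mean(axis=0) / 2.0
    keep = (p > 0.0) & (p < 1.0)
    scale = 2.0 * np.sum(p[keep] * (1.0 - p[keep]))
    return X_train[:, keep] - 2.0 * p[keep], X_test[:, keep] - 2.0 * p[keep], scale


def gblup_predict(X_train, y_train, X_test, h2):
    """G-BLUP predictions for test individuals, with G = W W' / scale."""
    W, W_test, scale = _centre(np.asarray(X_train, float), np.asarray(X_test, float))
    G = W @ W.T / scale
    G_test = W_test @ W.T / scale
    delta = (1.0 - h2) / h2                      # residual-to-genetic variance ratio
    mu = y_train.mean()
    alpha = np.linalg.solve(G + delta * np.eye(len(y_train)), y_train - mu)
    return mu + G_test @ alpha


def ridge_predict(X_train, y_train, X_test, h2):
    """Equivalent ridge regression on markers, with penalty lambda = delta * scale."""
    W, W_test, scale = _centre(np.asarray(X_train, float), np.asarray(X_test, float))
    lam = (1.0 - h2) / h2 * scale
    mu = y_train.mean()
    beta = np.linalg.solve(W.T @ W + lam * np.eye(W.shape[1]), W.T @ (y_train - mu))
    return mu + W_test @ beta


rng = np.random.default_rng(2)
X = rng.binomial(2, 0.3, size=(60, 300)).astype(float)
y = X @ rng.normal(0, 0.05, 300) + rng.normal(0, 1.0, 60)
pred_g = gblup_predict(X[:50], y[:50], X[50:], h2=0.5)
pred_r = ridge_predict(X[:50], y[:50], X[50:], h2=0.5)
```

The relationship-matrix form solves an n × n system and the marker form an m × m system, so which one is cheaper depends on whether individuals or markers are more numerous.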
10.  Associations of Obesity Genes with Obesity-related Outcomes in Multiethnic Children 
Archives of Medical Research  2011;42(6):509-514.
Background and Aims
Genome-wide association studies (GWAS) have identified several loci that are associated with body mass index (BMI = kg/m²). However, little is known regarding whether the genetic basis of BMI differs among children of diverse racial/ethnic backgrounds, how the cumulative effect of these genes influences weight, or the contribution of these variants to body composition. This study examined the association between 17 GWAS-identified loci located in 16 genes and body-composition phenotypes in a multiethnic pediatric sample and evaluated the association of a composite genetic risk score with these phenotypes.
Anthropometric measures of BMI, waist circumference and waist-to-hip ratio were obtained in a sample of 298 children. Lean and fat mass were obtained from dual-energy X-ray absorptiometry (DXA). Genotypes of 17 single nucleotide polymorphisms (SNPs) were tested for association with the phenotypic measures, adjusted for standard covariates and estimates of genetic admixture.
Both SNPs rs8050136 and rs9939609 in FTO were associated with BMI and waist circumference in a direction opposite to that observed among adults, and an inverse association was detected between the risk variant in MC4R and total lean body mass. Lean body mass mediated the association between TMEM18 and BMI. The association between the genetic risk score and body composition differed according to ethnic/racial classification.
Our findings suggest that genetic associations with BMI among children are different from those in adults, that some loci may operate through lean body mass, and that genetic risk scores will not have universal applicability across ethnic/racial groups.
PMCID: PMC3541020  PMID: 22051089
GWAS; Obesity genes; Multiethnic children; Genetic risk score
11.  A Comprehensive Genetic Approach for Improving Prediction of Skin Cancer Risk in Humans 
Genetics  2012;192(4):1493-1502.
Prediction of genetic risk for disease is needed for preventive and personalized medicine. Genome-wide association studies have found unprecedented numbers of variants associated with complex human traits and diseases. However, these variants explain only a small proportion of genetic risk. Mounting evidence suggests that many traits relevant to public health are affected by large numbers of small-effect genes, and that prediction of genetic risk for those traits and diseases could be improved by incorporating large numbers of markers into whole-genome prediction (WGP) models. We developed a WGP model incorporating thousands of markers for prediction of skin cancer risk in humans. We also considered other ways of incorporating genetic information into prediction models, such as family history or ancestry (using principal components, PCs, of informative markers). Prediction accuracy was evaluated using the area under the receiver operating characteristic curve (AUC) estimated by cross-validation. Incorporation of genetic information (i.e., familial relationships, PCs, or WGP) yielded a significant increase in prediction accuracy: from an AUC of 0.53 for a baseline model that accounted for nongenetic covariates to AUCs of 0.58 (pedigree), 0.62 (PCs), and 0.64 (WGP). In summary, prediction of skin cancer risk could be improved by considering genetic information and using a large number of single-nucleotide polymorphisms (SNPs) in a WGP model, which allows for the detection of patterns of genetic risk that are above and beyond those that can be captured using family history. We discuss avenues for improving prediction accuracy and speculate on the possible use of WGP to prospectively identify individuals at high risk.
PMCID: PMC3512154  PMID: 23051645
skin cancer risk; whole-genome prediction; prediction of complex traits and diseases; pedigree predictions; genomic predictions; GenPred; shared data resources
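The AUC equals the probability that a randomly chosen case receives a higher predicted risk than a randomly chosen control (the Mann-Whitney interpretation). The minimal sketch below assumes the predicted scores come from held-out folds of a cross-validation and counts ties as one half. The scores and labels in the example are hypothetical.

```python
import numpy as np


def auc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney pairwise formulation."""
    s = np.asarray(scores, dtype=float)
    y = np.asarray(labels, dtype=bool)
    cases, controls = s[y], s[~y]
    # Compare every case with every control; ties contribute one half.
    greater = (cases[:, None] > controls[None, :]).sum()
    ties = (cases[:, None] == controls[None, :]).sum()
    return float((greater + 0.5 * ties) / (len(cases) * len(controls)))


print(auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))  # prints 0.75
```

The pairwise comparison uses memory proportional to cases × controls. For large cohorts a rank-based formula gives the same value more cheaply.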
12.  On the Limits of Diversity 
Frontiers in Genetics  2012;3:312.
PMCID: PMC3405290
13.  Prediction of Expected Years of Life Using Whole-Genome Markers 
PLoS ONE  2012;7(7):e40964.
Genetic factors are believed to account for 25% of the interindividual differences in Years of Life (YL) among humans. However, the genetic loci found so far to be associated with YL explain a very small proportion of the expected genetic variation in this trait, perhaps reflecting the complexity of the trait and the limitations of traditional association studies when applied to traits affected by a large number of small-effect genes. Using data from the Framingham Heart Study and statistical methods borrowed largely from the field of animal genetics (whole-genome prediction, WGP), we developed a WGP model for the study of YL and evaluated the extent to which thousands of genetic variants across the genome, examined simultaneously, can be used to predict interindividual differences in YL. We find that a sizable proportion of the differences in YL that were unexplained by age at entry, sex, smoking and body mass index (BMI) can be accounted for and predicted using WGP methods. The contribution of genomic information to prediction accuracy was even higher than that of smoking and BMI combined, two predictors considered among the most important life-shortening factors. We evaluated the impacts of familial relationships and population structure (as described by the first two marker-derived principal components) and concluded that, in our dataset, population structure explained part, but not all, of the gains in prediction accuracy obtained with WGP. Further inspection of prediction accuracies by age at death indicated that most of the gains in predictive ability achieved with WGP were due to increased accuracy in predicting early mortality, perhaps reflecting the ability of WGP to capture differences in genetic risk for deadly diseases such as cancer, which were most often responsible for early mortality in our sample.
PMCID: PMC3405107  PMID: 22848416
14.  A Hybrid Bayesian Network/Structural Equation (BN/SEM) Modeling Approach for Detecting Physiological Networks for Obesity-related Genetic Variants 
GWAS have been successful in finding genetic determinants of obesity. To translate discovered genetic variants into new therapies or prevention strategies, the underlying molecular or physiological mechanisms need to be discovered. One strategy for hypothesis generation is to mine data sets with detailed phenotypic data, such as those in dbGaP (database of Genotypes and Phenotypes). We propose a novel technique that combines the power and computational efficiency of existing Bayesian Network (BN) learning algorithms with the statistical rigor of Structural Equation Modeling (SEM) to produce an overall system that searches the space of potential networks and evaluates promising candidates using standard SEM model selection criteria. We apply the method to a candidate SNP data set from the AMERICO sample, a multi-ethnic cross-sectional cohort of roughly three hundred children with detailed obesity-related phenotypes, and illustrate the approach by identifying physiological networks through which three obesity-related SNPs may act.
PMCID: PMC3272699  PMID: 22318170
15.  Natural selection among Eurasians at genomic regions associated with HIV-1 control 
HIV susceptibility and pathogenicity exhibit both interindividual and intergroup variability. The etiology of intergroup variability is still poorly understood, and could be partly linked to genetic differences among racial/ethnic groups. These genetic differences may be traceable to different regimes of natural selection in the 60,000 years since the human radiation out of Africa. Here, we examine population differentiation and haplotype patterns at several loci identified through genome-wide association studies on HIV-1 control, as determined by viral-load setpoint, in European and African-American populations. We use genome-wide data from the Human Genome Diversity Project, consisting of 53 world-wide populations, to compare measures of FST and relative extended haplotype homozygosity (REHH) at these candidate loci to the rest of the respective chromosome.
We find that the Europe-Middle East and Europe-South Asia pairwise FST in the most strongly associated region are elevated compared to most pairwise comparisons with the sub-Saharan African group, which exhibit very low FST. We also find genetic signatures of recent positive selection (higher REHH) at these associated regions among all groups except for sub-Saharan Africans and Native Americans. This pattern is consistent with one in which genetic differentiation, possibly due to diversifying/positive selection, occurred at these loci among Eurasians.
These findings are concordant with those from earlier studies suggesting recent evolutionary change at immunity-related genomic regions among Europeans, and shed light on the potential genetic and evolutionary origin of population differences in HIV-1 control.
PMCID: PMC3141432  PMID: 21689440
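Comparing a candidate region's FST with the rest of the chromosome can be sketched as follows. We assume Hudson's estimator, combined across SNPs as a ratio of averages, and an empirical rank against background windows. The abstract does not specify which FST estimator or comparison statistic the authors used, so both choices are ours.

```python
import numpy as np


def hudson_fst(p1, p2, n1, n2):
    """Hudson F_ST for a set of SNPs, combined as a ratio of averages.

    p1, p2: allele frequencies in populations 1 and 2 (one entry per SNP).
    n1, n2: number of sampled chromosomes (alleles) in each population.
    """
    p1 = np.asarray(p1, dtype=float)
    p2 = np.asarray(p2, dtype=float)
    # Numerator: squared frequency difference corrected for finite sampling.
    num = (p1 - p2) ** 2 - p1 * (1 - p1) / (n1 - 1) - p2 * (1 - p2) / (n2 - 1)
    # Denominator: probability that one allele from each population differs.
    den = p1 * (1 - p2) + p2 * (1 - p1)
    return float(num.sum() / den.sum())


def empirical_rank(candidate, background):
    """Fraction of background windows with F_ST at least as high as the candidate."""
    return float(np.mean(np.asarray(background, dtype=float) >= candidate))


print(hudson_fst([1.0], [0.0], 100, 100))              # fixed difference
print(empirical_rank(0.5, [0.1, 0.2, 0.6, 0.7]))
```

A small empirical rank means that few background windows are as differentiated as the candidate region.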
16.  Natural selection at genomic regions associated with obesity and type-2 diabetes: East Asians and sub-Saharan Africans exhibit high levels of differentiation at type-2 diabetes regions 
Human genetics  2010;129(4):407-418.
Different populations suffer from different rates of obesity and type-2 diabetes (T2D). Little is known about the genetic or adaptive component, if any, that underlies these differences. Given the cultural, geographic, and dietary variation that accumulated among humans over the last 60,000 years, we examined whether loci identified by genome-wide association studies for these traits have been subject to recent selection pressures. Using genome-wide SNP data on 938 individuals in 53 populations from the Human Genome Diversity Panel, we compare population differentiation and haplotype patterns at these loci to the rest of the genome. Using an “expanding window” approach (100 to 1,600 kb) for the individual loci as well as the loci as ensembles, we find a high degree of differentiation for the ensemble of T2D loci. This differentiation is most pronounced for East Asians and sub-Saharan Africans, suggesting that these groups experienced natural selection at loci associated with T2D. Haplotype analysis suggests an excess of obesity loci with evidence of recent positive selection among South Asians and Europeans, compared to sub-Saharan Africans and Native Americans. We also identify individual loci that may have been subjected to natural selection, such as the T2D locus, HHEX, which displays both elevated differentiation and extended haplotype homozygosity in comparisons of East Asians with other groups. Our findings suggest that there is an evolutionary genetic basis for population differences in these traits, and we have identified potential group-specific genetic risk factors.
PMCID: PMC3113599  PMID: 21188420
obesity; type-2 diabetes; genetics; natural selection; population differentiation
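Extended haplotype homozygosity (EHH) underlies the haplotype-based selection signals discussed here and the REHH statistic used in the preceding study. EHH is the probability that two randomly chosen chromosomes carrying a core allele are identical over the whole stretch from the core to a given marker. The sketch below assumes phased 0/1 haplotypes and a single biallelic SNP as the core. Under that assumption, relative EHH is simply the EHH of one allele divided by that of the other. Published analyses typically use multi-SNP core haplotypes and genetic-map distances, and the toy haplotypes here are hypothetical.

```python
import numpy as np
from math import comb


def ehh(haplotypes, core, allele, end):
    """EHH from column `core` to column `end` among carriers of `allele` at the core."""
    H = np.asarray(haplotypes)
    carriers = H[H[:, core] == allele]
    n = len(carriers)
    if n < 2:
        return float("nan")                     # undefined with fewer than two carriers
    lo, hi = sorted((core, end))
    segment = carriers[:, lo:hi + 1]
    _, counts = np.unique(segment, axis=0, return_counts=True)
    # Identical pairs divided by all pairs among carriers.
    return sum(comb(int(c), 2) for c in counts) / comb(n, 2)


def relative_ehh(haplotypes, core, allele, end):
    """EHH of one core allele divided by EHH of the other (0/1 coding assumed)."""
    other = ehh(haplotypes, core, 1 - allele, end)
    return float("inf") if other == 0 else ehh(haplotypes, core, allele, end) / other


haps = np.array([
    [1, 0, 0],
    [1, 0, 0],
    [1, 0, 0],
    [0, 1, 1],
    [0, 1, 1],
    [0, 0, 0],
])
print(ehh(haps, 0, 1, 2), ehh(haps, 0, 0, 2), relative_ehh(haps, 0, 1, 2))
```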
17.  Beyond Missing Heritability: Prediction of Complex Traits 
PLoS Genetics  2011;7(4):e1002051.
Despite rapid advances in genomic technology, our ability to account for phenotypic variation using genetic information remains limited for many traits. This has unfortunately resulted in limited application of genetic data towards preventive and personalized medicine, one of the primary impetuses of genome-wide association studies. Recently, a large proportion of the “missing heritability” for human height was statistically explained by modeling thousands of single nucleotide polymorphisms concurrently. However, it is currently unclear how gains in explained genetic variance will translate to the prediction of yet-to-be observed phenotypes. Using data from the Framingham Heart Study, we explore the genomic prediction of human height in training and validation samples while varying the statistical approach used, the number of SNPs included in the model, the validation scheme, and the number of subjects used to train the model. In our training datasets, we are able to explain a large proportion of the variation in height (h² up to 0.83, R² up to 0.96). However, the proportion of variance accounted for in validation samples is much smaller (ranging from 0.15 to 0.36 depending on the degree of familial information used in the training dataset). While such R² values vastly exceed what has been previously reported using a reduced number of pre-selected markers (<0.10), given the heritability of the trait (∼0.80), substantial room for improvement remains.
Author Summary
While previous genome-wide association studies have implicated numerous loci associated with complex traits, such loci typically account for a very small proportion of phenotypic variation. However, a recent study using height as a model trait has illustrated that common single nucleotide polymorphisms can explain a large amount of genetic variance when evaluated through whole-genome statistical models. It remains unclear, though, to what extent higher proportions of explained variance will translate into improved predictive accuracy in future populations. Here we evaluate the predictive ability of whole-genome models for human height while varying the modeling approach, the size of the training population, the validation design, and the number of SNPs. Our results suggest that whole-genome prediction models can yield higher accuracy than is commonly attained by models based on a few selected SNPs; yet, given the heritability of the trait in question, there is still room for improving prediction accuracy. While further gains in predictive accuracy from more expansive genotyping are likely to be small, our results indicate that more substantial benefits are likely to come from larger training populations, as well as from the inclusion of related individuals.
PMCID: PMC3084207  PMID: 21552331
18.  Ancestry-informative markers on chromosomes 2, 8 and 15 are associated with insulin-related traits in a racially diverse sample of children 
Human Genomics  2011;5(2):79-89.
Type 2 diabetes represents an increasing health burden. Its prevalence is rising among younger age groups and differs among racial/ethnic groups. Little is known about its genetic basis, including whether there is a genetic basis for racial/ethnic disparities. We examined a multi-ethnic sample of 253 healthy children to evaluate associations between insulin-related phenotypes and 142 ancestry-informative markers (AIMs), while adjusting for sex, age, Tanner stage, genetic admixture, total body fat, height and socio-economic status. We also evaluated the effect of measurement errors in the estimation of the individual ancestry proportions on the regression results. We found that European genetic admixture is positively associated with insulin sensitivity (SI), and negatively associated with the acute insulin response to glucose, fasting insulin levels and the homeostasis model assessment of insulin resistance. Our analysis revealed associations between individual AIMs on chromosomes 2, 8 and 15 and these phenotypes. Most notably, marker rs3287 at chromosome 2p21 was found to be associated with SI (p = 5.8 × 10⁻⁵). This marker may be in admixture linkage disequilibrium with nearby loci (THADA and BCL11A) that have previously been reported to be associated with diabetes and diabetes-related phenotypes in several genome-wide association and linkage studies. Our results provide further evidence that variation in the 2p21 region containing THADA and BCL11A is associated with type 2 diabetes. Importantly, we have implicated this region in the early development of diabetes-related phenotypes, and in the genetic aetiology of population differences in these phenotypes.
PMCID: PMC3146800  PMID: 21296741
insulin sensitivity; genetic admixture; type 2 diabetes; genetic association; ancestry-informative marker
19.  Canaries in the coal mine: a cross-species analysis of the plurality of obesity epidemics 
A dramatic rise in obesity has occurred among humans within the last several decades. Little is known about whether similar increases in obesity have occurred in animals inhabiting human-influenced environments. We examined samples collectively consisting of over 20 000 animals from 24 populations (12 divided separately into males and females) of animals representing eight species living with or around humans in industrialized societies. In all populations, the estimated coefficient for the trend of body weight over time was positive (i.e. increasing). The probability of all trends being in the same direction by chance is 1.2 × 10⁻⁷. Surprisingly, we find that over the past several decades, average mid-life body weights have risen among primates and rodents living in research colonies, as well as among feral rodents and domestic dogs and cats. The consistency of these findings among animals living in varying environments suggests the intriguing possibility that the aetiology of increasing body weight may involve several as-yet unidentified and/or poorly understood factors (e.g. viral pathogens, epigenetic factors). This finding may eventually enhance the discovery and fuller elucidation of other factors that have contributed to the recent rise in obesity rates.
PMCID: PMC3081766  PMID: 21106594
obesity; animals; epigenetic
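The 1.2 × 10⁻⁷ figure in the obesity abstract above can be reproduced with a two-sided sign test, assuming the 24 population trends are independent and that, under the null hypothesis, each trend is equally likely to be positive or negative. The abstract does not name the test, so this is a minimal sketch of one calculation consistent with the reported value, not the authors' method; the function name `prob_all_same_direction` is hypothetical.

```python
def prob_all_same_direction(n_trends: int) -> float:
    """Probability that n independent trends all share the same sign by chance.

    Assumption: under the null, each trend is positive or negative with
    probability 1/2. "Same direction" counts both the all-positive and the
    all-negative outcome, hence the factor of 2 (a two-sided sign test).
    """
    return 2 * 0.5 ** n_trends


if __name__ == "__main__":
    p = prob_all_same_direction(24)  # 24 populations, as in the abstract
    print(f"{p:.2e}")  # prints 1.19e-07, i.e. about 1.2 × 10⁻⁷
```

If only the probability of all trends being positive were of interest, the one-sided value 0.5²⁴ ≈ 6.0 × 10⁻⁸ would apply instead.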
20.  T-Cell Correlates of Vaccine Efficacy after a Heterologous Simian Immunodeficiency Virus Challenge
Journal of Virology  2010;84(9):4352-4365.
Determining the “correlates of protection” is one of the challenges in human immunodeficiency virus vaccine design. To date, T-cell-based AIDS vaccines have been evaluated with validated techniques that measure the number of CD8+ T cells in the blood that secrete cytokines, mainly gamma interferon (IFN-γ), in response to synthetic peptides. Despite providing accurate and reproducible measurements of immunogenicity, these methods do not directly assess antiviral function and thus may not identify protective CD8+ T-cell responses. To better understand the correlates of vaccine efficacy, we analyzed the immune responses elicited by a successful T-cell-based vaccine against a heterologous simian immunodeficiency virus challenge. We searched for correlates of protection using a viral suppression assay (VSA) and an IFN-γ enzyme-linked immunospot assay. While the VSA measured in vitro suppression, it did not predict the outcome of the vaccine trial. However, we found several aspects of the vaccine-induced T-cell response that were associated with improved outcome after challenge. Of note, broad vaccine-induced prechallenge T-cell responses directed against Gag and Vif correlated with lower viral loads and higher CD4+ lymphocyte counts. These results may be relevant for the development of T-cell-based AIDS vaccines since they indicate that broad epitope-specific repertoires elicited by vaccination might serve as a correlate of vaccine efficacy. Furthermore, the present study demonstrates that certain viral proteins may be more effective than others as vaccine immunogens.
PMCID: PMC2863752  PMID: 20164222
21.  High Viremia Is Associated with High Levels of In Vivo Major Histocompatibility Complex Class I Downregulation in Rhesus Macaques Infected with Simian Immunodeficiency Virus SIVmac239
Journal of Virology  2010;84(10):5443-5447.
Human and simian immunodeficiency viruses (HIV and SIV) downregulate major histocompatibility complex class I (MHC-I) molecules from the surface of infected cells. Although this activity is conserved across viral isolates, its importance in AIDS pathogenesis is not clear. We therefore developed an assay to detect the level of MHC-I expression of SIV-infected cells directly ex vivo. Here we show that the extent of MHC-I downregulation is greatest in SIVmac239-infected macaques that never effectively control virus replication. Our results suggest that a high level of MHC-I downregulation is a hallmark of fast disease progression in SIV infection.
PMCID: PMC2863841  PMID: 20219903
22.  A Conserved Role for Syndecan Family Members in the Regulation of Whole-Body Energy Metabolism 
PLoS ONE  2010;5(6):e11286.
Syndecans are a family of type-I transmembrane proteins that are involved in cell-matrix adhesion, migration, neuronal development, and inflammation. Previous quantitative genetic studies pinpointed Drosophila Syndecan (dSdc) as a positional candidate gene affecting variation in fat storage between two Drosophila melanogaster strains. Here, we first used quantitative complementation tests with dSdc mutants to confirm that natural variation in this gene affects variability in Drosophila fat storage. Next, we examined the effects of a viable dSdc mutant on Drosophila whole-body energy metabolism and associated traits. We observed that young flies homozygous for the dSdc mutation had reduced fat storage and slept longer than homozygous wild-type flies. They also displayed significantly reduced metabolic rate, lower expression of spargel (the Drosophila homologue of PGC-1), and reduced mitochondrial respiration. Compared to control flies, dSdc mutants had lower expression of brain insulin-like peptides, were less fecund, were more sensitive to starvation, and had reduced life span. Finally, we tested for association between single nucleotide polymorphisms (SNPs) in the human SDC4 gene and variation in body composition, metabolism, glucose homeostasis, and sleep traits in a cohort of healthy early pubertal children. We found that SNP rs4599 was significantly associated with resting energy expenditure (P = 0.001 after Bonferroni correction) and nominally associated with fasting glucose levels (P = 0.01) and sleep duration (P = 0.044). On average, children homozygous for the minor allele had lower levels of glucose, higher resting energy expenditure, and shorter sleep duration than children homozygous for the common allele. We also observed that SNP rs1981429 was nominally associated with lean tissue mass (P = 0.035) and intra-abdominal fat (P = 0.049), and SNP rs2267871 with insulin sensitivity (P = 0.037).
Collectively, our results in Drosophila and humans argue that syndecan family members play a key role in the regulation of body metabolism.
PMCID: PMC2890571  PMID: 20585652
23.  Infection with “Escaped” Virus Variants Impairs Control of Simian Immunodeficiency Virus SIVmac239 Replication in Mamu-B*08-Positive Macaques
Journal of Virology  2009;83(22):11514-11527.
An understanding of the mechanism(s) by which some individuals spontaneously control human immunodeficiency virus (HIV)/simian immunodeficiency virus replication may aid vaccine design. Approximately 50% of Indian rhesus macaques that express the major histocompatibility complex (MHC) class I allele Mamu-B*08 become elite controllers after infection with simian immunodeficiency virus SIVmac239. Mamu-B*08 has a binding motif that is very similar to that of HLA-B27, a human MHC class I allele associated with the elite control of HIV, suggesting that SIVmac239-infected Mamu-B*08-positive (Mamu-B*08+) animals may be a good model for the elite control of HIV. The association with MHC class I alleles implicates CD8+ T cells and/or natural killer cells in the control of viral replication. We therefore introduced point mutations into eight Mamu-B*08-restricted CD8+ T-cell epitopes to investigate the contribution of epitope-specific CD8+ T-cell responses to the development of the control of viral replication. Ten Mamu-B*08+ macaques were infected with this mutant virus, 8X-SIVmac239. We compared immune responses and viral loads of these animals to those of wild-type SIVmac239-infected Mamu-B*08+ macaques. The five most immunodominant Mamu-B*08-restricted CD8+ T-cell responses were barely detectable in 8X-SIVmac239-infected animals. By 48 weeks postinfection, 2 of the 10 Mamu-B*08+ animals infected with 8X-SIVmac239 controlled viral replication to <20,000 viral RNA (vRNA) copy equivalents (eq)/ml plasma, while 10 of 15 wild-type-infected Mamu-B*08+ animals had viral loads of <20,000 vRNA copy eq/ml (P = 0.04). Our results suggest that these epitope-specific CD8+ T-cell responses may play a role in establishing the control of viral replication in Mamu-B*08+ macaques.
PMCID: PMC2772717  PMID: 19726517
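The P = 0.04 comparison in the SIVmac239 abstract above (2 of 10 controllers with 8X-SIVmac239 versus 10 of 15 with wild-type virus) is consistent with a two-sided Fisher's exact test on the 2 × 2 table of controllers versus non-controllers. The abstract does not state which test was used, so this is a minimal sketch under that assumption, written with only the standard library; the function name `fisher_exact_two_sided` is hypothetical and is not a library API.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed table.
    """
    row1 = a + b          # size of group 1
    col1 = a + c          # total number of "successes" (controllers)
    n = a + b + c + d     # total animals
    denom = comb(n, row1)

    def prob(k: int) -> float:
        # Probability that group 1 contains exactly k of the col1 successes.
        return comb(col1, k) * comb(n - col1, row1 - k) / denom

    p_obs = prob(a)
    lo = max(0, row1 - (n - col1))
    hi = min(row1, col1)
    # Small relative tolerance so tables tied with the observed one are kept.
    return sum(prob(k) for k in range(lo, hi + 1) if prob(k) <= p_obs * (1 + 1e-7))


if __name__ == "__main__":
    # Rows: 8X-SIVmac239 (2 controllers, 8 not), wild type (10 controllers, 5 not).
    p = fisher_exact_two_sided(2, 8, 10, 5)
    print(f"{p:.3f}")  # prints 0.041
```

Summing only the lower tail (k ≤ 2) would give a one-sided value of about 0.029; the two-sided value of about 0.041 is the one that rounds to the reported 0.04.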
24.  Estimating Genetic Ancestry Proportions from Faces 
PLoS ONE  2009;4(2):e4460.
Ethnicity can be a means by which people identify themselves and others. This type of identification mediates many kinds of social interactions and may reflect adaptations to a long history of group living in humans. Recent admixture in the US between groups from different continents, together with the historically strong emphasis on phenotypic differences between members of these groups, presents an opportunity to examine the degree of concordance between estimates of group membership based on genetic markers and visually based estimates from facial features. We first measured the degree of Native American (NA), European, African and East Asian genetic admixture in a sample of 14 self-identified Hispanic individuals, chosen to cover a broad range of Native American and European genetic admixture proportions. We showed frontal and side-view photographs of the 14 individuals to 241 subjects living in New Mexico, and asked them to estimate the degree of NA admixture for each individual. We assessed the overall concordance for each observer based on an aggregated measure of the difference between the observer and the genetic estimates. We find that observers reach a significantly higher degree of concordance than expected by chance, and that the degree of concordance, as well as the direction of the discrepancy in estimates, differs based on the ethnicity of the observer but not on the observers' age or sex. This study highlights the potentially high degree of discordance between physical appearance and genetic measures of ethnicity, as well as how perceptions of ethnic affiliation are context-specific. We compare our findings to those of previous studies and discuss their implications.
PMCID: PMC2635957  PMID: 19223962