1.  Evaluation of an ensemble of genetic models for prediction of a quantitative trait 
Frontiers in Genetics  2015;5:474.
Many genetic markers have been shown to be associated with common quantitative traits in genome-wide association studies. Typically these associated genetic markers have small to modest effect sizes and individually they explain only a small amount of the variability of the phenotype. In order to build a genetic prediction model without fitting a multiple linear regression model with possibly hundreds of genetic markers as predictors, researchers often summarize the joint effect of risk alleles into a genetic score that is used as a covariate in the genetic prediction model. However, the prediction accuracy can be highly variable and selecting the optimal number of markers to be included in the genetic score is challenging. In this manuscript we present a strategy to build an ensemble of genetic prediction models from data and we show that the ensemble-based method makes the challenge of choosing the number of genetic markers more amenable. Using simulated data with varying heritability and number of genetic markers, we compare the predictive accuracy and inclusion of true positive and false positive markers of a single genetic prediction model and our proposed ensemble method. The results show that the ensemble of genetic models tends to include a larger number of genetic variants than a single genetic model and it is more likely to include all of the true genetic markers. This increased sensitivity is obtained at the price of a lower specificity that appears to minimally affect the predictive accuracy of the ensemble.
PMCID: PMC4292739  PMID: 25628649
genetic risk prediction; genetic risk score; ensemble-based classifiers; bagging predictors; prediction accuracy
2.  The Hypoxic Response Contributes to Altered Gene Expression and Pre-Capillary Pulmonary Hypertension in Patients with Sickle Cell Disease 
Circulation  2014;129(16):1650-1658.
We postulated that the hypoxic response in sickle cell disease (SCD) contributes to altered gene expression and pulmonary hypertension, a complication associated with early mortality.
Methods and Results
To identify genes regulated by the hypoxic response and not by other effects of chronic anemia, we compared expression variation in peripheral blood mononuclear cells from 13 SCD subjects with hemoglobin SS genotype and 15 Chuvash polycythemia subjects (VHL R200W homozygotes with constitutive up-regulation of hypoxia inducible factors in the absence of anemia or hypoxia). At a 5% false discovery rate, 1040 genes exhibited a >1.15-fold change in both conditions; 297 were up-regulated and 743 down-regulated, including MAPK8, which encodes a mitogen-activated protein kinase important for apoptosis, T-cell differentiation and inflammatory responses. Association mapping with a focus on local regulatory polymorphisms in 61 SCD patients identified expression quantitative trait loci (eQTL) for 103 of these hypoxia response genes. In a University of Illinois SCD cohort the A allele of a MAPK8 eQTL, rs10857560, was associated with pre-capillary pulmonary hypertension, defined as mean pulmonary artery pressure ≥25 mm Hg and pulmonary capillary wedge pressure ≤15 mm Hg at right heart catheterization (allele frequency=0.66; OR=13.8, P=0.00036, n=238). This association was confirmed in an independent Walk-PHaSST cohort (allele frequency=0.65; OR=11.3, P=0.0025, n=519). The homozygous AA genotype of rs10857560 was associated with decreased MAPK8 expression and was present in all 14 identified pre-capillary pulmonary hypertension cases among the combined 757 patients.
Our study demonstrates a prominent hypoxic transcription component in SCD and a MAPK8 eQTL associated with pre-capillary pulmonary hypertension.
PMCID: PMC4287376  PMID: 24515990
sickle cell disease; MAPK8; hypoxic response; expression quantitative trait loci; association mapping; pre-capillary pulmonary hypertension
3.  Relationship Between Poor Physical Function, Inflammatory Markers, and Comorbidities in HIV-Infected Women on Antiretroviral Therapy 
Journal of Women's Health  2014;23(1):69-76.
Background: HIV-infected individuals may be at increased risk of poor physical function. Chronic inflammation has been associated with decreased physical function in the elderly and may also influence physical function in HIV-infected individuals.
Methods: This cross-sectional study assessed physical function in 65 HIV-infected women aged 40 and older on stable antiretroviral treatment using the Short Physical Performance Battery (SPPB): a standardized test of balance, walking speed, and lower-extremity strength developed for elderly populations. The relationship between low SPPB score, selected demographic and medical characteristics, and high inflammatory biomarker profile was analyzed using Fisher's exact test and Wilcoxon rank sum test.
Results: The median age of subjects was 49 years (interquartile range [IQR] 45–55), and the median CD4 T-cell count was 675 cells/mm3 (IQR 436–828). Thirteen subjects (20%) had a low SPPB score. Subjects with a low SPPB score were more likely to be cigarette smokers (p=0.03), had more medical comorbidities (p=0.01), and had higher levels of interleukin-6 (IL-6) (p<0.05). They also tended to be older (median age 55 vs. 48, p=0.06), more likely to have diabetes (p=0.07), and have higher levels of soluble tumor necrosis factor-1 (p=0.09).
Conclusions: Twenty percent of women aged 40 and older with well-treated HIV had poor physical-function performance, which was associated with the high burden of comorbidities in this population and with increased IL-6. However, it is unclear from this cross-sectional study whether increased inflammation was related to poor physical function or to other factors, such as age and medical comorbidities.
PMCID: PMC3880911  PMID: 24219874
4.  An efficient technique for Bayesian modeling of family data using the BUGS software 
Frontiers in Genetics  2014;5:390.
Linear mixed models have become a popular tool to analyze continuous data from family-based designs by using random effects that model the correlation of subjects from the same family. However, mixed models for family data are challenging to implement with the BUGS (Bayesian inference Using Gibbs Sampling) software because of the high-dimensional covariance matrix of the random effects. This paper describes an efficient parameterization that utilizes the singular value decomposition of the covariance matrix of random effects, includes the BUGS code for such implementation, and extends the parameterization to generalized linear mixed models. The implementation is evaluated using simulated data and an example from a large family-based study is presented with a comparison to other existing methods.
PMCID: PMC4235415  PMID: 25477899
BUGS; parameterization; family-based study; covariance matrix; linear mixed models
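The core of the SVD parameterization can be sketched in Python. This assumes the random-effects covariance is σ²K, with K a known, symmetric positive semidefinite relationship matrix (for example, twice the kinship matrix). For such a matrix the singular value decomposition coincides with the eigendecomposition K = U D Uᵀ. So b = σ U D^{1/2} z with independent z_j ~ N(0, 1) has covariance σ²K, and a sampler only needs independent univariate normals. The Jacobi eigensolver and all function names are illustrative stand-ins, not the BUGS code from the paper.

```python
import math
import random


def jacobi_eigen(A, tol=1e-24, max_sweeps=100):
    """Eigendecomposition of a symmetric matrix by cyclic Jacobi rotations.

    Returns (d, U) with A = U diag(d) U^T and orthonormal columns in U.
    """
    n = len(A)
    A = [row[:] for row in A]                      # work on a copy
    U = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        off = sum(A[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if A[p][q] == 0.0:
                    continue
                # Rotation angle chosen so that the (p, q) entry becomes zero.
                theta = (A[q][q] - A[p][p]) / (2.0 * A[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):                 # columns: A <- A J
                    akp, akq = A[k][p], A[k][q]
                    A[k][p], A[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):                 # rows: A <- J^T A
                    apk, aqk = A[p][k], A[q][k]
                    A[p][k], A[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):                 # accumulate eigenvectors: U <- U J
                    ukp, ukq = U[k][p], U[k][q]
                    U[k][p], U[k][q] = c * ukp - s * ukq, s * ukp + c * ukq
    return [A[i][i] for i in range(n)], U


def svd_loadings(K):
    """L = U diag(sqrt(d)), so that L L^T = K for a symmetric PSD matrix K."""
    d, U = jacobi_eigen(K)
    n = len(K)
    # max(., 0) guards against tiny negative eigenvalues from round-off
    return [[U[i][j] * math.sqrt(max(d[j], 0.0)) for j in range(n)] for i in range(n)]


def draw_random_effects(L, sigma, rng):
    """b = sigma * L z with independent z_j ~ N(0, 1), hence Cov(b) = sigma^2 K."""
    z = [rng.gauss(0.0, 1.0) for _ in range(len(L))]
    return [sigma * sum(l_ij * z_j for l_ij, z_j in zip(row, z)) for row in L]
```

In a BUGS model, the loadings L would typically be computed once outside the sampler and passed in as data. That leaves only univariate `z[j] ~ dnorm(0, 1)` nodes to update instead of one high-dimensional multivariate normal.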
5.  A prediction model for lung cancer diagnosis that integrates genomic and clinical features 
Lung cancer is the leading cause of cancer death, in part due to lack of early diagnostic tools. Bronchoscopy represents a relatively noninvasive initial diagnostic test in smokers with suspect disease, but has low sensitivity. We have reported a gene expression profile in cytologically normal large airway epithelium obtained via bronchoscopic brushings that is a sensitive and specific biomarker for lung cancer. Here, we evaluate the independence of the biomarker from other clinical risk factors and determine the performance of a clinicogenomic model that combines clinical factors and gene expression.
Training (n = 76) and test sets (n = 62) consisted of smokers undergoing bronchoscopy for suspicion of lung cancer at five medical centers. Logistic regression models describing the likelihood of having lung cancer using the biomarker, clinical factors, and these data combined were tested using the independent set of patients with non-diagnostic bronchoscopies. The model predictions were also compared with physicians’ clinical assessment.
The gene expression biomarker is associated with cancer status in the combined clinicogenomic model (p < 0.005). There is a significant difference in performance of the clinicogenomic relative to the clinical model (p < 0.05). In the test set, the clinicogenomic model increases sensitivity and NPV to 100%, and results in higher specificity (91%) and PPV (81%) compared to other models. The clinicogenomic model has high accuracy where physician assessment is most uncertain.
The airway gene expression biomarker provides information about the likelihood of lung cancer not captured by clinical factors, and the clinicogenomic model has the highest prediction accuracy. These findings suggest that use of the clinicogenomic model may expedite more invasive testing and definitive therapy for smokers with lung cancer and reduce invasive diagnostic procedures for individuals without lung cancer.
PMCID: PMC4167688  PMID: 19138936
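The sensitivity, specificity, PPV and NPV quoted above are simple ratios from a 2×2 table of predicted versus confirmed cancer status. The following is a minimal Python sketch. The class name and the counts are hypothetical and are not taken from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    tp: int  # cancer, predicted positive
    fn: int  # cancer, predicted negative
    tn: int  # no cancer, predicted negative
    fp: int  # no cancer, predicted positive

    @property
    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)   # share of cancers detected

    @property
    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)   # share of non-cancers correctly cleared

    @property
    def ppv(self) -> float:
        return self.tp / (self.tp + self.fp)   # P(cancer | positive prediction)

    @property
    def npv(self) -> float:
        return self.tn / (self.tn + self.fn)   # P(no cancer | negative prediction)


# Hypothetical counts for illustration only.
c = Confusion(tp=9, fn=1, tn=20, fp=5)
print(c.sensitivity, c.specificity)   # 0.9 0.8
```

When no cancers are missed (fn = 0), sensitivity and NPV both equal 1 (provided tn > 0). This is why the two measures reach 100% together.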
6.  Personality Factors in the Long Life Family Study 
To evaluate personality profiles of Long Life Family Study participants relative to population norms and offspring of centenarians from the New England Centenarian Study.
Personality domains of agreeableness, conscientiousness, extraversion, neuroticism, and openness were assessed with the NEO Five-Factor Inventory in 4,937 participants from the Long Life Family Study (mean age 70 years). A linear mixed model of age and gender was implemented adjusting for other covariates.
A significant age trend was found in all five personality domains. On average, the offspring generation of long-lived families scored low in neuroticism, high in extraversion, and within average values for the other three domains. Older participants tended to score higher in neuroticism and lower in the other domains compared with younger participants, but the estimated scores generally remained within average population values. No significant differences were found between long-lived family members and their spouses.
Personality factors and more specifically low neuroticism and high extraversion may be important for achieving extreme old age. In addition, personality scores of family members were not significantly different from those of their spouses, suggesting that environmental factors may play a significant role in addition to genetic factors.
PMCID: PMC3744045  PMID: 23275497
Centenarian; Longevity; NEO; Neuroticism; Personality.
7.  Genetic determinants of haemolysis in sickle cell anaemia 
British journal of haematology  2013;161(2):270-278.
Haemolytic anaemia is variable among patients with sickle cell anaemia and can be estimated by reticulocyte count, lactate dehydrogenase, aspartate aminotransferase and bilirubin levels. Using principal component analysis of these measurements we computed a haemolytic score that we used as a subphenotype in a genome-wide association study. We identified in one cohort and replicated in two additional cohorts the association of a single nucleotide polymorphism in NPRL3 (rs7203560; chr16p13·3) (P = 6·04 × 10⁻⁷). This association was validated by targeted genotyping in a fourth independent cohort. The HBA1/HBA2 regulatory elements (hypersensitive sites HS-33, HS-40 and HS-48) are located in introns of NPRL3. Rs7203560 was in perfect linkage disequilibrium (LD) with rs9926112 (r² = 1) and in strong LD with rs7197554 (r² = 0·75) and rs13336641 (r² = 0·77); the latter is located between the HS-33 and HS-40 sites and next to a CTCF binding site. The minor allele for rs7203560 was associated with the −α3·7 thalassaemia gene deletion. When adjusting for HbF and α thalassaemia, the association of NPRL3 with the haemolytic score was significant (P = 0·00375) and remained significant when examining only cases without gene-deletion α thalassaemia (P = 0·02463). Perhaps by independently down-regulating expression of the HBA1/HBA2 genes, variants of the HBA1/HBA2 gene regulatory loci, tagged by rs7203560, reduce haemolysis in sickle cell anaemia.
PMCID: PMC4129543  PMID: 23406172
haemolysis; sickle cell anaemia; haemolytic anaemia; genetic analysis; thalassaemia
8.  A Bayesian dynamic model for influenza surveillance 
Statistics in medicine  2006;25(11):1803-1825.
The severe acute respiratory syndrome (SARS) epidemic, the growing fear of an influenza pandemic and the recent shortage of flu vaccine highlight the need for surveillance systems able to provide early, quantitative predictions of epidemic events. We use dynamic Bayesian networks to discover the interplay among four data sources that are monitored for influenza surveillance. By integrating these different data sources into a dynamic model, we identify in children and infants presenting to the pediatric emergency department with respiratory syndromes an early indicator of impending influenza morbidity and mortality. Our findings show the importance of modelling the complex dynamics of data collected for influenza surveillance, and suggest that dynamic Bayesian networks could be suitable modelling tools for developing epidemic surveillance systems.
PMCID: PMC4128871  PMID: 16645996
dynamic Bayesian networks; influenza surveillance; syndromic data
Blood cells, molecules & diseases  2008;41(3):255-258.
Increased HbF levels or F-cell (HbF containing erythrocyte) numbers can ameliorate the disease severity of β-thalassemia major and sickle cell anemia. Recent genome-wide association studies reported that single nucleotide polymorphisms (SNPs) in the BCL11A gene on chromosome 2p16.1 were correlated with F-cells among healthy northern Europeans, and with HbF among Sardinians with β-thalassemias. In this study, we showed that SNPs in BCL11A were associated with F-cell numbers in Chinese with β-thalassemia trait, and with HbF levels in Thais with either β-thalassemia or HbE trait and in African Americans with sickle cell anemia. Taken together, the data suggest that the functional motifs responsible for modulating F-cells and HbF levels reside within a 3 kb region in the second intron of BCL11A.
PMCID: PMC4100606  PMID: 18691915
10.  Age Validation in the Long Life Family Study Through a Linkage to Early-Life Census Records 
Studies of health and longevity require accurate age reporting. Age misreporting among older adults in the United States is common.
Participants in the Long Life Family Study (LLFS) were matched to early-life census records. Age recorded in the census was used to evaluate age reporting in the LLFS. The study population was 99% non-Hispanic white.
About 88% of the participants were matched to 1910, 1920, or 1930 U.S. censuses. Match success depended on the participant’s education, place of birth, and the number of censuses available to be searched. Age at the time of the interview based on the reported date of birth and early-life census age were consistent for about 89% of the participants, and age consistency within 1 year was found for about 99% of the participants.
It is possible to match a high fraction of older study participants to their early-life census records when detailed information is available on participants’ family of origin. Such record linkage can provide an important source of information for evaluating age reporting among the oldest old participants. Our results are consistent with recent studies suggesting that age reporting among older whites in the United States appears to be quite good.
PMCID: PMC3674734  PMID: 23704206
Age validation; Census; Centenarian; Longevity; Oldest old participants.
11.  Fetal Hemoglobin in Sickle Cell Anemia: Genetic Studies of the Arab-Indian Haplotype 
Sickle cell anemia is common in the Middle East and India, where the HbS gene is sometimes associated with the Arab-Indian (AI) β-globin gene (HBB) cluster haplotype. In this haplotype of sickle cell anemia, fetal hemoglobin (HbF) levels are 3- to 4-fold higher than those found in patients with HbS haplotypes of African origin. Little is known about the genetic elements that modulate HbF in AI haplotype patients. We therefore studied Saudi HbS homozygotes with the AI haplotype (mean HbF 19.2±7.0%, range 3.6 to 39.6%) and genotyped known cis- and trans-acting elements associated with HbF expression. All cases, regardless of HbF concentration, were homozygous for AI haplotype-specific elements cis to HBB. SNPs in BCL11A and HBS1L-MYB that were associated with HbF in other populations explained only 8.8% of the variation of HbF. KLF1 polymorphisms associated previously with high HbF were not present in the 44 patients tested. The SNPs and genetic loci we have chosen for this study do not explain the high HbF in sickle cell patients with the AI haplotype or its variation among patients with this haplotype. The dispersion of HbF levels among AI haplotype patients suggests that other genetic elements modulate the effects of the known cis- and trans-acting regulators. These regulatory elements, which remain to be discovered, might be specific to the Saudi and some other populations where HbF levels are especially high.
PMCID: PMC3647015  PMID: 23465615
12.  A Dynamic Bronchial Airway Gene Expression Signature of Chronic Obstructive Pulmonary Disease and Lung Function Impairment 
Rationale: Molecular phenotyping of chronic obstructive pulmonary disease (COPD) has been impeded in part by the difficulty in obtaining lung tissue samples from individuals with impaired lung function.
Objectives: We sought to determine whether COPD-associated processes are reflected in gene expression profiles of bronchial airway epithelial cells obtained by bronchoscopy.
Methods: Gene expression profiling of bronchial brushings obtained from 238 current and former smokers with and without COPD was performed using Affymetrix Human Gene 1.0 ST Arrays.
Measurements and Main Results: We identified 98 genes whose expression levels were associated with COPD status, FEV1% predicted, and FEV1/FVC. In silico analysis identified activating transcription factor 4 (ATF4) as a potential transcriptional regulator of genes with COPD-associated airway expression, and ATF4 overexpression in airway epithelial cells in vitro recapitulates COPD-associated gene expression changes. Genes with COPD-associated expression in the bronchial airway epithelium had similarly altered expression profiles in prior studies performed on small-airway epithelium and lung parenchyma, suggesting that transcriptomic alterations in the bronchial airway epithelium reflect molecular events found at more distal sites of disease activity. Many of the airway COPD-associated gene expression changes revert toward baseline after therapy with the inhaled corticosteroid fluticasone in independent cohorts.
Conclusions: Our findings demonstrate a molecular field of injury throughout the bronchial airway of active and former smokers with COPD that may be driven in part by ATF4 and is modifiable with therapy. Bronchial airway epithelium may ultimately serve as a relatively accessible tissue in which to measure biomarkers of disease activity for guiding clinical management of COPD.
PMCID: PMC3707363  PMID: 23471465
chronic obstructive pulmonary disease; gene expression profiling; biologic markers
13.  PleioGRiP: genetic risk prediction with pleiotropy 
Bioinformatics  2013;29(8):1086-1088.
Motivation: Although several studies have used Bayesian classifiers for risk prediction using genome-wide single nucleotide polymorphism (SNP) datasets, no software can efficiently perform these analyses on massive genetic datasets and can accommodate multiple traits.
Results: We describe PleioGRiP, a program that performs a genome-wide Bayesian model search to identify SNPs associated with a discrete phenotype and uses SNPs ranked by Bayes factor to produce nested Bayesian classifiers. These classifiers can be used for genetic risk prediction, either by selecting the classifier with the optimal number of features or by using an ensemble of classifiers. In addition, PleioGRiP implements an extension of the Bayesian search and classification that can search for pleiotropic relationships in which SNPs are simultaneously associated with two or more distinct phenotypes. These relationships can be used to generate connected Bayesian classifiers to predict the phenotype of interest either using genetic data alone or in combination with the secondary phenotype(s).
Availability: PleioGRiP is implemented in Java, and it is available from
Supplementary information: Supplementary data are available at Bioinformatics online.
PMCID: PMC3624803  PMID: 23419378
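Ranking SNPs by Bayes factor and building nested feature sets can be sketched for a single binary phenotype. The sketch compares two models: one where the genotype distribution differs between cases and controls, and one where a single distribution is shared. Each marginal likelihood is Dirichlet-multinomial with a symmetric prior. The prior (total precision `alpha` split evenly over the three genotypes) is an assumption, since the abstract does not describe PleioGRiP's priors. The pleiotropic extension is not shown. This is a Python sketch, not the program's Java code, and all names are hypothetical.

```python
import math


def log_dirichlet_multinomial(counts, alpha=1.0):
    """log P(ordered sample with these category counts) under a symmetric Dirichlet(alpha/K) prior."""
    k, n = len(counts), sum(counts)
    a = alpha / k
    return (math.lgamma(alpha) - math.lgamma(alpha + n)
            + sum(math.lgamma(a + c) - math.lgamma(a) for c in counts))


def log_bayes_factor(case_counts, control_counts, alpha=1.0):
    """log BF of 'genotype distribution depends on phenotype' versus 'independent of phenotype'."""
    pooled = [a + b for a, b in zip(case_counts, control_counts)]
    dependent = (log_dirichlet_multinomial(case_counts, alpha)
                 + log_dirichlet_multinomial(control_counts, alpha))
    independent = log_dirichlet_multinomial(pooled, alpha)
    return dependent - independent


def rank_snps(table):
    """table: dict snp -> (case genotype counts, control genotype counts); strongest first."""
    return sorted(table, key=lambda snp: log_bayes_factor(*table[snp]), reverse=True)


def nested_feature_sets(ranked, max_size):
    # Nested sets {top 1} ⊂ {top 2} ⊂ ...; one classifier would be built per set.
    return [ranked[:k] for k in range(1, max_size + 1)]
```

A positive log Bayes factor favours association. Selecting among the nested sets (or averaging classifiers built on them) corresponds to the two prediction options mentioned above.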
14.  Families Enriched for Exceptional Longevity also have Increased Health-Span: Findings from the Long Life Family Study 
Hypothesizing that members of families enriched for longevity delay morbidity compared to population controls and approximate the health-span of centenarians, we compared the health-spans of older generation subjects of the Long Life Family Study (LLFS) to controls without family history of longevity and to centenarians of the New England Centenarian Study (NECS) using Bayesian parametric survival analysis. We estimated hazard ratios, the ages at which specific percentiles of subjects had onsets of diseases, and the years of disease-free survival gained in the different cohorts compared to referent controls. Compared to controls, LLFS subjects had lower hazards for cancer, cardiovascular disease, severe dementia, diabetes, hypertension, osteoporosis, and stroke. The age at which 20% of the LLFS siblings and probands had one or more age-related diseases was approximately 10 years later than in NECS controls. While female NECS controls generally delayed the onset of age-related diseases compared with male controls, these gender differences were much smaller in the older generation of the LLFS and disappeared among the centenarians of the NECS. The analyses demonstrate extended health-span in the older subjects of the LLFS and suggest that this aging cohort provides an important resource to discover genetic and environmental factors that promote prolonged health-span in addition to longer life-span.
PMCID: PMC3859985  PMID: 24350207
health-span; longevity; onset of disease; survival analysis; Weibull regression
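Under a Weibull proportional-hazards model, disease-free survival is S(t) = exp(−HR·(t/λ)^k). The age by which a fraction p of subjects has had an onset solves 1 − S(t) = p, giving t_p = λ(−ln(1 − p)/HR)^{1/k}. The years gained at that percentile are t_p(HR) − t_p(1). The study fitted these models in a Bayesian framework. The following Python sketch only evaluates the formulas for given parameter values, and the numbers in the usage line are hypothetical.

```python
import math


def weibull_survival(t, scale, shape, hr=1.0):
    """Disease-free survival S(t) = exp(-hr * (t / scale) ** shape) under proportional hazards."""
    return math.exp(-hr * (t / scale) ** shape)


def onset_percentile(p, scale, shape, hr=1.0):
    """Age by which a fraction p of subjects has had onset, i.e. the solution of 1 - S(t) = p."""
    return scale * (-math.log(1.0 - p) / hr) ** (1.0 / shape)


def years_gained(p, scale, shape, hr):
    """Delay of the p-th onset percentile for a group with hazard ratio hr versus the referent (hr = 1)."""
    return onset_percentile(p, scale, shape, hr) - onset_percentile(p, scale, shape)


# Hypothetical parameters: referent scale 80 years, shape 6, group hazard ratio 0.5.
print(round(years_gained(0.2, 80.0, 6.0, 0.5), 1))  # about 7.6 years
```

A hazard ratio below 1 shifts every onset percentile to a later age. This is how lower hazards translate into the delay in onset ages reported above.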
15.  Meta-analysis of genetic variants associated with human exceptional longevity 
Aging (Albany NY)  2013;5(9):653-661.
Despite evidence from family studies that there is a strong genetic influence upon exceptional longevity, relatively few genetic variants have been associated with this trait. One reason could be that many genes individually have such weak effects that they cannot meet standard thresholds of genome-wide significance, but as a group, in specific combinations of genetic variants, they can have a strong influence. Previously we reported that such genetic signatures of 281 genetic markers associated with about 130 genes can differentiate centenarians from non-centenarians relatively well, particularly if the centenarians are 106 years and older. This would support our hypothesis that the genetic influence upon exceptional longevity increases with older and older (and rarer) ages. We investigated this list of markers using similar genetic data from 5 studies of centenarians from the USA, Europe and Japan. The results from the meta-analysis show that many of these variants are associated with survival to these extreme ages in other studies. Since many centenarians compress morbidity and disability towards the end of their lives, these results could point to biological pathways and therefore new therapeutics to increase years of healthy life in the general population.
PMCID: PMC3808698  PMID: 24244950
centenarian; exceptional longevity; genetic association study; aging; gene; lifespan; meta-analysis
16.  Human longevity and common variations in the LMNA gene: a meta-analysis 
Aging Cell  2012;11(3):475-481.
A mutation in the LMNA gene is responsible for the most dramatic form of premature aging, Hutchinson-Gilford progeria syndrome (HGPS). Several recent studies have suggested that protein products of this gene might have a role in normal physiological cellular senescence. To explore further LMNA's possible role in normal aging, we genotyped 16 SNPs over a span of 75.4 kb of the LMNA gene in a sample of long-lived individuals (US Caucasians with age ≥95 years, N=873) and genetically matched younger controls (N=443). We tested all common non-redundant haplotypes (frequency ≥ 0.05) based on subgroups of these 16 SNPs for association with longevity. The most significant haplotype, based on 4 SNPs, remained significant after adjustment for multiple testing (OR = 1.56, P=2.5×10⁻⁵, multiple-testing-adjusted P=0.0045). To attempt to replicate these results, we genotyped 3448 subjects from four independent samples of long-lived individuals and control subjects from 1) the New England Centenarian Study (NECS) (N=738), 2) the Southern Italian Centenarian Study (SICS) (N=905), 3) France (N=1103), and 4) the Einstein Ashkenazi Longevity Study (N=702). We replicated the association with the most significant haplotype from our initial analysis in the NECS sample (OR = 1.60, P=0.0023), but not in the other three samples (P>0.15). In a meta-analysis combining all five samples, the best haplotype remained significantly associated with longevity after adjustment for multiple testing in the initial and follow-up samples (OR = 1.18, P=7.5×10⁻⁴, multiple-testing-adjusted P=0.037). These results suggest that LMNA variants may play a role in human lifespan.
PMCID: PMC3350595  PMID: 22340368
longevity gene; human; longevity; genetics
17.  Fetal Hemoglobin in Sickle Cell Anemia: Molecular Characterization of the Unusually High Fetal Hemoglobin Phenotype in African Americans 
American journal of hematology  2011;87(2):217-219.
Fetal hemoglobin (HbF) is a major modifier of disease severity in sickle cell anemia (SCA). Three major HbF quantitative trait loci (QTL) are known: the Xmn I site upstream of the Gγ-globin gene (HBG2) on chromosome 11p15, BCL11A on chromosome 2p16, and the HBS1L-MYB intergenic polymorphism (HMIP) on chromosome 6q23. However, the roles of these QTLs in SCA patients with uncharacteristically high HbF are not known. We studied 20 African American SCA patients with markedly elevated HbF (mean 17.2%). They had significantly higher minor allele frequencies (MAF) in two HbF QTLs, BCL11A and HMIP, compared with those with low HbF. A 3-bp (TAC) deletion in complete linkage disequilibrium (LD) with the minor allele of rs9399137 in HMIP was also present significantly more often in these patients. To further explore other genetic loci that might be responsible for this high HbF, we sequenced a 14.1 kb DNA fragment between the Aγ-globin (HBG1) and δ-globin (HBD) genes. Thirty-eight SNPs were found. Four SNPs had significantly higher major allele frequencies in the unusually high HbF group. In silico analyses of these 4 polymorphisms predicted alterations in transcription factor binding sites for 3 of them.
PMCID: PMC3302931  PMID: 22139998
Sickle cell anemia; Fetal hemoglobin; HbF quantitative trait loci
18.  Health Span Approximates Life Span Among Many Supercentenarians: Compression of Morbidity at the Approximate Limit of Life Span 
We analyze the relationship between age of survival, morbidity, and disability among centenarians (age 100–104 years), semisupercentenarians (age 105–109 years), and supercentenarians (age 110–119 years). One hundred and four supercentenarians, 430 semisupercentenarians, 884 centenarians, 343 nonagenarians, and 436 controls were prospectively followed for an average of 3 years (range 0–13 years). The older the age group, generally, the later the onset of diseases, such as cancer, cardiovascular disease, dementia, and stroke, as well as of cognitive and functional decline. The hazard ratios for these individual diseases became progressively less with older and older age, and the relative period of time spent with disease was lower with increasing age group. We observed a progressive delay in the age of onset of physical and cognitive function impairment, age-related diseases, and overall morbidity with increasing age. As the limit of human life span was effectively approached with supercentenarians, compression of morbidity was generally observed.
PMCID: PMC3309876  PMID: 22219514
Centenarian; Supercentenarian; Compression of morbidity; Oldest old; Health span
19.  Educating Translational Researchers in Research Informatics Principles and Methods: An Evaluation of a Model Online Course and Plans for its Dissemination 
Translational research generates and/or uses very large amounts of diverse data. Informatics principles and methods address datasets that are large and complex, whereas few translational researchers know these principles and methods and many cannot design, carry out, or analyze the results of these studies optimally. With few exceptions, informatics education has not been directed to researchers, especially established researchers. To fill this gap, we carried out a formal needs assessment of research informatics education of translational researchers, focusing on established researchers. Using the results, we developed a model curriculum for educating researchers in research informatics and a first generation model online course in research informatics for researchers. We are completing a formal evaluation of this online course with a diverse group of translational researchers. From the results of this evaluation, we will create a second version of the online course, a dissemination plan to make it available to researchers nationally, and a plan to enhance the course over time. We will discuss the implications for the future of translational research and research informatics.
PMCID: PMC3814464  PMID: 24303298
20.  The Genetics of Extreme Longevity: Lessons from the New England Centenarian Study 
Frontiers in Genetics  2012;3:277.
The New England Centenarian Study (NECS) was founded in 1994 as a longitudinal study of centenarians to determine if centenarians could be a model of healthy human aging. Over time, the NECS along with other centenarian studies have demonstrated that the majority of centenarians markedly delay high mortality risk-associated diseases toward the ends of their lives, but many centenarians have a history of enduring more chronic age-related diseases for many years, women more so than men. However, the majority of centenarians seem to deal with these chronic diseases more effectively, not experiencing disability until well into their nineties. Unlike most centenarians who are less than 101 years old, people who live to the most extreme ages, e.g., 107+ years, are generally living proof of the compression of morbidity hypothesis. That is, they compress morbidity and disability to the very ends of their lives. Various studies have also demonstrated a strong familial component to extreme longevity and now evidence particularly from the NECS is revealing an increasingly important genetic component to survival to older and older ages beyond 100 years. It appears to us that this genetic component consists of many genetic modifiers each with modest effects, but as a group they can have a strong influence.
PMCID: PMC3510428  PMID: 23226160
centenarians; genetics of longevity; heritability of longevity; compression of morbidity; genetic variation
22.  Bayesian Methods for Multivariate Modeling of Pleiotropic SNP Associations and Genetic Risk Prediction 
Frontiers in Genetics  2012;3:176.
Genome-wide association studies (GWAS) have identified numerous associations between genetic loci and individual phenotypes; however, relatively few GWAS have attempted to detect pleiotropic associations, in which loci are simultaneously associated with multiple distinct phenotypes. We show that pleiotropic associations can be directly modeled via the construction of simple Bayesian networks, and that these models can be applied to produce single Bayesian classifiers or ensembles of classifiers that leverage pleiotropy to improve genetic risk prediction. The proposed method includes two phases: (1) Bayesian model comparison, to identify Single-Nucleotide Polymorphisms (SNPs) associated with one or more traits; and (2) cross-validation feature selection, in which a final set of SNPs is selected to optimize prediction. To demonstrate the capabilities and limitations of the method, a total of 1600 case-control GWAS datasets with two dichotomous phenotypes were simulated under 16 scenarios, varying the association strengths of causal SNPs, the size of the discovery sets, the balance between cases and controls, and the number of pleiotropic causal SNPs. Across the 16 scenarios, prediction accuracy ranged from 50% to 90%. In the 14 scenarios that included pleiotropically associated SNPs, the pleiotropic model search and prediction methods consistently outperformed the naive model search and prediction. In the two scenarios in which there were no true pleiotropic SNPs, the differences between the pleiotropic and naive model searches were minimal. To further evaluate the method on real data, a discovery set of 1071 sickle cell disease (SCD) patients was used to search for pleiotropic associations between cerebral vascular accidents and fetal hemoglobin level. Classification was performed on a smaller validation set of 352 SCD patients, and showed that the inclusion of pleiotropic SNPs may slightly improve prediction, although the difference was not statistically significant.
The proposed method is robust, computationally efficient, and provides a powerful new approach for detecting and modeling pleiotropic disease loci.
PMCID: PMC3438684  PMID: 22973300
pleiotropy; SNP; GWAS; prediction; Bayesian
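The first phase of the method above, Bayesian model comparison, can be illustrated with a small sketch. It assumes one SNP coded 0/1/2, two binary traits that are independent given the SNP, and a symmetric Dirichlet prior on each conditional distribution (a K2/BDeu-style marginal likelihood). The four candidate structures and all function names are hypothetical and are not taken from the paper, whose network definitions and priors may differ; the cross-validation feature-selection phase is not shown.

```python
from math import exp, lgamma


def log_marginal(child, parent=None, child_levels=(0, 1),
                 parent_levels=(0, 1, 2), alpha=1.0):
    """Log marginal likelihood of a discrete node, with or without one parent,
    under a symmetric Dirichlet(alpha) prior on each conditional distribution."""
    groups = parent_levels if parent is not None else (None,)
    keys = parent if parent is not None else [None] * len(child)
    r = len(child_levels)
    score = 0.0
    for cfg in groups:
        # Count child values within this parent configuration.
        counts = [0] * r
        for key, value in zip(keys, child):
            if key == cfg:
                counts[child_levels.index(value)] += 1
        n = sum(counts)
        score += lgamma(r * alpha) - lgamma(r * alpha + n)
        score += sum(lgamma(alpha + m) - lgamma(alpha) for m in counts)
    return score


def compare_pleiotropy_models(snp, trait_a, trait_b, alpha=1.0):
    """Score four hypothetical structures for one SNP and two binary traits.
    The SNP node's own term is identical in every model, so it is omitted."""
    a_dep = log_marginal(trait_a, snp, alpha=alpha)
    a_ind = log_marginal(trait_a, None, alpha=alpha)
    b_dep = log_marginal(trait_b, snp, alpha=alpha)
    b_ind = log_marginal(trait_b, None, alpha=alpha)
    scores = {
        "neither": a_ind + b_ind,
        "A only": a_dep + b_ind,
        "B only": a_ind + b_dep,
        "pleiotropic": a_dep + b_dep,
    }
    # Posterior model probabilities under equal prior model weights.
    top = max(scores.values())
    weights = {m: exp(s - top) for m, s in scores.items()}
    total = sum(weights.values())
    posterior = {m: w / total for m, w in weights.items()}
    return max(scores, key=scores.get), posterior
```

In this framing, SNPs whose posterior mass falls mostly on "pleiotropic" would be the candidates passed on to the feature-selection step.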
23.  Ancestry of African Americans with Sickle Cell Disease 
The inheritance of genetic disease depends on ancestry, which must be considered when interpreting genetic association studies and which can provide insights when comparing traits across populations. We compared the genetic profiles of African Americans with sickle cell disease to those of Black Africans and of Caucasian populations of European descent, and found that they are less genetically admixed than other African Americans and have an ancestry similar to that of Yorubans, Mandenkas and Bantu.
PMCID: PMC3116635  PMID: 21546286
sickle cell disease; genetic ancestry; admixture; genetic association
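One simple way to express the degree of admixture summarized above is a two-population least-squares fit on allele frequencies. This is an illustrative sketch only: the abstract does not state which estimator was used (clustering programs such as STRUCTURE or ADMIXTURE are common choices), and the function name and any example frequencies are hypothetical.

```python
def admixture_proportion(p_admixed, p_pop1, p_pop2):
    """Hypothetical helper: least-squares estimate of alpha in
    p_admixed ~ alpha * p_pop1 + (1 - alpha) * p_pop2, across SNPs."""
    num = 0.0
    den = 0.0
    for pa, p1, p2 in zip(p_admixed, p_pop1, p_pop2):
        d = p1 - p2
        # Minimizing sum((pa - p2 - alpha * d)^2) gives alpha = sum((pa - p2) * d) / sum(d^2).
        num += (pa - p2) * d
        den += d * d
    if den == 0:
        raise ValueError("reference populations have identical allele frequencies")
    # Clamp to the valid range of a mixing proportion.
    return min(1.0, max(0.0, num / den))
```

With a West African reference as pop1 and a European reference as pop2, a value closer to 1 would indicate less European admixture.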
24.  Premature expression of a muscle fibrosis axis in chronic HIV infection 
Skeletal Muscle  2012;2:10.
Despite the success of highly active antiretroviral therapy (HAART), HIV-infected individuals remain at increased risk for frailty and for declines in physical function that are more often observed in older uninfected individuals. This may reflect premature or accelerated muscle aging.
Skeletal muscle gene expression profiles were evaluated in three uninfected independent microarray datasets including young (19 to 29 years old), middle-aged (40 to 45 years old) and older (65 to 85 years old) subjects, and in a muscle dataset from HIV-infected subjects (36 to 51 years old). Using Bayesian analysis, a ten-gene muscle aging signature was identified that distinguished young from old uninfected muscle and included the senescence and cell cycle arrest gene p21/Cip1 (CDKN1A). This ten-gene signature was then evaluated in muscle specimens from a cohort of middle-aged (30 to 55 years old) HIV-infected individuals. Expression of p21/Cip1 and of related pathways was validated and further analyzed in a rodent model for HIV infection.
We identified and replicated a set of muscle aging genes that were prematurely expressed in HIV-infected, but not uninfected, middle-aged subjects. We validated selected genes in a rodent model of chronic HIV infection. Because the signature included p21/Cip1, a cell cycle arrest gene previously associated with muscle aging and fibrosis, we explored pathways related to senescence and fibrosis. In addition to p21/Cip1, we observed HIV-associated upregulation of the senescence factor p16INK4a (CDKN2A) and of the fibrosis-associated genes TGFβ1, CTGF, COL1A1 and COL1A2. Fibrosis in muscle tissue was quantified based on collagen deposition and confirmed to be elevated in association with infection status. Fiber type composition was also measured and showed a significant increase in slow-twitch fibers associated with infection.
The expression of genes associated with a muscle aging signature is prematurely upregulated in HIV infection, with a prominent role for fibrotic pathways. Based on these data, therapeutic interventions that promote muscle function and attenuate pro-fibrotic gene expression should be considered in future studies.
PMCID: PMC3407733  PMID: 22676806
Skeletal muscle; Aging; Gene expression; HIV infection; Senescence
25.  A Genome-Wide Association Study of Total Bilirubin and Cholelithiasis Risk in Sickle Cell Anemia 
PLoS ONE  2012;7(4):e34741.
Serum bilirubin levels have been associated with polymorphisms in the UGT1A1 promoter in normal populations and in patients with hemolytic anemias, including sickle cell anemia. When hemolysis occurs, circulating heme increases, leading to elevated bilirubin levels and an increased incidence of cholelithiasis. We performed the first genome-wide association study (GWAS) of bilirubin levels and cholelithiasis risk in a discovery cohort of 1,117 sickle cell anemia patients. We found 15 single nucleotide polymorphisms (SNPs) associated with total bilirubin levels at the genome-wide significance level (p < 5×10^−8). SNPs in UGT1A1, UGT1A3, UGT1A6, UGT1A8 and UGT1A10, different isoforms within the UGT1A locus, were identified (most significant rs887829, p = 9.08×10^−25). All of these associations were validated in four independent sets of sickle cell anemia patients. We tested the association of the 15 SNPs with cholelithiasis in the discovery cohort and found a significant association (most significant p = 1.15×10^−4). These results confirm that the UGT1A region is the major regulator of bilirubin metabolism in African Americans with sickle cell anemia, similar to what is observed in other ethnicities.
PMCID: PMC3338756  PMID: 22558097
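The genome-wide significance criterion quoted above can be made concrete with a single-SNP additive test. This sketch assumes minor-allele dosage coded 0/1/2, ordinary least squares without covariates, and a normal approximation to the t distribution, which is reasonable at cohort sizes near 1,000. The abstract does not specify the study's model (covariates, or any transformation of bilirubin), so the function and constant names here are hypothetical.

```python
import math

# Conventional genome-wide significance threshold cited in the abstract.
GENOME_WIDE_ALPHA = 5e-8


def additive_snp_test(dosage, trait):
    """Hypothetical helper: regress a quantitative trait on allele dosage (0/1/2).
    Returns (beta, standard error, two-sided p) using a normal approximation."""
    n = len(dosage)
    if n < 3:
        raise ValueError("need at least 3 observations")
    mx = sum(dosage) / n
    my = sum(trait) / n
    sxx = sum((x - mx) ** 2 for x in dosage)
    if sxx == 0:
        raise ValueError("monomorphic SNP: dosage has no variance")
    sxy = sum((x - mx) * (y - my) for x, y in zip(dosage, trait))
    beta = sxy / sxx
    intercept = my - beta * mx
    # Residual variance with n - 2 degrees of freedom (intercept and slope).
    rss = sum((y - intercept - beta * x) ** 2 for x, y in zip(dosage, trait))
    se = math.sqrt(rss / (n - 2) / sxx)
    if se == 0:
        return beta, se, 0.0
    z = beta / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2))
    return beta, se, p
```

In a GWAS this test is repeated for every SNP, and only SNPs whose p falls below GENOME_WIDE_ALPHA are reported as genome-wide significant.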