1.  Original Research: A case-control genome-wide association study identifies genetic modifiers of fetal hemoglobin in sickle cell disease 
Experimental Biology and Medicine  2016;241(7):706-718.
Sickle cell disease (SCD) is a group of inherited blood disorders that have in common a mutation in the sixth codon of the β-globin (HBB) gene on chromosome 11. However, people with the same genetic mutation display a wide range of clinical phenotypes. Fetal hemoglobin (HbF) expression is an important genetic modifier of SCD complications leading to milder symptoms and improved long-term survival. Therefore, we performed a genome-wide association study (GWAS) using a case-control experimental design in 244 African Americans with SCD to discover genetic factors associated with HbF expression. The case group consisted of subjects with HbF≥8.6% (133 samples) and the control group of subjects with HbF≤3.1% (111 samples). Our GWAS results replicated SNPs previously identified in an erythroid-specific enhancer region located in the second intron of the BCL11A gene associated with HbF expression. In addition, we identified SNPs in the SPARC, GJC1, EFTUD2 and JAZF1 genes as novel candidates associated with HbF levels. To gain insights into mechanisms of globin gene regulation in the HBB locus, linkage disequilibrium (LD) and haplotype analyses were conducted. We observed strong LD in the low HbF group in contrast to a loss of LD and greater number of haplotypes in the high HbF group. A search of known HBB locus regulatory elements identified SNPs 5′ of δ-globin located in an HbF silencing region. In particular, SNP rs4910736 created a binding site for a known transcription repressor GFi1 which is a candidate protein for further investigation. Another HbF-associated SNP, rs2855122 in the cAMP response element upstream of Gγ-globin, was analyzed for functional relevance. Studies performed with siRNA-mediated CREB binding protein (CBP) knockdown in primary erythroid cells demonstrated γ-globin activation and HbF induction, supporting a repressor role for CBP. This study identifies possible molecular determinants of HbF production.
PMCID: PMC4950389  PMID: 27022141
GWAS; sickle cell disease; fetal hemoglobin; HBB locus; haplotypes; single nucleotide polymorphisms
2.  Detection of Significant Groups in Hierarchical Clustering by Resampling 
Frontiers in Genetics  2016;7:144.
Hierarchical clustering is a simple and reproducible technique to rearrange data of multiple variables and sample units and visualize possible groups in the data. Despite the name, hierarchical clustering does not provide clusters automatically, and “tree-cutting” procedures are often used to identify subgroups in the data by cutting the dendrogram that represents the similarities among groups used in the agglomerative procedure. We introduce a resampling-based technique that can be used to identify cut-points of a dendrogram with a significance level based on a reference distribution for the heights of the branch points. The evaluation on synthetic data shows that the technique is robust in a variety of situations. An example with real biomarker data from the Long Life Family Study shows the usefulness of the method.
PMCID: PMC4976109  PMID: 27551289
dendrogram; tree-cutting procedures; resampling techniques
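The resampling idea in entry 2 can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes single-linkage clustering, a null distribution built by permuting each variable independently, and the largest branch height as the test statistic. The function names and toy data are hypothetical.

```python
import random

def merge_heights(points):
    """Single-linkage agglomerative clustering; returns the branch heights in merge order."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(points[a], points[b])) ** 0.5

    clusters = [[i] for i in range(len(points))]
    heights = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(dist(a, b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        heights.append(d)
        clusters[i] = clusters[i] + clusters[j]  # merge the closest pair
        del clusters[j]
    return heights

def reference_height(points, n_resamples=200, level=0.05, seed=0):
    """Upper (1 - level) quantile of the top branch height under an assumed null.

    Assumption: the null is generated by shuffling each column independently,
    which breaks any joint group structure while keeping the marginals.
    """
    rng = random.Random(seed)
    columns = list(zip(*points))
    null_top = []
    for _ in range(n_resamples):
        shuffled = [rng.sample(col, len(col)) for col in columns]
        null_top.append(max(merge_heights(list(zip(*shuffled)))))
    null_top.sort()
    idx = min(len(null_top) - 1, int((1 - level) * len(null_top)))
    return null_top[idx]

# Toy data (hypothetical): two well-separated groups of six points each.
base = [(0.0, 0.0), (0.1, 0.3), (0.2, 0.1), (0.3, 0.4), (0.4, 0.2), (0.5, 0.0)]
points = base + [(x + 10.0, y + 10.0) for x, y in base]

observed_top = max(merge_heights(points))
threshold = reference_height(points)
significant_split = observed_top > threshold  # cut the dendrogram at this branch if True
```

A branch point whose height exceeds the reference quantile is treated as a candidate cut-point. Other linkage rules or null models would change the threshold.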
4.  Learning Bayesian Networks from Correlated Data 
Scientific Reports  2016;6:25156.
Bayesian networks are probabilistic models that represent complex distributions in a modular way and have become very popular in many fields. There are many methods to build Bayesian networks from a random sample of independent and identically distributed observations. However, many observational studies are designed using some form of clustered sampling that introduces correlations between observations within the same cluster and ignoring this correlation typically inflates the rate of false positive associations. We describe a novel parameterization of Bayesian networks that uses random effects to model the correlation within sample units and can be used for structure and parameter learning from correlated data without inflating the Type I error rate. We compare different learning metrics using simulations and illustrate the method in two real examples: an analysis of genetic and non-genetic factors associated with human longevity from a family-based study, and an example of risk factors for complications of sickle cell anemia from a longitudinal study with repeated measures.
PMCID: PMC4857179  PMID: 27146517
5.  Familial Risk for Exceptional Longevity 
One of the most glaring deficiencies in the current assessment of mortality risk is the lack of information concerning the impact of familial longevity. In this work, we update estimates of sibling relative risk of living to extreme ages using data from more than 1,700 sibships, and we begin to examine the trend for heritability for different birth-year cohorts. We also build a network model that can be used to compute the increased chance for exceptional longevity of a subject, conditional on his family history of longevity. The network includes familial longevity from three generations and can be used to understand the effects of paternal and maternal longevity on an individual's chance to live to an extreme age.
PMCID: PMC4812435  PMID: 27041978
6.  Novel loci and pathways significantly associated with longevity 
Scientific Reports  2016;6:21243.
Only two genome-wide significant loci associated with longevity have been identified so far, probably because of insufficient sample sizes of centenarians, whose genomes may harbor genetic variants associated with health and longevity. Here we report a genome-wide association study (GWAS) of Han Chinese with a sample size 2.7 times the largest previously published GWAS on centenarians. We identified 11 independent loci associated with longevity replicated in Southern-Northern regions of China, including two novel loci (rs2069837-IL6; rs2440012-ANKRD20A9P) with genome-wide significance and the rest with suggestive significance (P < 3.65 × 10−5). Eight independent SNPs overlapped across Han Chinese, European and U.S. populations, and APOE and 5q33.3 were replicated as longevity loci. Integrated analysis indicates four pathways (starch, sucrose and xenobiotic metabolism; immune response and inflammation; MAPK; calcium signaling) highly associated with longevity (P ≤ 0.006) in Han Chinese. The association with longevity of three of these four pathways (MAPK; immunity; calcium signaling) is supported by findings in other human cohorts. Our novel finding on the association of the starch, sucrose and xenobiotic metabolism pathway with longevity is consistent with previous results from Drosophila. This study suggests that protective mechanisms, including immunity and nutrient metabolism, and their interactions with environmental stress play key roles in human longevity.
PMCID: PMC4766491  PMID: 26912274
7.  GWAS of Longevity in CHARGE Consortium Confirms APOE and FOXO3 Candidacy 
The genetic contribution to longevity in humans has been estimated to range from 15% to 25%. Only two genes, APOE and FOXO3, have shown association with longevity in multiple independent studies.
We conducted a meta-analysis of genome-wide association studies including 6,036 longevity cases, age ≥90 years, and 3,757 controls that died between ages 55 and 80 years. We additionally attempted to replicate earlier identified single nucleotide polymorphism (SNP) associations with longevity.
In our meta-analysis, we found suggestive evidence for the association of SNPs near CADM2 (odds ratio [OR] = 0.81; p value = 9.66 × 10−7) and GRIK2 (odds ratio = 1.24; p value = 5.09 × 10−8) with longevity. When attempting to replicate findings earlier identified in genome-wide association studies, only the APOE locus consistently replicated. In an additional look-up of the candidate gene FOXO3, we found that an earlier identified variant shows a highly significant association with longevity when including published data with our meta-analysis (odds ratio = 1.17; p value = 1.85×10−10).
We did not identify new genome-wide significant associations with longevity and did not replicate earlier findings except for APOE and FOXO3. Our inability to find new associations with survival to ages ≥90 years may be because longevity represents multiple complex traits with heterogeneous genetic underpinnings or, alternatively, because longevity may be regulated by rare variants that are not captured by standard genome-wide genotyping and imputation of common variants.
PMCID: PMC4296168  PMID: 25199915
Longevity; GWAS; FOXO3; APOE.
8.  Extended Maternal Age at Birth of Last Child and Women’s Longevity in the Long Life Family Study 
Menopause (New York, N.Y.)  2015;22(1):26-31.
This study investigated the association between maternal ages at birth of last child and the likelihood of survival to advanced ages.
A nested case-control study using Long Life Family Study (LLFS) data. Three hundred and eleven women who survived past the oldest 5th percentile of survival according to birth cohort matched life tables were identified as cases and 151 women who died at ages younger than the top 5th percentile of survival were identified as controls. A Bayesian mixed-effect logistic regression model was used to estimate the association between maternal age at birth of last child and exceptional longevity among these 462 women.
A significant association for later maternal age was found whereby women who had their last child beyond the age of 33 years had twice the odds of survival to the top 5th percentile of survival of their birth cohorts compared to women who had their last child by age 29 (OR=2.08, 95%CI 1.13; 3.92 for age between 33 and 37 years and OR=1.92, 95% CI 1.03; 3.68 for older age).
The study supports the findings from other studies demonstrating a positive association between older maternal age and greater odds of the mother surviving to unusually old age.
PMCID: PMC4270889  PMID: 24977462
maternal age; menopause; centenarian; familial longevity; aging; evolution
9.  The Genetics of Hemoglobin A2 Regulation in Sickle Cell Anemia 
American journal of hematology  2014;89(11):1019-1023.
Hemoglobin A2, a tetramer of α- and δ-globin chains, comprises less than 3% of total hemoglobin in normal adults. In northern Europeans, single nucleotide polymorphisms (SNPs) in the HBS1L-MYB locus on chromosome 6q and the HBB cluster on chromosome 11p were associated with HbA2 levels. We examined the genetic basis of HbA2 variability in sickle cell anemia using genome-wide association studies (GWAS). HbA2 levels were associated with SNPs in the HBS1L-MYB interval that affect erythropoiesis and HbF expression and SNPs in BCL11A that regulate the γ-globin genes. These effects are mediated by the association of these loci with γ-globin gene expression and fetal hemoglobin (HbF) levels. The association of polymorphisms downstream of the β-globin gene (HBB) cluster on chromosome 11 with HbA2 was not mediated by HbF. In sickle cell anemia, levels of HbA2 appear to be modulated by trans-acting genes that affect HBG expression and perhaps also elements within the β-globin gene cluster. HbA2 is expressed pancellularly and can inhibit HbS polymerization. It remains to be seen if genetic regulators of HbA2 can be exploited for therapeutic purposes.
PMCID: PMC4298130  PMID: 25042611
10.  Genetic polymorphism of APOB is associated with diabetes mellitus in sickle cell disease 
Human genetics  2015;134(8):895-904.
Environmental variations have strong influences in the etiology of type 2 diabetes mellitus. In this study, we investigated the genetic basis of diabetes in patients with sickle cell disease (SCD), a Mendelian disorder accompanied by distinct physiological conditions of hypoxia and hyperactive erythropoiesis. Compared to the general African-American population, the prevalence of diabetes as assessed in two SCD cohorts of 856 adults was low, but it markedly increased with older age and overweight. Meta-analyses of over 5 million single nucleotide polymorphisms (SNPs) in the two SCD cohorts identified a SNP, rs59014890, the C allele of which associated with diabetes risk at P= 3.2×10-8 and, surprisingly, associated with decreased APOB expression in peripheral blood mononuclear cells (PBMCs). The risk allele of the APOB polymorphism was associated with overweight in 181 SCD adolescents, with diabetes risk in 592 overweight, non-SCD African Americans ≥45 years of age, and with elevated plasma lipid concentrations in general populations. In addition, lower expression level of APOB in PBMCs was associated with higher values for percent hemoglobin A1C and serum total cholesterol and triglyceride concentrations in patients with Chuvash polycythemia, a congenital disease with elevated hypoxic responses and increased erythropoiesis at normoxia. Our study reveals a novel, environment-specific genetic polymorphism that may affect key metabolic pathways contributing to diabetes in SCD.
PMCID: PMC4607040  PMID: 26025476
11.  Genetic Modifiers of Sickle Cell Disease 
American journal of hematology  2012;87(8):795-803.
Sickle cell anemia is associated with unusual clinical heterogeneity for a Mendelian disorder. Fetal hemoglobin concentration and coincident α thalassemia, both of which directly affect the sickle erythrocyte, are the major modulators of the phenotype of disease. Understanding the genetics underlying the heritable subphenotypes of sickle cell anemia would be prognostically useful, could inform personalized therapeutics, and might help the discovery of new “druggable” pathophysiologic targets. Genotype-phenotype association studies have been used to identify novel genetic modifiers. In the future, whole genome sequencing with its promise of discovering hitherto unsuspected variants could add to our understanding of the genetic modifiers of this disease.
PMCID: PMC4562292  PMID: 22641398
13.  Bayesian Polynomial Regression Models to Fit Multiple Genetic Models for Quantitative Traits 
Bayesian analysis (Online)  2014;10(1):53-74.
We present a coherent Bayesian framework for selection of the most likely model from the five genetic models (genotypic, additive, dominant, co-dominant, and recessive) commonly used in genetic association studies. The approach uses a polynomial parameterization of genetic data to simultaneously fit the five models and save computations. We provide a closed-form expression of the marginal likelihood for normally distributed data, and evaluate the performance of the proposed method and existing method through simulated and real genome-wide data sets.
PMCID: PMC4446790  PMID: 26029316
marginal likelihood; GWAS; Bayesian model selection; parameterization; additive; dominant; recessive; co-dominant
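To make the polynomial parameterization in entry 13 concrete, the sketch below shows one way a single quadratic in the genotype can express all five models. It assumes genotypes coded 0/1/2 and a trait mean of b0 + b1*g + b2*g^2. The constraints are worked out here for illustration, and the paper's exact parameterization may differ. "Co-dominant" is taken to mean that only the heterozygote differs.

```python
def genotype_means(b0, b1, b2):
    """Expected trait value for genotypes 0, 1 and 2 under the quadratic model."""
    return [b0 + b1 * g + b2 * g * g for g in (0, 1, 2)]

def coefficients(model, effect):
    """Return (b1, b2) that reduce the quadratic to the named genetic model.

    Each restricted model is a linear constraint on (b1, b2):
      additive     b2 = 0         (equal step per allele)
      dominant     b1 = -3 * b2   (genotypes 1 and 2 share a mean)
      recessive    b1 = -b2       (genotypes 0 and 1 share a mean)
      co-dominant  b1 = -2 * b2   (genotypes 0 and 2 share a mean)
    The unconstrained quadratic is the 2-degree-of-freedom genotypic model.
    """
    if model == "additive":
        return effect, 0.0
    if model == "dominant":
        return 1.5 * effect, -0.5 * effect
    if model == "recessive":
        return -0.5 * effect, 0.5 * effect
    if model == "co-dominant":
        return 2.0 * effect, -1.0 * effect
    raise ValueError(f"unknown model: {model}")

# Genotype means for a baseline of 0 and a unit effect under each restricted model.
means = {
    name: genotype_means(0.0, *coefficients(name, 1.0))
    for name in ("additive", "dominant", "recessive", "co-dominant")
}
```

Because every restricted model is nested in the same quadratic, quantities computed once for the full polynomial can in principle be reused across models. This is one reading of how the approach saves computation.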
14.  Temporal gene expression profiling of the rat knee joint capsule during immobilization-induced joint contractures 
Contractures of the knee joint cause disability and handicap. Recovering range of motion is recognized by arthritic patients as their preference for improved health outcome secondary only to pain management. Clinical and experimental studies provide evidence that the posterior knee capsule prevents the knee from achieving full extension. This study was undertaken to investigate the dynamic changes of the joint capsule transcriptome during the progression of knee joint contractures induced by immobilization. We performed a microarray analysis of genes expressed in the posterior knee joint capsule following induction of a flexion contracture by rigidly immobilizing the rat knee joint over a time-course of 16 weeks. Fold changes of expression values were measured and co-expressed genes were identified by clustering based on time-series analysis. Genes associated with immobilization were further analyzed to reveal pathways and biological significance and validated by immunohistochemistry on sagittal sections of knee joints.
Changes in expression of at least 1.5-fold were dominated by a decrease in expression for 7732 probe sets occurring at week 8, while the expression of 2251 probe sets increased. Clusters of genes with similar profiles of expression included a total of 162 genes displaying at least a 2-fold change compared to week 1. Functional analysis revealed ontology categories corresponding to triglyceride metabolism, extracellular matrix and muscle contraction. The altered expression of selected genes involved in the triglyceride biosynthesis pathway (AGPAT-9), and of the genes P4HB and HSP47, both involved in collagen synthesis, was confirmed by immunohistochemistry.
Gene expression in the knee joint capsule was sensitive to joint immobility and provided insights into molecular mechanisms relevant to the pathophysiology of knee flexion contractures. Capsule responses to immobilization were dynamic and characterized by modulation of at least three reaction pathways: down-regulation of triglyceride biosynthesis, alteration of extracellular matrix degradation, and muscle contraction gene expression. The posterior knee capsule may deploy tissue-specific patterns of mRNA regulatory responses to immobilization. The identification of altered expression of genes and biochemical pathways in the joint capsule provides potential targets for the therapy of knee flexion contractures.
PMCID: PMC4443538  PMID: 26006773
Joint contracture; Immobilization; Knee joint capsule; Gene expression; Rat
15.  Prediction of Fetal Hemoglobin in Sickle Cell Anemia Using an Ensemble of Genetic Risk Prediction Models 
Fetal hemoglobin (HbF) is the major modifier of the clinical course of sickle cell anemia. Its levels are highly heritable and its interpersonal variability is modulated in part by three quantitative trait loci (QTL) that affect HbF gene expression. Genome-wide association studies (GWAS) have identified single nucleotide polymorphisms (SNPs) in these QTLs that are highly associated with HbF but explain only 10 to 12% of the variance of HbF. Combining SNPs into a genetic risk score (GRS) can help to explain a larger amount of the variability of HbF level, but the challenge of this approach is to select the optimal number of SNPs to be included in the GRS.
Methods and Results
We develop a collection of 14 models with GRS composed of different numbers of SNPs, and use the ensemble of these models to predict HbF in sickle cell anemia patients. The models were trained in 841 sickle cell anemia patients and were tested in three independent cohorts. The ensemble of 14 models explained 23.4% of the variability in HbF in the discovery cohort, while the correlation between predicted and observed HbF in the 3 independent cohorts ranged between 0.28 and 0.44. The models included SNPs in BCL11A, the HBS1L-MYB intergenic region and the site of the HBB gene cluster, QTL previously associated with HbF.
An ensemble of 14 genetic risk models can predict HbF levels with accuracy between 0.28 and 0.44 and the approach may prove useful in other applications.
PMCID: PMC3994553  PMID: 24585758
sickle cell disease; hemoglobin; genetics; association studies; risk prediction; risk factor
16.  Association between wind speed and the occurrence of sickle cell acute painful episodes: results of a case-crossover study 
British journal of haematology  2008;143(3):433-438.
The role of the weather as a trigger of sickle cell acute painful episodes has long been debated. To more accurately describe the role of the weather as a trigger of painful events, we conducted a case-crossover study of the association between local weather conditions and the occurrence of painful episodes. From the Cooperative Study of Sickle Cell Disease, we identified 813 patients with sickle cell anaemia who had 3570 acute painful episodes. We found an association between wind speed and the onset of pain, specifically wind speed during the 24-h period preceding the onset of pain. Analysing wind speed as a categorical trait showed a 13% increase (95% confidence interval: 3%, 24%) in odds of pain when comparing high wind speed to lower wind speed (P = 0.007). In addition, the association between wind speed and painful episodes was found to be stronger among men, particularly those in the warmer climate regions of the United States. These results are in agreement with another study that found an association between wind speed and hospital visits for pain in the United Kingdom, and lend support to physiological and clinical studies that have suggested that skin cooling is associated with sickle vasoocclusion and perhaps pain.
PMCID: PMC4347894  PMID: 18729854
epidemiology; sickle cell anaemia; pain; weather
17.  Burden of disease variants in participants of the long life family Study 
Aging (Albany NY)  2015;7(2):123-132.
Case control studies of nonagenarians and centenarians provide evidence that long-lived individuals do not differ in the rate of disease associated variants compared to population controls. These results suggest that an enrichment of novel protective variants, rather than a lack of disease associated variants, determines the genetic predisposition to exceptionally long lives. Using data from the Long Life Family Study (LLFS), we sought to replicate these findings and extend them to include a larger number of disease-specific risk alleles. To accomplish this goal, we built a genetic risk score for each of four age-related disease groups (Alzheimer's disease, cardiovascular disease and stroke, type 2 diabetes, and various cancers) and compared the distribution of these scores between older participants of the LLFS, their offspring and their spouses. The analyses showed no significant differences in distribution of the genetic risk scores for cardiovascular disease and stroke, type 2 diabetes, or cancer between the groups. Participants of the LLFS appeared to carry on average 1% fewer risk alleles for Alzheimer's disease compared to spousal controls; although the difference may not be clinically relevant, it was statistically significant. However, the statistical significance between familial longevity and the Alzheimer's disease genetic risk score was lost when a more stringent linkage disequilibrium threshold was imposed to select independent genetic variants.
PMCID: PMC4359694  PMID: 25664523
18.  Evaluation of an ensemble of genetic models for prediction of a quantitative trait 
Frontiers in Genetics  2015;5:474.
Many genetic markers have been shown to be associated with common quantitative traits in genome-wide association studies. Typically these associated genetic markers have small to modest effect sizes and individually they explain only a small amount of the variability of the phenotype. In order to build a genetic prediction model without fitting a multiple linear regression model with possibly hundreds of genetic markers as predictors, researchers often summarize the joint effect of risk alleles into a genetic score that is used as a covariate in the genetic prediction model. However, the prediction accuracy can be highly variable and selecting the optimal number of markers to be included in the genetic score is challenging. In this manuscript we present a strategy to build an ensemble of genetic prediction models from data and we show that the ensemble-based method makes the challenge of choosing the number of genetic markers more amenable. Using simulated data with varying heritability and number of genetic markers, we compare the predictive accuracy and inclusion of true positive and false positive markers of a single genetic prediction model and our proposed ensemble method. The results show that the ensemble of genetic models tends to include a larger number of genetic variants than a single genetic model and it is more likely to include all of the true genetic markers. This increased sensitivity is obtained at the price of a lower specificity that appears to minimally affect the predictive accuracy of the ensemble.
PMCID: PMC4292739  PMID: 25628649
genetic risk prediction; genetic risk score; ensemble-based classifiers; bagging predictors; prediction accuracy
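The ensemble strategy in entry 18 can be sketched roughly as follows. This is an illustrative toy, not the published procedure. It assumes an unweighted risk score (a count of risk alleles), one simple linear regression per score, and an equal-weight average of the model predictions. The SNP names and data are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx if sxx else 0.0
    return my - slope * mx, slope

def risk_score(genotype, snps):
    """Unweighted genetic risk score: total risk-allele count over the chosen SNPs."""
    return sum(genotype[s] for s in snps)

def ensemble_predict(train, trait, ranked_snps, sizes, subject):
    """Average the predictions of one model per score size (assumed equal weights)."""
    predictions = []
    for k in sizes:
        snps = ranked_snps[:k]  # top-k SNPs by strength of association
        scores = [risk_score(g, snps) for g in train]
        intercept, slope = fit_line(scores, trait)
        predictions.append(intercept + slope * risk_score(subject, snps))
    return sum(predictions) / len(predictions)

# Toy training data (hypothetical): risk-allele counts at two SNPs in perfect LD.
train = [
    {"snpA": 0, "snpB": 0},
    {"snpA": 1, "snpB": 1},
    {"snpA": 2, "snpB": 2},
    {"snpA": 1, "snpB": 1},
]
trait = [2.0, 3.0, 4.0, 3.0]
prediction = ensemble_predict(
    train, trait, ["snpA", "snpB"], sizes=[1, 2], subject={"snpA": 2, "snpB": 2}
)
```

Averaging over several score sizes avoids committing to a single number of markers, which is the choice the abstract describes as difficult.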
19.  The Hypoxic Response Contributes to Altered Gene Expression and Pre-Capillary Pulmonary Hypertension in Patients with Sickle Cell Disease 
Circulation  2014;129(16):1650-1658.
We postulated that the hypoxic response in sickle cell disease (SCD) contributes to altered gene expression and pulmonary hypertension, a complication associated with early mortality.
Methods and Results
To identify genes regulated by the hypoxic response and not other effects of chronic anemia, we compared expression variation in peripheral blood mononuclear cells from 13 SCD subjects with hemoglobin SS genotype and 15 Chuvash polycythemia subjects (VHLR200W homozygotes with constitutive up-regulation of hypoxia inducible factors in the absence of anemia or hypoxia). At 5% false discovery rate, 1040 genes exhibited >1.15 fold change in both conditions; 297 were up-regulated and 743 down-regulated including MAPK8 encoding a mitogen-activated protein kinase important for apoptosis, T-cell differentiation and inflammatory responses. Association mapping with a focus on local regulatory polymorphisms in 61 SCD patients identified expression quantitative trait loci (eQTL) for 103 of these hypoxia response genes. In a University of Illinois SCD cohort the A allele of a MAPK8 eQTL, rs10857560, was associated with pre-capillary pulmonary hypertension defined as mean pulmonary artery pressure ≥25 and pulmonary capillary wedge pressure ≤15 mm Hg at right heart catheterization (allele frequency=0.66; OR=13.8, P=0.00036, n=238). This association was confirmed in an independent Walk-PHaSST cohort (allele frequency=0.65; OR=11.3, P=0.0025, n=519). The homozygous AA genotype of rs10857560 was associated with decreased MAPK8 expression and present in all 14 identified pre-capillary pulmonary hypertension cases among the combined 757 patients.
Our study demonstrates a prominent hypoxic transcription component in SCD and a MAPK8 eQTL associated with pre-capillary pulmonary hypertension.
PMCID: PMC4287376  PMID: 24515990
sickle cell disease; MAPK8; hypoxic response; expression quantitative trait loci; association mapping; pre-capillary pulmonary hypertension
20.  Relationship Between Poor Physical Function, Inflammatory Markers, and Comorbidities in HIV-Infected Women on Antiretroviral Therapy 
Journal of Women's Health  2014;23(1):69-76.
Background: HIV-infected individuals may be at increased risk of poor physical function. Chronic inflammation has been associated with decreased physical function in the elderly and may also influence physical function in HIV-infected individuals.
Methods: This cross-sectional study assessed physical function in 65 HIV-infected women aged 40 and older on stable antiretroviral treatment using the Short Physical Performance Battery (SPPB): a standardized test of balance, walking speed, and lower-extremity strength developed for elderly populations. The relationship between low SPPB score, selected demographic and medical characteristics, and high inflammatory biomarker profile was analyzed using Fisher's exact test and Wilcoxon rank sum test.
Results: The median age of subjects was 49 years (interquartile range [IQR] 45–55), and the median CD4 T-cell count was 675 cells/mm3 (IQR 436–828). Thirteen subjects (20%) had a low SPPB score. Subjects with a low SPPB score were more likely to be cigarette smokers (p=0.03), had more medical comorbidities (p=0.01), and had higher levels of interleukin-6 (IL-6) (p<0.05). They also tended to be older (median age 55 vs. 48, p=0.06), more likely to have diabetes (p=0.07), and have higher levels of soluble tumor necrosis factor-1 (p=0.09).
Conclusions: Twenty percent of women aged 40 and older with well-treated HIV had poor physical-function performance, which was associated with the high burden of comorbidities in this population and with increased IL-6. However, it is unclear from this cross-sectional study whether increased inflammation was related to poor physical function or to other factors, such as age and medical comorbidities.
PMCID: PMC3880911  PMID: 24219874
21.  An efficient technique for Bayesian modeling of family data using the BUGS software 
Frontiers in Genetics  2014;5:390.
Linear mixed models have become a popular tool to analyze continuous data from family-based designs by using random effects that model the correlation of subjects from the same family. However, mixed models for family data are challenging to implement with the BUGS (Bayesian inference Using Gibbs Sampling) software because of the high-dimensional covariance matrix of the random effects. This paper describes an efficient parameterization that utilizes the singular value decomposition of the covariance matrix of random effects, includes the BUGS code for such implementation, and extends the parameterization to generalized linear mixed models. The implementation is evaluated using simulated data and an example from a large family-based study is presented with a comparison to other existing methods.
PMCID: PMC4235415  PMID: 25477899
BUGS; parameterization; family-based study; covariance matrix; linear mixed models
22.  A prediction model for lung cancer diagnosis that integrates genomic and clinical features 
Lung cancer is the leading cause of cancer death, in part due to lack of early diagnostic tools. Bronchoscopy represents a relatively noninvasive initial diagnostic test in smokers with suspect disease, but has low sensitivity. We have reported a gene expression profile in cytologically normal large airway epithelium obtained via bronchoscopic brushings that is a sensitive and specific biomarker for lung cancer. Here, we evaluate the independence of the biomarker from other clinical risk factors and determine the performance of a clinicogenomic model that combines clinical factors and gene expression.
Training (n = 76) and test sets (n = 62) consisted of smokers undergoing bronchoscopy for suspicion of lung cancer at five medical centers. Logistic regression models describing the likelihood of having lung cancer using the biomarker, clinical factors, and these data combined were tested using the independent set of patients with non-diagnostic bronchoscopies. The model predictions were also compared with physicians’ clinical assessment.
The gene expression biomarker is associated with cancer status in the combined clinicogenomic model (p < 0.005). There is a significant difference in performance of the clinicogenomic relative to the clinical model (p < 0.05). In the test set, the clinicogenomic model increases sensitivity and NPV to 100%, and results in higher specificity (91%) and PPV (81%) compared to other models. The clinicogenomic model has high accuracy where physician assessment is most uncertain.
The airway gene expression biomarker provides information about the likelihood of lung cancer not captured by clinical factors, and the clinicogenomic model has the highest prediction accuracy. These findings suggest that use of the clinicogenomic model may expedite more invasive testing and definitive therapy for smokers with lung cancer and reduce invasive diagnostic procedures for individuals without lung cancer.
PMCID: PMC4167688  PMID: 19138936
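For readers comparing the performance figures in entry 22, the four reported metrics follow from a 2×2 table of predicted versus true cancer status. The sketch below uses made-up counts chosen only to show the arithmetic. They are not the study's data.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity, PPV and NPV from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),  # true cases correctly flagged
        "specificity": tn / (tn + fp),  # non-cases correctly cleared
        "ppv": tp / (tp + fp),          # flagged subjects who are true cases
        "npv": tn / (tn + fn),          # cleared subjects who are true non-cases
    }

# Illustrative counts only (hypothetical, not taken from the paper).
metrics = diagnostic_metrics(tp=8, fp=2, tn=18, fn=0)
```

With no false negatives, sensitivity and NPV are both 100%, which matches the pattern the abstract reports for the clinicogenomic model.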
23.  Personality Factors in the Long Life Family Study 
To evaluate personality profiles of Long Life Family Study participants relative to population norms and offspring of centenarians from the New England Centenarian Study.
Personality domains of agreeableness, conscientiousness, extraversion, neuroticism, and openness were assessed with the NEO Five-Factor Inventory in 4,937 participants from the Long Life Family Study (mean age 70 years). A linear mixed model of age and gender was implemented adjusting for other covariates.
A significant age trend was found in all five personality domains. On average, the offspring generation of long-lived families scored low in neuroticism, high in extraversion, and within average values for the other three domains. Older participants tended to score higher in neuroticism and lower in the other domains compared with younger participants, but the estimated scores generally remained within average population values. No significant differences were found between long-lived family members and their spouses.
Personality factors and more specifically low neuroticism and high extraversion may be important for achieving extreme old age. In addition, personality scores of family members were not significantly different from those of their spouses, suggesting that environmental factors may play a significant role in addition to genetic factors.
PMCID: PMC3744045  PMID: 23275497
Centenarian; Longevity; NEO; Neuroticism; Personality.
24.  Genetic determinants of haemolysis in sickle cell anaemia 
British journal of haematology  2013;161(2):270-278.
Haemolytic anaemia is variable among patients with sickle cell anaemia and can be estimated by reticulocyte count, lactate dehydrogenase, aspartate aminotransferase and bilirubin levels. Using principal component analysis of these measurements we computed a haemolytic score that we used as a subphenotype in a genome-wide association study. We identified in one cohort and replicated in two additional cohorts the association of a single nucleotide polymorphism in NPRL3 (rs7203560; chr16p13·3) (P = 6·04 × 10−07). This association was validated by targeted genotyping in a fourth independent cohort. The HBA1/HBA2 regulatory elements, hypersensitive sites (HS)-33, HS-40 and HS-48 are located in introns of NPRL3. Rs7203560 was in perfect linkage disequilibrium (LD) with rs9926112 (r2 = 1) and in strong LD with rs7197554 (r2 = 0·75) and rs13336641 (r2 = 0·77); the latter is located between HS-33 and HS-40 sites and next to a CTCF binding site. The minor allele for rs7203560 was associated with the −α3·7 thalassaemia gene deletion. When adjusting for HbF and α thalassaemia, the association of NPRL3 with the haemolytic score was significant (P = 0·00375) and remained significant when examining only cases without gene deletion α thalassaemia (P = 0·02463). Perhaps by independently down-regulating expression of the HBA1/HBA2 genes, variants of the HBA1/HBA2 gene regulatory loci, tagged by rs7203560, reduce haemolysis in sickle cell anaemia.
PMCID: PMC4129543  PMID: 23406172
haemolysis; sickle cell anaemia; haemolytic anaemia; genetic analysis; thalassaemia
25.  A Bayesian dynamic model for influenza surveillance 
Statistics in medicine  2006;25(11):1803-1825.
The severe acute respiratory syndrome (SARS) epidemic, the growing fear of an influenza pandemic and the recent shortage of flu vaccine highlight the need for surveillance systems able to provide early, quantitative predictions of epidemic events. We use dynamic Bayesian networks to discover the interplay among four data sources that are monitored for influenza surveillance. By integrating these different data sources into a dynamic model, we identify in children and infants presenting to the pediatric emergency department with respiratory syndromes an early indicator of impending influenza morbidity and mortality. Our findings show the importance of modelling the complex dynamics of data collected for influenza surveillance, and suggest that dynamic Bayesian networks could be suitable modelling tools for developing epidemic surveillance systems.
PMCID: PMC4128871  PMID: 16645996
dynamic Bayesian networks; influenza surveillance; syndromic data

