1.  A Longitudinal Study of Health Improvement in the Atlanta CHDWB Wellness Cohort 
Journal of Personalized Medicine  2014;4(4):489-507.
The Center for Health Discovery and Wellbeing (CHDWB) is an academic program designed to evaluate the efficacy of clinical self-knowledge and health partner counseling for development and maintenance of healthy behaviors. This paper reports on the change in health profiles for over 90 traits, measured in 382 participants over three visits in the 12 months following enrolment. Significant changes in the desired direction of improved health are observed for many traits related to cardiovascular health, including BMI, blood pressure, cholesterol, and arterial stiffness, as well as for summary measures of physical and mental health. The changes are most notable for individuals in the upper quartile of baseline risk, many of whom showed a positive correlated response across clinical categories. By contrast, individuals who start with more healthy profiles do not generally show significant improvements and only a modest impact of targeting specific health attributes was observed. Overall, the CHDWB model shows promise as an effective intervention particularly for individuals at high risk for cardiovascular disease.
PMCID: PMC4282885  PMID: 25563459
personalized medicine; health partner; chronic disease risk; lifestyle intervention
2.  Detection and replication of epistasis influencing transcription in humans 
Nature  2014;508(7495):249-253.
Epistasis is the phenomenon whereby one polymorphism’s effect on a trait depends on other polymorphisms present in the genome. The extent to which epistasis influences complex traits [1] and contributes to their variation [2,3] is a fundamental question in evolution and human genetics. Though often demonstrated in artificial gene manipulation studies in model organisms [4,5], and some examples have been reported in other species [6], few examples exist for epistasis amongst natural polymorphisms in human traits [7,8]. Its absence from empirical findings may simply be due to low incidence in the genetic control of complex traits [2,3], but an alternative view is that it has previously been too technically challenging to detect due to statistical and computational issues [9]. Here we show that, using advanced computation [10] and a gene expression study design, many instances of epistasis are found between common single nucleotide polymorphisms (SNPs). In a cohort of 846 individuals with 7339 gene expression levels measured in peripheral blood, we found 501 significant pairwise interactions between common SNPs influencing the expression of 238 genes (p < 2.91 × 10^−16). Replication of these interactions in two independent data sets [11,12] showed both concordance of direction of epistatic effects (p = 5.56 × 10^−31) and enrichment of interaction p-values, with 30 being significant at a conservative threshold of p < 0.05/501. Forty-four of the genetic interactions are located within 2 Mb of regions of known physical chromosome interactions [13] (p = 1.8 × 10^−10). Epistatic networks of three SNPs or more influence the expression levels of 129 genes, whereby one cis-acting SNP is modulated by several trans-acting SNPs. For example, MBNL1 is influenced by an additive effect at rs13069559 which itself is masked by trans-SNPs on 14 different chromosomes, with nearly identical genotype-phenotype (GP) maps for each cis-trans interaction.
This study presents the first evidence for multiple instances of segregating common polymorphisms interacting to influence human traits.
PMCID: PMC3984375  PMID: 24572353
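The replication threshold quoted in the abstract of entry 2 (p < 0.05/501) has the form of a Bonferroni correction over the 501 discovery interactions. A minimal sketch of that arithmetic follows; the function name is an illustrative assumption and is not taken from the paper.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold that keeps the family-wise error rate at alpha."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


# 501 discovery interactions carried forward to replication, alpha = 0.05
threshold = bonferroni_threshold(0.05, 501)
print(f"{threshold:.3e}")  # prints 9.980e-05
```

An interaction would count as replicated under this rule only if its replication p-value falls below roughly 1 in 10,000.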
3.  What if we had whole-genome sequence data for millions of individuals? 
Genome Medicine  2013;5(9):80.
PMCID: PMC3968547  PMID: 24050736
4.  Characterization of Distinct Classes of Differential Gene Expression in Osteoblast Cultures from Non-Syndromic Craniosynostosis Bone 
Journal of Genomics  2014;2:121-130.
Craniosynostosis, the premature fusion of one or more skull sutures, occurs in approximately 1 in 2500 infants, with the majority of cases non-syndromic and of unknown etiology. Two common reasons proposed for premature suture fusion are abnormal compression forces on the skull and rare genetic abnormalities. Our goal was to evaluate whether different sub-classes of disease can be identified based on total gene expression profiles. RNA-Seq data were obtained from 31 human osteoblast cultures derived from bone biopsy samples collected between 2009 and 2011, representing 23 craniosynostosis fusions and 8 normal cranial bones or long bones. No differentiation between regions of the skull was detected, but variance component analysis of gene expression patterns nevertheless supports transcriptome-based classification of craniosynostosis. Cluster analysis showed 4 distinct groups of samples; 1 predominantly normal and 3 craniosynostosis subtypes. Similar constellations of sub-types were also observed upon re-analysis of a similar dataset of 199 calvarial osteoblast cultures. Annotation of gene function of differentially expressed transcripts strongly implicates physiological differences with respect to cell cycle and cell death, stromal cell differentiation, extracellular matrix (ECM) components, and ribosomal activity. Based on these results, we propose non-syndromic craniosynostosis cases can be classified by differences in their gene expression patterns and that these may provide targets for future clinical intervention.
PMCID: PMC4150121  PMID: 25184005
Non-syndromic craniosynostosis; RNA-Seq; Transcriptome profile; Personalized medicine; Biomarkers.
5.  A simulation study of gene-by-environment interactions in GWAS implies ample hidden effects 
Frontiers in Genetics  2014;5:225.
The switch to a modern lifestyle in recent decades has coincided with a rapid increase in prevalence of obesity and other diseases. These shifts in prevalence could be explained by the release of genetic susceptibility for disease in the form of gene-by-environment (GxE) interactions. Yet, the detection of interaction effects requires large sample sizes, little replication has been reported, and a few studies have demonstrated environmental effects only after summing the risk of GWAS alleles into genetic risk scores (GRSxE). We performed extensive simulations of a quantitative trait controlled by 2500 causal variants to assess the feasibility of detecting gene-by-environment interactions in the context of GWAS. The simulated individuals were assigned either to an ancestral or a modern setting that alters the phenotype by increasing the effect size by 1.05–2-fold at a varying fraction of perturbed SNPs (from 1 to 20%). We report two main results. First, for a wide range of realistic scenarios, highly significant GRSxE is detected despite the absence of individual genotype GxE evidence at the contributing loci. Second, an increase in phenotypic variance after environmental perturbation reduces the power to discover susceptibility variants by GWAS in mixed cohorts with individuals from both ancestral and modern environments. We conclude that a pervasive presence of gene-by-environment effects can remain hidden even though it contributes to the genetic architecture of complex traits.
PMCID: PMC4104702  PMID: 25101110
gene-by-environment; environmental perturbation; modern lifestyle; complex disease; genetic risk score; decanalization; GWAS; obesity
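The design summarized in entry 5 (an ancestral and a modern setting, with effect sizes inflated at a fraction of SNPs) can be illustrated with a toy simulation. This is a sketch under stated assumptions, not the authors' simulation: it uses 200 rather than 2500 causal variants, equal-magnitude effects, an allele frequency of 0.5, a genetic risk score built from the true ancestral effects rather than GWAS estimates, and hypothetical names (`simulate`, `ols_slope`). A steeper regression slope of phenotype on the score in the modern setting is the GRSxE signal.

```python
import random


def simulate(n_per_env=1000, n_snps=200, frac_perturbed=0.2,
             fold=2.0, noise_sd=0.5, seed=1):
    """Return (env, grs, phenotype) tuples; env 0 = ancestral, 1 = modern."""
    rng = random.Random(seed)
    # Assumption: equal-magnitude effects with random sign.
    betas = [rng.choice((-0.1, 0.1)) for _ in range(n_snps)]
    perturbed = set(rng.sample(range(n_snps), int(frac_perturbed * n_snps)))
    # In the modern setting, perturbed SNPs have their effect multiplied by `fold`.
    modern_betas = [b * fold if i in perturbed else b for i, b in enumerate(betas)]
    rows = []
    for env in (0, 1):
        active = modern_betas if env == 1 else betas
        for _ in range(n_per_env):
            # Genotype = count of alternate alleles (0, 1 or 2), allele frequency 0.5.
            geno = [(rng.random() < 0.5) + (rng.random() < 0.5) for _ in range(n_snps)]
            grs = sum(b * g for b, g in zip(betas, geno))
            pheno = sum(b * g for b, g in zip(active, geno)) + rng.gauss(0.0, noise_sd)
            rows.append((env, grs, pheno))
    return rows


def ols_slope(xs, ys):
    """Least-squares slope of ys regressed on xs."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


rows = simulate()
slope_ancestral = ols_slope([g for e, g, _ in rows if e == 0],
                            [p for e, _, p in rows if e == 0])
slope_modern = ols_slope([g for e, g, _ in rows if e == 1],
                         [p for e, _, p in rows if e == 1])
print(f"ancestral slope: {slope_ancestral:.2f}, modern slope: {slope_modern:.2f}")
```

Because the perturbation is spread across many small-effect SNPs, each locus contributes only a small share of the slope difference, which is one way to picture why the aggregate score can show an interaction that single-SNP tests miss.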
6.  Single cell transcriptional analysis reveals novel innate immune cell types 
PeerJ  2014;2:e452.
Single-cell analysis has the potential to provide us with a host of new knowledge about biological systems, but it comes with the challenge of correctly interpreting the biological information. While emerging techniques have made it possible to measure inter-cellular variability at the transcriptome level, no consensus yet exists on the most appropriate method of data analysis of such single cell data. Methods for analysis of transcriptional data at the population level are well established but are not well suited to single cell analysis due to their dependence on population averages. In order to address this question, we have systematically tested combinations of methods for primary data analysis on single cell transcription data generated from two types of primary immune cells, neutrophils and T lymphocytes. Cells were obtained from healthy individuals, and single cell transcript expression data was obtained by a combination of single cell sorting and nanoscale quantitative real time PCR (qRT-PCR) for markers of cell type, intracellular signaling, and immune functionality. Gene expression analysis was focused on hierarchical clustering to determine the existence of cellular subgroups within the populations. Nine combinations of criteria for data exclusion and normalization were tested and evaluated. Bimodality in gene expression indicated the presence of cellular subgroups which were also revealed by data clustering. We observed evidence for two clearly defined cellular subtypes in the neutrophil populations and at least two in the T lymphocyte populations. When normalizing the data by different methods, we observed varying outcomes with corresponding interpretations of the biological characteristics of the cell populations. Normalization of the data by linear standardization taking into account technical effects such as plate effects, resulted in interpretations that most closely matched biological expectations. 
Single cell transcription profiling provides evidence of cellular subclasses in neutrophils and leukocytes that may be independent of traditional classifications based on cell surface markers. The choice of primary data analysis method had a substantial effect on the interpretation of the data. Adjustment for technical effects is critical to prevent misinterpretation of single cell transcript data.
PMCID: PMC4081288  PMID: 25024920
Single cell analysis; Data processing; Fluidigm; Gene expression
7.  Gene expression profiles associated with acute myocardial infarction and risk of cardiovascular death 
Genome Medicine  2014;6(5):40.
Genetic risk scores have been developed for coronary artery disease and atherosclerosis, but are not predictive of adverse cardiovascular events. We asked whether peripheral blood expression profiles may be predictive of acute myocardial infarction (AMI) and/or cardiovascular death.
Peripheral blood samples from 338 subjects aged 62 ± 11 years with coronary artery disease (CAD) were analyzed in two phases (discovery N = 175, and replication N = 163), and followed for a mean 2.4 years for cardiovascular death. Gene expression was measured on Illumina HT-12 microarrays with two different normalization procedures to control technical and biological covariates. Whole genome genotyping was used to support comparative genome-wide association studies of gene expression. Analysis of variance was combined with receiver operating curve and survival analysis to define a transcriptional signature of cardiovascular death.
In both phases, there was significant differential expression between healthy and AMI groups with overall down-regulation of genes involved in T-lymphocyte signaling and up-regulation of inflammatory genes. Expression quantitative trait loci analysis provided evidence for altered local genetic regulation of transcript abundance in AMI samples. On follow-up there were 31 cardiovascular deaths. A principal component (PC1) score capturing covariance of 238 genes that were differentially expressed between deceased and survivors in the discovery phase significantly predicted risk of cardiovascular death in the replication and combined samples (hazard ratio = 8.5, P < 0.0001) and improved the C-statistic (area under the curve 0.82 to 0.91, P = 0.03) after adjustment for traditional covariates.
A specific blood gene expression profile is associated with a significant risk of death in Caucasian subjects with CAD. This comprises a subset of transcripts that are also altered in expression during acute myocardial infarction.
PMCID: PMC4071233  PMID: 24971157
8.  SDS, a structural disruption score for assessment of missense variant deleteriousness 
We have developed a novel structure-based evaluation for missense variants that explicitly models protein structure and amino acid properties to predict the likelihood that a variant disrupts protein function. A structural disruption score (SDS) is introduced as a measure to depict the likelihood that a case variant is functional. The score is constructed using characteristics that distinguish between causal and neutral variants within a group of proteins. The SDS is correlated with standard sequence-based deleteriousness, but shows promise for improving discrimination between neutral and causal variants at less conserved sites. The prediction was performed on 3-dimensional structures of 57 gene products whose homozygous SNPs were identified as case-exclusive variants in an exome sequencing study of epilepsy disorders. We contrasted the candidate epilepsy variants with scores for likely benign variants found in the EVS database, and for positive control variants in the same genes that are suspected to promote a range of diseases. To derive a characteristic profile of damaging SNPs, we transformed continuous scores into categorical variables based on the score distribution of each measurement, collected from all possible SNPs in this protein set, where extreme measures were assumed to be deleterious. A second epilepsy dataset was used to replicate the findings. Causal variants tend to receive higher sequence-based deleterious scores, induce larger physico-chemical changes between amino acid pairs, locate in protein domains, buried sites or on conserved protein surface clusters, and cause protein destabilization, relative to negative controls. These measures were agglomerated for each variant. A list of nine high-priority putative functional variants for epilepsy was generated. Our newly developed SDS protocol facilitates SNP prioritization for experimental validation.
PMCID: PMC4001065  PMID: 24795746
non-synonymous single nucleotide polymorphism; missense mutation; protein structural analysis; structural disruption score; variant prioritization; epilepsy disorders
9.  Systems Genomics of Metabolic Phenotypes in Wild-Type Drosophila melanogaster 
Genetics  2014;197(2):781-793.
Systems biology is an approach to dissection of complex traits that explicitly recognizes the impact of genetic, physiological, and environmental interactions in the generation of phenotypic variation. We describe comprehensive transcriptional and metabolic profiling in Drosophila melanogaster across four diets, finding little overlap in modular architecture. Genotype and genotype-by-diet interactions are a major component of transcriptional variation (24 and 5.3% of the total variation, respectively) while there were no main effects of diet (<1%). Genotype was also a major contributor to metabolomic variation (16%), but in contrast to the transcriptome, diet had a large effect (9%) and the interaction effect was minor (2%) for the metabolome. Yet specific principal components of these molecular phenotypes measured in larvae are strongly correlated with particular metabolic syndrome-like phenotypes such as pupal weight, larval sugar content and triglyceride content, development time, and cardiac arrhythmia in adults. The second principal component of the metabolomic profile is especially informative across these traits with glycine identified as a key loading variable. To further relate this physiological variability to genotypic polymorphism, we performed evolve-and-resequence experiments, finding rapid and replicated changes in gene frequency across hundreds of loci that are specific to each diet. Adaptation to diet is thus highly polygenic. However, loci differentially transcribed across diet or previously identified by RNAi knockdown or expression QTL analysis were not the loci responding to dietary selection. Therefore, loci that respond to the selective pressures of diet cannot be readily predicted a priori from functional analyses.
PMCID: PMC4063932  PMID: 24671769
metabolic syndrome; metabolomics; evolve-and-resequence; genotype-by-environment; adaptation
10.  Comparative transcriptomics and metabolomics in a rhesus macaque drug administration study 
We describe a multi-omic approach to understanding the effects that the anti-malarial drug pyrimethamine has on immune physiology in rhesus macaques (Macaca mulatta). Whole blood and bone marrow (BM) RNA-Seq and plasma metabolome profiles (each with over 15,000 features) have been generated for five naïve individuals at up to seven timepoints before, during and after three rounds of drug administration. Linear modeling and Bayesian network analyses are both considered, alongside investigations of the impact of statistical modeling strategies on biological inference. Individual macaques were found to be a major source of variance for both omic data types, and factoring individuals into subsequent modeling increases power to detect temporal effects. A major component of the whole blood transcriptome follows the BM with a time-delay, while other components of variation are unique to each compartment. We demonstrate that pyrimethamine administration does impact both compartments throughout the experiment, but very limited perturbation of transcript or metabolite abundance was observed following each round of drug exposure. New insights into the mode of action of the drug are presented in the context of pyrimethamine's predicted effect on suppression of cell division and metabolism in the immune system.
PMCID: PMC4233942  PMID: 25453034
pyrimethamine; bone marrow; peripheral blood; axes of variation; bayesian network inference; principal component analysis (PCA)
11.  An association-adjusted consensus deleterious scheme to classify homozygous Mis-sense mutations for personal genome interpretation 
BioData Mining  2013;6:24.
Personal genome analysis is now being considered for evaluation of disease risk in healthy individuals, utilizing both rare and common variants. Multiple scores have been developed to predict the deleteriousness of amino acid substitutions, using information on the allele frequencies, level of evolutionary conservation, and averaged structural evidence. However, agreement among these scores is limited and they likely over-estimate the fraction of the genome that is deleterious.
This study proposes an integrative approach to identify a subset of homozygous non-synonymous single nucleotide polymorphisms (nsSNPs). An 8-level classification scheme is constructed from the presence/absence of deleterious predictions combined with evidence of association with disease or complex traits. Detailed literature searches and structural validations are then performed for a subset of 826 homozygous mis-sense mutations in 575 proteins found in the genomes of 12 healthy adults.
Implementation of the Association-Adjusted Consensus Deleterious Scheme (AACDS) classifies 11% of all predicted highly deleterious homozygous variants as most likely to influence disease risk. The number of such variants per genome ranges from 0 to 8 with no significant difference between African and Caucasian Americans. Detailed analysis of mutations affecting the APOE, MTMR2, THSB1, CHIA, αMyHC, and AMY2A proteins shows how the protein structure is likely to be disrupted, even though the associated phenotypes have not been documented in the corresponding individuals.
The classification system for homozygous nsSNPs provides an opportunity to systematically rank nsSNPs based on suggestive evidence from annotations and sequence-based predictions. The ranking scheme, in-depth literature searches, and structural validations of highly prioritized mis-sense mutations complement traditional sequence-based approaches and should have particular utility for the development of individualized health profiles. An online tool reporting the AACDS score for any variant is provided at the authors’ website.
PMCID: PMC3892026  PMID: 24365473
Homozygous variant; Non-synonymous single nucleotide polymorphism; Personal genome interpretation; Variant prioritization; Protein structure analysis
12.  TRF1 and TRF2 use different mechanisms to find telomeric DNA but share a novel mechanism to search for protein partners at telomeres 
Nucleic Acids Research  2013;42(4):2493-2504.
Human telomeres are maintained by the shelterin protein complex in which TRF1 and TRF2 bind directly to duplex telomeric DNA. How these proteins find telomeric sequences among a genome of billions of base pairs and how they find protein partners to form the shelterin complex remains uncertain. Using single-molecule fluorescence imaging of quantum dot-labeled TRF1 and TRF2, we study how these proteins locate TTAGGG repeats on DNA tightropes. By virtue of its basic domain TRF2 performs an extensive 1D search on nontelomeric DNA, whereas TRF1’s 1D search is limited. Unlike the stable and static associations observed for other proteins at specific binding sites, TRF proteins possess reduced binding stability marked by transient binding (∼9–17 s) and slow 1D diffusion on specific telomeric regions. These slow diffusion constants yield activation energy barriers to sliding ∼2.8–3.6 kBT greater than those for nontelomeric DNA. We propose that the TRF proteins use 1D sliding to find protein partners and assemble the shelterin complex, which in turn stabilizes the interaction with specific telomeric DNA. This ‘tag-team proofreading’ represents a more general mechanism to ensure a specific set of proteins interact with each other on long repetitive specific DNA sequences without requiring external energy sources.
PMCID: PMC3936710  PMID: 24271387
13.  From personalized to public health genomics 
Genome Medicine  2013;5(7):60.
PMCID: PMC3967116  PMID: 23876409
14.  Whole genome sequencing in support of wellness and health maintenance 
Genome Medicine  2013;5(6):58.
Whole genome sequencing is poised to revolutionize personalized medicine, providing the capacity to classify individuals into risk categories for a wide range of diseases. Here we begin to explore how whole genome sequencing (WGS) might be incorporated alongside traditional clinical evaluation as a part of preventive medicine. The present study illustrates novel approaches for integrating genotypic and clinical information for assessment of generalized health risks and to assist individuals in the promotion of wellness and maintenance of good health.
Whole genome sequences and longitudinal clinical profiles are described for eight middle-aged Caucasian participants (four men and four women) from the Center for Health Discovery and Well Being (CHDWB) at Emory University in Atlanta. We report multivariate genotypic risk assessments derived from common variants reported by genome-wide association studies (GWAS), as well as clinical measures in the domains of immune, metabolic, cardiovascular, musculoskeletal, respiratory, and mental health.
Polygenic risk is assessed for each participant for over 100 diseases and reported relative to baseline population prevalence. Two approaches for combining clinical and genetic profiles for the purposes of health assessment are then presented. First we propose conditioning individual disease risk assessments on observed clinical status for type 2 diabetes, coronary artery disease, hypertriglyceridemia and hypertension, and obesity. An approximate 2:1 ratio of concordance between genetic prediction and observed sub-clinical disease is observed. Subsequently, we show how more holistic combination of genetic, clinical and family history data can be achieved by visualizing risk in eight sub-classes of disease. Having identified where their profiles are broadly concordant or discordant, an individual can focus on individual clinical results or genotypes as they develop personalized health action plans in consultation with a health partner or coach.
The CHDWB will facilitate longitudinal evaluation of wellness-focused medical care based on comprehensive self-knowledge of medical risks.
PMCID: PMC3967117  PMID: 23806097
genetic prediction; risk assessment; preventive medicine; clinical profiling
15.  Effect of Normalization on Statistical and Biological Interpretation of Gene Expression Profiles 
Frontiers in Genetics  2013;3:160.
An under-appreciated aspect of the genetic analysis of gene expression is the impact of post-probe level normalization on biological inference. Here we contrast nine different methods for normalization of an Illumina bead-array gene expression profiling dataset consisting of peripheral blood samples from 189 individual participants in the Center for Health Discovery and Well Being study in Atlanta, quantifying differences in the inference of global variance components and covariance of gene expression, as well as the detection of variants that affect transcript abundance (eSNPs). The normalization strategies, all relative to raw log2 measures, include simple mean centering, two modes of transcript-level linear adjustment for technical factors, and for differential immune cell counts, variance normalization by interquartile range and by quantile, fitting the first 16 Principal Components, and supervised normalization using the SNM procedure with adjustment for cell counts. Robustness of genetic associations as a consequence of Pearson and Spearman rank correlation is also reported for each method, and it is shown that the normalization strategy has a far greater impact than correlation method. We describe similarities among methods, discuss the impact on biological interpretation, and make recommendations regarding appropriate strategies.
PMCID: PMC3668151  PMID: 23755061
microarray analysis; normalization; variance component analysis; eSNP
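Two of the simpler strategies named in entry 15, mean centering and variance normalization by interquartile range, can be sketched on log2-transformed values as follows. This is an illustration under assumptions, not the procedure used in the paper: the intensity values are hypothetical, the functions operate on a single list of values, and centering on the median before dividing by the IQR is one plausible reading of "variance normalization by interquartile range".

```python
import math
import statistics


def mean_center(values):
    """Subtract the mean so the values average to zero."""
    mean = statistics.fmean(values)
    return [v - mean for v in values]


def iqr_normalize(values):
    """Center on the median and scale so the interquartile range equals 1."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    if iqr == 0:
        raise ValueError("interquartile range is zero; cannot scale")
    return [(v - median) / iqr for v in values]


# Hypothetical raw probe intensities for one sample.
raw_intensities = [120.0, 250.0, 480.0, 1000.0, 2100.0, 4000.0]
log2_values = [math.log2(x) for x in raw_intensities]

centered = mean_center(log2_values)
scaled = iqr_normalize(log2_values)
print([round(v, 2) for v in centered])
print([round(v, 2) for v in scaled])
```

Mean centering removes only a location shift, whereas IQR scaling also equalizes spread, so the two can lead to different downstream variance-component estimates.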
16.  Congruence of Additive and Non-Additive Effects on Gene Expression Estimated from Pedigree and SNP Data 
PLoS Genetics  2013;9(5):e1003502.
There is increasing evidence that heritable variation in gene expression underlies genetic variation in susceptibility to disease. Therefore, a comprehensive understanding of the similarity between relatives for transcript variation is warranted, in particular the dissection of phenotypic variation into additive and non-additive genetic factors and shared environmental effects. We conducted a gene expression study in blood samples of 862 individuals from 312 nuclear families containing MZ or DZ twin pairs using both pedigree and genotype information. From a pedigree analysis we show that the vast majority of genetic variation across 17,994 probes is additive, although non-additive genetic variation is identified for 960 transcripts. For 180 of the 960 transcripts with non-additive genetic variation, we identify expression quantitative trait loci (eQTL) with dominance effects in a sample of 339 unrelated individuals and replicate 31% of these associations in an independent sample of 139 unrelated individuals. Over-dominance was detected and replicated for a trans association between rs12313805 and ETV6, located 4 Mb apart on chromosome 12. Surprisingly, only 17 probes exhibit significant levels of common environmental effects, suggesting that environmental and lifestyle factors common to a family do not affect expression variation for most transcripts, at least those measured in blood. Consistent with the genetic architecture of common diseases, gene expression is predominantly additive, but a minority of transcripts display non-additive effects.
Author Summary
Gene expression levels are known to influence common disease susceptibility in humans, with GWAS significant SNPs frequently found in regulatory regions. The expression levels of most genes are influenced by genetic variants, often located close to the gene itself. Expression Quantitative Trait Loci (eQTL) mapping studies have been very successful in identifying SNPs associated with expression levels; however, little is currently known about the extent of additive and non-additive genetic variance and the role of common environment on gene expression. Here we report a comprehensive study of the sources of genetic and non-genetic variation for gene expression levels using both pedigree and genotype information. We show that the majority of transcripts exhibit only additive genetic variance with congruence from independent methods using pedigree and genotype approaches. However, there are a small number of probes whose expression levels are influenced by non-additive genetic variance. For some of these probes we identify SNPs acting in a dominant and over-dominant manner that replicate in an independent sample. Surprisingly, only 17 probes exhibit significant levels of common environmental effects, suggesting that environmental and lifestyle factors common to a family do not affect expression variation for most transcripts, at least those measured in blood.
PMCID: PMC3656157  PMID: 23696747
17.  Blood-Informative Transcripts Define Nine Common Axes of Peripheral Blood Gene Expression 
PLoS Genetics  2013;9(3):e1003362.
We describe a novel approach to capturing the covariance structure of peripheral blood gene expression that relies on the identification of highly conserved Axes of variation. Starting with a comparison of microarray transcriptome profiles for a new dataset of 189 healthy adult participants in the Emory-Georgia Tech Center for Health Discovery and Well-Being (CHDWB) cohort, with a previously published study of 208 adult Moroccans, we identify nine Axes each with between 99 and 1,028 strongly co-regulated transcripts in common. Each axis is enriched for gene ontology categories related to sub-classes of blood and immune function, including T-cell and B-cell physiology and innate, adaptive, and anti-viral responses. Conservation of the Axes is demonstrated in each of five additional population-based gene expression profiling studies, one of which is robustly associated with Body Mass Index in the CHDWB as well as Finnish and Australian cohorts. Furthermore, ten tightly co-regulated genes can be used to define each Axis as “Blood Informative Transcripts” (BITs), generating scores that define an individual with respect to the represented immune activity and blood physiology. We show that environmental factors, including lifestyle differences in Morocco and infection leading to active or latent tuberculosis, significantly impact specific axes, but that there is also significant heritability for the Axis scores. In the context of personalized medicine, reanalysis of the longitudinal profile of one individual during and after infection with two respiratory viruses demonstrates that specific axes also characterize clinical incidents. This mode of analysis suggests the view that, rather than unique subsets of genes marking each class of disease, differential expression reflects movement along the major normal Axes in response to environmental and genetic stimuli.
Author Summary
Gene expression profiling of human tissues typically reveals a complex structure of co-regulation of gene expression that has yet to be explored with regard to the genetic and environmental sources of covariance or its implications for quantitative and clinical traits. Here we show that peripheral blood samples from multiple studies can be described by nine common axes of variation that collectively explain up to one half of all transcriptional variance in blood. Specific axes diverge according to environmental variables such as lifestyle and infectious disease exposure, but a strong genetic component to axis regulation is also inferred. As few as 10 “blood-informative transcripts” (BITs) can be used to define each axis and potentially classify individuals with respect to multiple aspects of their blood and immune function. The analysis of longitudinal profiles of one individual shows how these change relative to clinical shifts in metabolic profile following viral infection. The notion that gene expression diverges along genetic paths of least resistance defined by these axes has important implications for interpreting differential expression in case-control studies of disease.
PMCID: PMC3597511  PMID: 23516379
18.  Using Blood Informative Transcripts in Geographical Genomics: Impact of Lifestyle on Gene Expression in Fijians 
Frontiers in Genetics  2012;3:243.
In previous geographical genomics studies of the impact of lifestyle on gene expression inferred from microarray analysis of peripheral blood samples, we described the complex influences of culture, ethnicity, and gender in Morocco, and of pregnancy in Brisbane. Here we describe the use of nanofluidic Fluidigm quantitative RT-PCR arrays targeted at a set of 96 transcripts that are broadly informative of the major axes of immune gene expression, to explore the population structure of transcription in Fiji. As in Morocco, major differences are seen between the peripheral blood transcriptomes of rural villagers and residents of the capital city, Suva. The effect is much greater in Indian villages than in Melanesian highlanders and appears to be similar with respect to the nature of at least two axes of variation. Gender differences are much smaller than ethnicity or lifestyle effects. Body mass index is shown to associate with one of the axes as it does in Atlanta and Brisbane, establishing a link between the epidemiological transition of human metabolic disease, and gene expression profiles.
PMCID: PMC3494018  PMID: 23162571
epidemiological transition; gene expression profiling; body mass index; TLR signaling; axes of variation
19.  Guidelines for Genome-Wide Association Studies 
PLoS Genetics  2012;8(7):e1002812.
PMCID: PMC3390399  PMID: 22792080
20.  Circulating leukocyte telomere length is highly heritable among families of Arab descent 
BMC Medical Genetics  2012;13:38.
Telomere length, an indicator of ageing and longevity, has been correlated with several biomarkers of cardiometabolic disease in both Arab children and adults. It is not known, however, whether or not telomere length is a highly conserved inheritable trait in this homogeneous cohort, where age-related diseases are highly prevalent. As such, the aim of this study was to address the inheritability of telomere length in Saudi families and the impact of cardiometabolic disease biomarkers on telomere length.
A total of 119 randomly selected Saudi families (123 adults and 131 children) were included in this cross-sectional study. Anthropometrics were obtained and fasting blood samples were taken for routine analyses of fasting glucose and lipid profile. Leukocyte telomere length was determined using quantitative real time PCR.
Telomere length was highly heritable as assessed by a parent-offspring regression [h² = 0.64 (p = 0.0006)]. Telomere length was modestly associated with BMI (R² 0.07; p-value 0.0087), total cholesterol (R² 0.08; p-value 0.0033), and LDL-cholesterol (R² 0.15; p-value 3 × 10⁻⁵) after adjustments for gender, age and age within generation.
The high heritability of telomere length in Arab families, and the associations of telomere length with various cardiometabolic parameters suggest heritable genetic fetal and/or epigenetic influences on the early predisposition of Arab children to age-related diseases and accelerated ageing.
PMCID: PMC3458987  PMID: 22606980
Telomere length; Heritability; Arabs; Ageing
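As a rough illustration of the parent-offspring regression mentioned in the abstract above, the sketch below estimates heritability as the least-squares slope of offspring values on mid-parent values. The abstract does not say whether mid-parent or single-parent values were used, nor how covariates were handled, so the function name `midparent_heritability`, the mid-parent choice, and the synthetic data are all assumptions made for illustration. With a single parent as predictor, the slope would instead estimate half the heritability.

```python
import numpy as np


def midparent_heritability(midparent, offspring):
    """Slope of the least-squares regression of offspring on mid-parent values.

    Hypothetical helper: under the standard quantitative-genetic model this
    slope is an estimate of narrow-sense heritability (h^2).
    """
    x = np.asarray(midparent, dtype=float)
    y = np.asarray(offspring, dtype=float)
    x_centered = x - x.mean()
    # slope = cov(x, y) / var(x)
    return float(np.dot(x_centered, y - y.mean()) / np.dot(x_centered, x_centered))


# Synthetic example (not data from the study): simulated trait with slope 0.6.
rng = np.random.default_rng(0)
midparent = rng.normal(size=500)
offspring = 0.6 * midparent + rng.normal(scale=0.5, size=500)
h2_estimate = midparent_heritability(midparent, offspring)
print(f"estimated h^2 = {h2_estimate:.2f}")
```

In practice the trait would first be adjusted for covariates such as age and gender, as the abstract describes, before the regression is fitted.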
22.  Common genetic variation and performance on standardized cognitive tests 
One surprising feature of the recently completed waves of genome-wide association studies is the limited impact of common genetic variation in individually detectable polymorphisms on many human traits. This has been particularly pronounced for studies on psychiatric conditions, which have failed to produce clear, replicable associations for common variants. One popular explanation for these negative findings is that many of these traits may be genetically heterogeneous, leading to the idea that relevant endophenotypes may be more genetically tractable. Aspects of cognition may be the most important endophenotypes for psychiatric conditions such as schizophrenia, leading many researchers to pursue large-scale studies on the genetic contributors of cognitive performance in the normal population as a surrogate for aspects of liability to disease. Here, we perform a genome-wide association study with two tests of executive function, Digit Symbol and Stroop Color-Word, in 1086 healthy volunteers and with an expanded cognitive battery in 514 of these volunteers. We show that, consistent with published studies of the psychiatric conditions themselves, no single common variant has a large effect (explaining >4–8% of the population variation) on the performance of healthy individuals on standardized cognitive tests. Given that these are important endophenotypes, our work is consistent with the idea that identifying rare genetic causes of psychiatric conditions may be more important for future research than identifying genetically homogenous endophenotypes.
PMCID: PMC2987367  PMID: 20125193
endophenotypes; genome-wide association; cognition; psychiatric conditions; common variants
23.  Parent-Offspring Transmission of Adipocytokine Levels and Their Associations with Metabolic Traits 
PLoS ONE  2011;6(4):e18182.
Adipose tissue secreted cytokines (adipocytokines) have significant effects on the physiology and pathology of human metabolism relevant to diabetes and cardiovascular disease. We determined the relationship of the pattern of these circulating hormones with obesity-related phenotypes and whether such pattern is transmitted from parent to offspring. A combined total of 403 individuals from 156 consenting Saudi families divided into initial (119 families with 123 adults and 131 children) and replication (37 families with 58 adults and 91 children) cohorts were randomly selected from the RIYADH Cohort study. Anthropometrics were evaluated and metabolic measures such as fasting serum glucose, lipid profiles, insulin, leptin, adiponectin, resistin, tumor necrosis factor alpha (TNFα), activated plasminogen activator inhibitor 1 (aPAI1), high sensitivity C-reactive protein (hsCRP) and angiotensin II were also assessed. Parent-offspring regressions revealed that with the exception of hsCRP, all hormones measured showed evidence for significant inheritance. Principal component (PC) analysis of standardized hormone levels demonstrated surprising heritability of the three most common axes of variation. PC1, which explained 21% of the variation, was most strongly loaded on levels of leptin, TNFα, insulin, and aPAI1, and inversely with adiponectin. It was significantly associated with body mass index (BMI) and phenotypically stronger in children, and showed a heritability of ∼50%, after adjustment for age, gender and generational effects. We conclude that adipocytokines are highly heritable and their pattern of co-variation significantly influences BMI as early as the pre-teen years. Investigation at the genomic scale is required to determine the variants affecting the regulation of the hormones studied.
PMCID: PMC3070726  PMID: 21483749
24.  Maternal Influences on the Transmission of Leukocyte Gene Expression Profiles in Population Samples from Brisbane, Australia 
PLoS ONE  2010;5(12):e14479.
Two gene expression profiling studies designed to identify maternal influences on development of the neonate immune system and to address the population structure of the leukocyte transcriptome were carried out in Brisbane, Australia. In the first study, a comparison of 19 leukocyte samples obtained from mothers in the last three weeks of pregnancy with 37 umbilical cord blood samples documented differential expression of 7,382 probes at a false discovery rate of 1%, representing approximately half of the expressed transcriptome. An even larger component of the variation involving 8,432 probes, notably enriched for Vitamin E and methotrexate-responsive genes, distinguished two sets of individuals, with perfect transmission of the two profile types between each of 16 mother-child pairs in the study. A minor profile of variation was found to distinguish the gene expression profiles of obese mothers and children of gestational diabetic mothers from those of children born to obese mothers. The second study was of adult leukocyte profiles from a cross-section of Red Cross blood donors sampled throughout Brisbane. The first two axes in this study are related to the third and fourth axes of variation in the first study and also reflect variation in the abundance of CD4 and CD8 transcripts. One of the profiles associated with the third axis is largely excluded from samples from the central portion of the city. Despite enrichment of insulin signaling and aspects of central metabolism among the differentially expressed genes, there was little correlation between leukocyte expression profiles and body mass index overall. Our data are consistent with the notion that maternal health and cytokine milieu directly impact gene expression in fetal tissues, but that there is likely to be a complex interplay between cultural, genetic, and other environmental factors in the programming of gene expression in leukocytes of newborn children.
PMCID: PMC3013110  PMID: 21217831
25.  A genome-wide study of common SNPs and CNVs in cognitive performance in the CANTAB 
Human Molecular Genetics  2009;18(23):4650-4661.
Psychiatric disorders such as schizophrenia are commonly accompanied by cognitive impairments that are treatment resistant and crucial to functional outcome. There has been great interest in studying cognitive measures as endophenotypes for psychiatric disorders, with the hope that their genetic basis will be clearer. To investigate this, we performed a genome-wide association study involving 11 cognitive phenotypes from the Cambridge Neuropsychological Test Automated Battery. We showed these measures to be heritable by comparing the correlation in 100 monozygotic and 100 dizygotic twin pairs. The full battery was tested in ∼750 subjects, and for spatial and verbal recognition memory, we investigated a further 500 individuals to search for smaller genetic effects. We were unable to find any genome-wide significant associations with either SNPs or common copy number variants. Nor could we formally replicate any polymorphism that has been previously associated with cognition, although we found a weak signal of lower than expected P-values for variants in a set of 10 candidate genes. We additionally investigated SNPs in genomic loci that have been shown to harbor rare variants that associate with neuropsychiatric disorders, to see if they showed any suggestion of association when considered as a separate set. Only NRXN1 showed evidence of significant association with cognition. These results suggest that common genetic variation does not strongly influence cognition in healthy subjects and that cognitive measures do not represent a more tractable genetic trait than clinical endpoints such as schizophrenia. We discuss a possible role for rare variation in cognitive genomics.
PMCID: PMC2773267  PMID: 19734545
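The twin comparison described in the abstract above can be illustrated with Falconer's classical formula, h² = 2(r_MZ − r_DZ), where r_MZ and r_DZ are the trait correlations within monozygotic and dizygotic pairs. The abstract does not state which estimator the authors used, so this is only a sketch of one common approach; the function name `falconer_h2`, the use of Pearson rather than intraclass correlation, and the toy inputs are assumptions.

```python
import numpy as np


def falconer_h2(mz_pairs, dz_pairs):
    """Falconer's estimate of heritability from twin-pair correlations.

    Hypothetical helper. Each argument is a sequence of (twin_1, twin_2)
    trait values; Pearson correlation is used here for simplicity.
    """
    mz = np.asarray(mz_pairs, dtype=float)
    dz = np.asarray(dz_pairs, dtype=float)
    r_mz = np.corrcoef(mz[:, 0], mz[:, 1])[0, 1]
    r_dz = np.corrcoef(dz[:, 0], dz[:, 1])[0, 1]
    # MZ twins share all segregating variation and DZ twins about half,
    # so doubling the difference in correlations approximates h^2.
    return float(2.0 * (r_mz - r_dz))


# Toy inputs (not data from the study).
mz_example = [(1.0, 1.0), (2.0, 2.0), (3.0, 3.0)]
dz_example = [(1.0, 1.0), (2.0, 3.0), (3.0, 2.0)]
toy_h2 = falconer_h2(mz_example, dz_example)
```

An MZ correlation clearly higher than the DZ correlation is the pattern that indicates a heritable trait in this design.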

