1.  Pathway-based analysis of rare and common variants to test for association with blood pressure 
BMC Proceedings  2014;8(Suppl 1):S101.
Our goal is to test the effect of both rare and common variants in a blood pressure study. We use a pathway-based approach, gene-set enrichment analysis, to search for related genes affecting 4 phenotypes: systolic blood pressure, diastolic blood pressure, the difference between the two, and mean arterial pressure, which is a weighted linear combination of systolic and diastolic blood pressure. Using the real Genetic Analysis Workshop 18 data, we consider both rare and common variants in our analysis and incorporate other covariates by using a recently proposed test statistic.
Our study identified a commonly enriched gene set/pathway for the two derived phenotypes we analyzed: the difference between systolic and diastolic blood pressure and mean arterial pressure, but none is identified with the individual blood pressure phenotypes. The gene CD47, in the enriched gene pathway/set, was reported in previous studies to be related to blood pressure.
The findings are not surprising because the sample size we use in our analysis is small, and hence power to detect small but important effects is likely inadequate.
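The two derived phenotypes in this abstract can be computed directly from systolic and diastolic readings. The following is a minimal sketch, not the authors' code: the abstract does not state the weights used for mean arterial pressure, so the common clinical approximation (one third systolic, two thirds diastolic) is assumed here, and the function name is hypothetical.

```python
def derived_phenotypes(sbp, dbp):
    """Return (pulse pressure, mean arterial pressure) in mmHg."""
    # Difference between systolic and diastolic blood pressure.
    pulse_pressure = sbp - dbp
    # Assumed weighting: MAP = (1/3) * SBP + (2/3) * DBP.
    mean_arterial = dbp + pulse_pressure / 3.0
    return pulse_pressure, mean_arterial


pp, map_value = derived_phenotypes(120.0, 80.0)
print(pp, round(map_value, 2))  # prints 40.0 93.33
```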
PMCID: PMC4143690  PMID: 25519355
2.  Bivariate linear mixed model analysis to test joint associations of genetic variants on systolic and diastolic blood pressure 
BMC Proceedings  2014;8(Suppl 1):S75.
Genetic variants that predispose adults and the elderly to high blood pressure are largely unknown. We used a bivariate linear mixed model approach to jointly test the associations of common single-nucleotide polymorphisms with systolic and diastolic blood pressure using data from a genome-wide association study consisting of genetic variants from chromosomes 3 and 9 and longitudinally measured phenotypes and environment variables from unrelated individuals of Mexican American ethnicity provided by the Genetic Analysis Workshop 18. Despite the small sample size of a maximum of 131 unrelated subjects, a few single-nucleotide polymorphisms appeared significant at the genome-wide level. Simulated data, which were also provided by the Genetic Analysis Workshop 18 organizers, showed higher power of the bivariate approach over univariate analysis to detect the association of a selected single-nucleotide polymorphism with modest effect. This suggests that the bivariate approach to longitudinal data of jointly measured and correlated phenotypes can be a useful strategy to identify candidate single-nucleotide polymorphisms that deserve further investigation.
PMCID: PMC4143667  PMID: 25519403
3.  Entropy-based method for assessing the influence of genetic markers and covariates on hypertension: application to Genetic Analysis Workshop 18 data 
BMC Proceedings  2014;8(Suppl 1):S97.
Many complex diseases are related to genetics, and it is of great interest to evaluate the association between single-nucleotide polymorphisms (SNPs) and disease outcome. The association of genetics with outcome can be modified by covariates such as age, sex, smoking status, and membership in the same pedigree. In this paper, we propose a block entropy method to separate two classes of SNPs, for which the association with hypertension is either sensitive or insensitive to the covariates. We also propose a consistency entropy method to further reduce the number of SNPs that might be associated with the outcome. Based on the data provided by the organizers of Genetic Analysis Workshop 18, we calculated the block entropies for six different blocking strategies. Using block entropy and consistency entropy, we identified 230 SNPs on chromosome 9 that are most likely to be associated with the outcome and whose associations with hypertension are sensitive to the covariates.
PMCID: PMC4143731  PMID: 25519419
4.  Testing for associations between systolic blood pressure and single-nucleotide polymorphism profiles obtained from sparse principal component analysis 
BMC Proceedings  2014;8(Suppl 1):S95.
Background: Hypertension is a prevalent condition linked to major cardiovascular conditions and multiple other comorbidities. Genetic information can offer a deeper understanding about susceptibility and the underlying disease mechanisms. The Genetic Analysis Workshop 18 (GAW18) provides abundant genotype data to determine genetic associations for being hypertensive and for the underlying trait of systolic blood pressure (SBP). The high-dimensional nature of this data promotes dimension reduction techniques to remove excess noise and also synthesize genetic information for complex, polygenic traits. Methods: For both measured and simulated phenotype data from GAW18, we use sparse principal component analysis to obtain sparse genetic profiles that represent the underlying data structures. We then detect associations between the obtained sparse principal components (PCs) and SBP, a major indicator of hypertension, following up by investigating the sparse PCs for genetic structure to gain insight into new patterns. Results: After adjusting for multiple testing, 27 of 122 PCs were significantly associated with measured SBP, offering a large number of components to investigate. Considering the top 3 PCs, linked genetic regions have been identified; these may act in unison while associated with SBP. Simulated data offered similar results. Conclusions: Sparse PCs can offer a new data-driven approach to structuring genotype data and understanding the genetic mechanics behind complex, polygenic traits such as hypertension.
PMCID: PMC4143624  PMID: 25519350
5.  Analysis of baseline, average, and longitudinally measured blood pressure data using linear mixed models 
BMC Proceedings  2014;8(Suppl 1):S80.
This article compares baseline, average, and longitudinal data analysis methods for identifying genetic variants in a genome-wide association study using the Genetic Analysis Workshop 18 data. We apply methods that include (a) linear mixed models with baseline measures, (b) random intercept linear mixed models with mean measures as the outcome, and (c) random intercept linear mixed models with longitudinal measurements. In the linear mixed models, covariates are included as fixed effects, whereas relatedness among individuals is incorporated as the variance-covariance structure of the random effect for the individuals. The overall strategy of applying linear mixed models to decorrelate the data is based on Aulchenko et al.'s GRAMMAR. By analyzing systolic and diastolic blood pressure, which are used separately as outcomes, we compare the 3 methods in identifying a known genetic variant that is associated with blood pressure from chromosome 3 and simulated phenotype data. We also analyze the real phenotype data to illustrate the methods. We conclude that the linear mixed model with longitudinal measurements of diastolic blood pressure is the most accurate at identifying the known single-nucleotide polymorphism among the methods, but linear mixed models with baseline measures perform best with systolic blood pressure as the outcome.
PMCID: PMC4143715  PMID: 25519409
6.  Gene-based analysis of rare and common variants to determine association with blood pressure 
BMC Proceedings  2014;8(Suppl 1):S46.
Systolic blood pressure and diastolic blood pressure are known risk factors for cardiovascular diseases, and understanding their genetic basis will have important public health implications. For rare variants, it is extremely challenging to make statistical inference from single-marker tests. Therefore, joint analysis of a set of variants has been proposed. In this paper, we applied the recently proposed methods "test for testing the effect of an optimally weighted combination of variants" and "variable weight-TOW" to determine genetic regions that are associated with blood pressure. The least absolute shrinkage and selection operator and sparse partial least squares methods were then used to identify significant markers within a gene or in intergenic regions. We investigated the effect of rare variants and common variants, and their combined effect.
PMCID: PMC4143676  PMID: 25519387
7.  Genetic Analysis Workshop 18: Methods and strategies for analyzing human sequence and phenotype data in members of extended pedigrees 
BMC Proceedings  2014;8(Suppl 1):S1.
Genetic Analysis Workshop 18 provided a platform for developing and evaluating statistical methods to analyze whole-genome sequence data from a pedigree-based sample. In this article we present an overview of the data sets and the contributions that analyzed these data. The family data, donated by the Type 2 Diabetes Genetic Exploration by Next-Generation Sequencing in Ethnic Samples Consortium, included sequence-level genotypes based on sequencing and imputation, genome-wide association genotypes from prior genotyping arrays, and phenotypes from longitudinal assessments. The contributions from individual research groups were extensively discussed before, during, and after the workshop in theme-based discussion groups before being submitted for publication.
PMCID: PMC4143625  PMID: 25519310
8.  Using a latent growth curve model for an integrative assessment of the effects of genetic and environmental factors on multiple phenotypes 
BMC Proceedings  2009;3(Suppl 7):S44.
We propose the use of a latent growth curve model to assess the influence of genetic, environmental, demographic, and lifestyle factors on multiple phenotypes related to coronary heart disease. We model four quantitative traits (systolic blood pressure, high-density lipoprotein, low-density lipoprotein, and triglycerides) simultaneously in a multivariate framework that allows us to study their change over time, assess individual variation, and investigate cross-phenotype relationships. Environmental, demographic, and lifestyle covariates are included at different levels of the model as time-varying or time-invariant, as appropriate. To investigate the change over time attributed to genetic factors, we use candidate markers that have previously been shown to be associated with the quantitative traits. We illustrate our approach using independent observations from the offspring cohort of the Framingham Heart Study data.
PMCID: PMC2795943  PMID: 20018036
9.  Using a higher criticism statistic to detect modest effects in a genome-wide study of rheumatoid arthritis 
BMC Proceedings  2009;3(Suppl 7):S40.
In high-dimensional studies such as genome-wide association studies, the correction for multiple testing in order to control total type I error results in decreased power to detect modest effects. We present a new analytical approach based on the higher criticism statistic that allows identification of the presence of modest effects. We apply our method to the genome-wide study of rheumatoid arthritis provided in the Genetic Analysis Workshop 16 Problem 1 data set. There is evidence for unknown bias in this study that could be explained by the presence of undetected modest effects. We compared the asymptotic and empirical thresholds for the higher criticism statistic. Using the asymptotic threshold, we detected the presence of modest effects genome-wide. We also detected modest effects using the 90th percentile of the empirical null distribution as a threshold; however, there is no such evidence when the 95th and 99th percentiles were used. While the higher criticism method suggests that there is some evidence for modest effects, interpreting individual single-nucleotide polymorphisms with significant higher criticism statistics is of undetermined value. The goal of higher criticism is to alert the researcher that genetic effects remain to be discovered and to promote the use of more targeted and powerful studies to detect the remaining effects.
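To make the higher criticism idea concrete, here is a minimal sketch of one commonly used form of the statistic, computed from a list of p-values. The abstract does not specify which variant the authors used, so the formula, the `alpha0` cutoff of 0.5, and the toy p-values below are all assumptions for illustration.

```python
import math


def higher_criticism(p_values, alpha0=0.5):
    """Higher criticism over the smallest alpha0 fraction of sorted p-values.

    Each sorted p-value p_(i) is compared with its expected rank fraction i/n
    under the global null, standardized by the binomial standard deviation.
    """
    p_sorted = sorted(p_values)
    n = len(p_sorted)
    limit = max(1, int(alpha0 * n))
    best = -math.inf
    for i, p in enumerate(p_sorted[:limit], start=1):
        if p <= 0.0 or p >= 1.0:
            continue  # the standardized term is undefined at 0 or 1
        z = math.sqrt(n) * (i / n - p) / math.sqrt(p * (1.0 - p))
        best = max(best, z)
    return best


# Toy data: evenly spread p-values, then the same list with five tiny p-values.
null_p = [(i - 0.5) / 100 for i in range(1, 101)]
signal_p = [1e-6] * 5 + null_p[5:]
hc_null = higher_criticism(null_p)
hc_signal = higher_criticism(signal_p)
print(hc_null < hc_signal)  # prints True
```

A large value indicates that there are more small p-values than the global null would predict, without saying which individual tests are responsible.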
PMCID: PMC2795939  PMID: 20018032
10.  Pathway-based analysis of a genome-wide case-control association study of rheumatoid arthritis 
BMC Proceedings  2009;3(Suppl 7):S128.
Evaluation of the association between single-nucleotide polymorphisms (SNPs) and disease outcomes is widely used to identify genetic risk factors for complex diseases. Although this analysis paradigm has made significant progress in many genetic studies, many challenges remain, such as the requirement of a large sample size to achieve adequate power. Here we use rheumatoid arthritis (RA) as an example and explore a new analysis strategy: pathway-based analysis to search for related genes and SNPs contributing to the disease.
We first propose the application of a measure of explained variation to quantify the predictive ability of a given SNP. We then use gene set enrichment analysis to evaluate enrichment of specific pathways, where pathways are considered enriched if they consist of genes that are associated with the phenotype of interest above and beyond what is expected by chance. The results are also compared with score tests for association analysis by adjusting for population stratification.
Our study identified some significantly enriched pathways, such as "cell adhesion molecules," which are known to play a key role in RA. Our results showed that pathway-based analysis may identify other biologically interesting loci (e.g., rs1018361) related to RA: the gene (CTLA4) closest to this marker has previously been shown to be associated with RA and the gene is in the significant pathways we identified, even though the marker has not reached genome-wide significance in univariate single-marker analysis.
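The enrichment step described in this abstract can be illustrated with a small sketch. It computes an unweighted, Kolmogorov-Smirnov-style running-sum enrichment score over a ranked gene list; the abstract ranks genes using a measure of explained variation and does not give the exact scoring formula, so the unweighted form, the function name, and the placeholder gene names are assumptions. Significance would normally be assessed by permutation, which is omitted here.

```python
def enrichment_score(ranked_genes, gene_set):
    """Return the running-sum deviation from zero with the largest magnitude.

    ranked_genes: genes ordered from strongest to weakest association.
    gene_set: the pathway being tested.
    """
    hits = [gene in gene_set for gene in ranked_genes]
    n_hit = sum(hits)
    n_miss = len(ranked_genes) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("gene set must contain some, but not all, ranked genes")
    running, best = 0.0, 0.0
    for is_hit in hits:
        # Step up on pathway members, step down on all other genes.
        running += 1.0 / n_hit if is_hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


ranked = ["geneA", "geneB", "geneC", "geneD", "geneE", "geneF"]
top_score = enrichment_score(ranked, {"geneA", "geneB"})
bottom_score = enrichment_score(ranked, {"geneE", "geneF"})
print(top_score, bottom_score)  # prints 1.0 -1.0
```

A pathway whose genes cluster at the top of the ranking gets a score near 1, while one whose genes cluster at the bottom gets a score near -1.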
PMCID: PMC2795901  PMID: 20017994
11.  Genome-wide association analysis of cardiovascular-related quantitative traits in the Framingham Heart Study 
BMC Proceedings  2009;3(Suppl 7):S117.
Multivariate linear growth curves were used to model high-density lipoprotein (HDL), low-density lipoprotein (LDL), triglycerides (TG), and systolic blood pressure (SBP) measured during four exams from 1659 independent individuals from the Framingham Heart Study. The slopes and intercepts from each of two phenotype models were tested for association with 348,053 autosomal single-nucleotide polymorphisms (SNPs) from the Affymetrix GeneChip 500K set. Three regions were associated with LDL intercept, TG slope, and SBP intercept (p < 1.44 × 10⁻⁷). We observed results consistent with previously reported associations between rs599839, on chromosome 1p13, and LDL. We note that the association is significant with LDL intercept but not slope. Markers on chromosome 17q25 were associated with TG slope, and a single-nucleotide polymorphism on chromosome 7p11 was associated with SBP intercept. Growth curve models can be used to gain more insight on the relationships between SNPs and traits than traditional association analysis when longitudinal data has been collected. The power to detect association with changes over time may be limited if the subjects are not followed over a long enough time period.
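The abstract does not say how its significance threshold was derived. As a quick arithmetic check, the reported value matches a Bonferroni correction of a 0.05 family-wise error rate across the 348,053 SNPs tested; treating it as a Bonferroni threshold is an inference from the numbers, not something the abstract states.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


# 0.05 is an assumed family-wise level; 348,053 is the SNP count from the abstract.
threshold = bonferroni_threshold(0.05, 348_053)
print(f"{threshold:.2e}")  # prints 1.44e-07
```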
PMCID: PMC2795889  PMID: 20017982
13.  Impact of normalization and filtering on linkage analysis of gene expression data 
BMC Proceedings  2007;1(Suppl 1):S150.
Using the Problem 1 data set made available for Genetic Analysis Workshop 15, we assessed sensitivity of linkage results to a correlation-based feature extraction method as well as to different normalization procedures applied to the raw Affymetrix gene expression microarray data. The impact of these procedures on heritability estimates and on expression quantitative trait loci is investigated. The filtering algorithm we propose in this paper ranks genes based on the total absolute correlation of each gene with all other genes on the array and has the potential to extract features that may play a role in functional pathways and gene networks. Our results showed that the normalization and filtering algorithms can have a profound influence on genetic analysis of gene expression data.
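The filtering rule described above, ranking each gene by its total absolute correlation with all other genes, can be sketched as follows. This is an illustration only: the abstract does not name the correlation measure, so Pearson correlation is assumed, and the function names and toy expression values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def rank_by_total_abs_correlation(expression):
    """Order genes by the sum of |r| with every other gene, largest first.

    expression: dict mapping gene name -> list of expression values per array.
    """
    genes = list(expression)
    totals = {gene: 0.0 for gene in genes}
    for i, gene in enumerate(genes):
        for other in genes[i + 1:]:
            r = abs(pearson(expression[gene], expression[other]))
            totals[gene] += r
            totals[other] += r
    return sorted(genes, key=lambda gene: totals[gene], reverse=True)


toy_expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],    # perfectly correlated with geneA
    "geneC": [4.0, 3.0, 2.0, 1.0],    # perfectly anti-correlated with geneA
    "geneD": [1.0, -1.0, -1.0, 1.0],  # uncorrelated with the other three
}
ranking = rank_by_total_abs_correlation(toy_expression)
print(ranking[-1])  # prints geneD
```

Using the absolute value means that strongly anti-correlated genes count as connected, so only genes unrelated to the rest of the array fall to the bottom of the ranking.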
PMCID: PMC2367572  PMID: 18466495
14.  Identifying cis- and trans-acting single-nucleotide polymorphisms controlling lymphocyte gene expression in humans 
BMC Proceedings  2007;1(Suppl 1):S7.
Assuming multiple loci play a role in regulating the expression level of a single phenotype, we propose a new approach to identify cis- and trans-acting loci that regulate gene expression. Using the Problem 1 data set made available for Genetic Analysis Workshop 15 (GAW15), we identified many expression phenotypes that have significant evidence of association and linkage to one or more chromosomal regions. In particular, six of ten phenotypes that we found to be regulated by cis- and trans-acting loci were also mapped by a previous analysis of these data in which a total of 27 phenotypes were identified with expression levels regulated by cis-acting determinants. However, in general, the p-values associated with these regulators identified in our study were larger than in that previous analysis, since we had also identified other factors regulating expression. In fact, we found that most of the gene expression phenotypes are influenced by at least one trans-acting locus. Our study also shows that much of the observable heritability in the phenotypes could be explained by simple single-nucleotide polymorphism associations; residual heritability was reduced and the remaining heritability may represent complex regulation systems with interactions or noise.
PMCID: PMC2367558  PMID: 18466571
16.  Genome-wide sparse canonical correlation of gene expression with genotypes 
BMC Proceedings  2007;1(Suppl 1):S119.
There is a growing interest in studying natural variation in human gene expression. Studies mapping genetic determinants of expression profiles are often carried out considering the expression of one gene at a time, an approach that is computationally intensive and may be prone to high false-discovery rate because the number of genes under consideration often exceeds tens of thousands. We present an exploratory method for investigating such data and apply it to the data provided as Problem 1 of Genetic Analysis Workshop 15 (GAW15). In multivariate analysis, canonical correlation analysis is a common way to inspect the relationship between two sets of variables based on their correlation. It determines linear combinations of all variables from each data set such that the correlation between the two linear combinations is maximized. However, due to the large number of genes, linear combinations involving all single-nucleotide polymorphism (SNP) loci and gene expression phenotypes lack biological plausibility and interpretability. We introduce sparse canonical correlation analysis, which examines the relationships of many genetic loci and gene expression phenotypes by providing sparse linear combinations that include only a small subset of loci and gene expression phenotypes. These correlated sets of variables are sufficiently small for biological interpretability and further investigation. Applying this method to the GAW15 Problem 1 data, we identified groups of 41 loci and 150 gene expressions with the highest between-group correlation of 43%.
PMCID: PMC2367499  PMID: 18466460
17.  Sex, age and generation effects on genome-wide linkage analysis of gene expression in transformed lymphoblasts 
BMC Proceedings  2007;1(Suppl 1):S92.
Many traits differ by age and sex in humans, but genetic analysis of gene expression has typically not included them in the analysis.
We used Genetic Analysis Workshop 15 Problem 1 data to determine whether gene expression in lymphoblasts showed differences by age and/or sex using generalized estimating equations (GEE). We performed quantitative trait linkage analysis of these genes including age and sex as covariates to determine whether the linkage results changed when they were included as covariates. Because the families included in the study all contain three generations, we also determined what effect inclusion of generation in the model had on the age effects.
When controlling the false-discovery rate at 1%, using GEE we identified 30 transcripts that showed significant differences in expression by sex, while 1950 transcripts showed differences in expression associated with age. When subjected to linkage analysis, there were 37 linkages that disappeared, while 17 appeared when sex was included as a covariate. All these genes were, as expected, on the sex chromosomes. In contrast, when age was included in the linkage analysis, 462 linkage signals were no longer significant, while 223 became significant. When generation was included in the model with age, all but 6 of the GEE age effects were no longer significant. However, there were minimal changes in the linkage results.
The effect of age on linkage analyses was apparent for the expression of many genes and appears to be mostly due to differences between the generations.
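Controlling the false-discovery rate at 1%, as this abstract describes, is often done with the Benjamini-Hochberg step-up procedure. The abstract does not name the procedure the authors used, so the sketch below is an assumption, and the p-values in it are made up for illustration.

```python
def benjamini_hochberg(p_values, fdr=0.01):
    """Return the sorted indices of hypotheses rejected at the given FDR level."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        # Keep the largest rank whose p-value is under its step-up threshold.
        if p_values[idx] <= fdr * rank / m:
            cutoff_rank = rank
    return sorted(order[:cutoff_rank])


toy_p = [0.0001, 0.5, 0.003, 0.9, 0.004]
rejected = benjamini_hochberg(toy_p, fdr=0.01)
print(rejected)  # prints [0, 2, 4]
```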
PMCID: PMC2367486  PMID: 18466596

