1.  Data for Genetic Analysis Workshop 18: human whole genome sequence, blood pressure, and simulated phenotypes in extended pedigrees 
BMC Proceedings  2014;8(Suppl 1):S2.
Genetic Analysis Workshop 18 (GAW18) focused on identification of genes and functional variants that influence complex phenotypes in human sequence data. Data for the workshop were donated by the T2D-GENES Consortium and included whole genome sequences for odd-numbered autosomes in 464 key individuals selected from 20 Mexican American families, a dense set of single-nucleotide polymorphisms in 959 individuals in these families, and longitudinal data on systolic and diastolic blood pressure measured at 1 to 4 examinations over a period of 20 years. Simulated phenotypes were generated based on the real sequence data and pedigree structures. In the design of the simulation model, gene expression measures from the San Antonio Family Heart Study (not distributed as part of the GAW18 data) were used to identify genes whose mRNA levels were correlated with blood pressure. Observed variants within these genes were designated as functional in the GAW18 simulation if they were nonsynonymous and predicted to have deleterious effects on protein function or if they were noncoding and associated with mRNA levels. Two simulated longitudinal phenotypes were modeled to have the same trait distributions as the real systolic and diastolic blood pressure data, with effects of age, sex, and medication use, including a genotype-medication interaction. For each phenotype, more than 1000 sequence variants in more than 200 genes present on the odd-numbered autosomes individually explained from less than 0.01% to 2.78% of phenotypic variance. Cumulatively, variants in the most influential gene explained 7.79% of trait variance. An additional simulated phenotype, Q1, was designed to be correlated among family members but not to be associated with any sequence variants. Two hundred replicates of the phenotypes were simulated, with each including data for 849 individuals.
2.  Evaluation of estimated genetic values and their application to genome-wide investigation of systolic blood pressure 
BMC Proceedings  2014;8(Suppl 1):S66.
The concept of breeding values, an individual's phenotypic deviation from the population mean as a result of the sum of the average effects of the genes they carry, is of great importance in livestock, aquaculture, and cash crop industries where emphasis is placed on an individual's potential to pass desirable phenotypes on to the next generation. As breeding or genetic values (as referred to here) cannot be measured directly, estimated genetic values (EGVs) are based on an individual's own phenotype, phenotype information from relatives, and, increasingly, genetic data. Because EGVs represent additive genetic variation, calculating EGVs in an extended human pedigree is expected to provide a more refined phenotype for genetic analyses. To test the utility of EGVs in genome-wide association, EGVs were calculated for 847 members of 20 extended Mexican American families based on 100 replicates of simulated systolic blood pressure. Calculations were performed in GAUSS to solve a variation on the standard Best Linear Unbiased Predictor (BLUP) mixed model equation with age, sex, and the first 3 principal components of sample-wide genetic variability as fixed effects and the EGV as a random effect distributed around the relationship matrix. Three methods of calculating kinship were considered: expected kinship from pedigree relationships, empirical kinship from common variants, and empirical kinship from both rare and common variants. Genome-wide association analysis was conducted on simulated phenotypes and EGVs using the additive measured genotype approach in the SOLAR software package. The EGV-based approach showed only minimal improvement in power to detect causative loci.
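As an illustration of the kind of calculation described in this abstract, the following is a minimal sketch of the standard Henderson mixed model equations for BLUP. It is not the authors' GAUSS code or their variation on the model. The function name `blup_mme`, the three-person toy pedigree, the phenotype values, and the assumed heritability of 0.3 are all hypothetical, and the variance ratio is treated as known rather than estimated.

```python
import numpy as np

def blup_mme(y, X, Z, A, lam):
    """Solve Henderson's mixed model equations for y = Xb + Zu + e.

    u ~ N(0, A * var_g), e ~ N(0, I * var_e), lam = var_e / var_g.
    Returns (fixed-effect estimates, predicted genetic values).
    """
    A_inv = np.linalg.inv(A)
    # Left-hand side: fixed and random blocks, with the relationship
    # matrix entering through lam * A^-1 in the random-effect block.
    lhs = np.block([[X.T @ X, X.T @ Z],
                    [Z.T @ X, Z.T @ Z + lam * A_inv]])
    rhs = np.concatenate([X.T @ y, Z.T @ y])
    sol = np.linalg.solve(lhs, rhs)
    p = X.shape[1]
    return sol[:p], sol[p:]

# Hypothetical toy data: two unrelated parents and their child.
# A holds expected additive relationships (twice the kinship).
A = np.array([[1.0, 0.0, 0.5],
              [0.0, 1.0, 0.5],
              [0.5, 0.5, 1.0]])
y = np.array([130.0, 118.0, 125.0])   # made-up blood pressure values
X = np.ones((3, 1))                   # intercept only; covariates omitted
Z = np.eye(3)                         # one record per individual
h2 = 0.3                              # assumed heritability
lam = (1.0 - h2) / h2                 # var_e / var_g

b_hat, u_hat = blup_mme(y, X, Z, A, lam)
print("mean:", b_hat, "estimated genetic values:", u_hat)
```

In this sketch, swapping `A` for an empirical relationship matrix estimated from genotypes would correspond to the empirical-kinship variants mentioned in the abstract; additional columns in `X` would carry age, sex, and principal components.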
3.  Genetic Analysis Workshop 18: Methods and strategies for analyzing human sequence and phenotype data in members of extended pedigrees 
BMC Proceedings  2014;8(Suppl 1):S1.
Genetic Analysis Workshop 18 provided a platform for developing and evaluating statistical methods to analyze whole-genome sequence data from a pedigree-based sample. In this article we present an overview of the data sets and the contributions that analyzed these data. The family data, donated by the Type 2 Diabetes Genetic Exploration by Next-Generation Sequencing in Ethnic Samples Consortium, included sequence-level genotypes based on sequencing and imputation, genome-wide association genotypes from prior genotyping arrays, and phenotypes from longitudinal assessments. The contributions from individual research groups were extensively discussed before, during, and after the workshop in theme-based discussion groups before being submitted for publication.
4.  Identifying rare variants from exome scans: the GAW17 experience 
BMC Proceedings  2011;5(Suppl 9):S1.
Genetic Analysis Workshop 17 (GAW17) provided a platform for evaluating existing statistical genetic methods and for developing novel methods to analyze rare variants that modulate complex traits. In this article, we present an overview of the 1000 Genomes Project exome data and simulated phenotype data that were distributed to GAW17 participants for analyses, the different issues addressed by the participants, and the process of preparation of manuscripts resulting from the discussions during the workshop.
5.  Genetic Analysis Workshop 17 mini-exome simulation 
BMC Proceedings  2011;5(Suppl 9):S2.
The data set simulated for Genetic Analysis Workshop 17 was designed to mimic a subset of data that might be produced in a full exome screen for a complex disorder and related risk factors in order to permit workshop participants to investigate issues of study design and statistical genetic analysis. Real sequence data from the 1000 Genomes Project formed the basis for simulating a common disease trait with a prevalence of 30% and three related quantitative risk factors in a sample of 697 unrelated individuals and a second sample of 697 individuals in large, extended pedigrees. Called genotypes for 24,487 autosomal markers assigned to 3,205 genes and simulated affection status, quantitative traits, age, sex, pedigree relationships, and cigarette smoking were provided to workshop participants. The simulating model included both common and rare variants with minor allele frequencies ranging from 0.07% to 25.8% and a wide range of effect sizes for these variants. Genotype-smoking interaction effects were included for variants in one gene. Functional variants were concentrated in genes selected from specific biological pathways and were selected on the basis of the predicted deleteriousness of the coding change. For each sample, unrelated individuals and family, 200 replicates of the phenotypes were simulated.
6.  Genetic signal maximization using environmental regression 
BMC Proceedings  2011;5(Suppl 9):S72.
Joint analyses of correlated phenotypes in genetic epidemiology studies are common. However, these analyses primarily focus on genetic correlation between traits and do not take into account environmental correlation. We describe a method that optimizes the genetic signal by accounting for stochastic environmental noise through joint analysis of a discrete trait and a correlated quantitative marker. We conducted bivariate analyses where heritability and the environmental correlation between the discrete and quantitative traits were calculated using Genetic Analysis Workshop 17 (GAW17) family data. The resulting inverse value of the environmental correlation between these traits was then used to determine a new β coefficient for each quantitative trait and was constrained in a univariate model. We conducted genetic association tests on 7,087 nonsynonymous SNPs in three GAW17 family replicates for Affected status with the β coefficient fixed for three quantitative phenotypes and compared these to an association model where the β coefficient was allowed to vary. Bivariate environmental correlations were 0.64 (± 0.09) for Q1, 0.798 (± 0.076) for Q2, and −0.169 (± 0.18) for Q4. Heritability of Affected status improved in each univariate model where a constrained β coefficient was used to account for stochastic environmental effects. No genome-wide significant associations were identified for either method but we demonstrated that constraining β for covariates slightly improved the genetic signal for Affected status. This environmental regression approach allows for increased heritability when the β coefficient for a highly correlated quantitative covariate is constrained and increases the genetic signal for the discrete trait.
7.  Do rare variant genotypes predict common variant genotypes? 
BMC Proceedings  2011;5(Suppl 9):S87.
The synthetic association hypothesis proposes that common genetic variants detectable in genome-wide association studies may reflect the net phenotypic effect of multiple rare polymorphisms distributed broadly within the focal gene rather than, as often assumed, the effect of common functional variants in high linkage disequilibrium with the focal marker. In a recent study, Dickson and colleagues demonstrated synthetic association in simulations and in two well-characterized, highly polymorphic human disease genes. The converse of this hypothesis is that rare variant genotypes must be correlated with common variant genotypes often enough to make the phenomenon of synthetic association possible. Here we used the exome genotype data provided for Genetic Analysis Workshop 17 to ask how often, how well, and under what conditions rare variant genotypes predict the genotypes of common variants within the same gene. We found nominal evidence of correlation between rare and common variants in 21-30% of cases examined for unrelated individuals; this rate increased to 38-44% for related individuals, underscoring the segregation that underlies synthetic association.
8.  Toward the identification of causal genes in complex diseases: a gene-centric joint test of significance combining genomic and transcriptomic data 
BMC Proceedings  2009;3(Suppl 7):S92.
Gene identification using linkage, association, or genome-wide expression is often underpowered. We propose that formal combination of information from multiple gene-identification approaches may lead to the identification of novel loci that are missed when only one form of information is available.
First, we analyze the Genetic Analysis Workshop 16 Framingham Heart Study Problem 2 genome-wide association data for high-density lipoprotein cholesterol (HDL-C) using a "gene-centric" approach. We then formally combine the association test results with genome-wide transcriptional profiling data for HDL-C from the San Antonio Family Heart Study using a Z-transform test (Stouffer's method).
We identified 39 genes by the joint test at a conservative 1% false-discovery rate, including 9 from the significant gene-based association test and 23 whose expression was significantly correlated with HDL-C. Seven genes identified as significant in the joint test were not independently identified by either the association or expression tests.
This combined approach has increased power and leads to the direct nomination of novel candidate genes likely to be involved in the determination of HDL-C levels. Such information can then be used as justification for a more exhaustive search for functional sequence variation within the nominated genes. We anticipate that this type of analysis will improve our speed of identification of regulatory genes causally involved in disease risk.
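The Z-transform (Stouffer) combination named in this abstract can be sketched as follows. This is a generic illustration of the method, not the authors' pipeline: the function name `stouffer_combine`, the use of one-sided p-values, and the equal default weights are assumptions, and the example p-values are made up.

```python
from math import sqrt
from statistics import NormalDist

def stouffer_combine(p_values, weights=None):
    """Combine one-sided p-values with Stouffer's Z-transform method.

    Each p-value must lie strictly between 0 and 1.
    Returns (combined Z-score, combined one-sided p-value).
    """
    if weights is None:
        weights = [1.0] * len(p_values)
    nd = NormalDist()
    # Convert each p-value to a Z-score: a small p gives a large positive Z.
    z_scores = [nd.inv_cdf(1.0 - p) for p in p_values]
    # Weighted sum of Z-scores, rescaled to unit variance under the null.
    z = sum(w * zi for w, zi in zip(weights, z_scores)) / sqrt(
        sum(w * w for w in weights)
    )
    return z, 1.0 - nd.cdf(z)

# Hypothetical per-gene evidence: an association p-value and an
# expression-correlation p-value for the same gene.
z_joint, p_joint = stouffer_combine([0.05, 0.05])
print(z_joint, p_joint)
```

Two moderately small p-values for the same gene combine into a smaller joint p-value, which is how a gene can reach significance in a joint test without being significant in either test alone.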
9.  Effects of covariates and interactions on a genome-wide association analysis of rheumatoid arthritis 
BMC Proceedings  2009;3(Suppl 7):S84.
While genetic and environmental factors and their interactions influence susceptibility to rheumatoid arthritis (RA), causative genetic variants have not been identified. The purpose of the present study was to assess the effects of covariates and genotype × sex interactions on the genome-wide association analysis (GWAA) of RA using Genetic Analysis Workshop 16 Problem 1 data and a logistic regression approach as implemented in PLINK. After accounting for the effects of population stratification, effects of covariates and genotype × sex interactions on the GWAA of RA were assessed by conducting association and interaction analyses. We found significant allelic associations, covariate, and genotype × sex interaction effects on RA. Several top single-nucleotide polymorphisms (SNPs; ~22) showed significant associations, with p-values ranging from p < 1 × 10⁻⁴ to p < 1 × 10⁻²⁴. Only three SNPs on chromosomes 4, 13, and 20 were significant after Bonferroni correction, and none of these three SNPs showed significant genotype × sex interactions. Of the 30 top SNPs with significant interactions (p < 1 × 10⁻⁴ to p < 1 × 10⁻⁶), ~23 SNPs showed additive interactions and ~5 SNPs showed only dominance interactions. Those SNPs showing significant associations in the regular logistic regression failed to show significant interactions. In contrast, the SNPs that showed significant interactions failed to show significant associations in models that did not incorporate interactions. It is important to consider interactions of genotype × sex in addition to associations in a GWAA of RA. Furthermore, the association between SNPs and RA susceptibility varies significantly between men and women.
10.  Genome-wide discovery of maternal effect variants 
BMC Proceedings  2009;3(Suppl 7):S19.
Many phenotypes may be influenced by the prenatal environment of the mother and/or maternal care, and these maternal effects may have a heritable component. We have implemented in the computer program SOLAR a variance components-based method for detecting indirect effects of maternal genotype on offspring phenotype. Of six phenotypes measured in three generations of the Framingham Heart Study, height showed the strongest evidence (P = 0.02) of maternal effect. We conducted a genome-wide association analysis for height, testing both the direct effect of the focal individual's genotype and the indirect effect of the maternal genotype. Offspring height showed suggestive evidence of association with maternal genotype for two single-nucleotide polymorphisms in the trafficking protein particle complex 9 gene TRAPPC9 (NIBP), which plays a role in neuronal NF-κB signalling. This work establishes a methodological framework for identifying genetic variants that may influence the contribution of the maternal environment to offspring phenotypes.
12.  Searching for master regulators of transcription in a human gene expression data set 
BMC Proceedings  2007;1(Suppl 1):S81.
Microarray technologies allow the measurement of the expression levels of thousands of transcripts at the same time. As part of Genetic Analysis Workshop 15 (GAW15), we analyzed a data set that measured the expression of more than 3000 genes in 14 families. Our goal was to identify genomic regions that regulate the expression of several genes at the same time. We tried two different approaches: one was maximum likelihood-based variance-component linkage analysis and the other was a new linkage regression approach. We detected some loci that were linked with the expression level of more genes than would be expected by chance. These loci are candidates for master regulators of transcription (MRT). Finally, for each candidate MRT, we did a gene ontology (GO) analysis to test whether the genes linked to it were biologically related.
13.  Effects on linkage analyses of different Affymetrix expression measures as quantitative trait phenotypes 
BMC Proceedings  2007;1(Suppl 1):S158.
We compared results from linkage analyses of different phenotype measurements from the same gene expression traits and found that the strongest signals were detected by all expression measures that we considered. On average, that meant that the same quantitative trait loci (QTLs) were detected across methods, but the magnitude of the LOD score of each particular QTL and the false-positive ratio of QTL detection varied between the methods.
