1.  Identifying rare variants from exome scans: the GAW17 experience 
BMC Proceedings  2011;5(Suppl 9):S1.
Genetic Analysis Workshop 17 (GAW17) provided a platform for evaluating existing statistical genetic methods and for developing novel methods to analyze rare variants that modulate complex traits. In this article, we present an overview of the 1000 Genomes Project exome data and simulated phenotype data that were distributed to GAW17 participants for analyses, the different issues addressed by the participants, and the process of preparation of manuscripts resulting from the discussions during the workshop.
PMCID: PMC3287821  PMID: 22373325
2.  Case-control association testing by graphical modeling for the Genetic Analysis Workshop 17 mini-exome sequence data 
BMC Proceedings  2011;5(Suppl 9):S62.
We generalize recent work on graphical models for linkage disequilibrium to estimate the conditional independence structure between all variables for individuals in the Genetic Analysis Workshop 17 unrelated individuals data set. Using a stepwise approach for computational efficiency and an extension of our previously described methods, we estimate a model that describes the relationships between the disease trait, all quantitative variables, all covariates, ethnic origin, and the loci most strongly associated with these variables. We performed our analysis for the first 50 replicate data sets. We found that our approach was able to describe the relationships between the outcomes and covariates and that it could correctly detect associations of disease with several loci while maintaining a reasonable false-positive rate.
PMCID: PMC3287901  PMID: 22373360
3.  Pairwise shared genomic segment analysis in high-risk pedigrees: application to Genetic Analysis Workshop 17 exome-sequencing SNP data 
BMC Proceedings  2011;5(Suppl 9):S9.
We applied our method of pairwise shared genomic segment (pSGS) analysis to high-risk pedigrees identified from the Genetic Analysis Workshop 17 (GAW17) mini-exome sequencing data set. The original shared genomic segment method focused on identifying regions shared by all case subjects in a pedigree; thus it can be sensitive to sporadic cases. Our new method examines sharing among all pairs of case subjects in a high-risk pedigree and then uses the mean sharing as the test statistic; in addition, the significance is assessed empirically based on the pedigree structure and linkage disequilibrium pattern of the single-nucleotide polymorphisms. Using all GAW17 replicates, we identified 18 unilineal high-risk pedigrees that contained excess disease (p < 0.01) and at least 15 meioses between case subjects. Eighteen rare causal variants were polymorphic in this set of pedigrees. Based on a significance threshold of 0.001, 72.2% (13/18) of these pedigrees were successfully identified with at least one region that contains a true causal variant. The regions identified included 4 of the possible 18 polymorphic causal variants. On average, 1.1 true positives and 1.7 false positives were identified per pedigree. In conclusion, we have demonstrated the potential of our new pSGS method for localizing rare disease causal variants in common disease using high-risk pedigrees and exome sequence data.
PMCID: PMC3287931  PMID: 22373081
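The idea in entry 3 of averaging sharing over all pairs of case subjects can be illustrated with a minimal sketch. It is not the authors' pSGS implementation. It assumes genotypes coded as minor-allele counts (0, 1, 2), treats two subjects as possibly sharing at a SNP when they are not opposite homozygotes, and scores each SNP by the length of the run of consecutive compatible SNPs that contains it. The function names `shared_run_lengths` and `mean_pairwise_sharing` are hypothetical, and the empirical significance assessment described in the abstract is not shown.

```python
from itertools import combinations


def shared_run_lengths(g1, g2):
    """Per-SNP length of the run of consecutive compatible SNPs containing it.

    Assumption: two genotypes are 'compatible' (could share an allele)
    unless they are opposite homozygotes (0 vs 2).
    """
    compatible = [abs(a - b) < 2 for a, b in zip(g1, g2)]
    n = len(compatible)
    lengths = [0] * n
    i = 0
    while i < n:
        if not compatible[i]:
            i += 1
            continue
        # Extend the run as far as compatibility holds.
        j = i
        while j < n and compatible[j]:
            j += 1
        for k in range(i, j):
            lengths[k] = j - i
        i = j
    return lengths


def mean_pairwise_sharing(genotypes):
    """Average the per-SNP run lengths over all pairs of case subjects."""
    pairs = list(combinations(genotypes, 2))
    if not pairs:
        raise ValueError("need at least two case subjects")
    totals = [0] * len(genotypes[0])
    for g1, g2 in pairs:
        for k, length in enumerate(shared_run_lengths(g1, g2)):
            totals[k] += length
    return [t / len(pairs) for t in totals]


# Toy example: three case subjects genotyped at four SNPs.
cases = [[0, 1, 2, 0], [2, 1, 2, 0], [0, 0, 0, 0]]
sharing = mean_pairwise_sharing(cases)
```

Because the statistic is a mean over pairs, one sporadic case lowers the score in a region but does not reduce it to zero, which is the motivation the abstract gives for moving away from requiring sharing among all cases.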
4.  Pedigree association: assigning individual weights to pedigree members for genetic association analysis 
BMC Proceedings  2009;3(Suppl 7):S121.
Methods exist to appropriately perform association analyses in pedigrees. However, for genome-wide association analysis, these methods are computationally impractical. It is therefore important to determine alternate methods that can be efficiently used genome-wide. Here, we introduce a new algorithm that considers all relationships simultaneously in arbitrary-structured pedigrees and assigns weights to pedigree members that can be used in subsequent analyses to address relatedness. We compare this new method with an existing weighting algorithm, a naïve analysis (relatedness is ignored), and an empirical method that appropriately accounts for all relationships (the gold standard).
Framingham Heart Study Genetic Analysis Workshop 16 Problem 2 data were used with a dichotomous phenotype based on high-density lipoprotein cholesterol level (1,611 cases and 4,043 controls). New and existing algorithms for calculating weights were used. Cochran-Armitage trend tests were performed for 17,333 single-nucleotide polymorphisms on chromosome 8 using both weighting systems and the naïve approach; a subset of 500 single-nucleotide polymorphisms was tested empirically. Correlations of p-values from each method were determined.
Results from the two weighting methods were strongly correlated (r = 0.96). Measured by correlation with the empirical gold standard, our new weighting method performed better than the existing weighting method (r = 0.89 vs. r = 0.83), which is due to a more moderate down-weighting. The naïve analysis obtained the best correlation with the empirical gold standard results (r = 0.99).
Our results suggest that weighting methods do not accurately represent tests that account for familial relationships in genetic association analyses and are inferior to the naïve method as an efficient initial genome-wide screening tool.
PMCID: PMC2795894  PMID: 20017987
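For reference, the Cochran-Armitage trend test named in entry 4 can be sketched for a single SNP as follows. This is the standard unweighted form of the test on a 2 x 3 table of genotype counts, roughly corresponding to the naïve analysis; it does not reproduce the pedigree weighting schemes the abstract compares. The function name `cochran_armitage_trend` and the additive scores (0, 1, 2) are assumptions made for illustration.

```python
import math


def cochran_armitage_trend(cases, controls, scores=(0, 1, 2)):
    """Return (z, two-sided p) for the Cochran-Armitage trend test.

    cases / controls: genotype counts per category (e.g. AA, Aa, aa).
    scores: assumed additive coding; other codings can be passed in.
    """
    if not (len(cases) == len(controls) == len(scores)):
        raise ValueError("cases, controls and scores must have equal length")
    k = len(scores)
    col = [r + s for r, s in zip(cases, controls)]  # column totals
    R, S = sum(cases), sum(controls)                # row totals
    N = R + S

    # Trend statistic T = sum_i x_i * (r_i * S - s_i * R)
    t = sum(x * (r * S - s * R) for x, r, s in zip(scores, cases, controls))

    # Variance of T under the null hypothesis of no association
    var = sum(x * x * n * (N - n) for x, n in zip(scores, col))
    var -= 2 * sum(
        scores[i] * scores[j] * col[i] * col[j]
        for i in range(k)
        for j in range(i + 1, k)
    )
    var *= R * S / N
    if var <= 0:
        raise ValueError("degenerate table: variance is not positive")

    z = t / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return z, p


# Toy counts, not taken from the Framingham data.
z, p = cochran_armitage_trend(cases=(20, 30, 50), controls=(50, 30, 20))
```

The p-values from such a test, computed per SNP under each analysis strategy, are the quantities whose correlations the abstract reports.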
5.  Extracting disease risk profiles from expression data for linkage analysis: application to prostate cancer 
BMC Proceedings  2007;1(Suppl 1):S82.
The genetic factors underlying many complex traits are not well understood. The Genetic Analysis Workshop 15 Problem 1 data present the opportunity to explore whether gene expression data from microarrays can be utilized to define useful phenotypes for linkage analysis in complex diseases. We utilize expression profiles for multiple genes that have been associated with a disease to develop a composite 'risk profile' that can be used to map other loci involved in the same disease process. Using prostate cancer as our disease of interest, we identified 26 genes whose expression levels had previously been associated with prostate cancer and defined three phenotypes: high, neutral, or low risk profiles, based on individual expression levels. Linkage analyses using MCLINK, a Markov-chain Monte Carlo method, and MERLIN were performed for all three phenotypes. Both methods were in very close agreement. Genome-wide suggestive linkage evidence was observed on chromosomes 6 and 4. It was interesting to note that the linkage signals did not appear to be strongly influenced by the location of the original 26 genes used in the phenotype definition, indicating that composite measures may have potential to locate additional genes in the same process. In this example, however, extreme caution is necessary in any extrapolation of the identified loci to prostate cancer due to the lack of data regarding the behavior of these genes' expression levels in lymphoblastoid cells. Our results do indicate that there is potential to augment our current knowledge about the relationships among genes associated with complex diseases using expression data.
PMCID: PMC2367601  PMID: 18466585
6.  Analysis of high-density single-nucleotide polymorphism data: three novel methods that control for linkage disequilibrium between markers in a linkage analysis 
BMC Proceedings  2007;1(Suppl 1):S160.
We performed a multipoint linkage analysis for rheumatoid arthritis (RA) on chromosome 6 and chromosome 21 using high-density single-nucleotide polymorphism (SNP) data from Genetic Analysis Workshop 15 (GAW15). These regions were previously shown to have high LOD scores in analyses that did not account for linkage disequilibrium (LD). We propose three novel methods to control for LD in a linkage analysis: allow for LD between markers using graphical modeling, eliminate high-LD markers by principal-component analysis (PCA) using haplotype data, and eliminate high-LD markers by PCA using genotype data. All three novel methods were compared to the previously published SNPLINK high-LD elimination method. Although all four methods verified the previous results, differences in linkage peak height and position were observed across methods. Additional work is required to further understand the effects of LD on linkage results and explore LD control methodology.
PMCID: PMC2367532  PMID: 18466506

