Results 1-3 (3)
 

1.  Analysis of human mini-exome sequencing data from Genetic Analysis Workshop 17 using a Bayesian hierarchical mixture model 
BMC Proceedings  2011;5(Suppl 9):S93.
Next-generation sequencing technologies are rapidly changing the field of genetic epidemiology and enabling exploration of the full allele frequency spectrum underlying complex diseases. Although sequencing technologies have shifted our focus toward rare genetic variants, statistical methods traditionally used in genetic association studies are inadequate for estimating the effects of variants with low minor allele frequency. For our study, we use the Genetic Analysis Workshop 17 data from 697 unrelated individuals (genotypes for 24,487 autosomal variants from 3,205 genes). We apply a Bayesian hierarchical mixture model to identify genes associated with a simulated binary phenotype using a transformed genotype design matrix weighted by allele frequencies. A Metropolis-Hastings algorithm is used to jointly sample each indicator variable and additive genetic effect pair from its conditional posterior distribution, and the remaining parameters are sampled by Gibbs sampling. This method identified 58 genes with a posterior probability greater than 0.8 of being associated with the phenotype. One of these 58 genes, PIK3C2B, was correctly identified as being associated with affected status based on the simulation process. This project demonstrates the utility of Bayesian hierarchical mixture models using a transformed genotype matrix to detect genes containing rare and common variants associated with a binary phenotype.
doi:10.1186/1753-6561-5-S9-S93
PMCID: PMC3287935  PMID: 22373180
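The gene-selection step in this abstract (keeping genes whose posterior probability of association exceeds 0.8) can be illustrated with a minimal sketch. It assumes the sampler has already produced 0/1 indicator draws for each gene after burn-in. The function names, the data layout, and the toy gene labels are hypothetical and are not taken from the study; only the 0.8 threshold comes from the abstract.

```python
def posterior_inclusion(draws):
    """Estimate each gene's posterior probability of association.

    draws: dict mapping gene name -> list of 0/1 indicator samples
    (assumed to be post-burn-in MCMC draws; layout is hypothetical).
    """
    return {gene: sum(z) / len(z) for gene, z in draws.items()}


def select_genes(draws, threshold=0.8):
    """Return genes whose estimated probability is strictly above threshold."""
    pip = posterior_inclusion(draws)
    return sorted(g for g, p in pip.items() if p > threshold)


# Toy indicator draws for three made-up genes (not real data).
toy_draws = {
    "GENE_A": [1, 1, 1, 1, 0, 1, 1, 1, 1, 1],
    "GENE_B": [0, 1, 0, 0, 1, 0, 0, 0, 1, 0],
    "GENE_C": [1, 1, 1, 1, 1, 1, 1, 1, 0, 0],
}
selected = select_genes(toy_draws)
```

In this toy input, a gene whose indicator is 1 in exactly 8 of 10 draws is not selected, because the comparison is strict ("greater than 0.8").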
2.  Detecting gene-by-smoking interactions in a genome-wide association study of early-onset coronary heart disease using random forests 
BMC Proceedings  2009;3(Suppl 7):S88.
Background
Genome-wide association studies often fall short of their full potential because of the sheer volume of information they generate. We sought to use the random forest algorithm to identify single-nucleotide polymorphisms (SNPs) that may be involved in gene-by-smoking interactions related to the early onset of coronary heart disease.
Methods
Using data from the Framingham Heart Study, our analysis used a case-only design in which the outcome of interest was age of onset of early coronary heart disease.
Results
Smoking status was dichotomized as ever versus never. The single SNP with the highest importance score assigned by random forests was rs2011345. This SNP was not associated with age alone in the control subjects. Using generalized estimating equations to adjust for sex and account for familial correlation, there was evidence of an interaction between rs2011345 and smoking status.
Conclusion
The results of this analysis suggest that random forests may be a useful tool for identifying SNPs taking part in gene-by-environment interactions in genome-wide association studies.
PMCID: PMC2795991  PMID: 20018084
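As a rough illustration of what a SNP-by-smoking interaction on age of onset means, the sketch below computes a simple difference-of-differences in mean age at onset among cases. This is not the random forest or generalized estimating equations analysis described in the abstract: it ignores sex adjustment and familial correlation, and the function name, record layout, and toy values are assumptions made for the example.

```python
from statistics import mean


def interaction_contrast(records):
    """Difference-of-differences in mean age at onset.

    records: iterable of (carrier, ever_smoker, age_at_onset) tuples,
    where carrier and ever_smoker are booleans (hypothetical layout).
    All four carrier-by-smoking cells must be non-empty.
    """
    cells = {}
    for carrier, smoker, age in records:
        cells.setdefault((carrier, smoker), []).append(age)
    m = {key: mean(ages) for key, ages in cells.items()}
    effect_in_ever = m[(True, True)] - m[(False, True)]
    effect_in_never = m[(True, False)] - m[(False, False)]
    # A value far from zero suggests the SNP effect differs by smoking status.
    return effect_in_ever - effect_in_never


# Toy case-only records (made-up ages, not study data).
toy_cases = [
    (True, True, 40.0), (True, True, 42.0),
    (False, True, 50.0), (False, True, 52.0),
    (True, False, 55.0), (True, False, 57.0),
    (False, False, 56.0), (False, False, 58.0),
]
contrast = interaction_contrast(toy_cases)
```

A contrast near zero would indicate that carrying the allele shifts age at onset by about the same amount in ever-smokers and never-smokers; a formal test would need a model that handles the family structure.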
3.  Classification tree for detection of single-nucleotide polymorphism (SNP)-by-SNP interactions related to heart disease: Framingham Heart Study 
BMC Proceedings  2009;3(Suppl 7):S83.
The aim of this study was to detect the effect of interactions between single-nucleotide polymorphisms (SNPs) on the incidence of heart disease. For this purpose, 2912 subjects with 350,160 SNPs from the Framingham Heart Study (FHS) were analyzed. PLINK was used for quality control and to select the 10,000 most significant SNPs. A classification tree algorithm, Generalized, Unbiased, Interaction Detection and Estimation (GUIDE), was employed to build a classification tree to detect SNP-by-SNP interactions among the 10,000 selected SNPs. The classes generated by GUIDE were reexamined with a generalized estimating equations (GEE) model using the empirical variance to account for potential familial correlation. Overall, 17 classes were generated based on the splitting criteria in GUIDE. The prevalence of coronary heart disease (CHD) in class 16 (determined by SNPs rs1894035, rs7955732, rs2212596, and rs1417507) was the lowest (0.23%). Compared to class 16, all other classes except for class 288 (prevalence of 1.2%) had a significantly greater risk when analyzed using the GEE model. This suggests that the interactions among SNPs along these node paths are significant.
PMCID: PMC2795986  PMID: 20018079
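The comparison of classes against the lowest-prevalence class can be sketched as a crude calculation on terminal-node counts. This is only an illustration under stated assumptions: it computes unadjusted prevalence ratios and does not reproduce the GEE model with empirical variance used in the study. The function names, class labels, and counts are hypothetical.

```python
def class_prevalence(counts):
    """counts: dict mapping class id -> (n_cases, n_subjects)."""
    return {c: cases / n for c, (cases, n) in counts.items()}


def prevalence_ratios(counts):
    """Compare every class with the lowest-prevalence (reference) class.

    Assumes the reference class has at least one case, so that its
    prevalence is non-zero and the ratios are defined.
    """
    prev = class_prevalence(counts)
    ref = min(prev, key=prev.get)
    ratios = {c: p / prev[ref] for c, p in prev.items() if c != ref}
    return ref, ratios


# Toy terminal-node counts (made up, not the study's classes).
toy_counts = {"A": (1, 400), "B": (6, 200), "C": (10, 100)}
reference, ratios = prevalence_ratios(toy_counts)
```

Here class "A" has the lowest prevalence and serves as the reference, mirroring the role of class 16 in the abstract; judging significance would still require a model that accounts for familial correlation.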
