Search tips
Search criteria

Results 1-8 (8)

Clipboard (0)
Year of Publication
Document Types
1.  Detection of associations with rare and common SNPs for quantitative traits: a nonparametric Bayes-based approach 
BMC Proceedings  2011;5(Suppl 9):S10.
We propose a nonparametric Bayes-based clustering algorithm to detect associations with rare and common single-nucleotide polymorphisms (SNPs) for quantitative traits. Unlike current methods, our approach identifies associations with rare genetic variants at the variant level, not the gene level. In this method, we use a Dirichlet process prior for the distribution of SNP-specific regression coefficients, conduct hierarchical clustering with a distance measure derived from posterior pairwise probabilities of two SNPs having the same regression coefficient, and explore data-driven approaches to select the number of clusters. SNPs falling inside the largest cluster have relatively low or close to zero estimates of regression coefficients and are considered not associated with the trait. SNPs falling outside the largest cluster have relatively high estimates of regression coefficients and are considered potential risk variants. Using the data from the Genetic Analysis Workshop 17, we successfully detected associations with both rare and common SNPs for a quantitative trait. We conclude that our method provides a novel and broadly applicable strategy for obtaining association results with a reasonably low proportion of false discovery and that it can be routinely used in resequencing studies.
PMCID: PMC3287822  PMID: 22373351
2.  Effect of population stratification analysis on false-positive rates for common and rare variants 
BMC Proceedings  2011;5(Suppl 9):S116.
Principal components analysis (PCA) has been successfully used to correct for population stratification in genome-wide association studies of common variants. However, rare variants also have a role in common disease etiology. Whether PCA successfully controls population stratification for rare variants has not been addressed. Thus we evaluate the effect of population stratification analysis on false-positive rates for common and rare variants at the single-nucleotide polymorphism (SNP) and gene level. We use the simulation data from Genetic Analysis Workshop 17 and compare false-positive rates with and without PCA at the SNP and gene level. We found that SNPs’ minor allele frequency (MAF) influenced the ability of PCA to effectively control false discovery. Specifically, PCA reduced false-positive rates more effectively in common SNPs (MAF > 0.05) than in rare SNPs (MAF < 0.01). Furthermore, at the gene level, although false-positive rates were reduced, power to detect true associations was also reduced using PCA. Taken together, these results suggest that sequence-level data should be interpreted with caution, because extremely rare SNPs may exhibit sporadic association that is not controlled using PCA.
PMCID: PMC3287840  PMID: 22373282
3.  Family- and population-based designs identify different rare causal variants 
BMC Proceedings  2011;5(Suppl 9):S36.
Both family- and population-based samples are used to identify genetic variants associated with phenotypes. Each strategy has demonstrated advantages, but their ability to identify rare variants and genes containing rare variants is unclear. To compare these two study designs in the identification of rare causal variants, we applied various methods to the population- and family-based data simulated by the Genetic Analysis Workshop 17 with knowledge of the simulated model. Our results suggest that different variants can be identified by different study designs. Family-based and population-based study designs can be complementary in the identification of rare causal variants and should be considered in future studies.
PMCID: PMC3287872  PMID: 22373077
4.  Population structure analysis using rare and common functional variants 
BMC Proceedings  2011;5(Suppl 9):S8.
Next-generation sequencing technologies now make it possible to genotype and measure hundreds of thousands of rare genetic variations in individuals across the genome. Characterization of high-density genetic variation facilitates control of population genetic structure on a finer scale before large-scale genotyping in disease genetics studies. Population structure is a well-known, prevalent, and important factor in common variant genetic studies, but its relevance in rare variants is unclear. We perform an extensive population structure analysis using common and rare functional variants from the Genetic Analysis Workshop 17 mini-exome sequence. The analysis based on common functional variants required 388 principal components to account for 90% of the variation in population structure. However, an analysis based on rare variants required 532 significant principal components to account for similar levels of variation. Using rare variants, we detected fine-scale substructure beyond the population structure identified using common functional variants. Our results show that the level of population structure embedded in rare variant data is different from the level embedded in common variant data and that correcting for population structure is only as good as the level one wishes to correct.
PMCID: PMC3287920  PMID: 22373300
5.  The effect of minor allele frequency on the likelihood of obtaining false positives 
BMC Proceedings  2009;3(Suppl 7):S41.
Determining the most promising single-nucleotide polymorphisms (SNPs) presents a challenge in genome-wide association studies, when hundreds of thousands of association tests are conducted. The power to detect genetic effects is dependent on minor allele frequency (MAF), and genome-wide association studies SNP arrays include SNPs with a wide distribution of MAFs. Therefore, it is critical to understand MAF's effect on the false positive rate.
Data from the Framingham Heart Study simulated data (Problem 3, with answers) was used to examine the effects of varying MAFs on the likelihood of false positives. Replication set 1 was used to generate 1 million permutations of case/control status in unrelated individuals. Logistic regression was used to test for the association between each SNP and myocardial infarction using an additive model. We report the number of "significant" tests by MAF at α = 10-4, 10-5, and 10-6.
Common SNPs exhibited fewer false positives than expected. At α = 10-4, SNPs with MAF 25% and 50% resulted in 69.2 [95%CI: 62.8-75.6] and 70.8 [95%CI: 61.3-80.4] false positives, respectively, compared to 100 expected. Rare SNPs exhibited more variability but did not show more false-positive results than expected by chance. However, at α = 10-4, MAF = 5% exhibited significantly more false positives (105.5 [95%CI: 81-130.1]) than MAF = 25% and 50%. Similar results were seen at the other alpha values.
These results suggest that removal of low MAF SNPs from analysis due to concerns about inflated false-positive results may not be appropriate.
PMCID: PMC2795940  PMID: 20018033
7.  Comparison of false-discovery rate for genome-wide and fine mapping regions 
BMC Proceedings  2007;1(Suppl 1):S148.
With technological advances in high-throughput genotyping, it is not unusual to perform hundreds of thousands of tests for each phenotype. Thus, correction to control type I error is essential. The false-discovery rate (FDR) has been successfully used in genome-wide expression data. However, its performance has not been evaluated for association analysis. Our objective was to analyze the Genetic Analysis Workshop 15 simulated data set, with answers, to evaluate FDR for genome-wide association and fine mapping. In genome-wide analysis, FDR performed well, with good localization of positive results. However, in fine mapping, all tested methods performed poorly, producing a high proportion of significant results. Thus, caution should be used when employing FDR for fine mapping.
PMCID: PMC2367535  PMID: 18466492

Results 1-8 (8)