
Results 1-10 (10)

1.  Using Mendelian inheritance errors as quality control criteria in whole genome sequencing data set 
BMC Proceedings  2014;8(Suppl 1):S21.
Although the technical and analytic complexity of whole genome sequencing is generally appreciated, best practices for data cleaning and quality control have not been defined. Family based data can be used to guide the standardization of specific quality control metrics in nonfamily based data. Given the low mutation rate, Mendelian inheritance errors are likely the result of erroneous genotype calls. Thus, our goal was to identify the characteristics that determine Mendelian inheritance errors. To accomplish this, we used chromosome 3 whole genome sequencing family based data from the Genetic Analysis Workshop 18. Mendelian inheritance errors were provided as part of the GAW18 data set. Additionally, for binary variants we calculated Mendelian inheritance errors using PLINK. Based on our analysis, nonbinary single-nucleotide variants have an inherently high number of Mendelian inheritance errors. Furthermore, in binary variants, Mendelian inheritance errors are not randomly distributed. Indeed, we identified 3 Mendelian inheritance error peaks that were enriched with repetitive elements. However, these peaks can be lessened with the inclusion of a single filter from the sequencing file. In summary, we demonstrated that erroneous sequencing calls are nonrandomly distributed across the genome and quality control metrics can dramatically reduce the number of Mendelian inheritance errors. Appropriate quality control will allow optimal use of genetic data to realize the full potential of whole genome sequencing.
PMCID: PMC4144465  PMID: 25519373
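As an illustration of what a Mendelian inheritance error means for a binary variant, the following is a minimal sketch and not the PLINK implementation the abstract refers to. It assumes an autosomal biallelic site with genotypes coded as alternate-allele counts (0, 1, or 2); the function names and the trio tuple layout are hypothetical.

```python
from itertools import product

def is_mendelian_consistent(father, mother, child):
    """Return True if the child's genotype can arise from the parents.

    Genotypes are alternate-allele counts (0, 1, 2) at an autosomal,
    biallelic site. This coding is an assumption made for illustration.
    """
    # Alleles each parent can transmit: 0 = reference, 1 = alternate.
    transmittable = {0: {0}, 1: {0, 1}, 2: {1}}
    possible_children = {
        a + b
        for a, b in product(transmittable[father], transmittable[mother])
    }
    return child in possible_children

def count_mendelian_errors(trios):
    """Count trios (father, mother, child) that violate Mendelian inheritance."""
    return sum(not is_mendelian_consistent(f, m, c) for f, m, c in trios)
```

For example, two homozygous-reference parents (0, 0) with a heterozygous child (1) would be counted as an error, which, given the low mutation rate, most likely reflects a miscalled genotype.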
2.  Modeling of multivariate longitudinal phenotypes in family genetic studies with Bayesian multiplicity adjustment 
BMC Proceedings  2014;8(Suppl 1):S69.
Genetic studies often collect data on multiple traits. Most genetic association analyses, however, consider traits separately and ignore potential correlation among traits, partially because of difficulties in statistical modeling of multivariate outcomes. When multiple traits are measured in a pedigree longitudinally, additional challenges arise because in addition to correlation between traits, a trait is often correlated with its own measures over time and with measurements of other family members. We developed a Bayesian model for analysis of bivariate quantitative traits measured longitudinally in family genetic studies. For a given trait, family-specific and subject-specific random effects account for correlation among family members and repeated measures, respectively. Correlation between traits is introduced by incorporating multivariate random effects and allowing time-specific trait residuals to correlate as in seemingly unrelated regressions. The proposed model can examine multiple single-nucleotide variations simultaneously, as well as incorporate family-specific, subject-specific, or time-varying covariates. A Bayesian multiplicity technique is used to effectively control false positives. Genetic Analysis Workshop 18 simulated data illustrate the proposed approach's applicability in modeling longitudinal multivariate outcomes in family genetic association studies.
PMCID: PMC4143665  PMID: 25519340
3.  Detection of associations with rare and common SNPs for quantitative traits: a nonparametric Bayes-based approach 
BMC Proceedings  2011;5(Suppl 9):S10.
We propose a nonparametric Bayes-based clustering algorithm to detect associations with rare and common single-nucleotide polymorphisms (SNPs) for quantitative traits. Unlike current methods, our approach identifies associations with rare genetic variants at the variant level, not the gene level. In this method, we use a Dirichlet process prior for the distribution of SNP-specific regression coefficients, conduct hierarchical clustering with a distance measure derived from posterior pairwise probabilities of two SNPs having the same regression coefficient, and explore data-driven approaches to select the number of clusters. SNPs falling inside the largest cluster have relatively low or close to zero estimates of regression coefficients and are considered not associated with the trait. SNPs falling outside the largest cluster have relatively high estimates of regression coefficients and are considered potential risk variants. Using the data from the Genetic Analysis Workshop 17, we successfully detected associations with both rare and common SNPs for a quantitative trait. We conclude that our method provides a novel and broadly applicable strategy for obtaining association results with a reasonably low proportion of false discovery and that it can be routinely used in resequencing studies.
PMCID: PMC3287822  PMID: 22373351
4.  Effect of population stratification analysis on false-positive rates for common and rare variants 
BMC Proceedings  2011;5(Suppl 9):S116.
Principal components analysis (PCA) has been successfully used to correct for population stratification in genome-wide association studies of common variants. However, rare variants also have a role in common disease etiology. Whether PCA successfully controls population stratification for rare variants has not been addressed. Thus we evaluate the effect of population stratification analysis on false-positive rates for common and rare variants at the single-nucleotide polymorphism (SNP) and gene level. We use the simulation data from Genetic Analysis Workshop 17 and compare false-positive rates with and without PCA at the SNP and gene level. We found that SNPs’ minor allele frequency (MAF) influenced the ability of PCA to effectively control false discovery. Specifically, PCA reduced false-positive rates more effectively in common SNPs (MAF > 0.05) than in rare SNPs (MAF < 0.01). Furthermore, at the gene level, although false-positive rates were reduced, power to detect true associations was also reduced using PCA. Taken together, these results suggest that sequence-level data should be interpreted with caution, because extremely rare SNPs may exhibit sporadic association that is not controlled using PCA.
PMCID: PMC3287840  PMID: 22373282
5.  Family- and population-based designs identify different rare causal variants 
BMC Proceedings  2011;5(Suppl 9):S36.
Both family- and population-based samples are used to identify genetic variants associated with phenotypes. Each strategy has demonstrated advantages, but their ability to identify rare variants and genes containing rare variants is unclear. To compare these two study designs in the identification of rare causal variants, we applied various methods to the population- and family-based data simulated by the Genetic Analysis Workshop 17 with knowledge of the simulated model. Our results suggest that different variants can be identified by different study designs. Family-based and population-based study designs can be complementary in the identification of rare causal variants and should be considered in future studies.
PMCID: PMC3287872  PMID: 22373077
6.  Population structure analysis using rare and common functional variants 
BMC Proceedings  2011;5(Suppl 9):S8.
Next-generation sequencing technologies now make it possible to genotype and measure hundreds of thousands of rare genetic variations in individuals across the genome. Characterization of high-density genetic variation facilitates control of population genetic structure on a finer scale before large-scale genotyping in disease genetics studies. Population structure is a well-known, prevalent, and important factor in common variant genetic studies, but its relevance in rare variants is unclear. We perform an extensive population structure analysis using common and rare functional variants from the Genetic Analysis Workshop 17 mini-exome sequence. The analysis based on common functional variants required 388 principal components to account for 90% of the variation in population structure. However, an analysis based on rare variants required 532 significant principal components to account for similar levels of variation. Using rare variants, we detected fine-scale substructure beyond the population structure identified using common functional variants. Our results show that the level of population structure embedded in rare variant data is different from the level embedded in common variant data and that correcting for population structure is only as good as the level one wishes to correct.
PMCID: PMC3287920  PMID: 22373300
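The statement that a given number of principal components is needed to account for 90% of the variation can be made concrete with a small sketch. This is not the authors' pipeline: it assumes a samples-by-variants genotype matrix with at least one non-constant column, and the function name and the `target` parameter are hypothetical.

```python
import numpy as np

def components_for_variance(genotypes, target=0.90):
    """Smallest number of principal components whose cumulative
    explained variance reaches `target` (a fraction between 0 and 1)."""
    X = np.asarray(genotypes, dtype=float)
    # Center each variant (column) so the SVD corresponds to PCA.
    X = X - X.mean(axis=0)
    # Squared singular values are proportional to per-component variance.
    singular_values = np.linalg.svd(X, compute_uv=False)
    explained = singular_values**2 / np.sum(singular_values**2)
    cumulative = np.cumsum(explained)
    # First index at which the cumulative fraction reaches the target.
    return int(np.searchsorted(cumulative, target) + 1)
```

Running such a function separately on the common-variant and rare-variant matrices would yield the kind of comparison the abstract reports (388 versus 532 components).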
7.  The effect of minor allele frequency on the likelihood of obtaining false positives 
BMC Proceedings  2009;3(Suppl 7):S41.
Determining the most promising single-nucleotide polymorphisms (SNPs) presents a challenge in genome-wide association studies, when hundreds of thousands of association tests are conducted. The power to detect genetic effects is dependent on minor allele frequency (MAF), and genome-wide association study SNP arrays include SNPs with a wide distribution of MAFs. Therefore, it is critical to understand MAF's effect on the false positive rate.
The Framingham Heart Study simulated data (Problem 3, with answers) were used to examine the effects of varying MAFs on the likelihood of false positives. Replication set 1 was used to generate 1 million permutations of case/control status in unrelated individuals. Logistic regression was used to test for the association between each SNP and myocardial infarction using an additive model. We report the number of "significant" tests by MAF at α = 10⁻⁴, 10⁻⁵, and 10⁻⁶.
Common SNPs exhibited fewer false positives than expected. At α = 10⁻⁴, SNPs with MAF 25% and 50% resulted in 69.2 [95%CI: 62.8-75.6] and 70.8 [95%CI: 61.3-80.4] false positives, respectively, compared to 100 expected. Rare SNPs exhibited more variability but did not show more false-positive results than expected by chance. However, at α = 10⁻⁴, MAF = 5% exhibited significantly more false positives (105.5 [95%CI: 81-130.1]) than MAF = 25% and 50%. Similar results were seen at the other alpha values.
These results suggest that removal of low MAF SNPs from analysis due to concerns about inflated false-positive results may not be appropriate.
PMCID: PMC2795940  PMID: 20018033
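The "100 expected" figure in the abstract above follows from multiplying the number of null tests by the significance threshold. The sketch below reproduces that arithmetic, assuming the 1 million permutations are the denominator for each MAF category; the helper names are hypothetical.

```python
def expected_false_positives(n_null_tests, alpha):
    """Expected count of 'significant' results when every test is null."""
    return n_null_tests * alpha

def interval_contains(ci_low, ci_high, value):
    """True if `value` lies inside the closed interval [ci_low, ci_high]."""
    return ci_low <= value <= ci_high

n_permutations = 1_000_000  # from the abstract: 1 million permutations
expected = expected_false_positives(n_permutations, 1e-4)

# Reported 95% confidence intervals at alpha = 10^-4, taken from the abstract.
common_maf_25 = interval_contains(62.8, 75.6, expected)   # MAF = 25%
common_maf_50 = interval_contains(61.3, 80.4, expected)   # MAF = 50%
rare_maf_5 = interval_contains(81.0, 130.1, expected)     # MAF = 5%
```

Under this reading, the expected count falls above both common-SNP intervals (fewer false positives than expected) and inside the MAF = 5% interval (no excess beyond chance), which matches the abstract's conclusions.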
9.  Comparison of false-discovery rate for genome-wide and fine mapping regions 
BMC Proceedings  2007;1(Suppl 1):S148.
With technological advances in high-throughput genotyping, it is not unusual to perform hundreds of thousands of tests for each phenotype. Thus, correction to control type I error is essential. The false-discovery rate (FDR) has been successfully used in genome-wide expression data. However, its performance has not been evaluated for association analysis. Our objective was to analyze the Genetic Analysis Workshop 15 simulated data set, with answers, to evaluate FDR for genome-wide association and fine mapping. In genome-wide analysis, FDR performed well, with good localization of positive results. However, in fine mapping, all tested methods performed poorly, producing a high proportion of significant results. Thus, caution should be used when employing FDR for fine mapping.
PMCID: PMC2367535  PMID: 18466492
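The abstract above does not name the FDR methods that were tested. As one common example of FDR control, the Benjamini-Hochberg step-up procedure can be sketched as follows; the function name and the `q` parameter are illustrative choices, not taken from the paper.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Benjamini-Hochberg step-up procedure.

    Returns a list of booleans, in the input order, marking which
    hypotheses are rejected at FDR level `q`.
    """
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest rank k such that p_(k) <= k * q / m.
    cutoff_rank = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank * q / m:
            cutoff_rank = rank
    # Reject every hypothesis ranked at or below the cutoff.
    rejected = [False] * m
    for i in order[:cutoff_rank]:
        rejected[i] = True
    return rejected
```

In fine mapping, markers in strong linkage disequilibrium with a causal variant tend to produce many correlated small p-values, which is one plausible reason a procedure like this could flag a high proportion of tests, in line with the caution the abstract raises.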
