Results 1-7 (7)

1.  False-positive rates in two-point parametric linkage analysis 
BMC Proceedings  2014;8(Suppl 1):S110.
Two-point linkage analyses of whole genome sequence data are a promising approach to identify rare variants that segregate with complex diseases in large pedigrees because, in theory, the causal variants have been genotyped. We used whole genome sequence data and simulated traits provided by Genetic Analysis Workshop 18 to evaluate the proportion of false-positive findings in a binary trait using classic two-point parametric linkage analysis. False-positive genome-wide significant log of odds (LOD) scores were identified in more than 80% of 50 replicates for a binary phenotype generated by dichotomizing a quantitative trait that was simulated with a polygenic component (that was not based on any of the provided whole genome sequence genotypes). In contrast, when the trait was truly nongenetic (created by randomly assigning affected-unaffected status), the number of false-positive results was well controlled. These results suggest that when using two-point linkage analyses on whole genome sequence data, one should carefully examine regions yielding significant two-point LOD scores with multipoint analysis and that a more stringent significance threshold may be needed.
PMCID: PMC4143621  PMID: 25519363
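As a rough illustration of the quantity this abstract evaluates, the sketch below computes a two-point LOD score in the simplest textbook setting: phase-known, fully informative meioses, where the likelihood depends only on the number of recombinants. This is an assumption made for illustration. The study itself used parametric linkage analysis on pedigrees, which requires a disease model and pedigree likelihoods that are not reproduced here. The function names and the grid of recombination fractions are hypothetical.

```python
import math


def two_point_lod(recombinants, meioses, theta):
    """LOD score at recombination fraction theta, assuming phase-known,
    fully informative meioses (a textbook simplification)."""
    if not 0 < theta <= 0.5:
        raise ValueError("theta must be in (0, 0.5]")
    k, n = recombinants, meioses
    # log10 likelihood under linkage at theta
    log_l_theta = k * math.log10(theta) + (n - k) * math.log10(1 - theta)
    # log10 likelihood under no linkage (theta = 0.5)
    log_l_null = n * math.log10(0.5)
    return log_l_theta - log_l_null


def max_lod(recombinants, meioses):
    """Maximize the LOD score over a grid of theta values 0.01..0.50.
    Returns (best LOD, theta at which it occurs)."""
    thetas = [i / 100 for i in range(1, 51)]
    return max((two_point_lod(recombinants, meioses, t), t) for t in thetas)


lod, theta_hat = max_lod(recombinants=1, meioses=10)
print(f"max LOD = {lod:.3f} at theta = {theta_hat:.2f}")
```

A false positive in this setting is a maximized LOD score that exceeds the chosen significance threshold at a marker with no true linkage to the trait.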
2.  Identifying rare variants from exome scans: the GAW17 experience 
BMC Proceedings  2011;5(Suppl 9):S1.
Genetic Analysis Workshop 17 (GAW17) provided a platform for evaluating existing statistical genetic methods and for developing novel methods to analyze rare variants that modulate complex traits. In this article, we present an overview of the 1000 Genomes Project exome data and simulated phenotype data that were distributed to GAW17 participants for analyses, the different issues addressed by the participants, and the process of preparation of manuscripts resulting from the discussions during the workshop.
PMCID: PMC3287821  PMID: 22373325
3.  Performance of random forests and logic regression methods using mini-exome sequence data 
BMC Proceedings  2011;5(Suppl 9):S104.
Machine learning approaches are an attractive option for analyzing large-scale data to detect genetic variants that contribute to variation of a quantitative trait, without requiring specific distributional assumptions. We evaluate two machine learning methods, random forests and logic regression, and compare them to standard simple univariate linear regression, using the Genetic Analysis Workshop 17 mini-exome data. We also apply these methods after collapsing multiple rare variants within genes and within gene pathways. Linear regression and the random forest method performed better when rare variants were collapsed based on genes or gene pathways than when each variant was analyzed separately. Logic regression performed better when rare variants were collapsed based on genes rather than on pathways.
PMCID: PMC3287827  PMID: 22373484
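The collapsing step mentioned in this abstract can be sketched as follows. This is a minimal example of one common collapsing scheme, a per-gene carrier indicator, and not necessarily the scheme the authors used. The minor allele frequency cutoff, the data layout, and the function names are all assumptions for illustration.

```python
def minor_allele_freq(counts):
    """Allele frequency from per-individual minor allele counts (0, 1, or 2)."""
    return sum(counts) / (2 * len(counts))


def collapse_by_gene(genotypes, variant_to_gene, maf_threshold=0.05):
    """Collapse rare variants into one 0/1 carrier indicator per gene.

    genotypes: dict mapping variant ID -> list of minor allele counts,
               one entry per individual (hypothetical layout).
    variant_to_gene: dict mapping variant ID -> gene name.
    maf_threshold: variants at or above this frequency are treated as
                   common and left out (assumed cutoff).
    """
    collapsed = {}
    for variant, counts in genotypes.items():
        if minor_allele_freq(counts) >= maf_threshold:
            continue  # not rare, so not collapsed
        gene = variant_to_gene[variant]
        indicator = collapsed.setdefault(gene, [0] * len(counts))
        for i, count in enumerate(counts):
            if count > 0:
                indicator[i] = 1  # individual carries at least one rare allele
    return collapsed


genotypes = {"v1": [0, 1, 0, 0], "v2": [0, 0, 0, 1], "v3": [1, 1, 2, 1]}
variant_to_gene = {"v1": "G1", "v2": "G1", "v3": "G2"}
print(collapse_by_gene(genotypes, variant_to_gene, maf_threshold=0.2))
```

Collapsing by pathway would follow the same pattern with a variant-to-pathway mapping in place of the variant-to-gene mapping.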
4.  Comparison of results from tests of association in unrelated individuals with uncollapsed and collapsed sequence variants using tiled regression 
BMC Proceedings  2011;5(Suppl 9):S15.
Tiled regression is an approach designed to determine the set of independent genetic variants that contribute to the variation of a quantitative trait in the presence of many highly correlated variants. In this study, we evaluate the statistical properties of the tiled regression method using the Genetic Analysis Workshop 17 data in unrelated individuals for traits Q1, Q2, and Q4. To increase the power to detect rare variants, we use two methods to collapse rare variants and compare the results with those from the uncollapsed data. In addition, we compare the tiled regression method to traditional tests of association with and without collapsed rare variants. The results show that collapsing rare variants generally improves the power to detect associations regardless of method, although only variants with the largest allelic effects could be detected. However, for traditional simple linear regression, the average estimated type I error is dependent on the trait and varies by about three orders of magnitude. The estimated type I error rate is stable for tiled regression across traits.
PMCID: PMC3287849  PMID: 22373501
5.  Evaluation of random forests performance for genome-wide association studies in the presence of interaction effects 
BMC Proceedings  2009;3(Suppl 7):S64.
Random forests (RF) is one of a broad class of machine learning methods that are able to deal with large-scale data without model specification, which makes it an attractive method for genome-wide association studies (GWAS). The performance of RF and other association methods in the presence of interactions was evaluated using the simulated data from Genetic Analysis Workshop 16 Problem 3, with knowledge of the major causative markers, risk factors, and their interactions in the simulated traits. There was good power to detect the environmental risk factors using RF, trend tests, or regression analyses but the power to detect the effects of the causal markers was poor for all methods. The causal marker that had an interactive effect with smoking did show moderate evidence of association in the RF and regression analyses, suggesting that RF may perform well at detecting such interactions in larger, more highly powered datasets.
PMCID: PMC2795965  PMID: 20018058
6.  Application of sex-specific single-nucleotide polymorphism filters in genome-wide association data 
BMC Proceedings  2009;3(Suppl 7):S57.
We explored five sex-specific quality control filters in North American Rheumatoid Arthritis Consortium's Illumina 550 k datasets. Three X chromosome and three autosomal single-nucleotide polymorphisms flagged by sex quality control filters were missed by filters of call rate at 95% and Hardy-Weinberg equilibrium at 10⁻⁶. We applied a subset of these sex-specific quality control filters to eight chromosomes in the Framingham Heart Study samples genotyped by Affymetrix 500 k SNP arrays, and identified another two single-nucleotide polymorphisms that failed to be picked up by the above global filters.
PMCID: PMC2795957  PMID: 20018050
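The two global filters named in this abstract (call rate at 95% and Hardy-Weinberg equilibrium at 10⁻⁶) can be sketched as below. The thresholds come from the abstract; everything else is an assumption: the genotype coding, the function names, and the use of a 1-degree-of-freedom chi-square test for Hardy-Weinberg equilibrium (the authors may have used an exact test). The five sex-specific filters are not described in the abstract and are not reproduced here.

```python
import math


def call_rate(genotypes):
    """Fraction of non-missing genotype calls; None marks a missing call."""
    return sum(g is not None for g in genotypes) / len(genotypes)


def hwe_p_value(n_aa, n_ab, n_bb):
    """P-value of a 1-df chi-square test for Hardy-Weinberg equilibrium
    (assumed test; an exact test is often preferred for rare alleles)."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    if min(expected) == 0:
        return 1.0  # monomorphic SNP: no departure can be tested
    chi2 = sum((o - e) ** 2 / e for o, e in zip((n_aa, n_ab, n_bb), expected))
    # Survival function of a chi-square with 1 df
    return math.erfc(math.sqrt(chi2 / 2))


def passes_global_filters(genotypes, min_call_rate=0.95, min_hwe_p=1e-6):
    """genotypes: list of 0/1/2 allele counts, or None for missing calls."""
    called = [g for g in genotypes if g is not None]
    if not called or call_rate(genotypes) < min_call_rate:
        return False
    counts = [called.count(k) for k in (0, 1, 2)]
    return hwe_p_value(*counts) >= min_hwe_p


print(passes_global_filters([0] * 25 + [1] * 50 + [2] * 25))  # in equilibrium
print(passes_global_filters([1] * 100))  # all heterozygotes
```

Running the same function on males and females separately is one simple way to see how a SNP can pass a global filter yet look problematic within one sex, though that is only an illustration and not the filters evaluated in the study.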
7.  Normalization of microarray expression data using within-pedigree pool and its effect on linkage analysis 
BMC Proceedings  2007;1(Suppl 1):S152.
"Genetical genomics", the study of natural genetic variation combining data from genetic marker-based studies with gene expression analyses, has exploded with the recent development of advanced microarray technologies. To account for systematic variation known to exist in microarray data, it is critical to properly normalize gene expression traits before performing genetic linkage analyses. However, imposing equal means and variances across pedigrees can over-correct for the true biological variation by ignoring familial correlations in expression values. We applied the robust multiarray average (RMA) method to gene expression trait data from 14 Centre d'Etude du Polymorphisme Humain (CEPH) Utah pedigrees provided by GAW15 (Genetic Analysis Workshop 15). We compared the RMA normalization method using within-pedigree pools to RMA normalization using all individuals in a single pool, which ignores pedigree membership, and investigated the effects of these different methods on 18 gene expression traits previously found to be linked to regions containing the corresponding structural locus. Familial correlation coefficients of the expressed traits were stronger when traits were normalized within pedigrees. Surprisingly, the linkage plots for these traits were similar, suggesting that although heritability increases when traits are normalized within pedigrees, the strength of linkage evidence does not necessarily change substantially.
PMCID: PMC2367611  PMID: 18466497
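The contrast between within-pedigree pools and a single pool can be sketched using quantile normalization, which is one step of RMA. This is a simplified illustration only: RMA's background correction and probe-set summarization are omitted, ties are broken by position, and the data layout and function names are assumptions.

```python
def quantile_normalize(arrays):
    """Quantile-normalize a pool of arrays (lists of equal length) so that
    every array in the pool shares the same empirical distribution."""
    n_probes = len(arrays[0])
    sorted_arrays = [sorted(a) for a in arrays]
    # Reference distribution: mean across arrays at each rank
    rank_means = [
        sum(s[r] for s in sorted_arrays) / len(arrays) for r in range(n_probes)
    ]
    normalized = []
    for a in arrays:
        order = sorted(range(n_probes), key=lambda i: a[i])
        out = [0.0] * n_probes
        for rank, i in enumerate(order):
            out[i] = rank_means[rank]
        normalized.append(out)
    return normalized


def normalize_by_pool(arrays, pedigree_ids, within_pedigree=True):
    """Normalize within each pedigree, or across all individuals at once."""
    if not within_pedigree:
        return quantile_normalize(arrays)
    result = [None] * len(arrays)
    for ped in set(pedigree_ids):
        idx = [i for i, p in enumerate(pedigree_ids) if p == ped]
        pool = quantile_normalize([arrays[i] for i in idx])
        for i, norm in zip(idx, pool):
            result[i] = norm
    return result


arrays = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [10.0, 30.0, 20.0], [30.0, 10.0, 20.0]]
pedigrees = ["A", "A", "B", "B"]
print(normalize_by_pool(arrays, pedigrees, within_pedigree=True))
print(normalize_by_pool(arrays, pedigrees, within_pedigree=False))
```

In this toy example the single pool forces all four arrays onto one distribution, while the within-pedigree pools preserve the difference in scale between pedigrees A and B, which is the kind of between-family variation the abstract warns can be over-corrected.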
