Related Articles

1.  A hidden Markov random field model for genome-wide association studies 
Biostatistics (Oxford, England)  2009;11(1):139-150.
Genome-wide association studies (GWAS) are increasingly utilized for identifying novel susceptibility variants for complex traits, but there is little consensus on analysis methods for such data. The most commonly used methods include single-marker analysis of single nucleotide polymorphisms (SNPs) or haplotype analysis with Bonferroni correction for multiple comparisons. Since the SNPs in typical GWAS are often in linkage disequilibrium (LD), at least locally, Bonferroni correction of multiple comparisons often leads to conservative error control and therefore lower statistical power. In this paper, we propose a hidden Markov random field model (HMRF) for GWAS analysis based on a weighted LD graph built from the prior LD information among the SNPs and an efficient iterative conditional mode algorithm for estimating the model parameters. This model effectively utilizes the LD information in calculating the posterior probability that an SNP is associated with the disease. These posterior probabilities can then be used to define a false discovery controlling procedure in order to select the disease-associated SNPs. Simulation studies demonstrated the potential gain in power over single SNP analysis. The proposed method is especially effective in identifying SNPs with borderline significance at the single-marker level that nonetheless are in high LD with significant SNPs. In addition, by simultaneously considering the SNPs in LD, the proposed method can also help to reduce the number of false identifications of disease-associated SNPs. We demonstrate the application of the proposed HMRF model using data from a case–control GWAS of neuroblastoma and identify 1 new SNP that is potentially associated with neuroblastoma.
doi:10.1093/biostatistics/kxp043
PMCID: PMC2800164  PMID: 19822692
Empirical Bayes; False discovery; Iterative conditional model; Linkage disequilibrium
2.  Correction for Multiplicity in Genetic Association Studies of Triads: The Permutational TDT 
Annals of human genetics  2010;75(2):284-291.
Summary
New technology for large-scale genotyping has created new challenges for statistical analysis. Correcting for multiple comparison without discarding true positive results and extending methods to triad studies are among the important problems facing statisticians. We present a one-sample permutation test for testing transmission disequilibrium hypotheses in triad studies, and show how this test can be used for multiple single nucleotide polymorphism (SNP) testing. The resulting multiple comparison procedure is shown in the case of the transmission disequilibrium test to control the familywise error. Furthermore, this procedure can handle multiple possible modes of risk inheritance per SNP. The resulting permutational procedure is shown through simulation of SNP data to be more powerful than the Bonferroni procedure when the SNPs are in linkage disequilibrium. Moreover, permutations implicitly avoid any multiple comparison correction penalties when the SNP has a rare allele. The method is illustrated by analyzing a large candidate gene study of neural tube defects and an independent study of oral clefts, where the smallest adjusted p-values using the permutation procedure are approximately half those of the Bonferroni procedure. We conclude that permutation tests are more powerful for identifying disease-associated SNPs in candidate gene studies and are useful for analysis of triad studies.
doi:10.1111/j.1469-1809.2010.00626.x
PMCID: PMC3117224  PMID: 21108625
Exchangeable; familywise error rate; linkage disequilibrium; power
3.  Genome-wide association studies of rheumatoid arthritis data via multiple hypothesis testing methods for correlated tests 
BMC Proceedings  2009;3(Suppl 7):S38.
Genome-wide association studies often involve testing hundreds of thousands of single-nucleotide polymorphisms (SNPs). These tests may be highly correlated because of linkage disequilibrium among SNPs. Multiple testing correction ignoring the correlation among markers, as is done in the Bonferroni procedure, can cause loss of power. Several multiple testing adjustment methods accounting for correlations among tests have been developed and have shown improved power compared to the Bonferroni procedure. These methods include a Monte Carlo (MC) method and a method of computing p-values adjusted for correlated tests. The objective of this study is to apply these two multiple testing methods to genome-wide association study of the Genetic Analysis Workshop 16 rheumatoid arthritis data from the North American Rheumatoid Arthritis Consortium, to compare the performance of these two methods to the Bonferroni procedure in identifying susceptibility loci underlying rheumatoid arthritis, and to discuss the strengths and weaknesses of these methods. The results show that both the MC method and p-values adjusted for correlated tests method identified more significant SNPs, thus potentially have higher power than the corresponding Bonferroni methods using the same test statistics as in the MC method and p-values adjusted for correlated tests, respectively. Simulation studies demonstrate that the MC method may have slightly higher power than the p-values adjusted for correlated tests method.
PMCID: PMC2795936  PMID: 20018029
4.  Establishing an adjusted p-value threshold to control the family-wide type 1 error in genome wide association studies 
BMC Genomics  2008;9:516.
Background
By assaying hundreds of thousands of single nucleotide polymorphisms, genome wide association studies (GWAS) allow for a powerful, unbiased review of the entire genome to localize common genetic variants that influence health and disease. Although it is widely recognized that some correction for multiple testing is necessary, in order to control the family-wide Type 1 Error in genetic association studies, it is not clear which method to utilize. One simple approach is to perform a Bonferroni correction using all n single nucleotide polymorphisms (SNPs) across the genome; however this approach is highly conservative and would "overcorrect" for SNPs that are not truly independent. Many SNPs fall within regions of strong linkage disequilibrium (LD) ("blocks") and should not be considered "independent".
Results
We proposed to approximate the number of "independent" SNPs by counting 1 SNP per LD block, plus all SNPs outside of blocks (interblock SNPs). We examined the effective number of independent SNPs for Genome Wide Association Study (GWAS) panels. In the CEPH Utah (CEU) population, by considering the interdependence of SNPs, we could reduce the total number of effective tests within the Affymetrix and Illumina SNP panels from 500,000 and 317,000 to 67,000 and 82,000 "independent" SNPs, respectively. For the Affymetrix 500 K and Illumina 317 K GWAS SNP panels we recommend using 10^-5, 10^-7 and 10^-8 and for the Phase II HapMap CEPH Utah and Yoruba populations we recommend using 10^-6, 10^-7 and 10^-9 as "suggestive", "significant" and "highly significant" p-value thresholds to properly control the family-wide Type 1 error.
Conclusion
By approximating the effective number of independent SNPs across the genome we are able to 'correct' for a more accurate number of tests and therefore develop 'LD adjusted' Bonferroni corrected p-value thresholds that account for the interdependence of SNPs on well-utilized commercially available SNP "chips". These thresholds will serve as guides to researchers trying to decide which regions of the genome should be studied further.
doi:10.1186/1471-2164-9-516
PMCID: PMC2621212  PMID: 18976480
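The block-counting rule described in the abstract above can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' software: the function names and the toy panel (three LD blocks plus three interblock SNPs) are hypothetical, and real LD blocks would first have to be inferred from genotype data.

```python
def effective_tests(block_sizes, n_interblock):
    """Count one test per LD block plus one per SNP outside any block."""
    return len(block_sizes) + n_interblock


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold for a family-wide error rate alpha."""
    return alpha / n_tests


# Hypothetical toy panel: LD blocks of 4, 2 and 5 SNPs, plus 3 interblock SNPs.
blocks = [4, 2, 5]
interblock = 3

n_total = sum(blocks) + interblock            # 14 genotyped SNPs
n_eff = effective_tests(blocks, interblock)   # 3 blocks + 3 SNPs = 6 tests

naive = bonferroni_threshold(0.05, n_total)   # correct for every SNP
adjusted = bonferroni_threshold(0.05, n_eff)  # correct for "independent" SNPs

print(f"naive threshold:       {naive:.2e}")
print(f"LD-adjusted threshold: {adjusted:.2e}")
```

Applied to the figures quoted in the abstract, 0.05 divided by 67,000 effective tests gives roughly 7.5 × 10^-7, which is the same order of magnitude as the recommended 10^-7 "significant" threshold.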
5.  The null distributions of test statistics in genomewide association studies 
Statistics in biosciences  2009;1(2):214-227.
SUMMARY
Genomewide association (GWA) studies assay hundreds of thousands of single nucleotide polymorphisms (SNPs) simultaneously across the entire genome and associate them with diseases, other biological or clinical traits. The association analysis usually tests each SNP as an independent entity and ignores the biological information such as linkage disequilibrium. Although the Bonferroni correction and other approaches have been proposed to address the issue of multiple comparisons as a result of testing many SNPs, there is a lack of understanding of the distribution of an association test statistic when an entire genome is considered together. In other words, there are extensive efforts in hypothesis testing, and almost no attempt in estimating the density under the null hypothesis. By estimating the true null distribution, we can apply the result directly to hypothesis testing; better assess the existing approaches of multiple comparisons; and evaluate the impact of linkage disequilibrium on the GWA studies. To this end, we estimate the empirical null distribution of an association test statistic in GWA studies using simulated population data. We further propose a convenient and accurate method based on adaptive spline to estimate the empirical value in GWA studies and validate our findings using a real data set. Our method enables us to fully characterize the null distribution of an association test that not only can be used to test the null hypothesis of no association, but also provide important information about the impact of density of the genetic markers on the significance of the tests. Our method does not require users to perform computationally intensive permutations, and hence provides a timely solution to an important and difficult problem in GWA studies.
doi:10.1007/s12561-009-9011-4
PMCID: PMC2826849  PMID: 20190869
critical value; generalized extreme-value distribution; genomewide association
6.  A New Approach to Account for the Correlations among Single Nucleotide Polymorphisms in Genome-Wide Association Studies 
Human Heredity  2011;72(1):1-9.
In genetic association studies, such as genome-wide association studies (GWAS), the number of single nucleotide polymorphisms (SNPs) can be as large as hundreds of thousands. Due to linkage disequilibrium, many SNPs are highly correlated; assuming they are independent is not valid. The commonly used multiple comparison methods, such as Bonferroni correction, are not appropriate and are too conservative when applied to GWAS. To overcome these limitations, many approaches have been proposed to estimate the so-called effective number of independent tests to account for the correlations among SNPs. However, many current effective number estimation methods are based on eigenvalues of the correlation matrix. When the dimension of the matrix is large, the numeric results may be unreliable or even unobtainable. To circumvent this obstacle and provide better estimates, we propose a new effective number estimation approach which is not based on the eigenvalues. We compare the new method with others through simulated and real data. The comparison results show that the proposed method has very good performance.
doi:10.1159/000330135
PMCID: PMC3171280  PMID: 21849789
Effective number; Genome-wide association studies; Multiple comparisons; Single nucleotide polymorphisms
7.  Multiple testing corrections for imputed SNPs 
Genetic epidemiology  2011;35(3):154-158.
Multiple testing corrections are an active research topic in genetic association studies, especially for genome-wide association studies (GWAS), where tests of association with traits are conducted at millions of imputed SNPs with estimated allelic dosages now. Failure to address multiple comparisons appropriately can introduce excess false positive results and make subsequent studies following up those results inefficient. Permutation tests are considered the gold standard in multiple testing adjustment; however, this procedure is computationally demanding, especially for GWAS. Notably, the permutation thresholds for the huge number of estimated allelic dosages in real data sets have not been reported. Although many researchers have recently developed algorithms to rapidly approximate the permutation thresholds with accuracy similar to the permutation test, these methods have not been verified with estimated allelic dosages. In this study, we compare recently published multiple testing correction methods using 2.5M estimated allelic dosages. We also derive permutation significance levels based on 10,000 GWAS results under the null hypothesis of no association. Our results show that the simpleM method works well with estimated allelic dosages and gives the closest approximation to the permutation threshold while requiring the least computation time.
doi:10.1002/gepi.20563
PMCID: PMC3055936  PMID: 21254223
multiple testing; genome-wide association studies; imputed SNPs; allelic dosages
8.  Common variants at 7p21 are associated with frontotemporal lobar degeneration with TDP-43 inclusions 
Van Deerlin, Vivianna M. | Sleiman, Patrick M. A. | Martinez-Lage, Maria | Chen-Plotkin, Alice | Wang, Li-San | Graff-Radford, Neill R | Dickson, Dennis W. | Rademakers, Rosa | Boeve, Bradley F. | Grossman, Murray | Arnold, Steven E. | Mann, David M.A. | Pickering-Brown, Stuart M. | Seelaar, Harro | Heutink, Peter | van Swieten, John C. | Murrell, Jill R. | Ghetti, Bernardino | Spina, Salvatore | Grafman, Jordan | Hodges, John | Spillantini, Maria Grazia | Gilman, Sid | Lieberman, Andrew P. | Kaye, Jeffrey A. | Woltjer, Randall L. | Bigio, Eileen H | Mesulam, Marsel | al-Sarraj, Safa | Troakes, Claire | Rosenberg, Roger N. | White, Charles L. | Ferrer, Isidro | Lladó, Albert | Neumann, Manuela | Kretzschmar, Hans A. | Hulette, Christine Marie | Welsh-Bohmer, Kathleen A. | Miller, Bruce L | Alzualde, Ainhoa | de Munain, Adolfo Lopez | McKee, Ann C. | Gearing, Marla | Levey, Allan I. | Lah, James J. | Hardy, John | Rohrer, Jonathan D. | Lashley, Tammaryn | Mackenzie, Ian R.A. | Feldman, Howard H. | Hamilton, Ronald L. | Dekosky, Steven T. | van der Zee, Julie | Kumar-Singh, Samir | Van Broeckhoven, Christine | Mayeux, Richard | Vonsattel, Jean Paul G. | Troncoso, Juan C. | Kril, Jillian J | Kwok, John B.J. | Halliday, Glenda M. | Bird, Thomas D. | Ince, Paul G. | Shaw, Pamela J. | Cairns, Nigel J. | Morris, John C. | McLean, Catriona Ann | DeCarli, Charles | Ellis, William G. | Freeman, Stefanie H. | Frosch, Matthew P. | Growdon, John H. | Perl, Daniel P. | Sano, Mary | Bennett, David A. | Schneider, Julie A. | Beach, Thomas G. | Reiman, Eric M. | Woodruff, Bryan K. | Cummings, Jeffrey | Vinters, Harry V. | Miller, Carol A. | Chui, Helena C. | Alafuzoff, Irina | Hartikainen, Päivi | Seilhean, Danielle | Galasko, Douglas | Masliah, Eliezer | Cotman, Carl W. | Tuñón, M. Teresa | Martínez, M. Cristina Caballero | Munoz, David G. | Carroll, Steven L. | Marson, Daniel | Riederer, Peter F. | Bogdanovic, Nenad | Schellenberg, Gerard D. | Hakonarson, Hakon | Trojanowski, John Q. | Lee, Virginia M.-Y.
Nature genetics  2010;42(3):234-239.
Frontotemporal lobar degeneration (FTLD) is the second most common cause of presenile dementia. The predominant neuropathology is FTLD with TAR DNA binding protein (TDP-43) inclusions (FTLD-TDP)1. FTLD-TDP is frequently familial resulting from progranulin (GRN) mutations. We assembled an international collaboration to identify susceptibility loci for FTLD-TDP, using genome-wide association (GWA). We found that FTLD-TDP associates with multiple SNPs mapping to a single linkage disequilibrium (LD) block on 7p21 that contains TMEM106B in a GWA study (GWAS) on 515 FTLD-TDP cases. Three SNPs retained genome-wide significance following Bonferroni correction; top SNP rs1990622 (P=1.08×10^−11; odds ratio (OR) minor allele (C) 0.61, 95% CI 0.53-0.71). The association replicated in 89 FTLD-TDP cases (rs1990622; P=2×10^−4). TMEM106B variants may confer risk by increasing TMEM106B expression. TMEM106B variants also contribute to genetic risk for FTLD-TDP in patients with GRN mutations. Our data implicate TMEM106B as a strong risk factor for FTLD-TDP suggesting an underlying pathogenic mechanism.
doi:10.1038/ng.536
PMCID: PMC2828525  PMID: 20154673
9.  ParaHaplo: A program package for haplotype-based whole-genome association study using parallel computing 
Background
Since more than a million single-nucleotide polymorphisms (SNPs) are analyzed in any given genome-wide association study (GWAS), performing multiple comparisons can be problematic. To cope with multiple-comparison problems in GWAS, haplotype-based algorithms were developed to correct for multiple comparisons at multiple SNP loci in linkage disequilibrium. A permutation test can also control problems inherent in multiple testing; however, both the calculation of exact probability and the execution of permutation tests are time-consuming. Faster methods for calculating exact probabilities and executing permutation tests are required.
Methods
We developed a set of computer programs for the parallel computation of accurate P-values in haplotype-based GWAS. Our program, ParaHaplo, is intended for workstation clusters using the Intel Message Passing Interface (MPI). We compared the performance of our algorithm to that of the regular permutation test on JPT and CHB of HapMap.
Results
ParaHaplo can detect smaller differences between 2 populations than SNP-based GWAS. We also found that parallel-computing techniques made ParaHaplo 100-fold faster than a non-parallel version of the program.
Conclusion
ParaHaplo is a useful tool in conducting haplotype-based GWAS. Since the data sizes of such projects continue to increase, the use of fast computations with parallel computing--such as that used in ParaHaplo--will become increasingly important. The executable binaries and program sources of ParaHaplo are available at the following address:
doi:10.1186/1751-0473-4-7
PMCID: PMC2774321  PMID: 19845960
10.  PRESTO: Rapid calculation of order statistic distributions and multiple-testing adjusted P-values via permutation for one and two-stage genetic association studies 
BMC Bioinformatics  2008;9:309.
Background
Large-scale genetic association studies can test hundreds of thousands of genetic markers for association with a trait. Since the genetic markers may be correlated, a Bonferroni correction is typically too stringent a correction for multiple testing. Permutation testing is a standard statistical technique for determining statistical significance when performing multiple correlated tests for genetic association. However, permutation testing for large-scale genetic association studies is computationally demanding and calls for optimized algorithms and software. PRESTO is a new software package for genetic association studies that performs fast computation of multiple-testing adjusted P-values via permutation of the trait.
Results
PRESTO is an order of magnitude faster than other existing permutation testing software, and can analyze a large genome-wide association study (500 K markers, 5 K individuals, 1 K permutations) in approximately one hour of computing time. PRESTO has several unique features that are useful in a wide range of studies: it reports empirical null distributions for the top-ranked statistics (i.e. order statistics), it performs user-specified combinations of allelic and genotypic tests, it performs stratified analysis when sampled individuals are from multiple populations and each individual's population of origin is specified, and it determines significance levels for one and two-stage genotyping designs. PRESTO is designed for case-control studies, but can also be applied to trio data (parents and affected offspring) if transmitted parental alleles are coded as case alleles and untransmitted parental alleles are coded as control alleles.
Conclusion
PRESTO is a platform-independent software package that performs fast and flexible permutation testing for genetic association studies. The PRESTO executable file, Java source code, example data, and documentation are freely available at .
doi:10.1186/1471-2105-9-309
PMCID: PMC2483288  PMID: 18620604
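The idea of multiple-testing adjusted P-values obtained by permuting the trait can be illustrated with a small Python sketch. It is not PRESTO's implementation: the test statistic (absolute difference in mean minor-allele count between cases and controls), the function names and the toy genotypes are all assumptions chosen for brevity. Each permutation shuffles the case/control labels once and records the largest statistic over all markers, so correlation between markers is preserved in the null distribution.

```python
import random


def allelic_stat(genotypes, trait):
    """Hypothetical statistic: |mean allele count in cases - mean in controls|."""
    cases = [g for g, t in zip(genotypes, trait) if t == 1]
    controls = [g for g, t in zip(genotypes, trait) if t == 0]
    return abs(sum(cases) / len(cases) - sum(controls) / len(controls))


def max_stat_adjusted_pvalues(markers, trait, n_perm=1000, seed=0):
    """Adjusted p-values from the permutation distribution of the maximum statistic."""
    rng = random.Random(seed)
    observed = [allelic_stat(m, trait) for m in markers]
    exceed = [0] * len(markers)
    shuffled = list(trait)
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # permute the trait, leave genotypes (and LD) intact
        max_stat = max(allelic_stat(m, shuffled) for m in markers)
        for i, obs in enumerate(observed):
            if max_stat >= obs:
                exceed[i] += 1
    # Add-one correction so that no adjusted p-value is exactly zero.
    return [(e + 1) / (n_perm + 1) for e in exceed]


# Toy data: 6 cases followed by 6 controls; genotypes are minor-allele counts.
trait = [1] * 6 + [0] * 6
markers = [
    [2, 2, 1, 2, 1, 2, 0, 0, 1, 0, 0, 0],  # strongly associated marker
    [2, 2, 1, 2, 1, 1, 0, 0, 1, 0, 0, 1],  # correlated with the first marker
    [0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2],  # no case/control difference
]
adj = max_stat_adjusted_pvalues(markers, trait)
print(adj)
```

Because the maximum is taken over markers within each permutation, two markers in strong LD are not penalized as two independent tests, which is where the gain over a Bonferroni correction comes from.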
11.  Meta-analysis of genetic association studies and adjustment for multiple testing of correlated SNPs and traits 
Genetic epidemiology  2010;34(7):739-746.
Meta-analysis has become a key component of well-designed genetic association studies due to the boost in statistical power achieved by combining results across multiple samples of individuals and the need to validate observed associations in independent studies. Meta-analyses of genetic association studies based on multiple SNPs and traits are subject to the same multiple testing issues as single-sample studies, but it is often difficult to adjust accurately for the multiple tests. Procedures such as Bonferroni may control the type I error rate but will generally provide an overly harsh correction if SNPs or traits are correlated. Depending on study design, availability of individual-level data, and computational requirements, permutation testing may not be feasible in a meta-analysis framework. In this paper we present methods for adjusting for multiple correlated tests under several study designs commonly employed in meta-analyses of genetic association tests. Our methods are applicable to both prospective meta-analyses in which several samples of individuals are analyzed with the intent to combine results, and retrospective meta-analyses, in which results from published studies are combined, including situations in which 1) individual-level data are unavailable, and 2) different sets of SNPs are genotyped in different studies due to random missingness or two-stage design. We show through simulation that our methods accurately control the rate of type I error and achieve improved power over multiple testing adjustments that do not account for correlation between SNPs or traits.
doi:10.1002/gepi.20538
PMCID: PMC3070606  PMID: 20878715
meta-analysis; association study; multiple testing; SNPs
12.  Fast and Accurate Approximation to Significance Tests in Genome-Wide Association Studies 
Genome-wide association studies commonly involve simultaneous tests of millions of single nucleotide polymorphisms (SNPs) for disease association. The SNPs in nearby genomic regions, however, are often highly correlated due to linkage disequilibrium (LD, a genetic term for correlation). Simple Bonferroni correction for multiple comparisons is therefore too conservative. Permutation tests, which are often employed in practice, are both computationally expensive for genome-wide studies and limited in scope. We present an accurate and computationally efficient method, based on Poisson de-clumping heuristics, for approximating genome-wide significance of SNP associations. Compared with permutation tests and other multiple comparison adjustment approaches, our method computes the most accurate and robust p-value adjustments for millions of correlated comparisons within seconds. We demonstrate analytically that the accuracy and the efficiency of our method are nearly independent of the sample size, the number of SNPs, and the scale of p-values to be adjusted. In addition, our method can be easily adapted to estimate the false discovery rate. When applied to genome-wide SNP datasets, we observed highly variable p-value adjustment results evaluated from different genomic regions. The variation in adjustments along the genome, however, is well conserved between the European and the African populations. The p-value adjustments are significantly correlated with LD among SNPs, recombination rates, and SNP densities. Given the large variability of sequence features in the genome, we further discuss a novel approach of using SNP-specific (local) thresholds to detect genome-wide significant associations. This article has supplementary material online.
doi:10.1198/jasa.2011.ap10657
PMCID: PMC3226809  PMID: 22140288
Genome-wide association study; Multiple comparison; Poisson approximation
13.  A combined strategy for quantitative trait loci detection by genome-wide association 
BMC Proceedings  2009;3(Suppl 1):S6.
Background
We applied a range of genome-wide association (GWA) methods to map quantitative trait loci (QTL) in the simulated dataset provided by the 12th QTLMAS workshop in order to derive an effective strategy.
Results
A variance component linkage analysis revealed QTLs but with low resolution. Three single-marker based GWA methods were then applied: Transmission Disequilibrium Test and single marker regression, fitting an additive model or a genotype model, on phenotypes pre-corrected for pedigree and fixed effects. These methods detected QTL positions with high concordance to each other and with greater refinement of the linkage signals. Further multiple-marker and haplotype analyses confirmed the results with higher significance. Two-locus interaction analysis detected two epistatic pairs of markers that were not significant by marginal effects. Overall, using stringent Bonferroni thresholds we identified 9 additive QTL and 2 epistatic interactions, which together explained about 12.3% of the corrected phenotypic variance.
Conclusion
The combination of methods that are robust against population stratification, like QTDT, with flexible linear models that take account of the family structure provided consistent results. Extensive simulations are still required to determine appropriate thresholds for more advanced models including epistasis.
PMCID: PMC2654500  PMID: 19278545
14.  Comparison of type I error for multiple test corrections in large single-nucleotide polymorphism studies using principal components versus haplotype blocking algorithms 
BMC Genetics  2005;6(Suppl 1):S78.
Although permutation testing has been the gold standard for assessing significance levels in studies using multiple markers, it is time-consuming. A Bonferroni correction to the nominal p-value that uses the underlying pair-wise linkage disequilibrium (LD) structure among the markers to determine the number of effectively independent tests has recently been proposed. We propose using the number of independent LD blocks plus the number of independent single-nucleotide polymorphisms for correction. Using the Collaborative Study on the Genetics of Alcoholism LD data for chromosome 21, we simulated 1,000 replicates of parent-child trio data under the null hypothesis with two levels of LD: moderate and high. Assuming haplotype blocks were independent, we calculated the number of independent statistical tests using 3 haplotype blocking algorithms. We then compared the type I error rates using a principal components-based method, the three blocking methods, a traditional Bonferroni correction, and the unadjusted p-values obtained from FBAT. Under high LD conditions, the PC method and one of the blocking methods were slightly conservative, whereas the 2 other blocking methods exceeded the target type I error rate. Under conditions of moderate LD, we show that the blocking algorithm corrections are closest to the desired type I error, although still slightly conservative, with the principal components-based method being almost as conservative as the traditional Bonferroni correction.
doi:10.1186/1471-2156-6-S1-S78
PMCID: PMC1866703  PMID: 16451692
15.  FastANOVA: an Efficient Algorithm for Genome-Wide Association Study 
Studying the association between a quantitative phenotype (such as height or weight) and single nucleotide polymorphisms (SNPs) is an important problem in biology. To understand the underlying mechanisms of complex phenotypes, it is often necessary to consider joint genetic effects across multiple SNPs. The ANOVA (analysis of variance) test is routinely used in association studies. Important findings from studying gene-gene (SNP-pair) interactions are appearing in the literature. However, the number of SNPs can be up to millions. Evaluating joint effects of SNPs is a challenging task even for SNP-pairs. Moreover, with a large number of correlated SNPs, a permutation procedure is preferred over simple Bonferroni correction for properly controlling the family-wise error rate and retaining mapping power, which dramatically increases the computational cost of an association study.
In this paper, we study the problem of finding SNP-pairs that have significant associations with a given quantitative phenotype. We propose an efficient algorithm, FastANOVA, for performing ANOVA tests on SNP-pairs in a batch mode, which also supports large permutation tests. We derive an upper bound of the SNP-pair ANOVA test, which can be expressed as the sum of two terms. The first term is based on the single-SNP ANOVA test. The second term is based on the SNPs and is independent of any phenotype permutation. Furthermore, SNP-pairs can be organized into groups, each of which shares a common upper bound. This allows for maximum reuse of intermediate computation, efficient upper bound estimation, and effective SNP-pair pruning. Consequently, FastANOVA only needs to perform the ANOVA test on a small number of candidate SNP-pairs without the risk of missing any significant ones. Extensive experiments demonstrate that FastANOVA is orders of magnitude faster than the brute-force implementation of ANOVA tests on all SNP pairs.
PMCID: PMC2951741  PMID: 20945829
Association study; ANOVA test
16.  Incorporating prior knowledge to facilitate discoveries in a genome-wide association study on age-related macular degeneration 
BMC Research Notes  2010;3:26.
Background
Substantial genotyping data produced by current high-throughput technologies have brought opportunities and difficulties. With the number of single-nucleotide polymorphisms (SNPs) going into millions comes the harsh challenge of multiple-testing adjustment. However, even with the false discovery rate (FDR) control approach, a genome-wide association study (GWAS) may still fall short of discovering any true positive gene, particularly when it has a relatively small sample size.
Findings
To counteract such a harsh multiple-testing penalty, in this report, we incorporate findings from previous linkage and association studies to re-analyze a GWAS on age-related macular degeneration. While previous Bonferroni correction and the traditional FDR approach detected only one significant SNP (rs380390), here we have been able to detect seven significant SNPs with an easy-to-implement prioritized subset analysis (PSA) with the overall FDR controlled at 0.05. These include SNPs within three genes: CFH, CFHR4, and SGCD.
Conclusions
Based on the success of this example, we advocate using the simple method of PSA to facilitate discoveries in future GWASs.
doi:10.1186/1756-0500-3-26
PMCID: PMC2843735  PMID: 20181037
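The contrast between a genome-wide FDR correction and a prioritized analysis can be sketched as follows. The Benjamini-Hochberg step-up rule used here is standard, but how the prioritized subset is handled is an assumption made for illustration (FDR applied separately within the prioritized SNPs and within the remainder); the abstract does not spell out the PSA procedure, and the p-values and indices below are invented.

```python
def bh_reject(pvalues, q=0.05):
    """Benjamini-Hochberg step-up rule: indices of rejected hypotheses."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= q * rank / m:
            k = rank  # largest rank whose p-value falls under its threshold
    return sorted(order[:k])


def prioritized_fdr(pvalues, prioritized, q=0.05):
    """Assumed scheme: apply BH within the prioritized subset and the rest separately."""
    pri = [i for i in range(len(pvalues)) if i in prioritized]
    rest = [i for i in range(len(pvalues)) if i not in prioritized]
    hits = [pri[j] for j in bh_reject([pvalues[i] for i in pri], q)]
    hits += [rest[j] for j in bh_reject([pvalues[i] for i in rest], q)]
    return sorted(hits)


# Invented p-values for 10 SNPs; SNPs 0-2 lie in regions flagged by earlier studies.
pvalues = [0.0001, 0.012, 0.02, 0.3, 0.5, 0.6, 0.7, 0.8, 0.9, 0.95]

print(bh_reject(pvalues))                   # genome-wide BH
print(prioritized_fdr(pvalues, {0, 1, 2}))  # BH within prioritized subset and rest
```

In this toy example the genome-wide correction keeps only the strongest SNP, while the prioritized analysis also recovers the two borderline SNPs, because they compete against far fewer tests.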
17.  Genome-Wide Associations of Gene Expression Variation in Humans 
PLoS Genetics  2005;1(6):e78.
The exploration of quantitative variation in human populations has become one of the major priorities for medical genetics. The successful identification of variants that contribute to complex traits is highly dependent on reliable assays and genetic maps. We have performed a genome-wide quantitative trait analysis of 630 genes in 60 unrelated Utah residents with ancestry from Northern and Western Europe using the publicly available phase I data of the International HapMap project. The genes are located in regions of the human genome with elevated functional annotation and disease interest including the ENCODE regions spanning 1% of the genome, Chromosome 21 and Chromosome 20q12–13.2. We apply three different methods of multiple test correction, including Bonferroni, false discovery rate, and permutations. For the 374 expressed genes, we find many regions with statistically significant association of single nucleotide polymorphisms (SNPs) with expression variation in lymphoblastoid cell lines after correcting for multiple tests. Based on our analyses, the signal proximal (cis-) to the genes of interest is more abundant and more stable than distal and trans across statistical methodologies. Our results suggest that regulatory polymorphism is widespread in the human genome and show that the 5-kb (phase I) HapMap has sufficient density to enable linkage disequilibrium mapping in humans. Such studies will significantly enhance our ability to annotate the non-coding part of the genome and interpret functional variation. In addition, we demonstrate that the HapMap cell lines themselves may serve as a useful resource for quantitative measurements at the cellular level.
Synopsis
With the finished reference sequence of the human genome now available, focus has shifted towards trying to identify all of the functional elements within the sequence. Although quite a lot of progress has been made towards identifying some classes of genomic elements, in particular protein-coding sequences, the characterization of regulatory elements remains a challenge. The authors describe the genetic mapping of regions of the genome that have functional effects on quantitative levels of gene expression. Gene expression of 630 genes was measured in cell lines derived from 60 unrelated human individuals, the same Utah residents of Northern and Western European ancestry that have been genetically well-characterized by The International HapMap Project. This paper reports significant variation among individuals with respect to levels of gene expression, and demonstrates that this quantitative trait has a genetic basis. For some genes, the genetic signal was localized to specific locations in the human genome sequence; in most cases the genomic region associated with expression variation was physically close to the gene whose expression it regulated. The authors demonstrate the feasibility of performing whole-genome association scans to map quantitative traits, and highlight statistical issues that are increasingly important for whole-genome disease mapping studies.
doi:10.1371/journal.pgen.0010078
PMCID: PMC1315281  PMID: 16362079
19.  Association Test Based on SNP Set: Logistic Kernel Machine Based Test vs. Principal Component Analysis 
PLoS ONE  2012;7(9):e44978.
GWAS has greatly facilitated the discovery of risk SNPs associated with complex diseases. Traditional methods analyze SNPs individually and are limited by low power and reproducibility since correction for multiple comparisons is necessary. Several methods have been proposed based on grouping SNPs into SNP sets using biological knowledge and/or genomic features. In this article, we compare the linear kernel machine based test (LKM) and principal components analysis based approach (PCA) using simulated datasets under the scenarios of 0 to 3 causal SNPs, as well as simple and complex linkage disequilibrium (LD) structures of the simulated regions. Our simulation study demonstrates that both LKM and PCA can control the type I error at the significance level of 0.05. If the causal SNP is in strong LD with the genotyped SNPs, both the PCA with a small number of principal components (PCs) and the LKM with kernel of linear or identical-by-state function are valid tests. However, if the LD structure is complex, such as several LD blocks in the SNP set, or when the causal SNP is not in the LD block in which most of the genotyped SNPs reside, more PCs should be included to capture the information of the causal SNP. Simulation studies also demonstrate the ability of LKM and PCA to combine information from multiple causal SNPs and to provide increased power over individual SNP analysis. We also apply LKM and PCA to analyze two SNP sets extracted from an actual GWAS dataset on non-small cell lung cancer.
doi:10.1371/journal.pone.0044978
PMCID: PMC3441747  PMID: 23028716
20.  Identification of inherited genetic variations influencing prognosis in early onset breast cancer 
Cancer research  2013;73(6):1883-1891.
Genome Wide Association Studies (GWAS) have begun to investigate associations between inherited genetic variations and breast cancer prognosis. Here we report our findings from a GWAS conducted in 536 early onset breast cancer patients aged 40 or less at diagnosis and with a mean follow-up period of 4.1 years (S.D. = 1.96). Patients were selected from the POSH (Prospective study of Outcomes in Sporadic versus Hereditary breast cancer). A Bonferroni correction for multiple testing determined that a p-value of 1.0 × 10−7 was a statistically significant association signal. Following QC we identified 487496 SNPs for association tests in stage 1. In stage 2, 35 SNPs with the most significant associations were genotyped in 1516 independent cases from the same early onset cohort. In stage 2, 11 SNPs remained associated in the same direction (p ≤ 0.05). Fixed effects meta-analysis models identified one SNP associated at close to genome wide level of significance 556 kb upstream of the ARRDC3 locus, HR=1.61 (1.33-1.96, p=9.5 × 10−7). Four further associations at or close to the PBX1, RORα, NTN1 and SYT6 loci also came close to genome wide significance levels (p=10−6). In the first ever GWAS for identification of SNPs associated with prognosis in early onset breast cancer patients we report a SNP upstream of the ARRDC3 locus as potentially associated with prognosis (Median follow-up time for genotypes CC=4 years, CT=3 years and TT=2.7 years, Wilcoxon rank sum test CC vs. CT, p=4 × 10−4 and CT vs. TT, p=0.76). Four further loci might also be associated with prognosis.
doi:10.1158/0008-5472.CAN-12-3377
PMCID: PMC3601979  PMID: 23319801
Early onset; Breast cancer; Prognosis; Survival analysis and GWAs
21.  A Common Variant in the SIAH2 Locus Is Associated with Estrogen Receptor-Positive Breast Cancer in the Chinese Han Population 
PLoS ONE  2013;8(11):e79365.
Background
Genome-wide association studies (GWAS) have identified many loci associated with breast cancer risk. These studies have primarily been conducted in populations of European descent.
Objective
To determine whether previously reported susceptibility loci in other ethnic groups are also risk factors for breast cancer in a Chinese population.
Method
We genotyped 21 previously reported single nucleotide polymorphisms (SNPs) within a female Chinese cohort of 1203 breast cancer cases and 2525 healthy controls using the Sequenom iPlex platform. Fourteen SNPs passed the quality control test. These SNPs were subjected to statistical analysis for the entire cohort and were further analyzed for estrogen receptor (ER) status. The associations of the SNPs with disease susceptibility were assessed using logistic regression, adjusting for age. The Bonferroni correction was used to conservatively account for multiple testing, and the threshold for statistical significance was P<3.57×10−3 (0.05/14).
Result
Although none of the SNPs showed an overall association with breast cancer, an analysis of the ER status of the breast cancer patients revealed that the SIAH2 locus (rs6788895; P = 5.73×10−4, odds ratio [OR] = 0.81) is associated with ER-positive breast cancer.
Conclusion
A common variant in the SIAH2 locus is associated with ER-positive breast cancer in the Chinese Han population. The replication of published GWAS results in other ethnic groups provides important information regarding the genetic etiology of breast cancer.
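The Bonferroni threshold quoted in the Method section can be checked with a few lines of arithmetic. The sketch below is illustrative only; the function name is an assumption, and the numbers are those reported in the abstract (a family-wise level of 0.05, 14 SNPs passing quality control, and P = 5.73×10−4 for rs6788895).

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


threshold = bonferroni_threshold(0.05, 14)
print(f"{threshold:.2e}")    # 3.57e-03, matching the threshold in the abstract

# P value reported for rs6788895 in the ER-positive analysis.
p_rs6788895 = 5.73e-4
print(p_rs6788895 < threshold)  # True: the SNP passes the corrected threshold
```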
doi:10.1371/journal.pone.0079365
PMCID: PMC3823686  PMID: 24244489
22.  No association between Parkinson’s disease alleles and the risk of melanoma 
Background
Recent data showed that melanoma was more common among patients with Parkinson’s disease (PD) than individuals without PD and vice versa. It has been hypothesized that these two diseases may share common genetic and environmental risk factors.
Methods
We evaluated the association between single-nucleotide polymorphisms (SNPs) selected based on recent genome-wide association studies (GWAS) on PD risk and the risk of melanoma using 2,297 melanoma cases and 6,651 controls.
Results
The PD SNP rs156429 in the chromosome 7p15 region was nominally associated with melanoma risk with a p-value of 0.04, which was not significant after the Bonferroni correction for multiple comparisons. No association was observed between the remaining 31 PD SNPs and the risk of melanoma. The genetic score based on the number of PD risk alleles was not associated with melanoma risk (odds ratio for the highest genetic score quartile (30–35) vs. the lowest (15–20), 1.13, 95% confidence interval, 0.47–2.70).
Conclusion
The PD SNPs identified in published GWAS do not appear to play an important role in melanoma development.
Impact
The PD susceptibility loci discovered by GWAS contribute little to the observed epidemiological association between PD and melanoma.
doi:10.1158/1055-9965.EPI-11-0905
PMCID: PMC3253945  PMID: 22086882
23.  A Modified Forward Multiple Regression in High-Density Genome-wide Association Studies for Complex Traits 
Genetic epidemiology  2009;33(6):518-525.
Genome-wide association studies (GWAS) have been widely used to identify genetic effects on complex diseases or traits. Most currently used methods are based on separate single-nucleotide polymorphism (SNP) analyses. Because this approach requires correction for multiple testing to avoid excessive false positive results, it suffers from reduced power to detect weak genetic effects under limited sample size. To increase the power to detect multiple weak genetic factors and reduce false positive results caused by multiple tests and dependence among test statistics, a modified forward multiple regression (MFMR) approach is proposed. Simulation studies show that MFMR has higher power than the Bonferroni and false discovery rate (FDR) procedures for detecting moderate and weak genetic effects, and MFMR retains an acceptable false positive rate even if causal SNPs are correlated with many SNPs due to population stratification or other unknown reasons.
doi:10.1002/gepi.20404
PMCID: PMC2732748  PMID: 19365845
GWAS; SNP; MFMR; separate SNP analysis; multiple regression analysis
24.  Sex differences in disease risk from reported genome-wide association study findings 
Human Genetics  2011;131(3):353-364.
Men and women differ in susceptibility to many diseases and in responses to treatment. Recent advances in genome-wide association studies (GWAS) provide a wealth of data for associating genetic profiles with disease risk; however, in general, these data have not been systematically probed for sex differences in gene-disease associations. Incorporating sex into the analysis of GWAS results can elucidate new relationships between single nucleotide polymorphisms (SNPs) and human disease. In this study, we performed a sex-differentiated analysis on significant SNPs from GWAS data of the seven common diseases studied by the Wellcome Trust Case Control Consortium. We employed and compared three methods: logistic regression, Woolf’s test of heterogeneity, and a novel statistical metric that we developed called permutation method to assess sex effects (PMASE). After correction for false discovery, PMASE finds SNPs that are significantly associated with disease in only one sex. These sexually dimorphic SNP-disease associations occur in Coronary Artery Disease and Crohn’s Disease. GWAS analyses that fail to consider sex-specific effects may miss discovering sexual dimorphism in SNP-disease associations that give new insights into differences in disease mechanism between men and women.
doi:10.1007/s00439-011-1081-y
PMCID: PMC3260375  PMID: 21858542
25.  Uncovering Networks from Genome-Wide Association Studies via Circular Genomic Permutation 
G3: Genes|Genomes|Genetics  2012;2(9):1067-1075.
Genome-wide association studies (GWAS) aim to detect single nucleotide polymorphisms (SNP) associated with trait variation. However, due to the large number of tests, standard analysis techniques impose highly stringent significance thresholds, leaving potentially associated SNPs undetected, and much of the trait genetic variation unexplained. Pathway- and network-based methodologies applied to GWAS aim to detect associations missed by standard single-marker approaches. The complex and non-random architecture of the genome makes it a challenge to derive an appropriate testing framework for such methodologies. We developed a rapid and simple permutation approach that uses GWAS SNP association results to establish the significance of pathway associations while accounting for the linkage disequilibrium structure of SNPs and the clustering of functionally related elements in the genome. All SNPs used in the GWAS are placed in a “circular genome” according to their location. Then the complete set of SNP association P values are permuted by rotation with respect to the genomic locations of the SNPs. Once these “simulated” P values are assigned, the joint gene P values are calculated using Fisher’s combination test, and the association of pathways is tested using the hypergeometric test. The circular genomic permutation approach was applied to a human genome-wide association dataset. The data consists of 719 individuals from the ORCADES study genotyped for ∼300,000 SNPs and measured for 51 traits ranging from physical to biochemical measurements. KEGG pathways (n = 225) were used as the sets of pathways to be tested. Our results demonstrate that the circular genomic permutations provide robust association P values. 
The non-permuted hypergeometric analysis generates ∼1400 pathway-trait combination results with an association P value more significant than P ≤ 0.05, whereas applying circular genomic permutation reduces the number of significant results to a more credible 40% of that value. The circular permutation software (“genomicper”) is available as an R package at http://cran.r-project.org/.
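The rotation step and Fisher's combination test described in this abstract can be sketched in plain Python. This is a simplified illustration, not the genomicper implementation: the function names are hypothetical, SNPs are assumed to be already ordered by genomic position, all P values are assumed to be strictly greater than zero, and the empirical gene-level P value computed at the end is a stand-in for the pathway-level hypergeometric test that the authors describe.

```python
import math
import random


def rotate(pvalues, offset):
    """Rotate P values around the 'circular genome' by `offset` positions."""
    offset %= len(pvalues)
    return pvalues[-offset:] + pvalues[:-offset] if offset else list(pvalues)


def fisher_combined_p(pvalues):
    """Fisher's method: -2 * sum(ln p) is chi-square with 2k d.f. under the null."""
    k = len(pvalues)
    half = -sum(math.log(p) for p in pvalues)  # half of the chi-square statistic
    # Closed-form chi-square survival function for an even number (2k) of d.f.
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return math.exp(-half) * total


def permuted_gene_p(snp_pvalues, gene_snps, n_perm=1000, seed=0):
    """Empirical P value for one gene, using random rotations as the null."""
    rng = random.Random(seed)
    observed = fisher_combined_p([snp_pvalues[i] for i in gene_snps])
    hits = 0
    for _ in range(n_perm):
        rotated = rotate(snp_pvalues, rng.randrange(1, len(snp_pvalues)))
        if fisher_combined_p([rotated[i] for i in gene_snps]) <= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Toy data (made up): 200 SNPs, with a gene covering positions 10-12.
rng = random.Random(1)
toy = [rng.uniform(0.01, 1.0) for _ in range(200)]
toy[10:13] = [0.001, 0.002, 0.004]
print(permuted_gene_p(toy, [10, 11, 12]))
```

Because rotation shifts the whole vector of P values as a block, neighbouring SNPs keep their neighbouring P values, which is how the approach preserves the local correlation structure that an independent shuffle would destroy.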
doi:10.1534/g3.112.002618
PMCID: PMC3429921  PMID: 22973544
GWAS; pathway-based; permutation method; genomicper R package; cardiac disease
