1.  A System-Level Pathway-Phenotype Association Analysis Using Synthetic Feature Random Forest 
Genetic epidemiology  2014;38(3):209-219.
As the cost of genome-wide genotyping decreases, the number of genome-wide association studies (GWAS) has increased considerably. However, the transition from GWAS findings to the underlying biology of various phenotypes remains challenging. As a result, due to its system-level interpretability, pathway analysis has become a popular tool for gaining insights on the underlying biology from high-throughput genetic association data. In pathway analyses, gene sets representing particular biological processes are tested for significant associations with a given phenotype. Most existing pathway analysis approaches rely on single-marker statistics and assume that pathways are independent of each other. As biological systems are driven by complex biomolecular interactions, embracing the complex relationships between single-nucleotide polymorphisms (SNPs) and pathways needs to be addressed. To incorporate the complexity of gene-gene interactions and pathway-pathway relationships, we propose a system-level pathway analysis approach, synthetic feature random forest (SF-RF), which is designed to detect pathway-phenotype associations without making assumptions about the relationships among SNPs or pathways. In our approach, the genotypes of SNPs in a particular pathway are aggregated into a synthetic feature representing that pathway via Random Forest (RF). Multiple synthetic features are analyzed using RF simultaneously and the significance of a synthetic feature indicates the significance of the corresponding pathway. We further complement SF-RF with pathway-based Statistical Epistasis Network (SEN) analysis that evaluates interactions among pathways. By investigating the pathway SEN, we hope to gain additional insights into the genetic mechanisms contributing to the pathway-phenotype association. We apply SF-RF to a population-based genetic study of bladder cancer and further investigate the mechanisms that help explain the pathway-phenotype associations using SEN. 
The bladder cancer associated pathways we found are both consistent with existing biological knowledge and reveal novel and plausible hypotheses for future biological validations.
doi:10.1002/gepi.21794
PMCID: PMC4327826  PMID: 24535726
interactions; epistasis; pathway analysis; synthetic feature random forest (SF-RF); statistical epistasis network (SEN)
2.  Testing Genetic Association with Rare and Common Variants in Family Data 
Genetic epidemiology  2014;38(0 1):S37-S43.
With the advance of next-generation sequencing technologies in recent years, rare genetic variant data have now become available for genetic epidemiology studies. For family samples, however, only a few statistical methods for association analysis of rare genetic variants have been developed. Rare variant approaches are of great interest particularly for family data because samples enriched for trait-relevant variants can be ascertained and rare variants are putatively enriched through segregation. To facilitate the evaluation of existing and new rare variant testing approaches for analyzing family data, Genetic Analysis Workshop 18 (GAW18) provided genotype and next-generation sequencing data and longitudinal blood pressure traits from extended pedigrees of Mexican-American families from the San Antonio Family Study. Our GAW18 group members analyzed real and simulated phenotype data from GAW18 by using generalized linear mixed-effects models or principal components to adjust for familial correlation or by testing binary traits using a correction factor for familial effects. With one exception, approaches dealt with the extended pedigrees in their original state using information based on the kinship matrix or alternative genetic similarity measures. For simulated data, our group demonstrated that the family-based kernel machine score test is superior in power to family-based single-marker or burden tests, except in a few specific scenarios. For real data, three contributions identified significant associations. They substantially reduced the number of tests before performing the association analysis. We conclude from our real data analyses that further development of strategies for targeted testing or more focused screening of genetic variants is strongly desirable.
doi:10.1002/gepi.21823
PMCID: PMC4324976  PMID: 25112186
extended pedigrees; rare variant analysis; family-based association test; linear mixed effects model; kernel machine score test; principal components
3.  A Novel Test for Testing the Optimally Weighted Combination of Rare and Common Variants Based on Data of Parents and Affected Children 
Genetic epidemiology  2013;38(2):135-143.
With the development of sequencing technologies, the direct testing of rare variant associations has become possible. Many statistical methods for detecting associations between rare variants and complex diseases have recently been developed, most of which are population-based methods for unrelated individuals. A limitation of population-based methods is that spurious associations can occur when there is a population structure. For rare variants, this problem can be more serious, since the spectrum of rare variation can be very different in diverse populations and there are currently no methods to control for population stratification in population-based rare variant association studies. A solution to the problem of population stratification is to use family-based association tests, which use family members to control for population stratification. In this article, we propose a novel test for Testing the Optimally Weighted combination of variants based on data of Parents and Affected Children (TOW-PAC). TOW-PAC is a family-based association test that tests the combined effect of rare and common variants in a genomic region, and is robust to the directions of the effects of causal variants. Simulation studies confirm that, for rare variant associations, family-based association tests are robust to population stratification while population-based association tests can be seriously confounded by population stratification. The results of power comparisons show that the power of TOW-PAC increases with the number of affected children in each family, and that TOW-PAC based on multiple affected children per family is more powerful than TOW based on unrelated individuals.
doi:10.1002/gepi.21787
PMCID: PMC4162402  PMID: 24382753
optimal weights; rare variants; association studies; family-based design
4.  A Multiple Splitting Approach to Linkage Analysis in Large Pedigrees Identifies a Linkage to Asthma on Chromosome 12 
Genetic epidemiology  2009;33(3):207-216.
Large genealogies are potentially very informative for linkage analysis. However, the software available for exact nonparametric multipoint linkage analysis is limited with respect to the complexity of the families it can handle. A solution is to split the large pedigrees into sub-families meeting complexity constraints. Different methods have been proposed to “best” split large genealogies. Here, we propose a new procedure in which linkage is performed on several carefully chosen sub-pedigree sets from the genealogy instead of using just a single sub-pedigree set. Our multiple splitting procedure capitalizes on the sensitivity of linkage results to family structure and has been designed to control computational feasibility and global type I error. We describe and apply this procedure to the extreme case of the highly complex Hutterite pedigree and use it to perform a genome-wide linkage analysis on asthma. The detection of a genome-wide significant linkage for asthma on chromosome 12q21 illustrates the potential of this multiple splitting approach.
doi:10.1002/gepi.20371
PMCID: PMC4300518  PMID: 18839415
pedigree breaking; complex pedigrees; asthma; Hutterite
5.  [No title available] 
PMCID: PMC4291231  PMID: 23529720
6.  Genome-Wide Family-Based Linkage Analysis of Exome Chip Variants and Cardiometabolic Risk 
Genetic epidemiology  2014;38(4):345-352.
Linkage analysis of complex traits has had limited success in identifying trait-influencing loci. Recently, coding variants have been implicated as the basis for some biomedical associations. We tested whether coding variants are the basis for linkage peaks of complex traits in 42 African-American (n = 596) and 90 Hispanic (n = 1,414) families in the Insulin Resistance Atherosclerosis Family Study (IRASFS) using Illumina HumanExome Beadchips. A total of 92,157 variants in African Americans (34%) and 81,559 (31%) in Hispanics were polymorphic and tested using two-point linkage and association analyses with 37 cardiometabolic phenotypes. In African Americans, 77 LOD scores greater than 3 were observed. The highest LOD score was 4.91, for the APOE SNP rs7412 (MAF = 0.13) with plasma apolipoprotein B (ApoB). This SNP was associated with ApoB (P-value = 4 × 10⁻¹⁹) and accounted for 16.2% of the variance in African Americans. In Hispanic families, 104 LOD scores were greater than 3. The strongest evidence of linkage (LOD = 4.29) was with rs5882 (MAF = 0.46) in CETP with HDL. CETP variants were strongly associated with HDL (4.6 × 10⁻¹² < P-value < 0.00049), accounting for up to 4.5% of the variance. These loci have previously been shown to have effects on the biomedical traits evaluated here. Thus, evidence of strong linkage in this genome-wide survey of primarily coding variants was uncommon. Loci with strong evidence of linkage were characterized by large contributions to the variance and, in these cases, were common variants. Less compelling evidence of linkage and association was observed with additional loci that may require larger family sets to confirm.
doi:10.1002/gepi.21801
PMCID: PMC4281959  PMID: 24719370
Hispanic; African American; genetic variance
7.  Power of Family-Based Association Designs To Detect Rare Variants in Large Pedigrees Using Imputed Genotypes 
Genetic epidemiology  2013;38(1):1-9.
Recently, the “Common Disease-Multiple Rare Variants” hypothesis has received much attention, especially with the current availability of next generation sequencing. Family-based designs are well suited for discovery of rare variants, with large and carefully selected pedigrees enriching for multiple copies of such variants. However, sequencing a large number of samples is still prohibitive. Here, we evaluate a cost-effective strategy (pseudo-sequencing) to detect association with rare variants in large pedigrees. This strategy consists of sequencing a small subset of subjects, genotyping the remaining sampled subjects on a set of sparse markers, and imputing the untyped markers in the remaining subjects conditional on the sequenced subjects and pedigree information. We used a recent pedigree imputation method (GIGI), which is able to efficiently handle large pedigrees and to accurately impute rare variants. We used burden and kernel association tests, famWS and famSKAT, which both account for family relationships; famSKAT also accounts for heterogeneity of allelic effect. We simulated pedigree sequence data and compared the power of association tests for: pseudo-sequence data, a subset of sequence data used for imputation, and all subjects sequenced. We also compared, within the pseudo-sequence data, the power of association tests using best-guess genotypes and allelic dosages. Our results show that the pseudo-sequencing strategy considerably improves the power to detect association with rare variants. They also show that the use of allelic dosages results in much higher power than use of best-guess genotypes in these family-based data. Moreover, famSKAT shows greater power than famWS in most of the scenarios we considered.
doi:10.1002/gepi.21776
PMCID: PMC3959172  PMID: 24243664
Kernel statistic; burden test; mixed linear model; sequence data; inheritance vectors; MCMC
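The abstract above contrasts best-guess genotypes with allelic dosages for imputed markers. As a minimal sketch of that distinction (not the GIGI, famWS, or famSKAT implementation), the two summaries can be computed from a vector of imputed genotype probabilities. The function names and the example probabilities are hypothetical and chosen only for illustration.

```python
def allelic_dosage(probs):
    """Expected alternate-allele count given (P(0), P(1), P(2)) copies."""
    p0, p1, p2 = probs
    if abs((p0 + p1 + p2) - 1.0) > 1e-6:
        raise ValueError("genotype probabilities must sum to 1")
    return p1 + 2.0 * p2


def best_guess_genotype(probs):
    """Most probable genotype, coded as 0, 1, or 2 alternate alleles."""
    return max(range(3), key=lambda g: probs[g])


# Hypothetical imputation output for one subject at one rare variant.
posterior = (0.55, 0.40, 0.05)
dosage = allelic_dosage(posterior)        # 0.40 + 2 * 0.05 = 0.50
best = best_guess_genotype(posterior)     # 0: the carrier signal is discarded
```

In this example the best-guess call rounds a 45% chance of carrying the variant down to a non-carrier, whereas the dosage retains that uncertainty, which is one plausible reason dosages can be more powerful for rare variants.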
8.  Detecting Rare Haplotype-Environment Interaction with Logistic Bayesian LASSO 
Genetic epidemiology  2013;38(1):31-41.
Two important contributors to missing heritability are believed to be rare variants and gene-environment interaction (GXE). Thus, detecting GXE where G is a rare haplotype variant (rHTV) is a pressing problem. Haplotype analysis is usually the natural second step to follow up on a genomic region that is implicated to be associated through single nucleotide variants (SNV) analysis. Further, rHTV can tag associated rare SNV and provide greater power to detect them than popular collapsing methods. Recently we proposed Logistic Bayesian LASSO (LBL) for detecting rHTV association with case-control data. LBL shrinks the unassociated (especially common) haplotypes towards zero so that an associated rHTV can be identified with greater power. Here we incorporate environmental factors and their interactions with haplotypes in LBL. As LBL is based on retrospective likelihood, this extension is not trivial. We model the joint distribution of haplotypes and covariates given the case-control status. We apply the approach (LBL-GXE) to the Michigan, Mayo, AREDS, Pennsylvania Cohort Study on Age-related Macular Degeneration (AMD). LBL-GXE detects interaction of a specific rHTV in the CFH gene with smoking. To the best of our knowledge, this is the first time in the AMD literature that an interaction of smoking with a specific (rather than pooled) rHTV has been implicated. We also carry out simulations and find that LBL-GXE has reasonably good power for detecting interactions with rHTV while keeping the type I error rates well-controlled. Thus, we conclude that LBL-GXE is a useful tool for uncovering missing heritability.
doi:10.1002/gepi.21773
PMCID: PMC4174302  PMID: 24272913
Age-related macular degeneration; Complement Factor H gene; GXE; GWAS; LBL; MCMC; Missing Heritability; Rare variants; Regularization; Retrospective Likelihood
9.  A Versatile Omnibus Test for Detecting Mean and Variance Heterogeneity 
Genetic epidemiology  2014;38(1):51-59.
Recent research has revealed loci that display variance heterogeneity through various means such as biological disruption, linkage disequilibrium (LD), gene-by-gene (GxG), or gene-by-environment (GxE) interaction. We propose a versatile likelihood ratio test that allows joint testing for mean and variance heterogeneity (LRTMV) or either effect alone (LRTM or LRTV) in the presence of covariates. Using extensive simulations for our method and others, we found that all parametric tests were sensitive to non-normality regardless of any trait transformations. Coupling our test with the parametric bootstrap solves this issue. Using simulations and empirical data from a known mean-only functional variant, we demonstrate how linkage disequilibrium (LD) can produce variance-heterogeneity loci (vQTL) in a predictable fashion based on differential allele frequencies, high D’ and relatively low r² values. We propose that a joint test for mean and variance heterogeneity is more powerful than a variance-only test for detecting vQTL. This takes advantage of loci that also have mean effects without sacrificing much power to detect variance-only effects. We discuss using vQTL as an approach to detect gene-by-gene interactions and also how vQTL are related to relationship loci (rQTL) and how both can create prior hypotheses for each other and reveal the relationships between traits and possibly between components of a composite trait.
PMCID: PMC4019404  PMID: 24482837
Linkage Disequilibrium; vQTL; rQTL; GxG; GxE; GWAS
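The abstract above refers to marker pairs with differential allele frequencies, high D’ and relatively low r². As a rough illustration of how that combination arises, the sketch below computes the standard pairwise LD measures D, D’ and r² from two-locus haplotype frequencies. The function name and the example frequencies are assumptions made for illustration and are not taken from the paper.

```python
def ld_measures(p_AB, p_Ab, p_aB, p_ab):
    """Return (D, D', r^2) for two biallelic loci from haplotype frequencies."""
    if abs((p_AB + p_Ab + p_aB + p_ab) - 1.0) > 1e-9:
        raise ValueError("haplotype frequencies must sum to 1")
    p_A = p_AB + p_Ab  # frequency of allele A at the first locus
    p_B = p_AB + p_aB  # frequency of allele B at the second locus
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    if denom <= 0:
        raise ValueError("both loci must be polymorphic")
    d = p_AB - p_A * p_B
    # D' rescales |D| by its maximum attainable value given the allele frequencies.
    if d >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    r2 = d * d / denom
    return d, d_prime, r2


# Hypothetical pair: allele B (frequency 0.1) occurs only on A haplotypes
# (frequency 0.5), so D' is 1 while r^2 is only about 0.11.
d, d_prime, r2 = ld_measures(0.1, 0.4, 0.0, 0.5)
```

Because r² is bounded by the mismatch in allele frequencies while D’ is not, a rare allele sitting entirely on a common background gives complete D’ with a low r².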
10.  A variational Bayes discrete mixture test for rare variant association 
Genetic epidemiology  2014;38(1):21-30.
Recently, many statistical methods have been proposed to test for associations between rare genetic variants and complex traits. Most of these methods test for association by aggregating genetic variations within a predefined region, such as a gene. Although there is evidence that “aggregate” tests are more powerful than the single marker test, these tests generally ignore neutral variants and therefore are unable to identify specific variants driving the association with phenotype. We propose a novel aggregate rare-variant test that explicitly models a fraction of variants as neutral, tests associations at the gene-level, and infers the rare-variants driving the association. Simulations show that in the practical scenario where there are many variants within a given region of the genome with only a fraction causal our approach has greater power compared to other popular tests such as the Sequence Kernel Association Test (SKAT), the Weighted Sum Statistic (WSS), and the collapsing method of Morris and Zeggini (MZ). Our algorithm leverages a fast variational Bayes approximate inference methodology to scale to exome-wide analyses, a significant computational advantage over exact inference model selection methodologies. To demonstrate the efficacy of our methodology we test for associations between von Willebrand Factor (VWF) levels and VWF missense rare-variants imputed from the National Heart, Lung, and Blood Institute’s Exome Sequencing project into 2,487 African Americans within the VWF gene. Our method suggests that a relatively small fraction (~10%) of the imputed rare missense variants within VWF are strongly associated with lower VWF levels in African Americans.
PMCID: PMC4030763  PMID: 24482836
Exome sequencing study; approximate inference; von Willebrand Factor genetics
11.  Genetic association analysis and meta-analysis of imputed SNPs in longitudinal studies 
Genetic epidemiology  2013;37(5):465-477.
In this paper we propose a new method to analyze time-to-event data in longitudinal genetic studies. This method addresses the fundamental problem of incorporating uncertainty when analyzing survival data and imputed single nucleotide polymorphisms (SNPs) from genome-wide association studies (GWAS). Our method incorporates uncertainty in the likelihood function, in contrast to existing methods that incorporate the uncertainty in the design matrix. Through simulation studies and real data analyses, we show that our proposed method is unbiased and provides powerful results. We also show how combining results from different GWAS (meta-analysis) may lead to wrong results when effects are not estimated using our approach. The model is implemented in an R package that is designed to analyze uncertainty not only arising from imputed SNPs, but also from copy number variants (CNVs).
doi:10.1002/gepi.21719
PMCID: PMC4273087  PMID: 23595425
imputed SNP; longitudinal studies; GWAS; genetic association
12.  A sample selection strategy for next-generation sequencing 
Genetic epidemiology  2012;36(7):696-709.
Next-generation sequencing technology provides us with vast amounts of sequence data. It is efficient, and cheaper than previous sequencing technologies, but deep resequencing of entire samples is still expensive. Therefore, sensible strategies for choosing subsets of samples to sequence are required. Here we describe an algorithm for selection of a sub-sample of an existing sample if one has either of two possible goals in mind: maximizing the number of new polymorphic sites that are detected, or improving the efficiency with which the remaining unsequenced individuals can have their types imputed at newly discovered polymorphisms. We then describe a variation on our algorithm that is more focused on detecting rarer variants. We demonstrate the performance of our algorithm using simulated data and data from the 1000 Genomes Project.
doi:10.1002/gepi.21664
PMCID: PMC4272568  PMID: 22865643
Imputation; SNP discovery; Coalescent
13.  Complex Pedigrees in the Sequencing Era: To Track Transmissions or Decorrelate? 
Genetic epidemiology  2014;38(0 1):S29-S36.
Next-generation sequencing (NGS) studies are becoming commonplace, and the NGS field is continuing to develop rapidly. Analytic methods aimed at testing for the various roles that genetic susceptibility plays in disease are also rapidly being developed and optimized. Studies that incorporate large, complex pedigrees are of particular importance because they provide detailed information about inheritance patterns and can be analyzed in a variety of complementary ways. The nine contributions from our Genetic Analysis Workshop 18 working group on family-based tests of association for rare variants using simulated data examined analytic methods for testing genetic association using whole-genome sequencing data from 20 large pedigrees with 200 phenotype simulation replicates. What distinguishes the approaches explored is how the complexities of analyzing familial genetic data were handled. Here, we explore the methods that either harness inheritance patterns and transmission information or attempt to adjust for the correlation between family members in order to utilize computationally and conceptually simpler statistical testing procedures. Although directly comparing these two classes of approaches across contributions is difficult, we note that the two classes balance robustness to population stratification and computational complexity (the transmission-based approaches) with simplicity and increased power, assuming no population stratification or proper adjustment for it (decorrelation approaches).
doi:10.1002/gepi.21822
PMCID: PMC4272198  PMID: 25112185
Genetic Analysis Workshop 18; family-based association testing; decorrelation strategies; next-generation sequencing
14.  A General Framework for Association Tests With Multivariate Traits in Large-Scale Genomics Studies 
Genetic epidemiology  2013;37(8):759-767.
Genetic association studies often collect data on multiple traits that are correlated. Discovery of genetic variants influencing multiple traits can lead to better understanding of the etiology of complex human diseases. Conventional univariate association tests may miss variants that have weak or moderate effects on individual traits. We propose several multivariate test statistics to complement univariate tests. Our framework covers both studies of unrelated individuals and family studies and allows any type/mixture of traits. We relate the marginal distributions of multivariate traits to genetic variants and covariates through generalized linear models without modeling the dependence among the traits or family members. We construct score-type statistics, which are computationally fast and numerically stable even in the presence of covariates and which can be combined efficiently across studies with different designs and arbitrary patterns of missing data. We compare the power of the test statistics both theoretically and empirically. We provide a strategy to determine genome-wide significance that properly accounts for the linkage disequilibrium (LD) of genetic variants. The application of the new methods to the meta-analysis of five major cardiovascular cohort studies identifies a new locus (HSCB) that is pleiotropic for the four traits analyzed.
doi:10.1002/gepi.21759
PMCID: PMC3926135  PMID: 24227293
binary traits; genome-wide association studies; meta-analysis; multivariate tests; pleiotropy; quantitative traits; score statistics
15.  GEE-based SNP Set Association Test for Continuous and Discrete Traits in Family Based Association Studies 
Genetic epidemiology  2013;37(8):778-786.
Family-based genetic association studies of related individuals provide opportunities to detect genetic variants that complement studies of unrelated individuals. Most statistical methods for family association studies for common variants are single-marker-based, which test one SNP at a time. In this paper, we consider testing the effect of a SNP set, e.g., SNPs in a gene, in family studies, for both continuous and discrete traits. Specifically, we propose a Generalized Estimating Equations (GEE)-based kernel association test, a variance component-based testing method, to test for the association between a phenotype and multiple variants in a SNP set jointly using family samples. The proposed approach allows for both continuous and discrete traits, where the correlation among family members is taken into account through the use of an empirical covariance estimator. We derive the theoretical distribution of the proposed statistic under the null and develop analytical methods to calculate the p-values. We also propose an efficient resampling method for correcting for small sample size bias in family studies. The proposed method allows for easily incorporating covariates and SNP-SNP interactions. Simulation studies show that the proposed method properly controls for type-I error rates under both random and ascertained sampling schemes in family studies. We demonstrate through simulation studies that our approach has superior performance for association mapping compared to the single-marker-based minimum p-value GEE test for a SNP set effect over a range of scenarios. We illustrate the application of the proposed method using data from the Cleveland Family GWAS Study.
doi:10.1002/gepi.21763
PMCID: PMC4007511  PMID: 24166731
Family-based association; Generalized estimation equations; Kernel machine regression; Marginal models; Score test; Variance component
16.  Quantitative Allelic Test - a Fast Test for Very Large Association Studies 
Genetic epidemiology  2013;37(8):831-839.
Advances in high throughput technology have enabled the generation of unprecedented amounts of genomic data (e.g., next generation sequence data, transcriptomics, metabolomics, and proteomics), which promises to unravel the genetic architecture of complex traits. These discoveries may lead to novel therapeutic targets, guide disease prevention, and enable personalized medicine. However, the pace of data generation surpasses the ability to process and analyze the vast amounts of data. For example, in a typical study of transcription regulation, the relationships between more than 1 million genetic variants and ten thousand transcript levels are explored, requiring tens of billions of tests. In order to address this problem, we propose a fast, accurate, and robust method that can assess the significance of associations between quantitative phenotypes and genotypes. The method is an extension of the allelic test commonly used in case-control studies for the analysis of quantitative traits. We show the asymptotic equivalence of the proposed test to linear regression results. We also reduce a generalized linear regression problem to the comparison of two groups, which can handle non-normal and survival time phenotypes.
doi:10.1002/gepi.21768
PMCID: PMC4054703  PMID: 24185610
GWAS; quantitative traits; allelic methods
17.  Using Extreme Phenotype Sampling to Identify the Rare Causal Variants of Quantitative Traits in Association Studies 
Genetic epidemiology  2011;35(8):790-799.
Variants identified in recent genome-wide association studies based on the common-disease common-variant hypothesis are far from fully explaining the heritability of complex traits. Rare variants may, in part, explain some of the missing heritability. Here, we explored the advantage of extreme phenotype sampling in rare-variant analysis and refined this design framework for future large-scale association studies on quantitative traits. We first proposed a power calculation approach for a likelihood-based analysis method. We then used this approach to demonstrate the potential advantages of extreme phenotype sampling for rare variants. Next, we discussed how this design can influence future sequencing-based association studies from a cost-efficiency (with the phenotyping cost included) perspective. Moreover, we discussed the potential of a two-stage design with the extreme sample as the first stage and the remaining nonextreme subjects as the second stage. We demonstrated that this two-stage design is a cost-efficient alternative to the one-stage cross-sectional design or traditional two-stage design. We then discussed the analysis strategies for this extreme two-stage design and proposed a corresponding design optimization procedure. To address many practical concerns, for example measurement error or phenotypic heterogeneity at the very extremes, we examined an approach in which individuals with very extreme phenotypes are discarded. We demonstrated that even with a substantial proportion of these extreme individuals discarded, an extreme-based sampling can still be more efficient. Finally, we expanded the current analysis and design framework to accommodate the CMC approach where multiple rare variants in the same gene region are analyzed jointly.
doi:10.1002/gepi.20628
PMCID: PMC4238184  PMID: 21922541
rare variants; extreme phenotype sampling; next generation sequencing
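The sampling design described in the abstract above (take both tails of a quantitative trait, optionally discarding the most extreme individuals) can be sketched as a simple selection step. This is only an illustration of the sampling idea, not the authors' power calculation or design optimization; the function name and its count-based parameters are hypothetical.

```python
def extreme_sample(phenotypes, n_tail, n_discard=0):
    """Pick subject indices from both tails of a quantitative trait.

    n_tail:    number of subjects taken from each tail
    n_discard: number of most-extreme subjects dropped from each tail first
    Returns (lower_indices, upper_indices), each ordered by phenotype.
    """
    n = len(phenotypes)
    if n_tail < 0 or n_discard < 0 or 2 * (n_tail + n_discard) > n:
        raise ValueError("tails and discarded subjects exceed the sample size")
    # Rank subjects by phenotype value, lowest first.
    order = sorted(range(n), key=lambda i: phenotypes[i])
    lower = order[n_discard:n_discard + n_tail]
    upper = order[n - n_discard - n_tail:n - n_discard]
    return lower, upper


# Hypothetical cohort of 100 subjects with trait values 0..99:
# drop the 5 most extreme on each side, then sequence the next 10 per tail.
trait = list(range(100))
low_idx, high_idx = extreme_sample(trait, n_tail=10, n_discard=5)
```

Counts are used instead of fractions here to avoid floating-point rounding when converting a proportion into a number of subjects.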
18.  Shades of gray: A comparison of linkage disequilibrium between Hutterites and Europeans 
Genetic epidemiology  2010;34(2):133-139.
Founder or isolated populations have advantages for genetic studies due to decreased genetic and environmental heterogeneity. However, whereas longer range linkage disequilibrium (LD) in these populations is expected to facilitate gene localization, extensive LD may actually limit the ability for gene discovery. The North American Hutterite population is one of the best characterized young founder populations and members of this isolate have been the subjects of our studies of complex traits, including fertility, asthma and cardiovascular disease, for >20 years. Here, we directly assess the patterns and extent of global LD using single nucleotide polymorphism (SNP) genotypes with minor allele frequencies (MAFs) ≥5% from the Affymetrix GeneChip® Mapping 500K array in 60 relatively unrelated Hutterites and 60 unrelated Europeans (HapMap CEU). Although LD among some marker pairs extends further in the Hutterites than in Europeans, the pattern of LD and minor allele frequencies are surprisingly similar. These results indicate that 1) identifying disease genes should be no more difficult in the Hutterites than in outbred European populations, 2) the same common susceptibility alleles for complex diseases should be present in the Hutterites and outbred European populations, and 3) imputation algorithms based on HapMap CEU should be applicable to the Hutterites.
doi:10.1002/gepi.20442
PMCID: PMC4238926  PMID: 19697328
19.  Statistical genetic analysis of serological measures of common, chronic infections in Alaska Native participants in the GOCADAN study 
Genetic epidemiology  2013;37(7):751-757.
This paper describes genetic investigations of seroreactivity to five common infectious pathogens in the Genetics of Coronary Artery Disease in Alaska Natives (GOCADAN) study. Antibody titers and seroprevalence were available for 495 to 782 (depending on the phenotype) family members at two time points, approximately 15 years apart, for Chlamydophila pneumoniae, Helicobacter pylori, cytomegalovirus (CMV), herpes simplex virus-1 (HSV-1), and herpes simplex virus-2 (HSV-2). Seroprevalence rates indicate that infections with most of these pathogens are common (≥20% for all of them, >80% for H. pylori, CMV, and HSV-1). Seropositive individuals typically remain seropositive over time, with seroreversion rates of <1% to 10% over ~15 years. Antibody titers were significantly heritable for most pathogens, with the highest estimate being 0.61 for C. pneumoniae. Significant genome-wide linkage evidence was obtained for C. pneumoniae on chromosome 15 (LOD of 3.13). These results demonstrate that individual host genetic differences influence antibody measures of common infections in this population, and further investigation may elucidate the underlying immunological processes and genes involved.
doi:10.1002/gepi.21745
PMCID: PMC3969261  PMID: 23798484
Chlamydophila pneumoniae; Helicobacter pylori; cytomegalovirus; herpes simplex virus-1; herpes simplex virus-2; antibodies; heritability; linkage; Inupiaq
20.  A Shrinkage Method for Testing the Hardy-Weinberg Equilibrium in Case-Control Studies 
Genetic epidemiology  2013;37(7):743-750.
Testing for the Hardy-Weinberg equilibrium (HWE) is often used as an initial step for checking the quality of genotyping. When testing the HWE for case-control data, the impact of a potential genetic association between the marker and the disease must be controlled for; otherwise, the results may be biased. Li and Li (2008) proposed a likelihood ratio test (LRT) that accounts for this potential genetic association and is more powerful than the commonly used control-only χ2 test. However, the LRT is not efficient when the marker is independent of the disease, and it also requires numerical optimization to calculate the test statistic.
In this article, we propose a novel shrinkage test for assessing the HWE. The proposed shrinkage test yields higher statistical power than the LRT when the marker is independent of or weakly associated with the disease, and converges to the LRT when the marker is strongly associated with the disease. In addition, the proposed shrinkage test has a closed form and can be easily used to test the HWE for large datasets that result from genome-wide association studies. We compare the performance of the shrinkage test with existing methods using simulation studies, and apply the shrinkage test to a genome-wide association dataset for Alzheimer’s disease.
doi:10.1002/gepi.21753
PMCID: PMC3972031  PMID: 23934751
Bayesian factor; Case-control study; Hardy-Weinberg equilibrium; Shrinkage test
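For orientation, the control-only χ2 test mentioned in the abstract above can be sketched as a standard one-degree-of-freedom Pearson test on the genotype counts of the controls. This is only the baseline test, not the LRT or the proposed shrinkage test, whose formulas are not given in the abstract. The function name `hwe_chi_square` is hypothetical.

```python
import math

def hwe_chi_square(n_aa, n_ab, n_bb):
    """Pearson chi-square test of HWE from genotype counts (1 df).

    Returns (chi2, p_value). Counts would typically come from controls only.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # estimated frequency of allele A
    q = 1.0 - p
    if p <= 0.0 or q <= 0.0:
        raise ValueError("marker is monomorphic; the test is undefined")
    # Expected genotype counts under HWE: n*p^2, 2*n*p*q, n*q^2.
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value
```

For example, counts of 30, 40 and 30 give expected counts of 25, 50 and 25, so the statistic is 4 and the p-value is just under 0.05.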
21.  A whole-genome simulator capable of modeling high-order epistasis for complex disease 
Genetic epidemiology  2013;37(7):686-694.
Genome-wide association studies (GWAS) have been successful in finding numerous new risk variants for complex diseases, but the results almost exclusively rely on single-marker scans. Methods that can analyze joint effects of many variants in GWAS data are still being developed and trialed. To evaluate the performance of such methods it is essential to have a GWAS data simulator that can rapidly simulate a large number of samples, and capture key features of real GWAS data such as linkage disequilibrium (LD) among single-nucleotide polymorphisms (SNPs) and joint effects of multiple loci (multilocus epistasis). In the current study, we combine techniques for specifying high-order epistasis among risk SNPs with an existing program, GWAsimulator [Li and Li 2008], to achieve rapid whole-genome simulation with accurate modeling of complex interactions. We considered various approaches to specifying interaction models, including departure from the product of marginal effects for pair-wise interactions, product terms in logistic regression models for low-order interactions, and penetrance tables conforming to marginal effect constraints for high-order interactions or prescribing known biological interactions. Methods for conversion among different model specifications are developed using the penetrance table as the fundamental characterization of disease models. The new program, called simGWA, is capable of efficiently generating large samples of GWAS data with high precision. We show that data simulated by simGWA are faithful to template LD structures, and conform to pre-specified disease models with (or without) interactions.
doi:10.1002/gepi.21761
PMCID: PMC4143152  PMID: 24114848
genome-wide simulation; epistasis; gene-gene interaction; genome-wide association
22.  Gene-Environment Interactions in Cancer Epidemiology: A National Cancer Institute Think Tank Report 
Genetic epidemiology  2013;37(7):643-657.
Cancer risk is determined by a complex interplay of genetic and environmental factors. Genome-wide association studies (GWAS) have identified hundreds of common (minor allele frequency [MAF]>0.05) and less common (0.01
doi:10.1002/gepi.21756
PMCID: PMC4143122  PMID: 24123198
Gene-environment interactions; complex phenotypes; genetic epidemiology
Genetic epidemiology  2014;38(5):416-429.
With challenges in data harmonization and covariate heterogeneity across various data sources, meta-analysis of gene-environment interaction studies can often involve subtle statistical issues. In this paper, we study the effect of environmental covariate heterogeneity (within and between cohorts) on two approaches for fixed-effect meta-analysis: the standard inverse-variance weighted meta-analysis and a meta-regression approach. Akin to the results in Simmonds and Higgins (2007), we obtain analytic efficiency results for both methods under the assumption of gene-environment independence. The relative efficiency of the two methods depends on the ratio of within- versus between-cohort variability of the environmental covariate. We propose to use an adaptively weighted estimator (AWE), between meta-analysis and meta-regression, for the interaction parameter. The AWE retains full efficiency of the joint analysis using individual level data under certain natural assumptions. Lin and Zeng (2010a, b) showed that a multivariate inverse-variance weighted estimator also had asymptotically full efficiency as joint analysis using individual level data, if the estimates with full covariance matrices for all the common parameters are pooled across all studies. We show consistency of our work with Lin and Zeng (2010a, b). Without sacrificing much efficiency, the AWE uses only univariate summary statistics from each study, and bypasses issues with sharing individual level data or full covariance matrices across studies. We compare the performance of the methods both analytically and numerically. The methods are illustrated through meta-analysis of the interaction between single nucleotide polymorphisms in the FTO gene and body mass index on high-density lipoprotein cholesterol data from a set of eight studies of type 2 diabetes.
doi:10.1002/gepi.21810
PMCID: PMC4108593  PMID: 24801060
ADAPTIVELY WEIGHTED ESTIMATOR; COVARIATE HETEROGENEITY; GENE-ENVIRONMENT INTERACTION; INDIVIDUAL PATIENT DATA; META-ANALYSIS; META-REGRESSION; POWER CALCULATION
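The standard inverse-variance weighted meta-analysis referred to in the abstract above can be sketched as follows: each cohort contributes an estimate of the interaction parameter and its standard error, and the pooled estimate weights each cohort by the reciprocal of its variance. This is a minimal sketch of the univariate fixed-effect estimator only; the meta-regression approach and the AWE are not reproduced because the abstract does not give their formulas. The function name `inverse_variance_meta` is hypothetical.

```python
import math

def inverse_variance_meta(estimates, std_errors):
    """Fixed-effect inverse-variance weighted pooling.

    estimates:  per-cohort estimates of the same parameter
    std_errors: their standard errors (all must be positive)
    Returns (pooled_estimate, pooled_standard_error).
    """
    if len(estimates) != len(std_errors) or not estimates:
        raise ValueError("need one standard error per estimate")
    # Each cohort is weighted by 1 / variance.
    weights = [1.0 / (se * se) for se in std_errors]
    total = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, estimates)) / total
    # The variance of the pooled estimate is 1 / (sum of weights).
    pooled_se = math.sqrt(1.0 / total)
    return pooled, pooled_se
```

With estimates 0.2 and 0.4 and standard errors 0.1 and 0.2, the weights are 100 and 25, so the pooled estimate is 0.24 and sits closer to the more precise cohort.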
Genetic epidemiology  2011;35(8):880-886.
Despite the numerous, successful applications of GWASs, there has been much difficulty in discovering DSLs. This is due to the fact that the GWAS approach is an indirect mapping technique, often identifying markers. For the identification of DSLs, which is required for the understanding of the genetic pathways for complex diseases, sequencing data that examines every genetic locus directly is necessary. Yet there is currently a lack of methodology targeted at the identification of the DSLs in sequencing data: existing methods localize the causal variant to a region, but not to a single variant, and therefore do not allow one to identify unique loci that cause the phenotype association. Here, we have developed such a method to determine if there is evidence that an individual locus affects case-control status with sequencing data. This methodology differs from other rare variant approaches: rather than testing an entire region comprised of many loci for association with the phenotype, we can identify the individual genetic locus that causes the association between the phenotype and the genetic region. For each variant, the test determines if the pattern of LD across the other variants coincides with the pattern expected if that variant were a DSL. Power simulations show that the method successfully detects the causal variant, distinguishing it from other nearby variants (in high LD with the causal variant), and outperforms the standard tests. The efficiency of the method is especially apparent with small samples, which are currently realistic for studies due to sequence data costs. The practical relevance of the approach is illustrated by an application to a sequence dataset for nonsyndromic cleft lip with or without cleft palate. The proposed method implicated one variant (p=0.002, 0.062 after Bonferroni correction), which was not found by standard analyses. Code for implementation is available.
doi:10.1002/gepi.20638
PMCID: PMC4181609  PMID: 22125225
Genetic epidemiology  2013;37(3):239-247.
In humans, mitochondria contain their own DNA (mtDNA) that is inherited exclusively from the mother. The mitochondrial genome encodes thirteen polypeptides that are components of oxidative phosphorylation to produce energy. Any disruption in these genes might interfere with energy production and thus contribute to metabolic derangement. Mitochondria also regulate several important cellular activities including cell death and calcium homeostasis. Aided by sharply declining costs of high-density genotyping, hundreds of mitochondrial variants will soon be available in several cohorts with pedigree structures. Association testing of mitochondrial variants with disease traits using pedigree data raises unique challenges because of the difficulty in separating the effects of nuclear and mitochondrial genomes, which display different modes of inheritance. Failing to correctly account for these effects might decrease power or inflate type I error in association tests. In this report, we sought to identify the best strategy for association testing of mitochondrial variants when genotype and phenotype data are available in pedigrees. We proposed several strategies to account for polygenic effects of the nuclear and mitochondrial genomes and we performed extensive simulation studies to evaluate type I error and power of these strategies. In addition, we proposed two permutation tests to obtain empirical p-values for these strategies. Furthermore, we applied two of the analytical strategies to association analysis of 196 mitochondrial variants with blood pressure and fasting blood glucose in the pedigree-rich Framingham Heart Study. Finally, we discussed strategies for study design, genotyping, and data cleaning in association testing of mtDNA in pedigrees.
doi:10.1002/gepi.21706
PMCID: PMC4171957  PMID: 23319385
mitochondrial DNA; association test; polygenic effect; maternal lineage; permutation test
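The general idea of obtaining an empirical p-value by permutation can be sketched as below. This is a generic label-shuffling sketch that assumes unrelated individuals; it does not reproduce the two permutation tests proposed in the abstract above, which are designed for pedigree data and whose details are not given there. The function names, the carrier/non-carrier coding, and the mean-difference statistic are all assumptions made for illustration.

```python
import random

def mean_difference(values, carrier):
    """Absolute difference in mean trait value between carriers and non-carriers."""
    a = [v for v, c in zip(values, carrier) if c]
    b = [v for v, c in zip(values, carrier) if not c]
    return abs(sum(a) / len(a) - sum(b) / len(b))

def permutation_p_value(values, carrier, n_perm=999, seed=0):
    """Empirical p-value from shuffling trait values across individuals.

    NOTE: plain shuffling ignores relatedness, so it is not valid as-is
    for pedigree data; it only illustrates the counting scheme.
    """
    rng = random.Random(seed)
    observed = mean_difference(values, carrier)
    shuffled = list(values)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if mean_difference(shuffled, carrier) >= observed:
            hits += 1
    # Adding 1 to both counts keeps the p-value strictly above zero.
    return (hits + 1) / (n_perm + 1)
```

The smallest attainable p-value is 1 / (n_perm + 1), so the number of permutations bounds the resolution of the test.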
