Results 1-13 (13)

1.  A Framework for Structural Equation Models in General Pedigrees 
Human Heredity  2011;70(4):278-286.
Structural Equation Modeling (SEM) is an analysis approach that accounts for both the causal relationships between variables and the errors associated with the measurement of these variables. In this paper, a framework for implementing structural equation models (SEMs) in family data is proposed.
This framework includes both a latent measurement model and a structural model with covariates. It allows for a wide variety of models, including latent growth curve models. Environmental, polygenic and other genetic variance components can be included in the SEM. Kronecker notation makes it easy to separate the SEM process from a familial correlation model. A limited information method of model fitting is discussed. We show how missing data and ascertainment may be handled. We give several examples of how the framework may be used.
A simulation study shows that our method is computationally feasible, and has good statistical properties.
Our framework may be used to build and compare causal models using family data without any genetic marker data. It also allows for a nearly endless array of genetic association and/or linkage tests. A preliminary Matlab program is available, and we are currently implementing a more complete and user-friendly R package.
PMCID: PMC3164176  PMID: 21212683
Latent variable analysis; Path analysis; Extended pedigrees; Complex traits; Genetic linkage analysis; Genetic association
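The separation that Kronecker notation buys can be sketched with a generic covariance for several traits measured on each pedigree member. This is a minimal illustration under stated assumptions, not the paper's parameterization. It assumes only an additive polygenic component scaled by twice the kinship matrix plus an individual-specific residual, and the helper names (`kron`, `family_covariance`) are hypothetical. Traits are stacked person by person, so the pedigree enters only through the kinship matrix, while the trait-by-trait matrices are where SEM-implied structure would go.

```python
from typing import List

Matrix = List[List[float]]


def kron(a: Matrix, b: Matrix) -> Matrix:
    """Kronecker product a ⊗ b of two dense row-major matrices."""
    rb, cb = len(b), len(b[0])
    out = [[0.0] * (len(a[0]) * cb) for _ in range(len(a) * rb)]
    for i, row in enumerate(a):
        for j, a_ij in enumerate(row):
            for k in range(rb):
                for m in range(cb):
                    out[i * rb + k][j * cb + m] = a_ij * b[k][m]
    return out


def identity(n: int) -> Matrix:
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def family_covariance(kinship: Matrix, sigma_a: Matrix, sigma_e: Matrix) -> Matrix:
    """V = (2 * kinship) ⊗ sigma_a + I ⊗ sigma_e, traits stacked person by person.

    kinship: people x people kinship coefficients (hypothetical input).
    sigma_a, sigma_e: trait x trait polygenic and residual covariances that an
    SEM would supply; they never depend on the pedigree."""
    two_phi = [[2.0 * x for x in row] for row in kinship]
    genetic = kron(two_phi, sigma_a)
    residual = kron(identity(len(kinship)), sigma_e)
    return [[g + e for g, e in zip(row_g, row_e)] for row_g, row_e in zip(genetic, residual)]


# Parent-offspring pair (kinship 1/4), two traits per person.
kinship = [[0.5, 0.25], [0.25, 0.5]]
sigma_a = [[1.0, 0.3], [0.3, 2.0]]
sigma_e = [[0.5, 0.1], [0.1, 0.5]]
V = family_covariance(kinship, sigma_a, sigma_e)
```

Adding another familial correlation source (for example, a shared-household matrix) would add one more Kronecker term while leaving the SEM-implied trait matrices untouched.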
2.  strum: an R package for structural modeling of latent variables for general pedigrees 
BMC Genetics  2015;16:35.
Structural equation modeling (SEM) is an extremely general and powerful approach to account for measurement error and causal pathways when analyzing data, and it has been used in a wide range of applied sciences. There are many commercial and freely available software packages for SEM. However, none of them is easy to use for analyzing general pedigree data, and SEM packages designed for genetics are limited in their application.
We present the new R package strum to meet the need for a suitable SEM software tool for genetic analysis. It implements a general framework for SEM within the context of general pedigree data, a context that requires specialized considerations such as familial correlations and ascertainment. Our package is an extraordinarily flexible tool capable of modeling genetic association, linkage, polygenic effects, shared environment, and ascertainment combined with confirmatory factor analysis and general SEM. It also provides a convenient tool for model visualization and integrates tools for simulating pedigree data. The package's features are tested through a simulation study, and our results show that strum is very reliable and robust in terms of the accuracy and coverage of parameter estimates.
strum is a valuable new tool for genetic analysis. It can be easily used with general pedigree data, incorporating both measurement and structural models, giving it some significant advantages over other software packages. It also includes a built-in approach for handling ascertainment, a helpful integrated tool for genetic data simulation, and built-in tools for model visualization, providing a significant addition to biomedical research.
PMCID: PMC4404673  PMID: 25887541
Structural equation modeling; Latent variable analysis; Pedigree data; Genetics; Genetic epidemiology; Simulation; Visualization
3.  How meaningful are heritability estimates of liability? 
Human genetics  2013;132(12):10.1007/s00439-013-1334-z.
It is commonly acknowledged that estimates of heritability from classical twin studies have many potential shortcomings. Despite this, in the post-GWAS era, these heritability estimates have come to be a continual source of interest and controversy. While the heritability estimates of a quantitative trait are subject to a number of biases, in this article we will argue that the standard statistical approach to estimating the heritability of a binary trait relies on some additional untestable assumptions which, if violated, can lead to badly biased estimates. The ACE liability threshold model assumes at its heart that each individual has an underlying liability or propensity to acquire the binary trait (e.g., disease), and that this unobservable liability is multivariate normally distributed. We investigated a number of different scenarios violating this assumption such as the existence of a single causal diallelic gene and the existence of a dichotomous exposure. For each scenario, we found that substantial asymptotic biases can occur, which no increase in sample size can remove. Asymptotic biases as much as four times larger than the true value were observed, and numerous cases also showed large negative biases. Additionally, regions of low bias occurred for specific parameter combinations. Using simulations, we also investigated the situation where all of the assumptions of the ACE liability model are met. We found that commonly used sample sizes can lead to biased heritability estimates. Thus, even if we are willing to accept the meaningfulness of the liability construct, heritability estimates under the ACE liability threshold model may not accurately reflect the heritability of this construct. The points made in this paper should be kept in mind when considering the meaningfulness of a reported heritability estimate for any specific disease.
PMCID: PMC3843952  PMID: 23867980
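For orientation, the classical ACE decomposition that the abstract critiques can be written out for a standardized liability. Assuming unit liability variance split into additive genetic ($a^2$), shared environmental ($c^2$), and unique environmental ($e^2$) shares, and twin correlations $r_{MZ}$, $r_{DZ}$ estimated on the liability scale (tetrachoric correlations for a binary trait), the model implies the moment equations and Falconer-style solution below; in practice the model is usually fitted by maximum likelihood rather than solved this way.

```latex
\begin{aligned}
r_{MZ} &= a^2 + c^2, \qquad r_{DZ} = \tfrac{1}{2}a^2 + c^2, \\
a^2 &= 2\,(r_{MZ} - r_{DZ}), \qquad c^2 = 2\,r_{DZ} - r_{MZ}, \qquad e^2 = 1 - r_{MZ}.
\end{aligned}
```

Because the tetrachoric correlations are defined only through the bivariate-normal liability assumption, a departure from that assumption, such as a single diallelic causal gene, feeds directly into the estimate of $a^2$.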
4.  A Parametric Survival Model When a Covariate is Subject to Left-Censoring 
Journal of biometrics & biostatistics  2012;Suppl 3(2):10.4172/2155-6180.S3-002.
Problem statement
Modeling survival data with a set of covariates usually assumes that the values of the covariates are fully observed. However, in a variety of applications, some values of a covariate may be left-censored because the instrument is not sensitive enough to quantify the biospecimen. When data are left-censored, the true values are missing but are known to be smaller than the detection limit. The most commonly used ad-hoc method for dealing with nondetect values is to substitute the detection limit for them. Such ad-hoc analysis of survival data with an explanatory variable subject to left-censoring may yield biased and inefficient estimators of hazard ratios and survivor functions.
We consider a parametric proportional hazards model to analyze time-to-event data and propose a likelihood method for the estimation and inference of model parameters. In this likelihood approach, instead of replacing the nondetect values with the detection limit, we adopt a numerical integration technique to evaluate the observed-data likelihood in the presence of a left-censored covariate. Monte Carlo simulations were used to demonstrate various properties of the proposed regression estimators, including consistency and efficiency.
The simulation study shows that the proposed likelihood approach provides approximately unbiased estimators of the model parameters. The proposed method also provides estimators that are more efficient than those obtained under the ad-hoc method. Also, unlike the ad-hoc estimators, the proposed estimators have coverage probabilities at their nominal level. Analysis of a large cohort study, the Genetic and Inflammatory Markers of Sepsis study, gives discernibly different results under the proposed method.
Naive substitution of the detection limit for nondetect values in a parametric survival model may yield biased and inefficient estimators of hazard ratios and survivor functions. The proposed likelihood approach provides approximately unbiased and efficient estimators of hazard ratios and survivor functions.
PMCID: PMC3852406  PMID: 24319625
Left-censored covariate; Maximum likelihood method; Numerical integration; Survival model
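The observed-data likelihood with a left-censored covariate can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes an exponential baseline hazard, a normally distributed covariate, and composite Simpson's rule for the integral, and all function names are hypothetical. A detected subject contributes the survival term times the covariate density; a nondetect contributes that product integrated over covariate values below the detection limit.

```python
import math
from statistics import NormalDist
from typing import Callable, Optional


def event_density(t: float, event: int, x: float, lam: float, beta: float) -> float:
    """Exponential proportional-hazards term h(t|x)**event * S(t|x) (assumed model)."""
    hazard = lam * math.exp(beta * x)
    return hazard ** event * math.exp(-hazard * t)


def simpson(f: Callable[[float], float], a: float, b: float, n: int = 200) -> float:
    """Composite Simpson's rule on [a, b]; n is rounded up to an even count."""
    n += n % 2
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def log_lik(t: float, event: int, x: Optional[float], detection_limit: float,
            lam: float, beta: float, mu: float, sigma: float) -> float:
    """One subject's log-likelihood; x is None for a nondetect (x < detection_limit)."""
    cov = NormalDist(mu, sigma)  # assumed covariate distribution
    if x is not None:
        return math.log(event_density(t, event, x, lam, beta) * cov.pdf(x))

    def integrand(u: float) -> float:
        return event_density(t, event, u, lam, beta) * cov.pdf(u)

    # Integrate out the unobserved value over [mu - 8*sigma, detection_limit];
    # the normal mass below mu - 8*sigma is negligible.
    return math.log(simpson(integrand, mu - 8 * sigma, detection_limit))
```

Summing `log_lik` over subjects and maximizing over `(lam, beta, mu, sigma)` with a numerical optimizer would give maximum-likelihood estimates; plugging the detection limit in for `x` instead is the ad-hoc method the abstract cautions against.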
5.  Power of Single- vs. Multi-Marker Tests of Association 
Genetic epidemiology  2012;36(5):480-487.
Current genome-wide association studies still rely heavily on a single-marker strategy, in which each single nucleotide polymorphism (SNP) is tested individually for association with a phenotype. Although methods and software packages that consider multimarker models have become available, they have been slow to be widely adopted, and their efficacy in real data analysis is often questioned. Using extensive simulations, we endeavor to provide more insight into the performance of simple multimarker association tests compared with single-marker tests. The results reveal both the power advantages and the disadvantages of the two-marker test relative to the single-marker test. Power differentials depend on the correlation structure among tag SNPs, as well as that between tag SNPs and causal variants. A two-marker test performs relatively better than single-marker tests when the correlation between the two adjacent markers is high. However, with HapMap data, two-marker tests tended to have a greater chance of being less powerful than single-marker tests, owing to constraints on the number of haplotypes actually possible in the HapMap data. Yet the average power difference was small whenever the one-marker test was more powerful, whereas there were many situations in which the two-marker test was much more powerful. These findings can help guide the analysis of future studies.
PMCID: PMC3708310  PMID: 22648939
Asymptotic power; single-marker test; two-marker test; genome-wide association
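The trade-off described above, a possible gain in noncentrality against the cost of an extra degree of freedom, can be made concrete with asymptotic chi-square power. A minimal sketch, assuming the single-marker test is a 1-df and the two-marker test a 2-df chi-square test, with noncentrality parameters supplied by hand (in the paper they follow from LD among tag SNPs and causal variants); the function names are hypothetical. The 2-df noncentral chi-square is computed as a Poisson mixture of central chi-squares with even degrees of freedom, whose survival functions have closed forms.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def power_1df(ncp: float, alpha: float) -> float:
    """Power of a 1-df chi-square test with noncentrality ncp: P(|Z + sqrt(ncp)| > z)."""
    z = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    root = math.sqrt(ncp)
    return _STD_NORMAL.cdf(root - z) + _STD_NORMAL.cdf(-root - z)


def central_chi2_sf_even(x: float, df: int) -> float:
    """P(chi2_df > x) for even df: exp(-x/2) * sum_{i < df/2} (x/2)^i / i!."""
    half = x / 2
    term, total = 1.0, 0.0
    for i in range(df // 2):
        if i > 0:
            term *= half / i
        total += term
    return math.exp(-half) * total


def power_2df(ncp: float, alpha: float, max_terms: int = 500) -> float:
    """Power of a 2-df chi-square test: Poisson(ncp/2) mixture of central chi-squares."""
    crit = -2 * math.log(alpha)  # exact 2-df critical value
    weight = math.exp(-ncp / 2)
    power = 0.0
    for j in range(max_terms):
        if j > 0:
            weight *= (ncp / 2) / j
        power += weight * central_chi2_sf_even(crit, 2 + 2 * j)
    return power
```

Comparing `power_1df(8.0, 5e-8)` with `power_2df(ncp, 5e-8)` for increasing `ncp` shows how much extra noncentrality the two-marker test would need before it overtakes the single-marker test at a genome-wide threshold.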
6.  A variance component based multi-marker association test using family and unrelated data 
BMC Genetics  2013;14:17.
Incorporating family data in genetic association studies has become increasingly appreciated, especially for its potential value in testing rare variants. We introduce here a variance-component based association test that can test multiple common or rare variants jointly using both family and unrelated samples.
The proposed approach implemented in our R package aggregates or collapses the information across a region based on genetic similarity instead of genotype scores, which avoids the power loss when the effects are in different directions or have different association strengths. The method is also able to effectively leverage the LD information in a region and it can produce a test statistic with an adaptively estimated number of degrees of freedom. Our method can readily allow for the adjustment of non-genetic contributions to the familial similarity, as well as multiple covariates.
We demonstrate through simulations that the proposed method achieves good performance in terms of Type I error control and statistical power. The method is implemented in the R package “fassoc”, which provides a useful tool for data analysis and exploration.
PMCID: PMC3614458  PMID: 23497289
Association studies; Family data; Score test; Multi-marker test
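The point about aggregating through genetic similarity rather than genotype scores can be illustrated with a generic kernel (variance-component) score statistic. This sketch is not the `fassoc` implementation: it assumes unrelated subjects, no covariates, and a weighted linear kernel, whereas the paper's method also models familial correlation and estimates the degrees of freedom adaptively; p-value computation is omitted, and the helper names are hypothetical.

```python
from typing import List, Sequence


def linear_kernel(genotypes: Sequence[Sequence[float]],
                  weights: Sequence[float]) -> List[List[float]]:
    """K[i][j] = sum_m w_m * g_im * g_jm: weighted genetic similarity of subjects i and j."""
    n = len(genotypes)
    return [[sum(w * a * b for w, a, b in zip(weights, genotypes[i], genotypes[j]))
             for j in range(n)] for i in range(n)]


def vc_score(y: Sequence[float], kernel: Sequence[Sequence[float]]) -> float:
    """Q = r' K r with r = y - mean(y) (intercept-only null model)."""
    mean_y = sum(y) / len(y)
    r = [v - mean_y for v in y]
    n = len(r)
    return sum(r[i] * kernel[i][j] * r[j] for i in range(n) for j in range(n))


# Two variants with opposite effects on y (toy data).
genotypes = [[1, 0], [1, 0], [0, 1], [0, 1]]
y = [2.0, 2.0, 0.0, 0.0]
q_kernel = vc_score(y, linear_kernel(genotypes, [1.0, 1.0]))
# A burden approach collapses each subject's genotypes into one count first.
burden = [[sum(g)] for g in genotypes]
q_burden = vc_score(y, linear_kernel(burden, [1.0]))
```

With a linear kernel, Q equals the weighted sum of squared per-variant scores, so effects in opposite directions add rather than cancel: here `q_kernel` is 8 while the collapsed burden statistic is 0.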
7.  Testing gene-environment interactions in gene-based association studies 
BMC Proceedings  2011;5(Suppl 9):S26.
Gene-based and single-nucleotide polymorphism (SNP) set association studies provide an important complement to SNP analysis. Kernel-based nonparametric regression has recently emerged as a powerful and flexible tool for this purpose. Our goal is to explore whether this approach can be extended to incorporate and test for interaction effects, especially for genes containing rare variant SNPs. Here, we construct nonparametric regression models that can be used to include a gene-environment interaction effect under the framework of the least-squares kernel machine and examine the performance of the proposed method on the Genetic Analysis Workshop 17 unrelated individuals data set. Two hundred simulated replicates were used to explore the power for detecting interaction. We demonstrate through a genome scan of the quantitative phenotype Q1 that the simulated gene-environment interaction effect in the data can be detected with reasonable power by using the least-squares kernel machine method.
PMCID: PMC3287861  PMID: 22373316
8.  Single-Marker and Two-Marker Association Tests for Unphased Case-Control Genotype Data, with a Power Comparison 
Genetic epidemiology  2010;34(1):67-77.
In case-control single nucleotide polymorphism (SNP) data, the allele frequency, Hardy-Weinberg disequilibrium (HWD), and linkage disequilibrium (LD) contrast tests draw on three distinct sources of information about genetic association. While all three tests are typically developed in a retrospective context, we show that prospective logistic regression models may be developed that correspond conceptually to the retrospective tests. This approach provides a flexible framework for conducting a systematic series of association analyses using unphased genotype data and any number of covariates. For a single-stage study, two single-marker tests and four two-marker tests are discussed. The true association models are derived, and they allow us to understand why a model with only a linear term will generally fit well for a SNP in weak LD with a causal SNP, whatever the disease model, but not for a SNP in high LD with a non-additive disease SNP. We investigate the power of the association tests using real LD parameters from chromosome 11 in the HapMap CEU population data. Among the single-marker tests, the allelic test has on average the most power for an additive disease, whereas for dominant, recessive, and heterozygote-disadvantage diseases, the genotypic test has the most power. Among the six two-marker tests, the Allelic-LD contrast test, which incorporates linear terms for two markers and their interaction term, provides the most reliable power overall for the cases studied. Therefore, our results support incorporating an interaction term as well as linear terms in multi-marker tests.
PMCID: PMC2796706  PMID: 19557751
Allele frequency contrast test; LD contrast test; HWD contrast test; Genome-wide Association
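For comparison with the two single-marker tests discussed above, the classical retrospective versions can be sketched as contingency-table tests. This is a minimal illustration, not the paper's prospective logistic regression formulation: the allelic test is a Pearson chi-square on the 2 x 2 table of allele counts (1 df, and it assumes Hardy-Weinberg equilibrium for validity), and the genotypic test is a Pearson chi-square on the 2 x 3 table of genotype counts (2 df). Function names are hypothetical, and tables with an empty column are not handled.

```python
import math
from statistics import NormalDist
from typing import List, Sequence, Tuple


def pearson_chi2(table: Sequence[Sequence[float]]) -> float:
    """Pearson chi-square statistic for an r x c contingency table."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n
            stat += (obs - expected) ** 2 / expected
    return stat


def _allele_counts(genotype_counts: Sequence[int]) -> List[int]:
    """(aa, Aa, AA) genotype counts -> (a, A) allele counts."""
    aa, het, hom = genotype_counts
    return [2 * aa + het, het + 2 * hom]


def allelic_test(cases: Sequence[int], controls: Sequence[int]) -> Tuple[float, float]:
    """Allele-count test; returns (chi2, p) on 1 df."""
    stat = pearson_chi2([_allele_counts(cases), _allele_counts(controls)])
    return stat, 2 * (1 - NormalDist().cdf(math.sqrt(stat)))


def genotypic_test(cases: Sequence[int], controls: Sequence[int]) -> Tuple[float, float]:
    """2 x 3 genotype test; returns (chi2, p) on 2 df."""
    stat = pearson_chi2([list(cases), list(controls)])
    return stat, math.exp(-stat / 2)  # exact survival function of chi-square with 2 df
```

For example, `allelic_test((20, 20, 0), (10, 20, 10))` and `genotypic_test((20, 20, 0), (10, 20, 10))` apply both tests to the same case and control genotype counts.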
9.  Calculating Asymptotic Significance Levels of the Constrained Likelihood Ratio Test with Application to Multivariate Genetic Linkage Analysis 
The asymptotic distribution of the multivariate variance component linkage analysis likelihood ratio test has provoked some contradictory accounts in the literature. In this paper we confirm that some previous results are not correct by deriving the asymptotic distribution in one special case. It is shown that this special case is a good approximation to the distribution in many situations. We also introduce a new approach to simulating from the asymptotic distribution of the likelihood ratio test statistic in constrained testing problems. It is shown that this method is very efficient for small p-values, and is applicable even when the constraints are not convex. The method is related to a multivariate integration problem. We illustrate how the approach can be applied to multivariate linkage analysis in a simulation study. Some more philosophical issues relating to one-sided tests in variance components linkage analysis are discussed.
PMCID: PMC2861321  PMID: 19799558
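The simplest constrained testing problem behind this discussion is a single variance component tested at its boundary of zero, where the likelihood ratio statistic is asymptotically a 50:50 mixture of a point mass at zero and a chi-square with 1 df. The sketch below covers only that one-parameter case, not the multivariate distribution derived in the paper; function names are hypothetical. The simulation projects a standard normal score onto the constraint set [0, ∞), which is the one-dimensional analogue of simulating from the asymptotic distribution under constraints.

```python
import math
import random
from statistics import NormalDist
from typing import List


def boundary_lrt_pvalue(stat: float) -> float:
    """p-value under T ~ 0.5 * chi2_0 + 0.5 * chi2_1, i.e. 0.5 * P(chi2_1 > stat)."""
    if stat <= 0:
        return 1.0
    return 1 - NormalDist().cdf(math.sqrt(stat))


def simulate_boundary_lrt(n_draws: int, seed: int = 1) -> List[float]:
    """Draws from the same limit: T = max(Z, 0)**2 with Z ~ N(0, 1)."""
    rng = random.Random(seed)
    return [max(rng.gauss(0.0, 1.0), 0.0) ** 2 for _ in range(n_draws)]
```

Ignoring the constraint and using a plain 1-df chi-square would double every p-value in this case; in the multivariate setting the mixture weights depend on the geometry of the constraints, which is where simple analytic answers break down.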
10.  Comparison of univariate and multivariate linkage analysis of traits related to hypertension 
BMC Proceedings  2009;3(Suppl 7):S99.
Complex traits are often manifested through multiple correlated phenotypes. One example of this is hypertension (HTN), which is measured on a continuous scale by systolic blood pressure (SBP). Predisposition to HTN is predicted by hyperlipidemia, characterized by elevated triglycerides (TG) and low-density lipoprotein (LDL) and reduced high-density lipoprotein (HDL). We hypothesized that the multivariate analysis of TG, LDL, and HDL would be more powerful for detecting HTN genes via linkage analysis than univariate analysis of SBP. We conducted linkage analysis of four chromosomal regions known to contain genes associated with HTN, using SBP as a measure of HTN in univariate Haseman-Elston regression and using the correlated traits TG, LDL, and HDL in multivariate Haseman-Elston regression. All analyses were conducted using the Framingham Heart Study data. We found that multivariate linkage analysis was better able to detect the chromosomal regions in which the angiotensinogen (AGT), angiotensin receptor, guanine nucleotide-binding protein 3, and prostaglandin I2 synthase genes reside. Univariate linkage analysis detected only the AGT gene. We conclude that multivariate analysis is appropriate for the analysis of multiple correlated phenotypes, and our findings suggest that it may yield new linkage signals undetected by univariate analysis.
PMCID: PMC2796003  PMID: 20018096
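The univariate method named in this abstract can be sketched with the original Haseman-Elston regression for sib pairs: the squared trait difference is regressed on the estimated proportion of alleles shared identical by descent (IBD) at a marker, and a negative slope is evidence for linkage. A minimal sketch with hypothetical names and toy data; the multivariate Haseman-Elston regression used for TG, LDL, and HDL is not reproduced here.

```python
from typing import Sequence, Tuple


def haseman_elston(trait_pairs: Sequence[Tuple[float, float]],
                   ibd: Sequence[float]) -> Tuple[float, float]:
    """Regress (y1 - y2)^2 on estimated IBD sharing; returns (intercept, slope)."""
    sq_diff = [(a - b) ** 2 for a, b in trait_pairs]
    n = len(ibd)
    mean_x = sum(ibd) / n
    mean_y = sum(sq_diff) / n
    sxx = sum((x - mean_x) ** 2 for x in ibd)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(ibd, sq_diff))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Toy data: pairs sharing more alleles IBD have more similar trait values.
intercept, slope = haseman_elston([(0.0, 3.0), (0.0, 2.0), (0.0, 1.0)], [0.0, 0.5, 1.0])
```

In a real analysis the slope would be tested one-sided against zero using its standard error across many sib pairs.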
11.  Mendelian randomization in family data 
BMC Proceedings  2009;3(Suppl 7):S45.
The phrase "mendelian randomization" has become associated with the use of genetic polymorphisms to uncover causal relationships between phenotypic variables. The statistical methods useful in mendelian randomization are known as instrumental variable techniques. We present an approach to instrumental variable estimation that is useful in family data and is robust to the use of weak instruments. We illustrate our method by measuring the causal influence of low-density lipoprotein on high-density lipoprotein, body mass index, triglycerides, and systolic blood pressure. We use the Framingham Heart Study data as distributed to participants in Genetic Analysis Workshop 16.
PMCID: PMC2795944  PMID: 20018037
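The basic instrumental-variable idea behind mendelian randomization can be sketched with the ratio (Wald) estimator for unrelated individuals, cov(Z, Y) / cov(Z, X). This is not the paper's method: it ignores family structure and is known to be biased when the instrument is weak, which is the problem the authors' approach addresses. The function names are hypothetical, and the toy data are constructed by hand so that a confounder `u` inflates the ordinary regression slope while the instrument `z` is uncorrelated with `u`.

```python
from typing import Sequence


def _cov(a: Sequence[float], b: Sequence[float]) -> float:
    """Sample covariance of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)


def wald_ratio(instrument: Sequence[float], exposure: Sequence[float],
               outcome: Sequence[float]) -> float:
    """IV ratio estimate of the causal effect of exposure on outcome."""
    denom = _cov(instrument, exposure)
    if denom == 0:
        raise ValueError("instrument is uncorrelated with the exposure")
    return _cov(instrument, outcome) / denom


def ols_slope(x: Sequence[float], y: Sequence[float]) -> float:
    return _cov(x, y) / _cov(x, x)


z = [0, 1, 2, 0, 1, 2]                          # instrument (e.g., genotype count)
u = [1, -1, 0, -1, 1, 0]                        # unmeasured confounder, uncorrelated with z
x = [2 * zi + ui for zi, ui in zip(z, u)]       # exposure
y = [3 * xi + 5 * ui for xi, ui in zip(x, u)]   # outcome; true causal effect is 3
```

On these data `wald_ratio(z, x, y)` recovers the causal effect of 3, while `ols_slope(x, y)` returns 4 because the confounder `u` drives both exposure and outcome.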
12.  Comparison of affected sibling-pair linkage methods to identify gene × gene interaction in GAW15 simulated data 
BMC Proceedings  2007;1(Suppl 1):S66.
Non-parametric linkage methods have had limited success in detecting gene × gene interactions. Using affected sibling-pair (ASP) data from all replicates of the simulated data from Problem 3, we assessed the statistical power of three approaches to identify the gene × gene interaction between two loci on different chromosomes. The first method conditioned on linkage at the primary disease susceptibility locus (DR) to find linkage to a simulated effect modifier at Locus A with a mean allele sharing test. The second approach used a regression-based mean test to identify either the presence of interaction between the two loci or linkage to the A locus in the presence of linkage to DR. The third method applied a conditional logistic model designed to test for the presence of interacting loci. The first approach had lower power than an unconditional linkage analysis, supporting the idea that gene × gene interaction cannot be detected with ASP data. The regression-based mean test and the conditional logistic model had the lowest power to detect gene × gene interaction, possibly because of the complex recoding of the tri-allelic DR locus for use as a covariate. We conclude that the ASP approaches tested have low power to identify the interaction between the DR and A loci despite the large sample size, which may be due to the low prevalence of the high-risk DR genotypes. Additionally, the lack of data on discordant sibships may have decreased the power to identify gene × gene interactions.
PMCID: PMC2367530  PMID: 18466567
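The mean allele sharing test used in the first approach can be sketched as a one-sided z-test on the average IBD proportion across affected sib pairs. This minimal version assumes a fully informative marker, so that under no linkage each pair shares 0, 1/2, or 1 of its alleles IBD with probabilities 1/4, 1/2, 1/4 (mean 1/2, variance 1/8); the function name is hypothetical, and the conditioning on linkage at DR is not shown.

```python
import math
from statistics import NormalDist
from typing import Sequence, Tuple


def mean_sharing_test(pi_hat: Sequence[float]) -> Tuple[float, float]:
    """One-sided mean IBD-sharing test for affected sib pairs.

    pi_hat holds each pair's estimated proportion of alleles shared IBD.
    Returns (z, p), using null mean 1/2 and null per-pair variance 1/8."""
    n = len(pi_hat)
    z = (sum(pi_hat) / n - 0.5) / math.sqrt(1 / (8 * n))
    return z, 1 - NormalDist().cdf(z)


z_stat, p_value = mean_sharing_test([1.0, 1.0, 0.5, 0.5, 0.5, 0.5, 0.5, 0.0])
```

Excess sharing above 1/2 among affected pairs is the linkage signal; conditioning on the primary locus, as in the abstract's first approach, would restrict or stratify the pairs by their sharing at DR before applying the same test.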
13.  Modeling the complex gene × environment interplay in the simulated rheumatoid arthritis GAW15 data using latent variable structural equation modeling 
BMC Proceedings  2007;1(Suppl 1):S118.
Rheumatoid arthritis is a complex disease that appears to involve multiple genetic and environmental factors. Using the Genetic Analysis Workshop 15 simulated rheumatoid arthritis data and the structural equation modeling framework, we tested hypothesized "causal" rheumatoid arthritis model(s) by employing a novel latent gene construct approach that models individual genes as latent variables defined by multiple dense and non-dense single-nucleotide polymorphisms (SNPs). Our approach produced valid latent gene constructs, particularly with dense SNPs, which, when coupled with other factors involved in rheumatoid arthritis, generated good-fitting models according to certain goodness-of-fit indices. We observed that Genes F, C, and DR, sex, and smoking were significant predictors of rheumatoid arthritis but Genes A and E were not, which was generally, but not entirely, consistent with how the data were simulated. Our approach holds promise for unravelling complex diseases and improves upon current "one SNP (haplotype)-at-a-time" regression approaches by decreasing the number of statistical tests while minimizing problems with multicollinearity and haplotype estimation algorithm error. Furthermore, when genes are modeled as latent constructs simultaneously with other key cofactors, the approach provides enhanced control of confounding that should lead to less biased effect estimates among genes as well as between gene(s) and the complex disease. However, further study is needed to quantify bias, evaluate fit index disparity, and resolve multiplicative latent gene interactions. Moreover, because some a priori biological information is needed to form an initial substantive model, our approach may be most appropriate for candidate gene SNP panel applications.
PMCID: PMC2367478  PMID: 18466459