Results 1-14 (14)

1.  Evaluation of a LASSO regression approach on the unrelated samples of Genetic Analysis Workshop 17 
BMC Proceedings  2011;5(Suppl 9):S12.
The Genetic Analysis Workshop 17 data we used comprise 697 unrelated individuals genotyped at 24,487 single-nucleotide polymorphisms (SNPs) from a mini-exome scan, using real sequence data for 3,205 genes annotated by the 1000 Genomes Project and simulated phenotypes. We studied 200 sets of simulated phenotypes of trait Q2. An important feature of this data set is that most SNPs are rare, with 87% of the SNPs having a minor allele frequency less than 0.05. For rare SNP detection, in this study we performed a least absolute shrinkage and selection operator (LASSO) regression and F tests at the gene level and calculated the generalized degrees of freedom to avoid any selection bias. For comparison, we also carried out linear regression and the collapsing method, which sums the rare SNPs, modified for a quantitative trait and with two different allele frequency thresholds. The aim of this paper is to evaluate these four approaches in this mini-exome data and compare their performance in terms of power and false positive rates. In most situations the LASSO approach is more powerful than linear regression and collapsing methods. We also note the difficulty in determining the optimal threshold for the collapsing method and the significant role that linkage disequilibrium plays in detecting rare causal SNPs. If a rare causal SNP is in strong linkage disequilibrium with a common marker in the same gene, power will be much improved.
PMCID: PMC3287844  PMID: 22373385
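The collapsing method mentioned in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names (`minor_allele_freq`, `collapse_rare`, `ols_slope`), the toy genotypes, and the noiseless toy trait are all assumptions made for the example. The only details taken from the abstract are that rare SNPs are summed per gene, that the trait is quantitative, and that 0.05 is a relevant minor allele frequency cutoff.

```python
def minor_allele_freq(genotypes):
    """Minor allele frequency of one SNP, given 0/1/2 minor-allele counts."""
    return sum(genotypes) / (2 * len(genotypes))


def collapse_rare(snps, maf_threshold=0.05):
    """Per-individual burden score: minor alleles summed over rare SNPs only."""
    rare = [snp for snp in snps if minor_allele_freq(snp) < maf_threshold]
    n_individuals = len(snps[0])
    return [sum(snp[i] for snp in rare) for i in range(n_individuals)]


def ols_slope(x, y):
    """Slope of the simple least-squares regression of y on x."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx


# Hypothetical gene with 3 SNPs typed in 20 individuals (made-up data).
snp_a = [1] + [0] * 19            # MAF 0.025: rare, kept
snp_b = [0, 0, 0, 1] + [0] * 16   # MAF 0.025: rare, kept
snp_c = [1, 0] * 10               # MAF 0.25: common, excluded
burden = collapse_rare([snp_a, snp_b, snp_c])

# Noiseless toy trait, so the fitted slope recovers the assumed effect size.
trait = [1.0 + 2.0 * b for b in burden]
slope = ols_slope(burden, trait)
```

Changing `maf_threshold` changes which SNPs enter the burden score, which is one way to see why the abstract reports difficulty in choosing an optimal threshold.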
2.  Interrogating population structure and its impact on association tests 
BMC Proceedings  2011;5(Suppl 9):S25.
We found from our analysis of the Genetic Analysis Workshop 17 data that the population structure of the 697 unrelated individuals was an important confounding factor for association studies, even if it was not explicitly considered when simulating the phenotypes. We uncovered structures beyond the reported ethnicities and found ample evidence of phenotype–population structure associations. The first 10 principal components of the genotype data of the 697 individuals demonstrated much stronger associations with Q1, Q2, and the disease than did the individuals’ ethnicities. In addition, we observed that population structure was a confounding factor for the Q1-gene association when identifying the significant genes both with and without adjusting for the causal single-nucleotide polymorphisms, the ethnicities, and the principal components. Many false discoveries remained after adjusting for the causal single-nucleotide polymorphisms. Adjusting for the principal components appeared more effective than did adjusting for ethnicity in terms of preventing false discoveries. This analysis was performed with knowledge of the causal loci.
PMCID: PMC3287860  PMID: 22373290
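Adjusting for principal components, as described in the abstract above, can be illustrated with a small sketch. It assumes a toy genotype matrix with two made-up subpopulations, extracts only the first principal component by power iteration on the individual-by-individual Gram matrix, and removes its linear effect from a toy phenotype. The function names and data are hypothetical; the paper itself uses the first 10 components of the real genotype data.

```python
import math


def center_columns(matrix):
    """Subtract each SNP's mean so every column has mean zero."""
    n = len(matrix)
    means = [sum(row[j] for row in matrix) / n for j in range(len(matrix[0]))]
    return [[row[j] - means[j] for j in range(len(row))] for row in matrix]


def first_pc_scores(genotypes, n_iter=200):
    """Unit-length first-PC scores, one per individual, via power iteration."""
    x = center_columns(genotypes)
    n = len(x)
    gram = [[sum(a * b for a, b in zip(x[i], x[k])) for k in range(n)]
            for i in range(n)]
    # Start from a unit basis vector: the all-ones vector lies in the null
    # space of the centered Gram matrix and would never converge.
    v = [1.0] + [0.0] * (n - 1)
    for _ in range(n_iter):
        w = [sum(gram[i][k] * v[k] for k in range(n)) for i in range(n)]
        norm = math.sqrt(sum(wi * wi for wi in w))
        v = [wi / norm for wi in w]
    return v


def residualize(y, covariate):
    """Residuals of y after least-squares regression on one covariate."""
    mean_c = sum(covariate) / len(covariate)
    mean_y = sum(y) / len(y)
    scc = sum((c - mean_c) ** 2 for c in covariate)
    scy = sum((c - mean_c) * (yi - mean_y) for c, yi in zip(covariate, y))
    slope = scy / scc
    return [yi - mean_y - slope * (c - mean_c) for c, yi in zip(covariate, y)]


# Rows are individuals, columns are SNPs (0/1/2 minor-allele counts).
genotypes = [
    [2, 2, 0, 0], [2, 1, 0, 0], [1, 2, 0, 1],   # made-up subpopulation A
    [0, 0, 2, 2], [0, 1, 2, 2], [0, 0, 1, 2],   # made-up subpopulation B
]
phenotype = [5.1, 4.9, 5.0, 3.2, 2.9, 3.0]      # differs by subpopulation

pc1 = first_pc_scores(genotypes)
adjusted = residualize(phenotype, pc1)
```

In this toy data the first component separates the two groups by sign, so residualizing on it removes most of the between-group phenotype difference before any SNP is tested.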
3.  Testing gene-environment interactions in gene-based association studies 
BMC Proceedings  2011;5(Suppl 9):S26.
Gene-based and single-nucleotide polymorphism (SNP) set association studies provide an important complement to SNP analysis. Kernel-based nonparametric regression has recently emerged as a powerful and flexible tool for this purpose. Our goal is to explore whether this approach can be extended to incorporate and test for interaction effects, especially for genes containing rare variant SNPs. Here, we construct nonparametric regression models that can be used to include a gene-environment interaction effect under the framework of the least-squares kernel machine and examine the performance of the proposed method on the Genetic Analysis Workshop 17 unrelated individuals data set. Two hundred simulated replicates were used to explore the power for detecting interaction. We demonstrate through a genome scan of the quantitative phenotype Q1 that the simulated gene-environment interaction effect in the data can be detected with reasonable power by using the least-squares kernel machine method.
PMCID: PMC3287861  PMID: 22373316
4.  Estimating heritability using family and unrelated individuals data 
BMC Proceedings  2011;5(Suppl 9):S34.
For the family data from Genetic Analysis Workshop 17, we obtained heritability estimates of quantitative traits Q1 and Q4 using the ASSOC program in the S.A.G.E. software package. ASSOC is a family-based method that estimates heritability through the estimation of variance components. The covariate-adjusted mean heritability was 0.650 for Q1 and 0.745 for Q4. For the unrelated individuals data, we estimated the heritability of Q1 as the proportion of total variance that can be accounted for by all single-nucleotide polymorphisms under an additive model. We examined a novel ordinary least-squares method, a naïve restricted maximum-likelihood method, and a calibrated restricted maximum-likelihood method. We applied the different methods to all 200 replicates for Q1. We observed that the ordinary least-squares method yielded many estimates outside the interval [0, 1]. The restricted maximum-likelihood estimates were more stable than the ordinary least-squares estimates. The naïve restricted maximum-likelihood method yielded an average estimate of 0.462 ± 0.1, and the calibrated restricted maximum-likelihood method yielded an average of 0.535 ± 0.121. Our results demonstrate discrepancies in heritability estimates using the family data and the unrelated individuals data.
PMCID: PMC3287870  PMID: 22373039
5.  A novel method to detect rare variants using both family and unrelated case-control data 
BMC Proceedings  2011;5(Suppl 9):S80.
To detect rare variants associated with a phenotype, we develop a novel statistical method that can use both family and unrelated case-control data. Unlike the currently existing methods, we first use family data to calculate weights to be given to rare variants, differentiating between concordantly affected and discordant sib pairs. These weights are then used in an association test applied to the unrelated case-control data. We applied the proposed method to the simulated sequencing data in Genetic Analysis Workshop 17 and identified two genes associated with the disease.
PMCID: PMC3287921  PMID: 22373319
6.  A method to detect single-nucleotide polymorphisms accounting for a linkage signal using covariate-based affected relative pair linkage analysis 
BMC Proceedings  2011;5(Suppl 9):S84.
We evaluate an approach to detect single-nucleotide polymorphisms (SNPs) that account for a linkage signal with covariate-based affected relative pair linkage analysis in a conditional-logistic model framework using all 200 replicates of the Genetic Analysis Workshop 17 family data set. We begin by combining the multiple known covariate values into a single variable, a propensity score. We also use each SNP as a covariate, using an additive coding based on the number of minor alleles. We evaluate the distribution of the difference between LOD scores with the propensity score covariate only and LOD scores with the propensity score covariate and a SNP covariate. The inclusion of causal SNPs in causal genes increases LOD scores more than the inclusion of noncausal SNPs either within causal genes or outside causal genes. We compare the results from this method to results from a family-based association analysis and conclude that it is possible to identify SNPs that account for the linkage signals from genes using a SNP-covariate-based affected relative pair linkage approach.
PMCID: PMC3287925  PMID: 22373405
7.  Capability of common SNPs to tag rare variants 
BMC Proceedings  2011;5(Suppl 9):S88.
Genome-wide association studies are based on the linkage disequilibrium pattern between common tagging single-nucleotide polymorphisms (SNPs) (i.e., SNPs having only common alleles) and true causal variants, and association studies with rare SNP alleles aim to detect rare causal variants. To better understand and explain the findings from both types of studies and to provide clues to improve the power of an association study with only common SNPs genotyped, we study the correlation between common SNPs and the presence of rare alleles within a region in the genome and look at the capability of common SNPs in strong linkage disequilibrium with each other to capture single rare alleles. Our results indicate that common SNPs can, to some extent, tag the presence of rare alleles and that including SNPs in strong linkage disequilibrium with each other among the tagging SNPs helps to detect rare alleles.
PMCID: PMC3287929  PMID: 22373521
8.  The effect of multiple genetic variants in predicting the risk of type 2 diabetes 
BMC Proceedings  2009;3(Suppl 7):S49.
While recently performed genome-wide association studies have advanced the identification of genetic variants predisposing to type 2 diabetes (T2D), the potential application of these novel findings for disease prediction and prevention has not been well studied. Diabetes prediction and prevention have become urgent issues owing to the rapidly increasing prevalence of diabetes and its associated mortality, morbidity, and health care cost. New prediction approaches using genetic markers could facilitate early identification of high risk sub-groups of the population so that appropriate prevention methods could be effectively applied to delay, or even prevent, disease onset.
This paper assessed 18 recently identified T2D loci for their potential role in diabetes prediction. We built a new predictive genetic test for T2D using the Framingham Heart Study dataset. Built with logistic regression and 15 additional loci, the new test performed slightly better than the existing test, which uses just three loci. A formal comparison between the two tests suggests no significant improvement. We further formed a predictive genetic test for identifying early-onset T2D and found higher classification accuracy for this test, not only indicating that these 18 loci have great potential for predicting early-onset T2D, but also suggesting that they may play important roles in causing early-onset T2D.
To further improve the test's accuracy, we applied a newly developed nonparametric method capable of capturing high order interactions to the data, but it did not outperform a logistic regression that only considers single-locus effects. This could be explained by the absence of gene-gene interactions among the 18 loci.
PMCID: PMC2795948  PMID: 20018041
9.  Comparison of a unified analysis approach for family and unrelated samples with the transmission-disequilibrium test to study associations of hypertension in the Framingham Heart Study 
BMC Proceedings  2009;3(Suppl 7):S22.
Population stratification is one of the major causes of spurious associations in association studies. A unified association approach based on principal-component analysis can overcome the effect of population stratification, as well as make use of both family and unrelated samples combined to increase power (family-case-control, or FamCC). In this study, we compared FamCC and the transmission-disequilibrium test (TDT) using data on hypertension, systolic blood pressure, and diastolic blood pressure in the Framingham Heart Study. Our study indicated FamCC has reasonable type I error for both the unrelated sample and the family sample for all three traits. For these three traits, we found results from FamCC were inconsistent with those from the TDT. We discuss the reasons for this inconsistency. After correcting for multiple tests, we did not detect any significant single-nucleotide polymorphisms by either FamCC or the TDT.
PMCID: PMC2795919  PMID: 20018012
10.  A method to correct for population structure using a segregation model 
BMC Proceedings  2009;3(Suppl 7):S104.
To overcome the "spurious" association caused by population stratification in population-based association studies, we propose a principal-component based method that can use both family and unrelated samples at the same time. More specifically, we adapt the multivariate logistic model, which is often used in segregation analysis and can allow for the family correlation structure, for association analysis. To correct the effect of hidden population structure, the first ten principal-components calculated from the matrix of marker genotype data are incorporated as covariates in the model. To test for the association, the marker of interest is also incorporated as a covariate in the model. We applied the proposed method to the second generation (i.e., the Offspring Cohort), in the Genetic Analysis Workshop 16 Framingham Heart Study 50 k data set to evaluate the performance of the method. Although there may have been difficulty in the convergence while maximizing the likelihood function as indicated by a flat likelihood, the distribution of the empirical p-values for the test statistic does show that the method has a correct type I error rate whenever the variance-covariance matrix of the estimates can be computed.
PMCID: PMC2795875  PMID: 20017968
11.  Linkage studies of catechol-O-methyltransferase (COMT) and dopamine-beta-hydroxylase (DBH) cDNA expression levels 
BMC Proceedings  2007;1(Suppl 1):S95.
The COMT and DBH genes are physically located at chromosomes 22q11 and 9q34, respectively, and both COMT and DBH are involved in catecholamine metabolism and are strong candidates for certain psychiatric and neurological disorders. Although the genetic determinants of both enzymes' activities have been widely studied, the genetic factors influencing their mRNA expression levels remain unclear. In this study we performed quantitative linkage analysis of COMT and DBH cDNA expression levels, identifying transcriptional regulatory regions for both genes. Multiple Haseman-Elston regression was used to detect both additive and interactive effects between two loci. We found that the master transcriptional regulatory region 20q13 had an additive effect on the COMT expression level. We also found that chromosome 19p13 showed both additive and interactive effects with 9q34 on DBH expression level. Furthermore, a potential interaction between COMT and DBH was indicated.
PMCID: PMC2367604  PMID: 18466599
12.  Genome-wide association studies using an adaptive two-stage analysis for a case-control design 
BMC Proceedings  2007;1(Suppl 1):S147.
A new type of test is presented for genome-wide association studies using a case-control design. It is referred to as the adaptive two-stage (ATS) analysis, being based on both the Hardy-Weinberg disequilibrium trend test (HWDTT) and the Cochran-Armitage trend test (CATT). The procedure for the ATS is to screen single-nucleotide polymorphisms (SNPs) using the HWDTT in a first stage, and then test a reduced number of SNPs that pass the screening step in a second stage using the CATT. In the Genetic Analysis Workshop 15 simulated data set, this ATS analysis captured, after Bonferroni correction, the region from 32447.149 kb to 32859.819 kb and the region around 37363.880 kb that are close to the actual trait loci on chromosome 6. We compared the ATS with other ways of combining the p-values of the HWDTT and the CATT, the classical form of Fisher's test and a weighted form of Fisher's test. Results showed that the proposed ATS has good performance and could detect the regions containing a susceptibility locus.
PMCID: PMC2367582  PMID: 18466491
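The second-stage test and the classical Fisher combination named in the abstract above can be sketched with the standard textbook formulas. This is an illustration under stated assumptions, not the authors' ATS procedure: the first-stage HWDTT screen and the weighted Fisher form are not shown, the additive weights (0, 1, 2) and the genotype counts are made up for the example, and the Fisher combination assumes the p-values are independent.

```python
import math


def catt(cases, controls, weights=(0, 1, 2)):
    """Cochran-Armitage trend test on genotype counts.

    cases and controls hold counts for 0, 1 and 2 copies of the risk allele.
    Returns (z, two-sided p-value) from the normal approximation.
    """
    n_cases, n_controls = sum(cases), sum(controls)
    n_total = n_cases + n_controls
    col_totals = [r + s for r, s in zip(cases, controls)]
    t_stat = sum(w * (r * n_controls - s * n_cases)
                 for w, r, s in zip(weights, cases, controls))
    var = sum(w * w * n * (n_total - n) for w, n in zip(weights, col_totals))
    for i in range(len(weights)):
        for j in range(i + 1, len(weights)):
            var -= 2 * weights[i] * weights[j] * col_totals[i] * col_totals[j]
    var *= n_cases * n_controls / n_total
    z = t_stat / math.sqrt(var)
    return z, math.erfc(abs(z) / math.sqrt(2))


def fisher_combine(p_values):
    """Classical Fisher combination of independent p-values.

    X = -2 * sum(ln p) follows a chi-square law with 2k degrees of freedom;
    for even degrees of freedom the tail probability has a closed form.
    """
    k = len(p_values)
    half_x = -sum(math.log(p) for p in p_values)
    return math.exp(-half_x) * sum(half_x ** j / math.factorial(j)
                                   for j in range(k))


# Made-up counts: no trend versus a strong trend toward the risk allele.
z_null, p_null = catt([10, 20, 10], [10, 20, 10])
z_trend, p_trend = catt([10, 30, 60], [60, 30, 10])
p_combined = fisher_combine([0.04, 0.03])
```

A two-stage design differs from this combination in that only SNPs passing the first-stage screen reach the trend test, which reduces the number of tests subject to Bonferroni correction.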
13.  Two-stage analysis strategy for identifying the IgM quantitative trait locus 
BMC Proceedings  2007;1(Suppl 1):S139.
Genetic association studies offer an opportunity to find genetic variants underlying complex human diseases. Various tests have been developed to improve their power. However, none of these tests is uniformly best, and it is usually unclear at the outset which test is best for a specific dataset. For example, Hotelling's T² test is best for normally distributed data, but it can lose considerable power when normality is not met. To achieve satisfactory power in most cases, without compromising the overall significance level, we propose to adopt a two-stage adaptive analysis strategy: several statistics are compared on a portion of the samples at the first stage and the most powerful statistic is then used for the remaining samples. We evaluated this procedure by mapping the quantitative trait locus of IgM with the simulated data in Genetic Analysis Workshop 15 Problem 3. The results show that the gain in power of the two-stage adaptive analysis procedure could be considerable when the initial choice of test statistic is wrong, whereas the loss is relatively small when the test chosen initially is in fact optimal.
PMCID: PMC2367539  PMID: 18466482
14.  A logistic mixture model for a family-based association study 
BMC Proceedings  2007;1(Suppl 1):S44.
A family-based association study design is not only able to localize causative genes more precisely than linkage analysis, but it also helps explain the genetic mechanism underlying the trait under study. Therefore, it can be used to follow up an initial linkage scan. For an association study of binary traits in general pedigrees, we propose a logistic mixture model that regresses the trait value on the genotypic values of markers under investigation and other covariates such as environmental factors. We first tested both the validity and power of the new model by simulating nuclear families inheriting a simple Mendelian trait. It is powerful when the correct disease model is specified and shows much loss of power when the dominance of a model is inversely specified, i.e., a dominant model is wrongly specified as recessive or vice versa. We then applied the new model to the Genetic Analysis Workshop (GAW) 15 simulation data to test the performance of the model when adjusting for covariates in the case of complex traits. Adjusting for the covariate that interacts with disease loci improves the power to detect association. The simplest version of the model only takes monogenic inheritance into account, but analysis of the GAW simulation data shows that even this simple model can be powerful for complex traits.
PMCID: PMC2359869  PMID: 18466543
