Olson's conditional-logistic model retains the nice property of the LOD score formulation and has advantages over other methods that make it an appropriate choice for complex trait linkage mapping. However, the asymptotic distribution of the conditional-logistic likelihood-ratio (CL-LR) statistic with genetic constraints on the model parameters is unknown for some analysis models, even in the case of samples comprising only independent sib pairs. We derive approximations to the asymptotic null distributions of the CL-LR statistics and compare them with the empirical null distributions by simulation using independent affected sib pairs. Generally, the empirical null distributions of the CL-LR statistics closely match the known or approximated asymptotic distributions for all analysis models considered, except for the covariate model with a minimum-adjusted binary covariate. This work will provide useful guidelines for linkage analysis of real data sets for the genetic analysis of complex traits, thereby contributing to the identification of genes for disease traits.
linkage analysis; affected sib pairs; identity-by-descent; conditional-logistic model; genetic constraints; null distribution; likelihood-ratio statistics
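As a point of reference for the null distributions discussed above, the following minimal sketch converts a LOD score to a likelihood-ratio statistic and computes a p-value under a 50:50 mixture of a point mass at zero and a chi-square distribution with 1 degree of freedom. That mixture is the standard textbook result for a single parameter constrained to a boundary; it is used here only as an assumed illustration and is not one of the approximations derived in the work. The function names are hypothetical.

```python
import math


def lod_to_lr(lod):
    """Convert a LOD score (log10 scale) to a likelihood-ratio statistic (2 * ln scale)."""
    return 2.0 * math.log(10.0) * lod


def chi2_1_sf(x):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))


def mixture_p_value(lod):
    """P-value under an assumed 50:50 mixture of a point mass at 0 and chi-square(1)."""
    if lod <= 0:
        # The statistic is never below zero, so a zero LOD is always attained.
        return 1.0
    return 0.5 * chi2_1_sf(lod_to_lr(lod))


p_at_lod_3 = mixture_p_value(3.0)
```

Under this assumed mixture, a LOD score of 3 corresponds to a p-value on the order of 10⁻⁴. Models with more constrained parameters would need different mixing weights and degrees of freedom.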
A novel web-based tool PedWiz that pipelines the informatics process for pedigree data is introduced. PedWiz is designed to assist researchers in the analysis of pedigree data. It provides a convenient tool for pedigree informatics: descriptive statistics, relative pairs, genetic similarity coefficients, the variance-covariance matrix for three estimated coefficients of allele identical-by-descent sharing as well as mean allele sharing, a plot of the pedigree structures, and a visualization of the identity coefficients. With a renewed interest in linkage and other family based methods, PedWiz will be a valuable tool for the analysis of family data.
pedigree; informatics; genetic similarity; identity-by-descent; relative pairs; family data
We evaluate an approach to detect single-nucleotide polymorphisms (SNPs) that account for a linkage signal with covariate-based affected relative pair linkage analysis in a conditional-logistic model framework using all 200 replicates of the Genetic Analysis Workshop 17 family data set. We begin by combining the multiple known covariate values into a single variable, a propensity score. We also use each SNP as a covariate, using an additive coding based on the number of minor alleles. We evaluate the distribution of the difference between LOD scores with the propensity score covariate only and LOD scores with the propensity score covariate and a SNP covariate. The inclusion of causal SNPs in causal genes increases LOD scores more than the inclusion of noncausal SNPs either within causal genes or outside causal genes. We compare the results from this method to results from a family-based association analysis and conclude that it is possible to identify SNPs that account for the linkage signals from genes using a SNP-covariate-based affected relative pair linkage approach.
Complex traits are often manifested by multiple correlated traits. One example of this is hypertension (HTN), which is measured on a continuous scale by systolic blood pressure (SBP). Predisposition to HTN is predicted by hyperlipidemia, characterized by elevated triglycerides (TG), low-density lipoprotein (LDL), and high-density lipoprotein (HDL). We hypothesized that the multivariate analysis of TG, LDL, and HDL would be more powerful for detecting HTN genes via linkage analysis compared with univariate analysis of SBP. We conducted linkage analysis of four chromosomal regions known to contain genes associated with HTN using SBP as a measure of HTN in univariate Haseman-Elston regression and using the correlated traits TG, LDL, and HDL in multivariate Haseman-Elston regression. All analyses were conducted using the Framingham Heart Study data. We found that multivariate linkage analysis was better able to detect chromosomal regions in which the angiotensinogen (AGT), angiotensin receptor, guanine nucleotide-binding protein 3, and prostaglandin I2 synthase genes reside. Univariate linkage analysis detected only the AGT gene. We conclude that multivariate analysis is appropriate for the analysis of multiple correlated phenotypes, and our findings suggest that it may yield new linkage signals undetected by univariate analysis.
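For readers unfamiliar with Haseman-Elston regression, the sketch below illustrates the original univariate form only: the squared trait difference of each sib pair is regressed on the proportion of alleles the pair shares identical by descent at a marker, and a negative slope is taken as evidence for linkage. The function name and the toy data are hypothetical, and the multivariate extension used in the study is not shown.

```python
def haseman_elston_slope(pairs):
    """Ordinary least-squares fit of squared sib-pair trait difference on IBD sharing.

    pairs: list of (trait_sib1, trait_sib2, ibd_proportion) tuples.
    Returns (slope, intercept). Assumes the IBD proportions are not all equal.
    """
    xs = [pi for _, _, pi in pairs]
    ys = [(y1 - y2) ** 2 for y1, y2, _ in pairs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical SBP values (mm Hg) for six sib pairs with IBD proportions 0, 0.5, or 1.
toy_pairs = [
    (120, 150, 0.0),
    (130, 118, 0.5),
    (125, 131, 0.5),
    (140, 138, 1.0),
    (110, 135, 0.0),
    (128, 126, 1.0),
]
toy_slope, toy_intercept = haseman_elston_slope(toy_pairs)
```

In the toy data, pairs sharing more alleles IBD have more similar trait values, so the fitted slope is negative. A real analysis would also test whether the slope is significantly below zero.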
The Metabolic Syndrome (MetSyn), which is a clustering of traits including insulin resistance, obesity, hypertension and dyslipidemia, is estimated to have a substantial genetic component, yet few specific genetic targets have been identified. Factor analysis, a sub-type of structural equation modeling (SEM), has been used to model the complex relationships in MetSyn. Therefore, we aimed to define the genetic determinants of MetSyn in the Framingham Heart Study (Offspring Cohort, Exam 7) using the Affymetrix 50 k Human Gene Panel and three different approaches: 1) an association-based "one-SNP-at-a-time" analysis with MetSyn as a binary trait using the World Health Organization criteria; 2) an association-based "one-SNP-at-a-time" analysis with MetSyn as a continuous trait using second-order factor scores derived from four first-order factors; and, 3) a multivariate SEM analysis with MetSyn as a continuous, second-order factor modeled with multiple putative genes, which were represented by latent constructs defined using multiple SNPs in each gene. Results were similar between approaches in that CSMD1 SNPs were associated with MetSyn in Approaches 1 and 2; however, the effects of CSMD1 diminished in Approach 3 when modeled simultaneously with six other genes, most notably CETP and STARD13, which were strongly associated with the Lipids and MetSyn factors, respectively. We conclude that modeling multiple genes as latent constructs on first-order trait factors, most proximal to the gene's function with limited paths directly from genes to the second-order MetSyn factor, using SEM is the most viable approach toward understanding overall gene variation effects in the presence of multiple putative SNPs.
While recently performed genome-wide association studies have advanced the identification of genetic variants predisposing to type 2 diabetes (T2D), the potential application of these novel findings for disease prediction and prevention has not been well studied. Diabetes prediction and prevention have become urgent issues owing to the rapidly increasing prevalence of diabetes and its associated mortality, morbidity, and health care cost. New prediction approaches using genetic markers could facilitate early identification of high risk sub-groups of the population so that appropriate prevention methods could be effectively applied to delay, or even prevent, disease onset.
This paper assessed 18 recently identified T2D loci for their potential role in diabetes prediction. We built a new predictive genetic test for T2D using the Framingham Heart Study dataset. Using logistic regression and 15 additional loci, the new test was slightly improved over the existing test using just three loci. A formal comparison between the two tests suggests no significant improvement. We further formed a predictive genetic test for identifying early onset T2D and found higher classification accuracy for this test, not only indicating that these 18 loci have great potential for predicting early onset T2D, but also suggesting that they may play important roles in causing early-onset T2D.
To further improve the test's accuracy, we applied a newly developed nonparametric method capable of capturing high order interactions to the data, but it did not outperform a logistic regression that only considers single-locus effects. This could be explained by the absence of gene-gene interactions among the 18 loci.
Metabolic syndrome, by definition, is the manifestation of multiple, correlated metabolic impairments. It is known to have both strong environmental and genetic contributions. However, isolating genetic variants predisposing to such a complex trait has limitations. Using pedigree data, when available, may well lead to increased ability to detect variants associated with such complex traits. The ability to incorporate multiple correlated traits into a joint analysis may also allow increased detection of associated genes. Therefore, to demonstrate the utility of both univariate and multivariate family-based association analysis and to identify possible genetic variants associated with metabolic syndrome, we performed a scan of the Affymetrix 50 k Human Gene Panel data using 1) each of the traits comprising metabolic syndrome: triglycerides, high-density lipoprotein, systolic blood pressure, diastolic blood pressure, blood glucose, and body mass index, and 2) a composite trait including all of the above, jointly. Two single-nucleotide polymorphisms within the cholesterol ester transfer protein (CETP) gene remained significant even after correcting for multiple testing in both the univariate (p < 5 × 10⁻⁷) and multivariate (p < 5 × 10⁻⁹) association analysis. Three genes met significance for multiple traits after correction for multiple testing in the univariate analysis, while five genes remained significant in the multivariate association. We conclude that while both univariate and multivariate family-based association analysis can identify genes of interest, our multivariate approach is less affected by multiple testing correction and yields more significant results.
Population stratification is one of the major causes of spurious associations in association studies. A unified association approach based on principal-component analysis can overcome the effect of population stratification, as well as make use of both family and unrelated samples combined to increase power (family-case-control, or FamCC). In this study, we compared FamCC and the transmission-disequilibrium test (TDT) using data on hypertension, systolic blood pressure, and diastolic blood pressure in the Framingham Heart Study. Our study indicated FamCC has reasonable type I error for both the unrelated sample and the family sample for all three traits. For these three traits, we found results from FamCC were inconsistent with those from the TDT. We discuss the reasons for this inconsistency. After correcting for multiple tests, we did not detect any significant single-nucleotide polymorphisms by either FamCC or the TDT.
To account for population stratification in association studies, principal-components analysis is often performed on single-nucleotide polymorphisms (SNPs) across the genome. Here, we use Framingham Heart Study (FHS) Genetic Analysis Workshop 16 data to compare the performance of local ancestry adjustment for population stratification based on principal components (PCs) estimated from SNPs in a local chromosomal region with global ancestry adjustment based on PCs estimated from genome-wide SNPs.
Standardized height residuals from unrelated adults from the FHS Offspring Cohort were averaged from longitudinal data. PCs of SNP genotype data were calculated to represent each individual's ancestry either 1) globally using all SNPs across the genome or 2) locally using SNPs in adjacent 20-Mbp regions within each chromosome. We assessed the extent to which there were differences in association studies of height depending on whether PCs for global, local, or both global and local ancestry were included as covariates.
The correlations between local and global PCs were low (r < 0.12), suggesting variability between local and global ancestry estimates. Genome-wide association tests without any ancestry adjustment demonstrated an inflated type I error rate that decreased with adjustment for local ancestry, global ancestry, or both. A known spurious association was replicated for SNPs within the lactase gene, and this false-positive association was abolished by adjustment with local or global ancestry PCs.
Population stratification is a potential source of bias in this seemingly homogeneous FHS population. However, local and global PCs derived from SNPs appear to provide adequate information about ancestry.
To overcome the "spurious" association caused by population stratification in population-based association studies, we propose a principal-component based method that can use both family and unrelated samples at the same time. More specifically, we adapt the multivariate logistic model, which is often used in segregation analysis and can allow for the family correlation structure, for association analysis. To correct for the effect of hidden population structure, the first ten principal components calculated from the matrix of marker genotype data are incorporated as covariates in the model. To test for the association, the marker of interest is also incorporated as a covariate in the model. We applied the proposed method to the second generation (i.e., the Offspring Cohort) in the Genetic Analysis Workshop 16 Framingham Heart Study 50 k data set to evaluate the performance of the method. Although there may have been difficulty in convergence while maximizing the likelihood function, as indicated by a flat likelihood, the distribution of the empirical p-values for the test statistic does show that the method has a correct type I error rate whenever the variance-covariance matrix of the estimates can be computed.
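The step of deriving principal components from a genotype matrix can be sketched as follows. This is an illustration under stated assumptions rather than the procedure used in the study: genotypes are coded as minor-allele counts (0, 1, 2), only the first component is extracted, and power iteration on the sample-by-sample Gram matrix stands in for a full eigendecomposition. The function names and toy genotypes are hypothetical.

```python
import math


def center_columns(genotypes):
    """Subtract each SNP's mean allele count from its column."""
    n = len(genotypes)
    m = len(genotypes[0])
    means = [sum(row[j] for row in genotypes) / n for j in range(m)]
    return [[row[j] - means[j] for j in range(m)] for row in genotypes]


def first_pc_scores(genotypes, iterations=200):
    """First principal-component scores per individual, up to sign and scale.

    Uses power iteration on the Gram matrix of the centered genotypes.
    Assumes the genotype rows are not all identical.
    """
    x = center_columns(genotypes)
    n = len(x)
    gram = [[sum(a * b for a, b in zip(x[i], x[k])) for k in range(n)]
            for i in range(n)]
    v = [float(i + 1) for i in range(n)]  # deterministic starting vector
    for _ in range(iterations):
        w = [sum(gram[i][k] * v[k] for k in range(n)) for i in range(n)]
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    return v


# Hypothetical genotypes: rows are individuals, columns are SNPs.
# The first three and last three individuals come from two different subpopulations.
toy_genotypes = [
    [0, 0, 2, 2],
    [0, 1, 2, 2],
    [0, 0, 2, 1],
    [2, 2, 0, 0],
    [2, 1, 0, 0],
    [2, 2, 0, 1],
]
pc1 = first_pc_scores(toy_genotypes)
```

In the toy data the first component separates the two subpopulations by sign. Scores like these, for as many components as desired, would then enter the association model as covariates alongside the marker of interest.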
Standard genetic mapping techniques scan chromosomal segments for location of genetic linkage and association signals. The majority of these methods consider only correlations at single markers and/or phenotypes with explicit detailing of the genetic structure. These methods tend to be limited by their inability to consider the effect of large numbers of model variables jointly. In contrast, we propose a Bayesian analysis of variance (ANOVA) method to categorize individuals based on similarity of multidimensional profiles and attempt to analyze all variables simultaneously. Using Problem 1 of the Genetic Analysis Workshop 15 data set, we demonstrate the method's utility for joint analysis of gene expression levels and single-nucleotide polymorphism genotypes. We show that the method extracts similar information to that of previous genetic mapping analyses, and suggest extensions of the method for mining unique information not previously found.
Errors while genotyping are inevitable and can reduce the power to detect linkage. However, does genotyping error have the same impact on linkage results for single-nucleotide polymorphism (SNP) and microsatellite (MS) marker maps? To evaluate this question we detected genotyping errors that are consistent with Mendelian inheritance using large changes in multipoint identity-by-descent sharing in neighboring markers. Only a small fraction of Mendelian-consistent errors were detectable (e.g., 18% of MS and 2.4% of SNP genotyping errors). More SNP genotyping errors than MS genotyping errors are Mendelian consistent, so genotyping error may have a greater impact on linkage results using SNP marker maps. We also evaluated the effect of genotyping error on the power and type I error rate using simulated nuclear families with missing parents under 0, 0.14, and 2.8% genotyping error rates. In the presence of genotyping error, we found that the power to detect a true linkage signal was greater for SNP (75%) than MS (67%) marker maps, although there were also slightly more false-positive signals using SNP marker maps (5 compared with 3 for MS). Finally, we evaluated the usefulness of accounting for genotyping error in the SNP data using a likelihood-based approach, which restores some of the power that is lost when genotyping error is introduced.
The basic idea of affected-sib-pair (ASP) linkage analysis is to test whether the inheritance pattern of a marker deviates from Mendelian expectation in a sample of ASPs. The test depends on an assumed Mendelian control distribution of the number of marker alleles shared identical by descent (IBD), i.e., 1/4, 1/2, and 1/4 for 2, 1, and 0 allele(s) IBD, respectively. However, Mendelian transmission may not always hold, for example because of inbreeding or meiotic drive at the marker or a nearby locus. A more robust and valid approach is to incorporate discordant-sib-pairs (DSPs) as controls to avoid possible false-positive results. To be robust to deviation from Mendelian transmission, here we analyzed Collaborative Study on the Genetics of Alcoholism data by modifying the ASP LOD score method to contrast the estimated distribution of the number of allele(s) shared IBD by ASPs with that by DSPs, instead of with the expected distribution under the Mendelian assumption. This strategy assesses the difference in IBD sharing between ASPs and the IBD sharing between DSPs. Further, it works better than the conventional LOD score ASP linkage method in these data in the sense of avoiding false-positive linkage evidence.
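The contrast between the two null distributions can be sketched with a simple multinomial LOD score. This is a minimal illustration, assuming fully informative IBD counts and unconstrained maximum-likelihood estimates of the sharing proportions; it is one plausible reading of the modification, not necessarily the exact statistic used in the analysis. The function names and counts are hypothetical.

```python
import math

# Mendelian expectation for sib pairs sharing 0, 1, and 2 alleles IBD.
MENDELIAN = (0.25, 0.5, 0.25)


def proportions(counts):
    """Observed proportions of pairs sharing 0, 1, and 2 alleles IBD."""
    total = sum(counts)
    return tuple(c / total for c in counts)


def log10_likelihood(counts, probs):
    """Multinomial log10-likelihood, omitting the constant term.

    Categories with a zero count contribute nothing. Assumes every category
    with a positive count has a positive probability.
    """
    return sum(c * math.log10(p) for c, p in zip(counts, probs) if c > 0)


def asp_lod(asp_counts, null_probs=MENDELIAN):
    """LOD score of the ASP sharing estimates against a chosen null distribution."""
    z_hat = proportions(asp_counts)
    return (log10_likelihood(asp_counts, z_hat)
            - log10_likelihood(asp_counts, null_probs))


# Hypothetical counts of pairs sharing 0, 1, and 2 alleles IBD.
asp_counts = (15, 50, 35)
dsp_counts = (20, 50, 30)

lod_vs_mendel = asp_lod(asp_counts)                        # conventional null
lod_vs_dsp = asp_lod(asp_counts, proportions(dsp_counts))  # DSP-estimated null
```

In this toy example the DSPs also show some excess sharing, so the LOD score against the DSP-estimated null is smaller than the conventional one. This reflects the intended protection against excess sharing that is unrelated to the trait.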
We developed a new marker-reordering algorithm to find the best order of fine-mapping markers for multipoint linkage analysis. The algorithm searches for the best order of fine-mapping markers such that the sum of the squared differences in identity-by-descent distribution between neighboring markers is minimized. To test this algorithm, we examined its effect on the evidence for linkage in the simulated and the Collaborative Study on the Genetics of Alcoholism (COGA) data. We found enhanced evidence for linkage with the reordered map at the true location in the simulated data (p-value decreased from 1.16 × 10⁻⁹ to 9.70 × 10⁻¹⁰). Analysis of the White population from the COGA data with the reordered map for alcohol dependence led to a significant change of the linkage signal (p = 0.0365 decreased to p = 0.0039) on chromosome 1 between markers D1S1592 and D1S1598. Our results suggest that reordering fine-mapping markers in candidate regions when the genetic map is uncertain can be a critical step when considering a dense map.
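The ordering criterion described above can be sketched as follows. The abstract does not specify the search strategy or how the IBD distribution is summarized per marker, so this illustration assumes each marker is summarized by a vector of IBD sharing proportions and uses an exhaustive search over permutations, which is feasible only for a handful of markers. The marker names and values are hypothetical.

```python
from itertools import permutations


def order_cost(order, ibd):
    """Sum of squared differences in IBD distribution between neighboring markers."""
    total = 0.0
    for a, b in zip(order, order[1:]):
        total += sum((x - y) ** 2 for x, y in zip(ibd[a], ibd[b]))
    return total


def best_order(ibd):
    """Exhaustive search for the marker order with the smallest cost.

    An order and its reverse have the same cost, so either may be returned.
    """
    markers = sorted(ibd)
    return min(permutations(markers), key=lambda order: order_cost(order, ibd))


# Hypothetical per-marker proportions of pairs sharing 0, 1, and 2 alleles IBD.
toy_ibd = {
    "M1": (0.20, 0.50, 0.30),
    "M2": (0.10, 0.45, 0.45),
    "M3": (0.18, 0.50, 0.32),
    "M4": (0.12, 0.46, 0.42),
}
reordered = best_order(toy_ibd)
```

In the toy data, markers M1 and M3 have similar IBD distributions, as do M2 and M4, so the lowest-cost order places each similar pair side by side. For more than about ten markers a heuristic search would be needed in place of full enumeration.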
The metabolic syndrome is characterized by the clustering of several traits, including obesity, hypertension, decreased levels of HDL cholesterol, and increased levels of glucose and triglycerides. Because these traits cluster, there are likely common genetic factors involved.
We used a multivariate structural equation model (SEM) approach to scan the genome for loci involved in the metabolic syndrome. We found moderate evidence for linkage on chromosomes 2, 3, 11, 13, and 15, and these loci appear to have different relative effects on the component traits of the metabolic syndrome.
Our results suggest that the metabolic syndrome components, diabetes, obesity, and hypertension, are under the pleiotropic control of several loci.
A genome-wide screen was conducted for type 2 diabetes progression genes using measures of elevated fasting glucose levels as quantitative traits from the offspring enrolled in the Framingham Heart Study. We analyzed young (20–34 years) and old (≥ 35 years) subjects separately, using single-point and multipoint sibpair analysis, because of the possible differential impact of progression on the groups of interest. We observed significant linkage with change in fasting glucose levels on 1q25-32 (p = 5.21 × 10⁻⁸), 3p26.3-21.31 (p = 1 × 10⁻¹¹), 8q23.1-24.13 (p = 2.94 × 10⁻⁶), 9p24.1-21.3 (p = 7 × 10⁻⁷), and 18p11.31-q22.1 (p < 10⁻¹¹). The evidence for linkage on chromosomes 8 and 18 was consistent for the subset of study participants aged 43 through 55 years.
Type 2 diabetes; progression; longitudinal; genome-wide search; fasting glucose; model-free linkage analysis
Genetic heterogeneity and complex biologic mechanisms of blood pressure regulation pose significant challenges to the identification of susceptibility loci influencing hypertension. Previous linkage studies have reported regions of interest, but the findings lack consistency across studies. Incorporation of covariates, in particular the interaction between two independent risk factors (gender and BMI), greatly improved our ability to detect linkage.
We report a highly significant signal for linkage to chromosome 2p, a region that has been implicated in previous linkage studies, along with several suggestive linkage regions.
We demonstrate the importance of including covariates in the linkage analysis when the phenotype is complex.