Existing linkage-analysis methods address binary or quantitative traits. However, many complex diseases and human conditions, particularly behavioral disorders, are rated on ordinal scales. Herein, we introduce LOT, a tool that performs linkage analysis of ordinal traits for pedigree data. It implements a latent-variable proportional-odds logistic model that relates inheritance patterns to the distribution of the ordinal trait. A likelihood-ratio test is used to assess the evidence for linkage.
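As an illustration of the proportional-odds form mentioned above, the following minimal sketch computes ordinal category probabilities from a set of cutpoints and a linear predictor. It is not LOT's implementation: the function names and the scalar `eta` are assumptions made for illustration, and in a linkage model the predictor would depend on latent inheritance patterns rather than being supplied directly.

```python
import math


def logistic(z):
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-z))


def category_probabilities(cutpoints, eta):
    """Return P(Y = k) for k = 0..K under a proportional-odds model.

    cutpoints: increasing thresholds alpha_1 < ... < alpha_K
    eta: linear predictor (hypothetical; shared by all cutpoints,
         which is the proportional-odds assumption)
    The model is P(Y <= k) = logistic(alpha_k - eta).
    """
    cumulative = [logistic(a - eta) for a in cutpoints] + [1.0]
    probs = []
    previous = 0.0
    for c in cumulative:
        probs.append(c - previous)
        previous = c
    return probs


if __name__ == "__main__":
    # Three ordered categories, two cutpoints.
    print(category_probabilities([-1.0, 1.0], 0.0))
    # A larger predictor shifts probability toward higher categories.
    print(category_probabilities([-1.0, 1.0], 2.0))
```

Because the same `eta` enters every cumulative probability, a change in the predictor shifts all cumulative odds by the same factor.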
Pedigree studies of complex heritable diseases often feature nominal or ordinal phenotypic measurements and missing genetic marker or phenotype data.
We have developed a Bayesian method for Linkage analysis of Ordinal and Categorical traits (LOCate) that can analyze complex genealogical structure for family groups and incorporate missing data. LOCate uses a Gibbs sampling approach to assess linkage, incorporating a simulated tempering algorithm for fast mixing. While our treatment is Bayesian, we develop a LOD (log of odds) score estimator for assessing linkage from Gibbs sampling that is highly accurate for simulated data. LOCate is applicable to linkage analysis for ordinal or nominal traits, a versatility which we demonstrate by analyzing simulated data with a nominal trait, on which LOCate outperforms LOT, an existing method which is designed for ordinal traits. We additionally demonstrate our method's versatility by analyzing a candidate locus (D2S1788) for panic disorder in humans, in a dataset with a large amount of missing data, which LOT was unable to handle.
LOCate's accuracy and applicability to both ordinal and nominal traits will prove useful to researchers interested in mapping loci for categorical traits.
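For readers unfamiliar with the LOD score referred to above, the sketch below computes the classical two-point LOD for phase-known, fully informative meioses. This is the textbook definition only, shown under those simplifying assumptions; it is not the Gibbs-sampling estimator developed for LOCate, and the function name is hypothetical.

```python
import math


def two_point_lod(recombinants, meioses, theta):
    """log10 of L(theta) / L(0.5) for phase-known, fully informative meioses.

    recombinants: number of meioses scored as recombinant
    meioses: total number of informative meioses
    theta: recombination fraction, restricted here to 0 < theta <= 0.5
    """
    if not 0.0 < theta <= 0.5:
        raise ValueError("theta must lie in (0, 0.5]")
    if not 0 <= recombinants <= meioses:
        raise ValueError("recombinants must lie between 0 and meioses")
    nonrecombinants = meioses - recombinants
    log_l_theta = (recombinants * math.log10(theta)
                   + nonrecombinants * math.log10(1.0 - theta))
    log_l_null = meioses * math.log10(0.5)
    return log_l_theta - log_l_null


if __name__ == "__main__":
    # Scan a small grid of recombination fractions.
    for theta in (0.05, 0.1, 0.2, 0.3, 0.5):
        print(theta, round(two_point_lod(2, 10, theta), 3))
```

At theta = 0.5 the score is zero by construction, since the numerator and denominator coincide.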
Summary: We present LOX (Level Of eXpression), which estimates the Level Of gene eXpression from high-throughput expressed-sequence datasets with multiple treatments or samples. Unlike most analyses, LOX incorporates a gene bias model that facilitates integration of transcriptomic sequencing data produced using diverse experimental methodologies. LOX integrates overall sequence count tallies normalized by total expressed sequence count to provide expression levels for each gene relative to all treatments, as well as Bayesian credible intervals.
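The normalization step described above can be sketched as follows. This is a deliberately simplified illustration that assumes one count per gene per treatment and one total per treatment; it omits the gene bias model and the Bayesian credible intervals, and the function name is hypothetical rather than part of LOX.

```python
def relative_expression(gene_counts, library_totals):
    """Expression of one gene in each treatment, relative to all treatments.

    gene_counts: sequence counts for the gene, one per treatment
    library_totals: total expressed sequence counts, one per treatment
    Returns proportions that sum to 1 (or all zeros if the gene is unobserved).
    """
    if len(gene_counts) != len(library_totals):
        raise ValueError("one library total is needed per treatment")
    rates = [count / total for count, total in zip(gene_counts, library_totals)]
    overall = sum(rates)
    if overall == 0:
        return [0.0] * len(rates)
    return [rate / overall for rate in rates]


if __name__ == "__main__":
    # Twice the raw count in a library twice as deep gives equal expression.
    print(relative_expression([10, 20], [1000, 2000]))
```

Dividing by the library total before comparing treatments keeps a deeper sequencing run from being mistaken for higher expression.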
Summary: AUGIST (accommodating uncertainty in genealogies while inferring species trees) is a new software package for inferring species trees while accommodating uncertainty in gene genealogies. It is written for the Mesquite software system and provides sampling procedures to incorporate uncertainty in gene tree reconstruction while providing confidence estimates for inferred species trees.
Many complex human diseases such as alcoholism and cancer are rated on ordinal scales. Well-developed statistical methods for the genetic mapping of quantitative traits may not be appropriate for ordinal traits. We propose a class of variance-component models for the joint linkage and association analysis of ordinal traits. The proposed models accommodate arbitrary pedigrees and allow covariates and gene-environment interactions. We develop efficient likelihood-based inference procedures under the proposed models. The maximum likelihood estimators are approximately unbiased, normally distributed, and statistically efficient. Extensive simulation studies demonstrate that the proposed methods perform well in practical situations. An application to data from the Collaborative Study on the Genetics of Alcoholism is provided.
complex diseases; family studies; IBD sharing; LOD score; maximum likelihood; probit model; SNPs
The trait-model-free (or ‘allele-sharing’) approach to linkage analysis is a popular tool in the genetic mapping of complex traits because it makes no explicit assumptions about the underlying mode of inheritance of the trait. The likelihood framework introduced by Kong and Cox (1997) allows calculation of accurate p-values and LOD scores to test for linkage between a genomic region and a trait. Their method relies on the specification of a model for the trait-dependent segregation of marker alleles at a genomic region linked to the trait. Here we propose a new such model that is motivated by the desire to extract as much information as possible from extended pedigrees containing data from individuals related over several generations. However, our model is also applicable to smaller pedigrees, and has some attractive features compared with existing models (Kong and Cox, 1997), including the fact that it incorporates information on both affected and unaffected individuals. We illustrate the proposed model on simulated and real data, and compare its performance with the existing approach (Kong and Cox, 1997). The proposed approach is implemented in the program lm_ibdtests within the framework of MORGAN 2.8 (http://www.stat.washington.edu/thompson/Genepi/MORGAN/Morgan.shtml).
Identity by descent; likelihood ratio test; linkage analysis; Trait-model-free
Existing joint models for longitudinal and survival data are not applicable to longitudinal ordinal outcomes with possible non-ignorable missing values caused by multiple reasons. We propose a joint model for longitudinal ordinal measurements and competing risks failure time data, in which a partial proportional odds model for the longitudinal ordinal outcome is linked to the event times by latent random variables. At the survival endpoint, our model adopts the competing risks framework to model multiple failure types at the same time. The partial proportional odds model, as an extension of the popular proportional odds model for ordinal outcomes, is more flexible and at the same time provides a tool to test the proportional odds assumption. We use a likelihood approach and derive an EM algorithm to obtain the maximum likelihood estimates of the parameters. We further show that all the parameters at the survival endpoint are identifiable from the data. Our joint model enables one to make inference for both the longitudinal ordinal outcome and the failure times simultaneously. In addition, the inference at the longitudinal endpoint is adjusted for possible non-ignorable missing data caused by the failure times. We apply the method to the NINDS rt-PA stroke trial. Our study considers the modified Rankin Scale only. Other ordinal outcomes in the trial, such as the Barthel and Glasgow scales, can be treated in the same way.
Linkage analysis identifies markers that appear to be co-inherited with a trait within pedigrees. The inheritance of a chromosomal segment may be probabilistically reconstructed, with missing data complicating inference. Inheritance patterns are further obscured in the analysis of complex traits, where variants in one or more genes may contribute to phenotypic variation within a pedigree. In this case, determining which relatives share a trait variant is not simple. We describe how to represent these patterns of inheritance for marker loci. We summarize how to sample patterns of inheritance consistent with genotypic and pedigree data using gl_auto, available in MORGAN v3.0. We describe identification of classes of equivalent inheritance patterns with the program IBDgraph. We finally provide an example of how these programs may be used to simplify interpretation of linkage analysis of complex traits in general pedigrees. We borrow information across loci in a parametric linkage analysis of a large pedigree. We explore the contribution of each equivalence class to a linkage signal, illustrate estimated patterns of identity-by-descent sharing, and identify a haplotype tagging the chromosomal segment driving the linkage signal. Haplotype carriers are more likely to share the linked trait variant, and can be prioritized for subsequent DNA sequencing.
Inheritance vector; Segregation; Genome scan; Haplotype; Equivalence class
We propose a two-step model-based approach, with correction for ascertainment, to linkage analysis of a binary trait with variable age of onset and apply it to a set of multiplex pedigrees segregating for adult glioma.
First, we fit segregation models by formulating the likelihood for a person to have a bivariate phenotype, affection status and age of onset, along with other covariates, and from these we estimate population trait allele frequencies and penetrance parameters as a function of age (N=281 multiplex glioma pedigrees). Second, the best fitting models are used as trait models in multipoint linkage analysis (N=74 informative multiplex glioma pedigrees). To correct for ascertainment, a prevalence constraint is used in the likelihood of the segregation models for all 281 pedigrees. Then the trait allele frequencies are re-estimated for the pedigree founders of the subset of 74 pedigrees chosen for linkage analysis.
Using the best fitting segregation models in model-based multipoint linkage analysis, we identified two separate peaks on chromosome 17; the first agreed with a region identified by Shete et al., who used model-free affected-only linkage analysis, but with a narrowed peak, and the second agreed with a second region they found but had a larger maximum log of the odds (LOD).
Our approach has the advantage of not requiring markers to be in linkage equilibrium unless the minor allele frequency is small (markers which tend to be uninformative for linkage), and of using more of the available information for LOD-based linkage analysis.
Glioma; model-based linkage; segregation; age of onset; prevalence constraint
In contrast to gene-mapping studies of simple Mendelian disorders, genetic analyses of complex traits are far more challenging, and high quality data management systems are often critical to the success of these projects. To minimize the difficulties inherent in complex trait studies, we have developed GeneLink, a Web-accessible, password-protected Sybase database.
GeneLink is a powerful tool for complex trait mapping, enabling genotypic data to be easily merged with pedigree and extensive phenotypic data. Specifically designed to facilitate large-scale (multi-center) genetic linkage or association studies, GeneLink securely and efficiently handles large amounts of data and provides additional features to facilitate data analysis by existing software packages and quality control. These include the ability to download chromosome-specific data files containing marker data in map order in various formats appropriate for downstream analyses (e.g., GAS and LINKAGE). Furthermore, an unlimited number of phenotypes (either qualitative or quantitative) can be stored and analyzed. Finally, GeneLink generates several quality assurance reports, including genotyping success rates of specified DNA samples or success and heterozygosity rates for specified markers.
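The two per-marker quality measures named above, genotyping success rate and heterozygosity rate, can be computed as in the sketch below. The data layout (a list of allele pairs, with `None` for a failed genotype) and the function name are assumptions made for illustration; they do not describe GeneLink's schema or reports.

```python
def marker_qc(genotypes):
    """Return (success_rate, heterozygosity) for one marker.

    genotypes: list with one entry per DNA sample, either a tuple
               (allele1, allele2) or None when genotyping failed
               (hypothetical layout chosen for this sketch).
    """
    typed = [g for g in genotypes if g is not None]
    success_rate = len(typed) / len(genotypes) if genotypes else 0.0
    heterozygous = sum(1 for a, b in typed if a != b)
    heterozygosity = heterozygous / len(typed) if typed else 0.0
    return success_rate, heterozygosity


if __name__ == "__main__":
    calls = [(1, 2), (1, 1), None, (2, 3)]
    print(marker_qc(calls))
```

Heterozygosity is computed over successfully typed samples only, so a marker with many failed calls is flagged by its success rate rather than by a distorted heterozygosity.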
GeneLink has already proven an invaluable tool for complex trait mapping studies and is discussed primarily in the context of our large, multi-center study of hereditary prostate cancer (HPC). GeneLink is freely available at .
Linkage analysis is a useful tool for detecting genetic variants that regulate a trait of interest, especially genes associated with a given disease. Although penetrance parameters play an important role in determining gene location, they are assigned arbitrary values according to the researcher’s intuition or as estimated by the maximum likelihood principle. Several methods exist by which to evaluate the maximum likelihood estimates of penetrance, although not all of these are supported by software packages and some are biased by marker genotype information, even when disease development is due solely to the genotype of a single allele.
Programs for exploring the maximum likelihood estimates of penetrance parameters were developed using the R statistical programming language supplemented by external C functions. The software returns a vector of polynomial coefficients of penetrance parameters, representing the likelihood of pedigree data. From the likelihood polynomial supplied by the proposed method, the likelihood value and its gradient can be precisely computed. To reduce the effect of the supplied dataset on the likelihood function, feasible parameter constraints can be introduced into maximum likelihood estimates, thus enabling flexible exploration of the penetrance estimates. An auxiliary program generates a perspective plot allowing visual validation of the model’s convergence. The functions are collectively available as the MLEP R package.
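To make the idea of a likelihood polynomial concrete, consider a toy case that is far simpler than a pedigree likelihood: a single penetrance parameter f and a set of independent carriers of known genotype, of whom some are affected and some are not. The likelihood f^a (1 - f)^u is then a polynomial in f whose value and gradient can be computed exactly from its coefficients. The sketch below is an assumption-laden illustration in Python, not the MLEP package's R interface, and all function names are hypothetical.

```python
from math import comb


def likelihood_coefficients(affected, unaffected):
    """Coefficients c[k] of f**k in f**affected * (1 - f)**unaffected."""
    coeffs = [0] * (affected + unaffected + 1)
    for k in range(unaffected + 1):
        # Binomial expansion of (1 - f)**unaffected, shifted by f**affected.
        coeffs[affected + k] = comb(unaffected, k) * (-1) ** k
    return coeffs


def evaluate(coeffs, f):
    """Evaluate the polynomial at f using Horner's rule."""
    value = 0.0
    for c in reversed(coeffs):
        value = value * f + c
    return value


def derivative(coeffs):
    """Coefficients of the derivative polynomial."""
    return [k * c for k, c in enumerate(coeffs)][1:]


if __name__ == "__main__":
    coeffs = likelihood_coefficients(3, 7)
    f_hat = 3 / (3 + 7)  # closed-form maximizer in this toy case
    print(evaluate(coeffs, f_hat), evaluate(derivative(coeffs), f_hat))
```

In this toy case the gradient vanishes at the sample proportion of affected carriers, which gives a simple check on the coefficient vector.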
Linkage analysis using penetrance parameters estimated by the MLEP package enables feasible localization of a disease locus. This is shown through a simulation study and by demonstrating how the package is used to explore maximum likelihood estimates. Although the input dataset tends to bias the likelihood estimates, the method yields accurate results superior to the analysis using intuitive penetrance values for disease with low allele frequencies. MLEP is part of the Comprehensive R Archive Network and is freely available at
Penetrance; Maximum likelihood estimate; Linkage analysis; Polynomial evaluation
After genetic linkage has been identified for a complex disease, the next step is often fine-mapping by association analysis, using single-nucleotide polymorphisms (SNPs) within a linkage region. If a SNP shows evidence of association, it is useful to know whether the linkage result can be explained in part or in full by the candidate SNP. The genotype identity-by-descent sharing test (GIST) and linkage and association modeling in pedigrees (LAMP) are two methods that were specifically proposed to address this question. GIST determines whether there is significant correlation between family-specific weights, defined by the presence of a tentatively associated allele in affected siblings, and family-specific nonparametric linkage scores. LAMP constructs a pedigree likelihood function of the marker data conditional on the trait data, and implements three likelihood ratio tests to characterize the relationship between the candidate SNP and the disease locus. The goal of our study was to compare the two approaches and evaluate their ability to identify disease-associated SNPs in the Genetic Analysis Workshop 15 (GAW15) simulated data. Our results can be summarized as follows: 1) GIST is simple and fast but, as a test of association, did not perform well in the GAW15 data, especially with adjustment for multiple testing; 2) as a test of association, the LAMP-LE test performs best when the linkage evidence is strong, or when there is at least moderate linkage disequilibrium between the candidate SNP and the trait locus. We conclude that LAMP is more flexible and reliable to use in practice.
Motivation: The sequencing of tumors and their matched normals is frequently used to study the genetic composition of cancer. Despite this fact, there remains a dearth of available software tools designed to compare sequences in pairs of samples and identify sites that are likely to be unique to one sample.
Results: In this article, we describe the mathematical basis of our SomaticSniper software for comparing tumor and normal pairs. We estimate its sensitivity and precision, and present several common sources of error resulting in miscalls.
Availability and implementation: Binaries are freely available for download at http://gmt.genome.wustl.edu/somatic-sniper/current/, implemented in C and supported on Linux and Mac OS X.
Supplementary information: Supplementary data are available at Bioinformatics online.
Summary: PedMerge allows users to accurately and efficiently merge separately ascertained pedigrees that belong to the same extended family. In addition to validation checks of pedigree structure, the software provides files in LINKAGE or PEDSYS format that can readily be used by a variety of statistical genetics software packages, including LINKAGE, SOLAR and SLINK, or further manipulated with Mega2.
Supplementary information: Supplementary data are available at Bioinformatics online.
This paper describes the software package KELVIN, which supports the PPL (posterior probability of linkage) framework for the measurement of statistical evidence in human (or more generally, diploid) genetic studies. In terms of scope, KELVIN supports two-point (trait-marker or marker-marker) and multipoint linkage analysis, based on either sex-averaged or sex-specific genetic maps, with an option to allow for imprinting; trait-marker linkage disequilibrium (LD), or association analysis, in case-control data, trio data, and/or multiplex family data, with options for joint linkage and trait-marker LD or conditional LD given linkage; dichotomous trait, quantitative trait and quantitative trait threshold models; and certain types of gene-gene interactions and covariate effects. Features and data (pedigree) structures can be freely mixed and matched within analyses. The statistical framework is specifically tailored to accumulate evidence in a mathematically rigorous way across multiple data sets or data subsets while allowing for multiple sources of heterogeneity, and KELVIN itself utilizes sophisticated software engineering to provide a powerful and robust platform for studying the genetics of complex disorders.
Association; Covariates; Epistasis; Imprinting; Linkage; Linkage disequilibrium; Quantitative traits; Software; KELVIN; Statistical evidence
We have developed a simulation tool within the NEURON simulator to assist in organization, verification and analysis of simulations. This tool, denominated Neural Query System (NQS), provides a relational database system, a query function based on the SELECT function of Structured Query Language (SQL), and data-mining tools. We show how NQS can be used to organize, manage, verify and visualize parameters for both single cell and network simulations. We demonstrate an additional use of NQS to organize simulation output and relate outputs to parameters in a network model. The NQS software package is available at http://senselab.med.yale.edu/senselab/SimToolDB.
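The kind of SELECT-style query described above can be illustrated with a small sketch that filters a table of simulation parameters and outputs. This is plain Python written for illustration only; NQS itself runs inside NEURON, and the column names, values, and `select` signature used here are hypothetical rather than taken from the NQS interface.

```python
def select(rows, **conditions):
    """Return the rows in which every named column satisfies its predicate.

    rows: list of dicts, one per table row
    conditions: column name -> function returning True for accepted values
    """
    return [
        row for row in rows
        if all(predicate(row[column]) for column, predicate in conditions.items())
    ]


if __name__ == "__main__":
    # Hypothetical table relating one parameter to one output per cell.
    table = [
        {"cell": 0, "gnabar": 0.10, "rate_hz": 8.0},
        {"cell": 1, "gnabar": 0.12, "rate_hz": 14.5},
        {"cell": 2, "gnabar": 0.15, "rate_hz": 22.0},
    ]
    fast = select(table, gnabar=lambda g: g > 0.11, rate_hz=lambda r: r > 10)
    print([row["cell"] for row in fast])
```

Keeping parameters and outputs in the same table is what allows a single query to relate the two.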
For many years gene mapping studies have been performed through linkage analyses based on pedigree data. Recently, linkage disequilibrium methods based on unrelated individuals have been advocated as powerful tools to refine estimates of gene location. Many strategies have been proposed to deal with simply inherited disease traits. However, locating quantitative trait loci is statistically more challenging and considerable research is needed to provide robust and computationally efficient methods.
Under a three-locus Wright-Fisher model, we derived approximate expressions for the expected haplotype frequencies in a population. We considered haplotypes comprising one trait locus and two flanking markers. Using these theoretical expressions, we built a likelihood-maximization method, called HAPim, for estimating the location of a quantitative trait locus. For each postulated position, the method only requires information from the two flanking markers. Over a wide range of simulation scenarios it was found to be more accurate than a two-marker composite likelihood method. It also performed as well as identity by descent methods, whilst being valuable in a wider range of populations.
Our method makes efficient use of marker information, and can be valuable for fine mapping purposes. Its performance is increased if multiallelic markers are available. Several improvements can be developed to account for more complex evolution scenarios or provide robust confidence intervals for the location estimates.
A family-based association study design is not only able to localize causative genes more precisely than linkage analysis, but it also helps explain the genetic mechanism underlying the trait under study. Therefore, it can be used to follow up an initial linkage scan. For an association study of binary traits in general pedigrees, we propose a logistic mixture model that regresses the trait value on the genotypic values of markers under investigation and other covariates such as environmental factors. We first tested both the validity and power of the new model by simulating nuclear families inheriting a simple Mendelian trait. It is powerful when the correct disease model is specified and shows much loss of power when the dominance of a model is inversely specified, i.e., a dominant model is wrongly specified as recessive or vice versa. We then applied the new model to the Genetic Analysis Workshop (GAW) 15 simulation data to test the performance of the model when adjusting for covariates in the case of complex traits. Adjusting for the covariate that interacts with disease loci improves the power to detect association. The simplest version of the model only takes monogenic inheritance into account, but analysis of the GAW simulation data shows that even this simple model can be powerful for complex traits.
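The dominance misspecification discussed above comes down to how a genotype is turned into a regressor. The sketch below shows the usual dominant, recessive, and additive codings of a biallelic genotype; it illustrates only the coding step, not the proposed logistic mixture model, and the function name is hypothetical.

```python
def code_genotype(risk_alleles, model):
    """Code a genotype (0, 1 or 2 copies of the risk allele) as a regressor.

    model: "dominant", "recessive" or "additive"
    """
    if risk_alleles not in (0, 1, 2):
        raise ValueError("risk_alleles must be 0, 1 or 2")
    if model == "dominant":
        return 1 if risk_alleles >= 1 else 0
    if model == "recessive":
        return 1 if risk_alleles == 2 else 0
    if model == "additive":
        return risk_alleles
    raise ValueError("unknown model: " + str(model))


if __name__ == "__main__":
    for copies in (0, 1, 2):
        print(copies,
              code_genotype(copies, "dominant"),
              code_genotype(copies, "recessive"),
              code_genotype(copies, "additive"))
```

Only heterozygotes are coded differently under the dominant and recessive models, so specifying one as the other misclassifies exactly that group.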
WOMBAT is a software package for quantitative genetic analyses of continuous traits, fitting a linear, mixed model; estimates of covariance components and the resulting genetic parameters are obtained by restricted maximum likelihood. A wide range of models, comprising numerous traits, multiple fixed and random effects, selected genetic covariance structures, random regression models and reduced rank estimation are accommodated. WOMBAT employs up-to-date numerical and computational methods. Together with the use of efficient compilers, this generates fast executable programs, suitable for large scale analyses. Use of WOMBAT is illustrated for a bivariate analysis. The package consists of the executable program, available for LINUX and WINDOWS environments, a manual and a set of worked examples, and can be downloaded free of charge from http://agbu.une.edu.au/~kmeyer/wombat.html
Software; Variance components; Genetic parameters; Mixed model; Restricted maximum likelihood
Continuously varying traits such as body size or gene expression level evolve during the history of species or gene lineages. To test hypotheses about the evolution of such traits, the maximum likelihood (ML) method is often used. Here we introduce CoMET (Continuous-character Model Evaluation and Testing), which is a module for Mesquite that automates likelihood computations for nine different models of trait evolution. Due to its few restrictions on input data, CoMET is applicable to testing a wide range of character evolution hypotheses. The CoMET homepage, which links to freely available software and more detailed usage instructions, is located at http://www.lifesci.ucsb.edu/eemb/labs/oakley/software/comet.htm.
Maximum likelihood; Brownian motion; continuous traits; phylogeny
Rare variation is the current frontier in human genetics. The large pedigree design is practical, efficient, and well-suited for investigating rare variation. In large pedigrees, specific rare variants that co-segregate with a trait will occur in sufficient numbers so that effects can be measured, and evidence for association can be evaluated, by making use of methods that fully use the pedigree information. Evidence from linkage analysis can focus investigation, both reducing the multiple testing burden and expanding the variants that can be evaluated and followed up, as recent studies have shown. The large pedigree design requires only a small fraction of the sample size needed to identify rare variants of interest in population-based designs, and many highly suitable, well-understood, and available statistical and computational tools already exist. Samples consisting of large pedigrees with existing rich phenotype and genome scan data should be prime candidates for high-throughput sequencing in the search for the determinants of complex traits.
Linkage analysis based on identity-by-descent allele-sharing can be used to identify a chromosomal region harboring a quantitative trait locus (QTL), but lacks the resolution required for gene identification. Consequently, linkage disequilibrium (association) analysis is often employed for fine-mapping. Variance-components-based combined linkage and association analysis for quantitative traits in sib pairs, in which association is modeled as a mean effect and linkage is modeled in the covariance structure, has been extended to general pedigrees (quantitative transmission disequilibrium test, QTDT). The QTDT approach accommodates data not only from parents and siblings, but also from all available relatives. QTDT is also robust to population stratification. However, when population stratification is absent, it is possible to utilize even more information, namely the additional information contained in the founder genotypes. In this paper, we introduce a simple modification of the allelic transmission scoring method used in the QTDT that results in a more powerful test of linkage disequilibrium, but is only applicable in the absence of population stratification. This test, the quantitative trait linkage disequilibrium (QTLD) test, has been incorporated into a new procedure in the statistical genetics computer package SOLAR. We apply this procedure in a linkage/association analysis of an electrophysiological measurement previously shown to be related to alcoholism. We also demonstrate by simulation the increase in power obtained with the QTLD test, relative to the QTDT, when a true association exists between a marker and a QTL.
In this paper we apply two novel quantitative trait linkage statistics based on the posterior probability of linkage (PPL) to chromosome 4 from the GAW 14 COGA dataset. Our approaches are advantageous since they use the full likelihood, use full phenotypic information, do not assume normality at the population level or require population/sample parameter estimates; and like other forms of the PPL, they are specifically tailored to accumulate linkage evidence, either for or against linkage, across multiple sets of heterogeneous data.
The first statistic uses all quantitative trait (QT) information from the pedigree (QT-posterior probability of linkage, PPL); we applied the QT-PPL to the trait ecb21 (resting electroencephalogram). The second statistic allows simultaneous incorporation of dichotomous trait data into the QT analysis via a threshold model (QTT-PPL); we applied the QTT-PPL to combined data on ecb21 and ALDX1. We obtained a QT-PPL of 96% at GABRB1 and a QT-PPL of 18% at FABP2 while the QTT-PPL was 4% and 2% at the same two loci, respectively. By comparison, the variance-components (VC) method, as implemented in SOLAR, yielded multipoint VC LOD scores of 2.05 and 2.21 at GABRB1 and FABP2, respectively; no other VC LODs were greater than 2.
The QTT-PPL was only 4% at GABRB1, which might suggest that the underlying ecb21 gene does not also cause ALDX1, although features of the data complicate interpretation of this result.
Pedimap is a user-friendly software tool for visualizing phenotypic and genotypic data for related individuals linked in pedigrees. Genetic data can include marker scores, Identity-by-Descent probabilities, and marker linkage map positions, allowing the visualization of haplotypes through lineages. The pedigrees can accommodate all types of inheritance, including selfing, cloning, and repeated backcrossing, and all ploidy levels are supported. Visual association of the genetic data with phenotypic data simplifies the exploration of large data sets, thereby improving breeding decision making. Data are imported from text files; in addition, data exchange with other software packages (FlexQTL™ and GenomeStudio™) is possible. Instructions for use and an executable version compatible with the Windows platform are available for free from http://www.plantbreeding.wur.nl/UK/software_pedimap.html.
genetic data; pedigree software; phenotypic data; plant breeding
Often, multiple measures of a trait are available in a genetic linkage analysis. We compare Monte Carlo Markov chain analysis of two very different measures of hypertension in the simulated Genetic Analysis Workshop 13 data to examine how choice of measure affects the results. The measures selected were age-of-onset of hypertension and systolic blood pressure at first visit.
In combined segregation and linkage analysis of the complete pedigrees using the first replicate of the simulated data with missing values, we found that the age-of-onset analysis was better at identifying "slope" genes, while the systolic blood pressure analysis was better at identifying "baseline" genes.
Analysis of different trait measures may identify different trait-related genes. When linkage analysis is conducted on multiple trait measures, a linkage signal found for only one measure can represent a true trait locus.