When testing for genetic effects, failure to account for a gene-environment interaction can mask the true association of a genetic marker with disease. Family-based association tests are popular because they are completely robust to population substructure and model misspecification. However, when testing for an interaction, failure to model the main genetic effect correctly can lead to spurious results. Here we propose a family-based test for interaction that is robust to model misspecification, but still sensitive to an interaction effect, and can handle continuous covariates and missing parents. We extend the FBAT-I gene-environment interaction test for dichotomous traits to use both trios and sibships. We then compare this extension to joint tests of gene and gene-environment interaction, and compare the joint test additionally to the main effects test of the gene. Lastly we apply these three tests to a group of nuclear families ascertained according to affection with Bipolar Disorder.
genetic association; genetic interaction; family-based test; FBAT-I
For genome-wide association studies in family-based designs, a new, universally applicable approach is proposed. Using a modified Liptak’s method, we combine the p-value of the family-based association test (FBAT) statistic with the p-value for the Van Steen-statistic. The Van Steen-statistic is independent of the FBAT-statistic and utilizes information that is ignored by traditional FBAT-approaches. The new test statistic takes advantage of all available information about the genetic association, while, by virtue of its design, it achieves complete robustness against confounding due to population stratification. The approach is suitable for the analysis of almost any trait type for which FBATs are available, e.g. binary, continuous, time-to-onset, multivariate, etc. The efficiency and the validity of the new approach depend on the specification of a nuisance/tuning parameter and the weight parameters in the modified Liptak’s method. For different trait types and ascertainment conditions, we discuss general guidelines for the optimal specification of the tuning parameter and the weight parameters. Our simulation experiments and an application to an Alzheimer study show the validity and the efficiency of the new method, which achieves power levels that are comparable to those of population-based approaches.
FBAT; Liptak’s method; Tuning parameter
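As an illustration of how two independent p-values can be pooled, the sketch below implements the standard weighted inverse-normal (Liptak) combination in Python. It is not the modified version described above, whose details are not given here; the function name `liptak_combine`, the one-sided p-value convention, and the example weights are assumptions made for the example.

```python
from math import sqrt
from statistics import NormalDist


def liptak_combine(p_values, weights):
    """Combine independent one-sided p-values with the weighted
    inverse-normal (Liptak) method and return the combined p-value.

    Assumption: every p-value lies strictly between 0 and 1.
    """
    if len(p_values) != len(weights):
        raise ValueError("p_values and weights must have the same length")
    std_normal = NormalDist()
    # Map each p-value to a z-score: a small p-value gives a large positive z.
    z_scores = [std_normal.inv_cdf(1.0 - p) for p in p_values]
    # Weighted sum of z-scores, rescaled to unit variance under the null.
    numerator = sum(w * z for w, z in zip(weights, z_scores))
    denominator = sqrt(sum(w * w for w in weights))
    combined_z = numerator / denominator
    return 1.0 - std_normal.cdf(combined_z)


# Hypothetical example: an FBAT p-value and a screening p-value, with
# illustrative weights fixed in advance.
combined = liptak_combine([0.05, 0.05], [1.0, 1.0])
```

For independent statistics and weights fixed in advance, the rescaled sum is standard normal under the null hypothesis, so the choice of weights affects power but not validity.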
Several family-based approaches for testing genetic association with traits obtained from longitudinal or repeated measurement studies have been previously proposed. These approaches utilize the multivariate data more efficiently by using estimated optimal weights to combine univariate tests. We show that these FBAT approaches are still robust against hidden population stratification, but their power can be heavily affected since the estimated weights might provide a poor approximation of the true theoretical optimal weights in the presence of population stratification. We introduce a permutation-based approach, FBAT-MinP, and an equal combination approach, FBAT-EW, neither of which involves the use of estimated weights. Through simulation studies, FBAT-MinP and FBAT-EW are shown to be powerful even in the presence of population stratification, when other approaches may substantially lose their power. An application of these approaches to the Childhood Asthma Management Program (CAMP) study data for testing an association between body mass index and a previously reported candidate SNP is given as an example.
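The two weight-free combinations can be sketched as follows, under simplifying assumptions: each univariate test is summarised by a z-score with the same null distribution (so that the smallest p-value corresponds to the largest absolute z-score), the null replicates are supplied by the caller, and the names `equal_weight_stat` and `min_p_value` are hypothetical. How FBAT-MinP actually generates its permutations is not specified here.

```python
from math import sqrt


def equal_weight_stat(z_scores, corr):
    """Equal-weight combination: the sum of the univariate z-scores divided
    by its null standard deviation, where corr[i][j] is the null correlation
    between statistics i and j (assumed known or estimated elsewhere)."""
    k = len(z_scores)
    variance = sum(corr[i][j] for i in range(k) for j in range(k))
    return sum(z_scores) / sqrt(variance)


def min_p_value(observed, null_draws):
    """Permutation p-value for the most extreme univariate statistic.

    observed   -- list of observed z-scores, one per measurement
    null_draws -- list of lists; each inner list holds the same statistics
                  recomputed on one permuted (null) data set
    """
    observed_max = max(abs(z) for z in observed)
    exceed = sum(
        1 for draw in null_draws if max(abs(z) for z in draw) >= observed_max
    )
    # Add one to numerator and denominator so the p-value is never zero.
    return (exceed + 1) / (len(null_draws) + 1)
```

Neither function estimates weights from the data, which is the property the paragraph above relies on.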
The availability of a large number of dense SNPs, high-throughput genotyping and computation methods promotes the application of family-based association tests. While most of the current family-based analyses focus only on individual traits, joint analyses of correlated traits can extract more information and potentially improve the statistical power. However, current TDT-based methods are low-powered. Here, we develop a method for tests of association for bivariate quantitative traits in families. In particular, we correct for population stratification by the use of an integration of principal component analysis and TDT. A score test statistic in the variance-components model is proposed. Extensive simulation studies indicate that the proposed method not only outperforms approaches limited to individual traits when pleiotropic effect is present, but also surpasses the power of two popular bivariate association tests termed FBAT-GEE and FBAT-PC, respectively, while correcting for population stratification. When applied to the GAW16 datasets, the proposed method successfully identifies at the genome-wide level the two SNPs that present pleiotropic effects to HDL and TG traits.
For genome-wide association studies in family-based designs, we propose a powerful two-stage testing strategy that can be applied in situations in which parent-offspring trio data are available and all offspring are affected with the trait or disease under study. In the first step of the testing strategy, we construct estimators of genetic effect size in the completely ascertained sample of affected offspring and their parents that are statistically independent of the family-based association/transmission disequilibrium tests (FBATs/TDTs) that are calculated in the second step of the testing strategy. For each marker, the genetic effect is estimated (without requiring an estimate of the SNP allele frequency) and the conditional power of the corresponding FBAT/TDT is computed. Based on the power estimates, a weighted Bonferroni procedure assigns an individually adjusted significance level to each SNP. In the second stage, the SNPs are tested with the FBAT/TDT statistic at the individually adjusted significance levels. Using simulation studies for scenarios with up to 1,000,000 SNPs, varying allele frequencies and genetic effect sizes, the power of the strategy is compared with standard methodology (e.g., FBATs/TDTs with Bonferroni correction). In all considered situations, the proposed testing strategy demonstrates substantial power increases over the standard approach, even when the true genetic model is unknown and must be selected based on the conditional power estimates. The practical relevance of our methodology is illustrated by an application to a genome-wide association study for childhood asthma, in which we detect two markers meeting genome-wide significance that would not have been detected using standard methodology.
The current state of genotyping technology has enabled researchers to conduct genome-wide association studies of up to 1,000,000 SNPs, allowing for systematic scanning of the genome for variants that might influence the development and progression of complex diseases. One of the largest obstacles to the successful detection of such variants is the multiple comparisons/testing problem in the genetic association analysis. For family-based designs in which all offspring are affected with the disease/trait under study, we developed a methodology that addresses this problem by partitioning the family-based data into two statistically independent components. The first component is used to screen the data and determine the most promising SNPs. The second component is used to test the SNPs for association, where information from the screening is used to weight the SNPs during testing. This methodology is more powerful than standard procedures for multiple comparisons adjustment (i.e., Bonferroni correction). Additionally, as only one data set is required for screening and testing, our testing strategy is less susceptible to study heterogeneity. Finally, as many family-based studies collect data only from affected offspring, this method addresses a major limitation of previous methodologies for multiple comparisons in family-based designs, which require variation in the disease/trait among offspring.
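The weighting step can be illustrated with a generic weighted Bonferroni allocation. This is a minimal sketch, not the allocation rule of the method above: it assumes each SNP's significance level is simply proportional to its conditional power estimate, and the names `weighted_bonferroni_levels` and `reject` are hypothetical.

```python
def weighted_bonferroni_levels(power_estimates, alpha=0.05):
    """Split the overall level alpha across SNPs in proportion to their
    screening-stage power estimates (assumed allocation rule)."""
    total = sum(power_estimates)
    if total <= 0:
        raise ValueError("power estimates must have a positive sum")
    # The individual levels add up to alpha, so the family-wise error rate
    # is controlled when the weights are independent of the test statistics.
    return [alpha * w / total for w in power_estimates]


def reject(p_values, levels):
    """Test each SNP at its individually adjusted significance level."""
    return [p <= level for p, level in zip(p_values, levels)]


# Hypothetical example with three SNPs; the first looked most promising
# in the screening step and therefore receives most of alpha.
levels = weighted_bonferroni_levels([8, 1, 1], alpha=0.05)
decisions = reject([0.03, 0.03, 0.001], levels)
```

A SNP with a p-value of 0.03 is rejected here only when the screening step gave it a large share of alpha, which is how the screening information carries over into the testing step.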
The case/pseudocontrol method provides a convenient framework for family-based association analysis of case-parent trios, incorporating several previously proposed methods such as the transmission/disequilibrium test and log-linear modelling of parent-of-origin effects. The method allows genotype and haplotype analysis at an arbitrary number of linked and unlinked multiallelic loci, as well as modelling of more complex effects such as epistasis, parent-of-origin effects, maternal genotype and mother-child interaction effects, and gene-environment interactions. Here we extend the method for analysis of quantitative as opposed to dichotomous (e.g. disease) traits. The resulting method can be thought of as a retrospective approach, modelling genotype given trait value, in contrast to prospective approaches that model trait given genotype. Through simulations and analytical derivations, we examine the power and properties of our proposed approach, and compare it to several previously proposed single-locus methods for quantitative trait association analysis. We investigate the performance of the different methods when extended to allow analysis of haplotype, maternal genotype and parent-of-origin effects. With randomly ascertained families, with or without population stratification, the prospective approach (modeling trait value given genotype) is found to be generally most effective, although the retrospective approach has some advantages with regard to estimation and interpretability of parameter estimates when applied to selected samples. Genet. Epidemiol. 31:833, 2007. © 2007 Wiley-Liss, Inc.
family-based; regression; imprinting; TDT
We propose a new approach for the analysis of copy number variants (CNVs) for genome-wide association studies in family-based designs. Our new overall association test combines the between-family component and the within-family component of the data so that the new test statistic is fully efficient and, at the same time, achieves the same complete robustness against population admixture and stratification as classical family-based association tests that are based only on the within-family component. Although all data are incorporated into the test statistic, an adjustment for genetic confounding is not needed, not even for the between-family component. The new test statistic is valid for testing either quantitative or dichotomous phenotypes. If external CNV data are available, the approach can also be used in completely ascertained samples. Similar to the approach by Ionita-Laza et al. (1), the proposed test statistic does not require a CNV-calling algorithm and is based directly on the CNV probe intensity data. We show, via simulation studies, that our methodology increases the power of the FBAT statistic to levels comparable to those of population-based designs. The advantages of the approach in practice are demonstrated by an application to a genome-wide association study for body mass index (BMI).
Several family-based approaches have been previously proposed to enhance the power for testing genetic association when the traits are measured longitudinally or repeatedly. In this paper, we show that some of these FBAT approaches can be easily extended to accommodate incomplete data and remain unbiased tests. We also show that because of the nature of FBAT approaches, we can impute the missing phenotypes without biasing our tests and achieve higher power. We propose two imputation techniques based on the EM algorithm and the conditional mean model, respectively. Through simulation studies, these two imputation techniques are shown to have the correct false-positive rate and generally achieve higher power than complete case analysis or simple mean-imputation. Application of these approaches for testing an association between Body Mass Index and a previously reported candidate SNP confirms our results.
FBAT; Longitudinal Phenotype; Missing Data
This simulation-based report compares the performance of five methods of association analysis in the presence of linkage using extended sibships: the Family-Based Association Test (FBAT), Empirical Variance FBAT (EV-FBAT), Conditional Logistic Regression (CLR), Robust CLR (R-CLR) and Sibship Disequilibrium Test (SDT). The two tests accounting for residual familial correlation (EV-FBAT and R-CLR) and the model-free SDT showed correct test size in all simulated designs, while FBAT and CLR were only valid for small effect sizes. SDT had the lowest power, while CLR had the highest power, generally similar to FBAT and the robust variance analogues. The power of all model-dependent tests dropped when the model was misspecified, although often not substantially. Estimates of genetic effect with CLR and R-CLR were unbiased when the disease locus was analysed but biased when a nearby marker was analysed. This study demonstrates that the genetic effect does not need to be extreme to invalidate tests that ignore familial correlation and confirms that analogous methods using robust variance estimation provide a valid alternative at little cost to power. Overall R-CLR is the best-performing method among these alternatives for the analysis of extended sibship data.
Extended sibships; conditional logistic regression; robust variance; simulation
Traditional transmission disequilibrium test (TDT) based methods for genetic association analyses are robust to population stratification at the cost of a substantial loss of power. We here describe a novel method for family-based association studies that corrects for population stratification with the use of an extension of principal component analysis (PCA). Specifically, we apply PCA to the unrelated parents in each family. We then infer principal components for children from those for their parents through a TDT-like strategy. Two test statistics within a variance-components model are proposed for association tests. Simulation results show that the proposed tests have correct type I error rates regardless of population stratification, and have greatly improved power over two popular TDT-based methods: QTDT and FBAT. The application to the Genetic Analysis Workshop 16 (GAW16) data sets attests to the feasibility of the proposed method.
Family Based Association Tests (FBATs); Transmission Disequilibrium Test (TDT); Principal Component Analysis (PCA); Variance-Components
Family data are used extensively in quantitative genetic studies to disentangle the genetic and environmental contributions to various diseases. Many family studies based their analysis on population-based registers containing a large number of individuals composed of small family units. For binary trait analyses, exact marginal likelihood is a common approach, but, due to the computational demand of the enormous data sets, it allows only a limited number of effects in the model. This makes it particularly difficult to perform joint estimation of variance components for a binary trait and the potential confounders. We have developed a data-reduction method of ascertaining informative families from population-based family registers. We propose a scheme where the ascertained families match the full cohort with respect to some relevant statistics, such as the risk to relatives of an affected individual. The ascertainment-adjusted analysis, which we implement using a pseudo-likelihood approach, is shown to be efficient relative to the analysis of the whole cohort and robust to mis-specification of the random effect distribution.
Segregation analysis; Mixed models; Variance components; Probit models
The trait-model-free (or ‘allele-sharing’) approach to linkage analysis is a popular tool in genetic mapping of complex traits, due to the absence of explicit assumptions about the underlying mode of inheritance of the trait. The likelihood framework introduced by Kong and Cox (1997) allows calculation of accurate p-values and LOD scores to test for linkage between a genomic region and a trait. Their method relies on the specification of a model for the trait-dependent segregation of marker alleles at a genomic region linked to the trait. Here we propose a new such model that is motivated by the desire to extract as much information as possible from extended pedigrees containing data from individuals related over several generations. However, our model is also applicable to smaller pedigrees, and has some attractive features compared with existing models (Kong and Cox, 1997), including the fact that it incorporates information on both affected and unaffected individuals. We illustrate the proposed model on simulated and real data, and compare its performance with the existing approach (Kong and Cox, 1997). The proposed approach is implemented in the program lm_ibdtests within the framework of MORGAN 2.8 (http://www.stat.washington.edu/thompson/Genepi/MORGAN/Morgan.shtml).
Identity by descent; likelihood ratio test; linkage analysis; Trait-model-free
Most methods for testing association in the presence of linkage, using family-based studies, have been developed for continuous traits. FBAT (family-based association tests) is one of few methods appropriate for discrete outcomes. In this article we describe a new test of association in the presence of linkage for binary traits. We use a gamma random effects model in which association and linkage are modelled as fixed effects and random effects, respectively. We have compared the gamma random effects model to an FBAT and a generalized estimating equation-based alternative, using two regions in the Genetic Analysis Workshop 14 simulated data. One of these regions contained haplotypes associated with disease, and the other did not.
Genome-wide association studies raise study-design and analytical issues that are still being debated. Among them is the issue of reducing the number of markers to be genotyped without loss of efficiency in identifying trait loci, which can reduce the cost of studies and minimize the multiple testing problem. With this aim, we proposed a two-step strategy based on two analytical methods suited to examine sets of markers rather than single markers: the local score, which screens the genome to select candidate regions in Step 1, and FBAT-LC, a multiple-marker family-based association test used to obtain significance levels of regions in Step 2. The performance of this strategy was evaluated on all replicates of Genetic Analysis Workshop 15 Problem 3 simulated data, using the answers to that problem. Overall, seven of the nine generated trait loci were detected in at least 87% of the replicates using a framework designed to handle either association with the disease or association with the severity of disease. This multiple-marker strategy was compared to the single-marker approach. By considering regions instead of single markers, this strategy minimizes the multiple testing problem and the number of false-positive results.
Susceptibility to type 2 diabetes may be conferred by genetic variants having modest effects on risk. Genome-wide fixed marker arrays offer a novel approach to detect these variants.
We used the Affymetrix 100K SNP array in 1,087 Framingham Offspring Study family members to examine genetic associations with three diabetes-related quantitative glucose traits (fasting plasma glucose (FPG), hemoglobin A1c, 28-yr time-averaged FPG (tFPG)), three insulin traits (fasting insulin, HOMA-insulin resistance, and 0–120 min insulin sensitivity index), and with risk for diabetes. We used additive generalized estimating equations (GEE) and family-based association test (FBAT) models to test associations of SNP genotypes with sex-age-age²-adjusted residual trait values, and Cox survival models to test incident diabetes.
We found 415 SNPs associated (at p < 0.001) with at least one of the six quantitative traits in GEE, 242 in FBAT (18 overlapped with GEE for 639 non-overlapping SNPs), and 128 associated with incident diabetes (31 overlapped with the 639) giving 736 non-overlapping SNPs. Of these 736 SNPs, 439 were within 60 kb of a known gene. Additionally, 53 SNPs (of which 42 had r² < 0.80 with each other) had p < 0.01 for incident diabetes AND (all 3 glucose traits OR all 3 insulin traits, OR 2 glucose traits and 2 insulin traits); of these, 36 overlapped with the 736 other SNPs. Of 100K SNPs, one (rs7100927) was in moderate LD (r² = 0.50) with TCF7L2 (rs7903146), and was associated with risk of diabetes (Cox p-value 0.007, additive hazard ratio for diabetes = 1.56) and with tFPG (GEE p-value 0.03). There were no common (MAF > 1%) 100K SNPs in LD (r² > 0.05) with ABCC8 A1369S (rs757110), KCNJ11 E23K (rs5219), or SNPs in CAPN10 or HNFa. PPARG P12A (rs1801282) was not significantly associated with diabetes or related traits.
Framingham 100K SNP data is a resource for association tests of known and novel genes with diabetes and related traits posted at . Framingham 100K data replicate the TCF7L2 association with diabetes.
Finding a genetic marker associated with a trait is a classic problem in human genetics. Recently, two-stage approaches have gained popularity in marker-trait association studies, in part because researchers hope to reduce the multiple testing problem by testing fewer markers in the final stage. We compared one two-stage family-based approach to an analogous single-stage method, calculating the empirical type I error rates and power for both methods using fully simulated data sets modeled on nuclear families with rheumatoid arthritis, and data sets of real single-nucleotide polymorphism genotypes from Centre d'Etude du Polymorphisme Humain pedigrees with simulated traits. In these analyses performed in the absence of population stratification, the single-stage method was consistently more powerful than the two-stage method for a given type I error rate. To explore the sources of this difference, we performed a case study comparing the individual steps of two-stage designs, the two-stage design itself, and the analogous one-stage design.
A number of trait-model-free tests have been proposed for linkage detection between a genomic region and a trait. These tests involve testing the dependence in segregation between a trait and marker alleles by assigning a score to every possible identity-by-descent configuration of the pedigree members without modeling the trait, and then averaging the scores over all such configurations compatible with the observed marker genotypes and genealogical relationship of the pedigree members. In this paper we propose a permutation test as an alternative to the existing exact trait-model-free tests for linkage detection. The proposed test is computationally efficient and is applicable to complex multigeneration pedigree structures. We have compared the performance of the permutation test with two other exact trait-model-free tests for linkage detection on simulated datasets. We have demonstrated that the proposed permutation test is fully robust against misspecification of marker allele frequencies and has very good power for linkage detection. The permutation test is implemented in the program lm_ibdtests within the framework of MORGAN 2.8 (http://www.stat.washington.edu/thompson/Genepi/MORGAN/Morgan.shtml).
Trait-model-free; Identity by descent; Exact tests for linkage
Osteoporosis is characterized by low bone mass and compromised bone structure, heritable traits that contribute to fracture risk. There have been no genome-wide association and linkage studies for these traits using high-density genotyping platforms.
We used the Affymetrix 100K SNP GeneChip marker set in the Framingham Heart Study (FHS) to examine genetic associations with ten primary quantitative traits: bone mineral density (BMD), calcaneal ultrasound, and geometric indices of the hip. To test associations with multivariable-adjusted residual trait values, we used additive generalized estimating equation (GEE) and family-based association tests (FBAT) models within each sex as well as sexes combined. We evaluated 70,987 autosomal SNPs with genotypic call rates ≥80%, HWE p ≥ 0.001, and MAF ≥10% in up to 1141 phenotyped individuals (495 men and 646 women, mean age 62.5 yrs). Variance component linkage analysis was performed using 11,200 markers.
Heritability estimates for all bone phenotypes were 30–66%. LOD scores ≥3.0 were found on chromosomes 15 (1.5 LOD confidence interval: 51,336,679–58,934,236 bp) and 22 (35,890,398–48,603,847 bp) for femoral shaft section modulus. The ten primary phenotypes had 12 associations with 100K SNPs in GEE models at p < 0.000001 and 2 associations in FBAT models at p < 0.000001. The 25 most significant p-values for GEE and FBAT were all less than 3.5 × 10⁻⁶ and 2.5 × 10⁻⁵, respectively. Of the 40 top SNPs with the greatest numbers of significantly associated BMD traits (including femoral neck, trochanter, and lumbar spine), one half to two-thirds were in or near genes that have not previously been studied for osteoporosis. Notably, pleiotropic associations between BMD and bone geometric traits were uncommon. Evidence for association (FBAT or GEE p < 0.05) was observed for several SNPs in candidate genes for osteoporosis, such as rs1801133 in MTHFR; rs1884052 and rs3778099 in ESR1; rs4988300 in LRP5; rs2189480 in VDR; rs2075555 in COLIA1; rs10519297 and rs2008691 in CYP19, as well as SNPs in PPARG (rs10510418 and rs2938392) and ANKH (rs2454873 and rs379016). All GEE, FBAT and linkage results are provided as an open-access results resource at .
The FHS 100K SNP project offers an unbiased genome-wide strategy to identify new candidate loci and to replicate previously suggested candidate genes for osteoporosis.
For a diallelic genetic marker locus, tests like the parental-asymmetry test (PAT) are simple and powerful for detecting parent-of-origin effects. However, these approaches are applicable only to qualitative traits and thus are currently not suitable for quantitative traits. In this paper, the authors propose a novel class of PAT-type parent-of-origin effects tests for quantitative traits in families with both parents and an arbitrary number of children, which is denoted by Q-PAT(c) for some constant c. The authors further develop Q-1-PAT(c) for detection of parent-of-origin effects when information is available on only 1 parent in each family. The authors suggest the Q-C-PAT(c) test for combining families with data on both parental genotypes and families with data on only 1 parental genotype. Simulation studies show that the proposed tests control the empirical type I error rates well under the null hypothesis of no parent-of-origin effects. Power comparison also demonstrates that the proposed methods are more powerful than the existing likelihood ratio test. Although normality is commonly assumed in methods for studying quantitative traits, the tests proposed in this paper do not make any assumption about the distribution of the quantitative trait.
genomic imprinting; quantitative trait loci
Genome-wide association (GWA) studies that use population-based association approaches may identify spurious associations in the presence of population admixture. In this paper, we propose a novel three-stage approach that is computationally efficient and robust to population admixture and more powerful than the family-based association test (FBAT) for GWA studies with family data.
We propose a three-stage approach for GWA studies with family data. The first stage is to perform linear regression ignoring phenotypic correlations among family members. SNPs with a first-stage p-value below a liberal cut-off (e.g. 0.1) are then analyzed in the second stage that employs a linear mixed effects (LME) model that accounts for within-family correlations. Next, SNPs that reach genome-wide significance (e.g. 10⁻⁶ for 34,625 genotyped SNPs in this paper) are analyzed in the third stage using FBAT, with correction of multiple testing only for SNPs that enter the third stage. Simulations are performed to evaluate type I error and power of the proposed method compared to LME adjusting for 10 principal components (PC) of the genotype data. We also apply the three-stage approach to the GWA analyses of uric acid in Framingham Heart Study's SNP Health Association Resource (SHARe) project.
Our simulations show that, whether or not population admixture is present, the three-stage approach has no inflated type I error. In terms of power, LME adjusting for PCs is only slightly more powerful than the three-stage approach. When applied to the GWA analyses of uric acid in the SHARe project of FHS, the three-stage approach successfully identified and confirmed three SNPs previously reported as genome-wide significant signals.
For GWA analyses of quantitative traits with family data, our three-stage approach provides another appealing solution to population admixture, in addition to LME adjusting for genetic PC.
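The filtering logic of the three stages can be sketched as below. The three p-value functions stand in for linear regression, the LME model and FBAT; they are supplied by the caller and are not implemented here. The function name `three_stage_screen` and the use of a Bonferroni correction in the third stage are assumptions, since the text above says only that multiple testing is corrected for the SNPs entering that stage.

```python
def three_stage_screen(snps, stage1_p, stage2_p, stage3_p,
                       cut1=0.1, cut2=1e-6, alpha=0.05):
    """Return the SNPs that survive all three stages.

    stage1_p, stage2_p, stage3_p -- callables mapping a SNP identifier to
    the p-value from linear regression, LME and FBAT, respectively
    (placeholders for analyses carried out elsewhere).
    """
    # Stage 1: liberal screen that ignores within-family correlation.
    passed1 = [s for s in snps if stage1_p(s) < cut1]
    # Stage 2: mixed model, genome-wide significance threshold.
    passed2 = [s for s in passed1 if stage2_p(s) < cut2]
    if not passed2:
        return []
    # Stage 3: FBAT, corrected only for the SNPs that reach this stage
    # (Bonferroni is assumed here for illustration).
    level = alpha / len(passed2)
    return [s for s in passed2 if stage3_p(s) < level]


# Hypothetical p-values for four SNPs.
p1 = {"a": 0.05, "b": 0.5, "c": 0.01, "d": 0.02}
p2 = {"a": 1e-7, "c": 1e-8, "d": 0.01}
p3 = {"a": 0.01, "c": 0.04}
confirmed = three_stage_screen(["a", "b", "c", "d"],
                               p1.__getitem__, p2.__getitem__, p3.__getitem__)
```

Because only the few SNPs reaching the third stage are tested with FBAT, the correction applied there is far milder than a genome-wide one.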
The genetic etiology of complex human diseases has been commonly viewed as a process that involves multiple genetic variants, environmental factors, as well as their interactions. Statistical approaches, such as the multifactor dimensionality reduction (MDR) and generalized MDR (GMDR), have recently been proposed to test the joint association of multiple genetic variants with either dichotomous or continuous traits. In this paper, we propose a novel Forward U-Test to evaluate the combined effect of multiple loci on quantitative traits with consideration of gene-gene/gene-environment interactions. In this new approach, a U-Statistic-based forward algorithm is first used to select potential disease-susceptibility loci and then a weighted U statistic is used to test the joint association of the selected loci with the disease. Through a simulation study, we found the Forward U-Test outperformed GMDR in terms of greater power. Aside from that, our approach is less computationally intensive, making it feasible for high-dimensional gene-gene/gene-environment research. We illustrate our method with a real data application to Nicotine Dependence (ND), using three independent datasets from the Study of Addiction: Genetics and Environment. Our gene-gene interaction analysis of 155 SNPs in 67 candidate genes identified two SNPs, rs16969968 within gene CHRNA5 and rs1122530 within gene NTRK2, jointly associated with the level of ND (p-value = 5.31e-7). The association, which involves essential interaction, is replicated in two independent datasets with p-values of 1.08e-5 and 0.02, respectively. Our finding suggests that joint action may exist between the two gene products.
gene-gene interaction; Forward U-Test; Nicotine Dependence
For genomewide association studies with family-based designs, we propose a Bayesian approach. We show that standard TDT/FBAT statistics can naturally be implemented in a Bayesian framework. We construct a Bayes factor conditional on the offspring phenotype and parental genotype data and then use the data we conditioned on to inform the prior odds for each marker. In the construction of the prior odds, the evidence for association for each single marker is obtained at the population-level by estimating the genetic effect size in the conditional mean model. Since such genetic effect size estimates are statistically independent of the effect size estimation within the families, the actual data set can inform the construction of the prior odds without any statistical penalty. In contrast to Bayesian approaches that have recently been proposed for genomewide association studies, our approach does not require assumptions about the genetic effect size; this makes the proposed method entirely data-driven. The power of the approach was assessed through simulation. We then applied the approach to a genomewide association scan to search for associations between single nucleotide polymorphisms and body mass index in the Childhood Asthma Management Program data.
family-based association tests; Bayes factors; complex traits
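The way a Bayes factor and prior odds combine is standard and can be shown in a few lines. This sketch covers only that final step; how the Bayes factor and the data-informed prior odds are constructed in the approach above is not reproduced, and the name `posterior_probability` is hypothetical.

```python
def posterior_probability(bayes_factor, prior_odds):
    """Posterior probability of association for one marker.

    Posterior odds equal the Bayes factor times the prior odds; the
    probability follows from odds / (1 + odds).
    """
    if bayes_factor < 0 or prior_odds < 0:
        raise ValueError("bayes_factor and prior_odds must be non-negative")
    posterior_odds = bayes_factor * prior_odds
    return posterior_odds / (1.0 + posterior_odds)
```

For example, a Bayes factor of 10 applied to prior odds of 0.1 gives posterior odds of 1, that is, a posterior probability of one half.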
This paper describes the software package KELVIN, which supports the PPL (posterior probability of linkage) framework for the measurement of statistical evidence in human (or more generally, diploid) genetic studies. In terms of scope, KELVIN supports two-point (trait-marker or marker-marker) and multipoint linkage analysis, based on either sex-averaged or sex-specific genetic maps, with an option to allow for imprinting; trait-marker linkage disequilibrium (LD), or association analysis, in case-control data, trio data, and/or multiplex family data, with options for joint linkage and trait-marker LD or conditional LD given linkage; dichotomous trait, quantitative trait and quantitative trait threshold models; and certain types of gene-gene interactions and covariate effects. Features and data (pedigree) structures can be freely mixed and matched within analyses. The statistical framework is specifically tailored to accumulate evidence in a mathematically rigorous way across multiple data sets or data subsets while allowing for multiple sources of heterogeneity, and KELVIN itself utilizes sophisticated software engineering to provide a powerful and robust platform for studying the genetics of complex disorders.
Association; Covariates; Epistasis; Imprinting; Linkage; Linkage disequilibrium; Quantitative traits; Software; KELVIN; Statistical evidence
Genome-wide association studies have been able to identify disease associations with many common variants; however, most of the estimated genetic contribution explained by these variants appears to be very modest. Rare variants are thought to have larger effect sizes compared to common SNPs, but effects of rare variants cannot be tested in the GWAS setting. Here we propose a novel method to test for association of rare variants obtained by sequencing in family-based samples by collapsing the standard family-based association test (FBAT) statistic over a region of interest. We also propose a suitable weighting scheme so that low frequency SNPs that may be enriched in functional variants can be upweighted compared to common variants. Using simulations we show that the family-based methods perform on par with the population-based methods under no population stratification. By construction, family-based tests are completely robust to population stratification; we show that our proposed methods remain valid even when population stratification is present.
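One way to picture the collapsing step is as a weighted sum of per-variant family contributions, standardised by an empirical variance. The sketch below is an illustration under stated assumptions rather than the exact statistic proposed above: the weight 1/sqrt(p(1-p)) is one common choice that upweights low-frequency variants, the variance is estimated from squared per-family sums, and the names `variant_weights` and `collapsed_z` are hypothetical.

```python
from math import sqrt


def variant_weights(allele_freqs):
    """Assumed weighting scheme: rarer variants receive larger weights."""
    return [1.0 / sqrt(p * (1.0 - p)) for p in allele_freqs]


def collapsed_z(family_contrib, allele_freqs):
    """Collapse per-variant contributions over a region into one z-score.

    family_contrib[i][j] is family i's contribution for variant j, i.e. the
    coded trait times (observed minus expected offspring genotype given the
    parents); it has mean zero under the null hypothesis.
    """
    weights = variant_weights(allele_freqs)
    # Weighted sum across variants, computed separately for each family.
    per_family = [
        sum(w * u for w, u in zip(weights, row)) for row in family_contrib
    ]
    statistic = sum(per_family)
    # Empirical variance: families are independent with mean zero under
    # the null, so the squared per-family sums estimate the variance.
    variance = sum(u * u for u in per_family)
    if variance == 0:
        raise ValueError("no informative families in the region")
    return statistic / sqrt(variance)
```

Because every term is a within-family deviation from the expected genotype, the collapsed statistic keeps the robustness to population stratification described above.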
Missing data occur in genetic association studies for several reasons including missing family members and uncertain haplotype phase. Maximum likelihood is a commonly used approach to accommodate missing data, but it can be difficult to apply to family-based association studies, because of possible loss of robustness to confounding by population stratification. Here a novel likelihood for nuclear families is proposed, in which distinct sets of association parameters are used to model the parental genotypes and the offspring genotypes. This approach is robust to population structure when the data are complete, and has only minor loss of robustness when there are missing data. It also allows a novel conditioning step that gives valid analysis for multiple offspring in the presence of linkage. Unrelated subjects are included by regarding them as the children of two missing parents. Simulations and theory indicate similar operating characteristics to TRANSMIT, but with no bias with missing data in the presence of linkage. In comparison with FBAT and PCPH, the proposed model is slightly less robust to population structure but has greater power to detect strong effects. In comparison to APL and MITDT, the model is more robust to stratification and can accommodate sibships of any size. The methods are implemented for binary and continuous traits in software, UNPHASED, available from the author.
Conditional likelihood; Family-based association tests; Missing data; Population stratification; Transmission/disequilibrium test; Unphased genotype data