We introduce PedWiz, a novel web-based tool that pipelines the informatics process for pedigree data. PedWiz is designed to assist researchers in the analysis of pedigree data. It provides a convenient tool for pedigree informatics: descriptive statistics, relative pairs, genetic similarity coefficients, the variance-covariance matrix for three estimated coefficients of allele identical-by-descent sharing as well as mean allele sharing, a plot of the pedigree structures, and a visualization of the identity coefficients. With a renewed interest in linkage and other family-based methods, PedWiz will be a valuable tool for the analysis of family data.
pedigree; informatics; genetic similarity; identity-by-descent; relative pairs; family data
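As an illustration of one of the genetic similarity coefficients mentioned in the PedWiz abstract, the kinship coefficient of two pedigree members can be computed with the standard parent-based recursion. This is a minimal sketch and not PedWiz code: the pedigree dictionary, the individual labels and the function name `kinship` are hypothetical, and founders are assumed to be unrelated and non-inbred.

```python
from functools import lru_cache

# Hypothetical pedigree: individual -> (father, mother); founders have (None, None).
# Individuals are listed so that parents always appear before their children.
PEDIGREE = {
    "gf": (None, None),
    "gm": (None, None),
    "s1": ("gf", "gm"),
    "s2": ("gf", "gm"),
}
ORDER = {name: i for i, name in enumerate(PEDIGREE)}


@lru_cache(maxsize=None)
def kinship(a, b):
    """Kinship coefficient: probability that alleles drawn at random from a and b are IBD."""
    if a is None or b is None:
        # A missing parent (of a founder) contributes no shared ancestry (assumption).
        return 0.0
    if a == b:
        father, mother = PEDIGREE[a]
        # Self-kinship is (1 + inbreeding coefficient) / 2.
        return 0.5 * (1.0 + kinship(father, mother))
    # Always recurse through the parents of the individual listed later,
    # so that the recursion never steps from an ancestor down to a descendant.
    if ORDER[a] < ORDER[b]:
        a, b = b, a
    father, mother = PEDIGREE[a]
    return 0.5 * (kinship(father, b) + kinship(mother, b))
```

For the two full sibs in this toy pedigree, `kinship("s1", "s2")` is 0.25, and for a parent-offspring pair such as `kinship("s1", "gf")` it is also 0.25, which matches the textbook values for non-inbred families.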
Current genome-wide association studies still rely heavily on a single-marker strategy, in which each single nucleotide polymorphism (SNP) is tested individually for association with a phenotype. Although methods and software packages that consider multimarker models have become available, they have been slow to gain wide adoption, and their efficacy in real data analysis is often questioned. Using extensive simulations, we provide further insight into the performance of simple multimarker association tests as compared to single-marker tests. The results reveal both the power advantages and the disadvantages of the two-marker test relative to the single-marker test. Power differentials depend on the correlation structure among tag SNPs, as well as that between tag SNPs and causal variants. A two-marker test performs relatively better than single-marker tests when the correlation between the two adjacent markers is high. However, using HapMap data, two-marker tests tended to have a greater chance of being less powerful than single-marker tests, due to constraints on the number of actual possible haplotypes in the HapMap data. Yet the average power difference was small whenever the single-marker test was more powerful, while there were many situations in which the two-marker test was much more powerful. These findings can help guide the analysis of future studies.
Asymptotic power; single-marker test; two-marker test; genome-wide association
The potential importance of the joint action of genes, whether modeled with or without a statistical interaction term, has long been recognized. However, identifying such action has been a great challenge, especially when millions of genetic markers are involved. We propose a likelihood ratio-based Mann-Whitney test to search for joint gene action either among candidate genes or genome-wide. It extends the traditional univariate Mann-Whitney test to assess the joint association of genotypes at multiple loci with disease, allowing for high-order statistical interactions. Because only one overall significance test is conducted for the entire analysis, it avoids the issue of multiple testing. Moreover, the approach adopts a computationally efficient algorithm, making a genome-wide search feasible in a reasonable amount of time on a high performance personal computer. We evaluated the approach using both theoretical and real data. By applying the approach to 40 type 2 diabetes (T2D) susceptibility single-nucleotide polymorphisms (SNPs), we identified a four-locus model strongly associated with T2D in the Wellcome Trust (WT) study (permutation P-value < 0.001), and replicated the same finding in the Nurses’ Health Study/Health Professionals Follow-Up Study (NHS/HPFS) (P-value = 3.03 × 10⁻¹¹). We also conducted a genome-wide search on 385,598 SNPs in the WT study. The analysis took approximately 55 hr on a personal computer, identifying the same first two loci, but overall a different set of four SNPs, jointly associated with T2D (P-value = 1.29 × 10⁻⁵). The nominal significance of this same association reached 4.01 × 10⁻⁶ in the NHS/HPFS.
gene-gene interaction; genome-wide search; forward selection
This paper is concerned with evaluating whether an interaction between two sets of risk factors for a binary trait is removable and fitting a parsimonious additive model using a suitable link function to estimate the disease odds (on the natural logarithm scale) when an interaction is removable. Statisticians define the term “interaction” as a departure from additivity in a linear model on a specific scale on which the data are measured. Certain interactions may be eliminated via a transformation of the outcome such that the relationship between the risk factors and the outcome is additive on the transformed scale. Such interactions are known as removable interactions. We develop a novel test statistic for detecting the presence of a removable interaction in case-control studies. We consider the Guerrero and Johnson family of transformations and show that this family constitutes an appropriate link function for fitting an additive model when an interaction is removable. We use simulation studies to examine the type I error and power of the proposed test and to show that an additive model based on the Guerrero and Johnson link function leads to more precise estimates of the disease odds parameters and a better fit when an interaction is removable. The proposed test and use of the transformation are illustrated using case-control data from three published studies. Finally, we indicate how one can check that, after transformation, no further interaction is significant.
Analysis of variance; curvature; independence; interaction effect; link function; main effect; residuals; score statistic; Tukey’s test; transformation; unbalanced data
We investigated the heritability and familial aggregation of various indexes of arterial stiffness and wave reflection and we partitioned the phenotypic correlation between these traits into shared genetic and environmental components.
Using a family-based population sample, we recruited 204 parents (mean age, 51.7 years) and 290 offspring (29.4 years) from the population in Cracow, Poland (62 families), Hechtel-Eksel, Belgium (36), and Pilsen, the Czech Republic (50). We measured peripheral pulse pressure (PPp) sphygmomanometrically at the brachial artery; central pulse pressure (PPc), the peripheral augmentation indexes (PAIxs) and central augmentation indexes (CAIxs) by applanation tonometry at the radial artery; and aortic pulse wave velocity (PWV) by tonometry or ultrasound. In multivariate-adjusted analyses, we used the ASSOC and PROC GENMOD procedures as implemented in SAGE and SAS, respectively.
We found significant heritability for PAIx, CAIx, PPc and mean arterial pressure, ranging from 0.37 to 0.41 (P ≤ 0.0001). The method of intrafamilial concordance confirmed these results; intrafamilial correlation coefficients were significant for all arterial indexes (r ≥ 0.12; P ≤ 0.02) with the exception of PPc (r = −0.007; P = 0.90) in parent–offspring pairs. The sib–sib correlations were also significant for CAIx (r = 0.22; P = 0.001). The genetic correlations between PWV and the other arterial indexes were significant (ρG ≥ 0.29; P < 0.0001). The corresponding environmental correlations were only significantly positive for PPp (ρE = 0.10, P = 0.03).
The observation of significant intrafamilial concordance and heritability of various indexes of arterial stiffness, together with the genetic correlations among arterial phenotypes, strongly supports the search for shared genetic determinants underlying these traits.
arterial stiffness; familial aggregation; heritability; pulse pressure; systolic augmentation
15-Hydroxyprostaglandin dehydrogenase (15-PGDH) is a metabolic antagonist of COX-2, catalyzing the degradation of inflammation mediator prostaglandin E2 (PGE2) and other prostanoids. Recent studies have established the 15-PGDH gene as a colon cancer suppressor.
We evaluated 15-PGDH as a colon cancer susceptibility locus in a three-stage design. In stage 1, we genotyped 102 single-nucleotide polymorphisms (SNPs) in the 15-PGDH gene, spanning ∼50 kb upstream and downstream of the coding region, in 464 colon cancer cases and 393 population controls. In stage 2, we genotyped the same SNPs, and also assayed the expression levels of 15-PGDH, in colon tissues from 69 independent patients for whom colon tissue and paired germline DNA samples were available. In stage 3, we genotyped the 9 most promising SNPs from stages 1 and 2 in an independent sample of 525 cases and 816 controls.
In the first two stages, three SNPs (rs1365611, rs6844282 and rs2332897) were statistically significant (p<0.05) in combined analysis of association with risk of colon cancer and of association with 15-PGDH expression, after adjustment for multiple testing. For one additional SNP, rs2555639, the T allele showed increased cancer risk and decreased 15-PGDH expression, but just missed statistical significance (p-adjusted = 0.063). In stage 3, rs2555639 alone showed evidence of association with an odds ratio (TT compared to CC) of 1.50 (95% CI = 1.05–2.15, p = 0.026).
Our data suggest that the rs2555639 T allele is associated with increased risk of colon cancer, and that carriers of this risk allele exhibit decreased expression of 15-PGDH in the colon.
Incorporating family data in genetic association studies has become increasingly appreciated, especially for its potential value in testing rare variants. We introduce here a variance-component based association test that can test multiple common or rare variants jointly using both family and unrelated samples.
The proposed approach implemented in our R package aggregates or collapses the information across a region based on genetic similarity instead of genotype scores, which avoids the power loss when the effects are in different directions or have different association strengths. The method is also able to effectively leverage the LD information in a region and it can produce a test statistic with an adaptively estimated number of degrees of freedom. Our method can readily allow for the adjustment of non-genetic contributions to the familial similarity, as well as multiple covariates.
We demonstrate through simulations that the proposed method achieves good performance in terms of Type I error control and statistical power. The method is implemented in the R package “fassoc”, which provides a useful tool for data analysis and exploration.
Association studies; Family data; Score test; Multi-marker test
The genetic etiology of complex human diseases has been commonly viewed as a process that involves multiple genetic variants, environmental factors, as well as their interactions. Statistical approaches, such as the multifactor dimensionality reduction (MDR) and generalized MDR (GMDR), have recently been proposed to test the joint association of multiple genetic variants with either dichotomous or continuous traits. In this paper, we propose a novel Forward U-Test to evaluate the combined effect of multiple loci on quantitative traits with consideration of gene-gene/gene-environment interactions. In this new approach, a U-Statistic-based forward algorithm is first used to select potential disease-susceptibility loci and then a weighted U statistic is used to test the joint association of the selected loci with the disease. Through a simulation study, we found the Forward U-Test outperformed GMDR in terms of greater power. Aside from that, our approach is less computationally intensive, making it feasible for high-dimensional gene-gene/gene-environment research. We illustrate our method with a real data application to Nicotine Dependence (ND), using three independent datasets from the Study of Addiction: Genetics and Environment. Our gene-gene interaction analysis of 155 SNPs in 67 candidate genes identified two SNPs, rs16969968 within gene CHRNA5 and rs1122530 within gene NTRK2, jointly associated with the level of ND (p-value = 5.31e-7). The association, which involves essential interaction, is replicated in two independent datasets with p-values of 1.08e-5 and 0.02, respectively. Our finding suggests that joint action may exist between the two gene products.
gene-gene interaction; Forward U-Test; Nicotine Dependence
Familial aggregation of specific response to allergens and asthma adjusted for age and sensitization to multiple allergens was assessed in two large population cohorts. Allergen skin prick tests (SPTs) were administered to 1151 families in the Tucson Children’s Respiratory Study (CRS) and 435 families in the Tucson Epidemiological Study of Airway Obstructive Disease (TESAOD). Sensitization was defined by wheal size ≥ 3 mm; physician-diagnosed asthma at age ≥ 8 years was based on questionnaires. Using the ASSOC and FCOR programs in the S.A.G.E. 6.1 software, familial correlations of crude and adjusted phenotypes were evaluated. Crude estimates of parent-offspring (P-O) and sibling correlations were statistically significant for most allergens, ranging from 0.03 to 0.29. After adjusting for age of assessment and “other atopy” (SPT-positive response to additional allergens), correlations were reduced by 14–71%. Sibling correlations for specific response to allergens were consistently higher than P-O correlations, but this difference was significant only for dust mite and weed mix in the TESAOD population. Familial correlation for atopic status (any positive SPTs versus none) tended to be higher than for specific allergens. Asthma, with and without adjustment, showed greater familial correlation than either specific or general SPT response and significantly higher sibling correlation in TESAOD than in CRS, probably due to the older age of the siblings and the longer period of ascertainment. In conclusion, significant familial aggregation of specific response to allergen after adjustment for other atopy appears to reflect a genetic propensity toward atopy, dependent on shared familial exposures. Results also suggest that inheritance of asthma is independent of atopic sensitization.
familial aggregation; specific response to allergens; atopy; asthma
Interactions among genomic loci (also known as epistasis) have been suggested as one of the potential sources of missing heritability in single locus analysis of genome-wide association studies (GWAS). The computational burden of searching for interactions is compounded by the extremely low threshold for identifying significant p-values due to multiple hypothesis testing corrections. Utilizing prior biological knowledge to restrict the set of candidate SNP pairs to be tested can alleviate this problem, but systematic studies that investigate the relative merits of integrating different biological frameworks and GWAS data have not been conducted.
We developed four biologically based frameworks to identify pairwise interactions among candidate SNP pairs as follows: (1) for each human protein-coding gene, a set of SNPs associated with that gene was constructed providing a gene-based interaction model, (2) for each known biological pathway, a set of SNPs associated with the genes in the pathway was constructed providing a pathway-based interaction model, (3) a set of SNPs associated with genes in a disease-related subnetwork provides a network-based interaction model, and (4) a framework is based on the function of SNPs. The last approach uses expression SNPs (eSNPs or eQTLs), which are SNPs or loci that have defined effects on the abundance of transcripts of other genes. We constructed pairs of eSNPs and SNPs located in the target genes whose expression is regulated by eSNPs. For all four frameworks the SNP sets were exhaustively tested for pairwise interactions within the sets using a traditional logistic regression model after excluding genes that were previously identified to associate with the trait. Using previously published GWAS data for type 2 diabetes (T2D) and the biologically based pair-wise interaction modeling, we identify twelve genes not seen in the previous single locus analysis.
We present four approaches to detect interactions associated with complex diseases. The results show our approaches outperform the traditional single locus approaches in detecting genes that previously did not reach significance; the results also provide novel drug targets and biomarkers relevant to the underlying mechanisms of disease.
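The way a biologically defined SNP set restricts the pairwise search described above can be sketched as follows. This is only an illustration of within-set pair enumeration under assumed inputs: the function name `candidate_pairs`, the gene labels and the SNP identifiers are hypothetical, and the logistic regression fitted to each pair in the study is not reproduced here.

```python
from itertools import combinations


def candidate_pairs(snp_sets, excluded_genes=frozenset()):
    """Enumerate unique SNP pairs that fall within the same set.

    snp_sets maps a set label (a gene here, but it could equally be a pathway
    or subnetwork) to the SNPs assigned to it. Sets listed in excluded_genes,
    for example genes already associated with the trait, are skipped.
    """
    pairs = set()
    for label, snps in snp_sets.items():
        if label in excluded_genes:
            continue
        # Sorting makes each pair order-independent, so a pair shared by
        # two overlapping sets is only counted (and tested) once.
        for first, second in combinations(sorted(set(snps)), 2):
            pairs.add((first, second))
    return pairs


# Hypothetical example: two overlapping gene-based sets.
example_sets = {"GENE1": ["snp1", "snp2", "snp3"], "GENE2": ["snp3", "snp4"]}
within_set = candidate_pairs(example_sets)
all_snps = {snp for snps in example_sets.values() for snp in snps}
exhaustive = len(all_snps) * (len(all_snps) - 1) // 2
```

In this toy input the restricted search tests 4 pairs instead of the 6 pairs of an exhaustive scan; on genome-wide data the reduction, and with it the relief on the multiple-testing threshold, is far larger.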
In case-control Single Nucleotide Polymorphism (SNP) data, the Allele frequency, Hardy Weinberg Disequilibrium (HWD) and Linkage Disequilibrium (LD) contrast tests are three distinct sources of information about genetic association. While all three tests are typically developed in a retrospective context, we show that prospective logistic regression models may be developed that correspond conceptually to the retrospective tests. This approach provides a flexible framework for conducting a systematic series of association analyses using unphased genotype data and any number of covariates. For a single stage study, two single-marker tests and four two-marker tests are discussed. The true association models are derived and they allow us to understand why a model with only a linear term will generally fit well for a SNP in weak LD with a causal SNP, whatever the disease model, but not for a SNP in high LD with a non-additive disease SNP. We investigate the power of the association tests using real LD parameters from chromosome 11 in the HapMap CEU population data. Among the single-marker tests, the allelic test has on average the most power in the case of an additive disease; but, for dominant, recessive and heterozygote disadvantage diseases, the genotypic test has the most power. Among the six two-marker tests, the Allelic-LD contrast test, which incorporates linear terms for two markers and their interaction term, provides the most reliable power overall for the cases studied. Therefore, our result supports incorporating an interaction term as well as linear terms in multi-marker tests.
Allele frequency contrast test; LD contrast test; HWD contrast test; Genome-wide Association
Numerous studies have provided support for genetic susceptibility to tuberculosis (TB); however, heterogeneity in disease expression has hampered previous genetic studies. The purpose of this work was to investigate possible intermediate phenotypes for TB. A set of cytokine profiles, including antigen-stimulated whole-blood assays for interferon (IFN)–γ, tumor necrosis factor (TNF)–α, transforming growth factor (TGF)–β, and the ratio of IFN to TNF, were analyzed in 177 pedigrees from a community in Uganda with a high prevalence of TB. The heritability of these variables was estimated after adjustment for covariates, and TNF-α, in particular, had an estimated heritability of 68%. A principal component analysis of IFN-γ, TNF-α, and TGF-β reflected the immunologic model of TB. In this analysis, the first component explained >38% of the variation in the data. This analysis illustrates the value of such intermediate phenotypes in mapping susceptibility loci for TB and demonstrates that this area deserves further research.
The possible evidence for association comprises three types of information: differences between cases and controls in allele frequencies, in parameters for Hardy Weinberg disequilibrium (HWD), and in parameters for linkage disequilibrium (LD). LD between marker and disease alleles results in a difference in at least one of the three types of parameters [Won and Elston, 2008]. However, the parameters for LD require knowledge about phase, which is usually unknown, making the LD contrast test without modification infeasible in practice. Methods for handling phase uncertainty are: (1) the most probable haplotype pair for each individual can be considered as the true phase; (2) a weighted average of haplotypes can be used; (3) we can consider the composite LD, which does not require any information about phase. We compare these methods to handle phase uncertainty in terms of validity and efficiency, and the effect on them of HWD in the population, at the same time confirming results for the three types of information. When the LD between markers is high, the LD contrast test that uses a weighted average of haplotypes or the most probable haplotypes to calculate the LD is recommended, but otherwise the LD contrast test that uses the composite LD is recommended. We conclude that, even though the difference in allele frequencies is usually the most informative test except in the case of a recessive disease, the LD contrast test can be more powerful if the markers are dense enough.
linkage disequilibrium; haplotype phase; self replication
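The third option in the abstract above, the composite LD, can be estimated directly from unphased genotypes. The sketch below assumes the usual definition in which the covariance of the two allele counts (coded 0, 1 or 2) equals twice the composite disequilibrium coefficient; the abstract does not spell out its estimator, so the function name `composite_ld` and the contrast computed at the end are illustrative assumptions rather than the authors' test statistic.

```python
def composite_ld(genotypes_a, genotypes_b):
    """Composite LD coefficient from unphased genotypes coded as allele counts 0/1/2.

    Uses cov(g_a, g_b) = 2 * Delta_AB, so no haplotype phase is needed.
    """
    n = len(genotypes_a)
    if n == 0 or n != len(genotypes_b):
        raise ValueError("genotype vectors must be non-empty and of equal length")
    mean_a = sum(genotypes_a) / n
    mean_b = sum(genotypes_b) / n
    covariance = sum(
        (a - mean_a) * (b - mean_b) for a, b in zip(genotypes_a, genotypes_b)
    ) / n
    return covariance / 2.0


# Hypothetical toy data: two markers typed in cases and in controls.
cases_a, cases_b = [0, 1, 2, 1], [0, 1, 2, 1]
controls_a, controls_b = [0, 0, 2, 2], [0, 2, 0, 2]

# An LD contrast compares the LD estimate in cases with that in controls.
ld_difference = composite_ld(cases_a, cases_b) - composite_ld(controls_a, controls_b)
```

A formal test would also need the sampling variance of this difference, which is omitted here.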
It is generally known that risk variants segregate together with a disease within families, but this information has not been used in the existing statistical methods for detecting rare variants. Here we introduce two weighted sum statistics that can apply to either genome-wide association data or resequencing data for identifying rare disease variants, with weights calculated based on sibpairs and odds ratios, respectively. We evaluated the two methods via extensive simulations under different disease models. We compared the proposed methods with the weighted sum statistic (WSS) proposed by Madsen and Browning, keeping the same genotyping or resequencing cost. Our methods clearly demonstrate more statistical power than the WSS. In addition, we found that using sibpair information can increase power over using only unrelated samples by more than 40%. We applied our methods to the Framingham Heart Study (FHS) and Wellcome Trust Case Control Consortium (WTCCC) hypertension datasets. Although we did not identify any genes as reaching a genome-wide significance level, we found variants in the candidate gene angiotensinogen (AGT) significantly associated with hypertension at P = 6.9 × 10⁻⁴, whereas the most significant single SNP association evidence is P = 0.063. We further applied the odds ratio weighted method to the IFIH1 gene for type 1 diabetes in the WTCCC data. Our method yielded a P value of 4.82 × 10⁻⁴, much more significant than that obtained by haplotype-based methods. We demonstrated that family data are extremely informative in searching for rare variants underlying complex traits, and the odds ratio weighted sum statistic is more efficient than currently existing methods.
Fisher was the first to suggest a method of combining the p-values obtained from several statistics, and many other methods have been proposed since then. However, there is no agreement about which method is best. Motivated by a situation that now often arises in genetic epidemiology, we consider the problem when it is possible to define a simple alternative hypothesis of interest for which the expected effect size of each test statistic is known, and we determine the most powerful test for this simple alternative hypothesis. Based on the proposed method, we show that information about the effect sizes can be used to obtain the best weights for Liptak’s method of combining p-values. We present extensive simulation results comparing methods of combining p-values and illustrate with a real example in genetic epidemiology how information about effect sizes can be deduced.
Fisher; Liptak; effect size
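The two combination rules named in the abstract above can be sketched with the Python standard library. This is a minimal illustration for independent, one-sided p-values strictly between 0 and 1; the function names are hypothetical, and taking the Liptak weights proportional to the expected effect sizes is our reading of the abstract rather than a reproduction of the paper's derivation.

```python
from math import exp, log, sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def fisher_combine(pvalues):
    """Fisher's method: -2 * sum(log p) is chi-square with 2k df under the null."""
    k = len(pvalues)
    half_stat = -sum(log(p) for p in pvalues)  # half of the chi-square statistic
    # Closed-form upper tail of a chi-square distribution with even df (2k).
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return exp(-half_stat) * total


def liptak_combine(pvalues, weights):
    """Liptak's weighted Z method for one-sided p-values."""
    z_scores = [_STD_NORMAL.inv_cdf(1.0 - p) for p in pvalues]
    numerator = sum(w * z for w, z in zip(weights, z_scores))
    denominator = sqrt(sum(w * w for w in weights))
    return 1.0 - _STD_NORMAL.cdf(numerator / denominator)


# Hypothetical example: two tests, the first expected to have twice the effect size.
p_fisher = fisher_combine([0.05, 0.05])
p_liptak = liptak_combine([0.01, 0.20], weights=[2.0, 1.0])
```

With equal weights Liptak's method treats every test alike; unequal weights let the test with the larger expected effect dominate the combined statistic.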
Numerous studies have examined genetic influences on developmental problems such as speech sound disorders, language impairment, and reading disability. Disorders such as speech sound disorder (SSD) are often analyzed using their component endophenotypes. Most studies, however, have involved comparisons of twin pairs or siblings of similar age, or have adjusted for age ignoring effects that are peculiar to age-related trajectories for phenotypic change. Such developmental changes in these skills have limited the usefulness of data from parents or siblings who differ substantially in age from the probands. Employing parent-offspring correlation in heritability estimation permits a more precise estimate of the additive component of genetic variance, but different generations have to be measured for the same trait. We report on a smoothing procedure which fits a series of lines that approximate a curve matching the developmental trajectory. This procedure adjusts for changes in measures with age, so that the adjusted values are on a similar scale for children, adolescents, and adults. We apply this method to four measures of phonological memory and articulation in order to estimate their heritability. Repetition of multisyllabic real words showed the best heritability estimate of 45% in this sample. We conclude that differences in measurement scales across the age span can be reconciled through non-linear modeling of the developmental process.
Speech; Language; longitudinal; developmental genetics; spline fitting
Although recent studies have attempted to dispel the confusion that exists in regard to the definition, analysis and interpretation of interaction in genetics, there still remain aspects that are poorly understood by non-statisticians. After a brief discussion of the definition of gene-gene interaction, the main part of this study addresses the fundamental meaning of statistical interaction and its relationship to measurement scale, disproportionate sample sizes in the cells of a two-way table and gametic phase disequilibrium.
Epistasis; Gametic phase disequilibrium; Interaction; Transformation
Structural Equation Modeling (SEM) is an analysis approach that accounts for both the causal relationships between variables and the errors associated with the measurement of these variables. In this paper, a framework for implementing structural equation models (SEMs) in family data is proposed.
This framework includes both a latent measurement model and a structural model with covariates. It allows for a wide variety of models, including latent growth curve models. Environmental, polygenic and other genetic variance components can be included in the SEM. Kronecker notation makes it easy to separate the SEM process from a familial correlation model. A limited information method of model fitting is discussed. We show how missing data and ascertainment may be handled. We give several examples of how the framework may be used.
A simulation study shows that our method is computationally feasible, and has good statistical properties.
Our framework may be used to build and compare causal models using family data without any genetic marker data. It also allows for a nearly endless array of genetic association and/or linkage tests. A preliminary Matlab program is available, and we are currently implementing a more complete and user-friendly R package.
Latent variable analysis; Path analysis; Extended pedigrees; Complex traits; Genetic linkage analysis; Genetic association
The Genetic Analysis Workshop 17 data we used comprise 697 unrelated individuals genotyped at 24,487 single-nucleotide polymorphisms (SNPs) from a mini-exome scan, using real sequence data for 3,205 genes annotated by the 1000 Genomes Project and simulated phenotypes. We studied 200 sets of simulated phenotypes of trait Q2. An important feature of this data set is that most SNPs are rare, with 87% of the SNPs having a minor allele frequency less than 0.05. For rare SNP detection, in this study we performed a least absolute shrinkage and selection operator (LASSO) regression and F tests at the gene level and calculated the generalized degrees of freedom to avoid any selection bias. For comparison, we also carried out linear regression and the collapsing method, which sums the rare SNPs, modified for a quantitative trait and with two different allele frequency thresholds. The aim of this paper is to evaluate these four approaches in this mini-exome data and compare their performance in terms of power and false positive rates. In most situations the LASSO approach is more powerful than linear regression and collapsing methods. We also note the difficulty in determining the optimal threshold for the collapsing method and the significant role that linkage disequilibrium plays in detecting rare causal SNPs. If a rare causal SNP is in strong linkage disequilibrium with a common marker in the same gene, power will be much improved.
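The collapsing step mentioned in the abstract above, which sums the rare SNPs in a gene below an allele frequency threshold, can be sketched as follows. The function name `collapse_rare`, the toy genotype matrix and the threshold used in the example are assumptions for illustration; the subsequent regression of the quantitative trait on the collapsed score is not shown.

```python
def collapse_rare(genotypes, maf_threshold=0.05):
    """Sum minor-allele counts over the rare SNPs of one gene, per individual.

    genotypes is a list of rows, one per individual, each holding the
    minor-allele count (0, 1 or 2) at every SNP in the gene. A SNP counts as
    rare when its sample allele frequency is below maf_threshold.
    """
    n_individuals = len(genotypes)
    n_snps = len(genotypes[0])
    frequencies = [
        sum(row[j] for row in genotypes) / (2.0 * n_individuals)
        for j in range(n_snps)
    ]
    rare_columns = [j for j in range(n_snps) if frequencies[j] < maf_threshold]
    return [sum(row[j] for j in rare_columns) for row in genotypes]


# Hypothetical toy gene with three SNPs typed in four individuals.
toy_genotypes = [
    [0, 1, 0],
    [1, 1, 0],
    [0, 0, 0],
    [0, 1, 1],
]
# Sample frequencies are 0.125, 0.375 and 0.125, so a threshold of 0.2
# collapses the first and third SNPs only.
burden_scores = collapse_rare(toy_genotypes, maf_threshold=0.2)
```

Changing `maf_threshold` changes which SNPs enter the score, which is one way to see why the abstract notes the difficulty of choosing an optimal threshold.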
We found from our analysis of the Genetic Analysis Workshop 17 data that the population structure of the 697 unrelated individuals was an important confounding factor for association studies, even if it was not explicitly considered when simulating the phenotypes. We uncovered structures beyond the reported ethnicities and found ample evidence of phenotype–population structure associations. The first 10 principal components of the genotype data of the 697 individuals demonstrated much stronger associations with Q1, Q2, and the disease than did the individuals’ ethnicities. In addition, we observed that population structure was a confounding factor for the Q1-gene association when identifying the significant genes both with and without adjusting for the causal single-nucleotide polymorphisms, the ethnicities, and the principal components. Many false discoveries remained after adjusting for the causal single-nucleotide polymorphisms. Adjusting for the principal components appeared more effective than did adjusting for ethnicity in terms of preventing false discoveries. This analysis was performed with knowledge of the causal loci.
Gene-based and single-nucleotide polymorphism (SNP) set association studies provide an important complement to SNP analysis. Kernel-based nonparametric regression has recently emerged as a powerful and flexible tool for this purpose. Our goal is to explore whether this approach can be extended to incorporate and test for interaction effects, especially for genes containing rare variant SNPs. Here, we construct nonparametric regression models that can be used to include a gene-environment interaction effect under the framework of the least-squares kernel machine and examine the performance of the proposed method on the Genetic Analysis Workshop 17 unrelated individuals data set. Two hundred simulated replicates were used to explore the power for detecting interaction. We demonstrate through a genome scan of the quantitative phenotype Q1 that the simulated gene-environment interaction effect in the data can be detected with reasonable power by using the least-squares kernel machine method.
For the family data from Genetic Analysis Workshop 17, we obtained heritability estimates of quantitative traits Q1 and Q4 using the ASSOC program in the S.A.G.E. software package. ASSOC is a family-based method that estimates heritability through the estimation of variance components. The covariate-adjusted mean heritability was 0.650 for Q1 and 0.745 for Q4. For the unrelated individuals data, we estimated the heritability of Q1 as the proportion of total variance that can be accounted for by all single-nucleotide polymorphisms under an additive model. We examined a novel ordinary least-squares method, a naïve restricted maximum-likelihood method, and a calibrated restricted maximum-likelihood method. We applied the different methods to all 200 replicates for Q1. We observed that the ordinary least-squares method yielded many estimates outside the interval [0, 1]. The restricted maximum-likelihood estimates were more stable than the ordinary least-squares estimates. The naïve restricted maximum-likelihood method yielded an average estimate of 0.462 ± 0.1, and the calibrated restricted maximum-likelihood method yielded an average of 0.535 ± 0.121. Our results demonstrate discrepancies in heritability estimates using the family data and the unrelated individuals data.
To detect rare variants associated with a phenotype, we develop a novel statistical method that can use both family and unrelated case-control data. Unlike the currently existing methods, we first use family data to calculate weights to be given to rare variants, differentiating between concordantly affected and discordant sib pairs. These weights are then used in an association test applied to the unrelated case-control data. We applied the proposed method to the simulated sequencing data in Genetic Analysis Workshop 17 and identified two genes associated with the disease.
We evaluate an approach to detect single-nucleotide polymorphisms (SNPs) that account for a linkage signal with covariate-based affected relative pair linkage analysis in a conditional-logistic model framework using all 200 replicates of the Genetic Analysis Workshop 17 family data set. We begin by combining the multiple known covariate values into a single variable, a propensity score. We also use each SNP as a covariate, using an additive coding based on the number of minor alleles. We evaluate the distribution of the difference between LOD scores with the propensity score covariate only and LOD scores with the propensity score covariate and a SNP covariate. The inclusion of causal SNPs in causal genes increases LOD scores more than the inclusion of noncausal SNPs either within causal genes or outside causal genes. We compare the results from this method to results from a family-based association analysis and conclude that it is possible to identify SNPs that account for the linkage signals from genes using a SNP-covariate-based affected relative pair linkage approach.
Genome-wide association studies are based on the linkage disequilibrium pattern between common tagging single-nucleotide polymorphisms (SNPs) (i.e., SNPs having only common alleles) and true causal variants, and association studies with rare SNP alleles aim to detect rare causal variants. To better understand and explain the findings from both types of studies and to provide clues to improve the power of an association study with only common SNPs genotyped, we study the correlation between common SNPs and the presence of rare alleles within a region in the genome and look at the capability of common SNPs in strong linkage disequilibrium with each other to capture single rare alleles. Our results indicate that common SNPs can, to some extent, tag the presence of rare alleles and that including SNPs in strong linkage disequilibrium with each other among the tagging SNPs helps to detect rare alleles.