Genome-wide association studies have identified polymorphisms associated with breast cancer subtypes and across multiple population subgroups; however, few studies to date have applied linkage analysis to other population groups.
We performed the first genome-wide breast cancer linkage analysis in 106 African American families (comprising 179 affected and 79 unaffected members) not known to be segregating BRCA mutations to search for novel breast cancer loci. We performed regression-based model-free multi-point linkage analyses of the sibling pairs using SIBPAL, and two-level Haseman-Elston linkage analyses of affected relative pairs using RELPAL.
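As a rough illustration of the regression idea behind Haseman-Elston linkage analysis, the sketch below regresses the squared trait difference of each sib pair on the estimated proportion of alleles the pair shares identical by descent (IBD); a negative slope is the classic signal of linkage. This is the textbook quantitative-trait form only. The function name and inputs are hypothetical, and the options actually used in SIBPAL and RELPAL (multipoint IBD estimation, binary traits, two-level models) are not reproduced here.

```python
def haseman_elston_slope(trait_pairs, ibd_proportions):
    """Ordinary least-squares fit of squared sib-pair trait differences
    on the estimated proportion of alleles shared IBD at a locus.

    trait_pairs: list of (trait_sib1, trait_sib2) tuples (hypothetical input)
    ibd_proportions: list of estimated IBD sharing proportions in [0, 1]
    Returns (slope, intercept); a negative slope suggests linkage.
    """
    y = [(a - b) ** 2 for a, b in trait_pairs]
    x = list(ibd_proportions)
    n = len(y)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("IBD sharing does not vary across pairs")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept
```

In practice the slope would be tested one-sided against zero, and the standard error would need to account for non-independent pairs within a family.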
We identified −log10p-values that exceed 4 on chromosomes 3q and 12q, as well as a region near BRCA1 on chromosome 17 (−log10p-values in the range of 3.0-3.2) using both sibling-based and relative-based methods. The latter observation may suggest that undetected BRCA1 mutations or other mutations nearby such as HOXB13 may be segregating in our sample.
In summary, these results suggest novel putative regions harboring risk alleles in African Americans that deserve further study.
We hope that our study will spur further family-based investigation into specific mechanisms for breast cancer disparities.
breast cancer; family; linkage study; African American
We tested for germline variants showing association to colon cancer metastasis using a genome-wide association study that compared Ashkenazi Jewish individuals with stage IV metastatic colon cancers versus those with stage I or II non-metastatic colon cancers. In a two-stage study design, we demonstrated significant association to developing metastatic disease for rs60745952, which in Ashkenazi discovery and validation cohorts, respectively, showed an odds ratio (OR) = 2.3 (P = 2.73E-06) and OR = 1.89 (P = 8.05E-04) (exceeding validation threshold of 0.0044). Significant association to metastatic colon cancer was further confirmed by a meta-analysis of rs60745952 in these datasets plus an additional Ashkenazi validation cohort (OR = 1.92; 95% CI: 1.28–2.87), and by a permutation test that demonstrated a significantly longer haplotype surrounding rs60745952 in the stage IV samples. rs60745952, located in an intergenic region on chromosome 4q31.1, and not previously associated with cancer, is, thus, a germline genetic marker for susceptibility to developing colon cancer metastases among Ashkenazi Jews.
For a family-based sample, the phenotypic variance-covariance matrix can be parameterized to include the variance of a polygenic effect that has then been estimated using a variance component analysis. However, with the advent of large-scale genomic data, the genetic relationship matrix (GRM) can be estimated and can be utilized to parameterize the variance of a polygenic effect for population-based samples. Therefore narrow sense heritability, which is both population and trait specific, can be estimated with both population- and family-based samples. In this study we estimate heritability from both family-based and population-based samples, collected in Korea, and the heritability estimates from the pooled samples were, for height, 0.60; body mass index (BMI), 0.32; log-transformed triglycerides (log TG), 0.24; total cholesterol (TCHL), 0.30; high-density lipoprotein (HDL), 0.38; low-density lipoprotein (LDL), 0.29; systolic blood pressure (SBP), 0.23; and diastolic blood pressure (DBP), 0.24. Furthermore, we found differences in how heritability is estimated—in particular the amount of variance attributable to common environment in twins can be substantial—which indicates heritability estimates should be interpreted with caution.
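To make the role of the GRM concrete, the following sketch computes one widely used estimator from SNP allele counts: each entry averages, over SNPs, the product of two individuals' centered genotypes scaled by the expected genotype variance 2p(1 - p). The abstract does not state which estimator was used, so this form, the function name, and the input layout are assumptions; the variance-component fit that turns the GRM into a heritability estimate is not shown.

```python
def genetic_relationship_matrix(genotypes):
    """Estimate a GRM from allele counts.

    genotypes[j][i] is the count (0, 1 or 2) of the reference allele
    carried by individual j at SNP i (hypothetical input layout).
    """
    n = len(genotypes)
    m = len(genotypes[0])
    # Sample allele frequency at each SNP.
    freqs = [sum(row[i] for row in genotypes) / (2 * n) for i in range(m)]
    # Monomorphic SNPs have zero variance and cannot be standardized.
    used = [i for i in range(m) if 0.0 < freqs[i] < 1.0]
    if not used:
        raise ValueError("no polymorphic SNPs available")
    grm = [[0.0] * n for _ in range(n)]
    for j in range(n):
        for k in range(j, n):
            total = 0.0
            for i in used:
                p = freqs[i]
                total += ((genotypes[j][i] - 2 * p)
                          * (genotypes[k][i] - 2 * p)
                          / (2 * p * (1 - p)))
            grm[j][k] = grm[k][j] = total / len(used)
    return grm
```

Because genotypes are centered with sample allele frequencies, every row of the resulting matrix sums to zero, which is a convenient sanity check.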
The growing advances in DNA sequencing tools have made analyzing the human genome cheaper and faster. While such analyses are intended to identify complex variants, related to disease susceptibility and efficacy of drug responses, they have blurred the definitions of mutation and polymorphism.
In the era of personal genomics, it is critical to establish clear guidelines regarding the use of a reference genome. Nowadays DNA variants are called as differences in comparison to a reference. In a sequencing project Single Nucleotide Polymorphisms (SNPs) and DNA mutations are defined as DNA variants detectable in >1 % or <1 % of the population, respectively. The alternative use of the two terms mutation or polymorphism for the same event (a difference as compared with a reference) can lead to problems of classification. These problems can impact the accuracy of the interpretation and the functional relationship between a disease state and a genomic sequence.
We propose to solve this nomenclature dilemma by defining mutations as DNA variants obtained in a paired sequencing project including the germline DNA of the same individual as a reference. Moreover, the term mutation should be accompanied by a qualifying prefix indicating whether the mutation occurs only in somatic cells (somatic mutation) or also in the germline (germline mutation). We believe this distinction in definition will help avoid confusion among researchers and support the practice of sequencing the germline and somatic tissues in parallel to classify the DNA variants thus defined as mutations.
Personal genomics; Precision medicine; DNA sequencing; DNA variants; Human genome
Olson's conditional-logistic model retains the nice property of the LOD score formulation and has advantages over other methods that make it an appropriate choice for complex trait linkage mapping. However, the asymptotic distribution of the conditional-logistic likelihood-ratio (CL-LR) statistic with genetic constraints on the model parameters is unknown for some analysis models, even in the case of samples comprising only independent sib pairs. We derive approximations to the asymptotic null distributions of the CL-LR statistics and compare them with the empirical null distributions by simulation using independent affected sib pairs. Generally, the empirical null distributions of the CL-LR statistics match well the known or approximated asymptotic distributions for all analysis models considered except for the covariate model with a minimum-adjusted binary covariate. This work will provide useful guidelines for linkage analysis of real data sets for the genetic analysis of complex traits, thereby contributing to the identification of genes for disease traits.
linkage analysis; affected sib pairs; identity-by-descent; conditional-logistic model; genetic constraints; null distribution; likelihood-ratio statistics
A novel web-based tool PedWiz that pipelines the informatics process for pedigree data is introduced. PedWiz is designed to assist researchers in the analysis of pedigree data. It provides a convenient tool for pedigree informatics: descriptive statistics, relative pairs, genetic similarity coefficients, the variance-covariance matrix for three estimated coefficients of allele identical-by-descent sharing as well as mean allele sharing, a plot of the pedigree structures, and a visualization of the identity coefficients. With a renewed interest in linkage and other family based methods, PedWiz will be a valuable tool for the analysis of family data.
pedigree; informatics; genetic similarity; identity-by-descent; relative pairs; family data
genome-wide association; family studies; study designs; genetic factors; environmental factors
Current genome-wide association studies still heavily rely on a single-marker strategy, in which each single nucleotide polymorphism (SNP) is tested individually for association with a phenotype. Although methods and software packages that consider multimarker models have become available, they have been slow to become widely adopted and their efficacy in real data analysis is often questioned. Based on conducting extensive simulations, here we endeavor to provide more insights into the performance of simple multimarker association tests as compared to single-marker tests. The results reveal the power advantage as well as disadvantage of the two- vs. the single-marker test. Power differentials depend on the correlation structure among tag SNPs, as well as that between tag SNPs and causal variants. A two-marker test has relatively better performance than single-marker tests when the correlation of the two adjacent markers is high. However, using HapMap data, two-marker tests tended to have a greater chance of being less powerful than single-marker tests, due to constraints on the number of actual possible haplotypes in the HapMap data. Yet, the average power difference was small whenever the one-marker test is more powerful, while there were many situations where the two-marker test can be much more powerful. These findings can be useful to guide analyses of future studies.
Asymptotic power; single-marker test; two-marker test; genome-wide association
The potential importance of the joint action of genes, whether modeled with or without a statistical interaction term, has long been recognized. However, identifying such action has been a great challenge, especially when millions of genetic markers are involved. We propose a likelihood ratio-based Mann-Whitney test to search for joint gene action either among candidate genes or genome-wide. It extends the traditional univariate Mann-Whitney test to assess the joint association of genotypes at multiple loci with disease, allowing for high-order statistical interactions. Because only one overall significance test is conducted for the entire analysis, it avoids the issue of multiple testing. Moreover, the approach adopts a computationally efficient algorithm, making a genome-wide search feasible in a reasonable amount of time on a high performance personal computer. We evaluated the approach using both theoretical and real data. By applying the approach to 40 type 2 diabetes (T2D) susceptibility single-nucleotide polymorphisms (SNPs), we identified a four-locus model strongly associated with T2D in the Wellcome Trust (WT) study (permutation P-value < 0.001), and replicated the same finding in the Nurses’ Health Study/Health Professionals Follow-Up Study (NHS/HPFS) (P-value = 3.03 × 10^−11). We also conducted a genome-wide search on 385,598 SNPs in the WT study. The analysis took approximately 55 hr on a personal computer, identifying the same first two loci, but overall a different set of four SNPs, jointly associated with T2D (P-value = 1.29 × 10^−5). The nominal significance of this same association reached 4.01 × 10^−6 in the NHS/HPFS.
gene-gene interaction; genome-wide search; forward selection
Longitudinal data enables detecting the effect of aging/time, and as a repeated measures design is statistically more efficient compared to cross-sectional data if the correlations between repeated measurements are not large. In particular, when genotyping cost is more expensive than phenotyping cost, the collection of longitudinal data can be an efficient strategy for genetic association analysis. However, in spite of these advantages, genome-wide association studies (GWAS) with longitudinal data have rarely been analyzed taking this into account. In this report, we calculate the required sample size to achieve 80% power at the genome-wide significance level for both longitudinal and cross-sectional data, and compare their statistical efficiency. Furthermore, we analyzed the GWAS of eight phenotypes with three observations on each individual in the Korean Association Resource (KARE). A linear mixed model allowing for the correlations between observations for each individual was applied to analyze the longitudinal data, and linear regression was used to analyze the first observation on each individual as cross-sectional data. We found 12 novel genome-wide significant disease susceptibility loci that were then confirmed in the Health Examinee (HEXA) cohort, as well as some significant interactions between age/sex and SNPs.
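The sample-size comparison can be sketched with a standard normal approximation. The version below is not the calculation from the report; it assumes a quantitative trait, a SNP that explains a fixed fraction of trait variance, a two-sided test at the conventional genome-wide level of 5e-8, and exchangeable (equal) correlation between the repeated residuals of an individual. Under those assumptions, averaging m measurements scales the required number of individuals by (1 + (m - 1)ρ)/m. All parameter names are hypothetical.

```python
import math
from statistics import NormalDist


def required_sample_size(var_explained, alpha=5e-8, power=0.80,
                         n_measures=1, correlation=0.0):
    """Approximate number of individuals needed to detect a SNP.

    var_explained: fraction of single-measurement trait variance
                   explained by the SNP (assumed known)
    n_measures:    repeated measurements per individual (1 = cross-sectional)
    correlation:   assumed common correlation between repeated residuals
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)   # two-sided critical value
    z_power = z.inv_cdf(power)
    # Non-centrality needed by the 1-df test (normal approximation).
    ncp = (z_alpha + z_power) ** 2
    n_cross = ncp * (1 - var_explained) / var_explained
    # Variance of an individual's mean residual relative to one measurement.
    design_factor = (1 + (n_measures - 1) * correlation) / n_measures
    return math.ceil(n_cross * design_factor)
```

For example, `required_sample_size(0.01, n_measures=3, correlation=0.5)` needs two thirds as many individuals as the cross-sectional design, while a correlation of 1 gives no saving, in line with the statement that the gain shrinks as the correlation grows.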
longitudinal data; cross-sectional data; Korean Association Resource (KARE) cohort; Health Examinee (HEXA) cohort
A predictive joint shared parameter model is proposed for discrete time-to-event and longitudinal data. A discrete survival model with frailty and a generalized linear mixed model for the longitudinal data are joined to predict the probability of events. This joint model focuses on predicting discrete time-to-event outcome, taking advantage of repeated measurements. We show that the probability of an event in a time window can be more precisely predicted by incorporating the longitudinal measurements. The model was investigated by comparison with a two-step model and a discrete time survival model. Results from both a study on the occurrence of tuberculosis and simulated data show that the joint model is superior to the other models in discrimination ability, especially as the latent variables related to both survival times and the longitudinal measurements depart from 0.
joint modeling; discrete time-to-event; longitudinal; nonlinear; biomarker; Tuberculosis, immunology
High blood pressure (BP) is the most common cardiovascular risk factor worldwide and a major contributor to heart disease and stroke. We previously discovered a BP-associated missense SNP (single nucleotide polymorphism)–rs2272996–in the gene encoding vanin-1, a glycosylphosphatidylinositol (GPI)-anchored membrane pantetheinase. In the present study, we first replicated the association of rs2272996 and BP traits with a total sample size of nearly 30,000 individuals from the Continental Origins and Genetic Epidemiology Network (COGENT) of African Americans (P = 0.01). This association was further validated using patient plasma samples; we observed that the N131S mutation is associated with significantly lower plasma vanin-1 protein levels. We observed that the N131S vanin-1 is subjected to rapid endoplasmic reticulum-associated degradation (ERAD) as the underlying mechanism for its reduction. Using HEK293 cells stably expressing vanin-1 variants, we showed that N131S vanin-1 was degraded significantly faster than wild type (WT) vanin-1. Consequently, there were only minimal quantities of variant vanin-1 present on the plasma membrane and greatly reduced pantetheinase activity. Application of MG-132, a proteasome inhibitor, resulted in accumulation of ubiquitinated variant protein. A further experiment demonstrated that atenolol and diltiazem, two current drugs for treating hypertension, reduce the vanin-1 protein level. Our study provides strong biological evidence for the association of the identified SNP with BP and suggests that vanin-1 misfolding and degradation are the underlying molecular mechanism.
Hypertension (HTN) or high blood pressure (BP) is common worldwide and a major risk factor for cardiovascular disease and all-cause mortality. Identification of genetic variants of consequence for HTN serves as the molecular basis for its treatment. Using admixture mapping analysis of the Family Blood Pressure Program data, we recently identified that the VNN1 gene (encoding the protein vanin-1), in particular SNP rs2272996 (N131S), was associated with BP in both African Americans and Mexican Americans. Vanin-1 was reported to act as an oxidative stress sensor using its pantetheinase enzyme activity. Because a linkage between oxidative stress and HTN has been hypothesized for many years, vanin-1's pantetheinase activity offers a physiologic rationale for BP regulation. Here, we first replicated the association of rs2272996 with BP in the Continental Origins and Genetic Epidemiology Network (COGENT), which included nearly 30,000 African Americans. We further demonstrated that the N131S mutation in vanin-1 leads to its rapid degradation in cells, resulting in loss of function on the plasma membrane. The loss of function of vanin-1 is associated with reduced BP. Therefore, our results indicate that vanin-1 is a new candidate to be manipulated to ameliorate HTN.
Translation of results from genetic findings to inform medical practice is a highly anticipated goal of human genetics. The aim of this paper is to review and discuss the role of genetics in medically-relevant prediction. Germline genetics presages disease onset and therefore can contribute prognostic signals that augment laboratory tests and clinical features. As such, the impact of genetic-based predictive models on clinical decisions and therapy choice could be profound. However, given that (i) medical traits result from a complex interplay between genetic and environmental factors, (ii) the underlying genetic architectures for susceptibility to common diseases are not well-understood, and (iii) replicable susceptibility alleles, in combination, account for only a moderate amount of disease heritability, there are substantial challenges to constructing and implementing genetic risk prediction models with high utility. In spite of these challenges, concerted progress has continued in this area with an ongoing accumulation of studies that identify disease predisposing genotypes. Several statistical approaches with the aim of predicting disease have been published. Here we summarize the current state of disease susceptibility mapping and pharmacogenetics efforts for risk prediction, describe methods used to construct and evaluate genetic-based predictive models, and discuss applications.
predictive model; genetic risk; human genetics; prognosis; clinical utility
In case-control Single Nucleotide Polymorphism (SNP) data, the Allele frequency, Hardy Weinberg Disequilibrium (HWD) and Linkage Disequilibrium (LD) contrast tests are three distinct sources of information about genetic association. While all three tests are typically developed in a retrospective context, we show that prospective logistic regression models may be developed that correspond conceptually to the retrospective tests. This approach provides a flexible framework for conducting a systematic series of association analyses using unphased genotype data and any number of covariates. For a single stage study, two single-marker tests and four two-marker tests are discussed. The true association models are derived and they allow us to understand why a model with only a linear term will generally fit well for a SNP in weak LD with a causal SNP, whatever the disease model, but not for a SNP in high LD with a non-additive disease SNP. We investigate the power of the association tests using real LD parameters from chromosome 11 in the HapMap CEU population data. Among the single-marker tests, the allelic test has on average the most power in the case of an additive disease; but, for dominant, recessive and heterozygote disadvantage diseases, the genotypic test has the most power. Among the six two-marker tests, the Allelic-LD contrast test, which incorporates linear terms for two markers and their interaction term, provides the most reliable power overall for the cases studied. Therefore, our result supports incorporating an interaction term as well as linear terms in multi-marker tests.
Allele frequency contrast test; LD contrast test; HWD contrast test; Genome-wide Association
Multiple substance dependence (MSD) trait comorbidity is common, and MSD patients are often severely affected clinically. While shared genetic risks have been documented, so far there has been no published report using the linkage scan approach to survey risk loci for MSD as a phenotype. A total of 1,758 individuals in 739 families [384 African American (AA) and 355 European American (EA) families] ascertained via affected sib-pairs with cocaine or opioid or alcohol dependence were genotyped using an array-based linkage panel of single-nucleotide polymorphism markers. Fuzzy clustering analysis was conducted on individuals with alcohol, cannabis, cocaine, opioid, and nicotine dependence for AAs and EAs separately, and linkage scans were conducted for the output membership coefficients using Merlin-regression. In EAs, we observed an autosome-wide significant linkage signal on chromosome 4 (peak lod = 3.31 at 68.3 cM; empirical autosome-wide P = 0.038), and a suggestive linkage signal on chromosome 21 (peak lod = 2.37 at 19.4 cM). In AAs, four suggestive linkage peaks were observed: two peaks on chromosome 10 (lod = 2.66 at 96.7 cM and lod = 3.02 at 147.6 cM) and the other two on chromosomes 3 (lod = 2.81 at 145.5 cM) and 9 (lod = 1.93 at 146.8 cM). Three particularly promising candidate genes, GABRA4, GABRB1, and CLOCK, are located within or very close to the autosome-wide significant linkage region for EAs on chromosome 4. This is the first linkage evidence supporting existence of genetic loci influencing risk for several comorbid disorders simultaneously in two major US populations.
comorbidity; multiple substance dependence; fuzzy clustering; chromosome 4
The possible evidence for association comprises three types of information: differences between cases and controls in allele frequencies, in parameters for Hardy Weinberg disequilibrium (HWD), and in parameters for linkage disequilibrium (LD). LD between marker and disease alleles results in a difference in at least one of the three types of parameters [Won and Elston, 2008]. However, the parameters for LD require knowledge about phase, which is usually unknown, making the LD contrast test without modification infeasible in practice. Methods for handling phase uncertainty are: (1) the most probable haplotype pair for each individual can be considered as the true phase; (2) a weighted average of haplotypes can be used; (3) we can consider the composite LD, which does not require any information about phase. We compare these methods to handle phase uncertainty in terms of validity and efficiency, and the effect on them of HWD in the population, at the same time confirming results for the three types of information. When the LD between markers is high, the LD contrast test that uses a weighted average of haplotypes or the most probable haplotypes to calculate the LD is recommended, but otherwise the LD contrast test that uses the composite LD is recommended. We conclude that, even though the difference in allele frequencies is usually the most informative test except in the case of a recessive disease, the LD contrast test can be more powerful if the markers are dense enough.
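Two of the quantities discussed here can be estimated directly from unphased genotypes, which the following sketch illustrates. It computes the HWD coefficient at one marker, D_A = P(AA) - p_A^2, and the composite LD between two markers, which equals half the covariance of the two allele counts and therefore needs no phase information. The function names are hypothetical, and the contrast tests themselves (which compare such estimates between cases and controls) are not implemented.

```python
def allele_frequency(genotypes):
    """Frequency of the counted allele; genotypes are coded 0, 1 or 2."""
    return sum(genotypes) / (2 * len(genotypes))


def hwd_coefficient(genotypes):
    """Hardy-Weinberg disequilibrium coefficient D_A = P(AA) - p_A**2."""
    n = len(genotypes)
    p = allele_frequency(genotypes)
    homozygote_freq = sum(1 for g in genotypes if g == 2) / n
    return homozygote_freq - p * p


def composite_ld(genotypes_a, genotypes_b):
    """Composite LD between two markers from unphased genotypes.

    Equals half the (population-form) covariance of the allele counts,
    i.e. the sum of the within-gamete and between-gamete disequilibria.
    """
    n = len(genotypes_a)
    mean_a = sum(genotypes_a) / n
    mean_b = sum(genotypes_b) / n
    cov = sum((a - mean_a) * (b - mean_b)
              for a, b in zip(genotypes_a, genotypes_b)) / n
    return cov / 2
```

A contrast statistic would then be built from the difference between the case and control estimates, together with an estimate of its variance.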
linkage disequilibrium; haplotype phase; self replication
Estimated glomerular filtration rate (eGFR), a measure of kidney function, is heritable, suggesting that genes influence renal function. Genes that influence eGFR have been identified through genome-wide association studies. However, family-based linkage approaches may identify loci that explain a larger proportion of the heritability. This study used genome-wide linkage and association scans to identify quantitative trait loci (QTL) that influence eGFR.
Genome-wide linkage and sparse association scans of eGFR were performed in families ascertained by probands with advanced diabetic nephropathy (DN) from the multi-ethnic Family Investigation of Nephropathy and Diabetes (FIND) study. This study included 954 African Americans (AA), 781 American Indians (AI), 614 European Americans (EA) and 1,611 Mexican Americans (MA). A total of 3,960 FIND participants were genotyped for 6,000 single nucleotide polymorphisms (SNPs) using the Illumina Linkage IVb panel. GFR was estimated by the Modification of Diet in Renal Disease (MDRD) formula.
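For readers unfamiliar with the MDRD estimate, a minimal sketch of the four-variable equation follows. The text does not say which version of the equation was applied; the constant 175 used here is the one for creatinine assays standardized to IDMS reference methods, whereas the original publication of the equation used 186, so the constant should be treated as an assumption. Function and argument names are hypothetical.

```python
def egfr_mdrd(serum_creatinine_mg_dl, age_years, female, black):
    """Four-variable MDRD estimate of GFR in mL/min/1.73 m^2.

    Assumes the IDMS-traceable constant 175; replace it with 186 for
    creatinine values that are not standardized to IDMS.
    """
    egfr = 175.0 * serum_creatinine_mg_dl ** -1.154 * age_years ** -0.203
    if female:
        egfr *= 0.742
    if black:
        egfr *= 1.212
    return egfr
```

Because the estimate depends on age, sex and ancestry by construction, covariate adjustment in a linkage or association model needs to be chosen with that in mind.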
The non-parametric linkage analysis, accounting for the effects of diabetes duration and BMI, identified the strongest evidence for linkage of eGFR on chromosome 20q11 (log of the odds [LOD] = 3.34; P = 4.4×10^−5) in MA and chromosome 15q12 (LOD = 2.84; P = 1.5×10^−4) in EA. In all subjects, the strongest linkage signal for eGFR was detected on chromosome 10p12 (P = 5.5×10^−4) at 44 cM near marker rs1339048. A subsequent association scan in both ancestry-specific groups and the entire population identified several SNPs significantly associated with eGFR across the genome.
The present study describes the localization of QTL influencing eGFR on 20q11 in MA, 15q12 in EA and 10p12 in the combined ethnic groups participating in the FIND study. Identification of causal genes/variants influencing eGFR, within these linkage and association loci, will open new avenues for functional analyses and development of novel diagnostic markers for DN.
Translation studies have been initiated to assess the combined effect of genetic loci from recently accomplished genome-wide association studies and the existing risk factors for early disease prediction. We propose a bagging optimal receiver operating characteristic (ROC) curve method to facilitate this research. Through simulation and real data application, we compared the new method with the commonly used allele counting method and logistic regression, and found that the new method yields a better performance. The new method was applied on the Wellcome Trust data set to form a predictive genetic test for rheumatoid arthritis. The formed test reached an area under the curve (AUC) value of 0.7.
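The comparator method and the evaluation metric mentioned above are simple enough to sketch. The code below shows the allele counting score (the number of risk alleles a person carries across the selected loci) and an empirical AUC computed as the proportion of case-control pairs in which the case has the higher score, with ties counted as one half. It does not implement the proposed bagging optimal ROC curve method, and the function names and genotype coding are assumptions.

```python
def allele_count_score(risk_allele_counts):
    """Unweighted genetic score: total number of risk alleles carried.

    risk_allele_counts: per-locus counts (0, 1 or 2) for one person.
    """
    return sum(risk_allele_counts)


def empirical_auc(case_scores, control_scores):
    """Area under the ROC curve via the Mann-Whitney pair-counting form."""
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5   # ties contribute half a concordant pair
    return wins / (len(case_scores) * len(control_scores))
```

An AUC estimated on the same data used to build a test is optimistic, so in practice it would be computed on held-out samples or by cross-validation.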
Area under the ROC curve; Bootstrap aggregation; Gene–gene interaction; Genomewide association studies
Fisher was the first to suggest a method of combining the p-values obtained from several statistics and many other methods have been proposed since then. However, there is no agreement about what is the best method. Motivated by a situation that now often arises in genetic epidemiology, we consider the problem when it is possible to define a simple alternative hypothesis of interest for which the expected effect size of each test statistic is known and we determine the most powerful test for this simple alternative hypothesis. Based on the proposed method, we show that information about the effect sizes can be used to obtain the best weights for Liptak’s method of combining p-values. We present extensive simulation results comparing methods of combining p-values and illustrate for a real example in genetic epidemiology how information about effect sizes can be deduced.
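The two combination rules named above can be written in a few lines, which may help in comparing them. The sketch assumes independent one-sided p-values strictly between 0 and 1. Fisher's statistic, -2 Σ ln p, is referred to a chi-square distribution with 2k degrees of freedom, and Liptak's method sums weighted normal scores. The weights are supplied by the caller; how to derive them from expected effect sizes is the subject of the paper and is not reproduced here. Function names are hypothetical.

```python
import math
from statistics import NormalDist


def fisher_combined_p(p_values):
    """Fisher's method for k independent p-values."""
    k = len(p_values)
    half_stat = -sum(math.log(p) for p in p_values)   # statistic / 2
    # Chi-square survival function with 2k df has a closed form:
    # exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return math.exp(-half_stat) * total


def liptak_combined_p(p_values, weights=None):
    """Liptak's weighted normal-score method for independent p-values."""
    std_normal = NormalDist()
    if weights is None:
        weights = [1.0] * len(p_values)   # equal weights by default
    scores = [std_normal.inv_cdf(1.0 - p) for p in p_values]
    numerator = sum(w * s for w, s in zip(weights, scores))
    denominator = math.sqrt(sum(w * w for w in weights))
    return 1.0 - std_normal.cdf(numerator / denominator)
```

With a single p-value both functions return that p-value unchanged, which gives a quick check that the implementation is wired up correctly.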
Fisher; Liptak; effect size
This paper is concerned with evaluating whether an interaction between two sets of risk factors for a binary trait is removable and fitting a parsimonious additive model using a suitable link function to estimate the disease odds (on the natural logarithm scale) when an interaction is removable. Statisticians define the term “interaction” as a departure from additivity in a linear model on a specific scale on which the data are measured. Certain interactions may be eliminated via a transformation of the outcome such that the relationship between the risk factors and the outcome is additive on the transformed scale. Such interactions are known as removable interactions. We develop a novel test statistic for detecting the presence of a removable interaction in case-control studies. We consider the Guerrero and Johnson family of transformations and show that this family constitutes an appropriate link function for fitting an additive model when an interaction is removable. We use simulation studies to examine the type I error and power of the proposed test and to show that an additive model based on the Guerrero and Johnson link function leads to more precise estimates of the disease odds parameters and a better fit when an interaction is removable. The proposed test and use of the transformation are illustrated using case-control data from three published studies. Finally, we indicate how one can check that, after transformation, no further interaction is significant.
Analysis of variance; curvature; independence; interaction effect; link function; main effect; residuals; score statistic; Tukey’s test; transformation; unbalanced data
We investigated the heritability and familial aggregation of various indexes of arterial stiffness and wave reflection and we partitioned the phenotypic correlation between these traits into shared genetic and environmental components.
Using a family-based population sample, we recruited 204 parents (mean age, 51.7 years) and 290 offspring (29.4 years) from the population in Cracow, Poland (62 families), Hechtel-Eksel, Belgium (36), and Pilsen, the Czech Republic (50). We measured peripheral pulse pressure (PPp) sphygmomanometrically at the brachial artery; central pulse pressure (PPc), the peripheral augmentation indexes (PAIxs) and central augmentation indexes (CAIxs) by applanation tonometry at the radial artery; and aortic pulse wave velocity (PWV) by tonometry or ultrasound. In multivariate-adjusted analyses, we used the ASSOC and PROC GENMOD procedures as implemented in SAGE and SAS, respectively.
We found significant heritability for PAIx, CAIx, PPc and mean arterial pressure ranging from 0.37 to 0.41; P ≤ 0.0001. The method of intrafamilial concordance confirmed these results; intrafamilial correlation coefficients were significant for all arterial indexes (r ≥ 0.12; P ≤ 0.02) with the exception of PPc (r = −0.007; P = 0.90) in parent–offspring pairs. The sib–sib correlations were also significant for CAIx (r = 0.22; P = 0.001). The genetic correlations between PWV and the other arterial indexes were significant (ρG ≥ 0.29; P < 0.0001). The corresponding environmental correlations were only significantly positive for PPp (ρE = 0.10, P = 0.03).
The observation of significant intrafamilial concordance and heritability of various indexes of arterial stiffness as well as the genetic correlations among arterial phenotypes strongly support the search for shared genetic determinants underlying these traits.
arterial stiffness; familial aggregation; heritability; pulse pressure; systolic augmentation
15-Hydroxyprostaglandin dehydrogenase (15-PGDH) is a metabolic antagonist of COX-2, catalyzing the degradation of inflammation mediator prostaglandin E2 (PGE2) and other prostanoids. Recent studies have established the 15-PGDH gene as a colon cancer suppressor.
We evaluated 15-PGDH as a colon cancer susceptibility locus in a three-stage design. We first genotyped 102 single-nucleotide polymorphisms (SNPs) in the 15-PGDH gene, spanning ∼50 kb up and down-stream of the coding region, in 464 colon cancer cases and 393 population controls. We then genotyped the same SNPs, and also assayed the expression levels of 15-PGDH in colon tissues from 69 independent patients for whom colon tissue and paired germline DNA samples were available. In stage 3, we genotyped the 9 most promising SNPs from stages 1 and 2 in an independent sample of 525 cases and 816 controls.
In the first two stages, three SNPs (rs1365611, rs6844282 and rs2332897) were statistically significant (p<0.05) in combined analysis of association with risk of colon cancer and of association with 15-PGDH expression, after adjustment for multiple testing. For one additional SNP, rs2555639, the T allele showed increased cancer risk and decreased 15-PGDH expression, but just missed statistical significance (p-adjusted = 0.063). In stage 3, rs2555639 alone showed evidence of association with an odds ratio (TT compared to CC) of 1.50 (95% CI = 1.05–2.15, p = 0.026).
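An odds ratio with its confidence interval, such as the TT versus CC comparison reported above, can be sketched from a 2×2 table using the Wald interval on the log scale. This is an unadjusted calculation shown for orientation only. The study's estimate may have come from a regression model, and the genotype counts are not given in the text, so any counts passed to the function are hypothetical.

```python
import math
from statistics import NormalDist


def odds_ratio_with_ci(exposed_cases, unexposed_cases,
                       exposed_controls, unexposed_controls, level=0.95):
    """Unadjusted odds ratio and Wald confidence interval from a 2x2 table.

    "Exposed" means carrying the genotype of interest (e.g. TT) and
    "unexposed" the reference genotype (e.g. CC). All counts must be > 0.
    """
    odds_ratio = ((exposed_cases * unexposed_controls)
                  / (unexposed_cases * exposed_controls))
    # Standard error of log(OR) from the four cell counts.
    se_log_or = math.sqrt(1 / exposed_cases + 1 / unexposed_cases
                          + 1 / exposed_controls + 1 / unexposed_controls)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper
```

An interval that excludes 1, as the reported 1.05 to 2.15 does, corresponds to a two-sided p-value below 0.05 for the same Wald test.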
Our data suggest that the rs2555639 T allele is associated with increased risk of colon cancer, and that carriers of this risk allele exhibit decreased expression of 15-PGDH in the colon.
Incorporating family data in genetic association studies has become increasingly appreciated, especially for its potential value in testing rare variants. We introduce here a variance-component based association test that can test multiple common or rare variants jointly using both family and unrelated samples.
The proposed approach implemented in our R package aggregates or collapses the information across a region based on genetic similarity instead of genotype scores, which avoids the power loss when the effects are in different directions or have different association strengths. The method is also able to effectively leverage the LD information in a region and it can produce a test statistic with an adaptively estimated number of degrees of freedom. Our method can readily allow for the adjustment of non-genetic contributions to the familial similarity, as well as multiple covariates.
We demonstrate through simulations that the proposed method achieves good performance in terms of Type I error control and statistical power. The method is implemented in the R package “fassoc”, which provides a useful tool for data analysis and exploration.
Association studies; Family data; Score test; Multi-marker test
It has been postulated that multiple-marker methods may have added ability, over single-marker methods, to detect genetic variants associated with disease. The Wellcome Trust Case Control Consortium (WTCCC) provided the first successful large genome-wide association studies (GWAS) which included single-marker association analyses for seven common complex diseases. Of those signals detected, only one was associated with coronary artery disease (CAD), and none were identified for hypertension (HTN). Our objective was to find additional genetic associations and pathways for cardiovascular disease by examining the WTCCC data for variants associated with CAD and HTN using two-marker testing methods. We applied two-marker association testing to the WTCCC dataset, which includes ~2,000 affected individuals with each disorder, and a shared pool of ~3,000 controls, all genotyped using Affymetrix GeneChip 500 K arrays. For CAD, we detected single nucleotide polymorphism (SNP) pairs in three genes showing genome-wide significance: HFE2, STK32B, and DIP2C. The most notable SNP pairs in a non-protein-coding region were at 9p21, a known major CAD-associated region. For HTN, we detected SNP pairs in five genes: GPR39, XRCC4, MYO6, ZFAT, and MACROD2. Four further associated SNP pair regions were at least 70 kb from any known gene. We have shown that novel, multiple-marker, statistical methods can be of use in finding variants in GWAS. We describe many new, associated variants for both CAD and HTN and describe their known genetic mechanisms.
The genetic etiology of complex human diseases has been commonly viewed as a process that involves multiple genetic variants, environmental factors, as well as their interactions. Statistical approaches, such as the multifactor dimensionality reduction (MDR) and generalized MDR (GMDR), have recently been proposed to test the joint association of multiple genetic variants with either dichotomous or continuous traits. In this paper, we propose a novel Forward U-Test to evaluate the combined effect of multiple loci on quantitative traits with consideration of gene-gene/gene-environment interactions. In this new approach, a U-Statistic-based forward algorithm is first used to select potential disease-susceptibility loci and then a weighted U statistic is used to test the joint association of the selected loci with the disease. Through a simulation study, we found the Forward U-Test outperformed GMDR in terms of greater power. Aside from that, our approach is less computationally intensive, making it feasible for high-dimensional gene-gene/gene-environment research. We illustrate our method with a real data application to Nicotine Dependence (ND), using three independent datasets from the Study of Addiction: Genetics and Environment. Our gene-gene interaction analysis of 155 SNPs in 67 candidate genes identified two SNPs, rs16969968 within gene CHRNA5 and rs1122530 within gene NTRK2, jointly associated with the level of ND (p-value = 5.31e-7). The association, which involves essential interaction, is replicated in two independent datasets with p-values of 1.08e-5 and 0.02, respectively. Our finding suggests that joint action may exist between the two gene products.
gene-gene interaction; Forward U-Test; Nicotine Dependence