genome-wide association; family studies; study designs; genetic factors; environmental factors
Longitudinal data enable detection of the effect of aging/time and, as a repeated-measures design, are statistically more efficient than cross-sectional data when the correlations between repeated measurements are not large. In particular, when genotyping is more expensive than phenotyping, collecting longitudinal data can be an efficient strategy for genetic association analysis. Despite these advantages, genome-wide association studies (GWAS) with longitudinal data have rarely been analyzed in a way that exploits the repeated measurements. In this report, we calculate the sample size required to achieve 80% power at the genome-wide significance level for both longitudinal and cross-sectional data, and compare their statistical efficiency. Furthermore, we analyze GWAS of eight phenotypes with three observations on each individual in the Korean Association Resource (KARE). A linear mixed model allowing for the correlation between observations on the same individual was applied to the longitudinal data, and linear regression was applied to the first observation on each individual as cross-sectional data. We found 12 novel genome-wide significant disease susceptibility loci that were then confirmed in the Health Examinee (HEXA) cohort, as well as some significant interactions between age/sex and SNPs.
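To make the efficiency argument concrete, here is a minimal sketch under assumptions that the abstract does not state: an additive SNP effect that is constant over time, an exchangeable (compound-symmetry) correlation `rho` among the `m` repeated measurements, and a normal-approximation sample-size formula. The function names are illustrative, and this is not the paper's own power calculation.

```python
from statistics import NormalDist

def n_cross_sectional(beta, maf, sigma2, alpha=5e-8, power=0.8):
    """Individuals needed to detect an additive SNP effect `beta` on a trait whose
    single-measurement residual variance is `sigma2`, one observation per person."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)   # two-sided critical value
    z_beta = std_normal.inv_cdf(power)
    var_g = 2 * maf * (1 - maf)                   # genotype variance under HWE
    return (z_alpha + z_beta) ** 2 * sigma2 / (beta ** 2 * var_g)

def design_effect(m, rho):
    """Variance of a person's mean over m exchangeable measurements,
    relative to a single measurement: (1 + (m - 1) * rho) / m."""
    return (1 + (m - 1) * rho) / m

def n_longitudinal(beta, maf, sigma2, m, rho, **kwargs):
    """Individuals needed when each person contributes m correlated measurements."""
    return n_cross_sectional(beta, maf, sigma2, **kwargs) * design_effect(m, rho)

n_cs = n_cross_sectional(beta=0.1, maf=0.3, sigma2=1.0)
n_long = n_longitudinal(beta=0.1, maf=0.3, sigma2=1.0, m=3, rho=0.5)
print(round(n_cs), round(n_long))
```

With `m = 3` and `rho = 0.5` the design effect is 2/3, so about one third fewer individuals need to be genotyped; as `rho` approaches 1 the advantage disappears, which corresponds to the condition that the correlations are not large.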
longitudinal data; cross-sectional data; Korean Association Resource (KARE) cohort; Health Examinee (HEXA) cohort
A predictive joint shared parameter model is proposed for discrete time-to-event and longitudinal data. A discrete survival model with frailty and a generalized linear mixed model for the longitudinal data are joined to predict the probability of events. This joint model focuses on predicting a discrete time-to-event outcome, taking advantage of repeated measurements. We show that the probability of an event in a time window can be predicted more precisely by incorporating the longitudinal measurements. We compared the joint model with a two-step model and a discrete-time survival model. Results from both a study of the occurrence of tuberculosis and simulated data show that the joint model is superior to the other models in discrimination ability, especially as the latent variables related to both survival times and the longitudinal measurements depart from 0.
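The quantity being predicted, the probability of an event within a time window, can be illustrated with a short sketch. It assumes a logit-link discrete-time hazard `h_k = logistic(alpha_k + b)`, where `b` stands in for the subject's latent variable (in a joint model, `b` would be predicted from the longitudinal measurements). The link, parameter names, and function are illustrative assumptions, not the paper's specification.

```python
import math

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

def window_event_probability(baseline_logits, b, t, w):
    """P(event in intervals t+1, ..., t+w | event-free through interval t).

    baseline_logits[k] is the assumed logit-scale baseline hazard for
    interval k+1, and b is the subject's shared latent variable (frailty).
    """
    survival = 1.0
    for k in range(t, t + w):
        hazard = logistic(baseline_logits[k] + b)  # discrete-time hazard
        survival *= 1.0 - hazard                   # remain event-free in interval k+1
    return 1.0 - survival

# Made-up values: a larger latent variable raises the predicted risk.
alphas = [-3.0, -2.8, -2.6, -2.4, -2.2]
print(window_event_probability(alphas, b=0.0, t=1, w=3))
print(window_event_probability(alphas, b=1.0, t=1, w=3))
```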
joint modeling; discrete time-to-event; longitudinal; nonlinear; biomarker; Tuberculosis, immunology
High blood pressure (BP) is the most common cardiovascular risk factor worldwide and a major contributor to heart disease and stroke. We previously discovered a BP-associated missense single nucleotide polymorphism (SNP), rs2272996, in the gene encoding vanin-1, a glycosylphosphatidylinositol (GPI)-anchored membrane pantetheinase. In the present study, we first replicated the association of rs2272996 with BP traits in nearly 30,000 African Americans from the Continental Origins and Genetic Epidemiology Network (COGENT) (P = 0.01). We further validated this association in patient plasma samples, observing that the N131S mutation is associated with significantly lower plasma vanin-1 protein levels. We found that N131S vanin-1 undergoes rapid endoplasmic reticulum-associated degradation (ERAD), which accounts for this reduction. In HEK293 cells stably expressing vanin-1 variants, N131S vanin-1 was degraded significantly faster than wild-type (WT) vanin-1. Consequently, only minimal quantities of the variant protein were present on the plasma membrane, and pantetheinase activity was greatly reduced. Treatment with MG-132, a proteasome inhibitor, resulted in accumulation of ubiquitinated variant protein. A further experiment demonstrated that atenolol and diltiazem, two drugs currently used to treat hypertension, reduce vanin-1 protein levels. Our study provides strong biological evidence for the association of the identified SNP with BP and suggests that vanin-1 misfolding and degradation are the underlying molecular mechanism.
Hypertension (HTN), or high blood pressure (BP), is common worldwide and a major risk factor for cardiovascular disease and all-cause mortality. Identifying genetic variants that influence HTN can provide a molecular basis for its treatment. Using admixture mapping analysis of the Family Blood Pressure Program data, we recently found that the VNN1 gene (encoding the protein vanin-1), in particular SNP rs2272996 (N131S), was associated with BP in both African Americans and Mexican Americans. Vanin-1 has been reported to act as an oxidative stress sensor through its pantetheinase enzyme activity. Because a link between oxidative stress and HTN has been hypothesized for many years, vanin-1's pantetheinase activity offers a physiologic rationale for its role in BP regulation. Here, we first replicated the association of rs2272996 with BP in the Continental Origins and Genetic Epidemiology Network (COGENT), which included nearly 30,000 African Americans. We further demonstrated that the N131S mutation in vanin-1 leads to its rapid degradation in cells, resulting in loss of function on the plasma membrane. The loss of function of vanin-1 is associated with reduced BP. Therefore, our results indicate that vanin-1 is a new candidate target for ameliorating HTN.
Translation of results from genetic findings to inform medical practice is a highly anticipated goal of human genetics. The aim of this paper is to review and discuss the role of genetics in medically relevant prediction. Germline genetics presages disease onset and therefore can contribute prognostic signals that augment laboratory tests and clinical features. As such, the impact of genetic-based predictive models on clinical decisions and therapy choice could be profound. However, given that (i) medical traits result from a complex interplay between genetic and environmental factors, (ii) the underlying genetic architectures for susceptibility to common diseases are not well understood, and (iii) replicable susceptibility alleles, in combination, account for only a moderate amount of disease heritability, there are substantial challenges to constructing and implementing genetic risk prediction models with high utility. In spite of these challenges, concerted progress has continued in this area, with an ongoing accumulation of studies that identify disease-predisposing genotypes. Several statistical approaches for predicting disease have been published. Here we summarize the current state of disease susceptibility mapping and pharmacogenetics efforts for risk prediction, describe methods used to construct and evaluate genetic-based predictive models, and discuss applications.
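As one concrete example of a genetic predictive score, the sketch below computes a weighted genetic risk score: risk-allele counts summed with weights equal to the log of each SNP's per-allele odds ratio. This is a common construction offered for illustration only; the review covers several approaches, and the function name and inputs are hypothetical.

```python
import math

def weighted_grs(risk_allele_counts, odds_ratios):
    """Weighted genetic risk score: sum of risk-allele counts (0, 1, 2)
    multiplied by the log per-allele odds ratio of each SNP."""
    if len(risk_allele_counts) != len(odds_ratios):
        raise ValueError("need exactly one odds ratio per SNP")
    return sum(count * math.log(odds_ratio)
               for count, odds_ratio in zip(risk_allele_counts, odds_ratios))

# Hypothetical person carrying 0, 1 and 2 risk alleles at three SNPs.
score = weighted_grs([0, 1, 2], [1.20, 1.50, 1.10])
# Log-odds score, and the odds multiplier it implies under a multiplicative model.
print(score, math.exp(score))
```

Setting every weight to 1 instead gives the simpler unweighted allele-counting score.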
predictive model; genetic risk; human genetics; prognosis; clinical utility
Background and purpose
Interstitial cystitis/bladder pain syndrome (IC/BPS) is relatively common and associated with severe pain, yet effective treatment remains elusive. Research has typically emphasized the bladder's role, but given the high prevalence of systemic comorbidities, the authors hypothesized a pathophysiologic role for the nervous system. This paper reports the methodology and approach used to study the nervous system in women with IC/BPS. The study compares the neurologic, urologic, gynecologic, autonomic, gastrointestinal, and psychological features of women with IC/BPS, their female relatives, women with myofascial pelvic pain (MPP), and healthy controls to elucidate the roles of central and peripheral processing.
Methods and results
In total, 228 women (76 IC/BPS, 76 MPP, 38 family members, and 38 healthy controls) will be recruited. Subjects undergo detailed screening, a structured neurologic examination of the limbs and pelvis, tender point examination, autonomic testing, electrogastrography, and assessment of comorbid functional dysautonomias. Interpreters are blinded to subject classification. Psychological and stress response characteristics are examined with assessments of stress, trauma history, general psychological function, and stress response quantification. As of December 2012, data collection has been completed for 25 healthy controls, 33 women with IC/BPS ± MPP, eight with MPP, and three family members. The recruitment rate is accelerating, and recruitment strategies emphasize maintaining and encouraging investigator participation in study science, internet advertising, and presentations to pelvic pain support groups.
The study represents a comprehensive, interdisciplinary approach to sampling the autonomic and psychophysiologic characteristics of women with IC/BPS. Despite divergent opinions on study methodologies based on specialty experience, the study has proven feasible to date, and the diversity of perspectives has proved to be one of its greatest strengths.
interstitial cystitis; bladder pain syndrome; autonomic nervous system; psychophysiology; pelvic pain; myofascial pain
Gene–gene interactions may contribute to the genetic variation underlying complex traits but have not always been fully taken into account. Statistical analyses that consider gene–gene interaction may increase the power to detect associations, especially for markers with low marginal effects, and may explain part of the “missing heritability.” Detecting pairwise and higher-order interactions genome-wide requires enormous computational power. Filtering pipelines increase computational speed by limiting the number of tests performed. We summarize existing filtering approaches for detecting epistasis, after distinguishing the purposes that motivate a search for epistasis. Statistical filtering includes quality control based on single-marker statistics, which avoids analyzing poor-quality and least informative data and limits the search space for interactions. Biological filtering includes targeting specific pathways and integrating databases of known biological and metabolic pathways, gene function ontologies, and protein–protein interactions. It is increasingly possible to target single-nucleotide polymorphisms with defined effects on gene expression, even when they do not lie within protein-coding genes. Filtering can improve the power of an interaction association study, but it also increases the chance of missing important findings.
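A statistical filter of the kind described above can be sketched in a few lines: SNPs are retained only if their single-marker p-value passes a threshold, and only pairs of retained SNPs are tested for interaction. The threshold, SNP names, and function names are hypothetical. The sketch shows both the reduction in the number of tests and the cost: a pair whose members have weak marginal effects is never examined.

```python
from itertools import combinations

def n_pairs(n_snps):
    """Number of pairwise interaction tests among n_snps markers."""
    return n_snps * (n_snps - 1) // 2

def filter_then_pair(marginal_p, threshold=1e-3):
    """Statistical filter: keep SNPs whose single-marker p-value is below
    `threshold`, then list the SNP pairs to be tested for interaction."""
    kept = sorted(snp for snp, p in marginal_p.items() if p < threshold)
    return kept, list(combinations(kept, 2))

# Hypothetical single-marker p-values.
p_values = {"rs1": 2e-4, "rs2": 0.30, "rs3": 5e-6, "rs4": 8e-4, "rs5": 0.04}
kept, pairs = filter_then_pair(p_values)
print(kept)   # ['rs1', 'rs3', 'rs4']
print(pairs)  # [('rs1', 'rs3'), ('rs1', 'rs4'), ('rs3', 'rs4')]
```

For scale, all pairs among 500,000 SNPs number about 1.25 × 10¹¹, whereas 1,000 retained SNPs give 499,500 pairs.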
epistasis; genetic interaction; biological interaction; filtering pipeline; optimal search
Interstitial cystitis/bladder pain syndrome (IC/BPS) is characterized by urinary urgency, frequency, nocturia, and pain that worsens as the bladder fills and improves after emptying. These features might suggest abnormal autonomic bladder control mechanisms. We compared the structural integrity of the autonomic nervous system (ANS) in IC/BPS and control subjects.
This IRB-approved study at University Hospitals Case Medical Center, Cleveland, OH, evaluated the structural integrity of the ANS in adult females. Testing included the cardiovascular response to deep breathing, the Valsalva maneuver, a 30-min head-up tilt, and a sudomotor test.
Differences in ANS integrity between IC/BPS subjects and controls were determined using a modified Composite Autonomic Severity Score (CASS) that includes sudomotor, adrenergic, and cardiovascular indices. Baseline heart rate (HR) and HRs from each of three 10-min upright segments of a tilt test were compared, and trend analyses were performed using t tests. Healthy and IC/BPS subjects were demographically similar. The two groups did not differ in modified CASS scores, but IC/BPS subjects showed a trend toward elevated average peak heart rate at baseline (supine; p = 0.057) before the tilt test. The baseline difference was maintained at each interval during the tilt, with nearly identical slopes across intervals. Because this is a preliminary report, the sample size is small, and important differences may not have been detected.
The findings show no structural ANS abnormalities in IC/BPS subjects. Higher baseline HR supports the concept of functional rather than structural change in the ANS, such as abnormality of sympathetic/parasympathetic balance that will require further evaluation.
Interstitial cystitis/bladder pain syndrome; Pelvic pain; Autonomic nervous system
Although familial susceptibility to glioma is known, the genetic basis for this susceptibility remains unidentified in the majority of glioma-specific families. An alternative approach to identifying such genes is to examine cancer pedigrees, which include glioma as one of several cancer phenotypes, to determine whether common chromosomal modifications might account for the familial aggregation of glioma and other cancers.
Germline rearrangements in 146 glioma families (from the Gliogene Consortium; http://www.gliogene.org/) were examined using multiplex ligation-dependent probe amplification. Each family had either at least 2 verified glioma cases plus a third reported or verified glioma case, or 2 glioma cases plus at least one family member affected with melanoma, colon cancer, or breast cancer. The genomic regions covering TP53, CDKN2A, MLH1, and MSH2 were selected because these genes have previously been reported to be associated with cancer pedigrees known to include glioma.
We detected a single structural rearrangement, a deletion of exons 1-6 in MSH2, in the proband of one family with 3 cases with glioma and one relative with colon cancer.
Large deletions and duplications are rare events in familial glioma cases, even in families with a strong family history of cancers that may be involved in known cancer syndromes.
CDKN2A/B; family history; glioma; MLH1; MSH2; TP53
Multiple substance dependence (MSD) trait comorbidity is common, and MSD patients are often severely affected clinically. While shared genetic risks have been documented, no published report has yet used the linkage scan approach to survey risk loci for MSD as a phenotype. A total of 1,758 individuals in 739 families [384 African American (AA) and 355 European American (EA) families], ascertained via affected sib pairs with cocaine, opioid, or alcohol dependence, were genotyped using an array-based linkage panel of single-nucleotide polymorphism markers. Fuzzy clustering analysis was conducted separately for AAs and EAs on individuals with alcohol, cannabis, cocaine, opioid, and nicotine dependence, and linkage scans were conducted on the output membership coefficients using Merlin-regression. In EAs, we observed an autosome-wide significant linkage signal on chromosome 4 (peak lod = 3.31 at 68.3 cM; empirical autosome-wide P = 0.038) and a suggestive linkage signal on chromosome 21 (peak lod = 2.37 at 19.4 cM). In AAs, four suggestive linkage peaks were observed: two on chromosome 10 (lod = 2.66 at 96.7 cM and lod = 3.02 at 147.6 cM) and one each on chromosomes 3 (lod = 2.81 at 145.5 cM) and 9 (lod = 1.93 at 146.8 cM). Three particularly promising candidate genes, GABRA4, GABRB1, and CLOCK, are located within or very close to the autosome-wide significant linkage region for EAs on chromosome 4. This is the first linkage evidence supporting the existence of genetic loci that simultaneously influence risk for several comorbid disorders in two major US populations.
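The membership coefficients used as linkage phenotypes can be illustrated with fuzzy c-means, one standard fuzzy clustering algorithm. The abstract does not say which algorithm or fuzzifier `m` was used, so both are assumptions here. Each individual is a vector of dependence diagnoses (for example 0/1 for alcohol, cannabis, cocaine, opioid, and nicotine dependence); the cluster centers are taken as given, whereas in practice they are estimated iteratively together with the memberships.

```python
import math

def fcm_memberships(point, centers, m=2.0):
    """Fuzzy c-means membership of one individual in each cluster:
    u_k = 1 / sum_j (d_k / d_j) ** (2 / (m - 1)), with fuzzifier m > 1."""
    dists = [math.dist(point, c) for c in centers]
    # An individual lying exactly on a center belongs wholly to that cluster.
    for k, d in enumerate(dists):
        if d == 0.0:
            return [1.0 if j == k else 0.0 for j in range(len(centers))]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((dk / dj) ** power for dj in dists) for dk in dists]

# Hypothetical centers over (alcohol, cannabis, cocaine, opioid, nicotine).
centers = [(0.9, 0.2, 0.1, 0.1, 0.8), (0.5, 0.6, 0.9, 0.8, 0.9)]
person = (1, 0, 1, 0, 1)
print(fcm_memberships(person, centers))  # memberships sum to 1
```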
comorbidity; multiple substance dependence; fuzzy clustering; chromosome 4
Estimated glomerular filtration rate (eGFR), a measure of kidney function, is heritable, suggesting that genes influence renal function. Genes that influence eGFR have been identified through genome-wide association studies. However, family-based linkage approaches may identify loci that explain a larger proportion of the heritability. This study used genome-wide linkage and association scans to identify quantitative trait loci (QTL) that influence eGFR.
Genome-wide linkage and sparse association scans of eGFR were performed in families ascertained by probands with advanced diabetic nephropathy (DN) from the multi-ethnic Family Investigation of Nephropathy and Diabetes (FIND) study. This study included 954 African Americans (AA), 781 American Indians (AI), 614 European Americans (EA) and 1,611 Mexican Americans (MA). A total of 3,960 FIND participants were genotyped for 6,000 single nucleotide polymorphisms (SNPs) using the Illumina Linkage IVb panel. GFR was estimated by the Modification of Diet in Renal Disease (MDRD) formula.
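For reference, here is a sketch of the four-variable MDRD equation. The abstract does not state which version was used; this assumes the IDMS-traceable form with coefficient 175 (the original publication used 186), serum creatinine in mg/dL, age in years, and output in mL/min/1.73 m².

```python
def mdrd_egfr(scr_mg_dl, age_years, female, black):
    """Four-variable MDRD eGFR in mL/min/1.73 m^2 (IDMS-traceable coefficient 175)."""
    egfr = 175.0 * scr_mg_dl ** -1.154 * age_years ** -0.203
    if female:
        egfr *= 0.742   # sex coefficient
    if black:
        egfr *= 1.212   # race coefficient
    return egfr

print(mdrd_egfr(1.2, 60, female=False, black=True))
```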
The non-parametric linkage analysis, accounting for the effects of diabetes duration and BMI, identified the strongest evidence for linkage of eGFR on chromosome 20q11 (log of the odds [LOD] = 3.34; P = 4.4 × 10⁻⁵) in MA and chromosome 15q12 (LOD = 2.84; P = 1.5 × 10⁻⁴) in EA. In all subjects, the strongest linkage signal for eGFR was detected on chromosome 10p12 (P = 5.5 × 10⁻⁴) at 44 cM near marker rs1339048. A subsequent association scan in both the ancestry-specific groups and the entire population identified several SNPs significantly associated with eGFR across the genome.
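The reported LOD scores and P values are consistent with the usual asymptotic conversion, in which 2 ln(10) × LOD is treated as a 50:50 mixture of χ²₀ and χ²₁ under the null (a one-sided test). The sketch below illustrates that convention; it is not necessarily the linkage software's exact computation. It gives about 4.4 × 10⁻⁵ for LOD = 3.34 and 1.5 × 10⁻⁴ for LOD = 2.84.

```python
import math
from statistics import NormalDist

def lod_to_p(lod):
    """One-sided asymptotic p-value for a LOD score: 2*ln(10)*LOD is treated
    as a 50:50 mixture of chi-square(0) and chi-square(1) under the null."""
    chi2 = 2.0 * math.log(10.0) * max(lod, 0.0)
    # 0.5 * P(chi-square(1) > chi2) equals the upper normal tail at sqrt(chi2).
    return NormalDist().cdf(-math.sqrt(chi2))

for lod in (3.34, 2.84):
    print(lod, f"{lod_to_p(lod):.1e}")
```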
The present study describes the localization of QTL influencing eGFR on 20q11 in MA, 15q21 in EA and 10p12 in the combined ethnic groups participating in the FIND study. Identification of causal genes/variants influencing eGFR, within these linkage and association loci, will open new avenues for functional analyses and development of novel diagnostic markers for DN.
It is well known that the power of a test statistic depends on the significance (alpha) level at which the test is performed. It is perhaps less obvious that the relative power of two statistics is also a function of the alpha level. Through numerous personal discussions, we have noted that even some competent statisticians have the mistaken intuition that relative power comparisons at traditional levels such as α = 0.05 will be roughly similar to those at very low levels, such as α = 5 × 10⁻⁸, which is commonly used in genome-wide association studies. In this brief note, we demonstrate that this notion is quite wrong, especially when comparing tests with differing degrees of freedom. In fact, at very low alpha levels the cost of additional degrees of freedom is often comparatively low. We therefore recommend that statisticians exercise caution when interpreting the results of power comparison studies that use alpha levels that will not be used in practice.
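The note's central point can be checked numerically. The sketch below finds the noncentrality needed for 80% power by a 1-df and a 2-df chi-square test at α = 0.05 and α = 5 × 10⁻⁸. It computes 2-df power from the Poisson-mixture representation of the noncentral chi-square so that only the Python standard library is needed; the function names and bisection bounds are our own assumptions.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()

def power_1df(lam, alpha):
    """Power of a 1-df chi-square test with noncentrality lam."""
    z = STD_NORMAL.inv_cdf(1 - alpha / 2)
    root = math.sqrt(lam)
    return STD_NORMAL.cdf(root - z) + STD_NORMAL.cdf(-root - z)

def power_2df(lam, alpha, terms=400):
    """Power of a 2-df chi-square test with noncentrality lam, using the
    Poisson(lam/2) mixture of central chi-squares with 2, 4, 6, ... df."""
    crit = -2.0 * math.log(alpha)      # exact 2-df critical value
    half_crit, half_lam = crit / 2.0, lam / 2.0
    weight = math.exp(-half_lam)       # Poisson weight for j = 0
    term = math.exp(-half_crit)        # e^(-c/2) (c/2)^i / i! for i = 0
    tail = term                        # P(central chi-square with 2 df > crit)
    power = 0.0
    for j in range(terms):
        power += weight * tail
        term *= half_crit / (j + 1)    # next term of the Poisson-sum tail
        tail += term                   # tail for 2(j + 2) df
        weight *= half_lam / (j + 1)
    return power

def lambda_for_power(power_fn, alpha, target=0.8):
    """Smallest noncentrality reaching the target power, by bisection."""
    lo, hi = 0.0, 200.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if power_fn(mid, alpha) < target:
            lo = mid
        else:
            hi = mid
    return hi

for alpha in (0.05, 5e-8):
    lam1 = lambda_for_power(power_1df, alpha)
    lam2 = lambda_for_power(power_2df, alpha)
    print(f"alpha={alpha:g}: 1df={lam1:.2f}, 2df={lam2:.2f}, ratio={lam2 / lam1:.3f}")
```

At α = 0.05 the 2-df test needs roughly 23% more noncentrality (about 9.6 vs. 7.8), while at α = 5 × 10⁻⁸ the relative penalty is markedly smaller.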
Power; Small Significance Levels
We propose a two-step model-based approach, with correction for ascertainment, to linkage analysis of a binary trait with variable age of onset and apply it to a set of multiplex pedigrees segregating for adult glioma.
First, we fit segregation models by formulating the likelihood for a person to have a bivariate phenotype, affection status and age of onset, along with other covariates, and from these we estimate population trait allele frequencies and penetrance parameters as a function of age (N=281 multiplex glioma pedigrees). Second, the best fitting models are used as trait models in multipoint linkage analysis (N=74 informative multiplex glioma pedigrees). To correct for ascertainment, a prevalence constraint is used in the likelihood of the segregation models for all 281 pedigrees. Then the trait allele frequencies are re-estimated for the pedigree founders of the subset of 74 pedigrees chosen for linkage analysis.
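The prevalence constraint can be illustrated for a diallelic trait locus in Hardy-Weinberg equilibrium, where the model-implied prevalence is K = (1 − q)² f₀ + 2q(1 − q) f₁ + q² f₂. This sketch ignores the age dependence of the penetrances (it could be applied at a fixed reference age), and the function names and numbers are hypothetical; the paper's actual likelihood is more elaborate.

```python
def hwe_prevalence(q, f0, f1, f2):
    """Model-implied population prevalence for a diallelic trait locus in
    Hardy-Weinberg equilibrium: q is the trait-allele frequency and f0, f1, f2
    are the penetrances for 0, 1 and 2 copies of the trait allele."""
    p = 1.0 - q
    return p * p * f0 + 2.0 * p * q * f1 + q * q * f2

def f0_from_prevalence(K, q, f1, f2):
    """Solve the prevalence constraint K = hwe_prevalence(q, f0, f1, f2) for f0."""
    p = 1.0 - q
    return (K - 2.0 * p * q * f1 - q * q * f2) / (p * p)

# Hypothetical dominant-like model constrained to a prevalence of 0.001.
f0 = f0_from_prevalence(K=0.001, q=0.005, f1=0.05, f2=0.05)
print(f0, hwe_prevalence(0.005, f0, 0.05, 0.05))
```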
Using the best-fitting segregation models in model-based multipoint linkage analysis, we identified two separate peaks on chromosome 17. The first agreed with a region identified by Shete et al., who used model-free affected-only linkage analysis, but with a narrower peak; the second agreed with another region they found but had a larger maximum log of the odds (LOD).
Our approach has the advantage of not requiring markers to be in linkage equilibrium unless the minor allele frequency is small (markers which tend to be uninformative for linkage), and of using more of the available information for LOD-based linkage analysis.
Glioma; model-based linkage; segregation; age of onset; prevalence constraint
Olson's conditional-logistic model retains the desirable properties of the LOD score formulation and has advantages over other methods that make it an appropriate choice for complex trait linkage mapping. However, the asymptotic distribution of the conditional-logistic likelihood-ratio (CL-LR) statistic with genetic constraints on the model parameters is unknown for some analysis models, even for samples comprising only independent sib pairs. We derive approximations to the asymptotic null distributions of the CL-LR statistics and compare them with the empirical null distributions obtained by simulation using independent affected sib pairs. Generally, the empirical null distributions of the CL-LR statistics agree well with the known or approximated asymptotic distributions for all analysis models considered, except for the covariate model with a minimum-adjusted binary covariate. This work will provide useful guidelines for linkage analysis of real data sets in the genetic analysis of complex traits, thereby contributing to the identification of genes for disease traits.
linkage analysis; affected sib pairs; identity-by-descent; conditional-logistic model; genetic constraints; null distribution; likelihood-ratio statistics
Translation studies have been initiated to assess the combined effect of genetic loci from recently completed genome-wide association studies and existing risk factors for early disease prediction. We propose a bagging optimal receiver operating characteristic (ROC) curve method to facilitate this research. Through simulation and a real data application, we compared the new method with the commonly used allele counting method and logistic regression, and found that the new method performs better. The new method was applied to the Wellcome Trust data set to form a predictive genetic test for rheumatoid arthritis. The resulting test achieved an area under the curve (AUC) of 0.7.
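The AUC reported above can be computed directly as the Mann-Whitney probability that a randomly chosen case scores higher than a randomly chosen control. The sketch below shows only this evaluation step, which applies to any risk score (for example an allele count or a fitted logistic-regression score); it does not implement the bagging optimal ROC method itself, and the scores are made up.

```python
def empirical_auc(case_scores, control_scores):
    """AUC as the Mann-Whitney probability that a random case scores
    higher than a random control (ties count one half)."""
    wins = 0.0
    for s_case in case_scores:
        for s_ctrl in control_scores:
            if s_case > s_ctrl:
                wins += 1.0
            elif s_case == s_ctrl:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))

# Made-up risk-allele counts for a few cases and controls.
print(empirical_auc([5, 7, 6, 4], [3, 5, 4, 2]))
```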
Area under the ROC curve; Bootstrap aggregation; Gene–gene interaction; Genomewide association studies
We introduce PedWiz, a novel web-based tool that pipelines the informatics processing of pedigree data. PedWiz is designed to assist researchers in analyzing pedigree data. It provides a convenient set of pedigree informatics functions: descriptive statistics, relative pairs, genetic similarity coefficients, the variance-covariance matrix of three estimated coefficients of allele identity-by-descent sharing as well as mean allele sharing, a plot of the pedigree structures, and a visualization of the identity coefficients. With renewed interest in linkage and other family-based methods, PedWiz will be a valuable tool for the analysis of family data.
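As an illustration of one quantity behind the genetic similarity coefficients such a tool reports, here is a standard recursive computation of the kinship coefficient. PedWiz's internals are not described in the abstract, so this is an independent sketch; the pedigree format (an ordered mapping from each individual to father and mother, with parents listed before children) and the example family are assumptions.

```python
from functools import lru_cache

def make_kinship(pedigree):
    """Return phi(a, b), the kinship coefficient, for a pedigree given as an
    ordered dict {id: (father, mother)} with parents listed before children."""
    order = {ind: i for i, ind in enumerate(pedigree)}

    @lru_cache(maxsize=None)
    def phi(a, b):
        if a is None or b is None:      # unknown parent contributes nothing
            return 0.0
        if a == b:                      # self-kinship: (1 + inbreeding) / 2
            father, mother = pedigree[a]
            return 0.5 * (1.0 + phi(father, mother))
        # Recurse through the later-listed individual, who cannot be an
        # ancestor of the other.
        if order[a] < order[b]:
            a, b = b, a
        father, mother = pedigree[a]
        return 0.5 * (phi(father, b) + phi(mother, b))

    return phi

# Hypothetical pedigree: founders first, then children, then grandchildren.
ped = {"F1": (None, None), "M1": (None, None), "M2": (None, None),
       "S1": (None, None), "S2": (None, None),
       "C1": ("F1", "M1"), "C2": ("F1", "M1"), "C3": ("F1", "M2"),
       "G1": ("C1", "S1"), "G2": ("S2", "C2")}
phi = make_kinship(ped)
print(phi("C1", "C2"), phi("C1", "C3"), phi("G1", "G2"))
```

Under this sketch, full sibs and parent-offspring pairs have kinship 1/4, half sibs 1/8, and first cousins 1/16.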
pedigree; informatics; genetic similarity; identity-by-descent; relative pairs; family data
This paper is concerned with evaluating whether an interaction between two sets of risk factors for a binary trait is removable and, when it is, fitting a parsimonious additive model using a suitable link function to estimate the disease odds (on the natural logarithm scale). Statisticians define the term “interaction” as a departure from additivity in a linear model on the specific scale on which the data are measured. Certain interactions may be eliminated via a transformation of the outcome such that the relationship between the risk factors and the outcome is additive on the transformed scale. Such interactions are known as removable interactions. We develop a novel test statistic for detecting the presence of a removable interaction in case-control studies. We consider the Guerrero and Johnson family of transformations and show that this family constitutes an appropriate link function for fitting an additive model when an interaction is removable. We use simulation studies to examine the type I error and power of the proposed test and to show that an additive model based on the Guerrero and Johnson link function leads to more precise estimates of the disease odds parameters and a better fit when an interaction is removable. The proposed test and use of the transformation are illustrated using case-control data from three published studies. Finally, we indicate how one can check that, after transformation, no further interaction is significant.
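To make “removable” concrete, the sketch applies the Guerrero and Johnson transformation, taken here to be the Box-Cox transform of the odds, (odds^λ − 1)/λ, with the log odds at λ = 0, and computes the interaction contrast for two binary risk factors. It does not implement the paper's test statistic, and the odds values are made up. If the odds are additive (1, 2, 3, 4 for the four exposure combinations), the contrast is 0 at λ = 1 but ln(2/3) on the log-odds scale, so that interaction is removable.

```python
import math

def gj_transform(p, lam):
    """Box-Cox transform of the odds p/(1-p); lam = 0 gives the log odds."""
    odds = p / (1.0 - p)
    if lam == 0:
        return math.log(odds)
    return (odds ** lam - 1.0) / lam

def interaction_contrast(p00, p10, p01, p11, lam):
    """Departure from additivity of two binary risk factors on the transformed
    scale (p_ab = disease probability at exposure levels a, b); 0 = additive."""
    def g(p):
        return gj_transform(p, lam)
    return g(p11) - g(p10) - g(p01) + g(p00)

def prob(odds):
    return odds / (1.0 + odds)

# Made-up odds that are additive on the odds scale: 1, 2, 3, 4.
probs = [prob(o) for o in (1.0, 2.0, 3.0, 4.0)]
for lam in (1.0, 0.0):
    print(lam, interaction_contrast(*probs, lam))
```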
Analysis of variance; curvature; independence; interaction effect; link function; main effect; residuals; score statistic; Tukey’s test; transformation; unbalanced data
Current genome-wide association studies still rely heavily on a single-marker strategy, in which each single nucleotide polymorphism (SNP) is tested individually for association with a phenotype. Although methods and software packages that consider multimarker models have become available, they have been slow to be widely adopted, and their efficacy in real data analysis is often questioned. Using extensive simulations, we seek to provide more insight into the performance of simple multimarker association tests compared with single-marker tests. The results reveal both power advantages and disadvantages of the two-marker test relative to the single-marker test. Power differentials depend on the correlation structure among tag SNPs, as well as between tag SNPs and causal variants. A two-marker test performs relatively better than single-marker tests when the correlation between the two adjacent markers is high. However, with HapMap data, two-marker tests were more often less powerful than single-marker tests, owing to constraints on the number of haplotypes actually present in the HapMap data. Yet the average power difference was small whenever the single-marker test was more powerful, whereas there were many situations in which the two-marker test was much more powerful. These findings can help guide the analysis of future studies.
Asymptotic power; single-marker test; two-marker test; genome-wide association
We investigated the heritability and familial aggregation of various indexes of arterial stiffness and wave reflection and we partitioned the phenotypic correlation between these traits into shared genetic and environmental components.
Using a family-based population sample, we recruited 204 parents (mean age, 51.7 years) and 290 offspring (29.4 years) from the population in Cracow, Poland (62 families), Hechtel-Eksel, Belgium (36), and Pilsen, the Czech Republic (50). We measured peripheral pulse pressure (PPp) sphygmomanometrically at the brachial artery; central pulse pressure (PPc), the peripheral augmentation indexes (PAIxs) and central augmentation indexes (CAIxs) by applanation tonometry at the radial artery; and aortic pulse wave velocity (PWV) by tonometry or ultrasound. In multivariate-adjusted analyses, we used the ASSOC and PROC GENMOD procedures as implemented in SAGE and SAS, respectively.
We found significant heritability for PAIx, CAIx, PPc, and mean arterial pressure, ranging from 0.37 to 0.41 (P ≤ 0.0001). The method of intrafamilial concordance confirmed these results: in parent–offspring pairs, intrafamilial correlation coefficients were significant for all arterial indexes (r ≥ 0.12; P ≤ 0.02) with the exception of PPc (r = −0.007; P = 0.90). The sib–sib correlation was also significant for CAIx (r = 0.22; P = 0.001). The genetic correlations between PWV and the other arterial indexes were significant (ρG ≥ 0.29; P < 0.0001). The corresponding environmental correlations were significantly positive only for PPp (ρE = 0.10, P = 0.03).
The observation of significant intrafamilial concordance and heritability of various indexes of arterial stiffness as well as the genetic correlations among arterial phenotypes strongly support the search for shared genetic determinants underlying these traits.
arterial stiffness; familial aggregation; heritability; pulse pressure; systolic augmentation
Segmental handling of sodium along the proximal and distal nephron might be heritable and different between black and white participants.
We randomly recruited 95 nuclear families of black South African ancestry and 103 nuclear families of white Belgian ancestry. We measured the fractional excretion of sodium (FENa) and estimated the fractional renal sodium reabsorption in the proximal (RNaprox) and distal (RNadist) tubules from the clearances of endogenous lithium and creatinine. In multivariable analyses, we studied the relation of RNaprox and RNadist with FENa and estimated the heritability (h²) of RNaprox and RNadist.
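The sketch below shows how these indices are conventionally derived from the lithium clearance technique, assuming that lithium is reabsorbed in proportion to sodium in the proximal tubule and negligibly beyond it. Urine flow cancels because each clearance is expressed relative to creatinine clearance, and each urine/plasma pair must share units. The function names and input values are hypothetical, and the paper's exact computation may differ.

```python
def clearance_ratio(u_x, p_x, u_cr, p_cr):
    """C_x / C_creatinine from urine (U) and plasma (P) concentrations;
    urine flow cancels out of the ratio."""
    return (u_x / p_x) / (u_cr / p_cr)

def segmental_sodium(u_na, p_na, u_li, p_li, u_cr, p_cr):
    """Return (FENa, RNaprox, RNadist), all in percent."""
    fe_na = clearance_ratio(u_na, p_na, u_cr, p_cr)  # C_Na / C_Cr
    fe_li = clearance_ratio(u_li, p_li, u_cr, p_cr)  # C_Li / C_Cr
    r_prox = 100.0 * (1.0 - fe_li)                   # % of filtered Na reabsorbed proximally
    r_dist = 100.0 * (fe_li - fe_na) / fe_li         # % of Na delivered distally that is reabsorbed
    return 100.0 * fe_na, r_prox, r_dist

# Hypothetical concentrations (sodium mmol/L; lithium and creatinine umol/L).
print(segmental_sodium(u_na=140, p_na=140, u_li=40, p_li=2, u_cr=8000, p_cr=80))
```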
Independent of urinary sodium excretion, South Africans (n = 240) had higher RNaprox (unadjusted median, 93.9% vs. 81.0%; P < 0.001) than Belgians (n = 737), but lower RNadist (91.2% vs. 95.1%; P < 0.001). The slope of RNaprox on FENa was steeper in Belgians than in South Africans (−5.40 ± 0.58 vs. −0.78 ± 0.58 units; P < 0.001), whereas the opposite was true for the slope of RNadist on FENa (−3.84 ± 0.19 vs. −13.71 ± 1.30 units; P < 0.001). The h² of RNaprox and RNadist was high and significant (P < 0.001) in both countries. h² was higher in South Africans than in Belgians for RNaprox (0.82 vs. 0.56; P < 0.001), but was similar for RNadist (0.68 vs. 0.50; P = 0.17). Of the filtered sodium load, black participants reabsorb more than white participants in the proximal nephron and less postproximally.
Segmental sodium reabsorption along the nephron is highly heritable, but the capacity for regulation in the proximal and postproximal tubules differs between whites and blacks.
clinical genetics; epidemiology; kidney; lithium clearance; salt sensitivity; segmental tubular sodium transport
15-Hydroxyprostaglandin dehydrogenase (15-PGDH) is a metabolic antagonist of COX-2, catalyzing the degradation of inflammation mediator prostaglandin E2 (PGE2) and other prostanoids. Recent studies have established the 15-PGDH gene as a colon cancer suppressor.
We evaluated 15-PGDH as a colon cancer susceptibility locus in a three-stage design. In stage 1, we genotyped 102 single-nucleotide polymorphisms (SNPs) in the 15-PGDH gene, spanning ∼50 kb upstream and downstream of the coding region, in 464 colon cancer cases and 393 population controls. In stage 2, we genotyped the same SNPs and also assayed the expression levels of 15-PGDH in colon tissues from 69 independent patients for whom colon tissue and paired germline DNA samples were available. In stage 3, we genotyped the 9 most promising SNPs from stages 1 and 2 in an independent sample of 525 cases and 816 controls.
In the first two stages, three SNPs (rs1365611, rs6844282 and rs2332897) were statistically significant (p<0.05) in combined analysis of association with risk of colon cancer and of association with 15-PGDH expression, after adjustment for multiple testing. For one additional SNP, rs2555639, the T allele showed increased cancer risk and decreased 15-PGDH expression, but just missed statistical significance (p-adjusted = 0.063). In stage 3, rs2555639 alone showed evidence of association with an odds ratio (TT compared to CC) of 1.50 (95% CI = 1.05–2.15, p = 0.026).
Our data suggest that the rs2555639 T allele is associated with increased risk of colon cancer, and that carriers of this risk allele exhibit decreased expression of 15-PGDH in the colon.
The potential importance of the joint action of genes, whether modeled with or without a statistical interaction term, has long been recognized. However, identifying such action has been a great challenge, especially when millions of genetic markers are involved. We propose a likelihood ratio-based Mann-Whitney test to search for joint gene action either among candidate genes or genome-wide. It extends the traditional univariate Mann-Whitney test to assess the joint association of genotypes at multiple loci with disease, allowing for high-order statistical interactions. Because only one overall significance test is conducted for the entire analysis, it avoids the issue of multiple testing. Moreover, the approach adopts a computationally efficient algorithm, making a genome-wide search feasible in a reasonable amount of time on a high-performance personal computer. We evaluated the approach using both theoretical and real data. By applying the approach to 40 type 2 diabetes (T2D) susceptibility single-nucleotide polymorphisms (SNPs), we identified a four-locus model strongly associated with T2D in the Wellcome Trust (WT) study (permutation P-value < 0.001), and replicated the same finding in the Nurses’ Health Study/Health Professionals Follow-Up Study (NHS/HPFS) (P-value = 3.03 × 10⁻¹¹). We also conducted a genome-wide search on 385,598 SNPs in the WT study. The analysis took approximately 55 hr on a personal computer, identifying the same first two loci, but overall a different set of four SNPs, jointly associated with T2D (P-value = 1.29 × 10⁻⁵). The nominal significance of this same association reached 4.01 × 10⁻⁶ in the NHS/HPFS.
gene-gene interaction; genome-wide search; forward selection
The value of incorporating family data in genetic association studies is increasingly appreciated, especially for testing rare variants. We introduce here a variance-component-based association test that can jointly test multiple common or rare variants using both family and unrelated samples.
The proposed approach, implemented in our R package, aggregates (collapses) the information across a region based on genetic similarity rather than genotype scores, which avoids loss of power when the effects are in different directions or differ in strength. The method can also effectively leverage the linkage disequilibrium (LD) information in a region, and it produces a test statistic with an adaptively estimated number of degrees of freedom. Our method readily allows adjustment for non-genetic contributions to familial similarity, as well as for multiple covariates.
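Collapsing a region through genetic similarity can be illustrated with a kernel (variance-component) score statistic Q = r'Kr / σ̂², where r holds trait residuals and K is an identity-by-state (IBS) similarity matrix computed over the region's SNPs. This minimal sketch assumes unrelated individuals and an intercept-only null model; a family version would replace the identity residual covariance with one that includes a kinship component, which this code does not do. The adaptive degrees of freedom are illustrated by a Satterthwaite-type moment match of Q to κ·χ²_ν. That calibration is our assumption about how such a test can work, not necessarily what fassoc does, and the function names are hypothetical.

```python
def ibs_kernel(genotypes):
    """IBS similarity averaged over SNPs: 1 for identical genotypes, 0 for opposite homozygotes."""
    n, m = len(genotypes), len(genotypes[0])
    return [[sum(2 - abs(genotypes[i][l] - genotypes[j][l]) for l in range(m)) / (2 * m)
             for j in range(n)] for i in range(n)]

def double_center(K):
    """C = P K P with P = I - 11'/n, the projection that removes the intercept."""
    n = len(K)
    means = [sum(row) / n for row in K]  # row means equal column means because K is symmetric
    grand = sum(means) / n
    return [[K[i][j] - means[i] - means[j] + grand for j in range(n)] for i in range(n)]

def vc_score_test(y, genotypes):
    """Return (Q, kappa, nu) such that Q is approximately kappa * chi2(nu) under H0."""
    n = len(y)
    ybar = sum(y) / n
    r = [v - ybar for v in y]                       # residuals under the intercept-only null
    sigma2 = sum(v * v for v in r) / (n - 1)        # plug-in residual variance
    C = double_center(ibs_kernel(genotypes))
    q = sum(r[i] * C[i][j] * r[j] for i in range(n) for j in range(n)) / sigma2
    mean_q = sum(C[i][i] for i in range(n))         # E[Q] = tr(PKP)
    var_q = 2 * sum(c * c for row in C for c in row)  # Var[Q] = 2 tr((PKP)^2)
    kappa = var_q / (2 * mean_q)                    # scale
    nu = 2 * mean_q ** 2 / var_q                    # effective degrees of freedom
    return q, kappa, nu
```

Because the similarity enters only through K, variants with opposite effect directions do not cancel as they would in a summed genotype score. The P-value is the upper tail of κ·χ²_ν at Q, for example `scipy.stats.chi2.sf(q / kappa, nu)`; treating σ̂² as known in the moments is an approximation.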
We demonstrate through simulations that the proposed method achieves good performance in terms of Type I error control and statistical power. The method is implemented in the R package “fassoc”, which provides a useful tool for data analysis and exploration.
Association studies; Family data; Score test; Multi-marker test
It has been postulated that multiple-marker methods may be better able than single-marker methods to detect genetic variants associated with disease. The Wellcome Trust Case Control Consortium (WTCCC) conducted the first large, successful genome-wide association studies (GWAS), which included single-marker association analyses for seven common complex diseases. Of the signals detected, only one was associated with coronary artery disease (CAD), and none with hypertension (HTN). Our objective was to find additional genetic associations and pathways for cardiovascular disease by examining the WTCCC data for variants associated with CAD and HTN using two-marker testing methods. We applied two-marker association testing to the WTCCC dataset, which includes ~2,000 affected individuals with each disorder and a shared pool of ~3,000 controls, all genotyped using Affymetrix GeneChip 500 K arrays. For CAD, we detected genome-wide significant single nucleotide polymorphism (SNP) pairs in three genes: HFE2, STK32B, and DIPC2. The most notable SNP pairs in a non-protein-coding region were at 9p21, a known major CAD-associated region. For HTN, we detected SNP pairs in five genes: GPR39, XRCC4, MYO6, ZFAT, and MACROD2. Four additional associated SNP-pair regions lay at least 70 kb from any known gene. We have shown that novel multiple-marker statistical methods can be useful for finding variants in GWAS. We describe many newly associated variants for both CAD and HTN, together with their known genetic mechanisms.
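The abstract does not specify which two-marker test was used. As one generic example, assumed here rather than taken from the study, a SNP pair can be tested by cross-classifying subjects into joint two-locus genotype classes (up to 3 × 3 = 9) and computing a Pearson chi-square for cases versus controls. The function name is hypothetical; it returns the statistic and its degrees of freedom, counting only genotype classes actually observed.

```python
from collections import Counter

def two_marker_chi2(g1, g2, status):
    """Pearson chi-square for case/control status across joint two-SNP genotype classes.

    g1, g2: genotypes (0/1/2) at the two SNPs; status: 1 = case, 0 = control.
    Returns (statistic, degrees of freedom) for the 2 x k table of observed classes.
    """
    pairs = list(zip(g1, g2))
    classes = sorted(set(pairs))
    cases = Counter(k for k, s in zip(pairs, status) if s == 1)
    ctrls = Counter(k for k, s in zip(pairs, status) if s == 0)
    n_case, n_ctrl = sum(cases.values()), sum(ctrls.values())
    n = n_case + n_ctrl
    stat = 0.0
    for k in classes:
        col = cases[k] + ctrls[k]  # subjects in this genotype class (always > 0)
        for observed, row_total in ((cases[k], n_case), (ctrls[k], n_ctrl)):
            expected = row_total * col / n
            stat += (observed - expected) ** 2 / expected
    return stat, len(classes) - 1  # df = (2 - 1) * (k - 1)
```

A P-value would come from the chi-square upper tail, for example `scipy.stats.chi2.sf(stat, df)`, and would still need to pass a genome-wide threshold that reflects the very large number of SNP pairs tested.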
The genetic etiology of complex human diseases is commonly viewed as involving multiple genetic variants, environmental factors, and their interactions. Statistical approaches, such as multifactor dimensionality reduction (MDR) and generalized MDR (GMDR), have recently been proposed to test the joint association of multiple genetic variants with either dichotomous or continuous traits. In this paper, we propose a novel Forward U-Test to evaluate the combined effect of multiple loci on quantitative traits, taking gene-gene/gene-environment interactions into account. In this approach, a U-statistic-based forward algorithm is first used to select potential disease-susceptibility loci, and a weighted U statistic is then used to test the joint association of the selected loci with the disease. In a simulation study, the Forward U-Test had greater power than GMDR. Our approach is also less computationally intensive, making it feasible for high-dimensional gene-gene/gene-environment research. We illustrate our method with a real data application to Nicotine Dependence (ND), using three independent datasets from the Study of Addiction: Genetics and Environment. Our gene-gene interaction analysis of 155 SNPs in 67 candidate genes identified two SNPs, rs16969968 within gene CHRNA5 and rs1122530 within gene NTRK2, jointly associated with the level of ND (p-value = 5.31e-7). The association, which involves essential interaction, was replicated in two independent datasets with p-values of 1.08e-5 and 0.02, respectively. Our findings suggest that the two gene products may act jointly.
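The selection stage of a U-statistic forward search can be sketched as follows. Here the U statistic is assumed to take the generic form U = [2 / (n(n − 1))] Σ_{i<j} w_ij c_i c_j, where w_ij is the IBS genotype similarity of subjects i and j over the selected loci and c_i is subject i's centered trait rank, so U is large when genetically similar subjects have similar trait values. This form, the stopping rule and the function names are illustrative assumptions; the paper's weighted U statistic for the joint test, and its null distribution, are not reproduced.

```python
def midranks(y):
    """Ranks 1..n, with tied values sharing their average rank."""
    order = sorted(range(len(y)), key=lambda i: y[i])
    ranks = [0.0] * len(y)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and y[order[j + 1]] == y[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def u_statistic(genotypes, y, loci):
    """Average over pairs of (IBS similarity at the loci) x (product of centered trait ranks)."""
    n = len(y)
    mean_rank = (n + 1) / 2  # mid-ranks always average (n + 1) / 2
    c = [r - mean_rank for r in midranks(y)]
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            w = sum(1 - abs(genotypes[i][l] - genotypes[j][l]) / 2 for l in loci) / len(loci)
            total += w * c[i] * c[j]
    return 2 * total / (n * (n - 1))

def forward_u(genotypes, y, max_loci=3):
    """Greedily add the locus that most increases U; stop when U no longer increases."""
    selected, best = [], float("-inf")
    n_loci = len(genotypes[0])
    while len(selected) < max_loci:
        scored = [(u_statistic(genotypes, y, selected + [l]), l)
                  for l in range(n_loci) if l not in selected]
        if not scored:
            break
        u, l = max(scored)
        if u <= best:
            break
        selected.append(l)
        best = u
    return selected, best
```

Using ranks rather than raw trait values makes the statistic robust to outliers and to skewed quantitative traits; the selected loci would then be passed to a separate joint test.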
gene-gene interaction; Forward U-Test; Nicotine Dependence