Translation of results from genetic findings to inform medical practice is a highly anticipated goal of human genetics. The aim of this paper is to review and discuss the role of genetics in medically-relevant prediction. Germline genetics presages disease onset and therefore can contribute prognostic signals that augment laboratory tests and clinical features. As such, the impact of genetic-based predictive models on clinical decisions and therapy choice could be profound. However, given that (i) medical traits result from a complex interplay between genetic and environmental factors, (ii) the underlying genetic architectures for susceptibility to common diseases are not well-understood, and (iii) replicable susceptibility alleles, in combination, account for only a moderate amount of disease heritability, there are substantial challenges to constructing and implementing genetic risk prediction models with high utility. In spite of these challenges, concerted progress has continued in this area with an ongoing accumulation of studies that identify disease predisposing genotypes. Several statistical approaches with the aim of predicting disease have been published. Here we summarize the current state of disease susceptibility mapping and pharmacogenetics efforts for risk prediction, describe methods used to construct and evaluate genetic-based predictive models, and discuss applications.
predictive model; genetic risk; human genetics; prognosis; clinical utility
Background and purpose
Interstitial cystitis/bladder pain syndrome (IC/BPS) is relatively common and associated with severe pain, yet effective treatment remains elusive. Research has typically emphasized the bladder’s role, but given the high presence of systemic comorbidities, the authors hypothesized a pathophysiologic nervous system role. This paper reports the methodology and approach to study the nervous system in women with IC/BPS. The study compares neurologic, urologic, gynecologic, autonomic, gastrointestinal, and psychological features of women with IC/BPS, their female relatives, women with myofascial pelvic pain (MPP), and healthy controls to elucidate the role of central and peripheral processing.
Methods and results
In total, 228 women (76 IC/BPS, 76 MPP, 38 family members, and 38 healthy controls) will be recruited. Subjects undergo detailed screening, structured neurologic examination of limbs and pelvis, tender point examination, autonomic testing, electrogastrography, and assessment of comorbid functional dysautonomias. Interpreters are blinded to subject classification. Psychological and stress response characteristics are examined with assessments of stress, trauma history, general psychological function, and stress response quantification. As of December 2012, data collection is completed for 25 healthy controls, 33 IC/BPS ± MPP, eight MPP, and three family members. Recruitment rate is accelerating and strategies emphasize maintaining and encouraging investigator participation in study science, internet advertising, and presentations to pelvic pain support groups.
The study represents a comprehensive, interdisciplinary approach to sampling autonomic and psychophysiologic characteristics of women with IC/BPS. Despite divergent opinions on study methodologies based on specialty experiences, the study has proven feasible to date and different perspectives have proved to be one of the greatest study strengths.
interstitial cystitis; bladder pain syndrome; autonomic nervous system; psychophysiology; pelvic pain; myofascial pain
Gene–gene interactions may contribute to the genetic variation underlying complex traits but have not always been taken fully into account. Statistical analyses that consider gene–gene interaction may increase the power of detecting associations, especially for low-marginal-effect markers, and may explain in part the “missing heritability.” Detecting pair-wise and higher-order interactions genome-wide requires enormous computational power. Filtering pipelines increase the computational speed by limiting the number of tests performed. We summarize existing filtering approaches to detect epistasis, after distinguishing the purposes that lead us to search for epistasis. Statistical filtering includes quality control on the basis of single marker statistics to avoid the analysis of bad and least informative data, and limits the search space for finding interactions. Biological filtering includes targeting specific pathways, integrating various databases based on known biological and metabolic pathways, gene function ontology and protein–protein interactions. It is increasingly possible to target single-nucleotide polymorphisms that have defined functions on gene expression, though not belonging to protein-coding genes. Filtering can improve the power of an interaction association study, but also increases the chance of missing important findings.
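To give a sense of the computational burden and of what filtering buys, the following minimal sketch counts the number of interaction tests and the corresponding Bonferroni threshold before and after a filter. The SNP counts, the 0.05 family-wise level, and the function names are illustrative assumptions, not values taken from any study.

```python
from math import comb


def n_interaction_tests(n_snps: int, order: int = 2) -> int:
    """Number of distinct SNP combinations of the given interaction order."""
    return comb(n_snps, order)


def bonferroni_threshold(n_tests: int, family_alpha: float = 0.05) -> float:
    """Per-test significance level under a Bonferroni correction."""
    return family_alpha / n_tests


# Hypothetical panel sizes: a full array versus a filtered subset.
for label, n_snps in [("unfiltered", 500_000), ("filtered", 5_000)]:
    n_tests = n_interaction_tests(n_snps)
    print(f"{label}: {n_tests:.3e} pairwise tests, "
          f"per-test threshold {bonferroni_threshold(n_tests):.2e}")
```

Because the number of pairs grows quadratically with the number of SNPs, a filter that keeps 1% of the markers reduces the pairwise search space by roughly four orders of magnitude, at the cost of never testing the discarded pairs.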
epistasis; genetic interaction; biological interaction; filtering pipeline; optimal search
Multiple substance dependence (MSD) trait comorbidity is common, and MSD patients are often severely affected clinically. While shared genetic risks have been documented, so far there has been no published report using the linkage scan approach to survey risk loci for MSD as a phenotype. A total of 1,758 individuals in 739 families [384 African American (AA) and 355 European American (EA) families] ascertained via affected sib-pairs with cocaine or opioid or alcohol dependence were genotyped using an array-based linkage panel of single-nucleotide polymorphism markers. Fuzzy clustering analysis was conducted on individuals with alcohol, cannabis, cocaine, opioid, and nicotine dependence for AAs and EAs separately, and linkage scans were conducted for the output membership coefficients using Merlin-regression. In EAs, we observed an autosome-wide significant linkage signal on chromosome 4 (peak lod = 3.31 at 68.3 cM; empirical autosome-wide P = 0.038), and a suggestive linkage signal on chromosome 21 (peak lod = 2.37 at 19.4 cM). In AAs, four suggestive linkage peaks were observed: two peaks on chromosome 10 (lod = 2.66 at 96.7 cM and lod = 3.02 at 147.6 cM) and the other two on chromosomes 3 (lod = 2.81 at 145.5 cM) and 9 (lod = 1.93 at 146.8 cM). Three particularly promising candidate genes, GABRA4, GABRB1, and CLOCK, are located within or very close to the autosome-wide significant linkage region for EAs on chromosome 4. This is the first linkage evidence supporting the existence of genetic loci influencing risk for several comorbid disorders simultaneously in two major US populations.
comorbidity; multiple substance dependence; fuzzy clustering; chromosome 4
Estimated glomerular filtration rate (eGFR), a measure of kidney function, is heritable, suggesting that genes influence renal function. Genes that influence eGFR have been identified through genome-wide association studies. However, family-based linkage approaches may identify loci that explain a larger proportion of the heritability. This study used genome-wide linkage and association scans to identify quantitative trait loci (QTL) that influence eGFR.
Genome-wide linkage and sparse association scans of eGFR were performed in families ascertained by probands with advanced diabetic nephropathy (DN) from the multi-ethnic Family Investigation of Nephropathy and Diabetes (FIND) study. This study included 954 African Americans (AA), 781 American Indians (AI), 614 European Americans (EA) and 1,611 Mexican Americans (MA). A total of 3,960 FIND participants were genotyped for 6,000 single nucleotide polymorphisms (SNPs) using the Illumina Linkage IVb panel. GFR was estimated by the Modification of Diet in Renal Disease (MDRD) formula.
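For readers unfamiliar with the MDRD estimate, a minimal sketch follows. The abstract does not say which variant of the equation was applied; the code assumes the four-variable form with the IDMS-traceable coefficient of 175 (an earlier version used 186), and the function and parameter names are hypothetical.

```python
def egfr_mdrd(serum_creatinine_mg_dl: float, age_years: float,
              female: bool, black: bool) -> float:
    """eGFR in mL/min/1.73 m^2 from the four-variable MDRD equation.

    Assumption: IDMS-traceable coefficient (175). The study may have used
    a different variant of the equation.
    """
    if serum_creatinine_mg_dl <= 0 or age_years <= 0:
        raise ValueError("creatinine and age must be positive")
    egfr = 175.0 * serum_creatinine_mg_dl ** -1.154 * age_years ** -0.203
    if female:
        egfr *= 0.742   # sex adjustment
    if black:
        egfr *= 1.212   # ancestry adjustment used by the MDRD equation
    return egfr


# Hypothetical participant: 55-year-old woman, serum creatinine 1.4 mg/dL.
print(round(egfr_mdrd(1.4, 55, female=True, black=False), 1))
```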
The non-parametric linkage analysis, accounting for the effects of diabetes duration and BMI, identified the strongest evidence for linkage of eGFR on chromosome 20q11 (log of the odds [LOD] = 3.34; P = 4.4×10^−5) in MA and chromosome 15q12 (LOD = 2.84; P = 1.5×10^−4) in EA. In all subjects, the strongest linkage signal for eGFR was detected on chromosome 10p12 (P = 5.5×10^−4) at 44 cM near marker rs1339048. A subsequent association scan in both ancestry-specific groups and the entire population identified several SNPs significantly associated with eGFR across the genome.
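The relation between a LOD score and its pointwise P value can be sketched as follows. The code assumes the usual one-sided, 1-df asymptotic approximation, in which 2 ln(10) × LOD is compared with a 50:50 mixture of a point mass at zero and a chi-square distribution on 1 df; the abstract does not state that this is how the P values were obtained, and the function name is hypothetical.

```python
from math import log, sqrt
from statistics import NormalDist


def lod_to_pointwise_p(lod: float) -> float:
    """One-sided pointwise P value for a 1-df LOD score.

    Assumption: under the null, 2*ln(10)*LOD is a 50:50 mixture of 0 and
    chi-square(1), so P = P(Z > sqrt(2*ln(10)*LOD)) for standard normal Z.
    """
    if lod < 0:
        raise ValueError("LOD score must be non-negative")
    z = sqrt(2.0 * log(10.0) * lod)
    return NormalDist().cdf(-z)


for lod in (3.34, 2.84):
    print(f"LOD {lod}: P = {lod_to_pointwise_p(lod):.2e}")
```

Under this approximation the two reported LOD scores map to values close to the reported P values, which suggests the same convention was used.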
The present study describes the localization of QTL influencing eGFR on 20q11 in MA, 15q21 in EA and 10p12 in the combined ethnic groups participating in the FIND study. Identification of causal genes/variants influencing eGFR, within these linkage and association loci, will open new avenues for functional analyses and development of novel diagnostic markers for DN.
It is an obvious fact that the power of a test statistic is dependent upon the significance (alpha) level at which the test is performed. It is perhaps a less obvious fact that the relative performance of two statistics in terms of power is also a function of the alpha level. Through numerous personal discussions, we have noted that even some competent statisticians have the mistaken intuition that relative power comparisons at traditional levels such as α = 0.05 will be roughly similar to relative power comparisons at very low levels, such as the level α = 5 × 10^−8, which is commonly used in genome-wide association studies. In this brief note, we demonstrate that this notion is in fact quite wrong, especially with respect to comparing tests with differing degrees of freedom. In fact, at very low alpha levels the cost of additional degrees of freedom is often comparatively low. Thus we recommend that statisticians exercise caution when interpreting the results of power comparison studies which use alpha levels that will not be used in practice.
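One way to explore this numerically is sketched below. It is not the calculation used in the note: it assumes chi-square tests on 1 and 2 df that share the same noncentrality parameter, fixes that parameter so the 1-df test has 80% power at each alpha level, and then reports the power of the 2-df test. All function names are hypothetical, and only the Python standard library is used.

```python
from math import exp, log, sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def power_1df(ncp: float, alpha: float) -> float:
    """Power of a 1-df chi-square test with noncentrality parameter ncp."""
    z_crit = -_STD_NORMAL.inv_cdf(alpha / 2.0)
    shift = sqrt(ncp)
    return _STD_NORMAL.cdf(shift - z_crit) + _STD_NORMAL.cdf(-shift - z_crit)


def power_2df(ncp: float, alpha: float, max_terms: int = 2000) -> float:
    """Power of a 2-df chi-square test with noncentrality parameter ncp.

    Uses the Poisson-mixture form of the noncentral chi-square: with
    J ~ Poisson(ncp/2), the statistic is central chi-square on 2 + 2J df.
    """
    half_crit = -log(alpha)        # half the critical value (exact for 2 df)
    poisson_w = exp(-ncp / 2.0)    # P(J = 0)
    term = 1.0                     # half_crit**j / j!
    partial = 1.0                  # sum of half_crit**i / i! for i <= j
    power = 0.0
    for j in range(max_terms):
        # Survival function of central chi-square on 2 + 2j df at the cutoff.
        power += poisson_w * exp(-half_crit) * partial
        poisson_w *= (ncp / 2.0) / (j + 1)
        term *= half_crit / (j + 1)
        partial += term
    return power


def ncp_for_power_1df(target_power: float, alpha: float) -> float:
    """Noncentrality giving roughly the target power for the 1-df test
    (the negligible far-tail rejection probability is ignored)."""
    z_crit = -_STD_NORMAL.inv_cdf(alpha / 2.0)
    return (z_crit + _STD_NORMAL.inv_cdf(target_power)) ** 2


for alpha in (0.05, 5e-8):
    ncp = ncp_for_power_1df(0.80, alpha)
    print(f"alpha={alpha:g}: ncp={ncp:.2f}, "
          f"1-df power={power_1df(ncp, alpha):.3f}, "
          f"2-df power={power_2df(ncp, alpha):.3f}")
```

Comparing the gap between the two power columns at each alpha level gives a direct view of how the penalty for the extra degree of freedom changes with the significance threshold.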
Power; Small Significance Levels
We propose a two-step model-based approach, with correction for ascertainment, to linkage analysis of a binary trait with variable age of onset and apply it to a set of multiplex pedigrees segregating for adult glioma.
First, we fit segregation models by formulating the likelihood for a person to have a bivariate phenotype, affection status and age of onset, along with other covariates, and from these we estimate population trait allele frequencies and penetrance parameters as a function of age (N=281 multiplex glioma pedigrees). Second, the best fitting models are used as trait models in multipoint linkage analysis (N=74 informative multiplex glioma pedigrees). To correct for ascertainment, a prevalence constraint is used in the likelihood of the segregation models for all 281 pedigrees. Then the trait allele frequencies are re-estimated for the pedigree founders of the subset of 74 pedigrees chosen for linkage analysis.
Using the best fitting segregation models in model-based multipoint linkage analysis, we identified two separate peaks on chromosome 17; the first agreed with a region identified by Shete et al., who used model-free affected-only linkage analysis, but with a narrowed peak; and the second agreed with a second region they found but had a larger maximum log of the odds (LOD).
Our approach has the advantage of not requiring markers to be in linkage equilibrium unless the minor allele frequency is small (markers which tend to be uninformative for linkage), and of using more of the available information for LOD-based linkage analysis.
Glioma; model-based linkage; segregation; age of onset; prevalence constraint
Olson's conditional-logistic model retains the nice property of the LOD score formulation and has advantages over other methods that make it an appropriate choice for complex trait linkage mapping. However, the asymptotic distribution of the conditional-logistic likelihood-ratio (CL-LR) statistic with genetic constraints on the model parameters is unknown for some analysis models, even in the case of samples comprising only independent sib pairs. We derive approximations to the asymptotic null distributions of the CL-LR statistics and compare them with the empirical null distributions by simulation using independent affected sib pairs. Generally, the empirical null distributions of the CL-LR statistics match well the known or approximated asymptotic distributions for all analysis models considered except for the covariate model with a minimum-adjusted binary covariate. This work will provide useful guidelines for linkage analysis of real data sets for the genetic analysis of complex traits, thereby contributing to the identification of genes for disease traits.
linkage analysis; affected sib pairs; identity-by-descent; conditional-logistic model; genetic constraints; null distribution; likelihood-ratio statistics
Translation studies have been initiated to assess the combined effect of genetic loci from recently accomplished genome-wide association studies and the existing risk factors for early disease prediction. We propose a bagging optimal receiver operating characteristic (ROC) curve method to facilitate this research. Through simulation and real data application, we compared the new method with the commonly used allele counting method and logistic regression, and found that the new method yields a better performance. The new method was applied on the Wellcome Trust data set to form a predictive genetic test for rheumatoid arthritis. The formed test reached an area under the curve (AUC) value of 0.7.
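For orientation, the allele counting comparator and the AUC criterion can be sketched as follows. This is not the bagging optimal ROC method itself; it only illustrates the baseline score and the evaluation measure. The genotype data and function names are hypothetical.

```python
from typing import Sequence


def allele_count_score(genotypes: Sequence[int]) -> int:
    """Unweighted risk score: total risk alleles (0, 1 or 2 per locus)."""
    return sum(genotypes)


def auc(case_scores: Sequence[float], control_scores: Sequence[float]) -> float:
    """AUC as the Mann-Whitney probability that a randomly chosen case
    scores higher than a randomly chosen control (ties count one half)."""
    if not case_scores or not control_scores:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical genotypes at four risk loci (risk-allele counts per locus).
cases = [[2, 1, 1, 0], [1, 1, 2, 1], [0, 1, 1, 1]]
controls = [[0, 1, 0, 0], [1, 0, 1, 1], [0, 0, 1, 0]]
print(auc([allele_count_score(g) for g in cases],
          [allele_count_score(g) for g in controls]))
```

An AUC of 0.5 corresponds to a test no better than chance and 1.0 to perfect separation, which puts the reported value of 0.7 in context.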
Area under the ROC curve; Bootstrap aggregation; Gene–gene interaction; Genomewide association studies
A novel web-based tool PedWiz that pipelines the informatics process for pedigree data is introduced. PedWiz is designed to assist researchers in the analysis of pedigree data. It provides a convenient tool for pedigree informatics: descriptive statistics, relative pairs, genetic similarity coefficients, the variance-covariance matrix for three estimated coefficients of allele identical-by-descent sharing as well as mean allele sharing, a plot of the pedigree structures, and a visualization of the identity coefficients. With a renewed interest in linkage and other family based methods, PedWiz will be a valuable tool for the analysis of family data.
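As an illustration of one genetic similarity coefficient that can be derived from pedigree structure alone, the sketch below computes kinship coefficients with the standard recursive definition. It is not PedWiz code, and the abstract does not describe the algorithm PedWiz uses; the pedigree encoding and function name are assumptions.

```python
from functools import lru_cache
from typing import Dict, Optional, Tuple

Pedigree = Dict[str, Tuple[Optional[str], Optional[str]]]


def kinship_matrix(pedigree: Pedigree) -> Dict[Tuple[str, str], float]:
    """Kinship coefficients for every pair of pedigree members.

    pedigree maps each id to (father, mother); founders use (None, None).
    Assumption: parents are listed before their children.
    """
    order = {person: idx for idx, person in enumerate(pedigree)}
    for child, parents in pedigree.items():
        for parent in parents:
            if parent is not None and order[parent] > order[child]:
                raise ValueError(f"{parent} must be listed before {child}")

    @lru_cache(maxsize=None)
    def phi(a: Optional[str], b: Optional[str]) -> float:
        if a is None or b is None:
            return 0.0  # unknown parents are treated as unrelated founders
        if a == b:
            father, mother = pedigree[a]
            return 0.5 * (1.0 + phi(father, mother))
        # Recurse through the later-listed person, who cannot be an
        # ancestor of the other.
        if order[a] < order[b]:
            a, b = b, a
        father, mother = pedigree[a]
        return 0.5 * (phi(father, b) + phi(mother, b))

    return {(a, b): phi(a, b) for a in pedigree for b in pedigree}


# Hypothetical nuclear family with two children.
family: Pedigree = {
    "dad": (None, None),
    "mom": (None, None),
    "kid1": ("dad", "mom"),
    "kid2": ("dad", "mom"),
}
kin = kinship_matrix(family)
print(kin[("kid1", "kid2")], kin[("dad", "kid1")])
```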
pedigree; informatics; genetic similarity; identity-by-descent; relative pairs; family data
This paper is concerned with evaluating whether an interaction between two sets of risk factors for a binary trait is removable and fitting a parsimonious additive model using a suitable link function to estimate the disease odds (on the natural logarithm scale) when an interaction is removable. Statisticians define the term “interaction” as a departure from additivity in a linear model on a specific scale on which the data are measured. Certain interactions may be eliminated via a transformation of the outcome such that the relationship between the risk factors and the outcome is additive on the transformed scale. Such interactions are known as removable interactions. We develop a novel test statistic for detecting the presence of a removable interaction in case-control studies. We consider the Guerrero and Johnson family of transformations and show that this family constitutes an appropriate link function for fitting an additive model when an interaction is removable. We use simulation studies to examine the type I error and power of the proposed test and to show that an additive model based on the Guerrero and Johnson link function leads to more precise estimates of the disease odds parameters and a better fit when an interaction is removable. The proposed test and use of the transformation are illustrated using case-control data from three published studies. Finally, we indicate how one can check that, after transformation, no further interaction is significant.
Analysis of variance; curvature; independence; interaction effect; link function; main effect; residuals; score statistic; Tukey’s test; transformation; unbalanced data
Current genome-wide association studies still heavily rely on a single-marker strategy, in which each single nucleotide polymorphism (SNP) is tested individually for association with a phenotype. Although methods and software packages that consider multimarker models have become available, they have been slow to become widely adopted and their efficacy in real data analysis is often questioned. Based on conducting extensive simulations, here we endeavor to provide more insights into the performance of simple multimarker association tests as compared to single-marker tests. The results reveal the power advantage as well as disadvantage of the two- vs. the single-marker test. Power differentials depend on the correlation structure among tag SNPs, as well as that between tag SNPs and causal variants. A two-marker test has relatively better performance than single-marker tests when the correlation of the two adjacent markers is high. However, using HapMap data, two-marker tests tended to have a greater chance of being less powerful than single-marker tests, due to constraints on the number of actual possible haplotypes in the HapMap data. Yet, the average power difference was small whenever the one-marker test is more powerful, while there were many situations where the two-marker test can be much more powerful. These findings can be useful to guide analyses of future studies.
Asymptotic power; single-marker test; two-marker test; genome-wide association
We investigated the heritability and familial aggregation of various indexes of arterial stiffness and wave reflection and we partitioned the phenotypic correlation between these traits into shared genetic and environmental components.
Using a family-based population sample, we recruited 204 parents (mean age, 51.7 years) and 290 offspring (29.4 years) from the population in Cracow, Poland (62 families), Hechtel-Eksel, Belgium (36), and Pilsen, the Czech Republic (50). We measured peripheral pulse pressure (PPp) sphygmomanometrically at the brachial artery; central pulse pressure (PPc), the peripheral augmentation indexes (PAIxs) and central augmentation indexes (CAIxs) by applanation tonometry at the radial artery; and aortic pulse wave velocity (PWV) by tonometry or ultrasound. In multivariate-adjusted analyses, we used the ASSOC and PROC GENMOD procedures as implemented in SAGE and SAS, respectively.
We found significant heritability for PAIx, CAIx, PPc and mean arterial pressure ranging from 0.37 to 0.41; P ≤ 0.0001. The method of intrafamilial concordance confirmed these results; intrafamilial correlation coefficients were significant for all arterial indexes (r ≥ 0.12; P ≤ 0.02) with the exception of PPc (r = −0.007; P = 0.90) in parent–offspring pairs. The sib–sib correlations were also significant for CAIx (r = 0.22; P = 0.001). The genetic correlations between PWV and the other arterial indexes were significant (ρG ≥ 0.29; P < 0.0001). The corresponding environmental correlations were only significantly positive for PPp (ρE = 0.10, P = 0.03).
The observation of significant intrafamilial concordance and heritability of various indexes of arterial stiffness as well as the genetic correlations among arterial phenotypes strongly support the search for shared genetic determinants underlying these traits.
arterial stiffness; familial aggregation; heritability; pulse pressure; systolic augmentation
Segmental handling of sodium along the proximal and distal nephron might be heritable and different between black and white participants.
We randomly recruited 95 nuclear families of black South African ancestry and 103 nuclear families of white Belgian ancestry. We measured the fractional excretion of sodium (FENa) and estimated the fractional renal sodium reabsorption in the proximal (RNaprox) and distal (RNadist) tubules from the clearances of endogenous lithium and creatinine. In multivariable analyses, we studied the relation of RNaprox and RNadist with FENa and estimated the heritability (h2) of RNaprox and RNadist.
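The clearance-based quantities can be sketched as follows. The abstract does not give the formulas; the code assumes the commonly used lithium-clearance definitions, in which lithium clearance is taken as a marker of fluid delivery out of the proximal tubule. Function names and the example concentrations are hypothetical.

```python
from typing import Tuple


def fractional_excretion(urine_x: float, plasma_x: float,
                         urine_creat: float, plasma_creat: float) -> float:
    """Fractional excretion (%) of solute x: clearance of x divided by
    creatinine clearance. Urine and plasma values of each solute must
    share the same units."""
    return 100.0 * (urine_x * plasma_creat) / (plasma_x * urine_creat)


def segmental_reabsorption(fe_na: float, fe_li: float) -> Tuple[float, float]:
    """Return (RNaprox, RNadist) in percent from FENa and FELi in percent.

    Assumed definitions:
      RNaprox = (1 - C_Li / C_creat) * 100 = 100 - FELi
      RNadist = (1 - C_Na / C_Li) * 100    = (1 - FENa / FELi) * 100
    """
    if fe_li <= 0:
        raise ValueError("fractional lithium excretion must be positive")
    rna_prox = 100.0 - fe_li
    rna_dist = 100.0 * (1.0 - fe_na / fe_li)
    return rna_prox, rna_dist


# Hypothetical spot measurements (sodium in mmol/L, lithium in umol/L,
# creatinine in umol/L for both urine and plasma).
fe_na = fractional_excretion(90.0, 140.0, 9000.0, 80.0)
fe_li = fractional_excretion(30.0, 1.5, 9000.0, 80.0)
print(segmental_reabsorption(fe_na, fe_li))
```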
Independent of urinary sodium excretion, South Africans (n = 240) had higher RNaprox (unadjusted median, 93.9% vs. 81.0%; P < 0.001) than Belgians (n = 737), but lower RNadist (91.2% vs. 95.1%; P < 0.001). The slope of RNaprox on FENa was steeper in Belgians than in South Africans (−5.40 ± 0.58 vs. −0.78 ± 0.58 units; P < 0.001), whereas the opposite was true for the slope of RNadist on FENa (−3.84 ± 0.19 vs. −13.71 ± 1.30 units; P < 0.001). h2 of RNaprox and RNadist was high and significant (P < 0.001) in both countries. h2 was higher in South Africans than in Belgians for RNaprox (0.82 vs. 0.56; P < 0.001), but was similar for RNadist (0.68 vs. 0.50; P = 0.17). Of the filtered sodium load, black participants reabsorb more than white participants in the proximal nephron and less postproximally.
Segmental sodium reabsorption along the nephron is highly heritable, but the capacity for regulation in the proximal and postproximal tubules differs between whites and blacks.
clinical genetics; epidemiology; kidney; lithium clearance; salt sensitivity; segmental tubular sodium transport
15-Hydroxyprostaglandin dehydrogenase (15-PGDH) is a metabolic antagonist of COX-2, catalyzing the degradation of inflammation mediator prostaglandin E2 (PGE2) and other prostanoids. Recent studies have established the 15-PGDH gene as a colon cancer suppressor.
We evaluated 15-PGDH as a colon cancer susceptibility locus in a three-stage design. We first genotyped 102 single-nucleotide polymorphisms (SNPs) in the 15-PGDH gene, spanning ∼50 kb up and down-stream of the coding region, in 464 colon cancer cases and 393 population controls. We then genotyped the same SNPs, and also assayed the expression levels of 15-PGDH in colon tissues from 69 independent patients for whom colon tissue and paired germline DNA samples were available. In stage 3, we genotyped the 9 most promising SNPs from stages 1 and 2 in an independent sample of 525 cases and 816 controls.
In the first two stages, three SNPs (rs1365611, rs6844282 and rs2332897) were statistically significant (p<0.05) in combined analysis of association with risk of colon cancer and of association with 15-PGDH expression, after adjustment for multiple testing. For one additional SNP, rs2555639, the T allele showed increased cancer risk and decreased 15-PGDH expression, but just missed statistical significance (p-adjusted = 0.063). In stage 3, rs2555639 alone showed evidence of association with an odds ratio (TT compared to CC) of 1.50 (95% CI = 1.05–2.15, p = 0.026).
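As a consistency check, the stage 3 P value can be approximately recovered from the reported odds ratio and confidence interval. The sketch assumes a Wald interval that is symmetric on the log-odds scale; the abstract does not state how the interval or the P value was computed, and the function name is hypothetical.

```python
from math import log
from statistics import NormalDist
from typing import Tuple


def wald_p_from_or_ci(odds_ratio: float, ci_low: float, ci_high: float,
                      level: float = 0.95) -> Tuple[float, float, float]:
    """Return (standard error of log OR, z statistic, two-sided P value).

    Assumption: the interval is a Wald interval on the log-odds scale.
    """
    z_level = NormalDist().inv_cdf(0.5 + level / 2.0)
    se = (log(ci_high) - log(ci_low)) / (2.0 * z_level)
    z = log(odds_ratio) / se
    p = 2.0 * NormalDist().cdf(-abs(z))
    return se, z, p


# Values reported for rs2555639 (TT compared to CC) in stage 3.
se, z, p = wald_p_from_or_ci(1.50, 1.05, 2.15)
print(f"SE(log OR) = {se:.3f}, z = {z:.2f}, P = {p:.3f}")
```

Under this assumption the recovered P value is close to the reported 0.026, so the odds ratio, interval, and P value are mutually consistent.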
Our data suggest that the rs2555639 T allele is associated with increased risk of colon cancer, and that carriers of this risk allele exhibit decreased expression of 15-PGDH in the colon.
The potential importance of the joint action of genes, whether modeled with or without a statistical interaction term, has long been recognized. However, identifying such action has been a great challenge, especially when millions of genetic markers are involved. We propose a likelihood ratio-based Mann-Whitney test to search for joint gene action either among candidate genes or genome-wide. It extends the traditional univariate Mann-Whitney test to assess the joint association of genotypes at multiple loci with disease, allowing for high-order statistical interactions. Because only one overall significance test is conducted for the entire analysis, it avoids the issue of multiple testing. Moreover, the approach adopts a computationally efficient algorithm, making a genome-wide search feasible in a reasonable amount of time on a high performance personal computer. We evaluated the approach using both theoretical and real data. By applying the approach to 40 type 2 diabetes (T2D) susceptibility single-nucleotide polymorphisms (SNPs), we identified a four-locus model strongly associated with T2D in the Wellcome Trust (WT) study (permutation P-value < 0.001), and replicated the same finding in the Nurses’ Health Study/Health Professionals Follow-Up Study (NHS/HPFS) (P-value = 3.03 × 10^−11). We also conducted a genome-wide search on 385,598 SNPs in the WT study. The analysis took approximately 55 hr on a personal computer, identifying the same first two loci, but overall a different set of four SNPs, jointly associated with T2D (P-value = 1.29 × 10^−5). The nominal significance of this same association reached 4.01 × 10^−6 in the NHS/HPFS.
gene-gene interaction; genome-wide search; forward selection
Incorporating family data in genetic association studies has become increasingly appreciated, especially for its potential value in testing rare variants. We introduce here a variance-component based association test that can test multiple common or rare variants jointly using both family and unrelated samples.
The proposed approach implemented in our R package aggregates or collapses the information across a region based on genetic similarity instead of genotype scores, which avoids the power loss when the effects are in different directions or have different association strengths. The method is also able to effectively leverage the LD information in a region and it can produce a test statistic with an adaptively estimated number of degrees of freedom. Our method can readily allow for the adjustment of non-genetic contributions to the familial similarity, as well as multiple covariates.
We demonstrate through simulations that the proposed method achieves good performance in terms of Type I error control and statistical power. The method is implemented in the R package “fassoc”, which provides a useful tool for data analysis and exploration.
Association studies; Family data; Score test; Multi-marker test
It has been postulated that multiple-marker methods may have added ability, over single-marker methods, to detect genetic variants associated with disease. The Wellcome Trust Case Control Consortium (WTCCC) provided the first successful large genome-wide association studies (GWAS) which included single-marker association analyses for seven common complex diseases. Of those signals detected, only one was associated with coronary artery disease (CAD), and none were identified for hypertension (HTN). Our objective was to find additional genetic associations and pathways for cardiovascular disease by examining the WTCCC data for variants associated with CAD and HTN using two-marker testing methods. We applied two-marker association testing to the WTCCC dataset, which includes ~2,000 affected individuals with each disorder, and a shared pool of ~3,000 controls, all genotyped using Affymetrix GeneChip 500 K arrays. For CAD, we detected single nucleotide polymorphism (SNP) pairs in three genes showing genome-wide significance: HFE2, STK32B, and DIPC2. The most notable SNP pairs in a non-protein-coding region were at 9p21, a known major CAD-associated region. For HTN, we detected SNP pairs in five genes: GPR39, XRCC4, MYO6, ZFAT, and MACROD2. Four further associated SNP pair regions were at least 70 kb from any known gene. We have shown that novel, multiple-marker statistical methods can be of use in finding variants in GWAS. We describe many new associated variants for both CAD and HTN and describe their known genetic mechanisms.
The genetic etiology of complex human diseases has been commonly viewed as a process that involves multiple genetic variants, environmental factors, as well as their interactions. Statistical approaches, such as the multifactor dimensionality reduction (MDR) and generalized MDR (GMDR), have recently been proposed to test the joint association of multiple genetic variants with either dichotomous or continuous traits. In this paper, we propose a novel Forward U-Test to evaluate the combined effect of multiple loci on quantitative traits with consideration of gene-gene/gene-environment interactions. In this new approach, a U-Statistic-based forward algorithm is first used to select potential disease-susceptibility loci and then a weighted U statistic is used to test the joint association of the selected loci with the disease. Through a simulation study, we found the Forward U-Test outperformed GMDR in terms of greater power. Aside from that, our approach is less computationally intensive, making it feasible for high-dimensional gene-gene/gene-environment research. We illustrate our method with a real data application to Nicotine Dependence (ND), using three independent datasets from the Study of Addiction: Genetics and Environment. Our gene-gene interaction analysis of 155 SNPs in 67 candidate genes identified two SNPs, rs16969968 within gene CHRNA5 and rs1122530 within gene NTRK2, jointly associated with the level of ND (p-value = 5.31e-7). The association, which involves essential interaction, is replicated in two independent datasets with p-values of 1.08e-5 and 0.02, respectively. Our finding suggests that joint action may exist between the two gene products.
gene-gene interaction; Forward U-Test; Nicotine Dependence
Familial aggregation of specific response to allergens and asthma adjusted for age and sensitization to multiple allergens was assessed in two large population cohorts. Allergen skin prick tests (SPTs) were administered to 1151 families in the Tucson Children’s Respiratory Study (CRS) and 435 families in the Tucson Epidemiological Study of Airway Obstructive Disease (TESAOD). Sensitization was defined by wheal size ≥ 3 mm; physician-diagnosed asthma at age ≥ 8 years was based on questionnaires. Using S.A.G.E. 6.1 software ASSOC and FCOR, familial correlations of crude and adjusted phenotypes were evaluated. Crude estimates of parent-offspring (P-O) and sibling correlations were statistically significant for most allergens, ranging from 0.03 to 0.29. After adjusting for age of assessment and “other atopy” (SPT-positive response to additional allergens), correlations were reduced by 14–71%. Sibling correlations for specific response to allergens were consistently higher than P-O correlations, but this difference was significant only for dust mite and weed mix in the TESAOD population. Familial correlation for atopic status (any positive SPTs versus none) tended to be higher than for specific allergens. Asthma, with and without adjustment, showed greater familial correlation than either specific or general SPT response and significantly higher sibling correlation in TESAOD than in CRS, probably due to the older age of the siblings and the longer period of ascertainment. In conclusion, significant familial aggregation of specific response to allergen after adjustment for other atopy appears to reflect a genetic propensity toward atopy, dependent on shared familial exposures. Results also suggest that inheritance of asthma is independent of atopic sensitization.
familial aggregation; specific response to allergens; atopy; asthma
Genetic influences may be discerned in families that have multiple affected members and may manifest as an earlier age of cancer diagnosis. In this study we determine whether cancers develop at an earlier age in multiplex Familial Barrett’s Esophagus (FBE) kindreds, defined by 3 or more members affected by Barrett’s esophagus (BE) or esophageal adenocarcinoma (EAC).
Information on BE/EAC risk factors and family history was collected from probands at eight tertiary care academic hospitals. Age of cancer diagnosis and other risk factors were compared between non-familial (no affected relatives), duplex (two affected relatives), and multiplex (three or more affected relatives) FBE kindreds.
The study included 830 non-familial, 274 duplex and 41 multiplex FBE kindreds with 274, 133 and 43 EAC and 566, 288 and 103 BE cases, respectively. Multivariable mixed models adjusting for familial correlations showed that multiplex kindreds were associated with a younger age of cancer diagnosis (p = 0.0186). Median age of cancer diagnosis was significantly younger in multiplex compared to duplex and non-familial kindreds (57 vs. 62 vs. 63 yrs, respectively, p = 0.0448). Mean body mass index (BMI) was significantly lower in multiplex kindreds (p = 0.0033) as was smoking (p < 0.0001), and reported regurgitation (p = 0.0014).
Members of multiplex FBE kindreds develop EAC at an earlier age compared to non-familial EAC cases. Multiplex kindreds do not have a higher proportion of common risk factors for EAC, suggesting that this aggregation might be related to a genetic factor.
These findings indicate that efforts to identify susceptibility genes for BE and EAC will need to focus on multiplex kindreds.
Esophageal adenocarcinoma; Barrett’s esophagus; genetics; family history
Interactions among genomic loci (also known as epistasis) have been suggested as one of the potential sources of missing heritability in single locus analysis of genome-wide association studies (GWAS). The computational burden of searching for interactions is compounded by the extremely low threshold for identifying significant p-values due to multiple hypothesis testing corrections. Utilizing prior biological knowledge to restrict the set of candidate SNP pairs to be tested can alleviate this problem, but systematic studies that investigate the relative merits of integrating different biological frameworks and GWAS data have not been conducted.
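The multiple-testing burden described above can be made concrete with a little arithmetic. The sketch below assumes a Bonferroni correction at a family-wise level of 0.05; the panel size of 500,000 SNPs and the 10,000 prior-knowledge candidate pairs are hypothetical figures chosen only for illustration, not values from the study.

```python
from math import comb


def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


# Hypothetical sizes, chosen only to illustrate the scale of the problem.
n_snps = 500_000            # SNPs on an assumed genotyping panel
n_candidate_pairs = 10_000  # SNP pairs kept after an assumed biological filter

# Single-locus scan: one test per SNP.
single_locus = bonferroni_threshold(n_snps)

# Exhaustive pairwise scan: one test per unordered SNP pair.
all_pairs = comb(n_snps, 2)
exhaustive = bonferroni_threshold(all_pairs)

# Scan restricted to candidate pairs.
restricted = bonferroni_threshold(n_candidate_pairs)

print(f"single-locus tests: {n_snps:,}  threshold: {single_locus:.2e}")
print(f"exhaustive pairs:   {all_pairs:,}  threshold: {exhaustive:.2e}")
print(f"candidate pairs:    {n_candidate_pairs:,}  threshold: {restricted:.2e}")
```

Under these assumed numbers, the exhaustive pairwise threshold is several orders of magnitude stricter than the single-locus one, while restricting the search to candidate pairs relaxes it considerably.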
We developed four biologically based frameworks to identify pairwise interactions among candidate SNP pairs as follows: (1) for each human protein-coding gene, a set of SNPs associated with that gene was constructed, providing a gene-based interaction model; (2) for each known biological pathway, a set of SNPs associated with the genes in the pathway was constructed, providing a pathway-based interaction model; (3) a set of SNPs associated with genes in a disease-related subnetwork was constructed, providing a network-based interaction model; and (4) a framework based on the function of SNPs was constructed. The last approach uses expression SNPs (eSNPs or eQTLs), which are SNPs or loci that have defined effects on the abundance of transcripts of other genes. We constructed pairs of eSNPs and SNPs located in the target genes whose expression is regulated by eSNPs. For all four frameworks, the SNP sets were exhaustively tested for pairwise interactions within the sets using a traditional logistic regression model after excluding genes that were previously identified to associate with the trait. Using previously published GWAS data for type 2 diabetes (T2D) and the biologically based pairwise interaction modeling, we identify twelve genes not seen in the previous single locus analysis.
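As a rough illustration of how set-based frameworks restrict the search space, the sketch below enumerates SNP pairs only within each set (for example, the SNPs assigned to one gene or one pathway). The set names, SNP identifiers, and the helper `within_set_pairs` are hypothetical; the abstract does not specify how SNPs were assigned to sets or how the logistic regression model was coded, so only the pair enumeration step is shown.

```python
from itertools import combinations


def within_set_pairs(snp_sets):
    """Return the unordered SNP pairs that share at least one set.

    snp_sets maps a set name (gene, pathway, ...) to a list of SNP IDs.
    Each pair is returned once, even if it occurs in several sets.
    """
    pairs = set()
    for snps in snp_sets.values():
        # Sort so that each pair has a canonical (smaller, larger) order.
        for a, b in combinations(sorted(set(snps)), 2):
            pairs.add((a, b))
    return pairs


# Hypothetical gene-based sets; "snp3" is assigned to both genes.
gene_sets = {
    "GENE_A": ["snp1", "snp2", "snp3"],
    "GENE_B": ["snp3", "snp4"],
}

candidate_pairs = within_set_pairs(gene_sets)
n_snps = len({s for snps in gene_sets.values() for s in snps})
n_all_pairs = n_snps * (n_snps - 1) // 2

print(sorted(candidate_pairs))
print(f"{len(candidate_pairs)} candidate pairs out of {n_all_pairs} possible")
```

Each candidate pair would then be passed to an interaction test; with real gene or pathway annotations, the reduction relative to an exhaustive scan is far larger than in this toy input.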
We present four approaches to detect interactions associated with complex diseases. The results show our approaches outperform the traditional single locus approaches in detecting genes that previously did not reach significance; the results also provide novel drug targets and biomarkers relevant to the underlying mechanisms of disease.
Dopamine β-hydroxylase (DβH) catalyzes the conversion of dopamine to norepinephrine. DβH enters the plasma after vesicular release from sympathetic neurons and the adrenal medulla. Plasma DβH activity (pDβH) varies widely among individuals, and genetic inheritance regulates that variation. Linkage studies suggested strong linkage of pDβH to ABO on 9q34, and positive evidence for linkage to the complement fixation locus on 19p13.2-13.3. Subsequent association studies strongly supported DBH, which maps adjacent to ABO, as the locus regulating a large proportion of the heritable variation in pDβH. Prior studies have suggested that variation in pDβH, or genetic variants at DBH, associate with differences in expression of psychotic symptoms in patients with schizophrenia and other idiopathic or drug-induced brain disorders, suggesting that DBH might be a genetic modifier of psychotic symptoms. As a first step toward investigating that hypothesis, we performed linkage analysis on pDβH in patients with schizophrenia and their relatives. The results strongly confirm linkage of markers at DBH to pDβH under several models (maximum multipoint LOD score, 6.33), but find no evidence to support linkage anywhere on chromosome 19. Accounting for the contributions to the linkage signal of three SNPs at DBH (rs1611115, rs1611122, and rs6271) reduced but did not eliminate the linkage peak, whereas accounting for all SNPs near DBH eliminated the signal entirely. Analysis of markers genome-wide uncovered positive evidence for linkage between markers at chromosome 20p12 and pDβH (multipoint LOD = 3.1 at 27.2 cM). The present results provide the first direct evidence for linkage between DBH and pDβH, suggest that rs1611115, rs1611122, rs6271 and additional unidentified variants at or near DBH contribute to the genetic regulation of pDβH, and suggest that a locus near 20p12 also influences pDβH.
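For readers less familiar with the LOD scale used above, the following sketch applies the standard definition of a LOD score as the base-10 logarithm of the ratio between the likelihood under linkage and the likelihood under no linkage. The function names are illustrative, and nothing here reproduces the linkage models actually fitted in the study; it only converts the two reported LOD scores to other scales.

```python
import math

LN10 = math.log(10.0)


def lod_from_loglik(lnl_linked, lnl_unlinked):
    """LOD score from natural-log likelihoods of the two hypotheses."""
    return (lnl_linked - lnl_unlinked) / LN10


def lod_to_likelihood_ratio(lod):
    """Likelihood ratio (linkage : no linkage) implied by a LOD score."""
    return 10.0 ** lod


def lod_to_lrt_statistic(lod):
    """Likelihood-ratio test statistic, 2 * ln(LR), implied by a LOD score."""
    return 2.0 * LN10 * lod


# LOD scores reported in the abstract above.
for label, lod in [("DBH region", 6.33), ("20p12", 3.1)]:
    lr = lod_to_likelihood_ratio(lod)
    stat = lod_to_lrt_statistic(lod)
    print(f"{label}: LOD = {lod}, likelihood ratio = {lr:.3g}, 2*ln(LR) = {stat:.2f}")
```

A LOD of 6.33 thus corresponds to data that are roughly two million times more likely under linkage than under no linkage, which is why the DBH signal is described as strong while the 20p12 signal is described only as positive evidence.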
Numerous studies have provided support for genetic susceptibility to tuberculosis (TB); however, heterogeneity in disease expression has hampered previous genetic studies. The purpose of this work was to investigate possible intermediate phenotypes for TB. A set of cytokine profiles, including antigen-stimulated whole-blood assays for interferon (IFN)–γ, tumor necrosis factor (TNF)–α, transforming growth factor (TGF)–β, and the ratio of IFN to TNF, were analyzed in 177 pedigrees from a community in Uganda with a high prevalence of TB. The heritability of these variables was estimated after adjustment for covariates, and TNF-α, in particular, had an estimated heritability of 68%. A principal component analysis of IFN-γ, TNF-α, and TGF-β reflected the immunologic model of TB. In this analysis, the first component explained >38% of the variation in the data. This analysis illustrates the value of such intermediate phenotypes in mapping susceptibility loci for TB and demonstrates that this area deserves further research.
It is generally known that risk variants segregate together with a disease within families, but this information has not been used in the existing statistical methods for detecting rare variants. Here we introduce two weighted sum statistics that can apply to either genome-wide association data or resequencing data for identifying rare disease variants, with weights calculated based on sibpairs and odds ratios, respectively. We evaluated the two methods via extensive simulations under different disease models. We compared the proposed methods with the weighted sum statistic (WSS) proposed by Madsen and Browning, keeping the same genotyping or resequencing cost. Our methods clearly demonstrate more statistical power than the WSS. In addition, we found that using sibpair information can increase power over using only unrelated samples by more than 40%. We applied our methods to the Framingham Heart Study (FHS) and Wellcome Trust Case Control Consortium (WTCCC) hypertension datasets. Although we did not identify any genes as reaching a genome-wide significance level, we found variants in the candidate gene angiotensinogen (AGT) significantly associated with hypertension at P = 6.9×10⁻⁴, whereas the most significant single SNP association evidence is P = 0.063. We further applied the odds ratio weighted method to the IFIH1 gene for type 1 diabetes in the WTCCC data. Our method yielded a P value of 4.82×10⁻⁴, much more significant than that obtained by haplotype-based methods. We demonstrated that family data are extremely informative in searching for rare variants underlying complex traits, and the odds ratio weighted sum statistic is more efficient than currently existing methods.
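The abstract does not give the sibpair or odds ratio weights, so they are not reproduced here. As a point of reference, the sketch below follows the general shape of the baseline Madsen and Browning WSS used as the comparator: each variant is weighted by its estimated frequency in unaffected individuals, each person receives a weighted score, and the rank sum of affected individuals is compared with permuted labels. Two simplifications are assumptions of this sketch: an empirical permutation P value replaces a normal approximation, and the toy genotypes are invented for illustration.

```python
import math
import random


def variant_weights(genotypes, affected):
    """Weight per variant from allele counts in unaffected individuals."""
    n_total = len(genotypes)
    unaffected = [g for g, a in zip(genotypes, affected) if not a]
    n_u = len(unaffected)
    weights = []
    for i in range(len(genotypes[0])):
        m_u = sum(g[i] for g in unaffected)   # variant alleles in unaffected
        q = (m_u + 1) / (2 * n_u + 2)         # smoothed frequency, 0 < q < 1
        weights.append(math.sqrt(n_total * q * (1 - q)))
    return weights


def rank_sum_affected(genotypes, affected):
    """Sum of score ranks (ties averaged) over affected individuals."""
    w = variant_weights(genotypes, affected)
    scores = [sum(c / wi for c, wi in zip(g, w)) for g in genotypes]
    order = sorted(range(len(scores)), key=scores.__getitem__)
    ranks = [0.0] * len(scores)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[pos]]:
            end += 1
        avg_rank = (pos + end) / 2 + 1        # average of 1-based ranks in the tie
        for k in range(pos, end + 1):
            ranks[order[k]] = avg_rank
        pos = end + 1
    return sum(r for r, a in zip(ranks, affected) if a)


def wss_permutation_test(genotypes, affected, n_perm=1000, seed=0):
    """Observed rank sum and one-sided empirical permutation P value."""
    rng = random.Random(seed)
    observed = rank_sum_affected(genotypes, affected)
    labels = list(affected)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)                   # weights are recomputed per permutation
        if rank_sum_affected(genotypes, labels) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Invented toy data: rows are individuals, columns are rare-variant allele counts.
genotypes = [
    [1, 0, 0], [0, 1, 0], [1, 1, 0], [0, 0, 1], [0, 0, 0], [1, 0, 1],  # affected
    [0, 0, 0], [0, 1, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0],  # unaffected
]
affected = [1] * 6 + [0] * 6

observed, p_value = wss_permutation_test(genotypes, affected)
print(f"rank sum of affected = {observed}, permutation P = {p_value:.4f}")
```

Down-weighting variants that are common among unaffected individuals lets several rare variants in one gene contribute jointly to the score, which is the property the proposed sibpair and odds ratio weights aim to improve on.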