Biological pathways provide rich information and biological context on the genetic causes of complex diseases. The logistic kernel machine test integrates prior knowledge on pathways in order to analyze data from genome-wide association studies (GWAS). Here, the kernel converts the genomic information of two individuals into a quantitative value reflecting their genetic similarity. By choosing a kernel, one implicitly chooses a genetic effect model. Like many other pathway methods, none of the available kernels accounts for the topological structure of the pathway or for gene-gene interaction types. However, evidence indicates that the connectivity and neighborhood of genes are crucial in the context of GWAS, because genes associated with a disease often interact. Thus, we propose a novel kernel that incorporates the topology of pathways and information on interactions. Using simulation studies, we demonstrate that the proposed method maintains the type I error rate correctly and can be more effective than non-network-based methods in identifying pathways associated with a disease. We apply our approach to genome-wide association case-control data on lung cancer and rheumatoid arthritis. We identify some promising new pathways associated with these diseases, which may improve our current understanding of the genetic mechanisms.
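To make the idea of a topology-aware kernel concrete, the following Python sketch builds an individual-by-individual similarity matrix of the form K = G A Gᵀ. It is an illustration under stated assumptions, not the kernel proposed in this work: the per-gene genotype summaries G, the signed coding of the adjacency matrix A (+1 for activation, -1 for inhibition), the added self-interactions and the eigenvalue clipping used to keep K positive semidefinite are all our own hypothetical choices.

```python
import numpy as np


def nearest_psd(a):
    """Nearest positive semidefinite matrix (Frobenius norm) by clipping eigenvalues."""
    sym = (a + a.T) / 2.0
    vals, vecs = np.linalg.eigh(sym)
    vals = np.clip(vals, 0.0, None)
    # Reassemble V diag(lambda) V^T with negative eigenvalues set to zero
    return (vecs * vals) @ vecs.T


def network_kernel(gene_scores, adjacency):
    """Similarity K = G A G^T between individuals (hypothetical sketch).

    gene_scores: (n_individuals, n_genes) per-gene genotype summaries (assumed).
    adjacency:   (n_genes, n_genes) signed interaction matrix, +1 activation,
                 -1 inhibition, 0 no known interaction (assumed coding).
    """
    a = adjacency + np.eye(adjacency.shape[0])  # each gene also "interacts" with itself
    a_psd = nearest_psd(a)                      # makes K a valid (PSD) kernel
    return gene_scores @ a_psd @ gene_scores.T
```

Because A is projected onto the positive semidefinite cone before use, K is positive semidefinite for any G. This property is what allows a network-derived matrix to serve as a kernel in the score test.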
Kernel Machine Test; Pathways; Networks; Gene-Gene Interactions; Score Test; Generalized Linear Model; Lung Cancer; Rheumatoid Arthritis; Disease Association; Genetic Association Studies
The analysis of gene-environment (GxE) interactions remains one of the greatest challenges in the post-genome-wide-association-studies (GWAS) era. Recent methods constitute a compromise between robust but underpowered case-control methods and powerful case-only methods. Inferences from the latter are biased when the assumption of gene-environment (G-E) independence fails. We propose a novel empirical hierarchical Bayes approach to GxE interaction (EHB-GE), which benefits from greater power while accounting for population-based G-E dependence. Building on the hierarchical Bayes prioritization approach of Lewinger et al. (Genet Epidemiol 31:871-882), the method uses G-E information across the genome to obtain posterior estimates of G-E association in controls and adjusts the resulting test statistics for this dependence: the posterior estimates are subtracted from the corresponding G-E association coefficients within cases.
We compared EHB-GE with rival methods using simulation. EHB-GE has similar or greater rank power to detect GxE interactions in the presence of many G-E associations with weak to strong effects, or of only a few such associations with large effects. When there are no G-E associations, or only a few weak ones, Murcray et al.'s method (Am J Epidemiol 169:219-226) better identifies markers with weak GxE interaction effects. We applied EHB-GE and competing methods to four lung cancer case-control GWAS from the TRICL/ILCCO consortium, with smoking as the environmental factor. Genes identified by the EHB-GE approach are reasonable candidates, suggesting that the method is useful.
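The core adjustment step can be illustrated with a simplified normal-normal empirical Bayes model. In this hypothetical sketch, the control-only G-E log odds ratios are shrunk toward their genome-wide mean using a method-of-moments prior variance. The shrunken values are then subtracted from the case-only coefficients. The actual EHB-GE prior, built on Lewinger et al.'s hierarchical model, is richer than this, and the function names are our own.

```python
import numpy as np


def eb_shrink(beta_ctrl, se_ctrl):
    """Normal-normal empirical Bayes shrinkage of control-only G-E log odds ratios.

    Assumed prior: true coefficients ~ N(mu, tau2), with mu and tau2 estimated
    from all markers by the method of moments. Standard errors must be > 0.
    """
    beta_ctrl = np.asarray(beta_ctrl, float)
    var = np.asarray(se_ctrl, float) ** 2
    mu = beta_ctrl.mean()
    tau2 = max(beta_ctrl.var(ddof=1) - var.mean(), 0.0)
    weight = tau2 / (tau2 + var)  # noisy markers are pulled more strongly toward mu
    return mu + weight * (beta_ctrl - mu)


def ehb_ge_like(beta_case, beta_ctrl, se_ctrl):
    """Case-only G-E coefficient minus the posterior control G-E estimate (hypothetical)."""
    return np.asarray(beta_case, float) - eb_shrink(beta_ctrl, se_ctrl)
```

With very imprecise control estimates, the adjustment collapses to the genome-wide mean. With very precise ones, it approaches a plain case-minus-control difference, which is the trade-off the hierarchical approach exploits.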
population G-E association; GWAS; rank power; lung cancer
The kernel score statistic is a global covariance component test over a set of genetic markers. It provides a flexible modeling framework and does not collapse marker information. We generalize the kernel score statistic to allow for familial dependencies and to adjust for random confounder effects. With this extension, we adjust our analysis of real and simulated baseline systolic blood pressure for polygenic familial background. We find that, for very rare single-nucleotide polymorphisms (SNPs) with a minor allele frequency below 1%, the kernel score test gains appreciably in power when sequence data are used instead of tag SNPs.
Genetic Analysis Workshop 18 provided a platform for developing and evaluating statistical methods to analyze whole-genome sequence data from a pedigree-based sample. In this article we present an overview of the data sets and the contributions that analyzed these data. The family data, donated by the Type 2 Diabetes Genetic Exploration by Next-Generation Sequencing in Ethnic Samples Consortium, included sequence-level genotypes based on sequencing and imputation, genome-wide association genotypes from prior genotyping arrays, and phenotypes from longitudinal assessments. The contributions from individual research groups were extensively discussed before, during, and after the workshop in theme-based discussion groups before being submitted for publication.
Recent evidence suggests that inflammation plays a pivotal role in the development of lung cancer. In this study, we used a two-stage approach to investigate associations between genetic variants in inflammation pathways and lung cancer risk based on genome-wide association study (GWAS) data. A total of 7,650 sequence variants from 720 genes relevant to inflammation pathways were identified using keyword and pathway searches of the GeneCards and Gene Ontology databases. In Stage 1, six GWAS datasets from the International Lung Cancer Consortium were pooled (4,441 cases and 5,094 controls of European ancestry), and a hierarchical modeling (HM) approach was used to incorporate prior information for each of the variants into the analysis. The prior matrix was constructed using (1) the role of genes in the inflammation and immune pathways; (2) physical properties of the variants, including their location, conservation scores and amino acid coding; (3) linkage disequilibrium (LD) with other functional variants; and (4) measures of heterogeneity across the studies. HM affected the priority ranking of variants particularly among those having low prior weights, imprecise estimates and/or heterogeneity across studies. In Stage 2, we used an independent NCI lung cancer GWAS (5,699 cases and 5,818 controls) for in silico replication. We identified one novel variant significant at the level corrected for multiple comparisons (rs2741354 in EPHX2 at 8q21.1, p = 7.4 × 10⁻⁶) and confirmed the associations of TERT (rs2736100) and the HLA region with lung cancer risk. HM allows prior knowledge, such as that from bioinformatic sources, to be incorporated into the analysis systematically, and it represents a complementary analytical approach to conventional GWAS analysis.
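A common way to implement the second stage of such a hierarchical model is semi-Bayes shrinkage with a fixed prior variance τ². The sketch below assumes that form. The first-stage log odds ratios β̂ have variances V, the prior matrix is Z, and the second-stage model is β = Zπ + δ with δ ~ N(0, τ²). It is not necessarily the exact estimator used in the study, and the function name semi_bayes is hypothetical.

```python
import numpy as np


def semi_bayes(beta_hat, var_hat, z, tau2):
    """Semi-Bayes hierarchical estimates with a fixed second-stage variance tau2.

    beta_hat: first-stage log odds ratios; var_hat: their variances;
    z: (n_variants, n_prior_covariates) prior matrix.
    Returns posterior estimates and approximate posterior variances.
    """
    beta_hat = np.asarray(beta_hat, float)
    var_hat = np.asarray(var_hat, float)
    z = np.asarray(z, float)
    w_inv = 1.0 / (var_hat + tau2)  # marginal precision of each first-stage estimate
    # Weighted least squares for the prior coefficients: (Z'WZ)^-1 Z'W beta_hat
    zw = z.T * w_inv
    pi_hat = np.linalg.solve(zw @ z, zw @ beta_hat)
    prior_mean = z @ pi_hat
    shrink = var_hat / (var_hat + tau2)  # imprecise estimates move most toward the prior
    beta_post = shrink * prior_mean + (1.0 - shrink) * beta_hat
    var_post = (1.0 - shrink) * var_hat  # ignores the uncertainty in pi_hat
    return beta_post, var_post
```

The shrinkage weight V/(V + τ²) shows why re-ranking was strongest for variants with imprecise estimates. Their estimates are pulled hardest toward the prior mean implied by Z.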
The logistic kernel machine test (LKMT) is a testing procedure tailored to high-dimensional genetic data. Its use in pathway analyses of GWA case-control studies is motivated by its computational efficiency and its flexibility in incorporating additional information via the kernel. The kernel can be any positive definite function; unfortunately, its form strongly influences power and bias. Most authors have recommended the simple linear kernel. We demonstrate via simulation that, when this kernel is applied, the probability of rejecting the null hypothesis of no association just by chance increases with the number of SNPs or genes in the pathway.
We propose a novel kernel that includes an appropriate standardization, in order to protect against any inflation of false positive results. Moreover, our novel kernel contains information on gene membership of SNPs in the pathway.
In an application to data from the NARAC Rheumatoid Arthritis Consortium, we find that even this basic genomic structure can improve the ability of the LKMT to identify meaningful associations. We also demonstrate that the standardization effectively eliminates problems with size bias.
We recommend the use of our standardized kernel and urge caution when using non-adjusted kernels in the LKMT to conduct pathway analysis.
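The sketch below contrasts the plain linear kernel with one hypothetical gene-aware weighting. Each gene's linear kernel is divided by its number of SNPs, so that genes with many SNPs do not dominate the similarity. The sketch then computes the kernel score statistic Q = (y − ȳ)ᵀ K (y − ȳ) with a permutation p-value. The weighting is an assumption for illustration, not the standardized kernel proposed above. The sketch also assumes no covariates, so the null fitted values are simply ȳ.

```python
import numpy as np


def linear_kernel(z):
    """K_ij = sum over SNPs of z_is * z_js (genotypes coded 0/1/2)."""
    return z @ z.T


def gene_standardized_kernel(z, gene_of_snp):
    """Sum of per-gene linear kernels, each divided by that gene's SNP count.

    Hypothetical weighting for illustration only.
    """
    gene_of_snp = np.asarray(gene_of_snp)
    k = np.zeros((z.shape[0], z.shape[0]))
    for g in np.unique(gene_of_snp):
        zg = z[:, gene_of_snp == g]
        k += zg @ zg.T / zg.shape[1]
    return k


def kernel_score_test(y, k, n_perm=999, seed=0):
    """Q = (y - ybar)' K (y - ybar) without covariates, with a permutation p-value."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y, float)

    def q(v):
        r = v - v.mean()
        return r @ k @ r

    q_obs = q(y)
    exceed = sum(q(rng.permutation(y)) >= q_obs for _ in range(n_perm))
    return q_obs, (exceed + 1) / (n_perm + 1)
```

Multiplying K by a single constant would not change the permutation p-value. Any standardization that matters must therefore change the relative weight of SNPs or genes within the pathway.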
Logistic Kernel Machine Regression; Size Bias; Pathway Analysis; GWAS; Rheumatoid Arthritis
To clarify the role of previous lung diseases (chronic bronchitis, emphysema, pneumonia, and tuberculosis) in the development of lung cancer, the authors conducted a pooled analysis of studies in the International Lung Cancer Consortium. Seventeen studies including 24,607 cases and 81,829 controls (noncases), mainly conducted in Europe and North America, were included (1984–2011). Using self-reported data on previous diagnoses of lung diseases, the authors derived study-specific effect estimates by means of logistic regression models or Cox proportional hazards models adjusted for age, sex, and cumulative tobacco smoking. Estimates were pooled using random-effects models. Analyses stratified by smoking status and histology were also conducted. A history of emphysema conferred a 2.44-fold increased risk of lung cancer (95% confidence interval (CI): 1.64, 3.62 (16 studies)). A history of chronic bronchitis conferred a relative risk of 1.47 (95% CI: 1.29, 1.68 (13 studies)). Tuberculosis (relative risk = 1.48, 95% CI: 1.17, 1.87 (16 studies)) and pneumonia (relative risk = 1.57, 95% CI: 1.22, 2.01 (12 studies)) were also associated with lung cancer risk. Among never smokers, elevated risks were observed for emphysema, pneumonia, and tuberculosis. These results suggest that previous lung diseases influence lung cancer risk independently of tobacco use and that these diseases are important for assessing individual risk.
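The random-effects pooling step can be sketched with the DerSimonian-Laird estimator. This is a standard choice that we assume here, because the abstract does not name the estimator. A helper also shows how a log relative risk and its standard error can be recovered from a reported 95% CI. The pooling function returns I², the usual heterogeneity measure. All estimates are on the log scale and should be exponentiated for reporting.

```python
import numpy as np

Z95 = 1.959963984540054  # 97.5% quantile of the standard normal


def log_rr_and_se(rr, lower, upper):
    """Log relative risk and its SE recovered from a point estimate and 95% CI."""
    return np.log(rr), (np.log(upper) - np.log(lower)) / (2 * Z95)


def dersimonian_laird(y, se):
    """Random-effects pooled log estimate, its 95% CI, tau^2 and I^2 (in %)."""
    y = np.asarray(y, float)
    w = 1.0 / np.asarray(se, float) ** 2          # fixed-effect (inverse-variance) weights
    y_fixed = np.sum(w * y) / np.sum(w)
    q = np.sum(w * (y - y_fixed) ** 2)            # Cochran's Q
    df = len(y) - 1
    c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
    tau2 = max(0.0, (q - df) / c)                 # between-study variance
    w_re = 1.0 / (1.0 / w + tau2)                 # random-effects weights
    y_re = np.sum(w_re * y) / np.sum(w_re)
    se_re = np.sqrt(1.0 / np.sum(w_re))
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return y_re, (y_re - Z95 * se_re, y_re + Z95 * se_re), tau2, i2
```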
bronchitis; chronic; emphysema; lung diseases; lung neoplasms; meta-analysis; pneumonia; pulmonary disease; chronic obstructive; tuberculosis
Background and Methods
Familial aggregation of lung cancer exists after accounting for cigarette smoking. However, the extent to which family history affects risk by smoking status, histology, relative type and ethnicity is not well described. This pooled analysis included 24 case-control studies in the International Lung Cancer Consortium. Each study collected age of onset/interview, gender, race/ethnicity, cigarette smoking, histology and first-degree family history of lung cancer. Data from 24,380 lung cancer cases and 23,305 healthy controls were analyzed. Unconditional logistic regression models and generalized estimating equations were used to estimate odds ratios and 95% confidence intervals.
Individuals with a first-degree relative with lung cancer had a 1.51-fold increase in risk of lung cancer, after adjustment for smoking and other potential confounders (95% CI: 1.39, 1.63). The association was strongest for those with a family history in a sibling, after adjustment (OR=1.82, 95% CI: 1.62, 2.05). No modifying effect by histologic type was found. Never smokers showed a weaker association with a positive family history of lung cancer (OR=1.25, 95% CI: 1.03, 1.52), which was slightly stronger for those with an affected sibling (OR=1.44, 95% CI: 1.07, 1.93), after adjustment.
The increased risk among never smokers and the similar magnitude of the family-history effect on lung cancer risk across histological types suggest that familial aggregation of lung cancer is independent of the effects of cigarette smoking. While the role of genetic variation in the etiology of lung cancer remains to be fully characterized, family history assessment is immediately available, and those with a positive history represent a higher-risk group.
Olfactory function tests are sensitive tools for assessing sensory-cognitive processing in schizophrenia. However, associations of central olfactory measures with clinical outcome parameters have not been simultaneously studied in large samples of schizophrenia patients.
In the framework of the comprehensive phenotyping of the GRAS (Göttingen Research Association for Schizophrenia) cohort, we modified and extended existing odor naming (active memory retrieval) and interpretation (attribute assignment) tasks to evaluate them in 881 schizophrenia patients and 102 healthy controls matched for age, gender and smoking behavior. Associations with emotional processing, neuropsychological test performance and disease outcome were studied.
Schizophrenia patients underperformed controls in both olfactory tasks. Odor naming deficits were primarily associated with compromised cognition, and interpretation deficits with positive symptom severity and general alertness. Contrasting the extreme performers in odor interpretation among schizophrenia patients (best versus worst percentile; N=88 each) with healthy individuals (N=102) underscores the clear relationship between impaired odor interpretation and psychopathology, cognitive dysfunction, and emotional processing (all p<0.004).
The strong association of performance in higher olfactory measures, odor naming and interpretation, with lead symptoms of schizophrenia and determinants of disease severity highlights their clinical and scientific significance. Based on the results obtained here in an exploratory fashion in a large patient sample, the development of an easy-to-use clinical test with improved psychometric properties may be encouraged.
Odor naming; Higher olfactory processing; Odor interpretation; Positive symptoms; Cognition
Recent studies have shown an association between cigarettes per day (CPD) and a nonsynonymous single-nucleotide polymorphism in CHRNA5, rs16969968.
To determine whether the association between rs16969968 and smoking is modified by age at onset of regular smoking.
Available genetic studies containing measures of CPD and the genotype of rs16969968 or its proxy.
Uniform statistical analysis scripts were run locally. Starting with 94 050 ever-smokers from 43 studies, we extracted the heavy smokers (CPD >20) and light smokers (CPD ≤10) with age-at-onset information, reducing the sample size to 33 348. Each study was stratified into early-onset smokers (age at onset ≤16 years) and late-onset smokers (age at onset >16 years), and a logistic regression of heavy vs light smoking with the rs16969968 genotype was computed for each stratum. Meta-analysis was performed within each age-at-onset stratum.
Individuals with 1 risk allele at rs16969968 who were early-onset smokers were significantly more likely to be heavy smokers in adulthood (odds ratio [OR]=1.45; 95% CI, 1.36–1.55; n=13 843) than were carriers of the risk allele who were late-onset smokers (OR = 1.27; 95% CI, 1.21–1.33, n = 19 505) (P = .01).
These results highlight an increased genetic vulnerability to smoking in early-onset smokers.
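The stratified design can be summarized in two steps. First, pool the per-study log odds ratios within each age-at-onset stratum. Then compare the two pooled estimates. This sketch assumes fixed-effect inverse-variance pooling and a simple z-test for the difference between two independent strata. The consortium's actual meta-analysis and interaction test may differ, and the function names are hypothetical.

```python
import math


def fixed_effect(log_ors, ses):
    """Inverse-variance fixed-effect pooled log OR and its standard error."""
    weights = [1.0 / s ** 2 for s in ses]
    total = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, log_ors)) / total
    return pooled, math.sqrt(1.0 / total)


def stratum_difference_p(b1, se1, b2, se2):
    """Two-sided P for H0: equal pooled log ORs in two independent strata."""
    z = (b1 - b2) / math.sqrt(se1 ** 2 + se2 ** 2)
    return math.erfc(abs(z) / math.sqrt(2.0))  # = 2 * (1 - Phi(|z|))
```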
Asthma has been hypothesized to be associated with lung cancer (LC) risk. We conducted a pooled analysis of 16 studies in the International Lung Cancer Consortium (ILCCO) to quantitatively assess this association and compared the results with 36 previously published studies. In total, information from 585 444 individuals was used. Study-specific measures were combined using random effects models. A meta-regression and subgroup meta-analyses were performed to identify sources of heterogeneity. The overall LC relative risk (RR) associated with asthma was 1.28 [95% confidence intervals (CIs) = 1.16–1.41] but with large heterogeneity (I² = 73%, P < 0.001) between studies. Among ILCCO studies, an increased risk was found for squamous cell (RR = 1.69, 95% CI = 1.26–2.26) and for small-cell carcinoma (RR = 1.71, 95% CI = 0.99–2.95) but was weaker for adenocarcinoma (RR = 1.09, 95% CI = 0.88–1.36). The increased LC risk was strongest in the 2 years after asthma diagnosis (RR = 2.13, 95% CI = 1.09–4.17), but subjects diagnosed with asthma more than 10 years earlier had little or no increased LC risk (RR = 1.10, 95% CI = 0.94–1.30). Because the increased incidence of LC was chiefly observed in small cell and squamous cell lung carcinomas, primarily within 2 years of asthma diagnosis, and because the association was weak among never smokers, we conclude that the association may not reflect a causal effect of asthma on the risk of LC.
Genetic case-control association studies often include data on clinical covariates, such as body mass index (BMI), smoking status, or age, that may modify the underlying genetic risk of case or control samples. For example, in type 2 diabetes, odds ratios for established variants estimated from low-BMI cases are larger than those estimated from high-BMI cases. An unanswered question is how to use this information to maximize statistical power in case-control studies that ascertain individuals on the basis of phenotype (case-control ascertainment) or of phenotype and clinical covariates (case-control-covariate ascertainment). While current approaches improve power in studies with random ascertainment, they often lose power under case-control ascertainment and fail to capture available power increases under case-control-covariate ascertainment. We show that an informed conditioning approach, based on the liability threshold model with parameters informed by external epidemiological information, fully accounts for disease prevalence and non-random ascertainment of phenotype as well as covariates, and provides a substantial increase in power while maintaining a properly controlled false-positive rate. Our method outperforms standard case-control association tests with or without covariates, tests of gene × covariate interaction, and previously proposed tests for dealing with covariates in ascertained data, with especially large improvements in the case of case-control-covariate ascertainment. We investigate empirical case-control studies of type 2 diabetes, prostate cancer, lung cancer, breast cancer, rheumatoid arthritis, age-related macular degeneration, and end-stage kidney disease over a total of 89,726 samples. In these datasets, informed conditioning outperforms logistic regression for 115 of the 157 known associated variants investigated (P-value = 1 × 10⁻⁹).
The improvement varied across diseases, with a 16% median increase in χ² test statistics and a commensurate increase in power. This suggests that applying our method to existing and future association studies of these diseases may identify novel disease loci.
This work describes a new methodology for analyzing genome-wide case-control association studies of diseases with strong correlations to clinical covariates, such as age in prostate cancer and body mass index in type 2 diabetes. Currently, researchers either ignore these clinical covariates or apply approaches that ignore the disease's prevalence and the study's ascertainment strategy. We take an alternative approach, leveraging external prevalence information from the epidemiological literature and constructing a statistic based on the classic liability threshold model of disease. Our approach not only improves the power of studies that ascertain individuals randomly or based on the disease phenotype, but also improves the power of studies that ascertain individuals based on both the disease phenotype and clinical covariates. We apply our statistic to seven datasets over six different diseases and a variety of clinical covariates. We found that there was a substantial improvement in test statistics relative to current approaches at known associated variants. This suggests that novel loci may be identified by applying our method to existing and future association studies of these diseases.
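The central ingredient of a liability-threshold statistic can be sketched as follows. Each individual's disease status is replaced by the posterior mean of a standard normal liability, given a covariate-specific prevalence K taken from external data. Genotypes are then tested against this quantity with the 1-df statistic N·r². This is a simplified illustration under those assumptions, not the authors' informed conditioning statistic, which additionally models the study's ascertainment. The function names are hypothetical.

```python
from statistics import NormalDist

import numpy as np

STD_NORMAL = NormalDist()


def posterior_mean_liability(is_case, prevalence):
    """E[liability | disease status] for a standard normal liability.

    The threshold is t = Phi^-1(1 - K), where K is the prevalence given the
    individual's covariates (assumed known from external epidemiological data).
    """
    out = []
    for case, k in zip(is_case, prevalence):
        t = STD_NORMAL.inv_cdf(1.0 - k)
        dens = STD_NORMAL.pdf(t)
        # Mean of a standard normal truncated above t (cases) or below t (controls)
        out.append(dens / k if case else -dens / (1.0 - k))
    return np.array(out)


def liability_score_chi2(genotypes, is_case, prevalence):
    """N * r^2 between genotype dosage and posterior mean liability (1 df)."""
    g = np.asarray(genotypes, float)
    m = posterior_mean_liability(is_case, prevalence)
    r = np.corrcoef(g, m)[0, 1]
    return len(g) * r ** 2
```

With a constant prevalence, the posterior liability is just a recoding of case status, and the statistic reduces to an ordinary genotype-status trend statistic. Power gains arise only when covariates make K differ between individuals.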
Recent genome-wide association studies (GWASs) have identified common genetic variants at 5p15.33, 6p21–6p22 and 15q25.1 associated with lung cancer risk. Several other genetic regions including variants of CHEK2 (22q12), TP53BP1 (15q15) and RAD52 (12p13) have been demonstrated to influence lung cancer risk in candidate- or pathway-based analyses. To identify novel risk variants for lung cancer, we performed a meta-analysis of 16 GWASs, totaling 14 900 cases and 29 485 controls of European descent. Our data provided increased support for previously identified risk loci at 5p15 (P = 7.2 × 10⁻¹⁶), 6p21 (P = 2.3 × 10⁻¹⁴) and 15q25 (P = 2.2 × 10⁻⁶³). Furthermore, we demonstrated histology-specific effects for 5p15, 6p21 and 12p13 loci but not for the 15q25 region. Subgroup analysis also identified a novel disease locus for squamous cell carcinoma at 9p21 (CDKN2A/p16INK4A/p14ARF/CDKN2B/p15INK4B/ANRIL; rs1333040, P = 3.0 × 10⁻⁷) which was replicated in a series of 5415 Han Chinese (P = 0.03; combined analysis, P = 2.3 × 10⁻⁸). This large analysis provides additional evidence for the role of inherited genetic susceptibility to lung cancer and insight into biological differences in the development of the different histological types of lung cancer.
Radiation sensitivity is assumed to be a cancer susceptibility factor due to impaired DNA damage signalling and repair. Relevant genetic factors may also determine the observed familial aggregation of early onset lung cancer. We investigated the heritability of radiation sensitivity in families of 177 Caucasian cases of early onset lung cancer. In total, 798 individuals were characterized for their radiation-induced DNA damage response. DNA damage analysis was performed by alkaline comet assay before and after in vitro irradiation of isolated lymphocytes. The cells were exposed to a dose of 4 Gy and allowed to repair the induced DNA damage for up to 60 minutes. The primary outcome parameter, the Olive Tail Moment, was the basis for heritability estimates. Heritability was highest for basal damage (without irradiation; 70%, 95% CI: 51%–88%) and initial damage (directly after irradiation; 65%, 95% CI: 47%–83%) and decreased to 20%–48% for the residual damage after different repair times. Hence, our study supports the hypothesis that genomic instability, represented by basal DNA damage as well as by radiation-induced and repaired damage, is highly heritable. Genes influencing genome instability and DNA repair are therefore of major interest for the etiology of lung cancer in the young. The comet assay is a suitable tool for investigating the heritability of the radiation-sensitive phenotype. Our results are in good agreement with those of other mutagen sensitivity assays.
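For readers unfamiliar with the outcome parameter, the Olive Tail Moment of a single comet is commonly defined as (tail mean − head mean) × tail %DNA / 100. Here the means are the centres of the DNA intensity profiles of tail and head. Summarizing each individual by the median over scored cells is our assumption, and the study's per-individual summary may differ.

```python
from statistics import median


def olive_tail_moment(tail_dna_percent, tail_mean, head_mean):
    """Olive Tail Moment of one comet.

    tail_dna_percent: percentage of total DNA intensity in the tail.
    tail_mean, head_mean: positions of the intensity centres of tail and head.
    """
    return (tail_mean - head_mean) * tail_dna_percent / 100.0


def individual_otm(cells):
    """Median OTM over scored cells given (tail %DNA, tail mean, head mean) tuples (assumed summary)."""
    return median(olive_tail_moment(*c) for c in cells)
```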
COMET Assay; DNA damage; familial aggregation; lung cancer
Pathway analysis has been proposed as a complement to single-SNP analyses in GWAS. This study compared pathway analysis methods using two lung cancer GWAS data sets based on four studies: one a combined data set from Central Europe and Toronto (CETO), the other a combined data set from Germany and MD Anderson (GRMD). We searched the literature for pathway analysis methods that were widely used, representative of other methods, and had software available for performing the analysis. We selected three programs. EASE uses a modified Fisher's exact test to test for pathway associations. GenGen (a version of Gene Set Enrichment Analysis (GSEA)) uses a Kolmogorov-Smirnov-like running sum as the test statistic. SLAT uses a p-value combination approach. We also included a modified version of the SUMSTAT method (mSUMSTAT), which tests for association by averaging χ² statistics from genotype association tests. After mapping of more than 300,000 SNPs from each data set, nearly 18,000 genes were available for analysis. These were mapped to 421 GO level 4 gene sets for pathway analysis. Among the methods designed to be robust to biases related to gene size and pathway SNP correlation (GenGen, mSUMSTAT and SLAT), the mSUMSTAT approach identified the most significant pathways (8 in CETO and 1 in GRMD). These included a highly plausible association for the acetylcholine receptor activity pathway in both CETO (FDR ≤ 0.001) and GRMD (FDR = 0.009). However, two strong association signals at a single gene cluster (CHRNA3-CHRNA5-CHRNB4) drive this result, complicating its interpretation. Few other replicated associations were found using any of these methods. Difficulty in replicating associations hindered our comparison, but the results suggest that mSUMSTAT has advantages over the other approaches and may be a useful pathway analysis tool to use alongside other methods, such as the commonly used GSEA (GenGen) approach.
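The averaging idea behind SUMSTAT-type methods can be sketched as a pathway score equal to the mean of gene-level χ² statistics. For simplicity, the hypothetical sketch below assesses significance against random gene sets of the same size. This gene-sampling null ignores correlation between genes and is therefore not robust in the way mSUMSTAT is designed to be; phenotype permutation would be needed for that.

```python
import numpy as np


def mean_chi2_score(gene_chi2, pathway_genes):
    """Pathway score: average of gene-level chi-square statistics."""
    return float(np.mean([gene_chi2[g] for g in pathway_genes]))


def gene_sampling_p(gene_chi2, pathway_genes, n_draws=9999, seed=0):
    """Compare the pathway score with random gene sets of the same size (simplified null)."""
    rng = np.random.default_rng(seed)
    all_values = np.array(list(gene_chi2.values()))
    size = len(pathway_genes)
    observed = mean_chi2_score(gene_chi2, pathway_genes)
    draws = np.array([rng.choice(all_values, size=size, replace=False).mean()
                      for _ in range(n_draws)])
    return observed, (np.sum(draws >= observed) + 1) / (n_draws + 1)
```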
Genetic Analysis Workshop 17 (GAW17) provided a platform for evaluating existing statistical genetic methods and for developing novel methods to analyze rare variants that modulate complex traits. In this article, we present an overview of the 1000 Genomes Project exome data and simulated phenotype data that were distributed to GAW17 participants for analyses, the different issues addressed by the participants, and the process of preparation of manuscripts resulting from the discussions during the workshop.
Genome-wide association studies have identified three chromosomal regions at 15q25, 5p15, and 6p21 as being associated with the risk of lung cancer. To confirm these associations in independent studies and investigate heterogeneity of these associations within specific subgroups, we conducted a coordinated genotyping study within the International Lung Cancer Consortium based on independent studies that were not included in previous genome-wide association studies.
Genotype data for single-nucleotide polymorphisms at chromosomes 15q25 (rs16969968, rs8034191), 5p15 (rs2736100, rs402710), and 6p21 (rs2256543, rs4324798) from 21 case–control studies for 11 645 lung cancer case patients and 14 954 control subjects, of whom 85% were white and 15% were Asian, were pooled. Associations between the variants and the risk of lung cancer were estimated by logistic regression models. All statistical tests were two-sided.
Associations between 15q25 and the risk of lung cancer were replicated in white ever-smokers (rs16969968: odds ratio [OR] = 1.26, 95% confidence interval [CI] = 1.21 to 1.32, Ptrend = 2 × 10⁻²⁶), and this association was stronger for those diagnosed at younger ages. There was no association in never-smokers or in Asians between either of the 15q25 variants and the risk of lung cancer. For the chromosome 5p15 region, we confirmed statistically significant associations in whites for both rs2736100 (OR = 1.15, 95% CI = 1.10 to 1.20, Ptrend = 1 × 10⁻¹⁰) and rs402710 (OR = 1.14, 95% CI = 1.09 to 1.19, Ptrend = 5 × 10⁻⁸) and identified similar associations in Asians (rs2736100: OR = 1.23, 95% CI = 1.12 to 1.35, Ptrend = 2 × 10⁻⁵; rs402710: OR = 1.15, 95% CI = 1.04 to 1.27, Ptrend = .007). The associations between the 5p15 variants and lung cancer differed by histology; odds ratios for rs2736100 were highest in adenocarcinoma and for rs402710 were highest in adenocarcinoma and squamous cell carcinomas. This pattern was observed in both ethnic groups. Neither of the two variants on chromosome 6p21 was associated with the risk of lung cancer.
In this international genetic association study of lung cancer, previous associations found in white populations were replicated and new associations were identified in Asian populations. Future genetic studies of lung cancer should include detailed stratification by histology.
KCNN3, encoding the small conductance calcium-activated potassium channel SK3, harbours a polymorphic CAG repeat in the amino-terminal coding region of as yet unproven function. Hypothesizing that KCNN3 genotypes do not influence susceptibility to schizophrenia but modify its phenotype, we explored their contribution to specific schizophrenic symptoms. Using the Göttingen Research Association for Schizophrenia (GRAS) data collection of schizophrenic patients (n = 1074), we performed a phenotype-based genetic association study (PGAS) of KCNN3. We show that long CAG repeats in the schizophrenic sample are specifically associated with better performance in higher cognitive tasks, comprising the capacity to discriminate, select and execute (p < 0.0001). Long repeats reduce SK3 channel function, as we demonstrate by patch-clamping of transfected HEK293 cells. In contrast, modelling the opposite in mice, i.e. KCNN3 overexpression/channel hyperfunction, leads to selective deficits in higher brain functions comparable to those influenced by SK3 conductance in humans. To conclude, KCNN3 genotypes modify cognitive performance, as shown here in a large sample of schizophrenic patients. Reduction of SK3 function may constitute a pharmacological target to improve cognition in schizophrenia and other conditions with cognitive impairment.
higher cognitive testing; mouse behaviour; neuropsychology; small conductance calcium-activated potassium channel; whole-cell patch clamp
Background. Analysis of candidate genes in individual studies has had only limited success in identifying particular gene variants that are conclusively associated with lung cancer risk. In the International Lung Cancer Consortium (ILCCO), we conducted a coordinated genotyping study of 10 common variants selected because of their prior evidence of an association with lung cancer. These variants belonged to candidate genes from different cancer-related pathways including inflammation (IL1B), folate metabolism (MTHFR), regulatory function (AKAP9 and CAMKK1), cell adhesion (SEZ6L) and apoptosis (FAS, FASL, TP53, TP53BP1 and BAT3). Methods. Genotype data from 15 ILCCO case–control studies were available for a total of 8431 lung cancer cases and 11 072 controls of European descent and Asian ethnic groups. Unconditional logistic regression was used to model the association between each variant and lung cancer risk. Results. Only the association between a non-synonymous variant of TP53BP1 (rs560191) and lung cancer risk was significant (OR = 0.91, P = 0.002). This association was more striking for squamous cell carcinoma (OR = 0.86, P = 6 × 10⁻⁴). No heterogeneity by center, ethnicity, smoking status, age group or sex was observed. To confirm this association, we included results for this variant from a set of independent studies (9966 cases/11 722 controls) and obtained similar results. When all these studies were combined, the overall OR was 0.93 (0.89–0.97) (P = 0.001). This association was significant only for squamous cell carcinoma [OR = 0.89 (0.85–0.95), P = 1 × 10⁻⁴]. Conclusion. This study suggests that rs560191 is associated with lung cancer risk and further highlights the value of consortia in replicating or refuting published genetic associations.
Genome-wide association studies (GWAS) continue to gain in popularity. To use the wealth of data they create more effectively, a variety of methods have recently been proposed to include a priori information (e.g., biologically interpretable sets of genes, candidate gene information, or gene expression) in GWAS analysis. Six contributions to Genetic Analysis Workshop 16 Group 11 applied novel or recently proposed methods to GWAS of rheumatoid arthritis and heart disease related phenotypes. These analyses yielded a variety of novel candidate genes and sets of genes, in addition to validating well-known genotype-phenotype associations. However, because many methods are relatively new, they would benefit from further methodological research to ensure that they maintain type I error rates while increasing power to find additional associations. When methods have been adapted from other study types (e.g., gene expression data analysis or linkage analysis), the lessons learned there should be used to guide the implementation of techniques. Lastly, many open research questions exist concerning the logistical details of the origin of the a priori information and the way to incorporate it. Overall, our group has demonstrated a strong potential for identifying novel genotype-phenotype relationships by including a priori data in the analysis of GWAS, while also uncovering a series of questions requiring further research.
gene set analysis; external information; gene expression; hierarchical Bayesian model; candidate regions; candidate genes; pathway
In genome-wide association studies (GWAS), genetic markers are often ranked to select genes for further pursuit. Especially for moderately associated and interrelated genes, information on genes and pathways may improve the selection. We applied and combined two main approaches for data integration, gene set enrichment analysis (GSEA) and hierarchical Bayes prioritization (HBP), in a GWAS of rheumatoid arthritis. Many associated genes are located in the HLA region on 6p21. However, the ranking lists of genes and gene sets differ considerably depending on the chosen approach: HBP changes the ranking only slightly, and its top 100 gene lists consist primarily of HLA genes. GSEA also includes many non-HLA genes.
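The Kolmogorov-Smirnov-like running sum at the heart of GSEA can be sketched as follows. Walk down the gene list ranked by association. Step up by a weighted amount at genes in the set and down by a constant at all other genes. Report the maximum deviation of the running sum from zero. Significance assessment, done by phenotype permutation in GSEA, is not shown. The weighting exponent p = 1 is the usual default, and the function name is our own.

```python
import numpy as np


def enrichment_score(ranked_genes, scores, gene_set, p=1.0):
    """Weighted running-sum enrichment score in the style of GSEA.

    ranked_genes: genes sorted by decreasing association statistic.
    scores: the matching statistics; gene_set: a set of gene identifiers.
    Assumes the set contains at least one, but not all, of the ranked genes.
    """
    in_set = np.array([g in gene_set for g in ranked_genes])
    weights = np.abs(np.asarray(scores, float)) ** p
    hit_total = weights[in_set].sum()
    n_miss = len(ranked_genes) - in_set.sum()
    # Hits add their share of the total hit weight; misses subtract 1 / n_miss
    step = np.where(in_set, weights / hit_total, -1.0 / n_miss)
    running = np.cumsum(step)
    return running[np.argmax(np.abs(running))]
```

A set concentrated at the top of the ranking gives a large positive score. A set concentrated at the bottom gives a large negative one.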
For the Framingham Heart Study (FHS) and simulated FHS (FHSsim) data, we tested for gene-gene interaction in quantitative traits using a longitudinal nonparametric association test (LNPT) and, for comparison, a survival analysis. We report results for the Offspring Cohort by LNPT analysis and for all longitudinal cohorts by survival analysis with cohort effect adjustment. We verified that type I errors were not inflated. We compared the power of both methods to detect, in the FHSsim data, two sets of gene pairs that interact to influence the trait coronary artery calcification. In FHS, we tested eight gene pairs from a list of candidate genes for interaction effects on body mass index. Both methods found evidence for pairwise non-additive effects of mutations in the genes FTO, PON1, and PFKP on body mass index.
The polymorphism rs2569190 within the CD14 endotoxin (lipopolysaccharide, LPS) receptor gene is associated with various disease conditions that are assumed to rely on endotoxin sensitivity. In vitro experiments suggest that the T allele sensitizes the host to exogenous or endogenous LPS via enhanced CD14 expression. To test the impact of this single nucleotide polymorphism in its natural genomic context in vivo, two parameters of gene transcription were analyzed in peripheral blood mononuclear cells (PBMC) from single healthy individuals: (a) recruitment of RNA polymerase II by haplotype-specific chromatin immunoprecipitation and (b) the relative amount of transcripts by allele-specific transcript quantification (ASTQ). RNA polymerase II binding was twice as high on the most prevalent haplotype, C-T-C-G, the only one carrying a T at rs2569190. ASTQ employing two independent read-out assays revealed, however, similar transcript numbers originating from C-T-C-G and non-C-T-C-G haplotypes. Moreover, total CD14 mRNA levels in freshly isolated PBMC were related to neither the donors' genotypes nor their haplogenotypes. Our data argue for a functional impact of the rs2569190 polymorphism in terms of stronger transcription initiation on T allele gene variants, even though preferential allele-specific binding does not result in an increase in transcript numbers. Endotoxin sensitivity associated with this genetic variation appears not to rely solely on a cis-acting regulatory impact of rs2569190 on CD14 gene transcription in PBMC.
SNP; LPS; Gene polymorphism; Gene expression; Innate immunity