The logistic kernel machine test (LKMT) is a testing procedure tailored towards high-dimensional genetic data. Its use in pathway analyses of GWA case-control studies results from its computational efficiency and flexibility of incorporating additional information via the kernel. The kernel can be any positive definite function; unfortunately its form strongly influences the power and bias. Most authors have recommended the use of the simple linear kernel. We demonstrate via a simulation that the probability of rejecting the null hypothesis of no association just by chance increases with the number of SNPs or genes in the pathway when applying this kernel.
We propose a novel kernel that includes an appropriate standardization, in order to protect against any inflation of false positive results. Moreover, our novel kernel contains information on gene membership of SNPs in the pathway.
In an application to data from the North American Rheumatoid Arthritis Consortium (NARAC), we find that even this basic genomic structure can improve the ability of the LKMT to identify meaningful associations. We also demonstrate that the standardization effectively eliminates problems with size bias.
We recommend the use of our standardized kernel and urge caution when using non-adjusted kernels in the LKMT to conduct pathway analysis.
Logistic Kernel Machine Regression; Size Bias; Pathway Analysis; GWAS; Rheumatoid Arthritis
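As a rough illustration of why the linear kernel scales with pathway size, the sketch below builds a linear kernel from a toy genotype matrix and evaluates a score-type quadratic form under an intercept-only null model. The trace normalization is a hypothetical stand-in chosen for illustration only; it is not the standardized kernel proposed in the abstract, whose exact form is not given there. All function names and data values are made up.

```python
def linear_kernel(Z):
    """K[i][j] = sum over SNPs of Z[i][s] * Z[j][s]; Z is subjects x SNPs."""
    n = len(Z)
    return [[sum(a * b for a, b in zip(Z[i], Z[j])) for j in range(n)]
            for i in range(n)]


def trace_normalized(K):
    """Hypothetical scaling (an assumption, not the authors' kernel):
    divide K by its trace so its overall scale does not grow with SNP count."""
    tr = sum(K[i][i] for i in range(len(K)))
    return [[v / tr for v in row] for row in K]


def score_statistic(y, K):
    """Q = 0.5 * r' K r with residuals r = y - mean(y) (intercept-only null)."""
    mu = sum(y) / len(y)
    r = [yi - mu for yi in y]
    n = len(y)
    return 0.5 * sum(r[i] * K[i][j] * r[j] for i in range(n) for j in range(n))


# Toy data: 4 subjects x 3 SNPs coded as minor-allele counts (made-up values).
Z = [[0, 1, 2], [1, 1, 0], [2, 0, 1], [0, 0, 1]]
y = [1, 0, 1, 0]  # case/control status

K = linear_kernel(Z)
Q = score_statistic(y, K)
K_scaled = trace_normalized(K)
```

Because every additional SNP adds a non-negative term to the diagonal of the linear kernel, the unscaled quadratic form tends to grow with the number of SNPs, which is the kind of size dependence the abstract warns about.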
To clarify the role of previous lung diseases (chronic bronchitis, emphysema, pneumonia, and tuberculosis) in the development of lung cancer, the authors conducted a pooled analysis of studies in the International Lung Cancer Consortium. Seventeen studies including 24,607 cases and 81,829 controls (noncases), mainly conducted in Europe and North America, were included (1984–2011). Using self-reported data on previous diagnoses of lung diseases, the authors derived study-specific effect estimates by means of logistic regression models or Cox proportional hazards models adjusted for age, sex, and cumulative tobacco smoking. Estimates were pooled using random-effects models. Analyses stratified by smoking status and histology were also conducted. A history of emphysema conferred a 2.44-fold increased risk of lung cancer (95% confidence interval (CI): 1.64, 3.62 (16 studies)). A history of chronic bronchitis conferred a relative risk of 1.47 (95% CI: 1.29, 1.68 (13 studies)). Tuberculosis (relative risk = 1.48, 95% CI: 1.17, 1.87 (16 studies)) and pneumonia (relative risk = 1.57, 95% CI: 1.22, 2.01 (12 studies)) were also associated with lung cancer risk. Among never smokers, elevated risks were observed for emphysema, pneumonia, and tuberculosis. These results suggest that previous lung diseases influence lung cancer risk independently of tobacco use and that these diseases are important for assessing individual risk.
bronchitis, chronic; emphysema; lung diseases; lung neoplasms; meta-analysis; pneumonia; pulmonary disease, chronic obstructive; tuberculosis
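The pooling step described above can be sketched as follows. The abstract states only that random-effects models were used; the DerSimonian-Laird estimator shown here is one common choice and is an assumption, as are the function name and the input values.

```python
import math


def dersimonian_laird(log_rr, se, z=1.96):
    """Pool study-specific log relative risks with a DerSimonian-Laird
    random-effects model. Returns (pooled RR, lower CI, upper CI, tau^2)."""
    w = [1.0 / s ** 2 for s in se]                      # inverse-variance weights
    fixed = sum(wi * yi for wi, yi in zip(w, log_rr)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, log_rr))
    k = len(log_rr)
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c)                  # between-study variance
    w_star = [1.0 / (s ** 2 + tau2) for s in se]        # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_star, log_rr)) / sum(w_star)
    se_pooled = math.sqrt(1.0 / sum(w_star))
    return (math.exp(pooled),
            math.exp(pooled - z * se_pooled),
            math.exp(pooled + z * se_pooled),
            tau2)


# Made-up study estimates, for illustration only.
rr, lower, upper, tau2 = dersimonian_laird(
    [math.log(2.0), math.log(2.0)], [0.1, 0.2])
```

When all studies report the same estimate, the between-study variance is estimated as zero and the result coincides with the fixed-effect pooled value.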
Background and Methods
Familial aggregation of lung cancer exists after accounting for cigarette smoking. However, the extent to which family history affects risk by smoking status, histology, relative type and ethnicity is not well described. This pooled analysis included 24 case-control studies in the International Lung Cancer Consortium. Each study collected age of onset/interview, gender, race/ethnicity, cigarette smoking, histology and first-degree family history of lung cancer. Data from 24,380 lung cancer cases and 23,305 healthy controls were analyzed. Unconditional logistic regression models and generalized estimating equations were used to estimate odds ratios and 95% confidence intervals.
Individuals with a first-degree relative with lung cancer had a 1.51-fold increase in risk of lung cancer, after adjustment for smoking and other potential confounders (95% CI: 1.39, 1.63). The association was strongest for those with a family history in a sibling, after adjustment (OR=1.82, 95% CI: 1.62, 2.05). No modifying effect by histologic type was found. Never smokers showed a lower association with positive familial history of lung cancer (OR=1.25, 95% CI: 1.03, 1.52), slightly stronger for those with an affected sibling (OR=1.44, 95% CI: 1.07, 1.93), after adjustment.
The increased risk among never smokers and the similar magnitudes of the effect of family history on lung cancer risk across histological types suggest that familial aggregation of lung cancer is independent of the aggregation associated with cigarette smoking. While the role of genetic variation in the etiology of lung cancer remains to be fully characterized, family history assessment is immediately available, and those with a positive history represent a higher risk group.
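For readers less familiar with how an odds ratio and its 95% confidence interval arise, the following minimal sketch computes a crude (unadjusted) odds ratio from a 2x2 table with a Woolf-type interval. The study itself used adjusted logistic regression and generalized estimating equations, which this sketch does not reproduce; the counts and function name are made up.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Crude odds ratio with a Woolf (log-scale) confidence interval.
    a: exposed cases, b: unexposed cases,
    c: exposed controls, d: unexposed controls."""
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


# Made-up counts: 20 of 100 cases and 10 of 100 controls report a family history.
or_est, or_lower, or_upper = odds_ratio_ci(20, 80, 10, 90)
```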
Olfactory function tests are sensitive tools for assessing sensory-cognitive processing in schizophrenia. However, associations of central olfactory measures with clinical outcome parameters have not been simultaneously studied in large samples of schizophrenia patients.
In the framework of the comprehensive phenotyping of the GRAS (Göttingen Research Association for Schizophrenia) cohort, we modified and extended existing odor naming (active memory retrieval) and interpretation (attribute assignment) tasks to evaluate them in 881 schizophrenia patients and 102 healthy controls matched for age, gender and smoking behavior. Associations with emotional processing, neuropsychological test performance and disease outcome were studied.
Schizophrenia patients underperformed controls in both olfactory tasks. Odor naming deficits were primarily associated with compromised cognition, interpretation deficits with positive symptom severity and general alertness. Contrasting schizophrenia extreme performers of odor interpretation (best versus worst percentile; N=88 each) and healthy individuals (N=102) underscores the obvious relationship between impaired odor interpretation and psychopathology, cognitive dysfunctioning, and emotional processing (all p<0.004).
The strong association of performance in higher olfactory measures, odor naming and interpretation, with lead symptoms of schizophrenia and determinants of disease severity highlights their clinical and scientific significance. Based on the results obtained here in an exploratory fashion in a large patient sample, the development of an easy-to-use clinical test with improved psychometric properties may be encouraged.
Odor naming; Higher olfactory processing; Odor interpretation; Positive symptoms; Cognition
Asthma has been hypothesized to be associated with lung cancer (LC) risk. We conducted a pooled analysis of 16 studies in the International Lung Cancer Consortium (ILCCO) to quantitatively assess this association and compared the results with 36 previously published studies. In total, information from 585 444 individuals was used. Study-specific measures were combined using random effects models. A meta-regression and subgroup meta-analyses were performed to identify sources of heterogeneity. The overall LC relative risk (RR) associated with asthma was 1.28 [95% confidence interval (CI) = 1.16–1.41] but with large heterogeneity (I² = 73%, P < 0.001) between studies. Among ILCCO studies, an increased risk was found for squamous cell (RR = 1.69, 95% CI = 1.26–2.26) and for small-cell carcinoma (RR = 1.71, 95% CI = 0.99–2.95) but was weaker for adenocarcinoma (RR = 1.09, 95% CI = 0.88–1.36). The increased LC risk was strongest in the 2 years after asthma diagnosis (RR = 2.13, 95% CI = 1.09–4.17) but subjects diagnosed with asthma over 10 years prior had no or little increased LC risk (RR = 1.10, 95% CI = 0.94–1.30). Because the increased incidence of LC was chiefly observed in small cell and squamous cell lung carcinomas, primarily within 2 years of asthma diagnosis, and because the association was weak among never smokers, we conclude that the association may not reflect a causal effect of asthma on the risk of LC.
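The heterogeneity measure quoted above can be illustrated with a short sketch that computes Cochran's Q and the I² statistic from study-specific estimates and standard errors. The function name and inputs are illustrative assumptions and do not come from the study.

```python
def cochran_q_i2(estimates, se):
    """Return (Q, I2 in percent) for study estimates on the log scale.
    I2 = max(0, (Q - df) / Q) * 100 with df = number of studies - 1."""
    w = [1.0 / s ** 2 for s in se]                       # inverse-variance weights
    pooled = sum(wi * yi for wi, yi in zip(w, estimates)) / sum(w)
    q = sum(wi * (yi - pooled) ** 2 for wi, yi in zip(w, estimates))
    df = len(estimates) - 1
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return q, i2


# Two made-up log relative risks with equal standard errors.
q_stat, i2_percent = cochran_q_i2([0.0, 1.0], [0.5, 0.5])
```

An I² of 73%, as reported in the abstract, would mean that roughly three quarters of the observed variation between study estimates is attributed to heterogeneity rather than sampling error.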
Genetic case-control association studies often include data on clinical covariates, such as body mass index (BMI), smoking status, or age, that may modify the underlying genetic risk of case or control samples. For example, in type 2 diabetes, odds ratios for established variants estimated from low–BMI cases are larger than those estimated from high–BMI cases. An unanswered question is how to use this information to maximize statistical power in case-control studies that ascertain individuals on the basis of phenotype (case-control ascertainment) or phenotype and clinical covariates (case-control-covariate ascertainment). While current approaches improve power in studies with random ascertainment, they often lose power under case-control ascertainment and fail to capture available power increases under case-control-covariate ascertainment. We show that an informed conditioning approach, based on the liability threshold model with parameters informed by external epidemiological information, fully accounts for disease prevalence and non-random ascertainment of phenotype as well as covariates and provides a substantial increase in power while maintaining a properly controlled false-positive rate. Our method outperforms standard case-control association tests with or without covariates, tests of gene x covariate interaction, and previously proposed tests for dealing with covariates in ascertained data, with especially large improvements in the case of case-control-covariate ascertainment. We investigate empirical case-control studies of type 2 diabetes, prostate cancer, lung cancer, breast cancer, rheumatoid arthritis, age-related macular degeneration, and end-stage kidney disease over a total of 89,726 samples. In these datasets, informed conditioning outperforms logistic regression for 115 of the 157 known associated variants investigated (P-value = 1×10−9). 
The improvement varied across diseases with a 16% median increase in χ2 test statistics and a commensurate increase in power. This suggests that applying our method to existing and future association studies of these diseases may identify novel disease loci.
This work describes a new methodology for analyzing genome-wide case-control association studies of diseases with strong correlations to clinical covariates, such as age in prostate cancer and body mass index in type 2 diabetes. Currently, researchers either ignore these clinical covariates or apply approaches that ignore the disease's prevalence and the study's ascertainment strategy. We take an alternative approach, leveraging external prevalence information from the epidemiological literature and constructing a statistic based on the classic liability threshold model of disease. Our approach not only improves the power of studies that ascertain individuals randomly or based on the disease phenotype, but also improves the power of studies that ascertain individuals based on both the disease phenotype and clinical covariates. We apply our statistic to seven datasets over six different diseases and a variety of clinical covariates. We found that there was a substantial improvement in test statistics relative to current approaches at known associated variants. This suggests that novel loci may be identified by applying our method to existing and future association studies of these diseases.
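The liability threshold model underlying this approach can be sketched in a few lines: disease occurs when a standard normal liability exceeds a threshold fixed by the prevalence. The sketch below derives that threshold and the expected liability of cases and controls. It illustrates the model only and is not the authors' informed conditioning statistic; the function names are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # liability is assumed standard normal


def threshold_from_prevalence(prevalence):
    """Liability threshold t such that P(liability > t) = prevalence."""
    return STD_NORMAL.inv_cdf(1.0 - prevalence)


def mean_liabilities(prevalence):
    """Expected liability of cases and of controls under the threshold model:
    E[L | case] = phi(t) / K and E[L | control] = -phi(t) / (1 - K)."""
    t = threshold_from_prevalence(prevalence)
    density = STD_NORMAL.pdf(t)
    return density / prevalence, -density / (1.0 - prevalence)


# Example with an assumed prevalence of 8% (an illustrative value only).
t_example = threshold_from_prevalence(0.08)
case_mean, control_mean = mean_liabilities(0.08)
```

In a covariate-aware version, the threshold would effectively shift with the covariate (for example, a lower threshold at higher BMI for type 2 diabetes), which is how external epidemiological information could enter the model.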
Recent genome-wide association studies (GWASs) have identified common genetic variants at 5p15.33, 6p21–6p22 and 15q25.1 associated with lung cancer risk. Several other genetic regions including variants of CHEK2 (22q12), TP53BP1 (15q15) and RAD52 (12p13) have been demonstrated to influence lung cancer risk in candidate- or pathway-based analyses. To identify novel risk variants for lung cancer, we performed a meta-analysis of 16 GWASs, totaling 14 900 cases and 29 485 controls of European descent. Our data provided increased support for previously identified risk loci at 5p15 (P = 7.2 × 10−16), 6p21 (P = 2.3 × 10−14) and 15q25 (P = 2.2 × 10−63). Furthermore, we demonstrated histology-specific effects for 5p15, 6p21 and 12p13 loci but not for the 15q25 region. Subgroup analysis also identified a novel disease locus for squamous cell carcinoma at 9p21 (CDKN2A/p16INK4A/p14ARF/CDKN2B/p15INK4B/ANRIL; rs1333040, P = 3.0 × 10−7) which was replicated in a series of 5415 Han Chinese (P = 0.03; combined analysis, P = 2.3 × 10−8). This large analysis provides additional evidence for the role of inherited genetic susceptibility to lung cancer and insight into biological differences in the development of the different histological types of lung cancer.
Radiation sensitivity is assumed to be a cancer susceptibility factor due to impaired DNA damage signalling and repair. Relevant genetic factors may also determine the observed familial aggregation of early onset lung cancer. We investigated the heritability of radiation sensitivity in families of 177 Caucasian cases of early onset lung cancer. In total 798 individuals were characterized for their radiation-induced DNA damage response. DNA damage analysis was performed by alkaline comet assay before and after in vitro irradiation of isolated lymphocytes. The cells were exposed to a dose of 4 Gy and allowed to repair induced DNA-damage up to 60 minutes. The primary outcome parameter Olive Tail Moment was the basis for heritability estimates. Heritability was highest for basal damage (without irradiation) 70% (95%-CI: 51%–88%) and initial damage (directly after irradiation) 65% (95%-CI: 47%–83%) and decreased to 20%–48% for the residual damage after different repair times. Hence our study supports the hypothesis that genomic instability represented by the basal DNA damage as well as radiation induced and repaired damage is highly heritable. Genes influencing genome instability and DNA repair are therefore of major interest for the etiology of lung cancer in the young. The comet assay represents a proper tool to investigate heritability of the radiation sensitive phenotype. Our results are in good agreement with other mutagen sensitivity assays.
COMET Assay; DNA damage; familial aggregation; lung cancer
Pathway analysis has been proposed as a complement to single SNP analyses in GWAS. This study compared pathway analysis methods using two lung cancer GWAS data sets based on four studies: one a combined data set from Central Europe and Toronto (CETO); the other a combined data set from Germany and MD Anderson (GRMD). We searched the literature for pathway analysis methods that were widely used, representative of other methods, and had available software for performing analysis. We selected the programs EASE, which uses a modified Fisher's exact calculation to test for pathway associations, GenGen (a version of Gene Set Enrichment Analysis (GSEA)), which uses a Kolmogorov-Smirnov-like running sum statistic as the test statistic, and SLAT, which uses a p-value combination approach. We also included a modified version of the SUMSTAT method (mSUMSTAT), which tests for association by averaging χ2 statistics from genotype association tests. There were nearly 18,000 genes available for analysis, following mapping of more than 300,000 SNPs from each data set. These were mapped to 421 GO level 4 gene sets for pathway analysis. Among the methods designed to be robust to biases related to gene size and pathway SNP correlation (GenGen, mSUMSTAT and SLAT), the mSUMSTAT approach identified the most significant pathways (8 in CETO and 1 in GRMD). This included a highly plausible association for the acetylcholine receptor activity pathway in both CETO (FDR≤0.001) and GRMD (FDR = 0.009), although two strong association signals at a single gene cluster (CHRNA3-CHRNA5-CHRNB4) drive this result, complicating its interpretation. Few other replicated associations were found using any of these methods. Difficulty in replicating associations hindered our comparison, but results suggest mSUMSTAT has advantages over the other approaches, and may be a useful pathway analysis tool to use alongside other methods such as the commonly used GSEA (GenGen) approach.
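To make the over-representation idea behind Fisher-type pathway tests concrete, the sketch below computes a one-sided Fisher's exact p-value as a hypergeometric upper tail. This is the plain, unmodified test; the modification applied by EASE is not reproduced here. The function name and counts are illustrative assumptions.

```python
from math import comb


def enrichment_p_value(hits, list_size, pathway_size, n_genes):
    """One-sided Fisher's exact (hypergeometric) p-value: the probability of
    seeing at least `hits` pathway genes among `list_size` selected genes when
    `pathway_size` of the `n_genes` analysed genes belong to the pathway."""
    upper = min(list_size, pathway_size)
    tail = sum(comb(pathway_size, i) * comb(n_genes - pathway_size, list_size - i)
               for i in range(hits, upper + 1))
    return tail / comb(n_genes, list_size)


# Made-up example: 6 of 100 top-ranked genes fall in a 40-gene pathway,
# out of 18000 analysed genes.
p_enrich = enrichment_p_value(6, 100, 40, 18000)
```

Tests of this kind treat genes as independent draws, so they do not by themselves correct for gene size or SNP correlation, the biases that the other compared methods are designed to address.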
Genetic Analysis Workshop 17 (GAW17) provided a platform for evaluating existing statistical genetic methods and for developing novel methods to analyze rare variants that modulate complex traits. In this article, we present an overview of the 1000 Genomes Project exome data and simulated phenotype data that were distributed to GAW17 participants for analyses, the different issues addressed by the participants, and the process of preparation of manuscripts resulting from the discussions during the workshop.
Genome-wide association studies have identified three chromosomal regions at 15q25, 5p15, and 6p21 as being associated with the risk of lung cancer. To confirm these associations in independent studies and investigate heterogeneity of these associations within specific subgroups, we conducted a coordinated genotyping study within the International Lung Cancer Consortium based on independent studies that were not included in previous genome-wide association studies.
Genotype data for single-nucleotide polymorphisms at chromosomes 15q25 (rs16969968, rs8034191), 5p15 (rs2736100, rs402710), and 6p21 (rs2256543, rs4324798) from 21 case–control studies for 11 645 lung cancer case patients and 14 954 control subjects, of whom 85% were white and 15% were Asian, were pooled. Associations between the variants and the risk of lung cancer were estimated by logistic regression models. All statistical tests were two-sided.
Associations between 15q25 and the risk of lung cancer were replicated in white ever-smokers (rs16969968: odds ratio [OR] = 1.26, 95% confidence interval [CI] = 1.21 to 1.32, Ptrend = 2 × 10−26), and this association was stronger for those diagnosed at younger ages. There was no association in never-smokers or in Asians between either of the 15q25 variants and the risk of lung cancer. For the chromosome 5p15 region, we confirmed statistically significant associations in whites for both rs2736100 (OR = 1.15, 95% CI = 1.10 to 1.20, Ptrend = 1 × 10−10) and rs402710 (OR = 1.14, 95% CI = 1.09 to 1.19, Ptrend = 5 × 10−8) and identified similar associations in Asians (rs2736100: OR = 1.23, 95% CI = 1.12 to 1.35, Ptrend = 2 × 10−5; rs402710: OR = 1.15, 95% CI = 1.04 to 1.27, Ptrend = .007). The associations between the 5p15 variants and lung cancer differed by histology; odds ratios for rs2736100 were highest in adenocarcinoma and for rs402710 were highest in adenocarcinoma and squamous cell carcinomas. This pattern was observed in both ethnic groups. Neither of the two variants on chromosome 6p21 was associated with the risk of lung cancer.
In this international genetic association study of lung cancer, previous associations found in white populations were replicated and new associations were identified in Asian populations. Future genetic studies of lung cancer should include detailed stratification by histology.
KCNN3, encoding the small conductance calcium-activated potassium channel SK3, harbours a polymorphic CAG repeat in the amino-terminal coding region with yet unproven function. Hypothesizing that KCNN3 genotypes do not influence susceptibility to schizophrenia but modify its phenotype, we explored their contribution to specific schizophrenic symptoms. Using the Göttingen Research Association for Schizophrenia (GRAS) data collection of schizophrenic patients (n = 1074), we performed a phenotype-based genetic association study (PGAS) of KCNN3. We show that long CAG repeats in the schizophrenic sample are specifically associated with better performance in higher cognitive tasks, comprising the capacity to discriminate, select and execute (p < 0.0001). Long repeats reduce SK3 channel function, as we demonstrate by patch-clamping of transfected HEK293 cells. In contrast, modelling the opposite in mice, i.e. KCNN3 overexpression/channel hyperfunction, leads to selective deficits in higher brain functions comparable to those influenced by SK3 conductance in humans. To conclude, KCNN3 genotypes modify cognitive performance, shown here in a large sample of schizophrenic patients. Reduction of SK3 function may constitute a pharmacological target to improve cognition in schizophrenia and other conditions with cognitive impairment.
higher cognitive testing; mouse behaviour; neuropsychology; small conductance calcium-activated potassium channel; whole-cell patch clamp
Background. Analysis of candidate genes in individual studies has had only limited success in identifying particular gene variants that are conclusively associated with lung cancer risk. In the International Lung Cancer Consortium (ILCCO), we conducted a coordinated genotyping study of 10 common variants selected because of their prior evidence of an association with lung cancer. These variants belonged to candidate genes from different cancer-related pathways including inflammation (IL1B), folate metabolism (MTHFR), regulatory function (AKAP9 and CAMKK1), cell adhesion (SEZL6) and apoptosis (FAS, FASL, TP53, TP53BP1 and BAT3). Methods. Genotype data from 15 ILCCO case–control studies were available for a total of 8431 lung cancer cases and 11 072 controls of European descent and Asian ethnic groups. Unconditional logistic regression was used to model the association between each variant and lung cancer risk. Results. Only the association between a non-synonymous variant of TP53BP1 (rs560191) and lung cancer risk was significant (OR = 0.91, P = 0.002). This association was more striking for squamous cell carcinoma (OR = 0.86, P = 6 × 10−4). No heterogeneity by center, ethnicity, smoking status, age group or sex was observed. To confirm this association, we included results for this variant from a set of independent studies (9966 cases/11 722 controls) and observed similar results. When combining all these studies, we obtained an overall OR = 0.93 (0.89–0.97) (P = 0.001). This association was significant only for squamous cell carcinoma [OR = 0.89 (0.85–0.95), P = 1 × 10−4]. Conclusion. This study suggests that rs560191 is associated with lung cancer risk and further highlights the value of consortia in replicating or refuting published genetic associations.
Genome-wide association studies (GWAS) continue to gain in popularity. To utilize the wealth of data created more effectively, a variety of methods have recently been proposed to include a priori information (e.g., biologically interpretable sets of genes, candidate gene information, or gene expression) in GWAS analysis. Six contributions to Genetic Analysis Workshop 16 Group 11 applied novel or recently proposed methods to GWAS of rheumatoid arthritis and heart disease related phenotypes. The results of these analyses were a variety of novel candidate genes and sets of genes, in addition to the validation of well known genotype-phenotype associations. However, because many methods are relatively new, they would benefit from further methodological research to ensure that they maintain type I error rates while increasing power to find additional associations. When methods have been adapted from other study types (e.g., gene expression data analysis or linkage analysis) the lessons learned there should be used to guide implementation of techniques. Lastly, many open research questions exist concerning the logistic details of the origin of the a priori information and the way to incorporate it. Overall, our group has demonstrated a strong potential for identifying novel genotype-phenotype relationships by including a priori data in the analysis of GWAS, while also uncovering a series of questions requiring further research.
gene set analysis; external information; gene expression; hierarchical Bayesian model; candidate regions; candidate genes; pathway
In genome-wide association studies (GWAS) genetic markers are often ranked to select genes for further pursuit. Especially for moderately associated and interrelated genes, information on genes and pathways may improve the selection. We applied and combined two main approaches for data integration to a GWAS for rheumatoid arthritis, gene set enrichment analysis (GSEA) and hierarchical Bayes prioritization (HBP). Many associated genes are located in the HLA region on 6p21. However, the ranking lists of genes and gene sets differ considerably depending on the chosen approach: HBP changes the ranking only slightly and primarily contains HLA genes in the top 100 gene lists. GSEA includes also many non-HLA genes.
For the Framingham Heart Study (FHS) and simulated FHS (FHSsim) data, we tested for gene-gene interaction in quantitative traits employing a longitudinal nonparametric association test (LNPT) and, for comparison, a survival analysis. We report results for the Offspring Cohort by LNPT analysis and on all longitudinal cohorts by survival analysis with cohort effect adjustment. We verified that type I errors were not inflated. We compared the power of both methods to detect in FHSsim data two sets of gene pairs that interact for the trait coronary artery calcification. In FHS, we tested eight gene pairs from a list of candidate genes for interaction effects on body mass index. Both methods found evidence for pairwise non-additive effects of mutations in the genes FTO, PON1, and PFKP on body mass index.
The polymorphism rs2569190 within the CD14 endotoxin (lipopolysaccharide, LPS) receptor gene is associated with various disease conditions that are assumed to rely on endotoxin sensitivity. In vitro experiments suggest that the T allele sensitizes the host for exogenous or endogenous LPS via an enhanced CD14 expression. To prove the impact of this single nucleotide polymorphism in its natural genomic context in vivo, two parameters of gene transcription were analyzed in peripheral blood mononuclear cells (PBMC) from single healthy individuals: (a) recruitment of RNA polymerase II by haplotype-specific chromatin immunoprecipitation and (b) the relative amount of transcripts by allele-specific transcript quantification (ASTQ). Twice as much RNA polymerase II was found bound to the most prevalent haplotype, C-T-C-G, the only one carrying a T at the position of interest, rs2569190. ASTQ employing two independent read-out assays revealed, however, similar transcript numbers originating from C-T-C-G and non-C-T-C-G haplotypes. Total CD14 mRNA levels from freshly isolated PBMC, moreover, were related neither to donors’ genotypes nor to their haplogenotypes. Our data argue for a functional impact of the rs2569190 polymorphism in terms of a stronger transcription initiation on T allele gene variants even if preferential allele-specific binding does not result in an increase in transcript numbers. Endotoxin sensitivity associated with this genetic variation appears not to rely solely on a cis-acting regulatory impact of rs2569190 on CD14 gene transcription in PBMC.
SNP; LPS; Gene polymorphism; Gene expression; Innate immunity
We present the rationale, the background and the structure for version 2.0 of the GENESTAT information portal (www.genestat.org) for statistical genetics. The fast methodological advances, coupled with a range of standalone software, make it difficult for expert as well as non-expert users to orient themselves when designing and analysing their genetic studies. The ultimate ambition of GENESTAT is to provide guidance on statistical methodology related to the broad spectrum of research in genetic epidemiology. GENESTAT 2.0 focuses on genetic association studies. Each entry provides a summary of a topic and gives links to key papers, websites and software. The flexibility of the internet is utilised for cross-referencing and for open editing. This paper gives an overview of GENESTAT and gives short introductions to the current main topics in GENESTAT, with additional entries on the website. Methods and software developers are invited to contribute to the portal, which is powered by a Wikipedia-type engine and allows easy additions and editing.
statistical genetics; genetic software; internet
Embryonic stem (ES) cells have the potential to differentiate into all cell types and are considered as a valuable source of cells for transplantation therapies. A critical issue, however, is the risk of teratoma formation after transplantation. The effect of the immune response on the tumorigenicity of transplanted cells is poorly understood. We have systematically compared the tumorigenicity of mouse ES cells and in vitro differentiated neuronal cells in various recipients. Subcutaneous injection of 1×10⁶ ES or differentiated cells into syngeneic or allogeneic immunodeficient mice resulted in teratomas in about 95% of the recipients. Both cell types did not give rise to tumors in immunocompetent allogeneic mice or xenogeneic rats. However, in 61% of cyclosporine A-treated rats teratomas developed after injection of differentiated cells. Undifferentiated ES cells did not give rise to tumors in these rats. ES cells turned out to be highly susceptible to killing by rat natural killer (NK) cells due to the expression of ligands of the activating NK receptor NKG2D on ES cells. These ligands were down-regulated on differentiated cells. The activity of NK cells which is not suppressed by cyclosporine A might contribute to the prevention of teratomas after injection of ES cells but not after inoculation of differentiated cells. These findings clearly point to the importance of the immune response in this process. Interestingly, the differentiated cells must contain a tumorigenic cell population that is not present among ES cells and which might be resistant to NK cell-mediated killing.
Early onset lung cancer shows some familial aggregation, pointing to a genetic predisposition. This study was set up to investigate the role of candidate genes in the susceptibility to lung cancer in patients younger than 51 years at diagnosis.
246 patients with a primary, histologically or cytologically confirmed neoplasm, recruited from 2000 to 2003 in major lung clinics across Germany, were matched to 223 unrelated healthy controls. Eleven single nucleotide polymorphisms of genes with reported associations to lung cancer were genotyped.
Genetic associations or gene-smoking interactions were found for GPX1(Pro200Leu) and EPHX1(His113Tyr). Carriers of the Leu allele of GPX1(Pro200Leu) showed a significant risk reduction of OR = 0.6 (95% CI: 0.4–0.8, p = 0.002) in general and of OR = 0.3 (95% CI: 0.1–0.8, p = 0.012) within heavy smokers. We also found a risk-decreasing genetic effect for His carriers of EPHX1(His113Tyr) among moderate smokers (OR = 0.2, 95% CI: 0.1–0.7, p = 0.012). Considering both variants together, a monotone decrease of the OR was found for smokers (OR of 0.20; 95% CI: 0.07–0.60) for each protective allele.
Smoking is the most important risk factor for young lung cancer patients. However, this study provides some support for a protective role of the T allele of GPX1(Pro200Leu) and the C allele of EPHX1(His113Tyr) in early onset lung cancer susceptibility.
Related cases may be included in case-control association studies if correlations between related individuals due to identity-by-descent (IBD) sharing are taken into account. We derived a framework to test for association in a case-control design including affected sibships and unrelated controls. First, a corrected variance for the allele frequency difference between cases and controls was directly calculated or estimated in two ways on the basis of the fixation index FST and the inbreeding coefficient. Then the correlation-corrected association test including controls and affected sibs was carried out. We applied the three strategies to 20 candidate genes on the Genetic Analysis Workshop 15 rheumatoid arthritis data and to 9187 single-nucleotide polymorphisms of replicate one of the Genetic Analysis Workshop 15 simulated data with knowledge of the "answers". The three strategies used to correct for correlation give only minor differences in the variance estimates and yield an almost correct type I error rate for the association tests. Thus, all strategies considered to correct the variance performed quite well.
Genome wide linkage scans have often been successful in the identification of genetic regions containing susceptibility genes for a disease. Meta analysis is used to synthesize information and can even deliver evidence for findings missed by original studies. If researchers do not contribute their data, extracting valid information from publications is technically challenging, but worth the effort. We propose an approach to include data extracted from published figures of genome wide linkage scans. The validity of the extraction was examined on the basis of the 25 markers for which sufficient information was reported. Monte Carlo simulations were used to take into account the uncertainty in marker position and in linkage test statistic. For the final meta analysis we compared the Genome Search Meta Analysis method (GSMA) and the Corrected p-value Meta analysis Method (CPMM). An application to Parkinson's disease is given. Because we had to use secondary data, a meta analysis based on original summary values would be desirable.
Data uncertainty by replicated extraction of marker position is shown to be much smaller than 30 cM, a distance up to which a maximum LOD score may usually be found away from the true locus. The main findings are not impaired by data uncertainty.
Applying the proposed method, a novel linked region for Parkinson's disease was identified on chromosome 14 (p = 0.036). Comparing the two meta analysis methods, we found that more regions of interest were identified by GSMA in this analysis, whereas CPMM provided stronger evidence for linkage. For further validation of the extraction method, comparisons with raw data would be required.
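The rank-based idea behind GSMA can be sketched as follows, assuming the unweighted form of the method: within each study, genome bins are ranked by their maximum linkage statistic, the ranks are summed across studies, and a bin's summed rank is compared with its distribution under random rank assignment. Bin definitions, weighting, and tie handling are simplified here, and all names and values are hypothetical.

```python
import random


def summed_ranks(study_bin_stats):
    """study_bin_stats[s][b] is the maximum linkage statistic of study s in
    bin b. Returns the per-bin sum of within-study ranks (highest rank =
    strongest evidence). Ties are broken by bin order in this sketch."""
    n_bins = len(study_bin_stats[0])
    totals = [0] * n_bins
    for stats in study_bin_stats:
        order = sorted(range(n_bins), key=lambda b: stats[b])
        for rank, b in enumerate(order, start=1):
            totals[b] += rank
    return totals


def summed_rank_p_value(study_bin_stats, bin_index, n_perm=2000, seed=0):
    """Monte Carlo p-value: under the null, a bin's rank in each study is
    uniform on 1..n_bins and independent across studies."""
    rng = random.Random(seed)
    n_bins = len(study_bin_stats[0])
    n_studies = len(study_bin_stats)
    observed = summed_ranks(study_bin_stats)[bin_index]
    hits = 0
    for _ in range(n_perm):
        null_sum = sum(rng.randint(1, n_bins) for _ in range(n_studies))
        hits += null_sum >= observed
    return (hits + 1) / (n_perm + 1)


# Made-up statistics for 2 studies and 3 bins.
toy_stats = [[1.0, 3.0, 2.0], [0.5, 2.5, 1.0]]
toy_totals = summed_ranks(toy_stats)
toy_p = summed_rank_p_value(toy_stats, bin_index=1)
```

When statistics are read from published figures, as in the proposed approach, the extraction uncertainty could be propagated by repeating this calculation over simulated marker positions and statistics.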