Successful independent replication is the most direct approach for distinguishing real genotype-disease associations from false discoveries in Genome Wide Association Studies (GWAS). Selecting SNPs for replication has been primarily based on p-values from the discovery stage, although additional characteristics of SNPs may be used to improve replication success.
We used disease-associated SNPs from more than 2,000 published GWASs to identify predictors of SNP reproducibility. SNP reproducibility was defined as the proportion of successful replications among all replication attempts. The study reporting an association for the first time was considered the discovery study, and all subsequent studies targeting the same phenotype were considered replications. We found that −Log(P), where P is the p-value from the discovery study, is the strongest predictor of SNP reproducibility. Other significant predictors include the type of SNP (e.g. missense vs intronic SNPs) and minor allele frequency. Features of the genes linked to the disease-associated SNP also predict SNP reproducibility.
Based on empirically defined rules, we developed a reproducibility score (RS) to predict SNP reproducibility independently of −Log(P). We used data from two lung cancer GWASs as well as recently reported disease-associated SNPs to validate RS. −Log(P) outperforms RS when only the very top SNPs are selected, while RS works better with relaxed selection criteria. In conclusion, we propose an empirical model to predict SNP reproducibility, which can be used to select SNPs for validation and prioritization.
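The reproducibility measure defined above (successful replications divided by all replication attempts) can be illustrated with a minimal sketch. The record type, the SNP identifiers "rsA" and "rsB", and the outcomes are hypothetical and chosen only for illustration; they are not data from the study.

```python
from dataclasses import dataclass


@dataclass
class ReplicationAttempt:
    snp: str          # SNP identifier (hypothetical values below)
    succeeded: bool   # True if this later study replicated the association


def reproducibility(attempts):
    """Return {snp: successful replications / all replication attempts}."""
    totals, successes = {}, {}
    for a in attempts:
        totals[a.snp] = totals.get(a.snp, 0) + 1
        successes[a.snp] = successes.get(a.snp, 0) + int(a.succeeded)
    return {snp: successes[snp] / totals[snp] for snp in totals}


# Hypothetical replication records: rsA replicated in 3 of 4 attempts,
# rsB in 0 of 2 attempts.
attempts = [
    ReplicationAttempt("rsA", True),
    ReplicationAttempt("rsA", True),
    ReplicationAttempt("rsA", False),
    ReplicationAttempt("rsA", True),
    ReplicationAttempt("rsB", False),
    ReplicationAttempt("rsB", False),
]
scores = reproducibility(attempts)
```

SNPs with no replication attempts are simply absent from the result, since the proportion is undefined for them.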
Comparative analysis of gene expression in human tissues is important for understanding the molecular mechanisms underlying tissue-specific control of gene expression. It can also open an avenue for using gene expression in blood (which is the most easily accessible human tissue) to predict gene expression in other (less accessible) tissues, which would facilitate the development of novel gene expression based models for assessing disease risk and progression. Until recently, direct comparative analysis across different tissues was not possible due to the scarcity of paired tissue samples from the same individuals.
In this study we used paired whole blood/lung gene expression data from the Genotype-Tissue Expression (GTEx) project. We built a generalized linear regression model for each gene using gene expression in lung as the outcome and gene expression in blood, age and gender as predictors.
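As a rough illustration of the per-gene model, the sketch below fits lung expression on blood expression, age and gender for a single gene. It assumes ordinary least squares (a Gaussian model with identity link), which is only one possible form of the generalized linear model mentioned above; the function names and the toy data are hypothetical, and no significance testing is included.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_gene(lung, blood, age, sex):
    """Least-squares fit of lung ~ intercept + blood + age + sex.

    Returns the coefficients [intercept, blood, age, sex].
    """
    X = [[1.0, b, a, s] for b, a, s in zip(blood, age, sex)]
    k = 4
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * y for row, y in zip(X, lung)) for i in range(k)]
    return solve(xtx, xty)  # normal equations: (X'X) beta = X'y


# Toy data for one hypothetical gene, generated without noise so that the
# fit should recover the coefficients used to build it.
blood = [2.0, 3.5, 1.0, 4.2, 2.8, 3.1]
age = [45, 60, 52, 38, 70, 55]
sex = [0, 1, 0, 1, 1, 0]
lung = [1.0 + 0.5 * b + 0.01 * a + 0.2 * s for b, a, s in zip(blood, age, sex)]

coef = fit_gene(lung, blood, age, sex)
```

In a genome-wide setting the same fit would be repeated once per gene, and the coefficient on blood expression would then be tested for significance.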
For ~18% of the genes, gene expression in blood was a significant predictor of gene expression in lung. We found that the number of single nucleotide polymorphisms (SNPs) influencing expression of a given gene in either blood or lung, also known as the number of expression quantitative trait loci (eQTLs), was positively associated with efficacy of blood-based prediction of that gene’s expression in lung. This association was strongest for shared eQTLs: those influencing gene expression in both blood and lung.
In conclusion, for a considerable number of human genes, their expression levels in lung can be predicted using observable gene expression in blood. An abundance of shared eQTLs may explain the strong blood/lung correlations in the gene expression.
Electronic supplementary material
The online version of this article (doi:10.1186/s12920-015-0152-7) contains supplementary material, which is available to authorized users.
Gene expression; Normal lung tissue; Normal blood; Genotype-tissue expression project; GTEx
Genome-wide association studies (GWAS) have generated sufficient data to assess the role of selection in shaping allelic diversity of disease-associated SNPs. Negative selection against disease risk variants is expected to reduce their frequencies, making them overrepresented in the group of minor (<50%) alleles. Indeed, we found that the overall proportion of risk alleles was higher among alleles with frequency <50% (minor alleles) than in the group of major alleles. We hypothesized that negative selection may have different effects on environment (or lifestyle)-dependent versus environment (or lifestyle)-independent diseases. We used an environment/lifestyle index (ELI) to assess the influence of environmental/lifestyle factors on disease etiology. ELI was defined as the number of publications mentioning “environment” or “lifestyle” AND disease per 1,000 disease-mentioning publications. We found that the frequency distributions of the risk alleles for the diseases with strong environmental/lifestyle components follow the distribution expected under a selectively neutral model, while the frequency distributions of the risk alleles for the diseases with weak environmental/lifestyle influences are shifted toward lower values, indicating effects of negative selection. We hypothesized that previously selectively neutral variants become risk alleles when the environment changes. The hypothesis of ancestrally neutral, currently disadvantageous risk-associated alleles predicts that the distribution of risk alleles for the environment/lifestyle dependent diseases will follow a neutral model since natural selection has not had enough time to influence allele frequencies. The results of our analysis suggest that prediction of SNP functionality based on the level of evolutionary conservation may not be useful for SNPs associated with environment/lifestyle dependent diseases.
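The ELI definition above is a simple rate per 1,000 publications. A minimal sketch follows; the function name and the publication counts are hypothetical, and how the counts would be obtained from a literature database is not specified here.

```python
def environment_lifestyle_index(n_disease_and_env, n_disease):
    """ELI: publications mentioning ("environment" or "lifestyle") AND the disease,
    per 1,000 publications mentioning the disease."""
    if n_disease <= 0:
        raise ValueError("need at least one disease-mentioning publication")
    return 1000.0 * n_disease_and_env / n_disease


# Hypothetical counts: 150 of 12,000 disease-mentioning publications also
# mention "environment" or "lifestyle".
eli = environment_lifestyle_index(150, 12000)
```

A higher value indicates that environment or lifestyle is discussed more often in the literature on that disease, which the abstract uses as a proxy for the strength of the environmental/lifestyle component.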
We reviewed several thousand genome-wide association studies that were conducted to identify genetic variants influencing risk of human diseases. We tested the hypothesis that single nucleotide polymorphisms (SNPs) that influence disease risk undergo positive or negative selection more frequently than an average SNP in the human genome. We found no evidence for an excess of positive selection on disease-associated SNPs. At the same time, we found that alleles associated with a higher disease risk undergo negative selection. We also demonstrated that risk alleles for diseases with strong influence of environment/lifestyle factors (e.g. type II diabetes) show little evidence of negative selection, while risk alleles for diseases with weak influence of environment/lifestyle factors (e.g. pathological myopia) show clear signs of negative selection. The approach used in this study can be used to estimate the number of genetic variants in the human genome influencing risk of human diseases.
INPP4B and PTEN dual specificity phosphatases are frequently lost during progression of prostate cancer to metastatic disease. We and others have previously shown that loss of INPP4B expression correlates with poor prognosis in multiple malignancies and with metastatic spread in prostate cancer.
We demonstrate that de novo expression of INPP4B in highly invasive human prostate carcinoma PC-3 cells suppresses their invasion both in vitro and in vivo. Using global gene expression analysis, we found that INPP4B regulates a number of genes associated with cell adhesion, the extracellular matrix, and the cytoskeleton. Importantly, de novo expressed INPP4B suppressed the proinflammatory chemokine IL-8 and induced PAK6. These genes were regulated in a reciprocal manner following downregulation of INPP4B in the independently derived INPP4B-positive LNCaP prostate cancer cell line. Inhibition of the PI3K/Akt pathway, which is highly active in both PC-3 and LNCaP cells, did not reproduce INPP4B-mediated suppression of IL-8 mRNA expression in either cell type. In contrast, inhibition of PKC signaling phenocopied the INPP4B-mediated inhibitory effect on IL-8 in both prostate cancer cell lines. In PC-3 cells, INPP4B overexpression caused a decline in the level of metastasis-associated BIRC5 protein, phosphorylation of PKC, and expression of the common PKC and IL-8 downstream target, COX-2. Reciprocally, COX-2 expression was increased in LNCaP cells following depletion of endogenous INPP4B.
Taken together, we discovered that INPP4B is a novel suppressor of oncogenic PKC signaling, further emphasizing the role of INPP4B in maintaining normal physiology of the prostate epithelium and suppressing metastatic potential of prostate tumors.
Electronic supplementary material
The online version of this article (doi:10.1186/s12964-014-0061-y) contains supplementary material, which is available to authorized users.
INPP4B; Invasion; Prostate cancer; Protein kinase C; Interleukin 8; Survivin/BIRC5
Whole-genome profiling of gene expression is a powerful tool for identifying cancer-associated genes. Genes differentially expressed between normal and tumorous tissues are usually considered to be cancer associated. We recently demonstrated that the analysis of interindividual variation in gene expression can be useful for identifying cancer-associated genes. The goal of this study was to identify the best microarray data–derived predictor of known cancer-associated genes.
We found that the traditional approach of identifying cancer genes—identifying differentially expressed genes—is not very efficient. The analysis of interindividual variation of gene expression in tumor samples identifies cancer-associated genes more effectively. The results were consistent across 4 major types of cancer: breast, colorectal, lung, and prostate. We used recently reported cancer-associated genes (2011–2012) for validation and found that novel cancer-associated genes can be best identified by elevated variance of the gene expression in tumor samples.
The observation that high interindividual variation of gene expression in tumor tissues is the best predictor of cancer-associated genes is likely a result of tumor heterogeneity at the gene level. Computer simulation demonstrates that in the case of heterogeneity, an assessment of variance in tumors provides a better identification of cancer genes than does the comparison of the expression in normal and tumor tissues. Our results thus challenge the current paradigm that comparing the mean expression between normal and tumorous tissues is the best approach to identifying cancer-associated genes; we found that high interindividual variation in expression is a better predictor, and that using variation would improve our chances of identifying cancer-associated genes.
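The argument about heterogeneity can be illustrated with a small simulation. This is a sketch under stated assumptions, not the simulation used in the study: it assumes one hypothetical gene that is over-expressed in half of the tumors and under-expressed in the other half, with unit-variance noise and arbitrary effect sizes.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 200                 # samples per group (arbitrary choice)

# Normal tissue: expression centered at 0 with unit standard deviation.
normal = [rng.gauss(0.0, 1.0) for _ in range(n)]

# Heterogeneous tumors (assumption): half of the tumors shift the gene up by 2,
# the other half shift it down by 2, so the group mean stays close to 0.
tumor = [rng.gauss(2.0 if i % 2 == 0 else -2.0, 1.0) for i in range(n)]

# A mean-based comparison sees almost no difference between the groups ...
mean_shift = statistics.mean(tumor) - statistics.mean(normal)

# ... while the variance in tumors is several times the variance in normal tissue.
variance_ratio = statistics.variance(tumor) / statistics.variance(normal)
```

Under this setup a test on means would tend to miss the gene, whereas a comparison of variances would flag it, which is the pattern the abstract describes.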
Gene expression; Cancer genes; Interindividual variation in gene expression
Identification of genes that are differently expressed is a common approach used to analyze genetic mechanisms underlying cancer development. However, recent study results suggest that many such genes relate to a small number of biological functions. We hypothesized that analysis of these functions provides a better understanding of tumor biology than does actual identification of these genes.
Materials and Methods
We re-analyzed publicly available gene expression data for paired samples of prostate tumor and adjacent normal tissue from the same patients to identify genes differently expressed in individual tumors and then used them to identify the functions.
We found significant interindividual variation in the type and the number of functions. After adjusting for redundancy and nonspecificity of the functional terms, we identified seven functions. Several of them showed a strong association with clinical traits, e.g. age at diagnosis, preoperative prostate-specific antigen concentration, Gleason grade, and biochemical recurrence. Actin cytoskeleton was the function most frequently associated with clinical traits. Of note, the association between function and clinical traits was much stronger than that between the genes differently expressed and those traits.
Different prostate tumors differ in their functional profiles. Functions of differently expressed genes are strongly associated with clinical traits. This suggests that analysis of functions of differently expressed genes may provide a better description of tumor biology than does analysis of the respective genes.
Gene expression; prostate cancer; in silico; functional profiling; functionality index
Predicting disease progression is one of the most challenging problems in prostate cancer research. Adding gene expression data to prediction models that are based on clinical features has been proposed to improve accuracy. In the current study, we applied a logistic regression (LR) model combining clinical features and gene co-expression data to improve the accuracy of the prediction of prostate cancer progression. The top-scoring pair (TSP) method was used to select genes for the model. The proposed models not only preserved the basic properties of the TSP algorithm but also incorporated the clinical features into the prognostic models. Based on the statistical inference with the iterative cross validation, we demonstrated that prediction LR models that included genes selected by the TSP method provided better predictions of prostate cancer progression than those using clinical variables only and/or those that included genes selected by the one-gene-at-a-time approach. Thus, we conclude that TSP selection is a useful tool for feature (and/or gene) selection to use in prognostic models and our model also provides an alternative for predicting prostate cancer progression.
The common disease/common variant hypothesis has been popular for describing the genetic architecture of common human diseases for several years. According to the originally stated hypothesis, one or a few common genetic variants with a relatively large effect size control the risk of common diseases. A growing body of evidence, however, suggests that rare single-nucleotide polymorphisms (SNPs), i.e., those with a minor allele frequency of less than 5%, are also an important component of the genetic architecture of common human diseases. In this study, we analyzed the relevance of rare SNPs to the risk of common disease from an evolutionary perspective and found that rare SNPs are more likely than common SNPs to be functional and tend to have a stronger effect size than do common SNPs. This observation, plus the fact that most of the SNPs in the human genome are rare, suggests that rare SNPs are a crucial element of the genetic architecture of common human diseases. We propose that the next generation of genomic studies should focus on analyzing rare SNPs. Further, targeting patients with a family history of the disease, an extreme phenotype, or early disease onset may facilitate the detection of risk-associated rare SNPs.
Single Nucleotide Polymorphisms (SNPs); Genome Wide Association Studies (GWAS); Minor Allele Frequency (MAF); negative selection
Prediction of cancer progression after radical prostatectomy (RP) is one of the most challenging problems in the management of prostate cancer. Gene-expression profiling is widely used to identify genes associated with such progression. Usually candidate genes are identified according to a gene-by-gene comparison of expression. Recent reports suggested that relative expression of a gene pair more efficiently predicts cancer progression than single-gene analysis does. The top-scoring pair (TSP) algorithm classifies phenotypes according to the relative expression of a pair of genes. We applied the TSP approach to predict which patients would experience systemic tumor progression after RP. Relative expression of TPD52L2/SQLE and CEACAM1/BRCA1 gene pairs identified those patients, with more than 99% specificity but relatively low sensitivity (~10%). These two gene pairs were validated in three independent datasets. Additionally, combining two pairs of genes improved sensitivity without compromising specificity. Functional annotation of the TSP genes demonstrated that they cluster by a limited number of biologic functions and pathways, suggesting that relatively lower expression of genes from specific pathways can predict cancer progression. In conclusion, comparative analysis of the expression of two genes may be a simple and effective classifier for prediction of prostate cancer progression. The TSP approach can be used to identify patients whose prostate cancer will progress after they undergo radical prostatectomy. Two gene pairs can predict which men would experience progression to the metastatic form of the disease. However, because our analysis was based on a relatively small number of genes, a larger study will be needed to identify the best predictors of disease outcome overall.
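The pair-selection step of the TSP approach can be sketched as follows: for every gene pair, compare how often the first gene is expressed below the second in each class, and keep the pair with the largest difference in those frequencies. The gene names G1 to G3, the expression values and the class labels are hypothetical; tie-breaking and cross-validation are omitted.

```python
from itertools import combinations


def top_scoring_pair(samples, labels):
    """Find the gene pair whose relative ordering best separates two classes.

    samples: list of {gene: expression} dicts; labels: list of 0/1 class labels.
    Returns (score, gene_i, gene_j), where
    score = |P(x_i < x_j | class 1) - P(x_i < x_j | class 0)|.
    """
    genes = sorted(samples[0])
    best = (-1.0, None, None)
    for gi, gj in combinations(genes, 2):
        freq = []
        for cls in (0, 1):
            group = [s for s, y in zip(samples, labels) if y == cls]
            freq.append(sum(s[gi] < s[gj] for s in group) / len(group))
        score = abs(freq[1] - freq[0])
        if score > best[0]:  # the first pair found wins ties
            best = (score, gi, gj)
    return best


# Hypothetical expression values; label 1 = progression, 0 = no progression.
samples = [
    {"G1": 5, "G2": 3, "G3": 4},
    {"G1": 6, "G2": 2, "G3": 1},
    {"G1": 7, "G2": 4, "G3": 8},
    {"G1": 2, "G2": 6, "G3": 4},
    {"G1": 1, "G2": 5, "G3": 0},
    {"G1": 3, "G2": 7, "G3": 8},
]
labels = [0, 0, 0, 1, 1, 1]

best = top_scoring_pair(samples, labels)
```

A new sample would then be classified by checking only whether the first gene of the selected pair is expressed below the second, which is what makes the resulting classifier simple to apply.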
prostate cancer; gene co-expression; top-scoring pairs of genes; metastasis; cancer progression
Lung cancer in lifetime never smokers is distinct from that in smokers, but the role of separate or overlapping carcinogenic pathways has not been explored. We therefore evaluated a comprehensive panel of 11,737 SNPs in inflammatory-pathway genes in a discovery phase (451 lung cancer cases, 508 controls from Texas). SNPs that were significant were evaluated in a second external population (303 cases, 311 controls from the Mayo Clinic). An intronic SNP in the ACVR1B gene, rs12809597, was replicated with significance and restricted to those reporting adult exposure to environmental tobacco smoke. Another promising candidate was a SNP in NR4A1, although the replication OR did not achieve statistical significance. ACVR1B belongs to the TGFR-β superfamily, contributing to resolution of inflammation and initiation of airway remodeling. An inflammatory microenvironment (second hand smoking, asthma, or hay fever) is necessary for risk from these gene variants to be expressed. These findings require further replication, followed by targeted resequencing and functional validation.
lung cancer; never smokers; inflammation genes; sidestream exposure
Heterogeneity in age of onset of colorectal cancer in individuals with mutations in DNA mismatch repair genes (Lynch syndrome) suggests the influence of other lifestyle and genetic modifiers. We hypothesized that genes regulating the cell cycle influence the observed heterogeneity as cell cycle–related genes respond to DNA damage by arresting the cell cycle to provide time for repair and induce transcription of genes that facilitate repair. We examined the association of 1456 single nucleotide polymorphisms (SNPs) in 128 cell cycle–related genes and 31 DNA repair–related genes in 485 non-Hispanic white participants with Lynch syndrome to determine whether there are SNPs associated with age of onset of colorectal cancer. Genotyping was performed on an Illumina GoldenGate platform, and data were analyzed using Kaplan–Meier survival analysis, Cox regression analysis and classification and regression tree (CART) methods. Ten SNPs were independently significant in a multivariable Cox proportional hazards regression model after correcting for multiple comparisons (P < 5 × 10−4). Furthermore, risk modeling using CART analysis defined combinations of genotypes for these SNPs with which subjects could be classified into low-risk, moderate-risk and high-risk groups that had median ages of colorectal cancer onset of 63, 50 and 42 years, respectively. The age-associated risk of colorectal cancer in the high-risk group was more than four times the risk in the low-risk group (hazard ratio = 4.67, 95% CI = 3.16–6.92). The additional genetic markers identified may help in refining risk groups for more tailored screening and follow-up of non-Hispanic white patients with Lynch syndrome.
Genome-wide association studies of European and East Asian populations have identified lung cancer susceptibility loci on chromosomes 5p15.33, 6p22.1-p21.31 and 15q25.1. We investigated whether these regions contain lung cancer susceptibility loci in African-Americans and refined previous association signals by utilizing the reduced linkage disequilibrium observed in African-Americans.
1308 African-American cases and 1241 African-American controls from three centers were genotyped for 760 single nucleotide polymorphisms spanning three regions, and additional SNP imputation was performed. Associations between polymorphisms and lung cancer risk were estimated using logistic regression, stratified by tumor histology where appropriate.
The strongest associations were observed on 15q25.1 in/near CHRNA5, including a missense substitution (rs16969968: OR = 1.57, 95% CI = 1.25–1.97, P = 1.1 × 10−4) and variants in the 5′-UTR. Associations on 6p22.1-p21.31 were histology-specific and included a missense variant in BAT2 associated with squamous-cell carcinoma (rs2736158: OR = 0.64, 95% CI = 0.48–0.85, P = 1.82 × 10−3). Associations on 5p15.33 were detected near TERT, the strongest of which was rs2735940 (OR = 0.82, 95% CI = 0.73–0.93, P = 1.1 × 10−3). This association was stronger among cases with adenocarcinoma (OR = 0.75, 95% CI = 0.65–0.86, P = 8.1 × 10−5).
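The reported odds ratios, confidence intervals and P values can be checked against each other with a short calculation. The sketch below assumes that each 95% CI is a Wald interval on the log-odds scale (the usual output of logistic regression, but an assumption here) and uses the rs16969968 estimate quoted above as input; the function name is hypothetical.

```python
import math


def wald_from_ci(odds_ratio, ci_low, ci_high, z_crit=1.959964):
    """Recover the standard error, z statistic and two-sided P value
    from an odds ratio and its 95% confidence interval.

    Assumes a symmetric (Wald) interval on the log-odds scale.
    """
    log_or = math.log(odds_ratio)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = log_or / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return se, z, p


# rs16969968 as reported above: OR = 1.57, 95% CI = 1.25-1.97.
se, z, p = wald_from_ci(1.57, 1.25, 1.97)
```

For this input the recovered P value is on the order of 10−4, in line with the reported P = 1.1 × 10−4; small differences are expected because the published OR and CI are rounded.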
Polymorphisms in 5p15.33, 6p22.1-p21.31 and 15q25.1 are associated with lung cancer in African-Americans. Variants on 5p15.33 are stronger risk factors for adenocarcinoma, and variants on 6p21.33 are associated only with squamous-cell carcinoma.
Results implicate the BAT2, TERT and CHRNA5 genes in the pathogenesis of specific lung cancer histologies.
Lung cancer; adenocarcinoma; squamous-cell carcinoma; fine-mapping; African-American; genetic association
Identifying genes associated with cancer development is typically accomplished by comparing mean expression values in normal and tumor tissues, which identifies differentially expressed (DE) genes. Interindividual variation (IV) in gene expression is indirectly included in DE gene identification because, given the same absolute differences in means, genes with lower variance tend to have lower P values. We explored the direct use of IV in gene expression to identify candidate genes associated with cancer development. We focused on prostate (PCa) and lung (LC) cancers and compared IV in the expression level of genes shown to be cancer related with that in all other genes in the human genome. Compared with all those other genes, cancer-related genes tended to have greater IV in normal tissues and a greater increase in IV during the transition from normal to tumorous tissue. Genes without significantly different mean expression values between tumor and normal tissues but with greater IV in tumor than in normal tissue (note: the DE-based approach completely ignores those genes) had stronger associations with clinically important features like Gleason score in PCa or tumor histology in LC than all other genes did. Our results suggest that analyzing IV in gene expression level is useful in identifying novel candidate genes associated with cancer development.
Gene expression; interindividual variation in gene expression; prostate cancer; lung cancer
Evolutionary aspects of the genetic architecture of common human diseases remain enigmatic. The results of more than 200 genome-wide association studies published to date were compiled in a catalog (http://www.genome.gov/26525384/). We used cataloged data to determine whether derived (mutant) alleles are associated with higher risk of human disease more frequently than ancestral alleles. We placed all allelic variants into ten categories of population frequency (0%–100%) in 10% increments. We then analyzed the relationship between allelic frequency, evolutionary status of the polymorphic site (ancestral versus derived), and disease risk status (risk versus protection). Given the same population frequency, derived alleles are more likely to be risk associated than ancestral alleles, as are rarer alleles. The common interpretation of this association is that negative selection prevents fixation of the risk variants. However, disease stratification as early or late onset suggests that weak selection against risk-associated alleles is unlikely to be a major factor shaping the genetic architecture of common diseases. Our results suggest that the duration of existence of an allele in a population is more important. Alleles existing longer tend to show weaker linkage disequilibrium with neighboring alleles, including the causal alleles, and are less likely to tag a SNP-disease association.
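The binning scheme described above (ten frequency categories in 10% increments, compared by ancestral versus derived status) can be sketched as follows. The function names and the allele records are hypothetical, and assigning a frequency of exactly 100% to the top bin is an assumption made here for illustration.

```python
def frequency_bin(freq):
    """Map an allele frequency in [0, 1] to one of ten 10% bins (0 to 9)."""
    return min(int(freq * 10), 9)  # a frequency of exactly 1.0 goes to the top bin


def risk_fraction_by_bin(alleles):
    """alleles: iterable of (frequency, is_derived, is_risk) tuples.

    Returns {(bin, is_derived): fraction of alleles in that cell that are risk alleles}.
    """
    counts = {}
    for freq, derived, risk in alleles:
        key = (frequency_bin(freq), derived)
        n, r = counts.get(key, (0, 0))
        counts[key] = (n + 1, r + int(risk))
    return {key: r / n for key, (n, r) in counts.items()}


# Hypothetical alleles: (population frequency, derived?, risk-associated?)
alleles = [
    (0.15, True, True),
    (0.12, True, True),
    (0.18, True, False),
    (0.15, False, True),
    (0.11, False, False),
    (0.85, False, False),
]
fractions = risk_fraction_by_bin(alleles)
```

Comparing the derived and ancestral entries within the same bin corresponds to the "given the same population frequency" comparison made in the abstract.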
Genome-wide association studies; ancestral allele; derived allele; minor allele frequency
Tumor size at diagnosis (TSD) indirectly reflects tumor growth rate. The relationship between TSD and smoking is poorly understood. The aim of the study was to determine the relationship between smoking and TSD. We reviewed the electronic medical records of 1712 newly diagnosed and previously untreated non-small cell lung cancer (NSCLC) patients and collected tumor characteristics. Demographic and epidemiologic characteristics were derived from questionnaires administered during personal interviews. Univariate and multivariate linear regression models were used to evaluate the relationship between TSD and smoking, controlling for demographic and clinical factors. We also investigated the relationship between the rs1051730 SNP in an intron of the CHRNA3 gene (the polymorphism most significantly associated with lung cancer risk and smoking behavior) and TSD. We found a strong dose-dependent relationship between TSD and smoking. Current smokers had the largest and never smokers the smallest TSD, with former smokers having intermediate TSD. In the multivariate linear regression model, smoking status (never, former, and current), histological type (adenocarcinoma vs SqCC), and gender were significant predictors of TSD. Smoking duration and intensity may explain the gender effect in predicting TSD. We found that the variant allele of rs1051730 in the CHRNA3 gene was associated with larger TSD of squamous cell carcinoma. In the multivariate linear regression model, both rs1051730 and smoking were significant predictors of the size of squamous carcinomas. We conclude that smoking is positively associated with lung tumor size at the time of diagnosis.
Lung cancer; tumor size; epidemiologic characteristics; risk factors; CHRNA3
More than 400 cancer genes have been identified in the human genome. The list is not yet complete. Statistical models predicting cancer genes may help with identification of novel cancer gene candidates. We used known prostate cancer (PCa) genes (identified through KnowledgeNet) as a training set to build a binary logistic regression model identifying PCa genes. Internal and external validation of the model was conducted using a validation set (also from KnowledgeNet), permutations, and external data on genes with recurrent prostate tumor mutations. We evaluated a set of 33 gene characteristics as predictors. Sixteen of the original 33 predictors were significant in the model. We found that a typical PCa gene is a prostate-specific transcription factor, kinase, or phosphatase with high interindividual variance of the expression level in adjacent normal prostate tissue and differential expression between normal prostate tissue and primary tumor. PCa genes are likely to have an antiapoptotic effect and to play a role in cell proliferation, angiogenesis, and cell adhesion. Their proteins are likely to be ubiquitinated or sumoylated but not acetylated. A number of novel PCa candidates have been proposed. Functional annotations of novel candidates identified antiapoptosis, regulation of cell proliferation, positive regulation of kinase activity, positive regulation of transferase activity, angiogenesis, positive regulation of cell division, and cell adhesion as top functions. We provide the list of the top 200 predicted PCa genes, which can be used as candidates for experimental validation. The model may be modified to predict genes for other cancer sites.
Tobacco-induced lung cancer is characterized by a deregulated inflammatory microenvironment. Variants in multiple genes in inflammation pathways may contribute to risk of lung cancer.
We therefore conducted a three-stage comprehensive pathway analysis (discovery, replication and meta-analysis) of inflammation gene variants in ever smoking lung cancer cases and controls. A discovery set (1096 cases; 727 controls) and an independent and non-overlapping internal replication set (1154 cases; 1137 controls) were derived from an ongoing case-control study. For discovery, we used an iSelect BeadChip to interrogate a comprehensive panel of 11737 inflammation pathway SNPs and selected nominally significant (p<0.05) SNPs for internal replication.
Six SNPs achieved statistical significance (p<0.05) in the internal replication dataset with concordant risk estimates for former smokers, and there were 5 concordant and replicated SNPs in current smokers. Replicated hits were further tested in a subsequent meta-analysis using external data derived from two published GWAS and a case-control study. Two of these variants (a BCL2L14 SNP in former smokers and a SNP in IL2RB in current smokers) were further validated. In risk score analyses, there was a 26% increase in risk with each additional adverse allele when we combined the genotyped SNP and the most significant imputed SNP in IL2RB in current smokers, and a similar 36% increase in risk for former smokers associated with genotyped and imputed BCL2L14 SNPs.
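To make the per-allele figures concrete, the sketch below shows how a per-allele increase compounds with the number of adverse alleles. It assumes that the reported 26% increase corresponds to a per-allele odds ratio of 1.26 and that alleles combine multiplicatively (a log-additive model); both are assumptions for illustration, and the function name is hypothetical.

```python
def cumulative_odds_ratio(per_allele_or, n_alleles):
    """Odds ratio relative to carrying no adverse alleles, assuming each
    additional allele multiplies the odds by the same factor (log-additive model)."""
    return per_allele_or ** n_alleles


# Two SNPs give 0 to 4 adverse alleles; 1.26 is the assumed per-allele odds ratio.
table = {k: cumulative_odds_ratio(1.26, k) for k in range(5)}
```

Under these assumptions, carrying two adverse alleles corresponds to an odds ratio of about 1.59, and carrying three to roughly a doubling of the odds.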
Before they can be applied for risk prediction efforts, these SNPs should be subject to further external replication and more extensive fine mapping studies.
Inflammation SNPs; lung cancer; smokers
Studies in European and East Asian populations have identified lung cancer susceptibility loci in nicotinic acetylcholine receptor (nAChR) genes on chromosome 15q25.1 which also appear to influence smoking behaviors. We sought to determine if genetic variation in nAChR genes influences lung cancer susceptibility in African-Americans, and evaluated the association of these cancer susceptibility loci with smoking behavior. A total of 1308 African-Americans with lung cancer and 1241 African-American controls from three centers were genotyped for 378 single nucleotide polymorphisms (SNPs) spanning the sixteen human nAChR genes. Associations between SNPs and the risk of lung cancer were estimated using logistic regression, adjusted for relevant covariates. Seven SNPs in three nAChR genes were significantly associated with lung cancer at a strict Bonferroni-corrected level, including a novel association on chromosome 2 near the promoter of CHRNA1 (rs3755486: OR = 1.40, 95% CI = 1.18-1.67, P = 1.0 × 10−4). Association analysis of an additional 305 imputed SNPs on 2q31.1 supported this association. Publicly available expression data demonstrated that the rs3755486 risk allele correlates with increased CHRNA1 gene expression. Additional SNP associations were observed on 15q25.1 in genes previously associated with lung cancer, including a missense variant in CHRNA5 (rs16969968: OR = 1.60, 95% CI = 1.27-2.01, P = 5.9 × 10−5). Risk alleles on 15q25.1 also correlated with an increased number of cigarettes smoked per day among the controls. These findings identify a novel lung cancer risk locus on 2q31.1 which correlates with CHRNA1 expression and replicate previous associations on 15q25.1 in African-Americans.
Lung cancer; nicotine dependence; African-Americans; genetic association; smoking
ATM gene mutations have been implicated in many human cancers. However, the role of ATM polymorphisms in lung carcinogenesis is largely unexplored. We conducted a case-control analysis of 556 Caucasian non-small-cell lung cancer (NSCLC) patients and 556 controls frequency-matched on age, gender and smoking status. We genotyped 11 single nucleotide polymorphisms of the ATM gene and found that compared with the wild-type allele-containing genotypes, the homozygous variant genotypes of ATM08 (rs227060) and ATM10 (rs170548) were associated with elevated NSCLC risk with ORs of 1.55 (95% CI: 1.02–2.35) and 1.51 (0.99–2.31), respectively. ATM haplotypes and diplotypes were inferred using the Expectation-Maximization algorithm. Haplotype H5 was significantly associated with reduced NSCLC risk in former smokers with an OR of 0.47 (0.25–0.96) compared with the common H1 haplotype. Compared with the H1–H2 diplotype, H2–H2 and H3–H4 diplotypes were associated with increased NSCLC risk with ORs of 1.58 (0.99–2.54) and 2.29 (1.05–5.00), respectively. We then evaluated genotype–phenotype correlation in the control group using the comet assay to determine DNA damage and DNA repair capacity. Compared with individuals with at least 1 wild-type allele, the homozygous variant carriers of either ATM08 or ATM10 exhibited significantly increased DNA damage as evidenced by a higher mean value of the radiation-induced olive tail moment (ATM08: 4.86 ± 2.43 vs. 3.79 ± 1.51, p = 0.04; ATM10: 5.14 ± 2.37 vs. 3.79 ± 1.54, p = 0.01). Our study presents the first epidemiologic evidence that ATM genetic variants may affect NSCLC predisposition, and that the risk-conferring variants might act through down-regulating the functions of ATM in DNA repair activity upon genetic insults such as ionizing radiation.
ATM; polymorphism; haplotype; diplotype; NSCLC
Chromosome 5p15.33 has been identified by genome-wide association studies as one of the regions associated with lung cancer risk. A few single-nucleotide polymorphisms (SNPs) in the telomerase reverse transcriptase (TERT) and cleft lip and palate transmembrane 1-like (CLPTM1L) genes located in this region have shown consistent associations. We performed dense genotyping of SNPs in this region to refine the previously reported association signals for lung cancer risk. Two hundred and fifteen SNPs were genotyped on an Illumina iSelect panel, in a hospital-based case–control study of 1681 lung cancer cases and 1235 unaffected controls. Association was tested using unconditional logistic regression, while adjusting for age, sex and pack-years smoked. Furthermore, since many of the SNPs were in linkage disequilibrium (LD), haplotype blocks were constructed, from which tagging SNPs at an r2 threshold of ≥0.95 were included in a stepwise forward selection logistic regression model. Of the 215 SNPs, 69 were significant at P < 0.05 in univariate analysis; of these, 35 SNPs meeting the r2 threshold were included in the multiple logistic regression model. Two SNPs, rs370348 (odds ratio = 0.76, P = 1.6 × 10−6) and rs4975538 (odds ratio = 1.18, P = 0.005), were significantly associated with risk in the overall sample. Among ever smokers, rs4975615 (odds ratio = 0.75, P = 1.2 × 10−4) and rs4975538 (odds ratio = 1.26, P = 0.002) were significant, whereas among never-smokers, rs451360 (odds ratio = 0.62, P = 7.6 × 10−5) was significant. We refined the consistent association signal in this region, allowing for the considerable LD between SNPs, and identified four novel SNPs that were independently and significantly associated with lung cancer risk. Results of these analyses strongly suggest effects on risk from several loci in the TERT/CLPTM1L region.
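The role of the r2 threshold in the abstract above can be sketched in a few lines. The example below is a simplified stand-in, not the haplotype-block procedure the authors used: it computes r2 as the squared Pearson correlation of genotype dosages (0, 1, 2) and greedily keeps a SNP only if its r2 with every SNP already kept is below 0.95. The function names and the toy genotypes are hypothetical.

```python
from typing import Dict, List, Sequence


def r_squared(g1: Sequence[float], g2: Sequence[float]) -> float:
    """Squared Pearson correlation between two genotype dosage vectors."""
    n = len(g1)
    if n == 0 or n != len(g2):
        raise ValueError("genotype vectors must be non-empty and equal length")
    m1 = sum(g1) / n
    m2 = sum(g2) / n
    cov = sum((a - m1) * (b - m2) for a, b in zip(g1, g2))
    v1 = sum((a - m1) ** 2 for a in g1)
    v2 = sum((b - m2) ** 2 for b in g2)
    if v1 == 0 or v2 == 0:
        # A monomorphic SNP carries no LD information; treat r2 as 0.
        return 0.0
    return (cov * cov) / (v1 * v2)


def greedy_tag_snps(
    genotypes: Dict[str, Sequence[float]], threshold: float = 0.95
) -> List[str]:
    """Keep SNPs, in input order, whose r2 with every kept SNP is < threshold."""
    tags: List[str] = []
    for snp, dosages in genotypes.items():
        if all(r_squared(dosages, genotypes[t]) < threshold for t in tags):
            tags.append(snp)
    return tags


# Hypothetical toy data: snp_b duplicates snp_a, snp_c is only partly correlated.
toy_genotypes = {
    "snp_a": [0, 1, 2, 1, 0, 2],
    "snp_b": [0, 1, 2, 1, 0, 2],
    "snp_c": [2, 0, 1, 1, 2, 0],
}
print(greedy_tag_snps(toy_genotypes))
```

In this toy case the duplicate SNP is dropped and the other two are retained as tags, which is the sense in which a high r2 threshold removes near-redundant SNPs before model selection.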
Detection of early stage non-small cell lung cancer (NSCLC) is commonly believed to be incidental. Understanding what led to the initial detection of disease in these patients is important for early diagnosis, but these reasons are not well studied.
We retrospectively reviewed medical records of patients diagnosed with stage I or II NSCLC between 2000 and 2009 at UT MD Anderson Cancer Center. Information on symptoms suggestive of lung cancer (suggestive LC-symptoms) or other reasons that led to detection was extracted from patients' medical records. We applied univariate and multivariate analyses to evaluate the association of suggestive LC-symptoms with tumor size and patient survival.
Of the 1396 early stage LC patients, 733 (52.5%) presented with suggestive LC-symptoms as chief complaint. A further 347 (24.9%) and 287 (20.6%) were diagnosed through regular check-ups and evaluations for other diseases, respectively. The proportion of suggestive LC-symptom-caused detection had a linear relationship with tumor size (correlation 0.96; p < .0001). After adjustment for age, gender, race, smoking status, therapy, and stage, symptom-caused detection showed no significant difference in overall or LC-specific survival compared with other (non-symptom-caused) detection.
Symptoms suggestive of LC were the most common reason for detection of early stage NSCLC. They were also associated with tumor size at diagnosis, suggesting that early stage LC patients do develop symptoms. Presence of symptoms in early stages did not compromise survival. A symptom-based alerting system or guidelines may be worth further study to benefit individuals at high risk of NSCLC.
Genome-wide association studies of white persons with lung cancer have identified a region of extensive linkage disequilibrium on chromosome 15q25.1 that appears to be associated with both risk for lung cancer and smoking dependence. Because studying African American persons, who exhibit lower levels of linkage disequilibrium in this region, may identify additional loci that are associated with lung cancer, we genotyped 34 single-nucleotide polymorphisms (SNPs) in this region (including LOC123688, PSMA4, CHRNA5, CHRNA3, and CHRNB4 genes) in 467 African American patients with lung cancer and 388 frequency-matched African American control subjects. Associations of SNPs in LOC123688 (rs10519203; odds ratio [OR] = 1.60, 95% confidence interval [CI] = 1.25 to 2.05, P = .00016), CHRNA5 (rs2036527; OR = 1.67, 95% CI = 1.26 to 2.21, P = .00031), and CHRNA3 (rs1051730; OR = 1.81, 95% CI = 1.26 to 2.59, P = .00137) genes with lung cancer risk reached Bonferroni-corrected levels of statistical significance (all statistical tests were two-sided). Joint logistic regression analysis showed that rs684513 (OR = 0.47, 95% CI = 0.31 to 0.71, P = .0003) in CHRNA5 and rs8034191 (OR = 1.76, 95% CI = 1.23 to 2.52, P = .002) in LOC123688 were also associated with risk. The functional A variant of rs16969968 in CHRNA5 had the strongest association with lung cancer (OR = 1.98, 95% CI = 1.25 to 3.11, P = .003). These SNPs were primarily associated with increased risk for lung adenocarcinoma histology and were only weakly associated with smoking phenotypes. Thus, among African American persons, multiple loci in the region of chromosome 15q25.1 appear to be strongly associated with lung cancer risk.
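The odds ratios, confidence intervals and P values quoted above are linked through the standard error of the log odds ratio. The sketch below shows that link, assuming a Wald-type 95% interval that is symmetric on the log scale; the abstract does not say how its intervals were computed, so the recovered P value should only land near the reported one, given that assumption and the rounding of the published figures. The function name is hypothetical.

```python
import math


def wald_p_from_ci(odds_ratio: float, ci_low: float, ci_high: float,
                   z_crit: float = 1.959964):
    """Recover SE, z and a two-sided P value from an OR and its 95% CI.

    ASSUMPTION: the interval is a Wald interval, exp(log(OR) +/- z_crit * SE).
    """
    se = (math.log(ci_high) - math.log(ci_low)) / (2.0 * z_crit)
    z = math.log(odds_ratio) / se
    # Two-sided normal tail probability: P = erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return se, z, p


# rs10519203 as quoted in the abstract: OR = 1.60, 95% CI = 1.25 to 2.05.
se, z, p = wald_p_from_ci(1.60, 1.25, 2.05)
print(f"SE(log OR) = {se:.3f}, z = {z:.2f}, two-sided P = {p:.1e}")
```

For rs10519203 this gives a P value of the same order as the reported .00016, which is the kind of rough agreement to expect when working back from rounded estimates.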
The genetic control of prostate cancer development is poorly understood. Large numbers of gene-expression datasets on different aspects of prostate tumorigenesis are available. We used these data to identify and prioritize candidate genes associated with the development of prostate cancer and bone metastases. Our working hypothesis was that combining meta-analyses on different but overlapping steps of prostate tumorigenesis will improve identification of genes associated with prostate cancer development.
A Z score-based meta-analysis of gene-expression data was used to identify candidate genes associated with prostate cancer development. To put together different datasets, we conducted a meta-analysis on 3 levels that follow the natural history of prostate cancer development. For experimental verification of candidates, we used in silico validation as well as in-house gene-expression data.
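One common form of Z score-based meta-analysis is Stouffer's weighted method, sketched below for a single gene measured in several datasets. The abstract does not specify the weighting scheme or how the Z scores were derived, so the choice of Stouffer's method, the function names and the example numbers are all assumptions made for illustration.

```python
import math
from typing import Optional, Sequence


def stouffer_z(z_scores: Sequence[float],
               weights: Optional[Sequence[float]] = None) -> float:
    """Combine per-dataset Z scores: sum(w_i * z_i) / sqrt(sum(w_i ** 2))."""
    if len(z_scores) == 0:
        raise ValueError("need at least one Z score")
    if weights is None:
        weights = [1.0] * len(z_scores)  # unweighted Stouffer
    if len(weights) != len(z_scores):
        raise ValueError("weights and z_scores must have equal length")
    denom = math.sqrt(sum(w * w for w in weights))
    if denom == 0:
        raise ValueError("weights must not all be zero")
    return sum(w * z for w, z in zip(weights, z_scores)) / denom


def two_sided_p(z: float) -> float:
    """Two-sided P value for a standard normal Z score."""
    return math.erfc(abs(z) / math.sqrt(2.0))


# Hypothetical example: one gene, three datasets, weights = sqrt(sample size).
example_z = [2.1, 1.4, 2.8]
example_w = [math.sqrt(n) for n in (40, 25, 60)]
combined = stouffer_z(example_z, example_w)
print(f"combined Z = {combined:.2f}, P = {two_sided_p(combined):.1e}")
```

Ranking genes by the combined Z score at each level of the analysis would then give a candidate list; whether the original study ranked genes this way is not stated in the abstract.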
Genes with experimental evidence of an association with prostate cancer development were overrepresented among our top candidates. The meta-analysis also identified a considerable number of novel candidate genes with no published evidence of a role in prostate cancer development. Functional annotation identified cytoskeleton, cell adhesion, extracellular matrix, and cell motility as the top functions associated with prostate cancer development. We identified 10 genes (CDC2, CCNA2, IGF1, EGR1, SRF, CTGF, CCL2, CAV1, SMAD4, and AURKA) that form hubs of the interaction network and therefore are likely to be primary drivers of prostate cancer development.
By using this large 3-level meta-analysis of the gene-expression data to identify candidate genes associated with prostate cancer development, we have generated a list of candidate genes that may be a useful resource for researchers studying the molecular mechanisms underlying prostate cancer development.
We genotyped individuals with primary biliary cirrhosis and unaffected controls for suggestive risk loci (genome-wide association P < 1 × 10−4) identified in a previous genome-wide association study. Combined analysis of the genome-wide association and replication datasets identified IRF5-TNPO3 (combined P = 8.66 × 10−13), 7q12-21 (combined P = 3.50 × 10−13) and MMEL1 (combined P = 3.15 × 10−8) as new primary biliary cirrhosis susceptibility loci. Fine-mapping studies showed that a single variant accounts for the IRF5-TNPO3 association. As these loci are implicated in other autoimmune conditions, these findings confirm genetic overlap among such diseases.
Genome-wide association studies (GWASs) and global profiling of gene expression (microarrays) are two major technological breakthroughs that allow hypothesis-free identification of candidate genes associated with tumorigenesis. It is unclear whether the candidate genes identified by GWAS (GWAS genes) are consistent with those identified by profiling gene expression (microarray genes).
We used the Cancer Genetic Markers Susceptibility database to retrieve single nucleotide polymorphisms from candidate genes for prostate cancer. In addition, we conducted a large meta-analysis of gene expression data in normal prostate and prostate tumor tissue. We identified 13,905 genes that were interrogated by both GWASs and microarrays. On the basis of P values from GWASs, we selected the 1,649 most significantly associated genes for functional annotation by the Database for Annotation, Visualization and Integrated Discovery. We also conducted functional annotation analysis using the same number of top genes identified in the meta-analysis of the gene expression data. We found that genes involved in cell adhesion were overrepresented among both the GWAS and microarray genes.
These results suggest that combining GWAS and microarray data would be a more effective approach than analyzing individual datasets and can help to refine the identification of candidate genes and functions associated with tumor development.