The common disease/common variant hypothesis has been popular for describing the genetic architecture of common human diseases for several years. According to the originally stated hypothesis, one or a few common genetic variants with a relatively large effect size control the risk of common diseases. A growing body of evidence, however, suggests that rare single-nucleotide polymorphisms (SNPs), i.e., those with a minor allele frequency of less than 5%, are also an important component of the genetic architecture of common human diseases. In this study, we analyzed the relevance of rare SNPs to the risk of common disease from an evolutionary perspective and found that rare SNPs are more likely than common SNPs to be functional and tend to have a stronger effect size than do common SNPs. This observation, plus the fact that most of the SNPs in the human genome are rare, suggests that rare SNPs are a crucial element of the genetic architecture of common human diseases. We propose that the next generation of genomic studies should focus on analyzing rare SNPs. Further, targeting patients with a family history of the disease, an extreme phenotype, or early disease onset may facilitate the detection of risk-associated rare SNPs.
Single Nucleotide Polymorphisms (SNPs); Genome Wide Association Studies (GWAS); Minor Allele Frequency (MAF); negative selection
Evolutionary aspects of the genetic architecture of common human diseases remain enigmatic. The results of more than 200 genome-wide association studies published to date were compiled in a catalog (http://www.genome.gov/26525384/). We used cataloged data to determine whether derived (mutant) alleles are associated with higher risk of human disease more frequently than ancestral alleles. We placed all allelic variants into ten categories of population frequency (0%–100%) in 10% increments. We then analyzed the relationship between allelic frequency, evolutionary status of the polymorphic site (ancestral versus derived), and disease risk status (risk versus protection). Given the same population frequency, derived alleles are more likely than ancestral alleles to be risk associated, and rarer alleles are more likely to be risk associated than more common ones. The common interpretation of this association is that negative selection prevents fixation of the risk variants. However, disease stratification as early or late onset suggests that weak selection against risk-associated alleles is unlikely to be a major factor shaping the genetic architecture of common diseases. Our results suggest that the duration of existence of an allele in a population is more important. Alleles that have existed longer tend to show weaker linkage disequilibrium with neighboring alleles, including the causal alleles, and are less likely to tag a SNP-disease association.
Genome-wide association studies; ancestral allele; derived allele; minor allele frequency
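The frequency stratification described in this abstract can be illustrated with a minimal sketch. It assumes each allele is recorded as a (frequency, evolutionary status, risk flag) tuple; the function names and the toy records are hypothetical and are not taken from the catalog or from the authors' analysis.

```python
from collections import defaultdict

def frequency_bin(freq, n_bins=10):
    """Return the 0-based index of the frequency bin for a frequency in [0, 1]."""
    if not 0.0 <= freq <= 1.0:
        raise ValueError("frequency must be between 0 and 1")
    # A frequency of exactly 1.0 is placed in the last bin.
    return min(int(freq * n_bins), n_bins - 1)

def risk_fraction_by_bin(alleles):
    """Tabulate the fraction of risk-associated alleles per (bin, status).

    alleles: iterable of (frequency, status, is_risk) tuples, where status
    is 'derived' or 'ancestral' and is_risk is a bool.
    """
    counts = defaultdict(lambda: [0, 0])  # [risk count, total count]
    for freq, status, is_risk in alleles:
        key = (frequency_bin(freq), status)
        counts[key][0] += int(is_risk)
        counts[key][1] += 1
    return {key: risk / total for key, (risk, total) in counts.items()}

# Hypothetical toy records, not data from the catalog.
toy = [
    (0.05, "derived", True),
    (0.08, "derived", True),
    (0.07, "ancestral", False),
    (0.95, "ancestral", False),
    (1.00, "derived", True),
]
fractions = risk_fraction_by_bin(toy)
```

Comparing the derived and ancestral fractions within the same bin mirrors the "given the same population frequency" comparison in the abstract; a real analysis would also need a significance test, which is omitted here.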
Tumor size at diagnosis (TSD) indirectly reflects tumor growth rate. The relationship between TSD and smoking is poorly understood. The aim of the study was to determine the relationship between smoking and TSD. We reviewed the electronic medical records of 1712 newly diagnosed and previously untreated patients with non-small cell lung cancer (NSCLC) and collected tumor characteristics. Demographic and epidemiologic characteristics were derived from questionnaires administered during personal interviews. Univariate and multivariate linear regression models were used to evaluate the relationship between TSD and smoking, controlling for demographic and clinical factors. We also investigated the relationship between the rs1051730 SNP in an intron of the CHRNA3 gene (the polymorphism most significantly associated with lung cancer risk and smoking behavior) and TSD. We found a strong dose-dependent relationship between TSD and smoking. Current smokers had the largest TSD and never smokers the smallest, with former smokers having intermediate TSD. In the multivariate linear regression model, smoking status (never, former, and current), histological type (adenocarcinoma vs squamous cell carcinoma), and gender were significant predictors of TSD. Smoking duration and intensity may explain the gender effect in predicting TSD. We found that the variant allele of rs1051730 in the CHRNA3 gene was associated with larger TSD of squamous cell carcinoma. In the multivariate linear regression model, both rs1051730 and smoking were significant predictors of the size of squamous cell carcinomas. We conclude that smoking is positively associated with lung tumor size at the time of diagnosis.
Lung cancer; tumor size; epidemiologic characteristics; risk factors; CHRNA3
More than 400 cancer genes have been identified in the human genome. The list is not yet complete. Statistical models predicting cancer genes may help with identification of novel cancer gene candidates. We used known prostate cancer (PCa) genes (identified through KnowledgeNet) as a training set to build a binary logistic regression model identifying PCa genes. Internal and external validation of the model was conducted using a validation set (also from KnowledgeNet), permutations, and external data on genes with recurrent prostate tumor mutations. We evaluated a set of 33 gene characteristics as predictors. Sixteen of the original 33 predictors were significant in the model. We found that a typical PCa gene is a prostate-specific transcription factor, kinase, or phosphatase with high interindividual variance of the expression level in adjacent normal prostate tissue and differential expression between normal prostate tissue and primary tumor. PCa genes are likely to have an antiapoptotic effect and to play a role in cell proliferation, angiogenesis, and cell adhesion. Their proteins are likely to be ubiquitinated or sumoylated but not acetylated. A number of novel PCa candidates have been proposed. Functional annotations of novel candidates identified antiapoptosis, regulation of cell proliferation, positive regulation of kinase activity, positive regulation of transferase activity, angiogenesis, positive regulation of cell division, and cell adhesion as top functions. We provide the list of the top 200 predicted PCa genes, which can be used as candidates for experimental validation. The model may be modified to predict genes for other cancer sites.
Tobacco-induced lung cancer is characterized by a deregulated inflammatory microenvironment. Variants in multiple genes in inflammation pathways may contribute to risk of lung cancer.
We therefore conducted a three-stage comprehensive pathway analysis (discovery, replication and meta-analysis) of inflammation gene variants in ever-smoking lung cancer cases and controls. A discovery set (1096 cases; 727 controls) and an independent, non-overlapping internal replication set (1154 cases; 1137 controls) were derived from an ongoing case-control study. For discovery, we used an iSelect BeadChip to interrogate a comprehensive panel of 11737 inflammation pathway SNPs and selected nominally significant (p<0.05) SNPs for internal replication.
Six SNPs achieved statistical significance (p<0.05) in the internal replication dataset with concordant risk estimates in former smokers, and five SNPs were concordant and replicated in current smokers. Replicated hits were further tested in a subsequent meta-analysis using external data derived from two published GWAS and a case-control study. Two of these variants (a BCL2L14 SNP in former smokers and a SNP in IL2RB in current smokers) were further validated. In risk score analyses, there was a 26% increase in risk with each additional adverse allele when we combined the genotyped SNP and the most significant imputed SNP in IL2RB in current smokers, and a similar 36% increase in risk for former smokers associated with genotyped and imputed BCL2L14 SNPs.
Before they can be applied for risk prediction efforts, these SNPs should be subject to further external replication and more extensive fine mapping studies.
Inflammation SNPs; lung cancer; smokers
Studies in European and East Asian populations have identified lung cancer susceptibility loci in nicotinic acetylcholine receptor (nAChR) genes on chromosome 15q25.1 which also appear to influence smoking behaviors. We sought to determine if genetic variation in nAChR genes influences lung cancer susceptibility in African-Americans, and evaluated the association of these cancer susceptibility loci with smoking behavior. A total of 1308 African-Americans with lung cancer and 1241 African-American controls from three centers were genotyped for 378 single nucleotide polymorphisms (SNPs) spanning the sixteen human nAChR genes. Associations between SNPs and the risk of lung cancer were estimated using logistic regression, adjusted for relevant covariates. Seven SNPs in three nAChR genes were significantly associated with lung cancer at a strict Bonferroni-corrected level, including a novel association on chromosome 2 near the promoter of CHRNA1 (rs3755486: OR = 1.40, 95% CI = 1.18-1.67, P = 1.0 × 10−4). Association analysis of an additional 305 imputed SNPs on 2q31.1 supported this association. Publicly available expression data demonstrated that the rs3755486 risk allele correlates with increased CHRNA1 gene expression. Additional SNP associations were observed on 15q25.1 in genes previously associated with lung cancer, including a missense variant in CHRNA5 (rs16969968: OR = 1.60, 95% CI = 1.27-2.01, P = 5.9 × 10−5). Risk alleles on 15q25.1 also correlated with an increased number of cigarettes smoked per day among the controls. These findings identify a novel lung cancer risk locus on 2q31.1 which correlates with CHRNA1 expression and replicate previous associations on 15q25.1 in African-Americans.
Lung cancer; nicotine dependence; African-Americans; genetic association; smoking
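As a rough check on the Bonferroni-corrected level mentioned in this abstract, the sketch below assumes the correction was applied for the 378 genotyped SNPs at a family-wise alpha of 0.05. The abstract does not state the exact number of tests or the alpha used, so both values are assumptions, and the function name is hypothetical.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests

N_SNPS = 378   # number of genotyped SNPs reported in the abstract
ALPHA = 0.05   # assumed family-wise error rate; not stated in the abstract

threshold = bonferroni_threshold(ALPHA, N_SNPS)

# P values reported in the abstract for two of the associated SNPs.
reported = {"rs3755486": 1.0e-4, "rs16969968": 5.9e-5}
passes = {snp: p < threshold for snp, p in reported.items()}
```

Under these assumptions the threshold is about 1.3 × 10−4, and both reported P values fall below it, which is consistent with the abstract's statement.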
ATM gene mutations have been implicated in many human cancers. However, the role of ATM polymorphisms in lung carcinogenesis is largely unexplored. We conducted a case-control analysis of 556 Caucasian non-small-cell lung cancer (NSCLC) patients and 556 controls frequency-matched on age, gender and smoking status. We genotyped 11 single nucleotide polymorphisms of the ATM gene and found that compared with the wild-type allele-containing genotypes, the homozygous variant genotypes of ATM08 (rs227060) and ATM10 (rs170548) were associated with elevated NSCLC risk with ORs of 1.55 (95% CI: 1.02–2.35) and 1.51 (0.99–2.31), respectively. ATM haplotypes and diplotypes were inferred using the Expectation-Maximization algorithm. Haplotype H5 was significantly associated with reduced NSCLC risk in former smokers with an OR of 0.47 (0.25–0.96) compared with the common H1 haplotype. Compared with the H1–H2 diplotype, H2–H2 and H3–H4 diplotypes were associated with increased NSCLC risk with ORs of 1.58 (0.99–2.54) and 2.29 (1.05–5.00), respectively. We then evaluated genotype–phenotype correlation in the control group using the comet assay to determine DNA damage and DNA repair capacity. Compared with individuals with at least 1 wild-type allele, the homozygous variant carriers of either ATM08 or ATM10 exhibited significantly increased DNA damage as evidenced by a higher mean value of the radiation-induced olive tail moment (ATM08: 4.86 ± 2.43 vs. 3.79 ± 1.51, p = 0.04; ATM10: 5.14 ± 2.37 vs. 3.79 ± 1.54, p = 0.01). Our study presents the first epidemiologic evidence that ATM genetic variants may affect NSCLC predisposition, and that the risk-conferring variants might act through down-regulating the functions of ATM in DNA repair activity upon genetic insults such as ionizing radiation.
ATM; polymorphism; haplotype; diplotype; NSCLC
Chromosome 5p15.33 has been identified by genome-wide association studies as one of the regions that associate with lung cancer risk. A few single-nucleotide polymorphisms (SNPs) in the telomerase reverse transcriptase (TERT) and cleft lip and palate transmembrane 1-like (CLPTM1L) genes located in this region have shown consistent associations. We performed dense genotyping of SNPs in this region to refine the previously reported association signals for lung cancer risk. Two hundred and fifteen SNPs were genotyped on an Illumina iSelect panel, in a hospital-based case–control study of 1681 lung cancer cases and 1235 unaffected controls. Association was tested using unconditional logistic regression, while adjusting for age, sex and pack-years smoked. Furthermore, since many of the SNPs were in linkage disequilibrium (LD), haplotype blocks were constructed, from which tagging SNPs at an r2 threshold of ≥0.95 were included in a stepwise forward selection logistic regression model. Of the 215 SNPs, 69 were significant at P < 0.05 in univariate analysis; of these, 35 SNPs meeting the r2 threshold were included in the multiple logistic regression model. Two SNPs, rs370348 (odds ratio = 0.76, P = 1.6 × 10−6) and rs4975538 (odds ratio = 1.18, P = 0.005), significantly associated with risk in the overall sample. Among ever smokers, rs4975615 (odds ratio = 0.75, P = 1.2 × 10−4) and rs4975538 (odds ratio = 1.26, P = 0.002) were significant, whereas among never-smokers, rs451360 (odds ratio = 0.62, P = 7.6 × 10−5) was significant. We refined the consistent association signal in this region, allowing for the considerable LD between SNPs and identified four novel SNPs that were independently and significantly associated with lung cancer risk. Results of these analyses strongly suggest effects on risk from several loci in the TERT/CLPTM1L region.
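The r2 threshold used above for selecting tagging SNPs refers to the standard pairwise linkage disequilibrium measure. A minimal sketch of that measure for two biallelic loci follows, assuming phased haplotype frequencies are available; the function names and example frequencies are hypothetical, and the study's actual haplotype-block software is not specified in the abstract.

```python
def ld_r_squared(p_ab, p_a, p_b):
    """Pairwise LD r^2 between two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A at locus 1 and
          allele B at locus 2
    p_a:  frequency of allele A at locus 1
    p_b:  frequency of allele B at locus 2
    """
    for p in (p_a, p_b):
        if not 0.0 < p < 1.0:
            raise ValueError("both loci must be polymorphic")
    d = p_ab - p_a * p_b  # disequilibrium coefficient D
    return d * d / (p_a * (1.0 - p_a) * p_b * (1.0 - p_b))

def meets_threshold(p_ab, p_a, p_b, threshold=0.95):
    """True if the pair reaches the r^2 threshold (0.95 in the abstract)."""
    return ld_r_squared(p_ab, p_a, p_b) >= threshold

# Hypothetical frequencies: perfect LD and complete independence.
perfect = ld_r_squared(0.5, 0.5, 0.5)
independent = ld_r_squared(0.25, 0.5, 0.5)
```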
Detection of early stage non-small cell lung cancer (NSCLC) is commonly believed to be incidental. Understanding the reasons that led to initial detection in these patients is important for early diagnosis. However, these reasons have not been well studied.
We retrospectively reviewed medical records of patients diagnosed with stage I or II NSCLC between 2000 and 2009 at UT MD Anderson Cancer Center. Information on suggestive LC-symptoms or other reasons that led to detection was extracted from patients' medical records. We applied univariate and multivariate analyses to evaluate the association of suggestive LC-symptoms with tumor size and patient survival.
Of the 1396 early stage LC patients, 733 (52.5%) presented with suggestive LC-symptoms as the chief complaint. Another 347 (24.9%) and 287 (20.6%) were diagnosed because of regular check-ups and evaluations for other diseases, respectively. The proportion of suggestive LC-symptom-caused detection had a linear relationship with tumor size (correlation 0.96; p<.0001). After adjustment for age, gender, race, smoking status, therapy, and stage, symptom-caused detection showed no significant difference in overall and LC-specific survival when compared with other (non-symptom-caused) detection.
Symptoms suggestive of LC were the most common reason for detection in early NSCLC. They were also associated with tumor size at diagnosis, suggesting that early stage LC patients are developing symptoms. Presence of symptoms in early stages did not compromise survival. A symptom-based alerting system or guidelines may be worthy of further study to benefit individuals at high risk of NSCLC.
Genome-wide association studies of white persons with lung cancer have identified a region of extensive linkage disequilibrium on chromosome 15q25.1 that appears to be associated with both risk for lung cancer and smoking dependence. Because studying African American persons, who exhibit lower levels of linkage disequilibrium in this region, may identify additional loci that are associated with lung cancer, we genotyped 34 single-nucleotide polymorphisms (SNPs) in this region (including LOC123688, PSMA4, CHRNA5, CHRNA3, and CHRNB4 genes) in 467 African American patients with lung cancer and 388 frequency-matched African American control subjects. Associations of SNPs in LOC123688 (rs10519203; odds ratio [OR] = 1.60, 95% confidence interval [CI] = 1.25 to 2.05, P = .00016), CHRNA5 (rs2036527; OR = 1.67, 95% CI = 1.26 to 2.21, P = .00031), and CHRNA3 (rs1051730; OR = 1.81, 95% CI = 1.26 to 2.59, P = .00137) genes with lung cancer risk reached Bonferroni-corrected levels of statistical significance (all statistical tests were two-sided). Joint logistic regression analysis showed that rs684513 (OR = 0.47, 95% CI = 0.31 to 0.71, P = .0003) in CHRNA5 and rs8034191 (OR = 1.76, 95% CI = 1.23 to 2.52, P = .002) in LOC123688 were also associated with risk. The functional A variant of rs16969968 in CHRNA5 had the strongest association with lung cancer (OR = 1.98, 95% CI = 1.25 to 3.11, P = .003). These SNPs were primarily associated with increased risk for lung adenocarcinoma histology and were only weakly associated with smoking phenotypes. Thus, among African American persons, multiple loci in the region of chromosome 15q25.1 appear to be strongly associated with lung cancer risk.
The genetic control of prostate cancer development is poorly understood. Large numbers of gene-expression datasets on different aspects of prostate tumorigenesis are available. We used these data to identify and prioritize candidate genes associated with the development of prostate cancer and bone metastases. Our working hypothesis was that combining meta-analyses of different but overlapping steps of prostate tumorigenesis would improve identification of genes associated with prostate cancer development.
A Z score-based meta-analysis of gene-expression data was used to identify candidate genes associated with prostate cancer development. To put together different datasets, we conducted a meta-analysis on 3 levels that follow the natural history of prostate cancer development. For experimental verification of candidates, we used in silico validation as well as in-house gene-expression data.
Genes with experimental evidence of an association with prostate cancer development were overrepresented among our top candidates. The meta-analysis also identified a considerable number of novel candidate genes with no published evidence of a role in prostate cancer development. Functional annotation identified cytoskeleton, cell adhesion, extracellular matrix, and cell motility as the top functions associated with prostate cancer development. We identified 10 genes--CDC2, CCNA2, IGF1, EGR1, SRF, CTGF, CCL2, CAV1, SMAD4, and AURKA--that form hubs of the interaction network and therefore are likely to be primary drivers of prostate cancer development.
By using this large 3-level meta-analysis of the gene-expression data to identify candidate genes associated with prostate cancer development, we have generated a list of candidate genes that may be a useful resource for researchers studying the molecular mechanisms underlying prostate cancer development.
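The abstract above does not give the exact formula used to combine Z scores across datasets. One common choice is Stouffer's method, sketched below as an illustration rather than as the authors' procedure; the function names, the optional weights, and the example values are all hypothetical.

```python
import math
from statistics import NormalDist

def stouffer_z(z_scores, weights=None):
    """Combine per-dataset Z scores for one gene with Stouffer's method.

    With no weights, this is sum(z) / sqrt(k). Weights (for example the
    square root of each dataset's sample size) are an optional assumption.
    """
    if not z_scores:
        raise ValueError("at least one Z score is required")
    if weights is None:
        weights = [1.0] * len(z_scores)
    if len(weights) != len(z_scores):
        raise ValueError("weights and z_scores must have the same length")
    numerator = sum(w * z for w, z in zip(weights, z_scores))
    denominator = math.sqrt(sum(w * w for w in weights))
    return numerator / denominator

def two_sided_p(z):
    """Two-sided P value for a standard normal Z score."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))

# Hypothetical Z scores for one gene across four datasets.
combined = stouffer_z([2.0, 2.0, 2.0, 2.0])
p_value = two_sided_p(combined)
```

Four datasets that each show a modest signal (Z = 2) combine to a much stronger one (Z = 4), which is the motivation for pooling evidence across datasets before ranking candidate genes.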
We genotyped individuals with primary biliary cirrhosis and unaffected controls for suggestive risk loci (genome-wide association P < 1 × 10−4) identified in a previous genome-wide association study. Combined analysis of the genome-wide association and replication datasets identified IRF5-TNPO3 (combined P = 8.66 × 10−13), 7q12-21 (combined P = 3.50 × 10−13) and MMEL1 (combined P = 3.15 × 10−8) as new primary biliary cirrhosis susceptibility loci. Fine-mapping studies showed that a single variant accounts for the IRF5-TNPO3 association. As these loci are implicated in other autoimmune conditions, these findings confirm genetic overlap among such diseases.
Genome-wide association studies (GWASs) and global profiling of gene expression (microarrays) are two major technological breakthroughs that allow hypothesis-free identification of candidate genes associated with tumorigenesis. It is not obvious whether there is a consistency between the candidate genes identified by GWAS (GWAS genes) and those identified by profiling gene expression (microarray genes).
We used the Cancer Genetic Markers Susceptibility database to retrieve single nucleotide polymorphisms from candidate genes for prostate cancer. In addition, we conducted a large meta-analysis of gene expression data in normal prostate and prostate tumor tissue. We identified 13,905 genes that were interrogated by both GWASs and microarrays. On the basis of P values from GWASs, we selected the 1,649 most significantly associated genes for functional annotation by the Database for Annotation, Visualization and Integrated Discovery. We also conducted functional annotation analysis using the same number of top genes identified in the meta-analysis of the gene expression data. We found that genes involved in cell adhesion were overrepresented among both the GWAS and microarray genes.
These results suggest that combining GWAS and microarray data would be a more effective approach than analyzing individual datasets and can help to refine the identification of candidate genes and functions associated with tumor development.
The genetic mechanisms of prostate tumorigenesis remain poorly understood, but with the advent of gene expression array capabilities, we can now produce a large amount of data that can be used to explore the molecular and genetic mechanisms of prostate tumorigenesis.
We conducted a meta-analysis of gene expression data from 18 gene array datasets targeting the transition from normal to localized prostate cancer and from localized to metastatic prostate cancer. We functionally annotated the top 500 differentially expressed genes and identified several candidate pathways associated with prostate tumorigenesis.
We found the top differentially expressed genes to be clustered in pathways involving integrin-based cell adhesion: integrin signaling, the actin cytoskeleton, cell death, and cell motility pathways. We also found integrins themselves to be downregulated in the transition from normal prostate tissue to primary localized prostate cancer. Based on the results of this study, we developed a collagen hypothesis of prostate tumorigenesis. According to this hypothesis, the initiating event in prostate tumorigenesis is the age-related decrease in the expression of collagen genes and other genes encoding integrin ligands. This concomitant depletion of integrin ligands leads to the accumulation of ligandless integrin and activation of integrin-associated cell death. To escape integrin-associated death, cells suppress the expression of integrins, which in turn alters the actin cytoskeleton, elevates cell motility and proliferation, and disorganizes prostate histology, contributing to the histologic progression of prostate cancer and its increased metastasizing potential.
The results of this study suggest that prostate tumor progression is associated with the suppression of integrin-based cell adhesion. Suppression of integrin expression driven by integrin-mediated cell death leads to increased cell proliferation and motility and increased tumor malignancy.
To identify risk variants for lung cancer, we conducted a multistage genome-wide association study. In the discovery phase, we analyzed 315,450 tagging SNPs in 1,154 current and former (ever) smoking cases of European ancestry and 1,137 frequency-matched, ever-smoking controls from Houston, Texas. For replication, we evaluated the ten SNPs most significantly associated with lung cancer in an additional 711 cases and 632 controls from Texas and 2,013 cases and 3,062 controls from the UK. Two SNPs, rs1051730 and rs8034191, mapping to a region of strong linkage disequilibrium within 15q25.1 containing PSMA4 and the nicotinic acetylcholine receptor subunit genes CHRNA3 and CHRNA5, were significantly associated with risk in both replication sets. Combined analysis yielded odds ratios of 1.32 (P < 1 × 10−17) for both SNPs. Haplotype analysis was consistent with there being a single risk variant in this region. We conclude that variation in a region of 15q25.1 containing nicotinic acetylcholine receptor genes contributes to lung cancer risk.
Single nucleotide polymorphisms (SNPs) are the most common type of genetic variation in humans. However, the factors that affect SNP density are poorly understood. The goal of this study was to estimate the relative effects of mutability and selection on SNP density in transcribed regions of human genes. Such estimates are important for predicting the regions that harbor functional polymorphisms.
We used frequency-validated SNPs resulting from single-nucleotide substitutions. SNPs were subdivided into five functional categories: (i) 5' untranslated region (UTR) SNPs, (ii) 3' UTR SNPs, (iii) synonymous SNPs, (iv) SNPs producing conservative missense mutations, and (v) SNPs producing radical missense mutations. Each of these categories was further subdivided into nine mutational categories on the basis of the single-nucleotide substitution type. Thus, 45 functional/mutational categories were analyzed. The relative mutation rate in each mutational category was estimated on the basis of published data. The proportion of segregating sites (PSSs) for each functional/mutational category was estimated by dividing the observed number of SNPs by the number of potential sites in the genome for a given functional/mutational category. By analyzing each functional group separately, we found significant positive correlations between PSSs and relative mutation rates (Spearman's correlation coefficient, at least r = 0.96, df = 9, P < 0.001). We adjusted the PSSs for the mutation rate and found that the functional category had a significant effect on SNP density (F = 5.9, df = 4, P = 0.001), suggesting that selection affects SNP density in transcribed regions of the genome. We used analyses of variance and covariance to estimate the relative effects of selection (functional category) and mutability (relative mutation rate) on the PSSs and found that approximately 87% of variation in PSS was due to variation in the mutation rate and approximately 13% was due to selection, suggesting that the probability that a site located in a transcribed region of a gene is polymorphic mostly depends on the mutability of the site.
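The proportion of segregating sites and its rank correlation with the relative mutation rate, as described above, can be sketched as follows. The counts and mutation rates are hypothetical toy values, not the study's data, and the helper names are illustrative only.

```python
import math

def proportion_segregating(n_snps, n_potential_sites):
    """PSS: observed SNPs divided by potential sites in a category."""
    if n_potential_sites <= 0:
        raise ValueError("n_potential_sites must be positive")
    return n_snps / n_potential_sites

def average_ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result

def spearman(x, y):
    """Spearman's rank correlation (Pearson correlation of the ranks)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length of at least 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(vx * vy)

# Hypothetical toy values for four mutational categories.
mutation_rates = [1.0, 2.0, 3.0, 5.0]
snp_counts = [10, 25, 28, 60]
site_counts = [1000, 1000, 1000, 1000]
pss = [proportion_segregating(s, n) for s, n in zip(snp_counts, site_counts)]
rho = spearman(mutation_rates, pss)
```

In this toy case PSS rises monotonically with the mutation rate, so the rank correlation is 1; the study's adjustment for mutation rate and the analyses of variance and covariance are not reproduced here.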