In follow-up of a recent genome-wide association study (GWAS) that identified a locus in chromosome 2p21 associated with risk for renal cell carcinoma (RCC), we conducted a fine mapping analysis of a 120 kb region that includes EPAS1. We genotyped 59 tagged common single-nucleotide polymorphisms (SNPs) in 2278 RCC cases and 3719 controls of European background and observed a novel signal for rs9679290 [P = 5.75 × 10−8, per-allele odds ratio (OR) = 1.27, 95% confidence interval (CI): 1.17–1.39]. Imputation of common SNPs surrounding rs9679290 using HapMap 3 and 1000 Genomes data yielded two additional signals, rs4953346 (P = 4.09 × 10−14) and rs12617313 (P = 7.48 × 10−12), both highly correlated with rs9679290 (r2 > 0.95), but interestingly not correlated with the two SNPs reported in the GWAS: rs11894252 and rs7579899 (r2 < 0.1 with rs9679290). Genotype analysis of rs12617313 confirmed an association with RCC risk (P = 1.72 × 10−9, per-allele OR = 1.28, 95% CI: 1.18–1.39). In conclusion, we report that chromosome 2p21 harbors a complex genetic architecture for common RCC risk variants.
The question of which statistical approach is the most effective for investigating gene-environment (G-E) interactions in the context of genome-wide association studies (GWAS) remains unresolved. By using 2 case-control GWAS (the Nurses’ Health Study, 1976–2006, and the Health Professionals Follow-up Study, 1986–2006) of type 2 diabetes, the authors compared 5 tests for interactions: standard logistic regression-based case-control; case-only; semiparametric maximum-likelihood estimation of an empirical-Bayes shrinkage estimator; and 2-stage tests. The authors also compared 2 joint tests of genetic main effects and G-E interaction. Elevated body mass index was the exposure of interest and was modeled as a binary trait to avoid an inflated type I error rate that the authors observed when the main effect of continuous body mass index was misspecified. Although both the case-only and the semiparametric maximum-likelihood estimation approaches assume that the tested markers are independent of exposure in the general population, the authors did not observe any evidence of inflated type I error for these tests in their studies with 2,199 cases and 3,044 controls. Both joint tests detected markers with known marginal effects. Loci with the most significant G-E interactions using the standard, empirical-Bayes, and 2-stage tests were strongly correlated with the exposure among controls. Study findings suggest that methods exploiting G-E independence can be efficient and valid options for investigating G-E interactions in GWAS.
case-control studies; case study; diabetes mellitus, type 2; epidemiologic methods; genome-wide association study; genotype-environment interaction
Several methods for screening gene-environment interaction have recently been proposed that address the issue of using gene-environment independence in a data-adaptive way. In this report, the authors present a comparative simulation study of power and type I error properties of 3 classes of procedures: 1) the standard 1-step case-control method; 2) the case-only method that requires an assumption of gene-environment independence for the underlying population; and 3) a variety of hybrid methods, including empirical-Bayes, 2-step, and model averaging, that aim at gaining power by exploiting the assumption of gene-environment independence and yet can protect against false positives when the independence assumption is violated. These studies suggest that, although the case-only method generally has maximum power, it has the potential to create substantial false positives in large-scale studies even when a small fraction of markers are associated with the exposure under study in the underlying population. All the hybrid methods perform well in protecting against such false positives and yet can retain substantial power advantages over standard case-control tests. The authors conclude that, for future genome-wide scans for gene-environment interactions, major power gain is possible by using alternatives to standard case-control analysis. Whether a case-only type scan or one of the hybrid methods should be used depends on the strength and direction of gene-environment interaction and association, the level of tolerance for false positives, and the nature of replication strategies.
case-control studies; efficiency; familywise error rate; genome-wide association study; profile likelihood; robustness; shrinkage
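The contrast between the standard case-control test and the case-only test can be illustrated with a minimal sketch for a binary genotype and a binary exposure. This is not the authors' simulation code: the function names and the counts in the usage note are hypothetical, and the standard errors use the simple Woolf (sum of reciprocal cell counts) approximation.

```python
import math


def odds_ratio(n11, n10, n01, n00):
    """Odds ratio of a 2x2 table: (n11 * n00) / (n10 * n01)."""
    return (n11 * n00) / (n10 * n01)


def log_or_se(*cells):
    """Woolf standard error: sqrt of the sum of reciprocal cell counts."""
    return math.sqrt(sum(1.0 / c for c in cells))


def interaction_estimates(cases, controls):
    """Return (estimate, SE of log estimate) for two interaction estimators.

    cases / controls: dicts keyed by (g, e) with g, e in {0, 1}, holding counts.
    """
    ca = (cases[(1, 1)], cases[(1, 0)], cases[(0, 1)], cases[(0, 0)])
    co = (controls[(1, 1)], controls[(1, 0)], controls[(0, 1)], controls[(0, 0)])
    or_cases = odds_ratio(*ca)        # G-E odds ratio among cases
    or_controls = odds_ratio(*co)     # G-E odds ratio among controls
    return {
        # Case-only: assumes G and E are independent in the population,
        # so the control odds ratio is taken to be 1 and not estimated.
        "case_only": (or_cases, log_or_se(*ca)),
        # Case-control: ratio of the two odds ratios, using all eight cells.
        "case_control": (or_cases / or_controls, log_or_se(*ca, *co)),
    }


if __name__ == "__main__":
    # Hypothetical counts, chosen so that G and E are independent in controls.
    cases = {(1, 1): 120, (1, 0): 180, (0, 1): 200, (0, 0): 500}
    controls = {(1, 1): 80, (1, 0): 220, (0, 1): 200, (0, 0): 550}
    for name, (est, se) in interaction_estimates(cases, controls).items():
        print(name, round(est, 3), round(se, 3))
```

With these counts both estimators give the same interaction odds ratio, but the case-only standard error is smaller because it sums over four cells rather than eight. The price is that any gene-environment association in the population biases the case-only estimate, which is the false-positive risk the abstract describes.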
Over the past several years, genome-wide association studies (GWAS) have succeeded in identifying hundreds of genetic markers associated with common diseases. However, most of these markers confer relatively small increments of risk and explain only a small proportion of familial clustering. To identify obstacles to future progress in genetic epidemiology research and provide recommendations to NIH for overcoming these barriers, the National Cancer Institute sponsored a workshop entitled “Next Generation Analytic Tools for Large-Scale Genetic Epidemiology Studies of Complex Diseases” on September 15–16, 2010. The goal of the workshop was to facilitate discussions on (1) statistical strategies and methods to efficiently identify genetic and environmental factors contributing to the risk of complex disease; and (2) how to develop, apply, and evaluate these strategies for the design, analysis, and interpretation of large-scale complex disease association studies in order to guide NIH in setting the future agenda in this area of research. The workshop was organized as a series of short presentations covering scientific (gene-gene and gene-environment interaction, complex phenotypes, and rare variants and next generation sequencing) and methodological (simulation modeling and computational resources and data management) topic areas. Specific needs to advance the field were identified during each session and are summarized.
gene-gene interactions; gene-environment interactions; rare variants; next generation sequencing; complex phenotypes; simulations; computational resources
In an analysis of 31,717 cancer cases and 26,136 cancer-free controls drawn from 13 genome-wide association studies (GWAS), we observed large chromosomal abnormalities in a subset of clones from DNA obtained from blood or buccal samples. Mosaic chromosomal abnormalities, either aneuploidy or copy-neutral loss of heterozygosity, of size >2 Mb were observed in autosomes of 517 individuals (0.89%) with abnormal cell proportions between 7% and 95%. In cancer-free individuals, the frequency increased with age; 0.23% under 50 and 1.91% between 75 and 79 (p=4.8×10−8). Mosaic abnormalities were more frequent in individuals with solid-tumors (0.97% versus 0.74% in cancer-free individuals, OR=1.25, p=0.016), with a stronger association for cases who had DNA collected prior to diagnosis or treatment (OR=1.45, p=0.0005). Detectable clonal mosaicism was common in individuals for whom DNA was collected at least one year prior to diagnosis of leukemia compared to cancer-free individuals (OR=35.4, p=3.8×10−11). These findings underscore the importance of the role and time-dependent nature of somatic events in the etiology of cancer and other late-onset diseases.
We conducted a genome-wide association study (GWAS) of breast cancer by genotyping 528,173 single nucleotide polymorphisms (SNPs) in 1,145 cases of invasive breast cancer among postmenopausal white women, and 1,142 controls. We identified a set of four SNPs in intron 2 of FGFR2, a tyrosine kinase receptor previously shown to be amplified and/or over-expressed in some breast cancers, as highly associated with breast cancer and we confirmed this association in 1,776 cases and 2,072 controls from three additional studies. In both association testing and ancestral recombination graph analysis, FGFR2 haplotypes were associated with risk of breast cancer. Across the four studies the association with all four SNPs was highly statistically significant (Ptrend for the most strongly associated SNP, rs1219648 = 1.1 × 10−10; population attributable risk = 16%). Four SNPs at other chromosomal loci most strongly associated with breast cancer in the initial GWAS were not associated with risk in the three replication studies. Our summary results from the GWAS are freely available online in a form that should speed the identification of additional loci conferring risk.
We introduce an innovative multilocus test for disease association. It is an extension of an existing score test that gains power over alternative methods by incorporating a parsimonious one-degree-of-freedom model for interaction. We use our method in applications designed to detect interactions that generate hypotheses about the functionality of prostate cancer (PRCA) susceptibility regions.
Our proposed score test is designed to gain additional power through the use of a retrospective likelihood that exploits an assumption of independence between unlinked loci in the underlying population. Its performance is validated through simulation. The method is used in conditional scans with data from stage II of the Cancer Genetic Markers of Susceptibility PRCA genome-wide association study.
Our proposed method increases power to detect susceptibility loci in diverse settings. It identified two high-ranking, biologically interesting interactions: (1) rs748120 of NR2C2 and subregions of 8q24 that contain independent susceptibility loci specific to PRCA and (2) rs4810671 of SULF2 and both JAZF1 and HNF1B that are associated with PRCA and type 2 diabetes.
Our score test is a promising multilocus tool for genetic epidemiology. The results of our applications suggest functionality for poorly understood PRCA susceptibility regions. They motivate replication studies.
Gene-gene interaction; Score test; Prostate cancer
Genome-wide and candidate-gene association studies of bladder cancer have identified 10 susceptibility loci thus far. We conducted a meta-analysis of two previously published genome-wide scans (4501 cases and 6076 controls of European background) and followed up the most significant association signals [17 single nucleotide polymorphisms (SNPs) in 10 genomic regions] in 1382 cases and 2201 controls from four studies. A combined analysis adjusted for study center, age, sex, and smoking status identified a novel susceptibility locus that mapped to a region of 18q12.3, marked by rs7238033 (P = 8.7 × 10–9; allelic odds ratio 1.20 with 95% CI: 1.13–1.28) and two highly correlated SNPs, rs10775480/rs10853535 (r2= 1.00; P = 8.9 × 10–9; allelic odds ratio 1.16 with 95% CI: 1.10–1.22). The signal localizes to the solute carrier family 14 member 1 gene, SLC14A1, a urea transporter that regulates cellular osmotic pressure. In the kidney, SLC14A1 regulates urine volume and concentration whereas in erythrocytes it determines the Kidd blood groups. Our findings suggest that genetic variation in SLC14A1 could provide new etiological insights into bladder carcinogenesis.
Genetic association studies, thus far, have focused on the analysis of individual main effects of SNP markers. Nonetheless, there is a clear need for modeling epistasis or gene-gene interactions to better understand the biologic basis of existing associations. Tree-based methods have been widely studied as tools for building prediction models based on complex variable interactions. An understanding of the power of such methods for the discovery of genetic associations in the presence of complex interactions is of great importance. Here, we systematically evaluate the power of three leading algorithms: random forests (RF), Monte Carlo logic regression (MCLR), and multifactor dimensionality reduction (MDR).
We use the algorithm-specific variable importance measures (VIMs) as statistics and employ permutation-based resampling to generate the null distribution and associated p values. The power of the three is assessed via simulation studies. Additionally, in a data analysis, we evaluate the associations between individual SNPs in pro-inflammatory and immunoregulatory genes and the risk of non-Hodgkin lymphoma.
The power of RF is highest in all simulation models, that of MCLR is similar to RF in half, and that of MDR is consistently the lowest.
Our study indicates that the power of RF VIMs is most reliable. However, in addition to tuning parameters, the power of RF is notably influenced by the type of variable (continuous vs. categorical) and the chosen VIM.
Genetic associations; Power; Random forests; SNP; Variable importance measure
A prospective study of diet and cancer has not been conducted in India; consequently, little is known regarding follow-up rates or the completeness and accuracy of cancer case ascertainment.
We assessed follow-up in the India Health Study (IHS; 4,671 participants aged 35–69 residing in New Delhi, Mumbai, or Trivandrum). We evaluated the impact of medical care access and relocation, re-contacted the IHS participants to estimate follow-up rates, and conducted separate studies of cancer cases to evaluate registry coverage (604 cases in Trivandrum) and the accuracy of self- and proxy-reporting (1600 cases in New Delhi and Trivandrum).
Over 97% of people reported seeing a doctor and 85% had lived in their current residence for over six years. The 2-year follow-up rate was 91% for Trivandrum and 53% for New Delhi. No cancer cases were missed among public institutions participating in the surveillance program in Trivandrum during 2003–04, but there are likely to be unmatched cases (ranging from 5 to 13% of total cases) from private hospitals in the Trivandrum registry, as there are no mandatory reporting requirements. Vital status was obtained for 36% of cancer cases in New Delhi as compared to 78% in Trivandrum after a period of 4 years.
A prospective cohort study of cancer may be feasible in some centers in India with active follow-up to supplement registry data. Inclusion of cancers diagnosed at private institutions, unique identifiers for individuals, and computerized medical information would likely improve cancer registries.
Cancer; end-point; follow-up; registry; prospective cohort; India
Next Generation Sequencing represents a powerful tool for detecting genetic variation associated with human disease. Because of the high cost of this technology, it is critical that we develop efficient study designs that consider the trade-off between the number of subjects (n) and the coverage depth (μ). How we divide our resources between the two can greatly impact study success, particularly in pilot studies. We propose a strategy for selecting the optimal combination of n and μ for studies aimed at detecting rare variants and for studies aimed at detecting associations between rare or uncommon variants and disease. For detecting rare variants, we find the optimal coverage depth to be between 2 and 8 reads when using the likelihood ratio test. For association studies, we find the strategy of sequencing all available subjects to be preferable. In deriving these combinations, we provide a detailed analysis describing the distribution of depth across a genome and the depth needed to identify a minor allele in an individual. The optimal coverage depth depends on the aims of the study, and the chosen depth can have a large impact on study success.
next generation sequencing; sequencing depth; study design; rare variants
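The depth needed to see a minor allele in one individual can be sketched under simple assumptions that are not taken from the abstract: read depth at a site is Poisson with mean μ, each read at a heterozygous site carries the minor allele with probability 1/2, and there is no sequencing error. Under these assumptions the number of minor-allele reads is Poisson with mean μ/2. The function name and the choice of `k` are hypothetical, and this is not the likelihood ratio test the authors use.

```python
import math


def prob_detect_het(mu, k=1):
    """P(at least k minor-allele reads) at a heterozygous site.

    Assumes Poisson(mu) depth and a 50/50 allele draw per read, so the
    minor-allele read count is Poisson(mu / 2). Sequencing error is ignored.
    """
    lam = mu / 2.0
    p_fewer_than_k = sum(
        math.exp(-lam) * lam ** j / math.factorial(j) for j in range(k)
    )
    return 1.0 - p_fewer_than_k


if __name__ == "__main__":
    # Detection probability for a few mean depths, requiring k = 2 reads.
    for mu in (2, 4, 8, 16):
        print(mu, round(prob_detect_het(mu, k=2), 3))
```

Because the detection probability saturates as μ grows, a fixed budget of reads is eventually better spent on more subjects than on more depth, which is the trade-off between n and μ that the abstract discusses.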
We report a genome-wide association study in 10,286 cases and 9,135 controls of European ancestry, in the Cancer Genetic Markers of Susceptibility (CGEMS) initiative, identifying a new association with prostate cancer risk on chromosome 8q24 (rs620861, p=1.3×10−10, heterozygote OR = 1.17, 95% CI 1.10 – 1.24; homozygote OR = 1.33, 95% CI 1.21 – 1.45). This defines a new prostate locus on 8q24, Region 4, previously associated with breast cancer.
Previous genome-wide association studies have identified two independent variants in HNF1B as susceptibility loci for prostate cancer risk. To fine-map common genetic variation in this region, we genotyped 79 single nucleotide polymorphisms (SNPs) in the 17q12 region harboring HNF1B in 10 272 prostate cancer cases and 9123 controls of European ancestry from 10 case–control studies as part of the Cancer Genetic Markers of Susceptibility (CGEMS) initiative. Ten SNPs were significantly related to prostate cancer risk at a genome-wide significance level of P < 5 × 10−8 with the most significant association with rs4430796 (P = 1.62 × 10−24). However, risk within this first locus was not entirely explained by rs4430796. Although modestly correlated (r2= 0.64), rs7405696 was also associated with risk (P = 9.35 × 10−23) even after adjustment for rs4430796 (P = 0.007). As expected, rs11649743 was related to prostate cancer risk (P = 3.54 × 10−8); however, the association within this second locus was stronger for rs4794758 (P = 4.95 × 10−10), which explained all of the risk observed with rs11649743 when both SNPs were included in the same model (P = 0.32 for rs11649743; P = 0.002 for rs4794758). Sequential conditional analyses indicated that five SNPs (rs4430796, rs7405696, rs4794758, rs1016990 and rs3094509) together comprise the best model for risk in this region. This study demonstrates a complex relationship between variants in the HNF1B region and prostate cancer risk. Further studies are needed to investigate the biological basis of the association of variants in 17q12 with prostate cancer.
While lung cancer is largely caused by tobacco smoking, inherited genetic factors play a role in its etiology. Genome-wide association studies (GWAS) in Europeans have robustly demonstrated only three polymorphic variations influencing lung cancer risk. Tumor heterogeneity may have hampered the detection of association signal when all lung cancer subtypes were analyzed together. In a GWAS of 5,355 European smoking lung cancer cases and 4,344 smoking controls, we conducted a pathway-based analysis in lung cancer histologic subtypes with 19,082 SNPs mapping to 917 genes in the HuGE-defined “inflammation” pathway. We identified a susceptibility locus for squamous cell lung carcinoma (SQ) at 12p13.33 (RAD52, rs6489769), and replicated the association in three independent samples totaling 3,359 SQ cases and 9,100 controls (odds ratio=1.20, Pcombined=2.3×10−8).
The combination of pathway-based approaches and information on disease specific subtypes can improve the identification of cancer susceptibility loci in heterogeneous diseases.
Lung cancer; histology; squamous cell carcinoma; pathway analysis; RAD52
The resampling-based test, which often relies on permutation or bootstrap procedures, has been widely used for statistical hypothesis testing when the asymptotic distribution of the test statistic is unavailable or unreliable. It requires repeated calculations of the test statistic on a large number of simulated data sets for its significance level assessment, and thus it could become very computationally intensive. Here, we propose an efficient p-value evaluation procedure by adapting the stochastic approximation Markov chain Monte Carlo algorithm. The new procedure can be used easily for estimating the p-value for any resampling-based test. We show through numeric simulations that the proposed procedure can be 100–500 000 times as efficient (in terms of computing time) as the standard resampling-based procedure when evaluating a test statistic with a small p-value (e.g. less than 10−6). With its computational burden reduced by this proposed procedure, the versatile resampling-based test would become computationally feasible for a much wider range of applications. We demonstrate the application of the new method by applying it to a large-scale genetic association study of prostate cancer.
Bootstrap procedures; Genetic association studies; p-value; Resampling-based tests; Stochastic approximation Markov chain Monte Carlo
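For context, the standard resampling-based procedure that the abstract uses as its baseline can be sketched as a plain permutation test. This is only the baseline, not the proposed stochastic approximation Markov chain Monte Carlo procedure. The test statistic (absolute difference in group means), the function names, and the default number of permutations are illustrative assumptions.

```python
import random


def mean(values):
    return sum(values) / len(values)


def permutation_p_value(x, y, n_perm=999, seed=0):
    """Two-sample permutation p-value for |mean(x) - mean(y)|.

    Uses the (hits + 1) / (n_perm + 1) estimator, so the result is never 0.
    """
    rng = random.Random(seed)
    observed = abs(mean(x) - mean(y))
    pooled = list(x) + list(y)
    n_x = len(x)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random relabeling of group membership
        stat = abs(mean(pooled[:n_x]) - mean(pooled[n_x:]))
        if stat >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    group_a = list(range(100, 120))
    group_b = list(range(0, 20))
    print(permutation_p_value(group_a, group_b))
```

The smallest value this estimator can return is 1 / (n_perm + 1), so resolving a p-value below 10−6 requires more than a million recomputations of the statistic. That cost is what motivates a more efficient evaluation procedure.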
The balance between Th1 and Th2 activity is critical in lymphoid cell development and differentiation. Immune dysfunction underlies lymphomagenesis, so an alteration in the regulation of key Th1/Th2 cytokines may lead to the development of non-Hodgkin lymphoma (NHL). To study the impact of polymorphism in Th1/Th2 cytokines on NHL risk, we analyzed 145 tag single nucleotide polymorphisms (SNPs) in 17 Th1/Th2 cytokine and related genes in three population-based case-control studies (1,946 cases and 1,808 controls). Logistic regression was used to compute odds ratios (OR) for NHL and four major NHL subtypes in relation to tag SNP genotypes and haplotypes. A gene-based analysis adjusting for the number of tag SNPs genotyped in each gene showed significant associations with risk of NHL combined and one or more NHL subtypes for Th1 (IL12A and IL12RB1) and Th2 (IL4, IL10RB, and IL18) genes. The strongest association was for rs485497 in IL12A, which plays a central role in bridging the cellular and humoral pathways of innate resistance and antigen-specific adaptive immune responses (allele risk OR=1.17; P(trend)=0.00099). This SNP was also associated specifically with risk of follicular lymphoma (allele risk OR=1.26; P(trend)=0.0012). These findings suggest that genetic variation in Th1/Th2 cytokine genes may contribute to lymphomagenesis.
Non-Hodgkin lymphoma; single nucleotide polymorphisms; immunogenetics; case-control study
Recent genome-wide association studies have identified independent susceptibility loci for prostate cancer (CaP) that could influence risk through interaction with other, possibly undetected, susceptibility loci. We explored evidence of interaction between pairs of 13 known susceptibility loci and single nucleotide polymorphisms (SNPs) across the genome to generate hypotheses about the functionality of CaP susceptibility regions. We used data from Cancer Genetic Markers of Susceptibility: Stage I included 523,841 SNPs in 1175 cases and 1100 controls; Stage II included 27,383 SNPs in an additional 3941 cases and 3964 controls. Power calculations assessed the magnitude of interactions our study is likely to detect. Logistic regression was used with alternative methods that exploit constraints of gene-gene independence between unlinked loci to increase power. Our empirical evaluation demonstrated that an empirical Bayes (EB) technique is powerful and robust to possible violation of the independence assumption. Our EB analysis identified several noteworthy interacting SNP pairs, although none reached genome-wide significance. We highlight a Stage II interaction between the major CaP susceptibility locus in the subregion of 8q24 that contains POU5F1B and an intronic SNP in the transcription factor EPAS1, which has potentially important functional implications for 8q24. Another noteworthy result involves interaction of a known CaP susceptibility marker near the prostate protease genes KLK2 and KLK3 with an intronic SNP in PRRX2. Overall, the interactions we have identified merit follow-up study, particularly the EPAS1 interaction which has implications not only in CaP but also in other epithelial cancers that are associated with the 8q24 locus.
The arylamine N-acetyltransferase 2 (NAT2) slow acetylation phenotype is an established risk factor for urinary bladder cancer. We previously reported on this risk association using NAT2 phenotypic categories inferred from NAT2 haplotypes based on 7 single nucleotide polymorphisms (SNPs) in a study in Spain. In a subsequent genome-wide scan, we have identified a single common tag SNP (rs1495741) located in the 3′ end of NAT2 that is also associated with bladder cancer risk. The aim of this report is to evaluate the agreement between the common tag SNP and the 7-SNP NAT2 inferred phenotype.
The agreement between the 7-SNP NAT2 inferred phenotype and the tag SNP, rs1495741, was initially assessed in 2,174 subjects from the Spanish Bladder Cancer Study (SBCS), and confirmed in a subset of subjects from the Maine and Vermont components of the New England Bladder Cancer Study (NEBCS). We also investigated the association of rs1495741 genotypes with NAT2 catalytic activity in cryopreserved hepatocytes from 154 individuals of European background.
We observed very strong agreement between rs1495741 and the 7-SNP inferred NAT2 phenotype: sensitivity and specificity for the NAT2 slow phenotype were 99% and 95%, respectively. Our findings were replicated in an independent population from the United States. Estimates for the association between NAT2 slow phenotype and bladder cancer risk in the SBCS and its interaction with cigarette smoking were comparable for the 7-SNP inferred NAT2 phenotype and rs1495741. In addition, rs1495741 genotypes were strongly related to NAT2 activity measured in hepatocytes (P<0.0001).
A novel NAT2 tag SNP (rs1495741) predicts with high accuracy the 7-SNP inferred NAT2 phenotype, and thus can be used as a sole marker in pharmacogenetic or epidemiological studies of populations of European background. These findings illustrate the utility of tag SNPs, often employed in genome-wide association studies (GWAS), to identify novel phenotypic markers. Further studies are required to determine the functional implications of this novel SNP and the structure and evolution of the haplotype on which it resides.
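Agreement measures of the kind reported for the tag SNP can be computed from a 2x2 cross-classification, treating the 7-SNP inferred phenotype as the reference and the tag SNP call as the test. The sketch below is illustrative only: the function name is hypothetical and the counts are invented to show the arithmetic, not taken from the SBCS or NEBCS.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Sensitivity and specificity of a test against a reference standard.

    tp: reference slow, test slow      fn: reference slow, test not slow
    tn: reference not slow, test not   fp: reference not slow, test slow
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity


if __name__ == "__main__":
    # Hypothetical counts, not study data.
    sens, spec = sensitivity_specificity(tp=990, fn=10, tn=950, fp=50)
    print(f"sensitivity={sens:.2%} specificity={spec:.2%}")
```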
Genetic variation in immune-related genes may play a role in the development of non-Hodgkin lymphoma (NHL). To test the hypothesis that innate immunity polymorphisms may be associated with NHL risk, we genotyped 144 tag single nucleotide polymorphisms (tagSNPs) capturing common genetic variation within 12 innate immunity gene regions in three independent population-based case-control studies (1946 cases and 1808 controls). Gene-based analyses found IL1RN to be associated with NHL risk (minP = 0.03); specifically, IL1RN rs2637988 was associated with an increased risk of NHL (per-allele odds ratio = 1.15, 95% confidence interval = 1.05 – 1.27; ptrend = 0.003), which was consistent across study, subtype, and gender. FCGR2A was also associated with a decreased risk of the follicular lymphoma NHL subtype (minP = 0.03). Our findings suggest that genetic variation in IL1RN and FCGR2A may play a role in lymphomagenesis. Given that conflicting results have been reported regarding the association between IL1RN SNPs and NHL risk, a larger number of innate immunity genes with sufficient genomic coverage should be evaluated systematically across many studies.
non-Hodgkin lymphoma; immune; innate immunity; genetic variation; single nucleotide polymorphisms
When employing model selection methods with oracle properties such as the smoothly clipped absolute deviation (SCAD) and the Adaptive Lasso, it is typical to estimate the smoothing parameter by m-fold cross-validation, for example, m = 10. In problems where the true regression function is sparse and the signals large, such cross-validation typically works well. However, in regression modeling of genomic studies involving Single Nucleotide Polymorphisms (SNPs), the true regression functions, while thought to be sparse, do not have large signals. We demonstrate empirically that in such problems, the number of selected variables using SCAD and the Adaptive Lasso, with 10-fold cross-validation, is a random variable that has considerable and surprising variation. Similar remarks apply to non-oracle methods such as the Lasso. Our study strongly questions the suitability of performing only a single run of m-fold cross-validation with any oracle method, and not just the SCAD and Adaptive Lasso.
Adaptive Lasso; Lasso; Model selection; Oracle estimation
Pathway analysis of genome-wide association studies (GWAS) offers a unique opportunity to collectively evaluate genetic variants with effects that are too small to be detected individually. We applied a pathway analysis to a bladder cancer GWAS containing data from 3,532 cases and 5,120 controls of European background (n = 5 studies). Thirteen hundred and ninety-nine pathways were drawn from five publicly available resources (Biocarta, Kegg, NCI-PID, HumanCyc, and Reactome), and we constructed 22 additional candidate pathways previously hypothesized to be related to bladder cancer. In total, 1421 pathways, 5647 genes and ∼90,000 SNPs were included in our study. A logistic regression model adjusting for age, sex, study, DNA source, and smoking status was used to assess the marginal trend effect of SNPs on bladder cancer risk. Two complementary pathway-based methods (gene-set enrichment analysis [GSEA], and adapted rank-truncated product [ARTP]) were used to assess the enrichment of association signals within each pathway. Eighteen pathways were detected by either GSEA or ARTP at P≤0.01. To minimize false positives, we used the I2 statistic to identify SNPs displaying heterogeneous effects across the five studies. After removing these SNPs, seven pathways (‘Aromatic amine metabolism’ [PGSEA = 0.0100, PARTP = 0.0020], ‘NAD biosynthesis’ [PGSEA = 0.0018, PARTP = 0.0086], ‘NAD salvage’ [PARTP = 0.0068], ‘Clathrin derived vesicle budding’ [PARTP = 0.0018], ‘Lysosome vesicle biogenesis’ [PGSEA = 0.0023, PARTP<0.00012], ‘Retrograde neurotrophin signaling’ [PGSEA = 0.00840], and ‘Mitotic metaphase/anaphase transition’ [PGSEA = 0.0040]) remained. These pathways seem to belong to three fundamental cellular processes (metabolic detoxification, mitosis, and clathrin-mediated vesicles). Identification of the aromatic amine metabolism pathway provides support for the ability of this approach to identify pathways with established relevance to bladder carcinogenesis.
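The I2 heterogeneity statistic used to flag SNPs with inconsistent effects across studies can be sketched from per-study effect estimates. The version below uses the common definition based on Cochran's Q with inverse-variance weights. The function name and the example values are hypothetical, and the abstract does not state which I2 cutoff was applied, so none is assumed here.

```python
def i_squared(betas, ses):
    """I^2 = max(0, (Q - df) / Q), with Q computed from inverse-variance weights.

    betas: per-study log odds ratios; ses: their standard errors.
    Returns a fraction in [0, 1); multiply by 100 for a percentage.
    """
    weights = [1.0 / se ** 2 for se in ses]
    pooled = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, betas))
    df = len(betas) - 1
    if q <= 0.0:
        return 0.0  # identical estimates: no heterogeneity
    return max(0.0, (q - df) / q)


if __name__ == "__main__":
    # Hypothetical per-study estimates for one SNP across five studies.
    print(i_squared([0.5, -0.5, 0.5, -0.5, 0.5], [0.1] * 5))
    print(i_squared([0.10, 0.12, 0.09, 0.11, 0.10], [0.1] * 5))
```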
Telomeres cap chromosome ends and are critical for genomic stability. Many telomere-associated proteins are important for telomere length maintenance. Recent genome-wide association studies (GWAS) have identified single nucleotide polymorphisms (SNPs) in genes encoding telomere-associated proteins (RTEL1 and TERT-CLPTM1L) as markers of cancer risk. We conducted an association study of telomere length and 743 SNPs in 43 telomere biology genes. Telomere length in peripheral blood DNA was determined by Q-PCR in 3,646 participants from the Prostate, Lung, Colorectal, and Ovarian Cancer Screening Trial and Nurses' Health Study. We investigated associations by SNP, gene, and pathway (functional group). We found no associations between telomere length and SNPs in TERT-CLPTM1L or RTEL1. Telomere length was not significantly associated with specific functional groups. Thirteen SNPs from four genes (MEN1, MRE11A, RECQL5, and TNKS) were significantly associated with telomere length. The strongest findings were in MEN1 (Gene-based P=0.006), menin, which associates with the telomerase promoter and may negatively regulate telomerase. This large association study did not find strong associations with telomere length. The combination of limited diversity and evolutionary conservation suggests that these genes may be under selective pressure. More work is needed to explore the role of genetic variants in telomere length regulation.
Telomere length; single-nucleotide polymorphism; SNP; telomere biology; epidemiology