DNA repair pathway genes play an important role in maintaining genomic integrity and protecting against cancer development. This study aimed to identify novel SNPs in the DNA repair–related genes associated with melanoma risk from a genome-wide association study (GWAS).
A total of 8,422 SNPs from the 165 DNA repair–related genes were extracted from a GWAS of melanoma risk, including 494 cases and 5,628 controls from the Nurses’ Health Study (NHS) and the Health Professionals Follow-up Study (HPFS). We further replicated the top SNPs in a GWAS of melanoma risk from the MD Anderson Cancer Center (1,804 cases and 1,026 controls).
A total of 3 SNPs with P < 0.001 were selected for in silico replication. One SNP was replicated: rs3902093 [A] in the EXO1 promoter region (Pdiscovery = 6.6 × 10−4, Preplication = 0.039, Pjoint = 2.5 × 10−4; ORjoint = 0.80, 95% CI: 0.71–0.90). This SNP was associated with EXO1 expression; carriers of the A allele showed lower expression (P = 0.002).
Our study found that a promoter-region SNP in EXO1, a gene in the editing and processing nucleases group of DNA repair genes, was associated with decreased EXO1 expression and decreased melanoma risk. Further studies are warranted to validate this association and to investigate the potential mechanisms.
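As a rough consistency check on the joint estimate, the sketch below back-calculates an approximate Wald P value from the reported ORjoint and its 95% CI. It assumes the CI is symmetric on the log-odds scale, as is standard for logistic regression output; the abstract does not describe how the joint analysis was performed, so this is an illustration, not a reproduction.

```python
import math


def two_sided_p(z):
    """Two-sided P value for a standard normal statistic: 2 * (1 - Phi(|z|))."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def wald_from_or_ci(odds_ratio, ci_lower, ci_upper, z_crit=1.959964):
    """Approximate SE(log OR), z and P from an OR and its 95% CI.

    Assumes the CI was built as exp(log OR +/- z_crit * SE), i.e. symmetric on the log scale.
    """
    se = (math.log(ci_upper) - math.log(ci_lower)) / (2.0 * z_crit)
    z = math.log(odds_ratio) / se
    return se, z, two_sided_p(z)


# Joint estimate for rs3902093[A] reported above: OR = 0.80, 95% CI 0.71-0.90
se, z, p = wald_from_or_ci(0.80, 0.71, 0.90)
print(f"SE(log OR) = {se:.3f}, z = {z:.2f}, P = {p:.1e}")
```

With these inputs the approximate P is on the order of 2 × 10−4, in line with the reported Pjoint = 2.5 × 10−4; a small gap is expected because the published OR and CI are rounded.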
Susceptibility to primary biliary cirrhosis (PBC) is strongly associated with polymorphisms in the HLA region. To determine whether these associations can be explained by classical HLA determinants, we studied 676 Italian cases and 1440 controls genotyped for dense single nucleotide polymorphisms (SNPs), from which classical HLA alleles and amino acids were imputed. Although previous genome-wide association studies and our results show stronger SNP associations near DQB1, we demonstrate that the HLA signals can be attributed to the classical DRB1 and DPB1 genes. Our conditional analyses provide strong support for the predominant role of DRB1, and we also demonstrate an independent association of DPB1. Specific HLA-DRB1 alleles (*08, *11 and *14) account for most of the DRB1 association signal. Consistent with previous studies, DRB1*08 (p = 1.59 × 10−11) was the strongest predisposing allele, whereas DRB1*11 (p = 1.42 × 10−10) was protective. Additionally, DRB1*14 and the associated DPB1 allele (DPB1*03:01; p = 9.18 × 10−7) were predisposing risk alleles. No signal was observed in the HLA class I or class III regions. These findings better define the association of PBC with HLA and specifically support the role of classical HLA-DRB1 and DPB1 genes and alleles in susceptibility to PBC.
genetic risk; risk allele; imputation; antigen binding pocket; autoimmune disease
The detection of tumor suppressor gene promoter methylation in sputum-derived exfoliated cells predicts early lung cancer. Here we identified genetic determinants for this epigenetic process and examined their biological effects on gene regulation. A two-stage approach involving discovery and replication was employed to assess the association between promoter hypermethylation of a 12-gene panel and common variation in 40 genes involved in carcinogen metabolism, regulation of methylation, and DNA damage response in members of the Lovelace Smokers Cohort (n=1434). Molecular validation of three identified variants was conducted using primary bronchial epithelial cells. Association of study-wide significance (P<8.2×10−5) was identified for rs1641511, rs3730859, and rs1883264 in TP53, LIG1, and BIK, respectively. These SNPs were significantly associated with altered expression of the corresponding genes in primary bronchial epithelial cells. In addition, rs3730859 in LIG1 was also moderately associated with increased risk for lung cancer among Caucasian smokers. Together, our findings suggest that genetic variation in DNA replication and apoptosis pathways impacts the propensity for gene promoter hypermethylation in the aerodigestive tract of smokers. The incorporation of genetic biomarkers for gene promoter hypermethylation with clinical and somatic markers may improve risk assessment models for lung cancer.
DNA damage response; promoter hypermethylation; single nucleotide polymorphism; sputum; smoker
Over the past several years, genome-wide association studies (GWAS) have succeeded in identifying hundreds of genetic markers associated with common diseases. However, most of these markers confer relatively small increments of risk and explain only a small proportion of familial clustering. To identify obstacles to future progress in genetic epidemiology research and provide recommendations to NIH for overcoming these barriers, the National Cancer Institute sponsored a workshop entitled “Next Generation Analytic Tools for Large-Scale Genetic Epidemiology Studies of Complex Diseases” on September 15–16, 2010. The goal of the workshop was to facilitate discussions on (1) statistical strategies and methods to efficiently identify genetic and environmental factors contributing to the risk of complex disease; and (2) how to develop, apply, and evaluate these strategies for the design, analysis, and interpretation of large-scale complex disease association studies in order to guide NIH in setting the future agenda in this area of research. The workshop was organized as a series of short presentations covering scientific (gene-gene and gene-environment interaction, complex phenotypes, and rare variants and next generation sequencing) and methodological (simulation modeling and computational resources and data management) topic areas. Specific needs to advance the field were identified during each session and are summarized.
gene-gene interactions; gene-environment interactions; rare variants; next generation sequencing; complex phenotypes; simulations; computational resources
Recent data showed that melanoma was more common among patients with Parkinson’s disease (PD) than individuals without PD and vice versa. It has been hypothesized that these two diseases may share common genetic and environmental risk factors.
We evaluated the association between single-nucleotide polymorphisms (SNPs) selected based on recent genome-wide association studies (GWAS) on PD risk and the risk of melanoma using 2,297 melanoma cases and 6,651 controls.
The PD SNP rs156429 in the chromosome 7p15 region was nominally associated with melanoma risk (p = 0.04), but this association was not significant after Bonferroni correction for multiple comparisons. No association was observed between the remaining 31 PD SNPs and melanoma risk. The genetic score based on the number of PD risk alleles was not associated with melanoma risk (odds ratio for the highest genetic score quartile, 30–35, vs. the lowest, 15–20: 1.13; 95% confidence interval, 0.47–2.70).
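The multiple-testing and scoring steps above can be sketched in a few lines of Python. The sketch assumes a family-wise alpha of 0.05 over the 32 PD SNPs (rs156429 plus the other 31) and an unweighted score that simply counts risk alleles, which is how the abstract describes the score; the quartile cut points come from the study data and are not recomputed here.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level that keeps the family-wise error rate at alpha."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


def genetic_score(risk_allele_counts):
    """Unweighted genetic score: total PD risk alleles carried across SNPs (0, 1 or 2 each)."""
    if any(c not in (0, 1, 2) for c in risk_allele_counts):
        raise ValueError("each SNP contributes 0, 1 or 2 risk alleles")
    return sum(risk_allele_counts)


n_pd_snps = 32  # rs156429 plus the remaining 31 PD SNPs
threshold = bonferroni_threshold(0.05, n_pd_snps)  # assumes a family-wise alpha of 0.05
print(f"Bonferroni threshold: {threshold:.5f}")  # 0.05 / 32 = 0.0015625
print("rs156429 (p = 0.04) significant:", 0.04 < threshold)  # False

# Hypothetical individual carrying 2, 1, 0 and 1 risk alleles at four of the SNPs
print(genetic_score([2, 1, 0, 1]))  # 4
```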
The PD SNPs identified in published GWAS do not appear to play an important role in melanoma development.
The PD susceptibility loci discovered by GWAS contribute little to the observed epidemiological association between PD and melanoma.
We performed a multistage genome-wide association study of melanoma. In a discovery cohort of 1804 melanoma cases and 1026 controls, we identified loci in the chromosome 15q13.1 (HERC2/OCA2) and 16q24.3 (MC1R) regions that reached genome-wide significance within this study, and we also found strong evidence for genetic effects on melanoma susceptibility from markers on chromosome 9p21.3 in the p16/ARF region and on chromosome 1q21.3 (ARNT/LASS2/ANXA9 region). The most significant single-nucleotide polymorphisms (SNPs) in the 15q13.1 locus (rs1129038 and rs12913832) lie within a genomic region that has profound effects on eye and skin color; notably, 50% of the variability in eye color is associated with variation at rs12913832. Because eye and skin colors vary across European populations, we further evaluated the associations of the significant SNPs after carefully adjusting for European population substructure. We also evaluated the 10 most significant SNPs using data from three other genome-wide scans. Additional in silico data replicated the finding for rs7412746 in the most significant region, on chromosome 1q21.3 (P = 6 × 10−10). Together, these data identify several candidate genes for additional studies to find causal variants predisposing to increased risk of melanoma.
Soft tissue sarcomas (STS) are heterogeneous mesenchymal tumors with diverse subtypes. STS can be classified into two main categories according to the type of genomic alteration: STS driven by recurrent translocations and STS without recurrent translocations. However, little is known about acquired uniparental disomy in STS.
In this study, we analyzed SNP microarray data with the CNAG software and R to determine the frequency and distribution patterns of acquired uniparental disomy (aUPD) in major STS subtypes.
We identified recurrent aUPD regions specific to individual subtypes; the most frequent were at 11p15.4 in alveolar rhabdomyosarcoma, 1p36.11-p35.3 in gastrointestinal stromal tumor, 17p13.3-p13.1 in leiomyosarcoma, 1p35.1-p34.2 and 16q23.3-q24.1 in myxofibrosarcoma, and 13q13.2-q13.3 and 13q14.11-q14.2 in pleomorphic liposarcoma. In contrast, no specific recurrent aUPD regions were identified in dedifferentiated liposarcoma, Ewing sarcoma, myxoid/round cell liposarcoma, or synovial sarcoma. Strikingly, total, centromeric and segmental aUPD regions were more frequent in STS that do not exhibit recurrent translocation events.
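Acquired UPD appears in SNP-array data as loss of heterozygosity without a change in copy number (copy-neutral LOH). The sketch below flags candidate segments as long runs of homozygous calls whose copy-number estimate stays near two. It is a simplified illustration under stated assumptions, not the CNAG algorithm: the minimum run length (`min_snps`) and copy-number tolerance (`cn_tol`) are hypothetical parameters, and real analyses compare the tumor against a matched normal or reference panel, because long homozygous runs also occur in germline DNA.

```python
from typing import List, Tuple


def copy_neutral_loh_segments(is_het: List[bool], copy_number: List[float],
                              min_snps: int = 50, cn_tol: float = 0.3) -> List[Tuple[int, int]]:
    """Return (start, end) SNP index ranges (end exclusive) of candidate aUPD segments.

    A candidate is a run of at least `min_snps` consecutive SNPs that are all homozygous
    and whose copy-number estimate stays within `cn_tol` of 2 (copy-neutral LOH).
    Both thresholds are hypothetical defaults for illustration.
    """
    if len(is_het) != len(copy_number):
        raise ValueError("genotype and copy-number vectors must be the same length")
    segments = []
    start = None
    for i, (het, cn) in enumerate(zip(is_het, copy_number)):
        in_run = (not het) and abs(cn - 2.0) <= cn_tol
        if in_run and start is None:
            start = i  # a homozygous, copy-neutral run begins
        elif not in_run and start is not None:
            if i - start >= min_snps:
                segments.append((start, i))
            start = None
    # Close a run that extends to the last SNP
    if start is not None and len(is_het) - start >= min_snps:
        segments.append((start, len(is_het)))
    return segments


# Toy example: 10 heterozygous SNPs followed by 55 homozygous SNPs, all at copy number 2
print(copy_neutral_loh_segments([True] * 10 + [False] * 55, [2.0] * 65))  # [(10, 65)]
```

A homozygous run with a copy number near one would instead indicate deletion-type LOH and is deliberately not reported.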
Our study yields a detailed map of aUPD across 9 diverse STS subtypes and suggests the potential location of several novel tumor suppressor genes and oncogenes.
Acquired uniparental disomy; Soft tissue sarcoma; whole-genome
Tumor size at diagnosis (TSD) indirectly reflects tumor growth rate. The relationship between TSD and smoking is poorly understood. The aim of the study was to determine the relationship between smoking and TSD. We reviewed the electronic medical records of 1712 newly diagnosed and previously untreated non-small cell lung cancer (NSCLC) patients and collected tumor characteristics. Demographic and epidemiologic characteristics were derived from questionnaires administered during personal interviews. Univariate and multivariate linear regression models were used to evaluate the relationship between TSD and smoking, controlling for demographic and clinical factors. We also investigated the relationship between TSD and the rs1051730 SNP in the CHRNA3 gene (the polymorphism most significantly associated with lung cancer risk and smoking behavior). We found a strong dose-dependent relationship between TSD and smoking: current smokers had the largest TSD, never smokers the smallest, and former smokers an intermediate TSD. In the multivariate linear regression model, smoking status (never, former, and current), histological type (adenocarcinoma vs. squamous cell carcinoma [SqCC]), and gender were significant predictors of TSD. Smoking duration and intensity may explain the gender effect in predicting TSD. We found that the variant allele of rs1051730 in the CHRNA3 gene was associated with larger TSD in squamous cell carcinoma. In the multivariate linear regression model, both rs1051730 and smoking were significant predictors of the size of squamous cell carcinomas. We conclude that smoking is positively associated with lung tumor size at the time of diagnosis.
Lung cancer; tumor size; epidemiologic characteristics; risk factors; CHRNA3
Genetic variants located at 15q25, including those in the cholinergic nicotinic receptor gene cluster that contains CHRNA5, have been implicated in both lung cancer risk and nicotine dependence in recent genome-wide association studies. Among these variants, a 22-base-pair insertion/deletion, rs3841324, showed the strongest association with CHRNA5 mRNA expression levels. However, the influence of rs3841324 on lung cancer risk has not been studied in depth.
We have therefore evaluated the association of rs3841324 genotypes with lung cancer risk in a case-control study of 624 Caucasian subjects with lung cancer and 766 age- and sex-matched cancer-free Caucasian controls. We also evaluated the joint effects of rs3841324 with single-nucleotide polymorphisms (SNPs) rs16969968 and rs8034191 in the 15q25 region that have been consistently implicated in lung cancer risk.
We found that the homozygous genotype with both short alleles (SS) of rs3841324 was associated with a decreased lung cancer risk in female ever smokers relative to the homozygous wild-type (LL) and heterozygous (LS) genotypes combined in a recessive model (OR adjusted = 0.55, 95% CI = 0.31–0.89, P = 0.0168). There was no evidence for a sex difference in the association between this variant and cigarettes smoked per day (CPD). Diplotype analysis of rs3841324 with either rs16969968 or rs8034191 showed that these polymorphisms influenced the lung cancer risk independently.
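The recessive model used above compares SS carriers with LL and LS carriers combined. A minimal sketch of that coding follows, together with a crude odds ratio and Woolf (log-scale) 95% CI from a 2×2 table. The published OR of 0.55 is adjusted through logistic regression, so a crude 2×2 calculation would not reproduce it; the counts in the usage line are hypothetical.

```python
import math


def recessive_code(genotype):
    """Recessive coding for rs3841324: 1 for SS (two short alleles), 0 for LL or LS."""
    if genotype not in ("LL", "LS", "SS"):
        raise ValueError(f"unexpected genotype: {genotype!r}")
    return 1 if genotype == "SS" else 0


def odds_ratio_woolf(a, b, c, d, z_crit=1.959964):
    """Crude OR and Woolf 95% CI from a 2x2 table.

    a: exposed cases, b: exposed controls, c: unexposed cases, d: unexposed controls.
    All cells must be positive.
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all 2x2 cells must be positive")
    or_ = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(or_) - z_crit * se_log_or)
    upper = math.exp(math.log(or_) + z_crit * se_log_or)
    return or_, lower, upper


print([recessive_code(g) for g in ("LL", "LS", "SS")])  # [0, 0, 1]
# Hypothetical counts: SS vs (LL + LS) among cases and controls
print(odds_ratio_woolf(10, 20, 40, 40))
```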
Conclusions and impact
This study has shown a sex difference in the association between the 15q25 variant rs3841324 and lung cancer risk. Further research is warranted to elucidate the mechanisms underlying these observations.
lung cancer; CHRNA5; Chromosome 15q25; rs3841324; sex-specific association
Genetic researchers often collect disease-related quantitative traits in addition to disease status because they are interested in understanding the pathophysiology of disease processes. In genome-wide association (GWA) studies, these quantitative phenotypes may be relevant to disease development and serve as intermediate phenotypes, or they may be behavioral or other risk factors that predict disease risk. Statistical tests that combine disease status and quantitative risk factors should be more powerful than case-control analyses alone, because they incorporate more information about the disease. In this paper, we propose a modified inverse-variance weighted meta-analysis method to combine disease status and quantitative intermediate phenotype information. The simulation results showed that when an intermediate phenotype was available, the inverse-variance weighted method had more power than a case-control study of complex diseases, especially for identifying susceptibility loci with minor effects. We further applied this modified meta-analysis to imputed lung cancer genotypes with smoking data in 1154 cases and 1137 matched controls. The most significant SNPs came from the CHRNA3-CHRNA5-CHRNB4 region on chromosome 15q24–25.1, which has been replicated in many other studies. Our results confirm that this CHRNA region is associated with both lung cancer development and smoking behavior. We also detected three significant SNPs, rs1800469, rs1982072, and rs2241714, in the promoter region of the TGFB1 gene on chromosome 19 (p = 1.46 × 10−5, 1.18 × 10−5, and 6.57 × 10−6, respectively). The SNP rs1800469 has been reported to be associated with chronic obstructive pulmonary disease and lung cancer in cigarette smokers, and the present study is the first GWA study to replicate this result. Signals in the 3q26 region were also identified in the meta-analysis.
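For reference, the textbook fixed-effect inverse-variance weighted combination on which such methods build is sketched below: each estimate is weighted by 1/SE². The abstract does not spell out how the authors modified it to combine a case-control effect with a quantitative-phenotype effect (for example, how the two estimates are put on a comparable scale or how their correlation is handled), so those modifications are not reproduced; the function assumes its inputs are already on a common scale.

```python
import math


def inverse_variance_pool(estimates, std_errors):
    """Fixed-effect inverse-variance weighted pooling of estimates on a common scale.

    Returns the pooled estimate, its standard error, the z statistic and a two-sided P value.
    """
    if not estimates or len(estimates) != len(std_errors):
        raise ValueError("need matching, non-empty lists of estimates and standard errors")
    weights = [1.0 / se ** 2 for se in std_errors]
    total_weight = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, estimates)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    z = pooled / pooled_se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal P value
    return pooled, pooled_se, z, p


# Hypothetical inputs: two estimates with equal precision pool to their mean, 0.3
print(inverse_variance_pool([0.2, 0.4], [0.1, 0.1]))
```

Because the pooled variance is the reciprocal of the summed weights, adding an informative second source always shrinks the standard error, which is the intuition behind the power gain described above.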
We demonstrate that an intermediate phenotype can potentially enhance the power of complex disease association analysis, and that the modified meta-analysis method can robustly incorporate an intermediate phenotype or other quantitative risk factor into the analysis.
A mediation model explores the direct and indirect effects of an independent variable on a dependent variable by including other variables (mediators). Mediation analysis has recently been used to dissect the direct and indirect effects of genetic variants on complex diseases using case-control studies. However, estimates of the genetic variant–mediator association can be biased, because participants are sampled on disease status rather than on the mediator, so the presence or absence of the mediator in the study sample does not reflect its distribution in the source population. Mediation analysis using case-control data may therefore lead to biased estimates of coefficients and indirect effects. In this article, we investigated a multiple-mediation model involving a three-path mediating effect through two mediators using case-control study data. We propose an approach to correct bias in the coefficients and provide accurate estimates of the specific indirect effects. Our approach can also be used when the original case-control study is frequency matched on one of the mediators. We used bootstrapping to assess the significance of indirect effects. In simulation studies, the proposed approach provided more accurate estimates of the indirect effects and of the percent mediated than standard regression. We then applied this approach to study the mediating effects of both smoking and chronic obstructive pulmonary disease (COPD) on the association between the CHRNA5-A3 gene locus and lung cancer risk, using data from a lung cancer case-control study. The results showed that the genetic variant influences lung cancer risk indirectly through all three pathways. The percent of the genetic association mediated was 18.3% through smoking alone, 30.2% through COPD alone, and 20.6% through the path including both smoking and COPD; the total genetic variant–lung cancer association explained by the two mediators was 69.1%.
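The three pathways described above correspond to a serial two-mediator model: genetic variant G → smoking (M1) → COPD (M2) → lung cancer (Y). Under a linear approximation, each specific indirect effect is a product of path coefficients, and the percent mediated is its share of the total effect. The sketch below shows only this decomposition with hypothetical coefficients; it does not implement the authors' correction for case-control sampling, a logistic outcome model, or the bootstrap. As a consistency check on the reported figures, 18.3% + 30.2% + 20.6% = 69.1%, the stated total.

```python
from dataclasses import dataclass


@dataclass
class SerialMediationPaths:
    """Path coefficients of a linear serial two-mediator model (hypothetical values)."""
    a1: float       # G -> M1
    a2: float       # G -> M2, adjusting for M1
    d21: float      # M1 -> M2, adjusting for G
    b1: float       # M1 -> Y, adjusting for G and M2
    b2: float       # M2 -> Y, adjusting for G and M1
    c_prime: float  # direct effect G -> Y, adjusting for M1 and M2


def decompose(paths: SerialMediationPaths):
    """Specific indirect effects, total effect and percent mediated by each path."""
    specific = {
        "M1 only": paths.a1 * paths.b1,
        "M2 only": paths.a2 * paths.b2,
        "M1 then M2": paths.a1 * paths.d21 * paths.b2,
    }
    total = paths.c_prime + sum(specific.values())
    percent = {name: 100.0 * effect / total for name, effect in specific.items()}
    return specific, total, percent


specific, total, percent = decompose(
    SerialMediationPaths(a1=0.5, a2=0.2, d21=0.4, b1=0.6, b2=0.5, c_prime=0.3)
)
print(total, percent)
```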
Chromosome 5p15.33 has been identified by genome-wide association studies as one of the regions associated with lung cancer risk. A few single-nucleotide polymorphisms (SNPs) in the telomerase reverse transcriptase (TERT) and cleft lip and palate transmembrane 1-like (CLPTM1L) genes located in this region have shown consistent associations. We performed dense genotyping of SNPs in this region to refine the previously reported association signals for lung cancer risk. Two hundred and fifteen SNPs were genotyped on an Illumina iSelect panel in a hospital-based case–control study of 1681 lung cancer cases and 1235 unaffected controls. Association was tested using unconditional logistic regression, adjusting for age, sex and pack-years smoked. Furthermore, because many of the SNPs were in linkage disequilibrium (LD), haplotype blocks were constructed, and tagging SNPs selected from them at an r² threshold of ≥0.95 were included in a stepwise forward selection logistic regression model. Of the 215 SNPs, 69 were significant at P < 0.05 in univariate analysis; of these, 35 SNPs meeting the r² threshold were included in the multiple logistic regression model. Two SNPs, rs370348 (odds ratio = 0.76, P = 1.6 × 10−6) and rs4975538 (odds ratio = 1.18, P = 0.005), were significantly associated with risk in the overall sample. Among ever smokers, rs4975615 (odds ratio = 0.75, P = 1.2 × 10−4) and rs4975538 (odds ratio = 1.26, P = 0.002) were significant, whereas among never smokers, rs451360 (odds ratio = 0.62, P = 7.6 × 10−5) was significant. We refined the consistent association signal in this region, allowing for the considerable LD between SNPs, and identified four novel SNPs that were independently and significantly associated with lung cancer risk. These results strongly suggest that several loci in the TERT/CLPTM1L region affect risk.
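Selecting tagging SNPs at an r² threshold can be illustrated with a simple greedy tagger: repeatedly choose the SNP that captures (r² ≥ threshold) the most SNPs not yet tagged. This is a sketch under simplifying assumptions, not the study's pipeline: it uses the squared correlation of genotype dosages as a stand-in for haplotype-based r², and it works across the whole region rather than within haplotype blocks.

```python
import numpy as np


def dosage_r2(dosages):
    """Pairwise r^2 between SNP columns of an (individuals x SNPs) matrix of 0/1/2 dosages."""
    with np.errstate(invalid="ignore", divide="ignore"):
        r = np.atleast_2d(np.corrcoef(dosages, rowvar=False))
    r2 = np.nan_to_num(r ** 2, nan=0.0)  # monomorphic SNPs give NaN correlations
    np.fill_diagonal(r2, 1.0)            # every SNP tags itself
    return r2


def greedy_tag_snps(dosages, r2_threshold=0.95):
    """Return column indices of tag SNPs so every SNP has r^2 >= threshold with some tag."""
    r2 = dosage_r2(np.asarray(dosages, dtype=float))
    n_snps = r2.shape[0]
    untagged = set(range(n_snps))
    tags = []
    while untagged:
        best, best_cover = None, set()
        for j in range(n_snps):
            cover = {k for k in untagged if r2[j, k] >= r2_threshold}
            if len(cover) > len(best_cover):  # ties keep the lower index
                best, best_cover = j, cover
        tags.append(best)
        untagged -= best_cover
    return tags


# Toy dosages: SNPs 0 and 1 are identical (perfect LD); SNP 2 is not in LD with them
dosages = np.array([[0, 0, 2], [1, 1, 0], [2, 2, 1], [0, 0, 1], [1, 1, 2]])
print(greedy_tag_snps(dosages))  # [0, 2]
```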
In this study, we observed loss of heterozygosity (LOH) at human chromosomal region 6q25.1 in sporadic lung cancer patients. LOH was observed in 65% of the 26 lung tumors examined and was narrowed down to a 2.2-Mb region. Single-nucleotide polymorphism (SNP) analysis of genes located within this region identified a candidate gene, termed p34. This gene, also designated ZC3H12D, C6orf95, FLJ46041, or dJ281H8.1, carries an A/G nonsynonymous SNP at codon 106, which changes the amino acid from lysine to arginine. Nearly 73% of heterozygous lung cancer tissues with LOH and the A/G SNP also exhibited loss of the A allele. In vitro clonogenic and in vivo nude mouse studies showed that overexpression of the A allele exerts a tumor suppressor function compared with the G allele. p34 is located within a recently mapped human lung cancer susceptibility locus, and association of the p34 A/G SNP was tested among these families. No significant association between the less frequent G allele and lung cancer susceptibility was found. Our results suggest that p34 may be a novel tumor suppressor gene involved in sporadic lung cancer, but it does not appear to be the candidate familial lung cancer susceptibility gene linked to chromosomal region 6q23-25.
The use of tyrosine kinase inhibitors (TKI) has yielded great success in treatment of lung adenocarcinomas. However, patients who develop resistance to TKI treatment often acquire a somatic resistance mutation (T790M) located in the catalytic cleft of the epidermal growth factor receptor (EGFR) enzyme. Recently, a report describing EGFR-T790M as a germ-line mutation suggested that this mutation may be associated with inherited susceptibility to lung cancer. Contrary to previous reports, our analysis indicates that the T790M mutation confers increased Y992 and Y1068 phosphorylation levels. In a human bronchial epithelial cell line, overexpression of EGFR-T790M displayed a growth advantage over wild-type (WT) EGFR. We also screened 237 lung cancer family probands, in addition to 45 bronchoalveolar tumors, and found that none of them contained the EGFR-T790M mutation. Our observations show that EGFR-T790M provides a proliferative advantage with respect to WT EGFR and suggest that the enhanced kinase activity of this mutant is the basis for rare cases of inherited susceptibility to lung cancer.
Variance components (VC) and Bayesian Markov chain Monte Carlo (MCMC) analysis are two widely used linkage analysis approaches to mapping genes for complex quantitative traits. Both approaches can handle extended pedigrees and multiple markers and do not require a prespecified genetic model. In this study, we used simulated data to compare the performance of these two approaches with traditional parametric linkage analysis. Using simulated data sets without linkage between a quantitative trait and the markers, we estimated critical values at a fixed significance level (5%) for the various test scores used in VC or MCMC and for the location (LOC) score. These critical values were then used to determine the power of the three methods in simulated data sets with linkage. We found that both the VC and MCMC approaches worked well compared with the LOC score when only one gene underlay the quantitative trait; however, VC had higher power than the other methods in a simulation study of a complex phenotype influenced by more than one gene. We also compared two implementations of MCMC analysis and found that interpreting results with the log of the placement score was more accurate for linkage inference than the Bayes factor but required much more intensive simulation studies.
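The simulation-based calibration described above (critical values from data sets without linkage, then power from data sets with linkage) reduces to two empirical calculations, sketched below. The chi-square and noncentral chi-square draws are stand-ins for whatever test scores a method produces; they are assumptions for illustration, not output from VC, MCMC or LOC-score software.

```python
import numpy as np


def empirical_critical_value(null_scores, alpha=0.05):
    """Score exceeded by a fraction alpha of replicates simulated without linkage."""
    return float(np.quantile(np.asarray(null_scores, dtype=float), 1.0 - alpha))


def empirical_power(alt_scores, critical_value):
    """Fraction of replicates simulated with linkage whose score exceeds the critical value."""
    return float(np.mean(np.asarray(alt_scores, dtype=float) > critical_value))


# Stand-in score distributions (not real VC, MCMC or LOC output)
rng = np.random.default_rng(0)
null_scores = rng.chisquare(df=1, size=10_000)                      # no linkage
alt_scores = rng.noncentral_chisquare(df=1, nonc=9.0, size=10_000)  # with linkage
crit = empirical_critical_value(null_scores)
print(f"critical value ~ {crit:.2f}, power ~ {empirical_power(alt_scores, crit):.2f}")
```

Calibrating each method against its own null distribution, rather than a nominal threshold, is what makes the power comparison between methods fair.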
Variance components; Linkage analysis; Location score; Multipoint analysis; Model-free methods; Markov chain Monte Carlo; Statistical power
We conducted a genome-wide association study of cutaneous basal cell carcinoma (BCC) among 2045 cases and 6013 controls of European ancestry, with follow-up replication in 1426 cases and 4845 controls. A non-synonymous SNP in the MC1R gene (rs1805007, encoding an Arg151Cys substitution), a previously well-documented pigmentation gene, showed the strongest association with BCC risk in the discovery set [rs1805007[T]: OR (95% CI) for the combined discovery and replication sets, 1.55 (1.45–1.66); P = 4.3 × 10−17]. We found that SNP rs12210050 at 6p25 near the EXOC2 gene was associated with an increased risk of BCC [rs12210050[T]: combined OR (95% CI), 1.24 (1.17–1.31); P = 9.9 × 10−10]. At the 13q32 locus near the UBAC2 gene, which encodes ubiquitin-associated domain-containing protein 2, we also identified a variant conferring susceptibility to BCC [rs7335046[G]: combined OR (95% CI), 1.26 (1.18–1.34); P = 2.9 × 10−8]. We further evaluated the associations of these two novel SNPs (rs12210050 and rs7335046) with squamous cell carcinoma (SCC) risk as well as melanoma risk. Both variants, rs12210050[T] [OR (95% CI), 1.35 (1.16–1.57); P = 7.6 × 10−5] and rs7335046[G] [OR (95% CI), 1.21 (1.02–1.44); P = 0.03], were associated with an increased risk of SCC, but neither was associated with melanoma risk. We conclude that 6p25 and 13q32 are novel loci conferring susceptibility to non-melanoma skin cancer.
Genetic variations in the CYP2A6 nicotine metabolic gene and the CHRNA5-CHRNA3-CHRNB4 (CHRNA5-A3-B4) nicotinic gene cluster have been independently associated with lung cancer. With genotype data from ever-smokers of European ancestry (417 lung cancer patients and 443 control subjects), we investigated the relative and combined associations of polymorphisms in these two genes with smoking behavior and lung cancer risk. Kruskal–Wallis tests were used to compare smoking variables among the different genotype groups, and odds ratios (ORs) for cancer risk were estimated using logistic regression analysis. All statistical tests were two-sided. Cigarette consumption (P < .001) and nicotine dependence (P = .036) were the highest in the combined CYP2A6 normal metabolizers and CHRNA5-A3-B4 AA (tag single-nucleotide polymorphism rs1051730 G>A) risk group. The combined risk group also exhibited the greatest lung cancer risk (OR = 2.03; 95% confidence interval [CI] = 1.21 to 3.40), which was even higher among those who smoked 20 or fewer cigarettes per day (OR = 3.03; 95% CI = 1.38 to 6.66). Variation in CYP2A6 and CHRNA5-A3-B4 was independently and additively associated with increased cigarette consumption, nicotine dependence, and lung cancer risk. CYP2A6 and CHRNA5-A3-B4 appear to be more strongly associated with smoking behaviors and lung cancer risk, respectively.
Four genome-wide association (GWA) studies have found that variation in a region of strong linkage disequilibrium on the long arm of chromosome 15 (15q24-25.1), containing nicotinic acetylcholine receptor genes, contributes to lung cancer risk. Since cigarette smoking is a major risk factor for developing both lung cancer and pancreatic cancer, we hypothesized that variation in this region may also modify individual susceptibility to pancreatic cancer.
We conducted a case-control study of 532 patients with pathologically confirmed pancreatic adenocarcinoma and 1046 age-, sex-, ethnicity-, and smoking behavior-matched cancer-free controls.
We found that the two risk single nucleotide polymorphisms (SNPs) reported in the lung cancer GWA studies, rs8034191: A>G and rs1051730: G>A, located in this 15q24-25.1 region, were not associated with risk of pancreatic cancer.
The results of our study suggest that the two SNPs at 15q25.1 do not modify pancreatic cancer risk.
nicotinic acetylcholine receptor; single nucleotide polymorphisms; pancreatic cancer; case-control study; chromosome 15
Despite significant advances over the last few years in mapping inversions with paired-end sequencing approaches, our understanding of the prevalence and spectrum of inversions in the human genome has lagged behind that of other types of structural variants, mainly because of the lack of a cost-efficient method applicable to large samples. We propose a novel method based on principal components analysis (PCA) to characterize inversion polymorphisms using high-density SNP genotype data. Our method applies to non-recurrent inversions for which recombination between the inverted and non-inverted segments in inversion heterozygotes is suppressed due to the loss of unbalanced gametes. Inside such an inversion region, an effect similar to population substructure is thus created: two distinct “populations” of inversion homozygotes of different orientations and their 1∶1 admixture, namely the inversion heterozygotes. This kind of substructure can be readily detected by performing PCA locally in the inversion regions. Using simulations, we demonstrated that the proposed method can detect and genotype inversion polymorphisms using unphased genotype data. We applied our method to the phase III HapMap data and inferred the genotypes of known inversion polymorphisms at 8p23.1 and 17q21.31. These inversion genotypes were validated by comparison with published results and by checking Mendelian consistency using family data where available. Using the PCA approach, we also performed a preliminary genome-wide scan for inversions in the HapMap data, which yielded 2040 candidate inversions, 169 of which overlapped with previously reported inversions. Our method can be readily applied to abundant SNP data and is expected to play an important role in developing human genome maps of inversions and in exploring associations between inversions and disease susceptibility.
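The core idea, that a non-recombining inversion region looks like two "populations" of haplotypes with heterozygotes in between, can be illustrated with a toy simulation and local PCA. The sketch below is a minimal illustration under strong assumptions (hypothetical allele frequencies for the two orientations, no recombination, unphased 0/1/2 genotypes), not the authors' implementation: it computes PC1 within the region and splits individuals into three clusters with a one-dimensional k-means. Because the sign of a principal component is arbitrary, which end corresponds to the inverted orientation must be fixed with outside information, such as known carriers.

```python
import numpy as np


def simulate_inversion_region(n_individuals=300, n_snps=100, seed=1):
    """Toy unphased genotypes in an inversion region with suppressed recombination.

    Non-inverted (N) and inverted (I) haplotypes draw alleles from very different,
    hypothetical frequencies; the true inversion genotype is 0 (NN), 1 (NI) or 2 (II).
    """
    rng = np.random.default_rng(seed)
    freq_n = rng.uniform(0.05, 0.20, n_snps)
    freq_i = rng.uniform(0.80, 0.95, n_snps)
    inversion_genotype = rng.integers(0, 3, n_individuals)
    genotypes = np.zeros((n_individuals, n_snps))
    for idx, g in enumerate(inversion_genotype):
        for hap in range(2):
            freq = freq_i if hap < g else freq_n  # the first g haplotypes are inverted
            genotypes[idx] += rng.random(n_snps) < freq
    return genotypes, inversion_genotype


def local_pc1(genotypes):
    """Scores on the first principal component of the column-centered genotype matrix."""
    centered = genotypes - genotypes.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, 0] * s[0]


def three_cluster_calls(pc1, n_iter=100):
    """1-D k-means with k = 3; labels 0, 1, 2 are ordered along PC1."""
    centers = np.quantile(pc1, [0.0, 0.5, 1.0])
    for _ in range(n_iter):
        labels = np.argmin(np.abs(pc1[:, None] - centers[None, :]), axis=1)
        new_centers = np.array([
            pc1[labels == k].mean() if np.any(labels == k) else centers[k] for k in range(3)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


genotypes, truth = simulate_inversion_region()
calls = three_cluster_calls(local_pc1(genotypes))
# PC1 sign is arbitrary, so check agreement under both orientations
agreement = max(np.mean(calls == truth), np.mean(2 - calls == truth))
print(f"agreement with simulated inversion genotypes: {agreement:.2f}")
```

Heterozygotes land midway between the two homozygous clusters on PC1, which is the 1∶1 admixture pattern described above.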
With the availability of high-density genotype information, principal components analysis (PCA) is now routinely used to detect and quantify the genetic structure of populations in both population genetics and genetic epidemiology. An important issue is how to make correct inferences about population relationships from the results of PCA, especially when admixed individuals are included in the analysis. We extend our recently developed theoretical formulation of PCA to allow for admixed populations. Because the sampled individuals are treated as features, our generalized formulation of PCA directly relates the pattern of the scatter plot of the top eigenvectors to the admixture proportions and to parameters reflecting population relationships, and thus can provide valuable guidance on how to interpret the results of PCA in practice. Using our formulation, we theoretically justify the diagnostic for two-way admixture. More importantly, our theoretical investigations based on the proposed formulation yield a diagnostic for multi-way admixture. For instance, we found that admixed individuals with three parental populations are distributed inside the triangle formed by their parental populations, and each such individual divides the triangle into three smaller triangles whose areas, as fractions of the whole triangle, equal the corresponding admixture proportions. We tested and illustrated these findings using simulated data and data from HapMap III and the Human Genome Diversity Project.
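The three-way diagnostic is the barycentric-coordinate property of a triangle. Assuming, as the diagnostic implies, that an admixed individual's position on the top two eigenvectors is the admixture-proportion-weighted average of the three parental positions, the sub-triangle opposite each parental vertex has an area fraction equal to that parent's admixture proportion. The sketch below checks this numerically with hypothetical coordinates and proportions.

```python
import numpy as np


def triangle_area(p, q, r):
    """Unsigned area of the triangle with 2-D vertices p, q, r."""
    return 0.5 * abs((q[0] - p[0]) * (r[1] - p[1]) - (r[0] - p[0]) * (q[1] - p[1]))


def admixed_position(parents, proportions):
    """Assumed PC-plot position: proportion-weighted average of the three parental positions."""
    w = np.asarray(proportions, dtype=float)
    if w.shape != (3,) or np.any(w < 0) or not np.isclose(w.sum(), 1.0):
        raise ValueError("need three non-negative proportions summing to 1")
    return w @ np.asarray(parents, dtype=float)


def subtriangle_area_fractions(parents, point):
    """Area of the sub-triangle opposite each parental vertex, as a fraction of the whole."""
    a, b, c = np.asarray(parents, dtype=float)
    total = triangle_area(a, b, c)
    return np.array([
        triangle_area(point, b, c),  # opposite parent A
        triangle_area(a, point, c),  # opposite parent B
        triangle_area(a, b, point),  # opposite parent C
    ]) / total


parents = [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0)]  # hypothetical parental positions on PC1/PC2
point = admixed_position(parents, [0.5, 0.3, 0.2])
print(point, subtriangle_area_fractions(parents, point))  # [1.2 0.6] and fractions 0.5, 0.3, 0.2
```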
We conducted a genome-wide association study of the number of melanocytic nevi reported by 9136 individuals of European ancestry, with follow-up replication in 3581 individuals. We identified the nidogen 1 (NID1) gene on 1q42 as associated with nevus count (two linked single nucleotide polymorphisms with r² > 0.9: the rs3768080 A allele was associated with reduced count, P = 6.5 × 10−8, and the rs10754833 T allele was associated with reduced count, P = 1.5 × 10−7). We further determined that the rs10754833 T allele was associated with decreased melanoma risk in 2368 melanoma cases and 7432 controls [for the CT genotype: odds ratio (OR) = 0.86, 95% confidence interval (CI) = 0.75–0.99, P = 0.04; for the TT genotype: OR = 0.84, 95% CI = 0.71–0.98, P = 0.03]. In the 87 HapMap CEU cell lines, expression of the NID1 locus was 2-fold higher in rs10754833 T allele carriers than in individuals with the CC genotype (P = 0.017). NID1 is a biologically plausible locus for nevogenesis and melanoma development, with decreased NID1 expression in benign nevi (P = 3.5 × 10−6) and in primary melanoma (P = 4.6 × 10−4) compared with normal skin.
A common variant on chromosomal region 15q24–25.1, marked by rs1051730, was found to be associated with lung cancer risk. Here, we attempted to confirm the second variant on 15q24–25.1 in several large sporadic lung cancer populations and determined what percentage of additional risk for lung cancer is due to the genetic effect of the second variant. SNPs rs1051730 and rs481134 were genotyped in 2,818 lung cancer cases and 2,766 controls from four populations. Joint analysis of these two variants (rs1051730 and rs481134) on 15q24–25.1 identified three major haplotypes (G_T, A_C, and G_C) and provided stronger evidence for association of 15q24–25.1 with lung cancer (P = 9.72 × 10−9). These two variants represent three levels of risk associated with lung cancer. The most common haplotype G_T is neutral; the haplotype A_C is associated with increased risk for lung cancer with 5.0% higher frequency in cases than in controls [P = 1.68 × 10−7; odds ratio (OR), 1.24; 95% confidence interval (95% CI), 1.14–1.35]; whereas the haplotype G_C is associated with reduced risk for lung cancer with 4.4% lower frequency in cases than in controls (P = 7.39 × 10−7; OR, 0.80; 95% CI, 0.73–0.87). We further showed that these two genetic variants on 15q24–25.1 independently influence lung cancer risk (rs1051730: P = 4.42 × 10−11; OR, 1.60; 95% CI, 1.46–1.74; rs481134: P = 7.01 × 10−4; OR, 0.81; 95% CI, 0.72–0.92). The second variant on 15q24–25.1, marked by rs481134, explains an additional 13.2% of population attributable risk for lung cancer.
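Population attributable risk is commonly computed with Levin's formula, PAR = p(RR − 1) / [1 + p(RR − 1)], where p is the prevalence of the risk factor in the population and RR its relative risk (often approximated by the OR when the disease is rare). The abstract does not say how the 13.2% was derived for the risk-reducing rs481134 variant, so the sketch below shows only the generic formula with hypothetical inputs; it is not a reconstruction of that figure.

```python
def levin_par(prevalence, relative_risk):
    """Levin's population attributable risk for a binary risk factor."""
    if not 0.0 <= prevalence <= 1.0:
        raise ValueError("prevalence must be between 0 and 1")
    excess = prevalence * (relative_risk - 1.0)
    return excess / (1.0 + excess)


# Hypothetical inputs: 50% of the population exposed, relative risk 1.5
print(levin_par(0.5, 1.5))  # 0.2
```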
Interindividual variation in genetic background may influence the response to chemotherapy and overall survival for patients with advanced-stage non–small cell lung cancer (NSCLC).
To identify genetic variants associated with poor overall survival in these patients, we conducted a genome-wide scan of 307,260 single-nucleotide polymorphisms (SNPs) in 327 patients with advanced-stage NSCLC who received platinum-based chemotherapy with or without radiation at the University of Texas MD Anderson Cancer Center (the discovery population). A fast-track replication was performed in 315 patients from the Mayo Clinic, followed by a second validation at the University of Pittsburgh in 420 patients enrolled in the Spanish Lung Cancer Group PLATAX clinical trial. Pooled analyses combining the Mayo Clinic and PLATAX populations, or all three populations, were also used to validate the results. We assessed the association of each SNP with overall survival by multivariable Cox proportional hazards regression analysis. All statistical tests were two-sided.
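To illustrate what a Cox proportional hazards fit does for a single SNP, the sketch below maximizes the partial likelihood for one covariate (a hypothetical 0/1 variant-genotype indicator) by Newton-Raphson, handling tied event times with the Breslow approach. It is a deliberate simplification: the study's models were multivariable, and a real analysis would use an established survival package. The function name and toy data are hypothetical.

```python
import math

def cox_single_covariate(times, events, x, max_iter=25, tol=1e-10):
    """Fit a one-covariate Cox proportional hazards model by Newton-Raphson.

    times  : follow-up times
    events : 1 if a death was observed, 0 if censored
    x      : covariate values, e.g. 1 = carries the variant genotype, 0 = not
    Returns (beta, se): the log hazard ratio and its standard error.
    Ties use the Breslow approximation (each death is compared with
    everyone whose time is >= its own).
    """
    beta, info = 0.0, 0.0
    for _ in range(max_iter):
        score, info = 0.0, 0.0
        for ti, di, xi in zip(times, events, x):
            if not di:
                continue  # censored subjects contribute only through risk sets
            # Weighted sums over the risk set at time ti
            s0 = s1 = s2 = 0.0
            for tj, xj in zip(times, x):
                if tj >= ti:
                    w = math.exp(beta * xj)
                    s0 += w
                    s1 += w * xj
                    s2 += w * xj * xj
            mean = s1 / s0
            score += xi - mean             # gradient of the log partial likelihood
            info += s2 / s0 - mean * mean  # observed information (negative Hessian)
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    return beta, 1.0 / math.sqrt(info)

# Hypothetical toy cohort: carriers (x = 1) tend to die slightly earlier
times = [1, 2, 3, 4, 5, 6, 7, 8]
events = [1, 1, 1, 1, 1, 1, 1, 1]
x = [1, 0, 1, 0, 1, 0, 1, 0]
beta, se = cox_single_covariate(times, events, x)
print(f"HR = {math.exp(beta):.2f}, 95% CI = "
      f"{math.exp(beta - 1.96 * se):.2f} to {math.exp(beta + 1.96 * se):.2f}")
```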
SNP rs1878022 in the chemokine-like receptor 1 gene (CMKLR1) was statistically significantly associated with poor overall survival in the MD Anderson discovery population (hazard ratio [HR] of death = 1.59, 95% confidence interval [CI] = 1.32 to 1.92, P = 1.42 × 10−6), in the PLATAX clinical trial (HR of death = 1.23, 95% CI = 1.00 to 1.51, P = .05), in the pooled Mayo Clinic and PLATAX validation (HR of death = 1.22, 95% CI = 1.06 to 1.40, P = .005), and in the pooled analysis of all three populations (HR of death = 1.33, 95% CI = 1.19 to 1.48, P = 5.13 × 10−7). Carrying a variant genotype of rs10937823 was associated with decreased overall survival (HR of death = 1.82, 95% CI = 1.42 to 2.33, P = 1.73 × 10−6) in the pooled MD Anderson and Mayo Clinic populations but not in the PLATAX trial population (HR of death = 0.96, 95% CI = 0.69 to 1.35).
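The discordant rs10937823 results above can be checked informally with a z test for the difference between two independent log hazard ratios, recovering each standard error from its reported 95% CI. This is an illustrative calculation, not an analysis reported by the authors; it assumes the two groups share no patients, and because the published CIs are rounded, the recovered SEs are approximate.

```python
import math

def log_se_from_ci(lower, upper, z=1.96):
    """Recover the standard error of a log ratio from its reported 95% CI."""
    return (math.log(upper) - math.log(lower)) / (2 * z)

def compare_ratios(r1, ci1, r2, ci2):
    """Two-sided z test for a difference between two independent log ratios."""
    se1 = log_se_from_ci(*ci1)
    se2 = log_se_from_ci(*ci2)
    z = (math.log(r1) - math.log(r2)) / math.sqrt(se1 ** 2 + se2 ** 2)
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p

# rs10937823: pooled MD Anderson + Mayo Clinic vs. PLATAX (HRs and CIs from the text)
z, p = compare_ratios(1.82, (1.42, 2.33), 0.96, (0.69, 1.35))
print(f"z = {z:.2f}, P = {p:.4f}")
```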
These results have the potential to contribute to the future development of personalized chemotherapy treatments for individual NSCLC patients.
DNA repair genes are important for maintaining genomic stability and limiting carcinogenesis. We analyzed all single nucleotide polymorphisms (SNPs) in 125 DNA repair genes covered by the Illumina HumanHap300 (v1.1) BeadChips in a previously conducted genome-wide association study (GWAS) of 1,154 lung cancer cases and 1,137 controls, and replicated the top XRCC4 hits in an independent set of 597 cases and 611 controls from Texas populations. Six of 20 XRCC4 SNPs were associated with a decreased risk of lung cancer at P ≤ 0.01 in the discovery dataset, the most significant being rs10040363 (P for allelic test = 4.89 × 10−4). The data in this region also allowed us to impute a potentially functional SNP, rs2075685 (imputed P for allelic test = 1.3 × 10−3). A luciferase reporter assay demonstrated that the rs2075685 G>T change in the XRCC4 promoter increased reporter gene expression. In the replication study of rs10040363, rs1478486, rs9293329, and rs2075685, however, only rs10040363 showed a borderline association with decreased lung cancer risk under a dominant model (adjusted OR = 0.80, 95% CI = 0.62–1.03, P = 0.079). In the final combined analysis of the Texas GWAS discovery and replication datasets, the association strengthened for rs10040363 (adjusted OR = 0.77, 95% CI = 0.66–0.89, Pdominant = 5 × 10−4, P for trend = 5 × 10−4) and rs1478486 (adjusted OR = 0.82, 95% CI = 0.71–0.94, Pdominant = 6 × 10−3, P for trend = 3.5 × 10−3). Finally, we conducted a meta-analysis of these XRCC4 SNPs using available data from published lung cancer GWAS totaling 12,312 cases and 47,921 controls, in which none of these XRCC4 SNPs was associated with lung cancer risk. Thus rs2075685, although associated with increased reporter gene expression and with lung cancer risk in the Texas populations, did not appear to affect lung cancer risk in other populations.
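A common way to combine per-study results in a meta-analysis like the one described above is inverse-variance fixed-effect pooling of log odds ratios, with Cochran's Q as a heterogeneity statistic. The sketch below assumes each study reports an OR with a symmetric (log-scale) 95% CI; the abstract does not specify the pooling model used, and the per-study values in the usage example are hypothetical.

```python
import math

def fixed_effect_meta(studies, z=1.96):
    """Inverse-variance fixed-effect meta-analysis on the log odds ratio scale.

    studies: list of (odds_ratio, lower_95, upper_95), one tuple per study.
    Returns (pooled_or, (lower_95, upper_95), two_sided_p, cochran_q).
    """
    log_ors, weights = [], []
    for or_, lo, hi in studies:
        se = (math.log(hi) - math.log(lo)) / (2 * z)  # SE recovered from the CI
        log_ors.append(math.log(or_))
        weights.append(1.0 / se ** 2)
    total_w = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, log_ors)) / total_w
    se_pooled = math.sqrt(1.0 / total_w)
    p = math.erfc(abs(pooled / se_pooled) / math.sqrt(2))
    # Cochran's Q: weighted squared deviations from the pooled estimate
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, log_ors))
    ci = (math.exp(pooled - z * se_pooled), math.exp(pooled + z * se_pooled))
    return math.exp(pooled), ci, p, q

# Hypothetical per-study estimates for one SNP
studies = [(0.80, 0.66, 0.97), (1.02, 0.95, 1.10), (0.98, 0.90, 1.07)]
pooled_or, ci, p, q = fixed_effect_meta(studies)
print(f"pooled OR = {pooled_or:.2f} ({ci[0]:.2f}-{ci[1]:.2f}), P = {p:.3f}, Q = {q:.2f}")
```

Larger studies receive more weight, so a signal confined to one modest-sized population can be diluted once much larger datasets are added.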
This study underscores the importance of replication using published data in larger populations.
XRCC4; variant; Genetic susceptibility; genome-wide association study; replication study
Telomeres play a critical role in maintaining genome integrity, and telomere shortening is associated with the risk of many aging-related diseases. Classic twin studies have estimated the heritability of telomere length at up to 80%. Here, we used a multistage genome-wide association study (GWAS) to identify genetic determinants of telomere length, measured as mean telomere length in peripheral blood leukocytes by quantitative real-time polymerase chain reaction. We first analyzed 300,000 single-nucleotide polymorphisms (SNPs) in 459 healthy controls and found 15,120 SNPs associated with telomere length at P < 0.05. We then validated these SNPs in two independent populations of 890 and 270 healthy controls, respectively. Four SNPs, including rs398652 on 14q21, were associated with telomere length across all three populations (pooled P < 10−5), and the variant alleles of these SNPs were associated with longer telomeres. We next analyzed the association of these SNPs with bladder cancer risk in a large case-control study. The variant allele of rs398652 was associated with a significantly reduced risk of bladder cancer (odds ratio = 0.81; 95% confidence interval, 0.67–0.97; P = 0.025), consistent with its correlation with longer telomeres. A mediation analysis examining whether this association operates through telomere length showed that telomere length was a significant mediator of the relationship between rs398652 and bladder cancer (P = 0.013), explaining 14% of the effect. In conclusion, SNP rs398652 on 14q21 was associated with longer telomere length and a reduced risk of bladder cancer, and part of its effect on bladder cancer risk was mediated by telomere length.
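One common way to quantify a "proportion mediated" like the 14% above is the product-of-coefficients method: the indirect effect is a × b (SNP to telomere length, then telomere length to outcome), its standard error comes from the Sobel approximation, and the proportion is indirect / (indirect + direct). The abstract does not say which mediation method was used, and with a logistic outcome this decomposition is only approximate. All coefficient values below are hypothetical.

```python
import math

def product_of_coefficients(a, se_a, b, se_b, c_prime):
    """Indirect effect via the product-of-coefficients method.

    a       : effect of the SNP on the mediator (telomere length)
    b       : effect of the mediator on the outcome, adjusting for the SNP
    c_prime : direct effect of the SNP on the outcome, adjusting for the mediator
    Returns (indirect, sobel_se, two_sided_p, proportion_mediated).
    """
    indirect = a * b
    # Sobel (first-order delta method) standard error of a*b
    se_ind = math.sqrt(b ** 2 * se_a ** 2 + a ** 2 * se_b ** 2)
    p = math.erfc(abs(indirect / se_ind) / math.sqrt(2))
    # Share of the total effect (indirect + direct) carried by the mediator
    proportion = indirect / (indirect + c_prime)
    return indirect, se_ind, p, proportion

# Hypothetical coefficients (b and c_prime on the log-odds scale)
ind, se, p, prop = product_of_coefficients(0.10, 0.03, -0.25, 0.08, -0.20)
print(f"indirect = {ind:.3f}, P = {p:.3f}, proportion mediated = {prop:.1%}")
```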
SNP; telomere length; GWAS; bladder cancer risk