Epithelial ovarian cancer (EOC) has a heritable component that remains to be fully characterized. Most identified common susceptibility variants lie in non-protein-coding sequences. We hypothesized that variants in the 3′ untranslated region at putative microRNA (miRNA) binding sites represent functional targets that influence EOC susceptibility. Here, we evaluate the association between 767 miRNA binding site single nucleotide polymorphisms (miRSNPs) and EOC risk in 18,174 EOC cases and 26,134 controls from 43 studies genotyped through the Collaborative Oncological Gene-environment Study. We identify several miRSNPs associated with invasive serous EOC risk (OR=1.12, P=10−8) mapping to an inversion polymorphism at 17q21.31. Additional genotyping of non-miRSNPs at 17q21.31 reveals stronger signals outside the inversion (P=10−10). Variation at 17q21.31 associates with neurological diseases, and our collaboration is the first to report an association with EOC susceptibility. An integrated molecular analysis in this region provides evidence for ARHGAP27 and PLEKHM1 as candidate EOC susceptibility genes.
Genome wide association studies (GWAS) have identified hundreds of genetic variants conferring low penetrance susceptibility to cancer1. More than 90% of these variants lie in non protein-encoding sequences including non-coding RNAs and regions containing regulatory elements (i.e. enhancers, promoters, untranslated regions (UTRs))1. The emerging hypothesis is that common variants within non-coding regulatory regions influence expression of target genes, thereby conferring disease susceptibility1.
MicroRNAs (miRNAs) are short non-coding RNAs that regulate gene expression post-transcriptionally by binding primarily to the 3′ UTR of target messenger RNA (mRNA), causing translational inhibition and/or mRNA degradation2-4. MiRNAs have been shown to play a key role in the development of epithelial ovarian cancer (EOC) 2. We 5,6 and others 7 have found evidence that various miRNA-related single nucleotide polymorphisms (miRSNPs) are associated with EOC risk, suggesting they may be key disruptors of gene function and contributors to disease susceptibility 8,9. However, studies of miRSNPs that affect miRNA-mRNA binding have been restricted by small sample sizes and therefore have limited statistical power to identify associations at genome wide levels of significance7-9. Larger-scale studies and more systematic approaches are warranted to fully evaluate the role of miRSNPs and their contribution to disease susceptibility.
Here, we use the in silico algorithms TargetScan 10,11 and PicTar 12,13 to predict miRNA:mRNA binding regions involving genes and miRNAs relevant to EOC, and align identified regions with SNPs in the dbSNP database (Methods). We then genotype 1,003 miRSNPs (or tagging SNPs with r2>0.80) in 18,174 EOC cases and 26,134 controls from 43 studies from the Ovarian Cancer Association Consortium (OCAC) (Supplementary Table S1). Genotyping was performed on a custom Illumina Infinium iSelect array designed as part of the Collaborative Oncological Gene-environment Study (COGS), an international effort that evaluated 211,155 SNPs and their association with ovarian, breast, and prostate cancer risk. Our investigation uncovers 17q21.31 as a new susceptibility locus for EOC, and we provide insights into candidate genes and possible functional mechanisms underlying disease development at this locus.
Seven hundred and sixty-seven of the 1,003 miRSNPs passed genotype quality control (QC) and were evaluated for association with invasive EOC risk; most of the miRSNPs that failed QC were monomorphic (see Methods). Primary analysis of 14,533 invasive EOC cases and 23,491 controls of European ancestry revealed four strongly correlated SNPs (r2=0.99; rs1052587, rs17574361, rs4640231, and rs916793) that mapped to 17q21.31 and were associated with increased risk (per allele odds ratio (OR) = 1.10, 95% CI 1.06-1.13) at a genome-wide level of significance (10−7); no other miRSNPs had associations stronger than P<10−4 (Supplementary Fig. S1). The most significant association was for rs1052587 (P=1.9×10−7), and effects varied by histological subtype, with the strongest effect observed for invasive serous EOC cases (OR=1.12, P=4.6×10−8) (Table 1). No heterogeneity in ORs was observed across study sites (Supplementary Fig. S2).
Rs1052587, rs17574361, and rs4640231 reside in the 3′UTR of microtubule-associated protein tau (MAPT), KAT8 regulatory NSL complex subunit 1 (KANSL1/KIAA1267), and corticotrophin releasing hormone receptor 1 (CRHR1) genes, at putative binding sites for miR-34a, miR-130a, and miR-34c, respectively. The fourth SNP, rs916793, is perfectly correlated with rs4640231 and lies in a non-coding RNA, MAPT-antisense 1. 17q21.31 contains a ~900kb inversion polymorphism14 (chr 17: 43,624,578-44,525,051, human genome build 37), and all three miRSNPs and the tagSNP are located within the inversion (Fig. 1).
Chromosomes with the non-inverted or inverted segments of 17q21.31, respectively known as haplotype 1 (H1) and haplotype 2 (H2), represent two distinct lineages that diverged ~3 million years ago and have not undergone any recombination event 14. The four susceptibility alleles identified here reside on the H2 haplotype that is reported to be rare in Africans and East Asians, but is common (frequency >20%) and exhibits strong linkage disequilibrium (LD) among Europeans 14, consistent with our findings. The H2 haplotype has a frequency of 22% among European women in our primary analysis (Table 1) but only 3.2% and 0.3% among Africans (151 invasive cases, 200 controls) and Asians (716 invasive cases, 1573 controls), respectively.
To increase genomic coverage at this locus, we evaluated an additional 142 non-miRSNPs at 17q21.31 that were also genotyped as part of COGS in the same series of OCAC cases and controls. We also imputed genotypes using data from the 1000 Genomes Project15. These approaches identified a second cluster of strongly correlated SNPs (r2>0.90) in a distinct region proximal to the inversion (centered at chromosome 17: 43.5 MB, human genome build 37) that was more significantly associated with the risk of all invasive EOCs (P= 10−9) and invasive serous EOC specifically (P= 10−10) than the cluster of identified miRSNPs (Fig. 1). Association results and annotation for SNPs in this second cluster are shown in Supplementary Table S2; this cluster includes three directly genotyped SNPs (rs2077606, rs17631303, and rs12942666), with the strongest association observed for rs2077606 among all invasive cases (OR=1.12, 95% CI: 1.08-1.16, P=7.8×10−9) and invasive serous cases (OR=1.15, 95% CI: 1.12-1.19, P=3.9×10−10). These SNPs were chosen for genotyping in COGS because they had shown evidence of association as modifiers of EOC risk in BRCA1 gene mutation carriers by the Consortium of Investigators of Modifiers of BRCA1/2 (CIMBA)16. Several imputed SNPs in strong LD (r2>0.90) were more strongly associated with risk than their highly correlated genotyped SNPs (Supplementary Table S2). This risk-associated region at 17q21.31 is distinct from a previously reported ovarian cancer susceptibility locus at 17q2117; neither the genotyped nor the imputed SNPs we report here are strongly correlated (maximum r2= 0.01) with SNPs from the 17q21 locus (spanning 46.2-46.5 MB, build 37).
Genotype clustering was poor for rs2077606, but clustering was good for its correlated SNP, rs12942666 (r2=0.99), and so results for this SNP are presented instead (Supplementary Fig. S2; Table 1). Subgroup analysis revealed marginal evidence of association for rs12942666 with endometrioid (P=0.04), but not mucinous or clear cell EOC subtypes (Table 1), and results were consistent across studies (Supplementary Fig. S4). Rs12942666 is correlated with the top-ranked miRSNP, rs1052587 (r2=0.76) (Fig. 1). To evaluate whether associations observed for rs12942666 and rs1052587 represented independent signals, stepwise logistic regression was used; only rs12942666 was retained in the model. This suggests that the cluster which includes rs12942666 is driving the association with EOC risk that was initially identified through the candidate miRSNPs.
To evaluate functional evidence for candidate genes, risk-associated SNPs, and regulatory regions at 17q21.31, we examined a one megabase region centered on rs12942666 using a combination of locus specific and genome-wide assays and in silico analyses of publicly available datasets, including The Cancer Genome Atlas (TCGA) Project18 (see Methods). Rs12942666 and many of its correlated SNPs lie within introns of Rho GTPase activating protein 27 (ARHGAP27) or its neighboring gene, pleckstrin homology domain containing, family M (with RUN domain) member 1 (PLEKHM1) (Supplementary Table S2). There are another 15 known protein-coding genes within the region: KIF18B, C1QL1, DCAKD, NMT1, PLCD3, ACBD4, HEXIM1, HEXIM2, FMNL1, C17orf46, MAP3K14, C17orf69, CRHR1, IMP5, and MAPT (Fig. 2a).
To evaluate the likelihood that one or more genes within this region represent target susceptibility gene(s), we first analyzed expression, copy number variation, and methylation involving these genes in EOC tissues and cell lines (Fig. 2b-g; Supplementary Tables S3 and S4). Most genes showed significantly higher expression (P<10−4) in EOC cell lines versus normal ovarian cancer-precursor tissues (OCPTs); ARHGAP27 showed the most pronounced difference in gene expression between cancer and normal cells (P=10−16) (Fig. 2b and Supplementary Table S3). For nine genes, we also found overexpression in primary high-grade serous (HGS) EOC tumors versus normal ovarian tissue in at least one of two publicly available datasets, The Cancer Genome Atlas (TCGA) of 568 tumors 18 and/or the Gene Expression Omnibus (GEO) series GSE18520 dataset consisting of 53 tumors19 (Fig. 2c and Supplementary Table S3). Analysis of DNA copy number variation in TCGA revealed frequent loss of heterozygosity in this region rather than gains (Supplementary Fig. 5a-b; Supplementary Methods). We observed significant hypomethylation (P<0.01) in ovarian tumors compared to normal tissue for DCAKD, PLCD3, ACBD4, FMNL1, and PLEKHM1 (Fig. 2d and Supplementary Table S4), which is consistent with the overexpression observed for DCAKD, PLCD3, and FMNL1. Taken together, these data suggest that the mechanism underlying overexpression may be epigenetic rather than based on copy number alterations.
We evaluated associations between genotypes for the top risk SNP rs12942666 (or a tagSNP) and expression of all genes in the region (expression quantitative trait locus (eQTL) analysis) in normal OCPTs, lymphoblastoid cell lines (LCLs), and primary tumors from TCGA. We observed significant eQTL associations (P<0.05) in normal OCPTs only for ARHGAP27 (P=0.04) (Fig. 2e; Supplementary Table S3). Because rs12942666 was not genotyped in tissues analyzed in TCGA, we used data for its correlated SNP rs2077606 (r2=0.99) to evaluate eQTLs in tumor tissues. Rs2077606 genotypes were strongly associated with PLEKHM1 expression in primary HGS-EOCs (P=1×10−4) (Fig. 2f; Supplementary Table S3). We also detected associations between rs12942666 (and rs2077606) genotypes and methylation for PLEKHM1 and CRHR1 in primary tumors (P=0.020 and 0.001, respectively) using methylation quantitative trait locus (mQTL) analyses (Fig. 2g; Supplementary Table S4). Finally, the Catalogue of Somatic Mutations in Cancer (COSMIC) database 20 showed that nine genes in the region, including PLEKHM1, have functionally significant mutations in cancer, although for most genes mutations were not reported in ovarian carcinomas (Supplementary Table S3).
Taken together, these data suggest that several genes at the 17q21.31 locus may play a role in EOC development. The risk-associated SNPs we identified fall within non-coding DNA, suggesting the functional SNP(s) may be located within an enhancer, insulator, or other regulatory element that regulates expression of one of the candidate genes we evaluated. One hypothesis emerging from these molecular analyses is that rs12942666 (or a correlated SNP) mediates regulation of PLEKHM1, a gene implicated in osteopetrosis and endocytosis 21, and/or ARHGAP27, a gene that may promote carcinogenesis through dysregulation of Rho/Rac/Cdc42-like GTPases 22. To identify the most likely candidate for being the causal variant at 17q21.31, we compared the difference between log-likelihoods generated from un-nested logistic regression models for rs12942666 and each of 198 SNPs in a 1 MB region featured in Supplementary Table S2. As expected, the log-likelihoods were very similar due to the strong LD; no SNPs emerged as having a likelihood ratio greater than 20 for being the causal variant.
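The log-likelihood comparison described above can be illustrated with a minimal sketch. It assumes that the likelihood ratio for a candidate SNP is the exponential of the difference between its model log-likelihood and that of the reference SNP model; the direction of the ratio, the function names, and all numeric values are illustrative assumptions and are not results from this study.

```python
import math
from typing import Dict, List

LR_THRESHOLD = 20.0  # threshold quoted in the text


def likelihood_ratios(ref_loglik: float, candidate_logliks: Dict[str, float]) -> Dict[str, float]:
    """Likelihood ratio of each candidate SNP model relative to the reference SNP model."""
    return {snp: math.exp(ll - ref_loglik) for snp, ll in candidate_logliks.items()}


def favoured_candidates(ratios: Dict[str, float], threshold: float = LR_THRESHOLD) -> List[str]:
    """Candidates whose likelihood ratio exceeds the threshold, sorted by name."""
    return sorted(snp for snp, lr in ratios.items() if lr > threshold)


# Hypothetical log-likelihoods; the SNP labels are placeholders, not study results.
ratios = likelihood_ratios(
    -25000.0,
    {"snp_a": -24999.5, "snp_b": -25001.2, "snp_c": -24996.0},
)
flagged = favoured_candidates(ratios)
```

Because SNPs in strong LD give nearly identical fits, differences in log-likelihood between their models are typically small, and a ratio above 20 requires a difference of about 3 log-likelihood units.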
To explore the possible functional significance of rs12942666 and strongly correlated variants (r2>0.80), we then generated a map of regulatory elements around rs12942666 using ENCODE data and FAIRE-seq analysis of OCPTs (Supplementary Methods). We observed no evidence of putative regulatory elements coinciding with rs12942666 or correlated SNPs (Fig. 3a). A map of regulatory elements in the entire 1 MB region can be seen in Supplementary Fig. 5c-f. We subsequently used in silico tools (ANNOVAR23, SNPinfo24, and SNPnexus25) to evaluate the putative function of possible causal SNPs (Supplementary Methods). Of 50 SNPs with possible functional roles, more than 30 reside in putative transcription factor binding sites (TFBS) within or near PLEKHM1 or ARHGAP27; 12 SNPs may affect methylation or miRNA binding, and two are non-synonymous coding variants predicted to be of no functional significance (Supplementary Table S2).
Since most of the top-ranked 17q21.31 SNPs with putative functions (including two of the top directly genotyped SNPs, rs2077606 and rs17631303), are predicted to lie in TFBS (Supplementary Table S2), we used the in silico tool, JASPAR 26 to further examine TFBS coinciding with these SNPs. Two SNPs scored highly in this analysis (Supplementary Table S5); the first, rs12946900, lies in a GAGGAA motif and canonical binding site for SPIB, an Ets family member27. Ets factors have been implicated in the development of ovarian cancer and other malignancies28, but little evidence supports a specific role for SPIB in EOC etiology. The second hit was for rs2077606, which lies in an E-box motif CACCTG at the canonical binding site for ZEB1 (chr. 10p11.2), a zinc-finger E-box binding transcription factor that represses E-cadherin29,30 and contributes to epithelial-mesenchymal transition in EOCs 31.
We analyzed expression of SPIB and ZEB1 in primary ovarian cancers using TCGA data; we found no significant difference in SPIB expression in tumors compared to normal tissues (Fig. 3bi). In contrast, ZEB1 expression was significantly lower in primary HGS-EOCs compared to normal tissues (P=0.005) (Fig. 3bii). We validated this finding using qPCR analysis in 123 EOC and OCPT cell lines (P=8.8 ×10−4) (Fig. 3biii). Since rs2077606 lies within an intron of PLEKHM1, this gene is a candidate target for ZEB1 binding at this site. Our eQTL analysis also suggests ARHGAP27 is a strong candidate ZEB1 target at this locus; ARHGAP27 expression is highest in OCPT cell lines carrying the minor allele of rs2077606 (P=0.034) (Fig. 3ci). Although we observed no eQTL associations between rs2077606 and ZEB1 expression in LCLs (Fig. 3cii), we found evidence of an eQTL between rs2077606 and ZEB1 expression in HGS-EOCs (P=0.045) (Fig. 3ciii). ZEB1 binding at the site of the common allele is predicted to repress gene expression, while loss of ZEB1 binding conferred by the minor allele may enable expression of ARHGAP27, consistent with the eQTL association in OCPTs (Fig. 3ci). Although these data support a repressor role for ZEB1 in EOC development and suggest ARHGAP27 may be a functional target of rs2077606 (or a correlated SNP) in OCPTs through trans-regulatory interactions with ZEB1, it is important to investigate additional hypotheses as we continue to narrow down the list of target susceptibility genes, SNPs, and regulatory mechanisms that contribute to EOC susceptibility at this locus.
The present study represents the largest, most comprehensive investigation of the association between putative miRSNPs in the 3′ untranslated region and cancer risk. This and the systematic follow-up to evaluate associations with EOC risk for non-miRSNPs in the region identified 17q21.31 as a new susceptibility locus for EOC. Although the miRSNPs identified here may have some biological significance, our findings suggest that other types of variants in non-coding DNA, especially non-miRSNPs at the 17q21.31 locus, are stronger contributors to EOC risk. It is possible, however, that highly significant miRSNPs exist that were not identified in our study because a) they were not pre-selected for evaluation (i.e. they do not reside in a binding site involving miRNAs or genes with known relevance to EOC, or they reside in regions other than the 3′UTR3,4) and/or b) they were very rare and could not be designed or detected with our genotyping platform and sample size, respectively. Despite these limitations, the homogeneity between studies of varying designs and populations in the OCAC and the genome-wide levels of statistical significance imply that all detected associations are robust. Furthermore, molecular correlative analyses of genes within the region suggest that cis-acting genetic variants influencing non-coding DNA regulatory elements, miRNAs, and/or methylation underlie disease susceptibility at the 17q21.31 locus. Finally, these studies point to a subset of candidate genes (i.e. PLEKHM1, ARHGAP27) and transcription factors (i.e. ZEB1) that may influence EOC initiation and development.
This novel locus is one of eleven loci now identified that contains common genetic variants conferring low penetrance susceptibility to EOC in the general population 17,32,33,34. Genetic variants at several of these loci influence risks of more than one cancer type, suggesting that several cancers may share common mechanisms. For example, alleles at 5p15.33 and 19p13.1 are associated with estrogen-receptor-negative breast cancer and serous EOC susceptibility 32,35, and variants at 8q24 are associated with risk of EOC and other cancers 17,36. Genetic variation at 17q21.31 is also associated with frontotemporal dementia-spectrum disorders, Parkinson's disease, developmental delay, and alopecia 37-42. Through COGS, the CIMBA also recently identified 17q21.31 variants as modifying EOC risk in BRCA1 and BRCA2 carriers (P<10−8 in BRCA1/2 combined)16. In particular, rs17631303, which is perfectly correlated with rs2077606 and rs12942666, was among the top-ranking SNPs detected by CIMBA16. Consistent with our findings, CIMBA also provide data that suggests EOC risk is associated with altered expression of one or more genes in the 17q21.31 region16. Thus, results from this large-scale collaboration support a role for this locus in both BRCA1/2 and non-BRCA1/2 mediated EOC development. Before these findings can be integrated with variants from other confirmed loci and non-genetic factors to predict women at greatest risk of developing EOC and provide options for medical management of these risks, continued efforts will be needed to fine map the 17q21.31 region and to fully characterize the functional and mechanistic effects of potential causal SNPs in disease etiology and development.
Forty-three individual OCAC studies contributed samples and data to the COGS initiative. Nine of the 43 participating studies were case-only (GRR, HSK, LAX, ORE, PVD, RMH, SOC, SRO, UKR); cases from these studies were pooled with case-control studies from the same geographic region. The two national Australian case-control studies were combined into a single study to create 34 case-control sets. Details regarding the 43 participating OCAC studies are summarized in Supplementary Table S1. Briefly, cases were women diagnosed with histologically confirmed primary EOC (invasive or low malignant potential), fallopian tube cancer, or primary peritoneal cancer ascertained from population- and hospital-based studies and cancer registries. The majority of OCAC cases (>90%) do not have a family history of ovarian or breast cancer in a first-degree relative, and most have not been tested for BRCA1/2 mutations as part of their parent study. Controls were women without a current or prior history of ovarian cancer with at least one ovary intact at the reference date. All studies had data on disease status, age at diagnosis/interview, self-reported racial group, and histologic subtype. Most studies frequency-matched cases and controls on age-group and race.
To increase the likelihood of identifying miRSNPs with biological relevance to EOC, we reviewed published literature and consulted public databases to generate two lists of candidate genes: 1) 55 miRNAs reported to be deregulated in EOC tumors compared to normal tissue in at least one study 43-46, and 2) 665 genes implicated in the pathogenesis of EOC through gene expression analyses 47,48, somatic mutations 49, or genetic association studies 50,51. Many genes were identified through the Gene Prospector database51, a web-based application that selects and prioritizes potential disease-related genes using a highly curated, up-to-date database of genetic association studies.
Using each candidate gene list as input, we identified putative sites of miRNA:mRNA binding with the computational prediction algorithms TargetScan version 5.1 10,11 and PicTar 12,13 (see Supplementary Methods). Each algorithm generated start and end coordinates for regions of miRNA binding, and database SNP (dbSNP)52 version 129 was mined to identify SNPs falling within the designated binding regions. Of 3,246 unique miRSNPs that were identified, 1,102 obtained adequate design scores using Illumina's Assay Design Tool. The majority (n=1,085, 98.5%) of the 1,102 SNPs resided in predicted sites of miRNA binding (and therefore represent miRSNPs), while the remainder (n=17) are tagSNPs (r2 > 0.80) for miRSNPs that were not designable or had poor to moderate design scores. Ninety-nine of the 1,102 SNPs failed during custom assay development, leaving a total of 1,003 SNPs that were designed and genotyped.
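The step of intersecting predicted binding regions with SNP positions can be sketched as a simple interval lookup. This is a minimal illustration only: the record layouts, the function name find_mirsnps, the inclusive treatment of region ends, and all coordinates and labels are assumptions, and the sketch does not reproduce the output formats of TargetScan, PicTar, or dbSNP.

```python
from typing import Dict, List, Tuple

# Hypothetical predicted binding region: (chromosome, start, end, miRNA, gene).
BindingSite = Tuple[str, int, int, str, str]
# Hypothetical SNP record: (rsid, chromosome, position).
Snp = Tuple[str, str, int]


def find_mirsnps(sites: List[BindingSite], snps: List[Snp]) -> Dict[str, List[Tuple[str, str]]]:
    """Map each SNP lying inside a binding region (ends inclusive) to its (miRNA, gene) pairs."""
    hits: Dict[str, List[Tuple[str, str]]] = {}
    for rsid, chrom, pos in snps:
        for site_chrom, start, end, mirna, gene in sites:
            if chrom == site_chrom and start <= pos <= end:
                hits.setdefault(rsid, []).append((mirna, gene))
    return hits


# Illustrative coordinates and placeholder names, not real predictions.
example_sites = [
    ("chr17", 1000, 1022, "miR-A", "GENE1"),
    ("chr17", 5000, 5021, "miR-B", "GENE2"),
]
example_snps = [("rsA", "chr17", 1010), ("rsB", "chr17", 3000), ("rsC", "chr1", 1010)]
example_hits = find_mirsnps(example_sites, example_snps)
```

The nested loop is adequate for a few thousand regions; a sorted index or interval tree would be the natural replacement at genome scale.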
The candidate miRSNPs selected for the current investigation were genotyped using a custom Illumina Infinium iSelect Array as part of the international Collaborative Oncological Gene-environment Study (COGS), an effort to evaluate 211,155 genetic variants for association with the risk of ovarian, breast, and prostate cancer. Samples and data were included from several consortia, including OCAC, the Breast Cancer Association Consortium (BCAC), the Consortium of Investigators of Modifiers of BRCA1/2 (CIMBA), and the Prostate Cancer Association Group to Investigate Cancer-Associated Alterations in the Genome (PRACTICAL). Although one of the primary goals of COGS was to replicate and fine-map findings from pooled genome-wide association studies (GWAS) from each consortium, this effort also aimed to genotype candidate SNPs of interest (such as the miRSNPs). The genotyping and QC process has been described recently in our report of OCAC's pooled GWAS findings34. Briefly, COGS genotyping was conducted at six centers, two of which were used for OCAC samples: McGill University and Génome Québec Innovation Centre (Montréal, Canada) (n=19,806) and Mayo Clinic Medical Genomics Facility (n=27,824). Each 96-well plate contained 250ng genomic DNA (or 500 ng whole genome-amplified DNA). Raw intensity data files were sent to the COGS data coordination center at the University of Cambridge for genotype calling and QC using the GenCall algorithm.
One thousand two hundred and seventy-three OCAC samples were genotyped in duplicate. Genotypes were discordant for greater than 40 percent of SNPs for 22 pairs. For the remaining 1,251 pairs, concordance was greater than 99.6 percent. In addition, we identified 245 pairs of samples that were unexpected genotypic duplicates. Of these, 137 were phenotypic duplicates and judged to be from the same individual. We used identity-by-state to identify 618 pairs of first-degree relatives. Samples were excluded according to the following criteria: 1) 1,133 samples with a conversion rate (the proportion of SNPs successfully called per sample) of less than 95 percent; 2) 169 samples with heterozygosity >5 standard deviations from the intercontinental ancestry specific mean heterozygosity; 3) 65 samples with ambiguous sex; 4) 269 samples with the lowest call rate from a first-degree relative pair; and 5) 1,686 samples that were either duplicate samples that were non-concordant for genotype or genotypic duplicates that were not concordant for phenotype. A total of 44,308 eligible subjects including 18,174 cases and 26,134 controls were available for analysis.
The process of SNP selection by the participating consortia has been summarized previously34. In total, 211,155 SNP assays were successfully designed, including 23,239 SNPs nominated by OCAC. Overall, 94.5% of OCAC-nominated SNPs passed QC. SNPs were excluded if: (1) the call rate was less than 95% with MAF > 5% or less than 99% with MAF < 5% (n=5,201); (2) they were monomorphic upon clustering (n=2,587); (3) p values of HWE in controls were less than 10−7 (n=2,914); (4) there was greater than 2% discordance in duplicate pairs (n=22); (5) no genotypes were called (n=1,311). Of 1,003 candidate miRSNPs genotyped, 767 passed QC criteria and were available for analysis; the majority of miRSNPs that were excluded were monomorphic (n=158, 67%). Genotype intensity cluster plots were visually inspected for the most strongly associated SNPs.
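The SNP exclusion criteria listed above can be expressed as a short filter. The sketch below is an illustration rather than the pipeline used by COGS: the function name, the order in which the criteria are checked, the use of a minor allele frequency of zero as a stand-in for "monomorphic upon clustering", and the handling of a MAF of exactly 5% are all assumptions.

```python
from typing import Optional


def snp_exclusion_reason(
    call_rate: float,
    maf: float,
    hwe_p_controls: float,
    duplicate_discordance: float,
) -> Optional[str]:
    """Return the first exclusion reason that applies, or None if the SNP passes QC.

    All inputs are proportions in [0, 1] except hwe_p_controls, which is a p value.
    """
    if call_rate == 0.0:
        return "no genotypes called"
    if maf == 0.0:  # assumption: MAF of zero used as a proxy for a monomorphic SNP
        return "monomorphic"
    # The text gives thresholds for MAF > 5% and MAF < 5%; treating MAF == 5%
    # as common is an assumption made here.
    required_call_rate = 0.95 if maf >= 0.05 else 0.99
    if call_rate < required_call_rate:
        return "low call rate"
    if hwe_p_controls < 1e-7:
        return "HWE deviation in controls"
    if duplicate_discordance > 0.02:
        return "discordance in duplicate pairs"
    return None


# Hypothetical SNP summaries, not values from the study.
rare_snp_status = snp_exclusion_reason(0.97, 0.02, 0.5, 0.0)
common_snp_status = snp_exclusion_reason(0.97, 0.20, 0.5, 0.0)
```

The same 97% call rate fails a rare SNP but passes a common one, which reflects the stricter threshold applied when few minor-allele carriers are expected.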
HapMap DNA samples for European (CEU, n=60), African (YRI, n=53) and Asian (JPT+CHB, n=88) populations were also genotyped using the COGS iSelect. We used the program LAMP 53 to estimate intercontinental ancestry based on the HapMap (release no. 23) genotype frequency data for these three populations. Eligible subjects with greater than 90 percent European ancestry were defined as European (n=39,773) and those with greater than 80 percent Asian or African ancestry were defined as Asian (n=2,382) or African respectively (n=387). All other subjects were defined as being of mixed ancestry (n=1,766). We then used a set of 37,000 unlinked markers to perform principal components analysis within each major population subgroup. To enable this analysis on very large sample sizes we used an in-house program written in C++ using the Intel MKL libraries for eigenvectors (available at http://ccge.medschl.cam.ac.uk/software/).
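The ancestry definitions above amount to a threshold rule on the estimated ancestry proportions. The following sketch assumes that each subject has three estimated proportions (European, Asian, African), as a program such as LAMP might supply; the function name and the example values are hypothetical.

```python
def classify_ancestry(european: float, asian: float, african: float) -> str:
    """Assign an ancestry group from estimated intercontinental ancestry proportions."""
    if european > 0.90:
        return "European"
    if asian > 0.80:
        return "Asian"
    if african > 0.80:
        return "African"
    return "mixed"


# Hypothetical proportions for four subjects.
example_groups = [
    classify_ancestry(0.97, 0.02, 0.01),
    classify_ancestry(0.10, 0.85, 0.05),
    classify_ancestry(0.12, 0.03, 0.85),
    classify_ancestry(0.60, 0.30, 0.10),
]
```

Because the European threshold is stricter than the other two, a subject with 85% European ancestry falls into the mixed group under this rule.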
We used unconditional logistic regression treating the number of minor alleles carried as an ordinal variable (log-additive model) to evaluate the association between each SNP and EOC risk. Separate analyses were carried out for each ancestry group. The model for European subjects was adjusted for population substructure by including the first five principal components (eigenvectors) from the principal components analysis as covariates. African- and Asian-ancestry-specific estimates were obtained after adjustment for the first two components representing each respective ancestry. Due to the heterogeneous nature of EOC, subgroup analysis was conducted to estimate genotype-specific odds ratios for serous carcinomas (the most predominant histologic subtype) and the three other main histological subtypes of EOC: endometrioid, mucinous, and clear cell. Separate analyses were also carried out for each study site, and site-specific ORs were combined using a fixed-effect meta-analysis. The I2 statistic was estimated to quantify the proportion of total variation due to heterogeneity across studies, and the heterogeneity of odds ratios between studies was tested with Cochran's Q statistic. The R statistical package ‘r-meta’ was used to generate forest plots. Statistical analysis was conducted in PLINK54.
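As an illustration of how site-specific estimates may be combined, the sketch below implements standard inverse-variance fixed-effect pooling of log odds ratios together with Cochran's Q and I2. It is not the code used in this study; the function name and the per-site estimates are hypothetical, and the 1.96 multiplier assumes a normal approximation for the 95% confidence interval.

```python
import math
from typing import List, Tuple


def fixed_effect_meta(log_ors: List[float], std_errs: List[float]) -> Tuple[float, float, float, float]:
    """Inverse-variance fixed-effect pooling of per-site log odds ratios.

    Returns (pooled log OR, pooled standard error, Cochran's Q, I^2 as a proportion).
    """
    weights = [1.0 / (se * se) for se in std_errs]
    total_weight = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, log_ors)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    # Cochran's Q: weighted squared deviations of site estimates from the pooled estimate.
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, log_ors))
    df = len(log_ors) - 1
    # I^2 = (Q - df) / Q, truncated at zero.
    i_squared = max(0.0, (q - df) / q) if q > 0 else 0.0
    return pooled, pooled_se, q, i_squared


# Hypothetical per-site estimates (not data from this study).
site_log_ors = [math.log(1.08), math.log(1.15), math.log(1.10)]
site_ses = [0.05, 0.08, 0.04]
pooled, pooled_se, q_stat, i2 = fixed_effect_meta(site_log_ors, site_ses)
pooled_or = math.exp(pooled)
ci_low = math.exp(pooled - 1.96 * pooled_se)
ci_high = math.exp(pooled + 1.96 * pooled_se)
```

Under the null hypothesis of homogeneity, Q is compared with a chi-squared distribution on (number of sites − 1) degrees of freedom; that test is omitted here to keep the sketch free of external dependencies.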
To increase genomic coverage, we imputed genotype data for the 17q21.31 region (chr17: 40,099,001-44,900,000, human genome build 37) with IMPUTE2.2 55 using phase 1 haplotype data from the January 2012 release of the 1000 Genomes Project data 15. For each imputed genotype the expected number of minor alleles carried was estimated (as weights). IMPUTE provides estimated allele dosage for SNPs that were not genotyped and for samples with missing data for directly genotyped SNPs. Imputation accuracy was estimated using an r2 quality metric. We excluded imputed SNPs from analysis where the estimated accuracy of imputation was low (r2<0.3).
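The expected number of minor alleles can be computed directly from the three posterior genotype probabilities produced by imputation. The sketch below assumes the probabilities are ordered as (homozygous major, heterozygous, homozygous minor); which allele is the minor one depends on the data and is an assumption here, as are the function name and example values.

```python
from typing import Sequence


def expected_dosage(probs: Sequence[float]) -> float:
    """Expected minor-allele count from (P(hom major), P(het), P(hom minor))."""
    p_hom_major, p_het, p_hom_minor = probs
    if min(probs) < 0.0 or sum(probs) > 1.0 + 1e-6:
        raise ValueError("genotype probabilities must be non-negative and sum to at most 1")
    # Heterozygotes carry one copy of the minor allele, minor homozygotes carry two.
    return p_het + 2.0 * p_hom_minor


# Hypothetical posterior probabilities for one subject at one imputed SNP.
example_dosage = expected_dosage((0.1, 0.3, 0.6))
```

Using the dosage as the predictor in the log-additive model carries imputation uncertainty into the association test, rather than forcing each subject to a single best-guess genotype.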
We performed the following assays for each gene in the one megabase region centered on the most significant SNP at the 17q21.31 locus (see Supplementary Methods): gene expression analysis in EOC cell lines (n=51) compared to normal cell lines from ovarian cancer precursor tissues (OCPTs)56, including ovarian surface epithelial cells (OSECs) and fallopian tube secretory epithelial cells (FTSECs) (n=73), and CpG island methylation analysis in high grade serous ovarian cancer (HGS-EOC) tissues (n=106) and normal tissues (n=7). Genes in the region were also evaluated in silico by mining publicly available molecular data generated for primary EOCs and other cancer types, including The Cancer Genome Atlas (TCGA) analysis of 568 HGS EOCs18, the Gene Expression Omnibus series GSE18520 dataset of 53 HGS EOCs 19, and the Catalogue Of Somatic Mutations In Cancer (COSMIC) database20.
We used these data to 1) compare gene expression between a) EOC cell lines and normal cell lines and b) tumor tissue and normal tissue from TCGA, 2) compare gene methylation status in HGS-EOCs and normal tissue, 3) conduct gene expression quantitative trait locus (eQTL) analyses to evaluate genotype-gene expression associations in normal OCPTs, lymphoblastoid cells, and HGS-EOCs, and 4) conduct methylation quantitative trait locus (mQTL) analyses in HGS-EOCs to evaluate genotype-gene methylation associations. Data from ENCyclopedia Of DNA Elements (ENCODE) 57 were used to evaluate the overlap between regulatory elements in non-coding regions and risk-associated SNPs. ENCODE describes regulatory DNA elements (e.g. enhancers, insulators and promoters) and non-coding RNAs (e.g. miRNAs, long non-coding and piwi-interacting RNAs) that may be targets for susceptibility alleles. However, ENCODE does not include data for EOC associated tissues, and activity of such regulatory elements often varies in a tissue specific manner 57,58. Therefore, we profiled the spectrum of non-coding regulatory elements in OSECs and FTSECs using a combination of formaldehyde assisted isolation of regulatory elements sequencing (FAIRE-seq) and RNA sequencing (RNA-seq) (Supplementary Methods).
We thank all the individuals who took part in this study and all the researchers, clinicians and administrative staff who have made possible the many studies contributing to this work. In particular, we thank: D. Bowtell, P. Webb, A. deFazio, D. Gertig, A. Green, P. Parsons, N. Hayward, and D. Whiteman (AUS); D. L. Wachter, S. Oeser, S. Landrith (BAV); G. Peuteman, T. Van Brussel and D. Smeets (BEL); the staff of the genotyping unit, S. LaBoissière and F. Robidoux (McGill University and Génome Québec Innovation Centre); U. Eilber and T. Koehler (GER); L. Gacucova (HMO); P. Schürmann, F. Kramer, T.-W. Park-Simon, K. Beer-Grondke and D. Schmidt (HJO); G.L. Keeney, C. Hilker and J. Vollenweider (MAY); the state cancer registries of AL, AZ, AR, CA, CO, CT, DE, FL, GA, HI, ID, IL, IN, IA, KY, LA, ME, MD, MA, MI, NE, NH, NJ, NY, NC, ND, OH, OK, OR, PA, RI, SC, TN, TX, VA, WA, and WYL (NHS); L. Paddock, M. King, U. Chandran, A. Samoila, and Y. Bensman (NJO); M. Insua and R. Evey (Moffitt); M. Sherman, A. Hutchinson, N. Szeszenia-Dabrowska, B. Peplonska, W. Zatonski, A. Soni, P. Chao and M. Stagner (POL); C. Luccarini, P. Harrington, the SEARCH team and ECRIC (SEA); the Scottish Gynaecological Clinical Trials Group and SCOTROC1 investigators (SRO); W-H. Chow, Y-T. Gao, G. Yang, B-T. Ji (SWH); I. Jacobs, M. Widschwendter, E. Wozniak, N. Balogun, A. Ryan and J. Ford (UKO); M. Notaridou (USC); C. Pye (UKR); and V. Slusher (U19).
The COGS project is funded through a European Commission Seventh Framework Programme grant (agreement number 223175 - HEALTH-F2-2009-223175). The Ovarian Cancer Association Consortium is supported by a grant from the Ovarian Cancer Research Fund thanks to donations by the family and friends of Kathryn Sladek Smith (PPD/RPCI.07). The scientific development and funding for this project were in part supported by the US National Cancer Institute (R01-CA-114343 and R01-CA114343-S1) and the Genetic Associations and Mechanisms in Oncology (GAME-ON): an NCI Cancer Post-GWAS Initiative (U19-CA148112).
This study made use of data generated by the Wellcome Trust Case Control Consortium. A full list of the investigators who contributed to the generation of the data is available from http://www.wtccc.org.uk/. Funding for the project was provided by the Wellcome Trust under award 076113. The results published here are in part based upon data generated by The Cancer Genome Atlas Pilot Project established by the National Cancer Institute and National Human Genome Research Institute. Information about TCGA and the investigators and institutions who constitute the TCGA research network can be found at http://cancergenome.nih.gov/.
D.F.E. is a Principal Research Fellow of Cancer Research UK. G.C.-T. and P.M.W. are supported by the National Health and Medical Research Council. B.K. holds an American Cancer Society Early Detection Professorship (SIOP-06-258-01-COUN). L.E.K. is supported by a Canadian Institutes of Health Research Investigator award (MSH-87734). A.C.A. is a Cancer Research UK Senior Cancer Research Fellow.
Funding of the constituent studies was provided by the American Cancer Society (CRTG-00-196-01-CCE); the California Cancer Research Program (00-01389V-20170, N01-CN25403, 2II0200); the Canadian Institutes for Health Research (MOP-86727); Cancer Council Victoria; Cancer Council Queensland; Cancer Council New South Wales; Cancer Council South Australia; Cancer Council Tasmania; Cancer Foundation of Western Australia; the Cancer Institute of New Jersey; Cancer Research UK (C490/A6187, C490/A10119, C490/A10124, C536/A13086, C536/A6689); the Celma Mastry Ovarian Cancer Foundation; the Danish Cancer Society (94-222-52); the Norwegian Cancer Society, Helse Vest, the Norwegian Research Council; ELAN Funds of the University of Erlangen-Nuremberg; the Eve Appeal; the Helsinki University Central Hospital Research Fund; Imperial Experimental Cancer Research Centre (C1312/A15589); the Ovarian Cancer Research Fund; Nationaal Kankerplan of Belgium; Grant-in-Aid for the Third Term Comprehensive 10-Year Strategy for Cancer Control from the Ministry of Health Labour and Welfare of Japan; the L & S Milken Foundation; the Radboud University Nijmegen Medical Centre; the Polish Ministry of Science and Higher Education (4 PO5C 028 14, 2 PO5A 068 27); the Roswell Park Cancer Institute Alliance Foundation; the US National Cancer Institute (K07-CA095666, K07-CA143047, K22-CA138563, N01-CN55424, N01-PC067001, N01-PC035137, P01-CA017054, P01-CA087696, P50-CA105009, P50-CA136393, R01-CA014089, R01-CA016056, R01-CA017054, R01-CA049449, R01-CA050385, R01-CA054419, R01-CA058598, R01-CA058860, R01-CA061107, R01-CA061132, R01-CA063682, R01-CA064277, R01-CA067262, R01-CA071766, R01-CA074850, R01-CA076016, R01-CA080742, R01-CA080978, R01-CA087538, R01-CA092044, R01-095023, R01-CA106414, R01-CA122443, R01-CA136924, R01-CA112523, R01-CA114343, R01-CA126841, R01-CA149429, R03-CA113148, R03-CA115195, R37-CA070867, R37-CA70867, R01-CA83918, U01-CA069417, U01-CA071966, P30-CA15083, PSA 042205, and Intramural research funds); the US Army Medical Research and Materiel Command (DAMD17-98-1-8659, DAMD17-01-1-0729, DAMD17-02-1-0666, DAMD17-02-1-0669, W81XWH-10-1-02802); the Department of Defense Ovarian Cancer Research Program (W81XWH-07-1-0449); the National Health and Medical Research Council of Australia (199600 and 400281); the German Federal Ministry of Education and Research Programme of Clinical Biomedical Research (01 GB 9401); the state of Baden-Württemberg through Medical Faculty of the University of Ulm (P.685); the German Cancer Research Center; Pomeranian Medical University; the Minnesota Ovarian Cancer Alliance; the Mayo Foundation; the Fred C. and Katherine B. Andersen Foundation; the Malaysian Ministry of Higher Education (UM.C/HlR/MOHE/06) and Cancer Research Initiatives Foundation; the Lon V. Smith Foundation (LVS-39420); the Oak Foundation; the OHSU Foundation; the Mermaid I project; the Rudolf-Bartling Foundation; the UK National Institute for Health Research Biomedical Research Centres at the University of Cambridge, Imperial College London, University College Hospital “Women's Health Theme” and the Royal Marsden Hospital; and WorkSafeBC.
Access to genotype data for SNPs that were not nominated by OCAC was provided by FJC and ACA on behalf of CIMBA BRCA1 GWAS investigators and was funded by the US National Cancer Institute (R01 CA128978) and a US Department of Defense Ovarian Cancer Idea award (W81XWH-10-1-0341). CIMBA BRCA1 GWAS investigators include: Irene L. Andrulis, Ontario Cancer Genetics Network, Cancer Care Ontario and Samuel Lunenfeld Research Institute, Mount Sinai Hospital, Toronto, Canada; Trinidad Caldes, Molecular Oncology Laboratory, Hospital Clínico San Carlos, Madrid, Spain; Maria Adelaide Caligo, Section of Genetic Oncology, University Hospital of Pisa, Pisa, Italy; Olga Sinilnikova for GEMO Study Collaborators, Cancer Genetics Network “Groupe Genetique et Cancer”, Federation Nationale des Centres de Lutte Contre le Cancer, Lyon, France; Thomas V. O. Hansen, Genomic Medicine, Department of Clinical Biochemistry, Rigshospitalet, Copenhagen University Hospital, Copenhagen, Denmark; Matti Rookus and Frans Hogervorst for HEBON Investigators, Department of Epidemiology, The Netherlands Cancer Institute, Amsterdam, The Netherlands; Anna Jakubowska and Jan Lubinski, International Hereditary Cancer Center, Department of Genetics and Pathology, Pomeranian Medical University, Szczecin, Poland; Susan Peock for EMBRACE Investigators, Centre for Cancer Genetic Epidemiology, Department of Public Health and Primary Care, University of Cambridge, Strangeways Research Laboratory, Worts Causeway, Cambridge, United Kingdom; Georgia Chenevix-Trench for kConFab Investigators, Peter MacCallum Cancer Centre, East Melbourne, Victoria, Australia; BCFR Investigators, Breast Cancer Family Registry, Epidemiology and Genetics Research Program, DCCPS, National Cancer Institute, Rockville, Maryland, USA; Katherine L. Nathanson and Susan Domchek, Departments of Medicine and Medical Genetics and Abramson Cancer Center, University of Pennsylvania School of Medicine, Philadelphia, Pennsylvania, USA; Heli Nevanlinna, Department of Obstetrics and Gynecology, Helsinki University Central Hospital, Helsinki, Finland; Kenneth Offit, Memorial Sloan Kettering Cancer Center, New York, New York, USA; Ana Osorio and Javier Benitez, Human Genetics Group, Human Cancer Genetics Programme, Spanish National Cancer Centre, Madrid, Spain; Paolo Radice for CONSIT investigators, Unit of Molecular Bases of Genetic Risk and Genetic Testing, Department of Preventive and Predictive Medicine, Fondazione IRCCS Istituto Nazionale Tumori (INT); IFOM, Fondazione Istituto FIRC di Oncologia Molecolare, Milan, Italy; Christian F. Singer, Division of Special Gynecology, Medical University of Vienna, Vienna, Austria; SWE-BRCA Investigators, Karolinska University Hospital, Stockholm, Sweden; Rita Schmutzler for GC-HBOC investigators, Center of Familial Breast and Ovarian Cancer, Department of Obstetrics and Gynaecology and Center for Integrated Oncology (CIO), University of Cologne, Cologne, Germany; Andrew Godwin, Department of Pathology and Laboratory Medicine, University of Kansas Medical Center, Kansas City, Kansas; Ignacio Blanco and Conxi Lazaro, Hereditary Cancer Program, Instituto Catalan de Oncología, Barcelona, Spain; Marco Montagna, Immunology and Molecular Oncology Unit, Istituto Oncologico Veneto IOV–IRCCS, Padua, Italy; Mary S. Beattie, Cancer Risk Program, Departments of Medicine, Epidemiology, and Biostatistics, University of California at San Francisco, San Francisco, California, USA; Antonis C. Antoniou and Douglas F. Easton, Centre for Cancer Genetic Epidemiology, University of Cambridge, Cambridge, United Kingdom; Fergus J. Couch, Mayo Clinic, Rochester, Minnesota, USA.
Author Contributions: These authors contributed equally to this work: JPW, KL, and HCS; and ANAM, TAS, and SAG.
Writing group: JPW, KL, HCS, AV, ANAM, SAG, TAS, ELG, BLF, SJR, and PDPP.
All authors read and approved the final version of the manuscript.
Provision of data and/or samples from contributing studies and institutions: JPW, KL, HCS, AV, JT, ZC, H-YL, YAC, Y-YT, XQ, SJR, RK, JL, NL, MCL, KA, HA-C, NA, AA, SMA, FB, LB, EB, JBS, MWB, MJB, GB, NB, LAB, ABW, RB, RB, QC, IC, JCC, SC, GCT, JQC, MSC, GAC, LSC, FJC, DWC, JMC, ADM, ED, JAD, TD, AdB, MD, DFE, DE, RE, ABE, PAF, DAF, JMF, MGC, AGM, GGG, RMG, JGB, MTG, MG, BG, JG, PH, MH, PH, FH, PH, MH, CH, EH, SH, AJ, AJ, HJ, KK, BYK, SBK, LEK, LAK, FK, GK, CK, SKK, JK, DL, SL, JML, NDL, AL, DAL, DL, JL, BKL, JL, KHL, JL, GL, LFAGM, KM, VM, JRM, UM, FM, KBM, TN, SAN, LN, RBN, HN, SN, HN, KO, SHO, IO, JP, CLP, TP, LMP, MCP, EMP, PR, SPR, HAR, LRR, MAR, AR, IR, IKR, HBS, IS, GS, VS, X-OS, YBS, WS, HS, MCS, BS, DS, RS, S-HT, KLT, DCT, PJT, SST, AMvA, IV, RAV, DV, AV, SW-G, RPW, NW, ASW, EW, LRW, BW, YLW, AHW, Y-BX, HPY, WZ, AZ, FZ, CMP, EI, JMS, AB, BLF, ELG, PDPP, ANAM, TAS, and SAG.
Collated and organized samples for genotyping: SJR and CMP.
Genotyping: JMC, DCT, FB, and DV.
Data analysis: JPW, JT, H-YL, YAC, BLF, MLL, and Y-YT.
Functional analyses: SAG, ANAM, KL, HCS, AV, JL, RK, and SJR.
Bioinformatics support: ZC, XQ.