Common germline variation in the 5′ region proximal to precursor (pre-) miRNA gene sequences is evaluated for association with breast cancer risk and survival among African Americans and Caucasians.
We genotyped 9 single nucleotide polymorphisms (SNPs) within 6 miRNA gene regions previously associated with breast cancer, in 1972 cases and 1776 controls. In a race-stratified analysis using unconditional logistic regression, odds ratios (OR) and 95% confidence intervals (CI) were calculated to evaluate SNP association with breast cancer risk. Additionally, hazard ratios (HR) for breast cancer-specific mortality were estimated.
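As an illustration of the quantities named above, the following minimal sketch computes an unadjusted odds ratio and a Wald 95% confidence interval from a 2×2 table. For a single binary predictor this matches what unconditional logistic regression would estimate without covariates. The function name, argument layout, and example counts are hypothetical and are not taken from the study.

```python
import math

def odds_ratio_wald_ci(a, b, c, d, z=1.96):
    """Unadjusted odds ratio with a Wald confidence interval.

    a: cases carrying the variant      b: cases without the variant
    c: controls carrying the variant   d: controls without the variant
    (hypothetical layout, for illustration only)
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the usual large-sample approximation
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper

# Made-up counts, not study data
print(odds_ratio_wald_ci(10, 20, 20, 10))
```

A covariate-adjusted or race-stratified estimate, as described in the text, would require fitting a regression model rather than using this closed form.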
Two miR-185 SNPs provided suggestive evidence of an inverse association with breast cancer risk among African Americans (rs2008591, OR = 0.72 (95% CI = 0.53 – 0.98, p-value = 0.04) and rs887205, OR = 0.71 (95% CI = 0.52 – 0.96, p-value = 0.03), respectively). Two SNPs, miR-34b/34c (rs4938723, HR = 0.57 (95% CI = 0.37 – 0.89, p-value = 0.01)) and miR-206 (rs6920648, HR = 0.77 (95% CI = 0.61 – 0.97, p-value = 0.02)), provided evidence of association with breast cancer survival. Further adjustment for stage resulted in more modest associations with survival (HR = 0.65 (95% CI = 0.42 – 1.02, p-value = 0.06) and HR = 0.79 (95% CI = 0.62 – 1.00, p-value = 0.05), respectively).
Our results suggest that germline variation in the 5′ region proximal to pre-miRNA gene sequences may be associated with breast cancer risk among African Americans and with breast cancer-specific survival generally; however, further validation is needed to confirm these findings.
microRNA; breast cancer; germline; single nucleotide polymorphism; risk; survival
To determine a molecular basis for prognostic differences in glioblastoma multiforme (GBM), we employed a combinatorial network analysis framework to exhaustively search for molecular patterns in protein-protein interaction (PPI) networks. We identified a dysregulated molecular signature distinguishing short-term (survival<225 days) from long-term (survival>635 days) survivors of GBM using whole genome expression data from The Cancer Genome Atlas (TCGA). A 50-gene subnetwork signature achieved 80% prediction accuracy when tested against an independent gene expression dataset. Functional annotations for the subnetwork signature included “protein kinase cascade,” “IκB kinase/NFκB cascade,” and “regulation of programmed cell death” – all of which were not significant in signatures of existing subtypes. Finally, we used label-free proteomics to examine how our subnetwork signature predicted protein level expression differences in an independent GBM cohort of 16 patients. We found that the genes discovered using network biology had a higher probability of dysregulated protein expression than either genes exhibiting individual differential expression or genes derived from known GBM subtypes. In particular, the long-term survivor subtype was characterized by increased protein expression of DNM1 and MAPK1 and decreased expression of HSPA9, PSMD3, and CANX. Overall, we demonstrate that the combinatorial analysis of gene expression data constrained by PPIs outlines an approach for the discovery of robust and translatable molecular signatures in GBM.
Glioblastoma multiforme (GBM) is the most common and aggressive brain tumor in adults, and, while the median survival time for treated patients is approximately one year, subgroups of patients respond differently to the same treatments, with some patients showing little improvement and other patients living far longer than expected. These differences in treatment response indicate that the tumors may show molecular differences that we can harness to tailor cancer therapy. To this end, we sought to identify biomarkers of patient survival in GBM. To improve the applicability of our molecular markers to other patient groups, we constrained our markers using maps of protein-protein interactions, and we also employed a unique computational strategy that incorporates patient-to-patient molecular variability into the results. We identified a set of 50 genes comprising a subnetwork signature that successfully separated GBM patients by their survival times. Our approach to identifying this subnetwork signature also improved our ability to identify its protein products in an independent cohort of patients. In the ongoing search to improve cancer detection and treatment, our work represents a successful strategy for identifying reproducible biomarkers that can more efficiently lead to the discovery of druggable protein targets.
The risk of glioma has consistently been shown to be increased two-fold in relatives of patients with primary brain tumors (PBT). A recent genome-wide linkage study of glioma families provided evidence for a disease locus on 17q12-21.32, with the possibility of four additional risk loci at 6p22.3, 12p13.33-12.1, 17q22-23.2, and 18q23.
To identify the underlying genetic variants responsible for the linkage signals, we compared the genotype frequencies of 5,122 SNPs mapping to these five regions in 88 glioma cases with and 1,100 cases without a family history of PBT (discovery study). An additional series of 84 familial and 903 non-familial cases were used to replicate associations.
In the discovery study, 12 SNPs showed significant associations with family history of PBT (P < 0.001). In the replication study, two of the 12 SNPs were confirmed: 12p13.33-12.1 PRMT8 rs17780102 (P = 0.031) and 17q12-21.32 SPOP rs650461 (P = 0.025). In the combined analysis of discovery and replication studies, the strongest associations were attained at four SNPs: 12p13.33-12.1 PRMT8 rs17780102 (P = 0.0001), SOX5 rs7305773 (P = 0.0001) and STKY1 rs2418087 (P = 0.0003), and 17q12-21.32 SPOP rs6504618 (P = 0.0006). Further, a significant gene-dosage effect was found for increased risk of family history of PBT with these four SNPs in the combined data set (Ptrend < 1.0 ×10−8).
The results support the linkage finding that some loci in the 12p13.33-12.1 and 17q12-21.32 regions may contribute to gliomagenesis and suggest potential target genes underlying the linkage signals.
Association; Polymorphisms; Glioma; Family history of primary brain tumor; Linkage analysis
One-fifth of all newly diagnosed breast cancer cases are ductal carcinoma in situ (DCIS), but little is known about DCIS risk factors. Recent studies suggest that some subtypes of DCIS (high grade, or comedo) share histopathologic and epidemiologic characteristics with invasive disease, while others (medium or low grade, or non-comedo) show different patterns. To investigate whether reproductive and hormonal risk factors differ among comedo and non-comedo types of DCIS and invasive breast cancer, we used a population-based case-control study of 1808 invasive and 446 DCIS breast cancer cases and their age and race frequency-matched controls (1564 invasive and 458 DCIS). Three or more full-term pregnancies showed a strong inverse association with comedo-type DCIS (odds ratio (OR) = 0.53, 95% confidence interval (CI) = 0.30, 0.95) and a weaker inverse association for non-comedo DCIS (OR = 0.73, 95% CI = 0.42, 1.27). Several risk factors (age at first full-term pregnancy, breastfeeding, and age at menopause) demonstrated similar associations for comedo-type DCIS and invasive breast cancer, but different associations for non-comedo DCIS. Ten or more years of oral contraceptive use showed a positive association with comedo-type DCIS (OR = 1.31, 95% CI = 0.70, 2.47) and invasive breast cancer (OR = 2.33, 95% CI = 1.06, 5.09), but an inverse association for non-comedo DCIS (OR = 0.51, 95% CI = 0.25, 1.04). Our results support the theory that comedo-type DCIS may share hormonal and reproductive risk factors with invasive breast cancer, while the etiology of non-comedo DCIS deserves further investigation.
ductal carcinoma in situ; risk factors; breast cancer; epidemiology; reproductive
Primary malignant spinal glioma represents a significant clinical challenge due to the devastating effect on patient clinical outcomes seen in the majority of cases. As these tumors are infrequently encountered in any one center, there has been little population-based data analysis of their incidence patterns. The objective of this study was to use publicly available Surveillance, Epidemiology and End Results (SEER) program data to examine overall incidence and incidence patterns over time with regard to patient age at diagnosis, gender, race, primary site of tumor and histological subtype for patients diagnosed with primary malignant spinal cord gliomas between 1973 and 2006.
The study population of interest was limited to primary, malignant, pathologically confirmed spinal cord gliomas using data from the SEER 9 standard registries for patients diagnosed between 1973 and 2006. Variables of interest included age at diagnosis, gender, race, primary site of tumor, and histological subtype of tumor. The SEER*Stat 6.5.2 program was used to calculate frequencies, age-adjusted incidence rates with 95% confidence intervals, and annual percentage change (APC) statistics with 2-sided p-values. In addition, linear correlation coefficients (R2) were calculated for the time association stratified by variables of interest.
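To make the APC statistic concrete, the sketch below uses the common log-linear definition: regress the natural log of the annual rate on calendar year and report 100 × (exp(slope) − 1). It uses ordinary (unweighted) least squares as a simplifying assumption. The exact fitting options used in SEER*Stat for this study are not stated in the text, and the function name and example rates are hypothetical.

```python
import math

def annual_percent_change(years, rates):
    """APC from an unweighted least-squares fit of ln(rate) on year."""
    logs = [math.log(r) for r in rates]
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, logs))
    slope = sxy / sxx  # estimated change in ln(rate) per calendar year
    return 100.0 * (math.exp(slope) - 1.0)

# Made-up rates growing exactly 2% per year, not SEER data
years = list(range(2000, 2010))
rates = [0.10 * 1.02 ** i for i in range(10)]
print(round(annual_percent_change(years, rates), 4))
```

Rates of zero cannot be log-transformed, so sparse strata would need a different treatment than this sketch provides.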
The overall age-adjusted incidence rate for primary malignant spinal gliomas was 0.12 per 100,000 and increased significantly over the study time period (APC = 1.74; p-value = 0.0004; R2 = 0.36). Incidence was highest for patients diagnosed at ages 35–49 (0.17 per 100,000), males (0.14 per 100,000), Whites (0.13 per 100,000) and those who had ependymomas (0.07 per 100,000). Over the study period, the incidence of ependymomas increased significantly (APC = 3.17; p-value < 0.0001; R2 = 0.58) as did the incidence of these tumors in Whites (APC = 2.13; p-value = 0.001) and for both males (APC = 1.90, p-value < 0.0001) and females (APC = 1.60, p-value < 0.0001). No significant changes in incidence over time by age of diagnosis were found.
This study demonstrates an increasing overall incidence of primary, malignant spinal cord glioma over the past three decades. Notably, the incidence of ependymoma has increased, while the incidence of most other glioma subtypes remained stable. This may be due to improved diagnostic and surgical techniques, changes in histological classification criteria, and changes in neuro-pathology diagnostic criteria. Although these tumors are rare, an improved understanding of their incidence will assist investigators and clinicians in planning potential studies and preparing for allocation of resources to care for these challenging patients.
Spinal cord glioma; incidence; patterns over time; population-based; SEER
Epithelial ovarian cancer (EOC) has a heritable component that remains to be fully characterized. Most identified common susceptibility variants lie in non-protein-coding sequences. We hypothesized that variants in the 3′ untranslated region at putative microRNA (miRNA) binding sites represent functional targets that influence EOC susceptibility. Here, we evaluate the association between 767 miRNA binding site single nucleotide polymorphisms (miRSNPs) and EOC risk in 18,174 EOC cases and 26,134 controls from 43 studies genotyped through the Collaborative Oncological Gene-environment Study. We identify several miRSNPs associated with invasive serous EOC risk (OR=1.12, P=10−8) mapping to an inversion polymorphism at 17q21.31. Additional genotyping of non-miRSNPs at 17q21.31 reveals stronger signals outside the inversion (P=10−10). Variation at 17q21.31 associates with neurological diseases, and our collaboration is the first to report an association with EOC susceptibility. An integrated molecular analysis in this region provides evidence for ARHGAP27 and PLEKHM1 as candidate EOC susceptibility genes.
Purpose: An estimated 24%–45% of patients with cancer develop brain metastases. Individualized estimation of survival for patients with brain metastasis could be useful for counseling patients on clinical outcomes and prognosis. Methods: De-identified data for 2367 patients with brain metastasis from 7 Radiation Therapy Oncology Group randomized trials were used to develop and internally validate a prognostic nomogram for estimation of survival among patients with brain metastasis. The prognostic accuracy for survival from 3 statistical approaches (Cox proportional hazards regression, recursive partitioning analysis [RPA], and random survival forests) was calculated using the concordance index. A nomogram for 12-month, 6-month, and median survival was generated using the most parsimonious model. Results: The majority of patients had lung cancer, controlled primary disease, no surgery, Karnofsky performance score (KPS) ≥ 70, and multiple brain metastases and were in RPA class II or had a Diagnosis-Specific Graded Prognostic Assessment (DS-GPA) score of 1.25–2.5. The overall median survival was 136 days (95% confidence interval, 126–144 days). We built the nomogram using the model that included primary site and histology, status of primary disease, metastatic spread, age, KPS, and number of brain lesions. The potential use of individualized survival estimation is demonstrated by showing the heterogeneous distribution of the individual 12-month survival in each RPA class or DS-GPA score group. Conclusion: Our nomogram provides individualized estimates of survival, compared with current RPA and DS-GPA group estimates. This tool could be useful for counseling patients with respect to clinical outcomes and prognosis.
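The concordance index used above to compare the three approaches can be sketched as follows. This is a minimal version of Harrell's C for right-censored data, in which a pair of patients is comparable when the one with the shorter follow-up time had an observed event. The function name and toy inputs are hypothetical, pairs with tied times are simply skipped, and this is not the trial analysis code.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C: fraction of comparable pairs ranked correctly.

    times: follow-up time per patient
    events: 1 if death was observed, 0 if censored
    risk_scores: higher score means higher predicted risk (shorter survival)
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # Patient i must have an observed event before patient j's time
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable

# Toy data: perfectly ordered predictions
print(concordance_index([1, 2, 3, 4], [1, 1, 1, 0], [4, 3, 2, 1]))
```

A value of 0.5 corresponds to predictions no better than chance, and 1.0 to perfect ranking of survival times.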
brain metastases; nomogram; prediction; prognosis; survival
Following a colonoscopy that is negative for cancer, a subset of patients may be diagnosed with colorectal cancer, also termed interval cancer. The frequency and predictors have not been well studied in a population-based U.S. cohort.
Using the linked SEER-Medicare database, we identified 57,839 patients aged ≥ 69 with colorectal cancer diagnosed between 1994 and 2005 and who underwent colonoscopy within 6 months of cancer diagnosis. Colonoscopy performed between 36 and 6 months prior to cancer diagnosis was a proxy for interval cancer.
Using the case definition, 7.2% of patients developed interval cancers. Factors associated with interval cancers included proximal tumor location (distal colon multivariable OR 0.42, 95% CI 0.39–0.46; rectum OR 0.47, 95% CI 0.42–0.53), increased comorbidity (OR 1.89, 95% CI 1.68–2.14 for 3 or more comorbidities), a previous diagnosis of diverticulosis (OR 6.00, 95% CI 5.57–6.46), and prior polypectomy (OR 1.74, 95% CI 1.62–1.87). Risk factors at the endoscopist level included a lower polypectomy rate (OR 0.70, 95% CI 0.63–0.78 for the highest quartile), higher colonoscopy volume (OR 1.27, 95% CI 1.13–1.43) and specialty other than gastroenterology (colorectal surgery OR 1.45, 95% CI 1.16–1.83; general surgery OR 1.42, 95% CI 1.24–1.62; internal medicine OR 1.38, 95% CI 1.17–1.63; family practice OR 1.16, 95% CI 1.00–1.35).
A significant proportion of patients develop interval colorectal cancer, particularly in the proximal colon. Contributing factors likely include both procedural and biologic factors, and the findings emphasize the importance of meticulous examination of the mucosa.
We have previously established aberrant DNA methylation of Vimentin exon-1 (VIM methylation) as a common epigenetic event in colon cancer and as a biomarker for detecting colon neoplasia. We now examine VIM methylation in neoplasia of the upper gastrointestinal tract.
Using a quantitative real-time Methylation-Specific PCR assay we tested for VIM methylation in archival specimens of esophageal and gastric neoplasia.
We find that acquisition of aberrant VIM methylation is highly common in these neoplasms, but largely absent in controls. The highest frequency of VIM methylation was detected in lesions of the distal esophagus, including 91% of Barrett’s esophagus (BE, n=11), 100% of high grade dysplasia (HGD, n=5), and 81% of esophageal adenocarcinoma (EAC, n=26), but absent in controls (n=9). VIM methylation similarly was detected in 87% of signet ring (n=15) and 53% of intestinal type gastric cancers (n=17). Moreover, in tests of cytology brushings VIM methylation proved detectable in 100% of BE cases (n=7), 100% of HGD cases (n=4), and 83% of EAC cases (n=18), but was absent in all controls (n=5).
These findings establish aberrant VIM methylation as a highly common epigenetic alteration in neoplasia of the upper gastrointestinal tract, and demonstrate that Barrett’s esophagus, even without dysplasia, already contains epigenetic alterations characteristic of adenocarcinoma.
These findings suggest VIM methylation as a biomarker of upper gastrointestinal neoplasia with potential for development as molecular cytology in esophageal screening.
Barrett’s Esophagus; Esophageal Cancer; Gastric Cancer; Vimentin; Methylation
Lower serum vitamin D (25(OH)D) among individuals with African ancestry is attributed primarily to skin pigmentation. However, the influence of genetic polymorphisms controlling for skin melanin content has not been investigated. Therefore, we investigated differences in non-summer serum vitamin D metabolites according to self-reported race, genetic ancestry, skin reflectance and key pigmentation genes (SLC45A2 and SLC24A5).
Materials and Methods
Healthy individuals reporting at least half African American or half European American heritage were frequency matched to one another on age (+/− 2 years) and sex. 176 autosomal ancestry informative markers were used to estimate genetic ancestry. Melanin index was measured by reflectance spectrometry. Serum vitamin D metabolites (25(OH)D3, 25(OH)D2 and 24,25(OH)2D3) were determined by high performance liquid chromatography (HPLC) tandem mass spectrometry. Percent 24,25(OH)2D3 was calculated as a percent of the parent metabolite (25(OH)D3). Stepwise and backward selection regression models were used to identify leading covariates.
Fifty African Americans and 50 European Americans participated in the study. Compared with SLC24A5 111Thr homozygotes, individuals with the SLC24A5 111Thr/Ala and 111Ala/Ala genotypes had respectively lower levels of 25(OH)D3 (23.0 and 23.8 nmol/L lower, p-dominant=0.007), and percent 24,25(OH)2D3 (4.1 and 5.2 percent lower, p-dominant=0.003), controlling for tanning bed use, vitamin D/fish oil supplement intake, race/ethnicity, and genetic ancestry. Results were similar with melanin index adjustment, and were not confounded by glucocorticoid, oral contraceptive, or statin use.
The SLC24A5 111Ala allele was associated with lower serum vitamin 25(OH)D3 and lower percent 24,25(OH)2D3, independently from melanin index and West African genetic ancestry.
African Continental Ancestry Group; European Continental Ancestry Group; SLC24A5; 25-hydroxyvitamin D; 24,25-Dihydroxyvitamin D 3
Genetic influences may be discerned in families that have multiple affected members and may manifest as an earlier age of cancer diagnosis. In this study we determine whether cancers develop at an earlier age in multiplex Familial Barrett’s Esophagus (FBE) kindreds, defined by 3 or more members affected by Barrett’s esophagus (BE) or esophageal adenocarcinoma (EAC).
Information on BE/EAC risk factors and family history was collected from probands at eight tertiary care academic hospitals. Age of cancer diagnosis and other risk factors were compared between non-familial (no affected relatives), duplex (two affected relatives), and multiplex (three or more affected relatives) FBE kindreds.
The study included 830 non-familial, 274 duplex and 41 multiplex FBE kindreds with 274, 133 and 43 EAC and 566, 288 and 103 BE cases, respectively. Multivariable mixed models adjusting for familial correlations showed that multiplex kindreds were associated with a younger age of cancer diagnosis (p = 0.0186). Median age of cancer diagnosis was significantly younger in multiplex compared to duplex and non-familial kindreds (57 vs. 62 vs. 63 yrs, respectively, p = 0.0448). Mean body mass index (BMI) was significantly lower in multiplex kindreds (p = 0.0033) as was smoking (p < 0.0001), and reported regurgitation (p = 0.0014).
Members of multiplex FBE kindreds develop EAC at an earlier age compared to non-familial EAC cases. Multiplex kindreds do not have a higher proportion of common risk factors for EAC, suggesting that this aggregation might be related to a genetic factor.
These findings indicate that efforts to identify susceptibility genes for BE and EAC will need to focus on multiplex kindreds.
Esophageal adenocarcinoma; Barrett’s esophagus; genetics; family history
There is a critical need to identify molecular markers that can reliably aid in stratifying esophageal adenocarcinoma (EAC) risk in patients with Barrett's esophagus. MicroRNAs (miRNA/miR) are one such class of biomolecules. In the present cross-sectional study, we characterized miRNA alterations in progressive stages of neoplastic development, i.e., metaplasia–dysplasia–adenocarcinoma, with an aim to identify candidate miRNAs potentially associated with progression. Using next generation sequencing (NGS) as an agnostic discovery platform, followed by quantitative real-time PCR (qPCR) validation in a total of 20 EACs, we identified 26 miRNAs that are highly and frequently deregulated in EACs (≥4-fold in >50% of cases) when compared to paired normal esophageal squamous (nSQ) tissue. We then assessed the 26 EAC-derived miRNAs in laser microdissected biopsy pairs of Barrett's metaplasia (BM)/nSQ (n = 15), and high-grade dysplasia (HGD)/nSQ (n = 14) by qPCR, to map the timing of deregulation during progression from BM to HGD and to EAC. We found that 23 of the 26 candidate miRNAs were deregulated at the earliest step, BM, and therefore noninformative as molecular markers of progression. Two miRNAs, miR-31 and –31*, however, showed frequent downregulation only in HGD and EAC cases suggesting association with transition from BM to HGD. A third miRNA, miR-375, showed marked downregulation exclusively in EACs and in none of the BM or HGD lesions, suggesting its association with progression to invasive carcinoma. Taken together, we propose miR-31 and –375 as novel candidate microRNAs specifically associated with early- and late-stage malignant progression, respectively, in Barrett's esophagus.
Gliomas, which generally have a poor prognosis, are the most common primary malignant brain tumors in adults. Recent genome-wide association studies have demonstrated that inherited susceptibility plays a role in the development of glioma. Although first-degree relatives of patients exhibit a two-fold increased risk of glioma, the search for susceptibility loci in familial forms of the disease has been challenging because the disease is relatively rare, fatal, and heterogeneous, making it difficult to collect sufficient biosamples from families for statistical power. To address this challenge, the Genetic Epidemiology of Glioma International Consortium (Gliogene) was formed to collect DNA samples from families with two or more cases of histologically confirmed glioma. In this study, we present results obtained from 46 U.S. families in which multipoint linkage analyses were undertaken using nonparametric (model-free) methods. After removal of high linkage disequilibrium SNPs, we obtained a maximum nonparametric linkage score (NPL) of 3.39 (P=0.0005) at 17q12–21.32 and the Z-score of 4.20 (P=0.000007). To replicate our findings, we genotyped 29 independent U.S. families and obtained a maximum NPL score of 1.26 (P=0.008) and the Z-score of 1.47 (P=0.035). Accounting for the genetic heterogeneity using the ordered subset analysis approach, the combined analyses of 75 families resulted in a maximum NPL score of 3.81 (P=0.00001). The genomic regions we have implicated in this study may offer novel insights into glioma susceptibility, focusing future work to identify genes that cause familial glioma.
Glioma; family studies; linkage; haplotype pattern; NPL
In recent years, many algorithms have been developed for network-based analysis of differential gene expression in complex diseases. These algorithms use protein-protein interaction (PPI) networks as an integrative framework and identify subnetworks that are coordinately dysregulated in the phenotype of interest.
While such dysregulated subnetworks have demonstrated significant improvement over individual gene markers for classifying phenotype, the current state-of-the-art in dysregulated subnetwork discovery is almost exclusively limited to binary phenotype classes. However, many clinical applications require identification of molecular markers for multiple classes.
We consider the problem of discovering groups of genes whose expression signatures can discriminate multiple phenotype classes. We consider two alternate formulations of this problem: (i) an all-vs-all approach that aims to discover subnetworks distinguishing all classes, and (ii) a one-vs-all approach that aims to discover subnetworks distinguishing each class from the rest of the classes. For the one-vs-all formulation, we develop a set-cover based algorithm, which aims to identify groups of genes such that at least one gene in the group exhibits differential expression in the target class.
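The set-cover idea in the one-vs-all formulation can be illustrated with the standard greedy heuristic. The sketch below assumes one plausible reading of the description: each gene "covers" the target-class samples in which it is differentially expressed, and genes are added until every sample is covered by at least one gene. The function name and inputs are hypothetical, the PPI network constraint is omitted, and this is not the authors' algorithm.

```python
def greedy_gene_cover(covers, samples):
    """Greedy set cover over target-class samples.

    covers: dict mapping gene -> set of samples in which that gene is
            differentially expressed (hypothetical input format)
    samples: iterable of target-class samples that should be covered
    Returns the chosen genes and any samples left uncovered.
    """
    uncovered = set(samples)
    chosen = []
    while uncovered:
        # Pick the gene covering the most still-uncovered samples;
        # sorting makes tie-breaking deterministic (alphabetical).
        best = max(sorted(covers), key=lambda g: len(covers[g] & uncovered))
        gain = covers[best] & uncovered
        if not gain:
            break  # no remaining gene covers any uncovered sample
        chosen.append(best)
        uncovered -= gain
    return chosen, uncovered

# Toy example with made-up gene and sample labels
covers = {"g1": {"s1", "s2"}, "g2": {"s2", "s3"}, "g3": {"s3", "s4"}}
print(greedy_gene_cover(covers, ["s1", "s2", "s3", "s4"]))
```

A network-guided variant would presumably restrict each candidate to genes adjacent in the PPI network to those already chosen, which this sketch does not attempt.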
We test the proposed algorithms in the context of predicting stages of colorectal cancer. Our results show that the set-cover based algorithm identifying "stage-specific" subnetworks outperforms the all-vs-all approaches in classification. We also investigate the merits of utilizing PPI networks in the search for multiple markers, and show that, with correct parameter settings, network-guided search improves performance. Furthermore, we show that assessing statistical significance when selecting features greatly improves classification performance.
Single nucleotide polymorphisms (SNPs) in alcohol metabolism genes are associated with squamous cell carcinoma of the head and neck (SCCHN), and may influence cancer risk in conjunction with alcohol. Genetic variation in the oxidative stress pathway may impact the carcinogenic effect of reactive oxygen species produced by ethanol metabolism. We hypothesized that alcohol interacts with these pathways to affect SCCHN incidence.
Interview and genotyping data for 64 SNPs were obtained from 2552 European- and African-American subjects (1227 cases, 1325 controls) from the Carolina Head and Neck Cancer Epidemiology study, a population-based case-control study of SCCHN conducted in North Carolina from 2002–2006. We estimated odds ratios and 95% confidence intervals for SNPs and haplotypes, adjusting for age, sex, race, and duration of cigarette smoking. P-values were adjusted for multiple testing using Bonferroni correction.
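The Bonferroni correction mentioned above is simple to state in code: each p-value is multiplied by the number of tests and capped at 1. The sketch below is illustrative only. The function name and example p-values are hypothetical, and the number of tests the authors actually corrected for is not stated in the text.

```python
def bonferroni_adjust(p_values):
    """Bonferroni-adjusted p-values: p * m, capped at 1.0."""
    m = len(p_values)  # number of tests performed
    return [min(1.0, p * m) for p in p_values]

# Made-up p-values, not study results
print(bonferroni_adjust([0.01, 0.04, 0.5]))
```

Equivalently, the significance threshold can be divided by the number of tests. If all 64 SNPs were counted as tests, a nominal 0.05 threshold would become 0.05 / 64, or about 0.00078.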
Two SNPs were associated with SCCHN risk: ADH1B rs1229984 A allele (OR=0.7, 95%CI=0.6–0.9) and ALDH2 rs2238151 C allele (OR=1.2, 95%CI=1.1–1.4). Three were associated with sub-site tumors: ADH1B rs17028834 C allele (larynx, OR=1.5, 95%CI=1.1–2.0), SOD2 rs4342445 A allele (oral cavity, OR=1.3, 95%CI=1.1–1.6), and SOD2 rs5746134 T allele (hypopharynx, OR=2.1, 95%CI=1.2–3.7). Four SNPs in alcohol metabolism genes interacted additively with alcohol consumption: ALDH2 rs2238151, ADH1B rs1159918, ADH7 rs1154460, and CYP2E1 rs2249695. No alcohol interactions were found for oxidative stress SNPs.
Conclusions and Impact
Previously unreported associations of SNPs in ALDH2, CYP2E1, GPX2, SOD1, and SOD2 with SCCHN and sub-site tumors provide evidence that alterations in alcohol and oxidative stress pathways influence SCCHN carcinogenesis, and warrant further investigation.
Head and Neck Neoplasms; Head and Neck Neoplasms/epidemiology; Gene-environment interaction; Alcohol Drinking/metabolism; Oxidative Stress
Genetic epidemiological studies of complex diseases often rely on data from the International HapMap Consortium for identification of single nucleotide polymorphisms (SNPs), particularly those that tag haplotypes. However, little is known about the relevance of the African populations used to collect HapMap data for studies conducted elsewhere in Africa. Toll-like receptor (TLR) genes play a key role in susceptibility to various infectious diseases, including tuberculosis. We conducted full-exon sequencing in samples obtained from Uganda (n = 48) and South Africa (n = 48), in four genes in the TLR pathway: TLR2, TLR4, TLR6, and TIRAP. We identified one novel TIRAP SNP (with minor allele frequency [MAF] 3.2%) and a novel TLR6 SNP (MAF 8%) in the Ugandan population, and a TLR6 SNP that is unique to the South African population (MAF 14%). These SNPs were also not present in the 1000 Genomes data. Genotype and haplotype frequencies and linkage disequilibrium patterns in Uganda and South Africa were similar to those of African populations in the HapMap datasets. Multidimensional scaling analysis of polymorphisms in all four genes suggested broad overlap of all of the examined African populations. Based on these data, we propose that there is enough similarity among African populations represented in the HapMap database to justify initial SNP selection for genetic epidemiological studies in Uganda and South Africa. We also discovered three novel polymorphisms that appear to be population-specific and would only be detected by sequencing efforts.
Adipocytokines are produced by visceral fat, and levels may be associated with breast cancer risk. We investigated whether single nucleotide polymorphisms (SNPs) in adipocytokine genes adiponectin (ADIPOQ), leptin (LEP), and the leptin receptor (LEPR) were associated with basal-like or luminal A breast cancer subtypes. 104 candidate and tag SNPs were genotyped in 1776 of 2022 controls and 1972 (200 basal-like, 679 luminal A) of 2311 cases from the Carolina Breast Cancer Study (CBCS), a population-based case–control study of whites and African Americans. Breast cancer molecular subtypes were determined by immunohistochemistry. Genotype odds ratios (ORs) and 95% confidence intervals (CIs) were estimated using unconditional logistic regression. Haplotype ORs and 95% CIs were estimated using Hapstat. Interactions with waist-hip ratio were evaluated using a multiplicative interaction term. Ancestry was estimated from 144 ancestry informative markers (AIMs), and included in models to control for population stratification. Candidate SNPs LEPR K109R (rs1137100) and LEPR Q223R (rs1137101) were positively associated with luminal A breast cancer, whereas ADIPOQ +45 T/G (rs2241766), ADIPOQ +276 G/T (rs1501299), and LEPR K656N (rs8129183) were not associated with either subtype. Few patterns were observed among tag SNPs, with the exception of 3 LEPR SNPs (rs17412175, rs9436746, and rs9436748) that were in moderate LD and inversely associated with basal-like breast cancer. However, no SNP associations were statistically significant after adjustment for multiple comparisons. Haplotypes in LEP and LEPR were associated with both basal-like and luminal A subtypes. There was no evidence of interaction with waist-hip ratio. Data suggest associations between LEPR candidate SNPs and luminal A breast cancer in the CBCS and LEPR intron 2 tag SNPs and basal-like breast cancer. 
Replication in additional studies where breast cancer subtypes have been defined is necessary to confirm these potential associations.
Adiponectin; Leptin; Leptin receptor; Breast cancer; Subtypes; Single nucleotide polymorphism
Copy number variants (CNVs) have been implicated in many complex diseases. We examined whether inherited CNVs were associated with overall survival among women with invasive epithelial ovarian cancer. Germline DNA from 1,056 cases (494 deceased, average of 3.7 years follow-up) was interrogated with the Illumina 610 quad genome-wide array containing, after quality control exclusions, 581,903 single nucleotide polymorphisms (SNPs) and 17,917 CNV probes. Comprehensive analysis capitalized upon the strengths of three complementary approaches to CNV classification. First, to identify small CNVs, single markers were evaluated and, where associated with survival, consecutive markers were combined. Two chromosomal regions were associated with survival using this approach (14q31.3 rs2274736 p = 1.59 × 10−6, p = 0.001; 22q13.31 rs2285164 p = 4.01 × 10−5, p = 0.009), but were not significant after multiple testing correction. Second, to identify large CNVs, genome-wide segmentation was conducted to characterize chromosomal gains and losses, and association with survival was evaluated by segment. Four regions were associated with survival (1q21.3 loss p = 0.005, 5p14.1 loss p = 0.004, 9p23 loss p = 0.002, and 15q22.31 gain p = 0.002); however, again, after correcting for multiple testing, no regions were statistically significant, and none were in common with the single marker approach. Finally, to evaluate associations with general amounts of copy number changes across the genome, we estimated CNV burden based on genome-wide numbers of gains and losses; no associations with survival were observed (p > 0.40). Although CNVs that were not well-covered by the Illumina 610 quad array merit investigation, these data suggest no association between inherited CNVs and survival after ovarian cancer.
association testing; copy number variation; genotyping array; ovarian cancer; overall survival
The molecular behavior of biological systems can be described in terms of three fundamental components: (i) the physical entities, (ii) the interactions among these entities, and (iii) the dynamics of these entities and interactions. The mechanisms that drive complex disease can be productively viewed in the context of the perturbations of these components. One challenge in this regard is to identify the pathways altered in specific diseases. To address this challenge, Gene Set Enrichment Analysis (GSEA) and others have been developed, which focus on alterations of individual properties of the entities (such as gene expression). However, the dynamics of the interactions with respect to disease have been less well studied (i.e., properties of components ii and iii).
Here, we present a novel method called Gene Interaction Enrichment and Network Analysis (GIENA) to identify dysregulated gene interactions, i.e., pairs of genes whose relationships differ between disease and control. Four functions are defined to model the biologically relevant gene interactions of cooperation (sum of mRNA expression), competition (difference between mRNA expression), redundancy (maximum of expression), or dependency (minimum of expression) among the expression levels. The proposed framework identifies dysregulated interactions and pathways enriched in dysregulated interactions; points out interactions that are perturbed across pathways; and moreover, based on the biological annotation of each type of dysregulated interaction gives clues about the regulatory logic governing the systems level perturbation. We demonstrated the potential of GIENA using published datasets related to cancer.
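The four interaction functions described above can be illustrated with a minimal Python sketch. It assumes per-sample expression values for one gene pair and a disease/control label per sample. The function names and the use of Welch's t statistic to compare disease against control are illustrative assumptions, not the scoring used by GIENA itself.

```python
from math import sqrt
from statistics import mean, variance

# The four pairwise interaction functions named in the text.
INTERACTIONS = {
    "cooperation": lambda a, b: a + b,     # sum of expression
    "competition": lambda a, b: a - b,     # difference between expression
    "redundancy": lambda a, b: max(a, b),  # maximum of expression
    "dependency": lambda a, b: min(a, b),  # minimum of expression
}


def welch_t(x, y):
    """Welch's t statistic for two samples (an assumed scoring choice)."""
    denom = sqrt(variance(x) / len(x) + variance(y) / len(y))
    # Guard against two constant samples, where the statistic is undefined.
    return 0.0 if denom == 0 else (mean(x) - mean(y)) / denom


def interaction_scores(gene_a, gene_b, is_disease):
    """Score each interaction type for one gene pair.

    gene_a, gene_b: per-sample expression values of the two genes.
    is_disease: per-sample booleans (True = disease, False = control).
    Returns {interaction name: t statistic, disease vs control}.
    """
    scores = {}
    for name, fn in INTERACTIONS.items():
        values = [fn(a, b) for a, b in zip(gene_a, gene_b)]
        disease = [v for v, d in zip(values, is_disease) if d]
        control = [v for v, d in zip(values, is_disease) if not d]
        scores[name] = welch_t(disease, control)
    return scores


# Hypothetical toy data: the two genes swap roles between disease and control.
gene_a = [5.0, 6.0, 5.5, 6.5, 1.0, 1.5, 2.0, 1.2]
gene_b = [1.0, 1.5, 1.2, 2.0, 5.0, 6.0, 5.5, 6.2]
is_disease = [True] * 4 + [False] * 4
scores = interaction_scores(gene_a, gene_b, is_disease)
```

In this toy case neither the sum nor the maximum changes much between groups, while the difference does, so the "competition" score stands out. This is the kind of pairwise change that a single-gene summary can miss.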
We showed that GIENA identifies dysregulated pathways that are missed by traditional enrichment methods based on the individual gene properties and that use of traditional methods combined with GIENA provides coverage of the largest number of relevant pathways. In addition, using the interactions detected by GIENA, specific gene networks both within and across pathways associated with the relevant phenotypes are constructed and analyzed.
Gene-gene interaction; Dysregulated pathways; Enrichment analysis; BAD pathway
Mitochondria contribute to oxidative stress, a phenomenon implicated in ovarian carcinogenesis. We hypothesized that inherited variants in mitochondrial-related genes influence epithelial ovarian cancer (EOC) susceptibility.
Through a multi-center study of 1,815 Caucasian EOC cases and 1,900 controls, we investigated associations between EOC risk and 128 single nucleotide polymorphisms (SNPs) from 22 genes/regions within the mitochondrial genome (mtDNA) and 2,839 nuclear-encoded SNPs localized to 138 genes involved in mitochondrial biogenesis (BIO, n=35), steroid hormone metabolism (HOR, n=13), and oxidative phosphorylation (OXP, n=90) pathways. Unconditional logistic regression was used to estimate odds ratios (OR) and 95% confidence intervals (CI) between genotype and case status. Overall significance of each gene and pathway was evaluated using Fisher’s method to combine SNP-level evidence. At the SNP-level, we investigated whether lifetime ovulation, hormone replacement therapy (HRT), and cigarette smoking were confounders or modifiers of associations.
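As a rough illustration of Fisher's method for combining SNP-level evidence, the sketch below computes the combined statistic X = -2 Σ ln(p_i) and its chi-square tail probability with 2k degrees of freedom. The function name is hypothetical. The analytic p-value assumes independent tests, which SNPs in linkage disequilibrium violate. The study reports empirical P values, so the analytic tail here is only a simplified stand-in.

```python
from math import exp, log


def fisher_combined_p(p_values):
    """Combine p-values with Fisher's method (assumes independent tests).

    X = -2 * sum(ln p_i) follows a chi-square distribution with 2k degrees
    of freedom under the null, where k is the number of p-values.
    """
    k = len(p_values)
    if k == 0:
        raise ValueError("need at least one p-value")
    if any(not 0.0 < p <= 1.0 for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")
    half_x = -sum(log(p) for p in p_values)  # X / 2
    # For even degrees of freedom 2k the chi-square survival function has a
    # closed form: exp(-X/2) * sum_{i=0}^{k-1} (X/2)^i / i!
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_x / i
        total += term
    return exp(-half_x) * total


# Hypothetical SNP-level p-values for one gene.
gene_p = fisher_combined_p([0.04, 0.20, 0.60])
```

A permutation-based version would recompute the same statistic on datasets with shuffled case-control labels and report the fraction of shuffles whose statistic is at least as large as the observed one.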
Inter-individual variation involving BIO was most strongly associated with EOC risk (empirical P=0.050), especially for NRF1, MTERF, PPARGC1A, ESRRA, and CAMK2D. Several SNP-level associations strengthened after adjustment for non-genetic factors, particularly for MTERF. Statistical interactions with cigarette smoking and HRT use were observed with MTERF and CAMK2D SNPs, respectively. Overall variation within mtDNA, HOR, and OXP was not statistically significant (empirical P >0.10).
We provide novel evidence to suggest that variants in mitochondrial biogenesis genes may influence EOC susceptibility.
A deeper understanding of the complex mechanisms implicated in mitochondrial biogenesis and oxidative stress may aid in developing strategies to reduce morbidity and mortality from EOC.
polymorphisms; oxidative stress; genetic susceptibility; mitochondria; ovarian cancer
Inherited variability in genes that influence androgen metabolism has been associated with risk of prostate cancer. The objective of this analysis was to evaluate interactions for prostate cancer risk using classification and regression tree (CART) models (i.e. decision trees), and to evaluate whether these interactive effects add information about prostate cancer risk prediction beyond that of “traditional” risk factors.
We compared CART models to traditional logistic regression models for associations of factors with prostate cancer risk using 1084 prostate cancer cases and 941 controls. All analyses were stratified by race. We used unconditional logistic regression (LR) to complement and compare to the race-stratified CART results using the area under curve (AUC) for the receiver operating characteristic (ROC) curves.
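The AUC used to compare the CART and LR models can be read as the probability that a randomly chosen case receives a higher predicted risk than a randomly chosen control. The following minimal sketch computes it directly from that definition. The function name and the toy scores are hypothetical, and any model that outputs a risk score could supply the inputs.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve from predicted scores and 0/1 labels.

    Computed as the fraction of (case, control) pairs in which the case has
    the higher score, counting ties as half a pair.
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for case_score in cases:
        for control_score in controls:
            if case_score > control_score:
                wins += 1.0
            elif case_score == control_score:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Hypothetical predicted risks from two models for the same four subjects.
labels = [1, 1, 0, 0]
auc_model_a = roc_auc([0.9, 0.8, 0.3, 0.1], labels)
auc_model_b = roc_auc([0.9, 0.4, 0.6, 0.1], labels)
```

The pairwise loop is quadratic in sample size, which is acceptable for a few thousand subjects. A rank-based formulation gives the same value more efficiently.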
The CART modeling of prostate cancer risk showed different interaction profiles by race. For European Americans, interactions among CYP3A43 genotype, history of benign prostate hypertrophy, family history of prostate cancer and age at consent revealed a distinct hierarchy of gene-environment and gene-gene interactions. For African Americans, interactions among family history of prostate cancer, individual proportion of European ancestry, number of GGC AR repeats and CYP3A4/CYP3A5 haplotype revealed interaction effects distinct from those found in European Americans. For European Americans, the CART model had the highest AUC, while for African Americans, the LR model with the CART-discovered factors had the largest AUC.
Conclusion & Impact
These results provide new insight into underlying prostate cancer biology for European Americans and African Americans.
Decision tree; classification and regression tree (CART); androgen pathway; prostate cancer risk; ancestry
The identification of very small subsets of predictive variables is an important topic that has not often been considered in the literature. In order to discover highly predictive yet compact gene set classifiers from whole genome expression data, a non-parametric, iterative algorithm, Splitting Random Forest (SRF), was developed to robustly identify genes that distinguish between molecular subtypes. The goal is to improve the prediction accuracy while considering sparsity.
The optimal SRF 50 run (SRF50) gene classifiers for glioblastoma (GB), breast (BC) and ovarian cancer (OC) subtypes had overall prediction rates comparable to those from published datasets upon validation (80.1%-91.7%). The SRF50 sets outperformed other methods by identifying compact gene sets needed for distinguishing between tested cancer subtypes (10–200 fold fewer genes than ANOVA or published gene sets). The SRF50 sets achieved superior and robust overall and subtype prediction accuracies when compared with single random forest (RF) and the Top 50 ANOVA results (80.1% vs 77.8% for GB; 84.0% vs 74.1% for BC; 89.8% vs 88.9% for OC in SRF50 vs single RF comparison; 80.1% vs 77.2% for GB; 84.0% vs 82.7% for BC; 89.8% vs 87.0% for OC in SRF50 vs Top 50 ANOVA comparison). There was significant overlap between SRF50 and published gene sets, showing that SRF identifies the relevant subsets of important gene lists. Through Ingenuity Pathway Analysis (IPA), the “hub” genes shared between the SRF50 and published gene sets were RB1, PIK3R1, PDGFBB and ERK1/2 for GB; ESR1, MYC, NFkB and ERK1/2 for BC; and Akt, FN1, NFkB, PDGFBB and ERK1/2 for OC.
The SRF approach is an effective driver of biomarker discovery research that reduces the number of genes needed for robust classification, dissects complex, high dimensional “omic” data and provides novel insights into the cellular mechanisms that define cancer subtypes.
Tree based models; High dimensional data; Cancer subtypes
We investigated the ability of several principal components analysis (PCA)-based strategies to detect and control for population stratification using data from a multi-center study of epithelial ovarian cancer among women of European-American ethnicity. These include a correction based on an ancestry informative markers (AIMs) panel designed to capture European ancestral variation and corrections utilizing un-thinned genome-wide SNP data; case-control samples were drawn from four geographically distinct North-American sites. The AIMs-only and genome-wide first principal components (PC1) both corresponded to the previously described North or Northwest-Southeast axis of European variation. We found that the genome-wide PCA captured this primary dimension of variation more precisely and identified additional axes of genome-wide variation of relevance to epithelial ovarian cancer. Associations evident between the genome-wide PCs and study site corroborate North American immigration history and suggest that undiscovered dimensions of variation lie within Northern Europe. The structure captured by the genome-wide PCA was also found within control individuals and did not reflect the case-control variation present in the data. The genome-wide PCA highlighted three regions of local LD, corresponding to the lactase (LCT) gene on chromosome 2, the human leukocyte antigen system (HLA) on chromosome 6 and to a common inversion polymorphism on chromosome 8. These features did not compromise the efficacy of PCs from this analysis for ancestry control. This study concludes that although AIMs panels are a cost-effective way of capturing population structure, genome-wide data should preferably be used when available.
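To make the idea of a first principal component of genotype data concrete, here is a toy Python sketch that extracts PC1 scores by power iteration on centered allele counts. Everything in it is an assumption for illustration: the function name, the toy genotypes, and the choice of power iteration. Genome-wide analyses typically standardize each SNP and use a dedicated linear algebra routine instead.

```python
import random
from math import sqrt


def first_pc_scores(genotypes, iterations=200, seed=0):
    """Return each sample's score on the first principal component.

    genotypes: list of samples, each a list of allele counts (0, 1 or 2)
    with one entry per SNP. The sign of the returned scores is arbitrary.
    """
    n, m = len(genotypes), len(genotypes[0])
    # Center each SNP column on its mean allele count.
    means = [sum(row[j] for row in genotypes) / n for j in range(m)]
    centered = [[row[j] - means[j] for j in range(m)] for row in genotypes]
    # Power iteration on X^T X, starting from a random loading vector.
    rng = random.Random(seed)
    v = [rng.uniform(-1.0, 1.0) for _ in range(m)]
    for _ in range(iterations):
        proj = [sum(x * w for x, w in zip(row, v)) for row in centered]  # X v
        new_v = [sum(centered[i][j] * proj[i] for i in range(n)) for j in range(m)]
        norm = sqrt(sum(w * w for w in new_v))
        if norm == 0.0:
            break  # no variation left to capture
        v = [w / norm for w in new_v]
    return [sum(x * w for x, w in zip(row, v)) for row in centered]


# Hypothetical genotypes: two groups with different allele frequencies.
toy_genotypes = [
    [2, 2, 0, 0], [2, 1, 0, 0], [1, 2, 0, 1],  # group A
    [0, 0, 2, 2], [0, 1, 2, 2], [0, 0, 1, 2],  # group B
]
pc1 = first_pc_scores(toy_genotypes)
```

In this toy example the two groups fall on opposite sides of zero along PC1. Including such scores as covariates in an association model is the usual way to adjust for the ancestry axis they capture.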
Human cancers are driven by the acquisition of somatic mutations. Separating the driving mutations from those that are random consequences of general genomic instability remains a challenge. New sequencing technology makes it possible to detect mutations that are present in only a minority of cells in a heterogeneous tumor population. We sought to leverage the power of ultra-deep sequencing to study various levels of tumor heterogeneity in the serial recurrences of a single glioblastoma multiforme patient. Our goal was to gain insight into the temporal succession of DNA base-level lesions by querying intra- and inter-tumoral cell populations in the same patient over time. We performed targeted “next-generation” sequencing on seven samples from the same patient: two foci within the primary tumor, two foci within an initial recurrence, two foci within a second recurrence, and normal blood. Our study reveals multiple levels of mutational heterogeneity. We found variable frequencies of specific EGFR, PIK3CA, PTEN, and TP53 base substitutions within individual tumor regions and across distinct regions within the same tumor. In addition, specific mutations emerge and disappear along the temporal spectrum from tumor at the time of diagnosis to second recurrence, demonstrating evolution during tumor progression. Our results shed light on the spatial and temporal complexity of brain tumors. As sequencing costs continue to decline and deep sequencing technology eventually moves into the clinic, this approach may provide guidance for treatment choices as we embark on the path to personalized cancer medicine.