Sequencing with PolyA selection is the most common approach to library preparation. For limited or degraded RNA, alternative protocols such as NuGEN have been developed. However, it is not yet clear how different library preparations affect downstream analyses across the broad applications of RNA sequencing.
Methods and Materials
Eight human mammary epithelial cell (HMEC) lines with high quality RNA were sequenced by Illumina’s mRNA-Seq PolyA selection and NuGEN ENCORE library preparation. The following analyses and comparisons were conducted: 1) the numbers of genes captured by each protocol; 2) the impact of protocols on differentially expressed gene detection between biological replicates; 3) expressed single nucleotide variant (SNV) detection; 4) non-coding RNAs, particularly lincRNA detection; and 5) intragenic gene expression.
Sequences from the NuGEN protocol had a lower alignment rate (75%) than those from the PolyA protocol (over 90%). The NuGEN protocol detected fewer genes (12–20% fewer), with a significant portion of reads mapped to non-coding regions. A large number of genes were differentially detected between the two protocols. About 17–20% of the differentially expressed genes between biological replicates were detected by both protocols. Significantly more SNVs (5–6 times as many) were detected in the NuGEN samples, largely from intragenic and intergenic regions. The NuGEN protocol captured fewer exons (25% fewer) and had higher base-level coverage variance. While 6.3% of reads mapped to intragenic regions in the PolyA samples, the percentages were much higher (20–25%) for the NuGEN samples. The NuGEN protocol did not detect more known non-coding RNAs such as lincRNAs, but targeted small and “novel” lincRNAs.
Different library preparations can have significant impacts on downstream analysis and interpretation of RNA-seq data. The NuGEN protocol provides an alternative for limited or degraded RNA, but it has limitations for some RNA-seq applications.
The objective of this study was to analyze genome-wide differential methylation patterns in maternal leukocyte DNA in early pregnant and non-pregnant states. This is an age and body mass index matched case-control study comparing the methylation patterns of 27,578 cytosine-guanine (CpG) sites in 14,495 genes in maternal leukocyte DNA in early pregnancy (n = 14), in the same women postpartum (n = 14), and in nulligravid women (n = 14) on a BeadChip platform. Transient widespread hypomethylation was found in early pregnancy as compared with the non-pregnant states. Methylation of nine genes was significantly different in early pregnancy compared with both postpartum and nulligravid states (< 10% False Discovery Rate). Early pregnancy may be characterized by widespread hypomethylation compared with non-pregnant states; there is no apparent permanent methylation imprint after a normal term gestation. Nine potential candidate genes were identified as differentially methylated in early pregnancy and may play a role in the maternal adaptation to pregnancy.
DNA Methylation; epigenetics; immune; leukocyte; maternal; postpartum; pregnancy
MicroRNA plays an important role in human diseases and cancer. We seek to investigate the expression status, clinical relevance, and functional role of microRNA in non-small cell lung cancer.
We performed miRNA expression profiling in matched lung adenocarcinoma and uninvolved lung using 56 pairs of fresh-frozen (FF) and 47 pairs of formalin-fixed, paraffin-embedded (FFPE) samples from never smokers. The most differentially expressed miRNA genes were evaluated by Cox analysis and the log-rank test. Among the best candidates, miR-708 was further examined for differential expression in two independent cohorts. The functional significance of miR-708 expression in lung cancer was examined by identifying its candidate mRNA target and by manipulating its expression levels in cultured cells.
Among the 20 miRNAs most differentially expressed between tested tumor and normal samples, a high expression level of miR-708 in the tumors was most strongly associated with an increased risk of death after adjustment for all clinically significant factors including age, sex, and tumor stage (FF cohort: HR, 1.90; 95% CI, 1.08-3.35; P=.025 and FFPE cohort: HR, 1.93; 95% CI, 1.02-3.63; P=.042). The transcript for the TMEM88 gene has a miR-708 binding site in its 3′ UTR and was significantly reduced in tumors with high miR-708 expression. Forced miR-708 expression reduced TMEM88 transcript levels and increased the rate of cell proliferation, invasion, and migration in culture.
MicroRNA-708 acts as an oncogene contributing to tumor growth and disease progression by directly downregulating TMEM88, a negative regulator of the Wnt signaling pathway in lung cancer.
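The reported hazard ratios, confidence intervals, and P values can be cross-checked against one another. The following is a minimal sketch, assuming the 95% CIs are Wald-type intervals that are symmetric on the log-hazard scale (the abstract does not state how they were computed); the function name `wald_p_from_hr` is hypothetical.

```python
import math

def wald_p_from_hr(hr, ci_low, ci_high, z_crit=1.959964):
    """Approximate z statistic and two-sided p-value from an HR and its 95% CI.

    Assumption: the CI is exp(log(HR) +/- z_crit * SE), i.e. a Wald interval.
    """
    log_hr = math.log(hr)
    # Width of the CI on the log scale is 2 * z_crit * SE.
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = log_hr / se
    # Two-sided normal tail probability: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p

# Values taken from the abstract (FF and FFPE cohorts).
ff_z, ff_p = wald_p_from_hr(1.90, 1.08, 3.35)
ffpe_z, ffpe_p = wald_p_from_hr(1.93, 1.02, 3.63)
print(f"FF:   z = {ff_z:.2f}, p = {ff_p:.3f}")
print(f"FFPE: z = {ffpe_z:.2f}, p = {ffpe_p:.3f}")
```

Under this assumption the back-calculated p-values land close to the reported P=.025 and P=.042; small differences are expected because the published HR and CI bounds are rounded.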
NSCLC; adenocarcinoma; miR-708; never smoker; survival; TMEM88; Wnt signaling
Exosomes, endosome-derived membrane microvesicles, contain specific RNA transcripts that are thought to be involved in cell-cell communication. These RNA transcripts have great potential as disease biomarkers. To characterize exosomal RNA profiles systematically, we performed RNA sequencing analysis using three human plasma samples and evaluated the efficacies of small RNA library preparation protocols from three manufacturers. In all we evaluated 14 libraries (7 replicates).
From the 14 size-selected sequencing libraries, we obtained a total of 101.8 million raw single-end reads, an average of about 7.27 million reads per library. Sequence analysis showed that there was a diverse collection of the exosomal RNA species among which microRNAs (miRNAs) were the most abundant, making up over 42.32% of all raw reads and 76.20% of all mappable reads. At the current read depth, 593 miRNAs were detectable. The five most common miRNAs (miR-99a-5p, miR-128, miR-124-3p, miR-22-3p, and miR-99b-5p) collectively accounted for 48.99% of all mappable miRNA sequences. MiRNA target gene enrichment analysis suggested that the highly abundant miRNAs may play an important role in biological functions such as protein phosphorylation, RNA splicing, chromosomal abnormality, and angiogenesis. From the unknown RNA sequences, we predicted 185 potential miRNA candidates. Furthermore, we detected significant fractions of other RNA species including ribosomal RNA (9.16% of all mappable counts), long non-coding RNA (3.36%), piwi-interacting RNA (1.31%), transfer RNA (1.24%), small nuclear RNA (0.18%), and small nucleolar RNA (0.01%); fragments of coding sequence (1.36%), 5′ untranslated region (0.21%), and 3′ untranslated region (0.54%) were also present. In addition to the RNA composition of the libraries, we found that the three tested commercial kits generated a sufficient number of DNA fragments for sequencing but each had significant bias toward capturing specific RNAs.
This study demonstrated that a wide variety of RNA species are embedded in the circulating vesicles. To our knowledge, this is the first report that applied deep sequencing to discover and characterize profiles of plasma-derived exosomal RNAs. Further characterization of these extracellular RNAs in diverse human populations will provide reference profiles and open new doors for the development of blood-based biomarkers for human diseases.
Exosome; microRNA; Next generation sequencing; Plasma; Biomarker
Insulin regulates many cellular processes, but the full impact of insulin deficiency on cellular functions remains to be defined. Applying a mass spectrometry–based nontargeted metabolomics approach, we report here alterations of 330 plasma metabolites representing 33 metabolic pathways during an 8-h insulin deprivation in type 1 diabetic individuals. These pathways included those known to be affected by insulin such as glucose, amino acid and lipid metabolism, Krebs cycle, and immune responses and those hitherto unknown to be altered including prostaglandin, arachidonic acid, leukotrienes, neurotransmitters, nucleotides, and anti-inflammatory responses. A significant concordance of metabolome and skeletal muscle transcriptome–based pathways supports an assumption that plasma metabolites are chemical fingerprints of cellular events. Although insulin treatment normalized plasma glucose and many other metabolites, there were 71 metabolites and 24 pathways that differed between nondiabetes and insulin-treated type 1 diabetes. Confirmation of many known pathways altered by insulin using a single blood test offers confidence in the current approach. Future research needs to be focused on newly discovered pathways affected by insulin deficiency and systemic insulin treatment to determine whether they contribute to the high morbidity and mortality in T1D despite insulin treatment.
Chronic Obstructive Pulmonary Disease (COPD) is a strong risk factor for lung cancer. Published studies regarding variations of genes encoding glutathione metabolism, DNA repair, and inflammatory response pathways in susceptibility to COPD were inconclusive.
We evaluated 470 single nucleotide polymorphisms (SNPs) from 56 genes of these 3 pathways in 620 cases and 893 controls to identify susceptibility markers for COPD risk, using existing resources. We assessed SNP- and gene-level effects adjusting for sex, age, and smoking status. Differential genetic effects on disease risk with and without lung cancer were also assessed; cumulative risk models were established.
Twenty-one SNPs were found to be significantly associated with risk of COPD (P<0.01); gene-based analyses confirmed 2 genes (GCLC and GSS) and identified 3 additional genes (GSTO2, ERCC1, and RRM1). Carrying 12 high-risk alleles may increase risk by 2.7-fold; 8 SNPs altered COPD risk with lung cancer 3.1-fold, and 4 SNPs altered the risk without lung cancer 2.3-fold.
Our findings indicate that multiple genetic variations in the 3 selected pathways contribute to COPD risk through GCLC, GSS, GSTO2, ERCC1, and RRM1 genes. Functional studies are needed to elucidate the mechanisms of these genes in the development of COPD, lung cancer, or both.
Chronic Obstructive Pulmonary Disease; Glutathione Metabolism Pathway; DNA Repair Pathway; Inflammatory Response Pathway
Background and objective
Gemcitabine is widely used to treat non-small cell lung cancer (NSCLC). To assess the pharmacogenomic effects of the entire gemcitabine metabolic pathway, we genotyped SNPs within the 17 pathway genes using DNA samples from NSCLC patients treated with gemcitabine and determined the effect of these genetic variants on overall survival (OS) after gemcitabine treatment.
Eight of the 17 pathway genes were resequenced with DNA samples from Coriell lymphoblastoid cell lines (LCLs) using Sanger sequencing for all exons, exon-intron junctions and 5′-, 3′-UTRs. A total of 107 tag SNPs were selected based on the resequencing data for the 8 genes and on HapMap data for the remaining 9 genes, followed by successful genotyping of 394 NSCLC patient DNA samples. Association of SNPs/haplotypes with OS was performed using the Cox regression model, followed by functional studies performed with LCLs and NSCLC cell lines.
Five SNPs in 4 genes (CDA, NT5C2, RRM1, and SLC29A1) showed associations with OS in those NSCLC patients, as did 9 haplotypes in 4 genes (RRM1, RRM2, SLC28A3, and SLC29A1), with P < 0.05. Genotype imputation using the LCLs was performed for a region of 200 kb surrounding those SNPs, followed by association studies with gemcitabine cytotoxicity. Functional studies demonstrated that downregulation of SLC29A1, NT5C2, and RRM1 in NSCLC cell lines altered cell susceptibility to gemcitabine.
These studies help identify biomarkers to predict gemcitabine response in NSCLC, a step toward the individualized chemotherapy of lung cancer.
gemcitabine; pharmacogenomics; metabolic pathway; lymphoblastoid cell lines; non-small cell lung cancer (NSCLC)
Endoxifen, a cytochrome P450 mediated tamoxifen metabolite, is being developed as a drug for the treatment of estrogen receptor (ER) positive breast cancer. Endoxifen is known to be a potent anti-estrogen and its mechanisms of action are still being elucidated. Here, we demonstrate that endoxifen-mediated recruitment of ERα to known target genes differs from that of 4-hydroxy-tamoxifen (4HT) and ICI-182,780 (ICI). Global gene expression profiling of MCF7 cells revealed substantial differences in the transcriptome following treatment with 4HT, endoxifen and ICI, both in the presence and absence of estrogen. Alterations in endoxifen concentrations also dramatically altered the gene expression profiles of MCF7 cells, even in the presence of clinically relevant concentrations of tamoxifen and its metabolites, 4HT and N-desmethyl-tamoxifen (NDT). Pathway analysis of differentially regulated genes revealed substantial differences related to endoxifen concentrations including significant induction of cell cycle arrest and markers of apoptosis following treatment with high, but not low, concentrations of endoxifen. Taken together, these data demonstrate that endoxifen’s mechanism of action is different from that of 4HT and ICI and provide mechanistic insight into the potential importance of endoxifen in the suppression of breast cancer growth and progression.
Taxanes are among the first-line treatments for lung cancer. To identify novel single nucleotide polymorphisms (SNPs) that might contribute to taxane response, we performed a genome-wide association study (GWAS) for two taxanes, paclitaxel and docetaxel, using 276 lymphoblastoid cell lines (LCLs), followed by genotyping of top candidate SNPs in 874 lung cancer patient samples treated with paclitaxel.
GWAS was performed using 1.3 million SNPs and taxane cytotoxicity IC50 values for 276 LCLs. The association of selected SNPs with overall survival in 76 small cell lung cancer (SCLC) and 798 non-small cell lung cancer (NSCLC) patients was analyzed with the Cox regression model, followed by integrated SNP-microRNA-expression association analysis in LCLs and siRNA screening of candidate genes in SCLC (H196) and NSCLC (A549) cell lines.
In the LCLs, 147 and 180 SNPs were associated with paclitaxel and docetaxel IC50 values, respectively, with p-values < 10⁻⁴. Genotyping of 153 candidate SNPs in 874 lung cancer patient samples identified 8 SNPs (p-value < 0.05) associated with either SCLC or NSCLC patient overall survival. Knockdown of PIP4K2A, CCT5, CMBL, EXO1, KMO and OPN3, genes within 200 kb up-/downstream of the 3 SNPs that were associated with SCLC overall survival (rs1778335, rs2662411 and rs7519667), significantly desensitized H196 to paclitaxel. SNPs rs2662411 and rs1778335 were associated with mRNA expression of CMBL or PIP4K2A through microRNA (miRNA) hsa-miR-584 or hsa-miR-1468.
GWAS in an LCL model system, joined with clinical translational and functional studies, might help us identify genetic variations associated with overall survival of lung cancer patients treated with paclitaxel.
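Selecting candidate genes within 200 kb up- or downstream of a SNP amounts to an interval-overlap query. The sketch below is only an illustration under stated assumptions: the gene names and coordinates are invented placeholders (not the study's loci), all features are taken to lie on the same chromosome, and a gene counts as "within" the window if any part of it overlaps the window. The helper name `genes_near_snp` is hypothetical.

```python
WINDOW_BP = 200_000  # 200 kb on each side of the SNP, as in the abstract

def genes_near_snp(snp_pos, genes, window=WINDOW_BP):
    """Return names of genes overlapping [snp_pos - window, snp_pos + window].

    genes: iterable of (name, start, end) tuples on the SNP's chromosome.
    """
    lo, hi = snp_pos - window, snp_pos + window
    # Two closed intervals overlap when each starts before the other ends.
    return [name for name, start, end in genes if start <= hi and end >= lo]

# Placeholder annotation; coordinates are made up for illustration only.
toy_genes = [
    ("GENE_A", 900_000, 950_000),      # fully inside the window
    ("GENE_B", 1_150_000, 1_260_000),  # straddles the window edge
    ("GENE_C", 1_500_000, 1_520_000),  # outside the window
]
nearby = genes_near_snp(1_000_000, toy_genes)
print(nearby)
```

Whether edge-straddling genes such as the placeholder `GENE_B` should be included is a design choice the abstract does not specify.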
Taxane; Genome-wide association; Lymphoblastoid cell line; Lung cancer; Overall survival
Studies from selected candidate genes suggest that single nucleotide polymorphisms (SNP) involved in glutathione metabolism, DNA repair, or inflammatory responses may affect overall survival (OS) in stages I-II or low stage non-small cell lung cancer (LS-NSCLC); however, results are inconclusive. In this study, we took a systematic pathway-based approach to simultaneously evaluate the impact of genetic variation from these three pathways on OS following LS-NSCLC diagnosis.
DNA from 647 patients with LS-NSCLC was genotyped for 480 SNPs (tagSNPs) tagging 57 genes from the three candidate pathways. Associations of tagSNPs with OS were assessed at the individual SNP and whole gene levels, adjusting for age, tumor stage, surgery type, and adjuvant therapy. The genotype combinations of the SNPs associated with OS were also estimated.
Among the 412 tagSNPs that were successfully genotyped and passed multi-step quality assessments, 28 showed association with OS (p<0.05). Two of the 28 were estimated to have less than a 20% chance of being false positives (rs3768490 in GSTM4 gene: p=1.32×10⁻⁴, q=0.06; rs1729786 in ABCC4 gene: p=9.25×10⁻⁴, q=0.20). Gene-based analysis suggested that, in addition to GSTM4 and ABCC4, variation in two other genes, PTGS2 and GSTA2, was also associated with OS.
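The q-values above express an estimated false-positive chance after testing many SNPs at once. The abstract does not say which procedure produced them, so the following is a minimal sketch of one common choice, the Benjamini-Hochberg adjustment, offered as an assumption rather than the authors' method. The toy p-values are invented for illustration and are not study data.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

# Invented p-values, purely illustrative.
toy_p = [0.001, 0.008, 0.039, 0.041, 0.60]
toy_q = benjamini_hochberg(toy_p)
for p, q in zip(toy_p, toy_q):
    print(f"p = {p:.3f} -> adjusted = {q:.5f}")
```

An adjusted value of 0.20 for a SNP would then be read as roughly a 20% expected proportion of false positives among all SNPs called at that threshold.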
We describe further evidence that variations in genes involved in the glutathione and inflammatory response pathways are associated with OS in patients with LS-NSCLC. Further studies are warranted to verify our findings and elucidate their functional mechanisms and clinical utility leading to improved survival for lung cancer patients.
glutathione metabolism; DNA repair; inflammation response; genetic polymorphisms; non-small-cell lung cancer; survival analysis
Inherited variability in the prognosis of lung cancer patients treated with platinum-based chemotherapy has been widely investigated. However, the overall contribution of genetic variation to platinum response is not well established. To identify novel candidate SNPs/genes, we performed a genome-wide association study (GWAS) for cisplatin cytotoxicity using lymphoblastoid cell lines (LCLs), followed by an association study of selected SNPs from the GWAS with overall survival (OS) in lung cancer patients.
GWAS for cisplatin was performed with 283 ethnically diverse LCLs. The 168 top SNPs were genotyped in 222 small cell lung cancer (SCLC) and 961 non-small cell lung cancer (NSCLC) patients treated with platinum-based therapy. Association of the SNPs with OS was determined using the Cox regression model. Selected candidate genes were functionally validated by siRNA knockdown in human lung cancer cells.
Among 157 successfully genotyped SNPs, 9 and 10 SNPs were top SNPs associated with OS for patients with NSCLC and SCLC, respectively, although they were not significant after adjusting for multiple testing. Fifteen genes, including 7 located within 200 kb up- or downstream of the four top SNPs and 8 genes for which expression was correlated with three SNPs in LCLs, were selected for siRNA screening. Knockdown of DAPK3 and METTL6, for which expression levels were correlated with the rs11169748 and rs2440915 SNPs, significantly decreased cisplatin sensitivity in lung cancer cells.
This series of clinical and complementary laboratory-based functional studies identified several candidate genes/SNPs that might help predict treatment outcomes for platinum-based therapy of lung cancer.
Lung cancer; cisplatin; pharmacogenomics; lymphoblastoid cell lines; GWAS
Copy number variants (CNVs) have been implicated in many complex diseases. We examined whether inherited CNVs were associated with overall survival among women with invasive epithelial ovarian cancer. Germline DNA from 1,056 cases (494 deceased, average of 3.7 years follow-up) was interrogated with the Illumina 610 quad genome-wide array containing, after quality control exclusions, 581,903 single nucleotide polymorphisms (SNPs) and 17,917 CNV probes. Comprehensive analysis capitalized upon the strengths of three complementary approaches to CNV classification. First, to identify small CNVs, single markers were evaluated and, where associated with survival, consecutive markers were combined. Two chromosomal regions were associated with survival using this approach (14q31.3 rs2274736 p = 1.59 × 10⁻⁶, p = 0.001; 22q13.31 rs2285164 p = 4.01 × 10⁻⁵, p = 0.009), but were not significant after multiple testing correction. Second, to identify large CNVs, genome-wide segmentation was conducted to characterize chromosomal gains and losses, and association with survival was evaluated by segment. Four regions were associated with survival (1q21.3 loss p = 0.005, 5p14.1 loss p = 0.004, 9p23 loss p = 0.002, and 15q22.31 gain p = 0.002); however, again, after correcting for multiple testing, no regions were statistically significant, and none were in common with the single marker approach. Finally, to evaluate associations with general amounts of copy number changes across the genome, we estimated CNV burden based on genome-wide numbers of gains and losses; no associations with survival were observed (p > 0.40). Although CNVs that were not well-covered by the Illumina 610 quad array merit investigation, these data suggest no association between inherited CNVs and survival after ovarian cancer.
association testing; copy number variation; genotyping array; ovarian cancer; overall survival
Summary: Reduced representation bisulfite sequencing (RRBS) is a cost-effective approach for genome-wide methylation pattern profiling. Analyzing RRBS sequencing data is challenging and specialized alignment/mapping programs are needed. Although such programs have been developed, a comprehensive solution that provides researchers with good quality and analyzable data is still lacking. To address this need, we have developed a Streamlined Analysis and Annotation Pipeline for RRBS data (SAAP-RRBS) that integrates read quality assessment/clean-up, alignment, methylation data extraction, annotation, reporting and visualization. This package facilitates a rapid transition from sequencing reads to a fully annotated CpG methylation report to biological interpretation.
Availability and implementation: SAAP-RRBS is freely available to non-commercial users at the web site http://ndc.mayo.edu/mayo/research/biostat/stand-alone-packages.cfm.
Supplementary data are available at Bioinformatics online.
Variations in genes related to anticancer drugs' biologic activity could influence treatment responses and lung cancer prognosis. Genetic variants in four biological pathways, i.e., glutathione metabolism, DNA repair, cell cycle, and EGFR, were systematically investigated to examine their association with survival in advanced-stage NSCLC treated with chemotherapy.
A total of 894 tagging single-nucleotide polymorphisms (tagSNPs) in 70 genes from the four pathways were genotyped and analyzed in a 1076-patient cohort. Association with overall survival was analyzed at single-SNP and whole-gene levels within all patients and major chemotherapy agent combination groups.
A poorer overall survival was observed in patients with genetic variations in GSS (glutathione pathway) and MAP3K1 (EGFR pathway) (HR=1.45, 95% CI=1.20–1.70 and HR=1.25, 95% CI=1.05–1.50, respectively). In stratified analysis on patients receiving platinum plus taxane treatment, we observed a hazardous effect on overall survival by MAP3K1 variant (HR=1.38, 95% CI=1.11–1.72) and a protective effect by RAF1 (HR=0.64, 95% CI=0.5–0.82) in the EGFR pathway. In patients receiving platinum plus gemcitabine treatment, RAF1 and GPX5 (glutathione pathway) genetic variations showed protective effects on survival (HR=0.54, 95% CI=0.38–0.77; HR=0.67, 95% CI=0.52–0.85, respectively); in contrast, NRAS (EGFR pathway) and GPX7 (glutathione pathway) variations showed hazardous effects on overall survival (HR=1.91, 95% CI=1.30–2.80; HR=1.83, 95% CI=1.27–2.63, respectively). All genes that harbored these significant SNPs remained significant by whole-gene analysis.
Common genetic variations in genes of EGFR and glutathione pathways may be associated with overall survival among patients with advanced-stage NSCLC treated with platinum, taxane, and/or gemcitabine combinations.
non-small cell lung cancer; survival; single-nucleotide polymorphisms; pathway; chemotherapy
The androgen receptor (AR) is the principal target for treatment of non-organ confined prostate cancer (PCa). Androgen deprivation therapies (ADTs) directed against the AR ligand-binding domain do not fully inhibit androgen-dependent signaling critical for PCa progression. Thus, information that could direct the development of more effective ADTs is desired. Systems and bioinformatics approaches suggest that considerable variation exists in the mechanisms by which AR regulates expression of effector genes, pointing to a role for secondary transcription factors. A combination of microarray and in silico analyses led us to identify a 158-gene signature that relies on AR along with the transcription factor SRF, representing < 6% of androgen-dependent genes. This AR-SRF signature is sufficient to distinguish microdissected benign and malignant prostate samples, and it correlates with the presence of aggressive disease and poor outcome. Compared to other AR target gene signatures of similar size, the AR-SRF signature described here associates more strongly with biochemical failure. Further, it is enriched in malignant versus benign prostate tissues, compared to other signatures. To our knowledge, this profile represents the first demonstration of a distinct mechanism of androgen action with clinical relevance in PCa, offering a possible rationale to develop novel and more effective forms of ADT.
prostate cancer; androgen receptor; transcription; gene expression; disease progression
The Epstein-Barr virus (EBV) nuclear antigen 2 (EBNA-2) plays a key role in B-cell growth transformation by initiating and maintaining the proliferation of infected B cells upon EBV infection in vitro. Most studies of EBNA-2 have focused on its functions, yet little is known about its intertypic polymorphisms.
The coding region for amino acids (aa) 148-487 of the EBNA-2 gene was sequenced in 25 EBV-associated gastric carcinomas (EBVaGCs), 56 nasopharyngeal carcinomas (NPCs) and 32 throat washings (TWs) from healthy donors in Northern China. Three variations (g48991t, c48998a, t49613a) were detected in all of the samples (113/113, 100%). EBNA-2 could be classified into four distinct subtypes (E2-A, E2-B, E2-C and E2-D) based on the deletion status of three aa (294Q, 357K and 358G). Subtypes E2-A and E2-C were detected in 56/113 (49.6%) and 38/113 (33.6%) samples, respectively. E2-A was observed more often in EBVaGC samples, and subtype E2-D was detected only in the NPC samples. In the analysis of variation in EBNA-2 functional domains, substitutions at the TAD residue (I438L) and the NLS residues (E476G, P484H and I486T), which are located in the carboxyl terminus of EBNA-2, were detected only in NPC samples.
The subtypes E2-A and E2-C were the dominant genotypes of the EBNA-2 gene in Northern China. The subtype E2-D may be associated with the tumorigenesis of NPC. The NPC isolates were prone to harbor more mutations in the functional domains than the other two groups.
Epstein-Barr virus; Gastric carcinoma; Nasopharyngeal carcinoma; Nuclear antigen 2; Polymorphism
The heparin-degrading endosulfatases sulfatase 1 (SULF1) and sulfatase 2 (SULF2) have opposing effects in hepatocarcinogenesis despite structural similarity. Using mRNA expression arrays, we analyzed the correlations of SULF expression with signaling networks in human hepatocellular carcinomas (HCCs) and the associations of SULF expression with tumor phenotype and patient survival. Data from two mRNA microarray analyses of 139 and 36 HCCs and adjacent tissues were used as training and validation sets. Partek and Metacore software were used to identify SULF correlated genes and their associated signaling pathways. Associations between SULF expression, the hepatoblast subtype of HCC, and survival were examined. Both SULF1 and 2 had strong positive correlations with periostin, IQGAP1, TGFB1, and vimentin and inverse correlations with HNF4A and IQGAP2. Genes correlated with both SULFs were highly associated with the cell adhesion, cytoskeletal remodeling, blood coagulation, TGFB, and Wnt/β-catenin and epithelial mesenchymal transition signaling pathways. Genes uniquely correlated with SULF2 were more associated with neoplastic processes than genes uniquely correlated with SULF1. High SULF expression was associated with the hepatoblast subtype of HCC. There was a bimodal effect of SULF1 expression on prognosis, with patients in the lowest or highest tertile having a worse prognosis than those in the middle tertile. SULFs have complex effects on HCC signaling and patient survival. There are functionally similar associations with cell adhesion, ECM remodeling, TGFB, and WNT pathways, but also unique associations of SULF1 and SULF2. The roles and targeting of the SULFs in cancer require further investigation.
KRAS mutations are highly prevalent in non-small cell lung cancer (NSCLC), and tumors harboring these mutations tend to be aggressive and resistant to chemotherapy. We used next-generation sequencing technology to identify pathways that are specifically altered in lung tumors harboring a KRAS mutation. Paired-end RNA-sequencing of 15 primary lung adenocarcinoma tumors (8 harboring mutant KRAS and 7 with wild-type KRAS) was performed. Sequences were mapped to the human genome, and genomic features, including differentially expressed genes, alternate splicing isoforms and single nucleotide variants, were determined for tumors with and without KRAS mutation using a variety of computational methods. Network analysis was carried out on genes showing differential expression (374 genes), alternate splicing (259 genes), and SNV-related changes (65 genes) in NSCLC tumors harboring a KRAS mutation. Genes exhibiting two or more connections from the lung adenocarcinoma network were used to carry out integrated pathway analysis. The most significant signaling pathways identified through this analysis were the NFκB, ERK1/2, and AKT pathways. A 27-gene mutant KRAS-specific subnetwork was extracted based on gene–gene connections from the integrated network, and interrogated for druggable targets. Our results confirm previous evidence that mutant KRAS tumors exhibit activated NFκB, ERK1/2, and AKT pathways and may be preferentially sensitive to therapeutics targeting these pathways. In addition, our analysis indicates novel, previously unappreciated links between mutant KRAS and the TNFR and PPARγ signaling pathways, suggesting that targeted PPARγ antagonists and TNFR inhibitors may be useful therapeutic strategies for treatment of mutant KRAS lung tumors. Our study is the first to integrate genomic features from RNA-Seq data from NSCLC and to define a first draft genomic landscape model that is unique to tumors with oncogenic KRAS mutations.
transcriptome sequencing; RNA-Seq; KRAS mutation; NSCLC; bioinformatics; network analysis; data integration and computational methods
Genome-wide methylation profiling has led to more comprehensive insights into gene regulation mechanisms and potential therapeutic targets. Illumina Human Methylation BeadChip is one of the most commonly used genome-wide methylation platforms. Similar to other microarray experiments, methylation data is susceptible to various technical artifacts, particularly batch effects. To date, little attention has been given to issues related to normalization and batch effect correction for this kind of data.
We evaluated three common normalization approaches and investigated their performance in batch effect removal using three datasets with different degrees of batch effects generated from HumanMethylation27 platform: quantile normalization at average β value (QNβ); two step quantile normalization at probe signals implemented in "lumi" package of R (lumi); and quantile normalization of A and B signal separately (ABnorm). Subsequent Empirical Bayes (EB) batch adjustment was also evaluated.
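The general idea behind quantile normalization, which underlies the QNβ approach named above, is to force every sample to share one reference distribution. The sketch below is a plain-Python illustration under stated assumptions: it is not the "lumi" two-step procedure or ABnorm, ties are broken by position rather than averaged, and the toy β values are invented rather than taken from any of the three datasets.

```python
def quantile_normalize(columns):
    """Quantile-normalize samples given as equal-length lists of values.

    Each value is replaced by the mean, across samples, of the values
    that share its rank, so all samples end up with the same distribution.
    """
    n = len(columns[0])
    sorted_cols = [sorted(col) for col in columns]
    # Reference distribution: per-rank mean over all samples.
    reference = [
        sum(col[rank] for col in sorted_cols) / len(columns)
        for rank in range(n)
    ]
    normalized = []
    for col in columns:
        # Probe indices ordered by value within this sample.
        order = sorted(range(n), key=lambda i: col[i])
        out = [0.0] * n
        for rank, probe in enumerate(order):
            out[probe] = reference[rank]
        normalized.append(out)
    return normalized

# Two invented samples with three CpG beta values each.
toy_beta = [[0.10, 0.80, 0.40], [0.30, 0.90, 0.60]]
toy_norm = quantile_normalize(toy_beta)
print(toy_norm)
```

Because normalization of this kind only aligns distributions, it cannot by itself remove batch differences that affect specific probes, which is consistent with the abstract's use of a subsequent Empirical Bayes adjustment.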
Each normalization method removed a portion of the batch effects, and effectiveness differed depending on the severity of batch effects in a dataset. For the dataset with minor batch effects (Dataset 1), normalization alone appeared adequate and "lumi" showed the best performance. However, all methods left substantial batch effects intact in the datasets with obvious batch effects, and further correction was necessary. Without any correction, 50 and 66 percent of CpGs were associated with batch effects in Datasets 2 and 3, respectively. After QNβ, lumi or ABnorm, the percentage of CpGs associated with batch effects was reduced to 24, 32, and 26 percent for Dataset 2; and 37, 46, and 35 percent for Dataset 3, respectively. Additional EB correction effectively removed such remaining non-biological effects. More importantly, the two-step procedure almost tripled the numbers of CpGs associated with the outcome of interest for the two datasets.
Genome-wide methylation data from Infinium Methylation BeadChip can be susceptible to batch effects with profound impacts on downstream analyses and conclusions. Normalization can remove some but not all batch effects. EB correction along with normalization is recommended for effective batch effect removal.
Summary: TREAT (Targeted RE-sequencing Annotation Tool) is a tool for facile navigation and mining of the variants from both targeted resequencing and whole exome sequencing. It provides a rich integration of publicly available as well as in-house developed annotations and visualizations for variants, variant-hosting genes and host-gene pathways.
Availability and implementation: TREAT is freely available to non-commercial users as either a stand-alone annotation and visualization tool, or as a comprehensive workflow integrating sequencing alignment and variant calling. The executables, instructions and the Amazon Cloud Images of TREAT can be downloaded at the website: http://ndc.mayo.edu/mayo/research/biostat/stand-alone-packages.cfm
Contact: Hossain.Asif@mayo.edu; Kocher.JeanPierre@mayo.edu
Supplementary information: Supplementary data are provided at Bioinformatics online.
Gene expression is suppressed by DNA methylation. The goal of this study was to identify genes whose CpG site methylation and mRNA expression are associated with recurrence after surgical resection for hepatocellular carcinoma (HCC). Sixty-two HCCs were examined by both whole genome DNA methylation and transcriptome analysis. The Cox model was used to select genes associated with recurrence. A validation was performed in an independent cohort of 66 HCC patients. Among fifty-nine common genes, increased CpG site methylation and decreased mRNA expression were associated with recurrence for 12 genes (Group A), whereas decreased CpG site methylation and increased mRNA expression were associated with recurrence for 25 genes (Group B). The remaining 22 genes were defined as Group C. Complement factor H (CFH) and myosin VIIA and Rab interacting protein (MYRIP) in Group A; proline/serine-rich coiled-coil 1 (PSRC1), meiotic recombination 11 homolog A (MRE11A), and myosin IE (MYO1E) in Group B; and autophagy-related protein LC3 A (MAP1LC3A) and NADH dehydrogenase 1 alpha subcomplex assembly factor 1 (NDUFAF1) in Group C were validated. In conclusion, potential tumor suppressors (CFH, MYRIP) and oncogenes (PSRC1, MRE11A, MYO1E) in HCC are reported. The regulation of individual genes by methylation in hepatocarcinogenesis needs to be validated.
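The grouping rule described above can be sketched as a small function. This is only an illustration: it assumes each gene comes with two Cox regression coefficients (log hazard ratios for recurrence), one for CpG methylation and one for mRNA expression, and that the sign of each coefficient gives the direction of association. The function name and this input format are hypothetical, not taken from the study.

```python
def assign_group(meth_coef, expr_coef):
    """Assign a gene to Group A, B, or C from the signs of its Cox coefficients.

    meth_coef: log hazard ratio for CpG methylation (assumed input)
    expr_coef: log hazard ratio for mRNA expression (assumed input)
    A positive coefficient means higher values go with recurrence.
    """
    if meth_coef > 0 and expr_coef < 0:
        return "A"  # increased methylation, decreased expression
    if meth_coef < 0 and expr_coef > 0:
        return "B"  # decreased methylation, increased expression
    return "C"      # every remaining pattern

# Made-up coefficients, for illustration only
examples = {"gene1": (0.8, -0.5), "gene2": (-0.4, 0.9), "gene3": (0.3, 0.2)}
groups = {name: assign_group(m, e) for name, (m, e) in examples.items()}
```

Group A matches the pattern expected of a methylation-silenced tumor suppressor, and Group B that of a demethylated oncogene, which is how the abstract interprets the validated genes.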
Carcinoma, Hepatocellular; Gene Expression Profiling; Microarray analysis; DNA methylation; Survival
The cystic fibrosis transmembrane conductance regulator (CFTR) plays an important role in maintaining lung function, but its association with lung cancer is unclear. A case-control study was conducted to determine the possible associations of genetic variants in the CFTR gene with lung cancer risk. Genotypes of the most common deletion (ΔF508), one functional SNP, and eight tag SNPs in the CFTR gene were determined in 574 lung cancer patients and 679 controls. A logistic regression model, adjusting for known risk factors, was used to evaluate the association of each variant with lung cancer risk; haplotype and sub-haplotype analyses were performed as confirmation. The ΔF508 deletion and genotypes with minor alleles in one tag SNP, rs10487372, and one functional SNP, rs213950, were inversely associated with lung cancer risk. The results of haplotype and sub-haplotype analyses were consistent with the single variant analysis, all pointing to the ΔF508 deletion being the key variant for significant haplotypes and sub-haplotypes. Individuals with the ‘deletion-T’ (ΔF508/rs10487372) haplotype had a 68% reduced risk for lung cancer compared to the common haplotype ‘no-deletion-C’ (OR=0.32; 95% CI=0.15–0.68; p=0.01). Genetic variations in the CFTR gene might modulate the risk of lung cancer. This study, for the first time, provides evidence of a protective role of CFTR deletion carrier status in the etiology of lung cancer.
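The 68% figure follows directly from the reported odds ratio, as the short sketch below shows. The function name is hypothetical. Strictly, the quantity is a reduction in odds, which is commonly read as a reduction in risk when the outcome is uncommon.

```python
def percent_odds_reduction(odds_ratio):
    """Convert an odds ratio into a percent reduction in odds."""
    if odds_ratio <= 0:
        raise ValueError("an odds ratio must be positive")
    return (1.0 - odds_ratio) * 100.0

# Values reported in the abstract: OR = 0.32, 95% CI 0.15 to 0.68
point = percent_odds_reduction(0.32)
smallest = percent_odds_reduction(0.68)  # upper CI bound gives the smallest reduction
largest = percent_odds_reduction(0.15)   # lower CI bound gives the largest reduction
print(f"{point:.0f}% reduction (95% CI {smallest:.0f}% to {largest:.0f}%)")
```

Applied to the confidence limits, the same conversion gives a reduction of between 32% and 85%.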
Cystic fibrosis transmembrane conductance regulator; lung cancer; genetic variation
SnowShoes-FTD, developed for fusion transcript detection in paired-end mRNA-Seq data, employs multiple steps of false positive filtering to nominate fusion transcripts with near 100% confidence. Unique features include: (i) identification of multiple fusion isoforms from two gene partners; (ii) prediction of genomic rearrangements; (iii) identification of exon fusion boundaries; (iv) generation of a 5′–3′ fusion spanning sequence for PCR validation; and (v) prediction of the protein sequences, including frame shift and amino acid insertions. We applied SnowShoes-FTD to identify 50 fusion candidates in 22 breast cancer and 9 non-transformed cell lines. Five additional fusion candidates with two isoforms were confirmed. In all, 30 of 55 fusion candidates had in-frame protein products. No fusion transcripts were detected in non-transformed cells. Consideration of the possible functions of a subset of predicted fusion proteins suggests several potentially important functions in transformation, including a possible new mechanism for overexpression of ERBB2 in a HER-positive cell line. The source code of SnowShoes-FTD is provided in two formats: one configured to run on the Sun Grid Engine for parallelization, and the other formatted to run on a single LINUX node. Executables in PERL are available for download from our web site: http://mayoresearch.mayo.edu/mayo/research/biostat/stand-alone-packages.cfm.
Lung cancer in individuals who have never smoked tobacco products is an increasing medical and public-health issue. We aimed to unravel the genetic basis of lung cancer in never smokers.
We did a four-stage investigation. First, a genome-wide association study of single nucleotide polymorphisms (SNPs) was done with 754 never smokers (377 matched case-control pairs at Mayo Clinic, Rochester, MN, USA). Second, the top candidate SNPs from the first study were validated in two independent studies among 735 (MD Anderson Cancer Center, Houston, TX, USA) and 253 (Harvard University, Boston, MA, USA) never smokers. Third, further replication of the top SNP was done in 530 never smokers (UCLA, Los Angeles, CA, USA). Fourth, expression quantitative trait loci (eQTL) and gene-expression differences were analysed to further elucidate the causal relation between the validated SNPs and the risk of lung cancer in never smokers.
44 top candidate SNPs were identified that might alter the risk of lung cancer in never smokers. rs2352028 at chromosome 13q31.3 was subsequently replicated with an additive genetic model in the four independent studies, with a combined odds ratio of 1·46 (95% CI 1·26–1·70, p=5·94×10⁻⁶). A cis eQTL analysis showed there was a strong correlation between genotypes of the replicated SNPs and the transcription level of the gene GPC5 in normal lung tissues (p=1·96×10⁻⁴), with the high-risk allele linked with lower expression. Additionally, the transcription level of GPC5 in normal lung tissue was twice that detected in matched lung adenocarcinoma tissue (p=6·75×10⁻¹¹).
Genetic variants at 13q31.3 alter the expression of GPC5, and are associated with susceptibility to lung cancer in never smokers. Downregulation of GPC5 might contribute to the development of lung cancer in never smokers.
We have previously demonstrated that endoxifen is the most important tamoxifen metabolite responsible for eliciting the anti-estrogenic effects of this drug in breast cancer cells expressing estrogen receptor-alpha (ERα). However, the relevance of ERβ in mediating endoxifen action has yet to be explored. Here, we characterize the molecular actions of endoxifen in breast cancer cells expressing ERβ and examine its effectiveness as an anti-estrogenic agent in these cell lines.
MCF7, Hs578T and U2OS cells were stably transfected with full-length ERβ. ERβ protein stability, dimer formation with ERα and expression of known ER target genes were characterized following endoxifen exposure. The ability of various endoxifen concentrations to block estrogen-induced proliferation of MCF7 parental and ERβ-expressing cells was determined. The global gene expression profiles of these two cell lines were monitored following estrogen and endoxifen exposure, and biological pathway analysis of these data sets was conducted to identify altered cellular processes.
Our data demonstrate that endoxifen stabilizes ERβ protein, unlike its targeted degradation of ERα, and induces ERα/ERβ heterodimerization in a concentration-dependent manner. Endoxifen is also shown to be a more potent inhibitor of estrogen target genes when ERβ is expressed. Additionally, low concentrations of endoxifen observed in tamoxifen-treated patients with deficient CYP2D6 activity (20 to 40 nM) markedly inhibit estrogen-induced cell proliferation rates in the presence of ERβ, whereas much higher endoxifen concentrations are needed when ERβ is absent. Microarray analyses reveal substantial differences in the global gene expression profiles induced by endoxifen at low concentrations (40 nM) when comparing MCF7 cells that express ERβ to those that do not. These profiles implicate pathways related to cell proliferation and apoptosis in mediating endoxifen effectiveness at these lower concentrations.
Taken together, these data demonstrate that the presence of ERβ enhances the sensitivity of breast cancer cells to the anti-estrogenic effects of endoxifen likely through the molecular actions of ERα/β heterodimers. These findings underscore the need to further elucidate the role of ERβ in the biology and treatment of breast cancer and suggest that the importance of pharmacologic variation in endoxifen concentrations may differ according to ERβ expression.