Hepatocellular carcinoma (HCC) is associated with poor survival for patients and few effective treatment options, raising the need for novel therapeutic strategies. MicroRNAs (miRNAs) play important roles in tumor development and show deregulated patterns of expression in HCC. Because of the liver’s unique affinity for small nucleic acids, miRNA-based therapy has been proposed in the treatment of liver disease. There is thus an urgent need to identify and characterize aberrantly expressed miRNAs in HCC. In our study, we profiled miRNA expression changes in de novo liver tumors driven by MYC and/or RAS, two canonical oncogenes activated in a majority of human HCC. We identified an upregulated miRNA megacluster comprising 53 miRNAs on mouse chromosome 12qF1 (human homolog 14q32). This miRNA megacluster is upregulated in all three transgenic liver models and in a subset of human HCCs. An unbiased functional analysis of all miRNAs within this cluster was performed.
We found that miR-494 is overexpressed in human HCC and aids in transformation by regulating the G1/S cell cycle transition through targeting of the Mutated in Colorectal Cancer (MCC) tumor suppressor. miR-494 inhibition in human HCC cell lines decreases cellular transformation, and anti-miR-494 treatment during primary MYC-driven liver tumor formation significantly diminishes tumor size. Our findings identify a new therapeutic target, miR-494, for the treatment of HCC.
HCC; cancer; cell cycle; Dlk1-Dio3; miRNA therapy
To develop an algorithm for mapping the Functional Assessment of Cancer Therapy – Breast (FACT-B) to the 5-level EuroQoL Group’s 5-dimension questionnaire (EQ-5D-5L) utility index.
A survey of 238 breast cancer patients in Singapore was conducted. Models using various regression methods with or without recognizing the upper boundary of utility values at 1 were fitted to predict the EQ-5D-5L utility index based on the five subscale scores of the FACT-B. Data from a follow-up survey of these patients were used to validate the results.
A model that maps the physical, emotional, functional well-being and the breast cancer concerns subscales of the FACT-B to the EQ-5D-5L utility index was derived. The social well-being subscale was not associated with the utility index. Although its theoretical assumptions may not be valid, ordinary least squares outperformed other regression methods. The mean predicted utility index within each performance status level at follow-up deviated from the observed mean by less than the minimally important difference of EQ-5D for cancer patients.
The mapping algorithm converts the FACT-B to the EQ-5D utility index. This enables oncologists, clinical researchers and policy makers to obtain a quantitative utility summary of a patient’s health status when only the FACT-B is assessed.
Breast cancer; EQ-5D-5L; FACT-B; Health utility; Mapping; Quality of life
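As a rough illustration of the mapping approach described above, the following Python sketch fits an ordinary least squares model to subscale scores and caps predictions at the utility ceiling of 1. It is a sketch under stated assumptions only: the function names, the two-predictor synthetic data, and the simple truncation at prediction time are hypothetical. It does not reproduce the published algorithm or its coefficients, and the study's models that recognize the upper boundary may treat it differently.

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix * x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot_row = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot_row][col]) < 1e-12:
            raise ValueError("design matrix is singular")
        aug[col], aug[pivot_row] = aug[pivot_row], aug[col]
        pivot = aug[col][col]
        aug[col] = [v / pivot for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [aug[i][n] for i in range(n)]


def fit_ols(scores, utilities):
    """Return [intercept, slope_1, ...] from the normal equations X'X b = X'y."""
    rows = [[1.0] + list(s) for s in scores]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * u for r, u in zip(rows, utilities)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict_utility(beta, subscale_scores, ceiling=1.0):
    """Linear prediction, truncated at the maximum possible utility (assumption)."""
    raw = beta[0] + sum(b * s for b, s in zip(beta[1:], subscale_scores))
    return min(raw, ceiling)


# Synthetic, noise-free toy data (NOT study data): two hypothetical subscales.
toy_scores = [(10, 5), (20, 8), (15, 12), (25, 3), (5, 20)]
toy_utilities = [0.10 + 0.01 * a + 0.02 * b for a, b in toy_scores]

beta = fit_ols(toy_scores, toy_utilities)
capped = predict_utility(beta, (60, 30))  # raw prediction would exceed 1
```

Truncating only at prediction time keeps the fit itself a plain least squares problem; a censored regression model would instead account for the ceiling during estimation.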
Multi-omics research is a key ingredient of data-intensive life sciences research, permitting measurement of biological molecules at different functional levels in the same individual. For a complete picture at the biological systems level, however, appropriate statistical techniques must be developed to integrate different ‘omics’ data sets (e.g., genomics and proteomics). We report here multivariate projection-based analysis approaches to genomics and proteomics data sets, using as a case study observations in kidney transplant patients who experienced an acute rejection event (n=20) versus non-rejecting controls (n=20). In these data sets, we show how these novel methodologies might serve as promising tools for dimension reduction and selection of relevant features for different analytical frameworks. Unsupervised analyses highlighted the importance of post-transplant time of rejection, while supervised analyses identified gene and protein signatures that together predicted rejection status with little time effect. The selected genes are part of biological pathways that are representative of immune responses. Gene enrichment profiles revealed increases in innate immune responses and neutrophil activities and a depletion of T lymphocyte related processes in rejection samples as compared to controls. In all, this article offers candidate biomarkers for future detection and monitoring of acute kidney transplant rejection, as well as ways forward for methodological advances to better harness multi-omics data sets.
Acute rejection is a major complication of solid organ transplantation that prevents the long-term assimilation of the allograft. Various populations of lymphocytes are principal mediators of this process, infiltrating graft tissues and driving cell-mediated cytotoxicity. Understanding the lymphocyte-specific biology associated with rejection is therefore critical. Measuring genome-wide changes in transcript abundance in peripheral whole blood cells can deliver a comprehensive view of the status of the immune system. The heterogeneous nature of the tissue significantly affects the sensitivity and interpretability of traditional analyses, however. Experimental separation of cell types is an obvious solution, but is often impractical and, more worrying, may affect expression, leading to spurious results. Statistical deconvolution of the cell type-specific signal is an attractive alternative, but existing approaches still present some challenges, particularly in a clinical research setting. Obtaining time-matched sample composition to biologically interesting, phenotypically homogeneous cell sub-populations is costly and adds significant complexity to study design. We used a two-stage, in silico deconvolution approach that first predicts sample composition to biologically meaningful and homogeneous leukocyte sub-populations, and then performs cell type-specific differential expression analysis in these same sub-populations, from peripheral whole blood expression data. We applied this approach to a peripheral whole blood expression study of kidney allograft rejection. The patterns of differential composition uncovered are consistent with previous studies carried out using flow cytometry and provide a relevant biological context when interpreting cell type-specific differential expression results. We identified cell type-specific differential expression in a variety of leukocyte sub-populations at the time of rejection. 
The tissue-specificity of these differentially expressed probe-set lists is consistent with the originating tissue and their functional enrichment consistent with allograft rejection. Finally, we demonstrate that the strategy described here can be used to derive useful hypotheses by validating a cell type-specific ratio in an independent cohort using the nanoString nCounter assay.
The molecular profile of circulating blood can reflect physiological and pathological events occurring in other tissues and organs of the body and delivers a comprehensive view of the status of the immune system. Blood has been useful in studying the pathobiology of many diseases. It is accessible and easily collected, making it ideally suited to the development of diagnostic biomarker tests. The blood transcriptome has a high complement of globin RNA that could potentially saturate next-generation sequencing platforms, masking lower abundance transcripts. Methods to deplete globin mRNA are available, but their effect has not been comprehensively studied in peripheral whole blood RNA-Seq data. In this study we aimed to assess technical variability associated with globin depletion, in addition to assessing general technical variability in RNA-Seq from whole blood derived samples.
We compared technical and biological replicates having undergone globin depletion or not and found that the experimental globin depletion protocol employed removed approximately 80% of globin transcripts, improved the correlation of technical replicates, allowed for reliable detection of thousands of additional transcripts and generally increased transcript abundance measures. Differential expression analysis revealed thousands of genes significantly up-regulated as a result of globin depletion. In addition, globin depletion resulted in the down-regulation of genes involved in both iron and zinc metal ion bonding.
Globin depletion appears to meaningfully improve the quality of peripheral whole blood RNA-Seq data, and may improve our ability to detect true biological variation. Some concerns remain, however. Key amongst them is the significant reduction in RNA yields following globin depletion. More generally, our investigation of technical and biological variation with and without globin depletion finds that high-throughput sequencing by RNA-Seq is highly reproducible within a large dynamic range of detection and provides an accurate estimation of RNA concentration in peripheral whole blood. High-throughput sequencing is thus a promising technology for whole blood transcriptomics and biomarker discovery.
In this study, we explored a time course of peripheral whole blood transcriptomes from kidney transplantation patients who either experienced an acute rejection episode or did not, in order to better delineate the immunological and biological processes measurable in blood leukocytes that are associated with acute renal allograft rejection. Using microarrays, we generated gene expression data from 24 acute rejectors and 24 nonrejectors. We filtered the data to obtain the most unambiguous and robustly expressing probe sets and selected a subset of patients with the clearest phenotype. We then performed a data-driven exploratory analysis using data reduction and differential gene expression analysis tools in order to reveal gene expression signatures associated with acute allograft rejection. Using a template-matching algorithm, we then expanded our analysis to include time course data, identifying genes whose expression is modulated leading up to acute rejection. We have identified molecular phenotypes associated with acute renal allograft rejection, including a significantly upregulated signature of neutrophil activation and accumulation following transplant surgery that is common to both acute rejectors and nonrejectors. Our analysis shows that this expression signature appears to stabilize over time in nonrejectors but persists in patients who go on to reject the transplanted organ. In addition, we describe an expression signature characteristic of lymphocyte activity and proliferation. This lymphocyte signature is significantly downregulated in both acute rejectors and nonrejectors following surgery; however, patients who go on to reject the organ show a persistent downregulation of this signature relative to the neutrophil signature.
blood transcriptomics; microarray; kidney transplant rejection; peripheral whole blood; neutrophil to lymphocyte ratio
End-stage renal failure is associated with profound changes in physiology and health, but the molecular causation of these pleomorphic effects termed “uremia” is poorly understood. The genomic changes of uremia were explored in a whole genome microarray case-control comparison of 95 subjects with end-stage renal failure (n = 75) or healthy controls (n = 20).
RNA was separated from blood drawn in PAXgene tubes and gene expression analyzed using Affymetrix Human Genome U133 Plus 2.0 arrays. Quality control and normalization were performed, and statistical significance determined with multiple test corrections (qFDR). Biological interpretation was aided by knowledge mining using NIH DAVID, MetaCore and PubGene.
Over 9,000 genes were differentially expressed in uremic subjects compared to normal controls (fold change: -5.3 to +6.8), and more than 65% were lower in uremia. Changes appeared to be regulated through key gene networks involving cMYC, SP1, P53, AP1, NFkB, HNF4 alpha, HIF1A, c-Jun, STAT1, STAT3 and CREB1. Gene set enrichment analysis showed that mRNA processing and transport, protein transport, chaperone functions, the unfolded protein response and genes involved in tumorigenesis were prominently lower in uremia, while insulin-like growth factor activity, neuroactive receptor interaction, the complement system, lipoprotein metabolism and lipid transport were higher in uremia. Pathways involving cytoskeletal remodeling, the clathrin-coated endosomal pathway, T-cell receptor signaling and CD28 pathways, and many immune and biological mechanisms were significantly down-regulated, while the ubiquitin pathway and certain others were up-regulated.
End-stage renal failure is associated with profound changes in human gene expression, which appear to be mediated through key transcription factors. Dialysis and primary kidney disease had minor effects on gene regulation, but uremia was the dominant influence in the changes observed. These data provide important insight into the changes in cellular biology and function, opportunities for biomarkers of disease progression and therapy, and potential targets for intervention in uremia.
Gene expression profiling; Uremia; Chronic renal failure
Recent technical advances in the field of quantitative proteomics have stimulated a large number of biomarker discovery studies of various diseases, providing avenues for new treatments and diagnostics. However, inherent challenges have limited the successful translation of candidate biomarkers into clinical use, thus highlighting the need for a robust analytical methodology to transition from biomarker discovery to clinical implementation. We have developed an end-to-end computational proteomic pipeline for biomarker studies. At the discovery stage, the pipeline emphasizes different aspects of experimental design, appropriate statistical methodologies, and quality assessment of results. At the validation stage, the pipeline focuses on the migration of the results to a platform appropriate for external validation, and the development of a classifier score based on corroborated protein biomarkers. At the last stage towards clinical implementation, the main aims are to develop and validate an assay suitable for clinical deployment, and to calibrate the biomarker classifier using the developed assay. The proposed pipeline was applied to a biomarker study in cardiac transplantation aimed at developing a minimally invasive clinical test to monitor acute rejection. Starting with an untargeted screening of the human plasma proteome, five candidate biomarker proteins were identified. Rejection-regulated proteins reflect cellular and humoral immune responses, acute phase inflammatory pathways, and lipid metabolism biological processes. A multiplex multiple reaction monitoring mass-spectrometry (MRM-MS) assay was developed for the five candidate biomarkers and validated by enzyme-linked immunosorbent (ELISA) and immunonephelometric (INA) assays. A classifier score based on corroborated proteins demonstrated that the developed MRM-MS assay provides an appropriate methodology for an external validation, which is still in progress.
Plasma proteomic biomarkers of acute cardiac rejection may offer a relevant post-transplant monitoring tool to effectively guide clinical care. The proposed computational pipeline is highly applicable to a wide range of biomarker proteomic studies.
Novel proteomic technology has led to the generation of vast amounts of biological data and the identification of numerous potential biomarkers. However, computational approaches to translate this information into knowledge capable of impacting clinical care have been lagging. We propose a computational proteomic pipeline for biomarker studies that is founded on the combination of advanced statistical methodologies. We demonstrate our approach through the analysis of data obtained from heart transplant patients. Heart transplantation is the gold standard treatment for patients with end-stage heart failure, but is complicated by episodes of immune rejection that can adversely impact patient outcomes. Current rejection monitoring approaches are highly invasive, requiring a biopsy of the heart. This work aims to reduce the need for biopsies, and demonstrate the power and utility of computational approaches in proteomic biomarker discovery. Our work utilizes novel high-throughput proteomic technology combined with advanced statistical techniques to identify blood markers that guide the decision as to whether a biopsy is warranted, reduce the number of unnecessary biopsies, and ultimately diagnose the presence of rejection in heart transplant patients. Additionally, the proposed computational methodologies can be applied to a range of proteomic biomarker studies of various diseases and conditions.
Biomarker panels derived separately from genomic and proteomic data and with a variety of computational methods have demonstrated promising classification performance in various diseases. An open question is how to create effective proteo-genomic panels. The framework of ensemble classifiers has been applied successfully in various analytical domains to combine classifiers so that the performance of the ensemble exceeds the performance of individual classifiers. Using blood-based diagnosis of acute renal allograft rejection as a case study, we address the following question in this paper: Can acute rejection classification performance be improved by combining individual genomic and proteomic classifiers in an ensemble?
The first part of the paper presents a computational biomarker development pipeline for genomic and proteomic data. The pipeline begins with data acquisition (e.g., from bio-samples to microarray data), quality control, statistical analysis and mining of the data, and finally various forms of validation. The pipeline ensures that the various classifiers to be combined later in an ensemble are diverse and adequate for clinical use. Five mRNA genomic and five proteomic classifiers were developed independently using single time-point blood samples from 11 acute-rejection and 22 non-rejection renal transplant patients. The second part of the paper examines five ensembles ranging in size from two to 10 individual classifiers. Performance of ensembles is characterized by area under the curve (AUC), sensitivity, and specificity, as derived from the probability of acute rejection for individual classifiers in the ensemble in combination with one of two aggregation methods: (1) Average Probability or (2) Vote Threshold. One ensemble demonstrated superior performance and was able to improve sensitivity and AUC beyond the best values observed for any of the individual classifiers in the ensemble, while staying within the range of observed specificity. The Vote Threshold aggregation method achieved improved sensitivity for all 5 ensembles, but typically at the cost of decreased specificity.
Proteo-genomic biomarker ensemble classifiers show promise in the diagnosis of acute renal allograft rejection and can improve classification performance beyond that of individual genomic or proteomic classifiers alone. Validation of our results in an international multicenter study is currently underway.
Biomarkers; Computational; Pipeline; Genomics; Proteomics; Ensemble; Classification
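The two aggregation methods named above can be illustrated with a short Python sketch. The abstract does not define them precisely, so the rules here are assumptions: "Average Probability" is taken as the mean of the individual classifiers' acute-rejection probabilities compared with a cutoff, and "Vote Threshold" as counting classifiers whose probability reaches the cutoff and requiring a minimum number of votes. The function names, the 0.5 cutoff, and the example probabilities are hypothetical.

```python
def average_probability(probabilities, cutoff=0.5):
    """Call acute rejection if the mean classifier probability reaches the cutoff."""
    mean_probability = sum(probabilities) / len(probabilities)
    return mean_probability, mean_probability >= cutoff


def vote_threshold(probabilities, min_votes=1, cutoff=0.5):
    """Call acute rejection if at least `min_votes` classifiers vote for it."""
    votes = sum(1 for p in probabilities if p >= cutoff)
    return votes, votes >= min_votes


# Hypothetical outputs of three classifiers for one patient (not study data).
example_probabilities = [0.9, 0.2, 0.3]

mean_p, call_by_average = average_probability(example_probabilities)
votes, call_by_vote = vote_threshold(example_probabilities, min_votes=1)
```

In this toy case a single confident classifier is enough for a positive call under a one-vote threshold, while the averaged probability stays below the cutoff. That behavior is consistent with a vote rule gaining sensitivity at some cost in specificity, as the abstract reports.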
To investigate the impact of genetic polymorphisms in CYP2D6, CYP3A5, CYP2C9 and CYP2C19 on the pharmacokinetics of tamoxifen and its metabolites in Asian breast cancer patients.
A total of 165 Asian breast cancer patients receiving 20 mg tamoxifen daily and 228 healthy Asian subjects (Chinese, Malay and Indian; n = 76 each) were recruited. The steady-state plasma concentrations of tamoxifen and its metabolites were quantified using high-performance liquid chromatography. The CYP2D6 polymorphisms were genotyped using the INFINITI™ CYP450 2D6I assay, while the polymorphisms in CYP3A5, CYP2C9 and CYP2C19 were determined via direct sequencing.
The polymorphisms, CYP2D6*5 and *10, were significantly associated with lower endoxifen and higher N-desmethyltamoxifen (NDM) concentrations. Patients who were *1/*1 carriers exhibited 2.4- to 2.6-fold higher endoxifen concentrations and 1.9- to 2.1-fold lower NDM concentrations than either *10/*10 or *5/*10 carriers (P < 0.001). Similarly, the endoxifen concentrations were found to be 1.8- to 2.6-times higher in *1/*5 or *1/*10 carriers compared with *10/*10 and *5/*10 carriers (P ≤ 0.001). Similar relationships were observed between the CYP2D6 polymorphisms and metabolic ratios of tamoxifen and its metabolites. No significant associations were observed with regard to the polymorphisms in CYP3A5, CYP2C9 and CYP2C19.
The present study in Asian breast cancer patients showed that CYP2D6*5/*10 and *10/*10 genotypes are associated with significantly lower concentrations of the active metabolite of tamoxifen, endoxifen. Identifying such patients before the start of treatment may be useful in optimizing therapy with tamoxifen. The roles of CYP3A5, CYP2C9 and CYP2C19 seem to be minor.
CYP2C19; CYP2D6; CYP3A5; pharmacogenetics; pharmacokinetics; tamoxifen
Acute cardiac allograft rejection is a serious complication of heart transplantation. Investigating molecular processes in whole blood via microarrays is a promising avenue of research in transplantation, particularly due to the non-invasive nature of blood sampling. However, whole blood is a complex tissue and the consequent heterogeneity in composition amongst samples is ignored in traditional microarray analysis. This complicates the biological interpretation of microarray data. Here we have applied a statistical deconvolution approach, cell-specific significance analysis of microarrays (csSAM), to whole blood samples from subjects either undergoing acute heart allograft rejection (AR) or not (NR). We identified eight differentially expressed probe-sets significantly correlated to monocytes (mapping to 6 genes, all down-regulated in ARs versus NRs) at a false discovery rate (FDR) ≤ 15%. None of the genes identified are present in a biomarker panel of acute heart rejection previously published by our group and discovered in the same data.
microarray expression; cell-specific expression; deconvolution; heart; transplantation
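The core idea behind statistical deconvolution of this kind can be sketched in Python: within each group, the whole-blood expression of a gene is modeled as a proportion-weighted sum of cell type-specific expression levels, the cell-specific levels are estimated by least squares, and the estimates are compared between groups. The sketch below is deliberately minimal and rests on assumptions: it handles only two cell types, uses synthetic noise-free numbers, and omits the permutation-based FDR estimation that a full analysis would need. The function name and all values are hypothetical, and it is not the csSAM implementation.

```python
def cell_specific_expression(proportions, mixed_expression):
    """Least-squares estimate of (h1, h2) in: mixed_i ~ p1_i * h1 + p2_i * h2.

    `proportions` holds one (p1, p2) pair of cell-type fractions per sample.
    The 2x2 normal equations are solved in closed form (Cramer's rule).
    """
    s11 = sum(p1 * p1 for p1, _ in proportions)
    s12 = sum(p1 * p2 for p1, p2 in proportions)
    s22 = sum(p2 * p2 for _, p2 in proportions)
    t1 = sum(p1 * m for (p1, _), m in zip(proportions, mixed_expression))
    t2 = sum(p2 * m for (_, p2), m in zip(proportions, mixed_expression))
    det = s11 * s22 - s12 * s12
    if abs(det) < 1e-12:
        raise ValueError("cell-type proportions do not vary enough across samples")
    h1 = (t1 * s22 - t2 * s12) / det
    h2 = (s11 * t2 - s12 * t1) / det
    return h1, h2


# Synthetic example (NOT study data): fractions of (other leukocytes, monocytes).
nonrejector_props = [(0.7, 0.3), (0.6, 0.4), (0.5, 0.5)]
nonrejector_mixed = [8.2, 7.6, 7.0]   # generated with h = (10, 4)
rejector_props = [(0.8, 0.2), (0.6, 0.4), (0.4, 0.6)]
rejector_mixed = [8.4, 6.8, 5.2]      # generated with h = (10, 2)

nr_estimate = cell_specific_expression(nonrejector_props, nonrejector_mixed)
ar_estimate = cell_specific_expression(rejector_props, rejector_mixed)

# Negative value: the gene is lower in monocytes of rejectors in this toy data.
monocyte_difference = ar_estimate[1] - nr_estimate[1]
```

The estimate is only identifiable when sample composition varies across individuals, which is why the sketch raises an error for a near-singular system.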
MicroRNA-21 (miR-21) is thought to be an oncomir because it promotes cancer cell proliferation, migration, and survival. miR-21 is also expressed in normal cells, but its physiological role is poorly understood. Recently, it has been found that miR-21 expression is rapidly induced in rodent hepatocytes during liver regeneration after two-thirds partial hepatectomy (2/3 PH). Here, we investigated the function of miR-21 in regenerating mouse hepatocytes by inhibiting it with an antisense oligonucleotide. To maintain normal hepatocyte viability and function, we antagonized the miR-21 surge induced by 2/3 PH while preserving baseline expression. We found that knockdown of miR-21 impaired progression of hepatocytes into S phase of the cell cycle, mainly through a decrease in levels of cyclin D1 protein, but not Ccnd1 mRNA. Mechanistically, we discovered that increased miR-21 expression facilitated cyclin D1 translation in the early phase of liver regeneration by relieving Akt1/mTOR complex 1 signaling (and thus eIF-4F–mediated translation initiation) from suppression by Rhob. Our findings reveal that miR-21 enables rapid hepatocyte proliferation during liver regeneration by accelerating cyclin D1 translation.
To evaluate the anti-microbial effects of photodynamic therapy (PDT) on infected human teeth ex vivo.
Materials and Methods
Fifty-two freshly extracted teeth with pulpal necrosis and associated periradicular radiolucencies were obtained from 34 subjects. Twenty-six teeth with 49 canals received chemomechanical debridement (CMD) with 6% NaOCl and twenty-six teeth with 52 canals received CMD plus PDT. For PDT, root canal systems were incubated with methylene blue (MB) at a concentration of 50 µg/ml for 5 minutes followed by exposure to red light at 665 nm with an energy fluence of 30 J/cm². The contents of root canals were sampled by flushing the canals at baseline and following CMD alone or CMD+PDT and were serially diluted and cultured on blood agar. Survival fractions were calculated by counting colony-forming units (CFU). Partial characterization of root canal species at baseline and following CMD alone or CMD+PDT was performed using DNA probes to a panel of 39 endodontic species in the checkerboard assay.
The Mantel-Haenszel chi-square test for treatment effects demonstrated the better performance of CMD+PDT over CMD (P = 0.026). CMD+PDT significantly reduced the frequency of positive canals relative to CMD alone (P = 0.0003). Following CMD+PDT, 45 of 52 canals (86.5%) had no CFU as compared to 24 of 49 canals (49%) treated with CMD (canal flush samples). The CFU reductions were similar when teeth or canals were treated as independent entities. Post-treatment detection levels for all species were markedly lower for canals treated by CMD+PDT than for those treated by CMD alone. Bacterial species within dentinal tubules were detected in 17/22 (77.3%) and 15/29 (51.7%) of canals in the CMD and CMD+PDT group, respectively (P = 0.034).
Data indicate that PDT significantly reduces residual bacteria within the root canal system, and that PDT, if further enhanced by technical improvements, holds substantial promise as an adjunct to CMD.
Photodynamic therapy; methylene blue; endodontic disinfection; ex vivo
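The percentages reported above follow directly from the canal counts, as the following small Python check shows. It only reproduces the arithmetic from the counts given in the abstract; the helper and variable names are hypothetical, and no significance test is recomputed here.

```python
def percent(count, total):
    """Percentage of `count` out of `total`, rounded to one decimal place."""
    return round(100.0 * count / total, 1)


# (canals with no CFU, canals treated) from the canal flush samples.
culture_negative = {"CMD+PDT": (45, 52), "CMD": (24, 49)}
# (canals with species detected in dentinal tubules, canals examined).
tubule_positive = {"CMD": (17, 22), "CMD+PDT": (15, 29)}

culture_negative_pct = {k: percent(*v) for k, v in culture_negative.items()}
tubule_positive_pct = {k: percent(*v) for k, v in tubule_positive.items()}
```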
Recent evidence has contradicted the prevailing view that homeostasis and regeneration of the adult liver are mediated by self-duplication of lineage-restricted hepatocytes and biliary epithelial cells. These new data suggest that liver progenitor cells do not function solely as a backup system in chronic liver injury; rather, they also produce hepatocytes after acute injury and are in fact the main source of new hepatocytes during normal hepatocyte turnover. In addition, other evidence suggests that hepatocytes are capable of lineage conversion, acting as precursors of biliary epithelial cells during biliary injury. To test these concepts, we generated a hepatocyte fate-tracing model based on timed and specific Cre recombinase expression and marker gene activation in all hepatocytes of adult Rosa26 reporter mice with an adeno-associated viral vector. We found that newly formed hepatocytes derived from preexisting hepatocytes in the normal liver and that liver progenitor cells contributed minimally to acute hepatocyte regeneration. Further, we found no evidence that biliary injury induced conversion of hepatocytes into biliary epithelial cells. These results therefore restore the previously prevailing paradigms of liver homeostasis and regeneration. In addition, our new vector system will be a valuable tool for timed, efficient, and specific loop out of floxed sequences in hepatocytes.
MicroRNAs (miRNAs) constitute a new class of regulators of gene expression. Among other actions, miRNAs have been shown to control cell proliferation in development and cancer. However, whether miRNAs regulate hepatocyte proliferation during liver regeneration is unknown. We addressed this question by performing 2/3 partial hepatectomy (2/3 PH) on mice with hepatocyte-specific inactivation of DiGeorge syndrome critical region gene 8 (DGCR8), an essential component of the miRNA processing pathway. Hepatocytes of these mice were miRNA-deficient and exhibited a delay in cell cycle progression involving the G1 to S phase transition. Examination of livers of wildtype mice after 2/3 PH revealed differential expression of a subset of miRNAs, notably an induction of miR-21 and repression of miR-378. We further discovered that miR-21 directly inhibits Btg2, a cell cycle inhibitor that prevents activation of forkhead box M1 (FoxM1), which is essential for DNA synthesis in hepatocytes after 2/3 PH. In addition, we found that miR-378 directly inhibits ornithine decarboxylase (Odc1), which is known to promote DNA synthesis in hepatocytes after 2/3 PH.
Our results show that miRNAs are critical regulators of hepatocyte proliferation during liver regeneration. Because these miRNAs and target gene interactions are conserved, our findings may also be relevant to human liver regeneration.
An important consideration when analyzing both microarray and quantitative PCR expression data is the selection of appropriate genes as endogenous controls or reference genes. This step is especially critical when identifying genes differentially expressed between datasets. Moreover, reference genes suitable in one context (e.g. lung cancer) may not be suitable in another (e.g. breast cancer). Currently, the main approach to identify reference genes involves the mining of expression microarray data for highly expressed and relatively constant transcripts across a sample set. A caveat here is the requirement for transcript normalization prior to analysis, and measurements obtained are relative, not absolute. Alternatively, as sequencing-based technologies provide digital quantitative output, absolute quantification ensues, and reference gene identification becomes more accurate.
Serial analysis of gene expression (SAGE) profiles of non-malignant and malignant lung samples were compared using a permutation test to identify the most stably expressed genes across all samples. Subsequently, the specificity of the reference genes was evaluated across multiple tissue types, their constancy of expression was assessed using quantitative RT-PCR (qPCR), and their impact on differential expression analysis of microarray data was evaluated.
We show that (i) conventional reference genes such as ACTB and GAPDH are highly variable between cancerous and non-cancerous samples, (ii) reference genes identified for lung cancer do not perform well for other cancer types (breast and brain), (iii) reference genes identified through SAGE show low variability using qPCR in a different cohort of samples, and (iv) normalization of a lung cancer gene expression microarray dataset with or without our reference genes yields different results for differential gene expression and subsequent analyses. Specifically, key established pathways in lung cancer exhibit higher statistical significance using a dataset normalized with our reference genes relative to normalization without using our reference genes.
Our analyses found NDUFA1, RPL19, RAB5C, and RPS18 to occupy the top ranking positions among 15 suitable reference genes optimal for normalization of lung tissue expression data. Significantly, the approach used in this study can be applied to data generated using new generation sequencing platforms for the identification of reference genes optimal within diverse contexts.
Ensemble methods have become popular for QSAR modeling, but most studies have assumed balanced data consisting of approximately equal numbers of active and inactive compounds. Cheminformatics data is often far from being balanced. We extend the application of ensemble methods to include cases of imbalance of class membership and to more adequately assess model output. Based on the extension, we propose an ensemble method called MBEnsemble that automatically determines the appropriate tuning parameters to provide reliable predictions and maximize the F-measure. Results from multiple datasets demonstrate that the proposed ensemble technique works well on imbalanced data.
Ensemble; Imbalanced Data; F-measure; Majority Vote; Probability Averaging and Threshold
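To make the idea of tuning an ensemble for the F-measure on imbalanced data concrete, here is a minimal Python sketch. It averages nothing itself; it assumes the ensemble's averaged probabilities are already available and simply picks, from a list of candidate cutoffs, the one that maximizes the F-measure. This is an illustration of the general technique under those assumptions, not the MBEnsemble procedure, whose details are not given in the abstract. All names and numbers are hypothetical.

```python
def f_measure(y_true, y_pred):
    """Harmonic mean of precision and recall for the active (1) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def best_threshold(y_true, averaged_probabilities, candidates):
    """Return (cutoff, F-measure) for the candidate cutoff with the highest F."""
    def score(cutoff):
        predictions = [1 if p >= cutoff else 0 for p in averaged_probabilities]
        return f_measure(y_true, predictions)

    chosen = max(candidates, key=score)
    return chosen, score(chosen)


# Imbalanced toy data (NOT from the paper): 2 actives among 10 compounds.
labels = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
ensemble_probabilities = [0.45, 0.30, 0.35, 0.10, 0.05, 0.20, 0.15, 0.10, 0.05, 0.25]

cutoff, f_score = best_threshold(labels, ensemble_probabilities, [0.5, 0.4, 0.3, 0.2])
```

With so few actives, the conventional 0.5 cutoff predicts no actives at all in this toy set, which is the kind of failure that motivates choosing the threshold by F-measure rather than by default. In practice the cutoff would be selected on training or validation data, not on the data used to report performance.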
Acute graft rejection is an important clinical problem in renal transplantation and an adverse predictor for long-term graft survival. Plasma biomarkers may offer an important option for post-transplant monitoring and permit timely and effective therapeutic intervention to minimize graft damage. This case-control discovery study (n = 32) used isobaric tagging for relative and absolute protein quantification (iTRAQ) technology to quantitate plasma protein relative concentrations in precise cohorts of patients with and without biopsy-confirmed acute rejection (BCAR). Plasma samples were depleted of the 14 most abundant plasma proteins to enhance detection sensitivity. A total of 18 plasma proteins that encompassed processes related to inflammation, complement activation, blood coagulation, and wound repair exhibited significantly different relative concentrations between patient cohorts with and without BCAR (p value <0.05). Twelve proteins with a fold-change ≥1.15 were selected for diagnostic purposes: seven were increased (titin, lipopolysaccharide-binding protein, peptidase inhibitor 16, complement factor D, mannose-binding lectin, protein Z-dependent protease and β2-microglobulin) and five were decreased (kininogen-1, afamin, serine protease inhibitor, phosphatidylcholine-sterol acyltransferase, and sex hormone-binding globulin) in patients with BCAR. The first three principal components of these proteins showed clear separation of cohorts with and without BCAR. Performance improved with the inclusion of sequential proteins, reaching a primary asymptote after the first three (titin, kininogen-1, and lipopolysaccharide-binding protein). Longitudinal monitoring over the first 3 months post-transplant based on ratios of these three proteins showed clear discrimination between the two patient cohorts at time of rejection.
The score then declined to baseline following treatment and resolution of the rejection episode and remained comparable between cases and controls throughout the period of quiescent follow-up. Results were validated using ELISA where possible, and initial cross-validation estimated a sensitivity of 80% and specificity of 90% for classification of BCAR based on a four-protein ELISA classifier. This study provides evidence that protein concentrations in plasma may provide a relevant measure for the occurrence of BCAR and offers a potential tool for immunologic monitoring.
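For readers less familiar with the reported performance measures, the following minimal sketch shows how sensitivity and specificity are computed from classifier calls. The labels and predictions are hypothetical counts chosen only to illustrate the arithmetic; they are not the study's data, and the function name is illustrative.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP).

    Label 1 denotes biopsy-confirmed acute rejection, label 0 no rejection.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical example: 5 rejection cases and 10 controls (not the study's data).
y_true = [1] * 5 + [0] * 10
y_pred = [1, 1, 1, 1, 0] + [0] * 9 + [1]  # one missed case, one false alarm

sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```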
Disruptions of beta-catenin and the canonical Wnt pathway are well documented in cancer. However, little is known about the non-canonical branch of the Wnt pathway. In this study, we investigate the transcript-level patterns of genes in the Wnt pathway in squamous cell lung cancer using reverse transcriptase (RT)-PCR. Over half of the samples examined exhibited dysregulated expression of multiple components of the non-canonical branch of the Wnt pathway. In the cases where beta-catenin (CTNNB1) was not over-expressed, we identified strong relationships of expression between wingless-type MMTV integration site family member 5A (WNT5A)/frizzled homolog 2 (FZD2), frizzled homolog 3 (FZD3)/dishevelled 2 (DVL2), and low density lipoprotein receptor-related protein 5 (LRP5)/secreted frizzled-related protein 4 (SFRP4). This is one of the first studies to demonstrate expression of genes in the non-canonical pathway in normal lung tissue and its disruption in lung squamous cell carcinoma. These findings suggest that the non-canonical pathway may have a more prominent role in lung cancer than previously reported.
WNT pathway; lung cancer; gene expression; NSCLC; non-canonical; squamous cell carcinoma
Non-small cell lung cancer (NSCLC) presents as a progressive disease spanning precancerous, preinvasive, locally invasive, and metastatic lesions. Identification of biological pathways reflective of these progressive stages, and aberrantly expressed genes associated with these pathways, would conceivably enhance therapeutic approaches to this devastating disease.
Through the construction and analysis of SAGE libraries, we have determined transcriptome profiles for preinvasive carcinoma-in-situ (CIS) and invasive squamous cell carcinoma (SCC) of the lung, and compared these with expression profiles generated from both bronchial epithelium and precancerous metaplastic and dysplastic lesions using Ingenuity Pathway Analysis. Expression of genes associated with epidermal development, and loss of expression of genes associated with mucociliary biology, are predominant features of CIS, largely shared with precancerous lesions. Additionally, expression of genes associated with xenobiotic metabolism/detoxification is a notable feature of CIS, and is largely maintained in invasive cancer. Genes related to tissue fibrosis and acute phase immune response are characteristic of the invasive SCC phenotype. Moreover, the data presented here suggest that tissue remodeling/fibrosis is initiated at the early stages of CIS. Additionally, this study indicates that alteration in copy-number status represents a plausible mechanism for differential gene expression in CIS and invasive SCC.
This study is the first report of large-scale expression profiling of CIS of the lung. Unbiased expression profiling of these preinvasive and invasive lesions provides a platform for further investigations into the molecular genetic events relevant to early stages of squamous NSCLC development. Additionally, up-regulated genes detected at extreme differences between CIS and invasive cancer may have potential to serve as biomarkers for early detection.
The study of oral premalignant lesions (OPL) is crucial to the identification of initiating genetic events in oral cancer. However, these lesions are minute in size, making it a challenge to recover sufficient DNA from microdissected cells for comprehensive genomic analysis. As a step toward identifying genetic aberrations associated with oral cancer progression, we used tiling-path array comparative genomic hybridization to compare alterations on chromosome 3p for 71 OPLs against 23 oral squamous cell carcinomas. 3p was chosen because although it is frequently altered in oral cancers and has been associated with progression risk, its alteration status has only been evaluated at a small number of loci in OPLs. We identified six recurrent losses in this region that were shared between high-grade dysplasias and oral squamous cell carcinomas, including a 2.89-Mbp deletion spanning the FHIT gene (previously implicated in oral cancer progression). When the alteration status for these six regions was examined in 24 low-grade dysplasias with known progression outcome, we observed that they occurred at a significantly higher frequency in low-grade dysplasias that later progressed to later-stage disease (P < 0.003). Moreover, parallel analysis of all profiled tissues showed that the extent of overall genomic alteration at 3p increased with histologic stage. This first high-resolution analysis of chromosome arm 3p in OPLs represents a significant step toward predicting progression risk in early preinvasive disease and provides a keen example of how genomic instability escalates with progression to invasive cancer.
Oral cancer develops through a series of histopathological stages: through mild (low grade), moderate, and severe (high grade) dysplasia to carcinoma in situ and then invasive disease. Early detection of those oral premalignant lesions (OPLs) that will develop into invasive tumors is necessary to improve the poor prognosis of oral cancer. Because no tools exist for delineating progression risk in low grade oral lesions, we cannot determine which of these cases require aggressive intervention. We undertook whole genome analysis by tiling-path array comparative genomic hybridization for a rare panel of early and late stage OPLs (n = 62), all of which had extensive longitudinal follow up (>10 years). Genome profiles for oral squamous cell carcinomas (n = 24) were generated for comparison. Parallel analysis of genome alterations and clinical parameters was performed to identify features associated with disease progression. Genome alterations in low grade dysplasias progressing to invasive disease more closely resembled those observed for later stage disease than they did those observed for non-progressing low grade dysplasias. This was despite the histopathological similarity between progressing and non-progressing cases. Strikingly, unbiased computational analysis of genomic alteration data correctly classified nearly all progressing low grade dysplasia cases. Our data demonstrate that high resolution genomic analysis can be used to evaluate progression risk in low grade OPLs, a marked improvement over present histopathological approaches which cannot delineate progression risk. Taken together, our data suggest that whole genome technologies could be used in management strategies for patients presenting with precancerous oral lesions.
Motivation: Analysis of array comparative genomic hybridization (aCGH) data for recurrent DNA copy number alterations from a cohort of patients can yield distinct sets of molecular signatures or profiles. This can be due to the presence of heterogeneous cancer subtypes within a supposedly homogeneous population.
Results: We propose a novel statistical method for automatically detecting such subtypes or clusters. Our approach is model based: each cluster is defined in terms of a sparse profile, which contains the locations of unusually frequent alterations. The profile is represented as a hidden Markov model. Samples are assigned to clusters based on their similarity to the cluster's profile. We simultaneously infer the cluster assignments and the cluster profiles using an expectation-maximization-like algorithm. We show, using a realistic simulation study, that our method is significantly more accurate than standard clustering techniques. We then apply our method to two clinical datasets. In particular, we examine previously reported aCGH data from a cohort of 106 follicular lymphoma patients, and discover clusters that are known to correspond to clinically relevant subgroups. In addition, we examine a cohort of 92 diffuse large B-cell lymphoma patients, and discover previously unreported clusters of biological interest, which have inspired follow-up clinical research on an independent cohort.
Availability: Software and synthetic datasets are available at http://www.cs.ubc.ca/~sshah/acgh as part of the CNA-HMMer package.
Supplementary information: Supplementary data are available at Bioinformatics online.
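The alternating inference of cluster assignments and cluster profiles described in the Results can be illustrated with a deliberately simplified sketch. It is not the CNA-HMMer method: the hidden Markov model profile is replaced by independent per-probe alteration frequencies (a Bernoulli profile), assignments are hard rather than probabilistic, and the toy samples, the starting assignment, and all function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def fit_profile(members: Sequence[Sequence[int]], n_probes: int,
                pseudo: float = 1.0) -> List[float]:
    """Smoothed per-probe alteration frequency for one cluster (stand-in for an HMM profile)."""
    n = len(members)
    counts = [sum(col) for col in zip(*members)] if members else [0] * n_probes
    return [(c + pseudo) / (n + 2 * pseudo) for c in counts]


def log_likelihood(sample: Sequence[int], profile: Sequence[float]) -> float:
    """Log-probability of a binary alteration vector under a Bernoulli profile."""
    return sum(math.log(p) if x else math.log(1.0 - p)
               for x, p in zip(sample, profile))


def cluster(samples: Sequence[Sequence[int]], init_assign: Sequence[int],
            n_clusters: int, max_iter: int = 20) -> Tuple[List[int], List[List[float]]]:
    """Alternate between re-estimating profiles and re-assigning samples until stable."""
    n_probes = len(samples[0])
    assign = list(init_assign)
    profiles: List[List[float]] = []
    for _ in range(max_iter):
        # Profile step: re-estimate each cluster profile from its current members.
        profiles = [
            fit_profile([s for s, a in zip(samples, assign) if a == k], n_probes)
            for k in range(n_clusters)
        ]
        # Assignment step: move each sample to the profile it resembles most.
        new_assign = [
            max(range(n_clusters), key=lambda k: log_likelihood(s, profiles[k]))
            for s in samples
        ]
        if new_assign == assign:
            break
        assign = new_assign
    return assign, profiles


# Hypothetical toy data: 1 = altered probe, 0 = unaltered probe.
samples = [
    [1, 1, 0, 0, 0, 0],
    [1, 1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 0, 1, 1],
]
init_assign = [0, 1, 0, 1, 0, 1]  # deliberately mixed starting assignment

assign, profiles = cluster(samples, init_assign, n_clusters=2)
print(assign)
```

Ignoring the dependence between neighbouring probes keeps the sketch short; capturing that dependence is the reason the method described above uses a hidden Markov model for each profile.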
High-throughput microarray technologies have enabled the investigation of genomes, epigenomes, and transcriptomes at unprecedented resolution. However, software packages to handle, analyze, and visualize data from these multiple 'omics disciplines have not been adequately developed.
Here, we present SIGMA2, a system for the integrative genomic multi-dimensional analysis of cancer genomes, epigenomes, and transcriptomes. Multi-dimensional datasets can be simultaneously visualized and analyzed with respect to each dimension, allowing combinatorial integration of the different assays belonging to the different 'omics.
Genes altered at multiple levels, such as copy number, loss of heterozygosity (LOH), and DNA methylation, can be identified together with the consequential changes in gene expression, establishing SIGMA2 as a novel tool to facilitate the high-throughput systems biology analysis of cancer.
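The idea of identifying genes altered at multiple levels with a consequential expression change can be sketched as a simple combination of per-dimension calls. This is an illustration of the concept only and does not reflect SIGMA2's interface or algorithms; the gene names, call labels, and function name are hypothetical.

```python
from typing import Dict, List

# Hypothetical per-gene calls for one sample, one dictionary per DNA-level dimension.
dna_calls: Dict[str, Dict[str, str]] = {
    "copy_number": {"GENE_A": "loss", "GENE_B": "gain", "GENE_C": "loss"},
    "loh": {"GENE_A": "loh"},
    "methylation": {"GENE_A": "hyper", "GENE_C": "hyper", "GENE_D": "hypo"},
}
# Hypothetical expression calls for the same sample.
expression: Dict[str, str] = {
    "GENE_A": "down", "GENE_B": "up", "GENE_C": "unchanged", "GENE_D": "up",
}


def multi_hit_genes(dna_calls: Dict[str, Dict[str, str]],
                    expression: Dict[str, str],
                    min_levels: int = 2) -> Dict[str, List[str]]:
    """Genes altered in at least `min_levels` DNA-level dimensions whose expression also changed."""
    all_genes = set().union(*(calls.keys() for calls in dna_calls.values()))
    hits: Dict[str, List[str]] = {}
    for gene in sorted(all_genes):
        levels = sorted(name for name, calls in dna_calls.items() if gene in calls)
        changed = expression.get(gene, "unchanged") != "unchanged"
        if len(levels) >= min_levels and changed:
            hits[gene] = levels
    return hits


hits = multi_hit_genes(dna_calls, expression)
print(hits)
```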
X-box binding protein 1 (XBP-1), a basic leucine zipper transcription factor, plays a key role in the cellular unfolded protein response (UPR). There are two XBP-1 isoforms in cells, spliced XBP-1S and unspliced XBP-1U. XBP-1U has been shown to bind to the 21-bp Tax-responsive element of the human T-lymphotropic virus type 1 (HTLV-1) long terminal repeat (LTR) in vitro and transactivate HTLV-1 transcription. Here we identify XBP-1S as a transcription activator of HTLV-1. Compared to XBP-1U, XBP-1S demonstrates stronger activating effects on both basal and Tax-activated HTLV-1 transcription in cells. Our results show that both XBP-1S and XBP-1U interact with Tax and bind to the HTLV-1 LTR in vivo. In addition, elevated mRNA levels of the gene for XBP-1 and several UPR genes were detected in the HTLV-1-infected C10/MJ and MT2 T-cell lines, suggesting that HTLV-1 infection may trigger the UPR in host cells. We also identify Tax as a positive regulator of the expression of the gene for XBP-1. Activation of the UPR by tunicamycin showed no effect on the HTLV-1 LTR, suggesting that HTLV-1 transcription is specifically regulated by XBP-1. Collectively, our study demonstrates a novel host-virus interaction between a cellular factor XBP-1 and transcriptional regulation of HTLV-1.