The molecular profile of circulating blood can reflect physiological and pathological events occurring in other tissues and organs of the body and delivers a comprehensive view of the status of the immune system. Blood has been useful in studying the pathobiology of many diseases. It is accessible and easily collected, making it ideally suited to the development of diagnostic biomarker tests. The blood transcriptome has a high complement of globin RNA that could potentially saturate next-generation sequencing platforms, masking lower-abundance transcripts. Methods to deplete globin mRNA are available, but their effect has not been comprehensively studied in peripheral whole blood RNA-Seq data. In this study we aimed to assess the technical variability associated with globin depletion, in addition to assessing general technical variability in RNA-Seq from whole-blood-derived samples.
We compared technical and biological replicates having undergone globin depletion or not and found that the experimental globin depletion protocol employed removed approximately 80% of globin transcripts, improved the correlation of technical replicates, allowed for reliable detection of thousands of additional transcripts and generally increased transcript abundance measures. Differential expression analysis revealed thousands of genes significantly up-regulated as a result of globin depletion. In addition, globin depletion resulted in the down-regulation of genes involved in both iron and zinc metal ion bonding.
Globin depletion appears to meaningfully improve the quality of peripheral whole blood RNA-Seq data, and may improve our ability to detect true biological variation. Some concerns remain, however, key among them the significant reduction in RNA yield following globin depletion. More generally, our investigation of technical and biological variation with and without globin depletion finds that high-throughput sequencing by RNA-Seq is highly reproducible within a large dynamic range of detection and provides an accurate estimation of RNA concentration in peripheral whole blood. High-throughput sequencing is thus a promising technology for whole blood transcriptomics and biomarker discovery.
In this study, we explored a time course of peripheral whole blood transcriptomes from kidney transplantation patients who either experienced an acute rejection episode or did not, in order to better delineate the immunological and biological processes measurable in blood leukocytes that are associated with acute renal allograft rejection. Using microarrays, we generated gene expression data from 24 acute rejectors and 24 nonrejectors. We filtered the data to obtain the most unambiguous and robustly expressed probe sets and selected a subset of patients with the clearest phenotype. We then performed a data-driven exploratory analysis using data reduction and differential gene expression analysis tools in order to reveal gene expression signatures associated with acute allograft rejection. Using a template-matching algorithm, we then expanded our analysis to include time course data, identifying genes whose expression is modulated leading up to acute rejection. We have identified molecular phenotypes associated with acute renal allograft rejection, including a significantly upregulated signature of neutrophil activation and accumulation following transplant surgery that is common to both acute rejectors and nonrejectors. Our analysis shows that this expression signature appears to stabilize over time in nonrejectors but persists in patients who go on to reject the transplanted organ. In addition, we describe an expression signature characteristic of lymphocyte activity and proliferation. This lymphocyte signature is significantly downregulated in both acute rejectors and nonrejectors following surgery; however, patients who go on to reject the organ show a persistent downregulation of this signature relative to the neutrophil signature.
blood transcriptomics; microarray; kidney transplant rejection; peripheral whole blood; neutrophil to lymphocyte ratio
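The template-matching step described above can be illustrated with a minimal sketch. The abstract does not name the algorithm or its parameters, so everything here is an assumption: the sketch scores each gene's time course by its Pearson correlation with a user-defined template profile and keeps genes above a cutoff. The gene names, expression values, template shape, and the `min_r` cutoff are all hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def match_template(profiles, template, min_r=0.9):
    """Return (gene, r) pairs whose time course correlates with the template, best first."""
    scored = [(gene, pearson(values, template)) for gene, values in profiles.items()]
    hits = [(gene, r) for gene, r in scored if r >= min_r]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Hypothetical template: flat early, then rising toward the rejection time point.
template = [0, 0, 1, 2, 3]

# Hypothetical log-expression values at five time points.
profiles = {
    "gene_a": [5.0, 5.1, 6.0, 7.2, 8.1],  # rises like the template
    "gene_b": [7.0, 7.1, 6.9, 7.0, 7.1],  # essentially flat
    "gene_c": [9.0, 8.8, 8.0, 7.1, 6.0],  # falls (anti-correlated)
}

matches = match_template(profiles, template)
print(matches)
```

A falling template (for example `[3, 2, 1, 0, 0]`) would pick out genes such as `gene_c` instead; in practice the significance of each correlation would also need to be assessed.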
End-stage renal failure is associated with profound changes in physiology and health, but the molecular causation of these pleomorphic effects termed “uremia” is poorly understood. The genomic changes of uremia were explored in a whole genome microarray case-control comparison of 95 subjects with end-stage renal failure (n = 75) or healthy controls (n = 20).
RNA was isolated from blood drawn in PAXgene tubes, and gene expression was analyzed using Affymetrix Human Genome U133 Plus 2.0 arrays. Quality control and normalization were performed, and statistical significance was determined with multiple test corrections (qFDR). Biological interpretation was aided by knowledge mining using NIH DAVID, MetaCore and PubGene.
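As an illustration of multiple test correction, the sketch below implements the Benjamini-Hochberg step-up adjustment. The abstract says only "qFDR" and does not state which procedure was used, so the choice of Benjamini-Hochberg, the example p-values, and the 0.05 cutoff are assumptions made for illustration.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for five probe sets.
p_values = [0.001, 0.008, 0.039, 0.041, 0.60]
q_values = benjamini_hochberg(p_values)

# Assumed cutoff: call a probe set significant at an adjusted value of 0.05 or less.
significant = [q <= 0.05 for q in q_values]
print(q_values, significant)
```

Note that the third and fourth probe sets have raw p-values below 0.05 but are no longer significant after adjustment, which is the purpose of the correction when thousands of probe sets are tested.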
Over 9,000 genes were differentially expressed in uremic subjects compared to normal controls (fold change: -5.3 to +6.8), and more than 65% were lower in uremia. Changes appeared to be regulated through key gene networks involving cMYC, SP1, P53, AP1, NFkB, HNF4 alpha, HIF1A, c-Jun, STAT1, STAT3 and CREB1. Gene set enrichment analysis showed that mRNA processing and transport, protein transport, chaperone functions, the unfolded protein response and genes involved in tumor genesis were prominently lower in uremia, while insulin-like growth factor activity, neuroactive receptor interaction, the complement system, lipoprotein metabolism and lipid transport were higher in uremia. Pathways involving cytoskeletal remodeling, the clathrin-coated endosomal pathway, T-cell receptor signaling and CD28 pathways, and many immune and biological mechanisms were significantly down-regulated, while the ubiquitin pathway and certain others were up-regulated.
End-stage renal failure is associated with profound changes in human gene expression, which appear to be mediated through key transcription factors. Dialysis and primary kidney disease had minor effects on gene regulation, whereas uremia was the dominant influence on the changes observed. These data provide important insight into the changes in cellular biology and function, opportunities for biomarkers of disease progression and therapy, and potential targets for intervention in uremia.
Gene expression profiling; Uremia; Chronic renal failure
Recent technical advances in the field of quantitative proteomics have stimulated a large number of biomarker discovery studies of various diseases, providing avenues for new treatments and diagnostics. However, inherent challenges have limited the successful translation of candidate biomarkers into clinical use, thus highlighting the need for a robust analytical methodology to transition from biomarker discovery to clinical implementation. We have developed an end-to-end computational proteomic pipeline for biomarker studies. At the discovery stage, the pipeline emphasizes different aspects of experimental design, appropriate statistical methodologies, and quality assessment of results. At the validation stage, the pipeline focuses on the migration of the results to a platform appropriate for external validation, and the development of a classifier score based on corroborated protein biomarkers. At the last stage towards clinical implementation, the main aims are to develop and validate an assay suitable for clinical deployment, and to calibrate the biomarker classifier using the developed assay. The proposed pipeline was applied to a biomarker study in cardiac transplantation aimed at developing a minimally invasive clinical test to monitor acute rejection. Starting with an untargeted screening of the human plasma proteome, five candidate biomarker proteins were identified. Rejection-regulated proteins reflect cellular and humoral immune responses, acute phase inflammatory pathways, and lipid metabolism biological processes. A multiplex multiple reaction monitoring mass-spectrometry (MRM-MS) assay was developed for the five candidate biomarkers and validated by enzyme-linked immunosorbent (ELISA) and immunonephelometric (INA) assays. A classifier score based on corroborated proteins demonstrated that the developed MRM-MS assay provides an appropriate methodology for an external validation, which is still in progress.
Plasma proteomic biomarkers of acute cardiac rejection may offer a relevant post-transplant monitoring tool to effectively guide clinical care. The proposed computational pipeline is highly applicable to a wide range of biomarker proteomic studies.
Novel proteomic technology has led to the generation of vast amounts of biological data and the identification of numerous potential biomarkers. However, computational approaches to translate this information into knowledge capable of impacting clinical care have been lagging. We propose a computational proteomic pipeline for biomarker studies that is founded on the combination of advanced statistical methodologies. We demonstrate our approach through the analysis of data obtained from heart transplant patients. Heart transplantation is the gold standard treatment for patients with end-stage heart failure, but is complicated by episodes of immune rejection that can adversely impact patient outcomes. Current rejection monitoring approaches are highly invasive, requiring a biopsy of the heart. This work aims to reduce the need for biopsies, and demonstrate the power and utility of computational approaches in proteomic biomarker discovery. Our work utilizes novel high-throughput proteomic technology combined with advanced statistical techniques to identify blood markers that guide the decision as to whether a biopsy is warranted, reduce the number of unnecessary biopsies, and ultimately diagnose the presence of rejection in heart transplant patients. Additionally, the proposed computational methodologies can be applied to a range of proteomic biomarker studies of various diseases and conditions.
Biomarker panels derived separately from genomic and proteomic data and with a variety of computational methods have demonstrated promising classification performance in various diseases. An open question is how to create effective proteo-genomic panels. The framework of ensemble classifiers has been applied successfully in various analytical domains to combine classifiers so that the performance of the ensemble exceeds the performance of individual classifiers. Using blood-based diagnosis of acute renal allograft rejection as a case study, we address the following question in this paper: Can acute rejection classification performance be improved by combining individual genomic and proteomic classifiers in an ensemble?
The first part of the paper presents a computational biomarker development pipeline for genomic and proteomic data. The pipeline begins with data acquisition (e.g., from bio-samples to microarray data), quality control, statistical analysis and mining of the data, and finally various forms of validation. The pipeline ensures that the various classifiers to be combined later in an ensemble are diverse and adequate for clinical use. Five mRNA genomic and five proteomic classifiers were developed independently using single time-point blood samples from 11 acute-rejection and 22 non-rejection renal transplant patients. The second part of the paper examines five ensembles ranging in size from two to 10 individual classifiers. Performance of ensembles is characterized by area under the curve (AUC), sensitivity, and specificity, as derived from the probability of acute rejection for individual classifiers in the ensemble in combination with one of two aggregation methods: (1) Average Probability or (2) Vote Threshold. One ensemble demonstrated superior performance and was able to improve sensitivity and AUC beyond the best values observed for any of the individual classifiers in the ensemble, while staying within the range of observed specificity. The Vote Threshold aggregation method achieved improved sensitivity for all 5 ensembles, but typically at the cost of decreased specificity.
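The two aggregation methods named above can be sketched as follows. The abstract does not give the exact decision rules, so the 0.5 probability cutoff and the `min_votes` parameter are assumptions, and the classifier probabilities are hypothetical.

```python
def average_probability(probabilities, cutoff=0.5):
    """Average the per-classifier probabilities; call rejection if the mean reaches the cutoff."""
    mean_p = sum(probabilities) / len(probabilities)
    return mean_p, mean_p >= cutoff


def vote_threshold(probabilities, min_votes=1, cutoff=0.5):
    """Count classifiers voting for rejection; call rejection if enough of them agree."""
    votes = sum(1 for p in probabilities if p >= cutoff)
    return votes, votes >= min_votes


# Hypothetical probabilities of acute rejection from four individual classifiers
# (for example, two genomic and two proteomic) for a single patient.
probs = [0.2, 0.7, 0.4, 0.3]

mean_p, call_by_average = average_probability(probs)
votes, call_by_vote = vote_threshold(probs, min_votes=1)
print(mean_p, call_by_average, votes, call_by_vote)
```

With these numbers the averaged probability stays below the cutoff while a single positive vote is enough to call rejection. This is one way a low vote threshold can raise sensitivity at the cost of specificity, consistent with the trade-off reported above.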
Proteo-genomic biomarker ensemble classifiers show promise in the diagnosis of acute renal allograft rejection and can improve classification performance beyond that of individual genomic or proteomic classifiers alone. Validation of our results in an international multicenter study is currently underway.
Biomarkers; Computational; Pipeline; Genomics; Proteomics; Ensemble; Classification
Acute cardiac allograft rejection is a serious complication of heart transplantation. Investigating molecular processes in whole blood via microarrays is a promising avenue of research in transplantation, particularly due to the non-invasive nature of blood sampling. However, whole blood is a complex tissue and the consequent heterogeneity in composition amongst samples is ignored in traditional microarray analysis. This complicates the biological interpretation of microarray data. Here we have applied a statistical deconvolution approach, cell-specific significance analysis of microarrays (csSAM), to whole blood samples from subjects either undergoing acute heart allograft rejection (AR) or not (NR). We identified eight differentially expressed probe-sets significantly correlated to monocytes (mapping to 6 genes, all down-regulated in ARs versus NRs) at a false discovery rate (FDR) ≤ 15%. None of the genes identified are present in a biomarker panel of acute heart rejection previously published by our group and discovered in the same data.
microarray expression; cell-specific expression; deconvolution; heart; transplantation
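The idea behind statistical deconvolution can be sketched with a simplified regression: within each group, a probe-set's whole blood expression is modeled as a proportion-weighted sum of cell-type-specific expression levels, and the fitted cell-specific levels are then compared between groups. This is a simplified illustration and not the csSAM implementation: it uses only two cell types, omits the permutation-based FDR estimation, and all proportions and expression values are hypothetical.

```python
def fit_two_cell_types(proportions, expression):
    """Least-squares fit of expression ~ f1 * c1 + f2 * c2 (no intercept).

    proportions: list of (f1, f2) cell-type fractions, one pair per sample.
    expression:  measured whole blood expression, one value per sample.
    Returns the estimated cell-type-specific expression levels (c1, c2).
    """
    s11 = sum(f1 * f1 for f1, _ in proportions)
    s12 = sum(f1 * f2 for f1, f2 in proportions)
    s22 = sum(f2 * f2 for _, f2 in proportions)
    t1 = sum(f1 * y for (f1, _), y in zip(proportions, expression))
    t2 = sum(f2 * y for (_, f2), y in zip(proportions, expression))
    det = s11 * s22 - s12 * s12
    if det == 0:
        raise ValueError("cell-type proportions are collinear; cannot fit")
    # Solve the 2x2 normal equations by Cramer's rule.
    c1 = (t1 * s22 - t2 * s12) / det
    c2 = (t2 * s11 - t1 * s12) / det
    return c1, c2


# Hypothetical (monocyte, other-cell) fractions for four samples per group.
fractions = [(0.10, 0.90), (0.20, 0.80), (0.30, 0.70), (0.05, 0.95)]

# Hypothetical whole blood expression of one probe-set in each group.
nr_expression = [2.8, 3.6, 4.4, 2.4]  # non-rejectors
ar_expression = [2.4, 2.8, 3.2, 2.2]  # acute rejectors

nr_mono, nr_other = fit_two_cell_types(fractions, nr_expression)
ar_mono, ar_other = fit_two_cell_types(fractions, ar_expression)

# A negative difference means lower monocyte-specific expression in rejectors.
monocyte_difference = ar_mono - nr_mono
print(nr_mono, ar_mono, monocyte_difference)
```

In this constructed example the whole blood values differ only modestly between groups, yet the fitted monocyte-specific level is clearly lower in rejectors while the other-cell level is unchanged, which is the kind of signal a composition-unaware analysis can miss.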
MicroRNA-21 (miR-21) is thought to be an oncomir because it promotes cancer cell proliferation, migration, and survival. miR-21 is also expressed in normal cells, but its physiological role is poorly understood. Recently, it has been found that miR-21 expression is rapidly induced in rodent hepatocytes during liver regeneration after two-thirds partial hepatectomy (2/3 PH). Here, we investigated the function of miR-21 in regenerating mouse hepatocytes by inhibiting it with an antisense oligonucleotide. To maintain normal hepatocyte viability and function, we antagonized the miR-21 surge induced by 2/3 PH while preserving baseline expression. We found that knockdown of miR-21 impaired progression of hepatocytes into S phase of the cell cycle, mainly through a decrease in levels of cyclin D1 protein, but not Ccnd1 mRNA. Mechanistically, we discovered that increased miR-21 expression facilitated cyclin D1 translation in the early phase of liver regeneration by relieving Akt1/mTOR complex 1 signaling (and thus eIF-4F–mediated translation initiation) from suppression by Rhob. Our findings reveal that miR-21 enables rapid hepatocyte proliferation during liver regeneration by accelerating cyclin D1 translation.
To evaluate the anti-microbial effects of photodynamic therapy (PDT) on infected human teeth ex vivo.
Materials and Methods
Fifty-two freshly extracted teeth with pulpal necrosis and associated periradicular radiolucencies were obtained from 34 subjects. Twenty-six teeth with 49 canals received chemomechanical debridement (CMD) with 6% NaOCl and twenty-six teeth with 52 canals received CMD plus PDT. For PDT, root canal systems were incubated with methylene blue (MB) at a concentration of 50 µg/ml for 5 minutes, followed by exposure to red light at 665 nm with an energy fluence of 30 J/cm². The contents of root canals were sampled by flushing the canals at baseline and following CMD alone or CMD+PDT and were serially diluted and cultured on blood agar. Survival fractions were calculated by counting colony-forming units (CFU). Partial characterization of root canal species at baseline and following CMD alone or CMD+PDT was performed using DNA probes to a panel of 39 endodontic species in the checkerboard assay.
The Mantel-Haenszel chi-square test for treatment effects demonstrated the better performance of CMD+PDT over CMD (P=0.026). CMD+PDT significantly reduced the frequency of positive canals relative to CMD alone (P=0.0003). Following CMD+PDT, 45 of 52 canals (86.5%) had no CFU, as compared to 24 of 49 canals (49%) treated with CMD (canal flush samples). The CFU reductions were similar when teeth or canals were treated as independent entities. Post-treatment detection levels for all species were markedly lower for canals treated by CMD+PDT than for those treated by CMD alone. Bacterial species within dentinal tubules were detected in 17/22 (77.3%) and 15/29 (51.7%) of canals in the CMD and CMD+PDT groups, respectively (P=0.034).
Data indicate that PDT significantly reduces residual bacteria within the root canal system, and that PDT, if further enhanced by technical improvements, holds substantial promise as an adjunct to CMD.
Photodynamic therapy; methylene blue; endodontic disinfection; ex vivo
Recent evidence has contradicted the prevailing view that homeostasis and regeneration of the adult liver are mediated by self duplication of lineage-restricted hepatocytes and biliary epithelial cells. These new data suggest that liver progenitor cells do not function solely as a backup system in chronic liver injury; rather, they also produce hepatocytes after acute injury and are in fact the main source of new hepatocytes during normal hepatocyte turnover. In addition, other evidence suggests that hepatocytes are capable of lineage conversion, acting as precursors of biliary epithelial cells during biliary injury. To test these concepts, we generated a hepatocyte fate-tracing model based on timed and specific Cre recombinase expression and marker gene activation in all hepatocytes of adult Rosa26 reporter mice with an adenoassociated viral vector. We found that newly formed hepatocytes derived from preexisting hepatocytes in the normal liver and that liver progenitor cells contributed minimally to acute hepatocyte regeneration. Further, we found no evidence that biliary injury induced conversion of hepatocytes into biliary epithelial cells. These results therefore restore the previously prevailing paradigms of liver homeostasis and regeneration. In addition, our new vector system will be a valuable tool for timed, efficient, and specific loop out of floxed sequences in hepatocytes.
MicroRNAs (miRNAs) constitute a new class of regulators of gene expression. Among other actions, miRNAs have been shown to control cell proliferation in development and cancer. However, whether miRNAs regulate hepatocyte proliferation during liver regeneration is unknown. We addressed this question by performing 2/3 partial hepatectomy (2/3 PH) on mice with hepatocyte-specific inactivation of DiGeorge syndrome critical region gene 8 (DGCR8), an essential component of the miRNA processing pathway. Hepatocytes of these mice were miRNA-deficient and exhibited a delay in cell cycle progression involving the G1 to S phase transition. Examination of livers of wildtype mice after 2/3 PH revealed differential expression of a subset of miRNAs, notably an induction of miR-21 and repression of miR-378. We further discovered that miR-21 directly inhibits Btg2, a cell cycle inhibitor that prevents activation of forkhead box M1 (FoxM1), which is essential for DNA synthesis in hepatocytes after 2/3 PH. In addition, we found that miR-378 directly inhibits ornithine decarboxylase (Odc1), which is known to promote DNA synthesis in hepatocytes after 2/3 PH.
Our results show that miRNAs are critical regulators of hepatocyte proliferation during liver regeneration. Because these miRNAs and target gene interactions are conserved, our findings may also be relevant to human liver regeneration.
An important consideration when analyzing both microarray and quantitative PCR expression data is the selection of appropriate genes as endogenous controls or reference genes. This step is especially critical when identifying genes differentially expressed between datasets. Moreover, reference genes suitable in one context (e.g. lung cancer) may not be suitable in another (e.g. breast cancer). Currently, the main approach to identify reference genes involves the mining of expression microarray data for highly expressed and relatively constant transcripts across a sample set. A caveat here is the requirement for transcript normalization prior to analysis, and measurements obtained are relative, not absolute. Alternatively, as sequencing-based technologies provide digital quantitative output, absolute quantification ensues, and reference gene identification becomes more accurate.
Serial analysis of gene expression (SAGE) profiles of non-malignant and malignant lung samples were compared using a permutation test to identify the most stably expressed genes across all samples. Subsequently, the specificity of the reference genes was evaluated across multiple tissue types, their constancy of expression was assessed using quantitative RT-PCR (qPCR), and their impact on differential expression analysis of microarray data was evaluated.
We show that (i) conventional reference genes such as ACTB and GAPDH are highly variable between cancerous and non-cancerous samples, (ii) reference genes identified for lung cancer do not perform well for other cancer types (breast and brain), (iii) reference genes identified through SAGE show low variability using qPCR in a different cohort of samples, and (iv) normalization of a lung cancer gene expression microarray dataset with or without our reference genes yields different results for differential gene expression and subsequent analyses. Specifically, key established pathways in lung cancer exhibit higher statistical significance in a dataset normalized with our reference genes than in one normalized without them.
Our analyses found NDUFA1, RPL19, RAB5C, and RPS18 to occupy the top ranking positions among 15 suitable reference genes optimal for normalization of lung tissue expression data. Significantly, the approach used in this study can be applied to data generated using new generation sequencing platforms for the identification of reference genes optimal within diverse contexts.
Ensemble methods have become popular for QSAR modeling, but most studies have assumed balanced data consisting of approximately equal numbers of active and inactive compounds. Cheminformatics data is often far from being balanced. We extend the application of ensemble methods to include cases of imbalance of class membership and to more adequately assess model output. Based on the extension, we propose an ensemble method called MBEnsemble that automatically determines the appropriate tuning parameters to provide reliable predictions and maximize the F-measure. Results from multiple datasets demonstrate that the proposed ensemble technique works well on imbalanced data.
Ensemble; Imbalanced Data; F-measure; Majority Vote; Probability Averaging and Threshold
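To make the role of the F-measure on imbalanced data concrete, the sketch below tunes a single decision threshold on averaged ensemble scores so that the F-measure is maximized. This is not the MBEnsemble algorithm, whose details are not given in the abstract; the labels, scores, and candidate thresholds are hypothetical.

```python
def f_measure(labels, predictions):
    """F-measure (harmonic mean of precision and recall) for the positive class."""
    tp = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def best_threshold(labels, scores, candidates):
    """Return (threshold, F-measure) for the candidate that maximizes the F-measure."""
    def score_at(threshold):
        predictions = [1 if s >= threshold else 0 for s in scores]
        return f_measure(labels, predictions)

    threshold = max(candidates, key=score_at)
    return threshold, score_at(threshold)


# Hypothetical imbalanced set: 2 active compounds (1) among 10.
labels = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
# Hypothetical averaged ensemble probabilities of being active.
scores = [0.45, 0.30, 0.35, 0.10, 0.05, 0.20, 0.15, 0.08, 0.12, 0.02]

threshold, f_best = best_threshold(labels, scores, [0.1, 0.2, 0.3, 0.4, 0.5])
print(threshold, f_best)
```

With the conventional 0.5 cutoff no compound is predicted active and the F-measure is zero, whereas a lower threshold recovers both actives. In a real study the threshold would be chosen on training or out-of-bag data rather than on the evaluation set.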
Acute graft rejection is an important clinical problem in renal transplantation and an adverse predictor for long term graft survival. Plasma biomarkers may offer an important option for post-transplant monitoring and permit timely and effective therapeutic intervention to minimize graft damage. This case-control discovery study (n = 32) used isobaric tagging for relative and absolute protein quantification (iTRAQ) technology to quantitate plasma protein relative concentrations in precise cohorts of patients with and without biopsy-confirmed acute rejection (BCAR). Plasma samples were depleted of the 14 most abundant plasma proteins to enhance detection sensitivity. A total of 18 plasma proteins that encompassed processes related to inflammation, complement activation, blood coagulation, and wound repair exhibited significantly different relative concentrations between patient cohorts with and without BCAR (p value <0.05). Twelve proteins with a fold-change ≥1.15 were selected for diagnostic purposes: seven were increased (titin, lipopolysaccharide-binding protein, peptidase inhibitor 16, complement factor D, mannose-binding lectin, protein Z-dependent protease and β2-microglobulin) and five were decreased (kininogen-1, afamin, serine protease inhibitor, phosphatidylcholine-sterol acyltransferase, and sex hormone-binding globulin) in patients with BCAR. The first three principal components of these proteins showed clear separation of cohorts with and without BCAR. Performance improved with the inclusion of sequential proteins, reaching a primary asymptote after the first three (titin, kininogen-1, and lipopolysaccharide-binding protein). Longitudinal monitoring over the first 3 months post-transplant based on ratios of these three proteins showed clear discrimination between the two patient cohorts at time of rejection. 
The score then declined to baseline following treatment and resolution of the rejection episode and remained comparable between cases and controls throughout the period of quiescent follow-up. Results were validated using ELISA where possible, and initial cross-validation estimated a sensitivity of 80% and specificity of 90% for classification of BCAR based on a four-protein ELISA classifier. This study provides evidence that protein concentrations in plasma may provide a relevant measure for the occurrence of BCAR and offers a potential tool for immunologic monitoring.
Disruptions of beta-catenin and the canonical Wnt pathway are well documented in cancer. However, little is known of the non-canonical branch of the Wnt pathway. In this study, we investigate the transcript level patterns of genes in the Wnt pathway in squamous cell lung cancer using reverse-transcriptase (RT)-PCR. We found that over half of the samples examined exhibited dysregulated expression of multiple components of the non-canonical branch of the Wnt pathway. In the cases where beta-catenin (CTNNB1) was not over-expressed, we identified strong relationships of expression between wingless-type MMTV integration site family member 5A (WNT5A) and frizzled homolog 2 (FZD2), frizzled homolog 3 (FZD3) and dishevelled 2 (DVL2), and low density lipoprotein receptor-related protein 5 (LRP5) and secreted frizzled-related protein 4 (SFRP4). This is one of the first studies to demonstrate expression of genes in the non-canonical pathway in normal lung tissue and its disruption in lung squamous cell carcinoma. These findings suggest that the non-canonical pathway may have a more prominent role in lung cancer than previously reported.
WNT pathway; lung cancer; gene expression; NSCLC; non-canonical; squamous cell carcinoma
Non-small cell lung cancer (NSCLC) presents as a progressive disease spanning precancerous, preinvasive, locally invasive, and metastatic lesions. Identification of biological pathways reflective of these progressive stages, and aberrantly expressed genes associated with these pathways, would conceivably enhance therapeutic approaches to this devastating disease.
Through the construction and analysis of SAGE libraries, we have determined transcriptome profiles for preinvasive carcinoma-in-situ (CIS) and invasive squamous cell carcinoma (SCC) of the lung, and compared these with expression profiles generated from both bronchial epithelium and precancerous metaplastic and dysplastic lesions using Ingenuity Pathway Analysis. Expression of genes associated with epidermal development, and loss of expression of genes associated with mucociliary biology, are predominant features of CIS, largely shared with precancerous lesions. Additionally, expression of genes associated with xenobiotic metabolism/detoxification is a notable feature of CIS, and is largely maintained in invasive cancer. Genes related to tissue fibrosis and acute phase immune response are characteristic of the invasive SCC phenotype. Moreover, the data presented here suggest that tissue remodeling/fibrosis is initiated at the early stages of CIS. Additionally, this study indicates that alteration in copy-number status represents a plausible mechanism for differential gene expression in CIS and invasive SCC.
This study is the first report of large-scale expression profiling of CIS of the lung. Unbiased expression profiling of these preinvasive and invasive lesions provides a platform for further investigations into the molecular genetic events relevant to early stages of squamous NSCLC development. Additionally, up-regulated genes detected at extreme differences between CIS and invasive cancer may have potential to serve as biomarkers for early detection.
The study of oral premalignant lesions (OPL) is crucial to the identification of initiating genetic events in oral cancer. However, these lesions are minute in size, making it a challenge to recover sufficient DNA from microdissected cells for comprehensive genomic analysis. As a step toward identifying genetic aberrations associated with oral cancer progression, we used tiling-path array comparative genomic hybridization to compare alterations on chromosome 3p for 71 OPLs against 23 oral squamous cell carcinomas. 3p was chosen because although it is frequently altered in oral cancers and has been associated with progression risk, its alteration status has only been evaluated at a small number of loci in OPLs. We identified six recurrent losses in this region that were shared between high-grade dysplasias and oral squamous cell carcinomas, including a 2.89-Mbp deletion spanning the FHIT gene (previously implicated in oral cancer progression). When the alteration status for these six regions was examined in 24 low-grade dysplasias with known progression outcome, we observed that they occurred at a significantly higher frequency in low-grade dysplasias that later progressed to later-stage disease (P < 0.003). Moreover, parallel analysis of all profiled tissues showed that the extent of overall genomic alteration at 3p increased with histologic stage. This first high-resolution analysis of chromosome arm 3p in OPLs represents a significant step toward predicting progression risk in early preinvasive disease and provides a keen example of how genomic instability escalates with progression to invasive cancer.
Oral cancer develops through a series of histopathological stages: through mild (low grade), moderate, and severe (high grade) dysplasia to carcinoma in situ and then invasive disease. Early detection of those oral premalignant lesions (OPLs) that will develop into invasive tumors is necessary to improve the poor prognosis of oral cancer. Because no tools exist for delineating progression risk in low grade oral lesions, we cannot determine which of these cases require aggressive intervention. We undertook whole genome analysis by tiling-path array comparative genomic hybridization for a rare panel of early and late stage OPLs (n = 62), all of which had extensive longitudinal follow up (>10 years). Genome profiles for oral squamous cell carcinomas (n = 24) were generated for comparison. Parallel analysis of genome alterations and clinical parameters was performed to identify features associated with disease progression. Genome alterations in low grade dysplasias progressing to invasive disease more closely resembled those observed for later stage disease than they did those observed for non-progressing low grade dysplasias. This was despite the histopathological similarity between progressing and non-progressing cases. Strikingly, unbiased computational analysis of genomic alteration data correctly classified nearly all progressing low grade dysplasia cases. Our data demonstrate that high resolution genomic analysis can be used to evaluate progression risk in low grade OPLs, a marked improvement over present histopathological approaches which cannot delineate progression risk. Taken together, our data suggest that whole genome technologies could be used in management strategies for patients presenting with precancerous oral lesions.
Motivation: Analysis of array comparative genomic hybridization (aCGH) data for recurrent DNA copy number alterations from a cohort of patients can yield distinct sets of molecular signatures or profiles. This can be due to the presence of heterogeneous cancer subtypes within a supposedly homogeneous population.
Results: We propose a novel statistical method for automatically detecting such subtypes or clusters. Our approach is model based: each cluster is defined in terms of a sparse profile, which contains the locations of unusually frequent alterations. The profile is represented as a hidden Markov model. Samples are assigned to clusters based on their similarity to the cluster's profile. We simultaneously infer the cluster assignments and the cluster profiles using an expectation-maximization-like algorithm. We show, using a realistic simulation study, that our method is significantly more accurate than standard clustering techniques. We then apply our method to two clinical datasets. In particular, we examine previously reported aCGH data from a cohort of 106 follicular lymphoma patients, and discover clusters that are known to correspond to clinically relevant subgroups. In addition, we examine a cohort of 92 diffuse large B-cell lymphoma patients, and discover previously unreported clusters of biological interest, which have inspired follow-up clinical research on an independent cohort.
Availability: Software and synthetic datasets are available at http://www.cs.ubc.ca/~sshah/acgh as part of the CNA-HMMer package.
Supplementary information: Supplementary data are available at Bioinformatics online.
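The alternation between assigning samples to clusters and re-estimating cluster profiles, as described in the Results above, can be illustrated with a minimal sketch. This is not the CNA-HMMer implementation: it assumes each sample is a binary vector (1 = alteration at a locus), replaces the hidden Markov model profile with independent per-locus alteration frequencies, and uses hard assignments. The function names, the seeding rule, and the smoothing constants are all hypothetical choices made for illustration.

```python
import math


def hamming(a, b):
    """Number of loci at which two binary alteration vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def init_seeds(samples, k):
    """Pick k seed samples by farthest-first traversal (illustrative choice)."""
    seeds = [samples[0]]
    while len(seeds) < k:
        seeds.append(max(samples, key=lambda s: min(hamming(s, t) for t in seeds)))
    return seeds


def cluster_profiles(samples, k, n_iter=20):
    """Hard-EM-style clustering of binary alteration vectors.

    Returns (assignments, profiles); profiles[c][j] is the smoothed
    alteration frequency of locus j in cluster c.
    """
    n_loci = len(samples[0])
    # Start from softened copies of the seed samples.
    profiles = [[0.75 if x else 0.25 for x in seed] for seed in init_seeds(samples, k)]
    assign = None
    for _ in range(n_iter):
        # Assignment step: each sample joins the cluster whose profile
        # gives it the highest log-likelihood.
        new_assign = []
        for s in samples:
            scores = [
                sum(math.log(p if x else 1.0 - p) for x, p in zip(s, prof))
                for prof in profiles
            ]
            new_assign.append(max(range(k), key=scores.__getitem__))
        if new_assign == assign:
            break  # assignments are stable
        assign = new_assign
        # Profile step: re-estimate per-locus frequencies with add-one
        # smoothing so no probability is exactly 0 or 1.
        for c in range(k):
            members = [s for s, a in zip(samples, assign) if a == c]
            if members:
                profiles[c] = [
                    (sum(m[j] for m in members) + 1.0) / (len(members) + 2.0)
                    for j in range(n_loci)
                ]
    return assign, profiles
```

Calling `cluster_profiles(samples, k=2)` on a list of 0/1 vectors returns one cluster label per sample together with the estimated profile of each cluster. A model closer to the one described would also capture dependence between neighbouring loci, which this sketch ignores.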
High throughput microarray technologies have afforded the investigation of genomes, epigenomes, and transcriptomes at unprecedented resolution. However, software packages to handle, analyze, and visualize data from these multiple 'omics disciplines have not been adequately developed.
Here, we present SIGMA2, a system for the integrative genomic multi-dimensional analysis of cancer genomes, epigenomes, and transcriptomes. Multi-dimensional datasets can be simultaneously visualized and analyzed with respect to each dimension, allowing combinatorial integration of the different assays belonging to the different 'omics.
The identification of genes altered at multiple levels such as copy number, loss of heterozygosity (LOH), DNA methylation and the detection of consequential changes in gene expression can be concertedly performed, establishing SIGMA2 as a novel tool to facilitate the high throughput systems biology analysis of cancer.
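As a rough illustration of what identifying genes altered at multiple levels involves, the sketch below counts, for each gene, the number of data dimensions in which it is flagged as altered. It does not reflect how SIGMA2 is implemented; the function name, the input layout (a mapping from dimension name to a set of gene symbols), and the toy gene lists in the usage note are assumptions.

```python
from collections import Counter


def genes_altered_in_multiple_dimensions(alterations, min_dimensions=2):
    """Return genes flagged as altered in at least `min_dimensions` assays.

    alterations: dict mapping a dimension name (e.g. "copy_number") to an
    iterable of gene symbols called altered in that dimension.
    """
    counts = Counter()
    for genes in alterations.values():
        # set() ensures a gene is counted once per dimension.
        counts.update(set(genes))
    return sorted(g for g, n in counts.items() if n >= min_dimensions)
```

For example, with made-up calls such as `{"copy_number": {"MYC", "EGFR"}, "methylation": {"CDKN2A", "EGFR"}, "expression": {"EGFR", "MYC", "TP53"}}`, the function returns the genes supported by at least two of the three dimensions.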
X-box binding protein 1 (XBP-1), a basic leucine zipper transcription factor, plays a key role in the cellular unfolded protein response (UPR). There are two XBP-1 isoforms in cells, spliced XBP-1S and unspliced XBP-1U. XBP-1U has been shown to bind to the 21-bp Tax-responsive element of the human T-lymphotropic virus type 1 (HTLV-1) long terminal repeat (LTR) in vitro and transactivate HTLV-1 transcription. Here we identify XBP-1S as a transcription activator of HTLV-1. Compared to XBP-1U, XBP-1S demonstrates stronger activating effects on both basal and Tax-activated HTLV-1 transcription in cells. Our results show that both XBP-1S and XBP-1U interact with Tax and bind to the HTLV-1 LTR in vivo. In addition, elevated mRNA levels of the gene for XBP-1 and several UPR genes were detected in the HTLV-1-infected C10/MJ and MT2 T-cell lines, suggesting that HTLV-1 infection may trigger the UPR in host cells. We also identify Tax as a positive regulator of the expression of the gene for XBP-1. Activation of the UPR by tunicamycin showed no effect on the HTLV-1 LTR, suggesting that HTLV-1 transcription is specifically regulated by XBP-1. Collectively, our study demonstrates a novel host-virus interaction between a cellular factor XBP-1 and transcriptional regulation of HTLV-1.
Recent advances in global genomic profiling methodologies have enabled multi-dimensional characterization of biological systems. Complete analysis of these genomic profiles requires an in-depth look at parallel profiles of segmental DNA copy number status, DNA methylation state, single nucleotide polymorphisms, as well as gene expression profiles. Due to the differences in data types, it is difficult to conduct parallel analysis of multiple datasets from diverse platforms.
To address this issue, we have developed an integrative genomic analysis platform, MD-SeeGH, a software tool that allows users to rapidly and directly analyze genomic datasets spanning multiple genomic experiments. With MD-SeeGH, users have the flexibility to easily update datasets in accordance with new genomic builds, make a quality assessment of data using the filtering features, and identify genetic alterations within single or across multiple experiments. Multiple sample analysis in MD-SeeGH allows users to compare profiles from many experiments alongside tracks containing detailed localized gene information, microRNA, CpG islands, and copy number variations.
MD-SeeGH is a new platform for the integrative analysis of diverse microarray data, facilitating multiple profile analyses and group comparisons.
Disruptions of beta-catenin and the canonical Wnt pathway are well documented in cancer. However, little is known of the non-canonical branch of the Wnt pathway. In this study, we investigate the transcript level patterns of genes in the Wnt pathway in squamous cell lung cancer using reverse-transcriptase (RT)-PCR. Over half of the samples examined exhibited dysregulated gene expression of multiple components of the non-canonical branch of the Wnt pathway. In the cases where beta-catenin (CTNNB1) was not over-expressed, we identified strong relationships of expression between wingless-type MMTV integration site family member 5A (WNT5A)/frizzled homolog 2 (FZD2), frizzled homolog 3 (FZD3)/dishevelled 2 (DVL2), and low density lipoprotein receptor-related protein 5 (LRP5)/secreted frizzled-related protein 4 (SFRP4). This is one of the first studies to demonstrate expression of genes in the non-canonical pathway in normal lung tissue and its disruption in lung squamous cell carcinoma. These findings suggest that the non-canonical pathway may have a more prominent role in lung cancer than previously reported.
WNT pathway; lung cancer; gene expression; NSCLC; non-canonical; squamous cell carcinoma
Lung cancer is the most common cause of cancer-related deaths. Tobacco smoke exposure is the strongest aetiological factor associated with lung cancer. In this study, using serial analysis of gene expression (SAGE), we comprehensively examined the effect of active smoking by comparing the transcriptomes of clinical specimens obtained from current, former and never smokers, and identified genes showing both reversible and irreversible expression changes upon smoking cessation.
Twenty-four SAGE profiles of the bronchial epithelium of eight current, twelve former and four never smokers were generated and analyzed. In total, 3,111,471 SAGE tags representing over 110,000 potentially unique transcripts were generated, making this the largest human SAGE study to date. We identified 1,733 constitutively expressed genes in current, former and never smoker transcriptomes. We also identified both reversible and irreversible gene expression changes upon cessation of smoking; reversible changes were frequently associated with xenobiotic metabolism, nucleotide metabolism or mucus secretion. Increased expression of TFF3, CABYR, and ENTPD8 was found to be reversible upon smoking cessation. Expression of GSK3B, which regulates COX2 expression, was irreversibly decreased. MUC5AC expression was only partially reversed. Validation of select genes was performed using quantitative RT-PCR on a secondary cohort of nine current smokers, seven former smokers and six never smokers.
Expression levels of some of the genes related to tobacco smoking return to levels similar to never smokers upon cessation of smoking, while expression of others appears to be permanently altered despite prolonged smoking cessation. These irreversible changes may account for the persistent lung cancer risk despite smoking cessation.
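The distinction between reversible, partially reversed and irreversible changes can be made concrete with a small sketch. This is not the statistical procedure used in the study: it assumes a single positive mean expression value per group, and the function name, the fold-change cut-off, and the tolerance are hypothetical parameters chosen for illustration.

```python
import math


def classify_change(current, former, never, min_log2_fc=1.0, tolerance=0.25):
    """Label a gene's smoking-associated expression change.

    current, former, never: positive mean expression values for current,
    former and never smokers. Thresholds are illustrative assumptions.
    """
    # Smoking-associated change, measured against never smokers.
    change = math.log2(current / never)
    if abs(change) < min_log2_fc:
        return "unchanged"
    # Fraction of that change still present in former smokers.
    residual = math.log2(former / never) / change
    if residual <= tolerance:
        return "reversible"
    if residual >= 1.0 - tolerance:
        return "irreversible"
    return "partially reversible"
```

With made-up values, a gene at 8 in current smokers and 1 in both former and never smokers would be labelled reversible, whereas a gene at 1 in current and former smokers but 4 in never smokers would be labelled irreversible.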
In humans, coxsackievirus B3 is the primary etiological agent of viral myocarditis, an inflammatory disease process involving the heart muscle. Specific therapy is currently unavailable. Viral myocarditis is a complex, multiphasic infectious-inflammatory-reparative process. To address the temporal dimensionality of myocarditis, array- and nonarray-based molecular techniques, and histological and functional assays were used to help define enteroviral pathogenesis and its relation to heart failure. The application of high throughput genomic strategies and bioinformatics tools – coupled with established molecular techniques – has allowed us to perform a large-scale analysis of gene expression to better understand the host response to viral infection. Differential messenger RNA display, spotted complementary DNA arrays and Affymetrix Gene Chips (Affymetrix, United States) were used to study murine hearts during acute viremic, inflammatory and reparative stages. Attention was focused on the observed global decreases in expression of metabolic and mitochondrial genes. The authors have previously characterized the role of mitochondria-triggered apoptosis, and pro- and anti-apoptotic Bcl-2 family proteins in enteroviral infections. The impact of altered mitochondrial transcripts on such host cell death and on metabolic injury to the heart is currently under study. In the authors’ experience, the experimental progression from high throughput, unbiased analysis to biological validation has been only partially systematic. Insights are offered into the logic behind the selection of genes of potential interest for further investigation in the myocarditis model. A series of criteria for validatory decision-making, which the authors have developed based on their experiences, is described. Such criteria reflect known or putative gene function and expression patterns, as well as pragmatic considerations in the determination of steps toward investigation.
This approach may help other investigators who need to dissect large genomic data sets to find targets for biological confirmation. Together, the authors’ genomic studies have generated new, testable hypotheses regarding the interaction between host and enterovirus.
Bioinformatics; Genomics; Microarrays; Myocarditis
Cyclin A-Cdk2 complexes bind to Skp1 and Skp2 during S phase, but the function of Skp1 and Skp2 is unclear. Skp1, together with F-box proteins like Skp2, is part of ubiquitin-ligase E3 complexes that target many cell cycle regulators for ubiquitination-mediated proteolysis. In this study, we investigated the potential regulation of cyclin A-Cdk2 activity by Skp1 and Skp2. We found that Skp2 can inhibit the kinase activity of cyclin A-Cdk2 in vitro, both by direct inhibition of cyclin A-Cdk2 and by inhibition of the activation of Cdk2 by cyclin-dependent kinase (CDK)-activating kinase phosphorylation. Only the kinase activity of Cdk2, not that of Cdc2 or Cdk5, is reduced by Skp2. Skp2 is phosphorylated by cyclin A-Cdk2 on residue Ser76, but nonphosphorylatable mutants of Skp2 can still inhibit the kinase activity of cyclin A-Cdk2 toward histone H1. The F box of Skp2 is required for binding to Skp1, and both the N-terminal and C-terminal regions of Skp2 are involved in binding to cyclin A-Cdk2. Furthermore, Skp2 and the CDK inhibitor p21Cip1/WAF1 bind to cyclin A-Cdk2 in a mutually exclusive manner. Overexpression of Skp2, but not Skp1, in mammalian cells causes a G1/S cell cycle arrest.