Very few analytical approaches have been reported to resolve the variability in microarray measurements stemming from sample heterogeneity. For example, tissue samples used in cancer studies are usually contaminated with the surrounding or infiltrating cell types. This heterogeneity in the sample preparation hinders further statistical analysis, significantly so if different samples contain different proportions of these cell types. Thus, sample heterogeneity can result in the identification of differentially expressed genes that may be unrelated to the biological question being studied. Similarly, irrelevant gene combinations can be discovered in the case of gene expression based classification.
We propose a computational framework for removing the effects of sample heterogeneity by "microdissecting" microarray data in silico. The computational method provides estimates of the expression values of the pure (non-heterogeneous) cell samples. The inversion of the sample heterogeneity can be facilitated by providing accurate estimates of the mixing percentages of different cell types in each measurement. For those cases where no such information is available, we develop an optimization-based method for joint estimation of the mixing percentages and the expression values of the pure cell samples. We also consider the problem of selecting the correct number of cell types.
The efficiency of the proposed methods is illustrated by applying them to carefully controlled cDNA microarray data obtained from heterogeneous samples. The results demonstrate that the methods are capable of reconstructing both the sample and cell type specific expression values from heterogeneous mixtures and that the mixing percentages of different cell types can also be estimated. Furthermore, a general purpose model selection method can be used to select the correct number of cell types.
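The case where the mixing percentages are known can be illustrated with a small least-squares sketch. It assumes a linear mixing model with exactly two cell types and one gene at a time; the function name `deconvolve_two_types` and the numbers are hypothetical and are not taken from the study. The joint estimation of mixing percentages and the model selection step are not shown.

```python
def deconvolve_two_types(proportions, mixed):
    """Least-squares estimate of the pure expression of one gene in two cell types.

    proportions: fraction of cell type A in each sample (type B is 1 - p).
    mixed: measured expression of the gene in each heterogeneous sample.
    Assumed model: mixed[i] = p[i] * expr_a + (1 - p[i]) * expr_b + noise.
    """
    # Entries of the 2x2 normal equations for the assumed linear model.
    saa = sum(p * p for p in proportions)
    sab = sum(p * (1 - p) for p in proportions)
    sbb = sum((1 - p) ** 2 for p in proportions)
    ya = sum(p * y for p, y in zip(proportions, mixed))
    yb = sum((1 - p) * y for p, y in zip(proportions, mixed))
    det = saa * sbb - sab * sab
    if abs(det) < 1e-12:
        raise ValueError("mixing proportions must differ between samples")
    expr_a = (sbb * ya - sab * yb) / det
    expr_b = (saa * yb - sab * ya) / det
    return expr_a, expr_b


# Made-up example: three samples containing 20%, 50% and 80% of cell type A.
expr_a, expr_b = deconvolve_two_types([0.2, 0.5, 0.8], [3.6, 6.0, 8.4])
```

The explicit check on the determinant reflects the point made above: if every sample has the same composition, the pure profiles cannot be separated.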
RNA-seq, a next-generation sequencing based method for transcriptome analysis, is rapidly emerging as the method of choice for comprehensive transcript abundance estimation. The accuracy of RNA-seq can be highly impacted by the purity of samples. A prominent, outstanding problem in RNA-seq is how to estimate transcript abundances in heterogeneous tissues, where a sample is composed of more than one cell type and the inhomogeneity can substantially confound the transcript abundance estimation of each individual cell type. Although experimental methods have been proposed to dissect multiple distinct cell types, computationally "deconvoluting" heterogeneous tissues provides an attractive alternative, since it keeps the tissue sample as well as the subsequent molecular content yield intact.
Here we propose a probabilistic model-based approach, Transcript Estimation from Mixed Tissue samples (TEMT), to estimate the transcript abundances of each cell type of interest from RNA-seq data of heterogeneous tissue samples. TEMT incorporates positional and sequence-specific biases, and its online EM algorithm only requires a runtime proportional to the data size and a small constant memory. We test the proposed method on both simulation data and recently released ENCODE data, and show that TEMT significantly outperforms current state-of-the-art methods that do not take tissue heterogeneity into account. Currently, TEMT only resolves the tissue heterogeneity resulting from two cell types, but it can be extended to handle tissue heterogeneity resulting from multiple cell types. TEMT is written in Python, and is freely available at https://github.com/uci-cbcl/TEMT.
The probabilistic model-based approach proposed here provides a new method for analyzing RNA-seq data from heterogeneous tissue samples. By applying the method to both simulation data and ENCODE data, we show that explicitly accounting for tissue heterogeneity can significantly improve the accuracy of transcript abundance estimation.
Interpreting gene expression profiles obtained from heterogeneous samples can be difficult because bulk gene expression measures are not resolved to individual cell populations. We have recently devised Population-Specific Expression Analysis (PSEA), a statistical method that identifies individual cell types expressing genes of interest and achieves quantitative estimates of cell type-specific expression levels. This procedure makes use of marker gene expression and circumvents the need for additional experimental information like tissue composition.
To systematically assess the performance of statistical deconvolution, we applied PSEA to gene expression profiles from cerebellum tissue samples and compared with parallel, experimental separation methods. Owing to the particular histological organization of the cerebellum, we could obtain cellular expression data from in situ hybridization and laser-capture microdissection experiments and successfully validated computational predictions made with PSEA. Upon statistical deconvolution of whole tissue samples, we identified a set of transcripts showing age-related expression changes in the astrocyte population.
PSEA can predict cell-type specific expression levels from tissue homogenates on a genome-wide scale. It thus represents a computational alternative to experimental separation methods and allowed us to identify age-related expression changes in the astrocytes of the cerebellum. These molecular changes might underlie important physiological modifications previously observed in the aging brain.
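The idea of using marker gene expression in place of measured tissue composition can be sketched as a simple regression. This is a simplified illustration under stated assumptions, not the PSEA implementation: it assumes a single cell population, a reference signal built by averaging scaled marker profiles, and an ordinary least-squares fit. The function names and numbers are hypothetical.

```python
def reference_signal(marker_profiles):
    """Average several marker-gene profiles after scaling each to mean 1.

    marker_profiles: one list of expression values (across samples) per marker.
    The result is used as a proxy for the relative amount of one cell population.
    """
    n_samples = len(marker_profiles[0])
    scaled = [[v / (sum(p) / n_samples) for v in p] for p in marker_profiles]
    return [sum(col) / len(scaled) for col in zip(*scaled)]


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Made-up data: two marker genes and one gene of interest in three samples.
ref = reference_signal([[1.0, 2.0, 3.0], [2.0, 4.0, 6.0]])
intercept, slope = fit_line(ref, [7.0, 10.0, 13.0])
```

In this sketch the slope plays the role of the population-specific expression estimate, and the intercept absorbs expression that does not track the marker signal.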
Genomics; Computational biology; Cerebellum; Gene expression; Aging; Astrocyte
Gene expression profiling studies based on DNA microarrays have demonstrated their ability to define the interaction pathways between neoplastic and nonmalignant stromal cells in cancer tissues. During the past ten years, a number of approaches including microdissection have tried to resolve the variability in DNA microarray measurements stemming from cancer tissue sample heterogeneity. Another approach, designated as virtual or in silico microdissection, avoids the laborious and time-consuming step of anatomic microdissection. It consists of confronting the gene expression profiles of complex tissue samples to those of cell lines representative of different cell lineages, different differentiation stages, or different signaling pathways. This strategy has been used in recent studies aiming to analyze microenvironment alterations using gene expression profiling of nonmicrodissected classical Hodgkin lymphoma tissues in order to generate new prognostic factors. These recent contributions are detailed and discussed in the present paper.
For heterogeneous tissues, such as blood, measurements of gene expression are confounded by relative proportions of cell types involved. Conclusions have to rely on estimation of gene expression signals for homogeneous cell populations, e.g. by applying micro-dissection, fluorescence activated cell sorting, or in-silico deconfounding. We studied feasibility and validity of a non-negative matrix decomposition algorithm using experimental gene expression data for blood and sorted cells from the same donor samples. Our objective was to optimize the algorithm regarding detection of differentially expressed genes and to enable its use for classification in the difficult scenario of reversely regulated genes. This would be of importance for the identification of candidate biomarkers in heterogeneous tissues.
Experimental data and simulation studies involving noise parameters estimated from these data revealed that for valid detection of differential gene expression, quantile normalization and use of non-log data are optimal. We demonstrate the feasibility of predicting proportions of constituting cell types from gene expression data of single samples, as a prerequisite for a deconfounding-based classification approach.
Classification cross-validation errors with and without using deconfounding results are reported as well as sample-size dependencies. Implementation of the algorithm, simulation and analysis scripts are available.
The deconfounding algorithm without decorrelation using quantile normalization on non-log data is proposed for biomarkers that are difficult to detect, and for cases where confounding by varying proportions of cell types is the suspected reason. In this case, a deconfounding ranking approach can be used as a powerful alternative to, or complement of, other statistical learning approaches to define candidate biomarkers for molecular diagnosis and prediction in biomedicine, in realistically noisy conditions and with moderate sample sizes.
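A non-negative matrix decomposition of mixed expression data can be sketched with the standard Lee and Seung multiplicative updates. This is a generic textbook variant written for illustration, assuming untransformed (non-log) data arranged as genes by samples; it is not the authors' implementation and omits quantile normalization. All names and numbers are hypothetical.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def frobenius_error(x, w, h):
    """Squared reconstruction error between x and the product w * h."""
    wh = matmul(w, h)
    return sum((xi - ri) ** 2 for xr, rr in zip(x, wh) for xi, ri in zip(xr, rr))


def nmf(x, k, iters=200, seed=0, eps=1e-9):
    """Factor x (genes x samples) into w (genes x k) and h (k x samples).

    w holds cell-type-specific profiles and h holds per-sample weights;
    both stay non-negative because the updates only multiply positive terms.
    """
    rng = random.Random(seed)
    n, m = len(x), len(x[0])
    w = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        wt = transpose(w)
        num, den = matmul(wt, x), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(k)]
        ht = transpose(h)
        num, den = matmul(x, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)]
             for i in range(n)]
    return w, h


# Made-up mixed data: four genes measured in three samples, two cell types assumed.
mixed = [[5.0, 4.0, 3.0], [1.0, 2.0, 3.0], [6.0, 6.0, 6.0], [2.0, 2.5, 3.0]]
w, h = nmf(mixed, k=2)
```

The columns of `h` are unnormalized weights; turning them into cell-type proportions requires an additional scaling step that depends on the data and is not shown here.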
Uncertainty often affects molecular biology experiments and data for different reasons. Heterogeneity of gene or protein expression within the same tumor tissue is an example of biological uncertainty which should be taken into account when molecular markers are used in decision making. Tissue Microarray (TMA) experiments allow for large scale profiling of tissue biopsies, investigating protein patterns characterizing specific disease states. TMA studies deal with multiple sampling of the same patient, and therefore with multiple measurements of the same protein target, to account for possible biological heterogeneity. The aim of this paper is to provide and validate a classification model taking into consideration the uncertainty associated with measuring replicate samples.
We propose an extension of the well-known Naïve Bayes classifier, which accounts for biological heterogeneity in a probabilistic framework, relying on Bayesian hierarchical models. The model, which can be efficiently learned from the training dataset, exploits a closed-form of classification equation, thus providing no additional computational cost with respect to the standard Naïve Bayes classifier. We validated the approach on several simulated datasets comparing its performances with the Naïve Bayes classifier. Moreover, we demonstrated that explicitly dealing with heterogeneity can improve classification accuracy on a TMA prostate cancer dataset.
The proposed Hierarchical Naïve Bayes classifier can be conveniently applied in problems where within sample heterogeneity must be taken into account, such as TMA experiments and biological contexts where several measurements (replicates) are available for the same biological sample. The performance of the new approach is better than the standard Naïve Bayes model, in particular when the within sample heterogeneity is different in the different classes.
Tissue heterogeneity is a serious limiting factor for sound cell-specific molecular studies, including genomic and proteomic analyses. Although tissue microdissection technologies (e.g. laser capture microdissection, LCM) have advanced tremendously over the last decades, several factors, such as their generally high cost and inability to microdissect fresh or live tissues, limit their widespread use. Therefore, there is a need for a low-cost and easy-to-use microdissection device. Here, we developed a low-cost vacuum-assisted capillary-based cell and tissue acquisition system (CTAS) and demonstrated its use for microdissection of brain tissue samples for several downstream applications, including isolation of high quality RNA from microdissected brain tissue samples, their use for proteomics studies and electron microscopy, as well as microdissection of native living brain tissues for primary cell culturing. Unlike LCM, CTAS is capable of microdissecting fresh frozen and live tissues, works in thicker tissue sections ranging from 10 μm to 300 μm, and can collect individual cells, cell clusters and subanatomical regions. CTAS has been established as a straightforward and robust microdissection tool, allowing rapid, precise and efficient procurement of specific tissue and cell types at low cost. The developed microdissection protocol avoids extensive heating, chemical treatment, laser beam exposure, and other potentially harmful physical treatment of the tissue samples, thus preserving the primary functions of the dissected cells and the macromolecules within for subsequent downstream applications.
Rationale and Objectives
We have previously described a probabilistic model for the multiple-reader, multiple-case paradigm for ROC analysis. When the figure of merit is the Wilcoxon statistic, this model returns a seven-term expansion for the variance of this statistic as a function of the numbers of cases and readers. This probabilistic model also provides expressions for the coefficients in the seven-term expansion in terms of expectations over the internal noise, readers, and cases. Finally, this probabilistic model sets bounds on both the overall variance of the Wilcoxon statistic as well as the individual coefficients.
Materials and Methods
In this paper we will first validate the probabilistic model by comparing variances determined by direct computation of the expansion coefficients to empirical estimates of the variance using independent sampling. Validation of the probabilistic model will enable us to use the direct estimates of the expansion coefficients as a gold-standard to compare other coefficient-estimation techniques. Next, we develop a coefficient-estimation technique that employs bootstrapping to estimate the Wilcoxon statistic variance for different numbers of readers and cases. We then employ constrained, least-squares fitting techniques to estimate the expansion coefficients. The constraints used in this fitting are derived directly from the probabilistic model.
Results and Discussion
Using two different simulation studies, we show that the novel (and practical) bootstrapping/fitting technique returns estimates of the coefficients that are consistent with the gold standard. The results presented also serve to validate the seven-term expansion for the variance of the Wilcoxon statistic.
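The two ingredients named above, the Wilcoxon statistic and a bootstrap estimate of its variance, can be sketched for the simplest possible setting. The sketch assumes a single reader and resamples cases only, so it does not reproduce the multiple-reader design, the seven-term expansion, or the constrained fitting; function names and parameters are hypothetical.

```python
import random


def wilcoxon_auc(negatives, positives):
    """Wilcoxon statistic: fraction of (negative, positive) score pairs
    ranked correctly, with ties counted as one half."""
    total = 0.0
    for x in negatives:
        for y in positives:
            if y > x:
                total += 1.0
            elif y == x:
                total += 0.5
    return total / (len(negatives) * len(positives))


def bootstrap_variance(negatives, positives, n_boot=500, seed=0):
    """Variance of the Wilcoxon statistic over case-resampled data sets."""
    rng = random.Random(seed)
    stats = []
    for _ in range(n_boot):
        # Resample normal and abnormal cases separately, with replacement.
        boot_neg = [rng.choice(negatives) for _ in negatives]
        boot_pos = [rng.choice(positives) for _ in positives]
        stats.append(wilcoxon_auc(boot_neg, boot_pos))
    mean = sum(stats) / n_boot
    return sum((s - mean) ** 2 for s in stats) / (n_boot - 1)


# Made-up reader scores for four normal and four abnormal cases.
auc = wilcoxon_auc([1, 2, 3, 4], [3, 4, 5, 6])
var = bootstrap_variance([1, 2, 3, 4], [3, 4, 5, 6])
```

Repeating the variance estimate for different numbers of cases (and, in a full design, readers) would supply the data points to which an expansion in those numbers could be fitted.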
ROC analysis; multiple reader multiple case; Wilcoxon statistic
Breast tumors consist of several different tissue components. Despite the heterogeneity, most gene expression analyses have traditionally been performed without prior microdissection of the tissue sample. Thus, the gene expression profiles obtained reflect the mRNA contribution from the various tissue components. We utilized histopathological estimations of area fractions of tumor and stromal tissue components in 198 fresh-frozen breast tumor tissue samples for a cell type-associated gene expression analysis associated with distant metastasis. Sets of differentially expressed gene-probes were identified in tumors from patients who developed distant metastasis compared with those who did not, by weighting the contribution from each tumor with the relative content of stromal and tumor epithelial cells in their individual tumor specimen. The analyses were performed under various assumptions of mRNA transcription level from tumor epithelial cells compared with stromal cells. A set of 30 differentially expressed gene-probes was ascribed solely to carcinoma cells. Furthermore, two sets of 38 and five differentially expressed gene-probes were mostly associated with tumor epithelial and stromal cells, respectively. Finally, a set of 26 differentially expressed gene-probes was identified independently of cell type focus. The differentially expressed genes were validated in independent gene expression data from a set of laser capture microdissected invasive ductal carcinomas. We present a method for identifying and ascribing differentially expressed genes to tumor epithelial and/or stromal cells, by utilizing pathologic information and weighted t-statistics. Although a transcriptional contribution from the stromal cell fraction is detectable in microarray experiments performed on bulk tumor, the gene expression differences between the distant metastasis and no distant metastasis group were mostly ascribed to the tumor epithelial cells of the primary breast tumors.
However, the gene PIP5K2A was found to be significantly elevated in stromal cells in the distant metastasis group compared with stroma in the no distant metastasis group. These findings were confirmed in gene expression data from the representative compartments from microdissected breast tissue. The method described was also found to be robust to different histopathological procedures.
The molecular examination of pathologically altered cells and tissues at the DNA, RNA, and protein level has revolutionised research and diagnostics in pathology. However, the inherent heterogeneity of primary tissues with an admixture of various reactive cell populations can affect the outcome and interpretation of molecular studies. Recently, microdissection of tissue sections and cytological preparations has been used increasingly for the isolation of homogeneous, morphologically identified cell populations, thus overcoming the obstacle of tissue complexity. In conjunction with sensitive analytical techniques, such as the polymerase chain reaction, microdissection allows precise in vivo examination of cell populations, such as carcinoma in situ or the malignant cells of Hodgkin's disease, which are otherwise inaccessible for conventional molecular studies. However, most microdissection techniques are very time consuming and require a high degree of manual dexterity, which limits their practical use. Laser capture microdissection (LCM), a novel technique developed at the National Cancer Institute, is an important advance in terms of speed, ease of use, and versatility of microdissection. LCM is based on the adherence of visually selected cells to a thermoplastic membrane, which overlies the dehydrated tissue section and is focally melted by triggering of a low energy infrared laser pulse. The melted membrane forms a composite with the selected tissue area, which can be removed by simple lifting of the membrane. LCM can be applied to a wide range of cell and tissue preparations including paraffin wax embedded material. The use of immunohistochemical stains allows the selection of cells according to phenotypic and functional characteristics. Depending on the starting material, DNA, good quality mRNA, and proteins can be extracted successfully from captured tissue fragments, down to the single cell level. 
In combination with techniques like expression library construction, cDNA array hybridisation and differential display, LCM will allow the establishment of "genetic fingerprints" of specific pathological lesions, especially malignant neoplasms. In addition to the identification of new diagnostic and prognostic markers, this approach could help in establishing individualised treatments tailored to the molecular profile of a tumour. This review provides an overview of the technique of LCM, summarises current applications and new methodical approaches, and tries to give a perspective on future developments. In addition, LCM is compared with other recently developed laser microdissection techniques.
Key Words: laser capture microdissection • RNA analysis • DNA analysis • gene expression • profiling • immunohistochemistry
Laser capture microdissection (LCM) allows the precise procurement of enriched cell populations from a heterogeneous tissue, or live cell culture, under direct microscopic visualization. Histologically enriched cell populations can be procured by harvesting cells of interest directly, or isolating specific cells by ablating unwanted cells. The basic components of laser microdissection technology are a) visualization of cells via light microscopy, b) transfer of laser energy to a thermolabile polymer with either the formation of a polymer-cell composite (capture method) or transfer of laser energy via an ultraviolet laser to photovolatize a region of tissue (cutting method), and c) removal of cells of interest from the heterogeneous tissue section. The capture and cutting methods (instruments) for laser microdissection differ in the manner by which cells of interest are removed from the heterogeneous sample. Laser energy in the capture method is infrared (810nm), while in the cutting mode the laser is ultraviolet (355nm). Infrared lasers melt a thermolabile polymer that adheres to the cells of interest, whereas ultraviolet lasers ablate cells for either removal of unwanted cells or excision of a defined area of cells. LCM technology is applicable to an array of applications including mass spectrometry, DNA genotyping and loss-of-heterozygosity analysis, RNA transcript profiling, cDNA library generation, proteomics discovery, and signal kinase pathway profiling. This chapter describes laser capture microdissection using an ArcturusXT instrument for protein LCM sample analysis, and using a mmi CellCut Plus® instrument for RNA analysis via NanoString technology.
DNA; infrared laser; laser capture microdissection; molecular profiling; NanoString; phosphoprotein; pre-analytical variability; protein; RNA; tissue; tissue heterogeneity; UV laser
Cell type heterogeneity may have a substantial effect on gene expression profiling of human tissue. Several in silico methods for deconvoluting a gene expression profile into cell-type-specific subprofiles have been published but not widely used. Here, we consider recent methods and the experimental validations available for them. Shen-Orr et al. recently developed an approach called cell-type-specific significance analysis of microarray for deconvoluting gene expression. This method requires the measurement of the proportion of each cell type in each sample and the expression profiles of the heterogeneous samples. It determines how gene expression varies among pre-defined phenotypes for each cell type. Gene expression can vary substantially among cell types and sample heterogeneity can mask the identification of biologically important phenotypic correlations. Consequently, the deconvolution approach can be useful in the analysis of mixtures of cell populations in clinical samples.
Motivation: Accurate knowledge of the genome-wide binding of transcription factors in a particular cell type or under a particular condition is necessary for understanding transcriptional regulation. Using epigenetic data, such as histone modification and DNase I accessibility data, has been shown to improve motif-based in silico methods for predicting such binding, but this approach has not yet been fully explored.
Results: We describe a probabilistic method for combining one or more tracks of epigenetic data with a standard DNA sequence motif model to improve our ability to identify active transcription factor binding sites (TFBSs). We convert each data type into a position-specific probabilistic prior and combine these priors with a traditional probabilistic motif model to compute a log-posterior odds score. Our experiments, using histone modifications H3K4me1, H3K4me3, H3K9ac and H3K27ac, as well as DNase I sensitivity, show conclusively that the log-posterior odds score consistently outperforms a simple binary filter based on the same data. We also show that our approach performs competitively with a more complex method, CENTIPEDE, and suggest that the relative simplicity of the log-posterior odds scoring method makes it an appealing and very general method for identifying functional TFBSs on the basis of DNA and epigenetic evidence.
Availability and implementation: FIMO, part of the MEME Suite software toolkit, now supports log-posterior odds scoring using position-specific priors for motif search. A web server and source code are available at http://meme.nbcr.net. Utilities for creating priors are at http://research.imb.uq.edu.au/t.bailey/SD/Cuellar2011.
Supplementary information: Supplementary data are available at Bioinformatics online.
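The combination of a motif model with a position-specific prior can be sketched from Bayes' rule: the log-posterior odds equal the log-likelihood ratio of motif versus background plus the log prior odds. The sketch below assumes a simple position weight matrix and a zero-order background; the function name, data layout and numbers are hypothetical and do not reflect FIMO's interface or file formats.

```python
import math


def log_posterior_odds(site, pwm, background, prior):
    """Score one candidate site.

    site: DNA string with the same length as the motif.
    pwm: list of dicts, one per motif position, mapping base -> probability.
    background: dict mapping base -> background probability.
    prior: prior probability (strictly between 0 and 1) that this position
           is a binding site, e.g. derived from an epigenetic data track.
    """
    # Log-likelihood ratio of the motif model versus the background model.
    log_likelihood_ratio = sum(
        math.log(pwm[pos][base] / background[base])
        for pos, base in enumerate(site)
    )
    # Adding the log prior odds turns the likelihood ratio into posterior odds.
    return log_likelihood_ratio + math.log(prior / (1.0 - prior))


# Made-up two-position motif and uniform background.
pwm = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.2, "C": 0.4, "G": 0.2, "T": 0.2},
]
background = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
score_open = log_posterior_odds("AC", pwm, background, prior=0.1)
score_closed = log_posterior_odds("AC", pwm, background, prior=0.001)
```

Unlike a binary filter, which would discard the site outright in a low-prior region, the same sequence match simply receives a lower score there.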
An important need of many cancer research projects is the availability of high-quality, appropriately selected tissue. Tissue biorepositories are organized to collect, process, store, and distribute samples of tumor and normal tissue for further use in fundamental and translational cancer research. This, in turn, provides investigators with an invaluable resource of appropriately examined and characterized tissue specimens and linked patient information. Human tissues, in particular, tumor tissues, are complex structures composed of heterogeneous mixtures of morphologically and functionally distinct cell types. It is essential to analyze specific cell types to identify and define accurately the biologically important processes in pathologic lesions. Laser capture microdissection (LCM) is state-of-the-art technology that provides the scientific community with a rapid and reliable method to isolate a homogeneous population of cells from heterogeneous tissue specimens, thus providing investigators with the ability to analyze DNA, RNA, and protein accurately from pure populations of cells. This is particularly well-suited for tumor cell isolation, which can be captured from complex tissue samples. The combination of LCM and a tissue biorepository offers a comprehensive means by which researchers can use valuable human biospecimens and cutting-edge technology to facilitate basic, translational, and clinical research. This review provides an overview of LCM technology with an emphasis on the applications of LCM in the setting of a tissue biorepository, based on the author's extensive experience in LCM procedures acquired at Fox Chase Cancer Center and Hollings Cancer Center.
pathology; cancer biology; cells of interest
The cellular composition of heterogeneous samples can be predicted using an expression deconvolution algorithm to decompose their gene expression profiles based on pre-defined, reference gene expression profiles of the constituent populations in these samples. However, the expression profiles of the actual constituent populations are often perturbed from those of the reference profiles due to gene expression changes in cells associated with microenvironmental or developmental effects. Existing deconvolution algorithms do not account for these changes and give incorrect results when benchmarked against those measured by well-established flow cytometry, even after batch correction was applied. We introduce PERT, a new probabilistic expression deconvolution method that detects and accounts for a shared, multiplicative perturbation in the reference profiles when performing expression deconvolution. We applied PERT and three other state-of-the-art expression deconvolution methods to predict cell frequencies within heterogeneous human blood samples that were collected under several conditions (uncultured mono-nucleated and lineage-depleted cells, and culture-derived lineage-depleted cells). Only PERT's predicted proportions of the constituent populations matched those assigned by flow cytometry. Genes associated with cell cycle processes were highly enriched among those with the largest predicted expression changes between the cultured and uncultured conditions. We anticipate that PERT will be widely applicable to expression deconvolution strategies that use profiles from reference populations that vary from the corresponding constituent populations in cellular state but not cellular phenotypic identity.
The cellular composition of heterogeneous samples can be predicted from reference gene expression profiles that represent the homogeneous, constituent populations of the heterogeneous samples. However, existing methods fail when the reference profiles are not representative of the constituent populations. We developed PERT, a new probabilistic expression deconvolution method, to address this limitation. PERT was used to deconvolve the cellular composition of variably sourced and treated heterogeneous human blood samples. Our results indicate that even after batch correction is applied, cells presenting the same cell surface antigens display different transcriptional programs when they are uncultured versus culture-derived. Given gene expression profiles of culture-derived heterogeneous samples and profiles of uncultured reference populations, PERT was able to accurately recover proportions of the constituent populations composing the heterogeneous samples. We anticipate that PERT will be widely applicable to expression deconvolution strategies that use profiles from reference populations that vary from the corresponding constituent populations in cellular state but not cellular phenotypic identity.
Complicating proteomic analysis of whole tissues is the obvious problem of cell heterogeneity in tissues, which often results in misleading or confusing molecular findings. Thus, the coupling of tissue microdissection for tumor cell enrichment with capillary isotachophoresis-based selective analyte concentration not only serves as a synergistic strategy to characterize low abundance proteins, but it can also be employed to conduct comparative proteomic studies of human astrocytomas.
A set of fresh frozen brain biopsies were selectively microdissected to provide an enriched, high quality, and reproducible sample of tumor cells. Despite sharing many common proteins, there are significant differences in the protein expression level among different grades of astrocytomas. A large number of proteins, such as plasma membrane proteins EGFR and Erbb2, are up-regulated in glioblastoma. Besides facilitating the prioritization of follow-on biomarker selection and validation, comparative proteomics involving measurements in changes of pathways are expected to reveal the molecular relationships among different pathological grades of gliomas and potential molecular mechanisms that drive gliomagenesis.
Motivation: Multicellular systems, such as tissues, are composed of different cell types that form a heterogeneous community. Behavior of these systems is determined by complex regulatory networks within (intracellular networks) and between (intercellular networks) cells. Increasingly more studies are applying genome-wide experimental approaches to delineate the contributions of individual cell types (e.g. stromal, epithelial, vascular cells) to collective behavior of heterogeneous cell communities (e.g. tumors). Although many computational methods have been developed for analyses of intracellular networks based on genome-scale data, these efforts have not been extended toward analyzing genomic data from heterogeneous cell communities.
Results: Here, we propose a network-based approach for analyses of genome-scale data from multiple cell types to extract community-wide molecular networks comprised of intra- and intercellular interactions. Intercellular interactions in this model can be physical interactions between proteins or indirect interactions mediated by secreted metabolites of neighboring cells. Applying this method on data from a recent study on xenograft mouse models of human lung adenocarcinoma, we uncover an extensive network of intra- and intercellular interactions involved in the acquired resistance to angiogenesis inhibitors.
Supplementary data are available at Bioinformatics online.
Automated classification of histopathology involves identification of multiple classes, including benign, cancerous, and confounder categories. The confounder tissue classes can often mimic and share attributes with both the diseased and normal tissue classes, and can be particularly difficult to identify, both manually and by automated classifiers. In the case of prostate cancer, there may be several confounding tissue types present in a biopsy sample, posing as major sources of diagnostic error for pathologists. Two common multi-class approaches are one-shot classification (OSC), where all classes are identified simultaneously, and one-versus-all (OVA), where a “target” class is distinguished from all “non-target” classes. OSC is typically unable to handle discrimination of classes of varying similarity (e.g. with images of prostate atrophy and high grade cancer), while OVA forces several heterogeneous classes into a single “non-target” class. In this work, we present a cascaded (CAS) approach to classifying prostate biopsy tissue samples, where images from different classes are grouped to maximize intra-group homogeneity while maximizing inter-group heterogeneity.
We apply the CAS approach to categorize 2000 tissue samples taken from 214 patient studies into seven classes: epithelium, stroma, atrophy, prostatic intraepithelial neoplasia (PIN), and prostate cancer Gleason grades 3, 4, and 5. A series of increasingly granular binary classifiers are used to split the different tissue classes until the images have been categorized into a single unique class. Our automatically-extracted image feature set includes architectural features based on location of the nuclei within the tissue sample as well as texture features extracted on a per-pixel level. The CAS strategy yields a positive predictive value (PPV) of 0.86 in classifying the 2000 tissue images into one of 7 classes, compared with the OVA (0.77 PPV) and OSC approaches (0.76 PPV).
Use of the CAS strategy increases the PPV for a multi-category classification system over two common alternative strategies. In classification problems such as histopathology, where multiple class groups exist with varying degrees of heterogeneity, the CAS system can intelligently assign class labels to objects by performing multiple binary classifications according to domain knowledge.
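The cascaded scheme described above can be pictured as a small tree of binary decisions, each one splitting a group of classes into two more homogeneous groups. The sketch below is illustrative only: the grouping of the seven classes, the score names (`cancer_score`, `grade_score`, and so on) and the cutoffs are hypothetical, and each threshold stands in for a trained binary classifier rather than reproducing the one used in the study.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Union

Features = Dict[str, float]


@dataclass
class Node:
    """One binary classifier in the cascade with its two outcomes."""
    decide: Callable[[Features], bool]
    if_true: Union["Node", str]
    if_false: Union["Node", str]


def threshold(name: str, cutoff: float) -> Callable[[Features], bool]:
    # Hypothetical stand-in for a trained binary classifier.
    return lambda x: x[name] > cutoff


# Hypothetical grouping: cancer vs. non-cancer first, then finer splits.
cascade = Node(
    threshold("cancer_score", 0.5),
    Node(
        threshold("grade_score", 0.33),
        Node(threshold("grade_score", 0.66), "Gleason 5", "Gleason 4"),
        "Gleason 3",
    ),
    Node(
        threshold("confounder_score", 0.5),
        Node(threshold("pin_score", 0.5), "PIN", "atrophy"),
        Node(threshold("stroma_score", 0.5), "stroma", "epithelium"),
    ),
)


def classify(x: Features, node: Union[Node, str] = cascade) -> str:
    """Follow binary decisions until a single class label is reached."""
    while isinstance(node, Node):
        node = node.if_true if node.decide(x) else node.if_false
    return node


label_a = classify({"cancer_score": 0.9, "grade_score": 0.5})
label_b = classify(
    {"cancer_score": 0.1, "confounder_score": 0.2, "stroma_score": 0.8}
)
```

Each sample only needs the scores on the path it actually follows, which is one practical attraction of a cascade over a single seven-way classifier.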
The aerobic energy metabolism of cardiac muscle cells is of major importance for the contractile function of the heart. Because energy metabolism is very heterogeneously distributed in heart tissue, especially during coronary disease, a method to quantify metabolic fluxes in small tissue samples is desirable. Taking tissue biopsies after infusion of substrates labeled with stable carbon isotopes makes this possible in animal experiments. However, the appreciable noise level in NMR spectra of extracted tissue samples makes computational estimation of metabolic fluxes challenging and a good method to define confidence regions was not yet available.
Here we present a computational analysis method for nuclear magnetic resonance (NMR) measurements of tricarboxylic acid (TCA) cycle metabolites. The method was validated using measurements on extracts of single tissue biopsies taken from porcine heart in vivo. Isotopic enrichment of glutamate was measured by NMR spectroscopy in tissue samples taken at a single time point after the timed infusion of 13C labeled substrates for the TCA cycle. The NMR intensities for glutamate were analyzed with a computational model describing carbon transitions in the TCA cycle and carbon exchange with amino acids. The model dynamics depended on five flux parameters, which were optimized to fit the NMR measurements. To determine confidence regions for the estimated fluxes, we used the Metropolis-Hastings algorithm for Markov chain Monte Carlo (MCMC) sampling to generate extensive ensembles of feasible flux combinations that describe the data within measurement precision limits. To validate our method, we compared myocardial oxygen consumption calculated from the TCA cycle flux with in vivo blood gas measurements for 38 hearts under several experimental conditions, e.g. during coronary artery narrowing.
Despite the appreciable NMR noise level, the oxygen consumption in the tissue samples, estimated from the NMR spectra, correlates with blood-gas oxygen uptake measurements for the whole heart. The MCMC method provides confidence regions for the estimated metabolic fluxes in single cardiac biopsies, taking the quantified measurement noise level and the nonlinear dependencies between parameters fully into account.
Cardiac physiology; Metabolic modeling; Metabolomics; Sensitivity analysis; 13C metabolic flux analysis
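The use of Metropolis-Hastings sampling to obtain confidence regions can be illustrated on a much smaller problem. The sketch below is not the five-flux TCA cycle model of the study: it assumes a hypothetical one-parameter decay model `exp(-k * t)`, synthetic data with a known Gaussian noise level `sigma`, and a flat prior on positive `k`. It only shows how an ensemble of parameter values consistent with the noise yields an interval estimate.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic measurements from an assumed toy model y = exp(-k * t) + noise.
t = np.linspace(0.0, 5.0, 20)
true_k, sigma = 0.8, 0.05
y = np.exp(-true_k * t) + rng.normal(0.0, sigma, t.size)


def log_posterior(k):
    """Gaussian log-likelihood with a flat prior restricted to k > 0."""
    if k <= 0:
        return -np.inf
    residual = y - np.exp(-k * t)
    return -0.5 * np.sum((residual / sigma) ** 2)


def metropolis_hastings(log_post, start, step, n_samples, rng):
    """Random-walk Metropolis-Hastings for a single scalar parameter."""
    chain = np.empty(n_samples)
    current, current_lp = start, log_post(start)
    for i in range(n_samples):
        proposal = current + rng.normal(0.0, step)
        proposal_lp = log_post(proposal)
        # Accept with probability min(1, posterior ratio).
        if np.log(rng.uniform()) < proposal_lp - current_lp:
            current, current_lp = proposal, proposal_lp
        chain[i] = current
    return chain


chain = metropolis_hastings(log_posterior, start=1.0, step=0.05,
                            n_samples=20000, rng=rng)
posterior = chain[2000:]  # discard burn-in
low, high = np.percentile(posterior, [2.5, 97.5])
print(f"k is roughly {posterior.mean():.3f}, 95% interval [{low:.3f}, {high:.3f}]")
```

With several parameters the same ensemble also exposes nonlinear dependencies between them, which a single best-fit point cannot show.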
This study evaluated the occurrence and potential clinical significance of intratumoral EGFR mutational heterogeneity in Chinese patients with non-small cell lung cancer (NSCLC).
Materials and Methods
Eighty-five stage IIIa-IV NSCLC patients who had undergone palliative surgical resection were included in this study. Of these, 45 patients carried EGFR mutations (group-M) and 40 patients were wild-type (group-W). Each tumor sample was microdissected to yield 28–34 tumor foci, and intratumoral EGFR mutations were determined using Denaturing High Performance Liquid Chromatography (DHPLC) and Amplification Refractory Mutation System (ARMS). EGFR copy numbers were measured using fluorescence in situ hybridization (FISH).
Microdissection yielded 1,431 tumor foci from EGFR mutant patients (group-M) and 1,238 foci from wild-type patients (group-W). The EGFR mutant frequencies in group-M were 80.6% (1,154/1,431) and 87.1% (1,247/1,431) using DHPLC and ARMS, respectively. A combination of EGFR-mutated and wild-type cells was detected in 32.9% (28/85) of samples by DHPLC and 28.2% (24/85) by ARMS, supporting the occurrence of intratumoral heterogeneity. Thirty-one patients (36.5%) were identified as EGFR FISH-positive. Patients harboring intratumoral mutational heterogeneity had lower EGFR copy numbers than those whose tumors contained mutant cells alone (16.7% vs. 71.0%, P<0.05). Among 26 patients who had received EGFR-TKIs, the mean EGFR mutation content was higher in patients showing partial response (86.1%) or stable disease (48.7%) compared with patients experiencing progressive disease (6.0%) (P = 0.001). Progression-free survival (PFS) was also related to the EGFR mutation content group (pure wild-type EGFR, EGFR mutation with heterogeneity, and pure mutated EGFR) (P = 0.001).
Approximately 30% of patients presented intratumoral EGFR mutational heterogeneity, accompanied by relatively low EGFR copy number. EGFR mutant content was correlated with response to EGFR-TKIs and with prognosis.
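The reported percentages follow directly from the counts given in the text. A short check, using only the numerators and denominators quoted above (the dictionary labels are ours, added for readability):

```python
# Counts as quoted in the results; labels are descriptive only.
counts = {
    "mutant foci in group-M, DHPLC": (1154, 1431),
    "mutant foci in group-M, ARMS": (1247, 1431),
    "mixed mutant/wild-type samples, DHPLC": (28, 85),
    "mixed mutant/wild-type samples, ARMS": (24, 85),
    "EGFR FISH-positive patients": (31, 85),
}

percentages = {
    label: round(100 * numerator / denominator, 1)
    for label, (numerator, denominator) in counts.items()
}

for label, value in percentages.items():
    print(f"{label}: {value}%")
```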
Diagnosis of Barrett's esophagus (BE) is typically done through morphologic analysis of esophageal tissue biopsy. Such samples contain several cell types. Laser capture microdissection (LCM) allows the isolation of specific cells from heterogeneous cell populations. The purpose of this study was to determine the degree of overlap of the two sample types and to define a set of genes that may serve as biochemical markers for BE.
We obtained biopsies from regions of the glandular tissue of BE and normal esophagus from 9 subjects with BE. Samples from 5 subjects were examined as whole tissue (BE [whole]; E [whole]), and in 4 subjects the glandular epithelium of BE was isolated using LCM (BE [LCM]) and compared to the averaged values (E [LCM]) for both basal cell (B [LCM]) and squamous cell (S [LCM]) epithelium.
Gene expression analysis revealed 1797 differentially expressed probesets between BE [whole] and E [whole] (fold change > 2.0; p<0.001). Most (74%) were also differentially expressed between BE [LCM] and E [LCM], showing that there was high concordance between the two sampling methods. LCM provided a great deal of additional information (2113 genes) about the alterations in gene expression that may represent the BE phenotype.
There are differences in gene expression profiles depending on whether specimens are whole tissue biopsies or LCM dissected. Whole tissue biopsies should prove satisfactory for diagnostic purposes. Because the data from LCM samples delineated many more Barrett's specific genes, this procedure may provide more information regarding pathogenesis than whole tissue material.
Gene expression analysis is generally performed on heterogeneous tissue samples consisting of multiple cell types. Current methods developed to separate heterogeneous gene expression rely on prior knowledge of the cell-type composition and/or signatures, which are not available in most public datasets. We present a novel method to identify the cell-type composition, signatures and proportions per sample without need for a-priori information. The method was successfully tested on controlled and semi-controlled datasets and performed as accurately as current methods that do require additional information. As such, this method enables the analysis of cell-type specific gene expression using existing large pools of publicly available microarray datasets.
Gene expression microarrays are widely used to uncover biological insights. Most microarray experiments profile whole tissues containing mixtures of multiple cell-types. As such, gene expression differences between samples may be due to different cellular compositions or biological differences, severely limiting the conclusions derived from the analysis. All current approaches to computationally separate the heterogeneous gene expression to individual cell-types require that the identity, relative amount of the cell-types in the tissue or their individual gene expression are known. Publicly available microarray-based datasets, which include thousands of patient samples, do not usually measure this information, rendering existing separation methods unusable. We developed a novel approach to estimate the number of cell-types, identities, individual gene expression and relative proportions in heterogeneous tissues with no a-priori information except for an initial estimate of the cell-types in the tissue analyzed and general reference signatures of these cell-types that may be easily obtained from public databases. We successfully applied our method to microarray datasets, yielding highly accurate estimations, which often exceed the performance of separation methods that require prior information. Thus, our method can be accurately applied to any heterogeneous dataset, where re-examination and analysis of the individual cell-types in the heterogeneous tissue can aid in discovering new aspects of the diseases studied.
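The separation problem is commonly written as a linear mixing model, X ≈ S P, where X holds the measured expression (genes × samples), S the cell-type signatures (genes × cell types) and P the per-sample proportions. The sketch below is a simplified illustration and not the method of the abstract: it assumes the signatures S are already known, uses synthetic data, and recovers proportions by ordinary least squares followed by clipping and renormalization. All names and dimensions are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n_genes, n_types, n_samples = 200, 3, 5

# Synthetic "pure" cell-type signatures (genes x cell types).
S = rng.gamma(shape=2.0, scale=50.0, size=(n_genes, n_types))

# True mixing proportions: each column is one sample and sums to 1.
P_true = rng.dirichlet(np.ones(n_types), size=n_samples).T

# Observed heterogeneous expression with a little measurement noise.
X = S @ P_true + rng.normal(0.0, 1.0, size=(n_genes, n_samples))


def estimate_proportions(X, S):
    """Least-squares proportions, clipped to be non-negative and
    rescaled so that each sample's proportions sum to one."""
    P, *_ = np.linalg.lstsq(S, X, rcond=None)
    P = np.clip(P, 0.0, None)
    return P / P.sum(axis=0, keepdims=True)


P_hat = estimate_proportions(X, S)
print(np.round(P_hat, 3))
print(np.round(P_true, 3))
```

When neither S nor P is known, both must be estimated jointly, which is the harder setting the abstract addresses; the reference signatures it mentions would then serve as a starting point rather than as fixed inputs.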
Aims—Laser capture microdissection is a recent development that enables the isolation of specific cell types for subsequent molecular analysis. This study describes a method for obtaining proteome information from laser capture microdissected tissue using colon cancer as a model.
Methods—Laser capture microdissection was performed on toluidine blue stained frozen sections of colon cancer. Tumour cells were selectively microdissected. Conditions were established for solubilising proteins from laser microdissected samples and these proteins were separated by two dimensional gel electrophoresis. Individual protein spots were cut from the gel, characterised by mass spectrometry, and identified by database searching. These results were compared with protein expression patterns and mass spectrometric data obtained from bulk tumour samples run in parallel.
Results—Proteins could be recovered from laser capture microdissected tissue in a form suitable for two dimensional gel electrophoresis. The solubilised proteins retained their expected electrophoretic mobility in two dimensional gels as compared with bulk samples, and mass spectrometric analysis was also unaffected.
Conclusion—A method for performing two dimensional gel electrophoresis and mass spectrometry using laser capture microdissected tissue has been developed.
colon cancer; electrophoresis; proteomics
We discuss Bayesian modelling and computational methods in analysis of indirectly observed spatial point processes. The context involves noisy measurements on an underlying point process that provide indirect and noisy data on locations of point outcomes. We are interested in problems in which the spatial intensity function may be highly heterogeneous, and so is modelled via flexible nonparametric Bayesian mixture models. Analysis aims to estimate the underlying intensity function and the abundance of realized but unobserved points. Our motivating applications involve immunological studies of multiple fluorescent intensity images in sections of lymphatic tissue where the point processes represent geographical configurations of cells. We are interested in estimating intensity functions and cell abundance for each of a series of such data sets to facilitate comparisons of outcomes at different times and with respect to differing experimental conditions. The analysis is heavily computational, utilizing recently introduced MCMC approaches for spatial point process mixtures and extending them to the broader new context here of unobserved outcomes. Further, our example applications are problems in which the individual objects of interest are not simply points, but rather small groups of pixels; this implies a need to work at an aggregate pixel region level and we develop the resulting novel methodology for this. Two examples with immunofluorescence histology data demonstrate the models and computational methodology.
Bayesian computation; blocked Gibbs sampler; Dirichlet process mixture model; inhomogeneous Poisson process; unobserved point process
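To make the notion of a heterogeneous spatial intensity concrete, the sketch below simulates an inhomogeneous Poisson process on the unit square by thinning. It covers only the forward (data-generating) side, not the Bayesian inference described above, and the intensity function, a constant background plus two Gaussian bumps, is a hypothetical choice with made-up centers, weights and width.

```python
import numpy as np


def intensity(xy):
    """Hypothetical intensity: background plus two Gaussian clusters."""
    centers = np.array([[0.3, 0.3], [0.7, 0.6]])
    weights = np.array([400.0, 250.0])
    width = 0.08
    sq_dist = ((xy[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return 20.0 + (weights * np.exp(-sq_dist / (2 * width**2))).sum(axis=1)


def simulate_by_thinning(intensity, lam_max, rng):
    """Draw a homogeneous process at rate lam_max on the unit square,
    then keep each point with probability intensity(point) / lam_max."""
    n = rng.poisson(lam_max)  # the unit square has area 1
    candidates = rng.uniform(0.0, 1.0, size=(n, 2))
    keep = rng.uniform(0.0, 1.0, size=n) < intensity(candidates) / lam_max
    return candidates[keep]


rng = np.random.default_rng(2)
lam_max = 20.0 + 400.0 + 250.0  # upper bound on the intensity above
points = simulate_by_thinning(intensity, lam_max, rng)
print(points.shape)
```

Thinning requires `lam_max` to bound the intensity everywhere; here the bound is the background plus both peak weights.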
In this paper we introduce an efficient algorithm for alignment of multiple large-scale biological networks. In this scheme, we first compute a probabilistic similarity measure between nodes that belong to different networks using a semi-Markov random walk model. The estimated probabilities are further enhanced by incorporating the local and the cross-species network similarity information through the use of two different types of probabilistic consistency transformations. The transformed alignment probabilities are used to predict the alignment of multiple networks based on a greedy approach. We demonstrate that the proposed algorithm, called SMETANA, outperforms many state-of-the-art network alignment techniques in terms of computational efficiency, alignment accuracy, and scalability. Our experiments show that SMETANA can readily align tens of genome-scale networks with thousands of nodes on a personal computer. The source code of SMETANA can be downloaded from http://www.ece.tamu.edu/~bjyoon/SMETANA/.
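The final greedy step can be sketched in isolation. The example below is a simplified illustration, not SMETANA itself: it assumes a pairwise, one-to-one alignment between two networks and takes a precomputed score matrix as given, whereas the published algorithm aligns multiple networks and derives its probabilities from the semi-Markov random walk and consistency transformations. The function name and the `min_score` parameter are hypothetical.

```python
import numpy as np


def greedy_alignment(scores, min_score=0.0):
    """Pair nodes of network A (rows) with nodes of network B (columns),
    repeatedly taking the highest remaining score between unused nodes."""
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(scores, axis=None)[::-1]  # best scores first
    used_rows, used_cols, pairs = set(), set(), []
    for flat_index in order:
        i, j = (int(v) for v in np.unravel_index(flat_index, scores.shape))
        if scores[i, j] <= min_score:
            break  # all remaining scores are too low to align
        if i in used_rows or j in used_cols:
            continue  # one of the two nodes is already aligned
        used_rows.add(i)
        used_cols.add(j)
        pairs.append((i, j))
    return pairs


# Toy alignment probabilities between 2 nodes of A and 3 nodes of B.
example_scores = [[0.9, 0.1, 0.0],
                  [0.8, 0.7, 0.2]]
example_pairs = greedy_alignment(example_scores)
print(example_pairs)
```

A greedy pass is not guaranteed to maximize the total score, but it needs only one sort of the score matrix, which helps explain why such a step scales to large networks.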