Related Articles

1.  A protein–protein interaction guided method for competitive transcription factor binding improves target predictions 
Nucleic Acids Research  2009;37(22):e146.
An important milestone in revealing cells' functions is to build a comprehensive understanding of transcriptional regulation processes. These processes are largely regulated by transcription factors (TFs) binding to DNA sites. Several TF binding site (TFBS) prediction methods have been developed, but they usually model the binding of a single TF at a time, although a few methods for predicting the binding of multiple TFs also exist. In this article, we propose a probabilistic model that predicts binding of several TFs simultaneously. Our method explicitly models the competitive binding between TFs and uses the prior knowledge of existing protein–protein interactions (PPIs), which mimics the situation in the nucleus. Modeling DNA binding for multiple TFs improves the accuracy of binding site prediction remarkably when compared with other programs and with cases where individual binding prediction results of separate TFs have been combined. Traditional TFBS prediction methods usually predict an overwhelming number of false positives. This lack of specificity is overcome remarkably with our competitive binding prediction method. In addition, previously unpredictable binding sites can be detected with the help of PPIs. Source codes are available at http://www.cs.tut.fi/∼harrila/.
doi:10.1093/nar/gkp789
PMCID: PMC2794167  PMID: 19786498
2.  Methods for time series analysis of RNA-seq data with application to human Th17 cell differentiation 
Bioinformatics  2014;30(12):i113-i120.
Motivation: Gene expression profiling using RNA-seq is a powerful technique for screening RNA species’ landscapes and their dynamics in an unbiased way. While several advanced methods exist for differential expression analysis of RNA-seq data, proper tools to analyze RNA-seq time-course data have not been proposed.
Results: In this study, we use RNA-seq to measure gene expression during the early human T helper 17 (Th17) cell differentiation and T-cell activation (Th0). To quantify Th17-specific gene expression dynamics, we present a novel statistical methodology, DyNB, for analyzing time-course RNA-seq data. We use non-parametric Gaussian processes to model temporal correlation in gene expression and combine that with negative binomial likelihood for the count data. To account for experiment-specific biases in gene expression dynamics, such as differences in cell differentiation efficiencies, we propose a method to rescale the dynamics between replicated measurements. We develop an MCMC sampling method to make inference of differential expression dynamics between conditions. DyNB identifies several known and novel genes involved in Th17 differentiation. Analysis of differentiation efficiencies revealed consistent patterns in gene expression dynamics between different cultures. We use qRT-PCR to validate differential expression and differentiation efficiencies for selected genes. Comparison of the results with those obtained via traditional timepoint-wise analysis shows that time-course analysis together with time rescaling between cultures identifies differentially expressed genes which would not otherwise be detected.
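The two ingredients named above, a Gaussian process prior over the temporal expression profile and a negative binomial likelihood for the read counts, can be illustrated with a minimal sketch. This is not the DyNB implementation: the squared-exponential kernel, the mean/size parameterization of the negative binomial, and the function names are all assumptions made for illustration, since the abstract does not specify them.

```python
import math

def squared_exponential_kernel(times, length_scale, variance):
    """Covariance matrix of a GP with a squared-exponential kernel (assumed kernel choice)."""
    return [[variance * math.exp(-0.5 * ((s - t) / length_scale) ** 2) for t in times]
            for s in times]

def nb_log_pmf(y, mu, r):
    """Log-probability of count y under a negative binomial with mean mu and size r.

    Assumed parameterization: variance = mu + mu**2 / r.
    """
    return (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
            + r * math.log(r / (r + mu))
            + y * math.log(mu / (r + mu)))

def count_log_likelihood(counts, means, r):
    """Sum of negative binomial log-probabilities over time points."""
    return sum(nb_log_pmf(y, mu, r) for y, mu in zip(counts, means))
```

In a model of this kind, the GP would supply the smooth latent mean at each time point and `count_log_likelihood` would score the observed counts against it; an MCMC sampler would then explore the latent means and hyperparameters.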
Availability: An implementation of the proposed computational methods will be available at http://research.ics.aalto.fi/csb/software/
Contact: tarmo.aijo@aalto.fi or harri.lahdesmaki@aalto.fi
Supplementary information: Supplementary data are available at Bioinformatics online.
doi:10.1093/bioinformatics/btu274
PMCID: PMC4058923  PMID: 24931974
3.  Patterns of basal signaling heterogeneity can distinguish cellular populations with different drug sensitivities 
Non-small cell lung cancer H460 clones exhibit a high degree of heterogeneity in signaling states. Clones with similar patterns of basal signaling heterogeneity have similar paclitaxel sensitivities. Models of signaling heterogeneity among the clones can be used to classify sensitivity to paclitaxel for other cancer populations.
A high degree of phenotypic diversity has been classically observed among cancer cells, even within a single tumor (Heppner, 1984; Anderson et al, 2006; Ichim and Wells, 2006; Campbell and Polyak, 2007). Importantly, not all cancer cells contribute equally to disease progression or respond equally to therapeutic intervention (Campbell and Polyak, 2007). This heterogeneity has traditionally been viewed as an impediment to efficient diagnosis and treatment. Understanding the relevance of cellular diversity to cancer requires methods for relating patterns of phenotypic heterogeneity to functional outcomes, such as drug sensitivity. Recent advances in fluorescence microscopy image-based analysis have enabled quantitative single-cell measurements of the activation and (co-)localization of signaling molecules within large cellular populations (Boland and Murphy, 2001; Perlman et al, 2004). Here, we apply this technology to explore the extent to which patterns of basal signaling heterogeneity, present within cancer populations before treatment, reveal information about population-level response to drug perturbation.
To investigate basal cell signaling heterogeneity among a collection of cancer populations having minimal exogenous differences, such as those due to environment, cell type, and genetic background, we generated a collection of 49 low-passage clonal populations from the highly metastatic nonsmall cell lung cancer cell line H460 (Kozaki et al, 2000). We chose to observe patterns of spatial organization and activation for multiple components from diverse signaling pathways associated with cancer (marker sets 1–4: DNA/pSTAT3/pPTEN; DNA/pERK/pP38; DNA/E-cadherin/β-catenin/pGSK3; DNA/pAkt/H3K9-Ac).
We identified an objective set of signaling stereotypes from each marker set based on a probabilistic description of the distribution of cells in the feature space. For each marker set, a ‘reference' set of representative cells was sampled from all 50 H460 cancer populations. Then, each reference set was represented as a mixture of subpopulations modeled as Gaussian distributions with means centered on distinct, ‘stereotyped' signaling states (Slack et al, 2008). Our quantitative analysis suggested that a small collection of signaling stereotypes was sufficient to characterize the complexity of observed cellular phenotypes among all clones. For simplicity, we chose to use five subpopulations to model cellular heterogeneity in each marker set.
For each clone, we computed the fraction of cells in each of the identified subpopulations (Figure 2, scatter plots). Estimation of these fractions allowed us to represent each clone as a probabilistic ensemble of subpopulations. Visual differences among the clones (Figure 2, thumbnail images) were reflected by clear differences in subpopulation mixtures (Figure 2, scatter plots). To compare the subpopulation mixtures of each clone to the parent, a ‘subpopulation enrichment' profile vector was computed. The vector measured the log-fold change between the clone and the H460 parent population for each subpopulation (Figure 2, heat map).
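The enrichment profile described above can be sketched in a few lines. The sketch assumes each cell has already been given a hard subpopulation label (the study itself uses a probabilistic mixture model), and the base-2 logarithm and the pseudocount that guards against empty subpopulations are illustrative choices, not details taken from the source.

```python
import math
from collections import Counter

def subpopulation_fractions(labels, n_subpops):
    """Fraction of cells assigned to each subpopulation index 0..n_subpops-1."""
    counts = Counter(labels)
    total = len(labels)
    return [counts.get(k, 0) / total for k in range(n_subpops)]

def enrichment_profile(clone_fracs, parent_fracs, pseudo=1e-3):
    """Log2 fold change of each subpopulation fraction, clone versus parent.

    `pseudo` is a hypothetical pseudocount to avoid log(0) for empty subpopulations.
    """
    return [math.log2((c + pseudo) / (p + pseudo))
            for c, p in zip(clone_fracs, parent_fracs)]
```

A positive entry means the subpopulation is over-represented in the clone relative to the parent population, and a negative entry means it is depleted; these vectors are what the clustering step would compare.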
We applied hierarchical clustering to group clones based on the similarity of their subpopulation enrichment profiles (Figure 2). Clustering by subpopulation enrichment profiles revealed only a small number of distinct patterns (or ‘signatures') of subpopulation mixtures (Figure 2, dendrogram and heat map). Thus, parameterization of observed cellular heterogeneity using subpopulation enrichment profiles succinctly encapsulated the apparent complexity of cancer cell phenotypes, and further allowed comparison of clonal populations at a resolution greater than provided by population means.
We next assessed the degree to which clones with distinct patterns of heterogeneity had distinct responses to the drug paclitaxel. We used a multidimensional scaling (Borg and Groenen, 1997) plot to visualize similarity among the clones and annotated each clone with the index of drug sensitivity. This visualization revealed striking geometric separation in ‘profile space' of paclitaxel-sensitive from paclitaxel-nonsensitive clones for each marker set (Figure 3A, green versus red and black circles). The significance of separation was further confirmed by machine learning-based classification studies. Thus heterogeneity of basal cellular signaling states contained information that could be used to predict sensitivity to drug treatment.
Our approach is general, and makes heterogeneity a computable property of cellular populations. Interrogation at subpopulation-resolution facilitated a dramatic reduction in the observed phenotypic complexity of cancer populations, yet retained sufficient biological information to identify drug responses. Our work suggests that rigorous analysis of cancer heterogeneity can provide a new resolution at which to match disease to more effective therapies.
Phenotypic heterogeneity has been widely observed in cellular populations. However, the extent to which heterogeneity contains biologically or clinically important information is not well understood. Here, we investigated whether patterns of basal signaling heterogeneity, in untreated cancer cell populations, could distinguish cellular populations with different drug sensitivities. We modeled cellular heterogeneity as a mixture of stereotyped signaling states, identified based on colocalization patterns of activated signaling molecules from microscopy images. We found that patterns of heterogeneity could be used to separate the most sensitive and resistant populations to paclitaxel within a set of H460 lung cancer clones and within the NCI-60 panel of cancer cell lines, but not for a set of less heterogeneous, immortalized noncancer human bronchial epithelial cell (HBEC) clones. Our results suggest that patterns of signaling heterogeneity, characterized as ensembles of a small number of distinct phenotypic states, can reveal functional differences among cellular populations.
doi:10.1038/msb.2010.22
PMCID: PMC2890326  PMID: 20461076
cancer; heterogeneity; multivariate analysis; signaling; systems biology
4.  Three RNA Binding Proteins Form a Complex to Promote Differentiation of Germline Stem Cell Lineage in Drosophila 
PLoS Genetics  2014;10(11):e1004797.
In regenerative tissues, one of the strategies to protect stem cells from genetic aberrations, potentially caused by frequent cell division, is to transiently expand the stem cell daughters before further differentiation. However, failure to exit the transit amplification may lead to overgrowth, and the molecular mechanism governing this regulation remains vague. In a Drosophila mutagenesis screen for factors involved in the regulation of germline stem cell (GSC) lineage, we isolated a mutation in the gene CG32364, which encodes a putative RNA-binding protein (RBP) and is designated as tumorous testis (tut). In tut mutants, spermatogonia fail to differentiate and over-amplify, a phenotype similar to that of mei-P26 mutants. Mei-P26 is a TRIM-NHL tumor suppressor homolog required for the differentiation of GSC lineage. We found that Tut preferentially binds a long isoform of the mei-P26 3′UTR, and is essential for the translational repression of a mei-P26 reporter. Bam and Bgcn are both RBPs that have also been shown to repress mei-P26 expression. Our genetic analyses indicate that tut, bam, and bgcn are each required to repress mei-P26 and to promote the differentiation of GSCs. Biochemically, we demonstrate that Tut, Bam, and Bgcn can form a physical complex in which Bam holds Tut on its N-terminus and Bgcn on its C-terminus. Our in vivo and in vitro evidence illustrates that Tut acts with Bam and Bgcn to accurately coordinate proliferation and differentiation in the Drosophila germline stem cell lineage.
Author Summary
In regenerative tissues, the successive differentiation of stem cell lineage is well controlled and coordinated with proper cell proliferation at each differentiation stage. Disruption of the control mechanism can lead to tumor growth or tissue degeneration. The germline stem cell lineage of Drosophila spermatogenesis provides an ideal research model to unravel the genetic network coordinating proliferation and differentiation. In a genetic screen, we identified a male-sterile mutant whose germ cells are under-differentiated and overproliferating. The responsible gene encodes an RNA-binding protein whose target belongs to a tumor suppressor family. We demonstrate that this and two other RNA-binding proteins form a physical and functional unit to ensure the proper differentiation and accurate proliferation of germline stem cell lineage.
doi:10.1371/journal.pgen.1004797
PMCID: PMC4238977  PMID: 25412508
5.  Computational Methods for Estimation of Cell Cycle Phase Distributions of Yeast Cells 
Two computational methods for estimating the cell cycle phase distribution of a budding yeast (Saccharomyces cerevisiae) cell population are presented. The first one is a nonparametric method that is based on the analysis of DNA content in the individual cells of the population. The DNA content is measured with a fluorescence-activated cell sorter (FACS). The second method is based on budding index analysis. An automated image analysis method is presented for the task of detecting the cells and buds. The proposed methods can be used to obtain quantitative information on the cell cycle phase distribution of a budding yeast S. cerevisiae population. They therefore provide a solid basis for obtaining the complementary information needed in deconvolution of gene expression data. As a case study, both methods are tested with data that were obtained in a time series experiment with S. cerevisiae. The details of the time series experiment as well as the image and FACS data obtained in the experiment can be found in the online additional material at http://www.cs.tut.fi/sgn/csb/yeastdistrib/.
doi:10.1155/2007/46150
PMCID: PMC3171340  PMID: 18354733
6.  A mixture model for expression deconvolution from RNA-seq in heterogeneous tissues 
BMC Bioinformatics  2013;14(Suppl 5):S11.
Background
RNA-seq, a next-generation sequencing based method for transcriptome analysis, is rapidly emerging as the method of choice for comprehensive transcript abundance estimation. The accuracy of RNA-seq can be highly impacted by the purity of samples. A prominent, outstanding problem in RNA-seq is how to estimate transcript abundances in heterogeneous tissues, where a sample is composed of more than one cell type and the inhomogeneity can substantially confound the transcript abundance estimation of each individual cell type. Although experimental methods have been proposed to dissect multiple distinct cell types, computationally "deconvoluting" heterogeneous tissues provides an attractive alternative, since it keeps the tissue sample as well as the subsequent molecular content yield intact.
Results
Here we propose a probabilistic model-based approach, Transcript Estimation from Mixed Tissue samples (TEMT), to estimate the transcript abundances of each cell type of interest from RNA-seq data of heterogeneous tissue samples. TEMT incorporates positional and sequence-specific biases, and its online EM algorithm only requires a runtime proportional to the data size and a small constant memory. We test the proposed method on both simulation data and recently released ENCODE data, and show that TEMT significantly outperforms current state-of-the-art methods that do not take tissue heterogeneity into account. Currently, TEMT only resolves the tissue heterogeneity resulting from two cell types, but it can be extended to handle tissue heterogeneity resulting from multiple cell types. TEMT is written in Python, and is freely available at https://github.com/uci-cbcl/TEMT.
Conclusions
The probabilistic model-based approach proposed here provides a new method for analyzing RNA-seq data from heterogeneous tissue samples. By applying the method to both simulation data and ENCODE data, we show that explicitly accounting for tissue heterogeneity can significantly improve the accuracy of transcript abundance estimation.
doi:10.1186/1471-2105-14-S5-S11
PMCID: PMC3622628  PMID: 23735186
7.  In silico microdissection of microarray data from heterogeneous cell populations 
BMC Bioinformatics  2005;6:54.
Background
Very few analytical approaches have been reported to resolve the variability in microarray measurements stemming from sample heterogeneity. For example, tissue samples used in cancer studies are usually contaminated with the surrounding or infiltrating cell types. This heterogeneity in the sample preparation hinders further statistical analysis, significantly so if different samples contain different proportions of these cell types. Thus, sample heterogeneity can result in the identification of differentially expressed genes that may be unrelated to the biological question being studied. Similarly, irrelevant gene combinations can be discovered in the case of gene expression based classification.
Results
We propose a computational framework for removing the effects of sample heterogeneity by "microdissecting" microarray data in silico. The computational method provides estimates of the expression values of the pure (non-heterogeneous) cell samples. The inversion of the sample heterogeneity can be facilitated by providing accurate estimates of the mixing percentages of different cell types in each measurement. For those cases where no such information is available, we develop an optimization-based method for joint estimation of the mixing percentages and the expression values of the pure cell samples. We also consider the problem of selecting the correct number of cell types.
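When the mixing percentages are known, recovering the pure cell-type expression values for one gene can be posed as a small least-squares problem. The sketch below assumes a linear mixing model with exactly two cell types and solves the 2x2 normal equations directly; the function name and the restriction to two cell types are illustrative assumptions, not the authors' algorithm, and the joint estimation of unknown mixing percentages is not covered.

```python
def estimate_pure_expression(mixing, observed):
    """Least-squares estimate of two pure cell-type expression values for one gene.

    mixing:   list of (fraction_type0, fraction_type1), one pair per sample
    observed: measured expression of the gene in each sample
    Assumes observed[s] ~ mixing[s][0] * pure0 + mixing[s][1] * pure1.
    """
    a = sum(m0 * m0 for m0, _ in mixing)
    b = sum(m0 * m1 for m0, m1 in mixing)
    d = sum(m1 * m1 for _, m1 in mixing)
    e = sum(m0 * x for (m0, _), x in zip(mixing, observed))
    f = sum(m1 * x for (_, m1), x in zip(mixing, observed))
    det = a * d - b * b
    if abs(det) < 1e-12:
        # All samples share the same proportions, so the two types cannot be separated.
        raise ValueError("mixing proportions do not vary across samples")
    return (d * e - b * f) / det, (a * f - b * e) / det
```

The error branch reflects the point made above: the inversion only works when different samples contain different proportions of the cell types.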
Conclusion
The efficiency of the proposed methods is illustrated by applying them to carefully controlled cDNA microarray data obtained from heterogeneous samples. The results demonstrate that the methods are capable of reconstructing both the sample and cell type specific expression values from heterogeneous mixtures and that the mixing percentages of different cell types can also be estimated. Furthermore, a general purpose model selection method can be used to select the correct number of cell types.
doi:10.1186/1471-2105-6-54
PMCID: PMC1274251  PMID: 15766384
8.  NanoMiner — Integrative Human Transcriptomics Data Resource for Nanoparticle Research 
PLoS ONE  2013;8(7):e68414.
The potential impact of nanoparticles on the environment and on human health has attracted considerable interest worldwide. The amount of transcriptomics data, in which tissues and cell lines are exposed to nanoparticles, increases year by year. In addition to the importance of the original findings, this data can have value in a broader context when combined with other previously acquired and published results. In order to facilitate the efficient usage of the data, we have developed the NanoMiner web resource (http://nanominer.cs.tut.fi/), which contains 404 human transcriptome samples exposed to various types of nanoparticles. All the samples in NanoMiner have been annotated, preprocessed and normalized using standard methods that ensure the quality of the data analyses and enable the users to utilize the database systematically across the different experimental setups and platforms. With NanoMiner it is possible to 1) search and plot the expression profiles of one or several genes of interest, 2) cluster the samples within the datasets, 3) find differentially expressed genes in various nanoparticle studies, 4) detect the nanoparticles causing differential expression of selected genes, 5) analyze enriched Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways and Gene Ontology (GO) terms for the detected genes and 6) search the expression values and differential expressions of the genes belonging to a specific KEGG pathway or Gene Ontology. In sum, the NanoMiner database is a valuable collection of microarray data which can also be used as a data repository for future analyses.
doi:10.1371/journal.pone.0068414
PMCID: PMC3709991  PMID: 23874618
9.  EPEPT: A web service for enhanced P-value estimation in permutation tests 
BMC Bioinformatics  2011;12:411.
Background
In computational biology, permutation tests have become a widely used tool to assess the statistical significance of an event under investigation. However, the common way of computing the P-value, which expresses the statistical significance, requires a very large number of permutations when small (and thus interesting) P-values are to be accurately estimated. This is computationally expensive and often infeasible. Recently, we proposed an alternative estimator, which requires far fewer permutations compared to the standard empirical approach while still reliably estimating small P-values [1].
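The cost of the standard empirical approach mentioned above is easy to see in code. The sketch below implements only that common baseline, using the widely used (exceedances + 1) / (permutations + 1) estimator and a difference-in-means statistic; it does not implement the alternative EPEPT estimator, which the abstract does not describe, and the function names are illustrative.

```python
import random

def empirical_p_value(observed, permuted_stats):
    """Standard empirical P-value: (number of permuted stats >= observed, plus 1) / (N + 1)."""
    exceed = sum(1 for s in permuted_stats if s >= observed)
    return (exceed + 1) / (len(permuted_stats) + 1)

def mean_difference(a, b):
    return sum(a) / len(a) - sum(b) / len(b)

def permutation_test(group_a, group_b, n_perm=999, seed=0):
    """One-sided permutation test on the difference in group means."""
    rng = random.Random(seed)
    observed = mean_difference(group_a, group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    stats = []
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random relabelling of the pooled samples
        stats.append(mean_difference(pooled[:n_a], pooled[n_a:]))
    return empirical_p_value(observed, stats)
```

With N permutations this estimator can never return a value below 1/(N + 1), so resolving a P-value near 1e-6 needs on the order of a million permutations, which is the computational burden the passage refers to.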
Results
The proposed P-value estimator has been enriched with additional functionalities and is made available to the general community through a public website and web service, called EPEPT. This means that the EPEPT routines can be accessed not only via a website, but also programmatically using any programming language that can interact with the web. Examples of web service clients in multiple programming languages can be downloaded. Additionally, EPEPT accepts data of various common experiment types used in computational biology. For these experiment types EPEPT first computes the permutation values and then performs the P-value estimation. Finally, the source code of EPEPT can be downloaded.
Conclusions
Different types of users, such as biologists, bioinformaticians and software engineers, can use the method in an appropriate and simple way.
Availability
http://informatics.systemsbiology.net/EPEPT/
doi:10.1186/1471-2105-12-411
PMCID: PMC3277916  PMID: 22024252
10.  Moving from Data on Deaths to Public Health Policy in Agincourt, South Africa: Approaches to Analysing and Understanding Verbal Autopsy Findings 
PLoS Medicine  2010;7(8):e1000325.
Peter Byass and colleagues compared two methods of assessing data from verbal autopsies, review by physicians or probabilistic modeling, and show that probabilistic modeling is the most efficient means of analyzing these data
Background
Cause of death data are an essential source for public health planning, but their availability and quality are lacking in many parts of the world. Interviewing family and friends after a death has occurred (a procedure known as verbal autopsy) provides a source of data where deaths otherwise go unregistered; but sound methods for interpreting and analysing the ensuing data are essential. Two main approaches are commonly used: either physicians review individual interview material to arrive at probable cause of death, or probabilistic models process the data into likely cause(s). Here we compare and contrast these approaches as applied to a series of 6,153 deaths which occurred in a rural South African population from 1992 to 2005. We do not attempt to validate either approach in absolute terms.
Methods and Findings
The InterVA probabilistic model was applied to a series of 6,153 deaths which had previously been reviewed by physicians. Physicians used a total of 250 cause-of-death codes, many of which occurred very rarely, while the model used 33. Cause-specific mortality fractions, overall and for population subgroups, were derived from the model's output, and the physician causes coded into comparable categories. The ten highest-ranking causes accounted for 83% and 88% of all deaths by physician interpretation and probabilistic modelling respectively, and eight of the highest ten causes were common to both approaches. Top-ranking causes of death were classified by population subgroup and period, as done previously for the physician-interpreted material. Uncertainty around the cause(s) of individual deaths was recognised as an important concept that should be reflected in overall analyses. One notably discrepant group involved pulmonary tuberculosis as a cause of death in adults aged over 65, and these cases are discussed in more detail, but the group only accounted for 3.5% of overall deaths.
Conclusions
There were no differences between physician interpretation and probabilistic modelling that might have led to substantially different public health policy conclusions at the population level. Physician interpretation was more nuanced than the model, for example in identifying cancers at particular sites, but did not capture the uncertainty associated with individual cases. Probabilistic modelling was substantially cheaper and faster, and completely internally consistent. Both approaches characterised the rise of HIV-related mortality in this population during the period observed, and reached similar findings on other major causes of mortality. For many purposes probabilistic modelling appears to be the best available means of moving from data on deaths to public health actions.
Editors' Summary
Background
Whenever someone dies in a developed country, the cause of death is determined by a doctor and entered into a “vital registration system,” a record of all the births and deaths in that country. Public-health officials and medical professionals use this detailed and complete information about causes of death to develop public-health programs and to monitor how these programs affect the nation's health. Unfortunately, in many developing countries dying people are not attended by doctors and vital registration systems are incomplete. In most African countries, for example, less than one-quarter of deaths are recorded in vital registration systems. One increasingly important way to improve knowledge about the patterns of death in developing countries is “verbal autopsy” (VA). Using a standard form, trained personnel ask relatives and caregivers about the symptoms that the deceased had before his/her death and about the circumstances surrounding the death. Physicians then review these forms and assign a specific cause of death from a shortened version of the International Classification of Diseases, a list of codes for hundreds of diseases.
Why Was This Study Done?
Physician review of VA forms is time-consuming and expensive. Consequently, computer-based, “probabilistic” models have been developed that process the VA data and provide a likely cause of death. These models are faster and cheaper than physician review of VAs and, because they do not rely on the views of local doctors about the likely causes of death, they are more internally consistent. But are physician review and probabilistic models equally sound ways of interpreting VA data? In this study, the researchers compare and contrast the interpretation of VA data by physician review and by a probabilistic model called the InterVA model by applying these two approaches to the deaths that occurred in Agincourt, a rural region of northeast South Africa, between 1992 and 2005. The Agincourt health and sociodemographic surveillance system is a member of the INDEPTH Network, a global network that is evaluating the health and demographic characteristics (for example, age, gender, and education) of populations in low- and middle-income countries over several years.
What Did the Researchers Do and Find?
The researchers applied the InterVA probabilistic model to 6,153 deaths that had been previously reviewed by physicians. They grouped the 250 cause-of-death codes used by the physicians into categories comparable with the 33 cause-of-death codes used by the InterVA model and derived cause-specific mortality fractions (the proportions of the population dying from specific causes) for the whole population and for subgroups (for example, deaths in different age groups and deaths occurring over specific periods of time) from the output of both approaches. The ten highest-ranking causes of death accounted for 83% and 88% of all deaths by physician interpretation and by probabilistic modelling, respectively. Eight of the most frequent causes of death (HIV, tuberculosis, chronic heart conditions, diarrhea, pneumonia/sepsis, transport-related accidents, homicides, and indeterminate) were common to both interpretation methods. Both methods coded about a third of all deaths as indeterminate, often because of incomplete VA data. Generally, there was close agreement between the methods for the five principal causes of death for each age group and for each period of time. One notable discrepancy was pulmonary (lung) tuberculosis in adults aged over 65, which accounted for 6.4% and 21.3% of deaths in this age group according to the physicians and to the model, respectively. However, these deaths accounted for only 3.5% of all the deaths.
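Cause-specific mortality fractions can be computed from probabilistic output by summing each cause's probability over all deaths and dividing by the number of deaths. The sketch below assumes each death is represented as a mapping from cause to probability and that any unassigned probability mass counts as indeterminate; this input format and the handling of the remainder are illustrative assumptions, not the InterVA model's actual output format.

```python
from collections import defaultdict

def cause_specific_mortality_fractions(deaths):
    """Population-level fraction of deaths attributed to each cause.

    deaths: list of dicts mapping cause -> probability for one death
            (probabilities for one death sum to at most 1).
    """
    totals = defaultdict(float)
    for cause_probs in deaths:
        for cause, prob in cause_probs.items():
            totals[cause] += prob
        # Assumed convention: leftover probability mass is treated as indeterminate.
        totals["indeterminate"] += max(0.0, 1.0 - sum(cause_probs.values()))
    n = len(deaths)
    return {cause: total / n for cause, total in totals.items()}
```

Keeping fractional probabilities, rather than forcing a single cause per death, is one way to carry the uncertainty around individual deaths through to the population-level summary.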
What Do These Findings Mean?
These findings reveal no differences between the cause-specific mortality fractions determined from VA data by physician interpretation and by probabilistic modelling that might have led to substantially different public-health policy programmes being initiated in this population. Importantly, both approaches clearly chart the rise of HIV-related mortality in this South African population between 1992 and 2005 and reach similar findings on other major causes of mortality. The researchers note that, although preparing the amount of VA data considered here for entry into the probabilistic model took several days, the model itself runs very quickly and always gives consistent answers. Given these findings, the researchers conclude that in many settings probabilistic modeling represents the best means of moving from VA data to public-health actions.
Additional Information
Please access these Web sites via the online version of this summary at http://dx.doi.org/10.1371/journal.pmed.1000325.
The importance of accurate data on death is further discussed in a previously published PLoS Medicine Perspective by Colin Mathers and Ties Boerma
The World Health Organization (WHO) provides information on the vital registration of deaths and on the International Classification of Diseases; the WHO Health Metrics Network is a global collaboration focused on improving sources of vital statistics; and the WHO Global Health Observatory brings together core health statistics for WHO member states
The INDEPTH Network is a global collaboration that is collecting health statistics from developing countries; it provides more information about the Agincourt health and socio-demographic surveillance system and access to standard VA forms
Information on the Agincourt health and sociodemographic surveillance system is available on the University of Witwatersrand Web site
The InterVA Web site provides resources for interpreting verbal autopsy data and the Umeå Centre for Global Health Research, where the InterVA model was developed, is found at http://www.globalhealthresearch.net
A recent PLoS Medicine Essay by Peter Byass, lead author of this study, discusses The Unequal World of Health Data
doi:10.1371/journal.pmed.1000325
PMCID: PMC2923087  PMID: 20808956
11.  POMO - Plotting Omics analysis results for Multiple Organisms 
BMC Genomics  2013;14:918.
Background
Systems biology experiments studying different topics and organisms produce thousands of data values across different types of genomic data. Further, data mining analyses are yielding ranked and heterogeneous results and association networks distributed over the entire genome. The visualization of these results is often difficult and standalone web tools allowing for custom inputs and dynamic filtering are limited.
Results
We have developed POMO (http://pomo.cs.tut.fi), an interactive web-based application to visually explore omics data analysis results and associations in circular, network and grid views. The circular graph represents the chromosome lengths as perimeter segments, which form a reference outer ring, such as the cytoband for human. The inner arcs between nodes represent the uploaded network. Further, multiple annotation rings, for example a depiction of gene copy number changes, can be uploaded as text files and represented as bar, histogram or heatmap rings. POMO has built-in references for human, mouse, nematode, fly, yeast, zebrafish, rice, tomato, Arabidopsis, and Escherichia coli. In addition, POMO provides custom options that allow integrated plotting of unsupported strains or closely related species associations, such as human and mouse orthologs or two yeast wild types, studied together within a single analysis. The web application also supports interactive label and weight filtering. Every iteratively filtered result in POMO can be exported as an image file and a text file for sharing or direct future input.
Conclusions
The POMO web application is a unique tool for omics data analysis, which can be used to visualize and filter the genome-wide networks in the context of chromosomal locations as well as multiple network layouts. With the several illustration and filtering options the tool supports the analysis and visualization of any heterogeneous omics data analysis association results for many organisms. POMO is freely available and does not require any installation or registration.
doi:10.1186/1471-2164-14-918
PMCID: PMC3880012  PMID: 24365393
Omics; Association; Visualization; Ortholog; Phenolog; Genome-wide; Network; Model organism
12.  The Human TUT1 Nucleotidyl Transferase as a Global Regulator of microRNA Abundance 
PLoS ONE  2013;8(7):e69630.
Post-transcriptional modifications of miRNAs with 3′ non-templated nucleotide additions (NTA) are a common phenomenon, and for a handful of miRNAs the additions have been demonstrated to modulate miRNA stability. However, it is unknown for the vast majority of miRNAs whether nucleotide additions are associated with changes in miRNA expression levels. We previously showed that miRNA 3′ additions are regulated by multiple nucleotidyl transferase enzymes. Here we examine the changes in abundance of miRNAs that exhibit altered 3′ NTA following the suppression of a panel of nucleotidyl transferases in cancer cell lines. Among the miRNAs examined, those with increased 3′ additions showed a significant decrease in abundance. More specifically, miRNAs that gained a 3′ uridine were associated with the greatest decrease in expression, consistent with a model in which 3′ uridylation influences miRNA stability. We also observed that suppression of one nucleotidyl transferase, TUT1, resulted in a global decrease in miRNA levels of approximately 40% as measured by qRT-PCR-based miRNA profiling. The mechanism of this global miRNA suppression appears to be indirect, as it occurred irrespective of changes in 3′ nucleotide addition. Also, expression of miRNA primary transcripts did not decrease following TUT1 knockdown, indicating that the mechanism is post-transcriptional. In conclusion, our results suggest that TUT1 affects miRNAs through both a direct effect on 3′ nucleotide additions to specific miRNAs and a separate, indirect effect on miRNA abundance more globally.
doi:10.1371/journal.pone.0069630
PMCID: PMC3715485  PMID: 23874977
13.  New Low Cost Cell and Tissue Acquisition System (CTAS): Microdissection of Live and Frozen Tissues 
Tissue heterogeneity is a serious limiting factor for sound cell-specific molecular studies including genomic and proteomic analyses. Although tissue microdissection technologies (e.g. laser capture microdissection, LCM) have advanced tremendously over the last decades, several factors such as their generally high cost and inability to microdissect fresh or live tissues limit their widespread use. Therefore, there is a need for a low-cost and easy-to-use microdissection device. Here, we developed a low-cost vacuum-assisted capillary-based cell and tissue acquisition system (CTAS) and demonstrated its use for microdissection of brain tissue samples for several downstream applications including isolation of high quality RNA from microdissected brain tissue samples, their use for proteomics studies and electron microscopy as well as microdissection of native living brain tissues for primary cell culturing. Unlike LCM, CTAS is capable of microdissecting fresh frozen and live tissues, works in thicker tissue sections ranging from 10 µm to 300 µm and can collect individual cells, cell clusters and subanatomical regions. CTAS has been established as a straightforward and robust microdissection tool, allowing rapid, precise and efficient procurement of specific tissue and cell types at low cost. The developed microdissection protocol avoids extensive heating, chemical treatment, laser beam exposure, and other potentially harmful physical treatment of the tissue samples, thus preserving the primary functions of the dissected cells and the macromolecules within for subsequent downstream applications.
PMCID: PMC3630683
14.  Concentric and Eccentric Time-Under-Tension during Strengthening Exercises: Validity and Reliability of Stretch-Sensor Recordings from an Elastic Exercise-Band 
PLoS ONE  2013;8(6):e68172.
Background
Total, single repetition and contraction-phase specific (concentric and eccentric) time-under-tension (TUT) are important exercise-descriptors, as they are linked to the physiological and clinical response in exercise and rehabilitation.
Objective
To investigate the validity and reliability of total, single repetition, and contraction-phase specific TUT during shoulder abduction exercises, based on data from a stretch-sensor attached to an elastic exercise band.
Methods
A concurrent validity and interrater reliability study with two raters was conducted. Twelve participants performed five sets of 10 repetitions of shoulder abduction exercises with an elastic exercise band. Exercises were video-recorded to assess concurrent validity between TUT from stretch-sensor data and from video recordings (gold standard). Agreement between methods was calculated using Limits of Agreement (LoA), and the association was assessed by Pearson correlation coefficients. Interrater reliability was calculated using intraclass correlation coefficients (ICC 2.1).
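The agreement statistics named in these methods can be sketched as follows. This is a minimal illustration, assuming the conventional Bland-Altman form of the limits of agreement (mean difference ± 1.96 standard deviations of the paired differences); the abstract does not say which variant the authors used. The TUT values are hypothetical and are not data from the study.

```python
import math
import statistics


def limits_of_agreement(method_a, method_b):
    """Return (bias, lower, upper) for paired measurements.

    bias is the mean paired difference; lower and upper are the
    95% limits of agreement, bias -/+ 1.96 * SD of the differences.
    """
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample SD of the paired differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical total TUT per set, in seconds (illustrative only).
video_tut = [30.2, 41.5, 28.9, 35.0, 39.7]
sensor_tut = [31.0, 42.1, 29.3, 36.4, 40.2]

bias, lower, upper = limits_of_agreement(sensor_tut, video_tut)
r = pearson_r(sensor_tut, video_tut)
print(f"bias={bias:.2f} s, LoA=({lower:.2f}, {upper:.2f}) s, r={r:.3f}")
```

A high correlation alone does not show agreement, which is why the two quantities are reported side by side: the limits of agreement expose a systematic offset between methods that the correlation coefficient would hide.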
Results
Total, single repetition, and contraction-phase specific TUT – determined from video and stretch-sensor data – were highly correlated (r>0.99). Agreement between methods was high, as LoA ranged from 0.0 to 3.1 seconds for total TUT (2.6% of mean TUT), from -0.26 to 0.56 seconds for single repetition TUT (6.9%), and from -0.29 to 0.56 seconds for contraction-phase specific TUT (13.2-21.1%). Interrater reliability for total, single repetition and contraction-phase specific TUT was high (ICC>0.99). Interrater agreement was high, as LoA ranged from -2.11 to 2.56 seconds for total TUT (4.7%), from -0.46 to 0.50 seconds for single repetition TUT (9.7%) and from -0.41 to 0.44 seconds for contraction-phase specific TUT (5.2-14.5%).
Conclusion
Data from a stretch-sensor attached to an elastic exercise band is a valid measure of total and single repetition time-under-tension, and the procedure is highly reliable. This method will enable clinicians and researchers to objectively quantify if home-based exercises are performed as prescribed, with respect to time-under-tension.
doi:10.1371/journal.pone.0068172
PMCID: PMC3692465  PMID: 23825696
15.  Intra-tumor Genetic Heterogeneity and Mortality in Head and Neck Cancer: Analysis of Data from The Cancer Genome Atlas 
PLoS Medicine  2015;12(2):e1001786.
Background
Although the involvement of intra-tumor genetic heterogeneity in tumor progression, treatment resistance, and metastasis is established, genetic heterogeneity is seldom examined in clinical trials or practice. Many studies of heterogeneity have had prespecified markers for tumor subpopulations, limiting their generalizability, or have involved massive efforts such as separate analysis of hundreds of individual cells, limiting their clinical use. We recently developed a general measure of intra-tumor genetic heterogeneity based on whole-exome sequencing (WES) of bulk tumor DNA, called mutant-allele tumor heterogeneity (MATH). Here, we examine data collected as part of a large, multi-institutional study to validate this measure and determine whether intra-tumor heterogeneity is itself related to mortality.
Methods and Findings
Clinical and WES data were obtained from The Cancer Genome Atlas in October 2013 for 305 patients with head and neck squamous cell carcinoma (HNSCC), from 14 institutions. Initial pathologic diagnoses were between 1992 and 2011 (median, 2008). Median time to death for 131 deceased patients was 14 mo; median follow-up of living patients was 22 mo. Tumor MATH values were calculated from WES results. Despite the multiple head and neck tumor subsites and the variety of treatments, we found in this retrospective analysis a substantial relation of high MATH values to decreased overall survival (Cox proportional hazards analysis: hazard ratio for high/low heterogeneity, 2.2; 95% CI 1.4 to 3.3). This relation of intra-tumor heterogeneity to survival was not due to intra-tumor heterogeneity’s associations with other clinical or molecular characteristics, including age, human papillomavirus status, tumor grade and TP53 mutation, and N classification. MATH improved prognostication over that provided by traditional clinical and molecular characteristics, maintained a significant relation to survival in multivariate analyses, and distinguished outcomes among patients having oral-cavity or laryngeal cancers even when standard disease staging was taken into account. Prospective studies, however, will be required before MATH can be used prognostically in clinical trials or practice. Such studies will need to examine homogeneously treated HNSCC at specific head and neck subsites, and determine the influence of cancer therapy on MATH values. Analysis of MATH and outcome in human-papillomavirus-positive oropharyngeal squamous cell carcinoma is particularly needed.
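The abstract describes MATH as simply calculated but does not give the formula. The sketch below assumes the commonly cited definition, in which MATH is 100 times the ratio of the median absolute deviation (MAD, scaled by 1.4826) to the median of the mutant-allele fractions at tumor-specific mutated loci. The function name and the example fractions are hypothetical.

```python
import statistics


def math_score(mutant_allele_fractions, scale=1.4826):
    """Assumed definition: MATH = 100 * MAD / median.

    mutant_allele_fractions: fraction of reads carrying the mutant
    allele at each tumor-specific mutated locus.
    scale: factor that makes the MAD comparable to a standard
    deviation for normally distributed data.
    """
    med = statistics.median(mutant_allele_fractions)
    abs_dev = [abs(f - med) for f in mutant_allele_fractions]
    mad = scale * statistics.median(abs_dev)
    return 100.0 * mad / med


# Hypothetical tumors: tightly clustered fractions suggest one dominant
# clone, widely spread fractions suggest several subclones.
homogeneous = [0.40, 0.42, 0.38, 0.41, 0.39]
heterogeneous = [0.10, 0.25, 0.40, 0.15, 0.45]

print(math_score(homogeneous), math_score(heterogeneous))
```

Because the score is a ratio of spread to center, it needs only the allele fractions from bulk sequencing, which fits the abstract's contrast with approaches that analyze hundreds of individual cells.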
Conclusions
To our knowledge this study is the first to combine data from hundreds of patients, treated at multiple institutions, to document a relation between intra-tumor heterogeneity and overall survival in any type of cancer. We suggest applying the simply calculated MATH metric of heterogeneity to prospective studies of HNSCC and other tumor types.
In this study, Rocco and colleagues examine data collected as part of a large, multi-institutional study, to validate a measure of tumor heterogeneity called MATH and determine whether intra-tumor heterogeneity is itself related to mortality.
Editors’ Summary
Background
Normally, the cells in human tissues and organs only reproduce (a process called cell division) when new cells are needed for growth or to repair damaged tissues. But sometimes a cell somewhere in the body acquires a genetic change (mutation) that disrupts the control of cell division and allows the cell to grow continuously. As the mutated cell grows and divides, it accumulates additional mutations that allow it to grow even faster and eventually form a lump, or tumor (cancer). Other mutations subsequently allow the tumor to spread around the body (metastasize) and destroy healthy tissues. Tumors can arise anywhere in the body (there are more than 200 different types of cancer), and about one in three people will develop some form of cancer during their lifetime. Many cancers can now be successfully treated, however, and people often survive for years after a diagnosis of cancer before, eventually, dying from another disease.
Why Was This Study Done?
The gradual acquisition of mutations by tumor cells leads to the formation of subpopulations of cells, each carrying a different set of mutations. This “intra-tumor heterogeneity” can produce tumor subclones that grow particularly quickly, that metastasize aggressively, or that are resistant to cancer treatments. Consequently, researchers have hypothesized that high intra-tumor heterogeneity leads to worse clinical outcomes and have suggested that a simple measure of this heterogeneity would be a useful addition to the cancer staging system currently used by clinicians for predicting the likely outcome (prognosis) of patients with cancer. Here, the researchers investigate whether a measure of intra-tumor heterogeneity called “mutant-allele tumor heterogeneity” (MATH) is related to mortality (death) among patients with head and neck squamous cell carcinoma (HNSCC)—cancers that begin in the cells that line the moist surfaces inside the head and neck, such as cancers of the mouth and the larynx (voice box). MATH is based on whole-exome sequencing (WES) of tumor and matched normal DNA. WES uses powerful DNA-sequencing systems to determine the variations of all the coding regions (exons) of the known genes in the human genome (genetic blueprint).
What Did the Researchers Do and Find?
The researchers obtained clinical and WES data for 305 patients who were treated in 14 institutions, primarily in the US, after diagnosis of HNSCC from The Cancer Genome Atlas, a catalog established by the US National Institutes of Health to map the key genomic changes in major types and subtypes of cancer. They calculated tumor MATH values for the patients from their WES results and retrospectively analyzed whether there was an association between the MATH values and patient survival. Despite the patients having tumors at various subsites and being given different treatments, every 10% increase in MATH value corresponded to an 8.8% increased risk (hazard) of death. Using a previously defined MATH-value cutoff to distinguish high- from low-heterogeneity tumors, compared to patients with low-heterogeneity tumors, patients with high-heterogeneity tumors were more than twice as likely to die (a hazard ratio of 2.2). Other statistical analyses indicated that MATH provided improved prognostic information compared to that provided by established clinical and molecular characteristics and human papillomavirus (HPV) status (HPV-positive HNSCC at some subsites has a better prognosis than HPV-negative HNSCC). In particular, MATH provided prognostic information beyond that provided by standard disease staging among patients with mouth or laryngeal cancers.
What Do These Findings Mean?
By using data from more than 300 patients treated at multiple institutions, these findings validate the use of MATH as a measure of intra-tumor heterogeneity in HNSCC. Moreover, they provide one of the first large-scale demonstrations that intra-tumor heterogeneity is clinically important in the prognosis of any type of cancer. Before the MATH metric can be used in clinical trials or in clinical practice as a prognostic tool, its ability to predict outcomes needs to be tested in prospective studies that examine the relation between MATH and the outcomes of patients with identically treated HNSCC at specific head and neck subsites, that evaluate the use of MATH for prognostication in other tumor types, and that determine the influence of cancer treatments on MATH values. Nevertheless, these findings suggest that MATH should be considered as a biomarker for survival in HNSCC and other tumor types, and raise the possibility that clinicians could use MATH values to decide on the best treatment for individual patients and to choose patients for inclusion in clinical trials.
Additional Information
Please access these websites via the online version of this summary at http://dx.doi.org/10.1371/journal.pmed.1001786.
The US National Cancer Institute (NCI) provides information about cancer and how it develops and about head and neck cancer (in English and Spanish)
Cancer Research UK, a not-for-profit organization, provides general information about cancer and how it develops, and detailed information about head and neck cancer; the Merseyside Regional Head and Neck Cancer Centre provides patient stories about HNSCC
Wikipedia provides information about tumor heterogeneity, and about whole-exome sequencing (note that Wikipedia is a free online encyclopedia that anyone can edit; available in several languages)
Information about The Cancer Genome Atlas is available
A PLOS Blog entry by Jessica Wapner explains more about MATH
doi:10.1371/journal.pmed.1001786
PMCID: PMC4323109  PMID: 25668320
16.  A hierarchical Naïve Bayes Model for handling sample heterogeneity in classification problems: an application to tissue microarrays 
BMC Bioinformatics  2006;7:514.
Background
Uncertainty often affects molecular biology experiments and data for different reasons. Heterogeneity of gene or protein expression within the same tumor tissue is an example of biological uncertainty which should be taken into account when molecular markers are used in decision making. Tissue Microarray (TMA) experiments allow for large scale profiling of tissue biopsies, investigating protein patterns characterizing specific disease states. TMA studies deal with multiple sampling of the same patient, and therefore with multiple measurements of same protein target, to account for possible biological heterogeneity. The aim of this paper is to provide and validate a classification model taking into consideration the uncertainty associated with measuring replicate samples.
Results
We propose an extension of the well-known Naïve Bayes classifier, which accounts for biological heterogeneity in a probabilistic framework, relying on Bayesian hierarchical models. The model, which can be efficiently learned from the training dataset, exploits a closed-form of classification equation, thus providing no additional computational cost with respect to the standard Naïve Bayes classifier. We validated the approach on several simulated datasets comparing its performances with the Naïve Bayes classifier. Moreover, we demonstrated that explicitly dealing with heterogeneity can improve classification accuracy on a TMA prostate cancer dataset.
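One way a closed-form classification rule can arise for replicate measurements is sketched below. This is an illustration under a Gaussian assumption and is not necessarily the authors' model: each sample has its own latent mean drawn from a class-level normal distribution, each replicate is drawn around that latent mean, and the latent mean is integrated out analytically. All function names, parameters and values are hypothetical, and only a single feature is shown; a naive Bayes classifier would sum such log-likelihoods over features.

```python
import math


def log_normal_pdf(x, mean, var):
    """Log density of N(mean, var) at x."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def replicate_log_likelihood(replicates, class_mean, between_var, within_var):
    """Log marginal likelihood of one sample's replicates under one class.

    Assumed model: mu ~ N(class_mean, between_var) per sample and
    x_j ~ N(mu, within_var) per replicate; mu is integrated out.
    """
    n = len(replicates)
    xbar = sum(replicates) / n
    ss = sum((x - xbar) ** 2 for x in replicates)  # within-sample scatter
    return (
        -0.5 * (n - 1) * math.log(2 * math.pi * within_var)
        - 0.5 * math.log(n)
        - ss / (2 * within_var)
        + log_normal_pdf(xbar, class_mean, between_var + within_var / n)
    )


def classify(replicates, class_params, class_priors):
    """Pick the class with the highest log posterior for one feature."""
    scores = {
        c: math.log(class_priors[c]) + replicate_log_likelihood(replicates, *p)
        for c, p in class_params.items()
    }
    return max(scores, key=scores.get)


# Hypothetical parameters: (class_mean, between_var, within_var).
params = {"tumor": (5.0, 1.0, 0.5), "normal": (2.0, 1.0, 0.5)}
priors = {"tumor": 0.5, "normal": 0.5}
print(classify([4.8, 5.3, 4.5], params, priors))
```

In this sketch the within-sample variance enters the score per class, so classes that differ in replicate spread can be separated even when their means are close, which matches the setting where the abstract reports the largest gain.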
Conclusion
The proposed Hierarchical Naïve Bayes classifier can be conveniently applied in problems where within sample heterogeneity must be taken into account, such as TMA experiments and biological contexts where several measurements (replicates) are available for the same biological sample. The performance of the new approach is better than the standard Naïve Bayes model, in particular when the within sample heterogeneity is different in the different classes.
doi:10.1186/1471-2105-7-514
PMCID: PMC1698579  PMID: 17125514
17.  Multi-Population Classical HLA Type Imputation 
PLoS Computational Biology  2013;9(2):e1002877.
Statistical imputation of classical HLA alleles in case-control studies has become established as a valuable tool for identifying and fine-mapping signals of disease association in the MHC. Imputation into diverse populations has, however, remained challenging, mainly because of the additional haplotypic heterogeneity introduced by combining reference panels of different sources. We present an HLA type imputation model, HLA*IMP:02, designed to operate on a multi-population reference panel. HLA*IMP:02 is based on a graphical representation of haplotype structure. We present a probabilistic algorithm to build such models for the HLA region, accommodating genotyping error, haplotypic heterogeneity and the need for maximum accuracy at the HLA loci, generalizing the work of Browning and Browning (2007) and Ron et al. (1998). HLA*IMP:02 achieves an average 4-digit imputation accuracy on diverse European panels of 97% (call rate 97%). On non-European samples, 2-digit performance is over 90% for most loci and ethnicities where data are available. HLA*IMP:02 supports imputation of HLA-DPB1 and HLA-DRB3-5, is highly tolerant of missing data in the imputation panel and works on standard genotype data from popular genotyping chips. It is publicly available in source code and as a user-friendly web service framework.
Author Summary
The human leukocyte antigen (HLA) proteins influence how pathogens and components of body cells are presented to immune cells. It has long been known that they are highly variable and that this variation is associated with differential risk for autoimmune and infectious diseases. Variant frequencies differ substantially between and even within continents. Determining HLA genotypes is thus an important part of many studies to understand the genetic basis of disease risk. However, conventional methods for HLA typing (e.g. targeted sequencing, hybridisation, amplification) are typically laborious and expensive. We have developed a method for inferring an individual's HLA genotype based on evaluating genetic information from nearby variable sites that are more easily assayed, which aims to integrate heterogeneous data. We introduce two key innovations: we allow for single HLA types to appear on heterogeneous backgrounds of genetic information and we take into account the possibility of genotyping error, which is common within the HLA region. We show that the method is well-suited to deal with multi-population datasets: it enables integrated HLA type inference for individuals of differing ancestry and ethnicity. It will therefore prove useful particularly in international collaborations to better understand disease risks, where samples are drawn from multiple countries.
doi:10.1371/journal.pcbi.1002877
PMCID: PMC3572961  PMID: 23459081
18.  LINE-1 methylation shows little intra-patient heterogeneity in primary and synchronous metastatic colorectal cancer 
BMC Cancer  2012;12:574.
Background
Long interspersed nucleotide element 1 (LINE-1) hypomethylation is suggested to play a role in the progression of colorectal cancer (CRC). To assess intra-patient heterogeneity of LINE-1 methylation in CRC and to understand its biological relevance in invasion and metastasis, we evaluated the LINE-1 methylation at multiple tumor sites. In addition, the influence of stromal cell content on the measurement of LINE-1 methylation in tumor tissue was analyzed.
Methods
Formalin-fixed paraffin-embedded primary tumor tissue was obtained from 48 CRC patients. Matched adjacent normal colon tissue, lymph node metastases and distant metastases were obtained from 12, 18 and 7 of these patients, respectively. Three different areas were microdissected from each primary tumor and included the tumor center and invasive front. Normal mucosal and stromal cells were also microdissected for comparison with the tumor cells. The microdissected samples were compared in LINE-1 methylation level measured by multicolor MethyLight assay. The assay results were also compared between microdissected and macrodissected tissue samples.
Results
LINE-1 methylation within primary tumors showed no significant intra-tumoral heterogeneity, with the tumor center and invasive front showing identical methylation levels. Moreover, no difference in LINE-1 methylation was observed between the primary tumor and lymph node and distant metastases from the same patient. Tumor cells showed significantly less LINE-1 methylation compared to adjacent stromal and normal mucosal epithelial cells. Consequently, LINE-1 methylation was significantly lower in microdissected samples compared to macrodissected samples. A trend for less LINE-1 methylation was also observed in more advanced stages of CRC.
Conclusions
LINE-1 methylation shows little intra-patient tumor heterogeneity, indicating the suitability of its use for molecular diagnosis in CRC. The methylation is relatively stable during CRC progression, leading us to propose a new concept for the association between LINE-1 methylation and disease stage.
doi:10.1186/1471-2407-12-574
PMCID: PMC3534591  PMID: 23216958
LINE-1; DNA methylation; Colorectal cancer; Laser microdissection
19.  Epigenetic priors for identifying active transcription factor binding sites 
Bioinformatics  2011;28(1):56-62.
Motivation: Accurate knowledge of the genome-wide binding of transcription factors in a particular cell type or under a particular condition is necessary for understanding transcriptional regulation. Using epigenetic data such as histone modification and DNase I accessibility data has been shown to improve motif-based in silico methods for predicting such binding, but this approach has not yet been fully explored.
Results: We describe a probabilistic method for combining one or more tracks of epigenetic data with a standard DNA sequence motif model to improve our ability to identify active transcription factor binding sites (TFBSs). We convert each data type into a position-specific probabilistic prior and combine these priors with a traditional probabilistic motif model to compute a log-posterior odds score. Our experiments, using histone modifications H3K4me1, H3K4me3, H3K9ac and H3K27ac, as well as DNase I sensitivity, show conclusively that the log-posterior odds score consistently outperforms a simple binary filter based on the same data. We also show that our approach performs competitively with a more complex method, CENTIPEDE, and suggest that the relative simplicity of the log-posterior odds scoring method makes it an appealing and very general method for identifying functional TFBSs on the basis of DNA and epigenetic evidence.
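The scoring idea can be illustrated with Bayes' rule: the posterior odds of a site equal the motif-versus-background likelihood ratio multiplied by the prior odds, so on a log scale the two terms add. The sketch below is a minimal illustration under that assumption and is not the FIMO implementation; the motif matrix, background frequencies and prior values are hypothetical, and natural logarithms are used for simplicity.

```python
import math

# Hypothetical 3-position motif: P(base | motif position).
MOTIF = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
]
# Assumed uniform background model.
BACKGROUND = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}


def log_likelihood_ratio(site):
    """Sequence-only score: log P(site | motif) / P(site | background)."""
    return sum(math.log(col[b] / BACKGROUND[b]) for col, b in zip(MOTIF, site))


def log_posterior_odds(site, prior):
    """Add the prior log-odds to the sequence score.

    prior: assumed probability (0 < prior < 1) that a binding site
    starts at this position, e.g. derived from an epigenetic track.
    """
    return log_likelihood_ratio(site) + math.log(prior / (1.0 - prior))


def scan(sequence, priors):
    """Score every window; priors[i] applies to the window starting at i."""
    width = len(MOTIF)
    return [
        (i, log_posterior_odds(sequence[i:i + width], priors[i]))
        for i in range(len(sequence) - width + 1)
    ]


# Two identical sequence matches; only the second lies in a region
# that the hypothetical prior marks as likely to be bound.
for start, score in scan("AGCAGC", [0.001, 0.001, 0.001, 0.2]):
    print(start, round(score, 3))
```

Unlike a binary filter, which discards every position below a threshold on the epigenetic signal, the additive prior lets a very strong motif match survive in a weakly supported region and penalizes a weak match in a strongly supported one.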
Availability and implementation: FIMO, part of the MEME Suite software toolkit, now supports log-posterior odds scoring using position-specific priors for motif search. A web server and source code are available at http://meme.nbcr.net. Utilities for creating priors are at http://research.imb.uq.edu.au/t.bailey/SD/Cuellar2011.
Contact: t.bailey@uq.edu.au
Supplementary information: Supplementary data are available at Bioinformatics online.
doi:10.1093/bioinformatics/btr614
PMCID: PMC3244768  PMID: 22072382
20.  Nuclear Receptor Expression Defines a Set of Prognostic Biomarkers for Lung Cancer 
PLoS Medicine  2010;7(12):e1000378.
David Mangelsdorf and colleagues show that nuclear receptor expression is strongly associated with clinical outcomes of lung cancer patients, and this expression profile is a potential prognostic signature for lung cancer patient survival time, particularly for individuals with early stage disease.
Background
The identification of prognostic tumor biomarkers that also would have potential as therapeutic targets, particularly in patients with early stage disease, has been a long sought-after goal in the management and treatment of lung cancer. The nuclear receptor (NR) superfamily, which is composed of 48 transcription factors that govern complex physiologic and pathophysiologic processes, could represent a unique subset of these biomarkers. In fact, many members of this family are the targets of already identified selective receptor modulators, providing a direct link between individual tumor NR quantitation and selection of therapy. The goal of this study, which begins this overall strategy, was to investigate the association between mRNA expression of the NR superfamily and the clinical outcome for patients with lung cancer, and to test whether a tumor NR gene signature provided useful information (over available clinical data) for patients with lung cancer.
Methods and Findings
Using quantitative real-time PCR to study NR expression in 30 microdissected non-small-cell lung cancers (NSCLCs) and their pair-matched normal lung epithelium, we found great variability in NR expression among patients' tumor and non-involved lung epithelium, found a strong association between NR expression and clinical outcome, and identified an NR gene signature from both normal and tumor tissues that predicted patient survival time and disease recurrence. The NR signature derived from the initial 30 NSCLC samples was validated in two independent microarray datasets derived from 442 and 117 resected lung adenocarcinomas. The NR gene signature was also validated in 130 squamous cell carcinomas. The prognostic signature in tumors could be distilled to expression of two NRs, short heterodimer partner and progesterone receptor, as single gene predictors of NSCLC patient survival time, including for patients with stage I disease. Of equal interest, the studies of microdissected histologically normal epithelium and matched tumors identified expression in normal (but not tumor) epithelium of NGFIB3 and mineralocorticoid receptor as single gene predictors of good prognosis.
Conclusions
NR expression is strongly associated with clinical outcomes for patients with lung cancer, and this expression profile provides a unique prognostic signature for lung cancer patient survival time, particularly for those with early stage disease. This study highlights the potential use of NRs as a rational set of therapeutically tractable genes as theragnostic biomarkers, and specifically identifies short heterodimer partner and progesterone receptor in tumors, and NGFIB3 and MR in non-neoplastic lung epithelium, for future detailed translational study in lung cancer.
Editors' Summary
Background
Lung cancer, the most common cause of cancer-related death, kills 1.3 million people annually. Most lung cancers are “non-small-cell lung cancers” (NSCLCs), and most are caused by smoking. Exposure to chemicals in smoke causes changes in the genes of the cells lining the lungs that allow the cells to grow uncontrollably and to move around the body. How NSCLC is treated and responds to treatment depends on its “stage.” Stage I tumors, which are small and confined to the lung, are removed surgically, although chemotherapy is also sometimes given. Stage II tumors have spread to nearby lymph nodes and are treated with surgery and chemotherapy, as are some stage III tumors. However, because cancer cells in stage III tumors can be present throughout the chest, surgery is not always possible. For such cases, and for stage IV NSCLC, where the tumor has spread around the body, patients are treated with chemotherapy alone. About 70% of patients with stage I and II NSCLC but only 2% of patients with stage IV NSCLC survive for five years after diagnosis; more than 50% of patients have stage IV NSCLC at diagnosis.
Why Was This Study Done?
Patient responses to treatment vary considerably. Oncologists (doctors who treat cancer) would like to know which patients have a good prognosis (are likely to do well) to help them individualize their treatment. Consequently, the search is on for “prognostic tumor biomarkers,” molecules made by cancer cells that can be used to predict likely clinical outcomes. Such biomarkers, which may also be potential therapeutic targets, can be identified by analyzing the overall pattern of gene expression in a panel of tumors using a technique called microarray analysis and looking for associations between the expression of sets of genes and clinical outcomes. In this study, the researchers take a more directed approach to identifying prognostic biomarkers by investigating the association between the expression of the genes encoding nuclear receptors (NRs) and clinical outcome in patients with lung cancer. The NR superfamily contains 48 transcription factors (proteins that control the expression of other genes) that respond to several hormones and to diet-derived fats. NRs control many biological processes and are targets for several successful drugs, including some used to treat cancer.
What Did the Researchers Do and Find?
The researchers analyzed the expression of NR mRNAs using “quantitative real-time PCR” in 30 microdissected NSCLCs and in matched normal lung tissue samples (mRNA is the blueprint for protein production). They then used an approach called standard classification and regression tree analysis to build a prognostic model for NSCLC based on the expression data. This model predicted both survival time and disease recurrence among the patients from whom the tumors had been taken. The researchers validated their prognostic model in two large independent lung adenocarcinoma microarray datasets and in a squamous cell carcinoma dataset (adenocarcinomas and squamous cell carcinomas are two major NSCLC subtypes). Finally, they explored the roles of specific NRs in the prediction model. This analysis revealed that the ability of the NR signature in tumors to predict outcomes was mainly due to the expression of two NRs—the short heterodimer partner (SHP) and the progesterone receptor (PR). Expression of either gene could be used as a single gene predictor of the survival time of patients, including those with stage I disease. Similarly, the expression of either nerve growth factor induced gene B3 (NGFIB3) or mineralocorticoid receptor (MR) in normal tissue was a single gene predictor of a good prognosis.
What Do These Findings Mean?
These findings indicate that the expression of NR mRNA is strongly associated with clinical outcomes in patients with NSCLC. Furthermore, they identify a prognostic NR expression signature that provides information on the survival time of patients, including those with early stage disease. The signature needs to be confirmed in more patients before it can be used clinically, and researchers would like to establish whether changes in mRNA expression are reflected in changes in protein expression if NRs are to be targeted therapeutically. Nevertheless, these findings highlight the potential use of NRs as prognostic tumor biomarkers. Furthermore, they identify SHP and PR in tumors and two NRs in normal lung tissue as molecules that might provide new targets for the treatment of lung cancer and new insights into the early diagnosis, pathogenesis, and chemoprevention of lung cancer.
Additional Information
Please access these Web sites via the online version of this summary at http://dx.doi.org/10.1371/journal.pmed.1000378.
The Nuclear Receptor Signaling Atlas (NURSA) is a consortium of scientists sponsored by the US National Institutes of Health that provides scientific reagents, datasets, and educational material on nuclear receptors and their co-regulators to the scientific community through a Web-based portal
The Cancer Prevention and Research Institute of Texas (CPRIT) provides information and resources to anyone interested in the prevention and treatment of lung and other cancers
The US National Cancer Institute provides detailed information for patients and professionals about all aspects of lung cancer, including information on non-small-cell carcinoma and on tumor markers (in English and Spanish)
Cancer Research UK also provides information about lung cancer and information on how cancer starts
MedlinePlus has links to other resources about lung cancer (in English and Spanish)
Wikipedia has a page on nuclear receptors (note that Wikipedia is a free online encyclopedia that anyone can edit; available in several languages)
doi:10.1371/journal.pmed.1000378
PMCID: PMC3001894  PMID: 21179495
21.  Optimizing Frozen Sample Preparation for Laser Microdissection: Assessment of CryoJane Tape-Transfer System® 
PLoS ONE  2013;8(6):e66854.
Laser microdissection is an invaluable tool in medical research that facilitates collecting specific cell populations for molecular analysis. Diversity of research targets (e.g., cancerous and precancerous lesions in clinical and animal research, cell pellets, rodent embryos, etc.) and varied scientific objectives, however, present challenges toward establishing standard laser microdissection protocols. Sample preparation is crucial for quality RNA, DNA and protein retrieval, where it often determines the feasibility of a laser microdissection project. The majority of microdissection studies in clinical and animal model research are conducted on frozen tissues containing native nucleic acids, unmodified by fixation. However, the variable morphological quality of frozen sections from tissues containing fat, collagen or delicate cell structures can limit or prevent successful harvest of the desired cell population via laser dissection. The CryoJane Tape-Transfer System®, a commercial device that improves cryosectioning outcomes on glass slides, has been reported superior for slide preparation and isolation of high quality osteocyte RNA (frozen bone) during laser dissection. Considering the reported advantages of CryoJane for laser dissection on glass slides, we asked whether the system could also work with the plastic membrane slides used by UV laser based microdissection instruments, as these are better suited for collection of larger target areas. In an attempt to optimize laser microdissection slide preparation for tissues of different RNA stability and cryosectioning difficulty, we evaluated the CryoJane system for use with both glass (laser capture microdissection) and membrane (laser cutting microdissection) slides.
We have established a sample preparation protocol for glass and membrane slides including manual coating of membrane slides with CryoJane solutions, cryosectioning, slide staining and dissection procedure, lysis and RNA extraction that facilitated efficient dissection and high quality RNA retrieval from CryoJane preparations. CryoJane technology therefore has the potential to facilitate standardization of laser microdissection slide preparation from frozen tissues.
doi:10.1371/journal.pone.0066854
PMCID: PMC3689705  PMID: 23805281
22.  A Mouse to Human Search for Plasma Proteome Changes Associated with Pancreatic Tumor Development 
PLoS Medicine  2008;5(6):e123.
Background
The complexity and heterogeneity of the human plasma proteome have presented significant challenges in the identification of protein changes associated with tumor development. Refined genetically engineered mouse (GEM) models of human cancer have been shown to faithfully recapitulate the molecular, biological, and clinical features of human disease. Here, we sought to exploit the merits of a well-characterized GEM model of pancreatic cancer to determine whether proteomics technologies allow identification of protein changes associated with tumor development and whether such changes are relevant to human pancreatic cancer.
Methods and Findings
Plasma was sampled from mice at early and advanced stages of tumor development and from matched controls. Using a proteomic approach based on extensive protein fractionation, we confidently identified 1,442 proteins that were distributed across seven orders of magnitude of abundance in plasma. Analysis of proteins chosen on the basis of increased levels in plasma from tumor-bearing mice and corroborating protein or RNA expression in tissue documented concordance in the blood from 30 newly diagnosed patients with pancreatic cancer relative to 30 control specimens. A panel of five proteins selected on the basis of their increased level at an early stage of tumor development in the mouse was tested in a blinded study in 26 humans from the CARET (Carotene and Retinol Efficacy Trial) cohort. The panel discriminated pancreatic cancer cases from matched controls in blood specimens obtained between 7 and 13 mo prior to the development of symptoms and clinical diagnosis of pancreatic cancer.
Conclusions
Our findings indicate that GEM models of cancer, in combination with in-depth proteomic analysis, provide a useful strategy to identify candidate markers applicable to human cancer with potential utility for early detection.
Samir Hanash and colleagues identify proteins that are increased at an early stage of pancreatic tumor development in a mouse model and may be a useful tool in detecting early tumors in humans.
Editors' Summary
Background.
Cancers are life-threatening, disorganized masses of cells that can occur anywhere in the human body. They develop when cells acquire genetic changes that allow them to grow uncontrollably and to spread around the body (metastasize). If a cancer is detected when it is still small and has not metastasized, surgery can often provide a cure. Unfortunately, many cancers are detected only when they are large enough to press against surrounding tissues and cause pain or other symptoms. By this time, surgical removal of the original (primary) tumor may be impossible and there may be secondary cancers scattered around the body. In such cases, radiotherapy and chemotherapy can sometimes help, but the outlook for patients whose cancers are detected late is often poor. One cancer type for which late detection is a particular problem is pancreatic adenocarcinoma. This cancer rarely causes any symptoms in its early stages. Furthermore, the symptoms it eventually causes—jaundice, abdominal and back pain, and weight loss—are seen in many other illnesses. Consequently, pancreatic cancer has usually spread before it is diagnosed, and most patients die within a year of their diagnosis.
Why Was This Study Done?
If a test could be developed to detect pancreatic cancer in its early stages, the lives of many patients might be extended. Tumors often release specific proteins—“cancer biomarkers”—into the blood, a bodily fluid that can be easily sampled. If a protein released into the blood by pancreatic cancer cells could be identified, it might be possible to develop a noninvasive screening test for this deadly cancer. In this study, the researchers use a “proteomic” approach to identify potential biomarkers for early pancreatic cancer. Proteomics is the study of the patterns of proteins made by an organism, tissue, or cell and of the changes in these patterns that are associated with various diseases.
What Did the Researchers Do and Find?
The researchers started their search for pancreatic cancer biomarkers by studying the plasma proteome (the proteins in the fluid portion of blood) of mice genetically engineered to develop cancers that closely resemble human pancreatic tumors. Through the use of two techniques called high-resolution mass spectrometry and acrylamide isotopic labeling, the researchers identified 165 proteins that were present in larger amounts in plasma collected from mice with early and/or advanced pancreatic cancer than in plasma from control mice. Then, to test whether any of these protein changes were relevant to human pancreatic cancer, the researchers analyzed blood samples collected from patients with pancreatic cancer. These samples, they report, contained larger amounts of some of these proteins than blood collected from patients with chronic pancreatitis, a condition that has similar symptoms to pancreatic cancer. Finally, using blood samples collected during a clinical trial, the Carotene and Retinol Efficacy Trial (a cancer-prevention study), the researchers showed that the measurement of five of the proteins present in increased amounts at an early stage of tumor development in the mouse model discriminated between people with pancreatic cancer and matched controls up to 13 months before cancer diagnosis.
What Do These Findings Mean?
These findings suggest that in-depth proteomic analysis of genetically engineered mouse models of human cancer might be an effective way to identify biomarkers suitable for the early detection of human cancers. Previous attempts to identify such biomarkers using human samples have been hampered by the many noncancer-related differences in plasma proteins that exist between individuals and by problems in obtaining samples from patients with early cancer. The use of a mouse model of human cancer, these findings indicate, can circumvent both of these problems. More specifically, these findings identify a panel of proteins that might allow earlier detection of pancreatic cancer and that might, therefore, extend the life of some patients who develop this cancer. However, before a routine screening test becomes available, additional markers will need to be identified and extensive validation studies in larger groups of patients will have to be completed.
Additional Information.
Please access these Web sites via the online version of this summary at http://dx.doi.org/10.1371/journal.pmed.0050123.
The MedlinePlus Encyclopedia has a page on pancreatic cancer (in English and Spanish). Links to further information are provided by MedlinePlus
The US National Cancer Institute has information about pancreatic cancer for patients and health professionals (in English and Spanish)
The UK charity Cancerbackup also provides information for patients about pancreatic cancer
The Clinical Proteomic Technologies for Cancer Initiative (a US National Cancer Institute initiative) provides a tutorial about proteomics and cancer and information on the Mouse Proteomic Technologies Initiative
doi:10.1371/journal.pmed.0050123
PMCID: PMC2504036  PMID: 18547137
23.  Dynamic Training Volume: A Construct of Both Time Under Tension and Volume Load 
The purpose of this study was to investigate the effects of three different weight training protocols that varied in the way training volume was measured on acute muscular fatigue. Ten resistance-trained males performed all three protocols, which involved dynamic constant resistance exercise of the elbow flexors. Protocol A provided a standard for the time the muscle group was under tension (TUT) and volume load (VL), expressed as the product of the total number of repetitions and the load that was lifted. Protocol B involved 40% of the TUT but the same VL compared to protocol A; protocol C was equated with protocol A for TUT but only involved 50% of the VL. Fatigue was assessed by changes in maximum voluntary isometric force and integrated electromyography (iEMG) between the pre- and post-training protocols. The results of the study showed that, when equated for VL, greater TUT produced greater overall muscular fatigue (p ≤ 0.001) as reflected by the reduction in the force generating capability of the muscle. When the protocols were equated for TUT, greater VL (p ≤ 0.01) resulted in greater overall muscular fatigue. All three protocols resulted in significant decreases in iEMG (p ≤ 0.05) but they were not significantly different from each other. It was concluded that, because of the importance of training volume to neuromuscular adaptation, the training volume needs to be clearly described when designing resistance training programs.
Key Points
Increase in either time under tension (TUT) or volume load (VL) increases the acute fatigue response, despite being equated for volume (by another method).
A potential discrepancy in training volume may be present with training parameters that fail to control for either TUT or VL.
Neural fatigue may be a contributing factor to the development of muscular fatigue but is not influenced by various methods of calculating volume such as TUT or VL.
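The relationship between the two volume measures described in this abstract can be illustrated with a small sketch. It uses the abstract's definition of VL (total repetitions multiplied by the load lifted) and assumes TUT is total repetitions multiplied by seconds per repetition. All sets, repetitions, loads, and tempos below are hypothetical values chosen only so that protocol B has 40% of the TUT at equal VL and protocol C has 50% of the VL at equal TUT; they are not the study's actual parameters.

```python
from dataclasses import dataclass


@dataclass
class Protocol:
    """Hypothetical description of one resistance training protocol."""
    sets: int
    reps_per_set: int
    load_kg: float
    seconds_per_rep: float

    @property
    def volume_load(self) -> float:
        # VL as defined in the abstract: total repetitions x load lifted.
        return self.sets * self.reps_per_set * self.load_kg

    @property
    def time_under_tension(self) -> float:
        # Assumed definition: total repetitions x seconds per repetition.
        return self.sets * self.reps_per_set * self.seconds_per_rep


# Hypothetical values, not taken from the study.
a = Protocol(sets=3, reps_per_set=10, load_kg=20.0, seconds_per_rep=5.0)
b = Protocol(sets=3, reps_per_set=10, load_kg=20.0, seconds_per_rep=2.0)
c = Protocol(sets=3, reps_per_set=10, load_kg=10.0, seconds_per_rep=5.0)

for name, p in (("A", a), ("B", b), ("C", c)):
    print(name, "VL =", p.volume_load, "kg; TUT =", p.time_under_tension, "s")
```

Describing a program by VL alone would treat A and B as identical, and describing it by TUT alone would treat A and C as identical, which is the discrepancy the key points warn about.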
PMCID: PMC3861774  PMID: 24357968
Resistance training; maximal voluntary contraction; fatigue; electromyography
24.  Community-Based Care for the Specialized Management of Heart Failure 
Executive Summary
In August 2008, the Medical Advisory Secretariat (MAS) presented a vignette to the Ontario Health Technology Advisory Committee (OHTAC) on a proposed targeted health care delivery model for chronic care. The proposed model was defined as multidisciplinary, ambulatory, community-based care that bridged the gap between primary and tertiary care, and was intended for individuals with a chronic disease who were at risk of a hospital admission or emergency department visit. The goals of this care model were thought to include: the prevention of emergency department visits, a reduction in hospital admissions and re-admissions, facilitation of earlier hospital discharge, a reduction or delay in long-term care admissions, and an improvement in mortality and other disease-specific patient outcomes.
OHTAC approved the development of an evidence-based assessment to determine the effectiveness of specialized community based care for the management of heart failure, Type 2 diabetes and chronic wounds.
Please visit the Medical Advisory Secretariat Web site at: www.health.gov.on.ca/ohtas to review the following reports associated with the Specialized Multidisciplinary Community-Based care series.
Specialized multidisciplinary community-based care series: a summary of evidence-based analyses
Community-based care for the specialized management of heart failure: an evidence-based analysis
Community-based care for chronic wound management: an evidence-based analysis
Please note that the evidence-based analysis of specialized community-based care for the management of diabetes titled: “Community-based care for the management of type 2 diabetes: an evidence-based analysis” has been published as part of the Diabetes Strategy Evidence Platform at this URL: http://www.health.gov.on.ca/english/providers/program/mas/tech/ohtas/tech_diabetes_20091020.html
Please visit the Toronto Health Economics and Technology Assessment Collaborative Web site at: http://theta.utoronto.ca/papers/MAS_CHF_Clinics_Report.pdf to review the following economic project associated with this series:
Community-based Care for the specialized management of heart failure: a cost-effectiveness and budget impact analysis.
Objective
The objective of this evidence-based analysis was to determine the effectiveness of specialized multidisciplinary care in the management of heart failure (HF).
Clinical Need: Target Population and Condition
HF is a progressive, chronic condition in which the heart becomes unable to sufficiently pump blood throughout the body. There are several risk factors for developing the condition including hypertension, diabetes, obesity, previous myocardial infarction, and valvular heart disease.(1) Based on data from a 2005 study of the Canadian Community Health Survey (CCHS), the prevalence of congestive heart failure in Canada is approximately 1% of the population over the age of 12.(2) This figure rises sharply after the age of 45, with prevalence reports ranging from 2.2% to 12%.(3) Extrapolating this to the Ontario population, an estimated 98,000 residents in Ontario are believed to have HF.
Disease management programs are multidisciplinary approaches to care for chronic disease that coordinate comprehensive care strategies along the disease continuum and across healthcare delivery systems.(4) Evidence for the effectiveness of disease management programs for HF has been provided by seven systematic reviews completed between 2004 and 2007 (Table 1) with consistency of effect demonstrated across four main outcome measures: all cause mortality and hospitalization, and heart-failure specific mortality and hospitalization. (4-10)
However, while disease management programs are multidisciplinary by definition, the published evidence lacks consistency and clarity as to the exact nature of each program and usual care comparators are generally ill defined. Consequently, the effectiveness of multidisciplinary care for the management of persons with HF is still uncertain. Therefore, MAS has completed a systematic review of specialized, multidisciplinary, community-based care disease management programs compared to a well-defined usual care group for persons with HF.
Evidence-Based Analysis Methods
Research Questions
What is the effectiveness of specialized, multidisciplinary, community-based care (SMCCC) compared with usual care for persons with HF?
Literature Search Strategy
A comprehensive literature search was completed of electronic databases including MEDLINE, MEDLINE In-Process and Other Non-Indexed Citations, EMBASE, Cochrane Library and Cumulative Index to Nursing & Allied Health Literature. Bibliographic references of selected studies were also searched. After a review of the title and abstracts, relevant studies were obtained and the full reports evaluated. All studies meeting explicit inclusion and exclusion criteria were retained. Where appropriate, a meta-analysis was undertaken to determine the pooled estimate of effect of specialized multidisciplinary community-based care for explicit outcomes. The quality of the body of evidence, defined as one or more relevant studies, was determined using GRADE Working Group criteria. (11)
Inclusion Criteria
Randomized controlled trial
Systematic review with meta analysis
Population includes persons with New York Heart Association (NYHA) classification I-IV HF
The intervention includes a team consisting of a nurse and a physician, one of whom is a specialist in HF management.
The control group receives care by a single practitioner (e.g. primary care physician (PCP) or cardiologist)
The intervention begins after discharge from the hospital
The study reports 1-year outcomes
Exclusion Criteria
The intervention is delivered predominately through home-visits
Studies with mixed populations where discrete data for HF is not reported
Outcomes of Interest
All cause mortality
All cause hospitalization
HF specific mortality
HF specific hospitalization
All cause duration of hospital stay
HF specific duration of hospital stay
Emergency room visits
Quality of Life
Summary of Findings
One large and seven small randomized controlled trials were obtained from the literature search.
A meta-analysis was completed for four of the seven outcomes including:
All cause mortality
HF-specific mortality
All cause hospitalization
HF-specific hospitalization.
Where the pooled analysis was associated with significant heterogeneity, subgroup analyses were completed using two primary categories:
direct and indirect model of care; and
type of control group (PCP or cardiologist).
The direct model of care was a clinic-based multidisciplinary HF program and the indirect model of care was a physician supervised, nurse-led telephonic HF program.
All studies, except one, were completed in jurisdictions outside North America. (12-19) Similarly, all but one study had a sample size of less than 250. The mean age in the studies ranged from 65 to 77 years. Six of the studies(12;14-18) included populations with a NYHA classification of II-III. In two studies, the control treatment was a cardiologist (12;15) and two studies reported the inclusion of a dietitian, physiotherapist and psychologist as members of the multidisciplinary team (12;19).
All Cause Mortality
Eight studies reported all cause mortality (number of persons) at 1 year follow-up. (12-19) When the results of all eight studies were pooled, there was a statistically significant RRR of 29% with moderate heterogeneity (I2 of 38%). The results of the subgroup analyses indicated a significant RRR of 40% in all cause mortality when SMCCC is delivered through a direct team model (clinic) and a 35% RRR when SMCCC was compared with a primary care practitioner.
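The I2 values quoted in these findings are conventionally derived from Cochran's Q statistic as I2 = max(0, (Q − df) / Q) × 100%, where df is the number of studies minus one. The sketch below shows that arithmetic only. Whether the report used exactly this formula is an assumption, and the Q value is hypothetical, picked so that eight studies yield roughly the 38% reported above.

```python
def i_squared(q: float, n_studies: int) -> float:
    """Return the I2 heterogeneity statistic (in percent) from Cochran's Q."""
    if n_studies < 2:
        raise ValueError("I2 needs at least two studies")
    df = n_studies - 1
    if q <= 0:
        return 0.0
    # Negative values are truncated to zero by convention.
    return max(0.0, (q - df) / q) * 100.0


# Hypothetical Q for eight pooled studies (not reported in the source).
hypothetical_q = 11.29
print(f"I2 = {i_squared(hypothetical_q, 8):.0f}%")
```

When Q is no larger than its degrees of freedom the statistic is reported as 0%, which is how a subgroup can show no measurable heterogeneity.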
HF-Specific Mortality
Three studies reported HF-specific mortality (number of persons) at 1 year follow-up. (15;18;19) When the results of these were pooled, there was an insignificant RRR of 42% with high statistical heterogeneity (I2 of 60%). The GRADE quality of evidence is moderate for the pooled analysis of all studies.
All Cause Hospitalization
Seven studies reported all cause hospitalization at 1-year follow-up (13-15;17-19). When pooled, their results showed a statistically insignificant 12% increase in hospitalizations in the SMCCC group with high statistical heterogeneity (I2 of 81%). A significant RRR of 12% in all cause hospitalization in favour of the SMCCC care group was achieved when SMCCC was delivered using an indirect model (telephonic), with an associated I2 of 0%. The GRADE quality of evidence was found to be low for the pooled analysis of all studies and moderate for the subgroup analysis of the indirect team care model.
HF-Specific Hospitalization
Six studies reported HF-specific hospitalization at 1-year follow-up. (13-15;17;19) When pooled, the results of these studies showed an insignificant RRR of 14% with high statistical heterogeneity (I2 of 60%); however, the quality of evidence for the pooled analysis was low.
Duration of Hospital Stay
Seven studies reported duration of hospital stay, four in terms of mean duration of stay in days (14;16;17;19) and three in terms of total hospital bed days (12;13;18). Most studies reported all cause duration of hospital stay while two also reported HF-specific duration of hospital stay. These data were not amenable to meta-analyses as standard deviations were not provided in the reports. However, in general (and in all but one study) it appears that persons receiving SMCCC had shorter hospital stays, whether measured as mean days in hospital or total hospital bed days.
Emergency Room Visits
Only one study reported emergency room visits. (14) This was presented as a composite of readmissions and ER visits, where the authors reported that 77% (59/76) of the SMCCC group and 84% (63/75) of the usual care group were either readmitted or had an ER visit within the 1 year of follow-up (P=0.029).
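As a worked illustration of how event counts translate into the proportions quoted above, the sketch below recomputes them from the reported counts (59/76 and 63/75) and derives a relative risk and relative risk reduction. Reading RRR as "relative risk reduction" (one minus the relative risk) is an assumption, since the summary does not expand the abbreviation, and the report itself does not state an RRR for this composite outcome.

```python
def proportion(events: int, total: int) -> float:
    """Fraction of participants who experienced the event."""
    if total <= 0:
        raise ValueError("total must be positive")
    return events / total


# Counts as quoted in the summary for the composite of readmission or ER visit.
smccc_risk = proportion(59, 76)
usual_care_risk = proportion(63, 75)

relative_risk = smccc_risk / usual_care_risk
# Assumed definition: RRR = 1 - relative risk.
relative_risk_reduction = 1.0 - relative_risk

print(f"SMCCC: {smccc_risk:.1%}, usual care: {usual_care_risk:.1%}")
print(f"RR = {relative_risk:.3f}, RRR = {relative_risk_reduction:.1%}")
```

The first proportion works out to about 77.6%, which the summary gives as 77%.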
Quality of Life
Quality of life was reported in five studies using the Minnesota Living with HF Questionnaire (MLHFQ) (12-15;19) and in one study using the Nottingham Health Profile Questionnaire(16). The MLHFQ results are reported in our analysis. Two studies reported the mean score at 1 year follow-up, although they did not provide the standard deviation of the mean in their report. One study reported the median and range scores at 1 year follow-up in each group. Two studies reported the change scores of the physical and emotional subscales of the MLHFQ, of which only one study reported a statistically significant change from baseline to 1 year follow-up between treatment groups in favour of the SMCCC group in the physical sub-scale. A significant change in the emotional subscale scores from baseline to 1 year follow-up in the treatment groups was not reported in either study.
Conclusion
There is moderate quality evidence that SMCCC reduces all cause mortality by 29%. There is low quality evidence that SMCCC contributes to a shorter duration of hospital stay and improves quality of life compared to usual care. The evidence supports that SMCCC is effective when compared to usual care provided by either a primary care practitioner or a cardiologist. It does not, however, suggest an optimal model of care or discern what the effective program components are. A field evaluation could address this uncertainty.
PMCID: PMC3377506  PMID: 23074521
25.  Individual crypt genetic heterogeneity and the origin of metaplastic glandular epithelium in human Barrett’s oesophagus 
Gut  2008;57(8):1041-1048.
Objectives:
Current models of clonal expansion in human Barrett’s oesophagus are based upon heterogenous, flow-purified biopsy analysis taken at multiple segment levels. Detection of identical mutation fingerprints from these biopsy samples led to the proposal that a mutated clone with a selective advantage can clonally expand to fill an entire Barrett’s segment at the expense of competing clones (selective sweep to fixation model). We aimed to assess clonality at a much higher resolution by microdissecting and genetically analysing individual crypts. The histogenesis of Barrett’s metaplasia and neo-squamous islands has never been demonstrated. We investigated the oesophageal gland squamous ducts as the source of both epithelial sub-types.
Methods:
Individual crypts across Barrett’s biopsy and oesophagectomy blocks were dissected. Tumour suppressor gene loss of heterozygosity patterns and p16 and p53 point mutations were determined on a crypt-by-crypt basis. Cases of contiguous neo-squamous islands and columnar metaplasia with oesophageal squamous ducts were identified. Tissues were isolated by laser capture microdissection and genetically analysed.
Results:
Individual crypt dissection revealed mutation patterns that were masked in whole biopsy analysis. Dissection across oesophagectomy specimens demonstrated marked clonal heterogeneity, with multiple independent clones present. We identified a p16 point mutation arising in the squamous epithelium of the oesophageal gland duct, which was also present in a contiguous metaplastic crypt, whereas neo-squamous islands arising from squamous ducts were wild-type with respect to surrounding Barrett’s dysplasia.
Conclusions:
By studying clonality at the crypt level we demonstrate that Barrett’s heterogeneity arises from multiple independent clones, in contrast to the selective sweep to fixation model of clonal expansion previously described. We suggest that the squamous gland ducts situated throughout the oesophagus are the source of a progenitor cell that may be susceptible to gene mutation resulting in conversion to Barrett’s metaplastic epithelium. Additionally, these data suggest that wild-type ducts may be the source of neo-squamous islands.
doi:10.1136/gut.2007.143339
PMCID: PMC2564832  PMID: 18305067
