Advances in cancer genomics have been propelled by the steady evolution of molecular profiling technologies. Over the past decade, high-throughput sequencing technologies have matured to the point necessary to support disease-specific shotgun sequencing. This has compelled whole-genome sequencing studies across a broad panel of malignancies. The emergence of high-throughput sequencing technologies has inspired new chemical and computational techniques enabling interrogation of cancer-specific genomic and transcriptomic variants, previously unannotated genes, and chromatin structure. Finally, recent progress in single-cell sequencing holds great promise for studies interrogating the consequences of tumor evolution in cancers presenting with genomic heterogeneity.
next-generation sequencing; cancer genomics; transcriptomics; chromosomal conformation sequencing; bioinformatics; tumor heterogeneity
The identification of fusion genes such as SYT-SSX1/SSX2, PAX3-FOXO1, TPM3/TPM4-ALK and EWS-FLI1 in human sarcomas has provided important insight into the diagnosis and targeted therapy of sarcomas. No recurrent fusion has been reported in human osteosarcoma.
Transcriptome sequencing was used to characterize the gene fusions and mutations in 11 human osteosarcomas.
Nine of 11 samples were found to harbor genetic inactivating alterations in the TP53 pathway. Two recurrent fusion genes associated with the 12q locus, LRP1-SNRNP25 and KCNMB4-CCND3, were identified and validated by RT-PCR, Sanger sequencing and fluorescence in situ hybridization, and were found to be osteosarcoma specific in a validation cohort of 240 other sarcomas. Expression of LRP1-SNRNP25 fusion gene promoted SAOS-2 osteosarcoma cell migration and invasion. Expression of KCNMB4-CCND3 fusion gene promoted SAOS-2 cell migration.
Our study represents the first whole transcriptome analysis of untreated human osteosarcoma. Our discovery of two osteosarcoma specific fusion genes associated with osteosarcoma cellular motility highlights the heterogeneity of osteosarcoma and provides opportunities for new treatment modalities.
Electronic supplementary material
The online version of this article (doi:10.1186/s13045-014-0076-2) contains supplementary material, which is available to authorized users.
Osteosarcoma; Transcriptome sequencing; Fusion gene; LRP1-SNRNP25; KCNMB4-CCND3
The distinct cell types of multicellular organisms arise due to constraints imposed by gene regulatory networks on the collective change of gene expression across the genome, creating self-stabilizing expression states, or attractors. We compiled a resource of curated human expression data comprising 166 cell types and 2,602 transcription regulating genes and developed a data driven method built around the concept of expression reversal defined at the level of gene pairs, such as those participating in toggle switch circuits. This approach allows us to organize the cell types into their ontogenetic lineage-relationships and to reflect regulatory relationships among genes that explain their ability to function as determinants of cell fate. We show that this method identifies genes belonging to regulatory circuits that control neuronal fate, pluripotency and blood cell differentiation, thus offering a novel large-scale perspective on lineage specification.
The associations of ERG overexpression with clinical behavior and molecular pathways of prostate cancer are incompletely known. We assessed the association of ERG expression with AR, PTEN, SPINK1, Ki-67, and EZH2 expression levels, deletion, and mutations of chromosomal region 3p14 and TP53, and clinicopathologic variables.
The material consisted of 326 prostatectomies, 166 needle biopsies from men treated primarily with endocrine therapy, 177 transurethral resections of castration-resistant prostate cancers (CRPC), and 114 CRPC metastases obtained from 32 men. Immunohistochemistry, FISH, and sequencing were used for the measurements.
ERG expression was found in about 45% of cases across all patient cohorts. In a multivariate analysis, ERG expression was an independent predictor of favorable prognosis (P = 0.019). ERG positivity was significantly associated with loss of PTEN expression in prostatectomy specimens (P = 0.0348) and in locally recurrent CRPCs (P = 0.0042). Loss of PTEN expression was associated (P = 0.0085) with shorter progression-free survival in ERG-positive, but not in ERG-negative, cases. When metastases in each subject were compared, consistent ERG, PTEN, and AR expression as well as TP53 mutations were found in a majority of subjects.
A similar frequency of ERG positivity from early to late stage of the disease suggests lack of selection of ERG expression during disease progression. The prognostic significance of PTEN loss solely in ERG-positive cases indicates interaction of these pathways. The finding of consistent genetic alterations in different metastases suggests that the major genetic alterations take place in the primary tumor.
Interaction of PTEN and ERG pathways warrants further studies.
Microorganisms often form multicellular structures such as biofilms and structured colonies that can influence the organism’s virulence, drug resistance, and adherence to medical devices. Phenotypic classification of these structures has traditionally relied on qualitative scoring systems that limit detailed phenotypic comparisons between strains. Automated imaging and quantitative analysis have the potential to improve the speed and accuracy of experiments designed to study the genetic and molecular networks underlying different morphological traits. For this reason, we have developed a platform that uses automated image analysis and pattern recognition to quantify phenotypic signatures of yeast colonies. Our strategy enables quantitative analysis of individual colonies, measured at a single time point or over a series of time-lapse images, as well as the classification of distinct colony shapes based on image-derived features. Phenotypic changes in colony morphology can be expressed as changes in feature space trajectories over time, thereby enabling the visualization and quantitative analysis of morphological development. To facilitate data exploration, results are plotted dynamically through an interactive Yeast Image Analysis web application (YIMAA; http://yimaa.cs.tut.fi) that integrates the raw and processed images across all time points, allowing exploration of the image-based features and principal components associated with morphological development.
colony morphology; image analysis; software; yeast; phenotype; time-lapse
Integrated genomic analyses revealed a miRNA-regulatory network, which further defined a robust integrated mesenchymal subtype associated with poor overall survival in 459 cases of serous ovarian cancer (OvCa) from The Cancer Genome Atlas and 560 cases from independent cohorts. Eight key miRNAs, including miR-506, miR-141 and miR-200a, were predicted to regulate 89% of the targets in this network. Follow-up functional experiments illustrate that miR-506 augmented E-cadherin expression, inhibited cell migration and invasion, and prevented TGFβ-induced epithelial-mesenchymal transition (EMT) by targeting SNAI2, a transcriptional repressor of E-cadherin. In human OvCa, miR-506 expression was correlated with decreased SNAI2 and VIM, elevated E-cadherin, and beneficial prognosis. Nanoparticle delivery of miR-506 in orthotopic OvCa mouse models led to E-cadherin induction and reduced tumor growth.
Altered expression of oncogenic and tumor-suppressing microRNAs (miRNAs) is widely associated with tumorigenesis. However, the regulatory mechanisms underlying these alterations are poorly understood. We sought to shed light on the deregulation of miRNA biogenesis promoting the aberrant miRNA expression profiles identified in these tumors. Using sequencing technology to perform both whole-transcriptome and small RNA sequencing of glioma patient samples, we examined precursor and mature miRNAs to directly evaluate the miRNA maturation process, and interrogated expression profiles for genes involved in the major steps of miRNA biogenesis. We found that ratios of mature to precursor forms of a large number of miRNAs increased with the progression from normal brain to low-grade and then to high-grade gliomas. The expression levels of genes involved in each of the three major steps of miRNA biogenesis (nuclear processing, nucleo-cytoplasmic transport, and cytoplasmic processing) were systematically altered in glioma tissues. Survival analysis of an independent data set demonstrated that the alteration of genes involved in miRNA maturation correlates with survival in glioma patients. Direct quantification of miRNA maturation with deep sequencing demonstrated that deregulation of the miRNA biogenesis pathway is a hallmark for glioma genesis and progression.
microRNA; biogenesis; glioma
Systems biology experiments studying different topics and organisms produce thousands of data values across different types of genomic data. Further, data mining analyses are yielding ranked and heterogeneous results and association networks distributed over the entire genome. The visualization of these results is often difficult and standalone web tools allowing for custom inputs and dynamic filtering are limited.
We have developed POMO (http://pomo.cs.tut.fi), an interactive web-based application to visually explore omics data analysis results and associations in circular, network and grid views. The circular graph represents chromosome lengths as perimeter segments in a reference outer ring, such as the cytoband ring for human. The inner arcs between nodes represent the uploaded network. Further, multiple annotation rings, for example a depiction of gene copy number changes, can be uploaded as text files and represented as bar, histogram or heatmap rings. POMO has built-in references for human, mouse, nematode, fly, yeast, zebrafish, rice, tomato, Arabidopsis, and Escherichia coli. In addition, POMO provides custom options that allow integrated plotting of unsupported strains or closely related species associations, such as human and mouse orthologs or two yeast wild types, studied together within a single analysis. The web application also supports interactive label and weight filtering. Every iteratively filtered result in POMO can be exported as an image file and as a text file for sharing or direct future input.
The POMO web application is a unique tool for omics data analysis, which can be used to visualize and filter the genome-wide networks in the context of chromosomal locations as well as multiple network layouts. With the several illustration and filtering options the tool supports the analysis and visualization of any heterogeneous omics data analysis association results for many organisms. POMO is freely available and does not require any installation or registration.
Omics; Association; Visualization; Ortholog; Phenolog; Genome-wide; Network; Model organism
We describe a supervised prediction method for diagnosis of acute myeloid leukemia (AML) from patient samples based on flow cytometry measurements. We use a data driven approach with machine learning methods to train a computational model that takes in flow cytometry measurements from a single patient and gives a confidence score of the patient being AML-positive. Our solution is based on a regularized logistic regression model that aggregates AML test statistics calculated from individual test tubes with different cell populations and fluorescent markers. The model construction is entirely data driven and no prior biological knowledge is used. The described solution scored a 100% classification accuracy in the DREAM6/FlowCAP2 Molecular Classification of Acute Myeloid Leukaemia Challenge against a gold standard consisting of 20 AML-positive and 160 healthy patients. Here we perform a more extensive validation of the prediction model performance and further improve and simplify our original method, showing that statistically equal results can be obtained by using simple average marker intensities as features in the logistic regression model. In addition to the logistic regression based model, we also present other classification models and compare their performance quantitatively. The key benefit of our prediction method compared to other solutions with similar performance is that our model uses only a small fraction of the flow cytometry measurements, making our solution highly economical.
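The simplified variant described above (a regularized logistic regression on average marker intensities) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the L2 penalty, the gradient-descent fit, the feature standardization, and the names `fit_logreg` and `aml_score` are hypothetical and are not taken from the published method.

```python
import numpy as np

def fit_logreg(X, y, l2=0.1, lr=0.5, n_iter=500):
    """Fit an L2-regularized logistic regression by plain gradient descent.

    X: (patients x features) array, e.g. one average intensity per marker.
    y: 0/1 labels (1 = AML-positive). All hyperparameters are illustrative.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Standardize features so a single learning rate works for all of them.
    mu = X.mean(axis=0)
    sd = X.std(axis=0) + 1e-12
    Z = (X - mu) / sd
    w = np.zeros(Z.shape[1])
    b = 0.0
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(Z @ w + b)))
        # Gradient of the mean log-loss plus the L2 penalty on the weights.
        w -= lr * (Z.T @ (p - y) / len(y) + l2 * w)
        b -= lr * np.mean(p - y)
    return {"mu": mu, "sd": sd, "w": w, "b": b}

def aml_score(model, X):
    """Return a confidence score in (0, 1) for each row of X."""
    Z = (np.asarray(X, dtype=float) - model["mu"]) / model["sd"]
    return 1.0 / (1.0 + np.exp(-(Z @ model["w"] + model["b"])))
```

The returned score plays the role of the confidence that a patient is AML-positive; thresholding it at 0.5 gives a class label.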
Cancer is a broad group of genetic diseases that account for millions of deaths worldwide each year. Cancers are classified by various clinical, pathological and molecular methods, but even within a well-characterized disease, there is significant inter-patient variability in survival, response to treatment, and other parameters. Especially at the molecular level, tumours of the same category can appear significantly dissimilar due to complex combinations of genetic aberrations leading to a similar malignancy. We extended the current classification methods by studying tumour heterogeneity at the pathway level.
We computed the rate of alterations in 1994 pathways across 2210 tumours representing eight different cancers. Using gene set enrichment analysis, a pathway aberration profile reflecting its molecular state was computed for each sample. The profiles were analysed together to infer the characteristic aberration rates for each pathway within each cancer. Subgroups of tumours defined by similar pathway aberrations were identified using clustering analyses. The pathway aberration and gene expression profiles of the subgroups were subsequently compared across all eight cancer types to search for similar tumours crossing the standard classification.
We identified pathways and processes that were common to all cancers as well as traits that are unique to a cancer type or closely related cancers. Studying the gene expression patterns within the pathway context suggested potential alteration mechanisms. Clustering analysis revealed five clinically relevant subgroups of tumours in four cancers that exhibited significant differences in survival compared to others. The cross-cancer analysis of the subgroups resulted in the identification of tumours that shared potentially significant alterations.
This study represents the first effort to extend the molecular characterizations towards pathway level descriptions across the family of cancers. In addition to providing a proof-of-concept for single sample pathway aberration analysis in this context, we present a comprehensive pathway aberration dataset that can be used to study pathway aberration patterns within or across cancers. Significant similarities between subgroups of different cancers on pathway and gene expression levels provide interesting hypotheses for understanding variable drug response, or transferring treatments across diseases by identifying common druggable pathways or genes, for example.
Aging and gender have a strong influence on the functional capacity of the immune system. In general, the immune response in females is stronger than that in males, but there is scant information about the effect of aging on the gender difference in the immune response. To address this question, we performed a transcriptomic analysis of peripheral blood mononuclear cells derived from elderly individuals (nonagenarians, n = 146) and young controls (aged 19–30 years, n = 30). When compared to young controls, we found 339 and 248 genes that were differentially expressed (p<0.05, fold change >1.5 or <−1.5) in nonagenarian females and males, respectively; 180 of these genes were changed in both genders. An analysis of the affected signaling pathways revealed a clear gender bias: there were 48 pathways that were significantly changed in females, while only 29 were changed in males. There were 24 pathways that were shared between both genders. Our results indicate that female nonagenarians have weaker T cell defenses and a more prominent pro-inflammatory response as compared to males. In males significantly fewer pathways were affected, two of which are known to be regulated by estrogen. These data show that the effects of aging on the human immune system are significantly different in males and females.
To facilitate analysis and understanding of biological systems, large-scale data are often integrated into models using a variety of mathematical and computational approaches. Such models describe the dynamics of the biological system and can be used to study the changes in the state of the system over time. For many model classes, such as discrete or continuous dynamical systems, there exist appropriate frameworks and tools for analyzing system dynamics. However, the heterogeneous information that encodes and bridges molecular and cellular dynamics, inherent to fine-grained molecular simulation models, presents significant challenges to the study of system dynamics. In this paper, we present an algorithmic information theory based approach for the analysis and interpretation of the dynamics of such executable models of biological systems. We apply a normalized compression distance (NCD) analysis to the state representations of a model that simulates the immune decision making and immune cell behavior. We show that this analysis successfully captures the essential information in the dynamics of the system, which results from a variety of events including proliferation, differentiation, or perturbations such as gene knock-outs. We demonstrate that this approach can be used for the analysis of executable models, regardless of the modeling framework, and for making experimentally quantifiable predictions.
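The normalized compression distance mentioned above has a standard definition, NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C is the compressed length. A minimal sketch follows, assuming zlib as the compressor and byte strings as state representations; the abstract does not say which compressor or encoding was used, so both are illustrative choices.

```python
import random
import zlib

def compressed_size(data: bytes) -> int:
    """Length of the zlib-compressed representation (compressor is an assumption)."""
    return len(zlib.compress(data, 9))

def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance between two byte strings."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)

# Hypothetical serialized model states: two regular, identical states and one
# state that looks random to the compressor.
state_a = b"0101" * 200
state_b = b"0101" * 200
rng = random.Random(0)
state_c = bytes(rng.randrange(256) for _ in range(800))

print(ncd(state_a, state_b), ncd(state_a, state_c))
```

Similar states compress well together and give a distance near 0, while unrelated states give a distance near 1.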
Boolean networks have been used as a discrete model for several biological systems, including metabolic and genetic regulatory networks. Due to their simplicity they offer a firm foundation for generic studies of physical systems. In this work we show, using a measure of context-dependent information, set complexity, that prior to reaching an attractor, random Boolean networks pass through a transient state characterized by high complexity. We corroborate this finding using another measure of complexity, namely the statistical complexity. We show that the networks can be tuned to the regime of maximal complexity by adding a suitable amount of noise to the deterministic Boolean dynamics. In fact, we show that for networks with Poisson degree distributions, all networks ranging from subcritical to slightly supercritical can be tuned with noise to reach maximal set complexity in their dynamics. For networks with a fixed number of inputs this is true for near-to-critical networks. This increase in complexity is obtained at the expense of disruption in information flow. For a large ensemble of networks showing maximal complexity, there exists a balance between noise and contracting dynamics in the state space. In networks that are close to critical the intrinsic noise required for the tuning is smaller and thus also has the smallest effect in terms of the information processing in the system. Our results suggest that the maximization of complexity near to the state transition might be a more general phenomenon in physical systems, and that noise present in a system may in fact be useful in retaining the system in a state with high information content.
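A noisy random Boolean network of the kind discussed above can be sketched as follows. This is a minimal illustration under stated assumptions: every node has a fixed number of inputs `k`, updates are synchronous, and noise is modeled as an independent bit flip with probability `noise` after each update. The function names `make_rbn` and `step` are hypothetical, and the complexity measures themselves are not implemented here.

```python
import numpy as np

def make_rbn(n, k, rng):
    """Draw a random Boolean network: k distinct inputs and a random truth table per node."""
    inputs = np.array([rng.choice(n, size=k, replace=False) for _ in range(n)])
    tables = rng.integers(0, 2, size=(n, 2 ** k))
    return inputs, tables

def step(state, inputs, tables, noise, rng):
    """One synchronous update followed by independent bit flips with probability `noise`."""
    powers = 2 ** np.arange(inputs.shape[1])
    # Encode each node's input values as a row index into its truth table.
    idx = (state[inputs] * powers).sum(axis=1)
    new_state = tables[np.arange(len(state)), idx]
    flips = rng.random(len(state)) < noise
    return np.where(flips, 1 - new_state, new_state)

rng = np.random.default_rng(0)
inputs, tables = make_rbn(n=50, k=2, rng=rng)
state = rng.integers(0, 2, size=50)
trajectory = [state]
for _ in range(20):
    trajectory.append(step(trajectory[-1], inputs, tables, noise=0.01, rng=rng))
```

The list `trajectory` holds the successive network states, which is the kind of state sequence a complexity measure would be applied to.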
Fusion genes are chromosomal aberrations that are found in many cancers and can be used as prognostic markers and drug targets in clinical practice. Fusions can lead to production of oncogenic fusion proteins or to enhanced expression of oncogenes. Several recent studies have reported that some fusion genes can escape microRNA regulation via 3′–untranslated region (3′-UTR) deletion. We performed whole transcriptome sequencing to identify fusion genes in glioma and discovered FGFR3-TACC3 fusions in 4 of 48 glioblastoma samples from patients both of mixed European and of Asian descent, but not in any of 43 low-grade glioma samples tested. The fusion, caused by tandem duplication on 4p16.3, led to the loss of the 3′-UTR of FGFR3, blocking gene regulation by miR-99a and enhancing expression of the fusion gene. The fusion gene was mutually exclusive with EGFR, PDGFR, or MET amplification. Using cultured glioblastoma cells and a mouse xenograft model, we found that fusion protein expression promoted cell proliferation and tumor progression, while WT FGFR3 protein was not tumorigenic, even under forced overexpression. These results demonstrated that the FGFR3-TACC3 gene fusion is expressed in human cancer and generates an oncogenic protein that promotes tumorigenesis in glioblastoma.
Malignant peripheral nerve sheath tumor (MPNST) is a rare sarcoma that lacks effective therapeutic strategies. We sought insight into the most recurrently altered genetic pathways in order to identify possible therapeutic targets.
We performed microarray-based comparative genomic hybridization (aCGH) profiling of two cohorts of primary MPNST tissue samples including 25 patients treated at The University of Texas MD Anderson Cancer Center and 26 patients from Tianjin Cancer Hospital. IHC and cell biology detection and validation were performed on human MPNST tissues and cell lines.
Genomic characterization of 51 MPNST tissue samples identified several frequently amplified regions harboring 2,599 genes and regions of deletion including 4,901 genes. At the pathway level, we identified a significant enrichment of copy number–altering events in the insulin-like growth factor 1 receptor (IGF1R) pathway, including frequent amplifications of the IGF1R gene itself. To validate the IGF1R pathway as a potential target in MPNSTs, we first confirmed that high IGF1R protein correlated with worse tumor-free survival in an independent set of samples using immunohistochemistry. Two MPNST cell lines (ST88-14 and STS26T) were used to determine the effect of attenuating IGF1R. Inhibition of IGF1R in ST88-14 cells using small interfering RNAs or an IGF1R inhibitor, MK-0646, led to significant decreases in cell proliferation, invasion, and migration accompanied by attenuation of the PI3K/AKT and MAPK pathways.
These integrated genomic and molecular studies provide evidence that the IGF1R pathway is a potential therapeutic target for patients with MPNST.
malignant peripheral nerve sheath tumor; insulin-like growth factor 1 receptor; genomic characterization; targeted therapy; microarray-based comparative genomic hybridization; gene amplification; MK-0646; epidermal growth factor receptor; Gefitinib
Gastrointestinal stromal tumors (GISTs) were historically grouped with leiomyosarcomas (LMSs) based on their morphological similarities, but recently they have been unequivocally established as a distinct type of sarcoma based on the molecular features and response to imatinib treatment. To gain further insight into the genomic differences between GISTs and LMSs, we mapped gene copy number aberrations (CNAs) in 42 GISTs and 30 LMSs and integrated them with gene expression profiles. Our studies revealed distinct patterns of CNAs between GISTs and LMSs. Losses in chromosomes 1p, 14q, 15q, and 22q were significantly more frequent in GISTs than in LMSs (P < 0.001), whereas losses in chromosomes 10 and 16 as well as gains in 1q, 14q, and 15q (P < 0.001) were more common in LMSs. By integrating CNAs with gene expression data and clinical information, we found several clinically relevant CNAs that were prognostic of survival in patients with GIST. Furthermore, GISTs were categorized into four groups according to an accumulating pattern of genetic alterations. Many key cellular pathways were differently expressed in the four groups and the patients had increasingly worse prognosis as the extent of genomic alterations increased. These findings lead us to propose a new tumor-progression genetic staging system termed Genomic Instability Stage (GIS) to complement the current prognostic predictive system based on tumor size, mitotic index (MI), and KIT mutation.
imatinib; CNA; GIST; leiomyosarcoma; array CGH; survival; KIT
Protein binding microarrays (PBM) are a high throughput technology used to characterize protein-DNA binding. The arrays measure a protein's affinity toward thousands of double-stranded DNA sequences at once, producing a comprehensive binding specificity catalog. We present a linear model for predicting the binding affinity of a protein toward DNA sequences based on PBM data. Our model represents the measured intensity of an individual probe as a sum of the binding affinity contributions of the probe's subsequences. These subsequences characterize a DNA binding motif and can be used to predict the intensity of protein binding against arbitrary DNA sequences. Our method was the best performer in the Dialogue for Reverse Engineering Assessments and Methods 5 (DREAM5) transcription factor/DNA motif recognition challenge. For the DREAM5 bonus challenge, we also developed an approach for the identification of transcription factors based on their PBM binding profiles. Our approach for TF identification achieved the best performance in the bonus challenge.
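The idea of modeling a probe intensity as a sum of contributions from the probe's subsequences can be sketched as an ordinary least-squares problem over k-mer counts. This is an assumption-laden illustration rather than the DREAM5 entry itself: the subsequence length `k=3`, the unregularized fit, and the names `kmer_counts`, `fit_kmer_model`, and `predict` are all hypothetical.

```python
from itertools import product

import numpy as np

def kmer_counts(seq, k, index):
    """Count how often each k-mer occurs in seq (one feature per k-mer)."""
    row = np.zeros(len(index))
    for i in range(len(seq) - k + 1):
        row[index[seq[i:i + k]]] += 1
    return row

def fit_kmer_model(probes, intensities, k=3):
    """Least-squares fit of one affinity contribution per k-mer."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    index = {kmer: j for j, kmer in enumerate(kmers)}
    X = np.array([kmer_counts(s, k, index) for s in probes])
    y = np.asarray(intensities, dtype=float)
    weights, *_ = np.linalg.lstsq(X, y, rcond=None)
    return index, weights

def predict(seq, index, weights, k=3):
    """Predicted intensity: the sum of the contributions of the k-mers in seq."""
    return float(kmer_counts(seq, k, index) @ weights)
```

Once fitted, `predict` can be applied to arbitrary DNA sequences over the alphabet A, C, G, T, which mirrors the use of the learned contributions on sequences that were not on the array.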
In this editorial we introduce the research paradigms of signal processing in the era of systems biology. Signal processing is a field of science traditionally focused on modeling electronic and communications systems, but recently it has turned to biological applications with astounding results. The essence of signal processing is to describe the natural world by mathematical models and then, based on these models, develop efficient computational tools for solving engineering problems. Here, we underline, with examples, the endless possibilities which arise when the battle-hardened tools of engineering are applied to solve the problems that have tormented cancer researchers. Based on this approach, a new field has emerged, called cancer systems biology. Despite its short history, cancer systems biology has already produced several success stories tackling previously impracticable problems. Perhaps most importantly, it has been accepted as an integral part of the major endeavors of cancer research, such as analyzing the genomic and epigenomic data produced by The Cancer Genome Atlas (TCGA) project. Finally, we show that signal processing and cancer research, two fields that are seemingly distant from each other, have merged into a field that is indeed more than the sum of its parts.
Systems biology; signal processing; gene regulation; methylation; glioblastoma
Colorectal cancer (CRC) remains one of the major cancer types and causes of cancer-related death worldwide. Sensitive, non-invasive biomarkers that can facilitate disease detection, staging and prediction of therapeutic outcome are highly desirable to improve survival rates and help determine optimized treatment for CRC. The small non-coding RNAs, microRNAs (miRNAs), have recently been identified as critical regulators for various diseases including cancer and may represent a novel class of cancer biomarkers. The purpose of this study was to identify and validate circulating microRNAs in human plasma for use as such biomarkers in colon cancer.
By using quantitative reverse transcription-polymerase chain reaction, we found that circulating miR-141 was significantly associated with stage IV colon cancer in a cohort of 102 plasma samples. Receiver operating characteristic (ROC) analysis was used to evaluate the sensitivity and specificity of candidate plasma microRNA markers. We observed that combination of miR-141 and carcinoembryonic antigen (CEA), a widely used marker for CRC, further improved the accuracy of detection. These findings were validated in an independent cohort of 156 plasma samples collected at Tianjin, China. Furthermore, our analysis showed that high levels of plasma miR-141 predicted poor survival in both cohorts and that miR-141 was an independent prognostic factor for advanced colon cancer.
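The ROC analysis mentioned above rests on two simple quantities: sensitivity and specificity at a given marker threshold, and the area under the ROC curve (AUC). The sketch below shows one way to compute them in plain Python. The function names and the example values are hypothetical and are not data from the study; how miR-141 and CEA were combined is not specified in the abstract and is not modeled here.

```python
def sensitivity_specificity(scores, labels, threshold):
    """Sensitivity and specificity when calling score >= threshold positive."""
    tp = sum(1 for s, l in zip(scores, labels) if l == 1 and s >= threshold)
    fn = sum(1 for s, l in zip(scores, labels) if l == 1 and s < threshold)
    tn = sum(1 for s, l in zip(scores, labels) if l == 0 and s < threshold)
    fp = sum(1 for s, l in zip(scores, labels) if l == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)

def roc_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical marker levels (arbitrary units) and disease labels (1 = case).
example_scores = [0.1, 0.4, 0.35, 0.8]
example_labels = [0, 0, 1, 1]
print(roc_auc(example_scores, example_labels))
print(sensitivity_specificity(example_scores, example_labels, threshold=0.5))
```

Sweeping the threshold over all observed marker values and plotting sensitivity against 1 - specificity traces the ROC curve itself.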
We propose that plasma miR-141 may represent a novel biomarker that complements CEA in detecting colon cancer with distant metastasis and that high levels of miR-141 in plasma were associated with poor prognosis.
Neuronal networks exhibit a wide diversity of structures, which contributes to the diversity of the dynamics therein. The presented work applies an information theoretic framework to simultaneously analyze structure and dynamics in neuronal networks. Information diversity within the structure and dynamics of a neuronal network is studied using the normalized compression distance. To describe the structure, a scheme for generating distance-dependent networks with identical in-degree distribution but variable strength of dependence on distance is presented. The resulting network structure classes possess differing path length and clustering coefficient distributions. In parallel, comparable realistic neuronal networks are generated with the NETMORPH simulator and a similar analysis is performed on them. To describe the dynamics, network spike trains are simulated using different network structures and their bursting behaviors are analyzed. For the simulation of the network activity the Izhikevich model of spiking neurons is used together with the Tsodyks model of dynamical synapses. We show that the structure of the simulated neuronal networks affects the spontaneous bursting activity when measured with bursting frequency and a set of intraburst measures: the more locally connected networks produce more and longer bursts than the more random networks. The information diversity of the structure of a network is greatest in the most locally connected networks, smallest in random networks, and somewhere in between in the networks between order and disorder. As for the dynamics, the most locally connected networks and some of the in-between networks produce the most complex intraburst spike trains. The same result also holds for the sparser of the two considered network densities in the case of full spike trains.
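For orientation, the Izhikevich neuron model named above can be sketched for a single cell. The sketch uses the standard published equations with the commonly quoted regular-spiking parameters (a = 0.02, b = 0.2, c = -65, d = 8), forward Euler integration with a 0.5 ms step, and a constant input current. These choices are assumptions for illustration; the network coupling and the Tsodyks synapse model used in the study are not included, and `simulate_izhikevich` is a hypothetical name.

```python
def simulate_izhikevich(current, t_max_ms=1000.0, dt=0.5,
                        a=0.02, b=0.2, c=-65.0, d=8.0):
    """Simulate one Izhikevich neuron with constant input; return spike times in ms."""
    v = c          # membrane potential (mV)
    u = b * v      # recovery variable
    spike_times = []
    for i in range(int(t_max_ms / dt)):
        if v >= 30.0:
            # Spike: record the time, then reset v and bump the recovery variable.
            spike_times.append(i * dt)
            v = c
            u += d
        # Forward Euler step of dv/dt = 0.04 v^2 + 5 v + 140 - u + I
        # and du/dt = a (b v - u).
        v += dt * (0.04 * v * v + 5.0 * v + 140.0 - u + current)
        u += dt * a * (b * v - u)
    return spike_times

spikes = simulate_izhikevich(current=10.0)
print(len(spikes), "spikes in 1 s of simulated time")
```

Spike trains produced this way, once serialized, are the kind of data to which a compression-based distance could be applied.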
information diversity; neuronal network; structure-dynamics relationship; complexity
Identification of genetic signatures is the main objective for many computational oncology studies. The signature usually consists of numerous genes that are differentially expressed between two clinically distinct groups of samples, such as tumor subtypes. Prospectively, many signatures have been found to generalize poorly to other datasets and, thus, have rarely been accepted into clinical use. Recognizing the limited success of traditionally generated signatures, we developed a systems biology-based framework for robust identification of key transcription factors and their genomic regulatory neighborhoods. Application of the framework to study the differences between gastrointestinal stromal tumor (GIST) and leiomyosarcoma (LMS) resulted in the identification of nine transcription factors (SRF, NKX2-5, CCDC6, LEF1, VDR, ZNF250, TRIM63, MAF, and MYC). Functional annotations of the obtained neighborhoods identified the biological processes which the key transcription factors regulate differently between the tumor types. Analyzing the differences in the expression patterns using our approach resulted in a more robust genetic signature and more biological insight into the diseases compared to a traditional genetic signature.
Systems biology; transcription factor; gene regulation; binding motif; sarcoma
The innate immune system is a two-edged sword; it is absolutely required for host defense against infection but, uncontrolled, can trigger a plethora of inflammatory diseases. Here we used systems biology approaches to predict and validate a gene regulatory network involving a dynamic interplay between the transcription factors NF-κB, C/EBPδ, and ATF3 that controls inflammatory responses. We mathematically modeled transcriptional regulation of Il6 and Cebpd genes and experimentally validated the prediction that the combination of an initiator (NF-κB), an amplifier (C/EBPδ) and an attenuator (ATF3) forms a regulatory circuit that discriminates between transient and persistent Toll-like receptor 4-induced signals. Our results suggest a mechanism that enables the innate immune system to detect the duration of infection and to respond appropriately.
We present a computational framework for predicting targets of transcription factor regulation. The framework is based on the integration of a number of sources of evidence, derived from DNA sequence and gene expression data, using a weighted sum approach. Sources of evidence are prioritized based on a training set, and their relative contributions are then optimized. The performance of the proposed framework is demonstrated in the context of BCL6 target prediction. We show that this framework is able to uncover BCL6 targets reliably when biological prior information is utilized effectively, particularly in the case of sequence analysis. The framework results in a considerable gain in performance over scores in which sequence information was not incorporated. This analysis shows that with assessment of the quality and biological relevance of the data, reliable predictions can be obtained with this computational framework.
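The weighted sum integration described above can be sketched in a few lines. The evidence source names, gene labels, scores, and weights below are hypothetical placeholders, and the weights are simply given rather than optimized on a training set as in the framework itself.

```python
def combined_scores(evidence, weights):
    """Weighted average of per-source evidence scores for each candidate gene."""
    total = sum(weights.values())
    return {
        gene: sum(weights[src] * scores[src] for src in weights) / total
        for gene, scores in evidence.items()
    }

def rank_targets(evidence, weights):
    """Candidate genes ordered from strongest to weakest combined evidence."""
    scores = combined_scores(evidence, weights)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical evidence scores in [0, 1] from two sources.
evidence = {
    "GENE_A": {"motif": 0.9, "expression": 0.8},
    "GENE_B": {"motif": 0.2, "expression": 0.9},
}
weights = {"motif": 2.0, "expression": 1.0}  # sequence evidence weighted higher
print(rank_targets(evidence, weights))
```

Raising the weight of one source relative to the others is how a source judged more reliable on the training set would be given more influence on the final ranking.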
network inference; transcription factor binding site prediction; data integration
Two computational methods for estimating the cell cycle phase distribution of a budding yeast (Saccharomyces cerevisiae) cell population are presented. The first one is a nonparametric method that is based on the analysis of DNA content in the individual cells of the population. The DNA content is measured with a fluorescence-activated cell sorter (FACS). The second method is based on budding index analysis. An automated image analysis method is presented for the task of detecting the cells and buds. The proposed methods can be used to obtain quantitative information on the cell cycle phase distribution of a budding yeast S. cerevisiae population. They therefore provide a solid basis for obtaining the complementary information needed in deconvolution of gene expression data. As a case study, both methods are tested with data that were obtained in a time series experiment with S. cerevisiae. The details of the time series experiment as well as the image and FACS data obtained in the experiment can be found in the online additional material at http://www.cs.tut.fi/sgn/csb/yeastdistrib/.