PMCC
Results 1-18 (18)
1.  Logic models to predict continuous outputs based on binary inputs with an application to personalized cancer therapy 
Scientific Reports  2016;6:36812.
Mining large datasets using machine learning approaches often leads to models that are hard to interpret and not amenable to the generation of hypotheses that can be experimentally tested. We present ‘Logic Optimization for Binary Input to Continuous Output’ (LOBICO), a computational approach that infers small and easily interpretable logic models of binary input features that explain a continuous output variable. Applying LOBICO to a large cancer cell line panel, we find that logic combinations of multiple mutations are more predictive of drug response than single gene predictors. Importantly, we show that the use of the continuous information leads to robust and more accurate logic models. LOBICO can also uncover logic models at predefined operating points in terms of sensitivity and specificity. As such, it represents an important step towards practical application of interpretable logic models.
doi:10.1038/srep36812
PMCID: PMC5120272  PMID: 27876821
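To make the idea of a small logic model over binary features concrete, the sketch below exhaustively scores single features and two-feature AND/OR formulas against a binarised drug response. It is a brute-force stand-in, not LOBICO's own optimization procedure. The weighting (each sample weighted by its distance |y − θ| from a response threshold θ, with y < θ treated as sensitive) is an assumption chosen to show one way continuous output information can enter the fit; the function names are hypothetical.

```python
from itertools import combinations


def weighted_error(pred, labels, weights):
    """Total weight of the samples whose predicted label is wrong."""
    return sum(w for p, l, w in zip(pred, labels, weights) if p != l)


def best_logic_model(X, y, theta):
    """Return (formula, weighted error) of the best formula with at most two features.

    X: list of samples, each a list of 0/1 features (e.g. mutation calls).
    y: continuous outputs (e.g. IC50 values); y < theta is labelled 1 (sensitive).
    Samples far from theta get larger weights, so misclassifying a clearly
    sensitive or clearly resistant sample costs more than a borderline one.
    """
    labels = [1 if v < theta else 0 for v in y]
    weights = [abs(v - theta) for v in y]
    n_feat = len(X[0])

    candidates = [(f"x{i}", [row[i] for row in X]) for i in range(n_feat)]
    for i, j in combinations(range(n_feat), 2):
        candidates.append((f"x{i} AND x{j}", [row[i] & row[j] for row in X]))
        candidates.append((f"x{i} OR x{j}", [row[i] | row[j] for row in X]))

    scored = [(weighted_error(pred, labels, weights), name) for name, pred in candidates]
    err, name = min(scored)
    return name, err


X = [[1, 1, 0], [1, 0, 1], [0, 1, 1], [0, 0, 0], [1, 1, 1]]
y = [0.1, 0.9, 0.8, 1.0, 0.2]
print(best_logic_model(X, y, theta=0.5))  # ('x0 AND x1', 0)
```

A fuller search would also include negated features and larger formulas, which is where exhaustive enumeration stops scaling and a dedicated optimizer becomes necessary.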
2.  A Landscape of Pharmacogenomic Interactions in Cancer 
Cell  2016;166(3):740-754.
Summary
Systematic studies of cancer genomes have provided unprecedented insights into the molecular nature of cancer. Using this information to guide the development and application of therapies in the clinic is challenging. Here, we report how cancer-driven alterations identified in 11,289 tumors from 29 tissues (integrating somatic mutations, copy number alterations, DNA methylation, and gene expression) can be mapped onto 1,001 molecularly annotated human cancer cell lines and correlated with sensitivity to 265 drugs. We find that cell lines faithfully recapitulate oncogenic alterations identified in tumors, find that many of these associate with drug sensitivity/resistance, and highlight the importance of tissue lineage in mediating drug response. Logic-based modeling uncovers combinations of alterations that sensitize to drugs, while machine learning demonstrates the relative importance of different data types in predicting drug response. Our analysis and datasets are rich resources to link genotypes with cellular phenotypes and to identify therapeutic options for selected cancer sub-populations.
Highlights
• We integrate heterogeneous molecular data of 11,289 tumors and 1,001 cell lines
• We measure the response of 1,001 cancer cell lines to 265 anti-cancer drugs
• We uncover numerous oncogenic aberrations that sensitize to an anti-cancer drug
• Our study forms a resource to identify therapeutic options for cancer sub-populations
A look at the pharmacogenomic landscape of 1,001 human cancer cell lines points to new treatment applications for hundreds of known anti-cancer drugs.
doi:10.1016/j.cell.2016.06.017
PMCID: PMC4967469  PMID: 27397505
3.  Using Incomplete Trios to Boost Confidence in Family Based Association Studies 
Most currently available family based association tests are designed only for nuclear families in which both parents and offspring are fully genotyped. As whole genome sequencing becomes increasingly inexpensive, genetic studies can collect data for more families and from large family cohorts with the goal of improving statistical power. However, families with missing genotypes are often excluded from family based association tests, negating the benefits of large scale sequencing data. Here, we present CIFBAT, a method that uses incomplete families in the Family Based Association Test (FBAT) to evaluate robustness against missing data. CIFBAT computes quantile intervals of the FBAT statistic by randomly choosing valid completions of incomplete family genotypes based on Mendelian inheritance rules. By considering all valid completions equally likely and computing quantile intervals over many randomized iterations, CIFBAT avoids assuming a homogeneous population structure or any particular missingness pattern in the data. Using simulated data, we show that the quantile intervals computed by CIFBAT are useful for validating the robustness of the FBAT statistic against missing data and for identifying genomic markers with higher precision. We also propose a novel set of candidate genomic markers for uterine-related abnormalities from analysis of familial whole genome sequences, and provide validation for a previously established set of candidate markers for Type 1 diabetes. We provide a software package that incorporates TDT, robustTDT, FBAT, and CIFBAT. The data format proposed for the software uses half the memory of standard FBAT format (PED) files, making it efficient for large scale genome wide association studies.
doi:10.3389/fgene.2016.00034
PMCID: PMC4796035  PMID: 27047537
family based association tests; missing genotypes; randomized imputation; quantile intervals; population stratification; whole genome analysis; memory efficient data format
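The randomized-completion idea can be sketched for trios with a missing parent: enumerate the parental genotypes that are Mendelian-consistent with the observed child, pick one uniformly at random, recompute a statistic, and repeat to obtain a quantile interval. The statistic here is a simplified, unnormalised FBAT-style score (sum over affected offspring of observed minus expected copies of the tested allele), not the exact FBAT statistic. The trio encoding (genotypes as 0/1/2 allele copies, `None` for a missing parent), the quantile levels and the iteration count are assumptions for illustration.

```python
import random


def transmissible(g):
    """Copies of the tested allele (0 or 1) a parent with genotype g (0, 1 or 2 copies) can pass on."""
    return {0: (0,), 1: (0, 1), 2: (1,)}[g]


def valid_completions(p1, p2, child):
    """All parental genotype pairs consistent with Mendelian inheritance.

    A parent given as None is missing and may take any genotype 0, 1 or 2.
    """
    opts1 = (0, 1, 2) if p1 is None else (p1,)
    opts2 = (0, 1, 2) if p2 is None else (p2,)
    return [(a, b) for a in opts1 for b in opts2
            if any(x + y == child for x in transmissible(a) for y in transmissible(b))]


def fbat_like_score(trios):
    """Unnormalised score: observed minus expected allele count in affected offspring."""
    return sum(c - (a + b) / 2 for a, b, c in trios)


def quantile_interval(trios, n_iter=1000, q_lo=0.05, q_hi=0.95, seed=0):
    """Quantile interval of the score over random, equally likely valid completions."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_iter):
        completed = []
        for p1, p2, child in trios:
            options = valid_completions(p1, p2, child)
            if not options:
                raise ValueError(f"Mendelian inconsistency in trio {(p1, p2, child)}")
            a, b = rng.choice(options)
            completed.append((a, b, child))
        scores.append(fbat_like_score(completed))
    scores.sort()
    return scores[int(q_lo * (n_iter - 1))], scores[int(q_hi * (n_iter - 1))]


trios = [(1, 1, 2), (None, 0, 1)]  # (parent1, parent2, affected child); None = missing
print(quantile_interval(trios, n_iter=200))
```

A narrow interval suggests the statistic is robust to the missing genotypes; a wide one flags that the result hinges on how they are filled in.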
4.  CloudForest: A Scalable and Efficient Random Forest Implementation for Biological Data 
PLoS ONE  2015;10(12):e0144820.
Random Forest has become a standard data analysis tool in computational biology. However, extensions to existing implementations are often necessary to handle the complexity of biological datasets and their associated research questions. The growing size of these datasets requires high performance implementations. We describe CloudForest, a Random Forest package written in Go, which is particularly well suited for large, heterogeneous, genetic and biomedical datasets. CloudForest includes several extensions, such as dealing with unbalanced classes and missing values. Its flexible design enables users to easily implement additional extensions. CloudForest achieves fast running times by effective use of the CPU cache, optimizing for different classes of features and efficiently multi-threading. https://github.com/ilyalab/CloudForest.
doi:10.1371/journal.pone.0144820
PMCID: PMC4692062  PMID: 26679347
5.  A multilevel pan-cancer map links gene mutations to cancer hallmarks 
Background
A central challenge in cancer research is to create models that bridge the gap between the molecular level on which interventions can be designed and the cellular and tissue levels on which the disease phenotypes are manifested. This study was undertaken to construct such a model from functional annotations and explore its use when integrated with large-scale cancer genomics data.
Methods
We created a map that connects genes to cancer hallmarks via signaling pathways. We projected gene mutation and focal copy number data from various cancer types onto this map. We performed statistical analyses to uncover mutually exclusive and co-occurring oncogenic aberrations within this topology.
Results
Our analysis showed that although the genetic fingerprint of tumor types could be very different, there was less variation at the level of hallmarks, consistent with the idea that different genetic alterations have similar functional outcomes. Additionally, we showed how the multilevel map could help to clarify the role of infrequently mutated genes, and we demonstrated that mutually exclusive gene mutations were more prevalent in pathways, whereas many co-occurring gene mutations were associated with hallmark characteristics.
Conclusions
Overlaying this map with gene mutation and focal copy number data from various cancer types makes it possible to investigate the similarities and differences between tumor samples systematically at the levels of not only genes but also pathways and hallmarks.
Electronic supplementary material
The online version of this article (doi:10.1186/s40880-015-0050-6) contains supplementary material, which is available to authorized users.
doi:10.1186/s40880-015-0050-6
PMCID: PMC4593384  PMID: 26369414
Cancer systems biology; Cancer hallmarks; Gene mutations; Multilevel model
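The abstract does not say which statistical test was used to call mutual exclusivity and co-occurrence. A common choice for a single gene pair is a one-sided Fisher's exact (hypergeometric) test, sketched below in pure Python. It is a generic illustration only: it ignores the pathway and hallmark levels of the map, tumour-type heterogeneity and multiple-testing correction, and the function name is hypothetical.

```python
from math import comb


def cooccurrence_pvalues(a, b):
    """One-sided Fisher's exact tests for two 0/1 mutation vectors over the same tumours.

    Under the null, the number of tumours mutated in both genes follows a
    hypergeometric distribution given each gene's mutation count.
    Returns (p_exclusive, p_cooccurring): P(overlap <= observed), P(overlap >= observed).
    """
    n = len(a)
    k_a, k_b = sum(a), sum(b)
    observed = sum(1 for x, y in zip(a, b) if x and y)
    total = comb(n, k_b)
    pmf = {k: comb(k_a, k) * comb(n - k_a, k_b - k) / total
           for k in range(max(0, k_a + k_b - n), min(k_a, k_b) + 1)}
    p_exclusive = sum(p for k, p in pmf.items() if k <= observed)
    p_cooccurring = sum(p for k, p in pmf.items() if k >= observed)
    return p_exclusive, p_cooccurring


gene_a = [1, 1, 1, 0, 0, 0]
gene_b = [0, 0, 0, 1, 1, 1]
print(cooccurrence_pvalues(gene_a, gene_b))  # first value is 1/20 = 0.05
```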
6.  SMARCE1 suppresses EGFR expression and controls responses to MET and ALK inhibitors in lung cancer 
Cell Research  2015;25(4):445-458.
Recurrent inactivating mutations in components of SWI/SNF chromatin-remodeling complexes have been identified across cancer types, supporting their roles as tumor suppressors in modulating oncogenic signaling pathways. We report here that SMARCE1 loss induces EGFR expression and confers resistance to MET and ALK inhibitors in non-small cell lung cancers (NSCLCs). We found that SMARCE1 binds to regulatory regions of the EGFR locus and suppresses EGFR transcription in part through regulating expression of Polycomb Repressive Complex component CBX2. Addition of the EGFR inhibitor gefitinib restores the sensitivity of SMARCE1-knockdown cells to MET and ALK inhibitors in NSCLCs. Our findings link SMARCE1 to EGFR oncogenic signaling and suggest targeted treatment options for SMARCE1-deficient tumors.
doi:10.1038/cr.2015.16
PMCID: PMC4387553  PMID: 25656847
EGFR signaling; drug resistance; RNAi screening; SMARCE1; SWI/SNF
7.  Multiscale Representation of Genomic Signals 
Nature Methods  2014;11(6):689-694.
Genomic information is encoded on a wide range of distance scales, ranging from tens of base pairs to megabases. We developed a multiscale framework to analyze and visualize the information content of genomic signals. Different types of signals, such as GC content or DNA methylation, are characterized by distinct patterns of signal enrichment or depletion across scales spanning several orders of magnitude. These patterns are associated with a variety of genomic annotations, including genes, nuclear lamina associated domains, and repeat elements. By integrating the information across all scales, rather than using any single scale, we demonstrate improved prediction of gene expression from Polymerase II chromatin immunoprecipitation sequencing (ChIP-seq) measurements, and we observe that gene expression differences in colorectal cancer are not most strongly related to gene body methylation, but rather to methylation patterns that extend beyond the single-gene scale.
doi:10.1038/nmeth.2924
PMCID: PMC4040162  PMID: 24727652
8.  MED12 Controls the Response to Multiple Cancer Drugs through Regulation of TGF-β Receptor Signaling 
Cell  2012;151(5):937-950.
SUMMARY
Inhibitors of the ALK and EGF receptor tyrosine kinases provoke dramatic but short-lived responses in lung cancers harboring EML4-ALK translocations or activating mutations of EGFR, respectively. We used a large-scale RNAi screen to identify MED12, a component of the transcriptional MEDIATOR complex that is mutated in cancers, as a determinant of response to ALK and EGFR inhibitors. MED12 is partly cytoplasmic, where it negatively regulates TGF-βR2 through physical interaction. MED12 suppression therefore results in activation of TGF-βR signaling, which is both necessary and sufficient for drug resistance. TGF-β signaling causes MEK/ERK activation, and consequently MED12 suppression also confers resistance to MEK and BRAF inhibitors in other cancers. MED12 loss induces an EMT-like phenotype, which is associated with chemotherapy resistance in colon cancer patients and with resistance to gefitinib in lung cancer. Inhibition of TGF-βR signaling restores drug responsiveness in MED12KD cells, suggesting a strategy to treat drug-resistant tumors that have lost MED12.
doi:10.1016/j.cell.2012.10.035
PMCID: PMC3672971  PMID: 23178117
9.  Fastbreak: a tool for analysis and visualization of structural variations in genomic data 
Genomic studies are now being undertaken on thousands of samples, requiring new computational tools that can rapidly analyze data to identify clinically important features. Inferring structural variations in cancer genomes from mate-paired reads is a combinatorially difficult problem. We introduce Fastbreak, a fast and scalable toolkit that enables the analysis and visualization of large amounts of data from projects such as The Cancer Genome Atlas.
doi:10.1186/1687-4153-2012-15
PMCID: PMC3605143  PMID: 23046488
Cancer genomics; Structural variation; Translocation
10.  EPEPT: A web service for enhanced P-value estimation in permutation tests 
BMC Bioinformatics  2011;12:411.
Background
In computational biology, permutation tests have become a widely used tool to assess the statistical significance of an event under investigation. However, the common way of computing the P-value, which expresses the statistical significance, requires a very large number of permutations when small (and thus interesting) P-values are to be accurately estimated. This is computationally expensive and often infeasible. Recently, we proposed an alternative estimator, which requires far fewer permutations compared to the standard empirical approach while still reliably estimating small P-values [1].
Results
The proposed P-value estimator has been enriched with additional functionalities and is made available to the general community through a public website and web service, called EPEPT. This means that the EPEPT routines can be accessed not only via a website, but also programmatically using any programming language that can interact with the web. Examples of web service clients in multiple programming languages can be downloaded. Additionally, EPEPT accepts data of various common experiment types used in computational biology. For these experiment types EPEPT first computes the permutation values and then performs the P-value estimation. Finally, the source code of EPEPT can be downloaded.
Conclusions
Different types of users, such as biologists, bioinformaticians and software engineers, can use the method in an appropriate and simple way.
Availability
http://informatics.systemsbiology.net/EPEPT/
doi:10.1186/1471-2105-12-411
PMCID: PMC3277916  PMID: 22024252
11.  A regression model approach to enable cell morphology correction in high-throughput flow cytometry 
Large variations in cell size and shape can undermine traditional gating methods for analyzing flow cytometry data. Correcting for these effects enables analysis of high-throughput data sets, including >5000 yeast samples with diverse cell morphologies.
The regression model approach corrects for the effects of cell morphology on fluorescence as effectively as an extremely small and restrictive gate, but without removing any of the cells.
In contrast to traditional gating, this approach enables the quantitative analysis of high-throughput flow cytometry experiments, since the regression model can compare biological samples that show little or no overlap in cell morphology.
The analysis of a high-throughput yeast flow cytometry data set consisting of >5000 biological samples identified key proteins that affect the timing and intensity of the bifurcation event that follows the carbon source transition from glucose to fatty acids, in which some yeast cells undergo major structural changes while others do not.
Flow cytometry is a widely used technique that enables the measurement of different optical properties of individual cells within large populations of cells in a fast and automated manner. For example, by targeting cell-specific markers with fluorescent probes, flow cytometry is used to identify (and isolate) cell types within complex mixtures of cells. In addition, fluorescence reporters can be used in conjunction with flow cytometry to measure protein, RNA or DNA concentration within single cells of a population.
One of the biggest advantages of this technique is that it provides information on how each cell behaves instead of just measuring the population average. This can be essential when analyzing complex samples that consist of diverse cell types or when measuring cellular responses to stimuli. For example, there is an important difference between a 50% expression increase of all cells in a population after stimulation and a 100% increase in only half of the cells, while the other half remains unresponsive. Another important advantage of flow cytometry is automation, which enables high-throughput studies with thousands of samples and conditions. However, current methods are confounded by populations of cells that are non-uniform in terms of size and granularity. Such variability affects the fluorescence emitted by each cell and adds undesired variability when estimating population fluorescence. This effect also hampers meaningful comparison between conditions, where not only fluorescence but also cell size and granularity may be affected.
Traditionally, this problem has been addressed by using ‘gates’ that restrict the analysis to cells with similar morphological properties (i.e. cell size and cell granularity). Because cells inside the gate are morphologically similar to one another, they will show a smaller variability in their response within the population. Moreover, applying the same gate in all samples ensures that observed differences between these samples are not due to differential cell morphologies.
Gating, however, comes with costs. First, since only a subgroup of cells is selected, the final number of cells analyzed can be significantly reduced. This means that in order to have sufficient statistical power, more cells have to be acquired, which, if even possible in the first place, increases the time and cost of the experiment. Second, finding a good gate for all samples and conditions can be challenging if not impossible, especially in cases where cellular morphology changes dramatically between conditions. Finally, gating is a very user-dependent process, where both the size and shape of the gate are determined by the researcher and will affect the outcome, introducing subjectivity in the analysis that complicates reproducibility.
In this paper, we present an alternative to gating that addresses the issues stated above. The method is based on a regression model containing linear and non-linear terms that estimates and corrects for the effect of cell size and granularity on the observed fluorescence of each cell in a sample. The corrected fluorescence thus becomes ‘free’ of the morphological effects.
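A minimal sketch of such a correction, assuming a quadratic surface in forward scatter (FSC, a proxy for size) and side scatter (SSC, a proxy for granularity) plus an FSC×SSC interaction term; the paper's actual choice of linear and non-linear terms may differ, and in practice log-transformed intensities would likely be used. Each cell's fluorescence is shifted by the difference between the model prediction at a chosen reference morphology and the prediction at its own morphology, so no cell is discarded. The function names and the reference point are hypothetical.

```python
def design_row(fsc, ssc):
    """Linear, quadratic and interaction terms in FSC (size) and SSC (granularity)."""
    return [1.0, fsc, ssc, fsc * fsc, ssc * ssc, fsc * ssc]


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def fit_morphology_model(fsc, ssc, fl):
    """Ordinary least squares via the normal equations (X^T X) beta = X^T y."""
    X = [design_row(f, s) for f, s in zip(fsc, ssc)]
    p = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    Xty = [sum(row[i] * y for row, y in zip(X, fl)) for i in range(p)]
    return solve(XtX, Xty)


def predict(coef, fsc, ssc):
    return sum(c * t for c, t in zip(coef, design_row(fsc, ssc)))


def correct_fluorescence(fsc, ssc, fl, coef, ref_fsc, ref_ssc):
    """Move every cell to the reference morphology instead of gating cells away."""
    ref = predict(coef, ref_fsc, ref_ssc)
    return [y - predict(coef, f, s) + ref for f, s, y in zip(fsc, ssc, fl)]


# Toy data: fluorescence depends only on morphology, so after correction
# every value should equal the reference prediction 2 + 0.5*3 + 0.1*3*3 = 4.4.
fsc = [f for f in range(1, 6) for _ in range(5)]
ssc = [s for _ in range(5) for s in range(1, 6)]
fl = [2 + 0.5 * f + 0.1 * f * s for f, s in zip(fsc, ssc)]
coef = fit_morphology_model(fsc, ssc, fl)
corrected = correct_fluorescence(fsc, ssc, fl, coef, ref_fsc=3, ref_ssc=3)
```

Because the correction only needs the model's prediction at the reference point, that point can lie where a given sample has few or no cells, which is what allows samples with little morphological overlap to be compared.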
Because the model uses all cells in the sample, it ensures that the corrected fluorescence is an accurate representation of the sample. In addition, the regression model can predict the expected fluorescence of a sample in areas where there are no cells. This makes it possible to compare samples that have little overlap with good confidence. Furthermore, because the regression model is automated, it is fully reproducible between labs and conditions. Finally, it allows for a rapid analysis of big data sets containing thousands of samples.
To probe the validity of the model, we performed several experiments. We show how the regression model is able to remove the morphology-associated variability as effectively as an extremely small and restrictive gate, but without the caveat of removing cells. We test the method in different organisms (yeast and human) and applications (protein level detection, separation of mixed subpopulations). We then apply this method to gain new biological insights into the mechanistic processes involved in transcriptional noise.
Gene transcription is a process subject to the randomness intrinsic to any molecular event. Although such randomness may seem undesirable for the cell, since it prevents consistent behavior, there are situations where some degree of randomness is beneficial (e.g. bet hedging). For this reason, each gene is tuned to exhibit different levels of randomness or noise depending on its function. For core and essential genes, the cell has developed mechanisms to lower the level of noise, while for genes involved in the response to stress, the variability is greater.
This gene transcription tuning can be determined at many levels, from the architecture of the transcriptional network to epigenetic regulation. In our study, we analyze the latter using the response of yeast to the presence of fatty acid in the environment. Yeast can use fatty acid as an energy source, but doing so requires major structural changes and commitments. We have observed that at the population level, there is a bifurcation event whereby some cells undergo these changes and others do not. We have analyzed this bifurcation event in mutants for all the non-essential epigenetic regulators in yeast and identified key proteins that affect the timing and intensity of this bifurcation. Even though fatty acid triggers major morphological changes in the cell, the regression model still makes it possible to analyze the more than 5000 flow cytometry samples in this data set in an automated manner, which would be impossible with a traditional gating approach.
Cells exposed to stimuli exhibit a wide range of responses ensuring phenotypic variability across the population. Such single cell behavior is often examined by flow cytometry; however, gating procedures typically employed to select a small subpopulation of cells with similar morphological characteristics make it difficult, even impossible, to quantitatively compare cells across a large variety of experimental conditions because these conditions can lead to profound morphological variations. To overcome these limitations, we developed a regression approach to correct for variability in fluorescence intensity due to differences in cell size and granularity without discarding any of the cells, which gating ipso facto does. This approach enables quantitative studies of cellular heterogeneity and transcriptional noise in high-throughput experiments involving thousands of samples. We used this approach to analyze a library of yeast knockout strains and reveal genes required for the population to establish a bimodal response to oleic acid induction. We identify a group of epigenetic regulators and nucleoporins that, by maintaining an ‘unresponsive population’, may provide the population with the advantage of diversified bet hedging.
doi:10.1038/msb.2011.64
PMCID: PMC3202802  PMID: 21952134
flow cytometry; high-throughput experiments; statistical regression model; transcriptional noise
12.  Genome-Wide Analysis of Effectors of Peroxisome Biogenesis 
PLoS ONE  2010;5(8):e11953.
Peroxisomes are intracellular organelles that house a number of diverse metabolic processes, notably those required for β-oxidation of fatty acids. Peroxisome biogenesis can be induced by the presence of peroxisome proliferators, including fatty acids, which activate complex cellular programs that underlie the induction process. Here, we used multi-parameter quantitative phenotype analyses of an arrayed mutant collection of yeast cells induced to proliferate peroxisomes to establish a comprehensive inventory of genes required for peroxisome induction and function. The assays employed include growth in the presence of fatty acids, and confocal imaging and flow cytometry through the induction process. In addition to the classical phenotypes associated with loss of peroxisomal functions, these studies identified 169 genes required for robust signaling, transcription, normal peroxisomal development and morphologies, and transmission of peroxisomes to daughter cells. These gene products are localized throughout the cell, and many have indirect connections to peroxisome function. By integration with extant data sets, we present a total of 211 genes linked to peroxisome biogenesis and highlight the complex networks through which information flows during peroxisome biogenesis and function.
doi:10.1371/journal.pone.0011953
PMCID: PMC2915925  PMID: 20694151
13.  Genome-wide histone acetylation data improve prediction of mammalian transcription factor binding sites 
Bioinformatics  2010;26(17):2071-2075.
Motivation: Histone acetylation (HAc) is associated with open chromatin, and HAc has been shown to facilitate transcription factor (TF) binding in mammalian cells. In the innate immune system context, epigenetic studies strongly implicate HAc in the transcriptional response of activated macrophages. We hypothesized that using data from large-scale sequencing of a HAc chromatin immunoprecipitation assay (ChIP-Seq) would improve the performance of computational prediction of binding locations of TFs mediating the response to a signaling event, namely, macrophage activation.
Results: We tested this hypothesis using a multi-evidence approach for predicting binding sites. As a training/test dataset, we used ChIP-Seq-derived TF binding site locations for five TFs in activated murine macrophages. Our model combined TF binding site motif scanning with evidence from sequence-based sources and from HAc ChIP-Seq data, using a weighted sum of thresholded scores. We find that using HAc data significantly improves the performance of motif-based TF binding site prediction. Furthermore, we find that within regions of high HAc, local minima of the HAc ChIP-Seq signal are particularly strongly correlated with TF binding locations. Our model, using motif scanning and HAc local minima, improves the sensitivity for TF binding site prediction by ∼50% over a model based on motif scanning alone, at a false positive rate cutoff of 0.01.
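A toy version of a weighted sum of thresholded scores, with a local-minimum feature computed from a binned HAc ChIP-Seq signal, may help picture how the evidence is combined. The evidence names, thresholds, weights, cutoff and the definition of "high HAc" (both neighbouring bins at or above a level) are hypothetical, and the paper's other sequence-based evidence sources are omitted.

```python
def local_minima(signal):
    """Indices of bins strictly lower than both neighbouring bins."""
    return [i for i in range(1, len(signal) - 1)
            if signal[i] < signal[i - 1] and signal[i] < signal[i + 1]]


def combined_score(evidence, thresholds, weights):
    """Weighted sum of thresholded evidence: a source adds its weight only if its score passes."""
    return sum(w for name, w in weights.items() if evidence[name] >= thresholds[name])


def predict_sites(motif_scores, hac, hac_high, thresholds, weights, cutoff):
    """Call candidate positions (bin indices) whose combined score reaches the cutoff."""
    minima = set(local_minima(hac))
    calls = []
    for pos, score in sorted(motif_scores.items()):
        # A dip in HAc flanked by high acetylation on both sides.
        in_high_dip = pos in minima and min(hac[pos - 1], hac[pos + 1]) >= hac_high
        evidence = {"motif": score, "hac_min": 1.0 if in_high_dip else 0.0}
        if combined_score(evidence, thresholds, weights) >= cutoff:
            calls.append(pos)
    return calls


hac = [5, 9, 4, 8, 2, 1, 3]
motif_scores = {0: 0.95, 2: 0.85, 5: 0.85}
thresholds = {"motif": 0.8, "hac_min": 0.5}
weights = {"motif": 1.0, "hac_min": 0.7}
print(predict_sites(motif_scores, hac, 6, thresholds, weights, cutoff=1.5))  # [2]
```

With the cutoff above the motif weight alone, a strong motif match is only called when it also sits in an HAc dip inside an acetylated region, which mirrors the reported gain from adding HAc evidence to motif scanning.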
Availability: The data and software source code for model training and validation are freely available online at http://magnet.systemsbiology.net/hac.
Contact: aderem@systemsbiology.org; ishmulevich@systemsbiology.org
Supplementary information: Supplementary data are available at Bioinformatics online.
doi:10.1093/bioinformatics/btq405
PMCID: PMC2922897  PMID: 20663846
14.  Fewer permutations, more accurate P-values 
Bioinformatics  2009;25(12):i161-i168.
Motivation: Permutation tests have become a standard tool to assess the statistical significance of an event under investigation. The statistical significance, as expressed in a P-value, is calculated as the fraction of permutation values that are at least as extreme as the original statistic, which was derived from non-permuted data. This empirical method directly couples both the minimal obtainable P-value and the resolution of the P-value to the number of permutations. As a result, a very large number of permutations is needed when small P-values are to be accurately estimated. This is computationally expensive and often infeasible.
Results: A method of computing P-values based on tail approximation is presented. The tail of the distribution of permutation values is approximated by a generalized Pareto distribution. A good fit and thus accurate P-value estimates can be obtained with a drastically reduced number of permutations when compared with the standard empirical way of computing P-values.
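The two estimators can be contrasted in a short sketch. The empirical P-value counts permutation values at least as extreme as the observed statistic t0 (here, larger is more extreme), so its smallest non-zero value is 1/N. The tail approximation takes the N_exc largest permutation values, sets a threshold u just below them, fits a generalized Pareto distribution (GPD) to the excesses over u, and estimates P ≈ (N_exc/N)·(1 − F_GPD(t0 − u)). For brevity this sketch fits the GPD by the method of moments, a simplification that may differ from the paper's fitting procedure; the default of 250 exceedances and the function names are likewise assumptions.

```python
import random
from math import exp
from statistics import mean, variance


def empirical_p(t0, perms):
    """Fraction of permutation values at least as extreme as t0 (larger = more extreme)."""
    return sum(1 for t in perms if t >= t0) / len(perms)


def gpd_tail_p(t0, perms, n_exceed=250):
    """Estimate P(T >= t0) from a generalized Pareto fit to the largest permutation values."""
    xs = sorted(perms, reverse=True)
    n = len(xs)
    if not 2 <= n_exceed < n:
        raise ValueError("need 2 <= n_exceed < number of permutations")
    u = (xs[n_exceed - 1] + xs[n_exceed]) / 2          # tail threshold
    if t0 < u:
        return empirical_p(t0, perms)                   # not in the tail: count directly
    excess = [x - u for x in xs[:n_exceed]]
    m, v = mean(excess), variance(excess)
    # Method-of-moments GPD fit: mean = sigma/(1-xi), var = sigma^2/((1-xi)^2 (1-2xi)).
    xi = 0.5 * (1 - m * m / v)
    sigma = 0.5 * m * (1 + m * m / v)
    y = t0 - u
    if abs(xi) < 1e-9:
        survival = exp(-y / sigma)                      # exponential limit of the GPD
    else:
        base = 1 + xi * y / sigma
        survival = 0.0 if base <= 0 else base ** (-1 / xi)  # xi < 0 has a finite upper end
    return n_exceed / n * survival


if __name__ == "__main__":
    rng = random.Random(1)
    perms = [rng.expovariate(1.0) for _ in range(20000)]  # null statistic ~ Exp(1)
    t0 = 7.0
    print(empirical_p(t0, perms), gpd_tail_p(t0, perms, n_exceed=500), exp(-t0))
```

For an exponential null the excesses are exactly GPD-distributed, so the tail estimate should track the true value exp(−t0) while the empirical estimate rests on only a handful of permutations beyond t0.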
Availability: The Matlab code can be obtained from the corresponding author on request.
Contact: tknijnenburg@systemsbiology.org
Supplementary information: Supplementary data are available at Bioinformatics online.
doi:10.1093/bioinformatics/btp211
PMCID: PMC2687965  PMID: 19477983
15.  Combinatorial effects of environmental parameters on transcriptional regulation in Saccharomyces cerevisiae: A quantitative analysis of a compendium of chemostat-based transcriptome data 
BMC Genomics  2009;10:53.
Background
Microorganisms adapt their transcriptome by integrating multiple chemical and physical signals from their environment. Shake-flask cultivation does not allow precise manipulation of individual culture parameters and therefore precludes a quantitative analysis of the (combinatorial) influence of these parameters on transcriptional regulation. Steady-state chemostat cultures, which do enable accurate control, measurement and manipulation of individual cultivation parameters (e.g. specific growth rate, temperature, identity of the growth-limiting nutrient) appear to provide a promising experimental platform for such a combinatorial analysis.
Results
A microarray compendium of 170 steady-state chemostat cultures of the yeast Saccharomyces cerevisiae is presented and analyzed. The 170 microarrays encompass 55 unique conditions, which can be characterized by the combined settings of 10 different cultivation parameters. By applying a regression model to assess the impact of (combinations of) cultivation parameters on the transcriptome, most S. cerevisiae genes were shown to be influenced by multiple cultivation parameters, and in many cases by combinatorial effects of cultivation parameters. The inclusion of these combinatorial effects in the regression model led to higher explained variance of the gene expression patterns and resulted in higher function enrichment in subsequent analysis. We further demonstrate the usefulness of the compendium and regression analysis for interpretation of shake-flask-based transcriptome studies and for guiding functional analysis of (uncharacterized) genes and pathways.
Conclusion
Modeling the combinatorial effects of environmental parameters on the transcriptome is crucial for understanding transcriptional regulation. Chemostat cultivation offers a powerful tool for such an approach.
doi:10.1186/1471-2164-10-53
PMCID: PMC2640415  PMID: 19173729
16.  Physiological and Transcriptional Responses of Saccharomyces cerevisiae to Zinc Limitation in Chemostat Cultures †  
Applied and Environmental Microbiology  2007;73(23):7680-7692.
Transcriptional responses of the yeast Saccharomyces cerevisiae to Zn availability were investigated at a fixed specific growth rate under limiting and abundant Zn concentrations in chemostat culture. To investigate the context dependency of this transcriptional response and eliminate growth rate-dependent variations in transcription, yeast was grown under several chemostat regimens, resulting in various carbon (glucose), nitrogen (ammonium), zinc, and oxygen supplies. A robust set of genes that responded consistently to Zn limitation was identified, and the set enabled the definition of the Zn-specific Zap1p regulon, comprising 26 genes and characterized by a broader zinc-responsive element consensus (MHHAACCBYNMRGGT) than previously described. Most surprising was the Zn-dependent regulation of genes involved in storage carbohydrate metabolism. Their concerted down-regulation was physiologically relevant as revealed by a substantial decrease in glycogen and trehalose cellular content under Zn limitation. An unexpectedly large number of genes were synergistically or antagonistically regulated by oxygen and Zn availability. This combinatorial regulation suggested a more prominent involvement of Zn in mitochondrial biogenesis and function than hitherto identified.
doi:10.1128/AEM.01445-07
PMCID: PMC2168061  PMID: 17933919
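The zinc-responsive element consensus MHHAACCBYNMRGGT is written in standard IUPAC nucleotide codes (for example M = A/C, H = A/C/T, B = C/G/T, Y = C/T, R = A/G, N = any base). A minimal sketch of scanning a sequence for it converts each code into a regular-expression character class. It reports forward-strand, possibly overlapping matches only; a genome-wide scan would also search the reverse complement, and the helper names are illustrative.

```python
import re

# Standard IUPAC nucleotide ambiguity codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]", "K": "[GT]", "M": "[AC]",
    "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def consensus_to_regex(consensus):
    """Translate an IUPAC consensus string into a regex pattern string."""
    return "".join(IUPAC[ch] for ch in consensus.upper())


def find_sites(seq, consensus):
    """0-based start positions of forward-strand matches, overlaps included."""
    # Wrapping the pattern in a zero-width lookahead lets overlapping hits be found.
    pattern = re.compile(f"(?={consensus_to_regex(consensus)})")
    return [m.start() for m in pattern.finditer(seq.upper())]


ZRE = "MHHAACCBYNMRGGT"
print(consensus_to_regex(ZRE))  # [AC][ACT][ACT]AACC[CGT][CT][ACGT][AC][AG]GGT
print(find_sites("GGGACTAACCGTACGGGTTTT", ZRE))  # [3]
```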
17.  Integration of Known Transcription Factor Binding Site Information and Gene Expression Data to Advance from Co-Expression to Co-Regulation 
The common approach to find co-regulated genes is to cluster genes based on gene expression. However, due to the limited information present in any dataset, genes in the same cluster might be co-expressed but not necessarily co-regulated. In this paper, we propose to integrate known transcription factor binding site information and gene expression data into a single clustering scheme. This scheme will find clusters of co-regulated genes that are not only expressed similarly under the measured conditions, but also share a regulatory structure that may explain their common regulation. We demonstrate the utility of this approach on a microarray dataset of yeast grown under different nutrient and oxygen limitations. Our integrated clustering method not only unravels many regulatory modules that are consistent with current biological knowledge, but also provides a more profound understanding of the underlying process. The added value of our approach, compared with the clustering solely based on gene expression, is its ability to uncover clusters of genes that are involved in more specific biological processes and are evidently regulated by a set of transcription factors.
doi:10.1016/S1672-0229(07)60019-9
PMCID: PMC5054099  PMID: 17893074
gene clustering; gene regulation; binding motifs
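One simple way to picture a single clustering scheme over both data types is a combined gene-gene distance that mixes expression dissimilarity with dissimilarity in known binding-site content. This sketch is not the paper's scheme: the Pearson-based expression distance, the Jaccard distance on sets of TFs with known binding sites, the mixing weight `alpha` and the TF labels are illustrative assumptions. Such a distance could then be fed to any standard clustering method.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def jaccard(s, t):
    """Shared TFs over all TFs with known sites in either gene (0 if neither has any)."""
    union = s | t
    return len(s & t) / len(union) if union else 0.0


def combined_distance(expr_a, expr_b, tfs_a, tfs_b, alpha=0.5):
    """alpha weighs expression dissimilarity against regulatory dissimilarity; both lie in [0, 1]."""
    d_expr = (1 - pearson(expr_a, expr_b)) / 2
    d_reg = 1 - jaccard(tfs_a, tfs_b)
    return alpha * d_expr + (1 - alpha) * d_reg


profile = [1.0, 2.0, 3.0]
# Same expression, different known regulators: the distance grows from 0.0 to 0.5.
print(combined_distance(profile, profile, {"TF_A"}, {"TF_A"}))  # 0.0
print(combined_distance(profile, profile, {"TF_A"}, {"TF_B"}))  # 0.5
```

The point of the mixture is the one made above: co-expressed genes only end up close together when they also share a plausible regulatory explanation.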
18.  Exploiting combinatorial cultivation conditions to infer transcriptional regulation 
BMC Genomics  2007;8:25.
Background
Regulatory networks often employ the model that attributes changes in gene expression levels, as observed across different cellular conditions, to changes in the activity of transcription factors (TFs). Although the actual conditions that trigger a change in TF activity should form an integral part of the generated regulatory network, they are usually lacking. This is because the large heterogeneity in the employed conditions and the continuous changes in environmental parameters in the commonly used shake-flask cultures prevent unambiguous modeling of the cultivation conditions within the computational framework.
Results
We designed an experimental setup that allows us to explicitly model the cultivation conditions and use these to infer the activity of TFs. The yeast Saccharomyces cerevisiae was cultivated under four different nutrient limitations in both aerobic and anaerobic chemostat cultures. In the chemostats, environmental and growth parameters are accurately controlled. Consequently, the measured transcriptional response can be directly correlated with changes in the limited nutrient or oxygen concentration. We devised a tailor-made computational approach that exploits the systematic setup of the cultivation conditions in order to identify the individual and combined effects of nutrient limitations and oxygen availability on expression behavior and TF activity.
Conclusion
Incorporating the actual growth conditions when inferring regulatory relationships provides detailed insight into the functionality of the TFs that are triggered by changes in the employed cultivation conditions. For example, our results confirm the established role of TF Hap4 in both aerobic regulation and glucose derepression. Among the numerous inferred condition-specific regulatory associations between gene sets and TFs, we also identified many novel putative regulatory mechanisms, such as the possible role of Tye7 in sulfur metabolism.
doi:10.1186/1471-2164-8-25
PMCID: PMC1797021  PMID: 17241460