Results 1-13 (13)

author:("Ott, paschal")
1.  Wigwams: identifying gene modules co-regulated across multiple biological conditions 
Bioinformatics  2013;30(7):962-970.
Motivation: Identification of modules of co-regulated genes is a crucial first step towards dissecting the regulatory circuitry underlying biological processes. Co-regulated genes are likely to reveal themselves by showing tight co-expression, e.g. high correlation of expression profiles across multiple time series datasets. However, numbers of up- or downregulated genes are often large, making it difficult to discriminate between dependent co-expression resulting from co-regulation and independent co-expression. Furthermore, modules of co-regulated genes may only show tight co-expression across a subset of the time series, i.e. show condition-dependent regulation.
Results: Wigwams is a simple and efficient method to identify gene modules showing evidence for co-regulation in multiple time series of gene expression data. Wigwams analyzes similarities of gene expression patterns within each time series (condition) and directly tests the dependence or independence of these across different conditions. The expression pattern of each gene in each subset of conditions is tested statistically as a potential signature of a condition-dependent regulatory mechanism regulating multiple genes. Wigwams does not require particular time points and can process datasets that are on different time scales. Differential expression relative to control conditions can be taken into account. The output is succinct and non-redundant, enabling gene network reconstruction to be focused on those gene modules and combinations of conditions that show evidence for shared regulatory mechanisms. Wigwams was run using six Arabidopsis time series expression datasets, producing a set of biologically significant modules spanning different combinations of conditions.
Availability and implementation: A Matlab implementation of Wigwams, complete with graphical user interfaces and documentation, is available at:
Supplementary Data: Supplementary data are available at Bioinformatics online.
PMCID: PMC3967106  PMID: 24351708
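Illustrative sketch (not the Wigwams method): the abstract above describes co-regulated genes showing high correlation of expression profiles in some time series but not others. The Python snippet below computes a per-condition Pearson correlation for one gene pair and lists the conditions where it is high. The function names, the 0.9 cut-off and the toy data are assumptions made for this example. Wigwams itself applies a statistical test of dependence across conditions, which is not reproduced here.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a flat profile carries no correlation signal
    return cov / (sx * sy)


def coexpressed_conditions(gene_a, gene_b, threshold=0.9):
    """List the conditions in which two genes are tightly co-expressed.

    gene_a, gene_b: dicts mapping condition name -> list of expression values.
    Each condition may have its own number of time points.
    threshold: hypothetical cut-off, chosen only for illustration.
    """
    return [c for c in gene_a
            if c in gene_b and pearson(gene_a[c], gene_b[c]) >= threshold]


# Toy data (invented): the pair tracks closely under "drought" only.
gene_a = {"drought": [1.0, 2.0, 3.0, 4.0], "cold": [1.0, 3.0, 2.0, 4.0]}
gene_b = {"drought": [2.0, 4.1, 6.0, 8.2], "cold": [4.0, 1.0, 3.0, 2.0]}
shared = coexpressed_conditions(gene_a, gene_b)
print(shared)
```

A pairwise threshold like this cannot by itself separate dependent from independent co-expression, which is the problem the abstract says Wigwams addresses.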
2.  Wellington: a novel method for the accurate identification of digital genomic footprints from DNase-seq data 
Nucleic Acids Research  2013;41(21):e201.
The expression of eukaryotic genes is regulated by cis-regulatory elements such as promoters and enhancers, which bind sequence-specific DNA-binding proteins. One of the great challenges in the gene regulation field is to characterise these elements. This involves the identification of transcription factor (TF) binding sites within regulatory elements that are occupied in a defined regulatory context. Digestion with DNase and the subsequent analysis of regions protected from cleavage (DNase footprinting) has for many years been used to identify specific binding sites occupied by TFs at individual cis-elements with high resolution. This methodology has recently been adapted for high-throughput sequencing (DNase-seq). In this study, we describe an imbalance in the DNA strand-specific alignment information of DNase-seq data surrounding protein–DNA interactions that allows accurate prediction of occupied TF binding sites. Our study introduces a novel algorithm, Wellington, which considers the imbalance in this strand-specific information to efficiently identify DNA footprints. This algorithm significantly enhances specificity by reducing the proportion of false positives and requires significantly fewer predictions than previously reported methods to recapitulate an equal amount of ChIP-seq data. We also provide an open-source software package, pyDNase, which implements the Wellington algorithm to interface with DNase-seq data and expedite analyses.
PMCID: PMC3834841  PMID: 24071585
3.  A local regulatory network around three NAC transcription factors in stress responses and senescence in Arabidopsis leaves 
The Plant Journal  2013;75(1):26-39.
A model is presented describing the gene regulatory network surrounding three similar NAC transcription factors that have roles in Arabidopsis leaf senescence and stress responses. ANAC019, ANAC055 and ANAC072 belong to the same clade of NAC domain genes and have overlapping expression patterns. A combination of promoter DNA/protein interactions identified using yeast 1-hybrid analysis and modelling using gene expression time course data has been applied to predict the regulatory network upstream of these genes. Similarities and divergence in regulation during a variety of stress responses are predicted by different combinations of upstream transcription factors binding and also by the modelling. Mutant analysis with potential upstream genes was used to test and confirm some of the predicted interactions. Gene expression analysis in mutants of ANAC019 and ANAC055 at different times during leaf senescence has revealed a distinctly different role for each of these genes. Yeast 1-hybrid analysis is shown to be a valuable tool that can distinguish clades of binding proteins and be used to test and quantify protein binding to predicted promoter motifs.
PMCID: PMC3781708  PMID: 23578292
Arabidopsis thaliana; Botrytis cinerea; NAC transcription factors; gene regulatory network; senescence; stress
4.  Arabidopsis HEAT SHOCK TRANSCRIPTION FACTORA1b overexpression enhances water productivity, resistance to drought, and infection 
Journal of Experimental Botany  2013;64(11):3467-3481.
Heat-stressed crops suffer dehydration, depressed growth, and a consequent decline in water productivity, which is the yield of harvestable product as a function of lifetime water consumption and is a trait associated with plant growth and development. Heat shock transcription factor (HSF) genes have been implicated not only in thermotolerance but also in plant growth and development, and therefore could influence water productivity. Here it is demonstrated that Arabidopsis thaliana plants with increased HSFA1b expression showed increased water productivity and harvest index under water-replete and water-limiting conditions. In non-stressed HSFA1b-overexpressing (HSFA1bOx) plants, 509 genes showed altered expression, and these genes were not over-represented for development-associated genes but were for response to biotic stress. This confirmed an additional role for HSFA1b in maintaining basal disease resistance, which was stress hormone independent but involved H2O2 signalling. Fifty-five of the 509 genes harbour a variant of the heat shock element (HSE) in their promoters, here named HSE1b. Chromatin immunoprecipitation-PCR confirmed binding of HSFA1b to HSE1b in vivo, including in seven transcription factor genes. One of these is MULTIPROTEIN BRIDGING FACTOR1c (MBF1c). Plants overexpressing MBF1c showed enhanced basal resistance but not water productivity, thus partially phenocopying HSFA1bOx plants. A comparison of genes responsive to HSFA1b and MBF1c overexpression revealed a common group, none of which harbours a HSE1b motif. From this example, it is suggested that HSFA1b directly regulates 55 HSE1b-containing genes, which control the remaining 454 genes, collectively accounting for the stress defence and developmental phenotypes of HSFA1bOx.
PMCID: PMC3733161  PMID: 23828547
Arabidopsis thaliana; basal resistance; biotic and abiotic stress; Brassica napus; drought stress; heat stress; Hyaloperonospora parasitica; hydrogen peroxide; Pseudomonas syringae; transcription factors; water productivity.
5.  New Class of Molecular Conductance Switches Based on the [1,3]-Silyl Migration from Silanes to Silenes 
On the basis of first-principles density functional theory calculations, we propose a new molecular photoswitch which exploits a photochemical [1,3]-silyl(germyl) shift leading from a silane to a silene (a Si=C double bonded compound). The silanes investigated herein act as the OFF state, with tetrahedral saturated silicon atoms disrupting the conjugation through the molecules. The silenes, on the other hand, have conjugated paths spanning over the complete molecules and thus act as the ON state. We calculate ON/OFF conductance ratios in the range of 10–50 at a voltage of +1 V. In the low bias regime, the ON/OFF ratio increases to a range of 200–1150. The reverse reaction could be triggered thermally or photolytically, with the silene being slightly higher in relative energy than the silane. The calculated activation barriers for the thermal back-rearrangement of the migrating group can be tuned and are in the range 108–171 kJ/mol for the switches examined herein. The first-principles calculations together with a simple one-level model show that the high ON/OFF ratio in the molecule assembled in a solid state device is due to changes in the energy position of the frontier molecular orbitals compared to the Fermi energy of the electrodes, in combination with an increased effective coupling between the molecule and the electrodes for the ON state.
PMCID: PMC3670211  PMID: 23741530
6.  MEME-LaB: motif analysis in clusters 
Bioinformatics  2013;29(13):1696-1697.
Summary: Genome-wide expression analysis can result in large numbers of clusters of co-expressed genes. Although there are tools for ab initio discovery of transcription factor-binding sites, most do not provide a quick and easy way to study large numbers of clusters. To address this, we introduce a web tool called MEME-LaB. The tool wraps MEME (an ab initio motif finder), providing an interface for users to input multiple gene clusters, retrieve promoter sequences, run motif finding and then easily browse and condense the results, facilitating better interpretation of the results from large-scale datasets.
Availability: MEME-LaB is freely accessible at:
Supplementary information: Supplementary data are available at Bioinformatics online.
PMCID: PMC3694638  PMID: 23681125
7.  The Retinoblastoma Protein Modulates Tbx2 Functional Specificity 
Molecular Biology of the Cell  2010;21(15):2770-2779.
This study demonstrates that Tbx2 binds Rb1. The interaction with Rb1 increases Tbx2 DNA-binding activity and enhances the ability of Tbx2 to repress transcription. The results show that Tbx2 regulates the expression of genes involved in cell division and DNA replication and that Rb1 modulates Tbx2 target gene recognition and specificity.
Tbx2 is a member of a large family of transcription factors defined by homology to the T-box DNA-binding domain. Tbx2 plays a key role in embryonic development, and in cancer through its capacity to suppress senescence and promote invasiveness. Despite its importance, little is known of how Tbx2 is regulated or how it achieves target gene specificity. Here we show that Tbx2 specifically associates with active hypophosphorylated retinoblastoma protein (Rb1), a known regulator of many transcription factors involved in cell cycle progression and cellular differentiation, but not with the Rb1-related proteins p107 or p130. The interaction with Rb1 maps to a domain immediately carboxy-terminal to the T-box and enhances Tbx2 DNA binding and transcriptional repression. Microarray analysis of melanoma cells expressing inducible dominant-negative Tbx2, comprising the T-box and either an intact or mutated Rb1 interaction domain, shows that Tbx2 regulates the expression of many genes involved in cell cycle control and that a mutation which disrupts the Rb1-Tbx2 interaction also affects Tbx2 target gene selectivity. Taken together, the data show that Rb1 is an important determinant of Tbx2 functional specificity.
PMCID: PMC2912361  PMID: 20534814
8.  Dynamic Distribution of SeqA Protein across the Chromosome of Escherichia coli K-12 
mBio  2010;1(1):e00012-10.
The bacterial SeqA protein binds to hemi-methylated GATC sequences that arise in newly synthesized DNA upon passage of the replication machinery. In Escherichia coli K-12, the single replication origin oriC is a well-characterized target for SeqA, which binds to multiple hemi-methylated GATC sequences immediately after replication has initiated. This sequesters oriC, thereby preventing reinitiation of replication. However, the genome-wide DNA binding properties of SeqA are unknown, and hence, here, we describe a study of the binding of SeqA across the entire Escherichia coli K-12 chromosome, using chromatin immunoprecipitation in combination with DNA microarrays. Our data show that SeqA binding correlates with the frequency and spacing of GATC sequences across the entire genome. Less SeqA is found in highly transcribed regions, as well as in the ter macrodomain. Using synchronized cultures, we show that SeqA distribution differs with the cell cycle. SeqA remains bound to some targets after replication has ceased, and these targets locate to genes encoding factors involved in nucleotide metabolism, chromosome replication, and methyl transfer.
DNA replication in bacteria is a highly regulated process. In many bacteria, a protein called SeqA plays a key role by binding to newly replicated DNA. Thus, at the origin of DNA replication, SeqA binding blocks premature reinitiation of replication rounds. Although most investigators have focused on the role of SeqA at replication origins, it has long been suspected that SeqA has a more pervasive role. In this study, we describe how we have been able to identify scores of targets, across the entire Escherichia coli chromosome, to which SeqA binds. Using synchronously growing cells, we show that the distribution of SeqA between these targets alters as replication of the chromosome progresses. This suggests that sequential changes in SeqA distribution orchestrate a program of gene expression that ensures coordinated DNA replication and cell division.
PMCID: PMC2912659  PMID: 20689753
9.  Variable structure motifs for transcription factor binding sites 
BMC Genomics  2010;11:30.
Classically, models of DNA-transcription factor binding sites (TFBSs) have been based on relatively few known instances and have treated them as sites of fixed length using position weight matrices (PWMs). Various extensions to this model have been proposed, most of which take account of dependencies between the bases in the binding sites. However, some transcription factors are known to exhibit some flexibility and bind to DNA in more than one possible physical configuration. In some cases this variation is known to affect the function of binding sites. With the increasing volume of ChIP-seq data available it is now possible to investigate models that incorporate this flexibility. Previous work on variable length models has been constrained by: a focus on specific zinc finger proteins in yeast using restrictive models; a reliance on hand-crafted models for just one transcription factor at a time; and a lack of evaluation on realistically sized data sets.
We re-analysed binding sites from the TRANSFAC database and found motivating examples where our new variable length model provides a better fit. We analysed several ChIP-seq data sets with a novel motif search algorithm and compared the results to one of the best standard PWM finders and a recently developed alternative method for finding motifs of variable structure. All the methods performed comparably in held-out cross validation tests. Known motifs of variable structure were recovered for p53, Stat5a and Stat5b. In addition our method recovered a novel generalised version of an existing PWM for Sp1 that allows for variable length binding. This motif improved classification performance.
We have presented a new gapped PWM model for variable length DNA binding sites that is neither too restrictive nor over-parameterised. Our comparison with existing tools shows that on average it does not have better predictive accuracy than existing methods. However, it does provide more interpretable models of motifs of variable structure that are suitable for follow-up structural studies. To our knowledge, we are the first to apply variable length motif models to eukaryotic ChIP-seq data sets and consequently the first to show their value in this domain. The results include a novel motif for the ubiquitous transcription factor Sp1.
PMCID: PMC2824720  PMID: 20074339
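Illustrative sketch (not the paper's gapped model): the abstract above takes the classical fixed-length position weight matrix (PWM) as its starting point. The Python snippet below builds a log-odds PWM from a handful of aligned sites and scans a sequence for the best-scoring window. The example sites, the pseudocount of 1 and the uniform background of 0.25 are assumptions made for this example, and the variable-length extension the paper proposes is not implemented.

```python
from math import log2


def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Build a log-odds PWM from equal-length aligned binding sites."""
    length = len(sites[0])
    total = len(sites) + 4 * pseudocount
    pwm = []
    for i in range(length):
        column = [site[i] for site in sites]
        pwm.append({
            base: log2(((column.count(base) + pseudocount) / total) / background)
            for base in "ACGT"
        })
    return pwm


def best_hit(pwm, seq):
    """Return (score, start) of the highest-scoring window in seq."""
    width = len(pwm)
    scores = [
        (sum(pwm[i][seq[start + i]] for i in range(width)), start)
        for start in range(len(seq) - width + 1)
    ]
    return max(scores)


# Toy aligned sites (invented for illustration).
sites = ["TGACTCA", "TGAGTCA", "TGACTCA", "TGACTAA"]
pwm = build_pwm(sites)
seq = "CCGTTGACTCAGGA"
score, start = best_hit(pwm, seq)
print(start, seq[start:start + len(pwm)], round(score, 2))
```

Because every window has the same width, a matrix like this cannot represent a factor that binds in more than one spacing, which is the limitation the abstract's variable-length model targets.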
10.  Transcriptional programs: Modelling higher order structure in transcriptional control 
BMC Bioinformatics  2009;10:218.
Transcriptional regulation is an important part of regulatory control in eukaryotes. Even if binding motifs for transcription factors are known, the task of finding binding sites by scanning sequences is plagued by false positives. One way to improve the detection of binding sites from motifs is by taking cooperativity of transcription factor binding into account. We propose a non-parametric probabilistic model, similar to a document topic model, for detecting transcriptional programs, groups of cooperative transcription factors and co-regulated genes. The analysis results in transcriptional programs which generalise both transcriptional modules and TF-target gene incidence matrices and provide a higher-level summary of these structures. The method is independent of prior specification of training sets of genes, for example, via gene expression data. The analysis is based on known binding motifs.
We applied our method to putative regulatory regions of 18,445 Mus musculus genes. We discovered just 68 transcriptional programs that effectively summarised the action of 149 transcription factors on these genes. Several of these programs were significantly enriched for known biological processes and signalling pathways. One transcriptional program has a significant overlap with a reference set of cell cycle specific transcription factors.
Our method is able to pick out higher order structure from noisy sequence analyses. The transcriptional programs it identifies potentially represent common mechanisms of regulatory control across the genome. It simultaneously predicts which genes are co-regulated and which sets of transcription factors cooperate to achieve this co-regulation. The programs we discovered enable biologists to choose new genes and transcription factors to study in specific transcriptional regulatory systems.
PMCID: PMC2725141  PMID: 19607663
11.  A comparative study of S/MAR prediction tools 
BMC Bioinformatics  2007;8:71.
S/MARs are regions of the DNA that are attached to the nuclear matrix. These regions are known to affect substantially the expression of genes. The computer prediction of S/MARs is a highly significant task which could contribute to our understanding of chromatin organisation in eukaryotic cells, the number and distribution of boundary elements, and the understanding of gene regulation in eukaryotic cells. However, while a number of S/MAR predictors have been proposed, their accuracy has so far not come under scrutiny.
We have selected S/MARs with sufficient experimental evidence and used these to evaluate existing methods of S/MAR prediction. Our main results are: 1.) all existing methods have little predictive power, 2.) a simple rule based on AT-percentage is generally competitive with other methods, 3.) in practice, the different methods will usually identify different sub-sequences as S/MARs, 4.) more research on the H-Rule would be valuable.
A new insight is needed to design a method which will predict S/MARs well. Our data, including the control data, has been deposited as additional material and this may help later researchers test new predictors.
PMCID: PMC1847452  PMID: 17335576
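Illustrative sketch (not the evaluated rule): the abstract above reports that a simple rule based on AT-percentage is generally competitive with other S/MAR predictors. The Python snippet below shows one way such a rule could look, flagging sliding windows whose A+T fraction reaches a cut-off. The window size, step and 0.7 cut-off are hypothetical values chosen for a toy sequence and are not taken from the paper.

```python
def at_rich_windows(seq, window=10, step=5, min_at=0.7):
    """Return (start, end, at_fraction) for windows with A+T >= min_at.

    window, step and min_at are hypothetical parameters for illustration.
    """
    seq = seq.upper()
    hits = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        at_fraction = (chunk.count("A") + chunk.count("T")) / window
        if at_fraction >= min_at:
            hits.append((start, start + window, at_fraction))
    return hits


# Toy sequence (invented): an AT-only stretch flanked by GC-only stretches.
seq = "GCGCGCGCGC" + "ATATTAAATT" + "GCGGCCGCGC"
hits = at_rich_windows(seq)
print(hits)
```

Real S/MAR candidates span hundreds of base pairs, so a practical window would be far larger than the one used on this toy input.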
12.  Mechanism of the Phospha-Wittig–Horner Reaction 
PMCID: PMC3738942  PMID: 23653134
ketenes; phosphaallenes; phospha-Wittig–Horner reaction; reaction mechanisms
13.  Enhanced Photochemical Hydrogen Production by a Molecular Diiron Catalyst Incorporated into a Metal–Organic Framework 
Journal of the American Chemical Society  2013;135(45):16997-17003.
A molecular proton reduction catalyst [FeFe](dcbdt)(CO)₆ (1, dcbdt = 1,4-dicarboxylbenzene-2,3-dithiolate) with structural similarities to [FeFe]-hydrogenase active sites has been incorporated into a highly robust Zr(IV)-based metal–organic framework (MOF) by postsynthetic exchange (PSE). The PSE protocol is crucial as direct solvothermal synthesis fails to produce the functionalized MOF. The molecular integrity of the organometallic site within the MOF is demonstrated by a variety of techniques, including X-ray absorption spectroscopy. In conjunction with [Ru(bpy)₃]²⁺ as a photosensitizer and ascorbate as an electron donor, MOF-[FeFe](dcbdt)(CO)₆ catalyzes photochemical hydrogen evolution in water at pH 5. The immobilized catalyst shows substantially improved initial rates and overall hydrogen production when compared to a reference system of complex 1 in solution. Improved catalytic performance is ascribed to structural stabilization of the complex when incorporated in the MOF as well as the protection of reduced catalysts 1⁻ and 1²⁻ from undesirable charge recombination with oxidized ascorbate.
PMCID: PMC3829681  PMID: 24116734
