Results 1-18 (18)

1.  Systematic characterization of novel lncRNAs responding to phosphate starvation in Arabidopsis thaliana 
BMC Genomics  2016;17:655.
Previously, several long non-coding RNAs (lncRNAs) were characterized as regulators in phosphate (Pi) starvation responses. However, systematic studies of novel lncRNAs involved in the Pi starvation signaling pathways have not been reported.
Here, we used a genome-wide sequencing and bioinformatics approach to identify both poly(A)+ and poly(A)– lncRNAs that responded to Pi starvation in Arabidopsis thaliana. We sequenced shoot and root transcriptomes of the Arabidopsis seedlings grown under Pi-sufficient and Pi-deficient conditions, and predicted 1212 novel lncRNAs, of which 78 were poly(A)– lncRNAs. By employing strand-specific RNA libraries, we discovered many novel antisense lncRNAs for the first time. We further defined 309 lncRNAs that were differentially expressed between P+ and P– conditions in either shoots or roots. Through Gene Ontology enrichment of the associated protein-coding genes (co-expressed or close on the genome), we found that many lncRNAs were adjacent or co-expressed with the genes involved in several Pi starvation related processes, including cell wall organization and photosynthesis. In total, we identified 104 potential lncRNA targets of PHR1, a key regulator for transcriptional response to Pi starvation. Moreover, we identified 16 candidate lncRNAs as potential targets of miR399, another key regulator of plant Pi homeostasis.
Altogether, our data provide a rich resource of candidate lncRNAs involved in the Pi starvation regulatory network.
Electronic supplementary material
The online version of this article (doi:10.1186/s12864-016-2929-2) contains supplementary material, which is available to authorized users.
PMCID: PMC4991007  PMID: 27538394
Long ncRNAs; RNA-Seq; Phosphate starvation; Arabidopsis thaliana; Poly(A)+; Poly(A)–
2.  RNAex: an RNA secondary structure prediction server enhanced by high-throughput structure-probing data 
Nucleic Acids Research  2016;44(Web Server issue):W294-W301.
Several high-throughput technologies have been developed to probe RNA base pairs and loops at the transcriptome level in multiple species. However, to obtain the final RNA secondary structure, extensive effort and considerable expertise are required to statistically process the probing data and combine them with free energy models. Therefore, we developed an RNA secondary structure prediction server that is enhanced by experimental data (RNAex). RNAex is a web interface that enables non-specialists to easily access cutting-edge structure-probing data and predict RNA secondary structures enhanced by in vivo and in vitro data. RNAex annotates the RNA editing, RNA modification and SNP sites on the predicted structures. It provides four structure-folding methods that can be selected by the user: restrained MaxExpect, SeqFold, RNAstructure (Fold) and RNAfold. The performance of these four folding methods has been verified by previous publications on known structures. We re-mapped the raw sequencing data of the probing experiments to the whole genome for each species. RNAex thus enables users to predict secondary structures for both known and novel RNA transcripts in human, mouse, yeast and Arabidopsis. The RNAex web server is available at
PMCID: PMC4987914  PMID: 27137891
3.  Improved prediction of RNA secondary structure by integrating the free energy model with restraints derived from experimental probing data 
Nucleic Acids Research  2015;43(15):7247-7259.
Recently, several experimental techniques have emerged for probing RNA structures based on high-throughput sequencing. However, most secondary structure prediction tools that incorporate probing data are designed and optimized for particular types of experiments. For example, RNAstructure-Fold is optimized for SHAPE data, while SeqFold is optimized for PARS data. Here, we report a new RNA secondary structure prediction method, restrained MaxExpect (RME), which can incorporate multiple types of experimental probing data and is based on a free energy model and an MEA (maximizing expected accuracy) algorithm. We first demonstrated that RME substantially improved secondary structure prediction with perfect restraints (base pair information of known structures). Next, we collected structure-probing data from diverse experiments (e.g. SHAPE, PARS and DMS-seq) and transformed them into a unified set of pairing probabilities with a posterior probabilistic model. By using the probability scores as restraints in RME, we compared its secondary structure prediction performance with two other well-known tools, RNAstructure-Fold (based on a free energy minimization algorithm) and SeqFold (based on a sampling algorithm). For SHAPE data, RME and RNAstructure-Fold performed better than SeqFold, because they markedly altered the energy model with the experimental restraints. For high-throughput data (e.g. PARS and DMS-seq) with lower probing efficiency, the secondary structure prediction performances of the tested tools were comparable, with performance improvements for only a portion of the tested RNAs. However, when the effects of tertiary structure and protein interactions were removed, RME showed the highest prediction accuracy in the DMS-accessible regions by incorporating in vivo DMS-seq data.
PMCID: PMC4551937  PMID: 26170232
4.  CLIPdb: a CLIP-seq database for protein-RNA interactions 
BMC Genomics  2015;16(1):51.
RNA-binding proteins (RBPs) play essential roles in gene expression regulation through their interactions with RNA transcripts, including coding, canonical non-coding and long non-coding RNAs. Large amounts of crosslinking immunoprecipitation (CLIP)-seq data (including HITS-CLIP, PAR-CLIP, and iCLIP) have been recently produced to reveal transcriptome-wide binding sites of RBPs at the single-nucleotide level.
Here, we constructed a database, CLIPdb, to describe RBP-RNA interactions based on 395 publicly available CLIP-seq data sets for 111 RBPs from four organisms: human, mouse, worm and yeast. We consistently annotated the CLIP-seq data sets and RBPs, and developed a user-friendly interface for rapid navigation of the CLIP-seq data. We applied a unified computational method to identify transcriptome-wide binding sites, making the binding sites directly comparable and the data available for integration across different CLIP-seq studies. The high-resolution binding sites of the RBPs can be visualized on the whole-genome scale using a browser. In addition, users can browse and download the identified binding sites of all profiled RBPs by querying genes of interest, including both protein coding genes and non-coding RNAs.
Manually curated metadata and uniformly identified binding sites of publicly available CLIP-seq data sets will be a foundation for further integrative and comparative analyses. With maintained up-to-date data sets and improved functionality, CLIPdb will be a valuable resource for improving the understanding of post-transcriptional regulatory networks.
Electronic supplementary material
The online version of this article (doi:10.1186/s12864-015-1273-2) contains supplementary material, which is available to authorized users.
PMCID: PMC4326514  PMID: 25652745
CLIP-seq; RNA-binding protein; RNA; Regulatory networks; Data integration
5.  A common set of distinct features that characterize noncoding RNAs across multiple species 
Nucleic Acids Research  2014;43(1):104-114.
To find signature features shared by various ncRNA sub-types and characterize novel ncRNAs, we have developed a method, RNAfeature, to investigate >600 sets of genomic and epigenomic data with various evolutionary and biophysical scores. RNAfeature utilizes a fine-tuned intra-species wrapper algorithm that is followed by a novel feature selection strategy across species. It considers the long-distance effects of certain features (e.g. histone modification at the promoter region). We finally narrow the set down to 10 informative features (including sequences, structures, expression profiles and epigenetic signals). These features are complementary to each other and as a whole can accurately distinguish canonical ncRNAs from CDSs and UTRs (accuracies: >92% in human, mouse, worm and fly). Moreover, the feature pattern is conserved across multiple species. For instance, the supervised 10-feature model derived from animal species can predict ncRNAs in Arabidopsis (accuracy: 82%). Subsequently, we integrate the 10 features to define a set of noncoding potential scores, which can identify, evaluate and characterize novel noncoding RNAs. The score covers all transcribed regions (including unconserved ncRNAs), without requiring assembly of the full-length transcripts. Importantly, the noncoding potential allows us to identify and characterize potential functional domains with feature patterns similar to canonical ncRNAs (e.g. tRNA, snRNA, miRNA, etc.) on ∼70% of human long ncRNAs (lncRNAs).
PMCID: PMC4288202  PMID: 25505163
6.  The phzA2-G2 Transcript Exhibits Direct RsmA-Mediated Activation in Pseudomonas aeruginosa M18 
PLoS ONE  2014;9(2):e89653.
In bacteria, RNA-binding proteins of the RsmA/CsrA family act as post-transcriptional regulators that modulate translation initiation at target transcripts. The Pseudomonas aeruginosa genome contains two phenazine biosynthetic (phz) gene clusters, phzA1-G1 (phz1) and phzA2-G2 (phz2), each of which is responsible for phenazine-1-carboxylic acid (PCA) biosynthesis. In the present study, we show that RsmA exhibits differential gene regulation on two phz clusters in P. aeruginosa M18 at the post-transcriptional level. Based on the sequence analysis, four GGA motifs, the potential RsmA binding sites, are found on the 5′-untranslated region (UTR) of the phz2 transcript. Studies with a series of lacZ reporter fusions, and gel mobility shift assays suggest that the third GGA motif (S3), located 21 nucleotides upstream of the Shine-Dalgarno (SD) sequence, is involved in direct RsmA-mediated activation of phz2 expression. We therefore propose a novel model in which the binding of RsmA to the target S3 results in the destabilization of the stem-loop structure and the enhancement of ribosome access. This model could be fully supported by RNA structure prediction, free energy calculations, and nucleotide replacement studies. In contrast, various RsmA-mediated translation repression mechanisms have been identified in which RsmA binds near the SD sequence of target transcripts, thereby blocking ribosome access. Similarly, RsmA is shown to negatively regulate phz1 expression. Our new findings suggest that the differential regulation exerted by RsmA on the two phz clusters may confer an advantage to P. aeruginosa over other pseudomonads containing only a single phz cluster in their genomes.
PMCID: PMC3933668  PMID: 24586939
7.  Systematic Identification of Synergistic Drug Pairs Targeting HIV 
Nature biotechnology  2012;30(11):1125-1130.
The systematic identification of effective drug combinations has been hindered by the unavailability of methods that can explore the large combinatorial search space of drug interactions. Here we present a multiplex screening method named MuSIC (Multiplex Screening for Interacting Compounds), which expedites the comprehensive assessment of pair-wise compound interactions. We examined ~500,000 drug pairs from 1000 FDA-approved or clinically tested drugs and identified drugs that synergize to inhibit HIV replication. Our analysis reveals an enrichment of anti-inflammatory drugs in drug combinations that synergize against HIV, indicating HIV benefits from inflammation that accompanies its infection. Multiple drug pairs identified in this study, including glucocorticoid and nitazoxanide, synergize by targeting different steps of the HIV life cycle. As inflammation accompanies HIV infection, our findings indicate that inhibiting inflammation could curb HIV propagation. MuSIC can be applied to a wide variety of disease-relevant screens to facilitate efficient identification of compound combinations.
PMCID: PMC3494743  PMID: 23064238
Combination therapy; FDA-approved drug library; HIV
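The figure of ~500,000 drug pairs from 1000 drugs quoted in the abstract above is consistent with counting unordered pairs of distinct compounds. The short check below is only a sketch under that assumption; the abstract does not state how the pairs were enumerated, and the variable names are illustrative.

```python
from math import comb

# Assumption: each unordered pair of distinct drugs is screened once.
n_drugs = 1000
n_pairs = comb(n_drugs, 2)  # n * (n - 1) / 2

print(n_pairs)  # 499500
```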
8.  Pervasive and dynamic protein binding sites of the mRNA transcriptome in Saccharomyces cerevisiae 
Genome Biology  2013;14(2):R13.
Protein-RNA interactions are integral components of nearly every aspect of biology, including regulation of gene expression, assembly of cellular architectures, and pathogenesis of human diseases. However, studies in the past few decades have only uncovered a small fraction of the vast landscape of the protein-RNA interactome in any organism, and even less is known about the dynamics of protein-RNA interactions under changing developmental and environmental conditions.
Here, we describe the gPAR-CLIP (global photoactivatable-ribonucleoside-enhanced crosslinking and immunopurification) approach for capturing regions of the untranslated, polyadenylated transcriptome bound by RNA-binding proteins (RBPs) in budding yeast. We report over 13,000 RBP crosslinking sites in untranslated regions (UTRs) covering 72% of protein-coding transcripts encoded in the genome, confirming 3' UTRs as major sites for RBP interaction. Comparative genomic analyses reveal that RBP crosslinking sites are highly conserved, and RNA folding predictions indicate that secondary structural elements are constrained by protein binding and may serve as generalizable modes of RNA recognition. Finally, 38% of 3' UTR crosslinking sites show changes in RBP occupancy upon glucose or nitrogen deprivation, with major impacts on metabolic pathways as well as mitochondrial and ribosomal gene expression.
Our study offers an unprecedented view of the pervasiveness and dynamics of protein-RNA interactions in vivo.
PMCID: PMC4053964  PMID: 23409723
9.  MiRmat: Mature microRNA Sequence Prediction 
PLoS ONE  2012;7(12):e51673.
MicroRNAs are known to be generated from primary transcripts mainly through the sequential cleavages by two enzymes, Drosha and Dicer. The sequence of a mature microRNA, especially the ‘seeding sequence’, largely determines its binding ability and specificity to target mRNAs. Therefore, methods that predict mature microRNA sequences with high accuracy will benefit the identification and characterization of novel microRNAs and their targets, and contribute to inferring the post-transcriptional regulation network at a genome scale.
Methodology/Principal Findings
We have developed a method, MiRmat, to predict the mature microRNA sequence. MiRmat is essentially composed of two parts: prediction of the Drosha processing site and identification of the Dicer processing site. Based on the analysis of microRNAs from 12 species, we found that the patterns of free energy profiles are conserved among vertebrate microRNA hairpins. Therefore, our method combines the free energy distribution pattern of the downstream part of the pri-microRNA secondary structure with the Random Forest algorithm to predict the mature microRNA sequence. In an evaluation on an independent test dataset from 10 vertebrates, MiRmat identified 77.8% of the Drosha processing sites and 92.8% of the Dicer sites within a deviation of 2 nt. In a more stringent evaluation that excluded microRNAs belonging to families shared between the training set and the test set, MiRmat still performed well, identifying 71.9% of the Drosha sites and 87.2% of the Dicer sites, which indicates that it can handle novel microRNA families. MiRmat outperforms other state-of-the-art methods and has a high degree of efficacy for the prediction of mature microRNA sequences of vertebrates.
MiRmat was developed for identifying microRNA mature sequence(s) by introducing the free energy distribution of RNA stem-loop structure and the Random Forest algorithm. We prove that MiRmat has better performance than the existing tools and is applicable among vertebrates. MiRmat is freely available at
PMCID: PMC3531441  PMID: 23300555
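The MiRmat abstract above reports identification rates "within a deviation of 2 nt". One plausible reading of that metric, sketched below, is the fraction of predicted cleavage positions that fall within 2 nt of the annotated position. The function name and the toy coordinates are hypothetical and are not taken from the paper.

```python
def fraction_within(predicted, annotated, tolerance=2):
    """Fraction of predicted sites lying within `tolerance` nt of the annotated site."""
    if len(predicted) != len(annotated) or not predicted:
        raise ValueError("need two non-empty lists of equal length")
    hits = sum(abs(p - a) <= tolerance for p, a in zip(predicted, annotated))
    return hits / len(predicted)

# Hypothetical positions (nt offsets along four hairpins), for illustration only.
predicted_sites = [21, 35, 48, 60]
annotated_sites = [22, 35, 52, 58]

rate = fraction_within(predicted_sites, annotated_sites)
print(rate)  # 0.75: deviations are 1, 0, 4 and 2 nt, so three of four count as hits
```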
10.  Construction and Analysis of an Integrated Regulatory Network Derived from High-Throughput Sequencing Data 
PLoS Computational Biology  2011;7(11):e1002190.
We present a network framework for analyzing multi-level regulation in higher eukaryotes based on systematic integration of various high-throughput datasets. The network, namely the integrated regulatory network, consists of three major types of regulation: TF→gene, TF→miRNA and miRNA→gene. We identified the target genes and target miRNAs for a set of TFs based on the ChIP-Seq binding profiles, and predicted the targets of miRNAs using annotated 3′UTR sequences and conservation information. Making use of the system-wide RNA-Seq profiles, we classified transcription factors into positive and negative regulators and assigned a sign for each regulatory interaction. Other types of edges such as protein-protein interactions and potential intra-regulations between miRNAs based on the embedding of miRNAs in their host genes were further incorporated. We examined the topological structures of the network, including its hierarchical organization and motif enrichment. We found that transcription factors downstream of the hierarchy distinguish themselves by being expressed more uniformly across tissues, having more interacting partners, and being more likely to be essential. We found an over-representation of notable network motifs, including a feed-forward loop (FFL) in which a miRNA cost-effectively shuts down a transcription factor and its target. We used data of C. elegans from the modENCODE project as a primary model to illustrate our framework, but further verified the results using two other data sets. As more and more genome-wide ChIP-Seq and RNA-Seq data become available in the near future, our methods of data integration have various potential applications.
Author Summary
The precise control of gene expression lies at the heart of many biological processes. In eukaryotes, the regulation is performed at multiple levels, mediated by different regulators such as transcription factors and miRNAs, each distinguished by different spatial and temporal characteristics. These regulators are further integrated to form a complex regulatory network responsible for the orchestration. The construction and analysis of such networks is essential for understanding the general design principles. Recent advances in high-throughput techniques like ChIP-Seq and RNA-Seq provide an opportunity by offering a huge amount of binding and expression data. We present a general framework to combine these types of data into an integrated network and perform various topological analyses, including its hierarchical organization and motif enrichment. We find that the integrated network possesses an intrinsic hierarchical organization and is enriched in several network motifs that include both transcription factors and miRNAs. We further demonstrate that the framework can be easily applied to other species like human and mouse. As more and more genome-wide ChIP-Seq and RNA-Seq data are going to be generated in the near future, our methods of data integration have various potential applications.
PMCID: PMC3219617  PMID: 22125477
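The abstract above describes a directed network with TF→gene, TF→miRNA and miRNA→gene edges, in which feed-forward loops are over-represented. The sketch below counts feed-forward loops by brute force in a toy edge list. The node names and the counting function are hypothetical, and the paper's own motif-enrichment procedure (which would also compare against randomized networks) is not reproduced here.

```python
from itertools import permutations

def count_feed_forward_loops(edges):
    """Count ordered triples (a, b, c) with edges a->b, b->c and a->c."""
    edge_set = set(edges)
    nodes = {node for edge in edge_set for node in edge}
    return sum(
        (a, b) in edge_set and (b, c) in edge_set and (a, c) in edge_set
        for a, b, c in permutations(nodes, 3)
    )

# Toy network: a miRNA represses a TF and also the TF's target gene.
edges = [
    ("miR-x", "TF1"),    # miRNA -> TF
    ("TF1", "geneA"),    # TF -> gene
    ("miR-x", "geneA"),  # miRNA -> gene (closes the feed-forward loop)
    ("TF2", "geneA"),    # unrelated regulator, not part of any loop
]

n_ffl = count_feed_forward_loops(edges)
print(n_ffl)  # 1
```

Enumerating all node triples is cubic in the number of nodes, which is acceptable for a toy example but would need an adjacency-based search for a genome-scale network.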
11.  Integrative Analysis of the Caenorhabditis elegans Genome by the modENCODE Project 
Gerstein, Mark B. | Lu, Zhi John | Van Nostrand, Eric L. | Cheng, Chao | Arshinoff, Bradley I. | Liu, Tao | Yip, Kevin Y. | Robilotto, Rebecca | Rechtsteiner, Andreas | Ikegami, Kohta | Alves, Pedro | Chateigner, Aurelien | Perry, Marc | Morris, Mitzi | Auerbach, Raymond K. | Feng, Xin | Leng, Jing | Vielle, Anne | Niu, Wei | Rhrissorrakrai, Kahn | Agarwal, Ashish | Alexander, Roger P. | Barber, Galt | Brdlik, Cathleen M. | Brennan, Jennifer | Brouillet, Jeremy Jean | Carr, Adrian | Cheung, Ming-Sin | Clawson, Hiram | Contrino, Sergio | Dannenberg, Luke O. | Dernburg, Abby F. | Desai, Arshad | Dick, Lindsay | Dosé, Andréa C. | Du, Jiang | Egelhofer, Thea | Ercan, Sevinc | Euskirchen, Ghia | Ewing, Brent | Feingold, Elise A. | Gassmann, Reto | Good, Peter J. | Green, Phil | Gullier, Francois | Gutwein, Michelle | Guyer, Mark S. | Habegger, Lukas | Han, Ting | Henikoff, Jorja G. | Henz, Stefan R. | Hinrichs, Angie | Holster, Heather | Hyman, Tony | Iniguez, A. Leo | Janette, Judith | Jensen, Morten | Kato, Masaomi | Kent, W. James | Kephart, Ellen | Khivansara, Vishal | Khurana, Ekta | Kim, John K. | Kolasinska-Zwierz, Paulina | Lai, Eric C. | Latorre, Isabel | Leahey, Amber | Lewis, Suzanna | Lloyd, Paul | Lochovsky, Lucas | Lowdon, Rebecca F. | Lubling, Yaniv | Lyne, Rachel | MacCoss, Michael | Mackowiak, Sebastian D. | Mangone, Marco | McKay, Sheldon | Mecenas, Desirea | Merrihew, Gennifer | Miller, David M. | Muroyama, Andrew | Murray, John I. | Ooi, Siew-Loon | Pham, Hoang | Phippen, Taryn | Preston, Elicia A. | Rajewsky, Nikolaus | Rätsch, Gunnar | Rosenbaum, Heidi | Rozowsky, Joel | Rutherford, Kim | Ruzanov, Peter | Sarov, Mihail | Sasidharan, Rajkumar | Sboner, Andrea | Scheid, Paul | Segal, Eran | Shin, Hyunjin | Shou, Chong | Slack, Frank J. | Slightam, Cindie | Smith, Richard | Spencer, William C. | Stinson, E. O. | Taing, Scott | Takasaki, Teruaki | Vafeados, Dionne | Voronina, Ksenia | Wang, Guilin | Washington, Nicole L. | Whittle, Christina M. | Wu, Beijing | Yan, Koon-Kiu | Zeller, Georg | Zha, Zheng | Zhong, Mei | Zhou, Xingliang | Ahringer, Julie | Strome, Susan | Gunsalus, Kristin C. | Micklem, Gos | Liu, X. Shirley | Reinke, Valerie | Kim, Stuart K. | Hillier, LaDeana W. | Henikoff, Steven | Piano, Fabio | Snyder, Michael | Stein, Lincoln | Lieb, Jason D. | Waterston, Robert H.
Science (New York, N.Y.)  2010;330(6012):1775-1787.
We systematically generated large-scale data sets to improve genome annotation for the nematode Caenorhabditis elegans, a key model organism. These data sets include transcriptome profiling across a developmental time course, genome-wide identification of transcription factor–binding sites, and maps of chromatin organization. From this, we created more complete and accurate gene models, including alternative splice forms and candidate noncoding RNAs. We constructed hierarchical networks of transcription factor–binding and microRNA interactions and discovered chromosomal locations bound by an unusually large number of transcription factors. Different patterns of chromatin composition and histone modification were revealed between chromosome arms and centers, with similarly prominent differences between autosomes and the X chromosome. Integrating data types, we built statistical models relating chromatin, transcription factor binding, and gene expression. Overall, our analyses ascribed putative functions to most of the conserved genome.
PMCID: PMC3142569  PMID: 21177976
12.  Analysis of genomic variation in non-coding elements using population-scale sequencing data from the 1000 Genomes Project 
Nucleic Acids Research  2011;39(16):7058-7076.
In the human genome, it has been estimated that considerably more sequence is under natural selection in non-coding regions [such as transcription-factor binding sites (TF-binding sites) and non-coding RNAs (ncRNAs)] compared to protein-coding ones. However, less attention has been paid to them. To study selective pressure on non-coding elements, we use next-generation sequencing data from the recently completed pilot phase of the 1000 Genomes Project, which, compared to traditional methods, allows for the characterization of a full spectrum of genomic variations, including single-nucleotide polymorphisms (SNPs), short insertions and deletions (indels) and structural variations (SVs). We develop a framework for combining these variation data with non-coding elements, calculating various population-based metrics to compare classes and subclasses of elements, and developing element-aware aggregation procedures to probe the internal structure of an element. Overall, we find that TF-binding sites and ncRNAs are less selectively constrained for SNPs than coding sequences (CDSs), but more constrained than a neutral reference. We also determine that the relative amounts of constraint for the three types of variations are, in general, correlated, but there are some differences: counter-intuitively, TF-binding sites and ncRNAs are more selectively constrained for indels than for SNPs, compared to CDSs. After inspecting the overall properties of a class of elements, we analyze selective pressure on subclasses within an element class, and show that the extent of selection is associated with the genomic properties of each subclass. We find, for instance, that ncRNAs with higher expression levels tend to be under stronger purifying selection, and the actual regions of TF-binding motifs are under stronger selective pressure than the corresponding peak regions. 
Further, we develop element-aware aggregation plots to analyze selective pressure across the linear structure of an element, with the confidence intervals evaluated using both simple bootstrapping and block bootstrapping techniques. We find, for example, that both micro-RNAs (particularly the seed regions) and their binding targets are under stronger selective pressure for SNPs than their immediate genomic surroundings. In addition, we demonstrate that substitutions in TF-binding motifs inversely correlate with site conservation, and SNPs unfavorable for motifs are under more selective constraints than favorable SNPs. Finally, to further investigate intra-element differences, we show that SVs have the tendency to use distinctive modes and mechanisms when they interact with genomic elements, such as enveloping whole gene(s) rather than disrupting them partially, as well as duplicating TF motifs in tandem.
PMCID: PMC3167619  PMID: 21596777
13.  Regulatory Feedback Loop of Two phz Gene Clusters through 5′-Untranslated Regions in Pseudomonas sp. M18 
PLoS ONE  2011;6(4):e19413.
Phenazines are important compounds produced by pseudomonads and other bacteria. Two phz gene clusters, phzA1-G1 and phzA2-G2, were found in the genome of Pseudomonas sp. M18, an effective biocontrol agent that is highly homologous to the opportunistic human pathogen P. aeruginosa PAO1. However, little is known about the correlation between the expression of the two phz gene clusters.
Methodology/Principal Findings
Two chromosomal insertion-inactivated mutants, one for each gene cluster, were constructed, and the correlation between the expression of the two phz gene clusters was investigated in strain M18. Phenazine-1-carboxylic acid (PCA) molecules produced from the phzA2-G2 gene cluster are able to auto-regulate the expression of that cluster and to activate the expression of the phzA1-G1 gene cluster in a cyclic amplification pattern. However, the post-transcriptional expression of the phzA1-G1 transcript was blocked principally through its 5′-untranslated region (UTR). In contrast, the phzA2-G2 gene cluster was transcribed to a lesser extent, translated efficiently, and negatively regulated by the GacA signal transduction pathway, mainly at a post-transcriptional level.
A single molecule, PCA, produced in different quantities by the two phz gene clusters acted as the functional mediator and the two phz gene clusters developed a specific regulatory mechanism which acts through 5′-UTR to transfer a single, but complex bacterial signaling event in Pseudomonas sp. strain M18.
PMCID: PMC3084852  PMID: 21559370
14.  Genome-Wide Identification of Binding Sites Defines Distinct Functions for Caenorhabditis elegans PHA-4/FOXA in Development and Environmental Response 
PLoS Genetics  2010;6(2):e1000848.
Transcription factors are key components of regulatory networks that control development, as well as the response to environmental stimuli. We have established an experimental pipeline in Caenorhabditis elegans that permits global identification of the binding sites for transcription factors using chromatin immunoprecipitation and deep sequencing. We describe and validate this strategy, and apply it to the transcription factor PHA-4, which plays critical roles in organ development and other cellular processes. We identified thousands of binding sites for PHA-4 during formation of the embryonic pharynx, and also found a role for this factor during the starvation response. Many binding sites were found to shift dramatically between embryos and starved larvae, from developmentally regulated genes to genes involved in metabolism. These results indicate distinct roles for this regulator in two different biological processes and demonstrate the versatility of transcription factors in mediating diverse biological roles.
Author Summary
The C. elegans transcription factor PHA-4 is a member of the highly conserved FOXA family of transcription factors. These factors act as master regulators of organ development by controlling how genes are turned off and on as tissues are formed. Additionally they regulate genes in response to nutrient levels and control both longevity and survival of the organism. However, the extent to which these factors control similar or distinct gene targets for each of these functions is unknown. For this reason, we have used the technique of chromatin immunoprecipitation followed by deep sequencing (ChIP–Seq), to define the target binding sites of PHA-4 on a genome-wide scale, when it is either functioning as an organ identity regulator or in response to environmental stress. Our data clearly demonstrate distinct sets of biologically relevant target genes for the transcription factor PHA-4 under these two different conditions. Not only have we defined PHA-4 targets, but we established an experimental ChIP–Seq pipeline to facilitate the identification of binding sites for many transcription factors in the future.
PMCID: PMC2824807  PMID: 20174564
15.  OligoWalk: an online siRNA design tool utilizing hybridization thermodynamics 
Nucleic Acids Research  2008;36(Web Server issue):W104-W108.
Given an mRNA sequence as input, the OligoWalk web server generates a list of small interfering RNA (siRNA) candidate sequences, ranked by the probability of being efficient siRNA (silencing efficacy greater than 70%). To accomplish this, the server predicts the free energy changes of the hybridization of an siRNA to a target mRNA, considering both siRNA and mRNA self-structure. The free energy changes of the structures are rigorously calculated using a partition function calculation. By changing advanced options, the free energy changes can also be calculated using less rigorous lowest free energy structure or suboptimal structure prediction methods for the purpose of comparison. Considering the predicted free energy changes and local siRNA sequence features, the server selects efficient siRNA with high accuracy using a support vector machine. On average, 78.6% of the siRNAs that the server selects as efficient are in fact efficient at silencing. The OligoWalk web server is freely accessible through the internet at
PMCID: PMC2447759  PMID: 18490376
16.  Fundamental differences in the equilibrium considerations for siRNA and antisense oligodeoxynucleotide design 
Nucleic Acids Research  2008;36(11):3738-3745.
Both siRNA and antisense oligodeoxynucleotides (ODNs) inhibit the expression of a complementary gene. In this study, fundamental differences in the considerations for RNA interference and antisense ODNs are reported. In siRNA and antisense ODN databases, positive correlations are observed between the cost to open the mRNA target self-structure and the stability of the duplex to be formed, meaning the sites along the mRNA target with highest potential to form strong duplexes with antisense strands also have the greatest tendency to be involved in pre-existing structure. Efficient siRNAs form less stable siRNA–target duplexes than inefficient siRNAs, but the opposite is true for antisense ODNs. It is, therefore, more difficult to avoid target self-structure in antisense ODN design. Self-structure stabilities of oligonucleotide and target correlate with the silencing efficacy of siRNA. Correlations between oligonucleotide self-structure and the efficacy of antisense ODNs, conversely, are insignificant. Furthermore, self-structure in the target appears to correlate with antisense ODN efficacy, but such that more effective antisense ODNs appear to target mRNA regions with greater self-structure. Therefore, different criteria are suggested for the design of efficient siRNA and antisense ODNs, and the design of antisense ODNs is more challenging.
PMCID: PMC2441788  PMID: 18483081
17.  Efficient siRNA selection using hybridization thermodynamics 
Nucleic Acids Research  2007;36(2):640-647.
Small interfering RNAs (siRNAs) are widely used to infer gene function. Here, insights into the equilibrium of siRNA-target hybridization are used for selection of efficient siRNA. The accessibilities of siRNA and target mRNA for hybridization, as measured by folding free energy change, are shown to be significantly correlated with efficacy. For this study, a partition function calculation that considers all possible secondary structures is used to predict target site accessibility, a significant improvement over calculations that consider only the predicted lowest free energy structure or a set of low free energy structures. The predicted thermodynamic features, in addition to siRNA sequence features, are used as input for a support vector machine that selects functional siRNA. The method works well for predicting efficient siRNA (efficacy >70%) in a large siRNA data set from Novartis. The positive predictive value (the percentage of sites predicted to be efficient for silencing that actually are efficient) is as high as 87.6%. The sensitivity and specificity are 22.7 and 96.5%, respectively. When tested on data from different sources, the positive predictive value increased 8.1% by adding equilibrium terms to 25 local sequence features. Prediction of hybridization affinity using partition functions is now available in the RNAstructure software package.
PMCID: PMC2241856  PMID: 18073195
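The abstract above reports positive predictive value, sensitivity and specificity for the siRNA classifier. The sketch below shows how those three quantities follow from confusion-matrix counts, using their standard definitions. The counts are hypothetical and do not reproduce the Novartis data set; the function name is illustrative.

```python
def classification_metrics(tp, fp, fn, tn):
    """Standard definitions computed from confusion-matrix counts."""
    return {
        "ppv": tp / (tp + fp),          # predicted-efficient sites that are efficient
        "sensitivity": tp / (tp + fn),  # efficient sites that were predicted efficient
        "specificity": tn / (tn + fp),  # inefficient sites that were predicted inefficient
    }

# Hypothetical counts for a conservative classifier (few, but reliable, positives).
metrics = classification_metrics(tp=20, fp=5, fn=60, tn=115)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

A conservative classifier of this kind trades sensitivity for positive predictive value, which is the same qualitative pattern as the figures quoted in the abstract (high PPV and specificity, low sensitivity).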
18.  A set of nearest neighbor parameters for predicting the enthalpy change of RNA secondary structure formation 
Nucleic Acids Research  2006;34(17):4912-4924.
A complete set of nearest neighbor parameters to predict the enthalpy change of RNA secondary structure formation was derived. These parameters can be used with available free energy nearest neighbor parameters to extend the secondary structure prediction of RNA sequences to temperatures other than 37°C. The parameters were tested by predicting the secondary structures of sequences with known secondary structure that are from organisms with known optimal growth temperatures. Compared with the previous set of enthalpy nearest neighbor parameters, the sensitivity of base pair prediction improved from 65.2 to 68.9% at optimal growth temperatures ranging from 10 to 60°C. Base pair probabilities were predicted with a partition function and the positive predictive value of structure prediction is 90.4% when considering the base pairs in the lowest free energy structure with pairing probability of 0.99 or above. Moreover, a strong correlation is found between the predicted melting temperatures of RNA sequences and the optimal growth temperatures of the host organism. This indicates that organisms that live at higher temperatures have evolved RNA sequences with higher melting temperatures.
PMCID: PMC1635246  PMID: 16982646
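The abstract above notes that enthalpy parameters let free energy parameters tabulated at 37°C be extended to other temperatures. A common way to do this, sketched below, assumes that the enthalpy and entropy changes are temperature independent, so that ΔS = (ΔH − ΔG37)/310.15 K and ΔG(T) = ΔH − TΔS. That assumption, the function name and the numeric inputs are illustrative and are not values from the paper.

```python
T37_KELVIN = 310.15  # 37 degrees C, the reference temperature of the free energy parameters

def free_energy_at(delta_h, delta_g37, temp_celsius):
    """Extrapolate a folding free energy (kcal/mol) to another temperature.

    Assumes delta_h and the derived entropy change do not vary with temperature.
    """
    delta_s = (delta_h - delta_g37) / T37_KELVIN  # kcal/(mol*K)
    return delta_h - (temp_celsius + 273.15) * delta_s

# Hypothetical motif: delta_h = -10.0 kcal/mol, delta_g37 = -2.0 kcal/mol.
dg_37 = free_energy_at(-10.0, -2.0, 37.0)  # recovers the 37 degrees C value
dg_60 = free_energy_at(-10.0, -2.0, 60.0)  # less favorable at higher temperature
print(round(dg_37, 3), round(dg_60, 3))
```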
