
Results 1-13 (13)

1.  New Insights into Dehalococcoides mccartyi Metabolism from a Reconstructed Metabolic Network-Based Systems-Level Analysis of D. mccartyi Transcriptomes 
PLoS ONE  2014;9(4):e94808.
Organohalide respiration, mediated by Dehalococcoides mccartyi, is a useful bioremediation process that transforms groundwater pollutants and known human carcinogens such as trichloroethene and vinyl chloride into benign ethenes. Successful application of this process depends on a fundamental understanding of the respiration and metabolism of D. mccartyi. Reductive dehalogenases, encoded by rdhA genes of these anaerobic bacteria, exclusively catalyze organohalide respiration and drive metabolism. To better elucidate D. mccartyi metabolism and physiology, we analyzed available transcriptomic data for a pure isolate (Dehalococcoides mccartyi strain 195) and a mixed microbial consortium (KB-1) using the previously developed pan-genome-scale reconstructed metabolic network of D. mccartyi. The transcriptomic data, together with available proteomic data, helped confirm transcription and expression of the majority of genes in D. mccartyi genomes. A composite genome of two highly similar D. mccartyi strains (KB-1 Dhc) from the KB-1 metagenome sequence was constructed, and operon prediction was conducted for this composite genome and other single genomes. This operon analysis, together with the quality threshold clustering analysis of transcriptomic data, helped generate experimentally testable hypotheses regarding the function of a number of hypothetical proteins and the poorly understood mechanism of energy conservation in D. mccartyi. We also identified functionally enriched important clusters (13 for strain 195 and 11 for KB-1 Dhc) of co-expressed metabolic genes using information from the reconstructed metabolic network. This analysis highlighted some metabolic genes and processes, including lipid metabolism, energy metabolism, and transport, that potentially play important roles in organohalide respiration.
Overall, this study shows the importance of an organism's metabolic reconstruction in analyzing various “omics” data to obtain improved understanding of the metabolism and physiology of the organism.
PMCID: PMC3986231  PMID: 24733489
2.  RNA-Seq effectively monitors gene expression in Eutrema salsugineum plants growing in an extreme natural habitat and in controlled growth cabinet conditions 
BMC Genomics  2013;14:578.
The investigation of extremophile plant species growing in their natural environment offers certain advantages, chiefly that plants adapted to severe habitats have a repertoire of stress tolerance genes that are regulated to maximize plant performance under physiologically challenging conditions. Accordingly, transcriptome sequencing offers a powerful approach to address questions concerning the influence of natural habitat on the physiology of an organism. We used RNA sequencing of Eutrema salsugineum, an extremophile relative of Arabidopsis thaliana, to investigate the extent to which genetic variation and controlled versus natural environments contribute to differences between transcript profiles.
Using 10 million cDNA reads, we compared transcriptomes from two natural Eutrema accessions (originating from Yukon Territory, Canada and Shandong Province, China) grown under controlled conditions in cabinets and those from Yukon plants collected at a Yukon field site. We assessed the genetic heterogeneity between individuals using single-nucleotide polymorphisms (SNPs) and the expression patterns of 27,016 genes. Over 39,000 SNPs distinguish the Yukon from the Shandong accessions but only 4,475 SNPs differentiated transcriptomes of Yukon field plants from an inbred Yukon line. We found 2,989 genes that were differentially expressed between the three sample groups and multivariate statistical analyses showed that transcriptomes of individual plants from a Yukon field site were as reproducible as those from inbred plants grown under controlled conditions. Predicted functions based upon gene ontology classifications show that the transcriptomes of field plants were enriched by the differential expression of light- and stress-related genes, an observation consistent with the habitat where the plants were found.
Our expectation that comparative RNA-Seq analysis of transcriptomes from plants originating in natural habitats would be confounded by uncontrolled genetic and environmental factors was not borne out. Moreover, the transcriptome data show little genetic variation between laboratory Yukon Eutrema plants and those found at a field site. Transcriptomes were reproducible and biological associations meaningful whether plants were grown in cabinets or found in the field. Thus, RNA-Seq is a valuable approach to study native plants in natural environments, and this technology can be exploited to discover new gene targets for improved crop performance under adverse conditions.
PMCID: PMC3846481  PMID: 23984645
Eutrema salsugineum; Thellungiella salsuginea; Transcriptome profiling; RNA-Seq; Salt tolerance; Natural plant populations; Single nucleotide polymorphisms; Phenotypic plasticity; Ecological genomics; Natural field conditions; Halophyte; Extremophile; Plant stress tolerance traits
3.  The embryonic leaf identity gene FUSCA3 regulates vegetative phase transitions by negatively modulating ethylene-regulated gene expression in Arabidopsis 
BMC Biology  2012;10:8.
The embryonic temporal regulator FUSCA3 (FUS3) plays major roles in the establishment of embryonic leaf identity and the regulation of developmental timing. Loss-of-function mutations of this B3 domain transcription factor result in replacement of cotyledons with leaves and precocious germination, whereas constitutive misexpression causes the conversion of leaves into cotyledon-like organs and delays vegetative and reproductive phase transitions.
Herein we show that activation of FUS3 after germination dampens the expression of genes involved in the biosynthesis and response to the plant hormone ethylene, whereas a loss-of-function fus3 mutant shows many phenotypes consistent with increased ethylene signaling. This FUS3-dependent regulation of ethylene signaling also impinges on timing functions outside embryogenesis. Loss of FUS3 function results in accelerated vegetative phase change, and this is again partially dependent on functional ethylene signaling. This alteration in vegetative phase transition is dependent on both embryonic and vegetative FUS3 function, suggesting that this important transcriptional regulator controls both embryonic and vegetative developmental timing.
The results of this study indicate that the embryonic regulator FUS3 not only controls the embryonic-to-vegetative phase transition through hormonal (ABA/GA) regulation but also functions postembryonically to delay vegetative phase transitions by negatively modulating ethylene-regulated gene expression.
PMCID: PMC3305478  PMID: 22348746
Arabidopsis; embryonic development; phase transition; FUSCA3; hormones; ethylene
4.  The role of the Arabidopsis FUSCA3 transcription factor during inhibition of seed germination at high temperature 
BMC Plant Biology  2012;12:15.
Imbibed seeds integrate environmental and endogenous signals to break dormancy and initiate growth under optimal conditions. Seed maturation plays an important role in determining the survival of germinating seeds; for example, one of the roles of dormancy is to stagger germination to prevent mass growth under suboptimal conditions. The B3-domain transcription factor FUSCA3 (FUS3) is a master regulator of seed development and an important node in hormonal interaction networks in Arabidopsis thaliana. Its function has been mainly characterized during embryonic development, where FUS3 is highly expressed to promote seed maturation and dormancy by regulating ABA/GA levels.
In this study, we present evidence for a role of FUS3 in delaying seed germination at supraoptimal temperatures that would be lethal for the developing seedlings. During seed imbibition at supraoptimal temperature, the FUS3 promoter is reactivated and induces de novo synthesis of FUS3 mRNA, followed by FUS3 protein accumulation. Genetic analysis shows that FUS3 contributes to the delay of seed germination at high temperature. Unlike WT, seeds overexpressing FUS3 (ML1:FUS3-GFP) during imbibition are hypersensitive to high temperature and do not germinate; however, they can fully germinate after recovery at control temperature, reaching 90% seedling survival. ML1:FUS3-GFP hypersensitivity to high temperature can be partly recovered in the presence of fluridone, an inhibitor of ABA biosynthesis, suggesting that this hypersensitivity is due in part to a higher ABA level in this mutant. Transcriptomic analysis shows that WT seeds imbibed at supraoptimal temperature activate seed-specific genes and ABA biosynthetic and signaling genes, while inhibiting genes that promote germination and growth, such as GA biosynthetic and signaling genes.
In this study, we have uncovered a novel function for the master regulator of seed maturation, FUS3, in delaying germination at supraoptimal temperature. Physiologically, this is important since delaying germination has a protective role at high temperature. Transcriptomic analysis of seeds imbibed at supraoptimal temperature reveals that a complex program is in place, which involves not only the regulation of heat and dehydration response genes to adjust cellular functions, but also the activation of seed-specific programs and the inhibition of germination-promoting programs to delay germination.
PMCID: PMC3296646  PMID: 22279962
High temperature; FUSCA3; Seed germination; Hormones; ABA; Transcriptome
5.  The Re-Establishment of Desiccation Tolerance in Germinated Arabidopsis thaliana Seeds and Its Associated Transcriptome 
PLoS ONE  2011;6(12):e29123.
The combination of robust physiological models with “omics” studies holds promise for the discovery of genes and pathways linked to how organisms deal with drying. Here we used a transcriptomics approach in combination with an in vivo physiological model of re-establishment of desiccation tolerance (DT) in Arabidopsis thaliana seeds. We show that the incubation of desiccation-sensitive (DS) germinated Arabidopsis seeds in a polyethylene glycol (PEG) solution re-induces the mechanisms necessary for expression of DT. Based on a SNP-tile array gene expression profile, our data indicate that the re-establishment of DT in this system is related to a programmed reversion from a metabolically active state to a quiescent state similar to that before germination. Our findings show that transcripts of germinated seeds after the PEG treatment are dominated by those encoding LEA, seed storage and dormancy-related proteins. On the other hand, a massive repression of genes belonging to many other classes, such as photosynthesis, cell wall modification and energy metabolism, occurs in parallel. Furthermore, comparison with a similar system for Medicago truncatula reveals a significant overlap between the two transcriptomes. Such overlap may highlight core mechanisms and key regulators of the trait DT. Taking into account the availability of the many genetic and molecular resources for Arabidopsis, the described system may prove useful for unraveling DT in higher plants.
PMCID: PMC3237594  PMID: 22195004
6.  MetaBase—the wiki-database of biological databases 
Nucleic Acids Research  2011;40(Database issue):D1250-D1254.
Biology is generating more data than ever. As a result, there is an ever increasing number of publicly available databases that analyse, integrate and summarize the available data, providing an invaluable resource for the biological community. As this trend continues, there is a pressing need to organize, catalogue and rate these resources, so that the information they contain can be most effectively exploited. MetaBase (MB) (http://MetaDatabase.Org) is a community-curated database containing more than 2000 commonly used biological databases. Each entry is structured using templates and can carry various user comments and annotations. Entries can be searched, listed, browsed or queried. The database was created using the same MediaWiki technology that powers Wikipedia, allowing users to contribute on many different levels. The initial release of MB was derived from the content of the 2007 Nucleic Acids Research (NAR) Database Issue. Since then, approximately 100 databases have been manually collected from the literature, and users have added information for over 240 databases. MB is synchronized annually with the static Molecular Biology Database Collection provided by NAR. To date, there have been 19 significant contributors to the project; each one is listed as an author here to highlight the community aspect of the project.
PMCID: PMC3245051  PMID: 22139927
7.  ePlant and the 3D Data Display Initiative: Integrative Systems Biology on the World Wide Web 
PLoS ONE  2011;6(1):e15237.
Visualization tools for biological data are often limited in their ability to interactively integrate data at multiple scales. These computational tools are also typically limited by two-dimensional displays and programmatic implementations that require separate configurations for each of the user's computing devices and recompilation for functional expansion. Towards overcoming these limitations we have developed “ePlant”, a suite of open-source world wide web-based tools for the visualization of large-scale data sets from the model organism Arabidopsis thaliana. These tools display data spanning multiple biological scales on interactive three-dimensional models. Currently, ePlant consists of the following modules: a sequence conservation explorer that includes homology relationships and single nucleotide polymorphism data, a protein structure model explorer, a molecular interaction network explorer, a gene product subcellular localization explorer, and a gene expression pattern explorer. The ePlant's protein structure explorer module represents experimentally determined and theoretical structures covering >70% of the Arabidopsis proteome. The ePlant framework is accessed entirely through a web browser, and is therefore platform-independent. It can be applied to any model organism. To facilitate the development of three-dimensional displays of biological data on the world wide web we have established the “3D Data Display Initiative”.
PMCID: PMC3018417  PMID: 21249219
8.  Abscisic acid inhibits PP2Cs via the PYR/PYL family of ABA-binding START proteins 
Science (New York, N.Y.)  2009;324(5930):1068-1071.
Analysis of a synthetic ABA agonist uncovers a new family of ABA binding proteins that control signal transduction by directly regulating the activity of type 2C protein phosphatases.
PP2Cs are vital phosphatases that play important roles in abscisic acid (ABA) signaling. Using chemical genetics, we previously identified a synthetic growth inhibitor called pyrabactin. Here we show that pyrabactin is a selective ABA agonist that acts through PYR1, the founding member of a family of START proteins called PYR/PYLs, which are necessary for both pyrabactin and ABA signaling in vivo. We show that ABA binds to PYR1, which in turn binds to and inhibits PP2Cs. We therefore suggest that PYR/PYLs are ABA-receptors that function at the apex of a negative regulatory pathway that controls ABA signaling by inhibiting PP2Cs. Our results illustrate the power of small-molecule approaches for sidestepping the functional redundancy that hampers genetic analysis.
PMCID: PMC2827199  PMID: 19407142
9.  An extensive (co-)expression analysis tool for the cytochrome P450 superfamily in Arabidopsis thaliana 
BMC Plant Biology  2008;8:47.
Sequencing of the first plant genomes has revealed that cytochromes P450 have evolved to become the largest family of enzymes in secondary metabolism. The proportion of P450 enzymes with characterized biochemical function(s) is however very small. If P450 diversification mirrors evolution of chemical diversity, this points to an unexpectedly poor understanding of plant metabolism. We assumed that extensive analysis of gene expression might guide towards the function of P450 enzymes, and highlight overlooked aspects of plant metabolism.
We have created a comprehensive database, 'CYPedia', describing P450 gene expression in four data sets: organs and tissues, stress response, hormone response, and mutants of Arabidopsis thaliana, based on public Affymetrix ATH1 microarray expression data. P450 expression was then combined with the expression of 4,130 re-annotated genes, predicted to act in plant metabolism, for co-expression analyses. Based on the annotation of co-expressed genes from diverse pathway annotation databases, co-expressed pathways were identified. Predictions were validated for most P450s with known functions. As examples, co-expression results for P450s related to plastidial functions/photosynthesis, and to phenylpropanoid, triterpenoid and jasmonate metabolism are highlighted here.
The large scale hypothesis generation tools presented here provide leads to new pathways, unexpected functions, and regulatory networks for many P450s in plant metabolism. These can now be exploited by the community to validate the proposed functions experimentally using reverse genetics, biochemistry, and metabolic profiling.
PMCID: PMC2383897  PMID: 18433503
10.  Combining classifiers to predict gene function in Arabidopsis thaliana using large-scale gene expression measurements 
BMC Bioinformatics  2007;8:358.
Arabidopsis thaliana is the model species of current plant genomic research with a genome size of 125 Mb and approximately 28,000 genes. The function of half of these genes is currently unknown. The purpose of this study is to infer gene function in Arabidopsis using machine-learning algorithms applied to large-scale gene expression data sets, with the goal of identifying genes that are potentially involved in plant response to abiotic stress.
Using in house and publicly available data, we assembled a large set of gene expression measurements for A. thaliana. Using those genes of known function, we first evaluated and compared the ability of basic machine-learning algorithms to predict which genes respond to stress. Predictive accuracy was measured using ROC50 and precision curves derived through cross validation. To improve accuracy, we developed a method for combining these classifiers using a weighted-voting scheme. The combined classifier was then trained on genes of known function and applied to genes of unknown function, identifying genes that potentially respond to stress. Visual evidence corroborating the predictions was obtained using electronic Northern analysis. Three of the predicted genes were chosen for biological validation. Gene knockout experiments confirmed that all three are involved in a variety of stress responses. The biological analysis of one of these genes (At1g16850) is presented here, where it is shown to be necessary for the normal response to temperature and NaCl.
Supervised learning methods applied to large-scale gene expression measurements can be used to predict gene function. However, the ability of basic learning methods to predict stress response varies widely and depends heavily on how much dimensionality reduction is used. Our method of combining classifiers can improve the accuracy of such predictions – in this case, predictions of genes involved in stress response in plants – and it effectively chooses the appropriate amount of dimensionality reduction automatically. The method provides a useful means of identifying genes in A. thaliana that potentially respond to stress, and we expect it would be useful in other organisms and for other gene functions.
PMCID: PMC2213690  PMID: 17888165
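The weighted-voting combination described in entry 10 can be illustrated with a minimal sketch. The abstract does not say how the weights were chosen or how the votes were normalised, so the function name `weighted_vote`, the use of per-classifier scores in [0, 1], and the normalisation by the sum of weights are all assumptions made for illustration, not the authors' method.

```python
def weighted_vote(scores, weights):
    """Combine per-classifier scores into one score per gene.

    scores  -- list of lists; scores[i][g] is classifier i's score for gene g
    weights -- one non-negative weight per classifier (hypothetical: e.g. a
               cross-validated accuracy estimate for that classifier)
    """
    if len(scores) != len(weights):
        raise ValueError("need exactly one weight per classifier")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    n_genes = len(scores[0])
    # Weighted average of the classifiers' scores, gene by gene.
    return [
        sum(w * s[g] for w, s in zip(weights, scores)) / total
        for g in range(n_genes)
    ]


# Two classifiers that disagree on two genes; the first is trusted three
# times as much as the second.
combined = weighted_vote([[1.0, 0.0], [0.0, 1.0]], [3, 1])
```

Genes whose combined score exceeds a chosen threshold would then be reported as candidate stress-response genes; the threshold itself is another free parameter of the sketch.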
11.  An “Electronic Fluorescent Pictograph” Browser for Exploring and Analyzing Large-Scale Biological Data Sets 
PLoS ONE  2007;2(8):e718.
The exploration of microarray data and data from other high-throughput projects for hypothesis generation has become a vital aspect of post-genomic research. For the non-bioinformatics specialist, however, many of the currently available tools provide overwhelming amounts of data that are presented in a non-intuitive way.
Methodology/Principal Findings
In order to facilitate the interpretation and analysis of microarray data and data from other large-scale data sets, we have developed a tool, which we have dubbed the electronic Fluorescent Pictograph – or eFP – Browser, for exploring microarray and other data for hypothesis generation. This eFP Browser engine paints data from large-scale data sets onto pictographic representations of the experimental samples used to generate the data sets. We give examples of using the tool to present Arabidopsis gene expression data from the AtGenExpress Consortium (Arabidopsis eFP Browser), data for subcellular localization of Arabidopsis proteins (Cell eFP Browser), and mouse tissue atlas microarray data (Mouse eFP Browser).
The eFP Browser software is easily adaptable to microarray or other large-scale data sets from any organism and thus should prove useful to a wide community for visualizing and interpreting these data sets for hypothesis generation.
PMCID: PMC1934936  PMID: 17684564
12.  C-terminal motif prediction in eukaryotic proteomes using comparative genomics and statistical over-representation across protein families 
BMC Genomics  2007;8:191.
The carboxy termini of proteins are a frequent site of activity for a variety of biologically important functions, ranging from post-translational modification to protein targeting. Several short peptide motifs involved in protein sorting roles and dependent upon their proximity to the C-terminus for proper function have already been characterized. As a limited number of such motifs have been identified, the potential exists for genome-wide statistical analysis and comparative genomics to reveal novel peptide signatures functioning in a C-terminal dependent manner. We have applied a novel methodology to the prediction of C-terminal-anchored peptide motifs involving a simple z-statistic and several techniques for improving the signal-to-noise ratio.
We examined the statistical over-representation of position-specific C-terminal tripeptides in 7 eukaryotic proteomes. Sequence randomization models and simple-sequence masking were applied to the successful reduction of background noise. Similarly, as C-terminal homology among members of large protein families may artificially inflate tripeptide counts in an irrelevant and obfuscating manner, gene-family clustering was performed prior to the analysis in order to assess tripeptide over-representation across protein families as opposed to across all proteins. Finally, comparative genomics was used to identify tripeptides significantly occurring in multiple species. This approach has been able to predict, to our knowledge, all C-terminally anchored targeting motifs present in the literature. These include the PTS1 peroxisomal targeting signal (SKL*), the ER-retention signal (K/HDEL*), the ER-retrieval signal for membrane bound proteins (KKxx*), the prenylation signal (CC*) and the CaaX box prenylation motif. In addition to a high statistical over-representation of these known motifs, a collection of significant tripeptides with a high propensity for biological function exists between species, among kingdoms and across eukaryotes. Motifs of note include a serine-acidic peptide (DSD*) as well as several lysine enriched motifs found in nearly all eukaryotic genomes examined.
We have successfully generated a high confidence representation of eukaryotic motifs anchored at the C-terminus. A high incidence of true-positives in our results suggests that several previously unidentified tripeptide patterns are strong candidates for representing novel peptide motifs of a widely employed nature in the C-terminal biology of eukaryotes. Our application of comparative genomics, statistical over-representation and the adjustment for protein family homology has generated several hypotheses concerning the C-terminal topology as it pertains to sorting and potential protein interaction signals. This approach to background reduction could be expanded for application to protein motif prediction in the protein interior. A parallel N-terminal analysis is presented as supplementary data.
PMCID: PMC1929074  PMID: 17594486
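The over-representation test for C-terminal tripeptides in entry 12 can be sketched as follows. The abstract mentions only "a simple z-statistic"; the null model used here (independent residues drawn at proteome-wide frequencies, with a binomial variance) is an assumption chosen for illustration, and the function name `cterm_tripeptide_z` is hypothetical. The randomization, masking, and gene-family clustering steps described in the abstract are not reproduced.

```python
from collections import Counter
from math import sqrt


def cterm_tripeptide_z(sequences):
    """Return {tripeptide: z-score} for the last three residues of each protein."""
    seqs = [s for s in sequences if len(s) >= 3]
    n = len(seqs)

    # Assumed null model: residue frequencies pooled over all sequences.
    residue_counts = Counter("".join(seqs))
    total = sum(residue_counts.values())
    freq = {aa: c / total for aa, c in residue_counts.items()}

    # Observed count of each tripeptide at the extreme C-terminus.
    observed = Counter(s[-3:] for s in seqs)

    scores = {}
    for tri, obs in observed.items():
        p = freq[tri[0]] * freq[tri[1]] * freq[tri[2]]
        sd = sqrt(n * p * (1 - p))  # binomial standard deviation
        if sd == 0:
            continue  # degenerate background; z is undefined
        scores[tri] = (obs - n * p) / sd
    return scores


# Toy proteome: three proteins end in the PTS1-like tripeptide SKL.
toy = ["MAAASKL", "MGGGSKL", "MCCCSKL", "MDDDAAA"]
z = cterm_tripeptide_z(toy)
```

In this toy input the recurring SKL terminus scores well above the single AAA terminus, which is the kind of ranking the approach relies on.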
13.  CapsID: a web-based tool for developing parsimonious sets of CAPS molecular markers for genotyping 
BMC Genetics  2006;7:27.
Genotyping may be carried out by a number of different methods, including direct sequencing and polymorphism analysis. PCR-based polymorphism analysis may be desirable because only small amounts of genetic material are required and the costs are low. One popular and cheap method for detecting polymorphisms is the use of cleaved amplified polymorphic sequence, or CAPS, molecular markers. These are also known as PCR-RFLP markers.
We have developed a program, called CapsID, that identifies snip-SNPs (single nucleotide polymorphisms that alter restriction endonuclease cut sites) within a set or sets of reference sequences, designs PCR primers around these, and then suggests the most parsimonious combination of markers for genotyping any individual who is not a member of the reference set. The output page includes biologist-friendly features, such as images of virtual gels to assist in genotyping efforts. CapsID is freely available.
CapsID is a tool that can rapidly provide minimal sets of CAPS markers for molecular identification purposes for any biologist working in genetics, community genetics, plant and animal breeding, forensics and other fields.
PMCID: PMC1471797  PMID: 16686952
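The idea of a snip-SNP in entry 13 (a polymorphism that creates or destroys a restriction site) can be illustrated with a short sketch. This is not CapsID's algorithm: the function names are hypothetical, the two sequences are assumed to be already aligned and of equal length with no indels, and only the given strand is scanned, which is sufficient for palindromic recognition sites such as the two used here. Primer design and marker-set minimisation are not covered.

```python
def site_positions(seq, site):
    """Return 0-based start indices of every occurrence of a recognition site."""
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start)
        start = seq.find(site, start + 1)  # allow overlapping matches
    return positions


def differential_sites(seq_a, seq_b, enzymes):
    """Map enzyme name -> positions where a site occurs in exactly one sequence."""
    result = {}
    for name, site in enzymes.items():
        in_a = set(site_positions(seq_a, site))
        in_b = set(site_positions(seq_b, site))
        diff = sorted(in_a ^ in_b)  # symmetric difference: present in one only
        if diff:
            result[name] = diff
    return result


# Recognition sequences for two common enzymes.
ENZYMES = {"EcoRI": "GAATTC", "HindIII": "AAGCTT"}

# A single A->G change in the second sequence destroys the EcoRI site.
markers = differential_sites("TTGAATTCAA", "TTGAGTTCAA", ENZYMES)
```

Each entry in `markers` is a candidate CAPS marker: digesting a PCR product spanning that position would give different fragment patterns for the two genotypes.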
