Genetic regulatory networks (GRNs) are complex, large-scale, and spatially and temporally distributed. These characteristics impose challenging demands on computational GRN modeling tools, and there is a need for custom modeling tools. In this paper, we report on our ongoing development of BioTapestry, an open source, freely available computational tool designed specifically for GRN modeling. We also outline our future development plans, and give some examples of current applications of BioTapestry.
As our understanding of genetic regulatory networks (GRNs) increases, ever more complex networks are studied. Ad-hoc ways of describing such networks (e.g. using generic drawing tools) are inefficient and inadequate, and there is an increasing need for specialized software.
In this paper, we present a software tool we have developed, called BioTapestry (http://www.BioTapestry.org/), which has been designed from the ground up to model GRNs. BioTapestry is free, open source, and runs on all popular computer platforms. Below, we illustrate some of the ways in which BioTapestry facilitates GRN modeling. Since BioTapestry is an active and ongoing project, we will also outline the requirements that will guide our future development efforts. We follow this with a roadmap on how to get started using BioTapestry, and finally give some examples of current applications of the software.
The architecture of a GRN arises directly from the DNA sequence of the genome, and a GRN model is directly testable by DNA manipulations. Thus, the representation of GRNs must be genome oriented, with specific emphasis placed on the predicted DNA inputs that form the basis of the model. Furthermore, the GRN needs to be viewable at a number of different levels, from the whole, to the subcircuits, to the cis-regulatory DNA, and to the nucleotide sequence.
General-purpose network layout and presentation tools do not provide an appropriate level and style of abstraction for modeling GRNs. Many pathway modeling tools represent molecular interaction networks at the level of biochemical reactions. Because of the large number of reactions involved, representing GRNs as a set of biochemical reactions can result in overwhelmingly complex diagrams and obscure the regulatory architecture of GRNs. Moreover, the necessary biochemical data is rarely available to characterize such detailed views. On the other hand, overly abstract representations, such as those used in graph visualization software, lead to ambiguous network diagrams that convey little information.
Figure 1A shows a GRN diagram drawn in a common graph-layout style. Compare this to the BioTapestry diagram in Figure 1B. The BioTapestry view immediately conveys a number of key concepts absent from the graph view. First, cis-regulatory relationships are very easy to see and decipher in the BioTapestry view. Second, not all nodes are equal in BioTapestry layouts: BioTapestry uses automated layout templates to highlight regulatory relationships among the genes. For example, in this figure, upstream regulators are placed near the top and to the left, while downstream genes are cascaded towards the right and bottom.
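The layering idea behind such templates can be sketched in a few lines. The Python below is a minimal illustration under our own assumptions, not BioTapestry's actual layout algorithm (which is implemented in Java). Each gene's row is its shortest regulatory distance from genes that have no regulators in the network, so upstream regulators land at the top. Genes reachable only through feedback cycles are placed in a final row.

```python
from collections import deque


def regulatory_layers(links):
    """Assign each gene a layout row: 0 for top-level regulators, larger downstream.

    links: iterable of (source, target) pairs. A gene's row is its shortest
    distance (in links) from any gene with no incoming regulators; self-loops
    are ignored when deciding which genes are top-level. Genes not reachable
    from a top-level gene (e.g. inside an isolated feedback cycle) are placed
    one row below the deepest reachable gene.
    """
    nodes, targets, regulated = set(), {}, set()
    for src, trg in links:
        nodes.update((src, trg))
        targets.setdefault(src, []).append(trg)
        if src != trg:
            regulated.add(trg)
    roots = sorted(nodes - regulated)
    layer = {gene: 0 for gene in roots}
    queue = deque(roots)
    while queue:  # breadth-first search yields shortest distances
        gene = queue.popleft()
        for trg in targets.get(gene, []):
            if trg not in layer:
                layer[trg] = layer[gene] + 1
                queue.append(trg)
    deepest = max(layer.values(), default=-1)
    for gene in nodes - set(layer):
        layer[gene] = deepest + 1
    return layer


links = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D"), ("D", "C")]
print(regulatory_layers(links))  # {'A': 0, 'B': 1, 'C': 1, 'D': 2}
```

Here C sits in row 1 because A regulates it directly, even though it is also downstream of B, and D, which forms a feedback loop with C, lands in row 2. Ordering genes left to right within a row is a separate step not shown.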
A key feature of GRNs is that a single gene will typically perform different regulatory interactions in different cells and at different times. A single static view of a GRN cannot convey the way a gene becomes part of different processes and functional modules in different cells and times. Figure 1B also shows how BioTapestry provides a hierarchical representation of GRNs, which allows a user to track a GRN within a given group of cells over time, or to compare GRN states between different cells at any given time.
Finally, BioTapestry is designed to facilitate the process of GRN model building and provides extensive support for network annotation and curation. We discuss these features in depth below.
BioTapestry supports a symbolic representation of genes, their products, and their interactions, which emphasizes regulatory and experimentally-derived network features.
The most important concept to be communicated in GRN visualization is how the transcription of a gene is regulated by other genes in the network. This crucial information must be instantly recognizable from a cursory inspection of the network diagram. The representation of a gene, and in particular the cis-regulatory region of the gene, must be unique, structured, and organized in a fashion that stands out and is quickly understandable.
BioTapestry depicts a gene with the commonly used shorthand representation shown in Figure 2A. The key feature is an explicit schematic representation of the cis-regulatory modules of the gene. As shown, multiple binding sites for the same transcription factor, and multiple cis-regulatory modules can be depicted. Potentially important regulatory features, such as the spatial ordering of transcription factor binding sites on DNA, are also preserved. Furthermore, each regulatory input can be provided with a colored annotation tag for documentation purposes. For example, we often use colored diamond symbols to indicate the type of experimental evidence available for each binding site.
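As a rough illustration of what this representation has to record, the gene, its cis-regulatory modules, and its ordered binding sites can be modeled as nested records. This Python sketch is our own simplified data model, not BioTapestry's internal one; the field names and the free-text evidence tag are assumptions.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class BindingSite:
    factor: str          # transcription factor that binds this site
    sign: int            # +1 for activation, -1 for repression
    evidence: str = ""   # free-text annotation tag (e.g. type of experimental evidence)


@dataclass
class CisRegulatoryModule:
    name: str
    # Sites are kept in list order, so their spatial order along the DNA is preserved.
    sites: List[BindingSite] = field(default_factory=list)


@dataclass
class Gene:
    name: str
    modules: List[CisRegulatoryModule] = field(default_factory=list)

    def regulators(self) -> List[str]:
        """Distinct input factors, in order of first appearance along the DNA."""
        seen: List[str] = []
        for module in self.modules:
            for site in module.sites:
                if site.factor not in seen:
                    seen.append(site.factor)
        return seen

    def site_count(self, factor: str) -> int:
        """Number of binding sites for one factor across all modules."""
        return sum(site.factor == factor for m in self.modules for site in m.sites)


gene = Gene("geneX", [
    CisRegulatoryModule("module 1", [BindingSite("A", +1, "direct"), BindingSite("B", -1)]),
    CisRegulatoryModule("module 2", [BindingSite("A", +1), BindingSite("C", +1)]),
])
print(gene.regulators(), gene.site_count("A"))  # ['A', 'B', 'C'] 2
```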
The above “cartoon” representation facilitates whole-network visualization of GRNs. To allow more in-depth evaluation, BioTapestry provides a data page for each network element (e.g. a gene or an interaction). These data pages can be customized by the user to display tables, figures and other data, or illustrations of the internal cis-regulatory logic of a gene. We plan to enhance BioTapestry in the future to permit multiple drill-down displays, and to make it easier to customize the data page.
BioTapestry depicts off-DNA interactions as simple, compact and distinct symbols that quickly provide a sense of the general nature of the process, and its regulatory inputs and outputs, while hiding the details. In this way, complex processes such as signal transduction are modeled in terms of their regulatory role within the GRN of interest. This makes it much easier to understand the GRN at a glance, and allows uncluttered visualization of large-scale GRNs.
BioTapestry's collection of symbols for off-DNA actions and interactions is shown in Figure 2A. The symbols are designed to provide enough information that a viewer can “mentally fill in” the details from general knowledge. For example, the canonical Wnt pathway may be communicated by a single labeled symbol. In general, pathways that do not include multiple regulatory inputs can be summarized by a single input-output symbol to avoid clutter. In cases where the details do need to be accessible to the user, right-clicking on the symbol allows the user to view the same type of (user-customizable) pop-up data page containing tables, figures and other data, as is provided for genes.
Post-transcriptional processes that are not differentially regulated within a GRN of interest are not represented explicitly in BioTapestry. The outputs of transcription factor genes are typically shown as direct inputs into the regulated gene targets, with an implicit understanding of what that simplification represents. Sometimes, however, post-transcriptional steps are critical components of the regulatory behavior of the network, e.g. translation inhibition. In these cases, a linked series of one or more off-DNA symbols can be inserted into, and replace, the simple direct input link (see Figure 2B). Similarly, if the gene creates multiple distinct products that are relevant to the regulatory function, the single gene output can be split into several tagged links, one for each product (Figure 2C). The same approach can be used to model the regulation of gene outputs via interactions with microRNAs (miRNAs), such as destruction of mRNA or interference with translation. Transcription of miRNAs is regulated in the same fashion as that of any other gene, and miRNA genes are modeled in BioTapestry in the same way. Regulation by miRNAs is modeled as a post-transcriptional interaction, as illustrated in Figure 2D.
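At the level of the underlying link list, inserting off-DNA steps amounts to splicing a chain of intermediate nodes into one link. A minimal Python sketch, assuming links are stored as plain (source, target) pairs; the node names in the example, including the translation step and the miRNA gene, are hypothetical.

```python
from typing import List, Sequence, Tuple

Link = Tuple[str, str]


def insert_intermediates(links: List[Link], link: Link,
                         intermediates: Sequence[str]) -> List[Link]:
    """Return a new link list in which `link` is replaced by the chain
    source -> intermediates[0] -> ... -> target, keeping the original order."""
    if link not in links:
        raise ValueError(f"no such link: {link}")
    source, target = link
    chain = [source, *intermediates, target]
    hops = list(zip(chain, chain[1:]))  # consecutive pairs become the new links
    i = links.index(link)
    return links[:i] + hops + links[i + 1:]


links = [("geneX", "geneY"), ("geneY", "geneZ")]
# Make the translation of geneX explicit so that a miRNA can act on it.
links = insert_intermediates(links, ("geneX", "geneY"), ["geneX translation"])
links.append(("miR-gene", "geneX translation"))
print(links)
```

A fuller model would also record that the miRNA input is repressive; the sketch only shows how the topology changes.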
BioTapestry uses a variety of strategies to facilitate the visualization of large numbers of genetic linkages:
Although the outputs from a single source are typically rendered in a uniform fashion, BioTapestry allows the user to optionally highlight particular links by properties such as thickness, color, or line style. In addition, certain presentation properties of links, genes, and other nodes can be assigned to specific model properties. For example, the thickness of a link can be tied to the type of experimental evidence available. Such links are rendered accordingly even if the model is laid out differently (see Figure 3C).
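The key point is that the style is computed from model properties rather than stored with a particular layout, which is why it survives re-layout. A minimal Python sketch of such a mapping; the evidence categories, widths, and colors below are hypothetical choices, not BioTapestry defaults.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class LinkStyle:
    width: float
    color: str
    dashed: bool


# Hypothetical evidence categories mapped to rendering styles.
STYLE_BY_EVIDENCE = {
    "cis-regulatory analysis": LinkStyle(3.0, "black", False),
    "perturbation": LinkStyle(2.0, "black", False),
    "inferred": LinkStyle(1.0, "gray", True),
}
DEFAULT_STYLE = LinkStyle(1.0, "gray", True)


def style_for(link_properties: Mapping[str, str]) -> LinkStyle:
    """Derive a link's style from its model properties, independent of layout."""
    return STYLE_BY_EVIDENCE.get(link_properties.get("evidence", ""), DEFAULT_STYLE)


print(style_for({"evidence": "perturbation"}).width)  # 2.0
print(style_for({}) == DEFAULT_STYLE)                  # True
```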
The same underlying GRN behaves differently in different cell types, spatial domains, and environmental conditions, and at different times. BioTapestry is designed to help the user to organize these varying views of the network state in a coherent fashion, while helping the user to understand how these views are derived from the single underlying GRN. As illustrated in Figure 4, BioTapestry uses a three-level hierarchy to describe a GRN:
Each of these hierarchical views provides a different perspective on the GRN. A researcher can start their exploration of the network at any level, depending on the data available and the researcher's interests. For example, VfGs offer a natural perspective on each gene's full regulatory program within a GRN. However, to study functional motifs in the network (e.g. mutual exclusion), which are highly dependent on specific temporal and spatial conditions, VfN diagrams would be the most appropriate view.
BioTapestry makes it easy to create and organize GRN models using this approach. The hierarchical framework is general enough to be useful in many different ways. For example, one can use the lower levels of the hierarchy to depict variations in network behavior due to different experimental conditions. Alternatively, submodels can be used to highlight network components discovered with particular experimental methods, or to highlight subsets of genes or interactions that meet some significant selection criteria.
In BioTapestry, all the network models are automatically kept consistent across additions and deletions of network elements. For example, when a user inserts a node into an existing link in the root BioTapestry model, the program propagates the new node to all submodels that include an instance of the link. Since links may be laid out differently, and there may be multiple link copies spanning multiple regions in submodels, this can be a complex process and its automation greatly enhances model consistency and integrity. Along these same lines, we plan to add even more features that will help the user to easily propagate newly added network features to targeted regions and submodels.
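The propagation rule can be illustrated with a small model tree. In this Python sketch, which simplifies under our own assumptions, each submodel holds a subset of its parent's links, and inserting a node into a link splits that link in the model and in all of its descendants. Real BioTapestry submodels can hold several copies of a link spanning different regions, which this sketch does not attempt to model.

```python
from typing import Iterable, List, Optional, Tuple

Link = Tuple[str, str]


class Model:
    """A network model whose links must be a subset of its parent's links."""

    def __init__(self, name: str, links: Iterable[Link] = (),
                 parent: Optional["Model"] = None):
        self.name = name
        self.links: List[Link] = list(links)
        self.children: List["Model"] = []
        if parent is not None:
            if not set(self.links) <= set(parent.links):
                raise ValueError(f"{name} contains links missing from {parent.name}")
            parent.children.append(self)

    def insert_node(self, link: Link, node: str) -> None:
        """Split `link` at `node` here and in every descendant model that has it."""
        source, target = link
        new_links: List[Link] = []
        for existing in self.links:
            if existing == link:
                new_links += [(source, node), (node, target)]
            else:
                new_links.append(existing)
        self.links = new_links
        for child in self.children:
            child.insert_node(link, node)


root = Model("root", [("A", "B"), ("B", "C")])
region = Model("region 1", [("A", "B")], parent=root)
stage = Model("region 1, early", [("A", "B")], parent=region)
root.insert_node(("A", "B"), "X")
print(stage.links)  # [('A', 'X'), ('X', 'B')]
```

Because every level is split the same way, the subset invariant checked in the constructor still holds after the insertion.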
As we discussed above, network elements are simplified, abstract representations of complex subsystems, be they detailed cis-regulatory logic networks controlling a gene or complex off-DNA interactions such as lengthy signal transduction pathways. Often, it is useful to have access to more detailed information.
In BioTapestry, right-clicking on any gene or symbol in a network gives the user the option to pop up a data display page. For genes, this page is typically configured to display raw perturbation data, the generic expression and interaction data tables that are used to drive the dynamic models, or arbitrary user-specified text. However, the page is customizable using small code plug-ins. In particular, it is easy to write a plug-in that displays data from a web server.
The plug-in approach allows for great flexibility in what data can be displayed, which is important given the wide variety of information that may be relevant. For example, what evidence is available to support the conclusion that a regulatory input is direct? What data have been used to determine that a gene is expressed in a particular cell type and time? What are the exact cell type, genetic background and experimental conditions that a model is based on? Which particular member of a family does a gene or gene product refer to? Potentially, all these types of information can be essential to fully understanding a model and its limitations.
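To make the plug-in idea concrete, here is a minimal registry in Python in which each kind of network element is paired with a function that renders its data page. This is a hypothetical sketch: BioTapestry's actual plug-ins are Java classes with their own interface, and the example.org wiki URL is a placeholder standing in for a lab's web server.

```python
import html
from typing import Callable, Dict
from urllib.parse import quote

DataPagePlugin = Callable[[str], str]
_PLUGINS: Dict[str, DataPagePlugin] = {}


def register(kind: str) -> Callable[[DataPagePlugin], DataPagePlugin]:
    """Decorator that installs a data-page renderer for one kind of element."""
    def decorator(func: DataPagePlugin) -> DataPagePlugin:
        _PLUGINS[kind] = func
        return func
    return decorator


def render_data_page(kind: str, element_id: str) -> str:
    """Render the page with the registered plug-in, or a fallback message."""
    plugin = _PLUGINS.get(kind)
    if plugin is None:
        return f"<p>No data page configured for {html.escape(element_id)}.</p>"
    return plugin(element_id)


@register("gene")
def wiki_link_page(element_id: str) -> str:
    # Point the page at a (placeholder) community wiki entry for this gene.
    url = "https://example.org/wiki/" + quote(element_id)
    return f'<p>Annotation for {html.escape(element_id)}: <a href="{url}">{url}</a></p>'


print(render_data_page("gene", "geneX"))
print(render_data_page("signal", "Wnt"))
```

Swapping what a gene's page shows then means registering a different function for "gene", without touching the code that draws the network.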
In future versions of BioTapestry, we plan to simplify the data page customization process so that users can install commonly used options without needing to create special code modules.
It is much easier to understand a sequence of GRN state changes through animations and interactive manipulations. A key feature of the BioTapestry Editor is that the user can click on genes and linkages to query their properties (e.g. all target genes, experimental evidence, or alternative paths between two nodes). These interactive features make it much easier to see the underlying organization within a large and busy static view.
BioTapestry also provides strong support for representing network dynamics:
Support for continuously variable expression levels in dynamic time-slider models is a planned future enhancement.
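In a time-slider model with on/off expression, each frame can be thought of as a filter over the full link set. A Python sketch under the assumption, ours rather than necessarily BioTapestry's rule, that a link is drawn as active in a region at a time point when its source gene is expressed there; the table format, region name, and time points are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Link = Tuple[str, str]
# Hypothetical expression table: (region, time point) -> genes switched on.
Expression = Dict[Tuple[str, int], Set[str]]


def active_links(links: List[Link], expression: Expression,
                 region: str, time: int) -> List[Link]:
    """Links whose source gene is on in `region` at `time`."""
    on = expression.get((region, time), set())
    return [(src, trg) for src, trg in links if src in on]


def slider_frames(links: List[Link], expression: Expression,
                  region: str, times: List[int]) -> Dict[int, List[Link]]:
    """One frame per time point, as a time slider would step through them."""
    return {t: active_links(links, expression, region, t) for t in times}


links = [("A", "B"), ("B", "C")]
expression = {("region 1", 6): {"A"}, ("region 1", 12): {"A", "B"}}
print(slider_frames(links, expression, "region 1", [6, 12, 18]))
# {6: [('A', 'B')], 12: [('A', 'B'), ('B', 'C')], 18: []}
```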
BioTapestry is written in Java, which is a freely available, cross-platform web technology. The program can be run as a web-based application using the Java Web Start facility. This means that a user with Java installed on their computer can click on a link on a web page, and the program will be downloaded to their computer and start to run automatically. Using this framework, a stripped-down “read-only” version of the program, called the BioTapestry Viewer, allows web-based viewing and interactive exploration of published GRN models. This feature enables a GRN model to act as an interactive and dynamic information portal for dissemination of research results and classroom teaching. The facility also encourages the development of community consensus models for widely-studied GRNs, shared over the web and built up by long-range collaboration. Since it is possible to tie the annotation data displayed for each network feature to a web page (using the plug-in facility), collaborative web technologies such as wikis can be used to enable community discussion and feedback.
BioTapestry allows images to be displayed alongside each network view in the model hierarchy. This facility can be used to indicate the cells of interest in the embryo at the appropriate developmental stage, or to indicate other spatial and contextual information using illustrative cartoons. In this way, images can be used to provide biological context for GRN models (Figure 6). This is particularly useful in systems involving multiple signaling events and changing cellular neighborhoods (e.g. development). When combined with the viewing path feature described above, the smooth progression of GRN states can be tracked using the images as well as the network model.
In future BioTapestry development, we plan to introduce 3D pictures and to integrate the image representation with the navigation functions.
BioTapestry top-level networks can be exported to other software tools using the Systems Biology Markup Language (SBML) (http://sbml.org/). There are currently over 110 SBML-compatible software packages. BioTapestry also can import and export Cytoscape (http://cytoscape.org/) interaction files. Furthermore, ongoing BioTapestry development aims to support the Gaggle framework (http://gaggle.systemsbiology.org/). Integration with the Gaggle will allow BioTapestry to interactively exchange network models and data with other tools.
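Cytoscape's simple interaction format (SIF) is a plain-text list of "source relationship target" lines, which makes this kind of exchange easy to sketch. The following Python writes and reads SIF; it is not BioTapestry's exporter, and the relationship labels "promotes" and "represses" are our own choice (SIF accepts arbitrary labels).

```python
from typing import List, Tuple

SignedLink = Tuple[str, str, str]  # (source, sign, target); sign is "+" or "-"
LABELS = {"+": "promotes", "-": "represses"}  # our own relationship labels
SIGNS = {label: sign for sign, label in LABELS.items()}


def to_sif(links: List[SignedLink]) -> str:
    """One tab-separated 'source<TAB>relationship<TAB>target' line per link."""
    return "".join(f"{src}\t{LABELS[sign]}\t{trg}\n" for src, sign, trg in links)


def from_sif(text: str) -> List[SignedLink]:
    """Parse SIF lines; a line may list several targets after the relationship.

    Tabs are used as delimiters when present, otherwise whitespace. Lines
    naming a single isolated node carry no links and are skipped. Labels
    other than ours raise KeyError.
    """
    links: List[SignedLink] = []
    for line in text.splitlines():
        parts = line.split("\t") if "\t" in line else line.split()
        if len(parts) < 3:
            continue
        src, label, *targets = parts
        links.extend((src, SIGNS[label], trg) for trg in targets)
    return links


network = [("A", "+", "B"), ("B", "-", "C")]
sif = to_sif(network)
print(sif, end="")
print(from_sif("A promotes B C\n"))  # [('A', '+', 'B'), ('A', '+', 'C')]
```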
A particularly popular feature of BioTapestry is that it can read entire hierarchical network descriptions from spreadsheets. Such spreadsheets could be generated manually, or by ad-hoc processing of experimental data, or automatically by computational data analysis pipelines. For example, Figure 7 shows a single subnetwork view of a very large dataset, from the Halobacterium EGRIN project (http://baliga.systemsbiology.net/egrin.php), that was built using the BioTapestry spreadsheet import feature.
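The general idea of building a hierarchy from a flat table can be sketched as follows. The column names (model, source, target, sign) are hypothetical and much simpler than BioTapestry's own CSV format, which is documented in its tutorial on Building Networks from Comma-Separated Value Files. The sketch only shows how each row can contribute a link to a named submodel while the root model collects the union of all links.

```python
import csv
import io
from typing import Dict, List, Tuple

SignedLink = Tuple[str, str, str]  # (source, target, sign)


def read_network_csv(text: str) -> Tuple[List[SignedLink], Dict[str, List[SignedLink]]]:
    """Return (root links, submodel name -> links) from CSV rows.

    Hypothetical columns: model, source, target, sign. Every link also goes
    into the root model, once, so the root is the union of all submodels.
    """
    root: List[SignedLink] = []
    submodels: Dict[str, List[SignedLink]] = {}
    for row in csv.DictReader(io.StringIO(text)):
        link = (row["source"].strip(), row["target"].strip(), row["sign"].strip())
        if link not in root:
            root.append(link)
        submodels.setdefault(row["model"].strip(), []).append(link)
    return root, submodels


table = """model,source,target,sign
region 1,A,B,+
region 1,B,C,-
region 2,A,B,+
"""
root, submodels = read_network_csv(table)
print(root)       # [('A', 'B', '+'), ('B', 'C', '-')]
print(submodels)  # {'region 1': [('A', 'B', '+'), ('B', 'C', '-')], 'region 2': [('A', 'B', '+')]}
```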
A common challenge in GRN modeling is distinguishing between direct and indirect linkages. To help meet this challenge, BioTapestry provides a display of alternative paths between a source and target gene (Figure 8A). The user can then inspect the supporting data to determine whether each linkage is direct. Tools like this help the researcher to build the model up from raw data and explore possible network architectures consistent with the data. We are currently working to enhance BioTapestry to provide more evidence visualization tools and a supporting analysis pipeline that can build a plausible network model from the perturbation and expression data.
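Enumerating alternative paths is a standard depth-first search over simple paths (paths that visit no node twice). A Python sketch follows; the optional hop limit is our own addition to keep the search tractable on large networks, and we do not claim this is how BioTapestry implements the feature.

```python
from typing import Dict, List, Optional, Tuple

Link = Tuple[str, str]


def simple_paths(links: List[Link], source: str, target: str,
                 max_hops: Optional[int] = None) -> List[List[str]]:
    """All paths from source to target that visit no node twice."""
    graph: Dict[str, List[str]] = {}
    for src, trg in links:
        graph.setdefault(src, []).append(trg)
    paths: List[List[str]] = []

    def extend(path: List[str]) -> None:
        node = path[-1]
        if node == target:
            paths.append(list(path))
            return
        if max_hops is not None and len(path) - 1 >= max_hops:
            return  # this path already uses the allowed number of links
        for nxt in graph.get(node, []):
            if nxt not in path:  # never revisit a node, so cycles are skipped
                path.append(nxt)
                extend(path)
                path.pop()

    extend([source])
    return paths


links = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")]
print(simple_paths(links, "A", "C"))  # [['A', 'B', 'C'], ['A', 'C']]
```

Here the direct A to C link coexists with an indirect route through B, which is exactly the situation in which the supporting data should be checked to decide whether the A to C input is really direct.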
BioTapestry now includes an enhanced search tool (Figure 8B) that allows the user to search for all nodes matching or partially matching a given name. It also allows the user to find all sources or targets of a given node, optionally selecting the relevant network segments. Another tool allows the user to select the sources or targets of a given link segment. Even more search enhancements are anticipated. For example, we plan to modify the search tool to work across multiple models as well as the current model. Another planned feature will allow the user to select all nodes and links within n hops of a selected subnetwork, thus making it possible to find the local neighborhoods of that subnetwork.
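Both the partial-name search and the planned n-hop neighborhood selection reduce to simple graph operations. This Python sketch is an illustration only: it treats links as undirected when counting hops (an assumption; a tool might instead follow only upstream or downstream links), it keeps every link whose two ends both fall inside the neighborhood, and the node names are hypothetical.

```python
from collections import deque
from typing import Dict, Iterable, List, Set, Tuple

Link = Tuple[str, str]


def find_nodes(names: Iterable[str], query: str) -> List[str]:
    """Case-insensitive partial-name match."""
    q = query.lower()
    return [name for name in names if q in name.lower()]


def neighborhood(links: List[Link], seeds: Iterable[str], n: int) -> Tuple[Set[str], List[Link]]:
    """Nodes within n hops of any seed node, plus the links among them."""
    adjacent: Dict[str, Set[str]] = {}
    for src, trg in links:
        adjacent.setdefault(src, set()).add(trg)
        adjacent.setdefault(trg, set()).add(src)
    distance = {seed: 0 for seed in seeds}
    queue = deque(distance)
    while queue:  # breadth-first search, stopping at depth n
        node = queue.popleft()
        if distance[node] == n:
            continue
        for other in adjacent.get(node, ()):
            if other not in distance:
                distance[other] = distance[node] + 1
                queue.append(other)
    nodes = set(distance)
    return nodes, [(s, t) for s, t in links if s in nodes and t in nodes]


print(find_nodes(["TF-1a", "TF-1b", "Signal-2"], "tf-1"))  # ['TF-1a', 'TF-1b']
links = [("A", "B"), ("B", "C"), ("C", "D")]
print(neighborhood(links, ["A"], 2))
```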
In general, a GRN model will never be definitive and complete. Over time, the model will be expanded, pruned, revised, and refined; interactions that were once thought to be direct turn out to be indirect, and vice versa. Modifying and maintaining increasingly complex models over an extended period of time is challenging, and BioTapestry provides features to handle these issues. As one example, it provides an incremental layout feature so that new network elements can be added while retaining most of an existing layout. Since this is an important long-term goal, we are continuing to add improvements that simplify creation and modification of the network. For example, recent BioTapestry enhancements allow model hierarchy trees to be duplicated; regions in a model can also be duplicated. Nodes can be inserted into links and automatically propagated to submodels.
Whether the layout of a complex GRN model is meaningful and understandable is a subjective judgment that depends on factors such as the user's points of interest and focus. Common algorithms for laying out network diagrams typically do not take biological considerations into account. BioTapestry layout algorithms address this challenge in the following ways:
The level of abstraction that we have used in BioTapestry so far has worked well for up to medium-sized networks, but as networks grow in size and complexity, new ways of organizing and thinking about network elements are needed.
The simplest way in which BioTapestry aids the understanding of GRNs is interactivity. A typical GRN presentation can be hard to understand when first viewed; it is only after interactively interrogating it and studying the various levels of the hierarchy that the network organization and functional features become apparent.
One of the ways in which we hope to facilitate understanding of large-scale GRNs is by introducing process diagrams as an additional level of representation within BioTapestry (see Figure 9A for an example). Another form of higher-level grouping, but typically dealing with smaller network chunks, is the identification of functional blocks (for an example, see Figure 9B). Identifying functional blocks (e.g. feedback loops) also allows the user to view the GRN as a smaller set of interacting units, each with a clearly understood function.
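As a small illustration of functional-block detection, the following Python finds two-gene feedback loops and classifies each by the product of its link signs, so that mutual repression counts as a (double-negative) positive feedback loop. This is our own narrow sketch: longer loops and other motifs would need a general cycle search, and it is not a description of BioTapestry's planned implementation.

```python
from typing import Dict, List, Tuple

SignedLink = Tuple[str, str, int]  # (source, target, +1 or -1)


def two_gene_loops(links: List[SignedLink]) -> List[Tuple[str, str, str]]:
    """Pairs (a, b) with a -> b and b -> a, labeled by the loop's overall sign."""
    sign: Dict[Tuple[str, str], int] = {(s, t): sg for s, t, sg in links}
    loops = []
    for (a, b), s_ab in sign.items():
        if a < b and (b, a) in sign:  # a < b reports each pair once and skips self-loops
            overall = s_ab * sign[(b, a)]
            loops.append((a, b, "positive" if overall > 0 else "negative"))
    return sorted(loops)


links = [("A", "B", +1), ("B", "A", +1),   # mutual activation
         ("C", "D", -1), ("D", "C", -1),   # mutual repression
         ("E", "F", +1), ("F", "E", -1)]   # activator-repressor pair
print(two_gene_loops(links))
# [('A', 'B', 'positive'), ('C', 'D', 'positive'), ('E', 'F', 'negative')]
```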
These additional views, together with the network representations we have already discussed, imply a natural ordering of four levels of abstraction that are appropriate for looking at GRNs:
We plan to enhance BioTapestry to support these different representations in a manner that allows the user to view the network at the chosen level of detail, and to switch between these views as needed.
The BioTapestry Editor and associated tutorials are available freely at http://www.BioTapestry.org. The only prerequisite is that the freely-available Java Runtime Environment (JRE), from Sun Microsystems, is installed on your computer. This is commonly the case. If not, the JRE can easily be installed by following instructions on the BioTapestry home page. With Java installed, clicking on the BioTapestry start link will cause the Java Web Start system to download the software to your computer and run it. This system allows users to be kept up-to-date with the latest version. When a user is not connected to the internet, they can still run the program from a desktop icon using the associated files saved automatically when the BioTapestry Java Web Start link was last accessed. Note that although the software is downloaded and maintained via the web, your data stays on your machine as locally saved files and is never uploaded to the server.
Also on the BioTapestry home page is a set of online tutorials designed to help you learn how to use the program by taking you step-by-step through simple examples that highlight important and common program operations. For example, BioTapestry supports several different ways of creating networks. First, GRN models can be created manually. In this case, BioTapestry provides the user with an easy way to draw the network by hand, with fine-grained manual control over network placement and link layout. This method is taught in a Quick Start tutorial. At the other end of the spectrum, for large-scale regulatory networks based upon large amounts of data (e.g. from high-throughput experiments or computational analyses), BioTapestry is able to import large network descriptions from spreadsheet files, and automatically lay out the large number of network elements in a coherent fashion. This method is described in a separate tutorial on Building Networks from Comma-Separated Value Files. Between these two extremes, BioTapestry allows users to specify networks interactively through a set of dialog boxes that guide the user. BioTapestry then automatically generates the network layout, thereby avoiding mundane layout tasks; this method is covered in a tutorial on Building Networks from Interaction Tables.
In addition to these and other tutorials, there is an extensive online Frequently Asked Questions (FAQ) list that covers many topics in considerable depth.
BioTapestry was initially developed to model the sea urchin endomesoderm specification networks. A regularly updated BioTapestry viewer for this network is available at: http://sugp.caltech.edu/endomes/
Also, an increasing number of projects are taking advantage of BioTapestry's ability to share interactive GRN models over the web. For example, an interactive model of mouse ventral neural tube specification is available at: http://www.mcb.harvard.edu/McMahon/BioTapestry/
This model demonstrates how multiple VfA models can be created in BioTapestry to track a continuously varying set of developmental domains as development progresses.
The EGRIN (Environment and Gene Regulatory Influence Network) model recently developed for Halobacterium salinarum provides an example of using BioTapestry's CSV input facilities and new auto-layout algorithms to visualize large networks generated from computational analysis of high-throughput data. See Figure 7 and the project page at: http://baliga.systemsbiology.net/egrin.php
In this case, BioTapestry's model hierarchy provides an excellent solution for showing the many different states of a genome-scale model.
The zebrafish developmental GRN presented in this issue is an example of a BioTapestry model developed by merging data from public databases with additional local and 3rd party experimental observations. An interactive web model is available at: http://www.zebrafishGRNs.org/
Finally, an interactive web model of the mammalian T-cell developmental GRN, available at: http://www.its.caltech.edu/~tcellgrn/, is an example of a BioTapestry model developed by merging data from local and 3rd party experimental observations with additional public sources.
More and more GRNs are being characterized every day. The resulting models are increasingly complex, and integrate very large volumes of experimental data. As the size and complexity of GRN models grows, four inherent capabilities of BioTapestry will prove increasingly essential.
First, BioTapestry network diagrams present an integrated view of (i) the high-level architecture of the network, (ii) the cis-regulatory features of individual genes, and (iii) the supporting experimental evidence.
Second, BioTapestry's hierarchical views of GRNs highlight regulatory differences among cells (visualized in VfNs), as well as regulatory changes over time (visualized with ‘slide shows’), while at the same time emphasizing the relationship of dynamic GRN modules to genomic organization (visualized in VfAs and VfGs).
Third, as larger and larger networks are studied, the size and complexity of datasets are increasingly making their ad hoc interpretation difficult and error-prone. BioTapestry supports a structured process for curating and translating experimental data into GRN models.
Fourth, our ongoing work to integrate process diagrams and functional block representations into BioTapestry models facilitates the comparison of GRNs from different organisms (see for example [15,16,17]), allowing new insights into the functional and logical architecture of GRNs, and the fundamental principles underlying genetic control of cellular function.
BioTapestry development is supported by NIGMS grant GM061005.
Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.