RIMS is an integrated platform for reverse phase protein array data management that consists of a client-side application, a central repository and a communication protocol based on the RPPAML data structure. The client-side application consists of an upload module and a visualization module.
Data uploading and annotation are performed using a separate module accessed through the RPPA Data Manager GUI (see Figure for an overview of the process). The imported data are assembled with the annotation data into an XML format, RPPAML (see the section RPPAML metadata format for more information). The user uploads the data by pointing the application to the relevant folder and following the instructions of the graphical user interface. Once the data have been pre-processed, the user is prompted to annotate them. The annotated data can then be saved to the local disk or uploaded to a web repository based on the Simple Sloppy Semantic Database (S3DB) infrastructure [11].
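The assembly step can be pictured with a minimal Python sketch using the standard-library ElementTree module. It assumes pre-processed spot records (sample, row, column, intensity) and a simple annotation dictionary. Only <experimentInformation>, <allSampleInfo>, <arrays>, <antibodyInfo> and <spotInfo> are names taken from the RPPAML description; the root tag RPPAML, the <sample> element, the nesting and all attribute names are hypothetical, and the published schema remains the authoritative reference.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def assemble_rppaml(spots: List[dict], annotation: Dict) -> ET.ElementTree:
    """Combine pre-processed spot data with user annotation in one XML tree.

    Only experimentInformation, allSampleInfo, arrays, antibodyInfo and
    spotInfo come from the RPPAML description; the root tag, the <sample>
    element, the nesting and all attributes are assumptions.
    """
    root = ET.Element("RPPAML")  # hypothetical root tag

    # Biological annotation, e.g. as entered by the user.
    exp = ET.SubElement(root, "experimentInformation")
    samples = ET.SubElement(exp, "allSampleInfo")
    for name, treatment in annotation["samples"].items():
        ET.SubElement(samples, "sample", name=name, treatment=treatment)

    # Array (slide) section: the antibody used and one element per spot.
    arrays = ET.SubElement(root, "arrays")
    ET.SubElement(arrays, "antibodyInfo", name=annotation["antibody"])
    for spot in spots:
        # Reject measurements that cannot be linked to an annotated sample.
        if spot["sample"] not in annotation["samples"]:
            raise ValueError(f"spot refers to unannotated sample {spot['sample']!r}")
        ET.SubElement(
            arrays,
            "spotInfo",
            sample=spot["sample"],
            row=str(spot["row"]),
            col=str(spot["col"]),
            intensity=str(spot["intensity"]),
        )
    return ET.ElementTree(root)


tree = assemble_rppaml(
    [{"sample": "S1", "row": 1, "col": 1, "intensity": 1520.5}],
    {"antibody": "AKT", "samples": {"S1": "control"}},
)
print(ET.tostring(tree.getroot(), encoding="unicode"))
```

Saving to the local disk would then amount to `tree.write(path, encoding="utf-8", xml_declaration=True)`; the upload to S3DB is not shown because its service interface is not described here.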
Figure 1 Flow of information from experimental samples to data visualization and analysis. The implementation architecture (shown in boxes in the figure) consists of a data management layer where data warehousing takes place (1). The next layer is the information ...
RIMS is a client-side application that interacts with the knowledge database (S3DB) to create a management infrastructure for RPPA data. RIMS interaction with S3DB is fully automated, and the S3DB instance can be located anywhere as long as it can be reached via a URL. Additionally, entry creation and data download are managed by the RIMS software, which imposes no limit on the amount of data that can be uploaded or downloaded. The RPPA Data Manager application manages the Upload and VisualizeGUI modules. The Upload module lets the user annotate the data, starting from the information the software extracts from the input files. The user enters the data through Excel templates, which eliminates the need to learn a new method (Figure, box 1). Currently, the application supports conversion of MicroVigene® data to RPPAML. As new readers for RPPA analysis become available, converters for these instruments will be added.
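One way to keep support for new readers open is a small converter registry in which each reader format maps to a function that produces a common list of spot records. The sketch below is hypothetical: the registry, the format key "microvigene" and the tab-delimited column names Sample, Row, Col and Intensity are assumptions and do not describe the actual MicroVigene® export layout.

```python
import csv
import io
from typing import Callable, Dict, List

SpotRecord = dict
# Hypothetical registry: reader/format name -> parser producing spot records.
CONVERTERS: Dict[str, Callable[[str], List[SpotRecord]]] = {}


def register(fmt: str):
    """Decorator that adds a parser to the registry under the given format name."""
    def wrap(func: Callable[[str], List[SpotRecord]]):
        CONVERTERS[fmt] = func
        return func
    return wrap


@register("microvigene")
def read_microvigene_like(text: str) -> List[SpotRecord]:
    # ASSUMPTION: tab-delimited export with Sample, Row, Col and Intensity
    # columns; the real MicroVigene layout may differ.
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return [
        {
            "sample": rec["Sample"],
            "row": int(rec["Row"]),
            "col": int(rec["Col"]),
            "intensity": float(rec["Intensity"]),
        }
        for rec in reader
    ]


def convert(fmt: str, text: str) -> List[SpotRecord]:
    """Dispatch raw reader output to the parser registered for its format."""
    if fmt not in CONVERTERS:
        raise ValueError(f"no converter registered for format {fmt!r}")
    return CONVERTERS[fmt](text)


print(convert("microvigene", "Sample\tRow\tCol\tIntensity\nS1\t1\t2\t1520.5\n"))
```

With this design, supporting a new instrument means registering one more parsing function, while the downstream RPPAML assembly stays unchanged.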
RIMS provides many methods for data visualization, ranging from the scanned images of individual antibody arrays to the averages and standard deviations of individual samples across multiple arrays. The VisualizeGUI module displays the integrated data in the context of sample names. It allows the user to export data in several formats (RPPAML, text, Excel or the original format) for import into other applications. Additionally, the user can create correlation maps from the uploaded data. The data can also be exported to other applications for visualization (Figure, boxes 3 and 4).
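The per-sample summaries and correlation maps can be illustrated with standard-library Python. This is a sketch under stated assumptions: the text does not specify which correlation coefficient RIMS uses, so Pearson correlation is assumed here, and readings are assumed to arrive as plain lists (one value per array for each sample summary, and one value per sample, in a shared sample order, for each antibody profile).

```python
import math
import statistics
from typing import Dict, List, Tuple


def sample_summary(readings: Dict[str, List[float]]) -> Dict[str, Tuple[float, float]]:
    """Mean and sample standard deviation of each sample's readings across arrays."""
    summary = {}
    for sample, values in readings.items():
        # A single reading has no spread; report 0.0 instead of raising.
        sd = statistics.stdev(values) if len(values) > 1 else 0.0
        summary[sample] = (statistics.mean(values), sd)
    return summary


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length series with at least two points")
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / math.sqrt(sxx * syy)


def correlation_map(profiles: Dict[str, List[float]]) -> Dict[Tuple[str, str], float]:
    """Pairwise correlation between antibody profiles (each pair listed once)."""
    names = sorted(profiles)
    return {
        (a, b): pearson(profiles[a], profiles[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }


print(sample_summary({"S1": [1.0, 2.0, 3.0], "S2": [4.0]}))
print(correlation_map({"AKT": [1.0, 2.0, 3.0], "ERK": [2.0, 4.0, 6.0], "PTEN": [3.0, 2.0, 1.0]}))
```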
The RIMS client application also supports the creation of pathway maps for the selected antibody and sample lists. The generated pathways can be visualized using the popular Cytoscape tool [9] or with any other application that accepts the extensible graph markup and modeling language (XGMML) as an input format [12]. To generate pathway maps, the user selects the groups to compare (e.g. control (Group 1) vs. disease (Group 2)) and adds them to the corresponding list boxes. Clicking the 'Go' button in the 'Pathway Maps' panel generates the correlation maps, from which the pathways for the corresponding antibodies are generated. The correlation maps and pathway maps are displayed to the user and can then be saved or printed (Figure).
RPPAML metadata format
There are four options in the current version of the client application for exporting the data and processed results. The original data can be exported as a Microsoft Excel document, as a text file similar to the original upload format, as a Matlab® MAT file or as an XML document known as RPPAML. Exporting as an XML document is the most comprehensive option, as it provides the original data together with the context of its acquisition and processing, including the raw images. Because RPPAML is an application-independent XML document, application developers using any programming language can access the data stored in the file. Details of the RPPAML schema can be found at http://www.rppacentral.org/. A well-formed RPPAML document contains the minimum information sufficient to describe an RPPA experiment and is defined as follows:
a) biological information: the sample's biological information, such as its provenance and treatment conditions
b) antibody information: validation information and approach
c) detection information: blocking, staining and amplification approaches, antibody blotting information, etc.
d) slide information: slide preparation information such as the array machine, lysate transfer method, pin or spot size, lysate amount, etc.
e) data: the data obtained from the experiment under the above conditions.
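The five categories above lend themselves to a simple completeness check. The sketch assumes that each category is represented by one of the elements described below (for example, biological information by <allSampleInfo> and data by <spotInfo>) and that the document uses no XML namespace; this mapping is an assumption, not a rule taken from the RPPAML schema.

```python
import xml.etree.ElementTree as ET
from typing import List

# ASSUMED mapping from the five information categories to RPPAML element
# names; the authoritative definition is the published RPPAML schema.
REQUIRED_ELEMENTS = {
    "biological information": "allSampleInfo",
    "antibody information": "antibodyInfo",
    "detection information": "detection",
    "slide information": "SlidePrepInfo",
    "data": "spotInfo",
}


def missing_categories(xml_text: str) -> List[str]:
    """Return the categories with no matching element anywhere in the document.

    ET.fromstring raises ParseError if the text is not well-formed XML.
    Namespaced documents would need namespace-qualified search paths.
    """
    root = ET.fromstring(xml_text)
    return [cat for cat, tag in REQUIRED_ELEMENTS.items() if root.find(f".//{tag}") is None]


doc = """<RPPAML>
  <experimentInformation>
    <allSampleInfo/><SlidePrepInfo/>
  </experimentInformation>
  <arrays><antibodyInfo/><spotInfo/></arrays>
</RPPAML>"""
print(missing_categories(doc))  # ['detection information']
```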
The sub-element <allSampleInfo> of the main element <experimentInformation> stores information about the biological sample, such as its provenance, treatment conditions and other protocols (more details can be found on the website http://www.rppacentral.org under the schema tab). Under the same main element, the sub-element <SlidePrepInfo> stores slide preparation information, such as the array machine and the lysate transfer method used. Additionally, the sub-element <detection> stores information about blocking, staining and amplification approaches.
The element <arrays> describes all information about the reverse phase protein array (i.e. the slide). The sub-element <antibodyInfo> contains all information pertaining to an antibody used in the study. Additionally, <spotInfo> contains information about a spot on the slide, and the element <Img> contains the image of the slide, stored in any acceptable image file format. More information about individual elements is given on the website under the schema tab.
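Because XML is a text format, image bytes stored in <Img> need a text encoding. The text does not say which encoding RPPAML uses; the sketch below assumes base64, and the helper names and the format and encoding attributes are hypothetical.

```python
import base64
import xml.etree.ElementTree as ET


def embed_image(parent: ET.Element, image_bytes: bytes, fmt: str) -> ET.Element:
    """Store raw image bytes as base64 text in a new <Img> child (assumed encoding)."""
    img = ET.SubElement(parent, "Img", format=fmt, encoding="base64")  # hypothetical attributes
    img.text = base64.b64encode(image_bytes).decode("ascii")
    return img


def extract_image(img: ET.Element) -> bytes:
    """Recover the original image bytes from an <Img> element."""
    return base64.b64decode(img.text or "")


arrays = ET.Element("arrays")
# A PNG file signature stands in for real scanned-slide data.
embed_image(arrays, b"\x89PNG\r\n\x1a\n", "png")
restored = ET.fromstring(ET.tostring(arrays, encoding="unicode"))
print(extract_image(restored.find("Img")) == b"\x89PNG\r\n\x1a\n")
```

Base64 enlarges the stored image by about one third, which is one reason a format might instead reference image files by path; the text does not say which approach RPPAML takes.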
The schema describing the RPPAML structure is also represented in UML notation in Figure. This data model is the result of interaction between experimental researchers and bioinformaticians, with the purpose of capturing all the information relevant to both data management and analysis. The proposed model documents the biological context, experimental conditions and data, thereby recording the provenance and context of the data and preserving the granularity of the data set. The model was implemented in XML.
RIMS uses S3DB as its data service backbone. Because this component is distributed, individual users can rely on a locally installed S3DB deployment or use an external deployment such as the reference repository at The University of Texas M. D. Anderson Cancer Center [13]. As a consequence, users can access the data stored in these federated knowledge bases simply by pointing the application (RIMS) to any S3DB data service. A characteristic of S3DB semantic data services is that other data models describing complementary information can be integrated without compromising existing data [11]. This is particularly relevant for RPPA, a young technology for which new methods and improvements are still being devised. As the technology evolves, client applications that use the proposed RPPAML data format will remain aware of the context and provenance of the data and can offer the user appropriate choices for analyzing it.