The aim of this paper is to propose a standard XML syntax, designated the annotated gel markup language (AGML), for the exchange and visualization of 2DE/MS experimental data. This does not preclude adapting AGML, an XML application, for data storage [13]. The proposed AGML syntax captures the essence of a 2D gel experiment and its pertinent MS data, and conveys enough information to analyze and replicate the results. The need to go beyond a format for data storage in developing the AGML syntax is justified by the diverse set of methods involved and the enduring obstacles to full automation. The need for a common format to manipulate as well as store the data is captured by the concept of the annotated "virtual gel". This practical solution was reflected in the identification of the data model and is ultimately mirrored by the AGML schema.
The AGML syntax could easily be adopted by other applications to present their data in XML format. In this specific application, the 2DE experimental data were generated using PDQUEST coupled with a MicroMass MALDI-TOF instrument, using MassLynx and Micromass global server software for protein identification (MicroMass, Manchester, U.K.). The data generated from a 2DE/MS experiment using the above instrumentation are stored as tab-delimited files in the manufacturer-specific format. These text files are then converted to AGML syntax through a web interface using software written in PHP (see availability). The conversion of the tab-delimited file to AGML syntax and, if requested, its web-based graphical representation are fully automated. The latter application illustrates the advantage of using AGML as a common format, as the graphical display is in effect a web-based service available for any dataset represented in the proposed AGML syntax. Registered users can then decide whether to deposit the AGML document in the database. Since AGML conforms to the XML rules, it is highly flexible and simple to modify [5]. This adaptability of the syntax, also known as content scalability, makes it possible to define new elements when new kinds of information are acquired through 2DE/MS experiments. This is a great asset in an emerging field like proteomics, where new information is discovered at a rapid pace and the prevailing data model requires constant adaptation.
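The conversion step described above can be illustrated with a minimal sketch. It is written in Python rather than the PHP used by the actual converter, and it is not the authors' implementation. The column names (`spot_id`, `x`, `y`, `protein`), the element names (`gel`, `spot`, `identification`) and the sample rows are hypothetical placeholders; they do not reproduce the PDQUEST export layout or the published AGML schema.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Made-up sample input; the real manufacturer-specific layout is not shown here.
SAMPLE = (
    "spot_id\tx\ty\tprotein\n"
    "101\t12.5\t40.2\texample_protein_a\n"
    "102\t33.0\t18.7\texample_protein_b\n"
)


def spots_to_xml(tsv_text):
    """Convert tab-delimited spot records into an XML string.

    Element and column names are illustrative assumptions only.
    """
    root = ET.Element("gel")
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        # One <spot> element per input row, keyed by its identifier.
        spot = ET.SubElement(root, "spot", id=row["spot_id"])
        ET.SubElement(spot, "x").text = row["x"]
        ET.SubElement(spot, "y").text = row["y"]
        ET.SubElement(spot, "identification").text = row["protein"]
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(spots_to_xml(SAMPLE))
```

Because each record is built element by element, a new column in the input could be mapped to a new child element without disturbing readers that only look for the existing ones, which is the kind of content scalability referred to above.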
In bioinformatics, experimental data need to be analyzed, stored, updated and exchanged frequently by researchers [5]. A bioinformatics infrastructure built around AGML is intended to fulfill all of these needs for 2DE/MS experimental data. The ultimate goal of developing the AGML syntax is to enable proteomics research to move into the 'browsing mode' of searching through existing information databases, along similar lines to those proposed by Aebersold [14].
The proposed AGML document contains the experimental procedure, the experimental results and the composite virtual gel. It is useful to compare the proposed standard with related work in transcriptomics. For example, MAGE-ML [16] has the representation of DNA array data in XML format as its sole purpose. A similar focus on the representation of data is found in ProML [17], which includes only the protein sequence information and makes no allowance for describing the methodological procedure followed. The advantage of incorporating both the experimental procedure and the results, as we have proposed in AGML, is that the data can be understood in the context of the experiment. The methodological detail also makes it easier for another researcher to repeat the experiment documented in AGML. Arguably, the need to include methodological detail in AGML reflects unresolved methodological challenges in proteomic profiling based on 2D gel electrophoresis, which are a lesser problem in sequencing or transcriptomics projects.
AGML can be used together with larger descriptors of proteomics information. Using XML namespace rules, AGML markup can be embedded in other schemas. Specifically, the PEDRo model proposed by Taylor et al. explicitly accommodates the representation of 2DE/MS data [12], where AGML could be particularly useful. AGML in no way replaces the structure envisioned by PEDRo; instead, it proposes an XML format for handling 2DE/MS data that can be incorporated into existing large schemas such as PEDRo. In order to transmit subsets of information from a PEDRo repository, the PEDRo model has to employ other methods [12]. Accordingly, the PEDRo model could use AGML syntax to transmit the description of individual 2DE/MS experiments. For that purpose, AGML could greatly benefit from general-purpose XML transformation languages (e.g. XSLT). Additionally, with the wide use of the resource description framework (RDF, http://www.w3.org/RDF), AGML can be incorporated into other proposed model frameworks that have been written in XML, thus combining the strengths of AGML, such as storing multiple gel runs per experiment in the same file, with the strengths of other proposed models such as PEDRo and HUPO ML.
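As a rough illustration of how XML namespaces allow gel markup to sit inside a larger document, the following Python sketch embeds a gel fragment in a host record and then extracts it again as a self-contained subset. The namespace URIs, the `host` and `agml` prefixes, and the element names are assumptions made for this example; they are not taken from the AGML or PEDRo schemas.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URIs, for illustration only.
HOST_NS = "http://example.org/host-schema"
AGML_NS = "http://example.org/agml"

# Ask ElementTree to use readable prefixes when serializing.
ET.register_namespace("host", HOST_NS)
ET.register_namespace("agml", AGML_NS)


def embed_gel(experiment_name, spot_ids):
    """Build a host-schema experiment record containing one gel fragment."""
    experiment = ET.Element(f"{{{HOST_NS}}}experiment", name=experiment_name)
    gel = ET.SubElement(experiment, f"{{{AGML_NS}}}gel")
    for spot_id in spot_ids:
        ET.SubElement(gel, f"{{{AGML_NS}}}spot", id=spot_id)
    return experiment


def extract_gels(experiment):
    """Return only the gel fragments, ignoring all host-schema content."""
    return experiment.findall(f"{{{AGML_NS}}}gel")


if __name__ == "__main__":
    record = embed_gel("run-1", ["101", "102"])
    print(ET.tostring(record, encoding="unicode"))
```

Since the gel elements are identified by their own namespace, a consumer can pull them out of the host record without knowing anything about the surrounding schema, which is the sense in which a larger model could transmit individual 2DE/MS experiments in a dedicated syntax.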