For the efficient exchange of data it is important to speak the same language and to agree upon a common terminology. Due to the scope of the RDML format we will not discuss each term, but rather focus on those elements for which multiple names are in use, elements that can be interpreted in different ways, or whose intended role cannot be fully intuited from the name.
Sample can be used to refer to different inputs: tissue biopsy, cell culture, RNA extract, cDNA, cDNA dilution, etc. Depending on the interpretation of a sample, data may be processed in a different way (e.g. technical versus biological replicates). In RDML, sample refers to the nucleic acid material that is being added to the PCR reaction mix. As a consequence, technical replicate samples should contain the same name (reactions are performed on the same material), and biological replicates should contain different names (the nucleic acids derived from the different biological replicates are not the same).
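The naming convention above has a direct consequence for analysis software: reactions sharing a sample name can be pooled as technical replicates, while different names remain separate. The following is a minimal sketch of that idea; the sample names, the measured values and the helper `mean_per_sample` are hypothetical and are not part of RDML.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (sample name, measured value) pairs; illustrative only.
reactions = [
    ("patient1_cDNA", 24.0),
    ("patient1_cDNA", 24.5),  # technical replicate: same sample name
    ("patient2_cDNA", 26.0),  # biological replicate: different sample name
    ("patient2_cDNA", 26.5),
]

def mean_per_sample(reactions):
    """Average the values of all reactions that share a sample name."""
    grouped = defaultdict(list)
    for sample_name, value in reactions:
        grouped[sample_name].append(value)
    return {name: mean(values) for name, values in grouped.items()}

per_sample = mean_per_sample(reactions)
```

Under this convention, giving two biological replicates the same sample name would cause them to be averaged as if they were technical replicates.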
Target is the universal term for the nucleic acid sequence to be amplified (including but not limited to genes). We did not use the term gene because it cannot be used for intergenic sequences and it does not allow discrimination between different target sequences of the same gene.
Depending on the real-time instrument, threshold cycle (Ct), crossing point (Cp) or take-off point (Top) is used to refer to the same quantification cycle value (Cq): the fractional PCR cycle at which the target is quantified in a given sample.
Depending on the real-time instrument, a reaction corresponds to a well in a microtiter plate, a glass rotor capillary or a microfluidic reaction volume.
Run is the generic name for a plate, rotor or other physical form containing the data from a single PCR run.
An experiment is a collection of runs that need to be analyzed as a single data set.
RDML file structure
Apart from a common terminology, we also developed a standard file structure to create a universal real-time PCR markup language. The RDML standard is based on XML (eXtensible Markup Language), an extensible language created to facilitate the sharing of data across different information systems, which makes it well suited to implementing this standard.
RDML was constructed to accommodate the storage of data from multiple experiments. As can be seen in the simplified schematic overview, the RDML schema consists of seven element types at root level: documentation, ID, sample, target, experimenter, thermal cycling conditions and experiment. The checklist information according to MIQE (Bustin et al., submitted for publication) is mainly stored in documentation elements. The ID elements contain multiple identifiers supplied by databases or repositories in which the file can be stored. All samples used in the different experiments, and relevant information about them, are added as sample elements. The same applies to the target elements, which contain information about the genes and other target sequences. A list of experimenters who contributed to one or more experiments can be saved in experimenter elements, while the thermal cycling conditions elements hold PCR programs.
Schematic representation of the RDML XML schema.
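As an illustration of the root-level layout described above, the following minimal sketch builds an empty skeleton with one element per block using Python's standard `xml.etree.ElementTree` module. The tag names, including the root tag `rdml`, are assumptions chosen to mirror the block names in the text; the authoritative names are those defined by the RDML schema and may differ.

```python
import xml.etree.ElementTree as ET

# Assumed tag names for the seven root-level blocks; the actual names
# are defined by the RDML schema and may differ from these.
ROOT_BLOCKS = [
    "documentation", "id", "sample", "target",
    "experimenter", "thermalCyclingConditions", "experiment",
]

def build_skeleton():
    """Create a root element with one empty child per root-level block."""
    root = ET.Element("rdml")  # root tag name is also an assumption
    for tag in ROOT_BLOCKS:
        ET.SubElement(root, tag)
    return root

root = build_skeleton()
xml_text = ET.tostring(root, encoding="unicode")
```

In a real file each block type may occur several times (for example, one sample element per sample), so this skeleton only shows the kinds of elements found at root level.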
The main part of an RDML file is formed by one or more experiment blocks, each containing the data of one experiment. For each experiment, the actual data is organized and stored on a run-by-run basis using run elements. The run element is further subdivided into elements containing information about the experimental setup, the IDs of the experimenters who participated and multiple reaction elements associated with a name and the ID of the sample analyzed in that reaction. Using an experimenter's ID that refers to the corresponding personal information in the top-level experimenter element has the benefit of grouping similar information in one place, creating a more database-like structure and easing transfer of the data into repositories; it may also make for more compact instance documents if the same information is referred to several times. The same principle was applied to the thermal cycling conditions, sample and target elements by mentioning only their corresponding ID in the reaction elements. To support multiplex analysis, different data elements containing quantification (and possibly raw) data can be created for each reaction.
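The ID-referencing principle described above can be sketched as follows: top-level sample and target elements are indexed by ID once, and each reaction is then resolved through its references. The document below is a simplified, hypothetical example; its tag and attribute names (`react`, `idref`, `cq`, `description`) are assumptions made for illustration and are not taken from the RDML schema.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical document; tag and attribute names are
# illustrative assumptions, not the actual RDML schema.
DOC = """<rdml>
  <sample id="s1"><description>liver cDNA</description></sample>
  <target id="t1"><description>reference gene</description></target>
  <experiment id="exp1">
    <run id="run1">
      <react id="1">
        <sample idref="s1"/>
        <data><target idref="t1"/><cq>23.4</cq></data>
      </react>
    </run>
  </experiment>
</rdml>"""

def resolve_reactions(xml_text):
    """Return (sample, target, Cq) tuples with ID references resolved."""
    root = ET.fromstring(xml_text)
    # Index the top-level blocks by ID so each description is stored once.
    samples = {s.get("id"): s.findtext("description")
               for s in root.findall("sample")}
    targets = {t.get("id"): t.findtext("description")
               for t in root.findall("target")}
    rows = []
    for react in root.iter("react"):
        sample_id = react.find("sample").get("idref")
        # One data element per target allows for multiplex reactions.
        for data in react.findall("data"):
            target_id = data.find("target").get("idref")
            rows.append((samples[sample_id], targets[target_id],
                         float(data.findtext("cq"))))
    return rows

rows = resolve_reactions(DOC)
```

Because the descriptive information lives only in the top-level elements, adding further reactions on the same sample adds one short reference each rather than repeating the description.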
All elements of this data format are optional, making RDML very flexible and widely useful for a multitude of purposes, from sharing information about samples to the exchange of raw measurement data. Additionally, the XML nature of RDML allows for straightforward extension with new elements or features to contain extra information if required in the future. A more detailed description and technical information are available on the RDML website (http://www.rdml.org) and on a public open source repository (http://sourceforge.net/projects/rdml/).