Gating-ML has undergone several revisions since the first public release in February 2006. At this point, the ISAC DSTF is satisfied that the specification serves its purpose, and has released it as a Candidate Recommendation to the ISAC membership and other interested parties. The Gating-ML specification is included as supplemental information to this manuscript. Alternatively, it can be obtained directly from ISAC (16, 17).

Components of the Standard

Gating-ML specifications consist of several components divided into normative and informative parts. The normative parts are crucial for a compliant implementation. They consist of a detailed description of Gating-ML and of XML schemas (13) defining the syntax of compliant files. In addition, several informative parts are included. These include example Gating-ML XML files, and HTML- and UML-based (18) documentation of the XML schemas. Informative compliance tests can also be downloaded (19). These tests include both example FCS files and Gating-ML files, along with the expected results of membership for all the types of gates included in the standard. It is expected that these example files will significantly aid the development and testing of software that implements the standard.

Scope and Features of the Standard

Gating-ML is an XML-based specification on how to form unambiguous gate definitions that are transferable between different software packages. It is a file format primarily serving the purpose of computationally exchanging details about post-acquisition analysis, and is not intended to define directly data acquisition or physical sorting gates. Gating-ML does not cover guidelines (protocols, standard operating procedures) specifying how gates shall be formed or what gating strategies shall be used for particular assays. However, in Gating-ML, gates can be ordered hierarchically so that each gate can either be applicable on the whole event population (as in a list mode data file), or on a subpopulation defined by another gate. Therefore, an arbitrary gating strategy may be encoded.
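The hierarchical ordering of gates described above can be illustrated with a minimal Python sketch. The gate table, the gate identifiers, and the function name `in_gate` are hypothetical and are not taken from the Gating-ML schema; the sketch only shows how membership in a child gate may be evaluated on the subpopulation defined by its parent.

```python
from typing import Callable, Dict, Optional, Tuple

Event = Dict[str, float]
# Hypothetical registry: gate id -> (membership predicate, parent gate id or None).
GateTable = Dict[str, Tuple[Callable[[Event], bool], Optional[str]]]


def in_gate(event: Event, gate_id: str, gates: GateTable) -> bool:
    """An event is in a gate only if it is also in every ancestor gate."""
    predicate, parent_id = gates[gate_id]
    if parent_id is not None and not in_gate(event, parent_id, gates):
        return False
    return predicate(event)


# Illustrative two-level strategy: "cd3_pos" applies only within "lymphocytes".
gates: GateTable = {
    "lymphocytes": (lambda e: 100.0 <= e["FSC"] < 400.0, None),
    "cd3_pos": (lambda e: e["FL1"] >= 50.0, "lymphocytes"),
}
```

With this table, an event with high FL1 but FSC outside the parent range is not a member of `cd3_pos`, because the parent gate excludes it first.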

FCM data are commonly visualized on scales that do not directly correspond to the actual values stored in list mode data files. These user-friendly visualizations (20–22) can improve and simplify the interpretation of the data and analysis. If analyses involve transformed data, then the transformation needs to be described in order to reconstruct the analyses (11). In general, gates created in transformed space can effectively only be described in the transformed space. As transformations may not be reversible, and since nonlinear transformations have significant effects on the shape of gate boundaries, description of these gates in the data space would be both inaccurate and inefficient (see Supplementary Material). Therefore, Gating-ML supports applying gates on both raw and transformed data. The dimensions of the space in which gates are defined are formed either by parameters taken directly from list mode data files or by transformations applied to these parameters. Transformations (which include fluorescence compensation) may also be combined into composite transformations.

Supported Gate Types

Gating-ML supports the following types of gates: rectangular, polygon, convex polytopes, ellipsoids, decision trees and Boolean collections of any of the other types of gates. The six types of gates supported by Gating-ML are illustrated in . XML-based definitions are demonstrated in the examples shown in .

The most basic gates supported in Gating-ML are rectangular gates in any number of dimensions, from one-dimensional range gates up to multidimensional hyper-rectangular regions. Rectangular gates are defined by a set of one or more dimensions with the minimum (inclusive) and/or maximum (exclusive) thresholds specified for each dimension. Either the minimum or the maximum threshold may be omitted to specify a gate that is open on one side.
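As a minimal sketch of the rule just described (inclusive minimum, exclusive maximum, either bound optional), rectangular gate membership could be tested as follows in Python. The dictionary-based representation, the dimension names, and the function name `in_rectangle_gate` are assumptions made for illustration and do not reflect the XML structure of Gating-ML.

```python
from typing import Dict, Optional, Tuple

# Hypothetical representation: dimension name -> (minimum, maximum);
# None stands for an omitted threshold, i.e., an open side.
Bounds = Dict[str, Tuple[Optional[float], Optional[float]]]


def in_rectangle_gate(event: Dict[str, float], bounds: Bounds) -> bool:
    """Return True if the event satisfies the thresholds in every gate dimension."""
    for dim, (minimum, maximum) in bounds.items():
        value = event[dim]
        if minimum is not None and value < minimum:   # minimum is inclusive
            return False
        if maximum is not None and value >= maximum:  # maximum is exclusive
            return False
    return True


# Two-dimensional gate; the second dimension has no upper threshold.
gate: Bounds = {"FSC-H": (100.0, 500.0), "SSC-H": (50.0, None)}
```

An event lying exactly on a minimum threshold is inside the gate, whereas an event lying exactly on a maximum threshold is outside.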

Polygon gates in two dimensions represent one of the most common gates used for traditional manual gating, in which users draw borders around populations of interest. These gates are specified by an ordered sequence of vertices; the polygon is created by straight line segments spanned between consecutive vertices and between the last and the first vertex in the sequence. The polygon gate is defined as the interior of the polygon (boundary included) for simple polygons, and by the Crossing Number method (23) for non-simple polygons. This definition allows for concave polygons and polygons with intersecting segments.
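The Crossing Number test can be sketched in a few lines of Python. This is a generic even-odd ray-casting implementation, not code taken from the specification; the function name is hypothetical, and the sketch does not treat events lying exactly on the boundary specially, so an implementation aiming at compliance would need to add explicit boundary handling.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def crossing_number_inside(point: Point, vertices: List[Point]) -> bool:
    """Even-odd rule: count edges crossed by a ray cast from the point towards +x."""
    px, py = point
    inside = False
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]  # the last vertex connects back to the first
        # The edge crosses the horizontal line through the point only if its
        # end points lie on opposite sides of that line.
        if (y1 > py) != (y2 > py):
            x_cross = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
            if px < x_cross:
                inside = not inside  # an odd number of crossings means inside
    return inside


square = [(0.0, 0.0), (4.0, 0.0), (4.0, 4.0), (0.0, 4.0)]
bow_tie = [(0.0, 0.0), (4.0, 4.0), (4.0, 0.0), (0.0, 4.0)]  # self-intersecting
```

The same function handles convex, concave, and self-intersecting vertex sequences, which is why the even-odd rule is convenient for non-simple polygons.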

Convex polytope gates represent an extension of polygon gates into an unlimited number of dimensions. Convex polytope gates are defined by an intersection of half-spaces, each half-space being defined by a linear inequality. The polytope gate is defined as *G(A,b)* = {*x: Ax*+*b* ≤ 0}, where *A* is an *m* × *n* matrix, *m* being the number of bounding half-spaces and *n* being the number of dimensions of the polytope; *b* is an *m* × 1 column vector, 0 is the *m* × 1 zero column vector, and the inequality shall be met for each row. The coefficients of each row of *A* and *b* correspond to the coefficients of the linear inequality defining the particular half-space. This representation is computationally fast and easy to process to determine gate membership. These gates are unlikely to be created manually; however, they may be created as data-driven gates by software performing (semi)automated analysis based on processing more than two parameters at the same time. Restricting polytope gates to convex ones reduces the complexity significantly without affecting the expressiveness of the Gating-ML language, as any polytope gate can be described as a Boolean collection of several convex polytope gates.
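The membership test implied by *Ax* + *b* ≤ 0 reduces to one dot product per half-space. The following Python sketch uses plain lists for *A* and *b*; the function name and the example gate (a unit square expressed as four half-planes) are illustrative assumptions.

```python
from typing import Sequence


def in_polytope_gate(
    x: Sequence[float],
    A: Sequence[Sequence[float]],
    b: Sequence[float],
) -> bool:
    """Return True if every row i satisfies A[i] . x + b[i] <= 0."""
    return all(
        sum(a_ij * x_j for a_ij, x_j in zip(row, x)) + b_i <= 0.0
        for row, b_i in zip(A, b)
    )


# Unit square 0 <= x <= 1, 0 <= y <= 1 written as four half-planes:
#   -x <= 0,  x - 1 <= 0,  -y <= 0,  y - 1 <= 0
A = [[-1.0, 0.0], [1.0, 0.0], [0.0, -1.0], [0.0, 1.0]]
b = [0.0, -1.0, 0.0, -1.0]
```

The cost is proportional to *m* × *n* per event, and evaluation can stop at the first violated inequality.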

Ellipsoid gates in two or more dimensions are the fourth type of gate supported in Gating-ML. While two-dimensional ellipse gates are commonly supported by traditional analytical software, general ellipsoid and hyper-ellipsoid gates represent a straightforward extension into multidimensional space. These types of gates are expected to be among the most typical outputs of advanced automated data-driven gating, such as gating based on multidimensional clustering and multivariate normal modelling of the data. The representation of ellipsoid gates has been designed to naturally support statistically driven use cases and to be computationally inexpensive to process. Therefore, ellipsoid gates are defined by a covariance matrix, a mean vector, and a Mahalanobis distance (24). Specifically, the ellipsoid gate is defined as *G*(*μ*, *C*, *D*^{2}) = {*x*: (*x* − *μ*)^{T} *C*^{−1} (*x* − *μ*) ≤ *D*^{2}}, where *μ* is a column vector specifying the center of the ellipsoid, *C*^{−1} is the inverse of a covariance matrix, *D*^{2} is the square of the Mahalanobis distance, and ^{T} stands for transposition. For multidimensional inputs, one can compute the variance for each dimension separately as well as calculate the correlation between the dimensions (covariance). The resulting covariance matrix represents a description of the shape and orientation of the data, taking the first and second order moments of the data into account. See the EllipseCalculations.xls in the Supplemental information as a demonstration of how this representation can be calculated from an ellipse specified by its center point, two half-axes and rotation.
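The following Python sketch illustrates the ellipsoid definition under stated assumptions. The membership test evaluates the quadratic form by solving a linear system rather than inverting *C* explicitly. The helper `covariance_from_ellipse` shows one possible way to obtain a two-dimensional matrix from half-axes and a rotation angle, assuming *D*^{2} = 1 and *C* = *R* diag(*a*^{2}, *b*^{2}) *R*^{T}; this convention is an assumption and may differ from the calculation used in the supplemental spreadsheet. All function names are hypothetical.

```python
import math
from typing import List, Sequence


def solve(C: Sequence[Sequence[float]], d: Sequence[float]) -> List[float]:
    """Solve C z = d by Gaussian elimination with partial pivoting."""
    n = len(d)
    M = [list(row) + [d_i] for row, d_i in zip(C, d)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    z = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(M[i][k] * z[k] for k in range(i + 1, n))
        z[i] = (M[i][n] - s) / M[i][i]
    return z


def in_ellipsoid_gate(x, mu, C, d2: float) -> bool:
    """Test (x - mu)^T C^-1 (x - mu) <= D^2 without forming C^-1."""
    diff = [x_i - m_i for x_i, m_i in zip(x, mu)]
    z = solve(C, diff)  # z = C^-1 (x - mu)
    return sum(d_i * z_i for d_i, z_i in zip(diff, z)) <= d2


def covariance_from_ellipse(a: float, b: float, theta: float) -> List[List[float]]:
    """Assumed convention: C = R diag(a^2, b^2) R^T, to be used with D^2 = 1."""
    c, s = math.cos(theta), math.sin(theta)
    off = (a * a - b * b) * c * s
    return [[a * a * c * c + b * b * s * s, off],
            [off, a * a * s * s + b * b * c * c]]


C_flat = covariance_from_ellipse(2.0, 1.0, 0.0)             # long axis along x
C_turned = covariance_from_ellipse(2.0, 1.0, math.pi / 2)   # long axis along y
```

Under this convention, an ellipse with half-axes 2 and 1 contains the point (1.9, 0) when unrotated, and the point (0, 1.9) after rotation by 90 degrees.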

Gating-ML also supports decision tree structures where a binary decision tree is stored for the gate. The decision tree encodes a sequence of computing steps that are supposed to be applied on an event in order to decide about its membership in the decision tree gate. A dimension and a threshold are specified in each non-terminal node of the tree. In each computing step, the value of the event is compared against the threshold. Based on the result, computing continues in the “less than” or “greater or equal” tree branch, respectively. The membership results are encoded in terminal nodes. This form of a gate is computationally fast to process; it allows for a complete specification of any arbitrary multidimensional region, whether contiguous or not, and it also allows for encoding of a gate when the boundary is difficult to define geometrically.
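The traversal described above can be sketched in Python as follows. The class names `Node` and `Leaf`, their field names, and the example tree are hypothetical and are not the element names used in the Gating-ML schema.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass
class Leaf:
    inside: bool  # membership result stored in a terminal node


@dataclass
class Node:
    dimension: str                          # dimension compared in this node
    threshold: float
    less_than: Union["Node", Leaf]          # branch taken if value < threshold
    greater_or_equal: Union["Node", Leaf]   # branch taken otherwise


def in_decision_tree_gate(event: Dict[str, float], tree: Union[Node, Leaf]) -> bool:
    """Walk from the root to a terminal node and return its membership result."""
    node = tree
    while isinstance(node, Node):
        if event[node.dimension] < node.threshold:
            node = node.less_than
        else:
            node = node.greater_or_equal
    return node.inside


# A non-contiguous one-dimensional gate: inside if FL1 < 10 or FL1 >= 20.
tree = Node("FL1", 10.0, Leaf(True), Node("FL1", 20.0, Leaf(False), Leaf(True)))
```

The example encodes two disjoint intervals in a single gate, which illustrates why a decision tree can describe regions that are not contiguous.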

Finally, Boolean collections of any of the types of gates extend the expressiveness of the Gating-ML language. Any number of arbitrary gates may be combined using the “AND”, “OR”, and “NOT” operators to describe complex multidimensional regions and to combine gates defined in different dimensions. The operand gates can either be defined inline or be included by means of a gate reference.
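One way to picture Boolean collections is to treat each gate as a predicate on an event and to build the combined gate from the predicates of its operands. The Python sketch below follows this idea; the combinator names and the helper `range_gate` are hypothetical, and the sketch makes no claim about how operands are encoded or referenced in Gating-ML XML.

```python
from typing import Callable, Dict

Event = Dict[str, float]
Gate = Callable[[Event], bool]


def and_gate(*gates: Gate) -> Gate:
    """Member of the result only if member of all operand gates."""
    return lambda event: all(g(event) for g in gates)


def or_gate(*gates: Gate) -> Gate:
    """Member of the result if member of at least one operand gate."""
    return lambda event: any(g(event) for g in gates)


def not_gate(gate: Gate) -> Gate:
    """Member of the result only if not a member of the operand gate."""
    return lambda event: not gate(event)


def range_gate(dim: str, minimum: float, maximum: float) -> Gate:
    """Illustrative one-dimensional operand: minimum inclusive, maximum exclusive."""
    return lambda event: minimum <= event[dim] < maximum


# Operands defined in different dimensions are combined into a single gate.
scatter = range_gate("FSC", 100.0, 400.0)
bright = range_gate("FL1", 50.0, 1000.0)
scatter_and_dim = and_gate(scatter, not_gate(bright))
```

Because the combinators return gates themselves, they can be nested to any depth, mirroring the way Boolean gates may take other Boolean gates as operands.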

Built-in Data Transformations

Gating-ML includes built-in public transformations that have been shown to be useful for display or analysis of cytometry data. These include *logarithmic*, *polynomial of degree one* (i.e., linear combination with translation), *square root*, *asinh* (inverse hyperbolic sine), *split-scale* (22), *Hyperlog* (21), and *ratio of two parameters*, as well as inverse transformations wherever these exist, i.e., *exponential*, *quadratic transformation*, *hyperbolic sine*, *inverse split-scale*, and *EH* transformations (21). The inverse transformations are useful when only transformed data are available, including cases where list mode data are stored on logarithmic scales in FCS files. The exponential transformation may be used to describe the conversion of channel values to linear scale prior to applying any further transformation, such as those described in (20–22).

Hyperlog (21) is a transformation that is approximately linear for low and negative values and approximately logarithmic for high values. It is defined as the inverse of a linear combination of an exponential with a linear transformation, and is more suitable for certain kinds of FCM data analysis than a traditional logarithmic scale. Depending on the implementation, the inverse-based nature of the Hyperlog definition may make it computationally expensive. If performance is an issue, the asinh and the split-scale transformations may be used as simple and computationally inexpensive alternatives.
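The inverse-based nature of such a transformation can be illustrated with a small Python sketch. The forward function `eh` below, its coefficients `a`, `b`, and `c`, and their default values are assumptions chosen only to have the general shape "exponential plus linear term"; they are not the parametrization defined in (21) or in the Gating-ML specification. The inverse is obtained numerically by bisection, which shows where the computational cost comes from.

```python
import math


def eh(y: float, a: float = 1.0, b: float = 1.0, c: float = 1.0) -> float:
    """Hypothetical forward function: exponential plus linear term, eh(0) = 0."""
    return a * math.exp(b * y) + c * y - a


def hyperlog_like(x: float, a: float = 1.0, b: float = 1.0, c: float = 1.0,
                  tol: float = 1e-12) -> float:
    """Invert eh by bisection; eh is strictly increasing for a, b, c > 0."""
    lo, hi = -1.0, 1.0
    while eh(lo, a, b, c) > x:   # expand the bracket downwards
        lo *= 2.0
    while eh(hi, a, b, c) < x:   # expand the bracket upwards
        hi *= 2.0
    while hi - lo > tol * max(1.0, abs(lo), abs(hi)):
        mid = 0.5 * (lo + hi)
        if eh(mid, a, b, c) < x:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def asinh_scale(x: float) -> float:
    """Closed-form alternative with similar qualitative behaviour."""
    return math.asinh(x)
```

Each call to `hyperlog_like` requires several dozen evaluations of the exponential, whereas `asinh_scale` is a single closed-form call; a production implementation would more likely use a lookup table or a faster root-finding method than plain bisection.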

Not all Gating-ML data transformations are reversible, which supports the design choice to describe gates in transformed space. For example, linear combination of parameters is a transformation, which actually loses some of the original information; however, it is used in analytical software to “increase” the number of parameters that can be visually inspected in a two-dimensional space. Another non-reversible transformation is the ratio of two parameters, which becomes useful for computational depolarizations and normalizations, such as using the forward scatter to normalize fluorescence values.

Compensation

In FCM, the emission spectral overlap of fluorescent labels usually makes it necessary to correct detected signals before using the values as a basis for other analyses. Fluorescence compensation is the process by which the amount of spectral overlap is subtracted from the total detected signals to yield an estimate of the actual amount of each dye. Within Gating-ML, fluorescence compensation is a type of transformation. While compensation of a single parameter can also be described as a linear combination, storing spillover matrices is more transparent and it defines compensation of all involved parameters at once (i.e., as a function from *n* to *n* arguments). Gating-ML supports spillover matrices and referencing compensated parameters individually.
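Compensation with a spillover matrix amounts to solving a small linear system per event. The Python sketch below assumes the common convention that entry *S*[*i*][*j*] is the fraction of the signal of dye *i* that appears in detector *j*, so that the observed vector equals the true vector multiplied by *S*; this orientation, the function name, and the example coefficients are assumptions and should be checked against the specification before use.

```python
from typing import List, Sequence


def compensate(observed: Sequence[float],
               spillover: Sequence[Sequence[float]]) -> List[float]:
    """Recover dye amounts t from detector signals o, assuming o = t . S."""
    n = len(observed)
    # Equation for detector r: sum over dyes c of S[c][r] * t[c] = o[r].
    M = [[spillover[c][r] for c in range(n)] + [observed[r]] for r in range(n)]
    for col in range(n):  # Gaussian elimination with partial pivoting
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    true = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(M[i][k] * true[k] for k in range(i + 1, n))
        true[i] = (M[i][n] - s) / M[i][i]
    return true


# Illustrative coefficients: 10% of dye 0 spills into detector 1,
# and 20% of dye 1 spills into detector 0.
S = [[1.0, 0.1],
     [0.2, 1.0]]
observed = [110.0, 60.0]  # produced by true dye amounts of 100 and 50
```

All *n* compensated values are obtained in one step, which matches the description of compensation as a function from *n* to *n* arguments.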

Initial Software Implementations

Support for Gating-ML has been included in the *flowUtils* R package. The R Project for Statistical Computing (25) is an open-source research platform for evaluating and implementing statistical methods. In order to support statistical analysis in FCM several R packages (*flowCore, plateCore, flowUtils, flowQ*, *flowClust* (26), and *flowViz*) are being developed within Bioconductor (27, 28), an open source and open development R-based software project for the analysis and comprehension of genomic data.

We have also developed an open source Java application named FACEJava in order to test the implementability of the Gating-ML specification. This operating-system-independent tool is capable of applying gating, compensation and other transformations to FCS list mode data files. FACEJava aided the development process, as it identified implementation bottlenecks and other issues that were subsequently solved during the early stage of development of Gating-ML. It has been invaluable both in testing design aspects of this specification and in building compliance tests as informative references for third-party developers.