Two methods are currently in use for specifying SEM models in scripts. The first centers on specifying the matrices that define the covariance and mean structure of the manifest and latent variables. The second is based on path analysis and uses a compact specification of the paths and variables in a path diagram. In the end, both methods produce a set of matrix equations that define an objective function (sometimes called a cost function), which is then minimized to find the parameter estimates. Popular objective functions include maximum likelihood and several variants of least squares.
OpenMx implements both matrix-centric and path-centric methods for specifying the structure of a model, so one can use either method or even a combination of the two. We provide a short example of each later in the article. In addition to built-in objective functions such as full information maximum likelihood (FIML), OpenMx lets users specify their own custom objective functions.
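To show concretely how both styles end in the same matrix equations, the following base-R sketch (it does not call OpenMx) converts a small path list for a one-factor model into RAM-style matrices: A for one-headed arrows, S for two-headed arrows, and a filter matrix Fsel that selects the manifest variables. It then computes the model-implied covariance F(I - A)^-1 S (I - A)^-T F' and compares it with the matrix-centric formula ΛΦΛ' + Θ. The variable names, the path values, and the data-frame layout of the path list are illustrative assumptions, not OpenMx's interface.

```r
# Illustrative one-factor model: factor F1 with indicators x1, x2, x3.
# Values are assumptions chosen so that each implied variance equals 1.
paths <- data.frame(
  from   = c("F1", "F1", "F1", "F1", "x1", "x2", "x3"),
  to     = c("x1", "x2", "x3", "F1", "x1", "x2", "x3"),
  arrows = c(1, 1, 1, 2, 2, 2, 2),
  value  = c(0.8, 0.7, 0.6, 1.0, 0.36, 0.51, 0.64),
  stringsAsFactors = FALSE
)
manifest <- c("x1", "x2", "x3")
vars <- c(manifest, "F1")
n <- length(vars)

# Path-centric -> matrices: A holds one-headed arrows (row = to, column = from),
# S holds two-headed arrows (kept symmetric), Fsel selects the manifest variables.
A <- matrix(0, n, n, dimnames = list(vars, vars))
S <- matrix(0, n, n, dimnames = list(vars, vars))
for (i in seq_len(nrow(paths))) {
  from <- paths$from[i]
  to <- paths$to[i]
  if (paths$arrows[i] == 1) {
    A[to, from] <- paths$value[i]
  } else {
    S[to, from] <- paths$value[i]
    S[from, to] <- paths$value[i]
  }
}
Fsel <- matrix(0, length(manifest), n, dimnames = list(manifest, vars))
Fsel[, manifest] <- diag(length(manifest))

# RAM expectation: F (I - A)^-1 S (I - A)^-T F'
E <- solve(diag(n) - A)
implied_ram <- Fsel %*% E %*% S %*% t(E) %*% t(Fsel)

# Matrix-centric equivalent: Lambda Phi Lambda' + Theta
Lambda <- matrix(c(0.8, 0.7, 0.6), 3, 1)
Phi    <- matrix(1, 1, 1)
Theta  <- diag(c(0.36, 0.51, 0.64))
implied_matrix <- Lambda %*% Phi %*% t(Lambda) + Theta

all.equal(unname(implied_ram), implied_matrix)  # TRUE
```

Both routes yield the same 3 × 3 implied covariance matrix, which is what an objective function would then compare with the observed data.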
The data structures produced when one creates an OpenMx SEM model depart from those produced by other SEM software. We next describe these structures and how they fit together. Although software has improved, SEM modelers continue to think about model structure in ways that have changed very little since the 1960s. One may use OpenMx without changing one's conception of model building, continuing to write path specifications or matrix specifications in a serially ordered script. However, R is interactive, has powerful vector and matrix operations, and incorporates the flow control of a full programming language; together these features allow one to rethink the way models are specified. The OpenMx data structures are designed to flexibly accommodate the power of R. The authors hope that these factors will be sufficient to trigger a paradigm shift in the way SEM is conceived and taught.
This section begins with a description of three of the basic structures in OpenMx: MxModel, MxMatrix, and MxAlgebra. We describe how MxModels may contain other MxModels in a tree-like hierarchy, and how references are made within an MxModel hierarchy. We then briefly discuss how data and objective functions are specified within an MxModel. Finally, we provide two example specifications of a simple confirmatory factor model.
MxModels and the Objects They Contain
Data structures in OpenMx are implemented as objects, specifically R S4 objects. The MxModel is the object that contains everything needed to specify a structural model. It is primarily a container for other objects, providing the organization that allows the contained objects to refer to one another (see the accompanying figure). Each MxModel has three slots for metainformation about the model: an internal reference name, a type, and a flag that indicates whether the model can be estimated independently of other models.
An MxModel is a data object that contains metainformation and lists of other Mx objects.
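A minimal sketch of such a container, assuming nothing about OpenMx's actual class definitions, is an S4 class with the three metainformation slots named above plus list-valued slots for the contained objects. The class name SketchModel and its slot names are hypothetical.

```r
library(methods)

# Hypothetical S4 class echoing the description above; not OpenMx's MxModel class.
setClass("SketchModel", slots = c(
  name        = "character",  # internal reference name
  type        = "character",  # model type
  independent = "logical",    # may this model be estimated on its own?
  matrices    = "list",
  algebras    = "list",
  constraints = "list",
  data        = "ANY",        # at most one data object
  objective   = "ANY",
  submodels   = "list"        # other SketchModel objects
))

cfa <- new("SketchModel", name = "cfa", type = "default", independent = FALSE,
           matrices = list(lambda = matrix(0.5, 3, 1)))

# A parent container holding the first model as a child.
root <- new("SketchModel", name = "root", type = "default", independent = FALSE,
            submodels = list(cfa = cfa))

names(root@submodels)   # "cfa"
```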
Each MxModel defines a namespace, that is, a self-contained set of names that identify either (1) objects or (2) elements of matrices. Each name is unique within the namespace. Therefore, if a name occurs more than once during the specification of an MxModel, every occurrence refers to the same thing. This turns out to be very powerful. For instance, if two matrix elements are both labeled “b”, then those two elements are constrained to be equal.
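The effect of shared names can be mimicked in a few lines of base R. In this hypothetical sketch (the matrices, labels, and the helper fill_from_labels are illustrative, not OpenMx internals), each distinct label becomes one entry of a parameter vector, and every matrix element carrying that label is filled from the same entry, so the three elements labeled “b” cannot differ.

```r
# Two hypothetical matrices; NA means "unlabeled".
labels_A <- matrix(c("a", "b", NA, "c"), 2, 2)
labels_B <- matrix(c("b", NA, "d", "b"), 2, 2)

# The namespace: every distinct label names exactly one parameter.
all_labels  <- c(labels_A, labels_B)
param_names <- unique(all_labels[!is.na(all_labels)])   # "a" "b" "c" "d"

# One value per name; all elements carrying that name receive the same value.
theta <- setNames(c(1.5, 2.0, 3.0, 4.0), param_names)

fill_from_labels <- function(labels, values, theta) {
  hit <- !is.na(labels)
  values[hit] <- theta[labels[hit]]
  values
}
A <- fill_from_labels(labels_A, matrix(0, 2, 2), theta)
B <- fill_from_labels(labels_B, matrix(0, 2, 2), theta)

c(A[2, 1], B[1, 1], B[2, 2])   # 2 2 2: all three "b" elements share one value
```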
An MxModel may contain lists of MxMatrices, MxAlgebras, and MxConstraints; at most one MxData object; and an objective function. The MxModel also has slots for a list of optimization options and a list that holds the output of the most recent optimization run. In addition, MxModels can contain a list of other MxModels. This allows one to create a hierarchical tree of MxModels subsumed within a root MxModel container. A hierarchical tree of parent and child MxModels provides a new and surprisingly powerful way of thinking about constructing SEM models.
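A tree of models is naturally represented as a recursive structure. The base-R sketch below uses nested lists as a hypothetical stand-in for MxModels and walks the tree depth first, listing every model beneath the root. The dotted path is only a naming convention chosen for this illustration.

```r
# Hypothetical nested-list stand-in for a tree of models.
make_model <- function(name, submodels = list()) {
  list(name = name, submodels = submodels)
}

tree <- make_model("root", list(
  make_model("groupA", list(make_model("A1"), make_model("A2"))),
  make_model("groupB")
))

# Depth-first walk returning one dotted path per model in the tree.
model_paths <- function(m, prefix = character(0)) {
  here <- paste(c(prefix, m$name), collapse = ".")
  below <- lapply(m$submodels, model_paths, prefix = c(prefix, m$name))
  c(here, unlist(below, use.names = FALSE))
}

model_paths(tree)
# "root" "root.groupA" "root.groupA.A1" "root.groupA.A2" "root.groupB"
```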
An MxMatrix is an object that contains five separate R matrices and five metainformation slots: a type; the number of rows; the number of columns; the labels for each row and column (called dimnames in R); and the name by which the matrix is known in its MxModel namespace. The five matrices in an MxMatrix all have the same dimensions but different R storage types. The values matrix is of type double and holds the starting (or estimated) values. The labels matrix is of type character and holds the name of each element of the matrix; elements that share a name are constrained to be equal to one another. The free matrix is of type logical; if an element is TRUE, that element is treated as a free parameter during estimation. The lbound and ubound matrices are of type double and contain lower and upper bounds for the free parameters.
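The five parallel matrices can be sketched as a plain R list. The helper make_mx_like_matrix below is hypothetical and only mirrors the storage types described above; in OpenMx itself a comparable matrix would be created with a call along the lines of mxMatrix(type = "Full", nrow = 3, ncol = 1, free = TRUE, values = 0.5, labels = c("l1", "l2", "l3"), lbound = 0, name = "lambda").

```r
# Hypothetical stand-in for an MxMatrix: five same-sized R matrices plus a name.
make_mx_like_matrix <- function(values, free, labels = NA, lbound = NA,
                                ubound = NA, name) {
  nr <- nrow(values)
  nc <- ncol(values)
  list(
    name   = name,
    values = matrix(as.double(values), nr, nc),     # starting / estimated values
    free   = matrix(as.logical(free), nr, nc),      # TRUE = free parameter
    labels = matrix(as.character(labels), nr, nc),  # element names (NA = none)
    lbound = matrix(as.double(lbound), nr, nc),     # lower bounds (NA = none)
    ubound = matrix(as.double(ubound), nr, nc)      # upper bounds (NA = none)
  )
}

# Three free, labeled factor loadings bounded below by 0.
lambda <- make_mx_like_matrix(values = matrix(0.5, 3, 1), free = TRUE,
                              labels = c("l1", "l2", "l3"), lbound = 0,
                              name = "lambda")

# Do the free values respect their bounds? Missing bounds are ignored.
within_bounds <- function(m) {
  v  <- m$values[m$free]
  lo <- m$lbound[m$free]
  hi <- m$ubound[m$free]
  all((is.na(lo) | v >= lo) & (is.na(hi) | v <= hi))
}

within_bounds(lambda)   # TRUE
```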
An MxAlgebra is an object that contains its name, a formula in R notation, and a result matrix of type double. The operands in the formula are named objects in the MxModel namespace that are either an MxMatrix or an MxAlgebra. Matrix operators include most of the common matrix operations such as addition, subtraction, matrix multiplication, dot product, Kronecker product, inverse, transpose, augmentation, exponentiation, log, and many others. A full list of operators can be found on the OpenMx website wiki.
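The idea of a named formula whose operands are resolved by name can be sketched with R's own quote() and eval(). In the hypothetical example below the namespace is an ordinary list and the loadings and variances are illustrative values; the second algebra refers to the first by name, as an MxAlgebra may refer to another MxAlgebra. OpenMx evaluates its algebras with its own machinery, so this only illustrates the concept.

```r
# Hypothetical namespace: named matrices that formulas may refer to.
namespace <- list(
  lambda = matrix(c(0.8, 0.7, 0.6), 3, 1),   # factor loadings (illustrative)
  phi    = matrix(1, 1, 1),                  # factor variance
  theta  = diag(c(0.36, 0.51, 0.64))         # residual variances
)

# An "algebra": a name, an unevaluated formula in R notation, and a result.
exp_cov <- list(name    = "expCov",
                formula = quote(lambda %*% phi %*% t(lambda) + theta),
                result  = NULL)

# Evaluate the formula with the namespace supplying the operands.
evaluate_algebra <- function(alg, ns) {
  alg$result <- eval(alg$formula, envir = ns)
  alg
}
exp_cov <- evaluate_algebra(exp_cov, namespace)

# Make the result visible by name so a second algebra can use it.
namespace[[exp_cov$name]] <- exp_cov$result
exp_cor <- evaluate_algebra(
  list(name = "expCor",
       formula = quote(diag(1 / sqrt(diag(expCov))) %*% expCov %*%
                         diag(1 / sqrt(diag(expCov))))),
  namespace)

exp_cov$result
```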
An MxConstraint contains two objects, either of which can be an MxMatrix or MxAlgebra, and a relation between them, which can be one of >, <, or ==. This allows the specification of nonlinear constraints that must be satisfied at the end of optimization.
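A constraint of this kind can be sketched as two unevaluated formulas and a relation, checked against the current parameter values. The structure and the helper constraint_satisfied below are hypothetical, and the tolerance of 1e-8 is an arbitrary choice; the example constraint, that every model-implied variance equals 1, is nonlinear in the loadings.

```r
# Current values in a hypothetical namespace.
namespace <- list(
  lambda = matrix(c(0.8, 0.7, 0.6), 3, 1),
  theta  = diag(c(0.36, 0.51, 0.64))
)

# A constraint: left-hand formula, relation, right-hand formula.
unit_variance <- list(
  left     = quote(diag(lambda %*% t(lambda) + theta)),
  relation = "==",
  right    = quote(rep(1, 3))
)

# Is the constraint satisfied, up to a numerical tolerance?
constraint_satisfied <- function(con, ns, tol = 1e-8) {
  lhs <- eval(con$left, envir = ns)
  rhs <- eval(con$right, envir = ns)
  switch(con$relation,
         "==" = all(abs(lhs - rhs) <= tol),
         "<"  = all(lhs < rhs + tol),
         ">"  = all(lhs > rhs - tol),
         stop("unknown relation: ", con$relation))
}

constraint_satisfied(unit_variance, namespace)   # TRUE
namespace$theta <- diag(c(0.50, 0.51, 0.64))
constraint_satisfied(unit_variance, namespace)   # FALSE
```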
Objective Functions and Data
One of the most flexible parts of OpenMx is the way objective functions can be defined. An objective function returns a single scalar value that the optimizer minimizes. Predefined objective functions include maximum likelihood (mxMLObjective) and full information maximum likelihood (mxFIMLObjective). Other objective functions can be specified with mxAlgebraObjective, which allows one to write a formula in the same way as an MxAlgebra, with the caveat that the result of the formula must be a 1 × 1 matrix. This makes it possible to create objective functions for specific optimizations, such as variants of least squares or even various Bayesian optimizations.
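As an illustration of an algebra-style objective, the sketch below writes unweighted least squares, F = 0.5 tr[(S - Σ)²], as a function returning a 1 × 1 matrix and minimizes it with R's optim(). It does not use mxAlgebraObjective; the one-factor model, the observed matrix S_obs, and the driver run_algebra_objective are illustrative assumptions.

```r
# Observed covariance matrix (illustrative; implied by loadings 0.8, 0.7, 0.6).
S_obs <- matrix(c(1.00, 0.56, 0.48,
                  0.56, 1.00, 0.42,
                  0.48, 0.42, 1.00), 3, 3)

# Unweighted least squares as an "algebra" whose result is a 1 x 1 matrix,
# with Sigma = lambda lambda' + diag(theta).
uls_objective <- function(par) {
  lambda <- matrix(par[1:3], 3, 1)
  Sigma  <- lambda %*% t(lambda) + diag(par[4:6])
  resid  <- S_obs - Sigma
  matrix(0.5 * sum(diag(resid %*% resid)), 1, 1)
}

# Hypothetical driver: insist on a 1 x 1 result, then hand the scalar to optim().
run_algebra_objective <- function(objective, start) {
  scalar <- function(p) {
    out <- objective(p)
    if (!identical(dim(out), c(1L, 1L))) stop("objective must return a 1 x 1 matrix")
    out[1, 1]
  }
  optim(start, scalar, method = "BFGS", control = list(maxit = 500))
}

fit <- run_algebra_objective(uls_objective, start = rep(0.5, 6))
round(fit$par, 3)   # loadings, then residual variances
```

Swapping in the maximum likelihood discrepancy, log|Σ| + tr(S Σ^-1) - log|S| - p, would change only the body of the objective; the driver stays the same.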
The MxData object contains the data used for optimization. The data may be raw data, a correlation matrix, a covariance matrix, a covariance matrix with a vector of means, or a sums of squares and crossproducts matrix. Each column in the raw data or covariance matrix must have a name. If the data are an R data frame, or a covariance matrix calculated from a data frame, these column names are supplied automatically; for data supplied from other sources, the names must be set via dimnames. Named columns in MxMatrices that match the dimnames of the MxData are automatically mapped to the corresponding columns of the data.
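Name-based mapping can be sketched in base R. Below, cov() on a data frame returns a matrix that already carries dimnames, while a matrix from elsewhere needs them added by hand. The hypothetical helper align_to_model then reorders an observed matrix to a model's variable order by name and fails loudly if names are missing. In OpenMx the observed matrix would instead be passed to mxData (for example, mxData(observed = cov(raw), type = "cov", numObs = nrow(raw))), and the mapping is done automatically.

```r
# Raw data in a data frame: column names come along automatically.
raw <- data.frame(x1 = c(1, 2, 3, 4, 5),
                  x2 = c(2, 1, 4, 3, 5),
                  x3 = c(5, 3, 4, 1, 2))
S_from_df <- cov(raw)
dimnames(S_from_df)   # names taken from the data frame

# A covariance matrix from another source has no names until we add them.
S_external <- matrix(c(1.0, 0.3, 0.2,
                       0.3, 1.0, 0.4,
                       0.2, 0.4, 1.0), 3, 3)
dimnames(S_external) <- list(c("x1", "x2", "x3"), c("x1", "x2", "x3"))

# Hypothetical helper: reorder an observed matrix to the model's order by name.
align_to_model <- function(observed, model_vars) {
  if (is.null(colnames(observed))) stop("observed matrix needs dimnames")
  absent <- setdiff(model_vars, colnames(observed))
  if (length(absent) > 0) {
    stop("variables not found in data: ", paste(absent, collapse = ", "))
  }
  observed[model_vars, model_vars, drop = FALSE]
}

model_vars <- c("x3", "x1", "x2")   # the model lists variables in another order
aligned <- align_to_model(S_external, model_vars)
aligned["x3", "x2"]                 # 0.4, the same entry as S_external["x3", "x2"]
```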
One of the novel features of OpenMx is that models can contain other models, as shown in the accompanying figure. This allows one to think very naturally about how dependency is structured in an SEM context. For instance, a model hierarchy can express dependency in a genetic SEM analysis: an ACE model is built that contains the matrices common to all groups, and then two submodels are constructed, one for the monozygotic twin pairs and one for the dizygotic twin pairs. This approach partitions the problem into submodels that follow the logical group structure of the data. Similarly, a mixture distribution analysis can be set up as a model tree in which the submodels are the components of the mixture and the top-level model expresses the overall likelihood calculation for the mixture.
MxModels can contain lists of submodels.
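The twin example can be sketched as a parent holding the shared path coefficients a, c, and e and two children that differ only in how strongly twins share additive genetic effects (1 for MZ pairs, 0.5 for DZ pairs); the shared environment is common to both members of a pair in either group. This nested-list version is hypothetical and the parameter values are illustrative; it only shows how each submodel's expected covariance is built from the parent's matrices.

```r
# Hypothetical nested-list sketch of an ACE model tree (not OpenMx objects).
ace <- list(
  name     = "ACE",
  matrices = list(a = 0.6, c = 0.4, e = 0.5),   # shared path coefficients
  submodels = list(
    MZ = list(name = "MZ", r_genetic = 1.0),    # MZ pairs share all additive genetic effects
    DZ = list(name = "DZ", r_genetic = 0.5)     # DZ pairs share half on average
  )
)

# Expected 2 x 2 covariance of a twin pair, built from the parent's matrices.
expected_twin_cov <- function(parent, child) {
  A <- parent$matrices$a^2
  C <- parent$matrices$c^2
  E <- parent$matrices$e^2
  within  <- A + C + E
  between <- child$r_genetic * A + C
  matrix(c(within, between, between, within), 2, 2)
}

expected <- lapply(ace$submodels, function(child) expected_twin_cov(ace, child))
expected$MZ
expected$DZ
```

Because both groups read a, c, and e from the parent, changing a value there changes both expected matrices at once.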
Multiple independent models can be grouped together as submodels into a single run for problems such as bootstrapping or simulation, where the top-level model can fit an overall model to the estimates returned from the independent models. For independent models, OpenMx uses the facilities of snow and swift to distribute the job over multiple CPUs. The number of models that can be structured into a hierarchy is limited only by the memory of your computer; we have run cases with tens of thousands of submodels.
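The pattern of many independent submodels feeding a top-level summary can be sketched with a nonparametric bootstrap in base R. Each replicate resamples the data and estimates a regression slope without sharing parameters with any other replicate; the top level then summarizes the estimates. The data are simulated for illustration, and parallel::mclapply() and parallel::parLapply() are mentioned only as base-R analogues for distributing independent jobs, not as what OpenMx uses.

```r
set.seed(2011)
x <- 1:20
y <- 2 * x + rnorm(20)   # simulated data with true slope 2

# Each replicate is an independent "submodel" with its own resampled data.
make_replicate <- function(b) {
  idx <- sample.int(length(x), replace = TRUE)
  list(name = paste0("boot", b), x = x[idx], y = y[idx])
}
fit_replicate <- function(r) cov(r$x, r$y) / var(r$x)   # OLS slope

replicates <- lapply(seq_len(200), make_replicate)

# The replicates share nothing, so this lapply() could be replaced by
# parallel::mclapply() or parallel::parLapply() to spread them over CPUs.
estimates <- unlist(lapply(replicates, fit_replicate))

# The "top-level model" summarizes the independent results.
c(estimate = cov(x, y) / var(x), boot_se = sd(estimates))
```

Drawing all resamples before fitting keeps the random-number stream in one process, so parallelizing only the fitting step would not change the results.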
A model hierarchy structure allows one to express the logic of an analysis in a straightforward and simplified manner. This feature of OpenMx is a departure from traditional SEM specification, and has proven popular among beta testers of OpenMx.
References within MxModels and MxModel Trees
The namespace for an MxModel includes all of the non-independent models in its hierarchical tree. Thus, for instance, parameters can be constrained to be equal across two submodels, as shown in the accompanying figure. Constraints cannot be made to elements in an independent submodel; this restriction is one of the conditions that allows branches of a model tree marked as independent to be estimated independently. In the figure, four elements from three matrices across two submodels have all been constrained to be equal by labeling each of them “d”.
Equality constraints can be defined between submodels.
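How independence limits the shared namespace can be sketched by pooling labels only from linked submodels. In this hypothetical tree, which mirrors the example of four elements from three matrices across two submodels, the label “d” occurs four times among the linked submodels and so names a single parameter, while the independent submodel's “d” is left out of the pool.

```r
# Hypothetical tree: each submodel lists the element labels of its matrices.
tree <- list(
  name = "top",
  submodels = list(
    group1 = list(independent = FALSE,
                  labels = list(M1 = c("d", "e"), M2 = c("d", NA))),
    group2 = list(independent = FALSE,
                  labels = list(M3 = c("d", "d", "f"))),
    boot   = list(independent = TRUE,    # estimated on its own
                  labels = list(M4 = c("d", "g")))
  )
)

# Pool labels from non-independent submodels only; count each label's uses.
shared_labels <- function(tree) {
  linked <- Filter(function(s) !s$independent, tree$submodels)
  labs <- unlist(lapply(linked, function(s) unlist(s$labels)), use.names = FALSE)
  table(labs[!is.na(labs)])
}

shared_labels(tree)   # "d" is used by four elements yet is one parameter
```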
Free elements of MxMatrices can also be constrained to equal the results of MxAlgebras by using labels that include the MxAlgebra name and an index into its result matrix, as shown in the accompanying figure. This allows matrix elements to be constrained to nonlinear functions of free parameters, for use in, e.g., logistic regression or continuous-time differential equation models.
Labels can be used to constrain a matrix element to be equal to a matrix element from an algebraic result.
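A label that points into an algebra's result can be sketched as a small parser plus a lookup. The helper below and its name[row,col] label pattern are assumptions made for this illustration; the algebra applies the logistic transform to a parameter b0, the kind of nonlinear function mentioned above.

```r
# Hypothetical resolver for labels of the form "algebraName[row,col]".
resolve_algebra_label <- function(label, algebras) {
  pattern <- "^([A-Za-z.][A-Za-z0-9._]*)\\[([0-9]+),([0-9]+)\\]$"
  parts <- regmatches(label, regexec(pattern, label))[[1]]
  if (length(parts) == 0) return(NA_real_)   # not an algebra reference
  algebras[[parts[2]]][as.integer(parts[3]), as.integer(parts[4])]
}

# A parameter b0 and an algebra holding a nonlinear (logistic) function of it.
b0 <- 0.4
algebras <- list(prob = matrix(1 / (1 + exp(-b0)), 1, 1))

# Element [1,1] is tied to prob[1,1]; element [1,2] keeps its own value.
labels <- matrix(c("prob[1,1]", NA), 1, 2)
values <- matrix(c(0, 0.25), 1, 2)
for (k in which(!is.na(labels))) {
  v <- resolve_algebra_label(labels[k], algebras)
  if (!is.na(v)) values[k] <- v
}
values
```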