2.1. Formal Framework

Formally, we focus on data for multiple matrices *X*_{1}*, X*_{2}*,* …*, X*_{k} with *k* ≥ 2. Each matrix has *n* columns, corresponding to a common set of *n* objects. The *i*th matrix *X*_{i} has *p*_{i} rows, each corresponding to a variable in a given measurement technology that varies from matrix to matrix. For example, in our application to GBM data the rows of *X*_{1} contain gene expression measurements (of dimension *p*_{1} = 23*,* 293) and the rows of *X*_{2} contain miRNA measurements (of dimension *p*_{2} = 534) for the same set of 234 tissue samples (*n* = 234). The *k* matrices may be combined into a single data matrix

where

*p* =

*p*_{1} +

*p*_{2} + … +

*p*_{k}.

Direct analysis of

*X* can be problematic as the size and scale of the constituent data types often differ. To remove baseline differences between data types, it is helpful to row-center the data by subtracting the mean within each row. Data types may also be of different dimension (

*p*_{i}) or differ in variability. To circumvent cases where “the largest dataset wins”, it is helpful to scale each data type by its total variation, or sum-of-squares. In particular, for each

*i* define

, where || · || defines the Frobenius norm

. Then,

for each

*i*, and each data type contributes equally to the total variation of the concatenated matrix

2.2. Model

Let *X*_{1}*, X*_{2}*,* …*, X*_{k} be matrices as in Section 2.1, scaled appropriately. Joint structure is represented by a single *p* × *n* matrix of rank *r <* rank(*X*). Individual structure for each *X*_{i} is represented by a *p*_{i} × *n* matrix of rank *r*_{i} < rank(*X*_{i}).

More formally, let *A*_{i} be the matrix representing the individual structure of *X*_{i}, and let *J*_{i} be the submatrix of the joint structure matrix that is associated with *X*_{i}. Then, the unified JIVE model is

where

*ε*_{i} are

*p*_{i} ×

*n* error matrices of independent entries with

(

*ε*_{i}) = 0

_{p}_{</sub>}_{i}_{</sub>×}_{n}. Let

denote the joint structure matrix. The model imposes the rank constraints rank(

*J*) =

*r* and rank(

*A*_{i}) =

*r*_{i} for

*i* = 1

*,* …

*, k*. Furthermore, we require that the rows of joint and individual structure are orthogonal:

for

*i* = 1

*,* …

*, k*. Intuitively, this means that sample patterns responsible for joint structure between data types are unrelated to sample patterns responsible for individual structure. This requirement does not constrain the model, in that any matrix in the form (

2.2) can be written equivalently with orthogonality between joint and individual structure. Furthermore, the orthogonality constraint assures that the joint and individual components in (

2.2) are uniquely determined (see

Supplement A for more details). It is remarkable that no further orthogonality constraints (say, between the column space of

*J*_{i} and the column space of

*A*_{i}, or between the row spaces of each

*A*_{i}) are required to make the decomposition identifiable.

2.3. Estimation

Here we discuss estimation of the model for fixed ranks

*r, r*_{1}*,* …

*, r*_{k}. The choice of these ranks is important to accurately quantify the amount of joint and individual structure.

Supplement A describes a permutation testing approach to rank selection.

Joint and individual structure are estimated by minimizing the sum of squared error. Let

*R* be the

*p* ×

*n* matrix of residuals after accounting for joint and individual structure:

We estimate the matrices *J* and *A*_{1}*,* …*, A*_{k} by minimizing ||*R*||^{2} under the given ranks. This is accomplished by iteratively estimating joint and individual structure:

- Given
*J*, find *A*_{1}*,* …*, A*_{k} to minimize ||*R*||. - Given
*A*_{1}*,* …*, A*_{k}, find *J* to minimize ||*R*||. - Repeat until convergence.

The joint structure

*J* minimizing ||

*R*|| is equal to the first

*r* terms in the singular value decomposition (SVD) of X with individual structure removed. The estimated individual structure for

*X*_{i} is equal to the first

*r*_{i} terms of the SVD of

*X*_{i} with the joint structure removed. The estimate of individual structure for

*X*_{i} will not change those for

*X*_{j},

*j* ≠

*i*, and hence the

*k* individual approximations minimize ||

*R*|| for fixed joint structure. Pseudocode for this iterative algorithm is given in

Supplement A. Computing time can be improved by reducing the dimensionality of

*X*_{1}*,* …

*, X*_{k} at the outset via their SVD (see

Supplement A).

The iterative method is monotone in the sense that ||*R*|| decreases at each step. Thus ||*R*|| converges to a coordinate-wise minimum, which can’t be improved by changing the estimated joint or individual, structure. Further convergence properties of the algorithm are currently under study.

2.4. Illustrative Example

As a basic illustration we generate two matrices, *X* and *Y*, with simple patterns corresponding to joint and individual structure. The simulated data are depicted in . Both *X* and *Y* are of dimension 50 × 100, i.e., each has 50 variables measured for the same 100 objects. A common pattern *V* of 100 independent standard normal variables is added to half of the rows in *X* and half of the rows in *Y*. This represents the joint structure between the two datasets. Structure individual to *X* is generated by partitioning the objects into five groups, each of size twenty. Those columns corresponding to group 1, 2, 3, 4, or 5 have −2, −1, 0, 1, 2 added to each row of *X*, respectively. Structure individual to *Y* is generated similarly, but the groups are randomly determined and are therefore independent of the groups in *X*. Finally, independent *N(0,1)* noise is added to both *X* and *Y*. Note that the important joint structure is visually obscured.

The common pattern *V* represents an underlying phenomenon that contributes to several variables in both *X* and *Y*. Practically, the individual structure in *X* (or *Y*) may correspond to an experimental grouping of the measured variables in *X* (*Y*) not present in *Y* (*X*), e.g., *batch* effects in microarray data. Our goal is to identify both the common underlying phenomenon and individual group effects.

shows the JIVE estimate for joint structure, JIVE estimates for individual structure, and the fit given by the sum of joint and indivual structure. Estimates closely resemble the true signal in .

2.5. GBM Data

As the gene expression and miRNA data for the GBM samples differ in dimension and variability, they were scaled as in Section 2.1. Permutation testing (see

Supplement A) was used to determine the ranks of estimated joint and individual structure. The test (using

*α* = 0.01, and 1000 permutations) identified

- rank 5 joint structure
- rank 33 structure individual to gene expression.
- rank 13 structure individual to miRNA.

The percentage of variation (sum of squares) explained in each dataset by joint structure, individual structure, and residual noise is shown in . This illustrates how the JIVE decomposition can be used to quantify and compare the amount of shared and individual variation between data types. As shown in , joint structure is responsible for more variation in miRNA than in gene expression (23% and 14%, respectively), and the gene expression data has a considerable amount of structured variation (58%) that is unrelated to miRNA. This is consistent with current biological understanding, as miRNAs are just one of several factors that can influence gene expression.

Heatmaps of the low-rank estimates for joint structure, individual structure, and residual noise are shown in . Columns have the same order in all heatmaps. This reveals the shared patterns present in the joint structure from , but little of the structure that is present in the individual estimates.