We begin the description of our model by defining some notation. Let Ψ^{oct} and Ψ^{vf} denote the macular OCT thickness ratio map and its paired VF map, respectively. Suppose there are *n* such example image pairs in our training library, denoted by {(Ψ_{i}^{oct}, Ψ_{i}^{vf})}, *i* = 1, …, *n*. The *n* OCT ratio maps are aligned with one another by virtue of the manner in which they are obtained (all centered over the fovea). Similarly, the *n* VF maps are aligned with one another. Hence, alignment is not an issue for either the OCT or the VF images, as they are all acquired with reference to the same fixed structure.

We first compute the mean image functions Ψ̄^{oct} and Ψ̄^{vf}, representing the mean OCT ratio map and the mean VF map of our library, respectively. To extract the variability of the OCT and VF images, the mean images Ψ̄^{oct} and Ψ̄^{vf} are subtracted from each of the *n* OCT and VF images, respectively. This operation yields *n* mean-offset OCT images and *n* mean-offset VF images, denoted by Ψ̃^{oct} and Ψ̃^{vf}, respectively, as shown below:

Ψ̃_{i}^{oct} = Ψ_{i}^{oct} − Ψ̄^{oct},  Ψ̃_{i}^{vf} = Ψ_{i}^{vf} − Ψ̄^{vf},  *i* = 1, …, *n*,

where

Ψ̄^{oct} = (1/*n*) Σ_{i=1}^{n} Ψ_{i}^{oct}  and  Ψ̄^{vf} = (1/*n*) Σ_{i=1}^{n} Ψ_{i}^{vf}.
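The mean and mean-offset computation above can be sketched in a few lines of NumPy; the array names and toy map sizes below are illustrative stand-ins, not the paper's actual 501 × 501 and 6 × 6 data:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
# Hypothetical stand-ins for the n paired maps (real maps are 501x501 and 6x6).
oct_maps = rng.random((n, 8, 8))   # toy OCT thickness ratio maps
vf_maps = rng.random((n, 3, 3))    # toy VF maps

# Mean image functions over the library.
oct_mean = oct_maps.mean(axis=0)
vf_mean = vf_maps.mean(axis=0)

# Mean-offset images: subtract the mean map from each example.
oct_offset = oct_maps - oct_mean
vf_offset = vf_maps - vf_mean

# By construction the mean-offset images sum to zero across the library.
assert np.allclose(oct_offset.sum(axis=0), 0)
assert np.allclose(vf_offset.sum(axis=0), 0)
```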

To prepare the training library, we form *n* column vectors {**s**_{1}, …, **s**_{n}}, where each column vector **s**_{i} consists of *P* samples of Ψ̃_{i}^{oct} concatenated directly on top of *Q* samples of Ψ̃_{i}^{vf}. Therefore, each **s**_{i} consists of *N* = *P* + *Q* elements. Specifically, in our setup, for OCT and VF image pair *i*, the 501 × 501 pixel OCT image is used to generate a column of *P* = 251,001 lexicographically ordered pixels (where the columns of the image grid are sequentially concatenated on top of one another to form one large column). The same strategy is applied to the 6 × 6 VF image to generate a column of *Q* = 34 lexicographically ordered pixels (after excluding the blind spot, which occupies 2 test locations in the VF). These two column vectors are concatenated on top of each other to form a large column vector **s**_{i} of dimension *N* = 251,035. This process is repeated *n* times, once for each OCT and VF image pair, to form a tall and skinny matrix of size 251,035 × *n*. We define this matrix *S* as

*S* = [**s**_{1} **s**_{2} ⋯ **s**_{n}].

A simple schematic diagram to illustrate the formation of *S* is shown in .
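A minimal sketch of this vectorize-and-stack construction is given below. Stacking the image-grid columns on top of one another corresponds to column-major (Fortran-order) flattening; the sizes are toy values, and the blind-spot exclusion is not modeled:

```python
import numpy as np

rng = np.random.default_rng(1)
n, H, W_oct, h, w = 4, 8, 8, 3, 3   # toy sizes standing in for 501x501 and 6x6
oct_offset = rng.random((n, H, W_oct))   # mean-offset OCT images
vf_offset = rng.random((n, h, w))        # mean-offset VF images

P = H * W_oct   # 251,001 in the paper's setup
Q = h * w       # 34 after blind-spot exclusion (not modeled here)
N = P + Q

cols = []
for i in range(n):
    # Lexicographic ordering: image-grid columns concatenated on top of one
    # another, i.e. column-major (Fortran-order) flattening.
    s_oct = oct_offset[i].flatten(order="F")
    s_vf = vf_offset[i].flatten(order="F")
    cols.append(np.concatenate([s_oct, s_vf]))   # OCT on top of VF

S = np.column_stack(cols)   # tall-and-skinny N x n matrix
assert S.shape == (N, n)
```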

Eigenvalue decomposition is employed to factor *SS*^{T} as:

*SS*^{T} = *X* Λ *X*^{T},

where *X* is a tall and skinny rectangular *N* × *n* matrix whose columns represent the *n* principal variational modes, or eigenvectors, of the paired OCT and VF images, and Λ is an *n* × *n* diagonal matrix whose diagonal elements, denoted by λ_{1}, …, λ_{n}, represent the corresponding non-zero eigenvalues. Because these eigenvectors are derived by performing principal component analysis on the matrix *S*, which contains information not only about the variability of the OCT and VF images but also about their co-dependencies, the derived eigenvectors will naturally demonstrate strong coupling between the OCT and the VF images. This idea is central to the success of our algorithm. Further, as a direct result of concatenating the vectorized OCT image representation directly on top of the vectorized VF image representation, the *N* × *n* matrix *X* can be easily partitioned into an upper and a lower submatrix. Specifically,

*X* = [*X*^{oct} ; *X*^{vf}],

where the semicolon denotes vertical stacking, *X*^{oct} is of dimension *P* × *n*, and *X*^{vf} is of dimension *Q* × *n*. As we will show later, these submatrices *X*^{oct} and *X*^{vf} are important for image reconstruction.
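The partitioning of the eigenvector matrix is a simple row split; a sketch with illustrative toy dimensions (the paper's are *P* = 251,001 and *Q* = 34):

```python
import numpy as np

rng = np.random.default_rng(2)
P, Q, n = 12, 4, 3        # toy sizes; the paper uses P = 251001, Q = 34
N = P + Q
X = rng.random((N, n))    # stand-in for the eigenvector matrix

# Upper block holds the OCT rows, lower block the VF rows.
X_oct = X[:P, :]          # P x n
X_vf = X[P:, :]           # Q x n

# Stacking the two blocks back together recovers X exactly.
assert np.array_equal(np.vstack([X_oct, X_vf]), X)
```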

From a computational standpoint, calculating the eigenvectors and eigenvalues of the *N* × *N* matrix *SS*^{T} is simply not practical. To offset this difficulty, in standard practice we instead compute the eigenvectors and eigenvalues of a much smaller *n* × *n* matrix *W* given by:

*W* = *S*^{T}*S*.

Thus, if **d** is an eigenvector of *W* with corresponding eigenvalue λ, then *S***d** is an eigenvector of *SS*^{T} with the same eigenvalue λ (see the proof on page 59 of Leventon (2000)).
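This small-matrix trick is easy to verify numerically. The sketch below, with toy dimensions, diagonalizes the *n* × *n* matrix and checks that each mapped vector *S***d** is indeed an eigenvector of *SS*^{T}, without ever forming the *N* × *N* matrix explicitly:

```python
import numpy as np

rng = np.random.default_rng(3)
N, n = 200, 5              # N >> n, as with 251,035 x n in the paper
S = rng.random((N, n))     # toy training matrix

W = S.T @ S                         # small n x n matrix
lam, D = np.linalg.eigh(W)          # eigenvalues and eigenvectors of W

# Each eigenvector d of W maps to an eigenvector S d of S S^T.
for lam_i, d in zip(lam, D.T):
    v = S @ d
    # Check (S S^T) v = lambda v, applying S and S^T separately so the
    # dense N x N matrix is never materialized.
    assert np.allclose(S @ (S.T @ v), lam_i * v)
```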

Because of the projection operation, the weighted sum of these eigenvectors can be used to generate a reconstructed image vector, which, in turn, can be *unconcatenated* to yield two separate vectorized image representations: one to reconstruct the OCT image, and one to reconstruct the VF image. More precisely, the first *P* samples of the reconstructed image vector can be rearranged (by undoing the earlier lexicographical concatenation of the grid columns) to form a 501 × 501 rectangular OCT image. Likewise, the last *Q* samples of the reconstructed image vector can be rearranged in a similar fashion to form a 6 × 6 rectangular VF image (after adding back the two blind-spot test locations).
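The unconcatenation and grid reassembly steps can be sketched as follows; the sizes are illustrative toys rather than the paper's 501 × 501 and 6 × 6 grids, and the blind-spot reinsertion is omitted:

```python
import numpy as np

rng = np.random.default_rng(4)
H, W_oct, h, w = 8, 8, 3, 3   # toy sizes standing in for 501x501 and 6x6
P, Q = H * W_oct, h * w

# A hypothetical reconstructed image vector of length N = P + Q.
recon = rng.random(P + Q)

# Unconcatenate, then undo the column-wise (Fortran-order) vectorization.
oct_img = recon[:P].reshape((H, W_oct), order="F")
vf_img = recon[P:].reshape((h, w), order="F")

# Round trip: re-vectorizing and concatenating recovers the original vector.
assert np.array_equal(
    np.concatenate([oct_img.flatten(order="F"), vf_img.flatten(order="F")]),
    recon)
```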