Locally linear embedding (LLE) is a method of nonlinear dimensionality reduction that was originally introduced by Roweis and Saul (2000). It is useful for visualizing high-dimensional data sets as they would be embedded in a low-dimensional space, and it can often uncover relationships and patterns that are masked by the complexity of the original data set. It has been used to obtain maps of facial expressions and to classify handwritten digits (Saul et al. 2003), as well as to discriminate between normal and pre-seizure EEG measurements (Ataee et al. 2007). Here we will use LLE to characterize sleep EEG data and the numerical solutions of the cortical model. By embedding both in a two-dimensional space, we will be able to associate traditional EEG sleep stages with the continuous spectrum of states provided by the model.

The algorithm

Let us begin with a high-dimensional data set stored in a matrix **X** of size *D* × *N*, where each column **X**_{i} represents one of the *N* *D*-dimensional data points. The LLE algorithm then consists of three steps:

1. *Calculate the nearest neighbors of each data point* **X**_{i} *in the D-dimensional space*. This can be done in several ways; for example, we might choose the *k* closest points based on Euclidean distance, or we may choose only the points within a sphere of a given radius.

2. *Determine the best reconstruction of each point using only its nearest neighbors*. Mathematically, this takes the form of a least squares minimization problem:

   ε(*W*) = ∑_{i=1}^{N} |**X**_{i} − ∑_{j=1}^{k} *W*_{ij}**X**_{j}|²,

   where *k* represents the number of nearest neighbors. Our goal is to choose the weights *W* that best reconstruct the original data points in the *D*-dimensional space, based on the criterion of least-squared error. Because we use only the nearest neighbors, we must have *W*_{ij}=0 if **X**_{j} is not a neighbor of **X**_{i}. In addition, we guarantee invariance to translations by enforcing ∑_{j}*W*_{ij}=1. Note that the minimization can be calculated individually for every *i*.

3. *Compute the low-dimensional output vectors* **Y**_{i}. These are chosen to provide the best global reconstruction using the weights *W* from the previous step. Again, this can be formulated as a least squares minimization:

   Φ(**Y**) = ∑_{i=1}^{N} |**Y**_{i} − ∑_{j=1}^{k} *W*_{ij}**Y**_{j}|².

   Here we are making the assumption that the weights that give the best reconstruction in *D* dimensions will also be the optimal weights in the lower-dimensional space. In this case, the *N* minimization problems are coupled by the elements of **Y**, so they must be solved simultaneously.
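The three steps above can be sketched in a few lines of NumPy. This is an illustrative reimplementation, not the authors' Matlab code; the regularization of the local covariance matrix (needed when *k* > *D*) and the √N scaling of the output follow common practice rather than anything stated in the text.

```python
import numpy as np

def lle(X, n_neighbors=8, n_components=2, reg=1e-3):
    """Minimal LLE sketch. X is D x N (columns are data points);
    returns Y of shape n_components x N."""
    D, N = X.shape

    # Step 1: k nearest neighbors of each point by Euclidean distance.
    d2 = ((X[:, :, None] - X[:, None, :]) ** 2).sum(axis=0)
    np.fill_diagonal(d2, np.inf)                 # a point is not its own neighbor
    nbrs = np.argsort(d2, axis=1)[:, :n_neighbors]

    # Step 2: reconstruction weights, solved independently for each i.
    W = np.zeros((N, N))
    for i in range(N):
        Z = X[:, nbrs[i]] - X[:, [i]]            # neighbors shifted to the origin
        C = Z.T @ Z                              # local covariance, k x k
        C += reg * np.trace(C) * np.eye(n_neighbors)  # regularize for k > D
        w = np.linalg.solve(C, np.ones(n_neighbors))
        W[i, nbrs[i]] = w / w.sum()              # enforce sum_j W_ij = 1

    # Step 3: global embedding from the bottom eigenvectors of
    # M = (I - W)^T (I - W), skipping the constant eigenvector.
    M = (np.eye(N) - W).T @ (np.eye(N) - W)
    vals, vecs = np.linalg.eigh(M)
    Y = vecs[:, 1:n_components + 1].T * np.sqrt(N)
    return Y
```

Because the weights in step 2 are computed point by point while step 3 couples all of **Y**, the code mirrors the structure of the minimization problems: a loop for the former, a single eigenvalue problem for the latter.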

A detailed description of the algorithm and several examples are provided in Saul et al. (2003). In addition, a Matlab implementation of LLE is available on the authors' website (Roweis and Saul 2009); it was used to generate all results presented here.

As a simple example, consider using LLE on a known 3D manifold (Fig. ). In this toy example, the underlying manifold is known (although normally this would not be the case), and we recognize that it has only two dimensions, despite living in 3-dimensional space as shown in Fig. (a). The data set **X** consists of a random sampling of points from the manifold (Fig. (b)), and the LLE output for a reduction to two dimensions is displayed in Fig. (c). Here we see that LLE successfully unravels the manifold and uncovers its true 2D nature.
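A similar toy experiment can be reproduced in Python with scikit-learn's implementation of LLE. Note two assumptions here: scikit-learn stores samples as rows (so **X** is *N* × *D* rather than *D* × *N*), and the Swiss roll below merely stands in for the manifold of the figure, which is not reproduced.

```python
import numpy as np
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import LocallyLinearEmbedding

# Sample N = 1000 points from a Swiss roll: a 2D sheet rolled up in 3D.
X, color = make_swiss_roll(n_samples=1000, random_state=0)   # X: (1000, 3)

# Unroll to two dimensions using k = 12 nearest neighbors.
embedding = LocallyLinearEmbedding(n_neighbors=12, n_components=2,
                                   random_state=0)
Y = embedding.fit_transform(X)                                # Y: (1000, 2)
```

Plotting the columns of `Y` against each other (colored by the roll coordinate `color`) shows the sheet flattened out, analogous to the unraveling in the figure.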

A possible source of confusion with locally linear embedding is the interpretation of output dimensions such as *Y*_{1} and *Y*_{2}. Unlike linear methods such as principal component analysis, LLE does not provide a description of the output vectors in terms of the original *D* dimensions. The elements of **Y** are chosen to give the best *local* reconstructions based on a global minimization problem; this means that the interpretation of **Y** is different for every data point, and it cannot be described by a simple combination of the original dimensions.

Strongly connected components

The use of the LLE algorithm is based on the assumption that the entire data set lies on the same manifold in high-dimensional space. If more than one manifold is present, the locally linear reconstructions will no longer be accurate (imagine, for example, a point with nearest neighbors located on two separate manifolds). Therefore, before using LLE on a data set, we must verify this assumption.

The mathematics and terminology of directed graphs allow us to accomplish this task (Tarjan 1972). Note that when we calculate the nearest neighbors in the first step of the LLE algorithm, we create a directed graph based on the data points. For example, suppose there is a data set of seven points, and we have determined that point 2 is a neighbor of point 1, point 5 is a neighbor of point 2, and so on. This can be depicted by arrows drawn from each point to its neighbors (Fig. ). We can then define a *strongly connected component* as a group of points where the arrows created by nearest neighbor associations allow for travel from every point in the group to every other point in the group (Tarjan 1972).

*When a group of data points is strongly connected, this indicates that they lie on the same manifold* (Saul et al. 2003).

The example in Fig. has two strongly connected components: points 1, 2, 5, and points 3, 4, 6. However, the two groups are not strongly connected together; one can move from the first group to the second through the connection between 4 and 5, but there is no way to get from the second group to the first. Point 7 is not strongly connected to any other point. Therefore, to use LLE on this sample data set, we would remove point 7 and use the algorithm separately on each strongly connected component.
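In Python, the strongly connected components of such a graph can be found with `scipy.sparse.csgraph.connected_components`. Since the figure itself is not reproduced here, the edge set below is only one illustrative choice consistent with the description: the cycles 1→2→5→1 and 3→4→6→3, the one-way link from 5 to 4, and point 7 pointing at point 1 with nothing pointing back.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components

# Hypothetical arrows consistent with the text's seven-point example.
edges = [(1, 2), (2, 5), (5, 1),    # group {1, 2, 5}
         (3, 4), (4, 6), (6, 3),    # group {3, 4, 6}
         (5, 4),                    # one-way link between the groups
         (7, 1)]                    # point 7 has no incoming arrow

A = np.zeros((7, 7), dtype=int)
for i, j in edges:
    A[i - 1, j - 1] = 1             # 0-based adjacency matrix

n, labels = connected_components(csr_matrix(A), directed=True,
                                 connection='strong')
# n == 3: components {1, 2, 5}, {3, 4, 6}, and the isolated point 7.
```

The `labels` array assigns each point a component index, making it easy to drop singleton components (like point 7) and run LLE on each remaining group separately.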

There are several ways to identify the strongly connected components of a data set. The most traditional method involves an algorithm based on depth-first search of the directed graph (Tarjan 1972). An alternative method relies on analysis of the eigenspace that results from the LLE calculations (Polito and Perona 2001). It is also true that choosing the nearest neighbors in a different manner or increasing the value of *k* can change the structure of the strongly connected components. However, for the purposes of this study, we used the Matlab function `dmperm` on a matrix containing the nearest neighbor associations for the data set. This function, based on the Dulmage-Mendelsohn decomposition, permutes the rows and columns of a matrix to put it into block diagonal form; by including the fact that every point is a neighbor of itself, we can guarantee that this permutation will be symmetric. As output, `dmperm` provides the new order of rows and columns and identifies the blocks of the permuted matrix, where each block represents one strongly connected component within the data.
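Matlab's `dmperm` has no direct NumPy counterpart, but the same block structure can be recovered by grouping rows and columns according to the strongly connected components. The sketch below uses a small hypothetical neighbor matrix (self-loops on the diagonal, as in the text) and applies the resulting symmetric permutation.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components

# Hypothetical neighbor matrix for 6 points: A[i, j] = 1 when point j is a
# neighbor of point i, with every point a neighbor of itself.
A = np.array([[1, 1, 0, 0, 0, 0],
              [0, 1, 1, 0, 0, 0],
              [1, 0, 1, 0, 0, 0],
              [0, 0, 0, 1, 1, 0],
              [0, 0, 0, 0, 1, 1],
              [0, 0, 1, 1, 0, 1]])

# The strongly connected components play the role of dmperm's blocks.
_, labels = connected_components(csr_matrix(A), directed=True,
                                 connection='strong')
perm = np.argsort(labels, kind='stable')   # group rows/columns by component
B = A[np.ix_(perm, perm)]                  # symmetric permutation of A
```

After the permutation, each diagonal block of `B` holds one strongly connected component, and the only nonzero entries outside the diagonal blocks are the one-way links between components (here, the single arrow from point 6 to point 3).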

A remark about principal component analysis (PCA) is in order. This is perhaps the most common mode of dimensionality reduction, and it has also been used in the analysis of sleep EEG data (Gervasoni et al. 2004; Jobert et al. 1994; Corsi-Cabrera et al. 2000). However, PCA places the greatest importance on the directions of largest variance and relies on the assumption that the data are best reconstructed by a *linear* combination of the original measurements. While we tried PCA and achieved reasonable results, the nonlinear nature of the sleep manifold suggests that a more sophisticated approach is necessary. In addition, the concept of nearest neighbors on which the LLE algorithm is based enabled improvement in the separation of different sleep stages (see Section 4.4), and it played a crucial role in defining the quantitative relationship between the EEG data and the mathematical model, as discussed in Section 5.2.