Locally linear embedding (LLE) is a method of nonlinear dimensionality reduction that was originally introduced by Roweis and Saul (2000). It is useful for visualizing high-dimensional data sets as they would be embedded in a low-dimensional space, and it can often uncover relationships and patterns that are masked by the complexity of the original data set. It has been used to obtain maps of facial expressions and to classify handwritten digits (Saul et al. 2003), as well as to discriminate between normal and pre-seizure EEG measurements (Ataee et al. 2007). Here we will use LLE to characterize sleep EEG data and the numerical solutions of the cortical model. By embedding both in a two-dimensional space, we will be able to associate traditional EEG sleep stages with the continuous spectrum of states provided by the model.
Let us begin with a high-dimensional data set stored in a matrix X of size D × N, where each column Xi represents one of the N D-dimensional data points. Then the LLE algorithm consists of three steps: (1) for each data point, identify its k nearest neighbors (for example, by Euclidean distance); (2) compute the weights Wij that best reconstruct each point as a linear combination of its neighbors; and (3) find the low-dimensional vectors Yi that are best reconstructed by those same weights.
A detailed description of the algorithm and several examples are provided in Saul et al. (2003). In addition, a Matlab implementation of LLE is available on the authors' website (Roweis and Saul 2009); it was used to generate all results presented here.
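The three steps of the algorithm can be sketched in plain numpy. This is a minimal illustration under simplified assumptions (Euclidean neighbors, a standard trace-based regularization of the local Gram matrix), not the authors' MATLAB implementation:

```python
import numpy as np

def lle(X, k=10, d=2, reg=1e-3):
    """Minimal LLE sketch. X is D x N, with data points stored as columns."""
    D, N = X.shape
    # Step 1: k nearest neighbors by Euclidean distance
    dist2 = ((X[:, :, None] - X[:, None, :]) ** 2).sum(axis=0)
    np.fill_diagonal(dist2, np.inf)           # a point is not its own neighbor
    nbrs = np.argsort(dist2, axis=1)[:, :k]
    # Step 2: weights that best reconstruct each point from its neighbors
    W = np.zeros((N, N))
    for i in range(N):
        Z = X[:, nbrs[i]] - X[:, [i]]         # neighbors centered on point i
        G = Z.T @ Z                           # local Gram matrix, k x k
        G += reg * np.trace(G) * np.eye(k)    # regularize for stability
        w = np.linalg.solve(G, np.ones(k))
        W[i, nbrs[i]] = w / w.sum()           # weights sum to one
    # Step 3: embedding from the bottom eigenvectors of M = (I - W)^T (I - W)
    I = np.eye(N)
    M = (I - W).T @ (I - W)
    vals, vecs = np.linalg.eigh(M)
    return vecs[:, 1:d + 1].T                 # drop the constant eigenvector
```

The returned array is d × N, matching the column convention of X; the bottom eigenvector of M is discarded because it is constant and carries no embedding information.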
As a simple example, consider using LLE on a known 3D manifold (Fig. 2). In this toy example, the underlying manifold is known (although normally this would not be the case), and we recognize that it has only two dimensions, despite living in 3-dimensional space, as shown in Fig. 2(a). The data set X consists of a random sampling of points from the manifold (Fig. 2(b)), and the LLE output for a reduction to two dimensions is displayed in Fig. 2(c). Here we see that LLE successfully unravels the manifold and uncovers its true 2D nature.
Fig. 2 A simple example of LLE, where three dimensions are reduced to two. (a) The underlying manifold, which lives in 3D space but has only two dimensions. In a typical LLE problem, the shape of this manifold is unknown and has too many dimensions to visualize.
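A comparable toy problem can be reproduced with scikit-learn's swiss-roll generator and its LLE implementation. This is a sketch of the same kind of reduction, not the actual data or code behind the figure:

```python
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import LocallyLinearEmbedding

# Sample points from a known 2D manifold living in 3D space
X, color = make_swiss_roll(n_samples=800, random_state=0)   # X has shape (800, 3)

# Reduce to two dimensions with LLE
emb = LocallyLinearEmbedding(n_neighbors=12, n_components=2, random_state=0)
Y = emb.fit_transform(X)                                     # Y has shape (800, 2)
```

Plotting the columns of Y against each other (colored by the roll parameter) shows the manifold unrolled into the plane, analogous to Fig. 2(c).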
A possible source of confusion with locally linear embedding is the interpretation of output dimensions such as Y1 and Y2. Unlike linear methods such as principal component analysis, LLE does not provide a description of the output vectors in terms of the original D dimensions. The elements of Y are chosen to give the best local reconstructions based on a global minimization problem; this means that the interpretation of Y is different for every data point, and it cannot be described by a simple combination of the original dimensions.
Strongly connected components
The use of the LLE algorithm is based on the assumption that the entire data set lies on the same manifold in high-dimensional space. If more than one manifold is present, the locally linear reconstructions will no longer be accurate (imagine, for example, a point with nearest neighbors located on two separate manifolds). Therefore, before using LLE on a data set, we must verify this assumption.
The mathematics and terminology of directed graphs allow us to accomplish this task (Tarjan 1972). Note that when we calculate the nearest neighbors in the first step of the LLE algorithm, we create a directed graph based on the data points. For example, suppose there is a data set of seven points, and we have determined that point 2 is a neighbor of point 1, point 5 is a neighbor of point 2, etc. This can be depicted by arrows drawn from each point to its neighbors (Fig. 3). Then we can define a strongly connected component as a group of points where the arrows created by nearest neighbor associations allow for travel from every point in the group to every other point in the group (Tarjan 1972). When a group of data points is strongly connected, this indicates that they lie on the same manifold (Saul et al. 2003).
Fig. 3 An example of a directed graph generated by nearest neighbor associations. Here point 2 is a neighbor of point 1, point 3 is a neighbor of point 7, etc. In this case, the directed graph forms two strongly connected components: points 1, 2, 5, and points 3, 4, 6.
The example in Fig. 3 has two strongly connected components: points 1, 2, 5, and points 3, 4, 6. However, the two groups are not strongly connected together; one can move from the first group to the second through the connection between 4 and 5, but there is no way to get from the second group to the first. Point 7 is not strongly connected to any other point. Therefore, to use LLE on this sample data set, we would remove point 7 and use the algorithm separately on each strongly connected component.
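This grouping can be verified programmatically, for instance with SciPy's strongly connected component routine. The sketch below uses an adjacency matrix consistent with the seven-point example; edges beyond the ones named in the text are assumed for illustration:

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components

# Directed graph for the seven-point example (points 1..7 stored 0-indexed).
# An edge i -> j means point j is a neighbor of point i. Cycles within
# {1, 2, 5} and {3, 4, 6} are assumed; 5 -> 4 links the two groups one way
# only, and 7 -> 3 leaves point 7 outside both components.
edges = [(0, 1), (1, 4), (4, 0),      # 1 -> 2 -> 5 -> 1
         (2, 3), (3, 5), (5, 2),      # 3 -> 4 -> 6 -> 3
         (4, 3),                      # 5 -> 4: one-way link between groups
         (6, 2)]                      # 7 -> 3: point 7 has no incoming edge
rows, cols = zip(*edges)
A = csr_matrix((np.ones(len(edges)), (rows, cols)), shape=(7, 7))

n, labels = connected_components(A, directed=True, connection='strong')
print(n)   # 3 strongly connected components: {1, 2, 5}, {3, 4, 6}, {7}
```

The labels array assigns each point to its component, so points 1, 2, 5 share one label, points 3, 4, 6 share another, and point 7 is alone in a third.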
There are several ways to identify the strongly connected components of a data set. The most traditional method involves an algorithm based on depth-first search of the directed graph (Tarjan 1972). An alternative method relies on analysis of the eigenspace that results from the LLE calculations (Polito and Perona 2001). Choosing the nearest neighbors in a different manner or increasing the value of k can also change the structure of the strongly connected components. However, for the purposes of this study, we used the MATLAB function dmperm on a matrix containing the nearest neighbor associations for the data set. This function, based on the Dulmage-Mendelsohn decomposition, permutes the rows and columns of a matrix to put it into block diagonal form; by including the fact that every point is a neighbor of itself, we can guarantee that this permutation will be symmetric. As output, dmperm provides the new order of rows and columns and identifies the blocks of the permuted matrix, where each block represents one strongly connected component within the data.
A remark about principal component analysis (PCA) is in order. This is perhaps the most common method of dimensionality reduction, and it has also been used in the analysis of sleep EEG data (Gervasoni et al. 2004; Jobert et al. 1994; Corsi-Cabrera et al. 2000). However, PCA places the greatest importance on the directions of largest variance and relies on the assumption that the data is best reconstructed by a linear combination of the original measurements. While we tried PCA and achieved reasonable results, the nonlinear nature of the sleep manifold suggests that a more sophisticated approach is necessary. In addition, the concept of nearest neighbors on which the LLE algorithm is based improved the separation of different sleep stages (see Section 4.4), and it played a crucial role in defining the quantitative relationship between the EEG data and the mathematical model, as discussed in Section 5.2.
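The linearity assumption behind PCA can be made concrete: every output coordinate is the same fixed linear combination of the original D measurements for all data points, which is precisely the property that LLE coordinates lack. A minimal numpy sketch, using illustrative random data rather than the EEG measurements:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 6))     # 200 samples of D = 6 measurements
Xc = X - X.mean(axis=0)           # center the data

# PCA via SVD: rows of Vt are the principal directions, largest variance first
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
W = Vt[:2]                        # top two components, shape (2, D)
Y = Xc @ W.T                      # two-dimensional embedding, shape (200, 2)

# Each embedded coordinate is one fixed linear combination of the inputs:
# Y[:, 0] = sum_j W[0, j] * Xc[:, j], identically for every data point.
assert np.allclose(Y[:, 0], sum(W[0, j] * Xc[:, j] for j in range(6)))
```

In LLE, by contrast, no such fixed weight vector exists: each Yi is determined jointly through the global eigenproblem, so its relation to the original measurements varies from point to point.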