Sensors (Basel). 2017 February; 17(2): 314.
Published online 2017 February 8. doi:  10.3390/s17020314
PMCID: PMC5335939

Hyperspectral Image Classification with Spatial Filtering and $\ell_{2,1}$ Norm

Massimo Menenti, Academic Editor

Abstract

Recently, sparse representation based classification methods have received particular attention in the classification of hyperspectral imagery. However, current sparse representation based classification models do not consider all the test pixels simultaneously. In this paper, we propose a hyperspectral classification method with spatial filtering and $\ell_{2,1}$ norm (SFL) that can deal with all the test pixels simultaneously. The $\ell_{2,1}$ norm regularization is used to extract relevant training samples among the whole training data set with joint sparsity. In addition, the $\ell_{2,1}$ norm loss function is adopted to make the method robust to samples that deviate significantly from the rest of the samples. Moreover, to take the spatial information into consideration, a spatial filtering step is implemented in which all the training and testing samples are spatially averaged with their nearest neighbors. Furthermore, motivated by hyperspectral unmixing, a non-negative constraint is added to the sparse representation matrix. Finally, the alternating direction method of multipliers is used to solve SFL. Experiments on real hyperspectral images demonstrate that the proposed SFL method can obtain better classification performance than some other popular classifiers.

Keywords: alternating direction method of multipliers, hyperspectral classification, outliers, spatial filtering and $\ell_{2,1}$ norm (SFL)

1. Introduction

Over the past few decades, hyperspectral imagery has been widely used in different remote sensing applications owing to the high-resolution spectral information it provides about the materials in the scene [1,2,3]. Various hyperspectral image classification techniques have been presented for many real applications, including material recognition, urban mapping and so on [4,5,6,7,8].

To date, many hyperspectral image classification methods have been presented. Among them, the most representative is the support vector machine (SVM) [9], which has shown desirable hyperspectral image classification performance. Recently, sparse representation based classification methods have received a lot of attention in the area of image analysis [10,11,12,13,14], particularly in the classification of hyperspectral images. Chen et al. introduced a dictionary-based sparse representation framework for hyperspectral classification [15]. Specifically, a test pixel is sparsely represented by a few labeled training samples, and its class is determined as the one with the minimal class-specific representation error. In addition, Chen et al. also proposed simultaneous orthogonal matching pursuit (SOMP) to utilize the spatial information of hyperspectral data [15]. To take additional structured sparsity priors into consideration, Sun et al. reviewed and compared several structured priors for sparse representation based hyperspectral image classification [16], which can exploit both the spatial dependences between neighboring pixels and the inherent structure of the dictionary. In [17], Chen et al. extended the joint sparse representation to a kernel version for hyperspectral image classification, which can provide a higher classification accuracy than the conventional linear sparse representation algorithms. In addition, Liu et al. proposed a class-specific sparse multiple kernel learning framework for hyperspectral image classification [18], which determined the associated weights of optimal base kernels for any two classes and led to better classification performance. To take other spectral properties and higher order context information into consideration, Wang et al. proposed the spatial-spectral derivative-aided kernel joint sparse representation for hyperspectral image classification [19], in which the derivative-aided spectral information complements traditional spectral features without inducing the curse of dimensionality or ignoring discriminating features. Moreover, Li et al. proposed the joint robust sparse representation classification (JRSRC) method to take the sparse representation residuals into consideration, which can deal with outliers in hyperspectral classification [20]. To integrate sophisticated prior knowledge about the spatial nature of the image, Roscher et al. proposed constructing a novel dictionary for sparse-representation-based classification [21], which can combine the characteristic spatial patterns and spectral information to improve the classification performance. In order to adaptively explore the spatial information for different types of spatial structures, Fu et al. proposed a new shape-adaptive joint sparse representation method for hyperspectral image classification [22], which can construct a shape-adaptive local smooth region for each test pixel. In order to capture the class-discriminative information, He et al. proposed a group-based sparse and low-rank representation to improve the dictionary for hyperspectral image classification [23]. To take different types of features into consideration, Zhang et al. proposed an alternative joint sparse representation based on the multitask joint sparse representation model [24]. To overcome the high coherence of the training samples, Bian et al. proposed a novel multi-layer spatial-spectral sparse representation framework for hyperspectral image classification [25]. In addition, to take the class structure of hyperspectral image data into consideration, Shao et al. proposed a probabilistic class structure regularized sparse representation method to incorporate the class structure information into the sparse representation model [26].

It was argued in [27] that collaborative representation classification can obtain very competitive classification performance, while its time consumption is much lower than that of sparse representation. Thus, various collaborative representation methods have been proposed for hyperspectral image classification. Li et al. proposed the nearest regularized subspace (NRS) classifier, which uses distance-weighted Tikhonov regularization [28]. Then, the Gabor filtering based nearest regularized subspace classifier was proposed to exploit the benefits of using spatial features [29]. Collaborative representation with Tikhonov regularization (CRT) has also been proposed for hyperspectral classification [30]. The main difference between NRS and CRT is that NRS only uses within-class training data for collaborative representation, while CRT adopts all the training data simultaneously [30]. In [31], the kernel version of collaborative representation was proposed and denoted as the kernel collaborative representation classifier (KCRC). In addition, Li et al. proposed combining sparse representation and collaborative representation for hyperspectral image classification to strike a balance between the two in the residual domain [32]. Moreover, Sun et al. combined active learning and semi-supervised learning to improve the classification performance when only a few initial labeled samples are given, and proposed the extended random walker [33] algorithm for the classification of hyperspectral images.

Very recently, some deep models have been proposed for hyperspectral image classification [34]. To the best of our knowledge, Chen et al. proposed a deep learning method named stacked autoencoder for hyperspectral image classification in 2014 [35]. Recently, convolutional neural networks have become very popular in pattern recognition, computer vision and remote sensing. Convolutional neural networks usually contain a number of convolutional layers and a classification layer, which can learn deep features from the training data and exploit the spatial dependence among them. Krizhevsky et al. trained a large convolutional neural network to classify the 1.2 million high-resolution images in ImageNet and obtained superior image classification accuracy [36]. Since then, convolutional neural networks have been applied to hyperspectral image classification [37,38] and have achieved desirable classification performance. To take the spatial information into consideration, a novel convolutional neural network framework for hyperspectral image classification using both spectral and spatial features was presented [39]. In addition, Aptoula et al. proposed a combined strategy of both attribute profiles and convolutional neural networks for hyperspectral image classification [40]. To overcome the imbalance between dimensionality and the number of available training samples, Ghamisi et al. proposed a self-improving band selection based convolutional neural network method for hyperspectral image classification [41]. In addition, some patch based convolutional neural network methods for hyperspectral image classification have also been proposed, such as the methods in [42,43]. In order to achieve low computational cost and good generalization performance, Li et al. proposed combining convolutional neural networks with extreme learning machines for hyperspectral image classification [44]. Furthermore, Shi et al. proposed a 3D convolutional neural network (3D-CNN) method for hyperspectral image classification that can take both the spectral and spatial information into consideration [45].

However, all of the above-mentioned methods, whether they are based on sparse representation, collaborative representation or deep models, adopt the pixel-wise classification strategy, i.e., they do not consider all the pixels simultaneously. The theoretical work in [46] has demonstrated that multichannel joint sparse recovery is superior to applying standard sparse reconstruction methods to each single channel individually, and that the probability of recovery failure decays exponentially with the increase in the number of channels. In addition, the probability bounds still hold true even for a small number of signals. For the classification of hyperspectral images, multichannel recovery means recovering multiple hyperspectral pixels simultaneously. Therefore, inspired by the theoretical work in [46], in this paper, we propose a hyperspectral classification method with spatial filtering and $\ell_{2,1}$ norm (SFL) to deal with all the test samples simultaneously, which can not only take much less time but also obtain comparably good or better classification performance. First, the $\ell_{2,1}$ norm regularization is adopted to select correlated training samples among the whole training data set. Meanwhile, the $\ell_{2,1}$ norm loss function, which is robust to outliers, is also implemented. Second, we adopt the simple strategy in [47] to exploit the local continuity: all the training and testing samples are spatially averaged with their nearest neighbors to take the spatial information into consideration, which can be seen as spatial filtering. Third, motivated by hyperspectral unmixing, a non-negative constraint is added to the sparse representation coefficient matrix. Finally, to solve SFL, we use the alternating direction method of multipliers [48], a simple but powerful algorithm that is well suited to distributed convex optimization.

The main contribution of this work lies in proposing the SFL method for hyperspectral classification, which can deal with all the test pixels simultaneously. Experiments on real hyperspectral images demonstrate that the proposed SFL method can obtain better classification performance than some other popular classifiers.

2. Related Work

In this section, we briefly introduce the classical sparse representation for the classification of hyperspectral images, which can be found in [16]. It is assumed that the pixels in the same class lie in the same low-dimensional subspace and that there are $K$ different classes. Therefore, an unknown test sample $y \in \mathbb{R}^{B}$, where $B$ denotes the number of bands, is assumed to lie in the union of the $K$ different subspaces, and it can be seen as a sparse linear combination of all the training samples:

$$y = A_1 x_1 + A_2 x_2 + \cdots + A_K x_K = [A_1 \cdots A_K] \begin{bmatrix} x_1 \\ \vdots \\ x_K \end{bmatrix} = Ax. \tag{1}$$

Given the dictionary of training samples $A \in \mathbb{R}^{B \times M}$, where $M$ is the number of training samples, the sparse representation coefficient vector $x \in \mathbb{R}^{M}$ of an unknown test sample $y$ can be obtained by solving the following optimization problem:

$$\hat{x} = \arg\min_{x} \|y - Ax\|_2^2 + \lambda \|x\|_1, \tag{2}$$

where $A$ consists of the class subdictionaries $\{A_k\}_{k=1,\ldots,K}$, and $\lambda$ is the regularization parameter. Equation (2) can be solved by the alternating direction method of multipliers in [49]. The class label of $y$ is then determined as the one with the minimal class-specific reconstruction residual:

$$\mathrm{Class}(y) = \arg\min_{k=1,\ldots,K} \|y - A_k \hat{x}_k\|_2^2. \tag{3}$$
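
The decision rule in Equation (3) can be illustrated with a short NumPy sketch. This is only an illustration under stated assumptions: the function name `classify_by_residual`, the array `atom_labels` (the class of each dictionary column) and the toy data are hypothetical and do not come from the paper, and the coefficient vector is supplied by hand rather than obtained by solving Equation (2).

```python
import numpy as np

def classify_by_residual(A, atom_labels, x_hat, y):
    """Return the class k that minimises ||y - A_k x_k||_2^2, as in Equation (3).

    A           : (B, M) dictionary whose columns are training samples
    atom_labels : (M,) class label of each column of A (hypothetical helper input)
    x_hat       : (M,) sparse coefficient vector for the test sample
    y           : (B,) test sample
    """
    classes = np.unique(atom_labels)
    residuals = []
    for k in classes:
        mask = atom_labels == k              # columns of the subdictionary A_k
        recon = A[:, mask] @ x_hat[mask]     # class-specific reconstruction
        residuals.append(np.sum((y - recon) ** 2))
    return classes[int(np.argmin(residuals))]

# Toy data: two classes with two training samples each, three bands.
A = np.array([[1.0, 0.9, 0.0, 0.1],
              [0.0, 0.1, 1.0, 0.9],
              [0.0, 0.0, 0.0, 0.0]])
atom_labels = np.array([0, 0, 1, 1])
y = np.array([0.05, 0.95, 0.0])
x_hat = np.array([0.0, 0.0, 0.5, 0.5])       # coefficients chosen by hand
predicted = classify_by_residual(A, atom_labels, x_hat, y)
```

In this toy case the class-1 atoms reconstruct `y` exactly, so the residual for class 1 is zero and the sample is assigned to class 1.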

3. Proposed Classifiers

In [46], it has been proved that, with the increase in the number of channels, the failure probability of sparse reconstruction decreases exponentially. Thus, multichannel sparse reconstruction is superior to single channel sparse reconstruction. In addition, the probability bounds are valid even for a small number of signals. Based on this theory, we deal with all the test samples simultaneously, and the proposed SFL classification method will be briefly described.

Let $Y = [y_1, y_2, \ldots, y_N] \in \mathbb{R}^{B \times N}$, where $\{y_n\}_{n=1,\ldots,N}$ denotes the columns of $Y$, and $N$ denotes the number of test pixels. To deal with all the test pixels simultaneously, it is natural to obtain the sparse representation coefficient matrix $X = [x_1, x_2, \ldots, x_N] \in \mathbb{R}^{M \times N}$ for all the test pixels by solving the following optimization problem:

$$\hat{X} = \arg\min_{X} \|Y - AX\|_F^2 + \lambda \|X\|_1, \tag{4}$$

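which can also be solved by the alternating direction method of multipliers in [49]. Here, $\|\cdot\|_F$ represents the matrix Frobenius norm, which is equal to the Euclidean norm of the vector of singular values, i.e.,

$$\|X\|_F = \sqrt{\langle X, X \rangle} = \Big(\sum_{i=1}^{M} \sum_{j=1}^{N} X_{ij}^2\Big)^{\frac{1}{2}} = \Big(\sum_{i=1}^{r} \sigma_i^2\Big)^{\frac{1}{2}}, \tag{5}$$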

where $\sigma_i$ ($i = 1, \ldots, r$) denote the singular values of $X$. After the optimized $\hat{X}$ is obtained, the classes of all test pixels can be obtained by the minimum class reconstruction error:

$$\mathrm{Class}(y_n) = \arg\min_{k=1,\ldots,K} \|y_n - A_k \hat{x}_n^k\|_2^2, \quad n = 1, \ldots, N. \tag{6}$$

However, Equation (4) adopts pixel-wise independent regression, which ignores the correlation among the whole training data set. Recent research shows that the high-dimensional data space is smooth and locally linear, which has been verified in image reconstruction and classification problems [50,51]. For joint consideration of the classification of neighborhoods, in this paper, we introduce the $\ell_{2,1}$ norm regularization and adapt it to extract correlated training samples among the whole training data set with joint sparsity. The $\ell_{2,1}$ norm is defined as follows:

$$\|X\|_{2,1} = \sum_{i=1}^{M} \sqrt{\sum_{j=1}^{N} X_{ij}^2}. \tag{7}$$
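
As a small worked example of Equation (7), the following NumPy sketch computes the $\ell_{2,1}$ norm as the sum of the Euclidean norms of the rows of a matrix. The function name `l21_norm` and the toy matrix are illustrative choices, not part of the original method description.

```python
import numpy as np

def l21_norm(X):
    """Sum over rows of the Euclidean norm of each row, as in Equation (7)."""
    return float(np.sum(np.sqrt(np.sum(X ** 2, axis=1))))

X = np.array([[3.0, 4.0],
              [0.0, 0.0],
              [5.0, 12.0]])
value = l21_norm(X)  # row norms are 5, 0 and 13

# For a single column (N = 1) the same function gives the l1 norm of the vector.
single_column = l21_norm(np.array([[1.0], [-2.0]]))
```

Because an all-zero row contributes nothing to the sum, penalising this norm encourages entire rows of $X$ to vanish, which is the joint sparsity across pixels referred to above.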

The $\ell_{2,1}$ norm was first introduced by Ding et al. [52] to make traditional principal component analysis more robust to outliers. Outliers are defined as data points that deviate significantly from the rest of the data. Traditional principal component analysis optimizes the sum of squared errors, so the few data points that have large squared errors dominate the sum. Therefore, traditional principal component analysis is sensitive to outliers. It has been shown that minimizing the $\ell_1$ norm is more robust and can resist a larger proportion of outliers compared with the quadratic $\ell_2$ norm [53]. The $\ell_{2,1}$ norm is identical to a rotationally invariant $\ell_1$ norm, and the solution of $\ell_{2,1}$ norm based robust principal component analysis is given by the principal eigenvectors of a more robust re-weighted covariance matrix, which can alleviate the effects of outliers. In addition, the $\ell_{2,1}$ norm has the advantage of being rotation invariant compared with the $\ell_1$ norm [52,54,55], i.e., applying the same rotation to all points has no effect on its performance. Due to the above-mentioned advantages, the $\ell_{2,1}$ norm has been applied in feature selection [56], multi-task learning [57], multi-kernel learning [58], and non-negative matrix factorization [59]. Nie et al. [56] introduced the $\ell_{2,1}$ norm to feature selection, using $\ell_{2,1}$ norm regularization to select features across all data points with joint sparsity. The $\ell_{2,1}$ norm based loss function is used to remove outliers, and the feature selection process has been shown to be effective and efficient.

Similarly, we adopt the $\ell_{2,1}$ norm regularization to select correlated training samples among the whole training data set with joint sparsity for hyperspectral image classification. Thus, the corresponding optimization problem is as follows:

$$\hat{X} = \arg\min_{X} \|Y - AX\|_F^2 + \lambda \|X\|_{2,1}, \tag{8}$$

which can be solved by the alternating direction method of multipliers in [60]. This model can be seen as an instance of the methodology in [61], which can impose sparsity across the pixels both at the group and individual levels. In addition, to make it more robust to outliers, the $\ell_{2,1}$ norm loss function is adopted. Thus, the corresponding optimization problem is as follows:

$$\hat{X} = \arg\min_{X} \|Y - AX\|_{2,1} + \lambda \|X\|_{2,1}. \tag{9}$$

Due to limited resolution of hyperspectral image sensors and the complexity of ground materials, mixed pixels can easily be found in hyperspectral images. Therefore, a hyperspectral unmixing step is needed [62,63]. Hyperspectral unmixing is a process to identify the pure constituent materials (endmembers) and estimate the proportion of each material (abundance) [64]. The linear mixture model has been prevalently used in hyperspectral unmixing, and the abundance is considered to be non-negative in a linear mixture model [65]. If we deem A as the spectral library consisting of endmembers, then X can be seen as the abundance matrix. Therefore, X is also non-negative. When adding the non-negative constraint into the sparse representation matrix, the corresponding optimization problem is as follows:

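$$\hat{X} = \arg\min_{X \geq 0} \|Y - AX\|_F^2 + \lambda \|X\|_{2,1}, \tag{10}$$

$$\hat{X} = \arg\min_{X \geq 0} \|Y - AX\|_{2,1} + \lambda \|X\|_{2,1}. \tag{11}$$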

In addition, the spectral signatures of neighboring pixels are highly correlated, so neighboring pixels belong to the same material with high probability. We thus adopt the simple strategy in [47] to exploit this local continuity: all the training and testing samples are spatially averaged with their nearest neighbors to take the spatial information into consideration, which can be seen as spatial filtering. Moreover, when $N = 1$, it is easy to see that Equation (8) reduces to Equation (2), and Equation (9) reduces to the following optimization problem:

$$\hat{x} = \arg\min_{x} \|y - Ax\|_1 + \lambda \|x\|_1. \tag{12}$$
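
The spatial filtering step described above (averaging each sample with its nearest neighbors) can be sketched as a moving-average filter applied to every band. This is a minimal sketch under assumptions that the text does not specify: the neighborhood is taken to be a square window that includes the pixel itself, border pixels are handled by edge replication, and the function name `spatial_average` is hypothetical.

```python
import numpy as np

def spatial_average(cube, window):
    """Replace each pixel spectrum by the mean over a window x window neighbourhood.

    cube   : (rows, cols, bands) hyperspectral image
    window : odd window width, e.g. 9 for a 9x9 neighbourhood (81 pixels)
    Border handling by edge replication is an assumption of this sketch.
    """
    r = window // 2
    padded = np.pad(cube, ((r, r), (r, r), (0, 0)), mode="edge")
    rows, cols, _ = cube.shape
    out = np.zeros_like(cube, dtype=float)
    # Accumulate every shifted copy of the image, then divide by the window area.
    for dr in range(window):
        for dc in range(window):
            out += padded[dr:dr + rows, dc:dc + cols, :]
    return out / (window * window)

# Toy image: 3x3 pixels, one band, a single bright pixel in the centre.
cube = np.zeros((3, 3, 1))
cube[1, 1, 0] = 9.0
filtered = spatial_average(cube, window=3)
```

The averaged spectra then replace the raw spectra both in the dictionary $A$ and in the test matrix $Y$ before the optimization problems above are solved.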

To sum up, the detailed procedure of our proposed method is shown in Figure 1. Finally, we consider how to solve the optimization problems in Equations (9)–(12). Equation (10) can be solved by the alternating direction method of multipliers in [60], and Equations (9) and (12) are special cases of Equation (11). Thus, it comes down to solving Equation (11). For simplification, Equation (11) can be written as:

$$\min_{X} \|AX - Y\|_{2,1} + \lambda \|X\|_{2,1} + l_{R_+}(X), \tag{13}$$

where $l_{R_+}(X) = \sum_{i=1}^{N} l_{R_+}(X_i)$ is the indicator function of the nonnegative orthant $R_+$, and $X_i$ is the $i$-th column of $X$. If $X_i$ belongs to the nonnegative orthant, then $l_{R_+}(X_i)$ is zero; otherwise, it is $+\infty$.

Figure 1
Flow chart of the proposed method.

In order to solve Equation (11), the alternating direction method of multipliers [48] is implemented. By introducing the auxiliary variables $P$, $Q$ and $W$, Equation (11) can be rewritten as:

$$\min_{P, W, X, Q} \|P\|_{2,1} + \lambda \|W\|_{2,1} + l_{R_+}(X), \quad \text{s.t.} \quad AQ - Y = P, \; Q = W, \; Q = X. \tag{14}$$

A compact version of it is:

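$$\min_{V, Q} \; g(V) \quad \text{s.t.} \quad GQ + BV = Z, \tag{15}$$

where $g(V) = \|P\|_{2,1} + \lambda \|W\|_{2,1} + l_{R_+}(X)$, $G = \begin{bmatrix} A \\ I \\ I \end{bmatrix}$, $B = \begin{bmatrix} -I & 0 & 0 \\ 0 & -I & 0 \\ 0 & 0 & -I \end{bmatrix}$, $Z = \begin{bmatrix} Y \\ 0 \\ 0 \end{bmatrix}$, $V \equiv (P, W, X)$, and $I$ is the identity matrix. Thus, the augmented Lagrangian function can be expressed as:

$$L(V, Q, \Lambda) = g(V) + \frac{\mu}{2} \|GQ + BV - Z - \Lambda\|_F^2, \tag{16}$$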

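where $\mu > 0$ and $\Lambda/\mu$ stands for the Lagrange multipliers. In order to update $P$, we solve

$$P^{k+1} = \arg\min_{P} \|P\|_{2,1} + \frac{\mu}{2} \|AQ^k - Y - P - \Lambda_1^k\|_F^2, \tag{17}$$

and its solution is the well-known vector soft threshold operator [10], which updates each row independently:

$$P^{k+1}(r,:) = \text{vect-soft}\Big(\zeta(r,:), \frac{1}{\mu}\Big), \tag{18}$$

where $\zeta = AQ^k - Y - \Lambda_1^k$, and the vect-soft-threshold function is $\text{vect-soft}(b, \tau) = b \, \dfrac{\max\{\|b\|_2 - \tau, 0\}}{\max\{\|b\|_2 - \tau, 0\} + \tau}$. To update $W$, we solve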

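$$W^{k+1} = \arg\min_{W} \lambda \|W\|_{2,1} + \frac{\mu}{2} \|Q^k - W - \Lambda_2^k\|_F^2, \tag{19}$$

and its solution is also the vector soft threshold operator [10]:

$$W^{k+1}(r,:) = \text{vect-soft}\Big(\gamma(r,:), \frac{\lambda}{\mu}\Big), \tag{20}$$

where $\gamma = Q^k - \Lambda_2^k$.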
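
The row-wise vector soft threshold used in the $P$ and $W$ updates can be written in a few lines of NumPy. The sketch below applies the map $b \mapsto b \, \max\{\|b\|_2 - \tau, 0\} / (\max\{\|b\|_2 - \tau, 0\} + \tau)$ to every row of a matrix; the function name `vect_soft_rows` and the toy input are illustrative, and $\tau > 0$ is assumed.

```python
import numpy as np

def vect_soft_rows(Z, tau):
    """Row-wise vector soft threshold with threshold tau > 0.

    Each row b is mapped to b * s / (s + tau), where s = max(||b||_2 - tau, 0),
    so rows with norm at most tau become zero and longer rows are shrunk.
    """
    norms = np.linalg.norm(Z, axis=1, keepdims=True)
    shrink = np.maximum(norms - tau, 0.0)
    return Z * (shrink / (shrink + tau))

Z = np.array([[3.0, 4.0],    # norm 5   -> scaled by 4/5
              [0.3, 0.4]])   # norm 0.5 -> set to zero
thresholded = vect_soft_rows(Z, tau=1.0)
```

With these inputs the first row becomes (2.4, 3.2) and the second row becomes zero, which shows how the operator removes whole rows whose norm falls below the threshold.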

To update X, we solve

$$X^{k+1} = \arg\min_{X} l_{R_+}(X) + \frac{\mu}{2} \|Q^k - X - \Lambda_3^k\|_F^2 = \max(Q^k - \Lambda_3^k, 0). \tag{21}$$

To update Q, we solve

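$$\begin{aligned} Q^{k+1} &= \arg\min_{Q} \|AQ - Y - P^{k+1} - \Lambda_1^k\|_F^2 + \|Q - W^{k+1} - \Lambda_2^k\|_F^2 + \|Q - X^{k+1} - \Lambda_3^k\|_F^2 \\ &= (A^T A + 2I)^{-1} \big[A^T (Y + P^{k+1} + \Lambda_1^k) + W^{k+1} + \Lambda_2^k + X^{k+1} + \Lambda_3^k\big]. \end{aligned} \tag{22}$$

The stopping criterion is $\|GQ^k + BV^k - Z\|_F^2 < \varepsilon \cdot (J \cdot K)$, where $\varepsilon$ is the error threshold, and $J$ and $K$ are the numbers of rows and columns of $Z$. $\mu$ is updated in the same way as in [48], which keeps the ratio between the primal and dual residual norms of the alternating direction method of multipliers within a given positive interval. Based on this, we obtain Proposition 1, whose proof of convergence is given in [48].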

Proposition 1.

Function $g$ in Equation (15) is closed, proper, and convex. If there exist solutions $V^*$ and $Q^*$, then the iterative sequences $\{V^k\}$ and $\{Q^k\}$ converge to $V^*$ and $Q^*$, respectively. If not, at least one of $\{V^k\}$ and $\{Q^k\}$ diverges [48].
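
Putting the updates in Equations (17)–(22) together, one possible iteration loop is sketched below in NumPy. It is a simplified illustration rather than the authors' Matlab implementation, and it makes several assumptions that the text does not state: all variables start at zero, $\mu$ is kept fixed instead of being adapted as in [48], and the scaled multipliers are updated with the standard rule $\Lambda^{k+1} = \Lambda^k - (GQ^{k+1} + BV^{k+1} - Z)$. The names `sfl_admm` and `vect_soft_rows` and the toy data are hypothetical.

```python
import numpy as np

def vect_soft_rows(Z, tau):
    """Row-wise vector soft threshold (tau > 0)."""
    norms = np.linalg.norm(Z, axis=1, keepdims=True)
    shrink = np.maximum(norms - tau, 0.0)
    return Z * (shrink / (shrink + tau))

def sfl_admm(A, Y, lam=1e-3, mu=1e-2, eps=1e-6, max_iter=1000):
    """Sketch of an ADMM loop for the non-negative l2,1 + l2,1 problem (Equation (11))."""
    n_bands, M = A.shape
    N = Y.shape[1]
    Q = np.zeros((M, N))                      # zero initialisation is an assumption
    X = np.zeros((M, N))
    L1 = np.zeros((n_bands, N))               # scaled multipliers Lambda_1..Lambda_3
    L2 = np.zeros((M, N))
    L3 = np.zeros((M, N))
    H = A.T @ A + 2.0 * np.eye(M)             # system matrix of the Q update
    for _ in range(max_iter):
        P = vect_soft_rows(A @ Q - Y - L1, 1.0 / mu)   # Equation (18)
        W = vect_soft_rows(Q - L2, lam / mu)           # Equation (20)
        X = np.maximum(Q - L3, 0.0)                    # Equation (21)
        rhs = A.T @ (Y + P + L1) + W + L2 + X + L3
        Q = np.linalg.solve(H, rhs)                    # Equation (22)
        # Constraint residuals of AQ - Y = P, Q = W, Q = X.
        R1, R2, R3 = A @ Q - Y - P, Q - W, Q - X
        # Standard scaled multiplier update (assumed, not given in the text).
        L1 -= R1
        L2 -= R2
        L3 -= R3
        residual = np.sum(R1 ** 2) + np.sum(R2 ** 2) + np.sum(R3 ** 2)
        if residual < eps * (n_bands + 2 * M) * N:     # eps * (rows of Z * cols of Z)
            break
    return X

# Toy problem: 6 bands, 4 training samples, 3 test pixels built from atom 1.
rng = np.random.default_rng(0)
A = np.abs(rng.normal(size=(6, 4)))
X_true = np.zeros((4, 3))
X_true[1] = [1.0, 0.5, 2.0]
Y = A @ X_true
X_hat = sfl_admm(A, Y, lam=1e-3, mu=1.0, max_iter=200)
```

The returned matrix is the non-negative copy $X$ of the coefficients. At convergence $Q$, $W$ and $X$ coincide, and the class of each test pixel can then be assigned by the minimum class reconstruction error of Equation (6).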

4. Experiments

4.1. Experimental Data

Two datasets are used in the experiments. The first dataset is Indian Pines, obtained by the Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) in 1992. The image size is 145 × 145, and 220 bands are taken in the spectral range of 0.4–2.5 μm. After removal of the water absorption bands (No. 104–108, 150–163, 220), 200 bands are used, and the ground truth image is shown in Figure 2a. There are 16 material classes in Indian Pines and 10,249 labeled samples. In addition, 1027 samples (about 10%) are used as training data, as shown in Table 1, and the rest are used for testing.

Figure 2
Ground truth image of (a) Indian Pines; (b) Pavia University.
Table 1
Sixteen ground-truth classes in AVIRIS Indian Pines and the training and test sets for each class.

The second dataset is Pavia University, obtained by the Reflective Optics System Imaging Spectrometer (ROSIS) in 2001 at Pavia University, Pavia, Italy. The size of the image is 610 × 340 with a spatial resolution of 1.3 m. The number of bands is 103, and the ground truth image is shown in Figure 2b. There are nine classes and 42,776 labeled samples; 426 of them (about 1%) are chosen as the training data, and the others are used as test data, as shown in Table 2.

Table 2
Nine classes in the University of Pavia and the training and test sets for each class.

4.2. Parameter Setting

In the experiments, we mainly compare the classification performance when using the pixel-wise strategy and when dealing with all the test pixels simultaneously. In addition, we also make a step-by-step comparison by adding or removing spatial filtering and/or constraints to see which step contributes more. For these methods, there are five main parameters: the neighbor size $T$, the regularization parameter $\lambda$, the Lagrange multiplier regularization parameter $\mu$, the error tolerance $\varepsilon$ and the maximum number of iterations. The neighbor size $T$ and the regularization parameter $\lambda$ play an important role in the proposed method: they control the size of the spatial filter and the trade-off between fidelity to the data and sparsity of the solution, respectively. The Lagrange multiplier regularization parameter $\mu$, the error tolerance $\varepsilon$ and the maximum number of iterations, which have a lesser impact on the efficiency of the corresponding algorithms, are set to fixed values, i.e., $\mu = 10^{-2}$, $\varepsilon = 10^{-6}$, and a maximum of 1000 iterations. For the neighbor size $T$, we use the same parameter setting as in [16]. For the Indian Pines data set, a spatial window of 9 × 9 ($T = 81$) is adopted, because this image consists mostly of large homogeneous regions. For the University of Pavia data set, a spatial window of 5 × 5 ($T = 25$) is used, because many narrow regions are present in this image. The regularization parameter $\lambda$ is chosen from the set $\{10^{-6}, 10^{-5}, 10^{-4}, 10^{-3}, 10^{-2}, 10^{-1}\}$.

Figure 3 shows the overall accuracy as a function of the regularization parameter $\lambda$ for the hyperspectral images of Indian Pines and Pavia University. For convenience, “Spatial Filtering” and “Non-negative Constraint” are abbreviated as “SF” and “NC”, respectively. For example, in “$\ell_{2,1}$+$\ell_{2,1}$+SF+NC”, the first “$\ell_{2,1}$” denotes the loss function norm, the second “$\ell_{2,1}$” denotes the regularization term norm, “SF” denotes the use of spatial filtering, and “NC” denotes the use of the non-negative constraint. The abbreviations of the other compared methods follow the same convention. It can be seen from Figure 3 that the overall accuracy remains stable when $\lambda < 10^{-2}$ and then decreases when $\lambda > 10^{-2}$. In addition, “$\ell_{2,1}$+$\ell_{2,1}$+SF+NC” and “$\ell_{2,1}$+$\ell_{2,1}$+SF” have much better overall accuracy than “$\ell_{2,1}$+$\ell_{2,1}$+NC” and “$\ell_{2,1}$+$\ell_{2,1}$”, respectively, which demonstrates that spatial filtering significantly improves the overall accuracy. Moreover, “$\ell_{2,1}$+$\ell_{2,1}$+SF+NC” and “$\ell_{2,1}$+$\ell_{2,1}$+NC” have better overall accuracy than “$\ell_{2,1}$+$\ell_{2,1}$+SF” and “$\ell_{2,1}$+$\ell_{2,1}$”, respectively, which demonstrates that the non-negative constraint also helps to improve the overall accuracy. Furthermore, the gain in overall accuracy from spatial filtering is much larger than the gain from the non-negative constraint, which suggests that spatial filtering has a larger effect on the overall accuracy than the non-negative constraint.

Figure 3
Performance of overall accuracy as a function of the parameter λ using the hyperspectral image of (a) Indian Pines; (b) Pavia University.

4.3. Classification Performance

The experiments are performed in Matlab on a desktop with a 3.5 GHz Intel Core CPU and 64 GB of memory. The overall accuracy, average accuracy and kappa statistic [16] are used to evaluate the classification performance of the different methods. Table 3 and Table 4 show the classification performance for the Indian Pines data set when using the pixel-wise strategy and when dealing with all the test pixels simultaneously, respectively. It can be seen from Table 3 and Table 4 that methods using spatial filtering generally obtain better overall accuracy, average accuracy and kappa statistics than those without spatial filtering. For example, “$\ell_2$+$\ell_1$+SF+NC” and “$\ell_2$+$\ell_1$+SF” have much better overall accuracy than “$\ell_2$+$\ell_1$+NC” and “$\ell_2$+$\ell_1$”, respectively, which demonstrates that spatial filtering helps considerably to improve the overall accuracy. In addition, methods using the non-negative constraint generally obtain better overall accuracy than those without it. For example, “$\ell_1$+$\ell_1$+SF+NC” and “$\ell_1$+$\ell_1$+NC” have better overall accuracy than “$\ell_1$+$\ell_1$+SF” and “$\ell_1$+$\ell_1$”, respectively, which demonstrates that the non-negative constraint helps to improve the overall accuracy. It can also be clearly seen that spatial filtering has a larger effect on the classification performance than the non-negative constraint. Moreover, methods using the $\ell_{2,1}$ norm regularization term generally obtain better classification performance than methods using the $\ell_1$ norm regularization term. For example, “F+$\ell_{2,1}$+SF+NC” and “F+$\ell_{2,1}$” generally have better overall accuracy than “F+$\ell_1$+SF+NC” and “F+$\ell_1$”, respectively, which demonstrates that it is beneficial to select correlated training samples among the whole training data set and to impose sparsity across the pixels both at the group and individual levels. Furthermore, methods using the $\ell_{2,1}$ norm loss function generally obtain better classification performance than methods using the F norm loss function. For example, “$\ell_{2,1}$+$\ell_{2,1}$+SF+NC” and “$\ell_{2,1}$+$\ell_{2,1}$” generally have better overall accuracy than “F+$\ell_{2,1}$+SF+NC” and “F+$\ell_{2,1}$”, respectively, which demonstrates that the $\ell_{2,1}$ norm loss function is more robust to outliers than the F norm loss function. Table 5 and Table 6 show the classification performance for the Pavia University data set when using the pixel-wise strategy and when dealing with all the test pixels simultaneously, respectively. The same conclusions can be drawn from Table 5 and Table 6 for the Pavia University data. In addition, from Table 3, Table 4, Table 5 and Table 6, it can be observed that the methods that deal with all the test pixels simultaneously obtain comparable or better overall accuracy than the regression based pixel-wise sparse representation methods, and they are much faster, which demonstrates the benefit of considering all the test pixels simultaneously. Figure 4 and Figure 5 show the classification maps for the Indian Pines and Pavia University data sets, respectively, which give a visual comparison between the different methods.
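
The three evaluation measures used above can be computed from a confusion matrix. The sketch below assumes the usual definitions (overall accuracy as the fraction of correctly labeled samples, average accuracy as the mean of the per-class accuracies, and Cohen's kappa); the paper refers to [16] for the exact definitions, so this is an illustration rather than the authors' evaluation code, and the function name and toy labels are hypothetical.

```python
import numpy as np

def accuracy_metrics(y_true, y_pred, n_classes):
    """Overall accuracy, average accuracy and Cohen's kappa from integer labels.

    Assumes labels are integers in [0, n_classes) and that every class
    occurs at least once in y_true (otherwise a per-class accuracy is undefined).
    """
    C = np.zeros((n_classes, n_classes), dtype=np.int64)
    np.add.at(C, (y_true, y_pred), 1)          # rows: true class, columns: predicted
    total = C.sum()
    oa = np.trace(C) / total
    aa = np.mean(np.diag(C) / C.sum(axis=1))
    # Expected agreement by chance, from the row and column marginals.
    pe = np.sum(C.sum(axis=0) * C.sum(axis=1)) / total ** 2
    kappa = (oa - pe) / (1.0 - pe)
    return oa, aa, kappa

y_true = np.array([0, 0, 0, 0, 1, 1])
y_pred = np.array([0, 0, 0, 1, 1, 1])
oa, aa, kappa = accuracy_metrics(y_true, y_pred, n_classes=2)
```

For this toy example five of six samples are correct, so the overall accuracy is 5/6, the per-class accuracies are 3/4 and 1, and kappa is 2/3.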

Figure 4
Classification maps for the Indian Pines data set. (a) $\ell_2$+$\ell_1$; (b) $\ell_2$+$\ell_1$+NC; (c) $\ell_2$+$\ell_1$+SF; (d) $\ell_2$+$\ell_1$+SF+NC; (e) $\ell_1$+$\ell_1$; (f) $\ell_1$+$\ell_1$+NC; (g) $\ell_1$+ ...
Figure 5
Classification maps for the Pavia University data set. (a) $\ell_2$+$\ell_1$; (b) $\ell_2$+$\ell_1$+NC; (c) $\ell_2$+$\ell_1$+SF; (d) $\ell_2$+$\ell_1$+SF+NC; (e) $\ell_1$+$\ell_1$; (f) $\ell_1$+$\ell_1$+NC; (g) $\ell_1$+ ...
Table 3
Overall Accuracy, Average Accuracy, Kappa Statistic and Time of the Indian Pines data set when using pixel-wise strategy.
Table 4
Overall Accuracy, Average Accuracy, Kappa Statistic and Time of the Indian Pines data set when dealing with all the test pixels simultaneously.
Table 5
Overall Accuracy, Average Accuracy, Kappa Statistic and Time of the Pavia University data set when using pixel-wise strategy.
Table 6
Overall Accuracy, Average Accuracy, Kappa Statistic and Time of the Pavia University data set when dealing with all the test pixels simultaneously.

We also choose eight other methods for comparison, i.e., SVM [9,66], NRS [28,67], CRT [30,67], KCRC [31,68], OMP [15], SOMP [15], JRSRC [20] and 3D-CNN [45,69]. The SVM is a very popular classifier, the 3D-CNN is a deep neural network based classifier, and the other six compared methods are collaborative representation and sparse representation based classifiers. Table 7 and Table 8 show the classification performance of the proposed SFL and the eight compared methods on the Indian Pines and Pavia University data sets, respectively. In addition, Figure 6 and Figure 7 show the classification maps of the Indian Pines and Pavia University data sets obtained with the proposed SFL and the eight compared methods, which give a visual comparison between the different methods. From Table 7 and Table 8, it can be clearly seen that the proposed SFL obtains the best classification performance, which demonstrates that it is effective for hyperspectral image classification. In terms of time, the SVM is the fastest because it is implemented in the C language, which is much faster than Matlab. NRS, CRT and KCRC are very fast because they are collaborative representation methods with closed-form solutions, which do not need iteration. OMP and SOMP are also very fast because they are greedy sparse representation methods, while the JRSRC method is very time-consuming because it is a regression based sparse representation method. In addition, the 3D-CNN is not fast, mainly because of the time spent on training. Our proposed method is also a regression based method, which takes more time than the collaborative representation methods and the greedy sparse representation methods. There are several possible ways to reduce the time consumed. One way is to use the C language and graphics processing units for a fast implementation. Another way is to use the first-order primal-dual algorithm in [70] to achieve faster convergence.

Figure 6
Classification maps for the Indian Pines data set using the compared methods and the proposed method. (a) SVM; (b) NRS; (c) CRT; (d) KCRC; (e) OMP; (f) SOMP; (g) JRSRC; (h) 3D-CNN; (i) SFL.
Figure 7
Classification maps for the Pavia University data set using the compared methods and the proposed method. (a) SVM; (b) NRS; (c) CRT; (d) KCRC; (e) OMP; (f) SOMP; (g) JRSRC; (h) 3D-CNN; (i) SFL.
Table 7
Overall Accuracy, Average Accuracy, Kappa Statistic and Time for the Indian Pines data set when using the compared methods and the proposed method.
Table 8
Overall Accuracy, Average Accuracy, Kappa Statistic and Time for the Pavia University data set when using the compared methods and the proposed method.
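
For readers unfamiliar with the three accuracy measures reported in these tables, the following Python sketch computes them from predicted and true labels using their standard definitions. The function name classification_scores and the toy labels are assumptions for illustration and are not data from the experiments.

```python
import numpy as np

def classification_scores(y_true, y_pred, n_classes):
    """Return (overall accuracy, average accuracy, kappa statistic)."""
    # Confusion matrix: rows are true classes, columns are predicted classes
    cm = np.zeros((n_classes, n_classes), dtype=np.int64)
    np.add.at(cm, (y_true, y_pred), 1)
    total = cm.sum()
    # Overall accuracy: fraction of all test pixels labeled correctly
    oa = np.trace(cm) / total
    # Average accuracy: mean of the per-class accuracies
    # (assumes every class occurs at least once in y_true)
    aa = (np.diag(cm) / cm.sum(axis=1)).mean()
    # Kappa: agreement corrected for the agreement expected by chance
    pe = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / total**2
    kappa = (oa - pe) / (1 - pe)
    return oa, aa, kappa

# Hypothetical labels for ten test pixels and three classes
y_true = np.array([0, 0, 0, 0, 1, 1, 2, 2, 2, 2])
y_pred = np.array([0, 0, 0, 1, 1, 1, 2, 2, 0, 2])
oa, aa, kappa = classification_scores(y_true, y_pred, 3)
print(f"OA={oa:.4f}  AA={aa:.4f}  kappa={kappa:.4f}")
```

Average accuracy weights every class equally, so it can differ noticeably from overall accuracy when class sizes are unbalanced.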

5. Conclusions

In this paper, we propose SFL, a hyperspectral image classification method based on the multichannel joint sparse recovery theory in [46], which can deal with all the test pixels simultaneously. The proposed SFL not only obtains comparable or better classification performance than the pixel-wise classification strategy but also takes much less time. In addition, spatial filtering and the non-negative constraint are both adopted to improve the classification performance, and spatial filtering has a larger effect on the classification than the non-negative constraint. Moreover, methods using an ℓ2,1 norm regularization term generally obtain better classification performance than methods using an ℓ1 norm regularization term, which demonstrates that it is beneficial to select correlated training samples among the whole training data set, and the ℓ2,1 norm regularization term can impose sparsity across the pixels both at the group and individual levels. Furthermore, methods using the ℓ2,1 norm loss function generally obtain better classification performance than methods using the Frobenius (F) norm loss function, which demonstrates that the ℓ2,1 norm loss function is more robust to outliers than the F norm loss function. Finally, experiments on two real hyperspectral image data sets demonstrate that the proposed SFL method outperforms some other popular classifiers. In future work, we can adopt the CNN framework to extract deep features of hyperspectral images, which can be integrated into our method to improve the classification performance.
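
The robustness argument can be illustrated numerically. The Python sketch below assumes the row-wise convention for the ℓ2,1 norm (the sum of the Euclidean norms of the rows) and compares it with the squared Frobenius norm commonly used in least-squares losses; both choices, the helper names l21_norm and row_shrink, and the toy residual matrices are assumptions for illustration rather than the formulation used in the paper. The second helper is the standard row-wise shrinkage (proximal) operator associated with the ℓ2,1 norm, which sets whole rows to zero and thereby produces joint sparsity.

```python
import numpy as np

def l21_norm(M):
    """Sum of the Euclidean norms of the rows of M (assumed convention)."""
    return np.linalg.norm(M, axis=1).sum()

def row_shrink(M, tau):
    """Proximal operator of tau * l21_norm: shrink each row toward zero."""
    norms = np.linalg.norm(M, axis=1, keepdims=True)
    # Rows with norm <= tau are zeroed; longer rows are scaled down
    scale = np.maximum(1.0 - tau / np.maximum(norms, 1e-12), 0.0)
    return scale * M

# Small residuals everywhere, then the same matrix with one outlying row
E = np.full((4, 3), 0.1)
E_out = E.copy()
E_out[0] = 10.0

# How much each loss grows when the outlier is introduced
ratio_l21 = l21_norm(E_out) / l21_norm(E)
ratio_fro_sq = np.linalg.norm(E_out, "fro") ** 2 / np.linalg.norm(E, "fro") ** 2
print("l2,1 growth:", ratio_l21)
print("squared F-norm growth:", ratio_fro_sq)

# Row shrinkage: the short row vanishes, the long row keeps its direction
A = np.array([[3.0, 4.0], [0.3, 0.4]])
A_shrunk = row_shrink(A, tau=1.0)
print(A_shrunk)
```

Because the outlying row enters the ℓ2,1 norm linearly rather than quadratically, it inflates that loss far less than it inflates the squared Frobenius norm.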

Acknowledgments

Financial support for this study was provided by the National Natural Science Foundation of China under Grants 61272278, 61275098 and 61503288; the Ph.D. Programs Foundation of Ministry of Education of China under Grant 20120142110088; the China Postdoctoral Science Foundation 2015M572194, 2015M570665; and the Hubei Province Natural Science Foundation 2014CFB270, 2015CFA061.

Author Contributions

All authors have made great contributions to the work. Hao Li and Cong Zhang designed the research and analyzed the results. Hao Li, Chang Li, Zhe Liu and Chengyin Liu performed the experiments and wrote the manuscript. Chang Li gave insightful suggestions for the work and revised the manuscript.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Iordache M.D., Bioucas-Dias J.M., Plaza A. Sparse unmixing of hyperspectral data. IEEE Trans. Geosci. Remote Sens. 2011;49:2014–2039. doi: 10.1109/TGRS.2010.2098413.
2. Mei X., Ma Y., Fan F., Li C., Liu C., Huang J., Ma J. Infrared ultraspectral signature classification based on a restricted Boltzmann machine with sparse and prior constraints. Int. J. Remote Sens. 2015;36:4724–4747. doi: 10.1080/01431161.2015.1079664.
3. Ma J., Zhou H., Zhao J., Gao Y., Jiang J., Tian J. Robust feature matching for remote sensing image registration via locally linear transforming. IEEE Trans. Geosci. Remote Sens. 2015;53:6469–6481. doi: 10.1109/TGRS.2015.2441954.
4. Ma L., Zhang X., Yu X., Luo D. Spatial Regularized Local Manifold Learning for Classification of Hyperspectral Images. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2016;9:609–624. doi: 10.1109/JSTARS.2015.2472460.
5. Poona N., Van Niekerk A., Ismail R. Investigating the utility of oblique tree-based ensembles for the classification of hyperspectral data. Sensors. 2016;16:1918. doi: 10.3390/s16111918.
6. Mei X., Ma Y., Li C., Fan F., Huang J., Ma J. A real-time infrared ultra-spectral signature classification method via spatial pyramid matching. Sensors. 2015;15:15868–15887. doi: 10.3390/s150715868.
7. Yang X., Hong H., You Z., Cheng F. Spectral and image integrated analysis of hyperspectral data for waxy corn seed variety classification. Sensors. 2015;15:15578–15594. doi: 10.3390/s150715578.
8. Liu S., Jiao L., Yang S. Hierarchical Sparse Learning with Spectral-Spatial Information for Hyperspectral Imagery Denoising. Sensors. 2016;16:1718. doi: 10.3390/s16101718.
9. Chang C.C., Lin C.J. LIBSVM: A library for support vector machines. ACM Trans. Intell. Syst. Technol. 2011;2:27. doi: 10.1145/1961189.1961199.
10. Wright S.J., Nowak R.D., Figueiredo M.A. Sparse reconstruction by separable approximation. IEEE Trans. Signal Process. 2009;57:2479–2493. doi: 10.1109/TSP.2009.2016892.
11. Jiang J., Ma J., Chen C., Jiang X., Wang Z. Noise robust face image super-resolution through smooth sparse representation. IEEE Trans. Cybern. 2016. doi: 10.1109/TCYB.2016.2594184.
12. Ma J., Zhao J., Ma Y., Tian J. Non-rigid visible and infrared face registration via regularized Gaussian fields criterion. Pattern Recognit. 2015;48:772–784. doi: 10.1016/j.patcog.2014.09.005.
13. Ma J., Zhao J., Tian J., Yuille A.L., Tu Z. Robust point matching via vector field consensus. IEEE Trans. Image Process. 2014;23:1706–1721.
14. Ma J., Zhao J., Tian J., Bai X., Tu Z. Regularized vector field learning with sparse approximation for mismatch removal. Pattern Recognit. 2013;46:3519–3532. doi: 10.1016/j.patcog.2013.05.017.
15. Chen Y., Nasrabadi N.M., Tran T.D. Hyperspectral image classification using dictionary-based sparse representation. IEEE Trans. Geosci. Remote Sens. 2011;49:3973–3985. doi: 10.1109/TGRS.2011.2129595.
16. Sun X., Qu Q., Nasrabadi N.M., Tran T.D. Structured priors for sparse-representation-based hyperspectral image classification. IEEE Geosci. Remote Sens. Lett. 2014;11:1235–1239.
17. Chen Y., Nasrabadi N.M., Tran T.D. Hyperspectral image classification via kernel sparse representation. IEEE Trans. Geosci. Remote Sens. 2013;51:217–231. doi: 10.1109/TGRS.2012.2201730.
18. Liu T., Gu Y., Jia X., Benediktsson J.A., Chanussot J. Class-Specific Sparse Multiple Kernel Learning for Spectral–Spatial Hyperspectral Image Classification. IEEE Trans. Geosci. Remote Sens. 2016;54:7351–7365. doi: 10.1109/TGRS.2016.2600522.
19. Wang J., Jiao L., Liu H., Yang S., Liu F. Hyperspectral Image Classification by Spatial–Spectral Derivative-Aided Kernel Joint Sparse Representation. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2015;8:2485–2500. doi: 10.1109/JSTARS.2015.2394330.
20. Li C., Ma Y., Mei X., Liu C., Ma J. Hyperspectral Image Classification with Robust Sparse Representation. IEEE Geosci. Remote Sens. Lett. 2016;13:641–645. doi: 10.1109/LGRS.2016.2532380.
21. Roscher R., Waske B. Shapelet-Based Sparse Representation for Landcover Classification of Hyperspectral Images. IEEE Trans. Geosci. Remote Sens. 2016;54:1623–1634. doi: 10.1109/TGRS.2015.2484619.
22. Fu W., Li S., Fang L., Kang X., Benediktsson J.A. Hyperspectral Image Classification Via Shape-Adaptive Joint Sparse Representation. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2016;9:556–567. doi: 10.1109/JSTARS.2015.2477364.
23. He Z., Liu L., Zhou S., Shen Y. Learning group-based sparse and low-rank representation for hyperspectral image classification. Pattern Recognit. 2016;60:1041–1056. doi: 10.1016/j.patcog.2016.04.009.
24. Zhang E., Jiao L., Zhang X., Liu H., Wang S. Class-Level Joint Sparse Representation for Multifeature-Based Hyperspectral Image Classification. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2016;9:4160–4177. doi: 10.1109/JSTARS.2016.2522182.
25. Bian X., Chen C., Xu Y., Du Q. Robust Hyperspectral Image Classification by Multi-Layer Spatial-Spectral Sparse Representations. Remote Sens. 2016;8:985. doi: 10.3390/rs8120985.
26. Shao Y., Sang N., Gao C., Ma L. Probabilistic class structure regularized sparse representation graph for semi-supervised hyperspectral image classification. Pattern Recognit. 2017;63:102–114. doi: 10.1016/j.patcog.2016.09.011.
27. Zhang L., Yang M., Feng X. Sparse representation or collaborative representation: Which helps face recognition?; Proceedings of the IEEE International Conference on Computer Vision; Barcelona, Spain. 6–13 November 2011; pp. 471–478.
28. Li W., Tramel E.W., Prasad S., Fowler J.E. Nearest regularized subspace for hyperspectral classification. IEEE Trans. Geosci. Remote Sens. 2014;52:477–489. doi: 10.1109/TGRS.2013.2241773.
29. Li W., Du Q. Gabor-filtering-based nearest regularized subspace for hyperspectral image classification. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2014;7:1012–1022. doi: 10.1109/JSTARS.2013.2295313.
30. Li W., Du Q., Xiong M. Kernel collaborative representation with Tikhonov regularization for hyperspectral image classification. IEEE Geosci. Remote Sens. Lett. 2015;12:48–52.
31. Wang D., Lu H., Yang M.H. Kernel collaborative face recognition. Pattern Recognit. 2015;48:3025–3037. doi: 10.1016/j.patcog.2015.01.012.
32. Li W., Du Q., Zhang F., Hu W. Hyperspectral Image Classification by Fusing Collaborative and Sparse Representations. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2016;9:4178–4187. doi: 10.1109/JSTARS.2016.2542113.
33. Sun B., Kang X., Li S., Benediktsson J.A. Random-Walker-Based Collaborative Learning for Hyperspectral Image Classification. IEEE Trans. Geosci. Remote Sens. 2017;55:212–222. doi: 10.1109/TGRS.2016.2604290.
34. Chen Y., Jiang H., Li C., Jia X., Ghamisi P. Deep feature extraction and classification of hyperspectral images based on convolutional neural networks. IEEE Trans. Geosci. Remote Sens. 2016;54:6232–6251. doi: 10.1109/TGRS.2016.2584107.
35. Chen Y., Lin Z., Zhao X., Wang G., Gu Y. Deep learning-based classification of hyperspectral data. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2014;7:2094–2107. doi: 10.1109/JSTARS.2014.2329330.
36. Krizhevsky A., Sutskever I., Hinton G.E. Imagenet classification with deep convolutional neural networks; Proceedings of the Advances in Neural Information Processing Systems; Lake Tahoe, NV, USA. 3–6 December 2012; pp. 1097–1105.
37. Slavkovikj V., Verstockt S., De Neve W., Van Hoecke S., Van de Walle R. Hyperspectral image classification with convolutional neural networks; Proceedings of the 23rd ACM International Conference on Multimedia; Brisbane, Australia. 26–30 October 2015; pp. 1159–1162.
38. Hu W., Huang Y., Wei L., Zhang F., Li H. Deep convolutional neural networks for hyperspectral image classification. J. Sens. 2015;2015:258619. doi: 10.1155/2015/258619.
39. Yue J., Zhao W., Mao S., Liu H. Spectral–spatial classification of hyperspectral images using deep convolutional neural networks. Remote Sens. Lett. 2015;6:468–477. doi: 10.1080/2150704X.2015.1047045.
40. Aptoula E., Ozdemir M.C., Yanikoglu B. Deep Learning With Attribute Profiles for Hyperspectral Image Classification. IEEE Geosci. Remote Sens. Lett. 2016;13:1970–1974. doi: 10.1109/LGRS.2016.2619354.
41. Ghamisi P., Chen Y., Zhu X.X. A Self-Improving Convolution Neural Network for the Classification of Hyperspectral Data. IEEE Geosci. Remote Sens. Lett. 2016;13:1537–1541. doi: 10.1109/LGRS.2016.2595108.
42. Yu S., Jia S., Xu C. Convolutional neural networks for hyperspectral image classification. Neurocomputing. 2017;219:88–98. doi: 10.1016/j.neucom.2016.09.010.
43. Liang H., Li Q. Hyperspectral Imagery Classification Using Sparse Representations of Convolutional Neural Network Features. Remote Sens. 2016;8:99. doi: 10.3390/rs8020099.
44. Li Y., Xie W., Li H. Hyperspectral image reconstruction by deep convolutional neural network for classification. Pattern Recognit. 2017;63:371–383. doi: 10.1016/j.patcog.2016.10.019.
45. Shi C., Liu F., Jiao L., Bibi I. 3-D Deep Convolutional Neural Networks for Hyperspectral classification. IEEE Trans. Geosci. Remote Sens. 2017, in press.
46. Eldar Y.C., Rauhut H. Average case analysis of multichannel sparse recovery using convex relaxation. IEEE Trans. Inf. Theory. 2010;56:505–519. doi: 10.1109/TIT.2009.2034789.
47. Li W., Du Q. Joint within-class collaborative representation for hyperspectral image classification. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2014;7:2200–2208. doi: 10.1109/JSTARS.2014.2306956.
48. Boyd S., Parikh N., Chu E., Peleato B., Eckstein J. Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends Mach. Learn. 2011;3:1–122. doi: 10.1561/2200000016.
49. Bioucas-Dias J.M., Figueiredo M.A. Alternating direction algorithms for constrained sparse regression: Application to hyperspectral unmixing; Proceedings of the IEEE 2010 2nd Workshop on Hyperspectral Image and Signal Processing: Evolution in Remote Sensing; Reykjavik, Iceland. 14–16 June 2010; pp. 1–4.
50. Jiang J., Hu R., Wang Z., Han Z. Noise robust face hallucination via locality-constrained representation. IEEE Trans. Multimedia. 2014;16:1268–1281. doi: 10.1109/TMM.2014.2311320.
51. Jiang J., Hu R., Wang Z., Han Z., Ma J. Facial image hallucination through coupled-layer neighbor embedding. IEEE Trans. Circuits Syst. Video Technol. 2016;26:1674–1684. doi: 10.1109/TCSVT.2015.2433538.
52. Ding C., Zhou D., He X., Zha H. R1-PCA: Rotational invariant L1-norm principal component analysis for robust subspace factorization; Proceedings of the 23rd International Conference on Machine Learning; Pittsburgh, PA, USA. 25–29 June 2006; pp. 281–288.
53. Ma J., Qiu W., Zhao J., Ma Y., Yuille A.L., Tu Z. Robust L2E estimation of transformation for non-rigid registration. IEEE Trans. Signal Process. 2015;63:1115–1129. doi: 10.1109/TSP.2014.2388434.
54. Xu H., Caramanis C., Sanghavi S. Robust PCA via Outlier Pursuit. IEEE Trans. Inf. Theory. 2012;58:3047–3064. doi: 10.1109/TIT.2011.2173156.
55. Ma J., Chen C., Li C., Huang J. Infrared and visible image fusion via gradient transfer and total variation minimization. Inf. Fusion. 2016;31:100–109. doi: 10.1016/j.inffus.2016.02.001.
56. Nie F., Huang H., Cai X., Ding C.H. Efficient and robust feature selection via joint L2,1-norms minimization; Proceedings of the Advances in Neural Information Processing Systems; Vancouver, BC, Canada. 6–11 December 2010; pp. 1813–1821.
57. Evgeniou A., Pontil M. Multi-task feature learning. Adv. Neural Inf. Process. Syst. 2007;19:41.
58. Bach F.R. Consistency of the group lasso and multiple kernel learning. J. Mach. Learn. Res. 2008;9:1179–1225.
59. Kong D., Ding C., Huang H. Robust nonnegative matrix factorization using l21-norm; Proceedings of the 20th ACM International Conference on Information and Knowledge Management; Glasgow, UK. 24–28 October 2011; pp. 673–682.
60. Iordache M.D., Bioucas-Dias J.M., Plaza A. Collaborative sparse regression for hyperspectral unmixing. IEEE Trans. Geosci. Remote Sens. 2014;52:341–354. doi: 10.1109/TGRS.2013.2240001.
61. Sprechmann P., Ramirez I., Sapiro G., Eldar Y.C. C-HiLasso: A collaborative hierarchical sparse modeling framework. IEEE Trans. Signal Process. 2011;59:4183–4198. doi: 10.1109/TSP.2011.2157912.
62. Bioucas-Dias J.M., Plaza A., Dobigeon N., Parente M., Du Q., Gader P., Chanussot J. Hyperspectral unmixing overview: Geometrical, statistical, and sparse regression-based approaches. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2012;5:354–379. doi: 10.1109/JSTARS.2012.2194696.
63. Li C., Ma Y., Mei X., Liu C., Ma J. Hyperspectral unmixing with robust collaborative sparse regression. Remote Sens. 2016;8:588. doi: 10.3390/rs8070588.
64. Ma Y., Li C., Mei X., Liu C., Ma J. Robust Sparse Hyperspectral Unmixing with ℓ2,1 Norm. IEEE Trans. Geosci. Remote Sens. 2017. doi: 10.1109/TGRS.2016.2616161.
65. Li C., Ma Y., Huang J., Mei X., Liu C., Ma J. GBM-Based Unmixing of Hyperspectral Data Using Bound Projected Optimal Gradient Method. IEEE Geosci. Remote Sens. Lett. 2016;13:952–956. doi: 10.1109/LGRS.2016.2555341.
66. Chang C.-C., Lin C.J. LIBSVM—A Library for Support Vector Machines. [(accessed on 8 January 2017)]. Available online: https://www.csie.ntu.edu.tw/cjlin/libsvm/
67. Li W. Wei Li’s Homepage. [(accessed on 8 January 2017)]. Available online: http://research.cs.buct.edu.cn/liwei/
68. Lu H. Huchuan Lu’s Homepage. [(accessed on 8 January 2017)]. Available online: http://202.118.75.4/lu/publications.html
69. Liu F. Fang Liu’s Homepage. [(accessed on 8 January 2017)]. Available online: http://web.xidian.edu.cn/fliu/en/paper.html
70. Chambolle A., Pock T. A first-order primal-dual algorithm for convex problems with applications to imaging. J. Math. Imaging Vis. 2011;40:120–145. doi: 10.1007/s10851-010-0251-1.

Articles from Sensors (Basel, Switzerland) are provided here courtesy of Multidisciplinary Digital Publishing Institute (MDPI)