Hilbert–Schmidt Independence Criterion Subspace Learning on Hybrid Region Covariance Descriptor for Image Classification

The region covariance descriptor (RCD), which takes the form of a symmetric positive definite (SPD) matrix, is commonly used in image representation. As SPD manifolds have a non-Euclidean geometry, Euclidean machine learning methods are not directly applicable to them. In this work, an improved covariance descriptor called the hybrid region covariance descriptor (HRCD) is proposed. The HRCD incorporates the mean feature information into the RCD to improve the latter's discriminative performance. To address the non-Euclidean properties of SPD manifolds, this study also proposes an algorithm called Hilbert-Schmidt independence criterion subspace learning (HSIC-SL) for SPD manifolds. The HSIC-SL algorithm aims to improve classification accuracy. It uses a kernel function to embed SPD matrices into a reproducing kernel Hilbert space and then maps them to a linear space. To make the mapping account for the correlation between the SPD matrices and the linear projection, the method introduces global HSIC maximization into the model. Classification experiments with the HRCD and HSIC-SL on the COIL-20, ETH-80, QMUL, FERET face, and Brodatz datasets show that the proposed method compares favorably with existing methods in accuracy and validity.


Introduction
A growing number of non-Euclidean data, such as symmetric positive definite (SPD) manifolds [1] and Grassmann manifolds [2], are encountered in visual recognition tasks. In particular, SPD manifolds have attracted increasing attention in the form of the region covariance descriptor (RCD) [3,4], the Gaussian mixture model (GMM) [5], tensors [6][7][8][9], etc. In this work, we mainly discuss image classification on SPD manifolds. The RCD has been proved to be an effective descriptor in a variety of applications [10][11][12]. It captures the correlations between different features of an image and represents the image with a covariance matrix. However, the mean vector of the features has also been proved to be significant in image recognition tasks [13,14]. In this work, we construct a new image descriptor by directly incorporating the mean feature information into the RCD. The new image descriptor is called the hybrid region covariance descriptor (HRCD). The HRCD inherits the advantages of the RCD and is more discriminative than the RCD. Images represented by the HRCD are also SPD matrices that lie on SPD manifolds. Most classical machine learning algorithms are constructed on linear spaces. Given the non-Euclidean geometry of Riemannian manifolds, directly using most conventional machine learning methods on Riemannian manifolds is inadequate [15,16]. Therefore, the classification of points on Riemannian manifolds has become a hot research topic.
Two main approaches are generally adopted to cope with the nonlinearity of Riemannian manifolds. The first approach is to construct learning methods that directly consider the Riemannian geometry; one such method is the widely used tangent approximation [17,18]. Most existing SPD classification methods make use of Riemannian metrics [15,16] or matrix divergences [19,20] as the distance measure for SPD matrices [21][22][23]. The other approach is to project the SPD matrices into another space, such as a high-dimensional reproducing kernel Hilbert space (RKHS) [24] or another low-dimensional SPD manifold [25], and to construct classification algorithms in the projection space. Benefiting from the success of kernel methods in Euclidean spaces, the kernel-based classification scheme is a good choice for the analysis of SPD manifolds and has shown promising performance [26,27]. Kernel-based methods embed manifolds into RKHSs and further project them into Euclidean spaces via an explicit mapping. Hence, algorithms designed for linear spaces can be extended to Riemannian manifolds. However, the mapping from RKHSs to Euclidean spaces in existing methods is based on a linear assumption. Moreover, the intrinsic connections between the SPD matrices and their low-dimensional projections are ignored.
To circumvent this limitation of kernel-based methods, we propose introducing the Hilbert-Schmidt independence criterion (HSIC) into the kernel trick and refer to the resulting method as HSIC subspace learning (HSIC-SL). Specifically, we derive the log-linear and log-Gaussian kernels to embed SPD matrices into a high-dimensional RKHS and then project these points into a low-dimensional vector space. To align the low-dimensional representation with the intrinsic features of the input data, we introduce statistical dependence between the SPD matrices and the low-dimensional representation. In this work, the explicit mapping is obtained on the basis of subspace learning and HSIC maximization, where the HSIC characterizes the statistical correlation between two datasets. The main contributions of this study are as follows: (1) We propose a novel covariance descriptor called the HRCD, which explores discriminative information effectively. (2) The HSIC is applied to the kernel framework on SPD Riemannian manifolds for the first time, and a novel subspace learning algorithm called HSIC-SL is proposed. The proposed method achieves effective classification on the basis of global HSIC maximization.
(3) We identify two simple kernel functions that can be used in the HSIC-SL algorithm. The diversity of kernels improves the flexibility of HSIC-SL. The rest of the paper is organized as follows. We provide a review of previous work in Section 2. A brief description of the RCD, RKHS, and HSIC is presented in Section 3. We derive the proposed descriptor and algorithm in detail in Section 4. The experimental results are presented in Section 5 to demonstrate the effectiveness of the HRCD and HSIC-SL. Conclusions and future research directions are given in Section 6.

Literature Review
This section presents a brief review of RCDs, as well as recent manifold classification methods constructed on SPD manifolds.
The RCD was first introduced by Tuzel et al. [28]. It represents an image region with a nonsingular SPD matrix by extracting the covariance matrix of multiple features. The covariance matrix carries no information about the size or ordering of the region, which implies a certain degree of scale and rotation independence. The RCD is used not only in image recognition but also in image set recognition tasks, in which an image set is modeled with its natural second-order statistic [4,29]. The GMM can also serve as the SPD descriptor of an image set. Under the assumption of a multi-Gaussian distribution of an image set [30], the hundreds of images in the set are assigned to a small number of Gaussian components, and each Gaussian component is represented as an SPD matrix [31]. Thus, the image set is described by multiple SPD matrices. As mentioned previously, mean vectors have also been proved to be important in recognition tasks. In [32], the mean information was utilized in an improved log-Euclidean Gaussian kernel. However, that approach is limited to a specific algorithm and lacks generality. In the current work, we propose to combine the feature mean information and the covariance matrix into a new SPD matrix, thereby introducing first-order statistical information into the image RCD to improve the discriminative ability of the descriptor.
When the manifold under consideration is an SPD manifold, the tangent space at a particular point is a linear space. Most works map SPD matrices onto the tangent space of a particular point so that traditional linear classifiers can be applied. Under this framework, dimensionality reduction and clustering methods, such as Laplacian eigenmaps, local linear embedding (LLE), and Hessian LLE, have been extended to Riemannian manifolds [17]. Tuzel et al. introduced LogitBoost for classification on Riemannian manifolds [18]. The classifier has since been generalized to multiclass classification [33]. Sparse coding by embedding manifolds into the identity tangent space to identify the Lie algebra of SPD manifolds was considered in [34]. Such tangent space approximations can preserve manifold-valued data and eliminate the swelling effect. However, flattening a manifold through tangent spaces may produce inaccurate modeling, especially for regions far away from the tangent pole.
Apart from tangent approximation, many efforts have been devoted to distance measures on SPD manifolds that respect the true geometry of the manifold; examples include the log-Euclidean Riemannian metric (LERM) [15] and the affine invariant Riemannian metric (AIRM) [16]. Although matrix divergences are not true Riemannian metrics, they provide fast and approximate distance computation. Sivalingam et al. proposed tensor sparse coding (TSC) for positive definite matrices [35], which utilizes the Burg divergence to perform sparse coding and dictionary learning on SPD manifolds. Riemannian dictionary learning and sparse coding (DLSC) [36] represents data as sparse combinations of SPD dictionary atoms via a Riemannian geometric approach and characterizes the optimization loss via the affine invariant Riemannian metric. However, these methods cannot be applied to other Riemannian manifolds because of the specificity of the metrics used. Embedding discriminant analysis (EDA) [37] identifies a bilinear isometric mapping such that the resulting representation maximally preserves the Riemannian geodesic distance.

As for kernel methods proposed for SPD manifolds, Riemannian locality preserving projections (RLPP) [38] embed Riemannian manifolds into low-dimensional vector spaces by defining Riemannian kernels; however, their computational complexity is high, and the kernel is not always positive definite. Jayasumana et al. [39] presented a framework on Riemannian manifolds to establish the positive definiteness of Gaussian RBF kernels and utilized the log-Euclidean Gaussian kernel in kernel principal component analysis (KPCA) for a recognition task. Caseiro et al. proposed a heat kernel mean shift on Riemannian manifolds [40]. In [41], kernel DLSC based on the LERM was introduced. Harandi et al. proposed to perform sparse coding by embedding the space of SPD matrices into Hilbert spaces through two types of Bregman matrix divergences [42]. Covariance discriminative learning (CDL) [4] utilizes the matrix logarithm operator to define kernel functions and then explicitly maps the covariance matrices from the Riemannian manifold to a Euclidean space. Zhuang et al. proposed a data-dependent kernel learning framework on the basis of kernel learning and the Riemannian metric (KLRM) [43]. In [44], multi-kernel SSCR (MKSSCR) creates a linear combination of a set of Riemannian kernels. Considerable results have been achieved within the kernel framework. To improve the performance of the kernel trick, we introduce a statistical dependence constraint between the SPD matrices and their projections and measure the statistical dependence with the HSIC.

Related Work
In this section, we briefly review the RCD and the properties of the RKHS and HSIC.

Region Covariance Descriptor.
The region covariance descriptor (RCD), as a special case of an SPD matrix, provides a natural way of fusing multiple features. Suppose $R$ is an image region of size $h \times w$, from which we extract multiple features of every pixel. The features can be the location, grey value, and gradients. We denote the feature vector of the $k$-th pixel as
$$z_k = \left[x, y, I(x, y), I_x(x, y), I_y(x, y)\right]^T,$$
where $x$ and $y$ denote the location, $I$ is the grey value, and $I_x$ and $I_y$ are the gradients with respect to $x$ and $y$. The RCD of $R$ is defined as
$$C_R = \frac{1}{n-1}\sum_{k=1}^{n}\left(z_k - \mu\right)\left(z_k - \mu\right)^T,$$
where $n = h \times w$ and $\mu = \frac{1}{n}\sum_{k=1}^{n} z_k \in \mathbb{R}^d$ denotes the mean of the feature vectors. The image region can then be represented by a $d \times d$ SPD matrix, where $d$ depends on the number of features.
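To make the construction concrete, the following Python sketch (NumPy only; the function name is ours, not from the paper) computes the 5 × 5 RCD of a greyscale image region from the pixel features listed above.

```python
import numpy as np

def region_covariance(img):
    """5x5 region covariance descriptor of a greyscale image region.

    Features per pixel: x, y, intensity, first-order gradients Ix and Iy.
    A sketch of the standard RCD construction, not the authors' exact code.
    """
    h, w = img.shape
    ys, xs = np.mgrid[0:h, 0:w]                       # pixel coordinates
    Iy, Ix = np.gradient(img.astype(float))           # first-order gradients
    feats = np.stack([xs, ys, img, Ix, Iy], axis=-1)  # h x w x 5 feature map
    Z = feats.reshape(-1, 5)                          # n x d feature matrix
    return np.cov(Z, rowvar=False)                    # d x d covariance (unbiased)
```

Here `np.cov` uses the unbiased $1/(n-1)$ normalization, which matches the definition above.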

RKHS.
The reproducing kernel Hilbert space (RKHS) is the theoretical basis of kernel methods. After the data are projected into an RKHS, various machine learning methods can be implemented in it.
Let $S(\Omega)$ be a function space and $\langle\cdot,\cdot\rangle$ an inner product defined on $S(\Omega)$. The complete inner product space $\mathcal{H} = (S(\Omega), \langle\cdot,\cdot\rangle)$ induced by $\langle\cdot,\cdot\rangle$ is a Hilbert space. For all $x \in \Omega$ and $f \in S(\Omega)$, if the function $k$ satisfies $f(x) = \langle f, k(\cdot, x)\rangle$, then $k$ is the reproducing kernel of the RKHS $\mathcal{H}$. We denote the mapping defined by the reproducing kernel as $\phi: \Omega \rightarrow \mathcal{H}$, $\phi(x) = k(\cdot, x)$. The function $k$ is a valid kernel only if the kernel matrix $K$ is symmetric positive definite, where $K_{ij} = k(x_i, x_j)$. According to Mercer's theorem [45], once a valid reproducing kernel is defined, it generates a unique Hilbert space.
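As a small, hedged illustration of this requirement (not part of the original paper), the following Python sketch builds a kernel matrix from sample points and tests symmetric positive definiteness via a Cholesky factorization.

```python
import numpy as np

def is_spd_kernel_matrix(kernel, points, jitter=0.0):
    """Empirically check the positive definiteness condition: build
    K[i, j] = kernel(x_i, x_j) on sample points and test symmetry and
    positive definiteness (Cholesky succeeds only for SPD matrices)."""
    K = np.array([[kernel(a, b) for b in points] for a in points])
    K = K + jitter * np.eye(len(points))
    if not np.allclose(K, K.T):
        return False
    try:
        np.linalg.cholesky(K)
        return True
    except np.linalg.LinAlgError:
        return False

# Example: a Gaussian kernel on distinct scalar samples is positive definite.
rbf = lambda a, b: np.exp(-(a - b) ** 2 / 2.0)
print(is_spd_kernel_matrix(rbf, np.random.randn(20)))
```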

Hilbert-Schmidt Independence Criterion (HSIC).
The HSIC [46] is usually used to characterize the statistical correlation between two datasets. The mathematical theory of HSIC has been studied for a long time, and many results have been obtained [47][48][49][50][51]. In the computation of HSIC, the two datasets are first embedded into two RKHSs, and the HSIC of the two sets of data is then measured by the Hilbert-Schmidt (HS) operator between these two RKHSs.
Let $X$ be a random variable/vector defined on $\Omega_X$ and $Y$ a random variable/vector defined on $\Omega_Y$, let $\mathcal{H}_X$ and $\mathcal{H}_Y$ be two separate Hilbert spaces, and let $\phi_X: \Omega_X \rightarrow \mathcal{H}_X$ and $\phi_Y: \Omega_Y \rightarrow \mathcal{H}_Y$ be the kernel mappings defined by the respective reproducing kernels.

Hilbert-Schmidt (HS) Operators.
Let $T: \mathcal{H}_X \rightarrow \mathcal{H}_Y$ be a compact operator and $\{e_i^X \mid i \in I\}$ an orthonormal basis of $\mathcal{H}_X$. If $\sum_{i \in I} \|T e_i^X\|_Y^2 < +\infty$, then $T$ is called a Hilbert-Schmidt (HS) operator [52], and its HS norm is given by $\|T\|_{HS}^2 = \sum_{i \in I} \|T e_i^X\|_Y^2$, a value that does not depend on the choice of orthonormal basis.

Mean Functions and Cross Covariance Operators.
Let $\Phi_X: \mathcal{H}_X \rightarrow \mathbb{R}$ be the continuous linear functional over $\mathcal{H}_X$ defined by $\Phi_X(f) = E_X\left[\langle f, \phi_X(X)\rangle_X\right]$. According to the Riesz representation theorem, there exists a unique element $\mu_X \in \mathcal{H}_X$ such that $\Phi_X(f) = \langle f, \mu_X\rangle_X$ for all $f \in \mathcal{H}_X$; $\mu_X$ is called the mean function of $\phi_X(X)$. The mean function $\mu_Y$ of $\phi_Y(Y)$ is defined in the same way.

Let $\Phi$ be the continuous bilinear functional over $\mathcal{H}_X \times \mathcal{H}_Y$ defined by $\Phi(f, g) = E_{XY}\left[\langle f, \phi_X(X) - \mu_X\rangle_X \langle g, \phi_Y(Y) - \mu_Y\rangle_Y\right]$. Then, according to the Riesz representation theorem, there exists a unique HS operator $C_{XY}: \mathcal{H}_Y \rightarrow \mathcal{H}_X$ such that $\Phi(f, g) = \langle f, C_{XY} g\rangle_X$ for all $f \in \mathcal{H}_X$ and $g \in \mathcal{H}_Y$; $C_{XY}$ is called the cross-covariance operator between $\phi_X(X)$ and $\phi_Y(Y)$. The relationship between $C_{XY}$, $\mu_X$, and $\mu_Y$ is illustrated in Figure 1. The two datasets $\Omega_X$ and $\Omega_Y$ are embedded into $\mathcal{H}_X$ and $\mathcal{H}_Y$ by the kernel mappings $\phi_X: \Omega_X \rightarrow \mathcal{H}_X$ and $\phi_Y: \Omega_Y \rightarrow \mathcal{H}_Y$, respectively, and $\mu_X$ and $\mu_Y$ are the corresponding mean functions. The HSIC of $\Omega_X$ and $\Omega_Y$ is given by the HS operator $C_{XY}$ between $\mathcal{H}_X$ and $\mathcal{H}_Y$.

HSIC.
The HSIC of two random variables/vectors is defined as
$$\mathrm{HSIC}(X, Y) = \left\|C_{XY}\right\|_{HS}^2.$$
It can be seen from this definition that, instead of directly calculating the covariance of $X$ and $Y$, i.e., $\mathrm{cov}(X, Y) = E[XY] - E[X]E[Y]$, HSIC first transforms $X$ and $Y$ into $\mathcal{H}_X$ and $\mathcal{H}_Y$, respectively, and then measures the covariance of $\phi_X(X)$ and $\phi_Y(Y)$ through the HS operator between $\mathcal{H}_X$ and $\mathcal{H}_Y$. In practice, $\mathcal{H}_X$ and $\mathcal{H}_Y$ are generated from kernel functions $k_X$ and $k_Y$.
If the joint probability distribution of $X$ and $Y$ is known, $\mathrm{HSIC}(X, Y)$ can be calculated as
$$\mathrm{HSIC}(X, Y) = E_{xx'yy'}\left[k_X(x, x')k_Y(y, y')\right] + E_{xx'}\left[k_X(x, x')\right]E_{yy'}\left[k_Y(y, y')\right] - 2E_{xy}\left[E_{x'}\left[k_X(x, x')\right]E_{y'}\left[k_Y(y, y')\right]\right],$$
where $(x, y)$ and $(x', y')$ are independent copies drawn from the joint distribution of $(X, Y)$. Generally speaking, the joint probability distribution of $X$ and $Y$ is unknown, and only samples of $X$ and $Y$ are given: $\{(x_i, y_i)\}_{i=1}^{N}$. In this case, the statistical averages can be approximated by sample averages. Moreover, it is assumed that when $i \neq j$, the probability of the random event $\{X = x_i, Y = y_j\}$ is 0. The cross-covariance operator $C_{XY}$ and the mean functions $\mu_X$ and $\mu_Y$ can then be approximated as
$$\mu_X \approx \frac{1}{N}\sum_{i=1}^{N}\phi_X(x_i), \quad \mu_Y \approx \frac{1}{N}\sum_{i=1}^{N}\phi_Y(y_i), \quad C_{XY} \approx \frac{1}{N}\sum_{i=1}^{N}\left(\phi_X(x_i) - \mu_X\right) \otimes \left(\phi_Y(y_i) - \mu_Y\right).$$
Figure 1: The sketch map of HSIC.
Substituting these sample approximations into the definition gives the empirical estimate
$$\mathrm{HSIC}(X, Y) = \frac{1}{N^2}\operatorname{tr}\left(K_X C_N K_Y C_N\right),$$
where $C_N = I_N - \frac{1}{N}\mathbf{1}_N\mathbf{1}_N^T$ is the centering matrix, $\mathbf{1}_N$ is the $N$-dimensional column vector with all elements equal to 1, and $K_X$ and $K_Y$ are the kernel matrices of $X$ and $Y$, respectively.
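To illustrate the empirical estimate, the following Python sketch (NumPy only; the toy data and kernel choice are ours) computes HSIC from two precomputed kernel matrices.

```python
import numpy as np

def empirical_hsic(K_X, K_Y):
    """Empirical HSIC(X, Y) = (1/N^2) * tr(K_X C_N K_Y C_N)."""
    N = K_X.shape[0]
    C_N = np.eye(N) - np.ones((N, N)) / N   # centering matrix
    return np.trace(K_X @ C_N @ K_Y @ C_N) / N**2

# Example with Gaussian kernels on toy data (dependent samples give a larger HSIC).
rng = np.random.default_rng(0)
x = rng.normal(size=(100, 1))
y = x + 0.1 * rng.normal(size=(100, 1))     # strongly dependent on x
gauss = lambda A: np.exp(-((A - A.T) ** 2) / 2.0)
print(empirical_hsic(gauss(x), gauss(y)))
```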

HRCD and HSIC Subspace Learning
The algorithm can be divided into four steps. First, we model the labeled training samples with the proposed HRCD, so that each training sample is described by an SPD matrix. Second, we embed the SPD matrices into a high-dimensional RKHS $\mathcal{H}$ with a defined kernel function and further project the elements of the RKHS into a vector space $\mathcal{H}'$. The mapping $f: \mathcal{H} \rightarrow \mathcal{H}'$ is obtained by solving an optimization problem.
Third, we use the explicit mapping to project the training and test samples onto the low-dimensional and relatively discriminative space. Finally, the classification task is realized by executing a classifier on $\mathcal{H}'$. An overall illustration of the algorithm is shown in Figure 2.
Given a set of training samples belonging to $c$ classes, $\chi = \{X_1, X_2, \ldots, X_N\} \subseteq \mathcal{M}$, where each $X_i \in \mathbb{R}^{d \times d}$ is an SPD matrix, let $l = \{l_1, l_2, \ldots, l_N\}$ denote the corresponding labels. The representation of $\chi$ in the low-dimensional vector space $\mathcal{H}'$ is denoted as $Y = [y_1, y_2, \ldots, y_N]$, $y_i \in \mathbb{R}^m$. In the kernel analysis framework, the low-dimensional representation $y_i$ of $X_i$ is obtained by the mapping $y_i = W^T k_i$, where $k_i = [k(X_1, X_i), \ldots, k(X_N, X_i)]^T$ is the $i$-th column of the kernel matrix $K_\chi$ and $W \in \mathbb{R}^{N \times m}$ is the projection matrix to be learned.

Hybrid Region Covariance Descriptor.
As mentioned previously, we propose incorporating the feature mean information into the RCD to improve the discriminative power of the descriptor. We refer to the resulting descriptor as the HRCD. Given an image region $R$, we extract multiple features of each pixel in $R$ and then compute the mean vector and covariance matrix of the features. Suppose that the feature vector of the $k$-th pixel is $z_k$; the mean vector $\mu \in \mathbb{R}^d$ and covariance matrix $\Sigma \in \mathbb{R}^{d \times d}$ are then computed as
$$\mu = \frac{1}{n}\sum_{k=1}^{n} z_k, \qquad \Sigma = \frac{1}{n-1}\sum_{k=1}^{n}\left(z_k - \mu\right)\left(z_k - \mu\right)^T.$$
Following information geometry theory [54], we combine the mean and the covariance matrix into a new matrix without additional computational complexity. The new matrix is constructed as
$$X = \left|\Sigma\right|^{-\frac{1}{d+1}} \begin{bmatrix} \Sigma + \mu\mu^T & \mu \\ \mu^T & 1 \end{bmatrix}.$$
Here, $d$ is the dimensionality of the feature vector, and $|\cdot|$ is the determinant operator. The $(d + 1) \times (d + 1)$ SPD matrix $X$ is the HRCD of the image. As a result of its inheritance from the covariance matrix, the HRCD is not only effective, robust, and low-dimensional but also more discriminative than the RCD.
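The following Python sketch builds the HRCD from an n × d per-pixel feature matrix. It assumes the information-geometry embedding written above (with the determinant normalization) is the intended construction; the function name and any numerical safeguards are ours, not the authors'.

```python
import numpy as np

def hrcd(Z):
    """Hybrid region covariance descriptor from an n x d per-pixel feature matrix Z.

    Combines the feature mean and covariance into a (d+1) x (d+1) SPD matrix via
    the information-geometry embedding sketched above (a reconstruction from the
    text, not the authors' verified code).
    """
    d = Z.shape[1]
    mu = Z.mean(axis=0)                      # feature mean, shape (d,)
    sigma = np.cov(Z, rowvar=False)          # d x d covariance
    top = np.hstack([sigma + np.outer(mu, mu), mu[:, None]])
    bottom = np.append(mu, 1.0)[None, :]
    X = np.vstack([top, bottom])             # (d+1) x (d+1) block matrix
    return np.linalg.det(sigma) ** (-1.0 / (d + 1)) * X
```

When $\Sigma$ is near-singular, a small ridge (e.g., `sigma += 1e-6 * np.eye(d)`) keeps the determinant positive and the descriptor well defined.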

Kernel Function in HSIC-SL.
To define a valid RKHS, the kernel must be symmetric positive definite. Most discussions of the symmetric positive definiteness of kernel functions are based on vector spaces. In this section, we introduce two typical kernel functions on SPD Riemannian manifolds.

Log-Linear Kernel.
The polynomial kernel is one of the most commonly used kernel functions in Euclidean spaces. The polynomial kernel function in a vector space is defined as
$$k\left(x_i, x_j\right) = \left(\alpha\, x_i^T x_j + \beta\right)^c,$$
where $x_i, x_j \in \mathbb{R}^n$. If $\alpha = c = 1$ and $\beta = 0$, this reduces to the linear kernel $k(x_i, x_j) = x_i^T x_j$.
When the linear kernel is extended to SPD Riemannian manifolds, it must be redefined appropriately. The linear kernel on SPD Riemannian manifolds can be defined as
$$k_{LL}\left(X_i, X_j\right) = \operatorname{tr}\left(\log\left(X_i\right)\log\left(X_j\right)\right) = \left\langle \log\left(X_i\right), \log\left(X_j\right)\right\rangle_F.$$
We refer to this kernel as the log-linear kernel.

Log-Gaussian Kernel.
The Gaussian kernel is another popular kernel function in Euclidean spaces. The Gaussian kernel function is defined as
$$k\left(x_i, x_j\right) = \exp\left(-\frac{d^2\left(x_i, x_j\right)}{2\sigma^2}\right),$$
where $d(x_i, x_j) = \|x_i - x_j\|$. A good effect can be achieved by replacing the Euclidean distance with the log-Euclidean distance. The log-Gaussian kernel is defined by
$$k_{LG}\left(X_i, X_j\right) = \exp\left(-\frac{d_{LE}^2\left(X_i, X_j\right)}{2\sigma^2}\right),$$
where $d_{LE}(X_i, X_j) = \|\log(X_i) - \log(X_j)\|_F$ is the log-Euclidean distance between $X_i$ and $X_j$. The positive definiteness of $k_{LG}$ was proved in [39]. The bandwidth $\sigma$ is an important parameter of the Gaussian kernel. To keep the log-Gaussian kernel sensitive to the distances in the data, we suggest setting $\sigma$ to the average distance between the training samples.
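Both kernels can be computed directly from matrix logarithms. The Python sketch below (NumPy/SciPy; the function names are ours, and the Gaussian form assumes the $\exp(-d^2/(2\sigma^2))$ parameterization written above) illustrates the two kernels together with the average-distance heuristic for $\sigma$.

```python
import numpy as np
from scipy.linalg import logm

def log_linear_kernel(X_list):
    """Log-linear kernel matrix: K[i, j] = <log(X_i), log(X_j)>_F."""
    logs = np.stack([logm(X).real.ravel() for X in X_list])   # vectorized log-matrices
    return logs @ logs.T

def log_gaussian_kernel(X_list, sigma=None):
    """Log-Gaussian kernel: K[i, j] = exp(-||log(X_i) - log(X_j)||_F^2 / (2 sigma^2))."""
    logs = np.stack([logm(X).real.ravel() for X in X_list])
    sq = np.sum(logs**2, axis=1)
    D2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * logs @ logs.T, 0.0)
    if sigma is None:
        sigma = np.mean(np.sqrt(D2))          # rough average-distance heuristic
    return np.exp(-D2 / (2.0 * sigma**2))
```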

HSIC Subspace Learning.
After embedding the matrices into the RKHS, we further project the points into a vector space through an explicit mapping. We aim to find the explicit mapping from the RKHS to the vector space by maximizing the HSIC between the SPD matrices and the low-dimensional representation while preserving the local information. The proposed HSIC-SL therefore combines global HSIC maximization with within-class information preservation.
We denote the HSIC between $\chi$ and the low-dimensional representation $Y$ as $\mathrm{HSIC}(\chi, Y)$. According to the empirical HSIC estimate derived above, $\mathrm{HSIC}(\chi, Y)$ can be computed as
$$\mathrm{HSIC}(\chi, Y) = \frac{1}{N^2}\operatorname{tr}\left(K_\chi C_N K_Y C_N\right),$$
where the input data $\chi$ and the projection $Y$ are represented by the kernel matrices $K_\chi$ and $K_Y$, respectively. To realize the low-dimensional representation explicitly, we define the kernel function on $Y$ as the linear kernel $k_Y(y_i, y_j) = y_i^T y_j$, so that the kernel matrix of $k_Y$ is
$$K_Y = Y^T Y = K_\chi^T W W^T K_\chi.$$
Substituting $K_Y$ into the HSIC expression yields
$$\mathrm{HSIC}(\chi, Y) = \frac{1}{N^2}\operatorname{tr}\left(W^T K_\chi C_N K_\chi C_N K_\chi^T W\right).$$
As $N$ does not depend on $Y$, the coefficient $1/N^2$ can be omitted. We then maximize
$$\operatorname{tr}\left(W^T L_H W\right), \quad \text{where } L_H = K_\chi C_N K_\chi C_N K_\chi^T.$$
The within-class information is represented by the within-class scatter $S_W$, which is defined as
$$S_W = \sum_{i=1}^{c} \sum_{l_j = i} \left(y_j - m_i\right)\left(y_j - m_i\right)^T,$$
where $N_i$ is the number of training samples of the $i$-th class and $m_i = \frac{1}{N_i}\sum_{l_j = i} y_j$ is the mean of the low-dimensional representations of the $i$-th class. According to the relationship between $y_i$ and $X_i$ (i.e., $y_j = W^T k_j$), the trace of $S_W$ can be written as
$$\operatorname{tr}\left(S_W\right) = \operatorname{tr}\left(W^T L_W W\right), \quad \text{where } L_W = \sum_{i=1}^{c} \sum_{l_j = i} \left(k_j - \bar{k}_i\right)\left(k_j - \bar{k}_i\right)^T, \quad \bar{k}_i = \frac{1}{N_i}\sum_{l_j = i} k_j.$$
In sum, the objective function is formulated as
$$\max_{W} \; \frac{\operatorname{tr}\left(W^T L_H W\right)}{\operatorname{tr}\left(W^T L_W W\right)}.$$
The Rayleigh quotient maximization problem is commonly used in optimization because of its fast and simple calculation, and the above problem can be solved in this way. To tackle singularity, we add a small perturbation $\varepsilon$ to the diagonal elements of $L_W$. The optimal projection matrix $W$ is composed of the eigenvectors corresponding to the $m$ largest eigenvalues of $\left(L_W + \varepsilon I_N\right)^{-1} L_H$, where $I_N$ is the $N \times N$ identity matrix.
Hence, for a given test image, we first compute its HRCD and denote it as $X_t$. The projection is obtained by $y_t = W^T k_t$, where $k_t = [k(X_1, X_t), \ldots, k(X_N, X_t)]^T$. The class of the test image is then predicted with the nearest neighbor classifier.
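The whole pipeline can be summarized in a short sketch. The Python code below is our own reading of the derivation above, not the authors' released implementation; it reuses the kernel and HRCD helpers sketched earlier, learns the projection matrix W from a training kernel matrix, and projects kernel columns, leaving 1NN classification to a final comment.

```python
import numpy as np

def hsic_sl_fit(K, labels, m, eps=1e-3):
    """Learn the N x m projection W from an N x N training kernel matrix K."""
    labels = np.asarray(labels)
    N = K.shape[0]
    C_N = np.eye(N) - np.ones((N, N)) / N
    L_H = K @ C_N @ K @ C_N @ K.T                     # global HSIC term
    L_W = np.zeros((N, N))
    for c in np.unique(labels):
        Kc = K[:, labels == c]                        # kernel columns of class c
        diff = Kc - Kc.mean(axis=1, keepdims=True)    # center within the class
        L_W += diff @ diff.T                          # within-class scatter term
    eigval, eigvec = np.linalg.eig(np.linalg.solve(L_W + eps * np.eye(N), L_H))
    order = np.argsort(-eigval.real)[:m]              # m largest eigenvalues
    return eigvec[:, order].real                      # W: N x m

def hsic_sl_project(W, k_cols):
    """Project kernel columns (N x n_samples) into the learned subspace."""
    return W.T @ k_cols                               # m x n_samples

# 1NN classification of a test column k_t against projected training data Y_train:
# y_t = hsic_sl_project(W, k_t)
# pred = labels[np.argmin(np.linalg.norm(Y_train - y_t, axis=0))]
```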

Experiment
The performance of the HRCD and the proposed algorithm is verified in this section. We considered five widely studied image datasets: the COIL-20 (Columbia Object Image Library) dataset [55], ETH-80 dataset [56], Queen Mary University of London (QMUL) dataset [57], FERET face dataset [58], and Brodatz dataset [59]. All of the compared methods were implemented in MATLAB R2014 and tested on an Intel(R) Core(TM) i5-4670K (3.40 GHz) machine.

Performance of HRCD.
To verify that the HRCD is an effective image descriptor, we directly used the KNN classifier on the image feature space represented by the HRCD and RCD without feature extraction. Adopting the Euclidean metric, LERM, AIRM, and Burg divergence as the measurements, classification experiments were performed on COIL-20 and ETH-80. The COIL-20 dataset contains 20 objects, each of which has 72 images of size 128 × 128 captured from different viewing directions. Figure 3 shows sample pictures. Features including the grey values and the first- and second-order gradients were extracted to calculate the RCD and HRCD of an image. Hence, the RCD and HRCD of an image were a 5 × 5 SPD matrix and a 6 × 6 SPD matrix, respectively. The images were randomly split into a training set and a test set, with 10 pictures assigned to the training set and the remaining images assigned to the test set.
ETH-80 is an image set containing eight types of objects, such as apples, pears, cars, and dogs. Each object has 10 instances, and each instance contains images from 41 different viewpoints. The images in ETH-80 were resized to 128 × 128 (Figure 4). For the RCD and HRCD representations, we extracted the following features:
$$F(x, y) = \left[x, y, R_{x,y}, G_{x,y}, B_{x,y}, I_{x,y}, \left|I_x\right|, \left|I_y\right|, \left|I_{xx}\right|, \left|I_{yy}\right|\right],$$
where $R_{x,y}$, $G_{x,y}$, and $B_{x,y}$ are the RGB color values of the pixel at position $(x, y)$, $I_{x,y}$ is the greyscale value, and $|I_x|$, $|I_y|$, $|I_{xx}|$, $|I_{yy}|$ are the first- and second-order gradients of the intensity. The RCD and HRCD of an image were thus a 10 × 10 SPD matrix and an 11 × 11 SPD matrix, respectively. Half of the instances of every object were used for training, and the remaining instances were used for testing. Each instance in the training and test sets comprised 100 random samples; therefore, the training and test sets each contained 800 images. Table 1 lists the classification accuracies and runtimes under different metrics. To eliminate the randomness of the experiment, we report the average accuracy and runtime over 20 tests.

Experiments on QMUL Dataset.
The QMUL dataset [44] is a set of images of human heads collected from airport terminal cameras. The dataset is composed of 20,005 images and is divided into five classes according to the direction of the head: back, front, left, right, and background. Samples from QMUL are shown in Figure 5. The dataset was divided into training and test sets in advance; Table 2 shows the number of training and test images in every class. The feature vector extracted for each pixel is
$$F(x, y) = \left[I_L(x, y), I_a(x, y), I_b(x, y), \left|I_x\right|, \left|I_y\right|, G_1(x, y), \ldots, G_8(x, y)\right],$$
where $I_L(x, y)$, $I_a(x, y)$, and $I_b(x, y)$ are the three channel values of the CIELAB color space, $I_x$ and $I_y$ are the first-order gradients of $I_L(x, y)$ in the $x$- and $y$-directions, respectively, and $G_i(x, y)$, $i = 1, \ldots, 8$, are the responses of eight difference-of-Gaussians filters. We thus obtained a 13 × 13 SPD matrix for the RCD and a 14 × 14 SPD matrix for the HRCD. The training data consisted of 200 randomly selected samples for each category, and the test set consisted of 100 randomly selected samples. The KNN (k = 12) search was used to construct the neighborhood graphs in RLPP and Geometry-DR. The parameters (σ) in the kernels of KPCA, RLPP, KSLR, and HSIC-SL were set to the average distances. The parameter c in KSLR was set to 0.3. The parameter ε in the proposed method was set to 0.001. We evaluated the performance of CDL, RLPP, KSLR, Geometry-DR, and HSIC-SL over various dimensions and report the maximum performance. In logEuc-SC, RSR, TSC, Riem-DLSC, and KLRM-DL, 50 dictionary atoms and the kernel parameters were learned from the training set. The kernel function in RSR and the basic kernel in KLRM-DL were the Stein kernel. The parameter α was set to 0.1, and the number of data samples was set to 30. The 1NN classifier was adopted in all the algorithms.
In Table 3, we show the recognition accuracy of HSIC-SL and the other existing algorithms. To eliminate the randomness of the experiment, we used the average recognition rate over 20 tests. HSIC-SL (log-Gaussian) + HRCD and HSIC-SL (log-linear) + HRCD achieved impressive performance, with HSIC-SL (log-Gaussian) + HRCD obtaining the highest classification accuracy. Moreover, the accuracy of the HRCD was greater than that of the RCD in this experiment. These results indicate that the HRCD is better than the RCD. Furthermore, HSIC-SL + HRCD was better than the other algorithms.

Experiments on FERET Dataset.
To conduct the face recognition experiment, we used the "b" subset of the FERET dataset [56], which consists of 2,000 face images of 200 people (71 females and 129 males of diverse ethnicities and ages). The images were cropped and downsampled to 64 × 64. The training set was composed of the images labeled "ba," "bc," "bh," and "bk." The images marked "bd," "be," "bf," and "bg" constituted the test set. The feature vector for computing the RCD and HRCD is
$$F(x, y) = \left[x, y, I(x, y), \left|G_{0,0}(x, y)\right|, \ldots, \left|G_{4,7}(x, y)\right|\right],$$
where $x$ and $y$ denote the position, $I(x, y)$ is the intensity, and $G_{u,v}(x, y)$ is the response value of the Gabor filter with direction $u$ and scale $v$. The direction $u$ of the Gabor filter ranged from 0 to 4, and the scale $v$ ranged from 0 to 7. Thus, the RCD and HRCD of each image were a 43 × 43 SPD matrix and a 44 × 44 SPD matrix, respectively. The neighborhood graphs constructed in RLPP and Geometry-DR used KNN (k = 3). The kernel functions based on the Jeffrey and Stein divergences were adopted in RSR and are denoted as RSR-J and RSR-S, respectively, for brevity. In RSR, TSC, Riem-DLSC, KLRM-DL, and logEuc-SC, all training samples were regarded as dictionary atoms. The settings of the other parameters were the same as those for the QMUL dataset. Table 4 shows the recognition rates of the compared algorithms. The proposed method was not the best algorithm for the FERET dataset; it achieved the highest recognition accuracy only in the "bd" test scenario. Nevertheless, the average recognition accuracies of HSIC-SL were still better than those of most other algorithms and only slightly worse than those of KLRM-DL. Hence, HSIC-SL is still a feasible algorithm for the FERET dataset. We also noticed that HSIC-SL (log-Gaussian) performed better than HSIC-SL (log-linear); therefore, the log-Gaussian kernel is more suitable than the log-linear kernel for this dataset.

Experiments on Brodatz Dataset.
We performed two texture classification experiments on the Brodatz dataset [57]. Examples from the Brodatz dataset are shown in Figure 6. The first experiment was a grouping experiment with selected textures, and the other was a classification experiment on all texture images.
In the first experiment, we followed the test setup designed in [35] and selected three of the test schemes: one of the 5-texture groups, one of the 10-texture groups, and one of the 16-texture groups. The number of classes selected in each test scheme is shown in Table 5. Each image was resized to 256 × 256, and 64 regions measuring 32 × 32 were then extracted. The covariance matrices were computed from a five-dimensional feature vector comprising the intensity and the first- and second-order gradients. In each test scheme, eight samples from each image were randomly selected as the training data, and the remaining samples were used for testing. Geometry-DR was not suitable for this dataset because of the low dimension of the SPD matrices in this experiment. The results shown in Figure 7 are the averages over 20 tests.
HSIC-SL achieved the highest classification result on all test schemes, except for the 5-texture test, in which the recognition rates of most of the algorithms were relatively close. In this dataset, HSIC-SL (log-linear) performed better than HSIC-SL (log-Gaussian).
In the second experiment, 20 random samples were chosen as the training set, and 10 random samples were chosen as the test set from all the texture images. The average results over 20 tests are presented in Table 6. HSIC-SL (log-linear) and HSIC-SL (log-Gaussian) outperformed the other methods, with the latter being marginally better than the former when all texture pictures were classified. In addition, the accuracy of HSIC-SL modeled on the HRCD was much higher than that of HSIC-SL modeled on the RCD, which verifies the discriminative ability of the HRCD once again.

Experiments on COIL-20 and ETH-80 Datasets.
In this experiment, we used the COIL-20 dataset [55] and the ETH-80 dataset for the object categorization task. The experimental procedure was the same as that described in Section 5.1. We compared the proposed method with KPCA [39], RLPP [38], KSLR [32], and CDL [4]. In addition, KPCA and RLPP were conducted on the HRCD and are denoted as KPCA + HRCD and RLPP + HRCD, respectively. The classifier adopted in all of the algorithms was the 1NN classifier. Table 7 shows the classification accuracies of the methods on COIL-20 and ETH-80. First, HSIC-SL obtained the best accuracy on both datasets, which indicates that introducing the HSIC improves the effectiveness of the recognition algorithm. Second, the classification accuracies of RLPP, KPCA, and HSIC-SL on the RCD were lower than those on the HRCD (i.e., RLPP + HRCD, KPCA + HRCD, and HSIC-SL + HRCD), which proves once again that the HRCD has advantages over the RCD. Finally, the effectiveness of the log-linear kernel and log-Gaussian kernel in HSIC-SL was demonstrated in the experiments.

Analysis of Dimensionality.
The parameter m is the dimensionality of the vector space after feature extraction. The curves of the classification accuracies of the compared algorithms on COIL-20 [55], ETH-80 [37], and Brodatz versus m are shown in Figures 8 and 9. The experimental setups were the same as those described in the previous sections.
As the dimensionality increased, the recognition accuracy curves showed an upward trend. Once the accuracy reached a certain value, the recognition rate remained essentially stable over a range of subspace dimensions.

Discussion.
In the above experiments, the performance of the RCD and HRCD and the effectiveness of HSIC-SL and the other algorithms were compared. The following observations were made: (1) The classification accuracy in the image feature space represented by the HRCD was better than that represented by the RCD regardless of which classifier was used (i.e., the KNN classifier without feature extraction or the proposed HSIC-SL). This result shows that the proposed HRCD descriptor outperforms the RCD.
(2) When the RCD was used as the image descriptor, the HSIC-SL method was superior to most of the compared methods, except on the FERET and Brodatz datasets. On FERET, the performance of Riem-DLSC, MKSSCR, and KLRM-DL was slightly better than that of HSIC-SL + RCD (log-Gaussian kernel). On Brodatz, the performance of HSIC-SL + RCD was slightly worse than that of the other methods in the 5-texture, 10-texture, and 16-texture groups. Nevertheless, the recognition accuracy of HSIC-SL + RCD in the experiment on all texture images was higher than that of the other methods. These results show that HSIC-SL is indeed an excellent algorithm on SPD manifolds but is weaker in the classification of datasets with subtle features, such as face recognition and texture recognition. The HRCD makes up for this defect to a certain extent: the performance of HSIC-SL + HRCD was superior to that of almost all the compared methods, although on the FERET dataset the average recognition accuracy of HSIC-SL remained lower than that of KLRM-DL. (3) In the experiments, we also compared the performance of the log-Gaussian kernel and the log-linear kernel. In general, the log-Gaussian kernel was better than the log-linear kernel; however, in the experiments on QMUL and Brodatz, the log-linear kernel obtained better results. This difference indicates that the choice of kernel affects the performance of HSIC-SL, and the performance can be improved by selecting a suitable kernel function.

Conclusions
In this work, we propose an improved covariance descriptor called the HRCD, which represents images with SPD matrices. The HRCD inherits the advantages of the RCD and is more effective.
To address the classification problem on SPD Riemannian manifolds, we propose an efficient image classification method based on a kernel framework, which we call HSIC-SL. Through the definition of the log-linear kernel and log-Gaussian kernel, the input images represented by SPD matrices can be embedded into an RKHS. To seek an explicit mapping from the RKHS to a vector space, HSIC-SL constructs its objective function on the basis of subspace learning and HSIC maximization. HSIC-SL outperforms other representative methods in most cases without increasing computational complexity. The proposed algorithm also has certain limitations. Its average classification accuracy is slightly worse than that of KLRM-DL on the FERET dataset; hence, the covariance descriptor is not strong enough to handle the classification of small details, such as in face recognition. In future work, we will employ other effective features to form the covariance matrices and explore other useful kernel functions to suit different types of datasets.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that they have no conflicts of interest.