Scientific African

Kernel-based Sparse Representation (SR) has improved classification performance in image recognition and has mitigated the problems caused by the non-linear distribution of face images and its implementation with Dictionary Learning (DL). However, the locality structure of image data, which contains discriminative information crucial for classification, has not been fully examined by current kernel sparse representation-based approaches. Furthermore, similar coding outcomes between test samples and neighbouring training data, constrained in the kernel space, are not fully realized from image features with similar image groupings, so the embedded discriminative information is not effectively captured. To handle these issues, we propose a novel DL method, Kernel Locality-Sensitive Discriminative SR (K-LSDSR), for face recognition. In the proposed K-LSDSR, a discriminative loss function for the groupings based on sparse coefficients is introduced into a locality-sensitive DL (LSDL). After solving for the optimized dictionary, the sparse coefficients for the test image feature samples are obtained, and the classification results for face recognition are realized by minimizing the error between the original and reassembled samples. Experimental results show that the proposed K-LSDSR significantly improves face recognition accuracy compared with competing methods and is robust across diverse environments in image recognition. © 2019 Published by Elsevier B.V. on behalf of African Institute of Mathematical Sciences / Next Einstein Initiative.


Introduction
SR is mostly applied to signal reconstruction [55], and experiments have also shown that it performs well in video analysis and image classification [41, 52]. The implementation of SR for face recognition has received a lot of attention in pattern recognition and computer vision over the past decades. For a specified signal to be thoroughly represented, a good dictionary must be learned from training samples, so the quality of the dictionary is crucial for efficient SR. The dictionary can be realized either by utilizing all the training samples as the dictionary to code the test samples, as in locality-constrained linear coding [38], or by learning a dictionary for SR from the training set (e.g. the K-means Singular Value Decomposition (K-SVD) algorithm, Fisher Discriminative Dictionary Learning (FDDL)). Methods that adopt the first strategy use the training samples themselves as the dictionary. Even though they perform well in classification, such a dictionary may not exemplify the samples well: noisy information may accompany the initial training samples, and the dictionary may not fully capture the discriminative information embedded in them. The latter grouping is also not ideal for recognition, since it only ensures that the dictionary best expresses the training samples under a strict SR. Although numerous approaches, including LSDL [27, 41, 43], incorporate a locality regularization constraint into the DL framework to guarantee that the learned over-complete dictionary (OCD) is representative enough to resolve the above issues, methods were still needed to handle noise, occlusion, aging, variations in resolution, different facial expressions, and illumination changes in unconstrained environments.
A further problem with traditional SR methods is that they cannot produce the same or similar coding results when the input features come from the same category. The elementary approach of building the dictionary from all the training samples may make the dictionary huge, which is unfavorable for the sparse solver [43]. Many techniques have been suggested, such as the Method of Optimal Directions (MOD) [11], which updates all the atoms simultaneously with fixed sparse codes using least squares and orthogonal matching pursuit for sparse coding, and the K-SVD algorithm [1], which learns a compact dictionary for SR from the training data. As discussed in [28], K-SVD focuses only on the representational power of the dictionary and not its discrimination ability; to resolve this, the authors recommended iteratively updating the K-SVD-trained dictionary based on the result of a linear classifier, yielding a dictionary that is effective for classification while retaining representational power. Further studies along the same direction include [24, 25], which used a more complicated objective function to optimize the dictionary during training and gain discriminative power. Other discriminative DL methods use discriminative and reconstructive error to learn the dictionary. A structured FDDL algorithm was proposed in [48] that improves pattern classification by relating the dictionary to the class labels, and the discriminability of both the representation coefficients and the residual was further exploited in [49]. A discriminative kernel SR-based classification approach was introduced in [26] that uses a discriminative dictionary and a residual for effective classification.
An over-complete discriminative DL approach based on K-SVD was also proposed in [54] for optimal classification, introducing a classification error term into the objective function. Recently, a discriminative structured DL with hierarchical group sparsity, which reduces the linear predictive error and improves the discriminability of sparse codes (SC), was introduced in [45]. A discriminative structured DL for image classification that combines the discriminative properties of the reconstruction error, characteristic error, and classification error terms in the objective function was proposed in [39]. In the same vein of improving the discriminability of dictionary learning algorithms, structure-adaptive dictionary learning for SR-based classification was proposed in [6], discriminative dictionary learning for group sparse representation was discussed in [2, 33], and discriminative DL for face recognition with low-rank regularization in [20]. However, despite being effectively applied in image classification, these methods do not take into consideration the locality structure of data. To take advantage of data locality, alongside Kernel Sparse Representation based Classification (KSRC) [50], Locality-constrained Linear Coding for Image Classification (LLC), built on the theories of [47], was proposed in [38]. LLC achieves a smaller reconstruction error through multiple-code reconstruction and local smoothness with respect to sparsity. More recently, sparse DL has been advanced for video semantic analysis, as proposed in [52]: a video semantic analysis based on Locality-sensitive Discriminant SR and Weighted KNN (LSDSR-WKNN). This approach gives better category discrimination in the SR of video semantic concepts as well as robust face recognition.
A kernelized locality-sensitive adaptor that utilizes group structure information in the training data was also discussed in [34]. Despite achieving improved group discrimination in the SR of video semantic models, the LSDSR-WKNN method still fails to fully explore the latent information of the given samples.
In recent years, several kernel approaches have been proposed to improve the effectiveness of face recognition algorithms. Multi-kernel learning with a locality constraint for face recognition, which considers the local structure of data, was discussed in [56]. In this approach, a projection matrix is learned in conjunction with kernel weights to generate a low-dimensional subspace by minimizing the within-class and maximizing the between-class reconstruction errors. A virtual kernel-based sparse dictionary for face recognition was proposed in [12] to improve the computational efficiency of kernel-based sparse representation approaches. A self-weighted kernel method for clustering and classification was proposed in [18] to aid graph-based clustering and semi-supervised classification. In [34], a robust face recognition approach based on kernelized group sparse representation was engineered; it uses group structure information in the training set and measures the local similarity between the training and test sets in the kernel space. Other approaches that exploit the nonlinear nature of face images and videos and capture local discriminative information were discussed in [2, 7, 8, 22, 46, 52]. Though these approaches yield good face image and video recognition results by exploiting embedded structure information, they cannot effectively and fully capture the discriminative information hidden in the structure of face images.
Based on the foregoing, this paper seeks to improve the discriminative power of SR features and further exploit the embedded non-linear information of similar image categories by encoding the SC of samples from the same category. We therefore propose a kernel-based locality-sensitive discriminative sparse representation for face recognition (K-LSDSR), which is more effective and has better numerical stability through the adoption of the L2-norm. The K-LSDSR algorithm adapts the dictionary by means of the changes between the reconstructed dictionary atoms and the training samples, with the kernelled locality adaptor mapped into a high-dimensional feature space. Moreover, implementing the kernel at both the DL and sparse coding stages, together with the locality adaptor, increases the efficiency of the algorithm with respect to the locality and similarity of the dictionary atoms. The proposed technique explores the discriminative information of the dictionary by introducing a discriminant loss function at the dictionary learning stage. This approach tends to obtain more discriminative information, thereby enhancing the classification ability of sparse representation features. The learned dictionaries can represent facial images more realistically and thus give a better representation of the samples through the associated sparse coefficients. The framework of the proposed K-LSDSR is depicted in Fig. 1.
The main contributions of this paper are as follows: (1) A discriminative loss function is utilized, giving an optimal dictionary for addressing SR issues and optimized sparse coefficients for reconstructing the dictionary, so as to enhance the discriminative classification of image features. This results in efficient reconstruction of the dictionary, particularly when it grows large, and enforces the capture of dependencies among image features belonging to the same category. (2) Incorporating the kernel into the objective function ensures that the structure and the non-linear information hidden in the test and training data are best utilized, yielding a more discriminative SR. It also enables efficient reconstruction of samples, especially when the sample size grows large, and thus higher recognition performance by exploring the nonlinear discriminative information embedded in image data. (3) A locality-sensitive discriminative dictionary learning and SR scheme based on kernel sparsity is designed specifically for face recognition, embedding the original image features. The features are sparsely encoded for better preservation of the dictionary, yielding an improvement in the accuracy of face recognition.
The remainder of the paper is organized as follows: Section 2 reviews related works, Section 3 discusses the proposed algorithms, and experimental results are presented in Section 4. Finally, Section 5 outlines the main conclusions and recommendations.

Sparse representation algorithms
Generally, SR for signal reconstruction and the techniques involved, including but not limited to decoding, sampling, compression, and transmission, is one of the most essential ideas in signal processing. A sampled signal can be perfectly reconstructed from a sequence of samples only if the sampling rate exceeds twice the maximum frequency of the original signal [3]. SR utilizes an OCD to linearly reconstruct a data instance for a given dictionary. Assume each sample comes from the space $\mathbb{R}^d$, and every sample is integrated into a matrix $X \in \mathbb{R}^{d \times m}$. For a sample to be closely represented by a linear combination of the dictionary $D$, with the number of samples large compared with its dimension, we have $m > d$, and hence $D$ is an OCD. Generally, if $d < m$ and $y \in \mathbb{R}^d$, we define the linear system of equations as $y = Z\beta$.
The sparsest representation solution can be acquired by solving $y = Z\beta$ with the $l_0$-norm constraint [9]. The expression $y = Z\beta$ can thus be transformed into the optimization problem of Eq. (1):

$$\hat{\beta} = \arg\min_{\beta} \|\beta\|_0 \quad \text{s.t.} \quad y = Z\beta \tag{1}$$

Since the issue of noise cannot be entirely avoided, real data containing some small amount of noise, the formulation of Eq. (1) is transformed into Eq. (2):

$$y = Z\beta + s \tag{2}$$

where $s \in \mathbb{R}^d$ denotes the noise and is constrained as $\|s\|_2 \le \varepsilon$. With the introduction of noise, Eq. (2) can also be transformed into the optimization problem of Eq. (3):

$$\hat{\beta} = \arg\min_{\beta} \|\beta\|_0 \quad \text{s.t.} \quad \|y - Z\beta\|_2 \le \varepsilon \tag{3}$$

Additionally, a Lagrange multiplier can be adopted and applied to this constrained problem.
The sparse representation of the original sample is denoted as the sparsest linear combination of the over-complete dictionary, with the objective function as stated in formula (1). SR is extensively utilized in computational intelligence, machine learning, computer vision, pattern recognition, and many other fields. Implementing SR algorithms involves pursuing the sparsest linear combination of basis functions from an OCD as stated in formula (1), which is known to be NP-hard and numerically unstable [9]. In the quest to approximate a desired solution, some greedy algorithms have been proposed; however, their approximated solutions are suboptimal despite being easy and simple to implement [41]. This has led to the convex relaxation of formula (4), replacing the nonconvex $L_0$-norm with the convex $L_1$-norm as indicated in formula (5):

$$\hat{x} = \arg\min_{x} \|x\|_0 \quad \text{s.t.} \quad \|a - Zx\|_2 \le \varepsilon \tag{4}$$

$$\hat{x} = \arg\min_{x} \|x\|_1 \quad \text{s.t.} \quad \|a - Zx\|_2 \le \varepsilon \tag{5}$$

Since the $L_0$-norm minimization of Eq. (4) can adequately guarantee a distinctive SR solution of $x$, the SR solution can be obtained with the $L_1$-norm minimization problem of Eq. (5). In situations where $a$ may contain some dense noise, the following optimization problem, referred to as the Lasso, is used, with a regularization parameter introduced as discussed in the literature [35]:

$$\hat{x} = \arg\min_{x} \|a - Zx\|_2^2 + \lambda \|x\|_1 \tag{6}$$
where the sparsity of $x$ is controlled by the regularization parameter $\lambda$. In recent times, many approaches have adopted different techniques to solve the problem of Eq. (6), as discussed in the literature [40]. To mitigate over-fitting, $L_2$-norm regularization was introduced. However, the solutions of both the $l_0$-norm and $L_1$-norm minimizations are strictly sparse, while that of the $L_2$-norm is only limitedly sparse, which is believed to inhibit discriminability (Z. Zhang, Wang, Zhu, Liu, & Chen). In our approach, we adopt L2-normalization to make the method more stable.
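The contrast between the strictly sparse $L_1$ solution of Eq. (6) and the "limitedly sparse" $L_2$ (ridge) solution can be illustrated with a small numerical sketch. The ISTA solver, the random problem sizes, and all function names below are illustrative choices, not part of the paper's method:

```python
import numpy as np

def soft_threshold(v, t):
    # Proximal operator of the l1-norm: shrinks entries toward exact zero.
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def lasso_ista(Z, a, lam, n_iter=500):
    # Solve min_x ||a - Z x||_2^2 + lam ||x||_1 (the Lasso of Eq. (6))
    # by iterative shrinkage-thresholding, a standard generic solver.
    L = 2.0 * np.linalg.norm(Z, 2) ** 2   # Lipschitz constant of the gradient
    x = np.zeros(Z.shape[1])
    for _ in range(n_iter):
        grad = 2.0 * Z.T @ (Z @ x - a)
        x = soft_threshold(x - grad / L, lam / L)
    return x

rng = np.random.default_rng(0)
Z = rng.standard_normal((20, 50))         # over-complete: 50 atoms in 20 dims
x_true = np.zeros(50)
x_true[[3, 17, 41]] = [1.5, -2.0, 1.0]    # a 3-sparse ground truth
a = Z @ x_true

x_l1 = lasso_ista(Z, a, lam=0.1)
# Ridge (l2-regularized) solution for comparison: dense, not strictly sparse.
x_l2 = np.linalg.solve(Z.T @ Z + 0.1 * np.eye(50), Z.T @ a)
```

The $L_1$ solution contains exact zeros produced by soft-thresholding, while the ridge solution is dense over all 50 atoms, which is the discriminability trade-off discussed above.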

Kernel sparse representation
Mapping the features to a higher-dimensional space may be helpful for more efficient classification. Suppose $\phi(\cdot)$ is a mapping function from $\mathbb{R}^N$ to a higher-dimensional feature space. To avoid the explicit high-dimensional mapping procedure, Mercer kernels can be used. A Mercer kernel can be written as $K(x, y) = \langle \phi(x), \phi(y) \rangle$: the dot product of the two high-dimensional mappings $\phi(x)$ and $\phi(y)$ can be directly replaced by the kernel value $K(x, y)$. Common Mercer kernels include the linear kernel $K(x, y) = \langle x, y \rangle$, which is equivalent to no mapping; the Gaussian kernel $K(x, y) = \exp(-\|x - y\|^2 / c)$; the polynomial kernel $K(x, y) = (\langle x, y \rangle + c)^d$ ($c$ and $d$ are parameters); and the sigmoid kernel $K(x, y) = \tanh(a(x^T y) + r)$ ($a$ and $r$ are parameters) [10, 16, 21]. In any algorithm involving dot products, they can be replaced by a chosen kernel without explicit mapping to a higher-dimensional space. Multiple Kernel Learning for SR-based classification, taking advantage of nonlinear kernel SRC for its efficient representation in a high-dimensional feature space, was proposed in [32]. In this approach, the SC are first updated while the kernel collaborative coefficients are fixed, and vice versa; the two alternating training steps, learning the kernel weights and the sparse codes, are repeated until a stopping criterion is met. If the class of a training sample were predicted using the training matrix $Y$ itself, the sparse code would always be zero except for a single 1 corresponding to the training sample under consideration, which would not help in computing the optimal kernel. To avoid this, the associated column of $Y$ is set to 0 before determining the sparse code:

$$\tilde{Y}_i = [y_1, \ldots, y_{i-1}, 0, y_{i+1}, \ldots, y_N]$$

where $0$ is a $d$-dimensional vector of zeros, $x_i$ is the sparse vector, and $X = [X_1, \ldots, X_N] \in \mathbb{R}^{N \times N}$ is the coefficient matrix. The optimal kernel $k$ is learned as a linear combination of $M$ base kernels, $k = \sum_{m=1}^{M} \eta_m k_m$, where $\eta_m$ is the weight of the $m$th base kernel and $\sum_{m=1}^{M} \eta_m = 1$. Impressive results have been reported in [32]: learning the optimal kernel function $k$ yields high training classification accuracy while sidestepping the problem of over-fitting.
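The Mercer kernels listed above can be sketched directly as Gram-matrix functions; the function names are illustrative only:

```python
import numpy as np

def linear_kernel(X, Y):
    # K(x, y) = <x, y>, equivalent to no mapping.
    return X @ Y.T

def gaussian_kernel(X, Y, c=1.0):
    # K(x, y) = exp(-||x - y||^2 / c)
    sq = (np.sum(X**2, axis=1)[:, None]
          + np.sum(Y**2, axis=1)[None, :] - 2.0 * X @ Y.T)
    return np.exp(-np.maximum(sq, 0.0) / c)

def polynomial_kernel(X, Y, c=1.0, d=2):
    # K(x, y) = (<x, y> + c)^d, with parameters c and d.
    return (X @ Y.T + c) ** d

def sigmoid_kernel(X, Y, a=1.0, r=0.0):
    # K(x, y) = tanh(a x^T y + r); a Mercer kernel only for some (a, r).
    return np.tanh(a * (X @ Y.T) + r)
```

Each function returns the full kernel (Gram) matrix between the rows of `X` and `Y`, so any dot-product-based algorithm can substitute one of these without ever forming the high-dimensional mapping explicitly.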

Locality-sensitive sparse representation
Test samples in SR are usually represented by dictionary atoms that may not be in their neighbourhood, which makes SR ill-suited for preserving data locality and leads to unsuitable recognition rates. In the LSDL approach, the locality constraint is introduced into both the sparse coding and the DL stages, so that each test sample is well represented by its neighbouring dictionary atoms. The LSDL approach is formulated by the following optimization problem:

$$\min_{D, x_i} \sum_{i=1}^{n} \|a_i - D x_i\|_2^2 + \lambda \|p_i \odot x_i\|_2^2 \tag{9}$$

where $p_i \in \mathbb{R}^{K \times 1}$ is the locality adaptor, $\odot$ is the elementwise multiplication symbol, and $\lambda$ is the regularization parameter that weighs the locality constraint. Since the LSDL approach does not take class information into account, SR-centred classification can be boosted by introducing a discriminative loss function that utilizes class information, thereby advancing the accuracy of image data representation and video analysis. In image-feature-based sparse representation, image features from the same category may not achieve the same coding results; it is assumed, however, that image samples from the same category can be encoded with similar sparse coefficients for face recognition, so as to enhance the discriminative power of sparsity-based classification approaches. To address this challenge, a locality-sensitive discriminative sparse representation (LSDSR) algorithm was proposed in [53] based on Eq. (9). The approach incorporates a discriminative loss function based on sparse coefficients into the objective function of the LSDL technique, achieving an optimal dictionary with enhanced discrimination power. In this study we likewise design a discriminative loss function that uses class information to enhance SR-based classification.
It has been reported in [37] that LLC gives promising classification results when the resulting sparse coefficients are utilized as features for training and testing. Many locality-preserving techniques, such as those discussed in [23, 42], preserve the local structure of data in dictionary learning, yielding a closed-form solution during the learning process through the integration of a penalty term into the DL structure, and have successfully been implemented for locality DL. In [19], a dual-layer locality-preserving technique based on K-SVD was proposed for object recognition. Despite its success in improving the discriminability of the learned dictionary, it is challenged by dimensionality reduction issues and fails to fully exploit the discriminative information essential for classification. The discrimination ability of the learned dictionary is key to effective classification, the aim being to classify an input query sample correctly. This has motivated us to propose an efficient classification technique that adopts kernel learning and locality-sensitive sparse coding of the sparse coefficients to enhance discrimination power for video semantic analysis and the classification of video semantic content.

Proposed method
The proposed K-LSDSR algorithm is detailed in this section. We incorporate a kernel and a discriminative loss function centred on SR into the objective function of the locality-sensitive discriminative SR approach. The proposed approach realizes an optimal dictionary and further augments the discriminability of the SC. Since image features from a representation may not achieve the same coding results, we assume that samples from the same category are coded with related sparse coefficients in SR-based data recognition. This improves the discrimination capabilities of SR-based classifiers and captures hidden nonlinear information: data similarity is measured in the kernel feature space, which better explores the nonlinear relationships in the data. Assume the samples $A$, the dictionary $D$, and the locality-sensitive adaptor $P_i$ are mapped into a high-dimensional feature space by the function $\phi(\cdot)$, so that $K(A) = \phi(A) \in \mathbb{R}^{q \times n}$, $K(D) = \phi(D) \in \mathbb{R}^{q \times M}$, and $K(P_i) = \phi(P_i) \in \mathbb{R}^{K \times 1}$ can be substituted into Eq. (10). Our K-LSDSR framework is then reformulated as:

$$\min_{D, X} \|\phi(A) - \phi(D)X\|_F^2 + \lambda \sum_{i=1}^{n} \|P_i \odot x_i\|_2^2 \quad \text{s.t.} \quad 1^T x_i = 1, \ \forall i$$

Here the locality-sensitive adaptor $P_i$ utilizes the $l_2$-norm, its $j$th element being $P_{ij} = \|\phi(a_i) - \phi(d_j)\|_2$; the symbol $\odot$ represents elementwise multiplication; the shift-invariant constraint $1^T x_i = 1$ forces the coding result of $x$ to stay the same even if the origin of the data coordinates is shifted, as indicated in [51]; and the regularization parameter $\lambda$ balances the reconstruction error and the sparsity. The dictionary in the high-dimensional feature space can be written as $\phi(D) = \phi(A)V$ according to [31], where $V \in \mathbb{R}^{n \times m}$ is the representative matrix.
Considering that sample features from similar categories should have comparable SC, a discriminative loss function built on sparse coefficients is introduced to augment the discrimination capabilities of the input samples in SR. The loss minimizes the within-class scatter of the SC while simultaneously maximizing their between-class scatter, following the Fisher criterion. Samples from the same category are thereby compressed together, while samples from different categories are pushed apart. The discriminative loss function centred on the sparse coefficients is:

$$F(X) = \sum_{i=1}^{c} \sum_{x_k \in X_i} \|x_k - M_i\|_2^2 - \sum_{i=1}^{c} \|M_i - M\|_2^2 + \eta \|X\|_2^2 \tag{11}$$

where the first term is the within-class similarity term, $M_i$ is the mean vector of the representation coefficients $x_k$ belonging to class $i$, and $M$ is the mean of all coefficients. The term $\|x_i\|_2^2$ combined with $\|X\|_2^2$ makes Eq. (11) more stable, based on the theorem of [57].
With $\eta$ set to 1 for simplicity, Eq. (11) can be reformulated accordingly. The proposed method as stated in Eq. (13) is implemented by enforcing data locality in the high-dimensional feature space. The locality adaptor is utilized to determine the kernel distance between the test sample $\phi(A)$ and every column of $D$, where $\phi(A) = [\phi(a_1), \ldots, \phi(a_n)]$ represents all the training samples in the kernel feature space. The dissimilarity vector $P$ in Eq. (13) suppresses the corresponding weight and penalizes the distance between the test sample and every training sample in the kernel feature space. Furthermore, the resulting coefficients in our K-LSDSR formulation may not be fully sparse with regard to the $l_2$-norm, but can be regarded as sparse because the representation solutions have only a few significant values, with most being zero. The test samples and their neighbouring training samples in the kernel space are encoded when the problem of Eq. (12) is minimized, and the resulting coefficients $X$ remain sparse because as $P_{ij}$ gets large, $x_{ij}$ shrinks to zero. Most coefficients therefore go to zero, with only a few having significant values.
The proposed K-LSDSR approach integrates the kernel, data locality, and sparsity in obtaining the sparse coefficients, and hence is capable of learning discriminative sparse representation coefficients for classification. A kernelled exponential locality adaptor is implemented in our proposed method as in Eq. (14):

$$P_{ij} = \exp\!\left(\frac{d_k(a_i, a_j)}{\sigma}\right) \tag{14}$$

where $\sigma$ is a constant and $d_k(a_i, a_j)$ is the Euclidean distance induced by the kernel space $k$, defined as

$$d_k(a_i, a_j) = \sqrt{k(a_i, a_i) - 2k(a_i, a_j) + k(a_j, a_j)} \tag{15}$$

$P_{ij}$ increases exponentially with $d_k(a_i, a_j)$, yielding a large $P_{ij}$ when the samples $a_i$ and $a_j$ are far apart, and hence a smaller coefficient.
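The kernel-induced distance of Eq. (15) and the exponential locality adaptor of Eq. (14) can be sketched as follows; the Gaussian base kernel, the value of σ, and the toy atoms are illustrative assumptions:

```python
import numpy as np

def gauss_k(u, v, c=1.0):
    # Illustrative base kernel k(u, v) = exp(-||u - v||^2 / c).
    return np.exp(-np.sum((u - v) ** 2) / c)

def kernel_distance(u, v, k=gauss_k):
    # Eq. (15): d_k(u, v) = sqrt(k(u,u) - 2 k(u,v) + k(v,v)),
    # the Euclidean distance induced in the kernel feature space.
    return np.sqrt(max(k(u, u) - 2.0 * k(u, v) + k(v, v), 0.0))

def exp_locality_adaptor(a_i, D, sigma=1.0, k=gauss_k):
    # Eq. (14): P_ij = exp(d_k(a_i, d_j) / sigma).  Far-away atoms get
    # large penalties, shrinking their coefficients toward zero.
    return np.array([np.exp(kernel_distance(a_i, dj, k) / sigma) for dj in D.T])

a = np.array([1.0, 0.0])          # a sample
D = np.array([[1.0, 0.0, -1.0],   # three dictionary atoms (columns),
              [0.1, 1.0,  0.0]])  # ordered from nearest to farthest from a
P = exp_locality_adaptor(a, D)
```

Because the Gaussian kernel distance grows monotonically with the Euclidean distance, the penalties in `P` respect the same nearest-to-farthest ordering of the atoms.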

Optimization
The objective function in Eq. (13) is not convex as a whole, but it can be divided into two sub-problems: updating X with D fixed, and updating D with X fixed. To find the desired dictionary D and the sparse coefficients X, an alternating optimization is implemented iteratively, adopting the theories of [5, 13, 17, 48].

Updating X with V fixed, sparse coding stage
When the coefficient matrix X is updated with the representative matrix V fixed, the objective function of Eq. (13) reduces to the coding problem of Eq. (16). The X in Eq. (16) can be solved on a class basis, so the $x_i$ corresponding to class $i$ can be derived as in Eq. (17). Using the analysis in [14], Eq. (17) can be simplified and rewritten as Eq. (20). The convex problem in Eq. (20) can be solved by the iterative projection method [29], and each column of X is updated as $x_i$ ($i = 1, \ldots, k$).
Updating V with the sparse coefficient matrix X fixed, the dictionary update stage

During the dictionary update stage, the objective function in Eq. (13) reduces to Eq. (21). Taking the derivative of $F(V)$ with respect to $V$ and equating it to zero ($\nabla F(V) = 0$), the update is as stated in Eq. (22).
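The alternating scheme above, sparse coding with the dictionary fixed followed by a gradient-zeroing least-squares dictionary update with the codes fixed, can be sketched generically. This linear-space sketch omits the kernel, locality, and discriminative terms of Eq. (13), and every function name is an illustrative assumption:

```python
import numpy as np

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def sparse_coding_step(D, A, lam, n_iter=100):
    # Update X with D fixed: l1-regularized coding of all columns via ISTA.
    L = 2.0 * np.linalg.norm(D, 2) ** 2
    X = np.zeros((D.shape[1], A.shape[1]))
    for _ in range(n_iter):
        X = soft_threshold(X - 2.0 * D.T @ (D @ X - A) / L, lam / L)
    return X

def dictionary_update_step(A, X, eps=1e-8):
    # Update D with X fixed: closed form from setting the gradient of
    # ||A - D X||_F^2 to zero, followed by column normalization.
    D = A @ X.T @ np.linalg.inv(X @ X.T + eps * np.eye(X.shape[0]))
    return D / np.maximum(np.linalg.norm(D, axis=0), eps)

def learn_dictionary(A, n_atoms, lam=0.1, n_outer=10, seed=0):
    rng = np.random.default_rng(seed)
    D = rng.standard_normal((A.shape[0], n_atoms))
    D /= np.linalg.norm(D, axis=0)
    for _ in range(n_outer):          # alternate until a fixed iteration budget
        X = sparse_coding_step(D, A, lam)
        D = dictionary_update_step(A, X)
    X = sparse_coding_step(D, A, lam)  # final codes for the final dictionary
    return D, X

rng = np.random.default_rng(1)
A = rng.standard_normal((10, 40))      # 40 training samples in 10 dimensions
D, X = learn_dictionary(A, n_atoms=20)
```

Each outer pass can only decrease the reconstruction objective, mirroring the convergence argument for the alternating optimization in the text.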

Classification scheme
For a given test sample $y$, the representation coefficients on the dictionary $D$ are given by Eq. (23). The test sample can finally be classified after obtaining $V$: $y$ and $D$ in Eq. (23) are replaced with $\phi(y) \in \mathbb{R}^d$ and $\phi(A)V$ respectively, as stated in Eq. (24), where $\lambda$ is a scalar constant. Let $W = V^T k(A, A) V$, $Q = V^T k(y, A)$, and $S = k(y, y)$.
Let $\tilde{y} = \Lambda^{-1/2} U^T Q$ and $\tilde{D} = \Lambda^{1/2} U^T$, where the SVD of $W$ is $U \Lambda U^T$. Then Eq. (24) can be rewritten as Eq. (26). We obtain the sparse coding coefficient associated with the test sample $y$ by solving Eqs. (26) and (27), after which the residual $r_i(y) = \|y - D_i x_i\|_2^2$ is calculated, where $x_i$ is the sparse coding coefficient on the $i$th class; the test sample $y$ is finally classified into the class with the least residual (Algorithm 1).

Algorithm 1. K-LSDSR.
Input: Training samples X, an initial dictionary D, and parameters λ1 and λ2.
Output: The learned dictionary D.
(a) Initialize X and V: based on Eq. (28), the representative coefficient X and the representative matrix V can be initialized using the kernel KSVD algorithm [36].
(b) Optimize the solution by updating the sparse coding matrix X and the representation matrix V combined with Eq. (28): update X by fixing V and updating each $x_i$ ($i = 1, \ldots, k$) according to Eq. (20) on a class basis using the IPM [29].
(c) Fix X and update V using Eq. (22).
(d) Return to step (b) iteratively until convergence.
(e) Get the representative coefficient x of y based on Eq. (26) combined with Eq. (28).
(f) Identify y: classify y using D and X based on Eq. (27).
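The final classification rule, assigning $y$ to the class with the least residual $r_i(y) = \|y - D_i x_i\|_2^2$, can be sketched as below; the toy dictionary, atom labels, and coefficients are illustrative:

```python
import numpy as np

def classify_min_residual(y, D, atom_labels, x):
    # r_i(y) = ||y - D_i x_i||_2^2, using only the atoms (and coefficients)
    # belonging to class i; y is assigned to the class of smallest residual.
    classes = np.unique(atom_labels)
    residuals = [np.linalg.norm(y - D[:, atom_labels == c] @ x[atom_labels == c]) ** 2
                 for c in classes]
    return classes[int(np.argmin(residuals))], residuals

# Toy example: two atoms per class, y synthesized from class-1 atoms only.
D = np.array([[1.0, 0.9, 0.0, 0.1],
              [0.0, 0.1, 1.0, 0.9]])
atom_labels = np.array([0, 0, 1, 1])
x = np.array([0.0, 0.0, 0.7, 0.3])   # coefficients concentrated on class 1
y = D @ x
label, residuals = classify_min_residual(y, D, atom_labels, x)
```

Because the coefficients are concentrated on the class-1 atoms, the class-1 residual vanishes and `y` is assigned to class 1, matching the minimum-residual rule stated above.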
Eq. (28) indicates $J$ features for each sample, which can be fused by a combination of $J$ kernels, where sub-kernel $K_j$ corresponds to feature $j$. Hence Eqs. (20) and (26) are comparable to Eq. (28) for the proposed K-LSDSR algorithm.

Database selection and face recognition
This section presents experiments on natural images to demonstrate the performance of our proposed K-LSDSR method. We scrutinize and compare the performance of our DL algorithm with the most distinctive approaches, namely K-SVD [1], SRC [15], FDDL [48], Kernel Locality-Sensitive Group Sparse Representation (KLSGSR) [34], Kernel Sparse Representation based Classification (KSRC), Locality-Sensitive Discriminative Sparse Representation (LSDSR) [52], and LSDL [41]. Out of the many datasets available for object categorization and classification, four are used in our experimental evaluations: the ORL face dataset for face recognition, the Yale B face dataset for face recognition, the FERET face dataset for face recognition, and the AR face dataset for face recognition and gender discrimination. Despite the better recognition performance recently achieved by deep learning, the proposed approach does not adopt it because of its heavy hardware and large training sample requirements [4, 44]. The proposed approach considers two different locality adaptors (L2-norm and exponential) for face recognition.

Parameter selection
Several parameters were utilized by the proposed K-LSDSR and the classification error scheme: λ, the classification parameter; λ1, a positive weighting parameter for the locality-sensitive constraint; and λ2 for the discriminative constraint, with values varied depending on the dataset being used. The classification parameter λ had little or no effect on the experimental results and was therefore set to 0.01 in all experiments. The recognition rates of the proposed K-LSDSR approach (for the optimization phase) with varying values of λ1 and λ2 on the AR face dataset are depicted in Fig. 2. As Fig. 2 shows, the optimal recognition results were obtained when λ1 is 0.01 and λ2 is 0.6, with the λ1 and λ2 values fixed for each dataset by fivefold cross-validation. Throughout all experiments, Principal Component Analysis (PCA) was used to reduce feature dimensions before classification. The parameter values used for the various experiments on each dataset are shown in Table 1, which indicates the parameter selections for their respective datasets. Based on these values, experiments were conducted to validate the performance of our proposed algorithms, and the findings are detailed below.
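The parameter sweep described above can be sketched as a grid search with cross-validation on PCA-reduced features. The grids, the 1-NN stand-in scorer, and the synthetic two-class data below are illustrative assumptions, not the paper's actual pipeline:

```python
import numpy as np

def pca_reduce(X, n_components):
    # Project rows of X onto the top principal components, as done
    # before classification in the experiments.
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

def cv_score(X, y, lam1, lam2, n_folds=5):
    # Placeholder fivefold scorer: a real run would train K-LSDSR with
    # (lam1, lam2) per fold; here a 1-NN classifier stands in.
    idx = np.arange(len(y)); rng = np.random.default_rng(0); rng.shuffle(idx)
    accs = []
    for fold in np.array_split(idx, n_folds):
        mask = np.ones(len(y), bool); mask[fold] = False
        dist = ((X[~mask][:, None, :] - X[mask][None, :, :]) ** 2).sum(-1)
        accs.append(np.mean(y[mask][dist.argmin(1)] == y[~mask]))
    return float(np.mean(accs))

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(c, 0.3, size=(20, 10)) for c in (0.0, 1.0)])
y = np.repeat([0, 1], 20)
Xr = pca_reduce(X, n_components=4)

grid1, grid2 = [0.001, 0.01, 0.1], [0.2, 0.6, 1.0]
scores = {(l1, l2): cv_score(Xr, y, l1, l2) for l1 in grid1 for l2 in grid2}
best = max(scores, key=scores.get)
```

The pair maximizing the cross-validated accuracy would be fixed per dataset, mirroring how λ1 = 0.01 and λ2 = 0.6 were selected on the AR face dataset.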

Experimental results
Many datasets are used for object classification; four of them are considered in our experiments. The experimental outcomes are obtained by N-fold cross-validation, specifically 5-fold. Training samples are randomly selected from each class and the rest are used as testing samples, with the roles later interchanged, in all cases. The first dataset analysed is the ORL face repository, which contains four hundred (400) face images of forty (40) different subjects, each contributing ten (10) facial images [30]. Some of these images were taken at different times with varying facial expressions and details, under varying light intensity, and captured homogeneously against dark and grey backgrounds with the subjects in an upright frontal position. Every image was resized in Matlab to a 32 × 32 matrix to ensure sufficient data for estimating the Gaussian models and to keep the covariance matrices positive definite. The images were then manually corrupted by an unrelated block image placed at random locations. The recognition accuracies at the various occlusion levels are shown in Table 2. As Table 2 shows, our algorithm consistently achieves the best result as the occlusion level increases, compared with the other state-of-the-art approaches. However, FDDL achieves the best result (96.7%) in the absence of noise or corruption; it performs well owing to its loss function, which imposes the traits of the Fisher criterion on the coefficients, while our advantage under corruption suggests the effectiveness of low-rank approximation in the presence of noise. The method that came closest to the proposed K-LSDSR was KLSGSR, recording an average recognition accuracy of 70.9% at an occlusion rate of 50%, a good result owing to its use of group sparse representation.
It was followed closely by the LSDSR approach, which recorded 69.9% recognition accuracy.
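The random block corruption applied to the ORL images can be sketched as below. The use of uniform random noise as the "unrelated" block content, and the square block shape, are illustrative assumptions:

```python
import numpy as np

def occlude(img, occ_ratio, rng):
    """Corrupt a grayscale image with a block placed at a random location.

    occ_ratio is the fraction of the image area covered (e.g. 0.5 for 50%
    occlusion). Random noise stands in for the unrelated block image used
    in the experiments.
    """
    h, w = img.shape
    bh = max(1, int(round(h * np.sqrt(occ_ratio))))
    bw = max(1, int(round(w * np.sqrt(occ_ratio))))
    top = rng.integers(0, h - bh + 1)
    left = rng.integers(0, w - bw + 1)
    out = img.copy()
    out[top:top + bh, left:left + bw] = rng.integers(0, 256, size=(bh, bw))
    return out
```

Applying this at increasing `occ_ratio` values reproduces the occlusion levels compared in Table 2.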
One important concern of DL for SR is the size of the dictionary, since it can greatly affect the performance of the model. Dictionaries of varying sizes are therefore evaluated with the proposed K-LSDSR on the ORL face dataset, using random faces from cropped images. The training set per subject is varied so that dictionary sizes range from 20 to 200, with 16 to 32 images per subject. The recognition rates of the proposed K-LSDSR are compared with those of the other existing approaches at the different dictionary sizes, as plotted in Fig. 4. The figure shows that K-LSDSR achieves a recognition rate of over 98% and outperforms the other state-of-the-art approaches even when the dictionary is small.
The second dataset is the Yale B face database, which contains a total of 2432 frontal images of 38 subjects taken under varying light-intensity conditions. The images were normalized to a size of 32 × 32 and the database was randomly split into two halves: one set of 32 images per subject used as the dictionary and the remaining set used for testing. The recognition rates for KSRC, K-SVD, LSDL, FDDL, LSDSR, KLSGSR and K-LSDSR were similar at all dimensions, with a rate difference of less than 0.6% at the dimensions of 50, 80, 120, 200 and 300 respectively, with the proposed method ahead of the other methods at all levels. It was closely followed by KLSGSR and LSDSR, in that order. This was achieved under the complex illumination conditions of the face images.

The third dataset analysed is the AR face repository, which contains 1200 face images of 100 different subjects, each with 12 normalized facial images, of whom 50 male and 50 female subjects are chosen. The first 25 images of the females and 25 of the males were used for training and the rest for testing. Every image was resized in Matlab to a 32 × 32 matrix to guarantee adequate data for estimating the Gaussian models and to keep the covariance matrices positive definite. As Table 4 shows, K-LSDSR consistently achieves the best results in all cases when compared with the other competing methods at high dimensionality. KLSGSR and K-LSDSR both achieved recognition rates higher than the other competing methods, which indicates that the kernelized locality adaptor has a strong impact on facial classification. Table 4 reports the recognition accuracy against varying feature dimensions on the AR face dataset for the algorithms considered in this paper.
The results show that the recognition rate of K-LSDSR increases as the feature dimension is increased from 30 to 300. This suggests that kernel approaches work best as the dimensionality of the face images increases, at least on the AR database. The proposed K-LSDSR was followed closely by KLSGSR and LSDSR, in that order.
The fourth dataset is the FERET face database, which contains a total of 1400 frontal images of 200 subjects, each with 7 images of varying illumination and facial expression. The images, taken under varying light-intensity conditions, were normalized to a size of 32 × 32 and the database was randomly split into two sets: one containing 4 images per subject, used as the dictionary, and the remainder used for testing. The recognition rates for the Kernel Sparse Representation based Classification (KSRC) algorithm [50], K-SVD, LSDL, FDDL, LSDSR, KLSGSR and K-LSDSR were similar at all dimensions; the rate differences are shown in Table 5, which reports the experimental recognition accuracies against varying feature dimensions on the FERET dataset.

Gender classification
In this section, we choose 14 images per subject from the AR face dataset, comprising 50 male and 50 female subjects, of which 25 images each of the females and of the males were used for training and the remaining 25 images of each gender used for testing. The image dimensionality was reduced to 300, following the setting of [48]. The recognition results of the K-LSDSR approach and the competing approaches are listed in Table 6. With the dimensionality of each image reduced to 300, the training samples are coded with each dictionary and then classified based on the representation (residual) error: the query sample y is assigned to the class that gives the minimal residual r(y) = ||y − D_i x_i||₂². When the minimization results of K-LSDSR are compared with those of the other methods, our approach obtains the best minimization results, which validates that learning the dictionary with the kernel, a locality-sensitive adaptor and a loss function is more powerful on the same number of training samples. The results show that the K-LSDSR method achieved the best classification result of 98% when the test images are evaluated against the dictionary of each class, and it performs best when the test samples are evaluated against the entire dictionary. This outcome is due to the fact that only two categories are involved in gender classification, so each category has many training samples, and the learned dictionary of every class can therefore represent the test samples adequately.
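The residual-based decision rule above can be sketched directly. The sub-dictionaries and the sparse codes are assumed inputs here, standing in for the output of the K-LSDSR coding stage:

```python
import numpy as np

def classify_by_residual(y_vec, dicts, codes):
    """Assign y to the class whose sub-dictionary gives the smallest
    reconstruction residual r(y) = ||y - D_i x_i||_2^2.

    dicts[i] is the learned dictionary of class i (d x m_i) and codes[i]
    the sparse coefficients of y over that dictionary (length m_i); both
    are assumed to come from the sparse-coding stage.
    """
    residuals = [float(np.sum((y_vec - D @ x) ** 2)) for D, x in zip(dicts, codes)]
    return int(np.argmin(residuals)), residuals
```

For gender classification there are only two such sub-dictionaries (male, female), so the rule reduces to comparing two residuals.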
The proposed DL learns a more compact dictionary by adaptively assigning label information to the dictionary atoms. In classification-based dictionary learning, dictionary size is an important factor, and an effective rule is needed to set it. Usually, classification accuracy increases as the dictionary size increases. However, the dictionary should not be too small if the representations are to be adequate, particularly when a kernel is involved. The dictionary size can initially be set to the number of training samples when training samples are scarce, and should be made smaller than the number of training samples to lessen redundancy when sufficiently many training samples are available. Samples of male and female facial images are shown in Fig. 5.
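The sizing rule just described can be sketched as below. The threshold `enough` and the reduction factor `shrink` are illustrative assumptions, not values from the paper:

```python
def choose_dictionary_size(n_train_per_class, enough=200, shrink=0.5):
    """Sketch of the dictionary-sizing rule described above.

    enough (the sample count considered 'sufficient') and shrink (the
    reduction factor applied to lessen redundancy) are assumed values.
    """
    if n_train_per_class <= enough:
        # scarce data: every training sample becomes a dictionary atom
        return n_train_per_class
    # abundant data: use fewer atoms than samples to lessen redundancy
    return max(enough, int(n_train_per_class * shrink))
```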

Algorithmic complexity and computational time
The computational complexity of the proposed method is presented in this section. Although dimensionality affects computational cost, only the computation of the kernel matrix is slowed by high feature dimensionality, and the kernel matrix can be precomputed. Efficiency can be boosted further by adopting the KPCA algorithm. Thus, even as the feature dimension of the images increases, our algorithm maintains good computational speed. The algorithmic (computational) complexity of the proposed K-LSDSR and the comparative approaches is analysed as follows. The convex optimization problem to be solved is learning an over-complete dictionary for sparse representation. Each dictionary atom is first reconstructed from the remaining atoms, and the complexity of this reconstruction is O(M), where M denotes the size of the dictionary D in the sparse learning. If the dictionary D is assumed fixed at the sparse learning stage, the complexity of the sparse coefficient optimization stage is O(N² + N), where N denotes the size of the training set A. Lastly, if the matrix X is assumed fixed at the dictionary update stage, the complexity of solving for the learned dictionary D is O(M²). The overall complexity of the proposed K-LSDSR is therefore O(K·(M + N + N² + M²)), where K is the number of iterations. For LSDSR, the complexity is O(K·(N² + M² + N)). The complexity of LSDL is O(K·(M² + M + N)), because it has no discrimination function in its sparse coefficient representation.
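The kernel-matrix precomputation mentioned above can be sketched with a Gaussian kernel. The kernel choice and the bandwidth `gamma` are assumptions for illustration; the point is that once this matrix is built, the per-iteration cost no longer depends on the feature dimension:

```python
import numpy as np

def rbf_kernel_matrix(A, B, gamma=1.0):
    """Precompute the Gaussian kernel matrix K[i, j] = exp(-gamma * ||a_i - b_j||^2).

    A (n x d) and B (m x d) hold samples as rows. Every kernel evaluation in
    the sparse-coding iterations can then be read from K instead of being
    recomputed against the d-dimensional features.
    """
    sq = (np.sum(A**2, axis=1)[:, None]
          + np.sum(B**2, axis=1)[None, :]
          - 2.0 * A @ B.T)
    # clip tiny negative values introduced by floating-point cancellation
    return np.exp(-gamma * np.maximum(sq, 0.0))
```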
A key measure for evaluating dictionary learning algorithms is the testing time per sample. The mean testing time per test sample of the proposed K-LSDSR on the Extended Yale B face dataset is therefore compared with that of the other algorithms, including KSRC, K-SVD, LSDL and FDDL. Dictionary sizes ranging from 100 to 1064 are used, with each method tested on random faces from 10 splits of the dataset. The mean testing times are shown in Fig. 6, which plots testing time against dictionary size. The results show that the proposed K-LSDSR outperforms the other methods across the dictionary sizes.

Analysis of the proposed K-LSDSR approach
In the proposed K-LSDSR method, the kernelized locality-sensitive adaptor is employed to control the dictionary learning process by imposing the capture of local, discriminative, nonlinear structure in the face data. A discriminative loss function based on the sparse coefficients is also incorporated into the structure of the proposed K-LSDSR to enhance the discrimination ability of SR. In addition, the proposed algorithm yields an optimized dictionary that further enhances the discriminative classification of SR. We chose face images from the Extended Yale B dataset, with 32 training images per subject, each of 192 × 168 pixels. An experiment was performed to compare the discrimination ability of the proposed K-LSDSR, LSDL and KSRC, using the same training sets for all three algorithms to optimize the dictionaries and obtain the sparse coefficients of the testing samples. The test samples are then reconstructed with the optimized dictionaries and projected onto two-dimensional subspaces by PCA with the first 300 eigenvectors. The two-dimensional subspaces of KSRC, LSDL and the proposed K-LSDSR method are plotted via multidimensional scaling in the sample distribution graphs of Fig. 7, using eight (8) randomly chosen instances. For KSRC, there is no clear separation among the atoms after reconstruction. Although the integration of data locality and sparsity into the structure of LSDL results in a sufficient representation, a considerable dispersion still exists among some of the atoms learned from the training instances, as can be seen from the figure. The proposed K-LSDSR, however, produces much clearer boundaries with its optimized dictionary: samples from the same category are more compact, while samples from different categories are well separated.
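The PCA projection step in this analysis can be sketched with a generic SVD-based PCA (not the paper's exact implementation); the reconstructed test samples would form the rows of `X`:

```python
import numpy as np

def pca_project(X, n_components):
    """Project samples (rows of X) onto the top principal components.

    Mirrors the analysis step above, where reconstructed test samples are
    reduced (e.g. with the first 300 eigenvectors) before visualization.
    """
    Xc = X - X.mean(axis=0)
    # SVD of the centered data; rows of Vt are the principal directions
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T
```

A 2-D embedding of the projected samples (e.g. via multidimensional scaling, as in Fig. 7) can then reveal how compact and well separated the class clusters are.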
We can therefore conclude that the proposed K-LSDSR is expected to result in improved recognition performance, owing to the integration of the kernel, data locality and sparsity, together with the introduction of the loss function.

Discussions
The highest recognition rates in all the experimental results are shown in bold in Tables 2-5. The algorithms achieve higher recognition rates when the images are mapped into a high-dimensional feature space, since higher dimensionality carries more information and thereby enhances recognition performance. The recognition rate increases steadily with the feature dimension for all the approaches. In addition, the proposed method demonstrated superiority over the other methods even at low feature dimensions, as indicated by the experimental results on the different datasets, and this advantage persists into the high-dimensional feature space. Furthermore, the proposed K-LSDSR approach exhibits stronger classification and discrimination capabilities as a result of introducing the locality-sensitive adaptor into the structure of the proposed discriminative algorithm.

Conclusion and recommendations for future work
To handle a multi-featured approach to discrimination efficiently, a Kernel Locality-Sensitive Discriminative Sparse Representation (K-LSDSR) for face recognition is presented in this paper. The paper recognizes that image samples from the same category can be encoded with similar sparse coefficients during image recognition, enhancing the discrimination capability of SR. It thus takes data locality in the kernel space into account, giving better discriminability of SR and better use of the non-linear information hidden in the training and test data. The proposed K-LSDSR uses the kernel directly to map features into a high-dimensional space, enabling the fusion of multiple features and an enhanced data representation through its ability to preserve data locality. The experimental results show that incorporating a kernel into the structure of dictionary learning, coupled with a locality-sensitive regularization adaptor and a loss function, inherently improves classification performance compared with other dictionary learning approaches. Furthermore, the proposed method is computationally efficient as a result of the l2-norm implementation, which increases its stability; closed-form solutions are derived for the sparse coding and dictionary update stages. In future work, we propose to incorporate a weighted K-nearest-neighbour scheme into the structure of the Kernel Locality-Sensitive Discriminative Sparse Representation to enhance its classification power.

Declaration of Competing Interest
The authors declare they have no competing interests.