Incoherent Dictionary Pair Learning: Application to a Novel Open-Source Database of Chinese Numbers

We enhance the efficacy of an existing dictionary pair learning algorithm by adding a dictionary incoherence penalty term. After presenting an alternating minimization solution, we apply the proposed incoherent dictionary pair learning (InDPL) method to the classification of a novel open-source database of handwritten Chinese numbers. Benchmarking results confirm that the InDPL algorithm offers enhanced classification accuracy, especially when the number of training samples is limited.


I. INTRODUCTION
CHINESE numbers represent the wealth of China's history and culture. Certain numbers are considered auspicious. For example, the number 6 (六) (Pinyin: liù) is associated with six types of morality and can be used to express a wish for success. Likewise, the number 8 (八) (Pinyin: bā) is associated with luck because it sounds similar to the word 发 (Pinyin: fā), which means "to make a fortune, to be rich." In China, two indigenous number systems, namely simplified and traditional, are used to communicate numeral values. Examples of simplified Chinese numbers and their Hindu-Arabic counterparts are shown in Fig. 1. Traditional Chinese numerals, also known as banker's numerals, are used in commerce because of their robustness against forgery.
Handwritten character recognition is an established pattern recognition problem [1]-[5]. However, despite the wealth of literature on pattern recognition of Chinese characters, e.g., [6]-[8], surprisingly little work has been carried out on the classification of handwritten Chinese numbers [9]. This may be partly due to the lack of a user-friendly and compact database of Chinese numbers. Here, we present a new database of handwritten simplified Chinese numbers acquired from 100 Chinese nationals. In addition, to classify these numbers, we introduce a novel concurrent dictionary learning and classification algorithm.
Classic dictionary learning methods do not explicitly embed pattern discrimination within the dictionary construction procedure. Recently, the notion of class-specific dictionary design for classification has been proposed [10]-[15]. For instance, in discriminative K-SVD (D-KSVD) [16], the classification error was incorporated into the objective function. Li et al. [17] presented a reference-based objective function that was combined with the K-SVD algorithm for scene image categorization. Similarly, the label consistent K-SVD method associated label information with columns of the dictionary matrix during learning [18]-[20].
Atoms of a learned dictionary are typically desired to be incoherent [21]-[29]. Several techniques have been proposed to enhance incoherence in dictionary learning. For example, Mailhé et al. [21] and Abolghasemi et al. [24] added an incoherence penalty to the K-SVD dictionary learning algorithm [30]. In addition, a joint dictionary learning and projection scheme was developed for compressive sensing in [31]. However, existing works that address incoherence in dictionaries for classification tasks, e.g., [32], [34], remain limited and are not tailored to specific applications.
We therefore integrated an incoherence penalty term into the dictionary pair learning (DPL) [33] algorithm, aiming to minimize the similarity (measured by the inner product) between dictionary atoms associated with different classes. Upon deriving a solution by employing an alternating minimization strategy, we verified the efficacy of this approach in the classification of our novel dataset of Chinese numbers.

II. METHODS

A. Data Collection
One hundred Chinese students took part in the data collection. Each participant wrote, with a standard black-ink pen, all 15 numbers in a table with 15 designated regions drawn on white A4 paper. This process was repeated ten times by each participant. Each sheet was scanned at the resolution of 300 × 300 pixels.

B. Preprocessing
Subjects were instructed to write the numbers at the center of the designated region; however, deviations were inevitable. To avoid classification errors, we adopted a preprocessing procedure comprising three steps: 1) each image was scanned vertically and horizontally to determine the center and the bounding box of the number; 2) after centering, the background was removed; and 3) images were resized to 25 × 25 pixels, as depicted in Fig. 2.
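The three preprocessing steps above can be sketched as follows. The function name, the ink threshold, and the nearest-neighbour resizing are illustrative assumptions; the text does not specify the binarization or interpolation actually used.

```python
import numpy as np

def preprocess(img, out_size=25, thresh=0.5):
    """Locate the bounding box of the ink, remove the background, and
    resize to out_size x out_size (nearest neighbour, dependency-free)."""
    ink = img < thresh                        # dark pixels = ink on white paper
    rows = np.flatnonzero(ink.any(axis=1))
    cols = np.flatnonzero(ink.any(axis=0))
    if rows.size == 0:                        # blank region: return empty patch
        return np.zeros((out_size, out_size))
    crop = img[rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1].copy()
    crop[crop >= thresh] = 1.0                # flatten background to pure white
    # nearest-neighbour resize via index sampling
    r = np.linspace(0, crop.shape[0] - 1, out_size).round().astype(int)
    c = np.linspace(0, crop.shape[1] - 1, out_size).round().astype(int)
    return crop[np.ix_(r, c)]
```

Any standard image library's crop-and-resize would serve equally well; the sketch only mirrors the order of the three steps.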

C. Discriminative Dictionary Learning
Let D = [D_1, D_2, ..., D_K] ∈ R^{m×pK} denote a synthesis dictionary composed of K class-specific subdictionaries D_i ∈ R^{m×p}, and let S_i ∈ R^{p×n} denote a sparse coefficient matrix. Discriminative dictionary learning [18]-[20] can be achieved with

$$\{\hat{D}, \hat{S}\} = \arg\min_{D, S} \|X - DS\|_F^2 + \lambda \|S\|_1 + R(X, D, S), \tag{1}$$

where ‖·‖_F and ‖·‖_1 are the Frobenius and ℓ1-norms, respectively. The penalty term R(X, D, S) is normally defined with the aim of improving classification. Gu et al. [34] extended the conventional problem (1) into the DPL model by including a linear decomposition of the sparse matrix as S = PX, with P ∈ R^{pK×m} being an analysis dictionary. In this setting, simultaneous learning of D and P avoids direct approximation of the sparse coding coefficients in S. They defined

$$\{\hat{D}, \hat{P}, \hat{S}\} = \arg\min_{D, P, S} \sum_{i=1}^{K} \|X_i - D_i S_i\|_F^2 + \tau \|P_i X_i - S_i\|_F^2 + \lambda \|P_i \bar{X}_i\|_F^2, \quad \text{s.t. } \|d_k\|_2 \le 1, \tag{2}$$

where τ, λ > 0 are constant scalars and X̄_i denotes a matrix that includes samples from all classes except the ith class. Therefore, P_i will best represent samples of the ith class and simultaneously least represent samples of the other classes. The matrices D_i and P_i were used for classification.
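As a concrete reading of the DPL objective, the sketch below evaluates its three terms for given class-wise matrices. The function name and the list-of-matrices interface are assumptions made for illustration only.

```python
import numpy as np

def dpl_cost(X, D, P, S, tau, lam):
    """Sum over classes of: reconstruction ||X_i - D_i S_i||_F^2,
    coding approximation tau * ||P_i X_i - S_i||_F^2, and the
    discriminative term lam * ||P_i Xbar_i||_F^2, where Xbar_i stacks
    the samples of all other classes. X, D, P, S are lists indexed by class."""
    K = len(X)
    total = 0.0
    for i in range(K):
        Xbar = np.hstack([X[j] for j in range(K) if j != i])
        total += np.linalg.norm(X[i] - D[i] @ S[i], 'fro') ** 2
        total += tau * np.linalg.norm(P[i] @ X[i] - S[i], 'fro') ** 2
        total += lam * np.linalg.norm(P[i] @ Xbar, 'fro') ** 2
    return total
```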

D. Incoherent Dictionary Pair Learning (InDPL)
The DPL algorithm has a penalty term through which the analysis subdictionary P_i of the ith class projects the samples of all other classes to an approximate null space. By adding an incoherence penalty to the learning of the synthesis subdictionary D_i, we modified the DPL cost function to

$$\{\hat{D}, \hat{S}, \hat{P}\} = \arg\min_{D, S, P} \sum_{i=1}^{K} \Big( \|X_i - D_i S_i\|_F^2 + \tau \|P_i X_i - S_i\|_F^2 + \lambda \|P_i \bar{X}_i\|_F^2 + \beta \sum_{j \ne i} \|D_j^T D_i\|_F^2 \Big), \quad \text{s.t. } \|d_k\|_2 \le 1, \tag{3}$$

where β > 0 is a constant scalar and (·)^T denotes the matrix transpose. The added penalty attempts to enforce D_j^T D_i ≈ 0 ∀ i ≠ j. To approximate D, P, and S, we alternately kept two fixed and computed the third. For instance, fixing D and P, taking the derivative of (3) with respect to S_i, and equating it to zero gives

$$\hat{S}_i = (D_i^T D_i + \tau I)^{-1} (D_i^T X_i + \tau P_i X_i), \tag{4}$$

where I is the identity matrix. After repeating (4) for all classes, we have Ŝ = [Ŝ_1, Ŝ_2, ..., Ŝ_K]. Similarly, P_i becomes

$$\hat{P}_i = \tau S_i X_i^T (\tau X_i X_i^T + \lambda \bar{X}_i \bar{X}_i^T + \gamma I)^{-1}, \tag{5}$$

where γ is typically a small positive parameter that avoids an ill-conditioned inverse. We have P̂ = [P̂_1, P̂_2, ..., P̂_K] after repeating this step for all classes. The matrix D was calculated with the iterative alternating direction method of multipliers (ADMM) [35]. An auxiliary matrix T, with columns t_k, k ∈ {1, 2, ..., p}, was introduced into (3) through the constraint D_i = T_i; the columns of T were normalized, ‖t_k‖_2 ≤ 1, to avoid trivial solutions:

$$\hat{D} = \arg\min_{D, T} \sum_{i=1}^{K} \|X_i - D_i S_i\|_F^2 + \beta \sum_{j \ne i} \|D_j^T D_i\|_F^2, \quad \text{s.t. } D = T, \ \|t_k\|_2 \le 1. \tag{6}$$

The solution is then obtained iteratively from the triple subproblem set

$$D^{(r+1)} = \arg\min_{D} \sum_{i=1}^{K} \|X_i - D_i S_i\|_F^2 + \beta \sum_{j \ne i} \|D_j^T D_i\|_F^2 + \rho \|D_i - T_i^{(r)} + U_i^{(r)}\|_F^2, \tag{7}$$

$$T^{(r+1)} = \arg\min_{T} \sum_{i=1}^{K} \rho \|D_i^{(r+1)} - T_i + U_i^{(r)}\|_F^2, \quad \text{s.t. } \|t_k\|_2 \le 1, \tag{8}$$

$$U^{(r+1)} = U^{(r)} + D^{(r+1)} - T^{(r+1)}, \tag{9}$$

where U is the scaled dual matrix, r is the iteration index, and ρ > 0 is a penalty scalar that gradually increases at rate ρ_rate ≥ 1. Closed-form solutions for (7) and (8) can be obtained by taking the derivative with respect to every subdictionary and equating it to zero. For (7), the stationarity condition for D_i (with D_j, j ≠ i, held fixed) is the Sylvester equation

$$\Big( \beta \sum_{j \ne i} D_j D_j^T + \rho I \Big) D_i + D_i \big( S_i S_i^T \big) = X_i S_i^T + \rho \big( T_i^{(r)} - U_i^{(r)} \big), \tag{10}$$

and for (8), each t_k is obtained by projecting the kth column of D_i^{(r+1)} + U_i^{(r)} onto the unit ℓ2-ball.

Algorithm 1: Incoherent dictionary pair learning (InDPL).
Input: X_1, X_2, ..., X_K; parameters λ = 0.005, τ = 1, β = 0.08, γ = 10^{-4}, ρ = 1, ρ_rate = 1.2; synthesis subdictionary size p = 30.
Initialize D^{(0)}, r = 0.
Output: D, P, and S.
for l ← 1, Iter do
    for i ← 1, K do perform (4) for S_i end for
    S ← [S_1, S_2, ..., S_K]
    for i ← 1, K do perform (5) for P_i end for
    P ← [P_1, P_2, ..., P_K]
    for i ← 1, K do
        repeat
            perform (10) to solve for D_i^{(r+1)}
            update T^{(r+1)} and U^{(r+1)} via (8) and (9)
            ρ ← ρ_rate · ρ; r ← r + 1
        until convergence
    end for
end for

The pseudocode of the proposed approach is given in Algorithm 1.
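The update steps of Algorithm 1 can be sketched in Python: the closed-form S_i and P_i updates, and one ADMM round for the synthesis dictionary in which each D_i is obtained from the Sylvester-type stationarity condition and the columns of T are projected onto the unit ball. This is a minimal sketch following the derivations in the text; the function names, the vectorized Sylvester solver, and the stopping logic are assumptions, not the authors' code.

```python
import numpy as np

def update_S(Xi, Di, Pi, tau):
    # S_i = (D_i^T D_i + tau I)^{-1} (D_i^T X_i + tau P_i X_i)
    p = Di.shape[1]
    return np.linalg.solve(Di.T @ Di + tau * np.eye(p),
                           Di.T @ Xi + tau * Pi @ Xi)

def update_P(Xi, Xbar, Si, tau, lam, gamma):
    # P_i = tau S_i X_i^T (tau X_i X_i^T + lam Xbar_i Xbar_i^T + gamma I)^{-1}
    m = Xi.shape[0]
    A = tau * Xi @ Xi.T + lam * Xbar @ Xbar.T + gamma * np.eye(m)
    return np.linalg.solve(A.T, (tau * Si @ Xi.T).T).T

def solve_sylvester(A, B, Q):
    # Solve A X + X B = Q by vectorization; adequate for small dictionaries
    # (scipy.linalg.solve_sylvester is the scalable alternative).
    m, p = Q.shape
    M = np.kron(np.eye(p), A) + np.kron(B.T, np.eye(m))
    return np.linalg.solve(M, Q.reshape(-1, order='F')).reshape(m, p, order='F')

def admm_D_step(X, D, S, T, U, beta, rho):
    # One ADMM round: D_i from the Sylvester stationarity condition (the
    # incoherence term couples D_i with the fixed D_j, j != i), then T_i by
    # column-wise projection onto the unit l2-ball, then the dual update.
    K, m = len(D), D[0].shape[0]
    for i in range(K):
        G = sum(D[j] @ D[j].T for j in range(K) if j != i)
        A = beta * G + rho * np.eye(m)
        Q = X[i] @ S[i].T + rho * (T[i] - U[i])
        D[i] = solve_sylvester(A, S[i] @ S[i].T, Q)
        V = D[i] + U[i]
        T[i] = V / np.maximum(np.linalg.norm(V, axis=0), 1.0)
        U[i] = U[i] + D[i] - T[i]
    return D, T, U
```

Iterating `update_S`, `update_P`, and `admm_D_step` in the order of Algorithm 1, with ρ grown by ρ_rate after each round, reproduces the alternating minimization structure described above.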
For any two distinct synthesis subdictionaries D_i and D_j with unit-norm atoms, three incoherence measures, namely μ_min, μ_max, and μ_average, can be calculated as

$$\mu_{\min}(D_i, D_j) = \min_{k,l} |d_{i,k}^T d_{j,l}|, \quad \mu_{\max}(D_i, D_j) = \max_{k,l} |d_{i,k}^T d_{j,l}|, \quad \mu_{\text{average}}(D_i, D_j) = \frac{1}{p^2} \sum_{k,l} |d_{i,k}^T d_{j,l}|,$$

where d_{i,k} denotes the kth atom of D_i. Smaller values of μ_min, μ_max, and μ_average indicate that higher incoherence between the dictionaries is achieved.
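These pairwise measures reduce to statistics of the absolute cross-Gram matrix of the normalized atoms; a minimal sketch, with the min/max/mean definitions reconstructed from the surrounding text:

```python
import numpy as np

def incoherence(Di, Dj):
    """mu_min, mu_max, mu_average between two synthesis subdictionaries:
    statistics of |d_k^T d_l| over all cross-dictionary atom pairs,
    with atoms normalized to unit length first."""
    Di = Di / np.linalg.norm(Di, axis=0)
    Dj = Dj / np.linalg.norm(Dj, axis=0)
    G = np.abs(Di.T @ Dj)      # p_i x p_j matrix of absolute inner products
    return G.min(), G.max(), G.mean()
```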
In both the DPL and InDPL algorithms, if a test sample y belongs to class i, then the reconstruction error ‖y − D_i P_i y‖_2^2 is expected to be the smallest; the predicted label is therefore arg min_i ‖y − D_i P_i y‖_2^2.
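The decision rule amounts to one line per class; a minimal sketch, assuming the learned dictionaries are stored as lists indexed by class:

```python
import numpy as np

def classify(y, D, P):
    """DPL/InDPL decision rule: assign y to the class whose pair (D_i, P_i)
    yields the smallest reconstruction error ||y - D_i P_i y||_2^2."""
    errs = [np.linalg.norm(y - D[i] @ (P[i] @ y)) ** 2 for i in range(len(D))]
    return int(np.argmin(errs))
```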

E. Benchmarking
We compared the performance of the proposed InDPL algorithm with that of the DPL algorithm [34]. In addition, as a baseline, we compared the results to the case in which only a k-nearest neighbor (kNN) classifier was applied to the resized data without any dimensionality reduction or dictionary learning. Inputs to the kNN classifier were the vectorized images and their class labels.

F. Cross Validation
Three different cross-validation techniques were implemented. In the interest of clarity, we use the following notation: number of subjects n_s = 100, number of repetitions n_r = 10, and number of Chinese numbers n_c = 15.
1) Conventional: All n_r repetitions from all n_s subjects were pooled. The data set therefore contains n_s × n_r images for each of the n_c classes, i.e., n_s × n_r × n_c = 15,000 images in total. We then performed a conventional 10-fold cross-validation.
2) Between-Subjects: In each fold, the training set comprised data from n_s − 1 subjects, all n_r repetitions, and all n_c classes; the testing set comprised the data of the remaining subject. We repeated this process 100 times; each time, all 10 × 15 images from a distinct subject were left out for testing.
3) Within-Subject: In each fold, the training set comprised data from all n_s subjects, n_r − 1 repetitions, and all n_c classes; the testing set comprised the remaining repetition. We repeated this process 10 times; each time, all 100 × 15 images from a distinct repetition were left out for testing.
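The between-subjects scheme, for instance, is pure index bookkeeping over a (subject, repetition, class) grid; the function name and the flat-index layout below are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def between_subjects_folds(n_s=100, n_r=10, n_c=15):
    """One fold per subject: test on all n_r * n_c images of the held-out
    subject, train on the images of the remaining n_s - 1 subjects.
    Images are assumed flat-indexed by (subject, repetition, class)."""
    idx = np.arange(n_s * n_r * n_c).reshape(n_s, n_r, n_c)
    folds = []
    for s in range(n_s):
        test = idx[s].ravel()                       # held-out subject
        train = np.delete(idx, s, axis=0).ravel()   # everyone else
        folds.append((train, test))
    return folds
```

The within-subject scheme is obtained symmetrically by holding out one repetition axis instead of one subject axis.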

III. RESULTS
To choose an optimal number of dictionary atoms, we plotted the overall classification accuracy achieved with the InDPL method versus the number of atoms p in each dictionary. Fig. 3 shows that accuracy essentially peaks as p reaches 30. To avoid unnecessary computational complexity, we therefore chose p = 30, beyond which the improvement in classification accuracy was negligible.
Fig. 4 illustrates an example of the learned dictionary atoms for all 15 classes. For clarity of visualization, only 25 of the 30 atoms are included in this figure. The dictionary atoms of each class distinctly represent one specific number. This observation suggests that the added incoherence penalty effectively enforced the dictionaries to be as discriminative as possible. The coherence between the learned dictionaries is reported in Table I; smaller numbers mean higher incoherence. The values associated with the proposed InDPL algorithm were consistently smaller than those achieved with the DPL algorithm. Fig. 5 shows the average classification accuracy achieved with each of the three algorithms (InDPL, DPL, and kNN), plotted with respect to the cross-validation method. The InDPL algorithm outperformed the DPL and kNN algorithms in both the conventional and within-subject cross-validations. The three algorithms exhibited comparable performance in the between-subjects cross-validation, suggesting that when a large training dataset is available, the choice of algorithm is less important from an accuracy point of view. In addition, we observed a significantly larger standard deviation in the case of classification with kNN in the within-subject cross-validation scheme.
Fig. 6 shows the improvement in classification accuracy achieved by using the InDPL algorithm instead of the DPL method. Darker colors on the diagonal clearly reflect the better performance of InDPL in the conventional and within-subject cross-validation conditions. Visual inspection of the digits in Fig. 1 led to the prediction that distinguishing between the numbers 10 and 10³ would be most challenging. Class-specific accuracy, however, showed that the most difficult pair to decode was 2 and 3, reflecting the sparsity of the image data before encoding.

IV. CONCLUSION
We augmented the DPL algorithm by adding an incoherence penalty term. With the resulting InDPL algorithm, class-specific dictionaries were achieved. The InDPL cost function was broken into three subproblems, two of which were solved in closed form; the third was solved using the ADMM method. The algorithm was applied to a novel database of handwritten Chinese numbers. We developed three cross-validation techniques to verify the efficacy of the proposed incoherent dictionary pair learning methodology. Results showed that in the conventional and within-subject cross-validation conditions, the classification accuracy achieved with the InDPL algorithm exceeds that obtained with the DPL and kNN methods. When the number of training samples increases, the three classification methods yield comparable scores, reaffirming the hypothesis that dictionary learning-based techniques are better suited to cases in which the amount of training data is limited. The availability of this novel dataset allows machine learning and signal processing researchers to develop further pattern recognition algorithms and to use the proposed algorithm and cross-validation methods for benchmarking.

Fig. 3. Classification accuracy of the proposed method versus the number of atoms p within the synthesis dictionary.

Fig. 4. Illustration of 25 learned atoms from each of the 15 synthesis dictionaries. Atoms in each class clearly reflect the class label. In the analysis, 30 atoms per dictionary were used.

Fig. 5. Average classification accuracy with standard deviation for the three cross-validation cases.

TABLE I
COHERENCE VALUES AMONG PAIRS OF SYNTHESIS DICTIONARIES