Abstract
Outliers in high-dimensional data pose several challenges to classification, such as accurate classification under imbalanced scatters. In this paper, we propose an angle-based framework, Angle Global and Local Discriminant Analysis (AGLDA), that accounts for imbalanced scatters. AGLDA chooses an optimal subspace using the angle cosine to achieve an appropriate scatter balance in the dataset. The advantage of this method is that it can classify datasets affected by outliers by finding an optimal subspace in high-dimensional data; in general, it is more effective and more reliable than other methods when outliers are present. In addition, human posture classification is used as an application of the balanced semi-supervised dimensionality reduction, assisting human-factor experts and designers of industrial systems in diagnosing the postures of maintenance crews. The experimental results show the efficiency of the proposed method via two real case studies, and the results are further verified by comparison with other approaches.
References
Wu, Z.; Lin, T.; Li, M.: A computer-aided coloring method for virtual agents based on personality impression, color harmony, and designer preference. Int. J. Ind. Ergon. 68, 327–336 (2018)
Jin, S.: Biomechanical characteristics in the recovery phase after low back fatigue in passive and active tissues. Int. J. Ind. Ergon. 64, 163–169 (2018)
Zhang, L.; Lin, J.; Karim, R.: An angle-based subspace anomaly detection approach to high-dimensional data: with an application to industrial fault detection. Reliab. Eng. Syst. Saf. 142, 482–497 (2015)
Zhu, L.; Zhang, C.; Zhang, C.; Zhang, Z.; Nie, X.; Zhou, X.; Liu, W.; Wang, X.: Forming a new small sample deep learning model to predict total organic carbon content by combining unsupervised learning with semisupervised learning. Appl. Soft Comput. 83, 105596 (2019)
Qu, Y.; Liu, Z.: Dimensionality reduction and derivative spectral feature optimization for hyperspectral target recognition. Optik (Stuttg). 130, 1349–1357 (2017)
Cui, D.; Xia, K.: Dimension reduction and defect recognition of strip surface defects based on intelligent information processing. Arab. J. Sci. Eng. 43, 6729–6736 (2018)
Zhu, L.; Zhang, C.; Zhang, C.; Zhou, X.; Wang, J.; Wang, X.: Application of multiboost-KELM algorithm to alleviate the collinearity of log curves for evaluating the abundance of organic matter in marine mud shale reservoirs: a case study in Sichuan Basin, China. Acta Geophys. 66, 983–1000 (2018)
Rousseeuw, P.J.; Hubert, M.: Robust statistics for outlier detection. Wiley Interdiscip. Rev. Data Min. Knowl. Discov. 1, 73–79 (2011)
Tao, D.; Li, X.; Wu, X.; Maybank, S.J.: Geometric mean for subspace selection. IEEE Trans. Pattern Anal. Mach. Intell. 31, 260–274 (2009)
Lotlikar, R.; Kothari, R.: Fractional-step dimensionality reduction. IEEE Trans. Pattern Anal. Mach. Intell. 22, 623–627 (2000)
Lu, J.; Plataniotis, K.N.; Venetsanopoulos, A.N.: Regularized discriminant analysis for the small sample size problem in face recognition. Pattern Recognit. Lett. 24, 3079–3087 (2003)
Loog, M.; Duin, R.P.W.; Haeb-Umbach, R.: Multiclass linear dimension reduction by weighted pairwise Fisher criteria. IEEE Trans. Pattern Anal. Mach. Intell. 23, 762–766 (2001)
Liu, S.; Feng, L.; Qiao, H.: Scatter balance: an angle-based supervised dimensionality reduction. IEEE Trans. Neural Netw. Learn. Syst. 26, 277–289 (2015)
Zou, H.; Hastie, T.; Tibshirani, R.: Sparse principal component analysis. J. Comput. Graph. Stat. 15, 265–286 (2006)
Jiao, J.; Zhao, M.; Lin, J.; Liang, K.: Hierarchical discriminating sparse coding for weak fault feature extraction of rolling bearings. Reliab. Eng. Syst. Saf. 184, 41–54 (2018)
Gao, S.; Zhou, J.; Yan, Y.; Ye, Q.L.: Recursively global and local discriminant analysis for semi-supervised and unsupervised dimension reduction with image analysis. Neurocomputing. (2016). https://doi.org/10.1016/j.neucom.2016.08.018
Belkin, M.; Niyogi, P.; Sindhwani, V.: Manifold regularization: a geometric framework for learning from labeled and unlabeled examples. J. Mach. Learn. Res. 7, 2399–2434 (2006)
Sindhwani, V.; Niyogi, P.; Belkin, M.; Keerthi, S.: Linear manifold regularization for large scale semi-supervised learning. In: Proceedings of the 22nd ICML Workshop on Learning with Partially Classified Training Data (2005)
Gan, H.: A noise-robust semi-supervised dimensionality reduction method for face recognition. Optik (Stuttg). 157, 858–865 (2018)
Vapnik, V.: Statistical Learning Theory. Wiley, New York (1998)
Zhu, L.; Zhang, C.; Zhang, C.; Zhang, Z.; Zhou, X.; Liu, W.; Zhu, B.: A new and reliable dual model-and data-driven TOC prediction concept: a TOC logging evaluation method using multiple overlapping methods integrated with semi-supervised deep learning. J. Pet. Sci. Eng. 188, 106944 (2020)
Zhu, L.; Zhang, C.; Wei, Y.; Zhou, X.; Huang, Y.; Zhang, C.: Inversion of the permeability of a tight gas reservoir with the combination of a deep Boltzmann kernel extreme learning machine and nuclear magnetic resonance logging transverse relaxation time spectrum data. Interpretation. 5, T341–T350 (2017)
Liu, Z.; Lai, Z.; Ou, W.; Zhang, K.; Zheng, R.: Structured optimal graph based sparse feature extraction for semi-supervised learning. Signal Process. 170, 107456 (2020)
Jiang, J.; He, X.; Gao, M.; Wang, X.; Wu, X.: Human action recognition via compressive-sensing-based dimensionality reduction. Optik (Stuttg). 126, 882–887 (2015)
Lan, Z.; Huang, M.: Health assessment model and maintenance decision model for seawall prognostics and health management system. Arab. J. Sci. Eng. 44, 8377–8387 (2019)
Tharwat, A.; Gaber, T.; Ibrahim, A.; Hassanien, A.E.: Linear discriminant analysis: a detailed tutorial. AI Commun. 30, 169–190 (2017)
Zhang, D.; Zhou, Z.-H.; Chen, S.: Semi-supervised dimensionality reduction. In: Proceedings of the 2007 SIAM International Conference on Data Mining, pp. 629–634. SIAM (2007)
Yang, J.; Zhang, D.; Yang, J.; Niu, B.: Globally maximizing, locally minimizing: unsupervised discriminant projection with applications to face and palm biometrics. IEEE Trans. Pattern Anal. Mach. Intell. 29, 650–664 (2007)
Chen, X.; Yang, J.; Ye, Q.; Liang, J.: Recursive projection twin support vector machine via within-class variance minimization. Pattern Recognit. 44, 2643–2655 (2011)
Dornaika, F.; El Traboulsi, Y.: Learning flexible graph-based semi-supervised embedding. IEEE Trans. Cybern. 46, 206–218 (2016)
Ye, Q.L.; Zhao, C.X.; Zhang, H.F.; Chen, X.B.: Recursive “concave–convex” Fisher linear discriminant with applications to face, handwritten digit and terrain recognition. Pattern Recognit. 45, 54–65 (2012)
Yang, J.; Yang, J.: Why can LDA be performed in PCA transformed space? Pattern Recognit. 36, 563–566 (2003)
Zheng, W.; Zhao, L.; Zou, C.: An efficient algorithm to solve the small sample size problem for LDA. Pattern Recognit. 37, 1077–1079 (2004)
Chen, L.-F.; Liao, H.-Y.M.; Ko, M.-T.; Lin, J.-C.; Yu, G.-J.: A new LDA-based face recognition system which can solve the small sample size problem. Pattern Recognit. 33, 1713–1726 (2000)
Bian, W.; Tao, D.: Asymptotic generalization bound of Fisher’s linear discriminant analysis. IEEE Trans. Pattern Anal. Mach. Intell. 36, 2325–2337 (2014)
Huang, Y.; Xu, D.; Nie, F.: Semi-supervised dimension reduction using trace ratio criterion. IEEE Trans. Neural Netw. Learn. Syst. 23, 519–526 (2012)
Liu, T.; Tao, D.: Classification with noisy labels by importance reweighting. IEEE Trans. Pattern Anal. Mach. Intell. 38, 447–461 (2016)
He, X.; Yan, S.; Hu, Y.; Niyogi, P.; Zhang, H.-J.: Face recognition using laplacianfaces. IEEE Trans. Pattern Anal. Mach. Intell. 27, 328–340 (2005)
Cai, D.; He, X.; Han, J.: Semi-supervised Discriminant Analysis. In: 2007 IEEE 11th International Conference on Computer Vision, pp. 1–7. IEEE (2007)
Tisseur, F.; Meerbergen, K.: The quadratic eigenvalue problem. SIAM Rev. 43, 235–286 (2001)
Khemchandani, R.; Chandra, S.: Twin support vector machines for pattern classification. IEEE Trans. Pattern Anal. Mach. Intell. 29, 905–910 (2007)
Nie, F.; Xu, D.; Li, X.; Xiang, S.: Semisupervised dimensionality reduction and classification through virtual label regression. IEEE Trans. Syst. Man, Cybern. Part B 41, 675–685 (2011). https://doi.org/10.1109/TSMCB.2010.2085433
Ye, J.; Ji, S.; Chen, J.: Multi-class discriminant kernel learning via convex programming. J. Mach. Learn. Res. 9, 719–758 (2008)
Acknowledgements
The first author would like to thank the University of Mazandaran and Luleå University of Technology for providing her PhD research opportunity.
Appendices
Appendix A
To explain how a linear combination of features can be achieved, the following assumptions are made. Given a data matrix \(X = \{ x_{i} | x_{1}, x_{2}, \ldots, x_{l} \} \in R^{n \times l}\), where \(l\) and \(n\) are, respectively, the number of samples and the dimension of the data, LDA models the supervised data distribution by mapping each input point \(x_{i} \in R^{n}\) in the \(n\)-dimensional space to a vector \(y_{i} \in R^{r}\) in a lower \(r\)-dimensional space [34, 35]. A scalar \(y\) is then obtained by projecting the samples \(x\) onto a line.
The optimization over the projection vector \(a\) can be applied in the following form
where \(a^{T} a = I\) and the within-class scatter matrix \(S_{\text{w}}\) and the between-class matrix \(S_{\text{b}}\) are defined as
Here, \(c\) is the number of classes, \(L_{F} = I - F\) is a graph Laplacian matrix, and \(F\) is an adjacency matrix. If a pair of samples belongs to the same class, the corresponding element of \(F\) is set to \(\frac{1}{N_{i}}\), where \(N_{i}\) is the number of samples in the \(i\)th class, and to 0 otherwise. Also, \(\mu_{i}\) and \(\mu\) are the mean of the \(i\)th class and the mean of the labeled points, respectively.
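As an illustration of the scatter matrices above, the following sketch builds \(S_{\text{w}}\) and \(S_{\text{b}}\) directly from their class-wise definitions and extracts an LDA projection from the resulting generalized eigenproblem. Function and variable names are illustrative, not from the paper; the small ridge added to \(S_{\text{w}}\) is an assumption made here purely for numerical stability.

```python
import numpy as np
from scipy.linalg import eigh

def lda_projection(X, labels, r):
    """Sketch of classical LDA: X is (n, l) with one sample per column,
    labels has length l, r is the target dimension."""
    n, l = X.shape
    mu = X.mean(axis=1, keepdims=True)            # mean of all labeled points
    Sw = np.zeros((n, n))                         # within-class scatter
    Sb = np.zeros((n, n))                         # between-class scatter
    for c in np.unique(labels):
        Xc = X[:, labels == c]
        mu_c = Xc.mean(axis=1, keepdims=True)     # mean of class c
        Sw += (Xc - mu_c) @ (Xc - mu_c).T
        Sb += Xc.shape[1] * (mu_c - mu) @ (mu_c - mu).T
    # Generalized eigenproblem  Sb a = lambda Sw a ; keep the top-r eigenvectors.
    # The small ridge keeps Sw positive definite (illustrative choice).
    w, V = eigh(Sb, Sw + 1e-6 * np.eye(n))
    return V[:, np.argsort(w)[::-1][:r]]          # projection matrix (n, r)
```

Projecting the samples with the returned matrix, \(Y = A^{T} X\), gives the low-dimensional representation described in the text.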
Appendix B
SDA has been formulated as follows:
where \(a\) is an \(n\)-dimensional projection vector, \(S_{\text{b}}\) and \(S_{\text{w}}\) are the between-class and within-class scatter matrices as shown in Eqs. (A-3) and (A-4). Furthermore, \(J\left( a \right)\) is a regularization term which is learned from the labeled and unlabeled data and \(S_{L} = X^{T} LX\) is a graph scatter matrix [30, 36, 37]. \(L = D - H\) is a Laplacian matrix and \(D\) is a diagonal matrix whose entries are the column (or row, because \(H\) is symmetric) sums of \(H.\) SDA minimizes both the within-class scatter of the labeled data and the local scatter of the labeled and unlabeled data, and simultaneously maximizes the between-class scatter of the labeled samples.
A p-nearest-neighbor graph \(G = \left( {V,E} \right)\) can be used to model the scatter matrices. The vertex set \(V = \left\{ {1,2, \ldots ,n} \right\}\) corresponds to the data points in \(X\), and the edge set \(E \subseteq V \times V\) represents the relationships between data points [38, 39]. Let \(H\) denote the similarity matrix over all data points, with \(H_{ij}\) the \(\left( {i,j} \right)\) element of \(H\). Each edge of the graph is assigned the weight \(H_{ij}\), and an edge is placed between nodes \(i\) and \(j\) if \(x_{i}\) and \(x_{j}\) are close to each other, i.e., among the p nearest neighbors. Thus, \(H_{ij}\) can be defined as follows:
where \(N_{p} \left( {x_{i} } \right)\) represents the set of p nearest neighbors of \(x_{i}\). A Laplacian-style matrix is defined as follows:
According to [16], solving the SDA optimization problem is equivalent to solving the following generalized eigenvalue problem (GEP) [40]:
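The graph construction above can be sketched as follows. This assumes a simple binary weight (\(H_{ij} = 1\) when the points are mutual or one-sided p-nearest neighbors, 0 otherwise); a heat-kernel weight is a common alternative, and the function name is illustrative.

```python
import numpy as np

def knn_graph_laplacian(X, p):
    """Sketch of the p-nearest-neighbor graph of Appendix B.
    X is (n, N) with one sample per column; returns H and L = D - H."""
    N = X.shape[1]
    # pairwise squared Euclidean distances between columns
    d2 = ((X[:, :, None] - X[:, None, :]) ** 2).sum(axis=0)
    H = np.zeros((N, N))
    for i in range(N):
        # the p nearest neighbors of x_i (position 0 is the point itself)
        nn = np.argsort(d2[i])[1:p + 1]
        H[i, nn] = 1.0
    H = np.maximum(H, H.T)                 # symmetrize the similarity matrix
    D = np.diag(H.sum(axis=1))             # degree matrix: d_i = sum_j H_ij
    L = D - H                              # Laplacian-style matrix
    return H, L
```

By construction every row of \(L\) sums to zero, which is the property exploited when the local scatter is rewritten as \(X^{T} L X\).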
Appendix C
The RGLDA optimization problem has been defined as follows [41, 42]:
where \(S_{D} = S_{\text{w}} + \gamma S_{L}\), \(\varepsilon_{i}\) is a loss term that relaxes the hard constraints, and \(H = [ \left( {x_{1} - \mu } \right),\left( {x_{2} - \mu } \right), \ldots ,\left( {x_{l} - \mu } \right),\gamma \left( {x_{1} - m} \right),\gamma \left( {x_{2} - m} \right), \ldots ,\gamma \left( {x_{l + M} - m} \right)],\) with \(m\) the mean of all (labeled and unlabeled) samples. Here, \(M\) is the number of unlabeled data points. Gao et al. also constructed the following semi-supervised problem:
where \(\gamma\) denotes a regularization parameter. Equation (C-2) can be rewritten with the following semi-supervised problem:
see the relation between Eqs. (C-2) and (C-3) in Ref. [16]. Then, the following optimization problem can be solved
where \(F = \{ {\text{sign}}\left( {a_{t}^{T} \left( {H_{1} } \right)} \right)\left( {H_{1} } \right),{\text{sign}}\left( {a_{t}^{T} \left( {H_{2} } \right)} \right)\left( {H_{2} } \right), \ldots ,{\text{sign}}\left( {a_{t}^{T} \left( {H_{l + M} } \right)} \right)\left( {H_{l + M} } \right)\}^{T}\) and \(e\) is a column vector of ones with \(l + M\) dimensions. According to the Concave–Convex Procedure (CCP) [31, 43], Eq. (C-4) is solved iteratively: \(a_{t}\) is first obtained by minimizing (C-1) and is then replaced by \(a_{t + 1}\). The solution of (C-1) can be conveniently obtained by solving the following Wolfe dual optimization formulation:
It is easy to check that the optimization (C-4) can be emulated by a regularized SVM without a threshold.
For the emulation, a new training set consisting of \(l + M\) training samples \(\left( {k_{1} ,y_{1} } \right),\left( {k_{2} ,y_{2} } \right), \ldots\),\(\left( {k_{l + M} ,y_{l + M} } \right)\) is needed, where \(k_{i} = H_{i}\) and the class labels \(y_{i} \in \left\{ { - 1, + 1} \right\}\), \(i = 1,2, \ldots ,l + M\), are estimated by computing \({\text{sign}}\left( {a_{t}^{T} \left( {H_{i} } \right)} \right)\).
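The construction of this emulation training set can be sketched as below: each column \(H_{i}\) of \(H\) becomes a sample \(k_{i}\) whose pseudo-label is the sign of the current projection score. This covers only the relabeling step, not the SVM solve itself; the function name and the convention of mapping a zero score to \(+1\) are assumptions of this sketch.

```python
import numpy as np

def ccp_training_set(H, a_t):
    """Build the (k_i, y_i) pairs of Appendix C from the current iterate a_t.
    H is (n, l+M) with one column per sample; a_t is an n-vector."""
    scores = a_t @ H                        # a_t^T H_i for every column i
    y = np.where(scores >= 0, 1.0, -1.0)    # y_i = sign(a_t^T H_i); 0 -> +1
    K = H.T                                 # one training sample per row: k_i = H_i
    return K, y
```

At each CCP step this set is refit, \(a_{t+1}\) is obtained from the threshold-free SVM, and the labels are re-estimated until the signs stop changing.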
Appendix D
Proof of Eq. (5).
The scatter \(S_{L}\) within a local neighborhood of two samples \(x_{i}\) and \(x_{j}\), based on angle, can be obtained as follows:
Meanwhile, the above equation can be transformed, based on the graph, into:
where \(D = {\text{diag}}\left( {d_{i} } \right)\), \(d_{i} = \sum H_{ij}\), \(L_{H} = H - D.\)
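The graph identity underlying this transformation can be verified numerically: for a symmetric weight matrix \(H\), the pairwise sum \(\sum_{ij} H_{ij} (x_{i} - x_{j})(x_{i} - x_{j})^{T}\) equals \(2 X (D - H) X^{T}\), which is the same matrix as the paper's \(L_{H} = H - D\) up to the sign convention. The sketch below (with illustrative names) checks the two forms against each other.

```python
import numpy as np

def local_scatter(X, H):
    """Pairwise local scatter and its closed form via the graph Laplacian.
    X is (n, N) with one sample per column; H must be symmetric."""
    n, N = X.shape
    S = np.zeros((n, n))
    for i in range(N):
        for j in range(N):
            diff = (X[:, i] - X[:, j])[:, None]
            S += H[i, j] * diff @ diff.T            # direct pairwise sum
    D = np.diag(H.sum(axis=1))                      # D = diag(d_i), d_i = sum_j H_ij
    S_graph = 2 * X @ (D - H) @ X.T                 # closed form via D - H
    assert np.allclose(S, S_graph)                  # the two forms agree
    return S
```
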
Cite this article
Ramezani, Z., Pourdarvish, A. & Teymourian, K. A Novel Angle-Based Learning Framework on Semi-supervised Dimensionality Reduction in High-Dimensional Data with Application to Action Recognition. Arab J Sci Eng 45, 11051–11063 (2020). https://doi.org/10.1007/s13369-020-04869-w