A Novel Angle-Based Learning Framework on Semi-supervised Dimensionality Reduction in High-Dimensional Data with Application to Action Recognition

  • Research Article: Computer Engineering and Computer Science
  • Published in: Arabian Journal for Science and Engineering

Abstract

Outliers in high-dimensional data create various challenges for classification, such as achieving accurate classification in the presence of imbalanced scatters. In this paper, we propose an angle-based framework, Angle Global and Local Discriminant Analysis (AGLDA), that accounts for imbalanced scatters. AGLDA chooses an optimal subspace using the angle cosine to achieve an appropriate scatter balance in the dataset. The advantage of this method is that it classifies datasets affected by outliers by finding an optimal subspace in high-dimensional data; in general, it is more effective and more reliable than competing methods when outliers are present. In addition, human posture classification is presented as an application of balanced semi-supervised dimensionality reduction, assisting human factors experts and designers of industrial systems in diagnosing the postures of maintenance crews. The experimental results on two real case studies show the efficiency of the proposed method, and the results are further verified by comparison with other approaches.


References

  1. Wu, Z.; Lin, T.; Li, M.: A computer-aided coloring method for virtual agents based on personality impression, color harmony, and designer preference. Int. J. Ind. Ergon. 68, 327–336 (2018)

  2. Jin, S.: Biomechanical characteristics in the recovery phase after low back fatigue in passive and active tissues. Int. J. Ind. Ergon. 64, 163–169 (2018)

  3. Zhang, L.; Lin, J.; Karim, R.: An angle-based subspace anomaly detection approach to high-dimensional data: with an application to industrial fault detection. Reliab. Eng. Syst. Saf. 142, 482–497 (2015)

  4. Zhu, L.; Zhang, C.; Zhang, C.; Zhang, Z.; Nie, X.; Zhou, X.; Liu, W.; Wang, X.: Forming a new small sample deep learning model to predict total organic carbon content by combining unsupervised learning with semisupervised learning. Appl. Soft Comput. 83, 105596 (2019)

  5. Qu, Y.; Liu, Z.: Dimensionality reduction and derivative spectral feature optimization for hyperspectral target recognition. Optik (Stuttg). 130, 1349–1357 (2017)

  6. Cui, D.; Xia, K.: Dimension reduction and defect recognition of strip surface defects based on intelligent information processing. Arab. J. Sci. Eng. 43, 6729–6736 (2018)

  7. Zhu, L.; Zhang, C.; Zhang, C.; Zhou, X.; Wang, J.; Wang, X.: Application of multiboost-KELM algorithm to alleviate the collinearity of log curves for evaluating the abundance of organic matter in marine mud shale reservoirs: a case study in Sichuan Basin, China. Acta Geophys. 66, 983–1000 (2018)

  8. Rousseeuw, P.J.; Hubert, M.: Robust statistics for outlier detection. Wiley Interdiscip. Rev. Data Min. Knowl. Discov. 1, 73–79 (2011)

  9. Tao, D.; Li, X.; Wu, X.; Maybank, S.J.: Geometric mean for subspace selection. IEEE Trans. Pattern Anal. Mach. Intell. 31, 260–274 (2009)

  10. Lotlikar, R.; Kothari, R.: Fractional-step dimensionality reduction. IEEE Trans. Pattern Anal. Mach. Intell. 22, 623–627 (2000)

  11. Lu, J.; Plataniotis, K.N.; Venetsanopoulos, A.N.: Regularized discriminant analysis for the small sample size problem in face recognition. Pattern Recognit. Lett. 24, 3079–3087 (2003)

  12. Loog, M.; Duin, R.P.W.; Haeb-Umbach, R.: Multiclass linear dimension reduction by weighted pairwise Fisher criteria. IEEE Trans. Pattern Anal. Mach. Intell. 23, 762–766 (2001)

  13. Liu, S.; Feng, L.; Qiao, H.: Scatter balance: an angle-based supervised dimensionality reduction. IEEE Trans. Neural Netw. Learn. Syst. 26, 277–289 (2015)

  14. Zou, H.; Hastie, T.; Tibshirani, R.: Sparse principal component analysis. J. Comput. Graph. Stat. 15, 265–286 (2006)

  15. Jiao, J.; Zhao, M.; Lin, J.; Liang, K.: Hierarchical discriminating sparse coding for weak fault feature extraction of rolling bearings. Reliab. Eng. Syst. Saf. 184, 41–54 (2018)

  16. Gao, S.; Zhou, J.; Yan, Y.; Ye, Q.L.: Recursively global and local discriminant analysis for semi-supervised and unsupervised dimension reduction with image analysis. Neurocomputing. (2016). https://doi.org/10.1016/j.neucom.2016.08.018

  17. Belkin, M.; Niyogi, P.; Sindhwani, V.: Manifold regularization: a geometric framework for learning from labeled and unlabeled examples. J. Mach. Learn. Res. 7, 2399–2434 (2006)

  18. Sindhwani, V.; Niyogi, P.; Belkin, M.; Keerthi, S.: Linear manifold regularization for large scale semi-supervised learning. In: Proceedings of the 22nd ICML Workshop on Learning with Partially Classified Training Data (2005)

  19. Gan, H.: A noise-robust semi-supervised dimensionality reduction method for face recognition. Optik (Stuttg). 157, 858–865 (2018)

  20. Vapnik, V.: Statistical Learning Theory. Wiley, New York (1998)

  21. Zhu, L.; Zhang, C.; Zhang, C.; Zhang, Z.; Zhou, X.; Liu, W.; Zhu, B.: A new and reliable dual model-and data-driven TOC prediction concept: a TOC logging evaluation method using multiple overlapping methods integrated with semi-supervised deep learning. J. Pet. Sci. Eng. 188, 106944 (2020)

  22. Zhu, L.; Zhang, C.; Wei, Y.; Zhou, X.; Huang, Y.; Zhang, C.: Inversion of the permeability of a tight gas reservoir with the combination of a deep Boltzmann kernel extreme learning machine and nuclear magnetic resonance logging transverse relaxation time spectrum data. Interpretation. 5, T341–T350 (2017)

  23. Liu, Z.; Lai, Z.; Ou, W.; Zhang, K.; Zheng, R.: Structured optimal graph based sparse feature extraction for semi-supervised learning. Signal Process. 170, 107456 (2020)

  24. Jiang, J.; He, X.; Gao, M.; Wang, X.; Wu, X.: Human action recognition via compressive-sensing-based dimensionality reduction. Optik (Stuttg). 126, 882–887 (2015)

  25. Lan, Z.; Huang, M.: Health assessment model and maintenance decision model for seawall prognostics and health management system. Arab. J. Sci. Eng. 44, 8377–8387 (2019)

  26. Tharwat, A.; Gaber, T.; Ibrahim, A.; Hassanien, A.E.: Linear discriminant analysis: a detailed tutorial. AI Commun. 30, 169–190 (2017)

  27. Zhang, D.; Zhou, Z.-H.; Chen, S.: Semi-supervised dimensionality reduction. In: Proceedings of the 2007 SIAM International Conference on Data Mining, pp. 629–634. SIAM (2007)

  28. Yang, J.; Zhang, D.; Yang, J.; Niu, B.: Globally maximizing, locally minimizing: unsupervised discriminant projection with applications to face and palm biometrics. IEEE Trans. Pattern Anal. Mach. Intell. 29, 650–664 (2007)

  29. Chen, X.; Yang, J.; Ye, Q.; Liang, J.: Recursive projection twin support vector machine via within-class variance minimization. Pattern Recognit. 44, 2643–2655 (2011)

  30. Dornaika, F.; El Traboulsi, Y.: Learning flexible graph-based semi-supervised embedding. IEEE Trans. Cybern. 46, 206–218 (2016)

  31. Ye, Q.L.; Zhao, C.X.; Zhang, H.F.; Chen, X.B.: Recursive “concave–convex” Fisher linear discriminant with applications to face, handwritten digit and terrain recognition. Pattern Recognit. 45, 54–65 (2012)

  32. Yang, J.; Yang, J.: Why can LDA be performed in PCA transformed space? Pattern Recognit. 36, 563–566 (2003)

  33. Zheng, W.; Zhao, L.; Zou, C.: An efficient algorithm to solve the small sample size problem for LDA. Pattern Recognit. 37, 1077–1079 (2004)

  34. Chen, L.-F.; Liao, H.-Y.M.; Ko, M.-T.; Lin, J.-C.; Yu, G.-J.: A new LDA-based face recognition system which can solve the small sample size problem. Pattern Recognit. 33, 1713–1726 (2000)

  35. Bian, W.; Tao, D.: Asymptotic generalization bound of Fisher’s linear discriminant analysis. IEEE Trans. Pattern Anal. Mach. Intell. 36, 2325–2337 (2014)

  36. Huang, Y.; Xu, D.; Nie, F.: Semi-supervised dimension reduction using trace ratio criterion. IEEE Trans. Neural Netw. Learn. Syst. 23, 519–526 (2012)

  37. Liu, T.; Tao, D.: Classification with noisy labels by importance reweighting. IEEE Trans. Pattern Anal. Mach. Intell. 38, 447–461 (2016)

  38. He, X.; Yan, S.; Hu, Y.; Niyogi, P.; Zhang, H.-J.: Face recognition using laplacianfaces. IEEE Trans. Pattern Anal. Mach. Intell. 27, 328–340 (2005)

  39. Cai, D.; He, X.; Han, J.: Semi-supervised Discriminant Analysis. In: 2007 IEEE 11th International Conference on Computer Vision, pp. 1–7. IEEE (2007)

  40. Tisseur, F.; Meerbergen, K.: The quadratic eigenvalue problem. SIAM Rev. 43, 235–286 (2001)

  41. Khemchandani, R.; Chandra, S.: Twin support vector machines for pattern classification. IEEE Trans. Pattern Anal. Mach. Intell. 29, 905–910 (2007)

  42. Nie, F.; Xu, D.; Li, X.; Xiang, S.: Semisupervised dimensionality reduction and classification through virtual label regression. IEEE Trans. Syst. Man, Cybern. Part B 41, 675–685 (2011). https://doi.org/10.1109/TSMCB.2010.2085433

  43. Ye, J.; Ji, S.; Chen, J.: Multi-class discriminant kernel learning via convex programming. J. Mach. Learn. Res. 9, 719–758 (2008)

Acknowledgements

The first author would like to thank the University of Mazandaran and Luleå University of Technology for providing her with the opportunity to pursue her PhD research.

Author information

Corresponding author

Correspondence to Ahmad Pourdarvish.

Appendices

Appendix A

To explain how a linear combination of features can be obtained, the following assumptions are made. Given a data matrix \(X = \{ x_{i} |x_{1} ,x_{2} , \ldots ,x_{l} \} \in R^{n \times l}\), where \(l\) and \(n\) are, respectively, the number of samples and the dimension of the data, LDA models the labeled data distribution by mapping each input sample \(x_{i} \in R^{n}\) from the \(n\)-dimensional space to a vector \(y_{i} \in R^{r}\) in a lower \(r\)-dimensional space [34, 35]. In the one-dimensional case, a scalar \(y_{i}\) is obtained by projecting the sample \(x_{i}\) onto a line

$$y_{i} = a^{T} x_{i} ,\;a \in R^{n} ,\;r \le n.$$
(A-1)

The optimal projection \(a\) is obtained by solving the following optimization problem

$$a_{\text{opt}}^{\text{LDA}} = \mathop {\arg \hbox{max} }\limits_{a} \frac{{{\text{tr}}\left( {a^{T} S_{\text{b}} a} \right)}}{{{\text{tr}}\left( {a^{T} S_{\text{w}} a} \right)}},$$
(A-2)

where \(a^{T} a = I\), and the within-class scatter matrix \(S_{\text{w}}\) and the between-class scatter matrix \(S_{\text{b}}\) are defined as

$$S_{\text{w}} = \frac{1}{N}\mathop \sum \limits_{i = 1}^{c} \mathop \sum \limits_{{x \in X_{i} }} \left( {x - \mu_{i} } \right)\left( {x - \mu_{i} } \right)^{T} = XL_{F} X^{T}$$
(A-3)
$$S_{\text{b}} = \frac{1}{N}\mathop \sum \limits_{i = 1}^{c} N_{i} \left( {\mu_{i} - \mu } \right)\left( {\mu_{i} - \mu } \right)^{T} .$$
(A-4)

Here, \(c\) is the number of classes, \(L_{F} = I - F\) is a graph Laplacian matrix, and \(F\) is an adjacency matrix whose element for a pair of samples is \(\frac{1}{{N_{i} }}\) if both samples belong to the same class \(i\) (with \(N_{i}\) members), and 0 otherwise. Also, \(\mu_{i}\) and \(\mu\) are the mean of the \(i\)th class and the mean of the labeled points, respectively.
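
As a concrete illustration (not the authors' implementation), the following sketch computes the within-class and between-class scatter matrices of Eqs. (A-3) and (A-4) directly from labeled data; the samples-as-rows layout and all names are assumptions made for readability.

```python
# Minimal sketch of Eqs. (A-3)/(A-4); illustrative only, not the published code.
import numpy as np

def lda_scatter_matrices(X, y):
    """X: (N, n) labeled samples as rows; y: (N,) integer class labels."""
    N, n = X.shape
    mu = X.mean(axis=0)                                  # mean of all labeled points
    S_w = np.zeros((n, n))                               # within-class scatter
    S_b = np.zeros((n, n))                               # between-class scatter
    for c in np.unique(y):
        Xc = X[y == c]                                   # samples of class c
        mu_c = Xc.mean(axis=0)
        diff = Xc - mu_c
        S_w += diff.T @ diff                             # sum of (x - mu_c)(x - mu_c)^T
        S_b += len(Xc) * np.outer(mu_c - mu, mu_c - mu)  # N_c (mu_c - mu)(mu_c - mu)^T
    return S_w / N, S_b / N
```

The Fisher criterion of Eq. (A-2) can then be maximized over these two matrices, for example through a generalized eigenvalue decomposition.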

Appendix B

SDA has been formulated as follows:

$$J_{S} \left( a \right) = \frac{{a^{T} S_{\text{b}} a}}{{a^{T} S_{\text{w}} a + \gamma a^{T} S_{L} a}} = \frac{{a^{T} S_{\text{b}} a}}{{a^{T} S_{\text{w}} a + J\left( a \right)}},$$
(B-1)

where \(a\) is an \(n\)-dimensional projection vector, and \(S_{\text{b}}\) and \(S_{\text{w}}\) are the between-class and within-class scatter matrices defined in Eqs. (A-4) and (A-3). Furthermore, \(J\left( a \right)\) is a regularization term learned from the labeled and unlabeled data, and \(S_{L} = XLX^{T}\) is a graph scatter matrix [30, 36, 37]. \(L = D - H\) is a Laplacian matrix, and \(D\) is a diagonal matrix whose entries are the column (or row, because \(H\) is symmetric) sums of \(H\). SDA minimizes both the within-class scatter of the labeled data and the local scatter of the labeled and unlabeled data, and simultaneously maximizes the between-class scatter of the labeled samples.

A p-nearest-neighbor graph \(G = \left( {V,E} \right)\) can be used to model the scatter matrices. The vertex set \(V = \left\{ {1,2, \ldots ,n} \right\}\) corresponds to the data points in \(X\), and the edge set \(E \subseteq V \times V\) represents the relationships between data points [38, 39]. Let \(H\) denote the similarity matrix over all data points, with \(H_{ij}\) the \(\left( {i,j} \right)\) element of \(H\). Each edge of the graph carries the weight \(H_{ij}\), which places an edge between nodes \(i\) and \(j\) if \(x_{i}\) and \(x_{j}\) are among each other's p-nearest neighbors. Thus, \(H_{ij}\) can be defined as follows:

$$H_{ij} = \left\{ {\begin{array}{*{20}l} {1,} \hfill & {{\text{if}}\;x_{i} \in N_{p} \left( {x_{j} } \right)\;{\text{or}}\;x_{j} \in N_{p} \left( {x_{i} } \right) } \hfill \\ {0,} \hfill & {\text{otherwise}} \hfill \\ \end{array} } \right.$$
(B-2)

where \(N_{p} \left( {x_{i} } \right)\) represents the set of p-nearest neighbors of \(x_{i}\). A Laplacian-style scatter matrix is then defined as follows:

$$\begin{aligned} S_{L} & = \frac{1}{2}\mathop \sum \limits_{i = 1}^{N} \mathop \sum \limits_{j = 1}^{N} H_{ij} \left( {x_{i} - x_{j} } \right)\left( {x_{i} - x_{j} } \right)^{T} \\ & = XLX^{T} \\ \end{aligned}$$
(B-3)
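
The adjacency matrix of Eq. (B-2) and the local scatter of Eq. (B-3) might be assembled as in the sketch below; Euclidean distances, the samples-as-columns convention for \(X\), and the function names are assumptions made for illustration.

```python
# Hedged sketch of the p-NN graph (Eq. B-2) and local scatter (Eq. B-3).
import numpy as np

def pnn_adjacency(X, p):
    """X: (n, N) data with samples as columns; returns the symmetric (N, N) adjacency H."""
    N = X.shape[1]
    diff = X[:, :, None] - X[:, None, :]        # (n, N, N) pairwise differences
    d2 = np.sum(diff ** 2, axis=0)              # squared Euclidean distances
    np.fill_diagonal(d2, np.inf)                # a point is not its own neighbor
    H = np.zeros((N, N))
    for i in range(N):
        H[i, np.argsort(d2[i])[:p]] = 1.0       # mark the p nearest neighbors of x_i
    return np.maximum(H, H.T)                   # H_ij = 1 if x_i in N_p(x_j) or x_j in N_p(x_i)

def local_scatter(X, H):
    """Graph Laplacian L = D - H and local scatter S_L = X L X^T."""
    D = np.diag(H.sum(axis=1))
    L = D - H
    return X @ L @ X.T
```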

According to [16], solving the SDA optimization problem is equivalent to solving the following generalized eigenvalue problem (GEP) [40]:

$$S_{\text{b}} a = \lambda \left( {S_{\text{w}} + \gamma XLX^{T} } \right)a.$$
(B-4)
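
A minimal way to obtain the SDA projection from Eq. (B-4) is to pass the two sides to a symmetric generalized eigensolver, as sketched below; the regularization value \(\gamma\), the small ridge term, and the number of retained directions are illustrative assumptions.

```python
# Illustrative solver for the GEP of Eq. (B-4); not the authors' implementation.
import numpy as np
from scipy.linalg import eigh

def sda_directions(S_b, S_w, X, L, gamma=1.0, r=2, ridge=1e-6):
    """Top-r directions a solving S_b a = lambda (S_w + gamma X L X^T) a (X: samples as columns)."""
    M = S_w + gamma * X @ L @ X.T
    M += ridge * np.eye(M.shape[0])     # keep the right-hand-side matrix positive definite
    vals, vecs = eigh(S_b, M)           # generalized symmetric eigenproblem
    idx = np.argsort(vals)[::-1][:r]    # keep the directions with the largest eigenvalues
    return vecs[:, idx]
```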

Appendix C

The RGLDA optimization problem has been defined as follows [41, 42]:

$$\begin{aligned} & \hbox{min} \frac{1}{2} a^{T} S_{D} a + \nu \sum \varepsilon_{i} \\ & \quad {\text{s}}.{\text{t}}\; \left| {a^{T} H_{i} } \right| + \varepsilon_{i} \ge 1,\;\varepsilon_{i} \ge 0 \\ \end{aligned}$$
(C-1)

where \(S_{D} = S_{\text{w}} + \gamma S_{L}\), \(\varepsilon_{i}\) is a slack variable that relaxes the hard constraints, and \(H = [ \left( {x_{1} - \mu } \right),\left( {x_{2} - \mu } \right), \ldots ,\left( {x_{l} - \mu } \right),\gamma \left( {x_{1} - m} \right),\gamma \left( {x_{2} - m} \right), \ldots ,\gamma \left( {x_{l + M} - m} \right)].\) Here, \(M\) is the number of unlabeled data. Gao et al. also constructed the following semi-supervised problem:

$$J_{\text{RG}} \left( a \right) = \frac{{a^{T} S_{\text{b}} a + \gamma a^{T} S_{N} a}}{{a^{T} S_{\text{w}} a + \gamma a^{T} S_{L} a}},$$
(C-2)

where \(\gamma\) denotes a regularization parameter. Equation (C-2) can be rewritten as the following semi-supervised problem:

$$J_{\text{RG}} \left( a \right) = \frac{{a^{T} S_{\text{w}} a + \gamma a^{T} S_{L} a}}{{a^{T} S_{T}^{{\prime }} a + \gamma a^{T} S_{T} a}},$$
(C-3)

where the relation between Eqs. (C-2) and (C-3) is given in Ref. [16]. The following optimization problem can then be solved

$$\begin{aligned} & \hbox{min} \frac{1}{2}a^{T} S_{D} a + \nu e^{T} \varepsilon \\ & \quad {\text{s}}.{\text{t}}\;Fa + \varepsilon \ge e,\;\varepsilon \ge 0 \\ \end{aligned}$$
(C-4)

where \(F = \{ {\text{sign}}\left( {a_{t}^{T} H_{1} } \right)H_{1} ,{\text{sign}}\left( {a_{t}^{T} H_{2} } \right)H_{2} , \ldots ,{\text{sign}}\left( {a_{t}^{T} H_{l + M} } \right)H_{l + M} \}^{T}\) and \(e\) is a column vector of ones with \(l + M\) dimensions. According to the Concave–Convex Procedure (CCP) [31, 43], Eq. (C-4) is solved iteratively: \(a_{t}\) is first obtained by minimizing (C-1) and is then replaced by the next iterate \(a_{t + 1}\). The solution of (C-4) can be conveniently obtained by solving the following Wolfe dual optimization formulation:

$$\begin{aligned} & \hbox{min} \frac{1}{2}\tau^{T} F(S_{D} )^{ - 1} F^{T} \tau - e^{T} \tau \\ & \quad {\text{s}}.{\text{t}}\;0 \le \tau \le \nu e. \\ \end{aligned}$$
(C-5)

It is easy to check that the optimization (C-4) can be emulated by a regularized SVM without a threshold term.

For this emulation, a new training set of \(l + M\) samples \(\left( {k_{1} ,y_{1} } \right),\left( {k_{2} ,y_{2} } \right), \ldots ,\left( {k_{l + M} ,y_{l + M} } \right)\) is needed, where \(k_{i} = H_{i}\) and \(y_{i} \in \left\{ { - 1, + 1} \right\}\), \(i = 1,2, \ldots ,l + M\), are pseudo-labels estimated by computing \({\text{sign}}\left( {a_{t}^{T} H_{i} } \right)\).
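
The CCP iteration behind Eqs. (C-4) and (C-5) can be sketched as follows; the use of a generic box-constrained optimizer for the dual, the ridge term, and the initialization are assumptions made for illustration and not the published implementation.

```python
# Rough CCP sketch for Eqs. (C-4)-(C-5): fix the signs with a_t, solve the
# box-constrained Wolfe dual, and recover the primal update a_{t+1}.
import numpy as np
from scipy.optimize import minimize

def ccp_rglda(H, S_D, nu=1.0, n_iter=10, ridge=1e-6):
    """H: (l+M, n) matrix with rows H_i; S_D = S_w + gamma * S_L; returns a projection vector a."""
    n = H.shape[1]
    S_inv = np.linalg.inv(S_D + ridge * np.eye(n))       # (S_D)^{-1} with a small ridge
    a = np.random.randn(n)                               # initial a_t
    e = np.ones(H.shape[0])
    for _ in range(n_iter):
        F = np.sign(H @ a)[:, None] * H                  # F_i = sign(a_t^T H_i) H_i
        Q = F @ S_inv @ F.T                              # dual Hessian of Eq. (C-5)

        def dual(tau):
            return 0.5 * tau @ Q @ tau - e @ tau         # Wolfe dual objective

        def dual_grad(tau):
            return Q @ tau - e

        res = minimize(dual, np.zeros(len(e)), jac=dual_grad,
                       method="L-BFGS-B", bounds=[(0.0, nu)] * len(e))
        a = S_inv @ F.T @ res.x                          # primal update a_{t+1} = S_D^{-1} F^T tau
    return a
```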

Appendix D

Proof of Eq. (5).

Assuming the samples are normalized to unit length, the angle-based local scatter \(S_{L}\) over pairs of samples \(x_{i}\) and \(x_{j}\) within a local neighborhood can be obtained as follows:

$$\begin{aligned} & \frac{1}{2}\mathop \sum \limits_{i,j = 1}^{n} H_{ij} \parallel x_{i} - x_{j} \parallel_{2}^{2} \\ & \quad = \frac{1}{2}\mathop \sum \limits_{i,j = 1}^{n} 2H_{ij} \left( {1 - \cos \left( {x_{i} ,x_{j} } \right)} \right) \\ & \quad = \mathop \sum \limits_{i,j = 1}^{n} H_{ij} \left( {1 - \frac{{x_{i}^{T} x_{j} }}{{\sqrt {x_{i}^{T} x_{i} } \sqrt {x_{j}^{T} x_{j} } }}} \right) \\ & \quad = \mathop \sum \limits_{i,j = 1}^{n} H_{ij} \left( {1 - \frac{{(a^{T} x_{i} )^{T} (a^{T} x_{j} )}}{{\sqrt {(a^{T} x_{i} )^{T} (a^{T} x_{i} )} \sqrt {(a^{T} x_{j} )^{T} (a^{T} x_{j} )} }}} \right) \\ & \quad = \mathop \sum \limits_{i,j = 1}^{n} H_{ij} \left( {1 - \frac{{(a^{T} x_{i} )^{T} }}{{\parallel a^{T} x_{i} \parallel }} \cdot \frac{{(a^{T} x_{j} )}}{{\parallel a^{T} x_{j} \parallel }}} \right) \\ & \quad = \mathop \sum \limits_{i,j = 1}^{n} H_{ij} \left( {1 - (a^{T} x_{i} )_{e}^{T} (a^{T} x_{j} )_{e} } \right) \\ & \quad = \mathop \sum \limits_{i,j = 1}^{n} H_{ij} \left( {1 - \cos \left( {(a^{T} x_{i} )_{e} ,(a^{T} x_{j} )_{e} } \right)} \right) \\ & \quad = a^{T} \left( {\frac{1}{2}\mathop \sum \limits_{i,j = 1}^{n} H_{ij} \left( {x_{i} - x_{j} } \right)_{e} \left( {x_{i} - x_{j} } \right)_{e}^{T} } \right)a \\ & \quad = {\text{tr}}\left[ {a^{T} S_{L}^{{\prime }} a} \right]. \\ \end{aligned}$$

Meanwhile, based on the graph, the above expression can be written as:

$$\begin{aligned} & = {\text{tr}}\left[ {X^{T} \left( {D - H} \right)X} \right] \\ & = {\text{tr}}\left[ {X^{T} DX} \right] - {\text{tr}}\left[ {X^{T} HX} \right] \\ & = {\text{tr}}\left[ {X^{T} L_{H} X} \right], \\ \end{aligned}$$

where \(D = {\text{diag}}\left( {d_{i} } \right)\), \(d_{i} = \mathop \sum \limits_{j} H_{ij}\), and \(L_{H} = D - H.\)
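
The identity underlying this proof can be verified numerically: for unit-normalized projected samples, the weighted sum of \(1 - \cos\) terms equals half the weighted sum of squared Euclidean distances. The sketch below is only an illustrative check with assumed names.

```python
# Numerical check of the angle/distance identity used in Appendix D.
import numpy as np

def angle_local_scatter(Y, H):
    """Y: (N, r) projected samples a^T x_i as rows (nonzero); H: (N, N) adjacency weights."""
    Ye = Y / np.linalg.norm(Y, axis=1, keepdims=True)    # unit-normalized rows, the (.)_e operation
    cos = Ye @ Ye.T                                      # pairwise cosines
    angle_form = np.sum(H * (1.0 - cos))                 # sum_ij H_ij (1 - cos)
    d2 = np.sum((Ye[:, None, :] - Ye[None, :, :]) ** 2, axis=-1)
    dist_form = 0.5 * np.sum(H * d2)                     # (1/2) sum_ij H_ij ||(.)_e - (.)_e||^2
    assert np.allclose(angle_form, dist_form)            # the two forms agree for unit vectors
    return angle_form
```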

Cite this article

Ramezani, Z., Pourdarvish, A. & Teymourian, K. A Novel Angle-Based Learning Framework on Semi-supervised Dimensionality Reduction in High-Dimensional Data with Application to Action Recognition. Arab J Sci Eng 45, 11051–11063 (2020). https://doi.org/10.1007/s13369-020-04869-w
