Robust local metric learning via least square regression regularization for scene recognition
Introduction
Scene recognition plays an important role in computer vision because it helps narrow the semantic gap in scene understanding between humans and computers [1]. Over the past few decades, numerous studies [2], [3], [4], [5], [6], [7], [8] have sought to improve scene recognition performance from different points of view. However, it remains a challenging problem because of intra-class diversity and inter-class similarity in scene images. Scene recognition consists of two steps, feature extraction and metric learning, and each poses its own challenges.
For the feature extraction step, traditional methods are mostly based on low-level features or mid-level semantic representations. Methods based on low-level features directly extract basic visual features from scene images, while methods based on mid-level features learn a holistic scene representation from high-order statistical information. The lack of a more discriminative and abstract scene representation greatly restricts the recognition performance of these methods. Recently, with powerful convolutional neural networks (CNNs) and large-scale training datasets becoming available, deep learning based methods [9], [10], [11], [12] have achieved prominent performance in scene recognition. They hierarchically extract more abstract and representative features from scene images, thereby improving recognition performance considerably. Nevertheless, for practical scene recognition tasks, it is hard to fully train a new CNN model from scratch, which is why researchers transfer CNN models pre-trained on large-scale datasets to scene recognition.
For the metric learning step, how to learn an appropriate distance metric has been studied extensively in recent years, yet it remains an open problem. The essence of metric learning is to find a projection that transforms the original samples into a more discriminative metric space. Because a single global metric does not suit all training samples and yields unsatisfactory performance, more researchers have turned to local metric learning. Representative local metric learning methods include neighborhood component analysis (NCA) [13], large margin nearest neighbor (LMNN) [14], local discriminative distance metrics (LDDM) [15], large margin local metric learning (LMLML) [16], and local metric learning with eigenvectors (MLEV-L) [17]. While these methods have shown promising performance for scene recognition, most of them suffer from over-fitting because of the high similarity between scenes of different classes, such as bedroom and living room. Alternatively, other researchers have combined local metric learning with global metric learning to obtain a more reasonable distance metric. Liong et al. [18] proposed a regularized local metric learning (RLML) method, which combines global and local metrics to represent the intra-class and inter-class variances. Zhang and Zhao [19] explored an integrated global–local metric learning (IGLML) method, in which the local metrics are combined with the global metric according to their posterior probabilities under a GMM. However, these methods ignore the robustness of local metric learning and therefore fail to estimate the intra-class and inter-class variation in scene images. It is thus important to develop a robust local metric learning method for scene recognition, one that alleviates intra-class diversity and inter-class similarity and hence enhances the recognition performance on scene images.
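The view of metric learning as a projection can be illustrated with a minimal sketch. It assumes a linear projection matrix W (hypothetical here, randomly generated rather than learned) and shows that the squared Euclidean distance after projecting with W equals a Mahalanobis-type distance under the matrix M = W W^T. The function names and dimensions are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def projected_distance_sq(x, y, W):
    """Squared Euclidean distance between x and y after the linear map z = W^T x."""
    diff = W.T @ (x - y)
    return float(diff @ diff)

def mahalanobis_sq(x, y, M):
    """Squared Mahalanobis-type distance (x - y)^T M (x - y)."""
    d = x - y
    return float(d @ M @ d)

rng = np.random.default_rng(0)
W = rng.normal(size=(5, 2))   # hypothetical projection from 5-D to 2-D (not learned)
M = W @ W.T                   # induced positive semidefinite metric matrix
x, y = rng.normal(size=5), rng.normal(size=5)

d_proj = projected_distance_sq(x, y, W)
d_mah = mahalanobis_sq(x, y, M)
print(d_proj, d_mah)          # the two values agree up to floating-point error
```

Learning the metric then amounts to choosing W (or M) so that same-class samples end up close and different-class samples end up far apart in the projected space.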
Motivated by the above, in this paper we take pre-trained deep features as the scene representation and focus on a robust local metric learning method via least square regression regularization for scene recognition. Specifically, given the pre-trained deep features, we first formulate a local discriminative metric function that takes all label information into consideration, pulling same-class neighbors closer and pushing different-class neighbors farther away simultaneously. The corresponding low-dimensional discriminative features can thus be extracted from the original deep features. Then, taking advantage of least square regression, we minimize the regression error between the original deep features and the corresponding low-dimensional discriminative features, so that the local geometric structure is preserved as much as possible. Finally, the local discriminative metric function and the least square regression regularization are integrated into a unified framework. In this way, the least square regression regularizes the local metric learning and the local metric learning in turn guides the least square regression, which promotes the robustness of local metric learning and enhances the recognition performance on scene images. As shown in Fig. 1, compared with the original deep features, the robust local metric learning makes samples of the same class more compact and samples of different classes more separable.
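The three ingredients described above (a pull term over same-class neighbors, a push term over different-class neighbors, and a least square regression term linking the deep features to the low-dimensional features) can be sketched as follows. This excerpt does not give the paper's equations, so the objective below is only an assumed stand-in: the form pull − mu·push + lam·||XW − Y||², the weights mu, lam, and gamma, the k-nearest-neighbor graphs, and all function names are hypothetical, and the ridge-style closed-form update for W is a standard least squares solution rather than the authors' algorithm.

```python
import numpy as np

def neighbor_graphs(X, labels, k=2):
    """Binary adjacency of the k nearest same-class and different-class neighbors."""
    n = X.shape[0]
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    same, diff = np.zeros((n, n)), np.zeros((n, n))
    for i in range(n):
        order = np.argsort(d2[i])
        order = order[order != i]                      # exclude the sample itself
        s = [j for j in order if labels[j] == labels[i]][:k]
        t = [j for j in order if labels[j] != labels[i]][:k]
        same[i, s] = 1.0
        diff[i, t] = 1.0
    return same, diff

def local_term(Y, A):
    """Sum of squared distances between rows of Y that are linked in adjacency A."""
    d2 = ((Y[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return float((A * d2).sum())

def objective(X, Y, W, same, diff, mu=0.5, lam=1.0):
    """Assumed objective: pull - mu * push + lam * regression error."""
    pull = local_term(Y, same)                         # same-class neighbors should be close
    push = local_term(Y, diff)                         # different-class neighbors should be far
    reg = float(np.linalg.norm(X @ W - Y) ** 2)        # least square regression error
    return pull - mu * push + lam * reg

def fit_projection(X, Y, gamma=1e-3):
    """Ridge-regularized least squares: W = (X^T X + gamma I)^{-1} X^T Y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + gamma * np.eye(d), X.T @ Y)

rng = np.random.default_rng(0)
X = rng.normal(size=(12, 6))                           # stand-in for pre-trained deep features
labels = np.array([0] * 6 + [1] * 6)
Y = rng.normal(size=(12, 2))                           # stand-in low-dimensional features
same, diff = neighbor_graphs(X, labels, k=2)
W_fit = fit_projection(X, Y)
err_fit = float(np.linalg.norm(X @ W_fit - Y) ** 2)
J = objective(X, Y, W_fit, same, diff)
```

In an alternating scheme of this kind, one would update Y to decrease the local term with W fixed and then refit W by least squares with Y fixed; whether the paper's optimization follows this pattern is not stated in this excerpt.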
The rest of the paper is organized as follows. Related work is presented in Section 2. Section 3 introduces the proposed robust local metric learning via least square regression regularization. Experimental results are given in Section 4. Section 5 concludes the paper.
Related works
In this section, we briefly review two related topics: scene recognition and metric learning.
Robust local metric learning via least square regression regularization
In this section, we first introduce the proposed RLML-LSR method in detail, followed by convergence analysis as well as computational complexity analysis.
Experiments
To validate the effectiveness and robustness of the proposed RLML-LSR method for scene recognition, we conduct experiments on both natural scene and remote sensing scene datasets. First, we describe the experimental datasets and setup. Second, we conduct a parameter analysis to select the optimal parameters. Third, we compare our method with related metric learning methods as well as state-of-the-art recognition methods. Finally, we conduct a convergence study to demonstrate the efficiency of the
Conclusion
In this paper, taking advantage of least square regression, we propose a robust local metric learning method for scene recognition. We formulate a local discriminative metric function to extract low-dimensional discriminative features from the original deep features. In addition, minimizing the regression errors of same-class neighbors enables us to preserve the local geometric structure as much as possible. Moreover, the local discriminative metric function and least square regression
CRediT authorship contribution statement
Chen Wang: Methodology, Software, Writing - original draft. Guohua Peng: Visualization, Supervision. Wei Lin: Validation, Writing - review & editing.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgments
The authors would like to thank Fei-Fei Li, Svetlana Lazebnik, and Shawn Newsam, who generously provided their UIUC-8, Scene-15, and UCM-21 datasets.
References (41)
- et al. Exemplar based deep discriminative and shareable feature learning for scene image classification. Pattern Recognition (2015)
- et al. Improved spatial pyramid matching for scene recognition. Pattern Recognition (2018)
- et al. A novel scene classification model combining ResNet based transfer learning and data augmentation with a filter. Neurocomputing (2019)
- et al. G-MS2F: GoogLeNet based multi-stage feature fusion of deep CNN for scene recognition. Neurocomputing (2017)
- et al. Scene recognition with objectness. Pattern Recognition (2018)
- et al. Local discriminative distance metrics ensemble learning. Pattern Recognition (2013)
- et al. Global and local metric learning via eigenvectors. Knowledge-Based Systems (2017)
- et al. Regularized local metric learning for person re-identification. Pattern Recognition Letters (2015)
- et al. Learning a Mahalanobis distance metric for data clustering and classification. Pattern Recognition (2008)
- et al. Multiview discriminative marginal metric learning for makeup face verification. Neurocomputing (2019)
- Extreme learning machine: Theory and applications. Neurocomputing
- Multicriteria-based active discriminative dictionary learning for scene recognition. IEEE Access
- Histograms of oriented gradients for human detection
- Linear spatial pyramid matching using sparse coding for image classification
- Places: A 10 million image database for scene recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence
- Learning deep features for scene recognition using places database
- Scene recognition with CNNs: Objects, scales and dataset bias
- Neighbourhood components analysis
- Distance metric learning for large margin nearest neighbor classification. Journal of Machine Learning Research
Chen Wang received the M.S. degree in School of Science from Northwestern Polytechnical University, Xi’an, China in March 2018. She is currently pursuing the Ph.D. degree in School of Science of Northwestern Polytechnical University, China. Her research interests include image processing, scene recognition, and metric learning.
Guohua Peng received the Ph.D. degree from Northwestern Polytechnical University, China in 1993. Currently, he is a professor in School of Science of Northwestern Polytechnical University. His major research interests are CAGD, computer graphics, and image processing.
Wei Lin received the Ph.D. degree from Northwestern Polytechnical University, China in 2007. Currently, she is an associate professor in School of Science of Northwestern Polytechnical University. Her research interests are image processing and scene recognition.