Robust local metric learning via least square regression regularization for scene recognition

doi:10.1016/j.neucom.2020.08.077

Neurocomputing

Volume 423, 29 January 2021, Pages 179-189

https://doi.org/10.1016/j.neucom.2020.08.077

Abstract

Metric learning plays an important role in various machine learning tasks. In particular, local metric learning is prevalent because it can learn more flexible metrics on complex datasets. However, it may not be robust for scene recognition because of the intra-class diversity and inter-class similarity of scene images. To address this issue, we propose a novel method, called robust local metric learning via least square regression regularization (RLML-LSR), to learn a more robust distance metric for scene recognition. We first formulate a local discriminative metric function that aims to pull same-class neighbors closer and simultaneously push different-class neighbors farther apart. Then, taking advantage of least square regression, we minimize the regression errors of same-class neighbors so that the local geometric structure is preserved as much as possible. Finally, the local discriminative metric function and the least square regression regularization are integrated into a unified framework, which jointly promotes the robustness of local metric learning and enhances the recognition performance on scene images. Extensive experiments on both natural scene and remote sensing scene datasets demonstrate the effectiveness and robustness of the proposed RLML-LSR method for scene recognition.

Introduction

Scene recognition plays an important role in computer vision because it helps reduce the semantic gap in scene understanding between human beings and computers [1]. Over the past few decades, various studies [2], [3], [4], [5], [6], [7], [8] have sought to improve the performance of scene recognition from different points of view. However, it remains a challenging problem because of the intra-class diversity and inter-class similarity of scene images. The challenge arises in both steps of scene recognition: the feature extraction step and the metric learning step.

For the feature extraction step, traditional methods are mostly based on low-level features or mid-level semantic representations. Methods based on low-level features directly extract the basic visual features of scene images, while methods based on mid-level features tend to learn a holistic scene representation from high-order statistical information. The lack of a more discriminative and abstract scene representation greatly restricts the recognition performance of these methods. Recently, with powerful convolutional neural networks (CNNs) and large-scale training datasets becoming available, deep learning based methods [9], [10], [11], [12] have achieved prominent performance in scene recognition. They hierarchically extract more abstract and representative features from scene images, thereby improving recognition performance to a large degree. Nevertheless, for practical scene recognition tasks, it is hard to fully train a new CNN model from scratch. For this reason, researchers transfer CNN models pre-trained on large-scale datasets to scene recognition.

For the metric learning step, how to learn an appropriate distance metric has been comprehensively surveyed in recent years, yet it remains an open problem. The essence of metric learning is to find a projection that transforms the original samples into a more discriminative metric space. Because a single global metric is not suitable for all training samples and leads to unsatisfactory performance, more researchers have turned to local metric learning. Representative local metric learning methods include neighborhood component analysis (NCA) [13], large margin nearest neighbor (LMNN) [14], local discriminative distance metrics (LDDM) [15], large margin local metric learning (LMLML) [16], and local metric learning with eigenvectors (MLEV-L) [17]. While these methods have shown promising performance for scene recognition, most of them suffer from over-fitting due to the high similarity between scenes of different classes, such as bedroom and living room. Alternatively, other researchers have combined local metric learning with global metric learning to learn a more reasonable distance metric. Liong et al. [18] proposed a regularized local metric learning (RLML) method, which combines global and local metrics to represent the intra-class and inter-class variances. Zhang and Zhao [19] explored an integrated global–local metric learning (IGLML) method, where the local metrics are combined with the global metric according to their posterior probabilities under a GMM. However, these methods ignore the robustness of local metric learning and thereby fail to capture the intra-class and inter-class variation in scene images. It is therefore important to develop a robust local metric learning method for scene recognition, one that alleviates the intra-class diversity and inter-class similarity and hence enhances the recognition performance on scene images.
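To make the projection view of metric learning concrete, the following minimal Python sketch (an illustration added here, not taken from the paper) checks that a Mahalanobis-type distance with metric matrix M = L^T L equals the squared Euclidean distance after projecting the samples with L. The matrix L, its 5-to-2 dimensions, and the random inputs are hypothetical.

```python
import numpy as np

def projected_sq_distance(x, y, L):
    """Squared Euclidean distance between x and y after the linear map L."""
    diff = L @ (x - y)
    return float(diff @ diff)

rng = np.random.default_rng(0)
L = rng.normal(size=(2, 5))   # hypothetical projection from 5-D to 2-D
M = L.T @ L                   # induced positive semidefinite metric matrix
x = rng.normal(size=5)
y = rng.normal(size=5)

d_proj = projected_sq_distance(x, y, L)
d_metric = float((x - y) @ M @ (x - y))   # (x - y)^T M (x - y)

# Both expressions describe the same distance, up to floating-point error.
print(np.isclose(d_proj, d_metric))
```

Learning the metric therefore amounts to learning L (or M). A global method would learn one such matrix for all samples, whereas local methods adapt the metric to neighborhoods of the data.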

Motivated by the above, in this paper we take pre-trained deep features as the scene representation and focus on a robust local metric learning method via least square regression regularization for scene recognition. Specifically, given the pre-trained deep features, we first formulate a local discriminative metric function that takes all label information into consideration, pulling same-class neighbors closer and simultaneously pushing different-class neighbors farther apart. In this way, the corresponding low-dimensional discriminative features can be extracted from the original deep features. Then, taking advantage of least square regression, we minimize the regression error between the original deep features and the corresponding low-dimensional discriminative features, so that the local geometric structure is preserved as much as possible. Finally, the local discriminative metric function and the least square regression regularization are integrated into a unified framework. By doing so, the least square regression regularizes the local metric learning and, in turn, the local metric learning guides the least square regression, which promotes the robustness of local metric learning and enhances the recognition performance on scene images. As shown in Fig. 1, compared to the original deep features, the robust local metric learning makes samples of the same class more compact and different classes more separable.
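The three ingredients described above can be sketched as follows. This is only an illustrative stand-in under stated assumptions, not the RLML-LSR objective itself, whose exact form is not reproduced in this section: the function name local_objective, the neighbor count k, the weight lam, the use of a single linear projection W, and the particular form of the regression term are all assumptions made for the example.

```python
import numpy as np

def local_objective(X, labels, W, k=2, lam=0.1):
    """Hypothetical pull/push loss with a least-squares regularizer.

    X: (n, d) features, labels: (n,) class ids, W: (d, r) projection.
    Not the paper's objective; a hedged illustration only.
    """
    Z = X @ W                                    # low-dimensional features
    # Pairwise squared distances in the projected space.
    sq = np.sum((Z[:, None, :] - Z[None, :, :]) ** 2, axis=-1)
    same = labels[:, None] == labels[None, :]
    pull = 0.0
    push = 0.0
    for i in range(len(X)):
        order = np.argsort(sq[i])
        order = order[order != i]                # exclude the sample itself
        same_nb = order[same[i, order]][:k]      # k nearest same-class neighbors
        diff_nb = order[~same[i, order]][:k]     # k nearest other-class neighbors
        pull += sq[i, same_nb].sum()
        push += sq[i, diff_nb].sum()
    # Assumed regression term: least-squares fit of X from Z, then its residual.
    P, *_ = np.linalg.lstsq(Z, X, rcond=None)
    reg = np.sum((X - Z @ P) ** 2)
    return float(pull - push + lam * reg)

rng = np.random.default_rng(0)
# Two well-separated toy clusters standing in for deep features.
X = np.vstack([rng.normal(0.0, 0.1, size=(4, 3)),
               rng.normal(10.0, 0.1, size=(4, 3))])
W = np.eye(3)[:, :2]                             # hypothetical 3-D to 2-D projection
good = np.array([0, 0, 0, 0, 1, 1, 1, 1])        # labels that match the clusters
bad = np.array([0, 1, 0, 1, 0, 1, 0, 1])         # labels that cut across clusters

loss_good = local_objective(X, good, W)
loss_bad = local_objective(X, bad, W)
print(loss_good < loss_bad)
```

In this toy setting the loss is lower when same-class neighbors are close and different-class neighbors are far, which is the behavior the pull and push terms are meant to reward. The regression term does not depend on the labels and acts only as a regularizer on the projection.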

The rest of the paper is organized as follows. Related works are presented in Section 2. Section 3 introduces the proposed robust local metric learning via least square regression regularization. Experimental results are given in Section 4. Section 5 concludes this paper.

Section snippets

Related works

In this section, we briefly review two related topics: scene recognition and metric learning.

Robust local metric learning via least square regression regularization

In this section, we first introduce the proposed RLML-LSR method in detail, followed by convergence analysis as well as computational complexity analysis.

Experiments

To validate the effectiveness and robustness of the proposed RLML-LSR method for scene recognition, we conduct experiments on both natural scene and remote sensing scene datasets. First, we describe the experimental datasets and setup. Second, we conduct a parameter analysis to select the optimal parameters. Third, we compare our method with related metric learning methods as well as state-of-the-art recognition methods. Finally, we conduct a convergence study to demonstrate the efficiency of the

Conclusion

In this paper, taking advantage of least square regression, we propose a robust local metric learning method for scene recognition. We formulate a local discriminative metric function to extract low-dimensional discriminative features from the original deep features. In addition, minimizing the regression errors of same-class neighbors enables us to preserve the local geometric structure as much as possible. Moreover, the local discriminative metric function and least square regression

CRediT authorship contribution statement

Chen Wang: Methodology, Software, Writing - original draft. Guohua Peng: Visualization, Supervision. Wei Lin: Validation, Writing - review & editing.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

The authors would like to thank Fei-Fei Li, Svetlana Lazebnik, and Shawn Newsam, who generously provided their UIUC-8 dataset, Scene-15 dataset and UCM-21 dataset.

Chen Wang received the M.S. degree in School of Science from Northwestern Polytechnical University, Xi’an, China in March 2018. She is currently pursuing the Ph.D. degree in School of Science of Northwestern Polytechnical University, China. Her research interests include image processing, scene recognition, and metric learning.

References (41)

Z. Zuo et al. Exemplar based deep discriminative and shareable feature learning for scene image classification. Pattern Recognition (2015)
L. Xie et al. Improved spatial pyramid matching for scene recognition. Pattern Recognition (2018)
S. Liu et al. A novel scene classification model combining ResNet based transfer learning and data augmentation with a filter. Neurocomputing (2019)
P. Tang et al. G-MS2F: GoogLeNet based multi-stage feature fusion of deep CNN for scene recognition. Neurocomputing (2017)
X. Cheng et al. Scene recognition with objectness. Pattern Recognition (2018)
Y. Mu et al. Local discriminative distance metrics ensemble learning. Pattern Recognition (2013)
D. Li et al. Global and local metric learning via eigenvectors. Knowledge-Based Systems (2017)
V.E. Liong et al. Regularized local metric learning for person re-identification. Pattern Recognition Letters (2015)
S. Xiang et al. Learning a Mahalanobis distance metric for data clustering and classification. Pattern Recognition (2008)
L. Zhang et al. Multiview discriminative marginal metric learning for makeup face verification. Neurocomputing (2019)
G.B. Huang et al. Extreme learning machine: Theory and applications. Neurocomputing (2006)
C. Zheng et al. Multicriteria-based active discriminative dictionary learning for scene recognition. IEEE Access (2018)
N. Dalal et al. Histograms of oriented gradients for human detection
J. Yang et al. Linear spatial pyramid matching using sparse coding for image classification
L.J. Li, H. Su, Fei-Fei. Li, E.P. Xing, Object bank: a high-level image representation for scene classification and...
B. Zhou et al. Places: A 10 million image database for scene recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence (2018)
B. Zhou et al. Learning deep features for scene recognition using places database
L. Herranz et al. Scene recognition with CNNs: Objects, scales and dataset bias
J. Goldberger et al. Neighbourhood components analysis
K.Q. Weinberger et al. Distance metric learning for large margin nearest neighbor classification. Journal of Machine Learning Research (2009)

Guohua Peng received the Ph.D. degree from Northwestern Polytechnical University, China in 1993. Currently, he is a professor in School of Science of Northwestern Polytechnical University. His major research interests are CAGD, computer graphics, and image processing.

Wei Lin received the Ph.D. degree from Northwestern Polytechnical University, China in 2007. Currently, she is an associate professor in School of Science of Northwestern Polytechnical University. Her research interests are image processing and scene recognition.
