Applied Soft Computing

Volume 85, December 2019, 105803

Extensions to rank-based prototype selection in k-Nearest Neighbour classification

https://doi.org/10.1016/j.asoc.2019.105803

Highlights

  • New Prototype Selection methods based on ranking strategies.

  • Parameter-free strategies for Prototype Selection.

  • Robustness against label-level noise.

  • Competitive with respect to existing Prototype Selection strategies.

Abstract

The k-nearest neighbour rule is commonly considered for classification tasks given its straightforward implementation and good performance in many applications. However, its efficiency represents an obstacle in real-case scenarios because classification requires computing the distance to every single prototype of the training set. Prototype Selection (PS) is a typical approach to alleviate this problem, which focuses on reducing the size of the training set by selecting the most interesting prototypes. In this context, rank methods have been postulated as a good solution: following some heuristics, these methods order the prototypes according to their relevance in the classification task, and this ordering is then used to select the most relevant ones. This work presents a significant improvement of existing rank methods by proposing two extensions: (i) greater robustness against noise at the label level, achieved by considering the parameter ‘k’ of the classification in the selection process; and (ii) a new parameter-free rule to select the prototypes once they have been ordered. Experiments performed on different scenarios and datasets demonstrate the effectiveness of these extensions. It is also empirically shown that the new full approach is competitive with respect to existing PS algorithms.

Introduction

The k-Nearest Neighbour (kNN) rule is one of the best-known algorithms in the supervised classification field [1]. Its wide popularity comes from both its conceptual simplicity and its good results when categorizing a prototype with respect to its k nearest prototypes of the training set [2]. In spite of its longevity, it is still the subject of ongoing research [3], [4], [5]. However, since no classification model is generated out of the training data, this algorithm generally exhibits low efficiency in both memory consumption and computational cost.
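For reference, the rule just described can be sketched in a few lines. The following is our own minimal illustration, not the paper's code; the function name, data layout, and the choice of Euclidean distance are our assumptions:

```python
import numpy as np

def knn_classify(query, prototypes, labels, k=3):
    """Majority vote among the k nearest prototypes (illustrative sketch).

    Note the cost that motivates PS: every call computes a distance to
    every prototype of the training set.
    """
    dists = np.linalg.norm(prototypes - query, axis=1)  # distance to all prototypes
    nearest = np.argsort(dists)[:k]                     # indices of the k closest
    values, counts = np.unique(labels[nearest], return_counts=True)
    return values[np.argmax(counts)]                    # majority label among the k
```

For a single query `x`, `knn_classify(x, X_train, y_train, k=5)` scans all of `X_train`; with no model built in advance, this linear scan is exactly the memory and time burden discussed above.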

These shortcomings have been widely analysed in the literature, where three different families of solutions have been proposed:

  • (i)

    Fast Similarity Search (FSS) methods, which base their performance on the creation of search models for fast prototype retrieval in the training set [6], as for example the Approximating and Eliminating Search Algorithm (AESA) family of algorithms [7].

  • (ii)

    Approximate Similarity Search (ASS) algorithms, which work on the premise of searching for prototypes sufficiently similar to a given query in the training set, at the cost of slightly decreasing the classification accuracy [8]; see, for instance, the methods in [9], [10].

  • (iii)

    Data Reduction (DR) techniques, which pre-process the training set in order to reduce its size without affecting the quality of the classification [11].

In this work we shall focus on the latter family of methods, i.e., those that reduce the size of the training set by pre-processing it.

DR can be broadly divided into two different approaches: Prototype Generation (PG) [12] and Prototype Selection (PS) [13]. The former builds a new training set with artificial prototypes that represent the same information more efficiently, while the latter simply selects the most interesting prototypes of the initial training set. PS strategies are more general as regards data representation because they do not require knowing how the feature space is encoded [14], only the distance values among the prototypes in the set. We therefore focus on this family of strategies.
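This representation-independence is easy to see in code: a PS method can be written against a precomputed distance matrix alone. As a concrete, non-authoritative illustration (all names are ours), here is a minimal sketch of Wilson-style editing, a classical PS filter, operating purely over pairwise distances:

```python
import numpy as np

def wilson_editing(dist_matrix, labels, k=3):
    """Wilson-style editing written purely over pairwise distances.

    dist_matrix[i, j] is the distance between prototypes i and j; no access
    to the underlying feature encoding is needed. Labels are assumed to be
    integers 0..C-1. Returns the indices of the retained prototypes.
    """
    n = len(labels)
    keep = []
    for i in range(n):
        order = np.argsort(dist_matrix[i])
        neighbours = [j for j in order if j != i][:k]   # k nearest, excluding i itself
        majority = np.bincount(labels[neighbours]).argmax()
        if majority == labels[i]:                       # keep i only if its neighbourhood agrees
            keep.append(i)
    return np.array(keep, dtype=int)
```

The same function works for vectors, strings, trees, or any other structure, as long as a pairwise distance can be computed beforehand; this is precisely the generality argued for above.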

Over the last decades, there have been a number of proposals for performing PS, which are reviewed in detail in the next section. Recently, rank-based approaches have been proposed, which order the prototypes of the training set according to their relevance to the success of the classification task. That is, prototypes are ranked following some criteria, after which they are selected according to the established order [15].

Among the current rank methods we identify two main drawbacks. The first is that, so far, the process does not take the possible noise at the label level into account. It is true that kNN classification is robust to this type of phenomenon thanks to the parameter ‘k’, which tends to soften the impact of such noise by taking more neighbours into account when classifying. However, PS methods run before the classification process, so this robustness to noise may be lost if the PS algorithm ignores the value of ‘k’ that will eventually be used. Thus, in this work we extend the current rank methods so that they also consider ‘k’ during the selection of the prototypes. The second drawback is that these methods require an extra parameter to be fixed, which regulates how many prototypes are finally selected. Given this, we also extend these rank methods to avoid the need for tuning this parameter, so that the selection criterion depends exclusively on the data itself. As will be seen in the experiments, these extensions provide higher robustness to noise, as well as competitive results in the trade-off between accuracy and efficiency, thus establishing the new procedures as a successful alternative to the PS methods proposed to date.
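Section 3 defines the actual heuristics; purely to fix ideas beforehand, the following hypothetical skeleton shows how the two extensions fit together. The voting rule and the cut criterion below are our illustrative stand-ins, not the authors' exact formulations:

```python
import numpy as np

def rank_and_select(dist_matrix, labels, k=3):
    """Hypothetical skeleton of a k-aware, parameter-free rank method.

    This is NOT the paper's algorithm; it only illustrates the two ideas:
      (i)  votes are cast using the same k the final classifier will use;
      (ii) the cut point is derived from the votes themselves rather than
           from a user-tuned fraction of the training set.
    Labels are assumed to be integers 0..C-1.
    """
    n = len(labels)
    votes = np.zeros(n)
    for i in range(n):
        order = np.argsort(dist_matrix[i])
        neighbours = [j for j in order if j != i][:k]
        # (i) k-aware voting: if the k neighbours classify prototype i
        # correctly, each of them receives a vote.
        if np.bincount(labels[neighbours]).argmax() == labels[i]:
            votes[neighbours] += 1.0
    ranking = np.argsort(-votes)        # most-voted prototypes first
    # (ii) one plausible data-driven cut: keep prototypes whose vote count
    # exceeds the mean (a stand-in for the paper's parameter-free rule).
    n_keep = int(np.count_nonzero(votes > votes.mean()))
    return ranking[:n_keep]
```

The key point of (i) is that the selection stage and the classification stage share the same neighbourhood size, so label noise that the classifier would average away is not baked into the reduced set; the key point of (ii) is that no retention percentage has to be tuned by the user.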

The rest of the article is structured as follows: Section 2 introduces previous attempts to PS, including those concerning rank methods. Section 3 describes our new strategy to extend previous rank methods. Section 4 presents the different data collections, evaluation metrics, and alternative PS strategies to benchmark with. Experimental evidence of the goodness of the proposed approach is given in Section 5 through a series of experiments and analyses. Finally, Section 6 outlines the main conclusions as well as promising lines for future work.

Section snippets

Background

Given that the work is framed in the context of PS, this section provides some background in this regard.

PS techniques aim at reducing the size of a given training set while maintaining (or increasing) the accuracy of the classifier as much as possible. To achieve this goal, these techniques select the most promising prototypes of the training set and discard the rest. Formally, let T denote an initial training set; PS seeks a reduced set S ⊆ T.
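In common notation, and consistent with this definition, the goal can be stated as follows. This is our paraphrase of the standard PS objective; the paper's exact criterion is developed in the remainder of this section:

```latex
% Standard PS objective (our paraphrase): choose the subset S of the
% training set T that maximizes the accuracy of the kNN classifier built
% on S, while being substantially smaller than T.
\begin{equation*}
  S^{*} \;=\; \operatorname*{arg\,max}_{S \subseteq T}
  \ \mathrm{acc}\!\left(k\mathrm{NN}_{S}\right)
  \quad \text{subject to} \quad \lvert S \rvert \ll \lvert T \rvert
\end{equation*}
```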

Typically, the accuracy

Extensions to rank methods for prototype selection

This section describes the proposed extensions to improve the rank methods for PS. For the sake of clarity, we first introduce the basic notions of the aforementioned rank methods on which the modifications are performed. After that, we present the different proposals to improve their robustness against noisy instances. Finally, we explain the selection rule proposed to avoid the need for manual tuning.

Experimental setup

In this section we present the configuration of the experiments carried out to evaluate the proposed improvements: the considered datasets, the set of PS algorithms used to comparatively assess the performance of the proposed algorithms, and the evaluation protocol.

Results

In order to comprehensively evaluate our proposals, the experimental results are presented in two different ways. First, we compare the classical rank methods with those including the proposal to improve the process in noisy environments. That is, we compare the classical rank-based PS algorithms, which always assumed k=1, with the new voting approach that considers the same k for both the selection and the classification processes. Then, we also carry out an exhaustive comparison of

Conclusions and future work

In this paper we present extensions to some classical rank methods for PS based on voting heuristics. The first extension focuses on improving the tolerance of the reduced set to noisy data by considering the parameter ‘k’ of the classifier in the voting strategies. Additionally, a self-guided criterion is proposed for the actual selection, which eliminates the need for tuning the external user parameter that the classical methods require.

We conduct experiments with several datasets and report the

Declaration of Competing Interest

No author associated with this paper has disclosed any potential or pertinent conflicts which may be perceived to have impending conflict with this work. For full disclosure statements refer to https://doi.org/10.1016/j.asoc.2019.105803.

Acknowledgements

This work is supported by the Spanish Ministry HISPAMUS project TIN2017-86576-R, partially funded by the EU.

References (47)

  • T. Cover et al., Nearest neighbor pattern classification, IEEE Trans. Inf. Theory (1967)

  • Ö.F. Ertuğrul et al., A novel version of k nearest neighbor: Dependent nearest neighbor, Appl. Soft Comput. (2017)

  • S. Zhang et al., Efficient kNN classification with different numbers of nearest neighbors, IEEE Trans. Neural Netw. Learn. Syst. (2017)

  • P. Jain, B. Kulis, I.S. Dhillon, K. Grauman, Online metric learning and fast similarity search, in: Advances in neural...

  • E.V. Ruiz, An algorithm for finding nearest neighbours in (approximately) constant average time, Pattern Recognit. Lett. (1986)

  • J. Wang, H.T. Shen, J. Song, J. Ji, Hashing for similarity search: A survey, arXiv preprint arXiv:1408.2927, 2014, ...

  • S. Ougiaroglou et al., Fast and accurate k-nearest neighbor classification using prototype selection by clustering

  • S. García et al.

  • I. Triguero et al., A taxonomy and experimental study on prototype generation for nearest neighbor classification, IEEE Trans. Syst. Man Cybernet. Part C (Appl. Rev.) (2012)

  • S. Garcia et al., Prototype selection for nearest neighbor classification: Taxonomy and empirical study, IEEE Trans. Pattern Anal. Mach. Intell. (2012)

  • J. Calvo-Zaragoza et al., Prototype generation on structural data using dissimilarity space representation, Neural Comput. Appl. (2017)

  • K. Deb, Multi-objective optimization

  • P. Hart, The condensed nearest neighbor rule (Corresp.), IEEE Trans. Inform. Theory (1968)