Information Sciences, Volume 558, May 2021, Pages 229-245

Cost-sensitive positive and unlabeled learning

https://doi.org/10.1016/j.ins.2021.01.002

Abstract

Positive and Unlabeled learning (PU learning) aims to train a binary classifier solely from positively labeled and unlabeled data when negatively labeled data are absent or distributed too diversely. However, none of the existing PU learning methods takes the class imbalance problem into account; as a consequence, the minority class is largely neglected and a biased classifier is likely to be generated. Therefore, this paper proposes a novel algorithm termed “Cost-Sensitive Positive and Unlabeled learning” (CSPU), which imposes different misclassification costs on different classes when conducting PU classification. Specifically, we assign distinct weights to the losses caused by false negative and false positive examples, and employ the double hinge loss to build our CSPU algorithm under the framework of empirical risk minimization. Theoretically, we analyze the computational complexity, and also derive a generalization error bound of CSPU which guarantees the good performance of our algorithm on test data. Empirically, we compare CSPU with state-of-the-art PU learning methods on a synthetic dataset, OpenML benchmark datasets, and real-world datasets. The results clearly demonstrate the superiority of the proposed CSPU over its competitors in dealing with class imbalanced tasks.

Introduction

Positive and Unlabeled learning (PU learning) [1] has gained increasing popularity in recent years due to its usefulness and effectiveness in practical applications; its target is to train a binary classifier from only positive and unlabeled data. Here the unlabeled data might be positive or negative, but the learning algorithm does not know their groundtruth labels during the training stage.

Since the training of a PU classifier does not depend on the explicit negative examples, it is preferred when the negative data are absent or distributed too diversely. For example, in information retrieval, the user-provided information constitutes the positive data, while the databases are regarded as unlabeled as they contain both similar and dissimilar information to the user’s query [2]. In this application, negative examples are unavailable, and thus PU learning can be utilized to find the user’s interest in the unlabeled set. In addition, in a remotely-sensed hyperspectral image, we may only be interested in identifying one specific land-cover type for certain use without considering other types [3]. In this case, we may directly treat the type-of-interest as positive and leave the remaining ones as negative, so PU learning can be employed to detect the image regions of positive land-cover type.

Existing PU learning algorithms can be mainly divided into three categories based on how the unlabeled data are treated. The first category [4], [5] initially identifies some reliable negative data within the unlabeled data, and then invokes a traditional classifier to perform ordinary supervised learning. The result of such a two-step framework depends heavily on the precision of the identified negative data; that is, if the detection of negative data is inaccurate, the final outcome can be disastrous. To handle this shortcoming, the second category [6], [7], [8] directly treats all unlabeled data as negative and casts PU learning as a label noise learning problem (the definition of label noise learning can be found in [9]), in which the positive examples hidden in the unlabeled set are deemed mislabeled as negative. The last but also the most prevalent category [10], [11], [12] in recent years focuses on designing various unbiased risk estimators. The approaches of this category apply distinct loss functions that satisfy specific conditions to PU risk estimators, resulting in various unbiased risk estimators. A breakthrough in this direction is [10], which proposed the first unbiased risk estimator with a nonconvex loss function $\ell(z)$ satisfying $\ell(z)+\ell(-z)=1$ (e.g., the ramp loss $\ell_R(z)=\frac{1}{2}\max(0,\min(2,1-z))$), with $z$ being the margin variable. Furthermore, a more general and consistent unbiased estimator was proposed in [11], which advances a novel “double hinge loss” $\ell_{DH}(z)=\max(-z,\max(0,\frac{1}{2}-\frac{1}{2}z))$ so that the composite loss $\tilde{\ell}(z)=\ell_{DH}(z)-\ell_{DH}(-z)$ satisfies $\tilde{\ell}(z)=-z$. After that, a nonnegative unbiased risk estimator suggested in [12] clips the negative part of the empirical risk in [11] to zero to avoid overfitting.
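
For concreteness, both losses can be checked numerically. The following minimal sketch (our illustration, not code from the paper) implements the ramp loss and the double hinge loss and verifies the two identities stated above:

    import numpy as np

    def ramp_loss(z):
        # Ramp loss of [10]: (1/2) * max(0, min(2, 1 - z)).
        return 0.5 * np.maximum(0.0, np.minimum(2.0, 1.0 - z))

    def double_hinge_loss(z):
        # Double hinge loss of [11]: max(-z, max(0, 1/2 - z/2)).
        return np.maximum(-z, np.maximum(0.0, 0.5 - 0.5 * z))

    z = np.linspace(-3.0, 3.0, 61)
    # Symmetry condition required by the estimator of [10].
    assert np.allclose(ramp_loss(z) + ramp_loss(-z), 1.0)
    # The composite loss of [11] is exactly linear: l_DH(z) - l_DH(-z) = -z.
    assert np.allclose(double_hinge_loss(z) - double_hinge_loss(-z), -z)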

Although the methods mentioned above have achieved encouraging performance on various datasets and tasks, they fail when encountering class imbalanced situations. In practical applications, class imbalance is prevalent, for example in credit card fraud detection, disease diagnosis, and outlier detection. In outlier detection, the very few outliers identified by a primitive detector constitute the positive set, and the remaining data points are deemed unlabeled because some outliers are probably hidden among them. Moreover, the outliers usually occupy only a small part of the entire dataset when compared with the inliers, which results in a class imbalanced PU learning problem. Unfortunately, none of the existing PU learning methods takes the class imbalance problem into consideration, so they are all likely to classify every example into the majority class (e.g., inlier) to acquire high classification accuracy. As a result, the influence of the minority class (e.g., outlier) will be overwhelmed by the majority class [13] in determining the decision function, and thus a biased classifier will be generated. This is obviously undesirable, as the minority class usually contains our primary interest.

To make PU learning applicable to imbalanced data, in this paper we propose a novel algorithm dubbed “Cost-Sensitive Positive and Unlabeled learning” (CSPU), which is convex and builds on the widely-used unbiased double hinge loss [11]. To be specific, we cast PU learning as an empirical risk minimization problem in which the losses incurred by false negative and false positive examples are assigned distinct weights. As a result, the generated decision boundary can be calibrated to the potentially correct one. We show that our CSPU algorithm can be converted into a traditional Quadratic Programming (QP) problem, so it can be easily solved via an off-the-shelf QP optimization toolbox. Theoretically, we analyze the computational complexity of our CSPU algorithm, and derive a generalization error bound of the algorithm based on its Rademacher complexity. Thorough experiments on various practical imbalanced datasets demonstrate that the proposed CSPU is superior to state-of-the-art PU methods in terms of the F-measure metric [14], [15]. The main contributions of our work are summarized as follows:

  • We propose a novel learning setting called “Cost-Sensitive PU learning” (CSPU) to model the practical problems where the absence of negative data and the class imbalance problem co-occur.

  • We design a novel algorithm to address the CSPU learning problem, which introduces a convex empirical risk estimator based on the double hinge loss; an efficient optimization method is also provided to solve our algorithm.

  • We analyze the computational complexity of our algorithm, which takes $O(9n^3+15n^2+7n+1)$ operations. We also derive a generalization error bound of the algorithm based on its Rademacher complexity, which reveals that the empirical risk converges to the expected classification risk at the rate $O(1/\sqrt{n_p}+1/\sqrt{n_u}+1/\sqrt{n})$, where $n$, $n_p$, and $n_u$ are the amounts of training data, positive data, and unlabeled data, respectively.

  • We achieve the state-of-the-art results when compared with other PU learning methods in dealing with class imbalanced PU learning problem.

The rest of this paper is organized as follows. In Section 2, the related works of PU learning and imbalanced data learning are reviewed. Section 3 introduces the proposed CSPU algorithm. The optimal solution of our CSPU is given in Section 4. Section 5 studies the computational complexity and derives a generalization error bound of the proposed algorithm. The experimental results of our CSPU and other representative PU comparators are presented in Section 6. Finally, we draw a conclusion in Section 7.

Related work

In this section, we review the representative works of PU learning and imbalanced data learning, as these two learning frameworks are very relevant to the topic of this paper.

The proposed algorithm

The target of PU learning is to train a binary classifier from only positive and unlabeled data. Our proposed algorithm aims to address the situations where the absence of negative training data and the class imbalance problem co-occur. These phenomena are prevalent in many real-world cases, such as outlier detection. In this section, we first provide the formal setting for the PU learning problem, and then propose our CSPU classification algorithm.
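
Before the formal development, it is worth seeing where the cost weights enter. The standard unbiased PU risk estimator of [11] reads

$\hat{R}_{PU}(g)=\frac{\pi}{n_p}\sum_{i=1}^{n_p}\ell(g(x_i^p))+\frac{1}{n_u}\sum_{j=1}^{n_u}\ell(-g(x_j^u))-\frac{\pi}{n_p}\sum_{i=1}^{n_p}\ell(-g(x_i^p)),$

where $\pi$ denotes the class prior, $x_i^p$ the positive examples, and $x_j^u$ the unlabeled examples. As an illustrative sketch of the idea (the exact weighting used by CSPU is given by the formulation (14)–(20) solved in the next section), a cost-sensitive variant scales the term penalizing errors on positives by a cost $c_{+1}$ and the terms penalizing errors on (pseudo-)negatives by a cost $c_{-1}$:

$\hat{R}_{CSPU}(g)=c_{+1}\frac{\pi}{n_p}\sum_{i=1}^{n_p}\ell(g(x_i^p))+c_{-1}\Big[\frac{1}{n_u}\sum_{j=1}^{n_u}\ell(-g(x_j^u))-\frac{\pi}{n_p}\sum_{i=1}^{n_p}\ell(-g(x_i^p))\Big].$

Choosing $c_{+1}>c_{-1}$ makes false negatives costlier than false positives, shifting the decision boundary toward the majority class so that the minority (positive) class is no longer overwhelmed.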

Optimization

In this section, we solve our algorithm presented in (14)–(20), which falls into the scope of Quadratic Programming (QP) of the form

$\min_{\gamma}\ \tfrac{1}{2}\gamma^{\top}H\gamma+f^{\top}\gamma \quad \mathrm{s.t.} \quad L\gamma\le k,\ q\le\gamma.$

In our algorithm, we let

$\gamma=\begin{bmatrix}\alpha_{(n+1)\times 1}\\ \eta_{n_p\times 1}\\ \xi_{n_u\times 1}\end{bmatrix}.$

Then $H$ is defined as

$H=\begin{bmatrix}\lambda K^{\top}K & O_{(n+1)\times n_p} & O_{(n+1)\times n_u}\\ O_{n_p\times(n+1)} & O_{n_p\times n_p} & O_{n_p\times n_u}\\ O_{n_u\times(n+1)} & O_{n_u\times n_p} & O_{n_u\times n_u}\end{bmatrix},$

where $O_{(n+1)\times n_p}$ is a zero matrix of size $(n+1)\times n_p$. Accordingly, the coefficient $f$ in (14) is constituted of

$f=\begin{bmatrix}0_{(n+1)\times 1}\\ \frac{\pi}{n_p}\mathbf{1}_{n_p\times 1}\\ \frac{c_{-1}}{n_u}\mathbf{1}_{n_u\times 1}\end{bmatrix}.$

Similarly, the $q$ in the constraint of (14) is

$q=\begin{bmatrix}-\infty_{(n+1)\times 1}\\ 0_{n_p\times 1}\\ 0_{n_u\times 1}\end{bmatrix},$

that is, the slack blocks $\eta$ and $\xi$ are required to be nonnegative while $\alpha$ is unbounded from below.
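
To make the reduction concrete, the following sketch (our illustration, not the authors' released code) solves a QP of the above form with the generic CVXOPT solver, assuming $H$, $f$, $L$, $k$, and $q$ have been prebuilt as NumPy arrays following the block definitions above:

    import numpy as np
    from cvxopt import matrix, solvers

    def solve_cspu_qp(H, f, L, k, q):
        # Solve min_gamma (1/2) gamma' H gamma + f' gamma
        # subject to L gamma <= k and gamma >= q.
        d = H.shape[0]
        # Rewrite the lower bound gamma >= q as -I gamma <= -q, keeping
        # only rows where q is finite (the alpha block is unbounded below).
        finite = np.isfinite(q)
        G = np.vstack([L, -np.eye(d)[finite]])
        h = np.concatenate([k, -q[finite]])
        # A tiny ridge keeps the zero blocks of H numerically positive
        # semidefinite for the interior-point solver.
        P = H + 1e-8 * np.eye(d)
        sol = solvers.qp(matrix(P), matrix(f), matrix(G), matrix(h))
        return np.array(sol['x']).ravel()

In practice, any QP solver accepting linear inequality constraints and lower bounds can be substituted; the problem has $n+1+n_p+n_u$ variables, which is what drives the cubic term in the complexity analysis of Section 5.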

Theoretical analyses

This section provides the theoretical analyses on CSPU. We firstly analyze the computational complexity of Algorithm 1, and then theoretically derive a generalization error bound of CSPU.
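
Schematically, the derived bound has the standard Rademacher-complexity shape: with probability at least $1-\delta$, the learned classifier $\hat{g}$ satisfies

$R(\hat{g}) \le \hat{R}(\hat{g}) + O\!\left(\frac{1}{\sqrt{n_p}}+\frac{1}{\sqrt{n_u}}+\frac{1}{\sqrt{n}}\right),$

where $R$ and $\hat{R}$ denote the expected and empirical risks, respectively; the exact constants and the confidence term are spelled out in the formal theorem.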

Experiments

In this section, we test the performance of our proposed CSPU by performing exhaustive experiments on one synthetic dataset, four publicly available benchmark datasets, and two real-world datasets. To demonstrate the superiority of CSPU, we compare it with several state-of-the-art PU learning algorithms including Weighted SVM (W-SVM) [19], Unbiased PU learning (UPU) [11], Multi-Layer Perceptron with Non-Negative PU risk estimator (NNPU-MLP) [12], and Linear classifier with Non-Negative PU risk estimator (NNPU-Linear) [12].
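
All methods are evaluated with the F-measure [14], [15], which balances precision and recall on the minority class and is therefore informative under class imbalance. For reference, a minimal computation from binary predictions with labels in {+1, -1} (an illustrative helper of ours, not the paper's evaluation code):

    import numpy as np

    def f_measure(y_true, y_pred):
        # F = 2TP / (2TP + FP + FN), the harmonic mean of precision and recall.
        tp = np.sum((y_pred == 1) & (y_true == 1))
        fp = np.sum((y_pred == 1) & (y_true == -1))
        fn = np.sum((y_pred == -1) & (y_true == 1))
        return 2.0 * tp / (2.0 * tp + fp + fn) if tp > 0 else 0.0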

Conclusion

In this paper, we propose a novel PU learning algorithm named “Cost-Sensitive PU learning” (CSPU) to deal with the class imbalance problem, which imposes distinct weights on the losses regarding false negative and false positive examples. PU learning is then formulated as an empirical risk minimization problem with respect to the unbiased double hinge loss, which makes the empirical risk convex. The proposed algorithm can be easily solved via an off-the-shelf quadratic programming optimization toolbox.

CRediT authorship contribution statement

Xiuhua Chen: Conceptualization, Data curation, Investigation, Methodology, Validation, Writing - original draft. Chen Gong: Formal analysis, Validation, Writing - review & editing, Supervision. Jian Yang: Writing - review & editing, Supervision.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

We would like to thank Dr. Tongliang Liu from the University of Sydney for helping proofread this paper and all anonymous reviewers for the valuable comments to improve our paper. This work was supported by the NSF of China (Nos: 61973162, U1713208), the Fundamental Research Funds for the Central Universities (No: 30920032202), CCF-Tencent Open Fund (No: RAGR20200101), the “Young Elite Scientists Sponsorship Program” by CAST (No: 2018QNRC001), and Hong Kong Scholars Program (No: XJ2019036).

References (50)

  • B. Liu, W.S. Lee, P.S. Yu, X. Li, Partially supervised classification of text documents, in: International Conference on Machine Learning, 2002.
  • X. Li, B. Liu, Learning to classify texts using positive and unlabeled data, in: International Joint Conference on Artificial Intelligence, 2003.
  • W.S. Lee, B. Liu, Learning with positive and unlabeled examples using weighted logistic regression, in: International Conference on Machine Learning, 2003.
  • H. Shi, S. Pan, J. Yang, C. Gong, Positive and unlabeled learning via loss decomposition and centroid estimation, in: International Joint Conference on Artificial Intelligence, 2018.
  • F. He, T. Liu, G.I. Webb, D. Tao, Instance-dependent PU learning by bayesian optimal relabeling, arXiv preprint, 2018.
  • B. Frénay, M. Verleysen, Classification in the presence of label noise: a survey, IEEE Trans. Neural Netw. Learn. Syst., 2014.
  • M.C. du Plessis, G. Niu, M. Sugiyama, Analysis of learning from positive and unlabeled data, in: Advances in Neural Information Processing Systems, 2014.
  • M. du Plessis, G. Niu, M. Sugiyama, Convex formulation for learning from positive and unlabeled data, in: International Conference on Machine Learning, 2015.
  • R. Kiryo, G. Niu, M.C. du Plessis, M. Sugiyama, Positive-unlabeled learning with non-negative risk estimator, in: Advances in Neural Information Processing Systems, 2017.
  • N.V. Chawla et al., Special issue on learning from imbalanced data sets, ACM SIGKDD Explorations Newslett., 2004.
  • M. Liu et al., Cost-sensitive feature selection by optimizing F-measures, IEEE Trans. Image Process., 2017.
  • S.P. Parambath, N. Usunier, Y. Grandvalet, Optimizing F-measures by cost-sensitive classification, in: Advances in Neural Information Processing Systems, 2014.
  • B. Liu, Y. Dai, X. Li, W.S. Lee, S.Y. Philip, Building text classifiers using positive and unlabeled examples, in: IEEE International Conference on Data Mining, 2003.
  • H. Yu, J. Han, K.C.-C. Chang, PEBL: positive example based learning for web page classification using SVM, in: ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2002.
  • C. Gong et al., Loss decomposition and centroid estimation for positive and unlabeled learning, IEEE Trans. Pattern Anal. Mach. Intell., 2019.