Cost-sensitive positive and unlabeled learning
Introduction
Positive and Unlabeled learning (PU learning) [1] has gained increasing popularity in recent years due to its usefulness and effectiveness in practical applications. Its goal is to train a binary classifier from only positive and unlabeled data, where each unlabeled example may be positive or negative, but the learning algorithm does not know its ground-truth label during the training stage.
Since the training of a PU classifier does not depend on explicit negative examples, it is preferred when negative data are absent or too diversely distributed. For example, in information retrieval, the user-provided information constitutes the positive data, while the databases are regarded as unlabeled since they contain both similar and dissimilar information to the user’s query [2]. In this application, negative examples are unavailable, so PU learning can be utilized to find the user’s interest in the unlabeled set. In addition, in a remotely-sensed hyperspectral image, we may only be interested in identifying one specific land-cover type for a certain use without considering other types [3]. In this case, we may directly treat the type of interest as positive and leave the remaining regions unlabeled, so PU learning can be employed to detect the image regions of the positive land-cover type.
Existing PU learning algorithms can be mainly divided into three categories based on how the unlabeled data are treated. The first category [4], [5] initially identifies some reliable negative data in the unlabeled set, and then invokes a traditional classifier to perform ordinary supervised learning. The result of such a two-step framework depends heavily on the precision of the identified negative data; if the detection of negative data is inaccurate, the final outcome could be disastrous. To handle this shortcoming, the second category [6], [7], [8] directly treats all unlabeled data as negative and casts PU learning as a label noise learning problem (the definition of label noise learning can be found in [9]), in which the positive examples hidden in the unlabeled set are deemed as negative data that have been mislabeled. The last but also the most prevalent category in recent years [10], [11], [12] focuses on designing various unbiased risk estimators. The approaches of this category apply distinct loss functions satisfying specific conditions to PU risk estimators, which leads to various unbiased risk estimators. A breakthrough in this direction is [10], which proposed the first unbiased risk estimator with a nonconvex loss function ℓ satisfying the symmetry condition ℓ(z) + ℓ(−z) = 1 (e.g., the ramp loss) with z being the classification margin. Furthermore, a more general and consistent unbiased estimator was proposed in [11], which advances a novel “double hinge loss” so that the composite loss satisfies the linear-odd condition ℓ(z) − ℓ(−z) = −z after normalization. After that, a nonnegative unbiased risk estimator suggested in [12] clips the negative part of the empirical risk in [11] to zero to avoid overfitting.
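The double hinge loss of [11] and the resulting unbiased risk estimator can be sketched as follows (a minimal illustration, not the paper's implementation; the function names and array layout are our own):

```python
import numpy as np

def double_hinge(z):
    """Double hinge loss of [11]: l(z) = max(-z, max(0, (1 - z) / 2)).
    It satisfies l(z) - l(-z) = -z, which keeps the PU risk estimator unbiased."""
    return np.maximum(-z, np.maximum(0.0, (1.0 - z) / 2.0))

def unbiased_pu_risk(margins_p, margins_u, pi_p, loss=double_hinge):
    """Unbiased PU risk estimator in the style of [11]:
    R(g) = pi_p * E_p[l(g(x))] - pi_p * E_p[l(-g(x))] + E_u[l(-g(x))],
    where pi_p is the (assumed known) class prior of the positive class."""
    r_p_pos = loss(margins_p).mean()    # positives scored as positive
    r_p_neg = loss(-margins_p).mean()   # positives scored as negative (correction)
    r_u_neg = loss(-margins_u).mean()   # unlabeled data treated as negative
    return pi_p * r_p_pos - pi_p * r_p_neg + r_u_neg
```

Note that the estimator can take negative values on finite samples; [12] clips exactly this negative part to zero.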
Although the methods mentioned above have achieved encouraging performance on various datasets and tasks, they fail when encountering class imbalanced situations. In practical applications, class imbalance is prevalent, as in credit card fraud detection, disease diagnosis, and outlier detection. In outlier detection, the very few outliers identified by a preliminary detector constitute the positive set, and the remaining data points are deemed unlabeled because some outliers are probably hidden among them. Moreover, the outliers usually occupy a small part of the entire dataset compared with the inliers, which results in a class imbalanced PU learning problem. Unfortunately, none of the existing PU learning methods takes the class imbalance problem into consideration, so they are all likely to classify every example into the majority class (e.g., inlier) to acquire high classification accuracy. As a result, the influence of the minority class (e.g., outlier) will be overwhelmed by the majority class [13] in deciding the decision function, and a biased classifier will be generated. This is obviously undesirable, as the minority class usually contains our primary interest.
To make PU learning applicable to imbalanced data, in this paper we propose a novel algorithm dubbed “Cost-Sensitive Positive and Unlabeled learning” (CSPU), which is convex and builds on the widely-used unbiased double hinge loss [11]. To be specific, we cast PU learning as an empirical risk minimization problem in which the losses incurred by false negative and false positive examples are assigned distinct weights. As a result, the generated decision boundary can be calibrated towards the potentially correct one. We show that our CSPU algorithm can be converted into a traditional Quadratic Programming (QP) problem, so it can be easily solved via an off-the-shelf QP optimization toolbox. Theoretically, we analyze the computational complexity of our CSPU algorithm, and derive a generalization error bound of the algorithm based on its Rademacher complexity. Thorough experiments on various practical imbalanced datasets demonstrate that the proposed CSPU is superior to state-of-the-art PU methods in terms of the F-measure metric [14], [15]. The main contributions of our work are summarized as follows:
- We propose a novel learning setting called “Cost-Sensitive PU learning” (CSPU) to model practical problems where the absence of negative data and the class imbalance problem co-occur.
- We design a novel algorithm to address the CSPU learning problem, which introduces a convex empirical risk estimator with the double hinge loss; an efficient optimization method is also provided to solve our algorithm.
- We analyze the computational complexity of our algorithm. We also derive a generalization error bound of the algorithm based on its Rademacher complexity, which reveals that the generalization error converges to the expected classification risk as the amounts of training data, positive data, and unlabeled data grow.
- We achieve state-of-the-art results when compared with other PU learning methods in dealing with the class imbalanced PU learning problem.
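As a rough illustration of the cost-sensitive idea, the unbiased risk of [11] can be extended by weighting the false-negative and false-positive loss terms differently (a hedged sketch with illustrative weights `c_fn` and `c_fp`; the exact CSPU objective is given in Section 3):

```python
import numpy as np

def double_hinge(z):
    # Double hinge loss of [11]: l(z) = max(-z, max(0, (1 - z) / 2))
    return np.maximum(-z, np.maximum(0.0, (1.0 - z) / 2.0))

def cost_sensitive_pu_risk(margins_p, margins_u, pi_p, c_fn=1.0, c_fp=1.0):
    """Sketch: scale the loss on labeled positives (false negatives) by c_fn,
    and the loss on unlabeled data treated as negative (false positives) by c_fp.
    With c_fn = c_fp = 1 this reduces to the unbiased estimator of [11]."""
    r_fn = double_hinge(margins_p).mean()    # positives scored as negative
    corr = double_hinge(-margins_p).mean()   # unbiasedness correction term
    r_fp = double_hinge(-margins_u).mean()   # unlabeled data treated as negative
    return c_fn * pi_p * r_fn + c_fp * (r_fp - pi_p * corr)
```

Raising `c_fn` above `c_fp` penalizes missing a minority-class (positive) example more heavily, pushing the decision boundary away from the majority class.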
The rest of this paper is organized as follows. In Section 2, the related works of PU learning and imbalanced data learning are reviewed. Section 3 introduces the proposed CSPU algorithm. The optimal solution of our CSPU is given in Section 4. Section 5 studies the computational complexity and derives a generalization error bound of the proposed algorithm. The experimental results of our CSPU and other representative PU comparators are presented in Section 6. Finally, we draw a conclusion in Section 7.
Related work
In this section, we review the representative works of PU learning and imbalanced data learning, as these two learning frameworks are very relevant to the topic of this paper.
The proposed algorithm
The target of PU learning is to train a binary classifier from only positive and unlabeled data. Our proposed algorithm aims to address the situations where the absence of negative training data and the class imbalance problem co-occur. These phenomena are prevalent in many real-world cases, such as outlier detection. In this section, we first provide the formal setting for the PU learning problem, and then propose our CSPU classification algorithm.
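The sampling scheme described here — a small labeled positive set plus an unlabeled pool drawn from the whole marginal distribution — can be simulated from any fully labeled dataset. A hedged sketch (`make_pu_split` and its arguments are our own names, not the paper's):

```python
import numpy as np

rng = np.random.default_rng(0)

def make_pu_split(X, y, n_labeled_pos):
    """Cast a fully labeled dataset (y in {+1, -1}) into PU form:
    P = a random subset of the positives, U = the full marginal sample.
    The class prior pi_p is assumed known, as in [11]."""
    pos_idx = np.flatnonzero(y == 1)
    chosen = rng.choice(pos_idx, size=n_labeled_pos, replace=False)
    P = X[chosen]              # labeled positive set
    U = X                      # unlabeled set: the whole sample, labels hidden
    pi_p = (y == 1).mean()     # positive class prior
    return P, U, pi_p
```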
Optimization
In this section, we solve our algorithm presented in (14)–(20), which falls into the scope of Quadratic Programming (QP) with the standard form min_x (1/2) xᵀH x + fᵀx subject to Ax ≤ b.
In our algorithm, the matrix H is defined blockwise, where one block is a zero matrix of the appropriate size. Accordingly, the coefficient vector in (14) and the constraint matrix of (14) are constituted correspondingly.
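Any off-the-shelf solver can then handle the resulting QP. As a hedged stand-in for the QP toolbox mentioned in the paper, the sketch below solves a toy instance of the standard form min (1/2) xᵀH x + fᵀx s.t. Ax ≤ b with SciPy's SLSQP (the matrices here are illustrative, not the paper's actual H, f, A, b):

```python
import numpy as np
from scipy.optimize import minimize

# Toy QP: min (1/2)(x1^2 + x2^2) - x1 - x2  s.t.  x1 + x2 <= 1.
H = np.eye(2)
f = np.array([-1.0, -1.0])
A = np.array([[1.0, 1.0]])
b = np.array([1.0])

objective = lambda x: 0.5 * x @ H @ x + f @ x
# SLSQP expects inequality constraints as g(x) >= 0, so encode Ax <= b as b - Ax >= 0.
cons = {"type": "ineq", "fun": lambda x: b - A @ x}
res = minimize(objective, x0=np.zeros(2), constraints=[cons], method="SLSQP")
```

The constraint is active at the optimum, which SLSQP locates at x = (0.5, 0.5) for this toy instance.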
Theoretical analyses
This section provides the theoretical analyses on CSPU. We firstly analyze the computational complexity of Algorithm 1, and then theoretically derive a generalization error bound of CSPU.
Experiments
In this section, we test the performance of our proposed CSPU by performing exhaustive experiments on one synthetic dataset, four publicly available benchmark datasets, and two real-world datasets. To demonstrate the superiority of CSPU, we compare it with several state-of-the-art PU learning algorithms, including Weighted SVM (W-SVM) [19], Unbiased PU learning (UPU) [11], Multi-Layer Perceptron with Non-Negative PU risk estimator (NNPU-MLP) [12], and Linear classifier with Non-Negative PU risk estimator, among others.
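The F-measure [14], [15] used for evaluation balances precision and recall on the minority positive class, so it is not dominated by the majority class the way accuracy is. A minimal sketch (our own helper, with labels in {+1, -1}):

```python
def f_measure(y_true, y_pred, beta=1.0):
    """F-measure on the positive class:
    F_beta = (1 + beta^2) * P * R / (beta^2 * P + R)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == -1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == -1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return (1 + beta**2) * precision * recall / (beta**2 * precision + recall)
```

A classifier that labels everything as the majority (negative) class scores 0 under this metric, whereas its plain accuracy could still be high on imbalanced data.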
Conclusion
In this paper, we propose a novel PU learning algorithm named “Cost-Sensitive PU learning” (CSPU) to deal with the class imbalance problem, which imposes distinct weights on the losses regarding false negative and false positive examples. PU learning is then formulated as an empirical risk minimization problem with respect to the unbiased double hinge loss, which makes the empirical risk convex. The proposed algorithm can be easily solved via an off-the-shelf quadratic programming optimization toolbox.
CRediT authorship contribution statement
Xiuhua Chen: Conceptualization, Data curation, Investigation, Methodology, Validation, Writing - original draft. Chen Gong: Formal analysis, Validation, Writing - review & editing, Supervision. Jian Yang: Writing - review & editing, Supervision.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgments
We would like to thank Dr. Tongliang Liu from the University of Sydney for helping proofread this paper and all anonymous reviewers for the valuable comments to improve our paper. This work was supported by the NSF of China (Nos: 61973162, U1713208), the Fundamental Research Funds for the Central Universities (No: 30920032202), CCF-Tencent Open Fund (No: RAGR20200101), the “Young Elite Scientists Sponsorship Program” by CAST (No: 2018QNRC001), and Hong Kong Scholars Program (No: XJ2019036).
References (50)
- et al., A hybrid evolutionary preprocessing method for imbalanced datasets, Inf. Sci. (2018)
- et al., A novel ensemble method for classifying imbalanced data, Pattern Recogn. (2015)
- et al., A comprehensive analysis of Synthetic Minority Oversampling TEchnique (SMOTE) for handling class imbalance, Inf. Sci. (2019)
- et al., Improving imbalanced learning through a heuristic oversampling method based on k-means and SMOTE, Inf. Sci. (2018)
- et al., Under-sampling class imbalanced datasets by combining clustering analysis and instance selection, Inf. Sci. (2019)
- et al., Neighbourhood-based undersampling approach for handling imbalanced and overlapped data, Inf. Sci. (2020)
- et al., Cost-sensitive dual-bidirectional linear discriminant analysis, Inf. Sci. (2020)
- et al., Learning from positive and unlabeled data: a survey, Mach. Learn. (2020)
- et al., Efficient training for positive unlabeled learning, IEEE Trans. Pattern Anal. Mach. Intell. (2018)
- et al., A positive and unlabeled learning algorithm for one-class classification of remote-sensing data, IEEE Trans. Geosci. Remote Sens. (2010)
- Classification in the presence of label noise: a survey, IEEE Trans. Neural Netw. Learn. Syst.
- Special issue on learning from imbalanced data sets, ACM SIGKDD Explorations Newslett.
- Cost-sensitive feature selection by optimizing F-measures, IEEE Trans. Image Process.
- PEBL: positive example based learning for web page classification using SVM
- Loss decomposition and centroid estimation for positive and unlabeled learning, IEEE Trans. Pattern Anal. Mach. Intell.
2021, Information SciencesCitation Excerpt :Algorithm level methods are to develop a new algorithm or modify existing algorithms to adapt them to imbalanced data [20]. Cost-sensitive methods combine resampling methods or algorithm level methods and assign different misclassification costs for classes in the training process of classifiers [21]. Ensemble learning is frequently combined with resampling methods to balance the class distribution of data before individual classifiers are trained.