A systematical approach to classification problems with feature space heterogeneity

Hongshan Xiao (Research Center for International Business and Economy, Sichuan International Studies University, Chongqing, China and School of International Business, Sichuan International Studies University, Chongqing, China)
Yu Wang (School of Economics and Business Administration, Chongqing University, Chongqing, China)

Kybernetes

ISSN: 0368-492X

Article publication date: 13 August 2019

Issue publication date: 23 September 2019

Abstract

Purpose

Feature space heterogeneity exists widely in various application fields of classification techniques, such as customs inspection decision, credit scoring and medical diagnosis. This paper aims to study the relationship between feature space heterogeneity and classification performance.

Design/methodology/approach

A measurement is first developed to detect and quantify any significant heterogeneity in the feature space of a data set. Its main idea is derived from meta-analysis. For data sets with significant feature space heterogeneity, a classification algorithm based on factor analysis and clustering is proposed to learn the data patterns, which in turn are used for classification.
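The abstract does not specify the measurement itself, so the following is only an illustrative sketch of the kind of statistic meta-analysis commonly uses for heterogeneity: Cochran's Q and the derived I² index. It assumes, hypothetically, that the data set is split into subsets, that each subset yields an effect estimate for one feature (here the difference in class means) with a sampling variance, and that disagreement among those estimates signals heterogeneity. The function names `effect_and_variance` and `cochran_q` are illustrative and are not taken from the paper.

```python
from statistics import mean, variance


def effect_and_variance(pos, neg):
    """Class-mean difference for one feature in one subset, with its
    sampling variance (assumption: this plays the role of a 'study effect')."""
    diff = mean(pos) - mean(neg)
    var = variance(pos) / len(pos) + variance(neg) / len(neg)
    return diff, var


def cochran_q(effects, variances):
    """Cochran's Q, its degrees of freedom, and I^2 for a list of
    per-subset effect estimates and their variances."""
    # Inverse-variance weights, as in a fixed-effect meta-analysis.
    weights = [1.0 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    # Weighted squared deviations of each subset effect from the pooled effect.
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    # I^2: share of total variation attributed to heterogeneity, floored at 0.
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return q, df, i2
```

Under these assumptions, a Q value that is large relative to a chi-square distribution with `df` degrees of freedom would suggest that the feature behaves differently across subsets. Whether the paper's measurement takes this exact form cannot be determined from the abstract.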

Findings

The proposed approach has two main advantages over previous methods. The first lies in feature transformation using orthogonal factor analysis, which yields new features free of redundancy and irrelevance. The second rests on sample partitioning, which captures the feature space heterogeneity reflected in differences in factor scores. The validity and effectiveness of the proposed approach are verified on a number of benchmark data sets.

Research limitations/implications

Using the measurement to guide the heterogeneity elimination process is an interesting topic for future research. In addition, developing a classification algorithm that enables scalable and incremental learning for large data sets with significant feature space heterogeneity is also an important issue.

Practical implications

Measuring and eliminating the feature space heterogeneity that may exist in the data is important for accurate classification. This study provides a systematic approach to feature space heterogeneity measurement and elimination for better classification performance, which benefits applications of classification techniques to real-world problems.

Originality/value

A measurement based on meta-analysis for measuring and identifying any significant feature space heterogeneity in a classification problem is developed, and an ensemble classification framework is proposed to deal with the feature space heterogeneity and improve the classification accuracy.

Keywords

Acknowledgements

The authors are grateful to the editor and the anonymous reviewers for their constructive comments and suggestions, which have significantly improved the paper. This research was supported by the National Natural Science Foundation of China (Grants No. 71801164 and 71471022) and Fundamental Research Funds for the Central Universities (Project No. 106112016CDJXY020010).

Citation

Xiao, H. and Wang, Y. (2019), "A systematical approach to classification problems with feature space heterogeneity", Kybernetes, Vol. 48 No. 9, pp. 2006-2029. https://doi.org/10.1108/K-06-2018-0313

Publisher

Emerald Publishing Limited

Copyright © 2019, Emerald Publishing Limited
