Abstract
A database may contain pivotal records: small chunks of records or instances that carry important, domain-specific information. These chunks may hold crucial information that assists decision making, by assigning labels to pivotal records and unlabeled data instances, and that improves the accuracy of the classification model. Our work proposes heuristic Rough Set Boundary detection, which efficiently approximates the boundary set of a large database and so substantially reduces the search space for finding critical records. Because the rough set is obtained from the original data set, the search is confined to the boundary. A pivotal score is computed for each instance in the boundary to isolate the critical records. The method also applies feature selection to obtain a reduced set of attributes and thereby lower the computation time. The proposed work retrieves the pivotal records from the boundary set and improves classification accuracy by increasing the number of true positives and true negatives. Experiments are carried out on real-world medical data sets with numeric values, and various classification algorithms are run to validate the results. The results indicate that identifying pivotal records from the rough boundary set improves classification accuracy while requiring less computation time.
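The abstract does not spell out the heuristic itself, so the following is only a minimal sketch of what a rough-set boundary can look like on numeric data. It assumes the common neighborhood rough-set definition, in which an instance belongs to the boundary when the instances within a distance `delta` of it carry more than one class label. The function name `neighborhood_boundary`, the threshold `delta`, and the toy data are all hypothetical and are not taken from the article; the pivotal score and the feature selection step are not shown because the abstract does not define them.

```python
from math import dist  # Euclidean distance, available in Python 3.8+


def neighborhood_boundary(X, y, delta):
    """Return indices of instances in the neighborhood rough-set boundary.

    An instance is placed in the boundary when its delta-neighborhood
    (all instances within Euclidean distance delta, itself included)
    contains more than one class label. This is an assumed, textbook-style
    definition, not the heuristic described in the article.
    """
    boundary = []
    for i, xi in enumerate(X):
        labels = {y[j] for j, xj in enumerate(X) if dist(xi, xj) <= delta}
        if len(labels) > 1:
            boundary.append(i)
    return boundary


# Toy numeric data with two features and two classes (made-up values).
X = [(1.0, 1.0), (1.2, 0.9), (2.0, 2.0), (2.2, 2.1), (3.0, 3.0), (3.1, 3.2)]
y = [0, 0, 0, 1, 1, 1]

boundary = neighborhood_boundary(X, y, delta=0.5)
print(boundary)  # [2, 3]
```

Only the two instances that sit close to each other but carry different labels end up in the boundary, which illustrates how restricting later steps to the boundary can shrink the search space. The pairwise scan is quadratic in the number of instances, so a real implementation on a large database would likely need an index or an approximation.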
Change history
05 April 2019
The Editor-in-Chief has retracted this article [1], because it shows substantial overlap with a previously published article [2]. Author A. Suresh does not agree with the retraction. Author R. Varatharajan has not responded to correspondence about this retraction.
Additional information
The Editor-in-Chief has retracted this article, because it shows substantial overlap with a previously published article. Author A. Suresh does not agree with the retraction. Author R. Varatharajan has not responded to correspondence about this retraction.
About this article
Cite this article
Suresh, A., Varatharajan, R. RETRACTED ARTICLE: Recognition of pivotal instances from uneven set boundary during classification. Multimed Tools Appl 77, 27075–27088 (2018). https://doi.org/10.1007/s11042-018-5905-9
DOI: https://doi.org/10.1007/s11042-018-5905-9