Abstract
The support vector machine (SVM) is widely used across many fields because it classifies well and is simple and practical. However, an SVM finds its support vectors by solving a quadratic programming problem, and that solution requires computing an n × n matrix, where n is the number of training samples. When the data set is large, computing and storing this matrix makes optimization very slow and can even exhaust memory and abort the run. Implementing the SVM on the big data computing platform Spark can solve these problems, but the resulting classifier does not handle multi-class problems. This paper therefore starts from the construction of multiple classifiers and combines the Spark programming model with the classification characteristics of the SVM to realize a parallel one-to-many (one-versus-rest) SVM optimization algorithm for large data sets, which is evaluated on UCI data sets. In the experiments, the Spark-based one-to-many SVM clearly outperforms the one-to-many SVM run in a single-machine environment. The simulation results show that the proposed algorithm has better performance.
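The one-to-many construction described above trains one binary classifier per class (that class against all others), and those classifiers are independent, so they can be trained in parallel. The following is a minimal sketch of that idea only, not the authors' Spark implementation. It rests on several assumptions: a thread pool stands in for Spark workers, a linear SVM trained by full-batch subgradient descent on the hinge loss replaces the quadratic programming solver, and all function names and hyperparameters (`lam`, `epochs`, `lr`) are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def train_binary_svm(X, y, lam=0.01, epochs=200, lr=0.1):
    """Fit a linear SVM (labels in {-1, +1}) by subgradient descent.

    Assumption: this replaces the QP solver, so no n x n matrix is built.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        # Gradient of the L2 regularizer.
        gw = [lam * wj for wj in w]
        gb = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # sample violates the margin: add hinge subgradient
                for j in range(d):
                    gw[j] -= yi * xi[j] / n
                gb -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, gw)]
        b -= lr * gb
    return w, b


def train_one_vs_rest(X, labels, max_workers=4):
    """Train one binary SVM per class; each fit is an independent task."""
    classes = sorted(set(labels))

    def fit(c):
        # Relabel: current class is +1, every other class is -1.
        y = [1 if label == c else -1 for label in labels]
        return c, train_binary_svm(X, y)

    # The thread pool is a stand-in for distributing tasks to cluster workers.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(fit, classes))


def predict(models, x):
    """Return the class whose binary classifier gives the highest score."""
    def score(c):
        w, b = models[c]
        return sum(wj * xj for wj, xj in zip(w, x)) + b

    return max(models, key=score)


# Toy usage: three well-separated 2-D clusters.
X = [(0.0, 4.0), (0.5, 4.5), (-0.5, 3.5),
     (-4.0, -2.0), (-4.5, -2.5), (-3.5, -1.5),
     (4.0, -2.0), (4.5, -2.5), (3.5, -1.5)]
labels = ["a"] * 3 + ["b"] * 3 + ["c"] * 3
models = train_one_vs_rest(X, labels)
prediction = predict(models, (0.0, 5.0))
```

Because each per-class task only reads the shared training data and writes its own model, the same decomposition maps naturally onto a cluster framework, where each task would run on a separate worker instead of a thread.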
Acknowledgements
The authors acknowledge the Natural Science Foundation of Jiangsu Province (Grant: BK20150204).
Cite this article
Gong, Y., Jia, L. Research on SVM environment performance of parallel computing based on large data set of machine learning. J Supercomput 75, 5966–5983 (2019). https://doi.org/10.1007/s11227-019-02894-7
DOI: https://doi.org/10.1007/s11227-019-02894-7