Research on SVM environment performance of parallel computing based on large data set of machine learning

Gong, Yunlu; Jia, Lianguo

doi:10.1007/s11227-019-02894-7

Published: 21 June 2019

Volume 75, pages 5966–5983, (2019)

The Journal of Supercomputing

Abstract

The support vector machine (SVM) algorithm is widely used in many fields because it classifies well and is simple and practical. However, an SVM finds its support vectors by solving a quadratic programming problem, and that solution requires computing an n-order matrix. When the data set is large, computing and storing this matrix makes the optimization very slow and can even cause a memory overflow that aborts the run. Improving the SVM algorithm on the big data computing platform Spark can solve these problems, but such an approach is not suited to multi-class classification. This paper therefore starts from the construction of multiple classifiers and combines the Spark big data programming model with the classification characteristics of SVMs to realize a parallel one-to-many SVM optimization algorithm for large data sets, which is then evaluated on UCI data sets. In the experiments, the Spark-based one-to-many SVM clearly outperforms the one-to-many SVM run in a single-machine environment. The simulation results show that the proposed algorithm has better performance.
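The one-to-many (one-versus-rest) construction named in the abstract can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the authors' implementation: the function names (`train_binary`, `train_one_vs_rest`, `predict`), the toy data and the learning rate are all hypothetical, Spark is not used, and the binary learner is a simple margin perceptron on the hinge loss standing in for a full quadratic programming SVM solver. The point it shows is that each class gets its own binary problem, and that these problems are independent of one another, which is what makes the scheme a natural candidate for parallel execution.

```python
def dot(w, x):
    """Inner product of two equal-length sequences."""
    return sum(wj * xj for wj, xj in zip(w, x))


def train_binary(X, y, lr=0.1, max_epochs=1000):
    """Fit (w, b) so that y_i * (w . x_i + b) >= 1 on every sample.

    Assumption: a margin perceptron (subgradient steps on the hinge loss,
    no regularisation term) stands in for a real SVM solver.
    Labels in y must be +1 or -1.
    """
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(max_epochs):
        updated = False
        for xi, yi in zip(X, y):
            if yi * (dot(w, xi) + b) < 1:  # sample violates the margin
                w = [wj + lr * yi * xj for wj, xj in zip(w, xi)]
                b += lr * yi
                updated = True
        if not updated:  # a full pass with no violation: stop early
            break
    return w, b


def train_one_vs_rest(X, labels):
    """Train one binary classifier per class: that class (+1) vs the rest (-1).

    The per-class problems do not share state, so in a distributed setting
    each could be trained by a separate worker.
    """
    models = {}
    for c in sorted(set(labels)):
        y = [1 if label == c else -1 for label in labels]
        models[c] = train_binary(X, y)
    return models


def predict(models, x):
    """Return the class whose classifier gives the largest decision value."""
    return max(models, key=lambda c: dot(models[c][0], x) + models[c][1])


# Hypothetical toy data: three well-separated clusters in the plane.
X = [(0, 0), (1, 0), (0, 1), (1, 1),
     (5, 0), (6, 0), (5, 1), (6, 1),
     (0, 5), (1, 5), (0, 6), (1, 6)]
labels = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]

models = train_one_vs_rest(X, labels)
predictions = [predict(models, x) for x in X]
```

With k classes the scheme trains k binary classifiers, each over the whole training set, so the cost of the underlying binary solver is paid k times unless the classifiers are trained concurrently.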

Acknowledgements

The authors acknowledge the Natural Science Foundation of Jiangsu Province (Grant: BK20150204).

Author information

Authors and Affiliations

Department of Mathematics, Shanghai University, Shanghai, China
Yunlu Gong
Wuxi Huoqiupuhui Co. Ltd, Wuxi, China
Lianguo Jia

Corresponding author

Correspondence to Yunlu Gong.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Gong, Y., Jia, L. Research on SVM environment performance of parallel computing based on large data set of machine learning. J Supercomput 75, 5966–5983 (2019). https://doi.org/10.1007/s11227-019-02894-7

Issue Date: September 2019
