
Research on SVM environment performance of parallel computing based on large data set of machine learning

The Journal of Supercomputing

Abstract

The support vector machine (SVM) algorithm is widely used in many fields because of its good classification accuracy, simplicity, and practicality. However, an SVM finds its support vectors by solving a quadratic programming (QP) problem, and solving this QP requires computing an n × n matrix, where n is the number of training samples. When the data set is large, computing and storing this matrix makes optimization very slow and can even cause memory overflow that interrupts the computation. Using the big data computing platform Spark to improve the SVM algorithm can address these problems, but such Spark-based SVMs do not directly handle multi-class classification. This paper therefore starts from the construction of multiple binary classifiers and combines Spark's big data programming model with the classification characteristics of the SVM to implement a parallel one-to-many (one-versus-rest) SVM optimization algorithm for large data sets, which is evaluated on UCI data sets. In the experiments, the Spark-based one-to-many SVM clearly outperforms the same one-to-many SVM run in a single-machine environment, and the simulation results show that the proposed parallel algorithm performs better.
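The memory problem described above is easy to quantify: with n = 100,000 training samples the n × n matrix has 10^10 entries, which at 8 bytes per double-precision value is 8 × 10^10 bytes, about 80 GB. The Python sketch below illustrates, under stated assumptions, the one-versus-rest decomposition that makes the multi-class problem parallelizable: each of the k classes gets its own independent binary SVM ("this class" versus "all other classes"), so the k training jobs can run concurrently, and a sample is assigned to the class whose classifier gives the highest decision score. It is not the authors' Spark implementation. Assumptions: each binary SVM is a linear soft-margin SVM solved in the dual by coordinate descent (a swapped-in solver that updates one dual variable at a time and never stores the n × n matrix), the bias is learned as an extra weight on a constant feature, and a standard-library `ThreadPoolExecutor` stands in for Spark executors. The helper names `train_binary_svm`, `train_one_vs_rest` and `predict`, and the toy data, are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def augment(x):
    # Append a constant 1.0 so the bias is learned as an extra weight
    # (assumption: the bias is regularized together with w).
    return [float(v) for v in x] + [1.0]


def train_binary_svm(X, y, C=1.0, epochs=200):
    """Linear soft-margin SVM trained by dual coordinate descent (hypothetical helper).

    Minimizes 0.5 * a^T Q a - sum(a) subject to 0 <= a_i <= C, where
    Q_ij = y_i * y_j * (x_i . x_j). Only w = sum_i a_i * y_i * x_i is kept,
    so the n x n matrix Q is never built or stored.
    """
    Xa = [augment(x) for x in X]
    w = [0.0] * len(Xa[0])
    alpha = [0.0] * len(Xa)
    for _ in range(epochs):
        for i, (xi, yi) in enumerate(zip(Xa, y)):
            q_ii = sum(v * v for v in xi)  # diagonal entry Q_ii
            grad = yi * sum(wj * v for wj, v in zip(w, xi)) - 1.0  # dual gradient
            # Exact one-variable minimization, clipped to the box [0, C].
            new_a = min(max(alpha[i] - grad / q_ii, 0.0), C)
            delta = new_a - alpha[i]
            if delta != 0.0:
                # Keep w consistent with the updated dual variable.
                w = [wj + delta * yi * v for wj, v in zip(w, xi)]
                alpha[i] = new_a
    return w


def score(w, x):
    # Decision value w . [x, 1] for one binary classifier.
    return sum(wj * v for wj, v in zip(w, augment(x)))


def train_one_vs_rest(X, labels, C=1.0, epochs=200, max_workers=None):
    """Train one binary SVM per class; the k jobs are independent."""
    classes = sorted(set(labels))

    def fit(c):
        # Relabel: +1 for class c, -1 for every other class.
        y = [1.0 if lab == c else -1.0 for lab in labels]
        return train_binary_svm(X, y, C, epochs)

    # Stand-in for Spark executors: submit the k binary problems concurrently.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        weights = list(pool.map(fit, classes))
    return dict(zip(classes, weights))


def predict(models, x):
    # One-versus-rest rule: pick the class whose classifier scores x highest.
    return max(models, key=lambda c: score(models[c], x))


if __name__ == "__main__":
    # Toy 2-D data with three well-separated classes (illustrative only).
    X = [(0, 4), (1, 5), (-1, 5), (4, -3), (5, -2), (5, -4),
         (-4, -3), (-5, -2), (-5, -4)]
    labels = [0, 0, 0, 1, 1, 1, 2, 2, 2]
    models = train_one_vs_rest(X, labels)
    print([predict(models, p) for p in [(0, 6), (6, -3), (-6, -3)]])
```

Because CPython threads share one interpreter lock, this thread pool shows the task structure rather than a real speedup. Spreading the k binary problems across cores or machines would mean swapping in `ProcessPoolExecutor` (which also requires moving `fit` to module level so it can be pickled) or running one task per class on a Spark cluster.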

Figs. 1–10

Acknowledgements

The authors acknowledge the Natural Science Foundation of Jiangsu Province (Grant: BK20150204).

Author information

Correspondence to Yunlu Gong.

About this article

Cite this article

Gong, Y., Jia, L. Research on SVM environment performance of parallel computing based on large data set of machine learning. J Supercomput 75, 5966–5983 (2019). https://doi.org/10.1007/s11227-019-02894-7

  • DOI: https://doi.org/10.1007/s11227-019-02894-7
