Abstract
The Big Data phenomenon is overwhelming our capacity to extract relevant knowledge with classical machine learning techniques. Multi-target regression has arisen in several industrial and environmental application domains, such as ecological modeling and energy forecasting. However, standard multi-target regressors are not designed to perform well with such amounts of data. This paper proposes a scalable implementation of a multi-target linear regression algorithm with output dependence estimation for Big Data analytics in Apache Spark. Our experiments on large-scale datasets show accuracy comparable to that of the standard implementation, and training time decreases as the number of worker nodes available in the processing cluster increases.
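To make the scalability argument concrete, the following is a minimal sketch of how a multi-target linear regression can be fitted in a map-reduce style. It is not the method proposed in this paper: it solves plain (optionally ridge-regularized) least squares through the normal equations and omits the output dependence estimation entirely. The nested lists stand in for Spark partitions, and the names `partial_stats`, `add_stats`, `solve`, and `fit` are hypothetical helpers introduced only for illustration. The point it illustrates is that each partition only needs to contribute the small matrices X^T X and X^T Y, whose sizes depend on the number of features and targets, not on the number of rows.

```python
from functools import reduce


def partial_stats(rows):
    """Map step: accumulate X^T X and X^T Y over one partition of (x, y) rows."""
    d, m = len(rows[0][0]), len(rows[0][1])
    xtx = [[0.0] * d for _ in range(d)]
    xty = [[0.0] * m for _ in range(d)]
    for x, y in rows:
        for i in range(d):
            for j in range(d):
                xtx[i][j] += x[i] * x[j]
            for k in range(m):
                xty[i][k] += x[i] * y[k]
    return xtx, xty


def add_stats(a, b):
    """Reduce step: element-wise sum of two (X^T X, X^T Y) pairs."""
    def add(p, q):
        return [[u + v for u, v in zip(r, s)] for r, s in zip(p, q)]
    return add(a[0], b[0]), add(a[1], b[1])


def solve(a, b):
    """Solve A W = B by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [list(a[i]) + list(b[i]) for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def fit(partitions, ridge=0.0):
    """Combine per-partition statistics, then solve (X^T X + ridge*I) W = X^T Y."""
    xtx, xty = reduce(add_stats, map(partial_stats, partitions))
    for i in range(len(xtx)):
        xtx[i][i] += ridge
    return solve(xtx, xty)


# Toy data (assumed for illustration): y1 = 1 + 2*x and y2 = 3 - x,
# with a leading 1.0 in each input acting as the bias feature.
partitions = [
    [([1.0, 0.0], [1.0, 3.0]), ([1.0, 1.0], [3.0, 2.0])],
    [([1.0, 2.0], [5.0, 1.0]), ([1.0, 3.0], [7.0, 0.0])],
]
W = fit(partitions)
```

On this toy data the recovered coefficient matrix `W` is, up to floating-point rounding, `[[1, 3], [2, -1]]`: one column per target, with the intercept in the first row and the slope in the second. In an actual Spark job the map and reduce steps would typically be expressed as an aggregation over an RDD or DataFrame, which is an assumption about deployment rather than something stated in the abstract.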
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Corona, J.C., Gonzalez, H., Morell, C. (2021). Scalable Generalized Multitarget Linear Regression With Output Dependence Estimation. In: Hernández Heredia, Y., Milián Núñez, V., Ruiz Shulcloper, J. (eds) Progress in Artificial Intelligence and Pattern Recognition. IWAIPR 2021. Lecture Notes in Computer Science(), vol 13055. Springer, Cham. https://doi.org/10.1007/978-3-030-89691-1_7
DOI: https://doi.org/10.1007/978-3-030-89691-1_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-89690-4
Online ISBN: 978-3-030-89691-1
eBook Packages: Computer Science, Computer Science (R0)