High-performance IoT streaming data prediction system using Spark: a case study of air pollution

  • S.I. : Green and Human Information Technology 2019
Neural Computing and Applications

Abstract

Internet-of-Things (IoT) devices are becoming prevalent, and some of them, such as sensors, generate continuous time-series data, i.e., streaming data. These IoT streaming data are one source of Big Data, and they require careful consideration for efficient processing and analysis. Deep learning is emerging as a solution for IoT streaming data analytics. However, deep learning has a persistent problem: training neural networks takes a long time. In this paper, we propose a high-performance IoT streaming data prediction system that improves training speed and predicts in real time. We show the efficacy of the system through a case study of air pollution. The experimental results show that the modified LSTM autoencoder model outperforms a generic LSTM model. We found that achieving the best performance requires optimizing many parameters, including the learning rate, number of epochs, memory cell size, input timestep size, and number of features/predictors. In that regard, we show that high-performance data learning/prediction frameworks (e.g., Spark, Dist-Keras, and Hadoop) are essential for rapidly fine-tuning a model for training and testing before real deployment of the model as data accumulate.
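To give a rough sense of why tuning these parameters is costly, the following minimal sketch enumerates a small search grid over the five parameters named in the abstract. The value lists and the names `search_space` and `enumerate_configs` are hypothetical and chosen only for illustration; they are not the settings or the code used in the paper.

```python
from itertools import product

# Hypothetical search space: the parameter names follow the abstract,
# but every value below is an illustrative assumption.
search_space = {
    "learning_rate": [1e-2, 1e-3, 1e-4],
    "epochs": [50, 100],
    "memory_cells": [32, 64, 128],
    "input_timesteps": [6, 12, 24],
    "num_features": [1, 4, 8],
}


def enumerate_configs(space):
    """Yield one dict per combination of hyperparameter values."""
    names = list(space)
    for values in product(*(space[name] for name in names)):
        yield dict(zip(names, values))


configs = list(enumerate_configs(search_space))

# Each configuration implies one full training run, so the total grows
# multiplicatively with every parameter added to the grid.
print(len(configs))  # 3 * 2 * 3 * 3 * 3 = 162
```

Even this small grid implies 162 separate training runs, which suggests why distributing the runs across a cluster can shorten the tuning cycle.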

Acknowledgements

This work was supported by the Basic Science Research Program through the Ministry of Education of the Republic of Korea and the National Research Foundation of Korea (NRF-2017R1D1A1B03033632).

Author information

Corresponding author

Correspondence to Eun-Sung Jung.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Jin, HY., Jung, ES. & Lee, D. High-performance IoT streaming data prediction system using Spark: a case study of air pollution. Neural Comput & Applic 32, 13147–13154 (2020). https://doi.org/10.1007/s00521-019-04678-9

  • DOI: https://doi.org/10.1007/s00521-019-04678-9
