Part of the book series: Studies in Big Data (SBD, volume 94)

Abstract

Data science is the process of extracting insights from data, including big data. The digital era generates large volumes of data, and analysing these data is crucial for business growth (Provost and Fawcett [1]).

References

  1. Provost, F., Fawcett, T.: Data Science for Business: What You Need to Know about Data Mining and Data-Analytic Thinking. O'Reilly Media, Inc. (2013)

  2. Shinde, G.R., Kalamkar, A.B., Mahalle, P.N., Dey, N.: Data Analytics for Pandemics: A COVID-19 Case Study. CRC Press (2020)

  3. Waller, M.A., Fawcett, S.E.: Data science, predictive analytics, and big data: a revolution that will transform supply chain design and management (2013)

  4. Shinde, G.R., Kalamkar, A.B., Mahalle, P.N., Dey, N., Chaki, J., Hassanien, A.E.: Forecasting models for coronavirus disease (COVID-19): a survey of the state-of-the-art. SN Comput. Sci. 1(4), 1–15 (2020)

  5. Mahalle, P.N., Sable, N.P., Mahalle, N.P., Shinde, G.R.: Data analytics: Covid-19 prediction using multimodal data. In: Intelligent Systems and Methods to Combat Covid-19, pp. 1–10. Springer, Singapore (2020)

  6. Han, J., Kamber, M., Pei, J.: Data mining concepts and techniques third edition. Morgan Kaufmann Ser. Data Manag. Syst. 5(4), 83–124 (2011)

  7. Mahalle, P.N., Sonawane, S.S.: Internet of things in healthcare. In: Foundations of Data Science Based Healthcare Internet of Things, pp. 13–25. Springer, Singapore (2021)

  8. Whitley, E., Ball, J.: Statistics review 1: presenting and summarising data. Crit. Care 6(1), 1–6 (2001)

  9. Potter, K., Hagen, H., Kerren, A., Dannenmann, P.: Methods for presenting statistical information: the box plot. Visual. Large Unstruct. Data Sets 4, 97–106 (2006)

  10. Famili, A., Shen, W.-M., Weber, R., Simoudis, E.: Data preprocessing and intelligent data analysis. Intell. Data Anal. 1(1), 3–23 (1997)

  11. Alasadi, S.A., Bhaya, W.S.: Review of data preprocessing techniques in data mining. J. Eng. Appl. Sci. 12(16), 4102–4107 (2017)

  12. Sukumar, P., Robert, L., Yuvaraj, S.: Review on modern data preprocessing techniques in web usage mining (WUM). In: 2016 International Conference on Computation System and Information Technology for Sustainable Solutions (CSITSS), pp. 64–69. IEEE (2016)

  13. García, S., Ramírez-Gallego, S., Luengo, J., Benítez, J.M., Herrera, F.: Big data preprocessing: methods and prospects. Big Data Analyt. 1(1), 1–22 (2016)

  14. Zhang, Z.: Missing data imputation: focusing on single imputation. Ann. Transl. Med. 4(1) (2016)

  15. Patro, S., Sahu, K.K.: Normalization: a preprocessing stage. arXiv preprint arXiv:1503.06462 (2015)

  16. Saranya, C., Manikandan, G.: A study on normalization techniques for privacy preserving data mining. Int. J. Eng. Technol. (IJET) 5(3), 2701–2704 (2013)

  17. Hancock, J.T., Khoshgoftaar, T.M.: Survey on categorical data for neural networks. J. Big Data 7, 1–41 (2020)

  18. Seger, C.: An investigation of categorical variable encoding techniques in machine learning: binary versus one-hot and feature hashing (2018)

  19. Khalid, S., Khalil, T., Nasreen, S.: A survey of feature selection and feature extraction techniques in machine learning. In: 2014 Science and Information Conference, pp. 372–378. IEEE (2014)

  20. Saurkar, A.V., Pathare, K.G., Gode, S.A.: An overview on web scraping techniques and tools. Int. J. Future Revol. Comput. Sci. Commun. Eng. 4(4), 363–367 (2018)

Author information

Corresponding author

Correspondence to Parikshit Narendra Mahalle.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this chapter

Cite this chapter

Mahalle, P.N., Shinde, G.R., Pise, P.D., Deshmukh, J.Y. (2022). Data Collection and Preparation. In: Foundations of Data Science for Engineering Problem Solving. Studies in Big Data, vol 94. Springer, Singapore. https://doi.org/10.1007/978-981-16-5160-1_2
