Data science strategies leading to the development of data scientists’ skills in organizations

  • Original Article
Neural Computing and Applications

Abstract

The purpose of this paper is to compare the strategies of companies in terms of their data science practices and methodologies, and to examine the data specificities/variables that can influence the definition of a data science strategy in pharma companies. The paper is an empirical study: a set of statistical tests is used to verify the differences between companies with a data science strategy and companies without one. We designed a specific questionnaire and applied it to a sample of 280 pharma companies. The main findings are based on the analysis of the following variables, compared across the two groups of companies: overwhelming volume, managing unstructured data, data quality, availability of data, access rights to data, data ownership issues, cost of data, lack of pre-processing facilities, lack of technology, shortage of talent/skills, privacy concerns and regulatory risks, security, and difficulties of data portability. The paper offers an in-depth comparative analysis of companies with and without a data science strategy. Its key limitation concerns the literature review: because the theme is novel, there is a lack of scientific studies on this specific aspect of data science. In terms of practical business implications, an organization with a data science strategy will have better direction and management practices, because its decision-making process is based on accurate and valuable data, but it needs data scientists' skills to fulfil those goals.

Funding

This study was funded by Fundação para a Ciência e Tecnologia, Grant: UIDB/00315/2020.

Author information

Corresponding author

Correspondence to Maria José Sousa.

Ethics declarations

Conflict of interest

The authors declare that there is no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file1 (XLSX 26 kb)

About this article

Cite this article

Sousa, M.J., Melé, P.M., Pesqueira, A.M. et al. Data science strategies leading to the development of data scientists’ skills in organizations. Neural Comput & Applic 33, 14523–14531 (2021). https://doi.org/10.1007/s00521-021-06095-3

  • DOI: https://doi.org/10.1007/s00521-021-06095-3
