
Security-aware intermediate data placement strategy in scientific cloud workflows

  • Regular Paper
Knowledge and Information Systems

Abstract

The massive computation power and storage capacity of cloud computing systems allow scientists to deploy data-intensive applications without infrastructure investment, with large application datasets stored in the cloud. Building on the pay-as-you-go model, data placement strategies have been developed to cost-effectively store the large volumes of datasets generated in scientific cloud workflows. Promising as it is, this paradigm also introduces new data security challenges when users outsource sensitive data for sharing on cloud servers that are not within the same trusted domain as the data owners. The challenge is further complicated by the security constraints on potentially sensitive data in scientific cloud workflows. To address this problem, we propose a security-aware intermediate data placement strategy. First, we build a security overhead model to measure the security overheads incurred by sensitive data. Second, we develop a data placement strategy that dynamically places the intermediate data of scientific workflows. Finally, our experimental results show that the strategy effectively improves intermediate data security while keeping data transfer time acceptable during the execution of scientific workflows.
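The trade-off named in the abstract, security overhead against data transfer time, can be illustrated with a small sketch. This is not the model or the algorithm of the paper, whose details are not shown in this preview. It is a minimal greedy example under stated assumptions: the classes `DataCenter` and `Dataset`, the fields `security_level`, `required_security`, and `bandwidth_mb_s`, the shortfall-based `security_overhead` function, and the `weight` parameter are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataCenter:
    name: str
    security_level: float  # hypothetical: protection offered, in [0, 1]
    bandwidth_mb_s: float  # hypothetical: transfer rate to the consuming task


@dataclass(frozen=True)
class Dataset:
    name: str
    size_mb: float
    required_security: float  # hypothetical: sensitivity of the data, in [0, 1]


def security_overhead(ds: Dataset, dc: DataCenter) -> float:
    """Assumed penalty: zero if the data center meets the requirement,
    otherwise the security shortfall scaled by the dataset size."""
    shortfall = max(0.0, ds.required_security - dc.security_level)
    return shortfall * ds.size_mb


def transfer_time(ds: Dataset, dc: DataCenter) -> float:
    """Time in seconds to move the dataset at the data center's bandwidth."""
    return ds.size_mb / dc.bandwidth_mb_s


def place(datasets, centers, weight: float = 1.0) -> dict:
    """Greedy placement: each dataset goes to the data center with the
    lowest combined cost of transfer time and weighted security overhead."""
    placement = {}
    for ds in datasets:
        best = min(
            centers,
            key=lambda dc: transfer_time(ds, dc) + weight * security_overhead(ds, dc),
        )
        placement[ds.name] = best.name
    return placement


# Illustrative inputs only; the values are made up for this sketch.
centers = [
    DataCenter("fast_open", security_level=0.2, bandwidth_mb_s=100.0),
    DataCenter("slow_secure", security_level=0.9, bandwidth_mb_s=20.0),
]
datasets = [
    Dataset("public", size_mb=200.0, required_security=0.1),
    Dataset("sensitive", size_mb=200.0, required_security=0.8),
]

if __name__ == "__main__":
    print(place(datasets, centers))
```

With these made-up inputs, the non-sensitive dataset is placed on the faster data center, while the sensitive dataset is placed on the slower but more secure one because the security penalty outweighs the transfer-time saving. Setting `weight=0.0` ignores security and reduces the sketch to a pure transfer-time placement.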



Acknowledgments

This work was supported by the National Natural Science Foundation of China under Grant Nos. 61272107, 61202173, and 61103068; the Open Foundation of State Key Lab of Software Engineering of Wuhan University (SKLSE20080720 and SKLSE2012-09-29); the Open Foundation of Key Laboratory of Embedded System and Service Computing, Ministry of Education, of Tongji University; the Open Foundation of State Key Laboratory for Novel Software Technology of Nanjing University (KFKT2013B21); the Fundamental Research Funds for the Central Universities (WUT: 2013-IV-022 and WUT: 2014-IV-107); and the Ph.D. Programs Foundation of Ministry of Education under Grant Nos. 20090072110035 and 20110072120017.

Author information

Corresponding author

Correspondence to Wei Du.

About this article

Cite this article

Liu, W., Peng, S., Du, W. et al. Security-aware intermediate data placement strategy in scientific cloud workflows. Knowl Inf Syst 41, 423–447 (2014). https://doi.org/10.1007/s10115-014-0755-x

  • DOI: https://doi.org/10.1007/s10115-014-0755-x
