Abstract
The massive computation power and storage capacity of cloud computing systems allow scientists to deploy data-intensive applications without infrastructure investment and to store large application datasets in the cloud. Under the pay-as-you-go model, data placement strategies have been developed to cost-effectively store the large volumes of datasets generated in scientific cloud workflows. Promising as it is, this paradigm also introduces many new data security challenges when users outsource sensitive data for sharing on cloud servers, which are not within the same trusted domain as the data owners. The challenge is further complicated by the security constraints on potentially sensitive data in scientific cloud workflows. To address this problem, we propose a security-aware intermediate data placement strategy. First, we build a security overhead model to measure the security overheads incurred by sensitive data. Second, we develop a data placement strategy that dynamically places the intermediate data of scientific workflows. Finally, our experimental results show that the strategy can effectively improve intermediate data security while keeping data transfer time under control during the execution of scientific workflows.
Acknowledgments
This work was supported by the National Natural Science Foundation of China under Grant Nos. 61272107, 61202173, and 61103068; the Open Foundation of State Key Lab of Software Engineering of Wuhan University (SKLSE20080720 and SKLSE2012-09-29); the Open Foundation of Key Laboratory of Embedded System and Service Computing, Ministry of Education, Tongji University; the Open Foundation of State Key Laboratory for Novel Software Technology of Nanjing University (KFKT2013B21); the Fundamental Research Funds for the Central Universities (WUT: 2013-IV-022 and WUT: 2014-IV-107); and the Ph.D. Programs Foundation of Ministry of Education under Grant Nos. 20090072110035 and 20110072120017.
Cite this article
Liu, W., Peng, S., Du, W. et al. Security-aware intermediate data placement strategy in scientific cloud workflows. Knowl Inf Syst 41, 423–447 (2014). https://doi.org/10.1007/s10115-014-0755-x
DOI: https://doi.org/10.1007/s10115-014-0755-x