Abstract
Existing approaches to executing orchestrated bioinformatics workflows require datasets to be transferred from biological data services to the analysis tool (computation) services of the workflow for data analysis. Moving data to computation during workflow execution degrades workflow performance, especially when the workflow has to handle big data. Since the analysis tools in a workflow are much smaller than the datasets, in this paper we propose a novel computation-flow delegated (CFD) approach that minimizes dataflow and improves workflow performance. The CFD approach lets the tool services of the workflow dynamically migrate analysis tools towards the datasets, so that computation is performed on the data side during workflow execution. We use a set of mobile agents to operate the CFD approach and present a mobile agent-based computation-flow delegation (MABCFD) framework to execute the workflow tasks. We implement a prototype of the MABCFD framework and analyze the performance of the CFD approach empirically by executing, in isolation, workflow patterns (sequence, fan-out and fan-in) common to bioinformatics applications. The performance analysis shows that the computation-driven CFD approach consistently outperforms the existing data-driven approaches across all patterns and scales favorably with data size.
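The intuition behind moving tools rather than datasets can be illustrated with a toy transfer-cost model. This is only a sketch under stated assumptions, not the MABCFD framework or the paper's evaluation: the `Task` class, both function names, and all sizes are hypothetical, and the model counts only megabytes moved over the network, ignoring agent overhead, latency, and compute time.

```python
from dataclasses import dataclass


@dataclass
class Task:
    """One workflow task; all sizes are hypothetical, in megabytes."""
    dataset_mb: float  # input dataset held by the data service
    tool_mb: float     # analysis tool that processes the dataset
    result_mb: float   # result produced by the task


def data_driven_transfer(tasks):
    """Megabytes moved when each dataset is shipped to the tool service."""
    return sum(t.dataset_mb + t.result_mb for t in tasks)


def computation_driven_transfer(tasks):
    """Megabytes moved when each tool is shipped to the data side instead."""
    return sum(t.tool_mb + t.result_mb for t in tasks)


# Hypothetical two-task sequence: large datasets, small tools.
tasks = [Task(dataset_mb=1000, tool_mb=5, result_mb=10),
         Task(dataset_mb=2000, tool_mb=8, result_mb=20)]

print(data_driven_transfer(tasks))         # megabytes moved, data to tool
print(computation_driven_transfer(tasks))  # megabytes moved, tool to data
```

Under these assumptions the computation-driven total stays small whenever tools are much smaller than datasets, which is the premise stated in the abstract; the sketch says nothing about the measured results.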
Cite this article
Nunes, R.T.P., Deshpande, S.L. Reducing data transfer in big-data workflows: the computation-flow delegated approach. J. of Data, Inf. and Manag. 1, 129–145 (2019). https://doi.org/10.1007/s42488-019-00012-z
DOI: https://doi.org/10.1007/s42488-019-00012-z