Reducing data transfer in big-data workflows: the computation-flow delegated approach

  • Original Article
Journal of Data, Information and Management

Abstract

Existing approaches to executing orchestrated bioinformatics workflows require datasets to be transferred from biological data services to the analysis tool (computation) services of the workflow. Moving data to computation in this way degrades workflow performance, especially when the workflow must handle big data. Because the analysis tools in a workflow are much smaller than the datasets they process, we propose a novel computation-flow delegated (CFD) approach that minimizes dataflow and improves workflow performance. The CFD approach lets the tool services of the workflow dynamically migrate analysis tools to the datasets, so that computation is performed on the data side during workflow execution. We use a set of mobile agents to operate the CFD approach and present a mobile agent-based computation-flow delegation (MABCFD) framework to execute the workflow tasks. We implement a prototype of the MABCFD framework and analyze the performance of the CFD approach empirically by executing, in isolation, workflow patterns (sequence, fan-out and fan-in) common to bioinformatics applications. The performance analysis shows that the computation-driven CFD approach consistently outperforms existing data-driven approaches across all patterns and scales favorably with data size.
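
The intuition behind the abstract can be illustrated with a toy transfer-cost model for the sequence pattern. This is only a sketch under stated assumptions, not the MABCFD framework or the paper's evaluation: the `Task` type, both function names, and all byte sizes are hypothetical, each transfer is counted once, and delivery of the final result is ignored in both models.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Task:
    """One step of a sequence workflow (all sizes are hypothetical)."""
    tool_bytes: int    # size of the analysis tool that could be migrated
    output_bytes: int  # size of the result the tool produces


def data_driven_transfer(dataset_bytes: int, tasks: Sequence[Task]) -> int:
    """Bytes moved when every input is shipped to the tool service.

    Assumption: the dataset goes to the first tool service, and each
    intermediate result goes to the next tool service in the sequence.
    """
    moved = 0
    current_input = dataset_bytes
    for task in tasks:
        moved += current_input            # input travels to the tool
        current_input = task.output_bytes  # result feeds the next step
    return moved


def computation_driven_transfer(tasks: Sequence[Task]) -> int:
    """Bytes moved when every tool migrates to the host holding the data.

    Assumption: intermediate results stay on the data side, so only the
    tools themselves cross the network.
    """
    return sum(task.tool_bytes for task in tasks)


if __name__ == "__main__":
    # Hypothetical two-step sequence over a 1 GB dataset.
    steps = [Task(tool_bytes=5_000_000, output_bytes=200_000_000),
             Task(tool_bytes=2_000_000, output_bytes=10_000_000)]
    print("data-driven bytes:", data_driven_transfer(1_000_000_000, steps))
    print("computation-driven bytes:", computation_driven_transfer(steps))
```

In this model the data-driven cost grows with the dataset and intermediate result sizes, while the computation-driven cost depends only on the tool sizes. Real deployments would also incur agent migration and execution overheads that the sketch leaves out.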

Author information

Corresponding author

Correspondence to Rickey T. P. Nunes.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Nunes, R.T.P., Deshpande, S.L. Reducing data transfer in big-data workflows: the computation-flow delegated approach. J. of Data, Inf. and Manag. 1, 129–145 (2019). https://doi.org/10.1007/s42488-019-00012-z

  • DOI: https://doi.org/10.1007/s42488-019-00012-z
