Decaf: Decoupled Dataflows for In Situ Workflows

  • Conference paper
In Situ Visualization for Computational Science

Part of the book series: Mathematics and Visualization (MATHVISUAL)

Abstract

In situ workflows manage the coordination and communication in a directed graph of heterogeneous tasks executing simultaneously in a high-performance computing system. The communication through the graph can be modeled as a dataflow, and Decaf is a software library for managing that dataflow in in situ workflows. Decaf includes a Python API for defining a workflow, which makes it a complete stand-alone system. Because a science campaign may be composed of several workflow tools, the dataflow design also allows Decaf to support the communication needs of other workflow management systems. Decaf creates efficient parallel communication channels over MPI, including arbitrary data transformations ranging from simple data forwarding to complex data redistribution. Decaf provides three building blocks: (i) a lightweight data model that enables users to define the policies needed to preserve semantic integrity during data redistribution, (ii) flow control designed to prevent overflows in the communication channels between tasks, and (iii) a data contract mechanism that allows users to specify the required data in the parallel communication of the workflow tasks. Decaf has been used in a variety of applications, two of which are highlighted. The first is from materials science, where the science campaign consists of several cooperating workflow tools and Decaf supports these tools as the dataflow layer. The second is motivated by computational cosmology, where the in situ workflow consists of three parallel tasks: synthetic particle generation, Voronoi tessellation, and density estimation.
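
To make the graph view of a workflow concrete, the cosmology example can be sketched as a small directed graph in plain Python. This is only an illustration and does not use Decaf's Python API: the task names, the assumption that the three tasks form a linear chain, and the helper functions `dataflow_links` and `launch_order` are all hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical task graph for the cosmology example. Each key is a task and
# each value is the set of tasks whose output it consumes. The linear chain
# is an assumption made for illustration, not something the abstract states.
workflow = {
    "particle_generation": set(),
    "voronoi_tessellation": {"particle_generation"},
    "density_estimation": {"voronoi_tessellation"},
}


def dataflow_links(graph):
    """Return one (producer, consumer) pair per edge of the task graph.

    In a dataflow model, each such edge is a communication channel.
    """
    return sorted((src, dst) for dst, srcs in graph.items() for src in srcs)


def launch_order(graph):
    """Return the tasks with every producer listed before its consumers."""
    return list(TopologicalSorter(graph).static_order())


links = dataflow_links(workflow)
order = launch_order(workflow)
```

Here `links` holds the two channels that a dataflow layer would have to manage, and `order` lists producers before consumers. In an actual in situ run the tasks execute concurrently, so the ordering describes only the direction in which data moves, not a sequential schedule.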

Notes

  1. https://github.com/tpeterka/decaf

Acknowledgements

This work is supported by Advanced Scientific Computing Research, Office of Science, U.S. Department of Energy, under Contract DE-AC02-06CH11357, program manager Laura Biven. This research used resources of the Argonne Leadership Computing Facility, which is a DOE Office of Science User Facility supported under Contract DE-AC02-06CH11357.

Author information

Corresponding author

Correspondence to Orcun Yildiz.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Yildiz, O., Dreher, M., Peterka, T. (2022). Decaf: Decoupled Dataflows for In Situ Workflows. In: Childs, H., Bennett, J.C., Garth, C. (eds) In Situ Visualization for Computational Science. Mathematics and Visualization. Springer, Cham. https://doi.org/10.1007/978-3-030-81627-8_7