Abstract
In situ workflows manage the coordination and communication in a directed graph of heterogeneous tasks executing simultaneously in a high-performance computing system. The communication through the graph can be modeled as a dataflow, and Decaf is a software library for managing the dataflow for in situ workflows. Decaf includes a Python API to define a workflow, creating a complete stand-alone system, but the dataflow design also allows Decaf to support the communication needs of other workflow management systems, because a science campaign may be composed of several workflow tools. Decaf creates efficient parallel communication channels over MPI, including arbitrary data transformations ranging from simple data forwarding to complex data redistribution. Decaf provides three building blocks: (i) a lightweight data model that enables users to define the policies needed to preserve semantic integrity during data redistribution, (ii) flow control designed to prevent overflows in the communication channels between tasks, and (iii) a data contract mechanism that allows users to specify the required data in the parallel communication of the workflow tasks. Decaf has been used in a variety of applications, two of which are highlighted here. The first is from materials science, where the science campaign consists of several cooperating workflow tools and Decaf supports these tools as the dataflow layer. The second is motivated by computational cosmology, where the in situ workflow consists of three parallel tasks: synthetic particle generation, Voronoi tessellation, and density estimation.
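The workflow graph and the data contract idea described above can be illustrated with a minimal, standard-library Python sketch. It does not use Decaf's actual Python API: the `Task`, `Edge`, and `check_contracts` names, the field names (`position`, `velocity`, `cells`, `volume`), and the process counts are all hypothetical, chosen only to mirror the three-task cosmology example. The sketch assumes a contract is simply the set of fields a consumer requires, checked against what its producer outputs.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    """A workflow node: a parallel task with a process count (hypothetical)."""
    name: str
    nprocs: int
    produces: set = field(default_factory=set)  # fields the task outputs
    requires: set = field(default_factory=set)  # fields the task needs as input


@dataclass
class Edge:
    """A dataflow link from a producer task to a consumer task."""
    producer: str
    consumer: str


def check_contracts(tasks, edges):
    """Return, per edge, the fields to send; raise if a requirement is unmet."""
    by_name = {t.name: t for t in tasks}
    to_send = {}
    for e in edges:
        prod, cons = by_name[e.producer], by_name[e.consumer]
        missing = cons.requires - prod.produces
        if missing:
            raise ValueError(
                f"{e.consumer} requires {sorted(missing)} "
                f"not produced by {e.producer}"
            )
        # Only the fields the consumer asked for travel across the link.
        to_send[(e.producer, e.consumer)] = sorted(cons.requires)
    return to_send


# Three-task pipeline; field names and process counts are illustrative only.
tasks = [
    Task("particle_generation", nprocs=4,
         produces={"position", "velocity", "id"}),
    Task("tessellation", nprocs=2,
         requires={"position"}, produces={"cells", "volume"}),
    Task("density_estimation", nprocs=1,
         requires={"cells", "volume"}),
]
edges = [
    Edge("particle_generation", "tessellation"),
    Edge("tessellation", "density_estimation"),
]

plan = check_contracts(tasks, edges)
for link, fields in plan.items():
    print(link, "->", fields)
```

In this sketch the contract check runs once, before any task starts, so a mismatch between producer and consumer is reported at workflow-definition time, and fields that no consumer requested (here `velocity` and `id`) are never scheduled for transfer.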
Acknowledgements
This work is supported by Advanced Scientific Computing Research, Office of Science, U.S. Department of Energy, under Contract DE-AC02-06CH11357, program manager Laura Biven. This research used resources of the Argonne Leadership Computing Facility, which is a DOE Office of Science User Facility supported under Contract DE-AC02-06CH11357.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Yildiz, O., Dreher, M., Peterka, T. (2022). Decaf: Decoupled Dataflows for In Situ Workflows. In: Childs, H., Bennett, J.C., Garth, C. (eds) In Situ Visualization for Computational Science. Mathematics and Visualization. Springer, Cham. https://doi.org/10.1007/978-3-030-81627-8_7
DOI: https://doi.org/10.1007/978-3-030-81627-8_7
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-81626-1
Online ISBN: 978-3-030-81627-8
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)