
In-situ feature-based objects tracking for data-intensive scientific and enterprise analytics workflows

Cluster Computing

Abstract

Emerging scientific simulations on leadership-class systems are generating huge amounts of data, and processing this data in an efficient and timely manner is critical for extracting insights from the simulations. However, the widening gap between computation and disk I/O speeds makes traditional data analytics pipelines based on post-processing cost-prohibitive and often infeasible. In this paper, we investigate an alternative approach that aims to bring the analytics closer to the data through in-situ execution of data analysis operations. Specifically, we present the design, implementation, and evaluation of a framework that can support in-situ feature-based object tracking on distributed scientific datasets. Central to this framework is a scalable, decentralized online clustering and cluster tracking algorithm, which executes in-situ (on separate cores) in parallel with the simulation processes and retrieves data from the simulations directly via on-chip shared memory. The results of our experimental evaluation demonstrate that the in-situ approach significantly reduces the cost of data movement, that the presented framework can support scalable feature-based object tracking, and that it can be used effectively for in-situ analytics in large-scale simulations.
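To make the idea of feature-based object tracking concrete, the sketch below shows one generic, serial technique: a feature is a 4-connected region of a 2-D grid whose values exceed a threshold, and a feature at one timestep is matched to a feature at the next timestep when their cell sets overlap. This is a swapped-in illustration, not the decentralized online clustering and cluster tracking algorithm described in the paper; it runs in a single process rather than in-situ, and the function names (`extract_features`, `track`, `classify`) and the `threshold` parameter are hypothetical.

```python
from collections import deque


def extract_features(grid, threshold):
    """Hypothetical helper: return one set of (row, col) cells per
    4-connected region whose values exceed `threshold`."""
    rows, cols = len(grid), len(grid[0])
    seen = set()
    features = []
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] <= threshold or (r, c) in seen:
                continue
            # Breadth-first flood fill from this seed cell.
            region = set()
            queue = deque([(r, c)])
            seen.add((r, c))
            while queue:
                y, x = queue.popleft()
                region.add((y, x))
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and (ny, nx) not in seen and grid[ny][nx] > threshold):
                        seen.add((ny, nx))
                        queue.append((ny, nx))
            features.append(region)
    return features


def track(prev, curr):
    """Pair feature i at step t with feature j at step t+1 when they share a cell."""
    return [(i, j) for i, a in enumerate(prev) for j, b in enumerate(curr) if a & b]


def classify(prev, curr, matches):
    """Unmatched features at step t+1 are births; unmatched ones at step t are deaths."""
    matched_prev = {i for i, _ in matches}
    matched_curr = {j for _, j in matches}
    births = [j for j in range(len(curr)) if j not in matched_curr]
    deaths = [i for i in range(len(prev)) if i not in matched_prev]
    return births, deaths


if __name__ == "__main__":
    t0 = [[0, 5, 5, 0],
          [0, 5, 0, 0],
          [0, 0, 0, 7],
          [0, 0, 0, 7]]
    t1 = [[0, 0, 5, 5],
          [0, 0, 5, 0],
          [0, 0, 0, 0],
          [0, 0, 7, 7]]
    f0, f1 = extract_features(t0, 0), extract_features(t1, 0)
    print(track(f0, f1))  # [(0, 0), (1, 1)]
```

Overlap matching of this kind assumes that features move less than their own extent between consecutive timesteps. A decentralized, in-situ variant would likely also need to reconcile features that straddle the boundaries between the subdomains owned by different processes.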

Acknowledgments

The research presented in this work is supported in part by the US National Science Foundation (NSF) via grant numbers OCI 1310283, DMS 1228203, IIP 0758566, OCI 1339036 and CNS 1305375; by the Director, Office of Advanced Scientific Computing Research, Office of Science, of the U.S. Department of Energy through the Scientific Discovery through Advanced Computing (SciDAC) Institute of Scalable Data Management, Analysis and Visualization (SDAV) under award number DE-SC0007455; by the Advanced Scientific Computing Research and Fusion Energy Sciences Partnership for Edge Physics Simulations (EPSI) under award number DE-FG02-06ER54857; by the ExaCT Combustion Co-Design Center via subcontract number 4000110839 from UT Battelle; and by an IBM Faculty Award. This work used the Extreme Science and Engineering Discovery Environment (XSEDE) under project number TG-CCR110035, which is supported by NSF grant number OCI 1053575. The research was conducted as part of the NSF Cloud and Autonomic Computing (CAC) Center at Rutgers University and the Rutgers Discovery Informatics Institute (RDI2). We thank Dr. Deborah Silver and Sedat Ozer for useful discussions on data visualization and for providing the scientific dataset used in our experimental evaluation.

Author information

Correspondence to Ivan Rodero.

About this article

Cite this article

Lasluisa, S., Zhang, F., Jin, T. et al. In-situ feature-based objects tracking for data-intensive scientific and enterprise analytics workflows. Cluster Comput 18, 29–40 (2015). https://doi.org/10.1007/s10586-014-0396-6

  • DOI: https://doi.org/10.1007/s10586-014-0396-6