Using the Provenance from Astronomical Workflows to Increase Processing Efficiency

  • Conference paper
Provenance and Annotation of Data and Processes (IPAW 2018)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11017)

Abstract

Astronomy is increasingly becoming a data-driven science as the community builds larger instruments capable of gathering more data than was previously possible. As datasets grow, it becomes ever more important to make efficient use of the available computational resources. In this work, we show how provenance can be used to increase the computational efficiency of astronomical workflows. We describe a provenance-enabled image processing pipeline and motivate the generation of provenance with two relevant use cases: the first investigates the origin of an optical variation, and the second concerns the objects used to calibrate the image. We then queried the provenance to evaluate the relative computational efficiency of answering each use case with and without provenance. We find that recording provenance increases the original processing time of the pipeline by ~45%. However, when evaluating the two use cases, the inclusion of provenance improves processing efficiency by ~99% and ~96% for Use Cases 1 and 2, respectively. Combining these results with the probability that Use Cases 1 and 2 will need to be evaluated, we find a net decrease in computational processing efficiency of 13–44% when provenance generation is incorporated into the workflow. Nevertheless, we deduce that provenance has the potential to produce a net increase in this efficiency if more use cases are considered.
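The trade-off described in the abstract can be sketched as a simple expected-cost model. This is an illustrative assumption, not the authors' formulation: one pipeline run costs a baseline amount, recording provenance adds a fractional overhead to that run, and each use case is answered with some probability at a cost that provenance reduces by a fractional saving. The ~45% overhead and the ~99% and ~96% savings follow the abstract; the `UseCase` type, the function names, and the probabilities and costs in the example are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class UseCase:
    """A question that may need answering after the pipeline has run (hypothetical)."""
    probability: float   # assumed chance that the use case must be evaluated (0..1)
    cost_without: float  # assumed cost of answering it without provenance, same units as `base`
    saving: float        # fractional saving when provenance can be queried instead (0..1)


def expected_cost(base: float, overhead: float,
                  use_cases: Sequence[UseCase], with_provenance: bool) -> float:
    """Expected compute for one pipeline run plus probability-weighted use-case evaluations."""
    if with_provenance:
        run = base * (1.0 + overhead)  # recording provenance slows the original run
        answers = sum(u.probability * u.cost_without * (1.0 - u.saving) for u in use_cases)
    else:
        run = base
        answers = sum(u.probability * u.cost_without for u in use_cases)
    return run + answers


def net_change(base: float, overhead: float, use_cases: Sequence[UseCase]) -> float:
    """Relative change in expected compute when provenance is recorded.

    Positive: more compute overall (a net loss in efficiency).
    Negative: less compute overall (a net gain in efficiency).
    """
    without = expected_cost(base, overhead, use_cases, with_provenance=False)
    with_prov = expected_cost(base, overhead, use_cases, with_provenance=True)
    return (with_prov - without) / without


def breaks_even(base: float, overhead: float, use_cases: Sequence[UseCase]) -> bool:
    """True when the expected savings at least cover the recording overhead."""
    savings = sum(u.probability * u.cost_without * u.saving for u in use_cases)
    return savings >= overhead * base


if __name__ == "__main__":
    # Overhead and savings follow the abstract; probabilities and costs are made up.
    two_cases = [UseCase(0.1, 1.0, 0.99), UseCase(0.1, 1.0, 0.96)]
    print(f"two use cases:  {net_change(1.0, 0.45, two_cases):+.1%}")
    # Add further hypothetical use cases that also benefit from provenance.
    many_cases = two_cases + [UseCase(0.1, 1.0, 0.99)] * 3
    print(f"five use cases: {net_change(1.0, 0.45, many_cases):+.1%}")
```

Under this model, `breaks_even` makes the abstract's final point explicit: recording provenance pays off once the probability-weighted savings summed over all use cases exceed the overhead added to the original run. Each additional use case that benefits from provenance therefore moves the balance towards a net gain, even if the first two alone do not.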

Author information

Correspondence to Michael A. C. Johnson.

Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Cite this paper

Johnson, M.A.C., Moreau, L., Chapman, A., Gandhi, P., Sáenz-Adán, C. (2018). Using the Provenance from Astronomical Workflows to Increase Processing Efficiency. In: Belhajjame, K., Gehani, A., Alper, P. (eds) Provenance and Annotation of Data and Processes. IPAW 2018. Lecture Notes in Computer Science, vol 11017. Springer, Cham. https://doi.org/10.1007/978-3-319-98379-0_8

  • DOI: https://doi.org/10.1007/978-3-319-98379-0_8

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-98378-3

  • Online ISBN: 978-3-319-98379-0

  • eBook Packages: Computer Science (R0)
