Application-Aware Storage Strategy for Scientific Data

  • Conference paper
Contemporary Research on E-business Technology and Strategy (iCETS 2012)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 332)

Abstract

The explosive growth of modern scientific data poses new challenges for storing and accessing data at petabyte scale. Traditional file systems and databases cannot meet the requirements of managing scientific data. Arrays are considered a natural data model for scientific data, and several science-oriented systems have been developed to handle the array data model. However, a shortcoming of these systems is that most of them use a “no overwrite” storage strategy, which makes performance unstable across different applications. In this paper, we propose an application-aware storage strategy that gradually optimizes the data layout according to different access patterns. We implemented the strategy on top of SciDB by creating arrays with different indices for specific parts of the dataset. Experiments were conducted to verify the proposed strategy, and the results show that it improves the performance of a science-oriented database in supporting various kinds of applications.
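To make the idea of gradually adapting layout to access patterns concrete, the following is a minimal sketch in plain Python. It is not the paper's implementation and does not use SciDB; the class `AdaptiveArrayStore`, the `hot_threshold` parameter, and the region-keyed copies are all hypothetical names chosen for illustration. The sketch assumes that the base array is never overwritten and that a separate copy is materialized for a region once it has been read often enough.

```python
from collections import Counter


class AdaptiveArrayStore:
    """Toy 2-D store: the base array is never modified; hot regions get extra copies."""

    def __init__(self, base, hot_threshold=3):
        self.base = base                    # original data, left untouched
        self.hot_threshold = hot_threshold  # hypothetical tuning knob
        self.hits = Counter()               # region -> reads served from the base array
        self.copies = {}                    # region -> materialized copy of that region

    def _slice(self, rows, cols):
        # Extract base[rows[0]:rows[1]][cols[0]:cols[1]] as a new nested list.
        return [row[cols[0]:cols[1]] for row in self.base[rows[0]:rows[1]]]

    def read(self, rows, cols):
        region = (rows, cols)
        if region in self.copies:
            return self.copies[region]      # served from the region-specific copy
        self.hits[region] += 1
        result = self._slice(rows, cols)
        if self.hits[region] >= self.hot_threshold:
            # The region is now "hot": keep a dedicated copy for later reads.
            self.copies[region] = result
        return result


base = [[r * 10 + c for c in range(10)] for r in range(10)]
store = AdaptiveArrayStore(base, hot_threshold=2)
region = ((2, 4), (3, 6))
store.read(*region)
store.read(*region)                # second read crosses the threshold
print(region in store.copies)      # True
print(store.read(*region))         # [[23, 24, 25], [33, 34, 35]]
```

In this sketch the adaptation is driven purely by read counts per exact region. A real system would also have to decide how to merge overlapping regions and when to discard copies, which the abstract does not describe.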

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Wang, R. et al. (2012). Application-Aware Storage Strategy for Scientific Data. In: Khachidze, V., Wang, T., Siddiqui, S., Liu, V., Cappuccio, S., Lim, A. (eds) Contemporary Research on E-business Technology and Strategy. iCETS 2012. Communications in Computer and Information Science, vol 332. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-34447-3_61

  • DOI: https://doi.org/10.1007/978-3-642-34447-3_61

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-34446-6

  • Online ISBN: 978-3-642-34447-3

  • eBook Packages: Computer Science, Computer Science (R0)
