A Fast Data Ingestion and Indexing Scheme for Real-Time Log Analytics

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9313)

Abstract

Structured log data is a kind of append-only time-series data that grows rapidly as new entries are continuously generated and captured. It has become prevalent in application domains such as the Internet, sensor networks and telecommunications. In recent years, many systems have been developed to support batch analysis of such structured log data, but they often fail to meet the high-throughput requirements of real-time log data ingestion and analytics. An efficient index is essential for accelerating log data analytics while still supporting high-throughput data loading. This paper focuses on designing a specialized indexing scheme for real-time log data analytics. The solution adopts a dynamic global hash index to partition tuples into hash buckets. The tuples in each hash bucket are then sorted and buffered in a sort buffer queue. When the amount of data in the queue reaches a threshold, the data is packed into segments before being spilled to disk. In addition, an intra-segment index is maintained by a meta database. With this indexing scheme, the database system achieves high-throughput, real-time data loading and query performance. In the experiments, data loading throughput reaches 5 million tuples per second per node, the data loading delay does not exceed 10 seconds, and sub-second query performance is achieved for the given queries.
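The abstract describes the ingest path only at a high level. The following Python code is a toy, single-process sketch of that pipeline under our own assumptions, not the authors' implementation: it uses a fixed number of hash buckets (the paper's global hash index is dynamic), sorts buffered tuples by timestamp, stands in a dict for "disk" and another dict for the meta database, and approximates the intra-segment index with per-segment min/max timestamps plus binary search over the sorted segment. The names `LogIngestor`, `SegmentMeta`, `spill_threshold` and the `(key, timestamp, payload)` tuple layout are hypothetical.

```python
import zlib
from bisect import bisect_left, bisect_right
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical tuple layout: (partition_key, timestamp, payload).
Row = Tuple[str, int, str]


@dataclass
class SegmentMeta:
    """Per-segment entry in a stand-in 'meta database' (here a plain dict)."""
    segment_id: int
    bucket: int
    min_ts: int
    max_ts: int
    count: int


class LogIngestor:
    """Toy ingest path: hash-partition -> buffer -> sort -> pack segment -> record metadata."""

    def __init__(self, num_buckets: int = 4, spill_threshold: int = 8) -> None:
        self.num_buckets = num_buckets          # fixed here; the paper's hash index is dynamic
        self.spill_threshold = spill_threshold  # total buffered rows that trigger a flush
        self.buffers: Dict[int, List[Row]] = {b: [] for b in range(num_buckets)}
        self.buffered = 0
        # "Disk": segment_id -> (sorted timestamps, rows sorted by timestamp).
        self.segments: Dict[int, Tuple[List[int], List[Row]]] = {}
        self.meta: Dict[int, SegmentMeta] = {}
        self.next_segment_id = 0

    def bucket_of(self, key: str) -> int:
        # crc32 is deterministic across runs, unlike Python's salted str hash.
        return zlib.crc32(key.encode("utf-8")) % self.num_buckets

    def ingest(self, row: Row) -> None:
        self.buffers[self.bucket_of(row[0])].append(row)
        self.buffered += 1
        if self.buffered >= self.spill_threshold:
            self.flush()

    def flush(self) -> None:
        """Sort each non-empty bucket buffer by timestamp and pack it into one segment."""
        for bucket, rows in self.buffers.items():
            if not rows:
                continue
            rows.sort(key=lambda r: r[1])
            seg_id = self.next_segment_id
            self.next_segment_id += 1
            self.segments[seg_id] = ([r[1] for r in rows], rows)
            self.meta[seg_id] = SegmentMeta(seg_id, bucket, rows[0][1], rows[-1][1], len(rows))
        self.buffers = {b: [] for b in range(self.num_buckets)}
        self.buffered = 0

    def query(self, key: str, t_lo: int, t_hi: int) -> List[Row]:
        """Rows for `key` with t_lo <= timestamp <= t_hi, from flushed segments only."""
        bucket = self.bucket_of(key)
        out: List[Row] = []
        for m in self.meta.values():
            # Prune using metadata alone: wrong bucket or non-overlapping time range.
            if m.bucket != bucket or m.max_ts < t_lo or m.min_ts > t_hi:
                continue
            ts, rows = self.segments[m.segment_id]
            lo, hi = bisect_left(ts, t_lo), bisect_right(ts, t_hi)
            # Several keys can share a bucket, so filter on the key itself.
            out.extend(r for r in rows[lo:hi] if r[0] == key)
        return sorted(out, key=lambda r: r[1])


if __name__ == "__main__":
    ing = LogIngestor(num_buckets=4, spill_threshold=8)
    for t in [7, 3, 5, 1, 6, 2, 4, 0]:      # out-of-order arrivals
        ing.ingest(("host-a", t, f"m{t}"))  # the 8th row triggers a flush
    print([r[1] for r in ing.query("host-a", 2, 5)])  # [2, 3, 4, 5]
```

Sorting before packing is what makes the read side cheap in this sketch: the min/max bounds let a query skip whole segments without touching them, and binary search narrows the scan inside the segments that remain. Rows still sitting in the buffers are not visible to queries until the next flush.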

This work is supported by the Outstanding Innovative Talents Cultivation Funded Programs 2014 of Renmin University of China.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Bian, H., Chen, Y., Qin, X., Du, X. (2015). A Fast Data Ingestion and Indexing Scheme for Real-Time Log Analytics. In: Cheng, R., Cui, B., Zhang, Z., Cai, R., Xu, J. (eds) Web Technologies and Applications. APWeb 2015. Lecture Notes in Computer Science, vol 9313. Springer, Cham. https://doi.org/10.1007/978-3-319-25255-1_69

  • DOI: https://doi.org/10.1007/978-3-319-25255-1_69

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-25254-4

  • Online ISBN: 978-3-319-25255-1

  • eBook Packages: Computer Science, Computer Science (R0)
