A Fast Data Ingestion and Indexing Scheme for Real-Time Log Analytics

Bian, Haoqiong; Chen, Yueguo; Qin, Xiongpai; Du, Xiaoyong

doi:10.1007/978-3-319-25255-1_69


Conference paper
First Online: 13 November 2015


Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9313)

Abstract

Structured log data is a kind of append-only time-series data that grows rapidly as new entries are continuously generated and captured. It has become widespread in application domains such as the Internet, sensor networks and telecommunications. In recent years, many systems have been developed to support batch analysis of such structured log data, but they often fail to meet the high-throughput requirements of real-time log data ingestion and analytics. An efficient index is essential for accelerating log data analytics while still supporting high-throughput data loading. This paper focuses on designing a specialized indexing scheme for real-time log data analytics. The solution adopts a dynamic global hash index to partition tuples into hash buckets. The tuples in each hash bucket are then sorted and buffered in a sort buffer queue. When the amount of data in the queue reaches a threshold, the data is packed into segments before being spilled to disk. In addition, an intra-segment index is maintained by a meta database. With this indexing scheme, the database system achieves high-throughput, real-time data loading together with real-time query performance. In the experiments, the data loading throughput reaches 5 million tuples per second per node, the data loading delay does not exceed 10 seconds, and sub-second query latency is achieved for the given queries.
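The ingestion path described in the abstract (hash partitioning, per-bucket sorting into a sort buffer queue, threshold-triggered packing into segments, and a per-segment index kept as metadata) can be illustrated with a small Python sketch. This is not the authors' implementation; it is a minimal reading of the abstract under our own assumptions: the bucket count is fixed rather than dynamic, tuples are hypothetical `(key, timestamp, payload)` triples sorted by timestamp, the spill threshold counts tuples, "disk" and the meta database are in-memory lists and dicts, the intra-segment index is reduced to a min/max timestamp pair, and all names (`Ingestor`, `Segment`, `spill`, `query`) are hypothetical.

```python
import heapq
import zlib
from dataclasses import dataclass


# Hypothetical sketch of the pipeline in the abstract; not the paper's code.
@dataclass(frozen=True)
class Segment:
    bucket: int
    rows: tuple   # (key, ts, payload) rows, sorted by ts
    min_ts: int   # assumed intra-segment index: timestamp range,
    max_ts: int   # also recorded in the stand-in meta database


class Ingestor:
    def __init__(self, num_buckets: int = 4, spill_threshold: int = 6):
        self.num_buckets = num_buckets            # fixed here; "dynamic" in the paper
        self.spill_threshold = spill_threshold    # queued tuples before a spill
        self.queue: list[tuple[int, list]] = []   # sort buffer queue of (bucket, sorted run)
        self.queued = 0
        self.segments: list[Segment] = []         # stand-in for on-disk segment files
        self.meta: dict[int, list[int]] = {}      # bucket -> segment ids (stand-in meta database)

    def bucket_of(self, key: str) -> int:
        # Deterministic hash partitioning; crc32 is an illustrative choice.
        return zlib.crc32(key.encode()) % self.num_buckets

    def ingest_batch(self, rows):
        buckets: dict[int, list] = {}
        for row in rows:
            buckets.setdefault(self.bucket_of(row[0]), []).append(row)
        for b, part in buckets.items():
            part.sort(key=lambda r: r[1])         # sort each bucket by timestamp
            self.queue.append((b, part))
            self.queued += len(part)
        if self.queued >= self.spill_threshold:
            self.spill()

    def spill(self):
        # Pack queued runs into one time-sorted segment per bucket.
        runs: dict[int, list] = {}
        for b, run in self.queue:
            runs.setdefault(b, []).append(run)
        for b, bucket_runs in runs.items():
            merged = tuple(heapq.merge(*bucket_runs, key=lambda r: r[1]))
            self.meta.setdefault(b, []).append(len(self.segments))
            self.segments.append(Segment(b, merged, merged[0][1], merged[-1][1]))
        self.queue.clear()
        self.queued = 0

    def query(self, key: str, t0: int, t1: int):
        b = self.bucket_of(key)                   # only one bucket is touched
        hits = []
        for sid in self.meta.get(b, []):
            seg = self.segments[sid]
            if seg.max_ts < t0 or seg.min_ts > t1:
                continue                          # pruned using the min/max index alone
            hits += [r for r in seg.rows if r[0] == key and t0 <= r[1] <= t1]
        for qb, run in self.queue:                # rows not yet spilled
            if qb == b:
                hits += [r for r in run if r[0] == key and t0 <= r[1] <= t1]
        return sorted(hits, key=lambda r: r[1])
```

For example, with `spill_threshold=6`, two batches of three tuples trigger one spill, after which `query("api", 0, 10)` visits only the segment of the bucket that `"api"` hashes to, and a range such as `(11, 12)` skips it entirely via the min/max check. In this sketch, rows still in the queue are scanned directly so they are visible before spilling; the abstract does not say whether the original system works this way.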

This work is supported by the Outstanding Innovative Talents Cultivation Funded Programs 2014 of Renmin University of China.



Author information

Authors and Affiliations

Key Laboratory of Data Engineering and Knowledge Engineering (MOE), Renmin University of China, Beijing, 100872, China
Haoqiong Bian, Yueguo Chen, Xiongpai Qin & Xiaoyong Du


Editor information

Editors and Affiliations

University of Hong Kong, Hong Kong, China
Reynold Cheng
Computer Science, Peking University, Beijing, China
Bin Cui
Advanced Digital Sciences Center (ADSC), Singapore, Singapore
Zhenjie Zhang
University of Technology, Guangzhou, China
Ruichu Cai
Guangxi University, Guangxi, China
Jia Xu


About this paper

Cite this paper

Bian, H., Chen, Y., Qin, X., Du, X. (2015). A Fast Data Ingestion and Indexing Scheme for Real-Time Log Analytics. In: Cheng, R., Cui, B., Zhang, Z., Cai, R., Xu, J. (eds) Web Technologies and Applications. APWeb 2015. Lecture Notes in Computer Science, vol 9313. Springer, Cham. https://doi.org/10.1007/978-3-319-25255-1_69


DOI: https://doi.org/10.1007/978-3-319-25255-1_69
Published: 13 November 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-25254-4
Online ISBN: 978-3-319-25255-1
eBook Packages: Computer Science, Computer Science (R0)
