High-throughput publish/subscribe on top of LSM-based storage

Qader, Mohiuddin Abdul; Hristidis, Vagelis

doi:10.1007/s10619-018-7236-2

Published: 16 August 2018

Volume 37, pages 101–132, (2019)

Distributed and Parallel Databases

Abstract

State-of-the-art publish/subscribe systems are efficient when the subscriptions are relatively static (for instance, the set of followers in Twitter) or can fit in memory. However, many big data and IoT-based applications now follow a highly dynamic query paradigm, where both continuous queries and data entries number in the millions and can arrive and expire rapidly. In this paper we propose and compare several publish/subscribe storage architectures, based on the popular NoSQL log-structured merge tree (LSM) storage paradigm, to support high-throughput and highly dynamic publish/subscribe systems. Our framework naturally supports subscriptions on both historical and future streaming data, and generates instant notifications. We also extend our framework to efficiently support self-joining subscriptions, where streaming pub/sub records join with past pub/sub entries. Further, we show how hierarchical attributes, such as concept ontologies, can be efficiently supported; for example, a publication’s topic is “politics” whereas a subscription’s topic is “US politics.” We implemented and experimentally evaluated our methods on the popular LSM-based LevelDB system, using real datasets, for simple match and self-joining subscriptions on both flat and hierarchical attributes. Our results show that our approaches achieve significantly higher throughput than state-of-the-art baselines.
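
To make the abstract's terminology concrete, the following is a minimal in-memory sketch of a store in which a subscription is matched against both already-stored publications and publications that arrive later, with hierarchical topics. It is not the paper's architecture and is not LSM-based. The class `ToyPubSubStore`, the encoding of a topic as a path of ontology labels, and the rule that two topics match when one is an ancestor of the other are all assumptions made for illustration only.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical encoding: a topic is the path from the ontology root,
# e.g. ("politics", "US politics").
Topic = Tuple[str, ...]


def related(a: Topic, b: Topic) -> bool:
    """Assumed match rule: one topic equals or is an ancestor of the other."""
    n = min(len(a), len(b))
    return a[:n] == b[:n]


class ToyPubSubStore:
    """In-memory stand-in for a persistent store (illustration only)."""

    def __init__(self) -> None:
        self.publications: List[Tuple[Topic, str]] = []
        self.subscriptions: Dict[str, Tuple[Topic, Callable[[str], None]]] = {}

    def subscribe(self, sub_id: str, topic: Topic,
                  notify: Callable[[str], None]) -> None:
        # Historical data: scan publications that were stored earlier.
        for pub_topic, payload in self.publications:
            if related(pub_topic, topic):
                notify(payload)
        # Future data: remember the subscription for later publications.
        self.subscriptions[sub_id] = (topic, notify)

    def unsubscribe(self, sub_id: str) -> None:
        # Subscriptions can expire at any time in a dynamic workload.
        self.subscriptions.pop(sub_id, None)

    def publish(self, topic: Topic, payload: str) -> None:
        # Store the publication so that later subscriptions can see it,
        # then notify every active subscription it matches.
        self.publications.append((topic, payload))
        for sub_topic, notify in list(self.subscriptions.values()):
            if related(topic, sub_topic):
                notify(payload)


store = ToyPubSubStore()
store.publish(("politics",), "election recap")
received: List[str] = []
store.subscribe("s1", ("politics", "US politics"), received.append)
store.publish(("sports",), "match report")
store.publish(("politics", "US politics"), "senate vote")
store.unsubscribe("s1")
store.publish(("politics",), "late item")
print(received)  # ['election recap', 'senate vote']
```

The linear scans in `subscribe` and `publish` are the part that a real system would replace with indexed, disk-backed lookups; the sketch only shows the matching semantics assumed here.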

Notes

Which is C1 in Fig. 2 as LevelDB does not number the memory component.

Acknowledgements

This project is partially supported by NSF Grants IIS-1447826 and IIS-1619463.

Author information

Authors and Affiliations

Department of Computer Science & Engineering, University of California Riverside, Riverside, CA, USA
Mohiuddin Abdul Qader & Vagelis Hristidis

Corresponding author

Correspondence to Mohiuddin Abdul Qader.

About this article

Cite this article

Qader, M.A., Hristidis, V. High-throughput publish/subscribe on top of LSM-based storage. Distrib Parallel Databases 37, 101–132 (2019). https://doi.org/10.1007/s10619-018-7236-2

Published: 16 August 2018
Issue Date: 15 March 2019
DOI: https://doi.org/10.1007/s10619-018-7236-2
