High-throughput publish/subscribe on top of LSM-based storage

Distributed and Parallel Databases

Abstract

State-of-the-art publish/subscribe systems are efficient when the subscriptions are relatively static (for instance, the set of followers in Twitter) or can fit in memory. Nowadays, however, many big data and IoT-based applications follow a highly dynamic query paradigm, in which both continuous queries and data entries number in the millions and can arrive and expire rapidly. In this paper, we propose and compare several publish/subscribe storage architectures, based on the popular NoSQL log-structured merge tree (LSM) storage paradigm, to support high-throughput and highly dynamic publish/subscribe systems. Our framework naturally supports subscriptions on both historic and future streaming data, and generates instant notifications. We also extend our framework to efficiently support self-joining subscriptions, where streaming pub/sub records join with past pub/sub entries. Further, we show how hierarchical attributes, such as concept ontologies, can be efficiently supported; for example, a publication’s topic is “politics” whereas a subscription’s topic is “US politics.” We implemented and experimentally evaluated our methods on the popular LSM-based LevelDB system, using real datasets, for simple match and self-joining subscriptions on both flat and hierarchical attributes. Our results show that our approaches achieve significantly higher throughput than state-of-the-art baselines.
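
To make the two matching directions in the abstract concrete, the following is a minimal in-memory sketch in Python. It is not the paper's LSM-based design and does not use LevelDB. The class `ToyPubSub`, the function `related`, the slash-separated topic paths, and the rule that two topics match when one is an ancestor of (or equal to) the other are all assumptions made for illustration; the abstract does not state the matching rule.

```python
from dataclasses import dataclass, field


def related(topic_a: str, topic_b: str) -> bool:
    """Assumed rule: topics match if equal or one is an ancestor of the other."""
    a, b = topic_a.split("/"), topic_b.split("/")
    shorter = min(len(a), len(b))
    return a[:shorter] == b[:shorter]


@dataclass
class ToyPubSub:
    # Plain lists stand in for persistent storage (hypothetical, not an LSM tree).
    publications: list = field(default_factory=list)   # (topic, payload)
    subscriptions: list = field(default_factory=list)  # (subscriber, topic)

    def subscribe(self, subscriber: str, topic: str) -> list:
        """Store the subscription and return payloads of matching past publications."""
        self.subscriptions.append((subscriber, topic))
        return [payload for t, payload in self.publications if related(t, topic)]

    def publish(self, topic: str, payload: str) -> list:
        """Store the publication and return the subscribers to notify immediately."""
        self.publications.append((topic, payload))
        return [s for s, t in self.subscriptions if related(t, topic)]


store = ToyPubSub()
store.publish("politics", "election results")
store.publish("sports/tennis", "final score")

# A new subscription is first matched against historic data ...
history = store.subscribe("alice", "politics/us")

# ... and then against publications that arrive later.
notified = store.publish("politics/us", "senate vote")
```

Both calls scan every stored entry, which is the cost a storage-aware design would aim to avoid when subscriptions and publications number in the millions.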

Notes

  1. Which is C1 in Fig. 2 as LevelDB does not number the memory component.

Acknowledgements

This project is partially supported by NSF Grants IIS-1447826 and IIS-1619463.

Author information

Corresponding author

Correspondence to Mohiuddin Abdul Qader.

About this article

Cite this article

Qader, M.A., Hristidis, V. High-throughput publish/subscribe on top of LSM-based storage. Distrib Parallel Databases 37, 101–132 (2019). https://doi.org/10.1007/s10619-018-7236-2

  • DOI: https://doi.org/10.1007/s10619-018-7236-2
