ABSTRACT
YoctoDB is a small embedded engine for extremely fast, partitioned, immutable-after-construction databases. Several high-load services at Yandex.Classifieds implement pipelined, partitioned data reindexing. The result of the reindexing process is an immutable index that is delivered to many search machines, reopened as part of the composite index, and queried when serving user requests. Read performance, memory consumption, fast reopening, and reproducible latencies are paramount for the database engine. YoctoDB has successfully provided a solution for all of these services. We describe the role of YoctoDB in the architecture of indexing and search components, its simple data model, client API, design, implementation, and use cases. We conclude the paper with limitations of the approach and directions for future development.