Abstract
As more applications migrate to the cloud, and as "big data" reaches more production environments, the performance and simplicity of exchanging data between compute nodes and devices are becoming increasingly important. An issue central to distributed programming, yet often under-considered, is serialization, or pickling: persisting runtime objects by converting them into a binary or text representation. Pickler combinators are a popular approach from functional programming; their composability alleviates some of the tedium of writing pickling code by hand, but they do not translate well to object-oriented programming because of features such as open class hierarchies and subtyping polymorphism. Furthermore, both functional pickler combinators and popular Java-based serialization frameworks tend to be tied to a specific pickle format, leaving programmers no choice in how their data is persisted. In this paper, we present object-oriented pickler combinators and a framework for generating them at compile time, called scala/pickling, designed to be the default serialization mechanism of the Scala programming language. Static generation of OO picklers enables significant performance improvements: scala/pickling outperforms Java serialization and Kryo in most of our benchmarks. In addition to offering high performance and requiring little to no boilerplate, our framework is extensible: using the type class pattern, users can provide both (1) custom, easily interchangeable pickle formats and (2) custom picklers that override the default behavior of the pickling framework. In benchmarks, we compare scala/pickling with other popular industrial frameworks and present results on time, memory usage, and pickle size when pickling and unpickling a number of data types used in real-world, large-scale distributed applications and frameworks.
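To make the two extension points named in the abstract concrete, the following is a minimal sketch of the type class pattern applied to pickling. It is not the scala/pickling API: every name here (`Format`, `JsonLike`, `Plain`, `Pickler`, `Pickling`, `Demo`) is hypothetical, the picklers are written by hand rather than generated at compile time, and only pickling (not unpickling) is shown.

```scala
// Hypothetical names throughout; this is NOT the scala/pickling API.

// A pickle format decides how primitive values and sequences are rendered.
trait Format {
  def int(i: Int): String
  def str(s: String): String
  def seq(parts: List[String]): String
}

// A JSON-like text format (no string escaping, to keep the sketch short).
object JsonLike extends Format {
  def int(i: Int): String = i.toString
  def str(s: String): String = "\"" + s + "\""
  def seq(parts: List[String]): String = parts.mkString("[", ",", "]")
}

// A second, interchangeable format with tagged, length-prefixed fields.
object Plain extends Format {
  def int(i: Int): String = "i" + i
  def str(s: String): String = "s" + s.length + ":" + s
  def seq(parts: List[String]): String = parts.mkString("(", " ", ")")
}

// The type class: evidence that values of type T can be pickled.
trait Pickler[T] {
  def pickle(value: T, format: Format): String
}

object Pickler {
  // Default picklers for primitives, found through the companion object.
  implicit val intPickler: Pickler[Int] = new Pickler[Int] {
    def pickle(value: Int, format: Format): String = format.int(value)
  }
  implicit val stringPickler: Pickler[String] = new Pickler[String] {
    def pickle(value: String, format: Format): String = format.str(value)
  }
  // A combinator: the pickler for a pair is composed from its parts' picklers.
  implicit def pairPickler[A, B](implicit pa: Pickler[A], pb: Pickler[B]): Pickler[(A, B)] =
    new Pickler[(A, B)] {
      def pickle(value: (A, B), format: Format): String =
        format.seq(List(pa.pickle(value._1, format), pb.pickle(value._2, format)))
    }
}

object Pickling {
  // The compiler selects the pickler from the static type T.
  def pickle[T](value: T, format: Format)(implicit p: Pickler[T]): String =
    p.pickle(value, format)
}

object Demo {
  val pair = (1, "a")
  val json: String = Pickling.pickle(pair, JsonLike)  // [1,"a"]
  val plain: String = Pickling.pickle(pair, Plain)    // (i1 s1:a)

  // A custom pickler in local scope takes precedence over the default one.
  val custom: String = {
    implicit val hexInt: Pickler[Int] = new Pickler[Int] {
      def pickle(value: Int, format: Format): String =
        format.str(Integer.toHexString(value))
    }
    Pickling.pickle((255, "x"), JsonLike)             // ["ff","x"]
  }
}
```

In this sketch the format is an ordinary argument, so swapping `JsonLike` for `Plain` changes the output without touching any pickler, and the locally defined `hexInt` replaces the default `Int` pickler only within its block. The framework described in the abstract differs in that it generates the picklers themselves at compile time.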
Instant pickles: generating object-oriented pickler combinators for fast and extensible serialization
OOPSLA '13: Proceedings of the 2013 ACM SIGPLAN international conference on Object oriented programming systems languages & applications