research-article
Open Access

Unboxed Data Constructors: Or, How cpp Decides a Halting Problem

Published: 05 January 2024

Abstract

We propose a new language feature for ML-family languages: the ability to selectively unbox certain data constructors, so that their runtime representation gets compiled away to just the identity on their argument. Unboxing must be statically rejected when it could introduce confusion, that is, distinct values with the same representation.
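To make "confusion" concrete, here is a toy OCaml model of the check. It assumes OCaml's usual value representation (constant constructors are immediates numbered from 0, non-constant constructors are heap blocks tagged from 0, strings and floats are blocks with tags 252 and 253), and it writes a requested unboxing as a `[@unboxed]` annotation after a constructor, which is our assumption about the surface syntax. The `shape` type and the `no_confusion` function are hypothetical names; the paper's actual analysis, which has to expand type definitions to discover these shapes, is more involved than this sketch.

```ocaml
(* Illustrative model of runtime values, not the prototype's implementation:
   a value is either an immediate integer or a pointer to a tagged heap block. *)
type shape =
  | Imm of int     (* one particular immediate, e.g. a constant constructor *)
  | Any_imm        (* every immediate, e.g. all values of type int *)
  | Block of int   (* heap blocks carrying this tag *)

(* Can two shapes describe the same runtime value? *)
let overlap a b =
  match a, b with
  | (Any_imm, (Any_imm | Imm _)) | (Imm _, Any_imm) -> true
  | (Imm i, Imm j) -> i = j
  | (Block t, Block u) -> t = u
  | ((Imm _ | Any_imm), Block _) | (Block _, (Imm _ | Any_imm)) -> false

(* Each constructor is summarized by the shapes its values may take:
   a boxed constructor contributes its own immediate or block shape,
   an unboxed one contributes the shapes of its argument type.
   The datatype is free of confusion when no two constructors overlap. *)
let no_confusion (ctors : shape list list) =
  let disjoint c d =
    List.for_all (fun a -> List.for_all (fun b -> not (overlap a b)) d) c
  in
  let rec check = function
    | [] -> true
    | c :: rest -> List.for_all (disjoint c) rest && check rest
  in
  check ctors
```

For example, `type t = A | B of int [@unboxed]` is summarized as `[[Imm 0]; [Any_imm]]` and rejected, since `A` and `B 0` would both be the immediate 0. In contrast, `type num = Small of int [@unboxed] | Big of string`, with the boxed `Big` modelled as a block with tag 0, gives `[[Any_imm]; [Block 0]]` and is accepted.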

We discuss the use case of big numbers, where unboxing lets us write code that is both efficient and safe, replacing either a safe but slow version or a fast but unsafe version. We explain the static analysis necessary to reject incorrect unboxing requests. We present our prototype implementation of this feature for the OCaml programming language, and discuss several design choices and its interaction with advanced features such as Guarded Algebraic Datatypes.
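The big-number use case can be sketched as follows, restricted to natural numbers to keep it short. The runnable code is the safe but slow version: in stock OCaml every `Small n` is a separately boxed block. Under the proposed feature, marking `Small` as unboxed (shown only in a comment, since a stock compiler does not implement it) would represent `Small n` as the immediate `n`, which cannot be confused with the heap block of a `Big`. The `Big of string` payload is a hypothetical stand-in for a real arbitrary-precision type such as a GMP integer, and `add_digits` is a naive helper written only for this sketch.

```ocaml
(* Natural numbers with a fast path for machine integers.
   Invariant (maintained by [add]): [Small n] has n >= 0, and [Big s]
   holds the decimal digits of a value strictly greater than [max_int]. *)
type nat =
  | Small of int   (* proposed: Small of int [@unboxed], i.e. no allocation *)
  | Big of string  (* hypothetical stand-in for a real bignum payload *)

(* Schoolbook addition of two non-negative decimal strings. *)
let add_digits a b =
  let la = String.length a and lb = String.length b in
  let n = max la lb in
  let out = Bytes.make (n + 1) '0' in
  let carry = ref 0 in
  for i = 0 to n - 1 do
    (* i-th digit from the right, or 0 past the end of the string *)
    let digit s len = if i < len then Char.code s.[len - 1 - i] - Char.code '0' else 0 in
    let sum = digit a la + digit b lb + !carry in
    Bytes.set out (n - i) (Char.chr (Char.code '0' + sum mod 10));
    carry := sum / 10
  done;
  Bytes.set out 0 (Char.chr (Char.code '0' + !carry));
  let s = Bytes.to_string out in
  if s.[0] = '0' then String.sub s 1 n else s

let to_digits = function Small n -> string_of_int n | Big s -> s

let add x y =
  match x, y with
  | Small a, Small b ->
      let s = a + b in
      (* a, b >= 0, so an overflow wraps around to a negative int *)
      if s >= 0 then Small s
      else Big (add_digits (string_of_int a) (string_of_int b))
  | _ -> Big (add_digits (to_digits x) (to_digits y))
```

Reserving `Big` for values above `max_int` gives each number built with `add` a single representation, which is the same no-confusion property that the static analysis has to guarantee for the unboxed layout.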

Our static analysis requires expanding type definitions in type expressions, which is not necessarily normalizing in the presence of recursive type definitions. In other words, we must decide normalization of terms in the first-order λ-calculus with recursion. We provide an algorithm to detect non-termination on-the-fly during reduction, with proofs of correctness and completeness. Our algorithm turns out to be closely related to the normalization strategy for macro expansion in the cpp preprocessor.
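The termination problem can be illustrated on a toy first-order language, where parameterized definitions play the role of type definitions and constructors such as `Pair` play the role of type constructors. The sketch below head-normalizes a term by unfolding definitions and detects divergence with per-application "hide sets", loosely inspired by the ones cpp uses to stop a macro from re-expanding itself: applications produced by unfolding `f` are tagged with `f`, arguments keep their own tags, and meeting an application of `f` already tagged with `f` is reported as divergence. This is our own reconstruction, not the paper's algorithm or proofs; we believe it is sound and complete for this toy language, and all names (`term`, `defn`, `app`, `instantiate`, `head_normalize`) are hypothetical.

```ocaml
module S = Set.Make (String)

(* Toy first-order terms. Variables occur only in definition bodies;
   each function application carries a hide set of function names. *)
type term =
  | Var of string
  | Con of string * term list          (* head constructor, e.g. Pair or C *)
  | App of string * term list * S.t    (* application with its hide set *)

(* A definition f(x1, ..., xn) = body; hide sets inside [body] are ignored. *)
type defn = { params : string list; body : term }

type outcome = Head of string | Diverges of string

(* Build an application with an empty hide set, for writing terms. *)
let app f ts = App (f, ts, S.empty)

(* Unfold one application of [f]: substitute [args] for the parameters.
   Applications coming from the body get [hide] extended with [f];
   the arguments are inserted unchanged and keep their own hide sets. *)
let instantiate f hide { params; body } args =
  let env = List.combine params args in
  let hide' = S.add f hide in
  let rec go = function
    | Var x -> List.assoc x env
    | Con (c, ts) -> Con (c, List.map go ts)
    | App (g, ts, _) -> App (g, List.map go ts, hide')
  in
  go body

(* Head-normalize [t], reporting divergence as soon as an application
   of [f] reaches the head with [f] already in its hide set. *)
let rec head_normalize defs t =
  match t with
  | Con (c, _) -> Head c
  | Var x -> invalid_arg ("head_normalize: free variable " ^ x)
  | App (f, args, hide) ->
      if S.mem f hide then Diverges f
      else head_normalize defs (instantiate f hide (List.assoc f defs) args)
```

With `id(x) = x`, the term `id(id(C))` normalizes to `C` even though `id` is unfolded twice, because the inner `id` came from an argument; a single global "already unfolded" set would wrongly reject it. By contrast, `grow(x) = grow(Pair(x, x))` is reported as diverging on its first re-unfolding, even though the terms it would produce never repeat.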
