research-article

Open Access

Unboxed Data Constructors: Or, How cpp Decides a Halting Problem

Authors:
Nicolas Chataing, ENS Paris, Paris, France (ORCID: 0009-0006-4174-2088)
Stephen Dolan, Jane Street, London, UK (ORCID: 0000-0002-4609-9101)
Gabriel Scherer, Inria, Paris, France (ORCID: 0000-0003-1758-3938)
Jeremy Yallop, University of Cambridge, Cambridge, UK (ORCID: 0009-0002-1650-6340)
Proceedings of the ACM on Programming Languages, Volume 8, Issue POPL, Article No. 51, pp. 1509–1539. https://doi.org/10.1145/3632893

Published: 05 January 2024

Abstract

We propose a new language feature for ML-family languages: the ability to selectively unbox certain data constructors, so that their runtime representation is compiled away to the identity on their argument. Unboxing must be statically rejected when it could introduce confusion, that is, distinct values with the same representation.

We discuss the use case of big numbers, where unboxing makes it possible to write code that is both efficient and safe, replacing either a safe but slow version or a fast but unsafe version. We explain the static analysis necessary to reject incorrect unboxing requests. We present our prototype implementation of this feature for the OCaml programming language, and discuss several design choices and the interaction with advanced features such as Guarded Algebraic Datatypes.
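The notion of confusion can be illustrated with a small model. The sketch below is not the analysis from the article: it assumes a deliberately coarse, hypothetical abstraction in which every constructor is mapped to a set of runtime "shapes" (an immediate value or a kind of heap block), and an unboxing request is rejected when two constructors of the same type may share a shape. The names `PRIMITIVE_SHAPES`, `constructor_shapes`, `overlaps` and `confusions` are invented for this illustration.

```python
# Hypothetical coarse model of runtime shapes (an assumption of this sketch).
# "imm:*" stands for any immediate value, "imm:k" for the k-th constant
# constructor, and "block:..." for a kind of heap-allocated block.
PRIMITIVE_SHAPES = {
    "int": {"imm:*"},
    "string": {"block:string"},
    "bignum": {"block:custom"},
}


def constructor_shapes(constructors):
    """Map each constructor name to the shapes its values may take.

    `constructors` is a list of (name, argument_type_or_None, unboxed) triples.
    """
    shapes = {}
    next_constant, next_tag = 0, 0
    for name, arg, unboxed in constructors:
        if arg is None:
            # Constant constructor: a distinct immediate value.
            shapes[name] = {f"imm:{next_constant}"}
            next_constant += 1
        elif unboxed:
            # Unboxed constructor: represented exactly like its argument.
            shapes[name] = set(PRIMITIVE_SHAPES[arg])
        else:
            # Boxed constructor: a block with its own tag.
            shapes[name] = {f"block:tag{next_tag}"}
            next_tag += 1
    return shapes


def overlaps(a, b):
    """Two shapes overlap if they are equal or one is the 'any immediate' shape."""
    if a == b:
        return True
    return "imm:*" in (a, b) and a.startswith("imm:") and b.startswith("imm:")


def confusions(constructors):
    """Return the pairs of constructors whose values could be confused."""
    shapes = constructor_shapes(constructors)
    names = list(shapes)
    found = []
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            if any(overlaps(a, b) for a in shapes[first] for b in shapes[second]):
                found.append((first, second))
    return found


# A big-number-like type: small integers and custom blocks never collide.
big_number = [("Small", "int", True), ("Big", "bignum", True)]
# An option-like type: the constant constructor collides with an unboxed int.
option_like = [("Nothing", None, False), ("Just", "int", True)]
```

In this model `confusions(big_number)` is empty, so both constructors could be unboxed, while `confusions(option_like)` reports the pair `("Nothing", "Just")` and the request would be rejected.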

Our static analysis requires expanding type definitions in type expressions, which is not necessarily normalizing in the presence of recursive type definitions. In other words, we must decide normalization of terms in the first-order λ-calculus with recursion. We provide an algorithm to detect non-termination on-the-fly during reduction, with proofs of correctness and completeness. Our algorithm turns out to be closely related to the normalization strategy for macro expansion in the cpp preprocessor.
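The idea of detecting non-termination during expansion can be sketched on a toy language of parameterized definitions. The code below is an illustration in the spirit of the cpp rule that a macro is not expanded again inside its own expansion; it is not the algorithm of the article, and the term encoding, the `Loop` exception and the `normalize` function are assumptions made for this example. Each piece of a definition body remembers the set of names whose expansion produced it, while arguments keep the set they had at the call site; meeting a name that is already in that set is reported as a loop.

```python
class Loop(Exception):
    """Raised when a definition is reached again inside its own expansion."""


def normalize(term, defs, env=None, blocked=frozenset()):
    """Fully expand `term`, raising Loop if expansion would not terminate.

    Terms are tuples (a hypothetical encoding for this sketch):
      ("var", x)           a parameter of the enclosing definition
      ("con", c, [args])   a primitive type constructor, never expanded
      ("call", f, [args])  a use of the definition f
    `defs` maps a name to (parameter_names, body). The top-level term must
    not contain free variables.
    """
    env = env or {}
    kind = term[0]
    if kind == "var":
        # An argument is expanded with the environment and blocked set of
        # its call site, not those of the body it was substituted into.
        arg_term, arg_env, arg_blocked = env[term[1]]
        return normalize(arg_term, defs, arg_env, arg_blocked)
    if kind == "con":
        return ("con", term[1], [normalize(a, defs, env, blocked) for a in term[2]])
    # kind == "call"
    name, args = term[1], term[2]
    if name in blocked:
        raise Loop(name)
    params, body = defs[name]
    new_env = {p: (a, env, blocked) for p, a in zip(params, args)}
    # Everything coming from the body of `name` is marked as produced by it.
    return normalize(body, defs, new_env, blocked | {name})


INT = ("con", "int", [])
defs = {
    "id": (["a"], ("var", "a")),
    "pair": (["a"], ("con", "*", [("var", "a"), ("var", "a")])),
    "const": (["a"], INT),
    "loop": (["a"], ("call", "loop", [("var", "a")])),
    "ping": ([], ("call", "pong", [])),
    "pong": ([], ("con", "list", [("call", "ping", [])])),
}


def terminates(term):
    """Convenience wrapper: True if expansion finishes, False on a loop."""
    try:
        normalize(term, defs)
        return True
    except Loop:
        return False
```

Note that `id (id int)` expands without a false alarm, because the inner `id` comes from an argument rather than from the body of the outer one, and that `const (loop int)` also terminates, because the looping argument is discarded before it is ever expanded.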

Index Terms

1. Software and its engineering
  1. Software notations and tools
    1. Compilers
      1. Source code generation
    2. General programming languages
      1. Language features
        Data types and structures
      2. Language types
        Functional languages
2. Theory of computation
  1. Semantics and reasoning
    1. Program constructs
      1. Type structures

Published in
Proceedings of the ACM on Programming Languages, Volume 8, Issue POPL
January 2024, 2820 pages
EISSN: 2475-1421
DOI: 10.1145/3554315
Editor: Michael Hicks, Amazon, USA
Copyright © 2024 Owner/Author
This work is licensed under a Creative Commons Attribution 4.0 International License.
Publisher: Association for Computing Machinery, New York, NY, United States
Author Tags
boxing
data representation
recursive definitions
sum types
tagging
termination