DOI: 10.1145/3597465.3605220
research-article

Overlay Spreadsheets

Published: 21 July 2023

ABSTRACT

Efforts to scale spreadsheets either follow a 'virtual' strategy that layers a spreadsheet interface on top of an existing database engine or a 'materialized' strategy based on re-engineering a spreadsheet engine. Because databases are not optimized for spreadsheet access patterns, the materialized approach has better performance. However, the virtual approach offers several advantages that cannot be easily replicated in the materialized approach, including the ability to re-apply user interactions to an updated input dataset. We propose the overlay update model, a hybrid approach that overlays user updates on an existing dataset (as in the virtual approach) and indexes user updates (as in the materialized approach). A key feature of our approach is storing updates generated by bulk operations (e.g., copy/paste) as compact "patterns" that can be leveraged to reduce execution costs. We implement an overlay spreadsheet over Apache Spark and demonstrate that, compared to DataSpread (a materialized spreadsheet), it can significantly reduce execution costs.
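
To make the overlay idea concrete, here is a minimal Python sketch of how user updates might sit on top of an untouched base dataset. It is an illustration under stated assumptions, not the paper's Spark-based implementation: the names `OverlaySheet`, `Pattern`, `set_cell`, `paste_formula`, and `rebase` are hypothetical, single-cell edits are indexed in a plain dictionary, and patterns are found with a linear scan rather than a real index.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple, Union

Value = Union[int, float, str]


@dataclass(frozen=True)
class Pattern:
    """One bulk update (hypothetical): a formula applied to rows first_row..last_row of a column."""
    column: str
    first_row: int
    last_row: int  # inclusive
    formula: Callable[["OverlaySheet", int], Value]  # evaluated for the requested row

    def covers(self, column: str, row: int) -> bool:
        return column == self.column and self.first_row <= row <= self.last_row


class OverlaySheet:
    """User updates layered over a base dataset that is never modified."""

    def __init__(self, base: List[Dict[str, Value]]):
        self.base = base
        self.cells: Dict[Tuple[str, int], Value] = {}  # single-cell edits, indexed by (column, row)
        self.patterns: List[Pattern] = []              # bulk edits, oldest first

    def set_cell(self, column: str, row: int, value: Value) -> None:
        self.cells[(column, row)] = value

    def paste_formula(self, column: str, first_row: int, last_row: int,
                      formula: Callable[["OverlaySheet", int], Value]) -> None:
        pattern = Pattern(column, first_row, last_row, formula)
        # The paste overwrites earlier single-cell edits inside its range.
        for key in [k for k in self.cells if pattern.covers(*k)]:
            del self.cells[key]
        # One record is stored no matter how many rows the paste spans.
        self.patterns.append(pattern)

    def get(self, column: str, row: int) -> Value:
        if (column, row) in self.cells:          # most specific update wins
            return self.cells[(column, row)]
        for pattern in reversed(self.patterns):  # newest covering pattern next
            if pattern.covers(column, row):
                return pattern.formula(self, row)
        return self.base[row][column]            # otherwise fall through to the base data

    def rebase(self, new_base: List[Dict[str, Value]]) -> "OverlaySheet":
        """Re-apply the same user updates to an updated input dataset."""
        other = OverlaySheet(new_base)
        other.cells = dict(self.cells)
        other.patterns = list(self.patterns)
        return other


base = [{"price": 10, "qty": 2}, {"price": 5, "qty": 4}, {"price": 7, "qty": 1}]
sheet = OverlaySheet(base)
# Simulates pasting "= price * qty" down a new "total" column.
sheet.paste_formula("total", 0, 2, lambda s, r: s.get("price", r) * s.get("qty", r))
sheet.set_cell("qty", 1, 10)  # a single-cell edit
totals = [sheet.get("total", r) for r in range(3)]  # [20, 50, 7]

new_base = [{"price": 1, "qty": 1}, {"price": 2, "qty": 2}, {"price": 3, "qty": 3}]
rebased = sheet.rebase(new_base)
rebased_totals = [rebased.get("total", r) for r in range(3)]  # [1, 20, 9]
```

In this sketch the pasted formula costs one stored record regardless of the range size, and the base rows are read only on demand, which is the property that lets the same edits be replayed over `new_base` without copying data.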


  • Published in

    HILDA '23: Proceedings of the Workshop on Human-In-the-Loop Data Analytics
    June 2023
    76 pages
    ISBN: 9798400702167
    DOI: 10.1145/3597465

    Copyright © 2023 ACM


    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Qualifiers

    • research-article
    • Research
    • Refereed limited

    Acceptance Rates

    Overall Acceptance Rate: 28 of 56 submissions, 50%