DOI: 10.1145/2034654.2034670
research-article

Disco: a computing platform for large-scale data analytics

Published: 23 September 2011

ABSTRACT

We describe the design and implementation of Disco, a distributed computing platform for MapReduce-style computations on large-scale data. Disco is designed to run on clusters of commodity server machines, and provides both a fault-tolerant scheduling and execution layer and a distributed, replicated storage layer. Disco is implemented in Erlang and Python: Erlang is used for the core aspects of cluster monitoring, job management, task scheduling, and the distributed filesystem, while Python is used to implement the standard Disco library.

Disco has been used in production for several years at Nokia, to analyze tens of terabytes of data daily on a cluster of over 100 nodes. With a small but very functional codebase, it provides a free, proven, and effective component of a full-fledged data analytics stack.
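
To make the MapReduce style of computation mentioned in the abstract concrete, here is a minimal single-process sketch of a word count in plain Python. It does not use the Disco library or its API. The function names (`map_words`, `shuffle`, `reduce_counts`, `word_count`) are hypothetical and chosen only for illustration, and the in-memory grouping step stands in for work that a platform such as Disco would distribute across cluster nodes.

```python
from collections import defaultdict
from itertools import chain


def map_words(line):
    """Map phase: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group values by key, standing in for the framework's shuffle step."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(word, counts):
    """Reduce phase: collapse all counts for one word into a single total."""
    return word, sum(counts)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence on an iterable of lines."""
    mapped = chain.from_iterable(map_words(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_counts(word, counts) for word, counts in grouped.items())


if __name__ == "__main__":
    # Illustrative input only; a real job would read from distributed storage.
    print(word_count(["disco runs jobs", "jobs run on disco"]))
```

In a distributed setting the map calls would run in parallel on separate input partitions and the grouped keys would be split across reduce tasks; this sketch keeps everything in one process so that only the data flow is visible.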

          Published in

            Erlang '11: Proceedings of the 10th ACM SIGPLAN workshop on Erlang
            September 2011
            108 pages
            ISBN: 9781450308595
            DOI: 10.1145/2034654

            Copyright © 2011 ACM

            Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.

            Publisher

            Association for Computing Machinery

            New York, NY, United States

            Acceptance Rates

            Erlang '11 paper acceptance rate: 10 of 14 submissions (71%). Overall acceptance rate: 51 of 68 submissions (75%).