ABSTRACT
MapReduce is a programming model for the development of Web-scale programs. It is based on concepts from functional programming, namely higher-order functions, which can be strongly typed using parametric polymorphism. Yet in practice this connection is tenuous. For example, in Hadoop, the connection between the two phases of a MapReduce computation is unsafe: there is no static type check of the generic type parameters involved. We provide a static check for Hadoop programs without asking the user to write any more code. To this end, we use strongly typed higher-order functions that are checked by the standard Java 5 type checker together with the Hadoop program. We also automatically generate the code needed to execute this program with a standard Hadoop implementation.
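The idea of tying the two phases together through parametric polymorphism can be illustrated with a minimal sketch. This is not the authors' implementation and does not use the Hadoop API. The names `TypedMapReduce`, `Pair`, `mapReduce`, and `wordCount` are hypothetical, and the sketch assumes Java 8 lambdas for brevity, whereas the paper targets Java 5. The point is only that when the mapper's output types and the reducer's input types share the same type variables (`K2`, `V2`), the ordinary Java type checker rejects a mismatched pair at compile time.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;

public class TypedMapReduce {

    /** Immutable key/value pair emitted by a mapper (hypothetical helper). */
    static final class Pair<K, V> {
        final K key;
        final V value;
        Pair(K key, V value) { this.key = key; this.value = value; }
    }

    /**
     * Hypothetical strongly typed driver. K2 and V2 appear both in the
     * mapper's result type and in the reducer's argument types, so the
     * compiler enforces that the two phases agree.
     */
    static <K1, V1, K2, V2, V3> Map<K2, V3> mapReduce(
            BiFunction<K1, V1, List<Pair<K2, V2>>> mapper,
            BiFunction<K2, List<V2>, V3> reducer,
            Map<K1, V1> input) {
        // Map phase followed by grouping of intermediate values by key.
        Map<K2, List<V2>> groups = new LinkedHashMap<>();
        for (Map.Entry<K1, V1> e : input.entrySet()) {
            for (Pair<K2, V2> p : mapper.apply(e.getKey(), e.getValue())) {
                groups.computeIfAbsent(p.key, k -> new ArrayList<>()).add(p.value);
            }
        }
        // Reduce phase: one call per intermediate key.
        Map<K2, V3> result = new LinkedHashMap<>();
        for (Map.Entry<K2, List<V2>> g : groups.entrySet()) {
            result.put(g.getKey(), reducer.apply(g.getKey(), g.getValue()));
        }
        return result;
    }

    /** Word count expressed with the typed driver. */
    static Map<String, Integer> wordCount(Map<Integer, String> lines) {
        BiFunction<Integer, String, List<Pair<String, Integer>>> mapper =
            (lineNo, line) -> {
                List<Pair<String, Integer>> out = new ArrayList<>();
                for (String w : line.split("\\s+")) {
                    if (!w.isEmpty()) out.add(new Pair<>(w, 1));
                }
                return out;
            };
        BiFunction<String, List<Integer>, Integer> reducer =
            (word, counts) -> {
                int sum = 0;
                for (int c : counts) sum += c;
                return sum;
            };
        // Declaring the reducer over List<String> instead of List<Integer>
        // would make this call fail to compile: V2 cannot be both types.
        return mapReduce(mapper, reducer, lines);
    }

    public static void main(String[] args) {
        Map<Integer, String> lines = new LinkedHashMap<>();
        lines.put(0, "to be or not");
        lines.put(1, "to be");
        System.out.println(wordCount(lines));
    }
}
```

In this sketch the check comes for free from generic method type inference. The abstract's approach additionally generates the code needed to run the checked program on a standard Hadoop implementation, which the sketch does not attempt.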
Index Terms
- Static type checking of Hadoop MapReduce programs