Fault Tolerance in MapReduce: A Survey

Memishi, Bunjamin; Ibrahim, Shadi; Pérez, María S.; Antoniu, Gabriel

doi:10.1007/978-3-319-44881-7_11


Part of the book series: Computer Communications and Networks (CCN)


Abstract

MapReduce-based systems have emerged as a prominent framework for large-scale data analysis, with fault tolerance as one of their key features. MapReduce has introduced simple yet efficient mechanisms to handle different kinds of failures, including crashes, omissions, and arbitrary failures. This contribution discusses in detail the types of failures in MapReduce systems and surveys the mechanisms used in the framework for detecting, handling, and recovering from these failures. It also surveys the state-of-the-art optimization mechanisms for improving fault tolerance in MapReduce and, in particular, in its open-source implementation, Hadoop. Finally, it identifies the remaining challenges and open issues in building efficient fault tolerance mechanisms for MapReduce.

Notes

1. This is not entirely true in the case of speculative execution, which has been shown to exhaust a considerable amount of resources when executed in heterogeneous environments [39, 72] or when the system is experiencing failures [22].
2. Jeff Dean, one of the leading engineers at Google, said: (we) “lost 1600 of 1800 machines once, but finished fine”.
3. It is important to mention that, unlike [72], which considers tasks as stragglers, the original Google paper [19] defines a straggler as “a machine that takes an unusually long time to complete one of the last few map or reduce tasks in the computation”.
4. Spot instances are virtual machine resources in Amazon Web Services (AWS) for which a user defines the maximum bidding price that he/she is willing to pay. If there is no competition, prices are lower and the chance of using the instances is higher. When demand is higher, AWS has the right to stop the user's spot instances. If the spot instances are stopped by Amazon, the user does not pay; if the user instead decides to stop them before completing a full hour, the user is obliged to pay for that consumption.

References

Ananthanarayanan, G., Agarwal, S., Kandula, S., Greenberg, A., Stoica, I., Harlan, D., Harris, E.: Scarlett: coping with skewed content popularity in mapreduce clusters. In: Proceedings of the Sixth Conference on Computer Systems, ACM, New York, NY, USA, EuroSys ’11, pp. 287–300, (2011). http://doi.acm.org/10.1145/1966445.1966472
Ananthanarayanan, G., Ghodsi, A., Shenker, S., Stoica, I.: Effective straggler mitigation: Attack of the clones. In: Proceedings of the 10th USENIX Conference on Networked Systems Design and Implementation, USENIX Association, Berkeley, CA, USA, NSDI’13, pp. 185–198, (2013). http://dl.acm.org/citation.cfm?id=2482626.2482645
Ananthanarayanan, G., Hung, M.C.C., Ren, X., Stoica, I., Wierman, A., Yu, M.: GRASS: trimming stragglers in approximation analytics. In: Proceedings of the 11th USENIX Conference on Networked Systems Design and Implementation, USENIX Association, Berkeley, CA, USA, NSDI’14, pp. 289–302, (2014). http://dl.acm.org/citation.cfm?id=2616448.2616475
Ananthanarayanan, G., Kandula, S., Greenberg, A., Stoica, I., Lu, Y., Saha, B., Harris, E.: Reining in the outliers in map-reduce clusters using Mantri. In: Proceedings of the 9th USENIX conference on Operating Systems Design and Implementation, USENIX Association, Berkeley, CA, USA, OSDI’10, pp. 1–16, (2010). http://dl.acm.org/citation.cfm?id=1924943.1924962
Apache Zookeeper: (2015). http://zookeeper.apache.org/
Armbrust, M., Xin, R.S., Lian, C., Huai, Y., Liu, D., Bradley, J.K., Meng, X., Kaftan, T., Franklin, M.J., Ghodsi, A., Zaharia, M.: Spark sql: Relational data processing in spark. In: Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, ACM, New York, NY, USA, SIGMOD ’15, pp. 1383–1394 (2015). http://doi.acm.org/10.1145/2723372.2742797
Avizienis, A., Laprie, J.C., Randell, B., Landwehr, C.E.: Basic concepts and taxonomy of dependable and secure computing. IEEE Trans. Dependable Secure Comput. 1(1), 11–33 (2004)
Barborak, M., Dahbura, A., Malek, M.: The consensus problem in fault-tolerant computing. ACM Comput. Surv. 25(2), 171–220 (1993). http://doi.acm.org/10.1145/152610.152612
Borthakur, D., Gray, J., Sarma, J.S., Muthukkaruppan, K., Spiegelberg, N., Kuang, H., Ranganathan, K., Molkov, D., Menon, A., Rash, S., Schmidt, R., Aiyer, A.: Apache Hadoop goes realtime at Facebook. In: Proceedings of the 2011 ACM SIGMOD International Conference on Management of data, ACM, New York, NY, USA, SIGMOD ’11, pp. 1071–1080 (2011). http://doi.acm.org/10.1145/1989323.1989438
Bressoud, T.C., Kozuch, M.A.: Cluster fault-tolerance: An experimental evaluation of checkpointing and MapReduce through simulation. In: Proceedings of the 2009 IEEE International Conference on Cluster Computing and Workshops, IEEE, pp. 1–10 (2009). http://ieeexplore.ieee.org/lpdocs/epic03/wrapper.htm?arnumber=5289185
Cachin, C., Guerraoui, R., Rodrigues, L.: Introduction to Reliable and Secure Distributed Programming (2. ed.). Springer (2011)
Chaiken, R., Jenkins, B., Larson, P., Ramsey, B., Shakib, D., Weaver, S., Zhou, J.: SCOPE: Easy and Efficient Parallel Processing of Massive Data Sets. Proc. VLDB Endow 1(2), 1265–1276 (2008). http://dl.acm.org/citation.cfm?id=1454159.1454166
Chen, Q., Liu, C., Xiao, Z.: Improving mapreduce performance using smart speculative execution strategy. IEEE Trans. Comput. 63(4), 954–967 (2014). doi:10.1109/TC.2013.15
Chohan, N., Castillo, C., Spreitzer, M., Steinder, M., Tantawi, A., Krintz, C.: See spot run: using spot instances for MapReduce workflows. In: Proceedings of the 2nd USENIX Conference on Hot Topics in Cloud Computing, USENIX Association, Berkeley, CA, USA, HotCloud’10, pp. 7–7 (2010). http://dl.acm.org/citation.cfm?id=1863103.1863110
Clement, A., Kapritsos, M., Lee, S., Wang, Y., Alvisi, L., Dahlin, M., Riche, T.: Upright cluster services. In: Proceedings of the ACM SIGOPS 22nd Symposium on Operating Systems Principles, ACM, New York, NY, USA, SOSP ’09, pp. 277–290 (2009). http://doi.acm.org/10.1145/1629575.1629602
Condie, T., Conway, N., Alvaro, P., Hellerstein, J.M., Elmeleegy, K., Sears, R.: MapReduce online. In: Proceedings of the 7th USENIX Conference on Networked Systems Design and Implementation, USENIX Association, Berkeley, CA, USA, NSDI’10, pp. 21–21 (2010). http://dl.acm.org/citation.cfm?id=1855711.1855732
Correia, M., Costa, P., Pasin, M., Bessani, A., Ramos, F., Verissimo, P.: On the feasibility of byzantine fault-tolerant mapreduce in clouds-of-clouds. In: 2012 IEEE 31st Symposium on Reliable Distributed Systems (SRDS), pp. 448–453 (2012). doi:10.1109/SRDS.2012.46
Costa, P., Pasin, M., Bessani, A., Correia, M.: Byzantine Fault-Tolerant MapReduce: Faults are Not Just Crashes. In: Proceedings of the 3rd IEEE Second International Conference on Cloud Computing Technology and Science, IEEE Computer Society, Washington, DC, USA, CLOUDCOM ’11, pp. 17–24 (2010). http://dx.doi.org/10.1109/CloudCom.2010.25
Dean, J., Ghemawat, S.: MapReduce: simplified data processing on large clusters. In: Proceedings of the 6th Conference on Symposium on Operating Systems Design & Implementation, USENIX Association, OSDI’04 (2004)
Dean, J.: Building software systems at google and lessons learned. Stanford EE Computer Systems Colloquium (2010). http://www.stanford.edu/class/ee380/Abstracts/101110-slides.pdf
Dinu, F., Ng, T.S.E.: Hadoop’s Overload Tolerant Design Exacerbates Failure Detection and Recovery. In: Proceedings of the 9th USENIX Conference on Operating Systems Design and Implementation, ACM, New York, NY, USA, NetDB’11, pp. 1–7 (2011)
Dinu, F., Ng, T.E.: Understanding the effects and implications of compute node related failures in Hadoop. In: HPDC ’12: Proceedings of the 21st International Symposium on High-Performance Parallel and Distributed Computing, ACM, New York, NY, USA, pp. 187–198 (2012). http://doi.acm.org/10.1145/2287076.2287108
Facebook, Inc.: (2015). https://www.facebook.com/
Facebook, I.: Under the Hood: Scheduling MapReduce jobs more efficiently with Corona (2012). http://www.facebook.com/notes/facebook-engineering/under-the-hood-scheduling-mapreduce-jobs-more-efficiently-with-corona/10151142560538920
Fedak, G., He, H., Cappello, F.: BitDew: A data management and distribution service with multi-protocol file transfer and metadata abstraction. J. Netw. Comput. Appl. 32(5), 961–975 (2009)
Gonzalez, J.E., Xin, R.S., Dave, A., Crankshaw, D., Franklin, M.J., Stoica, I.: Graphx: Graph processing in a distributed dataflow framework. In: Proceedings of the 11th USENIX Conference on Operating Systems Design and Implementation, USENIX Association, Berkeley, CA, USA, OSDI’14, pp. 599–613 (2014). http://dl.acm.org/citation.cfm?id=2685048.2685096
Hadoop Releases: (2015). http://hadoop.apache.org/releases.html
Hindman, B., Konwinski, A., Zaharia, M., Ghodsi, A., Joseph, A.D., Katz, R., Shenker, S., Stoica, I.: Mesos: A Platform for Fine-grained Resource Sharing in the Data Center. In: Proceedings of the 8th USENIX Conference on Networked Systems Design and Implementation, USENIX Association, Berkeley, CA, USA, NSDI’11, pp. 22–22 (2011). http://dl.acm.org/citation.cfm?id=1972457.1972488
How-to: Set Up a Hadoop Cluster with Network Encryption: (2013). http://blog.cloudera.com/blog/2013/03/how-to-set-up-a-hadoop-cluster-with-network-encryption/
Ibrahim, S., Phuong, T.A., Antoniu, G.: An Eye on the Elephant in the Wild: A Performance Evaluation of Hadoop’s Schedulers Under Failures. In: Workshop on Adaptive Resource Management and Scheduling for Cloud Computing (ARMS-CC-2015), held in conjunction with PODC’15 (2015)
Introduction to Hadoop Security: (2013). http://www.cloudera.com/content/cloudera/en/home.html
Isard, M., Budiu, M., Yu, Y., Birrell, A., Fetterly, D.: Dryad: distributed data-parallel programs from sequential building blocks. In: Proceedings of the 2nd ACM SIGOPS/EuroSys 2007, ACM, New York, NY, USA, EuroSys ’07, pp. 59–72 (2007). http://doi.acm.org/10.1145/1272996.1273005
Jin, H., Ibrahim, S., Qi, L., Cao, H., Wu, S., Shi, X.: The MapReduce programming model and implementations. Cloud Computing: Principles and Paradigms pp. 373–390. doi:10.1002/9780470940105.ch14
Jin, H., Qiao, K., Sun, X.H., Li, Y.l.: Performance under Failures of MapReduce Applications. In: Proceedings of the 2011 11th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, IEEE Computer Society, Washington, DC, USA, CCGRID ’11, pp. 608–609 (2011). http://dx.doi.org/10.1109/CCGrid.2011.84
Jin, H., Sun, X.H.: Performance comparison under failures of MPI and MapReduce: An Analytical Approach. Future Gener. Comput. Syst. 29(7), 1808–1815 (2013). http://dx.doi.org/10.1016/j.future.2013.01.013
Kerberos: The Network Authentication Protocol: (2015). http://web.mit.edu/kerberos/
Ko, S.Y., Hoque, I., Cho, B., Gupta, I.: Making cloud intermediate data fault-tolerant. In: Proceedings of the 1st ACM Symposium on Cloud Computing, ACM, New York, NY, USA, SoCC ’10, pp. 181–192 (2010). http://doi.acm.org/10.1145/1807128.1807160
Ko, S.Y., Hoque, I., Cho, B., Gupta, I.: On availability of intermediate data in cloud computations. In: Proceedings of the 12th conference on Hot topics in operating systems, USENIX Association, Berkeley, CA, USA, HotOS’09, pp. 6–6 (2009). http://dl.acm.org/citation.cfm?id=1855568.1855574
Lin, H., Ma, X., Archuleta, J., Feng, W.c., Gardner, M., Zhang, Z.: MOON: MapReduce On Opportunistic eNvironments. In: Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing, ACM, New York, NY, USA, HPDC ’10, pp. 95–106 (2010). http://doi.acm.org/10.1145/1851476.1851489
Lin, J., Dyer, C.: Data-Intensive Text Processing with MapReduce. Tech. rep., University of Maryland, College Park (2010)
Liu, H., Orban, D.: Cloud MapReduce: A MapReduce implementation on top of a cloud operating system. In: 2011 11th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid), pp. 464–474 (2011). doi:10.1109/CCGrid.2011.25
Liu, H.: Cutting MapReduce Cost with Spot Market. In: Proceedings of the 3rd USENIX Conference on Hot topics in Cloud Computing, USENIX Association, Berkeley, CA, USA, HotCloud’11, pp. 5–5 (2011). https://www.usenix.org/conference/hotcloud11/cutting-mapreduce-cost-spot-market
Memishi, B., Ibrahim, S., Pérez, M.S., Antoniu, G.: On the Dynamic Shifting of the MapReduce Timeout. In: Kannan, R., Rasool, R.U., Jin, H., Balasundaram, S. (eds) Managing and Processing Big Data in Cloud Computing, IGI Global, Hershey, Pennsylvania (USA), pp. 1–22 (2016). doi:10.4018/978-1-4666-9767-6
Memishi, B., Pérez, M.S., Antoniu, G.: Diarchy: An Optimized Management Approach for MapReduce Masters. Procedia Comput. Sci. 51, 9–18 (2015). http://www.sciencedirect.com/science/article/pii/S1877050915009874. International Conference On Computational Science, ICCS Computational Science at the Gates of Nature
Microsoft, Inc.: (2015). http://www.microsoft.com/
Mone, G.: Beyond Hadoop. Commun. ACM 56(1), 22–24 (2013). http://doi.acm.org/10.1145/2398356.2398364
Okorafor, E., Patrick, M.K.: Availability of Jobtracker machine in Hadoop/MapReduce Zookeeper coordinated clusters. Adv. Comput.: An Int. J. 3(3), 19–30 (2012). http://www.chinacloud.cn/upload/2012-07/12072600543782.pdf
Pan, X., Tan, J., Kavulya, S., Gandhi, R., Narasimhan, P.: Ganesha: blackBox diagnosis of MapReduce systems. SIGMETRICS Perform. Eval. Rev. 37(3), 8–13 (2010). http://doi.acm.org/10.1145/1710115.1710118
Phan, T.D., Ibrahim, S., Antoniu, G., Bougé, L.: On Understanding the energy impact of speculative execution in Hadoop. In: IEEE International Conference on Green Computing and Communications (GreenCom 2015), Sydney, Australia (2015). https://hal.inria.fr/hal-01238055
RedHat: A guide for developers using the JBoss Enterprise SOA Platform (2008). http://www.redhat.com/docs/en-US/JBoss_SOA_Platform/4.3.GA/html/Programmers_Guide/index.html, programmersGuide
Roy, I., Setty, S.T.V., Kilzer, A., Shmatikov, V., Witchel, E.: Airavat: security and privacy for MapReduce. In: Proceedings of the 7th USENIX Conference on Networked Systems Design and Implementation, USENIX Association, Berkeley, CA, USA, NSDI’10, pp. 20–20 (2010). http://dl.acm.org/citation.cfm?id=1855711.1855731
Shih, J.: Hadoop security overview—from security infrastructure deployment to high-level services. Hadoop & BigData Technology Conference (2012). www.hbtc2012.hadooper.cn/subject/keynotep8shihongliang.pdf
Sorting 1PB with MapReduce: (2013). http://googleblog.blogspot.com/2008/11/sorting-1pb-with-mapreduce.html
Stonebraker, M., Abadi, D., DeWitt, D.J., Madden, S., Paulson, E., Pavlo, A., Rasin, A.: MapReduce and parallel DBMSs: friends or foes? Commun. ACM 53(1), 64–71 (2010). http://doi.acm.org/10.1145/1629175.1629197
Tang, B., Moca, M., Chevalier, S., He, H., Fedak, G.: Towards MapReduce for Desktop Grid Computing. In: Proceedings of the 2010 International Conference on P2P, Parallel, Grid, Cloud and Internet Computing, IEEE Computer Society, Washington, DC, USA, 3PGCIC ’10, pp. 193–200 (2010). http://dx.doi.org/10.1109/3PGCIC.2010.33
The Apache Hadoop Project: (2015). http://hadoop.apache.org/
Vavilapalli, V.K., Murthy, A.C., Douglas, C., Agarwal, S., Konar, M., Evans, R., Graves, T., Lowe, J., Shah, H., Seth, S., Saha, B., Curino, C., O’Malley, O., Radia, S., Reed, B., Baldeschwieler, E.: Apache Hadoop YARN: Yet Another Resource Negotiator. In: Proceedings of the 4th Annual Symposium on Cloud Computing, ACM, New York, NY, USA, SoCC ’13, p. 5:1–5:16 (2013). http://doi.acm.org/10.1145/2523616.2523633
Wang, G., Butt, A.R., Pandey, P., Gupta, K.: A simulation approach to evaluating design decisions in MapReduce setups. In: 17th Annual Meeting of the IEEE/ACM International Symposium on Modelling, Analysis and Simulation of Computer and Telecommunication Systems, IEEE, MASCOTS 2009, pp. 1–11
Wang, F., Qiu, J., Yang, J., Dong, B., Li, X., Li, Y.: Hadoop high availability through metadata replication. In: Proceedings of the First International Workshop on Cloud Data Management, ACM, New York, NY, USA, CloudDB ’09, pp. 37–44 (2009). http://doi.acm.org/10.1145/1651263.1651271
Warneke, D., Kao, O.: Nephele: Efficient parallel data processing in the cloud. In: Proceedings of the 2nd Workshop on Many-Task Computing on Grids and Supercomputers, ACM, New York, NY, USA, MTAGS ’09, pp. 8:1–8:10 (2009). http://doi.acm.org/10.1145/1646468.1646476
White, T.: Hadoop—The Definitive Guide: Storage and Analysis at Internet Scale (3. ed., revised and updated). O’Reilly (2012)
Xiao, Z., Xiao, Y.: Achieving accountable MapReduce in cloud computing. Future Gener. Comput. Syst. 30, 1–13 (2014). http://dx.doi.org/10.1016/j.future.2013.07.001
Xu, H., Lau, W.C.: Optimization for speculative execution in a MapReduce-like cluster. In: 2015 IEEE Conference on Computer Communications, INFOCOM 2015, Kowloon, Hong Kong, April 26–May 1, 2015, pp. 1071–1079. http://dx.doi.org/10.1109/INFOCOM.2015.7218480
Xu, H., Lau, W.C.: Speculative execution for a single job in a mapreduce-like system. In: 2014 IEEE 7th International Conference on Cloud Computing (CLOUD), pp. 586–593 (2014). doi:10.1109/CLOUD.2014.84
Yahoo! Inc: (2015). http://www.yahoo.com/
Yildiz, O., Ibrahim, S., Phuong, T.A., Antoniu, G.: Chronos: Failure-aware scheduling in shared Hadoop clusters. In: IEEE International Conference on Big Data (BigData 2015), pp 313–318 (2015). doi:10.1109/BigData.2015.7363770
Yu, Y., Isard, M., Fetterly, D., Budiu, M., Erlingsson, U., Gunda, P.K., Currey, J.: DryadLINQ: a system for general-purpose distributed data-parallel computing using a high-level language. In: Proceedings of the 8th USENIX Conference on Operating Systems Design and Implementation, USENIX Association, Berkeley, CA, USA, OSDI’08, pp. 1–14 (2008). http://dl.acm.org/citation.cfm?id=1855741.1855742
Zaharia, M., Chowdhury, M., Das, T., Dave, A., Ma, J., McCauley, M., Franklin, M.J., Shenker, S., Stoica, I.: Resilient distributed datasets: A fault-tolerant abstraction for in-memory cluster computing. In: Proceedings of the 9th USENIX Conference on Networked Systems Design and Implementation, USENIX Association, Berkeley, CA, USA, NSDI’12, pp. 2–2 (2012). http://dl.acm.org/citation.cfm?id=2228298.2228301
Zaharia, M., Chowdhury, M., Franklin, M.J., Shenker, S., Stoica, I.: Spark: Cluster computing with working sets. In: Proceedings of the 2nd USENIX Conference on Hot Topics in Cloud Computing, USENIX Association, Berkeley, CA, USA, HotCloud’10, pp. 10–10 (2010). http://dl.acm.org/citation.cfm?id=1863103.1863113
Zaharia, M., Das, T., Li, H., Hunter, T., Shenker, S., Stoica, I.: Discretized streams: Fault-tolerant streaming computation at scale. In: Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles, ACM, New York, NY, USA, SOSP ’13, pp. 423–438 (2013). http://doi.acm.org/10.1145/2517349.2522737
Zaharia, M., Das, T., Li, H., Shenker, S., Stoica, I.: Discretized streams: An efficient and fault-tolerant model for stream processing on large clusters. In: Proceedings of the 4th USENIX Conference on Hot Topics in Cloud Computing, USENIX Association, Berkeley, CA, USA, HotCloud’12, pp. 10–10 (2012). http://dl.acm.org/citation.cfm?id=2342763.2342773
Zaharia, M., Konwinski, A., Joseph, A.D., Katz, R., Stoica, I.: Improving MapReduce performance in heterogeneous environments. In: Proceedings of the 8th USENIX conference on Operating Systems Design and Implementation, USENIX Association, Berkeley, CA, USA, OSDI’08, pp. 29–42 (2008). http://dl.acm.org/citation.cfm?id=1855741.1855744
Zhu, H., Haopeng, C.: Adaptive failure detection via heartbeat under Hadoop. In: Proceedings of the 2011 IEEE Asia-Pacific Services Computing Conference, IEEE, New York, NY, USA, ApSCC’11, pp. 231–238 (2011)

Acknowledgments

The research leading to these results has received funding from the H2020 project reference number 642963 in the call H2020-MSCA-ITN-2014.

Author information

Authors and Affiliations

OEG, E.T.S. Ingenieros Informáticos, Universidad Politécnica de Madrid, Campus de Montegancedo s/n, 28660, Boadilla del Monte, Madrid, Spain
Bunjamin Memishi & María S. Pérez
Inria, Campus Universitaire de Beaulieu, Rennes, 35042, Brittany, France
Shadi Ibrahim & Gabriel Antoniu

Corresponding author

Correspondence to Bunjamin Memishi.

Editor information

Editors and Affiliations

University Politehnica of Bucharest, Bucharest, Romania
Florin Pop
Cracow University of Technology, Cracow, Poland
Joanna Kołodziej
Second University of Naples, Naples, Caserta, Italy
Beniamino Di Martino

About this chapter

Cite this chapter

Memishi, B., Ibrahim, S., Pérez, M.S., Antoniu, G. (2016). Fault Tolerance in MapReduce: A Survey. In: Pop, F., Kołodziej, J., Di Martino, B. (eds) Resource Management for Big Data Platforms. Computer Communications and Networks. Springer, Cham. https://doi.org/10.1007/978-3-319-44881-7_11

DOI: https://doi.org/10.1007/978-3-319-44881-7_11
Published: 28 October 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-44880-0
Online ISBN: 978-3-319-44881-7
eBook Packages: Computer Science, Computer Science (R0)