Performance under Failures of MapReduce Applications (Poster Presentation)

Authors: H. Jin, K. Qiao, X.-H. Sun, Y. Li

Date: May 2011

Venue: The 11th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid'11), Newport Beach, CA, USA

Type: Conference

Abstract

The MapReduce programming paradigm has gained popularity in recent years because it supports easy programming, data distribution, and fault tolerance. Failure is an unwanted but inevitable fact that all large-scale parallel computing systems must face. MapReduce introduces a novel data replication and task re-execution strategy for fault tolerance. This study aims to improve the understanding of these fault tolerance mechanisms. In particular, we build a stochastic performance model to quantify the impact of failures on MapReduce applications and to investigate the effectiveness of these mechanisms under different computing environments. Simulations have also been carried out to verify the accuracy of the proposed model. Our results show that data replication is an effective approach even when the failure rate is high, and that the task migration mechanism of MapReduce works well in balancing the reliability differences among individual nodes. This work provides a theoretical foundation for optimizing large-scale MapReduce applications, especially when fault tolerance is a concern.