MPI-Mitten: Enabling Migration Technology in MPI
Authors: C. Du, X.-H. Sun
Date: May 2006
Venue: The 6th IEEE International Symposium on Cluster Computing and the Grid, Singapore
Type: Conference
Abstract
Group communications are commonly used in parallel and distributed environments. However, existing migration mechanisms do not support group communications. This weakness prevents migration-based proactive fault tolerance, among other techniques, from being applied to MPI applications. In this study, we propose distributed migration protocols with group membership management to support process migration when group membership changes. We design and implement a process-migration-enabling MPI library, named MPI-Mitten, to verify the protocols and to enhance the reliability and usability of current MPI platforms. MPI-Mitten is based on the MPI standard and can be applied to any MPI-2 implementation. Experimental results show that the proposed distributed process migration protocols are solid and that the MPI-Mitten system is effective and uniquely supports migration-based fault tolerance.