Trade-off Study of Localizing Communication and Balancing Network Traffic on Dragonfly System
Authors: X. Wang, X. Yang, M. Mubarak, R. Ross, Z. Lan
Date: May 2018
Venue: The 32nd IEEE International Parallel and Distributed Processing Symposium (IPDPS'18), Vancouver, Canada, 2018, pp. 1113-1122.
Type: Conference
Abstract
Dragonfly networks are being widely adopted in high-performance computing systems. On these networks, however, interference caused by resource sharing can lead to significant network congestion and performance variability. We present a comparative analysis exploring the trade-off between localizing communication and balancing network traffic. We conduct trace-based simulations for applications with different communication patterns, using multiple job placement policies and routing mechanisms. We perform an in-depth performance analysis on representative applications individually and show that different applications have distinct preferences regarding localized communication and balanced network traffic. We further demonstrate the effect of external network interference by introducing background traffic and show that localized communication can help reduce the application performance variation caused by network sharing.