ACES: Accelerating Sparse Matrix Multiplication with Adaptive Execution Flow and Concurrency-Aware Cache Optimizations

Authors: X. Lu, B. Long, X. Chen, Y. Han, X.-H. Sun

Date: April 2024

Venue: The 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-29)

Type: Conference

Abstract

Sparse matrix-matrix multiplication (SpMM) is a critical computational kernel in numerous scientific and machine learning applications. SpMM involves massive irregular memory accesses and poses great challenges to conventional cache-based computer architectures. Recently, dedicated SpMM accelerators have been proposed to enhance SpMM performance. However, current SpMM accelerators still face challenges in adapting to varied sparse patterns, fully exploiting inherent parallelism, and optimizing cache performance. To address these issues, in this study we introduce ACES, a novel SpMM accelerator. First, ACES features an adaptive execution flow that dynamically adjusts to diverse sparse patterns. The adaptive execution flow balances parallel computing efficiency and data reuse. Second, ACES incorporates locality-concurrency co-optimizations within the global cache. ACES utilizes a concurrency-aware cache management policy, which considers both data locality and concurrency when making replacement decisions. Additionally, the integration of a non-blocking buffer with the global cache enhances concurrency and reduces computational stalls. Third, the hardware architecture of ACES is designed to integrate all innovations. The architecture ensures efficient support across the adaptive execution flow, advanced cache optimizations, and fine-grained parallel processing. Our performance evaluation demonstrates that ACES significantly outperforms existing solutions, providing a 2.1× speedup and marking a substantial advancement in SpMM acceleration.
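
To make the concurrency-aware replacement idea in the abstract more concrete, the following is a minimal C++ sketch of a victim-selection routine that scores cache lines by both recency (locality) and an outstanding-consumer count (concurrency). The struct fields, the equal weighting, and the select_victim function are illustrative assumptions for exposition only; they are not taken from the paper and do not represent the actual ACES policy.

#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical per-line metadata: last_access approximates locality,
// pending_consumers approximates how many in-flight partial-product
// merges still need this line (concurrency).
struct LineMeta {
    uint64_t last_access;        // timestamp of the most recent access
    uint32_t pending_consumers;  // outstanding references from active computations
    bool     valid;
};

// Choose an eviction victim: prefer lines that are both cold (old
// last_access) and not needed by concurrent computations.
// The 0.5/0.5 weights are placeholders, not values from the paper.
int select_victim(const std::vector<LineMeta>& set, uint64_t now) {
    double best_score = std::numeric_limits<double>::max();
    int victim = -1;
    for (int i = 0; i < static_cast<int>(set.size()); ++i) {
        if (!set[i].valid) return i;  // free slot: fill it instead of evicting
        double locality    = 1.0 / (1.0 + static_cast<double>(now - set[i].last_access));
        double concurrency = static_cast<double>(set[i].pending_consumers);
        double score = 0.5 * locality + 0.5 * concurrency;  // lower = better victim
        if (score < best_score) { best_score = score; victim = i; }
    }
    return victim;
}

In this sketch, a line that was touched recently or is still awaited by many concurrent computations receives a high score and is protected, which mirrors, at a high level, the abstract's point that replacement should weigh concurrency alongside locality.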

Tags

SpMM, Accelerator, Parallelism, Concurrency, Synchronization, Scalability, UniMCC