Uncover the Overhead and Resource Usage for Handling KV Cache Overflow in LLM Inference
Authors: J. Ye, B. Nicolae, A. Kougkas, X.-H. Sun
Date: November 2024
Venue: The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC'24)
Type: Poster
Tags: KV Cache, LLM Inference