Uncover the Overhead and Resource Usage for Handling KV Cache Overflow in LLM Inference

Authors: J. Ye, B. Nicolae, A. Kougkas, X.-H. Sun

Date: November 2024

Venue: The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC'24)

Type: Poster

Tags: KV Cache, LLM Inference