Large Language Models: Inference Process and KV-Cache Structure

Explore the foundational concepts of LLM inference: the prefill and decode phases, the transformer architecture, and the structure and terminology of the KV-cache.