KV-cache compression extends context cheaply
Research 2025-09-25
Source: arxiv.org
New methods shrink the memory cost of long context, letting smaller GPUs handle bigger windows.
Long context is mostly a memory problem: the KV cache grows linearly with every token in the window. Compressing it lets a given card hold a much larger window before running out of VRAM.
That directly affects which context lengths fit on the cards listed on the prices page.
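To make the memory arithmetic concrete, here is a rough back-of-the-envelope sketch in Python. It assumes a standard transformer that stores one key and one value vector per layer, per KV head, per token. The model shape (32 layers, 8 KV heads, head dimension 128) and the 8 GiB cache budget are illustrative assumptions, not figures from the source. Lowering the bits per element is used here only as a stand-in for compression in general; the methods in the source may work differently.

```python
def kv_bytes_per_token(n_layers: int, n_kv_heads: int, head_dim: int,
                       bits_per_elem: int) -> int:
    """Bytes of KV cache added by one token (batch size 1)."""
    # Factor of 2: each layer stores both a key and a value vector.
    bits = 2 * n_layers * n_kv_heads * head_dim * bits_per_elem
    return bits // 8


def max_tokens(budget_bytes: int, n_layers: int, n_kv_heads: int,
               head_dim: int, bits_per_elem: int) -> int:
    """Largest context window whose KV cache fits in budget_bytes."""
    per_token = kv_bytes_per_token(n_layers, n_kv_heads, head_dim, bits_per_elem)
    return budget_bytes // per_token


# Hypothetical model shape and VRAM budget, chosen only for illustration.
LAYERS, KV_HEADS, HEAD_DIM = 32, 8, 128
BUDGET = 8 * 2**30  # 8 GiB left over for the cache after weights

uncompressed = max_tokens(BUDGET, LAYERS, KV_HEADS, HEAD_DIM, bits_per_elem=16)
compressed = max_tokens(BUDGET, LAYERS, KV_HEADS, HEAD_DIM, bits_per_elem=4)
```

Under these assumptions each token costs 128 KiB at 16 bits per element, so the 8 GiB budget holds 65,536 tokens; at 4 bits per element the same budget holds 262,144. Real savings depend on the compression method and its overhead, so treat the numbers as an upper-bound illustration rather than a benchmark.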