
KV-cache compression extends context cheaply

Research · 2025-09-25 · Source: arxiv.org

New methods shrink the memory cost of long context, letting smaller GPUs handle bigger windows.

Long context is mostly a memory problem: the KV cache, which stores the attention keys and values for every token already processed, grows linearly with context length. Compressing it lets the same card hold a much larger window before running out of VRAM.
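A rough back-of-the-envelope sketch of that arithmetic follows. It uses the standard KV-cache size estimate (keys plus values, per layer, per KV head, per token). The model shape (32 layers, 8 KV heads, head dimension 128, fp16), the 8 GiB budget, and the flat `compression_ratio` are illustrative assumptions, not figures from the source; real compression methods vary and rarely reduce to a single ratio.

```python
def kv_cache_bytes_per_token(n_layers, n_kv_heads, head_dim,
                             bytes_per_elem=2, compression_ratio=1.0):
    """Estimated KV-cache bytes needed for one token of context.

    The factor 2 covers one key vector and one value vector per layer.
    compression_ratio is a hypothetical flat factor (4.0 = cache is 4x smaller).
    """
    raw = 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem
    return raw / compression_ratio


def max_context_tokens(vram_budget_bytes, n_layers, n_kv_heads, head_dim,
                       bytes_per_elem=2, compression_ratio=1.0):
    """Largest context that fits in the VRAM left over for the cache."""
    per_token = kv_cache_bytes_per_token(
        n_layers, n_kv_heads, head_dim, bytes_per_elem, compression_ratio
    )
    return int(vram_budget_bytes // per_token)


if __name__ == "__main__":
    GIB = 2**30
    budget = 8 * GIB  # assumed VRAM left for the cache after model weights
    shape = dict(n_layers=32, n_kv_heads=8, head_dim=128)  # assumed model shape

    print(max_context_tokens(budget, **shape))
    print(max_context_tokens(budget, **shape, compression_ratio=4.0))
```

Under these assumptions each token costs 128 KiB of cache, so an 8 GiB budget holds about 65,000 tokens uncompressed and about 262,000 at a 4x ratio. The window scales in direct proportion to the compression achieved.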

That directly changes which context lengths fit on each of the cards listed on the prices page.

Read the original at arxiv.org