Publication
ACL 2025 Industry Track (Oral)
TaDA: Training-free recipe for Decoding with Adaptive KV Cache Compression and Mean-centering
TaDA cuts KV cache memory usage by over 70% without sacrificing accuracy, enabling longer-context and more scalable LLM inference with no retraining.
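The blurb rests on two ideas: compressing the KV cache and mean-centering it first. Below is a minimal NumPy sketch of why centering can help low-bit compression; all names, shapes, and the quantization scheme are illustrative assumptions for this listing, not the paper's actual method.

```python
import numpy as np

rng = np.random.default_rng(0)
# Fake key cache: (num_tokens, head_dim), with a strong per-channel offset,
# mimicking the channel-wise biases attention keys often show in practice.
keys = rng.normal(0.0, 0.1, size=(512, 64)) + rng.normal(0.0, 2.0, size=(1, 64))

def quantize_int8(x):
    """Symmetric per-tensor int8 quantization (illustrative choice)."""
    scale = np.abs(x).max() / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

# Baseline: quantize the raw cache directly.
q_raw, s_raw = quantize_int8(keys)
err_raw = np.abs(dequantize(q_raw, s_raw) - keys).mean()

# Mean-centering: keep the per-channel mean in full precision and quantize
# only the residual, whose dynamic range is much smaller.
mean = keys.mean(axis=0, keepdims=True)
q_res, s_res = quantize_int8(keys - mean)
err_centered = np.abs((dequantize(q_res, s_res) + mean) - keys).mean()

print(err_centered < err_raw)  # centering shrinks the quantization error
```

The per-channel mean adds only one full-precision vector per head, so the residual can be stored at much lower precision for the same reconstruction error.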