ContextGC

Generational Memory Management Against the Context Avalanche in LLM Conversations

The Context Avalanche — Token Growth Simulation

Watch how cumulative token usage grows across strategies as conversations get longer.

Context Avalanche
Without compression, every turn resends all prior tokens as context. Through turn T, with an average of R tokens per response, the cumulative cost is ≈ R·T(T+1)/2, which grows quadratically in T. At 50 turns our measurements showed 6.44× acceleration vs. the linear baseline.
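The quadratic term can be checked with a short simulation. This is a minimal sketch of the idealized model only, assuming every response is exactly R tokens long; the function names are illustrative and not part of ContextGC, and the sketch is not meant to reproduce the measured 6.44× figure.

```python
def cumulative_tokens_no_compression(turns: int, tokens_per_turn: int) -> int:
    """Total tokens processed when every turn resends the whole history.

    Idealized model (assumption): each turn adds exactly `tokens_per_turn`
    tokens, and the full accumulated context is processed on every turn.
    """
    total = 0
    context = 0
    for _ in range(turns):
        context += tokens_per_turn  # history grows by one response
        total += context            # the whole history is processed this turn
    return total


def closed_form(turns: int, tokens_per_turn: int) -> int:
    """Closed form of the same sum: R * T * (T + 1) / 2."""
    return tokens_per_turn * turns * (turns + 1) // 2
```

Under this model, doubling the number of turns roughly quadruples the cumulative cost, which is the growth pattern the simulation plots.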
ContextGC approach
Generational GC keeps three tiers: Young (the last K turns, raw), Mid (summaries), and Old (a single merged summary). Each compression batch covers only K turns, so its cost is O(K): fixed, regardless of conversation length. Naive compression, by contrast, re-summarizes the entire history every K turns, so its per-batch cost grows with history length.
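The difference in per-batch cost can be illustrated with a small sketch. Everything here is an assumption made for illustration rather than the project's actual implementation: the class and function names, the `max_mid` limit, the rule that a batch is compressed once 2K raw turns have accumulated, and the stub summarizer that stands in for an LLM call. Cost is counted as the number of raw turns fed to the summarizer per batch.

```python
from typing import Callable, List


def stub_summarize(texts: List[str]) -> str:
    """Placeholder summarizer (assumption): a real system would call an LLM."""
    return f"summary({len(texts)} items)"


class GenerationalContext:
    """Hypothetical three-tier context: Young (raw), Mid (summaries), Old (merged)."""

    def __init__(self, k: int, max_mid: int,
                 summarize: Callable[[List[str]], str] = stub_summarize) -> None:
        self.k = k                    # raw turns kept in Young
        self.max_mid = max_mid        # hypothetical cap on Mid summaries
        self.summarize = summarize
        self.young: List[str] = []
        self.mid: List[str] = []
        self.old: str = ""
        self.batch_costs: List[int] = []  # raw turns summarized per batch

    def add_turn(self, text: str) -> None:
        self.young.append(text)
        # Assumed trigger: once 2K raw turns exist, compress the oldest K,
        # so the last K turns always stay raw.
        if len(self.young) >= 2 * self.k:
            batch, self.young = self.young[:self.k], self.young[self.k:]
            self.batch_costs.append(len(batch))  # always K, independent of history
            self.mid.append(self.summarize(batch))
            # Promote the oldest Mid summary into Old when Mid overflows.
            # This merge touches at most two short summaries, so it is bounded too.
            if len(self.mid) > self.max_mid:
                oldest = self.mid.pop(0)
                parts = [self.old, oldest] if self.old else [oldest]
                self.old = self.summarize(parts)

    def context(self) -> List[str]:
        """Context sent to the model: Old summary, Mid summaries, then raw Young turns."""
        head = [self.old] if self.old else []
        return head + self.mid + self.young


def naive_batch_costs(turns: int, k: int) -> List[int]:
    """Naive scheme: every K turns, re-summarize all turns seen so far."""
    return list(range(k, turns + 1, k))
```

With K = 5 over 50 turns, every generational batch in this sketch summarizes exactly 5 turns, while the naive batches summarize 5, 10, 15, and so on up to 50 turns.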