Generational Memory Management Against the Context Avalanche in LLM Conversations
🖥 NHN B200
📊 Experiment Results (v7)
Charts: Recall Score by Condition; Total Tokens by Condition
Detailed Results Table
Condition | Type | n | Recall (avg±σ) | All scores | Tokens (avg) | Compression overhead
The Context Avalanche — Token Growth Simulation
Watch how cumulative token usage grows across strategies as conversations get longer.
Context Avalanche
Without compression, every turn resends all prior tokens as context.
After T turns at an average of R tokens per response, cumulative cost ≈ R·T(T+1)/2, which grows quadratically in T.
At 50 turns, our measurements showed a 6.44× acceleration versus the linear baseline.
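The R·T(T+1)/2 estimate can be checked with a short simulation. This is a minimal sketch, assuming every turn contributes exactly R tokens and the full history is resent on each turn. The function names and the value R = 200 are illustrative choices, not taken from the experiment.

```python
def cumulative_tokens(turns: int, tokens_per_turn: int) -> int:
    """Total tokens processed when every turn resends the full history."""
    total = 0
    context = 0
    for _ in range(turns):
        context += tokens_per_turn  # history grows by one turn
        total += context            # the whole history is processed again
    return total


def closed_form(turns: int, tokens_per_turn: int) -> int:
    """R * T * (T + 1) / 2, the formula quoted above."""
    return tokens_per_turn * turns * (turns + 1) // 2


if __name__ == "__main__":
    R = 200  # assumed average tokens per response (illustrative only)
    for T in (10, 25, 50):
        print(T, cumulative_tokens(T, R), closed_form(T, R))
```

The simulated total and the closed form agree exactly under these assumptions. Real conversations have variable response lengths, so measured growth will differ from this idealized curve.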
ContextGC approach
Generational GC keeps three tiers: Young (the last K turns, raw), Mid (summaries), and Old (a single merged summary).
Compression cost is O(K) per batch: it stays fixed as the conversation grows.
Naive compression, by contrast, re-summarizes the entire history every K turns, so its cost grows with history length.
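The three tiers and the cost difference can be sketched as follows. This is an illustrative sketch, not the ContextGC implementation. The class name `GenerationalMemory`, the `max_mid` limit, the rule of compressing the oldest K turns once 2K raw turns accumulate, and the `toy_summarize` stand-in for an LLM summarizer are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, List


def toy_summarize(texts: List[str]) -> str:
    """Hypothetical stand-in for an LLM summarizer: keeps the first word of each text."""
    return " ".join(t.split()[0] for t in texts if t.strip())


@dataclass
class GenerationalMemory:
    k: int = 4          # raw turns always kept in Young
    max_mid: int = 3    # Mid summaries kept before merging into Old (assumed)
    summarize: Callable[[List[str]], str] = toy_summarize
    young: List[str] = field(default_factory=list)
    mid: List[str] = field(default_factory=list)
    old: str = ""
    raw_turns_summarized: int = 0  # raw turns fed to the summarizer so far

    def add_turn(self, text: str) -> None:
        self.young.append(text)
        # Once 2K raw turns accumulate, compress the oldest K into one Mid
        # summary, so the last K turns always stay raw. Each batch is K turns.
        if len(self.young) >= 2 * self.k:
            batch, self.young = self.young[: self.k], self.young[self.k :]
            self.mid.append(self.summarize(batch))
            self.raw_turns_summarized += len(batch)
        # When Mid overflows, fold its oldest summary into the Old summary.
        if len(self.mid) > self.max_mid:
            self.old = self.summarize([self.old, self.mid.pop(0)])

    def context(self) -> List[str]:
        """Context to send: Old summary, then Mid summaries, then raw Young turns."""
        parts = [self.old] if self.old else []
        return parts + self.mid + self.young


def naive_turns_summarized(turns: int, k: int) -> int:
    """Raw turns summarized when the whole history is redone every k turns."""
    return sum(range(k, turns + 1, k))


mem = GenerationalMemory(k=4, max_mid=3)
for i in range(40):
    mem.add_turn(f"turn{i} some text")

print(len(mem.context()), mem.raw_turns_summarized, naive_turns_summarized(40, 4))
```

In this sketch each generational batch touches exactly K raw turns, while the naive counter adds the full history length at every batch, which is the difference described above. Merging into Old operates only on summaries, so it is not counted as raw-turn work here.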