HLR Memory Model: Spaced Repetition for Language Learning
Forgetting is inevitable. Within 24 hours of learning something new, we forget roughly 70% of it—unless we review at strategic intervals. Traditional learning systems ignore this reality, presenting content randomly or sequentially without considering optimal review timing. The result: learners waste time reviewing concepts they've mastered while forgetting material that needs reinforcement.
We implemented Half-Life Regression (HLR), Duolingo's proven spaced repetition algorithm, to optimize review timing based on memory science. Every concept has an individual retention curve that adapts to each learner's performance, ensuring reviews happen exactly when they're most effective.
The Forgetting Curve Problem
Hermann Ebbinghaus discovered the forgetting curve in 1885: memory retention decays exponentially over time without reinforcement. The steeper the curve, the faster we forget. The key insight: reviewing content just before you're likely to forget it maximizes retention while minimizing study time.
Before implementing HLR, our platform had no memory model. Content selection was random or sequential, with no consideration for how long ago a user last saw a concept or how well they knew it.
Before: Random Content Selection
Content Pool Random Selection
┌──────────────────┐ ┌──────────────────┐
│ All Available │ │ No Memory Model │
│ Lessons │──────>│ - Random order │
│ [L1...L100] │ │ - No scheduling │
└──────────────────┘ │ - No retention │
└──────────────────┘
Users encountered the same concepts too frequently (wasting time) or too infrequently (leading to forgetting). There was no data-driven approach to determining when review would be most beneficial.
Half-Life Regression: Personalized Memory Modeling
HLR models memory decay with a simple but powerful formula:
Retention = 2 ^ (-t / hlr)
where:
t = time since last exposure (in days)
hlr = half-life of recall (concept-specific, learned from attempts)
The half-life represents how long it takes for retention probability to drop to 50%: at t = hlr, retention = 2^(-1) = 0.5. A high half-life means the concept is well-remembered; a low half-life means it's easily forgotten.
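To make the decay concrete, here is a minimal sketch; the half-life value and time points are illustrative, not drawn from our data:

```python
def retention(t_days, hlr):
    """Probability of recall t_days after the last exposure."""
    return 2 ** (-t_days / hlr)

# A concept with a 3-day half-life, checked 3 and 9 days after the last review:
print(retention(3.0, 3.0))  # 0.5   — exactly one half-life elapsed
print(retention(9.0, 3.0))  # 0.125 — three half-lives elapsed
```

Each additional half-life cuts retention in half again, which is why low-half-life concepts fall below the review threshold so quickly.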
After: HLR-Driven Scheduling
Content Pool HLR Scheduler Optimized Review
┌──────────────────┐ ┌──────────────────┐ ┌──────────────────┐
│ All Available │ │ Memory Science │ │ Right Content │
│ Lessons │──────>│ - Decay rate │──────>│ - Right time │
│ [L1...L100] │ │ - Retention calc │ │ - Max learning │
│ │ │ - Optimal timing │ │ - Min forgetting │
└──────────────────┘ └──────────────────┘ └──────────────────┘
How HLR Learns from Performance
Every time a user attempts a concept, the system updates the half-life based on success or failure:
Correct Attempt
If the user answers correctly, the half-life increases. The concept is being retained, so we can wait longer before reviewing it again:
new_hlr = old_hlr * (1 + success_factor * time_since_last_attempt)
A correct attempt after a long interval signals strong retention, so the half-life increases more dramatically.
Incorrect Attempt
If the user answers incorrectly, the half-life decreases. The concept is not being retained, so we need to review it sooner:
new_hlr = old_hlr * (1 - failure_penalty)
In practice, the failure_penalty term scales with how long it's been since the last attempt—forgetting after 2 days is more concerning than forgetting after 2 hours.
Initial Half-Life
For concepts never seen before, we assign a default half-life based on concept difficulty and user persona:
- Beginner users: 1 day (review sooner)
- Intermediate users: 3 days
- Advanced users: 7 days (review later)
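Putting the two update rules together, a sketch might look like the following; success_factor and failure_penalty are hypothetical tuning constants, not our production values:

```python
SUCCESS_FACTOR = 0.1    # per day since last attempt (hypothetical)
FAILURE_PENALTY = 0.3   # base fraction removed on a miss (hypothetical)

def update_half_life(old_hlr, correct, days_since_last_attempt):
    """Return the new half-life (in days) after one attempt."""
    if correct:
        # A correct answer after a long gap signals strong retention,
        # so the half-life grows more for larger intervals.
        return old_hlr * (1 + SUCCESS_FACTOR * days_since_last_attempt)
    # A miss shrinks the half-life so the concept comes back sooner.
    return old_hlr * (1 - FAILURE_PENALTY)
```

With these constants, a correct answer 5 days after the last attempt grows a 3-day half-life to 4.5 days, while a miss shrinks it to about 2.1 days.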
Calculating Retention Probability
At lesson generation time, the system calculates retention probability for all previously-seen concepts:
def calculate_retention(concept, user, current_time):
    """Retention probability (0.0 - 1.0) for a previously-seen concept."""
    last_attempt_time = get_last_attempt_time(concept, user)
    time_elapsed = current_time - last_attempt_time  # elapsed time in days
    hlr = get_half_life(concept, user)
    # Base-2 decay: retention is exactly 0.5 when time_elapsed == hlr
    retention = 2 ** (-time_elapsed / hlr)
    return retention
Concepts with retention below a threshold (typically 0.5 or 50%) are prioritized for review. Concepts with retention above 0.9 are considered mastered and excluded from review slots.
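A minimal sketch of this triage, using the thresholds above; the concept names and retention scores are illustrative:

```python
REVIEW_THRESHOLD = 0.5   # below this: due for review
MASTERY_THRESHOLD = 0.9  # above this: considered mastered

def triage(concept_retention):
    """Split {concept_id: retention} into due / hold / mastered lists."""
    due, hold, mastered = [], [], []
    for concept_id, r in concept_retention.items():
        if r < REVIEW_THRESHOLD:
            due.append(concept_id)
        elif r > MASTERY_THRESHOLD:
            mastered.append(concept_id)
        else:
            hold.append(concept_id)
    return due, hold, mastered

due, hold, mastered = triage({"casa": 0.35, "perro": 0.75, "hola": 0.95})
# due=["casa"], hold=["perro"], mastered=["hola"]
```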
Preventing Over-Practice and Under-Practice
HLR prevents two common inefficiencies:
1. Over-Practice (Wasted Time)
Without a memory model, users often review concepts they've already mastered. HLR identifies high-retention concepts (retention > 0.9) and deprioritizes them, freeing up study time for weaker material.
2. Under-Practice (Forgetting)
Conversely, users sometimes neglect concepts they've forgotten. HLR identifies low-retention concepts (retention < 0.3) and surfaces them immediately for review before the knowledge is completely lost.
The sweet spot is retention between 0.3 and 0.7—material that's challenging but not forgotten, maximizing learning efficiency.
Implementation Details
Our HLR implementation lives in src/services/personalization/hlr.py with supporting models in src/models/curriculum/.
Data Model
Each user-concept pair has an associated record tracking:
- last_attempt_at: Timestamp of most recent attempt
- half_life_days: Current half-life in days (updated after each attempt)
- correct_attempts: Total correct attempts
- total_attempts: Total attempts (correct + incorrect)
- mastery_level: Derived metric (0.0 - 1.0) based on accuracy and half-life
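One possible shape for that record, sketched as a plain dataclass; the field names follow the list above, but the actual persistence model may differ:

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class UserConceptMemory:
    """Per user-concept memory state tracked by the HLR service (sketch)."""
    user_id: str
    concept_id: str
    last_attempt_at: datetime   # timestamp of most recent attempt
    half_life_days: float       # current half-life, updated after each attempt
    correct_attempts: int = 0
    total_attempts: int = 0

    @property
    def accuracy(self):
        # Raw accuracy; mastery_level would combine this with half_life_days.
        if self.total_attempts == 0:
            return 0.0
        return self.correct_attempts / self.total_attempts
```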
HLR Service API
The service exposes two key methods:
class HLRService:
    def calculate_retention(self, user_id, concept_id):
        """Calculate current retention probability (0.0 - 1.0)"""

    def update_half_life(self, user_id, concept_id, correct):
        """Update half-life after attempt (correct=True/False)"""
Integration with Content Selection
The Content Duo selector calls calculate_retention() for all previously-seen concepts, then allocates the review slot to concepts with retention between 0.3 and 0.7 (optimal review window).
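A sketch of that selection step, assuming an hlr_service object exposing the calculate_retention() method shown above; sorting weakest-first is an assumption here, not necessarily the selector's actual ordering:

```python
REVIEW_WINDOW = (0.3, 0.7)  # optimal review window from the text

def pick_review_concepts(hlr_service, user_id, seen_concepts, slots):
    """Return up to `slots` concept ids inside the optimal review window."""
    lo, hi = REVIEW_WINDOW
    scored = [
        (hlr_service.calculate_retention(user_id, c), c) for c in seen_concepts
    ]
    in_window = [(r, c) for r, c in scored if lo <= r <= hi]
    in_window.sort()  # lowest retention first: review the weakest concepts
    return [c for _, c in in_window[:slots]]
```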
Adapting to Study Patterns
HLR gracefully handles irregular study schedules. If a user studies daily, half-lives stabilize around 1-3 days. If a user studies weekly, half-lives adapt to longer intervals (7-14 days).
This self-adjusting property makes HLR robust across different learner behaviors—from cramming students to casual learners.
Empirical Validation
Duolingo validated HLR through extensive A/B testing, showing:
- 13% improvement in retention compared to traditional spaced repetition (Leitner system)
- 25% reduction in time required to achieve the same mastery level
- Better long-term retention (measured at 30, 60, and 90 days post-learning)
Our implementation follows their proven approach, adapted to our content structure and user base.
Results: Data-Driven Review Scheduling
Since deploying HLR, we've achieved:
- Dynamic scheduling - Every concept reviewed at optimal intervals based on individual retention
- Automatic priority adjustment - Weak concepts surface sooner; strong concepts surface later
- Reduced over-practice - Users spend less time reviewing mastered material
- Personalized decay curves - Every user-concept pair has unique retention modeling
Visual Example: Retention Over Time
Consider a concept learned on Day 0:
Retention Probability
1.0 ┤●
│ ╲
0.9 ┤ ╲
│ ╲
0.8 ┤ ╲ ●───── (Correct attempt → hlr increases)
│ ╲ ╱
0.7 ┤ ╲ ╱
│ ╲ ╱
0.6 ┤ ╲ ╱
│ ╲ ╱
0.5 ┤ ╲ ╱ ← Review here (50% retention)
│ ●
0.4 ┤ ╱ ╲
│ ╱ ╲
0.3 ┤ ╱ ╲ ← If no review, forgetting continues
│ ╱ ╲
0.2 ┤ ╱ ●─────── (Incorrect → hlr decreases)
│
└─────┬─────┬─────┬─────┬─────┬────>
D0 D3 D6 D9 D12 Days
The curve steepens after an incorrect attempt (lower hlr) and flattens after a correct attempt (higher hlr).
Beyond Vocabulary: Extending HLR
While we currently apply HLR to vocabulary concepts, the algorithm generalizes to any learnable skill:
- Grammar rules - Track mastery of grammatical constructs
- Pronunciation - Model retention of phonetic patterns
- Cultural knowledge - Optimize review of cultural concepts
Challenges and Trade-offs
Cold Start Problem
New users have no attempt history, so we can't calculate personalized half-lives. We address this with persona-based defaults and rapid adaptation (half-lives update after just 2-3 attempts).
Computational Cost
Calculating retention for thousands of concepts at lesson generation time could be slow. We optimize by:
- Caching recently-calculated retention scores (5-minute TTL)
- Pre-filtering by persona to reduce concept pool size
- Indexing last_attempt_at for fast time-based queries
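A minimal in-process sketch of the TTL cache; our production cache may look different, and the injectable now parameter exists only for testability:

```python
import time

class RetentionCache:
    """TTL cache for retention scores, keyed by (user_id, concept_id)."""

    def __init__(self, ttl_seconds=300):  # 5-minute TTL, as above
        self.ttl = ttl_seconds
        self._store = {}  # (user_id, concept_id) -> (score, stored_at)

    def get(self, user_id, concept_id, now=None):
        now = time.time() if now is None else now
        entry = self._store.get((user_id, concept_id))
        if entry is None:
            return None
        score, stored_at = entry
        if now - stored_at > self.ttl:
            del self._store[(user_id, concept_id)]  # expired: force recompute
            return None
        return score

    def put(self, user_id, concept_id, score, now=None):
        now = time.time() if now is None else now
        self._store[(user_id, concept_id)] = (score, now)
```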
Balancing Exploration vs. Exploitation
Pure HLR-driven selection can create "review loops" where users never encounter new material. Our slot-based system (40% new, 30% review, 30% challenge) ensures balanced exposure.
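The 40/30/30 split can be sketched as follows; lesson_size and the rounding rule are illustrative:

```python
def allocate_slots(lesson_size):
    """Return (new, review, challenge) slot counts for one lesson."""
    new = round(lesson_size * 0.4)
    review = round(lesson_size * 0.3)
    challenge = lesson_size - new - review  # remainder keeps the total exact
    return new, review, challenge

# A 10-item lesson yields 4 new, 3 review, and 3 challenge slots.
```

Giving the challenge bucket the remainder guarantees the three counts always sum to lesson_size, even when the percentages don't divide evenly.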
What's Next for HLR
Future enhancements include:
- Cross-skill transfer - Use vocabulary mastery to inform grammar difficulty
- Contextual retention - Track retention separately for recognition vs. production tasks
- Collaborative filtering - Use population-level data to improve individual predictions
- Confidence intervals - Provide uncertainty estimates for retention predictions
HLR transforms spaced repetition from a fixed schedule (review on days 1, 3, 7, 14...) to a personalized, adaptive system that responds to individual learning patterns. Every review happens exactly when it's most beneficial—no sooner, no later.
Implementation Files: src/services/personalization/hlr.py
Commits: 7d558b2, 4315731, 8465fd1
Algorithm: Half-Life Regression (Duolingo's proven model)