HLR Memory Model: Spaced Repetition for Language Learning
Forgetting is inevitable. Within 24 hours of learning something new, we forget roughly 70% of it—unless we review at strategic intervals. Traditional learning systems ignore this reality, presenting content randomly or sequentially without considering optimal review timing. The result: learners waste time reviewing concepts they've mastered while forgetting material that needs reinforcement.
We implemented Half-Life Regression (HLR), Duolingo's proven spaced repetition algorithm, to optimize review timing based on memory science. Every concept has an individual retention curve that adapts to each learner's performance, ensuring reviews happen exactly when they're most effective.
The Forgetting Curve Problem
Hermann Ebbinghaus discovered the forgetting curve in 1885: memory retention decays exponentially over time without reinforcement. The steeper the curve, the faster we forget. The key insight: reviewing content just before you're likely to forget it maximizes retention while minimizing study time.
Before implementing HLR, our platform had no memory model. Content selection was random or sequential, with no consideration for how long ago a user last saw a concept or how well they knew it.
Before: Random Content Selection
Content Pool Random Selection
┌──────────────────┐ ┌──────────────────┐
│ All Available │ │ No Memory Model │
│ Lessons │──────>│ - Random order │
│ [L1...L100] │ │ - No scheduling │
└──────────────────┘ │ - No retention │
└──────────────────┘
Users encountered the same concepts too frequently (wasting time) or too infrequently (leading to forgetting). There was no data-driven approach to determining when review would be most beneficial.
Half-Life Regression: Personalized Memory Modeling
HLR models memory decay with a simple but powerful formula:
Retention = 2 ^ (-t / hlr)
where:
t = time since last exposure (in days)
hlr = half-life of recall (concept-specific, learned from attempts)
The half-life represents how long it takes for retention probability to drop to 50%: at t = hlr, retention = 2^(-1) = 0.5. A high half-life means the concept is well-remembered; a low half-life means it's easily forgotten.
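To make the decay concrete, here is a minimal sketch; the half-life value and time points are illustrative, not drawn from our data:

```python
def retention(t_days, hlr):
    """Probability of recall t_days after the last exposure."""
    return 2 ** (-t_days / hlr)

# A concept with a 3-day half-life, checked 3 and 9 days after the last review:
print(retention(3.0, 3.0))  # 0.5   — exactly one half-life elapsed
print(retention(9.0, 3.0))  # 0.125 — three half-lives elapsed
```

Each additional half-life cuts retention in half again, which is why low-half-life concepts fall below the review threshold so quickly.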
After: HLR-Driven Scheduling
Content Pool HLR Scheduler Optimized Review
┌──────────────────┐ ┌──────────────────┐ ┌──────────────────┐
│ All Available │ │ Memory Science │ │ Right Content │
│ Lessons │──────>│ - Decay rate │──────>│ - Right time │
│ [L1...L100] │ │ - Retention calc │ │ - Max learning │
│ │ │ - Optimal timing │ │ - Min forgetting │
└──────────────────┘ └──────────────────┘ └──────────────────┘
How HLR Learns from Performance
Every time a user attempts a concept, the system updates the half-life based on success or failure:
Correct Attempt
If the user answers correctly, the half-life increases. The concept is being retained, so we can wait longer before reviewing it again:
new_hlr = old_hlr * (1 + success_factor * time_since_last_attempt)
A correct attempt after a long interval signals strong retention, so the half-life increases more dramatically.
Incorrect Attempt
If the user answers incorrectly, the half-life decreases. The concept is not being retained, so we need to review it sooner:
new_hlr = old_hlr * (1 - failure_penalty)
In practice, the failure_penalty term scales with how long it's been since the last attempt—forgetting after 2 days is more concerning than forgetting after 2 hours.
Initial Half-Life
For concepts never seen before, we assign a default half-life based on concept difficulty and user persona:
- Beginner users: 1 day (review sooner)
- Intermediate users: 3 days
- Advanced users: 7 days (review later)
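Putting the two update rules together, a sketch might look like the following; success_factor and failure_penalty are hypothetical tuning constants, not our production values:

```python
SUCCESS_FACTOR = 0.1    # per day since last attempt (hypothetical)
FAILURE_PENALTY = 0.3   # base fraction removed on a miss (hypothetical)

def update_half_life(old_hlr, correct, days_since_last_attempt):
    """Return the new half-life (in days) after one attempt."""
    if correct:
        # A correct answer after a long gap signals strong retention,
        # so the half-life grows more for larger intervals.
        return old_hlr * (1 + SUCCESS_FACTOR * days_since_last_attempt)
    # A miss shrinks the half-life so the concept comes back sooner.
    return old_hlr * (1 - FAILURE_PENALTY)
```

With these constants, a correct answer 5 days after the last attempt grows a 3-day half-life to 4.5 days, while a miss shrinks it to about 2.1 days.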
Calculating Retention Probability
At lesson generation time, the system calculates retention probability for all previously-seen concepts:
def calculate_retention(concept, user, current_time):
    """Retention probability (0.0 - 1.0) for a previously-seen concept."""
    last_attempt_time = get_last_attempt_time(concept, user)
    time_elapsed = current_time - last_attempt_time  # elapsed time in days
    hlr = get_half_life(concept, user)
    # Base-2 decay: retention is exactly 0.5 when time_elapsed == hlr
    retention = 2 ** (-time_elapsed / hlr)
    return retention
Concepts with retention below a threshold (typically 0.5 or 50%) are prioritized for review. Concepts with retention above 0.9 are considered mastered and excluded from review slots.
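A minimal sketch of this triage, using the thresholds above; the concept names and retention scores are illustrative:

```python
REVIEW_THRESHOLD = 0.5   # below this: due for review
MASTERY_THRESHOLD = 0.9  # above this: considered mastered

def triage(concept_retention):
    """Split {concept_id: retention} into due / hold / mastered lists."""
    due, hold, mastered = [], [], []
    for concept_id, r in concept_retention.items():
        if r < REVIEW_THRESHOLD:
            due.append(concept_id)
        elif r > MASTERY_THRESHOLD:
            mastered.append(concept_id)
        else:
            hold.append(concept_id)
    return due, hold, mastered

due, hold, mastered = triage({"casa": 0.35, "perro": 0.75, "hola": 0.95})
# due=["casa"], hold=["perro"], mastered=["hola"]
```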
Preventing Over-Practice and Under-Practice
HLR prevents two common inefficiencies:
1. Over-Practice (Wasted Time)
Without a memory model, users often review concepts they've already mastered. HLR identifies high-retention concepts (retention > 0.9) and deprioritizes them, freeing up study time for weaker material.
2. Under-Practice (Forgetting)
Conversely, users sometimes neglect concepts they've forgotten. HLR identifies low-retention concepts (retention < 0.3) and surfaces them immediately for review before the knowledge is completely lost.
The sweet spot is retention between 0.3 and 0.7—material that's challenging but not forgotten, maximizing learning efficiency.
Implementation Details
Our HLR implementation lives in src/services/personalization/hlr.py with supporting models in src/models/curriculum/.
Data Model
Each user-concept pair has an associated record tracking:
- last_attempt_at: Timestamp of most recent attempt
- half_life_days: Current half-life in days (updated after each attempt)
- correct_attempts: Total correct attempts
- total_attempts: Total attempts (correct + incorrect)
- mastery_level: Derived metric (0.0 - 1.0) based on accuracy and half-life
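One possible shape for that record, sketched as a plain dataclass; the field names follow the list above, but the actual persistence model may differ:

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class UserConceptMemory:
    """Per user-concept memory state tracked by the HLR service (sketch)."""
    user_id: str
    concept_id: str
    last_attempt_at: datetime   # timestamp of most recent attempt
    half_life_days: float       # current half-life, updated after each attempt
    correct_attempts: int = 0
    total_attempts: int = 0

    @property
    def accuracy(self):
        # Raw accuracy; mastery_level would combine this with half_life_days.
        if self.total_attempts == 0:
            return 0.0
        return self.correct_attempts / self.total_attempts
```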
HLR Service API
The service exposes two key methods:
class HLRService:
    def calculate_retention(self, user_id, concept_id):
        """Calculate current retention probability (0.0 - 1.0)"""

    def update_half_life(self, user_id, concept_id, correct):
        """Update half-life after attempt (correct=True/False)"""
Integration with Content Selection
The Content Duo selector calls calculate_retention() for all previously-seen concepts, then allocates the review slot to concepts with retention between 0.3 and 0.7 (optimal review window).
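A sketch of that selection step, assuming an hlr_service object exposing the calculate_retention() method shown above; sorting weakest-first is an assumption here, not necessarily the selector's actual ordering:

```python
REVIEW_WINDOW = (0.3, 0.7)  # optimal review window from the text

def pick_review_concepts(hlr_service, user_id, seen_concepts, slots):
    """Return up to `slots` concept ids inside the optimal review window."""
    lo, hi = REVIEW_WINDOW
    scored = [
        (hlr_service.calculate_retention(user_id, c), c) for c in seen_concepts
    ]
    in_window = [(r, c) for r, c in scored if lo <= r <= hi]
    in_window.sort()  # lowest retention first: review the weakest concepts
    return [c for _, c in in_window[:slots]]
```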
Adapting to Study Patterns
HLR gracefully handles irregular study schedules. If a user studies daily, half-lives stabilize around 1-3 days. If a user studies weekly, half-lives adapt to longer intervals (7-14 days).
This self-adjusting property makes HLR robust across different learner behaviors—from cramming students to casual learners.
Empirical Validation
Duolingo validated HLR through extensive A/B testing, showing:
- 13% improvement in retention compared to traditional spaced repetition (Leitner system)
- 25% reduction in time required to achieve the same mastery level
- Better long-term retention (measured at 30, 60, and 90 days post-learning)
Our implementation follows their proven approach, adapted to our content structure and user base.
Results: Data-Driven Review Scheduling
Since deploying HLR, we've achieved:
- Dynamic scheduling - Every concept reviewed at optimal intervals based on individual retention
- Automatic priority adjustment - Weak concepts surface sooner; strong concepts surface later
- Reduced over-practice - Users spend less time reviewing mastered material
- Personalized decay curves - Every user-concept pair has unique retention modeling
Visual Example: Retention Over Time
Consider a concept learned on Day 0:
Retention Probability
1.0 ┤●
│ ╲
0.9 ┤ ╲
│ ╲
0.8 ┤ ╲ ●───── (Correct attempt → hlr increases)
│ ╲ ╱
0.7 ┤ ╲ ╱
│ ╲ ╱
0.6 ┤ ╲ ╱
│ ╲ ╱
0.5 ┤ ╲ ╱ ← Review here (50% retention)
│ ●
0.4 ┤ ╱ ╲
│ ╱ ╲
0.3 ┤ ╱ ╲ ← If no review, forgetting continues
│ ╱ ╲
0.2 ┤ ╱ ●─────── (Incorrect → hlr decreases)
│
└─────┬─────┬─────┬─────┬─────┬────>
D0 D3 D6 D9 D12 Days
The curve steepens after an incorrect attempt (lower hlr) and flattens after a correct attempt (higher hlr).
Beyond Vocabulary: Extending HLR
While we currently apply HLR to vocabulary concepts, the algorithm generalizes to any learnable skill:
- Grammar rules - Track mastery of grammatical constructs
- Pronunciation - Model retention of phonetic patterns
- Cultural knowledge - Optimize review of cultural concepts
Challenges and Trade-offs
Cold Start Problem
New users have no attempt history, so we can't calculate personalized half-lives. We address this with persona-based defaults and rapid adaptation (half-lives update after just 2-3 attempts).
Computational Cost
Calculating retention for thousands of concepts at lesson generation time could be slow. We optimize by:
- Caching recently-calculated retention scores (5-minute TTL)
- Pre-filtering by persona to reduce concept pool size
- Indexing last_attempt_at for fast time-based queries
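A minimal in-process sketch of the TTL cache; our production cache may look different, and the injectable now parameter exists only for testability:

```python
import time

class RetentionCache:
    """TTL cache for retention scores, keyed by (user_id, concept_id)."""

    def __init__(self, ttl_seconds=300):  # 5-minute TTL, as above
        self.ttl = ttl_seconds
        self._store = {}  # (user_id, concept_id) -> (score, stored_at)

    def get(self, user_id, concept_id, now=None):
        now = time.time() if now is None else now
        entry = self._store.get((user_id, concept_id))
        if entry is None:
            return None
        score, stored_at = entry
        if now - stored_at > self.ttl:
            del self._store[(user_id, concept_id)]  # expired: force recompute
            return None
        return score

    def put(self, user_id, concept_id, score, now=None):
        now = time.time() if now is None else now
        self._store[(user_id, concept_id)] = (score, now)
```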
Balancing Exploration vs. Exploitation
Pure HLR-driven selection can create "review loops" where users never encounter new material. Our slot-based system (40% new, 30% review, 30% challenge) ensures balanced exposure.
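The 40/30/30 split can be sketched as follows; lesson_size and the rounding rule are illustrative:

```python
def allocate_slots(lesson_size):
    """Return (new, review, challenge) slot counts for one lesson."""
    new = round(lesson_size * 0.4)
    review = round(lesson_size * 0.3)
    challenge = lesson_size - new - review  # remainder keeps the total exact
    return new, review, challenge

# A 10-item lesson yields 4 new, 3 review, and 3 challenge slots.
```

Giving the challenge bucket the remainder guarantees the three counts always sum to lesson_size, even when the percentages don't divide evenly.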
What's Next for HLR
Future enhancements include:
- Cross-skill transfer - Use vocabulary mastery to inform grammar difficulty
- Contextual retention - Track retention separately for recognition vs. production tasks
- Collaborative filtering - Use population-level data to improve individual predictions
- Confidence intervals - Provide uncertainty estimates for retention predictions
HLR transforms spaced repetition from a fixed schedule (review on days 1, 3, 7, 14...) to a personalized, adaptive system that responds to individual learning patterns. Every review happens exactly when it's most beneficial—no sooner, no later.
Implementation Files: src/services/personalization/hlr.py
Commits: 7d558b2, 4315731, 8465fd1
Algorithm: Half-Life Regression (Duolingo's proven model)