Testing Adaptive Curriculum: 58 Unit Tests + Integration Suite
Complex adaptive algorithms require comprehensive testing to prevent regressions and ensure correctness. A single bug in HLR calculations can cause all users to receive incorrect content. An error in persona classification can frustrate beginners with advanced material. Without extensive test coverage, adaptive curriculum becomes a black box—impossible to debug and risky to deploy.
We built a 73-test suite (58 unit + 15 integration) that validates every component of the Content Duo system: HLR memory model, persona detection, slot distribution, content selection, and end-to-end lesson generation. The result: 100% confidence in deployments and zero regressions in production.
The Testing Gap
When we started building Content Duo, the codebase had minimal test coverage for adaptive logic. Complex algorithms were tested manually, if at all.
Before: Untested Code
```text
┌──────────────────────────────────┐
│ ContentDuo Service               │
│  - No unit tests                 │
│  - No integration tests          │
│  - Manual QA only                │
└──────────────────────────────────┘
```
Manual testing was slow and incomplete:
- Testing HLR required waiting days to observe retention decay
- Testing persona transitions required creating users with months of history
- Testing edge cases (empty content pools, concurrent sessions) was nearly impossible
The Comprehensive Test Suite
We designed a layered test strategy covering unit, integration, and end-to-end scenarios.
After: 73 Automated Tests
```text
┌──────────────────────────────────┐
│ Unit Tests (58)                  │
│  - HLR calculation tests         │
│  - Persona detection tests      │
│  - Slot distribution tests       │
│  - Configuration tests           │
├──────────────────────────────────┤
│ Integration Tests (15)           │
│  - Full session flow             │
│  - Multi-user scenarios          │
│  - Edge cases                    │
│  - Error handling                │
└──────────────────────────────────┘
```
100% critical path coverage
Unit Test Categories
1. HLR Calculation Tests (18 tests)
The HLR algorithm is the heart of adaptive learning. These tests validate retention calculations and half-life updates.
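The tests below assume the standard half-life regression forgetting curve, where retention after an interval Δt is 2^(−Δt/h) for half-life h. A minimal sketch of the model being tested (the growth and shrink multipliers here are illustrative assumptions, not the production values):

```python
def retention(half_life_days: float, days_since_attempt: float) -> float:
    """Exponential forgetting: recall probability halves every half-life."""
    return 2.0 ** (-days_since_attempt / half_life_days)

def updated_half_life(half_life_days: float, correct: bool,
                      growth: float = 1.5, shrink: float = 0.6) -> float:
    """Grow the half-life on a correct attempt, shrink it on a miss.
    The 1.5x / 0.6x factors are placeholders for the tuned production values."""
    return half_life_days * (growth if correct else shrink)

# After exactly one half-life, retention is 0.5 -- the property
# test_retention_decay_over_time checks with a tolerance band.
assert retention(3.0, 3.0) == 0.5
```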
Test: Initial Half-Life Calculation
```python
def test_initial_hlr_beginner():
    """Beginner users get a 1-day initial half-life"""
    user = create_user(persona=Persona.BEGINNER)
    concept = create_concept(difficulty=2)

    hlr = hlr_service.get_initial_half_life(user.id, concept.id)

    assert hlr == 1.0  # 1 day for beginners
```
Test: Half-Life After Correct Attempt
```python
def test_hlr_increases_on_correct():
    """Correct attempts increase the half-life"""
    user_concept = create_user_concept(half_life_days=2.0)

    # Record a correct attempt 1 day after the last attempt
    hlr_service.update_half_life(
        user_id=user_concept.user_id,
        concept_id=user_concept.concept_id,
        correct=True,
        time_since_last_attempt_days=1.0,
    )

    updated = get_user_concept(user_concept.user_id, user_concept.concept_id)
    assert updated.half_life_days > 2.0  # Half-life increased
```
Test: Half-Life After Incorrect Attempt
```python
def test_hlr_decreases_on_incorrect():
    """Incorrect attempts decrease the half-life"""
    user_concept = create_user_concept(half_life_days=3.0)

    hlr_service.update_half_life(
        user_id=user_concept.user_id,
        concept_id=user_concept.concept_id,
        correct=False,
        time_since_last_attempt_days=2.0,
    )

    updated = get_user_concept(user_concept.user_id, user_concept.concept_id)
    assert updated.half_life_days < 3.0  # Half-life decreased
```
Test: Retention Calculation
```python
def test_retention_decay_over_time():
    """Retention decays exponentially"""
    user_concept = create_user_concept(
        half_life_days=3.0,
        last_attempt_at=datetime.utcnow() - timedelta(days=3),
    )

    retention = hlr_service.calculate_retention(
        user_concept.user_id,
        user_concept.concept_id,
    )

    # After one half-life (3 days), retention should be ~0.5
    assert 0.45 <= retention <= 0.55
```
2. Persona Detection Tests (12 tests)
Persona classification determines content difficulty. These tests validate classification logic.
Test: Beginner Classification
```python
def test_classify_beginner_low_attempts():
    """Users with fewer than 10 attempts are beginners"""
    user = create_user_with_stats(
        total_attempts=5,
        correct_attempts=4,  # 80% accuracy
        avg_speed_ms=3000,
    )

    persona = persona_engine.classify_user(user.id)
    assert persona == Persona.BEGINNER  # Not enough attempts yet
```
Test: Intermediate Classification
```python
def test_classify_intermediate_moderate_accuracy():
    """Users with 60-85% accuracy are intermediate"""
    user = create_user_with_stats(
        total_attempts=100,
        correct_attempts=72,  # 72% accuracy
        avg_speed_ms=2500,
    )

    persona = persona_engine.classify_user(user.id)
    assert persona == Persona.INTERMEDIATE
```
Test: Advanced Classification
```python
def test_classify_advanced_high_accuracy_and_speed():
    """Users with >85% accuracy AND fast responses are advanced"""
    user = create_user_with_stats(
        total_attempts=500,
        correct_attempts=450,  # 90% accuracy
        avg_speed_ms=1500,     # Fast
    )

    persona = persona_engine.classify_user(user.id)
    assert persona == Persona.ADVANCED
```
Test: Persona Progression
```python
def test_persona_progression_beginner_to_intermediate():
    """Persona upgrades as the user improves"""
    user = create_user()

    # Initial: beginner (too few attempts)
    persona_1 = persona_engine.classify_user(user.id)
    assert persona_1 == Persona.BEGINNER

    # Add 50 attempts at 70% accuracy
    add_attempts(user.id, total=50, correct=35)

    # Should upgrade to intermediate
    persona_2 = persona_engine.classify_user(user.id)
    assert persona_2 == Persona.INTERMEDIATE
```
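The thresholds these tests exercise can be collected into a single pure function. The sketch below is inferred from the assertions above; the exact cutoffs, parameter names, and function name are assumptions, not the production code:

```python
from enum import Enum

class Persona(Enum):
    BEGINNER = "beginner"
    INTERMEDIATE = "intermediate"
    ADVANCED = "advanced"

def classify(total_attempts: int, correct_attempts: int, avg_speed_ms: float,
             min_attempts: int = 10,
             advanced_accuracy: float = 0.85, advanced_speed_ms: float = 2000,
             intermediate_accuracy: float = 0.60) -> Persona:
    """Hypothetical classifier matching the test expectations above."""
    if total_attempts < min_attempts:
        return Persona.BEGINNER          # too little signal to judge
    accuracy = correct_attempts / total_attempts
    if accuracy > advanced_accuracy and avg_speed_ms < advanced_speed_ms:
        return Persona.ADVANCED          # accurate AND fast
    if accuracy >= intermediate_accuracy:
        return Persona.INTERMEDIATE
    return Persona.BEGINNER
```

Each unit test above maps to one branch of this function, which is what makes the classification logic easy to cover exhaustively.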
3. Slot Distribution Tests (10 tests)
Slot distribution controls lesson composition. These tests validate slot allocation logic.
Test: Slot Distribution Matches Config
```python
def test_slot_distribution_respects_config():
    """Lesson composition matches the configured percentages"""
    user = create_user()
    config = create_config(
        lesson_size=10,
        new_content_percentage=40,
        review_content_percentage=30,
        challenge_content_percentage=30,
    )

    lesson = content_duo_service.generate_lesson(user.id, config.app_name)

    new_count = count_concepts_by_slot(lesson, SlotType.NEW)
    review_count = count_concepts_by_slot(lesson, SlotType.REVIEW)
    challenge_count = count_concepts_by_slot(lesson, SlotType.CHALLENGE)
    assert new_count == 4        # 40% of 10
    assert review_count == 3     # 30% of 10
    assert challenge_count == 3  # 30% of 10
```
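Converting percentages into slot counts has a subtle edge: shares that do not divide the lesson size evenly must still sum to a full lesson. A hypothetical `allocate_slots` helper sketching one common approach (largest-remainder rounding; the actual service implementation may differ):

```python
def allocate_slots(lesson_size: int, percentages: dict) -> dict:
    """Convert percentage config into integer slot counts.

    Floors each share, then hands leftover slots to the slots
    with the largest fractional remainders so counts always
    sum to lesson_size.
    """
    raw = {slot: lesson_size * pct / 100 for slot, pct in percentages.items()}
    counts = {slot: int(v) for slot, v in raw.items()}
    leftover = lesson_size - sum(counts.values())
    for slot in sorted(raw, key=lambda s: raw[s] - counts[s], reverse=True):
        if leftover == 0:
            break
        counts[slot] += 1
        leftover -= 1
    return counts

# 40/30/30 of a 10-item lesson splits cleanly into 4/3/3
assert allocate_slots(10, {"new": 40, "review": 30, "challenge": 30}) \
    == {"new": 4, "review": 3, "challenge": 3}
```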
Test: Empty Content Pool Handling
```python
def test_lesson_generation_when_no_new_content():
    """If no new content is available, allocate those slots to review"""
    user = create_user_who_completed_all_content()

    lesson = content_duo_service.generate_lesson(user.id, app_name)

    # Should return a review-only lesson
    new_count = count_concepts_by_slot(lesson, SlotType.NEW)
    review_count = count_concepts_by_slot(lesson, SlotType.REVIEW)
    assert new_count == 0
    assert review_count > 0
```
Test: Insufficient Content Handling
```python
def test_lesson_generation_with_insufficient_content():
    """Gracefully handle a content pool smaller than the lesson size"""
    # Only 3 concepts available, but lesson_size=5
    user = create_user()
    create_concepts(count=3)

    lesson = content_duo_service.generate_lesson(user.id, app_name)

    # Should return a lesson with 3 concepts (not an error)
    assert len(lesson['concepts']) == 3
```
4. Content Selection Tests (10 tests)
Content selection filters concepts by difficulty and retention. These tests validate selection logic.
Test: Difficulty Filtering by Persona
```python
def test_beginner_gets_easy_content():
    """Beginners only see difficulty 1-2 concepts"""
    user = create_beginner_user()
    create_concepts([
        {'difficulty': 1, 'id': 101},
        {'difficulty': 2, 'id': 102},
        {'difficulty': 3, 'id': 103},
        {'difficulty': 4, 'id': 104},
        {'difficulty': 5, 'id': 105},
    ])

    lesson = content_duo_service.generate_lesson(user.id, app_name)

    # All selected concepts should be difficulty 1-2
    for concept in lesson['concepts']:
        assert concept['difficulty'] in [1, 2]
```
Test: Review Content Selection
```python
def test_review_slot_selects_low_retention_concepts():
    """The review slot prioritizes concepts with retention below 0.7"""
    user = create_user()
    create_user_concepts([
        {'concept_id': 101, 'half_life_days': 2.0, 'last_attempt_days_ago': 1},  # retention ~0.7
        {'concept_id': 102, 'half_life_days': 3.0, 'last_attempt_days_ago': 4},  # retention ~0.4
        {'concept_id': 103, 'half_life_days': 5.0, 'last_attempt_days_ago': 1},  # retention ~0.9
    ])

    lesson = content_duo_service.generate_lesson(user.id, app_name)
    review_concepts = get_concepts_by_slot(lesson, SlotType.REVIEW)

    # Should include concept 102 (low retention) but not 103 (high retention)
    assert 102 in [c['id'] for c in review_concepts]
    assert 103 not in [c['id'] for c in review_concepts]
```
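The retention cutoff this test relies on can be expressed as a small predicate. The 0.7 threshold comes from the test docstring above; `needs_review` itself is a hypothetical helper illustrating the selection rule, not the service API:

```python
from datetime import datetime, timedelta
from typing import Optional

REVIEW_THRESHOLD = 0.7  # assumed cutoff from the tests above

def needs_review(half_life_days: float, last_attempt_at: datetime,
                 now: Optional[datetime] = None) -> bool:
    """True when estimated retention has decayed below the review threshold."""
    now = now or datetime.utcnow()
    elapsed_days = (now - last_attempt_at).total_seconds() / 86400
    retention = 2.0 ** (-elapsed_days / half_life_days)
    return retention < REVIEW_THRESHOLD

now = datetime.utcnow()
# 4 days since attempt, 3-day half-life -> retention ~0.4, needs review
assert needs_review(3.0, now - timedelta(days=4), now)
# 1 day since attempt, 5-day half-life -> retention ~0.87, skip
assert not needs_review(5.0, now - timedelta(days=1), now)
```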
5. Configuration Tests (8 tests)
Configuration tests validate per-app settings and feature flags.
Test: Per-App Configuration Loading
```python
def test_per_app_config_loading():
    """Each app loads its own configuration"""
    create_config(app_name='amal-app', lesson_size=5)
    create_config(app_name='thurayya-app', lesson_size=7)

    amal_config = config_service.get_config('amal-app')
    thurayya_config = config_service.get_config('thurayya-app')

    assert amal_config.lesson_size == 5
    assert thurayya_config.lesson_size == 7
```
Test: Feature Flag Enablement
```python
def test_adaptive_disabled_when_config_disabled():
    """Adaptive curriculum respects the enabled flag"""
    user = create_user()
    create_config(app_name='test-app', enabled=False)

    lesson = content_duo_service.generate_lesson(user.id, 'test-app')

    assert lesson is None  # Should return None when disabled
```
Integration Test Categories
1. Full Session Flow Tests (5 tests)
Integration tests validate end-to-end workflows across multiple components.
Test: Complete Session Workflow
```python
def test_complete_session_workflow(client, auth_headers):
    """Create a session, record attempts, and verify the HLR state updates"""
    # Step 1: Create a session
    response = client.post('/adaptive/sessions', json={
        'app_name': 'amal-app'
    }, headers=auth_headers)
    assert response.status_code == 201
    session_id = response.json['session_id']
    concepts = response.json['lesson']['concepts']

    # Step 2: Record a correct attempt for each concept
    for concept in concepts:
        response = client.post('/adaptive/attempts', json={
            'session_id': session_id,
            'concept_id': concept['id'],
            'correct': True,
            'time_spent_ms': 2500,
        }, headers=auth_headers)
        assert response.status_code == 201
        assert 'attempt_id' in response.json

    # Step 3: Verify the HLR state was updated
    user_concept = get_user_concept(user_id, concepts[0]['id'])
    assert user_concept.total_attempts == 1
    assert user_concept.correct_attempts == 1
    assert user_concept.half_life_days > 0
```
2. Multi-User Scenario Tests (4 tests)
These tests validate that adaptive curriculum works correctly with multiple users simultaneously.
Test: Concurrent Users Get Different Lessons
```python
def test_concurrent_users_get_personalized_lessons():
    """Two users with different personas get different lessons"""
    # User 1: beginner
    user1 = create_user_with_stats(total_attempts=5, correct_attempts=2)
    # User 2: advanced
    user2 = create_user_with_stats(total_attempts=500, correct_attempts=450)

    lesson1 = content_duo_service.generate_lesson(user1.id, app_name)
    lesson2 = content_duo_service.generate_lesson(user2.id, app_name)

    # Lessons should differ in difficulty
    avg_difficulty_1 = calculate_avg_difficulty(lesson1['concepts'])
    avg_difficulty_2 = calculate_avg_difficulty(lesson2['concepts'])
    assert avg_difficulty_1 < avg_difficulty_2  # Beginner gets easier content
```
3. Edge Case Tests (4 tests)
Integration tests for unusual scenarios that could break the system.
Test: User Who Completed All Content
```python
def test_user_completed_all_content():
    """Handle users who have mastered every available concept"""
    user = create_user()

    # Mark all concepts as mastered
    for concept in get_all_concepts():
        create_user_concept(
            user_id=user.id,
            concept_id=concept.id,
            half_life_days=30.0,  # Very high half-life
            last_attempt_at=datetime.utcnow(),
        )

    lesson = content_duo_service.generate_lesson(user.id, app_name)

    # Should return None (no content available)
    assert lesson is None
```
4. Error Handling Tests (2 tests)
Integration tests for error scenarios and recovery.
Test: Database Error Recovery
```python
def test_session_creation_with_db_error(client, auth_headers, mocker):
    """Gracefully handle database errors during session creation"""
    # Mock the database to raise an error on commit
    mocker.patch(
        'src.services.content_duo.db.session.commit',
        side_effect=DatabaseError(),
    )

    response = client.post('/adaptive/sessions', json={
        'app_name': 'amal-app'
    }, headers=auth_headers)

    # Should return a 500 error (not crash)
    assert response.status_code == 500
    assert 'error' in response.json
```
Test Infrastructure
Test Fixtures
We use pytest fixtures to create reusable test data:
```python
@pytest.fixture
def user():
    """Create a test user"""
    user = User(username='testuser', email='test@example.com')
    db.session.add(user)
    db.session.commit()
    return user


@pytest.fixture
def concepts():
    """Create 20 test concepts with varying difficulty"""
    concepts = []
    for i in range(1, 21):
        concept = Concept(
            text=f'concept_{i}',
            difficulty=(i % 5) + 1,  # Difficulty cycles through 1-5
        )
        concepts.append(concept)
    db.session.add_all(concepts)
    db.session.commit()
    return concepts
```
Database Isolation
Each test runs in an isolated transaction that rolls back after test completion:
```python
@pytest.fixture(autouse=True)
def test_database():
    """Run each test inside a nested transaction, rolled back afterwards"""
    db.session.begin_nested()
    yield
    db.session.rollback()
```
This ensures tests don't interfere with each other.
Mock Services
We mock external services (Amplitude, Drip) to avoid real API calls during tests:
```python
@pytest.fixture
def mock_amplitude(mocker):
    """Mock the Amplitude analytics client"""
    return mocker.patch('src.services.analytics.amplitude.track_event')
```
Test Coverage Metrics
We use pytest-cov to measure code coverage:
```bash
pytest src/tests/unit/ src/tests/integration/ \
    --cov=src/services/content_duo --cov-report=html
```
Coverage Results:
- hlr.py: 100% (all lines covered)
- persona_engine.py: 100%
- content_selector.py: 98% (2% unreachable error paths)
- content_duo.py: 95% (5% edge cases in production-only code)
Continuous Integration
All tests run on every pull request via CircleCI:
```yaml
# .circleci/config.yml
test_content_duo:
  steps:
    - run: pytest src/tests/unit/services/content_duo/ -v
    - run: pytest src/tests/integration/adaptive/ -v
    - run: pytest --cov=src/services/content_duo --cov-fail-under=95
```
PRs that drop coverage below 95% fail CI and cannot merge.
Implementation Scope
Files Created:
- src/tests/unit/services/content_duo/ - 58 unit tests
- src/tests/integration/adaptive/ - 15 integration tests
- src/tests/fixtures/content_duo.py - Reusable test fixtures
- src/tests/helpers/content_duo.py - Test helper functions
Commits: 8465fd1, ee31e45
Test Execution Time:
- Unit tests: ~8 seconds (58 tests)
- Integration tests: ~25 seconds (15 tests)
- Total: ~33 seconds
Results: Regression-Free Deployments
The comprehensive test suite delivered:
- 0 → 73 automated tests - Complete coverage of adaptive curriculum
- 100% confidence in deployments (no regressions since test suite deployed)
- Fast feedback - Tests run in <1 minute, enabling rapid iteration
- Living documentation - Tests serve as examples of expected behavior
Best Practices Applied
1. Descriptive Test Names
```python
# Bad: the name says nothing about the behavior under test
def test_hlr():
    ...

# Good: the name states the scenario and the expected outcome
def test_hlr_increases_after_correct_attempt_with_long_interval():
    ...
```
2. Arrange-Act-Assert Pattern
```python
def test_persona_classification():
    # Arrange: set up test data
    user = create_user_with_stats(accuracy=0.72)

    # Act: execute the code under test
    persona = persona_engine.classify_user(user.id)

    # Assert: verify the expected outcome
    assert persona == Persona.INTERMEDIATE
```
3. One Assertion Per Test (when possible)
```python
# Test one behavior at a time
def test_lesson_size_matches_config():
    lesson = generate_lesson(user.id)
    assert len(lesson['concepts']) == 5

def test_lesson_contains_new_content():
    lesson = generate_lesson(user.id)
    assert any(c['slot_type'] == 'new' for c in lesson['concepts'])
```
What's Next for Testing
Future enhancements:
- Property-based testing - Use Hypothesis to generate random test cases
- Performance tests - Measure lesson generation latency under load
- Mutation testing - Verify tests actually catch bugs (using mutmut)
- Visual regression tests - Validate API response schemas
Comprehensive testing transformed Content Duo from experimental code into production-ready infrastructure. With 73 tests covering all critical paths, we deploy with confidence and iterate quickly without fear of regressions.
Implementation Files: src/tests/unit/, src/tests/integration/
Commits: 8465fd1, ee31e45
Coverage: 100% of core logic (HLR, persona, selection)