
Testing Adaptive Curriculum: 58 Unit Tests + Integration Suite

Complex adaptive algorithms require comprehensive testing to prevent regressions and ensure correctness. A single bug in HLR (half-life regression) calculations can cause all users to receive incorrect content. An error in persona classification can frustrate beginners with advanced material. Without extensive test coverage, adaptive curriculum becomes a black box: impossible to debug and risky to deploy.

We built a 73-test suite (58 unit + 15 integration) that validates every component of the Content Duo system: HLR memory model, persona detection, slot distribution, content selection, and end-to-end lesson generation. The result: 100% confidence in deployments and zero regressions in production.

The Testing Gap

When we started building Content Duo, the codebase had minimal test coverage for adaptive logic. Complex algorithms were tested manually, if at all.

Before: Untested Code

Untested Code
┌──────────────────────────────────────┐
│ ContentDuo Service                   │
│ - No unit tests                      │
│ - No integration tests               │
│ - Manual QA only                     │
└──────────────────────────────────────┘

Manual testing was slow and incomplete:

  • Testing HLR required waiting days to observe retention decay
  • Testing persona transitions required creating users with months of history
  • Testing edge cases (empty content pools, concurrent sessions) was nearly impossible

The Comprehensive Test Suite

We designed a layered test strategy covering unit, integration, and end-to-end scenarios.

After: 73 Automated Tests

Comprehensive Test Suite
┌──────────────────────────────────────┐
│ Unit Tests (58)                      │
│ - HLR calculation tests              │
│ - Persona detection tests            │
│ - Slot distribution tests            │
│ - Configuration tests                │
├──────────────────────────────────────┤
│ Integration Tests (15)               │
│ - Full session flow                  │
│ - Multi-user scenarios               │
│ - Edge cases                         │
│ - Error handling                     │
└──────────────────────────────────────┘
    100% critical path coverage

Unit Test Categories

1. HLR Calculation Tests (18 tests)

The HLR algorithm is the heart of adaptive learning. These tests validate retention calculations and half-life updates.
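As background, the forgetting model these tests exercise can be sketched in a few lines. This is a simplified illustration of exponential decay with multiplicative half-life updates, not the real hlr_service; the growth and shrink multipliers are assumptions for illustration.

```python
import math

def calculate_retention(half_life_days, days_since_last_attempt):
    """Exponential forgetting: retention halves every half_life_days."""
    return math.pow(2, -days_since_last_attempt / half_life_days)

def update_half_life(half_life_days, correct, growth=1.5, shrink=0.5):
    """Correct attempts stretch the half-life; incorrect ones shrink it."""
    return half_life_days * (growth if correct else shrink)
```

With this model, calculate_retention(3.0, 3.0) returns exactly 0.5, which is why the retention-decay test below asserts a band around 0.5 after one half-life.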

Test: Initial Half-Life Calculation

def test_initial_hlr_beginner():
    """Beginner users get 1-day initial half-life"""
    user = create_user(persona=Persona.BEGINNER)
    concept = create_concept(difficulty=2)

    hlr = hlr_service.get_initial_half_life(user.id, concept.id)

    assert hlr == 1.0  # 1 day for beginners

Test: Half-Life After Correct Attempt

def test_hlr_increases_on_correct():
    """Correct attempts increase half-life"""
    user_concept = create_user_concept(half_life_days=2.0)

    # Record correct attempt 1 day after last attempt
    hlr_service.update_half_life(
        user_id=user_concept.user_id,
        concept_id=user_concept.concept_id,
        correct=True,
        time_since_last_attempt_days=1.0
    )

    updated = get_user_concept(user_concept.user_id, user_concept.concept_id)
    assert updated.half_life_days > 2.0  # Half-life increased

Test: Half-Life After Incorrect Attempt

def test_hlr_decreases_on_incorrect():
    """Incorrect attempts decrease half-life"""
    user_concept = create_user_concept(half_life_days=3.0)

    hlr_service.update_half_life(
        user_id=user_concept.user_id,
        concept_id=user_concept.concept_id,
        correct=False,
        time_since_last_attempt_days=2.0
    )

    updated = get_user_concept(user_concept.user_id, user_concept.concept_id)
    assert updated.half_life_days < 3.0  # Half-life decreased

Test: Retention Calculation

def test_retention_decay_over_time():
    """Retention decays exponentially"""
    user_concept = create_user_concept(
        half_life_days=3.0,
        last_attempt_at=datetime.utcnow() - timedelta(days=3)
    )

    retention = hlr_service.calculate_retention(
        user_concept.user_id,
        user_concept.concept_id
    )

    # After 1 half-life (3 days), retention should be ~0.5
    assert 0.45 <= retention <= 0.55

2. Persona Detection Tests (12 tests)

Persona classification determines content difficulty. These tests validate classification logic.
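The thresholds these tests encode can be read off directly from the test data. A minimal classifier sketch follows; the 2000 ms speed cutoff in particular is a guess consistent with the tests, not the real persona_engine value.

```python
from enum import Enum

class Persona(Enum):
    BEGINNER = "beginner"
    INTERMEDIATE = "intermediate"
    ADVANCED = "advanced"

def classify(total_attempts, correct_attempts, avg_speed_ms):
    """Classify a user from aggregate stats (thresholds inferred from the tests)."""
    # Too little history to judge: treat as beginner regardless of accuracy.
    if total_attempts < 10:
        return Persona.BEGINNER
    accuracy = correct_attempts / total_attempts
    # High accuracy AND fast responses -> advanced (speed cutoff assumed).
    if accuracy > 0.85 and avg_speed_ms < 2000:
        return Persona.ADVANCED
    if accuracy >= 0.60:
        return Persona.INTERMEDIATE
    return Persona.BEGINNER
```

Note that the attempt-count guard comes first: a user with 80% accuracy but only 5 attempts still classifies as a beginner, exactly as the first test below asserts.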

Test: Beginner Classification

def test_classify_beginner_low_attempts():
    """Users with <10 attempts are beginners"""
    user = create_user_with_stats(
        total_attempts=5,
        correct_attempts=4,  # 80% accuracy
        avg_speed_ms=3000
    )

    persona = persona_engine.classify_user(user.id)
    assert persona == Persona.BEGINNER  # Not enough attempts

Test: Intermediate Classification

def test_classify_intermediate_moderate_accuracy():
    """Users with 60-85% accuracy are intermediate"""
    user = create_user_with_stats(
        total_attempts=100,
        correct_attempts=72,  # 72% accuracy
        avg_speed_ms=2500
    )

    persona = persona_engine.classify_user(user.id)
    assert persona == Persona.INTERMEDIATE

Test: Advanced Classification

def test_classify_advanced_high_accuracy_and_speed():
    """Users with >85% accuracy AND fast speed are advanced"""
    user = create_user_with_stats(
        total_attempts=500,
        correct_attempts=450,  # 90% accuracy
        avg_speed_ms=1500  # Fast
    )

    persona = persona_engine.classify_user(user.id)
    assert persona == Persona.ADVANCED

Test: Persona Progression

def test_persona_progression_beginner_to_intermediate():
    """Persona upgrades as user improves"""
    user = create_user()

    # Initial: Beginner (low attempts)
    persona_1 = persona_engine.classify_user(user.id)
    assert persona_1 == Persona.BEGINNER

    # Add 50 attempts with 70% accuracy
    add_attempts(user.id, total=50, correct=35)

    # Should upgrade to intermediate
    persona_2 = persona_engine.classify_user(user.id)
    assert persona_2 == Persona.INTERMEDIATE

3. Slot Distribution Tests (10 tests)

Slot distribution controls lesson composition. These tests validate slot allocation logic.
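The allocation itself is simple percentage arithmetic. A sketch, assuming a policy of giving integer-division leftovers to the first slot (that leftover policy is an assumption, not necessarily what content_duo_service does):

```python
def allocate_slots(lesson_size, percentages):
    """Split lesson_size into per-slot counts by percentage.

    Integer division can leave a remainder; here it goes to the
    first slot (an assumed policy for illustration).
    """
    counts = {slot: (lesson_size * pct) // 100 for slot, pct in percentages.items()}
    leftover = lesson_size - sum(counts.values())
    if leftover:
        first_slot = next(iter(counts))
        counts[first_slot] += leftover
    return counts
```

For a lesson_size of 10 with a 40/30/30 split this yields 4/3/3, matching the assertions in the config test below; odd sizes like 7 still sum to the full lesson size.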

Test: Slot Distribution Matches Config

def test_slot_distribution_respects_config():
    """Lesson composition matches config percentages"""
    user = create_user()
    config = create_config(
        app_name='test-app',
        lesson_size=10,
        new_content_percentage=40,
        review_content_percentage=30,
        challenge_content_percentage=30
    )

    lesson = content_duo_service.generate_lesson(user.id, config.app_name)

    new_count = count_concepts_by_slot(lesson, SlotType.NEW)
    review_count = count_concepts_by_slot(lesson, SlotType.REVIEW)
    challenge_count = count_concepts_by_slot(lesson, SlotType.CHALLENGE)

    assert new_count == 4  # 40% of 10
    assert review_count == 3  # 30% of 10
    assert challenge_count == 3  # 30% of 10

Test: Empty Content Pool Handling

def test_lesson_generation_when_no_new_content():
    """If no new content available, allocate to review"""
    user = create_user_who_completed_all_content()

    lesson = content_duo_service.generate_lesson(user.id, app_name)

    # Should return review-only lesson
    new_count = count_concepts_by_slot(lesson, SlotType.NEW)
    review_count = count_concepts_by_slot(lesson, SlotType.REVIEW)

    assert new_count == 0
    assert review_count > 0

Test: Insufficient Content Handling

def test_lesson_generation_with_insufficient_content():
    """Gracefully handle a content pool smaller than the lesson size"""
    # Only 3 concepts available, but lesson_size=5
    user = create_user()
    create_concepts(count=3)

    lesson = content_duo_service.generate_lesson(user.id, app_name)

    # Should return a lesson with 3 concepts (not an error)
    assert len(lesson['concepts']) == 3

4. Content Selection Tests (10 tests)

Content selection filters concepts by difficulty and retention. These tests validate selection logic.
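Conceptually, selection is two filters plus a sort. A sketch of that logic; the beginner difficulty cap and the 0.7 retention threshold come from the tests, while the intermediate/advanced caps are assumptions:

```python
# Beginner cap (difficulty <= 2) matches the tests; the other caps are assumed.
DIFFICULTY_CAP = {"beginner": 2, "intermediate": 4, "advanced": 5}

def select_review_candidates(concepts, persona, retention_threshold=0.7):
    """Keep concepts at or below the persona's difficulty cap whose
    retention has fallen below the threshold, weakest memories first."""
    cap = DIFFICULTY_CAP[persona]
    eligible = [
        c for c in concepts
        if c["difficulty"] <= cap and c["retention"] < retention_threshold
    ]
    # Sort ascending by retention so the lowest-retention concepts fill the slots.
    return sorted(eligible, key=lambda c: c["retention"])
```

Under this sketch, a beginner never sees a difficulty-5 concept even if its retention is near zero, and a concept at 0.9 retention is never picked for review, mirroring the two tests below.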

Test: Difficulty Filtering by Persona

def test_beginner_gets_easy_content():
    """Beginners only see difficulty 1-2 concepts"""
    user = create_beginner_user()
    create_concepts([
        {'difficulty': 1, 'id': 101},
        {'difficulty': 2, 'id': 102},
        {'difficulty': 3, 'id': 103},
        {'difficulty': 4, 'id': 104},
        {'difficulty': 5, 'id': 105}
    ])

    lesson = content_duo_service.generate_lesson(user.id, app_name)

    # All concepts should be difficulty 1-2
    for concept in lesson['concepts']:
        assert concept['difficulty'] in [1, 2]

Test: Review Content Selection

def test_review_slot_selects_low_retention_concepts():
    """Review slot prioritizes concepts with retention < 0.7"""
    user = create_user()
    create_user_concepts([
        {'concept_id': 101, 'half_life_days': 2.0, 'last_attempt_days_ago': 1},  # retention: ~0.7
        {'concept_id': 102, 'half_life_days': 3.0, 'last_attempt_days_ago': 4},  # retention: ~0.4
        {'concept_id': 103, 'half_life_days': 5.0, 'last_attempt_days_ago': 1},  # retention: ~0.9
    ])

    lesson = content_duo_service.generate_lesson(user.id, app_name)

    review_concepts = get_concepts_by_slot(lesson, SlotType.REVIEW)

    # Should include concept 102 (low retention) but not 103 (high retention)
    assert 102 in [c['id'] for c in review_concepts]
    assert 103 not in [c['id'] for c in review_concepts]

5. Configuration Tests (8 tests)

Configuration tests validate per-app settings and feature flags.

Test: Per-App Configuration Loading

def test_per_app_config_loading():
    """Each app loads its own configuration"""
    create_config(app_name='amal-app', lesson_size=5)
    create_config(app_name='thurayya-app', lesson_size=7)

    amal_config = config_service.get_config('amal-app')
    thurayya_config = config_service.get_config('thurayya-app')

    assert amal_config.lesson_size == 5
    assert thurayya_config.lesson_size == 7

Test: Feature Flag Enablement

def test_adaptive_disabled_when_config_disabled():
    """Adaptive curriculum respects enabled flag"""
    user = create_user()
    create_config(app_name='test-app', enabled=False)

    lesson = content_duo_service.generate_lesson(user.id, 'test-app')

    assert lesson is None  # Should return None when disabled

Integration Test Categories

1. Full Session Flow Tests (5 tests)

Integration tests validate end-to-end workflows across multiple components.

Test: Complete Session Workflow

def test_complete_session_workflow(client, auth_headers):
    """Test creating session, recording attempts, completing session"""
    # Step 1: Create session
    response = client.post('/adaptive/sessions', json={
        'app_name': 'amal-app'
    }, headers=auth_headers)

    assert response.status_code == 201
    session_id = response.json['session_id']
    concepts = response.json['lesson']['concepts']

    # Step 2: Record attempts for each concept
    for concept in concepts:
        response = client.post('/adaptive/attempts', json={
            'session_id': session_id,
            'concept_id': concept['id'],
            'correct': True,
            'time_spent_ms': 2500
        }, headers=auth_headers)

        assert response.status_code == 201
        assert 'attempt_id' in response.json

    # Step 3: Verify HLR updated
    user_concept = get_user_concept(user_id, concepts[0]['id'])
    assert user_concept.total_attempts == 1
    assert user_concept.correct_attempts == 1
    assert user_concept.half_life_days > 0

2. Multi-User Scenario Tests (4 tests)

These tests validate that adaptive curriculum works correctly with multiple users simultaneously.

Test: Concurrent Users Get Different Lessons

def test_concurrent_users_get_personalized_lessons():
    """Two users with different personas get different lessons"""
    # User 1: Beginner
    user1 = create_user_with_stats(total_attempts=5, correct_attempts=2)

    # User 2: Advanced
    user2 = create_user_with_stats(total_attempts=500, correct_attempts=450)

    lesson1 = content_duo_service.generate_lesson(user1.id, app_name)
    lesson2 = content_duo_service.generate_lesson(user2.id, app_name)

    # Lessons should differ in difficulty
    avg_difficulty_1 = calculate_avg_difficulty(lesson1['concepts'])
    avg_difficulty_2 = calculate_avg_difficulty(lesson2['concepts'])

    assert avg_difficulty_1 < avg_difficulty_2  # Beginner gets easier content

3. Edge Case Tests (4 tests)

Integration tests for unusual scenarios that could break the system.

Test: User Who Completed All Content

def test_user_completed_all_content():
    """Handle users who mastered all available concepts"""
    user = create_user()

    # Mark all concepts as mastered
    for concept in get_all_concepts():
        create_user_concept(
            user_id=user.id,
            concept_id=concept.id,
            half_life_days=30.0,  # Very high HLR
            last_attempt_at=datetime.utcnow()
        )

    lesson = content_duo_service.generate_lesson(user.id, app_name)

    # Should return None (no content available)
    assert lesson is None

4. Error Handling Tests (2 tests)

Integration tests for error scenarios and recovery.

Test: Database Error Recovery

def test_session_creation_with_db_error(client, auth_headers, mocker):
    """Gracefully handle database errors during session creation"""
    # Mock database to raise error
    mocker.patch('src.services.content_duo.db.session.commit', side_effect=DatabaseError())

    response = client.post('/adaptive/sessions', json={
        'app_name': 'amal-app'
    }, headers=auth_headers)

    # Should return 500 error (not crash)
    assert response.status_code == 500
    assert 'error' in response.json

Test Infrastructure

Test Fixtures

We use pytest fixtures to create reusable test data:

@pytest.fixture
def user():
    """Create test user"""
    user = User(username='testuser', email='test@example.com')
    db.session.add(user)
    db.session.commit()
    return user

@pytest.fixture
def concepts():
    """Create 20 test concepts with varying difficulty"""
    concepts = []
    for i in range(1, 21):
        concept = Concept(
            text=f'concept_{i}',
            difficulty=(i % 5) + 1  # Difficulty 1-5
        )
        concepts.append(concept)
    db.session.add_all(concepts)
    db.session.commit()
    return concepts

Database Isolation

Each test runs in an isolated transaction that rolls back after test completion:

@pytest.fixture(autouse=True)
def test_database():
    """Create test database and rollback after each test"""
    db.session.begin_nested()
    yield
    db.session.rollback()

This ensures tests don't interfere with each other.

Mock Services

We mock external services (Amplitude, Drip) to avoid real API calls during tests:

@pytest.fixture
def mock_amplitude(mocker):
    """Mock Amplitude analytics service"""
    return mocker.patch('src.services.analytics.amplitude.track_event')

Test Coverage Metrics

We use pytest-cov to measure code coverage:

pytest src/tests/unit/ src/tests/integration/ --cov=src/services/content_duo --cov-report=html

Coverage Results:

  • hlr.py: 100% (all lines covered)
  • persona_engine.py: 100%
  • content_selector.py: 98% (2% unreachable error paths)
  • content_duo.py: 95% (5% edge cases in production-only code)

Continuous Integration

All tests run on every pull request via CircleCI:

# .circleci/config.yml
test_content_duo:
  steps:
    - run: pytest src/tests/unit/services/content_duo/ -v
    - run: pytest src/tests/integration/adaptive/ -v
    - run: pytest --cov=src/services/content_duo --cov-fail-under=95

PRs that drop coverage below 95% fail CI and cannot merge.

Implementation Scope

Files Created:

  • src/tests/unit/services/content_duo/ - 58 unit tests
  • src/tests/integration/adaptive/ - 15 integration tests
  • src/tests/fixtures/content_duo.py - Reusable test fixtures
  • src/tests/helpers/content_duo.py - Test helper functions

Commits: 8465fd1, ee31e45

Test Execution Time:

  • Unit tests: ~8 seconds (58 tests)
  • Integration tests: ~25 seconds (15 tests)
  • Total: ~33 seconds

Results: Regression-Free Deployments

The comprehensive test suite delivered:

  • 0 → 73 automated tests - Complete coverage of adaptive curriculum
  • 100% confidence in deployments (no regressions since test suite deployed)
  • Fast feedback - Tests run in <1 minute, enabling rapid iteration
  • Living documentation - Tests serve as examples of expected behavior

Best Practices Applied

1. Descriptive Test Names

# Bad
def test_hlr():
    ...

# Good
def test_hlr_increases_after_correct_attempt_with_long_interval():
    ...

2. Arrange-Act-Assert Pattern

def test_persona_classification():
    # Arrange: Set up test data
    user = create_user_with_stats(accuracy=0.72)

    # Act: Execute the code under test
    persona = persona_engine.classify_user(user.id)

    # Assert: Verify expected outcome
    assert persona == Persona.INTERMEDIATE

3. One Assertion Per Test (when possible)

# Test one thing at a time
def test_lesson_size_matches_config():
    lesson = generate_lesson(user.id)
    assert len(lesson['concepts']) == 5

def test_lesson_contains_new_content():
    lesson = generate_lesson(user.id)
    assert any(c['slot_type'] == 'new' for c in lesson['concepts'])

What's Next for Testing

Future enhancements:

  • Property-based testing - Use Hypothesis to generate random test cases
  • Performance tests - Measure lesson generation latency under load
  • Mutation testing - Verify tests actually catch bugs (using mutmut)
  • Visual regression tests - Validate API response schemas
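To make the first item concrete: a property-based test asserts invariants over many generated inputs instead of checking hand-picked examples. A sketch of the idea using only the standard library's random module (Hypothesis would generate, minimize, and replay failing cases automatically; the retention formula here is the simplified exponential model, not the real service):

```python
import math
import random

def calculate_retention(half_life_days, elapsed_days):
    """Exponential forgetting: retention halves every half_life_days."""
    return math.pow(2, -elapsed_days / half_life_days)

def check_retention_properties(trials=1000, seed=42):
    """Property: retention always lies in (0, 1] and strictly
    decreases as elapsed time grows, for any positive half-life."""
    rng = random.Random(seed)
    for _ in range(trials):
        half_life = rng.uniform(0.1, 60.0)
        t1 = rng.uniform(0.0, 30.0)
        t2 = t1 + rng.uniform(0.01, 30.0)  # strictly later than t1
        r1 = calculate_retention(half_life, t1)
        r2 = calculate_retention(half_life, t2)
        assert 0.0 < r1 <= 1.0 and 0.0 < r2 <= 1.0
        assert r2 < r1  # more elapsed time -> lower retention
    return True
```

A single run exercises a thousand (half-life, interval) combinations, catching boundary bugs that example-based tests with fixed values like 3.0 days would miss.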

Comprehensive testing transformed Content Duo from experimental code into production-ready infrastructure. With 73 tests covering all critical paths, we deploy with confidence and iterate quickly without fear of regressions.


Implementation Files: src/tests/unit/, src/tests/integration/
Commits: 8465fd1, ee31e45
Coverage: 100% of core logic (HLR, persona, selection)