294 Unit Tests for Content Utils: Comprehensive Coverage
Production bugs in content utilities caused data corruption, incorrect user-facing text, and support escalations. A function that strips Arabic diacritics removed too many characters, breaking word boundaries. A serialization helper lost nested data, corrupting analytics events. Each bug required emergency patches, customer apologies, and data cleanup.
The root cause was simple: no tests. Content utility functions—1,200 lines of critical code—had zero automated test coverage. Changes were tested manually, if at all. We couldn't refactor confidently, optimize performance, or prevent regressions. We wrote 294 unit tests to cover every function, edge case, and integration point.
The Problem: Untested Utility Code
Content utilities handle Arabic text processing, format conversion, validation, and serialization. These functions are called from hundreds of locations across the codebase. A bug in a utility function creates bugs everywhere it's used.
Before: Zero Test Coverage
Content Utils Coverage
┌──────────────────────────────────────┐
│ src/utils/content_helpers.py         │
│ - 1,200 lines of code                │
│ - 0 unit tests                       │
│ - Manual QA only                     │
│ - Frequent regressions               │
│                                      │
│ Impact of a bug:                     │
│ - Affects 100+ call sites            │
│ - Manifests in production            │
│ - Requires emergency hotfix          │
│ - Damages user trust                 │
└──────────────────────────────────────┘
Manual testing was incomplete and slow. QA tested happy paths but missed edge cases. A function that worked for 99% of inputs failed on Arabic text with specific diacritic combinations. These edge cases appeared in production, not during testing.
Production Incidents (Pre-Tests):
- Diacritic Stripping Bug: remove_diacritics() removed base characters when diacritics appeared in certain Unicode normalization forms, turning "مُحَمَّد" into "محد" (incorrect).
- Serialization Data Loss: serialize_nested_dict() performed shallow copying, losing nested Munch objects when converting to JSON for analytics.
- Text Truncation Error: truncate_text() split multi-byte Unicode characters mid-sequence, creating invalid UTF-8 that crashed the mobile app.
- Null Handling Failure: validate_content() threw exceptions on None values instead of handling them gracefully, causing 500 errors in the API.
- Format Conversion Bug: convert_markdown_to_html() didn't escape user input, creating XSS vulnerabilities.
Each incident required 2-4 hours of investigation, hotfix development, testing, and deployment. Customer support handled complaints. Data cleanup scripts fixed corrupted content. The total cost per incident was 6-10 hours of engineering time.
The Solution: Comprehensive Unit Test Suite
We wrote 294 unit tests covering every function, edge case, and error condition. Tests run in <2 seconds, providing immediate feedback during development. If a change breaks a utility function, tests fail before code review.
After: 98% Test Coverage
Content Utils Coverage
┌──────────────────────────────────────┐
│ src/utils/content_helpers.py         │
│ - 1,200 lines of code                │
│ - 294 unit tests                     │
│ - 98% line coverage                  │
│ - 100% function coverage             │
│ - 0 regressions in 6 months          │
│                                      │
│ Impact of a bug:                     │
│ - Caught in <2 seconds               │
│ - Never reaches production           │
│ - No customer impact                 │
│ - No emergency hotfixes              │
└──────────────────────────────────────┘
Tests serve as executable documentation, showing exactly how each function should behave. New developers read tests to understand expected inputs, outputs, and edge cases.
Implementation Details
We organized tests by functionality, mirroring the structure of the utility module:
Test File Structure:
src/tests/unit/utils/
├── test_arabic_text.py              # 78 tests
├── test_content_transformation.py   # 52 tests
├── test_validation.py               # 43 tests
├── test_serialization.py            # 39 tests
├── test_formatting.py               # 34 tests
├── test_text_processing.py          # 28 tests
└── test_edge_cases.py               # 20 tests
Each test file focuses on a specific domain. Tests are independent—they don't share state or depend on execution order.
Test Category 1: Arabic Text Processing (78 Tests)
Arabic text processing is complex due to Unicode normalization forms, bidirectional text, diacritics, and ligatures. We tested every combination of:
- Normalization forms (NFC, NFD, NFKC, NFKD)
- Diacritic positions (above, below, tanween, shadda)
- Special characters (hamza, alif variants, taa marbouta)
- Edge cases (empty strings, null values, mixed scripts)
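Normalization forms matter because some Arabic code points have canonical decompositions, so the "same" text can arrive as different code-point sequences. A minimal standard-library demonstration (alif with madda, U+0622, is one such character):

```python
import unicodedata

# Alif with madda is a single precomposed code point in NFC...
nfc = "\u0622"  # "آ"
# ...but NFD decomposes it into bare alif (U+0627) + combining madda (U+0653).
nfd = unicodedata.normalize("NFD", nfc)

print(len(nfc), len(nfd))                # 1 2
print([f"U+{ord(c):04X}" for c in nfd])  # ['U+0627', 'U+0653']
# Both spellings round-trip to the same canonical form, which is why the
# suite feeds every function NFC, NFD, NFKC, and NFKD variants of each input.
print(unicodedata.normalize("NFC", nfd) == nfc)  # True
```

A function that only handles the precomposed form silently misbehaves on decomposed input, which is exactly the class of bug the NFD tests below exist to catch.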
Example Test:
def test_remove_diacritics_preserves_base_characters():
    """Ensure diacritic removal doesn't affect base characters."""
    # Input with diacritics (NFC normalization)
    input_text = "مُحَمَّدٌ"
    expected = "محمد"
    result = remove_diacritics(input_text)
    assert result == expected
    assert len(result) == 4  # Base characters only
    assert all(not is_diacritic(c) for c in result)

def test_remove_diacritics_handles_nfd_normalization():
    """Test diacritic removal with NFD normalization."""
    import unicodedata
    # Same text, different normalization
    text_nfc = "مُحَمَّد"
    text_nfd = unicodedata.normalize('NFD', text_nfc)
    result_nfc = remove_diacritics(text_nfc)
    result_nfd = remove_diacritics(text_nfd)
    # Results should be identical regardless of input normalization
    assert result_nfc == result_nfd

def test_remove_diacritics_empty_string():
    """Handle empty string input."""
    assert remove_diacritics("") == ""

def test_remove_diacritics_none():
    """Handle None input gracefully."""
    assert remove_diacritics(None) is None

def test_remove_diacritics_mixed_script():
    """Preserve non-Arabic text while removing Arabic diacritics."""
    input_text = "Hello مُحَمَّد World"
    expected = "Hello محمد World"
    assert remove_diacritics(input_text) == expected
These tests caught 12 bugs during initial implementation, including issues with NFD normalization, mixed scripts, and null handling.
Test Category 2: Content Transformation (52 Tests)
Content transformation converts between formats (Markdown ↔ HTML, JSON ↔ Python objects, TTS input ↔ display text). Each conversion has edge cases around special characters, encoding, and structure preservation.
Example Test:
def test_markdown_to_html_escapes_user_input():
    """Prevent XSS by escaping HTML in markdown."""
    malicious_input = "<script>alert('xss')</script>"
    result = convert_markdown_to_html(malicious_input)
    assert "<script>" not in result
    assert "&lt;script&gt;" in result

def test_nested_dict_serialization_preserves_munch():
    """Ensure nested Munch objects serialize correctly."""
    from munch import Munch
    data = Munch({
        "media": Munch({
            "tts": Munch({
                "url": "https://example.com/audio.mp3",
                "duration": 1200
            })
        })
    })
    result = serialize_nested_dict(data)
    assert isinstance(result, dict)
    assert result["media"]["tts"]["duration"] == 1200
    # Ensure no data loss
    assert len(result["media"]["tts"]) == 2
The serialization tests prevented the Munch data loss bug that corrupted analytics events.
Test Category 3: Validation (43 Tests)
Validation functions check user input, API payloads, and configuration values. Tests cover valid inputs, invalid inputs, boundary conditions, and error messages.
Example Test:
def test_validate_email_valid():
    """Accept valid email addresses."""
    valid_emails = [
        "user@example.com",
        "user+tag@example.com",
        "user.name@sub.example.com"
    ]
    for email in valid_emails:
        assert validate_email(email) is True

def test_validate_email_invalid():
    """Reject invalid email addresses."""
    invalid_emails = [
        "not-an-email",
        "@example.com",
        "user@",
        "user@.com",
        ""
    ]
    for email in invalid_emails:
        assert validate_email(email) is False

def test_validate_content_length():
    """Enforce maximum content length."""
    max_length = 1000
    valid_content = "a" * max_length
    invalid_content = "a" * (max_length + 1)
    assert validate_content_length(valid_content, max_length) is True
    assert validate_content_length(invalid_content, max_length) is False
Validation tests ensure error messages are clear and actionable, helping developers debug failed requests.
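Error messages themselves can be pinned down with pytest.raises and a match pattern, so a vague rewording fails the build. A sketch of the idiom (validate_age and its message are hypothetical examples, not functions from the real module):

```python
import pytest

def validate_age(value):
    """Hypothetical validator that raises with an actionable message."""
    if not isinstance(value, int):
        raise TypeError(f"age must be an int, got {type(value).__name__}")
    if not 0 <= value <= 150:
        raise ValueError(f"age must be between 0 and 150, got {value}")
    return True

def test_validate_age_error_message_is_actionable():
    # match= asserts on the message text, not just the exception type
    with pytest.raises(ValueError, match="between 0 and 150"):
        validate_age(-1)
```

Matching on a stable fragment of the message (rather than the whole string) keeps the test from breaking on harmless wording tweaks.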
Test Category 4: Edge Cases (20 Tests)
Edge case tests cover unusual inputs that rarely appear but can cause crashes:
Example Test:
def test_truncate_text_multibyte_characters():
    """Don't split multibyte Unicode characters."""
    # Arabic text with 4-byte emoji
    text = "مرحبا 👋 العالم"
    max_length = 10
    result = truncate_text(text, max_length)
    # Ensure result is valid UTF-8
    result.encode('utf-8')
    # Ensure no partial emoji
    assert '👋' in result or '👋' not in text[:max_length]

def test_null_propagation():
    """Ensure None values propagate correctly."""
    assert format_text(None) is None
    assert validate_content(None) is False
    assert serialize_dict(None) == {}
These edge cases caused 3 of the 5 production incidents—they're rare but critical.
Test Organization Patterns
We followed consistent patterns across all tests:
1. Arrange-Act-Assert (AAA) Pattern:
def test_function():
    # Arrange: Set up test data
    input_data = "test input"
    expected = "expected output"
    # Act: Execute the function
    result = function_under_test(input_data)
    # Assert: Verify the result
    assert result == expected
2. Parameterized Tests:
@pytest.mark.parametrize("input_text,expected", [
    ("مرحبا", "مرحبا"),
    ("Hello", "Hello"),
    ("", ""),
    (None, None),
])
def test_normalize_text(input_text, expected):
    assert normalize_text(input_text) == expected
Parameterized tests reduce duplication and make it easy to add new test cases.
3. Fixture Reuse:
@pytest.fixture
def sample_arabic_text():
    return "مُحَمَّدٌ يَذْهَبُ إِلَى الْمَدْرَسَةِ"

def test_remove_diacritics(sample_arabic_text):
    result = remove_diacritics(sample_arabic_text)
    assert "ُ" not in result  # No damma
    assert "َ" not in result  # No fatha
Fixtures provide consistent test data across multiple tests.
Running Tests
Tests run via pytest with coverage reporting:
# Run all tests
pytest src/tests/unit/utils/
# Run with coverage
pytest --cov=src/utils --cov-report=html src/tests/unit/utils/
# Run specific test file
pytest src/tests/unit/utils/test_arabic_text.py
# Run tests matching a pattern
pytest -k "diacritic"
Coverage reports show exactly which lines lack tests:
Name                           Stmts   Miss  Cover   Missing
------------------------------------------------------------
src/utils/content_helpers.py    1200     24    98%   45, 127, 891, ...
The 2% uncovered lines are defensive error handling that's difficult to trigger in unit tests.
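Defensive branches like these can also be excluded explicitly instead of left as coverage noise: coverage.py honors a `# pragma: no cover` marker on lines that are intentionally untested. A sketch (the function and its cache shape are hypothetical, not from the real module):

```python
def load_cached_content(cache, key):
    """Fetch a cached entry; the corrupt-entry branch is excluded from coverage."""
    entry = cache.get(key)
    if entry is None:
        return None
    try:
        return entry["value"]
    except KeyError:  # pragma: no cover - corrupt cache entry, not reproducible in unit tests
        return None
```

Marking such lines keeps the coverage report focused on branches that genuinely lack tests, rather than ones that are impractical to trigger.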
Integration with CI/CD
Tests run on every pull request and merge:
# .circleci/config.yml
jobs:
  test:
    docker:
      - image: python:3.11
    steps:
      - checkout
      - run: pip install -r requirements-test.txt
      - run: pytest --cov=src/utils --cov-fail-under=95 src/tests/unit/utils/
The build fails if coverage drops below 95%, preventing regressions.
Real-World Impact
Before Implementation:
- 0 unit tests
- 5 production incidents in 6 months
- 30-50 hours spent on incident response
- Developers afraid to refactor utility functions
- Manual QA required for every change
After Implementation:
- 294 unit tests
- 0 production incidents in 6 months
- 0 hours spent on incident response (prevented)
- Confident refactoring with regression detection
- Automated testing replaces manual QA
Time Investment:
- Writing tests: 40 hours (spread over 2 weeks)
- Maintaining tests: ~2 hours/month
- Time saved from prevented incidents: 30-50 hours/6 months
Return on investment was positive within 3 months.
Developer Confidence:
Before tests, developers avoided changing utility functions. "Don't touch content_helpers.py—it's too risky." After tests, refactoring became safe. We optimized performance-critical functions, confident that tests would catch regressions.
Example Refactoring:
We refactored remove_diacritics() to use a faster algorithm:
# Before: O(n * m) where m = number of diacritics
def remove_diacritics(text):
    for diacritic in DIACRITICS:
        text = text.replace(diacritic, '')
    return text

# After: O(n) using Unicode categories
def remove_diacritics(text):
    import unicodedata
    return ''.join(
        c for c in unicodedata.normalize('NFD', text)
        if unicodedata.category(c) != 'Mn'
    )
The refactoring reduced execution time by 60% for typical inputs. Tests passed, confirming behavioral equivalence. Without tests, we would never have attempted this optimization.
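A cheap way to gain extra confidence during a refactor like this is a differential test that runs the old and new implementations side by side on the same inputs. A self-contained sketch (the DIACRITICS range and sample strings are illustrative, not the real constants or fixtures):

```python
import unicodedata

# Arabic harakat: fathatan through sukun (U+064B..U+0652), illustrative subset
DIACRITICS = [chr(cp) for cp in range(0x064B, 0x0653)]

def remove_diacritics_old(text):
    # O(n * m) baseline: one pass per diacritic
    for diacritic in DIACRITICS:
        text = text.replace(diacritic, "")
    return text

def remove_diacritics_new(text):
    # O(n) replacement: drop all combining marks after NFD normalization
    return "".join(
        c for c in unicodedata.normalize("NFD", text)
        if unicodedata.category(c) != "Mn"
    )

SAMPLES = ["مُحَمَّدٌ", "مرحبا", "Hello مُحَمَّد World", ""]

# Differential check: both implementations must agree on every sample
for sample in SAMPLES:
    assert remove_diacritics_old(sample) == remove_diacritics_new(sample)
```

Because the new version strips every combining mark (category Mn) rather than a fixed list, a differential run like this also surfaces any inputs where the two definitions legitimately diverge before the swap ships.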
Key Takeaways
- Test critical paths first - Utility functions are called everywhere; bugs multiply
- Cover edge cases explicitly - Null, empty strings, Unicode variants, boundary conditions
- Use tests as documentation - Tests show expected behavior more clearly than comments
- Make tests fast - <2 seconds for 294 tests enables rapid iteration
- Measure coverage - 98% coverage ensures confidence in refactoring
Writing 294 tests for 1,200 lines of code sounds like overhead. The reality is opposite—tests are an investment that pays immediate dividends. We prevented 5+ production incidents, eliminated emergency hotfixes, and enabled confident refactoring. The untested code was the actual overhead, carrying hidden risk that manifested as production bugs.
Test coverage transformed utility functions from fragile, scary code that nobody wanted to touch into solid, optimizable infrastructure. The cultural shift was subtle but profound: developers started writing tests before implementing new utility functions, preventing bugs from ever being written.