294 Unit Tests for Content Utils: Comprehensive Coverage
Production bugs in content utilities caused data corruption, incorrect user-facing text, and support escalations. A function that strips Arabic diacritics removed too many characters, breaking word boundaries. A serialization helper lost nested data, corrupting analytics events. Each bug required emergency patches, customer apologies, and data cleanup.
The root cause was simple: no tests. Content utility functions—1,200 lines of critical code—had zero automated test coverage. Changes were tested manually, if at all. We couldn't refactor confidently, optimize performance, or prevent regressions. We wrote 294 unit tests to cover every function, edge case, and integration point.
The Problem: Untested Utility Code
Content utilities handle Arabic text processing, format conversion, validation, and serialization. These functions are called from hundreds of locations across the codebase. A bug in a utility function creates bugs everywhere it's used.
Before: Zero Test Coverage
Content Utils Coverage
┌──────────────────────────────────────┐
│ src/utils/content_helpers.py         │
│ - 1,200 lines of code                │
│ - 0 unit tests                       │
│ - Manual QA only                     │
│ - Frequent regressions               │
│                                      │
│ Impact of a bug:                     │
│ - Affects 100+ call sites            │
│ - Manifests in production            │
│ - Requires emergency hotfix          │
│ - Damages user trust                 │
└──────────────────────────────────────┘
Manual testing was incomplete and slow. QA tested happy paths but missed edge cases. A function that worked for 99% of inputs failed on Arabic text with specific diacritic combinations. These edge cases appeared in production, not during testing.
Production Incidents (Pre-Tests):
- Diacritic Stripping Bug: remove_diacritics() removed base characters when diacritics appeared in certain Unicode normalization forms, turning "مُحَمَّد" into "محد" (incorrect).
- Serialization Data Loss: serialize_nested_dict() performed shallow copying, losing nested Munch objects when converting to JSON for analytics.
- Text Truncation Error: truncate_text() split multi-byte Unicode characters mid-sequence, creating invalid UTF-8 that crashed the mobile app.
- Null Handling Failure: validate_content() threw exceptions on None values instead of handling them gracefully, causing 500 errors in the API.
- Format Conversion Bug: convert_markdown_to_html() didn't escape user input, creating XSS vulnerabilities.
Each incident required 2-4 hours of investigation, hotfix development, testing, and deployment. Customer support handled complaints. Data cleanup scripts fixed corrupted content. The total cost per incident was 6-10 hours of engineering time.
The Solution: Comprehensive Unit Test Suite
We wrote 294 unit tests covering every function, edge case, and error condition. Tests run in <2 seconds, providing immediate feedback during development. If a change breaks a utility function, tests fail before code review.
After: 98% Test Coverage
Content Utils Coverage
┌──────────────────────────────────────┐
│ src/utils/content_helpers.py         │
│ - 1,200 lines of code                │
│ - 294 unit tests                     │
│ - 98% line coverage                  │
│ - 100% function coverage             │
│ - 0 regressions in 6 months          │
│                                      │
│ Impact of a bug:                     │
│ - Caught in <2 seconds               │
│ - Never reaches production           │
│ - No customer impact                 │
│ - No emergency hotfixes              │
└──────────────────────────────────────┘
Tests serve as executable documentation, showing exactly how each function should behave. New developers read tests to understand expected inputs, outputs, and edge cases.
Implementation Details
We organized tests by functionality, mirroring the structure of the utility module:
Test File Structure:
src/tests/unit/utils/
├── test_arabic_text.py              # 78 tests
├── test_content_transformation.py   # 52 tests
├── test_validation.py               # 43 tests
├── test_serialization.py            # 39 tests
├── test_formatting.py               # 34 tests
├── test_text_processing.py          # 28 tests
└── test_edge_cases.py               # 20 tests
Each test file focuses on a specific domain. Tests are independent—they don't share state or depend on execution order.
Test Category 1: Arabic Text Processing (78 Tests)
Arabic text processing is complex due to Unicode normalization forms, bidirectional text, diacritics, and ligatures. We tested every combination of:
- Normalization forms (NFC, NFD, NFKC, NFKD)
- Diacritic positions (above, below, tanween, shadda)
- Special characters (hamza, alif variants, taa marbouta)
- Edge cases (empty strings, null values, mixed scripts)
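Normalization forms matter because some Arabic code points have canonical decompositions, so the "same" text can arrive as different code-point sequences. A minimal standard-library demonstration (alif with madda, U+0622, is one such character):

```python
import unicodedata

# Alif with madda is a single precomposed code point in NFC...
nfc = "\u0622"  # "آ"
# ...but NFD decomposes it into bare alif (U+0627) + combining madda (U+0653).
nfd = unicodedata.normalize("NFD", nfc)

print(len(nfc), len(nfd))                # 1 2
print([f"U+{ord(c):04X}" for c in nfd])  # ['U+0627', 'U+0653']
# Both spellings round-trip to the same canonical form, which is why the
# suite feeds every function NFC, NFD, NFKC, and NFKD variants of each input.
print(unicodedata.normalize("NFC", nfd) == nfc)  # True
```

A function that only handles the precomposed form silently misbehaves on decomposed input, which is exactly the class of bug the NFD tests below exist to catch.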
Example Test:
def test_remove_diacritics_preserves_base_characters():
    """Ensure diacritic removal doesn't affect base characters."""
    # Input with diacritics (NFC normalization)
    input_text = "مُحَمَّدٌ"
    expected = "محمد"
    result = remove_diacritics(input_text)
    assert result == expected
    assert len(result) == 4  # Base characters only
    assert all(not is_diacritic(c) for c in result)

def test_remove_diacritics_handles_nfd_normalization():
    """Test diacritic removal with NFD normalization."""
    import unicodedata
    # Same text, different normalization
    text_nfc = "مُحَمَّد"
    text_nfd = unicodedata.normalize('NFD', text_nfc)
    result_nfc = remove_diacritics(text_nfc)
    result_nfd = remove_diacritics(text_nfd)
    # Results should be identical regardless of input normalization
    assert result_nfc == result_nfd

def test_remove_diacritics_empty_string():
    """Handle empty string input."""
    assert remove_diacritics("") == ""

def test_remove_diacritics_none():
    """Handle None input gracefully."""
    assert remove_diacritics(None) is None

def test_remove_diacritics_mixed_script():
    """Preserve non-Arabic text while removing Arabic diacritics."""
    input_text = "Hello مُحَمَّد World"
    expected = "Hello محمد World"
    assert remove_diacritics(input_text) == expected
These tests caught 12 bugs during initial implementation, including issues with NFD normalization, mixed scripts, and null handling.
Test Category 2: Content Transformation (52 Tests)
Content transformation converts between formats (Markdown ↔ HTML, JSON ↔ Python objects, TTS input ↔ display text). Each conversion has edge cases around special characters, encoding, and structure preservation.
Example Test:
def test_markdown_to_html_escapes_user_input():
    """Prevent XSS by escaping HTML in markdown."""
    malicious_input = "<script>alert('xss')</script>"
    result = convert_markdown_to_html(malicious_input)
    assert "<script>" not in result
    assert "&lt;script&gt;" in result

def test_nested_dict_serialization_preserves_munch():
    """Ensure nested Munch objects serialize correctly."""
    from munch import Munch
    data = Munch({
        "media": Munch({
            "tts": Munch({
                "url": "https://example.com/audio.mp3",
                "duration": 1200
            })
        })
    })
    result = serialize_nested_dict(data)
    assert isinstance(result, dict)
    assert result["media"]["tts"]["duration"] == 1200
    # Ensure no data loss
    assert len(result["media"]["tts"]) == 2
The serialization tests prevented the Munch data loss bug that corrupted analytics events.
Test Category 3: Validation (43 Tests)
Validation functions check user input, API payloads, and configuration values. Tests cover valid inputs, invalid inputs, boundary conditions, and error messages.
Example Test:
def test_validate_email_valid():
    """Accept valid email addresses."""
    valid_emails = [
        "user@example.com",
        "user+tag@example.com",
        "user.name@sub.example.com"
    ]
    for email in valid_emails:
        assert validate_email(email) is True

def test_validate_email_invalid():
    """Reject invalid email addresses."""
    invalid_emails = [
        "not-an-email",
        "@example.com",
        "user@",
        "user@.com",
        ""
    ]
    for email in invalid_emails:
        assert validate_email(email) is False

def test_validate_content_length():
    """Enforce maximum content length."""
    max_length = 1000
    valid_content = "a" * max_length
    invalid_content = "a" * (max_length + 1)
    assert validate_content_length(valid_content, max_length) is True
    assert validate_content_length(invalid_content, max_length) is False
Validation tests ensure error messages are clear and actionable, helping developers debug failed requests.
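Error messages themselves can be pinned down with pytest.raises and a match pattern, so a vague rewording fails the build. A sketch of the idiom (validate_age and its message are hypothetical examples, not functions from the real module):

```python
import pytest

def validate_age(value):
    """Hypothetical validator that raises with an actionable message."""
    if not isinstance(value, int):
        raise TypeError(f"age must be an int, got {type(value).__name__}")
    if not 0 <= value <= 150:
        raise ValueError(f"age must be between 0 and 150, got {value}")
    return True

def test_validate_age_error_message_is_actionable():
    # match= asserts on the message text, not just the exception type
    with pytest.raises(ValueError, match="between 0 and 150"):
        validate_age(-1)
```

Matching on a stable fragment of the message (rather than the whole string) keeps the test from breaking on harmless wording tweaks.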
Test Category 4: Edge Cases (20 Tests)
Edge case tests cover unusual inputs that rarely appear but can cause crashes:
Example Test:
def test_truncate_text_multibyte_characters():
    """Don't split multibyte Unicode characters."""
    # Arabic text with 4-byte emoji
    text = "مرحبا 👋 العالم"
    max_length = 10
    result = truncate_text(text, max_length)
    # Ensure result is valid UTF-8
    result.encode('utf-8')
    # Ensure no partial emoji
    assert '👋' in result or '👋' not in text[:max_length]

def test_null_propagation():
    """Ensure None values propagate correctly."""
    assert format_text(None) is None
    assert validate_content(None) is False
    assert serialize_dict(None) == {}
These edge cases caused 3 of the 5 production incidents—they're rare but critical.
Test Organization Patterns
We followed consistent patterns across all tests:
1. Arrange-Act-Assert (AAA) Pattern:
def test_function():
    # Arrange: Set up test data
    input_data = "test input"
    expected = "expected output"
    # Act: Execute the function
    result = function_under_test(input_data)
    # Assert: Verify the result
    assert result == expected
2. Parameterized Tests:
@pytest.mark.parametrize("input_text,expected", [
    ("مرحبا", "مرحبا"),
    ("Hello", "Hello"),
    ("", ""),
    (None, None),
])
def test_normalize_text(input_text, expected):
    assert normalize_text(input_text) == expected
Parameterized tests reduce duplication and make it easy to add new test cases.
3. Fixture Reuse:
@pytest.fixture
def sample_arabic_text():
    return "مُحَمَّدٌ يَذْهَبُ إِلَى الْمَدْرَسَةِ"

def test_remove_diacritics(sample_arabic_text):
    result = remove_diacritics(sample_arabic_text)
    assert "ُ" not in result  # No damma
    assert "َ" not in result  # No fatha
Fixtures provide consistent test data across multiple tests.
Running Tests
Tests run via pytest with coverage reporting:
# Run all tests
pytest src/tests/unit/utils/
# Run with coverage
pytest --cov=src/utils --cov-report=html src/tests/unit/utils/
# Run specific test file
pytest src/tests/unit/utils/test_arabic_text.py
# Run tests matching a pattern
pytest -k "diacritic"
Coverage reports show exactly which lines lack tests:
Name                           Stmts   Miss  Cover   Missing
------------------------------------------------------------
src/utils/content_helpers.py    1200     24    98%   45, 127, 891, ...
The 2% uncovered lines are defensive error handling that's difficult to trigger in unit tests.
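Defensive branches like these can also be excluded explicitly instead of left as coverage noise: coverage.py honors a `# pragma: no cover` marker on lines that are intentionally untested. A sketch (the function and its cache shape are hypothetical, not from the real module):

```python
def load_cached_content(cache, key):
    """Fetch a cached entry; the corrupt-entry branch is excluded from coverage."""
    entry = cache.get(key)
    if entry is None:
        return None
    try:
        return entry["value"]
    except KeyError:  # pragma: no cover - corrupt cache entry, not reproducible in unit tests
        return None
```

Marking such lines keeps the coverage report focused on branches that genuinely lack tests, rather than ones that are impractical to trigger.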
Integration with CI/CD
Tests run on every pull request and merge:
# .circleci/config.yml
jobs:
  test:
    docker:
      - image: python:3.11
    steps:
      - checkout
      - run: pip install -r requirements-test.txt
      - run: pytest --cov=src/utils --cov-fail-under=95 src/tests/unit/utils/
The build fails if coverage drops below 95%, preventing regressions.
Real-World Impact
Before Implementation:
- 0 unit tests
- 5 production incidents in 6 months
- 30-50 hours spent on incident response
- Developers afraid to refactor utility functions
- Manual QA required for every change
After Implementation:
- 294 unit tests
- 0 production incidents in 6 months
- 0 hours spent on incident response (prevented)
- Confident refactoring with regression detection
- Automated testing replaces manual QA
Time Investment:
- Writing tests: 40 hours (spread over 2 weeks)
- Maintaining tests: ~2 hours/month
- Time saved from prevented incidents: 30-50 hours/6 months
Return on investment was positive within 3 months.
Developer Confidence:
Before tests, developers avoided changing utility functions. "Don't touch content_helpers.py—it's too risky." After tests, refactoring became safe. We optimized performance-critical functions, confident that tests would catch regressions.
Example Refactoring:
We refactored remove_diacritics() to use a faster algorithm:
# Before: O(n * m) where m = number of diacritics
def remove_diacritics(text):
    for diacritic in DIACRITICS:
        text = text.replace(diacritic, '')
    return text

# After: O(n) using Unicode categories
def remove_diacritics(text):
    import unicodedata
    return ''.join(
        c for c in unicodedata.normalize('NFD', text)
        if unicodedata.category(c) != 'Mn'
    )
The refactoring reduced execution time by 60% for typical inputs. Tests passed, confirming behavioral equivalence. Without tests, we would never have attempted this optimization.
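A cheap way to gain extra confidence during a refactor like this is a differential test that runs the old and new implementations side by side on the same inputs. A self-contained sketch (the DIACRITICS range and sample strings are illustrative, not the real constants or fixtures):

```python
import unicodedata

# Arabic harakat: fathatan through sukun (U+064B..U+0652), illustrative subset
DIACRITICS = [chr(cp) for cp in range(0x064B, 0x0653)]

def remove_diacritics_old(text):
    # O(n * m) baseline: one pass per diacritic
    for diacritic in DIACRITICS:
        text = text.replace(diacritic, "")
    return text

def remove_diacritics_new(text):
    # O(n) replacement: drop all combining marks after NFD normalization
    return "".join(
        c for c in unicodedata.normalize("NFD", text)
        if unicodedata.category(c) != "Mn"
    )

SAMPLES = ["مُحَمَّدٌ", "مرحبا", "Hello مُحَمَّد World", ""]

# Differential check: both implementations must agree on every sample
for sample in SAMPLES:
    assert remove_diacritics_old(sample) == remove_diacritics_new(sample)
```

Because the new version strips every combining mark (category Mn) rather than a fixed list, a differential run like this also surfaces any inputs where the two definitions legitimately diverge before the swap ships.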
Key Takeaways
- Test critical paths first - Utility functions are called everywhere; bugs multiply
- Cover edge cases explicitly - Null, empty strings, Unicode variants, boundary conditions
- Use tests as documentation - Tests show expected behavior more clearly than comments
- Make tests fast - <2 seconds for 294 tests enables rapid iteration
- Measure coverage - 98% coverage ensures confidence in refactoring
Writing 294 tests for 1,200 lines of code sounds like overhead. The reality is opposite—tests are an investment that pays immediate dividends. We prevented 5+ production incidents, eliminated emergency hotfixes, and enabled confident refactoring. The untested code was the actual overhead, carrying hidden risk that manifested as production bugs.
Test coverage transformed utility functions from fragile, scary code that nobody wanted to touch into solid, optimizable infrastructure. The cultural shift was subtle but profound: developers started writing tests before implementing new utility functions, preventing bugs from ever being written.