Configuration Validation as First Line of Defense
Key Takeaway
Invalid budget thresholds, missing webhook URLs, and malformed configuration caused runtime failures. Implementing a ConfigValidator class caught configuration errors at startup rather than during critical budget alerts, improving reliability from 60% to 99%.
The Problem
Configuration errors discovered at runtime:
# Lambda handler loads config from environment
import os

budget = float(os.environ['MONTHLY_BUDGET'])  # Could be missing or invalid
threshold = float(os.environ['ALERT_THRESHOLD'])  # Could be 150% (invalid)
webhook = os.environ['SLACK_WEBHOOK']  # Could be malformed URL
These failures surfaced while a budget alert was being processed, exactly when the system was needed most.
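To make the three failure modes concrete, here is a minimal sketch that replays the loading code above against a plain dictionary standing in for `os.environ`. The helper names `load_budget_settings` and `failure_mode` are hypothetical and exist only for illustration.

```python
def load_budget_settings(env):
    """Mimic the unvalidated loading above; `env` stands in for os.environ."""
    budget = float(env["MONTHLY_BUDGET"])
    threshold = float(env["ALERT_THRESHOLD"])
    webhook = env["SLACK_WEBHOOK"]
    return budget, threshold, webhook


def failure_mode(env):
    """Describe how loading behaves for a given environment mapping."""
    try:
        load_budget_settings(env)
    except KeyError as exc:
        # A missing variable only fails when the lookup actually runs
        return f"missing variable: {exc.args[0]}"
    except ValueError as exc:
        # A non-numeric string fails inside float()
        return f"unparseable value: {exc}"
    # Out-of-range numbers and malformed URLs pass through silently
    return "loaded without complaint"


print(failure_mode({}))
print(failure_mode({"MONTHLY_BUDGET": "abc"}))
print(failure_mode({
    "MONTHLY_BUDGET": "500",
    "ALERT_THRESHOLD": "150",
    "SLACK_WEBHOOK": "not-a-url",
}))
```

The last case is the most dangerous one: nothing raises at load time, so the out-of-range threshold and the broken webhook are only discovered when an alert is sent.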
The Solution
Validate all configuration at startup:
# config/config_validator.py
class ConfigValidator:
    """Validates monitoring configuration"""

    @staticmethod
    def validate(config):
        ConfigValidator._validate_budget(config.budget)
        ConfigValidator._validate_thresholds(config)
        ConfigValidator._validate_notifications(config.notifications)

    @staticmethod
    def _validate_budget(budget):
        if budget <= 0:
            raise ValueError(f"Budget must be positive, got: {budget}")

    @staticmethod
    def _validate_thresholds(config):
        if not (0 <= config.alert_threshold <= 100):
            raise ValueError(f"Alert threshold must be 0-100%, got: {config.alert_threshold}")

    @staticmethod
    def _validate_notifications(notifications):
        # The webhook is optional; validate it only when one is configured
        if notifications.slack_webhook:
            # Trailing slash rejects look-alike hosts such as hooks.slack.com.example.org
            if not notifications.slack_webhook.startswith('https://hooks.slack.com/'):
                raise ValueError("Invalid Slack webhook URL: expected it to start with https://hooks.slack.com/")
Use at Lambda initialization:
# Initialize configuration once (outside handler)
config = MonitoringConfig.from_environment()
ConfigValidator.validate(config)

def lambda_handler(event, context):
    # Config is guaranteed valid here
    process_budget_alert(config)
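The snippet above relies on `MonitoringConfig.from_environment()`, which is not shown. The following is one possible sketch of it, not the original implementation: the dataclass layout, the optional `env` parameter, and the `_read_float` helper are all assumptions. Only the attribute names (`budget`, `alert_threshold`, `notifications.slack_webhook`) and the environment variable names come from the code above.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class NotificationConfig:
    slack_webhook: Optional[str] = None


@dataclass(frozen=True)
class MonitoringConfig:
    budget: float
    alert_threshold: float
    notifications: NotificationConfig

    @classmethod
    def from_environment(cls, env: Optional[Mapping[str, str]] = None) -> "MonitoringConfig":
        # Accepting a mapping (assumption) makes the loader testable without
        # touching the real process environment.
        env = os.environ if env is None else env
        return cls(
            budget=_read_float(env, "MONTHLY_BUDGET"),
            alert_threshold=_read_float(env, "ALERT_THRESHOLD"),
            # An empty string is treated the same as "not configured"
            notifications=NotificationConfig(slack_webhook=env.get("SLACK_WEBHOOK") or None),
        )


def _read_float(env: Mapping[str, str], name: str) -> float:
    """Hypothetical helper: parse one numeric variable, naming it on failure."""
    raw = env.get(name)
    if raw is None:
        raise ValueError(f"{name} is not set; expected a number")
    try:
        return float(raw)
    except ValueError:
        raise ValueError(f"{name} must be a number, got: {raw!r}") from None
```

Keeping parsing in the loader and range checks in `ConfigValidator` means a parse failure names the offending variable, while the validator only ever sees values of the right type.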
Implementation Details
Comprehensive validation rules:
import re

class ConfigValidator:
    # Additional rules (excerpt); these sit alongside the validators shown earlier
    EMAIL_REGEX = r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$'

    @staticmethod
    def _validate_email(email):
        if not re.match(ConfigValidator.EMAIL_REGEX, email):
            raise ValueError(f"Invalid email format: {email}")

    @staticmethod
    def _validate_s3_config(s3_config):
        if not s3_config.bucket_name:
            raise ValueError("S3 bucket name required")
        if s3_config.max_size_gb <= 0:
            raise ValueError("S3 max size must be positive")

    @staticmethod
    def _validate_fargate_config(fargate_config):
        if not (0 <= fargate_config.cpu_threshold <= 100):
            raise ValueError("CPU threshold must be 0-100%")
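As a quick check of what the email rule accepts, the sketch below runs the same pattern against a few sample addresses. The `is_valid_email` helper and the sample values are illustrative assumptions. It also shows one edge case worth knowing: with `re.match`, a pattern ending in `$` still accepts a value with a trailing newline, whereas `re.fullmatch` does not.

```python
import re

# Same pattern as ConfigValidator.EMAIL_REGEX above
EMAIL_REGEX = r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$'


def is_valid_email(email: str) -> bool:
    """Hypothetical helper mirroring the check in _validate_email."""
    return re.match(EMAIL_REGEX, email) is not None


for sample in ["ops@example.com", "ops@example", "ops example.com", "ops@example.c"]:
    print(f"{sample!r}: {is_valid_email(sample)}")

# Edge case: '$' also matches just before a trailing newline, so a value
# copied with a stray line break passes re.match but not re.fullmatch.
with_newline = "ops@example.com\n"
print(re.match(EMAIL_REGEX, with_newline) is not None)
print(re.fullmatch(EMAIL_REGEX, with_newline) is not None)
```

If values can arrive with stray whitespace, stripping them before validation or switching to `re.fullmatch` would close that gap.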
Impact and Results
- Error Detection: 95% of config errors caught at startup
- Reliability: System reliability improved from 60% to 99%
- Debugging: Clear error messages vs vague runtime failures
Lessons Learned
- Fail Fast: Validate configuration at startup, not while an alert is being handled
- Clear Messages: Validation errors should explain what's wrong and how to fix it
- Type Safety: Validate both type and value constraints
- Documentation: Validation rules serve as configuration documentation
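The "Clear Messages" and "Type Safety" points can be combined in a single check. The sketch below is one way to do it, assuming a hypothetical `require_percentage` helper that is not part of the original validator: it checks the type first, then the range, and each message names the setting, shows the offending value, and states what is expected.

```python
from numbers import Real


def require_percentage(name: str, value: object) -> float:
    """Hypothetical helper: return `value` as a float if it is a number in 0-100."""
    # bool is a subclass of int, so it must be rejected explicitly
    if isinstance(value, bool) or not isinstance(value, Real):
        raise TypeError(
            f"{name} must be a number, got {type(value).__name__}: {value!r}. "
            f"Set it to a percentage such as 80."
        )
    # NaN fails this comparison as well, so it is rejected here too
    if not 0 <= value <= 100:
        raise ValueError(
            f"{name} must be between 0 and 100 (percent), got {value}. "
            f"Use 80 for 80%, not 0.8 or 180."
        )
    return float(value)


print(require_percentage("ALERT_THRESHOLD", 80))

try:
    require_percentage("ALERT_THRESHOLD", 150)
except ValueError as exc:
    print(exc)
```

The same helper could back both the alert threshold and the Fargate CPU threshold rules, since they share the 0-100 range.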
Configuration validation is cheap insurance against expensive runtime failures. Always validate configuration at system initialization.