spatialx

Safe Environment Variable Handling with Feature Flags

budget-manager

Key Takeaway

Missing or malformed environment variables crashed Lambda functions during initialization. Adding safe defaults, explicit validation, and feature flags enabled graceful degradation and staged rollouts, and raised the initialization success rate from 85% to 99.5%.

The Problem

Environment variables accessed without validation:

import os

# Raises KeyError if SLACK_WEBHOOK is not set
webhook = os.environ['SLACK_WEBHOOK']

# Raises KeyError if S3_MAX_SIZE is not set, ValueError if it is not a valid float
max_size = float(os.environ['S3_MAX_SIZE'])
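
To make the two failure modes concrete, the sketch below reproduces them against a plain dictionary standing in for os.environ. The helper names read_unsafe and describe_failure are hypothetical and exist only for this illustration.

```python
def read_unsafe(env):
    """Mimic the unvalidated access pattern against a dict-like environment."""
    webhook = env['SLACK_WEBHOOK']        # KeyError when the key is absent
    max_size = float(env['S3_MAX_SIZE'])  # ValueError when the text is not numeric
    return webhook, max_size


def describe_failure(env):
    """Return the name of the exception raised by read_unsafe, or 'ok'."""
    try:
        read_unsafe(env)
    except (KeyError, ValueError) as exc:
        return type(exc).__name__
    return 'ok'
```

A missing key and a malformed value raise different exception types, so a handler that only guards against one of them can still crash on the other.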

The Solution

Safe environment variable access:

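import json
import logging
import os

logger = logging.getLogger(__name__)


class ConfigurationError(Exception):
    """Raised when required configuration is missing or invalid."""


# monitor_s3 and send_notification are defined elsewhere in the project
def lambda_handler(event, context):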
    # Required variables with validation
    bucket_name = os.environ.get('S3_BUCKET_NAME')
    if not bucket_name:
        raise ConfigurationError("S3_BUCKET_NAME required")

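    # Optional variables with defaults (a malformed value raises a clear error)
    raw_max_size = os.environ.get('S3_MAX_SIZE_GB', '1.0')
    try:
        max_size = float(raw_max_size)
    except ValueError:
        raise ConfigurationError(f"Invalid float for S3_MAX_SIZE_GB: {raw_max_size}")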

    # Feature flags
    enable_monitoring = os.environ.get('ENABLE_S3_MONITORING', 'true').lower() == 'true'
    enable_notifications = os.environ.get('ENABLE_NOTIFICATIONS', 'true').lower() == 'true'

    # Graceful degradation
    if not enable_monitoring:
        logger.info('S3 monitoring disabled')
        return {'statusCode': 200, 'body': 'Monitoring disabled'}

    # Execute with notification fallback
    result = monitor_s3(bucket_name, max_size)

    if enable_notifications:
        try:
            send_notification(result)
        except Exception:
            # Don't fail monitoring if notification fails; log with traceback
            logger.exception('Notification failed')

    return {'statusCode': 200, 'body': json.dumps(result)}
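
The notification fallback in the handler can be isolated into a small helper so it is easy to test on its own. This is a minimal sketch, not the project's actual code: notify_best_effort and its send parameter are hypothetical names, and send stands in for whatever send_notification does.

```python
import logging

logger = logging.getLogger(__name__)


def notify_best_effort(result, send, enabled=True):
    """Call send(result) if enabled; log and swallow any failure.

    Returns True only when the notification was actually delivered.
    """
    if not enabled:
        return False
    try:
        send(result)
    except Exception:
        # logger.exception records the traceback along with the message
        logger.exception('Notification failed; monitoring result is still returned')
        return False
    return True
```

Returning a boolean instead of raising keeps the monitoring result intact while still letting the caller record whether the alert went out.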

Configuration helper:

class EnvironmentConfig:
    """Safe environment variable access"""

    @staticmethod
    def get_required(key):
        value = os.environ.get(key)
        # Treat unset and empty values as missing, matching the handler's check
        if not value:
            raise ConfigurationError(f"Required environment variable missing: {key}")
        return value

    @staticmethod
    def get_optional(key, default=None):
        return os.environ.get(key, default)

    @staticmethod
    def get_int(key, default=0):
        value = os.environ.get(key, str(default))
        try:
            return int(value)
        except ValueError:
            raise ConfigurationError(f"Invalid integer for {key}: {value}")

    @staticmethod
    def get_float(key, default=0.0):
        value = os.environ.get(key, str(default))
        try:
            return float(value)
        except ValueError:
            raise ConfigurationError(f"Invalid float for {key}: {value}")

    @staticmethod
    def get_bool(key, default=False):
        # Ignore surrounding whitespace; anything not listed counts as False
        value = os.environ.get(key, str(default)).strip().lower()
        return value in ('true', '1', 'yes')

Usage:

config = {
    'bucket': EnvironmentConfig.get_required('S3_BUCKET_NAME'),
    'max_size': EnvironmentConfig.get_float('S3_MAX_SIZE_GB', 1.0),
    'enable_alerts': EnvironmentConfig.get_bool('ENABLE_ALERTS', True)
}
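
One possible refinement of the dictionary above is an immutable, typed configuration object that is built once and passed around. The sketch below is an assumption about how that could look: MonitorConfig and from_env are hypothetical names, the env parameter exists only so the loader can be exercised without touching the real environment, and ConfigurationError is redefined here to keep the example self-contained.

```python
import os
from dataclasses import dataclass


class ConfigurationError(Exception):
    """Stand-in for the article's ConfigurationError."""


@dataclass(frozen=True)
class MonitorConfig:
    bucket: str
    max_size_gb: float
    enable_alerts: bool

    @classmethod
    def from_env(cls, env=None):
        """Build the config from a mapping (defaults to os.environ)."""
        env = os.environ if env is None else env
        bucket = env.get('S3_BUCKET_NAME')
        if not bucket:
            raise ConfigurationError(
                'Required environment variable missing: S3_BUCKET_NAME')
        raw_size = env.get('S3_MAX_SIZE_GB', '1.0')
        try:
            max_size_gb = float(raw_size)
        except ValueError:
            raise ConfigurationError(
                f'Invalid float for S3_MAX_SIZE_GB: {raw_size}') from None
        raw_alerts = env.get('ENABLE_ALERTS', 'true').strip().lower()
        return cls(bucket, max_size_gb, raw_alerts in ('true', '1', 'yes'))
```

Attribute access such as cfg.max_size_gb catches misspelled keys earlier than dictionary lookups, and frozen=True prevents accidental mutation after startup.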

Implementation Details

Feature flag pattern:

class FeatureFlags:
    S3_MONITORING = 'ENABLE_S3_MONITORING'
    SQS_MONITORING = 'ENABLE_SQS_MONITORING'
    FARGATE_MONITORING = 'ENABLE_FARGATE_MONITORING'
    NOTIFICATIONS = 'ENABLE_NOTIFICATIONS'

    @staticmethod
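    def is_enabled(flag_name):
        # Flags default to enabled; accept the same truthy values as EnvironmentConfig.get_bool
        return os.environ.get(flag_name, 'true').strip().lower() in ('true', '1', 'yes')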

# Usage (config is the dictionary built in the Usage example above)
if FeatureFlags.is_enabled(FeatureFlags.S3_MONITORING):
    monitor_s3(config['bucket'], config['max_size'])
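
A flag check that compares against a single string treats every other value, including typos, as "disabled". A stricter variant, sketched below under the assumption that rejecting unknown values is acceptable, distinguishes "off" from "unrecognized". The name parse_flag, the accepted value sets, and the env parameter are all choices made for this illustration.

```python
import os

_TRUE_VALUES = {'true', '1', 'yes', 'on'}
_FALSE_VALUES = {'false', '0', 'no', 'off'}


def parse_flag(name, default=True, env=None):
    """Return the flag's boolean value, or raise ValueError if unrecognized."""
    env = os.environ if env is None else env
    raw = env.get(name)
    if raw is None or raw.strip() == '':
        return default  # unset or empty: fall back to the default
    value = raw.strip().lower()
    if value in _TRUE_VALUES:
        return True
    if value in _FALSE_VALUES:
        return False
    raise ValueError(f'Unrecognized value for flag {name}: {raw!r}')
```

Whether to raise or merely log a warning on an unrecognized value is a design choice; raising surfaces typos immediately, while warning keeps the function running with the default.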

Deployment configuration:

# serverless.yml
functions:
  s3Monitor:
    handler: handlers.s3_monitor
    environment:
      S3_BUCKET_NAME: ${self:custom.s3Bucket}
      S3_MAX_SIZE_GB: ${self:custom.s3MaxSize.${self:provider.stage}}
      ENABLE_S3_MONITORING: ${self:custom.featureFlags.s3Monitoring.${self:provider.stage}}

custom:
  s3MaxSize:
    dev: "1.0"
    staging: "5.0"
    prod: "10.0"

  featureFlags:
    s3Monitoring:
      dev: "true"
      staging: "true"
      prod: "true"
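
Per-stage maps like these are easy to leave incomplete when a new stage or flag is added. The sketch below shows one way to check for gaps before deploying. It assumes the custom block has already been loaded into a Python dictionary (for example with a YAML parser); missing_stage_values and the default stage list are hypothetical.

```python
def missing_stage_values(custom, stages=('dev', 'staging', 'prod')):
    """Return sorted 'path.stage' strings for stages absent from per-stage maps."""
    missing = []

    def walk(node, path):
        if not isinstance(node, dict):
            return  # plain values such as a bucket name have no stages
        # A per-stage map is a dict containing at least one stage key.
        if any(stage in node for stage in stages):
            missing.extend(f'{path}.{s}' for s in stages if s not in node)
            return
        for key, child in node.items():
            walk(child, f'{path}.{key}' if path else key)

    walk(custom, '')
    return sorted(missing)
```

Running this in CI against the parsed configuration would flag an entry such as featureFlags.s3Monitoring.prod before it turns into a missing variable at runtime.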

Impact and Results

  • Reliability: Initialization failures dropped from 15% to 0.5%
  • Deployments: Feature flags made canary deployments possible
  • Debugging: Missing configuration now produces clear error messages
  • Flexibility: Features can be disabled without code changes

Lessons Learned

  1. Never Assume: Environment variables may be missing or malformed
  2. Provide Defaults: Optional variables should have sensible defaults
  3. Validate Types: Convert and validate variable types explicitly
  4. Feature Flags: Enable gradual rollouts and quick feature disabling
  5. Fail Fast: Validate required configuration at startup
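
The "Fail Fast" lesson can be sketched as a single validation pass that reports every problem at once instead of stopping at the first. This is an illustrative sketch: validate_environment, its parameters, and the env argument are assumptions, and ConfigurationError is redefined so the block runs on its own.

```python
import os


class ConfigurationError(Exception):
    """Raised when required configuration is missing or malformed."""


def validate_environment(required=(), floats=(), env=None):
    """Check all names up front and raise one error listing every problem."""
    env = os.environ if env is None else env
    problems = []
    for name in required:
        if not env.get(name):  # unset and empty both count as missing
            problems.append(f'{name} is required')
    for name in floats:
        raw = env.get(name)
        if raw is None:
            continue  # optional variable; the caller's default applies
        try:
            float(raw)
        except ValueError:
            problems.append(f'{name} is not a valid float: {raw!r}')
    if problems:
        raise ConfigurationError('; '.join(problems))
```

Calling it at module level, for example validate_environment(required=['S3_BUCKET_NAME'], floats=['S3_MAX_SIZE_GB']), would surface misconfiguration at cold start; calling it at the top of the handler would report it per invocation instead.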

Safe environment variable handling is essential for production Lambda functions. Always validate, provide defaults, and implement graceful degradation for optional features.