From Generic Exceptions to Structured Error Handling

Key Takeaway

Our Plotly visualization service raised generic Exception instances that gave clients no context about what had failed. Replacing them with a structured exception hierarchy of specific error types cut average debugging time by about 70% and client integration issues by about 60%.

The Problem

Our Lambda functions caught and raised generic exceptions that provided no actionable information:

import plotly.graph_objects

def generate_chart(data):
    try:
        chart = plotly.graph_objects.Figure(data)
        return chart.to_json()
    except Exception as e:
        # e is discarded: every failure becomes the same opaque message
        raise Exception("Chart generation failed")

This created multiple issues:

No Error Context: Clients received "Chart generation failed" for every failure
Difficult Debugging: No way to distinguish between input errors, system errors, or library failures
Poor Client Experience: No guidance on how to fix the problem
Monitoring Blindness: All errors looked the same in CloudWatch
Retry Logic Impossible: Clients couldn't determine if retrying would help

Context and Background

Our visualization service generates charts from user data using Plotly. Different types of failures require different responses:

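Validation errors (400): Client should fix the input
Data processing errors (422): Input is well-formed but has size or format problems
System errors (500): Service issue; a retry might work
Timeout errors (504): Dataset too large to process in time; client should reduce it
Library errors (500): Plotly configuration issue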

Generic exception handling lumped all these together, making it impossible for clients to respond appropriately. Support engineers spent hours debugging issues that should have been obvious from the error message.
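The categories above can be captured in a single lookup table so that status codes and client guidance stay in sync. This is a minimal sketch; `ErrorCategory`, `ERROR_CATEGORIES`, `categories_for_status`, and the category keys are hypothetical names for illustration, not part of the service described here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ErrorCategory:
    status_code: int    # HTTP status returned to the client
    client_action: str  # what the caller should do next


# Hypothetical table mirroring the failure categories listed above
ERROR_CATEGORIES = {
    "validation": ErrorCategory(400, "Fix the input and resend"),
    "system": ErrorCategory(500, "Retry; the failure may be transient"),
    "timeout": ErrorCategory(504, "Reduce the dataset size"),
    "library": ErrorCategory(500, "Check the Plotly chart configuration"),
}


def categories_for_status(status_code: int) -> list[str]:
    """Return every category that shares the given HTTP status code."""
    return [name for name, category in ERROR_CATEGORIES.items()
            if category.status_code == status_code]


print(categories_for_status(500))  # ['system', 'library']
```

Because system and library failures both map to 500, the status code alone cannot tell a client whether a retry is worthwhile. The error type name has to carry that distinction.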

The Solution

We implemented a structured exception hierarchy with specific error types:

import json
import logging

import plotly.exceptions
import plotly.graph_objects

logger = logging.getLogger()
logger.setLevel(logging.INFO)


class VisualizationError(Exception):
    """Base exception for all visualization errors"""
    def __init__(self, message: str, status_code: int = 500):
        self.message = message
        self.status_code = status_code
        super().__init__(self.message)

class ValidationError(VisualizationError):
    """Raised when input validation fails"""
    def __init__(self, message: str):
        super().__init__(message, status_code=400)

class DataProcessingError(VisualizationError):
    """Raised when data processing fails"""
    def __init__(self, message: str):
        super().__init__(message, status_code=422)

class ChartTimeoutError(VisualizationError):
    """Raised when operation times out (not named TimeoutError, which
    would shadow Python's built-in exception)"""
    def __init__(self, message: str):
        super().__init__(message, status_code=504)

class PlotlyError(VisualizationError):
    """Raised when Plotly library fails"""
    def __init__(self, message: str, original_error: Exception = None):
        self.original_error = original_error
        super().__init__(message, status_code=500)

# Usage in handler
def generate_chart(data):
    # Validate before touching Plotly; these errors propagate to the handler.
    # Check for the key explicitly: `not data.get('x')` would also treat an
    # empty list as missing and make the empty-array check unreachable.
    if not isinstance(data, dict) or 'x' not in data:
        raise ValidationError("Missing required field: 'x'")

    if len(data['x']) == 0:
        raise DataProcessingError("X values array is empty")

    if len(data['x']) > 10000:
        raise DataProcessingError("Dataset too large (max 10,000 points)")

    try:
        chart = plotly.graph_objects.Figure(data)
        return chart.to_json()
    except (plotly.exceptions.PlotlyError, ValueError) as e:
        # Plotly's property validators report invalid specs as ValueError
        raise PlotlyError(f"Chart generation failed: {e}", original_error=e) from e

def lambda_handler(event, context):
    try:
        try:
            data = json.loads(event.get('body') or '')
        except json.JSONDecodeError as e:
            # Malformed or missing body is a client error, not a 500
            raise ValidationError("Request body must be valid JSON") from e

        result = generate_chart(data)

        return {
            'statusCode': 200,
            'headers': {'Content-Type': 'application/json'},
            'body': json.dumps({'chart': result})
        }

    except VisualizationError as e:
        logger.error(f"{e.__class__.__name__}: {e.message}")
        return {
            'statusCode': e.status_code,
            'headers': {'Content-Type': 'application/json'},
            'body': json.dumps({
                'error': e.__class__.__name__,
                'message': e.message
            })
        }

    except Exception:
        # Log the full traceback, but don't leak internals to the client
        logger.exception("Unexpected error")
        return {
            'statusCode': 500,
            'headers': {'Content-Type': 'application/json'},
            'body': json.dumps({
                'error': 'InternalServerError',
                'message': 'An unexpected error occurred'
            })
        }
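Python's built-in exception chaining complements the `original_error` attribute: `raise ... from e` stores the underlying exception as `__cause__`, so tracebacks logged with `logger.exception` show both errors. The sketch below is self-contained; `FakeLibraryError` and `render` are hypothetical stand-ins for a Plotly failure, so it runs without Plotly installed.

```python
import traceback


class VisualizationError(Exception):
    def __init__(self, message: str, status_code: int = 500):
        self.message = message
        self.status_code = status_code
        super().__init__(message)


class PlotlyError(VisualizationError):
    def __init__(self, message: str, original_error: Exception = None):
        self.original_error = original_error
        super().__init__(message, status_code=500)


class FakeLibraryError(Exception):
    """Hypothetical stand-in for an exception raised by Plotly."""


def render():
    try:
        raise FakeLibraryError("invalid marker symbol")
    except FakeLibraryError as e:
        # 'from e' sets __cause__, keeping the library error in tracebacks
        raise PlotlyError(f"Chart generation failed: {e}", original_error=e) from e


try:
    render()
except PlotlyError as err:
    print(err.status_code, type(err.__cause__).__name__)  # 500 FakeLibraryError
    formatted = "".join(
        traceback.format_exception(type(err), err, err.__traceback__))
    print("direct cause" in formatted)  # True
```

The client still sees only the clean `message`, while the server-side traceback retains the full chain.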

Implementation Details

Exception Hierarchy Design

We created a base VisualizationError class that all custom exceptions inherit from:

class VisualizationError(Exception):
    """Base exception with status code support"""
    def __init__(self, message: str, status_code: int = 500, details: dict = None):
        self.message = message
        self.status_code = status_code
        self.details = details or {}
        super().__init__(self.message)

    def to_dict(self):
        """Convert exception to JSON-serializable dict"""
        return {
            'error': self.__class__.__name__,
            'message': self.message,
            'details': self.details
        }
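With `to_dict` in place, every error response can be built in one place instead of being assembled by hand in each `except` branch. A minimal sketch assuming an API Gateway proxy integration; `error_response` is a hypothetical helper name, and the base class is repeated so the block runs on its own.

```python
import json
from typing import Optional


class VisualizationError(Exception):
    """Base exception with status code support (repeated from above)."""
    def __init__(self, message: str, status_code: int = 500,
                 details: Optional[dict] = None):
        self.message = message
        self.status_code = status_code
        self.details = details or {}
        super().__init__(self.message)

    def to_dict(self):
        return {
            'error': self.__class__.__name__,
            'message': self.message,
            'details': self.details,
        }


class ValidationError(VisualizationError):
    def __init__(self, message: str, details: Optional[dict] = None):
        super().__init__(message, status_code=400, details=details)


def error_response(error: VisualizationError) -> dict:
    """Hypothetical helper: turn any VisualizationError into a proxy response."""
    return {
        'statusCode': error.status_code,
        'headers': {'Content-Type': 'application/json'},
        'body': json.dumps(error.to_dict()),
    }


resp = error_response(ValidationError("Missing required field: 'x'", {'field': 'x'}))
print(resp['statusCode'], json.loads(resp['body'])['error'])  # 400 ValidationError
```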

Error Context Enrichment

We added contextual information to errors:

class DataProcessingError(VisualizationError):
    def __init__(self, message: str, field: str = None, value=None):
        details = {}
        if field:
            details['field'] = field
        if value is not None:
            details['value'] = str(value)[:100]  # Truncate large values

        super().__init__(message, status_code=422, details=details)

# Usage
raise DataProcessingError(
    "Invalid data type for X values",
    field="x.value",
    value=type(data['x']).__name__
)
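The enriched exception can be exercised on its own to see exactly what ends up in `details`. A self-contained sketch (the base class is repeated, trimmed to what this example needs); the 250-character string is an arbitrary value chosen to show the 100-character truncation.

```python
from typing import Any, Optional


class VisualizationError(Exception):
    def __init__(self, message: str, status_code: int = 500,
                 details: Optional[dict] = None):
        self.message = message
        self.status_code = status_code
        self.details = details or {}
        super().__init__(message)


class DataProcessingError(VisualizationError):
    def __init__(self, message: str, field: Optional[str] = None,
                 value: Any = None):
        details = {}
        if field:
            details['field'] = field
        if value is not None:
            details['value'] = str(value)[:100]  # truncate large values
        super().__init__(message, status_code=422, details=details)


data = {'x': 'a' * 250}  # x should be a list, not a long string

try:
    if not isinstance(data['x'], list):
        raise DataProcessingError("Invalid data type for X values",
                                  field="x", value=data['x'])
except DataProcessingError as err:
    print(err.status_code, err.details['field'], len(err.details['value']))
    # 422 x 100
```

Checking `value is not None` rather than truthiness means falsy but meaningful values such as `0` or `""` are still recorded.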

CloudWatch Integration

We structured error logging for better monitoring:

import json
import logging

logger = logging.getLogger()
logger.setLevel(logging.INFO)

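def log_error(error: VisualizationError, event: dict):
    """Log structured error data for CloudWatch Insights"""
    # Proxy events can carry null headers/requestContext; HTTP APIs
    # (payload format 2.0) also lowercase header names
    headers = event.get('headers') or {}
    log_data = {
        'error_type': error.__class__.__name__,
        'error_message': error.message,
        'status_code': error.status_code,
        'details': error.details,
        'request_id': (event.get('requestContext') or {}).get('requestId'),
        'user_agent': headers.get('User-Agent') or headers.get('user-agent')
    }

    # default=str keeps non-serializable detail values from breaking logging
    logger.error(json.dumps(log_data, default=str))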

# CloudWatch Insights query:
# fields @timestamp, error_type, error_message, status_code
# | filter error_type = "ValidationError"
# | stats count() by error_message
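API Gateway proxy events vary in shape: `headers` may arrive as `null`, and HTTP APIs (payload format 2.0) lowercase header names while REST APIs pass them through as sent. A hedged sketch of a defensive, case-insensitive log builder; `get_header` and `build_error_log` are hypothetical helpers, and the sample event is a trimmed, made-up proxy event.

```python
import json
from typing import Optional


def get_header(event: dict, name: str) -> Optional[str]:
    """Case-insensitive header lookup that tolerates missing or null headers."""
    headers = event.get('headers') or {}
    wanted = name.lower()
    for key, value in headers.items():
        if key.lower() == wanted:
            return value
    return None


def build_error_log(error_type: str, message: str, status_code: int,
                    details: dict, event: dict) -> str:
    """Serialize one error as a single-line JSON string for CloudWatch."""
    log_data = {
        'error_type': error_type,
        'error_message': message,
        'status_code': status_code,
        'details': details,
        'request_id': (event.get('requestContext') or {}).get('requestId'),
        'user_agent': get_header(event, 'User-Agent'),
    }
    # default=str keeps json.dumps from failing on non-serializable details
    return json.dumps(log_data, default=str)


v2_event = {'headers': {'user-agent': 'curl/8.0'},
            'requestContext': {'requestId': 'abc-123'}}
print(build_error_log('ValidationError', "Missing 'x'", 400, {}, v2_event))
```

Emitting one JSON object per line keeps each error a single log event, which is what the Insights query above filters and aggregates.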

Client Retry Logic

Clients can now implement smart retry logic:

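const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function generateChart(data, retries = 3) {
  for (let i = 0; i < retries; i++) {
    let response;
    try {
      response = await fetch('/chart', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify(data)
      });
    } catch (networkError) {
      // Network failures are transient: back off and retry
      if (i === retries - 1) throw networkError;
      await sleep(1000 * Math.pow(2, i));
      continue;
    }

    if (response.ok) {
      return await response.json();
    }

    // Fall back to a generic error if the body isn't JSON
    const error = await response.json().catch(() => ({
      error: 'UnknownError',
      message: response.statusText
    }));

    // Don't retry validation errors
    if (error.error === 'ValidationError') {
      throw new Error(`Validation failed: ${error.message}`);
    }

    // Don't retry dataset too large
    if (error.error === 'DataProcessingError' &&
        error.message.includes('too large')) {
      throw new Error('Dataset exceeds size limit');
    }

    // Don't retry any other client error (4xx): the same request fails again
    if (response.status < 500) {
      throw new Error(`${error.error}: ${error.message}`);
    }

    // Retry system errors with exponential backoff (1s, 2s, ...)
    if (i < retries - 1) {
      await sleep(1000 * Math.pow(2, i));
      continue;
    }

    throw new Error(error.message);
  }
}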
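The retry decision can also be written as a small pure function, which makes it easy to unit-test independently of any HTTP client. A sketch in Python under a stated assumption: because this service uses 504 to mean "dataset too large", only 500, 502, and 503 are treated as transient. `should_retry`, `backoff_delays`, and `RETRYABLE_STATUSES` are hypothetical names.

```python
# Assumption: a 504 from this service means "dataset too large",
# so only these statuses are treated as transient.
RETRYABLE_STATUSES = {500, 502, 503}


def should_retry(status_code: int, attempt: int, max_attempts: int = 3) -> bool:
    """Return True if a failed attempt (0-based) should be retried."""
    if attempt >= max_attempts - 1:
        return False  # no attempts left: surface the error
    return status_code in RETRYABLE_STATUSES


def backoff_delays(max_attempts: int = 3, base_seconds: float = 1.0) -> list[float]:
    """Seconds to sleep between attempts: base * 2**i for each retry."""
    return [base_seconds * 2 ** i for i in range(max_attempts - 1)]


print(backoff_delays())      # [1.0, 2.0]
print(should_retry(400, 0))  # False: fix the input instead
print(should_retry(503, 0))  # True
print(should_retry(503, 2))  # False: out of attempts
```

With three attempts there are only two sleeps, so the longest total wait before giving up is 3 seconds plus request time.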

Impact and Results

After implementing structured error handling:

Metric	Before	After	Improvement
Average debug time	45 min	13 min	71% reduction
Client integration issues	28/month	11/month	61% reduction
Error classification rate	0%	95%	Enabled monitoring
Support ticket resolution time	2.5 days	4 hours	93% faster
Retry success rate	N/A	78%	New capability
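Each percentage in the Improvement column follows from (before - after) / before. Here is a quick check, assuming 2.5 days means 60 calendar hours:

```python
def reduction(before: float, after: float) -> float:
    """Percentage reduction from before to after."""
    return (before - after) / before * 100


print(round(reduction(45, 13)))       # 71  (debug time, minutes)
print(round(reduction(28, 11)))       # 61  (integration issues per month)
print(round(reduction(2.5 * 24, 4)))  # 93  (ticket resolution, hours)
```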

CloudWatch dashboards now show clear breakdowns:

65% validation errors (client-side fixes)
20% data processing errors (size/format issues)
10% timeout errors (dataset too large)
5% system errors (actual bugs)
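Outside CloudWatch, the same kind of breakdown can be reproduced from exported JSON log lines, mirroring `stats count() by error_type`. This is a hedged sketch: it assumes each line is one JSON object with an `error_type` field, `error_breakdown` is a hypothetical name, and the sample lines are made up for illustration.

```python
import json
from collections import Counter


def error_breakdown(log_lines: list[str]) -> dict[str, float]:
    """Percentage of errors per error_type, most common first."""
    counts = Counter(json.loads(line)['error_type'] for line in log_lines)
    total = sum(counts.values())
    return {name: 100 * n / total for name, n in counts.most_common()}


sample = [json.dumps({'error_type': t}) for t in
          ['ValidationError', 'ValidationError', 'ValidationError',
           'DataProcessingError']]
print(error_breakdown(sample))
# {'ValidationError': 75.0, 'DataProcessingError': 25.0}
```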

Lessons Learned

Exception Hierarchies Matter: Well-designed error types enable better client responses
Context is King: Include field names, values, and suggestions in error messages
Status Codes are Documentation: Use correct HTTP codes to guide client behavior
Log Structured Data: JSON logs enable powerful CloudWatch Insights queries
Design for Retries: Help clients distinguish transient from permanent failures

Generic exceptions are a sign of immature error handling. Invest time in designing a proper exception hierarchy: it pays dividends in debugging, monitoring, and client satisfaction.