From Generic Exceptions to Structured Error Handling
Key Takeaway
Our Plotly visualization service raised generic Exception instances that gave clients no context about failures. Implementing a structured exception hierarchy with specific error types cut average debugging time by 71% and reduced client integration issues by 61%.
The Problem
Our Lambda functions caught and raised generic exceptions that provided no actionable information:
def generate_chart(data):
    try:
        chart = plotly.graph_objects.Figure(data)
        return chart.to_json()
    except Exception as e:
        raise Exception("Chart generation failed")
This created multiple issues:
- No Error Context: Clients received "Chart generation failed" for every failure
- Difficult Debugging: No way to distinguish between input errors, system errors, or library failures
- Poor Client Experience: No guidance on how to fix the problem
- Monitoring Blindness: All errors looked the same in CloudWatch
- Retry Logic Impossible: Clients couldn't determine if retrying would help
Context and Background
Our visualization service generates charts from user data using Plotly. Different types of failures require different responses:
- Validation errors (400): Client should fix the input
- System errors (500): Service issue, retry might work
- Timeout errors (504): Dataset too large, client should reduce it
- Library errors (500): Plotly configuration issue
Generic exception handling lumped all these together, making it impossible for clients to respond appropriately. Support engineers spent hours debugging issues that should have been obvious from the error message.
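The mapping above can be sketched as a small lookup that a client might consult. This is only an illustration: the function name `client_action` and the action strings are hypothetical and not part of the service. It also shows why the status code alone is not enough, since system errors and library errors share the 500 code and only the error type tells them apart.

```python
def client_action(status_code: int, error_type: str) -> str:
    """Suggest a client response for a failure (hypothetical helper)."""
    if status_code == 400:
        return "fix_input"      # validation error: correct the request
    if status_code == 504:
        return "reduce_data"    # timeout: dataset too large
    if status_code == 500 and error_type == "PlotlyError":
        return "report"         # library/configuration issue: retrying won't help
    if status_code == 500:
        return "retry"          # system error: a retry might work
    return "report"             # anything unrecognized: surface it to a human
```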
The Solution
We implemented a structured exception hierarchy with specific error types:
from typing import Optional

class VisualizationError(Exception):
    """Base exception for all visualization errors"""
    def __init__(self, message: str, status_code: int = 500):
        self.message = message
        self.status_code = status_code
        super().__init__(self.message)

class ValidationError(VisualizationError):
    """Raised when input validation fails"""
    def __init__(self, message: str):
        super().__init__(message, status_code=400)

class DataProcessingError(VisualizationError):
    """Raised when data processing fails"""
    def __init__(self, message: str):
        super().__init__(message, status_code=422)

# Note: this name shadows Python's built-in TimeoutError inside this module
class TimeoutError(VisualizationError):
    """Raised when operation times out"""
    def __init__(self, message: str):
        super().__init__(message, status_code=504)

class PlotlyError(VisualizationError):
    """Raised when Plotly library fails"""
    def __init__(self, message: str, original_error: Optional[Exception] = None):
        self.original_error = original_error
        super().__init__(message, status_code=500)

# Usage in handler
def generate_chart(data):
    try:
        # Check presence only, so an empty list reaches the next check
        if data.get('x') is None:
            raise ValidationError("Missing required field: 'x'")
        if len(data['x']) == 0:
            raise DataProcessingError("X values array is empty")
        if len(data['x']) > 10000:
            raise DataProcessingError("Dataset too large (max 10,000 points)")
        chart = plotly.graph_objects.Figure(data)
        return chart.to_json()
    except plotly.exceptions.PlotlyError as e:
        # Chain the original exception so its traceback is preserved
        raise PlotlyError(f"Chart generation failed: {e}", original_error=e) from e

def lambda_handler(event, context):
    try:
        data = json.loads(event['body'])
        result = generate_chart(data)
        return {
            'statusCode': 200,
            'headers': {'Content-Type': 'application/json'},
            'body': json.dumps({'chart': result})
        }
    except VisualizationError as e:
        logger.error(f"{e.__class__.__name__}: {e.message}")
        return {
            'statusCode': e.status_code,
            'body': json.dumps({
                'error': e.__class__.__name__,
                'message': e.message
            })
        }
    except Exception as e:
        logger.exception("Unexpected error")
        return {
            'statusCode': 500,
            'body': json.dumps({
                'error': 'InternalServerError',
                'message': 'An unexpected error occurred'
            })
        }
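One gap worth noting in the handler above: a malformed request body makes `json.loads` raise `json.JSONDecodeError`, which falls into the generic 500 branch even though the client is at fault. A minimal sketch of one way to close that gap follows. The helper `parse_body` is hypothetical, and the two exception classes are repeated in reduced form only to keep the example self-contained.

```python
import json

class VisualizationError(Exception):
    """Reduced copy of the base exception, for a self-contained example."""
    def __init__(self, message: str, status_code: int = 500):
        self.message = message
        self.status_code = status_code
        super().__init__(message)

class ValidationError(VisualizationError):
    def __init__(self, message: str):
        super().__init__(message, status_code=400)

def parse_body(event: dict) -> dict:
    """Hypothetical helper: decode the request body or raise ValidationError."""
    try:
        # A missing or null body becomes "", which json.loads rejects
        data = json.loads(event.get("body") or "")
    except json.JSONDecodeError as exc:
        raise ValidationError(f"Request body is not valid JSON: {exc.msg}") from exc
    if not isinstance(data, dict):
        raise ValidationError("Request body must be a JSON object")
    return data
```

Calling `parse_body(event)` in place of `json.loads(event['body'])` would let the existing `except VisualizationError` branch return a 400 for these cases.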
Implementation Details
Exception Hierarchy Design
We created a base VisualizationError class that all custom exceptions inherit from:
from typing import Optional

class VisualizationError(Exception):
    """Base exception with status code support"""
    def __init__(self, message: str, status_code: int = 500, details: Optional[dict] = None):
        self.message = message
        self.status_code = status_code
        self.details = details or {}
        super().__init__(self.message)

    def to_dict(self):
        """Convert exception to JSON-serializable dict"""
        return {
            'error': self.__class__.__name__,
            'message': self.message,
            'details': self.details
        }
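As a rough sketch of how `to_dict` might feed the HTTP response, the helper below builds an API Gateway style response from any `VisualizationError`. The name `error_response` is an assumption for illustration; the base class is repeated so the block runs on its own.

```python
import json
from typing import Optional

class VisualizationError(Exception):
    """Copy of the base exception above, for a self-contained example."""
    def __init__(self, message: str, status_code: int = 500,
                 details: Optional[dict] = None):
        self.message = message
        self.status_code = status_code
        self.details = details or {}
        super().__init__(message)

    def to_dict(self) -> dict:
        return {
            "error": self.__class__.__name__,
            "message": self.message,
            "details": self.details,
        }

def error_response(error: VisualizationError) -> dict:
    """Hypothetical helper: turn an exception into a Lambda proxy response."""
    return {
        "statusCode": error.status_code,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(error.to_dict()),
    }
```

With a helper like this, the `except VisualizationError` branch of the handler could shrink to a single `return error_response(e)`.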
Error Context Enrichment
We added contextual information to errors:
class DataProcessingError(VisualizationError):
    def __init__(self, message: str, field: str = None, value=None):
        details = {}
        if field:
            details['field'] = field
        if value is not None:
            details['value'] = str(value)[:100]  # Truncate large values
        super().__init__(message, status_code=422, details=details)

# Usage
raise DataProcessingError(
    "Invalid data type for X values",
    field="x.value",
    value=type(data['x']).__name__
)
CloudWatch Integration
We structured error logging for better monitoring:
import json
import logging

logger = logging.getLogger()
logger.setLevel(logging.INFO)

def log_error(error: VisualizationError, event: dict):
    """Log structured error data for CloudWatch Insights"""
    log_data = {
        'error_type': error.__class__.__name__,
        'error_message': error.message,
        'status_code': error.status_code,
        'details': error.details,
        'request_id': event.get('requestContext', {}).get('requestId'),
        # 'headers' can be present but null, so fall back to an empty dict
        'user_agent': (event.get('headers') or {}).get('User-Agent')
    }
    logger.error(json.dumps(log_data))

# CloudWatch Insights query:
# fields @timestamp, error_type, error_message, status_code
# | filter error_type = "ValidationError"
# | stats count() by error_message
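To check locally that the log lines have the shape the query expects, the aggregation can be approximated in plain Python. This is a sketch only: `count_by_message` is a hypothetical helper, and it assumes each log line holds either one JSON object with the `error_type` and `error_message` fields shown above or unrelated text.

```python
import json
from collections import Counter

def count_by_message(log_lines, error_type="ValidationError"):
    """Count error messages of one error type across JSON log lines."""
    counts = Counter()
    for line in log_lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not JSON, e.g. runtime start/end records
        if isinstance(record, dict) and record.get("error_type") == error_type:
            counts[record.get("error_message")] += 1
    return counts
```

This mirrors the `filter` and `stats count() by error_message` steps of the query, which makes it usable in a unit test for `log_error` output.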
Client Retry Logic
Clients can now implement smart retry logic:
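// Resolve after the given number of milliseconds
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Marks failures that retrying cannot fix
class NonRetryableError extends Error {}

async function generateChart(data, retries = 3) {
  for (let i = 0; i < retries; i++) {
    try {
      const response = await fetch('/chart', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify(data)
      });
      if (response.ok) {
        return await response.json();
      }
      const error = await response.json();
      // Don't retry validation errors
      if (error.error === 'ValidationError') {
        throw new NonRetryableError(`Validation failed: ${error.message}`);
      }
      // Don't retry dataset too large
      if (error.error === 'DataProcessingError' &&
          error.message.includes('too large')) {
        throw new NonRetryableError('Dataset exceeds size limit');
      }
      // Anything else is treated as a retryable system error
      throw new Error(error.message);
    } catch (e) {
      // Give up on permanent failures or once attempts are exhausted
      if (e instanceof NonRetryableError || i === retries - 1) throw e;
      // Exponential backoff before the next attempt: 1s, 2s, 4s, ...
      await sleep(1000 * Math.pow(2, i));
    }
  }
}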
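The same retry policy can be sketched for a Python client. To keep the example runnable without a network, the HTTP call is passed in as a `send` callable that returns a `(status_code, body_dict)` pair. That callable, the `PermanentError` class, and the function name are all assumptions made for illustration, not part of the service.

```python
import time

class PermanentError(Exception):
    """Hypothetical marker for failures that retrying cannot fix."""

def generate_chart_with_retry(send, data, retries=3, base_delay=1.0,
                              sleep=time.sleep):
    """Call send(data) with exponential backoff on retryable failures."""
    last_message = "no attempts made"
    for attempt in range(retries):
        status, body = send(data)
        if 200 <= status < 300:
            return body
        error_type = body.get("error")
        message = body.get("message", "")
        # Permanent failures: the client has to change the request
        if error_type == "ValidationError":
            raise PermanentError(f"Validation failed: {message}")
        if error_type == "DataProcessingError" and "too large" in message:
            raise PermanentError("Dataset exceeds size limit")
        last_message = message
        # Retryable failure: back off 1x, 2x, 4x base_delay between attempts
        if attempt < retries - 1:
            sleep(base_delay * 2 ** attempt)
    raise RuntimeError(last_message)
```

Injecting `sleep` keeps tests fast, since a test can pass a stub instead of waiting for real delays.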
Impact and Results
After implementing structured error handling:
| Metric | Before | After | Improvement |
|--------|--------|-------|-------------|
| Average debug time | 45 min | 13 min | 71% reduction |
| Client integration issues | 28/month | 11/month | 61% reduction |
| Error classification rate | 0% | 95% | Enabled monitoring |
| Support ticket resolution | 2.5 days | 4 hours | 93% reduction |
| Successful retries | N/A | 78% | New capability |
CloudWatch dashboards now show clear breakdowns:
- 65% validation errors (client-side fixes)
- 20% data processing errors (size/format issues)
- 10% timeout errors (dataset too large)
- 5% system errors (actual bugs)
Lessons Learned
- Exception Hierarchies Matter: Well-designed error types enable better client responses
- Context is King: Include field names, values, and suggestions in error messages
- Status Codes are Documentation: Use correct HTTP codes to guide client behavior
- Log Structured Data: JSON logs enable powerful CloudWatch Insights queries
- Design for Retries: Help clients distinguish transient from permanent failures
Generic exceptions are a sign of immature error handling. Invest time in designing a proper exception hierarchy. It pays dividends in debugging, monitoring, and client satisfaction.