spatialx

Type Validation: Catching Runtime Type Errors Before They Crash Production

·visualization-utils

Key Takeaway

Our visualization service crashed with TypeError whenever clients sent numbers as strings or sent the wrong data structure. Adding runtime type validation with Pydantic cut type-related errors by 95% (from 340 to 18 per week) and gave client teams a clear API contract.

The Problem

Our code assumed data types without validation:

import plotly.graph_objects as go

def bar_chart(data):
    x = data['x']['value']  # Assumes a list
    y = data['y'][0]['value']  # Assumes a list of numbers

    # Crashes or renders garbage if x or y is a string, None, or another non-list type
    fig = go.Figure(data=[go.Bar(x=x, y=y)])
    return fig.to_json()

This caused multiple failures:

  1. Runtime TypeErrors: TypeError raised when our code iterated over or indexed into a value that was not a list (for example null or a bare number)
  2. Silent Failures: Plotly silently failed on incompatible types
  3. Confusing Errors: "Cannot convert string to float" deep in Plotly stack
  4. No Type Contract: Clients didn't know what types to send
  5. Debugging Hell: Stack traces pointed to Plotly internals, not our code

Common failing inputs:

{
  "x": {"value": "1,2,3,4,5"},  // String instead of array
  "y": [{"value": "[10, 20, 30]", "name": "Series1"}]  // Stringified array
}
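
Why some of these inputs fail silently instead of crashing can be seen without Plotly. The sketch below is illustrative only: `as_points` is a hypothetical stand-in for any code that iterates over a value it assumes is a list.

```python
def as_points(value):
    """Hypothetical stand-in for code that assumes `value` is a list."""
    return [item for item in value]


# A real list behaves as expected.
print(as_points([1, 2, 3]))

# A comma-separated string is also iterable, so nothing is raised;
# the "data" silently becomes individual characters.
print(as_points("1,2,3"))

# None (JSON null) is not iterable, so this path raises TypeError.
try:
    as_points(None)
except TypeError as exc:
    print(f"TypeError: {exc}")
```

The string case is the dangerous one: no exception is raised at the boundary, so the bad data travels further into the charting code before anything looks wrong.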

Context and Background

Different clients integrated with our API:

  • JavaScript/TypeScript frontends (type-aware)
  • Python backends (dynamic typing)
  • Excel/CSV imports (everything as strings)
  • Third-party integrations (unknown)

Without explicit type validation, clients made incorrect assumptions about our API contract. CSV-based imports were particularly problematic: Excel exports often converted arrays to comma-separated strings, causing silent failures that users attributed to "broken charts."

The Solution

We implemented Pydantic models for comprehensive type validation:

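import json
import logging
from typing import List, Optional, Union

import numpy as np
# Written against the Pydantic v1 API (validator, min_items, regex, schema_extra)
from pydantic import (
    BaseModel, Field, StrictFloat, StrictInt, StrictStr, ValidationError, validator
)

logger = logging.getLogger(__name__)

class XAxisData(BaseModel):
    """X-axis data structure"""
    # Strict types stop Pydantic v1 from coercing 1.5 -> 1 or "2" -> 2 through the Union
    value: List[Union[StrictInt, StrictFloat, StrictStr]] = Field(..., min_items=1, max_items=10000)
    label: Optional[str] = None

    # pre=True runs on the raw input, so the messages below are what the client sees
    @validator('value', pre=True)
    def validate_value_types(cls, v):
        """Ensure the raw input is a list of numbers or strings"""
        if not isinstance(v, list):
            raise ValueError(f"X values must be a list, got {type(v).__name__}")
        for idx, item in enumerate(v):
            # bool is a subclass of int, so reject it explicitly
            if isinstance(item, bool) or not isinstance(item, (int, float, str)):
                raise ValueError(
                    f"X value at index {idx} must be number or string, got {type(item).__name__}"
                )
        return v

    @validator('value')
    def check_no_nan(cls, v):
        """Check for NaN or infinity"""
        for idx, item in enumerate(v):
            if isinstance(item, float):
                if np.isnan(item):
                    raise ValueError(f"X value at index {idx} is NaN")
                if np.isinf(item):
                    raise ValueError(f"X value at index {idx} is infinity")
        return v

class YAxisData(BaseModel):
    """Y-axis data structure"""
    value: List[Union[StrictInt, StrictFloat]] = Field(..., min_items=1, max_items=10000)
    name: str = Field(..., min_length=1, max_length=100)
    color: Optional[str] = None

    @validator('value', pre=True)
    def validate_numeric_only(cls, v):
        """Y values must be a list of finite numbers"""
        if not isinstance(v, list):
            raise ValueError(f"Y values must be a list, got {type(v).__name__}")
        for idx, item in enumerate(v):
            if isinstance(item, bool) or not isinstance(item, (int, float)):
                raise ValueError(
                    f"Y value at index {idx} must be numeric, got {type(item).__name__}: {item!r}"
                )
            if isinstance(item, float):
                if np.isnan(item):
                    raise ValueError(f"Y value at index {idx} is NaN")
                if np.isinf(item):
                    raise ValueError(f"Y value at index {idx} is infinity")
        return v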

class BarChartRequest(BaseModel):
    """Complete bar chart request validation"""
    x: XAxisData
    y: List[YAxisData] = Field(..., min_items=1, max_items=10)
    title: Optional[str] = Field(None, max_length=200)
    theme: Optional[str] = Field('default', regex='^(default|dark|light)$')

    @validator('y')
    def validate_array_lengths(cls, y_data, values):
        """Ensure all Y arrays match X length"""
        if 'x' not in values:
            return y_data

        x_length = len(values['x'].value)

        for idx, y_item in enumerate(y_data):
            if len(y_item.value) != x_length:
                raise ValueError(
                    f"Y series '{y_item.name}' has {len(y_item.value)} values, "
                    f"but X has {x_length} values. Arrays must be same length."
                )

        return y_data

    class Config:
        # Provide example in schema
        schema_extra = {
            "example": {
                "x": {"value": [1, 2, 3, 4, 5], "label": "X Axis"},
                "y": [{"value": [10, 20, 30, 40, 50], "name": "Series 1"}],
                "title": "Sample Chart",
                "theme": "default"
            }
        }

def lambda_handler(event, context):
    try:
        # Parse JSON body
        body = json.loads(event['body'])

        # Validate types; parse_obj also turns a non-object body (e.g. a JSON list)
        # into a ValidationError instead of a TypeError
        request = BarChartRequest.parse_obj(body)

        # Generate chart with validated data
        chart = generate_bar_chart(request)

        return {
            'statusCode': 200,
            'headers': {'Content-Type': 'application/json'},
            'body': json.dumps({'chart': chart})
        }

    except ValidationError as e:
        # Pydantic provides detailed error information
        errors = []
        for error in e.errors():
            errors.append({
                'field': '.'.join(str(x) for x in error['loc']),
                'message': error['msg'],
                'type': error['type']
            })

        return {
            'statusCode': 400,
            'headers': {'Content-Type': 'application/json'},
            'body': json.dumps({
                'error': 'ValidationError',
                'message': 'Request validation failed',
                'errors': errors
            })
        }

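    except json.JSONDecodeError:
        # Malformed JSON is a client error, not a server fault
        return {
            'statusCode': 400,
            'headers': {'Content-Type': 'application/json'},
            'body': json.dumps({'error': 'InvalidJSON', 'message': 'Request body is not valid JSON'})
        }

    except Exception:
        logger.exception("Unexpected error")
        return {
            'statusCode': 500,
            'headers': {'Content-Type': 'application/json'},
            'body': json.dumps({'error': 'Internal server error'})
        }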
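
The error-flattening step in the handler can be tried on its own. This is a minimal sketch that assumes each error is a dict with `loc`, `msg`, and `type` keys, as the handler reads them; `format_errors` and the sample entries are hypothetical and hand-written, not captured output.

```python
from typing import Any, Dict, List


def format_errors(raw_errors: List[Dict[str, Any]]) -> List[Dict[str, str]]:
    """Flatten error dicts with 'loc', 'msg' and 'type' keys for an API response."""
    return [
        {
            # loc is a tuple of field names and list indexes, e.g. ('y', 0, 'value')
            "field": ".".join(str(part) for part in err["loc"]),
            "message": err["msg"],
            "type": err["type"],
        }
        for err in raw_errors
    ]


# Hand-written sample entries (illustrative messages, not real library output)
sample = [
    {"loc": ("y", 0, "value"), "msg": "Y values must be a list, got str", "type": "value_error"},
    {"loc": ("title",), "msg": "title is too long", "type": "value_error"},
]

formatted = format_errors(sample)
print(formatted[0]["field"])
print(formatted[1]["field"])
```

Joining the location with dots gives clients a path they can map straight back onto the request body they sent.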

Implementation Details

Auto-Coercion for Common Cases

Pydantic can automatically coerce types:

class CoerciveYAxisData(BaseModel):
    """Y-axis with automatic type coercion"""
    value: List[float]  # Automatically converts "10" -> 10.0
    name: str

    @validator('value', pre=True)
    def coerce_to_float_list(cls, v):
        """Handle common type conversion issues"""
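        # Handle stringified JSON arrays and comma-separated strings
        if isinstance(v, str):
            try:
                v = json.loads(v)
            except json.JSONDecodeError:
                # Not JSON: fall back to comma-separated values such as "10,20,30";
                # unparseable items are rejected by the float() conversion below
                v = [part.strip() for part in v.split(',')]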

        # Handle single value instead of array
        if not isinstance(v, list):
            v = [v]

        # Convert each item to float
        result = []
        for idx, item in enumerate(v):
            try:
                result.append(float(item))
            except (ValueError, TypeError):
                raise ValueError(
                    f"Cannot convert Y value at index {idx} to number: {item}"
                )

        return result

# Now accepts: "10,20,30" or [10, 20, 30] or ["10", "20", "30"]
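
Silent coercion can hide client bugs, so it may help to log whenever an input had to be converted. The following is a standalone sketch of that idea outside Pydantic; `coerce_y_values` and the logger name are hypothetical, and the comma-separated fallback is an assumption about how such strings should be read.

```python
import json
import logging
from typing import Any, List

logger = logging.getLogger("coercion_sketch")  # hypothetical logger name


def coerce_y_values(raw: Any) -> List[float]:
    """Convert common malformed inputs to a list of floats, warning when coercion occurs."""
    # Inputs that are already a list of real numbers need no warning.
    already_clean = isinstance(raw, list) and all(
        isinstance(item, (int, float)) and not isinstance(item, bool) for item in raw
    )
    values = raw
    if isinstance(values, str):
        try:
            values = json.loads(values)  # stringified JSON, e.g. "[10, 20, 30]"
        except json.JSONDecodeError:
            values = [part.strip() for part in values.split(",")]  # e.g. "10,20,30"
    if not isinstance(values, list):
        values = [values]  # single value instead of an array
    try:
        result = [float(item) for item in values]
    except (ValueError, TypeError) as exc:
        raise ValueError(f"Cannot convert Y values to numbers: {raw!r}") from exc
    if not already_clean:
        logger.warning("Coerced Y values from %s input", type(raw).__name__)
    return result


print(coerce_y_values("10,20,30"))
print(coerce_y_values("[10, 20, 30]"))
print(coerce_y_values(["10", "20", "30"]))
```

The warning gives the team a way to find clients that rely on coercion and nudge them toward sending proper arrays.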

OpenAPI Schema Generation

Pydantic models generate OpenAPI schemas automatically:

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

@app.post("/chart/bar")
def create_bar_chart(request: BarChartRequest):
    """
    Generate a bar chart from provided data.

    The API automatically generates documentation from Pydantic models.
    """
    return generate_bar_chart(request)

# FastAPI automatically creates:
# - Interactive docs at /docs
# - OpenAPI schema at /openapi.json
# - Request validation
# - Response serialization

Custom Validators for Business Rules

We added domain-specific validation:

class ChartRequest(BaseModel):
    x: XAxisData
    y: List[YAxisData]

    @validator('x')
    def check_x_uniqueness(cls, v):
        """Warn about duplicate X values"""
        values = v.value
        if len(values) != len(set(values)):
            # Don't fail, but log warning
            logger.warning("X values contain duplicates")
        return v

    @validator('y')
    def check_y_variance(cls, y_data):
        """Warn if all Y values are identical"""
        for y in y_data:
            if len(set(y.value)) == 1:
                logger.warning(
                    f"Y series '{y.name}' has no variance (all values are {y.value[0]})"
                )
        return y_data

    @validator('y')
    def check_reasonable_scale(cls, y_data):
        """Warn about extreme value ranges"""
        for y in y_data:
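            min_val = min(y.value)
            max_val = max(y.value)
            # Compare magnitudes only for strictly positive series; a zero or
            # negative minimum would otherwise always trigger the warning
            if min_val > 0 and min_val / max_val < 0.0001: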
                logger.warning(
                    f"Y series '{y.name}' has extreme range: {min_val} to {max_val}"
                )
        return y_data
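
A ratio-based range check needs care with zero and negative values. The sketch below is one possible standalone variant, assuming only strictly positive series should be compared; `has_extreme_range` and its `min_ratio` parameter are hypothetical names, and the input is assumed non-empty.

```python
from typing import Sequence


def has_extreme_range(values: Sequence[float], min_ratio: float = 1e-4) -> bool:
    """Return True when a strictly positive series has min/max below min_ratio."""
    smallest, largest = min(values), max(values)
    if smallest <= 0:
        # The ratio is undefined or misleading when zero or negative values are present.
        return False
    return smallest / largest < min_ratio


print(has_extreme_range([1, 100_000]))  # ratio 1e-05 is below the 1e-04 threshold
print(has_extreme_range([-5, 10]))      # negative minimum, so not compared
print(has_extreme_range([0, 10]))       # zero minimum, so not compared
```

Keeping the check in a plain function also makes it easy to unit-test the threshold separately from the Pydantic models.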

Type Documentation

We generated type definitions for clients:

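# Export TypeScript definitions
from pydantic2ts import generate_typescript_defs

# Module path and output file are placeholders for illustration
generate_typescript_defs("visualization.models", "./frontend/apiTypes.ts")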

# Generates:
# export interface XAxisData {
#   value: (number | string)[];
#   label?: string;
# }
#
# export interface YAxisData {
#   value: number[];
#   name: string;
#   color?: string;
# }
#
# export interface BarChartRequest {
#   x: XAxisData;
#   y: YAxisData[];
#   title?: string;
#   theme?: "default" | "dark" | "light";
# }

Impact and Results

After implementing type validation:

Metric                               Before     After      Improvement
Type-related errors                  340/week   18/week    95% reduction
Invalid request rate                 12%        0.8%       93% reduction
Client integration time              3-5 days   4-6 hours  85% faster
Support: "Why isn't this working?"   25/week    2/week     92% reduction
Time to identify bad data            30 min     0 sec      Instant feedback
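
The rounded reduction figures follow directly from the before and after values. As a quick check, using a hypothetical helper `percent_reduction`:

```python
def percent_reduction(before: float, after: float) -> float:
    """Percentage drop from `before` to `after`."""
    return (before - after) / before * 100


print(round(percent_reduction(340, 18), 1))  # type-related errors per week
print(round(percent_reduction(12, 0.8), 1))  # invalid request rate, in percent
print(round(percent_reduction(25, 2), 1))    # support questions per week
```

These come out at 94.7, 93.3, and 92.0, which round to the 95%, 93%, and 92% reported above.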

Clear error messages helped developers immediately:

Before:

TypeError: 'str' object is not iterable
  at plotly/graph_objs/_bar.py line 234

After:

{
  "error": "ValidationError",
  "message": "Request validation failed",
  "errors": [
    {
      "field": "y.0.value.3",
      "message": "Y value at index 3 must be numeric, got str: 'N/A'",
      "type": "value_error"
    }
  ]
}

Lessons Learned

  1. Validate at the Boundary: Type-check all inputs before processing
  2. Use Type Libraries: Pydantic, marshmallow, or similar save enormous effort
  3. Fail Early with Context: Tell users exactly what's wrong and where
  4. Generate Documentation: Derive API docs from validation schemas
  5. Coerce Carefully: Auto-convert common mistakes, but log warnings

Runtime type validation is essential in dynamically typed languages like Python. Pydantic provides production-grade validation with minimal code and excellent error messages. For us the investment paid back quickly in fewer errors and faster client integration.