
Removing Sequential Access Mode: A 10x Performance Improvement in Image Processing

wsi-processor

Key Takeaway

Our WSI processor used access="sequential" mode when opening large images, forcing slow sequential reads instead of leveraging random access. Removing this single parameter improved tile generation speed by 10x and reduced processing time from 45 minutes to 4 minutes for large slides.

The Problem

We enabled sequential access mode thinking it would optimize large file processing:

import openslide

def process_wsi(image_path):
    # Sequential access mode forces linear reads
    img = openslide.OpenSlide(image_path, access="sequential")

    # Tile generation requires random access!
    # This is incredibly slow with sequential mode
    for level in range(img.level_count):
        width, height = img.level_dimensions[level]
        num_tiles_x = (width + 255) // 256
        num_tiles_y = (height + 255) // 256

        for tile_x in range(num_tiles_x):
            for tile_y in range(num_tiles_y):
                # Each tile read traverses the file sequentially
                tile = img.read_region(
                    (tile_x * 256, tile_y * 256),
                    level,
                    (256, 256)
                )

This caused severe performance issues:

  1. Extremely Slow Processing: 45 minutes for a 2GB slide (should take 4 minutes)
  2. Linear File Traversal: Had to read through entire file to get each tile
  3. Wasted I/O: Read same data multiple times
  4. High Costs: Long-running Lambda/ECS tasks cost 10x more
  5. Poor User Experience: Users waited hours for slide processing

Processing pattern with sequential mode:

Reading a 50KB tile at offset 50MB:
  - Seek from start
  - Read bytes 0-50MB (discard)
  - Read tile data
  - Total: 50KB useful, ~50MB discarded

Reading a 50KB tile at offset 100MB:
  - Seek from start
  - Read bytes 0-100MB (discard)
  - Read tile data
  - Total: 50KB useful, ~100MB discarded
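This waste compounds across a whole slide. As a back-of-envelope model (the function names here are ours, purely illustrative): under sequential access, reaching a tile at byte offset k means reading roughly k bytes first, while random access seeks straight to k.

```python
def sequential_bytes_read(tile_offsets, tile_bytes):
    # each read traverses the file from the start up to the tile's offset
    return sum(offset + tile_bytes for offset in tile_offsets)

def random_bytes_read(tile_offsets, tile_bytes):
    # a seek costs no reads; only the tile itself is read
    return len(tile_offsets) * tile_bytes

# 100 tiles of 50 KB, spaced 1 MB apart in the file
offsets = [i * 1_000_000 for i in range(1, 101)]
seq = sequential_bytes_read(offsets, 50_000)  # ~5.06 GB read
rnd = random_bytes_read(offsets, 50_000)      # 5 MB read
```

Even in this toy model, sequential access reads roughly 1000x more bytes than the tiles actually contain, and the gap widens with file size.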

Context and Background

We were processing Whole Slide Images (WSI) from digital pathology scanners. These files are massive:

  • Typical size: 1-5GB per slide
  • Dimensions: 100,000 x 100,000 pixels
  • Tile count: 10,000-50,000 tiles
  • Format: Pyramidal TIFF with multiple resolution levels

Deep zoom tile generation requires random access to different parts of the image:

  • Level 0: Full resolution (bottom of pyramid)
  • Level 1: 50% resolution
  • Level 2: 25% resolution
  • etc.

Each level requires reading from different file offsets. Sequential access mode meant we had to traverse from the file start for every single tile read, making the process O(n²) instead of O(n).
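To make the scale concrete, here is a sketch (our own helper, not processor code) of how tile counts fall off per pyramid level for a 100,000 x 100,000 slide with 256-pixel tiles, assuming each level halves the resolution:

```python
import math

def tiles_per_level(width, height, tile_size=256, levels=3):
    """Tile counts for the top `levels` of a pyramid that halves each level."""
    counts = []
    for level in range(levels):
        w, h = width >> level, height >> level  # halve dimensions per level
        counts.append(math.ceil(w / tile_size) * math.ceil(h / tile_size))
    return counts

tiles_per_level(100_000, 100_000)  # → [152881, 38416, 9604]
```

Level 0 alone accounts for over 150,000 tile reads, which is why the per-tile cost of sequential traversal dominated our processing time.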

The confusion came from documentation suggesting sequential mode for "large files." We misinterpreted this to mean "large image files" when it actually means "files you're reading sequentially from start to end" (like log files or streams).

The Solution

We removed the sequential access parameter:

def process_wsi_optimized(image_path: str, output_path: str):
    """Process WSI with proper random access"""

    logger.info(f"Opening slide: {image_path}")

    # Remove access="sequential" - use default random access
    slide = openslide.OpenSlide(image_path)

    logger.info(
        f"Slide opened: {slide.dimensions}, "
        f"levels: {slide.level_count}"
    )

    # Generate tiles with random access
    start_time = time.time()

    # DeepZoomImageGenerator is a project-internal tiling helper
    dzi_generator = DeepZoomImageGenerator(
        slide,
        output_path=output_path,
        tile_size=256,
        tile_format='jpeg',
        overlap=0,
        limit_bounds=True
    )

    # Generate all tiles
    dzi_generator.generate()

    duration = time.time() - start_time

    logger.info(
        f"Generated {dzi_generator.tile_count} tiles "
        f"in {duration:.1f} seconds "
        f"({dzi_generator.tile_count / duration:.1f} tiles/sec)"
    )

    # Close slide
    slide.close()

    return {
        'tile_count': dzi_generator.tile_count,
        'duration_seconds': duration,
        'tiles_per_second': dzi_generator.tile_count / duration
    }

Implementation Details

Performance Monitoring

We added detailed performance tracking:

import time
from contextlib import contextmanager

import boto3

cloudwatch = boto3.client('cloudwatch')

@contextmanager
def timing_context(operation: str):
    """Context manager for timing operations.

    Yields a dict whose 'duration' key is filled in on exit.
    """
    timing = {}
    start = time.time()
    logger.info(f"Starting: {operation}")

    try:
        yield timing
    finally:
        timing['duration'] = time.time() - start
        logger.info(f"Completed: {operation} in {timing['duration']:.2f}s")

        # Log to CloudWatch
        cloudwatch.put_metric_data(
            Namespace='WSI/Performance',
            MetricData=[{
                'MetricName': 'OperationDuration',
                'Value': timing['duration'],
                'Unit': 'Seconds',
                'Dimensions': [
                    {'Name': 'Operation', 'Value': operation}
                ]
            }]
        )

def process_wsi_with_metrics(image_path: str):
    """Process WSI with detailed performance metrics"""

    metrics = {}

    with timing_context('OpenSlide') as open_timing:
        slide = openslide.OpenSlide(image_path)

    metrics['slide_open_time'] = open_timing['duration']

    with timing_context('TileGeneration') as gen_timing:
        tile_count = 0

        for level in range(slide.level_count):
            level_start = time.time()

            # Generate tiles for this level (project-internal helper)
            level_tiles = generate_level_tiles(slide, level)
            tile_count += level_tiles

            level_duration = time.time() - level_start

            logger.info(
                f"Level {level}: {level_tiles} tiles "
                f"in {level_duration:.1f}s "
                f"({level_tiles/level_duration:.1f} tiles/sec)"
            )

    metrics['tile_generation_time'] = gen_timing['duration']
    metrics['tile_count'] = tile_count
    metrics['tiles_per_second'] = tile_count / gen_timing['duration']

    slide.close()

    return metrics

Parallel Tile Generation

With random access, we could parallelize tile generation:

import os
import time

from concurrent.futures import ThreadPoolExecutor, as_completed

def generate_tile(slide, level, x, y, tile_size):
    """Generate a single tile"""
    try:
        # Calculate position
        location = (x * tile_size, y * tile_size)

        # Read tile with random access
        tile = slide.read_region(
            location,
            level,
            (tile_size, tile_size)
        )

        # Convert to RGB
        tile = tile.convert('RGB')

        return (level, x, y, tile)

    except Exception as e:
        logger.error(f"Failed to generate tile ({level}, {x}, {y}): {e}")
        return None

def process_wsi_parallel(image_path: str, output_path: str, workers: int = 4):
    """Process WSI with parallel tile generation"""

    slide = openslide.OpenSlide(image_path)

    tile_size = 256
    tile_jobs = []

    # Create tile generation jobs
    for level in range(slide.level_count):
        level_dims = slide.level_dimensions[level]
        tiles_x = (level_dims[0] + tile_size - 1) // tile_size
        tiles_y = (level_dims[1] + tile_size - 1) // tile_size

        for x in range(tiles_x):
            for y in range(tiles_y):
                tile_jobs.append((level, x, y))

    logger.info(f"Generating {len(tile_jobs)} tiles with {workers} workers")

    # Process tiles in parallel
    start_time = time.time()
    completed = 0

    with ThreadPoolExecutor(max_workers=workers) as executor:
        # Submit all jobs
        futures = {
            executor.submit(
                generate_tile,
                slide,
                level,
                x,
                y,
                tile_size
            ): (level, x, y)
            for level, x, y in tile_jobs
        }

        # Process completed tiles
        for future in as_completed(futures):
            result = future.result()

            if result:
                level, x, y, tile = result

                # Save tile
                tile_path = f"{output_path}/level{level}/tile_{x}_{y}.jpg"
                os.makedirs(os.path.dirname(tile_path), exist_ok=True)
                tile.save(tile_path, 'JPEG', quality=95)

                completed += 1

                if completed % 100 == 0:
                    elapsed = time.time() - start_time
                    rate = completed / elapsed
                    logger.info(
                        f"Progress: {completed}/{len(tile_jobs)} "
                        f"({rate:.1f} tiles/sec)"
                    )

    duration = time.time() - start_time

    logger.info(
        f"Completed {len(tile_jobs)} tiles "
        f"in {duration:.1f}s "
        f"({len(tile_jobs)/duration:.1f} tiles/sec)"
    )

    slide.close()
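The job-list construction above is easy to unit-test without a real slide by passing the level dimensions in directly (`build_tile_jobs` is our name for the extracted logic):

```python
def build_tile_jobs(level_dimensions, tile_size=256):
    """Enumerate (level, x, y) jobs for each pyramid level's dimensions."""
    jobs = []
    for level, (width, height) in enumerate(level_dimensions):
        tiles_x = (width + tile_size - 1) // tile_size   # ceiling division
        tiles_y = (height + tile_size - 1) // tile_size
        jobs.extend(
            (level, x, y)
            for x in range(tiles_x)
            for y in range(tiles_y)
        )
    return jobs

# two levels: 512x512 needs 2x2 tiles, 256x256 needs 1
build_tile_jobs([(512, 512), (256, 256)])
# → [(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1), (1, 0, 0)]
```

Threads (rather than processes) work well here because the heavy decoding happens in OpenSlide's C library, which releases the GIL, and a single slide handle can be shared safely across threads.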

Resource Optimization

We optimized memory usage for parallel processing:

import gc

def process_wsi_optimized_memory(image_path: str, batch_size: int = 100):
    """Process WSI in batches to manage memory"""

    slide = openslide.OpenSlide(image_path)

    tile_size = 256

    # Calculate all tile positions first
    tile_positions = []

    for level in range(slide.level_count):
        level_dims = slide.level_dimensions[level]
        tiles_x = (level_dims[0] + tile_size - 1) // tile_size
        tiles_y = (level_dims[1] + tile_size - 1) // tile_size

        for x in range(tiles_x):
            for y in range(tiles_y):
                tile_positions.append((level, x, y))

    # Process in batches
    for i in range(0, len(tile_positions), batch_size):
        batch = tile_positions[i:i + batch_size]

        logger.info(
            f"Processing batch {i//batch_size + 1} "
            f"({len(batch)} tiles)"
        )

        # Generate tiles for this batch
        with ThreadPoolExecutor(max_workers=4) as executor:
            futures = [
                executor.submit(generate_tile, slide, *pos, tile_size)
                for pos in batch
            ]

            for future in as_completed(futures):
                result = future.result()
                if result:
                    save_tile(result)  # project helper: writes the tile to disk

        # Force garbage collection between batches to release PIL images
        gc.collect()

    slide.close()
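The batching pattern itself is tiny and generic; a minimal sketch (`batches` is our own helper name) shows the slicing logic in isolation:

```python
def batches(items, size):
    """Yield successive slices of at most `size` items."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

list(batches(list(range(7)), 3))  # → [[0, 1, 2], [3, 4, 5], [6]]
```

Scoping each batch's tiles to one loop iteration is what lets the garbage collector reclaim the decoded PIL images before the next batch starts.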

Impact and Results

After removing sequential access mode:

| Metric | Before (Sequential) | After (Random Access) | Improvement |
|--------|--------------------|-----------------------|-------------|
| Processing time (2GB slide) | 45 min | 4 min | 11.25x faster |
| Tiles per second | 15 | 165 | 11x faster |
| I/O operations | 500 million | 45 million | 91% reduction |
| Lambda timeout rate | 67% | 0% | Eliminated |
| Processing cost per slide | $3.20 | $0.28 | 91% reduction |

Performance by slide size:

  • Small (500MB): 2m 15s → 15s (9x faster)
  • Medium (2GB): 45m → 4m (11x faster)
  • Large (5GB): Timeout → 9m (enabled processing)

With parallel processing (4 workers):

  • Processing time: 4m → 1m 45s (2.3x faster)
  • Tiles per second: 165 → 380 (2.3x faster)

Lessons Learned

  1. Understand Access Patterns: Sequential mode is for streaming, not random access
  2. Profile Before Optimizing: We should have measured I/O patterns earlier
  3. Read Documentation Carefully: "Large files" doesn't mean "large image files"
  4. Random Access Enables Parallelization: Can't parallelize sequential reads
  5. One Parameter Can Matter: Removing 19 characters improved performance 10x

The access="sequential" parameter was a catastrophic performance mistake. Image tile generation is inherently a random-access workload: we need to read from arbitrary file positions to extract tiles. Sequential access mode forced every tile read to traverse the file linearly from the start, turning an O(n) operation into O(n²). Always match your I/O mode to your access pattern.