Removing Sequential Access Mode: A 10x Performance Improvement in Image Processing
Key Takeaway
Our WSI processor used access="sequential" mode when opening large images, forcing slow sequential reads instead of leveraging random access. Removing this single parameter improved tile generation speed by 10x and reduced processing time from 45 minutes to 4 minutes for large slides.
The Problem
We enabled sequential access mode thinking it would optimize large file processing:
```python
def process_wsi(image_path):
    # Sequential access mode forces linear reads
    img = openslide.OpenSlide(image_path, access="sequential")

    # Tile generation requires random access!
    # This is incredibly slow with sequential mode
    for level in range(img.level_count):
        width, height = img.level_dimensions[level]
        num_tiles_x = (width + 255) // 256
        num_tiles_y = (height + 255) // 256
        for tile_x in range(num_tiles_x):
            for tile_y in range(num_tiles_y):
                # Each tile read traverses the file sequentially
                tile = img.read_region(
                    (tile_x * 256, tile_y * 256),
                    level,
                    (256, 256)
                )
```
This caused severe performance issues:
- Extremely Slow Processing: 45 minutes for a 2GB slide (should take 4 minutes)
- Linear File Traversal: Had to read through entire file to get each tile
- Wasted I/O: Read same data multiple times
- High Costs: Long-running Lambda/ECS tasks cost 10x more
- Poor User Experience: Users waited hours for slide processing
Processing pattern with sequential mode:
Reading a tile at a 50 MB offset:
- Seek from start
- Read and discard bytes 0-50 MB
- Read ~50 KB of tile data
- Total: kept 50 KB, discarded 50 MB

Reading a tile at a 100 MB offset:
- Seek from start
- Read and discard bytes 0-100 MB
- Read ~50 KB of tile data
- Total: kept 50 KB, discarded 100 MB
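This waste can be modeled directly. Assuming, hypothetically, that tile data is spread evenly through the file, total bytes read grow quadratically with tile count under sequential access but only linearly under random access:

```python
def sequential_bytes_read(file_size: int, num_tiles: int, tile_bytes: int) -> int:
    """Total bytes read if every tile requires scanning from the file start.

    Assumes tile i sits at offset roughly i * file_size / num_tiles.
    """
    total = 0
    for i in range(num_tiles):
        offset = i * file_size // num_tiles  # bytes discarded to reach the tile
        total += offset + tile_bytes
    return total


def random_bytes_read(num_tiles: int, tile_bytes: int) -> int:
    """With random access, only the tile data itself is read."""
    return num_tiles * tile_bytes


# Hypothetical 2 GB slide with 20,000 tiles of ~50 KB each
seq = sequential_bytes_read(2 * 10**9, 20_000, 50_000)
rnd = random_bytes_read(20_000, 50_000)
print(f"sequential: {seq / 10**12:.1f} TB, random: {rnd / 10**9:.1f} GB")
# → sequential: 20.0 TB, random: 1.0 GB
```

The slide sizes and tile counts here are illustrative, but the shape of the result is not: sequential mode turns 1 GB of useful reads into tens of terabytes of I/O.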
Context and Background
We were processing Whole Slide Images (WSI) from digital pathology scanners. These files are massive:
- Typical size: 1-5GB per slide
- Dimensions: 100,000 x 100,000 pixels
- Tile count: 10,000-50,000 tiles
- Format: Pyramidal TIFF with multiple resolution levels
Deep zoom tile generation requires random access to different parts of the image:
- Level 0: Full resolution (bottom of pyramid)
- Level 1: 50% resolution
- Level 2: 25% resolution
- etc.
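The pyramid geometry above can be sketched in a few lines. Assuming each level simply halves the previous one, and using a hypothetical 50,000 x 50,000 pixel slide, the per-level tile counts fall off quickly:

```python
def pyramid_tile_counts(width: int, height: int, tile_size: int = 256):
    """Yield (level, width, height, tile_count), halving dimensions per level."""
    level = 0
    while True:
        tiles_x = (width + tile_size - 1) // tile_size  # ceil division
        tiles_y = (height + tile_size - 1) // tile_size
        yield level, width, height, tiles_x * tiles_y
        if width <= tile_size and height <= tile_size:
            break
        width = max(1, width // 2)
        height = max(1, height // 2)
        level += 1


levels = list(pyramid_tile_counts(50_000, 50_000))
print(levels[0])    # → (0, 50000, 50000, 38416)
print(len(levels))  # → 9
```

Level 0 dominates the tile count, which is why full-resolution tile reads dominate total processing time.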
Each level requires reading from different file offsets. Sequential access mode meant we had to traverse from the file start for every single tile read, making the process O(n²) instead of O(n).
The confusion came from documentation suggesting sequential mode for "large files." We misinterpreted this to mean "large image files" when it actually means "files you're reading sequentially from start to end" (like log files or streams).
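The distinction is easy to see with plain file objects. This sketch is illustrative only, not OpenSlide's internals; it shows what each access pattern costs to reach the same tile:

```python
import io


def read_tile_random(f, offset: int, size: int) -> bytes:
    """Random access: seek straight to the tile and read it."""
    f.seek(offset)
    return f.read(size)


def read_tile_sequential(f, offset: int, size: int) -> bytes:
    """Sequential-only access: everything before the tile must be
    read and thrown away, as with a stream or log file."""
    f.seek(0)
    f.read(offset)  # bytes read and discarded
    return f.read(size)


data = bytes(range(256)) * 4  # 1 KB stand-in for a slide file
f = io.BytesIO(data)
assert read_tile_random(f, 512, 16) == read_tile_sequential(f, 512, 16)
```

Both functions return the same bytes; the sequential version just reads 512 wasted bytes first. Scale the offset to gigabytes and multiply by tens of thousands of tiles, and the waste becomes the dominant cost.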
The Solution
We removed the sequential access parameter:
```python
import time

import openslide


def process_wsi_optimized(image_path: str, output_path: str):
    """Process WSI with proper random access."""
    logger.info(f"Opening slide: {image_path}")

    # No access="sequential" - the default random access is what we want
    slide = openslide.OpenSlide(image_path)
    logger.info(
        f"Slide opened: {slide.dimensions}, "
        f"levels: {slide.level_count}"
    )

    # Generate tiles with random access
    start_time = time.time()
    dzi_generator = DeepZoomImageGenerator(
        slide,
        output_path=output_path,
        tile_size=256,
        tile_format='jpeg',
        overlap=0,
        limit_bounds=True
    )

    # Generate all tiles
    dzi_generator.generate()

    duration = time.time() - start_time
    logger.info(
        f"Generated {dzi_generator.tile_count} tiles "
        f"in {duration:.1f} seconds "
        f"({dzi_generator.tile_count / duration:.1f} tiles/sec)"
    )

    slide.close()
    return {
        'tile_count': dzi_generator.tile_count,
        'duration_seconds': duration,
        'tiles_per_second': dzi_generator.tile_count / duration
    }
```
Implementation Details
Performance Monitoring
We added detailed performance tracking:
```python
import time
from contextlib import contextmanager

import boto3

cloudwatch = boto3.client('cloudwatch')


@contextmanager
def timing_context(operation: str):
    """Time an operation; yields a dict that receives the duration."""
    start = time.time()
    logger.info(f"Starting: {operation}")
    timing = {}
    try:
        yield timing
    finally:
        timing['duration'] = time.time() - start
        logger.info(f"Completed: {operation} in {timing['duration']:.2f}s")

        # Log to CloudWatch
        cloudwatch.put_metric_data(
            Namespace='WSI/Performance',
            MetricData=[{
                'MetricName': 'OperationDuration',
                'Value': timing['duration'],
                'Unit': 'Seconds',
                'Dimensions': [
                    {'Name': 'Operation', 'Value': operation}
                ]
            }]
        )


def process_wsi_with_metrics(image_path: str):
    """Process WSI with detailed performance metrics."""
    metrics = {}

    with timing_context('OpenSlide') as timing:
        slide = openslide.OpenSlide(image_path)
    metrics['slide_open_time'] = timing['duration']

    with timing_context('TileGeneration') as timing:
        tile_count = 0
        for level in range(slide.level_count):
            level_start = time.time()

            # Generate tiles for this level
            level_tiles = generate_level_tiles(slide, level)
            tile_count += level_tiles

            level_duration = time.time() - level_start
            logger.info(
                f"Level {level}: {level_tiles} tiles "
                f"in {level_duration:.1f}s "
                f"({level_tiles / level_duration:.1f} tiles/sec)"
            )
    metrics['tile_generation_time'] = timing['duration']
    metrics['tile_count'] = tile_count
    metrics['tiles_per_second'] = tile_count / timing['duration']

    slide.close()
    return metrics
```
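One caveat with this pattern: a put_metric_data call per operation adds a network round trip each time. A small buffer can batch datapoints into fewer calls. The sketch below is a hypothetical helper, not part of our codebase; `client` is assumed to be a boto3 CloudWatch client:

```python
class MetricBuffer:
    """Batch CloudWatch datapoints into fewer put_metric_data calls."""

    def __init__(self, client, namespace: str, flush_at: int = 20):
        self.client = client
        self.namespace = namespace
        self.flush_at = flush_at
        self.pending = []

    def add(self, name: str, value: float, unit: str = 'Seconds'):
        """Queue a datapoint; flush automatically when the buffer fills."""
        self.pending.append({'MetricName': name, 'Value': value, 'Unit': unit})
        if len(self.pending) >= self.flush_at:
            self.flush()

    def flush(self):
        """Send all queued datapoints in a single API call."""
        if self.pending:
            self.client.put_metric_data(
                Namespace=self.namespace,
                MetricData=self.pending,
            )
            self.pending = []
```

Call `flush()` once at the end of processing so trailing datapoints are not lost.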
Parallel Tile Generation
With random access, we could parallelize tile generation:
```python
import os
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def generate_tile(slide, level, x, y, tile_size):
    """Generate a single tile."""
    try:
        # read_region takes the location in level-0 coordinates,
        # so scale the tile position by the level's downsample factor
        downsample = round(slide.level_downsamples[level])
        location = (x * tile_size * downsample, y * tile_size * downsample)

        # Read tile with random access
        tile = slide.read_region(location, level, (tile_size, tile_size))

        # Convert RGBA to RGB for JPEG output
        tile = tile.convert('RGB')
        return (level, x, y, tile)
    except Exception as e:
        logger.error(f"Failed to generate tile ({level}, {x}, {y}): {e}")
        return None


def process_wsi_parallel(image_path: str, output_path: str, workers: int = 4):
    """Process WSI with parallel tile generation."""
    slide = openslide.OpenSlide(image_path)
    tile_size = 256
    tile_jobs = []

    # Create tile generation jobs
    for level in range(slide.level_count):
        level_dims = slide.level_dimensions[level]
        tiles_x = (level_dims[0] + tile_size - 1) // tile_size
        tiles_y = (level_dims[1] + tile_size - 1) // tile_size
        for x in range(tiles_x):
            for y in range(tiles_y):
                tile_jobs.append((level, x, y))

    logger.info(f"Generating {len(tile_jobs)} tiles with {workers} workers")

    # Process tiles in parallel
    start_time = time.time()
    completed = 0

    with ThreadPoolExecutor(max_workers=workers) as executor:
        # Submit all jobs
        futures = {
            executor.submit(generate_tile, slide, level, x, y, tile_size):
                (level, x, y)
            for level, x, y in tile_jobs
        }

        # Process completed tiles
        for future in as_completed(futures):
            result = future.result()
            if result:
                level, x, y, tile = result

                # Save tile
                tile_path = f"{output_path}/level{level}/tile_{x}_{y}.jpg"
                os.makedirs(os.path.dirname(tile_path), exist_ok=True)
                tile.save(tile_path, 'JPEG', quality=95)

                completed += 1
                if completed % 100 == 0:
                    elapsed = time.time() - start_time
                    rate = completed / elapsed
                    logger.info(
                        f"Progress: {completed}/{len(tile_jobs)} "
                        f"({rate:.1f} tiles/sec)"
                    )

    duration = time.time() - start_time
    logger.info(
        f"Completed {len(tile_jobs)} tiles "
        f"in {duration:.1f}s "
        f"({len(tile_jobs) / duration:.1f} tiles/sec)"
    )
    slide.close()
```
Resource Optimization
We optimized memory usage for parallel processing:
```python
import gc


def process_wsi_optimized_memory(image_path: str, batch_size: int = 100):
    """Process WSI in batches to manage memory."""
    slide = openslide.OpenSlide(image_path)
    tile_size = 256

    # Calculate all tile positions first
    tile_positions = []
    for level in range(slide.level_count):
        level_dims = slide.level_dimensions[level]
        tiles_x = (level_dims[0] + tile_size - 1) // tile_size
        tiles_y = (level_dims[1] + tile_size - 1) // tile_size
        for x in range(tiles_x):
            for y in range(tiles_y):
                tile_positions.append((level, x, y))

    # Process in batches
    for i in range(0, len(tile_positions), batch_size):
        batch = tile_positions[i:i + batch_size]
        logger.info(
            f"Processing batch {i // batch_size + 1} "
            f"({len(batch)} tiles)"
        )

        # Generate tiles for this batch
        with ThreadPoolExecutor(max_workers=4) as executor:
            futures = [
                executor.submit(generate_tile, slide, *pos, tile_size)
                for pos in batch
            ]
            for future in as_completed(futures):
                result = future.result()
                if result:
                    save_tile(result)

        # Force garbage collection between batches
        gc.collect()

    slide.close()
```
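Batching plus gc.collect() works, but it stalls all workers at each batch boundary. An alternative is to cap the number of in-flight futures with a semaphore, keeping memory bounded without the stop-and-go. This is a sketch; `bounded_submit` is a hypothetical helper, not from our codebase:

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def bounded_submit(executor, fn, jobs, in_flight: int = 16):
    """Submit jobs lazily, keeping at most `in_flight` unfinished
    futures alive so results don't pile up in memory."""
    sem = threading.Semaphore(in_flight)

    def release(_future):
        sem.release()

    for job in jobs:
        sem.acquire()  # blocks when too many jobs are outstanding
        future = executor.submit(fn, job)
        future.add_done_callback(release)
        yield future


with ThreadPoolExecutor(max_workers=4) as executor:
    squares = [
        f.result()
        for f in bounded_submit(executor, lambda x: x * x, range(10), in_flight=4)
    ]
print(squares)  # → [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```

For tile generation, `fn` would be a wrapper around `generate_tile` and the consumer loop would save each tile as its future resolves, so at most `in_flight` decoded tiles exist at once.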
Impact and Results
After removing sequential access mode:
| Metric | Before (Sequential) | After (Random Access) | Improvement |
|--------|--------------------|-----------------------|-------------|
| Processing time (2GB slide) | 45 min | 4 min | 11.25x faster |
| Tiles per second | 15 | 165 | 11x faster |
| I/O operations | 500 million | 45 million | 90% reduction |
| Lambda timeout rate | 67% | 0% | Eliminated |
| Processing cost per slide | $3.20 | $0.28 | 91% reduction |
Performance by slide size:
- Small (500MB): 2m 15s → 15s (9x faster)
- Medium (2GB): 45m → 4m (11x faster)
- Large (5GB): Timeout → 9m (enabled processing)
With parallel processing (4 workers):
- Processing time: 4m → 1m 45s (2.3x faster)
- Tiles per second: 165 → 380 (2.3x faster)
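The sub-linear parallel speedup (2.3x from 4 workers) is what Amdahl's law predicts when part of the pipeline stays serial; in our case, JPEG encoding and disk writes happen in the consumer loop. A quick estimate of the parallelizable fraction, derived from the numbers above:

```python
def amdahl_speedup(p: float, workers: int) -> float:
    """Amdahl's law: speedup given parallelizable fraction p."""
    return 1.0 / ((1.0 - p) + p / workers)


def parallel_fraction(speedup: float, workers: int) -> float:
    """Invert Amdahl's law to estimate p from an observed speedup."""
    return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / workers)


p = parallel_fraction(2.3, 4)
print(f"parallelizable fraction: {p:.0%}")                     # → 75%
print(f"predicted at 8 workers: {amdahl_speedup(p, 8):.1f}x")  # → 2.9x
```

This suggests doubling the workers would buy only about 2.9x total, so beyond a handful of threads the serial save path, not the reads, becomes the bottleneck.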
Lessons Learned
- Understand Access Patterns: Sequential mode is for streaming, not random access
- Profile Before Optimizing: We should have measured I/O patterns earlier
- Read Documentation Carefully: "Large files" doesn't mean "large image files"
- Random Access Enables Parallelization: Can't parallelize sequential reads
- One Parameter Can Matter: Removing 19 characters improved performance 10x
The access="sequential" parameter was a catastrophic performance mistake. Tile generation is an inherently random-access workload: tiles live at arbitrary file offsets, and sequential mode forced every read to traverse the file from the start, turning an O(n) job into O(n²). Always match your I/O mode to your access pattern.