alqosh

Lambda Cost Investigation: From $2,700 to $1,700/month

February 1, 2026 · cost-optimization

Context

Our monthly AWS bill showed Lambda costs climbing to $2,700/month, 35% above our budgeted $2,000 target. Without visibility into the cost breakdown, we couldn't identify which optimizations would deliver the highest ROI. We needed a data-driven investigation to find quick wins.

This post details our systematic approach to Lambda cost analysis and the specific optimizations that reduced costs by 37% in one month.

The Investigation Process

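Initial State: a single $2,700 Lambda line item with no breakdown (see the bill summary and open questions at the end of this section).

We established a three-phase investigation methodology: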

Phase 1: Cost Attribution

We used AWS Cost Explorer, filtered to the Lambda service, to break down costs by dimension:

Function name
Memory configuration
Region
Time period (hourly patterns)

Phase 2: Metric Analysis

We queried CloudWatch Logs Insights for 30 days of Lambda execution data, using only the REPORT lines and grouping by log group (one per function):

# Memory fields are reported in bytes; durations are in milliseconds
filter @type = "REPORT"
| stats
    count() as invocations,
    avg(@duration) as avg_duration_ms,
    avg(@billedDuration) as avg_billed_ms,
    max(@maxMemoryUsed / 1000 / 1000) as peak_memory_mb,
    max(@memorySize / 1000 / 1000) as provisioned_memory_mb
by @log

Phase 3: Optimization Modeling

For each cost driver, we calculated:

Current cost: Monthly spend
Optimization potential: Expected savings
Implementation effort: Engineering hours
Risk level: Production impact risk

Monthly Lambda Bill: $2,700
┌──────────────────────────────────────┐
│ Line Item               Amount       │
│ ────────────────────────────────     │
│ (Unknown breakdown)                  │
│ Total:                  $2,700       │
└──────────────────────────────────────┘

Questions:
- Which functions cost the most?
- What drives duration charges?
- Are we over-provisioned?

Cost Breakdown Discovery

After analyzing 30 days of billing and execution data, we identified the cost drivers:

Key Insights:

Invocation costs dominated: 44% from request count alone
SnapStart was expensive: $550/month for 1% of requests (see Cost Post 8.1)
Duration had headroom: Functions averaged 45% memory utilization
Timeout misconfiguration: 30s timeout for operations completing in 800ms

Lambda Cost Breakdown (Monthly)
┌──────────────────────────────────────┐
│ Category             Cost      %     │
│ ────────────────────────────────     │
│ Invocations:        $1,200   44%     │
│ Duration:           $800     30%     │
│ SnapStart:          $550     20%     │
│ Data Transfer:      $150      6%     │
│ ────────────────────────────────     │
│ Total:              $2,700   100%    │
└──────────────────────────────────────┘
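
The percentages in the breakdown follow directly from the dollar amounts. A minimal sketch of that arithmetic, using the category figures from the table (the function name is only illustrative):

```python
# Monthly Lambda cost by category in USD, copied from the breakdown table.
costs = {
    "Invocations": 1200,
    "Duration": 800,
    "SnapStart": 550,
    "Data Transfer": 150,
}


def cost_shares(costs):
    """Return each category's share of the total as a whole-number percentage."""
    total = sum(costs.values())
    return {name: round(100 * amount / total) for name, amount in costs.items()}


shares = cost_shares(costs)
for name, pct in shares.items():
    print(f"{name:<14} ${costs[name]:>5,}  {pct:>3}%")
```

Running the same calculation on each month's numbers makes it easy to see whether the mix shifts after an optimization lands.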

Optimization Opportunities

We identified seven optimization opportunities and estimated savings, effort, and risk for each:

1. Disable SnapStart

Current cost: $550/month
Savings: $550/month
Effort: 1 hour (config change)
Risk: Low (only affects 1% of requests)

Analysis:
- Cold starts: 1% of requests
- User impact: Minimal (2.5s increase for cold starts)
- Cost per cold start: $0.0011
- Decision: DISABLE

Impact: $550/month saved

2. Reduce Function Timeout

Current cost: Contributes to $800 duration charges
Savings: $200/month
Effort: 2 hours (testing + deployment)
Risk: Medium (requires validation)

3. Optimize Memory Configuration

Current cost: Part of $800 duration charges
Savings: $150/month
Effort: 4 hours (benchmarking + testing)
Risk: Medium (performance testing required)

We reduced memory from 1024 MB to 640 MB. With P99 usage at 620 MB, utilization rose to 97%, which leaves only about 20 MB of headroom, so we paired the change with monitoring. Lambda pricing is linear in configured memory, so the per-millisecond rate dropped by 37.5%.

Impact: $150/month saved
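
Because duration charges scale with configured memory times billed time, the effect of a memory change can be sketched in a few lines. This is a rough model, not the post's actual calculation: `relative_duration_cost` and its `duration_ratio` parameter are hypothetical, and since Lambda allocates CPU in proportion to memory, a smaller setting can lengthen execution time.

```python
def relative_duration_cost(old_mb, new_mb, duration_ratio=1.0):
    """Duration cost after the change, as a fraction of the cost before.

    duration_ratio is new billed time / old billed time (1.0 = unchanged).
    """
    return (new_mb / old_mb) * duration_ratio


# Same execution time at 640 MB instead of 1024 MB: 62.5% of the old cost.
unchanged = relative_duration_cost(1024, 640)
print(f"cost ratio: {unchanged:.3f}, saving: {1 - unchanged:.1%}")

# Break-even: the change stops paying off once functions run 1.6x longer.
break_even = 1024 / 640
print(f"break-even slowdown: {break_even:.2f}x")
```

Under these assumptions the 37.5% figure is an upper bound; any slowdown at the lower memory setting eats into it.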

4. Consolidate Lambda Functions

Current cost: Multiple functions increase cold start frequency
Savings: $100/month
Effort: 16 hours (architecture refactor)
Risk: High (requires testing)

This optimization was covered in detail in Performance Post 5.2. The consolidation reduced cold start frequency by 75% and eliminated redundant initialization overhead.

Impact: $100/month saved

5. API Request Batching

Current cost: High invocation count
Savings: Indirect (reduces API Gateway costs more than Lambda)
Effort: 12 hours (client + server changes)
Risk: Medium

See Cost Post 8.3 for detailed analysis of API Gateway optimizations.

6. CloudWatch Logs Optimization

Current cost: Not Lambda directly, but related
Savings: $570/month (CloudWatch)
Effort: 3 hours
Risk: Low

See Cost Post 8.4 for detailed analysis.

7. EventBridge Warmup Elimination

Current cost: $150/month in warmup invocations
Savings: $150/month
Effort: 1 hour
Risk: Low

See Cost Post 8.5 for detailed analysis.

Function Timeout (Optimization 2), Current State:
┌──────────────────────────────────────┐
│ Function: main_lambda                │
│ Timeout: 30 seconds                  │
│ P99 duration: 1.2 seconds            │
│ Unused headroom: 28.8s above P99     │
└──────────────────────────────────────┘

Optimization:
┌──────────────────────────────────────┐
│ Function: main_lambda                │
│ Timeout: 3 seconds                   │
│ P99 duration: 1.2 seconds            │
│ Buffer: 1.8s safety margin           │
└──────────────────────────────────────┘

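Lambda bills actual execution time in 1 ms increments, so the timeout itself is not billed: an invocation that finishes in 1.2 seconds costs the same under either setting. The saving comes from invocations that hang or run away. With a 30-second timeout each of those can bill for up to 30 seconds, while a 3-second timeout caps it at 3.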

Impact: $200/month saved
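
One way to reason about the timeout choice is to derive it from observed P99 duration and then look at the worst-case billed time for a hung invocation. The sketch below assumes a 2.5x multiplier, which is simply what the 3-second setting works out to against a 1.2-second P99; the post does not state a rule, and both function names are hypothetical.

```python
def suggest_timeout_s(p99_s, multiplier=2.5):
    """Timeout as a multiple of observed P99 duration (multiplier is an assumption)."""
    return p99_s * multiplier


def worst_case_ratio(old_timeout_s, new_timeout_s):
    """How much longer a hung invocation could bill under the old timeout."""
    return old_timeout_s / new_timeout_s


p99_s = 1.2
new_timeout = suggest_timeout_s(p99_s)
buffer_s = new_timeout - p99_s
ratio = worst_case_ratio(30, new_timeout)

print(f"timeout: {new_timeout:.1f}s, buffer over P99: {buffer_s:.1f}s")
print(f"a hung invocation bills up to {ratio:.0f}x less than with a 30s timeout")
```

A tighter timeout only helps if callers retry or fail gracefully, so the multiplier is worth revisiting whenever P99 shifts.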

Memory Analysis (CloudWatch data):
┌──────────────────────────────────────┐
│ Provisioned: 1024 MB                 │
│ Avg Used:    460 MB (45%)            │
│ P95 Used:    580 MB (57%)            │
│ P99 Used:    620 MB (60%)            │
└──────────────────────────────────────┘

Optimization Path:
┌──────────────────────────────────────┐
│ Test 768 MB:  P99 = 650 MB (85%)     │
│ Test 640 MB:  P99 = 620 MB (97%) ✓   │
│ Decision:     640 MB with monitoring │
└──────────────────────────────────────┘
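
The selection step can be expressed as picking the smallest tested size whose P99 usage stays under a utilization cap. This is a sketch under stated assumptions: the cap values are hypothetical and not taken from the post, and only the two benchmark results shown above are used.

```python
# (configured MB, observed P99 MB) from the benchmark results above.
candidates = [
    (768, 650),
    (640, 620),
]


def smallest_within(candidates, max_utilization):
    """Smallest configured size whose P99 utilization is at or below the cap."""
    ok = [(cfg, p99) for cfg, p99 in candidates if p99 / cfg <= max_utilization]
    return min(ok)[0] if ok else None


for cfg, p99 in candidates:
    print(f"{cfg} MB: P99 utilization {p99 / cfg:.1%}")

print(smallest_within(candidates, 0.97))  # a cap that admits the 640 MB result
print(smallest_within(candidates, 0.90))  # a more conservative cap
```

With a 97% cap the function returns 640, matching the decision above; a 90% cap would have kept the configuration at 768 MB.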

Before: 4 Functions
┌──────────────────┐
│ auth_lambda      │ → Cold starts
├──────────────────┤
│ content_lambda   │ → Cold starts
├──────────────────┤
│ user_lambda      │ → Cold starts
├──────────────────┤
│ analytics_lambda │ → Cold starts
└──────────────────┘

Each function has separate cold starts and
resource allocation

After: 1 Function
┌──────────────────────────────────────┐
│ main_lambda                          │
│ ├─ /auth/*      (routing)            │
│ ├─ /content/*   (routing)            │
│ ├─ /user/*      (routing)            │
│ └─ /analytics/* (routing)            │
└──────────────────────────────────────┘

Shared runtime, reduced cold starts,
better resource utilization
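
The consolidated layout can be pictured as a single handler that dispatches on path prefix. The sketch below is illustrative only: the per-service handlers, the response bodies, and reading `rawPath` (API Gateway HTTP API events) with a fallback to `path` (REST API proxy events) are assumptions, not the post's actual implementation.

```python
import json


# Hypothetical per-service handlers; real ones would hold the business logic.
def handle_auth(event):
    return {"service": "auth"}


def handle_content(event):
    return {"service": "content"}


def handle_user(event):
    return {"service": "user"}


def handle_analytics(event):
    return {"service": "analytics"}


ROUTES = {
    "/auth/": handle_auth,
    "/content/": handle_content,
    "/user/": handle_user,
    "/analytics/": handle_analytics,
}


def handler(event, context=None):
    """Single Lambda entry point that routes by path prefix."""
    path = event.get("rawPath") or event.get("path", "")
    for prefix, route in ROUTES.items():
        if path.startswith(prefix):
            return {"statusCode": 200, "body": json.dumps(route(event))}
    return {"statusCode": 404, "body": json.dumps({"error": "not found"})}
```

Calling `handler({"rawPath": "/auth/login"})` goes to the auth route, and any path outside the four prefixes gets a 404. All routes share one runtime, so a warm instance serves every prefix.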

Implementation Plan

We prioritized optimizations by ROI (savings ÷ effort):

Week 1: Low-Hanging Fruit ($900 savings)

Disabled SnapStart (1 hour, $550/month saved)
Reduced timeout from 30s to 3s (2 hours, $200/month saved)
Disabled EventBridge warmup (1 hour, $150/month saved)

Week 2: Memory Optimization ($150 savings)

Ran memory benchmarks at 768 MB, 640 MB, 512 MB
Monitored P99 memory usage for 3 days
Deployed 640 MB configuration
Validated performance for 4 days

Week 3: Architecture Refactor ($100 savings)

Consolidated 4 Lambda functions into 1
Updated API Gateway routing
Ran integration test suite
Phased rollout with traffic splitting

Optimization Roadmap
┌──────────────────────────────────────────────────┐
│ Priority  Optimization          ROI    Timeline  │
│ ────────────────────────────────────────────     │
│ 1         Disable SnapStart    $550/hr  Week 1   │
│ 2         EventBridge disable  $150/hr  Week 1   │
│ 3         Reduce timeout       $100/hr  Week 1   │
│ 4         Optimize memory      $37/hr   Week 2   │
│ 5         Lambda consolidation $6/hr    Week 3   │
└──────────────────────────────────────────────────┘
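
The ROI column is monthly savings divided by engineering hours. A minimal sketch of the ranking, using the savings and effort figures quoted for each optimization in this post:

```python
# (name, monthly savings in USD, engineering hours) as quoted in this post.
optimizations = [
    ("Disable SnapStart", 550, 1),
    ("Reduce timeout", 200, 2),
    ("EventBridge disable", 150, 1),
    ("Optimize memory", 150, 4),
    ("Lambda consolidation", 100, 16),
]


def rank_by_roi(items):
    """Sort by monthly savings per engineering hour, highest first."""
    scored = [(name, savings / hours) for name, savings, hours in items]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


ranked = rank_by_roi(optimizations)
for priority, (name, roi) in enumerate(ranked, start=1):
    print(f"{priority}  {name:<22} ${roi:,.2f}/hr")
```

Ranking this way ignores risk, which is why a high-ROI but high-risk change might still be scheduled later.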

Results

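Cost Reduction: Lambda spend fell from $2,700 to $1,700/month (see the before/after summary below).

Breakdown of Savings (estimated per optimization; the estimates sum to $1,150/month, somewhat more than the $1,000/month drop on the bill):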

SnapStart disabled: $550/month
Timeout reduction: $200/month
EventBridge warmup: $150/month
Memory optimization: $150/month (blended into duration)
Function consolidation: $100/month (reduced cold starts)

Performance Impact:

The P99 latency increased from 850ms to 2.8s due to cold starts (SnapStart disabled), but this only affected 1% of requests and didn't correlate with user complaints or increased error rates.

Lambda Costs (Before → After)
┌──────────────────────────────────────┐
│ Before:         $2,700/month         │
│ After:          $1,700/month         │
│ ────────────────────────────────     │
│ Savings:        $1,000/month         │
│ Reduction:      37%                  │
│ Annual Impact:  $12,000/year         │
└──────────────────────────────────────┘
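
The summary figures can be reproduced from the two bill totals. A small sketch (the `summarize` helper is hypothetical):

```python
def summarize(before, after):
    """Monthly savings, percentage reduction, and annualized impact."""
    savings = before - after
    return {
        "monthly_savings": savings,
        "reduction_pct": round(100 * savings / before),
        "annual_savings": savings * 12,
    }


summary = summarize(2700, 1700)
print(summary)
```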

Latency Metrics (Before → After)
┌──────────────────────────────────────┐
│ P50 latency:    210ms → 205ms        │
│ P95 latency:    450ms → 440ms        │
│ P99 latency:    850ms → 2.8s*        │
│ Error rate:     0.02% → 0.02%        │
│ ────────────────────────────────     │
│ *P99 increase due to cold starts     │
│  affecting 1% of requests            │
└──────────────────────────────────────┘

Lessons Learned

1. Start with Data, Not Assumptions

We assumed duration costs were the problem. The data showed that invocation count was the largest cost driver (44% of the bill) and that SnapStart fees made up another 20% while serving only 1% of requests. Without CloudWatch Logs Insights analysis, we would have optimized the wrong things.

2. ROI Matters More Than Raw Savings

Lambda consolidation saved $100/month but required 16 hours of engineering time. SnapStart disable saved $550/month and took 1 hour, which is why SnapStart went first and consolidation went last.

3. User Impact Trumps Technical Metrics

Cold starts increased P99 latency by 2 seconds, but user-facing metrics (bounce rate, session duration, conversion) showed no degradation. Technical perfection isn't always worth the cost.

4. Monitor Before and After

We monitored metrics for 7 days before optimization and 14 days after. This gave us confidence that changes didn't cause regressions and provided data to refine further.

5. Optimize the Whole System

Lambda costs were $2,700/month, but related services (API Gateway, CloudWatch) added another $5,000/month. Optimizing Lambda alone missed the bigger picture (covered in Cost Posts 8.3 and 8.4).

Tools and Techniques

AWS Cost Explorer:

Lambda cost breakdown by function
Time-series analysis to identify trends
Tag-based cost allocation

CloudWatch Logs Insights Queries:

# Find over-provisioned memory (memory fields are in bytes; converted to MB)
filter @type = "REPORT"
| stats avg(@memorySize / 1000 / 1000) as avg_provisioned_mb,
        avg(@maxMemoryUsed / 1000 / 1000) as avg_used_mb,
        avg((@memorySize - @maxMemoryUsed) / 1000 / 1000) as avg_waste_mb
by @log

# Identify slow invocations (@duration is in milliseconds)
filter @type = "REPORT" and @duration > 1000
| fields @timestamp, @log, @duration
| sort @duration desc
| limit 100
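
Queries like these can also be run programmatically. The sketch below assumes a boto3 CloudWatch Logs client is passed in and uses its `start_query` and `get_query_results` calls. The function name, polling defaults, and error handling are illustrative rather than the tooling used in this investigation.

```python
import time


def run_insights_query(logs_client, log_group, query, start_ts, end_ts,
                       poll_s=1.0, max_polls=60):
    """Run a Logs Insights query and return rows as {field: value} dicts.

    logs_client is expected to behave like boto3.client("logs");
    start_ts and end_ts are epoch seconds.
    """
    query_id = logs_client.start_query(
        logGroupName=log_group,
        startTime=start_ts,
        endTime=end_ts,
        queryString=query,
    )["queryId"]

    # Queries run asynchronously, so poll until a terminal status is reached.
    for _ in range(max_polls):
        response = logs_client.get_query_results(queryId=query_id)
        status = response["status"]
        if status == "Complete":
            return [
                {cell["field"]: cell["value"] for cell in row}
                for row in response["results"]
            ]
        if status in ("Failed", "Cancelled", "Timeout"):
            raise RuntimeError(f"query {query_id} ended with status {status}")
        time.sleep(poll_s)
    raise TimeoutError(f"query {query_id} did not complete in time")
```

With real credentials this would be called as `run_insights_query(boto3.client("logs"), "/aws/lambda/main_lambda", query, start, end)`, where the log group name assumes Lambda's default `/aws/lambda/<function>` naming. Values come back as strings, so numeric fields need converting before any arithmetic.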

Lambda Power Tuning: We used the open-source Lambda Power Tuning tool to benchmark different memory configurations and find the optimal cost/performance ratio.

Custom Cost Dashboard: Built a CloudWatch dashboard tracking:

Daily Lambda costs (CloudWatch metric math)
Invocation count by function
Average duration by function
Memory utilization percentiles
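
A daily cost figure can be approximated from the same metrics the dashboard tracks. The sketch below is a rough estimate under stated assumptions. The two prices are the commonly published on-demand x86 rates for us-east-1 and are not taken from this post. It also ignores the free tier, SnapStart, and data transfer.

```python
# Assumed on-demand prices (x86, us-east-1); check current AWS pricing before use.
PRICE_PER_MILLION_REQUESTS = 0.20
PRICE_PER_GB_SECOND = 0.0000166667


def estimate_lambda_cost(invocations, avg_billed_ms, memory_mb):
    """Estimate request plus duration charges in USD for a batch of invocations."""
    request_cost = invocations / 1_000_000 * PRICE_PER_MILLION_REQUESTS
    gb_seconds = invocations * (avg_billed_ms / 1000) * (memory_mb / 1024)
    return request_cost + gb_seconds * PRICE_PER_GB_SECOND


# Hypothetical day: 1M invocations averaging 1 second billed at 1024 MB.
print(f"${estimate_lambda_cost(1_000_000, 1000, 1024):.2f}")
```

Feeding in per-function invocation counts and average billed duration gives a per-function estimate that can be compared against Cost Explorer.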

Conclusion

Reducing Lambda costs by 37% required data-driven investigation, not guesswork. We identified seven optimization opportunities, prioritized them by ROI, and implemented the five Lambda-specific ones over three weeks. The $12,000/year savings were reallocated to engineer salaries and infrastructure improvements.

Key Takeaways:

Use CloudWatch Logs Insights to analyze execution patterns
Prioritize optimizations by ROI (savings per engineering hour)
Measure user impact, not just technical metrics
Monitor before and after to validate assumptions
Optimize the whole system, not just Lambda in isolation

Final Results:

Cost reduction: $1,000/month ($12,000/year)
Engineering effort: 36 hours over 3 weeks
ROI: about $333 in annual savings per engineering hour ($12,000 ÷ 36 hours)
User experience: No measurable degradation

Related Plan: docs/plans/implemented/high/2026-01-16-cost-savings-lambda-plan.md

Related Posts:

Cost Post 8.1 (SnapStart Disable)
Cost Post 8.5 (EventBridge Warmup)
Performance Post 5.2 (Lambda Consolidation)