During my time supporting a US federal client, I led AWS optimization projects that reduced annual cloud spending by over $200,000, a 40% reduction in costs. But here's what made it remarkable: we didn't just cut costs. We actually improved performance, enhanced security, and built more reliable systems.
This wasn't achieved through magic or risky shortcuts. It was the result of systematic analysis, strategic implementation, and a deep understanding of AWS services. Let me show you exactly how we did it.
The Challenge: Growing AWS Costs Without Clear ROI
Like many organizations, the US federal installations I supported had experienced rapid AWS adoption. Teams spun up resources quickly to meet mission requirements, but without centralized oversight, costs spiraled. We faced:
- Over-provisioned resources: EC2 instances sized for peak loads that rarely occurred
- Idle resources: Development and test environments running 24/7
- Unoptimized storage: Data sitting in expensive storage tiers unnecessarily
- Missing Reserved Instances: Steady-state workloads running on expensive On-Demand pricing
- No cost visibility: Teams didn't know what they were spending or why
The mission was clear: reduce costs significantly without compromising operational capabilities or security posture.
Strategy 1: Comprehensive Infrastructure Audit
You can't optimize what you can't measure. Our first step was gaining complete visibility into the AWS environment.
What We Did
- AWS Cost Explorer deep dive: Analyzed six months of spending patterns to identify the biggest cost drivers
- Resource inventory: Tagged every resource with owner, project, and environment metadata
- Utilization analysis: Used CloudWatch metrics to identify underutilized resources
- Reserved Instance analysis: Identified steady-state workloads perfect for RI commitments
Pro Tip
Use AWS Cost Explorer's hourly granularity to identify exact usage patterns. We discovered that 60% of our development EC2 instances could run on schedules rather than 24/7, immediately cutting those costs by 65%.
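To make the audit step concrete, here is a minimal boto3 sketch of the kind of Cost Explorer query that surfaces the biggest cost drivers. The date range and top-10 cutoff are illustrative, and it assumes the Cost Explorer API is enabled for the account; hourly granularity additionally requires opting in and only covers recent usage, so the sketch uses daily granularity.

    import boto3

    ce = boto3.client('ce')

    # Daily cost grouped by service over an illustrative six-month window
    kwargs = dict(
        TimePeriod={'Start': '2024-01-01', 'End': '2024-07-01'},
        Granularity='DAILY',
        Metrics=['UnblendedCost'],
        GroupBy=[{'Type': 'DIMENSION', 'Key': 'SERVICE'}],
    )

    totals = {}
    while True:
        response = ce.get_cost_and_usage(**kwargs)
        for period in response['ResultsByTime']:
            for group in period['Groups']:
                service = group['Keys'][0]
                amount = float(group['Metrics']['UnblendedCost']['Amount'])
                totals[service] = totals.get(service, 0.0) + amount
        token = response.get('NextPageToken')
        if not token:
            break
        kwargs['NextPageToken'] = token

    # Print the ten most expensive services for the window
    for service, cost in sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:10]:
        print(f'{service}: ${cost:,.2f}')

A query like this is a starting point for the deeper per-resource analysis; the tagging work is what lets you slice the same data by owner, project, and environment.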
Key Findings
The audit revealed eye-opening patterns:
- 23% of EC2 instances had CPU utilization below 5%
- Dev/test environments consumed 35% of monthly spend but only needed 40 hours/week of uptime
- $47K annually spent on EBS snapshots older than 90 days with no retention policy
- 82% of steady-state compute workloads were running On-Demand instead of Reserved Instances
Strategy 2: Right-Sizing EC2 Instances
The most impactful quick win came from EC2 right-sizing. Many instances were massively over-provisioned based on "just in case" sizing decisions made months or years earlier.
The Process
- Collected 30 days of CloudWatch metrics for CPU, memory, network, and disk I/O
- Identified candidates: Instances consistently using less than 40% of provisioned capacity (see the utilization sketch after this list)
- Calculated optimal sizes: Matched actual usage patterns to appropriate instance types
- Tested in dev/test first: Validated performance before production changes
- Implemented gradually: Changed production instances during maintenance windows
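As an illustration of the first two steps, here is a simplified sketch of flagging under-utilized instances from CloudWatch data. The 40% threshold mirrors the one above; memory, network, and disk checks are omitted for brevity (memory requires the CloudWatch agent, since EC2 does not report it natively).

    import boto3
    from datetime import datetime, timedelta, timezone

    ec2 = boto3.client('ec2')
    cloudwatch = boto3.client('cloudwatch')

    end = datetime.now(timezone.utc)
    start = end - timedelta(days=30)

    # Walk all running instances and compute their 30-day average CPU
    paginator = ec2.get_paginator('describe_instances')
    for page in paginator.paginate(Filters=[{'Name': 'instance-state-name', 'Values': ['running']}]):
        for reservation in page['Reservations']:
            for instance in reservation['Instances']:
                instance_id = instance['InstanceId']
                stats = cloudwatch.get_metric_statistics(
                    Namespace='AWS/EC2',
                    MetricName='CPUUtilization',
                    Dimensions=[{'Name': 'InstanceId', 'Value': instance_id}],
                    StartTime=start,
                    EndTime=end,
                    Period=3600,  # hourly datapoints
                    Statistics=['Average'],
                )
                datapoints = stats['Datapoints']
                if not datapoints:
                    continue
                avg_cpu = sum(dp['Average'] for dp in datapoints) / len(datapoints)
                if avg_cpu < 40:
                    print(f'{instance_id} ({instance["InstanceType"]}): '
                          f'avg CPU {avg_cpu:.1f}% -- right-sizing candidate')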
Real Example
We had a fleet of m5.2xlarge instances (8 vCPU, 32GB RAM) running web applications. Analysis showed average CPU at 12% and memory at 18%. We downsized to m5.large (2 vCPU, 8GB RAM) and saved $68,000 annually with zero performance degradation.
Results from Right-Sizing
- Reduced 47 instances by 1-3 sizes
- Annual savings: $89,000
- Average performance improvement: 3% (better instance utilization)
Strategy 3: Automated Resource Scheduling
Development, test, and staging environments don't need to run 24/7. We implemented automated start/stop schedules using a combination of Lambda functions and EventBridge rules.
Implementation Details
I created CloudFormation templates that deployed:
- Lambda functions to start/stop EC2 instances, RDS databases, and Redshift clusters
- EventBridge rules scheduled to run weekdays 7 AM - 6 PM (only when teams needed access); the rule wiring is sketched after the Lambda code below
- SNS notifications to alert teams before scheduled shutdowns
- Tag-based targeting using "Environment:Dev" and "AutoSchedule:True" tags
import boto3
import os

def lambda_handler(event, context):
    ec2 = boto3.client('ec2')
    action = os.environ['ACTION']  # START or STOP

    # Find instances with AutoSchedule tag
    instances = ec2.describe_instances(
        Filters=[
            {'Name': 'tag:AutoSchedule', 'Values': ['True']},
            {'Name': 'tag:Environment', 'Values': ['Dev', 'Test']}
        ]
    )

    instance_ids = []
    for reservation in instances['Reservations']:
        for instance in reservation['Instances']:
            instance_ids.append(instance['InstanceId'])

    if instance_ids:
        if action == 'STOP':
            ec2.stop_instances(InstanceIds=instance_ids)
        else:
            ec2.start_instances(InstanceIds=instance_ids)

    return {
        'statusCode': 200,
        'body': f'{action} completed for {len(instance_ids)} instances'
    }
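Because the ACTION value comes from an environment variable, the pattern uses two function deployments (start and stop), each triggered by its own EventBridge rule. Here is a minimal sketch of that wiring; the function ARNs and rule names are hypothetical, and the cron expressions assume a rough Eastern-to-UTC conversion (EventBridge evaluates cron in UTC).

    import boto3

    events = boto3.client('events')
    lambda_client = boto3.client('lambda')

    START_LAMBDA_ARN = 'arn:aws:lambda:us-east-1:111111111111:function:scheduler-start'  # hypothetical
    STOP_LAMBDA_ARN = 'arn:aws:lambda:us-east-1:111111111111:function:scheduler-stop'    # hypothetical

    schedules = [
        ('dev-start-weekdays', 'cron(0 12 ? * MON-FRI *)', START_LAMBDA_ARN),  # ~7 AM Eastern
        ('dev-stop-weekdays',  'cron(0 23 ? * MON-FRI *)', STOP_LAMBDA_ARN),   # ~6 PM Eastern
    ]

    for rule_name, cron, lambda_arn in schedules:
        # Create (or update) the scheduled rule and point it at the Lambda
        rule = events.put_rule(Name=rule_name, ScheduleExpression=cron, State='ENABLED')
        events.put_targets(Rule=rule_name, Targets=[{'Id': '1', 'Arn': lambda_arn}])
        # Allow EventBridge to invoke the function
        lambda_client.add_permission(
            FunctionName=lambda_arn,
            StatementId=f'{rule_name}-invoke',
            Action='lambda:InvokeFunction',
            Principal='events.amazonaws.com',
            SourceArn=rule['RuleArn'],
        )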
Scheduling Results
- Automated 127 non-production resources
- Reduced non-production runtime from 168 hours/week to 55 hours/week (67% reduction)
- Annual savings: $52,000
Strategy 4: Storage Optimization and Lifecycle Policies
S3 and EBS storage costs were quietly consuming budget. We implemented intelligent lifecycle policies to automatically move data to cost-effective storage tiers.
S3 Lifecycle Policies
- Transition to S3-IA after 30 days for infrequently accessed data
- Move to Glacier after 90 days for compliance archives
- Delete old versions after 365 days for versioned buckets
- Intelligent-Tiering for unpredictable access patterns
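For reference, here is roughly what the transition and expiration rules above look like when applied with boto3. The bucket name, prefix, and day counts are illustrative; Intelligent-Tiering would be a separate rule using the INTELLIGENT_TIERING storage class.

    import boto3

    s3 = boto3.client('s3')

    s3.put_bucket_lifecycle_configuration(
        Bucket='example-archive-bucket',  # hypothetical
        LifecycleConfiguration={
            'Rules': [
                {
                    'ID': 'tier-and-archive',
                    'Status': 'Enabled',
                    'Filter': {'Prefix': ''},  # apply to the whole bucket
                    'Transitions': [
                        {'Days': 30, 'StorageClass': 'STANDARD_IA'},
                        {'Days': 90, 'StorageClass': 'GLACIER'},
                    ],
                    # Clean up old object versions in versioned buckets
                    'NoncurrentVersionExpiration': {'NoncurrentDays': 365},
                },
            ]
        },
    )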
EBS Snapshot Management
- Implemented Data Lifecycle Manager to automate snapshot creation and deletion
- Deleted 1,200+ orphaned snapshots from terminated instances
- Set 30-day retention for dev/test, 90-day for production
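A Data Lifecycle Manager policy along these lines covers the dev/test case; the role ARN, tag values, and schedule are illustrative placeholders, not the exact policy we deployed.

    import boto3

    dlm = boto3.client('dlm')

    dlm.create_lifecycle_policy(
        ExecutionRoleArn='arn:aws:iam::111111111111:role/AWSDataLifecycleManagerDefaultRole',  # hypothetical
        Description='Daily snapshots for dev/test volumes, 30-day retention',
        State='ENABLED',
        PolicyDetails={
            'ResourceTypes': ['VOLUME'],
            'TargetTags': [{'Key': 'Environment', 'Value': 'Dev'}],
            'Schedules': [
                {
                    'Name': 'daily-30-day-retention',
                    'CreateRule': {'Interval': 24, 'IntervalUnit': 'HOURS', 'Times': ['03:00']},
                    'RetainRule': {'Count': 30},  # keep the 30 most recent snapshots
                    'CopyTags': True,
                }
            ],
        },
    )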
Common Mistake
Don't just delete old snapshots without understanding dependencies. We discovered several "backup" snapshots that were actually gold images for critical systems. Always audit before deletion.
Storage Results
- S3 costs reduced by 34% through lifecycle policies
- Eliminated $47,000 in unnecessary snapshot storage
- Annual savings: $38,000
Strategy 5: Reserved Instances and Savings Plans
After right-sizing and optimization, we had clear visibility into steady-state workloads. This made Reserved Instance planning straightforward and low-risk.
Our RI Strategy
- Conservative approach: Only committed to 60% of steady-state usage (reducing risk)
- 1-year terms: Maintained flexibility for mission changes
- Standard RIs: For known, stable workloads (databases, core applications)
- Compute Savings Plans: For variable instance families but steady usage
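AWS's own purchase recommendations are a useful starting point for this analysis; the sketch below pulls them for EC2 with a one-year, no-upfront term (we then trimmed the commitment to our conservative 60% target rather than buying the full recommendation).

    import boto3

    ce = boto3.client('ce')

    response = ce.get_reservation_purchase_recommendation(
        Service='Amazon Elastic Compute Cloud - Compute',
        LookbackPeriodInDays='THIRTY_DAYS',
        TermInYears='ONE_YEAR',
        PaymentOption='NO_UPFRONT',
    )

    # Print each recommended purchase with its estimated monthly savings
    for rec in response.get('Recommendations', []):
        for detail in rec.get('RecommendationDetails', []):
            print(detail['InstanceDetails']['EC2InstanceDetails']['InstanceType'],
                  detail['RecommendedNumberOfInstancesToPurchase'],
                  detail['EstimatedMonthlySavingsAmount'])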
RI Results
- Purchased $180K in Reserved Instances
- Average discount: 42% vs On-Demand pricing
- Annual savings: $32,000 (with room to optimize further)
Strategy 6: CloudWatch Monitoring and Continuous Optimization
Cost optimization isn't a one-time project; it's an ongoing process. We implemented comprehensive monitoring to catch cost anomalies and optimization opportunities.
Monitoring Implementation
- AWS Budgets with alerts at 50%, 80%, and 100% of forecasted spend
- Cost anomaly detection using AWS Cost Anomaly Detection service
- Custom CloudWatch dashboards showing cost trends by service, team, and project
- Weekly cost reports sent to engineering leads
- Quarterly optimization reviews to identify new opportunities
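As one concrete piece of that setup, here is a minimal sketch of creating a monthly cost budget with a forecasted-spend alert. The 50%, 80%, and 100% thresholds above would simply be three such notifications; the account ID, dollar amount, and email address are placeholders.

    import boto3

    budgets = boto3.client('budgets')

    budgets.create_budget(
        AccountId='111111111111',  # hypothetical
        Budget={
            'BudgetName': 'monthly-aws-spend',
            'BudgetLimit': {'Amount': '50000', 'Unit': 'USD'},
            'TimeUnit': 'MONTHLY',
            'BudgetType': 'COST',
        },
        NotificationsWithSubscribers=[
            {
                # Alert when forecasted spend crosses 80% of the budget
                'Notification': {
                    'NotificationType': 'FORECASTED',
                    'ComparisonOperator': 'GREATER_THAN',
                    'Threshold': 80.0,
                    'ThresholdType': 'PERCENTAGE',
                },
                'Subscribers': [{'SubscriptionType': 'EMAIL', 'Address': 'cloud-team@example.com'}],
            }
        ],
    )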
Pro Tip
Set up AWS Cost Anomaly Detection early. Within 48 hours of the issue starting, it caught a misconfigured NAT Gateway that was generating $4,000/month in unexpected data transfer charges.
The Complete Savings Breakdown
- Right-sizing EC2 instances: $89,000/year
- Automated resource scheduling: $52,000/year
- Storage optimization and lifecycle policies: $38,000/year
- Reserved Instances and Savings Plans: $32,000/year
- Total: $211,000/year in recurring savings
Key Lessons Learned
1. Start with Visibility
You can't optimize what you can't measure. Invest in tagging, Cost Explorer analysis, and CloudWatch metrics before making changes.
2. Quick Wins Build Momentum
Resource scheduling delivered immediate, visible savings that got stakeholder buy-in for larger optimization projects.
3. Automate Everything
Manual processes don't scale and drift over time. CloudFormation, Lambda, and EventBridge made our optimizations sustainable.
4. Test Before Production
Every right-sizing change went through dev/test validation first. That discipline caught problems early and prevented the kind of production outage that would have erased all of our credibility.
5. Make it Continuous
Set up monitoring and regular reviews. We discovered an additional $30K in savings during quarterly reviews six months after the initial project.
How You Can Apply These Strategies
Whether you're spending $5,000 or $500,000 monthly on AWS, these strategies scale:
- Week 1: Run a comprehensive audit using Cost Explorer and CloudWatch
- Week 2: Implement resource tagging and identify quick wins
- Week 3-4: Deploy resource scheduling for non-production environments
- Week 5-6: Right-size EC2 instances based on utilization data
- Week 7-8: Implement storage lifecycle policies and snapshot cleanup
- Month 3: Analyze RI/Savings Plan opportunities and commit
- Ongoing: Monitor, review quarterly, and continuously optimize
Need Help Optimizing Your AWS Costs?
I offer comprehensive AWS Health Check Audits that identify your specific optimization opportunities. In one week, we'll analyze your environment and deliver an actionable plan with projected savings.
Schedule a Free Consultation →
Conclusion
Saving $200K on AWS wasn't about cutting corners or sacrificing capabilities. It was about understanding what we were paying for, eliminating waste, and optimizing for our actual usage patterns.
The best part? These optimizations improved our infrastructure. Right-sized instances performed better. Automated scheduling reduced security exposure. Better monitoring caught issues faster. We delivered more value while spending less.
Your AWS environment likely has similar opportunities waiting to be discovered. The question isn't whether you can save money; it's how much.