Writing · April 12, 2026
How to Optimize and Forecast Cloud Costs: A FinOps Guide (2026)
How to optimize and forecast cloud costs on AWS and Azure. Covers the FinOps framework, rightsizing, reserved instances, spot instances, tagging strategy, cost anomaly detection, and forecasting, with real Terraform and CLI examples.
Why Cloud Costs Spiral Out of Control {#why-costs-spiral}
Cloud's pay-as-you-go model is a double-edged sword. The same elasticity that lets you scale to millions of users also lets costs grow unchecked. I've seen this pattern on almost every enterprise cloud project:
Month 1: $5,000 — small team, controlled spend
Month 6: $28,000 — more services, more engineers
Month 12: $95,000 — nobody knows what half of it is
Month 18: $180,000 — "we need to do something about the cloud bill"
The root causes are almost always the same:
- Overprovisioning — instances sized for peak load running at 5% CPU 24/7
- Zombie resources — dev environments, test databases, old snapshots nobody deleted
- No tagging — can't attribute costs to teams or products
- Wrong pricing model — paying on-demand rates for workloads that run 24/7
- Data transfer costs — nobody budgeted for egress fees
The FinOps Framework {#finops}
FinOps is not a tool — it's a practice. The FinOps Foundation defines three phases:
FinOps lifecycle: Inform → Optimise → Operate
| Phase | Focus |
|---|---|
| Inform | Visibility, tagging, allocation, dashboards |
| Optimise | Rightsizing, Reserved Instances, Spot usage, waste removal |
| Operate | Budgets and forecasts, anomaly detection, showback |
The three teams that must collaborate:
| Team | Responsibility | Common failure mode |
|---|---|---|
| Engineering | Implement cost-efficient architecture | "That's a finance problem" |
| Finance | Budget, forecast, chargeback | No visibility into cloud spend |
| Business | Define value metrics | No connection between cost and outcome |
Rightsizing Compute {#rightsizing}
Rightsizing means matching instance size to actual workload requirements — not what you think you might need.
AWS Compute Optimizer analyses CloudWatch metrics and recommends optimal instance types:
# Get rightsizing recommendations via AWS CLI
# (estimated savings sit under savingsOpportunity in each recommendation option)
aws compute-optimizer get-ec2-instance-recommendations \
  --region us-east-1 \
  --query 'instanceRecommendations[*].{
    Instance:instanceArn,
    CurrentType:currentInstanceType,
    RecommendedType:recommendationOptions[0].instanceType,
    MonthlySavings:recommendationOptions[0].savingsOpportunity.estimatedMonthlySavings.value
  }' \
  --output table
Typical findings:
| Instance | Current | Recommended | Monthly saving |
|---|---|---|---|
| prod-api-01 | m5.2xlarge | m5.large | $180 |
| prod-worker-02 | c5.4xlarge | c5.xlarge | $320 |
| staging-db-01 | r5.2xlarge | r5.large | $210 |
Total potential saving: $710/month = $8,520/year
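To make the downsizing logic concrete, here is a minimal sketch of a utilisation-based check. It is not how Compute Optimizer works internally. The size ladder, the 40% target, and the name `suggest_size` are assumptions made for illustration, and the model assumes CPU demand scales linearly with instance size.

```python
import math

# Hypothetical size ladder: each step doubles vCPU and memory within one family.
SIZES = ["large", "xlarge", "2xlarge", "4xlarge"]


def suggest_size(current: str, peak_cpu_pct: float, target_pct: float = 40.0) -> str:
    """Suggest the smallest size that keeps projected peak CPU at or below target_pct.

    Simplification: only CPU is considered; memory, network, and burst
    behaviour are ignored, so treat the result as a starting point.
    """
    idx = SIZES.index(current)
    if peak_cpu_pct <= 0:
        return SIZES[0]
    # Halving the instance roughly doubles utilisation, so the number of
    # safe steps down is floor(log2(target / peak)).
    steps = max(0, math.floor(math.log2(target_pct / peak_cpu_pct)))
    return SIZES[max(0, idx - steps)]


print(suggest_size("2xlarge", 10.0))  # two steps down
print(suggest_size("xlarge", 35.0))   # already near target, stays put
```

A check like this only narrows the candidate list. Memory-bound or bursty workloads need their own metrics before you resize anything in production.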
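Switch to AWS Graviton (ARM) for up to 40% better price-performance (AWS's own figure; the drop in the hourly rate alone is smaller):
# Terraform: switch EC2 to Graviton
resource "aws_instance" "api_server" {
  ami           = "ami-0c02fb55956c7d316" # placeholder: substitute an ARM64 (aarch64) AMI for your region; x86_64 images do not boot on Graviton
  instance_type = "m7g.large"             # Graviton3: same vCPU and memory as m5.large at a lower on-demand rate
  # Performance gains depend on the workload; benchmark before migrating
}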
Pricing Models: On-Demand vs Reserved vs Spot {#pricing-models}
COST COMPARISON — m5.large (2 vCPU, 8GB RAM) — us-east-1
On-Demand: $0.096/hr = $69.12/month (baseline)
1yr Reserved (No Upfront): $0.061/hr = $43.92/month (-36%)
1yr Reserved (All Upfront): $0.056/hr = $40.32/month (-42%)
3yr Reserved (All Upfront): $0.038/hr = $27.36/month (-60%)
Spot Instance: $0.029/hr = ~$20.88/month (-70%, interruptible)
Savings Plan (1yr): $0.059/hr = $42.48/month (-39%, flexible)
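The monthly figures and percentages in the comparison follow directly from the hourly rates. The short sketch below reproduces them, assuming a 720-hour (30-day) month, which is the convention the numbers above imply. The rates are copied from the comparison and will drift as AWS changes pricing.

```python
HOURS_PER_MONTH = 720  # 30-day month, implied by $0.096/hr = $69.12/month

# Hourly rates copied from the comparison above (m5.large, us-east-1).
RATES = {
    "On-Demand": 0.096,
    "1yr Reserved (No Upfront)": 0.061,
    "1yr Reserved (All Upfront)": 0.056,
    "3yr Reserved (All Upfront)": 0.038,
    "Spot Instance": 0.029,
    "Savings Plan (1yr)": 0.059,
}


def monthly_cost(hourly: float) -> float:
    """Cost of running one instance for every hour of a 720-hour month."""
    return round(hourly * HOURS_PER_MONTH, 2)


def saving_pct(hourly: float, baseline: float = RATES["On-Demand"]) -> int:
    """Percentage saved relative to the on-demand rate, rounded to a whole number."""
    return round((1 - hourly / baseline) * 100)


for name, rate in RATES.items():
    print(f"{name:28s} ${monthly_cost(rate):>6.2f}/month  -{saving_pct(rate)}%")
```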
Decision framework:
Is this workload running > 70% of the time?
│
├── YES ──► Buy Reserved Instance or Savings Plan
│ (1yr = ~36% saving, 3yr = ~60% saving)
│
└── NO ───► Is it fault-tolerant / can restart?
│
├── YES ──► Use Spot Instances
│ (up to 90% saving)
│
└── NO ───► On-Demand
(dev/test, unpredictable)
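The decision tree above can be sketched as a small function, together with the break-even arithmetic behind the utilisation threshold. The function names and return labels are hypothetical. The 70% cut-off is the rule of thumb from the tree, not a value AWS publishes.

```python
def break_even_utilisation(committed_hourly: float, on_demand_hourly: float) -> float:
    """Fraction of hours a workload must run before a commitment beats on-demand.

    A commitment is billed for every hour of the term; on-demand is billed
    only for the hours actually used.
    """
    return committed_hourly / on_demand_hourly


def choose_pricing_model(utilisation: float, fault_tolerant: bool) -> str:
    """Apply the decision tree: utilisation is the fraction of time the workload runs."""
    if utilisation > 0.70:
        return "reserved-or-savings-plan"
    if fault_tolerant:
        return "spot"
    return "on-demand"


# 1yr No Upfront Reserved ($0.061/hr) vs on-demand ($0.096/hr) for m5.large
print(round(break_even_utilisation(0.061, 0.096), 3))
print(choose_pricing_model(0.95, fault_tolerant=False))
print(choose_pricing_model(0.20, fault_tolerant=True))
```

With the 1-year No Upfront rate, the break-even point sits a little below the 70% threshold, so the threshold leaves some margin for workloads whose usage dips.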
Spot Instance example — batch ML training job:
import base64
import boto3

ec2 = boto3.client('ec2', region_name='us-east-1')

# request_spot_instances expects user data as a base64-encoded string
with open('train_script.sh', 'rb') as f:
    user_data = base64.b64encode(f.read()).decode('ascii')

response = ec2.request_spot_instances(
    SpotPrice='0.05',  # Max hourly price in USD; omit to cap at the on-demand price
    InstanceCount=1,
    Type='one-time',
    LaunchSpecification={
        'ImageId': 'ami-0c02fb55956c7d316',
        'InstanceType': 'c5.2xlarge',
        'KeyName': 'my-key',
        'IamInstanceProfile': {'Name': 'ml-training-role'},
        'UserData': user_data,
    }
)
print("Spot request ID:", response['SpotInstanceRequests'][0]['SpotInstanceRequestId'])
Tagging Strategy for Cost Allocation {#tagging}
Without tags, you cannot answer: "How much does the payments service cost?" Tags are the foundation of cost allocation.
Mandatory tag schema (enforce via AWS Config or Azure Policy):
# Terraform: define the mandatory tags once, then attach them to each resource
locals {
  mandatory_tags = {
    Environment = var.environment # prod / staging / dev
    Team        = var.team        # platform / payments / data
    Product     = var.product     # checkout / analytics / auth
    CostCentre  = var.cost_centre # CC-1234
    Owner       = var.owner_email # team-lead@company.com
    ManagedBy   = "terraform"
  }
}
resource "aws_instance" "app" {
  ami           = data.aws_ami.amazon_linux.id
  instance_type = "t3.medium"
  tags          = local.mandatory_tags
}
Enforce tagging compliance with AWS Config:
aws configservice put-config-rule --config-rule '{
"ConfigRuleName": "required-tags",
"Source": {
"Owner": "AWS",
"SourceIdentifier": "REQUIRED_TAGS"
},
"InputParameters": "{\"tag1Key\":\"Environment\",\"tag2Key\":\"Team\",\"tag3Key\":\"CostCentre\"}"
}'
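Once tags are in place, compliance checks and cost allocation both reduce to simple lookups. The sketch below shows the idea on plain dictionaries. The record shape (`tags`, `monthly_cost`), the `UNALLOCATED` bucket, and both function names are assumptions for illustration, not the output format of any AWS API.

```python
# Keys from the mandatory tag schema above.
MANDATORY_TAG_KEYS = {"Environment", "Team", "Product", "CostCentre", "Owner", "ManagedBy"}


def missing_tags(tags: dict) -> list:
    """Return the mandatory keys that are absent or blank, in sorted order."""
    return sorted(k for k in MANDATORY_TAG_KEYS if not str(tags.get(k, "")).strip())


def cost_by_team(resources: list) -> dict:
    """Sum monthly cost per Team tag; spend without a Team tag goes to UNALLOCATED."""
    totals = {}
    for resource in resources:
        team = resource.get("tags", {}).get("Team") or "UNALLOCATED"
        totals[team] = totals.get(team, 0.0) + resource["monthly_cost"]
    return totals


# Hypothetical inventory records
inventory = [
    {"tags": {"Team": "payments", "Environment": "prod"}, "monthly_cost": 1200.0},
    {"tags": {"Team": "payments"}, "monthly_cost": 300.0},
    {"tags": {}, "monthly_cost": 450.0},
]
print(missing_tags(inventory[0]["tags"]))
print(cost_by_team(inventory))
```

The size of the `UNALLOCATED` bucket is a useful health metric in its own right: the larger it is, the less of the bill you can attribute to a team.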
Forecasting Cloud Costs {#forecasting}
AWS Cost Explorer forecast:
# Forecast next 3 months of EC2 compute costs
# (the SERVICE dimension uses the full billing name, not "Amazon EC2")
aws ce get-cost-forecast \
  --time-period Start=2026-05-01,End=2026-08-01 \
  --metric UNBLENDED_COST \
  --granularity MONTHLY \
  --prediction-interval-level 80 \
  --filter '{"Dimensions":{"Key":"SERVICE","Values":["Amazon Elastic Compute Cloud - Compute"]}}' \
  --query 'ForecastResultsByTime[*].{Month:TimePeriod.Start,Mean:MeanValue,Upper:PredictionIntervalUpperBound}' \
  --output table
Python — build your own cost forecast with linear regression:
import boto3
import numpy as np
from sklearn.linear_model import LinearRegression

ce = boto3.client('ce', region_name='us-east-1')

# Pull last 12 months of costs (the End date is exclusive)
response = ce.get_cost_and_usage(
    TimePeriod={'Start': '2025-04-01', 'End': '2026-04-01'},
    Granularity='MONTHLY',
    Metrics=['UnblendedCost'],
)
costs = [float(r['Total']['UnblendedCost']['Amount'])
         for r in response['ResultsByTime']]

# Fit a straight line: month index -> monthly cost
X = np.arange(len(costs)).reshape(-1, 1)
y = np.array(costs)
model = LinearRegression().fit(X, y)

# Forecast next 3 months
future = np.arange(len(costs), len(costs) + 3).reshape(-1, 1)
forecast = model.predict(future)
for i, cost in enumerate(forecast, 1):
    print(f"Month +{i}: ${cost:,.0f}")
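To see what the regression is doing without AWS credentials or scikit-learn, here is the same straight-line fit written out with the standard library only. The function name `linear_forecast` and the sample figures are hypothetical. It is an ordinary least-squares line, so it cannot capture seasonality or accelerating growth.

```python
from statistics import mean


def linear_forecast(costs: list, horizon: int = 3) -> list:
    """Fit cost = intercept + slope * month_index by least squares and extrapolate.

    Requires at least two data points.
    """
    n = len(costs)
    xs = list(range(n))
    x_bar, y_bar = mean(xs), mean(costs)
    # slope = covariance(x, y) / variance(x)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, costs))
             / sum((x - x_bar) ** 2 for x in xs))
    intercept = y_bar - slope * x_bar
    return [intercept + slope * x for x in range(n, n + horizon)]


# Hypothetical monthly totals in USD
history = [10_000, 12_000, 14_000, 16_000]
for i, cost in enumerate(linear_forecast(history), 1):
    print(f"Month +{i}: ${cost:,.0f}")
```

If your spend compounds month over month, a straight line will undershoot. Fitting the same line to the logarithm of the costs is one common adjustment.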
Cost Anomaly Detection {#anomaly}
AWS Cost Anomaly Detection uses ML to identify unusual spend patterns and alert you before the bill arrives.
# Create an anomaly monitor for all AWS services
aws ce create-anomaly-monitor --anomaly-monitor '{
"MonitorName": "AllServicesMonitor",
"MonitorType": "DIMENSIONAL",
"MonitorDimension": "SERVICE"
}'
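# Create an alert subscription: daily email for anomalies with a total impact of $100 or more
# (ThresholdExpression replaces the deprecated Threshold field; account IDs are 12 digits)
aws ce create-anomaly-subscription --anomaly-subscription '{
  "SubscriptionName": "DailyAnomalyAlert",
  "MonitorArnList": ["arn:aws:ce::123456789012:anomalymonitor/MONITOR_ID"],
  "Subscribers": [{"Address": "team@company.com", "Type": "EMAIL"}],
  "ThresholdExpression": {"Dimensions": {"Key": "ANOMALY_TOTAL_IMPACT_ABSOLUTE", "MatchOptions": ["GREATER_THAN_OR_EQUAL"], "Values": ["100"]}},
  "Frequency": "DAILY"
}'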
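AWS does not publish the models behind Cost Anomaly Detection, so the sketch below is a much simpler stand-in that illustrates the general idea: compare each day's spend with its recent history and flag days that are both statistically unusual and large in dollar terms. The function name, the 7-day window, the z-score threshold, and the sample data are all assumptions.

```python
from statistics import mean, stdev


def find_anomalies(daily_costs: list, window: int = 7,
                   z_threshold: float = 3.0, min_impact: float = 100.0) -> list:
    """Return (day_index, excess_cost) for days that exceed the trailing-window
    mean by at least min_impact dollars and by more than z_threshold standard deviations."""
    anomalies = []
    for i in range(window, len(daily_costs)):
        history = daily_costs[i - window:i]
        mu, sigma = mean(history), stdev(history)
        excess = daily_costs[i] - mu
        if excess >= min_impact and excess > z_threshold * sigma:
            anomalies.append((i, round(excess, 2)))
    return anomalies


# Hypothetical daily spend in USD: a steady week followed by a spike
spend = [500, 510, 495, 505, 500, 490, 500, 900]
print(find_anomalies(spend))
```

The dollar floor plays the same role as the $100 threshold in the subscription above: it suppresses alerts for spikes that are unusual but too small to matter.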
Tools Comparison {#tools}
| Tool | Best for | Cost | Strengths |
|---|---|---|---|
| AWS Cost Explorer | AWS-only environments | Free | Native, deep AWS integration |
| AWS Compute Optimizer | EC2/Lambda rightsizing | Free | ML-based recommendations |
| Azure Cost Management | Azure environments | Free | Native Azure, budget alerts |
| Infracost | Pre-deployment cost estimates | Free/paid | Integrates with Terraform CI/CD |
| CloudHealth (VMware) | Multi-cloud enterprises | Paid | Cross-cloud visibility, governance |
| Apptio Cloudability | Large enterprises | Paid | FinOps reporting, showback |
| Spot.io (NetApp) | Spot instance automation | Paid | Automated spot management |
Infracost in CI/CD — show cost impact of every Terraform PR:
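# .github/workflows/infracost.yml (steps excerpt)
- name: Set up Infracost
  uses: infracost/actions/setup@v3
  with:
    api-key: ${{ secrets.INFRACOST_API_KEY }} # repository secret holding your Infracost API key (name assumed)
- name: Generate cost estimate
  run: infracost breakdown --path=./terraform --format=json --out-file=/tmp/infracost.json
- name: Post PR comment
  # Recent action versions ship only the setup step; the comment is posted via the CLI
  run: |
    infracost comment github --path=/tmp/infracost.json \
      --repo=$GITHUB_REPOSITORY \
      --github-token=${{ github.token }} \
      --pull-request=${{ github.event.pull_request.number }} \
      --behavior=update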
FAQ {#faq}
How do you optimize cloud costs on AWS? Highest-impact actions: rightsize underutilised instances with Compute Optimizer, buy Reserved Instances or Savings Plans for steady workloads (saves 36–60%), use Spot for batch jobs (saves up to 90%), enforce tagging for cost allocation, and set up Cost Anomaly Detection.
What is FinOps? FinOps is a cloud financial management practice that brings engineering, finance, and business together to maximise value per dollar spent on cloud. It's not about cutting costs — it's about making informed spending decisions.
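How much can you save by rightsizing? Typically 20–40% of compute costs. Most environments have instances running at under 20% CPU utilisation. Switching to Graviton (ARM) instances can add up to 40% better price-performance by AWS's figures, provided your software and dependencies have ARM64 builds; interpreted-language workloads often move with no code changes, while compiled code needs a rebuild.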
What is the difference between Reserved Instances and Savings Plans? Standard Reserved Instances commit you to a specific instance family in a specific region. Savings Plans are more flexible: you commit to an hourly spend, and with a Compute Savings Plan the discount applies across instance families, regions, Lambda, and Fargate. EC2 Instance Savings Plans offer a deeper discount but stay tied to one instance family in one region.
Dealing with a runaway cloud bill? Let's talk on LinkedIn.