AWS ECS Mumbai has mood swings - a DevOps engineer's perspective
MOJAHID UL HAQUE
DevOps Engineer
As a DevOps engineer, I've basically accepted that AWS ECS Mumbai has mood swings. Once or twice a month, it just… decides it's done with life. Deploy? Maybe. Pull images? If it feels like it. Random crash? Always a crowd pleaser.
And of course, the AWS status page sits there smiling like everything's perfectly normal.
Meanwhile, I'm digging through IAM, logs, task defs, pipelines, wondering if I forgot how computers work… only to realize it's just Mumbai taking a personal day again.
But who gets blamed?
"DevOps can't deploy."
Yes. Clearly, I woke up and told ECS to stop doing its job.
At this point, I just want a little stability and a status page that doesn't gaslight me while the region is on vacation.
Originally posted on LinkedIn
Related Posts
How I reduced AWS networking costs by 93% while removing public attack surface
I recently tackled a common but expensive challenge in AWS: the hidden cost of public IPv4 addresses. In a setup with dozens of ECS Fargate tasks, my "In-use Public IP" charges were hitting hundreds of dollars per month. Beyond the cost, having backend workers exposed to the public internet was a security risk I wanted to eliminate.

The Fix: I transitioned the entire architecture to a private-first model.
1. Disabled Public IPs: Moved all Fargate tasks to private mode within the VPC.
2. VPC Peering: Connected multiple VPCs using VPC Peering to enable secure, private communication between services across environments, no internet routing required.
3. Optimized Routing: Navigated complex DNS and routing requirements to ensure seamless communication between services without needing a NAT Gateway.
4. Added a Public Load Balancer: Introduced an internet-facing Application Load Balancer to handle inbound traffic. Only the load balancer is publicly accessible; backend services remain private.

The Results:
- Cost: Monthly networking spend for public IPs was eliminated entirely, replaced by a much smaller, fixed endpoint fee.
- Security: Drastically reduced the attack surface by ensuring backend workers are no longer reachable from the internet.
- Efficiency: The system is now more robust, secure, and cost-predictable.
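As a back-of-envelope check on savings like these, here is a minimal sketch of the public-IP cost math. It assumes AWS's published rate of $0.005 per in-use public IPv4 per hour (introduced in February 2024; verify against current pricing), and the task counts are illustrative, not the post's actual fleet:

```python
# Rough estimate of "In-use Public IP" charges before/after a
# private-first migration.
# Assumption: $0.005 per public IPv4 per hour (check current pricing).
RATE_PER_IP_HOUR = 0.005
HOURS_PER_MONTH = 730  # AWS's usual monthly-hours approximation


def monthly_public_ip_cost(num_ips: int) -> float:
    """Monthly cost of keeping num_ips public IPv4 addresses in use."""
    return num_ips * RATE_PER_IP_HOUR * HOURS_PER_MONTH


# Before: e.g. 30 Fargate tasks each holding a public IP (illustrative).
before = monthly_public_ip_cost(30)   # ~ $109.50/month
# After: only the internet-facing ALB keeps public IPs,
# say one per AZ across two AZs.
after = monthly_public_ip_cost(2)     # ~ $7.30/month

print(f"before: ${before:.2f}, after: ${after:.2f}")
print(f"savings: {100 * (1 - after / before):.0f}%")
```

With these illustrative numbers the savings land right around the 93% the post reports; the exact figure depends on how many tasks held public IPs and how many addresses the load balancer retains.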
Scaling Applications on AWS (Real Example)
See how to scale an application on AWS with a real architecture example covering stateless compute, data bottlenecks, caching, queues, and rollout safety.
Stop Leaving AWS Credits Unclaimed - That Outage Might've Owed You Money
Remember the AWS outage on October 20th? Six hours down. Over 100 services affected. Millions of users impacted. Everyone's talked about the RCA, multi-region setups, and resilience planning. But here's what most teams completely miss:
👉 You might be owed money.

The SLA Reality Check
AWS makes uptime promises like:
- Cognito — 99.9%
- DynamoDB — 99.99% (and 99.999% for Global Tables)
- EC2, Lambda, CloudFront… all have their own SLAs.

Now, do the math: 6 hours of downtime in a 30-day month = 99.17% uptime. That's below every single SLA above.

What That Means for You
If your services were affected, you're entitled to service credits — typically 10–25% of your monthly bill. So if you spend $10K/month on Cognito or DynamoDB… that's real money sitting unclaimed.

How to Claim It (Takes 10 Minutes)
1. Go to your AWS Support Center
2. Open a new case
3. List the affected services
4. Reference the SLA breach
5. Submit before the end of the second billing cycle

AWS won't credit you automatically. You have to ask.

The Takeaway
Yes, improve your DR and multi-region strategy. But also, don't forget to claim what you're owed. It's quick, it's legit, and your FinOps team will thank you.
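The uptime math above can be verified in a few lines. The SLA thresholds mirror the ones the post quotes, and the 10–25% credit range is the post's "typically" figure, not a guarantee (confirm the exact tiers against each service's current SLA page); the $10K bill is the post's hypothetical:

```python
# Verify the post's claim: 6 hours of downtime in a 30-day month
# falls below every quoted SLA.
HOURS_IN_MONTH = 30 * 24  # 720
DOWNTIME_HOURS = 6

uptime_pct = 100 * (HOURS_IN_MONTH - DOWNTIME_HOURS) / HOURS_IN_MONTH
print(f"Monthly uptime: {uptime_pct:.2f}%")  # 99.17%

# SLA commitments quoted in the post:
slas = {
    "Cognito": 99.9,
    "DynamoDB": 99.99,
    "DynamoDB Global Tables": 99.999,
}

# Every one of them is breached by a 99.17% month:
breached = [svc for svc, sla in slas.items() if uptime_pct < sla]
print("Breached SLAs:", breached)

# Illustrative credit at the post's "typically 10-25%" range,
# on a hypothetical $10K monthly bill:
monthly_bill = 10_000
low, high = 0.10 * monthly_bill, 0.25 * monthly_bill
print(f"Potential credit: ${low:,.0f} to ${high:,.0f}")
```

Note the actual credit percentage is tiered per service and per uptime band, so the real payout for a 99.17% month depends on where that figure lands in each service's SLA table.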