AWS is having a bad day

Matt The Horwood@lemmy.horwood.cloud · 11 hours ago

AWS is having a bad day

db0@lemmy.dbzer0.com · 11 hours ago

It’s wild that these cloud providers were seen as a one-way stop to ensure reliability, only to make them a universal single point of failure.

joel_feila@lemmy.world · 8 minutes ago

Well companies use not for relibibut to outsource responsibility. Even a medium sized company treated Windows like a subscription for many many years. People have been emailing files to themself since the start of email.

For companies moving everything to msa or aws just was the next step and didn’t change day to operations

corsicanguppy@lemmy.ca · 5 hours ago

universal single point of failure.

If it’s not a region failure, it’s someone pushing untested slop into the devops pipeline and vaping a network config. So very fired.

4am@lemmy.zip · 4 hours ago

Apparently it was DNS. It’s always DNS…

GissaMittJobb@lemmy.ml · 10 hours ago

It’s mostly a skill issue for services that go down when USE-1 has issues in AWS - if you actually know your shit, then you don’t get these kinds of issues.

Case in point: Netflix runs on AWS and experienced no issues during this thing.

And yes, it’s scary that so many high-profile companies are this bad at the thing they spend all day doing

B0rax@feddit.org · edit-2 1 hour ago

Case in point: Netflix runs on AWS and experienced no issues during this thing.

But Netflix did encounter issues. For example the account cancel page did not work.

village604@adultswim.fan · 8 hours ago

Yeah, if you’re a major business and don’t have geographic redundancy for your service, you need to rework your BCDR plan.

Bob Robertson IX @discuss.tchncs.de · 8 hours ago

But… that costs money.

village604@adultswim.fan · 7 hours ago

So does an outage, but I get that the C-suite can only think one quarter at a time

sugar_in_your_tea@sh.itjust.works · edit-2 7 hours ago

Absolutely this. We are based out of one region, but also have a second region as a quick disaster recovery option, and we have people 24/7 who can manage the DR process. We’re not big enough to have live redundancy, but big enough that an hour of downtime would be a big deal.

corsicanguppy@lemmy.ca · 5 hours ago

I love the “git gud” response. Sacred cashcows?

tourist@lemmy.world · 6 hours ago

What’s the general plan of action when a company’s base region shits the bed?

Keep dormant mirrored resources in other regions?

I presumed the draw of us-east-1 was its lower cost, so if any solutions involve spending slightly more money, I’m not surprised high profile companies put all their eggs in one basket.

corsicanguppy@lemmy.ca · 5 hours ago

I presumed the draw of us-east-1 was its lower cost

At no time is pub-cloud cheaper than priv-cloud.

The draw is versatility, as change didn’t require spinning up hardware. No one knew how much the data costs would kill the budget, but now they do.

Nighed@feddit.uk · 11 hours ago

But if everyone else is down too, you don’t look so bad 🧠

clif@lemmy.world · edit-2 5 hours ago

One of our client support people told an angry client to open a Jira with urgent priority and we’d get right on it.

… the client support person knew full well that Jira was down too : D

At least, I think they knew. Either way, not shit we could do about it for that particular region until AWS fixed things.

queerlilhayseed@piefed.blahaj.zone · 10 hours ago

No one ever got fired for buying IBM.

Auli@lemmy.ca · 6 hours ago

Yes but now it is nobody ever got fired for buying Cisco.

cdzero@lemmy.ml · 9 hours ago

I wouldn’t be so sure about that. The state government of Queensland, Australia just lifted a 12 year ban on IBM getting government contracts after a colossal fuck up.

queerlilhayseed@piefed.blahaj.zone · edit-2 8 hours ago

It’s an old joke from back when IBM was the dominant player in IT infrastructure. The idea was that IBM was such a known quantity that even non-technical executives knew what it was and knew that other companies also used IBM equipment. If you decide to buy from a lesser known vendor and something breaks, you might be blamed for going off the beaten track and fired (regardless of where the fault actually lay), whereas if you bought IBM gear and it broke, it was simply considered the cost of doing business, so buying IBM became a CYA tactic for sysadmins even if it went against their better technical judgement. AWS is the modern IBM.

sugar_in_your_tea@sh.itjust.works · 7 hours ago

AWS is the modern IBM.

That’s basically why we use it at work. I hate it, but that’s how things are.

ByteJunk@lemmy.world · 7 hours ago

Such a monstrous clusterfuck, and you’ll be hard pressed to find anyone having been sacked, let alone facing actual charges over the whole debacle.

If anything, I’d say that’s the single best case for buying IBM - if you’re incompetent and/or corrupt, just go with them and even if shit hits the fan, you’ll be OK.

tburkhol@lemmy.world · 11 hours ago

It is still a logical argument, especially for smaller shops. I mean, you can (as self-hosters know) set up automatic backups, failover systems, and all that, but it takes significant time & resources. Redundant internet connectivity? Redundant power delivery? Spare capacity to handle a 10x demand spike? Those are big expenses for small, even mid-sized business. No one really cares if your dentist’s office is offline for a day, even if they have to cancel appointments because they can’t process payments or records.

Meanwhile, theoretically, reliability is such a core function of cloud providers that they should pay for experts’ experts and platinum standard infrastructure. It makes any problem they do have newsworthy.

I mean,it seems silly for orgs as big and internet-centric as Fortnite, Zoom, or forturne-500 bank to outsource their internet, and maybe this will be a lesson for them.

village604@adultswim.fan · 8 hours ago

It’s also silly for the orgs to not have geographic redundancy.

killabeezio@lemmy.zip · 3 hours ago

No it’s not. It’s very expensive to run and there are a lot of edge cases. It’s much easier to have regional redundancy for a fraction of the cost.

ms.lane@lemmy.world · 11 hours ago

They zigged when we all zagged.

Decentralisation has always been the answer.

Matt The Horwood@lemmy.horwood.cloud · 11 hours ago

yeah, so many things now use AWS in some way. So when AWS has a cold, the internet shivers

relativestranger@feddit.nl · 9 hours ago

sidekicks in '09. had so many users here affected.

never again.

wirebeads@lemmy.ca · 10 hours ago

A single point of failure you pay them for.