• db0@lemmy.dbzer0.com · 205 points · 11 hours ago

    It’s wild that these cloud providers were seen as a one-stop shop for reliability, only for them to become a universal single point of failure.

    • joel_feila@lemmy.world · 1 point · 8 minutes ago

      Well, companies don’t use them for reliability but to outsource responsibility. Even a medium-sized company treated Windows like a subscription for many, many years. People have been emailing files to themselves since the start of email.

      For those companies, moving everything to MSA or AWS was just the next step and didn’t change day-to-day operations.

    • corsicanguppy@lemmy.ca · 7 points · 5 hours ago

      > universal single point of failure.

      If it’s not a region failure, it’s someone pushing untested slop into the devops pipeline and vaping a network config. So very fired.

    • GissaMittJobb@lemmy.ml · 52 points · 10 hours ago

      It’s mostly a skill issue for services that go down when us-east-1 has problems in AWS - if you actually know your shit, then you don’t get these kinds of issues.

      Case in point: Netflix runs on AWS and experienced no issues during this thing.

      And yes, it’s scary that so many high-profile companies are this bad at the thing they spend all day doing.

      • B0rax@feddit.org · 3 points · edited · 1 hour ago

        > Case in point: Netflix runs on AWS and experienced no issues during this thing.

        But Netflix did encounter issues. For example, the account cancellation page did not work.

      • village604@adultswim.fan · 13 points · 8 hours ago

        Yeah, if you’re a major business and don’t have geographic redundancy for your service, you need to rework your BCDR plan.
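
        Geographic redundancy doesn’t have to be fancy, either. As a rough sketch (boto3, with made-up zone and host names, and assuming a standby copy of the service already runs in a second region), Route 53 failover routing plus a health check will flip DNS away from the dead region on its own:

            import boto3

            route53 = boto3.client("route53")

            # Hypothetical IDs and hostnames - substitute your own.
            ZONE_ID = "Z123EXAMPLE"
            PRIMARY_HEALTH_CHECK = "hc-primary-example"

            def upsert_failover_records():
                """Serve app.example.com from us-east-1 normally, and from
                us-west-2 whenever the primary health check fails."""
                route53.change_resource_record_sets(
                    HostedZoneId=ZONE_ID,
                    ChangeBatch={"Changes": [
                        {"Action": "UPSERT", "ResourceRecordSet": {
                            "Name": "app.example.com", "Type": "CNAME",
                            "SetIdentifier": "primary", "Failover": "PRIMARY",
                            "TTL": 60, "HealthCheckId": PRIMARY_HEALTH_CHECK,
                            "ResourceRecords": [{"Value": "lb-use1.example.com"}]}},
                        {"Action": "UPSERT", "ResourceRecordSet": {
                            "Name": "app.example.com", "Type": "CNAME",
                            "SetIdentifier": "secondary", "Failover": "SECONDARY",
                            "TTL": 60,
                            "ResourceRecords": [{"Value": "lb-usw2.example.com"}]}},
                    ]},
                )

        The DNS part is the cheap bit; the expensive bit is keeping the standby region’s data and deploys close enough to current to be worth failing over to.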

        • sugar_in_your_tea@sh.itjust.works · 4 points · edited · 7 hours ago

          Absolutely this. We are based out of one region, but also have a second region as a quick disaster recovery option, and we have people 24/7 who can manage the DR process. We’re not big enough to have live redundancy, but big enough that an hour of downtime would be a big deal.
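
          For anyone wondering what a quick DR option like that boils down to in practice, a pilot-light setup is usually two scripted steps: promote the replicated database in the standby region, then scale its dormant app tier up to serving capacity. A minimal sketch with boto3 and made-up resource names (the region, replica, and ASG names here are assumptions, not anyone’s real setup):

              import boto3

              # Hypothetical identifiers, for illustration only.
              DR_REGION = "us-west-2"
              REPLICA_ID = "app-db-replica"
              ASG_NAME = "app-web-dr"

              def fail_over_to_dr_region():
                  """Promote the cross-region read replica, then scale up the standby app tier."""
                  rds = boto3.client("rds", region_name=DR_REGION)
                  autoscaling = boto3.client("autoscaling", region_name=DR_REGION)

                  # 1. Make the replica a standalone, writable database.
                  rds.promote_read_replica(DBInstanceIdentifier=REPLICA_ID)
                  rds.get_waiter("db_instance_available").wait(DBInstanceIdentifier=REPLICA_ID)

                  # 2. Bring the dormant app tier up from (near) zero.
                  autoscaling.update_auto_scaling_group(
                      AutoScalingGroupName=ASG_NAME, MinSize=4, DesiredCapacity=4,
                  )
                  # DNS cutover to the DR region happens as a separate step.

          The point isn’t the exact calls, it’s that the runbook stays short enough that the on-call person can actually run it under pressure.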

      • tourist@lemmy.world · 4 points · 6 hours ago

        What’s the general plan of action when a company’s base region shits the bed?

        Keep dormant mirrored resources in other regions?

        I presumed the draw of us-east-1 was its lower cost, so if any solutions involve spending slightly more money, I’m not surprised high-profile companies put all their eggs in one basket.

          • corsicanguppy@lemmy.ca · 2 points · 5 hours ago

            > I presumed the draw of us-east-1 was its lower cost

          At no time is pub-cloud cheaper than priv-cloud.

            The draw is versatility, since making changes didn’t require spinning up hardware. No one knew how much the data costs would kill the budget, but now they do.

    • Nighed@feddit.uk · 96 points (1 downvote) · 11 hours ago

      But if everyone else is down too, you don’t look so bad 🧠

      • clif@lemmy.world · 7 points · edited · 5 hours ago

        One of our client support people told an angry client to open a Jira with urgent priority and we’d get right on it.

        … the client support person knew full well that Jira was down too : D

        At least, I think they knew. Either way, not shit we could do about it for that particular region until AWS fixed things.

        • cdzero@lemmy.ml · 14 points · 9 hours ago

          I wouldn’t be so sure about that. The state government of Queensland, Australia just lifted a 12-year ban on IBM getting government contracts after a colossal fuck up.

          • queerlilhayseed@piefed.blahaj.zone · 35 points · edited · 8 hours ago

            It’s an old joke from back when IBM was the dominant player in IT infrastructure. The idea was that IBM was such a known quantity that even non-technical executives knew what it was and knew that other companies also used IBM equipment. If you decided to buy from a lesser-known vendor and something broke, you might be blamed for going off the beaten track and fired (regardless of where the fault actually lay), whereas if you bought IBM gear and it broke, it was simply considered the cost of doing business. Buying IBM thus became a CYA tactic for sysadmins even when it went against their better technical judgement. AWS is the modern IBM.

          • ByteJunk@lemmy.world · 3 points · 7 hours ago

            Such a monstrous clusterfuck, and you’ll be hard-pressed to find anyone who was sacked, let alone anyone facing actual charges over the whole debacle.

            If anything, I’d say that’s the single best case for buying IBM: if you’re incompetent and/or corrupt, just go with them, and even if shit hits the fan, you’ll be OK.

    • tburkhol@lemmy.world · 27 points (1 downvote) · 11 hours ago

      It is still a logical argument, especially for smaller shops. I mean, you can (as self-hosters know) set up automatic backups, failover systems, and all that, but it takes significant time & resources. Redundant internet connectivity? Redundant power delivery? Spare capacity to handle a 10x demand spike? Those are big expenses for small, even mid-sized, businesses. No one really cares if your dentist’s office is offline for a day, even if they have to cancel appointments because they can’t process payments or records.

      Meanwhile, reliability is, in theory, such a core function of cloud providers that they should be paying for experts’ experts and platinum-standard infrastructure. It makes any problem they do have newsworthy.

      I mean, it seems silly for orgs as big and internet-centric as Fortnite, Zoom, or a Fortune 500 bank to outsource their internet, and maybe this will be a lesson for them.

        • killabeezio@lemmy.zip · 1 point · 3 hours ago

          No, it’s not. It’s very expensive to run and there are a lot of edge cases. It’s much easier to have regional redundancy for a fraction of the cost.