
The browser wars are back, and this time they’re powered by AI

24 October 2025 at 23:00
The browser wars are heating up again, this time with AI in the driver’s seat. OpenAI just launched Atlas, a ChatGPT-powered browser that lets users surf the web using natural language, and even includes an “agent mode” that can complete tasks autonomously. It’s one of the biggest browser launches in recent memory, but it’s debuting […]

How the AWS outage happened: Amazon blames rare software bug and ‘faulty automation’ for massive glitch

24 October 2025 at 00:46
(GeekWire Photo / Todd Bishop)

A detailed explanation of this week’s Amazon Web Services outage, released Thursday morning, confirms that it wasn’t a hardware glitch or an outside attack but a complex, cascading failure triggered by a rare software bug in one of the company’s most critical systems.

The company said a “faulty automation” in its internal systems — two independent programs that began racing each other to update records — erased key network entries for its DynamoDB database service, triggering a domino effect that temporarily broke many other AWS tools.
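
To make that failure mode concrete, here is a minimal, hypothetical sketch in Python. The names and structure are my own illustration, not AWS’s actual code; it shows how a stale writer that lands late, combined with a cleanup pass that trusts the table, can erase a live record entirely.

    # Hypothetical illustration of the race described above -- not AWS's code.
    dns_table: dict[str, str] = {}

    def apply_plan(record: str, plan_id: int) -> None:
        # Missing guard: a worker should refuse to overwrite a newer plan
        # (e.g. compare plan IDs first). Without that check, a delayed
        # worker carrying an old plan can clobber a fresh one.
        dns_table[record] = f"plan-{plan_id}"

    def cleanup_stale(record: str, newest_plan_id: int) -> None:
        # Cleanup assumes the table holds the newest plan; if a stale
        # writer won the race, the only entry the service has gets deleted.
        if record in dns_table and dns_table[record] != f"plan-{newest_plan_id}":
            del dns_table[record]

    # One bad interleaving: a fast worker applies plan 7, a slow worker
    # finally lands with the older plan 6, then cleanup runs.
    apply_plan("db.example.internal", 7)
    apply_plan("db.example.internal", 6)   # stale write arrives late
    cleanup_stale("db.example.internal", 7)
    print(dns_table)  # {} -- the record for the database endpoint is gone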

AWS said it has turned off the flawed automation worldwide and will fix the bug before bringing it back online. The company also plans to add new safety checks and improve how quickly its systems recover if something similar happens again.

Amazon apologized and acknowledged the widespread disruption caused by the outage.

“While we have a strong track record of operating our services with the highest levels of availability, we know how critical our services are to our customers, their applications and end users, and their businesses,” the company said, promising to learn from the incident.

The outage began early Monday and impacted sites and online services around the world, again illustrating the internet’s deep reliance on Amazon’s cloud and showing how a single failure inside AWS can quickly ripple across the web.

Related: The AWS outage is a warning about the risks of digital dependence and AI infrastructure

The AWS outage is a warning about the risks of digital dependence and AI infrastructure

23 October 2025 at 00:08
The show floor at AWS re:Invent 2024 in Las Vegas. (GeekWire File Photo)

Unless you’ve been on a “digital cleanse” this week, you know that Amazon Web Services (AWS) had a major outage at the start of the week.

You know this because apps and sites you use were down. Credible reports estimate at least 1,000 sites and apps were affected. Large swaths of modern digital life went dark: from finance (Venmo and Robinhood) to gaming (Roblox and Fortnite) to communications (Signal and Slack). Some people couldn’t even get a good night’s sleep because the outage took out “smart beds.” Even sporting events were impacted when Ticketmaster failed.

We’ve seen outages before, but this one seemed broader and harder to ignore.

In the wake of the outage, many well-intentioned hot takes boiled down to: “They should’ve used more cloud providers.”

Setting aside the subtle victim-blaming, there’s also the fact that in a world with only three major cloud providers (AWS, Microsoft Azure, and Google Cloud), if you want to “diversify,” there isn’t much diversity to choose from.

And the argument for diversity in cloud providers is really about market diversity, not individual organizations juggling multiple vendors. More competition in the cloud market would mean fewer cascading failures when one provider goes down.

The key question when something like this happens is whether we take the risk lessons and extend them beyond the immediate problem to the emerging ones.

Instead of saying organizations need multiple cloud providers, we should be asking how we deal with highly concentrated risks that carry exceptionally broad impact, because we just had an object lesson in what that really means.

This outage points to where we should be looking proactively to apply that lesson: generative AI. It offers two lessons for the emerging generative AI ecosystem.

Concentration crisis in AI

With the generative AI ecosystem, I’m not talking about chatbots; I mean AI-native applications that are built on generative AI as a platform. We just saw that when there’s no cloud, there’s no cloud-native application. Likewise, when there’s no generative AI provider, there’s no AI-native application.

The first lesson from the AWS outage for AI-native applications is what happens to an industry when a limited number of providers supply a centralized resource and one of them has an outage. We just saw the answer: the effects ripple across the industry and every walk of life built on it.

It’s a throwback to the mainframe era: when “the computer” is down, it’s down for everyone.

There are as few generative AI providers as there are cloud providers, if not fewer. A major outage is inevitable; that’s just engineering reality. When it happens, every AI-native app built on that generative AI platform will also go down, full stop.

The impact could be even more severe than the AWS outage. It will be more like “the computer is down, and the people are gone” for many different industries and services. Ironically, the “smarter” the industry and service, the greater the potential fallout.

The second lesson is one of intertwined risk. OpenAI itself was affected by this week’s AWS outage. 

That means AI-native apps have double exposure to the risks around a limited number of providers for critical, centralized resources. For AI-native apps, it’s like the mainframe era squared. If the generative AI platform fails, everything built on it fails. And if the cloud that hosts the AI platform fails, it all goes down, too.
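
To see why that double exposure matters, here is a back-of-the-envelope sketch in Python. All of the availability numbers are hypothetical, and it treats the two layers’ failures as independent for simplicity; the point is only that serial dependencies multiply.

    # Rough illustration of stacked dependencies; all numbers hypothetical.
    # An AI-native app needs both its cloud layer and its AI platform up,
    # so (treating their failures as independent) its availability is
    # roughly the product of the two.
    cloud = 0.9995        # hypothetical cloud availability
    ai_platform = 0.999   # hypothetical AI-platform availability while the cloud is up

    app = cloud * ai_platform
    downtime_hours = (1 - app) * 365 * 24  # expected unavailability per year

    print(f"effective app availability: {app:.4%}")               # ~99.85%
    print(f"expected downtime: ~{downtime_hours:.0f} hours/year")  # ~13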

This is not to say don’t do cloud or don’t do AI. But it is to say we need to understand the new, complex intertwining of risks in a world where everything relies on a small number of key providers, and those providers in turn rely on a small number of key providers of their own.

The physical requirements and capital investment behind cloud and generative AI make a truly diverse ecosystem impracticable for either. I don’t think anyone sees more than a literal handful of providers for these technologies in the future.

The bottom line

Highly concentrated risks with exceptionally broad impact aren’t going away anytime soon. 

But the growth of generative AI providers, and their reliance on cloud providers, shows where the growth will be and where and what the risks will be. The growth will be upward, as technologies stack on top of and rely on each other. And that means these risks will only become more concentrated and the impacts even broader.

In the world of security, there’s the “CIA” triad: “confidentiality,” “integrity,” and “availability.” In the first days of “Trustworthy Computing” at Microsoft, the principles included availability. But in recent years, availability has often been overlooked as security and privacy concerns understandably dominate.

A thoughtful reading of the AWS outage tells us that failures like this aren’t anomalies: they’re inherent in the nature of today’s technology. And since there are no easy solutions, only increasingly complex problems, we need to start understanding this new reality and thinking seriously about how to mitigate these risks.

Amazon customers report delivery delays after major AWS outage

21 October 2025 at 19:48
An Amazon Prime delivery van parked near the company’s Seattle’s headquarters. (GeekWire File Photo / Kurt Schlosser)

Amazon’s e-commerce customers are experiencing unusual delivery delays following the Amazon Web Services outage on Monday — suggesting that the cloud glitch has impacted the company’s own operations more than previously reported.

Customers posting on Reddit and X reported Amazon orders that were scheduled for Monday delivery but did not arrive. Some of the comments:

  • “I received a delay email on everything due today. Coming tomorrow and I’m fine with that.”
  • “I have 4 items that are suppose to be delivered today as well and they haven’t even left the facility. So I’m sure it’s the outage.”
  • “My amazon fresh order was cancelled at 5:15PM.”

Amazon workers posting on the “r/AmazonFC” Reddit community cited downtime at fulfillment centers.

  • “Today was the first day I’ve experienced an entire day of downtime, and not as a shutdown for maintenance. Very odd feeling to maintain a constant state of readiness for 10 hours in case the system comes back at any moment.”

We reached out to Amazon for details about delayed deliveries.

Amazon’s package fulfillment systems run atop AWS infrastructure — so disruptions in key AWS services can ripple directly into its retail and logistics network.

Amazon’s logistics arm processes about 17.2 million delivery orders per day, according to Capital One.

The fallout from delayed deliveries could lead to increased costs due to potential refund obligations and additional labor needs.

The outage started shortly after midnight Monday and lasted for about three hours, but the aftershock effects were felt by Amazon’s cloud customers for much of the day. The company blamed a DNS resolution issue with its DynamoDB service in the US-EAST-1 region, its oldest and largest digital hub. Major outages originating from this same region also caused widespread disruptions in 2017, 2021, and 2023.
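
For readers who want to see what a “DNS resolution issue” looks like from the client side, here is a minimal Python sketch (my own illustration, not Amazon’s diagnostics). The service behind the name can be perfectly healthy, but if the name won’t resolve, every caller fails as though it were down.

    import socket

    # DynamoDB's public endpoint in the affected region.
    endpoint = "dynamodb.us-east-1.amazonaws.com"

    try:
        addrs = socket.getaddrinfo(endpoint, 443, proto=socket.IPPROTO_TCP)
        print(f"{endpoint} resolves to {len(addrs)} address(es)")
    except socket.gaierror as err:
        # During the outage, lookups like this failed, so client SDK calls
        # errored out before a connection was even attempted.
        print(f"DNS resolution failed: {err}")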

The outage impacted everything from websites such as Facebook, Coinbase, and Ticketmaster to check-in kiosks at LaGuardia Airport. Amazon’s own retail site, its Prime Video streaming service, and its Ring subsidiary were also affected.

Despite the major outage, Amazon’s stock was up Monday and in early Tuesday trading.
