GeekWire’s Todd Bishop tries Amazon’s new smart delivery glasses in a simulated demo.
SAN FRANCISCO — Putting on Amazon’s new smart delivery glasses felt surprisingly natural from the start. Despite their high-tech components and slightly bulky design, they were immediately comfortable and barely heavier than my normal glasses.
Then a few lines of monochrome green text and a square target popped up in the right-hand lens — reminding me that these were not my regular frames.
Occupying just a portion of my total field of view, the text showed an address and a sorting code: “YLO 339.” As I learned, “YLO” represented the yellow tote bag where the package would normally be found, and “339” was a special code on the package label.
My task: find the package with that code. Or more precisely, let the glasses find it.
Amazon image from a separate demo, showing the process of scanning packages with the new glasses.
As soon as I looked at the correct package label, the glasses recognized the code and scanned the label automatically. A checkmark appeared on a list of packages in the glasses.
Then an audio alert played from the glasses: “Dog on property.”
When all the packages were scanned, the tiny green display immediately switched to wayfinding mode. A simple map appeared, showing my location as a dot and the delivery destinations marked with pins. In this simulation, there were two pins, indicating two stops.
After I placed the package on the doorstep, it was time for proof of delivery. Instead of reaching for a phone, I looked at the package and pressed a button once on the small controller unit — the “compute puck” — on my harness. The glasses captured a photo.
With that, my simulated delivery was done, without ever touching a handheld device.
In my very limited experience, the biggest concern I had was the potential to be distracted — focusing my attention on the text in front of my eyes rather than the world around me. I understand now why the display automatically turns off when a van is in motion.
But when I mentioned that concern to the Amazon leaders guiding me through the demo, they pointed out that the alternative is looking down at a device. With the glasses, your gaze is up and largely unobstructed, theoretically making it much easier to notice possible hazards.
Beyond the fact that they’re not intended for public release, that simplicity is a key difference between Amazon’s utilitarian design and other augmented reality devices — such as Meta’s Ray-Ban smart glasses, Apple Vision Pro, and Magic Leap — which aim to more fully enhance or overlay the user’s environment.
One driver’s experience
KC Pangan, who delivers Amazon packages in San Francisco and was featured in Amazon’s demo video, said wearing the glasses has become so natural that he barely notices them.
Pangan has been part of an Amazon study for the past two months. On the rare occasions when he switches back to the old handheld device, he finds himself thinking, “Oh, this thing again.”
“The best thing about them is being hands-free,” Pangan said in a conversation on the sidelines of the Amazon Delivering the Future event, where the glasses were unveiled last week.
Without needing to look down at a handheld device, he can keep his eyes up and stay alert for potential hazards. With an extra hand free, he can maintain the all-important three points of contact when climbing in or out of a vehicle, and more easily carry packages and open gates.
The glasses, he said, “do practically everything for me” — taking photos, helping him know where to walk, and showing his location relative to his van.
While Amazon emphasizes safety and driver experience as the primary goals, early tests hint at efficiency gains as well: Amazon has seen up to 30 minutes of time savings per shift, although executives cautioned that the results are preliminary and could change with wider testing.
KC Pangan, an Amazon delivery driver in San Francisco who has been part of a pilot program for the new glasses. (GeekWire Photo / Todd Bishop)
Regulators, legislators and employees have raised red flags over new technology pushing Amazon fulfillment and delivery workers to the limits of human capacity and safety. Amazon disputes this premise, and calls the new glasses part of a larger effort to use technology to improve safety.
Using the glasses will be fully optional for both its Delivery Service Partners (DSPs) and their drivers, even when they’re fully rolled out, according to the company. The system also includes privacy features, such as a hardware button that allows drivers to turn off all sensors.
For those who use them, the company says it plans to provide the devices at no cost.
Despite the way it may look to the public, Amazon doesn’t directly employ the drivers who deliver its packages in Amazon-branded vans and uniforms. Instead, it contracts with DSPs, ostensibly independent companies that hire drivers and manage package deliveries from inside Amazon facilities.
With the introduction of smart glasses and other tech initiatives, including a soon-to-be-expanded training program, Amazon is deepening its involvement with DSPs and their drivers — potentially raising more questions about who truly controls the delivery workforce.
From ‘moonshot’ to reality
The smart glasses, still in their prototype phase, trace their origins to a brainstorming session about five years ago, said Beryl Tomay, Amazon’s vice president of transportation.
Each year, the team brainstorms big ideas for the company’s delivery system. During one of those sessions, a question emerged: What if drivers didn’t have to interact with any technology at all?
“The moonshot idea we came up with was, what if there was no technology that the driver had to interact with — and they could just follow the physical process of delivering a package from the van to the doorstep?” Tomay said in an interview. “How do we make that happen so they don’t have to use a phone or any kind of tech that they have to fiddle with?”
Beryl Tomay, Amazon’s vice president of transportation, introduces the smart glasses at Amazon’s Delivering the Future event. (GeekWire Photo / Todd Bishop)
That question led the team to experiment with different approaches before settling on glasses. It seemed kind of crazy at first, Tomay said, but they soon realized the potential to improve safety and the driver experience. Early trials with delivery drivers confirmed the theory.
“The hands-free aspect of it was just kind of magical,” she said, summing up the reaction from early users.
The project has already been tested with hundreds of delivery drivers across more than a dozen DSPs. Amazon plans to expand those trials in the coming months, with a larger test scheduled for November. The goal is to collect more feedback before deciding when the technology will be ready for wider deployment.
Typically, Amazon would have kept a new hardware project secret until later in its development, but Reuters reported on the existence of the project nearly a year ago. (The glasses were reportedly code-named “Amelia,” but they were announced without a name.) Announcing them now also lets Amazon get more delivery partners involved, gather input, and make improvements.
Future versions may also expand the system’s capabilities, using sensors and data to automatically recognize potential hazards such as uneven walkways.
How the technology works
Amazon’s smart glasses are part of a system that also includes a small wearable computer and a battery, integrated with Amazon’s delivery software and vehicle systems.
The lenses are photochromic, darkening automatically in bright sunlight, and can be fitted with prescription inserts. Two cameras — one centered, one on the left — support functions such as package scanning and photo capture for proof of delivery.
A built-in flashlight switches on automatically in dim conditions, while onboard sensors help the system orient to the driver’s movement and surroundings.
Amazon executive Viraj Chatterjee and driver KC Pangan demonstrate the smart glasses.
The glasses connect by a magnetic wire to a small controller unit, or “compute puck,” worn on the chest of a heat-resistant harness. The controller houses the device’s AI models, manages the visual display, and handles functions such as taking a delivery photo. It also includes a dedicated emergency button that connects drivers directly to Amazon’s emergency support systems.
On the opposite side of the chest, a swappable battery keeps the system balanced and running for a full route. Both components are designed for all-day comfort — the result, Tomay said, of extensive testing with drivers to ensure that wearing the gear feels natural when they’re moving around.
Connectivity runs through the driver’s official Amazon delivery phone via Bluetooth, and through the vehicle itself using a platform called “Fleet Edge” — a network of sensors and onboard computing modules that link the van’s status to the glasses.
This connection allows the glasses to know precisely when to activate, when to shut down, and when to sync data. When a van is put in park, the display automatically activates, showing details such as addresses, navigation cues, and package information. When the vehicle starts moving again, the display turns off — a deliberate safety measure so drivers never see visual data while driving.
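To make that interlock concrete, here is a minimal, purely illustrative sketch of the kind of park-state gating described above; the field names, gear values, and display strings are assumptions for the example, not Amazon's actual software.

```python
# Purely illustrative: a simple interlock like the one described above, where
# the heads-up display only renders delivery info while the van is in park.
# The VanState fields and display strings are hypothetical.
from dataclasses import dataclass

@dataclass
class VanState:
    gear: str          # "park", "drive", etc., as reported over the vehicle link
    stop_address: str

def update_display(state: VanState) -> str:
    if state.gear == "park":
        return f"SHOW: {state.stop_address} | route + package list"
    return "BLANK"     # display stays off whenever the vehicle is moving

print(update_display(VanState(gear="park", stop_address="123 Main St")))   # SHOW: ...
print(update_display(VanState(gear="drive", stop_address="123 Main St")))  # BLANK
```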
Data gathered by the glasses plays a role in Amazon’s broader mapping efforts. Imagery and sensor data feed into “Project Wellspring,” a system that uses AI to better model the physical world. This helps Amazon refine maps, identify the safest parking spots, pinpoint building entrances, and optimize walking routes for future deliveries.
Amazon says the data collection is done with privacy in mind. In addition to the driver-controlled sensor shut-off button, any imagery collected is processed to “blur or remove personally identifiable information” such as faces and license plates before being stored or used.
The implications go beyond routing and navigation. Conceivably, the same data could also lay the groundwork for greater automation in Amazon’s delivery network over time.
Testing the delivery training
In addition to trying the glasses during the event at Amazon’s Delivery Station in Milpitas, Calif., I experienced firsthand just how difficult the job of delivering packages can be.
GeekWire’s Todd Bishop uses an Amazon training program that teaches drivers to walk safely on slippery surfaces.
Strapped into a harness for a slip-and-fall demo, I learned how easily a driver can lose footing on slick surfaces if they aren’t careful about how they walk.
I tried a VR training device that highlighted hidden hazards like pets sleeping under tires and taught me how to navigate complex intersections safely.
My turn in the company’s Rivian van simulator proved humbling. Despite my best efforts, I ran red lights and managed to crash onto virtual sidewalks.
GeekWire’s Todd Bishop after a highly unsuccessful attempt to use Amazon’s driving simulator.
The simulator, known as the Enhanced Vehicle Operation Learning Virtual Experience (EVOLVE), has been launched at Amazon facilities in Colorado, Maryland, and Florida, and Amazon says it will be available at 40 sites by the end of 2026.
It’s part of what’s known as the Integrated Last Mile Driver Academy (iLMDA), a program available at 65 sites currently, which Amazon says it plans to expand to more than 95 delivery stations across North America by the end of 2026.
“Drivers are autonomous on the road, and the amount of variables that they interact with on a given day are countless,” said Anthony Mason, Amazon’s director of delivery training and programs, who walked me through the training demos. One goal of the training, he said, is to give drivers a toolkit to pull from when they face challenging situations.
Suffice it to say, this is not the job for me. But if Amazon’s smart glasses live up to the company’s expectations, they might be a step forward for the drivers doing the real work.
Tye Brady, chief technologist for Amazon Robotics, introduces “Project Eluna,” an AI model that assists operations teams, during Amazon’s Delivering the Future event in Milpitas, Calif. (GeekWire Photo / Todd Bishop)
SAN FRANCISCO — Amazon showed off its latest robotics and AI systems this week, presenting a vision of automation that it says will make warehouse and delivery work safer and smarter.
But the tech giant and some of the media at its Delivering the Future event were on different planets when it came to big questions about robots, jobs, and the future of human work.
The backdrop: On Tuesday, a day before the event, The New York Times cited internal Amazon documents and interviews to report that the company plans to automate as much as 75% of its operations by 2033. According to the report, the robotics team expects automation to “flatten Amazon’s hiring curve over the next 10 years,” allowing it to avoid hiring more than 600,000 workers even as sales continue to grow.
In a statement cited in the article, Amazon said the documents were incomplete and did not represent the company’s overall hiring strategy.
On stage at the event, Tye Brady, chief technologist for Amazon Robotics, introduced the company’s newest systems — Blue Jay, a setup that coordinates multiple robotic arms to pick, stow, and consolidate items; and Project Eluna, an agentic AI model that acts as a digital assistant for operations teams.
Later, he addressed the reporters in the room: “When you write about Blue Jay or you write about Project Eluna … I hope you remember that the real headline is not about robots. The real headline is about people, and the future of work we’re building together.”
Amazon’s new “Blue Jay” robotic system uses multiple coordinated arms to pick, stow, and consolidate packages inside a fulfillment center — part of the company’s next generation of warehouse automation. (Amazon Photo)
He said the benefits for employees are clear: Blue Jay handles repetitive lifting, while Project Eluna helps identify safety issues before they happen. By automating routine tasks, he said, AI frees employees to focus on higher-value work, supported by Amazon training programs.
Brady coupled that message with a reminder that no company has created more U.S. jobs over the past decade than Amazon, noting its plan to hire 250,000 seasonal workers this year.
His message to the company’s front-line employees: “These systems are not experiments. They’re real tools built for you, to make your job safer, smarter, and more rewarding.”
‘Menial, mundane, and repetitive’
Later, during a press conference, a reporter cited the New York Times report and asked Brady whether he believes Amazon’s workforce could shrink on the scale the paper described.
Brady didn’t answer the question directly, but described the premise as speculation, saying it’s impossible to predict what will happen a decade from now. He pointed instead to the past 10 years of Amazon’s robotics investments, saying the company has created hundreds of thousands of new jobs — including entirely new job types — while also improving safety.
He said Amazon’s focus is on augmenting workers, not replacing them, by designing machines that make jobs easier and safer. The company, he added, will continue using collaborative robotics to help achieve its broader mission of offering customers the widest selection at the lowest cost.
In an interview with GeekWire after the press conference, Brady said he sees the role of robotics as removing the “menial, mundane, and repetitive” tasks from warehouse jobs while amplifying what humans do best — reasoning, judgment, and common sense.
“Real leaders,” he added, “will lead with hope — hope that technology will do good for people.”
When asked whether the company’s goal was a “lights-out” warehouse with no people at all, Brady dismissed the idea. “There’s no such thing as 100 percent automation,” he said. “That doesn’t exist.”
Tye Brady, chief technologist for Amazon Robotics, speaks about the company’s latest warehouse automation and AI initiatives during the Delivering the Future event. (GeekWire Photo / Todd Bishop)
Instead, he emphasized designing machines with real utility — ones that improve safety, increase efficiency, and create new types of technical jobs in the process.
When pressed on whether Amazon is replacing human hands with robotic ones, Brady pushed back: “People are much more than hands,” he said. “You perceive the environment. You understand the environment. You know when to put things together. Like, people got it going on. It’s not replacing a hand. That’s not the right way to think of it. It’s augmenting the human brain.”
Brady pointed to Amazon’s new Shreveport, La., fulfillment center as an example, saying the highly automated facility processes orders faster than previous generations while also adding about 2,500 new roles that didn’t exist before.
“That’s not a net job killer,” he said. “It’s creating more job efficiency — and more jobs in different pockets.”
The New York Times report offered a different view of Shreveport’s impact on employment. Describing it as Amazon’s “most advanced warehouse” and a “template for future robotic fulfillment centers,” the article said the facility uses about 1,000 robots.
Citing internal documents, the Times reported that automation allowed Amazon to employ about 25% fewer workers last year than it would have without the new systems. As more robots are added next year, it added, the company expects the site to need roughly half as many workers as it would for similar volumes of items under previous methods.
Wall Street sees big savings
Analysts, meanwhile, are taking the potential impact seriously. A Morgan Stanley research note published Wednesday — the same day as Amazon’s event and in direct response to the Times report — said the newspaper’s projections align with the investment bank’s baseline analysis.
Rather than dismissing the report as speculative, Morgan Stanley’s Brian Nowak treated the article’s data points as credible. The analysts wrote that Amazon’s reported plan to build around 40 next-generation robotic warehouses by 2027 was “in line with our estimated slope of robotics warehouse deployment.”
More notably, Morgan Stanley put a multi-billion-dollar price tag on the efficiency gains. Its previous models estimated the rollout could generate $2 billion to $4 billion in annual savings by 2027. But using the Times’ figure — that Amazon expects to “avoid hiring 160,000+ U.S. warehouse employees by ’27” — the analysts recalculated that the savings could reach as much as $10 billion per year.
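For a sense of how those figures fit together, a back-of-envelope calculation is shown below; the cost-per-employee figure is an illustrative assumption on my part, not a number from the Times report or Morgan Stanley's note.

```python
# Back-of-envelope only; the cost-per-employee figure is an assumption for
# illustration, not a number from the Times report or Morgan Stanley's note.
avoided_hires = 160_000
annual_cost_per_employee = 62_500   # assumed fully loaded cost (wages + benefits + overhead), USD

annual_savings = avoided_hires * annual_cost_per_employee
print(f"${annual_savings / 1e9:.1f}B per year")   # -> $10.0B per year
```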
Back at the event, the specific language used by Amazon executives aligned closely with details in the Times report about the company’s internal communications strategy.
According to the Times, internal documents advised employees to avoid terms such as “automation” and “A.I.” and instead use collaborative language like “advanced technology” and “cobots” — short for collaborative robots — as part of a broader effort to “control the narrative” around automation and hiring.
On stage, Brady’s remarks closely mirrored that approach. He consistently framed Amazon’s robotics strategy as one of augmentation, not replacement, describing new systems as tools built for people.
In the follow-up interview, Brady said he disliked the term “artificial intelligence” altogether, preferring to refer to the technology simply as “machines.”
“Intelligence is ours,” he said. “Intelligence is very much a human thing.”
As 50,000 attendees descend on Salesforce's Dreamforce conference this week, the enterprise software giant is making its most aggressive bet yet on artificial intelligence agents, positioning itself as the antidote to what it calls an industry-wide "pilot purgatory" where 95% of enterprise AI projects never reach production.
The company on Monday launched Agentforce 360, a sweeping reimagination of its entire product portfolio designed to transform businesses into what it calls "agentic enterprises" — organizations where AI agents work alongside humans to handle up to 40% of work across sales, service, marketing, and operations.
"We are truly in the agentic AI era, and I think it's probably the biggest revolution, the biggest transition in technology I've ever experienced in my career," said Parker Harris, Salesforce's co-founder and chief technology officer, during a recent press briefing. "In the future, 40% of the work in the Fortune 1000 is probably going to be done by AI, and it's going to be humans and AI actually working together."
The announcement comes at a pivotal moment for Salesforce, which has deployed more than 12,000 AI agent implementations over the past year while building what Harris called a "$7 billion business" around its AI platform. Yet the launch also arrives amid unusual turbulence, as CEO Marc Benioff faces fierce backlash for recent comments supporting President Trump and suggesting National Guard troops should patrol San Francisco streets.
Why 95% of enterprise AI projects never launch
The stakes are enormous. While companies have rushed to experiment with AI following ChatGPT's emergence two years ago, most enterprise deployments have stalled before reaching production, according to recent MIT research that Salesforce executives cited extensively.
"Customers have invested a lot in AI, but they're not getting the value," said Srini Tallapragada, Salesforce's president and chief engineering and customer success officer. "95% of enterprise AI pilots fail before production. It's not because of lack of intent. People want to do this. Everybody understands the power of the technology. But why is it so hard?"
The answer, according to Tallapragada, is that AI tools remain disconnected from enterprise workflows, data, and governance systems. "You're writing prompts, prompts, you're getting frustrated because the context is not there," he said, describing what he called a "prompt doom loop."
Salesforce's solution is a deeply integrated platform connecting what it calls four ingredients: the Agentforce 360 agent platform, Data 360 for unified data access, Customer 360 apps containing business logic, and Slack as the "conversational interface" where humans and agents collaborate.
Slack becomes the front door to Salesforce
Perhaps the most significant strategic shift is the elevation of Slack — acquired by Salesforce in 2021 for $27.7 billion — as the primary interface for Salesforce itself. The company is effectively reimagining its traditional Lightning interface around Slack channels, where sales deals, service cases, and data insights will surface conversationally rather than through forms and dashboards.
"Imagine that you maybe don't log into Salesforce, you don't see Salesforce, but it's there. It's coming to you in Slack, because that's where you're getting your work done," Harris explained.
The strategy includes embedding Salesforce's Agentforce agents for sales, IT service, HR service, and analytics directly into Slack, alongside a completely rebuilt Slackbot that acts as a personal AI companion. The company is also launching "Channel Expert," an always-on agent that provides instant answers from channel conversations.
To enable third-party AI tools to access Slack's conversational data, Salesforce is releasing a Real-Time Search API and Model Context Protocol server. Partners including OpenAI, Anthropic, Google, Perplexity, Writer, Dropbox, Notion, and Cursor are building agents that will live natively in Slack.
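To illustrate what exposing conversational data to agents over the Model Context Protocol can look like, here is a minimal sketch using the open-source MCP Python SDK; the tool name, parameters, and stubbed search logic are assumptions, since the details of Salesforce's Real-Time Search API aren't published in this article.

```python
# Hypothetical sketch: an MCP server exposing a channel-search tool, built with
# the open-source MCP Python SDK (pip install "mcp[cli]"). The tool name, its
# parameters, and the stubbed search are illustrative assumptions, not
# Salesforce's actual Real-Time Search API.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("slack-search-demo")

@mcp.tool()
def search_channel_messages(channel: str, query: str, limit: int = 10) -> list[str]:
    """Return messages from a channel that match the query (stubbed here)."""
    # A real server would call the platform's search API; this stub keeps the sketch runnable.
    return [f"[{channel}] message mentioning '{query}' #{i}" for i in range(limit)]

if __name__ == "__main__":
    mcp.run()  # serves over stdio by default, so an MCP-capable agent can connect
```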
"The best way to see the power of the platform is through the AI apps and agents already being built," Rob Seaman, a Salesforce executive, said during a technical briefing, citing examples of startups "achieving tens of thousands of customers that have it installed in 120 days or less."
Voice and IT service take aim at new markets
Beyond Slack integration, Salesforce announced major expansions into voice-based interactions and employee service. Agentforce Voice, now generally available, transforms traditional IVR systems into natural conversations that can update CRM records, trigger workflows, and seamlessly hand off to human agents.
The IT Service offering represents Salesforce's most direct challenge to ServiceNow, the market leader. Mudhu Sudhakar, who joined Salesforce two months ago as senior vice president for IT and HR Service, positioned the product as a fundamental reimagining of employee support.
"Legacy IT service management is very portals, forms, tickets focused, manual process," Sudhakar said. "What we had a few key tenets: conversation first and agent first, really focused on having a conversational experience for the people requesting the support and for the people providing the support."
The IT Service platform includes what Salesforce describes as 25+ specialized agents and 100+ pre-built workflows and connectors that can handle everything from password resets to complex incident management.
Early customers report dramatic efficiency gains
Customer results suggest the approach is gaining traction. Reddit reduced average support resolution time from 8.9 minutes to 1.4 minutes — an 84% improvement — while deflecting 46% of cases entirely to AI agents. "This efficiency has allowed us to provide on-demand help for complex tasks and boost advertiser satisfaction scores by 20%," said John Thompson, Reddit's VP of sales strategy and operations, in a statement.
Engine, a travel management company, reduced average handle time by 15%, saving over $2 million annually. OpenTable resolved 70% of restaurant and diner inquiries autonomously. And 1-800Accountant achieved a 90% case deflection rate during the critical tax week period.
Salesforce's own internal deployments may be most telling. Tallapragada's customer success organization now handles 1.8 million AI-powered conversations weekly, with metrics published at help.salesforce.com showing how many agents answer versus escalating to humans.
Even more significantly, Salesforce has deployed AI-powered sales development representatives to follow up on leads that would previously have gone uncontacted due to cost constraints. "Now, Agentforce has an SDR which is doing thousands of leads following up," Tallapragada explained. The company also increased proactive customer outreach by 40% by shifting staff from reactive support.
The trust layer problem enterprises can't ignore
Given enterprise concerns about AI reliability, Salesforce has invested heavily in what it calls the "trust layer" — audit trails, compliance checks, and observability tools that let organizations monitor agent behavior at scale.
"You should think of an agent as a human. Digital labor. You need to manage performance just like a human. And you need these audit trails," Tallapragada explained.
The company encountered this challenge firsthand when its own agent deployment scaled. "When we started at Agentforce at Salesforce, we would track every message, which is great until 1,000, 3,000," Tallapragada said. "Once you have a million chats, there's no human, we cannot do it."
The platform now includes "Agentforce Grid" for searching across millions of conversations to identify and fix problematic patterns. The company also introduced Agent Script, a new scripting language that allows developers to define precise guardrails and deterministic controls for agent behavior.
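Agent Script itself isn't shown here, but as a generic illustration of the idea of deterministic guardrails paired with an audit trail, a sketch might look like the following; every name, rule, and threshold in it is hypothetical and is not Salesforce's implementation.

```python
# Generic illustration of deterministic guardrails plus an audit trail for an
# agent's tool calls. This is NOT Salesforce's Agent Script or Agentforce Grid;
# every name here is hypothetical.
import json
import time

ALLOWED_TOOLS = {"lookup_case", "draft_reply"}   # hard allowlist
MAX_REFUND_USD = 100                             # deterministic business rule

def guarded_tool_call(tool: str, args: dict, audit_log: list) -> dict:
    entry = {"ts": time.time(), "tool": tool, "args": args, "allowed": False}
    if tool not in ALLOWED_TOOLS:
        entry["reason"] = "tool not in allowlist"
    elif tool == "draft_reply" and args.get("refund_usd", 0) > MAX_REFUND_USD:
        entry["reason"] = "refund above limit; escalate to a human"
    else:
        entry["allowed"] = True
    audit_log.append(entry)                      # every decision is recorded
    if not entry["allowed"]:
        raise PermissionError(entry["reason"])
    return {"status": "ok"}                      # placeholder for the real tool call

log: list = []
try:
    guarded_tool_call("draft_reply", {"refund_usd": 250}, log)
except PermissionError as err:
    print("blocked:", err)
print(json.dumps(log, indent=2))                 # the audit trail reviewers would search
```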
Data infrastructure gets a major upgrade
Underlying the agent capabilities is significant infrastructure investment. Salesforce's Data 360 includes "Intelligent Context," which automatically extracts structured information from unstructured content like PDFs, diagrams, and flowcharts using what the company describes as "AI-powered unstructured data pipelines."
The company is also collaborating with Databricks, dbt Labs, and Snowflake on the "Universal Semantic Interchange," an attempt to standardize how different platforms define business metrics. The pending $8 billion acquisition of Informatica, expected to close soon, will expand metadata management capabilities across the enterprise.
The competitive landscape keeps intensifying
Salesforce's aggressive AI agent push comes as virtually every major enterprise software vendor pursues similar strategies. Microsoft has embedded Copilot across its product line, Google offers agent capabilities through Vertex AI and Gemini, and ServiceNow has launched its own agentic offerings.
When asked how Salesforce's announcement compared to OpenAI's recent releases, Tallapragada emphasized that customers will use multiple AI tools simultaneously. "Most of the time I'm seeing they're using OpenAI, they're using Gemini, they're using Anthropic, just like Salesforce, we use all three," he said.
The real differentiation, executives argued, lies not in the AI models but in the integration with business processes and data. Harris framed the competition in terms familiar from Salesforce's founding: "26 years ago, we just said, let's make Salesforce automation as easy as buying a book on Amazon.com. We're doing that same thing. We want to make agentic AI as easy as buying a book on Amazon."
The company's customer success stories are impressive but remain a small fraction of its customer base. With 150,000 Salesforce customers and one million Slack customers, the 12,000 Agentforce deployments represent roughly 8% penetration — strong for a one-year-old product line, but hardly ubiquitous.
The company's stock, down roughly 28% year to date with a Relative Strength rating of just 15, suggests investors remain skeptical. This week's Dreamforce demonstrations — and the months of customer deployments that follow — will begin to provide answers to whether Salesforce can finally move enterprise AI from pilots to production at scale, or whether the "$7 billion business" remains more aspiration than reality.
Anthropic launched a new capability on Thursday that allows its Claude AI assistant to tap into specialized expertise on demand, marking the company's latest effort to make artificial intelligence more practical for enterprise workflows as it chases rival OpenAI in the intensifying competition over AI-powered software development.
The feature, called Skills, enables users to create folders containing instructions, code scripts, and reference materials that Claude can automatically load when relevant to a task. The system marks a fundamental shift in how organizations can customize AI assistants, moving beyond one-off prompts to reusable packages of domain expertise that work consistently across an entire company.
"Skills are based on our belief and vision that as model intelligence continues to improve, we'll continue moving towards general-purpose agents that often have access to their own filesystem and computing environment," said Mahesh Murag, a member of Anthropic's technical staff, in an exclusive interview with VentureBeat. "The agent is initially made aware only of the names and descriptions of each available skill and can choose to load more information about a particular skill when relevant to the task at hand."
The launch comes as Anthropic, valued at $183 billion after a recent $13 billion funding round, projects its annual revenue could nearly triple to as much as $26 billion in 2026, according to a recent Reuters report. The company is currently approaching a $7 billion annual revenue run rate, up from $5 billion in August, fueled largely by enterprise adoption of its AI coding tools — a market where it faces fierce competition from OpenAI's recently upgraded Codex platform.
How 'progressive disclosure' solves the context window problem
Skills differ fundamentally from existing approaches to customizing AI assistants, such as prompt engineering or retrieval-augmented generation (RAG), Murag explained. The architecture relies on what Anthropic calls "progressive disclosure" — Claude initially sees only skill names and brief descriptions, then autonomously decides which skills to load based on the task at hand, accessing only the specific files and information needed at that moment.
"Unlike RAG, this relies on simple tools that let Claude manage and read files from a filesystem," Murag told VentureBeat. "Skills can contain an unbounded amount of context to teach Claude how to complete a task or series of tasks. This is because Skills are based on the premise of an agent being able to autonomously and intelligently navigate a filesystem and execute code."
This approach allows organizations to bundle far more information than traditional context windows permit, while maintaining the speed and efficiency that enterprise users demand. A single skill can include step-by-step procedures, code templates, reference documents, brand guidelines, compliance checklists, and executable scripts — all organized in a folder structure that Claude navigates intelligently.
The system's composability provides another technical advantage. Multiple skills automatically stack together when needed for complex workflows. For instance, Claude might simultaneously invoke a company's brand guidelines skill, a financial reporting skill, and a presentation formatting skill to generate a quarterly investor deck — coordinating between all three without manual intervention.
What makes Skills different from OpenAI's Custom GPTs and Microsoft's Copilot
Anthropic is positioning Skills as distinct from competing offerings like OpenAI's Custom GPTs and Microsoft's Copilot Studio, though the features address similar enterprise needs around AI customization and consistency.
"Skills' combination of progressive disclosure, composability, and executable code bundling is unique in the market," Murag said. "While other platforms require developers to build custom scaffolding, Skills let anyone — technical or not — create specialized agents by organizing procedural knowledge into files."
The cross-platform portability also sets Skills apart. The same skill works identically across Claude.ai, Claude Code (Anthropic's AI coding environment), the company's API, and the Claude Agent SDK for building custom AI agents. Organizations can develop a skill once and deploy it everywhere their teams use Claude, a significant advantage for enterprises seeking consistency.
The feature supports any programming language compatible with the underlying container environment, and Anthropic provides sandboxing for security — though the company acknowledges that allowing AI to execute code requires users to carefully vet which skills they trust.
Early customers report 8x productivity gains on finance workflows
Early customer implementations reveal how organizations are applying Skills to automate complex knowledge work. At Japanese e-commerce giant Rakuten, the AI team is using Skills to transform finance operations that previously required manual coordination across multiple departments.
"Skills streamline our management accounting and finance workflows," said Yusuke Kaji, general manager of AI at Rakuten in a statement. "Claude processes multiple spreadsheets, catches critical anomalies, and generates reports using our procedures. What once took a day, we can now accomplish in an hour."
That's an 8x improvement in productivity for specific workflows — the kind of measurable return on investment that enterprises increasingly demand from AI implementations. Mike Krieger, Anthropic's chief product officer and Instagram co-founder, recently noted that companies have moved past "AI FOMO" to requiring concrete success metrics.
Design platform Canva plans to integrate Skills into its own AI agent workflows. "Canva plans to leverage Skills to customize agents and expand what they can do," said Anwar Haneef, general manager and head of ecosystem at Canva in a statement. "This unlocks new ways to bring Canva deeper into agentic workflows—helping teams capture their unique context and create stunning, high-quality designs effortlessly."
Cloud storage provider Box sees Skills as a way to make corporate content repositories more actionable. "Skills teaches Claude how to work with Box content," said Yashodha Bhavnani, head of AI at Box. "Users can transform stored files into PowerPoint presentations, Excel spreadsheets, and Word documents that follow their organization's standards—saving hours of effort."
The enterprise security question: Who controls which AI skills employees can use?
For enterprise IT departments, Skills raise important questions about governance and control—particularly since the feature allows AI to execute arbitrary code in sandboxed environments. Anthropic has built administrative controls that allow enterprise customers to manage access at the organizational level.
"Enterprise admins control access to the Skills capability via admin settings, where they can enable or disable access and monitor usage patterns," Murag said. "Once enabled at the organizational level, individual users still need to opt in."
That two-layer consent model — organizational enablement plus individual opt-in — reflects lessons learned from previous enterprise AI deployments where blanket rollouts created compliance concerns. However, Anthropic's governance tools appear more limited than some enterprise customers might expect. The company doesn't currently offer granular controls over which specific skills employees can use, or detailed audit trails of custom skill content.
Organizations concerned about data security should note that Skills require Claude's code execution environment, which runs in isolated containers. Anthropic advises users to "stick to trusted sources" when installing skills and provides security documentation, but the company acknowledges this is an inherently higher-risk capability than traditional AI interactions.
From API to no-code: How Anthropic is making Skills accessible to everyone
Anthropic is taking several approaches to make Skills accessible to users with varying technical sophistication. For non-technical users on Claude.ai, the company provides a "skill-creator" skill that interactively guides users through building new skills by asking questions about their workflow, then automatically generating the folder structure and documentation.
Developers working with Anthropic's API get programmatic control through a new /skills endpoint and can manage skill versions through the Claude Console web interface. The feature requires enabling the Code Execution Tool beta in API requests. For Claude Code users, skills can be installed via plugins from the anthropics/skills GitHub marketplace, and teams can share skills through version control systems.
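Based only on the article's mention of a new /skills endpoint, a hedged sketch of listing skills over the API might look like the following; the exact path, headers, and response shape are assumptions to check against Anthropic's documentation, and the standard authentication headers shown are the only parts I'm confident about.

```python
# Hedged sketch only: the exact /skills path and the response shape below are
# assumptions based on the article, not documented facts. Verify against
# Anthropic's API reference before relying on any of this.
import os
import requests

resp = requests.get(
    "https://api.anthropic.com/v1/skills",             # assumed path for the new endpoint
    headers={
        "x-api-key": os.environ["ANTHROPIC_API_KEY"],  # standard Anthropic auth header
        "anthropic-version": "2023-06-01",             # standard version header
    },
    timeout=30,
)
resp.raise_for_status()
for skill in resp.json().get("data", []):              # response shape assumed
    print(skill.get("name"), "-", skill.get("description"))
```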
"Skills are included in Max, Pro, Teams, and Enterprise plans at no additional cost," Murag confirmed. "API usage follows standard API pricing," meaning organizations pay only for the tokens consumed during skill execution, not for the skills themselves.
Anthropic provides several pre-built skills for common business tasks, including professional generation of Excel spreadsheets with formulas, PowerPoint presentations, Word documents, and fillable PDFs. These Anthropic-created skills will remain free.
Why the Skills launch matters in the AI coding wars with OpenAI
The Skills announcement arrives during a pivotal moment in Anthropic's competition with OpenAI, particularly around AI-assisted software development. Just one day before releasing Skills, Anthropic launched Claude Haiku 4.5, a smaller and cheaper model that nonetheless matches the coding performance of Claude Sonnet 4 — which was state-of-the-art when released just five months ago.
That rapid improvement curve reflects the breakneck pace of AI development, where today's frontier capabilities become tomorrow's commodity offerings. OpenAI has been pushing hard on coding tools as well, recently upgrading its Codex platform with GPT-5 and expanding GitHub Copilot's capabilities.
Anthropic's revenue trajectory — potentially reaching $26 billion in 2026 from an estimated $9 billion by year-end 2025 — suggests the company is successfully converting enterprise interest into paying customers. The timing also follows Salesforce's announcement this week that it's deepening AI partnerships with both OpenAI and Anthropic to power its Agentforce platform, signaling that enterprises are adopting a multi-vendor approach rather than standardizing on a single provider.
Skills addresses a real pain point: the "prompt engineering" problem where effective AI usage depends on individual employees crafting elaborate instructions for routine tasks, with no way to share that expertise across teams. Skills transforms implicit knowledge into explicit, shareable assets. For startups and developers, the feature could accelerate product development significantly — adding sophisticated document generation capabilities that previously required dedicated engineering teams and weeks of development.
The composability aspect hints at a future where organizations build libraries of specialized skills that can be mixed and matched for increasingly complex workflows. A pharmaceutical company might develop skills for regulatory compliance, clinical trial analysis, molecular modeling, and patient data privacy that work together seamlessly — creating a customized AI assistant with deep domain expertise across multiple specialties.
Anthropic indicates it's working on simplified skill creation workflows and enterprise-wide deployment capabilities to make it easier for organizations to distribute skills across large teams. As the feature rolls out to Anthropic's more than 300,000 business customers, the true test will be whether organizations find Skills substantively more useful than existing customization approaches.
For now, Skills offers Anthropic's clearest articulation yet of its vision for AI agents: not generalists that try to do everything reasonably well, but intelligent systems that know when to access specialized expertise and can coordinate multiple domains of knowledge to accomplish complex tasks. If that vision catches on, the question won't be whether your company uses AI — it will be whether your AI knows how your company actually works.
One year after emerging from stealth, Strella has raised $14 million in Series A funding to expand its AI-powered customer research platform, the company announced Thursday. The round, led by Bessemer Venture Partners with participation from Decibel Partners, Bain Future Back Ventures, MVP Ventures and 645 Ventures, comes as enterprises increasingly turn to artificial intelligence to understand customers faster and more deeply than traditional methods allow.
The investment marks a sharp acceleration for the startup founded by Lydia Hylton and Priya Krishnan, two former consultants and product managers who watched companies struggle with a customer research process that could take eight weeks from start to finish. Since October, Strella has grown revenue tenfold, quadrupled its customer base to more than 40 paying enterprises, and tripled its average contract values by moving upmarket to serve Fortune 500 companies.
"Research tends to be bookended by two very strategic steps: first, we have a problem—what research should we do? And second, we've done the research—now what are we going to do with it?" said Hylton, Strella's CEO, in an exclusive interview with VentureBeat. "All the stuff in the middle tends to be execution and lower-skill work. We view Strella as doing that middle 90% of the work."
The platform now serves Amazon, Duolingo, Apollo GraphQL, and Chobani, collectively conducting thousands of AI-moderated interviews that deliver what the company claims is a 90% average time savings on manual research work. The company is approaching $1 million in revenue after beginning monetization only in January, with month-over-month growth of 50% and zero customer churn to date.
How AI-powered interviews compress eight-week research projects into days
Strella's technology addresses a workflow that has frustrated product teams, marketers, and designers for decades. Traditional customer research requires writing interview guides, recruiting participants, scheduling calls, conducting interviews, taking notes, synthesizing findings, and creating presentations — a process that consumes weeks of highly-skilled labor and often delays critical product decisions.
The platform compresses that timeline to days by using AI to moderate voice-based interviews that run like Zoom calls, but with an artificial intelligence agent asking questions, following up on interesting responses, and detecting when participants are being evasive or fraudulent. The system then synthesizes findings automatically, creating highlight reels and charts from unstructured qualitative data.
"It used to take eight weeks. Now you can do it in the span of a couple days," Hylton told VentureBeat. "The primary technology is through an AI-moderated interview. It's like being in a Zoom call with an AI instead of a human — it's completely free form and voice based."
Critically, the platform also supports human moderators joining the same calls, reflecting the founders’ belief that humans won’t disappear from the research process. “Human moderation won’t go away, which is why we’ve supported human moderation from our genesis,” Hylton said, returning to her point that Strella handles the execution-heavy middle of a project while humans own the strategic bookends.
Why customers tell AI moderators the truth they won't share with humans
One of Strella's most surprising findings challenges assumptions about AI in qualitative research: participants appear more honest with AI moderators than with humans. The founders discovered this pattern repeatedly as customers ran head-to-head comparisons between traditional human-moderated studies and Strella's AI approach.
"If you're a designer and you get on a Zoom call with a customer and you say, 'Do you like my design?' they're always gonna say yes. They don't want to hurt your feelings," Hylton explained. "But it's not a problem at all for Strella. They would tell you exactly what they think about it, which is really valuable. It's very hard to get honest feedback."
Krishnan, Strella's COO, said companies initially worried about using AI and "eroding quality," but the platform has "actually found the opposite to be true. People are much more open and honest with an AI moderator, and so the level of insight that you get is much richer because people are giving their unfiltered feedback."
This dynamic has practical business implications. Brian Santiago, Senior Product Design Manager at Apollo GraphQL, said in a statement: "Before Strella, studies took weeks. Now we get insights in a day — sometimes in just a few hours. And because participants open up more with the AI moderator, the feedback is deeper and more honest."
The platform also addresses endemic fraud in online surveys, particularly when participants are compensated. Because Strella interviews happen on camera in real time, the AI moderator can detect when someone pauses suspiciously long — perhaps to consult ChatGPT — and flags them as potentially fraudulent. "We are fraud resistant," Hylton said, contrasting this with traditional surveys where fraud rates can be substantial.
Solving mobile app research with persistent screen sharing technology
A major focus of the Series A funding will be expanding Strella's recently-launched mobile application, which Krishnan identified as critical competitive differentiation. The mobile app enables persistent screen sharing during interviews — allowing researchers to watch users navigate mobile applications in real time while the AI moderator asks about their experience.
"We are the only player in the market that supports screen sharing on mobile," Hylton said. "You know, I want to understand what are the pain points with my app? Why do people not seem to be able to find the checkout flow? Well, in order to do that effectively, you'd like to see the user screen while they're doing an interview."
For consumer-facing companies where mobile represents the primary customer interface, this capability opens entirely new use cases. The founders noted that "several of our customers didn't do research before" but have now built research practices around Strella because the platform finally made mobile research accessible at scale.
The platform also supports embedding traditional survey question types directly into the conversational interview, approaching what Hylton called "feature parity with a survey" while maintaining the engagement advantages of a natural conversation. Strella interviews regularly run 60 to 90 minutes with nearly 100% completion rates—a duration that would see 60-70% drop-off in a traditional survey format.
How Strella differentiated in a market crowded with AI research startups
Strella enters a market that appears crowded at first glance, with established players like Qualtrics and a wave of AI-powered startups promising to transform customer research. The founders themselves initially pursued a different approach — synthetic respondents, or "digital twins" that simulate customer perspectives using large language models.
"We actually pivoted from that. That was our initial idea," Hylton revealed, referring to synthetic respondents. "People are very intrigued by that concept, but found in practice, no willingness to pay right now."
Recent research suggesting companies could use language models as digital twins for customer feedback has reignited interest in that approach. But Hylton remains skeptical: "The capabilities of the LLMs as they are today are not good enough, in my opinion, to justify a standalone company. Right now you could just ask ChatGPT, 'What would new users of Duolingo think about this ad copy?' You can do that. Adding the standalone idea of a synthetic panel is sort of just putting a wrapper on that."
Instead, Strella's bet is that the real value lies in collecting proprietary qualitative data at scale — building what could become "the system of truth for all qualitative insights" within enterprises, as Lindsey Li, Vice President at Bessemer Venture Partners, described it.
Li, who led the investment just one year after Strella emerged from stealth, said the firm was convinced by both the technology and the team. "Strella has built highly differentiated technology that enables a continuous interview rather than a survey," Li said. "We heard time and time again that customers loved this product experience relative to other offerings."
On the defensibility question that concerns many AI investors, Li emphasized product execution over patents: "We think the long game here will be won with a million small product decisions, all of which must be driven by deep empathy for customer pain and an understanding of how best to address their needs. Lydia and Priya exhibit that in spades."
The founders point to technical depth that's difficult to replicate. Most competitors started with adaptive surveys — text-based interfaces where users type responses and wait for the next question. Some have added voice, but typically as uploaded audio clips rather than free-flowing conversation.
"Our approach is fundamentally better, which is the fact that it is a free form conversation," Hylton said. "You never have to control anything. You're never typing, there's no buttons, there's no upload and wait for the next question. It's completely free form, and that has been an extraordinarily hard product to build. There's a tremendous amount of IP in the way that we prompt our moderator, the way that we run analysis."
The platform also improves with use, learning from each customer's research patterns to fine-tune future interview guides and questions. "Our product gets better for our customers as they continue to use us," Hylton said. All research accumulates in a central repository where teams can generate new insights by chatting with the data or creating visualizations from previously unstructured qualitative feedback.
Creating new research budgets instead of just automating existing ones
Perhaps more important than displacing existing research is expanding the total market. Krishnan said growth has been "fundamentally related to our product" creating new research that wouldn't have happened otherwise.
"We have expanded the use cases in which people would conduct research," Krishnan explained. "Several of our customers didn't do research before, have always wanted to do research, but didn't have a dedicated researcher or team at their company that was devoted to it, and have purchased Strella to kick off and enable their research practice. That's been really cool where we've seen this market just opening up."
This expansion comes as enterprises face mounting pressure to improve customer experience amid declining satisfaction scores. According to Forrester Research's 2024 Customer Experience Index, customer experience quality has declined for three consecutive years — an unprecedented trend. The report found that 39% of brands saw CX quality deteriorate, with declines across effectiveness, ease, and emotional connection.
Meanwhile, Deloitte's 2025 Technology, Media & Telecommunications Predictions report forecasts that 25% of enterprises using generative AI will deploy AI agents by 2025, growing to 50% by 2027. The report specifically highlighted AI's potential to enhance customer satisfaction by 15-20% while reducing cost to serve by 20-30% when properly implemented.
Gartner identified conversational user interfaces — the category Strella inhabits — as one of three technologies poised to transform customer service by 2028, noting that "customers increasingly expect to be able to interact with the applications they use in a natural way."
Against this backdrop, Li sees substantial room for growth. "UX Research is a sub-sector of the $140B+ global market-research industry," Li said. "This includes both the software layer historically (~$430M) and professional services spend on UX research, design, product strategy, etc. which is conservatively estimated to be ~$6.4B+ annually. As software in this vertical, led by Strella, becomes more powerful, we believe the TAM will continue to expand meaningfully."
Making customer feedback accessible across the enterprise, not just research teams
The founders describe their mission as "democratizing access to the customer" — making it possible for anyone in an organization to understand customer perspectives without waiting for dedicated research teams to complete months-long studies.
"Many, many, many positions in the organization would like to get customer feedback, but it's so hard right now," Hylton said. With Strella, she explained, someone can "log into Strella and through a chat, create any highlight reel that you want and actually see customers in their own words answering the question that you have based on the research that's already been done."
This video-first approach to research repositories changes organizational dynamics around customer feedback. "Then you can say, 'Okay, engineering team, we need to build this feature. And here's the customer actually saying it,'" Hylton continued. "'This is not me. This isn't politics. Here are seven customers saying they can't find the Checkout button.' The fact that we are a very video-based platform really allows us to do that quickly and painlessly."
The company has moved decisively upmarket, with contract values now typically in the five-figure range and "several six figure contracts" signed, according to Krishnan. The pricing strategy reflects a premium positioning: "Our product is very good, it's very premium. We're charging based on the value it provides to customers," Krishnan said, rather than competing on cost alone.
This approach appears to be working. The company reports 100% conversion from pilot programs to paid contracts and zero churn among its 40-45 customers, with month-over-month revenue growth of 50%.
The roadmap: Computer vision, agentic AI, and human-machine collaboration
The Series A funding will primarily support scaling product and go-to-market teams. "We're really confident that we have product-market fit," Hylton said. "And now the question is execution, and we want to hire a lot of really talented people to help us execute."
On the product roadmap, Hylton emphasized continued focus on the participant experience as the key to winning the market. "Everything else is downstream of a joyful participant experience," she said, including "the quality of insights, the amount you have to pay people to do the interviews, and the way that your customers feel about a company."
Near-term priorities include adding visual capabilities so the AI moderator can respond to facial expressions and other nonverbal cues, and building more sophisticated collaboration features between human researchers and AI moderators. "Maybe you want to listen while an AI moderator is running a call and you might want to be able to jump in with specific questions," Hylton said. "Or you want to run an interview yourself, but you want the moderator to be there as backup or to help you."
These features move toward what the industry calls "agentic AI" — systems that can act more autonomously while still collaborating with humans. The founders see this human-AI collaboration, rather than full automation, as the sustainable path forward.
"We believe that a lot of the really strategic work that companies do will continue to be human moderated," Hylton said. "And you can still do that through Strella and just use us for synthesis in those cases."
For Li and Bessemer, the bet is on founders who understand this nuance. "Lydia and Priya exhibit the exact archetype of founders we are excited to partner with for the long term — customer-obsessed, transparent, thoughtful, and singularly driven towards the home-run scenario," she said.
As Strella scales, the founders remain focused on a vision where technology enhances rather than eliminates human judgment—where an engineering team doesn't just read a research report, but watches seven customers struggle to find the same button. Where a product manager can query months of accumulated interviews in seconds. Where companies don't choose between speed and depth, but get both.
"The interesting part of the business is actually collecting that proprietary dataset, collecting qualitative research at scale," Hylton said, describing what she sees as Strella's long-term moat. Not replacing the researcher, but making everyone in the company one.
Microsoft is fundamentally reimagining how people interact with their computers, announcing Thursday a sweeping transformation of Windows 11 that brings voice-activated AI assistants, autonomous software agents, and contextual intelligence to every PC running the operating system — not just premium devices with specialized chips.
The announcement represents Microsoft's most aggressive push yet to integrate generative artificial intelligence into the desktop computing experience, moving beyond the chatbot interfaces that have defined the first wave of consumer AI products toward a more ambient, conversational model where users can simply talk to their computers and have AI agents complete complex tasks on their behalf.
"When we think about what the promise of an AI PC is, it should be capable of three things," Yusuf Mehdi, Microsoft's Executive Vice President and Consumer Chief Marketing Officer, told reporters at a press conference last week. "First, you should be able to interact with it naturally, in text or voice, and have it understand you. Second, it should be able to see what you see and be able to offer guided support. And third, it should be able to take action on your behalf."
The shift could prove consequential for an industry searching for the "killer app" for generative AI. While hundreds of millions of people have experimented with ChatGPT and similar chatbots, integrating AI directly into the operating system that powers the vast majority of workplace computers could dramatically accelerate mainstream adoption — or create new security and privacy headaches for organizations already struggling to govern employee use of AI tools.
How 'Hey Copilot' aims to replace typing with talking on Windows PCs
At the heart of Microsoft's vision is voice interaction, which the company is positioning as the third fundamental input method for PCs after the mouse and keyboard — a comparison that underscores Microsoft's ambitions for reshaping human-computer interaction nearly four decades after the graphical user interface became standard.
Starting this week, any Windows 11 user can enable the "Hey Copilot" wake word with a single click, allowing them to summon Microsoft's AI assistant by voice from anywhere in the operating system. The feature, which had been in limited testing, is now being rolled out to hundreds of millions of devices globally.
"It's been almost four decades since the PC has changed the way you interact with it, which is primarily mouse and keyboard," Mehdi said. "When you think about it, we find that people type on a given day up to 14,000 words on their keyboard, which is really kind of mind-boggling. But what if now you can go beyond that and talk to it?"
The emphasis on voice reflects internal Microsoft data showing that users engage with Copilot twice as much when using voice compared to text input — a finding the company attributes to the lower cognitive barrier of speaking versus crafting precise written prompts.
"The magic unlock with Copilot Voice and Copilot Vision is the ease of interaction," according to the company's announcement. "Using the new wake word, 'Hey Copilot,' getting something done is as easy as just asking for it."
But Microsoft's bet on voice computing faces real-world constraints that Mehdi acknowledged during the briefing. When asked whether workers in shared office environments would use voice features, potentially compromising privacy, Mehdi noted that millions already conduct voice calls through their PCs with headphones, and predicted users would adapt: "Just like when the mouse came out, people have to figure out when to use it, what's the right way, how to make it happen."
Crucially, Microsoft is hedging its voice-first strategy by making all features accessible through traditional text input as well, recognizing that voice isn't always appropriate or accessible.
AI that sees your screen: Copilot Vision expands worldwide with new capabilities
Perhaps more transformative than voice control is the expansion of Copilot Vision, a feature Microsoft introduced earlier this year that allows the AI to analyze what's displayed on a user's screen and provide contextual assistance.
Previously limited to voice interaction, Copilot Vision is now rolling out worldwide with a new text-based interface, allowing users to type questions about what they're viewing rather than speaking them aloud. The feature can now access full document context in Microsoft Office applications — meaning it can analyze an entire PowerPoint presentation or Excel spreadsheet without the user needing to scroll through every page.
"With 68 percent of consumers reporting using AI to support their decision making, voice is making this easier," Microsoft explained in its announcement. "The magic unlock with Copilot Voice and Copilot Vision is the ease of interaction."
During the press briefing, Microsoft demonstrated Copilot Vision helping users navigate Spotify's settings to enable lossless audio streaming, coaching an artist through writing a professional bio based on their visual portfolio, and providing shopping recommendations based on products visible in YouTube videos.
"What brings AI to life is when you can give it rich context, when you can type great prompts," Mehdi explained. "The big challenge for the majority of people is we've been trained with search to do the opposite. We've been trained to essentially type in fewer keywords, because it turns out the less keywords you type on search, the better your answers are."
He noted that average search queries remain just 2.3 keywords, while AI systems perform better with detailed prompts — creating a disconnect between user habits and AI capabilities. Copilot Vision aims to bridge that gap by automatically gathering visual context.
"With Copilot Vision, you can simply share your screen and Copilot in literally milliseconds can understand everything on the screen and then provide intelligence," Mehdi said.
The vision capabilities work with any application without requiring developers to build specific integrations, using computer vision to interpret on-screen content — a powerful capability that also raises questions about what the AI can access and when.
Software robots take control: Inside Copilot Actions' controversial autonomy
The most ambitious—and potentially controversial—new capability is Copilot Actions, an experimental feature that allows AI to take control of a user's computer to complete tasks autonomously.
Coming first to Windows Insiders enrolled in Copilot Labs, the feature builds on Microsoft's May announcement of Copilot Actions on the web, extending the capability to manipulate local files and applications on Windows PCs.
During demonstrations, Microsoft showed the AI agent organizing photo libraries, extracting data from documents, and working through multi-step tasks while users attended to other work. The agent operates in a separate, sandboxed environment and provides running commentary on its actions, with users able to take control at any time.
"As a general-purpose agent — simply describe the task you want to complete in your own words, and the agent will attempt to complete it by interacting with desktop and web applications," according to the announcement. "While this is happening, you can choose to focus on other tasks. At any time, you can take over the task or check in on the progress of the action, including reviewing what actions have been taken."
Navjot Virk, Microsoft's Windows Experience Leader, acknowledged the technology's current limitations during the briefing. "We'll be starting with a narrow set of use cases while we optimize model performance and learn," Virk said. "You may see the agent make mistakes or encounter challenges with complex interfaces, which is why real-world testing of this experience is so critical."
The experimental nature of Copilot Actions reflects broader industry challenges with agentic AI — systems that can take actions rather than simply providing information. While the potential productivity gains are substantial, AI systems still occasionally "hallucinate" incorrect information and can be vulnerable to novel attacks.
Can AI agents be trusted? Microsoft's new security framework explained
Recognizing the security implications of giving AI control over users' computers and files, Microsoft introduced a new security framework built on four core principles: user control, operational transparency, limited privileges, and privacy-preserving design.
Central to this approach is the concept of "agent accounts" — separate Windows user accounts under which AI agents operate, distinct from the human user's account. Combined with a new "agent workspace" that provides a sandboxed desktop environment, the architecture aims to create clear boundaries around what agents can access and modify.
Peter Waxman, Microsoft's Windows Security Engineering Leader, emphasized that Copilot Actions is disabled by default and requires explicit user opt-in. "You're always in control of what Copilot Actions can do," Waxman said. "Copilot Actions is turned off by default and you're able to pause, take control, or disable it at any time."
During operation, users can monitor the agent's progress in real-time, and the system requests additional approval before taking "sensitive or important" actions. All agent activity occurs under the dedicated agent account, creating an audit trail that distinguishes AI actions from human ones.
However, the agent will have default access to users' Documents, Downloads, Desktop, and Pictures folders—a broad permission grant that could concern enterprise IT administrators.
Dana Huang, Corporate Vice President for Windows Security, acknowledged in a blog post that "agentic AI applications introduce novel security risks, such as cross-prompt injection (XPIA), where malicious content embedded in UI elements or documents can override agent instructions, leading to unintended actions like data exfiltration or malware installation."
Microsoft promises more details about enterprise controls at its Ignite conference in November.
Gaming, taskbar redesign, and deeper Office integration round out updates
Beyond voice and autonomous agents, Microsoft introduced changes across Windows 11's core interfaces and extended AI to new domains.
A new "Ask Copilot" feature integrates AI directly into the Windows taskbar, providing one-click access to start conversations, activate vision capabilities, or search for files and settings with "lightning-fast" results. The opt-in feature doesn't replace traditional Windows search.
File Explorer gains AI capabilities through integration with third-party services. A partnership with Manus AI allows users to right-click on local image files and generate complete websites without manual uploading or coding. Integration with Filmora enables quick jumps into video editing workflows.
Microsoft also introduced Copilot Connectors, allowing users to link cloud services like OneDrive, Outlook, Google Drive, Gmail, and Google Calendar directly to Copilot on Windows. Once connected, users can query personal content across platforms using natural language.
In a notable expansion beyond productivity, Microsoft and Xbox introduced Gaming Copilot for the ROG Xbox Ally handheld gaming devices developed with ASUS. The feature, accessible via a dedicated hardware button, provides an AI assistant that can answer gameplay questions, offer strategic advice, and help navigate game interfaces through natural voice conversation.
Why Microsoft is racing to embed AI everywhere before Apple and Google
Microsoft's announcement comes as technology giants race to embed generative AI into their core products following the November 2022 launch of ChatGPT. While Microsoft moved quickly to integrate OpenAI's technology into Bing search and introduce Copilot across its product line, the company has faced questions about whether AI features are driving meaningful engagement. Recent data shows Bing's search market share remaining largely flat despite AI integration.
The Windows integration represents a different approach: rather than charging separately for AI features, Microsoft is building them into the operating system itself, betting that embedded AI will drive Windows 11 adoption and competitive differentiation against Apple and Google.
Apple has taken a more cautious approach with Apple Intelligence, introducing AI features gradually and emphasizing privacy through on-device processing. Google has integrated AI across its services but has faced challenges with accuracy and reliability.
Crucially, while Microsoft highlighted new Copilot+ PC models from partners with prices ranging from $649.99 to $1,499.99, the core AI features announced today work on any Windows 11 PC — a significant departure from earlier positioning that suggested AI capabilities required new hardware with specialized neural processing units.
"Everything we showed you here is for all Windows 11 PCs. You don't need to run it on a copilot plus PC. It works on any Windows 11 PC," Mehdi clarified.
This democratization of AI features across the Windows 11 installed base potentially accelerates adoption but also complicates Microsoft's hardware sales pitch for premium devices.
What Microsoft's AI bet means for the future of computing
Mehdi framed the announcement in sweeping terms, describing Microsoft's goal as fundamentally reimagining the operating system for the AI era.
"We're taking kind of a bold view of it. We really feel that the vision that we have is, let's rewrite the entire operating system around AI and build essentially what becomes truly the AI PC," he said.
For Microsoft, the success of AI-powered Windows 11 could help drive the company's next phase of growth as PC sales have matured and cloud growth faces increased competition.
For users and organizations, the announcement represents a potential inflection point in how humans interact with computers — one that could significantly boost productivity if executed well, or create new security headaches if the AI proves unreliable or difficult to control.
The technology industry will be watching closely to see whether Microsoft's bet on conversational computing and agentic AI marks the beginning of a genuine paradigm shift, or proves to be another ambitious interface reimagining that fails to gain mainstream traction.
What's clear is that Microsoft is moving aggressively to stake its claim as the leader in AI-powered personal computing, leveraging its dominant position in desktop operating systems to bring generative AI directly into the daily workflows of potentially a billion users.
Copilot Voice and Vision are available today to Windows 11 users worldwide, with experimental capabilities coming to Windows Insiders in the coming weeks.
The Dfinity Foundation on Wednesday released Caffeine, an artificial intelligence platform that allows users to build and deploy web applications through natural language conversation alone, bypassing traditional coding entirely. The system, now publicly available, represents a fundamental departure from existing AI coding assistants by building applications on a specialized decentralized infrastructure designed specifically for autonomous AI development.
Unlike GitHub Copilot, Cursor, or other "vibe coding" tools that help human developers write code faster, Caffeine positions itself as a complete replacement for technical teams. Users describe what they want in plain language, and an ensemble of AI models writes, deploys, and continually updates production-grade applications — with no human intervention in the codebase itself.
"In the future, you as a prospective app owner or service owner… will talk to AI. AI will give you what you want on a URL," said Dominic Williams, founder and chief scientist at the Dfinity Foundation, in an exclusive interview with VentureBeat. "You will use that, completely interact productively, and you'll just keep talking to AI to evolve what that does. The AI, or an ensemble of AIs, will be your tech team."
The platform has attracted significant early interest: more than 15,000 alpha users tested Caffeine before its public release, with daily active users representing 26% of those who received access codes — "early Facebook kind of levels," according to Williams. The foundation reports some users spending entire days building applications on the platform, forcing Dfinity to consider usage limits due to underlying AI infrastructure costs.
Why Caffeine's custom programming language guarantees your data won't disappear
Caffeine's most significant technical claim addresses a problem that has plagued AI-generated code: data loss during application updates. The platform builds applications using Motoko, a programming language developed by Dfinity specifically for AI use, which provides mathematical guarantees that upgrades cannot accidentally delete user data.
"When AI is updating apps and services in production, a mistake cannot lose data. That's a guarantee," Williams said. "It's not like there are some safeguards to try and stop it losing data. This language framework gives it rails that guarantee if an upgrade, an update to its app's underlying logic, would cause data loss, the upgrade fails and the AI just tries again."
This addresses what Williams characterizes as critical failures in competing platforms. User forums for tools like Lovable and Replit, he notes, frequently report three major problems: applications that become irreparably broken as complexity increases, security vulnerabilities that allow unauthorized access, and mysterious data loss during updates.
Traditional tech stacks evolved to meet human developer needs — familiarity with SQL databases, preference for known programming languages, existing skill investments. "That's how the traditional tech stacks evolved. It's really evolved to meet human needs," Williams explained. "But in the future, it's going to be different. You're not going to care how the AI did it. Instead, for you, AI is the tech stack."
Caffeine's architecture reflects this philosophy. Applications run entirely on the Internet Computer Protocol (ICP), a blockchain-based network that Dfinity launched in May 2021 after raising over $100 million from investors including Andreessen Horowitz and Polychain Capital. The ICP uses what Dfinity calls "chain-key cryptography" to create what Williams describes as "tamper-proof" code — applications that are mathematically guaranteed to execute their written logic without interference from traditional cyberattacks.
"The code can't be affected by ransomware, so you don't have to worry about malware in the same way you do," Williams said. "Configuration errors don't result in traditional cyber attacks. That passive traditional cyber attacks isn't something you need to worry about."
How 'orthogonal persistence' lets AI build apps without managing databases
At the heart of Caffeine's technical approach is a concept called "orthogonal persistence," which fundamentally reimagines how applications store and manage data. In traditional development, programmers must write extensive code to move data between application logic and separate database systems — marshaling data in and out of SQL servers, managing connections, handling synchronization.
Motoko eliminates this entirely. Williams demonstrated with a simple example: defining a blog post data type and declaring a variable to store an array of posts requires just two lines of code. "This declaration is all that's necessary to have the blog maintain its list of posts," he explained during a presentation on the technology. "Compare that to traditional IT where in order to persist the blog posts, you'd have to marshal them in and out of a database server. This is quite literally orders of magnitude more simple."
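To make the "two lines" concrete, here is a minimal hand-written Motoko sketch of the pattern Williams describes; the actor and field names are illustrative, not Caffeine's actual generated output. The `stable` keyword is what delivers orthogonal persistence: the `posts` array lives in the canister's stable memory and survives upgrades without any database or marshaling code.

```motoko
actor Blog {
  // One line to describe a post...
  type Post = { title : Text; body : Text };

  // ...and one line to persist every post across upgrades.
  // No SQL server, no connection handling, no (de)serialization:
  // the runtime keeps `posts` in stable canister memory.
  stable var posts : [Post] = [];

  // Reading the data back is an ordinary query on the same variable.
  public query func getPosts() : async [Post] {
    posts
  };
}
```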
This abstraction allows AI to work at a higher conceptual level, focusing on application logic rather than infrastructure plumbing. "Logic and data are kind of the same," Williams said. "This is one of the things that enables AI to build far more complicated functionality than it could otherwise do."
The system also employs what Dfinity calls "loss-safe data migration." When AI needs to modify an application's data structure — adding a "likes" field to blog posts, for example — it must write migration logic in two passes. The framework automatically verifies that the transformation won't result in data loss, refusing to compile or deploy code that could delete information unless explicitly instructed.
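At the language level, one long-standing Motoko upgrade pattern hints at what "loss-safe" migration looks like, though Caffeine's two-pass workflow and the exact code it generates aren't public; the sketch below is our own illustration. The old stable variable is kept, and a new one is initialized from it when the upgraded actor first runs, so adding a `likes` field carries the existing posts forward with a default value rather than dropping them; the compiler's stable-compatibility check is what refuses upgrades that would silently discard stored data.

```motoko
import Array "mo:base/Array";

actor Blog {
  // v1 record shape; this stable variable is restored from the
  // previous version of the actor during an upgrade.
  type PostV1 = { title : Text; body : Text };
  stable var posts : [PostV1] = [];

  // v2 record shape adds the new field.
  type PostV2 = { title : Text; body : Text; likes : Nat };

  // New stable variable, populated from the old data when the
  // upgraded actor initializes. Nothing is deleted, only copied
  // forward with a default value for the new field.
  stable var postsV2 : [PostV2] = Array.map<PostV1, PostV2>(
    posts,
    func (p : PostV1) : PostV2 {
      { title = p.title; body = p.body; likes = 0 }
    }
  );
}
```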
From million-dollar SaaS contracts to conversational app building in minutes
Williams positions Caffeine as particularly transformative for enterprise IT, where he claims costs could fall to "1% of what they were before" while time-to-market shrinks to similar fractions. The platform targets a spectrum from individual creators to large corporations, all of whom currently face either expensive development teams or constraining low-code templates.
"A corporation or government department might want to create a corporate portal or CRM, ERP functionality," Williams said, referring to customer relationship management and enterprise resource planning systems. "They will otherwise have to obtain this by signing up for some incredibly expensive SaaS service where they become locked in, their data gets stuck, and they still have to spend a lot of money on consultants customizing the functionality."
Applications built through Caffeine are owned entirely by their creators and cannot be shut down by centralized parties — a consequence of running on the decentralized Internet Computer network rather than traditional cloud providers like Amazon Web Services. "When someone says built on the internet computer, it actually means built on the internet computer," Williams emphasized, contrasting this with blockchain projects that merely host tokens while running actual applications on centralized infrastructure.
The platform demonstrated this versatility during a July 2025 hackathon in San Francisco, where participants created applications ranging from a "Will Maker" tool for generating legal documents, to "Blue Lens," a voice-AI water quality monitoring system, to "Road Patrol," a gamified community reporting app for infrastructure problems. Critically, many of these came from non-technical participants with no coding background.
"I'm from a non-technical background, I'm actually a quality assurance professional," said the creator of Blue Lens in a video testimonial. "Through Caffeine I can build something really intuitive and next-gen to the public." The application integrated multiple external services — Eleven Labs for voice AI, real-time government water data through retrieval-augmented generation, and Midjourney-generated visual assets — all coordinated through conversational prompts.
What separates Caffeine from GitHub Copilot, Cursor, and the 'vibe coding' wave
Caffeine enters a crowded market of AI-assisted development tools, but Williams argues the competition isn't truly comparable. GitHub Copilot, Cursor, and similar tools serve human developers working with traditional technology stacks. Platforms like Replit and Lovable occupy a middle ground, offering "vibe coding" that mixes AI generation with human editing.
"If you're a Node.js developer, you know you're working with the traditional stack, and you might want to do your coding with Copilot or using Claude or using Cursor," Williams said. "That's a very different thing to what Caffeine is offering. There'll always be cases where you probably wouldn't want to hand over the logic of the control system for a new nuclear missile silo to AI. But there's going to be these holdout areas, right? And there's all the legacy stuff that has to be maintained."
The key distinction, according to Williams, lies in production readiness. Existing AI coding tools excel at rapid prototyping but stumble when applications grow complex or require guaranteed reliability. Reddit forums for these platforms document users hitting insurmountable walls where applications break irreparably, or where AI-generated code introduces security vulnerabilities.
"As the demands and the requirements become more complicated, eventually you can hit a limit, and when you hit that limit, not only can you not go any further, but sometimes your app will get broken and there's no way of going back to where you were before," Williams said. "That can't happen with productive apps, and it also can't be the case that you're getting hacked and losing data, because once you go hands-free, if you like, and there's no tech team, there's no technical people involved, who's going to run the backups and restore your app?"
The Internet Computer's architecture addresses this through Byzantine fault tolerance — even if attackers gain physical control over some network hardware, they cannot corrupt applications or their data. "This is the beginning of a compute revolution and it's also the perfect platform for AI to build on," Williams said.
Inside the vision: A web that programs itself through natural language
Dfinity frames Caffeine within a broader vision it calls the "self-writing internet," where the web literally programs itself through natural language interaction. This represents what Williams describes as a "seismic shift coming to tech" — from human developers selecting technology stacks based on their existing skills, to AI selecting optimal implementations invisible to users.
"You don't care about whether some human being has learned all of the different platforms and Amazon Web Services or something like that. You don't care about that. You just care: Is it secure? Do you get security guarantees? Is it resilient? What's the level of resilience?" Williams said. "Those are the new parameters."
The platform demonstrated this during live demonstrations, including at the World Computer Summit 2025 in Zurich. Williams created a talent recruitment application from scratch in under two minutes, then modified it in real-time while the application ran with users already interacting with it. "You will continue talking to the AI and just keep on refreshing the URL to see the changes," he explained.
This capability extends to complex scenarios. During demonstrations, Williams showed building a tennis lesson booking system, an e-commerce platform, and an event registration system — all simultaneously, working on multiple applications in parallel. "We predict that as people get very proficient with Caffeine, they could be working on even 10 apps in parallel," he said.
The system writes substantial code: a simple personal blog generated 700 lines of code in a couple of minutes. More complex applications can involve thousands of lines across frontend and backend components, all abstracted away from the user who only describes desired functionality.
The economics of cloning: How Caffeine's app market challenges traditional stores
Caffeine's economic model differs fundamentally from traditional software-as-a-service platforms. Applications run on the Internet Computer Protocol, which uses a "reverse gas model" where developers pay for computation rather than users paying transaction fees. The platform includes an integrated App Market where creators can publish applications for others to clone and adapt — creating what Dfinity envisions as a new economic ecosystem.
"App stores today obviously operate on gatekeeping," said Pierre Samaties, chief business officer at Dfinity, during the World Computer Summit. "That's going to erode." Rather than purchasing applications, users can clone them and modify them for their own purposes — fundamentally different from Apple's App Store or Google Play models.
Williams acknowledges that Caffeine itself currently runs on centralized infrastructure, despite building applications on the decentralized Internet Computer. "Caffeine itself actually is centralized. It uses aspects of the Internet Computer. We want Caffeine itself to run on the Internet Computer in the future, but it's not there now," he said. The platform leverages commercially available foundation models from companies like Anthropic, whose Claude Sonnet model powers much of Caffeine's backend logic.
This pragmatic approach reflects Dfinity's strategy of using best-in-class AI models while focusing its own development on the specialized infrastructure and programming language designed for AI use. "These content models have been developed by companies with enormous budgets, absolutely enormous budgets," Williams said. "I don't think in the near future we'll run AI on the Internet Computer for that reason, unless there's a special case."
A decade in the making: From Ethereum roots to the self-writing internet
The Dfinity Foundation has pursued this vision since Williams began researching decentralized networks in late 2013. After involvement with Ethereum before its 2015 launch, Williams became fascinated with the concept of a "world computer"—a public blockchain network that could host not just tokens but entire applications and services.
"By 2015 I was talking about network-focused drivers, Dfinity back then, and that could really operate as an alternative tech stack, and eventually host even things like social networks and massive enterprise systems," Williams said. The foundation launched the Internet Computer Protocol in May 2021, initially focusing on Web3 developers. Despite not being among the highest-valued blockchain projects, ICP consistently ranks in the top 10 for developer numbers.
The pivot to AI-driven development came from recognizing that "in the future, the tech stack will be AI," according to Williams. This realization led to Caffeine's development, announced on Dfinity's public roadmap in March 2025 and demonstrated at the World Computer Summit in June 2025.
One successful example of the Dfinity vision running in production is OpenChat, a messaging application that runs entirely on the Internet Computer and is governed by a decentralized autonomous organization (DAO) with tens of thousands of participants voting on source code updates through algorithmic governance. "The community is actually controlling the source code updates," Williams explained. "Developers propose updates, community reads the updates, and if the community is happy, OpenChat updates itself."
The skeptics weigh in: Crypto baggage and real-world testing ahead
The platform faces several challenges. Dfinity's crypto industry roots may create perception problems in enterprise markets, Williams acknowledges. "The Web3 industry's reputation is a bit tarnished and probably rightfully so," he said during the World Computer Summit. "Now people can, for themselves, experience what a decentralized network is. We're going to see self-writing take over the enterprise space because the speed and efficiency are just incredible."
The foundation's history includes controversy: ICP's token launched in 2021 at over $100 per token with an all-time high around $700, then crashed below $3 in 2023 before recovering. The project has faced legal challenges, including class action lawsuits alleging misleading investors, and Dfinity filed defamation claims against industry critics.
Technical limitations also remain. Caffeine cannot yet compile React front-ends on the Internet Computer itself, requiring some off-chain processing. Complex integrations with traditional systems — payment processing through Stripe, for example — still require centralized components. "Your app is running end-to-end on the Internet Computer, then when it needs to actually accept payment, it's going to hand over to your Stripe account," Williams explained.
The platform's claims about data loss prevention and security guarantees, while technically grounded in the Motoko language design and Internet Computer architecture, remain to be tested at scale with diverse real-world applications. The 26% daily active user rate from alpha testing is impressive but comes from a self-selected group of early adopters.
When five billion smartphone users become developers
Williams rejects concerns that AI-driven development will eliminate software engineering jobs, arguing instead for market expansion. "The self-writing internet empowers eight billion non-technical people," he said. "Some of these people will enter roles in tech, becoming prompt engineers, tech entrepreneurs, or helping run online communities. Humanity will create millions of new custom apps and services, and a subset of those will require professional human assistance."
During his World Computer Summit demonstration, Williams was explicit about the scale of transformation Dfinity envisions. "Today there are about 35,000 Web3 engineers in the world. Worldwide there are about 15 million full-stack engineers," he said. "But tomorrow with the self-writing internet, everyone will be a builder. Today there are already about five billion people with internet-connected smartphones and they'll all be able to use Caffeine."
The hackathon results suggest this isn't pure hyperbole. A dentist built "Dental Tracks" to help patients manage their dental records. A transportation industry professional created "Road Patrol" for gamified infrastructure reporting. A frustrated knitting student built "Skill Sprout," a garden-themed app for learning new hobbies, complete with material checklists and step-by-step skill breakdowns—all without writing a single line of code.
"I was learning to knit. I got irritated because I had the wrong materials," the creator explained in a video interview. "I don't know how to do the stitches, so I have to individually search, and it's really intimidating when you're trying to learn something you don't—you don't even know what you don't know."
Whether Caffeine succeeds depends on factors still unknown: how production applications perform under real-world stress, whether the Internet Computer scales to millions of applications, whether enterprises can overcome their skepticism of blockchain-adjacent technology. But if Williams is right about the fundamental shift — that AI will be the tech stack, not just a tool for human developers — then someone will build what Caffeine promises.
The question isn't whether the future looks like this. It's who gets there first, and whether they can do it without losing everyone's data along the way.