STL Partners webinar: Agentic AI needed for RAN autonomy & efficiency

Yesterday, an STL Partners webinar titled “Turning autonomy into margin: Agentic AI and the autonomous RAN” suggested that agentic AI is the missing layer that can turn RAN autonomy from a technical goal into a direct profit-margin booster. The presenters argued that operators should prioritize autonomy use cases by business impact, not just by how much automation coverage they add, and that the right roadmap can move autonomy from an engineering KPI to a commercial advantage.

The central message was that autonomy only matters if it improves economics (see poll results below). The webinar proposed a dual-axis framework for network operators that combines the usual autonomous-network maturity view with a value-creation lens, so they can focus on the capabilities that scale into measurable business outcomes.

Agentic AI was presented as the practical enabler for moving beyond human-in-the-loop operations. In this framing, agents help orchestrate tasks, make decisions, and coordinate network actions in ways that support more closed-loop automation than traditional workflows can deliver.

The results of an “actuality” poll on RAN autonomy showed that controlling costs and improving reliability were most important, with enabling new revenue growth through APIs and sensing cited by only 10.87% of respondents. Similarly, the results of an “aspirations” poll for RAN autonomy were fairly evenly spread between reducing costs and optimizing the customer experience, with just 13.21% citing new revenue growth.

Source: STL Partners

Terje Jensen, SVP, global business security officer and head of network and cloud technology strategy at Telenor, said that he had expected to see network operators’ aspirations shift more clearly towards improving customer experience and even revenue generation, not just efficiency.

Darwin Janz, strategic technology planner at SaskTel, also thought network operators’ ambitions would be higher, but he noted that they still struggle to identify concrete, monetizable use cases. Without that, there’s a real risk of building technical solutions in search of a problem, rather than starting from clear enterprise needs and value, Darwin noted. “We really need to see those use cases and enterprise customer needs,” he added.

……………………………………………………………………………………………………………………….

The webinar was built around four practical questions:

  1. Which use cases create real commercial impact?
  2. How can operators shift from autonomy as an engineering metric to a margin driver?
  3. Where does agentic AI add value today?
  4. What data, orchestration, and organizational foundations are needed to scale beyond pilots?

For network operators, the implication is that autonomous RAN strategy should be tied to P&L outcomes such as lower operating cost, better resource utilization, and faster optimization cycles. The webinar’s message is that autonomy becomes strategically important only when it is deployed in a way that compounds across the network and business.

…………………………………………………………………………………………………………………..

References:

https://www.lightreading.com/network-automation/telcos-showing-limited-aspiration-for-ran-autonomy-benefit

The Financial Trap of Autonomous Networks: Scaling Agentic AI in the Telecom Core

Nokia to showcase agentic AI network slicing; Ericsson partners with Ookla to measure 5G network slicing performance

 

 

Anthropic’s Project Glasswing aims to reshape IT cybersecurity

Backgrounder:

Late last year, Anthropic said that state-sponsored Chinese hackers had used its artificial intelligence (AI) technology in an effort to infiltrate the computer systems of roughly 30 companies and government agencies around the world. The company said it was the first reported case of a cyberattack in which AI technologies had gathered sensitive information with limited help from human operators.

As Anthropic and its chief rival, OpenAI, prepare to release new and more powerful AI systems, cybersecurity experts are increasingly vocal in their warnings that AI is fundamentally changing cybersecurity.  AI technology could allow hackers to identify security holes in computer systems far faster than in the past, vastly raising the stakes in the decades-long fight between hackers and the security experts guarding computer networks.  As hackers deploy AI to break and steal, security experts are also leaning on AI to spot flaws in their systems — including some that had gone unnoticed for decades.

“This is the most change in the cyber environment, ever,” said Francis deSouza, the chief operating officer and president of security products at Google Cloud. “You have to fight AI with AI.”

Hackers have used AI chatbots to draft phishing emails and ransom notes, cybersecurity experts said. Others have used AI to parse large quantities of stolen data and determine what information might be valuable. Without help from AI, attackers could sometimes break into computer networks within minutes, deSouza said, but with the help of AI, breaches can take just seconds. Some hackers specialize in breaking into systems and then selling off their access to other attackers. Those handoffs used to take as much as eight hours, as hackers negotiated the sales and passed along the compromised entry points, deSouza added. Now that process has accelerated to about 20 seconds, he said, with hackers sometimes using AI agents to speed it up.

Some experts argue that the guardrails added by companies like Anthropic and OpenAI can actually provide an advantage to malicious attackers. Guardrails could cause an AI chatbot to deny help to a user trying to defend a system from an attack, they argue, but persistent hackers could be more diligent about finding vulnerabilities — and keeping those tricks to themselves.

In February, Anthropic said it had used its AI technologies to find over 500 so-called zero-day vulnerabilities — security holes that were unknown to software makers — in various pieces of commonly used open source software. The next month, a researcher at Anthropic revealed that he had used AI to find a serious security vulnerability in the core of the Linux operating system, which is software that powers much of the internet and is used in computer servers, cloud computing services, Android phones and Teslas. The bug had existed, apparently undiscovered, since 2003.

Project Glasswing Overview:

Anthropic has announced Project Glasswing – a new initiative that brings together Amazon Web Services, Anthropic, Apple, Broadcom, Cisco, CrowdStrike, Google, JPMorganChase, the Linux Foundation, Microsoft, NVIDIA, and Palo Alto Networks – in an effort to secure the world’s most critical software.

The fast-growing private AI company has found that AI models (like its own Claude) have reached a level of coding capability where they can surpass all but the most skilled humans at finding and exploiting software vulnerabilities. Its Mythos Preview language model has already found thousands of high-severity vulnerabilities, including some in every major operating system and web browser.

Given the rate of AI progress, it will not be long before such capabilities proliferate, potentially beyond actors who are committed to deploying them safely. The fallout—for economies, public safety, and national security—could be severe. Project Glasswing is an urgent attempt to put these capabilities to work for defensive purposes.

The Project Glasswing partners will use Mythos Preview as part of their defensive security work. Anthropic will share what it learns so the entire IT industry can benefit. It has also extended access to a group of over 40 additional organizations that build or maintain critical software infrastructure so they can use the model to scan and secure both first-party and open-source systems.

Anthropic is committing up to $100M in usage credits for Mythos Preview across these efforts, as well as $4M in direct donations to open-source security organizations.

Project Glasswing Core Objectives:
  • Give Defenders a Head Start: The initiative aims to use Mythos’s capabilities to find and fix zero-day vulnerabilities in critical codebases before they can be discovered by malicious actors.
  • Secure Critical Infrastructure: Partners use the model to scan first-party systems and open-source software that underpin global banking, energy, and logistics networks.
  • Modernize Defense Practices: Anthropic is collaborating with partners to evolve security workflows, such as patching and disclosure processes, to match the “machine speed” of AI-driven vulnerability discovery.
Claude Mythos Capabilities:
The Glasswing initiative was formed after Anthropic researchers observed that the Mythos model had reached a threshold where its reasoning and coding skills surpassed all but the most skilled human security researchers.
  • Zero-Day Discovery: In early testing, the model autonomously found thousands of high-severity vulnerabilities, including a 27-year-old bug in OpenBSD and a 16-year-old flaw in FFmpeg code that had been scanned by automated tools millions of times without detection.
  • Performance Benchmarks: Mythos Preview scored 83% on the CyberGym cybersecurity benchmark, significantly outperforming previous models like Claude Opus.

 

References:

https://www.anthropic.com/glasswing

https://www.nytimes.com/2026/04/06/technology/ai-cybersecurity-hackers.html

Anthropic Glasswing: AI Vulnerability Detection Has Crossed a Threshold

Anthropic Claude Users Reveal AI Hallucinations as their Top Concern

Nvidia CEO Huang: AI is the largest infrastructure buildout in human history; AI Data Center CAPEX will generate new revenue streams for operators

New Linux Foundation white paper: How to integrate AI applications with telecom networks using standardized CAMARA APIs and the Model Context Protocol (MCP)

Nokia’s AI Applications Study: “Physical AI” may require RAN redesign to support high‑volume, low‑latency uplink traffic

According to Nokia, AI-generated traffic in most mobile networks is at an early stage, with application maturity and adoption by consumers and enterprises only at the start of a broader AI super cycle.  The Finland-based company analyzed more than 50 AI applications and reached three conclusions: higher uplink traffic, overall data growth, and increasing sensitivity to delay in conversational services such as chat and voice. Also, the mobile network industry is moving toward “AI-RAN” or “6G-native” structures that embed AI into the network, transforming radio sites into “robotic” nodes capable of edge inference and handling these new demands.

Do those findings require a structural change in Radio Access Network (RAN) design? Let’s take a fresh look…

Mobile networks traditionally support a heterogeneous mix of traffic, ranging from high-throughput video streaming to low-bandwidth, delay-tolerant messaging. Network operators typically address escalating capacity demands through infrastructure expansion and overprovisioning, relying on best-effort delivery—a model that has proven remarkably resilient. However, capacity alone is insufficient for new use cases.

The transition from circuit-switched voice to packet-switched (voice/video/data) IP traffic required a redesign to accommodate variable packet sizes instead of predictable, continuous voice patterns. The proliferation of Internet of Things (IoT) devices introduced requirements for massive machine-type communications (mMTC), driving the development of LTE-M and NB-IoT to optimize for deep indoor penetration and power efficiency.  In contrast, consumer web-based services and video streaming scale seamlessly by adding RAN and core capacity. Existing AI applications, such as generative AI chatbots, follow this model, making current RAN architectures adequate for the present load.

A paradigm shift is emerging with Physical AI [1.], which enables machines like autonomous vehicles and robots to interact with the environment in real time. Unlike traditional video streaming, these applications cannot leverage buffering to absorb network jitter. In Physical AI, high-definition video frames and sensor data must arrive within stringent time-to-live (TTL) constraints to remain actionable. This shifts the focus from average throughput to consistent low latency. Maintaining this strict QoS, particularly in the uplink, requires abandoning best-effort, overprovisioned models in favor of guaranteed scheduling, which necessitates substantial reserved capacity or specialized AI-RAN functionalities.

Note 1. Physical AI combines sensors, perception, decision-making, and actuators so machines can understand their environment and take physical (real world) action. Physical AI is used by robots, vehicles, drones, industrial machines, and smart infrastructure that generate and consume real-time sensor, video, and control traffic. These systems need tight coupling between low latency, high reliability, and continuous feedback loops because decisions in software immediately affect physical motion or control. Physical AI is different from typical generative AI because the output is not text or images; it is real-world action. That makes network performance critical, especially for uplink-heavy, latency-sensitive traffic where delays can affect safety, control accuracy, and operational efficiency.

“Physical AI introduces the possibility that large-volume uplink video with strict latency requirements will become a meaningful part of mobile traffic, creating both a design challenge and a monetization opportunity,” says Harish Viswanathan, Head of the Radio Systems Research Group at Nokia.

Image Credit: Techslang

Delivering uplink video with sub‑20 ms end-to-end latency can require provisioning three to four times the average uplink capacity. While this level of redundancy is manageable for low-bandwidth services such as voice or control signaling, it becomes prohibitively expensive when supporting high-throughput video streams.

As device densities increase, the required headroom for reserved capacity grows disproportionately, significantly constraining network scalability and driving up cost per bit. This makes Physical AI traffic—characterized by real-time sensor and video inputs for machine analysis—fundamentally different from conventional services, and unsuited to existing best‑effort transport models.  From a Nokia blog post:

“Physical AI will rely on low latency videos to enable real-time control. While the machines or robots will perform most functions locally, there will be situations where they need to rely on more powerful models or human operators to provide remote control via the network. For example, driverless taxis may require remote assistance in unexpected scenarios; service robots may need guidance in complex environments; drones may depend on real‑time video analysis at the point of delivery; and field workers using AR may require timely visual instructions. In all these cases, the network must deliver fresh video information with low and predictable latency.”

To address these challenges, telecom operators are expected to adopt a multi‑layer approach encompassing network architecture, traffic management, and service monetization.

At the Application layer, not all traffic requires identical latency treatment. When video or sensor data is processed by AI rather than consumed by humans, only semantically relevant information may need immediate uplink transmission. This emerging paradigm, known as semantic communication, allows for significant data reduction while preserving information integrity within latency‑critical loops.

Within the network domain, established mechanisms such as Quality of Service (QoS) and network slicing remain essential. QoS enables prioritization of specific traffic classes, while slicing supports logically isolated virtual networks with guaranteed service-level attributes—latency, jitter, bandwidth, and reliability.

At the service and business model level, supporting low-latency, bandwidth-intensive applications reshapes network economics. Operators must evolve beyond best‑effort pricing structures toward differentiated service tiers or performance-based charging models aligned with enterprise and industrial use cases.

For the RAN, Physical AI underscores the need for greater programmability and elasticity. Future RAN designs will depend on dynamic resource allocation, real-time traffic classification, and AI-driven orchestration to balance throughput, latency, and reliability at scale.

As Physical AI deployments expand—from autonomous mobility to precision manufacturing and tele‑robotics—managing high‑volume, low‑latency uplink traffic will become a defining capability for next‑generation network strategy and differentiation. Unlike conventional mobile data, Physical AI cannot rely on buffering to manage traffic spikes. The requirement for continuous video and sensor data to arrive within strict time limits to inform real-time actions makes traditional “best-effort” network approaches inefficient and costly.

Reasons for RAN Redesign:
  • Uplink-Centric Demand: Physical AI shifts the network requirement from downlink-heavy (human consumption) to uplink-heavy (machine-generated) traffic.
  • Strict Latency & Throughput: Maintaining consistent low latency (e.g., around 20 milliseconds) for high-volume video uploads can require 3x to 4x more capacity than average, making overprovisioning unsustainable.
  • Need for Programmable Architectures: To support this, RAN must move toward more flexible, AI-native architectures that prioritize critical data and provide deterministic, rather than best-effort, performance.
  • Semantic Communication: To reduce data volume while maintaining performance, the RAN will need to adopt semantic communication—transmitting only the essential data needed for the AI to make decisions.

………………………………………………………………………………………………………………………………………………………..

References:

https://www.nokia.com/asset/215147/

https://www.nokia.com/blog/physical-ai-redefining-ran-and-telco-monetization/

https://telcomagazine.com/news/nokia-report-points-to-ai-driven-shift-in-mobile-traffic

What Is Physical AI?

Arm Holdings unveils “Physical AI” business unit to focus on robotics and automotive

Is the “far edge” a bridge too far to cross for AI inferencing? What about “Distributed AI Grids”?

The Financial Trap of Autonomous Networks: Scaling Agentic AI in the Telecom Core

Ericsson and Intel collaborate to accelerate AI-Native 6G; other AI-Native 6G advancements at MWC 2026

NVIDIA and global telecom leaders to build 6G on open and secure AI-native platforms + Linux Foundation launches OCUDU

Comparing AI Native mode in 6G (IMT 2030) vs AI Overlay/Add-On status in 5G (IMT 2020)

AI-RAN Reality Check: hype vs hesitation, shaky business case, no specific definition, no standards?

IDC Survey of Networking Leaders: Enterprise AI progress stalls despite ambitious goals

New IDC research released in April 2026 highlights a growing disconnect between ambitious enterprise AI goals and the reality of their technical execution.  The 2026 IDC AI in Networking Special Report (LinkedIn Video hyperlink) [1.] found that organizations expecting to move from early and selective AI use for business and IT initiatives to more advanced deployments largely have not done so. The result is a widening gap between intent and execution that is becoming harder to ignore, driven by a mismatch between ambitious goals and the realities of legacy infrastructure, which cannot handle the data demands of production-grade models.

Despite high expectations, many organizations have seen their AI progress stall over the last 18 months, with “select use” adopters failing to advance to more “substantial” deployments. A critical shortage of personnel with specialized AI experience, combined with lagging security and governance controls, has caused widespread “pilot paralysis” across most enterprises. To overcome this, organizations are shifting toward “AI factories” to create a repeatable, governed pipeline for deploying AI.

Note 1. IDC’s 2026 AI in Networking Special Report is driven by a worldwide survey of 500+ enterprise network executives and experts. The report covers both the impact of and plans for supporting AI workloads across the network and using AI-powered networking solutions. The focus of this research is comprehensive, covering datacenters, cloud services, multi-cloud environments, network core and edge, and network management.

…………………………………………………………………………………………………………………………………………………………………………………………………………………………………………

Mark Leary, IDC research director, Network Observability and Automation:

“Many solution suppliers are prioritizing a platform approach to the challenges associated with moving AI workloads into production. This survey of networking leaders highlights the shift in preference from platforms to best-in-class solutions when supporting AI workloads across their networks. As certain functional requirements intensify, as IT staff experience and expertise build, and as platforms fall short in delivering expected advantages, IT organizations are more willing to take on the added responsibilities associated with assembling their own mix of best-in-class solutions. For the supplier, the challenge is to avoid developing and delivering a platform that is classified as a jack-of-all-trades and master of none.”

“Agentic AI is to have a profound effect on the network infrastructure and on networking staff. Two years ago, AI assistants were labeled leading edge when they offered natural language processing for operator interactions and network management guidance driven by technical manual content. How things have changed! Agentic AI is no longer just a passive informer and instructor but an active intelligent virtual network engineer. Agents gather and process comprehensive network data, develop deep and precise insights, and determine and, increasingly, execute needed network management actions. Whether fixing a network problem, activating a network service, optimizing a network configuration, or responding to a developing network condition, agentic AI solutions are proving more and more useful across the entire network and the entire set of tasks required to engineer and operate the network.”

While this IDC Survey Spotlight offers only an overview of responses relating to agentic AI, detailed results are available by geographic region, select country, company size, major vertical industries, respondent role, and the AI maturity level of the respondent’s organization.

…………………………………………………………………………………………………………………………………………………………………………………………………………………………………………

Organizations are pursuing AI in networking across two categories:

1.] Supporting AI workloads across network infrastructure and

2.] Applying AI to network operations. 

But in both cases, progress is constrained by persistent challenges. “2026 is when organizations find out if AI in networking delivers real operational impact—or remains stuck in pilot mode,” Leary said in the referenced LinkedIn Video.

Source: IDC

……………………………………………………………………………………………………………………………

Security remains the top concern among enterprises, both as a barrier to deployment and a primary use case for AI itself. “You have to fight AI with AI from a network security perspective,” said Brandon Butler, senior research manager at IDC. “There’s a realization that nefarious actors are leveraging AI themselves. The pressure is already on the network. The question now is whether organizations can keep up with what AI is demanding of their infrastructure,” he added.

Integration with existing systems and a shortage of skilled talent follow close behind. “Most folks don’t feel their staff can fully evaluate and select the right solutions,” Leary said. As a result, many organizations are turning outward for help:

  • 81% say they are increasing spending on managed service providers (MSP) to support AI initiatives.
  • 89% of data centers expect to increase bandwidth by at least 11% within the next year, driven by AI workloads.
  • That demand extends beyond individual facilities, with 91% expecting similar growth in inter-data center connectivity, highlighting the strain on distributed architectures.
  • Nearly half of respondents (46%) prefer AI systems that can both determine and execute network actions autonomously.
  • Another 41% favor a guided approach, while 13% prefer no AI involvement.

Cloud environments are seeing sharper increases in AI use. Organizations anticipate an average 49% rise in bandwidth for cloud connectivity over the next year. “The cloud is almost always involved,” Leary says. “The biggest group mixes one cloud platform with one or more data centers.”

Beyond the data center and cloud, the network edge is emerging as the next major growth area. Today, 27% of organizations have deployed AI workloads at the edge, and 54% plan to do so within two years. Butler said: “Folks who are leveraging AI more extensively are already pushing workloads to the edge. We see this as a leading indicator of where the market is going.”

“Two years in a row, the largest group said they want AI to both determine and execute actions. It was honestly surprising,” he added.

Enterprise edge bandwidth is projected to grow by an average of 51% in the next year. As AI becomes more distributed, network teams will need to manage greater complexity across environments while maintaining performance and security.

…………………………………………………………………………………………………………………………………………………………………………….

When assessing expected ROI from AI in networking, IDC survey respondents focused on elevating IT capabilities, with 31% prioritizing superior service levels and 30% focusing on operational efficiency. These outcomes ranked above worker productivity and revenue, suggesting that leaders are strategically utilizing AI to enhance foundational operational workflows. Notably, reducing operating costs ranked seventh, suggesting a focus on strategic value rather than immediate expense reduction.

Source: IDC

……………………………………………………………………………………………………

IDC Research identified specific applications—from automated configuration validation to AI-enhanced threat response—as catalysts for measurable performance gains and the organizational trust essential for broader implementation. For network executives, this phased approach represents the most strategic methodology for achieving long-term operational objectives.

“It doesn’t have to be handing the keys of your kingdom to AI to really get some benefits from these AI tools,” Butler concluded.

……………………………………………………………………………………………………………………………………………………………………………………….

References:

https://www.linkedin.com/posts/brandon-butler-29761a3_idc-recently-published-our-second-annual-activity-7429576183614320640-p5PA/

https://www.networkworld.com/article/4152655/ai-for-it-stalls-as-network-complexity-rises.html

Using AI, DeepSig Advances Open, Intelligent Baseband RAN Architectures

Using advanced AI techniques, DeepSig has reportedly managed to eliminate a mobile network’s pilot signal, thereby removing signaling overhead without degrading overall performance. Founded in 2016, the U.S.-based startup occupies a leading position at the intersection of artificial intelligence (AI) and the radio access network (RAN), developing data-driven models that could supplant traditional, human-engineered signal processing algorithms.

This work has become especially relevant as the telecom industry moves toward open and software-defined RAN architectures. DeepSig is now a visible contributor to OCUDU (Open Centralized Unit Distributed Unit), an open-source initiative announced by the Linux Foundation in collaboration with the U.S. Department of Defense and its FutureG ecosystem partners to accelerate open CU/DU development for 5G and early 6G systems. OCUDU is intended to establish a carrier-grade reference platform for baseband software, with support for AI-based algorithms and solutions embedded in the RAN compute stack.

As AI becomes a central theme across the telecom ecosystem, DeepSig has rapidly moved from relative obscurity to prominence through collaborations with major industry and government stakeholders; its role in OCUDU, announced ahead of MWC Barcelona 2026, is the most recent example. The program’s goal is to introduce open-source software elements into the RAN baseband domain, an area historically dominated by proprietary offerings from Ericsson, Nokia, and Samsung. By lowering barriers to entry, OCUDU aims to foster innovation and enable smaller players like DeepSig to participate more freely in the U.S. baseband ecosystem.

Image Credit:  DeepSig

DeepSig was identified, alongside Ireland-based Software Radio Systems (SRS), as one of two startups selected to deliver OCUDU’s initial software stack. “The National Spectrum Consortium had an RFQ for developing an open-source stack,” explained Jim Shea, DeepSig’s CEO. “SRS already had a capable baseline, but it needed to be elevated to carrier-grade—adding new features and strengthening reliability,” he added.

Meanwhile, major vendors Ericsson and Nokia were named “premier members” of the new OCUDU Ecosystem Foundation. While both could, in principle, leverage the platform to integrate third-party components into their baseband systems, industry observers remain skeptical that these incumbents will fully embrace open-source alternatives over their established proprietary stacks. In comments at MWC, Nokia CEO Justin Hotard characterized OCUDU as a welcome ecosystem evolution to accelerate innovation but clarified that “not everything necessarily needs to be open source.”

Driven in part by DoD interests, OCUDU reflects broader U.S. government ambitions to ensure that 5G and future 6G networks remain open to domestic innovation, particularly for defense and mission-critical use cases. For vendors like Ericsson and Nokia—who view defense markets as increasingly strategic—this alignment could bring both opportunity and complexity.

DeepSig’s trajectory extends beyond OCUDU. The company’s technology originated from research by Tim O’Shea, now CTO, during his tenure at Virginia Tech, where he explored deep learning’s application to wireless signal processing. “You can apply deep learning to enhance the way communication systems operate by replacing many of the traditional algorithms,” said Jim Shea. While these methods do not circumvent theoretical limits such as Shannon’s Law, small efficiency gains can yield substantial operational and economic benefits for cost-sensitive mobile operators.

As DeepSig and peers continue to redefine how intelligence is integrated into the RAN, their work signals a shift toward AI-native architectures—where machine learning, rather than handcrafted algorithms, becomes the foundation for next-generation network optimization.

 

References:

https://www.lightreading.com/5g/small-deepsig-is-at-heart-of-ai-ran-challenge-to-ericsson-nokia

Accelerating 5G vRAN, AI-RAN, and 6G on OCUDU, “the Linux of RAN”

AI-RAN Reality Check: hype vs hesitation, shaky business case, no specific definition, no standards?

Ericsson goes with custom silicon (rather than Nvidia GPUs) for AI RAN

Dell’Oro: RAN Market Stabilized in 2025 with 1% CAGR forecast over next 5 years; Opinion on AI RAN, 5G Advanced, 6G RAN/Core risks

Dell’Oro: Analysis of the Nokia-NVIDIA-partnership on AI RAN

RAN silicon rethink – from purpose built products & ASICs to general purpose processors or GPUs for vRAN & AI RAN

Dell’Oro: AI RAN to account for 1/3 of RAN market by 2029; AI RAN Alliance membership increases but few telcos have joined

InterDigital led consortium to advance wireless spectrum coexistence & sharing

Telecom sessions at Nvidia’s 2025 AI developers GTC: March 17–21 in San Jose, CA

Sources: AI is Getting Smarter, but Hallucinations Are Getting Worse

The Financial Trap of Autonomous Networks: Scaling Agentic AI in the Telecom Core

By Pavan Madduri with Ajay Lotan Thakur

The telecom industry wants autonomous, self-healing networks, but nobody is looking at the GPU bill. Running Agentic AI 24/7 “just in case” will bankrupt your IT department and ruin your ESG goals. The only way to survive the autonomous era is ruthless, event-driven orchestration that scales cognitive compute to absolute zero.

Introduction – The Compute Crisis Nobody is Talking About:

Everyone in telecom right now is obsessed with “self-healing” autonomous networks. The vendor pitch sounds amazing. Just drop in some Agentic AI, let it watch your data plane, and watch it fix anomalies without a human ever touching a keyboard. But there’s a massive trap hiding underneath all that hype, and enterprise architects are completely ignoring it. It comes down to the raw physics of AI compute.

Unlike your standard microservices, which just run deterministic, compiled code on cheap CPU cycles, Agentic AI needs massive foundation models. To actually reason through a network failure, these models have to load gigabytes of weights into video RAM (VRAM) and generate tokens. You need dedicated GPUs for this. We aren’t talking about cheap, stateless API calls here. These are the most expensive, power-hungry workloads in your entire datacenter.

If a telco tries to run an autonomous core the old-fashioned way by keeping high-end GPU nodes spinning 24/7 just in case a BGP route flaps, their cloud bill is going to wipe out any operational savings the AI was supposed to deliver.

The reality is that autonomy is no longer just a software problem. It’s a financial one. The telcos that actually win will not be the ones with the smartest AI. They will be the ones who figure out how to build a strict “scale-to-zero” environment. They need to spin up that expensive cognitive compute exactly when it is needed, and kill it the exact second the job is done.

Why Traditional Auto-scaling is Broken for AI:

When platform engineers first see the compute costs of running these AI agents, their first instinct is usually just to slap standard Kubernetes Horizontal Pod Autoscaling (HPA) on the cluster and call it a day. But standard HPA was built for stateless web servers, not massive cognitive engines. If you try to use it for Agentic AI in a telecom core, you’re going to fail for two big reasons.

The Cold-Start Penalty: Traditional autoscaling is entirely reactive. It sits around waiting for a CPU to hit 80% before it decides to scale up. In telecom, SLAs are measured in milliseconds or less. If you wait for an anomaly to spike your CPU, then provision a new GPU node, pull a massive AI container image, and load the model weights into VRAM, you are talking about minutes of delay. By the time your AI agent actually wakes up to fix the problem, you have already breached your SLA.

CPU Utilization is a Liar: For AI workloads, standard hardware metrics are completely misleading. A GPU could be pegged at 90% utilization just thinking through a minor log warning, while a massive, critical network failure is stuck waiting in the queue. If your scaling logic is tied to hardware metrics instead of the actual severity of the event queue, you are just going to burn budget scaling blindly.

We have to abandon reactive resource metrics entirely and move to event-driven orchestration.

The Fix – Event-Driven Orchestration:

If standard HPA is broken for this, what is the fix? You have to completely decouple the infrastructure from the workload using strict, event-driven orchestration.

Instead of keeping baseline infrastructure running just to maintain a state, you treat cognitive compute as 100% ephemeral. You don’t scale based on how hard the CPU is working. You scale based on the exact depth and severity of the anomaly queue.

To actually build this, architects need purpose-built event-driven scalers like KEDA (Kubernetes Event-driven Autoscaling). KEDA lets your cluster completely bypass those reactive hardware metrics and listen directly to the network’s data plane.

But how do you avoid the cold-start latency of booting a fresh GPU pod? KEDA solves this by reacting to the event queue length itself rather than waiting for an existing pod’s CPU to max out. By the time a traditional HPA notices a CPU spike, the system is already overwhelmed. (To solve this exact issue in production, I open-sourced a custom KEDA scaler specifically designed to scrape and react to native GPU metrics, allowing the orchestrator to scale cognitive workloads preemptively. You can view the architecture on [GitHub])

KEDA intercepts the telemetry trigger at the source. When paired with a warm pool of paused GPU nodes and pre-pulled container images, KEDA can scale a pod from zero to active in milliseconds. The infrastructure is anticipating the load based on the queue, not reacting to the stress of it.

Here is what the workflow actually looks like when you do it right:

  1. The Trigger: Telemetry picks up a severe anomaly, like a sudden 5G slice degradation, and pushes an event straight to a message broker like Kafka.
  2. The Scale-Up: KEDA intercepts that exact metric and instantly provisions a dedicated, GPU-backed AI pod from a warm standby pool.
  3. The Execution: The Agentic AI loads into VRAM, figures out the blast radius of the anomaly, and executes a fix, usually by reconciling the state through a GitOps controller.
  4. The Kill Switch: The absolute millisecond that the event queue clears and the network is stable, the orchestrator aggressively terminates the pod and gives the GPU back to the node pool.

You only pay the premium GPU tax during moments of active reasoning. The 24/7 idle tax is gone.
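To make this concrete, here is a minimal sketch of what step 2 could look like as a KEDA ScaledObject. It is illustrative only, not taken from the article or its GitHub project: the Deployment name, namespace, Kafka addresses, topic, and thresholds are hypothetical placeholders.

```yaml
# Illustrative KEDA ScaledObject: scales a (hypothetical) agentic-ai-engine
# Deployment from zero based on Kafka queue depth instead of CPU utilization.
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: agentic-ai-engine-scaler
  namespace: network-automation
spec:
  scaleTargetRef:
    name: agentic-ai-engine       # the GPU-backed AI agent Deployment
  minReplicaCount: 0              # scale-to-zero: no idle GPU tax
  maxReplicaCount: 8              # cap the GPU spend for a worst-case event storm
  pollingInterval: 5              # check the anomaly queue every 5 seconds
  cooldownPeriod: 60              # the kill switch: tear down 60s after the queue clears
  triggers:
    - type: kafka
      metadata:
        bootstrapServers: kafka.telemetry.svc:9092
        consumerGroup: anomaly-agents
        topic: network-anomalies  # telemetry pushes severe events here (step 1)
        lagThreshold: "10"        # roughly one replica per 10 unprocessed events
```

Because the trigger is queue lag rather than a hardware metric, the replica count tracks the severity backlog directly, which is exactly the decoupling argued for above.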

Architecting the Scale-to-Zero Core:

To make this scale-to-zero dream a reality, you have to fundamentally change how you handle network observability. The biggest mistake I see architects make is tightly coupling their monitoring tools with their AI execution layer. If your observability stack is running on the same hardware as your AI engine, you are literally wasting premium GPU compute just to watch logs.

You need a strict, physical separation of concerns:

The Watchers (The Lightweight Control Plane):
Your network data plane needs to be monitored by lightweight, CPU-efficient edge collectors like Prometheus or OpenTelemetry. These sit right at the edge, continuously eating millions of telemetry data points and BGP state changes. Because they don’t do any complex reasoning, they run incredibly cheap on standard CPU nodes.

The Thinkers (The Heavyweight Execution Plane):
Your expensive AI models are completely isolated in a separate, GPU-backed node pool that literally defaults to zero instances.

When the Watchers spot an anomaly, they don’t try to fix it. They just fire an alert to KEDA. KEDA then wakes up the Thinkers, spinning up the exact number of GPU pods needed to handle that specific blast radius. By decoupling the watchers from the thinkers, you guarantee that not a single cycle of GPU compute is wasted on baseline monitoring.
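As a sketch of how the Thinkers stay physically isolated (again illustrative; the pool label, image name, and sizes are hypothetical), the AI engine can be pinned to a tainted GPU node pool that the Watchers can never schedule onto:

```yaml
# Illustrative "Thinkers" Deployment: lives only on a dedicated GPU node pool.
# Replicas default to zero; the KEDA ScaledObject above owns the scale-up.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: agentic-ai-engine
  namespace: network-automation
spec:
  replicas: 0                     # baseline is zero GPU pods
  selector:
    matchLabels: { app: agentic-ai-engine }
  template:
    metadata:
      labels: { app: agentic-ai-engine }
    spec:
      nodeSelector:
        pool: gpu-inference       # hypothetical label on the GPU node pool
      tolerations:
        - key: nvidia.com/gpu     # pool is tainted, so cheap CPU workloads stay off it
          operator: Exists
          effect: NoSchedule
      containers:
        - name: agent
          image: registry.example.com/agentic-ai:latest  # pre-pulled to cut cold starts
          resources:
            limits:
              nvidia.com/gpu: 1   # one dedicated GPU per reasoning pod
```

The Watchers, by contrast, run as ordinary CPU pods with no toleration for the GPU taint, so not a single GPU cycle is spent on baseline monitoring.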

The Bottom Line:

Autonomous telecom networks are going to happen. But trying to brute-force the infrastructure provisioning is a fast track to bankrupting your IT department. The smartest Agentic AI in the world is useless if you can’t afford the cloud bill to run it.

Furthermore, this isn’t just about protecting the IT budget. Running idle GPUs 24/7 creates a massive, unnecessary carbon footprint. By enforcing a scale-to-zero architecture, telcos can drastically reduce the energy consumption of their autonomous networks, turning a massive ESG liability into a sustainable operational model.

Autonomy is no longer just a software engineering problem. It is an infrastructure balancing act. If Agentic AI is going to survive in the telecom core, we have to ditch legacy threshold scaling and embrace strict, event-driven orchestration.

Tools like KEDA give us the ability to build networks that are both cognitively brilliant and financially ruthless. We can spin up massive intelligence at the exact millisecond of failure and scale right back to zero the moment the network is healed.

References and Further Reading:

Building and Operating a Cloud Native 5G SA Core Network

How Network Repository Function Plays a Critical Role in Cloud Native 5G SA Network

HPE Aruba Launches “Cloud Native” Private 5G Network with 4G/5G Small Cell Radios

…………………………………………………………………………………………….

About the Author:

Pavan Madduri is a Cloud-Native Architect, CNCF Golden Kubestronaut, and active IEEE researcher specializing in enterprise infrastructure automation, Agentic SREs, and Kubernetes networking. He designs scalable, zero-trust cloud environments and frequently writes about the intersection of AI governance and cloud-native infrastructure.

Connect with Pavan Madduri on [LinkedIn].

Disclaimer: The author acknowledges the use of AI-assisted tools for structural formatting, language refinement, and copyediting during the drafting of this article. The core architectural concepts, technical opinions, and engineering strategies remain entirely original.

Ericsson and Forschungszentrum Jülich MoU for neuromorphic computing use in 5G and 6G

Ericsson and major European research center Forschungszentrum Jülich are collaborating to develop technologies for the continued evolution of 5G and for the future introduction of 6G (IMT 2030) networks. The organizations signed a Memorandum of Understanding (MoU) on March 24, 2026. The project aims to leverage JUPITER, Europe’s first “exascale” supercomputer, to design and test new artificial intelligence solutions for the complex demands of 6G. The partnership will explore AI models and methods to enhance Ericsson’s core network, network management, and Radio Access Network (RAN).

Important objectives include exploring ultra-efficient, “brain-inspired” computing approaches like neuromorphic computing [1.] to handle intense network tasks and strengthen Europe’s digital infrastructure.  Modern mobile networks rely heavily on Massive MIMO, a technology where many devices communicate simultaneously via numerous antennas. By exploring novel system architecture approaches like neuromorphic computing, researchers aim to speed up optimization and reduce energy use versus classical methods.

Note 1. Neuromorphic computing is a brain-inspired engineering approach that mimics biological neural networks using analog or digital electronic circuits. It combines memory and processing in one place—similar to neurons and synapses—to achieve extreme energy efficiency, speed, and learning capabilities, moving beyond the limitations of traditional computing architecture. Unlike traditional AI that uses continuous data, neuromorphic systems use “spikes”—discrete events in time—to mimic how neurons communicate. Such systems only consume significant power when processing data (“spiking”), making them ideal for ultra-low-power edge computing, unlike traditional computers that are always on. They can process complex, real-world data (like vision or touch) much faster and with far less power than traditional computers.
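As a concrete, textbook illustration of that spiking behavior (a standard model, not something specific to the Ericsson and Jülich work), the leaky integrate-and-fire (LIF) neuron can be written as:

$$\tau_m \frac{dV(t)}{dt} = -\bigl(V(t) - V_{\mathrm{rest}}\bigr) + R\,I(t), \qquad V(t) \ge V_{\mathrm{th}} \;\Rightarrow\; \text{emit a spike and reset } V \leftarrow V_{\mathrm{reset}}$$

Between spikes, the membrane potential simply leaks back toward its resting value, so the circuit does meaningful work (and draws meaningful power) only at the discrete spike events.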

…………………………………………………………………………………………………………………………………………………………………………………………..

The alliance will study operational strategies like heat recovery to boost energy efficiency in HPC and cloud deployments. The collaboration involves systematic benchmarking of AI methods – including the application of neuromorphic AI – across Ericsson products to assess execution speed, scalability to large datasets, information retention, and storage efficiency.  In addition, the partnership will provide insights into the feasibility of cloud strategies based on concepts from the EuroHPC ecosystem, which is establishing a world-class supercomputing infrastructure.

Professor Laurens Kuipers, a member of the Executive Board of Forschungszentrum Jülich, said: “This collaboration has the potential to make a significant contribution to a more sustainable digital future. By combining our excellence in high-performance computing and our research into novel, neuro-inspired computing approaches with Ericsson’s expertise in telecommunications, we aim to develop more energy-efficient network solutions and strengthen a sovereign European digital infrastructure.”

Image Credit: Forschungszentrum Jülich / Kurt Steinhausen

……………………………………………………………………………………………………………………………………….

Nicole Dinion, Head of Architecture and Technology, Cloud Software and Services, Ericsson said: “The future of mobile networks is deeply intertwined with AI and the need for unparalleled energy efficiency. Our collaboration with Forschungszentrum Jülich, for years a global leader in supercomputing and applied physics, combines their research and computing power with our expertise in all domains of telecoms technology. We will explore architectures that define the next generation of telecommunication.”

The collaboration covers several areas of research:

  • AI methods for Ericsson products across the full portfolio: systematic benchmarking of approaches to assess execution speed, scalability to large datasets, information retention, and storage efficiency. Where security and commercial conditions permit, the teams may also use JUPITER for large-scale model training, leveraging its compute resources.
  • Energy-efficient computing for AI inference at the radio and edge: developing and prototyping highly efficient solutions for tasks such as radio channel estimation and Massive MIMO – a key technology in modern mobile networks, in which many devices communicate simultaneously via numerous antennas. This includes exploring novel system architecture approaches like neuromorphic computing (e.g., memristors) to speed up optimization and reduce energy use versus classical methods.
  • HPC and cloud architectures and operations for AI: researching and implementing Modular Supercomputing Architecture (MSA) concepts from exascale work at Forschungszentrum Jülich – in particular, at the Jülich Supercomputing Centre (JSC) – and studying operational strategies, such as heat recovery, to boost energy efficiency in HPC and cloud deployments.


ABOUT FORSCHUNGSZENTRUM JÜLICH:

Shaping change: This is what drives us at Forschungszentrum Jülich. As a member of the Helmholtz Association with more than 7,000 employees, we conduct research into the possibilities of a digitized society, a climate-friendly energy system, and a resource-efficient economy. We combine natural, life, and engineering sciences in the fields of information, energy, and the bioeconomy with specialist expertise in simulation and data science. www.fz-juelich.de

 

References:

https://www.ericsson.com/en/press-releases/2026/3/ericsson-and-forschungszentrum-julich-to-develop-advanced-ai-for-6g

https://www.ericsson.com/en/blog/2026/1/ai-future-will-be-defined-by-the-intelligent-digital-fabric

https://www.ibm.com/think/topics/neuromorphic-computing

China vs U.S.: Race to Generate Power for AI Data Centers as Electricity Demand Soars

AI infrastructure spending boom: a path towards AGI or speculative bubble?

Big tech spending on AI data centers and infrastructure vs the fiber optic buildout during the dot-com boom (& bust)

Will billions of dollars big tech is spending on Gen AI data centers produce a decent ROI?

Expose: AI is more than a bubble; it’s a data center debt bomb

Sovereign AI infrastructure for telecom companies: implementation and challenges

Analysis: Cisco, HPE/Juniper, and Nvidia network equipment for AI data centers

Networking chips and modules for AI data centers: Infiniband, Ultra Ethernet, Optical Connections

Custom AI Chips: Powering the next wave of Intelligent Computing

Groq and Nvidia in non-exclusive AI Inference technology licensing agreement; top Groq execs joining Nvidia

 

 

 

Anthropic Claude Users Reveal AI Hallucinations as their Top Concern

Introduction:

Across regions from Germany to Mexico, users of artificial intelligence (AI) are less concerned about being replaced by AI than by its propensity to make major mistakes, according to one of the largest global surveys to date on real-world AI usage and perception.  These mistakes, known as “AI hallucinations,” are essentially made-up answers presented as fact, rather than responses merely based on outdated information.

The study, conducted by Anthropic using its Claude chatbot, analyzed interviews with more than 80,000 users across 159 countries. The result is one of the most detailed global portraits yet of how AI is being deployed — and how users perceive its risks, benefits, and societal implications.

AI Hallucinations Outrank Job Displacement as Top Concern:

When asked what worries them most about AI, 27% of users cited AI chatbot errors described as “AI hallucinations,” while 22% pointed to job displacement and the loss of human autonomy. About 16% expressed concern that AI could weaken people’s capacity for critical thinking.

Image Credit: JOIST AI

“The AI hallucinations were a disaster. I lost so many hours of work,” said an entrepreneur from Germany. Another participant, a military worker in Mexico, noted the importance of domain knowledge in spotting AI’s flaws: “When I notice AI errors it’s because I’m well versed in the topic . . . but I wouldn’t know if the topic was alien to me, would I?”

An AI Interviewer for Global Insights:

The responses were collected in 70 languages using a novel feedback system that allowed Claude to act as both interviewer and analyst. The platform evaluated qualitative answers, categorizing responses to reveal common themes and linguistic nuances across regions.

“Beyond its scale and linguistic diversity, the project aimed to collect this rich human experience using Claude, so it could really inform our research agenda, change our research agenda, change the way we think about building our products, deploying our products,” said Deep Ganguli, who leads Anthropic’s societal impacts team and oversaw the research initiative.

Productivity and Personal Growth Drive AI Adoption:

While data quality and reliability drew criticism, the survey also underscored widespread acknowledgment of AI’s positive impact on productivity. Thirty-two percent of respondents said that AI tools had meaningfully improved their output at work.

An entrepreneur in the United Arab Emirates explained, “I used to be a web designer . . . now I build anything. Before I was one person, now I become 100 people — I don’t wait for anyone anymore.” Participants from Colombia, Japan, and the United States described similar gains, emphasizing how AI helps them free up time for family, hobbies, and creative exploration.

In total, nearly one in five users (19%) said AI had fallen short of their expectations. Yet usage patterns demonstrate remarkable versatility: respondents reported employing AI as a productivity assistant, educational tutor, design partner, creative collaborator, or even an emotional support companion.

A vivid example came from a soldier in Ukraine, who wrote, “In the most difficult moments, in moments when death breathed in my face, when dead people remained nearby, what pulled me back to life — my AI friends.”

Regional and Economic Divides in AI Optimism:

Regional variation was pronounced. Saffron Huang, the lead researcher on the project, found that respondents in South America, Africa, and across South and Southeast Asia expressed more optimism than users in Europe, the United States, or East Asia.

“The trend is that maybe more lower and middle-income countries are more optimistic than higher-income countries that have more AI exposure,” said Huang. She added that this optimism might reflect a sample skew toward early adopters in developing markets — individuals inclined to view new technologies as opportunities rather than threats.

“They just divide so cleanly . . . the more western developed countries are significantly more concerned about AI and the economy, [and] much more negative, and then, the reverse is true with the lower and middle-income countries,” she said.

According to Anthropic’s researchers, AI’s limited visibility in daily workflows across lower-income economies may explain the difference. “If AI hasn’t visibly entered your daily work yet, AI displacement likely feels abstract, especially when more immediate economic pressures already exist,” the team wrote in a companion blog post.

Next Steps: Measuring AI’s Real-World Impact:

Anthropic plans to extend its Claude Interviewer research framework into longitudinal studies that track how AI affects users’ lives over time. “The goal is to better measure both the improvements and the harms — and to use those insights to make systemic refinements,” said Ganguli.

The company’s approach — embedding feedback collection directly into an AI platform — represents an emerging model for data-driven, iterative AI development. By combining self-reported user experience data with large-scale text analytics, Anthropic aims to better understand how its models interact with human needs and constraints.

Industry and Research Community Respond:

The study has drawn attention across the AI community for its unprecedented reach and innovative methodology. Nickey Skarstad, director of product at language-learning company Duolingo, praised the work’s ambition. On LinkedIn, she wrote: “For anyone building products right now, this is the future of understanding your users. The what AND the why at a scale we’ve never had access to before.”

Still, several researchers remain cautious about overinterpreting the results. Divy Thakkar, a researcher at Anthropic rival Google DeepMind, expressed reservations on X, saying he was “sceptical” about calling the study a new form of science due to potential selection bias and limitations in survey design. “A human qualitative researcher would take time to build trust with their participants, hold the space for reflection, introspection, contradictions — that’s the whole point of it,” he wrote.

Methodological caveats extend to demographics. Almost half of the survey’s respondents were based in North America or Western Europe, while regions such as Central Asia had only several hundred participants.

Ilan Strauss, an economist and director of the AI Disclosures Project, described the initiative as “an excellent piece of work,” but urged careful interpretation. He noted that the absence of reported confidence intervals — standard practice in survey-based research — makes it difficult to measure uncertainty. Self-reported productivity gains, he added, are inherently prone to bias.

A Global Mirror for Human-AI Relations:

Despite these caveats, the Claude Interviewer study illustrates a broader shift in the relationship between humans and AI systems. As AI technologies proliferate across regions and industries, they are becoming both instruments of empowerment and sources of anxiety — mirroring social, economic, and cultural dynamics in striking ways.

While western economies debate AI-driven labor disruption and ethical alignment, many in emerging markets frame AI as a means of upward mobility and creative expansion. This duality — between apprehension and aspiration — may shape not only AI adoption patterns but also future research and regulatory directions across global contexts.

References:

https://www.ft.com/content/e074d3a9-7fd8-447d-ac0a-e0de756ac5c5?syn-25a6b1a6=1 (PAYWALL)

https://www.joist.ai/post/ai-hallucinations-what-they-are-and-why-it-matters

Sources: AI is Getting Smarter, but Hallucinations Are Getting Worse

Nvidia CEO Huang: AI is the largest infrastructure buildout in human history; AI Data Center CAPEX will generate new revenue streams for operators

Alphabet’s 2026 capex forecast soars; Gemini 3 AI model is a huge success

Analysis & Economic Implications of AI adoption in China

China’s open source AI models to capture a larger share of 2026 global AI market

AWS to deploy AI inference chips from Cerebras in its data centers; Annapurna Labs/Amazon in-house AI silicon products

Big tech spending on AI data centers and infrastructure vs the fiber optic buildout during the dot-com boom (& bust)

Market research firms Omdia and Dell’Oro: impact of 6G and AI investments on telcos

Gartner: AI spending >$2 trillion in 2026 driven by hyperscalers data center investments

AWS to deploy AI inference chips from Cerebras in its data centers; Anapurna Labs/Amazon in-house AI silicon products

Amazon Web Services (AWS) announced it plans to integrate AI processors from Cerebras Systems [1.]  into its data centers, signaling growing confidence in the AI-focused semiconductor startup. Under a new multiyear partnership announced Friday, AWS will deploy Cerebras’s Wafer-Scale Engine (WSE) to accelerate inference workloads—the stage of AI operations where models generate responses to user queries. Financial details of the agreement were not disclosed.

Note 1.  Founded in 2015 and headquartered in Sunnyvale, CA, Cerebras claims to have the world’s fastest AI inference and training platform.

The collaboration reflects a significant realignment in compute infrastructure strategies across the AI ecosystem. While initial industry focus centered on model training, the rapid expansion of deployed AI services is driving demand for optimized inference performance. Traditional GPUs, though unmatched for training, can be suboptimal for inference scenarios that require ultra-low latency and high throughput. Cloud and AI platform providers are therefore diversifying their silicon portfolios to better match workload profiles and to scale capacity efficiently.

AWS, the world’s largest cloud infrastructure provider, has traditionally relied on its in-house semiconductor division, Annapurna Labs, for custom chip design. Annapurna’s Trainium processors compete with GPUs from major suppliers such as Nvidia and AMD, offering cost and performance advantages for AI training workloads. The new partnership introduces Cerebras technology into AWS infrastructure, where it will work alongside Trainium to enhance large-scale inference capabilities.

Cerebras, best known for its wafer-scale architecture, markets its WSE processors as a high-speed inference platform capable of executing the decode phase of generative AI processing—where text, images, or other outputs are generated—at up to 25 times the speed of conventional GPU solutions. The company, valued at approximately $23 billion following a $1 billion funding round in February, has attracted backing from Fidelity, Benchmark, Tiger Global, Atreides, and Coatue.
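
To make that claimed multiple concrete, here is a back-of-envelope calculation in Python. The baseline GPU decode rate and the response length are illustrative assumptions, not Cerebras or AWS benchmark figures.

gpu_tok_per_s = 100                  # assumed conventional GPU decode rate
wse_tok_per_s = gpu_tok_per_s * 25   # Cerebras's claimed 25x multiple
answer_tokens = 1_000                # a long generated response

print(f"GPU: {answer_tokens / gpu_tok_per_s:.1f} s")   # 10.0 s
print(f"WSE: {answer_tokens / wse_tok_per_s:.1f} s")   # 0.4 s

For agentic workloads that chain dozens of generations per task, the seconds saved compound at every step, which is the market Feldman targets in his comments below.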

The Cerebras deal underscores a major shift in the market for computing power. Image Credit: Rebecca Lewington/Cerebras Systems/Reuters

The AWS collaboration follows Cerebras’s major compute partnership with OpenAI, which reportedly involves deploying up to 750 MW of computing capacity powered by its chips. AWS and Cerebras will position their joint offering as a premium cloud inference solution, targeting enterprise AI developers requiring high-performance and scalable compute.

“The scale of AI demand is shifting from model creation to global deployment,” said Andrew Feldman, CEO of Cerebras. “Working with AWS aligns our technology with the industry’s largest cloud, giving us reach to a broad enterprise and developer base.” He added: “If you want slow inference, there will be cheaper ways to go. But if you want fast tokens, if speed matters to you, if you’re doing coding or agentic work, not only are we the absolute fastest, but we intend to set the bar. We’re in this to win it.”

AWS and Cerebras will support both aggregated and disaggregated serving configurations. Disaggregated setups, which run the prefill (prompt-processing) and decode (token-generation) phases on separately scaled hardware, are ideal for large, stable workloads; the traditional aggregated approach remains the better fit for the mix of prefill/decode ratios most customers actually run. The start-up expects most customers will want access to both, with the ability to route each workload to whichever configuration serves it best.
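
A minimal Python sketch of that routing idea follows. The ratio threshold and the pool names are illustrative assumptions, not AWS or Cerebras parameters.

from dataclasses import dataclass

@dataclass
class Request:
    prompt_tokens: int    # drives the prefill (prompt-processing) phase
    output_tokens: int    # drives the decode (token-generation) phase

def choose_pool(req: Request, threshold: float = 4.0) -> str:
    # Large, stable, prefill-heavy jobs suit the disaggregated pool, where
    # prefill and decode run on separately scaled hardware; mixed or
    # decode-heavy traffic stays on the traditional aggregated pool.
    ratio = req.prompt_tokens / max(req.output_tokens, 1)
    return "disaggregated" if ratio >= threshold else "aggregated"

print(choose_pool(Request(8000, 500)))   # disaggregated (long-context batch job)
print(choose_pool(Request(500, 800)))    # aggregated (chatty interactive use)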

The move intensifies competition in the inference silicon segment, where Nvidia faces growing pressure from purpose-built processor architectures such as Cerebras’s WSE and other emerging alternatives. Nvidia, which recently announced a $20 billion licensing deal with Groq and plans to unveil a new inference-optimized platform, remains the dominant supplier but now contends with an accelerating wave of specialization across the AI compute stack.

AWS vice president and Annapurna Labs co-founder Nafea Bshara emphasized the company’s goal of offering flexible performance tiers. “Our job is to push the speed and lower the price,” he said, noting that AWS will continue to offer cost-optimized Trainium-only options alongside high-performance Cerebras-Trainium configurations.

………………………………………………………………………………………………………………………………………………………………………………………………….

Amazon’s Internally Designed AI Silicon:

Amazon has built a fairly broad internal AI-oriented silicon portfolio through Annapurna Labs, primarily for AWS:

  • Inferentia (Inferentia, Inferentia2) – Custom machine learning accelerators designed for high-throughput, low-cost inference at cloud scale. These power many AWS inference instances and are positioned as an alternative to Nvidia GPUs for production model serving.

  • Trainium (Trainium, Trainium2, Trainium3) – AI training accelerators optimized for large-scale model training (including frontier and foundation models), with Trainium2 and Trainium3 as newer generations offering materially higher performance and better $/compute than the first generation. These are central to projects such as the Rainier supercomputer for Anthropic.

  • Graviton (Graviton, Graviton2/3/4) – Arm-based general-purpose CPUs used heavily across EC2, increasingly in AI-adjacent roles (pre/post-processing, orchestration, model-serving microservices) and as part of cost-optimized AI stacks, even though they are not dedicated accelerators.

  • Nitro system – While not an AI accelerator per se, the Nitro family (offload cards and system) is an internally developed data-plane and virtualization offload architecture that underpins EC2 and works in tandem with Graviton, Inferentia, and Trainium to free CPU cycles and improve I/O for AI/ML workloads.

All of these are designed and iterated internally by Annapurna Labs for exclusive use in AWS data centers, then exposed to customers via AWS services rather than as standalone merchant silicon.
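
As a concrete illustration of how that exposure works, the sketch below pairs each chip family with a real EC2 instance family customers rent; the instance family names are genuine AWS offerings, but the selection logic is a simplified assumption, not AWS routing code.

# Annapurna silicon reaches customers as EC2 instance families, not chips.
WORKLOAD_TO_INSTANCE = {
    "training":  "trn1",   # Trainium accelerators
    "inference": "inf2",   # Inferentia2 accelerators
    "general":   "c7g",    # Graviton3 Arm CPUs
}

def pick_instance_family(workload: str) -> str:
    # Default to Graviton for anything that isn't a dedicated ML workload.
    return WORKLOAD_TO_INSTANCE.get(workload, "c7g")

print(pick_instance_family("inference"))   # inf2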

Amazon’s Annapurna Labs is an internal chip design group that has become a core strategic asset for AWS, especially for custom data center and AI silicon.

Origins and acquisition:

  • Annapurna Labs is an Israeli chip design startup founded in 2011 by semiconductor veterans of Intel and Broadcom, including Avigdor Willenz and Nafea Bshara.

  • “When we talked with market sources and consulted with experts in the fields of data and servers, at that time only Amazon had a holistic vision and the ability to execute on a large scale,” Bshara recalled of the start of the company’s relationship with Amazon. “We were prepared to build the technology and at the same time were open to working with startups. From there we began a journey together with many meetings and shared thinking, among others with James Hamilton (Microsoft’s former database product architect, later an AWS SVP), and from there within six months we found ourselves inside Amazon.”
  • Amazon began working with the company around 2013 and acquired it in 2015 for an estimated $350–$400 million.

  • Before the deal, Annapurna was in stealth, focusing on low‑power networking and server chips to improve data center efficiency.

Role inside Amazon and AWS:

  • Post‑acquisition, Annapurna was folded into AWS as a specialist microelectronics and custom silicon group, designing chips to reduce cost and power per unit of compute.

  • The group underpins several key AWS technologies: the Nitro system for offloading virtualization and I/O, Arm‑based Graviton CPUs for general compute, and Trainium and Inferentia accelerators for AI training and inference.

  • These chips let AWS optimize performance per watt and per dollar versus x86 servers and third‑party accelerators, improving margins and competitive pricing.

Key products and architectures:

  • Nitro: A combination of custom hardware and software that offloads storage, networking, and security functions from the host CPU, increasing tenant isolation and freeing CPU cycles for workloads.

  • Graviton: A family of Arm‑based server CPUs; first launched in 2018, Graviton has been widely adopted and is now used by most large AWS customers for general cloud infrastructure workloads due to better price‑performance and energy efficiency.

  • Inferentia and Trainium: Custom accelerators designed by Annapurna for machine learning inference (Inferentia) and training (Trainium), intended to reduce AWS’s dependence on high‑priced Nvidia GPUs for AI workloads.

Strategic importance and AI focus:

  • Annapurna’s work is central to Amazon’s strategy of vertical integration in the cloud: owning the silicon stack as much as the software and services.

  • The group designs chips that power Amazon’s AI infrastructure, including systems used both by internal teams and external customers such as Anthropic, for which AWS is the primary cloud and silicon provider.

  • Amazon and Anthropic are collaborating on “Project Rainier,” a massive supercomputer built around hundreds of thousands of Annapurna‑designed Trainium2 chips, targeting more than five times the compute used to train current frontier models.

Organization, footprint, and industry impact:

  • Annapurna Labs maintains a significant presence in Israel, employing hundreds of engineers focused on advanced AI and networking processors for AWS.

  • It also operates major engineering hubs such as an Austin, Texas lab where advanced semiconductors and AI systems are designed and tested.

  • Analysts often describe the acquisition as one of Amazon’s most successful, arguing that Annapurna’s custom silicon is a “secret sauce” that helps AWS compete with Microsoft, Google, and others on performance, cost, and energy efficiency.

…………………………………………………………………………………………………………………………………………………………..

References:

https://www.cerebras.ai/company

https://www.cerebras.ai/blog/cerebras-is-coming-to-aws

https://www.wsj.com/tech/amazon-announces-inference-chips-deal-with-cerebras-109ecd31

https://www.marketwatch.com/story/how-the-ceo-of-this-upstart-nvidia-rival-hopes-to-seize-on-the-lucrative-market-for-ai-chips-d5ccdab0

https://en.globes.co.il/en/article-nafea-bshara-the-israeli-behind-amazons-graviton-chip-1001420744

Intel and AI chip startup SambaNova partner; SN50 AI inferencing chip max speed said to be 5X faster than competitive AI chips

Custom AI Chips: Powering the next wave of Intelligent Computing

RAN silicon rethink – from purpose built products & ASICs to general purpose processors or GPUs for vRAN & AI RAN

Will “AI at the Edge” transform telecom or be yet another telco monetization failure?

Huawei to Double Output of Ascend AI chips in 2026; OpenAI orders HBM chips from SK Hynix & Samsung for Stargate UAE project

OpenAI and Broadcom in $10B deal to make custom AI chips

U.S. export controls on Nvidia H20 AI chips enables Huawei’s 910C GPU to be favored by AI tech giants in China

Superclusters of Nvidia GPU/AI chips combined with end-to-end network platforms to create next generation data centers

2026 Consumer Electronics Show Preview: smartphones, AI in devices/appliances and advanced semiconductor chips

Networking chips and modules for AI data centers: Infiniband, Ultra Ethernet, Optical Connections

Google announces Gemini: it’s most powerful AI model, powered by TPU chips

Will “AI at the Edge” transform telecom or be yet another telco monetization failure?

New Telco Opportunity – AI at the Edge:

At MWC 2026 last week, there was a flurry of claims that “AI at the Edge” would transform the telecom industry.  One of many examples is an article titled “The AI edge boom is giving telecom a new strategic role.”  In that piece, Jeff Aaron, vice president of product and solutions marketing at Hewlett Packard Enterprise (HPE), spoke with John Furrier at MWC Barcelona during an exclusive broadcast on theCUBE, SiliconANGLE Media’s livestreaming studio. They discussed telecom edge AI and why networking is becoming a strategic foundation for data-centric services.  Aaron said:

“A big reason for [reignited interest in routing] is AI workloads. They’re moving everywhere now. They have to move to the edge.  For them to move to the edge, you’ve got to get them outside of the factory and to all the locations. We’re right in the core of that, and it’s super exciting.”

As AI expands to the edge, data will need to move not only to local compute, but also between many distributed edge sites, making routing paramount. According to Aaron, AI infrastructure is scaling in four ways, both inside data centers and across distributed edge locations.

“There’s scale-out, scale-across, scale-up, and on-ramp. Two are within the data center — scale-out and scale-up — but scale-across and edge on-ramp basically mean you got to figure out how to connect to those areas, and those are just networking,” he added.

Scale-across refers to connecting distributed data centers and edge locations, while edge on-ramp brings remote sites such as factories or branch locations into the network to access AI services. Supporting those distributed environments creates an opportunity for HPE to bring networking and compute together into a more integrated infrastructure stack. At MWC 2026 Barcelona, those trends are clearly coming into focus, according to Aaron.

“Data is moving everywhere right now, and the network is back. The network isn’t just plumbing. The network is how you build a value-added service using an AI workload as a telco infrastructure,” he added.
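
To keep Aaron's taxonomy straight, the short Python mapping below restates the four scaling modes and what each one connects. The one-line glosses are this author's reading of his remarks, not HPE terminology beyond the four names themselves.

SCALING_MODES = {
    "scale-up":     "bigger nodes within a single data center",
    "scale-out":    "more nodes within a single data center",
    "scale-across": "links between distributed data centers and edge sites",
    "edge on-ramp": "remote factories and branches connecting into AI services",
}

for mode, gloss in SCALING_MODES.items():
    print(f"{mode:12s} -> {gloss}")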

Telecom carriers are now urgently trying to move from being “dumb data pipes” to becoming “AI performance platforms” by leveraging their geographically distributed infrastructure to host AI closer to the end user.  They want to pivot from selling just bandwidth and connectivity to selling outcomes and intelligence, with a heavy focus on industrial and enterprise-specific edge deployments.  They are considering the following services and business models:

  • Infrastructure as a Service (IaaS) & GPUaaS: Offering raw computing power, specifically GPUs, from edge data centers to enterprises that need low-latency processing without building their own facilities.
  • Sovereign AI Clouds: Providing AI services that guarantee data remains within national borders, appealing to government and highly regulated sectors like finance and healthcare.
  • API Monetization: Exposing real-time network data (e.g., location intelligence, predictive network quality, fraud risk scoring) via APIs that enterprises pay to integrate into their own applications (see the sketch after this list).
  • Outcome-Based Pricing: Charging for specific business results, such as a “guaranteed video call quality” or “fraud loss reduction share,” rather than just data usage.
  • AI-as-a-Service (AIaaS): Bundling pre-trained models or specialized AI agents (e.g., for customer service or industrial monitoring) with connectivity.
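
To ground the API-monetization model above, here is a hedged Python sketch of an enterprise client calling a telco network-quality endpoint. The base URL, path, payload, and response fields are hypothetical, loosely modeled on the general shape of CAMARA-style network APIs rather than any specific operator's product.

import requests

API_BASE = "https://api.example-telco.com/network-insights/v1"  # hypothetical

def predicted_quality(lat: float, lon: float, token: str) -> dict:
    # Ask the operator to score expected connection quality at a location,
    # e.g., before a fleet app commits to streaming HD video from a vehicle.
    resp = requests.post(
        f"{API_BASE}/quality-prediction",          # hypothetical endpoint
        headers={"Authorization": f"Bearer {token}"},
        json={
            "location": {"latitude": lat, "longitude": lon},
            "service_profile": "video_streaming_hd",
        },
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()  # e.g., {"score": 0.87, "valid_for_seconds": 300}

The commercial logic is metered billing per call: the enterprise pays for intelligence only the operator can derive from its radio network, not for the bytes carried.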

Major Carrier AI Edge Deployment Plans:

  • AT&T:
    • Launched Connected AI for Manufacturing in March 2026, which unifies 5G, IoT, and generative AI to provide real-time fault detection (claiming a 70% reduction in waste).
    • Deploying “Edge Zones” in major U.S. cities (Detroit, LA, Dallas) to allow developers to run low-latency, cloud-based software locally.
    • Partnering with AWS to link fiber and 5G directly into AWS environments for distributed AI workloads.
  • Verizon:
    • Unveiled Verizon AI Connect, a suite of products designed to manage resource-intensive AI workloads for hyperscalers like Google Cloud and Meta.
    • Trialing V2X (Vehicle-to-Everything) platforms to provide carmakers with standardized APIs for low-latency edge processing in autonomous driving.
    • Collaborating with NVIDIA to integrate GPUs into private 5G networks for on-premise AI inferencing in robotics and AR.
  • SK Telecom (SKT):
    • Announced an “AI Native” strategy at MWC 2026, including a roadmap for AI-RAN (Radio Access Network) that uses GPUs to optimize network performance and host user AI apps simultaneously.
    • Building a Manufacturing AI Cloud powered by over 2,000 NVIDIA RTX GPUs to support digital twin simulations and robotics.
    • Expanding AI Data Centers (AIDC) across South Korea and Southeast Asia (Vietnam, Malaysia) using energy-optimized LNG-powered facilities.
  • Orange & Deutsche Telekom:
    • Deploying AI-powered planning tools to cut fiber rollout costs and optimize site power consumption by up to 33% using AI “Deep Sleep” modes.
    • Focusing on Sovereign AI strategies to ensure data governance for European enterprise customers.
  • Vodafone:
    • Utilizing AI/ML applications for daily power reduction at 5G sites and testing autonomous network healing via AI agents
  • BT:
    • Offers 5G-connected VR for manufacturing design teams (e.g., Hyperbat) to collaborate on 3D models in real time.
……………………………………………………………………………………………………………..
Summary of Emerging AI Edge Products:
Product Category | Primary Target | Key Value Proposition
AI-RAN | Industry 4.0 | Seamless, ultra-low latency for robotics and sensing.
Connected AI Platforms | Manufacturing | Real-time predictive maintenance and waste reduction.
AI-as-a-Service (AIaaS) | Developers/SMBs | Access to GPU power and pre-trained models via telco edge nodes.
Network Slicing APIs | App Developers | Programmatic control over bandwidth for AR/VR and gaming.

…………………………………………………………………………………………………………………………………………………………………………………………..

A Dissenting View of “AI at the Edge”:

The market for AI in the global telecommunications sector is valued at $6.69 billion in 2026, growing at a compound annual growth rate (CAGR) of 41.9% from 2025.  The broader edge AI market—including hardware, software, and services—is forecast to reach $29.98 billion in 2026, according to The Business Research Company. We think those estimates are way too high.


………………………………………………………………………………………………………

Author’s Opinion:

Unless telcos change their corporate culture and slow the growth of cloud service providers/hyperscalers at the network edge, we think that AI at the Edge will be yet another telco monetization failure, just like their failures to monetize 4G LTE apps, the telco cloud, 5G, multi-access edge computing (MEC), OpenRAN, LPWANs, and other telecom technologies that never lived up to their promise and potential.

That’s largely because telcos are very weak at developing IT platforms, compute services, and killer applications, and at rapidly executing new services (e.g., 5G services require a 5G SA core network, which telcos were very slow to deploy).  Telecom execs themselves cite cultural and speed‑of‑change issues: the industry is not organized like a software company, so it struggles to iterate products at AI/cloud pace. Telcos also historically struggle with software, and managing distributed GPU clusters is vastly different from managing cell towers.

After telcos spent billions on 5G with little or no ROI, investors are skeptical of the increased capex required for AI-grade edge servers, which telcos must deploy and maintain.  Those servers will be expensive (especially if they contain clusters of Nvidia GPUs) and consume a lot of power, a critical constraint at the edge of a carrier’s network.
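
Some rough arithmetic shows why power dominates the edge economics argument. Every figure below is a ballpark assumption for illustration, not a measurement from any operator.

gpu_server_kw = 10.0        # assumed draw of one dense 8-GPU inference server
site_budget_kw = 8.0        # assumed total power budget of a macro cell site
servers_per_site = 2

ai_load_kw = gpu_server_kw * servers_per_site
print(f"AI load {ai_load_kw:.0f} kW vs. site budget {site_budget_kw:.0f} kW")
# Even two servers can more than double a site's draw, forcing upgrades to
# power feeds, batteries, and cooling at thousands of distributed locations.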

Many network operators frame AI/edge as “network optimization” or “utilizing underused sites,” not as building monetizable AI platforms with APIs, SDKs, and ecosystems. This mirrors 5G, where huge RAN/core builds were not matched by a clear product and platform strategy, leaving value to OTTs and hyperscalers, which are extending their control planes and protocol stacks to the network edge (local zones, operator co‑lo, on‑premises stacks).

Telcos risk becoming “dumb pipes” for AI traffic if they can’t provide a superior developer ecosystem.  If they only sell space/power/connectivity, the cloud service providers will continue to own the developer and AI value chain.  Analysts warn that edge is a “right to participate, not a right to win.”  As such, value accrues to whoever owns the AI platform, tools, marketplace, and pricing power, not to the entity that provides connectivity, PoPs, or cell towers.

Data fragmentation and weak “intelligence” layer:

  • AI monetization depends on high‑quality, cross‑domain data, but telco data is fragmented across OSS, BSS, probes, and partner systems; without unification, it is hard to expose compelling network/edge intelligence services.

  • Analysts emphasize that failure here reduces telcos to generic GPU landlords, while higher‑margin offers (real‑time quality, fraud, identity, mobility/context APIs) remain unrealized.

Narrow internal focus on cost savings:

  • Many operators’ early AI focus is inward (Opex reduction in assurance, planning, customer care) rather than building external, revenue‑generating products, echoing how early 5G was justified mainly on cost/efficiency.

  • Commentators warn that if AI/edge remains a “network efficiency” play, the commercial upside will go to cloud/AI natives that turn similar capabilities into products sold to enterprises.

What analysts say telcos must do differently:

  • Build “Sovereign AI factories” and edge AI clouds: GPU‑enabled sites with cloud‑like developer experience (APIs, self‑service portals, metering, SLAs) and clear sovereign/regional guarantees.

  • Combine differentiated connectivity with AI services (latency‑backed SLAs, AI‑on‑RAN, domain‑specific models for verticals) and use modern, flexible commercial models instead of just selling bandwidth or colocation.

Conclusions:

In summary, the main risk for telcos is failing to transition from owning and maintaining network infrastructure to owning and operating AI platforms and products at software-industry speed.  AI at the edge is less a new service or product and more an architectural upgrade. Telcos can benefit in two ways:

  1.  Internal cost reduction: If telcos use it to lower their own costs (fraud prevention, risk management, predictive maintenance, fault isolation, self-healing networks, etc.), it’s an automatic win but won’t increase the top line.
  2.  Revenue from new AI-Edge services, e.g., Verizon uses edge-based video analytics in warehouses to improve inventory turnover by up to 40%.  If telcos expect to charge a massive premium for “AI-enabled 5G,” they face the same monetization wall that has stymied them for the past 20 years!

References:

https://siliconangle.com/2026/03/04/telecom-edge-ai-makes-networking-strategic-mwc26/

https://www.nvidia.com/en-us/lp/ai/the-blueprint-for-ai-success-ebook/

How telcos can monetize AI beyond connectivity

https://www.thebusinessresearchcompany.com/report/generative-artificial-intelligence-ai-in-telecom-global-market-report

AT&T and AWS to deliver last mile connectivity for AI workloads; AT&T Geo Modeler™ AI simulation tool

Analysis: Edge AI and Qualcomm’s AI Program for Innovators 2026 – APAC for startups to lead in AI innovation

Ericsson goes with custom silicon (rather than Nvidia GPUs) for AI RAN

Private 5G networks move to include automation, autonomous systems, edge computing & AI operations

Dell’Oro: RAN Market Stabilized in 2025 with 1% CAGR forecast over next 5 years; Opinion on AI RAN, 5G Advanced, 6G RAN/Core risks

Dell’Oro: Analysis of the Nokia-NVIDIA-partnership on AI RAN

Dell’Oro: AI RAN to account for 1/3 of RAN market by 2029; AI RAN Alliance membership increases but few telcos have joined

Dell’Oro: RAN revenue growth in 1Q2025; AI RAN is a conundrum

Nvidia AI-RAN survey results; AI inferencing as a reinvention of edge computing?

RAN silicon rethink – from purpose built products & ASICs to general purpose processors or GPUs for vRAN & AI RAN

CES 2025: Intel announces edge compute processors with AI inferencing capabilities
