Nvidia
CES 2025: Intel announces edge compute processors with AI inferencing capabilities
At CES 2025 today, Intel unveiled the new Intel® Core™ Ultra (Series 2) processors, designed to revolutionize mobile computing for businesses, creators and enthusiast gamers. Intel said “the new processors feature cutting-edge AI enhancements, increased efficiency and performance improvements.”
“Intel Core Ultra processors are setting new benchmarks for mobile AI and graphics, once again demonstrating the superior performance and efficiency of the x86 architecture as we shape the future of personal computing,” said Michelle Johnston Holthaus, interim co-CEO of Intel and CEO of Intel Products. “The strength of our AI PC product innovation, combined with the breadth and scale of our hardware and software ecosystem across all segments of the market, is empowering users with a better experience in the traditional ways we use PCs for productivity, creation and communication, while opening up completely new capabilities with over 400 AI features. And Intel is only going to continue bolstering its AI PC product portfolio in 2025 and beyond as we sample our lead Intel 18A product to customers now ahead of volume production in the second half of 2025.”
Intel also announced new edge computing processors, designed to provide scalability and superior performance across diverse use cases. Intel Core Ultra processors were said to deliver remarkable power efficiency, making them ideal for AI workloads at the edge, with performance gains that surpass competing products in critical metrics like media processing and AI analytics. Those edge processors are targeted at compute servers running in hospitals, retail stores, factory floors and other “edge” locations that sit between big data centers and end-user devices. Such locations are becoming increasingly important to telecom network operators hoping to sell AI capabilities, private wireless networks, security offerings and other services to those enterprise locations.
Intel edge products launching today at CES include:
- Intel® Core™ Ultra 200S/H/U series processors (code-named Arrow Lake).
- Intel® Core™ 200S/H series processors (code-named Bartlett Lake S and Raptor Lake H Refresh).
- Intel® Core™ 100U series processors (code-named Raptor Lake U Refresh).
- Intel® Core™ 3 processor and Intel® Processor (code-named Twin Lake).
“Intel has been powering the edge for decades,” said Michael Masci, VP of product management in Intel’s edge computing group, during a media presentation last week. According to Masci, AI is beginning to expand the edge opportunity through inferencing [1.]. “Companies want more local compute. AI inference at the edge is the next major hotbed for AI innovation and implementation,” he added.
Note 1. Inferencing in AI refers to the process where a trained AI model makes predictions or decisions based on new data, rather than previously stored “training models.” It’s essentially AI’s ability to apply learned knowledge to fresh inputs in real time. Edge computing plays a critical role in inferencing because it brings the computation closer to users. That lowers latency (much faster AI responses) and can also reduce bandwidth costs and improve privacy and security.
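To put a rough number on the latency point in Note 1, here is a minimal sketch of round-trip propagation delay over fiber. It assumes light travels at about 2e8 m/s in fiber, and both distances are hypothetical illustrations rather than figures from Intel. It ignores queuing, processing and radio delays, which often dominate in practice.

```python
# Rough round-trip propagation delay over optical fiber.
# Assumption: light travels about 2e8 m/s in fiber (roughly two-thirds of c).
# Both distances below are hypothetical, not figures from the article.
FIBER_SPEED_M_PER_S = 2.0e8


def round_trip_ms(distance_km: float) -> float:
    """Return the two-way propagation delay in milliseconds."""
    one_way_s = (distance_km * 1000.0) / FIBER_SPEED_M_PER_S
    return 2.0 * one_way_s * 1000.0


edge_rtt_ms = round_trip_ms(10)      # hypothetical on-premises edge server
cloud_rtt_ms = round_trip_ms(1000)   # hypothetical distant cloud region

print(f"edge:  {edge_rtt_ms:.2f} ms")
print(f"cloud: {cloud_rtt_ms:.2f} ms")
```

Under these assumptions the propagation component alone differs by a factor of 100, which is the part of the latency that moving inference to the edge can remove.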
Editor’s Note: Intel’s edge compute business – the one pursuing AI inferencing – is in its Client Computing Group (CCG) business unit. Intel’s chips for telecom operators reside inside its NEX business unit.
Intel’s Masci specifically called out Nvidia’s GPU chips, claiming Intel’s new silicon lineup supports up to 5.8x faster performance and better usage per watt. Indeed, Intel claims its “Core™ Ultra 7 processor uses about one-third fewer TOPS (Trillions of Operations Per Second) than Nvidia’s Jetson AGX Orin, but beats its competitor with media performance that is up to 5.6 times faster, video analytics performance that is up to 3.4x faster and performance per watt per dollar up to 8.2x better.”
………………………………………………………………………………………………………………………………………………………………………………………………………………………………………………………….
However, Nvidia has been using inference in its AI chips for quite some time. Company officials last month confirmed that 40% of Nvidia’s revenues come from AI inference, rather than AI training efforts in big data centers. Colette Kress, Nvidia Executive Vice President and Chief Financial Officer, said, “Our architecture allows an end-to-end scaling approach for them to do whatever they need to in the world of accelerated computing and AI. And we’re a very strong candidate to help them, not only with that infrastructure, but also with the software.”
“Inference is super hard. And the reason why inference is super hard is because you need the accuracy to be high on the one hand. You need the throughput to be high so that the cost could be as low as possible, but you also need the latency to be low,” explained Nvidia CEO Jensen Huang during his company’s recent quarterly conference call.
“Our hopes and dreams is that someday, the world does a ton of inference. And that’s when AI has really succeeded, right? It’s when every single company is doing inference inside their companies for the marketing department and forecasting department and supply chain group and their legal department and engineering, and coding, of course. And so we hope that every company is doing inference 24/7.”
……………………………………………………………………………………………………………………………………………………………………………………………………………………………………………………………….
Sadly for its many fans (including this author), Intel continues to struggle in both data center processors and AI/GPU chips. The Wall Street Journal recently reported that “Intel’s perennial also-ran, AMD, actually eclipsed Intel’s revenue for chips that go into data centers. This is a stunning reversal: In 2022, Intel’s data-center revenue was three times that of AMD.”
Even worse for Intel, more and more of the chips that go into data centers are GPUs and Intel has minuscule market share of these high-end chips. GPUs are used for training and delivering AI. The WSJ notes that many of the companies spending the most on building out new data centers are switching to chips that have nothing to do with Intel’s proprietary architecture, known as x86, and are instead using a combination of a competing architecture from ARM and their own custom chip designs. For example, more than half of the CPUs Amazon has installed in its data centers over the past two years were its own custom chips based on ARM’s architecture, Dave Brown, Amazon vice president of compute and networking services, said recently.
This displacement of Intel is being repeated all across the big providers and users of cloud computing services. Microsoft and Google have also built their own custom, ARM-based CPUs for their respective clouds. In every case, companies are moving in this direction because of the kind of customization, speed and efficiency that custom silicon supports.
References:
https://www.intel.com/content/www/us/en/newsroom/news/2025-ces-client-computing-news.html#gs.j0qbu4
https://www.intel.com/content/www/us/en/newsroom/news/2025-ces-client-computing-news.html#gs.j0qdhd
https://www.wsj.com/tech/intel-microchip-competitors-challenges-562a42e3
Massive layoffs and cost cutting will decimate Intel’s already tiny 5G network business
WSJ: China’s Telecom Carriers to Phase Out Foreign Chips; Intel & AMD will lose out
The case for and against AI-RAN technology using Nvidia or AMD GPUs
Superclusters of Nvidia GPU/AI chips combined with end-to-end network platforms to create next generation data centers
FT: Nvidia invested $1bn in AI start-ups in 2024
AI winner Nvidia faces competition with new super chip delayed
AI Frenzy Backgrounder; Review of AI Products and Services from Nvidia, Microsoft, Amazon, Google and Meta; Conclusions
Networking chips and modules for AI data centers: Infiniband, Ultra Ethernet, Optical Connections
A growing portion of the billions of dollars being spent on AI data centers will go to the suppliers of networking chips, lasers, and switches that integrate thousands of GPUs and conventional micro-processors into a single AI computer cluster. AI can’t advance without advanced networks, says Nvidia’s networking chief Gilad Shainer. “The network is the most important element because it determines the way the data center will behave.”
Networking chips now account for just 5% to 10% of all AI chip spending, said Broadcom CEO Hock Tan. As the size of AI server clusters hits 500,000 or a million processors, Tan expects that networking will become 15% to 20% of a data center’s chip budget. A data center with a million or more processors will cost $100 billion to build.
The firms building the biggest AI clusters are the hyperscalers, led by Alphabet’s Google, Amazon.com, Facebook parent Meta Platforms, and Microsoft. Not far behind are Oracle, xAI, Alibaba Group Holding, and ByteDance. Earlier this month, Bloomberg reported that capex for those four hyperscalers would exceed $200 billion this year, making the year-over-year increase as much as 50%. Goldman Sachs estimates that AI data center spending will rise another 35% to 40% in 2025. Morgan Stanley expects Amazon and Microsoft to lead the pack with $96.4bn and $89.9bn of capex respectively, while Google and Meta will follow at $62.6bn and $52.3bn.
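As a quick arithmetic check, the four Morgan Stanley estimates quoted above can be summed. This is only a sketch of the addition; the dictionary keys are labels chosen here for illustration.

```python
# Morgan Stanley capex estimates quoted in the text, in billions of US dollars.
capex_bn = {
    "Amazon": 96.4,
    "Microsoft": 89.9,
    "Google": 62.6,
    "Meta": 52.3,
}

total_bn = sum(capex_bn.values())
print(f"Combined capex: ${total_bn:.1f}bn")

# Share of the combined total attributed to each company.
for name, value in capex_bn.items():
    print(f"{name}: {100 * value / total_bn:.1f}%")
```

The four estimates add up to roughly $300 billion, with Amazon and Microsoft together accounting for a little over 60% of it.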
AI compute server architectures began scaling in recent years for two reasons.
1.] High end processor chips from Intel neared the end of speed gains made possible by shrinking a chip’s transistors.
2.] Computer scientists at companies such as Google and OpenAI built AI models that performed amazing feats by finding connections within large volumes of training material.
As the components of these “Large Language Models” (LLMs) grew to millions, billions, and then trillions, they began translating languages, doing college homework, handling customer support, and designing cancer drugs. But training an AI LLM is a huge task, as it calculates across billions of data points, rolls those results into new calculations, then repeats. Even with Nvidia accelerator chips to speed up those calculations, the workload has to be distributed across thousands of Nvidia processors and run for weeks.
To keep up with the distributed computing challenge, AI data centers all have two networks:
- The “front end” network which sends and receives data to/from external users —like the networks of every enterprise data center or cloud-computing center. It’s placed on the network’s outward-facing front end or boundary and typically includes equipment like high end routers, web servers, DNS servers, application servers, load balancers, firewalls, and other devices which connect to the public internet, IP-MPLS VPNs and private lines.
- A “back end” network that connects every AI processor (GPUs and conventional MPUs) and memory chip with every other processor within the AI data center. “It’s just a supercomputer made of many small processors,” says Ram Velaga, Broadcom’s chief of core switching silicon. “All of these processors have to talk to each other as if they are directly connected.” AI’s back-end networks need high bandwidth switches and network connections. Delays and congestion are expensive when each Nvidia compute node costs as much as $400,000. Idle processors waste money. Back-end networks carry huge volumes of data. When thousands of processors are exchanging results, the data crossing one of these networks in a second can equal all of the internet traffic in America.
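The exchange of results that the back-end network carries can be illustrated with a toy data-parallel training step. The sketch below assumes the exchange takes the form of an all-reduce (every worker ends up with the element-wise average of all workers' gradients). The article does not name the collective operation used, so that choice, the function names and the numbers are all illustrative assumptions.

```python
from typing import List


def all_reduce_mean(per_worker: List[List[float]]) -> List[float]:
    """Element-wise average across workers.

    In a real cluster this is the step that crosses the back-end network:
    each worker contributes its vector and every worker receives the result.
    """
    n_workers = len(per_worker)
    length = len(per_worker[0])
    return [sum(w[i] for w in per_worker) / n_workers for i in range(length)]


def training_step(weights: List[float],
                  per_worker_grads: List[List[float]],
                  lr: float) -> List[float]:
    """Apply one gradient-descent update using the averaged gradients."""
    avg = all_reduce_mean(per_worker_grads)
    return [w - lr * g for w, g in zip(weights, avg)]


weights = [1.0, 2.0]
grads = [[0.5, 1.0], [1.5, 3.0]]  # gradients from two hypothetical workers
new_weights = training_step(weights, grads, lr=0.5)
print(new_weights)
```

Because every step waits for the exchange to finish, a slow or congested link leaves all participating processors idle, which is why delays on this network are so costly.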
Nvidia became one of today’s largest vendors of network gear via its acquisition of Israel-based Mellanox in 2020 for $6.9 billion. CEO Jensen Huang and his colleagues realized early on that AI workloads would exceed a single box. They started using InfiniBand, a network designed for scientific supercomputers, supplied by Mellanox. InfiniBand became the standard for AI back-end networks.
While most AI dollars still go to Nvidia GPU accelerator chips, back-end networks are important enough that Nvidia has large networking sales. In the September quarter, those network sales grew 20%, to $3.1 billion. However, Ethernet is now challenging InfiniBand’s lock on AI networks. Fortunately for Nvidia, its Mellanox subsidiary also makes high speed Ethernet hardware modules. For example, xAI uses Nvidia Ethernet products in its record-size Colossus system.
While current versions of Ethernet lack InfiniBand’s tools for memory and traffic management, those are now being added in a version called Ultra Ethernet [1.]. Many hyperscalers think Ethernet will outperform InfiniBand, as clusters scale to hundreds of thousands of processors. Another attraction is that Ethernet has many competing suppliers. “All the largest guys—with an exception of Microsoft—have moved over to Ethernet,” says an anonymous network industry executive. “And even Microsoft has said that by summer of next year, they’ll move over to Ethernet, too.”
Note 1. Primary goals and mission of Ultra Ethernet Consortium (UEC): Deliver a complete architecture that optimizes Ethernet for high performance AI and HPC networking, exceeding the performance of today’s specialized technologies. UEC specifically focuses on functionality, performance, TCO, and developer and end-user friendliness, while minimizing changes to only those required and maintaining Ethernet interoperability. Additional goals: Improved bandwidth, latency, tail latency, and scale, matching tomorrow’s workloads and compute architectures. Backwards compatibility to widely-deployed APIs and definition of new APIs that are better optimized to future workloads and compute architectures.
……………………………………………………………………………………………………………………………………………………………………………………………………………………………….
Ethernet back-end networks offer a big opportunity for Arista Networks, which builds switches using Broadcom chips. In the past two years, AI data centers became an important business for Arista. AI provides sales to Arista switch rivals Cisco and Juniper Networks (soon to be a part of Hewlett Packard Enterprise), but those companies aren’t as established among hyperscalers. Analysts expect Arista to get more than $1 billion from AI sales next year and predict that the total market for back-end switches could reach $15 billion in a few years. Three of the five big hyperscale operators are using Arista Ethernet switches in back-end networks, and the other two are testing them. Arista CEO Jayshree Ullal (a former SCU EECS grad student of this author, an ex-adjunct Professor) says that back-end network sales seem to pull along more orders for front-end gear, too.
The network chips used for AI switching are feats of engineering that rival AI processor chips. Cisco makes its own custom Ethernet switching chips, but some 80% of the chips used in other Ethernet switches come from Broadcom, with the rest supplied mainly by Marvell. These switch chips now move 51 terabits of data a second; it’s the same amount of data that a person would consume by watching videos for 200 days straight. Next year, switching speeds will double.
The other important parts of a network are connections between computing nodes and cables. As the processor count rises, connections increase at a faster rate. A 25,000-processor cluster needs 75,000 interconnects. A million processors will need 10 million interconnects. More of those connections will be fiber optic, instead of copper or coax. As networks speed up, copper’s reach shrinks. So, expanding clusters have to “scale-out” by linking their racks with optics. “Once you move beyond a few tens of thousand, or 100,000, processors, you cannot connect anything with copper—you have to connect them with optics,” Velaga says.
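The claim that connections grow faster than processors follows directly from the two data points above. The sketch below only divides the quoted figures; nothing beyond those four numbers is assumed.

```python
# (processors, interconnects) pairs quoted in the text.
clusters = [
    (25_000, 75_000),
    (1_000_000, 10_000_000),
]

links_per_processor = [links / procs for procs, links in clusters]
processor_growth = clusters[1][0] / clusters[0][0]
interconnect_growth = clusters[1][1] / clusters[0][1]

for (procs, links), ratio in zip(clusters, links_per_processor):
    print(f"{procs:>9,} processors -> {links:>10,} links ({ratio:.0f} per processor)")

print(f"Processors grow {processor_growth:.0f}x, "
      f"interconnects grow {interconnect_growth:.1f}x")
```

Going from 25,000 to a million processors is a 40x increase, while the interconnect count rises about 133x, so the links per processor climb from 3 to 10.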
AI processing chips (GPUs) exchange data at about 10 times the rate of a general-purpose processor chip. Copper has been the preferred conduit because it’s reliable and requires no extra power. At current network speeds, copper works well at lengths of up to five meters. So, hyperscalers have tried to “scale-up” within copper’s reach by packing as many processors as they can within each shelf, and rack of shelves.
Back-end connections now run at 400 gigabits per second, which is equal to a day and a half of video viewing. Broadcom’s Velaga says network speeds will rise to 800 gigabits in 2025, and 1.6 terabits in 2026.
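The two video-viewing comparisons (51 terabits for 200 days, 400 gigabits for a day and a half) can be cross-checked by computing the video bitrate each one implies. This is a back-of-the-envelope sketch; the function name is made up here, and the only inputs are the figures quoted in the text.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def implied_video_bps(bits: float, viewing_days: float) -> float:
    """Bitrate of a video stream that consumes `bits` over `viewing_days`."""
    return bits / (viewing_days * SECONDS_PER_DAY)


# One second of switch-chip traffic (51 Tbit) spread over 200 days of video.
switch_rate = implied_video_bps(51e12, 200)
# One second of a 400 Gbit/s link spread over a day and a half of video.
link_rate = implied_video_bps(400e9, 1.5)

print(f"switch comparison implies {switch_rate / 1e6:.2f} Mbit/s video")
print(f"link comparison implies   {link_rate / 1e6:.2f} Mbit/s video")

# Link-speed roadmap quoted from Velaga, in Gbit/s.
roadmap_gbps = {"now": 400, "2025": 800, "2026": 1600}
```

Both comparisons imply a stream of about 3 Mbit/s, so they are consistent with each other, and the quoted roadmap amounts to a doubling of link speed in each of the next two years.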
Nvidia, Broadcom, and Marvell sell optical interface products, with Marvell enjoying a strong lead in 800-gigabit interconnects. A number of companies supply lasers for optical interconnects, including Coherent, Lumentum Holdings, Applied Optoelectronics, and Chinese vendors Innolight and Eoptolink. They will all battle for the AI data center over the next few years.
A 500,000-processor cluster needs at least 750 megawatts, enough to power 500,000 homes. When AI models scale to a million or more processors, they will require gigawatts of power and have to span more than one physical data center, says Velaga.
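The power figures above imply a simple per-processor budget. The sketch below divides the quoted numbers and then, as a labeled assumption, extrapolates linearly to a million processors; real facilities may not scale linearly.

```python
cluster_power_w = 750e6      # 750 megawatts, as quoted
processors = 500_000
homes = 500_000

watts_per_processor = cluster_power_w / processors
watts_per_home = cluster_power_w / homes

# Assumption: power scales linearly with processor count.
million_processor_gw = watts_per_processor * 1_000_000 / 1e9

print(f"{watts_per_processor:.0f} W per processor (including overheads)")
print(f"{watts_per_home:.0f} W average draw per home")
print(f"{million_processor_gw:.1f} GW for a million processors")
```

At 1.5 kW per processor, a million-processor system lands at about 1.5 GW under this linear assumption, in line with Velaga's reference to gigawatts of power.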
The opportunity for optical connections reaches beyond the AI data center. That’s because no single site has enough power for the largest clusters, so workloads will have to span multiple facilities. In September, Marvell, Lumentum, and Coherent demonstrated optical links for data centers as far apart as 300 miles. Nvidia’s next-generation networks will be ready to run a single AI workload across remote locations.
Some worry that AI performance will stop improving as processor counts scale. Nvidia’s Jensen Huang dismissed those concerns on his last conference call, saying that clusters of 100,000 processors or more will just be table stakes with Nvidia’s next generation of chips. Broadcom’s Velaga says he is grateful: “Jensen (Nvidia CEO) has created this massive opportunity for all of us.”
References:
https://www.datacenterdynamics.com/en/news/morgan-stanley-hyperscaler-capex-to-reach-300bn-in-2025/
https://ultraethernet.org/ultra-ethernet-specification-update/
Will AI clusters be interconnected via Infiniband or Ethernet: NVIDIA doesn’t care, but Broadcom sure does!
Will billions of dollars big tech is spending on Gen AI data centers produce a decent ROI?
Canalys & Gartner: AI investments drive growth in cloud infrastructure spending
AI Echo Chamber: “Upstream AI” companies huge spending fuels profit growth for “Downstream AI” firms
AI wave stimulates big tech spending and strong profits, but for how long?
Markets and Markets: Global AI in Networks market worth $10.9 billion in 2024; projected to reach $46.8 billion by 2029
Using a distributed synchronized fabric for parallel computing workloads- Part I
Using a distributed synchronized fabric for parallel computing workloads- Part II
FT: Nvidia invested $1bn in AI start-ups in 2024
Nvidia invested $1bn in artificial intelligence companies in 2024, as it emerged as a crucial backer of start-ups using the company’s graphics processing units (GPUs). The king of AI semiconductors, which surpassed a $3tn market capitalization in June due to huge demand for its high-performing GPUs, has significantly invested into some of its own customers.
According to corporate filings and Dealroom research, Nvidia spent a total of $1bn across 50 start-up funding rounds and several corporate deals in 2024, compared with 2023, which saw 39 start-up rounds and $872mn in spending. The vast majority of deals were with “core AI” companies with high computing infrastructure demands, and so in some cases also buyers of its own chips. Tech companies have spent tens of billions of dollars on Nvidia’s chips over the past year, after the debut of ChatGPT two years ago kick-started an unprecedented surge of investment in AI. Nvidia’s uptick in deals comes after it amassed a $9bn war chest of cash with its GPUs becoming one of the world’s hottest commodities.
The company’s shares rose more than 170% in 2024, as it and other tech giants helped power the S&P 500 index to its best two-year run this century. Nvidia’s $1bn worth of investments in “non-affiliated entities” in the first nine months of last year includes both its venture and corporate investment arms.
According to company filings, that sum was 15% more than in 2023 and more than 10 times as much as it invested in 2022. Some of Nvidia’s largest customers, such as Microsoft, Amazon and Google, are actively working to reduce their reliance on its GPUs by developing their own custom chips. Such a development could make smaller AI companies a more important generator of revenues for Nvidia in the future.
“Right now Nvidia wants there to be more competition and it makes sense for them to have these new players in the mix,” said a fund manager with a stake in a number of companies it had invested in.
In 2024, Nvidia struck more deals than Microsoft and Amazon, although Google remains far more active, according to Dealroom. Such prolific dealmaking has raised concerns about Nvidia’s grip over the AI industry, at a time when it is facing heightened antitrust scrutiny in the US, Europe and China. Bill Kovacic, former chair of the US Federal Trade Commission, said competition watchdogs were “keen” to investigate a “dominant enterprise making these big investments” to see if buying company stakes was aimed at “achieving exclusivity”, although he said investments in a customer base could prove beneficial. Nvidia strongly rejects the idea that it connects funding with any requirement to use its technology.
The company said it was “working to grow our ecosystem, support great companies and enhance our platform for everyone. We compete and win on merit, independent of any investments we make.” It added: “Every company should be free to make independent technological choices that best suit their needs and strategies.”
The Santa Clara-based company’s most recent start-up deal was a strategic investment in Elon Musk’s xAI. Other significant 2024 investments included its participation in funding rounds for OpenAI, Cohere, Mistral and Perplexity, some of the most prominent AI model providers.
Nvidia also has a start-up incubator, Inception, which separately has helped the early evolution of thousands of fledgling companies. The Inception program offers start-ups “preferred pricing” on hardware, as well as cloud credits from Nvidia’s partners.
There has been an uptick in Nvidia’s acquisitions, including a takeover of Run:ai, an Israeli AI workload management platform. The deal closed this week after coming under scrutiny from the EU’s antitrust regulator, which ultimately cleared the transaction. The US Department of Justice was also looking at the deal, according to Politico. Nvidia also bought AI software groups Nebulon, OctoAI, Brev.dev, Shoreline.io and Deci. Collectively it has made more acquisitions in 2024 than in the previous four years combined, according to Dealroom.
The company is investing widely, pouring millions of dollars into AI groups involved in medical technology, search engines, gaming, drones, chips, traffic management, logistics, data storage and generation, natural language processing and humanoid robots. Its portfolio includes a number of start-ups whose valuations have soared to billions of dollars. CoreWeave, an AI cloud computing service provider and significant purchaser of Nvidia chips, is preparing to float early this year at a valuation as high as $35bn — increasing from about $7bn a year ago.
Nvidia invested $100mn in CoreWeave in early 2023, and participated in a $1bn equity fundraising round by the company in May. Another start-up, Applied Digital, was facing a plunging share price in 2024, with revenue misses and considerable debt obligations, before a group of investors led by Nvidia provided $160mn of equity capital in September, prompting a 65 per cent surge in its share price.
“Nvidia is using their massive market cap and huge cash flow to keep purchasers alive,” said Nate Koppikar, a short seller at Orso Partners. “If Applied Digital had died, that’s [a large volume] of sales that would have died with it.”
Neocloud groups such as CoreWeave, Crusoe and Lambda Labs have acquired tens of thousands of Nvidia’s high-performance GPUs, which are crucial for developing generative AI models. Those Nvidia AI chips are now also being used as collateral for huge loans. The frenzied dealmaking has shone a light on a rampant GPU economy in Silicon Valley that is increasingly being supported by deep-pocketed financiers in New York. However, its rapid growth has raised concerns about the potential for more risky lending, circular financing and Nvidia’s chokehold on the AI market.
References:
https://www.ft.com/content/f8acce90-9c4d-4433-b189-e79cad29f74e
https://www.ft.com/content/41bfacb8-4d1e-4f25-bc60-75bf557f1f21
The case for and against AI-RAN technology using Nvidia or AMD GPUs
Nvidia is proposing a new approach to telco networks dubbed “AI radio access network (AI-RAN).” The GPU king says: “Traditional CPU or ASIC-based RAN systems are designed only for RAN use and cannot process AI traffic today. AI-RAN enables a common GPU-based infrastructure that can run both wireless and AI workloads concurrently, turning networks from single-purpose to multi-purpose infrastructures and turning sites from cost-centers to revenue sources. With a strategic investment in the right kind of technology, telcos can leap forward to become the AI grid that facilitates the creation, distribution, and consumption of AI across industries, consumers, and enterprises. This moment in time presents a massive opportunity for telcos to build a fabric for AI training (creation) and AI inferencing (distribution) by repurposing their central and distributed infrastructures.”
One of the first principles of AI-RAN technology is to be able to run RAN and AI workloads concurrently and without compromising carrier-grade performance. This multi-tenancy can be either in time or space: dividing the resources based on time of day or based on percentage of compute. This also implies the need for an orchestrator that can provision, de-provision, or shift workloads seamlessly based on available capacity.
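The two forms of multi-tenancy described above (sharing by time of day and sharing by percentage of compute) can be sketched as two small allocation policies. This is a minimal illustration, not Nvidia's or SoftBank's orchestrator: the class, function names, busy-hour window, headroom and off-peak share are all hypothetical parameters chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class Allocation:
    ran_share: float  # fraction of GPU compute given to RAN workloads
    ai_share: float   # fraction of GPU compute given to AI workloads


def split_by_space(ran_demand: float, headroom: float = 0.1) -> Allocation:
    """Space sharing: RAN gets its demand plus headroom, AI gets the rest."""
    ran = min(1.0, max(0.0, ran_demand) + headroom)
    return Allocation(ran, 1.0 - ran)


def split_by_time(hour: int, busy_start: int = 7, busy_end: int = 23,
                  offpeak_ran_share: float = 0.2) -> Allocation:
    """Time sharing: RAN owns the GPUs in busy hours, AI gets most of them off-peak."""
    if busy_start <= hour < busy_end:
        return Allocation(1.0, 0.0)
    return Allocation(offpeak_ran_share, 1.0 - offpeak_ran_share)


daytime = split_by_time(hour=12)
overnight = split_by_time(hour=3)
shared = split_by_space(ran_demand=0.3)
print(daytime, overnight, shared)
```

In both policies the RAN share is computed first and AI receives only what is left, which reflects the requirement that AI workloads must not compromise carrier-grade RAN performance.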
ARC-1, an appliance Nvidia showed off earlier this year, comes with a Grace Blackwell “superchip” that would replace either a traditional vendor’s application-specific integrated circuit (ASIC) or an Intel processor. Ericsson and Nokia are exploring the possibilities with Nvidia. Developing RAN software for use with Nvidia’s chips means acquiring competency in compute unified device architecture (CUDA), Nvidia’s parallel computing platform and programming model. “They do have to reprofile into CUDA,” said Soma Velayutham, the general manager of Nvidia’s AI and telecom business, during a recent interview with Light Reading. “That is an effort.”
Proof of Concept:
SoftBank has turned the AI-RAN vision into reality, with its successful outdoor field trial in Fujisawa City, Kanagawa, Japan, where NVIDIA-accelerated hardware and NVIDIA Aerial software served as the technical foundation. That achievement marks multiple steps forward for AI-RAN commercialization and provides real proof points addressing industry requirements on technology feasibility, performance, and monetization:
- World’s first outdoor 5G AI-RAN field trial running on an NVIDIA-accelerated computing platform. This is an end-to-end solution based on a full-stack, virtual 5G RAN software integrated with 5G core.
- Carrier-grade virtual RAN performance achieved.
- AI and RAN multi-tenancy and orchestration achieved.
- Energy efficiency and economic benefits validated compared to existing benchmarks.
- A new solution to unlock AI marketplace integrated on an AI-RAN infrastructure.
- Real-world AI applications showcased, running on an AI-RAN network.
Above all, SoftBank aims to commercially release their own AI-RAN product for worldwide deployment in 2026. To help other mobile network operators get started on their AI-RAN journey now, SoftBank is also planning to offer a reference kit comprising the hardware and software elements required to trial AI-RAN in a fast and easy way.
SoftBank developed their AI-RAN solution by integrating hardware and software components from NVIDIA and ecosystem partners and hardening them to meet carrier-grade requirements. Together, the solution enables a full 5G vRAN stack that is 100% software-defined, running on NVIDIA GH200 (CPU+GPU), NVIDIA BlueField-3 (NIC/DPU), and Spectrum-X for fronthaul and backhaul networking. It integrates with 20 radio units and a 5G core network and connects 100 mobile UEs.
The core software stack includes the following components:
- SoftBank-developed and optimized 5G RAN Layer 1 functions such as channel mapping, channel estimation, modulation, and forward-error-correction, using NVIDIA Aerial CUDA-Accelerated-RAN libraries
- Fujitsu software for Layer 2 functions
- Red Hat’s OpenShift Container Platform (OCP) as the container virtualization layer, enabling different types of applications to run on the same underlying GPU computing infrastructure
- A SoftBank-developed E2E AI and RAN orchestrator, to enable seamless provisioning of RAN and AI workloads based on demand and available capacity
AI marketplace solution integrated with SoftBank AI-RAN. Image Credit: Nvidia
The underlying hardware is the NVIDIA GH200 Grace Hopper Superchip, which can be used in various configurations from distributed to centralized RAN scenarios. This implementation uses multiple GH200 servers in a single rack, serving AI and RAN workloads concurrently, for an aggregated-RAN scenario. This is comparable to deploying multiple traditional RAN base stations.
In this pilot, each GH200 server was able to process 20 5G cells using 100-MHz bandwidth, when used in RAN-only mode. For each cell, 1.3 Gbps of peak downlink performance was achieved in ideal conditions, and 816Mbps was demonstrated with carrier-grade availability in the outdoor deployment.
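The pilot figures above translate into per-server capacity and spectral efficiency with a few multiplications. The sketch assumes, as an upper bound, that all 20 cells reach the quoted per-cell rate at the same time, which the article does not claim.

```python
cells_per_server = 20
bandwidth_hz = 100e6            # 100 MHz per cell
peak_cell_bps = 1.3e9           # ideal conditions
outdoor_cell_bps = 816e6        # carrier-grade outdoor deployment

# Upper-bound aggregate per GH200 server (assumes all cells peak together).
peak_server_gbps = cells_per_server * peak_cell_bps / 1e9
outdoor_server_gbps = cells_per_server * outdoor_cell_bps / 1e9

# Downlink spectral efficiency in bit/s per Hz.
peak_efficiency = peak_cell_bps / bandwidth_hz
outdoor_efficiency = outdoor_cell_bps / bandwidth_hz

print(f"peak:    {peak_server_gbps:.1f} Gbit/s per server, "
      f"{peak_efficiency:.1f} bit/s/Hz")
print(f"outdoor: {outdoor_server_gbps:.2f} Gbit/s per server, "
      f"{outdoor_efficiency:.2f} bit/s/Hz")
```

Under that assumption a single server tops out at about 26 Gbit/s of downlink in ideal conditions and about 16 Gbit/s at the outdoor rate, so the field result is roughly 63% of the ideal figure.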
……………………………………………………………………………………………………………………………………..
Could AMD GPUs be an alternative to Nvidia AI-RAN?
AMD is certainly valued by NScale, a UK business with a GPU-as-a-service offer, as an AI alternative to Nvidia. “AMD’s approach is quite interesting,” said David Power, NScale’s chief technology officer. “They have a very open software ecosystem. They integrate very well with common frameworks.” So far, though, AMD has said nothing publicly about any AI-RAN strategy.
The other telco concern is about those promised revenues. Nvidia insists it was conservative when estimating that a telco could realize $5 in inferencing revenues for every $1 invested in AI-RAN. But the numbers met with a fair degree of skepticism in the wider market. Nvidia says the advantage of doing AI inferencing at the edge is that latency, the time a signal takes to travel around the network, would be much lower compared with inferencing in the cloud. But the same case was previously made for hosting other applications at the edge, and they have not taken off.
Even if AI changes that, it is unclear telcos would stand to benefit. Sales generated by the applications available on the mobile Internet have gone largely to hyperscalers and other software developers, leaving telcos with a dwindling stream of connectivity revenues. Expect AI-RAN to be a big topic for 2025 as operators carefully weigh their options. Many telcos are unconvinced there is a valid economic case for AI-RAN, especially since GPUs consume a lot of power (they are perceived as “energy hogs”).
References:
AI-RAN Goes Live and Unlocks a New AI Opportunity for Telcos
https://www.lightreading.com/ai-machine-learning/2025-preview-ai-ran-would-be-a-paradigm-shift
Nvidia bid to reshape 5G needs Ericsson and Nokia buy-in
Softbank goes radio gaga about Nvidia in nervy days for Ericsson
T-Mobile emerging as Nvidia’s big AI cheerleader
AI cloud start-up Vultr valued at $3.5B; Hyperscalers gorge on Nvidia GPUs while AI semiconductor market booms
Superclusters of Nvidia GPU/AI chips combined with end-to-end network platforms to create next generation data centers
Nvidia enters Data Center Ethernet market with its Spectrum-X networking platform
FT: New benchmarks for Gen AI models; Neocloud groups leverage Nvidia chips to borrow >$11B
Will AI clusters be interconnected via Infiniband or Ethernet: NVIDIA doesn’t care, but Broadcom sure does!
Superclusters of Nvidia GPU/AI chips combined with end-to-end network platforms to create next generation data centers
Meta Platforms and Elon Musk’s xAI start-up are among companies building clusters of computer servers with as many as 100,000 of Nvidia’s most advanced GPU chips as the race for artificial-intelligence (AI) supremacy accelerates.
- Meta Chief Executive Mark Zuckerberg said last month that his company was already training its most advanced AI models with a conglomeration of chips he called “bigger than anything I’ve seen reported for what others are doing.”
- xAI built a supercomputer called Colossus—with 100,000 of Nvidia’s Hopper GPU/AI chips—in Memphis, TN in a matter of months.
- OpenAI and Microsoft have been working to build up significant new computing facilities for AI. Google is building massive data centers to house chips that drive its AI strategy.
A year ago, clusters of tens of thousands of GPU chips were seen as very large. OpenAI used around 10,000 of Nvidia’s chips to train the version of ChatGPT it launched in late 2022, UBS analysts estimate. Installing many GPUs in one location, linked together by superfast networking equipment and cables, has so far produced larger AI models at faster rates. But there are questions about whether ever-bigger super clusters will continue to translate into smarter chatbots and more convincing image-generation tools.
Nvidia Chief Executive Jensen Huang said that while the biggest clusters for training giant AI models now top out at around 100,000 of Nvidia’s current chips, “the next generation starts at around 100,000 Blackwells. And so that gives you a sense of where the industry is moving. Do we think that we need millions of GPUs? No doubt. That is a certainty now. And the question is how do we architect it from a data center perspective,” Huang added.
“There is no evidence that this will scale to a million chips and a $100 billion system, but there is the observation that they have scaled extremely well all the way from just dozens of chips to 100,000,” said Dylan Patel, the chief analyst at SemiAnalysis, a market research firm.
Giant super clusters are already getting built. Musk posted last month on his social-media platform X that his 100,000-chip Colossus super cluster was “soon to become” a 200,000-chip cluster in a single building. He also posted in June that the next step would probably be a 300,000-chip cluster of Nvidia’s newest GPU chips next summer. The rise of super clusters comes as their operators prepare for Nvidia’s next-generation Blackwell chips, which are set to start shipping in the next couple of months. Blackwell chips are estimated to cost around $30,000 each, meaning a cluster of 100,000 would cost $3 billion, not counting the price of the power-generation infrastructure and IT equipment around the chips.
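The chip-cost arithmetic above can be reproduced in a few lines. This is a minimal sketch: the function name is illustrative, and applying the same $30,000 estimate to the 200,000- and 300,000-chip cluster sizes is an assumption made here for comparison, not a figure from the reporting.

```python
def cluster_chip_cost(num_chips: int, price_per_chip_usd: float) -> float:
    """Cost of the GPU chips alone, excluding power and surrounding IT equipment."""
    return num_chips * price_per_chip_usd


# Estimated Blackwell price quoted in the text.
BLACKWELL_PRICE_USD = 30_000

# 100,000 chips is the case in the text; the larger sizes are assumed
# here to show how the chip bill scales linearly with cluster size.
for chips in (100_000, 200_000, 300_000):
    cost = cluster_chip_cost(chips, BLACKWELL_PRICE_USD)
    print(f"{chips:,} chips: ${cost / 1e9:.1f} billion")
```

The 100,000-chip case gives the $3 billion figure cited above; power generation, cooling and networking would add to it.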
Those dollar figures make building up super clusters with ever more chips something of a gamble, industry insiders say, given that it isn’t clear that they will improve AI models to a degree that justifies their cost. Indeed, new engineering challenges also often arise with larger clusters:
- Meta researchers said in a July paper that a cluster of more than 16,000 of Nvidia’s GPUs routinely suffered unexpected failures of chips and other components as the company trained an advanced version of its Llama model over 54 days.
- Keeping Nvidia’s chips cool is a major challenge as clusters of power-hungry chips become packed more closely together, industry executives say, part of the reason there is a shift toward liquid cooling where refrigerant is piped directly to chips to keep them from overheating.
- The sheer size of the super clusters requires a stepped-up level of management of those chips when they fail. Mark Adams, chief executive of Penguin Solutions, a company that helps set up and operate computing infrastructure, said elevated complexity in running large clusters of chips inevitably throws up problems.
The continuation of the AI boom for Nvidia largely depends on how the largest clusters of GPU chips deliver a return on investment for its customers. The trend also fosters demand for Nvidia’s networking equipment, which is fast becoming a significant business. Nvidia’s networking equipment revenue in 2024 was $3.13 billion, which was a 51.8% increase from the previous year. Mostly from its Mellanox acquisition, Nvidia offers these networking platforms:
- Accelerated Ethernet Switching for AI and the Cloud
- Quantum InfiniBand for AI and Scientific Computing
- BlueField® Network Accelerators
………………………………………………………………………………………………………………………………………………………………………………………………………………………………………………………..
Nvidia forecasts total fiscal fourth-quarter sales of about $37.5bn, up 70%. That was above average analyst projections of $37.1bn, compiled by Bloomberg, but below some projections that were as high as $41bn. “Demand for Hopper and anticipation for Blackwell – in full production – are incredible as foundation model makers scale pretraining, post-training and inference,” Huang said. “Both Hopper and Blackwell systems have certain supply constraints, and the demand for Blackwell is expected to exceed supply for several quarters in fiscal 2026,” CFO Colette Kress said.
References:
https://www.wsj.com/tech/ai/nvidia-chips-ai-race-96d21d09?mod=tech_lead_pos5
https://www.nvidia.com/en-us/networking/
https://nvidianews.nvidia.com/news/nvidia-announces-financial-results-for-third-quarter-fiscal-2025
FT: New benchmarks for Gen AI models; Neocloud groups leverage Nvidia chips to borrow >$11B
The Financial Times reports that technology companies are rushing to redesign how they test and evaluate their Gen AI models, as current AI benchmarks appear to be inadequate. AI benchmarks are used to assess how well an AI model can generate content that is coherent, relevant, and creative. This can include generating text, images, music, or any other form of content.
OpenAI, Microsoft, Meta and Anthropic have all recently announced plans to build AI agents that can execute tasks for humans autonomously on their behalf. To do this effectively, the AI systems must be able to perform increasingly complex actions, using reasoning and planning.
Current public AI benchmarks — Hellaswag and MMLU — use multiple-choice questions to assess common sense and knowledge across various topics. However, researchers argue this method is now becoming redundant and models need more complex problems.
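To make the multiple-choice approach concrete, here is a minimal sketch of how such a benchmark is typically scored: the model picks one option per question and the score is the fraction that match the answer key. The `Question` class, the sample items and the baseline "model" are all hypothetical and written for this illustration only; they are not taken from Hellaswag or MMLU.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Question:
    prompt: str
    choices: list[str]
    answer_index: int  # position of the correct choice in `choices`


def accuracy(questions: list[Question], pick: Callable[[Question], int]) -> float:
    """Fraction of questions where the chosen index matches the answer key."""
    if not questions:
        return 0.0
    correct = sum(1 for q in questions if pick(q) == q.answer_index)
    return correct / len(questions)


# Hypothetical sample items, invented for this sketch.
SAMPLE = [
    Question("Water freezes at what temperature (Celsius)?", ["0", "50", "100"], 0),
    Question("How many days are in a week?", ["5", "7", "9"], 1),
]


def always_first(question: Question) -> int:
    """Trivial baseline standing in for a model: always pick the first option."""
    return 0


print(f"Baseline accuracy: {accuracy(SAMPLE, always_first):.2f}")
```

A single accuracy number like this is easy to compute and compare, which helps explain why the format became standard and why it reveals little about multi-step reasoning.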
“We are getting to the era where a lot of the human-written tests are no longer sufficient as a good barometer for how capable the models are,” said Mark Chen, senior vice-president of research at OpenAI. “That creates a new challenge for us as a research world.”
The SWE-bench Verified benchmark was updated in August to better evaluate autonomous systems based on feedback from companies, including OpenAI. It uses real-world software problems sourced from the developer platform GitHub and involves supplying the AI agent with a code repository and an engineering issue, then asking it to fix the issue. The tasks require reasoning to complete.
“It is a lot more challenging [with agentic systems] because you need to connect those systems to lots of extra tools,” said Jared Kaplan, chief science officer at Anthropic.
“You have to basically create a whole sandbox environment for them to play in. It is not as simple as just providing a prompt, seeing what the completion is and then evaluating that.”
Another important factor when conducting more advanced tests is to make sure the benchmark questions are kept out of the public domain, in order to ensure the models do not effectively “cheat” by generating the answers from training data, rather than solving the problem.
The need for new benchmarks has also led to efforts by external organizations. In September, the start-up Scale AI announced a project called “Humanity’s Last Exam”, which crowdsourced complex questions from experts across different disciplines that required abstract reasoning to complete.
Meanwhile, the Financial Times recently reported that Wall Street’s largest financial institutions had loaned more than $11bn to “neocloud” groups, backed by their possession of Nvidia’s AI GPU chips. These companies include names such as CoreWeave, Crusoe and Lambda, and provide cloud computing services to tech businesses building AI products. They have acquired tens of thousands of Nvidia’s graphics processing units (GPUs) through partnerships with the chipmaker. With capital expenditure on data centres surging in the rush to develop AI models, Nvidia’s AI GPU chips have become a precious commodity.
…………………………………………………………………………………………………………………………………
The $3tn tech group’s allocation of chips to neocloud groups has given Wall Street lenders the confidence to lend the companies billions of dollars, which are then used to buy more Nvidia chips. Nvidia is itself an investor in neocloud companies that in turn are among its largest customers. Critics have questioned the ongoing value of the collateralised chips as new advanced versions come to market, or if the current high spending on AI begins to retreat. “The lenders all coming in push the story that you can borrow against these chips and add to the frenzy that you need to get in now,” said Nate Koppikar, a short seller at hedge fund Orso Partners. “But chips are a depreciating, not appreciating, asset.”
References:
https://www.ft.com/content/866ad6e9-f8fe-451f-9b00-cb9f638c7c59
https://www.ft.com/content/fb996508-c4df-4fc8-b3c0-2a638bb96c19
https://www.ft.com/content/41bfacb8-4d1e-4f25-bc60-75bf557f1f21
Tata Consultancy Services: Critical role of Gen AI in 5G; 5G private networks and enterprise use cases
Reuters & Bloomberg: OpenAI to design “inference AI” chip with Broadcom and TSMC
AI adoption to accelerate growth in the $215 billion Data Center market
AI Echo Chamber: “Upstream AI” companies huge spending fuels profit growth for “Downstream AI” firms
AI winner Nvidia faces competition with new super chip delayed
Nvidia enters Data Center Ethernet market with its Spectrum-X networking platform
Nvidia is planning a big push into the Data Center Ethernet market. CFO Colette Kress said the Spectrum-X Ethernet-based networking solution it launched in May 2023 is “well on track to begin a multi-billion-dollar product line within a year.” The Spectrum-X platform includes Ethernet switches, optics, cables and network interface cards (NICs). Nvidia already has a multi-billion-dollar play in this space in the form of its Ethernet NIC product. Kress said during Nvidia’s earnings call that “hundreds of customers have already adopted the platform,” and that Nvidia plans to “launch new Spectrum-X products every year to support demand for scaling compute clusters from tens of thousands of GPUs today to millions of GPUs in the near future.”
- With Spectrum-X, Nvidia will be competing with Arista, Cisco, and Juniper at the system level along with “bare metal switches” from Taiwanese ODMs running DriveNets network cloud software.
- With respect to high-performance Ethernet switching silicon, Nvidia’s competitors include Broadcom, Marvell, Microchip, and Cisco (which uses Silicon One internally and also sells it on the merchant semiconductor market).
…………………………………………………………………………………………………………………………………………………………………………..
In November 2023, Nvidia said it would work with Dell Technologies, Hewlett Packard Enterprise and Lenovo to incorporate Spectrum-X capabilities into their compute servers. Nvidia is now targeting tier-2 cloud service providers and enterprise customers looking for bundled solutions.
Dell’Oro Group VP Sameh Boujelbene told Fierce Network that “Nvidia is positioning Spectrum-X for AI back-end network deployments as an alternative fabric to InfiniBand. While InfiniBand currently dominates AI back-end networks with over 80% market share, Ethernet switches optimized for AI deployments have been gaining ground very quickly.” Boujelbene added Nvidia’s success with Spectrum-X thus far has largely been driven “by one major 100,000-GPU cluster, along with several smaller deployments by Cloud Service Providers.” By 2028, Boujelbene said Dell’Oro expects Ethernet switches to surpass InfiniBand for AI in the back-end network market, with revenues exceeding $10 billion.
………………………………………………………………………………………………………………………………………………………………………………
In a recent IEEE Techblog post we wrote:
While InfiniBand currently has the edge in the data center networking market, several factors point to increased Ethernet adoption for AI clusters in the future. Recent innovations are addressing Ethernet’s shortcomings compared to InfiniBand:
- Lossless Ethernet technologies
- RDMA over Converged Ethernet (RoCE)
- Ultra Ethernet Consortium’s AI-focused specifications
Some real-world tests have shown Ethernet offering up to 10% improvement in job completion performance across all packet sizes compared to InfiniBand in complex AI training tasks. By 2028, it’s estimated that: 1] 45% of generative AI workloads will run on Ethernet (up from <20% now) and 2] 30% will run on InfiniBand (up from <20% now).
………………………………………………………………………………………………………………………………………………………………………………
References:
https://www.fierce-network.com/cloud/data-center-ethernet-nvidias-next-multi-billion-dollar-business
https://www.nvidia.com/en-us/networking/spectrumx/
Will AI clusters be interconnected via Infiniband or Ethernet: NVIDIA doesn’t care, but Broadcom sure does!
Data Center Networking Market to grow at a CAGR of 6.22% during 2022-2027 to reach $35.6 billion by 2027
LightCounting: Optical Ethernet Transceiver sales will increase by 40% in 2024
Will AI clusters be interconnected via Infiniband or Ethernet: NVIDIA doesn’t care, but Broadcom sure does!
InfiniBand, which has been used extensively for HPC interconnect, currently dominates AI networking, accounting for about 90% of deployments. That is largely due to its very low latency and an architecture that reduces packet loss, which is beneficial for AI training workloads. Packet loss slows AI training workloads, and they’re already expensive and time-consuming. This is probably why Microsoft chose to run InfiniBand when building out its data centers to support machine learning workloads. However, InfiniBand tends to lag Ethernet in terms of top speeds. Nvidia’s very latest Quantum InfiniBand switch tops out at 51.2 Tb/s of bidirectional throughput (25.6 Tb/s in each direction) with 400 Gb/s ports. By comparison, Ethernet switching hit 51.2 Tb/s nearly two years ago and can support 800 Gb/s port speeds.
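The link between a switch's aggregate capacity and its port speed is simple division. The sketch below uses the 51.2 Tb/s and 400/800 Gb/s figures quoted above; the function name is illustrative, and the result is an upper bound, since real products may expose fewer ports than the silicon could support.

```python
def max_ports(capacity_gbps: int, port_speed_gbps: int) -> int:
    """Upper bound on full-speed ports a switch of the given capacity can serve."""
    return capacity_gbps // port_speed_gbps


# 51.2 Tb/s expressed in Gb/s, the Ethernet switching capacity cited in the text.
ETHERNET_CAPACITY_GBPS = 51_200

for speed in (800, 400):
    ports = max_ports(ETHERNET_CAPACITY_GBPS, speed)
    print(f"{speed} Gb/s ports on a 51.2 Tb/s switch: up to {ports}")
```

Higher port speeds mean fewer, faster links per switch, and fewer cables and optics for the same total bandwidth in a large GPU cluster.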
While InfiniBand currently has the edge, several factors point to increased Ethernet adoption for AI clusters in the future. Recent innovations are addressing Ethernet’s shortcomings compared to InfiniBand:
- Lossless Ethernet technologies
- RDMA over Converged Ethernet (RoCE)
- Ultra Ethernet Consortium’s AI-focused specifications
Some real-world tests have shown Ethernet offering up to 10% improvement in job completion performance across all packet sizes compared to InfiniBand in complex AI training tasks. By 2028, it’s estimated that: 1] 45% of generative AI workloads will run on Ethernet (up from <20% now) and 2] 30% will run on InfiniBand (up from <20% now).
In a lively session at VMware-Broadcom’s Explore event, panelists were asked how to best network together the GPUs, and other data center infrastructure, needed to deliver AI. Broadcom’s Ram Velaga, SVP and GM of the Core Switching Group, was unequivocal: “Ethernet will be the technology to make this happen.” In his opening remarks, Velaga asked the audience, “Think about…what is machine learning and how is that different from cloud computing?” Cloud computing, he said, is about driving utilization of CPUs; with ML, it’s the opposite.
“No one…machine learning workload can run on a single GPU…No single GPU can run an entire machine learning workload. You have to connect many GPUs together…so machine learning is a distributed computing problem. It’s actually the opposite of a cloud computing problem,” Velaga added.
Nvidia (which acquired Israeli fabless interconnect chip maker Mellanox [1.] in 2019) says, “InfiniBand provides dramatic leaps in performance to achieve faster time to discovery with less cost and complexity.” Velaga disagrees, saying “InfiniBand is expensive, fragile and predicated on the faulty assumption that the physical infrastructure is lossless.”
Note 1. Mellanox specialized in switched fabrics for enterprise data centers and high performance computing, where high data rates and low latency are required, such as in a computer cluster.
…………………………………………………………………………………………………………………………………………..
Ethernet, on the other hand, has been the subject of ongoing innovation and advancement, Velaga said. He cited the following selling points:
- Pervasive deployment
- Open and standards-based
- Highest Remote Direct Memory Access (RDMA) performance for AI fabrics
- Lowest cost compared to proprietary tech
- Consistent across front-end, back-end, storage and management networks
- High availability, reliability and ease of use
- Broad silicon, hardware, software, automation, monitoring and debugging solutions from a large ecosystem
To that last point, Velaga said, “We steadfastly have been innovating in this world of Ethernet. When there’s so much competition, you have no choice but to innovate.” InfiniBand, he said, is “a road to nowhere.” It should be noted that Broadcom (which now owns VMware) is the largest supplier of Ethernet switching chips for every part of a service provider network (see diagram below). Broadcom’s Jericho3-AI silicon, which can connect up to 32,000 GPU chips together, competes head-on with InfiniBand!
Image Courtesy of Broadcom
………………………………………………………………………………………………………………………………………………………..
Conclusions:
While InfiniBand currently dominates AI networking, Ethernet is rapidly evolving to meet AI workload demands. The future will likely see a mix of both technologies, with Ethernet gaining significant ground due to its improvements, cost-effectiveness, and widespread compatibility. Organizations will need to evaluate their specific needs, considering factors like performance requirements, existing infrastructure, and long-term scalability when choosing between InfiniBand and Ethernet for AI clusters.
Well, it turns out that Nvidia’s Mellanox division in Israel makes BOTH InfiniBand AND Ethernet chips, so it wins either way!
…………………………………………………………………………………………………………………………………………………………………………..
References:
https://www.perplexity.ai/search/will-ai-clusters-run-on-infini-uCYEbRjeR9iKAYH75gz8ZA
https://www.theregister.com/2024/01/24/ai_networks_infiniband_vs_ethernet/
Broadcom on AI infrastructure networking—’Ethernet will be the technology to make this happen’
https://www.nvidia.com/en-us/networking/products/infiniband/
https://www.nvidia.com/en-us/networking/products/ethernet/
Part1: Unleashing Network Potentials: Current State and Future Possibilities with AI/ML
Using a distributed synchronized fabric for parallel computing workloads- Part II
Part-2: Unleashing Network Potentials: Current State and Future Possibilities with AI/ML
AI Echo Chamber: “Upstream AI” companies huge spending fuels profit growth for “Downstream AI” firms
According to the Wall Street Journal, the AI industry has become an “Echo Chamber,” where huge capital spending by the AI infrastructure and application providers has fueled revenue and profit growth for everyone else. Market research firm Bespoke Investment Group has recently created baskets for “downstream” and “upstream” AI companies.
- The Downstream group involves “AI implementation,” which consists of firms that sell AI development tools, such as the large language models (LLMs) popularized by OpenAI’s ChatGPT since the end of 2022, or run products that can incorporate them. This includes Google/Alphabet, Microsoft, Amazon, Meta Platforms (FB), along with IBM, Adobe and Salesforce.
- Higher up the supply chain (Upstream group), are the “AI infrastructure” providers, which sell AI chips, applications, data centers and training software. The undisputed leader is Nvidia, which has seen its sales triple in a year, but it also includes other semiconductor companies, database developer Oracle and owners of data centers Equinix and Digital Realty.
The Upstream group of companies has posted profit margins that are far above what analysts expected a year ago. In the second quarter, and pending Nvidia’s results on Aug. 28, Upstream AI members of the S&P 500 are set to have delivered a 50% annual increase in earnings. For the remainder of 2024, they will be increasingly responsible for the profit growth that Wall Street expects from the stock market, even accounting for Intel’s huge problems and restructuring.
It should be noted that the lines between the two groups can be blurry, particularly when it comes to giants such as Amazon, Microsoft and Alphabet, which provide both AI implementation (e.g. LLMs) and infrastructure: their cloud-computing businesses turned these companies into the early winners of the AI craze last year and reported breakneck growth during this latest earnings season. A crucial point is that it is their role as ultimate developers of AI applications that has led them to make huge capital expenditures, which are responsible for the profit surge in the rest of the ecosystem. So there is a definite trickle-down effect, in which the big tech players’ AI-directed capex is boosting revenue and profits for the companies down the supply chain.
As the path for monetizing this technology gets longer and harder, the benefits seem to be increasingly accruing to companies higher up in the supply chain. Meta Platforms Chief Executive Mark Zuckerberg recently said the company’s coming Llama 4 language model will require 10 times as much computing power to train as its predecessor. Were it not for AI, revenues for semiconductor firms would probably have fallen during the second quarter, rather than rising 18%, according to S&P Global.
………………………………………………………………………………………………………………………………………………………..
A paper written by researchers from the likes of Cambridge and Oxford uncovered that the large language models (LLMs) behind some of today’s most exciting AI apps may have been trained on “synthetic data” or data generated by other AI. This revelation raises ethical and quality concerns. If an AI model is trained primarily or even partially on synthetic data, it might produce outputs lacking human-generated content’s richness and reliability. It could be a case of the blind leading the blind, with AI models reinforcing the limitations or biases inherent in the synthetic data they were trained on.
In this paper, the team coined the phrase “model collapse,” claiming that models trained this way will answer user prompts with low-quality outputs. The idea of “model collapse” suggests a sort of unraveling of the machine’s learning capabilities, where it fails to produce outputs with the informative or nuanced characteristics we expect. This poses a serious question for the future of AI development. If AI is increasingly trained on synthetic data, we risk creating echo chambers of misinformation or low-quality responses, leading to less helpful and potentially even misleading systems.
……………………………………………………………………………………………………………………………………………
In a recent working paper, Massachusetts Institute of Technology (MIT) economist Daron Acemoglu argued that AI’s knack for easy tasks has led to exaggerated predictions of its power to enhance productivity in hard jobs. Also, some of the new tasks created by AI may have negative social value (such as design of algorithms for online manipulation). Indeed, data from the Census Bureau show that only a small percentage of U.S. companies outside of the information and knowledge sectors are looking to make use of AI.
References:
https://deepgram.com/learn/the-ai-echo-chamber-model-collapse-synthetic-data-risks
https://economics.mit.edu/sites/default/files/2024-04/The%20Simple%20Macroeconomics%20of%20AI.pdf
AI wave stimulates big tech spending and strong profits, but for how long?
AI winner Nvidia faces competition with new super chip delayed
SK Telecom and Singtel partner to develop next-generation telco technologies using AI
Telecom and AI Status in the EU
Vodafone: GenAI overhyped, will spend $151M to enhance its chatbot with AI
Data infrastructure software: picks and shovels for AI; Hyperscaler CAPEX
AI winner Nvidia faces competition with new super chip delayed
The Clear AI Winner Is: Nvidia!
Strong AI spending should help Nvidia make its own ambitious numbers when it reports earnings at the end of the month (its 2Q-2024 ended July 31st). Analysts are expecting nearly $25 billion in data center revenue for the July quarter, about what that business was generating annually a year ago. But the latest results won’t quell the growing concern investors have with the pace of AI spending among the world’s largest tech giants, and how it will eventually pay off.
In March, Nvidia unveiled its Blackwell chip series, succeeding its earlier flagship AI chip, the GH200 Grace Hopper Superchip, which was designed to speed generative AI applications. The NVIDIA GH200 NVL2 fully connects two GH200 Superchips with NVLink, delivering up to 288GB of high-bandwidth memory, 10 terabytes per second (TB/s) of memory bandwidth, and 1.2TB of fast memory. The GH200 NVL2 offers up to 3.5X more GPU memory capacity and 3X more bandwidth than the NVIDIA H100 Tensor Core GPU in a single server for compute- and memory-intensive workloads. The GH200 meanwhile combines an H100 chip [1.] with an Arm CPU and more memory.
Note 1. The Nvidia H100 sits on a 10.5-inch graphics card, which is then bundled into a server rack alongside dozens of other H100 cards to create one massive data center computer.
This week, Nvidia informed Microsoft and another major cloud service provider of a delay in the production of its most advanced AI chip in the Blackwell series, the Information website said, citing a Microsoft employee and another person with knowledge of the matter.
…………………………………………………………………………………………………………………………………………
Nvidia Competitors Emerge – but are their chips ONLY for internal use?
In addition to AMD, Nvidia has several big tech competitors that are currently not in the merchant market semiconductor business. These include:
- Huawei has developed the Ascend series of chips to rival Nvidia’s AI chips, with the Ascend 910B chip as its main competitor to Nvidia’s A100 GPU chip. Huawei is the second largest cloud services provider in China, just behind Alibaba and ahead of Tencent.
- Microsoft has unveiled an AI chip called the Azure Maia AI Accelerator, optimized for artificial intelligence (AI) tasks and generative AI as well as the Azure Cobalt CPU, an Arm-based processor tailored to run general purpose compute workloads on the Microsoft Cloud.
- Last year, Meta announced it was developing its own AI hardware. This past April, Meta announced its next generation of custom-made processor chips designed for their AI workloads. The latest version significantly improves performance compared to the last generation and helps power their ranking and recommendation ads models on Facebook and Instagram.
- Also in April, Google revealed the details of a new version of its data center AI chips and announced an Arm-based central processor. Google’s 10-year-old Tensor Processing Units (TPUs) are one of the few viable alternatives to the advanced AI chips made by Nvidia, though developers can only access them through Google’s Cloud Platform and not buy them directly.
As demand for generative AI services continues to grow, it’s evident that GPU chips will be the next big battleground for AI supremacy.
References:
AI Frenzy Backgrounder; Review of AI Products and Services from Nvidia, Microsoft, Amazon, Google and Meta; Conclusions
https://www.nvidia.com/en-us/data-center/grace-hopper-superchip/
https://www.theverge.com/2024/2/1/24058186/ai-chips-meta-microsoft-google-nvidia/archives/2
https://news.microsoft.com/source/features/ai/in-house-chips-silicon-to-service-to-meet-ai-demand/