Reuters & Bloomberg: OpenAI to design “inference AI” chip with Broadcom and TSMC

Bloomberg reports that OpenAI, the fast-growing company behind ChatGPT, is working with Broadcom Inc. to develop a new artificial intelligence chip specifically focused on running AI models after they have been trained, according to two people familiar with the matter. The two companies are also consulting with Taiwan Semiconductor Manufacturing Company (TSMC), the world’s largest contract chip manufacturer. OpenAI has been planning a custom chip and working on uses for the technology for around a year, the people said, but the discussions are still at an early stage. The company has assembled a chip team of about 20 people, led by top engineers who previously built Tensor Processing Units (TPUs) at Google, including Thomas Norrie and Richard Ho.

Reuters reported on OpenAI’s ongoing talks with Broadcom and TSMC on Tuesday. OpenAI has been working for months with Broadcom to build its first AI chip, focused on inference (responding to user requests), according to sources. Demand right now is greater for training chips, but analysts have predicted that the need for inference chips could surpass it as more AI applications are deployed.

OpenAI has examined a range of options to diversify chip supply and reduce costs. The company considered building everything in-house and raising capital for an expensive plan to build a network of chip manufacturing factories known as “foundries.”


OpenAI may continue to research setting up its own network of foundries, or chip factories, one of the people said, but the startup has realized that working with partners on custom chips is a quicker, more attainable path for now. Reuters earlier reported that OpenAI was pulling back from the effort to establish its own chip manufacturing capacity. The company has dropped the ambitious foundry plans for now because of the cost and time needed to build such a network, and plans instead to focus on in-house chip design efforts, according to sources.

OpenAI, which helped commercialize generative AI that produces human-like responses to queries, relies on substantial computing power to train and run its systems. As one of the largest purchasers of Nvidia’s graphics processing units (GPUs), OpenAI uses AI chips both to train models where the AI learns from data and for inference, applying AI to make predictions or decisions based on new information. Reuters previously reported on OpenAI’s chip design endeavors. The Information reported on talks with Broadcom and others.

The Information reported in June that Broadcom had discussed making an AI chip for OpenAI. As one of the largest buyers of chips, OpenAI’s decision to source from a diverse array of chipmakers while developing its customized chip could have broader tech sector implications.

Broadcom is the largest designer of application-specific integrated circuits (ASICs) — chips designed to fit a single purpose specified by the customer. The company’s biggest customer in this area is Alphabet Inc.’s Google. Broadcom also works with Meta Platforms Inc. and TikTok owner ByteDance Ltd.

When asked last month whether he had new customers for the business, given the huge demand for AI training, Broadcom Chief Executive Officer Hock Tan said that he will only add to his short list of customers when projects hit volume shipments. “It’s not an easy product to deploy for any customer, and so we do not consider proof of concepts as production volume,” he said during an earnings conference call.

OpenAI’s services require massive amounts of computing power to develop and run — with much of that coming from Nvidia chips. To meet the demand, the industry has been scrambling to find alternatives to Nvidia. That’s included embracing processors from Advanced Micro Devices Inc. and developing in-house versions.

OpenAI is also actively planning investments and partnerships in data centers, the eventual home for such AI chips. The startup’s leadership has pitched the U.S. government on the need for more massive data centers and CEO Sam Altman has sounded out global investors, including some in the Middle East, to finance the effort.

“It’s definitely a stretch,” OpenAI Chief Financial Officer Sarah Friar told Bloomberg Television on Monday. “Stretch from a capital perspective but also my own learning. Frankly we are all learning in this space: Infrastructure is destiny.”

Currently, Nvidia’s GPUs hold over 80% of the AI chip market. But shortages and rising costs have led major customers like Microsoft, Meta, and now OpenAI to explore in-house or external alternatives.

Training AI models and operating services like ChatGPT are expensive. OpenAI has projected a $5 billion loss this year on $3.7 billion in revenue, according to sources. Compute costs, or expenses for hardware, electricity and cloud services needed to process large datasets and develop models, are the company’s largest expense, prompting efforts to optimize utilization and diversify suppliers.
OpenAI has been cautious about poaching talent from Nvidia because it wants to maintain a good rapport with the chip maker it remains committed to working with, especially for accessing its new generation of Blackwell chips, sources added.

References:

https://www.bloomberg.com/news/articles/2024-10-29/openai-broadcom-working-to-develop-ai-chip-focused-on-inference?embedded-checkout=true

https://www.reuters.com/technology/artificial-intelligence/openai-builds-first-chip-with-broadcom-tsmc-scales-back-foundry-ambition-2024-10-29/

AI Echo Chamber: “Upstream AI” companies huge spending fuels profit growth for “Downstream AI” firms

AI Frenzy Backgrounder; Review of AI Products and Services from Nvidia, Microsoft, Amazon, Google and Meta; Conclusions

AI sparks huge increase in U.S. energy consumption and is straining the power grid; transmission/distribution as a major problem

Generative AI Unicorns Rule the Startup Roost; OpenAI in the Spotlight

 

Will AI clusters be interconnected via Infiniband or Ethernet: NVIDIA doesn’t care, but Broadcom sure does!

InfiniBand, which has been used extensively for HPC interconnect, currently dominates AI networking, accounting for about 90% of deployments. That is largely due to its very low latency and an architecture that reduces packet loss, which is beneficial for AI training workloads. Packet loss slows AI training jobs, and those jobs are already expensive and time-consuming. This is probably why Microsoft chose to run InfiniBand when building out its data centers to support machine learning workloads. However, InfiniBand tends to lag Ethernet in top speeds: Nvidia’s latest Quantum InfiniBand switch tops out at 51.2 Tb/s with 400 Gb/s ports, whereas Ethernet switching hit 51.2 Tb/s nearly two years ago and can support 800 Gb/s port speeds.
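To put those port-speed figures in perspective, here is a quick back-of-the-envelope sketch (Python, purely illustrative and based only on the capacity and port speeds quoted above) of how many front-panel ports a 51.2 Tb/s switch ASIC can expose at each speed:

# Illustrative radix arithmetic for a 51.2 Tb/s switch ASIC, using the figures quoted above.
switch_capacity_gbps = 51_200  # 51.2 Tb/s

for port_speed_gbps in (400, 800):
    ports = switch_capacity_gbps // port_speed_gbps
    print(f"{port_speed_gbps} Gb/s ports: {ports} per 51.2 Tb/s switch")

# Output:
# 400 Gb/s ports: 128 per 51.2 Tb/s switch
# 800 Gb/s ports: 64 per 51.2 Tb/s switch

At a fixed ASIC capacity, doubling the port speed halves the port count, so the 800 Gb/s Ethernet figure translates into fewer links, or more bandwidth per link, from the same switch silicon.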

While InfiniBand currently has the edge, several factors point to increased Ethernet adoption for AI clusters in the future. Recent innovations are addressing Ethernet’s shortcomings compared to InfiniBand:

  • Lossless Ethernet technologies
  • RDMA over Converged Ethernet (RoCE)
  • Ultra Ethernet Consortium’s AI-focused specifications

Some real-world tests have shown Ethernet delivering up to a 10% improvement in job completion performance across all packet sizes compared to InfiniBand in complex AI training tasks. By 2028, it is estimated that (1) 45% of generative AI workloads will run on Ethernet (up from less than 20% now) and (2) 30% will run on InfiniBand (up from less than 20% now).

In a lively session at VMware-Broadcom’s Explore event, panelists were asked how best to network together the GPUs, and other data center infrastructure, needed to deliver AI. Broadcom’s Ram Velaga, SVP and GM of the Core Switching Group, was unequivocal: “Ethernet will be the technology to make this happen.” In his opening remarks, Velaga asked the audience, “Think about…what is machine learning and how is that different from cloud computing?” Cloud computing, he said, is about driving utilization of CPUs; with machine learning, it’s the opposite.

“No one…machine learning workload can run on a single GPU…No single GPU can run an entire machine learning workload. You have to connect many GPUs together…so machine learning is a distributed computing problem. It’s actually the opposite of a cloud computing problem,” Velaga added.
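Velaga’s point that machine learning is a distributed computing problem is easy to see in code. The minimal sketch below (Python/PyTorch, my own illustration and not something shown at the event) wraps a model in DistributedDataParallel: every backward pass triggers a gradient all-reduce across all participating GPUs, and that collective traffic is exactly what the InfiniBand-versus-Ethernet fabric has to carry.

# Minimal data-parallel training sketch: each step all-reduces gradients across GPUs,
# so the network fabric (InfiniBand or RoCE/Ethernet) sits on the critical path.
# Assumes PyTorch with NCCL; launch with e.g.:  torchrun --nproc_per_node=8 train.py
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    dist.init_process_group(backend="nccl")        # one process per GPU
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    model = torch.nn.Linear(4096, 4096).cuda()
    ddp_model = DDP(model)                         # hooks gradient all-reduce into backward()
    opt = torch.optim.SGD(ddp_model.parameters(), lr=1e-3)

    for _ in range(10):
        x = torch.randn(32, 4096, device="cuda")
        loss = ddp_model(x).sum()
        loss.backward()                            # gradients are synchronized over the fabric here
        opt.step()
        opt.zero_grad()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()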

Nvidia (which acquired Israeli fabless interconnect chip maker Mellanox [1.] in 2019) says, “InfiniBand provides dramatic leaps in performance to achieve faster time to discovery with less cost and complexity.” Velaga disagrees, saying “InfiniBand is expensive, fragile and predicated on the faulty assumption that the physical infrastructure is lossless.”

Note 1. Mellanox specialized in switched fabrics for enterprise data centers and high-performance computing, where high data rates and low latency are required, such as in a computer cluster.

…………………………………………………………………………………………………………………………………………..

Ethernet, on the other hand, has been the subject of ongoing innovation and advancement. Velaga cited the following selling points:

  • Pervasive deployment
  • Open and standards-based
  • Highest Remote Direct Memory Access (RDMA) performance for AI fabrics
  • Lowest cost compared to proprietary tech
  • Consistent across front-end, back-end, storage and management networks
  • High availability, reliability and ease of use
  • Broad silicon, hardware, software, automation, monitoring and debugging solutions from a large ecosystem

To that last point, Velaga said, “We steadfastly have been innovating in this world of Ethernet. When there’s so much competition, you have no choice but to innovate.” InfiniBand, he said, is “a road to nowhere.” It should be noted that Broadcom (which now owns VMware) is the largest supplier of Ethernet switching chips for every part of a service provider network (see diagram below). Broadcom’s Jericho3-AI silicon, which can connect up to 32,000 GPU chips together, competes head-on with InfiniBand!

Image Courtesy of Broadcom

………………………………………………………………………………………………………………………………………………………..

Conclusions:

While InfiniBand currently dominates AI networking, Ethernet is rapidly evolving to meet AI workload demands. The future will likely see a mix of both technologies, with Ethernet gaining significant ground due to its improvements, cost-effectiveness, and widespread compatibility. Organizations will need to evaluate their specific needs, considering factors like performance requirements, existing infrastructure, and long-term scalability when choosing between InfiniBand and Ethernet for AI clusters.

–>Well, it turns out that Nvidia’s Mellanox division in Israel makes BOTH InfiniBand AND Ethernet chips, so they win either way!

…………………………………………………………………………………………………………………………………………………………………………..

References:

https://www.perplexity.ai/search/will-ai-clusters-run-on-infini-uCYEbRjeR9iKAYH75gz8ZA

https://i0.wp.com/techjunction.co/wp-content/uploads/2023/10/InfiniBand-Topology.png?resize=768%2C420&ssl=1

https://www.theregister.com/2024/01/24/ai_networks_infiniband_vs_ethernet/

Broadcom on AI infrastructure networking—’Ethernet will be the technology to make this happen’

https://www.nvidia.com/en-us/networking/products/infiniband/

https://www.nvidia.com/en-us/networking/products/ethernet/

Part1: Unleashing Network Potentials: Current State and Future Possibilities with AI/ML

Using a distributed synchronized fabric for parallel computing workloads- Part II

Part-2: Unleashing Network Potentials: Current State and Future Possibilities with AI/ML

 

 

 

Broadcom: 5nm 100G/lane Optical PAM-4 DSP PHY; 200G Optical Link with Semtech

Broadcom Inc. today announced the availability of its 5nm 100G/lane optical PAM-4 DSP PHY with integrated transimpedance amplifier (TIA) and laser driver, the BCM85812, optimized for 800G DR8, 2x400G FR4 and 800G Active Optical Cable (AOC) [1.] module applications. Built on Broadcom’s proven 5nm 112G PAM-4 DSP platform, this fully integrated DSP PHY delivers superior performance and efficiency and drives the overall system power down to unprecedented levels for hyperscale data center and cloud providers.

Note 1. Active Optical Cable (AOC) is a cabling technology that accepts the same electrical inputs as a traditional copper cable, but uses optical fiber “between the connectors.” AOC uses electrical-to-optical conversion at the cable ends to improve the speed and distance performance of the cable without sacrificing compatibility with standard electrical interfaces.

BCM85812 Product Highlights:

  • Monolithic 5nm 800G PAM-4 PHY with integrated TIA and high-swing laser driver
  • Delivers best-in-class module performance in BER and power consumption
  • Drives down 800G module power to sub-11W for SMF solutions and sub-10W for MMF solutions (see the arithmetic sketch after this list)
  • Compliant with all applicable IEEE and OIF standards; capable of supporting MR links on the chip-to-module interface
  • Fully compliant with OIF 3.2T Co-Packaged Optical Module specifications
  • Capable of supporting optical modules from 800G to 3.2T
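The lane and power figures above lend themselves to a quick sanity check. The sketch below (Python, my own back-of-the-envelope arithmetic, not Broadcom’s) derives the lane count behind the “DR8” designation and an approximate energy-per-bit figure from the sub-11W SMF number.

# Back-of-the-envelope lane and power arithmetic from the figures quoted above.
lane_rate_gbps = 100        # 100G per lane PAM-4
module_rate_gbps = 800      # 800G DR8 module
module_power_w = 11.0       # "sub 11W" SMF module power, taken as an upper bound

lanes_per_module = module_rate_gbps // lane_rate_gbps          # 8 lanes -> "DR8"
pj_per_bit = module_power_w / (module_rate_gbps * 1e9) * 1e12  # picojoules per bit

print(f"{lanes_per_module} lanes of {lane_rate_gbps}G per 800G module")
print(f"~{pj_per_bit:.1f} pJ/bit at the module level")          # ~13.8 pJ/bit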

Demo Showcases at OFC 2023:

Broadcom will demonstrate the BCM85812 in an end-to-end link connecting two Tomahawk 5 (TH5) switches using Eoptolink’s 800G DR8 optical modules. Attendees will see a live traffic stream of 800GbE data running between the two TH5 switches. Broadcom will also showcase various 800G DR8, 2x400G FR4, 2x400G DR4, 800G SR8, and 800G AOC solutions from third-party transceiver vendors that interoperate with each other, all using Broadcom’s DSP solutions. The following module vendors will participate in a multi-vendor interop plug-fest on the latest Tomahawk 5 switch platform: Eoptolink, Intel, Molex, Innolight, Source Photonics, Cloud Light Technology Limited and Hisense Broadband.

Additionally, Broadcom in collaboration with Semtech and Keysight will demonstrate a 200G per lane (200G/lane) optical transmission link leveraging Broadcom’s latest SerDes, DSP and laser technology. These demonstrations will be in Broadcom Booth 6425 at the Optical Fiber Communication (OFC) 2023 exhibition in San Diego, California from March 7th to 9th.

“This first-to-market highly integrated 5nm 100G/lane DSP PHY extends Broadcom’s optical PHY leadership and demonstrates our commitment to addressing the stringent low power requirements from hyperscale data center and cloud providers,” said Vijay Janapaty, vice president and general manager of the Physical Layer Products Division at Broadcom. “With our advancement in 200G/lane, Broadcom continues to lead the industry in developing next generation solutions for 51.2T and 102.4T switch platforms.”

“By 2028, optical transceivers are projected to account for up to 8% of total power consumption in cloud data centers,” said Bob Wheeler, principal analyst at Wheeler’s Network. “The integration of TIA and driver functions in DSP PHYs is an important step in reducing this energy consumption, and Broadcom is leading the innovation charge in next-generation 51.2T cloud switching platforms while also demonstrating a strong commitment to Capex savings.”

Availability:

Broadcom has begun shipping samples of the BCM85812 to its early access customers and partners. Please contact your local Broadcom sales representative for samples and pricing.

……………………………………………………………………………………………………………………………………….

Separately, Semtech Corp. and Broadcom will demonstrate a 200G per lane optical transmission link that leverages Semtech’s latest FiberEdge 200G PAM4 PMDs and Broadcom’s latest DSP PHY. The two companies plan to recreate the demonstration this week at OFC 2023 in San Diego in their respective booths. Such a capability will be useful for enabling 3.2-Tbps optical modules to support 51.2- and 102.4-Tbps switch platforms, Semtech points out.
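As a rough illustration of that scaling argument (my own arithmetic, not Semtech’s), the sketch below counts how many 3.2 Tb/s modules it takes to serve a 51.2T or 102.4T switch, and how many 200G lanes each such module implies.

# Rough module-count arithmetic implied by the paragraph above (illustrative only).
module_tbps = 3.2
lane_gbps = 200

lanes_per_module = int(module_tbps * 1000) // lane_gbps   # 16 lanes of 200G
for switch_tbps in (51.2, 102.4):
    modules = round(switch_tbps / module_tbps)
    print(f"{switch_tbps}T switch: {modules} x {module_tbps}T modules, "
          f"each with {lanes_per_module} x {lane_gbps}G lanes")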

The two demonstrations will leverage Semtech’s FiberEdge 200G PAM4 EML driver and TIA and Broadcom’s 5-nm 112-Gbaud PAM4 DSP platform. Instrumentation from Keysight Technologies Inc. will verify the performance of the links.

“We are very excited to collaborate with Broadcom and Keysight in this joint demonstration that showcases Semtech’s 200G PMDs and their interoperability with Broadcom’s cutting-edge 200G/lane DSP and Keysight’s latest 200G equipment,” said Nicola Bramante, senior product line manager for Semtech’s Signal Integrity Products Group. “The demonstration proves the performance of a 200G/lane ecosystem, paving the way for the deployment of next-generation terabit optical transceivers in data centers.”

“This collaboration with Semtech and Keysight, two of the primary ecosystem enablers, is key to the next generation of optical modules that will deliver increased bandwidth in hyperscale cloud networks. This achievement demonstrates our commitment to pushing the boundaries of high-speed connectivity, and we are excited to continue working with industry leaders to drive innovation and deliver cutting-edge solutions to our customers,” added Khushrow Machhi, senior director of marketing of the Physical Layer Products Division at Broadcom.

“Semtech’s and Broadcom’s successful demonstration of the 200-Gbps optical link is another important milestone for the industry towards ubiquitous future 800G and 1.6T networks. Keysight’s early engagement with leading customers and continuous investments in technology and tools deliver the needed insights that enable these milestones,” said Dr. Joachim Peerlings, vice president and general manager of Keysight’s Network and Data Center Solutions Group.

…………………………………………………………………………………………………………………………………………………………………….

About Broadcom:
Broadcom Inc. is a global technology leader that designs, develops and supplies a broad range of semiconductor and infrastructure software solutions. Broadcom’s category-leading product portfolio serves critical markets including data center, networking, enterprise software, broadband, wireless, storage and industrial. Our solutions include data center networking and storage, enterprise, mainframe and cyber security software focused on automation, monitoring and security, smartphone components, telecoms and factory automation. For more information, go to https://www.broadcom.com.

Broadcom, the pulse logo, and Connecting everything are among the trademarks of Broadcom. The term “Broadcom” refers to Broadcom Inc., and/or its subsidiaries. Other trademarks are the property of their respective owners.

About Semtech:

Semtech Corporation is a high-performance semiconductor, IoT systems and Cloud connectivity service provider dedicated to delivering high quality technology solutions that enable a smarter, more connected and sustainable planet. Our global teams are dedicated to empowering solution architects and application developers to develop breakthrough products for the infrastructure, industrial and consumer markets.

References:

https://www.broadcom.com/company/news/product-releases/60996

https://www.fiberoptics4sale.com/blogs/archive-posts/95047430-active-optical-cable-aoc-explained-in-details

https://www.lightwaveonline.com/optical-tech/electronics/article/14290606/semtech-broadcom-to-demo-200glane-electricaltooptical-link-at-ofc-2023

 

Broadcom confirms bid for VMware at a staggering $61 billion in cash and stock; “Cloud Native” reigns supreme

Broadcom has confirmed its plan to acquire VMware for USD 61 billion in cash and stock. VMware will become the new name of the Broadcom Software Group, with just under half the chipmaker’s revenues coming from software following completion of the takeover.

Building on its previous acquisitions of CA Technologies and the Symantec enterprise business, Broadcom will create a new division focused on enterprise infrastructure software and the VMware brand. A pioneer in virtualization technology, VMware has expanded to offer a wide range of cloud software, spanning application modernization, cloud management, cloud infrastructure, networking, security and anywhere workspaces. The company was spun off from Dell Technologies last year, regaining its own stock market listing.

VMware shareholders will have a choice of USD 142.50 cash or 0.2520 shares of Broadcom common stock for each VMware share. This is equal to a 44% premium on VMware’s share price the day before news of a possible deal broke. The cash and stock elements of the deal will each be capped at 50 percent, and Broadcom will also assume around USD 8 billion in VMware debt.
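The per-share terms above imply a couple of numbers worth checking. The sketch below (Python, my own arithmetic based only on the $142.50 cash leg, the 0.2520 exchange ratio and the 44% premium quoted in this article) backs out VMware’s implied pre-announcement price and the Broadcom share price at which the cash and stock elections are worth the same.

# Deal arithmetic from the terms quoted above (illustrative only).
cash_per_share = 142.50     # cash election, USD per VMware share
exchange_ratio = 0.2520     # Broadcom shares per VMware share
premium = 0.44              # 44% premium over the pre-news VMware price

implied_pre_news_price = cash_per_share / (1 + premium)
breakeven_avgo_price = cash_per_share / exchange_ratio

print(f"Implied VMware price before the deal leaked: ~${implied_pre_news_price:.2f}")   # ~$98.96
print(f"Cash and stock legs match when Broadcom trades at ~${breakeven_avgo_price:.2f}")  # ~$565.48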

Michael Dell and Silver Lake, which own 40.2% and 10% of VMware shares respectively, have agreed already to support the deal, so long as the VMware Board continues to recommend the sale to Broadcom.

Broadcom has obtained commitments from banks for USD 32 billion in new debt financing for the takeover. The company pledged to maintain its dividend policy of paying out 50% of annual free cash flow to shareholders, as well as an investment-grade credit rating after the acquisition. Completion of the deal is expected to take at least a year, with a target of the end of Broadcom’s fiscal year in October 2023. VMware will have 40 days, until  July 5th, to consider alternative offers.

The news was accompanied by the publication of quarterly results by the two companies. VMware reported revenues of USD 3.1 billion for the three months to April, up 3% from a year earlier, while net profit fell over the same period to USD 242 million from USD 425 million. Subscription and SaaS revenue was up 21%, which came with a corresponding fall in licensing revenue and higher costs for managing the transition to a SaaS model.

Over the same period, Broadcom revenues grew 23% to USD 8.1 billion, better than its forecast, and net profit jumped to USD 2.6 billion from USD 1.5 billion. Infrastructure software accounted for nearly USD 1.9 billion in revenue, up 5 percent year-on-year. With free cash flow at nearly USD 4.2 billion during the quarter and cash of USD 9.0 billion at the end of April, the company said it returned USD 4.5 billion to shareholders during the quarter in the form of dividends and buybacks. It also authorised a new share buyback program for up to USD 10 billion, valid until the end of December 2023.

The transaction is important to telecom network operators for a variety of reasons. Broadcom supplies some of the core chips for cable modems and other gadgets, including those that run Wi-Fi.  Meanwhile, VMware is an active player in the burgeoning “cloud native” space, whereby network operators run their software in the cloud. Indeed, VMware supplies a core part of the cloud platform that Dish Network plans to use to run its forthcoming 5G network in the U.S.

In its press release, Broadcom explained the benefits of the transaction:

“By bringing together the complementary Broadcom Software portfolio with the leading VMware platform, the combined company will provide enterprise customers an expanded platform of critical infrastructure solutions to accelerate innovation and address the most complex information technology infrastructure needs. The combined solutions will enable customers, including leaders in all industry verticals, greater choice and flexibility to build, run, manage, connect and protect applications at scale across diversified, distributed environments, regardless of where they run: from the data center, to any cloud and to edge-computing. With the combined company’s shared focus on technology innovation and significant research and development expenditures, Broadcom will deliver compelling benefits for customers and partners.”

As Reuters reported, Broadcom doesn’t have a history of investing heavily in research and development, which could worry VMware supporters hoping to use the company to navigate the tumultuous market for cloud networking. That concern was one reason the U.S. blocked Broadcom’s attempt to acquire Qualcomm in 2018.

“VMware should take heed of Symantec and CA Technologies’ experiences following their acquisition by Broadcom. CA Technologies reportedly saw a 40% reduction in U.S. headcount and employee termination costs were also high at Symantec,” analyst David Bicknell, with research and consulting firm GlobalData, said in a statement. “VMware currently has a strong reputation for its cybersecurity capability in safeguarding endpoints, workloads, and containers. Broadcom’s best shot at making this deal work is to let profitable VMware be VMware.”

References:

https://www.vmware.com/content/dam/digitalmarketing/vmware/en/pdf/company/vmware-broadcom.pdf

https://www.lightreading.com/service-provider-cloud/broadcom-makes-massive-$61b-bid-for-vmware/d/d-id/777848?