Data Center Network Equipment
Initiatives and Analysis: Nokia focuses on data centers as its top growth market
Telco is no longer the top growth market for Nokia. Instead, it’s data centers, said Nokia’s CEO Pekka Lundmark on the company’s Q3 2024 earnings call last week. “Across Nokia, we are investing to create new growth opportunities outside of our traditional communications service provider market,” he said. “We see a significant opportunity to expand our presence in the data center market and are investing to broaden our product portfolio in IP Networks to better address this.” He added, “There will be others as well, but that will be the number one. This is obviously in the very core of our strategy.”
Lundmark said Nokia’s telco total addressable market (TAM) is €84 billion, while its data center total addressable market is currently at €20 billion. “I mean, telco TAM will never be a significant growth market,” he added to no one’s surprise.
Nokia’s recent deal to acquire fiber optics equipment vendor Infinera for $2.3 billion might help. The Finland-based company said the combination with Infinera is expected to accelerate its path to double-digit operating margins in its optical networks business unit (which was inherited from Alcatel-Lucent). The transaction (expected to close in the first half of 2025) and the recent sale of submarine networks will reshape Nokia’s Network Infrastructure business to be built around fixed networks, internet-protocol networks and optical networks, the company said. Data centers require not only GPUs but also optical networking to support their AI workloads. Lundmark said the role of optics will increase, not only in connections between data centers, but also inside data centers to connect servers to each other. “Once we get there, that market will be of extremely high volumes,” he said.
- In September, Nokia announced the availability of its Event-Driven Automation (EDA) platform for the AI era. Nokia says EDA raises the bar on data center network operations with a modern approach that builds on Kubernetes to bring highly reliable, simplified, and adaptable lifecycle management to data center networks. Aimed at driving human error in network operations to zero, the platform is intended to reduce network disruptions and application downtime while decreasing operational effort by up to 40%, according to the company.
- A highlight of the recent quarter is a September deal with self-proclaimed “AI hyperscaler” CoreWeave [1.], which selected Nokia to deploy its IP routing and optical transport equipment globally as part of its extensive backbone build-out, with immediate roll-out across its data centers in the U.S. and Europe. Raymond James analyst Simon Leopold said the CoreWeave win was good for Nokia to gain some exposure to AI, and he wondered if Nokia had a long-term strategy of evolving customers away from its telco base into more enterprise-like opportunities. “The reason why CoreWeave is so important is that they are now the leading GPU-as-a-service company,” said Lundmark. “And they have now taken pretty much our entire portfolio, both on the IP side and optical side. And as we know, AI is driving new business models, and one of the business models is clearly GPU-as-a-service,” he added.
Note 1. CoreWeave rents graphics processing units (GPUs) to artificial intelligence (AI) developers. It describes itself as a modern, Kubernetes-native cloud that is purpose-built for large-scale, GPU-accelerated workloads and designed with engineers and innovators in mind. CoreWeave claims to offer unparalleled access to a broad range of compute solutions that are up to 35x faster and 80% less expensive than legacy cloud providers.
……………………………………………………………………………………………………………………………………………………………
Nokia says its IP Interconnection can provide attractive business benefits to data center customers including:
- Improved security – Applications and services can be accessed via private direct connections to the networks of cloud providers colocated in the same facility without traversing the internet.
- Reduced transport costs – Colocated service providers, alternative network providers and carrier-neutral network operators offer a wide choice of connections to remote destinations at a lower price.
- Higher performance and lower latency – As connections are direct and are often located closer to the person or thing they are serving, there is a reduction in latency and an increase in reliability as they bypass multiple hops across the public internet.
- More control – Through network automation and via customer portals, cloud service providers can gain more control of their cloud connectivity.
- Greater flexibility – With a wider range of connectivity options, enterprises can distribute application workloads and access cloud applications and services globally to meet business demands and to gain access to new markets.
……………………………………………………………………………………………………………………………………………………………
Nokia’s Data Center Market Risks:
The uncertainty is whether spending on GPUs and optical network equipment in the data center will produce the traffic growth to justify a decent ROI for Nokia. Also, the major cloud vendors (Amazon, Google, Microsoft and Facebook) design, develop, and install their own fiber optic networks, so Nokia will likely try to sell to the new AI data center players. William Webb, an independent consultant and former executive at Ofcom, told Light Reading, “There may be substantially more traffic between data centers as models are trained but this will flow across high-capacity fiber connections which can be expanded easily if needed.” Text-based AI apps like ChatGPT generate “minuscule amounts of traffic,” he said. Video-based AI, he added, will merely substitute for the genuinely intelligent form.
References:
https://www.datacenterdynamics.com/en/news/nokia-eyes-data-center-market-growth-as-q3-sales-fall/
https://www.nokia.com/blog/enhance-cloud-services-with-high-capacity-interconnection/
https://www.lightreading.com/5g/telecom-glory-days-are-over-bad-news-for-nokia-worse-for-ericsson
AI adoption to accelerate growth in the $215 billion Data Center market
Market Overview:
Data centers are a $215bn global market that grew 18% annually between 2018 and 2023. AI adoption is expected to accelerate that growth meaningfully over the coming years, in part because AI chips require 3-4x more electrical power than traditional central processing units (CPUs).
BofA’s US semiconductor analyst, Vivek Arya, forecasts the AI chip market to reach ~$200bn in 2027, up from $44bn in 2023. This has positive implications for the broader data center industry.
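To put the forecast in annual terms, the short sketch below computes the compound annual growth rate (CAGR) implied by the two figures quoted above. The function name `cagr` is our own, and the calculation assumes smooth compounding over the four years from 2023 to 2027, which the forecast itself does not state.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate between two values over `years` years."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1 / years) - 1


# Figures quoted in the text, in $bn: $44bn in 2023 and ~$200bn in 2027.
ai_chip_cagr = cagr(44, 200, 2027 - 2023)
print(f"Implied AI chip market CAGR, 2023-2027: {ai_chip_cagr:.1%}")
```

Under those assumptions the implied rate is roughly 46% per year, well above the 18% annual growth cited for the data center market as a whole between 2018 and 2023.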
AI workloads are bandwidth-intensive, connecting hundreds of processors with gigabits of throughput. As these AI models grow, the number of GPUs required to process them grows, requiring larger networks to interconnect the GPUs. See Network Equipment market below.
The electrical and thermal equipment within a data center is sized for maximum load to ensure reliability and uptime. For electrical and thermal equipment manufacturers, AI adoption drives faster growth in data center power loads, since AI chips require 3-4x more electrical power than traditional CPUs.
BofA estimates data center capex was $215bn globally in 2023. The majority of this spend is for compute servers, networking and storage ($160bn) with data center infrastructure being an important, but smaller, piece ($55bn). For perspective, data center capex represented ~1% of global fixed capital formation, which includes all private & public sector spending on equipment and structures.
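As a quick check on the split, the sketch below turns the two BofA capex figures quoted above into shares of the total. The dictionary keys are our own labels for the two categories.

```python
# BofA's 2023 capex estimates quoted in the text, in $bn.
capex_bn = {
    "compute, networking and storage": 160,
    "data center infrastructure": 55,
}

total_bn = sum(capex_bn.values())
shares = {category: spend / total_bn for category, spend in capex_bn.items()}

print(f"Total capex: ${total_bn}bn")
for category, share in shares.items():
    print(f"  {category}: {share:.1%}")
```

The two pieces add up to the $215bn total, with IT equipment taking roughly three quarters of the spend and infrastructure about one quarter.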
Networking Equipment Market:
BofA estimates a $20bn market size for data center networking equipment. Cisco is the market share leader, with an estimated 28% share. The main product categories are:
- Ethernet switches, which carry traffic within the data center over local area networks. Typically, each rack has its own networking switch.
- Routers, which handle traffic between buildings, typically using internet protocol (IP). Some cloud service providers use “white box” networking switches (i.e., hardware manufactured to their specifications by third parties, such as Taiwanese ODMs).
Data center link speeds keep climbing. The industry has moved from 40G to 100G, and 100G is quickly giving way to 400G. Yet even 400G won’t be fast enough to support some emerging applications, which may require 800G and 1.6T data center speeds.
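To give a feel for what each speed step means in practice, the sketch below estimates how long a single link would take to move a fixed payload at each of the rates mentioned above. The 1 TB payload is a hypothetical figure chosen for illustration, and the calculation is idealized: it assumes the link runs at full line rate and ignores protocol overhead, congestion and storage limits.

```python
# Nominal line rates in Gb/s for the Ethernet speed grades named in the text.
LINK_SPEEDS_GBPS = {"40G": 40, "100G": 100, "400G": 400, "800G": 800, "1.6T": 1600}


def transfer_seconds(payload_bytes: float, link_gbps: float) -> float:
    """Idealized time to send `payload_bytes` over one link at full line rate."""
    payload_bits = payload_bytes * 8
    return payload_bits / (link_gbps * 1e9)


payload_bytes = 1e12  # hypothetical 1 TB data set (decimal terabyte)

for label, gbps in LINK_SPEEDS_GBPS.items():
    print(f"{label:>5}: {transfer_seconds(payload_bytes, gbps):6.1f} s")
```

Each step up the ladder cuts the idealized transfer time proportionally, from 200 seconds at 40G to 5 seconds at 1.6T for this payload.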
…………………………………………………………………………………………………………………………………….
Data Centers are also a bright spot for the construction industry. BofA notes that construction spending for data centers is approaching $30bn (vs $2bn in 2014) and accounts for nearly 21% of data center capex. At 4% of private construction spending (vs 2% five years ago), the data center category has surpassed retail, and could be a partial offset in a construction downturn.
Source: BofA Global Research
………………………………………………………………………………………………………………………..
References:
https://www.belden.com/blogs/smart-building/faster-data-center-speeds-depend-on-fiber-innovation#
Proposed solutions to high energy consumption of Generative AI LLMs: optimized hardware, new algorithms, green data centers
Nvidia enters Data Center Ethernet market with its Spectrum-X networking platform
Co-Packaged Optics to play an important role in data center switches
EdgeCore Digital Infrastructure and Zayo bring fiber connectivity to Santa Clara data center
Deutsche Telekom with AWS and VMware demonstrate a global enterprise network for seamless connectivity across geographically distributed data centers
Nvidia enters Data Center Ethernet market with its Spectrum-X networking platform
Nvidia is planning a big push into the Data Center Ethernet market. CFO Colette Kress said the Spectrum-X Ethernet-based networking solution it launched in May 2023 is “well on track to begin a multi-billion-dollar product line within a year.” The Spectrum-X platform includes Ethernet switches, optics, cables and network interface cards (NICs). Nvidia already has a multi-billion-dollar play in this space in the form of its Ethernet NIC product. Kress said during Nvidia’s earnings call that “hundreds of customers have already adopted the platform” and that Nvidia plans to “launch new Spectrum-X products every year to support demand for scaling compute clusters from tens of thousands of GPUs today to millions of GPUs in the near future.”
- With Spectrum-X, Nvidia will be competing with Arista, Cisco, and Juniper at the system level along with “bare metal switches” from Taiwanese ODMs running DriveNets network cloud software.
- With respect to high performance Ethernet switching silicon, Nvidia competitors include Broadcom, Marvell, Microchip, and Cisco (which uses Silicon One internally and also sells it on the merchant semiconductor market).
…………………………………………………………………………………………………………………………………………………………………………..
In November 2023, Nvidia said it would work with Dell Technologies, Hewlett Packard Enterprise and Lenovo to incorporate Spectrum-X capabilities into their compute servers. Nvidia is now targeting tier-2 cloud service providers and enterprise customers looking for bundled solutions.
Dell’Oro Group VP Sameh Boujelbene told Fierce Network that “Nvidia is positioning Spectrum-X for AI back-end network deployments as an alternative fabric to InfiniBand. While InfiniBand currently dominates AI back-end networks with over 80% market share, Ethernet switches optimized for AI deployments have been gaining ground very quickly.” Boujelbene added Nvidia’s success with Spectrum-X thus far has largely been driven “by one major 100,000-GPU cluster, along with several smaller deployments by Cloud Service Providers.” By 2028, Boujelbene said Dell’Oro expects Ethernet switches to surpass InfiniBand for AI in the back-end network market, with revenues exceeding $10 billion.
………………………………………………………………………………………………………………………………………………………………………………
In a recent IEEE Techblog post we wrote:
While InfiniBand currently has the edge in the data center networking market, several factors point to increased Ethernet adoption for AI clusters in the future. Recent innovations are addressing Ethernet’s shortcomings compared to InfiniBand:
- Lossless Ethernet technologies
- RDMA over Converged Ethernet (RoCE)
- Ultra Ethernet Consortium’s AI-focused specifications
Some real-world tests have shown Ethernet offering up to 10% improvement in job completion performance across all packet sizes compared to InfiniBand in complex AI training tasks. By 2028, it’s estimated that (1) 45% of generative AI workloads will run on Ethernet (up from <20% now) and (2) 30% will run on InfiniBand (up from <20% now).
………………………………………………………………………………………………………………………………………………………………………………
References:
https://www.fierce-network.com/cloud/data-center-ethernet-nvidias-next-multi-billion-dollar-business
https://www.nvidia.com/en-us/networking/spectrumx/
Will AI clusters be interconnected via Infiniband or Ethernet: NVIDIA doesn’t care, but Broadcom sure does!
Data Center Networking Market to grow at a CAGR of 6.22% during 2022-2027 to reach $35.6 billion by 2027
LightCounting: Optical Ethernet Transceiver sales will increase by 40% in 2024
AI winner Nvidia faces competition with new super chip delayed
The Clear AI Winner Is: Nvidia!
Strong AI spending should help Nvidia make its own ambitious numbers when it reports earnings at the end of the month (its 2Q-2024 ended July 31st). Analysts are expecting nearly $25 billion in data center revenue for the July quarter, about what that business was generating annually a year ago. But the latest results won’t quell the growing concern investors have with the pace of AI spending among the world’s largest tech giants, and how it will eventually pay off.
In March, Nvidia unveiled its Blackwell chip series, succeeding its earlier flagship AI chip, the GH200 Grace Hopper Superchip, which was designed to speed generative AI applications. The NVIDIA GH200 NVL2 fully connects two GH200 Superchips with NVLink, delivering up to 288GB of high-bandwidth memory, 10 terabytes per second (TB/s) of memory bandwidth, and 1.2TB of fast memory. The GH200 NVL2 offers up to 3.5X more GPU memory capacity and 3X more bandwidth than the NVIDIA H100 Tensor Core GPU in a single server for compute- and memory-intensive workloads. The GH200 meanwhile combines an H100 chip [1.] with an Arm CPU and more memory.
Note 1. The Nvidia H100 sits in a 10.5-inch graphics card, which is then bundled into a server rack alongside dozens of other H100 cards to create one massive data center computer.
This week, Nvidia informed Microsoft and another major cloud service provider of a delay in the production of its most advanced AI chip in the Blackwell series, The Information reported, citing a Microsoft employee and another person with knowledge of the matter.
…………………………………………………………………………………………………………………………………………
Nvidia Competitors Emerge – but are their chips ONLY for internal use?
In addition to AMD, Nvidia has several big tech competitors that are currently not in the merchant market semiconductor business. These include:
- Huawei has developed the Ascend series of chips to rival Nvidia’s AI chips, with the Ascend 910B chip as its main competitor to Nvidia’s A100 GPU chip. Huawei is the second largest cloud services provider in China, just behind Alibaba and ahead of Tencent.
- Microsoft has unveiled an AI chip called the Azure Maia AI Accelerator, optimized for artificial intelligence (AI) tasks and generative AI as well as the Azure Cobalt CPU, an Arm-based processor tailored to run general purpose compute workloads on the Microsoft Cloud.
- Last year, Meta announced it was developing its own AI hardware. This past April, Meta announced its next generation of custom-made processor chips designed for their AI workloads. The latest version significantly improves performance compared to the last generation and helps power their ranking and recommendation ads models on Facebook and Instagram.
- Also in April, Google revealed the details of a new version of its data center AI chips and announced an Arm-based central processor. Google’s 10-year-old Tensor Processing Units (TPUs) are one of the few viable alternatives to the advanced AI chips made by Nvidia, though developers can only access them through Google’s Cloud Platform and cannot buy them directly.
As demand for generative AI services continues to grow, it’s evident that GPU chips will be the next big battleground for AI supremacy.
References:
AI Frenzy Backgrounder; Review of AI Products and Services from Nvidia, Microsoft, Amazon, Google and Meta; Conclusions
https://www.nvidia.com/en-us/data-center/grace-hopper-superchip/
https://www.theverge.com/2024/2/1/24058186/ai-chips-meta-microsoft-google-nvidia/archives/2
https://news.microsoft.com/source/features/ai/in-house-chips-silicon-to-service-to-meet-ai-demand/
Light Source Communications Secures Deal with Major Global Hyperscaler for Fiber Network in Phoenix Metro Area
Light Source Communications is building a 140-mile fiber middle-mile network in the Phoenix, AZ metro area, covering nine cities: Phoenix, Mesa, Tempe, Chandler, Gilbert, Queen Creek, Avondale, Coronado and Cashion. The company already has a major hyperscaler as the first anchor tenant.
There are currently 70 existing and planned data centers in the area that Light Source will serve. As one might expect, the increase in data centers stems from the boom in artificial intelligence (AI).
The network will include a big ring, which will be divided into three separate rings. In total, Light Source will be deploying 140 miles of fiber. The company has partnered with engineering and construction provider Future Infrastructure LLC, a division of Primoris Services Corp., to make it happen.
“I would say that AI happens to be blowing up our industry, as you know. It’s really in response to the amount of data that AI is demanding,” said Debra Freitas [1.], CEO of Light Source Communications (LSC).
Note 1. Debra Freitas has led LSC since co-founding it in 2014. The company owns and operates a network with a global OTT provider as a customer. She developed key customer relationships and secured funding for growth, and she currently sits on the Executive Board of Incompas.
……………………………………………………………………………………………………..
Light Source plans for the entire 140-mile route to be underground. It’s currently working with the city councils and permitting departments of the nine cities as it goes through its engineering and permit approval processes. Freitas said the company expects to receive approvals from all the city councils and to begin construction in the third quarter of this year, concluding by the end of 2025.
Primoris delivers a range of specialty construction services to the utility, energy, and renewables markets throughout the United States and Canada. Its communications business is a leading provider of critical infrastructure solutions, including program management, engineering, fabrication, replacement, and maintenance. With over 12,700 employees, Primoris had revenue of $5.7 billion in 2023.
“We’re proud to partner with Light Source Communications on this impactful project, which will exceed the growing demands for high-capacity, reliable connectivity in the Phoenix area,” said Scott Comley, president of Primoris’ communications business. “Our commitment to innovation and excellence is well-aligned with Light Source’s cutting-edge solutions and we look forward to delivering with quality and safety at the forefront.”
Light Source is a carrier-neutral owner-operator of networks serving enterprises throughout the U.S. In addition to Phoenix, several new dark fiber routes are in development in major markets throughout the Central and Western United States. For more information about Light Source Communications, go to lightsourcecom.net.
The city councils in the Phoenix metro area have been pretty busy with fiber-build applications the past couple of years because the area is also a hotbed for companies building fiber-to-the-premises (FTTP) networks. In 2022 the Mesa City Council approved four different providers to build fiber networks. AT&T and BlackRock have said their joint venture would also start deploying fiber in Mesa.
Light Source is focusing on middle-mile, rather than FTTP because that’s where the demand is, according to Freitas. “Our route is a unique route, meaning there are no other providers where we’re going. We have a demand for the route we’re putting in,” she noted.
The company says it already has “a major, global hyperscaler” anchor tenant, but it won’t divulge who that tenant is. Its network will also touch Arizona State University at Tempe and the University of Arizona.
Light Source doesn’t light any of the fiber it deploys. Rather, it is carrier neutral and sells the dark fiber to customers who light it themselves and who may resell it to their own customers.
Light Source began operations in 2014 and is backed by private equity. It did not receive any federal grants for the new middle-mile network in Arizona.
………………………………………………………………………………………………………..
Bill Long, Zayo’s chief product officer, told Fierce Telecom recently that data centers are preparing for an onslaught of demand for more compute power, which will be needed to handle AI workloads and train new AI models.
…………………………………………………………………………………………………………
About Light Source Communications:
Light Source Communications (LSC) is a carrier-neutral, customer-agnostic provider of secure, scalable, reliable connectivity on a state-of-the-art dark fiber network. The immense amounts of data that businesses need to compete in today’s global market require access to an enhanced fiber infrastructure that allows them to control their data. With over 120 years of telecom experience, LSC offers an owner-operated network for U.S. businesses to succeed here and abroad. LSC is uniquely positioned and highly qualified to build the next generation of dark fiber routes across North America, providing the key connections for business today and tomorrow.
References:
https://www.lightsourcecom.net/services/
https://www.fiercetelecom.com/ai/ai-demand-spurs-light-source-build-middle-mile-network-phoenix
Proposed solutions to high energy consumption of Generative AI LLMs: optimized hardware, new algorithms, green data centers
AI sparks huge increase in U.S. energy consumption and is straining the power grid; transmission/distribution as a major problem
AI Frenzy Backgrounder; Review of AI Products and Services from Nvidia, Microsoft, Amazon, Google and Meta; Conclusions
CoreSite Enables 50G Multi-cloud Networking with Enhanced Virtual Connections to Oracle Cloud Infrastructure FastConnect
Note 1. CoreSite is a subsidiary of American Tower Corporation and a member of Oracle PartnerNetwork (OPN).
Note 2. Oracle FastConnect enables customers to bypass the public internet and connect directly to Oracle Cloud Infrastructure and other Oracle Cloud services. With connectivity available at CoreSite’s data centers, FastConnect provides a flexible, economical private connection to higher bandwidth options for your hybrid cloud architecture. Oracle FastConnect is accessible at CoreSite’s data center facilities in Northern Virginia and Los Angeles through direct fiber connectivity. FastConnect is also available via the CoreSite Open Cloud Exchange® in seven CoreSite markets, including Los Angeles, Silicon Valley, Denver, Chicago, New York, Boston and Northern Virginia.
The integration of Oracle FastConnect and the CoreSite Open Cloud Exchange offers on-demand, virtual connectivity and access to best in class, end-to-end, fully redundant connection architecture.
…………………………………………………………………………………………………………………………………………………………………………………………………………..
The connectivity of FastConnect and the OCX can offer customers deploying artificial intelligence (AI) and data-intensive applications the ability to transfer large datasets securely and rapidly from their network edge to machine learning (ML) models and big data platforms running on OCI. With the launch of the new OCX capabilities to FastConnect, businesses can gain greater flexibility to provision on-demand, secure bandwidth to OCI with virtual connections of up to 50 Gbps.
With OCI, customers benefit from best-in-class security, consistent high performance, simple predictable pricing, and the tools and expertise needed to bring enterprise workloads to cloud quickly and efficiently. In addition, OCI’s distributed cloud offers multicloud, hybrid cloud, public cloud, and dedicated cloud options to help customers harness the benefits of cloud with greater control over data residency, locality, and authority, even across multiple clouds. As a result, customers can bring enterprise workloads to the cloud quickly and efficiently while meeting the strictest regulatory compliance requirements.
“The digital world requires faster connections to deploy complex, data-intense workloads. The simplified process offered through the Open Cloud Exchange enables businesses to rapidly scale network capacity between the enterprise edge and cloud providers,” said Juan Font, President and CEO of CoreSite, and SVP of U.S. Tower. “These enhanced, faster connections with FastConnect can provide businesses with a competitive advantage by ensuring near-seamless and reliable data transfers at massive scale for real-time analysis and rapid data processing.”
OCI’s extensive network of more than 90 FastConnect global and regional partners offers customers dedicated connectivity to Oracle Cloud Regions and OCI services, providing customers with the best options anywhere in the world. OCI is a deep and broad platform of cloud infrastructure services that enables customers to build and run a wide range of applications in a scalable, secure, highly available, and high-performance environment. From application development and business analytics to data management, integration, security, AI, and infrastructure services including Kubernetes and VMware, OCI delivers unmatched security, performance, and cost savings.
The new Open Cloud Exchange capabilities on FastConnect will be available in Q4 2023.
About CoreSite:
CoreSite, an American Tower company (NYSE: AMT), provides hybrid IT solutions that empower enterprises, cloud, network, and IT service providers to monetize and future-proof their digital business. Our highly interconnected data center campuses offer a native digital supply chain featuring direct cloud onramps to enable our customers to build customized hybrid IT infrastructure and accelerate digital transformation. For more than 20 years, CoreSite’s team of technical experts has partnered with customers to optimize operations, elevate customer experience, dynamically scale, and leverage data to gain competitive edge. For more information, visit CoreSite.com and follow us on LinkedIn and Twitter.
References:
IEEE Santa Clara Valley (SCV) Lecture and Tour of CoreSite Multi-Tenant Data Center
https://www.coresite.com/cloud-networking/oracle-fastconnect
Using a distributed synchronized fabric for parallel computing workloads- Part I
by Run Almog, Head of Product Strategy, DriveNets (edited by Alan J Weissberger)
Introduction:
Different use cases call for different networking attributes. An endpoint can be the source of a service delivered over the internet, or a handheld device streaming live video from anywhere on the planet. Between the endpoints sit network vertices (nodes) that carry this continuous, ever-growing traffic flow to its destination. They also maintain knowledge of the network’s topology, apply service level assurance, handle interruptions and failures, and provide a wide range of additional functions that ultimately enable a network service to operate.
This two-part article focuses on the use case of running artificial intelligence (AI) and/or high-performance computing (HPC) applications, and on the networking aspects that result. The HPC industry is now integrating AI and HPC, improving support for AI use cases. HPC has been successfully used to run large-scale AI models in fields like cosmic theory, astrophysics, high-energy physics, and data management for unstructured data sets.
In this Part I article, we examine: HPC/AI workloads, disaggregation in data centers, role of the Open Compute Project, telco data center networking, AI clusters and AI networking.
HPC/AI Workloads, High Performance Compute Servers, Networking:
HPC/AI workloads are applications that run over an array of high performance compute servers. Those servers typically host a dedicated computation engine such as a GPU, FPGA or other accelerator, in addition to a high performance CPU (which can itself act as a compute engine) and some storage capacity, typically a high-speed SSD. The HPC/AI application does not run on one specific server but on multiple servers simultaneously. This can range from a single machine or a few servers to thousands of machines, all operating in sync and running the same application, which is distributed amongst them.
The interconnect (networking) between these computation machines needs to allow any-to-any connectivity between all machines running the same application. It must also cater for the different traffic patterns associated with the type of application and with the stages of the application’s run. An interconnect solution for HPC/AI is therefore different from a network built to connect residential households or mobile users, and also different from a typical data center network built to serve an array of servers that answer queries from multiple users.
Disaggregation in Data Centers (DCs):
Disaggregation has been used successfully to solve challenges in cloud resident data centers. The Open Compute Project (OCP) has generated open source hardware and software for this purpose. The OCP community includes hyperscale data center operators and industry players, telcos, colocation providers and enterprise IT users, working with vendors to develop and commercialize open innovations that, when embedded in products, are deployed from the cloud to the edge.
High-performance computing (HPC) is a term used to describe computer systems capable of performing complex calculations at exceptionally high speeds. HPC systems are often used for scientific research, engineering simulations and modeling, and data analytics. The term high performance refers to both speed and efficiency. HPC systems are designed for tasks that require large amounts of computational power so that they can perform these tasks more quickly than other types of computers. They also consume less energy than traditional computers, making them better suited for use in remote locations or environments with limited access to electricity.
HPC clusters commonly run batch calculations. At the heart of an HPC cluster is a scheduler used to keep track of available resources. This allows for efficient allocation of job requests across different compute resources (CPUs and GPUs) over high-speed networks. Several HPC clusters have integrated Artificial Intelligence (AI).
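The scheduler’s bookkeeping role can be illustrated with a minimal sketch. It assumes a simple first-fit policy over single-node jobs and tracks only free GPUs per node; the `Node`, `Job` and `schedule` names and the example cluster are hypothetical. Production HPC schedulers also handle priorities, multi-node jobs, queues and preemption, none of which is modeled here.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Node:
    name: str
    free_gpus: int  # resources the scheduler believes are still available


@dataclass
class Job:
    name: str
    gpus: int  # GPUs requested by the job


def schedule(jobs: List[Job], nodes: List[Node]) -> Dict[str, Optional[str]]:
    """First-fit placement: each job goes to the first node with enough free GPUs.

    Jobs that fit nowhere are mapped to None, meaning they stay queued.
    """
    placements: Dict[str, Optional[str]] = {}
    for job in jobs:
        placements[job.name] = None
        for node in nodes:
            if node.free_gpus >= job.gpus:
                node.free_gpus -= job.gpus  # update the resource ledger
                placements[job.name] = node.name
                break
    return placements


# Hypothetical two-node cluster and three batch jobs.
nodes = [Node("node-a", free_gpus=4), Node("node-b", free_gpus=8)]
jobs = [Job("train", gpus=8), Job("infer", gpus=2), Job("etl", gpus=4)]
placements = schedule(jobs, nodes)
print(placements)
```

In this example the 8-GPU job lands on the larger node, the 2-GPU job on the smaller one, and the last job stays queued because neither node has four GPUs left.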
While hyperscale, cloud resident data centers and HPC/AI clusters have a lot of similarities between them, the solution used in hyperscale data centers is falling short when trying to address the additional complexity imposed by the HPC/AI workloads.
Large data center implementations may scale to thousands of connected compute servers. Those servers run an array of different applications, and traffic patterns shift between east/west (inside the data center) and north/south (in and out of the data center). Despite this variety, every such application manages its own delivery, so the network does not need to guarantee delivery of packets to and from application endpoints; packet loss is handled with standards-based retransmission or buffering of traffic.
An HPC/AI workload, on the other hand, is measured by how fast a job is completed, and it interfaces to machines, so latency and accuracy become critical factors. A delayed or lost packet, with or without the resulting retransmission of that packet, has a huge impact on the application’s measured performance. In the HPC/AI world, it is the responsibility of the interconnect to make sure such mishaps do not happen, while the application simply “assumes” that it is getting all the information “on-time” and “in-synch” with all the other endpoints it shares the workload with.
More about how data centers use disaggregation and how it benefits HPC/AI in the second part of this article (Part II).
Telco Data Center Networking:
Telco data centers/central offices have traditionally been less receptive to disaggregated solutions than hyperscale, cloud resident data centers. They are characterized by large, monolithic, chassis-based and vertically integrated routers. Every such router is well-structured and is in fact a scheduled machine built to carry packets between every group of ports at a constant latency and without losing any packet. A chassis-based router would potentially be a valid solution for HPC/AI workloads if it could be built at a scale of thousands of ports and be distributed throughout a warehouse with ~100 racks filled with servers.
However, some tier 1 telcos, like AT&T, use disaggregated core routing via white box switch/routers and DriveNets Network Cloud (DNOS) software. AT&T’s open disaggregated core routing platform was carrying 52% of the network operator’s traffic at the end of 2022, according to Mike Satterlee, VP of AT&T’s Network Core Infrastructure Services. The company says it is now exploring a path to scale the system to 500Tbps and then expand to 900Tbps.
“Being entrusted with AT&T’s core network traffic – and delivering on our performance, reliability and service availability commitments to AT&T– demonstrates our solution’s strengths in meeting the needs of the most demanding service providers in the world,” said Ido Susan, DriveNets founder and CEO. “We look forward to continuing our work with AT&T as they continue to scale their next-gen networks.”
Satterlee said AT&T is running a nearly identical architecture in its core and edge environments, though the edge system runs Cisco’s disaggregated software. Cisco and DriveNets have been active parts of AT&T’s disaggregation process, though DriveNets’ earlier push provided it with more maturity compared to Cisco.
“DriveNets really came in as a disruptor in the space,” Satterlee said. “They don’t sell hardware platforms. They are a software-based company and they were really the first to do this right.”
AT&T began running some of its network backbone on DriveNets core routing software in September 2020. The vendor at that time said it expected to be supporting all of AT&T’s traffic through its system by the end of 2022.
Attributes of an AI Cluster:
Artificial intelligence is a general term for the ability of computers to run logic that emulates the thinking patterns of a biological brain. Humanity has yet to understand how a biological brain behaves: how memories are stored and accessed, why different people have different capacities and/or memory malfunctions, how conclusions are deduced and why they differ between individuals, and how actions are decided in a split second. All this and more is observed by science but not understood to a level where it can be related to an explicit cause.
As compute capacity evolved, it became possible to create a computing function that can factor in large data sets. The field of AI focuses on identifying such data sets and their resulting outcomes in order to educate the compute function with as many conclusion points as possible. The compute function is then required to identify patterns within these data sets to predict the outcome of new data sets which it has not encountered before. This is not the most accurate description of what AI is (it is a lot more than this), but it is sufficient to explain why networks built to run AI workloads differ from regular data center networks, as mentioned earlier.
Some example attributes of AI networking are listed here:
- Parallel computing – AI workloads run on a unified infrastructure of multiple machines running the same application and the same computation task
- Size – such a task can span thousands of compute engines (e.g., GPU, CPU, FPGA, etc.)
- Job types – tasks vary in their size, the duration of the run, the size and number of data sets they need to consider, the type of answer they need to generate, etc. This, together with the different languages used to code the application and the type of hardware it runs on, contributes to a growing variance of traffic patterns within a network built for running AI workloads
- Latency & Jitter – some AI workloads produce a response which is awaited by a user. Job completion time is a key factor for user experience in such cases, which makes latency important. However, since such parallel workloads run over multiple machines, the latency is dictated by the slowest machine to respond. This means that while latency is important, jitter (or latency variation) contributes just as much to achieving the required job completion time
- Lossless – following on the previous point, a response arriving late delays the entire application. Whereas in a traditional data center a dropped message results in retransmission (which is often not even noticed), in an AI workload a dropped message means that the entire computation is either wrong or stuck. It is for this reason that networks running AI workloads require lossless behavior. IP networks are lossy by nature, so for an IP network to behave as lossless, certain additions need to be applied. This will be discussed in a follow-up to this paper.
- Bandwidth – AI data sets are large, so a high volume of traffic needs to run in and out of servers for the application to feed on. AI and other high performance computing functions are reaching interface speeds of 400Gbps per compute engine in modern deployments.
The narrowed down conclusion from these attributes is that a network purposed to run AI workloads differs from a traditional data center network in that it needs to operate “in-synch.”
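The latency, jitter and lossless attributes can be made concrete with a small sketch. The Python below models a synchronized job in which every step finishes only when the slowest worker's data has arrived. All figures (four workers, three steps, 50 µs of compute, a 200 µs retransmission penalty) are hypothetical values chosen for illustration, not measurements.

```python
def step_time(compute_us, latencies_us):
    """A synchronized step ends only when the slowest worker's data arrives."""
    return compute_us + max(latencies_us)


def job_completion_time(compute_us, per_step_latencies):
    """Sum of step times over all steps of the job."""
    return sum(step_time(compute_us, lat) for lat in per_step_latencies)


COMPUTE_US = 50  # hypothetical compute time per step, in microseconds

# Two fabrics with the SAME mean network latency per step (10 us)...
steady = [[10, 10, 10, 10]] * 3   # no jitter
jittery = [[4, 6, 8, 22]] * 3     # same mean, high variation

# ...and a lossy fabric where one dropped message in step 2 costs
# an assumed 200 us retransmission penalty for a single worker.
lossy = [[10, 10, 10, 10], [10, 10, 10, 210], [10, 10, 10, 10]]

steady_jct = job_completion_time(COMPUTE_US, steady)
jittery_jct = job_completion_time(COMPUTE_US, jittery)
lossy_jct = job_completion_time(COMPUTE_US, lossy)

print("steady :", steady_jct, "us")
print("jittery:", jittery_jct, "us")
print("lossy  :", lossy_jct, "us")
```

Under these assumptions the jittery fabric is slower than the steady one even though its average latency is identical, and a single dropped message dominates the whole job, which is why the tail rather than the mean is what matters for such workloads.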
There are several such “in-synch” solutions available. The main options are: Chassis based solutions, Standalone Ethernet solutions, and proprietary locked solutions. Their key advantages and deficiencies will be briefly described in our Part II article.
Conclusions:
There are a few differences between AI and HPC workloads, and these translate into differences in the interconnect used to build such massive computation machines.
While the HPC market finds proprietary implementations of interconnect solutions acceptable for building secluded supercomputers for specific uses, the AI market requires solutions that allow more flexibility in their deployment and vendor selection.
AI workloads have a greater variance of consumers of outputs from the compute cluster, which makes job completion time the primary metric for measuring the efficiency of the interconnect. However, unlike HPC where faster is always better, some AI consumers will only detect improvements up to a certain level, which gives interconnect jitter a higher impact than latency.
Traditional approaches work reasonably well up to the scale of a single machine (either standalone or chassis) but fail to scale beyond a single interconnect machine while keeping the performance required to satisfy the running workloads. Further conclusions and merits of the possible solutions will be discussed in a follow-up article.
………………………………………………………………………………………………………………………………………………………………………………..
About DriveNets:
DriveNets is a fast-growing software company that builds networks like clouds. It offers communications service providers and cloud providers a radical new way to build networks, detaching network growth from network cost and increasing network profitability.
DriveNets Network Cloud uniquely supports the complete virtualization of network and compute resources, enabling communication service providers and cloud providers to meet increasing service demands much more efficiently than with today’s monolithic routers. DriveNets’ software runs over standard white-box hardware and can easily scale network capacity by adding additional white boxes into physical network clusters. This unique disaggregated network model enables the physical infrastructure to operate as a shared resource that supports multiple networks and services. This network design also allows faster service innovation at the network edge, supporting multiple service payloads, including latency-sensitive ones, over a single physical network edge.
References:
https://drivenets.com/resources/events/nfdsp1-drivenets-network-cloud-and-serviceagility/
https://www.run.ai/guides/hpc-clusters/hpc-and-ai
https://drivenets.com/news-and-events/press-release/drivenets-network-cloud-now-carries-more-than-52-of-atts-core-production-traffic/
https://techblog.comsoc.org/2023/01/27/att-highlights-5g-mid-band-spectrum-att-fiber-gigapower-joint-venture-with-blackrock-disaggregation-traffic-milestone/
AT&T Deploys Dis-Aggregated Core Router White Box with DriveNets Network Cloud software
DriveNets Network Cloud: Fully disaggregated software solution that runs on white boxes
Equinix to deploy Nokia’s IP/MPLS network infrastructure for its global data center interconnection services
Today, Nokia announced that Equinix will deploy a new Nokia IP/MPLS network infrastructure to support its global interconnection services. As one of the largest data center and colocation providers, Equinix currently runs services on multiple networks from multiple vendors. With the new network, Equinix will be able to consolidate into one, efficient web-scale infrastructure to provide FP4-powered connectivity to all data centers – laying the groundwork for customers to deploy 5G networks and services.
Muhammad Durrani, Director of IP Architecture for Equinix, said, “We see tremendous opportunity in providing our customers with 5G services, but this poses special demands for our network, from ultra-low latency to ultra broadband performance, all with business- and mission-critical reliability. Nokia’s end-to-end router portfolio will provide us with the highly dynamic and programmable network fabric we need, and we are pleased to have the support of the Nokia team every step of the way.”
“We’re pleased to see Nokia getting into the data center networking space and applying the same rigor to developing a next-generation open and easily extendible data center network operating system while leveraging its IP routing stack that has been proven in networks globally. It provides a platform that network operations teams can easily adapt and build applications on, giving them the control they need to move fast.”
Sri Reddy, Co-President of IP/Optical Networks, Nokia, said, “We are working closely with Equinix to help advance its network and facilitate the transformation and delivery of 5G services. Our end-to-end portfolio was designed precisely to support this industrial transformation with a highly flexible, scalable and programmable network fabric that will be the ideal platform for 5G in the future. It is exciting to work with Equinix to help deliver this to its customers around the world.”
With an end-to-end portfolio, including the Nokia FP4-powered routing family, Nokia is working in partnership with operators to deliver real 5G. The FP4 chipset is the industry’s leading network processor for high-performance routing, setting the bar for density and scale. Paired with Nokia’s Service Router Operating System (SR OS) software, it will enable Equinix to offer additional capabilities driven by routing technologies such as Ethernet VPNs (EVPNs) and segment routing (SR).
Image Credit: Nokia
……………………………………………………………………………………………………………………………………………………………………………….
This latest deal comes just two weeks after Equinix said it will host Nokia’s Worldwide IoT Network Grid (WING) service in its data centers. WING is an Infrastructure-as-a-Service offering that provides low latency and global reach to businesses, hastening their deployment of IoT and their use of solutions offered by the edge and cloud.
Equinix operates more than 210 data centers across 55 markets. It is unclear which of these data centers will first offer Nokia’s services and when WING will be available to customers.
“Nokia needed access to multiple markets and ecosystems to connect to NSPs and enterprises who want a play in the IoT space,” said Jim Poole, VP at Equinix. “By directly connecting to Nokia WING, mobile network operators can capture business value across IoT, AI, and security, with a connectivity strategy to support business transformation.”
…………………………………………………………………………………………………………………………………………………………..
About Nokia:
We create the technology to connect the world. Only Nokia offers a comprehensive portfolio of network equipment, software, services and licensing opportunities across the globe. With our commitment to innovation, driven by the award-winning Nokia Bell Labs, we are a leader in the development and deployment of 5G networks.
Our communications service provider customers support more than 6.4 billion subscriptions with our radio networks, and our enterprise customers have deployed over 1,300 industrial networks worldwide. Adhering to the highest ethical standards, we transform how people live, work and communicate. For our latest updates, please visit us online www.nokia.com and follow us on Twitter @nokia.
Resources:
- Webpage: Nokia 7750 SR-s
- Webpage: Nokia FP4 silicon
- Webpage: Nokia Service Router Operating System (SR OS)
- Webpage: Nokia Network Services Platform
Dell’Oro: Data Center Switch market declined 9% YoY; SD-WAN market increased at slower rate than in 2019
Market research firm Dell’Oro Group reported today that the worldwide Data Center Switch market recorded its first decline in nine years, dropping 9 percent year-over-year in the first quarter. 1Q 2020 revenue level was also the lowest in three years. The softness was broad-based across all major branded vendors, except Juniper Networks and white box vendors. Revenue from white box vendors was propelled mainly by strong demand from Google and Amazon.
“The COVID-19 pandemic has created some positive impact on the market as some customers pulled in orders in anticipation of supply shortage and elongated lead times,” said Sameh Boujelbene, Senior Director at Dell’Oro Group. “Yet this upside dynamic was more than offset by the pandemic’s more pronounced negative impact on customer demand as they paused purchases due to macro-economic uncertainties. Supply constraints were not major headwinds during the first quarter but expected to become more apparent in the next quarter,” added Boujelbene.
Additional highlights from the 1Q 2020 Ethernet Switch – Data Center Report:
- The revenue decline was broad-based across all regions but was less pronounced in North America.
- We expect revenue in the market to decline high single-digit in 2020, despite some pockets of strength from certain segments.
The Dell’Oro Group Ethernet Switch – Data Center Quarterly Report offers a detailed view of the market, including Ethernet switches for server access, server aggregation, and data center core. (Software is addressed separately.) The report contains in-depth market and vendor-level information on manufacturers’ revenue; ports shipped; average selling prices for both Modular and Fixed Managed and Unmanaged Ethernet Switches (1000 Mbps, 10, 25, 40, 50, 100, 200, and 400 GE); and regional breakouts.
…………………………………………………………………………………………………………………………………………………
Separately, Dell’Oro Group reported that the market for software-defined (SD)-WAN equipment increased by 24% in the first quarter (year-to-year), which was significantly below the 64% growth seen in 2019. Citing supply chain issues created by the coronavirus pandemic, the market research firm’s Shin Umeda predicted the market will post double-digit growth in 2020 despite “macroeconomic uncertainty.”
- Supply chain disruptions accounted for the majority of the Service Provider (SP) Router and CES Switch market decline in 1Q 2020.
- The SP Router and CES market in China showed a modest decline in 1Q 2020, but upgrades for 5G infrastructure are expected to drive strong demand over the rest of 2020.
Omdia: High-speed data-center Ethernet adapter market at $1.7 billion in 2019
Executive Summary:
The market for Ethernet adapters with speeds of 25 gigabits (25GE) and faster deployed by enterprises, cloud service providers and telecommunication network providers at data centers topped $1 billion for the first time in 2019, according to Omdia.
The total Ethernet adapter market size stood at $1.7 billion for the year. This result was in line with Omdia’s long term server and storage connectivity forecast. Factors driving that forecast include the growth in data sets, such as those computed by analytics algorithms looking for patterns, and the adoption of new software technologies like AI and ML which must examine large data sets to be effective, driving larger movement of data.
“Server virtualization and containerization reached new highs in 2019 and drove up server utilization. This increased server connectivity bandwidth requirements, and the need for higher speed Ethernet adapters,” said Vlad Galabov, principal analyst for data center IT at Omdia. “The popularization of data-intensive workloads, like analytics and AI, were also strong drivers for higher speed adapters in 2019.”
25GE Ethernet adapters represented more than 25 percent of total data-center Ethernet adapter ports and revenue in 2019, as reported by Omdia’s Ethernet Network Adapter Equipment Market Tracker. Omdia also found that the price of each 25GE port continued to decline. A single 25GE port cost an average of $81 in 2019, a decrease of $9 from 2018.
Despite representing a small portion of the market, 100GE Ethernet adapters are increasingly deployed by cloud service providers and enterprises running high-performance computing clusters. Shipments and revenue for 100GE Ethernet adapter ports both grew by more than 60 percent in 2019. Each 100GE adapter port is also becoming more affordable. In 2019, an individual 100GE Ethernet adapter port cost $321 on average, a decrease of $34 from 2018.
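The per-port figures quoted above can be turned into percentage declines and a cost per gigabit with a few lines of arithmetic. The Python sketch below uses only the prices and year-over-year decreases stated in the text; the 2018 prices and the cost-per-Gbps values are derived here, not reported by Omdia.

```python
# Figures from the text: (speed in Gbps, 2019 average price in USD,
# decrease from 2018 in USD).
ports = {
    "25GE": (25, 81, 9),
    "100GE": (100, 321, 34),
}


def port_economics(speed_gbps, price_2019, decline):
    """Derive the implied 2018 price, percentage decline and cost per Gbps."""
    price_2018 = price_2019 + decline
    return {
        "price_2018": price_2018,
        "decline_pct": round(100 * decline / price_2018, 1),
        "usd_per_gbps_2019": round(price_2019 / speed_gbps, 2),
    }


economics = {name: port_economics(*values) for name, values in ports.items()}
for name, result in economics.items():
    print(name, result)
```

On these numbers both port speeds fell by roughly a tenth in a year, and the derived 2019 cost per gigabit is almost the same for 25GE and 100GE ports.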
“Cloud service providers (CSPs) are leading the transition to faster networks as they run multi-tenant servers with a large number of virtual machines and/or containers per server. This is driving high traffic and bandwidth needs,” Galabov said. “Omdia expects telcos to invest more in higher speeds going forward—including 100GE—driven by network function virtualization (NFV) and increased bandwidth requirements from HD video, social media, AR/VR and expanded IoT use cases.”
The Ethernet outlook:
Omdia expects Ethernet adapter revenue to grow 21 percent on average each year through 2024. Despite the COVID-19 lockdown, the Ethernet adapter market is set to remain close to this growth curve in 2020.
Ethernet adapters that can provide complete on-card processing of network, storage or memory protocols, data-plane offload or that can offload server memory access will account for half of the total market revenue in 2020, or $1.1 billion. Ethernet adapters that have an onboard field customizable processor, such as a field-programmable gate array (FPGA) or system on chip (SoC), will account for slightly more than a quarter of 2020 adapter revenue, totaling $557 million. Adapters that only provide Ethernet connectivity will make up a minority share of the market, at just $475 million.
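As a sanity check on these forecast figures, the Python sketch below compounds the 2019 market size at the stated average growth rate and computes each segment's share of 2020 revenue. It assumes a constant 21 percent rate applied every year and that the three segments together make up the whole 2020 market; neither assumption is stated by Omdia.

```python
BASE_2019_USD_B = 1.7   # total Ethernet adapter market in 2019 (from the text)
GROWTH = 0.21           # average annual growth through 2024 (from the text)


def project(base, rate, years):
    """Compound a base value at a constant annual rate (an assumption)."""
    return [round(base * (1 + rate) ** n, 2) for n in range(years + 1)]


projection = dict(zip(range(2019, 2025), project(BASE_2019_USD_B, GROWTH, 5)))

# 2020 segment revenue in USD millions, as quoted in the text.
segments_2020_usd_m = {
    "full offload": 1100,
    "programmable (FPGA/SoC)": 557,
    "connectivity only": 475,
}
total_2020_usd_m = sum(segments_2020_usd_m.values())
shares = {name: round(100 * value / total_2020_usd_m, 1)
          for name, value in segments_2020_usd_m.items()}

print(projection)
print(total_2020_usd_m, shares)
```

The segment figures sum to about $2.13 billion, close to the roughly $2.06 billion that constant compounding gives for 2020, and the derived shares match the "half", "slightly more than a quarter" and "minority" wording of the forecast.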
Intel maintains lead:
Looking at semiconductor vendor market share, Intel held 24 percent of the 2019 Ethernet adapter market, shipping adapters worth $424 million in 2019. This represents a 2.5-point decrease from 2018 that Omdia attributes to the aging Intel Ethernet adapter portfolio which consists primarily of 1GE and 10GE adapters with Ethernet connectivity only. Intel indicated it will introduce adapters with offload functionality in 2020 that will help it remain competitive in the market.
Mellanox (now part of NVIDIA) captured 21 percent of the 2019 Ethernet adapter market, a 1-point increase compared to 2018. The vendor reported strong growth of its 25GE and 100GE offload adapters driven by strong cloud service provider demand and growing demand among enterprises for 25GE networking.
Broadcom was the third largest Ethernet adapter vendor in 2019, commanding a 14 percent share of the market, an increase of 3 points from 2018. Broadcom’s revenue growth was driven by strong demand for high-speed offload and programmable adapters at hyperscale CSPs.
Amazon AWS and Microsoft Azure continued to use in-house-developed Ethernet adapters. Given their large scale and the high value of their high-speed offload and programmable adapters, the companies cumulatively deployed Ethernet adapters worth over $300 million, according to Omdia. This made Microsoft and Amazon, respectively, the fourth and fifth largest makers of Ethernet adapters in 2019. As both service providers deploy 100GE adapters in larger numbers in 2020, Omdia expects them to continue to be key trendsetters in the market going forward.
About Omdia:
Omdia is a global technology research powerhouse, established following the merger of the research division of Informa Tech (Ovum, Heavy Reading and Tractica) and the acquired IHS Markit technology research portfolio*.
We combine the expertise of over 400 analysts across the entire technology spectrum, analyzing 150 markets publishing 3,000 research solutions, reaching over 14,000 subscribers, and covering thousands of technology, media & telecommunications companies.
Our exhaustive intelligence and deep technology expertise allow us to uncover actionable insights that help our customers connect the dots in today’s constantly evolving technology environment and empower them to improve their businesses – today and tomorrow.
………………………………………………………………………………………………………………………………………………..
Omdia is a registered trademark of Informa PLC and/or its affiliates. All other company and product names may be trademarks of their respective owners. Informa PLC registered in England & Wales with number 8860726, registered office and head office 5 Howick Place, London, SW1P 1WG, UK. Copyright © 2020 Omdia. All rights reserved.
*The majority of IHS Markit technology research products and solutions were acquired by Informa in August 2019 and are now part of Omdia.