Facebook’s F16 achieves 400G-equivalent intra-DC capacity using 100GE fabric switches and 100G optics; what about other hyperscalers?

On March 14th at the 2019 OCP Summit, Omar Baldonado of Facebook (FB) announced a next-generation intra-data center (DC) fabric/topology called F16. It has 4x the capacity of FB’s previous DC fabric design, yet still uses 100GE switching and 100G optics. FB engineers built F16 using mature, readily available 100G CWDM4-OCP optics (contributed by FB to OCP in early 2017). In essence, this gives their data centers the same 4x aggregate capacity increase that 400G optical link speeds would, but with 100G optics and 100GE switching.

F16 is based on the same Broadcom ASIC that was the candidate for a 4x-faster 400G fabric design: Tomahawk 3 (TH3). But FB uses it differently. Instead of four multichip-based planes with 400G link speeds (radix-32 building blocks), FB uses the TH3 ASIC to create 16 single-chip-based planes with 100G link speeds (optimal radix-128 blocks). Note that 400G optical components are not easy to buy inexpensively at Facebook’s large volumes. 400G ASICs and optics would also consume much more power, and power is a precious resource within any data center building. Therefore, FB built the F16 fabric out of sixteen 128-port 100G switches, achieving the same bandwidth that four 128-port 400G switches would.
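The bandwidth equivalence described above is simple arithmetic. The Python sketch below multiplies planes × ports per plane × link speed for both designs. It is a deliberately simplified model that counts only total fabric port capacity and ignores oversubscription, pod structure and spine tiers; the function name is our own, not anything published by FB.

```python
def fabric_capacity_gbps(planes: int, ports_per_plane: int, link_gbps: int) -> int:
    """Total port capacity summed across all fabric planes, in Gb/s (simplified model)."""
    return planes * ports_per_plane * link_gbps

# F16 as described: sixteen single-chip 128-port planes at 100G per port
f16 = fabric_capacity_gbps(planes=16, ports_per_plane=128, link_gbps=100)

# The alternative: four 128-port planes at 400G per port
alt_400g = fabric_capacity_gbps(planes=4, ports_per_plane=128, link_gbps=400)

print(f"F16 (16 x 128 x 100G): {f16 / 1000:.1f} Tb/s")
print(f"400G design (4 x 128 x 400G): {alt_400g / 1000:.1f} Tb/s")
print("Equal capacity:", f16 == alt_400g)
```

Both designs come to 204.8 Tb/s: F16 trades 4x fewer bits per link for 4x more planes, which is why it can stay on cheaper, lower-power 100G optics.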

Below are some of the primary features of the F16 (also see two illustrations below):

-Each rack is connected to 16 separate planes. With FB Wedge 100S as the top-of-rack (TOR) switch, there is 1.6T uplink bandwidth capacity and 1.6T down to the servers.

-The planes above the rack comprise sixteen 128-port 100G fabric switches (as opposed to four 128-port 400G fabric switches).

-As a new uniform building block for all infrastructure tiers of the fabric, FB created a 128-port 100G fabric switch called Minipack: a flexible, single-ASIC design that uses half the power and half the space of Backpack, FB’s previous-generation multichip chassis fabric switch.

-Furthermore, a single-chip system allows for easier management and operations.
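The per-rack figures in this list can be checked the same way. This sketch assumes one 100G uplink from the TOR to each of the 16 planes (that is how 16 planes add up to the stated 1.6T) and computes the TOR oversubscription ratio, i.e. server-facing bandwidth divided by fabric-facing bandwidth. The helper names are hypothetical.

```python
from fractions import Fraction

def tor_uplink_gbps(planes: int, uplink_gbps_per_plane: int) -> int:
    """Fabric-facing capacity of one TOR, assuming one uplink to each plane."""
    return planes * uplink_gbps_per_plane

def oversubscription(downlink_gbps: int, uplink_gbps: int) -> Fraction:
    """Server-facing / fabric-facing bandwidth; a value of 1 means non-blocking."""
    return Fraction(downlink_gbps, uplink_gbps)

uplink = tor_uplink_gbps(planes=16, uplink_gbps_per_plane=100)
downlink = 1600  # 1.6T toward the servers, per the F16 description
ratio = oversubscription(downlink, uplink)

print(f"TOR uplink: {uplink / 1000:.1f} Tb/s, "
      f"oversubscription {ratio.numerator}:{ratio.denominator}")
```

With 1.6T up and 1.6T down, the rack is non-blocking (1:1) toward the fabric.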

Facebook F16 data center network topology

………………………………………………………………………………………………………………………………………………………………………………………………..

Multichip 400G b/s pod fabric switch topology vs. FB’s single-chip (Broadcom ASIC) F16 at 100G b/s

…………………………………………………………………………………………………………………………………………………………………………………………………..

In addition to Minipack (built by Edgecore Networks), FB also jointly developed Arista Networks’ 7368X4 switch. FB is contributing both Minipack and the Arista 7368X4 to OCP. Both switches run FBOSS, the software that binds together all FB data centers. Of course, the Arista 7368X4 will also run that company’s EOS network operating system.

According to the company’s blog post, F16 is more scalable and simpler to operate and evolve, leaving FB’s DCs better equipped to handle increasing intra-DC throughput for the next few years. “We deploy early and often,” Baldonado said during his OCP 2019 session. “The FB teams came together to rethink the DC network, hardware and software. The components of the new DC are F16 and HGRID as the network topology, Minipack as the new modular switch, and FBOSS software which unifies them.”

This author was very impressed with Baldonado’s presentation: excellent content and flawless delivery, with insights into the motivation behind FB’s DC design methodology and testing!

References:

https://code.fb.com/data-center-engineering/f16-minipack/

………………………………………………………………………………………………………………………………….

Will Other Hyperscale Cloud Providers Move to 400GE in Their DCs?

Large hyperscale cloud providers initially championed 400 Gigabit Ethernet because of their endless thirst for networking bandwidth. As with so many technologies that start at the high end with the most demanding customers, 400GbE will eventually find its way into regular enterprise data centers. However, we have not yet seen any public announcement that it has been deployed, despite its potential and promise!

Some large changes in IT and OT are driving the need to consider 400 GbE infrastructure:

  • Servers are more densely packed than ever. Whether they are hyper-converged, blade, modular or just dense rack servers, the density is increasing, and servers commonly feature dual 10 Gb or even 25 Gb network interface cards.
  • Network storage is moving away from Fibre Channel and toward Ethernet, increasing the demand for high-bandwidth Ethernet capabilities.
  • The increase in private cloud applications and virtual desktop infrastructure puts additional demands on networks as more compute is happening at the server level instead of at the distributed endpoints.
  • IoT and massive data accumulation at the edge are increasing bandwidth requirements for the network.

A 400 GbE port can be broken out into smaller increments, with the most popular options being 2 x 200 Gb, 4 x 100 Gb or 8 x 50 Gb. At the aggregation layer, as these higher-speed connections increase the bandwidth per port, we will see a reduction in port count and simpler cabling requirements.
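To make the breakout and port-count point concrete, the sketch below checks that each breakout option adds up to 400G, then counts how many ports are needed for a given aggregate bandwidth at 100G vs. 400G. The 12.8 Tb/s target is a hypothetical example, not a figure from any vendor or operator.

```python
PORT_GBPS = 400
# Breakout options from the text: (number of breakout ports, Gb/s each)
BREAKOUTS = [(2, 200), (4, 100), (8, 50)]

def breakout_total_gbps(count: int, speed_gbps: int) -> int:
    """Aggregate bandwidth of one breakout configuration."""
    return count * speed_gbps

def ports_needed(target_gbps: int, port_gbps: int) -> int:
    """Ports required to reach target_gbps, rounded up (integer ceiling division)."""
    return -(-target_gbps // port_gbps)

for count, speed in BREAKOUTS:
    total = breakout_total_gbps(count, speed)
    print(f"1 x {PORT_GBPS}G -> {count} x {speed}G ({total}G total)")

TARGET_GBPS = 12_800  # hypothetical 12.8 Tb/s aggregation-layer requirement
for speed in (100, 400):
    print(f"{speed}G ports for {TARGET_GBPS / 1000:.1f} Tb/s: "
          f"{ports_needed(TARGET_GBPS, speed)}")
```

In this example, 128 ports at 100G shrink to 32 ports at 400G: quadrupling the per-port speed cuts the port and cable count by 4x for the same aggregate bandwidth.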

Yet despite these advantages, none of the U.S.-based hyperscalers has announced deploying 400GE within its DC networks, either as a backbone or to connect leaf-spine fabrics. We suspect they are all using 400G for Data Center Interconnect (DCI), but we don’t know what optics are used or whether the links use Ethernet or OTN framing and OAM.

…………………………………………………………………………………………………………………………………………………………………….

In February, Google said it plans to spend $13 billion in 2019 to expand its data center and office footprint in the U.S. The investments include expanding the company’s presence in 14 states. The dollar figure surpasses the $9 billion the company spent on such facilities in the U.S. last year.

In a blog post, CEO Sundar Pichai wrote that Google will build new data centers or expand existing facilities in Nebraska, Nevada, Ohio, Oklahoma, South Carolina, Tennessee, Texas, and Virginia. The company will establish or expand offices in California (the Westside Pavilion and the Spruce Goose Hangar), Chicago, Massachusetts, New York (the Google Hudson Square campus), Texas, Virginia, Washington, and Wisconsin. Pichai predicts the activity will create more than 10,000 new construction jobs in Nebraska, Nevada, Ohio, Texas, Oklahoma, South Carolina, and Virginia. The expansion plans will put Google facilities in 24 states, including data centers in 13 communities. Yet there is no mention of what data networking technology or speed the company will use in its expanded DCs.

I believe Google still designs all of its own IT hardware (compute servers, storage equipment, switches/routers, and Data Center Interconnect gear other than the PHY-layer transponders). Google has also announced its own AI processor chips, which presumably go into IT equipment it uses internally but does not sell to anyone else. So Google does not appear to be using any OCP-specified open source hardware. That’s in harmony with Amazon AWS, but in contrast to Microsoft Azure, which actively participates in OCP with its open-sourced SONiC network OS, now running on over 68 different hardware platforms.

It’s no secret that Google has built its own Internet infrastructure from commodity components since 2004, resulting in nimble, software-defined data centers. The resulting hierarchical mesh design is standard across all its data centers. The hardware is dominated by Google-designed custom servers and Jupiter, the data center network fabric Google introduced in 2012. With its economies of scale, Google contracts directly with manufacturers to get the best deals.
Google’s servers and networking software run a hardened version of the Linux open source operating system. Individual programs have been written in-house.
…………………………………………………………………………………………………………………………………………………………………………

Google Expands Cloud Network Infrastructure via 3 New Undersea Cables & 5 New Regions

Google plans to build three new undersea cables in 2019 to support its Google Cloud customers. The company plans to co-commission the Hong Kong-Guam (HK-G) cable system as part of a consortium. In a blog post, Ben Treynor Sloss, vice president of Google’s cloud platform, announced the three undersea cables along with five new regions.

HK-G will be an extension of the SEA-US cable system and will have a design capacity of more than 48 Tbps. It is being built by RTI-C and NEC. Google said that, together with Indigo and other cable systems, HK-G will create multiple scalable, diverse paths to Australia. In addition, Google plans to commission Curie, a private cable connecting Chile to Los Angeles, and Havfrue, a consortium cable connecting the U.S. to Denmark and Ireland, as shown in the figure below.

Late last year, Google also revealed plans to open a Google Cloud Platform region in Hong Kong in 2018 to join its recently launched Mumbai, Sydney, and Singapore regions, as well as Taiwan and Tokyo.

Of the five new Google Cloud regions, Netherlands and Montreal will be online in the first quarter of 2018. The other three, in Los Angeles, Finland, and Hong Kong, will come online later this year. The Hong Kong region will be designed for high availability, launching with three zones to protect against service disruptions. The HK-G cable will provide improved network capacity for the cloud region. Google says it is not done yet and promises announcements of additional regions.

In an earlier announcement last week, Google revealed that it has implemented a compile-time patch for its Google Cloud Platform infrastructure to address the major CPU security flaw disclosed by Google’s Project Zero zero-day vulnerability unit at the beginning of this year.

Google Cloud Platform Regions

Diane Greene, who heads up Google’s cloud unit, often marvels at how much her company invests in Google Cloud infrastructure. It’s with good reason: in the three years since Greene came on board, the company has spent a whopping $30 billion beefing up that infrastructure.

Google has direct investment in 11 cables, including those planned or under construction. The three cables highlighted in yellow are being announced in this blog post. (In addition to these 11 cables where Google has direct ownership, the company also leases capacity on numerous additional submarine cables.)

In the referenced Google blog post, Mr Treynor Sloss wrote:

At Google, we’ve spent $30 billion improving our infrastructure over three years, and we’re not done yet. From data centers to subsea cables, Google is committed to connecting the world and serving our Cloud customers, and today we’re excited to announce that we’re adding three new submarine cables, and five new regions.

We’ll open our Netherlands and Montreal regions in the first quarter of 2018, followed by Los Angeles, Finland, and Hong Kong – with more to come. Then, in 2019 we’ll commission three subsea cables: Curie, a private cable connecting Chile to Los Angeles; Havfrue, a consortium cable connecting the U.S. to Denmark and Ireland; and the Hong Kong-Guam Cable system (HK-G), a consortium cable interconnecting major subsea communication hubs in Asia.

Together, these investments further improve our network, the world’s largest, which by some accounts delivers 25% of worldwide internet traffic.

Simply put, it wouldn’t be possible to deliver products like Machine Learning Engine, Spanner, BigQuery and other Google Cloud Platform and G Suite services at the quality of service users expect without the Google network. Our cable systems provide the speed, capacity and reliability Google is known for worldwide, and at Google Cloud, our customers are able to make use of the same network infrastructure that powers Google’s own services.

While we haven’t hastened the speed of light, we have built a superior cloud network as a result of the well-provisioned direct paths between our cloud and end-users, as shown in the figure below.

According to Treynor Sloss: “The Google network offers better reliability, speed and security performance as compared with the nondeterministic performance of the public internet, or other cloud networks. The Google network consists of fiber optic links and subsea cables between 100+ points of presence, 7,500+ edge node locations, 90+ Cloud CDN locations, 47 dedicated interconnect locations and 15 GCP regions.”

……………………………………………………………………………………………………………………………………………………………………………………………

Reference:

https://www.blog.google/topics/google-cloud/expanding-our-global-infrastructure-new-regions-and-subsea-cables/