Vincent Rodriguez – IEEE ComSoc Technology Blog

Emerging Cybersecurity Risks in Modern Manufacturing Factory Networks

Posted on August 4, 2025 by Vincent Rodriguez

By Omkar Ashok Bhalekar with Ajay Lotan Thakur

Introduction

With the advent of Industry 5.0 and ongoing advances in Industry 4.0, the manufacturing landscape faces a far-reaching challenge: it must not only use environmental resources sustainably but also continually adapt its industrial security posture to tackle modern threats. Technologies such as the Internet of Things (IoT) in manufacturing, private 4G/5G, cloud-hosted applications, edge computing, and real-time streaming telemetry are fueling smart factories and making them more productive.

Although this evolution facilitates industrial automation, innovation, and high productivity, it also greatly enlarges the attack surface exposed to cyberattacks. Industrial cybersecurity is essential for mission-critical manufacturing operations; it is a cornerstone of safeguarding factories and avoiding major downtime.

With the rapid convergence of IT and OT (Operational Technology), a hack or data breach can cause operational disruptions such as line-down situations, halted production, theft or loss of critical data, and severe financial damage to an organization.

Industrial Networking

Why does Modern Manufacturing demand Cybersecurity? A few reasons why cybersecurity is essential in modern manufacturing are outlined below:

Convergence of IT and OT: Industrial control systems (ICS), which used to be isolated or air-gapped, are now interconnected and hence vulnerable to breaches.
Enlarged Attack Surface: Every networked device or component in the factory is susceptible to threats and attacks.
Financial Loss: Cyberattacks such as WannaCry, or targeted attacks that leave systems stuck at a Blue Screen of Death (BSOD), can cost millions of dollars per minute and result in a complete shutdown of operations.
Disruptions in Logistics Network: Hacks or cyberattacks can throw the supply chain into disarray, causing shortages of essential parts.
Legislative Compliance: Guidance and standards from bodies such as CISA and NIST, along with ISA/IEC 62443, are proving crucial and are mandating frameworks to safeguard industries.

It is important to understand and adapt to the changing trends in the cybersecurity domain, especially when so much is at risk. Historically, progress has come from lessons learned from past mistakes; as technology advances at a fast pace, external threats will limit that progress unless they are taken into account.

This attitude of adaptability needs to become an integral part of the mindset and practices in cybersecurity and should not be limited to industrial security; such practices can scale across other technological fields. Moreover, securing industries does not just mean physical security. It also opens avenues for cybersecurity experts to learn and innovate in applications and software such as Manufacturing Execution Systems (MES), which are crucial for critical operations.

Greatest Cyberattacks in Manufacturing of All Time:

It is pivotal to become familiar with the different categories of attacks that have historically hampered the manufacturing domain, and with their scale. In this section we highlight some real-world cybersecurity incidents.

Ransomware (Colonial Pipeline, y.2021):

This attack brought fuel distribution along the U.S. East Coast to a standstill, causing an extreme shortage of fuel and gasoline, after attackers used compromised employee credentials.

Cause: The root cause was compromised VPN account credentials. A VPN account that had not been used for a long time and lacked Multi-factor Authentication (MFA) was breached; its credentials were part of a password leak on the dark web. The ransomware group “Darkside” exploited this entry point to gain access to Colonial Pipeline’s IT systems. They did not initially penetrate operational technology systems. However, the interdependence of IT and OT systems caused operational impacts. Once inside, attackers escalated privileges and exfiltrated 100 GB of data within 2 hours. Ransomware was deployed to encrypt critical business systems. Colonial Pipeline proactively shut down the pipeline, fearing lateral movement into OT networks.

Effect: The pipeline, which supplies nearly 45% of the fuel to the U.S. East Coast, was shut down for 6 days. Mass fuel shortages occurred across several U.S. states, leading to public panic and fuel hoarding. Colonial Pipeline paid a $4.4 million ransom; approximately $2.3 million was later recovered by the FBI. The incident led to a Presidential Executive Order on cybersecurity and heightened regulation of critical infrastructure cybersecurity. It also exposed how business IT network vulnerabilities can lead to real-world critical infrastructure impacts, even without OT being directly targeted.

Industrial Sabotage (Stuxnet, y.2009):

This unprecedented software worm was able to infiltrate a critical facility and sabotage its centrifuges while hiding the damage from operators.

Cause: Nation-state-developed malware specifically targeting Industrial Control Systems (ICS), with an unprecedented level of sophistication. Stuxnet is widely reported to have been developed jointly by the U.S. (NSA) and Israel (Unit 8200) under operation “Olympic Games”. The target was Iran’s uranium enrichment program at the Natanz nuclear facility. The worm was introduced via USB drives to cross the air gap protecting the network. It exploited four zero-day vulnerabilities in Windows, an unprecedented number at that time. It specifically targeted Siemens Step7 software running on Windows, which controls Siemens S7-300 PLCs. Stuxnet would identify systems controlling centrifuges used for uranium enrichment, then reprogram the PLCs to intermittently change the rotational speed of the centrifuges, causing mechanical stress and failure, while reporting normal operations to operators. It used rootkits at both the Windows and PLC levels to remain stealthy.

Effect: The worm destroyed approximately 1,000 IR-1 centrifuges (~10% of Iran’s nuclear capability) and set back Iran’s nuclear program by 1-2 years. It introduced a new era of cyberwarfare, in which malware caused physical destruction, and raised global awareness of the vulnerabilities in industrial control systems (ICS). Iran responded by accelerating its cyber capabilities, forming the Iranian Cyber Army. ICS/SCADA security became a top global priority, especially in the energy and defense sectors.

Upgrade spoofing (SolarWinds Orion Supply chain Attack, y.2020):

Attackers injected malicious code into legitimate software updates, which were then distributed to thousands of customers.

Cause: Compromise of the SolarWinds build environment, leading to a supply chain attack. The attackers, attributed to the Russian group Cozy Bear and linked to Russia’s foreign intelligence agency, gained access to SolarWinds’ development pipeline. Malicious code was inserted into Orion Platform updates released between March and June 2020. Customers who downloaded the update installed malware known as SUNBURST, which created a backdoor in Orion’s signed DLLs. Over 18,000 customers were potentially affected, including 100 high-value targets. After the initial exploit, attackers used manual lateral movement, privilege escalation, and custom C2 (command-and-control) infrastructure to exfiltrate data.

Effect: The breach included major U.S. government agencies: DHS, DoE, DoJ, Treasury, State Department, and more. It affected top corporations: Cisco, Intel, Microsoft, FireEye, and others. FireEye discovered the breach after noticing unusual two-factor authentication activity. The attack exposed critical supply chain vulnerabilities and demonstrated how a single point of compromise could lead to nationwide espionage. It prompted the creation of Cybersecurity Executive Order 14028, Zero Trust mandates, and widespread adoption of Software Bill of Materials (SBOM) practices.

Spyware (Pegasus, y.2016-2021):

Cause: Zero-click and zero-day exploits leveraged by NSO Group’s Pegasus spyware, sold to governments. Pegasus can infect phones without any user interaction, using so-called zero-click exploits. It exploits vulnerabilities in WhatsApp, iMessage, and browsers such as Safari on iOS, and uses zero-day exploits against Android devices as well. It is delivered via SMS, WhatsApp messages, or silent push notifications. Once installed, it provides complete surveillance capability, including access to microphones, camera, GPS, calls, photos, texts, and encrypted apps. The zero-click iOS exploit ForcedEntry allows complete compromise of an iPhone. The malware is extremely stealthy, often removing itself after execution. It bypassed Apple’s BlastDoor sandbox and Android’s hardened security modules.

Effect: Used by multiple governments to surveil activists, journalists, lawyers, opposition leaders, and even heads of state. The 2021 Pegasus Project, led by Amnesty International and Forbidden Stories, revealed a leaked list of 50,000 potential targets. Phones of high-profile individuals, including international journalists and their associates, the French president, and Indian opposition figures, were allegedly targeted, which triggered legal and political fallout. NSO Group was blacklisted by the U.S. Department of Commerce, and Apple filed a lawsuit against NSO Group in 2021. The revelations renewed debates over the ethics and regulation of commercial spyware.

Other common types of attacks:

Phishing and Smishing: These attacks send emails (phishing) or text messages (smishing) that appear to be legitimate but are crafted by bad actors for financial gain or identity theft.

Social Engineering: Shoulder surfing may sound funny, but it is an age-old story in which even the most expert security personnel have been outsmarted and have suffered data or credential leaks. Rather than relying on technical vulnerabilities, this type of attack targets human psychology to gain access or break into systems. The attacker manipulates people into revealing confidential information using techniques such as reconnaissance, engagement, baiting, or offering quid pro quo services.

Security Runbook for Manufacturing Industries:

To ensure ongoing enhancements to industrial security postures and preserve critical manufacturing operations, the following 11 security procedures and tactics, based on established frameworks, aim to provide 360-degree protection:

A. Incident Handling Tactics (First Line of Defense): Teams should continuously improve incident response with the help of documentation and response apps. Coordination between teams, clear communications, root cause analysis, and reference documentation are the keys to successful incident response.

B. Zero Trust Principles (Never trust, always verify): Use strong device management tools to ensure all end devices are in compliance, using measures such as trusted certificates, NAC, and enforcement policies. Run regular and random checks on users’ data access patterns, and assign role-based policies that limit full access to critical resources.

C. Secure Communication and Data Protection: Use endpoint or cloud-based security sessions with IPsec VPN tunnels so that all traffic can be controlled and monitored. All user data must be encrypted using data protection and recovery software such as BitLocker.

D. Secure IT Infrastructure: Harden network equipment such as switches, routers, and WAPs with 802.1X (dot1x), port security, and EAP-TLS or PEAP. Implement edge-based monitoring to detect anomalies, and build redundant network infrastructure to minimize MTTR.

E. Physical Security: Locks, badge readers, or biometric systems are a must for all critical rooms and network cabinets. A security operations center (SOC) can help monitor for internal theft or sabotage.

F. North-South and East-West Traffic Isolation: Safety traffic and external traffic can be rate limited using firewalls or edge compute devices. 100% isolation is wishful thinking, so measures must be taken to constantly monitor for any security punch-holes.

G. Industrial Hardware for Industrial Applications: Use appropriate industrial-grade IP67 or IP68 rated network equipment to avoid breakdowns due to environmental factors. Localized industrial firewalls can provide the desired granularity at the edge, reducing the need for strict Purdue model layering.

H. Next-Generation Firewalls with Application-Level Visibility: Incorporate stateful, application-aware firewalls, which provide more control over zones and policies and can differentiate applications’ behavioral characteristics. Deploy tools that perform deep packet inspection and function as intrusion detection and prevention platforms (IDS/IPS).

I. Threat and Traffic Analyzer: Tools such as network traffic analyzers can help achieve Layer 1 to Layer 7 security monitoring by detecting and responding to malicious traffic patterns. Self-healing networks use automation and monitoring tools to detect traffic anomalies and correct network non-compliance.

J. Information Security and Software Management: Companies must maintain a repository of trusted certificates, software, and releases, and keep pushing regular patches for critical bugs. Keep constant track of release notes and CVEs (Common Vulnerabilities and Exposures) for all vendor software.

K. Idiot-Proofing (How to NOT get Hacked): Provide regular training to employees; familiarizing them with cyberattacks and jargon like cryptojacking or honeynets helps create awareness. Encourage and provide a platform for employees or workers to voice their opinions and resolve their queries regarding security threats.
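As an illustration of the traffic isolation and zero trust items above, here is a minimal Python sketch of a default-deny check on zone-to-zone flows. The zone names, the `Flow` type, and the allow-list entries are hypothetical and chosen only for this example; a real deployment would enforce such rules in firewalls rather than in a script.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Flow:
    """One observed network flow between two zones (hypothetical model)."""
    src_zone: str
    dst_zone: str
    port: int


# Hypothetical allow-list: anything not listed here is denied.
ALLOWED_FLOWS = {
    ("it", "dmz", 443),                     # HTTPS from IT into the DMZ
    ("dmz", "ot-supervisory", 4840),        # OPC UA default port
    ("ot-supervisory", "ot-control", 502),  # Modbus TCP default port
}


def is_allowed(flow: Flow) -> bool:
    """Default deny: permit a flow only if it is explicitly allow-listed."""
    return (flow.src_zone, flow.dst_zone, flow.port) in ALLOWED_FLOWS


def audit(flows: Iterable[Flow]) -> List[Flow]:
    """Return the observed flows that violate the allow-list."""
    return [f for f in flows if not is_allowed(f)]


if __name__ == "__main__":
    observed = [
        Flow("it", "dmz", 443),
        Flow("it", "ot-control", 502),  # IT reaching a controller directly
    ]
    for violation in audit(observed):
        print("punch-hole:", violation)
```

The point of the sketch is the default-deny shape: a direct IT-to-controller flow is reported because nobody explicitly allowed it, not because a rule happened to block it.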

Current Industry Perspective and Software Response

In response to the escalating tide of cyberattacks in manufacturing, from the Triton malware striking industrial safety controls to LockerGoga shutting down production at Norsk Hydro, there has been a sea change in how the software industry is facilitating operational resilience. Security companies are combining cutting-edge threat detection with ICS/SCADA systems, delivering purpose-designed solutions like zero-trust network access, behavior-based anomaly detection, and encrypted machine-to-machine communications. Companies such as Siemens and Claroty are leading the way, treating security as a design principle rather than an afterthought. A prime example is Dragos’s OT-specific threat intelligence and incident response solutions, which have become a focal point in the fight against nation-state attacks and ransomware operations against critical infrastructure.

Bridging the Divide between IT and OT: A Two-Way Street

With the intensification of OT and IT convergence, perimeter-based defense is no longer sufficient. Manufacturers are embracing emerging strategies such as Cybersecurity Mesh Architecture (CSMA) and applying IT-centric philosophies such as DevSecOps within the OT environment to foster secure-by-default deployment habits. The trend also brings attention to IEC 62443 conformity as well as NIST-based risk assessment frameworks tailored to manufacturing. With legacy PLCs now networked and exposed to internet-borne threats, companies are embracing micro-segmentation, secure remote access, and real-time monitoring solutions that unify security across both environments. Learn how Schneider Electric is empowering manufacturers to securely link IT/OT systems with scalable cybersecurity programs.

Conclusion

In a nutshell, modern manufacturing, unlike in the past, is not just about quick-input, quick-output systems that can scale and be productive. It is an ecosystem in which cybersecurity and manufacturing work in harmony: just as the healthcare system is considered critical to people, secure modern factories are essential to manufacturing. The many cyberattacks on critical infrastructure such as pipelines, nuclear plants, and power grids over the past 30 years not only warrant the world’s attention but also call for regulatory standards that every entity in manufacturing must follow.

As mankind keeps making progress and sprinting towards the next industrial revolution, it is imperative to make industrial cybersecurity a keystone of upcoming critical manufacturing facilities and a strong foundation for operational excellence. Now is the right time to buy into industrial security. The leaders who choose to be “Cyberfacturers” will survive to tell the tale, and the rest may just serve as stark reminders of what happens when pace outperforms security.

About Author:

Omkar Bhalekar is a senior network engineer and technology enthusiast specializing in Data center architecture, Manufacturing infrastructure, and Sustainable solutions with extensive experience in designing resilient industrial networks and building smart factories and AI data centers with scalable networks. He is also the author of the Book Autonomous and Predictive Networks: The future of Networking in the Age of AI and co-author of Quantum Ops – Bridging Quantum Computing & IT Operations. Omkar writes to simplify complex technical topics for engineers, researchers, and industry leaders.

Countdown to Q-day: How modern-day Quantum and AI collusion could lead to The Death of Encryption

Posted on July 28, 2025 by Vincent Rodriguez

By Omkar Ashok Bhalekar with Ajay Lotan Thakur

Behind the quiet corridors of research laboratories and the whir of supercomputer data centers, a stealth revolution is gathering force, one with the potential to reshape the very building blocks of cybersecurity. At its heart are qubits, the building blocks of quantum computing, and the accelerant force of generative AI. Combined, they form a double-edged sword capable of breaking today’s encryption and opening the door to an era of both vast opportunity and unprecedented danger.

Modern Cryptography is Fragile

Modern-day computer security relies on the presumed hardness of certain mathematical problems. RSA encryption, introduced in 1977 by Rivest, Shamir, and Adleman, relies on the principle that factoring a 2048-bit number into primes is computationally infeasible for ordinary computers (RSA paper, 1978). Diffie-Hellman key exchange, described by Whitfield Diffie and Martin Hellman in 1976, allows keys to be exchanged securely over an insecure channel based on the discrete logarithm problem (Diffie-Hellman paper, 1976). Elliptic-Curve Cryptography (ECC), described independently in 1985 by Victor Miller and Neal Koblitz, is based on the hardness of elliptic curve discrete logarithms and offers the same level of security against brute-force attacks with smaller key sizes (Koblitz ECC paper, 1987).

But quantum computing flips the script. Thanks to Shor’s Algorithm, a sufficiently powerful quantum computer could factor large numbers and compute discrete logarithms exponentially faster than regular computers, rendering RSA and ECC utterly useless. Meanwhile, Grover’s Algorithm gives attackers a quadratic speedup in brute-force key searches against symmetric key systems like AES.

What would take classical computers centuries or millennia, quantum computers could boil down to days or even hours at the right scale. In fact, experts reckon that cracking RSA-2048 using Shor’s Algorithm could take just 20 million physical qubits, a number that’s diminishing each year.

Generative AI adds fuel to the fire

While quantum computing threatens to undermine encryption itself, generative AI is playing a less direct but no less revolutionary role. By automating tasks such as the development of malware, phishing emails, and synthetic identities, generative AI models (large language models and diffusion-based visual synthesizers, for example) are lowering the bar for sophisticated cyberattacks.

Even worse, generative AI can be applied to model and probe vulnerabilities in implementations of cryptography, including post-quantum cryptography. It can be employed to help train reinforcement learning agents that optimize side-channel attacks or profile quantum circuits to uncover new behaviors.

With quantum computing on the horizon, generative AI is both a sophisticated research tool and a technology to watch when it comes to weaponization. On the one hand, security researchers use generative AI to produce, examine, and predict vulnerabilities in cryptographic systems to inform the development of quantum-resistant algorithms. On the other hand, malicious actors exploit its ability to automate the production of complex attack vectors like advanced malware, phishing attacks, and synthetic identities, radically reducing the barrier to conducting high-impact cyberattacks. This dual-use nature of generative AI radically shortens the timeline for adversaries to take advantage of breached or transitional cryptographic infrastructures, narrowing the window of opportunity for defenders to deploy effective quantum-safe security solutions.

Real-World Implications

The impact of broken cryptography is real, and it puts at risk the foundations of everyday life:

1. Online Banking (TLS/HTTPS)

When you use your bank’s web site, the “https” in the address bar signifies encrypted communication over TLS (Transport Layer Security). Most TLS implementations rely on RSA or ECC keys to securely exchange session keys. A quantum attack could recover those session keys, allowing an attacker to decrypt the traffic they protect, including sensitive banking data.

2. Cryptocurrencies

Bitcoin, Ethereum, and other cryptocurrencies use ECDSA (Elliptic Curve Digital Signature Algorithm) for signing transactions. If quantum computers can crack ECDSA, a hacker would be able to forge signatures and steal digital assets. In fact, researchers have already studied scenarios in which a quantum computer might be able to extract private keys from public blockchain data, enabling theft of funds.

3. Government Secrets and Intelligence Archives

National security agencies all over the world rely heavily on encryption algorithms such as RSA and AES to protect sensitive information, including secret messages, intelligence briefs, and critical infrastructure data. Of these, AES-256 is considered secure even in the presence of quantum computing: it is a symmetric-key cipher, and Grover’s algorithm gives only a quadratic speedup against it, so brute-force attacks still demand enormous resources and time. Conversely, asymmetric cryptographic algorithms like RSA and ECC, which underpin the majority of public key infrastructures, are fundamentally vulnerable to quantum attacks that can solve the hard mathematical problems they rely on for security.

This disparity creates a huge security gap. Information captured today, however well protected now, may not stay protected once sufficiently powerful quantum computers become available, a scenario often referred to as the “harvest now, decrypt later” threat. Both intelligence agencies and adversaries could be quietly stockpiling encrypted communications, confident that quantum technology will eventually be able to decrypt them. The Snowden disclosures put this threat in the limelight by revealing that the NSA collects and retains vast amounts of global internet traffic, including diplomatic cables, military orders, and personal communications. These repositories of encrypted data, unreadable as they stand now, are a latent vulnerability: when Q-Day arrives, that is, when practical quantum computers capable of defeating RSA and ECC become available, the confidentiality of decades’ worth of sensitive communications could be irretrievably lost.

Such a compromise would have apocalyptic consequences for national security and geopolitical stability, exposing classified negotiations, intelligence operations, and war plans to adversaries. This specter has compelled governments and security organizations to accelerate the transition to post-quantum cryptography standards and explore quantum-resistant encryption schemes in an effort to safeguard the confidentiality and integrity of information in the era of quantum computing.
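To make the quadratic speedup concrete, here is a small Python sketch of the arithmetic. It assumes an idealized Grover search that needs on the order of the square root of the number of candidate keys, and it ignores constant factors, error-correction overhead, and the difficulty of running Grover in parallel; the function names are illustrative.

```python
import math


def classical_trials(key_bits: int) -> int:
    """Worst-case number of keys a classical brute-force search must try."""
    return 2 ** key_bits


def grover_iterations(key_bits: int) -> int:
    """Order-of-magnitude Grover iterations: the square root of the key space.

    Constant factors are deliberately ignored in this sketch.
    """
    return math.isqrt(2 ** key_bits)


def effective_security_bits(key_bits: int) -> float:
    """Security level left against an idealized Grover attack."""
    return math.log2(grover_iterations(key_bits))


if __name__ == "__main__":
    for bits in (128, 256):
        print(f"AES-{bits}: about {effective_security_bits(bits):.0f} bits "
              "of security against idealized Grover search")
```

Under these assumptions AES-256 keeps roughly 128 bits of security, which is why the symmetric cipher is treated as quantum-resistant, while AES-128 drops to roughly 64 bits.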

Arms Race Toward Post-Quantum Cryptography

In response, organizations like NIST are leading the development of post-quantum cryptographic standards, selecting algorithms believed to be quantum resistant. But migration is glacial. Retrofitting billions of devices and services with new cryptographic foundations is a logistical nightmare. It is not merely a matter of software updates; it involves hardware upgrades, re-certifications, interoperability testing, and compatibility testing with worldwide networks and critical infrastructure systems, all while minimizing downtime and security vulnerabilities.

Building a quantum computer large enough to factor RSA-2048 is an enormous task. Estimates suggest it would require millions of physical qubits with very low error rates. Today’s most advanced quantum processors fall far short of that scale, and their error rates are too high to sustain long, complex computations. However, with continued progress in quantum error correction, materials research, and qubit coherence times, specialists warn that effective quantum decryption capability may arrive sooner than most organizations are prepared for.

This transition period, when old and new environments coexist, is where the danger is greatest. Attackers can use generative AI to hunt for hybrid environments in which legacy encryption is still employed, by automating the identification of old crypto implementations, producing targeted exploits en masse, and choreographing multi-step attacks that overwhelm conventional security monitoring and patching mechanisms.

Preparing for the Convergence

To defend against this coming storm, security strategy must evolve:

Inventory Cryptographic Assets: Firms must take stock of where and how encryption is being used across their environments.
Adopt Crypto-Agility: Systems need to be designed so they can easily switch between encryption algorithms without a full redesign.
Test Against Quantum Threats: Use AI tools to stress-test encryption schemes against quantum-style threats.
Adopt PQC and Zero-Trust Models: Shift towards quantum-resistant cryptography and architectures that assume breach as the default state.
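The first two steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a migration tool: the asset names and the `QUANTUM_VULNERABLE` set are hypothetical, and the two HMAC variants are only stand-ins for swappable algorithms (they are not post-quantum schemes). The idea shown is that callers refer to an algorithm by name, and every protected record stores that name, so the default can change in one place.

```python
import hashlib
import hmac
from typing import Callable, Dict, Optional, Tuple

# Step 1 (inventory): flag algorithm families the article describes as
# breakable by Shor's algorithm. The set is an assumption for this sketch.
QUANTUM_VULNERABLE = {"RSA", "DH", "ECDH", "ECDSA"}


def flag_vulnerable(inventory: Dict[str, str]) -> Dict[str, str]:
    """Return the assets whose algorithm family is in the vulnerable set."""
    return {
        asset: alg
        for asset, alg in inventory.items()
        if alg.split("-")[0].upper() in QUANTUM_VULNERABLE
    }


# Step 2 (crypto-agility): a registry keyed by algorithm name.
# The HMAC variants are placeholders, NOT post-quantum algorithms.
MacFn = Callable[[bytes, bytes], bytes]
REGISTRY: Dict[str, MacFn] = {
    "hmac-sha256": lambda key, msg: hmac.new(key, msg, hashlib.sha256).digest(),
    "hmac-sha512": lambda key, msg: hmac.new(key, msg, hashlib.sha512).digest(),
}
CURRENT_ALG = "hmac-sha256"  # single switch point for newly protected data


def protect(key: bytes, msg: bytes, alg: Optional[str] = None) -> Tuple[str, bytes]:
    """Return (algorithm name, tag); the name travels with the data."""
    chosen = alg or CURRENT_ALG
    return chosen, REGISTRY[chosen](key, msg)


def verify(key: bytes, msg: bytes, alg: str, tag: bytes) -> bool:
    """Verify with whichever algorithm the record says it was protected by."""
    return hmac.compare_digest(REGISTRY[alg](key, msg), tag)


if __name__ == "__main__":
    assets = {"vpn-gateway": "RSA-2048", "disk": "AES-256", "wallet": "ECDSA-P256"}
    print(flag_vulnerable(assets))
    alg, tag = protect(b"secret", b"telemetry")
    print(alg, verify(b"secret", b"telemetry", alg, tag))
```

Because old records carry their algorithm name, data protected before a switch can still be verified while new data uses the replacement algorithm.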

In Summary

Quantum computing is not only a looming threat, it is a countdown to a new cryptographic arms race. Generative AI has already reshaped the cyber threat landscape, and in conjunction with quantum power, it is a force multiplier. It is a two-front challenge that requires more than incremental adjustment; it requires a change of cybersecurity paradigm.

Panic will not help us. Preparation will.

Abbreviations

RSA – Rivest, Shamir, and Adleman
ECC – Elliptic-Curve Cryptography
AES – Advanced Encryption Standard
TLS – Transport Layer Security
HTTPS – Hypertext Transfer Protocol Secure
ECDSA – Elliptic Curve Digital Signature Algorithm
NSA – National Security Agency
NIST – National Institute of Standards and Technology
PQC – Post-Quantum Cryptography

***Google’s Gemini is used in this post to paraphrase some sentences to add more context. ***

Liquid Dreams: The Rise of Immersion Cooling and Underwater Data Centers

Posted on July 10, 2025 by Vincent Rodriguez

By Omkar Ashok Bhalekar with Ajay Lotan Thakur

As demand for data keeps rising, driven by generative AI, real-time analytics, 8K streaming, and edge computing, data centers are facing an escalating dilemma: how to maintain performance without getting too hot. Traditional air-cooled server rooms that were once adequate for straightforward web hosting and storage are being stretched to their thermal extremes by modern compute-intensive workloads. While the world’s digital backbone burns hot, innovators are diving deep, deep to the ocean floor. Say hello to immersion cooling and undersea data farms, two technologies poised to revolutionize how the world stores and processes data.

Heat Is the Silent Killer of the Internet

In every data center, heat is the unobtrusive enemy. When racks of high-performance GPUs, CPUs, and ASICs all operate at the same time, they generate massive amounts of heat. The old approach of gigantic HVAC systems and chilled air manifolds is reaching its technological and environmental limits.

In the majority of installations, roughly 35-40% of total energy consumption is spent simply on cooling the hardware rather than running it. As model sizes and inference loads explode (think ChatGPT, DALL·E, or Tesla FSD), traditional cooling infrastructures simply aren’t up to the task without costly upgrades or environmental degradation. This is why there is a paradigm shift.

Liquid cooling is not an option everywhere due to lack of infrastructure, expense, and geography, so we still must rely on every player in the ecosystem to up the ante when it comes to energy efficiency. The burden crosses multiple domains: chip manufacturers need to deliver far greater performance per watt with advanced semiconductor design, and software developers need to write code that’s fundamentally low power by optimizing algorithms and reducing computational overhead.

Along with these basic improvements, memory manufacturers are designing low-power solutions, system manufacturers are making more power-efficient delivery networks, and cloud operators are making their data center operations more efficient while increasing the use of renewable energy sources. As Microsoft Chief Environmental Officer Lucas Joppa said, “We need to think about sustainability not as a constraint, but as an innovative driver that pushes us to build more efficient systems across every layer of the stack of technology.”

However, despite these multifaceted efficiency gains, thermal management remains a significant bottleneck that can have a profound impact on overall system performance and energy consumption. Ineffective cooling can force processors to throttle their performance, which negates the benefits of better chips and optimized software. This becomes a self-perpetuating loop in which wasteful thermal management counteracts efficiency gains elsewhere in the system.

In this blogpost, we will address the cooling aspect of energy consumption, considering how future thermal management technology can be a multiplier of efficiency across the entire computing infrastructure. We will explore how proper cooling strategies not only reduce direct energy consumption from cooling components themselves but also enable other components of the system to operate at their maximum efficiency levels.

What Is Immersion Cooling?

Immersion cooling cools servers by submerging them in carefully designed, non-conductive fluids (typically dielectric liquids) that transfer heat much more efficiently than air. Immersion liquids are harmless to electronics; in fact, they allow direct liquid contact cooling with no risk of short-circuiting or corrosion.

Two general types exist:

Single-phase immersion, in which the fluid remains liquid and transfers heat by convection.
Two-phase immersion, in which the fluid absorbs heat and boils at a low temperature, and the vapor then condenses back to liquid in a closed loop.

According to Vertiv’s research, in high-density data centers, liquid cooling improves the energy efficiency of IT and facility systems compared to air cooling. In their fully optimized study, the introduction of liquid cooling created a 10.2% reduction in total data center power and a more than 15% improvement in Total Usage Effectiveness (TUE).

Total Usage Effectiveness is calculated by using the formula below:

TUE = ITUE x PUE

where ITUE = Total Energy into the IT Equipment / Total Energy into the Compute Components, and PUE (Power Usage Effectiveness) = Total Facility Energy / Total Energy into the IT Equipment.
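To make the formula concrete, here is a minimal Python sketch of the calculation. The energy figures are hypothetical and chosen only for illustration; they do not come from the Vertiv study. The function names are likewise just for this example.

```python
def pue(total_facility_kwh: float, it_equipment_kwh: float) -> float:
    """Power Usage Effectiveness: total facility energy / energy into IT equipment."""
    return total_facility_kwh / it_equipment_kwh


def itue(it_equipment_kwh: float, compute_kwh: float) -> float:
    """IT-level effectiveness: energy into IT equipment / energy into compute components."""
    return it_equipment_kwh / compute_kwh


def tue(total_facility_kwh: float, it_equipment_kwh: float, compute_kwh: float) -> float:
    """TUE = ITUE x PUE, which reduces to total facility energy / compute energy."""
    return itue(it_equipment_kwh, compute_kwh) * pue(total_facility_kwh, it_equipment_kwh)


if __name__ == "__main__":
    # Hypothetical facility: 1500 kWh drawn in total, 1000 kWh reaches the IT
    # equipment, and 800 kWh of that reaches the compute components.
    print(f"PUE  = {pue(1500, 1000):.3f}")
    print(f"ITUE = {itue(1000, 800):.3f}")
    print(f"TUE  = {tue(1500, 1000, 800):.3f}")
```

Because the IT equipment term cancels, TUE captures losses both in the facility (cooling, power delivery) and inside the server (fans, power supplies), which is why it is a useful metric when comparing liquid and air cooling.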

Reimagining Data Centers Underwater
Imagine shipping an entire data center in a steel capsule and sinking it to the ocean floor. That’s no longer sci-fi.

Microsoft’s Project Natick demonstrated the concept by deploying a sealed underwater data center off the Orkney Islands, powered entirely by renewable energy and cooled by the surrounding seawater. Over its two-year lifespan, the submerged facility showed:

A server failure rate 1/8th that of land-based centers.
No need for on-site human intervention.
Efficient, passive cooling by natural sea currents.

Why underwater? Seawater is an open, large-scale heat sink, and underwater environments are naturally less prone to temperature fluctuations, dust, vibration, and power surges. Most coastal metropolises are the biggest consumers of cloud services and are within 100 miles of a viable deployment site, which would dramatically reduce latency.

Why This Tech Matters Now

Data centers already account for about 2–3% of the world’s electricity, and with the rapid growth in AI and metaverse workloads, that figure will grow. Generative inference workloads and AI training models consume up to 10x the power per rack that regular server workloads do, subjecting cooling gear and sustainability goals to tremendous pressure. Legacy air cooling technologies are reaching thermal and density thresholds, and immersion cooling is a critical solution to future scalability. According to Submer, a Barcelona-based immersion cooling company, immersion cooling has the ability to reduce energy consumed by cooling systems by up to 95% and enable higher rack density, thus providing a path to sustainable growth in data centers under AI-driven demands.

Advantages & Challenges

Immersion and submerged data centers possess several key advantages:

Sustainability – Lower energy consumption and lower carbon footprints are paramount as ESG (Environmental, Social, Governance) goals become business necessities.
Scalability & Efficiency – Immersion allows more density per square foot, reducing real estate and overhead facility expenses.
Reliability – Liquid-cooled and underwater systems have fewer mechanical failures thanks to less thermal stress, fewer moving parts, and less oxidation.
Security & Autonomy – Underwater encased pods or autonomous liquid systems are difficult to hack and can be remotely monitored and updated, ideal for zero-trust environments.

While immersion cooling and submerged data centers have clear advantages, they also come with challenges and limitations:

Maintenance and Accessibility Challenges – Both options make hardware maintenance complex. Immersion cooling requires components to be carefully lifted out of the dielectric liquid and cleaned before servicing, whereas underwater data centers offer extremely poor physical access: entire modules must be retrieved to fix them, which translates to longer downtimes.
High Initial Costs and Deployment Complexity – Construction of immersion tanks or underwater enclosures involves significant capital investment in specially designed equipment, infrastructure, and deployment techniques. Underwater data centers are also accompanied by marine engineering, watertight modules, and intricate site preparation.
Environmental and Regulatory Concerns – Both approaches involve environmental issues and regulatory adherence. Immersion systems must comply with fluid waste disposal regulations, while underwater data centers require marine environmental impact assessments, permits, and ongoing ecosystem protection mechanisms.
Technology Maturity and Operational Risks – These are immature technologies with minimal historical data on long-term performance and reliability. Potential problems include leakage of liquids in immersion cooling or damage and biofouling in underwater installations, which makes large-scale adoption uncertain.

Industry Momentum

Various companies are leading the charge:

GRC (Green Revolution Cooling) and Submer offer immersion solutions to hyperscalers and enterprises.
Iceotope offers precision liquid cooling for HPC.
Alibaba, Google, and Meta are testing immersion cooling at scale to support AI and ML clusters.
Microsoft, through Project Natick, is researching the commercial viability of off-grid, modular underwater data centers.

Hyperscalers are starting to design entire zones of their new data centers specifically for liquid-cooled GPU pods, while smaller edge data centers are adopting immersion tech to run quietly and efficiently in urban environments.

The Future of Data Centers: Autonomous, Sealed, and Everywhere
Looking ahead, the trend is clear: data centers are becoming more intelligent, compact, and environmentally integrated. We’re entering an era where:
AI-based DCIM software predicts and prevents failure in real-time.
Edge nodes with immersion cooling can be located anywhere, from smart factories to offshore oil rigs.
Entire data centers might be built as prefabricated modules, inserted into oceans, deserts, or even space.
The general principle? Compute must not be limited by land, heat, or humans.

Final Thoughts

In the fight to enable the digital future, air is a luxury. Immersed in liquid or bolted to the seafloor, data centers are shifting to cool smarter, not harder.

Underwater installations and liquid cooling are no longer out-there ideas; they’re lifelines to a scalable, sustainable web.

So, tomorrow’s “Cloud” won’t be in the sky; it will hum quietly under the sea.

About Author:
Omkar Bhalekar is a senior network engineer and technology enthusiast specializing in Data center architecture, Manufacturing infrastructure, and Sustainable solutions. With extensive experience in designing resilient industrial networks and building smart factories and AI data centers with scalable networks, Omkar writes to simplify complex technical topics for engineers, researchers, and industry leaders.

Cloud Hosting- A Deep Dive into a Rapidly Growing Market

Posted on September 16, 2024 by Vincent Rodriguez

By Paribhasha Tiwari, Market Analyst and Content Curator

What is Cloud Hosting?

Cloud hosting is a type of web hosting that uses a network of virtual servers to host websites, applications, or data. Unlike traditional hosting, which relies on a single server, cloud hosting spreads the load across multiple interconnected servers, ensuring better scalability, reliability, and performance.

As digital transformation continues to accelerate across the globe, the Cloud Hosting Market is emerging as a key enabler of business agility, scalability, and innovation. Whether it’s large enterprises seeking to optimize their IT infrastructure or startups aiming to minimize overhead, cloud hosting has become the go-to solution. With the market poised for substantial growth, driven by the demands of modern business environments, this post will explore the factors fueling the expansion of cloud hosting, the challenges that must be navigated, and the future trends shaping this dynamic industry.

The Growth of Cloud Hosting- A Market Overview:

The global cloud hosting market has experienced tremendous growth over the past decade, becoming one of the most critical components of the broader cloud services ecosystem. According to market research, the cloud hosting market was valued at $60 billion in 2021 and is projected to reach over $100 billion by 2026, reflecting a CAGR of 18% during the forecast period. This growth can be attributed to several factors, including the increasing adoption of digital services, the rise of e-commerce, and the surge in remote working practices following the COVID-19 pandemic.
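As a quick sanity check on the growth figures above, the compound annual growth rate (CAGR) arithmetic can be sketched in a few lines of Python. This is only an illustration of the standard compounding formula applied to the numbers quoted in this post; the function names are made up for the example.

```python
def project_value(start_value: float, cagr: float, years: int) -> float:
    """Compound a starting value forward at a constant annual growth rate."""
    return start_value * (1 + cagr) ** years


def implied_cagr(start_value: float, end_value: float, years: int) -> float:
    """Constant annual growth rate that takes start_value to end_value in `years`."""
    return (end_value / start_value) ** (1 / years) - 1


if __name__ == "__main__":
    # $60 billion in 2021, compounded at 18% per year over the five years to 2026.
    projected = project_value(60.0, 0.18, 5)
    print(f"Projected 2026 market size: ${projected:.1f} billion")
    # Round trip: recover the growth rate from the two endpoints.
    print(f"Implied CAGR: {implied_cagr(60.0, projected, 5):.1%}")
```

At 18% per year, $60 billion compounds to roughly $137 billion after five years, which is consistent with the "over $100 billion" projection.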

The transition from on-premises infrastructure to cloud-based hosting solutions offers businesses greater flexibility in scaling their operations. Unlike traditional hosting services, where companies are locked into a fixed amount of server capacity, cloud hosting allows for on-demand resource allocation, ensuring businesses only pay for what they use. This model provides substantial cost savings, particularly for industries with fluctuating traffic and demand, such as retail, media, and financial services.

Why Are Businesses Migrating to Cloud Hosting Solutions?

Cloud hosting’s advantages extend far beyond cost savings. Companies today are looking for more than just affordable IT infrastructure—they need reliability, performance, and scalability. These three elements have made cloud hosting the preferred choice for businesses of all sizes. Here’s why-

Scalability and Flexibility– The most compelling reason businesses are moving to the cloud is the ability to scale IT resources dynamically. Whether it’s handling a sudden spike in website traffic or expanding business operations globally, cloud hosting allows enterprises to scale resources up or down based on real-time needs. This elasticity ensures businesses remain agile and responsive to market conditions without worrying about over-provisioning or underutilization of hardware.
Cost Efficiency– Cloud hosting significantly reduces the capital expenditure (CapEx) associated with setting up and maintaining physical servers. Instead of investing in costly infrastructure, businesses can convert these expenses to operational expenditures (OpEx) by paying only for the computing power they need. This is especially beneficial for startups and small-to-medium enterprises (SMEs) that need to manage their budgets tightly while still accessing world-class hosting services.
Performance and Uptime– Leading cloud hosting providers guarantee uptime of 99.99% or higher, thanks to their globally distributed data centers and redundant systems. Downtime can be catastrophic, especially for e-commerce platforms, fintech companies, and digital service providers where revenue and customer satisfaction are directly tied to system availability. With cloud hosting, businesses can rely on seamless performance, even during peak demand periods.
Disaster Recovery and Business Continuity– Another critical advantage of cloud hosting is the ability to integrate disaster recovery (DR) strategies into the hosting plan. In traditional hosting, implementing a comprehensive DR plan is complex and costly. Cloud hosting, however, offers built-in DR features like data backups, redundancy, and geo-replication, ensuring businesses can recover swiftly from any unplanned outage or data loss event.
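To put the uptime figure mentioned above into perspective, the short Python sketch below converts an uptime percentage into the downtime it still permits. It is plain arithmetic over a 365-day year; the function name is an assumption for this example, and actual SLA terms (measurement windows, exclusions, service credits) vary by provider.

```python
def allowed_downtime_minutes(uptime_percent: float, period_days: int = 365) -> float:
    """Minutes of downtime permitted over the period at the given uptime level."""
    total_minutes = period_days * 24 * 60
    return total_minutes * (1 - uptime_percent / 100)


if __name__ == "__main__":
    for level in (99.9, 99.99, 99.999):
        minutes = allowed_downtime_minutes(level)
        print(f"{level}% uptime allows about {minutes:.2f} minutes of downtime per year")
```

A 99.99% guarantee therefore leaves room for a little under an hour of downtime per year (about 52.6 minutes), while 99.9% allows nearly nine hours.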

Survey of Cloud Network Access Alternatives- Wireless and Wireline Solutions:

The growing adoption of cloud hosting has raised an important question for businesses- How should they connect to the cloud? Network access plays a critical role in determining the performance, reliability, and cost-efficiency of cloud hosting services. As businesses weigh their options, they typically consider two main types of cloud access solutions- wireline (fixed-line) and wireless alternatives.

Wireline Access– Traditionally, businesses have relied on fixed-line connections, such as fiber-optic, DSL, or Ethernet, for cloud access. These wireline solutions provide stable, high-bandwidth connections and are often favored by enterprises with high-performance computing needs or large volumes of data traffic. Fiber-optic connections, in particular, offer ultra-high speeds and low latency, making them ideal for industries like finance, healthcare, and media, where data-intensive operations are critical.
- Advantages of Wireline Access– Wireline access is known for its reliability and low-latency performance, making it an excellent choice for mission-critical applications that require real-time data processing. Additionally, wireline connections tend to offer better security and dedicated bandwidth, reducing the risk of network congestion.
- Challenges of Wireline Access– The main downside to wireline solutions is the lack of mobility and flexibility. Fixed-line connections are geographically limited, which can pose challenges for businesses with distributed workforces or operations in remote areas.
Wireless Access– As cloud technology evolves, wireless access solutions, particularly through 4G LTE, 5G, and Wi-Fi 6, are becoming increasingly popular. Wireless cloud access is particularly appealing for businesses with remote or mobile operations, where flexibility and mobility are critical. The advent of 5G has been a game-changer, offering near fiber-optic speeds, reduced latency, and massive device connectivity—all of which are crucial for sectors like logistics, manufacturing, and retail.
- Advantages of Wireless Access– Wireless solutions enable businesses to connect to the cloud from virtually anywhere, without the need for fixed infrastructure. This is particularly beneficial for businesses with multiple locations or a high degree of mobility. Wireless connections, particularly with 5G and Wi-Fi 6, provide speeds and performance that rival many traditional wireline solutions, making wireless access a viable alternative for many use cases.
- Challenges of Wireless Access– Despite its flexibility, wireless cloud access can be more prone to latency and security concerns compared to wireline solutions. While advances in 5G and Wi-Fi technology are helping to mitigate these issues, businesses must carefully assess their wireless infrastructure to ensure performance and security standards are met.

Security and Compliance- Addressing Concerns in the Cloud:

For many businesses, security remains a primary concern when moving to the cloud. Despite the growing adoption of cloud services, there are lingering worries about data breaches, hacking attempts, and regulatory compliance. However, cloud hosting providers have significantly improved their security protocols, and many offer industry-leading security measures that far surpass traditional on-premises hosting.

Advanced Encryption– Data is often stored in encrypted formats, both at rest and in transit, ensuring that unauthorized users cannot access sensitive information. Leading cloud providers offer 256-bit encryption (for example, AES-256), along with secure key management systems to further bolster security.
Compliance with Regulatory Standards– Cloud hosting platforms comply with major regulatory standards, including GDPR, HIPAA, SOC 2, and ISO 27001 certifications. This compliance ensures that businesses operating in regulated industries such as healthcare, finance, and retail can meet legal requirements without having to manage their own data compliance infrastructure.
Multi-factor Authentication (MFA) and Identity Management– Cloud providers also implement stringent access controls, using technologies like MFA, identity and access management (IAM) systems, and biometric authentication to safeguard user data.
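One common second factor in MFA is the time-based one-time password (TOTP) generated by authenticator apps. The sketch below is a minimal standard-library Python implementation of the HOTP/TOTP algorithms as described in RFC 4226 and RFC 6238. It is meant only to show how such codes are derived, not how any particular cloud provider implements MFA; the secret used is the well-known RFC test value, not a real credential.

```python
import hashlib
import hmac
import struct
import time
from typing import Optional


def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """HMAC-based one-time password (RFC 4226) using HMAC-SHA1."""
    digest = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = digest[-1] & 0x0F  # dynamic truncation offset
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def totp(secret: bytes, at_time: Optional[float] = None,
         step: int = 30, digits: int = 6) -> str:
    """Time-based one-time password (RFC 6238): HOTP over the current time step."""
    if at_time is None:
        at_time = time.time()
    return hotp(secret, int(at_time) // step, digits)


if __name__ == "__main__":
    rfc_secret = b"12345678901234567890"  # published RFC test secret
    print(totp(rfc_secret, at_time=59, digits=8))  # RFC 6238 test vector: 94287082
    print(totp(rfc_secret))                        # code for the current 30-second window
```

The server and the user's device share the secret and each compute the code independently, so a stolen password alone is not enough to log in.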

Challenges Facing the Cloud Hosting Market:

Despite its numerous benefits, the cloud hosting sector still faces challenges that need to be addressed for continued growth-

Latency and Regional Data Centers– One of the key challenges is latency, particularly for businesses with a global customer base. To ensure minimal latency, cloud hosting providers need to offer data centers in geographically diverse locations. However, regions with limited data center infrastructure, such as parts of Africa and Latin America, may still face connectivity and latency issues.
Data Sovereignty and Privacy– Another challenge is navigating data sovereignty laws, which dictate that certain types of data must be stored within specific geographical regions. Businesses operating across borders must ensure compliance with local data regulations, which may complicate cloud hosting strategies.
Vendor Lock-In– Vendor lock-in occurs when a business becomes too dependent on a single cloud hosting provider, limiting its flexibility to switch providers or adopt new technologies. While many cloud providers offer tools to mitigate lock-in, businesses must carefully plan their cloud strategy to avoid over-reliance on a single vendor.

What’s Next for Cloud Hosting- Key Emerging Trends:

As the market continues to grow, several trends are emerging that will shape its future trajectory-

Hybrid and Multi-Cloud Strategies– Businesses are increasingly adopting hybrid and multi-cloud strategies, combining public and private cloud environments to optimize cost, performance, and security. Hybrid cloud allows companies to store sensitive data in a private cloud while taking advantage of the scalability of the public cloud for less sensitive workloads.
Edge Computing– With the rise of Internet of Things (IoT) devices and the need for low-latency computing, edge computing is becoming a critical component of cloud hosting. By bringing computing resources closer to the data source, edge computing reduces latency, enhances real-time processing, and improves the overall performance of IoT applications.
AI and Automation in Cloud Hosting– Artificial Intelligence (AI) and automation are transforming cloud hosting services. AI-powered cloud infrastructure can optimize resource allocation, predict maintenance needs, and enhance cybersecurity. Automation tools are also helping businesses manage their cloud environments more efficiently, reducing the need for manual intervention.

Conclusion- Making the Most of Cloud Hosting:

The cloud hosting industry is not just growing, it is evolving to meet the complex demands of modern businesses. With its scalable, secure, and cost-effective infrastructure, cloud hosting offers unparalleled opportunities for companies to innovate and stay competitive in an increasingly digital world. By understanding the benefits, challenges, and future trends, businesses can make informed decisions to maximize the potential of their cloud hosting strategies and drive long-term growth.

Author: Paribhasha Tiwari, Market Analyst and Content Curator

Telecom and AI Status in the EU

Posted on July 5, 2024 by Vincent Rodriguez

By Afnan Khan with Ajay Lotan Thakur

Introduction

In the eerie silence of deserted streets and amidst the anxious hum of masked conversations, the world found itself gripped by the rapid proliferation of COVID-19. Soon labelled a global pandemic due to the havoc wreaked by soaring death tolls, it brought unprecedented disruption and accelerated the inevitable rise of the digital age. The era of digital transformation has swiftly transitioned, spawning a multitude of businesses catering to every human need. Today, our dependence on digital technology remains steadfast, with remote work becoming the norm and IT services spending increasing from $1.071 trillion in 2020 to $1.585 trillion. [1]

The chart below, sourced from Oliver Wyman Forum Analysis,[2] vividly illustrates our increasing dependence on technology. It presents findings from a survey conducted in the latter half of 2020 across eight countries – US, UK, France, Germany, Italy, Spain, Singapore, and China. The survey reveals that 60% of respondents favoured increased use of video conferencing, while online grocery shopping and telehealth services each garnered 59% approval, and E-learning showed a strong preference at 56%. This data underscores how swiftly digital solutions integrated into our daily lives during the pandemic.

Source: Oliver Wyman Forum Analysis [2]

Advancements in Telecom and AI Applications Across EU

The graph below represents the project and infrastructure finance deal volume in the telecommunications sector from 2020 to 2023. The dominance of Germany is evident, with the deal volume reported to be $36.115 billion, followed by the UK at $21.889 billion. France follows closely in third place with a deal volume of $20.415 billion, representing significant market potential. The only other two countries with substantial figures are Italy and Spain, although there have been some promising deals closing in Ireland, Portugal, and Romania with large new financing deals in the project finance sector.

Source: Proximo Intelligence [5]

Deutsche Telekom, the national provider, has spearheaded advancements with AI-powered network optimisation tools. These tools leverage real-time analytics, resulting in a notable 20% enhancement in network performance and a 15% reduction in customer complaints. [14] While 2022 marked a pivotal year for the industry in Germany, the evolution of German fibre optics infrastructure has continued apace. Germany led Europe’s FTTH (Fibre to the Home) initiative, with significant financings closing throughout the year. According to Proximo Data, 16 European FTTH financings concluded in 2022, amassing nearly $26 billion in deal volume, with German deals accounting for almost $9 billion of that total.

Spain’s Telefónica has deployed an advanced AI-driven fraud detection system that effectively blocks over 95% of fraudulent activities. This initiative not only protects Telefónica from financial losses but also enhances security for its customers. [15] The adoption of AI for cybersecurity underscores a broader trend in the telecom industry towards leveraging advanced technologies to bolster trust and safeguard digital transactions.

Orange has introduced AI-driven chatbots that autonomously handle more than 90% of customer queries in France, resulting in a significant reduction in customer service costs by 40% and a notable increase in customer satisfaction rates by 25%. [16] This innovation represents a paradigm shift in customer service automation within the telecom sector, demonstrating the effectiveness of AI in improving operational efficiency and enhancing the overall customer experience.

Telecom Italia (TIM) has implemented AI-powered network security solutions to proactively detect and mitigate cyber threats in real-time, achieving a remarkable 60% reduction in cybersecurity incidents. [17]

This strategic deployment of AI highlights TIM’s commitment to enhancing network resilience and safeguarding critical infrastructure from evolving cyber threats, setting a precedent for cybersecurity strategies in the telecommunications industry.

Predictive Analysis Enhancing Telecom Resilience

Interference mitigation strategies are essential for smooth digital operations in the post-pandemic world. Picture digital experts rapidly addressing problems from rogue networks and environmental noise, creating a digital shield against disruptions, and ensuring a seamless user experience. These strategies propel telecom companies towards better connectivity and user satisfaction.

These examples highlight the trend of using AI and predictive analytics to boost network performance in cities. As urban areas contend with population growth and increasing digital demands, telecom companies invest in advanced technologies. These reduce network congestion, enhance service reliability, and support sustainable urban development. This trend not only improves customer experience but also positions telecom providers as leaders in developing future smart cities.

The chart below, from the Proximo Intelligence database, shows European deal volumes over the past three years, categorised by sub-sectors. The broadband and cable network sector leads with a deal volume of $79.784 billion from 86 deals out of a total 137 in project and infrastructure finance. Cellular and mobile infrastructure follows with $31.292 billion across 25 deals. Data centres, a growing trend, also report a deal volume of $30.967 billion across 25 deals.

Source: Proximo Intelligence [5]

In the post-COVID era, the adoption of predictive maintenance and real-time monitoring has accelerated, becoming a critical component of the new normal for businesses. These technologies enable companies to build more resilient infrastructures, proactively mitigate risks, and enhance operational efficiency. As businesses continue adapting to a rapidly changing environment, the integration of predictive maintenance solutions plays a pivotal role in sustaining long-term growth and stability.

Europe has seen profound impacts from these advancements, setting a precedent for global telecom strategies moving forward.

Future Trends and The Way Forward

European telecommunications face challenges shaped by regulatory frameworks, economic conditions, and technological advancements:

Brexit introduces regulatory uncertainties for UK telecoms. [18]
Germany’s GDPR compliance challenges demand heavy investment. [19]
Spain faces economic instability affecting telecom investments. [20]
France’s 5G deployment is delayed by regulatory barriers. [21]
Italy’s 5G rollout is hindered by spectrum allocation challenges. [22]
The Netherlands invests in cybersecurity for evolving threats. [23]
Sweden focuses on bridging rural connectivity gaps. [24]
Switzerland navigates complex regulatory landscapes for innovation. [25]

In the wake of COVID-19, with masks now a thing of the past and streets deserted only due to construction, digital technologies are transforming European telecommunications amidst regulatory shifts and economic uncertainties. Investments in infrastructure and AI innovations are pivotal, shaping the industry’s future and its adaptation to rapid change while driving economic recovery across Europe. How will the industry sustain innovation and meet growing digital demands ahead? Only time will tell.

References

https://www.statista.com/statistics/203291/global-it-services-spending-forecast/
https://www.oliverwyman.com/our-expertise/perspectives/health/2021/mar/why-4-technologies-that-boomed-during-covid-19-will-keep-people-.html
https://www.worldometers.info/coronavirus/
https://www.gov.uk/government/news/new-data-shows-small-businesses-received-213-billion-in-covid-19-local-authority-business-support-grants#:~:text=Press%20release-,New%20data%20shows%20small%20businesses%20received%20%C2%A321.3%20billion%20in,and%20arts%2C%20entertainment%20and%20recreation.
Proximo Intelligence Data: www.proximoinfra.com
Vodafone Press Release, 2022.
McKinsey & Company. “Predictive maintenance: The rise of self-maintaining assets.”
Deloitte. “Predictive maintenance: Taking proactivity to the next level.”
Forbes. “Why Virtual Assistants Are Becoming Essential for Businesses.”
Statista. “Growth in Demand for Virtual Assistants in Europe.”
TechRadar. “Vodafone’s AI traffic prediction cuts network congestion by 25% in London.”
The Guardian. “BT/EE’s AI traffic prediction cuts network congestion by 30% in London.”
FCC (2023). Spectrum Efficiency Report. Federal Communications Commission. Available at: https://www.fcc.gov/reports-research/reports/fcc-research/spectrum-efficiency.
“Deutsche Telekom’s AI-Powered Network Optimization,” TechInsights.
“Telefónica’s AI-Driven Fraud Detection,” TelecomsToday.
“Orange’s AI-Enabled Customer Support,” AI Insider.
“TIM’s AI-Powered Cybersecurity Measures,” CyberTechNews.
TelecomsInsight. “Brexit’s Regulatory Impact on UK Telecoms.”
DataPrivacyToday. “GDPR Compliance Challenges for German Telcos.”
BusinessWire. “Spain’s Economic Recovery Challenges.”
TelecomsObserver. “France’s Regulatory Roadblocks to 5G Deployment.”
SpectrumInsight. “Italy’s Spectrum Allocation Challenges.”
CyberDefenseMag. “Netherlands’ Cybersecurity Imperatives.”
DigitalInclusionHub. “Sweden’s Rural Connectivity Initiatives.”
RegTechInsights. “Switzerland’s Regulatory Adaptation Challenges.”

Afnan Khan is a Machine Learning Engineer specialising in Marketing Analytics, currently working as a Marketing Analyst at the Exile Group in London. He is involved in various projects, research, and roles related to Machine Learning, Data Science, and AI.

Ajay Lotan Thakur is a Senior IEEE Member, IEEE Techblog Editorial Board Member, BCS Fellow, TST Member of ONF’s Open-Source Aether (Private 5G) Project, and Cloud Software Architect at Intel Canada.

Post COVID Telco AI Blueprint for the UK

Posted on July 5, 2024 by Vincent Rodriguez

By Afnan Khan with Ajay Lotan Thakur

Introduction

Accelerating Telecom Growth in Britain

Europe was among the regions hardest hit by the pandemic, with a death toll exceeding 2.1 million. [3] This crisis accelerated the adoption of digital technologies, prompting businesses to invest in smarter, more sustainable operations to increase their longevity and stay relevant in the market.

In the United Kingdom, despite the government’s injection of £21.3 billion into the economy to support small businesses, the emphasis on digital transformation has been paramount. [4] The push towards digital solutions, including enhanced internet connectivity and robust data centres, underscores the long-term strategic shift towards a more resilient and technologically advanced business landscape.

Statistically, the UK telecom industry has experienced significant growth, driven by increased demand and advancements in network equipment. The shift towards digital dependency, accelerated by the COVID-19 pandemic and the rise of remote work, is expected to be long-term. This trend has also led to a surge in 5G and data centre deals.

According to Proximo, a leading project and infrastructure finance journal, data centre projects worth $30.967 billion closed in Europe between 2020 and 2023, highlighting the critical role of data centres in boosting the telecommunications sector. Of this, the UK accounted for $14.133 billion across seven deals, comprising both refinancing and new financing deals, representing 45.6% of Europe’s total data centre deal volume. Notably, one of the recent financing deals to close was for Ark Data Centres, based in London, with the term loan reported to be in the region of £170 million for five years, aimed at supporting a significant data project in the UK, thus establishing the country as one of the market leaders in Europe. [5]
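The UK's share quoted above follows directly from the two deal volumes. A small Python check, using only the figures in this post (the function and variable names are just for illustration):

```python
def share_percent(part: float, total: float) -> float:
    """Return `part` as a percentage of `total`."""
    return part / total * 100


if __name__ == "__main__":
    uk_volume_bn = 14.133      # UK data centre deal volume, in billions of US dollars
    europe_volume_bn = 30.967  # European data centre deal volume, same units
    print(f"UK share: {share_percent(uk_volume_bn, europe_volume_bn):.1f}%")  # 45.6%
```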

Telecom Landscape in the UK’s New Normal

Imagine having the ability to pinpoint precisely when hardware needs replacement, akin to pre-emptively replacing floorboards. Vodafone’s Unified Performance Management (UPM) facilitates real-time monitoring and proactive identification of anomalies. [6] Predictive maintenance can reduce unplanned downtime by 30-50%, lower maintenance costs by 10-40%, and extend asset lifespan by 20-40%. [7][8]
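To give a feel for what "proactive identification of anomalies" can look like in practice, here is a deliberately simple Python sketch that flags readings deviating sharply from a rolling baseline. It is a generic z-score technique, not a description of how Vodafone's platform works; the sensor readings, window size, and threshold are all hypothetical.

```python
from statistics import mean, stdev
from typing import List


def find_anomalies(readings: List[float], window: int = 5,
                   threshold: float = 3.0) -> List[int]:
    """Return indices of readings more than `threshold` standard deviations
    away from the mean of the preceding `window` readings."""
    anomalies = []
    for i in range(window, len(readings)):
        baseline = readings[i - window:i]
        mu = mean(baseline)
        sigma = stdev(baseline)
        if sigma == 0:
            continue  # flat baseline: z-score undefined, skipped in this sketch
        if abs(readings[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


if __name__ == "__main__":
    # Hypothetical equipment temperature readings (degrees Celsius).
    temps = [50.0, 51.0, 49.0, 50.0, 51.0, 50.0, 49.0, 75.0, 50.0]
    print(find_anomalies(temps))  # flags index 7, the 75.0 spike
```

Production systems typically combine many signals and learned models, but the principle is the same: establish what normal looks like, then alert on departures from it before they turn into failures.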

Virtual Assistants

The integration of virtual assistants has not only streamlined operations but has also emerged as one of the most sought-after roles, as reported by Forbes. [9] In the telecom industry, where customer service reigns supreme, consider the live example of broadband giant BT/EE. Their adoption of remote customer support in the post-COVID world has propelled them to the forefront as the leading data provider in the UK. Mirroring European trends, the demand for virtual assistant roles has surged by 20%, [10] spurred on by initiatives such as digital nomad visas in Spain and Portugal. This trend not only reflects the changing landscape of customer service but also signals significant injections into the economy.

Traffic Congestion

In the hustle and bustle of post-pandemic London, navigating the city’s streets amidst fluctuating traffic patterns and network demands presents a unique challenge. Telecom companies are stepping up to the plate, leveraging cutting-edge AI and ML technologies to tackle these issues head-on. By predicting traffic patterns and dynamically managing network loads, they’re ensuring that Londoners experience optimal connectivity and responsiveness, even during peak hours when congestion is at its highest. Imagine this: congestion hotspots are pinpointed in real-time, and network resources are strategically directed to these areas, reducing disruptions. This means that residents and commuters alike enjoy a smoother, more reliable connection, whether they’re streaming, working remotely, or simply staying connected on the go.

One shining example is Vodafone, which has implemented AI-driven traffic prediction models specifically tailored to London’s intricate traffic patterns. The result? A remarkable 25% reduction in network congestion during peak hours, as reported by TechRadar. [11] This underscores the significance of bespoke solutions in addressing London’s unique challenges post-pandemic, solidifying network performance and reliability for the city’s diverse population and thriving businesses.

Another notable case is BT/EE, which has also deployed AI-driven traffic prediction models in London. This initiative led to a significant 30% reduction in network congestion during peak hours. [12] Such tailored AI solutions not only enhance operational efficiency but also demonstrate the telecom industry’s commitment to leveraging technology to improve urban infrastructure.
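
To make the idea of predicting load and pinpointing hotspots more concrete, here is a minimal sketch in Python. It is not the model used by Vodafone or BT/EE, whose details are not public in this post; it simply applies exponential smoothing to recent load samples and flags cells whose forecast utilisation crosses a threshold. The cell names, load figures, capacity values and the 0.8 threshold are all made-up assumptions.

```python
def forecast_next(loads, alpha=0.5):
    """Simple exponential smoothing: forecast the load for the next interval."""
    if not loads:
        raise ValueError("need at least one observation")
    level = loads[0]
    for x in loads[1:]:
        # Blend the newest sample with the running level
        level = alpha * x + (1 - alpha) * level
    return level


def predicted_hotspots(history, capacity, threshold=0.8, alpha=0.5):
    """Return (cell, forecast utilisation) pairs above `threshold`, busiest first."""
    hot = []
    for cell, loads in history.items():
        utilisation = forecast_next(loads, alpha) / capacity[cell]
        if utilisation > threshold:
            hot.append((cell, round(utilisation, 2)))
    return sorted(hot, key=lambda item: item[1], reverse=True)


# Hypothetical load samples (Mbps) for three made-up cells
history = {
    "cell_a": [400, 520, 610, 700],
    "cell_b": [300, 310, 290, 305],
    "cell_c": [650, 700, 760, 820],
}
capacity = {"cell_a": 800, "cell_b": 800, "cell_c": 800}

hotspots = predicted_hotspots(history, capacity)
print(hotspots)
```

In a real network the forecast would feed a controller that shifts capacity towards the flagged cells; a production model would also account for time of day, events and seasonality, which this sketch ignores.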

Dynamic Spectrum

In the dynamic realm of post-COVID technology, telecom pioneers are revolutionising spectrum management with dynamic spectrum allocation. Imagine a digital symphony where frequencies dance to the beat of demand, seamlessly adapting to surges in digital traffic. This innovative approach ensures uninterrupted connectivity, even in the busiest digital arenas. According to recent studies, dynamic spectrum allocation has been shown to increase spectrum efficiency by up to 40%, supporting seamless connectivity for the data-hungry masses. [13] Telecom wizards are thus reshaping the digital landscape, delivering turbo-charged connectivity.
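
As a rough illustration of demand-driven allocation, the sketch below splits a fixed number of spectrum blocks across areas in proportion to their current demand, using the largest-remainder method so that the result is a whole number of blocks per area. This is an assumed, simplified scheme, not a description of any operator's actual allocator; the area names, demand figures and block count are hypothetical.

```python
def allocate_spectrum(demand, total_blocks):
    """Split `total_blocks` across areas in proportion to demand.

    Uses the largest-remainder method, so allocations are integers
    that always sum to `total_blocks`.
    """
    total_demand = sum(demand.values())
    if total_demand <= 0:
        raise ValueError("total demand must be positive")
    quotas = {a: total_blocks * d / total_demand for a, d in demand.items()}
    allocation = {a: int(q) for a, q in quotas.items()}
    leftover = total_blocks - sum(allocation.values())
    # Hand remaining blocks to the areas with the largest fractional share
    by_remainder = sorted(quotas, key=lambda a: quotas[a] - allocation[a], reverse=True)
    for a in by_remainder[:leftover]:
        allocation[a] += 1
    return allocation


# Hypothetical demand (arbitrary units) on a quiet day and on a match day
quiet = allocate_spectrum({"stadium": 20, "office": 50, "suburb": 30}, 10)
match_day = allocate_spectrum({"stadium": 90, "office": 40, "suburb": 30}, 10)
print(quiet)
print(match_day)
```

Re-running the allocation as demand changes is what makes it "dynamic": the stadium's share grows on a match day and shrinks again afterwards.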

References

https://www.statista.com/statistics/203291/global-it-services-spending-forecast/
https://www.oliverwyman.com/our-expertise/perspectives/health/2021/mar/why-4-technologies-that-boomed-during-covid-19-will-keep-people-.html
https://www.worldometers.info/coronavirus/
https://www.gov.uk/government/news/new-data-shows-small-businesses-received-213-billion-in-covid-19-local-authority-business-support-grants#:~:text=Press%20release-,New%20data%20shows%20small%20businesses%20received%20%C2%A321.3%20billion%20in,and%20arts%2C%20entertainment%20and%20recreation.
Proximo Intelligence Data: www.proximoinfra.com
Vodafone Press Release, 2022.
McKinsey & Company. “Predictive maintenance: The rise of self-maintaining assets.”
Deloitte. “Predictive maintenance: Taking proactivity to the next level.”
Forbes. “Why Virtual Assistants Are Becoming Essential for Businesses.”
Statista. “Growth in Demand for Virtual Assistants in Europe.”
TechRadar. “Vodafone’s AI traffic prediction cuts network congestion by 25% in London.”
The Guardian. “BT/EE’s AI traffic prediction cuts network congestion by 30% in London.”

Harnessing the Power of 5G

Posted on June 10, 2024 by Vincent Rodriguez

Revolutionizing Telecom with Programmable Networks and APIs

By Ameer Shohail L with Ajay Lotan Thakur

Ameer Shohail is an experienced ICT Solutions Design Specialist and IEEE Senior Member at a Tier 1 telecom operator in the Middle East, specializing in advanced wireless technologies.

Ajay Lotan Thakur is a Senior IEEE Member, IEEE Techblog Editorial Board Member (who edits/adds to blog posts), BCS Fellow, TST Member of ONF’s open source Aether (Private 5G) Project, Cloud Software Architect at Intel Canada.

Abstract

The telecom industry is on the verge of a significant transformation driven by the convergence of 5G/Beyond 5G and programmable networks. This article explores the immense potential of these advancements, emphasizing the shift from rigid infrastructures to dynamic platforms that offer their capabilities as services. We delve into the importance of programmable networks, the Network as a Service (NaaS) model, and the critical role of APIs. The article highlights the Network Exposure Function (5G CORE – NEF) and its role in creating new revenue streams for CSPs and fostering a thriving digital ecosystem. The TM (TeleManagement) Forum’s emphasis on service lifecycle management and the collective effort towards a digitally interconnected future are key themes, inviting all stakeholders to embrace this transformative journey.

Introduction: The Paradigm Shift in Telecom

The telecom industry is undergoing a profound transformation from traditional infrastructures to more dynamic and flexible systems. This shift is essential to meet the increasing demands for faster, more reliable, and scalable network services. Innovation and collaboration are now crucial drivers of industry growth, enabling communication service providers (CSPs) to leverage cutting-edge technologies to enhance their offerings and stay competitive.

Figure1: EPS Architecture

Figure2: 5GS Architecture

Programmable Networks and NaaS: The New Frontier

Programmable networks and Network as a Service (NaaS) represent the next frontier in telecom innovation. These technologies allow CSPs to offer network capabilities as customizable services, enhancing flexibility and efficiency. The TM Forum’s Open Digital Architecture (ODA) is central to this transformation, providing a standardized framework for the seamless integration of network services. By adopting ODA, CSPs can decouple network functions from the underlying hardware, enabling greater agility and innovation.

Figure3: NaaS function wheel, depicting various NaaS functions offered to consumers, REF[1]

APIs: The Engine of Innovation

APIs are fundamental to the telecom industry’s transformation, empowering developers to create innovative services that leverage network capabilities. The Common API Framework (CAPIF) by 3GPP standardizes API usage across various network functions, ensuring smooth and secure communication. This framework acts as a universal language, facilitating collaboration and enhancing security across diverse applications and industries.

Figure4: Functional model for the CAPIF
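
The publish, discover and authorize flow that a common API framework enables can be sketched with a toy in-memory model. The class below is purely illustrative: its name, methods and the API names are hypothetical and are not the 3GPP-defined CAPIF interfaces. It only mirrors the general idea that APIs are published to a central function, API invokers are onboarded, and an invoker can discover and call only what it has been granted.

```python
class ToyCapifCore:
    """Toy stand-in for a CAPIF-style core function (hypothetical, not the 3GPP API)."""

    def __init__(self):
        self._apis = {}         # api_name -> set of invoker ids allowed to call it
        self._invokers = set()  # onboarded invoker ids

    def publish(self, api_name):
        """Register an API so that it can be granted and discovered."""
        self._apis.setdefault(api_name, set())

    def onboard(self, invoker_id):
        """Register an API invoker (for example, a third-party application)."""
        self._invokers.add(invoker_id)

    def grant(self, invoker_id, api_name):
        """Authorize an onboarded invoker to use a published API."""
        if invoker_id not in self._invokers:
            raise PermissionError(f"{invoker_id} is not onboarded")
        if api_name not in self._apis:
            raise KeyError(f"{api_name} is not published")
        self._apis[api_name].add(invoker_id)

    def discover(self, invoker_id):
        """List the APIs this invoker is authorized to call."""
        return sorted(a for a, allowed in self._apis.items() if invoker_id in allowed)

    def is_authorized(self, invoker_id, api_name):
        return invoker_id in self._apis.get(api_name, set())


core = ToyCapifCore()
core.publish("qos-on-demand")      # hypothetical API names
core.publish("device-location")
core.onboard("app-1")
core.grant("app-1", "qos-on-demand")
print(core.discover("app-1"))
```

A real deployment would add authentication, security methods, logging and charging around these steps; the sketch leaves all of that out.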

Network Exposure Function (NEF): Unlocking New Potential

3GPP 5G Core architecture

5GC (Figure2) is the new 3GPP standard for core networks defining how the core network should evolve to support the needs of 5G New Radio (NR) and the advanced use cases enabled by it. The figure below depicts NEF representation in the non-roaming architecture, using 5G reference point representation.

Figure5: Network Exposure Function in reference point representation REF[4]

The Network Exposure Function (NEF) provides a secure, standardized method for exposing APIs, enabling CSPs to broaden their service offerings and explore new revenue streams. NEF is essential for fostering a thriving digital ecosystem and for driving innovation and economic growth in the telecom sector. The figure below depicts the role NEF plays in the IoT ecosystem, enabling the network and external applications to exchange information through APIs.

Figure6: Illustrates NEF and its role in enabling the network and external applications to exchange information, REF[5]

CAMARA Project: Accelerating Innovation through Standardized APIs

The CAMARA project, an open-source initiative hosted by the Linux Foundation, is pivotal in advancing the telecom industry’s move towards programmable networks and Network as a Service (NaaS). CAMARA aims to develop standardized APIs for network services, enabling seamless integration and interoperability across diverse network functions and applications.

This initiative promotes community-driven development and open-source principles, fostering collaboration among CSPs, technology vendors, and developers. Supported by leading industry players, CAMARA drives the adoption of robust, secure, and widely accepted APIs, facilitating new business models and revenue streams. By focusing on the needs of future technologies, CAMARA ensures that the telecom sector is well-equipped to leverage the unique capabilities of 5G, B5G, and beyond. REF[8]

Transition to a Service-Centric Model

The TM Forum’s focus on NaaS and service lifecycle management underscores the industry’s shift towards a service-centric model. This approach emphasizes managing the entire lifecycle of network services, from design and deployment to operation and optimization. By adopting a service-centric mindset, CSPs can deliver more personalized and efficient services, improving customer satisfaction and driving business growth. This approach also helps CSPs optimize operations and reduce costs by streamlining processes and improving resource utilization.

Figure7: Open Gateway NaaS Architecture and contributing stakeholders REF[7]

Conclusion

The telecom industry’s shift to programmable networks and NaaS marks a pivotal moment in its evolution. By embracing APIs, NEF, and service lifecycle management, CSPs can unlock new opportunities for innovation and growth. The collective efforts of industry stakeholders, supported by initiatives from TM Forum, GSMA, and CAMARA, will pave the way for a digitally interconnected future where collaboration and innovation are the norms. As the industry continues to evolve, embracing these advancements will be crucial for CSPs to stay competitive and meet the ever-growing demands of the digital age.

This journey is not just about technological advancement; it’s a collective endeavor towards a digitally interconnected future. It’s an invitation for all stakeholders, from telecom operators and technology providers to developers, to contribute to and reap the benefits of the expanding digital economy. Let’s embrace this transformative time, shaping the way we connect and interact in the digital world.

References

“IG1224 NaaS Transformation v12.0.0”: https://www.tmforum.org/resources/reference/ig1224-naas-transformation-v12-0-0/
“Northbound exposure – how NEF and CAMARA can enable telecom’s platform play” by James Crawshaw, Practice Leader: https://omdia.tech.informa.com/om028769/northbound-exposure–how-nef-and-camara-can-enable-telecoms-platform-play
ETSI TS 129 522 V16.4.0 (2020-08) – 5G; 5G System; Network Exposure Function Northbound APIs; Stage 3.
3GPP TS 23.501 version 15.3.0 Release 15; System Architecture for the 5G System
“Common Framework for 5G Northbound APIs”: https://www.etsi.org/deliver/etsi_ts/123200_123299/123222/15.03.00_60/ts_123222v150300p.pdf
“5G and B5G NEF exposure capabilities towards an Industrial IoT use case” : https://scholar.google.com/scholar?q=5G+and+B5G+NEF+exposure+capabilities+towards+an+Industrial+IoT+use+case&hl=en&as_sdt=0&as_vis=1&oi=scholart
“The-Ecosystem-for-Open-Gateway-NaaS-API-development”: https://www.gsma.com/solutions-and-impact/gsma-open-gateway/gsma_resources/naas-ecosystem-whitepaper/
https://camaraproject.org/; APIs enabling seamless access to Telco network capabilities

Part-2: Unleashing Network Potentials: Current State and Future Possibilities with AI/ML

Posted on May 13, 2024 by Vincent Rodriguez

By Vinay Tripathi with Ajay Lotan Thakur

Introduction

In the dynamic realm of networking, AI/ML has emerged as a transformative force that is reshaping the networking world by making it more secure, reliable, efficient, and optimized. In this blog we will dive into the characteristics, possibilities, use cases, and challenges of AI/ML in networking.

About AI and ML

Definitions of AI/ML

AI and ML are often used interchangeably, but there are some key differences between the two.

AI is the ability of machines to perform tasks that would normally require human intelligence, such as understanding natural language, recognizing objects, and making decisions.
ML is a subfield of AI that allows machines to learn from data and improve their performance over time.
Deep learning (DL) is a subfield of ML that uses neural networks to build complex models and extract greater insights.

Types of AI/ML

AI/ML encompass a wide range of techniques and algorithms that can be used to solve a variety of problems. In the context of networks, AI/ML technologies can be broadly categorized into the following types:

Key Points:

AI/ML taxonomy is continuously evolving due to industry growth and various methodologies and algorithms.
The choice of AI/ML algorithm significantly influences business outcomes, including training time, prediction accuracy, and resource usage.
The selection of algorithms depends on the type and volume of available data for a specific use case.

Popular ML Types:

Supervised/Unsupervised: When available data is simple or significant pre-processing has resulted in high data quality.
Neural Networks and Deep Learning: When you have substantial amounts of unstructured/structured data or unclear features, these may offer superior accuracy over classical ML methods.
AutoML: When you need to streamline machine learning model development, especially with limited expertise, time, or resources.
NLP: When tasks involve text or language data and require automation, understanding, or generation of natural language content.
Reinforcement learning: Suitable when you need to train agents to make sequential decisions in dynamic environments, optimizing for long-term rewards, and when there is a need for autonomous decision-making, such as in robotics, game playing, or autonomous systems.
Figure-1: Hierarchy of AI, ML and DL

Applications of AI/ML

AI and ML technologies provide a diverse array of applications in networks, encompassing security, engineering, capacity planning, and operations. These technologies have the capability to augment network security, optimize network design and performance, forecast traffic demand, and automate network tasks. This leads to enhanced efficiency, reliability, and overall network performance. Here are some specific examples:

Network Security

Intrusion Detection System (IDS): AI-powered IDS can detect and respond to cyberattacks in real-time, providing a more robust defense against threats.
Threat Detection and Prevention (TDP): AI can analyze network traffic to identify and prevent threats before they can cause damage.
Anomaly Detection: AI can detect deviations from normal network behavior, indicating potential security incidents.
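
As a minimal sketch of what detecting deviations from normal behavior can mean in practice, the example below learns a baseline from the first few traffic samples and flags later samples whose z-score exceeds a threshold. This is a deliberately simple statistical stand-in for the ML models discussed here; the sample values, the baseline size and the threshold of 3 are assumptions chosen for illustration.

```python
from statistics import mean, stdev


def find_anomalies(samples, baseline_size=5, z_threshold=3.0):
    """Flag samples that deviate strongly from a baseline window.

    The first `baseline_size` samples are treated as normal behavior. Each
    later sample is flagged if it lies more than `z_threshold` standard
    deviations away from the baseline mean.
    """
    baseline = samples[:baseline_size]
    if len(baseline) < 2:
        raise ValueError("need at least two baseline samples")
    mu, sigma = mean(baseline), stdev(baseline)
    anomalies = []
    for index, value in enumerate(samples[baseline_size:], start=baseline_size):
        if sigma > 0 and abs(value - mu) / sigma > z_threshold:
            anomalies.append((index, value))
    return anomalies


# Hypothetical packets-per-second readings with one obvious spike
packets_per_second = [100, 102, 98, 101, 99, 103, 97, 450, 100]
anomalies = find_anomalies(packets_per_second)
print(anomalies)
```

A fixed baseline is the main simplification: real traffic drifts over the day, so a production detector would typically use a rolling window or a learned seasonal model.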

Network Engineering

Quality of Service (QoS): AI can optimize network resources to ensure consistent and reliable performance for critical applications.
Routing and Traffic Management: AI can optimize routing decisions and manage traffic flow to avoid congestion and improve network performance.
Optimized Traffic Flow: AI can analyze traffic patterns and make real-time adjustments to optimize traffic flow, reducing latency and improving overall network performance.
Load Balancing: AI can distribute traffic across multiple servers or network links to balance the load and prevent bottlenecks.
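
The load-balancing idea can be illustrated with a simple greedy heuristic: place each flow on the link that would end up with the lowest utilization. This is a classical rule rather than an AI technique, and it is shown only as a baseline that a learned policy would try to improve on. The flow names, demands and link capacities are hypothetical.

```python
def assign_flows(flows, link_capacity):
    """Greedy load balancing across links.

    `flows` maps flow name -> demand and `link_capacity` maps link name ->
    capacity (same units). Larger flows are placed first, each on the link
    that would have the lowest utilization after the placement.
    """
    load = {link: 0.0 for link in link_capacity}
    placement = {}
    for flow, demand in sorted(flows.items(), key=lambda kv: kv[1], reverse=True):
        target = min(load, key=lambda l: (load[l] + demand) / link_capacity[l])
        load[target] += demand
        placement[flow] = target
    return placement, load


# Hypothetical flows (Mbps) and two links of different capacity
flows = {"video": 40, "backup": 30, "voip": 10, "web": 20}
link_capacity = {"link_1": 100, "link_2": 50}

placement, load = assign_flows(flows, link_capacity)
print(placement)
print(load)
```

With these numbers the links end up at 70% and 60% utilization, so neither becomes a bottleneck.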

Network Capacity Planning

Improved Capacity Forecasting: AI can analyze historical data and predict future traffic demand, enabling network operators to plan for future capacity needs.
Efficient Use of Resources: AI can identify and allocate network resources more efficiently, reducing costs and improving network performance.
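
A very small example of capacity forecasting from historical data is a least-squares trend line extrapolated to the point where demand reaches installed capacity. The sketch below assumes roughly linear growth, which real traffic rarely follows exactly; the monthly peak figures and the 100 Gbps capacity are invented for illustration.

```python
def fit_trend(values):
    """Least-squares line through (0, values[0]), (1, values[1]), ...

    Returns (slope, intercept).
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two data points")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def months_until_full(values, capacity):
    """Months after the last sample until the trend reaches `capacity`.

    Returns None when the trend is flat or declining.
    """
    slope, intercept = fit_trend(values)
    if slope <= 0:
        return None
    crossing = (capacity - intercept) / slope  # x position where the line hits capacity
    return max(0.0, crossing - (len(values) - 1))


# Hypothetical monthly peak traffic (Gbps) on a link with 100 Gbps capacity
peak_gbps = [40, 44, 48, 52, 56, 60]
print(fit_trend(peak_gbps))
print(months_until_full(peak_gbps, 100))
```

An ML-based forecaster would replace the straight line with a model that captures seasonality and step changes, but the planning question it answers is the same: how long until this resource runs out.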

Network Maintenance, Troubleshooting, Operations and Monitoring

Real-time Monitoring: AI can continuously monitor network performance and identify potential issues before they cause outages or disruptions.
Quicker Resolutions of Vendor/Hardware Issues: AI can diagnose and resolve vendor and hardware issues more quickly, minimizing downtime.
Faster Root Cause Analysis: AI can analyze large amounts of data to identify the root cause of network issues, enabling faster resolution.
Quick Mitigations of Network Issues: AI can automatically implement mitigations for network issues, reducing the impact on users and applications.

AI/ML Based Network in Action

The seamless integration of AI/ML components at various levels of the network (edge, core, management, etc.) enhances its reliability, efficiency, and security by optimizing performance and safeguarding against vulnerabilities.

The diagram illustrates a practical application of AI/ML within one of the extensive networks.

Figure-2: AI/ML in action in a cloud network

Trends in AI/ML

AI/ML are revolutionizing the field of networks. These technologies are being used to improve the performance, security, and reliability of networks.

Here are some of the key trends in AI/ML for networks:

Simplify and scale data operations.
AI/ML can be used to automate and simplify many of the tasks involved in managing and analyzing network data. This can free up network administrators to focus on more strategic tasks.
Increase accuracy of forecasts.
AI/ML can be used to predict network traffic patterns, identify potential problems, and plan for future capacity needs. This can help organizations to avoid costly downtime and improve the quality of service for their users.
Decrease time to market.
AI/ML can be used to automate the process of designing, deploying, and managing new network services. This can help organizations to bring new products and services to market faster.
Enable insights on otherwise unusable data.
AI/ML can be used to extract insights from network data that would otherwise be too complex or voluminous to analyze manually. This can help organizations to identify security threats, optimize network performance, and improve customer experience.

Figure-3: Trends in ML

AI/ML Use Cases

The introduction of AI/ML use cases in network functions has revolutionized the field of networking. AI/ML technologies are being leveraged to enhance network security, optimize network design and performance, anticipate traffic demand, and automate network tasks. This integration leads to improved efficiency, reliability, and overall network performance.

The following figures show examples of popular AI/ML use cases in large networks.

Figure-4: AI/ML Use Case: Hardware Failure Prediction

Figure-5: AI/ML Use Case: Network Demand Forecasting

ML vs Non-ML Networks

The comparison of ML-based and non-ML-based networks provides valuable insights into the advantages and limitations of each approach. By examining the key aspects such as scalability, flexibility, accuracy, and security, organizations can make informed decisions about the most suitable solution for their specific networking needs. This comparison can guide network engineers, architects, and decision-makers in selecting the optimal approach to meet their performance, efficiency, and security requirements.

A comparison between ML-based and non-ML-based solutions is provided in the following table:

Figure-6: Comparison of ML and non-ML solutions

Reasons Not to Use AI/ML

While AI/ML technologies offer significant benefits for networks, there are certain scenarios where their application may not be suitable or feasible. Several factors, such as data availability, use case definition, cost considerations, the need for customized models, and the effectiveness of existing automation, can influence the decision to refrain from using AI/ML in networks. Understanding the limitations and potential drawbacks of AI/ML is crucial for organizations to make informed choices about the most appropriate approach for their specific networking needs.

Not enough data sets to train the model:
- AI/ML models require large amounts of high-quality data to train effectively. In the context of networks, it may be challenging to collect and prepare sufficient data. Factors such as network size, traffic patterns, and security considerations can make data collection a complex and time-consuming process.
- The lack of adequate data can lead to models that are not well-generalized and may not perform well in real-world scenarios.
Use case is not defined well:
- AI/ML models are designed to solve specific problems or achieve specific goals. If the use case for AI/ML in networks is not clearly defined, it can be difficult to develop a model that effectively addresses the desired outcomes.
- A poorly defined use case can lead to misalignment between the model’s capabilities and the actual requirements of the network.
High cost is a problem:
- Implementing AI/ML solutions in networks can be expensive. Factors such as hardware requirements, software licenses, and the cost of hiring skilled professionals contribute to the overall cost.
- Organizations need to carefully evaluate the cost-benefit analysis before investing in AI/ML for their networks. In some cases, the cost of deploying and maintaining an AI/ML solution may outweigh the potential benefits.
Customized AI/ML model is required:
- Off-the-shelf AI/ML solutions may not always be suitable for specific network scenarios. Organizations may require customized models that are tailored to their unique requirements.
- Developing customized AI/ML models requires specialized expertise and resources, which can further increase the cost and complexity of the project.
Existing automation is already serving the requirement:
- Many networks already have existing automation solutions in place, such as network management systems (NMS) and configuration management tools. These solutions provide a range of automation capabilities that may already be sufficient for the organization’s needs.
- Implementing AI/ML in such scenarios may not offer significant additional benefits or may require a substantial investment to achieve incremental improvements.

AI/ML Challenges in Networks

AI/ML in networks has benefits but also challenges. Complexity arises from numerous interconnected components and interactions, which AI/ML further complicates. Data limitations and algorithmic bias are additional concerns. Regulatory compliance adds another layer of complexity. Some of the challenges are described in detail below:

Complexity

As networks become increasingly complex, it can be difficult to troubleshoot issues that arise. This is due to the large number of interconnected components and the complex interactions between them.
For example, a problem with a single router can have a cascading effect on the entire network, making it difficult to identify the root cause of the issue.
Additionally, the use of AI and ML in networks can further increase complexity by introducing new layers of abstraction and decision-making.

Data Requirements

AI and ML algorithms require large amounts of data to train and operate effectively. This can be a challenge for networks, as they may not have access to sufficient data to train their models.
For example, a network security system may not have enough data on recent attacks to train a model to detect and prevent future attacks.
Additionally, the data that is available may be biased or incomplete, which can lead to inaccurate or unfair models.

Algorithmic Bias

AI and ML algorithms can be biased, which can lead to unfair or discriminatory outcomes. This is because the algorithms are trained on data that may contain biases, such as racial or gender bias.
For example, a facial recognition system may be biased towards certain ethnicities, leading to false identifications or denials of service.
It is important to address algorithmic bias in networks to ensure that AI and ML are used in a fair and responsible manner.

Regulatory Compliance

Networks are subject to a variety of regulatory compliance requirements, such as the General Data Protection Regulation (GDPR) and the Health Insurance Portability and Accountability Act (HIPAA).
These regulations impose strict requirements on how data is collected, stored, and used.
AI and ML can add additional complexity to compliance, as they can introduce new data processing and decision-making processes.
Organizations need to carefully consider the regulatory implications of using AI and ML in networks to ensure that they are compliant with all applicable regulations.

Ethical Concerns

The use of AI and ML in networks raises several ethical concerns, such as the misuse of data and job replacement.
For example, AI-powered surveillance systems could be used to track and monitor people without their consent, raising concerns about privacy and civil liberties.
Additionally, AI and ML could lead to job automation, which could displace workers and have a negative impact on the economy.
It is important to consider the ethical implications of using AI and ML in networks to ensure that they are used in a responsible and ethical manner.

Networks: AI/ML Benefits

In today’s digital world, networks are becoming increasingly complex and interconnected. To manage and operate these networks effectively, organizations are turning to AI/ML. AI/ML can automate repetitive tasks, identify and mitigate network threats, and optimize network performance. AI/ML can also help organizations gain more insights from their network data, which can lead to better decision-making and improved business outcomes. Some of the top benefits are described below:

Lower Cost:

Automated tasks: AI/ML can automate repetitive and time-consuming network tasks, such as configuration, monitoring, and troubleshooting. This can free up staff to focus on more strategic initiatives.
Efficient customer support: AI/ML-powered chatbots and virtual assistants can provide 24/7 customer support, answering common questions and resolving simple issues. This can reduce the need for human customer support representatives, saving costs.
Improved performance: AI/ML can be used to optimize network performance by identifying and resolving bottlenecks and inefficiencies. This can lead to reduced latency, improved throughput, and better overall network performance while minimizing the network operation cost.

Reduced Network Risk:

Resilient network: AI/ML can be used to create more resilient networks that are better able to withstand outages and attacks. This can be done by predicting and preventing network failures, and by quickly identifying and resolving issues.
Identify and mitigate threats: AI/ML can be used to detect and mitigate network threats, such as malware, DDoS attacks, and phishing attempts. This can help to protect sensitive data and systems from being compromised.
Accurate network trends and forecast: AI/ML can be used to analyze network data to identify trends and forecast future needs. This information can be used to make informed decisions about network planning and investment.
Network outage prediction: AI/ML can be used to predict network outages before they occur. This can help to prevent downtime and lost productivity.

More Revenue:

Enhanced network and capacity planning: AI/ML can be used to optimize network and capacity planning, ensuring that the network has the resources it needs to meet current and future demands. This can help to avoid costly over-provisioning or under-provisioning of network resources.
Faster time to market: AI/ML can help to accelerate time to market for new network services and applications. This can be done by automating the testing and deployment process, and by identifying and resolving potential issues early on.
Better customer experience: AI/ML can be used to improve the customer experience by providing personalized and proactive support. This can lead to increased customer satisfaction and loyalty.

Networks: AI/ML Innovation Catalysts

The convergence of AI/ML with networks is revolutionizing various industries. Here are some key factors driving this transformation:

Increase in Data/Compute and Storage:
- The proliferation of IoT devices has led to an exponential growth in data generation, fueling AI/ML innovation.
- High-performance computing (HPC) clusters and cloud platforms provide the necessary compute and storage resources for complex AI/ML models.
Edge Computing:
- Edge computing brings AI/ML capabilities closer to data sources, enabling real-time decision-making.
- Edge devices, such as sensors and gateways, collect and process data locally, reducing latency and bandwidth requirements.
Cloud Infrastructure:
- Cloud platforms offer scalable and elastic infrastructure for deploying and managing AI/ML workloads.
- Cloud-based AI/ML services provide pre-built tools and frameworks for developers, accelerating the development and deployment of AI/ML applications.
Increase in Devices Running AI:
- Smartphones, smart home devices, and autonomous vehicles are increasingly equipped with AI capabilities.
- These devices generate vast amounts of data and use AI to perform tasks such as image recognition, natural language processing, and predictive analytics.
Pre-trained Models:
- Pre-trained models, such as open-source BERT and ResNet, provide a starting point for developing custom AI models.
- These models have been trained on large datasets and can be fine-tuned for specific tasks, reducing the time and resources required for model development.
Human and AI Cooperation:
- AI/ML is augmenting human capabilities, enabling collaboration between humans and machines.
- Human-AI teams can leverage their respective strengths to solve complex problems and make better decisions.

Conclusion

AI and ML are revolutionizing the field of networking, bringing efficiency, automation, and significant performance improvements. As networks continue to grow in size and complexity, traditional management methods are becoming increasingly ineffective. AI and ML offer a powerful solution by enabling networks to self-configure, self-optimize, and self-heal, leading to a more agile, resilient, and cost-effective network infrastructure. The use of AI and ML in networks is still in its early stages, but it has the potential to transform the way networks are designed, built, and operated. As AI and ML technologies continue to evolve, we can expect to see even more innovative applications that will further unleash the potential of networks.

References

This blog post was written with the assistance of Google’s Gemini. The AI was used to generate the initial draft and for rephrasing and brainstorming, which I then refined, edited, and expanded upon.

Part1: Unleashing Network Potentials: Current State and Future Possibilities with AI/ML

Posted on May 13, 2024 by Vincent Rodriguez

By Vinay Tripathi with Ajay Lotan Thakur

Introduction

We live in an era of rapid digitization and ubiquitous connectivity where networks touch every aspect of our lives. From the global telecommunication infrastructure enabling seamless voice and data communication to the diverse social media platforms facilitating instant global interactions, the way we collaborate, communicate, and access information is heavily dependent on the seamless operation of networks. However, as networks continue to evolve, expanding in size and complexity, managing, provisioning, and optimizing them efficiently poses significant challenges.

Artificial Intelligence (AI) and Machine Learning (ML) offer a transformative solution: they simplify network provisioning, streamline operations, enhance network performance, and unlock valuable insights from the vast amounts of network data. AI and ML empower network administrators, architects, planners, engineers, and managers with a range of capabilities that significantly improve network efficiency and effectiveness.

Types of Networks

Networks can be classified into various types based on their purpose, size, and geographical coverage. Some common types of networks include:

1. Core Networks:

Form the backbone of the internet, connecting large geographical regions and major network providers.
Characterized by high-speed data transmission, typically fiber optic cables, and redundant paths for reliability.
Responsible for routing traffic between different parts of the internet and carrying large amounts of data.

2. Data Center Networks:

Designed to support the infrastructure of data centers, where large amounts of data are processed and stored.
Highly interconnected and optimized for low latency and high bandwidth to facilitate efficient communication between servers and storage systems.
Often utilize specialized networking technologies such as Ethernet and InfiniBand.

3. Enterprise Networks:

Connect devices and resources within an organization or company.
Include local area networks (LANs) for devices within a building or campus and wide area networks (WANs) for connecting geographically dispersed sites.
Provide secure and reliable connectivity for employees, customers, and partners.

4. Cellular Networks:

Provide wireless connectivity for mobile devices such as smartphones, tablets, and IoT devices.
Consist of cellular towers or base stations that communicate with mobile devices using radio waves.
Offer various cellular technologies such as 2G, 3G, 4G, and 5G, each providing different levels of speed and capacity.

Here’s an example that demonstrates various types of networks:

Figure-1: Various types of Networks

Figure-2: Google Network Infrastructure

Network Functions

Networks are designed to achieve specific goals. For example, edge networks can have very different routing and switching requirements from core networks. However, some functions are common to all networks.

Engineering: Deals with the design, optimization, provisioning, and development of the network infrastructure and services. Engineering teams ensure the network operates efficiently and reliably and meets all performance and scale requirements.
Capacity Planning & Forecasting: Estimates future demand for network resources such as routers, switches, servers, storage, and bandwidth. It helps in network planning and scaling by analyzing historical consumption and projected demand.
Implementation: Physically deploys network components such as routers, switches, and servers. It integrates these systems with the rest of the network and its services, based on the designs and plans developed by the engineering team.
Monitoring: Provides vital insight into the current state of the network infrastructure. Data collected from the systems can be used by other network functions to improve network performance, reliability, and security.
Operation: Focuses on the day-to-day management, maintenance, and support of network infrastructure and services. It ensures the network operates smoothly and efficiently, with the fewest possible disruptions.
Security: Maintains the confidentiality and integrity of information and systems. It uses firewalls, intrusion detection systems, and access control lists to keep the network secure.
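To make the capacity planning function above more concrete, here is a minimal sketch that fits a straight-line trend to historical peak utilization and extrapolates it. The function names, the sample figures, and the assumption of linear growth are all illustrative; real forecasting would usually also account for seasonality and step changes in demand.

```python
from statistics import mean


def fit_linear_trend(history):
    """Least-squares fit of y = slope * x + intercept, where x is the sample index."""
    n = len(history)
    if n < 2:
        raise ValueError("need at least two samples to fit a trend")
    xs = list(range(n))
    x_bar, y_bar = mean(xs), mean(history)
    numerator = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, history))
    denominator = sum((x - x_bar) ** 2 for x in xs)
    slope = numerator / denominator
    intercept = y_bar - slope * x_bar
    return slope, intercept


def forecast(history, periods_ahead):
    """Projected value `periods_ahead` periods after the last sample."""
    slope, intercept = fit_linear_trend(history)
    return slope * (len(history) - 1 + periods_ahead) + intercept


def periods_until_exhausted(history, capacity):
    """Periods after the last sample until the trend reaches `capacity`.

    Returns None when the trend is flat or falling (capacity is never reached).
    """
    slope, intercept = fit_linear_trend(history)
    if slope <= 0:
        return None
    crossing_index = (capacity - intercept) / slope
    return max(0.0, crossing_index - (len(history) - 1))
```

For example, with hypothetical monthly peaks of 40, 50, 60, and 70 Gbps on a 100 Gbps link, `periods_until_exhausted([40, 50, 60, 70], 100)` gives the number of months of headroom left under the linear assumption.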

Network Without AI/ML

Many large-scale network outages result from manual human errors or automated system malfunctions. Avoiding such issues is difficult when humans are involved in daily operational decision-making. Many network functions have been automated in recent years, but they still rely on predefined values or actions that require continuous system or service updates. Additionally, many networks or functions remain unautomated due to a lack of expertise, resources, or willingness. Even in automated networks, operators must perform manual operations in certain situations, such as tooling infrastructure failures or recoveries. Some scenarios where automated and/or manual operations are performed in a network include:

Manual/automated security provisions:
- Manual security provisions involve tasks such as manually configuring firewalls, intrusion detection systems, and other security devices.
- Automated security provisions involve using software tools to automate security tasks, such as vulnerability scanning, patch management, and threat detection.
Manual/automated configuration of network devices (switches, routers, etc.):
- Manual configuration involves manually configuring network devices, such as switches and routers, using command-line interfaces or web-based interfaces.
- Automated configuration involves using software tools to automate the configuration of network devices, which can save time and reduce errors.
Manual/automated monitoring dashboard with predefined values:
- Manual monitoring involves manually monitoring network performance and security metrics using technologies such as Telemetry, SNMP, and syslog.
- Automated monitoring involves using software tools to automate the monitoring of network metrics and generate alerts when predefined thresholds are exceeded.
Manual/automated troubleshooting of network issues:
- Manual troubleshooting involves manually diagnosing and resolving network issues, such as connectivity problems, performance issues, and security breaches.
- Automated troubleshooting involves using software tools to automate the diagnosis and resolution of network issues, which can reduce the time it takes to resolve problems.
Manual/automated mitigation of network events:
- Manual mitigation involves manually responding to network events, such as security breaches, denial-of-service attacks, and natural disasters.
- Automated mitigation involves using software tools to automate the response to network events, which can help to minimize the impact of these events.
Manual/automated capacity planning process:
- Manual capacity planning involves manually forecasting network traffic demand and planning for future capacity needs.
- Automated capacity planning involves using software tools to automate the forecasting of network traffic demand and the planning of future capacity needs, which can help to ensure that the network has sufficient capacity to meet future demand. Automated solutions can save time, reduce errors, and improve efficiency.
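The "predefined values" that conventional automation depends on can be illustrated with a small threshold-based alerting sketch. The metric names, limits, and device name below are hypothetical; the point is that every limit is fixed in advance and has to be maintained by hand as the network changes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    metric: str
    limit: float
    severity: str


# Hypothetical predefined values that an operator would have to keep up to date.
THRESHOLDS = [
    Threshold("link_utilization_pct", 80.0, "warning"),
    Threshold("link_utilization_pct", 95.0, "critical"),
    Threshold("cpu_pct", 90.0, "critical"),
]


def evaluate(device, sample, thresholds=THRESHOLDS):
    """Return one alert string for every threshold that the sample exceeds."""
    alerts = []
    for t in thresholds:
        value = sample.get(t.metric)
        if value is not None and value > t.limit:
            alerts.append(
                f"{t.severity.upper()}: {device} {t.metric}={value} exceeds {t.limit}"
            )
    return alerts
```

A static rule set like this cannot tell a normal busy-hour peak from a genuine anomaly, which is one of the gaps the AI/ML approaches discussed in this series aim to close.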

NextGen Network Requirements

Next-generation networks must meet diverse use cases and deliver exceptional customer experiences. Network applications and use cases constantly evolve, necessitating adjustments in network design, technologies, and operations. Continuous optimization is needed to unleash the network’s full potential. For example, existing data center networks require redesign and optimization to meet the demands of AI/ML applications. Critical requirements that must be fulfilled by next-generation networks are as follows:

Increased performance, reliability, and security: Networks must handle massive data volumes and complex workloads with high performance and low latency. Reliability and security are paramount, ensuring uninterrupted operations and safeguarding sensitive information.
Customer-centric focus: Delivering a seamless and delightful customer experience is crucial. Networks must facilitate seamless coordination across business functions, enabling personalized services and addressing customer needs effectively.
Managing massive complexity: The convergence of 5G, Internet of Things (IoT), AI/ML workloads, and edge computing introduces unprecedented complexity. Networks need to be equipped with advanced orchestration and management capabilities to handle this complexity efficiently.
Value beyond connectivity: Networks should not be limited to providing mere connectivity. They must deliver value-added services and capabilities such as real-time analytics, edge computing, and network slicing to meet diverse customer requirements.
Improved service assurance and issue prediction: Networks must proactively monitor and analyze network performance to predict potential issues before they impact customers. Fault detection and self-healing mechanisms are essential to ensure uninterrupted service availability.
Measuring and optimizing customer experience: Networks should have built-in capabilities to measure and analyze customer experience metrics such as latency, packet loss, and jitter. This data can be leveraged to optimize network performance and address areas that need improvement.
Understanding customer expectations: Networks must provide insights into customer expectations and evolving needs. This can be achieved through surveys, feedback mechanisms, and real-time monitoring of customer interactions.
Increased efficiency and intelligence: Networks should incorporate AI and ML technologies to automate tasks, optimize resource allocation, and enhance overall network efficiency and intelligence.
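The customer experience metrics named above (latency, packet loss, and jitter) can be derived from simple probe results. The sketch below is one possible simplification: it treats jitter as the mean absolute difference between consecutive round-trip times, which is not the smoothed estimator used by RTP, and the function name and input format are assumptions.

```python
from statistics import mean


def summarize_probes(rtts_ms):
    """Summarize probe results.

    rtts_ms: list of round-trip times in milliseconds, with None for a lost probe.
    """
    if not rtts_ms:
        raise ValueError("no probe results supplied")
    received = [r for r in rtts_ms if r is not None]
    loss_pct = 100.0 * (len(rtts_ms) - len(received)) / len(rtts_ms)
    latency_ms = mean(received) if received else None
    # Simplified jitter: average change between consecutive received probes.
    deltas = [abs(b - a) for a, b in zip(received, received[1:])]
    jitter_ms = mean(deltas) if deltas else None
    return {"latency_ms": latency_ms, "jitter_ms": jitter_ms, "loss_pct": loss_pct}
```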

Conclusions

Future networks need AI/ML integration to fulfill the next generation of requirements. AI/ML can make networks more efficient, secure, reliable, and scalable. It can monitor the network and alert operators effectively, utilize resources efficiently, make the network customer-centric, and speed up the delivery of services. In the next blog, we will discuss AI/ML use cases, benefits, limitations, and projections.

References

**** This blog post was written with the assistance of Google’s Gemini. The AI was used to generate the initial draft and for rephrasing and brainstorming, which I then refined, edited, and expanded upon.

Building and Operating a Cloud Native 5G SA Core Network

Posted on February 9, 2024 by Vincent Rodriguez

By Ajay Thakur with Alan J Weissberger

Abstract:

In this article, we endeavor to clarify some of the critical issues and questions related to implementing a cloud native 5G SA core network and how it differs from the traditional core network composed of hardware devices and software solutions. It’s important to note that NONE of the 3GPP defined 5G features can be realized without a 5G SA core. Those include: Network Automation, Network Function Virtualization, 5G Security, Network Slicing, Multi-Access Edge Computing (MEC), Policy Control, Network Data Analytics, etc.

Communication Service Providers (CSPs) will need to do things differently (than 4G) in order to implement and use a 5G cloud native SA core. Various cloud native 5G SA core aspects include network planning, deployment, software upgrades, network monitoring, hardware and platform upgrades.

These will be examined and contrasted with traditional implementations, such as the 4G Evolved Packet Core (EPC).

Introduction:

3GPP introduced the 5G SA core network architecture in Release 15. Since then, numerous new features (work items) have been added to the specifications. 3GPP’s 5G SA specifications use virtualization and cloud native principles as their foundation. A few key 3GPP Technical Specifications (TSs) for the 5G system are the following:

TS 22.261, “Service requirements for the 5G system”
TS 23.501, “System architecture for the 5G System (5GS)”
TS 23.502, “Procedures for the 5G System (5GS)”
TS 32.240, “Charging management; Charging architecture and principles”
TS 24.501, “Non-Access-Stratum (NAS) protocol for 5G System (5GS); Stage 3”
TS 38.300, “NR; NR and NG-RAN Overall description; Stage-2”
TS 23.527, “5G System; Restoration procedures Stage-2”

5G SA aims to resolve the challenges that network operators faced in EPC deployments, and this article looks at how the new design mitigates them.

Two important changes in the 5G SA core are support for a Service Based Architecture (SBA) and a Cloud Native implementation of the core. These will enable new 5G features and functions such as Network Slicing, 5G Security, and MEC, among others.

To reap the benefits of Cloud Native SA, CSPs are required to adapt to new cloud native principles of network deployment, operation, and monitoring. We shall examine various aspects of Life Cycle Management of 5G SA software and also ask some open-ended questions on each aspect.

SBA architecture Diagram with multiple Interfaces:

The network diagram above shows the SBI interfaces in the roaming case. Each individual NF may be composed of one or more microservices, and each NF may come from a different vendor. Since these NFs are delivered in containerized format, they may or may not share the same container orchestration platform.

Network Dimensioning:

5G SA solves the network expansion problem, since the network can be scaled up or down by adding or removing commercial off-the-shelf (COTS) hardware.
Once all the NF software releases are available, operators can gather the compute, memory, and network requirements for deployment.
Operators would rely on the auto-scaling functionality provided by the 5G NF vendors to avoid over-provisioning at the start. This is much simpler than adding dedicated hardware for each NF, as was the case with EPC.
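The dimensioning step described above can be sketched as a simple calculation: total the compute and memory requirements gathered for each NF and work out how many COTS servers are needed to host them. The class and function names, the replica counts, and the server sizes are all hypothetical, and the sketch ignores network requirements and per-server placement constraints.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class NFRequirement:
    name: str
    replicas: int
    vcpu_per_replica: float
    mem_gib_per_replica: float


def servers_needed(nfs, server_vcpu, server_mem_gib, headroom=0.25):
    """Servers required so that total demand fits within (1 - headroom) of
    each server's capacity, for both CPU and memory."""
    if not 0 <= headroom < 1:
        raise ValueError("headroom must be in the range [0, 1)")
    total_vcpu = sum(nf.replicas * nf.vcpu_per_replica for nf in nfs)
    total_mem = sum(nf.replicas * nf.mem_gib_per_replica for nf in nfs)
    usable_vcpu = server_vcpu * (1 - headroom)
    usable_mem = server_mem_gib * (1 - headroom)
    # Whichever resource runs out first determines the server count.
    return max(math.ceil(total_vcpu / usable_vcpu), math.ceil(total_mem / usable_mem))
```

Starting from a small initial replica count and letting vendor auto-scaling add replicas later keeps the first hardware order modest, which is the over-provisioning point made above.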

Deployment Options Selection/Planning

Operators need to decide on the type of deployment for the network, such as deploying 5G SA on a public cloud or on a private cloud.
Along with the public vs. private cloud decision, the operator also needs to decide on a container orchestration engine. Container orchestration can be a managed service or can be managed by the operator. A popular container orchestration engine is Kubernetes (K8s).
The next step is to select a CI/CD tool that works for all the vendors and integrate it with the container orchestration platform.
The placement of the NFs also needs to be decided; for example, the User Plane Function (UPF) could be on-prem, close to the RAN. The operator may also place some of the virtualized RAN components in the cloud.
These are all important decisions to be made before starting the real deployment of the software.
Some of these aspects arise from Cloud Native 5G SA and were not applicable to EPC.

CI/CD:

The CI/CD landscape from CNCF (https://landscape.cncf.io/) is shown in the figure below.

The CNCF landscape shows that many projects are available for CI/CD.
Irrespective of public or private cloud, operators need to follow cloud native CI/CD principles when deploying the core network in a cloud environment. CI/CD involves taking a software release from the vendor, running it against the existing network functions, and carrying out a minimum set of tests. Once the operator is satisfied with its evaluation of the release, the release is rolled out in the network.
Operators can decide to have a separate staging environment where new releases can soak for a certain period of time while a few subscribers use the new release.
CI/CD gives the option of rolling back the release if the operator is not happy with the performance of the new release.
The challenge here is whether the operator will use a single set of CI/CD tools to deploy all the vendors’ solutions, or use its own CI/CD tooling and integrate the vendors’ NFs into its own environment. This is a decision the operator needs to make.
Having an automated CI/CD infrastructure that takes software releases from the vendors and passes them through multiple environments, all the way to production, is key.
Without an appropriate CI/CD environment, it would be very difficult to manage all the microservices and their deployments.
Public cloud may come with built-in CI/CD solutions, which would be easy for operators to start with.
AWS offers multiple services around CI/CD and those are listed here – https://docs.aws.amazon.com/whitepapers/latest/cicd_for_5g_networks_on_aws/cicd-on-aws.html
Azure offering can be found here https://learn.microsoft.com/en-us/azure/devops/pipelines/apps/cd/azure/cicd-data-overview?view=azure-devops
Google Cloud CI/CD services can be found here – https://cloud.google.com/solutions/continuous-integration
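The flow described in this section (take a vendor release, pass it through successive environments, and roll back when the operator is not satisfied) can be sketched in a tool-agnostic way. Everything below is a hypothetical illustration rather than the API of any CI/CD product: `promote_release` and `InMemoryCluster` are made-up names, and the check function stands in for whatever tests the operator runs.

```python
class InMemoryCluster:
    """Stand-in for real deployment targets; tracks the release running in each environment."""

    def __init__(self, current):
        self.current = dict(current)
        self.previous = {}

    def deploy(self, env, release):
        # Remember what was running so the environment can be restored later.
        self.previous[env] = self.current.get(env)
        self.current[env] = release

    def rollback(self, env):
        self.current[env] = self.previous[env]


def promote_release(release, environments, run_checks, deploy, rollback):
    """Deploy `release` through `environments` in order.

    Stops at the first environment whose checks fail, rolls that environment
    back, and leaves later environments untouched. Returns the environments
    in which the new release remains deployed.
    """
    promoted = []
    for env in environments:
        deploy(env, release)
        if not run_checks(env, release):
            rollback(env)
            break
        promoted.append(env)
    return promoted
```

In this sketch a failure in staging leaves production on the old release, which is the main reason for soaking a release in a separate environment before it reaches all subscribers.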

Software Upgrade:

Cloud Native 5G SA allows the operator to upgrade individual components easily, instead of updating the complete 5G Core in one go. The CI/CD framework helps carry out software upgrades with minimal human intervention.
Note that with cloud native principles, the operator will receive multiple patch and minor releases, and maybe some major releases, throughout the life cycle of the software. So the traditional approach of taking down complete hardware and upgrading it with new software is not required for microservice-based solutions. However, this only works as long as all microservices are truly built stateless and support live upgrades.
The operator needs to know the impact of each upgrade package and be prepared to roll back in case something goes wrong during the upgrade cycle.
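A component-by-component upgrade with a prepared rollback path might look like the following sketch. The microservice names, version strings, and the `is_healthy` callback are hypothetical, and restoring every already-upgraded component on the first failure is only one possible policy; an operator might instead choose to keep the components that upgraded cleanly.

```python
def upgrade_components(running, target, is_healthy):
    """Upgrade microservices one at a time, restoring old versions on failure.

    running:    dict of component name -> version currently deployed (updated in place)
    target:     dict of component name -> version to upgrade to
    is_healthy: callable(name, version) -> bool, run after each component upgrade
    Returns (success, list of upgraded component names).
    """
    done = []  # (name, old_version) pairs, in the order they were upgraded
    for name, new_version in target.items():
        old_version = running[name]
        if old_version == new_version:
            continue  # already at the target version
        running[name] = new_version
        if not is_healthy(name, new_version):
            # Undo this component, then everything upgraded before it, newest first.
            running[name] = old_version
            for prev_name, prev_version in reversed(done):
                running[prev_name] = prev_version
            return False, []
        done.append((name, old_version))
    return True, [name for name, _ in done]
```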

Network Monitoring:

Traditionally, operators developed their own network monitoring solutions to monitor the health of the EPC, since there was no standard mechanism for getting metrics and statistics from the NFs.
Because 5G SA follows cloud native principles, it is easy to get logs, statistics, alerts, and alarms from all microservices in a consistent manner. There is common tooling used by most cloud native applications, and CNCF has multiple projects in these categories.
CNCF-supported monitoring projects are shown in the figure below.

Tracing is an important tool for finding performance bottlenecks.

Logging has been a traditional approach for debugging network issues. Below are the projects offered by CNCF in the logging area.

Public cloud providers can extend the monitoring easily by generating text or email alerts as per the CSP’s needs.
Operators can define their own policies for retaining network performance monitoring output for a long time, and can back it up easily using the public cloud providers’ data backup services.
In the case of EPC these mechanisms were product dependent.
The challenge in the case of 5G SA is that each NF vendor may end up using a different tool, and operators would find it difficult to converge all NF vendors onto common tooling.
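One way to picture the convergence problem is a thin normalization layer that maps each vendor's metric names and units onto a common schema. The vendor names, field names, and conversion factors below are entirely hypothetical and only illustrate the idea.

```python
# Hypothetical mappings: vendor field name -> (common metric name, unit conversion factor)
VENDOR_MAPPINGS = {
    "vendor_a": {
        "cpuUtil": ("cpu_pct", 1.0),
        "memUsedMB": ("memory_bytes", 1024 * 1024),
    },
    "vendor_b": {
        "cpu_fraction": ("cpu_pct", 100.0),
        "mem_bytes": ("memory_bytes", 1.0),
    },
}


def normalize(vendor, nf, raw):
    """Convert one vendor-specific metrics record into the common schema.

    Fields without a mapping are dropped rather than guessed at.
    """
    mapping = VENDOR_MAPPINGS[vendor]
    metrics = {}
    for field, value in raw.items():
        if field in mapping:
            common_name, factor = mapping[field]
            metrics[common_name] = value * factor
    return {"nf": nf, "vendor": vendor, "metrics": metrics}
```

Each new vendor or NF release can add or rename fields, so a mapping table like this has to be maintained over time, which is part of the cost of not having common tooling.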

Hardware & platform upgrade:

In the traditional approach, providing hardware with updated operating systems and platforms was the responsibility of the equipment vendor. This responsibility now falls to either the operator or the public cloud provider, depending on whether the 5G SA core is deployed on-prem or on a public cloud. Operators need to plan these upgrades carefully to avoid downtime, and should follow rolling upgrade patterns to avoid updating multiple entities at the same time.
If managed container orchestration is used, then these upgrades are seamlessly handled by Public Cloud Providers.
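The rolling upgrade pattern mentioned above can be sketched as upgrading nodes in small batches and halting as soon as a batch does not come back healthy. The function names and the `upgrade_node` and `node_ready` callbacks are assumptions made for illustration; a managed orchestration service would normally perform the equivalent steps itself.

```python
def rolling_batches(nodes, max_unavailable=1):
    """Yield successive groups of at most `max_unavailable` nodes."""
    if max_unavailable < 1:
        raise ValueError("max_unavailable must be at least 1")
    for start in range(0, len(nodes), max_unavailable):
        yield nodes[start:start + max_unavailable]


def rolling_upgrade(nodes, upgrade_node, node_ready, max_unavailable=1):
    """Upgrade nodes batch by batch.

    Halts at the first batch in which any node fails its readiness check,
    so the remaining nodes keep serving on the old platform version.
    Returns the nodes that were upgraded and confirmed ready.
    """
    upgraded = []
    for batch in rolling_batches(nodes, max_unavailable):
        for node in batch:
            upgrade_node(node)
        if not all(node_ready(node) for node in batch):
            return upgraded
        upgraded.extend(batch)
    return upgraded
```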

Vendor Lock In:

In the case of EPC, vendor lock-in was specific to NF and radio vendors. The 3GPP EPC specification allows operators to swap an NF from one vendor for one from another, as long as the NF supports the required services. Note that this is a less costly replacement than swapping vendors when the NF had associated hardware.
But with cloud native SA, there is a chance that the operator may end up building its tooling (CI/CD, monitoring, etc.) over time, and this may lead to cloud provider lock-in.

Conclusions:

The 5G SA core network provides a lot of flexibility and automation via cloud native deployments. However, the 3GPP 5G core specs contain a lot of implementation options, which network operators and their vendors must select to properly deploy a 5G core network. Making those decisions will likely require solid experience with operating applications on a cloud native platform. And that may be a reason that 5G SA core network rollouts have been so slow.

In the U.S., AT&T and Verizon have taken a very cautious approach to deploying their long-promised 5G SA core networks. During the Brooklyn 6G Summit in November 2023, Chris Sambar, EVP for technology at AT&T, said, “I would say we are not moving as quickly as some of the other operators on the 5G standalone core, but we see the use cases that are coming, we understand when they’re coming, so we’re being very purposeful about getting there when we need to get there.” That’s despite AT&T outsourcing its 5G SA core network deployment to Microsoft Azure in June 2021. Yet Microsoft is the world’s second biggest cloud services provider with tons of experience running cloud native applications.

Summing up, Dave Bolan, Research Director at Dell’Oro Group wrote, “The buildout of 5G SA networks is going slower than anticipated which is restraining growth in the marketplace. To date, we count fifty 5G SA eMBB (enhanced Mobile BroadBand) networks that have been commercially deployed worldwide by Mobile Network Operators (MNOs). We counted 18 new 5G SA networks in 2022, but only 12 were launched in 2023. On a positive note, we believe a lot of work has been done in the background, preparing for 5G SA launches by Mobile Network Operators (MNOs) and we expect 2024 to have more launches than 2022.”

Ajay Lotan Thakur, Cloud Software Architect at Intel and IEEE Techblog Editorial Team member

References: