
SD-Fabric: An End-to-End Programmable Data Plane – A Year in Review (Part 1)

Dec 9, 2021
Carmelo Cascone and Brian O'Connor

This is part one of a two-part blog series about SD-Fabric, one of ONF’s newest projects in software-defined networking. This first post focuses on background and the significant accomplishments to date; the second outlines exciting development opportunities going forward.

We invite additional ecosystem participation in this exciting endeavor. If you are interested, please contact Brian O’Connor (brian@opennetworking.org) to learn how to get involved.

Introduction
To better support applications and users, programmability and end-to-end customization have been goals of advanced networks for more than two decades. Today, the networking industry has begun to embrace programmability beyond switches, improving the security, efficiency, and profitability of its infrastructure.

To realize these benefits, ONF has developed and deployed SD-Fabric, a software-defined switching fabric built from open source components, including ONOS and Stratum. The fabric runs on P4-programmable switches and introduces novel capabilities such as end-to-end QoS, slicing, in-band network telemetry (INT), and embedded network functions. SD-Fabric is the network foundation for Project Pronto and commercial Aether deployments.

Background
Since 2010, the industry has witnessed the democratization of the network, as ownership of many of its aspects has shifted from network vendors to network operators. This shift has largely been enabled by disaggregation, SDN, and open source.

ONF and its partnering vendors have been at the vanguard of one key enabler of this transformation: SDN, in which the network operating system (NOS) is split into a control plane and a data plane with a well-defined, remote-ready interface between the two. SDN has allowed network operators to rethink how the control plane is designed, rapidly modernize it using best-of-breed techniques from other computing disciplines, and accelerate the delivery of features and fixes.

ONF and many others in the open source networking community have applied the principle of disaggregation to develop open source network operating systems that mirror the capabilities of their black-box predecessors. These open NOSes are modular in design and bring together a constellation of open source applications and services to produce a complete solution. They also define vendor-agnostic interfaces for common components, such as the switching ASIC and the platform.

More recently, the networking community has found another area to apply these principles: the data plane. Languages and programming abstractions such as P4 have been developed to provide a common programming model for the data plane, making it possible to swap in data plane targets of different types and from different vendors. Data plane programmability gives operators unprecedented control over which features are supported and how packets are processed in their networks.

All three of these principles (disaggregation, SDN, and open source) have been used successfully to improve the network’s switching fabric. Modular, programmable data and control planes have improved feature velocity, given network operators more control over their networks, reduced costs, increased automation, unlocked interoperability, and enabled disruptive ideas that improve overall reliability by reducing bugs and improving visibility.

While it all started with taking control of switches, we are now witnessing new trends such as the emergence of infrastructure processing units (IPUs) and data processing units (DPUs), a new class of SmartNICs that extends data plane programmability to servers by offloading a broad range of OS networking capabilities while providing better performance, security, and isolation. Similarly, we are observing industry momentum around initiatives that enable kernel and vSwitch programmability (such as eBPF and P4-OvS), providing control and visibility over packet processing all the way to the virtual interfaces of application containers (or VMs).

Hyperscalers are embracing programmability beyond switches to improve the security, efficiency, and profitability of their infrastructure. The same trends are particularly appealing to edge cloud providers, which can benefit from different kinds of data plane targets to meet requirements such as:

  • Ultra-low latency and high throughput to meet the needs of 5G and Industry 4.0.
  • Reduced footprint to facilitate deployments in remote locations with space and power constraints, for example with switch-less fabrics for deployments comprising just a few servers.
  • Data processing capabilities beyond packet I/O, to aggregate, transform, and compress huge amounts of data collected from connected devices (such as sensors or cameras) before sending it to a central cloud for further processing.
  • Enhanced security, to protect mission-critical applications from attacks and malfunctions by means of deep visibility, verification, and closed-loop control techniques.

However, expanding the range of data plane programmability introduces more complexity, which must be managed to realize its full potential. In the network described so far, packets pass through CPU-based vSwitches, NICs, and fabric switches on their way to tenant applications. Along the way, they also encounter one or more network functions (e.g., security checks, packet inspection, compression, encryption) in different parts of the network, implemented on a variety of targets, including CPUs, GPUs, FPGAs, and switching ASICs. Today, much of the control, management, and visibility for these functions, target types, and vendors is siloed. As a result, it is difficult to make end-to-end service guarantees, such as slicing, isolation, QoS, and other SLAs, for packets that flow through the network; doing so typically requires coordinating multiple systems that may be provided by different vendors and managed by different teams. By extending the domain of control, management, and visibility to include every network element in a packet’s path, it becomes much easier to implement these services and new ones, deploy them on different target types, and understand whether and how packets are flowing from application to destination.

We have an opportunity to build a reference next-generation fabric that embraces end-to-end data plane programmability as a core principle. This program expands the scope of ONF’s fabric solution to encompass additional data plane elements, and identifies and enables key use cases that demonstrate the advantages of these enhancements. Our plan is to focus primarily on 5G-connected edge clouds because of the huge market potential, but we believe the same ideas and techniques can be re-used by hyperscalers.

End-to-End Programmable Network Fabric Accomplishments
ONF has already developed SD-Fabric, an SDN fabric that leverages the benefits of data plane programmability. This was first developed under the Trellis program, then extended to support P4 and programmable switches, and most recently, enhanced with new features and operationally improved under the DARPA-funded Pronto program.

We recently released the first version of SD-Fabric, which uses Stratum and ONOS to offer a full-stack P4-programmable hybrid-cloud network fabric, specifically targeted at edge applications for Industry 4.0 with features well beyond traditional fabrics, such as:

  • Deep visibility with In-Band Network Telemetry (INT), which allows operators to verify SLA compliance and drastically reduce troubleshooting times by knowing exactly where and why individual packets are being dropped or delayed.
  • Slicing and QoS capabilities, allowing edge application developers to use high-level APIs to create slices to isolate their traffic from other applications/tenants, and define QoS classification rules to handle different traffic classes within a slice, from latency-sensitive to throughput-intensive ones.
  • Embedded 5G UPF (user plane function), fully integrated with standard 3GPP interfaces. SD-Fabric improves performance and reduces the cost of 5G workloads by offloading UPF processing from CPU resources to the switch ASIC.
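The slicing and QoS capability above describes classification rules that sort traffic within a slice into classes ranging from latency-sensitive to throughput-intensive. The following is a minimal, hypothetical Python sketch of first-match classification, assuming rules keyed on destination prefix, protocol, and destination port. The `Slice`, `ClassifierRule`, and `TrafficClass` names and the class labels are illustrative only and are not SD-Fabric’s actual API; the sketch shows the matching logic and says nothing about how SD-Fabric exposes or enforces such rules.

```python
from dataclasses import dataclass
from enum import Enum
from ipaddress import ip_address, ip_network
from typing import List, Optional


class TrafficClass(Enum):
    # Hypothetical class labels: latency-sensitive, throughput-intensive, default.
    REAL_TIME = "real-time"
    ELASTIC = "elastic"
    BEST_EFFORT = "best-effort"


@dataclass(frozen=True)
class Packet:
    src_ip: str
    dst_ip: str
    proto: str       # e.g. "tcp" or "udp"
    dst_port: int


@dataclass(frozen=True)
class ClassifierRule:
    dst_prefix: str              # destination IP prefix, e.g. "10.0.1.0/24"
    proto: Optional[str]         # None acts as a wildcard
    dst_port: Optional[int]      # None acts as a wildcard
    traffic_class: TrafficClass

    def matches(self, pkt: Packet) -> bool:
        if ip_address(pkt.dst_ip) not in ip_network(self.dst_prefix):
            return False
        if self.proto is not None and pkt.proto != self.proto:
            return False
        if self.dst_port is not None and pkt.dst_port != self.dst_port:
            return False
        return True


@dataclass
class Slice:
    name: str
    rules: List[ClassifierRule]  # evaluated in order; first match wins

    def classify(self, pkt: Packet) -> TrafficClass:
        for rule in self.rules:
            if rule.matches(pkt):
                return rule.traffic_class
        # Traffic that matches no rule falls back to the default class.
        return TrafficClass.BEST_EFFORT


# Usage: a hypothetical slice for a factory-floor application.
factory = Slice("factory-floor", [
    ClassifierRule("10.0.1.0/24", "udp", 5000, TrafficClass.REAL_TIME),
    ClassifierRule("10.0.2.0/24", "tcp", None, TrafficClass.ELASTIC),
])
print(factory.classify(Packet("10.0.9.1", "10.0.1.7", "udp", 5000)).value)  # real-time
```

First-match ordering keeps the semantics easy to reason about; a switch pipeline would more typically express the same intent with prioritized ternary match entries.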

SD-Fabric has been operationalized and integrated with Aether, and it is currently deployed in over a dozen Aether production sites. As part of the SD-Fabric project, ONF has also begun to extend data plane programmability to non-switching elements such as:

  • A proof of concept (PoC) of eBPF-based INT visibility in the host networking stack, integrated with multiple Kubernetes CNIs.
  • The design of more granular Hierarchical QoS (HQoS) functions using FPGA-based NICs.

In parallel, ONF has been working with Google, Intel, and other partners on a new P4 Integrated Network Stack (PINS), which brings SDN and programmable data plane capabilities to SONiC and was recently open sourced. PINS shares the same control (P4Runtime), configuration/telemetry (gNMI), and operational (gNOI) interfaces that are exposed by Stratum and used by ONOS. The initial version of PINS supports installing routes and WCMP groups, configuring and programming ACLs, and sending and receiving packets from an SDN controller. PINS, like SONiC, leverages the Switch Abstraction Interface (SAI) to achieve vendor independence, and P4 is used to model the SAI pipeline. So far, PINS has focused on data center switches and is on track for production deployments.

Finally, ONF has also been working with Intel to bring P4 programmability and common runtime interfaces to additional data plane targets, including CPUs (via DPDK and eBPF) and SmartNICs. These new targets support Intel’s Table Driver Interface (TDI), the same interface used to program Intel Barefoot Tofino ASICs. This work builds on Stratum and forms the basis of the P4-OvS project, which is being jointly developed by ONF, Intel, and Orange.

================================

In the next blog, we will outline some of the opportunities moving forward, the benefits of programmability, and the trend toward embedding network functions in software. We invite additional ecosystem participation in this exciting endeavor. If you are interested, please contact Brian O’Connor (brian@opennetworking.org) to learn how to get involved.
