# Limago: an FPGA-based Open-Source 100 GbE TCP/IP Stack

Mario Ruiz<sup>1</sup>, David Sidler<sup>2</sup>, Gustavo Sutter<sup>1</sup>, Gustavo Alonso<sup>2</sup> and Sergio López-Buedo<sup>1,3</sup>

<sup>1</sup>High-Performance Computing and Networking Research Group,
Autonomous University of Madrid, Spain

<sup>2</sup> Systems Group, Department of Computer Science, ETH Zürich, Switzerland

<sup>3</sup>NAUDIT High-Performance Computing and Networking, Spain

**mario.ruiz@uam.es**







## Motivation

- The network is becoming a bottleneck in current datacenter applications.
- New approaches are being explored to maximize network efficiency and to tailor its functionality to actual needs:
  - In-network data processing.
  - Network-attached paradigm.
- Goal: provide a platform for further research in programmable networks.
- Starting point: the 10 Gbit/s stack by Sidler et al. [1].

[1] Sidler, David, et al. "Scalable 10Gbps TCP/IP stack architecture for reconfigurable hardware." 2015 IEEE 23rd Annual International Symposium on Field-Programmable Custom Computing Machines. IEEE, 2015.

## Challenges

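- Datapath 8x wider and clock frequency 2x higher than in the 10 Gbit/s stack.
- Scalability with increasing network bandwidth.
- Flexible and high-productivity methodology: Vivado-HLS.
- Wider applicability.
- Long fat pipe issue: the bandwidth-delay product exceeds the buffer size.

$$RTT\,(\mathrm{s}) \times LinkCapacity\,(\mathrm{bit/s}) > BufferSize\,(\mathrm{bit})$$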
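To get a feel for the inequality above, the following sketch computes the bandwidth-delay product of a 100 Gbit/s link for a few round-trip times and the smallest TCP Window Scale shift whose advertised window would cover it. The RTT values and the function names are assumptions chosen for illustration; they are not measurements or identifiers from Limago.

```python
# Worked example of the long fat pipe condition: RTT x link capacity vs. buffer size.
# The RTT values below are illustrative assumptions, not measured results.

LINK_CAPACITY_BPS = 100e9          # 100 Gbit/s link
MAX_UNSCALED_WINDOW_BYTES = 65535  # largest value of the 16-bit TCP window field
MAX_WINDOW_SHIFT = 14              # upper limit of the TCP Window Scale option


def bandwidth_delay_product_bytes(rtt_s, capacity_bps):
    """Bytes that must be in flight to keep the link busy for one RTT."""
    return rtt_s * capacity_bps / 8


def window_scale_shift(bdp_bytes):
    """Smallest Window Scale shift whose maximum window covers bdp_bytes."""
    shift = 0
    while (MAX_UNSCALED_WINDOW_BYTES << shift) < bdp_bytes and shift < MAX_WINDOW_SHIFT:
        shift += 1
    return shift


if __name__ == "__main__":
    for rtt_us in (1, 10, 100):  # hypothetical round-trip times in microseconds
        bdp = bandwidth_delay_product_bytes(rtt_us * 1e-6, LINK_CAPACITY_BPS)
        print(f"RTT {rtt_us} us: BDP ~ {bdp:,.0f} bytes, "
              f"window scale shift {window_scale_shift(bdp)}")
```

Under these assumptions, any RTT above roughly 5 µs already needs more than the 64 KiB that an unscaled TCP window can advertise, which is why buffer size and window scaling matter at 100 Gbit/s.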

- One's complement checksum [2].
- CAM: new design based on cuckoo hashing, written in HLS.
- DRAM bandwidth.
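For readers unfamiliar with cuckoo hashing, here is a minimal software model of a two-table cuckoo hash used as a CAM-like lookup from a connection tuple to a session ID. It is only a sketch of the general technique: the table size, the hash functions, the eviction bound and the small overflow stash are all assumptions, and none of it reflects the actual HLS design in Limago.

```python
# Software model of a two-table cuckoo hash (general technique only).
# TABLE_SIZE, MAX_KICKS, the salted hash and the stash are illustrative assumptions.

TABLE_SIZE = 8   # slots per table (hypothetical)
MAX_KICKS = 16   # evictions attempted before falling back to the stash


def _slot(key, table_index):
    # Two different hash functions, obtained by salting Python's built-in hash.
    return hash((table_index, key)) % TABLE_SIZE


class CuckooTable:
    def __init__(self):
        # Each slot holds either None or a (key, value) pair.
        self.tables = [[None] * TABLE_SIZE for _ in range(2)]
        self.stash = []  # overflow entries that could not be placed

    def lookup(self, key):
        # A key can only live in one slot per table, so at most two probes
        # are needed (plus a scan of the normally empty stash).
        for t in (0, 1):
            entry = self.tables[t][_slot(key, t)]
            if entry is not None and entry[0] == key:
                return entry[1]
        for stored_key, value in self.stash:
            if stored_key == key:
                return value
        return None

    def insert(self, key, value):
        if self.lookup(key) is not None:
            return False  # key already present; this sketch does not update
        item, t = (key, value), 0
        for _ in range(MAX_KICKS):
            s = _slot(item[0], t)
            # Place the item and pick up whatever occupied the slot before.
            item, self.tables[t][s] = self.tables[t][s], item
            if item is None:
                return True
            t ^= 1  # the evicted entry retries in the other table
        self.stash.append(item)  # no free slot found within MAX_KICKS
        return True


if __name__ == "__main__":
    cam = CuckooTable()
    # Hypothetical key: (source IP, source port, destination IP, destination port)
    cam.insert((0x0A000001, 5001, 0x0A000002, 40000), 7)
    print(cam.lookup((0x0A000001, 5001, 0x0A000002, 40000)))
```

The property that makes this attractive as a CAM replacement is the constant, small number of memory probes per lookup, independent of how many entries are stored.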

[2] Sutter, Gustavo, et al. "FPGA-based TCP/IP Checksum Offloading Engine for 100 Gbps Networks." 2018 International Conference on ReConFigurable Computing and FPGAs (ReConFig). IEEE, 2018.
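As a reminder of what the one's complement checksum computes, the sketch below implements the standard 16-bit Internet checksum in software and then recomputes it from independent per-word partial sums. The 64-byte word size is an assumption meant to mimic a wide datapath; this is not the hardware architecture described in [2], only an illustration that the sum can be split and recombined in any order.

```python
# Software reference for the 16-bit one's complement (Internet) checksum.
# The 64-byte word size in checksum_by_words is an illustrative assumption.


def ones_complement_sum16(data):
    """One's complement sum of big-endian 16-bit words, folded to 16 bits."""
    if len(data) % 2:
        data += b"\x00"  # pad an odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # end-around carry
    return total


def internet_checksum(data):
    """Checksum field value: complement of the one's complement sum."""
    return ~ones_complement_sum16(data) & 0xFFFF


def checksum_by_words(data, word_bytes=64):
    """Same checksum, built from independent partial sums per word.

    word_bytes must be even so that 16-bit alignment is preserved.
    """
    total = 0
    for i in range(0, len(data), word_bytes):
        total += ones_complement_sum16(data[i:i + word_bytes])
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


if __name__ == "__main__":
    # IPv4 header with its checksum field set to zero.
    header = bytes.fromhex("450000730000400040110000c0a80001c0a800c7")
    print(hex(internet_checksum(header)))
```

Because one's complement addition is associative and commutative, the partial sums can be computed in parallel and combined afterwards, which is the property a wide datapath can exploit.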

## Limago at a Glance



## TOE



## Experiments

#### Limago to Limago (running iperf2, one connection)



## Experiments

#### Server(s) to Limago (running iperf2)

Throughput for concurrent connections



## Resource consumption (TOE)



## Conclusions

- ✓ Open-Source implementation.
- ✓ Support for multiple connections and Window Scale.
- ✓ Mostly written in C/C++ using Vivado-HLS.
- ✓ 7,456 lines of C/C++ and 1,482 lines of HDL.
- ✓ Future work includes support for packet reordering and selective acknowledgement (using HBM).

| VCU118               | LUT    | FF    | BRAM   |
|----------------------|--------|-------|--------|
| 10 G                 | 6.6 %  | 3.6 % | 17.1 % |
| 100 G                | 10.1 % | 7.5 % | 20.4 % |
| Ratio (100 G / 10 G) | 1.55x  | 2.1x  | 1.2x   |

Just 20 % more BRAM for 10x throughput

## Visit our poster for further details

Check out our GitHub!

