loxilb eBPF implementation details
In this section, we will look into the details of loxilb's eBPF implementation and check what goes on under the hood. When loxilb is built, it produces two object files as follows:
llb@nd2:~/loxilb$ ls -l /opt/loxilb/
total 396
drwxrwxrwt 3 root root 0 Jun 20 11:17 dp
-rw-rw-r-- 1 llb llb 305536 Jun 29 09:39 llb_ebpf_main.o
-rw-rw-r-- 1 llb llb 95192 Jun 29 09:39 llb_xdp_main.o
As the names suggest, the xdp version does XDP packet processing while the ebpf version is used at the TC layer for TC eBPF processing. Interestingly enough, the packet-forwarding code is largely agnostic of its final hook point, thanks to a light abstraction layer that hides the differences between the TC eBPF and XDP layers.
Now this begs the question: why separate hook points, and how does it all work together? loxilb does the bulk of its processing at the TC eBPF layer, as this layer is the best optimized for the L4+ processing that loxilb's operation requires. XDP's frame format is different from that of the skb (the Linux kernel's generic socket buffer). This makes it very difficult (if not impossible) to use TCP checksum offload and other such features that the Linux networking stack has relied on for quite some time now. In short, if we need such operations, XDP performance will be inherently slower. XDP as such is perfect for quick operations at the L2 layer. loxilb uses XDP for certain operations like mirroring: due to how TC eBPF works, it is difficult to handle multiple packet copies there, so loxilb's TC eBPF offloads some functionality to the XDP layer in such special cases.
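To make the hook-point abstraction concrete, here is a minimal sketch in C. This is not loxilb's actual code: dp_ctx and all function names here are hypothetical, showing only how a single forwarding routine can be shared between TC and XDP entry points:

/* Hypothetical abstraction hiding __sk_buff vs xdp_md differences */
#include <linux/bpf.h>
#include <linux/pkt_cls.h>
#include <bpf/bpf_helpers.h>

struct dp_ctx {
    void *data;      /* start of packet data */
    void *data_end;  /* end of packet data */
};

/* Hook-agnostic forwarding logic (illustrative only) */
static __always_inline int dp_process_packet(struct dp_ctx *ctx)
{
    if (ctx->data + 14 > ctx->data_end)   /* need a full Ethernet header */
        return 0;                         /* drop */
    /* ... parse headers, look up maps, rewrite and forward ... */
    return 1;                             /* pass */
}

SEC("tc")
int tc_entry(struct __sk_buff *skb)
{
    struct dp_ctx ctx = {
        .data     = (void *)(long)skb->data,
        .data_end = (void *)(long)skb->data_end,
    };
    return dp_process_packet(&ctx) ? TC_ACT_OK : TC_ACT_SHOT;
}

SEC("xdp")
int xdp_entry(struct xdp_md *xdp)
{
    struct dp_ctx ctx = {
        .data     = (void *)(long)xdp->data,
        .data_end = (void *)(long)xdp->data_end,
    };
    return dp_process_packet(&ctx) ? XDP_PASS : XDP_DROP;
}

char _license[] SEC("license") = "GPL";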
Loading of loxilb eBPF program
loxilb's Go-based agent by default loads the loxilb eBPF programs on all interfaces (only physical/real/bond/wireguard) available in the system. As loxilb is designed to run in its own docker/container, this is convenient for users who don't want to manually load/unload eBPF programs. However, it is still possible to do so manually if the need arises:
To load:
ntc filter add dev eth1 ingress bpf da obj /opt/loxilb/llb_ebpf_main.o sec tc_packet_hook0
To unload:
ntc filter del dev eth1 ingress
To check:
root@nd2:/home/llb# ntc filter show dev eth1 ingress
filter protocol all pref 49152 bpf chain 0
filter protocol all pref 49152 bpf chain 0 handle 0x1 llb_ebpf_main.o:[tc_packet_hook0] direct-action not_in_hw id 8715 tag 43a829222e969bce jited
Please note that ntc is the customized tc tool from the iproute2 package, which can be found in loxilb's repository.
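Besides ntc, the same attachment can be done programmatically with libbpf. The following is a hedged sketch (error handling trimmed; the entry function tc_packet_func is described in the next section), not necessarily how loxilb's own agent does it:

/* Sketch: attach llb_ebpf_main.o to eth1 ingress using libbpf */
#include <net/if.h>
#include <bpf/bpf.h>
#include <bpf/libbpf.h>

int main(void)
{
    struct bpf_object *obj;
    struct bpf_program *prog;
    int ifindex = if_nametoindex("eth1");

    obj = bpf_object__open_file("/opt/loxilb/llb_ebpf_main.o", NULL);
    if (!obj || bpf_object__load(obj))
        return 1;

    prog = bpf_object__find_program_by_name(obj, "tc_packet_func");
    if (!prog)
        return 1;

    LIBBPF_OPTS(bpf_tc_hook, hook,
                .ifindex = ifindex,
                .attach_point = BPF_TC_INGRESS);
    LIBBPF_OPTS(bpf_tc_opts, opts,
                .prog_fd = bpf_program__fd(prog));

    bpf_tc_hook_create(&hook);   /* creates the clsact qdisc if absent */
    return bpf_tc_attach(&hook, &opts);
}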
Entry points of loxilb eBPF
loxilb's eBPF code is usually divided into two program sections with the following entry functions:
- tc_packet_func
This, along with the code that follows from it, does the majority of the packet processing. If the conntrack entry for a flow is in the established state, it is also responsible for packet tx. However, if the conntrack entry for a particular packet flow is not yet established, it makes a bpf tail call to tc_packet_func_slow (see the sketch after this list).
- tc_packet_func_slow
This is mainly responsible for NAT lookup and the stateful conntrack implementation. Once a conntrack entry transitions to the established state, forwarding can happen directly from tc_packet_func.
loxilb's XDP code is contained in the following section:
- xdp_packet_func
This is the entry point for packet processing when the hook point is XDP instead of TC eBPF.
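As a rough illustration of the fast-path to slow-path hand-off, the sketch below shows how such a bpf tail call is typically made through a program-array map. We assume here, for illustration only, that the pinned pgm_tbl seen in the next section is loxilb's program array; the slot index and the conntrack check are placeholders:

#include <linux/bpf.h>
#include <linux/pkt_cls.h>
#include <bpf/bpf_helpers.h>

#define LLB_PGM_SLOW 1   /* hypothetical slot of tc_packet_func_slow */

struct {
    __uint(type, BPF_MAP_TYPE_PROG_ARRAY);
    __uint(max_entries, 8);
    __uint(key_size, sizeof(__u32));
    __uint(value_size, sizeof(__u32));
} pgm_tbl SEC(".maps");

SEC("tc_packet_hook0")
int tc_packet_func(struct __sk_buff *skb)
{
    int ct_established = 0;   /* would come from a conntrack map lookup */

    if (!ct_established) {
        /* Jump to the slow path; on success this never returns */
        bpf_tail_call(skb, &pgm_tbl, LLB_PGM_SLOW);
        return TC_ACT_SHOT;   /* tail call failed (e.g. empty slot) */
    }
    /* Established flow: forward directly from the fast path */
    return TC_ACT_OK;
}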
Pinned Maps of loxilb eBPF
All maps used by loxilb eBPF are pinned in the file system as shown below:
root@nd2:/home/llb/loxilb# ls -lart /opt/loxilb/dp/
total 4
drwxrwxrwt 3 root root 0 Jun 20 11:17 .
drwxr-xr-x 3 root root 4096 Jun 29 10:19 ..
drwx------ 3 root root 0 Jun 29 10:19 bpf
root@nd2:/home/llb/loxilb# mount | grep bpf
none on /opt/netlox/loxilb type bpf (rw,relatime)
root@nd2:/home/llb/loxilb# ls -lart /opt/loxilb/dp/bpf/
total 0
drwxrwxrwt 3 root root 0 Jun 20 11:17 ..
lrwxrwxrwx 1 root root 0 Jun 20 11:17 xdp -> /opt/loxilb/dp/bpf//tc/
drwx------ 3 root root 0 Jun 20 11:17 tc
lrwxrwxrwx 1 root root 0 Jun 20 11:17 ip -> /opt/loxilb/dp/bpf//tc/
-rw------- 1 root root 0 Jun 29 10:19 xfis
-rw------- 1 root root 0 Jun 29 10:19 xfck
-rw------- 1 root root 0 Jun 29 10:19 xctk
-rw------- 1 root root 0 Jun 29 10:19 tx_intf_stats_map
-rw------- 1 root root 0 Jun 29 10:19 tx_intf_map
-rw------- 1 root root 0 Jun 29 10:19 tx_bd_stats_map
-rw------- 1 root root 0 Jun 29 10:19 tmac_stats_map
-rw------- 1 root root 0 Jun 29 10:19 tmac_map
-rw------- 1 root root 0 Jun 29 10:19 smac_map
-rw------- 1 root root 0 Jun 29 10:19 rt_v6_stats_map
-rw------- 1 root root 0 Jun 29 10:19 rt_v4_stats_map
-rw------- 1 root root 0 Jun 29 10:19 rt_v4_map
-rw------- 1 root root 0 Jun 29 10:19 polx_map
-rw------- 1 root root 0 Jun 29 10:19 pkts
-rw------- 1 root root 0 Jun 29 10:19 pkt_ring
-rw------- 1 root root 0 Jun 29 10:19 pgm_tbl
-rw------- 1 root root 0 Jun 29 10:19 nh_map
-rw------- 1 root root 0 Jun 29 10:19 nat_v4_map
-rw------- 1 root root 0 Jun 29 10:19 mirr_map
-rw------- 1 root root 0 Jun 29 10:19 intf_stats_map
-rw------- 1 root root 0 Jun 29 10:19 intf_map
-rw------- 1 root root 0 Jun 29 10:19 fc_v4_stats_map
-rw------- 1 root root 0 Jun 29 10:19 fc_v4_map
-rw------- 1 root root 0 Jun 29 10:19 fcas
-rw------- 1 root root 0 Jun 29 10:19 dmac_map
-rw------- 1 root root 0 Jun 29 10:19 ct_v4_map
-rw------- 1 root root 0 Jun 29 10:19 bd_stats_map
-rw------- 1 root root 0 Jun 29 10:19 acl_v6_stats_map
-rw------- 1 root root 0 Jun 29 10:19 acl_v4_stats_map
-rw------- 1 root root 0 Jun 29 10:19 acl_v4_map
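Such pinning typically comes from declaring each map with LIBBPF_PIN_BY_NAME and loading the object with a bpffs pin root path (here, presumably /opt/loxilb/dp/bpf). Below is a hedged sketch of the idea for intf_map; the key layout is inferred from the bpftool dump that follows, and the value struct is replaced by a placeholder:

#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

/* Key layout inferred from the dump below; may not match loxilb's headers */
struct intf_key {
    __u32 ifindex;
    __u16 ing_vid;
    __u16 pad;
};

struct {
    __uint(type, BPF_MAP_TYPE_HASH);
    __uint(max_entries, 1024);            /* illustrative size */
    __type(key, struct intf_key);
    __type(value, __u64);                 /* placeholder for the real value */
    __uint(pinning, LIBBPF_PIN_BY_NAME);  /* pin under the bpffs mount */
} intf_map SEC(".maps");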
Using bpftool, it is easy to check the state of these maps as follows:
root@nd2:/home/llb# bpftool map dump pinned /opt/loxilb/dp/bpf/intf_map
[{
"key": {
"ifindex": 2,
"ing_vid": 0,
"pad": 0
},
"value": {
"ca": {
"act_type": 11,
"ftrap": 0,
"oif": 0,
"cidx": 0
},
"": {
"set_ifi": {
"xdp_ifidx": 1,
"zone": 0,
"bd": 3801,
"mirr": 0,
"polid": 0,
"r": [0,0,0,0,0,0
]
}
}
}
},{
"key": {
"ifindex": 3,
"ing_vid": 0,
"pad": 0
},
"value": {
"ca": {
"act_type": 11,
"ftrap": 0,
"oif": 0,
"cidx": 0
},
"": {
"set_ifi": {
"xdp_ifidx": 3,
"zone": 0,
"bd": 3803,
"mirr": 0,
"polid": 0,
"r": [0,0,0,0,0,0
]
}
}
}
}
]
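The pin paths also make these maps reachable from any userspace process, not just bpftool. A minimal sketch (the key layout is inferred from the dump above, and the value buffer size is an assumption):

#include <stdio.h>
#include <linux/types.h>
#include <bpf/bpf.h>

struct intf_key {
    __u32 ifindex;
    __u16 ing_vid;
    __u16 pad;
};

int main(void)
{
    int fd = bpf_obj_get("/opt/loxilb/dp/bpf/intf_map");
    if (fd < 0)
        return 1;

    struct intf_key key = { .ifindex = 2 };
    char value[128] = { 0 };   /* assumed large enough for the value struct */

    if (bpf_map_lookup_elem(fd, &key, value) == 0)
        printf("intf_map entry found for ifindex %u\n", key.ifindex);
    return 0;
}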
As our development progresses, we will keep updating details about these maps' internals.
loxilb eBPF pipeline at a glance
The following figure shows a very high-level diagram of packet flow through the loxilb eBPF pipeline:
We use eBPF tail calls to jump from one section to another, mainly because there is a clear separation between the CT (conntrack) functionality and the packet-forwarding logic. At the same time, since the kernel's built-in eBPF verifier imposes a maximum code-size limit on a single program/section, tail calls also help circumvent that limit.
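For completeness, here is a hedged sketch of how a userspace loader could wire the slow-path program into a tail-call slot, again assuming (as in the earlier sketch) that the pinned pgm_tbl is the program array and that slot 1 is the slow path:

#include <bpf/bpf.h>
#include <bpf/libbpf.h>

int wire_slow_path(struct bpf_object *obj)
{
    int tbl_fd = bpf_obj_get("/opt/loxilb/dp/bpf/pgm_tbl");
    struct bpf_program *slow =
        bpf_object__find_program_by_name(obj, "tc_packet_func_slow");
    __u32 slot = 1;            /* hypothetical slow-path index */
    int prog_fd;

    if (tbl_fd < 0 || !slow)
        return -1;
    prog_fd = bpf_program__fd(slow);
    /* After this, bpf_tail_call from the fast path can reach the slow path */
    return bpf_map_update_elem(tbl_fd, &slot, &prog_fd, BPF_ANY);
}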