
I’m not going to lie, I have a strong hatred of the Berkeley Packet Filter (BPF). Most of my reasons come from having to support BPF in a network monitoring tool, along with the challenge of writing BPF filters and the weird way they work. So when I first heard about eBPF, I was reluctant to get excited. As I dug in further, though, I became much more excited about the technology and the benefits it can bring.
So, what is eBPF then? Well, in the words of the eBPF Foundation:
eBPF (which is no longer an acronym for anything) is a revolutionary technology with origins in the Linux kernel that can run sandboxed programs in a privileged context such as the operating system kernel. It is used to safely and efficiently extend the capabilities of the kernel without requiring to change kernel source code or load kernel modules.
We now know a few things. First, we know that eBPF, while related to BPF in some ways, is a whole new beast. Second, we know that eBPF is about running programs directly inside an operating system kernel (such as the Linux kernel). But what does that mean, really?
Linux Kernels
The Linux kernel is a highly important piece of software. It, quite literally, is what makes Linux work. The kernel contains all the code needed for systems to work and acts as the interface between the physical hardware on which it is running and the software processes using that hardware. The kernel is responsible for handling all the communications between these two things and ensuring that they work correctly. For a more detailed look at exactly what the kernel is, take a look at Red Hat’s description.
In an ideal world, the kernel would be entirely invisible to the end user. You turn your computer on, launch your programs and they just work. Also keep in mind that all computers are dependent on some sort of kernel for interfacing between software and hardware.
Of course, the kernel is not the be-all and end-all of the computing world. There are three main things to keep in mind:
- Hardware: As new hardware is released and old hardware is discontinued, people regularly change what is connected to their systems
- Drivers: Housing every possible device driver in any kernel would result in something too large to use
- Security: Because of what the kernel does, it needs to be highly reliable and secure. Making lots of changes can result in lots of problems (e.g. security issues, performance issues, etc.). Change does not come quickly to the Linux kernel
Modules & Linux Kernels
These issues, at least in the Linux world, resulted in the concept of modules. Modules are little bits of code containing things like device drivers that can be loaded on the fly. Effectively, modules enable you to add functionality to the kernel without having to rebuild the kernel with all new code. There are some benefits to this:
- Kernel Size: Modules help keep kernels a more reasonable size. You simply load the modules you want, rather than having to build functionality into the kernel. Modules also spare you from rebuilding the kernel every time you make a change
- Device Updates: When new devices come out or existing devices are updated, modules make it easy to update the software. Instead of having to make changes to the kernel, you can simply update the module software
Great, you say, so what does all this have to do with eBPF? Well, there are some challenges with kernel modules. The biggest is that the Linux kernel does not keep its internal interfaces consistent from one version to the next. This means that a module built for one specific version of the Linux kernel generally will not work with a different kernel version. Basically, you need to rebuild each module for every kernel version you want to use it with. And since kernels are released regularly, this can be a lot of work. Wouldn’t it be nice if we had something better? Enter eBPF!
Running eBPF in Linux Kernels
eBPF allows developers to write and run code directly in the Linux kernel. This may sound scary (and it can be), but the kernel and eBPF development communities have spent a lot of time coming up with safe approaches to this. Before an eBPF program is loaded, the kernel checks that it is safe to run. Once deployed, it runs in a sandbox (basically a special space within the kernel dedicated to the eBPF program, from which it should not be able to escape). From within the sandbox, the program interfaces with the rest of the kernel via standard Application Programming Interfaces (APIs). This means a program written for eBPF should work across kernel versions that provide the APIs it uses (unless those APIs change, something that is usually communicated well in advance).
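To make the sandbox idea a bit more concrete, here is a toy model in plain Python. It is not how the kernel actually implements eBPF: the instruction set, the `HELPERS` table, and the `verify` and `run` functions are all invented for illustration. It only shows the general shape of the approach, assuming a program is checked before it runs and can then reach the outside world only through a fixed set of helper functions.

```python
# Toy model of "check first, then run in a sandbox".
# Every name here (opcodes, HELPERS, verify, run) is hypothetical and far
# simpler than the real eBPF machinery in the kernel.

MAX_INSNS = 64  # arbitrary size limit chosen for this toy

# The only doors out of the sandbox: a fixed table of helper functions.
HELPERS = {
    "get_pid": lambda ctx: ctx["pid"],
    "get_uid": lambda ctx: ctx["uid"],
}


def verify(program):
    """Reject programs that are too long, use unknown operations, or could loop."""
    if not program or len(program) > MAX_INSNS:
        raise ValueError("program is empty or too long")
    for pc, (op, arg) in enumerate(program):
        if op in ("load", "add"):
            if not isinstance(arg, int):
                raise ValueError(f"insn {pc}: {op} needs an integer")
        elif op == "call":
            if arg not in HELPERS:
                raise ValueError(f"insn {pc}: unknown helper {arg!r}")
        elif op == "jump_if_zero":
            # Forward-only, in-bounds jumps mean the program must terminate.
            if not (isinstance(arg, int) and pc < arg < len(program)):
                raise ValueError(f"insn {pc}: jump must go forward, in bounds")
        elif op != "exit":
            raise ValueError(f"insn {pc}: unknown opcode {op!r}")
    if program[-1][0] != "exit":
        raise ValueError("program must end with exit")


def run(program, ctx):
    """Verify the program, then interpret it with a single accumulator."""
    verify(program)
    acc, pc = 0, 0
    while True:
        op, arg = program[pc]
        if op == "load":
            acc = arg
        elif op == "add":
            acc += arg
        elif op == "call":
            acc = HELPERS[arg](ctx)
        elif op == "jump_if_zero" and acc == 0:
            pc = arg
            continue
        elif op == "exit":
            return acc
        pc += 1


# Returns 0 when the caller's uid is 0, otherwise 1.
is_not_root = [
    ("call", "get_uid"),
    ("jump_if_zero", 3),
    ("load", 1),
    ("exit", None),
]

# A backward jump could loop forever, so the toy verifier rejects it.
looping = [("load", 0), ("jump_if_zero", 0), ("exit", None)]

if __name__ == "__main__":
    print(run(is_not_root, {"pid": 42, "uid": 1000}))
    try:
        run(looping, {"pid": 42, "uid": 1000})
    except ValueError as err:
        print("rejected:", err)
```

The point of the design is that the program never touches `ctx` directly; it can only ask for what a helper is willing to hand over, and a program that fails the check never gets to run at all.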
This all sounds great! Now that you know a bit about eBPF you may be asking yourself “great, so what can I actually do with this thing?” and I am so glad you asked. While the technology is really just getting started (an important thing to remember) here are a few areas eBPF can bring benefit to today:
- Because eBPF programs run in kernel space, they can be really, really fast, often much faster than doing the same work in the more traditional user space (where most of the applications you use likely run)
- Since eBPF runs in the kernel, you can debug or observe running code without having to stop the program to see what is going on
- eBPF programs are compiled to bytecode, which the kernel then translates to native machine code on a just-in-time basis, improving performance and efficiency as well as convenience (especially since the same eBPF program should work across kernel versions)
- Some things (such as HTTP requests) can be more easily traced via eBPF, since it helps eliminate or reduce the need for manual instrumentation (e.g. agents)
And Yes, eBPF Works!
There are organizations and software using eBPF today. A few examples include:
- Facebook is working on using eBPF to replace their existing load-balancing infrastructure, while also including additional functionality (such as DDoS mitigation). According to reports, they have seen a 10x improvement in performance from moving to eBPF
- Diptanu Choudhury wrote a simple ingress firewall using eBPF and XDP capable of processing 11 million packets per second (before optimizations)
- Netflix has been leveraging eBPF for performance profiling and tracing (via the bcc project)
- Cloudflare is using eBPF to mitigate DDoS attacks
- The Linux firewall (iptables) may eventually be replaced with an eBPF-based solution for enhanced performance and ease of use
- Suricata (an IDS) has started using eBPF and XDP to replace older approaches to getting access to packets
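Several of these examples (the ingress firewall, the DDoS mitigation work) come down to the same idea: look at each packet as early as possible and decide whether to drop it or let it through. The sketch below shows that decision logic in plain Python rather than as a real XDP program. The `BLOCKED_SOURCES` set, `ingress_verdict`, and `make_frame` are hypothetical names, the verdict constants borrow the names and values XDP uses, and the code assumes untagged Ethernet frames carrying IPv4. Passing anything it cannot parse is a policy choice made for this example only.

```python
import ipaddress
import struct

# Verdicts named after the XDP action codes.
XDP_DROP = 1
XDP_PASS = 2

ETH_HEADER_LEN = 14       # destination MAC + source MAC + EtherType
ETH_P_IP = 0x0800         # EtherType for IPv4
IPV4_MIN_HEADER_LEN = 20

# Hypothetical blocklist; 203.0.113.0/24 is reserved for documentation.
BLOCKED_SOURCES = {ipaddress.ip_address("203.0.113.7")}


def ingress_verdict(frame: bytes) -> int:
    """Return XDP_DROP for IPv4 frames from a blocked source, else XDP_PASS."""
    # Check bounds before reading any header field.
    if len(frame) < ETH_HEADER_LEN + IPV4_MIN_HEADER_LEN:
        return XDP_PASS
    (ethertype,) = struct.unpack_from("!H", frame, 12)
    if ethertype != ETH_P_IP:
        return XDP_PASS
    # The IPv4 source address sits 12 bytes into the IP header.
    start = ETH_HEADER_LEN + 12
    src = ipaddress.ip_address(frame[start:start + 4])
    return XDP_DROP if src in BLOCKED_SOURCES else XDP_PASS


def make_frame(src: str) -> bytes:
    """Build a minimal Ethernet + IPv4 frame with the given source address."""
    eth = bytes(12) + struct.pack("!H", ETH_P_IP)
    ip = bytearray(IPV4_MIN_HEADER_LEN)
    ip[0] = 0x45  # IPv4, 20-byte header
    ip[12:16] = ipaddress.ip_address(src).packed
    return eth + bytes(ip)


if __name__ == "__main__":
    for source in ("203.0.113.7", "198.51.100.1"):
        verdict = ingress_verdict(make_frame(source))
        print(source, "DROP" if verdict == XDP_DROP else "PASS")
```

A real XDP program would be written in restricted C and attached to a network interface, but the structure is similar: bounds checks first, then header parsing, then a verdict.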
Sadly, things are not all sunshine and roses. Because of where eBPF programs run and their unprecedented access to kernel-level data, it is also possible to write and deploy nefarious eBPF programs. While there are no known malicious eBPF programs in the wild – as of the time of this writing – there are examples available.
The takeaway from all this is that eBPF is an interesting, relatively new technology that is going to allow for some amazing improvements in the way we interact with data on systems of all types (even Windows)!