Linux "bpf" Command Line Options and Examples

perform a command on an extended BPF map or program

The bpf() system call performs a range of operations related to extended Berkeley Packet Filters. Extended BPF (or eBPF) is similar to the original ("classic") BPF (cBPF) used to filter network packets. For both cBPF and eBPF programs, the kernel statically analyzes the programs before loading them, in order to ensure that they cannot harm the running system.

Usage:

#include <linux/bpf.h>

int bpf(int cmd, union bpf_attr *attr, unsigned int size);
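The prototype above describes the raw system call; C libraries typically do not provide a bpf() wrapper function, so programs usually invoke it through syscall(2). The following is a minimal sketch under that assumption. The names sys_bpf() and bpf_probe_errno() are hypothetical and chosen only for illustration.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/bpf.h>

/* Hypothetical wrapper: forwards its arguments unchanged to the raw
 * bpf() system call and returns whatever the kernel returns. */
static int sys_bpf(int cmd, union bpf_attr *attr, unsigned int size)
{
    return (int)syscall(__NR_bpf, cmd, attr, size);
}

/* Hypothetical usage example: issue a deliberately invalid command and
 * report the resulting errno (for instance EINVAL or EPERM), or 0 if
 * the call unexpectedly succeeded. */
static int bpf_probe_errno(void)
{
    union bpf_attr attr;

    memset(&attr, 0, sizeof(attr)); /* unused fields must be zero */
    if (sys_bpf(-1, &attr, sizeof(attr)) == -1)
        return errno;
    return 0;
}
```

Zeroing the whole union before filling in the fields for a given command is a cautious habit, since unused fields that are not zero can cause the call to fail with EINVAL.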


Command Line Options:


* log_level: verbosity level of the verifier. A value of zero means that the verifier will not provide a log; in this case, log_buf must be a NULL pointer, and log_size must be zero.

Applying close(2) to the file descriptor returned by BPF_PROG_LOAD will unload the eBPF program (but see NOTES).

Maps are accessible from eBPF programs and are used to exchange data between eBPF programs and between eBPF programs and user-space programs. For example, eBPF programs can process various events (such as kprobes or packets) and store their data into a map, and user-space programs can then fetch data from the map. Conversely, user-space programs can use a map as a configuration mechanism, populating the map with values checked by the eBPF program, which then modifies its behavior on the fly according to those values.

eBPF program types

The eBPF program type (prog_type) determines the subset of kernel helper functions that the program may call. The program type also determines the program input (context): the format of struct bpf_context (which is the data blob passed into the eBPF program as the first argument).

For example, a tracing program does not have the exact same subset of helper functions as a socket filter program (though they may have some helpers in common).
Similarly, the input (context) for a tracing program is a set of register values, while for a socket filter it is a network packet.

The set of functions available to eBPF programs of a given type may increase in the future.

The following program types are supported:

BPF_PROG_TYPE_SOCKET_FILTER (since Linux 3.19)
    Currently, the set of functions for BPF_PROG_TYPE_SOCKET_FILTER is:

        bpf_map_lookup_elem(map_fd, void *key)
                            /* look up key in a map_fd */
        bpf_map_update_elem(map_fd, void *key, void *value)
                            /* update key/value */
        bpf_map_delete_elem(map_fd, void *key)
                            /* delete key in a map_fd */

    The bpf_context argument is a pointer to a struct __sk_buff.

BPF_PROG_TYPE_KPROBE (since Linux 4.1)
    [To be documented]

BPF_PROG_TYPE_SCHED_CLS (since Linux 4.1)
    [To be documented]

BPF_PROG_TYPE_SCHED_ACT (since Linux 4.1)
    [To be documented]

Events

Once a program is loaded, it can be attached to an event. Various kernel subsystems have different ways to do so.

Since Linux 3.19, the following call will attach the program prog_fd to the socket sockfd, which was created by an earlier call to socket(2):

    setsockopt(sockfd, SOL_SOCKET, SO_ATTACH_BPF,
               &prog_fd, sizeof(prog_fd));

Since Linux 4.1, the following call may be used to attach the eBPF program referred to by the file descriptor prog_fd to a perf event file descriptor, event_fd, that was created by a previous call to perf_event_open(2):

    ioctl(event_fd, PERF_EVENT_IOC_SET_BPF, prog_fd);

EXAMPLES

    /* bpf+sockets example:
     * 1. create array map of 256 elements
     * 2. load program that counts number of packets received
     *    r0 = skb->data[ETH_HLEN + offsetof(struct iphdr, protocol)]
     *    map[r0]++
     * 3. attach prog_fd to raw socket via setsockopt()
     * 4. print number of received TCP/UDP packets every second
     *
     * bpf_create_map(), bpf_prog_load(), bpf_lookup_elem(),
     * open_raw_sock() and the BPF_* instruction macros are not declared
     * by the system headers below; this example assumes helper
     * definitions like those that accompany the code in samples/bpf.
     */
    #include <arpa/inet.h>
    #include <assert.h>
    #include <errno.h>
    #include <stddef.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <sys/socket.h>
    #include <linux/bpf.h>
    #include <linux/if_ether.h>
    #include <linux/ip.h>

    int
    main(int argc, char **argv)
    {
        int sock, map_fd, prog_fd, key;
        long long value = 0, tcp_cnt, udp_cnt;

        map_fd = bpf_create_map(BPF_MAP_TYPE_ARRAY, sizeof(key),
                                sizeof(value), 256);
        if (map_fd < 0) {
            printf("failed to create map '%s'\n", strerror(errno));
            /* likely not run as root */
            return 1;
        }

        struct bpf_insn prog[] = {
            BPF_MOV64_REG(BPF_REG_6, BPF_REG_1),    /* r6 = r1 */
            BPF_LD_ABS(BPF_B, ETH_HLEN + offsetof(struct iphdr, protocol)),
                                    /* r0 = ip->proto */
            BPF_STX_MEM(BPF_W, BPF_REG_10, BPF_REG_0, -4),
                                    /* *(u32 *)(fp - 4) = r0 */
            BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),   /* r2 = fp */
            BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -4),  /* r2 = r2 - 4 */
            BPF_LD_MAP_FD(BPF_REG_1, map_fd),       /* r1 = map_fd */
            BPF_CALL_FUNC(BPF_FUNC_map_lookup_elem),
                                    /* r0 = map_lookup(r1, r2) */
            BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 2),
                                    /* if (r0 == 0) goto pc+2 */
            BPF_MOV64_IMM(BPF_REG_1, 1),            /* r1 = 1 */
            BPF_XADD(BPF_DW, BPF_REG_0, BPF_REG_1, 0, 0),
                                    /* lock *(u64 *) r0 += r1 */
            BPF_MOV64_IMM(BPF_REG_0, 0),            /* r0 = 0 */
            BPF_EXIT_INSN(),                        /* return r0 */
        };

        prog_fd = bpf_prog_load(BPF_PROG_TYPE_SOCKET_FILTER, prog,
                                sizeof(prog), "GPL");
        if (prog_fd < 0) {
            /* the verifier rejected the program, or privilege is missing */
            printf("failed to load program '%s'\n", strerror(errno));
            return 1;
        }

        sock = open_raw_sock("lo");

        assert(setsockopt(sock, SOL_SOCKET, SO_ATTACH_BPF, &prog_fd,
                          sizeof(prog_fd)) == 0);

        for (;;) {
            key = IPPROTO_TCP;
            assert(bpf_lookup_elem(map_fd, &key, &tcp_cnt) == 0);
            key = IPPROTO_UDP;
            assert(bpf_lookup_elem(map_fd, &key, &udp_cnt) == 0);
            printf("TCP %lld UDP %lld packets\n", tcp_cnt, udp_cnt);
            sleep(1);
        }

        return 0;
    }

Some complete working code can be found in the samples/bpf directory in the kernel source tree.

RETURN VALUE

For a successful call, the return value depends on the operation:

BPF_MAP_CREATE
    The new file descriptor associated with the eBPF map.

BPF_PROG_LOAD
    The new file descriptor associated with the eBPF program.

All other commands
    Zero.

On error, -1 is returned, and errno is set appropriately.

ERRORS

E2BIG   The eBPF program is too large or a map reached the max_entries limit (maximum number of elements).

EACCES  For BPF_PROG_LOAD, even though all program instructions are valid, the program has been rejected because it was deemed unsafe. This may be because it may have accessed a disallowed memory region or an uninitialized stack/register, because the function constraints don't match the actual types, or because there was a misaligned memory access. In this case, it is recommended to call bpf() again with log_level = 1 and examine log_buf for the specific reason provided by the verifier.

EBADF   fd is not an open file descriptor.

EFAULT  One of the pointers (key or value or log_buf or insns) is outside the accessible address space.

EINVAL  The value specified in cmd is not recognized by this kernel.

EINVAL  For BPF_MAP_CREATE, either map_type or attributes are invalid.

EINVAL  For BPF_MAP_*_ELEM commands, some of the fields of union bpf_attr that are not used by this command are not set to zero.

EINVAL  For BPF_PROG_LOAD, indicates an attempt to load an invalid program. eBPF programs can be deemed invalid due to unrecognized instructions, the use of reserved fields, jumps out of range, infinite loops, or calls of unknown functions.

ENOENT  For BPF_MAP_LOOKUP_ELEM or BPF_MAP_DELETE_ELEM, indicates that the element with the given key was not found.

ENOMEM  Cannot allocate sufficient memory.

EPERM   The call was made without sufficient privilege (without the CAP_SYS_ADMIN capability).

VERSIONS

The bpf() system call first appeared in Linux 3.18.

CONFORMING TO

The bpf() system call is Linux-specific.

NOTES

In the current implementation, all bpf() commands require the caller to have the CAP_SYS_ADMIN capability.

eBPF objects (maps and programs) can be shared between processes. For example, after fork(2), the child inherits file descriptors referring to the same eBPF objects. In addition, file descriptors referring to eBPF objects can be transferred over UNIX domain sockets. File descriptors referring to eBPF objects can be duplicated in the usual way, using dup(2) and similar calls.
An eBPF object is deallocated only after all file descriptors referring to the object have been closed.

eBPF programs can be written in a restricted C that is compiled (using the clang compiler) into eBPF bytecode. Various features are omitted from this restricted C, such as loops, global variables, variadic functions, floating-point numbers, and passing structures as function arguments. Some examples can be found in the samples/bpf/*_kern.c files in the kernel source tree.

The kernel contains a just-in-time (JIT) compiler that translates eBPF bytecode into native machine code for better performance. The JIT compiler is disabled by default, but its operation can be controlled by writing one of the following integer strings to the file /proc/sys/net/core/bpf_jit_enable:

0   Disable JIT compilation (default).

1   Normal compilation.

2   Debugging mode. The generated opcodes are dumped in hexadecimal into the kernel log. These opcodes can then be disassembled using the program tools/net/bpf_jit_disasm.c provided in the kernel source tree.

The JIT compiler for eBPF is currently available for the x86-64, arm64, and s390 architectures.
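The JIT setting described above can be inspected from user space by reading the same /proc file. The following is a minimal sketch; the helper names bpf_jit_mode_name() and read_bpf_jit_enable() are hypothetical, and the descriptions returned are informal summaries of the three values listed above.

```c
#include <stdio.h>

/* Path as given in the text above. */
#define BPF_JIT_ENABLE_PATH "/proc/sys/net/core/bpf_jit_enable"

/* Hypothetical helper: map a bpf_jit_enable value to a short description. */
static const char *bpf_jit_mode_name(int value)
{
    switch (value) {
    case 0:  return "disabled";
    case 1:  return "enabled";
    case 2:  return "enabled, debug output to kernel log";
    default: return "unknown";
    }
}

/* Hypothetical helper: read the current setting.
 * Returns the integer stored in the file, or -1 if the file cannot be
 * opened or does not contain an integer. */
static int read_bpf_jit_enable(void)
{
    FILE *f = fopen(BPF_JIT_ENABLE_PATH, "r");
    int value = -1;

    if (f == NULL)
        return -1;
    if (fscanf(f, "%d", &value) != 1)
        value = -1;
    fclose(f);
    return value;
}
```

A caller might print bpf_jit_mode_name(read_bpf_jit_enable()) to report the current mode. Changing the setting means writing one of the integer strings to the same file, which generally requires root privileges.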

