Kernel Module Weak Updates

From trapsink.com


Overview

Red Hat-oriented Linux distributions provide a method for 3rd party entities to ship a single kernel module for an entire family of kernel releases, based on the understanding that the kernel's exported symbol tables and module interface do not change within that family. This document goes over the basic design behind the solution.

This methodology is popular amongst 3rd party vendors who provide a pre-compiled kernel module for their hardware and want that same binary module to work with a number of compatible kernels. Shipping a new binary module for each and every Red Hat kernel release is therefore not required, which reduces the complexity of producing the module and its runtime maintenance on a server.

In common usage, these types of modules are delivered in packages named kmod-<foo>, where <foo> is the name of the existing stock kernel module as shipped by the distribution. The overall compatibility is referred to as kABI, the kernel Application Binary Interface.


Module Loading

Key to understanding the method is how the system looks for the modules to load. The details vary by distribution, but generally speaking the modules are looked for in this order:

  1. /lib/modules/(kernel-version)/updates - manually controlled area for use by sysadmins to insert a module by hand and override everything
  2. /lib/modules/(kernel-version)/extra - overrides everything shipped with the kernel and weak-updates (see below)
  3. /lib/modules/(kernel-version)/* - stock kernel modules (usually in a subdirectory kernel) and other named directories; a vendor may choose to have a top-level directory here, such as the EMC PowerPath software using /lib/modules/(kernel-version)/powerpath as its standard location
  4. /lib/modules/(kernel-version)/weak-updates - modules compatible with this kernel that were actually compiled against another similar kernel in the family

The concept named weak-updates works in tandem with the extra module location: typically the original module is installed in the extra directory of the kernel it was compiled against, and a symlink to it exists in the weak-updates directory of each other compatible kernel.
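
The search order above can be sketched as a small shell function. This only illustrates the precedence described here and is not how depmod or modprobe are implemented; the function name first_module_match is hypothetical, and the catch-all third location is simplified to the kernel subdirectory.

```shell
#!/bin/sh
# Hypothetical helper: print the first copy of a module found under one
# kernel's module tree, walking the directories in the order listed above.
#   $1 = module tree, e.g. /lib/modules/3.10.0-327.49.1.el7.x86_64
#   $2 = module name, e.g. bfa
first_module_match() {
    fmm_tree=$1
    fmm_name=$2
    # Simplification: only "kernel" stands in for the catch-all location.
    for fmm_sub in updates extra kernel weak-updates; do
        [ -d "$fmm_tree/$fmm_sub" ] || continue
        # -L lets find see weak-updates symlinks as the files they point to
        fmm_hit=$(find -L "$fmm_tree/$fmm_sub" -name "$fmm_name.ko*" 2>/dev/null \
            | sort | head -n 1)
        if [ -n "$fmm_hit" ]; then
            printf '%s\n' "$fmm_hit"
            return 0
        fi
    done
    return 1
}
```

It could be called as first_module_match "/lib/modules/$(uname -r)" bfa to see which copy would win on the running kernel under these assumptions.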


Functional Examples

Using a Red Hat Enterprise Linux (RHEL) 7 system, we first note that only one kernel is installed:

# rpm -qa | grep ^kernel-3
kernel-3.10.0-327.49.1.el7.x86_64

In Red Hat's versioning scheme, this is read in two parts:

  • 3.10.0-327.* - the suite or "family" of kernel releases
  • .49.1.el7 - the specific patched release of this kernel within the family

This design indicates that any kernel module built for the 3.10.0-327.* family of kernels should be compatible with any specific kernel in the family; but as it's not possible to 100% guarantee this ahead of time, safety checks exist (more on this below). On this server, we have a kmod kernel module that is replacing one of the stock ones.
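
As a rough illustration of the family/release split, the helpers below pull the family prefix out of a release string and compare two releases. The names kernel_family and same_family are hypothetical, and the pattern assumes the RHEL-style layout shown above; sharing a family only suggests compatibility, it does not replace the safety checks.

```shell
#!/bin/sh
# Hypothetical helpers illustrating the family/release split described above.

# Print the family part (e.g. 3.10.0-327) of a RHEL-style kernel release.
kernel_family() {
    printf '%s\n' "$1" \
        | sed -E 's/^([0-9]+\.[0-9]+\.[0-9]+-[0-9]+)\..*$/\1/'
}

# Succeed when two kernel releases belong to the same family.
same_family() {
    [ "$(kernel_family "$1")" = "$(kernel_family "$2")" ]
}
```

For instance, same_family "$(uname -r)" 3.10.0-327.el7.x86_64 succeeds on the system used in this example.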

Example: bfa

The kmod-bfa package was obtained from a 3rd party provider and supports Brocade Fibre Channel adapters.

/lib/modules/3.10.0-327.el7.x86_64/extra/bfa:
-rw-r--r--. 1 root root 23431886 Apr 22  2016 bfa.ko

/lib/modules/3.10.0-327.49.1.el7.x86_64/weak-updates/bfa:
lrwxrwxrwx. 1 root root 51 Feb 14 12:54 bfa.ko -> /lib/modules/3.10.0-327.el7.x86_64/extra/bfa/bfa.ko

Notice that the real file is in a directory for a kernel that is not installed: it lives in the extra/ directory of the "base" kernel for the family (the first one released in the family, in this case RHEL 7.2), and the running kernel has a symlink in its weak-updates/ directory pointing back to the module. This module is compatible for weak-updates; it was compiled against kernel 3.10.0-327 but works with kernel 3.10.0-327.49.1 as is, with no modifications needed.

Example: lpfc

The kmod-lpfc package is provided in the main RHEL 7 software repository by Red Hat and carries newer upstream code for Emulex Fibre Channel adapters.

/lib/modules/3.10.0-327.el7.x86_64/extra/lpfc:
-rw-r--r--. 1 root root 1180268 Sep  5 02:51 lpfc.ko

/lib/modules/3.10.0-327.49.1.el7.x86_64/weak-updates/lpfc:
lrwxrwxrwx. 1 root root 53 Feb 16 15:09 lpfc.ko -> /lib/modules/3.10.0-327.el7.x86_64/extra/lpfc/lpfc.ko

The design is exactly that of the previous example: this module is compatible for weak-updates, as it was compiled against kernel 3.10.0-327 but works with kernel 3.10.0-327.49.1 as is, with no modifications needed.
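
To see the same picture for every weak module of one kernel at once, a small loop over the symlinks is enough. This is a sketch only; the function name list_weak_links is hypothetical and it assumes the weak-updates/ layout shown in the two examples.

```shell
#!/bin/sh
# Hypothetical helper: list each weak-updates symlink of one kernel's
# module tree together with the file it points to.
#   $1 = module tree, e.g. /lib/modules/3.10.0-327.49.1.el7.x86_64
list_weak_links() {
    find "$1/weak-updates" -type l -name '*.ko*' 2>/dev/null \
        | sort \
        | while IFS= read -r lwl_link; do
            # Show the link relative to the tree, then its target
            printf '%s -> %s\n' "${lwl_link#"$1"/}" "$(readlink "$lwl_link")"
        done
}
```

Running list_weak_links "/lib/modules/$(uname -r)" on the example server would print one line per module, in the same link -> target form as the ls output above.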


Compatible Modules

Ensuring that a module compiled for one version of a kernel is compatible with another kernel is key to the system working correctly. The topic leans heavily on compilers, assemblers and linkers, which provide the data needed to compare for compatibility. A compiled binary carries a symbol table listing the functions and data it provides and requires; this applies both to the kernel itself and to any module trying to load itself into that kernel.

For every kernel function a module expects to use, the module embeds a version checksum (a CRC computed from the symbol's type signature when module versioning is enabled), and it carries the same for anything it exports to others (imagine a module using a module). At its simplest, the process asks the target kernel whether the checksums the module knows about have changed. If nothing has changed, the module is compatible, so long as no internal change has happened that is not visible to the outside world. This is the kABI in effect: the module is kernel ABI compatible across several compiled kernels.

Kernel Symbols

The kernel(s) ship with a pre-exported symbol table stored in the /boot directory next to the kernel:

# ls -l /boot/symvers-3.10.0-327.49.1.el7.x86_64.gz 
-rw-r--r--. 1 root root 252731 Jan 25 11:37 /boot/symvers-3.10.0-327.49.1.el7.x86_64.gz

# zgrep blk_queue_init_tags /boot/symvers-3.10.0-327.49.1.el7.x86_64.gz
0x00a006aa    blk_queue_init_tags    vmlinux    EXPORT_SYMBOL

The output above shows the function blk_queue_init_tags is exported for all to use (EXPORT_SYMBOL) by the binary vmlinux (the kernel) with version checksum 0x00a006aa. The first column is a CRC of the symbol's type signature rather than a memory address, which is why it can be compared between kernels. This is a general function used for example purposes herein; there are many more in use.

Due to how the kernel is packaged, the shipped vmlinuz file (a compressed, stripped copy of vmlinux) typically does not contain the symbols; hence, they are extracted while the kernel package is being built and saved as a separate file for use in userspace. The symvers file covers the symbols exported by every shipped module as well as those of the main kernel itself, making it quite a large set of data. If a symbol is exported by a module, the module's name appears where vmlinux is shown above.

Also note that some exports are restricted: symbols of type EXPORT_SYMBOL_GPL can only be used by modules that declare a GPL-compatible license.
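
The four tab-separated columns of the symvers file are easy to pick apart with awk. The helper below is a sketch with a hypothetical name, symvers_lookup; it reads symvers-format text on standard input and reports the hexadecimal value, the exporter and the export type for one symbol.

```shell
#!/bin/sh
# Hypothetical helper: look one symbol up in symvers-format text on stdin.
# Columns (tab separated): hex value, symbol name, exporter, export type.
#   $1 = symbol name
# Exit status is non-zero when the symbol is not listed.
symvers_lookup() {
    awk -F'\t' -v sym="$1" '
        $2 == sym {
            printf "%s value=%s from=%s type=%s\n", $2, $1, $3, $4
            found = 1
        }
        END { exit !found }
    '
}
```

On the example system this could be used as zcat /boot/symvers-$(uname -r).gz | symvers_lookup blk_queue_init_tags.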

Module Symbols

A module is nothing fancier than a standard ELF object file (relocatable, much like a compiler's .o output) designed specifically to work with the kernel. As such, all the normal commands that deal with symbol tables can be used, such as GNU nm:

# nm /lib/modules/3.10.0-327.el7.x86_64/extra/bfa/bfa.ko | grep blk_queue_init_tags
                 U blk_queue_init_tags

# nm /lib/modules/3.10.0-327.el7.x86_64/extra/lpfc/lpfc.ko | grep blk_queue_init_tags
                 U blk_queue_init_tags

You'll notice that this information is not very useful as shown: the U only says the symbol is undefined in the module and must be supplied from elsewhere. The modprobe tool with a bit of sed can be used to present the data in a way that makes more sense for the task at hand, namely comparing checksums by name:

# modprobe --dump-modversions /lib/modules/3.10.0-327.el7.x86_64/extra/bfa/bfa.ko \
    | sed -r -e 's:^(0x[0]*[0-9a-f]{8}\t.*):\1:' | grep blk_queue_init_tags

0x00a006aa    blk_queue_init_tags

# modprobe --dump-modversions /lib/modules/3.10.0-327.el7.x86_64/extra/lpfc/lpfc.ko \
    | sed -r -e 's:^(0x[0]*[0-9a-f]{8}\t.*):\1:' | grep blk_queue_init_tags

0x00a006aa    blk_queue_init_tags

This shows us the kernel exports the function blk_queue_init_tags with checksum 0x00a006aa, and the weak module (compiled for another kernel) expects this same function with checksum 0x00a006aa - the function's interface is compatible, nothing has changed. From here, all that's left is to ensure each and every function the module uses or exports undergoes the same scrutiny for kABI compatibility.
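
To get a feel for how many symbols need that scrutiny, the U lines of the nm output can be collected into a plain list. The filter below is a sketch with a hypothetical name, undefined_symbols; it only reads nm-style text on standard input.

```shell
#!/bin/sh
# Hypothetical filter: read `nm` output on stdin and print the names of
# the undefined (U) symbols, i.e. what the module expects to be supplied
# by the kernel or by other modules. Output is sorted and de-duplicated.
undefined_symbols() {
    awk '$1 == "U" { print $2 }' | LC_ALL=C sort -u
}
```

For example, nm /lib/modules/3.10.0-327.el7.x86_64/extra/bfa/bfa.ko | undefined_symbols | wc -l counts the symbols that module imports.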


Methodology

There are several steps to ensuring a weak-updates kernel module is integrated well with the system and is compatible with a given target kernel. Each kernel is checked on its own, so within one family a kernel may be using the kernel module (a symlink exists from it to the older file) or not be using it (no symlink exists).
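
Because each kernel is decided individually, it can be handy to see the per-kernel state of one module at a glance. The sketch below uses a hypothetical function name, weak_status, and assumes the <dir>/<module>/<module>.ko layout seen in the earlier examples.

```shell
#!/bin/sh
# Hypothetical helper: for every kernel tree under a modules root, report
# whether a module is linked via weak-updates, present as the original
# file in extra/, or absent.
#   $1 = modules root, normally /lib/modules
#   $2 = module name, e.g. bfa
weak_status() {
    for ws_tree in "$1"/*/; do
        [ -d "$ws_tree" ] || continue
        ws_tree=${ws_tree%/}
        ws_krel=$(basename "$ws_tree")
        if [ -L "$ws_tree/weak-updates/$2/$2.ko" ]; then
            echo "$ws_krel linked"
        elif [ -e "$ws_tree/extra/$2/$2.ko" ]; then
            echo "$ws_krel original"
        else
            echo "$ws_krel absent"
        fi
    done
}
```

A call such as weak_status /lib/modules bfa prints one line per kernel directory.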

Dependencies

The dependencies must be taken care of first: in case the module being inserted as a weak-update is used by another module, the system needs to know about the symbols in the weak-update version, as they may have changed (and could otherwise cause a cascaded incompatibility by accident).

The entity shipping the module creates a file in /etc/depmod.d/ with the override, like so:

# cat /etc/depmod.d/bfa.conf
override bfa 3.10.0-* weak-updates/bfa

# cat /etc/depmod.d/lpfc.conf 
override lpfc 3.10.0-327.* weak-updates/lpfc

This tells the system to use the weak-updates/bfa version of the bfa module, if it is found, for all kernels in the 3.10.0-* suite (which is all of RHEL 7 in this example); for lpfc the wildcard is narrowed to 3.10.0-327.* kernels only, as an alternate example.
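
Each override line has three fields after the keyword: the module name, the kernel version pattern and the directory to prefer. A short awk filter makes that explicit; the function name list_overrides is hypothetical, and the sketch only reformats the text without interpreting the pattern the way depmod would.

```shell
#!/bin/sh
# Hypothetical filter: read depmod.d-style configuration on stdin and
# print the three fields of every "override" line in labelled form.
list_overrides() {
    awk '$1 == "override" && NF >= 4 {
        printf "module=%s kernels=%s dir=%s\n", $2, $3, $4
    }'
}
```

It could be fed with cat /etc/depmod.d/*.conf | list_overrides to review all overrides installed on a system.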

The entity shipping the module then runs this command after the module has been added (via RPM post-install, etc.); in this example, the module was compiled for 3.10.0-327, so the depmod command uses that version to refresh the module dependency data:

# depmod -aeF "/boot/System.map-3.10.0-327.el7.x86_64" "3.10.0-327.el7.x86_64"

The System.map file is only read here: -F names the symbol map that -e uses to report any symbols a module needs but nothing provides. What depmod rewrites is the dependency data (modules.dep and the related map files) under /lib/modules/3.10.0-327.el7.x86_64/, which now accounts for the new file in the extra/ directory.

Weak Modules

The second step is to create the compatibility symlinks in the weak-updates/ subdirectories of all kernels installed on the system that are in fact compatible with this new module. From the outside, this is all built into a script that the entity shipping the module can simply call (again, in their package post-install):

# weak-modules --add-modules /lib/modules/3.10.0-327.el7.x86_64/extra/bfa/bfa.ko
...or:
# weak-modules --add-modules /lib/modules/3.10.0-327.el7.x86_64/extra/lpfc/lpfc.ko

The weak-modules script will also rebuild the initramfs files in /boot for the kernels it adjusted, inserting the new module setup for stage 1 boot. This is the lag perceived when installing a kmod package: after the RPM has placed its files, the script is updating the initramfs of each affected kernel.

The process inside the script can be broken down into these basic steps:

  1. Take the kernel symbols file and massage it into a format that works with diff and join later (loops for every kernel found):
    # krel=$(uname -r)
    
    # zcat /boot/symvers-$krel.gz \
        | sed -r -ne 's:^(0x[0]*[0-9a-f]{8}\t[0-9a-zA-Z_.]+)\t.*:\1:p' \
        > symvers-$krel
    
  2. If required (the kernel may not have any), extract and prepare the same information from any extra/ modules in the target kernel (this loops for every installed kernel). Only installed kernels with something in extra/ contribute data, so the resulting file may be zero bytes:
    # krel=$(uname -r)
    
    # find /lib/modules/$krel/extra -name '*.ko' \
        | xargs nm \
        | sed -nre 's:^[0]*([0-9a-f]{8}) A __crc_(.*):0x\1 \2:p' \
        > addon-symvers-$krel
    
  3. Do the same as above, but specifically for the kernel the module was built against, which is recorded as vermagic within the module's data:
    # modinfo -F vermagic bfa lpfc
    3.10.0-327.el7.x86_64 SMP mod_unload modversions
    3.10.0-327.el7.x86_64 SMP mod_unload modversions
    
    # module_krel=3.10.0-327.el7.x86_64
    
    # find /lib/modules/$module_krel/extra -name '*.ko' \
        | xargs nm \
        | sed -nre 's:^[0]*([0-9a-f]{8}) A __crc_(.*):0x\1 \2:p' \
        > extra-symvers-$module_krel
    
  4. Take the data from the above steps and simply combine and sort it for use:
    # sort -u symvers-$krel \
        extra-symvers-$module_krel \
        addon-symvers-$krel \
        > all-symvers-$krel-$module_krel
    
  5. Now take the new module being added to the system and extract the symbols it expects as well:
    # module="/lib/modules/3.10.0-327.el7.x86_64/extra/bfa/bfa.ko"
    ...or:
    # module="/lib/modules/3.10.0-327.el7.x86_64/extra/lpfc/lpfc.ko"
    
    # /sbin/modprobe --dump-modversions "$module" \
        | sed -r -e 's:^(0x[0]*[0-9a-f]{8}\t.*):\1:' \
        | sort -u \
        > modvers
    
  6. Last, use the join command in reverse mode (think grep -v) to show any lines of the new module's expected symbols that have no match among all the known symbols provided:
    # join -j 1 -v 2 all-symvers-$krel-$module_krel modvers
    

This set of steps tells us whether the symbols the incoming module expects are identical to what is actually present on the system; any output from the last step indicates that something differs in either checksum or availability, and the module is not compatible. No output means it is fully compatible and can be symlinked to the target kernel safely.
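
The final comparison can also be wrapped into one function for experimenting. The sketch below is a variation rather than the script's own code: it compares whole value/name pairs with comm instead of joining on the first field, and the name kabi_mismatches is hypothetical. It takes the combined list from step 4 and the module list from step 5 as files.

```shell
#!/bin/sh
# Hypothetical helper: print every "value name" pair the module expects
# that the combined symbol list does not provide. No output means the
# two lists agree for everything the module needs.
#   $1 = combined symbol list (step 4), $2 = module's list (step 5)
kabi_mismatches() {
    km_all=$(mktemp)
    km_mod=$(mktemp)
    # Normalise tabs and spaces so both files compare line for line
    awk '{ print $1, $2 }' "$1" | LC_ALL=C sort -u > "$km_all"
    awk '{ print $1, $2 }' "$2" | LC_ALL=C sort -u > "$km_mod"
    # -13 keeps only the lines that are unique to the module's list
    LC_ALL=C comm -13 "$km_all" "$km_mod"
    rm -f "$km_all" "$km_mod"
}
```

With the files from the steps above it would be called as kabi_mismatches all-symvers-$krel-$module_krel modvers.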


Incompatible Example

Using the above methodology, we can examine a different kernel module which is incompatible. This specific version of kmod-bna has 4 symbols whose checksums are incompatible with a specific (older) kernel that has been installed. Each of the steps above is covered in order:

The setup:
# rpm -qa | egrep "^(kernel-3|kmod-bna)"
kernel-3.10.0-229.20.1.el7.x86_64
kernel-3.10.0-327.49.1.el7.x86_64
kmod-bna-3.2.7.0-0.el7.x86_64

Step 1:
# krel=3.10.0-229.20.1.el7.x86_64
# zcat /boot/symvers-$krel.gz \
>     | sed -r -ne 's:^(0x[0]*[0-9a-f]{8}\t[0-9a-zA-Z_.]+)\t.*:\1:p' \
>     > symvers-$krel

Step 2:
# find /lib/modules/$krel/extra -name '*.ko' \
>     | xargs nm \
>     | sed -nre 's:^[0]*([0-9a-f]{8}) A __crc_(.*):0x\1 \2:p' \
>     > addon-symvers-$krel

Step 3:
# modinfo -F vermagic bna | cut -f1 -d' '
3.10.0-327.el7.x86_64
# module_krel=3.10.0-327.el7.x86_64
# find /lib/modules/$module_krel/extra -name '*.ko' \
>     | xargs nm \
>     | sed -nre 's:^[0]*([0-9a-f]{8}) A __crc_(.*):0x\1 \2:p' \
>     > extra-symvers-$module_krel

Step 4:
# sort -u symvers-$krel \
>     extra-symvers-$module_krel \
>     addon-symvers-$krel \
>     > all-symvers-$krel-$module_krel

Step 5:
# module="/lib/modules/3.10.0-327.el7.x86_64/extra/bna/bna.ko"
# /sbin/modprobe --dump-modversions "$module" \
>     | sed -r -e 's:^(0x[0]*[0-9a-f]{8}\t.*):\1:' \
>     | sort -u \
>     > modvers

Step 6:
# join -j 1 -v 2 all-symvers-$krel-$module_krel modvers
0x7efd609f __netif_napi_add
0x905307be napi_complete_done
0xd93737a0 napi_disable
0xe1d1af76 __dev_kfree_skb_any

The methodology shows there are 4 symbols whose checksums do not match up between the older kernel and this newer module, making them incompatible for use together.
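
The join output only shows the module's side of each mismatch. To see what the kernel side offers for the same names, the two files can be compared by symbol name. This is a sketch with a hypothetical function name, explain_mismatches, working on the same two files as step 6.

```shell
#!/bin/sh
# Hypothetical helper: for each symbol the module expects, report when
# the known symbol list has a different value or no entry at all.
#   $1 = combined symbol list, $2 = module's list (both "value name")
explain_mismatches() {
    awk '
        FILENAME == ARGV[1] { known[$2] = $1; next }   # remember kernel side
        !($2 in known) {
            printf "%s module=%s kernel=missing\n", $2, $1
            next
        }
        known[$2] != $1 {
            printf "%s module=%s kernel=%s\n", $2, $1, known[$2]
        }
    ' "$1" "$2"
}
```

Running explain_mismatches all-symvers-$krel-$module_krel modvers after step 5 would tell apart symbols whose value changed from symbols the older kernel does not export at all.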


Caveats

A compatible kernel module as determined by the weak-updates methodology is an observation of the symbol checksums from the outside only; there is no way to transparently test that the module functions at runtime, only that it can be inserted into the target kernel without error. It is entirely possible for an internal coding change to break the module; the kernel engineers patching a given kernel may have changed something that the symbol checks cannot see.

Testing a newly updated kernel against any existing weak module must be performed to ensure all functionality is retained.


References