Linux Driver Package - Troubleshooting

This page contains some hints that may be helpful for finding out why something doesn't work as expected.

If ntpd doesn't see the device, or doesn't accept it as a reference time source, even though mbgstatus can access the device successfully, some possible reasons are:

  • The device is not synchronized, and thus not used by ntpd. In this case mbgstatus reports that the device is not synchronized.
  • mbgsvcd is not running, so no time stamps are fed into ntpd's shared memory driver, and ntpd doesn't get any reference time. In this case no mbgsvcd process is listed in the output of the ps ax command.
  • AppArmor or SELinux is installed, active, and preventing ntpd from accessing the shared memory segment. In this case the syslog should contain messages from ntpd saying something like Permission denied. See the README file of the driver package for hints on how to fix this.
  • ntpd has been compiled without support for the Shared Memory Driver. In this case the syslog should contain messages from ntpd saying something like driver 28 is unknown.
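
The mbgsvcd check from the list above can be scripted. The following is a minimal sketch and not part of the driver package: the helper name process_listed is hypothetical, and it assumes the usual ps ax column layout (PID, TTY, STAT, TIME, COMMAND).

```shell
#!/bin/sh
# Hypothetical helper, not part of the driver package.
# Reads "ps ax" output on stdin and succeeds if a process whose
# executable name equals $1 is listed. Comparing the basename of the
# COMMAND column avoids matching a "grep mbgsvcd" that is itself running.
process_listed() {
    awk -v name="$1" '
        { n = split($5, parts, "/"); if (parts[n] == name) found = 1 }
        END { exit !found }
    '
}

# On a live system:
#   ps ax | process_listed mbgsvcd || echo "mbgsvcd is not running"
```

Reading the process list from stdin keeps the helper independent of the ps variant in use, so the same function also works on saved output.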

If the driver software for Linux has been compiled and installed correctly but the kernel driver doesn't load, or a device is not detected even though it is obviously installed, the following instructions may help to determine the reason for this problem. In such a case even a command like mbgstatus only reports something like “No Device found.”

If the kernel module file mbgclock.ko can't be found when the command modprobe mbgclock is executed, possible reasons are:

  • The driver hasn't been installed yet, i.e. the make install command hasn't been executed.
  • The driver has been built and installed, but for a different kernel version.

The second case can occur if the kernel has been updated to a new version, but the system hasn't been rebooted since, so an older kernel is still running when the driver package is built. The kernel module is then installed in a subdirectory associated with the version of the running kernel, but after a reboot the kernel version has changed, and thus the module file is in a different directory than expected.

The following commands can be used to check this:

uname -r
find /lib/modules/ -name mbgclock.ko

The output may look similar to:

#> uname -r
3.16.7-35-desktop
#> find /lib/modules -name mbgclock.ko
/lib/modules/3.16.7-24-desktop/extra/mbgclock.ko
/lib/modules/3.16.7-35-desktop/extra/mbgclock.ko
/lib/modules/3.16.7-29-desktop/extra/mbgclock.ko
/lib/modules/3.16.7-21-desktop/extra/mbgclock.ko
/lib/modules/3.16.7-7-desktop/extra/mbgclock.ko

In the case above there have been several kernel updates, and there is one instance of the kernel module per kernel version. This is OK, but it is important that there is also a module listed for the running kernel 3.16.7-35-desktop, which is /lib/modules/3.16.7-35-desktop/extra/mbgclock.ko in this case.
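
The comparison described above can be automated. This is a minimal sketch, assuming modules are installed below /lib/modules/<kernel release>/ as in the example output; the helper name has_module_for_release is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: reads module paths (one per line) on stdin and
# succeeds if at least one lies below /lib/modules/<release>/, where
# <release> is the kernel release passed as $1.
has_module_for_release() {
    grep -qF -- "/lib/modules/$1/"
}

# On a live system:
#   find /lib/modules -name mbgclock.ko | has_module_for_release "$(uname -r)" \
#       || echo "no mbgclock.ko for the running kernel, rebuild and reinstall"
```

The trailing slash in the search string ensures that a release such as 3.16.7-3 does not accidentally match the directory of 3.16.7-35-desktop.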

The Linux kernel can optionally be built with a requirement that out-of-tree modules are only loaded if they have been built with the same compiler version as the target kernel. In this case an extra module does not load if it has been built with a different compiler version, even if the proper kernel headers were used.

This has been observed e.g. on Red Hat Enterprise Linux (RHEL) 5.3, where the kernel was apparently built with gcc-4.1 and with the constraint that the compiler version must be the same for modules built out-of-tree.

A customer had installed the newer gcc-4.3 as an optional package on this system. The original gcc-4.1 was still there, but gcc-4.3 became the default compiler, so an extra kernel module that was properly built with the new default compiler failed to load, with these messages reported by dmesg:

mbgclock: version magic '2.6.18-128.1.10.el5 SMP mod_unload gcc-4.3'
should be '2.6.18-128.1.10.el5 SMP mod_unload gcc-4.1'

So on this system the compiler version is part of the version magic string (in most cases it is not), and the module failed to load because it was built with gcc-4.3 while the kernel itself had been built with gcc-4.1.
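
To compare the compiler tags without reading the strings by eye, the gcc token can be extracted from a version magic string. This is a minimal sketch; the helper name vermagic_compiler is hypothetical, and it assumes the compiler appears as a gcc-<version> token, as in the dmesg output above.

```shell
#!/bin/sh
# Hypothetical helper: prints the gcc-<version> token of a version magic
# string passed as $1, or nothing if the string contains no such token.
vermagic_compiler() {
    for token in $1; do          # intentionally unquoted: split on spaces
        case "$token" in
            gcc-*) printf '%s\n' "$token" ;;
        esac
    done
}

# The version magic of a freshly built module can be read with modinfo:
#   vermagic_compiler "$(modinfo -F vermagic ./mbgclock.ko)"
```

If the printed token differs from the one in the "should be" string reported by dmesg, the module was built with the wrong compiler.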

A fix in this case is to make sure the expected compiler version is installed:

# ls -l /usr/bin/gcc*
-rwxr-xr-x 2 root root 216016 Jan 21 14:54 /usr/bin/gcc
-rwxr-xr-x 2 root root  97728 Jan  9  2007 /usr/bin/gcc41
-rwxr-xr-x 2 root root 234936 Jan 21 13:29 /usr/bin/gcc43

The driver package containing the extra kernel module can then be built by specifying the appropriate compiler:

make clean; make CC=gcc41 && make install

If the kernel module loads normally, but no PCI card or USB device is found even though one is obviously installed, some debugging is required to find out the reason.

The first step is to find out whether the device is recognized by the kernel. In the case of a PCI card:

lspci -d 1360:*

should list the device, for example:

#> lspci -d 1360:*
01:00.0 System peripheral: Meinberg Funkuhren GPS180PEX GPS Receiver (PCI Express) (rev 01)

In the case of a USB device this command can be used:

lsusb -d 1938:

and if a device is detected the output should look similar to:

#> lsusb -d 1938:
Bus 003 Device 023: ID 1938:0301 

If no device is listed by the commands above then the kernel module can't find the device, either.
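
Both checks can be reduced to a count of listed devices. This is a minimal sketch; the helper name count_listed is hypothetical, and it assumes one device per output line, as in the examples above.

```shell
#!/bin/sh
# Hypothetical helper: counts the non-empty lines read from stdin, i.e.
# the number of devices printed by lspci or lsusb. "|| true" keeps the
# exit status at 0 even when grep finds nothing and prints 0.
count_listed() {
    grep -c . || true
}

# On a live system (the quotes keep the shell from expanding the "*"):
#   lspci -d '1360:*' | count_listed
#   lsusb -d 1938:    | count_listed
```

A result of 0 for both commands means that the kernel doesn't see the hardware at all, so the problem is not in the driver.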

The reason may be that the USB port isn't connected, or the PCI slot is disabled, so it's definitely worth trying a different USB port or PCI slot.

Please note that on some machines (e.g. some Dell servers) individual PCI slots can be disabled in the BIOS setup, so you may need to enter the BIOS setup, check for such a setting, and enable the affected slot if it is disabled.

If the device is listed, this just means that it is recognized according to its vendor and device ID. There is still no guarantee that the driver is capable of talking to the microcontroller on the device.

So if the device is listed, but the kernel module doesn't detect it, the kernel module may be recompiled with DEBUG enabled to provide some more detailed information. Assuming the current directory is the base directory of the driver package, change into the directory of the kernel driver

cd mbgclock/

and run the following command to rebuild only the kernel module with debug information:

make clean; make DEBUG_FULL=1  # was DEBUG=15 for driver packages before 4.2.0

The kernel module can be built with a number of debug options associated with different topics, which are all listed if make is run with DEBUG_FULL=1, e.g.:

Calling kernel build system to make "clean"
Kernel build system finished making "clean"
DEBUG_FULL:1  
REPORT_IO_ERRORS: [1]  
REPORT_CFG: [1]  
REPORT_CFG_DETAILS: [1]  
DEBUG_RSRC: [1]  
DEBUG_SERNUM: [1]  
DEBUG_ACCESS_TIMING: [1]  
DEBUG_IO_TIMING: [1]  
DEBUG_IO: [1]  
DEBUG_USB_IO: [1]  
DEBUG_DRVR: [1]  
DEBUG_DEV_INIT: [1]  
DEBUG_IOCTL: [1]  
Calling kernel build system to make "modules"
Kernel build system finished making "modules"

When the kernel driver is built this way, loading the module produces a huge number of messages in the output of dmesg. To enable only a subset of the debug output, make can also be called with just the options of interest, e.g.:

make clean; make REPORT_CFG=1 REPORT_CFG_DETAILS=1 DEBUG_RSRC=1
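
When experimenting with different subsets, the argument list can be generated from the option names. This is a minimal sketch; the helper name debug_make_args is hypothetical, and the option names are the ones printed by the build shown above.

```shell
#!/bin/sh
# Hypothetical helper: turns option names into "NAME=1" make arguments.
debug_make_args() {
    args=""
    for opt in "$@"; do
        args="${args:+$args }$opt=1"   # add a separating space after the first
    done
    printf '%s\n' "$args"
}

# Example, run in the mbgclock/ directory:
#   make clean; make $(debug_make_args REPORT_CFG REPORT_CFG_DETAILS DEBUG_RSRC)
```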

Once the kernel module has been compiled with the desired debug information, the freshly compiled module can be loaded without having to install it first:

insmod ./mbgclock.ko

After you have tried to load the module, the dmesg command prints a large number of debug messages, and the output can easily be copied to a text file, e.g.:

dmesg | tee dmesg-output.txt

displays the messages and also writes them to the file dmesg-output.txt. The messages can be helpful to identify what's going wrong when the driver probes the device, and the result may be that the device is faulty. E.g., if the microprocessor on the device is dead, the device doesn't reply when the driver tries to read some data.

In this case the debug output might contain a message similar to Failed to read firmware ID. Please contact Meinberg support at techsupport@meinberg.de and provide some information on the problem. If possible, attach the text file with the dmesg output to your email, preferably archived, e.g. as a .tar.gz or .zip file.

Don't run make install for the DEBUG version of the kernel module. Due to the debug messages the module is pretty slow, system timekeeping is significantly worse, and the system log is quickly filled with debug messages. So after debugging has finished, the kernel module, or simply the whole package, should be recompiled and installed without DEBUG by running the following command in the base directory of the driver package:

make clean; make && make install

Martin Burnicki martin.burnicki@meinberg.de, last updated 2022-01-13
