Check dmesg
Overview
Checks the kernel ring buffer (dmesg) for messages at severity levels emerg, alert, crit, and err. Known false positives and hardware-specific noise are filtered out by default. To clear reported messages after resolving the underlying issue, run "dmesg --clear". Requires root or sudo.
Important Notes:
- The reported timestamps may be inaccurate. The time source used for dmesg is not updated after system SUSPEND/RESUME. Timestamps are adjusted according to the current delta between boottime and monotonic clocks, which only works for messages printed after the last resume
- The kernel ring buffer is a fixed-size circular buffer. Over time, newer messages overwrite older ones, so errors that have been resolved and whose messages have been overwritten will no longer be reported
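The timestamp caveat can be illustrated with a small sketch. This is not the code used by dmesg or by the plugin; the helper name `to_wall_clock` and its parameters are hypothetical. It assumes that a message's raw timestamp counts seconds on a clock that pauses during suspend, and it applies the current delta between the boottime and monotonic clocks to every message.

```python
def to_wall_clock(msg_seconds, now_wall, now_boottime, now_monotonic):
    """Estimate the wall-clock time of a kernel message (hypothetical helper).

    msg_seconds   -- raw message timestamp in seconds since boot (assumed to pause in suspend)
    now_wall      -- current wall-clock time in seconds since the epoch
    now_boottime  -- current boottime clock reading (includes suspended time)
    now_monotonic -- current monotonic clock reading (excludes suspended time)
    """
    # Total time the system has spent suspended since boot.
    suspended = now_boottime - now_monotonic
    # Wall-clock time at which the system booted.
    boot_wall = now_wall - now_boottime
    # The *current* delta is applied to every message, so the result is
    # only correct for messages logged after the last resume.
    return boot_wall + msg_seconds + suspended


# Made-up scenario: boot at t=1000, 100 s running, 50 s suspended, 20 s running.
now_wall, now_boottime, now_monotonic = 1170, 170, 120

after_resume = to_wall_clock(110, now_wall, now_boottime, now_monotonic)
before_suspend = to_wall_clock(40, now_wall, now_boottime, now_monotonic)
print(after_resume, before_suspend)
```

In this made-up scenario the message logged after the resume lands on its true wall-clock time, while the message logged before the suspend is shifted forward by the 50 seconds of suspended time, which is the inaccuracy described in the note above.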
Data Collection:
- Executes `dmesg --level=emerg,alert,crit,err --ctime` to read the kernel ring buffer
- Known false positives are filtered out by default, including common harmless messages such as "Assuming drive cache: write through", "ioctl error in smb2_get_dfs_refer rc=-5", "shpchp pci_hp_register failed with error -16" on virtualized hosts, and various KVM/EFI/SMBus messages. The bundled default ignore list is annotated inline with the rationale and reference URLs for each entry, so it can be re-evaluated as the plugin matures
- Additional messages can be excluded using the `--ignore` parameter, which accepts Python regular expressions and may be specified multiple times. Once `--ignore` is given, the user-supplied list replaces the bundled default ignore list, so admins can curate their own catalogue without inheriting the defaults
- If more than 10 error lines are found, the output is shortened to the first 5 and last 5 lines
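The filtering and shortening steps can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the plugin's actual implementation: the function names `filter_messages` and `shorten` are hypothetical, and the use of `re.search` (rather than `re.match`) is an assumption about how patterns are applied.

```python
import re


def filter_messages(lines, ignore_patterns):
    """Drop every line that matches at least one ignore pattern."""
    compiled = [re.compile(pattern) for pattern in ignore_patterns]
    # Assumption: a pattern may match anywhere in the line (re.search).
    return [line for line in lines if not any(rx.search(line) for rx in compiled)]


def shorten(lines, head=5, tail=5):
    """Return the first `head` and last `tail` lines if there are more than head + tail."""
    if len(lines) <= head + tail:
        return lines
    return lines[:head] + lines[-tail:]


# Made-up input lines for demonstration only.
raw = [
    "[Mon May 31 18:27:14 2021] kvm: vcpu0 unhandled rdmsr: 0x1",
    "[Tue Jun  8 18:54:41 2021] usb usb2-port1: unable to enumerate USB device",
]
remaining = filter_messages(raw, [r"^.* unhandled (rd|wr)msr: "])
print(len(remaining), "line(s) left after filtering")
print(shorten([f"line {i}" for i in range(12)]))
```

Whether the real plugin marks the omitted middle section in its output is not stated here, so the sketch simply concatenates the two slices.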
Fact Sheet
| Fact | Value |
|---|---|
| Check Plugin Download | https://github.com/Linuxfabrik/monitoring-plugins/tree/main/check-plugins/dmesg |
| Nagios/Icinga Check Name | check_dmesg |
| Check Interval Recommendation | Every minute |
| Can be called without parameters | Yes |
| Runs on | Linux |
| Compiled for Windows | No |
Help
usage: dmesg [-h] [-V] [--always-ok] [--ignore IGNORE] [--test TEST]

Checks the kernel ring buffer (dmesg) for messages at severity levels emerg,
alert, crit, and err. Known false positives and hardware-specific noise are
filtered out by default; the filtered count is reported as the `errors`
perfdata so trends can be graphed. To clear reported messages after resolving
the underlying issue, run "dmesg --clear". Note: the kernel ring buffer is a
fixed-size circular buffer, so older messages are overwritten over time, and
timestamps may drift across SUSPEND/RESUME because the time source is not
updated on resume. Requires root or sudo.

options:
  -h, --help       show this help message and exit
  -V, --version    show program's version number and exit
  --always-ok      Always returns OK.
  --ignore IGNORE  Ignore a kernel message matching this Python regular
                   expression. Can be specified multiple times. Specifying
                   this parameter replaces the bundled default ignore list.
                   Example: `--ignore="^.* unhandled (rd|wr)msr: "`.
  --test TEST      For unit tests. Needs "path-to-stdout-file,path-to-stderr-
                   file,expected-retc".
Usage Examples
Run with the bundled defaults:
./dmesg
Suppress noisy ACPI EC method-abort messages with a custom regex:
./dmesg --ignore="ACPI Error: Aborting method"
Note: specifying `--ignore` replaces the bundled defaults. To keep the defaults plus an extra pattern, repeat the bundled patterns as additional `--ignore` arguments, or wrap them in a single broader regex such as `--ignore="(unhandled (rd|wr)msr: |EFI MOKvar)"`.
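When a custom list grows, it can help to keep the patterns in one place and generate the command line from them. The following Python sketch shows both variants mentioned in the note: one `--ignore` per pattern, and a single combined alternation. The patterns are only the examples used on this page, not the complete bundled default list, and the variable names are hypothetical.

```python
import re

# Example patterns taken from this page; NOT the complete bundled default list.
patterns = [
    r"unhandled (rd|wr)msr: ",
    r"EFI MOKvar",
    r"ACPI Error: Aborting method",
]

# Variant 1: one --ignore argument per pattern.
argv_repeated = ["./dmesg"] + [f"--ignore={p}" for p in patterns]

# Variant 2: a single alternation. Each pattern is wrapped in a
# non-capturing group so that its own "|" cannot leak into its neighbours.
combined = "|".join(f"(?:{p})" for p in patterns)
argv_combined = ["./dmesg", f"--ignore={combined}"]

# Quick self-check of the combined regex against two made-up messages.
print(bool(re.search(combined, "kvm: vcpu0 unhandled wrmsr: 0x1")))
print(bool(re.search(combined, "usb usb2-port1: Cannot enable.")))

print(argv_repeated)
print(argv_combined)
```

Passing such an argument list to `subprocess.run()` avoids shell quoting problems with characters like `(`, `|`, and spaces.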
Sample output on a host with real errors:
5 errors in Kernel Ring Buffer.
[Mon May 31 18:27:14 2021] x86/cpu: SGX disabled by BIOS
[Sat Jun 5 18:49:50 2021] ACPI Error: Thread 2495397888 cannot release Mutex [ECMX] acquired by thread 1817575424 (20210105/exmutex-378)
[Sat Jun 5 18:49:50 2021] ACPI Error: Aborting method \_SB.PCI0.LPCB.ECDV._Q66 due to previous error (AE_AML_NOT_OWNER) (20210105/psparse-529)
[Tue Jun 8 18:54:41 2021] usb usb2-port1: Cannot enable. Maybe the USB cable is bad?
[Tue Jun 8 18:54:41 2021] usb usb2-port1: unable to enumerate USB device|'errors'=5;;;0
States
- OK if no emerg, alert, crit, or err messages are found in the kernel ring buffer (after filtering).
- CRIT if any such messages are found.
- `--always-ok` suppresses all alerts and always returns OK.
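The state rules above amount to a very small decision function. The sketch below uses the conventional Nagios plugin exit codes (0 for OK, 2 for CRITICAL); the names `STATE_OK`, `STATE_CRIT`, and `get_state` are hypothetical and are not taken from the plugin's source.

```python
STATE_OK = 0    # conventional Nagios exit code for OK
STATE_CRIT = 2  # conventional Nagios exit code for CRITICAL


def get_state(error_count, always_ok=False):
    """Map the number of remaining error lines to a plugin state."""
    if always_ok or error_count == 0:
        return STATE_OK
    # There is no WARN state: any remaining message is critical.
    return STATE_CRIT


print(get_state(0), get_state(5), get_state(5, always_ok=True))
```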
Perfdata / Metrics
| Name | Description |
|---|---|
| `errors` | Number of error lines that remain in the ring buffer after filtering. |
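For quick scripting around the plugin, the perfdata can be separated from the text output at the last `|`. The following is a minimal sketch, assuming the common `label=value;warn;crit;min;max` perfdata layout and labels without spaces; `parse_perfdata` is a hypothetical helper, not part of the plugin.

```python
def parse_perfdata(plugin_output):
    """Split plugin output into (text, {label: value}) at the last '|'."""
    if "|" not in plugin_output:
        return plugin_output, {}
    text, _, perfdata = plugin_output.rpartition("|")
    metrics = {}
    for item in perfdata.split():
        label, _, rest = item.partition("=")
        # The value is the first ';'-separated field; thresholds are ignored here.
        value = rest.split(";")[0]
        metrics[label.strip("'")] = float(value)
    return text, metrics


# Last line of the sample output shown on this page.
sample = "[Tue Jun 8 18:54:41 2021] usb usb2-port1: unable to enumerate USB device|'errors'=5;;;0"
text, metrics = parse_perfdata(sample)
print(metrics)
```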
Credits, License
- Authors: Linuxfabrik GmbH, Zurich
- License: The Unlicense, see LICENSE file.