Check redis-status¶

Overview¶

Returns information and statistics about a Redis server. Alerts on memory consumption, memory fragmentation, hit rates and more. Connects to Redis via 127.0.0.1:6379 by default.

Hints:

Tested on Redis 3.0+.
"I'm here to keep you safe, Sam. I want to help you." comes from the character GERTY in the movie "Moon" (2009).

Fact Sheet¶

Fact	Value
Check Plugin Download	https://github.com/Linuxfabrik/monitoring-plugins/tree/main/check-plugins/redis-status
Check Interval Recommendation	Once a minute
Can be called without parameters	Yes
Compiled for Windows	No
Requirements	command-line tool `redis-cli`

Help¶

usage: redis-status [-h] [-V] [--always-ok] [--cacert CACERT] [-c CRIT]
                    [-H HOSTNAME] [--ignore-maxmemory0] [--ignore-overcommit]
                    [--ignore-somaxconn] [--ignore-sync-partial-err]
                    [--ignore-thp] [-p PASSWORD] [--port PORT]
                    [--socket SOCKET] [--test TEST] [--tls]
                    [--username USERNAME] [--verbose] [-w WARN]

Returns information and statistics about a Redis server. Alerts on memory
consumption, memory fragmentation, hit rates and more.

options:
  -h, --help            show this help message and exit
  -V, --version         show program's version number and exit
  --always-ok           Always returns OK.
  --cacert CACERT       CA Certificate file to verify with. Needs `--tls`.
                        Default: /etc/pki/tls/certs/rootCA.pem
  -c, --critical CRIT   Set the CRIT memory usage threshold as a percentage.
                        Default: >= None
  -H, --hostname HOSTNAME
                        Redis server hostname. Default: 127.0.0.1
  --ignore-maxmemory0   Don't warn about redis' maxmemory=0. Default: False
  --ignore-overcommit   Don't warn about vm.overcommit_memory<>1. Default:
                        False
  --ignore-somaxconn    Don't warn about net.core.somaxconn <
                        net.ipv4.tcp_max_syn_backlog. Default: False
  --ignore-sync-partial-err
                        Don't warn about partial sync errors (because if you
                        have an asynchronous replication, a small number of
                        "denied partial resync requests" might be normal).
                        Default: False
  --ignore-thp          Don't warn about transparent huge page setting.
                        Default: False
  -p, --password PASSWORD
                        Password to use when connecting to the redis server.
  --port PORT           Redis server port. Default: 6379
  --socket SOCKET       Redis server socket (overrides hostname and port).
  --test TEST           For unit tests. Needs "path-to-stdout-file,path-to-
                        stderr-file,expected-retc".
  --tls                 Establish a secure TLS connection to Redis.
  --username USERNAME   Username to use when connecting to the Redis server.
  --verbose             Makes this plugin verbose during the operation. Useful
                        for debugging and seeing what's going on under the
                        hood. Default: False
  -w, --warning WARN    Set the WARN memory usage threshold as a percentage.
                        Default: >= 90

Usage Examples¶

./redis-status \
    --ignore-maxmemory0 \
    --ignore-overcommit \
    --ignore-somaxconn \
    --ignore-sync-partial-err \
    --ignore-thp \
    --username=linus \
    --password=linuxfabrik

Output:

Redis v5.0.3, standalone mode on 127.0.0.1:6379, /etc/redis.conf, up 4m 25s, 100.9% memory usage
[WARNING] (9.6MiB/9.5MiB, 9.6MiB peak, 19.6MiB RSS), maxmemory-policy=noeviction, 3 DBs
(db0 db3 db4) with 10 keys, 0.0 evicted keys, 0.0 expired keys, hit rate 100.0%
(3.0M hits, 0.0 misses), vm.overcommit_memory is not set to 1, kernel transparent_hugepage is not
set to "madvise" or "never", net.core.somaxconn (128) is lower than net.ipv4.tcp_max_syn_backlog
(256). Sam, I detected a few issues in this Redis instance memory implants:

 * High total RSS: This instance has a memory fragmentation and RSS overhead greater than 1.4
 (this means that the Resident Set Size of the Redis process is much larger than the sum of the
logical allocations Redis performed). This problem is usually due either to a large peak
memory (check if there is a peak memory entry above in the report) or may result from a workload
that causes the allocator to fragment memory a lot. If the problem is a large peak memory, then
there is no issue. Otherwise, make sure you are using the Jemalloc allocator and not the default
libc malloc. Note: The currently used allocator is "jemalloc-5.1.0".

I'm here to keep you safe, Sam. I want to help you.

States¶

WARN or CRIT in case of memory usage above the specified thresholds
WARN on Redis' maxmemory 0 setting (can be disabled)
WARN on any memory issues (can be disabled)
WARN on partial sync errors (can be disabled)
WARN on bad OS configuration (can be disabled)

Perfdata / Metrics¶

Latest info can be found here.

Name	Type	Description
clients_blocked_clients	Number	Number of clients pending on a blocking call
clients_connected_clients	Number	Number of client connections (excluding connections from replicas)
cpu_used_cpu_sys	Number	System CPU consumed by the Redis server, which is the sum of system CPU consumed by all threads of the server process (main thread and background threads)
cpu_used_cpu_sys_children	Number	System CPU consumed by the background processes
cpu_used_cpu_user	Number	User CPU consumed by the Redis server, which is the sum of user CPU consumed by all threads of the server process (main thread and background threads)
cpu_used_cpu_user_children	Number	User CPU consumed by the background processes
db_count	Number	Number of Redis databases
key_count	Number	Sum of all keys across all databases
keyspace_DBNAME_keys	Number	The number of keys
keyspace_DBNAME_expires	Number	The number of keys with an expiration
keyspace_DBNAME_avg_ttl	Seonds
keyspace_hit_rate	Percentage	Percentage of key lookups that are successfully returned by keys in your Redis instance. Generally speaking, a higher cache-hit ratio is better than a lower cache-hit ratio. You should make a note of your cache-hit ratio before you make any large configuration changes such as adjusting the maxmemory-gb limit, changing your eviction policy, or scaling your instance. Then, after you modify your instance, check the cache-hit ratio again to see how your change impacted this metric.
mem_usage	Percentage	Indicates how close your working set size is to reaching the maxmemory-gb limit. Unless the eviction policy is set to no-eviction, the instance data reaching maxmemory does not always indicate a problem. However, key eviction is a background process that takes time. If you have a high write-rate, you could run out of memory before Redis has time to evict keys to free up space.
memory_maxmemory	Bytes
memory_mem_fragmentation_ratio	Number	Ratio between used_memory_rss and used_memory. Note that this doesn't only includes fragmentation, but also other process overheads (see the allocator_* metrics), and also overheads like code, shared libraries, stack, etc. Memory fragmentation can cause your Memorystore instance to run out of memory even when the used memory to maxmemory-gb ratio is low. Memory fragmentation happens when the operating system allocates memory pages which Redis cannot fully utilize after repeated write and delete operations. The accumulation of such pages can result in the system running out of memory and eventually causes the Redis server to crash.
memory_total_system_memory	Bytes	The total amount of memory that the Redis host has
memory_used_memory	Bytes	Total number of bytes allocated by Redis using its allocator (either standard libc, jemalloc, or an alternative allocator such as tcmalloc)
memory_used_memory_lua	Bytes	Number of bytes used by the Lua engine
memory_used_memory_rss	Bytes	Number of bytes that Redis allocated as seen by the operating system (a.k.a resident set size). This is the number reported by tools such as top(1) and ps(1)
persistance_aof_current_rewrite_time_sec	Seconds	Duration of the on-going AOF rewrite operation if any
persistance_aof_rewrite_in_progress	Number	Flag indicating a AOF rewrite operation is on-going
persistance_aof_rewrite_scheduled	Number	Flag indicating an AOF rewrite operation will be scheduled once the on-going RDB save is complete.
persistance_loading	Number	Flag indicating if the load of a dump file is on-going
persistance_rdb_bgsave_in_progress	Number	Flag indicating a RDB save is on-going
persistance_rdb_changes_since_last_save	Number	Number of changes since the last dump
persistance_rdb_current_bgsave_time_sec	Seconds	Duration of the on-going RDB save operation if any
replication_connected_slaves	Number	Number of connected replicas
replication_repl_backlog_histlen	Bytes	Size in bytes of the data in the replication backlog buffer
replication_repl_backlog_size	Bytes	Total size in bytes of the replication backlog buffer
server_uptime_in_seconds	Seconds	Number of seconds since Redis server start
stats_evicted_keys	Continous Counter	Number of evicted keys due to maxmemory limit
stats_expired_keys	Continous Counter	Total number of key expiration events. If there are no expirable keys, it can be an indication that you are not setting TTLs on keys. In such cases, when your instance data reaches the maxmemory-gb limit, there are no keys to evict which can result in an out of memory condition. If the metric shows many expired keys, but you still see memory pressure on your instance, you should lower maxmemory-gb.
stats_instantaneous_input	Number	The network read rate per second in KB/sec
stats_instantaneous_ops_per_sec	Number	Number of commands processed per second
stats_instantaneous_output	Number	The networks write rate per second in KB/sec
stats_keyspace_hits	Number	Number of successful lookup of keys in the main dictionary
stats_keyspace_misses	Number	Number of failed lookup of keys in the main dictionary
stats_latest_fork_usec	Number	Duration of the latest fork operation in microseconds
stats_migrate_cached_sockets	Number	The number of sockets open for MIGRATE purposes
stats_pubsub_channels	Number	Global number of pub/sub channels with client subscriptions
stats_pubsub_patterns	Number	Global number of pub/sub pattern with client subscriptions
stats_rejected_connections	Number	Number of connections rejected because of maxclients limit
stats_sync_full	Number	The number of full resyncs with replicas
stats_sync_partial_err	Number	The number of denied partial resync requests
stats_sync_partial_ok	Number	The number of accepted partial resync requests
stats_total_commands_processed	Number	Total number of commands processed by the server
stats_total_connections_received	Number	Total number of connections accepted by the server
stats_total_net_input_bytes	Bytes	The total number of bytes read from the network
stats_total_net_output_bytes	Bytes	The total number of bytes written to the network

Troubleshooting¶

vm.overcommit_memory is not set to 1
sysctl -w vm.overcommit_memory=1

kernel transparent_hugepage is not set to "madvise"
echo madvise > /sys/kernel/mm/transparent_hugepage/enabled

net.core.somaxconn is lower than net.ipv4.tcp_max_syn_backlog
tcp_max_syn_backlog represents the maximal number of connections in SYN_RECV queue. somaxconn represents the maximal size of ESTABLISHED queue and should be greater than tcp_max_syn_backlog, so do something like this: sysctl -w net.core.somaxconn=1024; sysctl -w net.ipv4.tcp_max_syn_backlog=512

Credits, License¶

Authors: Linuxfabrik GmbH, Zurich
License: The Unlicense, see LICENSE file.