OS / General

| Trigger | Level | Discovery | Key | Description | Fix |
|---|---|---|---|---|---|
| Os : General nodata | high | n/a | k.os.knock.nodata | This trigger fires when we have no data for a specific server (meaning this server seems to be down). Please note that this trigger acts as a master trigger for all no data triggers. | Check this was expected, then check the server, your network, connectivity, and the knock daemon on this server. |
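
The "master trigger" behaviour described above can be pictured as a dependency filter: when the master fires, the other no data alerts are hidden because they share the same root cause. The sketch below is only an illustration under that assumption; `visible_alerts` and the dependent key names passed to it are hypothetical and not part of the daemon.

```python
from typing import Iterable, List

MASTER_KEY = "k.os.knock.nodata"  # key taken from the table above


def visible_alerts(firing: Iterable[str], nodata_keys: Iterable[str]) -> List[str]:
    """Return the alerts to show, hiding dependent no data alerts when the master fires.

    `nodata_keys` is the (hypothetical) list of alerts that depend on the master.
    """
    firing = list(firing)
    if MASTER_KEY not in firing:
        # Master is quiet: every firing alert is shown as-is.
        return firing
    dependents = set(nodata_keys) - {MASTER_KEY}
    # Master fires: keep it, drop the dependent no data alerts, keep the rest.
    return [key for key in firing if key not in dependents]
```

With the master firing, a server that is down produces one alert instead of one alert per monitored item.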

OS / Disks

| Trigger | Level | Discovery | Key | Description | Fix |
|---|---|---|---|---|---|
| Hdd.Health no data | high | Per disk | k.hard.hd.health, no data for 7200 sec | This trigger fires when a disk has been removed or has disappeared, and we have had no data for more than 7200 sec. | Check this was expected and/or replace the disk. |
| Hdd.Health status | high | Per disk | k.hard.hd.health != OK | This trigger fires when the SMART status for a disk is invalid. | Check this was expected and/or replace the disk. |
| Hdd re-allocated sectors | high | Per disk | k.hard.hd.reallocated_sector_ct > 0 | This trigger fires when a disk starts re-allocating sectors. This usually indicates an impending disk failure. | Check the disk, replace it if necessary. |
| Hdd serial number modified | high | Per disk | k.hard.hd.serial_number modified | This trigger fires when a disk serial number has changed. | Check this was expected. |
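
A "no data for 7200 sec" condition boils down to comparing the age of the last received value with a maximum age. The sketch below shows that comparison only; `is_no_data` is a hypothetical helper, and how the monitoring server actually tracks the last received timestamp is not covered by this document.

```python
import time
from typing import Optional

NO_DATA_SEC = 7200  # threshold from the "Hdd.Health no data" row


def is_no_data(last_seen: float, now: Optional[float] = None,
               max_age: float = NO_DATA_SEC) -> bool:
    """Return True when the last value is older than `max_age` seconds.

    `last_seen` and `now` are Unix timestamps; `now` defaults to the current time.
    """
    if now is None:
        now = time.time()
    return (now - last_seen) > max_age
```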

OS / Network

| Trigger | Level | Discovery | Key | Description | Fix |
|---|---|---|---|---|---|
| Network full duplex | high | Per interface | k.net.if.status != full | This trigger fires when a network interface is not in full duplex mode. | Check configuration and restore full duplex mode. |
| Network speed | average | Per interface | k.net.if.status speed < 1000 | This trigger fires when a network interface speed is lower than 1 Gbit/s. | Check configuration and connectivity, and restore at least 1 Gbit/s interface speed. |
| Network status | average | Per interface | k.net.if.status != ok | This trigger fires when a network interface operational status is not ok. | Check configuration and connectivity, and restore interface status. |

OS / Ping

| Trigger | Level | Discovery | Key | Description | Fix |
|---|---|---|---|---|---|
| Ping delay | info | Per ping ip | k.ping.delay > 0.5 | This trigger fires when ping delay toward the specified IP is greater than 0.5 seconds. | Check your network, local and remote devices. |
| Ping delay | high | Per ping ip | k.ping.delay > 1 | This trigger fires when ping delay toward the specified IP is greater than 1 second. | Check your network, local and remote devices. |
| Ping delay no data | disaster | Per ping ip | k.ping.delay, no data for 7200 sec | This trigger fires when we have no ping data for at least 300 seconds toward the specified IP. | Check your network, local and remote devices. |
| Ping lost | info | Per ping ip | k.ping.lost > 0 | This trigger fires when ping packet loss is detected (> 0). | Check your network, local and remote devices. |
| Ping lost | high | Per ping ip | k.ping.lost > 25 | This trigger fires when ping packet loss is detected (> 25). | Check your network, local and remote devices. |
| Ping lost no data | disaster | Per ping ip | k.ping.lost, no data for 7200 sec | This trigger fires when we have no ping loss data for at least 300 seconds toward the specified IP. | Check your network, local and remote devices. |
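
Several triggers above share a key and differ only in threshold and level. One way to read them is "report the most severe level whose threshold is exceeded". The sketch below illustrates that reading; `level_for` and the two constant lists are hypothetical names, and the unit of `k.ping.lost` (count or percentage) is not stated in this document.

```python
from typing import List, Optional, Tuple

# Thresholds copied from the table above, most severe first.
DELAY_LEVELS: List[Tuple[float, str]] = [(1.0, "high"), (0.5, "info")]  # seconds
LOSS_LEVELS: List[Tuple[float, str]] = [(25.0, "high"), (0.0, "info")]  # same unit as k.ping.lost


def level_for(value: float, levels: List[Tuple[float, str]]) -> Optional[str]:
    """Return the most severe level whose threshold is strictly exceeded, else None."""
    for threshold, level in levels:
        if value > threshold:
            return level
    return None
```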

OS / File systems

| Trigger | Level | Discovery | Key | Description | Fix |
|---|---|---|---|---|---|
| Free disk space | high | Per volume | k.vfs.fs.size, % free < 15 | This trigger fires when free disk space on the specified volume is lower than 15%. | Check your disk usage, remove some files, archive some files, upgrade volume capacity. |
| Free disk space | warn | Per volume | k.vfs.fs.size, % free < 30 | This trigger fires when free disk space on the specified volume is lower than 30%. | Check your disk usage, remove some files, archive some files, upgrade volume capacity. |
| Free inode space | high | Per volume | k.vfs.fs.inode, % free < 15 | This trigger fires when free inode space on the specified volume is lower than 15%. | Check your disk usage, remove some files, archive some files, upgrade inode capacity (may be difficult depending on the underlying FS). |
| Free inode space | warn | Per volume | k.vfs.fs.inode, % free < 30 | This trigger fires when free inode space on the specified volume is lower than 30%. | Check your disk usage, remove some files, archive some files, upgrade inode capacity (may be difficult depending on the underlying FS). |
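
The "% free" conditions above can be checked by hand from raw free and total figures. The following is a minimal sketch of that arithmetic with the 15% and 30% thresholds from the table; `percent_free` and `free_space_level` are hypothetical helpers, not the daemon's own code.

```python
from typing import Optional


def percent_free(free: int, total: int) -> float:
    """Return the free share of a volume (bytes or inodes) as a percentage."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * free / total


def free_space_level(pct_free: float) -> Optional[str]:
    """Map a free percentage to the level used in the table above, or None."""
    if pct_free < 15:
        return "high"
    if pct_free < 30:
        return "warn"
    return None
```

For a quick manual check, the inputs can come from `shutil.disk_usage(path)` (fields `free` and `total`) for disk space, or from `os.statvfs(path)` (fields `f_ffree` and `f_files`) for inodes on Unix systems.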

OS / Cpu

| Trigger | Level | Discovery | Key | Description | Fix |
|---|---|---|---|---|---|
| Os.Cpu : CPU is overloaded | warn | n/a | k.os.cpu.util, idle < 10% | This trigger fires when idle CPU is below 10%. | Check this was expected, and check server, process, and user activity. |
| Os.Cpu : Load is too high | warn | n/a | k.os.cpu.load, per cpu > 5 | This trigger fires when server load, per CPU, is greater than 5. | Check this was expected, and check server, process, and user activity. |
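
"Load, per cpu" means the system load average divided by the number of CPUs, so a load of 12 on a 2-CPU server gives 6 and exceeds the threshold. The sketch below shows that normalization; the function names are hypothetical, and which load average (1, 5 or 15 minutes) the daemon uses is not stated here.

```python
LOAD_PER_CPU_MAX = 5.0  # threshold from the "Load is too high" row


def load_per_cpu(load: float, cpu_count: int) -> float:
    """Normalize a load average by the number of CPUs."""
    if cpu_count < 1:
        raise ValueError("cpu_count must be at least 1")
    return load / cpu_count


def load_too_high(load: float, cpu_count: int) -> bool:
    """Return True when the per-CPU load exceeds the table threshold."""
    return load_per_cpu(load, cpu_count) > LOAD_PER_CPU_MAX
```

On a Unix server, `os.getloadavg()[0]` and `os.cpu_count()` are one possible source for the two inputs.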

OS / Misc

| Trigger | Level | Discovery | Key | Description | Fix |
|---|---|---|---|---|---|
| Os.Host : /etc/passwd has been changed | warn | n/a | k.vfs.file.cksum has changed | This trigger fires when /etc/passwd is modified. | Check this was expected. |
| Os.Host : Configured max number of opened files is too low | info | n/a | k.os.maxfiles < 1025 | This trigger fires when the configured maximum number of open files is too low. | Check your server configuration and tuning. |
| Os.Host : Configured max number of processes is too low | info | n/a | k.os.maxproc < 1025 | This trigger fires when the configured maximum number of processes is too low. | Check your server configuration and tuning. |
| Os.Host : Time Diff | average | n/a | k.os.timediff > 1 | This trigger fires when the server time difference is greater than 1 second. | Check your server time synchronization. |
| Os.Host : Time Diff | disaster | n/a | k.os.timediff > 2 | This trigger fires when the server time difference is greater than 2 seconds. | Check your server time synchronization. |
| Os.Host : Server has just been restarted | info | n/a | k.os.uptime < 600 | This trigger fires when the server was restarted less than 600 seconds ago. | Check this was expected. |
| Os.Memory : Low available memory | average | n/a | k.os.memory.size available < 16MB and k.os.memory.size cached < 32MB | This trigger fires when the server is in a low memory condition (no more available and cached memory). | Check your server and process memory usage. Increase memory. |
| Os.Swap : Lack of free swap space | high | n/a | k.os.swap.size, % free < 25 | This trigger fires when free swap space is less than 25%. | Check your server and process memory usage. Increase memory. Please note that swap is hell, fix it. |
| Os.Swap : Lack of free swap space | average | n/a | k.os.swap.size, % free < 50 | This trigger fires when free swap space is less than 50%. | Check your server and process memory usage. Increase memory. Please note that swap is hell, fix it. |
| Os.Swap : Lack of free swap space | info | n/a | k.os.swap.size, % free < 75 | This trigger fires when free swap space is less than 75%. | Check your server and process memory usage. Increase memory. Please note that swap is hell, fix it. |
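
The /etc/passwd trigger relies on a file checksum changing between two collections. The sketch below illustrates the idea with SHA-256; the checksum algorithm actually used behind `k.vfs.file.cksum` is not stated in this document, and `file_checksum` and `has_changed` are hypothetical helpers.

```python
import hashlib
from pathlib import Path


def file_checksum(path: str) -> str:
    """Return the SHA-256 hex digest of a file's content (algorithm is an assumption)."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def has_changed(path: str, previous_checksum: str) -> bool:
    """Return True when the file content no longer matches the stored checksum."""
    return file_checksum(path) != previous_checksum
```

Storing the checksum from one run and passing it to `has_changed` on the next run reproduces the "has changed" condition for a single file.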

OS / Dns

| Trigger | Level | Discovery | Key | Description | Fix |
|---|---|---|---|---|---|
| Dns status | high | Per hostname / dns server | k.dns.resolv==KO | This trigger fires when DNS resolution for the specified hostname, toward the specified DNS server, is not ok (invalid reply, resolution failed, timeout). | Possible remote DNS server issue, platform connectivity issue, DNS entry issue, DNS resolver configuration issue. |

OS / Process

| Trigger | Level | Discovery | Key | Description | Fix |
|---|---|---|---|---|---|
| CheckProcess : pidfile for process | average | Per process monitored | k.proc.pidfile!=ok | This trigger fires when the process pidfile is not ok. | Possible process down and/or crashed, pidfile deleted, process not stopped. |
| CheckProcess : running for process | average | Per process monitored | k.proc.running!=ok | This trigger fires when the process is not running. | Possible process down and/or crashed. |

WEB / Nginx

| Trigger | Level | Discovery | Key | Description | Fix |
|---|---|---|---|---|---|
| Nginx started | high | Per instance | k.nginx.started!=1 | This trigger fires when nginx status is down (no valid reply, no status reply, http timeout). | Possible nginx status not correctly deployed (check daemon logs), possible nginx workers overload, possible nginx instance failure or stopped. |

WEB / Apache

| Trigger | Level | Discovery | Key | Description | Fix |
|---|---|---|---|---|---|
| Apache idle workers | average | Per instance | k.apache.stat.idle_workers==0 | This trigger fires when no more apache idle workers are available (meaning that all apache workers are in use). | Possible platform slowdown or issue. May require apache tuning, server code optimization, platform optimization, benchmarking. |
| Apache started | high | Per instance | k.apache.started!=1 | This trigger fires when apache status is down (no valid reply, no status reply, http timeout). | Possible apache status not correctly deployed (check daemon logs), possible apache workers overload, possible apache instance failure or stopped. |

WEB / PhpFpm

| Trigger | Level | Discovery | Key | Description | Fix |
|---|---|---|---|---|---|
| PhpFpm idle processes | average | Per instance | k.phpfpm.idle_processes==0 | This trigger fires when no more PhpFpm idle processes are available (meaning that all processes are in use). | Possible platform slowdown or issue. May require PhpFpm tuning, server code optimization, platform optimization, benchmarking. |
| PhpFpm started | high | Per instance | k.phpfpm.started!=1 | This trigger fires when PhpFpm status is down (no valid reply, no status reply, http timeout). | Possible PhpFpm status not correctly deployed (check daemon logs), possible PhpFpm pool overload, possible PhpFpm instance failure or stopped, possible upstream web server issue (Nginx, Apache...). |
| PhpFpm restarted | info | Per instance | k.phpfpm.start_since<600 | This trigger fires when PhpFpm has been restarted recently (<600 seconds). | Check instance restart was expected. |

WEB / Uwsgi

| Trigger | Level | Discovery | Key | Description | Fix |
|---|---|---|---|---|---|
| Uwsgi overload | average | Per instance | k.uwsgi.cores.cur.idle==0 | This trigger fires when no more idle cores are available (meaning that all cores are in use). | Possible platform slowdown or issue. May require uwsgi tuning, server code optimization, platform optimization, benchmarking. |
| Uwsgi started | high | Per instance | k.uwsgi.started!=1 | This trigger fires when uwsgi stat is down (no valid reply, no stats socket reply, stat timeout). | Possible uwsgi stat not correctly deployed (check daemon logs), possible server overload, possible uwsgi instance failure or stopped. |

WEB / Varnish

| Trigger | Level | Discovery | Key | Description | Fix |
|---|---|---|---|---|---|
| Varnish main.uptime | info | Per instance | k.varnish.main.uptime<600 | This trigger fires when the main varnish process has been restarted recently (<600 seconds). | Check instance restart was expected. |
| Varnish mgt.uptime | info | Per instance | k.varnish.mgt.uptime<600 | This trigger fires when the mgt varnish process has been restarted recently (<600 seconds). | Check instance restart was expected. |
| Varnish started | high | Per instance | k.varnish.started!=1 | This trigger fires when varnishstat is down (no valid reply, no invoke reply, invoke timeout). | Possible varnishstat not correctly installed (check daemon logs), possible server overload, possible varnish instance failure or stopped. |
| Varnish backend_busy | average | Per instance | k.varnish.backend_busy>0 | This trigger fires when some busy backends are detected. | You may have to check your underlying backends. |
| Varnish backend_fail | average | Per instance | k.varnish.backend_fail>0 | This trigger fires when some failed backends are detected. | You may have to check your underlying backends. |
| Varnish cur.thread_queue_len | average | Per instance | k.varnish.cur.thread_queue_len>0 | This trigger fires when some sessions are waiting for available threads. | Possible varnish thread pool tuning required, possible server and/or backend slowdown. |
| Varnish sess_drop | average | Per instance | k.varnish.sess_drop>0 | This trigger fires when sessions are silently dropped due to a lack of worker threads. | Possible varnish thread pool tuning required, possible server and/or backend slowdown. |
| Varnish sess_dropped | average | Per instance | k.varnish.sess_dropped>0 | This trigger fires when sessions are dropped because the queue was already too long. | Possible varnish thread pool tuning required, possible server and/or backend slowdown. |
| Varnish sess_fail | average | Per instance | k.varnish.sess_fail>0 | This trigger fires when a TCP accept fails. This can be caused by the client, or by the server running out of a resource such as file descriptors. | Possible OS and/or varnish process tuning required, possible server and/or backend slowdown. |
| Varnish sess_queued | average | Per instance | k.varnish.sess_queued>0 | This trigger fires when sessions are queued waiting for a thread. | Possible varnish process tuning required, possible server and/or backend slowdown. |
| Varnish threads_failed | average | Per instance | k.varnish.threads_failed>0 | This trigger fires when creating a thread failed. | This is a real issue; investigate the varnish instance (open file limits?). |
| Varnish threads_limited | average | Per instance | k.varnish.threads_limited>0 | This trigger fires when the thread pool is maxed out. | Possible varnish thread pool tuning required, possible server and/or backend slowdown. |

SQL / Mysql

| Trigger | Level | Discovery | Key | Description | Fix |
|---|---|---|---|---|---|
| Mysql started | high | Per instance | k.mysql.started!=1 | This trigger fires when the daemon is not able to connect and/or execute basic SQL statements toward the Mysql instance (SQL timeout, SQL failure, instance down). | Possible debian-sys-maint account issue, Mysql instance down or crashed. |
| Mysql Replication Threads | high | Per instance | k.mysql.repli.cur.lag_sec<0 | This trigger fires when a replication setup is detected, but replication threads are not running. | Possible broken replication, master server connectivity issue, replication threads crash, instance issue. |
| Mysql Replication Lag | info | Per instance | k.mysql.repli.cur.lag_sec>600 | This trigger fires when a replication setup is detected and replication is up, but the slave server lags (600+ seconds) behind the master. | Possible slave server slowdown, possible huge SQL requests being processed on the slave server, replicating a table without a PK (involving massive full scans on the slave server)... |
| Mysql Replication Lag | warn | Per instance | k.mysql.repli.cur.lag_sec>3600 | This trigger fires when a replication setup is detected and replication is up, but the slave server lags (3600+ seconds) behind the master. | Possible slave server slowdown, possible huge SQL requests being processed on the slave server, replicating a table without a PK (involving massive full scans on the slave server)... |
| Mysql Replication Lag | high | Per instance | k.mysql.repli.cur.lag_sec>7200 | This trigger fires when a replication setup is detected and replication is up, but the slave server lags (7200+ seconds) behind the master. | Possible slave server slowdown, possible huge SQL requests being processed on the slave server, replicating a table without a PK (involving massive full scans on the slave server)... |
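
The four replication rows all read the same key, `k.mysql.repli.cur.lag_sec`: per the table, a negative value means the replication threads are not running, and positive values are compared with the 600, 3600 and 7200 second thresholds. The sketch below is one way to express that table in code; `replication_alert` is a hypothetical helper and not the daemon's implementation.

```python
from typing import Optional, Tuple

# Lag thresholds in seconds, most severe first (values from the table above).
LAG_LEVELS = [(7200, "high"), (3600, "warn"), (600, "info")]


def replication_alert(lag_sec: float) -> Optional[Tuple[str, str]]:
    """Return (trigger name, level) for a lag value, or None when nothing fires."""
    if lag_sec < 0:
        # Negative lag is the "threads not running" case in the table.
        return ("Mysql Replication Threads", "high")
    for threshold, level in LAG_LEVELS:
        if lag_sec > threshold:
            return ("Mysql Replication Lag", level)
    return None
```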

NOSQL / Redis

| Trigger | Level | Discovery | Key | Description | Fix |
|---|---|---|---|---|---|
| Redis started | high | Per instance | k.redis.started!=1 | This trigger fires when redis status is down (no valid reply, no INFO reply, socket timeout). | Possible instance overload, server slowdown, possible redis instance failure or stopped. |
| Redis uptime | info | Per instance | k.redis.uptime_in_seconds<600 | This trigger fires when Redis has been restarted recently (<600 seconds). | Check instance restart was expected. |
| Redis replication down | high | Per instance | k.redis.master_link_down_since_seconds>60 | This trigger fires when a replication setup is detected and replication toward the master has been down for 60+ seconds. | Possible master connectivity issue, possible replication issue. |
| Redis rdb save | average | Per instance | k.redis.rdb_last_bgsave_status!=ok | This trigger fires if the last rdb save was not ok. | Possible instance issue, possible disk full. |
| Redis aof save (write) | average | Per instance | k.redis.aof_last_write_status!=ok | This trigger fires if the last aof write was not ok. | Possible instance issue, possible disk full. |
| Redis aof save (rewrite) | average | Per instance | k.redis.aof_last_bgrewrite_status!=ok | This trigger fires if the last aof rewrite was not ok. | Possible instance issue, possible disk full. |
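
The Redis keys above look like fields of the Redis `INFO` reply with a `k.redis.` prefix; that mapping is an assumption, as this document does not describe how the daemon builds its keys. Under that assumption, the sketch below parses raw `INFO` text (lines of `field:value`, with `#` section headers) and lists the persistence fields that are not `ok`. `parse_info` and `failed_persistence_checks` are hypothetical helpers.

```python
from typing import Dict, List

# INFO fields matching the three persistence triggers in the table above.
PERSISTENCE_FIELDS = [
    "rdb_last_bgsave_status",
    "aof_last_write_status",
    "aof_last_bgrewrite_status",
]


def parse_info(text: str) -> Dict[str, str]:
    """Parse raw INFO output into a dict, skipping blank lines and '# Section' headers."""
    fields: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition(":")
        if sep:
            fields[key] = value
    return fields


def failed_persistence_checks(info: Dict[str, str]) -> List[str]:
    """Return the persistence fields whose status is not 'ok'.

    A missing field is treated as 'ok' here; that choice is an assumption.
    """
    return [name for name in PERSISTENCE_FIELDS if info.get(name, "ok") != "ok"]
```

The input text can be obtained by hand with `redis-cli INFO persistence` when investigating one of these triggers.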

NOSQL / MemCached

| Trigger | Level | Discovery | Key | Description | Fix |
|---|---|---|---|---|---|
| MemCached started | high | Per instance | k.memcached.started!=1 | This trigger fires when memcached status is down (no valid reply, no stats reply, socket timeout). | Possible instance overload, server slowdown, possible memcached instance failure or stopped. |
| MemCached uptime | info | Per instance | k.memcached.uptime<600 | This trigger fires when MemCached has been restarted recently (<600 seconds). | Check instance restart was expected. |
| MemCached accepting stop | info | Per instance | k.memcached.listen_disabled_num>0 | This trigger fires when MemCached has stopped accepting connections recently. | Possible instance overload, possible maxconns tuning required, possible TCP stack tuning required. |
| MemCached accepting disabled | high | Per instance | k.memcached.accepting_conns!=1 | This trigger fires when MemCached does not accept connections any more. | Possible instance overload, crash, misconfiguration, or bug. |
| MemCached auth errors | info | Per instance | k.memcached.auth_errors>0 | This trigger fires when MemCached authentication errors have occurred. | Possible configuration issue (at client end or at server end), possible break-in attempts. |