System Monitoring Applications

When was the last time you knew what was normal for your server? Have you ever suspected that some process was eating up your resources? Have you ever left for vacation and wanted to know whether your web server had gone down? These are some of the questions that a good system monitoring application can help answer.

Before I left for vacation I installed Munin and Nagios, two mature system monitoring applications. Munin keeps track of various bits of system information: the number of Apache processes, system load, MySQL queries, and other useful statistics. Munin's benefit is that it keeps a historical record of system status and displays it as graphs by day, week, month, and year. This allows you to see what is normal for your servers and to spot anything out of the ordinary.
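To give a feel for how Munin collects a statistic, here is a minimal sketch of a Munin-style plugin for the one-minute load average. Munin calls a plugin with the argument "config" to learn how to draw the graph, and with no argument to fetch the current value. The function names render and main and the field name "load" are my own choices for illustration, and os.getloadavg() is only available on Unix-like systems.

```python
import os
import sys


def render(mode, load):
    """Return the text a Munin-style plugin would print for the given mode."""
    if mode == "config":
        # Describe the graph: title, axis label, category, and one data series.
        return "\n".join([
            "graph_title Load average",
            "graph_vlabel load",
            "graph_category system",
            "load.label load",
        ])
    # Default mode: report the current value of the series.
    return f"load.value {load:.2f}"


def main(argv):
    """Print plugin output for the mode given on the command line."""
    mode = argv[1] if len(argv) > 1 else ""
    one_minute, _, _ = os.getloadavg()  # Unix-only
    print(render(mode, one_minute))
```

A script built on this would end by calling main(sys.argv). Munin then polls it on a schedule and stores each value, which is where the historical graphs come from.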

Nagios is well suited to monitoring your system and services, determining whether there are problems, and sending alerts when it detects one. It can monitor your Apache services, SSH, disk space, and other critical applications. Though Nagios does let you review some historical data, Munin is better suited to that task. When I first installed it, Nagios was configured to monitor HTTP, SSH, root disk space, processes, number of users, and swap space. These checks were sufficient for my needs, but the installation didn't set up alerting. I was short on time, and Nagios's configuration is complicated, so I didn't get alerts set up before I left. This defeated the purpose of installing Nagios to monitor my server while I was on vacation, but I could still log in and check my system status if I wished.
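To make a check like "root disk space" more concrete, here is a rough sketch of one written in the style of a Nagios plugin. Nagios plugins report their result through the exit code (0 for OK, 1 for WARNING, 2 for CRITICAL, 3 for UNKNOWN) plus a one-line status message. This is not the check that ships with Nagios. The function names and the 80% and 90% thresholds are assumptions I picked for illustration.

```python
import shutil

# Exit codes follow the Nagios plugin convention.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def classify(percent_used, warn=80.0, crit=90.0):
    """Map a disk usage percentage to a plugin exit code."""
    if percent_used >= crit:
        return CRITICAL
    if percent_used >= warn:
        return WARNING
    return OK


def check_disk(path="/", warn=80.0, crit=90.0):
    """Return (exit_code, status_line) for the filesystem holding path."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        # The path is missing or unreadable, so the state cannot be determined.
        return UNKNOWN, f"DISK UNKNOWN - {exc}"
    if usage.total == 0:
        return UNKNOWN, f"DISK UNKNOWN - {path} reports zero size"
    percent_used = 100.0 * usage.used / usage.total
    code = classify(percent_used, warn, crit)
    label = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL"}[code]
    return code, f"DISK {label} - {percent_used:.1f}% used on {path}"
```

A script built on this would print the status line and pass the code to sys.exit(). Nagios decides whether to raise an alert based on that exit code, which is why a check without alerting configured only helps if you log in and look.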

One caveat to consider when installing Nagios on a single server: if you are monitoring the localhost and something goes wrong that makes the system inaccessible, Nagios can't do much to alert you, because it goes down along with everything else. The best approach is to set up a separate monitoring system. If you only have a single piece of hardware, using Xen or another virtualization product would be a good idea.
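The idea of watching the server from somewhere else can be sketched in a few lines. The example below is meant to run on a separate machine and simply tries to open a TCP connection to each service. It is a bare-bones illustration and not a replacement for Nagios. The function names are my own, and the host name in the usage note is a placeholder.

```python
import socket


def port_is_open(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name lookup failures.
        return False


def check_services(host, services, timeout=5.0):
    """Return the names of services on host that did not accept a connection."""
    return [
        name
        for name, port in services.items()
        if not port_is_open(host, port, timeout)
    ]
```

For example, check_services("server.example.com", {"http": 80, "ssh": 22}) returns a list of the services that could not be reached, and an empty list when both answered. Because the check runs on another machine, it can still report a failure when the monitored server is completely unreachable.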

Remember that even with good monitoring software installed, Murphy's law applies. This means that your monitoring system will fail before any of your other systems.