
I like Icinga's model: it can run a small agent on the server, and the agent doesn't run as root. I grant specific sudo rules for the checks that need elevated permissions.

I find it easier to write custom checks for things where I don't control the application. My custom checks often do API calls for the applications they monitor (using curl locally against their own API).
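To give an idea of what such a custom check can look like, here is a minimal sketch in Python rather than curl. The endpoint URL, the "queue_depth" field and the thresholds are all made up for illustration; only the exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) follow the usual Nagios/Icinga plugin convention.

```python
#!/usr/bin/env python3
"""Hypothetical Nagios/Icinga-style check that polls an application's local API."""
import json
import sys
import urllib.request

# Conventional monitoring-plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def evaluate(payload, warn=100, crit=500):
    """Map a decoded JSON payload to (exit_code, status_line).

    The "queue_depth" field and both thresholds are invented for this example.
    """
    if not isinstance(payload, dict):
        return UNKNOWN, "UNKNOWN - unexpected response shape"
    depth = payload.get("queue_depth")
    if not isinstance(depth, (int, float)):
        return UNKNOWN, "UNKNOWN - queue_depth missing from response"
    # Performance data after the pipe: label=value;warn;crit
    perf = f"queue_depth={depth};{warn};{crit}"
    if depth >= crit:
        return CRITICAL, f"CRITICAL - queue depth {depth} | {perf}"
    if depth >= warn:
        return WARNING, f"WARNING - queue depth {depth} | {perf}"
    return OK, f"OK - queue depth {depth} | {perf}"


def main(url="http://127.0.0.1:8080/api/status"):  # assumed local endpoint
    """Query the application's own API and print one status line."""
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            payload = json.load(resp)
    except Exception as exc:  # connection errors, timeouts, invalid JSON
        print(f"CRITICAL - cannot query {url}: {exc}")
        return CRITICAL
    code, line = evaluate(payload)
    print(line)
    return code
```

To run it as a plugin, end the file with `sys.exit(main())`. Keeping the threshold logic in `evaluate` means it can be tested without the application running.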

There are also lots of existing scripts I can re-use, from either the Icinga or the Nagios community, so I don't have to write my own.

For example, I recently added systemd monitoring. There is a package for the check (monitoring-plugins-systemd), so I used Ansible to install it everywhere and then "apply" a conf to all my Debian servers. It helped me find a bunch of failing services and timers that had previously gone unnoticed, including backups: my backup monitoring said everything was OK, but the systemd service for borgmatic was running a "check" and found some corruption.
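The idea behind such a check is simple enough to sketch. This is not how monitoring-plugins-systemd works internally (I haven't shown its code here); it is just a rough Python illustration that asks systemctl for failed units and turns the answer into a plugin-style status. The function names are my own.

```python
import subprocess


def failed_units(listing):
    """Extract unit names from `systemctl list-units --plain --no-legend` output.

    With --plain and --no-legend, each non-empty line starts with the unit name.
    """
    units = []
    for line in listing.splitlines():
        fields = line.split()
        if fields:
            units.append(fields[0])
    return units


def status_line(units):
    """Return (exit_code, message) using the usual plugin exit codes."""
    if units:
        return 2, "CRITICAL - failed units: " + ", ".join(units)
    return 0, "OK - no failed units"


def query_systemd():
    """Ask systemd for units currently in the failed state."""
    result = subprocess.run(
        ["systemctl", "list-units", "--state=failed", "--plain", "--no-legend"],
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout
```

Calling `status_line(failed_units(query_systemd()))` on a host would have flagged something like a failed borgmatic service, which is exactly the kind of thing an application-level backup check can miss.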

For logs I use Promtail/Loki, which is also very much worth the investment. It is useful for detecting elevated error rates and for finding slow HTTP queries (again, I don't fully control the code of the applications I manage).


