Monitoring agents don’t always report what you expect, and sometimes they don’t report at all. The NiTO agent is, after all, software that runs on machines, and machines break, which is why we need to monitor them in the first place.
When an agent can’t function at all, NiTO lets you know via alerts (by default). These types of issues usually come down to either a problem with the system itself or an inability to communicate with our central servers over the Internet. A multitude of things can go wrong with machines, and that’s where you, the human, come in.
Here are a few scenarios, along with steps you can take to troubleshoot and resolve any underlying issues that may be preventing the NiTO agent from running.
Problem: The agent appears to be running, but NiTO says it’s ‘Down’ or ‘Not Communicating’:
Summary: The NiTO agent is likely collecting data, but cannot transfer it to NiTO’s central servers.
Resolution Option 1:
The NiTO agent must be able to communicate over HTTPS (Hypertext Transfer Protocol Secure) with https://mediator.nito.net/
If you cannot connect to the above URL in a browser (or wget/curl) from the same machine as the agent, this would indicate a networking problem on your end that you’ll need to troubleshoot independently of NiTO.
If an HTTPS GET request to the above URL returns the following response and the problem persists, see Resolution Option 2:
Connected to NiTO mediator (but this is not a fulfillable request from a NiTO agent)
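To repeat the check above from a script rather than a browser, the following is a minimal sketch in POSIX sh. It assumes curl is installed on the agent’s machine; the function names and the MEDIATOR_URL override are hypothetical helpers for illustration, and only the URL and the expected response text come from this article.

```shell
#!/bin/sh
# Sketch: repeat the browser check for the NiTO mediator from a script.
# Only the URL and the expected banner come from this article; the
# function names and the MEDIATOR_URL override are illustrative.
MEDIATOR_URL="${MEDIATOR_URL:-https://mediator.nito.net/}"
EXPECTED_BANNER="Connected to NiTO mediator"

# Print "ok" if the response body contains the expected banner,
# otherwise "unexpected" (something other than the mediator answered).
classify_mediator_response() {
  case "$1" in
    *"$EXPECTED_BANNER"*) echo "ok" ;;
    *) echo "unexpected" ;;
  esac
}

# Fetch the URL with an HTTPS GET (10-second limit) and classify the body.
check_mediator() {
  if ! body=$(curl -sS --max-time 10 "$MEDIATOR_URL"); then
    echo "unreachable"   # DNS, routing, firewall, TLS or timeout failure
    return 1
  fi
  classify_mediator_response "$body"
}
```

Running ‘check_mediator’ prints ‘ok’ when the expected response comes back (continue with Resolution Option 2), ‘unreachable’ when curl can’t complete the HTTPS request (a networking problem on your end), and ‘unexpected’ when something else answered, for example a proxy.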
Resolution Option 2:
Although it isn’t always installed, the telnet utility is readily available on both Windows and Linux-based systems (e.g., via ‘Add or Remove Programs’ on Windows, or ‘yum install telnet’ / ‘apt-get install telnet’ on Linux).
The NiTO agent must be able to communicate with collector.nito.net on port 443. If you cannot connect to this host from the same machine as the agent, this would indicate a networking problem on your end that you’ll need to troubleshoot independently of NiTO.
If the following command connects (i.e., says “Connected to collector.nito.net”, or presents a blank screen prompt), see Resolution Option 3 (Linux) or Resolution Option 4 (Windows):
telnet collector.nito.net 443
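If telnet isn’t installed, or you’d prefer a check that exits on its own instead of leaving an interactive session open, a sketch like the following performs the same TCP connection test. It is Linux-oriented and assumes bash (for its /dev/tcp redirection) and the coreutils ‘timeout’ command are available; the function name and the 5-second limit are illustrative choices, not part of NiTO.

```shell
#!/bin/sh
# Sketch: a non-interactive alternative to "telnet collector.nito.net 443".
# Assumes bash (for its /dev/tcp redirection) and coreutils' timeout are
# installed; the function name is illustrative.
port_is_open() {
  # Exit status 0 only if a TCP connection to host $1, port $2 is
  # established within 5 seconds; connection error messages are discarded.
  timeout 5 bash -c 'exec 3<>"/dev/tcp/$1/$2"' port_is_open "$1" "$2" 2>/dev/null
}
```

For example, ‘port_is_open collector.nito.net 443 && echo reachable’ prints ‘reachable’ only when a connection is established, which corresponds to telnet reporting “Connected to collector.nito.net”.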
Resolution Option 3 (Linux only):
When the NiTO agent is running, you’ll usually see two processes: a main supervisor process and a child worker process that it starts. If the child process can’t start properly, it will likely sleep for a minute and then exit, at which point the supervisor process starts it again to retry.
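The supervisor and child processes described above can also be observed without parsing ps output. The sketch below reads process names from /proc (Linux only) and assumes the agent’s processes are named ‘nito’, as they appear in ps output; ‘pids_named’ and ‘watch_nito’ are hypothetical helper names, and the 12 samples at 10-second intervals simply approximate “a minute or two”.

```shell
#!/bin/sh
# Sketch (Linux only): list and watch the processes named "nito".
# Assumes the agent's process name (as stored in /proc/<pid>/comm) is "nito";
# the function names below are illustrative, not part of the NiTO agent.

# Print the PID of every process whose name equals $1 (default: nito).
pids_named() {
  want=${1:-nito}
  for dir in /proc/[0-9]*; do
    # A process can exit between the glob and the read; skip it if so.
    comm=$(cat "$dir/comm" 2>/dev/null) || continue
    if [ "$comm" = "$want" ]; then
      echo "${dir#/proc/}"
    fi
  done
}

# Print a timestamped PID list every 10 seconds for about two minutes.
# A PID that disappears and is replaced by a new one means the child
# worker is exiting and being restarted by the supervisor.
watch_nito() {
  n=0
  while [ "$n" -lt 12 ]; do
    echo "$(date +%H:%M:%S) $(pids_named nito | tr '\n' ' ')"
    n=$((n + 1))
    sleep 10
  done
}
```

Calling ‘watch_nito’ prints one line every 10 seconds; if one of the PIDs changes between lines, the child worker is exiting and being restarted, which is the situation in which the signal command below is most useful.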
If you see two ‘nito’ processes running (‘ps aux | grep nito’), watch for a minute or two to see whether the child process exits and a new one starts shortly afterwards. It’s best to issue the following command within a few seconds after a new child process starts (or at any time if the processes don’t seem to be stopping and starting):
kill -USR1 $(cat /var/run/nito.pid)
This should generate a temporary log file at /opt/nito/var/nito_crashlog. Move this file somewhere you can inspect it for clues and/or submit it to NiTO support for review. (The NiTO agent doesn’t write logs to disk during normal operation, to minimize overhead on your system.)
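The signal-and-collect steps above can be combined into one function. This is a minimal sketch, assuming it runs as root (or via sudo) so it can signal the agent and move the file; the PID file and crash log paths are the ones given in this article, while the function name, the 30-second wait and the timestamped destination (defaulting to $HOME) are illustrative assumptions.

```shell
#!/bin/sh
# Sketch: signal the NiTO agent, wait for its crash log, then move the log
# to a timestamped file. Arguments (all optional):
#   $1  PID file   (default /var/run/nito.pid, from this article)
#   $2  crash log  (default /opt/nito/var/nito_crashlog, from this article)
#   $3  destination directory (default $HOME; an assumption)
collect_crashlog() {
  pid_file=${1:-/var/run/nito.pid}
  crashlog=${2:-/opt/nito/var/nito_crashlog}
  dest_dir=${3:-$HOME}

  [ -r "$pid_file" ] || { echo "cannot read $pid_file" >&2; return 1; }
  pid=$(cat "$pid_file")
  # kill -0 sends no signal; it only checks that the process exists and
  # that we are allowed to signal it.
  kill -0 "$pid" 2>/dev/null || { echo "cannot signal process $pid" >&2; return 1; }
  kill -USR1 "$pid" || return 1

  # Wait up to 30 seconds for a non-empty crash log. A log left over from an
  # earlier run satisfies this check immediately, so move old logs first.
  waited=0
  while [ ! -s "$crashlog" ] && [ "$waited" -lt 30 ]; do
    sleep 1
    waited=$((waited + 1))
  done
  [ -s "$crashlog" ] || { echo "no crash log after ${waited}s" >&2; return 1; }

  dest="$dest_dir/nito_crashlog.$(date +%Y%m%d-%H%M%S)"
  mv "$crashlog" "$dest" && echo "$dest"
}
```

On success, ‘collect_crashlog’ prints the path of the moved log, ready to inspect or attach to a support request.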
Resolution Option 4 (Windows only):
N/A (no Windows-specific steps are documented yet).