BroCtl status/top take an excessive amount of time

Description

After running a large Bro cluster for a few days on a FreeBSD system (FreeBSD 10.1, 28 physical nodes, 81 worker processes), broctl actions that interact with all nodes seem to take an excessive amount of time (more than 2 minutes for a broctl status). This was not the case right after starting up the cluster.

If there is any way I can help with more information, please let me know what to do.

Environment

None

Activity

Johanna Amann
March 26, 2015, 2:46 PM

And even more detail: the cause of this was hardware problems on two nodes. The Bro instances on these nodes were still kind of running, but I don't think they were communicating with the master anymore, and they were unkillable (even with kill -9), probably hanging while waiting for disk I/O (hard drive problems). Since you could still ssh into the nodes, and they worked normally unless you tried certain file system accesses, broctl apparently listed them as online, without giving any indication of problems with the nodes besides the fact that "status" takes a long time.

Daniel Thayer
March 27, 2015, 9:39 PM

I'm not seeing a problem. As a test, I simulated a slow node by adding a "sleep"
command to one of the scripts that broctl runs on the remote host.
If the sleep is long enough to exceed the timeout, then I see "???" in the status
output (in the "Running", "Peers", and "Started" columns).
Otherwise, broctl status simply gathers information reported by Bro.
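
For illustration, the timeout behaviour described above can be sketched roughly as follows. This is not broctl's actual code: the names query_node, simulated_node and UNKNOWN are hypothetical, the default timeout value is only illustrative, and a local Python child process that sleeps stands in for the helper script on the remote host.

```python
import subprocess
import sys

UNKNOWN = "???"  # placeholder shown when a node does not answer in time


def query_node(cmd, timeout_s=30.0):
    """Run cmd (a list of arguments) and return its stripped stdout.

    Returns UNKNOWN if the command exceeds timeout_s seconds or exits
    with a non-zero status. Hypothetical helper, not part of broctl.
    """
    try:
        result = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout_s
        )
    except subprocess.TimeoutExpired:
        # The child is killed by subprocess.run before this is raised.
        return UNKNOWN
    if result.returncode != 0:
        return UNKNOWN
    return result.stdout.strip()


def simulated_node(delay_s):
    """Build a command that sleeps for delay_s seconds, then prints 'running'.

    Stand-in for a slow remote helper script (assumption for this sketch).
    """
    code = "import time; time.sleep({}); print('running')".format(delay_s)
    return [sys.executable, "-c", code]
```

With this sketch, query_node(simulated_node(0)) returns the reported state, while a call whose sleep exceeds the timeout returns "???", which matches what I see in the status columns.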

Robin Sommer
April 3, 2015, 6:31 PM

Set the timeout to 30s and make it configurable; revisit later when Broker is there.

Daniel Thayer
April 16, 2015, 9:29 PM

Branch topic/dnthayer/ticket1353 in the broctl repo contains the fix for this issue.

Robin Sommer
April 21, 2015, 2:25 AM

This has been merged already.

Assignee

Unassigned

Reporter

Johanna Amann

Labels

None

External issue ID

None

Priority

Normal