Bro process sticks around after broctl stop

Description

It seems that after running "broctl stop", not all bro processes are killed immediately. On our cluster, one of the processes keeps running; it seems to terminate eventually, once all log compression is done. Is that on purpose, or is it a bug?

ps output on the node running the manager (the bro process is on the first line; the running compression jobs are included for completeness):

Environment

None

Activity

Justin Azoff
April 3, 2015, 7:42 PM

I wonder if that process is just left over from when bro calls system() to run the child process...

I'm not sure what to do about this. Killing that process is not the best idea, but there may be a way to wait for it.
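As a rough illustration of the "wait for it" option, here is a minimal sketch of polling until a lingering process has exited instead of killing it. This is not how broctl works; the function name wait_for_exit, the timeout, and the poll interval are all hypothetical, and the sketch assumes a POSIX system where the caller already knows the PID.

```python
import os
import time


def wait_for_exit(pid, timeout=60.0, poll_interval=0.5):
    """Return True once no process with this PID exists, False on timeout.

    Hypothetical helper: it only observes the process, it never kills it.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # Signal 0 performs the existence/permission check only;
            # no signal is actually delivered to the process.
            os.kill(pid, 0)
        except ProcessLookupError:
            return True   # the process is gone
        except PermissionError:
            pass          # the process exists but belongs to another user
        if time.monotonic() >= deadline:
            return False  # still running when the timeout expired
        time.sleep(poll_interval)
```

One caveat: a child that has exited but has not yet been reaped by its parent still counts as existing for this check, so the approach fits best when the waiting tool is not the parent of the lingering process.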

I think there is a larger issue here in that log rotation has a number of problems:

  • All logs get rotated and compressed at the same time, causing a CPU/IO storm

  • Logs are compressed on the fly to their destination, then the originals are removed

  • If compression is not in use, logs are copied and then removed (rather than moved)

  • If using something like the sftp handler and sftp fails, nothing is retried.

  • Bro is the parent process to all of this.

  • If bro crashes, logs often end up in a crash directory rather than the proper location.

I think that the only thing bro should be doing is atomically moving the current logs to an archive directory or an archive staging directory. The compression, moving, copying, and uploading would be done by an external tool. There are a number of benefits to this:

  • If bro crashes, recovering the logs is easy: on startup, just move any existing log files to the staging dir. A bro crash could never result in a partially compressed/rotated log file

  • Compression can be done serially or with limited parallelism rather than all at once

  • You could even delay the compression to idle periods

  • Bugs like this would not occur, since stopping bro would only require the logs to be moved, not compressed
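To make the proposal concrete, here is a minimal sketch of the split described above: one step that only renames logs into a staging directory, and a separate external step that compresses them one at a time. This is not existing bro or broctl code. The function names stage_logs and compress_staged, the ".log" suffix filter, and the directory layout are assumptions for illustration, and the renames are only atomic if the source and destination directories are on the same filesystem.

```python
import gzip
import os
import shutil


def stage_logs(log_dir, staging_dir):
    """Move every *.log file from log_dir into staging_dir via rename."""
    os.makedirs(staging_dir, exist_ok=True)
    moved = []
    for name in sorted(os.listdir(log_dir)):
        if name.endswith(".log"):
            dst = os.path.join(staging_dir, name)
            # rename is atomic when both paths are on the same filesystem
            os.rename(os.path.join(log_dir, name), dst)
            moved.append(dst)
    return moved


def compress_staged(staging_dir, archive_dir):
    """Compress staged logs serially; meant to run outside the bro process."""
    os.makedirs(archive_dir, exist_ok=True)
    done = []
    for name in sorted(os.listdir(staging_dir)):
        if not name.endswith(".log"):
            continue
        src = os.path.join(staging_dir, name)
        final = os.path.join(archive_dir, name + ".gz")
        tmp = final + ".tmp"
        # Write to a temporary name first so that an interrupted run never
        # leaves a partial file under the final archive name.
        with open(src, "rb") as fin, gzip.open(tmp, "wb") as fout:
            shutil.copyfileobj(fin, fout)
        os.rename(tmp, final)
        os.remove(src)  # drop the original only after the archive is in place
        done.append(final)
    return done
```

Under this split, stopping bro (or recovering after a crash) would only need the stage_logs step, while compress_staged could run later from a separate tool or scheduled job, for example during idle periods.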

Robin Sommer
April 10, 2015, 6:28 PM

This may be related to BIT-1306; let's wait for that.

Robin Sommer
April 11, 2015, 5:19 AM

Can somebody see if 0620bc97 helps?

Daniel Thayer
April 13, 2015, 1:12 AM

I've tested this, and 0620bc97 fixed the problem for me.

Assignee

Daniel Thayer

Reporter

Johanna Amann

Labels

None

External issue ID

None

Components

Fix versions

Affects versions

Priority

Normal