After the broker merge, I noticed the broctl "command.scripts" test would
sometimes fail. I was able to reproduce this failure on CentOS 7, and I saw
it fail in two different ways. Sometimes "broctl scripts" runs normally and
does not fail.
Sometimes, it fails and bro dumps core.
Sometimes, it hangs due to the "bro" process
hanging that was spawned by the "share/broctl/scripts/check-config"
script. In this case, I had to kill bro with "kill -9".
I also saw this happen on Ubuntu 18.04, but it happens much
more frequently on CentOS 7.
I didn't see either problem after several hundred runs on CentOS 7, so would be helpful if you could capture a stack trace from both the core dump and the hung process to see if they're similar to other problems.
e.g. I just found this: https://github.com/actor-framework/actor-framework/pull/700
and when checking into BIT-1938, I can reproduce what looks like: https://github.com/actor-framework/actor-framework/issues/686
I probably should have mentioned that I was using the same
node.cfg as the "command.scripts" test uses. I also ran
"broctl scripts" from a small shell script that just does
what the test case does (before running this script, I
did a "broctl install"):
broctl scripts worker-1
broctl scripts -c
broctl scripts -c worker-1
I just tried this again now, and Bro dumped core on the first
run of the above shell script.
I am using a dual-core VirtualBox VM with CentOS 7.5.
The crash is fixed by: https://github.com/bro/bro/commit/66ee3764117e4d01c1ed3e0ff05ca60161ea9a69
The hang is a duplicate of (which is waiting on CAF issue https://github.com/actor-framework/actor-framework/issues/686).
I also changed BroControl to tell Bro to not attempt to set up cluster connections when running 'scripts' or 'check' commands.