Branch topic/dnthayer/ticket1396 in the broctl repo was originally intended
to address the problem of logs disappearing on broctl restart. Most of the commits
in this branch are aimed at making it easier to diagnose such problems
in the future. The most user-visible changes are:
1) post-terminate will now send an email if it fails to archive any logs,
2) post-terminate will now retry archiving logs that previously failed to be archived,
3) improvements to some error messages,
4) better sanity checking of config values,
5) significant improvements to the broctl README.




Justin Azoff
December 8, 2015, 7:34 PM

This looks pretty good, though it's a bit large. One thing I do notice, which is even more apparent when looking at diffs, is that we need to use namedtuple more. Lines like this:

are pretty hard to understand right now. We have the whole CmdResult thing now, so we should probably add a CmdRequest-type namedtuple so that line could look like

though probably wrapped better.

Also, I see some other changes:

If 'res' were a list of namedtuples with the fields ("node", "status", "output"), that could be
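
To illustrate the namedtuple suggestion above, here is a minimal sketch. It is not code from the branch: the type name `NodeResult`, the helper `failed_nodes`, and the sample node names are hypothetical, and only the three field names come from the comment.

```python
from collections import namedtuple

# Hypothetical result type; only the field names are taken from the comment.
NodeResult = namedtuple("NodeResult", ["node", "status", "output"])


def failed_nodes(results):
    """Return the names of the nodes whose status is falsy."""
    # Attribute access replaces positional indexing such as r[0] and r[1].
    return [r.node for r in results if not r.status]


# Made-up sample data, for illustration only.
res = [
    NodeResult("worker-1", True, "ok"),
    NodeResult("worker-2", False, "archive failed"),
]

for r in res:
    print(r.node, r.status, r.output)

print(failed_nodes(res))
```

Because a namedtuple is still a tuple, existing code that indexes or unpacks the values would keep working while call sites are converted one at a time.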

Daniel Thayer
December 10, 2015, 10:09 PM

In the second example, I've now simplified the code.

For the first example, that one would need code changes in numerous places,
and it isn't actually related to any changes in this branch, so I'd prefer to work on
that in a different branch.

Justin Azoff
December 10, 2015, 10:18 PM

Sounds good. I'll give the code another read through tomorrow and get it merged.

As for the possibility of another branch, I think we should look into any place where indexes like [0] or [1] appear, and where we have for loops that unpack tuples. 'for a, b, c in lst' is better than using [0], [1], [2], but it still makes it hard to add or remove a field without changing multiple places.
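
The maintenance concern above can be sketched as follows. This is an illustration under assumptions, not broctl code: the types `ResultV1` and `ResultV2`, the extra `elapsed` field, and both helper functions are hypothetical.

```python
from collections import namedtuple

# Hypothetical "before" and "after" shapes of the same record.
ResultV1 = namedtuple("ResultV1", ["node", "status", "output"])
ResultV2 = namedtuple("ResultV2", ["node", "status", "output", "elapsed"])


def summarize_by_unpacking(results):
    """Unpacks exactly three fields, so it breaks when a field is added."""
    return ["%s: %s" % (node, "ok" if status else "failed")
            for node, status, output in results]


def summarize_by_name(results):
    """Reads fields by name, so extra fields do not affect it."""
    return ["%s: %s" % (r.node, "ok" if r.status else "failed")
            for r in results]


old = [ResultV1("worker-1", True, "")]
new = [ResultV2("worker-1", True, "", 0.5)]

print(summarize_by_name(old))
print(summarize_by_name(new))

try:
    summarize_by_unpacking(new)
except ValueError as err:
    # Tuple unpacking raises ValueError when the field count changes.
    print("unpacking broke:", err)
```

With access by name, adding a field only requires changes where the new field is produced or consumed.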

Justin Azoff
December 11, 2015, 3:27 PM

I had one more thought. Where broctl now does:

Is there any reason why it can't just cat the post-terminate.out file and include it in the email?

Daniel Thayer
December 11, 2015, 4:49 PM

The post-terminate.out file is being generated from the script that sends the email, so if we cat the
file at that point, we might not see the entire file contents due to buffering. Besides, if someone receives
that email, they're going to need to look in that directory anyway (to see which logs weren't
archived, and then to manually archive them).
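
The buffering concern can be shown with a small, generic sketch. It is written in Python purely for illustration and is not how post-terminate is implemented; the temporary directory, the message text, and the helper `read_back` are assumptions. Only the file name comes from the discussion.

```python
import os
import tempfile


def read_back(path):
    """Read the file through a separate handle, as 'cat' would."""
    with open(path) as f:
        return f.read()


tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "post-terminate.out")

writer = open(path, "w")
writer.write("archiving logs...\n")

# The short write is normally still sitting in the writer's buffer here,
# so a second reader may see an empty or incomplete file.
before_flush = read_back(path)

writer.flush()

# After an explicit flush the data has been handed to the OS and is visible.
after_flush = read_back(path)
writer.close()

print(repr(before_flush))
print(repr(after_flush))
```

A process that reads its own output file before that output has been flushed can therefore include a truncated copy in whatever it sends.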

The long-term solution is to change the way we archive logs (for that, I expect we will leverage

Justin Azoff
December 11, 2015, 4:54 PM

Ah, never mind then.


Justin Azoff


Daniel Thayer


