topic/dnthayer/ticket1396

Description

Branch topic/dnthayer/ticket1396 in the broctl repo was originally intended
to address an issue (logs disappearing on broctl restart). Most of the commits
in this branch are aimed at making it easier to diagnose such problems
in the future. The most user-visible changes are:
1) post-terminate will now send an email if it fails to archive any logs,
2) post-terminate will now retry archiving logs that previously failed to be archived,
3) improvements to some error messages,
4) better sanity checking of config values,
5) significant improvements to the broctl README.

Environment

None

Activity

Daniel Thayer
December 10, 2015, 10:09 PM

In the second example, I've now simplified the code.

For the first example, that one would need code changes in numerous places,
and it isn't actually related to any changes in this branch, so I'd prefer to work on
that in a different branch.

Justin Azoff
December 10, 2015, 10:18 PM

Sounds good. I'll give the code another read through tomorrow and get it merged.

As for the possibility of another branch, I think we should look into any place where things like [0] or [1] appear, and where we have for loops that unpack tuples. 'for a, b, c in lst' is better than using [0], [1], [2], but it makes it hard to add or remove a field without changing multiple things.
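
A minimal sketch of that trade-off, assuming a hypothetical result record (the NodeStatus name and its fields are made up for illustration and are not taken from broctl). Named fields are one possible way to keep loops readable while still allowing a field to be added later:

```python
from collections import namedtuple

# Hypothetical record; the name and fields are illustrative, not broctl's.
NodeStatus = namedtuple("NodeStatus", ["name", "success", "output"])

results = [
    NodeStatus("worker-1", True, "ok"),
    NodeStatus("worker-2", False, "archive failed"),
]

# Index access: terse, but the reader must remember what [0] and [1] mean.
failed_by_index = [r[0] for r in results if not r[1]]

# Tuple unpacking: readable, but every such loop must change when a
# field is added to or removed from the record.
failed_by_unpack = [name for name, success, output in results if not success]

# Named fields: readable, and adding a new field leaves this loop untouched.
failed_by_name = [r.name for r in results if not r.success]

print(failed_by_name)
```

All three loops select the same nodes; only the third would survive a change to the record layout without edits.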

Justin Azoff
December 11, 2015, 3:27 PM

I had one more thought.. where broctl now does:

Is there any reason why it can't just cat the post-terminate.out file and include it in the email?

Daniel Thayer
December 11, 2015, 4:49 PM

The post-terminate.out file is being generated from the script that sends the email, so if we cat the
file at that point, we might not see the entire file contents due to buffering. Besides, if someone receives
that email, they're going to need to look in that directory anyway (to see which logs weren't
archived, and then to manually archive them).
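
The buffering concern can be illustrated in general terms with a small sketch. This is not the post-terminate script itself; the temporary path and the log line are assumptions chosen only to show that a second reader may see less than what the writer has already written:

```python
import os
import tempfile

# Illustrative file location; not the real broctl directory layout.
path = os.path.join(tempfile.mkdtemp(), "post-terminate.out")

# Regular files are block-buffered by default, so a small write
# stays in the writer's buffer until it is flushed or closed.
writer = open(path, "w")
writer.write("archive failed for conn.log\n")

# A separate reader only sees what has actually reached the file.
with open(path) as reader:
    before_flush = reader.read()

writer.flush()

with open(path) as reader:
    after_flush = reader.read()

writer.close()
print(repr(before_flush), repr(after_flush))
```

Reading the file while its producer is still running is therefore only reliable if the producer flushes first.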

The long-term solution is to change the way we archive logs (for that, I expect we will leverage
broctld).

Justin Azoff
December 11, 2015, 4:54 PM

Ah, never mind then.

Assignee

Justin Azoff

Reporter

Daniel Thayer

Labels

None

External issue ID

None

Components

Fix versions

Priority

Normal