Bro's ASCII logging facilities do not escape escape characters

Description

  • Bro escapes non-printable ASCII characters with either \x?? or ^ depending on the character (https://www.bro.org/sphinx/scripts/base/bif/strings.bif.bro.html).

  • Bro does not, however, escape \ or ^ themselves.

  • This makes recovering the original string impossible, because you cannot differentiate between an escape sequence and a string that literally contains those characters.

Examples:
$ bro -e 'event bro_init() { print "foo \xc2\xae bar \\xc2\\xae baz"; }'
foo \xc2\xae bar \xc2\xae baz

$ bro -e 'event bro_init() { print "foo\x00bar\\0baz"; }'
foo\0bar\0baz

$ bro -e 'event bro_init() { print "foo \16 bar ^N baz"; }'
foo ^N bar ^N baz
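
The collisions in the three examples above can be reproduced with a small Python model. This is only a sketch: escape_like_bro_print is a hypothetical name, and its mapping rules (NUL as \0, other control bytes as ^ plus a letter, bytes from 0x7f up as \x??) are assumptions chosen to match the output shown above, not Bro's actual implementation.

```python
def escape_like_bro_print(data: bytes) -> str:
    """Simplified, assumed model of the escaping seen in the examples.

    Non-printable bytes are escaped, but '\\' and '^' pass through untouched,
    which is what makes the mapping ambiguous.
    """
    out = []
    for b in data:
        if b == 0x00:
            out.append("\\0")                # NUL -> \0
        elif b < 0x20:
            out.append("^" + chr(b + 0x40))  # control byte -> ^ notation (0x0e -> ^N)
        elif b >= 0x7f:
            out.append("\\x%02x" % b)        # DEL and high bytes -> \x?? (assumed)
        else:
            out.append(chr(b))               # printable ASCII, including '\' and '^'
    return "".join(out)


# Two different inputs, one output: the raw bytes and their literal spelling collide.
raw = b"foo \xc2\xae bar"
literal = b"foo \\xc2\\xae bar"
assert raw != literal
assert escape_like_bro_print(raw) == escape_like_bro_print(literal)
```

Because two distinct inputs produce the same output, no decoder can tell which one was logged.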

Additionally, it would be ideal if there were a way to standardize escaping on a single syntax (\x?? for every escaped byte, for example). Post-processing Bro logs in languages like Python or Ruby would then be trivial with their existing decode/encode functionality. I'm happy to file a separate feature request for this behavior, if that is preferred.
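
To illustrate what I mean, here is a sketch of such a scheme in Python. It is hypothetical and not something Bro does today: escape_uniform and unescape_uniform are made-up names, and I am assuming the backslash itself is written as \\ and every other non-printable byte as \x??. Under those assumptions, decoding needs nothing beyond Python's built-in unicode_escape codec.

```python
def escape_uniform(data: bytes) -> str:
    """Hypothetical uniform escaping: '\\' -> '\\\\', non-printable bytes -> \\x??."""
    out = []
    for b in data:
        if b == 0x5c:
            out.append("\\\\")         # escape the escape character itself
        elif 0x20 <= b < 0x7f:
            out.append(chr(b))         # printable ASCII passes through
        else:
            out.append("\\x%02x" % b)  # everything else uses a single syntax
    return "".join(out)


def unescape_uniform(text: str) -> bytes:
    """Reverse escape_uniform using only built-in codecs.

    Assumes the text is pure ASCII, which escape_uniform guarantees.
    """
    return text.encode("ascii").decode("unicode_escape").encode("latin-1")


original = b"foo \xc2\xae bar \\xc2\\xae baz"
logged = escape_uniform(original)
assert unescape_uniform(logged) == original
```

With the backslash escaped, the raw bytes and their literal spelling no longer collide, and the round trip holds for every byte value.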

I brought this up on the mailing list (http://mailman.icsi.berkeley.edu/pipermail/bro/2015-February/008174.html). It was suggested (off list) that I file a ticket as well.

Environment

None

Activity

Seth Hall
March 12, 2015, 12:52 PM

Thanks for testing. I need to add a test and then I think I'll mark this branch for merging for 2.4.

Robin Sommer
April 10, 2015, 6:34 PM

Needs code review, probably not working quite right, messes up the test-suite.

Robin Sommer
April 14, 2015, 7:31 PM

I don't think this is quite right yet: we can't really generally escape backslashes on "print". If we did, we'd get for example this:

I.e., escape_string() inserts "\x00", and then the print escapes that backslash.

What if we did the backslash escape only on "special request", that is, when calling escape_string() and similar functions? If one wants the reversible representation, one would then need to call such a function, whereas the semantics for a normal print would remain at "make sure it doesn't output non-printable characters", without being reversible.

Paul Pearce
April 14, 2015, 8:22 PM
Edited

Robin,

Thanks for looking at this.

The quoted behavior above seems desirable to me as it provides for a completely reversible process. Can you elaborate a bit?

The issue I'm encountering has to do with these characters being output via the logging framework. My understanding of the framework is that your solution (a special function) would mean you could never get the recoverable representation via logging. Is that correct? If so, that seems problematic given that many programs consume these logs.

Perhaps a middle ground solution would be a bro configuration option that controls this behavior globally?

Robin Sommer
April 15, 2015, 5:26 AM

I can see doing this generally for logging. So would it work if we did the backslash escaping for logging, but stayed with my suggestion above for print and other script-land stuff?

Paul Pearce
April 15, 2015, 7:45 PM

That sounds great.

Robin Sommer
April 16, 2015, 12:00 AM

Try topic/robin/ascii-escape-normalization and see if that works for you.

Robin Sommer
April 16, 2015, 8:49 PM

Assigning this back to Seth for review and merging.

Robin Sommer
April 16, 2015, 8:51 PM

Oh, still need to update external tests actually, just a second.

Robin Sommer
April 17, 2015, 4:57 AM

Now in topic/robin/ascii-escape-normalization in bro, bro-testing and bro-testing-private.

Assignee

Robin Sommer

Reporter

Paul Pearce

Labels

None

External issue ID

None

Components

Fix versions

Affects versions

Priority

Normal