to_json mishandles strings with both \ and unprintable characters


The to_json function (implemented in base/utils/json.bro) escapes unprintable characters using the clean primitive, and then manually escapes \ and " with regexes. This is wrong on two levels: the transformation performed by clean is irreversible, and \x escapes are not part of the JSON standard. For instance, consider

Running this test program with bro -b -C test.bro will produce the output "\"\\\\x81". Because of the irreversible clean transformation, this output could correspond to either the original three-byte string (hexdumped, 22 5C 81) or a six-byte string containing two backslashes and the literal string x81 (hexdumped, 22 5C 5C 78 38 31). Because \x escapes are not part of the JSON standard, it is not enough to replace clean and the inner gsub with escape_string; that would produce "\\\x81", which is unambiguous, but also unparseable.
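
The collision can be reproduced outside Bro. The following Python sketch models the current behaviour under two assumptions that are not taken from the Bro source: that clean rewrites every byte outside printable ASCII as \xHH while leaving backslashes alone, and that the regex step then prefixes each \ and " with a backslash. The names clean_model and to_json_string_model are hypothetical.

```python
import json
import re


def clean_model(data: bytes) -> str:
    # Assumed model of clean: printable ASCII passes through unchanged,
    # every other byte becomes \xHH, and backslashes are NOT escaped.
    return "".join(
        chr(b) if 0x20 <= b <= 0x7E else "\\x%02x" % b for b in data
    )


def to_json_string_model(data: bytes) -> str:
    # Assumed model of the regex step: prefix each backslash and
    # double quote with a backslash, then wrap the result in quotes.
    escaped = re.sub(r'(["\\])', r'\\\1', clean_model(data))
    return '"' + escaped + '"'


three_bytes = b'"\\\x81'  # hexdump: 22 5C 81
six_bytes = b'"\\\\x81'   # hexdump: 22 5C 5C 78 38 31

# Both inputs map to the same JSON text, so the original cannot be recovered.
same_output = to_json_string_model(three_bytes) == to_json_string_model(six_bytes)

# A JSON parser reads that text back as the six-character string.
decoded = json.loads(to_json_string_model(three_bytes))
```

Under these assumptions same_output is True, and decoded is the six-character string rather than the original three bytes.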

The ideal output would be "\"\\\u0081". I'm not sure how to accomplish this, considering that gsub does not appear to implement any way of referring to capture groups from the replacement string. For now I'm going to change json.bro to read

and postprocess the JSON with a more powerful regex engine before trying to parse it.
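
A minimal sketch of such a post-processing step, written in Python rather than Bro script. It assumes the emitted JSON contains only escape_string-style escapes, i.e. backslashes are doubled and unprintable bytes appear as \xHH; that assumption is inferred from the example output above, not checked against the Bro source. The helper name hex_escapes_to_json is hypothetical.

```python
import json
import re

# Match one complete escape sequence at a time: either \xHH (captured in
# group 1) or a backslash followed by any single character. Consuming
# escapes in pairs keeps an escaped backslash followed by the literal
# text "x81" from being mistaken for a hex escape.
_ESCAPE = re.compile(r'\\(?:x([0-9A-Fa-f]{2})|(.))', re.DOTALL)


def hex_escapes_to_json(text: str) -> str:
    def repl(match):
        if match.group(1) is not None:
            # \xHH is not valid JSON; \u00HH is the equivalent escape.
            return "\\u00" + match.group(1)
        # Any other escape (\\, \", ...) is left untouched.
        return match.group(0)

    return _ESCAPE.sub(repl, text)


fixed = hex_escapes_to_json(r'"\"\\\x81"')
# JSON decodes \u0081 to code point U+0081; encoding the result as
# Latin-1 maps each code point below 0x100 back to a single byte.
original_bytes = json.loads(fixed).encode("latin-1")
```

Here fixed is the parseable text "\"\\\u0081", and original_bytes is the three-byte string 22 5C 81. Note that the parser returns code point U+0081, not a raw byte, so a consumer that needs the bytes has to map them back as shown.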

(There is also the headache of dealing with strings with U+0000 in them, but I think it would be fair to declare that Not Your Problem.)




Zack Weinberg
July 26, 2018, 3:55 AM

For the record, the construct that doesn't seem to be possible with gsub, but is possible with a "more powerful regex engine", is

Robin Sommer
August 2, 2018, 3:45 AM

to_json() should probably become a bif so that we can write it in C++ and "do the right thing". This is part of a broader question of potentially making Bro more JSON-friendly. Having well-defined JSON de-/serialization of Bro values would be nice.

And yeah, our regexps don't support capture groups. That's another item on the roadmap: consider replacing Bro's engine with one of the modern libraries; that needs some thought & research.
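
To illustrate what "the right thing" could look like for string values, here is a rough Python sketch of a single-pass escaper. It is not the proposed bif and not Bro code; the name json_escape_bytes is hypothetical, and treating each byte above 0x7E as the code point of the same value (Latin-1 style) is an assumption made for the sake of the example.

```python
import json


def json_escape_bytes(data: bytes) -> str:
    # Escape each byte exactly once, so no intermediate representation
    # can collide with text that was already present in the input.
    out = ['"']
    for b in data:
        ch = chr(b)
        if ch in '"\\':
            out.append("\\" + ch)         # JSON's two-character escapes
        elif 0x20 <= b <= 0x7E:
            out.append(ch)                # printable ASCII as-is
        else:
            out.append("\\u%04x" % b)     # everything else as \u00HH
    out.append('"')
    return "".join(out)


encoded = json_escape_bytes(b'"\\\x81')
round_trip = json.loads(encoded).encode("latin-1")
```

With this approach the three-byte and six-byte strings from the report produce different JSON text, and a NUL byte comes out as \u0000, which is valid JSON even if some consumers handle it poorly.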

Jon Siwek
September 18, 2018, 6:25 AM




Zack Weinberg


