This branch makes it less expensive to serialize large/complex values (e.g. connection and/or fa_file records).

The obvious overhead to reduce came from the fixed growth increment of the buffer that holds serialized data. For records that expand to ~1.6M (master) or ~3M (topic/bernhard/file-analysis-x509) in serialized form, growing in 64K increments takes too many allocations to get there. It may also help somewhat to use realloc instead of new/memcpy/delete each time the buffer needs to grow.
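To make the allocation counts concrete, here is a minimal C++ sketch, not the code from this branch. The names GrowBuffer, FixedGrowthSteps, DoublingGrowthSteps and kInitialSize are hypothetical, and doubling is only one possible non-fixed growth policy, assumed here for illustration. It counts how often a buffer must grow under each policy and shows a realloc-based append.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Assumed initial buffer size (64K, as mentioned in the text).
constexpr std::size_t kInitialSize = 64 * 1024;

// Number of growth operations needed to reach `target` bytes when the
// capacity grows by a fixed `step` each time. `step` must be > 0.
std::size_t FixedGrowthSteps(std::size_t target, std::size_t initial, std::size_t step) {
    std::size_t cap = initial, steps = 0;
    while (cap < target) { cap += step; ++steps; }
    return steps;
}

// Same count when the capacity doubles each time. `initial` must be > 0.
std::size_t DoublingGrowthSteps(std::size_t target, std::size_t initial) {
    std::size_t cap = initial, steps = 0;
    while (cap < target) { cap *= 2; ++steps; }
    return steps;
}

// Hypothetical append-only buffer that grows geometrically via realloc.
struct GrowBuffer {
    char* data = nullptr;
    std::size_t len = 0;     // bytes in use
    std::size_t cap = 0;     // bytes allocated
    std::size_t allocs = 0;  // number of (re)allocations performed

    GrowBuffer() = default;
    GrowBuffer(const GrowBuffer&) = delete;
    GrowBuffer& operator=(const GrowBuffer&) = delete;
    ~GrowBuffer() { std::free(data); }

    // Appends n bytes; returns false if memory could not be obtained.
    bool Append(const void* src, std::size_t n) {
        if (n == 0)
            return true;
        if (n > cap - len) {
            std::size_t new_cap = cap ? cap : kInitialSize;
            while (new_cap - len < n)
                new_cap *= 2;
            // realloc may extend the block in place and avoid a copy.
            void* p = std::realloc(data, new_cap);
            if (!p)
                return false;  // the old block is still valid and owned
            data = static_cast<char*>(p);
            cap = new_cap;
            ++allocs;
        }
        std::memcpy(data + len, src, n);
        len += n;
        return true;
    }
};
```

Taking ~1.6M as 1600 KiB and starting from a 64K buffer, FixedGrowthSteps gives 24 growth operations with a 64K step, while DoublingGrowthSteps gives 5. With new/memcpy/delete, each of those operations also copies everything written so far.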

I didn't find that increasing the initial buffer size from 64K helped much (and 90% of the things needing serialization fit in a buffer of that size anyway).

It could possibly help to preallocate a buffer that gets re-used across serializations instead of repeatedly allocating small buffers that will need to be resized.
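A rough sketch of what such a re-used buffer could look like, assuming a std::vector-backed scratch buffer. The ReusableBuffer class and its methods are hypothetical and not part of Bro. The point is only that clearing the buffer between serializations keeps the already-allocated capacity.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical scratch buffer shared across serializations.
class ReusableBuffer {
public:
    // Reserve capacity once, up front.
    explicit ReusableBuffer(std::size_t reserve_bytes) { buf_.reserve(reserve_bytes); }

    // Start a new serialization: drops the contents, keeps the capacity.
    void Reset() { buf_.clear(); }

    // Append n bytes; only reallocates if the capacity is exceeded.
    void Append(const char* src, std::size_t n) { buf_.insert(buf_.end(), src, src + n); }

    const char* Data() const { return buf_.data(); }
    std::size_t Size() const { return buf_.size(); }
    std::size_t Capacity() const { return buf_.capacity(); }

private:
    std::vector<char> buf_;
};
```

Whether this pays off would depend on how often large values are serialized. A buffer that has once grown to several megabytes stays that large until it is destroyed.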

I don't have a complete breakdown of the bytes that make up the serialized form of the large/complex records, but from a quick look, the filenames in the Location information of each BroObj/Val make up a third of the ~1.6M (master). Those are the full paths of each file, so the size depends on where the Bro scripts reside on the file system (i.e. put them as close to the root dir as possible and you might increase performance!).

Any other quick ideas of what can be done here? If not, improving the serialization seems to deserve its own project (which also might be part of the new comm. library project) for later.

In the meantime, it's at least shown that avoiding situations where large/complex records are serialized can help (). And that might always be a useful optimization strategy if the serialized representation of Vals is going to scale not just as a function of their value, but also w/ their type/attribute/location information.






Jon Siwek


