Speed up for loops over empty tables

Description

topic/jazoff/speedup-for

This change doubles the performance of for loops over empty tables.

A bro binary that prints out this size shows for
testing/external/bro-testing/2009-M57-day11-18.trace, for loops are run
over tables of size:

11477 for size 0
8371 for size 1
1227 for size 3
239 for size 2
141 for size 6
57 for size 5
10 for size 4
5 for size 7
2 for size 13
2 for size 8
2 for size 11
1 for size 9

~53% of the for loops were across an empty table. These loops come from
things like the for loop in the http script over c$http_state$pending

This change prevents the creation of an iteration cookie entirely if the
table is empty.

Using this test script:

const scan_ports: table[port] of count = { };

local x = 0;
while ( x < 20000000 ) {
for(p in scan_ports) {
}
++x;
}

$ time bro.orig -b ___bench.bro

real 0m10.732s
user 0m10.415s
sys 0m0.113s

$ time bro.nocookie -b ___bench.bro

real 0m4.694s
user 0m4.464s
sys 0m0.086s

Environment

None

Activity

Show:
Justin Azoff
December 8, 2017, 9:33 AM

There's likely more speedups that can be done inside ForStmt:oExec over tables.. but I have no idea what CompHash.cc actually does and why CompositeHash::RecoverOneVal is so complicated.

I tried replacing

while ( loop_vals->NextEntry(k, c) )
{
ListVal* ind_lv = tv->RecoverIndex(k);

with just something like this to skip RecoverIndex entirely

while ( ind_lv = loop_vals->NextEntry(k, c) )
{

but whatever 'value' NextEntry returns it's not actually the value.

Jon Siwek
December 9, 2017, 6:41 AM

Cool, thanks.

In the other attempt of what you tried, NextEntry() is actually returning the value (in context of key-value pairs), though that isn't what's needed here – the RecoverIndex() call is retrieving the next key in the table because Bro's for-loops only iterate over keys and the programmer needs to then do the lookup of the value themselves if they need it.

Maybe it would be nice is if Bro's for-loop allowed looping over the key-value pairs instead of just keys. And that could also be a performance benefit, since it's not great to have to do a redundant lookup in script-land if we already have the value in the core for free. I'll make another ticket to investigate what it would take to extend the scripting language to support that.

Merged

Assignee

Jon Siwek

Reporter

Justin Azoff

Labels

None

External issue ID

None

Components

Fix versions

Affects versions

Priority

Normal