Investigate further improvements to file analysis performance

Description

There are some further ideas floating around for measuring and improving the performance of maintaining the handles.

Environment

None

Activity

Robin Sommer
April 24, 2014, 10:59 PM

Two questions:

(1)

I'm actually wondering about performance here as set/map can potentially
be expensive in particular for small sizes (compared to using a vector
for example), and these will be instantiated and manipulated quite often.

Put differently: I wouldn't be sure that using a set here is necessarily faster overall than a list as long as there's just a few elements in there. Were you able to confirm that?
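
One way to check this would be a small micro-benchmark along the following lines. This is only a sketch: `vector_insert_unique`, `set_insert_unique`, `FillResult`, and `time_fill` are hypothetical names made up for illustration, `int` elements stand in for whatever the real containers hold, and the numbers it produces will depend on compiler, allocator, and element type.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <set>
#include <vector>

// Insert-if-absent on a small vector: linear scan over contiguous memory.
inline bool vector_insert_unique(std::vector<int>& v, int x) {
    if (std::find(v.begin(), v.end(), x) != v.end())
        return false;
    v.push_back(x);
    return true;
}

// Same operation on std::set: logarithmic lookup, one node allocation per new element.
inline bool set_insert_unique(std::set<int>& s, int x) {
    return s.insert(x).second;
}

// Hypothetical result type: average time per round plus a checksum that
// keeps the compiler from discarding the work.
struct FillResult {
    double ns_per_round;
    std::size_t checksum;
};

// Builds a fresh container `rounds` times. Each round attempts 2*n inserts
// over n distinct values, so half of the inserts are duplicates.
template <typename Container, typename Insert>
FillResult time_fill(std::size_t rounds, int n, Insert insert) {
    assert(rounds > 0 && n > 0);
    std::size_t checksum = 0;
    const auto start = std::chrono::steady_clock::now();
    for (std::size_t r = 0; r < rounds; ++r) {
        Container c;  // instantiated per round, mimicking frequent short-lived use
        for (int i = 0; i < 2 * n; ++i)
            insert(c, i % n);
        checksum += c.size();
    }
    const auto stop = std::chrono::steady_clock::now();
    const std::chrono::duration<double, std::nano> elapsed = stop - start;
    return {elapsed.count() / static_cast<double>(rounds), checksum};
}
```

Calling `time_fill<std::vector<int>>(rounds, n, vector_insert_unique)` and `time_fill<std::set<int>>(rounds, n, set_insert_unique)` for a few small values of `n`, in an optimized build, would give a rough comparison. Measuring the real workload remains the more trustworthy test.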

(2)

Baseline/tests.m57-long/http.log: some MIME types change from
text/html to text/plain, is that expected? (Update: Ah, is that the bof_buffer_size change?)

Robin Sommer
April 24, 2014, 11:08 PM

How ugly would it be (or would it even work) to have an interface to change Val's internal values (e.g. Val::val.uint_val) so a new Val doesn't have to be created to update fields in a fa_file record?

Not sure that would be safe. I believe the Val can end up being shared with other locations which wouldn't expect it to change. Say somebody assigned the current value somewhere else: when it was later changed under the hood, the change would show up there too.
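
The aliasing concern can be illustrated with a toy model. The sketch below does not use the real Val class: `ToyVal`, `ToyRecord`, and the two update functions are hypothetical stand-ins, and `std::shared_ptr` is assumed here only as a convenient way to model a value that several places hold a reference to.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>

// Hypothetical stand-in for a reference-counted value; NOT the actual Val class.
struct ToyVal {
    std::uint64_t uint_val;
};
using ToyValPtr = std::shared_ptr<ToyVal>;

// Hypothetical stand-in for a record with one counter field.
struct ToyRecord {
    ToyValPtr seen_bytes;
};

// In-place update: no allocation, but every holder of the same ToyVal
// observes the new number.
inline void update_in_place(ToyRecord& rec, std::uint64_t n) {
    assert(rec.seen_bytes);
    rec.seen_bytes->uint_val = n;
}

// Replacement: allocates a new ToyVal, so earlier copies of the pointer
// keep the value they were assigned.
inline void update_by_replacing(ToyRecord& rec, std::uint64_t n) {
    rec.seen_bytes = std::make_shared<ToyVal>(ToyVal{n});
}
```

If some other location saved `rec.seen_bytes` before `update_in_place` ran, it would silently see the new number; after `update_by_replacing` it would still see the old one. That difference is the safety question raised above.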

Robin Sommer
April 24, 2014, 11:11 PM

Here are the performance improvements I'm seeing:

Pretty neat (though as you say, also clearly traffic dependent).

Jon Siwek
April 28, 2014, 2:18 PM

I'm actually wondering about performance here as set/map can potentially
be expensive in particular for small sizes (compared to using a vector
for example), and these will be instantiated and manipulated quite often.
Put differently: I wouldn't be sure that using a set here is necessarily faster overall than a list as long as there's just a few elements in there. Were you able to confirm that?

It can be questionable – in other places I've tried replacing lists with sets/maps and have measured some performance decrease. But in this case, the difference seemed negligible... I think it was a slight improvement possibly because file signatures will now more commonly have multiple matches where before only a single protocol signature would match. Code-wise, it did simplify things, though I guess that's only a minor/weak argument for the change.

Baseline/tests.m57-long/http.log: some MIME types change from
text/html to text/plain, is that expected? (Update: Ah, is that the bof_buffer_size change?)

Yes, that was from the change to restrict how much data may be fed into the file MIME signature matching to be no greater than the bof_buffer_size field, as that's the original intent and also the way it's documented.
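
A rough sketch of what such a cap amounts to, under assumptions: `bytes_to_match` and `buffer_for_matching` are hypothetical helpers, not functions from the code base, and the buffer is modeled as a plain `std::string`. Only the name `bof_buffer_size` comes from the discussion above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: how many bytes of the next chunk may still go to
// signature matching, given how many were already fed and the cap.
inline std::size_t bytes_to_match(std::size_t already_fed,
                                  std::size_t chunk_len,
                                  std::size_t bof_buffer_size) {
    if (already_fed >= bof_buffer_size)
        return 0;
    return std::min(chunk_len, bof_buffer_size - already_fed);
}

// Hypothetical helper: appends only the allowed prefix of `chunk` to the
// beginning-of-file buffer and returns the number of bytes accepted.
inline std::size_t buffer_for_matching(std::string& bof_buffer,
                                       const std::string& chunk,
                                       std::size_t bof_buffer_size) {
    const std::size_t n =
        bytes_to_match(bof_buffer.size(), chunk.size(), bof_buffer_size);
    bof_buffer.append(chunk, 0, n);
    // Holds as long as the buffer was within the cap when this was called.
    assert(bof_buffer.size() <= bof_buffer_size);
    return n;
}
```

With a cap like this, a signature that needs bytes beyond `bof_buffer_size` to match can no longer fire, which is one plausible way a file previously labelled text/html could fall back to text/plain.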

Robin Sommer
May 2, 2014, 3:33 AM

Sounds good, thanks.

Assignee

Jon Siwek

Reporter

Robin Sommer

Labels

None

External issue ID

None

Components

Fix versions

Priority

Normal