HTTP analyzer too sensitive to content gaps, was: HTTP messages missing in files.log

Description

I have a trace with multiple HTTP requests inside a persistent HTTP session, but only the first two appear in files.log; the remaining ones are missing. Looks like a bug.

Environment

None

Activity

Jon Siwek
April 9, 2014, 8:31 PM

There's a missing TCP segment in the middle of that pcap that looks like it would have contained an HTTP reply. The HTTP analyzer seems to stop parsing the rest of the connection if there's a gap that's not isolated to an HTTP message body. So two files get passed from the HTTP analyzer to the file analysis framework, and then the HTTP analyzer stops parsing anything else because of the missing TCP segment.

Since that seems intentional and it's an HTTP analysis limitation, not a file analysis bug, do you think there's anything to do here right now?

Robin Sommer
April 10, 2014, 3:30 PM

Agree, if that's indeed the reason, it's nothing to fix.

The client side of those requests still appears in http.log, though. Are they parsed before the gap occurs on the server side?

Also, I noticed this when comparing against the output of the BinPAC++ HTTP analyzer; with that one, they all get reported in files.log. However, that one doesn't deal with gaps either, so I'm not sure how that happens. I'll take another look later.

Robin

Jon Siwek
April 10, 2014, 3:49 PM

> Though the client side of those requests still appears in http.log,
> are they parsed before the gap comes on the server side?

A gap from the server's side just disables further parsing of responses, but any more requests will still be parsed. HTTP.cc:1158 is where this happens.

And though it doesn't apply to this case, the code for dealing with a gap in a request seems fishy: the comment implies an intention that doesn't match what the code actually does, and the code has an extra content_line->SetSkipDeliveries(1) statement that was probably meant to be invoked on the other side's ContentLine_Analyzer.
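To make the per-direction behavior described above concrete, here is a minimal toy model in Python. It is only a sketch of the observable effect (a gap on the reply side stops reply parsing while requests keep being parsed); it is not the code in HTTP.cc, and every name in it (`ToyHTTPAnalyzer`, `deliver`, `run_trace`, the `GAP` marker, the trace contents) is hypothetical. The gap is assumed to fall outside an HTTP message body.

```python
from dataclasses import dataclass, field

# Hypothetical marker for a missing TCP segment (content gap) that is
# NOT confined to an HTTP message body.
GAP = None


@dataclass
class ToyHTTPAnalyzer:
    """Toy model of the behavior discussed in this ticket; not Bro's analyzer."""

    parsed: dict = field(default_factory=lambda: {"request": [], "reply": []})
    stopped: dict = field(default_factory=lambda: {"request": False, "reply": False})

    def deliver(self, direction, item):
        # Once a direction has seen a gap, everything after it is ignored.
        if self.stopped[direction]:
            return
        if item is GAP:
            # Only the direction that saw the gap stops parsing.
            self.stopped[direction] = True
            return
        self.parsed[direction].append(item)


def run_trace(events):
    analyzer = ToyHTTPAnalyzer()
    for direction, item in events:
        analyzer.deliver(direction, item)
    return analyzer.parsed


# Made-up trace: four requests on one persistent connection, with the
# third reply lost to a content gap.
trace = [
    ("request", "GET /a"), ("reply", "200 /a"),
    ("request", "GET /b"), ("reply", "200 /b"),
    ("request", "GET /c"), ("reply", GAP),
    ("request", "GET /d"), ("reply", "200 /d"),
]
result = run_trace(trace)
print(result["request"])  # all four requests are still parsed
print(result["reply"])    # only the two replies before the gap
```

In this model all four requests are recorded, which corresponds to them showing up in http.log, while only the first two replies are seen, which corresponds to only two files reaching files.log.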

Seth Hall
April 10, 2014, 4:07 PM

This has been a long-standing issue that we've always deferred, so we just left the HTTP analyzer as one that can't tolerate packet loss. We've discussed some ways of making this "re-synchronization" support a general concept in BinPAC++, which probably makes the most sense. I'm really not sure there's much point in spending lots of time on the current HTTP analyzer, since it has had this behavior for a long time. That said, if it's an easy change and unlikely to break stuff too badly, it would make sense to do it.

If we changed this, it's possible that we'd have to revisit the base HTTP scripts too, to make sure they can cope with re-synchronization appropriately.
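As a rough illustration of what "re-synchronization" could mean for the reply side, the Python sketch below drops everything after a gap until it sees a line that looks like an HTTP status line, then resumes parsing. This is just one possible heuristic under stated assumptions, not something the current analyzer or BinPAC++ does; the function name, the regular expression, and the input format (a list of lines with `None` marking a gap) are all hypothetical.

```python
import re

# Hypothetical heuristic: a line such as "HTTP/1.1 200 OK" is taken as
# the start of a new reply.
STATUS_LINE = re.compile(r"^HTTP/\d\.\d \d{3}")


def parse_replies_with_resync(lines):
    """Parse reply lines; None marks a content gap.

    After a gap, lines are skipped until the next status-line match,
    at which point parsing resumes.
    """
    replies = []
    in_sync = True
    for line in lines:
        if line is None:
            # Content gap: whatever follows cannot be trusted yet.
            in_sync = False
            continue
        if STATUS_LINE.match(line):
            # A status line (re-)establishes synchronization.
            in_sync = True
            replies.append({"status": line, "headers": []})
        elif in_sync and replies and ":" in line:
            replies[-1]["headers"].append(line)
        # Otherwise the line is either out-of-sync debris or content
        # this sketch does not model, so it is dropped.
    return replies
```

A body that happens to contain a line resembling a status line would fool this heuristic, which is part of why scripts consuming the events would need to cope with replies that no longer pair up cleanly with requests.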

Assignee

Unassigned

Reporter

Robin Sommer

Labels

None

External issue ID

None

Components

Fix versions

Priority

Normal