Bug in SPICY match() Method :: Does Not Match an Extended ASCII Char

Description

I am developing a protocol analyzer using Spicy, using the BRO/HILTI/SPICY docker image posted at the following URL: https://hub.docker.com/r/rsmmr/hilti/

In my Spicy protocol analyzer, I am trying to match a pattern. My data type is ‘bytes’, and I am using the ‘match()’ method. If the pattern includes an extended ASCII character (in the range 0x80 to 0xFF), the pattern fails to find a match. However, if I replace the extended ASCII character with a wildcard, it finds a match. While I cannot share source code from my original project, I created a sample project to demonstrate the bug.

I will upload the following sample files so that you may attempt to reproduce the bug:
(1) regex_test.spicy*
(2) regex_test.evt*
(3) regex_test.bro*
(4) smb-browser-elections.pcap

* NOTE: There might be CR/LF issues because I saved these files with Notepad on a Windows box.
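If the line endings do cause trouble, a minimal Python sketch like the following could convert the three text files to LF endings before use. The helper names `normalize_newlines` and `normalize_file` are hypothetical, and the script assumes the files sit in the current directory under the names listed above (without the trailing asterisk).

```python
from pathlib import Path


def normalize_newlines(data: bytes) -> bytes:
    """Convert CR/LF (and any stray CR) line endings to LF."""
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


def normalize_file(path: str) -> None:
    """Rewrite a text file in place with LF line endings (hypothetical helper)."""
    p = Path(path)
    p.write_bytes(normalize_newlines(p.read_bytes()))


if __name__ == "__main__":
    # Assumed file names, taken from the list of sample files above.
    for name in ("regex_test.spicy", "regex_test.evt", "regex_test.bro"):
        if Path(name).exists():
            normalize_file(name)
```

The pcap file is binary and is deliberately left out of the loop, since rewriting byte sequences in it would corrupt the capture.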

As my sample data, I downloaded an SMB pcap file from wireshark.org. The regular expression patterns below are based on Frame #3, NetBIOS/SMB datagram, in the SMB pcap file 'smb-browser-elections.pcap' downloaded from the wireshark website, at the following URL:

https://wiki.wireshark.org/SampleCaptures?action=AttachFile&do=get&target=smb-browser-elections.pcapng

Here are my sample regex patterns**:

    # Appears at offset 0x2A in Frame 3
    # or offset 0x00 within UDP payload
    const SmbRegEx_1a = /^\x11\x02.\x16/;
    const SmbRegEx_1b = /^\x11\x02\x82\x16/;

    # Appears at offset 0x2D in Frame 3
    # or offset 0x03 within UDP payload
    const SmbRegEx_2a = /\x16..\x7B/;
    const SmbRegEx_2b = /\x16\xC0\xA8\x7B/;

Patterns _1a and _2a match successfully, because they include the wildcard in place of the extended ASCII character(s).

Patterns _1b and _2b do not match, because they contain offending character(s) in the extended ASCII range.
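As a cross-check of the behavior I would expect, here is a minimal sketch that runs the same four patterns through Python's `re` module in bytes mode. The buffer is an assumption: it is a synthetic seven-byte sequence reconstructed from what patterns _1b and _2b spell out, not data read from the capture, and the result says nothing about how Spicy's match() is implemented.

```python
import re

# Synthetic buffer (an assumption): the first seven UDP payload bytes implied
# by patterns _1b and _2b, not bytes read from the capture itself.
payload = bytes([0x11, 0x02, 0x82, 0x16, 0xC0, 0xA8, 0x7B])

patterns = {
    "SmbRegEx_1a": rb"^\x11\x02.\x16",
    "SmbRegEx_1b": rb"^\x11\x02\x82\x16",
    "SmbRegEx_2a": rb"\x16..\x7B",
    "SmbRegEx_2b": rb"\x16\xC0\xA8\x7B",
}

# re.DOTALL lets "." match any byte value, including 0x0A.
results = {
    name: re.search(pattern, payload, re.DOTALL) is not None
    for name, pattern in patterns.items()
}

for name, matched in results.items():
    print(name, "matches" if matched else "does not match")
```

In a byte-oriented engine all four patterns match this buffer, which is why I would expect _1b and _2b to behave the same as their wildcard counterparts.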

** NOTE: My Spicy source code contains a third regex pattern, shown below:

    # Appears at offset 0x78 in Frame 3
    # or offset 0x4E within UDP payload
    const SmbRegEx_3a = /\x43\x41\x42\x00.\x53\x4D\x42/;
    const SmbRegEx_3b = /\x43\x41\x42\x00\xFF\x53\x4D\x42/;

Interestingly, this pattern fails to match for both _3a and _3b. I would expect _3a to match because it contains the wildcard. Not sure what is going wrong with this pattern. Is there a certain depth/limit at which the match() method will stop searching?
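For comparison, a minimal Python sketch of the third pattern against a synthetic buffer. The buffer is an assumption: 0x4E zero bytes of filler followed by the eight bytes that _3b spells out, so it only mimics the offset, not the real payload. It shows that an unanchored search in a byte-oriented engine has no inherent depth limit; whether Spicy's match() has one is exactly the open question.

```python
import re

# Synthetic buffer (an assumption): 0x4E filler bytes followed by the eight
# bytes that pattern _3b spells out. The real payload will differ.
OFFSET = 0x4E
payload = bytes(OFFSET) + b"\x43\x41\x42\x00\xFF\x53\x4D\x42"

# re.DOTALL lets "." match any byte value, including 0x0A.
pat_3a = re.compile(rb"\x43\x41\x42\x00.\x53\x4D\x42", re.DOTALL)
pat_3b = re.compile(rb"\x43\x41\x42\x00\xFF\x53\x4D\x42")

# search() scans the whole buffer rather than anchoring at offset 0.
m3a = pat_3a.search(payload)
m3b = pat_3b.search(payload)

print("3a found at", hex(m3a.start()) if m3a else None)
print("3b found at", hex(m3b.start()) if m3b else None)
```

Both patterns are found at offset 0x4E here, so the failure of _3a in Spicy does not look like a property of the pattern itself.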

Thanks!
Mark

Environment

None

Assignee

Unassigned

Reporter

Mark Fernandez

Labels

External issue ID

None

Components

Priority

Normal