Some time ago someone in #bro asked about matching mail addresses using the intel framework. We realized that the seen script seems to contain a bug: Using
to extract a mail address drops the last character and does not account for the possibility of multiple addresses.
I will add a pcap later.
Having a look at this issue, I noticed another problem with SMTP: Bro assumes that, e.g., the To field contains a comma-separated list of mail addresses. According to RFC 5322, it is also possible to use groups (see below).
Regarding groups, I am not sure whether they can be nested. If I am not mistaken, the grammar in the RFC would allow nested groups. As I understand it, though, this is not intended for the Destination Address Fields:
the field name, which is either "To", "Cc", or "Bcc", followed by a comma-separated list of one or more addresses (either mailbox or group syntax)
That leads to two questions for me:
Would it be sufficient for Bro to extract just the addresses (usually what's inside < and >) without full names (the description quoted with ")?
If full names are desired, should Bro support nested group-syntax?
I think option 1 (just log the plain addresses) should be sufficient, because anyone interested in more details can have a look at the raw headers themselves.
What do you think about that?
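To illustrate option 1, here is a rough sketch in Python (not Bro script, and not the current seen-script logic). It drops quoted display names and then collects everything that looks like an address, so group syntax does not need special handling. The function name `extract_addresses` and both regular expressions are assumptions made for illustration; the address pattern is a simplification and does not cover the full RFC 5322 grammar (comments, quoted local parts, domain literals).

```python
import re

# Quoted display names such as "Carol C." (may contain escaped characters).
QUOTED = re.compile(r'"(?:[^"\\]|\\.)*"')

# Simplified addr-spec: dot-atom local part, "@", plain domain name.
# Assumption: this deliberately ignores quoted local parts and domain literals.
ADDR = re.compile(r"[A-Za-z0-9.!#$%&'*+/=?^_`{|}~-]+@[A-Za-z0-9.-]+")


def extract_addresses(header_value):
    """Return the plain addresses found in a To/Cc/Bcc header value."""
    # Remove quoted names first so an "@" inside a name is not matched.
    without_names = QUOTED.sub("", header_value)
    # Group names, angle brackets, commas and semicolons are not part of
    # the address pattern, so they are skipped automatically.
    return ADDR.findall(without_names)


if __name__ == "__main__":
    header = ('Alice <alice@example.org>, '
              'Team: bob@example.org, "Carol C." <carol@example.com>;, '
              'dave@example.net')
    for address in extract_addresses(header):
        print(address)
```

Because the group name and the closing semicolon are simply ignored, group members come out as one flat list, which is what option 1 would log.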
assigning to Seth as I believe he started looking at this already
Done. Thanks Jan!