
                    1. HoneyD Challenge Submission
        Honeycomb, an IDS Signature Generator for Honeyd Traffic

========================================================================

Author: Christian Kreibich
Email:  Christian.Kreibich@cl.cam.ac.uk
[ Address removed.  Niels Provos ]



Hi guys,


here's my humble contribution to your Honeyd contest: it is a pattern
detection engine for the network traffic passing through Honeyd,
including a signature generator that currently outputs Snort
signatures. It's called Honeycomb, as it combs the data in your honeypot
for useful stuff (think of the guys in Spaceballs combing the desert :)

I originally hoped to have a paper-like document ready by the time of
the deadline but unfortunately ran out of time, so this email will have
to make do for now.

System Description
==================

The basic idea is as follows: given that we're dealing with a honeypot
here, we know that any traffic we see is basically not supposed to be
there. It hence would be cool to have an engine that looks for patterns
and anomalies in the traffic automatically, not by comparing the traffic
to an existing pattern set, but by comparing the traffic to previously
seen traffic itself and by performing sanity checks on packet headers
etc. The system is clearly reactive in nature, but could still be
really helpful to get a quick grasp of what's been happening to your
honeypot on a higher abstraction level than tcpdump logs, but in more
detail than the syslog messages honeyd generates. Think of a worm that
hits your pot twice, or typical cgi exploits -- if the system works
correctly, the entire characteristic part of those attacks should show
up in a new signature.

Now, why does this need to live in honeyd? Well, it could be put in an
external monitor watching the traffic in and out of the honeypot, but

- it saves the overhead of grabbing the packets, as honeyd already does
that.
- honeyd already does IP fragment reassembly
- honeyd *is* a honeypot. That means, only while it's running can there
be any traffic. We hence eliminate any cold start or state synchronization
issues compared to an external system that can be started/stopped while
honeyd keeps running.

Obviously the signatures generated aren't instantly useful in a production
environment, but could nevertheless prove of great value if unseen new
attacks are becoming usable by script kiddies and are hence attempted repeatedly.


Packet Handling
===============

The system basically handles packets as follows:

- IP, UDP and TCP packet headers are compared to a number of previously
seen ones on a header field basis. Matching fields (or also partially 
matching ones like IP address ranges) are reported.

- TCP stream reassembly is performed and packet payload is mined for 
similar content. The system maintains state for a number of recent TCP
connections and keeps the messages exchanged available. By message I mean
payload data sent in one direction without real data (other than ACKs)
flowing in the other direction. Think HTTP request/answer for example.
The system then investigates matching TCP messages using a longest
common substring algorithm, finding the largest match in the payload
possible, and adding it to a new signature.
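
As a rough illustration of the two steps above, here is a minimal Python
sketch. It is not Honeycomb's actual C code. It compares the header fields
of two packets represented as plain dictionaries, and it finds a longest
common substring of two payloads with a simple dynamic-programming table
instead of the suffix tree that libstree provides. The function names and
the dictionary layout are assumptions made for this example only.

```python
def matching_fields(hdr_a, hdr_b):
    """Return the header fields whose values are equal in both packets.

    Headers are modelled as dicts such as {"proto": "tcp", "dst_port": 80};
    this layout is hypothetical and only serves the illustration.
    """
    return {name: value for name, value in hdr_a.items()
            if name in hdr_b and hdr_b[name] == value}


def longest_common_substring(a, b):
    """Return one longest common substring of two byte strings.

    Plain O(len(a) * len(b)) dynamic programming. A suffix tree solves
    the same problem in linear time but takes more code.
    """
    best_len, best_end = 0, 0
    prev = [0] * (len(b) + 1)          # match lengths for the previous row
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the common run ending at a[i-2], b[j-2].
                cur[j] = prev[j - 1] + 1
                if cur[j] > best_len:
                    best_len, best_end = cur[j], i
        prev = cur
    return a[best_end - best_len:best_end]
```

Calling longest_common_substring() on two reassembled messages from
different connections yields the shared byte sequence that would become
the content part of a signature, while matching_fields() yields the
header fields that could go into the rule header.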


System Design
=============

In order to keep the impact on the honeyd code at a minimum, I've extended
honeyd 0.5 by adding a plugin engine and pattern inspection hooks, which
only required very few real changes to the code. Honeycomb itself is a
plugin and doesn't interfere with honeyd at all. My contribution contains
the following:

- honeyd-hooks-0.5.0.tar.gz

This is my modified version of honeyd-0.5. Look at plugins.[ch] and
hooks.[ch] for the stuff I've added. I've also done pretty thorough
cleanups of the automake/autoconf build stuff as it didn't work very well
on my system (e.g. the libhoneyd hack broke as soon as a different libtool
was used, libdnet is not detected as dnet is dumbnet on Debian etc). It
does contain the two patches mentioned on the website. I hope I didn't
break anything. A more detailed list of changes is at the bottom of this
mail.

- honeycomb-0.1.tar.gz

My plugin. Install the modified honeyd first, and then make sure this
plugin's configure script picks it up; use --with-honeyd=blah if necessary.

- libstree-0.1.0.tar.gz

This is a generic suffix tree implementation providing a longest common
substring algorithm. It's probably not particularly valuable
for this competition but took the longest time to implement :) You need
it installed for honeycomb to build.

I have not been able to test the builds on many systems, in particular
not on BSDs yet, sorry. I hope things won't be hard to get working. I have
built the thing on a Debian Linux system. There was generally not much time
for testing (I basically finished the code two hours ago) so there will
be bugs left. Sorry. There is however ample documentation in the code ...

Feedback is appreciated. If you have any questions regarding why things
don't work etc just let me know. I'll be offline for a few days but will
get back to you next week.


How to play with it
===================

Basically

1. Install the patched honeyd.
2. Install libstree.
3. Install the Honeycomb plugin.
4. Run honeyd. You should see a message that the plugin got picked up.
5. Watch the signatures that appear in /tmp/honeycomb.log

Configuration should be through a config file but that's not there yet.
Look at the values in honeycomb.h to see what can be tweaked.


TODO list
=========

I think this thing clearly deserves more time than the three or so weeks
I had to implement everything. In particular, one problem now is that
the longest common substrings found aren't necessarily part of the area
of the payload that contains the relevant data. The approach of analyzing
packet payload should also incorporate the protocol we're dealing with,
e.g. HTTP etc. The longest common substring algorithm in libstree is
flexible enough to give you multiple longest strings, only those up to
a certain length etc, so there's some room for experimenting.

Another thing is a better mechanism for "accepting" generated signatures.
Right now a new signature is printed out once it is different to all
those previously printed (up to a certain number). Additional criteria
could be a minimum number of features included in a signature, for example.
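
The acceptance rule described above could be sketched roughly as follows.
This is a minimal Python illustration, not the plugin's code: the class
name, the parameters and the representation of a signature as a dict of
features are all assumptions. The minimum-feature check is the proposed
extension, not something Honeycomb does today.

```python
from collections import deque


class SignatureFilter:
    """Accept a signature only if it is new and has enough features."""

    def __init__(self, history_size=3, min_features=1):
        # Bounded history: the oldest entry drops out once it is full.
        self.recent = deque(maxlen=history_size)
        self.min_features = min_features

    def accept(self, signature):
        # A signature is modelled as a dict of hashable feature values.
        key = frozenset(signature.items())
        if len(key) < self.min_features:
            return False               # too few features to be useful
        if key in self.recent:
            return False               # same as a recently printed one
        self.recent.append(key)
        return True
```

Because the history is bounded, a signature that has dropped out of it is
accepted (and printed) again the next time it is generated.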

ICMP support is not yet there.


Honeyd Changes:
===============

More detailed list of honeyd changes:

- Fixed configure check to honour dumbnet.h
- Added dnet.h compatibility wrapper to compat/ directory
- Fixed configure.in so that it correctly finds /usr/bin/dnet-conf
  by default.
- My build sometimes aborted saying that before I could use
  "CFLAGS += ...", CFLAGS must be defined somewhere. I removed the +
  as it wasn't used in my case.
- included grp.h in command.c to fix warning
- changed a few NULLs to 0s to fix warnings
- Added an option -o to use own packets -- I found that useful
  to test my honeyd on a standalone laptop, sending data to my vmnet1.
- Switched to getopt_long and added --plugin-dir to display the
  directory used for plugins.
- hooks.[ch]: A simple list-based implementation of hooks
  for packet data. Users can register hooks on a per-protocol basis
  for the various IP_PROTO_xxx constants.
- plugins.[ch]: Added ltdl support to dynamically load in plugins
  installed in $(datadir)/honeyd/plugins. Makefile.am passes that
  value as a #define to each file as PATH_HONEYDPLUGINS.
- Revamped the help output to be a bit clearer.
- Added libltdl to provide plugin support.
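
To give an idea of the per-protocol hook mechanism mentioned in the list
above, here is a minimal Python sketch of a list-based hook registry. It
does not reproduce the actual hooks.[ch] API: register_hook() and
run_hooks() are hypothetical names, and the constants simply mirror the
standard IP protocol numbers for TCP and UDP.

```python
IP_PROTO_TCP = 6     # standard IP protocol number for TCP
IP_PROTO_UDP = 17    # standard IP protocol number for UDP

_hooks = {}          # protocol number -> list of callbacks


def register_hook(proto, callback):
    """Append a callback to the hook list of one protocol."""
    _hooks.setdefault(proto, []).append(callback)


def run_hooks(proto, packet):
    """Call every hook registered for this protocol, in order."""
    for callback in _hooks.get(proto, ()):
        callback(packet)
```

A plugin would register its packet handler once at load time, and the
packet-processing code would call run_hooks() for each packet it sees.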

