John McHugh | 28 Jun 19:07 2012

Out of order pcap timestamps (discussion)

The tcpreplay FAQ says "More specifically, I have seen cases where a packet has a timestamp before the
previous packet in the capture file. I'm not sure how such a pcap got created, but it seems to occasionally happen."

The purpose of this note is to provide a plausible explanation for timestamp reversals and a bit of advice
for those who are capturing pcap.  The posting is somewhat lengthy, as I think the context is important.  It
is not specific to tcpreplay except in the context of the FAQ quotation above.
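
For readers who want to check their own captures for the condition the FAQ describes, here is a minimal sketch in Python. It assumes the classic pcap file layout (24-byte global header, 16-byte record headers, microsecond timestamps) and counts records stamped earlier than the record just before them. The function name is made up for illustration; it is not part of tcpreplay, libpcap or SiLK.

```python
import struct

def count_timestamp_reversals(data: bytes) -> int:
    """Count records whose timestamp is earlier than the preceding record's."""
    magic = data[:4]
    if magic == b"\xd4\xc3\xb2\xa1":      # 0xa1b2c3d4 written by a little-endian host
        endian = "<"
    elif magic == b"\xa1\xb2\xc3\xd4":    # same magic written by a big-endian host
        endian = ">"
    else:
        raise ValueError("not a classic microsecond-resolution pcap file")

    offset = 24                           # skip the global file header
    previous = None
    reversals = 0
    while offset + 16 <= len(data):
        ts_sec, ts_usec, incl_len, _orig_len = struct.unpack_from(
            endian + "IIII", data, offset)
        stamp = (ts_sec, ts_usec)
        if previous is not None and stamp < previous:
            reversals += 1                # this packet is stamped before its predecessor
        previous = stamp
        offset += 16 + incl_len           # record header plus captured bytes
    return reversals
```

Usage would be along the lines of count_timestamp_reversals(open("trace.pcap", "rb").read()), where "trace.pcap" is a placeholder name. Reading the whole file into memory is only reasonable for small traces.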

1) There are two publicly available data sets that manifest the problem. (There are probably many more, but
these are available.)
  a) The Crawdad (Dartmouth) 2003-2004 wireless packet header traces - about 4 months of anonymized headers
from 18 wireless sniffers located on the Dartmouth campus.  Packets were truncated after the port fields
for TCP and UDP, after the IP header for everything else. The user agreement is fairly benign, but
attacking the anonymization is off limits.
  b) The IARPA data set available from the DHS Predict repository.  Note that a Predict user agreement is
required, and it demands institutional or corporate level responsibility.  This is artificial data with
background and attack scenarios captured by 3 tcpdump probes attached to trunks within a simulated /11
network.  It contains tcpdump traces as well as logs, alerts, etc., along with ground truth for labeled attacks.

2) I have primarily worked with NetFlow data, so packet headers are fine for what I do.  I am paranoid about
data, so I do a lot of sanity checking.  I use the SiLK tools from CERT NETSA as my primary tool base, sometimes
heavily modified.  To convert pcap to "degenerate" flows (1 packet produces 1 flow record), I use a
modified "rwptoflow" program.

3) The Dartmouth data is about 160GB of compressed packet header files (a week or more to download on a
modest DSL line).  I modified libpcap (the modification is in the latest source release) to allow reading of
compressed files, using a hack that supports filenames of the form "| gunzip -c <file>" to let
pcap_open_offline use popen / pclose on such forms (thanks to Phil Budne's SNOBOL 4 in C).  I also modified
rwptoflow to treat file names starting with "@" as a list of files to be processed so that I could process all
4 months of data for one sensor in one pass, producing a date hierarchy of hourly SiLK files.  (mergecap,
distributed with wireshark, opens all the files at once and runs out of file descriptors for some sensors
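
The two file-naming conventions described in item 3 can be illustrated with a short Python sketch. This is not the libpcap or rwptoflow change itself (those are in C); it only mimics the idea, and both function names are invented for the example. It assumes a POSIX shell is available and that the names come from a trusted source, since a leading "|" hands the rest of the string to the shell much as popen would.

```python
import subprocess

def expand_inputs(names):
    """Expand '@listfile' arguments into the names listed inside the file."""
    for name in names:
        if name.startswith("@"):
            with open(name[1:]) as listing:
                for line in listing:
                    entry = line.strip()
                    if entry:             # skip blank lines in the list file
                        yield entry
        else:
            yield name

def read_capture(name):
    """Return raw capture bytes; a leading '|' means run the rest as a command."""
    if name.startswith("|"):
        # shell=True mirrors popen(); only use with trusted names
        result = subprocess.run(name[1:], shell=True, check=True,
                                stdout=subprocess.PIPE)
        return result.stdout
    with open(name, "rb") as f:
        return f.read()
```

Because each name is opened, read and closed in turn, only one file descriptor is in use at a time regardless of how many files the list contains.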

Aaron Turner | 28 Jun 19:38 2012

Re: Out of order pcap timestamps (discussion)

Wow, that's a lot of information.  I'll add one more case I'm aware of:

Applications capturing on multiple interfaces and writing to a single
file.  Depending on who does the timestamping (the NIC or the CPU) and
on how you're poll()'ing the interfaces, this can lead to small jumps
back in time.  Basically, if the NIC is doing the timestamping of the
packet (which is hardware/driver dependent) it will put the "correct
time" in the packet header, but it's quite probable that the
user-space application which is polling those NICs won't actually
read the packets in the order they were received.

Basically, userspace would have to cache packets in memory and reorder
them on the fly before writing the pcap file, but I suspect a lot of
people don't bother to do this.  Or you could write them to separate
files and merge them offline.
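
A minimal sketch of the in-memory reordering idea, in Python: hold back the last few packets in a min-heap keyed on timestamp and only release the oldest once the buffer is over a chosen depth. The class name and the depth parameter are assumptions made for the example, not anything taken from tcpreplay or libpcap. It can only repair reversals that fall within the buffer depth.

```python
import heapq
import itertools

class ReorderBuffer:
    """Hold back up to `depth` packets and release them in timestamp order."""

    def __init__(self, depth):
        self.depth = depth
        self._heap = []
        # Sequence number breaks ties so equal timestamps keep arrival order
        # and packet payloads are never compared with each other.
        self._seq = itertools.count()

    def push(self, timestamp, packet):
        """Add one packet; return the (timestamp, packet) pairs ready to write."""
        heapq.heappush(self._heap, (timestamp, next(self._seq), packet))
        ready = []
        while len(self._heap) > self.depth:
            ts, _, pkt = heapq.heappop(self._heap)
            ready.append((ts, pkt))
        return ready

    def flush(self):
        """Drain everything still buffered, oldest first (call at end of capture)."""
        ready = []
        while self._heap:
            ts, _, pkt = heapq.heappop(self._heap)
            ready.append((ts, pkt))
        return ready
```

For the offline alternative, where each interface was written to its own already-ordered file, the standard library's heapq.merge can interleave the per-file streams by timestamp without loading them all into memory.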

On Thu, Jun 28, 2012 at 10:07 AM, John McHugh <mchugh <at> cs.unc.edu> wrote:
> The tcpreplay FAQ says "More specifically, I have seen cases where a packet has a timestamp before the
> previous packet in the capture file. I'm not sure how such a pcap got created, but it seems to occasionally happen."
>
> The purpose of this note is to provide a plausible explanation for timestamp reversals and a bit of advice
> for those who are capturing pcap.  The posting is somewhat lengthy, as I think the context is important.
> It is not specific to tcpreplay except in the context of the FAQ quotation above.
>
> 1) There are two publicly available data sets that manifest the problem. (There are probably many more,
> but these are available.)
>   a) The Crawdad (Dartmouth) 2003-2004 wireless packet header traces - about 4 months of anonymized
> headers from 18 wireless sniffers located on the Dartmouth campus.  Packets were truncated after the
> port fields for TCP and UDP, after the IP header for everything else. The user agreement is fairly benign,
> but attacking the anonymization is off limits.

