1) Clone the original STAMP benchmarks:
git clone https://github.com/kozyraki/stamp
2) Obtain and apply our patch:
cd stamp
wget http://ditec.um.es/~rtitos/patches/intruder-v-mmap/intruder-v-mmap.patch
patch -p2 < intruder-v-mmap
3) Build the modified version of intruder (NOTE: Makefile.seq builds the sequential flavour only suitable for single thread runs!)
cd intruder
make -f Makefile.seq
4) Generate serialized packet stream file (-o FILE)
./intruder -a10 -l128 -n262144 -s1 -t1 -o stream-l128-n256k-s1.input
5) Run benchmark passing serialized packet stream file (-i FILE) and page fault frequency (-f NUM) as arguments
./intruder -a10 -l128 -n262144 -s1 -t1 -i stream-l128-n256k-s1.input -f 50
./intruder -a10 -l128 -n262144 -s1 -t1 -i stream-l128-n256k-s1.input -f 50
The following inputs are recommended for this modified version of the benchmark:
./intruder -a10 -l128 -n4096 -s1 -t16 -i inputs/packetstream-l128-n4096 -f20
The patched version of intruder differs from the original STAMP
benchmark in two aspects that may be enabled/disabled separately by editing intruder/Defines.common.mk
CFLAGS += -DMMAP_SERIALIZED_STREAM
1) The stream of packets used as input data is not generated during the initialization phase (stream_generate), as this moves all page faults ocurring in the heap area away from the region of interest of the benchmak. Instead, threads obtain packets using memory-mapped I/O during the parallel phase, reading from an input file that contains a previously serialized packet stream. Moreover, a varying fraction of the mmap'ed region is touched during initialization, so as to model scenarios with different page fault frequencies (e.g. 20\% of the pages in the memory-mapped region are touched), according to the -f command-line option.
CFLAGS += -DUSE_FRAGMENT_VECTOR
2) Intruder's main transaction (decoder_process) processes a packet by looking up its flowId in a red-black tree of flows, where each node stores packets belonging to the same flow. The original benchmark uses a sorted linked list to keep packets, until all packets ("fragments") have been collected and the flow can be reassembled. This design leads to i) every single transaction performing dynamic memory allocation, at the very least to insert a new node in the list; and ii) every single transaction traverses the fragment list to insert a packet in the appropriate position to keep the list sorted. Because the length (number of packets) of a flow is known a priori, and fragment have consecutive id's, it is perfectly possible (and more efficient) to use a vector instead of a linked list, so that transactions i) only allocate dynamic memory when handling the first packet from a given flow (a new node is inserted into the flow tree, and the vector of packets is allocated); and ii) fragment insertion is done in constant time, rather than in linear time.