Add support for slice by 8 to existing crc32 algorithm. Also modify
gen_crc32table.c to only produce table entries that are actually used.
The parameters CRC_LE_BITS and CRC_BE_BITS determine the number of bits in
the input array that are processed during each step. Generally the more
bits the faster the algorithm is but the more table data required.
Using an x86_64 Opteron machine running at 2100MHz the following table was
collected with a pre-warmed cache by computing the crc 1000 times on a
buffer of 4096 bytes.
BITS is the value of CRC_LE_BITS or CRC_BE_BITS. The old
default was 8 which actually selected the 32 bit algorithm. In
this version the value 8 is used to select the standard
8 bit algorithm and two new values: 32 and 64 are introduced
to select the slice by 4 and slice by 8 algorithms respectively.
Where Size is the size of crc32.o's text segment which includes
code and table data when both LE and BE versions are set to BITS.
The current version of crc32.c by default uses the slice by 4 algorithm
which requires about 2.8 cycles per byte. The slice by 8 algorithm is
roughly 2X faster and enables packet processing at over 1GB/sec on a
typical 2-3GHz system.
Signed-off-by: Bob Pearson <rpearson@systemfabricworks.com> Cc: Roland Dreier <roland@kernel.org> Cc: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>