A PAINLESS GUIDE TO CRC ERROR DETECTION ALGORITHMS

Contents:


Chapter 12) "Reflected" Table-Driven Implementations

Despite the fact that the above code is probably optimized about as much as it could be, this did not stop some enterprising individuals from making things even more complicated. To understand how this happened, we have to enter the world of hardware.

DEFINITION: A value/register is reflected if it's bits are swapped around its centre. For example: 0101 is the 4-bit reflection of 1010.

0011 is the reflection of 1100.
0111-0101-1010-1111-0010-0101-1011-1100 is the reflection of
0011-1101-1010-0100-1111-0101-1010-1110.

Turns out that UARTs (those handy little chips that perform serial IO) are in the habit of transmitting each byte with the least significant bit (bit 0) first and the most significant bit (bit 7) last (i.e. reflected). An effect of this convention is that hardware engineers constructing hardware CRC calculators that operate at the bit level took to calculating CRCs of bytes streams with each of the bytes reflected within itself. The bytes are processed in the same order, but the bits in each byte are swapped; bit 0 is now bit 7, bit 1 is now bit 6, and so on. Now this wouldn't matter much if this convention was restricted to hardware land. However it seems that at some stage some of these CRC values were presented at the software level and someone had to write some code that would interoperate with the hardware CRC calculation.

In this situation, a normal sane software engineer would simply reflect each byte before processing it. However, it would seem that normal sane software engineers were thin on the ground when this early ground was being broken, because instead of reflecting the bytes, whoever was responsible held down the byte and reflected the world, leading to the following "reflected" algorithm which is identical to the previous one except that everything is reflected except the input bytes.

        Message (non augmented) >-----+
                                      |
      Bytes   0    1    2    3        v
           +----+----+----+----+      |
           |    |    |    |    |>----XOR
           +----+----+----+----+      |
                     ^                |
                     |                |
                    XOR               |
                     |                |
           +----+----+----+----+0     |
           +----+----+----+----+      v
           +----+----+----+----+      |
           +----+----+----+----+      |
           +----+----+----+----+      |
           +----+----+----+----+      |
           +----+----+----+----+      |
           +----+----+----+----+<-----+
           +----+----+----+----+
           +----+----+----+----+
           +----+----+----+----+
           +----+----+----+----+
           +----+----+----+----+255

Notes:

At the end of execution, the register contains the reflection of the final CRC value (remainder). Actually, I'm being rather hard on whoever cooked this up because it seems that hardware implementations of the CRC algorithm used the reflected checksum value and so producing a reflected CRC was just right. In fact reflecting the world was probably a good engineering solution - if a confusing one.

We will call this the REFLECTED algorithm.

Whether or not it made sense at the time, the effect of having reflected algorithms kicking around the world's FTP sites is that about half the CRC implementations one runs into are reflected and the other half not. It's really terribly confusing. In particular, it would seem to me that the casual reader who runs into a reflected, table-driven implementation with the bytes "fed in the wrong end" would have Buckley's chance of ever connecting the code to the concept of binary mod 2 division.

It couldn't get any more confusing could it? Yes it could.


Chapter 13) "Reversed" Polys

As if reflected implementations weren't enough, there is another concept kicking around which makes the situation bizaarly confusing. The concept is reversed Polys.

It turns out that the reflection of good polys tend to be good polys too! That is, if G=11101 is a good poly value, then 10111 will be as well. As a consequence, it seems that every time an organization (such as CCITT) standardizes on a particularly good poly ("polynomial"), those in the real world can't leave the poly's reflection alone either. They just HAVE to use it. As a result, the set of "standard" poly's has a corresponding set of reflections, which are also in use. To avoid confusion, we will call these the "reversed" polys.

   X25   standard: 1-0001-0000-0010-0001
   X25   reversed: 1-0000-1000-0001-0001

   CRC16 standard: 1-1000-0000-0000-0101
   CRC16 reversed: 1-0100-0000-0000-0011

Note that here it is the entire poly that is being reflected/reversed, not just the bottom W bits. This is an important distinction. In the reflected algorithm described in the previous section, the poly used in the reflected algorithm was actually identical to that used in the non-reflected algorithm; all that had happened is that the bytes had effectively been reflected. As such, all the 16-bit/32-bit numbers in the algorithm had to be reflected. In contrast, the ENTIRE poly includes the implicit one bit at the top, and so reversing a poly is not the same as reflecting its bottom 16 or 32 bits.

The upshot of all this is that a reflected algorithm is not equivalent to the original algorithm with the poly reflected. Actually, this is probably less confusing than if they were duals.

If all this seems a bit unclear, don't worry, because we're going to sort it all out "real soon now". Just one more section to go before that.


Chapter 14) Initial and Final Values

In addition to the complexity already seen, CRC algorithms differ from each other in two other regards:

For example, the "CRC32" algorithm initializes its register to FFFFFFFF and XORs the final register value with FFFFFFFF.

Most CRC algorithms initialize their register to zero. However, some initialize it to a non-zero value. In theory (i.e. with no assumptions about the message), the initial value has no affect on the strength of the CRC algorithm, the initial value merely providing a fixed starting point from which the register value can progress. However, in practice, some messages are more likely than others, and it is wise to initialize the CRC algorithm register to a value that does not have "blind spots" that are likely to occur in practice. By "blind spot" is meant a sequence of message bytes that do not result in the register changing its value. In particular, any CRC algorithm that initializes its register to zero will have a blind spot of zero when it starts up and will be unable to "count" a leading run of zero bytes. As a leading run of zero bytes is quite common in real messages, it is wise to initialize the algorithm register to a non-zero value.


Chapter 15) Defining Algorithms Absolutely

At this point we have covered all the different aspects of table-driven CRC algorithms. As there are so many variations on these algorithms, it is worth trying to establish a nomenclature for them. This section attempts to do that.

We have seen that CRC algorithms vary in:

In order to be able to talk about particular CRC algorithms, we need to able to define them more precisely than this. For this reason, the next section attempts to provide a well-defined parameterized model for CRC algorithms. To refer to a particular algorithm, we need then simply specify the algorithm in terms of parameters to the model.


Chapter 16) A Parameterized Model For CRC Algorithms

In this section we define a precise parameterized model CRC algorithm which, for want of a better name, we will call the "Rocksoft^tm Model CRC Algorithm" (and why not? Rocksoft^tm could do with some free advertising :-).

The most important aspect of the model algorithm is that it focusses exclusively on functionality, ignoring all implementation details. The aim of the exercise is to construct a way of referring precisely to particular CRC algorithms, regardless of how confusingly they are implemented. To this end, the model must be as simple and precise as possible, with as little confusion as possible.

The Rocksoft^tm Model CRC Algorithm is based essentially on the DIRECT TABLE ALGORITHM specified earlier. However, the algorithm has to be further parameterized to enable it to behave in the same way as some of the messier algorithms out in the real world.

To enable the algorithm to behave like reflected algorithms, we provide a boolean option to reflect the input bytes, and a boolean option to specify whether to reflect the output checksum value. By framing reflection as an input/output transformation, we avoid the confusion of having to mentally map the parameters of reflected and non-reflected algorithms.

An extra parameter allows the algorithm's register to be initialized to a particular value. A further parameter is XORed with the final value before it is returned.

By putting all these pieces together we end up with the parameters of the algorithm:

NAME
This is a name given to the algorithm. A string value.
WIDTH
This is the width of the algorithm expressed in bits. This is one less than the width of the Poly.
POLY
This parameter is the poly. This is a binary value that should be specified as a hexadecimal number. The top bit of the poly should be omitted. For example, if the poly is 10110, you should specify 06. An important aspect of this parameter is that it represents the unreflected poly; the bottom bit of this parameter is always the LSB of the divisor during the division regardless of whether the algorithm being modelled is reflected.
INIT
This parameter specifies the initial value of the register when the algorithm starts. This is the value that is to be assigned to the register in the direct table algorithm. In the table algorithm, we may think of the register always commencing with the value zero, and this value being XORed into the register after the N'th bit iteration. This parameter should be specified as a hexadecimal number.
REFIN
This is a boolean parameter. If it is FALSE, input bytes are processed with bit 7 being treated as the most significant bit (MSB) and bit 0 being treated as the least significant bit. If this parameter is FALSE, each byte is reflected before being processed.
REFOUT
This is a boolean parameter. If it is set to FALSE, the final value in the register is fed into the XOROUT stage directly, otherwise, if this parameter is TRUE, the final register value is reflected first.
XOROUT
This is an W-bit value that should be specified as a hexadecimal number. It is XORed to the final register value (after the REFOUT) stage before the value is returned as the official checksum.
CHECK
This field is not strictly part of the definition, and, in the event of an inconsistency between this field and the other field, the other fields take precedence. This field is a check value that can be used as a weak validator of implementations of the algorithm. The field contains the checksum obtained when the ASCII string "123456789" is fed through the specified algorithm (i.e. 313233... (hexadecimal)).

With these parameters defined, the model can now be used to specify a particular CRC algorithm exactly. Here is an example specification for a popular form of the CRC-16 algorithm.

   Name   : "CRC-16"
   Width  : 16
   Poly   : 8005
   Init   : 0000
   RefIn  : True
   RefOut : True
   XorOut : 0000
   Check  : BB3D