Fast CRC library (uses the built-in crc-module in Teensy3)

I'm using the latest 1.6.11 and 1.30-beta3 -- FastCRC looks like it is checking __KINETISK__ and not KINETISK??

I changed the lib to use KINETISK and also had to add #include "Arduino.h" in the .cpp files, and the benchmark is using hardware now. Here is data for the Teensy 3.6 (K66 beta) @ 180 MHz:
Code:
CRC Benchmark
F_CPU: 180 MHz, length: 57344 Bytes.

Maxim (iButton) FastCRC:	Value:0xD2, Time: 718 us (638.93 mbs)
Maxim (iButton) builtin:	Value:0xD2, Time: 23276 us (19.71 mbs)
MODBUS FastCRC:	Value:0x25F1, Time: 720 us (637.16 mbs)
MODBUS builtin: 	Value:0x25F1, Time: 22319 us (20.55 mbs)
XMODEM FastCRC:	Value:0xB217, Time: 718 us (638.93 mbs)
XMODEM builtin: 	Value:0xB217, Time: 22639 us (20.26 mbs)
MCRF4XX FastCRC:	Value:0x4BE4, Time: 718 us (638.93 mbs)
MCRF4XX builtin:	Value:0x4BE4, Time: 3827 us (119.87 mbs)
KERMIT FastCRC:	Value:0x9137, Time: 718 us (638.93 mbs)
Ethernet FastCRC:	Value:0x29C6653B, Time: 720 us (637.16 mbs)

Seems a little slower than the 3.1?
 
Seems a little slower than the 3.1?

The buffer sizes might differ, so you need to look at the data rate: K66 @ 180 MHz at 637 mbs vs. T3.1 @ 120 MHz at 470 mbs.

The CRC hardware is the same for Teensy 3.0, 3.1, 3.2, 3.5 and 3.6, so speed is proportional to the CPU clock (K66 @ 120 MHz: 477 mbs).
 
I think I'm losing my mind. For some reason, FastCRC is actually 20 times slower than a software implementation on the Teensy 3.2.

I'm using the latest version of FastCRC, but if I didn't know better I'd say FastCRC is maybe reverting to its own software implementation?

When running the code below, I get the following micros() timings:
523
11284

Any idea what could be wrong? I got excited after seeing some people's results, but mine appear to be the opposite!

Code:
delay(1000);
uint8_t MyArray[4] = {0, 0xFF, 0x55, 0x11};
uint32_t StartMicro = 0;
uint32_t EndMicro = 0;

StartMicro = micros();
for (uint16_t i = 0; i < 10000; i++)
{
  CalculateCRC(MyArray, 4);
}
EndMicro = micros();
USBSerial.println(EndMicro - StartMicro);

uint32_t Flags = (1 << 31) | (1 << 30) | (0 << 29) | (0 << 28); // refin=false refout=true
uint8_t Val = 0;
StartMicro = micros();
for (uint16_t i = 0; i < 10000; i++)
{
  CRC8.generic(0x1D, 0xFF, Flags, MyArray, 4);
}
EndMicro = micros();
USBSerial.println(EndMicro - StartMicro);
delay(100000000);


And the software CRC that I'm using is below.

Code:
uint8_t CalculateCRC(uint8_t *Frame, uint16_t Length)
{
  uint8_t crc_reg = 0xFF;
  uint8_t poly, bit_count = 0;
  uint16_t byte_count;
  uint8_t bit_point;

  for (byte_count = 0; byte_count < Length; ++byte_count, ++Frame)
  {
    for (bit_count = 0, bit_point = 0x80; bit_count < 8; ++bit_count, bit_point >>= 1)
    {
      if (bit_point & *Frame) // case for new bit = 1
      {
        if (crc_reg & 0x80)
          poly = 1; // define the polynomial
        else
          poly = 0x1C;
        crc_reg = ((crc_reg << 1) | 1) ^ poly;
      }
      else // case for new bit = 0
      {
        poly = 0;
        if (crc_reg & 0x80)
          poly = 0x1D;
        crc_reg = (crc_reg << 1) ^ poly;
      }
    }
  }
  return ~crc_reg; // Return CRC
}
 
I haven't confirmed it yet with a test, but since you are only doing 4 bytes of CRC inside the loop, the overhead of setting up the hardware CRC takes longer than doing a 4-byte software CRC. You could better test performance with a 1000-byte CRC.

EDIT: Nope, a bigger buffer didn't solve the timing discrepancy. Your software CRC function is somehow incredibly fast, but I haven't figured out what the compiler is doing.

EDIT 2: Using 1.6.13/1.34 and 1.6.12/1.32, the compiler seems to be doing something with the loop around your software CRC. Increasing the reps does not increase the elapsed time by much, even with val += CalculateCRC(buf, sizeof(buf)); and printing val outside the loop. If I declare val "volatile", that gives a reasonable time with "Fast" optimization.
Very weird!
 
Yes, it might be that some optimizations take place.
Is it the same with a non-constant buffer? (Real data, for example from communications or a file on SD.)
Also, the lib has some overhead to set up the hardware, and does better the larger the buffer is.
And yes, with only 4 bytes and a simple 8-bit CRC, there may be better ways.
 
I haven't confirmed it yet with a test, but since you are only doing 4 bytes of CRC inside the loop, the overhead of setting up the hardware CRC takes longer than doing a 4-byte software CRC. You could better test performance with a 1000-byte CRC.

EDIT: Nope, a bigger buffer didn't solve the timing discrepancy. Your software CRC function is somehow incredibly fast, but I haven't figured out what the compiler is doing.

EDIT 2: Using 1.6.13/1.34 and 1.6.12/1.32, the compiler seems to be doing something with the loop around your software CRC. Increasing the reps does not increase the elapsed time by much, even with val += CalculateCRC(buf, sizeof(buf)); and printing val outside the loop. If I declare val "volatile", that gives a reasonable time with "Fast" optimization.
Very weird!

Using Visual Micro and selecting optimization as 'Faster', the timing has now dropped to 3 instead of 500+. The FastCRC is around the 10,000 mark.
I was thinking the optimizer was getting smart and bypassing the loop, knowing it's doing the same thing and affecting nothing, but I have it pumping the result into a large array and verifying its output afterwards, which is correct.

The CRC I'm using is for the SAE J1850 car protocol. Most received messages are only 4-12 bytes long, but when reflashing ECUs you can have upwards of 4000+ bytes.
The issue I was facing was that calculating the CRC and comparing it with the received CRC took just that fraction too long, which affected capturing the start of the following message, which arrives only 50 µs after the end of the previous one.
The frame is then also sent via USB, so timing for that also needs to be considered.

Pretty much every microsecond counts. I only updated the Arduino software, Visual Micro and Teensyduino to the latest yesterday in case that was causing the issues, but it's still the same.


Yes, it might be that some optimizations take place.
Is it the same with a non-constant buffer? (Real data, for example from communications or a file on SD.)
Also, the lib has some overhead to set up the hardware, and does better the larger the buffer is.
And yes, with only 4 bytes and a simple 8-bit CRC, there may be better ways.

I'll try with a buffer of 4000-12000 and see if that makes any difference. But I think you're right, it's just due to it being a simple 8-bit CRC.

15 years ago, when this protocol was implemented in vehicles, there were dedicated Motorola chips for processing and handling received frames, including calculating and validating CRCs, just like CAN controllers do. They are no longer produced, though, so I have to process the pulses manually.

I was hoping the built-in CRC calculations would be suited to this, although I guess they are designed for large files?
 
If you can keep the compiler from optimizing out your timing loop, you will find that for your 4-byte CRC and 1000 iterations, the hardware CRC takes 1110 µs and your software CRC takes 4991 µs. So the hardware CRC should give you a speedup of about 5x.

If you run the benchmark example in the FastCRC lib on the T3.2 @ 96 MHz, you will see the hardware 8-bit MODBUS CRC is 42 times faster than the software version for large buffers.
 
If you can keep the compiler from optimizing out your timing loop, you will find that for your 4-byte CRC and 1000 iterations, the hardware CRC takes 1110 µs and your software CRC takes 4991 µs. So the hardware CRC should give you a speedup of about 5x.

If you run the benchmark example in the FastCRC lib on the T3.2 @ 96 MHz, you will see the hardware 8-bit MODBUS CRC is 42 times faster than the software version for large buffers.

I'm getting nowhere near that value for the hardware CRC. I'll keep messing with it; I might throw it into the FastCRC lib, since that doesn't appear to be affected by the optimization.
 
I'm getting nowhere near that value for the hardware CRC. I'll keep messing with it; I might throw it into the FastCRC lib, since that doesn't appear to be affected by the optimization.

Note, I was only doing 1000 iterations; your sketch was doing 10000.
 
The compiler is very smart about propagating constant values. For benchmarking algorithms, you really need to fill the input buffer with data the compiler doesn't "know" in advance. The compilers are getting so good that often the only really reliable way to do this is receive the data over the USB or some other communication channel at runtime.
 
The compiler is very smart about propagating constant values. For benchmarking algorithms, you really need to fill the input buffer with data the compiler doesn't "know" in advance. The compilers are getting so good that often the only really reliable way to do this is receive the data over the USB or some other communication channel at runtime.

Ohhhhhhhhhhh, that makes more sense.

I guess I could read 100 random bytes from an analogue pin and try again, and see which is faster that way :)
 
Could anyone help out with how this library works? There's not much to go by in the readme.
I have this message:

Code:
  remoteSerial.write(0xA5);
  remoteSerial.write(0x5A);
  remoteSerial.write(0x0);
  remoteSerial.write(0x10);
  remoteSerial.write(0x5);
  remoteSerial.write(0x9F);
  remoteSerial.write(0x0);
  remoteSerial.write(0x0);
  remoteSerial.write(0x4A); //0xFF   255, 0x4A   74
  remoteSerial.write(0x0);
  remoteSerial.write(0xE);  //0xF6   246, 0xE    14
  remoteSerial.write(0xD5); //0x34   52,  0xD5   213

and the last two bytes seem to be the CRC. How would I use FastCRC to calculate them if any of the other bytes are changed?
This should be an XMODEM calc.
 
Using http://crccalc.com and entering 0010059F00004A00, I get CRC-16/XMODEM 0xD50E, so it seems correct (only the 8 payload bytes are calculated, without the header A5 5A).
Because I have no experience with CRC calcs, I don't understand why 0xD50E is sent as 0x0E, 0xD5.
Is this library able to fill in the leading zeros? I.e. 0 -> 00 or E -> 0E, or should I do this in code before passing it on to the calculation?
 
0 -> 00 or E -> 0E
0xE and 0x0E are the same value. They just look tidier when always written or printed as two digits (as in the code below).


The code below calculates the XMODEM CRC of your message and prints the CRC as a word, then as two bytes (high-order byte first).
The order of the CRC bytes is defined by the protocol. As far as I can tell, the protocol requires that the high-order byte be sent first.

Pete

Code:
/*
  FastCRC-Example

  (c) Frank Boesing 2014

Modified by Pete (El_Supremo)
See message #89+ff in:
https://forum.pjrc.com/threads/25699-Fast-CRC-library-(uses-the-built-in-crc-module-in-Teensy3)?p=152456&viewfull=1#post152456

*/

#include <FastCRC.h>

FastCRC16 CRC16;

uint8_t buf[] = {
  0x00,
  0x10,
  0x05,
  0x9F,
  0x00,
  0x00,
  0x4A,
  0x00, 
};

void setup()
{
  Serial.begin(115200);
  while(!Serial);
  delay(1000);
  uint16_t crc = CRC16.xmodem(buf, sizeof(buf));
  Serial.println(crc , HEX );
  Serial.print(crc >> 8,HEX);
  Serial.print(" ");
  Serial.print(crc & 0xFF, HEX);
}

void loop()
{
}
 
This works, thank you!
Any idea why they'd have the CRC backwards? Maybe for rev. engineering confusion?
 
Any idea why they'd have the CRC backwards?
If you mean crccalc.com, they aren't sending the result as part of the XMODEM protocol. They're just reporting all of the different 16-bit CRCs as 16-bit words.
As you can see from the example code I posted, the CRC is returned as a 16-bit word. It is then up to you to transmit it in the correct order.

Pete
 
My question came out unclear. The message that I'm sending to serial (in my quote above) is actually just a repetition of the incoming message that I'm sniffing on a device's UART.
I wired the Teensy as a man-in-the-middle to listen and only allow the message through if I want. So the question should've been: 'Why would the programmers of that device choose to flip the CRC bytes?'
It's just theoretical; there's nothing I can do about it anyway if I want to fool the device into thinking the Teensy is the original sender of the message. I just thought it might be relevant to what I don't know about CRCs yet.
 
Just because they are using the XMODEM CRC doesn't mean that they are using the XMODEM protocol, and it doesn't look like they are, because an XMODEM packet starts with the SOH (0x01) character. In which case, they can set up the protocol any way they see fit. I found one mention of a protocol which starts with the sync bytes 0xA5 0x5A, but not one which also ends with a CRC.

Pete
 
Thanks for the explanation.
I figured they're not bound to any standard, but I'm interested in how CRCs work.
 