I'm using latest 1.6.11 and 1.30beta3 -- FastCRC looks like it is using __KINETISK__ and not KINETISK ??
I changed the lib to use KINETISK and also had to add #include "Arduino.h" in the .cpp files, and the benchmark is using hardware now. Here is the data for Teensy 3.6 (K66 beta) @ 180 MHz:
Code:
CRC Benchmark
F_CPU: 180 MHz, length: 57344 Bytes.
Maxim (iButton) FastCRC: Value:0xD2, Time: 718 us (638.93 mbs)
Maxim (iButton) builtin: Value:0xD2, Time: 23276 us (19.71 mbs)
MODBUS FastCRC: Value:0x25F1, Time: 720 us (637.16 mbs)
MODBUS builtin: Value:0x25F1, Time: 22319 us (20.55 mbs)
XMODEM FastCRC: Value:0xB217, Time: 718 us (638.93 mbs)
XMODEM builtin: Value:0xB217, Time: 22639 us (20.26 mbs)
MCRF4XX FastCRC: Value:0x4BE4, Time: 718 us (638.93 mbs)
MCRF4XX builtin: Value:0x4BE4, Time: 3827 us (119.87 mbs)
KERMIT FastCRC: Value:0x9137, Time: 718 us (638.93 mbs)
Ethernet FastCRC: Value:0x29C6653B, Time: 720 us (637.16 mbs)
Seems a little slower than the 3.1?
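For anyone reading the numbers: the value in parentheses appears to be megabits per second, since 57344 bytes * 8 bits / 718 us works out to 638.93. A minimal sketch of that arithmetic follows; the function name throughput_mbit_per_s is made up for illustration and is not part of FastCRC.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not part of FastCRC.
// Bits per microsecond is numerically the same as megabits per second.
double throughput_mbit_per_s(uint32_t bytes, uint32_t elapsed_us) {
    assert(elapsed_us > 0);  // avoid dividing by zero
    return (static_cast<double>(bytes) * 8.0) / static_cast<double>(elapsed_us);
}
```

With the figures from the benchmark, throughput_mbit_per_s(57344, 718) gives about 638.93 and throughput_mbit_per_s(57344, 23276) gives about 19.71, which matches the printed output.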
delay(1000);
uint8_t MyArray[4] = {0, 0xFF, 0x55, 0x11};
uint32_t StartMicro = 0;
uint32_t EndMicro = 0;
StartMicro = micros();
for (uint16_t i = 0; i < 10000; i++)
{
    CalculateCRC(MyArray, 4);
}
EndMicro = micros();
USBSerial.println((EndMicro - StartMicro));
uint32_t Flags = (((1<<31) | (1<<30)) | ((0<<29) | (0<<28))); //(refin=false refout=true)
uint8_t Val = 0;
StartMicro = micros();
for (uint16_t i = 0; i < 10000; i++)
{
    ~CRC8.generic(0x1D, 0xFF, Flags, MyArray, 4);
}
EndMicro = micros();
USBSerial.println((EndMicro - StartMicro));
delay(100000000);
uint8_t CalculateCRC(uint8_t *Frame, uint16_t Length)
{
    uint8_t crc_reg = 0xff;
    uint8_t poly, bit_count = 0;
    uint16_t byte_count;
    uint8_t bit_point;
    for (byte_count = 0; byte_count < Length; ++byte_count, ++Frame)
    {
        for (bit_count = 0, bit_point = 0x80; bit_count < 8; ++bit_count, bit_point >>= 1)
        {
            if (bit_point & *Frame) // case for new bit = 1
            {
                if (crc_reg & 0x80)
                    poly = 1; // define the polynomial
                else
                    poly = 0x1c;
                crc_reg = ((crc_reg << 1) | 1) ^ poly;
            }
            else // case for new bit = 0
            {
                poly = 0;
                if (crc_reg & 0x80)
                    poly = 0x1d;
                crc_reg = (crc_reg << 1) ^ poly;
            }
        }
    }
    return ~crc_reg; // Return CRC
}
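The routine above reads like a bit-at-a-time CRC-8 with polynomial 0x1D, initial value 0xFF and a final inversion, taking each byte MSB first. Assuming that reading is right, the four branches collapse into the usual "XOR the byte in, then shift and conditionally apply the polynomial" form. This is only a sketch; the name crc8_1d is made up here and is not from the original post or from FastCRC.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical compact form of the bitwise CRC-8 above
// (poly 0x1D, init 0xFF, result inverted, MSB first).
uint8_t crc8_1d(const uint8_t *data, size_t len) {
    assert(data != nullptr || len == 0);
    uint8_t crc = 0xFF;
    for (size_t i = 0; i < len; ++i) {
        crc ^= data[i];  // fold the next byte into the register
        for (int bit = 0; bit < 8; ++bit) {
            // shift left; apply the polynomial only if the top bit was set
            crc = (crc & 0x80) ? static_cast<uint8_t>((crc << 1) ^ 0x1D)
                               : static_cast<uint8_t>(crc << 1);
        }
    }
    return static_cast<uint8_t>(~crc);
}
```

It should return the same value as CalculateCRC for the same input, which is easy to check on the 4-byte test array before swapping one for the other.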
I haven't confirmed it yet with a test, but since you are only doing 4 bytes of CRC inside the loop, the overhead of setting up the hardware CRC takes longer than doing a 4-byte software CRC. You could test performance better with a 1000-byte CRC.
EDIT: Nope, a bigger buffer didn't solve the timing discrepancy. Your software CRC function is somehow incredibly fast, but I haven't figured out what the compiler is doing.
EDIT 2
Using 1.6.13/1.34 and 1.6.12/1.32, the compiler seems to be doing something with the loop around your software CRC. Increasing the reps does not increase the elapsed time by much, even with val += CalculateCRC(buf, sizeof(buf)); and printing val outside the loop. If I declare val "volatile", that gives a reasonable time for the "Fast" optimization setting.
very weird!
Yes, it might be that some optimizations take place.
Is it the same with a non-constant buffer? (Real data, for example from communications or a file on SD.)
Also, the lib has some overhead to set up the hardware, so it does better the larger the buffer is.
And yes, with only 4 bytes and a simple 8-bit CRC, there may be better ways.
If you can keep the compiler from optimizing out your timing loop, you will find that for your 4-byte CRC and 1000 iterations, the hardware CRC takes 1110 us and your software CRC takes 4991 us. So the hardware CRC should give you a speedup of about 4.5x.
If you run the benchmark example in the FastCRC lib on the T3.2 @ 96 MHz, you will see the hardware 8-bit MODBUS CRC is 42 times faster than the software version for a large buffer.
I'm getting nowhere near that value for the hardware CRC. I'll keep messing with it, and might throw it into the FastCRC lib since that doesn't appear to be affected by the optimization.
The compiler is very smart about propagating constant values. For benchmarking algorithms, you really need to fill the input buffer with data the compiler doesn't "know" in advance. The compilers are getting so good that often the only really reliable way to do this is to receive the data over USB or some other communication channel at runtime.
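As a rough illustration of that advice, here is one way to structure a timing loop so the work is harder to optimize away: the buffer is filled from a volatile value the compiler cannot assume, and every result is stored to a volatile sink. This is a desktop C++ sketch using std::chrono; on a Teensy you would use micros() instead. All names here (soft_crc8, g_seed, g_sink, time_soft_crc_us) are made up for the example, and the CRC is just a stand-in with the same polynomial as the routine discussed in this thread.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>

// Stand-in bitwise CRC-8 (poly 0x1D, init 0xFF, inverted), only here to have work to time.
static uint8_t soft_crc8(const uint8_t *data, size_t len) {
    uint8_t crc = 0xFF;
    for (size_t i = 0; i < len; ++i) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; ++bit)
            crc = (crc & 0x80) ? static_cast<uint8_t>((crc << 1) ^ 0x1D)
                               : static_cast<uint8_t>(crc << 1);
    }
    return static_cast<uint8_t>(~crc);
}

volatile uint8_t g_seed = 0x5A;  // hypothetical stand-in for data that arrives at runtime
volatile uint8_t g_sink = 0;     // every result is stored here so the calls stay observable

// Returns elapsed microseconds for `reps` CRC calls over a buffer filled at runtime.
long long time_soft_crc_us(uint8_t *buf, size_t len, unsigned reps) {
    assert(buf != nullptr && len > 0);
    for (size_t i = 0; i < len; ++i)
        buf[i] = static_cast<uint8_t>(g_seed + i);  // volatile read: contents unknown at compile time
    auto start = std::chrono::steady_clock::now();
    for (unsigned r = 0; r < reps; ++r) {
        buf[0] = static_cast<uint8_t>(r);  // change the input on every pass
        g_sink = soft_crc8(buf, len);      // volatile store must happen each iteration
    }
    auto end = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::microseconds>(end - start).count();
}
```

Receiving the buffer over USB, as suggested above, is still the more reliable option; the volatile approach is just quicker to try.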
remoteSerial.write(0xA5);
remoteSerial.write(0x5A);
remoteSerial.write(0x0);
remoteSerial.write(0x10);
remoteSerial.write(0x5);
remoteSerial.write(0x9F);
remoteSerial.write(0x0);
remoteSerial.write(0x0);
remoteSerial.write(0x4A); //0xFF 255, 0x4A 74
remoteSerial.write(0x0);
remoteSerial.write(0xE); //0xF6 246, 0xE 14
remoteSerial.write(0xD5); //0x34 52, 0xD5 213
0xE and 0x0E are the same thing. They tend to look tidier when always written or printed as two digits (as in the code below): 0 -> 00, E -> 0E.
/*
FastCRC-Example
(c) Frank Boesing 2014
Modified by Pete (El_Supremo)
See message #89+ff in:
https://forum.pjrc.com/threads/25699-Fast-CRC-library-(uses-the-built-in-crc-module-in-Teensy3)?p=152456&viewfull=1#post152456
*/
#include <FastCRC.h>
FastCRC16 CRC16;
uint8_t buf[] = {
0x00,
0x10,
0x05,
0x9F,
0x00,
0x00,
0x4A,
0x00,
};
void setup()
{
    Serial.begin(115200);
    while (!Serial);
    delay(1000);
    uint16_t crc = CRC16.xmodem(buf, sizeof(buf));
    Serial.println(crc, HEX);
    Serial.print(crc >> 8, HEX);
    Serial.print(" ");
    Serial.print(crc & 0xFF, HEX);
}
void loop()
{
}
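One thing to be aware of: Serial.print(value, HEX) does not pad with a leading zero, so 0x0E prints as "E". If you want the two-digit form on output, it has to be produced explicitly. A minimal sketch using snprintf follows; the helper name hex2 is made up for this example.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>

// Hypothetical helper: write one byte as exactly two uppercase hex digits.
// `out` must have room for two digits plus the terminating zero.
void hex2(uint8_t value, char out[3]) {
    assert(out != nullptr);
    std::snprintf(out, 3, "%02X", static_cast<unsigned>(value));
}
```

On a Teensy the resulting text can be passed straight to Serial.print(), once for crc >> 8 and once for crc & 0xFF.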
If you mean crccalc.com, they aren't sending the result as part of the XMODEM protocol. They're just reporting all of the different 16-bit CRCs as 16-bit words.
Any idea why they'd have the CRC backwards?
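On the byte-order point: a calculator shows the CRC as one 16-bit number, while anything sent over a serial link has to pick which byte goes out first. The XMODEM protocol itself sends the high byte first, but which order applies to a given packet format depends on that format's own definition, so treat the choice as an assumption to check against the device's documentation. The sketch below uses a plain bitwise CRC-16/XMODEM (poly 0x1021, init 0) so it runs without FastCRC; crc16_xmodem and crc_to_bytes are names made up for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Bitwise CRC-16/XMODEM: poly 0x1021, init 0x0000, no reflection, no final XOR.
uint16_t crc16_xmodem(const uint8_t *data, size_t len) {
    assert(data != nullptr || len == 0);
    uint16_t crc = 0;
    for (size_t i = 0; i < len; ++i) {
        crc ^= static_cast<uint16_t>(data[i] << 8);  // bring the byte into the top of the register
        for (int bit = 0; bit < 8; ++bit)
            crc = (crc & 0x8000) ? static_cast<uint16_t>((crc << 1) ^ 0x1021)
                                 : static_cast<uint16_t>(crc << 1);
    }
    return crc;
}

// Hypothetical helper: split the 16-bit word into the two bytes to transmit.
void crc_to_bytes(uint16_t crc, bool high_first, uint8_t out[2]) {
    uint8_t hi = static_cast<uint8_t>(crc >> 8);
    uint8_t lo = static_cast<uint8_t>(crc & 0xFF);
    out[0] = high_first ? hi : lo;
    out[1] = high_first ? lo : hi;
}
```

For the standard test string "123456789" this CRC is 0x31C3, so high-byte-first gives 0x31 0xC3 on the wire and low-byte-first gives 0xC3 0x31. Both are the same CRC; only the transmission order differs.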