Hi Duff,
your library is great, but why not use the newlib built-in memcpy instead? It's hand-optimized in assembler by the ARM guys.
Even if there is no great difference in speed, it would at least save some space.
Is the newlib built-in memcpy now the default memcpy for Teensy 3.1 (Teensyduino 1.21), or do I need to select the optimized menu item in the IDE? I tested both the optimized and non-optimized IDE settings against Daniel Vik's memcpy, and his was still faster. I didn't look at the RAM size difference, though.
The other problem was that I cast away the volatile qualifier of the TX buffer in order to use the regular memcpy. Maybe you or someone else can shed some light on this: if I cast away volatile on a global array, as in (uint8_t*)buffer where buffer is declared volatile, does the array lose its volatility everywhere from then on, or only in the local expression where the cast is applied? Sorry for the convoluted question, but I'm just not sure.
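To make the question concrete, here is a minimal sketch of the two options. The name tx_buffer, its size, and both helper functions are made up for illustration and are not the library's actual code. As far as I understand it, a cast only changes the type of that one pointer expression, so the array's declaration stays volatile everywhere else. Strictly speaking, though, the C and C++ standards call it undefined behavior to access an object defined volatile through a non-volatile lvalue, so the memcpy variant relies on the compiler doing the obvious thing.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical stand-in for the TX buffer (name and size are assumptions).
volatile uint8_t tx_buffer[64];

// Option A: cast away volatile for a single memcpy call.
// The cast affects only this pointer expression; tx_buffer itself is still
// declared volatile. Formally this access is undefined behavior, so treat
// it as "works in practice on this toolchain", not as guaranteed.
void copy_with_cast(const uint8_t *src, size_t n) {
    memcpy(const_cast<uint8_t *>(tx_buffer), src, n);
}

// Option B: keep the qualifier and copy byte by byte.
// Slower than an optimized memcpy, but every store is a volatile access.
void copy_volatile(volatile uint8_t *dst, const uint8_t *src, size_t n) {
    for (size_t i = 0; i < n; ++i) {
        dst[i] = src[i];
    }
}

// Compile-time check: the element type of tx_buffer is still volatile,
// regardless of any cast used inside copy_with_cast.
static_assert(
    std::is_volatile<std::remove_extent<decltype(tx_buffer)>::type>::value,
    "tx_buffer elements are still volatile");
```

If the buffer is also touched by an ISR or DMA, Option A gives the compiler no reason to keep the stores in order relative to other volatile accesses, which is the practical risk to watch for.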
I tested the speed by toggling a digital pin and watching it on my oscilloscope: the pin goes HIGH just before the call to *.write(packet, size) and LOW right after it returns. Maybe this is not the best way to measure the speed of a function?
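The pin-toggle measurement can be sketched roughly as below. The helper name scope_timed and its parameters are hypothetical; the pin write is passed in as a callable so the same code can be exercised off the board. Repeating the call produces a train of pulses, which makes it easier to see run-to-run variation on the scope than a single pulse does.

```cpp
// Wraps each call to fn() in a HIGH/LOW pulse on a pin.
// pin_write(true) should drive the pin HIGH, pin_write(false) LOW.
// The pulse width seen on the scope is the run time of fn() plus the
// (small, constant) overhead of the two pin writes.
template <typename PinWrite, typename Fn>
void scope_timed(PinWrite pin_write, Fn fn, unsigned repeats) {
    for (unsigned i = 0; i < repeats; ++i) {
        pin_write(true);   // rising edge: call starts
        fn();              // code under test
        pin_write(false);  // falling edge: call returned
    }
}

// Possible use on a Teensy (pin number and serial object are assumptions):
//   scope_timed([](bool s) { digitalWriteFast(13, s); },
//               [&] { port.write(packet, size); },
//               100);
```

Note that this only times how long the call blocks. If write() just queues the data and returns, the pulse says nothing about when the bytes actually leave the wire.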