memcpy can fail on a Teensy 4.0, 100% reliable on a 3.2

KrisKasprzak

Well-known member
All,

I have a datalogger project based on a Teensy 3.2 that uses a database library to save telemetry data to a flash chip. Over the last 3 years, I've logged more than 100 MB of data and have never lost a single bit. The driver can save a wide variety of data types.

I've converted my project to a Teensy 4.0 using the same code. I'm noticing that converting floats to bytes with memcpy() fails occasionally, as often as every few seconds. Even bit shifting to convert a uint16_t to 2 bytes can fail every few minutes. In either case, one of the converted bytes is 255. This never happens when compiled for a 3.2.

Below is a snippet of my code, which may get called 20 times a second.

Compiling to run at 600 MHz, although different speeds make no difference
Compiling for smallest code, although other settings (e.g. fastest) make no difference

Using Arduino 2.1.0
Using Teensyduino 1.58

Any thoughts?


Code:
// aBytes[8] is a private uint8_t variable
// fdata[i] is a pointer to passed-in data (and is 100% reliable)

memcpy(aBytes, (void *)fdata[i], 4);

if ( (aBytes[0] == 255) ||  (aBytes[1] == 255) ||  (aBytes[2] == 255) ||  (aBytes[3] == 255) ){
    Serial.println("float convert failed");
    Serial.print(*fdata[i]);
    Serial.print(": ");
    Serial.print(aBytes[0]);
    Serial.print("-");
    Serial.print(aBytes[1]);
    Serial.print("-");
    Serial.print(aBytes[2]);
    Serial.print("-");
    Serial.println(aBytes[3]);
}


// results
float convert failed
0.00: 6-140-255-58
float convert failed
0.01: 251-226-255-59
float convert failed
79.62: 255-59-159-66
 
Any thoughts?
Interesting snippet - can a short expanded sketch shown to fail be posted so anyone looking is dealing from the same deck?

// void* memcpy( void* dest, const void* src, std::size_t count );

Where do fdata[] and aBytes[] reside ( stack, other RAM1, RAM2, PSRAM )?
 
I'll post some more detail, but I powered down my 4.0 for a few hours and upon restarting, it seems to be working. I've noticed that after several hours of usage, the problem returns.

This will be a pain to debug.
 
I thought of starting a memcpy() sketch with these values from the PJRC PSRAM test. They are not floats by nature, but they are chosen values that could be iterated through, copied, and compared in some fashion.

Code:
static uint32_t lfsrPatt[ 44 ] = { 2976674124ul, 1438200953ul, 3413783263ul, 1900517911ul, 1227909400ul, 276562754ul, 146878114ul, 615545407ul, 110497896ul, 74539250ul, 4197336575ul, 2280382233ul, 542894183ul, 3978544245ul, 2315909796ul, 3736286001ul, 2876690683ul, 215559886ul, 539179291ul, 537678650ul, 4001405270ul, 2169216599ul, 4036891097ul, 1535452389ul, 2959727213ul, 4219363395ul, 1036929753ul, 2125248865ul, 3177905864ul, 2399307098ul, 3847634607ul, 27467969ul, 520563506ul, 381313790ul, 4174769276ul, 3932189449ul, 4079717394ul, 868357076ul, 2474062993ul, 1502682190ul, 2471230478ul, 85016565ul, 1427530695ul, 1100533073ul };
const uint32_t lfsrCnt = sizeof(lfsrPatt) / sizeof(lfsrPatt[0]);

static uint32_t fixPatt[ 13 ] = { 0x5A698421, 0x55555555, 0x33333333, 0x0F0F0F0F, 0x00FF00FF, 0x0000FFFF, 0xAAAAAAAA, 0xCCCCCCCC, 0xF0F0F0F0, 0xFF00FF00, 0xFFFF0000, 0xFFFFFFFF, 0x00000000 };
const uint32_t fixPCnt = sizeof(fixPatt) / sizeof(fixPatt[0]);

uint8_t aBytes[8];  // is a private uint8_t variable
float fdata[4];  // is a pointer to passed-in data (and is 100% reliable)
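A minimal host-side sketch of how such a pattern test might look, assuming the patterns are passed in as a plain array. The function name `countPatternCopyFailures` and the 8-byte scratch buffer are hypothetical choices for illustration, not code from the PJRC test.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Copies each 32-bit pattern into a byte buffer at offsets 0..3 and back
// again, counting how many copies do not return the original value.
// Offsets 1..3 deliberately make the byte-buffer side unaligned.
unsigned countPatternCopyFailures(const std::uint32_t *patterns, std::size_t count) {
    std::uint8_t scratch[8];
    unsigned failures = 0;
    for (std::size_t i = 0; i < count; ++i) {
        for (std::size_t offset = 0; offset < 4; ++offset) {
            std::memset(scratch, 0, sizeof scratch);
            std::memcpy(scratch + offset, &patterns[i], 4);  // value -> bytes
            std::uint32_t back = 0;
            std::memcpy(&back, scratch + offset, 4);         // bytes -> value
            if (back != patterns[i]) ++failures;
        }
    }
    return failures;
}
```

Called with an array such as fixPatt and its element count, a return value of 0 would mean every copy round-tripped intact.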
 
There isn’t enough information here to help with the problem. For example, are there threads or ISRs? What are the types of those arrays? Why is a cast to void* necessary here? Why is ‘255’ in one of the four bytes considered an error? Etc.

If you could provide a short, minimal sketch that outlines how to generate the problem, or even outlines what your code does, that would be most helpful.
 
One possible cause - and this is pure speculation at this point - I have seen ARM compilers be a little overzealous with optimisations in the past. One optimised a memcpy of 8 bytes into two 32-bit load and store instructions, completely removing the call to the memcpy function. That worked great if the two addresses were 32-bit aligned and failed horribly if they weren't. In the end, rather than hard coding the size of the copy, I made it a parameter that was passed to that function. The value passed in was always 8, but that was enough to prevent the optimiser from removing the call to memcpy.

Any chance this is something similar and the errors occur when memory addresses aren't 32 bit aligned?

edit - fdata looks to be a float array so that's aligned; the ARM FPU gives a hard fault if you try using a float that isn't. Assuming aBytes isn't part of some packed struct or similar and you are always memcpying to the start of it, then that's also going to be aligned.
So assuming the sample code you gave is representative then I can't see how this could be the issue. But maybe from a complete paranoia perspective it's worth printing out the addresses for the two memory locations when it goes wrong and verify that they are both multiples of 4.
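The address check suggested above could be sketched roughly as follows. This is a host-side illustration; `isAligned4`, `requireAligned4`, and `reportAlignment` are hypothetical helper names, and on a Teensy the same values could be printed over Serial instead of stdout.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>

// Returns true when the address is a multiple of 4 (32-bit aligned).
bool isAligned4(const void *p) {
    return (reinterpret_cast<std::uintptr_t>(p) & 0x3u) == 0;
}

// Hypothetical helper: turn the check into a hard stop while debugging.
void requireAligned4(const void *p) {
    assert(isAligned4(p));
}

// Hypothetical helper: print both addresses and whether each is aligned.
void reportAlignment(const void *dst, const void *src) {
    std::printf("dst=%p aligned=%d, src=%p aligned=%d\n",
                const_cast<void *>(dst), isAligned4(dst) ? 1 : 0,
                const_cast<void *>(src), isAligned4(src) ? 1 : 0);
}
```

Calling `reportAlignment(aBytes, fdata[i])` at the point where the 255 check fires would show whether either address is ever not a multiple of 4.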
 
(alignment)* can't see how this could be the issue.
Also - it works for some "few hours" before failure. If it was compiled wrong, would it change behavior? Unless the MCU got sloppy with on the fly twin instruction execution it seems there must be something else going on.
 
Teensy 4.x supports unaligned loads/stores so there's no chance of this sort of error.
 
If you could provide a short, minimal sketch that outlines how to generate the problem, or even outlines what your code does, that would be most helpful.
For "outlines", read "shows exactly". For "most helpful", read "vital".

I've tried a very simple sketch based on the vague hints in post#1, using Arduino 1.8.19 and Teensyduino 1.60b5, and completely failed to observe any problem. I'm quite interested to know what's going on (selfishly, it may be something that could affect Future Me), but not so interested I'm going to thrash about in the dark for any length of time...
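For anyone wanting to repeat that kind of test, here is a minimal host-side sketch of a round-trip check. It is not the sketch used above; all function names are hypothetical, and it assumes a 4-byte float. Note that 255 is a perfectly legitimate byte inside a float: on a little-endian target (which the Teensy is), the bytes 255-59-159-66 reported in the first post decode to roughly 79.62, matching the value printed next to them.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == 4, "this sketch assumes a 4-byte float");

// Copy the raw bytes of a float into out[0..3], as in the first post.
void floatToBytes(float value, std::uint8_t out[4]) {
    std::memcpy(out, &value, sizeof value);
}

// Rebuild a float from 4 raw bytes (same byte order as floatToBytes).
float bytesToFloat(const std::uint8_t in[4]) {
    float value;
    std::memcpy(&value, in, sizeof value);
    return value;
}

// Split a uint16_t into high byte and low byte by shifting.
void u16ToBytes(std::uint16_t v, std::uint8_t out[2]) {
    out[0] = static_cast<std::uint8_t>(v >> 8);
    out[1] = static_cast<std::uint8_t>(v & 0xFF);
}

std::uint16_t bytesToU16(const std::uint8_t in[2]) {
    return static_cast<std::uint16_t>((in[0] << 8) | in[1]);
}

// Returns the number of round-trip mismatches over `iterations` test values.
unsigned countRoundTripFailures(unsigned iterations) {
    unsigned failures = 0;
    for (unsigned i = 0; i < iterations; ++i) {
        float f = static_cast<float>(i) * 0.01f;
        std::uint8_t fb[4];
        floatToBytes(f, fb);
        float back = bytesToFloat(fb);
        // Compare bit patterns rather than float values.
        if (std::memcmp(&f, &back, sizeof f) != 0) ++failures;

        std::uint16_t u = static_cast<std::uint16_t>(i);
        std::uint8_t ub[2];
        u16ToBytes(u, ub);
        if (bytesToU16(ub) != u) ++failures;
    }
    return failures;
}
```

Comparing the round-tripped value against the original, rather than testing for a byte equal to 255, avoids flagging valid conversions as failures.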
 
After days of debugging I’ve found my issue is not with memcpy. I initially thought it was, since the conversion from a float to bytes would often give 255 255 255 255. Even simple bit shifting to convert a uint16_t would give 255 255, or at least those were the bytes saved to the flash chip. After much debugging it turns out the conversion was correct, but during the chip write (or read), 255 was returned. This would happen randomly, perhaps 2-5 times during the write of 800K or so of data.

I’ve built a new board with a new Winbond chip and all is well. I can only think the initial chip had some strange issue (maybe due to soldering/desoldering to several previous boards).

Not sure I’ll ever know what the culprit was but seems my issue is resolved.
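One way to catch this class of fault earlier is to read back every write and compare it with the source bytes. The sketch below is only an illustration under stated assumptions: `MockFlash` is a hypothetical in-memory stand-in (not a real flash library API) that can be told to return 0xFF once at a chosen address, and `writeVerified` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for a flash driver; NOT a real library API.
struct MockFlash {
    std::vector<std::uint8_t> cells;
    std::size_t glitchAt;   // address whose first read returns 0xFF
    bool glitched = false;  // the simulated glitch fires only once

    explicit MockFlash(std::size_t size,
                       std::size_t glitch = static_cast<std::size_t>(-1))
        : cells(size, 0xFF), glitchAt(glitch) {}

    void write(std::size_t addr, const std::uint8_t *data, std::size_t len) {
        std::memcpy(&cells[addr], data, len);
    }

    void read(std::size_t addr, std::uint8_t *out, std::size_t len) {
        std::memcpy(out, &cells[addr], len);
        if (!glitched && glitchAt >= addr && glitchAt < addr + len) {
            out[glitchAt - addr] = 0xFF;  // simulate a bad byte on read
            glitched = true;
        }
    }
};

// Write, read back, compare. Returns the attempt number that verified,
// or 0 if no attempt verified within maxAttempts.
template <typename Flash>
int writeVerified(Flash &flash, std::size_t addr, const std::uint8_t *data,
                  std::size_t len, int maxAttempts = 3) {
    std::vector<std::uint8_t> check(len);
    for (int attempt = 1; attempt <= maxAttempts; ++attempt) {
        flash.write(addr, data, len);
        flash.read(addr, check.data(), len);
        if (std::memcmp(check.data(), data, len) == 0) return attempt;
    }
    return 0;
}
```

Logging any result other than 1 would have pointed at the chip (or its wiring) rather than at the conversion code. A real driver has erase and page constraints that this mock ignores.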
 
This is a good example of why it's important to provide a complete program showing the issue. I think it would have been easy to show that memcpy() was not the issue.
 