GCC 11

"Maybe it needs certain hardware connected to demonstrate the problem?"

Unfortunately, that is the case.

I only added the stack monitor after the problem occurred, as suggested earlier by jrraines and luni.

However, a while back I wrote a tool to analyse .elf files. In the faulty version, the boolean variable retrySBDIX is allocated 3 bytes:

[Attached screenshot: elf1.jpg]


In the working version, retrySBDIX gets 1 byte, as you would expect:

[Attached screenshot: elf2.png]


Might this be a problem?
 
Some time ago there was a discussion in the forum about a very similar issue, where an if clause checking the return value of new was not evaluated correctly. Here is the corresponding code:

C++:
void setup()
{
  while (!Serial);
  void *p = new byte[1024 * 600];  // try to allocate 600kB (which is too much) -> new fails and returns 0 (Teensyduino implementation of new)
  
  Serial.printf("p= %p\n", p);     // returned pointer is 0x00 as expected
  if(p != 0)                       // however, the compiler knows that 'new' cannot return null (as per language definition)
  {                                // -> it doesn't check the actual value but executes the if-clause unconditionally
    Serial.println("Memory OK");
    Serial.printf("p= %p\n", p);   // Just to check if p is still 0
  }
  else                             // the else-clause is completely optimized away since new is supposed to never return nullptr
  {
    Serial.println("not enough memory"); 
  }
}
void loop()
{
}

prints:
Code:
p= 0x0
Memory OK
p= 0x0
It turned out that the compiler (rightfully) assumed that the return value of new can never be 0 and optimized the else-clause away.
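If you actually want a null check like the one above to be honoured, one option is the nothrow form of new, which is allowed to return nullptr, so the compiler can't assume the result is non-null. A minimal sketch, assuming the toolchain's standard `<new>` header (I haven't checked how Teensyduino implements the nothrow overload); `allocateBuffer` is a hypothetical helper name:

```cpp
#include <cstddef>
#include <new>

// Hypothetical helper: allocate a buffer, or return nullptr if there is
// not enough memory. Unlike plain new, the nothrow form reports failure by
// returning nullptr, so a caller's null check cannot be optimized away on
// the grounds that "new never returns null".
unsigned char* allocateBuffer(std::size_t bytes) {
    return new (std::nothrow) unsigned char[bytes];
}

// Intended use (as in the example above):
//   unsigned char* p = allocateBuffer(1024 * 600);
//   if (p != nullptr) { /* Memory OK */ delete[] p; }
//   else              { /* not enough memory */ }
```

Returning the pointer to the caller (rather than allocating and freeing inside a "can I allocate?" test) also matters: the compiler is allowed to elide an allocation that is immediately freed, which would make such a test meaningless.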

Of course, this doesn't help you directly, but it shows that subtle and unexpected things can lead to very weird behaviour.
Your code is way too large to debug reasonably (at least for me). If you really want to get this fixed or analyzed, I suggest reducing its size step by step and checking when the issue disappears. In any case, I'd try to get rid of external libraries and hardware dependencies so that others can help more easily. Also, if you don't want to exclude folks who use other IDEs, it might be a good idea not to split your sketch into multiple .ino files.
 
I had a quick scan of the code (I could not compile it) but spotted things that might easily go wrong:

In setup()
modemType = EEPROM.read(EEModemType); // 53
modemCN = EEPROM.read(EEModemCN);

after that

sprintf(logMsg, "Modem set to (%d) %s on %s", modemType, modems[modemType], CNs[modemCN]);
}
writeLog(logMsg);

If EEPROM holds random data (and why wouldn't it, on a Teensy that was once programmed with something else?), then modemCN will be a random number up to 255. The string CNs[modemCN] will then be read from somewhere outside the two globally defined options:
char* CNs[] = { "CN7", "CN8" };
Whatever is found there may or may not point to a null-terminated string, and that string may be longer or shorter than what fits in:
char logMsg[600];

So expect that you can write past the end of those 600 bytes, into whatever global variables sit there.

The code looks prone to many more vulnerabilities of this kind in its string manipulation...
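One way to contain the overflow half of that problem is to format with snprintf and the buffer's size, so an over-long message is truncated instead of being written past the end of logMsg. A minimal sketch; `formatModemLog`, `wasTruncated` and the modem name "ModemA" are hypothetical, not taken from the original sketch:

```cpp
#include <cstddef>
#include <cstdio>

// Hypothetical helper: format the modem log line into a fixed-size buffer.
// snprintf writes at most bufSize bytes (including the terminating '\0'),
// so a long string is truncated rather than overflowing into neighbouring
// globals the way sprintf could. It returns the length the full message
// would have had, or a negative value on an encoding error.
int formatModemLog(char* buf, std::size_t bufSize, int modemType,
                   const char* modemName, const char* cnName) {
    return std::snprintf(buf, bufSize, "Modem set to (%d) %s on %s",
                         modemType, modemName, cnName);
}

// True if the message did not fit (or formatting failed).
bool wasTruncated(int written, std::size_t bufSize) {
    return written < 0 || static_cast<std::size_t>(written) >= bufSize;
}
```

Note that this only bounds the write. The pointers passed in still have to point at valid strings, so an out-of-range index into modems[] or CNs[] is still a bug that snprintf cannot catch.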
 
In the faulty version, the boolean variable retrySBDIX is allocated 3 bytes:

Is it really allocated 3 bytes? Or is it a single byte followed by 2 unused padding bytes? Does your elf analysis software really know the allocation size, or is it merely guessing the size by subtracting its address from that of the next allocated item it finds?

I looked at the .map file. It shows this line for retrySBDIX, with only 1 byte allocated:

Code:
1fffaff1 g     O .bss    00000001 retrySBDIX

All gcc versions align variables to the natural access size. So if you have a byte which can be allocated at any address followed by a uint32_t which must be allocated at a multiple of 4, you can potentially have 1 or 2 or 3 unused padding bytes between those variables.

To answer directly: yes, if retrySBDIX really were allocated 3 bytes, that could be a problem. But your screenshot shows its address is 1fffafbd and another 8-byte variable is at 1fffafc0. It seems pretty likely that retrySBDIX is really only 1 byte, and addresses 1fffafbe to 1fffafbf are unused padding bytes because the next variable needs to be 32- or 64-bit aligned, which of course 1fffafc0 is.
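The padding arithmetic can be checked by hand: take the first byte after retrySBDIX and round it up to the next variable's alignment. A small sketch using the addresses from the screenshot; `alignUp` is a hypothetical helper, and 8-byte alignment for the following variable is an assumption (4-byte alignment gives the same result here):

```cpp
#include <cstdint>

// Round addr up to the next multiple of align (align must be a power of two).
constexpr std::uint32_t alignUp(std::uint32_t addr, std::uint32_t align) {
    return (addr + align - 1) & ~(align - 1);
}

constexpr std::uint32_t kRetrySBDIX = 0x1fffafbdu;               // 1-byte bool
constexpr std::uint32_t kFirstFree  = kRetrySBDIX + 1;           // 0x1fffafbe
constexpr std::uint32_t kNextVar    = alignUp(kFirstFree, 8);    // 0x1fffafc0
constexpr std::uint32_t kPadding    = kNextVar - kFirstFree;     // 2 bytes

static_assert(kNextVar == 0x1fffafc0u, "next 8-byte-aligned address");
static_assert(kPadding == 2, "0x1fffafbe and 0x1fffafbf are padding");
```

So an analysis tool that infers "size" from the gap to the next symbol would report 3 bytes (1 used plus 2 padding), matching the screenshot, while the .map file reports the real size of 1.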

Of course someone with a lot of time could dive into the .lst file and read through the disassembly of generated code to see how retrySBDIX is really being accessed. But I really doubt that would turn up something so outrageous as the compiler using 24 bits for a boolean, or using unaligned access to memory (which can of course be forced by casting integers to pointers but usually results in the compiler giving aliasing warnings).
 
Yes. I agree they're unused padding bytes.

As far as the coding suggestions are concerned, the code provided is a heavily cut-down version of the real thing, which initialises all the EEPROM values properly, for example:

C:
    if (modemCN > 10) {
        modemCN = 1;
        EEPROM.write(EEModemCN, modemCN);
    }

Most significantly, the code does work when built with 1.57 and earlier. Just not with 1.58.

I suppose the simplest solution is not to upgrade beyond 1.57, but it would be good to know why this occurs.
 
If modemCN in EEPROM was, say, 5, then it will pass your test that it's not > 10, so it will stay 5.

If you then do

sprintf(logMsg, "Modem set to (%d) %s on %s", modemType, modems[modemType], CNs[modemCN]);

while having declared CNs as

char* CNs[] = { "CN7", "CN8" };

then expect CNs[5] to be an unpredictable value, leading to trouble when attempting to read that as a string.
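A range check derived from the table itself rejects every index the table doesn't have, including 5. A minimal sketch; `cnName` and the "unknown" fallback are hypothetical, and CNs is declared `const char* const` because string literals are const in C++ and assigning them to plain `char*` draws a compiler warning:

```cpp
#include <cstddef>

// Same two entries as the sketch's table, but const-correct.
const char* const CNs[] = { "CN7", "CN8" };
constexpr std::size_t kNumCNs = sizeof(CNs) / sizeof(CNs[0]);

// Hypothetical helper: look up a CN name by the index stored in EEPROM.
// The bound comes from the table size, so 5 (which passes a "> 10" test)
// and 255 (erased EEPROM) are both rejected instead of reading past the
// end of CNs.
const char* cnName(unsigned index) {
    if (index < kNumCNs) {
        return CNs[index];
    }
    return "unknown";
}
```

The same pattern applies to modems[modemType]: validate every EEPROM-derived index against the size of the array it indexes, not against a hand-picked constant.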

The type of trouble that manifests may well be dependent on compiler version. But I think it would be unfair to blame compilers for that.

Better to take compiler warnings more seriously and fix their root cause, and then see if the 'bug' still bites.
 
Yes. However, a factory fresh Teensy has EEPROM values set to 255 and it copes with that.

The problem occurs on a Teensy which has been fine, and which is still fine when programmed with code built with GCC 5.3. So I'm sceptical that correcting all the warnings will make the issue disappear.

I'll keep whittling away and will try to publish a warning-free version of the code that uses no libraries and is contained within a single .ino, but which still misbehaves.
 
Can you edit your code so it reproduces the problem when run on just a Teensy with no other hardware connected? Maybe in the places where you access your other hardware, just put in some canned results so the rest of the program is able to run and reproduce the issue?
 
Not really, but I have just this minute found the issue by pruning down the code and eliminating warnings.

It seems that GCC 11 is less tolerant than GCC 5 of mismatched function return types. The routine in question was declared as int processIridiumResponse() but did not return a value. It probably did once upon a time.

Changing that to void processIridiumResponse() appears to have cleared the problem.
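For reference, here is the shape of the bug and the two usual fixes. The function bodies below are hypothetical stand-ins (the real processIridiumResponse() isn't shown in this thread); with GCC, compiling with `-Werror=return-type` turns the missing-return warning into a hard error so it can't slip through again:

```cpp
// Before (hypothetical body, commented out): declared int, but no return.
// In C++, flowing off the end of a non-void function is undefined
// behaviour, so the optimizer may assume that path is never reached.
//
//   int processIridiumResponse() {
//       ++responsesProcessed;
//   }   // <- no return statement

static int responsesProcessed = 0;  // hypothetical state, for illustration

// Fix 1: the result is never used, so declare the function void.
void processIridiumResponse() {
    ++responsesProcessed;  // stand-in for the real response handling
}

// Fix 2 (alternative): keep int, but return a value on every path.
// Hypothetical convention: 0 on success, -1 on failure.
int processIridiumResponseStatus(bool ok) {
    ++responsesProcessed;
    if (!ok) {
        return -1;
    }
    return 0;
}
```

Fix 1 matches what was done here; Fix 2 is only worth it if some caller actually needs the result.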

Thank you to everyone for their input. I have learned some useful diagnostic techniques.
 
The routine in question was declared as int processIridiumResponse() but did not return a value.

David:

I can attest to the fact (ask me how I know ?!?!?) that this would have been one of the warnings generated by the compiler, no matter which version (GCC11, GCC5, or otherwise). Hopefully, this experience provides you even more encouragement to take the time to work your way through all of the compiler warnings...they're reported by the compiler for good reason & resolving such warnings could keep you from having to spend even more of your valuable time troubleshooting in the future !! Glad you got it sorted !!

Mark J Culross
KD5RXT
 
Glad you found the problem.

I've also seen this, where not returning a value from a function that is declared as returning something causes problems for the calling code. I suspect the newer GCC may be applying more "global" analysis and optimization across function calls. The traditional approach was to optimize each function separately; unless you used LTO, the older GCC was limited in the analysis it could do between functions, so mistakes in one function couldn't really mess up other functions.
 