I think Arduino IDE 2.3.2 may be causing the problem. You may need to downgrade to Arduino 1.8.19.

Can you share "Teensy_Hang_Demo.ino" here on this forum? (the Google Drive link wants login which I don't have)
I attached just the source for the Teensy demo. The other folders held copies of libraries (Wire, Time, arduino.d, etc.) plus the VS Code files and the hex files, all of which I deleted.
 

Attachments

  • Teensy_Hang_Demo.zip (35.1 KB)
I was able to narrow down the point at which Teensyduino causes my OTA-enabled robot program to hang up. For each of the Teensyduino versions listed below, I did a complete uninstall/reinstall of Arduino 1.8.19, followed by a fresh install of Teensyduino of that particular version. Then I did a 'build clean' on my WallE3_Git project, followed by a debug compile (F5) which triggers an OTA upload to my robot.

1.58b2 --> Works OK
1.58b4 --> Works OK
1.58rel --> Works OK
1.59b1 --> Works OK
1.59b2 --> Hangs
1.59b3 --> Hangs
1.59 --> Hangs

So it looks like something in the changes between 1.59b1 and 1.59b2 causes the OTA process to fail.

Frank
 
Changes since Teensyduino 1.59-beta1: https://forum.pjrc.com/index.php?threads/teensyduino-1-59-beta-2.72572/

Add inplace_function for callbacks
IntervalTimer use inplace_function
IntervalTimer demo callback backwards compatibility
Delete unused flags from String
Use C++17 to simplify IntervalTimer (Luni)
Fault handler use main vs process stack pointer (Christian Kahlo)
USBHost_t36 update DriveInfo example (Warren Watson)
FastLED fix C++17 compiler warnings
Tlc5940 update documentation, SCLK overshoot sensitivity
 
Have you tried to narrow down exactly where it is hanging, for example in a call to xyz->abc()?
Before each major call, put in something like Serial.println("Before call ABC"); Serial.flush(); and see which message is the last one to appear.
Does the code even make it to setup()?

Once you know which call is failing to return, go into the source of that function and add Serial.print calls to each section of it, and repeat until you have isolated what is failing.

Note: If you have something like a logic analyzer, you can also localize using I/O pins: put digitalWriteFast or digitalToggleFast calls in key spots on unused pins (obviously you need to pinMode them first).
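The print-then-flush idea above could be wrapped in a small helper so that every traced call gets a "before" and an "after" message. This is only a sketch: the names `checkpoint` and `TRACE_CALL` are made up here and are not part of any Arduino or Teensy API, and the port is assumed to be any object with println() and flush(), such as Serial.

```cpp
// Hypothetical helper: print a label and wait until it has been sent,
// so a later hang cannot swallow the message.
template <typename Port>
void checkpoint(Port &port, const char *label) {
  port.println(label);  // queue the message
  port.flush();         // wait for the outgoing data to be transmitted
}

// Hypothetical macro: wrap one statement with flushed "Before"/"After"
// messages. The last message seen names the call that never returned.
#define TRACE_CALL(port, call)           \
  do {                                   \
    checkpoint((port), "Before " #call); \
    call;                                \
    checkpoint((port), "After " #call);  \
  } while (0)
```

In a sketch this might be used as TRACE_CALL(Serial, mpu.initialize()); or, with a port pointer like the one in the posted code, TRACE_CALL(*gl_pSerPort, devStatus = mpu.dmpInitialize());.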
 
The breadcrumb() calls seem like a nice way to have 'static' storage of info across restarts, even across a warm restart and programming.

That could work even if execution ends in a hang instead of a fault (a fault, by design, gets printed).

As long as a non-Fault restart doesn't clear() stored data, it could be diagnostically useful to have a way to see this information:
Code:
  if (bc->bitmask && bc->checksum == checksum(bc, 28)) {
    for (int i=0; i < 6; i++) {
      if (bc->bitmask & (1 << i)) {
        p.print("  Breadcrumb #");
        p.print(i + 1);
        p.print(" was ");
        p.print(bc->value[i]);
        p.print(" (0x");
        p.print(bc->value[i], HEX);
        p.println(")");
      }
    }
  }

I don't have a minute to code this right now, but it occurred to me that this is the type of case where it might be helpful, and a post seemed a good way to gather feedback, if not an idea for the name of CrashReportClass::??name??(). It would not clear() or otherwise affect the crash data, and it would print only if a valid, CRC-checked set of breadcrumb data was stored. It would be of the same Stream print class as CrashReport, so it could be shown on restart or at any point during execution as debug log spew.

In the case of a hang that never manages to print, a warm upload of a simple BlinkShowCrumbs.ino could present that info.
 
The breadcrumb() calls seem like a nice way to have 'static' storage of info across restarts.
Do we have those on a Teensy 3.5?

Again, does it make it through any of these calls?
Code:
#pragma region MPU6050
#ifndef NO_MPU6050
  gl_pSerPort->printf("\nChecking for MPU6050 IMU at I2C Addr 0x%x\n", MPU6050_I2C_ADDR);
  gl_pSerPort->println(mpu.testConnection() ? F("MPU6050 connection successful") : F("MPU6050 connection failed"));
  mpu.initialize();

  // verify connection

  float StartSec = 0; //used to time MPU6050 init
  gl_pSerPort->println(F("Initializing DMP..."));
  devStatus = mpu.dmpInitialize();

  // make sure it worked (returns 0 if successful)
  if (devStatus == 0)
  {
    // turn on the DMP, now that it's ready
    gl_pSerPort->printf(F("Enabling DMP...\n"));
    mpu.setDMPEnabled(true);

    // set our DMP Ready flag so the main loop() function knows it's okay to use it
    gl_pSerPort->println(F("DMP ready! Waiting for MPU6050 drift rate to settle..."));
    dmpReady = true;

    // get expected DMP packet size for later comparison
    packetSize = mpu.dmpGetFIFOPacketSize();
...
I see you have some print statements, but no flush calls.
I have chased my tail more than once, believing I hung in some function, only to find out that the hang was several calls later and had kept the pending Serial output from ever being printed.
 
I'm pretty sure it is failing on a call into the MPU6050 library, but I'm not sure I want to track it down any further, especially since my robot code works fine with 1.57 (and apparently 1.58 as well).

Moreover, it appears to me everyone is missing the fact that the OTA code has to be involved in order for the problem to occur, so it seems reasonable to me that someone (Joe?) should look at the OTA code to see why it is not properly handling the hex file from 1.59b2 and later. For instance, maybe the hex file for 1.59b2 and later is significantly larger, and maybe it is getting truncated by the OTA process, which would then cause some random problem in the code. It just happens to be in an MPU6050 library call in my particular case. If that's the case, then drilling down into my code to find the exact place it hangs won't be very useful, as it could occur someplace else entirely in a different program.

I will be happy to provide hex files from a working and a non-working configuration, say 1.58 and 1.59. Then someone (Joe?) might be able to see what goes wrong with the OTA processing to result in bad code actually being written to the Teensy 3.5. After all, it's pretty much a given that OTA processing of hex files from 1.59b2 and later results in incorrect code in the Teensy's memory. My bet would be that the memory maps in the two cases (OTA vs direct USB transfer with 1.59b2 and up) are different.
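One way to put numbers on the "larger hex file" guess before anyone digs into the OTA code would be to summarize both hex files on a PC and compare payload size and address range. The following is a rough host-side sketch, not part of FlasherX or Teensyduino. `HexSummary` and `summarizeHex` are hypothetical names, and it only understands the common Intel HEX record types 00, 01, 02 and 04.

```cpp
#include <cstdint>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical summary of one Intel HEX image.
struct HexSummary {
  bool ok = true;                  // false if a record is malformed or fails its checksum
  bool sawEof = false;             // true once the type-01 end-of-file record is seen
  uint32_t dataBytes = 0;          // total payload bytes in type-00 records
  uint32_t lowAddr = 0xFFFFFFFFu;  // lowest address written
  uint32_t highAddr = 0;           // one past the highest address written
};

// Decode two hex digits at s[pos], s[pos+1]; returns -1 on error.
static int hexByte(const std::string &s, size_t pos) {
  auto nib = [](char c) -> int {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
  };
  if (pos + 1 >= s.size()) return -1;
  int hi = nib(s[pos]), lo = nib(s[pos + 1]);
  return (hi < 0 || lo < 0) ? -1 : hi * 16 + lo;
}

HexSummary summarizeHex(std::istream &in) {
  HexSummary r;
  uint32_t base = 0;  // set by extended segment/linear address records
  std::string line;
  while (std::getline(in, line)) {
    while (!line.empty() && (line.back() == '\r' || line.back() == ' ')) line.pop_back();
    if (line.empty()) continue;
    int len = (line[0] == ':') ? hexByte(line, 1) : -1;
    // Record layout: ':' LL AAAA TT <LL data bytes> CC
    if (len < 0 || line.size() != 11u + 2u * len) { r.ok = false; continue; }
    unsigned sum = 0;
    bool bad = false;
    for (size_t i = 1; i < line.size(); i += 2) {
      int b = hexByte(line, i);
      if (b < 0) { bad = true; break; }
      sum += b;
    }
    // All bytes including the checksum must add up to 0 modulo 256.
    if (bad || (sum & 0xFF) != 0) { r.ok = false; continue; }
    uint32_t offset = (hexByte(line, 3) << 8) | hexByte(line, 5);
    uint32_t word = (len == 2) ? ((hexByte(line, 9) << 8) | hexByte(line, 11)) : 0;
    int type = hexByte(line, 7);
    if (type == 0x00) {
      uint32_t start = base + offset;
      r.dataBytes += len;
      if (start < r.lowAddr) r.lowAddr = start;
      if (start + len > r.highAddr) r.highAddr = start + len;
    } else if (type == 0x01) {
      r.sawEof = true;
    } else if (type == 0x02 && len == 2) {
      base = word << 4;   // extended segment address
    } else if (type == 0x04 && len == 2) {
      base = word << 16;  // extended linear address
    }
  }
  return r;
}
```

Running this over the 1.58 and 1.59 hex files (for example through a std::ifstream) and comparing dataBytes and highAddr would show whether the newer image is meaningfully larger. It says nothing about what the OTA process actually writes to flash, only about the files going in.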
 
I'm pretty sure it is failing on a call into the MPU6050 library but I'm not sure I want to try and track it down any further, especially, since my robot code works fine with 1.57 (and apparently 1.58 as well).

Moreover, it appears to me everyone is missing the fact that the OTA code has to be involved in order for the problem to occur - so it seems reasonable to me that someone (Joe?) should look at the OTA code to see why it is not properly handling the hex file from 1.59b2 and later
Good luck.
 
Do we have those on a Teensy 3.5?
Oops, that would not apply here, though the T_3.5 does have some NVRAM bytes that could emulate this behavior with user code.
This idea came to mind some time back on T_4.x in some situation, and tracking down an oddity like this one could be assisted by it.
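A rough idea of what such user code might look like, as a sketch only: the 32-byte record layout, the checksum, and the names `Crumbs`, `crumbSet` and `crumbsValid` are all assumptions made up for illustration, not the Teensy core's breadcrumb implementation. Where the record lives is deliberately left out; it is assumed to sit in whatever memory survives the restart.

```cpp
#include <cstdint>

// Hypothetical 32-byte breadcrumb record (layout is an assumption).
struct Crumbs {
  uint32_t value[6];  // one slot per breadcrumb
  uint32_t bitmask;   // bit i set means value[i] holds a breadcrumb
  uint32_t checksum;  // covers the 28 bytes above
};

// Simple additive checksum. The non-zero seed keeps an all-zero record
// from accidentally looking valid.
static uint32_t crumbChecksum(const Crumbs &c) {
  uint32_t sum = 0x5AA5F00Du;
  for (int i = 0; i < 6; i++) sum += c.value[i];
  return sum + c.bitmask;
}

// True only if at least one breadcrumb is stored and the checksum matches.
bool crumbsValid(const Crumbs &c) {
  return c.bitmask != 0 && c.checksum == crumbChecksum(c);
}

// Store breadcrumb number 1..6. Stale or garbage contents are wiped first.
void crumbSet(Crumbs &c, unsigned num, uint32_t value) {
  if (num < 1 || num > 6) return;
  if (!crumbsValid(c)) c = Crumbs{};
  c.value[num - 1] = value;
  c.bitmask |= 1u << (num - 1);
  c.checksum = crumbChecksum(c);
}
```

After a restart, a sketch would check crumbsValid() before printing anything, in the same spirit as the bitmask-and-checksum test in the CrashReport code quoted earlier in the thread.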
 
FlasherX definitely works on T4.1 with TD 1.57, 1.58, and 1.59. I can say that for sure because I've continued using it through all of these versions, including the betas, with no problems. I haven't used T3.5 in some time, but I think others are using it and there haven't been any reports of problems on any T3.x model. FlasherX is all plain C, so there are no objects and it shouldn't be affected by any of the C++ initialization stuff that has changed in more recent versions. FlasherX knows nothing about where variables are stored or where functions are. All it does is buffer the data that is sent, then write that data to the program space. I'm sure that the base address for T3.5 hasn't changed, else it wouldn't work at all.

If the issue was FlasherX, I don't think the code would crash in the same place each time. Are you using EEPROM or any other non-volatile memory?
 