Over the air updates

So it's bailing out on the third check in the parse_hex_line() function: sscanf( ptr, PSTR("%02x"), &len ). Makes my head hurt. I should take a break.

Ugh, I was wrong (actually, I was worn down and should have taken a break from this a long time ago). The parse_hex_line() function is failing at the fourth check (not the third as I asserted earlier):
if( strlen( theline ) < ( 11 + ( len * 2 ) ) ).

Here's a screenshot of the `less` command on the HEX file, which shows the offending line highlighted:
[Screenshot: Screen Shot 2021-07-12 at 08.55.48.png]

So, strlen( theline ) is 42 and ( 11 + ( len * 2 ) ) is 43, which is why this condition fails. Can someone smarter than me explain what this is checking for?
 
That looks like a properly formed line. 43 characters is the correct length: the first two characters after the colon, '10', are the number of data bytes in hex (16), and since it's all ASCII there are two characters per byte, hence the len * 2 part. The other 11 characters are the colon, the byte count, the address, the record type, and the checksum.

The input of 42 is the problem. Are you including the colon ':' in the string you are sending to flash_hex_line? It expects that to be there. Or maybe you are accidentally terminating that string one byte too early at some stage along the line.

The format of these lines is described here, which is useful for figuring out what's going on. It's a pretty straightforward format.
https://en.wikipedia.org/wiki/Intel_HEX
 
I can tell you that ":10010000D8041808000000000000000000000000F3" is a valid, 43-character line and passes all checks.
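
For anyone following along, the length rule discussed here can be sketched in a few lines of host-side C++. This is not the flasher's parse_hex_line(); check_hex_line() and its return codes are hypothetical, made up only to illustrate the 11 + ( len * 2 ) test and the checksum rule from the Intel HEX format.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Convert one ASCII hex digit to its value, or -1 if it is not a hex digit.
static int hex_nibble(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

// Read two hex digits at p as one byte, or return -1 on a bad digit.
static int hex_byte(const char *p) {
    int hi = hex_nibble(p[0]);
    if (hi < 0) return -1;  // also catches '\0', so p[1] is not read past the end
    int lo = hex_nibble(p[1]);
    if (lo < 0) return -1;
    return hi * 16 + lo;
}

// Hypothetical helper (not part of the flasher).
// Returns 0 if the line is well formed, 1 if it is too short or malformed,
// 2 if the checksum does not match.
int check_hex_line(const char *line) {
    assert(line != nullptr);
    size_t n = strlen(line);
    if (n < 11 || line[0] != ':') return 1;

    int len = hex_byte(line + 1);            // data byte count
    if (len < 0) return 1;
    if (n < 11 + (size_t)len * 2) return 1;  // the length check from the thread

    // Bytes on the line: count, address (2), record type, data, checksum.
    size_t nbytes = 1 + 2 + 1 + (size_t)len + 1;
    unsigned sum = 0;
    for (size_t i = 0; i < nbytes; i++) {
        int b = hex_byte(line + 1 + i * 2);
        if (b < 0) return 1;
        sum += (unsigned)b;
    }
    // All bytes, checksum included, must sum to zero modulo 256.
    return (sum & 0xFF) == 0 ? 0 : 2;
}
```

In this sketch a line that loses a character in transit fails the length test before the checksum is ever examined, which is the same order of checks described for parse_hex_line().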
 
Thanks, jonr and ipaq3115.

As a test, I decided to dump out every line received via HttpClient just before being fed to flash_hex_line():
Code:
Downloading 52756 bytes...
   1: :0200000460009A
   2: :100000004643464200000156000000000103030081
   3: :1000100000000000000000000000000000000000E0
   4: :1000200000000000000000000000000000000000D0
   5: :1000300000000000000000000000000000000000C0
   6: :1000400000000000010408000000000000000000A3
   7: :100050000000800000000000000000000000000020
   8: :100060000000000000000000000000000000000090
   9: :100070000000000000000000000000000000000080
  10: :10008000EB04180A063204260000000000000000FD
  11: :10009000050404240000000000000000000000002F
  12: :1000A0000000000000000000000000000000000050
  13: :1000B0000604000000000000000000000000000036
  14: :1000C0000000000000000000000000000000000030
  15: :1000D00020041808000000000000000000000000DC
  16: :1000E0000000000000000000000000000000000010
  17: :1000F0000000000000000000000000000000000000
  18: :10010000D804180800000000000000000000000000

bad hex line :10010000D8041808000000000000000000000000

Incidentally, I just noticed that the checksum for line 18 is wrong -- in the HEX file, it's F3, but it's showing up here as 00. Well, that's a different issue -- the parse_hex_line() function doesn't compare checksums until a few more checks after the length check that it's stuck on currently. However, what if the checksum goof is somehow related to the line length issue? Like, maybe there's some odd ASCII garbage that isn't visible that's causing this? Hmm...

I'm starting to think there's a problem between the Nginx server that's serving the HEX file and maybe the HttpClient library munging the data because of it?

I'll stop spamming the forum with this since it looks like it has nothing to do with the flasher code.
 
Yeah, look at the difference between what you are printing out before and what the error prints.

Code:
:10010000D804180800000000000000000000000000
:10010000D8041808000000000000000000000000

First one is the right length so there is something happening between those two outputs in addition to the missing checksum.
 
Yep.

In the loop that reads each character to reassemble each hex line, I added a memset( line, '\0', sizeof line ); to ensure the line buffer is empty before starting on the next line, and that explains the checksum issue -- there never was a checksum problem, it was just a dirty buffer causing a red herring.
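
A minimal sketch of that kind of line reassembly, written as plain C++ so it can be tried on a desktop. The LineBuffer struct, the function names, and the 64-byte size are all hypothetical and are not taken from my sketch or from the flasher; the point is only that the buffer is cleared between lines and always NUL-terminated.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical line assembler; names and buffer size are illustrative only.
struct LineBuffer {
    char   text[64];
    size_t used;
};

// Clear the buffer so nothing from a previous line can leak into the next.
void line_reset(LineBuffer *lb) {
    assert(lb != nullptr);
    memset(lb->text, '\0', sizeof lb->text);
    lb->used = 0;
}

// Feed one received character. Returns true when a complete, non-empty line
// is available in lb->text (NUL-terminated, without CR/LF). The caller is
// expected to process the line and then call line_reset().
bool line_feed(LineBuffer *lb, char c) {
    assert(lb != nullptr);
    if (c == '\r') return false;          // ignore the CR of a CRLF pair
    if (c == '\n') {
        lb->text[lb->used] = '\0';        // terminate explicitly as well
        return lb->used > 0;              // skip empty lines
    }
    if (lb->used < sizeof lb->text - 1) { // overlong lines are truncated
        lb->text[lb->used++] = c;
    }
    return false;
}
```

Either the memset() or the explicit terminator on its own is enough to stop a shorter line from showing the tail of a longer one; the sketch does both.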

Now it's becoming more apparent that it's the Nginx web server and/or the HttpClient library.

It might even be related to this issue, where HttpClient is making HTTP/1.1 requests, which means it must accept chunked responses, but the library does not support them, so we're getting spurious line endings in the response that are causing short lines (masked by the dirty buffer issue).
 
A follow-up for posterity...

Turns out, the root of all of my problems was having too large a receive buffer on my Client, which was causing missed characters in the HttpClient response body. Specifically, as I use TinyGSM, I set TINY_GSM_RX_BUFFER to 1024 -- something I did after reading some advice early on, but now I can't remember where I got that information nor why I chose to follow it. The default value of 64 works perfectly, although I have it set to 256 to help my MQTT client stay connected (which is what it was before I started messing with all of this).

Plus, it didn't help that I was using the way-outdated HttpClient, which sends HTTP/1.1 requests but does not support chunking. Moving to the up-to-date ArduinoHttpClient fork took care of that.
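
To show why chunking matters for a line-oriented format like Intel HEX, here is a rough desktop C++ sketch of decoding a chunked body. decode_chunked() is a hypothetical helper, not something from HttpClient or ArduinoHttpClient, and it ignores trailers. A client that skips this step hands the chunk-size lines and their CRLFs to the application as if they were payload.

```cpp
#include <cstddef>
#include <cstdlib>
#include <string>

// Hypothetical helper: decode an HTTP/1.1 "Transfer-Encoding: chunked" body.
// Returns false if the input is truncated or malformed.
bool decode_chunked(const std::string &in, std::string &out) {
    out.clear();
    size_t pos = 0;
    for (;;) {
        // Each chunk starts with its size in hex, terminated by CRLF.
        size_t eol = in.find("\r\n", pos);
        if (eol == std::string::npos) return false;
        std::string size_line = in.substr(pos, eol - pos);

        // strtoul stops at ';' if a chunk extension follows the size.
        char *end = nullptr;
        unsigned long size = strtoul(size_line.c_str(), &end, 16);
        if (end == size_line.c_str()) return false;  // no hex digits at all
        pos = eol + 2;

        if (size == 0) return true;                  // last chunk; trailers ignored

        // The chunk data must be present and followed by its own CRLF.
        size_t remaining = in.size() - pos;
        if (size > remaining || remaining - size < 2) return false;
        out.append(in, pos, size);
        pos += size;
        if (in.compare(pos, 2, "\r\n") != 0) return false;
        pos += 2;
    }
}
```

Note that chunk boundaries have nothing to do with hex-line boundaries, so a chunk can end in the middle of a hex line.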

In the end, none of my troubles had anything to do with the flasher.
 
Okay, now I have a legitimate issue.

I've been quite successful with downloading a hex file over HTTP and saving it to an external flash chip via LittleFS. This part of the process worked perfectly every time, so when the actual upgrade process starts, it has a perfect hex file.

When it comes time to read the stored hex file line-by-line into flash_hex_line(), this too works perfectly every time. Every call to flash_move() succeeds, which means every call to flash_erase_sector() and flash_write() and flash_wait() all succeed. I'm quite pleased with how well this all works!

The trouble comes at the very end, where it seems to hang after writing to the last address.

I modified the flash_move() function to spit out the sector address being erased and each address being written to, so I can see the progress and confirm that the entire address range (in the case of the Blink test sketch, 60000000 to 6000492c) is written. When it gets to the very last address, 6000492c, the entire process seems to hang. After the for() loop in flash_move(), I have a Serial.print() to indicate that the flash is complete, but this never fires. Because flash_move() doesn't finish, flash_hex_line() never returns 8, and thus upgrade_firmware() never finishes.

Here's what my modified flash_move() looks like:
Code:
void flash_move( uint32_t addressMin, uint32_t addressMax )
{
  SerialUSB.println( "Flashing..." );

  if( addressMin != FLASH_BASE || WRITE_SIZE != 4 )
  {
    sprintf( flashErrorString, "Base address incorrect\n" );
    flashError( 15 );
    return;
  }

  // below here is critical - we are writing to flash that could contain active code
  __disable_irq();
  
  // copy upper to lower, always erasing as we go up
  for( uint32_t address = addressMin; address <= addressMax; address += WRITE_SIZE )
  {

    if( ( address & ( FLASH_SECTOR_SIZE - 1 ) ) == 0 )
    {
      SerialUSB.printf( "\tErasing sector %x\n", address );
      flash_erase_sector( (void *)address );
      delay( 100 );
    }

    uint32_t from = *(uint32_t *)( (uint32_t)hexBuffer + ( address - FLASH_BASE ) );   // 4 bytes from buffer

    SerialUSB.printf( "\t\tWriting to %x...", address );
    flash_write( (void *)address, &from, WRITE_SIZE );
    SerialUSB.println( "done." );
    delay( 100 );
  }

  SerialUSB.println( "...flash complete." );

  __enable_irq();
  delay( 1000 );
  
}  // flash_move()

As you can see, I'm printing to serial before and after the flash_write(). When the sketch hangs after the last write, the Serial.print() afterward does fire, so it's not the flash_write() that's getting stuck. It's like the for() loop just stops, which doesn't make sense.

If I cycle power on the Teensy 4.1, the Blink sketch runs. So everything appears to work perfectly, just not at the very end after writing the last address.

Here's an excerpt of what my modified flasher is dumping to the serial console:
Code:
Bootup at 08:45 on Thursday, 15 July 2021


----------------------------------------------------------------------------------------------------
Booting into firmware update mode.
----------------------------------------------------------------------------------------------------

Preparing modem.....done.

Waiting for cellular network.......................................ready.

Connecting to cellular network...done.
    IMEI:       868963044845386
    ICCID:      8944501109207912266F
    Carrier:    T-Mobile Hologram
    Signal:     -75 dBm
    IP Address: 10.111.143.96

Configuring the clock...done.
    2021-07-15T08:47:06-05:00

Initiating firmware upgrade...
Connecting to __.___.___._ on port 80... request for firmware sent.

Downloading 50408 bytes...
    [   1] :0200000460009A
    [   2] :100000004643464200000156000000000103030081
    ...
    [1174] :10491C00000000000000000000000000000000008B
    [1175] :040000056000100087
    [1176] :00000001FF
Download complete.
    Lines: 1176 expected, 1176 received
    Bytes: 50408 processed out of 50408 total, 51591 saved to flash

Reading firmware file...
    [   1] :0200000460009A
    [   2] :100000004643464200000156000000000103030081
    ...
    [1174] :10491C00000000000000000000000000000000008B
    [1175] :040000056000100087
    [1176] :00000001FF
Firmware ready: 1176 hex lines, address range 60000000:6000492c.  Waiting for ':flash 1176' command.
    [1177] :flash 1176
Received :flash command.

Applying new firmware.
Flashing...
    Erasing sector 60000000
        Writing to 60000000...done.
        Writing to 60000004...done.
    ...
        Writing to 60000ff8...done.
        Writing to 60000ffc...done.
    Erasing sector 60001000
        Writing to 60001000...done.
        Writing to 60001004...done.
    ...
        Writing to 60001ff8...done.
        Writing to 60001ffc...done.
    Erasing sector 60002000
    ...
        Writing to 60004928...done.
        Writing to 6000492c...done.  <== Here is where the sketch stops.

What I expect to see after this is:
Code:
    ...
        Writing to 6000492c...done.  <== Here is where the sketch stops.

...flash complete.                   <== This comes after the for() loop in flash_move()
Firmware update complete!            <== This would come from the loop in upgrade_firmware() when the flash_hex_line() returns 8
Rebooting into normal mode...        <== This would come from a call to reboot(), which should fire after upgrade_firmware() completes


I'm using EEPROM to help my sketch determine whether it should boot into "normal" mode or firmware update mode. My reboot() contains an EEPROM write to reset that boot mode flag, but since this never fires, that flag is always set to firmware upgrade mode, so even a manual reboot would not allow my sketch to start up in "normal" mode again.

Just in case my use of EEPROM, which I understand is emulated in flash, was causing some kind of collision, I commented it all out, but it didn't help any.

Out of the dozens of times I've tested this, there were two instances where I saw my "Firmware update complete!" message, and it rebooted. But that's it, and I have no idea why it worked those two times and not for the other 99 times.

Any ideas?
 
You aren't using the version in #93.

Actually, I am. I did start with an earlier version, yes, but as I kept reading this thread, I came across the updated one in #93 and switched to that (very early on in my development). Just to be certain, I deleted my copies and re-downloaded just now, then compared with mine -- it's the same. May I ask how you determined that I'm not using that version?


You should switch to the latest version here (for Teensy 3 and 4):

https://forum.pjrc.com/threads/43165...sy-3-5-amp-3-6

Aha, a whole other thread with new stuff? Cool, I'll certainly have a look. The only downside is that I'm so close with what I have -- everything works except the tiny issue of getting stuck at the very end :p.

Thank you, jonr, for the help.
 
You posted the following code:

Code:
// below here is critical - we are writing to flash that could contain active code
  __disable_irq();

This doesn't appear in the flasher4.zip in #93. Also, as the comment says, this is critical code. So don't add print statements in flash_move().
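
To put a rough number on how long the modified flash_move() posted earlier keeps interrupts disabled, its loop can be modeled on a desktop. This is only a counting sketch: plan_flash() and FlashPlan are hypothetical names, WRITE_SIZE == 4 comes from the check at the top of flash_move(), and the 4 KiB sector size is an assumption inferred from the "Erasing sector" addresses in the log.

```cpp
#include <cstdint>

// Assumed values: 4-byte writes are enforced by the check in flash_move();
// the 0x1000 sector size is inferred from the log, not from the flasher source.
constexpr uint32_t kSectorSize = 0x1000u;
constexpr uint32_t kWriteSize  = 4u;

// Hypothetical summary of what the loop would do.
struct FlashPlan {
    uint32_t erases;    // number of sector erases
    uint32_t writes;    // number of 4-byte writes
    uint32_t delay_ms;  // total time spent in delay() calls
};

// Walk the same address range as the for() loop in flash_move(), counting
// the work instead of touching flash.
FlashPlan plan_flash(uint32_t addressMin, uint32_t addressMax,
                     uint32_t delay_per_step_ms) {
    FlashPlan plan = {0, 0, 0};
    for (uint32_t address = addressMin; address <= addressMax;
         address += kWriteSize) {
        if ((address & (kSectorSize - 1)) == 0) {
            plan.erases++;                       // sector boundary: erase first
            plan.delay_ms += delay_per_step_ms;  // delay( 100 ) after the erase
        }
        plan.writes++;
        plan.delay_ms += delay_per_step_ms;      // delay( 100 ) after the write
    }
    return plan;
}
```

For the Blink range 60000000 to 6000492c with 100 ms per step, this counts 5 erases and 4684 writes, which is 468,900 ms of delay() calls (close to eight minutes) inside the critical section, before any time spent printing.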
 
Oh, yeah, that. I'd swapped the cli() and sei() for those while I was sifting through the code to try to understand it. I forgot to change them back, and I wouldn't have thought it'd make a difference, but I guess it does?

Thanks for the tip on removing the print statements from flash_move() -- it wasn't obvious to me that it would interfere with "critical code."

As much as I know about microcontrollers, Arduino, and C/C++, I realize there's a shit-ton more that I don't know, so I'm not surprised when my snout gets whacked with the rolled-up newspaper.
 