Teensy 4 "EEPROM" blocking for flash sector erase

blahfoo

The T4 and T4.1 EEPROM emulation, which uses the flash ROM, sometimes has to erase a sector. This takes 0.045 to 0.400 seconds, and the code is completely blocking: interrupts are disabled the whole time. (eeprom.c in \hardware\teensy\avr\cores\teensy4; code below.)

I don't see what would push the Winbond serial flash chip toward the 0.4 second end of that spec, but I have to assume the worst unless I hear otherwise. (See p. 90 of the datasheet referenced in this thread: https://forum.pjrc.com/index.php?threads/teensy-4-1-eeprom-endurance.72280/).

Moreover, some writes of more than one byte will trigger several of these erases. Though unlikely, it appears possible for the CPU to disappear for roughly 3 to 25 seconds if all 63 sectors are touched. (Could that trigger a watchdog timer restart, since I am using one?)
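The 3 to 25 second range is just 63 sectors multiplied by the per-erase datasheet range. A minimal sketch of that arithmetic, counting erase time only (it ignores the time to rewrite live data after each erase) and using the figures quoted in this post:

```cpp
#include <cstdint>

// Datasheet range for one 4 KB sector erase, as quoted in this post (ms).
constexpr uint32_t kEraseMinMs = 45;
constexpr uint32_t kEraseMaxMs = 400;
// Flash sectors used by the T4.1 EEPROM emulation.
constexpr uint32_t kEepromSectors = 63;

// Worst-case blocking time if one write touches `sectors` sectors and every
// one of them needs an erase. Erase time only; programming time is ignored.
constexpr uint32_t worstCaseBlockMs(uint32_t sectors, uint32_t perEraseMs) {
  return sectors * perEraseMs;
}

static_assert(worstCaseBlockMs(kEepromSectors, kEraseMinMs) == 2835, "about 2.8 s");
static_assert(worstCaseBlockMs(kEepromSectors, kEraseMaxMs) == 25200, "about 25 s");
```

Even a single erase (45 to 400 ms) is far beyond a 1 ms budget, so limiting the number of sectors per write only shrinks the problem; it does not remove it.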

In my application, blocking for more than 1 ms is fatal to the purpose of the product, so writing to this flash is usually not permissible. Yet I need non-volatile storage, and not on the SD card: that's already in use and might not always be present. And I don't want to add more parts.

So my question is: for an application where there is only one user of the flash, and that user can be written to wait while everything else continues, is there a reason I shouldn't write a modified version of eeprom.c that doesn't block so long, or at all? Specifically here: eepromemu_flash_erase_sector()

Thanks if you have a chance to comment on this,
-Phil

Code:
// erase a 4K sector - from eeprom.c
static void flash_wait(); // forward declaration (defined below)

void eepromemu_flash_erase_sector(void *addr)
{
    __disable_irq(); // interrupts stay disabled until flash_wait() re-enables them
    FLEXSPI_LUTKEY = FLEXSPI_LUTKEY_VALUE;
    FLEXSPI_LUTCR = FLEXSPI_LUTCR_UNLOCK;
    FLEXSPI_LUT60 = LUT0(CMD_SDR, PINS1, 0x06); // 06 = write enable
    FLEXSPI_LUT61 = 0;
    FLEXSPI_LUT62 = 0;
    FLEXSPI_LUT63 = 0;
    FLEXSPI_IPCR0 = 0;
    FLEXSPI_IPCR1 = FLEXSPI_IPCR1_ISEQID(15);
    FLEXSPI_IPCMD = FLEXSPI_IPCMD_TRG;
    arm_dcache_delete((void *)((uint32_t)addr & 0xFFFFF000), 4096); // purge data from cache
    while (!(FLEXSPI_INTR & FLEXSPI_INTR_IPCMDDONE)) ; // wait
    FLEXSPI_INTR = FLEXSPI_INTR_IPCMDDONE;
    FLEXSPI_LUT60 = LUT0(CMD_SDR, PINS1, 0x20) | LUT1(ADDR_SDR, PINS1, 24); // 20 = sector erase
    FLEXSPI_IPCR0 = (uint32_t)addr & 0x00FFF000;
    FLEXSPI_IPCR1 = FLEXSPI_IPCR1_ISEQID(15);
    FLEXSPI_IPCMD = FLEXSPI_IPCMD_TRG;
    while (!(FLEXSPI_INTR & FLEXSPI_INTR_IPCMDDONE)) ; // wait
    FLEXSPI_INTR = FLEXSPI_INTR_IPCMDDONE;
    flash_wait(); // polls until the erase completes: this is where the long block happens
}

static void flash_wait()
{
    FLEXSPI_LUT60 = LUT0(CMD_SDR, PINS1, 0x05) | LUT1(READ_SDR, PINS1, 1); // 05 = read status
    FLEXSPI_LUT61 = 0;
    uint8_t status;
    do {
        FLEXSPI_IPRXFCR = FLEXSPI_IPRXFCR_CLRIPRXF; // clear rx fifo
        FLEXSPI_IPCR0 = 0;
        FLEXSPI_IPCR1 = FLEXSPI_IPCR1_ISEQID(15) | FLEXSPI_IPCR1_IDATSZ(1);
        FLEXSPI_IPCMD = FLEXSPI_IPCMD_TRG;
        while (!(FLEXSPI_INTR & FLEXSPI_INTR_IPCMDDONE)) {;}
        FLEXSPI_INTR = FLEXSPI_INTR_IPCMDDONE;
        asm("":::"memory");
        status = *(uint8_t *)&FLEXSPI_RFDR0;
    } while (status & 1); // status bit 0 = BUSY (erase/program still in progress)
    FLEXSPI_MCR0 |= FLEXSPI_MCR0_SWRESET; // purge stale data from FlexSPI's AHB FIFO
    while (FLEXSPI_MCR0 & FLEXSPI_MCR0_SWRESET) ; // wait
    __enable_irq(); // re-enabled only after the flash reports ready
}
 
I don't see what would influence the Winbond serial flash chip towards the 0.4 second end of that spec,

The maximum spec is probably conservative, covering a wide temperature range and manufacturing variation of the chips.

But it's also well known that flash memory write and erase become slower with use, so you could probably expect to see closer to the maximum after 100,000 erase-write cycles.


is there a reason I shouldn't write a modified version of eeprom.c to not block so long

Yes, of course there's a reason. The code wouldn't have been written this way for no reason.

If any code tries to execute from flash memory and has a cache miss needing the FlexSPI controller to fetch from the actual chip, wrong data will be read and your program will crash.
 
Thanks, Paul, for the first answer. I've been testing, and so far erases are faster than the fast end of the spec, if I read the address-scattering code right. I can make it go away for a good chunk of a second with multiple sector erases, but nothing like the spec. But I shouldn't trust that, for the reasons you mention. Interestingly, one can easily make it go away long enough that the serial terminal connection drops and reconnects a number of seconds later (probably the Windows end). I'll post the test routines here if anyone's interested.

On the second point, as I said, there would be only one user, unless there's something going on I don't understand (quite possible). Correct me if I'm wrong: since (as mentioned) nothing else uses the EEPROM emulation, I am assuming the only Teensyduino use of the flash is to load the code and initialized variables into RAM on startup (or restart)? Why else would code run from flash?
 
Any user code can be marked with the FLASHMEM attribute so that it runs from flash. This is helpful in memory-constrained systems where the total amount of runnable code does not fit into RAM1.
 
We do indeed have functions defined with FLASHMEM in libraries, and over time more can be expected. It just makes good sense to place code that isn't performance-critical only in flash memory.
 
Thanks. So if I make sure that all the code I'm using fits in (and is running from) RAM1, am I safe to write a private version of the EEPROM write code? I'm pretty much only running performance-critical things on this one. Of course it would get all the testing, but preferably only once. Teensyduino or whatnot isn't going to move things currently in RAM into FLASHMEM in the future? Or perhaps I should just go hunting for the FLASHMEM define and kill it?
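One way to check, rather than assume, that time-critical code is not executing from flash is to compare function addresses against the FlexSPI address window. This is a sketch, assuming the i.MX RT1062 memory map puts program flash (FlexSPI) at 0x60000000 and the T4.1's FlexSPI2 region (PSRAM/extra flash) at 0x70000000; verify both against the reference manual. The helper name is hypothetical.

```cpp
#include <cstdint>

// Assumed i.MX RT1062 address windows (check the reference manual):
constexpr uint32_t kFlexSpiBase = 0x60000000u;  // FlexSPI: program flash (XIP)
constexpr uint32_t kFlexSpiEnd  = 0x70000000u;  // FlexSPI2 (T4.1 PSRAM) starts here

// Hypothetical helper: true if code or data at `addr` is fetched through the
// FlexSPI controller on a cache miss, i.e. it is unsafe to touch while a
// sector erase is in progress.
constexpr bool inProgramFlash(uint32_t addr) {
  return addr >= kFlexSpiBase && addr < kFlexSpiEnd;
}
```

On the Teensy itself one could then audit a function at startup, e.g. `if (inProgramFlash((uint32_t)&myCriticalFunction)) Serial.println("runs from flash");`, where `myCriticalFunction` stands for any function in your program. The same check applies to pointers to constant data (PROGMEM), which is also read through FlexSPI.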

Latest testing: if I confine my stuff to one sector (68 bytes available, plenty), the read times are fine (<0.2 ms). The max write time is ~3 ms, however. This gadget has about 0.5 ms to save some parts from short circuits.
-Phil
 
One of my projects requires a watchdog with a timeout of a few ms, so I faced this issue of needing to prevent a watchdog reset during blocking flash operations. EEPROM writes can block for a relatively long time, but as far as I can tell so far, only when they trigger a sector erase. I did some testing and found that the only significant blocking is in flash_wait(), specifically in the do/while loop that waits for completion of whatever operation was requested. My workaround was to add a copy of cores\teensy4\eeprom.c to my project folder and modify flash_wait() to periodically refresh the watchdog.

I'm using the RTWDOG (WDOG3) with 32-bit refresh enabled, so modifying the do/while loop as shown below does the trick. I just added one line (plus a comment line) that refreshes WDOG3 if its counter value is greater than its window value, i.e. when the minimum time for refresh has been achieved. This avoids watchdog reset during any or all flash operations, including T4 EEPROM usage and firmware updates via FLASHERX.

Code:
static void flash_wait()
{
    FLEXSPI_LUT60 = LUT0(CMD_SDR, PINS1, 0x05) | LUT1(READ_SDR, PINS1, 1); // 05 = read status
    FLEXSPI_LUT61 = 0;
    uint8_t status;
    do {
        // 10/15/24 JWP toggle watchdog if CNT > WIN (minimum)
        if (WDOG3_CNT > WDOG3_WIN) WDOG3_CNT = 0xB480A602;        
        FLEXSPI_IPRXFCR = FLEXSPI_IPRXFCR_CLRIPRXF; // clear rx fifo
        FLEXSPI_IPCR0 = 0;
        FLEXSPI_IPCR1 = FLEXSPI_IPCR1_ISEQID(15) | FLEXSPI_IPCR1_IDATSZ(1);
        FLEXSPI_IPCMD = FLEXSPI_IPCMD_TRG;
        while (!(FLEXSPI_INTR & FLEXSPI_INTR_IPCMDDONE)) {;}
        FLEXSPI_INTR = FLEXSPI_INTR_IPCMDDONE;
        asm("":::"memory");
        status = *(uint8_t *)&FLEXSPI_RFDR0;
    } while (status & 1);
    FLEXSPI_MCR0 |= FLEXSPI_MCR0_SWRESET; // purge stale data from FlexSPI's AHB FIFO
    while (FLEXSPI_MCR0 & FLEXSPI_MCR0_SWRESET) ; // wait
    __enable_irq();
}
 
Nice! Thanks. This is helpful; I've just been turning on WDTs lately. I reduce the wait damage (for other reasons) by arranging for a single write to trigger at most one flash sector erase. I do that by re-addressing all my EEPROM writes so that each write (a small block of <64 bytes) stays within a single physical sector.
 
Watchdog timers are difficult. Or said another way, they're easy to use ineffectively.

If you're going to reset the watchdog inside a tight loop, at the very least you might consider adding a count variable to limit the number of times you're willing to reset it. Any place you're resetting the watchdog timer inside a small loop, you have a risk of preventing the watchdog from actually recovering from a problem.
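A minimal sketch of the count-limited refresh suggested above. The class and function names are hypothetical, and the budget inputs (worst-case wait, refresh rate, margin) are assumptions the designer has to supply; the example figures reuse the 0.4 s worst-case erase and 2 kHz refresh rate mentioned in this thread.

```cpp
#include <cstdint>

// Hypothetical helper (not part of eeprom.c): caps how many times a busy-wait
// loop may refresh the watchdog, so a flash chip that never reports "ready"
// still ends in a watchdog reset instead of hanging forever.
class BoundedWatchdogFeeder {
 public:
  explicit BoundedWatchdogFeeder(uint32_t maxRefreshes) : maxRefreshes_(maxRefreshes) {}

  // Returns true if the caller may refresh the watchdog one more time.
  bool tryRefresh() {
    if (used_ >= maxRefreshes_) return false;  // budget spent: stop feeding
    ++used_;
    return true;
  }

  uint32_t used() const { return used_; }

 private:
  uint32_t maxRefreshes_;
  uint32_t used_ = 0;
};

// Refresh budget = worst-case wait / refresh period, plus a safety margin.
constexpr uint32_t refreshBudget(uint32_t worstCaseMs, uint32_t refreshHz,
                                 uint32_t marginPercent) {
  return (worstCaseMs * refreshHz / 1000u) * (100u + marginPercent) / 100u;
}

// Sketch of use in flash_wait() (Teensy only, not compiled here):
//   BoundedWatchdogFeeder feeder(refreshBudget(400, 2000, 25));  // before the do/while
//   if (WDOG3_CNT > WDOG3_WIN && feeder.tryRefresh()) WDOG3_CNT = 0xB480A602;  // inside it
```

Constructing the feeder as a local before the loop gives each flash operation a fresh budget; a static instance would be used up across calls.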

Generally speaking, to get the best benefit from a watchdog timer you would want to verify that every feature of your program is operating correctly. Ideally you would only reset the watchdog when you're absolutely sure all subsystems are functioning properly. A short watchdog timeout feels good, because it seems like you'll quickly recover from any problem. But if the short timeout forces you to reset the watchdog under *any* circumstances where you can't fully verify all functions of your application are indeed working, you risk a scenario where some part of your program isn't working but another part keeps the watchdog from rebooting.
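One common way to put this into practice is a check-in scheme: each subsystem reports in only after verifying itself, and the watchdog is refreshed only when all of them have reported since the last refresh. This is a generic sketch of that pattern, not code from this thread; the class name and the 32-subsystem limit are assumptions.

```cpp
#include <cstdint>

// Hypothetical check-in supervisor: each subsystem calls checkIn() only after
// it has verified its own health, and the hardware watchdog is refreshed only
// when every subsystem has checked in since the last refresh.
class WatchdogSupervisor {
 public:
  explicit WatchdogSupervisor(uint32_t subsystemCount)
      : allMask_(subsystemCount >= 32 ? 0xFFFFFFFFu : (1u << subsystemCount) - 1u) {}

  // Record that subsystem `id` (0-based) has confirmed it is working.
  void checkIn(uint32_t id) {
    if (id < 32) seen_ |= (1u << id) & allMask_;  // ignore ids outside the set
  }

  // True only if all subsystems checked in; clears the check-ins for the next
  // period. The caller refreshes the real watchdog only when this returns true.
  bool readyToRefresh() {
    if (seen_ != allMask_) return false;
    seen_ = 0;
    return true;
  }

 private:
  uint32_t allMask_;
  uint32_t seen_ = 0;
};
```

A stuck subsystem then stops checking in, and the watchdog fires even though other parts of the program are still running.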

But another hard question is whether a watchdog reboot would actually recover if something has gone wrong with the flash memory. If the flash chip has become stuck (or something has become misconfigured with FlexSPI that put the flash chip in an unexpected state) how to reliably recover might involve more than simply rebooting.

These are hard problems to solve. Getting the best chance of recovery from unexpected problems using a watchdog timer isn't easy. I don't have any easy answer... just a cautious warning that unconditionally resetting the watchdog from within a small-scope loop should almost always be considered a red flag. To really make effective use of a watchdog timer, you really should design your program to be quite conservative about resetting the watchdog, usually only in circumstances where you can be certain everything is still functioning as it should.
 
Watchdog timers are difficult. Or said another way, they're easy to use ineffectively.
Points taken, Paul. In this case the watchdog timeout was specified for me, and we are using T4 EEPROM for parameters, so I really have no choice in the short term. Parameter writes will be rare, and erase even more so, but I can’t have the device reboot if it happens. We will use external EEPROM in future. The refresh occurs at 2 kHz within that loop, and I can slow it down further, so it’s not as bad as it looks. The window feature of RTWDOG is helpful there.
 
The problems with badly used WDTs are as old as microcontrollers, and many folks have experience doing it well. This is a change of subject, but I've got T4.1s going catatonic, possibly only when I use a lockable version in secure mode (SPI bus glitch = security violation?). I've exhausted every idea I have from the forums and documentation for getting it to reset itself, or even to flip a pin to say "reset my power", though one could detect this condition by watching for a pin going from a hard output to the weak "keeper" state. The WDTs work for software stuck in a loop, etc., but not for this hardware shutdown. Instead I've landed on an external WDT chip (ST STWD100) and a 5V regulator with an active-low disable input (Pololu D24V5F5). This works where none of the internal WDTs have. Now I can safely fix the cause of the shutdown, which will be the easy part.
 
I've got T4.1's going catatonic - possibly only when I use a lockable version in secure mode (SPI bus glitch=security violation?). The WDT's work for software stuck in a loop, etc, but not this hardware shutdown.

I don't understand the state of the processor when your system becomes "catatonic", but it's somehow hardware related. The internal watchdogs can't reset a chip that doesn't at least have running clocks, etc.
 
It stops running software (not even a WDT interrupt routine with a fast pin write runs) and releases drive on all digital I/O pins, which go into the keeper state. Teensy's 3.3V power output is still up. I suspect it's caused by weak ESD zaps coupling into the SPI port that goes to the LCD controller.
 
You might want to start another thread to address the hardware issues, since that wasn't really the topic here. The output pins would not change state if the processor was simply stopped, and if you do have ESD, the chip is likely being damaged, and prevention is more important than how to automatically reset/restart when it happens.
 
Yes, sorry, I did mention it elsewhere but got no answers; I probably should have started a brand new thread. No apparent hardware damage: it comes back all good after power cycling. In any event I'm convinced the external watchdog is easier than solving it in T4 setup: it was already working in 1 day and is way simpler (T4: what a beast! And I mean that in the most admiring way.) Also, in a system that should run for years without a hitch, it's good to have both automatic recovery and prevention, because no amount of testing is enough. ESD is suspected partly because it seems to be affected by humidity. If/when I learn exactly what's happening I'll post that.
 
Nice! Thanks. This is helpful - just been turning on WDT's lately. I reduce the wait damage (for other reasons) by arranging to only do one flash sector erase in a single write. I do that by re-addressing all my EEPROM writes to keep them in a single actual sector at any one (small block of <64 bytes) write.
Can you give some more info on keeping the EEPROM writes to one sector? Is it just writing less than 64 bytes (or is 64 also OK?), or is there more involved? This would be very useful information for most users of EEPROM.
 
It's actually a maximum of 68 bytes of emulated EEPROM in one sector on the T4.1, the way I'm doing it, but I don't want to unleash a not-long-tested idea on anyone, so I'll give you only a little code. Basically (if I remember correctly), the T4 EEPROM emulation scatters write addresses around the flash for better wear leveling, but I don't need that in my application since I'm not writing often. So I keep everything in one sector, so that only one sector is written (and possibly erased) at a time, reducing the maximum time. Also, laziness: I didn't want to change the EEPROM emulation code itself.

The de-scattering looks like this:
C++:
#include <EEPROM.h>

// "EEPROM" facility
//   T4.1 flash EEPROM emulation with wear leveling across 63 x 68-byte sectors
//const unsigned EEsize = 4284;
const unsigned EEsize = 68;
  // Notes: scattering for wear leveling works like this:
  // Each 4-byte chunk (starting at addrs 0-3) is in a different sector, up to bytes 248-251.
  // Then 252-255 are in the first sector once again. The pattern repeats.
  // Only changed bytes are written into that sector, as new records. Reading gets the latest.
  // Once a sector is full, the latest data is copied and the sector is erased,
  // thus losing the dead record space. Erasing that sector costs up to 0.4 s, BLOCKING.
  // New-part measurement is <4 ms, however. Erase time is said to go up with wear.

// Re-arrange flash addresses sequentially, so all settings/NV variables can be kept in one sector.
// Only works for the first sector of "EEPROM" addresses: 68 bytes (idx 0..EEsize-1).
inline unsigned deScattered(unsigned addr)
{
  return (addr & 3) + 252 * (addr >> 2);
}

// Write interface to "EEPROM"
void EEwriteDeScattered(unsigned idx, uint8_t val)
{
  if (idx >= EEsize) return; // out of range would land in a different sector
  EEPROM.write(deScattered(idx), val);
}

// Read interface to "EEPROM"
uint8_t EEreadDeScattered8b(unsigned idx)
{
  if (idx >= EEsize) return 0xFF; // out of range: return the erased-flash value
  return EEPROM.read(deScattered(idx));
}
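The one-sector claim can be checked on a PC. In this sketch, `sectorOf()` is my reading of how the emulation picks a sector, derived from the comments above (4-byte chunks rotating through 63 sectors, i.e. `(addr >> 2) % 63`); treat it as an assumption about eeprom.c, not its actual code.

```cpp
#include <cstdint>

constexpr unsigned kSectors = 63;        // flash sectors used on the T4.1
constexpr unsigned kEepromBytes = 4284;  // 63 x 68 bytes of emulated EEPROM

// Assumed sector selection: consecutive 4-byte chunks rotate through the sectors.
constexpr unsigned sectorOf(unsigned addr) { return (addr >> 2) % kSectors; }

// Same mapping as deScattered() above.
constexpr unsigned deScattered(unsigned addr) { return (addr & 3) + 252 * (addr >> 2); }

// True if logical indices 0..count-1 all map into sector 0, stay inside the
// emulated EEPROM, and never collide with each other.
bool allInFirstSector(unsigned count) {
  for (unsigned i = 0; i < count; ++i) {
    unsigned a = deScattered(i);
    if (a >= kEepromBytes || sectorOf(a) != 0) return false;
    if (i > 0 && a <= deScattered(i - 1)) return false;  // strictly increasing => distinct
  }
  return true;
}
```

Under that assumption, indices 0 to 67 all land in sector 0 (the last one at address 4035), and index 68 would map to 4284, one past the end, which is why the helper stops at 68 bytes.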
 
While it's unlikely the Teensy 4 code emulating EEPROM will change, you have no assurance future releases will always implement this specific wear leveling strategy. If you build your code to depend on internal details of the EEPROM emulation wear leveling, please be aware your program could break if some future version of Teensy's core library changes the wear leveling algorithm.
 
Thanks

Would a conditional erase/rewrite function be an option? Something like: if there is less space than needed to write x bytes at a certain EEPROM address, do a copy/erase now. This would make it possible to avoid copy/erase at time-critical moments and limit it to one sector at a time, avoiding a long blocking wait. A user could call it at restart, for example, or in the loop. I'm aware that this would lower the wear levelling rating, but most of the time only a few sectors would be almost full.
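The proposal can be modelled on a PC as below. This is a hypothetical sketch of the idea only: the struct, its fields, and the one-record-per-changed-byte rule are assumptions, not values or functions taken from eeprom.c, and the Teensy core does not (as far as this thread shows) provide such a call.

```cpp
#include <cstdint>

// Hypothetical model of one emulated-EEPROM sector. Assumptions: every changed
// byte appends one record, an erased sector holds `capacity` records, and
// compaction erases the sector and writes back one record per live byte.
struct SectorModel {
  uint32_t capacity;  // records that fit in an erased sector (assumed)
  uint32_t live;      // live bytes copied back after an erase
  uint32_t used;      // records currently in the sector

  // Would writing `changed` more bytes force an erase during the write?
  bool wouldErase(uint32_t changed) const { return used + changed > capacity; }

  // Call when blocking is acceptable (e.g. at startup): compact now if the
  // next write of up to `reserve` bytes would otherwise trigger an erase.
  bool compactIfNeeded(uint32_t reserve) {
    if (!wouldErase(reserve)) return false;
    used = live;  // the erase + rewrite happens here, not mid-operation
    return true;
  }
};
```

Calling `compactIfNeeded()` for each sector at a quiet moment trades some wear (sectors are erased slightly before they are full) for the guarantee that a later small write will not block on an erase.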

On a T4.0 there are only 1080 bytes of EEPROM. I've read in some older posts that it could be increased, at the cost of less wear levelling. At that time it was not "possible" to increase the number of wear levelling sectors, because some software would erase them. I also remember that some of those erase routines were changed to make a more permanent filesystem possible on the T4.0. Is the space reserved for EEPROM still limited, or is it more flexible?
 
Aah thanks.
 
In that case, the watchdog would reset the processor, unless there is a watchdog refresh in the ISR of the offending interrupt. What the OP says about the GPIO pins seems more like a hardware issue.
Hardware issue found for my catatonic T4.1 problem: a bad choice of IC sockets! The +5V power pin (the highest-current pin, on a corner) was just on the edge of disconnecting. That could be caused by pushing sideways on the Teensy, or by low humidity. Interestingly, once the +5V input dropped enough to shut down the CPU, there must have been enough dissipation (tens of mW) in that pin's contact resistance to hold it in the bad-connection state, but power cycling cooled the connection down enough to allow it to restart. I'll post this with the pin socket model numbers.
 