T4 NVRAM (SNVS_LPGPR0..3) and the penalty of using it?

Digging a bit deeper on this: the answer to my question above seems to be YES. The i.MX RT1062 CPU effectively freezes for a long time when reading or writing SNVS registers, and your interrupts will have to wait. The code below is how I tested and confirmed this.

This creates a coding problem for me that I cannot solve now. Maybe someone has a workaround suggestion?

This is the dilemma: the application keeps some parameters in the SNVS_LPGPR0..3 registers. Not in flash ROM, because they get updated very frequently. It also does RS485 communication at 1.5 Mbit/s. Software interrupts toggle the data direction from Tx back to Rx after the last packet byte has been transmitted. After that, a connected device can respond within a few tens of microseconds. But if I'm too late releasing the RS485 line, I'll miss the response packet from the connected device, because my RS485 line driver is still in Tx mode. Likewise, there's already an issue with HardwareSerial when the LPUART Rx FIFO overflows. That happens at 4+ unserviced chars, so at 1.5 Mbaud that's also within ~50 us. So there is no way this Teensy 4.1 application can afford interrupt latencies of 50+ us. Yet any low-priority task that reads or writes the SNVS area will cause freezes of ~91 us (read or write) or ~182 us (read-modify-write). I see no way to prevent this, except for never using SNVS.
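
For reference, a quick sketch of the arithmetic behind these numbers. It assumes 8N1 framing (10 bits on the wire per character), which is my assumption and not stated above; the function names are made up for illustration.

```cpp
// Time one UART character occupies on the wire, in microseconds.
// bits_per_char = 10 assumes 8N1 framing (start + 8 data + stop bit).
constexpr double char_time_us(double baud, int bits_per_char = 10) {
    return bits_per_char * 1e6 / baud;
}

// Time until `fifo_depth` characters have arrived back to back,
// i.e. how long the receiver can go unserviced before the FIFO is full.
constexpr double fifo_fill_time_us(double baud, int fifo_depth,
                                   int bits_per_char = 10) {
    return fifo_depth * char_time_us(baud, bits_per_char);
}

// Number of characters that arrive while the CPU is stalled for `stall_us`.
constexpr double chars_during_stall(double baud, double stall_us,
                                    int bits_per_char = 10) {
    return stall_us / char_time_us(baud, bits_per_char);
}
```

With baud = 1.5e6 and a 4-character FIFO, fifo_fill_time_us() comes out well below the ~91 us measured for a single SNVS access, so one access is enough to overflow the FIFO if data is arriving at that moment.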

Looks like setting the real-time clock is another operation that also freezes the CPU, for ~92 us. I understand why the Cortex-M7 chip does all this, with its 32 kHz clock domains etc. But my application (and I guess many others) just cannot afford this behavior.
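
A small sketch to put the measured stall next to the slow clock. The only inputs are the ~91 us measurement and a 32.768 kHz clock; reading the result as "the access waits for a few slow clock edges" is an assumption on my part, not something taken from the reference manual. Function names are hypothetical.

```cpp
// Period of a clock in microseconds.
constexpr double clock_period_us(double freq_hz) {
    return 1e6 / freq_hz;
}

// How many periods of the slow clock fit in the measured stall.
constexpr double stall_in_periods(double stall_us, double freq_hz) {
    return stall_us / clock_period_us(freq_hz);
}

// CPU cycles that pass while the core is stalled, given the CPU clock in MHz.
constexpr double cpu_cycles_lost(double stall_us, double cpu_mhz) {
    return stall_us * cpu_mhz;
}
```

stall_in_periods(91.0, 32768.0) comes out just under 3, and cpu_cycles_lost(91.0, 600.0) shows how many cycles a 600 MHz core gives up per access.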

The only workaround I can think of right now is to use something like an external RTC/NVRAM chip on I2C or SPI. Does anyone have a better idea?


Code:
//#include <util/atomic.h>
#include <imxrt.h>

volatile uint32_t CYCCNT_start = 0;
volatile uint32_t CYCCNT_servicing_isr = 0;
volatile uint32_t latency = 0;
volatile uint32_t max_latency = 0;

volatile uint32_t Loop_count = 0;
volatile uint32_t Fire_interrupts_count = 0;

volatile uint32_t test0;
volatile uint32_t test1;

volatile bool was_enabled_at_powerup = 0;

uint32_t ms_now;
uint32_t ms_prev_now;
   
time_t time_start;

enum {READ_SNVS, NO_SNVS, WRITE_SNVS, READ_MODIFY_WRITE_SNVS, READ_TWO_SNVS, READ_RTC, WRITE_RTC};
const char test_mode_strings[10][40] = 
{
  "READ_SNVS", "NO_SNVS", "WRITE_SNVS", "READ_MODIFY_WRITE_SNVS", "READ_TWO_SNVS", "READ_RTC", "WRITE_RTC"
};

int test_mode = NO_SNVS;

void setup() 
{
    was_enabled_at_powerup = (SNVS_LPCR >> 24) & 1;
    SNVS_LPCR |= (1 << 24);
    
    time_start = Teensy3Clock.get();

    Serial.begin(115200);
    int retry = 0;    // Serial is the USB port, but if no PC connected/active, then proceed anyway
    while ((!Serial) && (retry++ < 100))
        delay(10);
    if (CrashReport)
      Serial.print(CrashReport);
    Serial.println("\n" __FILE__ " " __DATE__ " " __TIME__);

    Serial.printf("\nGPR0-3 was enabled = %d\n", was_enabled_at_powerup);

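    // 10 us expressed in timer ticks; the /4 assumes QTimer1 counts at a quarter of the 600 MHz CPU clock
    uint32_t reload = us_to_CYCCNT(10.0) / 4;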
    if (reload > 65535)
    {
      Serial.printf("\nError: requesting a too long 16 bits timer time\n");
      while (1);
    }

    attachInterruptVector(IRQ_QTIMER1, t1_isr); // start
    NVIC_ENABLE_IRQ(IRQ_QTIMER1);
    NVIC_SET_PRIORITY(IRQ_QTIMER1, 16);

    TMR1_CTRL1   = 0x0000;
    TMR1_LOAD1   = 0x0000;
    TMR1_COMP11  = reload;
    TMR1_CMPLD11 = reload;
    TMR1_CNTR1   = 0x0000;
    TMR1_SCTRL1  = 0x4000;
    TMR1_CTRL1 = TMR_CTRL_CM(1) | TMR_CTRL_PCS(0b1000) | TMR_CTRL_ONCE | TMR_CTRL_LENGTH;
   
    ms_now = millis();
    ms_prev_now = ms_now;
}

void t1_isr()
{
    CYCCNT_servicing_isr = ARM_DWT_CYCCNT;
    TMR1_SCTRL1 &= ~TMR_SCTRL_TCF;

    latency = CYCCNT_servicing_isr - CYCCNT_start;
    if (latency > max_latency) 
        max_latency = latency;

    Fire_interrupts_count++;
}

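// Convert CPU cycles to microseconds (assumes the CPU runs at 600 MHz)
double CYCCNT_to_us (uint32_t cycles)
{
    return (double)cycles / 600;
}

// Convert microseconds to CPU cycles (assumes the CPU runs at 600 MHz)
uint32_t us_to_CYCCNT (double us)
{
    us *= 600.0;
    return (uint32_t)us;
}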

//void lloop() __attribute__((optimize("-O0")));

void loop()
{
    Loop_count++;
    ms_now = millis();
    int r = random (32768);


// let TMR1 do a one-shot 10 us delay that ends with an interrupt that is to be serviced immediately after those 10 microseconds
    TMR1_CNTR1   = 0x0000;
    CYCCNT_start = ARM_DWT_CYCCNT;
    TMR1_CTRL1 = TMR_CTRL_CM(1) | TMR_CTRL_PCS(0b1000) | TMR_CTRL_ONCE | TMR_CTRL_LENGTH;

// one-shot is now ticking away 10 us. While it does that, we'll run one of these tests to see how long the CPU freezes (and thus also does not service interrupts!)
    switch (test_mode)
    {
      case READ_SNVS:
        test0 = SNVS_LPGPR0;
        break;
      case NO_SNVS:
        delay(2);
        break;
      case WRITE_SNVS:
        SNVS_LPGPR0 = r;
        break;
      case READ_MODIFY_WRITE_SNVS:
        SNVS_LPGPR0 += r;
        break;
      case READ_TWO_SNVS:
        test0 = SNVS_LPGPR0;
        test1 = SNVS_LPGPR1;
        break;
      case READ_RTC:
        time_start = Teensy3Clock.get();
        break;
      case WRITE_RTC:
        Teensy3Clock.set(time_start);
        break;
    }


// user input to select other tests and to reset the max latency detected so far
    if (Serial.available())
    {
        char ch;
        switch (ch = Serial.read())
        {
          case 'r' :
            max_latency = 0;
            Loop_count = 0;
            Fire_interrupts_count = 0;
            break;
          case '+' :
            test_mode++;
            if (test_mode > WRITE_RTC)
                test_mode = 0;
            break;
          case '0':
          case '1':
          case '2':
          case '3':
          case '4':
          case '5':
          case '6':
            test_mode = ch - '0';
            break;
        }
    }
   
// console output every 2 seconds to show ongoing test results 
    if ((ms_now - ms_prev_now) >= 2000)
    {
        ms_prev_now = ms_now;
        Serial.printf ("test_mode=%d %s ", test_mode, test_mode_strings[test_mode]);
        Serial.printf ("loop_count=%lu Cnt=%lu last latency=%1.3f us, longest latency=%1.3f us ", Loop_count, Fire_interrupts_count, CYCCNT_to_us (latency), CYCCNT_to_us (max_latency));
        Serial.printf ("test0= %08lx\n", test0);
    }

}
 
I haven't read that part of the documentation, so I'm not completely sure whether there's another fix, but what about writing to some shadow space in memory and then triggering DMA to copy the new data to the SNVS registers (rather than using the CPU directly)?
 
Good question. I was wondering why something that changes "very frequently" needs to be preserved across power cycles, firmware updates, etc.

In this application, it’s about an auto correction for drift in magnetic and gravitational MEMS. The values are recalculated every 10ms when this tool is rotating. The lowest bits are pretty much noise. When the tool stops spinning for 10 minutes, it will power off, and power for Teensy comes back when it spins again.

But I’m worried about this being a generic pitfall. I mean, how many of us are aware of the danger of interrupt latencies possibly going up to ~100 us, even for interrupts set to the top priority, when low-priority code sets e.g. the real-time clock? I was not aware. Until last week.

I will try the DMA mode as suggested by jmarsh. Tomorrow.
 
I haven't read that part of the documentation, so I'm not completely sure whether there's another fix, but what about writing to some shadow space in memory and then triggering DMA to copy the new data to the SNVS registers (rather than using the CPU directly)?

Briefly looked into DMA for memcpy. I fear that at best it might free the CPU from freezing, but it will freeze all other DMA channels that compete for bus access while the 'memcpy' DMA is taking its time to read or write SNVS registers. Each DMA channel has DMA priority features, just like different interrupts can have priorities. But that's not going to help much if an SNVS area DMA read or write has already commenced, is it? Plus, might that 32-byte DMA cache boundary give trouble?

A possible workaround would be to ditch the concept of using NVRAM SNVS_LPGPR0..3, and use flash instead. But my fear is that flash memory cells will degrade over time when erasing/overwriting too frequently. Or is that a thing of the past? Other fear is that writing to flash requires all interrupts disabled while writing and writes take a long time also - maybe also a thing of the past?
 
Briefly looked into DMA for memcpy. I fear that at best it might free the CPU from freezing, but it will freeze all other DMA channels that compete for bus access while the 'memcpy' DMA is taking its time to read or write SNVS registers. Each DMA channel has DMA priority features, just like different interrupts can have priorities. But that's not going to help much if an SNVS area DMA read or write has already commenced, is it? Plus, might that 32-byte DMA cache boundary give trouble?

It would, but what other DMAs are happening at the time? Whatever happens, something is going to be blocked while it does the SNVS write and it's better to not be the CPU.
DMA doesn't have any 32-byte alignment requirements, that's purely to do with the CPU cache (which only applies to DMAMEM/EXTMEM/cached memory, not DTCM).
 
In this application, it’s about an auto correction for drift in magnetic and gravitational MEMS. The values are recalculated every 10ms when this tool is rotating. The lowest bits are pretty much noise. When the tool stops spinning for 10 minutes, it will power off, and power for Teensy comes back when it spins again.

If it has to be calculated every 10 ms, there's no point saving it across power cycles, is there? It'll be stale by the time the power comes back.
 
If it has to be calculated every 10 ms, there's no point saving it across power cycles, is there? It'll be stale by the time the power comes back.

What it calculates is the offset of the Earth magnetic field sensors. It's about compensating for long-term drift (on a timescale of minutes to years) in these sensors. The X,Y sensors spin in the X,Y plane. It's a tool in a borehole. When they spin, then in each 360 deg rotation the field they sense swings between an arbitrary min and an arbitrary max. Right in between that min and max is where the sensor should read zero. So whatever that value is, it must be the offset to zero gauss. While the tool is on and spinning, it does the auto bias detection measurement and uses it for navigation. When it's been off for days and then goes back on, I'd like it to restart using the last known bias values from a previous run, because then it has a decent value immediately. So I don't need to wait, say, 30 minutes before the bias is measured and auto-cancelled again.
The raw sensor readings arrive every 10 ms. Huge low pass filters after that. But the update rate remains high. And I cannot easily predict when power for Teensy is about to disappear and persistence for these bias values is required...
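
To make the min/max idea concrete, here is a minimal sketch of a per-axis tracker. It is not the actual application code: the struct and its member names are hypothetical, and it leaves out the heavy low-pass filtering mentioned above.

```cpp
#include <algorithm>
#include <limits>

// Tracks the extremes one sensor axis sees while the tool rotates and
// reports the midpoint, which is the reading that corresponds to zero field.
struct AxisBiasTracker {
    float min_seen = std::numeric_limits<float>::max();
    float max_seen = std::numeric_limits<float>::lowest();

    // Feed one raw sensor reading (e.g. every 10 ms).
    void add_sample(float raw) {
        min_seen = std::min(min_seen, raw);
        max_seen = std::max(max_seen, raw);
    }

    // True once at least one sample has been added.
    bool has_data() const { return min_seen <= max_seen; }

    // Midpoint between the extremes; only meaningful after a full rotation.
    float bias() const { return 0.5f * (min_seen + max_seen); }

    // Start over, e.g. at the beginning of each rotation.
    void reset() { *this = AxisBiasTracker{}; }
};
```

The value returned by bias() is what would have to survive a power cycle, one per axis.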

That said, I could opt for saving only once every 10 seconds. But if I do that, for days, then will the flash cells (that emulate) EEPROM survive say 10k+ (over)write cycles? And how to make sure that the tool keeps doing its normal job for which many interrupt driven tasks need to carry on undisturbed, also when writing to flash...
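
A rough sketch of the write-count arithmetic for that question. It only counts writes; it says nothing about how the EEPROM emulation spreads them over flash cells. Function names are hypothetical.

```cpp
// Number of save operations per day for a given save interval in seconds.
constexpr double writes_per_day(double interval_s) {
    return 86400.0 / interval_s;
}

// Hours of continuous operation until `write_count` saves have been done.
constexpr double hours_until_writes(double write_count, double interval_s) {
    return write_count * interval_s / 3600.0;
}
```

With a 10 s interval, writes_per_day(10.0) is 8640, so 10k writes are reached after a bit more than one day of continuous running.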

but what other DMAs are happening at the time
A lot, actually, and they will not tolerate 100 us gaps in their real-time actions. Synchronous MEMS sensor readout over DMA SPI. 1-wire DMA_UARTS for maintaining a dialog with a motor controller, and more...
 
That said, I could opt for saving only once every 10 seconds. But if I do that, for days, then will the flash cells (that emulate) EEPROM survive say 10k+ (over)write cycles? And how to make sure that the tool keeps doing its normal job for which many interrupt driven tasks need to carry on undisturbed, also when writing to flash...

The PJRC site says 100K cycles for EEPROM, and that applies to each location individually, so you could use the entire space as an array. Or you might want to use the low-level code in cores\Teensy4\eeprom.c to manage the erasing and writing yourself. You can erase in 32K and 64K blocks, and use all of that space to hold values in sequence, and add logic to find the latest value on bootup.
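
A minimal sketch of the "find the latest value on bootup" logic, assuming each saved record carries an increasing sequence number and that erased flash reads back as all ones. The record layout and names are hypothetical, a plain array stands in for the flash block, and sequence wrap-around is not handled.

```cpp
#include <cstddef>
#include <cstdint>

// One saved entry. The layout is made up for illustration.
struct BiasRecord {
    uint32_t sequence;  // increases with every save; all ones means "erased"
    int32_t  bias_x;
    int32_t  bias_y;
};

constexpr uint32_t kErased = 0xFFFFFFFFu;

// Scan all slots and return the index of the newest record,
// or -1 if every slot is still erased.
int find_latest(const BiasRecord* records, size_t count) {
    int latest = -1;
    for (size_t i = 0; i < count; ++i) {
        if (records[i].sequence == kErased)
            continue;  // slot was never written
        if (latest < 0 || records[i].sequence > records[latest].sequence)
            latest = static_cast<int>(i);
    }
    return latest;
}
```

New values would go into the next erased slot, and a block only needs erasing once all of its slots are used.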
 
The EEPROM emulation code already takes care of wear-levelling to minimize the flash erase cycles.
 
If OP already had a 3V coin cell battery connected, they could just repurpose it to maintain power to pretty much any SPI RAM chip. As long as it's connected up for low power consumption (usually means putting a pull-up on CS) the battery should last for months, possibly years.
 
The PJRC site says 100K cycles for EEPROM, and that applies to each location individually, so you could use the entire space as an array. Or you might want to use the low-level code in cores\Teensy4\eeprom.c to manage the erasing and writing yourself. You can erase in 32K and 64K blocks, and use all of that space to hold values in sequence, and add logic to find the latest value on bootup.

Yes, but, EEPROM.c is full of __disable_irq(); flash_wait(); so I guess that's blocking interrupt services again... My application cannot tolerate that.
W25Q64 datasheet states individual writes are several ms, sector erase is tens-hundreds of ms max...
 
Your problem seems to be that you cannot tolerate losing power and therefore important settings.
I have a similar requirement and am implementing a UPS system (see picture below).
This utilises a 18650 cell as a battery with a TP4056 circuit for charge and a 3V-5V Boost Converter for 5V battery output.
The circuit incorporates a MOSFET circuit to select 5V from the "Wall" when available or battery 5V when not.
I also monitor battery voltage and save data and go into sleep mode when battery voltage gets to 3.2V.

The circuit below shows two 5V outputs but you probably only need one.
There is a picture below showing the boost circuit module, it costs about $1.

Boost Converter Circuit2.jpg
Boost Converter Circuit.jpg
 

Attachment: battery charge3.png
 
Yes, but, EEPROM.c is full of __disable_irq(); flash_wait(); so I guess that's blocking interrupt services again... My application cannot tolerate that.
W25Q64 datasheet states individual writes are several ms, sector erase is tens-hundreds of ms max...

23K256 low power SPI RAM: ~5uA standby current = roughly five years of battery lifetime from a 220mAh CR2032.
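
The estimate can be checked with a couple of lines. This is an idealised sketch: it ignores battery self-discharge and any current drawn outside standby, and the function names are hypothetical.

```cpp
// Hours a battery lasts at a constant current draw.
constexpr double battery_life_hours(double capacity_mAh, double current_uA) {
    return capacity_mAh * 1000.0 / current_uA;
}

// Same, expressed in years (365.25 days per year).
constexpr double battery_life_years(double capacity_mAh, double current_uA) {
    return battery_life_hours(capacity_mAh, current_uA) / (24.0 * 365.25);
}
```

battery_life_years(220.0, 5.0) lands at about five years, matching the figure above.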
 
Your problem seems to be that you cannot tolerate losing power and therefore important settings.
I have a similar requirement and am implementing a UPS system (see picture below).
This utilises a 18650 cell as a battery with a TP4056 circuit for charge and a 3V-5V Boost Converter for 5V battery output.
The circuit incorporates a MOSFET circuit to select 5V from the "Wall" when available or battery 5V when not.
I also monitor battery voltage and save data and go into sleep mode when battery voltage gets to 3.2V.

The circuit below shows two 5V outputs but you probably only need one.
There is a picture below showing the boost circuit module, it costs about $1.

View attachment 32172
View attachment 32171

This is identical to how I power my Teensy Bass pedal: two 18650s (P) connected directly to that very same USB charge controller and battery protector, followed by a small 5V boost converter. With the Teensy 4.1 running at 150 MHz, it does the job nicely.
 
I've used an external F-RAM chip on a Teensy 3.6 like MatrixRat suggested in Post #13. If you're using a Teensy 4.1 you should be able to use a QSPI F-RAM chip as noted here: https://www.infineon.com/dgdl/Infineon-AN218375_Designing_with_Infineon_Quad_SPI_(QSPI)_F-RAM-ApplicationNotes-v01_00-EN.pdf

The largest currently available in 8-SOIC is 4Mb or 512KB. https://www.digikey.com/en/products/detail/cypress-semiconductor-corp/CY15B104QSN-108SXIT/11486261

Just solder it to the pads on the back of the Teensy 4.1. The library that handles initializing the EXTMEM interface will probably have to be modified to recognize the F-RAM, but once that is done the non-volatile memory is accessed through the normal EXTMEM procedure.
 