CPU Operating Voltage Boundary issue with 528MHz speed on Teensy 4.1

KenHahn

I am posting this to capture the info in case others stumble onto the same issue and are doing a search or have further input. Tagging @defragster as he was instrumental in figuring out what was going on.

The system in question uses a Teensy 4.1 with a 7” RA8875 LCD sitting on SPI1 and mounted onto a PCB baseboard.

Various software that had been running on the system over a fairly long time showed no issues with the setup.

T41 software (HAM radio software with heavy display usage) was installed and would quickly fail with the display freezing. This usually occurred while painting the initial screen, but sometimes a little later. The same software was previously running OK on a number of different, but similar setups that used the same LCD also on SPI1.

There seemed to be a thermal component, as a warm system would tend to fail sooner than a cold one, but the setup was not hot even though the Teensy was positioned under the LCD. CPU temp was generally at or below about 50°C.

A total of 9 Teensy boards were installed and run on 3 separate baseboards. 5 of them failed with this same hang condition and 4 ran OK. A Teensy that passed or failed stayed consistent when swapped between baseboards. Adding extender cables on the bus, to see whether a marginal bus integrity issue of some type was involved, had no effect. Neither did changing the value of the bus damping resistors from 56 ohm to 100 ohm. I have not seen this type of variability between Teensy boards before, even if the problem was with the baseboard itself.

The focus was on the SPI1 bus, and it appeared that some SPI transactions were not being completed, thus hanging the bus and freezing the display, but not hanging the Teensy itself.

The issue was eventually tracked down to the fact that the CPU speed option used for the build had been set to 528MHz. The optimization option was also set to smallest code size, though that does not appear to be a factor.

Defragster theorized that it might be related to the lower internal operating voltage of 1.15V used at 528MHz vs. 1.25V used at the standard 600MHz. He modified the operating voltage to be 1.25V while keeping the CPU frequency at 528MHz and that proved to fix the issue. Compiling the software at 600MHz (which also raises the voltage to 1.25V) also fixed the issue. Further testing indicated that a voltage of about 1.20V at 528MHz seemed to be where most if not all systems would work correctly. At least one previous failure case at 1.15V would work at 1.18V.
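
For anyone wanting to experiment with the in-between voltages mentioned above, here is a minimal sketch of the arithmetic only. It assumes the i.MX RT1062 DCDC target field (DCDC_REG3 TRG) is 5 bits wide, starts at 0.8V for code 0, and moves in 25mV steps; check the reference manual before relying on that. The function and constant names are mine, not from the Teensy core, and nothing here touches hardware.

```cpp
#include <stdint.h>

// Assumed encoding of the DCDC target field: 800 mV at code 0, 25 mV per step,
// 5-bit field (codes 0..31). These constants are assumptions, see lead-in.
const uint32_t DCDC_BASE_MV = 800;
const uint32_t DCDC_STEP_MV = 25;
const uint32_t DCDC_TRG_MAX = 31;

// Convert a requested core voltage in millivolts to a target code.
// Requests between steps round DOWN; out-of-range requests are clamped.
uint32_t trgFromMillivolts(uint32_t mv) {
  if (mv < DCDC_BASE_MV) mv = DCDC_BASE_MV;
  uint32_t code = (mv - DCDC_BASE_MV) / DCDC_STEP_MV;
  return (code > DCDC_TRG_MAX) ? DCDC_TRG_MAX : code;
}

// Convert a target code back to the voltage it represents, in millivolts.
uint32_t millivoltsFromTrg(uint32_t code) {
  if (code > DCDC_TRG_MAX) code = DCDC_TRG_MAX;
  return DCDC_BASE_MV + code * DCDC_STEP_MV;
}
```

Under those assumptions, 1150mV, 1200mV and 1250mV land exactly on steps, while a request for 1180mV would round down to the 1175mV step, so the "1.18V" test point may really have been 1.175V.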

The exact root of the issue was not uncovered. Clearly there was something marginal at this lower internal operating voltage that caused some Teensy 4.1s, but not others in the exact same baseboard, to malfunction with regard to the SPI1 bus. It seemed to be a perfect storm with this system, as each of the pieces was observed to run this same software fine in other systems, including the same Teensy 4.1s.

Defragster mentioned that in hindsight, the project by @Dogbone06 which used 16-bit 32MB external SDRAM while running at 528MHz, may have also run into a similar issue which required fine tuning of the operating voltage to achieve optimum performance.
 
Defragster mentioned that in hindsight, the project by @Dogbone06 which used 16-bit 32MB external SDRAM while running at 528MHz, may have also run into a similar issue which required fine tuning of the operating voltage to achieve optimum performance.
It did, but it also used the industrial version of the IMXRT chip, which is rated for nominal 500MHz operation rather than 600MHz, so that was assumed to be the cause.
In that case the code would typically crash with bizarre "can't happen" errors. For example, register dumps showed values that were different from what the code immediately before the fault had loaded into them.
 
Responding to this before reading @KenHahn post I read the pre-release of...
used the industrial version of the IMXRT chip
It did indeed use that 528M Auto chip. Changing the voltage was the last thing tried, after much testing and running to get a reproduction of the hang.
I only recall that the PSRAM mem test, moved over to SDRAM, was returning unexpected values. Maybe there was more; it's been 2 years.

Repro could take 7-10 minutes, or a board could still be running fine after an overnight run. And selected T_4.1s would NEVER show failure. After seeing the recent Silicon REV [ SRC_SBMR2 ] discovery note, I found I had TWO boards reporting the same earlier REV#25: ONE never failed to run as desired, and the SECOND would repro the stalling problem. This was consistent on the PCB at hand with the RA8875 connected, even when the T_4.1s were swapped over time.

Voltage change was an early thought, but it was only tried when nothing else made sense and a repro was at hand. I started interval timers that kept ticking after the 'Profiling LEDS' in the code stopped showing any activity. At no time did the T_4.1 show signs of an overtemp restart or a logged CrashReport restart. ALSO, the AUTO Teensy Loader upload was functional, so the 'core' was still active. Until the PRI==42 IntervalTimer was added to toggle another LED, there was no indication of function. Note: the IntervalTimer was set to trigger every 4 sec. Other efforts to show life at times made the repro case seem to go away.
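
A minimal sketch of that kind of heartbeat, for anyone chasing a similar stall. This is not the code used in the testing above; the pin number and all names are placeholders of mine. It uses the Teensy IntervalTimer with its priority set to 42 and a 4 second period, and toggles a spare LED from the interrupt so it keeps blinking even if the main loop is stuck waiting on SPI. The hardware calls are wrapped in #ifdef ARDUINO so the toggle logic can also be compiled and checked on a PC.

```cpp
#include <stdint.h>

// Placeholder values, not taken from the original test sketch.
const uint8_t  HEARTBEAT_LED_PIN   = 2;        // any spare pin with an LED
const uint8_t  HEARTBEAT_PRIORITY  = 42;       // lower number = higher priority
const uint32_t HEARTBEAT_PERIOD_US = 4000000;  // 4 seconds

volatile uint32_t heartbeatTicks = 0;   // number of timer interrupts seen
volatile bool     heartbeatLedOn = false;

// Called from the timer interrupt: flip the LED state and count the tick.
void heartbeatIsr() {
  heartbeatLedOn = !heartbeatLedOn;
  heartbeatTicks = heartbeatTicks + 1;
#ifdef ARDUINO
  digitalWriteFast(HEARTBEAT_LED_PIN, heartbeatLedOn ? HIGH : LOW);
#endif
}

#ifdef ARDUINO
IntervalTimer heartbeatTimer;

void setup() {
  pinMode(HEARTBEAT_LED_PIN, OUTPUT);
  heartbeatTimer.priority(HEARTBEAT_PRIORITY);
  heartbeatTimer.begin(heartbeatIsr, HEARTBEAT_PERIOD_US);
}

void loop() {
  // The application code that may stall goes here.
}
#endif
```

If the LED keeps toggling while the display is frozen, the core and its interrupts are still alive and the stall is somewhere in the foreground code or the peripheral it is waiting on.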
 
Having read p#1, that covers the situation. The person behind the HAM software showing the issue made some note that 'others tend to 528MHz' due to running in a hot enclosure (I suppose). It was a long chain of emails over days. The HAM at hand ended up with 1ft of SPI wires, and a scope and logic analyzer seeing good SPI traffic, so nothing SPI specific to the display was noted.

The sketch updates the MCU Temp on screen and it was showing 47-50°C here with dynamic room temp. Bumping the voltage to 1.25V or 1.20V did not show any change in the displayed temperature range.

The T_4.1 here that failed at 1.15V showed no failure at 1.18V as tested, but one did for Ken, so the voltage was bumped to 1.20V and all seems well on both.
 
Ok, I've put adjusting the CPU voltage on my high priority list for version 1.61.
For your reference: this is the code I use in my local tree so I don't have to worry whether the target board is a regular Teensy or one of the SDRAM boards; it checks the OTP to set the voltage based on the chip variant. I'm not suggesting it should be adopted directly (because I capped the overvolt to significantly less than normal), but it would be nice if the core code supported the industrial temperature chips.
 
The code seems right. The voltage (v, in mV) it gives at each frequency (f/1M, in MHz) where the value changes:
Code:
 ---- Teensy 1062 chip ----
f/1M=16 v=950
f/1M=32 v=975
f/1M=64 v=1000
f/1M=96 v=1025
f/1M=128 v=1050
f/1M=160 v=1075
f/1M=192 v=1100
f/1M=224 v=1125
f/1M=244 v=1150
f/1M=492 v=1175
f/1M=528 v=1200
f/1M=564 v=1225
f/1M=600 v=1250
f/1M=628 v=1275
f/1M=656 v=1300
f/1M=684 v=1325
f/1M=712 v=1350
f/1M=740 v=1375
f/1M=768 v=1400
f/1M=796 v=1425
f/1M=824 v=1450
f/1M=852 v=1475
f/1M=880 v=1500
f/1M=908 v=1525
f/1M=936 v=1550
f/1M=964 v=1575
 ---- 500 MHz AUTO 1062 ----
f/1M=16 v=975
f/1M=32 v=1000
f/1M=64 v=1025
f/1M=96 v=1050
f/1M=128 v=1075
f/1M=160 v=1100
f/1M=192 v=1125
f/1M=224 v=1150
f/1M=244 v=1175
f/1M=492 v=1200
f/1M=528 v=1225
f/1M=564 v=1250
f/1M=600 v=1275
f/1M=628 v=1300
f/1M=656 v=1325
f/1M=684 v=1350
f/1M=712 v=1375
f/1M=740 v=1400
f/1M=768 v=1425
f/1M=796 v=1450
f/1M=824 v=1475
f/1M=852 v=1500
f/1M=880 v=1525
f/1M=908 v=1550
f/1M=936 v=1575

from sketch:
Code:
#undef HW_OCOTP_CFG3
// code from https://github.com/PaulStoffregen/cores/commit/538cb33001c9e9033cb0d95d7911a0e25eeb9308
// https://forum.pjrc.com/index.php?threads/cpu-operating-voltage-boundary-issue-with-528mhz-speed-on-teensy-4-1.77839/post-367667
#define OVERCLOCK_STEPSIZE  28000000
#define OVERCLOCK_MAX_VOLT  1575
uint32_t getVolt( uint32_t frequency, uint32_t HW_OCOTP_CFG3 )
{
  uint32_t voltage; // millivolts
  if (frequency <= 240000000) {
    voltage = 950 + (frequency / 32000000) * 25;
  } else if (frequency <= 456000000) {
    voltage = 1150;
  } else if (frequency <= 600000000) {
    voltage = 1150 + ((frequency - 456000000) / 36000000) * 25;
  } else {
    voltage = 1250 + ((frequency - 600000000) / OVERCLOCK_STEPSIZE) * 25;
  }
  if (((HW_OCOTP_CFG3 >> 16) & 3) == 1) {
    // wide temperature chips rated for 500 MHz may need higher voltage
    voltage += 25;
  }
  if (voltage > OVERCLOCK_MAX_VOLT) voltage = OVERCLOCK_MAX_VOLT;

  return voltage;
}

void setup() {
  // put your setup code here, to run once:

  uint32_t frequency;
  uint32_t HW_OCOTP_CFG3;

  Serial.begin( 2);
  HW_OCOTP_CFG3 = 0;
  uint32_t highV = 0;
  Serial.println( " ---- Teensy 1062 chip ----");
  for ( frequency = 16000000; frequency < 1100000000; frequency += 4000000 ) {
    uint32_t nowV = getVolt( frequency, HW_OCOTP_CFG3 );
    if ( nowV > highV ) {
      Serial.printf( "f/1M=%lu v=%lu\n", frequency / 1000000, nowV );
      highV = nowV;
    }
  }
  HW_OCOTP_CFG3 = 1 << 16;
  highV = 0;
  Serial.println( " ---- 500 MHz AUTO 1062 ----");
  for ( frequency = 16000000; frequency < 1100000000; frequency += 4000000 ) {
    uint32_t nowV = getVolt( frequency, HW_OCOTP_CFG3 );
    if ( nowV > highV ) {
      Serial.printf( "f/1M=%lu v=%lu\n", frequency / 1000000, nowV );
      highV = nowV;
    }
  }
}

void loop() {
  // put your main code here, to run repeatedly:
}
 