Teensy 4.1 : overclocking - do NOT do!

tjaekel

FYI,
I found a thread on the Internet about overclocking the MCU on the Teensy 4.1 board. Even though it says that "with a heatsink or active cooling" you can reach up to 1 GHz (1000 MHz, compared to 600 MHz nominal),
that is not really true.

What I have found:
  1. running faster than 600 MHz degrades the GPIO signals (their peak level no longer reaches 3V3)
  2. too fast - and it bricks the board!

The faster you set the MCU core clock, the more the GPIO signals degrade! They no longer reach the 3V3 level (examples below): the peak amplitude of a pulsed GPIO drops.
Starting at 900 MHz MCU core clock, flashing (bootloader) via USB becomes flaky (unreliable).
At 1000 MHz core clock it was almost IMPOSSIBLE to flash the board again: USB is so flaky that the bootloader and USB flashing almost stop working!
(It took me many attempts just to recover the board!)

Overclocking affects the bootloader, USB, ...!

My conclusion:
It does not make sense to try overclocking the MCU:
at 600 MHz it already seems to be right at its maximum performance. Beyond that, signals become degraded (no longer reaching the nominal high-level voltage) and the
flash loader, USB, ... start getting unreliable.

The maximum clock speed I found I could set: 620 MHz (resulting in 618 MHz effective).
(Independent of cooling: based on signal integrity and reliability.)

Details
Code:
MCU clock set   MCU clock effective   GPIO voltage level   max. GPIO toggle frequency   comments
----------------------------------------------------------------------------------------------------------------------------------------------
600 MHz         600 MHz               3V3                  149.7/150.2 MHz (1)          OK
620 MHz         618 MHz               3V3                  154.8 MHz (1)                OK - still ok - but MAXIMUM!
700 MHz         696 MHz               2V0                  162.3 MHz (1)                peak amplitude degrades!
800 MHz         804 MHz               1V9                  201 MHz (1)                  not enough peak amplitude!
900 MHz         900 MHz               1V8                  225 MHz (1)                  flash loading gets flaky
1000 MHz        1000 MHz              1V7                  ? (not measured)             almost impossible to use USB, UART, flash loader -> BRICKED BOARD!

(1) I can reach this speed only with my own GPIO pin set function, not with library functions such as "digitalWrite()" (separate topic).

Code/Functions

The MCU clock speed is set in a file "clockspeed.c" (C code!) in the library.
Here is the local version I used:

Code:
#include <stdint.h>
#include "imxrt.h"
#include "wiring.h"

// A brief explanation of F_CPU_ACTUAL vs F_CPU
//  https://forum.pjrc.com/threads/57236?p=212642&viewfull=1#post212642
volatile uint32_t F_CPU_ACTUAL = 396000000;
volatile uint32_t F_BUS_ACTUAL = 132000000;

// Define these to increase the voltage when attempting overclocking
// The frequency step is how quickly to increase voltage per frequency
// The datasheet says 1600 is the absolute maximum voltage.  The hardware
// can actually create up to 1575.  But 1300 is the recommended limit.
//  (earlier versions of the datasheet said 1300 was the absolute max)
#define OVERCLOCK_STEPSIZE  28000000
#define OVERCLOCK_MAX_VOLT  1575

#ifdef __cplusplus
extern "C" {
#endif
uint32_t set_arm_clock(uint32_t frequency);
#ifdef __cplusplus
}
#endif

// stuff needing wait handshake:
//  CCM_CACRR  ARM_PODF
//  CCM_CBCDR  PERIPH_CLK_SEL
//  CCM_CBCMR  PERIPH2_CLK_SEL
//  CCM_CBCDR  AHB_PODF
//  CCM_CBCDR  SEMC_PODF

uint32_t set_arm_clock(uint32_t frequency)
{
	uint32_t cbcdr = CCM_CBCDR; // pg 1021
	uint32_t cbcmr = CCM_CBCMR; // pg 1023
	uint32_t dcdc = DCDC_REG3;

	// compute required voltage
	uint32_t voltage = 1150; // default = 1.15V
	if (frequency > 528000000) {
		voltage = 1250; // 1.25V
#if defined(OVERCLOCK_STEPSIZE) && defined(OVERCLOCK_MAX_VOLT)
		if (frequency > 600000000) {
			voltage += ((frequency - 600000000) / OVERCLOCK_STEPSIZE) * 25;
			if (voltage > OVERCLOCK_MAX_VOLT) voltage = OVERCLOCK_MAX_VOLT;
		}
#endif
	} else if (frequency <= 24000000) {
		voltage = 950; // 0.95
	}

	// if voltage needs to increase, do it before switch clock speed
	CCM_CCGR6 |= CCM_CCGR6_DCDC(CCM_CCGR_ON);
	if ((dcdc & DCDC_REG3_TRG_MASK) < DCDC_REG3_TRG((voltage - 800) / 25)) {
		dcdc &= ~DCDC_REG3_TRG_MASK;
		dcdc |= DCDC_REG3_TRG((voltage - 800) / 25);
		DCDC_REG3 = dcdc;
		while (!(DCDC_REG0 & DCDC_REG0_STS_DC_OK)) ; // wait voltage settling
	}

	if (!(cbcdr & CCM_CBCDR_PERIPH_CLK_SEL)) {
		const uint32_t need1s = CCM_ANALOG_PLL_USB1_ENABLE | CCM_ANALOG_PLL_USB1_POWER |
			CCM_ANALOG_PLL_USB1_LOCK | CCM_ANALOG_PLL_USB1_EN_USB_CLKS;
		uint32_t sel, div;
		if ((CCM_ANALOG_PLL_USB1 & need1s) == need1s) {
			sel = 0;
			div = 3; // divide down to 120 MHz, so IPG is ok even if IPG_PODF=0
		} else {
			sel = 1;
			div = 0;
		}
		if ((cbcdr & CCM_CBCDR_PERIPH_CLK2_PODF_MASK) != CCM_CBCDR_PERIPH_CLK2_PODF(div)) {
			// PERIPH_CLK2 divider needs to be changed
			cbcdr &= ~CCM_CBCDR_PERIPH_CLK2_PODF_MASK;
			cbcdr |= CCM_CBCDR_PERIPH_CLK2_PODF(div);
			CCM_CBCDR = cbcdr;
		}
		if ((cbcmr & CCM_CBCMR_PERIPH_CLK2_SEL_MASK) != CCM_CBCMR_PERIPH_CLK2_SEL(sel)) {
			// PERIPH_CLK2 source select needs to be changed
			cbcmr &= ~CCM_CBCMR_PERIPH_CLK2_SEL_MASK;
			cbcmr |= CCM_CBCMR_PERIPH_CLK2_SEL(sel);
			CCM_CBCMR = cbcmr;
			while (CCM_CDHIPR & CCM_CDHIPR_PERIPH2_CLK_SEL_BUSY) ; // wait
		}
		// switch over to PERIPH_CLK2
		cbcdr |= CCM_CBCDR_PERIPH_CLK_SEL;
		CCM_CBCDR = cbcdr;
		while (CCM_CDHIPR & CCM_CDHIPR_PERIPH_CLK_SEL_BUSY) ; // wait
	} else {
	}

	// TODO: check if PLL2 running, can 352, 396 or 528 can work? (no need for ARM PLL)

	// DIV_SELECT: 54-108 = official range 648 to 1296 in 12 MHz steps
	uint32_t div_arm = 1;
	uint32_t div_ahb = 1;
	while (frequency * div_arm * div_ahb < 648000000) {
		if (div_arm < 8) {
			div_arm = div_arm + 1;
		} else {
			if (div_ahb < 5) {
				div_ahb = div_ahb + 1;
				div_arm = 1;
			} else {
				break;
			}
		}
	}
	uint32_t mult = (frequency * div_arm * div_ahb + 6000000) / 12000000;
	if (mult > 108) mult = 108;
	if (mult < 54) mult = 54;
	frequency = mult * 12000000 / div_arm / div_ahb;

	const uint32_t arm_pll_mask = CCM_ANALOG_PLL_ARM_LOCK | CCM_ANALOG_PLL_ARM_BYPASS |
		CCM_ANALOG_PLL_ARM_ENABLE | CCM_ANALOG_PLL_ARM_POWERDOWN |
		CCM_ANALOG_PLL_ARM_DIV_SELECT_MASK;
	if ((CCM_ANALOG_PLL_ARM & arm_pll_mask) != (CCM_ANALOG_PLL_ARM_LOCK
	  | CCM_ANALOG_PLL_ARM_ENABLE | CCM_ANALOG_PLL_ARM_DIV_SELECT(mult))) {
		CCM_ANALOG_PLL_ARM = CCM_ANALOG_PLL_ARM_POWERDOWN;
		// TODO: delay needed?
		CCM_ANALOG_PLL_ARM = CCM_ANALOG_PLL_ARM_ENABLE
			| CCM_ANALOG_PLL_ARM_DIV_SELECT(mult);
		while (!(CCM_ANALOG_PLL_ARM & CCM_ANALOG_PLL_ARM_LOCK)) ; // wait for lock
	} else {
	}

	if ((CCM_CACRR & CCM_CACRR_ARM_PODF_MASK) != (div_arm - 1)) {
		CCM_CACRR = CCM_CACRR_ARM_PODF(div_arm - 1);
		while (CCM_CDHIPR & CCM_CDHIPR_ARM_PODF_BUSY) ; // wait
	}

	if ((cbcdr & CCM_CBCDR_AHB_PODF_MASK) != CCM_CBCDR_AHB_PODF(div_ahb - 1)) {
		cbcdr &= ~CCM_CBCDR_AHB_PODF_MASK;
		cbcdr |= CCM_CBCDR_AHB_PODF(div_ahb - 1);
		CCM_CBCDR = cbcdr;
		while (CCM_CDHIPR & CCM_CDHIPR_AHB_PODF_BUSY); // wait
	}

	uint32_t div_ipg = (frequency + 149999999) / 150000000;
	if (div_ipg > 4) div_ipg = 4;
	if ((cbcdr & CCM_CBCDR_IPG_PODF_MASK) != (CCM_CBCDR_IPG_PODF(div_ipg - 1))) {
		cbcdr &= ~CCM_CBCDR_IPG_PODF_MASK;
		cbcdr |= CCM_CBCDR_IPG_PODF(div_ipg - 1);
		// TODO: how to safely change IPG_PODF ??
		CCM_CBCDR = cbcdr;
	}

	//cbcdr &= ~CCM_CBCDR_PERIPH_CLK_SEL;
	//CCM_CBCDR = cbcdr;  // why does this not work at 24 MHz?
	CCM_CBCDR &= ~CCM_CBCDR_PERIPH_CLK_SEL;
	while (CCM_CDHIPR & CCM_CDHIPR_PERIPH_CLK_SEL_BUSY) ; // wait

	F_CPU_ACTUAL = frequency;
	F_BUS_ACTUAL = frequency / div_ipg;
	scale_cpu_cycles_to_microseconds = 0xFFFFFFFFu / (uint32_t)(frequency / 1000000u);

	// if voltage needs to decrease, do it after switch clock speed
	if ((dcdc & DCDC_REG3_TRG_MASK) > DCDC_REG3_TRG((voltage - 800) / 25)) {
		dcdc &= ~DCDC_REG3_TRG_MASK;
		dcdc |= DCDC_REG3_TRG((voltage - 800) / 25);
		DCDC_REG3 = dcdc;
		while (!(DCDC_REG0 & DCDC_REG0_STS_DC_OK)) ; // wait voltage settling
	}

	return frequency;
}
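
The "MCU clock effective" column and the core voltage both follow from the integer arithmetic in set_arm_clock() above. Here is a minimal host-side sketch of just that arithmetic. No registers are touched. The function names effective_arm_clock() and requested_core_millivolts() are made up for illustration, and the constants are copied from the listing:

```cpp
#include <cstdint>

// Mirrors the divider/multiplier selection of set_arm_clock(), registers omitted.
// The ARM PLL runs at mult * 12 MHz with mult limited to 54..108.
uint32_t effective_arm_clock(uint32_t frequency)
{
    uint32_t div_arm = 1;
    uint32_t div_ahb = 1;
    while (frequency * div_arm * div_ahb < 648000000u) {
        if (div_arm < 8) {
            div_arm++;
        } else if (div_ahb < 5) {
            div_ahb++;
            div_arm = 1;
        } else {
            break;
        }
    }
    // Round to the nearest 12 MHz step of the PLL.
    uint32_t mult = (frequency * div_arm * div_ahb + 6000000u) / 12000000u;
    if (mult > 108) mult = 108;
    if (mult < 54) mult = 54;
    return mult * 12000000u / div_arm / div_ahb;
}

// Mirrors the core voltage selection (millivolts) with the overclock
// defines from the listing: OVERCLOCK_STEPSIZE 28000000, OVERCLOCK_MAX_VOLT 1575.
uint32_t requested_core_millivolts(uint32_t frequency)
{
    const uint32_t step_hz = 28000000u;
    const uint32_t max_mv = 1575u;
    uint32_t voltage = 1150;
    if (frequency > 528000000u) {
        voltage = 1250;
        if (frequency > 600000000u) {
            voltage += ((frequency - 600000000u) / step_hz) * 25;
            if (voltage > max_mv) voltage = max_mv;
        }
    } else if (frequency <= 24000000u) {
        voltage = 950;
    }
    return voltage;
}
```

With these, a 620 MHz request comes out as 618 MHz at 1250 mV, 700 MHz as 696 MHz at 1325 mV, 800 MHz as 804 MHz at 1425 mV, and 1000 MHz as 996 MHz with the voltage capped at 1575 mV. So with the two OVERCLOCK_ defines set as in the listing, any request of 684 MHz or more asks for a core voltage above the 1300 mV that the comment in the listing calls the recommended limit.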

The GPIO test has two flavors:
- use the original "digitalWrite()", which is very slow (10x slower than what is possible)!
- use my own "GPIO_setOutValue()" function, which is 10x faster! (I assume this is due to the difference between running code from ITCM versus from external flash.)

Code:
/* helper function to set GPIO Output register before configuring mode */

void GPIO_setOutValue(uint8_t pin, uint8_t val)
{
	const struct digital_pin_bitband_and_config_table_struct *p;
	uint32_t mask;

	if (pin >= CORE_NUM_DIGITAL) return;
	p = digital_pin_to_info_PGM + pin;
	mask = p->mask;
	// the pin mode is not checked here: the pin is assumed to be (or to become) an output
	if (val) {
		*(p->reg + 0x21) = mask; // set register
	} else {
		*(p->reg + 0x22) = mask; // clear register
	}
}

void GPIO_testSpeed(void) {
#if 1
  /* this is 10x faster! assuming, this code runs on ITCM */
  while (1) {
    GPIO_setOutValue(32, arduino::HIGH);
    GPIO_setOutValue(32, arduino::LOW);
  }
#else
  /* this is 10x slower! assuming the function sits on external flash (and is not cached or running full speed) */
  while (1) {
    digitalWrite(32, arduino::HIGH);
    digitalWrite(32, arduino::LOW);
  }
#endif
}
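
The offsets 0x21 and 0x22 in GPIO_setOutValue() are counted in 32-bit words from the port's base register, i.e. byte offsets 0x84 and 0x88. As far as I understand the i.MX RT1060 register map, that is where the DR_SET and DR_CLEAR registers sit, so one plain store changes the pin without a read-modify-write. A host-side mock of that behavior (MockGpioPort and mock_set_out_value() are hypothetical names, nothing here touches real hardware):

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for one GPIO port's register block.
struct MockGpioPort {
    uint32_t dr = 0;  // data register: current output levels

    // Emulates a 32-bit write to the word at `word_offset` from the port base.
    void write(std::size_t word_offset, uint32_t mask) {
        if (word_offset == 0x21) {
            dr |= mask;    // DR_SET (byte offset 0x84): 1-bits set outputs
        } else if (word_offset == 0x22) {
            dr &= ~mask;   // DR_CLEAR (byte offset 0x88): 1-bits clear outputs
        }
    }
};

// Same decision as GPIO_setOutValue(), but against the mock port.
void mock_set_out_value(MockGpioPort &port, uint32_t mask, bool val) {
    port.write(val ? 0x21 : 0x22, mask);
}
```

Other pins on the same port keep their level, because only the bits given in the mask are affected.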

Conclusion
It is not worth trying to overclock. At 600 MHz the Teensy 4.1 MCU already seems to run at its limits.
(The MCU core itself might run much faster, but not if you want to keep good signals: the peripherals start to fail, including USB, flash loader, ...!)

Do not brick your board with overclocking (no way to recover!).
 
But how is the signal integrity, actually? It might be worthwhile to post images of your experiments (from a digital oscilloscope) at the various measured voltage and clock settings.

I'm not doubting your experience or what you have witnessed, but my own practical experience differs widely from what you've detailed. E.g., I run multiple Teensy 4.1s at 720 MHz and 816 MHz 24/7, all using 20+ GPIO pins operating at a minimum of 20 MHz. On one device the GPIO is running at 60 MHz, but has reached 120 MHz stable. All have two heatsinks (copper, one large with fins to draw the heat on to a smaller cover plate) attached via 3M double-sided thermal sticky tape.
 
IIRC, don't the GPIO pins also have a "drive level" configuration? Did you try increasing the drive level? Since that should directly affect the rise time of the GPIO signals, the shape of the output waveform when overclocked will very likely improve. Of course, there will still be a maximum limit, but probably not as low as you have experienced/reported.

Mark J Culross
KD5RXT
 
Indeed, overclocking involves risks, especially without cooling. Hardware rated for 600 MHz should not be expected to be stable when pushed to 900 MHz or 1 GHz.

But I would like to comment regarding some of your conclusions.

No matter what speed your program runs, the bootloader always runs at 396 MHz.

When you click Upload in Arduino IDE, entering bootloader mode depends on your code running at overclocked speed to notice the USB request to enter bootloader mode. Like everything else with overclocking, it can become unreliable. But you should not conclude Teensy is bricked only because you see the error in Arduino saying Teensy didn't respond and you need to press the pushbutton. That error is the expected behavior when a program (even without overclocking) crashes in any way where USB communication no longer works.

Expect to have to press the Program pushbutton on Teensy when things go wrong.

About the GPIO voltage conclusions, please understand your code is generating extremely short pulses! The pulse width varies with CPU speed.

Code:
  /* this is 10x faster! assuming, this code runs on ITCM */
  while (1) {
    GPIO_setOutValue(32, arduino::HIGH);
    GPIO_setOutValue(32, arduino::LOW);
  }

Even if you have the pin set to fast slew rate, when running at high speeds you may be trying to change the pin faster than the transistors inside the chip are able to actually change the voltage. They are only capable of limited current to charge & discharge the capacitance of the pin and any test gear you've connected. Overclocking the CPU doesn't change anything about the transistors which drive the GPIO pin.
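
To put rough numbers on "extremely short pulses", the toggle frequencies from the table in the first post can be converted into the time the pin is asked to stay high. This is only arithmetic and assumes a 50% duty cycle, which the test loop may not produce exactly. The helper names are made up:

```cpp
// High time of one pulse in nanoseconds for a square wave at `toggle_mhz`,
// assuming a 50% duty cycle (an assumption, not a measurement).
double pulse_width_ns(double toggle_mhz) {
    return 1000.0 / (2.0 * toggle_mhz);
}

// Number of CPU core cycles spent per full high/low period.
double core_cycles_per_period(double core_mhz, double toggle_mhz) {
    return core_mhz / toggle_mhz;
}
```

At 600 MHz core clock and 150 MHz toggling this gives about 3.3 ns of high time and 4 core cycles per period. At 900 MHz and 225 MHz it is still 4 cycles per period, but only about 2.2 ns of high time for the same output transistors.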

You should add a delay, even just several NOP instructions (enough to fill the M7's pipeline to be sure to have an effect), so the pin's voltage is able to settle. I'm pretty sure you'll see that the GPIO voltage problem you experienced is mostly due to the pins not being able to physically change voltage as fast as your overclocked code is running, and with a delay for the signal to settle you'll probably see the GPIO voltage remains fairly consistent even up to 1 GHz.
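
A minimal sketch of what such a settle delay could look like, assuming GCC or Clang inline assembly. The NOP count is an arbitrary starting point rather than a tuned value, and write_pin is a hypothetical callback standing in for GPIO_setOutValue(32, ...), so that the loop can also be tried off-target:

```cpp
#include <cstdint>

// A handful of NOPs between edges. "volatile" keeps the compiler from
// dropping them; the count (8) is a guess and may need to be larger.
static inline void settle_delay() {
    asm volatile(
        "nop\n\t" "nop\n\t" "nop\n\t" "nop\n\t"
        "nop\n\t" "nop\n\t" "nop\n\t" "nop\n\t"
        ::: "memory");
}

// Generates `pulses` high/low pulses and lets the pin settle after each edge.
template <typename WritePin>
void toggle_with_settle(WritePin write_pin, uint32_t pulses) {
    for (uint32_t i = 0; i < pulses; ++i) {
        write_pin(true);
        settle_delay();
        write_pin(false);
        settle_delay();
    }
}
```

On the Teensy this could be called as toggle_with_settle([](bool v) { GPIO_setOutValue(32, v); }, 1000); and the peak level on the scope compared with and without the delay at each core clock.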
 
Update:
I have a heat sink (copper) on the MCU. The temperature is fine (maybe 40 °C).

The drive strength (DSE) does not matter so much for speed (it is more related to the external impedance; I only have a scope with a 1 MOhm probe, i.e. no load).
The slew rate (SRE) and speed (SPEED) settings matter more for maximum speed.

BTW: the drive strength is already set to maximum (DSE(7)) in the LIB functions.

Anyway, based on the MCU datasheet: with proper SRE and SPEED config, the maximum possible GPIO speed is 200 MHz.
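
For reference, a sketch of how the three fields combine into one pad control value. The bit positions (SRE at bit 0, DSE at bits 3-5, SPEED at bits 6-7) are my reading of the i.MX RT1060 pad control register layout and are re-declared locally so the snippet compiles without Teensy headers. imxrt.h has equivalent IOMUXC_PAD_* macros, and make_pad_config() is a made-up helper:

```cpp
#include <cstdint>

// Local mirrors of the pad control fields (assumed layout, see lead-in).
constexpr uint32_t PAD_SRE_FAST = 1u << 0;                             // fast slew rate
constexpr uint32_t pad_dse(uint32_t n)   { return (n & 0x7u) << 3; }   // drive strength 0..7
constexpr uint32_t pad_speed(uint32_t n) { return (n & 0x3u) << 6; }   // speed 0..3

// Combines drive strength, speed and slew rate into one register value.
// All other fields (pull, hysteresis, open drain) are left at zero.
constexpr uint32_t make_pad_config(uint32_t dse, uint32_t speed, bool fast_slew) {
    return pad_dse(dse) | pad_speed(speed) | (fast_slew ? PAD_SRE_FAST : 0u);
}
```

On the Teensy itself the same value could, I believe, be written with *portControlRegister(32) = IOMUXC_PAD_DSE(7) | IOMUXC_PAD_SPEED(3) | IOMUXC_PAD_SRE; after pinMode(). Note that this overwrites whatever pull-up or hysteresis bits were set before.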

Here are two scope pictures:
1. 700 MHz MCU core clock, full speed GPIO: 174 MHz - but signal amplitude is degraded!
2. 620 MHz MCU core clock, full speed GPIO: 154 MHz - signal amplitude is OK

GPIO_700MHz.jpg
GPIO_620MHz.jpg

All fine, it was just a test.
(I do not really need GPIO toggling this fast; it was more a test of what happens when overclocking the MCU.)
 
I suspect you are seeing ringing in your 'scope ground wire, and/or bandwidth limitations of your 'scope, not the real voltage on the pin. These are fast logic edges that require a low-impedance probe and GHz BW 'scope to see accurately.

The amplitude accuracy of a scope input section often goes to pieces close to its cut-off frequency, as the response is peaked to extend the usable bandwidth, and a 10:1 passive probe also has anomalies at the end of its range.
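
As a rough illustration of the bandwidth point, here is the textbook first-order (single-pole) low-pass model of a scope front end. Real front ends are often peaked, as noted above, so this is only a ballpark. The bandwidth you pass in is your own scope's figure, none was stated in this thread, and single_pole_gain() is a made-up helper:

```cpp
#include <cmath>

// Fraction of a sine wave's amplitude that a single-pole low-pass with
// -3 dB bandwidth `bandwidth_mhz` still shows at `signal_mhz`.
// Simplified model: ignores probe loading, peaking and harmonics.
double single_pole_gain(double signal_mhz, double bandwidth_mhz) {
    const double ratio = signal_mhz / bandwidth_mhz;
    return 1.0 / std::sqrt(1.0 + ratio * ratio);
}
```

For example, on a hypothetical 200 MHz scope a 174 MHz fundamental would be shown at roughly 0.75 of its real amplitude in this model, and the harmonics that make up the square edges would be attenuated even more.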
 