Potential Issue with OVERCLOCK_MAX_VOLT in clockspeed.c for Teensy 4.0

nmissa

Hi everyone,

I've encountered a concerning issue while overclocking the Teensy 4.0's CPU to 816 MHz using methods from the `clockspeed.c` file. In our application, the overclocked CPU is tasked with reading high-speed data from an ADC via the SPI bus and transmitting this data to a host computer using the Serial library. Additionally, it controls two indicator LEDs drawing approximately 2 mA each.

After operating smoothly for several hours, we've had two Teensy units overheat suddenly upon program execution. Despite the program continuing to run, the temperature escalates rapidly. If unchecked, the microcontroller overheats and eventually fails. We've been monitoring the junction temperature with the `InternalTemperature.h` library and noticed this issue seems irreversible post-failure. Post-mortem analysis also shows an unusual increase in USB current consumption, regardless of the CPU frequency, once the units fail.

Upon reviewing the MIMXRT1062DVL6B datasheet (NXP Documentation), I observed that the maximum voltage is listed as 1.3V (see Table 10), contradicting the 1.6V mentioned in the library. Although Paul's comments suggest 1.3V as the recommended limit, the OVERCLOCK_MAX_VOLT is set to 1.575V in the library. In a previous version of the file, it seems he did limit the voltage to 1.3V. I'm uncertain if the datasheet was updated after the latest library's release or if this discrepancy might be causing the observed failures.

Has anyone else experienced similar issues when overclocking Teensy 4.0, or does anyone have insights into using the clockspeed.c library effectively without risking hardware failure?

Thank you for your help.
--
Missael Garcia
 
Well, the latest datasheet (couldn't find a newer one than the one you referred to) states 1.6V:

1708031535694.png

So, strictly speaking, clockspeed.c is correct.
However, for overdrive operation the maximum is 1.3 V:

1708031824255.png

So perhaps the voltage calculation should be modified?
C++:
    // compute required voltage
    uint32_t voltage = 1150; // default = 1.15V
    if (frequency > 528000000) {
        voltage = 1250; // 1.25V
#if defined(OVERCLOCK_STEPSIZE) && defined(OVERCLOCK_MAX_VOLT)
        if (frequency > 600000000) {
            voltage += ((frequency - 600000000) / OVERCLOCK_STEPSIZE) * 25;
            if (voltage > OVERCLOCK_MAX_VOLT) voltage = OVERCLOCK_MAX_VOLT;
        }
#endif
    } else if (frequency <= 24000000) {
        voltage = 950; // 0.95
    }
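
To see what a lower cap would do, here is a minimal standalone sketch of the same arithmetic with the cap passed in as a parameter. It is not the library's code. The 28 MHz step size is an assumption, chosen because it reproduces the 1425 mV at 816 MHz reported in this thread, and the names `coreMillivolts`, `kStepSizeHz`, `kLibraryCapMv` and `kOverdriveCapMv` are hypothetical.

```cpp
#include <cstdint>

// ASSUMPTION: 28 MHz per 25 mV step. Not confirmed here; it is picked because
// it yields the 1425 mV at 816 MHz mentioned in this thread.
constexpr uint32_t kStepSizeHz = 28000000;
constexpr uint32_t kLibraryCapMv = 1575;    // OVERCLOCK_MAX_VOLT as quoted in the thread
constexpr uint32_t kOverdriveCapMv = 1300;  // overdrive maximum from the datasheet table

// Same steps as the quoted snippet, but the upper limit is an argument.
constexpr uint32_t coreMillivolts(uint32_t frequencyHz, uint32_t capMv) {
    uint32_t mv = 1150;                     // default = 1.15 V
    if (frequencyHz > 528000000) {
        mv = 1250;                          // 1.25 V
        if (frequencyHz > 600000000) {
            // One 25 mV step for every full step size above 600 MHz.
            mv += ((frequencyHz - 600000000) / kStepSizeHz) * 25;
            if (mv > capMv) mv = capMv;     // clamp to the chosen limit
        }
    } else if (frequencyHz <= 24000000) {
        mv = 950;                           // 0.95 V
    }
    return mv;
}

// Compile-time checks of the two cases discussed in the thread.
static_assert(coreMillivolts(816000000, kLibraryCapMv) == 1425, "816 MHz with the 1575 mV cap");
static_assert(coreMillivolts(816000000, kOverdriveCapMv) == 1300, "816 MHz with a 1300 mV cap");
```

Under these assumptions a 1300 mV cap is reached at 656 MHz, and every higher frequency gets the same 1300 mV. Whether the core is stable at those frequencies on 1.3 V is a separate question that this sketch does not answer.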

Paul
 
Hello Paul,

Your response is greatly appreciated. Yes, I believe the voltage calculation code needs revision. We attempted to create our own version of the snippet locally to change the voltage limit and avoid altering the library source code (for portability), but we encountered several problems. However, before addressing that issue, we are questioning whether it's even worthwhile.

We were reviewing a document titled "Product Lifetime Usage Estimate" and discovered quite discouraging results regarding both overclocking and increasing the CPU's voltage. Figures 1 and 2 show a lifetime estimate decrease (for the commercial version) from approximately 75,000 hours to around 30,000 hours when increasing the frequency from 528 MHz to 600 MHz (minimum voltage required) at 95°C. It also indicates a lifetime reduction from about 30,000 hours to 22,000 hours at 600 MHz, 95°C, when the voltage is increased by only 25 mV. These figures suggest that, from a reliability perspective, it might be better to seek alternative solutions rather than overclocking the CPU (and consequently increasing the CPU's voltage).

Additionally, considering how rapidly the lifetime decreases as a function of temperature and frequency, we are not surprised that we had a few units fail at 816 MHz (1425 mV) when reaching temperatures around 65-75°C. Furthermore, we also had a few I/Os supplying a few milliamperes to a couple of LEDs (barely below the limit under normal conditions), which may have further strained the CPU (though these ports still work even after the CPU is permanently damaged).
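
To put those hour figures in perspective, here is a small sketch that converts them to years of continuous (24/7) operation and to a percentage loss. The hour values are the approximate numbers quoted above; the helper names are hypothetical.

```cpp
// Hours in a non-leap year of continuous operation.
constexpr double kHoursPerYear = 24.0 * 365.0;

// Lifetime in years if the part runs around the clock.
constexpr double yearsOfContinuousUse(double hours) {
    return hours / kHoursPerYear;
}

// Share of the estimated lifetime lost when going from `before` to `after`.
constexpr double percentLost(double before, double after) {
    return 100.0 * (before - after) / before;
}

// 528 MHz -> 600 MHz at 95 C: roughly 75,000 h -> 30,000 h.
static_assert(percentLost(75000.0, 30000.0) > 59.9 &&
              percentLost(75000.0, 30000.0) < 60.1, "about 60 percent lost");

// 600 MHz at 95 C with +25 mV: roughly 30,000 h -> 22,000 h.
static_assert(percentLost(30000.0, 22000.0) > 26.0 &&
              percentLost(30000.0, 22000.0) < 27.0, "about 27 percent lost");
```

In round numbers that is about 8.6 years of continuous use at 528 MHz, about 3.4 years at 600 MHz, and about 2.5 years at 600 MHz with the extra 25 mV, all at 95°C.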

1708540897473.png


1708541016639.png
 
Hello,

We did not use heatsinks for the units that failed. We are now starting to attach small 6x6x4 mm copper heatsinks on top of the microcontroller (uC) since our application does not allow for fans. I am not certain about their thermal resistance coefficient, but I assume it is relatively low given the small footprint. If you have experience with heatsinks for Teensys, is there a minimum thermal resistance you would recommend?
I would also like to point out that I don't believe the heat directly caused our units to fail; rather, it was a symptom that appeared after the units had already been damaged. We have been monitoring the internal temperature using a library, and in normal operation the units never exceeded approximately 75 degrees Celsius. Their temperature curve also looked very different: about 10 C every hour in a closed system, plateauing around 75 C depending on external factors. We programmed them to decrease the uC frequency back to 600 MHz if the temperature ever rose above 85 degrees Celsius (again, using the library). This is 10 degrees below the stated maximum for the commercial version, which I believe is what Teensy 4.0 uses.

We record the internal temperature along with other parameters in a log file for each system. After analyzing these files, we noticed that once a unit failed, its CPU temperature rose much more aggressively than that of a normal unit (about 10 degrees in a couple of minutes), potentially reaching 95 C quickly and rendering the unit unusable. With a previous version of the code, we believe one of the units died completely after its rapid temperature increase went unnoticed and it kept operating above the recommended temperature. For the units that have been damaged, the temperature rises quickly even when they run at lower frequencies, so I think they are permanently damaged.
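
A minimal sketch of the throttling rule described above, extended with a rate-of-rise check so that the fast climb seen on damaged units is also caught. This is not the code that ran on the failed units. The 85 C limit and the 600 MHz fallback come from this post; the 1 C per minute slope limit is an assumption placed between the normal rate (about 10 C per hour) and the failed-unit rate (about 10 C in a couple of minutes). All names are hypothetical, and the logic is kept free of hardware calls.

```cpp
#include <cstdint>

constexpr float kThrottleTempC = 85.0f;       // from the post
constexpr float kMaxRiseCPerMin = 1.0f;       // ASSUMPTION, tune against real logs
constexpr uint32_t kOverclockHz = 816000000;  // overclocked speed used in the thread
constexpr uint32_t kFallbackHz = 600000000;   // fallback speed from the post

// One logged reading: junction temperature and a millisecond timestamp.
struct Sample {
    float tempC;
    uint32_t millis;
};

// Rate of rise between two readings in degrees C per minute.
// Returns 0 when both readings carry the same timestamp.
constexpr float riseCPerMin(Sample older, Sample newer) {
    return (newer.millis == older.millis)
               ? 0.0f
               : (newer.tempC - older.tempC) * 60000.0f /
                     static_cast<float>(newer.millis - older.millis);
}

// Pick the clock: fall back if the part is too hot or heating too fast.
constexpr uint32_t chooseClockHz(Sample older, Sample newer) {
    return (newer.tempC > kThrottleTempC ||
            riseCPerMin(older, newer) > kMaxRiseCPerMin)
               ? kFallbackHz
               : kOverclockHz;
}

// Normal unit: 10 C over one hour stays overclocked.
static_assert(chooseClockHz({60.0f, 0}, {70.0f, 3600000}) == kOverclockHz, "slow rise");
// Damaged unit: 10 C in two minutes triggers the fallback well below 85 C.
static_assert(chooseClockHz({60.0f, 0}, {70.0f, 120000}) == kFallbackHz, "fast rise");
```

On real hardware the readings would presumably come from the InternalTemperature library (I believe the call is `InternalTemperature.readTemperatureC()`) and the result would be applied with the core's `set_arm_clock()`; both should be checked against the installed versions before relying on them.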
 
There was a fair amount of temperature watching while overclocking in the Beta thread, which is also where the tempmon library got its start.

816 MHz was marked as the highest overclock speed at which a heatsink wasn't required. But if temperatures keep climbing, then heat is a problem in the way the board is being used. The smaller T_4.0 PCB is more sensitive than the T_4.1, which has more mass, surface area, and pins.

Above 816 MHz, the last temperature seen before shutdown was often under 60 C. A simple heatsink helped by a few degrees by removing or spreading the hot spot. A fan alone was maybe 2X better because it cooled the whole board, and the combination was better yet.

The MCU's temperature measurement is taken at a single point somewhere on the chip, which may not be the hottest spot or the one most likely to cause failure.

Paul's images showed a tall finned heatsink that was somewhat more effective than the Raspberry Pi copper blob used here. These were basic experiments on minimally connected units, mostly in open air. I have not seen any scientific feedback from real-world use cases.

See this post and thread for similar: https://forum.pjrc.com/index.php?threads/teensy-4-0-first-beta-test.54711/post-218911
 