Disappointing and inexplicable Teensy 4.0 high power consumption at 24 MHz compared to Teensy 3.x

Paul_M

Paul, I love your products and would appreciate a technical explanation of the following disappointing phenomenon.

Long story short: Despite supposedly having a WAY BETTER MHz/mA ratio than Teensy 3.x, my Teensy 4.0 at 24 MHz is drawing almost TWICE THE CURRENT that my Teensy 3.x draws running the same code at 24 MHz.

I got super excited for Teensy 4.0 because the 600 MHz at 100 mA promise means that, for my application and for many practical applications, I could get 24 MHz at 4 mA, assuming linear scaling.
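To make the arithmetic behind that 4 mA figure explicit, here is a minimal sketch of the linear-scaling estimate. The function name is made up for illustration, and the model assumes current is purely proportional to clock speed with no fixed overhead, which is the very assumption this thread ends up questioning.

```cpp
// Hypothetical helper (not part of any Teensy library).
// Scales a reference current linearly with clock speed.
// Assumes zero fixed overhead, i.e. current is proportional to frequency.
double linearScaledCurrentMa(double refMhz, double refMa, double targetMhz) {
    return refMa * (targetMhz / refMhz);
}
```

Calling linearScaledCurrentMa(600.0, 100.0, 24.0) reproduces the 4 mA estimate: 100 mA * 24 / 600.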

According to the NXP datasheet for the lower-end 1050 (Page 13 here), there should be no more than 7 mA usage at 24 MHz.

According to this NXP datasheet for the 1060 (Page 26 here), there should be no more than 12 mA consumption at 24 MHz.

After implementing defragster's kind instructions in this thread on how to set the Teensy 4.0 clock speed, I am observing 36 mA current draw at 24 MHz (LED on, 33 mA without).

In fact, I tested and got the following clock speed / power consumption results:

16 MHz: 36 mA
24 MHz: 36 mA
48 MHz: 41 mA
72 MHz: 44 mA
96 MHz: 53 mA
120 MHz: 53 mA
200 MHz: 54 mA
300 MHz: 63 mA
400 MHz: 65 mA
500 MHz: 68 mA
600 MHz: 83 mA

(NOTE: I was running defragster's code here. A bare empty loop results in slightly higher consumption, especially at higher clock speeds, e.g., 100 mA at 600 MHz.)
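One way to read the measurements above is to fit a straight line through them and look at the intercept, which estimates the current that does not depend on clock speed. The sketch below is an ordinary least-squares fit; the struct and function names are illustrative, and a straight line is only a rough model of these points. With the eleven measurements listed, the fit works out to roughly 0.07 mA per MHz on top of a fixed offset of roughly 39 mA.

```cpp
#include <vector>

struct Sample { double mhz; double ma; };
struct LineFit { double slopeMaPerMhz; double interceptMa; };

// Ordinary least-squares fit of: ma = intercept + slope * mhz.
LineFit fitLine(const std::vector<Sample>& samples) {
    const double n = static_cast<double>(samples.size());
    double sumX = 0.0, sumY = 0.0, sumXX = 0.0, sumXY = 0.0;
    for (const Sample& s : samples) {
        sumX += s.mhz;
        sumY += s.ma;
        sumXX += s.mhz * s.mhz;
        sumXY += s.mhz * s.ma;
    }
    const double slope = (n * sumXY - sumX * sumY) / (n * sumXX - sumX * sumX);
    const double intercept = (sumY - slope * sumX) / n;
    return {slope, intercept};
}

// The clock speed / current pairs reported in this post.
std::vector<Sample> measuredSamples() {
    return {{16.0, 36.0}, {24.0, 36.0}, {48.0, 41.0}, {72.0, 44.0},
            {96.0, 53.0}, {120.0, 53.0}, {200.0, 54.0}, {300.0, 63.0},
            {400.0, 65.0}, {500.0, 68.0}, {600.0, 83.0}};
}
```

A large intercept relative to the slope is what you would expect if most of the draw at low clock speeds comes from something other than the core clock.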

What is wrong? This result is incredibly disappointing. Virtually nobody needs 600 MHz (just buy an extremely cheap Raspberry Pi Zero if you do), but I have found Teensy 3.5/3.6 to be THE ULTIMATE best choice for any sort of battery-powered, real-time, always-on logging/monitoring application. I got my hopes up that Teensy 4.0 would raise the bar even further with optimized power consumption, but the results are far worse, not better, than Teensy 3.x.

See the graph manitou posted in 2016 here: https://forum.pjrc.com/threads/36241-NXPK66-vs-STM32L4?p=127260&viewfull=1#post127260

I can confirm those results. I have a Teensy 3.2 lying around here currently pulling 20 mA at 24 MHz running the same code that the Teensy 4.0 took a whopping 36 mA to run.

I am disappointed and I really want to stick with Teensy because I love Paul's products and, especially, the FreqMeasureMulti library. I guess worst case I will be stuck with T3.5/3.6, but I was hoping Paul would raise the bar for low-power MCU applications with Teensy 4.0.

Is there something we can do to fix this and get the 12 mA @ 24 MHz power consumption which is indicated in the user manual?

If not, the competition is finally ahead of Teensy. I can report that my Sparkfun Artemis Nano is pulling only 4 mA to 5 mA when plugged into USB. I assume the sample blink sketch it ships with runs at the regular 48 MHz at which it is advertised. 4.5 mA at 48 MHz! This blows Teensy 3.5 out of the water, which uses just under 30 mA at 48 MHz.

But I don't want to use Sparkfun Artemis Nano with zero libraries or documentation, I want to support Paul and use a Teensy product which is compatible with the FreqMeasureMulti library.

How is the Sparkfun Artemis Nano able to pull 5 mA at 48 MHz whereas Teensy 4.0 is using 36 mA to run a basic sketch at 24 MHz?

How can we fix this?

Is there a software solution, or is there some physical component on the Teensy 4.0 which is draining much more current than the processor itself needs or uses?
 
I’d guess that the T4 core might draw only a few milliamps when running at 24 MHz. But that chip contains not only the core but also a huge bunch of integrated peripherals, some of which are activated on boot to make it compatible with most Arduino libraries. Those libraries expect several UARTs, the SPI, the I2C, and so on to be magically enabled, since they have no code to enable these peripherals in a specific way. It’s as if, when you start your car, the A/C, the radio, the GPS and the lights were automatically switched on for your comfort, while (naturally) pushing the fuel consumption beyond the published “motor only” values.
Thus, to cut the power consumption down, you’d have to read through the whole T4 boot code to see which peripherals it enables that you don't need. Then your code would have to disable these in your setup() routine by clearing the corresponding clock gating flags (held in the SIM on Teensy 3.x; on the T4's i.MX RT they are in the CCM) before disabling the corresponding clock distribution and, in some cases, PLL modules.
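As a rough illustration of what "disabling a peripheral clock" looks like at the register level, here is a host-side sketch that only manipulates a plain 32-bit value. It assumes a layout of sixteen 2-bit gate fields per register, which is how I understand the i.MX RT clock gating registers (CCGR0 to CCGR7 in the CCM) to be organized. The mode encodings and all names below are assumptions to check against the reference manual, and nothing here touches real hardware.

```cpp
#include <cstdint>

// Assumed encodings for a 2-bit clock-gate field (verify against the manual).
enum class GateMode : std::uint32_t { Off = 0, RunOnly = 1, On = 3 };

// Returns regValue with the 2-bit field for gate index `gate` (0..15)
// replaced by `mode`. All other gate fields are left unchanged.
std::uint32_t withGate(std::uint32_t regValue, unsigned gate, GateMode mode) {
    const unsigned shift = (gate & 15u) * 2u;
    const std::uint32_t mask = 3u << shift;
    return (regValue & ~mask) | (static_cast<std::uint32_t>(mode) << shift);
}

// Counts how many of the 16 gate fields are not fully off.
unsigned enabledGates(std::uint32_t regValue) {
    unsigned count = 0;
    for (unsigned gate = 0; gate < 16; ++gate) {
        if (((regValue >> (gate * 2u)) & 3u) != 0u) {
            ++count;
        }
    }
    return count;
}
```

On a real board the same read-modify-write would be applied to the actual gating register, one peripheral at a time, after confirming that nothing in your sketch depends on that peripheral.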
 
Thank you for the explanation.

Do you really think that the whopping disparity between the 36 mA observed and the 12 mA expected can be explained by additional peripherals being activated on the device?

If this explanation is valid, why does the Teensy 3.2 use less than 20 mA at 24 MHz, likewise without disabling any peripherals, when the T4 is using 36 mA at 24 MHz?

That would make the Teensy 4.0's power consumption overall, apples-to-apples, much worse than Teensy 3.x when it is supposed to be markedly better.

For that reason, and because of the minimal difference in T4 power consumption between 16 MHz and 48 MHz (see my test results above), I have to respectfully disagree with your theory.

There must quite clearly be something else, and something wrong, going on, and I would like to know if Paul agrees.
 
I suspect the T4 has more peripherals compared to a T3.6 or 3.2. Plus, don't forget there's an external memory chip on the T4, and the MKL02 chip as well.
 
The MKL02 is contributing about 6mA, pretty much the same as the early days of Teensy 3.0. It's on my to-do list to fix.

My hunch is the 480 Mbit USB PHY is probably adding several mA. There's a bit in the PHY registers which can restrict it to only 12 Mbit speed. Might be worth a try?

Just to be clear about this, I want you to understand that I do care about supporting lower power at low speeds, but that's a much lower priority than many other things. I don't expect to put much engineering time into this for at least a few months. So many other things are much more urgent, especially the missing USB functionality and a couple dozen libraries not yet ported.
 
@Paul_M

I would suggest a few things:

a) The T4 code base is young. For many of us who worked on the T4 beta, getting to the absolute lowest power usage was not something we even looked at. I am sorry to say that I was more interested in how fast I could run things than how slow...

Yes, there are some who have, and I suspect that changes will be found and made that improve this over time. Remember that the 3.2, which is a slightly modified 3.1, which in turn was close to the 3.0, has been out for several years now, so it has had time to mature and for people to find ways to reduce its power requirements.

An example of the above: with T3.x boards, you can choose what CPU speed you want the board to run at. Currently the T4 is set to always start up at 600 MHz. This will probably change over time.

b) Paul is probably still up to his eyeballs trying to catch up on lots of things, like getting the third shipment of boards in and fixing how Arduino handles the terminal monitor, let alone the work I know he wants (needs) to do to rework USB Serial for full high speed and to allow USB to be used for other types of devices, like Mouse, Keyboard, Joystick, MIDI, ... Probably also a new/updated sound board for the T4 pin pattern...

c) As @Theremingenieur mentioned, there are lots of bits and pieces of the T4 which may or may not be initialized in the best way for low power usage. The good news is that all of the sources are at your fingertips.
So, for example, if you went through code like what is in startup.c and found some obvious things that could be done to reduce the power usage, I am sure @Paul would be happy to take pull requests to improve things.

There could be lots of things to look at. For example: in what state are all of the GPIO pins initialized on the T4? With T3.x, I believe you could initialize them to a mode 0, which says the pin is not active. I don't think there is an equivalent on the IMXRT, and I don't remember what they are actually configured for. But one could always experiment and see if changing these helps.

Maybe how the memory types are used and configured is bad for power usage? I think we configure all of FlexRAM (DTCM/ITCM) to be set up. Good or bad? How many timers are running? Are there ISRs that are firing too often?

Again probably lots that can be done and most likely will be done. And those of us who try to help out, appreciate all of the help we can get!
 
I had a look at page 26 as quoted but could not find this reference to 12 mA? Are you looking at a different page? Maybe a typo, what paragraph number?

On page 26, the first line says "Table 13 shows the current core consumption (not including I/O) of i.MX RT1060 processors in selected low power modes." So this table appears to be specifically excluding I/O.
 
Not to impugn anyone's efforts, but to show how fresh all of this is: the first pass to boot and function has been done, and selective revisiting by priority is underway at the same time as some first-pass elements are still being finished.

This line in set_arm_clock():
    } else if (frequency <= 24) {
should read:
    } else if (frequency <= 24000000) {

>> That buys 4 mA. Glancing at the code, it 'reads well' until you double-check that the value means Hz, not MHz. I just made a pull request for that edit. Many of the other flags and bits are far less apparent when it comes to use or side effects. One 'blessing' of the 1062 over the 1052 was higher speed GPIO; that setting probably doesn't come with the 'free lunch' package?
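To show why that one constant matters, here is a small host-side sketch of the unit mix-up. It is not the actual set_arm_clock() code: the thread does not show what the branch does, so the sketch assumes it selects a lower core voltage for slow clocks, and the millivolt values and function names are made up for illustration.

```cpp
#include <cstdint>

// Illustrative core voltages in millivolts (assumed values, not from the thread).
constexpr std::uint32_t kDefaultMv = 1150;
constexpr std::uint32_t kLowSpeedMv = 950;

// Buggy variant: the argument is in Hz, so a threshold of 24 only
// matches clocks of 24 Hz or less and the low-power branch never runs.
std::uint32_t pickVoltageBuggy(std::uint32_t frequencyHz) {
    if (frequencyHz <= 24) {
        return kLowSpeedMv;
    }
    return kDefaultMv;
}

// Fixed variant: the threshold is expressed in Hz, matching the argument.
std::uint32_t pickVoltageFixed(std::uint32_t frequencyHz) {
    if (frequencyHz <= 24000000) {
        return kLowSpeedMv;
    }
    return kDefaultMv;
}
```

With a 24 MHz request (24000000), the buggy variant falls through to the default while the fixed one takes the low-speed branch, which is the kind of difference that would be invisible when skimming the code.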

There probably are not many more software things like that 24 Hz vs. MHz mix-up, except, as Paul notes, if high speed USB can be restricted to full speed (12 Mbit) instead, which might drop power. There may be others, or Duff, in looking at the SNOOZE lib, will pick up some generally useful edits from reading about shutting down power-hungry components.

T4 uses an external flash for code/data storage. Ideally that isn't much power, but it adds something. And as Paul notes, the known 6 mA from the boot MCU can be reduced; it just needs time and attention. Otherwise there isn't much external to the MCU eating power on the Teensy.
 
In the SDK for the NXP EVKB for the 1062 there is an application that adjusts power settings for you in line with the reference power measurement document. However, porting that over is a project and a half. Essentially, going through it, it disables most of the peripheral clocks, as I mentioned before, and resets others to keep the processor going. There is a specific sequence to follow (implement) if you are going to do that in a library or a sketch. Just figured I would throw that out there. On top of that, it was built on top of an RTOS, and I don't know the interaction between the two; I didn't do a deep dive.
 
Here is the NXP PDF describing 1060 power measurements (1060 @ 600 MHz consumes 74 mA)
https://www.nxp.com/docs/en/application-note/AN12245.pdf
 
During the first T4 Beta round (1052 device), I noted that the default initialization set the "keeper" mode on the ADC input pins. That reduces idle power consumption, but it also causes a small kink near the midpoint of the ADC transfer curve if you use analog input. I've sort of lost track of the issue and I'm not sure of the current default behavior in the code for the released T4 using the 1062. The keeper mode may be turned off in at least some cases, which would increase current at least a small amount, but I don't know if that is done per pin used by the ADC, on all analog pins if the ADC is used, or on all ADC pins regardless of use. See also, for example, https://forum.pjrc.com/threads/54711-Teensy-4-0-First-Beta-Test?p=201652&viewfull=1#post201652
 
OK, I updated the CoreMark power plot. Your 24 MHz fix does reduce power consumption by about 4 mA for T4 @ 24 MHz. Thanks

Cool! Paul pulled that change into the cores GitHub repo, which tells me it was valid and will be coming out that way.

Looking at PAGE 10 of the p#13 PDF is interesting! The 1062 CAN be made to run Lower Power at 24 or 133 MHz? - but that has not been a focus so far. Just a few compromises to be considered and bits to be flipped …
 
Questions about 24MHz operation and clockspeed.c:

  1. Looking at clockspeed.c, when changing the clock speed to 24 MHz, is the BUS clock running at 12 MHz? I can't tell yet.
  2. Are we using the RC OSC 24M instead of the XTAL OSC 24M? It looks like the USB PLL has to be disabled to truly run off the RC 24 MHz OSC when changing clock speed in a sketch?
  3. According to the NXP documentation, 24 MHz is considered "Low Speed Run Mode". Which PLLs are running after reset and initialization?
  4. According to the documentation, the Analog LDO should be using "Weak Mode". I don't think this is touched in clockspeed.c?
  5. According to the documentation, something called a "Module Clock" should be off. I'm not sure what this is yet.

I'm pretty sure that the current configuration of the T4 at 24 MHz is not ideal for running at the lowest possible power consumption, if that's your design goal.
 
The Teensy 4 can run with somewhat usable USB down to 1 MHz AHB/IPG core clocks using the 24 MHz OSC as the root. The OSC gets divided down to the target frequency with the CLK2/AHB PODFs. As long as sw-PLL3 is enabled and running at 480 MHz (default), the AHB/IPG can be clocked down to 1 MHz and USB works better than I thought it would. I saw no problems running the AHB/IPG clock from the OSC at 24 MHz (CLK2/AHB divider set to 1). I need to check the current consumption, but my guess is that instead of the current approach of using the ARM PLL at 24 MHz, we can use the OSC24M and power down the ARM PLL, and still get usable USB with lower current consumption. This would also allow us to run at 1, 2, 3, 4, 6, 8, or 12 MHz for low power run modes.
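The list of low speeds follows from integer division of the 24 MHz oscillator. The sketch below enumerates which whole-MHz clocks are reachable, assuming two cascaded dividers (CLK2 and AHB) that each accept values from 1 to 8. That range is my assumption about the divider fields, and the function name is made up for illustration.

```cpp
#include <set>

// Enumerates whole-MHz clocks reachable by dividing oscMhz with two
// cascaded integer dividers, each in the range 1..maxDiv.
// The 1..8 default is an assumption about the divider field widths.
std::set<unsigned> reachableMhz(unsigned oscMhz = 24, unsigned maxDiv = 8) {
    std::set<unsigned> result;
    for (unsigned d1 = 1; d1 <= maxDiv; ++d1) {
        for (unsigned d2 = 1; d2 <= maxDiv; ++d2) {
            const unsigned total = d1 * d2;
            // Keep only combinations that divide the oscillator exactly.
            if (oscMhz % total == 0) {
                result.insert(oscMhz / total);
            }
        }
    }
    return result;
}
```

Under those assumptions the set contains 1, 2, 3, 4, 6, 8, 12 and 24 MHz, which matches the speeds listed above plus the undivided 24 MHz case.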
 
Long time no update here. A few days ago I received a Teensy 4 and was appalled at how much current it draws without any load. I searched for an answer and found this thread.
My question is: is there a summary of to-dos, so I don't have to read pages of answers?
 
You do realize that the chip has a high speed FPU, and clocks at 600MHz? CMOS power consumption goes as

N x f x C x V^2 where N represents the number of clocked gates (complexity), f the clock freq, C the gate capacitance and
V the core voltage. This is a hard limit due to physics - smaller processes can reduce C and V, but also give higher N
and f....

If you need the high performance you will have to pay in power consumption - note that the T4 can do FFTs at pretty
comparable speeds to my laptop's FPU, and only needs 500mW to do this which is far far less.
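As a worked instance of that proportionality, the sketch below computes how dynamic power changes when only clock frequency and core voltage change, with N and C held fixed. The function name is made up, the formula covers switching power only (not leakage or always-on peripherals), and any voltages you plug in are your own assumptions.

```cpp
// Relative dynamic power when frequency and core voltage change,
// using P proportional to N * f * C * V^2 with N and C held constant.
// Returns newPower / oldPower.
double dynamicPowerRatio(double fNew, double fOld, double vNew, double vOld) {
    return (fNew / fOld) * (vNew * vNew) / (vOld * vOld);
}
```

For example, dropping from 600 MHz to 24 MHz at the same voltage gives a ratio of 24/600 = 0.04. If the core voltage could also drop from an assumed 1.25 V to an assumed 0.95 V, the ratio would shrink further, to about 0.023.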
 
Mark, with respect, did you read my OP? This concerns the T4.0 at 24 MHz. The clock speed variable is controlled for a ceteris paribus comparison between T3.5 and T4.0. The point of this thread is that the NXP-published MHz/mA runtime power efficiency is superior for the T4.0 chip, but in reality the T3.5 has way better practical power/performance with all else held equal, id est, with clock speed held to 24 MHz for both.

Has there been an update on this topic? Is it really a software-based difference? The difference seems so large that it is likely to be hardware-based.

The Teensy 3.5 is such an awesome product and way more useful than T4.0, because if you really need extreme compute power you can just get a $5 RPi Zero. But Teensy 3.5 serves so many power-constrained, battery operated purposes. But as much as I love T35 I know the technology has advanced so much further. If you look at Ambiq's Apollo3 and now Apollo4 MCUs, which SparkFun have integrated into their Artemis module, it blows away T35 by an order of magnitude on the power/performance ratio.

Obviously however it doesn't have the highly useful, refined, tested software environment of Teensy. So here is my suggestion to make a special low-power high-compute-efficiency Teensy based on Ambiq Apollo4 and adopting the same unbeatably convenient form factor and software libraries that the Teensy platform offers. I cannot help but believe this much-easier-said-than-done endeavor would be a more financially valuable investment of time than redoing the same old thing with whatever NXP's next even-fancier high-speed, awful-power-efficiency MCU is going to be.
 
Perhaps ask NXP why the 1062 draws more power.

I'm pretty sure they can answer this better. PJRC does not produce the chip.
And why do you use it, if you need 24MHz only? 1062 is built for high speed, not for low power consumption. So, you just use the wrong product. You use a Ferrari to get buns.
Have you tried a Teensy LC? And did you try the things Paul suggested?
 
I was wondering if there was any update to this. I too am interested in having my Teensy 4.1 sleep with the lowest possible power draw and wake up to do something useful. I had hopes for the sleep library, but will settle for a defined sleep period before waking up.

Really, I am just hoping to see minimal/almost non existent current draw if possible.
 