Targeting Teensy 4.1 uses 5x as much dynamic memory as Teensy 3.6

Status
Not open for further replies.

W5UXH

Member
This is only a curiosity at this point. I have been working on a project using the Teensy 3.6 and decided it would be interesting to see if it would even compile when targeting the Teensy 4.1. I switched from using TeensyDelay for the timers, to using TeensyTimerTool in order to get it to compile without error. I am using Teensyduino 1.52 / Arduino 1.8.12. The 5X increase in dynamic memory strikes me as being a bit unusual, so I am only interested in knowing if this could possibly be expected.

This is memory usage when compiled for the Teensy 4.1:

Sketch uses 114272 bytes (1%) of program storage space. Maximum is 8126464 bytes.

Global variables use 139956 bytes (26%) of dynamic memory, leaving 384332 bytes for local variables. Maximum is 524288 bytes.


This is memory usage when compiled for the Teensy 3.6:

Sketch uses 107216 bytes (10%) of program storage space. Maximum is 1048576 bytes.

Global variables use 25484 bytes (9%) of dynamic memory, leaving 236660 bytes for local variables. Maximum is 262144 bytes.

The project uses the following libraries:

Using library ILI9341_t3 at version 1.0
Using library SPI at version 1.0
Using library ADC at version 8.0
Using library SdFat-beta-master at version 2.0.0-beta.6
Using library USBHost_t36 at version 0.1
Using library TeensyTimerTool-master at version 0.1.9
Using library Encoder at version 1.4.1
Using library XPT2046_Touchscreen at version 1.3

I would like to see if I can move my project to the Teensy 4.1 without major breadboard changes. That is what prompted me to do this comparison, without knowing how practical it might be to make the change.

Thanks,

Chuck
 
With the "Blink" example it's even worse: using "Faster" (1.8.12 + 1.52)

BLINK
Code:
T3.6:
Sketch uses 10724 bytes (1%) of program storage space. Maximum is 1048576 bytes.
Global variables use 3828 bytes (1%) of dynamic memory, leaving 258316 bytes for local variables. Maximum is 262144 bytes.

T4.1:
Sketch uses 14688 bytes (0%) of program storage space. Maximum is 8126464 bytes.
Global variables use 41660 bytes (7%) of dynamic memory, leaving 482628 bytes for local variables. Maximum is 524288 bytes
 
My guess is that it is something like differences in the sizes of some buffers, such as USB. And of course you also have the other 512KB of RAM2 (DMAMEM) available to you.
 
On Teensy 4.x, the main 512K RAM1 bank is divided into ITCM and DTCM, where I=Instructions and D=Data. ITCM is used by default for functions, unless you add "FLASHMEM" to them. The FlexRAM hardware supports memory partitioning in 32K chunks, so even if you have only 1 small function in ITCM, a minimum of 32K must be allocated. Most of that memory usage Arduino is reporting is one 32K chunk dedicated to ITCM.

ITCM is basically zero wait memory with a wide 64 bit bus directly to the M7 processor. Code runs extremely fast from ITCM. Caching is never used with ITCM because it's as fast as the M7 caches, so you also get very consistent & deterministic speed.

All code not in ITCM shares a 32K cache. Usually that gives excellent performance for normal code, though cache misses are quite slow if the code is executing from the flash chip (what "FLASHMEM" does). Normally FLASHMEM is used with startup code. But if your program becomes large, you would probably want to start using FLASHMEM on functions where performance is less important.
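
As a rough illustration of the 32K granularity described above, the following sketch rounds a code size up to whole FlexRAM blocks. The helper names `itcmAllocatedBytes` and `dtcmAvailableBytes` are hypothetical, chosen only for this example; they are not part of the Teensy core, and the calculation assumes the code fits within the 512K RAM1 bank.

```cpp
#include <stdint.h>

// FlexRAM partitions RAM1 in 32K chunks, as described above.
constexpr uint32_t kFlexRamBlock = 32 * 1024;
constexpr uint32_t kRam1Size = 512 * 1024;

// Hypothetical helper: bytes of RAM1 reserved for ITCM when `codeBytes`
// of functions are placed there, rounded up to whole 32K blocks.
constexpr uint32_t itcmAllocatedBytes(uint32_t codeBytes) {
    return ((codeBytes + kFlexRamBlock - 1) / kFlexRamBlock) * kFlexRamBlock;
}

// Hypothetical helper: what remains of RAM1 for DTCM (data and stack).
// Assumes codeBytes is small enough that ITCM does not exceed RAM1.
constexpr uint32_t dtcmAvailableBytes(uint32_t codeBytes) {
    return kRam1Size - itcmAllocatedBytes(codeBytes);
}

// Even one tiny function costs a full block; one byte over costs two.
static_assert(itcmAllocatedBytes(1) == 32768, "minimum allocation is one block");
static_assert(itcmAllocatedBytes(32769) == 65536, "spills into a second block");
```

Applied to the Blink numbers quoted earlier in the thread, 41660 - 32768 = 8892, which would be consistent with one ITCM block plus roughly 8.7K of actual variables, assuming exactly one block is in use.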
 
KurtE and Paul: Thanks, sounds like it is to be expected. The 4.1 has more than enough memory for me even with this, so it is good to know nothing bad is going on. I am new to all of this, so it is great to have this forum as a resource. I will go ahead and order a 4.1 to experiment with in parallel with my several 3.6 modules while continuing to work on the project.

Thanks again.
 
Excellent explanation Paul, thanks.
To be honest there were already bits of explanations across the various 4.x topics, but this message should really be in the T4.x web pages, IMHO.
 
Teensy 4.0 is the latest Teensy, offering the fastest microcontroller and powerful peripherals in the Teensy 1.4 by 0.7 inch form factor

too picky Frank :) ... full context:
Teensy 4.0 is the latest Teensy ... in the Teensy 1.4 by 0.7 inch form factor

And to update for the T_4.1::
Teensy 4.1 is the latest Teensy ... in the Teensy 2.4 by 0.7 inch form factor

Good to see updates on the PJRC website
 
Interrupt Service Routine Syntax

To use interrupts, you must include the AVR interrupt header.
#include <avr/io.h>
#include <avr/interrupt.h>
--etc

For debug:
#include <usb_debug_only.h>

--etc
 
This product page has lots of 4.0 details that generally apply to the 4.1 as well, as far as the 1062 is concerned: pjrc.com/store/teensy40.html

It gives a larger explanation of memory on the 1062 processor

A user reads this (Teensy 4.0):
https://www.pjrc.com/store/teensy40.html
"1024K RAM (512K is tightly coupled)
2048K Flash (64K reserved for recovery & EEPROM emulation) "

And this (Teensy 3.6):
https://www.pjrc.com/teensy/techspecs.html
"256K RAM
1024K flash"

and should figure out by himself that actually there are cases where his sketches could fit the 3.6 but not the 4.x?

I humbly think this is asking too much.
 
There could be a very simple fix for this which would make the T4.x more compatible to the older models by optionally switching to a different linkage.
But this has not found much love :) I don't know why. ... which reminds me, I still have to close the PR.
 
That is why I use Makefiles for projects that do not need to be Arduino compatible, or where the word Arduino has a bad taste.
 
Since the weather is getting better now and I have other hobbies and will move, I will leave this forum again for a while.

@all: Stay healthy. Keep the distance.
Best wishes,
Frank.
 
There could be a very simple fix for this which would make the T4.x more compatible to the older models by optionally switching to a different linkage.

Indeed it would be an excellent solution.

For example, it could have been called automatically by the toolchain when the user selects "Smallest Code" (which means it has memory occupation issues).
 
Moving? A new residence - or getting out for exercise moving?

Stay well, looking forward to your next posts.
 
A new residence.

@WMXZ, why not do a step more and switch to MBED or MCUxpresso?
I'll use the time off to think about it.
 
I agree with this.
I recently ported my project (designed and printed out new PCBs, bought the new Audio Adaptor for 4.1) from T3.5 (96% Dynamic Memory) to T4.1 and found out that my brand new 4.1 doesn't have enough memory :-(
I thought that moving over to 4.1 would enable me to add more functions to my project, instead I have to remove features...
"data section exceeds available space in board".

Yes, have you asked NXP about this? What is their excuse? What do they say? :)

A bit more serious: There are several issues that may play a role.
Easiest: The printed sizes that you can see at the end of the compilation are just wrong. I'd say it would be better without them..but..ok..
For T4.x, it prints too high RAM usage, too low FLASH usage. The latter will be fixed with TD 1.54 release.

T4 copies the code from FLASH to RAM so that it executes faster.
Obviously, the larger your code is, the less usable RAM you'll have.
There are at least three ways to "fix" this:
a) Use the FLASHMEM keyword (macro), especially for functions that are not speed-sensitive.
Example: FLASHMEM void foo() {...}
This way, the function is not copied to RAM. (Note: RAM for code gets allocated in 32KB blocks.)
b) Use the heap. It is located in the second big RAM area, so use malloc() and new.
c) Modify the linker file. It's possible to link in a way that all the code stays in FLASH. I've shown such files for an earlier Teensyduino version, but it's unlikely that they still work, and they would need some attention to update them.

Edit:
And use the PROGMEM macro for const arrays:
PROGMEM const int xyz[12345] = {...
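
A minimal sketch of options a) and b) together with the PROGMEM hint, under stated assumptions: the names `sumTable`, `kLookup` and `allocateScratch` are made up for illustration. In an Arduino sketch the Teensy core already defines FLASHMEM and PROGMEM; the empty fallback definitions are only an assumption so that the snippet also compiles with a desktop compiler.

```cpp
#include <stdint.h>
#include <stdlib.h>

// Assumption: empty fallbacks for non-Teensy builds. On Teensy 4.x the
// core headers provide the real definitions and these are skipped.
#ifndef FLASHMEM
#define FLASHMEM
#endif
#ifndef PROGMEM
#define PROGMEM
#endif

// (a) Code that is not speed-sensitive: keep it in flash so it is not
// copied into the RAM reserved for code.
FLASHMEM uint32_t sumTable(const uint16_t *table, size_t count) {
    uint32_t sum = 0;
    for (size_t i = 0; i < count; ++i) {
        sum += table[i];
    }
    return sum;
}

// Constant data: mark it PROGMEM so it stays in flash.
PROGMEM const uint16_t kLookup[4] = {10, 20, 30, 40};

// (b) Large working buffers: take them from the heap, which is located
// in the second RAM area. The caller must check for nullptr and free().
uint8_t *allocateScratch(size_t bytes) {
    return static_cast<uint8_t *>(malloc(bytes));
}
```

Which functions are safe to mark FLASHMEM depends on the project; startup and rarely used code are the usual candidates mentioned in this thread.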
 
Can you show one of them? Maybe there is an easy fix. And pls. see my answer #21 to your deleted post.
 
Extending what FrankB has noted ... Referencing this page - See FLASHMEM :: pjrc.com/store/teensy40.html

Any code not marked 'FLASHMEM' is copied into the primary 512KB RAM1.

The 1062 is generally MORE powerful in ALL respects but one: it has no onboard flash directly connected to the MCU, and instead uses a somewhat slower external FLASH part for code/storage. That 'somewhat' becomes an even bigger factor when the core is running at 600 MHz rather than the 120 MHz of the T_3.5.

So preloading code into RAM1 allows faster code execution, but it takes runtime RAM from the sketch in 32KB blocks.

Commenting out code leaves it unusable. Putting FLASHMEM on that code keeps it active but stored in FLASH, where it has to be read before execution unless it is already in the code cache from prior use.

If there are rarely run or startup-only pieces of code, mark them as FLASHMEM and they will not be pulled into RAM on startup.

It looks like this in use:
Code:
FLASHMEM
bool LittleFS_Program::begin(uint32_t size)
{
...

Putting this line in platform.txt will show the memory breakdown:
Code:
teensy41.build.flags.ld=-Wl,--print-memory-usage,--gc-sections,--relax "-T{build.core.path}/imxrt1062_t41.ld"

The linked page on memory layout also describes DMAMEM in RAM2. It runs at 25% of CPU speed but gives access to another 512KB of usable RAM. Using it takes manual loading and manipulation, though.
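
A small sketch of what the "manual loading" can look like for a buffer placed with DMAMEM. The buffer name, its size and the helper functions are assumptions made for this example. As far as I understand the PJRC memory notes, DMAMEM variables are not initialized at startup, so the code clears the buffer itself. The empty fallback macro is only there so the snippet also builds off-target.

```cpp
#include <stdint.h>
#include <string.h>

// Assumption: empty fallback for non-Teensy builds. On Teensy 4.x the
// core defines DMAMEM so the variable is placed in RAM2.
#ifndef DMAMEM
#define DMAMEM
#endif

// Hypothetical 16K buffer moved out of RAM1 to free space there.
DMAMEM static uint8_t captureBuffer[16 * 1024];

// DMAMEM data gets no startup initialization, so set it up by hand,
// for example once from setup().
void initCaptureBuffer() {
    memset(captureBuffer, 0, sizeof(captureBuffer));
}

// Accessors so other code does not depend on where the buffer lives.
uint8_t *captureData() { return captureBuffer; }
size_t captureSize() { return sizeof(captureBuffer); }
```

Buffers that are large but not touched in tight loops are the natural candidates, since RAM2 is slower than the tightly coupled memory.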
 
@johan: yes, just write
Code:
PROGMEM const unsigned int bpb606bd04[10561] = {
And you'll see NXP is not guilty ;)
 