Teensy 3.6 RAM memory use very high

Status
Not open for further replies.

sixeight

I have been working for the past six years on a big project with a lot of code.
Recently I have moved from Teensy 3.6 to Teensy 4.1.

Now when I compile it, Arduino shows:
Code:
Sketch uses 307840 bytes (3%) of program storage space. Maximum is 8126464 bytes.
Global variables use 345108 bytes (65%) of dynamic memory, leaving 179180 bytes for local variables. Maximum is 524288 bytes.

How can the global variables suddenly become so large? On Teensy 3.6 it is less than 10k. Here it is larger than the program code. Where can I start looking? I use a lot of PROGMEM arrays and also a derived class structure in my code.

To get an idea of my code:
https://github.com/sixeight7/VController_v3/
 
How can the global variables suddenly become so large?

RAM is being used for code. If you install Teensyduino 1.54-beta7, the memory usage info is improved to show how the memory is actually being used. 1.53 uses Arduino's default memory summary, which can only report everything as either flash or RAM.
 
RAM is being used for code.

So technically a Teensy 3.6 can have more code than a Teensy 4? Or is there a way around it? I did read something about FLASHMEM (https://forum.pjrc.com/threads/57326-T4-0-Memory-trying-to-make-sense-of-the-different-regions), but to add that to every bit of the code seems impractical. Or does adding RAM help?

I already added DMAMEM to the largest memory buffers.

I really want to know, so I can make a future-proof product. I sell these and would not like to run out of program space.
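
For reference, a minimal sketch of what moving a large buffer to RAM2 with DMAMEM can look like. The buffer name and size are hypothetical and not taken from the VController code, and the fallback #define is only an assumption so the snippet also builds outside the Teensy core. DMAMEM variables are not initialized at startup, so the sketch clears the buffer itself.

```cpp
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// The Teensy 4.x core defines DMAMEM. This empty fallback is an assumption
// so the snippet also compiles with a desktop compiler.
#ifndef DMAMEM
#define DMAMEM
#endif

// Hypothetical buffer: the name and size are illustrative only.
constexpr size_t kPatchBufferSize = 16 * 1024;
DMAMEM static uint8_t patchBuffer[kPatchBufferSize];

// DMAMEM variables cannot carry an initializer, so write the
// initial contents explicitly before first use.
void initPatchBuffer() {
  memset(patchBuffer, 0, sizeof(patchBuffer));
}

// Small accessors so the rest of the sketch does not touch the array directly.
size_t patchBufferSize() { return sizeof(patchBuffer); }
uint8_t patchBufferAt(size_t index) { return patchBuffer[index]; }
```

Calling initPatchBuffer() once from setup() would be the usual place for this.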
 
There is a way around the appearance that "Teensy 3.6 can have more code than a Teensy 4".

See FLASHMEM here - and other memory details: pjrc.com/store/teensy41.html#memory
Code:
FLASHMEM - Functions defined with "FLASHMEM" executed directly from Flash. If the Cortex-M7 cache is not already holding a copy of the function, a delay results while the Flash memory is read into the M7's cache. FLASHMEM should be used on startup code and other functions where speed is not important.

All of the larger flash on the T4.x boards can hold code (or const data), but any code not in RAM1/ITCM is subject to longer load times whenever it isn't in the 32KB instruction cache.
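
As a rough illustration of the FLASHMEM description quoted above, a run-once startup function can be tagged so it stays in flash instead of taking up RAM1. The struct, the function names and the default values are hypothetical, and the empty fallback #define is an assumption so the snippet also builds outside the Teensy core.

```cpp
#include <stdint.h>

// The Teensy 4.x core defines FLASHMEM. This empty fallback is an assumption
// so the snippet also compiles with a desktop compiler.
#ifndef FLASHMEM
#define FLASHMEM
#endif

// Hypothetical settings block, for illustration only.
struct Settings {
  uint8_t midiChannel;
  uint8_t brightness;
};
static Settings settings;

// Runs once at startup, so speed is not important:
// keep it in flash and leave RAM1 for time-critical code.
FLASHMEM void loadDefaultSettings() {
  settings.midiChannel = 1;
  settings.brightness = 10;
}

uint8_t currentMidiChannel() { return settings.midiChannel; }
uint8_t currentBrightness() { return settings.brightness; }
```

Functions without any tag are still placed in RAM1 by default, so only the code that is tagged moves to flash.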

As noted, Teensyduino 1.54 beta 7 (and the later versions up to release) has an improved memory usage display for RAM1 [ITCM and DTCM] and RAM2 [DMAMEM], with details shown on the linked page.
 
So, how does it compare speed-wise: running code from flash on the Teensy 3.6 versus running code from flash on the Teensy 4.1? The Teensy 3.6 has a much higher bandwidth: 411 vs 66 on the Teensy 4.1. Or is that not a relevant specification in this case?
 
It really depends on how well (or poorly) the code makes use of the cache.

Teensy 3.6 has an 8K cache. Teensy 4.1 has a pair of 32K caches, but cache misses have a much larger impact on performance.

Code running from ITCM doesn't use the cache, since all 512K of RAM that can be configured as ITCM / DTCM is as fast as the 32K caches.
 
...and you can use both.
If you keep your "hot" functions in RAM (ITCM is at least 32kB anyway), they will be fast. On top of that, the additional 32kB instruction cache (Teensy 4) does a pretty good job for code running from flash.
Of course, uncached code will be slower. Just use flash for initialization, settings, etc., i.e. code that does not need to be fast. Remember, if there is a loop in there, it will be in the cache after the first pass.

I used a special linker script to keep everything in flash. Some benchmarks ran just as fast as from ITCM. No difference at all.
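
A minimal sketch of the hot/cold split described above, assuming the FASTRUN and FLASHMEM macros from the Teensy 4.x core. The function names and their logic are hypothetical, and the empty fallback #defines are only there so the snippet also builds outside the Teensy core. Untagged functions already land in RAM1 by default, so FASTRUN here mostly documents the intent.

```cpp
#include <stdint.h>

// The Teensy 4.x core defines these. The empty fallbacks are an assumption
// so the snippet also compiles with a desktop compiler.
#ifndef FASTRUN
#define FASTRUN
#endif
#ifndef FLASHMEM
#define FLASHMEM
#endif

// Hot path (hypothetical): called on every scan, so keep it in ITCM.
// Returns the switch bits that are pressed now but were not pressed before.
FASTRUN uint32_t newlyPressed(uint32_t raw, uint32_t previous) {
  return raw & ~previous;
}

// Cold path (hypothetical): only used from a settings menu, so it can
// run from flash. After the first pass the small loop is likely to be
// served from the instruction cache.
FLASHMEM void resetToDefaults(uint8_t *values, uint32_t count, uint8_t value) {
  for (uint32_t i = 0; i < count; i++) {
    values[i] = value;
  }
}
```

Whether tagging a given function makes a measurable difference depends on how well the code uses the cache, so it is worth benchmarking before and after.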
 