Teensy 4.1: is a stack size of 13K enough?

danielkr

Hi all,

I am coding a little project for my model railway using some of my own code and some libraries (SPI, SD, NativeEthernet and Websocket).
Compiling the project with -Og results in the following teensy_size output:
Code:
teensy_size*: Memory Usage on Teensy 4.1
teensy_size*: FLASH: code:443968, data:41008, headers:9060   free for files:7632428
teensy_size*: RAM1: variables:52228, code:441464, padding:17288   free for local variables:13308
teensy_size*: RAM2: variables:24768  free for malloc\new:499520
The RAM1 line means that 13308 bytes are left for the stack and interrupt handling, right? That seems a little low to me, and while debugging, strange things happen (e.g. only small packets are sent via Ethernet; large ones kill the socket). Compiling with -Os reduces the code size and gives the following teensy_size output:
Code:
teensy_size*: Memory Usage on Teensy 4.1
teensy_size*: FLASH: code:277032, data:31788, headers:9100   free for files:7808544
teensy_size*: RAM1: variables:41892, code:274568, padding:20344   free for local variables:187484
teensy_size*: RAM2: variables:24768  free for malloc\new:499520
Here, I have plenty of space left and the application works without hiccups. I am totally fine with 55K of variables, but the code size is still very large.

I already moved many strings to FLASHMEM but am still struggling to reduce the code size.

Is there a way to check the stack size at runtime? Or is there a way to put code into RAM2 for debugging sessions? Or to put the stack in RAM2 instead?
I would also appreciate any other suggestions. With Visual Micro I have the option to compile my application, the libraries and the core with different optimization levels, but that is not the best idea, right?
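One common way to check stack usage at runtime is "stack painting": early in setup(), fill the unused RAM1 gap between the variables and the stack with a known pattern, then later count how much of that pattern is still intact at the bottom. The sketch below is an illustration under assumptions: the Teensy part assumes the Teensy 4.x linker script exports `_ebss` as the end of the RAM1 (DTCM) variables with the stack growing down toward it, the sentinel value and 1 KB safety margin are arbitrary, and the names `paintFreeStack` and `stackHeadroomBytes` are hypothetical. The generic helpers also build on a desktop compiler.

```cpp
#include <cstddef>
#include <cstdint>

// Sentinel used to "paint" unused stack memory (arbitrary choice; any value
// unlikely to appear in real stack data will do).
constexpr uint32_t kStackPaint = 0xDEADBEEF;

// Fill `words` 32-bit words starting at `begin` with the sentinel.
void paintRegion(uint32_t* begin, std::size_t words) {
  for (std::size_t i = 0; i < words; ++i) begin[i] = kStackPaint;
}

// The ARM stack grows downward, so the lowest addresses of the region are the
// last to be overwritten. Counting the words at the bottom that still hold the
// sentinel gives the headroom that has never been used (high-water mark).
std::size_t untouchedBytes(const uint32_t* begin, std::size_t words) {
  std::size_t i = 0;
  while (i < words && begin[i] == kStackPaint) ++i;
  return i * sizeof(uint32_t);
}

#if defined(__IMXRT1062__)
// Teensy 4.x only. Assumption: _ebss marks the end of the RAM1 variables.
extern unsigned long _ebss;
static uint32_t* paintBottom = nullptr;
static std::size_t paintWords = 0;

// Hypothetical helper: call once at the very start of setup().
void paintFreeStack() {
  uint32_t here;  // a local variable sits close to the current stack pointer
  paintBottom = reinterpret_cast<uint32_t*>(&_ebss);
  uint32_t* top = &here - 256;  // leave a 1 KB margin below the live frames
  if (top > paintBottom) {
    paintWords = static_cast<std::size_t>(top - paintBottom);
    paintRegion(paintBottom, paintWords);
  }
}

// Hypothetical helper: call later (e.g. from loop()) to get the worst-case headroom.
std::size_t stackHeadroomBytes() {
  return paintBottom ? untouchedBytes(paintBottom, paintWords) : 0;
}
#endif
```

For example, calling paintFreeStack() at the top of setup() and periodically printing stackHeadroomBytes() from loop() would show whether the headroom shrinks toward zero during the large Ethernet transfers.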

Thanks
Daniel
 
My rule of thumb is: if I am concerned, then I am concerned ;)
You may easily be fine, but sometimes it is better to be safe than sorry.

If I run into things like this, some of the steps I might take include the following. Usually I look for easy, low-hanging fruit first.

a) If you have some larger uninitialized buffers, I would consider moving them to RAM2 with DMAMEM, like:
Code:
DMAMEM uint16_t myScreenBuffer[320*240];

b) If I have large constant tables, I would mark them PROGMEM so they stay in flash memory and are not copied to RAM1 (DTCM):
Code:
static const uint8_t PROGMEM init_commands[] = {4, 0xEF, 0x03, 0x80, 0x02,
                                        4, 0xCF, 0x00, 0XC1, 0X30, 
                                        5, 0xED, 0x64, 0x03, 0X12, 0X81, 
                                        4, 0xE8, 0x85, 0x00, 0x78, 
...

c) If you can reduce the code that is loaded into RAM1 (ITCM) by enough, that memory is given to DTCM instead. This happens in chunks of 32K.
The padding of 20344 tells you that you used about 12K of the last ITCM bank. So if you mark enough code with FLASHMEM, like:
Code:
FLASHMEM void ILI9341_t3n::begin(uint32_t spi_clock, uint32_t spi_clock_read) {
Then you could gain that 32K for data and stack...
 
KurtE's "c)" matches my first thought on reading this. In the first case:
Code:
teensy_size*: FLASH: code:443968, data:41008, headers:9060   free for files:7632428
teensy_size*: RAM1: variables:52228, code:441464, padding:17288   free for local variables:13308

That "padding:17288" is going to waste: if ~15K of code were moved out of RAM1 (ITCM), a full 32K bank would be returned to RAM1 for data and stack.
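The figures in the size reports in this thread can be reproduced with simple arithmetic, assuming 512K of RAM1 where code occupies whole 32K ITCM banks and whatever code and variables leave over is free for local variables (the stack). This is a reconstruction from the posted numbers, not teensy_size's actual source; `Ram1Budget` and `ram1Budget` are hypothetical names.

```cpp
#include <cstdint>

// RAM1 on the Teensy 4.1 is 512 KB, shared between ITCM (code) and DTCM
// (variables + stack). Code is rounded up to whole 32 KB ITCM banks; the
// unused part of the last bank is what teensy_size reports as "padding".
constexpr uint32_t kRam1Bytes = 512 * 1024;
constexpr uint32_t kItcmBank = 32 * 1024;

struct Ram1Budget {
  uint32_t itcmBanks;       // 32 KB banks claimed by code
  uint32_t padding;         // unused bytes in the last ITCM bank
  uint32_t freeForLocals;   // what remains for the stack
  uint32_t codeToFreeBank;  // code to move out of RAM1 (e.g. FLASHMEM) to release one bank
};

Ram1Budget ram1Budget(uint32_t ramCode, uint32_t ramVariables) {
  Ram1Budget b{};
  b.itcmBanks = (ramCode + kItcmBank - 1) / kItcmBank;  // round up to whole banks
  b.padding = b.itcmBanks * kItcmBank - ramCode;
  b.freeForLocals = kRam1Bytes - b.itcmBanks * kItcmBank - ramVariables;
  b.codeToFreeBank = kItcmBank - b.padding;
  return b;
}
```

For the first build, ram1Budget(441464, 52228) gives 14 banks, 17288 bytes of padding, 13308 bytes free and 15480 bytes of code to move (the "~15K" above). For the -Os build, ram1Budget(274568, 41892) gives 20344 bytes of padding, 187484 bytes free and 12424 bytes to move (KurtE's "about 12K").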

Also odd: both FLASH data and RAM1 (DTCM) variables dropped by ~10K when compiling with -Os, as shown in the two size outputs in post #1.

Not sure if that points to something that could be worked on? When not using -Os, the build consumes ~20K more in data and variables combined?
 
Thanks for the idea of moving initialization code to FLASHMEM as well.
As of now, I do not have any large arrays that I could move there.
I am now close to 32K of padding, and I will find the rest, too.
Although I was a bit worried that, with added functionality, I would soon be in that area again...

So I combed through the source and replaced standard library (STL) headers that I barely used. It seems #include <sstream> eats up about 200K of RAM1 code:
Code:
teensy_size*: Memory Usage on Teensy 4.1
teensy_size*: FLASH: code:258452, data:27696, headers:8796   free for files:7831520
teensy_size*: RAM1: variables:33092, code:243512, padding:18632   free for local variables:229052
teensy_size*: RAM2: variables:24768  free for malloc\new:499520

I have no clue why: for the only .cpp file that includes it, arm-size.exe reports 5744 bytes in the text section, and after removing the include it reports only 2348. Where do the remaining ~200K come from?
Who knows what other includes out there add to the remaining code size :cool:

Thanks for your help, I learned a lot :)
 
That is a big change: free for local variables:229052, and even the FLASH code is smaller.

Assuming that is not compiled with -Os?

Feel free to leave more details on what was changed where, for others who run into a similar issue.
 
As I said, I removed the #include <sstream> that I used for a custom integer-to-std::string function and replaced it with a manual conversion. The results I shared were compiled with -Og only.
The strange thing is that, according to arm-size.exe, the file that included sstream shrinks only from 5744 to 2348 bytes. I have no clue where the 200K come from.

So, a lesson for all: remove standard library stuff if you can.
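The thread does not show the actual replacement, so the following is only a minimal sketch of a manual integer-to-std::string conversion that avoids both <sstream> and std::to_string (`intToString` is a hypothetical name). It still returns std::string, so existing callers keep working; how much code it saves on a given toolchain is an assumption worth confirming with teensy_size.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper: format a signed 32-bit integer without <sstream>
// or std::to_string, writing digits backwards into a small buffer.
std::string intToString(int32_t value) {
  char buf[12];  // "-2147483648" is 11 characters plus the terminator
  char* p = buf + sizeof(buf);
  *--p = '\0';
  // Use the magnitude as unsigned so INT32_MIN does not overflow on negation.
  uint32_t mag = value < 0 ? 0u - static_cast<uint32_t>(value)
                           : static_cast<uint32_t>(value);
  do {
    *--p = static_cast<char>('0' + mag % 10);
    mag /= 10;
  } while (mag != 0);
  if (value < 0) *--p = '-';
  return std::string(p);
}
```

Writing the digits from the end of the buffer avoids a separate reversal step, and the do/while loop makes sure 0 still produces one digit.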
 

Good note. It was the massive 200K that suggested there was more going on: that #include <sstream> must drag in a lot of baggage.

Stack issue should be resolved now :) - room to grow!
 
I just wanted to add:
When you get close to the RAM1 limit, I found that some SD card functions may no longer work as expected.
I had fatal crashes when calling SD.exists() and SD.open() with write enabled. For the latter, I dug deeper and found that
FatFile::openCachedEntry()
caused the crash (this was still with TD 1.58; not sure if it persists in the current 1.59-3).
Reducing the code size in a totally different part of the application solved the problem.
 
Can I talk you into creating a small example program that consumes 200K by using #include <sstream>?

I tried this:

Code:
#include <sstream>

void setup() {
}

void loop() {
}

and the result was only the ordinary small memory used by the USB stack and startup code.

Code:
Memory Usage on Teensy 4.1:
  FLASH: code:8932, data:3016, headers:8528   free for files:8105988
   RAM1: variables:3808, code:6240, padding:26528   free for local variables:487712
   RAM2: variables:12416  free for malloc/new:511872
 
So I made a quick example with Teensyduino 1.59, compiled for Teensy 4.1:
Code:
#define WITH_SSTREAM
#ifdef WITH_SSTREAM
#include <sstream>
#else
#include <string>
#endif
const int i = 911;
void setup() {
  Serial.begin(115200);
}
void loop() {
 
#ifdef WITH_SSTREAM
  std::string s = std::to_string(i);
#else
  std::string s("911");
#endif
  Serial.println(s.c_str());
}
With WITH_SSTREAM defined, this gives:
Code:
Memory Usage on Teensy 4.1:
  FLASH: code:71084, data:10184, headers:8840   free for files:8036356
   RAM1: variables:11072, code:68388, padding:29916   free for local variables:414912
   RAM2: variables:12416  free for malloc/new:511872
Without it:
Code:
Memory Usage on Teensy 4.1:
  FLASH: code:11156, data:4040, headers:8352   free for files:8102916
   RAM1: variables:4864, code:8464, padding:24304   free for local variables:486656
   RAM2: variables:12416  free for malloc/new:511872

So the WITH_SSTREAM build (which also calls std::to_string) adds about 60K of code and leaves about 72K less RAM1 free for local variables. Wow.
Now I know what the reason could be if RAM shrinks too much...

Have a good day
Daniel
 
If you switch the standard library from the default newlib to newlib-nano, the size difference shrinks to a few hundred bytes. Here is some info about the two versions of the standard library: https://mcuoneclipse.com/2023/01/28/which-embedded-gcc-standard-library-newlib-newlib-nano/

Switching to newlib-nano can be done by selecting "Smallest Code" as the optimization setting in the Arduino IDE. If you want or need a higher optimization level together with the smaller standard library, you can add --specs=nano.specs to your compiler options and keep the optimization flag at -O2 or -O3.
 
With this, we are down to:
With WITH_SSTREAM defined:
Code:
Memory Usage on Teensy 4.1:
  FLASH: code:9996, data:3008, headers:8496   free for files:8104964
   RAM1: variables:3808, code:7536, padding:25232   free for local variables:487712
   RAM2: variables:12416  free for malloc/new:511872
And without:
Code:
  FLASH: code:9436, data:1984, headers:9056   free for files:8105988
   RAM1: variables:2784, code:6976, padding:25792   free for local variables:488736
   RAM2: variables:12416  free for malloc/new:511872
So the difference is now about 1K of FLASH and 1K of RAM, which is acceptable, I think.
 