libstdc++ exception handling (__verbose_terminate_handler) causing bloat in output binary

jmarsh

Even though the C++ compiler is passed switches like "-fno-exceptions -fno-rtti -fno-threadsafe-statics", the copy of libstdc++ that gets linked by Teensyduino was still built with exceptions enabled and includes a bunch of cruft associated with handling them. Most of it is associated with the "__verbose_terminate_handler" function, which tries to display "friendly" information when an unhandled exception causes the program to terminate.

Here's an example of build output from a program that makes reasonable use of the C++ STL:
Code:
Memory Usage on Teensy 4.1:
  FLASH: code:106108, data:13256, headers:8632   free for files:7998468
   RAM1: variables:18208, code:102564, padding:28508   free for local variables:375008
   RAM2: variables:17728  free for malloc/new:506560

If I add this code to stub out __verbose_terminate_handler:
Code:
namespace __gnu_cxx
{
    void __verbose_terminate_handler()
    {
        while (1) asm ("WFI");
    }
}
The size of the output binary shrinks to this:
Code:
Memory Usage on Teensy 4.1:
  FLASH: code:58224, data:7112, headers:8388   free for files:8052740
   RAM1: variables:12032, code:54680, padding:10856   free for local variables:446720
   RAM2: variables:17728  free for malloc/new:506560

By default nearly half of the program consists of exception handling rubbish that should never get used. Can we please either include the stub for this function in cores somewhere, or use a copy of libstdc++ that was built with --disable-libstdcxx-verbose?
 
Wow. Just adding your stub to a current large project went from this
Code:
teensy_size: Memory Usage on Teensy 4.1:
teensy_size:   FLASH: code:419516, data:281120, headers:8992   free for files:7416836
teensy_size:    RAM1: variables:114624, code:348808, padding:11640   free for local variables:49216
teensy_size:    RAM2: variables:36288  free for malloc/new:488000
to this
Code:
teensy_size: Memory Usage on Teensy 4.1:
teensy_size:   FLASH: code:394072, data:277024, headers:8836   free for files:7446532
teensy_size:    RAM1: variables:110528, code:323364, padding:4316   free for local variables:86080
teensy_size:    RAM2: variables:36288  free for malloc/new:488000
 
I figured I may as well try this as well. Here is my program by default:

teensy_size: Memory Usage on Teensy MicroMod:
teensy_size: FLASH: code:337960, data:95104, headers:8276 free for files:16073732
teensy_size: RAM1: variables:140512, code:317912, padding:9768 free for local variables:56096
teensy_size: RAM2: variables:66112 free for malloc/new:458176

And with the stub:
teensy_size: Memory Usage on Teensy MicroMod:
teensy_size: FLASH: code:312516, data:91008, headers:9144 free for files:16102404
teensy_size: RAM1: variables:136416, code:292468, padding:2444 free for local variables:92960
teensy_size: RAM2: variables:66112 free for malloc/new:458176

Interestingly, it shrank both the variable storage and the code size. Overall this one simple fix added 36864 extra bytes of free RAM1.
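
For anyone wondering why one stub moves free RAM1 by exactly 36864 bytes: the numbers in this thread are consistent with RAM1 being 512 KB in total, with code rounded up to whole 32 KB blocks (code + padding) and the rest left for variables and stack. That layout is an assumption inferred from the reports in this thread, not something teensy_size states here, and the names below are made up for illustration. A minimal sketch:

```cpp
#include <cstdint>

// Assumed sizes: not stated in the thread, but they reproduce its reports.
constexpr std::uint32_t kRam1Total = 512 * 1024;  // total RAM1 in bytes
constexpr std::uint32_t kBlockSize = 32 * 1024;   // granularity of the code region

// Code size rounded up to whole blocks, i.e. "code" + "padding".
constexpr std::uint32_t code_region_bytes(std::uint32_t code)
{
    return (code + kBlockSize - 1) / kBlockSize * kBlockSize;
}

// The "free for local variables" figure under the assumed layout.
constexpr std::uint32_t free_ram1(std::uint32_t variables, std::uint32_t code)
{
    return kRam1Total - code_region_bytes(code) - variables;
}

// Checked against the MicroMod numbers posted above.
static_assert(free_ram1(140512, 317912) == 56096, "default build");
static_assert(free_ram1(136416, 292468) == 92960, "build with the stub");
```

Under that assumption the gain splits into 32768 bytes from needing one fewer 32 KB code block, plus 4096 bytes from the smaller variables section.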
 
Interestingly, I don't notice a difference in sizes when I build with the "smallest code" option.
 
@shawn: the "smallest code" option uses the nano version of newlib, which is optimized for embedded systems. You can switch to it using the compiler switch --specs=nano.specs. Looks like newlib-nano doesn't have the exception stuff compiled in?
 
Right, there's "libstdc++.a" which is 4879KB (TD 1.59) and "libstdc++_nano.a" which is 3042KB.
But using the Arduino IDE, the only way to use it is to set optimization to "smallest code" instead of using one of the other levels - I don't believe that should be necessary just to avoid linking in a bunch of unused data.
We should either be able to use exceptions, or libstdc++ should be built properly without support for them.
 
Not to mention that the nano version of newlib also excludes a bunch of other useful things, like %f formatting of floats etc
 
There probably are good reasons to use newlib instead of newlib-nano by default. I'm just wondering because ARM developed it for use in embedded processors, as opposed to newlib, which is meant for Linux. The missing float printf / scanf can be activated by adding `-u _printf_float` and `-u _scanf_float`.
Does anyone know of something not working or working slower etc when using newlib-nano (smallest-code option)? See also this thread: https://forum.pjrc.com/index.php?threads/using-nano-lib-instead-of-new-lib.68376/

 
Hi @luni
I switched to newlib-nano and enabled _printf_float and _scanf_float. The difference is enormous

Before:
Code:
teensy_size: Memory Usage on Teensy 4.1:
teensy_size:   FLASH: code:419516, data:281120, headers:8992   free for files:7416836
teensy_size:    RAM1: variables:114624, code:348808, padding:11640   free for local variables:49216
teensy_size:    RAM2: variables:36288  free for malloc/new:488000

After:
Code:
teensy_size: Memory Usage on Teensy 4.1:
teensy_size:   FLASH: code:361416, data:247328, headers:8724   free for files:7508996
teensy_size:    RAM1: variables:80352, code:290712, padding:4200   free for local variables:149024
teensy_size:    RAM2: variables:36288  free for malloc/new:488000

Note the 100K increase in free RAM1!!!

I'll upload the hex file to my bike and go for a runaround to confirm but initially everything seems to be working just fine...
 
@luni
been following this thread and wondering maybe a PR to the teensy 4 core is in order. When @PaulStoffregen gets back into it might be good to incorporate. Remember seeing lots of posts on issues with code size.

did see this about float formatting
One of the space-saving techniques that nanolib uses is to include a trimmed-down version of printf that does not support floating point printout. If you need to print out floating point numbers (e.g. %f and %g format), you need to enable that explicitly.
or is that covered when you enable printf?

Mike
 
Changing the C library would obviously require a lot of testing. And I still believe that Paul had a good reason for using newlib in the first place. So I don't think this will happen soon (if ever). IMHO, the workaround suggested by @jmarsh would be much less intrusive and has about the same memory savings, as @shawn pointed out. I'd vote for it.

Regarding the printf of floats: yes, this is activated with the -u _printf_float compiler switch. You can also use asm(".global _printf_float") in setup(), which has the same effect and is useful if you compile for smallest-code, need printf-float and don't want to (or can't) change compiler switches.
 
This is just a follow up. @KurtE and I have been playing with cameras using CSI/Flexio/DMA/ILI displays. When I use the latest sketch we are playing with I am seeing crashes when printf/scanf is not enabled. Probably because we use printf's all over the place.


Code:
 --specs=nano.specs -u _printf_float -u _scanf_float
 
 teensy_size:   FLASH: code:94740, data:27096, headers:8208   free for files:7996420
teensy_size:    RAM1: variables:350752, code:88008, padding:10296   free for local variables:75232
teensy_size:    RAM2: variables:319648  free for malloc/new:204640

==========  smallest : crashes camera code
teensy_size:   FLASH: code:60908, data:24016, headers:8256   free for files:8033284
teensy_size:    RAM1: variables:347232, code:55280, padding:10256   free for local variables:111520
teensy_size:    RAM2: variables:319648  free for malloc/new:204640

========= 02 nano.specs : crashes camera code
teensy_size:   FLASH: code:72028, data:24024, headers:8392   free for files:8022020
teensy_size:    RAM1: variables:347232, code:65296, padding:240   free for local variables:111520
teensy_size:    RAM2: variables:319648  free for malloc/new:204640

= 02 with nano and printf/scanf
teensy_size: Memory Usage on Teensy 4.1:
teensy_size:   FLASH: code:88364, data:26072, headers:8440   free for files:8003588
teensy_size:    RAM1: variables:349280, code:81632, padding:16672   free for local variables:76704
teensy_size:    RAM2: variables:319648  free for malloc/new:204640

and if I just use -u _printf_float
Code:
teensy_size:   FLASH: code:80636, data:26072, headers:9000   free for files:8010756
teensy_size:    RAM1: variables:349280, code:73904, padding:24400   free for local variables:76704
teensy_size:    RAM2: variables:319648  free for malloc/new:204640

It does save some space.
 
Changing the C library would obviously require a lot of testing. And I still believe that Paul had a good reason for using newlib in the first place. So I don't think this will happen soon (if ever). IMHO, the workaround suggested by @jmarsh would be much less intrusive and has about the same memory savings, as @shawn pointed out. I'd vote for it.

Regarding the printf of floats: yes, this is activated with the -u _printf_float compiler switch. You can also use asm(".global _printf_float") in setup(), which has the same effect and is useful if you compile for smallest-code, need printf-float and don't want to (or can't) change compiler switches.
Yep you are probably right with that - sometimes I forget when I get overly optimistic :)

Anyway, with @jmarsh's suggestion, if I add that code snippet to the beginning of the same sketch (using Faster) I am not seeing any space savings:
Code:
teensy_size: Memory Usage on Teensy 4.1:
teensy_size:   FLASH: code:94740, data:27096, headers:8208   free for files:7996420
teensy_size:    RAM1: variables:350752, code:88008, padding:10296   free for local variables:75232
teensy_size:    RAM2: variables:319648  free for malloc/new:204640

vs with the code added to the sketch. Or am I putting it in the wrong spot?

Code:
teensy_size:   FLASH: code:94740, data:27096, headers:8208   free for files:7996420
teensy_size:    RAM1: variables:350752, code:88008, padding:10296   free for local variables:75232
teensy_size:    RAM2: variables:319648  free for malloc/new:204640
 
If you don't use a lot of C++ code, particularly STL stuff that can (by definition) throw exceptions, the problematic code probably isn't being linked in to begin with. But as soon as you start to use basic things like std::string or std::vector that can throw, the size blows up.
There's a small example I posted to demonstrate the missing new overrides, but in that case it's the call to new that is specifically triggering it.
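
As a rough desktop illustration of that point (built with exceptions enabled, unlike a Teensy build), the sketch below shows that the throw sites live inside the library: the calling code never writes `throw`, yet the containers still report errors by throwing. The function names are hypothetical.

```cpp
#include <stdexcept>
#include <string>
#include <vector>

// Returns true if std::vector::at() reported a bad index by throwing.
bool vector_at_throws()
{
    std::vector<int> v{1, 2, 3};
    try {
        (void)v.at(10);  // index past the end; the bounds check is in library code
    } catch (const std::out_of_range&) {
        return true;
    }
    return false;
}

// Returns true if std::string::substr() reported a bad position by throwing.
bool string_substr_throws()
{
    std::string s = "abc";
    try {
        (void)s.substr(10);  // start position past the end of the string
    } catch (const std::out_of_range&) {
        return true;
    }
    return false;
}
```

Compiling your own code with -fno-exceptions does not remove those paths from a prebuilt libstdc++, which is presumably why the terminate handler and everything it depends on still gets linked.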

TBH the only reason I didn't send a PR for this is because I'm not sure where the stub should live. It's c++ so needs to be in a .cpp file, which makes either main.cpp or new.cpp the likely choices - I'd be inclined to keep main.cpp clean (to allow main() to be replaced without symbol clashes), put it in new.cpp and rename it to something like cxx_stubs.cpp.
Also the stub should just call abort() (or maybe unused_interrupt_vector?) rather than spinning forever.
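
A minimal sketch of that variant, assuming abort() is acceptable as the failure behaviour on your board. It is untested on hardware here, and where the definition should live in cores is still the open question:

```cpp
#include <cstdlib>

// Sketch only: replaces libstdc++'s verbose terminate handler so the linker
// does not need to pull in the library's own definition and its dependencies.
namespace __gnu_cxx
{
    void __verbose_terminate_handler()
    {
        std::abort();  // stop immediately instead of formatting a message
    }
}
```

It has to be compiled as C++ so the symbol gets the same mangled name as the library's version.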
 
I'll upload the hex file to my bike and go for a runaround to confirm but initially everything seems to be working just fine...
No problems after some spirited testing.

I had some questions offline why the free RAM1 increase from 49K to 149K was such a big deal for me. I have some code paths in my application that *require* 42K of stack, so the effective free headroom increases from 7K to 107K which means I have a lot more space to keep adding new features now.

I found some benchmarks while googling that indicated that newlib-nano was much slower for things like memcpy(), but I was not able to reproduce them.
 
Does anyone know of something not working or working slower etc when using newlib-nano (smallest-code option)?
I've heard on the grapevine that newlib-nano also cannot printf 64-bit integers (long longs in the Teensy world), is anyone able to confirm/deny that?
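
One way to check this on whatever C library a build actually links is to format known values into a buffer and compare against the expected text. This is only a sketch with hypothetical function names; no results for newlib-nano are claimed here.

```cpp
#include <cstddef>
#include <cstdio>
#include <cstring>

// True if "%llu" formats a 64-bit value correctly.
bool printf_supports_long_long()
{
    char buf[32];
    std::snprintf(buf, sizeof buf, "%llu", 12345678901234567890ULL);
    return std::strcmp(buf, "12345678901234567890") == 0;
}

// True if the C99 "%zu" conversion for size_t is available.
bool printf_supports_size_t()
{
    char buf[32];
    std::snprintf(buf, sizeof buf, "%zu", static_cast<std::size_t>(1234));
    return std::strcmp(buf, "1234") == 0;
}

// True if floating point formatting is linked in.
bool printf_supports_float()
{
    char buf[32];
    std::snprintf(buf, sizeof buf, "%.2f", 1.5);
    return std::strcmp(buf, "1.50") == 0;
}
```

Printing the three results from setup() under each optimization setting would show which conversions that particular library build supports.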
 
I too would really like the C99 formats, so I may investigate building my own newlib-nano…
 
I too would really like the C99 formats, so I may investigate building my own newlib-nano…

I'm keen to hear how it goes. Been considering making newlib-nano the default, but losing float printing and other things could be too painful.
 
I'm keen to hear how it goes. Been considering making newlib-nano the default, but losing float printing and other things could be too painful.

So I've been playing with building newlib-nano, and I can build successfully, but my output is on the order of about 5 times larger than what's in the Teensy package. Can you share the script you're using to build the toolchain? Basically, you can add "--enable-newlib-io-long-long" and "--enable-newlib-io-c99-formats" to the configure options for newlib-nano, and it'll add all the well-known printf format conversion specifiers such as "%zu" and "%llu".
 
Can you share the script you're using to build the toolchain?

I've only built the toolchain a few times over Teensy's entire 16 year history. Every time the process has been different. I've tried to use the scripts ARM publishes. On the most recent version, I didn't build it at all for Windows, MacOS and Linux x86-64. I just used the copies ARM published.

Here's the info I have from the most recent build. This should give a build more or less the same as ARM creates.

Code:
hidden gotchas:

sudo apt install libncurses5-dev
sudo apt install autoconf automake libtool
sudo apt install python2.7-dev
if /usr/bin/python does not exist: sudo ln -s /usr/bin/python2.7 /usr/bin/python

cannot build within a VirtualBox shared folder
because git-new-workdir needs to create a symbolic link
https://www.speich.net/articles/en/2018/12/24/virtualbox-6-how-to-enable-symlinks-in-a-linux-guest-os/



The example below shows how to build arm-gnu-toolchain-arm-none-eabi from sources using Linaro ABE build system.
Instructions

1. Install the dependencies. ABE has a dependency on git-new-workdir and needs this tool to be installed in /usr/local/bin directory:

  wget https://raw.githubusercontent.com/git/git/master/contrib/workdir/git-new-workdir
  sudo mv git-new-workdir /usr/local/bin
  sudo chmod +x /usr/local/bin/git-new-workdir


2. Clone ABE from the URL below and checkout the stable branch (see Getting ABE):

  git clone https://git.linaro.org/toolchain/abe.git


3. Create the build directory and change to it:

  mkdir build && cd build


4. Configure ABE (from the build directory):

  ../abe/configure


5. Download the toolchain manifest file:

Download the toolchain manifest file arm-gnu-toolchain-arm-none-eabi-abe-manifest.txt from https://developer.arm.com/tools-and-software/open-source-software/developer-tools/gnu-toolchain/downloads, into the build folder:

   wget https://developer.arm.com/-/media/Files/downloads/gnu/11.3.rel1/manifest/arm-gnu-toolchain-arm-none-eabi-abe-manifest.txt


6. Build toolchain (from the build directory):

  ../abe/abe.sh --manifest arm-gnu-toolchain-arm-none-eabi-abe-manifest.txt --build all >& log &

  ../abe/abe.sh --manifest arm-gnu-toolchain-arm-none-eabi-abe-manifest.txt --build all


7. To build toolchain with newlib-nano configuration move out of build directory and create the build_newlib directory and change to it:

  cd .. && mkdir build_newlib && cd build_newlib


8. Clone ABE from the URL below and checkout the stable branch (see Getting ABE):

  git clone https://git.linaro.org/toolchain/abe.git


9. Configure ABE (from the build_newlib directory):

  ./abe/configure


10. Download the toolchain manifest file:

Download the toolchain manifest file arm-gnu-toolchain-arm-none-eabi-nano-abe-manifest.txt from https://developer.arm.com/tools-and-software/open-source-software/developer-tools/gnu-toolchain/downloads, into the build_newlib folder:

   wget https://developer.arm.com/-/media/Files/downloads/gnu/11.3.rel1/manifest/arm-gnu-toolchain-arm-none-eabi-nano-abe-manifest.txt


11. Build toolchain (from the build_newlib directory):

  abe/abe.sh --manifest arm-gnu-toolchain-arm-none-eabi-nano-abe-manifest.txt --build all >& log_nano

  abe/abe.sh --manifest arm-gnu-toolchain-arm-none-eabi-nano-abe-manifest.txt --build all


12. Move out of build_newlib directory and download the copy_nano_libraries.sh script:

Download the copy_nano_libraries.sh script from https://developer.arm.com/tools-and-software/open-source-software/developer-tools/gnu-toolchain/downloads, to the folder above build_newlib directory:

  cd .. && wget https://developer.arm.com/-/media/Files/downloads/gnu/11.3.rel1/manifest/copy_nano_libraries.sh
  chmod 755 copy_nano_libraries.sh


12b: need to edit "host=" in copy_nano_libraries.sh

  grep build= build/host.conf | awk -F= '{print $2}'


13a: Create backup copy of compiled toolchains, in case the following steps go wrong

  tar -cf compiled.tar build build_newlib
  xz compiled.tar &
  disown


13. Copy the newlib nano header and newlib nano libraries built in build_newlib folder to build folder and change to build folder:

  ./copy_nano_libraries.sh && cd build

The built arm-none-eabi toolchain will be installed and available for use in the builds/destdir/x86_64-pc-linux-gnu/bin/ directory.


14. Create tar of built toolchain

  cd builds/destdir
  tar -cJf /tmp/arm-gnu-toolchain-11.3.rel1-i686-pc-linux-ubuntu18-arm-none-eabi.tar.xz i686-pc-linux-gnu

  tar -cvJf ~/toolchain/arm-gnu-toolchain-11.3.rel1-armv7l-unknown-linux-gnueabihf-arm-none-eabi.tar.xz armv7l-unknown-linux-gnueabihf
  tar -cvJf ~/toolchain/arm-gnu-toolchain-11.3.rel1-aarch64-unknown-linux-gnu-arm-none-eabi.tar.xz aarch64-unknown-linux-gnu
  tar -cvf ~/teensy/armtoolchain/linux32_ubuntu16/arm-gnu-toolchain-11.3.rel1-i686-pc-linux-gnu-arm-none-eabi.tar i686-pc-linux-gnu



armv7l-unknown-linux-gnueabihf


I also used a cleanup / pruning script which deletes a lot of dead weight. It has 2 parts, most of the real work is in "mk_generic.sh". Several other scripts just run it with different file names. This pruning process is not necessary. Its only purpose is to delete files we don't need for Teensy, so the final files are smaller to download and quicker to extract (especially on Windows where the OS has huge filesystem overhead for handling so many tiny files).
 

Attachments

  • toolchain_prune_scripts.zip
    3.5 KB · Views: 30