Memory usage is much higher when building for teensy

Status
Not open for further replies.

Gibbedy

Well-known member
When I compile a sketch in Arduino with Teensy 3.2 as the target, the reported program and dynamic memory usage is more than 3x what I get if I select another Arduino board, like a Mega.
 
I imagine the Teensy 3.x is packing things with 32-bit alignment rather than the 8-bit alignment of the AVR chips.

It's hard to say when you don't post any code.
 
Here's an investigation I did in late 2014. It's a bit dated, but still mostly applies.

This was the LED blink example on Arduino Uno vs Teensy 3.1.

-------------------------


Teensy 3.1 also uses a little over 11k for the LED blink. Today I spent some time analyzing a disassembly of every last byte.

Turns out, 4996 bytes are unused code which the linker is including, even though it's never needed. The main culprit is the hardware serial code, which is probably due to serialEvent and how the interrupts are defined. As I recall, this problem happened on AVR at one point. Teensy 3.1's HardwareSerial code is much larger than Arduino's, due to lots of optimizations for the FIFO-based hardware. Handlers for attachInterrupt are also getting included, even though Blink doesn't use attachInterrupt. (I'm not suggesting these problems are present on Arduino Due or Maple... this 5k of wasted space is entirely my fault and unique to Teensy.)

The actual sketch and core lib code use 764 bytes. The main reason that's larger than AVR is the bigger pinout mapping table: Teensy 3.1 has twice as many pins, pointers are 32 bits instead of 16 bits, and Teensy uses a flat table rather than the space-conserving but slower 2-tier table in Arduino. The serialEvent stuff is bigger, as it would be on Arduino Mega, because 4 ports can have events, rather than just 1 on Uno. The digitalWrite function also needs some extra code to emulate the pullup resistor quirk of AVR, and pinMode has some extra work to do. Even with all those differences, 764 bytes isn't a tremendous size difference, though Uno's 600 byte total includes the overhead, which I'll describe in a moment...
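To make the flat vs. 2-tier table point concrete, here is a rough sketch of how the two layouts scale. The function names, the pin and port counts, and the exact layouts are illustrative assumptions, not the real Arduino or Teensy core tables. Pointers are modeled as fixed-width integers so the numbers come out the same on any compiler.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical 2-tier layout (AVR-like): a 1-byte port index per pin,
// plus one 16-bit register address per port. Two lookups per access.
constexpr std::size_t twoTierBytes(std::size_t pins, std::size_t ports) {
    return pins * sizeof(std::uint8_t) + ports * sizeof(std::uint16_t);
}

// Hypothetical flat layout (Teensy-like): one 32-bit register address
// per pin. A single lookup, but every pin pays for a full pointer.
constexpr std::size_t flatBytes(std::size_t pins) {
    return pins * sizeof(std::uint32_t);
}

// Illustrative numbers only: 20 pins on 3 ports vs. twice as many pins.
static_assert(twoTierBytes(20, 3) == 26, "20 index bytes + 3 x 2-byte addresses");
static_assert(flatBytes(40) == 160, "40 pins x 4-byte addresses");
```

The real tables hold more than one entry per pin, so treat these as a scaling illustration rather than a measurement.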

My USB stack uses 3218 bytes, which seems pretty reasonable.

I found 1720 bytes of "overhead". This is where ARM really differs from AVR.

Both AVR & ARM interrupt vector tables use 4 bytes per entry. However, the '328 chip has only 26 vectors. The ARM chip on Teensy 3.1 has 111 vectors (95 peripheral interrupts and 16 ARM system vectors), making the vector table over 4 times larger.
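For anyone who wants the arithmetic spelled out, here is a small sketch using only the vector counts and entry size quoted above. The constant and function names are made up for illustration.

```cpp
#include <cstddef>

// Both tables use 4 bytes per entry, per the figures above.
constexpr std::size_t kBytesPerVector = 4;

constexpr std::size_t kAvr328Vectors   = 26;       // ATmega328
constexpr std::size_t kTeensy31Vectors = 95 + 16;  // peripheral + ARM system vectors

constexpr std::size_t vectorTableBytes(std::size_t vectors) {
    return vectors * kBytesPerVector;
}

static_assert(kTeensy31Vectors == 111, "95 + 16 vectors");
static_assert(vectorTableBytes(kAvr328Vectors) == 104, "AVR table size in bytes");
static_assert(vectorTableBytes(kTeensy31Vectors) == 444, "Teensy 3.1 table size in bytes");
static_assert(vectorTableBytes(kTeensy31Vectors) > 4 * vectorTableBytes(kAvr328Vectors),
              "over 4 times larger");
```

So the table alone is 444 bytes on Teensy 3.1 against 104 bytes on the '328.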

The other difference in interrupts is dummy code. On AVR, I believe 2 bytes are used, causing unused interrupts to jump to an infinite loop. On Teensy 3.1, dummy interrupts and system exceptions use 136 bytes. On AVR, there is no detection of accesses to undefined memory and other types of system faults. ARM has these features, which necessitates at least some code for dummy handlers.

Teensy 3.1 uses 596 bytes for hardware initialization code. This is another huge difference on ARM. On AVR, the chip boots up with the crystal oscillating, most peripherals powered up, and everything pretty much ready to go. Freescale's ARM chip, and most other ARM chips, start up in a low power mode, running from an internal RC oscillator, with nearly all the chip's features turned off. The advantage is quick startup (with the caveat of a 1% accurate RC oscillator clock) and very low power consumption. The downside is that much more hardware initialization is needed before the chip is in an Arduino-like state where everything "just works" or is "ready to use".

Probably the most striking difference in startup is the clock. On AVR, a hardware mechanism waits for the crystal to start oscillating, so you're running code at 16 MHz at ~0.003% accurate speed from the very first instruction. On ARM, there's a fairly involved procedure to start the crystal and detect when it's stable, and then start the phase locked loop, which multiplies the 16 MHz crystal up to the 48 / 72 / 96 MHz internal system clock. The startup code is much longer as a result.

There's another 192 bytes for analog startup. The 10 bit ADC on AVR has minimal setup requirements. You just turn it on, and the first reading takes a little longer. The 16 bit ADC (which is really only about 12 to 13 effective bits) has a runtime calibration procedure. Because the calibration can take time, I wrote code to begin the process early, which has the advantage of little or no extra delay for the first use of analogRead(), but the trade-off is 192 bytes of always-linked code to start the ADC calibration early, even if analogRead() or other ADC code is never used.

I also found 352 bytes of complex compiler startup/overhead stuff, related to C++ constructors and other features. Some of this might be related to the USB stack, hardware serial, special constants, or other stuff. 352 bytes is quite a lot compared to AVR's program size, which leads me to believe there is some extra overhead in how the compiler or standard libraries work, but to be honest, this stuff is small and complicated, so I mostly just tallied up the bytes and skipped it.
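As a cross-check, the overhead items above can be tallied in a few lines. This sketch assumes the 444-byte vector table (111 vectors at 4 bytes each) is counted as part of the 1720 bytes of overhead. The post doesn't say so explicitly, but the items add up exactly under that assumption. All names are made up for illustration.

```cpp
#include <cstddef>

// Overhead items quoted in the post, in bytes.
constexpr std::size_t kVectorTable  = 111 * 4;  // assumed to be part of the overhead
constexpr std::size_t kDummyIsrs    = 136;      // dummy interrupt / fault handlers
constexpr std::size_t kHardwareInit = 596;      // clock and peripheral startup
constexpr std::size_t kAnalogInit   = 192;      // early ADC calibration
constexpr std::size_t kCompilerInit = 352;      // C++ constructors and similar

constexpr std::size_t kOverheadItemized =
    kVectorTable + kDummyIsrs + kHardwareInit + kAnalogInit + kCompilerInit;

// Top-level categories quoted in the post, in bytes.
constexpr std::size_t kUnusedCode   = 4996;
constexpr std::size_t kSketchAndLib = 764;
constexpr std::size_t kUsbStack     = 3218;
constexpr std::size_t kOverhead     = 1720;

constexpr std::size_t kAccountedTotal =
    kUnusedCode + kSketchAndLib + kUsbStack + kOverhead;

static_assert(kOverheadItemized == kOverhead, "items match the 1720-byte figure");
static_assert(kAccountedTotal == 10698, "sum of the four quoted categories");
```

The four categories account for 10698 bytes. Whatever lies between that and the "little over 11k" total isn't itemized in the post.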

Hopefully this long-winded writeup helps anyone wondering why ARM & AVR code size differ so much. I learned a few things today by digging through the disassembled code!
 
The USB stack part isn't explained much, but consider what happens when you compile code for Arduino Leonardo. The USB stack gets built into your program.

On boards like Arduino Uno & Mega, there's another chip with all that code!
 
Yup, and don't forget that it is a 32-bit machine, unlike the 8-bit AVR hardware. An "int", for example, needs 4 bytes instead of 2 bytes on the AVR.
That's the small price of a 32-bit system. But it can run circles around the AVR, and the 64 KB of RAM compensates for a lot.

You save space if you use int16_t instead of int.
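A quick sketch of what that saves, using a hypothetical 1000-element buffer (the name `kSamples` and the helper `bufferBytes` are made up for illustration):

```cpp
#include <cstddef>
#include <cstdint>

constexpr std::size_t kSamples = 1000;  // hypothetical buffer length

// RAM needed for a buffer of kSamples elements of type T.
template <typename T>
constexpr std::size_t bufferBytes() {
    return kSamples * sizeof(T);
}

// int16_t is 2 bytes on both AVR and Teensy 3.x.
static_assert(sizeof(std::int16_t) == 2, "fixed-width 16-bit type");
static_assert(bufferBytes<std::int16_t>() == 2000, "2 bytes per element");

// Plain int depends on the target: 2 bytes on AVR, 4 bytes on the
// 32-bit ARM in Teensy 3.x, so the same buffer doubles in size there.
static_assert(bufferBytes<int>() == kSamples * sizeof(int), "target-dependent size");
```

This only helps where the values really fit in 16 bits. For loop counters and local variables the native 32-bit int is usually the better choice on ARM.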
 