malloc fails on Teensy LC

Status
Not open for further replies.

jbliesener

Well-known member
WARNING: THIS IS A VERY TECHNICAL POST, LOOKING DEEP INTO TEENSY'S RUNTIME LIBRARY

Running Teensyduino 1.41 on Arduino 1.8.1 (Arduino version shouldn't make too much of a difference, as the problem seems to be related to the ARM toolchain).

Teensy LC has 8 kB of RAM. STACK_MARGIN (the minimum amount of stack that is left) is defined as 512 bytes in mk20dx128.c.

Here's my sketch:

Code:
unsigned char staticData[1500];


void setup() {
    Serial.begin(115200);
    delay(2000);
    Serial.printf("staticData start: %08lx\n", (uint32_t) staticData);
    Serial.printf("staticData end:   %08lx\n", (uint32_t) (staticData+sizeof(staticData)));
    void* data = malloc(4);
    Serial.printf("malloc'd data:    %08lx\n", (uint32_t) data);
}


void loop() { }

Compiler settings in Arduino:

  • Board: Teensy LC
  • USB Type: RAW HID
  • CPU Speed: 48 MHz
  • Keyboard Layout: US English
  • Optimize: Fastest

The sketch should allocate 1500 bytes of static (BSS) memory on top of the runtime library's data (whose size depends on the USB Type setting) at the bottom of the 8K RAM. The stack grows down from the other end. When malloc needs more heap, the _sbrk function in mk20dx128.c is called; it fails if the new top of the heap would come within 512 bytes of the current stack pointer.

The 1500 bytes of BSS, plus 512 bytes of stack margin, plus some 4K of runtime library variables and buffers add up to well under the 8K of RAM the LC provides. Yet, when I run the sketch, malloc fails:

Code:
staticData start: 20000804
staticData end:   20000de0
malloc'd data:    00000000

I was trying to hunt this down and put some debug output through the hardware serial port into _sbrk:

Code:
void * _sbrk(int incr)
{
    char *prev, *stack;


    serial_print("Allocating ");
    serial_phex32(incr);
    serial_print(" bytes, current=");
    serial_phex32((uint32_t) __brkval);
    prev = __brkval;
    if (incr != 0) {
        __asm__ volatile("mov %0, sp" : "=r" (stack) ::);
        if (prev + incr >= stack - STACK_MARGIN) {
            serial_print(", failed, new end=");
            serial_phex32((uint32_t) (prev+incr));
            serial_print(", stack=");
            serial_phex32((uint32_t) stack);
            serial_print(", margin=");
            serial_phex32(STACK_MARGIN);
            serial_print("\n");
            errno = ENOMEM;
            return (void *)-1;
        }
        __brkval = prev + incr;
    }
    serial_print(", ok. result: ");
    serial_phex32((uint32_t) prev);
    serial_print(", new end of heap: ");
    serial_phex32((uint32_t) __brkval);
    serial_print("\n");
    return prev;
}

Here's the result:

Code:
Allocating 00000020 bytes, current=200010C4, ok. result: 200010C4, new end of heap: 200010E4
Allocating 00000F1C bytes, current=200010E4, failed, new end=20002000, stack=20001770, margin=00000200

The malloc(4) call results in two calls to _sbrk. The first one allocates a block of 32 bytes, presumably for the runtime library's internal housekeeping. The second call, though, tries to allocate a block of 3868 bytes (0xF1C), which reaches exactly to the end of the 4K page from 20001000 to 20001FFF.

Now, this is interesting, because the LC doesn't even have RAM at 20001FFF. The LC's RAM ranges from 1FFFF800 to 200017FF. This points towards a 4K page size in the runtime library's malloc implementation.
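The failing check can be replayed on a host machine with plain integer arithmetic. This is only a sketch: it copies the comparison from the _sbrk shown above, the helper names are invented for illustration, and addresses are treated as 32-bit integers.

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors the test in _sbrk above: the request fails when the new break
   would reach or cross (stack pointer - margin). */
int sbrk_would_fail(uint32_t brk, uint32_t incr, uint32_t sp, uint32_t margin)
{
    assert(sp >= margin);
    return brk + incr >= sp - margin;
}

/* Largest increment the same test would still accept (0 if none). */
uint32_t sbrk_max_incr(uint32_t brk, uint32_t sp, uint32_t margin)
{
    assert(sp >= margin);
    uint32_t limit = sp - margin;
    return (limit > brk) ? (limit - brk - 1u) : 0u;
}
```

With the logged values (break at 200010E4, stack at 20001770, margin 0x200), the largest increment that still passes is 1163 bytes, far below the 3868 bytes malloc asked for.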

Changing the size of the staticData array gets us closer to the problem. Reducing its size from 1500 to 1400 bytes doesn't change anything, but once we get to 1260 bytes, we get the following outputs:

Code:
unsigned char staticData[1260];

--- OUTPUT:

staticData start: 20000804
staticData end:   20000cf0
malloc'd data:    20000fe0

--- OUTPUT on HardwareSerial:


Allocating 00000020 bytes, current=20000FD8, ok. result: 20000FD8, new end of heap: 20000FF8
Allocating 00000008 bytes, current=20000FF8, ok. result: 20000FF8, new end of heap: 20001000

So, malloc asks _sbrk for a page that ends at exactly a 4K boundary. Watch what happens if I add just a single byte more to staticData:

Code:
unsigned char staticData[1261];

--- OUTPUT:

staticData start: 20000804
staticData end:   20000cf1
malloc'd data:    00000000




--- OUTPUT on HardwareSerial:

Allocating 00000020 bytes, current=20000FDC, ok. result: 20000FDC, new end of heap: 20000FFC
Allocating 00001004 bytes, current=20000FFC, failed, new end=20002000, stack=20001770, margin=00000200

malloc asks _sbrk for a block reaching up to the next 4K page boundary at 20002000, which lies beyond the end of RAM, so the request fails.

So, the question is: Is there anything we can do about this? Can newlib's malloc page size be reduced to, let's say, 256 or 512 bytes for the LC? It doesn't make much sense to work with 4K pages when even the RAM isn't aligned at a 4K boundary, does it?
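The three second-call sizes in the logs (0xF1C, 0x8 and 0x1004) can be reproduced with a small model. Treat this as a guess rather than newlib's actual code: it assumes an 8-byte chunk header, 8-byte alignment of user pointers, and that the second _sbrk call first aligns the new chunk and then pads the break up to the next page boundary. The function name and constants are mine.

```c
#include <assert.h>
#include <stdint.h>

#define CHUNK_HEADER 8u   /* assumed: two 4-byte size fields precede user data */
#define MALLOC_ALIGN 8u   /* assumed: user pointers are 8-byte aligned */

/* brk:        break before the first _sbrk call
   first_incr: size of the first _sbrk request (0x20 in the logs)
   page:       page size, must be a power of two
   Returns the size the model predicts for the second _sbrk request. */
uint32_t second_sbrk_request(uint32_t brk, uint32_t first_incr, uint32_t page)
{
    assert(page != 0u && (page & (page - 1u)) == 0u);

    uint32_t correction = 0u;

    /* Step 1: shift the chunk so that the user pointer is aligned. */
    uint32_t misalign = (brk + CHUNK_HEADER) & (MALLOC_ALIGN - 1u);
    if (misalign != 0u) {
        correction = MALLOC_ALIGN - misalign;
        brk += correction;
    }

    /* Step 2: pad so the break ends on a page boundary. If it already
       sits on one, a whole extra page is requested. */
    correction += page - ((brk + first_incr) & (page - 1u));
    return correction;
}
```

With a 4096-byte page this gives 0xF1C for a starting break of 200010C4, 0x8 for 20000FD8, and 0x1004 for 20000FDC, matching all three logs. Under the same model, a hypothetical 256-byte page would turn the last case into a 0x104-byte request ending at 20001100, which would pass the _sbrk check against the logged stack pointer.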
 
Looking at the memory map, I see there's a 1K memory block called "__malloc_av". That seems to be related to newlib's mallocr function, which uses, by default, a 4K page size. There's another, alternative malloc implementation in newlib, called "nano-mallocr", which provides a much smaller memory footprint. From the docs at https://sourceware.org/newlib/README:

Code:
`--enable-newlib-nano-malloc'
     NEWLIB has two implementations of malloc family's functions, one in
     `mallocr.c' and the other one in `nano-mallocr.c'.  This options
     enables the nano-malloc implementation, which is for small systems
     with very limited memory.  Note that this implementation does not
     support `--enable-malloc-debugging' any more.
     Disabled by default.

Without getting into a philosophical discussion about whether malloc()/calloc()/new should be used on embedded systems at all (I agree that it's a bad idea during runtime, but there are plenty of cases where you'd want to do it at startup), I believe that the nano-malloc implementation could benefit Teensy's runtime. Eliminating the 1K __malloc_av buffer alone would free an additional 1K of RAM for static variables.
 