jbliesener
WARNING: THIS IS A VERY TECHNICAL POST, LOOKING DEEP INTO TEENSY'S RUNTIME LIBRARY
Running Teensyduino 1.41 on Arduino 1.8.1 (Arduino version shouldn't make too much of a difference, as the problem seems to be related to the ARM toolchain).
Teensy LC has 8 kB of RAM. STACK_MARGIN (the minimum amount of stack that is left) is defined as 512 bytes in mk20dx128.c.
Here's my sketch:
Code:
unsigned char staticData[1500];

void setup() {
    Serial.begin(115200);
    delay(2000);
    Serial.printf("staticData start: %08lx\n", (uint32_t) staticData);
    Serial.printf("staticData end: %08lx\n", (uint32_t) (staticData+sizeof(staticData)));
    void* data = malloc(4);
    Serial.printf("malloc'd data: %08lx\n", (uint32_t) data);
}

void loop() { }
Compiler settings in Arduino:
- Board: Teensy LC
- USB Type: RAW HID
- CPU Speed: 48 MHz
- Keyboard Layout: US English
- Optimize: Fastest
The sketch should allocate 1500 bytes of static (BSS) memory on top of the runtime library data (whose size depends on the USB Type setting) at the bottom of the 8K RAM. The stack grows down from the other end. When malloc needs more memory, the _sbrk function in mk20dx128.c is called; it fails if the new top of the heap would come within 512 bytes (STACK_MARGIN) of the current stack pointer.
Now, when I run the sketch, the 1500 bytes of BSS, plus 512 bytes of stack margin, plus some 4K of runtime library variables and buffers add up to roughly 6K, well under the 8K of RAM the LC provides. Yet, malloc fails:
Code:
staticData start: 20000804
staticData end: 20000de0
malloc'd data: 00000000
To hunt this down, I added some debug output to _sbrk, sent through the hardware serial port:
Code:
void * _sbrk(int incr)
{
    char *prev, *stack;

    serial_print("Allocating ");
    serial_phex32(incr);
    serial_print(" bytes, current=");
    serial_phex32((uint32_t) __brkval);
    prev = __brkval;
    if (incr != 0) {
        // read the current stack pointer
        __asm__ volatile("mov %0, sp" : "=r" (stack) ::);
        // refuse if the new heap end would reach into the stack margin
        if (prev + incr >= stack - STACK_MARGIN) {
            serial_print(", failed, new end=");
            serial_phex32((uint32_t) (prev+incr));
            serial_print(", stack=");
            serial_phex32((uint32_t) stack);   // cast added: serial_phex32 takes a uint32_t
            serial_print(", margin=");
            serial_phex32(STACK_MARGIN);
            serial_print("\n");
            errno = ENOMEM;
            return (void *)-1;
        }
        __brkval = prev + incr;
    }
    serial_print(", ok. result: ");
    serial_phex32((uint32_t) prev);            // cast added
    serial_print(", new end of heap: ");
    serial_phex32((uint32_t) __brkval);        // cast added
    serial_print("\n");
    return prev;
}
Here's the result:
Code:
Allocating 00000020 bytes, current=200010C4, ok. result: 200010C4, new end of heap: 200010E4
Allocating 00000F1C bytes, current=200010E4, failed, new end=20002000, stack=20001770, margin=00000200
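To see how far off the failing request is, the comparison from _sbrk can be replayed on plain integers with the logged values. This is a minimal sketch: the function names are hypothetical, it only models non-zero increments, and it assumes the stack pointer was the same (20001770) during the first call, which the log does not actually show.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same test as in _sbrk above: the request fails when the new heap end
   reaches (stack pointer - margin). */
static bool sbrk_would_succeed(uint32_t brk, uint32_t incr,
                               uint32_t sp, uint32_t margin)
{
    assert(sp >= margin);
    return brk + incr < sp - margin;
}

/* Largest increment that would still pass the test, or 0 if none. */
static uint32_t largest_sbrk_request(uint32_t brk, uint32_t sp, uint32_t margin)
{
    assert(sp >= margin);
    uint32_t limit = sp - margin;
    return limit > brk ? limit - brk - 1u : 0u;
}
```

With the values from the log, largest_sbrk_request(0x200010E4, 0x20001770, 0x200) gives 1163 bytes. So there is more than 1 kB of usable heap left at the point where malloc asks for 3868 bytes.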
The malloc(4) call results in two calls to _sbrk. The first one allocates a block of 32 bytes, presumably for the runtime library's internal housekeeping. The second call, though, tries to allocate a block of 3868 bytes (0xF1C), which reaches exactly to the end of the 4K page from 20001000 to 20001FFF.
Now, this is interesting, because the LC doesn't even have RAM at 20001FFF. The LC's RAM ranges from 1FFFF800 to 200017FF. This points towards a 4K page size in the runtime library's malloc implementation.
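The 3868-byte figure is simply the distance from the current end of the heap to the next 4K boundary. A small sketch of that calculation follows; the helper name is hypothetical, and the 4096-byte page size is the assumption being tested here, not something read from the library source.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: bytes from addr up to the next multiple of
   page_size (0 if addr is already on a boundary).
   page_size must be a power of two. */
static uint32_t bytes_to_next_page(uint32_t addr, uint32_t page_size)
{
    assert(page_size != 0u && (page_size & (page_size - 1u)) == 0u);

    uint32_t offset_in_page = addr & (page_size - 1u);
    return (page_size - offset_in_page) & (page_size - 1u);
}
```

bytes_to_next_page(0x200010E4, 4096) yields 0xF1C, matching the second log line. The same helper shows that the end of the LC's RAM (20001800) sits 0x800 bytes short of a 4K boundary.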
Changing the size of the staticData array gets us closer to the problem. Reducing its size from 1500 to 1400 bytes doesn't change anything, but once we get to 1260 bytes, we get the following outputs:
Code:
unsigned char staticData[1260];
--- OUTPUT:
staticData start: 20000804
staticData end: 20000cf0
malloc'd data: 20000fe0
-- OUTPUT on HardwareSerial:
Allocating 00000020 bytes, current=20000FD8, ok. result: 20000FD8, new end of heap: 20000FF8
Allocating 00000008 bytes, current=20000FF8, ok. result: 20000FF8, new end of heap: 20001000
So, malloc asks _sbrk for a block that ends exactly at a 4K boundary. Watch what happens if I add just a single byte to staticData:
Code:
unsigned char staticData[1261];
--- OUTPUT:
staticData start: 20000804
staticData end: 20000cf1
malloc'd data: 00000000
-- OUTPUT on HardwareSerial:
Allocating 00000020 bytes, current=20000FDC, ok. result: 20000FDC, new end of heap: 20000FFC
Allocating 00001004 bytes, current=20000FFC, failed, new end=20002000, stack=20001770, margin=00000200
malloc asks _sbrk for a block that extends to the end of the next 4K page (20002000), which obviously fails because that address is beyond the LC's RAM.
So, the question is: Is there anything we can do about this? Can newlib's malloc page size be reduced to, let's say, 256 or 512 bytes for the LC? Doesn't make too much sense to work with 4K pages when even RAM isn't aligned at a 4K boundary, does it?
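All three logged second requests (0xF1C, 0x8 and 0x1004) can be reproduced by one small model. To be clear, this is a guess fitted to the logs, not code taken from the library: it assumes a 4096-byte page, 8-byte alignment of returned pointers and an 8-byte header in front of the user data, and the names are hypothetical. It resembles the way dlmalloc-style allocators extend the top of the heap, but it has not been checked against the malloc that ships with the toolchain.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE   4096u  /* assumed malloc page size */
#define MODEL_ALIGNMENT   8u     /* assumed alignment of returned pointers */
#define MODEL_HEADER_SIZE 8u     /* assumed bookkeeping bytes before user data */

/* Size of the second _sbrk request, given the address returned by the
   first request (first_brk) and the size of that first request. */
static uint32_t model_second_request(uint32_t first_brk, uint32_t first_size)
{
    assert((MODEL_PAGE_SIZE & (MODEL_PAGE_SIZE - 1u)) == 0u);

    /* Step 1: bytes needed so that the first user pointer is aligned. */
    uint32_t user_ptr = first_brk + MODEL_HEADER_SIZE;
    uint32_t misalign = user_ptr & (MODEL_ALIGNMENT - 1u);
    uint32_t correction = misalign ? MODEL_ALIGNMENT - misalign : 0u;

    /* Step 2: pad the end of the block up to the next page boundary.
       If the end already sits on a boundary, this adds a whole page. */
    uint32_t end = first_brk + correction + first_size;
    correction += MODEL_PAGE_SIZE - (end & (MODEL_PAGE_SIZE - 1u));

    return correction;
}
```

Under these assumptions the failing 1261-byte case is not really a request "to the next page": the heap start moves from 20000FD8 to 20000FDC, the 4-byte alignment fix pushes the end of the first block onto the 20001000 boundary, and the padding step then adds a full extra page on top.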