malloc/free fails to aggregate chunks, sometimes...

chrisbrooks

Member
I'm seeing some weird behavior when I use malloc/free to work with dynamic memory on my Teensy 4.1. When I free memory, it doesn't always get aggregated with adjacent chunks, which prevents me from allocating a large chunk.

Is this expected behavior?

Here is a minimal example:
C++:
#include "Arduino.h"
#include <memory>

struct BigStruct 
{
  uint8_t data[1024*64];
};

struct HugeStruct
{
  uint8_t data[1024*492];
};

int freeram() {
  // From https://forum.pjrc.com/index.php?threads/how-to-display-free-ram.33443/post-275013
  extern unsigned long _heap_end;
  extern char *__brkval;

  return (char *)&_heap_end - __brkval;
}

bool test_malloc_free_BigStruct(void)
{
  BigStruct *A = (BigStruct *) malloc(sizeof(BigStruct));
  bool success = (bool) A;
  free(A);
  return success;
}

bool test_malloc_free_two_BigStructs(void)
{
  BigStruct *B = (BigStruct *) malloc(sizeof(BigStruct));
  bool success = (bool) B;
  success &= test_malloc_free_BigStruct();
  free(B);
  return success;
}

bool test_malloc_free_HugeStruct(void)
{
  HugeStruct *A = (HugeStruct *) malloc(sizeof(HugeStruct));
  bool success = (bool) A;
  free(A);
  return success;
}

void print_freeram(void)
{
  Serial.print("freeram(): ");
  Serial.println(freeram());
}

void print_testresult(const char *testname, bool success)
{
  Serial.print(testname);
  Serial.println(success ? "successful": "failed");
}

void setup() {
  while (!Serial) {}

  Serial.println("teensy_malloc_test");
  Serial.println("==================");

  if (CrashReport)
    Serial.println(CrashReport); 

  // malloc/free a HugeStruct
  print_freeram();
  print_testresult("malloc/free a HugeStruct: ", test_malloc_free_HugeStruct());

  // malloc/free a BigStruct
  print_freeram();
  print_testresult("malloc/free a BigStruct: ", test_malloc_free_BigStruct());

  // malloc/free a HugeStruct
  print_freeram();
  print_testresult("malloc/free a HugeStruct: ", test_malloc_free_HugeStruct());

  // malloc/free two BigStructs
  print_freeram();
  print_testresult("malloc/free two BigStructs: ", test_malloc_free_two_BigStructs());

  // malloc/free a HugeStruct
  print_freeram();
  print_testresult("malloc/free a HugeStruct: ", test_malloc_free_HugeStruct());

  print_freeram();
}

void loop() {
  // Empty loop
}

And here is the serial output:
Code:
teensy_malloc_test
==================
freeram(): 511872
malloc/free a HugeStruct: successful
freeram(): 507904
malloc/free a BigStruct: successful
freeram(): 438272
malloc/free a HugeStruct: failed
freeram(): 438272
malloc/free two BigStructs: successful
freeram(): 507904
malloc/free a HugeStruct: successful
freeram(): 507904

Here's what I see happening:
  1. I successfully allocate and free a HugeStruct (one 492k chunk).
  2. I successfully allocate and free a BigStruct (one 64k chunk). (Note that "freeram" has gone down.)
  3. I fail to allocate a HugeStruct (492k).
  4. I successfully allocate and free two BigStructs (two 64k chunks). (Note that the "freeram" lost in step 2 has been recovered.)
  5. I successfully allocate a HugeStruct (492k).
Why should allocating two chunks and freeing them leave me with more usable memory than when I allocate and free one chunk?

Thanks for your help!


Details:
* Product: Teensy 4.1
* Arduino IDE 2.3.6 with Teensyduino 1.59.0
* Windows 11 Pro
 

Attachments

  • teensy_malloc_test.ino
    1.9 KB · Views: 75
I ran Chris's code. I get the same output.

This is weird. Why isn't free() moving the end-of-heap memory pointer back where it should be?

Chip
 
Free'd memory doesn't always get returned immediately to sbrk().
IIRC newlib includes mallinfo(), I suggest using that function to properly track allocations made using malloc/free.
 
I could be wrong, but I don't believe that it ever call sbrk() with a negative number. So it grows from the
Heap start to the heap end.

If I remember correctly the allocator on GCC does allocations out of multiple pools depending on the size of the allocation requests.
My guess is it might try to recoalesce, that if the memory near it is free it might combine them together... Been awhile since I looked
at all of the details on line like at:

 
I could be wrong, but I don't believe that it ever call sbrk() with a negative number. So it grows from the
Heap start to the heap end.
It should, once the amount of unallocated heap is over a certain threshold and there's nothing allocated above it.
 
The OP's issue is that, in the simple example, it is known that there is enough free contiguous memory to allocate the HugeStruct because of that free() operation in step two should release the one chunk of allocated memory. Yet because of whatever free() is doing (or not), the allocation request for HugeStruct fails in step three. That's a bummer! It should have had enough space! Why didn't free() move the end-of-heap pointer back so that malloc() would know that there was space?

Then, in Step 4, when he asks free() to do a more complicated task (release two allocations), we see that free() does correctly move the end-of-heap pointer back; it is able to successfully allocate the HugeStruct in step 5. Why does the more-complicated operation work, but the simpler one doesn't?!?

So weird. So mysterious (to me).
 
Last edited:
Oh, and since Chris mentioned the optimization setting that he used, I tried compiling his example using a diff setting. I compiled using "smallest code size" (or whatever it's called). When the code ran, all of his test cases ran successfully. Step 3 did not fail.

Why would changing the compilation option for smaller code size change how malloc() and free() work?
 
Smallest code uses a linker option for the "nano" version of the C library. Not sure if that makes any difference, but it might...
 
The OP's issue is that, in the simple example, it is known that there is enough free contiguous memory to allocate the HugeStruct because of that free() operation in step two should release the one chunk of allocated memory. Yet because of whatever free() is doing (or not), the allocation request for HugeStruct fails in step three. That's a bummer! It should have had enough space! Why didn't free() move the end-of-heap pointer back so that malloc() would know that there was space?

Then, in Step 4, when he asks free() to do a more complicated task (release two allocations), we see that free() does correctly move the end-of-heap pointer back; it is able to successfully allocate the HugeStruct in step 5. Why does the more-complicated operation work, but the simpler one doesn't?!?

So weird. So mysterious (to me).
malloc() isn't a simple wrapper around sbrk(). It does a lot of other stuff, particularly to make repeated allocations/deallocations of the same size fast. In this case here's what is probably happening:
- the first malloc/free of a HugeStruct initializes the allocator. It keeps some memory for permanent internal state tracking, hence the reduction of free heap from 511872 to 507904.
- the second malloc/free of a BigStruct grabs some more memory from sbrk(), but it doesn't return it when it is free'd because it's below a certain threshold. Instead it gets kept in its own "free-for-use" block.
- the third malloc/free attempt for a HugeStruct can't fit in the "free" block, so malloc tries to allocate it from sbrk. That also fails due to the size.
- the fourth malloc/free attempt also can't fit in the "free" block, but sbrk can supply the memory. When it gets free'd, it gets coalesced with the existing "free" block and since it's over the required size threshold, it gets returned to sbrk. Now we're more or less back to the beginning state.

There's nothing "bad' going on here, it's just memory fragmentation. In reality if you were going to use structures that large, they would likely be statically allocated. Smallest code/newlib-nano probably uses a much smaller/simpler implementation of malloc.
 
Smallest code uses a linker option for the "nano" version of the C library. Not sure if that makes any difference, but it might...
Is the source available for the two versions of the C libraries that are used by the teensy tool chain?

This has really lit up my curiosity and I'd like to look at the source and see if I can learn something about how the magic works...

Chip
 
Last edited:
@jmarsh Thanks for your explanation.

Inspired by your comment about mallinfo(), I poked around in "malloc.h" and found another useful function: malloc_trim(). When I call malloc_trim(), my "freeram()" value is always restored to 507904, and my apparent fragmentation does away.

Example sketch and serial output are attached.

I'd still love to see the source code for these two versions of the C library used by the teensy tool chain, so I can understand what the differences are.
 

Attachments

  • teensy_malloc_test.ino
    3.1 KB · Views: 47
  • serialoutput.txt
    1.7 KB · Views: 49
I'd still love to see the source code for these two versions of the C library used by the teensy tool chain,

It's just newlib, whatever version was part of ARM's 11.3.rel1 release. That's the "easy" answer.

At the time we did the toolchain update I created these notes with the specific build commands used for the versions ARM didn't supply as compiled binaries. The commands use a high level tool called "abe" which downloads manifest files and then uses the contents of those files to figure out which source code to download and build. Here is a direct link to the main manifest file. Inside that file you can find this section for newlib.

Code:
# Component data for newlib
newlib_url=https://developer.arm.com/-/media/Files/downloads/gnu/11.3.rel1/src
newlib_filespec=newlib-cygwin.tar.xz
newlib_mingw_only="no"
newlib_linuxhost_only="no"
newlib_staticlink="yes"
newlib_configure="--disable-newlib-supplied-syscalls --enable-newlib-io-long-long --enable-newlib-io-c99-formats --enable-newlib-mb --enable-newlib-reent-check-verify --enable-newlib-register-fini --enable-newlib-retargetable-locking"

By just adding those first 2 lines, I believe this is the direct link to the actual newlib source code used. But I need to mention for Teensy's toolchain I used the binaries ARM publishes for the major platforms and the high-level build script for the minor platforms that didn't have binaries (eg, Raspberry Pi). I never downloaded individual files like this. I ran "abe" and let it handle these sorts of details.

If you download and extract that newlib-cygwin.tar.xz file, you should end up with 6785 files in many subdirectories. The most obvious place to start looking is newlib-cygwin/newlib/libc/stdlib/malloc.c, but it's just a wrapper for _malloc_r. Maybe newlib/libc/stdlib/_mallocr.c has the actual source?

Unfortunately there is a lot of abstraction stuff to unpack, as newlib is a very mature open source project targeting pretty much every type of CPU. The different versions get build according for .specs files and a lot of confirgurable stuff. If you really want to dive down this deep rabbit hole, probably the easiest way to get started would be installing Ubuntu 22.04 on a virtual machine and then try running the commands from the 14 steps in my notes file. If you're lucky all those links will still work and you'll get a complete build of the toolchain, which of course will have a huge pile of source code and intermediate build files.
 
Thanks for all the instructions. By googling around, I had started down this path and, as your reply describes, it got really complicated really fast. I was hoping that it was my inexperience that made it difficult to thread my way through the library.

I had hoped that maybe there was a chance that it would be as simple as: "Here is the source for malloc as used by newlib and there is the source for malloc used by nano-lib. See how they're different? That's why their behaviors are different in your case. Isn't that cool?"

Sadly, it's not gonna be that easy.

I greatly appreciate your input. And, I'm glad that Chris pointed out malloc_trim(). I've learned so much!

Chip
 
I have no idea how accurate it is, but if you ask claude.ai "what are the differences between the malloc implementations in newlib and nano-lib?", it gives you a plausible answer.
 
Back
Top