Multiple issues using TeensyThreads on T3.5. Dynamic heap allocation problems.

As noted in post #11, the first caller to malloc() sets the baseline pointer __brkval - that is the 'logic' behind the main-thread alloc 'hack':
Code:
	if (__brkval == 0)
		__brkval = __malloc_heap_start;
	cp = __malloc_heap_end;
	if (cp == 0)
		cp = STACK_POINTER() - __malloc_margin;


Indeed, testing a series of allocs (and frees) from Main and Thread1 - perhaps another thread or two - would show whether the allocations are coming from the desired memory area and not overlapping.
 
Thanks again, defragster. I will test many allocs and frees from different threads and check the pointers to see where the memory is being allocated in the ram.
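
A minimal sketch of that pointer check, assuming each allocation is logged as a start address plus a size. This is plain host-side C++, not Teensy code, and the names `Block`, `overlaps` and `anyOverlap` are made up for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical record of one allocation: start address and length in bytes.
struct Block {
  uintptr_t addr;
  size_t size;
};

// Two half-open ranges [addr, addr + size) overlap when each one starts
// before the other one ends.
bool overlaps(const Block& a, const Block& b) {
  return a.addr < b.addr + b.size && b.addr < a.addr + a.size;
}

// Returns true if any two logged blocks share at least one byte.
bool anyOverlap(const std::vector<Block>& blocks) {
  for (size_t i = 0; i < blocks.size(); i++)
    for (size_t j = i + 1; j < blocks.size(); j++)
      if (overlaps(blocks[i], blocks[j])) return true;
  return false;
}
```

The same range test can be pointed at a thread's stack buffer (its start address and stack size) to see whether a malloc'd block lands inside it.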
 
Good luck. Just make sure the calls don't overlap when a timeslice ends mid-malloc and another thread re-enters it. What is the best way to enforce that? Semaphore/mutex/interrupts off?
 
As noted in post #11, the first caller to malloc() sets the baseline pointer __brkval - that is the 'logic' behind the main-thread alloc 'hack':
Code:
	if (__brkval == 0)
		__brkval = __malloc_heap_start;
	cp = __malloc_heap_end;
	if (cp == 0)
		cp = STACK_POINTER() - __malloc_margin;

Indeed, testing a series of allocs (and frees) from Main and Thread1 - perhaps another thread or two - would show whether the allocations are coming from the desired memory area and not overlapping.

I guess I don't understand this part. In my test program (updated below), the blocks are allocated in order from lowest to highest address, but with some failures among threads other than the main thread. The error pattern differs depending on threadStackSize.

Code:
#include <Arduino.h>
#include <TeensyThreads.h>

#define SIZE_ALLOC 1024 // JWP 500
uint32_t total_bytes_alloc = 0;
uint32_t thread_bytes_alloc[5] = {0,0,0,0,0};

void thread_loop( int threadID )
{
  bool fail = false;
  while(1) {
    char* pointer = (char*)malloc(SIZE_ALLOC);
    if (pointer) {
      fail = false;
      thread_bytes_alloc[threadID] += SIZE_ALLOC;
      total_bytes_alloc += SIZE_ALLOC;
      Serial.printf("Thread %1d malloc %d. Addr = %p. thread_total = %d. total = %d\n",
                        threadID, SIZE_ALLOC, pointer,
                        thread_bytes_alloc[threadID], total_bytes_alloc);
    }
    else {
      if (fail == false)
        Serial.printf("Thread %1d malloc failed.\n" , threadID);
      fail = true;
    }
    threads.delay(50); // JWP (500)
  } 
}

void thread_func( void *param )
{
  thread_loop( (int)param );
}

void setup()
{
  Serial.begin(9600);
  delay(1000); // JWP (5000);
  int threadStackSize = 1024;
  int threadCount = 4;
  for (int i=0; i<threadCount; i++)
    threads.addThread(thread_func, (void*)(i+1), threadStackSize); // stack alloc'd from heap; parentheses make the cast apply to i+1
  Serial.println(threads.threadsInfo()); // Information for thread 0 (main) is incorrect, information for thread 1 is missing
}

void loop() 
{
  thread_loop( 0  );
}
 
Just added: ftrias/TeensyThreads/issues/38
threadsInfo() was not reporting the LAST thread.

Something isn't working as expected. I added int* me in the code below - the output changed drastically and makes no sense - how does thread 4 fail malloc without starting? Unless it is because all threads are using print.
Code:
C:\T_Drive\tCode\FORUM\MallocThreads2\MallocThreads2.ino Aug  9 2022 01:45:14
_____
0:Stack size:10240|Used:537067512|Remains:-537057272|State:RUNNING|
1:Stack size:1040|Used:40|Remains:1000|State:RUNNING|
2:Stack size:1060|Used:40|Remains:1020|State:RUNNING|
3:Stack size:1080|Used:40|Remains:1040|State:RUNNING|
4:Stack size:1100|Used:40|Remains:1060|State:RUNNING|

me=0x2002ffbf Thread 0 malloc 1024. Addr = 0x1fff2c88. thread_total = 1024. total = 1024
me=0x1fff1d3f Thread 1 malloc 1024. Addr = 0x1fff3090. thread_total = 1024. total = 2048
me=0x1fff2233 Thread 2 malloc 1024. Addr = 0x1fff3498. thread_total = 1024. total = 3072
me=0x1fff2737 Thread 3 malloc 1024. Addr = 0x1fff38a0. thread_total = 1024. total = 4096
Thread 4 malloc failed.
me=0x2002ffbf Thread 0 malloc 1024. Addr = 0x1fff3ca8. thread_total = 2048. total = 5120
me=0x1fff1d3f Thread 1 malloc 1024. Addr = 0x1fff40b0. thread_total = 2048. total = 6144
me=0x2002ffbf Thread 0 malloc 1024. Addr = 0x1fff44b8. thread_total = 3072. total = 7168
me=0x1fff1d3f Thread 1 malloc 1024. Addr = 0x1fff48c0. thread_total = 3072. total = 8192
me=0x2002ffbf Thread 0 malloc 1024. Addr = 0x1fff4cc8. thread_total = 4096. total = 9216
...

This code produced that output, with the edit from Issue #38 applied to threadsInfo and a different stack size per thread so each one can be told apart:
Code:
// https://forum.pjrc.com/threads/70831-Multiple-issues-using-TeensyThreads-on-T3-5-Dynamic-heap-allocation-problems

#include <Arduino.h>
#include <TeensyThreads.h>

#define SIZE_ALLOC 1024 // JWP 500
uint32_t total_bytes_alloc = 0;
uint32_t thread_bytes_alloc[5] = {0,0,0,0,0};

void thread_loop( int threadID )
{
  bool fail = false;
  int* me = (int *)&fail; // added: address of a local, shows which stack this thread is running on
  while(1) {
    char* pointer = (char*)malloc(SIZE_ALLOC);
    if (pointer) {
      fail = false;
      thread_bytes_alloc[threadID] += SIZE_ALLOC;
      total_bytes_alloc += SIZE_ALLOC;
      Serial.printf("me=%p Thread %1d malloc %d. Addr = %p. thread_total = %d. total = %d\n",
                        me, threadID, SIZE_ALLOC, pointer,
                        thread_bytes_alloc[threadID], total_bytes_alloc);
    }
    else {
      if (fail == false)
        Serial.printf("Thread %1d malloc failed.\n" , threadID);
      fail = true;
    }
    threads.delay(50); // JWP (500)
  } 
}

void thread_func( void *param )
{
  thread_loop( (int)param );
}

void setup()
{
  pinMode( 13, OUTPUT );
  digitalWrite( 13, HIGH );
  Serial.begin(9600);
  while (!Serial && millis() < 4000 );
  Serial.println("\n" __FILE__ " " __DATE__ " " __TIME__);
  int threadStackSize = 1020;
  int threadCount = 4;
  for (int i=1; i<=threadCount; i++)
    threads.addThread(thread_func, (void*)i, threadStackSize+i*20); // stack alloc'd from heap 
  Serial.println(threads.threadsInfo()); // Information for thread 0 (main) is incorrect, information for thread 1 is missing
}

void loop() 
{
  thread_loop( 0  );
}
 
Something isn't working as expected. I added int* me in the code below - the output changed drastically and makes no sense - how does thread 4 fail malloc without starting? Unless it is because all threads are using print.

I had the same experience. Made a small, valid change to a working program, and the program stopped working completely. What I think I see in the test results is that malloc() only works (for some undefined time) after doing a malloc() from the main thread. I think this is because the stack pointer is actually below the top of the heap for all other threads, and this confuses the malloc() algorithm. I wouldn't expect a program that uses TeensyThreads and makes much (if any) use of malloc() from threads other than the main thread to be reliable or predictable. Limiting malloc() to the main thread is really not much of a restriction.
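
To make the stack-pointer point concrete, here is a small host-side simulation. It is not the Teensy core code: it assumes an sbrk-style rule that refuses to grow the heap once the new top of heap would reach the caller's stack pointer minus a margin. `SimHeap`, `growHeap` and the margin value are illustrative names and numbers.

```cpp
#include <cstdint>

// Simulated heap state, as an sbrk-style backend might track it.
struct SimHeap {
  uint32_t brk;     // current top of heap
  uint32_t margin;  // safety gap kept below the caller's stack pointer (assumed)
};

// Try to grow the heap by `incr` bytes on behalf of a caller whose stack
// pointer is `sp`. Assumed rule: refuse if the new top would reach sp - margin.
bool growHeap(SimHeap& h, uint32_t sp, uint32_t incr) {
  if (h.brk + incr >= sp - h.margin) return false;  // heap would run into the stack
  h.brk += incr;
  return true;
}
```

Under that rule, a caller whose stack sits near the top of RAM (like the main thread's `me=0x2002ffbf` in the output above) can grow a heap that is around 0x1fff3000, while a caller whose stack was itself malloc'd from the heap (like thread 1's `me=0x1fff1d3f`) already has its stack pointer below the top of heap, so every growth request from it is refused.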
 
Something odd for sure. It seemed all 4 threads were doing allocs with the post #29 code - then after some change only threads 0 and 4 made repeat calls. Then with the change indicated in p#30 only threads 0 and 1 are active?

Seems malloc usage in threads is corrupting the storage of TeensyThreads?

Also, looking at the code and Issue #38 linked above - not sure the core threading code is cycling through all created threads, given its use of count and the loop ranges?
 
It seems like even if it works, it's not going to work as expected when the stack pointer for all threads other than the main thread is below the top of the heap. Can you comment on that? One of your first suggestions was to use something other than standard malloc() for memory allocation. I think that's not only recommended but necessary, unless malloc and new are limited to the main thread.
 
Hello defragster and joepasquariello!

I have been experimenting with your code and making some small modifications to try to isolate the problem and find the root cause. Thanks very much for your help so far! :)
I have produced code that can reliably malloc() from any thread up to a limit of around 120 kB of total allocated heap on Teensy 3.5. To achieve this, I made some modifications and used the "hack" found by defragster.

Some of the changes are:
  • I have added some entropy to the delay times and allocation size, just to be sure that it is not working by coincidence.
  • I stopped using printf, because it seems to be a problem sometimes. Instead, I am only using Serial.print and Serial.println.
  • I protect the malloc function and Serial access with mutexes, so those functions should now be "thread safe".

This is the code:

Code:
#include <Arduino.h>
#include <TeensyThreads.h>
#include <Entropy.h>
#define SIZE_ALLOC 1024 // JWP 500
#define INIT_ALLOC 125000

uint32_t total_bytes_alloc = 0;
uint32_t thread_bytes_alloc[5] = {0,0,0,0,0};
Threads::Mutex heapLock;
Threads::Mutex serialLock;

void thread_loop( int threadID )
{
  while(1) {
    uint32_t sizeToAlloc = SIZE_ALLOC + Entropy.random(0, SIZE_ALLOC*3);
    heapLock.lock();
    char* pointer = (char*)malloc(sizeToAlloc);
    heapLock.unlock();
    if (pointer) {
      thread_bytes_alloc[threadID] += sizeToAlloc;
      total_bytes_alloc += sizeToAlloc;

      serialLock.lock();
      Serial.print("Thread ");
      Serial.print(threadID);
      Serial.print(" malloc ");
      Serial.print(sizeToAlloc);
      Serial.print(". Addr = 0x");
      Serial.print((uint32_t)pointer, HEX);
      Serial.print(". thread_total = ");
      Serial.print(thread_bytes_alloc[threadID]);
      Serial.print(". total = ");
      Serial.println(total_bytes_alloc);
      serialLock.unlock();
    }
    else
    {
      serialLock.lock();
      Serial.print("Thread ");
      Serial.print(threadID);
      Serial.println(" malloc failed!");
      serialLock.unlock();
    }
    threads.delay(50 + Entropy.random(200)); // JWP (500)
  } 
}

void thread_func( void *param )
{
  thread_loop( (int)param );
}

void setup()
{
  Entropy.Initialize();
  Serial.begin(9600);
  delay(3000);
  int threadStackSize = 1024;
  int threadCount = 4;

  // "Hack" to initialize the malloc pointers
  int* pointer = (int *)malloc(INIT_ALLOC); // This works if INIT_ALLOC is less than 130000.
  if(pointer)
  {
    Serial.println("Allocated and freed");
    free(pointer);
  }
  else
  {
    Serial.println("Failed to allocate");
  }

  for (int i=0; i<threadCount; i++)
    threads.addThread(thread_func, (void*)(i+1), threadStackSize); // stack alloc'd from the heap; parentheses make the cast apply to i+1
  Serial.println(threads.threadsInfo()); // Information for thread 0 (main) is incorrect, information for thread 1 is missing
}

void loop() 
{
  thread_loop( 0  );
}
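
The lock/unlock pairs around malloc() in the code above can also be written with a scope guard, so the lock is released on every exit path. The sketch below uses `std::mutex` and `std::lock_guard` from standard C++ as a stand-in so it can run on a desktop; on the Teensy the lock would be the `Threads::Mutex` from the post. `lockedMalloc` and `lockedFree` are hypothetical wrapper names.

```cpp
#include <cstddef>
#include <cstdlib>
#include <mutex>

// Stand-in for heapLock in the post; on Teensy this would be a Threads::Mutex.
static std::mutex heapLock;

// Hypothetical wrapper: every thread allocates through this instead of malloc().
void* lockedMalloc(size_t size) {
  std::lock_guard<std::mutex> guard(heapLock);  // released when guard goes out of scope
  return std::malloc(size);
}

// Matching wrapper for free(), guarded by the same lock.
void lockedFree(void* p) {
  std::lock_guard<std::mutex> guard(heapLock);
  std::free(p);
}
```

Note that in the posted code the shared counters thread_bytes_alloc and total_bytes_alloc are updated outside either lock, so a wrapper like this only covers the allocator calls themselves.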

My biggest question now is: what is the reason for this limit? When I increase INIT_ALLOC to 130000, it no longer works, and threads are sometimes unable to allocate. With every value I tested at 125000 and below, it works, and threads + loop are able to allocate until around INIT_ALLOC bytes have been allocated. In any case, the main thread (loop) is always able to malloc() until the heap is exhausted (230 kB+ of RAM allocated).
 
One other thing I tested was to sometimes free the heap space allocated by the threads and check if new allocations were able to use the empty spaces left on the heap. As far as I could see, it works. So I think malloc works as expected, as long as the threads don't allocate more than INIT_ALLOC bytes in total and INIT_ALLOC is < 130 kB (at least in my code; I know this will change depending on stack size and other parameters).
 
In my testing, there is nothing magic about a preliminary allocation from main thread.

I don't understand. What do you mean by "nothing magic"? In my testing, if I comment out the preliminary allocation from the main thread, it doesn't work. If I let it there, it works. Isn't it related to the first malloc() call initializing some values like __brkval ?
 
Haven't looked deeper, but the problem seems to be that malloc expects to own the RAM space as it understands it, based on that first call setting the base pointer - and the main thread and the other threads each have a different view of the stack and of the calcs malloc makes.
> the "hack" showed that - and may or may not have offered a usable work around.
> Odd though is that issue with INIT_ALLOC limit below all of free heap?

Question: Are the needed allocs controlled and expected? Or varying size and occurrence?

Doing a static alloc of a block of RAM for each thread { uint32_t myThread[12048] } would allow the thread to own and control its own memory. Either with single fixed block or internal tracking of subblocks to use and free.
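
A minimal sketch of that idea, assuming fixed-size blocks are acceptable: a statically allocated pool with a simple in-use table, so the thread never calls malloc() at all. This is illustrative host-side C++; `BlockPool`, `alloc` and `release` are made-up names, not part of TeensyThreads.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical fixed-block pool: N blocks of BLOCK bytes in static storage,
// meant to be owned by a single thread.
template <size_t BLOCK, size_t N>
class BlockPool {
 public:
  // Returns a free block, or nullptr when all N blocks are in use.
  void* alloc() {
    for (size_t i = 0; i < N; i++) {
      if (!used_[i]) {
        used_[i] = true;
        return storage_ + i * BLOCK;
      }
    }
    return nullptr;
  }

  // Marks the block containing `p` as free again; pointers outside the
  // pool are ignored.
  void release(void* p) {
    uint8_t* b = static_cast<uint8_t*>(p);
    if (b < storage_ || b >= storage_ + BLOCK * N) return;
    used_[(b - storage_) / BLOCK] = false;
  }

 private:
  alignas(8) uint8_t storage_[BLOCK * N];  // the thread's private memory
  bool used_[N] = {};                      // one in-use flag per block
};
```

One pool per thread, for example `static BlockPool<1024, 12> pool1;`, keeps each thread's memory separate. If a pool were shared between threads it would need the same kind of locking as malloc().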
 
I don't understand. What do you mean by "nothing magic"? In my testing, if I comment out the preliminary allocation from the main thread, it doesn't work. If I let it there, it works. Isn't it related to the first malloc() call initializing some values like __brkval ?

If you run the last program I posted, you can see that all of heap can eventually be used even without a large initial malloc by main thread.
 
Cool, haven't looked in visits ... and gotta run now ... what diff made it change behavior?

Doing some or occasional malloc from main thread seems to “unlock” the heap for malloc by the other threads, but I don’t think it’s meant to work that way, and it works only “accidentally”. It never makes sense for top of heap to be above bottom of stack. In the main thread, it never will be, but in all other threads it will be.
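
One way to picture this "unlock" effect is a toy model, offered as an assumption and not as a confirmed diagnosis: suppose the allocator only looks at the caller's stack pointer when it has to grow the heap, grows in page-sized steps, and serves later requests from whatever is left over. `ToyAllocator`, `toyMalloc`, the page size and the margin are all illustrative.

```cpp
#include <cstdint>

// Toy model: the allocator keeps `avail` bytes it already obtained and only
// grows the heap (in `page`-sized steps) when that runs out.
struct ToyAllocator {
  uint32_t brk;     // top of heap
  uint32_t avail;   // bytes already obtained but not yet handed out
  uint32_t page;    // growth granularity (assumed)
  uint32_t margin;  // gap kept below the caller's stack pointer (assumed)
};

// Returns true if a request of `size` bytes from a caller with stack pointer
// `sp` succeeds. Only the growth path compares against the stack pointer.
bool toyMalloc(ToyAllocator& a, uint32_t sp, uint32_t size) {
  if (size > a.avail) {
    uint32_t need = size - a.avail;
    uint32_t grow = ((need + a.page - 1) / a.page) * a.page;  // round up to pages
    if (a.brk + grow >= sp - a.margin) return false;  // refused: would reach stack
    a.brk += grow;
    a.avail += grow;
  }
  a.avail -= size;
  return true;
}
```

In this model a thread whose stack pointer is below the top of heap can never grow the heap itself, but it can still be served from space that an earlier main-thread call obtained. That gives the general shape seen in this thread (thread allocations work for a while after a main-thread allocation, then fail until the main thread allocates again), although the real allocator's step sizes and bookkeeping will differ.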
 
Okay that is the last code I started with - but minor edit to remove 'warning' and (void*)i+1
Code:
  for (int i=1; i<=threadCount; i++)
    threads.addThread(thread_func, (void*)i, threadStackSize); // stack alloc'd from heap

That had the intermittent failures until I touched the code - then it went weird, skipping whole threads it seemed?

...
 
Did a quick edit to store strings for thread 0 to print - and had each thread yield after one try and only try in turn if their number was up - then pass turn to next thread ... in the process it went odder at each step

One problem is that using sprintf to a string may have the same issue as printf, but only one thread can try at a time, and it went from bad to worse in terms of seeing what was expected ...

Didn't stick with it to resolve or see if initial setup alloc would resolve.

Okay - oops - cleaned a few code errors - thread 4 fails early - the rest consume all of memory:
Code:
0:Stack size:10240|Used:537067512|Remains:-537057272|State:RUNNING|
1:Stack size:1024|Used:40|Remains:984|State:RUNNING|
2:Stack size:1024|Used:40|Remains:984|State:RUNNING|
3:Stack size:1024|Used:40|Remains:984|State:RUNNING|
4:Stack size:1024|Used:40|Remains:984|State:RUNNING|

Thread 0 malloc 1024. Addr = 0x1fff2e60. thread_total = 1024. total = 1024
void ID=1
void ID=2
void ID=3
void ID=4
Thread 0 malloc 1024. Addr = 0x1fff3e80. thread_total = 2048. total = 5120
Thread 1 malloc 1024. Addr = 0x1fff3268. thread_total = 1024. total = 2048
Thread 2 malloc 1024. Addr = 0x1fff3670. thread_total = 1024. total = 3072
Thread 3 malloc 1024. Addr = 0x1fff3a78. thread_total = 1024. total = 4096
Thread 4 malloc failed #0.
Thread 0 malloc 1024. Addr = 0x1fff4ea0. thread_total = 3072. total = 9216
Thread 1 malloc 1024. Addr = 0x1fff4288. thread_total = 2048. total = 6144
Thread 2 malloc 1024. Addr = 0x1fff4690. thread_total = 2048. total = 7168
Thread 3 malloc 1024. Addr = 0x1fff4a98. thread_total = 2048. total = 8192
Thread 4 malloc failed #1.
Thread 0 malloc 1024. Addr = 0x1fff5ec0. thread_total = 4096. total = 13312
Thread 1 malloc 1024. Addr = 0x1fff52a8. thread_total = 3072. total = 10240
Thread 2 malloc 1024. Addr = 0x1fff56b0. thread_total = 3072. total = 11264
Thread 3 malloc 1024. Addr = 0x1fff5ab8. thread_total = 3072. total = 12288
Thread 4 malloc failed #2.
Thread 0 malloc 1024. Addr = 0x1fff6ee0. thread_total = 5120. total = 17408
Thread 1 malloc 1024. Addr = 0x1fff62c8. thread_total = 4096. total = 14336
Thread 2 malloc 1024. Addr = 0x1fff66d0. thread_total = 4096. total = 15360
Thread 3 malloc 1024. Addr = 0x1fff6ad8. thread_total = 4096. total = 16384
Thread 4 malloc failed #3.
Thread 0 malloc 1024. Addr = 0x1fff7f00. thread_total = 6144. total = 21504
Thread 1 malloc 1024. Addr = 0x1fff72e8. thread_total = 5120. total = 18432
Thread 2 malloc 1024. Addr = 0x1fff76f0. thread_total = 5120. total = 19456
Thread 3 malloc 1024. Addr = 0x1fff7af8. thread_total = 5120. total = 20480
Thread 4 malloc failed #4.
Thread 0 malloc 1024. Addr = 0x1fff8f20. thread_total = 7168. total = 25600
Thread 1 malloc 1024. Addr = 0x1fff8308. thread_total = 6144. total = 22528
Thread 2 malloc 1024. Addr = 0x1fff8710. thread_total = 6144. total = 23552
Thread 3 malloc 1024. Addr = 0x1fff8b18. thread_total = 6144. total = 24576
void ID=4
...
Thread 0 malloc 1024. Addr = 0x20029d30. thread_total = 57344. total = 224256
Thread 1 malloc 1024. Addr = 0x20029118. thread_total = 56320. total = 221184
Thread 2 malloc 1024. Addr = 0x20029520. thread_total = 56320. total = 222208
Thread 3 malloc 1024. Addr = 0x20029928. thread_total = 54272. total = 223232
void ID=4
Thread 0 malloc 1024. Addr = 0x2002ad50. thread_total = 58368. total = 228352
Thread 1 malloc 1024. Addr = 0x2002a138. thread_total = 57344. total = 225280
Thread 2 malloc 1024. Addr = 0x2002a540. thread_total = 57344. total = 226304
Thread 3 malloc 1024. Addr = 0x2002a948. thread_total = 55296. total = 227328
void ID=4
Thread 0 malloc 1024. Addr = 0x2002bd70. thread_total = 59392. total = 232448
Thread 1 malloc 1024. Addr = 0x2002b158. thread_total = 58368. total = 229376
Thread 2 malloc 1024. Addr = 0x2002b560. thread_total = 58368. total = 230400
Thread 3 malloc 1024. Addr = 0x2002b968. thread_total = 56320. total = 231424
void ID=4
Thread 0 malloc failed #0.
Thread 1 malloc 1024. Addr = 0x2002c178. thread_total = 59392. total = 233472
Thread 2 malloc 1024. Addr = 0x2002c580. thread_total = 59392. total = 234496
Thread 3 malloc 1024. Addr = 0x2002c988. thread_total = 57344. total = 235520
void ID=4
Thread 0 malloc failed #1.
Thread 1 malloc failed #0.
Thread 2 malloc failed #0.
Thread 3 malloc failed #2.
void ID=4
Thread 0 malloc failed #2.
Thread 1 malloc failed #1.
Thread 2 malloc failed #1.
Thread 3 malloc failed #3.
void ID=4
Thread 0 malloc failed #3.
Thread 1 malloc failed #2.
Thread 2 malloc failed #2.
Thread 3 malloc failed #4.
void ID=4
Thread 0 malloc failed #4.
Thread 1 malloc failed #3.
Thread 2 malloc failed #3.
Thread 0 malloc failed #5.
Thread 1 malloc failed #4.
Thread 2 malloc failed #4.
Thread 0 malloc failed #6.
Thread 0 malloc failed #7.
Thread 0 malloc failed #8.
Thread 0 malloc failed #9.

Code:
#include <Arduino.h>
#include <TeensyThreads.h>

#define SIZE_ALLOC 1024 // JWP 500
uint32_t total_bytes_alloc = 0;
uint32_t thread_bytes_alloc[5] = {0, 0, 0, 0, 0};
char szOuts[5][128];
volatile int32_t myTurn = 0;

void thread_loop( int threadID )
{
  //bool fail = false;
  uint32_t fCnt = 0;
  szOuts[threadID][0] = 0;
  while (1) {
    if ( threadID == myTurn ) { // only the thread whose turn it is tries to allocate
      char* pointer = (char*)malloc(SIZE_ALLOC);
      if (pointer) {
        //fail = false;
        thread_bytes_alloc[threadID] += SIZE_ALLOC;
        total_bytes_alloc += SIZE_ALLOC;
        sprintf(&szOuts[threadID][0], "Thread %1d malloc %d. Addr = %p. thread_total = %lu. total = %lu\n",
                threadID, SIZE_ALLOC, pointer,
                thread_bytes_alloc[threadID], total_bytes_alloc);
      }
      else {
        if ( fCnt < 5 || ( fCnt < 10 && 0 == threadID )) {
          sprintf(szOuts[threadID], "Thread %1d malloc failed #%lu.\n" , threadID, fCnt);
          fCnt++;
        }
        //fail = true;
      }
      if ( 0 == threadID ) { // thread 0 prints the strings saved by all threads
        for ( int ii = 0; ii < 5; ii++ ) {
          if ( szOuts[ii][0] ) {
            Serial.print( szOuts[ii] );
            szOuts[ii][0] = 0;
          }
          else if ( fCnt < 5 ) {
            Serial.printf( "void ID=%d\n", ii );
          }
        }
      }
      myTurn++;
      if ( myTurn > 4 )
        myTurn = 0;
    }
    threads.yield();
    // threads.delay(50); // JWP (500)
  }
}

void thread_func( void *param )
{
  thread_loop( (int)param );
}

void setup()
{
  Serial.begin(9600);
  delay(1000); // JWP (5000);
  int threadStackSize = 1024;
  int threadCount = 4;
  for (int i = 1; i <= threadCount; i++)
    threads.addThread(thread_func, (void*)i, threadStackSize); // stack alloc'd from heap
  Serial.println(threads.threadsInfo()); // Information for thread 0 (main) is incorrect, information for thread 1 is missing
}

void loop()
{
  thread_loop( 0  );
}
 
Doing some or occasional malloc from main thread seems to “unlock” the heap for malloc by the other threads, but I don’t think it’s meant to work that way, and it works only “accidentally”. It never makes sense for top of heap to be above bottom of stack. In the main thread, it never will be, but in all other threads it will be.

I have tested all the code posted previously but haven't succeeded in reliably calling malloc() from any thread at any time without needing a malloc() from the main thread to "unlock" malloc for the other threads again. Just to confirm, did you happen to discover any way to do that so far in your testing? Thanks again for the help.

Okay - oops - cleaned a few code errors - thread 4 fails early - the rest consume all of memory

Wow, I tested this code and got the same results. It just doesn't make any sense to me. It seems like a coincidence that it worked. I have tried changing the number of threads but could not see any clear pattern. I tried with 7 threads total and with 3 threads; all of them had some kind of malloc() issue.

I am still working on this, trying to figure out a way to make malloc() work reliably.
 
Ok, so I was thinking about this:
It seems like even if it works, it's not going to work as expected when the stack pointer for all threads other than the main thread is below the top of the heap.
And came up with a crazy idea that (at least in my testing) worked: what if all the thread stacks were above the top of the heap, as is natural for a stack?

What I did was to allocate the stacks for the threads as a local (stack) buffer in the loop() function.

Code:
#include <Arduino.h>
#include <TeensyThreads.h>
#define SIZE_ALLOC 1024 // JWP 500

uint32_t total_bytes_alloc = 0;
uint32_t thread_bytes_alloc[5] = {0,0,0,0,0};
bool threads_created = false;

void thread_loop( int threadID )
{
  int numLoops = 0;
  while(1) {
    numLoops++;
    char* pointer = (char*)malloc(SIZE_ALLOC);
    if (pointer) {
      thread_bytes_alloc[threadID] += SIZE_ALLOC;
      total_bytes_alloc += SIZE_ALLOC;

      Serial.print("Thread ");
      Serial.print(threadID);
      Serial.print(" loop #");
      Serial.print(numLoops);
      Serial.print(" malloc ");
      Serial.print(SIZE_ALLOC);
      Serial.print(". Addr = 0x");
      Serial.print((uint32_t)pointer, HEX);
      Serial.print(". thread_total = ");
      Serial.print(thread_bytes_alloc[threadID]);
      Serial.print(". total = ");
      Serial.println(total_bytes_alloc);
    }
    else
    {
      Serial.print("Thread ");
      Serial.print(threadID);
      Serial.print(" loop #");
      Serial.print(numLoops);
      Serial.println(" malloc failed!");
    }
    threads.delay(100);
  } 
}

void thread_func( void *param )
{
  thread_loop( (int)param );
}

void setup()
{
  Serial.begin(9600);
  delay(3000);
}

void loop() 
{
  const int threadStackSize = 1024;
  const int threadCount = 4;
  char threadsStack[threadCount][threadStackSize]; // thread stacks live on loop()'s stack; const sizes make this a fixed-size array
  if(!threads_created)
  {
    // Only runs once
    for (int i=1; i<=threadCount; i++)
      threads.addThread(thread_func, (void*)i, threadStackSize, threadsStack[i-1]); // thread stacks are now on the main stack
    Serial.println(threads.threadsInfo()); // Information for thread 0 (main) is incorrect, information for thread 1 is missing
    threads_created = true;
  }
  thread_loop(0);
}

Does it make any sense that it worked? Can you guys see any immediate pitfalls or dangers in doing that?
 
If the root cause of trouble is overflowing the allocated stack space (just a blind guess), then moving the stack allocation somewhere else where you're less likely to overwrite anything important would (probably) make the problems less likely to manifest. Maybe.
 
Based on this note, code was added where the threads start:
Code:
void thread_loop( int threadID )
{
  //bool fail = false;
  uint32_t fCnt = 0;
#define STK_CHK_SIZE 80
  int stkSig[STK_CHK_SIZE];
  for ( int ii = 0; ii < STK_CHK_SIZE; ii++ ) stkSig[ii] = threadID;
...

There are 800 bytes free in the stack so that should work up to: #define STK_CHK_SIZE 200
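
For reference, with 4-byte ints a STK_CHK_SIZE of 80 takes 320 bytes of stack, 100 takes 400 and 200 takes 800. A related technique, sketched here as an assumption and not something posted in this thread, is to paint the whole stack buffer with a known pattern before the thread starts and later count how many words were never overwritten. `PAINT`, `paintStack` and `unusedStackBytes` are hypothetical names, and the sketch is host-side C++.

```cpp
#include <cstddef>
#include <cstdint>

// Assumed fill value; any pattern unlikely to occur in real data works.
const uint32_t PAINT = 0xDEADBEEF;

// Fill a stack buffer with the pattern before the thread starts using it.
void paintStack(uint32_t* stack, size_t words) {
  for (size_t i = 0; i < words; i++) stack[i] = PAINT;
}

// Stacks on ARM grow downward, so untouched words sit at the low end of the
// buffer. Count them from the bottom to estimate the bytes never used.
size_t unusedStackBytes(const uint32_t* stack, size_t words) {
  size_t n = 0;
  while (n < words && stack[n] == PAINT) n++;
  return n * sizeof(uint32_t);
}
```

On the Teensy this would apply to a buffer the sketch owns and hands to the four-argument form of threads.addThread(), which gives a second opinion next to the Used/Remains numbers from threadsInfo().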

The above works with 80; starting with 200 or even 100 causes a quick HALT!
Code:
New Thread #1
New Thread #2
New Thread #3
New Thread #4
_____
0:Stack size:10240|Used:537067512|Remains:-537057272|State:RUNNING|
1:Stack size:1024|Used:40|Remains:984|State:RUNNING|
2:Stack size:1024|Used:40|Remains:984|State:RUNNING|
3:Stack size:1024|Used:40|Remains:984|State:RUNNING|
4:Stack size:1024|Used:40|Remains:984|State:RUNNING|

Thread 0 malloc 1024. Addr = 0x1fff2e70. thread_total = 1024. total = 1024
void ID=1
void ID=2
void ID=3
void ID=4
Thread 0 malloc 1024. Addr = 0x1fff3e90. thread_total = 2048. total = 5120
Thread 1 malloc 1024. Addr = 0x1fff3278. thread_total = 1024. total = 2048
Thread 2 malloc 1024. Addr = 0x1fff3680. thread_total = 1024. total = 3072
Thread 3 malloc 1024. Addr = 0x1fff3a88. thread_total = 1024. total = 4096
Thread 4 malloc failed #0.
_____
0:Stack size:10240|Used:537067512|Remains:-537057272|State:RUNNING|
1:Stack size:1024|Used:584|Remains:440|State:RUNNING|
2:Stack size:1024|Used:584|Remains:440|State:1628430336|
3:Stack size:1024|Used:584|Remains:440|State:RUNNING|
4:Stack size:1024|Used:584|Remains:440|State:RUNNING|

  // END OF OUTPUT - execution output stops here ????

After a button-push reprogram and run, it stops one iteration sooner in thread 0's prints of the saved strings.
Code was added to confirm each thread is created - addThread is returning the thread ID, not -1.
Code before threads.yield() verifies the STK_CHK_SIZE entry values are as expected.
Would need more code added to find where it goes wrong ...

ADDED thread 0 check for this '0' when thread 4 fails malloc: Thread 4 malloc failed #0.
Code:
        if (szOuts[4][24] == '0')
          Serial.println(threads.threadsInfo());
> then threadsInfo is printed :: Threads structs are in a bad "State"

And the Stack space remaining should support this added Stack Signature { stkSig } testing beyond 100.
 
I tried a few more things, including having the thread stacks on the stack of the main thread, and I think the fundamental problem is the same. If you call malloc() with different stack pointers, you can overwrite memory you did not intend. I added call to memset() in thread_loop() to set all malloc'd bytes to 0xFF. When I do this, only the main thread continues to run.
 
Did a quick edit to setup for Alloc/Free and no diff now in current code - except the indication of TThreads data issues.

Works with 80 and fails with 100 stack DWord edits.
Code:
...
Thread 0 malloc 1024. Addr = 0x1fff3e90. thread_total = 2048. total = 5120
Thread 1 malloc 1024. Addr = 0x1fff3278. thread_total = 1024. total = 2048
Thread 2 malloc 1024. Addr = 0x1fff3680. thread_total = 1024. total = 3072
Thread 3 malloc 1024. Addr = 0x1fff3a88. thread_total = 1024. total = 4096
Thread 4 malloc failed #0.
_____
0:Stack size:10240|Used:537067512|Remains:-537057272|State:RUNNING|
1:Stack size:1024|Used:584|Remains:440|State:RUNNING|
2:Stack size:1024|Used:584|Remains:440|State:554631168|
3:Stack size:1024|Used:584|Remains:440|State:1628393472|
4:Stack size:1024|Used:584|Remains:440|State:RUNNING|
// END OF OUTPUT

Not sure if this points to issue with TeensyThreads code or just malloc not liking the thread world as presented.

Need to show pointers TThread has for stack
 