Multiple issues using TeensyThreads on T3.5. Dynamic heap allocation problems.

Good news. Thanks for letting us know. What was your timeslice before?

I was using 5 ticks (5 ms) for the main thread and the comm thread. For the led blink thread I was using just one tick.

I wasn't aware that the SPI and I2C communication was non-reentrant, so my reasoning for those short timeslices was that the user would feel more like it's three parallel threads on different cores, instead of three threads taking turns on just one core. Turns out the Teensy is so fast that even if I don't interrupt the thread to do context switching (using a very large timeslice) and only change context using yield or delay, most of the times the threads run for less than 5 ms each anyway. It only takes more time when one particularly slow task which which can't be split in smaller quicker tasks is being executed, like writing to the SSD1306 display via I2C.

All the Teensies are still running without rebooting or crashing. I think we finally managed to solve the issue! I am very grateful for all the help received from this forum! Thanks!
 
I was using 5 ticks (5 ms) for the main thread and the comm thread. For the led blink thread I was using just one tick.

I wasn't aware that the SPI and I2C communication was non-reentrant, so my reasoning for those short timeslices was that the user would feel more like it's three parallel threads on different cores, instead of three threads taking turns on just one core. Turns out the Teensy is so fast that even if I don't interrupt the thread to do context switching (using a very large timeslice) and only change context using yield or delay, most of the times the threads run for less than 5 ms each anyway. It only takes more time when one particularly slow task which which can't be split in smaller quicker tasks is being executed, like writing to the SSD1306 display via I2C.

All the Teensies are still running without rebooting or crashing. I think we finally managed to solve the issue! I am very grateful for all the help received from this forum! Thanks!

That's great. By using a time-slice long enough that thread swaps occur only via thread.yield() and thread.delay(), you are avoiding preemption and using TeensyThreads as a cooperative RTOS. That means you don't have to worry about protecting calls to malloc. It should be okay for a thread switch to occur within i/o to I2C or UART as long as ISRs and other threads don't do i/o on the same device. For example, an I2C driver could yield() during a long operation such as waiting for an i/o operation to complete.
 
That's great. By using a time-slice long enough that thread swaps occur only via thread.yield() and thread.delay(), you are avoiding preemption and using TeensyThreads as a cooperative RTOS. That means you don't have to worry about protecting calls to malloc. It should be okay for a thread switch to occur within i/o to I2C or UART as long as ISRs and other threads don't do i/o on the same device. For example, an I2C driver could yield() during a long operation such as waiting for an i/o operation to complete.

Yep, I think not only malloc(), but also the other shared resources I was protecting with locks such as Serial, EEPROM, SD and the command queue don't need to be protected anymore, as one thread always starts and concludes the operation before the other starts. I have checked my code and there are no threads.yield() or threads.delay() while using those resources, so there is no context switch and I think it will be fine.

One more day has passed and no restarts or crashes :)
 
Yes, the problem seems pretty obvious. new/malloc() is using TOS (which grows downwards) minus top_of_heap to determine how much unallocated space exists. It assumes TOS is above top_of_heap cuz thats the way the program is linked, and how that malloc is written(). In teensyThreads, which allocate thread stacks out of the heap, TOS will be below the top_of_heap (it will be inside the heap) when running inside of a thread. This causes malloc to fail, as it thinks the heap has grown past the TOS and the difference is a negative number.

I don't see any good simple solution. I'll have to understand how teensyThreads schedules and executes threads (could be, likely is, on a timer() interrupt), and then see if the fix could be either shoe-horned into malloc(), or could somehow be provided by writing a wrapper around it that is teensyThread aware. Note further that malloc() is likely not-reentrant, and so would further need to be protected with a semaphore (volatile memory int) if it is indeed called from interrupts and does not (I would hope) disable interrupts.

It's a shame I discovered this at the tail end of a couple of weeks of writing code to use teeensyThreads to effect a of serial file server. Had it all working except for dynamic allocation within the threads when I got to the most complicated, final part of the implementation. Now I have to rethink it entirely.
 
Did you read the last few messages of the thread? There are lots of things in Arduino/Teensy that are non-reentrant, so a "generic" solution is set a long time slice so that all task switches occur via yield().
 
There is another solution, at least for stdlib functions like malloc, write, read, etc: override the locks and locking functions (all the stuff in sys/locks.h) that are built-in to newlib. That is after all what they're designed for.

It doesn't solve the problem of Teensy-specific libraries not being re-entrant but that's a lot less of an issue than things like new/delete, STL classes using dynamic memory, and so on.
 
The solution I settled on to solve the teensyThreads malloc() problem was to intercept sbrk() and have it invariantly use the MSP ('main' stack pointer), rather than the current executing stack pointer. Paul made sbrk() weakly linked in the teensy3 core code for good reason. I copied his source for sbrk() to my main ino file, and had to modify exactly 4 characters of his source code (after discounting the copy-and-paste used to get a version of sbrk() that compiled) to get it to work.

Code:
//--------------------------------------------------
// teensyThreads teensy 3.x specific malloc fix
//--------------------------------------------------
// sbrk() and STACK_MARGIN define copied from
// 	C:\Program Files (x86)\Arduino\hardware\teensy\avr\cores\teensy3\mk20dx128.c
// Also needed to include <errno.h> and declare extern char* __brkval;
//     but otherwise, all I changed was the one line of inline assembly

extern "C" {

    #include <errno.h>

    #ifndef STACK_MARGIN
        #if defined(__MKL26Z64__)
            #define STACK_MARGIN  512
        #elif defined(__MK20DX128__)
            #define STACK_MARGIN  1024
        #elif defined(__MK20DX256__)
            #define STACK_MARGIN  4096
        #elif defined(__MK64FX512__) || defined(__MK66FX1M0__)
            #define STACK_MARGIN  8192
        #else
            #define STACK_MARGIN  1024    
                // I din't bother to figure out what the default should; 
                // I just stuck 1024 here
        #endif
    #endif

    extern char* __brkval;      // current end (top) of heap

    void * _sbrk(int incr)
        // copied from C:\Program Files (x86)\Arduino\hardware\teensy\avr\cores\teensy3\mk20dx128.c
    {
        char *prev, *stack;
        prev = __brkval;
        if (incr != 0) {
            // instead of following asm which gets sp register
            // __asm__ volatile("mov %0, sp" : "=r" (stack) ::);
            // we get the msp ('main' thread) stack register like this:
            __asm__ volatile("mrs %0, msp" : "=r" (stack) ::);
            if (prev + incr >= stack - STACK_MARGIN) {
                errno = ENOMEM;
                return (void *)-1;
            }
        __brkval = prev + incr;
        }
        return prev;
    }

}    // extern "C"

You can cut and paste this into your teensy3 ino file and it will probably work.

This doesn't solve the re-entrancy problem, but in my code that will be easy to fix by merely bracketing the few new/malloc() and delete/free() calls (along with SDFat calls) that I make with a semaphore. I'm not doing SPI, I2C, etc, etc, etc, from my threads. I just din't want to have to write malloc or my own suballocator) tonight ... I'm in the middle of other things ... so this works for me. Hope someone finds it useful.
 
Did you read the last few messages of the thread? There are lots of things in Arduino/Teensy that are non-reentrant, so a "generic" solution is set a long time slice so that all task switches occur via yield().

the length of the time slice wont fix the basic disagreement between teensyThreads and malloc ... though I think I saw some other work arounds in previous posts, I think the one I suggested is kinda simple and elegant.
 
My final solutions - still using those

As this thread became long, I will add to phorton1 solution above and try to summarize my final solution to both problems related with TeensyThreads (non-reentrant code and malloc/new). I hope that someone finds this useful.

Problem: malloc() does not work correctly inside thread
Solution: When creating a thread with TeensyThreads, if you don't provide a buffer to be the new thread's stack, the library will use malloc() to create the thread stack on Teensy's heap. This is the problem, but the explanation is more technical (check the previous posts from this thread). The solution is to provide a statically allocated buffer (whose memory lives in the stack) to the threads.addThread() function. But doing this, you also need to make sure that the statically allocated buffer never goes out of scope:

Instead of doing this:
Code:
#include <TeensyThreads.h>

void thread_func(){
  // thread code that includes malloc()
}

void setup() {
  threads.addThread(thread_func);
}

void loop() {
  // main loop
}

do this:

Code:
#include <TeensyThreads.h>

void thread_func(){
  // thread code that includes malloc()
}

void setup() {
  // usual setup, without adding threads here
}

void myLoop(){
  // Everything that was previously inside loop() goes here
}

void loop() {
  char threadStack[1024] = ""; // Statically allocated thread stack
  memset((void*)threadStack, 0, 1024); // Clear stack
  threads.addThread(thread_func, 0, 1024, threadStack); //Create the thread using the provided stack

  while(1) // Loop forever
  {
    myLoop();
    yield();
  }
}

Ps: This is the case for Teensy 3.x. Teensy 4.x doesn't seem to be affected by malloc() problems inside the thread because malloc() will allocate memory in RAM2, which is apart from RAM1 where code and stack are located.

Problem: The majority of Teensy code is non-reentrant, so a thread cannot be interrupted while it is performing some operations
Solution: Increase the time-slice for each thread to be a very large, like 10 or 20 seconds, with the intention of never letting the timer interrupt a running thread for context switching. Instead, use the cooperative threading approach: call threads.yield() at the end of threads loops or tasks to give back time for other threads to run in appropriate moments. Always use threads.delay() instead of delay, so other threads can run while one is busy waiting. If you yield() frequently enough, the code will run similarly as if the time slices were much smaller, and all threads will remain responsive.
 
Back
Top