Teensy 3.x multithreading library first release

Is it possible to set a thread priority using this library, such that a higher priority thread will preempt a lower priority thread?
 
Me again Brian.

I've been digging into C++ threads again for another project, and I don't think C++11 or C++14 implements a setPriority command, so I doubt that TeensyThreads does either. I think it is supported in the Windows and Linux implementations of threads. So unless ftrias implements it in his library, I don't think the current version has a command to set priority.
 
I didn't see it while following along either. GitHub doesn't address priority, except that the first item blocked on a lock will be called next.

Adding selectivity to time order queueing would really impact overhead of what seems to be a simple thread switcher.
 
I just wanted to say thanks for this library. It solved a problem with my POV wheel project. It spins at about 900 rpm (I'm planning to go faster) and has 32 APA102 LEDs, with each rotation divided into 360 segments. I was struggling with too much CPU use during the "render" phase, which caused some of the output segments to be skipped. Even overclocking to 120, 144 or 168 MHz wasn't enough.

I'm using a simple double buffer: rendering to one buffer while the other is read for output to the LEDs. Some of the animations and graphics I'm rendering at about 5 fps needed more CPU cycles than I had to spare between segment outputs. I was doing every trick in the book to speed up the rendering, like using ATAN, SIN and COS lookup tables and a lookup table for converting polar coordinates to Cartesian coordinates. I was even starting to plan out my own "half-assed" threading scheme. All I needed was to break up the rendering phase into chunks until it was done, then flip the buffer.

After lots of failed "Arduino" searches, I found this thread... and voilà! I can take as long as I need on my rendering thread now, with almost no impact on the output thread at all. I even removed the COS and SIN lookup tables and everything is working perfectly.

I still have a couple of math precision issues to work out, and I have to be sure to leave enough clock ticks for the FastLED library to send its output. But other than that, it's looking good!! :cool:
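
For anyone wanting to try the same approach, here is a minimal sketch of the double-buffer flip described above, written in plain C++ so it can be tried off-target. The names (DoubleBuffer, Frame, pixelIndex) and the 32-bit colour format are assumptions for illustration; they are not taken from the poster's project or from TeensyThreads.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstdint>

constexpr int kSegments = 360;  // segments per rotation (from the post)
constexpr int kLeds = 32;       // LEDs per segment (from the post)

// One full rotation of pixel data; the 32-bit colour format is an assumption.
using Frame = std::array<std::uint32_t, kSegments * kLeds>;

struct DoubleBuffer {
  Frame frames[2] = {};
  std::atomic<int> front{0};  // index of the frame the output side reads

  // Frame the render thread draws into; never the one being displayed.
  Frame& back() { return frames[1 - front.load()]; }

  // Frame the output thread reads from.
  const Frame& visible() const { return frames[front.load()]; }

  // Called by the render thread once a frame is complete.
  void flip() { front.store(1 - front.load()); }
};

// Index of one LED within a frame.
inline int pixelIndex(int segment, int led) {
  assert(segment >= 0 && segment < kSegments);
  assert(led >= 0 && led < kLeds);
  return segment * kLeds + led;
}
```

The render thread can take as long as it needs filling back() and calls flip() only when the frame is finished, while the output thread keeps reading visible() segment by segment.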
 
I helped someone with a project yesterday and had a look at TeensyThreads and found some flaws in it. I hope this can help.

1. The way the FPU registers are saved and restored unconditionally is very inefficient. Cortex uses lazy stacking, and all you have to do is this (google it for technical info):

// save
tst r14, #0x10
it eq
vstmdbeq r0!, {s16-s31}

// restore
tst r14, #0x10
it eq
vldmiaeq r0!, {s16-s31}

2. The threads_svcall_isr interrupt should have the lowest priority (0xff) on the system. Otherwise it’s impossible to yield from any interrupt with lower or same priority.

3. The use of noinline attributes makes no sense. A piece of advice: always develop and test your code with the optimizer on full throttle, -O3 (don’t use LTO by default). You can always turn it off if you want to debug. It’s always easier to discover and correct errors right away, and the optimizer normally doesn’t break anything if the code is written correctly. (Be careful, the Internet is awash with broken examples of inline asm.)

4. Interrupts are disabled longer than necessary. It is recommended to use BASEPRI instead of disabling all interrupts, but that change is up to PJRC or it will break compatibility. FreeRTOS has done a decent job; look at the small file Source/portable/GCC/ARM_CM4F/port.c
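
The `tst r14, #0x10` / `it eq` pair in point 1 tests bit 4 of EXC_RETURN, the value held in r14 on exception entry. On ARMv7-M that bit is clear when the interrupted code had a floating-point context. The same check in C++, as a sketch only (needsFpuSave is a made-up helper name, not part of TeensyThreads):

```cpp
#include <cassert>
#include <cstdint>

// Bit 4 of EXC_RETURN: set means a basic frame with no FPU state.
constexpr std::uint32_t kExcReturnNoFpuFrame = 0x10;

// True when the interrupted thread had an active FPU context, i.e. when
// s16-s31 need to be saved and restored along with the integer registers.
inline bool needsFpuSave(std::uint32_t excReturn) {
  // Sanity check: EXC_RETURN values have the top byte set to 0xFF.
  assert((excReturn & 0xFF000000u) == 0xFF000000u);
  return (excReturn & kExcReturnNoFpuFrame) == 0;
}
```

For example, 0xFFFFFFED has bit 4 clear (FPU context present) while 0xFFFFFFFD has it set (no FPU context), so only the first case pays for the sixteen extra registers.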
 
Thank you for looking at the code and pointing out these very interesting optimizations. It has been over 2 years since I wrote this code so I don't recall many of the specifics. Can you create a patch or a change in git for #1 and #2?

For #3, I presume you are referring to the noinline Threads::Mutex::lock/unlock? I don't recall the details but there was a problem with one of the compiler settings. Perhaps when link-time optimizations are enabled? I'll have to read through my notes. Maybe it's not relevant any more.

For #4, it seems like a good idea but, like you, I don't know what will break. So maybe it's something to note for the future.

Hi,

First of all, thank you for this great library. I just want to be sure that I understood the threading right.

I coded:
Code:
threads.setSliceMicros(100);
const int instrumentUpdateThread = threads.addThread(updataInstValuesThreadFunction);
threads.setTimeSlice(threads.id(), 5);// This is the main loop as thread
threads.setTimeSlice(instrumentUpdateThread, 10);// This is an additional thread
…..

void loop()
{
  // ….. do things here (costs approx. 400 µsec)
threads.delay(20);    // Will give 20 msec to other threads ?
}

void updataInstValuesThreadFunction()
{
     int i = 0;
    do
    {
      i++;
    }while(true);
}

OK, as far as I understand the threads:
Setting "setSliceMicros" to 100 => means 100 ticks is one time slice
Setting "setTimeSlice(…)" to 5 => means the main loop will run for 5 * 100 = 500 ticks without a context switch?
=> so if 1 tick is approx. 1 µsec => the "do things here" part will not be interrupted?
=> then threads.delay(20) will switch to "updataInstValuesThreadFunction", which now has 20 ms => 20000 µsec to do things and will be called multiple times?
What effect does raising the number of ticks available to one thread have? Higher priority and/or fewer context switches?

Thank you very much for your help

Torsten
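
The arithmetic in the question can be written down as a small sketch. It ASSUMES that the value given to setSliceMicros() is the length of one tick in microseconds and that setTimeSlice() sets how many ticks a thread may run before it is switched out; that reading is not confirmed anywhere in this thread. The helper names are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Longest uninterrupted run of a thread, in microseconds, ASSUMING
// tickMicros is the value given to setSliceMicros() and ticks is the
// value given to setTimeSlice() for that thread.
inline std::uint32_t sliceMicros(std::uint32_t tickMicros, std::uint32_t ticks) {
  assert(tickMicros > 0 && ticks > 0);
  return tickMicros * ticks;
}

// How many full slices of a given length fit into a delay in milliseconds.
inline std::uint32_t slicesDuringDelay(std::uint32_t delayMs, std::uint32_t sliceUs) {
  assert(sliceUs > 0);
  return (delayMs * 1000u) / sliceUs;
}
```

Under that reading, the main loop would get up to sliceMicros(100, 5) = 500 µs per turn, which covers the roughly 400 µs of work, and the second thread would get sliceMicros(100, 10) = 1000 µs per turn, so about 20 of its slices fit into the 20 ms delay.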
 
Hi all, I am trying to use the TeensyThreads library to run a function with 2 input parameters in a thread, but I cannot make it work. I was unable to find an example of this case. Can anyone please show how to do it? This is my poor attempt...

Code:
#include <TeensyThreads.h>

volatile int count = 0;

void pattern(void* arg[]){
  while(1){
    count = count + arg[0]*arg[1];
  }
}



void setup() {
  Serial.begin(9600);
  int a[2] = {10, 20};
  threads.addThread(pattern, a);
}


void loop() {
  Serial.println(count);
}
 
Start by not passing in a pointer to an array that is local to the function. I have a feeling that might be your problem.
 
That could easily be tested, by moving int a[2] = {10, 20}; outside setup() or by making the array static "static int a[2] = {10, 20}; "
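
Beyond the lifetime issue, the two values also have to reach the thread through a single pointer. One common way is to pack them into a struct with static storage. The sketch below shows only that part in plain C++. PatternArgs, patternArgs, patternCount and patternStep are made-up names, and the idea that the thread would then be started with something like threads.addThread(pattern, &patternArgs) is an assumption based on the call in the question, not something verified against the library.

```cpp
#include <cassert>

// Both parameters travel in one struct, because a thread entry point of
// the form void f(void*) receives a single pointer.
struct PatternArgs {
  int a;
  int b;
};

// Static storage duration: still valid after setup() has returned.
static PatternArgs patternArgs = {10, 20};

volatile int patternCount = 0;

// One pass of the thread body; on the Teensy this would sit inside while (1).
void patternStep(void* arg) {
  assert(arg != nullptr);
  const PatternArgs* p = static_cast<const PatternArgs*>(arg);
  patternCount = patternCount + p->a * p->b;
}
```

The cast back from void* is what the original attempt was missing: arithmetic on void* elements does not compile, while the typed struct pointer gives access to both values.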
 
Hi
I want to know if I can use Serial, Serial1 and Serial2 in different threads (one Serial object per thread) without risk of losing data. Do I need to add a protection mechanism like a mutex?
Eric
 
Mutexes are only needed if you plan to use one serial object in 2 or more threads. If you are using the serial object ONLY in one thread, no mutex is needed.
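
For the shared case, the usual pattern is to hold the lock for one complete message. The sketch below uses the standard std::mutex and std::lock_guard as stand-ins so it can be compiled anywhere; on a Teensy the library's own mutex type would take their place. SharedPort is a made-up type that simply records what was written.

```cpp
#include <cassert>
#include <mutex>
#include <string>

// Stand-in for a serial port that two or more threads write to.
struct SharedPort {
  std::mutex mtx;    // guards `sent`
  std::string sent;  // everything written so far

  // Writes one complete line while holding the lock, so lines from
  // different threads cannot interleave mid-line.
  void writeLine(const std::string& line) {
    assert(line.find('\n') == std::string::npos);
    std::lock_guard<std::mutex> guard(mtx);  // released when guard goes out of scope
    sent += line;
    sent += '\n';
  }
};
```

Locking per line rather than per character is a design choice: it keeps each message intact at the cost of making other writers wait slightly longer.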
 
OK, thanks tonton81.
Some more questions about mutexes.
1) I have seen 2 methods to get a mutex: first with lock() and second with Scope. With the first, it is possible to pass a timeout as an argument, but what happens if the timeout expires? Is the code section between lock() and unlock() skipped?
2) With the Scope method, what happens if the mutex is not free? Does it wait indefinitely?
3) In my project, I want to use a class to pass data between threads. Do I need to protect all read/write actions with a mutex, or does the class mechanism protect them?
4) In the library example, ftrias uses volatile to pass data between threads. Do you know if this declaration protects against concurrency problems?
Thanks
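
On question 1, nothing in C++ skips code automatically when a lock attempt fails, so the caller has to test the result of the lock call itself. A sketch of that pattern follows. It ASSUMES, without having verified it against the library, that a timed lock() returns nonzero when the mutex was obtained and 0 when the wait timed out. StandInMutex and writeIfLocked are made-up names, and StandInMutex only imitates that assumed return convention.

```cpp
#include <cassert>

// Made-up stand-in with the ASSUMED shape of a timed lock:
// lock() returns 1 when acquired and 0 when the wait timed out.
struct StandInMutex {
  bool held = false;
  int lock(unsigned /*timeoutMs*/) {
    if (held) return 0;  // pretend the timeout expired
    held = true;
    return 1;
  }
  int unlock() {
    assert(held);
    held = false;
    return 1;
  }
};

// Writes `value` to `shared` only if the lock was obtained in time.
template <typename MutexT>
bool writeIfLocked(MutexT& m, int& shared, int value, unsigned timeoutMs) {
  if (!m.lock(timeoutMs)) {
    return false;  // timed out: the protected section is skipped on purpose
  }
  shared = value;
  m.unlock();
  return true;
}
```

If the return value is ignored, the code between lock() and unlock() runs anyway, without protection, which is the situation the question is worried about.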
 
Hi,

anyone tried TeensyThreads on a teensy 4? I tried it and get an error that says that IRQ_PIT_CH0 is not declared…


TeensyThreads.cpp: In member function int Threads::setMicroTimer(int)

TeensyThreads.cpp: 260:46: error: 'IRQ_PIT_CH0' was not declared in this scope
int number = (IRQ_NUMBER_t)context_timer - IRQ_PIT_CH0
Error compiling libraries
Debug build failed for project 'T4 Test'

@ftrias are there plans to adapt this great library to the T4?

Thank you

Torsten
 
@Spyy

Yes I have a hacked up version of TeensyThreads that seems to work on the T4. Only piece that you won't be able to use is "setMicroTimer". Ran into the same problem. If you want to give it a try: https://github.com/mjs513/TeensyThre...ensyThreads_t4.

My notes on what I did and still open questions are in this post: https://forum.pjrc.com/threads/54711-Teensy-4-0-First-Beta-Test?p=211460&viewfull=1#post211460

I did a couple of tests and it seems to work ok. I did post an issue to @ftrias repository.
 
Thanks, @mjs513, for porting the library to the T4. Your changes look great. However, I do not currently own a T4 so I haven't tested them. I will try to buy one soon and try it out.
 
Paul was kind enough to send me a Teensy 4 so I was able to look into porting TeensyThreads. As a starting point, I used @mjs513's excellent code at https://forum.pjrc.com/threads/54711-Teensy-4-0-First-Beta-Test?p=211460&viewfull=1#post211460.

The handling of the missing "unused_isr()" function seems reasonable.

However, I am not so sure about the setMicroTimer() code. It seems that the new T4 only has one interrupt for handling all PIT timers (vs T3 has 4 interrupts). On T3, TeensyThreads will take over one of those interrupts and thus you would only be able to use 3 IntervalTimers, which seems like a fair tradeoff. But on the T4, if TeensyThreads takes over the interrupt, it will disable the use of IntervalTimer. Is that acceptable? Perhaps I am missing something.

This is the implementation I'm thinking of using:

Code:
int Threads::setMicroTimer(int tick_microseconds)
{
  // lowest priority so we don't interrupt other interrupts
  context_timer.priority(255);
  // start timer with dummy function
  if (context_timer.begin(context_pit_empty, tick_microseconds) == 0) {
    // failed to set the timer!
    return 0;
  }
  currentUseSystick = 0; // disable Systick calls

#ifdef __IMXRT1062__
  attachInterruptVector(IRQ_PIT, context_switch_pit_isr);
#else
  // get the PIT number [0-3] (IntervalTimer overrides IRQ_NUMBER_t op)
  int number = (IRQ_NUMBER_t)context_timer - IRQ_PIT_CH0;
  // calculate number of uint32_t per PIT; should be 4.
  // Not hard-coded in case this changes in future CPUs.
  const int width = (PIT_TFLG1 - PIT_TFLG0) / sizeof(int);
  // get the right flag to acknowledge PIT interrupt
  context_timer_flag = &PIT_TFLG0 + (width * number);
  attachInterruptVector(context_timer, context_switch_pit_isr);
#endif

  return 1;
}
 
@ftrias

You may want to read this post in the beta thread on IntervalTimer: https://forum.pjrc.com/threads/54711-Teensy-4-0-First-Beta-Test?p=195605&viewfull=1#post195605. The fix, though, is in post 884 of the same thread, a few posts down. It seems that while you have only 1 PIT interrupt, you can test the channel to tell which of the 4 available interval timers fired. I think :)

EDIT: PS. Thanks for the compliment, but I didn't do anything except try to figure out the interval timer; the code is all yours. There is no way I could have written TeensyThreads.
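
The idea of one shared PIT interrupt that tests each channel can be sketched off-target with simulated flag words. Everything here is hypothetical: FakePit, ChannelHandler and dispatchPit are made-up names, and the flags are plain variables rather than hardware registers (on the real PIT a flag is acknowledged by writing 1 to it, whereas the simulation just clears the word).

```cpp
#include <cassert>
#include <cstdint>

constexpr int kPitChannels = 4;

// Simulated per-channel timer flag words; bit 0 set means "this channel fired".
struct FakePit {
  std::uint32_t flag[kPitChannels] = {0, 0, 0, 0};
};

using ChannelHandler = void (*)();

// Shared ISR body: check every channel, clear the flags that are set and
// call the matching handler. Returns how many channels were serviced.
inline int dispatchPit(FakePit& pit, const ChannelHandler* handlers) {
  assert(handlers != nullptr);
  int serviced = 0;
  for (int ch = 0; ch < kPitChannels; ++ch) {
    if ((pit.flag[ch] & 1u) == 0) continue;  // this channel did not fire
    pit.flag[ch] = 0;                        // acknowledge (simulated)
    if (handlers[ch] != nullptr) handlers[ch]();
    ++serviced;
  }
  return serviced;
}
```

This only shows how several timers could share one interrupt. It does not address the separate constraint, raised elsewhere in the thread, about how the context switch routine itself has to be entered.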
 
The problem is that the context_switch_isr() function as written must be called directly as an interrupt. It unrolls the special interrupt call stack frame to figure out where it was called from, in order to know where to jump to on subsequent context switches. It can't be called in the normal way from within another function. This is why I have to take over the ISR function instead of using EventTimer or something like that.

I downloaded the RT1060 reference and see that there is a General Purpose Timer (GPT) that might help (Section 51, page 3073). It does not appear to be used by T4. Is that so? I could try to use that, but this would be a fair amount of work, especially since the T4 does not have JTAG/SWD debugging access.

EDIT: I found this code: https://github.com/manitou48/teensy4/blob/master/gpt_isr.ino that has all the setup laid out for me so this should be relatively easy. So I think the plan is to not use systick and IntervalTimer on the T4, but rather to simply take over one of the GPT timers.
 
@ftrias

Never realized that - thanks for the explanation.

Glad you found the stuff that @manitou put together - he has done a lot with timers.

Good luck
 
I committed this to my github: https://github.com/ftrias/TeensyThreads/commit/d61579a76428e59e9548204312a7b36da85df59c

For Teensy 3, it works the same as before. For Teensy 4, it uses an unused GPT timer to schedule context switches. Thus, it removes the dependency on SysTick and IntervalTimer. If both GPT timers are in use, it will fail. Perhaps in the future it should default to the old method if that's the case. But for now, I just wanted to put something out there that works.
 
Nice.

I wondered about the T4 GPT resolution … it looks like it can do better than 1 ms, with a 24 MHz clock for the GPTs :: int setMicroTimer(int tick_microseconds …

Code:
int Threads::setMicroTimer(int tick_microseconds)
{
#ifdef __IMXRT1062__
  gtp1_init(tick_microseconds);
#else
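
To put a number on the resolution remark: at 24 MHz one microsecond is 24 timer counts, so sub-millisecond ticks are well within range. A small sketch of the conversion, where gptCountsForMicros is a made-up helper and not a function from the library or the commit:

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint32_t kGptClockHz = 24000000u;  // 24 MHz, as stated above

// Number of GPT clock counts in the requested period.
inline std::uint32_t gptCountsForMicros(std::uint32_t tickMicros) {
  assert(tickMicros > 0);
  // Keep the product inside 32 bits (roughly 178 seconds at 24 MHz).
  assert(tickMicros <= UINT32_MAX / (kGptClockHz / 1000000u));
  return tickMicros * (kGptClockHz / 1000000u);
}
```

A 100 µs tick comes out as 2400 counts and a 1 ms tick as 24000. Whether the compare register wants this value or one less depends on the counter mode, which I have not checked.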
 