RTOS no longer usable with Teensy?

Bill Greiman

Looks like EventResponder prevents use of FreeRTOS, ChibiOS and other RTOSs.

D:\Teensy\teensy139\hardware\teensy\avr\cores\teensy3/EventResponder.cpp:86: multiple definition of `pendablesrvreq_isr'
D:\Teensy\teensy139\hardware\teensy\avr\cores\teensy3/EventResponder.cpp:45: multiple definition of `systick_isr'

Should I remove Teensy support from ChibiOS/RT, FreeRTOS, and NilRTOS on GitHub?

I don't see any weak hook or other mechanism to use SysTick or PendSV.

I am directing issues from GitHub to this post.
 
From the comments at line 323 of EventResponder.cpp:

// Long ago you could install your own systick interrupt handler by just
// creating your own systick_isr() function. No longer. But if you
// *really* want to commandeer systick, you can still do so by writing
// your function into the RAM-based vector table.
//
// _VectorsRam[15] = my_systick_function;
 
I assume the same is true for PendSV.

Code:
void pendablesrvreq_isr(void)
{
	EventResponder::runFromInterrupt();
}

It's probably time to give up trying to support an RTOS on Arduino-style systems. It is nearly impossible to make an RTOS library that simultaneously supports AVR, SAMD, SAMX, STM32, and Teensy3.

Use of SysTick is not the problem. In fact, modern RTOSs don't need a periodic tick; they are tickless, which makes them more efficient and makes power management easier. I only enabled a tick so the RTOS would be synchronized with millis/micros. Tickless Cortex-M RTOSs use a 32-bit timer that interrupts only when required.

PendSV is required.

The problem is too many versions of CMSIS, too many ways to access features like PendSV/SysTick, the need to restart in Thread Mode/Handler Mode, the use of processor priorities, the fact that libraries and key functions like malloc are not thread safe, plus more...

This means the code is not easily supportable/maintainable.

The best way to support Arduino with an RTOS is to emulate the Arduino API on the RTOS. Modern RTOSs support a special class of “Fast Interrupts”, such interrupt sources have a higher hardware priority than the kernel so you can support devices with no RTOS overhead. This allows a clean partition between threads and extremely fast code.

Systems like Particle Photon/Electron do this so they can better support networking and USB.
 
@Bill - Paul's goal is to move EventResponder into Teensy core to create:

... In essence, this is a very lightweight cooperative system. Like all cooperative sharing systems, time taken for 1 task can stall others. Trade-offs in multitasking are a huge subject that's been discussed many times over. The idea is the core library will provide a default implementation that's as simple and lightweight as possible. My hope is more complex systems like RTOS libs can successfully override these defaults to provide API compatible functionality, so in the long term we can craft libraries that will (hopefully) be far more compatible with multitasking systems than today's status quo.
...
trying the best I can to move Teensy (and hopefully eventually Arduino) forward towards better support for event-based concurrency that defaults to the simplest possible cooperative system and can be extended to integrate nicely with more powerful cooperative and preemptive threading systems.

This and some related notes here: Minimal-Blink-fails-with-void-yield()

If done well, that would make Teensy (if not all of Arduino) more RTOS friendly. Teensy is only one family, but with "the new iMXRT chip" sitting on Paul's desk, not having robust cooperative multitasking or RTOS support would really limit its full 600 MHz potential.
 
The problems of taking full advantage of a modern RTOS in the Arduino environment with new Cortex M7 and multi-processor chips are overwhelming. It's no longer about a scheduler. You need a whole new foundation and HAL to support modern devices and modern protocols.

Arduino is great for close to bare hardware access. Most people using Arduino don't need or don't want an RTOS. Many people want to do a single task really fast.

I spent my career in a different world. An RTOS is more about reliability than speed.

What an RTOS is

An RTOS is an operating system whose internal processes are guaranteed to be compliant with (hard or soft) realtime requirements. The fundamental qualities of an RTOS are:

Predictability. The scheduling behavior is predictable.
Determinism. The system consistently produces the same results under the same conditions.

RTOSes are often confused with “fast” operating systems. While efficiency is a positive attribute of an RTOS, efficiency alone does not qualify an OS as an RTOS, though it can separate a good RTOS from a mediocre one.

What an RTOS is not

An RTOS is not a magic wand, your system will not be “realtime” just because you are using an RTOS, what matters is your system design. The RTOS itself is just a toolbox that offers you the required tools for creating a realtime system, you can use the tools correctly or in the wrong way.

Soon after the near crash of the Apollo 11 lander, which used a coop scheduler, theorems like this appeared:

Theorem Liu and Layland 1973. Given a preemptive, fixed priority scheduler and a finite set of repeating tasks T = {T1; T2; ...; Tn} with associated periods {p1; p2 ...; pn} and no precedence constraints, if any priority assignment yields a feasible schedule, then the rate monotonic priority assignment yields a feasible schedule.

Liu and Layland also derived a bound on CPU utilization that guarantees there will be a feasible Rate Monotonic Schedule when a set of n tasks have CPU utilization less than the bound.
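
For concreteness, the Liu and Layland bound for n tasks is n(2^(1/n) - 1): 1.0 for one task, about 0.828 for two, falling toward ln 2 (about 0.693) as n grows. A minimal sketch of the utilization test follows, assuming each task's deadline equals its period. The names Task, rmBound, utilization, and rmSchedulable are made up for illustration and come from no particular RTOS.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// One periodic task: worst-case execution time and period, same time unit.
// Assumption: the deadline of each job is the end of its period.
struct Task {
  double wcet;
  double period;
};

// Liu and Layland utilization bound for n tasks: n * (2^(1/n) - 1).
double rmBound(std::size_t n) {
  assert(n > 0);
  const double dn = static_cast<double>(n);
  return dn * (std::pow(2.0, 1.0 / dn) - 1.0);
}

// Total CPU utilization: the sum of wcet / period over all tasks.
double utilization(const std::vector<Task>& tasks) {
  double u = 0.0;
  for (const Task& t : tasks) u += t.wcet / t.period;
  return u;
}

// Sufficient (not necessary) test. True means rate monotonic priorities
// are guaranteed to meet every deadline; false is inconclusive.
bool rmSchedulable(const std::vector<Task>& tasks) {
  return !tasks.empty() && utilization(tasks) <= rmBound(tasks.size());
}
```

The test is sufficient but not necessary: a task set above the bound may still be schedulable, and showing that needs an exact response-time analysis.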

It was also shown that a non-preemptive scheduler can miss deadlines that a preemptive scheduler would meet.

This type of theorem implied that preemptive scheduling is necessary and that you can use static analysis of machine code to improve reliability. This is done today in many critical systems.

I have been using RTOSs since the early 1970s.

For speed, we started designing custom chips like specialized CCDs merged with flash ADCs. By we I mean other physicists on large experiments. This allows you to capture analog data at the equivalent of 1 PB/sec (petabyte per second) at LHC but only digitize a small fraction. You can now buy this type of device from Maxim Integrated and others.
 
No doubt a true RTOS would take a different route. But a good stable robust effort to bridge the gap from a kid friendly learning tool to a powerful and cycle efficient development platform looks to be possible. Teensy seems to be in general the best platform for Arduino from a cost/size/power/capability/support basis - with improved cooperative tasking it may define the new direction for Arduino power users. If only it had a Wiki
 
That's your call Bill. Just don't say it was my doing with EventResponder that made things impossible. You *do* have a way to commandeer those interrupts.
No, commandeering interrupts is not the problem. This change made me realize that it is not worth porting an RTOS when you get almost none of the new functionality.

As I said above, the path of open source RTOSs does not fit with Arduino-like systems. A scheduler is not an "RTOS".

It's not just changes to Teensy, all Arduino branches are diverging at the level of porting an RTOS. The standard Arduino cores for AVR, SAMD, and SAMX each have different problems. STM32duino is very different.

An RTOS is all of these drivers and protocols optimized for the kernel. You can't make this work with any form of "Arduino Core", certainly not all versions of Arduino.

The HAL includes a rich set of features that make writing applications simpler. The user does not need to understand all the inner workings of the MCU.

Rich set of device driver models:
ADC, abstraction of ADC units with streaming and callback capability.
CAN, abstraction of CAN bus units.
DAC, abstraction of DAC units with streaming and callback capability.
EXT, abstraction of external interrupts with callback capability.
GPT, abstraction of one-shot or continuous timers with callback capability.
I2C, abstraction of an I2C master.
I2S, abstraction of an I2S master or slave with streaming and callback capability.
ICU, abstraction of a PWM-input unit with callback capability.
MAC, abstraction of an Ethernet interface with events generation capability.
PAL, abstraction of GPIO initialization, configuration and use.
PWM, abstraction of a PWM-output unit with callback capability and multiple channels handling.
RTC, abstraction of an RTC clock unit with alarms callback capability.
SDC, abstraction of an SDIO interface to SD/MMC cards. Implements a block interface.
Serial, abstraction of a buffered serial port with events generation capability. Implements a channel interface.
SPI, abstraction of an SPI master with callback capability.
ST, abstraction of a system tick timer for RTOS support.
UART, abstraction of a UART unit with callback capability and unbuffered handling.
USB, abstraction of a USB device unit with callback capability.
Device drivers are configured using configuration structures, all the HW-dependent information is encapsulated in those structures, the API is invariant.
Complex device drivers.
Serial over USB, a CDC device is implemented on top of the USB driver. Implements a channel interface.
MMC/SD over SPI, MMC and SD cards handling on top of the SPI driver. Implements a block interface.
Abstract interfaces:
Streams. Abstract interface of a bidirectional stream; streams can be read and written.
Channels. Extends streams with events for use in asynchronous I/O, introduces timeouts.
Files. Extends streams for file I/O.
Block Devices. Abstraction over devices that handle data in blocks (MMC and SD cards for example).
Support functionalities:
I/O queues. Circular buffers handling.
Formatter. Printf-like functionality over streams.
Memory Stream. Stream-like object over a memory buffer.
Null Stream. Stream-like object that discards output.

Supports multiple interfaces and multiple IP addresses per interface (multihoming)
Support IPv4 and IPv6.
BSD socket API with most popular socket option.
SSL/TLS socket option.
Scalable to contain only required features and minimize memory footprint.

You can add useful improvements that matter to most users, but you will be limited by the Arduino foundation; it doesn't even use Thread/Handler mode. You can't follow the current RTOS paths to standard functionality and APIs. It probably is not the correct direction for a system like Teensy where people like to play with core internals.

Teensy is one system for one set of users so why do they need any of these layers?

A Hardware Abstraction Layer, HAL, is essentially a set of APIs designed to interact with hardware. A properly designed HAL provides developers with many benefits, such as code that is portable, reusable, lower cost, abstracted, and has fewer bugs.

The CMSIS-RTOS is a common API for real-time operating systems. It provides a standardized programming interface that is portable to many RTOSes and therefore enables software templates, middleware, libraries, and other components that work across the supported RTOS systems.

An operating system abstraction layer (OSAL) provides an application programming interface (API) to an abstract operating system making it easier and quicker to develop code for multiple software or hardware platforms.
 
I have been using the STM32 Cortex-M7 chip for almost two years; it was announced in mid-2015. It is more than twice as fast as an M4 chip running at the same clock frequency. In a few days I should get the new STM32H7, which starts at 400 MHz, and I expect 600 MHz versions soon.

I agree that you need multi-threading with this class of processor. I have also learned that an RTOS can't be added as a library if you want full performance.

You need a true priority based preemptive RTOS with these chips. You can present the single thread Arduino model to users that don't need the complexity but you can't start with a single threaded core and I/O libraries and get high performance by adding a scheduler.

The reason you need threading in the core is I/O. You need a good network stack. I can run 10 threads on an STM32F7, each with a 10 Mbit/s network connection, and get 99% of the 100 Mbit/s Ethernet bandwidth. It uses almost no CPU time.

Every I/O device needs a driver that is optimized for threads, you can't add it later. When an I/O driver uses DMA it must start the DMA and put the thread to sleep by waiting on a semaphore. The DMA done interrupt wakes the thread by giving the semaphore. Every modern RTOS uses this mechanism.

When I/O devices and memory structures are shared, you need to use a mutex. You can't add it later to user libraries.

If you don't do this, libraries will be written for external devices that are not thread safe/friendly and you are sunk.

I have been down this path many times since the early 1970s. From 1972-1976 I spent four years in a group that built the first multi-tasking OS for a Cray supercomputer. The first Cray came with a single threaded OS since classified users ran bomb simulations for days.

We found we could not add-on satisfactory multi-processing and started from scratch building a true multi-processor kernel.

In this period I also worked on a multi-tasking RTOS for Modcomp mini-computers for accelerator control in our physics experiments. Same result - you can't add multi-threading to a basic single threaded system without a multi-threaded kernel.

You really need multi-threading for iMXRT to do IoT with many devices.
 
Bill, how would you redesign the Arduino user-facing interface, so that the underlying architecture would work with typical/most RTOS-on-a-microcontroller needs?

For example, if the number of threads and their priorities are fixed at a compile time, I can see at least a couple of ways how to provide an interface similar to the current Arduino environment, say
Code:
#define  THREADS  3
#define  THREAD_1_PRIORITY  100
#define  THREAD_2_PRIORITY  50
#define  THREAD_3_PRIORITY  85

void begin_1(void) { ... }
void begin_2(void) { ... }
void begin_3(void) { ... }

void loop_1(void) { ... }
void loop_2(void) { ... }
void loop_3(void) { ... }
Event handlers could be implemented with some kind of message passing interface, or a similar facility as the above could be extended for them, too. The point is, the total number of threads, and their priorities, would be known at compile time; either explicitly as above, or via an extra source preprocessing pass.

Is that too strict a limitation for real world RTOS applications? (Such a limitation should avoid typical bugs, like resource starvation due to creating too many threads, or bugs in the red tape code when creating the threads; while making it very easy for users to implement multithreaded code.)
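
To make the compile-time restriction concrete, here is a minimal sketch in plain C++17 of a static thread table whose size, priorities, and stack budget are all checked by the compiler. Everything in it is hypothetical: ThreadEntry, kThreads, the stack sizes, the 4096-byte budget, and the convention that a larger number means a more urgent thread are assumptions for illustration, and no real kernel is involved.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical descriptor for one statically declared thread.
struct ThreadEntry {
  const char* name;
  void (*beginFn)();
  void (*loopFn)();
  int priority;            // assumption: larger value = more urgent
  std::size_t stackBytes;  // assumption: per-thread stack size
};

// Empty user functions standing in for begin_N()/loop_N() above.
void begin_1() {}
void begin_2() {}
void begin_3() {}
void loop_1() {}
void loop_2() {}
void loop_3() {}

// The whole thread set is a compile-time constant.
constexpr std::array<ThreadEntry, 3> kThreads{{
    {"thread1", begin_1, loop_1, 100, 512},
    {"thread2", begin_2, loop_2, 50, 256},
    {"thread3", begin_3, loop_3, 85, 768},
}};

// Sum of all stack sizes, usable in static_assert.
template <std::size_t N>
constexpr std::size_t totalStackBytes(const std::array<ThreadEntry, N>& t) {
  std::size_t sum = 0;
  for (std::size_t i = 0; i < N; ++i) sum += t[i].stackBytes;
  return sum;
}

// Index of the most urgent thread in the table.
template <std::size_t N>
constexpr std::size_t highestPriorityIndex(const std::array<ThreadEntry, N>& t) {
  std::size_t best = 0;
  for (std::size_t i = 1; i < N; ++i)
    if (t[i].priority > t[best].priority) best = i;
  return best;
}

// Resource starvation from "too many threads" becomes a build error.
static_assert(totalStackBytes(kThreads) <= 4096, "thread stacks exceed RAM budget");

// Call every begin function once, in table order; returns how many ran.
int runAllBegin() {
  int started = 0;
  for (const ThreadEntry& t : kThreads) {
    assert(t.beginFn != nullptr);
    t.beginFn();
    ++started;
  }
  return started;
}
```

The design choice is that nothing is created at run time, so there is no thread-creation code path that can fail.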

You see, it is obvious to me that at some point, the Arduino environment will need a pretty much complete rewrite/redesign, the kind that for example the Apache server has gone through, or how subsystems are replaced in the Linux kernel. Having practical knowledge on what users actually need (as opposed to what they want), and what can cater for the widest audience without severe compromises, is immensely important input to the new design. Everything points to microcontrollers being well suited for multithreaded operation already, right now -- not just the hardware, but also the tasks user-devs write code for --; even more so in the future. To retain the user base, and the interest of hobbyist developers, the superficial user-facing interface should remain largely similar, at least as easy as it is now to get started. We need fusion here.

(I personally have zero sway on Arduino (or on anyone else, really), so I am only interested in this as a user-developer, not an upstream dev. However, I do believe the current Arduino environment design is more than a bit wonky, and something better ought to come up pretty soon, before we get locked up in an Apple-esque or Microsoft-esque walled garden, due to lack of alternatives.)
 
Perhaps a related question is how one might improve the Arduino API so that it's later possible to change the underlying architecture *without* rewriting programs and libraries.

Many times I've tried asking this question of RTOS fans. The "answer" is almost always heavy on rhetoric or RTOS sales pitches and light on useful API feedback.
 
*without* rewriting programs and libraries
Unfortunately, I believe there is a similar effect to uncanny valley here, as evidenced by the Python 2 -> 3 transition (still underway for most developers; also most Apache server configurations at major version changes). I am pretty much certain that *some* changes to the API are necessary, so I'd rather see a proper redesign that fixes the design issues, rather than a minimal transition. (The way Python 3 treats environment variables and raw binary I/O is an example of how messy some features of a minimal redesign can be.)

So, rather than no-rewrites, I'm interested in "some-rewrites" where the changes lead to such obvious enhancements (from user-dev's point of view), that the rewrites are generally found to be well worth the effort. Easy multi-threading is, I believe, one of those.

Many times I've tried asking this question of RTOS fans. The "answer" is almost always heavy on rhetoric or RTOS sales pitches and light on useful API feedback.
Yes, this seems to always occur when two or more different design approaches clash.

It is rare to get a workable fusion out of the situation, too. Either you need a generalist who is crazily interested in both (but inexperienced, thus not limited to an established design approach), pretty much paranoid about everything (but still wants everything to work), and has zero ego themselves; or an approach someone finds that just happens to be robust enough to work for both.

I know from personal experience that asking others to find the fusion does not work. It feels like trying to grab a fart from thin air. You just end up getting angry and disgusted. What seems workable, though, is an iterative process, where detailed questions like what exactly does not work, and how is it usually implemented are asked, the answers roughly integrated into an intermediate design model, and followed up with generic question-suggestions like would X work?

It's definitely a time-consuming and slow process, which I know you cannot really afford, but we interested user-developers might try our hand at this (assuming I am not alone in being interested in the subject, and some others are also willing to contribute some time in an effort to find a workable fusion model for an API). Maybe start a new thread about it, if Bill is willing to participate?
 
Wow, Nominal Animal, great insight. There are so many points here but I will only reply to a few that I think are most important.

First, the system should start as Arduino and look like Arduino for current users. This can clearly be done since there are already examples like Particle Electron/Photon that use FreeRTOS to support advanced networking over WiFi or Cellular and most users don't see or use it.

Next, which RTOS do you use? My answer is don't worry about this yet. Most people think of an RTOS as a fancy multi-threaded scheduler and don't include the HAL (Hardware Abstraction Layer). At this level, most popular RTOSes can be mapped to the CMSIS RTOS API.

The HAL is key to performance. The hardware interface needs to look like Arduino as much as possible but take advantage of threading and keep semaphores, mutexes, and other thread functions hidden. For example, hide the fact that a DMA SPI transfer waits on a semaphore, so the user still sees SPI.transfer(txBuf, rxBuf, count).

You should be able to preserve the existing Arduino API but in some cases it would be necessary to modify code for better performance. For example, delay() needs to be kept and sleep() should be offered as a replacement. Making delay() call sleep() could break code by allowing other threads to run.

You need to offer the option to run the RTOS in coop mode. In this mode threads of the same priority don't do round-robin time slices. You must yield to cause a context switch.

Often you can make minor changes to existing code and get big performance improvements. Here is an example from an AVR I2C driver I did.

Version that works with standard Arduino core. The I2C transfer is done entirely by a state machine in an ISR. The program just burns CPU while waiting for the I2C transfer and a scheduler won't help.
Code:
inline bool twiMstrBusy() {return TWCR & (1<<TWIE);}

// Function called from ISR to signal done.
void twiMstrSignal() __attribute__((weak));
void twiMstrSignal() {}

// Function called by library functions to wait for ISR to finish.
void twiMstrWait() __attribute__((weak));
void twiMstrWait() {
  while (twiMstrBusy()) {
  }
}

Here is the code that saves huge amounts of CPU time in the multi-threaded case.

Code:
static SEMAPHORE_DECL(twiSem, 0);

// Function called from ISR to signal done.
void twiMstrSignal() {
  CH_IRQ_PROLOGUE();
  chSemSignalI(&twiSem);
  CH_IRQ_EPILOGUE();
}

// Function called by library functions to wait for ISR to finish.
// Sleep on semaphore while other threads run.
void twiMstrWait() {
  chSemWait(&twiSem);
}
I wrapped this with the Wire API so users don't see what happened. The problem is that I can't replace the standard Arduino Wire so libraries that use Wire still block threads. In fact, you can't use my code if a library uses Wire.

I did the same for analogRead since the standard AVR version burns about 120 usec per call. I start the ADC and sleep on a semaphore until the ADC done interrupt. The user sees analogRead(pin).

Now for how users would see threads. Most RTOSes adapt well to multiple loop() functions but you could use any name. I have experimented with a small RTOS on AVR that uses static threads and a table of properties for each thread. This would be good for new users but you need to give advanced users full access to the RTOS.

I would do something like this. The table macros contain tricks to hook up with an Arduino library. Threads are created and start execution at the end of setup().
Code:
/*
 * Threads static table, one entry per thread.
 * These threads start with a null argument.  A thread's name may also
 * be null to save RAM since the name is currently not used.
 */
NIL_THREADS_TABLE_BEGIN()
NIL_THREADS_TABLE_ENTRY("loop1", loop1, NULL, LOOP1_STACK_SIZE, LOOP1_PRIORITY)
NIL_THREADS_TABLE_ENTRY("loop2", loop2, NULL, LOOP2_STACK_SIZE, LOOP2_PRIORITY)
NIL_THREADS_TABLE_END()

void setup() {
  //....
}
// the idle thread uses main stack.
void loop() {
  // ....
}
// Uses stack in a global.
void loop1(void* arg) {
  // ....
}
// Uses stack in a global.
void loop2(void* arg) {
  // ....
}

I will stop here.

There are a huge number of issues and decisions to implement true threads and maintain minimal changes to Arduino and all the code in third party libraries. A key issue is tools so users can diagnose problems. I have built some like filling stacks so you can get "high water marks".
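
The stack-filling idea can be sketched as follows. This is not the code from my libraries, just a minimal illustration under stated assumptions: the stack grows downward (as on AVR and Cortex-M), the marker value 0x55 is arbitrary, and the function names are made up. The stack is painted with the marker before the thread starts; later, the bytes still holding the marker at the low end are the ones the thread never touched.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

constexpr std::uint8_t kStackFill = 0x55;  // arbitrary marker value

// Paint a thread's stack area with the marker before the thread starts.
void fillStack(std::uint8_t* base, std::size_t size) {
  assert(base != nullptr);
  std::memset(base, kStackFill, size);
}

// Count bytes still holding the marker, scanning up from the lowest
// address. On a descending stack these bytes were never reached.
std::size_t unusedStackBytes(const std::uint8_t* base, std::size_t size) {
  std::size_t n = 0;
  while (n < size && base[n] == kStackFill) ++n;
  return n;
}

// High-water mark: the deepest the stack has grown so far, in bytes.
std::size_t stackHighWater(const std::uint8_t* base, std::size_t size) {
  return size - unusedStackBytes(base, size);
}
```

The result is a lower bound on usage in one corner case: if the deepest bytes the thread wrote happen to equal the marker, they are counted as unused.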
 
In my experience writing the TeensyThreads library, which provides pre-emptive multithreading, there are two basic problems, both solvable in the long run.

1. System code is not thread safe

Primarily this is because the use of global variables and global state makes it hard for multiple tasks to use the same APIs. For example, two threads cannot access the Wire library, even if they are working on different channels. The Serial library is the same. This problem is extensive, but it can be tackled a bit at a time.

2. Not event driven

Since the APIs are not event driven, when you have any interaction, you always have to wait. You send out your command (using Wire, for example) and then you have to wait for a reply. EventResponder is a great framework for fixing this. Instead of calling delay and doing nothing, you could get a callback when the results are complete (the way that JavaScript in a browser does). Applying this to all the APIs would probably not be backward compatible.

On the other hand, if Teensy had the concept of threads, another approach would be to change the delay() function. Instead of doing nothing, delay simply yields its time to some other task. This is the approach taken in TeensyThreads. Using threads in this way can increase the performance and responsiveness of the system significantly because waiting time is not wasted. But it is limited because of the non-thread-safe nature of the system code.
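
As a rough illustration of the callback style, here is a minimal sketch of an event object that a driver triggers when an operation finishes and whose callback runs later from the main loop or yield(). MiniEvent and all of its members are hypothetical and are NOT the real EventResponder API; the sketch also assumes C++17 and a single context, so on real hardware trigger() would need interrupts masked around the queue update.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical, minimal event object; not the real EventResponder API.
class MiniEvent {
 public:
  using Callback = void (*)(MiniEvent&);

  void attach(Callback cb) { callback_ = cb; }

  // Called by a driver (typically from an ISR) when the operation is done.
  // Only queues the event; the callback runs later from runPending().
  void trigger(int status) {
    status_ = status;
    if (pending_) return;  // already queued, just update the status
    assert(count_ < kMaxPending);
    pending_ = true;
    queue_[(head_ + count_) % kMaxPending] = this;
    ++count_;
  }

  int status() const { return status_; }

  // Called from the main loop or yield(): run queued callbacks in FIFO
  // order and return how many events were processed.
  static std::size_t runPending() {
    std::size_t ran = 0;
    while (count_ > 0) {
      MiniEvent* e = queue_[head_];
      head_ = (head_ + 1) % kMaxPending;
      --count_;
      e->pending_ = false;
      if (e->callback_ != nullptr) e->callback_(*e);
      ++ran;
    }
    return ran;
  }

 private:
  static constexpr std::size_t kMaxPending = 8;  // arbitrary queue depth
  static inline MiniEvent* queue_[kMaxPending] = {};
  static inline std::size_t head_ = 0;
  static inline std::size_t count_ = 0;

  Callback callback_ = nullptr;
  int status_ = 0;
  bool pending_ = false;
};
```

The caller attaches a function once, starts the transfer, and returns to other work; nothing spins waiting for the reply.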
 
Again I don't know all of the specifics on exactly how each one wishes for the threading stuff to work.

But for example if the user today calls something like: SPI.transfer(0);
The code will spin waiting for the POPR to be available, such that we can return the result to the user (even if they don't need it...).

So there are several options for how to do this spinning and how to apply it to multitasking.
1) Have this code directly call yield(). This can be taken over by an RTOS or the like.
2) In some cases people don't want to yield that much. That is, they may not want too many calls to go out, so that code currently may hard hang. Maybe we need a "yield light".

The question with several of these is how the thread is reactivated. Does it simply wait for its next time slot, where it rechecks the state and yields again (today's approach)? Or is there some way for it to register some form of event/mutex/??? so that the system only wakes that thread when a condition is met? How does that work with things like SPI, which looks at SPI registers to see if any POPR values are available? Do you assume that this code gets rewritten so that SPI uses an ISR to detect that data is available, and the ISR sets some event/mutex object, which can then wake up the thread?

If we can come up with some model that we wish to work toward, hopefully over time we can migrate the code base to the new model and maybe slowly add in features like Teensy threads into the mainline code.

Again I am probably missing a lot here.
 
Code:
// Long ago you could install your own systick interrupt handler by just
// creating your own systick_isr() function. No longer. But if you
// *really* want to commandeer systick, you can still do so by writing
// your function into the RAM-based vector table.
//
// _VectorsRam[15] = my_systick_function;

I wrote in another thread that in order to preserve compatibility with other libraries it's worth considering saving the previous systick_isr and calling it from within your own. Something like:

Code:
void (*previous_systick_isr)();

void my_systick_isr() {
  asm volatile("push {r0-r4,lr}");
  (*previous_systick_isr)();
  asm volatile("pop {r0-r4,lr}");
  /* do my own thing */
}

void MyClass::begin() {
  previous_systick_isr = _VectorsRam[15];
  _VectorsRam[15] = my_systick_isr;
}

EDIT: In original post, I neglected to sandwich the isr() call between a push/pop to save registers because the isr function doesn't do that. In any case, this is just an illustration.
 
Last edited:
In my experiences writing the TeensyThreads library, which provides pre-emptive multithreading, there are two basic problems that are solvable in the long run.

1. System code is not thread safe

Primarily this is because the use of global variables and global state makes it hard for multiple tasks to use the same APIs. For example, two threads cannot access the Wire library, even if they are working on different channels. The Serial library is the same. This problem is extensive, but it can be tackled a bit at a time.

2. Not event driven

Since the APIs are not event driven, when you have any interaction, you always have to wait. You send out your command (using Wire, for example) and then you have to wait for a reply. EventResponder is a great framework for fixing this. Instead of calling delay and doing nothing, you could get a callback when the results are complete (the way that Javascript on a browser does). Applying this to all the APIs would probably not be backward compatible.

On the other hand, if Teensy had the concept of threads, another approach would be to change the delay() function. Instead of doing nothing, delay simply yields its time to some other task. This is the approach taken in TeensyThreads. Using threads in this way can increase the performance and responsiveness of the system significantly because waiting time is not wasted. But it is limited because of the non-thread-safe nature of the system code.
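To make the "delay yields instead of doing nothing" idea concrete, here is a minimal host-side sketch. It is not the TeensyThreads implementation; `cooperative_delay`, `now_ms` and `yield_to_others` are hypothetical names, and the clock and the yield hook are passed in so the sketch can run on any compiler. On real hardware they would presumably be backed by millis() and the scheduler's yield call.

```cpp
#include <cstdint>
#include <functional>

// Hypothetical cooperative delay: instead of spinning, it hands the CPU to
// other work until the requested time has elapsed.
// Returns how many times it yielded, which is handy for testing.
uint32_t cooperative_delay(uint32_t duration_ms,
                           const std::function<uint32_t()>& now_ms,
                           const std::function<void()>& yield_to_others) {
  const uint32_t start = now_ms();
  uint32_t yields = 0;
  // Unsigned subtraction keeps the comparison correct when the
  // millisecond counter wraps around.
  while (static_cast<uint32_t>(now_ms() - start) < duration_ms) {
    yield_to_others();  // let another task run instead of busy-waiting
    ++yields;
  }
  return yields;
}
```

The waiting thread still polls the clock each time it is scheduled again; a full RTOS would instead remove it from the run queue until its wake-up time.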

The first point, "System code is not thread safe". A key function of an RTOS is to handle this.

I am a bit puzzled, I noticed TeensyThreads has a "Mutex" but it appears to not queue waiting threads.

This is the way the Wire problem is normally handled. If the I2C controller is to be shared, a mutex is associated with the controller and two functions, i2cAcquireBus() and i2cReleaseBus(), are implemented using the mutex.

Here is the documentation from ChibiOS:
void i2cAcquireBus(I2CDriver *i2cp)

Gains exclusive access to the I2C bus.

This function tries to gain ownership of the I2C bus; if the bus is already being used then the invoking thread is queued.

void i2cReleaseBus(I2CDriver *i2cp)

Releases exclusive access to the I2C bus.

If a thread is queued for an I2C bus when the bus is released and the thread's priority is high enough, ownership of the bus is given to the thread and a context switch to the thread is done immediately.
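The queueing behaviour described above can be modelled on a desktop compiler. The following is only a toy bookkeeping model, not ChibiOS code: `QueuedBusMutex` and `Waiter` are hypothetical names, nothing actually blocks or context-switches, and the convention that a larger number means a higher priority is an assumption of the sketch.

```cpp
#include <algorithm>
#include <vector>

// A thread waiting for the bus (hypothetical type for this sketch).
struct Waiter {
  int thread_id;
  int priority;  // assumption: larger value = higher priority
};

class QueuedBusMutex {
 public:
  static constexpr int kNoOwner = -1;

  // Returns true if the caller now owns the bus, false if it was queued.
  bool acquire(int thread_id, int priority) {
    if (owner_ == kNoOwner) {
      owner_ = thread_id;
      return true;
    }
    // Insert before the first waiter with strictly lower priority, so
    // waiters of equal priority are served in arrival order.
    auto pos = std::find_if(
        waiters_.begin(), waiters_.end(),
        [priority](const Waiter& w) { return w.priority < priority; });
    waiters_.insert(pos, Waiter{thread_id, priority});
    return false;
  }

  // Hands the bus to the best waiter; returns the new owner or kNoOwner.
  int release() {
    if (waiters_.empty()) {
      owner_ = kNoOwner;
    } else {
      owner_ = waiters_.front().thread_id;
      waiters_.erase(waiters_.begin());
    }
    return owner_;
  }

  int owner() const { return owner_; }

 private:
  int owner_ = kNoOwner;
  std::vector<Waiter> waiters_;
};
```

The point of the queue is that a released bus goes straight to a specific waiting thread, rather than to whichever thread happens to poll the lock next.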

A thread may require a number of resources and deadlock can be a problem so you must acquire resources in the proper order and the RTOS must implement protection against priority inversion.

2. Not event driven - This is the whole point of an RTOS. Wire must be modified to use a semaphore like my example above. When a thread calls my I2C driver, it sleeps on a semaphore wait until the I2C done interrupt. The I2C ISR signals the semaphore and the thread is immediately made runnable. As soon as the thread is the highest priority runnable thread it will execute.
 
I am a bit puzzled, I noticed TeensyThreads has a "Mutex" but it appears to not queue waiting threads.
That's right. It's the simplest implementation of a Mutex and it provides no queueing or prioritization like a proper RTOS would provide.
 
Again I don't know all of the specifics on exactly how each one wishes for the threading stuff to work.

But for example if the user today calls something like: SPI.transfer(0);
The code will spin waiting for the POPR to be available, such that we can return the result to the user (even if they don't need it...).

So there are several options on how to do this spinning and how to apply it to multitasking.
1) Have this code directly call yield()... This can be taken over by an RTOS or the like.
2) In some cases people don't want to yield that much... That is, they may not want too many calls to go out, so that code currently may hard hang... Maybe we need a "yield light"...

The question with several of these is how the thread is reactivated. Does it simply wait for its next time slot, where it rechecks the state and yields again (today's approach)? Or is there some way for it to register some form of event/mutex/??? so that the system only wakes that thread when a condition is met? How does that work with things like SPI, which looks at SPI registers to see if any POPR values are available? Do you assume that this code gets rewritten such that SPI uses an ISR to detect that data is available, and the ISR sets some event/mutex object, which can then wake up the thread?

If we can come up with some model that we wish to work toward, hopefully over time we can migrate the code base to the new model and maybe slowly add in features like Teensy threads into the mainline code.

Again I am probably missing a lot here.

First, a thread is activated when it is the highest priority thread and runnable. Threads are not runnable while they wait for events. Threads can wait on mutexes or semaphores, sleep for an interval or until a set time, or wait for a Boolean condition on event flags. ISRs and other threads can trigger events. For example, a semaphore may be associated with a queue and a consumer thread waits on the semaphore. When another thread inserts an item in the queue, the semaphore is signaled and the consumer thread becomes runnable.
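The scheduling rule in this paragraph can be sketched as a small host-side model. It is not taken from any particular RTOS: `Thread`, `Semaphore` and `pick_next` are hypothetical names, a larger number is assumed to mean a higher priority, and no real context switch takes place.

```cpp
#include <vector>

enum class State { Runnable, Waiting };

struct Thread {
  int id;
  int priority;  // assumption: larger value = higher priority
  State state;
};

// Returns the id of the highest-priority runnable thread, or -1 when every
// thread is waiting (the system would then idle).
int pick_next(const std::vector<Thread>& threads) {
  const Thread* best = nullptr;
  for (const Thread& t : threads) {
    if (t.state != State::Runnable) continue;
    if (best == nullptr || t.priority > best->priority) best = &t;
  }
  return best ? best->id : -1;
}

// Toy counting semaphore, e.g. counting the items in a queue.
struct Semaphore {
  int count = 0;
  std::vector<Thread*> waiters;

  // Called by a consumer: take an item, or stop being runnable.
  void wait(Thread& t) {
    if (count > 0) {
      --count;
    } else {
      t.state = State::Waiting;
      waiters.push_back(&t);
    }
  }

  // Called by a producer thread or an ISR: wake one waiter if there is one,
  // otherwise remember the item for a later wait().
  void signal() {
    if (!waiters.empty()) {
      waiters.front()->state = State::Runnable;
      waiters.erase(waiters.begin());
    } else {
      ++count;
    }
  }
};
```

While the high-priority consumer waits on the semaphore it costs no CPU time at all; lower-priority threads run until the producer signals.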

I have studied and timed SPI implementations for almost 10 years. Spinning while waiting for the receive data in transfer(dataByte) is a minor problem. Since shift out and shift in happen in parallel and access to even simple devices involves several transfers, the complexity to optimize this is not worth it. If you do 5-6 SPI accesses to write an SPI device you save at most a fraction of a microsecond. Often a sequence of transfers to a device ends with a read so you save nothing. At 8 MHz a byte takes a microsecond so two context switches would use all of that.
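The "a byte takes a microsecond at 8 MHz" figure is simple arithmetic: 8 bits divided by 8,000,000 bits per second. A small sketch, with the hypothetical helper `spi_byte_time_ns` and the simplifying assumption that there are no gaps between bytes and no driver overhead:

```cpp
#include <cstdint>

// Time to clock one 8-bit SPI transfer, in nanoseconds.
// Ignores inter-byte gaps and driver overhead (a simplifying assumption).
constexpr uint64_t spi_byte_time_ns(uint64_t clock_hz) {
  return 8ULL * 1000000000ULL / clock_hz;
}

// 8 bits at 8 MHz: 8 / 8,000,000 s = 1 microsecond.
static_assert(spi_byte_time_ns(8000000) == 1000,
              "one byte at 8 MHz takes 1 us");
```

With so little time per byte, any wake-up mechanism that costs more than about a microsecond per transfer cannot beat simply spinning.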

Threads are great for unexpected events like external interrupts where processing the event takes longer than a few microseconds so an ISR isn't the best answer but the event is urgent so polling is too long.

Threads are great for running lower priority code during a DMA transfer by a higher priority thread.

Threads simplify low priority compute intensive tasks. You don't need to yield or poll for events.

I have a feeling that you can't really understand an RTOS without real world examples. Perhaps there are good tutorials somewhere but just reading the ones I know about often doesn't clear things up.

Often several threads run with little interaction. A display thread plus a sensor-reading thread that queues data for a logger thread can be a very simple program.

Threads at the same priority run round-robin with typically about a 20 ms time slice. A collection of threads can act like applications on a PC.
 
Hi,

I don't want to disturb this interesting discussion,
and I also have the opinion that we need a reasonable, easy to use multitasking with the next Teensy model at the latest.

Nevertheless, I would like to mention briefly, at least, an alternative option that we already have today - and use it without being aware of it - I'm always a fan of unconventional solutions:

We have almost everything already, and even "free" and really fast: The ARM and the interrupt controller (do not laugh).
Every beginner already uses several tasks without realizing it, like the systick. This can be extended without problems if you don't leave the thinking to an OS but do it yourself :)

Let's just use a timer, e.g. the existing IntervalTimer. This can start our "task" on a regular basis. If we set the interrupt priority to 255 (the lowest), we have our main task.
Arduino's loop() is now our idle task. It always runs when no other task (interrupt) is running, completely automatically, without another line of code.
There are also software interrupts (any interrupt in the ARM Cortex that is not currently needed by the hardware, so a lot of them!). If we assign the same interrupt priority to them and trigger them appropriately, we automatically get a "round robin" because they are processed one after the other.
Of course, this can also be done with other interrupts.
At the same time, you can add higher-priority tasks at any time without having to mix anything up. If you raise the whole thing by one or two interrupt levels, you can make it as complicated and structured as you like, but only if you want to.
All of this works without changing anything in Arduino and without a library.

A delay() in a low-priority task is automatically interrupted; you don't even have to think about it or write any code.
You can jump out of your task and be "cooperative" with a simple "return;"

If you want, you can pause individual tasks by switching them off in the interrupt controller (easy to do with the existing macros).

I have been using this system for some time now, and I am very happy with it. The only "problem" is shared resources like I2C, but you can find a solution relatively quickly. (Honestly, I haven't needed this yet. How often does it happen that two tasks need I2C? This can be taken into account during program design. Just THINK :)

My current project spends 99% of the time in a very long interrupt (plus higher-priority ones), and loop() calls almost nothing but "WFI" and a few small routines to control the main task, which saves power.

Just take it as an independent idea.

Edit: An "event" is just the triggering of a software interrupt. It will run as soon as possible if no other tasks have priority. Again, no additional code needed.
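The ordering that makes this scheme work can be modelled on a host machine. This sketch only models which already-pending interrupt is serviced first; it does not model preemption of a running handler. `PendingIrq` and `service_order` are hypothetical names. It relies on the Cortex-M convention that a lower priority value is more urgent, and that among pending interrupts of equal priority the lowest-numbered one is taken first.

```cpp
#include <algorithm>
#include <vector>

// One pending interrupt "task" (hypothetical type for this sketch).
struct PendingIrq {
  int irq_number;
  int priority;  // Cortex-M convention: lower value = more urgent
};

// Returns the IRQ numbers in the order they would be serviced when all of
// them are pending at the same time and no handler is currently active.
std::vector<int> service_order(std::vector<PendingIrq> pending) {
  std::sort(pending.begin(), pending.end(),
            [](const PendingIrq& a, const PendingIrq& b) {
              if (a.priority != b.priority) return a.priority < b.priority;
              return a.irq_number < b.irq_number;  // tie: lowest number first
            });
  std::vector<int> order;
  for (const PendingIrq& p : pending) order.push_back(p.irq_number);
  return order;
}
```

Note that handlers at the same priority never preempt each other, so each such "task" runs to completion before the next one starts. That is different from the time-sliced round robin of an RTOS.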

So, now back to the actual topic please :-)
Please excuse my English.
 
The thing you get from a RTOS or even a cooperative scheduler, which you can't have from interrupts or EventResponder, is the convenience of structuring your threads as simple wait-then-do-something code with blocking APIs. But the huge cost, even if every library properly protects shared resources, is divvying up the limited RAM into small stacks.

My personal opinion is we will only manage to make this really work for beginners (Arduino) when we get chips with a real MMU. Don't confuse this with the MPU we have now. A real MMU which can implement virtual memory which will let us put each task into its own address space and automatically allocate more blocks as each stack needs them.

Someday we will get MMU hardware. Right now, all the semiconductor manufacturers are scrambling to come up with ways to save their imagined cash cow ("Internet Of Things") from the terrible security reputation connected devices are accumulating. Sooner or later they'll get past their "market segmentation" fears and turn MMUs into a security selling point. I really hope it happens sooner.

In the meantime, I do want to put better APIs into Arduino. I believe we need more begin() and end() style usage of hardware and other shared resources. It took over a year to get SPI.beginTransaction() and SPI.endTransaction() accepted, but eventually it did happen. Hopefully soon Arduino will be fully recovered from the Federico Musto saga and (maybe) willing to consider APIs. But again the old problem is how difficult it can be to have even a productive conversation, not to mention someone knowledgeable about RTOS systems willing to put time and energy into incrementally improving Arduino.
 
"as simple wait-then-do-something code:" If you change that to "don't wait, get started by another thread and do it", it is easy. I know you cannot translate the common thread thinking to the interrupt system 1:1, but that's not needed. You get used to the "interrupt" system really fast if you just use it for a project :)

Apart from that, as I said, an RTOS would be really useful, especially for beginners. I totally agree!
 
The idea of using an interrupt controller has been used before in place of an RTOS. You are just 50 years too late to get credit for the idea.

The first I know of were early NASA flight control computers. Late Mercury, all Gemini and the first Apollo missions were controlled by IBM 7094 machines. The channel control section, which is an interrupt controller, was adapted for quick real-time response.

Too bad they didn't have a Teensy 3.6. The 7094 had a 500 kHz clock and the equivalent of 150 KB of oil cooled magnetic core memory arranged as 32K 36-bit words. I believe they cost about $3.5 M in 1963, probably a million times a Teensy 3.6 in constant dollars.

One was still in use at the Lawrence Berkeley Lab in 1969 when I started my postdoc work. It was a massive machine powered by huge motor generator sets with 400 Hz power to racks.

[Attached image: IBM7094.jpg]
 
?? Where did I say it's my idea? No, I really hope everybody is smart enough to develop such a simple system. It may be 50 years old, but it's not bad just because the idea is old.
 
My personal opinion is we will only manage to make this really work for beginners (Arduino) when we get chips with a real MMU.

But the huge cost, even if every library properly protects shared resources, is divvying up the limited RAM into small stacks.
Beginners have no problems with setting stack sizes.

I patched my ChRt library so it will run on Teensy 3.6. This example illustrates how to set stack sizes.

Here is output:
Count: 19947119, Unused Stack: 236 120 254824 312
Count: 19945026, Unused Stack: 236 120 254824 312
Count: 19945191, Unused Stack: 236 120 254824 312
Count: 19945200, Unused Stack: 236 120 254824 312
Count: 19945157, Unused Stack: 236 120 254800 312
Count: 19945179, Unused Stack: 236 120 254800 312
Count: 19945246, Unused Stack: 236 120 254800 312
Count: 19945223, Unused Stack: 236 120 254800 312
Count: 19945179, Unused Stack: 236 120 254800 312
Count: 19945222, Unused Stack: 236 120 254800 312
Count: 19945179, Unused Stack: 236 120 254800 312
Count: 19945220, Unused Stack: 236 120 254800 312

The unused stack numbers for a Teensy 3.6 are for Thread1, Thread2, Main stack, Handler stack. I run the RTOS in Handler/Thread mode.

I get these numbers by filling memory with a 0x55 pattern when I restart the processor in Handler/Thread mode. I check how much of the pattern remains so it is a high water mark. Real-time programmers have used this method for decades. It's easy to explain to a beginner what this means.
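The fill-pattern technique is easy to try on a desktop compiler. This is a minimal sketch, not the ChRt code: `paint_stack` and `unused_stack_bytes` are hypothetical names, and a plain byte array stands in for a thread's stack area. It assumes a stack that grows downward, as on Cortex-M, so the untouched pattern is found at the low-address end.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

constexpr uint8_t kFillPattern = 0x55;

// Fill the whole stack area with the pattern before the thread starts.
void paint_stack(uint8_t* stack, size_t size) {
  std::memset(stack, kFillPattern, size);
}

// Count the pattern bytes still intact at the low-address end. Because the
// stack grows downward from the high end, this is the number of bytes the
// thread has never touched (the high-water mark).
size_t unused_stack_bytes(const uint8_t* stack, size_t size) {
  size_t n = 0;
  while (n < size && stack[n] == kFillPattern) ++n;
  return n;
}
```

The result can be slightly optimistic if the deepest bytes the thread wrote happen to equal 0x55, so it is wise to leave some margin when trimming stack sizes.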

By the way, almost as many beginners have starred my RTOS libraries as SdFat.

There is no cost to run a RTOS. No code runs unless an event happens and you want the RTOS to handle it. This assumes you use tickless mode.

RTOSs allow a fast interrupt class so if you wish, you can run key ISRs in native mode without the RTOS being involved.

In real systems, RTOSs are generally more efficient than simple schedulers. That's why there are hundreds of copies of VxWorks in satellites where weight and power are critical.

Two of my former colleagues developed VxWorks in their Wind River company and adapted a version for all of the Mars Landers. They also became extremely wealthy when they sold the company to Intel.

If you fly on a Boeing plane, VxWorks is in the Avionics. People really do use RTOSs in big critical systems.

As they say, "When it Matters, It Runs on Wind River."
 