RTOS no longer usable with Teensy?

Bill Greiman · Sep 30, 2017

Frank B said:
?? Where did I say it's my idea? No, I really hope everybody is smart enough to develop such a simple system. It may be 50 years old, but it's not bad just because the idea is old.

The reason it is no longer used is that the cost of developing big systems is now in software. Big systems like LHC use thousands of micro-controllers. There is no longer a place for a clever trick in one model of micro-controller that may be replaced by something else during the life of the system.

History has shown that a trick at one level results in problems at another with no gain.

There are too many standards for critical systems to use tricks that can't be certified.

VxWorks is an RTOS I described above.

VxWorks 653 and VxWorks Cert Platforms

Proven in the most challenging safety- and security-critical applications, VxWorks® makes it easier and more cost-effective for technology suppliers to meet the stringent safety certification requirements of EN 50128, IEC 61508, RTCA DO-178C, EUROCAE ED-12C, and other standards in aerospace, defense, industrial automation, transportation, and other regulated industries.

Powering billions of intelligent devices, VxWorks® is the world’s most widely deployed real-time operating system (RTOS). It delivers unrivaled deterministic performance and sets the standard for a scalable, future-proof, safe, and secure operating environment for connected devices in the Internet of Things (IoT). Leading global innovators such as ABB, Airbus, Alcatel-Lucent, BD Biosciences, Boeing, Delphi, Eurocopter, Huawei, Mitsubishi, NASA, Northrop Grumman, Siemens, and Varian leverage VxWorks to create award-winning, innovative, IoT-ready products effectively and efficiently.

PaulStoffregen · Sep 30, 2017

Bill, what's the point of all this intense rhetoric?

Bill Greiman · Sep 30, 2017

PaulStoffregen said:
Bill, what's the point of all this intense rhetoric?

I posted an example that replied to your conjecture that beginners couldn't deal with an RTOS. Did you look at it? It's a real example with real code, not rhetoric.

Sorry, I should not have replied about why the idea of using an interrupt controller in place of an RTOS is no longer used in critical systems.

The title of this topic was an unintentional mistake. When I didn't see a hook for use of exception vectors, I never guessed the answer for a library would be to just overwrite the vector in RAM. I should have asked a different question.

The comment that one should overwrite SysTick was not in the version of Teensy that I downloaded.
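For readers unfamiliar with the technique, here is a minimal host-side sketch of what overwriting a vector in RAM can look like. The names `ram_vectors`, `core_systick`, `rtos_systick` and `install_rtos_tick` are hypothetical stand-ins, not identifiers from the Teensy core. The only hardware fact assumed is that SysTick is exception number 15 on ARM Cortex-M. Chaining to the previous handler is a design choice of this sketch, so that the core's millisecond count keeps running.

```cpp
#include <cstdint>

using Handler = void (*)();

constexpr int kSysTickIndex = 15;      // SysTick exception number on ARM Cortex-M
Handler ram_vectors[16] = {};          // hypothetical stand-in for a RAM vector table

volatile std::uint32_t core_millis = 0;  // what the Arduino-style core counts
volatile std::uint32_t rtos_ticks = 0;   // what the RTOS library counts

// Stand-in for the handler the core installed at startup.
void core_systick() { core_millis = core_millis + 1; }

static Handler previous_systick = nullptr;

// Handler supplied by the (hypothetical) RTOS library.
void rtos_systick() {
    rtos_ticks = rtos_ticks + 1;
    if (previous_systick != nullptr) {
        previous_systick();            // chain so the core's tick still runs
    }
}

// Overwrite the SysTick entry, remembering what was there before.
void install_rtos_tick() {
    previous_systick = ram_vectors[kSysTickIndex];
    ram_vectors[kSysTickIndex] = rtos_systick;
}
```

On real hardware the swap would normally be done with interrupts disabled, which this host-side model does not show.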

I assumed you intended to use the various vectors required by an RTOS for the EventResponder in various drivers, so they were no longer usable.

The one thing that has become clear to me is that the new features of open source RTOSs can't be ported to Arduino as libraries. It's not about the vectors; it's the fact that the HAL of an RTOS is a replacement for the Arduino core.

KurtE · Sep 30, 2017

Thanks Bill,

Bill Greiman said:
First, a thread is activated when it is the highest priority thread and runnable. Threads are not runnable while they wait for events. Threads wait on mutexes, semaphores, sleeping for an interval or until a set time, or until a Boolean condition on event flags is met. ISRs and other threads can trigger events. For example, a semaphore may be associated with a queue and a consumer thread waits on the semaphore. When another thread inserts an item in the queue, the semaphore is signaled and the consumer thread becomes runnable.
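The queue-plus-semaphore pattern in the quote can be sketched on a desktop with standard C++ threads. This is only an illustration: `Semaphore`, `Mailbox` and `run_demo` are hypothetical names, and the semaphore is built from a mutex and condition variable rather than taken from any particular RTOS.

```cpp
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Minimal counting semaphore built from standard primitives.
class Semaphore {
public:
    void signal() {
        {
            std::lock_guard<std::mutex> lock(m_);
            ++count_;
        }
        cv_.notify_one();
    }
    void wait() {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return count_ > 0; });  // not runnable until signaled
        --count_;
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    int count_ = 0;
};

// A queue whose semaphore counts the items waiting in it.
class Mailbox {
public:
    void put(int value) {
        {
            std::lock_guard<std::mutex> lock(guard_);
            items_.push(value);
        }
        available_.signal();           // wakes the consumer
    }
    int take() {
        available_.wait();             // blocks while the queue is empty
        std::lock_guard<std::mutex> lock(guard_);
        int value = items_.front();
        items_.pop();
        return value;
    }
private:
    std::mutex guard_;
    std::queue<int> items_;
    Semaphore available_;
};

// One producer (the calling thread) and one consumer thread.
std::vector<int> run_demo(int n) {
    Mailbox box;
    std::vector<int> received;
    std::thread consumer([&] {
        for (int i = 0; i < n; ++i) received.push_back(box.take());
    });
    for (int i = 0; i < n; ++i) box.put(i * 10);
    consumer.join();
    return received;
}
```

With a single producer and a single consumer the items come out in the order they went in, so `run_demo(4)` returns 0, 10, 20, 30.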

I am actually aware of this stuff. I have worked on large machines (IBM, Amdahl, Prime, ...) and workstations (Sun, Apollo), and 30+ years ago I wrote a protected-mode kernel for the Intel 286/386 when we were working on a distributed OS. That project was canceled and the kernel was taken over by another group doing an air defense system... I have also been playing with lots of Linux boards (RPI, Odroid, UP, Intel Edison, Galileo).

What I was trying to ask, are there places in the current Teensy Arduino code base that we should consider changing to make it more compatible with your RTOSs or other threaded systems. Currently I am more interested in making it work better with Teensy Threads, not because it is necessarily better but simply because it now ships with Teensyduino...

One option which I am not advocating is to do like Edison, where you are actually running Linux, but you have a version of the Arduino build, which simply compiles to an app that gets downloaded to the main storage and runs like any other app in user space... All system level stuff is done in Kernel space...

Bill Greiman said:
I have studied and timed SPI implementations for almost 10 years. Spinning while waiting for the receive data in transfer(dataByte) is a minor problem. Since shift out and shift in happen in parallel and access to even simple devices involves several transfers, the complexity to optimize this is not worth it. If you do 5-6 SPI accesses to write an SPI device you save at most a fraction of a microsecond. Often a sequence of transfers to a device ends with a read so you save nothing. At 8 MHz a byte takes a microsecond so two context switches would use all of that.

Yes, this is very true for some simple byte/word transfers. However, it is not necessarily true of some usages of SPI, for example updating a TFT display. If you are using a backing frame buffer, for example with an ILI9341 display, you set up the SPI with a few short commands: Set columns, set Rows, <start memory write>, then you output 240*320*2 bytes, and you don't care about what is returned... Also remember that on the T3.x boards we have a 4 item queue that also allows us to encode the CS pin(s) state for each command, such that we don't need to wait until one command completes before queuing the next command.

A simpler update example is the SSD1306(?)/TeensyView, where you update the display by outputting either 512 bytes (32 lines) or 1024 bytes (64 lines). Again you don't care what comes back, as this one does not have a MISO pin...

Over the last few builds have added support to improve this, including now the Async version (DMA) using the new events... But for many setups having a thread wait for display update to complete makes sense and is simple.
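To put rough numbers on the display case, here is a small worked calculation. It assumes the 8 MHz clock from Bill's figure (one byte per microsecond) and ignores command overhead and any gaps between bytes; `transfer_us` and `kFrameBytes` are names made up for this sketch.

```cpp
#include <cstdint>

// Time in microseconds to clock `bytes` out at `clock_hz`, 8 bits per byte.
constexpr std::uint64_t transfer_us(std::uint64_t bytes, std::uint64_t clock_hz) {
    return bytes * 8u * 1000000u / clock_hz;
}

// Full ILI9341 frame: 240 x 320 pixels, 2 bytes per pixel.
constexpr std::uint64_t kFrameBytes = 240u * 320u * 2u;

static_assert(kFrameBytes == 153600u, "frame buffer size");
static_assert(transfer_us(1, 8000000u) == 1u, "one byte at 8 MHz takes 1 us");
static_assert(transfer_us(kFrameBytes, 8000000u) == 153600u, "about 154 ms per frame");
static_assert(transfer_us(1024, 8000000u) == 1024u, "about 1 ms for a 1024 byte update");
```

At that assumed clock a full frame keeps the bus busy for roughly 154 ms, which is why waiting on a thread or using DMA is attractive for this case while a single byte transfer is not worth optimizing.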

But again I am asking: what issues are you having with different subsystems like SPI? When I hear things like they are not thread safe, it would help to know more. For example: you cannot use the SPI object in multiple threads (say, multiple devices on the same SPI bus), but can you use different SPI buses in different threads? Where are the hangups? Maybe we cannot solve this for all old code, but for the same-bus case, could beginTransaction/endTransaction be augmented so that in a multi-thread setup one thread will know that another thread owns the SPI object and wait for it to become available...
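One way to picture the augmented beginTransaction/endTransaction idea is a bus object that carries its own lock. The sketch below is a desktop model under stated assumptions: `SharedBus`, `Transaction` and `run_two_devices` are hypothetical, the real SPI `beginTransaction` takes settings that are omitted here, and a real port would use the threading library's mutex rather than `std::mutex`.

```cpp
#include <mutex>
#include <thread>

// Hypothetical bus model: a transaction owns the bus until it ends.
class SharedBus {
public:
    void beginTransaction() { owner_.lock(); }    // waits if another thread owns the bus
    void endTransaction() { owner_.unlock(); }
    void transfer(unsigned char /*value*/) { ++bytesSent_; }  // call only inside a transaction
    long bytesSent() const { return bytesSent_; }
private:
    std::mutex owner_;
    long bytesSent_ = 0;
};

// RAII helper so the bus is released even on an early return.
class Transaction {
public:
    explicit Transaction(SharedBus& bus) : bus_(bus) { bus_.beginTransaction(); }
    ~Transaction() { bus_.endTransaction(); }
    Transaction(const Transaction&) = delete;
    Transaction& operator=(const Transaction&) = delete;
private:
    SharedBus& bus_;
};

// Two threads, each standing in for a driver talking to its own device on one bus.
long run_two_devices(int writesPerThread) {
    SharedBus bus;
    auto device = [&bus, writesPerThread] {
        for (int i = 0; i < writesPerThread; ++i) {
            Transaction t(bus);        // command + data stay together
            bus.transfer(0x2C);
            bus.transfer(0x00);
        }
    };
    std::thread a(device);
    std::thread b(device);
    a.join();
    b.join();
    return bus.bytesSent();
}
```

Because every transfer happens while the lock is held, the two-byte sequences from the two threads never interleave and no byte count is lost. Different buses would each get their own lock, so threads using different buses would not block each other.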

Also SPI was just an example, what about issues in SerialX, ...

Bill Greiman said:
Threads are great for unexpected events like external interrupts where processing the event takes longer than a few microseconds so an ISR isn't the best answer but the event is urgent so polling is too long.
...

- Yep I do use threads when working on Linux and the like...

Bill Greiman said:
I have a feeling that you can't really understand an RTOS without real world examples. Perhaps there are good tutorials somewhere but just reading the ones I know about often doesn't clear things up.

Often several threads run with little interaction, a display thread, read sensor thread that queues data for a logger thread can be a very simple program.

Threads at the same priority run round-robin with typically about a 20 ms time slice. A collection of threads can act like applications on a PC.
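The two scheduling rules quoted above (highest priority runnable thread wins, equal priorities take turns) can be shown with a toy selection function. This is a model for illustration only, not the scheduler of any particular RTOS; `Thread` and `pick_next` are hypothetical, and the time slice itself is not modelled.

```cpp
#include <vector>

struct Thread {
    const char* name;
    int priority;      // larger number means higher priority
    bool runnable;     // false while waiting on a semaphore, sleep, etc.
};

// Returns the index of the thread to run next, or -1 if none is runnable.
// `last` is the index that ran previously (-1 if nothing has run yet).
int pick_next(const std::vector<Thread>& threads, int last) {
    const int n = static_cast<int>(threads.size());
    int best = -1;
    // Start scanning just after the last-run thread so equal priorities rotate.
    for (int step = 1; step <= n; ++step) {
        const int i = (last + step) % n;
        if (!threads[i].runnable) continue;
        if (best < 0 || threads[i].priority > threads[best].priority) {
            best = i;  // strictly higher priority replaces the current choice
        }
    }
    return best;
}
```

Calling `pick_next` once per expired time slice rotates through threads of equal priority, while a higher priority thread that becomes runnable is chosen on the very next call.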

Again I, and I assume many/most others up here have used systems with threads or the like and understand their usefulness.

Side track: After I retired from developing software on PCs and the like, I was attracted to working on machines like the Arduino, as it was refreshing to be in total control and to be able to see all of the code, and if it does not work for you, you can fix it. You are pretty much in control of when things like IOs happen, which makes it great for things like controlling the servos of a robot to get nice smooth walking gaits and the like. Small Linux boards are fun to play with, but for most of us at least, you lose the ability to fully understand all of the code that is running. You also often lose the ability to finely control the timings. That is why, for example, when several people have moved from using an Arduino to something like an RPI, they might end up with jerkiness in their servo movements... Yes, I know there are ways around this...

So again I would like to help make it easier to use some simple RTOS or other threading systems, that we can all use, when we wish to use it. But I also understand the requirement for many that especially on the smaller chips (T-LC, T3.2), that adding this support should not take away the ability to control it the way they currently do. i.e - they already use up most/all of the memory and have timings worked out for their stuff... Again I know that it is a hard line to walk.

It would be great if we could setup a list of things that would help, and as several of us work through different subsystems we can hopefully converge to a better system.

Bill Greiman · Sep 30, 2017

KurtE said:
What I was trying to ask, are there places in the current Teensy Arduino code base that we should consider changing to make it more compatible with your RTOSs or other threaded systems. Currently I am more interested in making it work better with Teensy Threads, not because it is necessarily better but simply because it now ships with Teensyduino...

The answer to what users want in an RTOS is simple, FreeRTOS if they can't use Linux. I like other RTOSs much more but I keep getting mail asking me to update my port of FreeRTOS.

2017 Embedded Market Survey

So I guess I will update my FreeRTOS library. I love ChibiOS/RT but once again, it's hard to find an application where the speed or features of ChibiOS/RT will make a difference.

I no longer work on an RTOS I started. I get the message from surveys like the above chart and lack of interest in ports of technically better systems.

I agree, I don't like the Edison approach.

KurtE said:
One option which I am not advocating is to do like Edison, where you are actually running Linux, but you have a version of the Arduino build, which simply compiles to an app that gets downloaded to the main storage and runs like any other app in user space... All system level stuff is done in Kernel space...

I do use Linux boards like RPI. I did a system with my daughter using a RPI and a lot of Particle Photon and Electron boards. She recently left IT in a big company and started her own real estate business. She wanted a system to monitor and control stuff in vacant vacation rental homes.

The Particle boards are Arduino compatible so there are lots of sensor libraries. They have on board WiFi or Cellular. We put stuff to sense temp, humidity, gas, smoke, movement, sump pumps status and control and more in the Particle boards. They talked to the central RPI so if something goes wrong, she gets a text message on her phone. We made an Android app so she can look at and control all the sensors and pins on the Particle board from her phone. She can connect about anything to the Particle board and control it from her phone. It only took a few days.

I did several libraries for Particle boards. Many users of Particle boards are in startup companies and have a very different outlook than us old timers.

KurtE said:
But again asking, what issues are you having with different subsystems like SPI. When I hear things like they are not thread safe... Would help to know more, like: You can not use the SPI object in multiple threads (example multiple devices on the same SPI bus), But can you use different SPI buses in different threads? Where are the hangups? Again maybe we can not solve for all old code, but for example on same bus, could we maybe have way for beginTransaction/endTransaction - be augmented such that in a multiple thread case one thread will know that another thread owns the SPI object and wait for it to be available...

The STM32duino guys are also trying to optimize single byte writes. I just don't find cases where it matters.

I use SPI for SD cards, displays, and sensors, and external memory.

For SD cards you try to do huge transfers since cards have big flash pages. Single byte transfers are used for commands. You send a six byte command and wait for a status return, so optimizing the send doesn't matter. My displays take large writes, so single byte transfers don't matter. I read sensors after writing commands. I guess it could matter for writing external SRAM, but I use it with pages to avoid call overhead, so the transfers are large.

There is nothing better than lots of SPI controllers to simplify life with multiple SPI devices. I even wrote a bit-bang SPI library that implements SPI modes. It is a template class and does over 1 MHz on AVR so it is plenty fast for simple sensors and avoids conflicts with SD cards and displays. Setting or clearing a bit on AVR takes just two cycles.
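For anyone who has not seen one, a template-based bit-bang SPI with mode support can look roughly like the sketch below. This is not Bill's library: `SoftSpi` and `LoopbackPins` are hypothetical, the pin policy is a test stand-in that echoes MOSI back on MISO, and no timing delays are included. The mode number follows the usual convention (bit 1 is CPOL, bit 0 is CPHA), and data is sent MSB first.

```cpp
#include <cstdint>

// Test stand-in for real pins: MISO simply reads back the last MOSI level.
struct LoopbackPins {
    static bool& mosiLevel() { static bool level = false; return level; }
    static void writeClock(bool /*level*/) {}
    static void writeMosi(bool level) { mosiLevel() = level; }
    static bool readMiso() { return mosiLevel(); }
};

template <typename Pins, std::uint8_t Mode>
class SoftSpi {
    static_assert(Mode < 4, "SPI mode must be 0..3");
    static constexpr bool kCpol = (Mode & 2) != 0;  // idle level of the clock
    static constexpr bool kCpha = (Mode & 1) != 0;  // sample on trailing edge if set
public:
    void begin() { Pins::writeClock(kCpol); }       // park the clock at its idle level

    std::uint8_t transfer(std::uint8_t out) {
        std::uint8_t in = 0;
        for (int bit = 7; bit >= 0; --bit) {
            if (kCpha) Pins::writeClock(!kCpol);    // CPHA=1: leading edge shifts data out
            Pins::writeMosi(((out >> bit) & 1) != 0);
            if (kCpha) {
                Pins::writeClock(kCpol);            // CPHA=1: trailing edge samples
            } else {
                Pins::writeClock(!kCpol);           // CPHA=0: leading edge samples
            }
            in = static_cast<std::uint8_t>((in << 1) | (Pins::readMiso() ? 1 : 0));
            if (!kCpha) Pins::writeClock(kCpol);    // CPHA=0: return clock to idle
        }
        return in;
    }
};
```

Because the mode is a template parameter, the `kCpol`/`kCpha` branches are compile-time constants and the compiler can drop the unused paths, which is one plausible reason to prefer a template over a runtime mode argument on a small AVR.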

Nominal Animal · Oct 4, 2017

ftrias said:
Since the APIs are not event driven, when you have any interaction, you always have to wait. You send out your command (using Wire, for example) and then you have to wait for a reply.

This is not only an API issue, but requires a paradigm shift for developers to understand and utilize it correctly.

As an example, heavily parallelized atomic simulations (physics/biology/chemistry) use MPI for communication between nodes. There are perfectly usable asynchronous message passing facilities, but very, very few simulators actually use them: instead, they do computation, then messaging, then computation, and so on, rather than interleaving the two. Why? I've heard two answers. The first is "because it is simpler". The second is "we don't want the CPUs to overheat", and I think that's a bogus argument. (This is also the reason for the InfiniBand network used in computing clusters, really.)

Granted, parallelizing something like a single Monte Carlo simulation is at least an order of magnitude more complicated than just implementing it procedurally using a single thread. (I still haven't proven that a scheme I developed almost a decade ago retains detailed balance..)

I believe that the underlying reason is that developers find it hard to design and understand asynchronous events and how to use them; especially scientists, who seem to be very comfortable (and stuck) to procedural programming models. (I do not know whether this can be extrapolated to microcontroller developers, however.)

Simply put, the procedural view and asynchronous, event-based, or multithreaded processing simply do not mix well.

(It is also much easier to teach async/event-based/multithreaded programming to a complete newbie, than it is to someone who is well experienced with procedural programming.)

PaulStoffregen said:
But the huge cost, even if every library properly protects shares resources, is divvying up the limited RAM into small stacks.

This is also a compiler/ABI problem. Basically, the idea that there is a linear stack is too deeply intertwined with how the compilers generate code.

For example, instead of a single stack, you can use a pool of stack regions. For each thread, used regions are chained together, not linear in memory. There is only one pool of stack regions, shared by all threads. At compile time, the compiler determines the amount of stack a function requires (as it does now, except it should also account for local overhead for function calls, but not the stack needed by the called functions). If there is enough room in the current stack region for the local state, the current region is used as usual; otherwise, a new stack region is obtained and initialized, and a return cleanup pushed/injected to release the new stack region and revert to the old stack region and position in it at return time. So, there is some overhead, but it isn't much, even in the new-stack-region case. You also do need to reserve another register for the current stack region pointer/watermark.

The main problem with the scheme is that C compilers expect the unused part of the stack to be freely usable as a temporary scratch pad (that is not retained across a function call). It's built in pretty deep, because it is basically true for all architectures a C compiler supports.
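The bookkeeping behind the stack-region scheme described above can be modelled in a few lines. This is only an accounting model under assumptions: it does not switch real stacks, `RegionPool`, `ThreadStack` and the 64-byte `kRegionSize` are hypothetical, and `enter`/`leave` stand in for what a compiler-generated prologue and epilogue would do.

```cpp
#include <cstddef>
#include <vector>

constexpr std::size_t kRegionSize = 64;   // bytes per stack region (made-up value)

// One pool of regions shared by all threads.
class RegionPool {
public:
    explicit RegionPool(std::size_t count) : free_(count) {}
    bool acquire() {
        if (free_ == 0) return false;
        --free_;
        return true;
    }
    void release() { ++free_; }
    std::size_t available() const { return free_; }
private:
    std::size_t free_;
};

// Per-thread chain of regions; each entry records the bytes used in that region.
class ThreadStack {
    struct Frame {
        std::size_t bytes;
        bool openedRegion;   // true if this frame forced a new region
    };
public:
    explicit ThreadStack(RegionPool& pool) : pool_(pool) {}

    // Function entry: reserve `bytes` of local state. Fails if the pool is empty
    // or the frame cannot fit in any single region.
    bool enter(std::size_t bytes) {
        if (bytes > kRegionSize) return false;
        const bool needRegion = used_.empty() || used_.back() + bytes > kRegionSize;
        if (needRegion) {
            if (!pool_.acquire()) return false;
            used_.push_back(0);
        }
        used_.back() += bytes;
        frames_.push_back({bytes, needRegion});
        return true;
    }

    // Function return: drop the frame and give back the region it opened, if any.
    // Must only be called after a matching successful enter().
    void leave() {
        const Frame f = frames_.back();
        frames_.pop_back();
        used_.back() -= f.bytes;
        if (f.openedRegion) {
            used_.pop_back();
            pool_.release();
        }
    }

    std::size_t regionsHeld() const { return used_.size(); }
private:
    RegionPool& pool_;
    std::vector<std::size_t> used_;
    std::vector<Frame> frames_;
};
```

In this model a thread with a deep call chain borrows regions only while it needs them, and another thread can pick them up as soon as the calls return, which is the RAM saving the scheme is after.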
_____

It seems like I was way too optimistic; too many people -- and I'm not talking about us here: I'm talking about people like GCC devs, hardware folks, and a lot of library devs -- would need to act in concert to get this to work. (Not to get everything done at once, but before the design finalization phase, to see the actual use cases and practical needs.)

Darn. The possibilities exist, it's just that no single person (or even a small team) can do this in any sensible timeframe.

Bill Greiman · Oct 5, 2017

Nominal Animal said:
This is not only an API issue, but requires a paradigm shift for developers to understand and utilize it correctly.

As an example, heavily parallelized atomic simulations (physics/biology/chemistry) use MPI for communication between nodes. There are perfectly usable asynchronous message passing facilities, but very, very few simulators actually use them: instead, they do computation, then messaging, then computation, and so on, rather than interleaving the two. Why? I've heard two answers. The first is "because it is simpler". The second is "we don't want the CPUs to overheat", and I think that's a bogus argument. (This is also the reason for the InfiniBand network used in computing clusters, really.)

Granted, parallelizing something like a single Monte Carlo simulation is at least an order of magnitude more complicated than just implementing it procedurally using a single thread. (I still haven't proven that a scheme I developed almost a decade ago retains detailed balance..)

I believe that the underlying reason is that developers find it hard to design and understand asynchronous events and how to use them; especially scientists, who seem to be very comfortable (and stuck) to procedural programming models. (I do not know whether this can be extrapolated to microcontroller developers, however.)

Simply put, the procedural view and asynchronous, event-based, or multithreaded processing simply do not mix well.

(It is also much easier to teach async/event-based/multithreaded programming to a complete newbie, than it is to someone who is well experienced with procedural programming.)

This is also a compiler/ABI problem. Basically, the idea that there is a linear stack is too deeply intertwined with how the compilers generate code.

For example, instead of a single stack, you can use a pool of stack regions. For each thread, used regions are chained together, not linear in memory. There is only one pool of stack regions, shared by all threads. At compile time, the compiler determines the amount of stack a function requires (as it does now, except it should also account for local overhead for function calls, but not the stack needed by the called functions). If there is enough room in the current stack region for the local state, the current region is used as usual; otherwise, a new stack region is obtained and initialized, and a return cleanup pushed/injected to release the new stack region and revert to the old stack region and position in it at return time. So, there is some overhead, but it isn't much, even in the new-stack-region case. You also do need to reserve another register for the current stack region pointer/watermark.

The main problem with the scheme is that C compilers expect the unused part of the stack to be freely usable as a temporary scratch pad (that is not retained across a function call). It's built in pretty deep, because it is basically true for all architectures a C compiler supports.
_____

It seems like I was way too optimistic; too many people -- and I'm not talking about us here: I'm talking about people like GCC devs, hardware folks, and a lot of library devs -- would need to act in concert to get this to work. (Not to get everything done at once, but before the design finalization phase, to see the actual use cases and practical needs.)

Darn. The possibilities exist, it's just that no single person (or even a small team) can do this in any sensible timeframe.

Yikes! This is most of high performance computing.

I started this thread with the misunderstanding that the EventResponder ruled out use of exception vectors for use by an RTOS.

I understand there is no problem and have already updated this port of ChibiOS/RT for Teensy 3.0 - 3.6, AVR, SAMD, and SAM3X Arduinos. I am almost finished with a new port of FreeRTOS V9 to these boards. I just need to do some cleanup and testing before posting FreeRTOS V9.0.

defragster · Oct 5, 2017

That's good news Bill, glad you found a path forward.
