Use of yield() in Teensy Cores and Libraries

joepasquariello

The recent thread "Thoughts on Handling Complexity" led to a discussion of cooperative multi-tasking and the use of yield(). In Arduino, yield() is an empty function defined as "weak" to allow replacement with a function that does a cooperative task switch. Here is the comment from the Arduino core:

/* Empty yield() hook. This function is intended to be used by library writers
* to build libraries/sketches that support cooperative threads. It's defined as
* a weak symbol and can be redefined to implement a cooperative scheduler.
*/

The calls to yield() in the latest version of the Teensy cores are listed below. All are made from within while loops, waiting either for a hardware operation to complete, or a timeout to expire. If using a cooperative RTOS, calling any of these functions from within a task will result in that task yielding the CPU if the function must wait for the operation to complete. If there is no need to wait, such as being able to put some number of bytes into a TX buffer, the function will return without yielding.
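The wait-loop pattern described above can be modeled in portable C++. This is a host-side sketch, not Teensy core code. The "hardware" is a hypothetical counter that each yield() advances, `fakeMillis` stands in for millis(), and `conversionDone()` and `waitForConversion()` are invented names. It shows the two properties from the paragraph: the loop yields only when it actually has to wait, and a timeout bounds the wait.

```cpp
#include <cassert>
#include <cstdint>

// Host-side model (assumption): simulated peripheral and clock.
static uint32_t fakeMillis = 0;   // stands in for millis()
static int cyclesUntilDone = 0;   // simulated conversion time
static int yieldCount = 0;        // how often we gave up the CPU

// In a cooperative system, yield() would switch to another task.
// Here it only advances the simulated time and hardware.
void yield() {
    ++yieldCount;
    ++fakeMillis;
    if (cyclesUntilDone > 0) --cyclesUntilDone;
}

bool conversionDone() { return cyclesUntilDone == 0; }

// Wait for the "hardware", yielding only while it is busy.
// Returns false if timeoutMs elapses first.
bool waitForConversion(uint32_t timeoutMs) {
    const uint32_t start = fakeMillis;
    while (!conversionDone()) {
        if (fakeMillis - start >= timeoutMs) return false;  // timed out
        yield();                                            // let other tasks run
    }
    return true;  // no yield at all if the hardware was already done
}
```

The same shape fits both cases in the list below: "wait for the operation to complete" is the loop condition, and "wait for a timeout" is the early return.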

I think this is the right model to maintain, so the only change I would suggest to the cores is to add yield() to analogRead() in Teensy4/analog.c, to be consistent with Teensy3/analog.c.

The Teensyduino libraries contain many uses of yield(), too many to discuss in one thread, but I’m interested in SPI and I2C, so I’ll focus on those for now. The SPI library transfer functions currently do not call yield(), and I think they could. For I2C, the Teensy4 file WireIMXRT does call yield() while waiting. WireKinetis has timeouts, but does not call yield(), and I think it should.

Code:
Teensy (none)

Teensy3

    analog.c (2)
        analogRead - wait for conversion complete
    main.cpp (1)
        main - after each execution of loop()
    pins_teensy.c (1)
        delay - wait for delay complete
    serial1.c, serial2.c (4)
        serial_end - wait for TX complete
        serial_putchar - wait for buffer to have room (without hardware FIFO)
        serial_write - wait for buffer to have room (with hardware FIFO)
        serial_flush - wait for TX complete
    serial3.c, serial4.c, serial5.c, serial6.c, serial6_lpuart.c (3)
        serial_end - wait for TX complete
        serial_putchar - wait for buffer to have room (no hardware FIFO)
        serial_flush - wait for TX complete
    Stream.cpp
        timedRead - wait for timeout (if no data available)
        timedPeek - wait for timeout (if no data available)
    usb_flightsim/joystick/keyboard/midi/mtp/rawhid/seremu/serial/serial2/serial3
        (generally wait for send complete or timeout on read)

Teensy4 (same as Teensy3 with the exception of analogRead)

    delay.c (1)
        delay - wait for delay complete
    HardwareSerial.cpp (3, one class for HardwareSerial1-8)
        end - wait for TX complete
        flush - wait for TX complete
        write_9_bit - wait for buffer to have room
    main.cpp
        main - after each execution of loop()
    Stream.cpp
        timedRead - wait for timeout (if no data available)
        timedPeek - wait for timeout (if no data available)
    usb_flightsim/joystick/keyboard/midi/mtp/rawhid/seremu/serial/serial2/serial3
        (generally wait for send complete or timeout on read)
 
Normally, for any hardware that takes a long time to complete an operation,
for example slow interfaces such as UART, I2C, or SPI,
you use both transmit and receive interrupts, and sometimes even DMA if it is available.

For example, if we want to transmit something over a UART:
first, when the program starts, you allocate a buffer of, for example, 1024 bytes
(or less, depending on how much memory is available).
It could also be allocated dynamically when it is needed,
depending on the requirements of the specific 'product'.

1. You need to send something.
2. The buffer is filled with the data.
3. Transmission is started (if it is not already running), which begins sending the first data from the buffer.
4. The transmit interrupt is enabled
(sometimes it can be enabled before the first data is sent, but the buffer needs to contain data so the interrupt routine has something to send).
The interrupt routine then automatically reads from the buffer when the previously sent data has finished,
and transmits the rest of the data in the background.
When it has finished, or a transmit error has occurred (especially for I2C),
it can set a flag so that the main program can see whether the transmission was successful.
5. The main program runs and just checks the flags above; no waiting is required.
The main program can also add more data to the buffer (safest is to disable interrupts while doing so, to ensure 'atomic' access).
A wait would only occur if the buffer is full and the main loop has to wait until there is room in it
(this state can be avoided with a bigger buffer).
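The five steps above can be modeled with a small software FIFO. This is a host-side sketch under stated assumptions: `txIsr()` stands in for the UART transmit interrupt, `sentLog` stands in for the data register, and all names are hypothetical. On real hardware the indices are shared with an ISR, so they would be `volatile` and the update in `txWrite()` would be protected (for example by disabling the TX interrupt), as the post notes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Software FIFO for outgoing bytes (power-of-two size, free-running indices).
// On hardware these are shared with the ISR and should be volatile.
constexpr size_t TX_SIZE = 8;
static uint8_t txBuf[TX_SIZE];
static size_t txHead = 0;      // next slot to write (main loop)
static size_t txTail = 0;      // next slot to send (interrupt)
static bool txDone = true;     // flag the main loop can poll
static std::string sentLog;    // simulated UART output

// Main-loop side: queue as many bytes as fit, never blocking.
// Returns how many bytes were accepted.
size_t txWrite(const uint8_t *data, size_t len) {
    size_t n = 0;
    // On hardware: disable the TX interrupt here for atomic access.
    while (n < len && (txHead - txTail) < TX_SIZE) {
        txBuf[txHead % TX_SIZE] = data[n++];
        ++txHead;
        txDone = false;        // data is pending
    }
    // On hardware: re-enable (or start) the TX interrupt here.
    return n;
}

// Simulated "transmit complete" interrupt: sends one byte per call.
void txIsr() {
    if (txTail == txHead) { txDone = true; return; }  // nothing left
    sentLog.push_back(static_cast<char>(txBuf[txTail % TX_SIZE]));
    ++txTail;
    if (txTail == txHead) txDone = true;              // buffer drained
}
```

When `txWrite()` accepts fewer bytes than requested, the caller can keep the rest and try again later instead of waiting; only a caller that insists on blocking would need a `while (...) yield();` loop here.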

A more advanced alternative to the peripheral-specific interrupts is DMA.
Transmits, especially for UART, can then be done without any CPU involvement,
and the interrupt normally occurs only at the end of the DMA transfer.


The ADC also has an interrupt system, so no waits are needed for the conversion to complete.
You may not normally use interrupts for a simple ADC read,
but when doing continuous readings you could use ADC interrupts
to free the main loop from unnecessary waits,
again together with a buffer:

1. Have a buffer to store, for example, 16 ADC readings.
2. Enable ADC interrupts.
3. Start an ADC read.
4. The ADC interrupt routine takes the newly read value and puts it into the buffer,
then starts a new ADC read unless the buffer is full. Two flags can be set: read complete and buffer full.
5. The main loop only reads the flags and processes the data; still no unnecessary waits are required.
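Steps 1 to 5 can be sketched the same way. Again this is a host-side model with hypothetical names: `adcIsr()` plays the role of the ADC conversion-complete interrupt and receives the converted value as a parameter, and `startConversion()` only counts requests instead of touching real ADC registers. "Processing the data" is assumed to be an average, purely for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr size_t ADC_BUF_SIZE = 16;
static uint16_t adcBuf[ADC_BUF_SIZE];
static size_t adcCount = 0;         // readings stored so far
static bool readComplete = false;   // flag: at least one new reading
static bool bufferFull = false;     // flag: buffer ready to process
static int conversionsStarted = 0;  // stands in for starting the ADC

void startConversion() { ++conversionsStarted; }  // hypothetical HW start

// Called when a conversion finishes (the "ADC interrupt").
void adcIsr(uint16_t value) {
    if (adcCount >= ADC_BUF_SIZE) return;  // defensive: should not happen
    adcBuf[adcCount++] = value;
    readComplete = true;
    if (adcCount == ADC_BUF_SIZE) {
        bufferFull = true;                 // stop; main loop will consume
    } else {
        startConversion();                 // chain the next reading
    }
}

// Main loop: only look at the flags, never busy-wait on the ADC.
// Returns the average of a full buffer, or -1 if not ready yet.
// Safe to reset the flags here because no conversion runs while full.
long pollAdc() {
    if (!bufferFull) return -1;
    long sum = 0;
    for (size_t i = 0; i < ADC_BUF_SIZE; ++i) sum += adcBuf[i];
    adcCount = 0;                          // reuse the buffer
    bufferFull = readComplete = false;
    startConversion();                     // begin the next batch
    return sum / static_cast<long>(ADC_BUF_SIZE);
}
```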


How the main loop then takes care of multiple tasks is another chapter.



Note:
when I'm talking about a buffer, I mean a software FIFO buffer.
 
Thanks, @Manicksan. You're right that DMA and interrupts provide other ways to avoid waiting. The question I'm asking is where calls to yield() would be useful if someone is using a cooperative RTOS. Functions related to using peripherals with DMA or completion interrupts would not have calls to yield().
 
Have you seen TeensyThreads?
It provides a real task-switching threads.yield() function.

That means you could write a read-ADC task,
but then I thought about it once more:
no, this would still wait for the ADC conversion to complete.

But with a yield() call inside analogRead()'s wait loop, while (!(ADCx_HS & ADC_HS_COCO0)) { yield(); },
as you want,
we can override it with

Code:
void yield() {
    threads.yield();
}
if using TeensyThreads


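And here is the task:
Code:
#include <TeensyThreads.h>

volatile int doRead = 0;    // set by loop() to request a reading
volatile int adcValue = 0;  // latest result, written by the ADC task

void readADC_task() {
    while (1) {
        if (doRead) {
            doRead = 0;
            // A0; this yields during the conversion only if analogRead()
            // calls yield() and yield() has been overridden as above
            adcValue = analogRead(14);
        }
        threads.yield();  // give the other threads a turn
    }
}

void setup() {
    threads.addThread(readADC_task);
}

void loop() {
    doRead = 1;       // ask the ADC task for a new reading
    threads.yield();
}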
 
have you seen TeensyThreads that is utilizing a real task switching threads.yield() function we can override it with

Code:
void yield() { threads.yield(); }

Yes, that's exactly right. TeensyThreads is time-sliced preemptive, but if you set the time-slice to be very large, and always call threads.yield() before the time-slice expires, then it is cooperative. With yield() defined as you show, calls to functions in the Cores and Libraries will yield the CPU when they are waiting for something to happen.
 
I've noticed that delayMicroseconds() doesn't call yield(), not even once.
I've made a version based on delay() that calls yield(), but I had to lower the internal resolution to keep it working, that is, the loop runs for 10+ microseconds when running at 24 MHz (2+ microseconds could work, but it's a close one).
I can use F_CPU to find the CPU speed and adapt.

Is there a minimum CPU speed for the Teensy 4.0 and 4.1? Those don't have a compile-time F_CPU.
 
@AlainD
Delays should be avoided,
but could on some rare occasions be used when bit-banging timing-sensitive stuff.
In other cases, use a state machine together with hardware timers,
which avoids delays completely.

The exception is threads.delay(), which actually allows other stuff to run.



"Yield the CPU":
does that mean what I think it means,
i.e. halting the CPU?
 
These conversations about yield() are so difficult because yield() is a weak symbol that's meant to be overridden when someone wants to change from the simple event callbacks we have now to something else, like a cooperative system or a preemptive RTOS. What yield() will actually do isn't a fixed, known quantity.
 
These conversations about yield() are so difficult because yield() is a weak symbol that's meant to be overridden when someone wants to change from the simple event callbacks we have now to something else, like a cooperative system or a preemptive RTOS. What yield() will actually do isn't a fixed, known quantity.

I think all of the current uses of yield() in the Teensy cores are consistent with the purpose of yield() as stated in the comment in the Arduino core, to perform a cooperative task switch. If we only look for places to add calls to yield() that are also consistent with that purpose, I don't think we will introduce any new difficulty for EventResponder. If someone is using TeensyThreads in a preemptive mode, then yield() would not be calling threads.yield(). This is only relevant for cooperative task switching.
 
Teensy 4 is quite new,
and taking the pandemic into account,
there have not been enough resources
to fix the things that are missing,
i.e. the different calls to yield().

@joepasquariello
The main loop actually has a call to yield(),
but I now agree that it would be nice
if the calls to yield() were the same as in Teensy 3.x.

It's at least better than just staying in the loops and 'twiddling your thumbs'.
 
Yes, delayMicroseconds() should be used for very short or very precise delays, otherwise use delay().

Sometimes a delay of 1 ms is too long and 100-200 microseconds would be enough, without the need for a very precise delay. In that case I prefer to have a few calls to yield().
 
@AlainD
Delays should be avoided,
but could on some rare occasions be used when bit-banging timing-sensitive stuff.
In other cases, use a state machine together with hardware timers,
which avoids delays completely.

The exception is threads.delay(), which actually allows other stuff to run.

A state machine is very powerful, but if the goal is to take 3-5 readings to be able to get a median of 3 or 5, it's often overkill.
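For comparison, a state-machine version of the "take five readings and use the median" case might look like this. It is a host-side sketch: time is passed in as `nowUs` (on a Teensy it would come from micros() or a hardware timer), and `MedianSampler` and the `readSensor` callback are hypothetical names. Each call to `step()` returns immediately, so nothing ever blocks, at the cost of the extra bookkeeping the reply above calls overkill.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Non-blocking "take 5 readings spaced intervalUs apart, then report the median".
class MedianSampler {
public:
    explicit MedianSampler(uint32_t intervalUs) : interval_(intervalUs) {}

    // Begin a new batch; the first reading is taken on the next step().
    void start(uint32_t nowUs) { state_ = Sampling; count_ = 0; last_ = nowUs - interval_; }

    // Call as often as you like; returns true once the result is ready.
    bool step(uint32_t nowUs, int (*readSensor)()) {
        if (state_ != Sampling) return state_ == Done;
        if (nowUs - last_ < interval_) return false;  // not time yet: return, don't wait
        last_ = nowUs;
        samples_[count_++] = readSensor();
        if (count_ == 5) {
            std::sort(samples_, samples_ + 5);
            state_ = Done;
            return true;
        }
        return false;
    }

    int median() const { return samples_[2]; }

private:
    enum State { Idle, Sampling, Done };
    State state_ = Idle;
    uint32_t interval_;
    uint32_t last_ = 0;
    int count_ = 0;
    int samples_[5] = {};
};
```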
 
I've cleaned up my delayMicrosecondsWithYield and added some extra test functions.
Unfortunately, things like F_CPU_ACTUAL and ARM_DWT_CYCCNT are considered non-public, so those are removed and only the original function for the Teensy LC is left.
The following code was written for the Teensy LC, but also runs on the 3.6 and 4.1 (and probably the others as well).
It seems even more accurate than the library delayMicroseconds() on the LC and for longer periods. For short delays (a few microseconds), the overhead of micros() is too high.

Code:
inline void delayMicrosecondsWithYieldLC(uint32_t usec1)
{
  const uint32_t start = micros();  // a call to micros() is about 36 cycles, or about 1.5 usec at 24 MHz
  #if((defined F_CPU) && (F_CPU >= 48000000))  // at 48 MHz and above a smaller reserve suffices
    const uint32_t uSecReserve = 15u;  // We take this nr of reserve usec for the last call to yield
  #else
    const uint32_t uSecReserve = 25u;  // We take this nr of reserve usec for the last call to yield
  #endif
  
  // It will not call yield() when less than uSecReserve usec are left (24x that number of instructions at 24 MHz).
  // This function will very often be accurate within 3 usec when running faster than 24 MHz, except for very long yield() calls.
  // For very short durations the calls to micros() add extra delay, especially at 24 MHz ...
  // F_CPU_ACTUAL Teensy 4/4.1 variable, not #define and for internal use --> not public; F_CPU is defined on Teensy 4.1

  if (usec1 > 0)
  {
    #if((defined F_CPU) && (F_CPU <= 24000000))  // first call to micros() takes more than 1 usec
      --usec1;
    #endif
    while (micros() - start + uSecReserve < usec1)
    {
      yield();
    };
    while (micros() - start < usec1);
  }
};

void Test1delay(unsigned int teller2)
{
  unsigned int microseconds;
  unsigned int totalmicros;
  unsigned int totalmicros2;
  
  microseconds = micros();
  delayMicroseconds(teller2);
  totalmicros = micros() - microseconds;
  microseconds = micros();
  delayMicrosecondsWithYieldLC(teller2);
  totalmicros2 = micros() - microseconds;
  if ((totalmicros2 < teller2) || (totalmicros2 > teller2 + 2u) )  // || (totalmicros2< (totalmicros - 3u))|| (totalmicros2> (totalmicros + 3u))
  {      
    Serial.print(teller2);
    Serial.print(':');
    Serial.print(totalmicros);
    Serial.print('_');
    Serial.print(totalmicros2);
    Serial.print(' ');
  };
};

void TestdelayMicrosecondsWithYield(void)
{
#ifndef F_CPU
  Serial.print('A');
  Serial.print(0);
#else
  Serial.print('_');
  Serial.print(F_CPU);
#endif
  Serial.print(' ');


  unsigned int microseconds;
  unsigned int totalmicros;

  microseconds = micros();
  delay(1);
  totalmicros = micros() - microseconds;
  Serial.print(totalmicros);
  Serial.print(' ');
  
  Test1delay(1000);

  for (unsigned int teller2 = 1; (teller2 <= 19); teller2 = teller2 + 1)
  {
    Test1delay(teller2);
  };

  for (unsigned int teller2 = 20; (teller2 <= 2999); teller2 = teller2 + 17)
  {
    Test1delay(teller2);
  };
  Serial.println(' ');
};

// the setup routine runs once when you press reset:
void setup() {
  // initialize serial communication at 9600 bits per second:
  Serial.begin(9600);
  pinMode(LED_BUILTIN, OUTPUT);
  delay(500); // Delay 500 ms
}

// the loop routine runs over and over again forever:
void loop() {
  digitalWriteFast(LED_BUILTIN, HIGH);
  delayMicrosecondsWithYieldLC(500);
  digitalWriteFast(LED_BUILTIN, !digitalRead(LED_BUILTIN));
  TestdelayMicrosecondsWithYield();
  delayMicrosecondsWithYieldLC(500000);
}
 
OK, so I've just found a case where randomly sprinkling yield() calls around deeply-nested library code is a major pain for the application writer.

I'm using EventResponder to re-fill audio buffers from SD card, without the user having to remember to write extra code; if triggered, then the responder loads the buffer at the next yield(), ideally at the end of loop(), but also if called explicitly, or implicitly using delay(). This all works very well, by and large.

Now along comes the SdFat library, and right in the middle of deep SD card voodoo, waiting for the card to Do Stuff, there's a yield() call. "Not ready? See if someone else can execute, then". Great idea ... but it never tells "someone else" that it can "do whatever you want, except don't touch the SD card because that's where this yielded from". So the responder goes merrily ahead, tries to access the SD card, and the world falls apart.

I'm not sure what a good general solution to this looks like. For now, I'm going to make a PR for the SdFat library, and maybe others, simply not to run EventResponder in their internal yield() calls. But it feels a bit crude...
 
This is a topic near and dear to my heart. In my opinion, the problem is this statement:

Code:
I'm using EventResponder to re-fill audio buffers from SD card, without the user having
to remember to write extra code

Arduino in general, including Teensy, has a "weak" and "empty" yield() function by default, and the documented purpose of this function is to be a placeholder for a cooperative task switch. EventResponder overrides yield(), so when you are using EventResponder, every call to yield() in a 3rd-party library will call the EventResponder version of yield().

There is no way for Bill Greiman (SdFat) to "not run EventResponder in their internal yield()", because it's not running an internal yield. It's simply calling yield() the way it was intended, which is to provide an opportunity to switch to another task and do something else while SdFat is waiting for some hardware action to complete. When you use EventResponder, you are overriding the default, weak yield(), so at link time, every call to yield() in Teensy and SdFat and every other library is replaced with a call to the EventResponder version of yield().

The solution is to stop trying to use yield() to prevent users from having to "do anything" to make their code work. I've been working through this issue recently with QNEthernet, which overrides yield(), and I've been meaning to write up the issue for @shawn .

My personal opinion is that EventResponder should not override yield(), but rather that it should require the user to explicitly call a differently-named function.

Code:
Now along comes the SdFat library, and right in the middle of deep SD card voodoo, waiting for
the card to Do Stuff, there's a yield() call. "Not ready? See if someone else can execute, then". Great idea ...
but it never tells "someone else" that it can "do whatever you want, except don't touch the SD card because
that's where this yielded from". So the responder goes merrily ahead, tries to access the SD card, and the
world falls apart.

I use a cooperative OS on Teensy, so from my perspective there is only so much that can be done to avoid users needing to understand how their system works. If you consider SdFat, or any library that calls yield(), as part of a cooperative system, access to shared resources, such as a UART or the SPI bus or the SD card, must be managed. The simplest way to do that is to design the system so that a given resource is only accessed by one task. Another way is to use a mutex or some other such protective device.
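In a cooperative system a task can only lose the CPU at a yield(), so a very simple lock is enough to protect a shared resource such as the SD card. This sketch assumes a single core, no preemption, and no interrupt handler touching the lock; `CoopMutex` is a hypothetical name, and the `yieldFn` parameter stands in for whatever yield() your scheduler provides.

```cpp
#include <cassert>

// Minimal lock for a cooperative (non-preemptive) scheduler.
// Check-then-set is safe only because no task switch can happen between
// the check and the set; do not use this with preemption or from ISRs.
class CoopMutex {
public:
    bool tryLock() {
        if (locked_) return false;
        locked_ = true;
        return true;
    }
    // Waits cooperatively: other tasks run while we wait for the owner.
    void lock(void (*yieldFn)()) {
        while (!tryLock()) yieldFn();
    }
    void unlock() { locked_ = false; }
    bool isLocked() const { return locked_; }
private:
    bool locked_ = false;
};
```

A task that wants the SD card would call `lock()` before touching it and `unlock()` afterwards; a library like SdFat yielding mid-operation is then harmless, because any other task that wants the card simply yields again until it is released.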

My recommendation would be to change EventResponder to not override yield(), but rather to require the user to call it explicitly, and for yield() to be left for its original purpose as an optional cooperative task switch.

If anyone does override yield(), they should keep in mind that they should also provide the alternative of making explicit calls to their replacement function, so that those who are using yield() as a cooperative task switch can continue to do so and still be able to use their library, such as QNEthernet.
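One way to follow this advice, sketched with hypothetical names (`NetLib`, `service()`, `setAutoServiceFromYield()`, `defaultYield()`): the library does its periodic work in an explicitly named function, and hooking that function into yield() is an opt-in convenience rather than the only path. This is not QNEthernet's or EventResponder's actual API.

```cpp
#include <cassert>

// Hypothetical library: periodic work lives in an explicitly named function.
class NetLib {
public:
    void service() { ++serviced_; }  // e.g. poll the hardware, run timers
    int servicedCount() const { return serviced_; }

    // Opt-in: let the default yield() call service() as a convenience.
    void setAutoServiceFromYield(bool on) { autoService_ = on; }
    bool autoService() const { return autoService_; }
private:
    int serviced_ = 0;
    bool autoService_ = false;
};

NetLib net;

// Stand-in for a core's default yield(). A cooperative OS would replace
// yield() with a task switch and call net.service() from its own task.
void defaultYield() {
    if (net.autoService()) net.service();
}
```

With this split, a user of a cooperative OS never depends on what yield() does: the explicit `service()` call always works.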
 
EventResponder doesn't override yield(), the default (weak) yield checks a flag and calls EventResponder::runFromYield() if it is set.

What would perhaps be nicer is having the default yield() called something else (e.g. default_yield()) and having "yield" be a weak alias to that function. That way "yield" could be overridden, but the default version could still be called by the replacement function, so core components like the serial events and EventResponder still work.
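The suggestion can be sketched with GCC's `weak` and `alias` attributes. This assumes GCC or Clang targeting ELF (as on ARM Teensy builds or Linux) and is not the actual Teensy core code; the counter exists only to make the behavior observable. `default_yield()` holds the core's work, and `yield` is a weak alias to it, so an application can supply a strong `yield()` of its own and still call `default_yield()` from it.

```cpp
#include <cassert>

extern "C" {

int defaultYieldCalls = 0;  // demo counter only

// The core's real work (serial events, EventResponder, ...) would go here.
void default_yield(void) {
    ++defaultYieldCalls;
}

// Weak alias: if no other definition of yield() is linked in, calls to
// yield() run default_yield(). A strong yield() elsewhere wins at link
// time, and can still call default_yield() explicitly.
void yield(void) __attribute__((weak, alias("default_yield")));

}  // extern "C"
```

A replacement in the application could then be something like `extern "C" void yield(void) { default_yield(); my_task_switch(); }`, where `my_task_switch()` is a placeholder for whatever your scheduler provides.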
 
The solution is to stop trying to use yield() to prevent users from having to "do anything" to make their code work. I've been working through this issue recently with QNEthernet, which overrides yield(), and I've been meaning to write up the issue for @shawn .

The QNEthernet library doesn't override yield(). I hook into the EventResponder system's yield hook. Look for attachLoopToYield().

Below that code, however, is a commented-out example of how to call Ethernet.loop() from an overridden yield() on those systems that don't have EventResponder. That's only an example, though; maybe that documentation could be updated.

I like @jmarsh's suggestion for how to improve these kinds of things.
 
The QNEthernet library doesn't override yield(). I hook into the EventResponder system's yield hook. Look for attachLoopToYield().

Below that code, however, is a commented-out example of how to call Ethernet.loop() from an overridden yield() on those systems that don't have EventResponder. That's only an example, though; maybe that documentation could be updated.

I like @jmarsh's suggestion for how to improve these kinds of things.
Hi Shawn. You're right. When I said that QNEthernet overrides yield(), I should have said that QNEthernet relies on yield() being overridden by EventResponder. I'm not remembering right now the exact issue and what I had to do as a work-around, but it took some doing to figure out how to use QNEthernet with a cooperative OS that overrides yield().

@jmarsh, EventResponder overrides yield() in the sense that Paul has replaced the default Arduino yield(), which is both weak and empty, with a yield() that is weak but not empty. This allows users, such as me, to override EventResponder's yield() with a non-weak yield() that does a cooperative task switch.

@shawn, I disagree. yield() is meant to be used as a cooperative task switch, and knowing that it will be is what allows third-party library developers to insert yield() where a cooperative task switch would be appropriate, such as when waiting for some hardware action to complete. For Teensy to appropriate yield() for its own use would work for those using EventResponder, but it would break all of the third-party libraries that are using yield() the way it was intended.

EventResponder is simply an alternative to a cooperative OS.
 
The solution is to stop trying to use yield() to prevent users from having to "do anything" to make their code work.
Well ... it's "a" solution. Possibly even the right one, in the end. But there must be some reason Paul wrote EventResponder :)
The QNEthernet library doesn't override yield(). I hook into the EventResponder system's yield hook. Look for attachLoopToYield().
There you go ... that's two of us at it now!
What would perhaps be nicer, is having the default yield() called something else (i.e. default_yield() ) and have "yield" be a weak alias to that function. That way "yield" could be overridden but the default version could still be called by the replacement function, so core components like the serial events and EventResponder still work.
Yes, that would be the correct way to do it. There's still the issue that if two or more entities wanted to override it, there would be a clash ... they'd have to have a list to link into so everyone got their turn ... and you're back to EventResponder again.

Rather than my brutal approach of disallowing all EventResponder processing from SDIOTeensy.cpp, an extension to EventResponder could be to mask some pending events if the "calling" yield() would clash. So, I attach(responder, MASK_SD_ACCESS); SDIOTeensy.cpp calls masked_yield(MASK_SD_ACCESS); and EventResponder::runFromYield() skips my triggered event as it runs down its list.
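The masking idea can be sketched as a tiny event list. Everything here is hypothetical (`Responder`, `attach()`, `trigger()`, `maskedYield()`, `MASK_SD_ACCESS`); it is not the real EventResponder API, just a model of "skip triggered events whose resources overlap what the caller is in the middle of using". A skipped event stays pending and runs at a later, unmasked yield().

```cpp
#include <cassert>
#include <cstdint>

constexpr uint32_t MASK_NONE = 0;
constexpr uint32_t MASK_SD_ACCESS = 1u << 0;  // hypothetical resource bit

struct Responder {
    void (*fn)() = nullptr;
    uint32_t uses = MASK_NONE;  // resources this handler touches
    bool pending = false;
    Responder *next = nullptr;
};

static Responder *head = nullptr;  // singly linked list of responders

void attach(Responder &r, void (*fn)(), uint32_t uses) {
    r.fn = fn;
    r.uses = uses;
    r.next = head;
    head = &r;
}

void trigger(Responder &r) { r.pending = true; }

// Run pending handlers, except those that use a resource in busyMask.
void maskedYield(uint32_t busyMask) {
    for (Responder *r = head; r; r = r->next) {
        if (r->pending && (r->uses & busyMask) == 0) {
            r->pending = false;
            r->fn();
        }
    }
}

void yield() { maskedYield(MASK_NONE); }  // ordinary yield: nothing busy
```

SD driver code waiting on the card would call `maskedYield(MASK_SD_ACCESS)` instead of `yield()`, which is exactly the extra cooperation from library writers that the next paragraph questions.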

To be perfectly honest, I'm of the opinion that yield() calls should under no circumstances be buried in hardware library code. That is absolutely contrary to "where a cooperative task switch would be appropriate". The only way round that would be to compel library writers who wish to do that, to also provide an isBusy() function so that it's possible to avoid switching to a task that would clash. But then my EventResponder code would have to re-trigger itself when it found it couldn't run...
 
Hi Shawn. You're right. When I said that QNEthernet overrides yield(), I should have said that QNEthernet relies on yield() being overridden by EventResponder. I'm not remembering right now the exact issue and what I had to do as a work-around, but it took some doing to figure out how to use QNEthernet with a cooperative OS that overrides yield().

That’s not quite the intent. I don’t rely on yield() being overridden; I rely on something being called regularly. It so happens that EventResponder provides that feature by making sure something is called every time loop() finishes without the user having to do anything. The fact that it uses yield() to do that is an EventResponder internal detail.

I do use yield() internally when the user needs to wait for something, which aligns with your description.

@shawn, I disagree. yield() is meant to be used as a cooperative task switch, and knowing that it will be is what allows third-party library developers to insert yield() where a cooperative task switch would be appropriate, such as when waiting for some hardware action to complete. For Teensy to appropriate yield() for its own use would work for those using EventResponder, but it would break all of the third-party libraries that are using yield() the way it was intended.

What are you disagreeing with? I never said that yield() isn’t for cooperative task switching. I don’t disagree with that. In fact, I use yield() internally (specifically in user-called functions where the user wants to wait) with this intent.

Maybe I could change the name of that setup function from “attachLoopToYield” to something that doesn’t use the word “yield” to make it more clear that I’m not actually desiring “yield” specifically, per se.

EventResponder is simply an alternative to a cooperative OS.

I’m open to suggestions on how to align with your approach. I’d love to see what you’ve changed to make it compatible with that idea.
 
To be perfectly honest, I'm of the opinion that yield() calls should under no circumstances be buried in hardware library code. That is absolutely contrary to "where a cooperative task switch would be appropriate". The only way round that would be to compel library writers who wish to do that, to also provide an isBusy() function so that it's possible to avoid switching to a task that would clash. But then my EventResponder code would have to re-trigger itself when it found it couldn't run...

What about those cases, for example in the QNEthernet library, where a user-called function needs to wait for something? I think it’s an appropriate use of yield() inside the “wait for condition” loop. But on the other hand, this is user-called and not “buried” in the internals, so I guess that does align with your opinion…
 
Yes, that would be the correct way to do it. There's still the issue that if two or more entities wanted to override it, there would be a clash ... they'd have to have a list to link into so everyone got their turn ... and you're back to EventResponder again.
Yes, this is similar to the problem of more than one entity wanting to override the startup hook functions. Which I came up with a solution for: create a const pointer to the function and ensure the backing storage for that pointer is located in a certain program section. Then it's trivial to walk through each pointer in the section and call each function.
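A rough sketch of that idea applied to yield() hooks, assuming GCC or Clang with a GNU-compatible ELF linker: for a section whose name is a valid C identifier, the linker provides `__start_<name>` and `__stop_<name>` symbols, so every file can drop a const function pointer into the section and one function walks them all. The section name `yield_hooks`, the `YIELD_HOOK` macro and `run_yield_hooks()` are hypothetical, and the order in which hooks run is not defined.

```cpp
#include <cassert>

typedef void (*hook_fn)(void);

// Each hook is a const pointer placed in the "yield_hooks" section.
// 'used' stops the compiler from discarding the unreferenced pointer;
// 'aligned' keeps the entries packed like an array.
#define YIELD_HOOK(fn) \
    static const hook_fn hook_ptr_##fn \
        __attribute__((section("yield_hooks"), used, aligned(sizeof(hook_fn)))) = fn

// Generated by the linker for the section above.
extern "C" const hook_fn __start_yield_hooks[];
extern "C" const hook_fn __stop_yield_hooks[];

// Call every registered hook; there is no central list to edit.
void run_yield_hooks(void) {
    for (const hook_fn *p = __start_yield_hooks; p < __stop_yield_hooks; ++p) {
        (*p)();
    }
}

// Two independent "modules" registering hooks.
static int serialEvents = 0, netPolls = 0;
static void serial_hook(void) { ++serialEvents; }
static void net_hook(void) { ++netPolls; }
YIELD_HOOK(serial_hook);
YIELD_HOOK(net_hook);
```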
 