Thread: Use of yield() in Teensy Cores and Libraries

  1. #1

    Use of yield() in Teensy Cores and Libraries

    The recent thread "Thoughts on Handling Complexity" led to a discussion of cooperative multi-tasking and the use of yield(). In Arduino, yield() is an empty function defined as "weak" to allow replacement with a function that does a cooperative task switch. Here is the comment from the Arduino core:

    /* Empty yield() hook. This function is intended to be used by library writers
    * to build libraries/sketches that support cooperative threads. It's defined as
    * a weak symbol and can be redefined to implement a cooperative scheduler.
    */
    The calls to yield() in the latest version of the Teensy cores are listed below. All are made from within while loops, waiting either for a hardware operation to complete, or a timeout to expire. If using a cooperative RTOS, calling any of these functions from within a task will result in that task yielding the CPU if the function must wait for the operation to complete. If there is no need to wait, such as being able to put some number of bytes into a TX buffer, the function will return without yielding.
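
    As a rough illustration of that wait-loop pattern, here is a minimal host-side sketch. It is not code from the Teensy cores: demo::yield() stands in for the weak yield() hook and only counts calls, and waitWithYield() and its tick-based timeout are hypothetical names invented for this example.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

namespace demo {

// Stand-in for the weak yield() hook. A cooperative scheduler would switch
// tasks here; this host-side sketch only counts how often it is called.
static unsigned yieldCalls = 0;
void yield() { ++yieldCalls; }

// Hypothetical helper: poll `ready` until it returns true or `timeoutTicks`
// polls have elapsed, yielding on every iteration where we have to wait.
// Returns true if the operation completed, false on timeout.
bool waitWithYield(const std::function<bool()>& ready, uint32_t timeoutTicks) {
    for (uint32_t tick = 0; tick < timeoutTicks; ++tick) {
        if (ready()) {
            return true;   // nothing (more) to wait for: return without yielding
        }
        yield();           // still waiting: give other tasks the CPU
    }
    return ready();        // one last check after the timeout expired
}

}  // namespace demo
```

    If the condition is already true on the first poll, the function returns without ever calling yield(), which matches the TX-buffer case described above.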

    I think this is the right model to maintain, so the only change I would suggest to the cores is to add yield() to analogRead() in Teensy4/analog.c, to be consistent with Teensy3/analog.c.

    The Teensyduino libraries contain many uses of yield(), too many to discuss in one thread, but I’m interested in SPI and I2C, so I’ll focus on those for now. The SPI library transfer functions currently do not call yield(), and I think they could. For I2C, the Teensy4 file WireIMXRT does call yield() while waiting. WireKinetis has timeouts, but does not call yield(), and I think it should.

    Code:
    Teensy (none)
    
    Teensy3
    
        analog.c (2)
            analogRead - wait for conversion complete
        main.cpp (1)
            main - after each execution of loop()
        pins_teensy.c (1)
            delay - wait for delay complete
        serial1.c, serial2.c (4)
            serial_end - wait for TX complete
            serial_putchar - wait for buffer to have room (without hardware FIFO)
            serial_write - wait for buffer to have room (with hardware FIFO)
            serial_flush - wait for TX complete
        serial3.c, serial4.c, serial5.c, serial6.c, serial6_lpuart.c (3)
            serial_end - wait for TX complete
            serial_putchar - wait for buffer to have room (no hardware FIFO)
            serial_flush - wait for TX complete
        Stream.cpp
            timedRead - wait for timeout (if no data available)
            timedPeek - wait for timeout (if no data available)
        usb_flightsim/joystick/keyboard/midi/mtp/rawhid/seremu/serial/serial2/serial3
            (generally wait for send complete or timeout on read)
    
    Teensy4 (same as Teensy3 with the exception of analogRead)
    
        delay.c (1)
            delay - wait for delay complete
        HardwareSerial.cpp (3, one class for HardwareSerial1-8)
            end - wait for TX complete
            flush - wait for TX complete
            write_9_bit - wait for buffer to have room
        main.cpp
            main - after each execution of loop()
        Stream.cpp
            timedRead - wait for timeout (if no data available)
            timedPeek - wait for timeout (if no data available)
        usb_flightsim/joystick/keyboard/midi/mtp/rawhid/seremu/serial/serial2/serial3
            (generally wait for send complete or timeout on read)

  2. #2
    Senior Member manicksan's Avatar
    Join Date
    Jun 2020
    Location
    Sweden
    Posts
    613
    Normally, for any hardware that takes a long time to complete,
    for example slow interfaces such as UART/I2C/SPI,
    you use both transmit and receive interrupts, and sometimes even DMA if available.

    For example, if we want to transmit something over UART:
    First, when the program starts, you allocate a buffer of, for example, 1024 bytes
    (or less, depending on how much memory is available).
    It could also be allocated dynamically when there is a need for it,
    depending on the requirements of the specific 'product'.

    1. You need to send something.
    2. The buffer is filled with the data.
    3. Transmit is activated, if not already running, and begins sending the first data from the buffer.
    4. The transmit interrupt is activated
    (sometimes it can be activated before the first data is sent, but the buffer needs to hold the data so the interrupt routine has something to send).
    The interrupt routine then automatically reads the buffer once the previous data has been sent
    and transmits the rest of the data in the background.
    When it has finished, or a transmit error has occurred (especially for I2C),
    it can set a flag so that the main program can see whether the transmit was successful.
    5. The main program runs and just checks the flags above; no wait is required.
    The main program can also fill the buffer (safest is to disable interrupts while doing so, to ensure 'atomic' access).
    A wait would only occur if the buffer is full and the main loop then needs to wait for it to empty
    (this state can be avoided by using a bigger buffer).
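
    The software FIFO behind these steps can be sketched as a small ring buffer. This is a host-side illustration only, not the buffer used by the Teensy serial drivers; the class name TxFifo and its methods are hypothetical. On real hardware put() would run in the main program and get() in the transmit interrupt, and interrupts may need to be disabled around the index updates as noted in step 5.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

namespace demo {

// Fixed-size software FIFO (ring buffer). One slot is always left empty so
// that "full" and "empty" can be told apart; usable capacity is N - 1 bytes.
template <std::size_t N>
class TxFifo {
    static_assert(N >= 2, "need at least two slots");

public:
    bool empty() const { return head_ == tail_; }
    bool full() const { return next(head_) == tail_; }

    // Main program side: returns false instead of waiting when the buffer is full.
    bool put(uint8_t byte) {
        if (full()) {
            return false;
        }
        data_[head_] = byte;
        head_ = next(head_);
        return true;
    }

    // Transmit-interrupt side: fetch the next byte to send, if there is one.
    bool get(uint8_t& byte) {
        if (empty()) {
            return false;
        }
        byte = data_[tail_];
        tail_ = next(tail_);
        return true;
    }

private:
    static std::size_t next(std::size_t i) { return (i + 1) % N; }

    uint8_t data_[N] = {};
    volatile std::size_t head_ = 0;  // only written by put()
    volatile std::size_t tail_ = 0;  // only written by get()
};

}  // namespace demo
```

    A caller that gets false from put() is in exactly the "buffer is full" state from step 5, which is where a wait loop (and a call to yield()) would go.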

    A more advanced alternative to the peripheral-specific interrupt system is DMA.
    Transmits, especially for UART, can then be done without any CPU interaction,
    and an interrupt normally occurs only at the end of the DMA transfer.


    The ADC also has an interrupt system, so no waits are needed for the conversion to complete.
    Normally you might not use interrupts for a simple ADC read,
    but when doing continuous readings you could use ADC interrupts
    to free the main loop from unnecessary waits,
    again by using a buffer system:

    1. Have a buffer to store, for example, 16 ADC readings.
    2. Activate ADC interrupts.
    3. Start an ADC read.
    4. The ADC interrupt routine takes the newly read value and puts it into the buffer,
    then starts a new ADC read unless the buffer is full. Two flags can be set: read complete and buffer full.
    5. The main loop only reads the flags and processes the data; still no unnecessary waits are required.
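
    The ADC steps above can be modelled on a host PC as follows. This is only a sketch: no real ADC registers are touched, and AdcCapture, startConversion() and onAdcInterrupt() are hypothetical names. The `result` parameter stands in for the value an interrupt handler would read from the ADC result register.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

namespace demo {

constexpr std::size_t kSamples = 16;  // buffer size from step 1

struct AdcCapture {
    volatile uint16_t samples[kSamples] = {};
    volatile std::size_t count = 0;
    volatile bool readComplete = false;       // at least one new reading arrived
    volatile bool bufferFull = false;         // all kSamples slots are filled
    volatile bool conversionRunning = false;  // models the ADC being busy
};

// Hypothetical stand-in for writing the "start conversion" register (step 3).
void startConversion(AdcCapture& adc) { adc.conversionRunning = true; }

// What a conversion-complete interrupt handler might do (step 4).
void onAdcInterrupt(AdcCapture& adc, uint16_t result) {
    adc.conversionRunning = false;
    if (adc.count < kSamples) {
        adc.samples[adc.count] = result;
        adc.count = adc.count + 1;
        adc.readComplete = true;
    }
    if (adc.count == kSamples) {
        adc.bufferFull = true;   // buffer full: do not start another conversion
    } else {
        startConversion(adc);    // keep sampling in the background
    }
}

}  // namespace demo
```

    The main loop would then only test readComplete and bufferFull (step 5) and never spin on the conversion-complete bit.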


    How the main loop then takes care of multiple tasks is another chapter.



    Note:
    when I'm talking about a buffer, I mean a software FIFO buffer.

  3. #3
    Thanks, @Manicksan. You're right that DMA and interrupts provide other ways to avoid waiting. The question I'm asking is where calls to yield() would be useful if someone is using a cooperative RTOS. Functions related to using peripherals with DMA or completion interrupts would not have calls to yield().

  4. #4
    Senior Member manicksan's Avatar
    Join Date
    Jun 2020
    Location
    Sweden
    Posts
    613
    Have you seen TeensyThreads?
    It uses a real task-switching threads.yield() function.

    That means you could write a read-ADC task.
    But then I thought about it once more:
    no, this would still wait for the ADC conversion to complete.

    Then, by having a yield() call inside analogRead()'s wait loop, while (!(ADCx_HS & ADC_HS_COCO0)) { yield(); },
    as you want,
    we can override it with

    Code:
    void yield() {
        threads.yield();
    }
    if using TeensyThreads


    and here is the task
    Code:
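    #include <TeensyThreads.h>

    volatile int doRead = 0;    // set to 1 by another task to request a reading
    volatile int adcValue = 0;  // most recent result

    // Thread body: take one reading whenever requested, then give up the CPU.
    void readADC_task() {
        while (1) {
            if (doRead) {
                doRead = 0;
                adcValue = analogRead(14); // A0
            }
            threads.yield();
        }
    }

    void setup() {
        threads.addThread(readADC_task);
    }

    void loop() {
    }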

  5. #5
    Quote Originally Posted by manicksan
    have you seen TeensyThreads that is utilizing a real task switching threads.yield() function we can override it with

    Code:
    void yield() { threads.yield(); }
    Yes, that's exactly right. TeensyThreads is time-sliced preemptive, but if you set the time-slice to be very large, and always call threads.yield() before the time-slice expires, then it is cooperative. With yield() defined as you show, calls to functions in the Cores and Libraries will yield the CPU when they are waiting for something to happen.

  6. #6
    Senior Member
    Join Date
    Dec 2016
    Location
    EU
    Posts
    218
    I've noticed that delayMicroseconds() doesn't call yield(), not even once.
    I've made a version based on delay() that calls yield(), but I lowered the internal resolution to keep it working, i.e. it runs the loop for 10+ microseconds when running at 24 MHz (2+ microseconds could work, but it's a close call).
    I can use F_CPU to find the CPU speed and adapt.

    Is there a minimum CPU speed for the Teensy 4.0 and 4.1? Those don't have a compile-time F_CPU.

  7. #7
    Yes, delayMicroseconds() should be used for very short or very precise delays, otherwise use delay().

  8. #8
    Senior Member manicksan's Avatar
    Join Date
    Jun 2020
    Location
    Sweden
    Posts
    613
    @AlainD
    Delays should be avoided,
    but could on rare occasions be used when bit-banging timing-sensitive stuff.
    In other cases, use a state machine together with hardware timers,
    which avoids delays completely,

    except threads.delay(), which actually allows other stuff to run.
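
    For illustration, a delay-free state machine can be as small as the sketch below. It is a host-side example with hypothetical names (Blinker, update()); the caller passes in the current time, which on a Teensy could come from millis() or a hardware timer. Using a polled time value instead of a timer interrupt is an assumption made to keep the example short.

```cpp
#include <cassert>
#include <cstdint>

namespace demo {

// Non-blocking blinker: each call to update() does a tiny amount of work and
// returns immediately, so the main loop never sits in a delay.
class Blinker {
public:
    explicit Blinker(uint32_t halfPeriodMs) : halfPeriodMs_(halfPeriodMs) {}

    // `nowMs` is the current time in milliseconds. Returns the LED state.
    bool update(uint32_t nowMs) {
        // Unsigned subtraction stays correct when the time counter wraps.
        if (nowMs - lastToggleMs_ >= halfPeriodMs_) {
            lastToggleMs_ = nowMs;
            ledOn_ = !ledOn_;
        }
        return ledOn_;
    }

private:
    uint32_t halfPeriodMs_;
    uint32_t lastToggleMs_ = 0;
    bool ledOn_ = false;
};

}  // namespace demo
```

    In a sketch, loop() would call update(millis()) on every pass and write the returned state to the LED pin, leaving the rest of the loop free for other work.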



    Quote Originally Posted by joepasquariello
    yield the CPU
    Does that mean what I think it means,
    i.e. halting the CPU?

  9. #9
    No, I mean yielding to the next task.

  10. #10
    Senior Member PaulStoffregen's Avatar
    Join Date
    Nov 2012
    Posts
    27,685
    These conversations about yield() are so difficult because yield() is a weak symbol that's meant to be overridden when someone wants to change from the simple event callbacks we have now to something else, like a cooperative system or a preemptive RTOS. What yield() will actually do isn't a fixed known quantity.

  11. #11
    Quote Originally Posted by PaulStoffregen
    These conversations about yield() are so difficult because yield() is a weak symbol that's meant to be overridden when someone wants to change from the simple event callbacks we have now to something else, like a cooperative system or a preemptive RTOS. What yield() will actually do isn't a fixed known quantity.
    I think all of the current uses of yield() in the Teensy cores are consistent with the purpose of yield() as stated in the comment in the Arduino core, to perform a cooperative task switch. If we only look for places to add calls to yield() that are also consistent with that purpose, I don't think we will introduce any new difficulty for EventResponder. If someone is using TeensyThreads in a preemptive mode, then yield() would not be calling threads.yield(). This is only relevant for cooperative task switching.

  12. #12
    Senior Member manicksan's Avatar
    Join Date
    Jun 2020
    Location
    Sweden
    Posts
    613
    Teensy 4 is quite new,
    and taking the pandemic into account,
    there have not been enough resources
    to fix things that are missing,
    i.e. the different calls to yield().

    @joepasquariello
    The main loop actually has a call to yield().
    But I can now agree that it would be nice
    if the calls to yield() were the same as in Teensy 3.x.

    It's at least better than just staying in the loops and 'twiddling your thumbs'.

  13. #13
    Senior Member
    Join Date
    Dec 2016
    Location
    EU
    Posts
    218
    Quote Originally Posted by joepasquariello
    Yes, delayMicroseconds() should be used for very short or very precise delays, otherwise use delay().
    Sometimes a delay of 1 ms is too long and 100-200 microseconds would be enough, without the need for a very precise delay. In that case I prefer to have a few calls to yield().

  14. #14
    Senior Member
    Join Date
    Dec 2016
    Location
    EU
    Posts
    218
    Quote Originally Posted by manicksan
    @AlainD
    Delays should be avoided,
    but could on rare occasions be used when bit-banging timing-sensitive stuff.
    In other cases, use a state machine together with hardware timers,
    which avoids delays completely,

    except threads.delay(), which actually allows other stuff to run.
    A state machine is very powerful, but if the goal is to take 3-5 readings to be able to get a median of 3 or 5, it's often overkill.
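
    For reference, the median step itself is short. The sketch below is a generic host-side example rather than code from this thread; medianOf() is a hypothetical helper, and how the 3 or 5 readings are collected (with or without yield() between them) is left to the caller.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

namespace demo {

// Median of an odd number of readings. The array is taken by value, so the
// caller's samples are left in their original order.
template <std::size_t N>
uint16_t medianOf(std::array<uint16_t, N> readings) {
    static_assert(N % 2 == 1, "use an odd number of readings");
    auto middle = readings.begin() + N / 2;
    // Partially sorts so that the middle element is the one a full sort would put there.
    std::nth_element(readings.begin(), middle, readings.end());
    return *middle;
}

}  // namespace demo
```

    With three readings such as 512, 1023 and 510, the outlier 1023 is discarded and 512 is returned.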

  15. #15
    Senior Member PaulStoffregen's Avatar
    Join Date
    Nov 2012
    Posts
    27,685
    I've updated analogRead() on Teensy 3 and 4 to (hopefully) be consistent in calling yield().

    https://github.com/PaulStoffregen/co...db43f3433c6f78

  16. #16
    Senior Member
    Join Date
    Dec 2016
    Location
    EU
    Posts
    218
    I've cleaned up my delayMicrosecondsWithYield and added some extra test functions.
    Unfortunately, things like F_CPU_ACTUAL and ARM_DWT_CYCCNT are considered non-public, so those are removed and only the original function for Teensy LC is left.
    The following code was written for the Teensy LC, but also runs on 3.6 and 4.1 (and probably the others too).
    It seems even more accurate than the library delayMicroseconds() on LC and for longer periods. For short delays (a few microseconds) the overhead of micros() is too high.

    Code:
    inline void delayMicrosecondsWithYieldLC(uint32_t usec1)
    {
      const uint32_t start = micros();  // a call to micros() takes about 36 cycles, or about 1.5 usec at 24 MHz
      #if((defined F_CPU) && (F_CPU >= 48000000))  // at 48 MHz and above a smaller reserve is used
        const uint32_t uSecReserve = 15u;  // number of usec held in reserve after the last call to yield()
      #else
        const uint32_t uSecReserve = 25u;  // number of usec held in reserve after the last call to yield()
      #endif
      
      // yield() is not called when less than uSecReserve usec are left (about 24x that many instructions at 24 MHz).
      // This function is usually accurate to within 3 usec when running faster than 24 MHz, except when yield() calls take very long.
      // For very short durations the calls to micros() add extra delay, especially at 24 MHz ...
      // F_CPU_ACTUAL is a Teensy 4/4.1 variable, not a #define, and is for internal use (not public); F_CPU is defined on Teensy 4.1
    
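      if (usec1 > 0)
      {
        #if((defined F_CPU) && (F_CPU <= 24000000))  // first call to micros() takes more than 1 usec
          --usec1;
        #endif
        // Phase 1: yield to other tasks while more than uSecReserve usec remain.
        while (micros() - start + uSecReserve < usec1)
        {
          yield();
        }
        // Phase 2: busy-wait for the remainder so the delay ends on time.
        while (micros() - start < usec1) {}
      }
    }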
    
    void Test1delay(unsigned int teller2)
    {
      unsigned int microseconds;
      unsigned int totalmicros;
      unsigned int totalmicros2;
      
      microseconds = micros();
      delayMicroseconds(teller2);
      totalmicros = micros() - microseconds;
      microseconds = micros();
      delayMicrosecondsWithYieldLC(teller2);
      totalmicros2 = micros() - microseconds;
      if ((totalmicros2 < teller2) || (totalmicros2 > teller2 + 2u) )  // || (totalmicros2< (totalmicros - 3u))|| (totalmicros2> (totalmicros + 3u))
      {      
        Serial.print(teller2);
        Serial.print(':');
        Serial.print(totalmicros);
        Serial.print('_');
        Serial.print(totalmicros2);
        Serial.print(' ');
      };
    };
    
    void TestdelayMicrosecondsWithYield(void)
    {
    #ifndef F_CPU
      Serial.print('A');
      Serial.print(0);
    #else
      Serial.print('_');
      Serial.print(F_CPU);
    #endif
      Serial.print(' ');
    
    
      unsigned int microseconds;
      unsigned int totalmicros;
    
      microseconds = micros();
      delay(1);
      totalmicros = micros() - microseconds;
      Serial.print(totalmicros);
      Serial.print(' ');
      
      Test1delay(1000);
    
      for (unsigned int teller2 = 1; (teller2 <= 19); teller2 = teller2 + 1)
      {
        Test1delay(teller2);
      };
    
      for (unsigned int teller2 = 20; (teller2 <= 2999); teller2 = teller2 + 17)
      {
        Test1delay(teller2);
      };
      Serial.println(' ');
    };
    
    // the setup routine runs once when you press reset:
    void setup() {
      // initialize serial communication at 9600 bits per second:
      Serial.begin(9600);
      pinMode(LED_BUILTIN, OUTPUT);
      delay(500); // Delay 500 ms
    }
    
    // the loop routine runs over and over again forever:
    void loop() {
      digitalWriteFast(LED_BUILTIN, HIGH);
      delayMicrosecondsWithYieldLC(500);
      digitalWriteFast(LED_BUILTIN, !digitalRead(LED_BUILTIN));
      TestdelayMicrosecondsWithYield();
      delayMicrosecondsWithYieldLC(500000);
    }
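
    To see the two-phase structure of delayMicrosecondsWithYieldLC without hardware, here is a host-side model. It is a simplified sketch, not a replacement for the code above: FakeClock is a hypothetical simulated timer in which every read of micros() costs 1 usec and every yield() costs yieldCostUs, standing in for other tasks running. The F_CPU-specific adjustments are left out.

```cpp
#include <cassert>
#include <cstdint>

namespace demo {

// Simulated microsecond clock for host-side experiments.
struct FakeClock {
    uint32_t nowUs = 0;
    uint32_t yieldCostUs = 10;  // assumed time other tasks take per yield()
    unsigned yieldCalls = 0;

    uint32_t micros() { return nowUs++; }  // each read advances time by 1 usec
    void yield() {
        nowUs += yieldCostUs;
        ++yieldCalls;
    }
};

// Same structure as the Teensy function: yield while more than `reserveUs`
// remain, then busy-wait the rest so a slow yield() cannot overshoot much.
void delayWithYield(FakeClock& clk, uint32_t usec, uint32_t reserveUs) {
    const uint32_t start = clk.micros();
    while (clk.micros() - start + reserveUs < usec) {
        clk.yield();
    }
    while (clk.micros() - start < usec) {
        // spin: too close to the deadline to risk another yield()
    }
}

}  // namespace demo
```

    With a 100 usec delay and a 25 usec reserve the model yields a handful of times and then spins to the deadline; with a 10 usec delay it never yields at all, which mirrors the behaviour described for short delays.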
