
Thread: T4.x (maybe other) reduce serialEventX overhead on system (worth it?)

  1. #1
    Senior Member+ KurtE's Avatar
    Join Date
    Jan 2014
    Posts
    6,921

    T4.x (maybe other) reduce serialEventX overhead on system (worth it?)

    While I was hacking on (carefully crafting) changes to how yield works, as per the other thread, I came back to the code for T4.x. There I would blindly enable yield to call the serialEventX function for every Serial port you did a begin on, and then remove the call the first time our default implementation was called, as the default serves no useful function.

    I did that because I did not know of a way to find out whether the sketch has provided its own serialEvent implementation.

    Example sketches that do things like:
    Code:
    void serialEvent1() {
        while (Serial1.available()) Serial.write(Serial1.read());
    }
    But I think I figured out a way of detecting this...

    What I just tried was to extract the default implementation for serialEvent1 out of HardwareSerial1.cpp and put it into a new file
    serialEvent1.cpp.

    Now in HardwareSerial.cpp I add the following:
    Code:
    //void serialEvent1() __attribute__((weak));
    //void serialEvent1() {Serial1.disableSerialEvents(); }		// No use calling this so disable if called...
    uint8_t serialEvent1_default __attribute__((weak)) PROGMEM = 0 ;
    and in the new serialEvent1.cpp the file has:
    Code:
    #include <Arduino.h>
    #include "HardwareSerial.h"
    void serialEvent1() __attribute__((weak));
    void serialEvent1() {Serial1.disableSerialEvents(); }		// No use calling this so disable if called...
    uint8_t serialEvent1_default PROGMEM = 1;
    So I tried adding this to a simple sketch:
    Code:
      extern const uint8_t serialEvent1_default;
      Serial.printf("Default serialEvent1? %d\n", serialEvent1_default);
    That sketch did not have a serialEvent1 and printed 1; when I added one, it printed 0...

    So I can create 9 new files like this (Serial1-8 plus one for USB Serial) and change my begin method to check a flag like this, so that it only sets up the serialEvent call if the user's code actually makes use of it.
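
    A minimal host-side sketch of the begin-time check described above, not the actual core code. `PortModel`, `beginPort`, `yieldMask` and the bit values are hypothetical names for illustration; the flag follows the convention above, where 1 means only the default serialEvent was linked.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of one serial port. defaultFlag mirrors the
// serialEventN_default idea: 1 = only the core's default handler was linked,
// 0 = the sketch supplied its own serialEventN().
struct PortModel {
    uint8_t yieldBit;     // bit this port owns in the yield mask (assumed layout)
    uint8_t defaultFlag;  // value the linker would have resolved
};

// Mask that yield() would consult; 0 means "nothing to poll".
static uint8_t yieldMask = 0;

// Model of begin(): only ask yield() to poll this port when the sketch
// actually provides a handler.
void beginPort(const PortModel &port) {
    if (port.defaultFlag == 0) {
        yieldMask |= port.yieldBit;
    }
}
```

    With this shape, a port whose flag reads 1 never touches the mask, so a sketch without any handlers leaves the mask at 0.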

    Does this make sense? Worth it?

  2. #2
    Senior Member+ defragster's Avatar
    Join Date
    Feb 2015
    Posts
    11,785
    If that lets yield() processing do less, or nothing, more quickly when those things are not in user code, that is very cool.

    What does the resulting yield() processing look like?

  3. #3
    Senior Member+ KurtE's Avatar
    Join Date
    Jan 2014
    Posts
    6,921
    I added the changes to the branch: https://github.com/KurtE/cores/tree/...educe_overhead
    which I have a PR on. Someone (possibly me) should probably build it on Linux or Mac to make sure the file naming works OK.

    It appears that build file order affects whether a ((weak)) variable is used or another definition is brought in...

    The changes now also include the Serial changes for XBAR pins (a different PR), as I did not want myself, or Paul if he merges them in, to have to
    resolve conflicts.

    With these changes, if you have a simple sketch that does not implement any serialEvent-like functions and does not use any eventResponder objects that are set up to be called on yield, then yield reduces to:
    Code:
    void yield(void) __attribute__ ((weak));
    void yield(void)
    {
    	static uint8_t running=0;
    	if (!yield_active_check_flags) return;	// nothing to do
    	if (running) return; // TODO: does this need to be atomic?
    	running = 1;
    
    
    	// USB Serial - Add hack to minimize impact...
    	if (yield_active_check_flags & YIELD_CHECK_USB_SERIAL) {
    		if (Serial.available()) serialEvent();
    		if (_serialEvent_default) yield_active_check_flags &= ~YIELD_CHECK_USB_SERIAL;
    	}
    
    	// Current workaround until integrate with EventResponder.
    	if (yield_active_check_flags & YIELD_CHECK_HARDWARE_SERIAL) HardwareSerial::processSerialEvents();
    
    	running = 0;
    	if (yield_active_check_flags & YIELD_CHECK_EVENT_RESPONDER) EventResponder::runFromYield();
    	
    };
    where yield_active_check_flags will be 0.
    So yield will simply do one if and return.
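
    The control flow above can be modelled off-target roughly as follows. This is a sketch under assumptions: the bit names come from the snippet above, but their numeric values, `YieldStats` and `yieldModel` are made up for illustration, and counters stand in for the real work.

```cpp
#include <cassert>
#include <cstdint>

// Bit names follow the snippet above; the numeric values are assumptions.
constexpr uint8_t YIELD_CHECK_USB_SERIAL      = 0x01;
constexpr uint8_t YIELD_CHECK_HARDWARE_SERIAL = 0x02;
constexpr uint8_t YIELD_CHECK_EVENT_RESPONDER = 0x04;

uint8_t yield_active_check_flags = 0;

// Counters stand in for the real work so the control flow can be observed.
struct YieldStats {
    unsigned usbChecks = 0;
    unsigned serialChecks = 0;
    unsigned responderRuns = 0;
};

// Host-side model of the reduced yield(): one test and return when no
// feature has registered itself.
void yieldModel(YieldStats &stats) {
    static bool running = false;
    if (!yield_active_check_flags) return;  // fast path: nothing to do
    if (running) return;                    // guard against re-entry
    running = true;
    if (yield_active_check_flags & YIELD_CHECK_USB_SERIAL) stats.usbChecks++;
    if (yield_active_check_flags & YIELD_CHECK_HARDWARE_SERIAL) stats.serialChecks++;
    running = false;
    if (yield_active_check_flags & YIELD_CHECK_EVENT_RESPONDER) stats.responderRuns++;
}
```

    With all bits clear, none of the counters move, which is the "one if and return" case.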

    I reran the test from the other thread on eventResponder, this time with a few more values printed out properly.
    Code:
    SPI Test program
    
    Default serialEvent? 1 1
    Press any key to run test
    
    start test yield_active_check_flags 0
      systick ISR: 2159
    Start: 35
    
    Test Immediate: 0 2159
    After Immediate: 35
    
    Test yield: 4 2159
    After yield: 55
    
    Test Interrupt: 4 2179
    After Interrupt: 56
    
    Press any key to run test
    Here I print out the value of yield_active_check_flags
    and then time how many microseconds 1000 calls to yield take.

    At the start of the test, where yield does nothing: 35
    With an eventResponder set up to be called on yield: 55
    ...

  4. #4
    Senior Member+ defragster's Avatar
    Join Date
    Feb 2015
    Posts
    11,785
    That looks cool KurtE - reading the PR comment provides good info

  5. #5
    Senior Member+ KurtE's Avatar
    Join Date
    Jan 2014
    Posts
    6,921
    Thanks,

    Again, I am not sure how much this is worth, as I don't know anyone who cares about the yield and eventResponder overhead... So it is probably a waste of time, but I thought I would give it an hour or two to see if I can bring some of it back into the Teensy3 branch...

    I hacked up the same sketch to disable printing some of the T4-specific (and changed) information and ran it on a T3.5...
    Some of the timings are a bit different from the T4.1 I just tested on.

    Code:
    SPI Test program
    
    Press any key to run test
    
    Start: 2466
    
    Test Immediate: 0 2859
    After Immediate: 2466
    
    Test yield: 0 2859
    After yield: 2474
    
    Test Interrupt: 0 2859
    After Interrupt: 2472
    
    Press any key to run test
    So in my last test on T4.1, calling yield 1000 times took: Start: 35
    On the T3.5, the same 1000 calls started off at: Start: 2466

    So maybe an area for some slight improvements.
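
    To put the two "Start" figures on a per-call footing, here is a small host-side helper. The clock speeds used in the note below (600 MHz for T4.1, 120 MHz for T3.5) are assumed default clocks, not values taken from the test run, and `cyclesPerCall` is a made-up name.

```cpp
#include <cassert>
#include <cmath>

// Average cost of one call in CPU cycles, given the time for a batch of calls.
// cpuMHz is an assumption about the board's clock, not a measured value.
double cyclesPerCall(double totalMicros, unsigned calls, double cpuMHz) {
    return totalMicros * cpuMHz / calls;
}
```

    Under those assumed clocks, cyclesPerCall(35, 1000, 600) works out to 21 cycles per yield on T4.1, and cyclesPerCall(2466, 1000, 120) to about 296 cycles per yield on T3.5.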

    Also starting code/data sizes
    Code:
    "C:\\arduino-1.8.12\\hardware\\teensy/../tools/arm/bin/arm-none-eabi-size" -A "C:\\Users\\kurte\\AppData\\Local\\Temp\\arduino_build_604884/SPI_test_eventResponder.ino.elf"
    Sketch uses 36720 bytes (7%) of program storage space. Maximum is 524288 bytes.
    Global variables use 5360 bytes (2%) of dynamic memory, leaving 256776 bytes for local variables. Maximum is 262136 bytes.
    C:\arduino-1.8.12\hardware\teensy/../tools/teensy_post_compile -file=SPI_test_eventResponder.ino -path=C:\Users\kurte\AppData\Local\Temp\arduino_build_604884 -tools=C:\arduino-1.8.12\hardware\teensy/../tools -board=TEENSY35 -reboot -port=usb:0/140000/0/1/1/1 -portlabel=hid#vid_16c0&pid_0478 Bootloader -portprotocol=Teensy
    Now some quick hacking

  6. #6
    Senior Member+ defragster's Avatar
    Join Date
    Feb 2015
    Posts
    11,785
    Not sure I'm making sense of the test numbers?

    One simple real-world test would be to monitor the count of loop() passes per second:
    >> Yield as it was
    >> User sketch :: void yield() {}
    >> Yield as it is when no serialEvent() in use?

    That way yield() is called as it normally is, and the result is somewhere from hundreds of thousands to millions of loop() passes per second, unique to each Teensy.
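
    One way such a loops-per-second counter could be structured, as a sketch only. `LoopRateMeter` is a hypothetical helper; time is passed in as a parameter so the logic can be checked off-target, and on a Teensy the caller would pass millis().

```cpp
#include <cassert>
#include <cstdint>

// Counts loop() passes and reports the count once per second.
struct LoopRateMeter {
    uint32_t windowStartMs = 0;
    uint32_t count = 0;
    uint32_t lastRate = 0;  // passes counted in the most recent full second

    // Call once per loop() pass. Returns true when a new rate is available.
    bool tick(uint32_t nowMs) {
        bool report = false;
        if (nowMs - windowStartMs >= 1000) {
            windowStartMs += 1000;  // keep windows aligned, like loopTime -= 1000
            lastRate = count;
            count = 0;
            report = true;
        }
        count++;
        return report;
    }
};
```

    Running the same meter against each of the three yield variants listed above would give directly comparable numbers per board.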

  7. #7
    Senior Member+ KurtE's Avatar
    Join Date
    Jan 2014
    Posts
    6,921
    The test numbers are simply:
    Code:
    void TimeYieldCalls(const char *sz) {
      yield();
      Serial.print(sz); Serial.flush();
      elapsedMicros em = 0;
      for (uint32_t i = 0; i < 1000; i++) yield();
      uint32_t elapsed = em;
      Serial.print(": ");
      Serial.println(elapsed, DEC);
      Serial.flush();
    }
    So with my updates for T4.x, with a sketch that does not do any serialEvent stuff and does not do eventResponder.attach(&my_func);
    (which adds this event object to a list of ones called by yield), 1000 calls to yield took 35us, while the current unchanged T3.5 took 2466us.

    So I am now working on a version for T3.x (and I assume T-LC will fall out of it) that should, in this case, again logically reduce to:
    Code:
    void yield(void)
    {
    	static uint8_t running=0;
    	if (!yield_active_check_flags) return; // nothing to do
    	...
    }
    where in this case yield_active_check_flags will be 0 and yield returns immediately.

    But with my hacking, I will hopefully also get to a version where, on T3.x and T-LC, only the code and data objects you actually use are brought into your sketch. So for example, if your sketch on T3.5 only uses Serial1, you will not pay the space penalty for Serial2-6...
    Although I am not sure how much that saves. So far, with my hack of only using Serial1, it reduced data size by about 580 bytes.
    Which is probably not too far off.

    My current run on T3.5 is a lot better speed wise...
    Code:
    SPI Test program
    
    Press any key to run test
    
    Start: 212
    
    Test Immediate: 0 26cd
    After Immediate: 212
    
    Test yield: 4 26cd
    After yield: 395
    
    Test Interrupt: 4 26dd
    After Interrupt: 390
    Obviously not as fast as T4.1 but ...

    Will push up changes in the morning.

  8. #8
    Senior Member+ defragster's Avatar
    Join Date
    Feb 2015
    Posts
    11,785
    That makes more sense of the numbers and where the state of progress was for 4.x and 3.x.

    Testing below on stock TD 1.52 - will wait for github update for 3.x and make sure 4.x is settled.

    Below are runs of this sketch, which uses TimeYieldCalls() from above, with and without a private void yield(), plus a loopCnt.
    >> manual line 1 edit for Teensy ###
    >> line 2 :: #if 0 or 1
    >> Line 4&6 :: change "TD 1.52" when cores changed ( or edit line 1 text )
    Code:
    const char szTeensy[] = "Teensy 4.1";
    #if 0
    void yield() {}
    const char szTest[] = "TD 1.52 :: PRIVATE  yield() :: setup Test:";
    #else
    const char szTest[] = "TD 1.52 setup Test:";
    #endif
    
    elapsedMillis loopTime;
    uint32_t loopCnt = 0;
    void setup() {
      while (!Serial) ; // wait
      TimeYieldCalls( szTest );
      Serial.println("\n" __FILE__ " " __DATE__ " " __TIME__);
      Serial.println(szTeensy);
      TimeYieldCalls( szTest );
      loopTime = 0;
    }
    
    void loop() {
      if ( loopTime >= 1000 ) {
        loopTime -= 1000;
        Serial.printf("loop's per sec = %lu\n", loopCnt);
        loopCnt = 0;
      }
      loopCnt++;
    }
    
    void TimeYieldCalls(const char *sz) {
      yield();
      Serial.print(sz); Serial.flush();
      elapsedMicros em = 0;
      for (uint32_t i = 0; i < 1000; i++) yield();
      uint32_t elapsed = em;
      Serial.print(": ");
      Serial.println(elapsed, DEC);
      Serial.flush();
    }

    Code:
    TD 1.52 :: PRIVATE  yield() :: setup Test:: 1
    
    T:\tCode\Serial\YieldTest\YieldTest.ino May 20 2020 23:28:11
    Teensy 4.1
    TD 1.52 :: PRIVATE  yield() :: setup Test:: 0
    loop's per sec = 17141019
    loop's per sec = 17141119
    loop's per sec = 17141119
    loop's per sec = 17141120
    Code:
    TD 1.52 setup Test:: 64
    
    T:\tCode\Serial\YieldTest\YieldTest.ino May 20 2020 23:28:37
    Teensy 4.1
    TD 1.52 setup Test:: 64
    loop's per sec = 11761913
    loop's per sec = 11763478
    loop's per sec = 11763477
    loop's per sec = 11763478
    Teensy 3.6::
    Code:
    TD 1.52 :: PRIVATE  yield() :: setup Test:: 0
    
    T:\tCode\Serial\YieldTest\YieldTest.ino May 20 2020 23:35:06
    Teensy 3.6
    TD 1.52 :: PRIVATE  yield() :: setup Test:: 1
    loop's per sec = 3909437
    loop's per sec = 3909477
    loop's per sec = 3909493
    loop's per sec = 3909500
    Code:
    TD 1.52 setup Test:: 847
    
    T:\tCode\Serial\YieldTest\YieldTest.ino May 20 2020 23:35:50
    Teensy 3.6
    TD 1.52 setup Test:: 849
    loop's per sec = 940876
    loop's per sec = 941538
    loop's per sec = 941541
    loop's per sec = 941544
    Teensy 3.1::
    Code:
    TD 1.52 :: PRIVATE  yield() :: setup Test:: 0
    
    T:\tCode\Serial\YieldTest\YieldTest.ino May 20 2020 23:39:03
    Teensy 3.1
    TD 1.52 :: PRIVATE  yield() :: setup Test:: 0
    loop's per sec = 2081188
    loop's per sec = 2081255
    loop's per sec = 2081258
    loop's per sec = 2081261
    Code:
    TD 1.52 setup Test:: 1425
    
    T:\tCode\Serial\YieldTest\YieldTest.ino May 20 2020 23:39:23
    Teensy 3.1
    TD 1.52 setup Test:: 1431
    loop's per sec = 442589
    loop's per sec = 442984
    loop's per sec = 442984
    loop's per sec = 442984
    Teensy LC:
    Code:
    TD 1.52 :: PRIVATE  yield() :: setup Test:: 1
    
    T:\tCode\Serial\YieldTest\YieldTest.ino May 20 2020 23:45:15
    Teensy LC
    TD 1.52 :: PRIVATE  yield() :: setup Test:: 1
    loop's per sec = 851889
    loop's per sec = 851922
    loop's per sec = 852000
    loop's per sec = 851994
    Code:
    TD 1.52 setup Test:: 3858
    
    T:\tCode\Serial\YieldTest\YieldTest.ino May 20 2020 23:44:55
    Teensy LC
    TD 1.52 setup Test:: 3876
    loop's per sec = 202910
    loop's per sec = 203055
    loop's per sec = 203073
    loop's per sec = 203072
    Last edited by defragster; 05-21-2020 at 07:25 AM.

  9. #9
    Senior Member+ defragster's Avatar
    Join Date
    Feb 2015
    Posts
    11,785
    I wondered how many cycles are spent inside loop() and outside it between runs. Note that CYCCNT won't work on T_LC:
    Code:
    const char szTeensy[] = "Teensy 4.1";
    #if 1
    void yield() {}
    const char szTest[] = "TD 1.52 :: PRIVATE  yield() :: setup Test:";
    #else
    const char szTest[] = "TD 1.52 setup Test:";
    #endif
    
    elapsedMillis loopTime;
    uint32_t loopCnt = 0;
    uint32_t loopACC[16];
    uint32_t yieldACC[16];
    uint32_t lastACC = 0;
    
    void setup() {
      // Two back-to-back reads only match if the cycle counter is not running yet
      if ( ARM_DWT_CYCCNT == ARM_DWT_CYCCNT ) {
        // Enable CPU Cycle Count
        ARM_DEMCR |= ARM_DEMCR_TRCENA;
        ARM_DWT_CTRL |= ARM_DWT_CTRL_CYCCNTENA;
      }
      while (!Serial) ; // wait
      TimeYieldCalls( szTest );
      Serial.println("\n" __FILE__ " " __DATE__ " " __TIME__);
      Serial.println(szTeensy);
      TimeYieldCalls( szTest );
      loopTime = 0;
    }
    
    void loop() {
      yieldACC[loopCnt & 0xF] = ARM_DWT_CYCCNT - lastACC;
      lastACC = ARM_DWT_CYCCNT;
      if ( loopTime >= 1000 ) {
        for ( int ii = 1; ii <= 0xf; ii++ ) {
          loopACC[0] += loopACC[ii];
          yieldACC[0] += yieldACC[ii];
        }
        Serial.printf("loop's per sec = %lu\t", loopCnt);
        Serial.printf("ARM_Cycles's in loop = %lu\t in yield = %lu\n\n", loopACC[0]/16, yieldACC[0]/16);
        loopCnt = 0;
        loopTime = 0;
      }
      loopCnt++;
      loopACC[loopCnt & 0xF] = ARM_DWT_CYCCNT - lastACC;
      lastACC = ARM_DWT_CYCCNT;
    }
    
    void TimeYieldCalls(const char *sz) {
      yield();
      Serial.print(sz); Serial.flush();
      elapsedMicros em = 0;
      for (uint32_t i = 0; i < 1000; i++) yield();
      uint32_t elapsed = em;
      Serial.print(": ");
      Serial.println(elapsed, DEC);
      Serial.flush();
    }
    This shows how little a 'simple' loop() does, and how few cycles of difference there are in [return / run yield() / call loop()] between the empty and the current yield(). But it adds up ... about 60 million cycles per second for the current yield with extra loop()'s.
    >> Even adding the CYCCNT tracking code slowed it down, and cycles are lost per loop recording the counts
    Code:
    TD 1.52 :: PRIVATE  yield() :: setup Test:: 1
    
    T:\tCode\Serial\YieldTest\YieldTest.ino May 21 2020 02:12:48
    Teensy 4.1
    TD 1.52 :: PRIVATE  yield() :: setup Test:: 0
    loop's per sec = 10525174	ARM_Cycles's in loop = 8	 in yield = 40
    
    loop's per sec = 10525203	ARM_Cycles's in loop = 11	 in yield = 37  // accounts for 505209744 cycles
    
    loop's per sec = 10525196	ARM_Cycles's in loop = 8	 in yield = 40
    
    loop's per sec = 10525203	ARM_Cycles's in loop = 8	 in yield = 37
    Code:
    TD 1.52 setup Test:: 64
    
    T:\tCode\Serial\YieldTest\YieldTest.ino May 21 2020 02:13:11
    Teensy 4.1
    TD 1.52 setup Test:: 64
    loop's per sec = 8217224	ARM_Cycles's in loop = 8	 in yield = 53
    
    loop's per sec = 8218294	ARM_Cycles's in loop = 8	 in yield = 56   // accounts for 525970816 cycles
    
    loop's per sec = 8218294	ARM_Cycles's in loop = 8	 in yield = 56
    
    loop's per sec = 8218294	ARM_Cycles's in loop = 8	 in yield = 56
    Note:
    static inline void yield() {} // runs the same fast 37 to 40 cycles so the build does this it seems.
    inline void yield() {} // runs the same slower 56 cycles as calling the current PJRC yield()
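
    The "accounts for" comments in the output above are just loops per second multiplied by the cycles measured per pass. A small helper to reproduce them (`accountedCycles` is a made-up name):

```cpp
#include <cassert>
#include <cstdint>

// Cycles per second covered by the measured loop() and yield() segments.
uint64_t accountedCycles(uint64_t loopsPerSec, uint32_t loopCycles, uint32_t yieldCycles) {
    return loopsPerSec * (static_cast<uint64_t>(loopCycles) + yieldCycles);
}
```

    For the private yield() run, 10525203 passes at 11 + 37 cycles gives 505209744; for the stock yield() run, 8218294 passes at 8 + 56 cycles gives 525970816. Assuming a 600 MHz clock, the remainder in each case would be time spent outside the two measured segments, such as recording the counts.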

  10. #10
    Senior Member+ KurtE's Avatar
    Join Date
    Jan 2014
    Posts
    6,921
    Thanks @defragster - I just pushed up the changes for the teensy3 branch. I did compile my test sketch for some of this on T3.5 and ran it, plus compiled for 3.2 and then compiled for LC and ran it.

    So I think everything appears to be working.

  11. #11
    Senior Member+ KurtE's Avatar
    Join Date
    Jan 2014
    Posts
    6,921
    Note: Mostly talking to self

    Thinking of doing a quick cleanup on T3.x/LC code I did yesterday, trying to decide which way is cleaner/faster.

    Currently I have code that populates an array by Serial object index with an individual callback function for each of the Serial objects. And then now yield does:


    Code:
    ...
    	if (yield_active_check_flags & YIELD_CHECK_HARDWARE_SERIAL) {
    		if (serial_event_handler_checks[0]) (*serial_event_handler_checks[0])();
    		if (serial_event_handler_checks[1]) (*serial_event_handler_checks[1])();
    		if (serial_event_handler_checks[2]) (*serial_event_handler_checks[2])();
    #ifdef HAS_KINETISK_UART3
    		if (serial_event_handler_checks[3]) (*serial_event_handler_checks[3])();
    #endif
    #ifdef HAS_KINETISK_UART4
    		if (serial_event_handler_checks[4]) (*serial_event_handler_checks[4])();
    #endif
    #if defined(HAS_KINETISK_UART5) || defined (HAS_KINETISK_LPUART0)
    		if (serial_event_handler_checks[5]) (*serial_event_handler_checks[5])();
    #endif
    	}
    First I am going to move the callback code over to the HardwareSerial code, as that is cleaner...
    I am also thinking of populating the array by simply adding only the ports that have user callback functions to the list.
    Then yield is a simple loop, calling each one...

    I am also thinking of moving the add-to-list and processing code into the HardwareSerial class, and maybe passing a pointer to each port's serialEvent function to the constructor.
    That gives one simple callback member function that does: if(available()) (*_serialEvent)();

    Instead of individual ones.
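
    A rough host-side model of that list-plus-member-function idea, not the core's actual implementation. `PortModel`, `doYieldCheck`, `addToSerialEventsList`, `kMaxPorts` and `demoHandler` are hypothetical names; `pending` stands in for available().

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a HardwareSerial object: just enough state to
// show the dispatch pattern.
struct PortModel {
    int pending;            // bytes waiting; models available()
    void (*serialEvent)();  // handler pointer given at construction

    // Single shared member function replacing the per-port callbacks.
    void doYieldCheck() {
        if (pending > 0 && serialEvent) serialEvent();
    }
};

constexpr std::size_t kMaxPorts = 8;       // assumed upper bound
static PortModel *activePorts[kMaxPorts];  // only ports with user handlers
static std::size_t activeCount = 0;

// Would be called from begin() when the sketch supplied a handler.
bool addToSerialEventsList(PortModel *port) {
    for (std::size_t i = 0; i < activeCount; i++) {
        if (activePorts[i] == port) return true;  // already registered
    }
    if (activeCount >= kMaxPorts) return false;
    activePorts[activeCount++] = port;
    return true;
}

// Would be called from yield(): a simple loop over registered ports only.
void processSerialEvents() {
    for (std::size_t i = 0; i < activeCount; i++) activePorts[i]->doYieldCheck();
}

// Demo handler that records that it ran.
static int handlerCalls = 0;
void demoHandler() { handlerCalls++; }
```

    Compared with testing fixed array slots for NULL, the loop here only visits ports that were registered, so its cost scales with the number of handlers in use.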

    Side note: the T3.x core has support for serialEventUSB1 and serialEventUSB2, but does not currently have this... I see an extern defined for both but not anything in yield.
    Maybe add in

  12. #12
    Senior Member+ defragster's Avatar
    Join Date
    Feb 2015
    Posts
    11,785
    Pulled the few hours old CORES to 2nd IDE 1.8.12 folder.

    LOOKS AWESOME KURT!!!! That set of QUICK EXIT changes to yield processing looks to MATCH user added private 'void yield()' !!!

    Testing PR code with two versions of loop() above shows great results. Did not actually test with use of serialEvents() but to compare to above:

    Code:
    TD 1.52 :: PRIVATE  yield() :: setup Test:: 0
    
    T:\tCode\Serial\YieldTest\YieldTest.ino May 21 2020 12:26:39
    Teensy 4.1 :: CORES event PR
    TD 1.52 :: PRIVATE  yield() :: setup Test:: 1
    loop's per sec = 10525718	ARM_Cycles's in loop = 8	 in yield = 37
    
    loop's per sec = 10525695	ARM_Cycles's in loop = 8	 in yield = 38
    Code:
    TD 1.52 setup Test:: 37
    
    T:\tCode\Serial\YieldTest\YieldTest.ino May 21 2020 12:27:14
    Teensy 4.1 :: CORES event PR
    TD 1.52 setup Test:: 35
    loop's per sec = 10524895	ARM_Cycles's in loop = 8	 in yield = 38
    
    loop's per sec = 10525716	ARM_Cycles's in loop = 8	 in yield = 38
    Then changing to prior non-CYCCNT version of loop()::
    Code:
    TD 1.52 setup Test:: 35
    
    T:\tCode\Serial\YieldTest\YieldTest.ino May 21 2020 12:29:04
    Teensy 4.1 :: CORES event PR
    TD 1.52 setup Test:: 35
    loop's per sec = 17644653
    loop's per sec = 17646083
    Code:
    TD 1.52 :: PRIVATE  yield() :: setup Test:: 0
    
    T:\tCode\Serial\YieldTest\YieldTest.ino May 21 2020 12:29:29
    Teensy 4.1 :: CORES event PR
    TD 1.52 :: PRIVATE  yield() :: setup Test:: 1
    loop's per sec = 17141903
    loop's per sec = 17141920
    And the same runs on a T_3.6 - doesn't recover the gains quite as well ... but better!:
    Code:
    TD 1.52 :: PRIVATE  yield() :: setup Test:: 0
    
    T:\tCode\Serial\YieldTest\YieldTest.ino May 21 2020 12:42:13
    Teensy 3.6 :: CORES event PR
    TD 1.52 :: PRIVATE  yield() :: setup Test:: 0
    loop's per sec = 2067462	ARM_Cycles's in loop = 25	 in yield = 44
    
    loop's per sec = 2067448	ARM_Cycles's in loop = 26	 in yield = 43
    
    TD 1.52 setup Test:: 141
    
    T:\tCode\Serial\YieldTest\YieldTest.ino May 21 2020 12:42:32
    Teensy 3.6 :: CORES event PR
    TD 1.52 setup Test:: 141
    loop's per sec = 1712587	ARM_Cycles's in loop = 26	 in yield = 61
    
    loop's per sec = 1713034	ARM_Cycles's in loop = 25	 in yield = 62
    Code:
    TD 1.52 :: PRIVATE  yield() :: setup Test:: 0
    
    T:\tCode\Serial\YieldTest\YieldTest.ino May 21 2020 12:43:08
    Teensy 3.6 :: CORES event PR
    TD 1.52 :: PRIVATE  yield() :: setup Test:: 0
    loop's per sec = 3910247
    loop's per sec = 3910269
    TD 1.52 setup Test:: 140
    
    T:\tCode\Serial\YieldTest\YieldTest.ino May 21 2020 12:43:26
    Teensy 3.6 :: CORES event PR
    TD 1.52 setup Test:: 141
    loop's per sec = 2809698
    loop's per sec = 2810521

  13. #13
    Senior Member+ KurtE's Avatar
    Join Date
    Jan 2014
    Posts
    6,921
    Still doing some testing.

    I ran into an issue on the T4 branch where Serial2-x always appear to think they are using a user-specific event handler.

    The T3.x branch appears to work properly... So investigating.

    Saw the above on both Windows and Mac.

  14. #14
    Senior Member+ defragster's Avatar
    Join Date
    Feb 2015
    Posts
    11,785
    @Kurt - did you make an #ifdef'd sketch that iterates over the valid ports: if (serial_event_handler_default) Serial.print( "port USER Event" ); else Serial.print( "port NULL" );

    Oops - found that it is private ...
    Last edited by defragster; 05-22-2020 at 07:25 PM.

  15. #15
    Senior Member+ KurtE's Avatar
    Join Date
    Jan 2014
    Posts
    6,921
    I just pushed up a fix for the T4.x... I will double-check on the Mac that it builds correctly there. Note: I changed some of the T4 processing to be more in line with the T3.x code. Instead of checking all 8 items for NULL to see whether to call them, I now populate the array with just the active ones and keep a count. I also got rid of the special function per Serial port; instead the
    hardware structure keeps a link to the event function, and a new member function calls available() and, if that is true, calls through the pointer to the event function...

    I noticed that when I called begin on Serial2, the yield flags I mentioned earlier were not 0... So I wondered what was going on, and hacked up the test sketch a bit more, like:
    Code:
    void setup() {
      pinMode(CS_PIN, OUTPUT);
      digitalWriteFast(CS_PIN, HIGH);
      while (!Serial && millis() < 4000) ;  // wait for Serial port
      Serial.begin(115200);
      SPI.begin();
      Serial.println("SPI Test program");
      Serial1.begin(2000000);
      Serial2.begin(2000000);
      Serial3.begin(2000000);
      extern const uint8_t _serialEvent_default;
      extern const uint8_t _serialEvent1_default;
      extern const uint8_t _serialEvent2_default;
      extern const uint8_t _serialEvent3_default;
      Serial.printf("Default serialEvent? %d %d %d %d\n", _serialEvent_default, 
          _serialEvent1_default,_serialEvent2_default,_serialEvent3_default);
    #if defined(__IMXRT1062__)
      Serial4.begin(2000000);
      Serial5.begin(2000000);
      Serial6.begin(2000000);
      Serial7.begin(2000000);
      //Serial8.begin(2000000);
      extern const uint8_t _serialEvent4_default;
      extern const uint8_t _serialEvent5_default;
      extern const uint8_t _serialEvent6_default;
      extern const uint8_t _serialEvent7_default;
    //  extern const uint8_t _serialEvent8_default;
      Serial.printf("    %d %d %d %d\n", _serialEvent4_default,
          _serialEvent5_default,_serialEvent6_default,_serialEvent7_default);
    #endif
    ...
    }
    Before the fix it was showing the default flags as 0 ...
    Now all show default...

    Back to Mac

    Update: Works on Mac
    Last edited by KurtE; 05-22-2020 at 08:49 PM.

  16. #16
    Senior Member+ defragster's Avatar
    Join Date
    Feb 2015
    Posts
    11,785
    Starting a sketch with JUMPER plugs on all T_4.1 Serial#'s. So far all 8 ports work without Event checking.

    Every second it prints 'ii' from loop to each Serial#[1-8]
    Every 2+ seconds it reads from each Serial#, if ->available(), and prints to Serial.
    When serialEvent#()'s are added, the data will be read from there instead.

    Initial results before adding any serialEvent#()'s:
    Code:
    Sketch uses 38032 bytes (0%) of program storage space. Maximum is 8126464 bytes.
    Global variables use 49844 bytes (9%) of dynamic memory, leaving 474444 bytes for local variables. Maximum is 524288 bytes.

    Code:
    T:\tCode\Serial\SerialEventsTest\SerialEventsTest.ino May 22 2020 16:55:20
    Teensy 4.1 :: CORES event PR
    TD 1.52 :: PRIVATE  yield() :: setup Test:
    
    loop's per sec = 9818908
    loop's per sec = 14633190
    loop's per sec = 14633187
    0] 0	0] 0	0] 0	
    1] 1	1] 1	1] 1	
    2] 2	2] 2	2] 2	
    3] 3	3] 3	3] 3	
    4] 4	4] 4	4] 4	
    5] 5	5] 5	5] 5	
    6] 6	6] 6	6] 6	
    7] 7	7] 7	7] 7
    And NO PRIVATE yield() :: Sketch uses 38208 bytes (0%) of program storage space. Maximum is 8126464 bytes.
    Code:
    T:\tCode\Serial\SerialEventsTest\SerialEventsTest.ino May 22 2020 16:56:31
    Teensy 4.1 :: CORES event PR
    TD 1.52 setup Test:
    
    loop's per sec = 8808613
    loop's per sec = 13635471
    loop's per sec = 13635468
    0] 0 0] 0 0] 0

  17. #17
    Senior Member+ defragster's Avatar
    Join Date
    Feb 2015
    Posts
    11,785
    Code behaving and working as expected!!!

    It is best not to use serialEvent() processing! But this improves the situation.

    Code below for this and prior post. Adding a single :: void serialEvent1() { readMe( 0 ); }
    Sketch uses 38336 bytes (0%) of program storage space. Maximum is 8126464 bytes.
    Global variables use 49844 bytes (9%) of dynamic memory, leaving 474444 bytes for local variables. Maximum is 524288 bytes.
    loop() count drops:
    Code:
    T:\tCode\Serial\SerialEventsTest\SerialEventsTest.ino May 22 2020 17:08:28
    
    Teensy 4.1 :: CORES event PR
    
    TD 1.52 setup Test:
    
    loop's per sec = 5241743
    0>> 0	
    loop's per sec = 7894202
    0>> 0	
    loop's per sec = 7894202
    1] 1	1] 1	1] 1	
    2] 2	2] 2	2] 2	
    3] 3	3] 3	3] 3	
    4] 4	4] 4	4] 4	
    5] 5	5] 5	5] 5	
    6] 6	6] 6	6] 6	
    7] 7	7] 7	7] 7	
    0>> 0	
    loop's per sec = 7893886
    0>> 0	
    loop's per sec = 7894202
    0>> 0	
    loop's per sec = 7894202
    1] 1	1] 1	1] 1	
    2] 2	2] 2	2] 2	
    3] 3	3] 3	3] 3	
    4] 4	4] 4	4] 4	
    5] 5	5] 5	5] 5	
    6] 6	6] 6	6] 6	
    7] 7	7] 7	7] 7
    All eight serialEvent():
    Sketch uses 38336 bytes (0%) of program storage space. Maximum is 8126464 bytes.
    Global variables use 49844 bytes (9%) of dynamic memory, leaving 474444 bytes for local variables. Maximum is 524288 bytes.
    Code:
    T:\tCode\Serial\SerialEventsTest\SerialEventsTest.ino May 22 2020 17:14:27
    
    Teensy 4.1 :: CORES event PR
    
    TD 1.52 setup Test:
    
    loop's per sec = 2077276
    0>> 0	1>> 1	2>> 2	3>> 3	4>> 4	5>> 5	6>> 6	7>> 7	loop's per sec = 3225545
    0>> 0	1>> 1	2>> 2	3>> 3	4>> 4	5>> 5	6>> 6	7>> 7	loop's per sec = 3225544
    0>> 0	1>> 1	2>> 2	3>> 3	4>> 4	5>> 5	6>> 6	7>> 7	loop's per sec = 3225544
    SAME IDE 1.8.12 with release TD 1.52 :: USING serialEvent():
    Sketch uses 38848 bytes (0%) of program storage space. Maximum is 8126464 bytes.
    Global variables use 49844 bytes (9%) of dynamic memory, leaving 474444 bytes for local variables. Maximum is 524288 bytes.
    Code:
    T:\tCode\Serial\SerialEventsTest\SerialEventsTest.ino May 22 2020 17:19:02
    
    Teensy 4.1 :: TD 1.52 RELEASE 
    
    TD 1.52 setup Test:
    
    loop's per sec = 222073
    0>> 0	1>> 1	2>> 2	3>> 3	4>> 4	5>> 5	6>> 6	7>> 7	loop's per sec = 1657236
    0>> 0	1>> 1	2>> 2	3>> 3	4>> 4	5>> 5	6>> 6	7>> 7	loop's per sec = 1657235
    0>> 0	1>> 1	2>> 2	3>> 3	4>> 4	5>> 5	6>> 6	7>> 7	loop's per sec = 1657234
    0>> 0	1>> 1	2>> 2	3>> 3	4>> 4	5>> 5	6>> 6	7>> 7	loop's per sec = 1657235
    SAME IDE 1.8.12 with release TD 1.52 :: NOT using serialEvent():
    Code:
    loop's per sec = 10525092
    loop's per sec = 10525120
    0] 0	0] 0	0] 0	
    1] 1	1] 1	1] 1	
    2] 2	2] 2	2] 2	
    3] 3	3] 3	3] 3	
    4] 4	4] 4	4] 4	
    5] 5	5] 5	5] 5	
    6] 6	6] 6	6] 6	
    7] 7	7] 7	7] 7

    Code:
    const char szTeensy[] = "Teensy 4.1 :: CORES event PR";
    #if 0
    void yield() {}
    const char szTest[] = "TD 1.52 :: PRIVATE  yield() :: setup Test:";
    #else
    const char szTest[] = "TD 1.52 setup Test:";
    #endif
    HardwareSerial *pSer[8] = { &Serial1, &Serial2, &Serial3, &Serial4, &Serial5, &Serial6, &Serial7, &Serial8 };
    #define DO_SE1  1
    #if 1 // MOVE this to determine which are declared 
    #define DO_SE2  1
    #define DO_SE3  1
    #define DO_SE4  1
    #define DO_SE5  1
    #define DO_SE6  1
    #define DO_SE7  1
    #define DO_SE8  1
    #endif
    
    elapsedMillis serWait;
    elapsedMillis serWaitPrt;
    elapsedMillis loopTime;
    uint32_t loopCnt = 0;
    void loop() {
      if ( loopTime >= 1000 ) {
        loopTime -= 1000;
        Serial.printf("loop's per sec = %lu\n", loopCnt);
        loopCnt = 0;
      }
      loopCnt++;
      if ( serWait >= 1000 ) {
        serWait = 0;
        for ( int ii = 0; ii < 8; ii++ ) {
          pSer[ii]->print( ii );
        }
        if ( serWaitPrt > 2500 ) {
          serWaitPrt = 0;
          for ( int ii = 0; ii < 8; ii++ ) {
            if ( pSer[ii]->available() ) {
              while ( pSer[ii]->available() ) {
                Serial.printf( "%u] %c\t", ii, pSer[ii]->read() );
              }
              Serial.print( "\n");
            }
          }
        }
      }
    }
    
    void setup() {
      // put your setup code here, to run once:
      for ( int ii = 0; ii < 8; ii++ ) {
        pSer[ii]->begin( 2000000 );
      }
      while (!Serial) ; // wait
      Serial.println("\n" __FILE__ " " __DATE__ " " __TIME__);
      Serial.println(szTeensy);
      Serial.println(szTest);
      loopCnt = 0;
    }
    
    
    void readMe( int ii ) {
      if ( pSer[ii]->available() ) {
        while ( pSer[ii]->available() ) {
          Serial.printf( "%u>> %c\t", ii, pSer[ii]->read() );
        }
        //Serial.print( "\n");
      }
    }
    
    #if DO_SE1
    void serialEvent1() {
      readMe( 0 );
    }
    #endif
    #if DO_SE2
    void serialEvent2() {
      readMe( 1 );
    }
    #endif
    #if DO_SE3
    void serialEvent3() {
      readMe( 2 );
    }
    #endif
    #if DO_SE4
    void serialEvent4() {
      readMe( 3 );
    }
    #endif
    #if DO_SE5
    void serialEvent5() {
      readMe( 4 );
    }
    #endif
    #if DO_SE6
    void serialEvent6() {
      readMe( 5 );
    }
    #endif
    #if DO_SE7
    void serialEvent7() {
      readMe( 6 );
    }
    #endif
    #if DO_SE8
    void serialEvent8() {
      readMe( 7 );
    }
    #endif

  18. #18
    Senior Member+ defragster's Avatar
    Join Date
    Feb 2015
    Posts
    11,785
    Works as well on T_3.6!
    Moved 5 jumpers to the T_3.6 topside pins, with minor edits to the array declaration and a #define for the number of ports to loop over.

    Results below - when using Events the PR code looks to be worse.

    The PR code as last pulled runs at a BETTER loop speed when no Event() functions are declared.
    Code:
    Teensy :: CORES event PR
    
    TD 1.52 setup Test:
    
    loop's per sec = 1083273
    loop's per sec = 2043918
    loop's per sec = 2043943
    0] 0	0] 0	0] 0	
    1] 1	1] 1	1] 1	
    2] 2	2] 2	2] 2	
    3] 3	3] 3	3] 3	
    4] 4	4] 4	4] 4
    Stock TD 1.52 is slower with no events:
    Code:
    Teensy :: TD 1.52
    
    TD 1.52 setup Test:
    
    loop's per sec = 301454
    loop's per sec = 716458
    loop's per sec = 716465
    0] 0	0] 0	0] 0	
    1] 1	1] 1	1] 1	
    2] 2	2] 2	2] 2	
    3] 3	3] 3	3] 3	
    4] 4	4] 4	4] 4	
    loop's per sec = 868641
    Though with 5 serialEvents() the TD 1.52 release gives:
    Code:
    loop's per sec = 390058
    0>> 0	1>> 1	2>> 2	3>> 3	4>> 4	loop's per sec = 868665
    0>> 0	1>> 1	2>> 2	3>> 3	4>> 4	loop's per sec = 868709
    0] 0	
    1] 1	
    2] 2	
    3] 3	
    4] 4	
    loop's per sec = 868709
    And with the edited PR code there is a speed loss:
    Code:
    Teensy :: CORES event PR
    
    TD 1.52 setup Test:
    
    loop's per sec = 295075
    0>> 0	1>> 1	2>> 2	3>> 3	4>> 4	loop's per sec = 615973
    0>> 0	1>> 1	2>> 2	3>> 3	4>> 4	loop's per sec = 615992
    0] 0	
    1] 1	
    2] 2	
    3] 3	
    4] 4	
    loop's per sec = 615922
    The PR code is a bit better when only the Serial1 and Serial2 Event() code is in use:
    Code:
    Teensy :: CORES event PR
    
    TD 1.52 setup Test:
    
    loop's per sec = 328881
    0>> 0	1>> 1	loop's per sec = 944055
    0>> 0	1>> 1	loop's per sec = 944138
    0] 0	
    1] 1	
    2] 2	2] 2	2] 2	
    3] 3	3] 3	3] 3	
    4] 4	4] 4	4] 4
    And TD 1.52 gets marginally worse than when using all five:
    Code:
    Teensy :: TD 1.52
    
    TD 1.52 setup Test:
    
    loop's per sec = 397889
    0>> 0	1>> 1	loop's per sec = 768488
    0>> 0	1>> 1	loop's per sec = 768498
    0] 0	
    1] 1	
    2] 2	2] 2	2] 2	
    3] 3	3] 3	3] 3	
    4] 4	4] 4	4] 4	
    loop's per sec = 868670
    And a PRIVATE local void yield() is still much better:
    Code:
    Teensy :: CORES event PR
    
    TD 1.52 :: PRIVATE  yield() :: setup Test:
    
    loop's per sec = 1105308
    loop's per sec = 2901101
    loop's per sec = 2901124
    0] 0	0] 0	0] 0	
    1] 1	1] 1	1] 1	
    2] 2	2] 2	2] 2	
    3] 3	3] 3	3] 3	
    4] 4	4] 4	4] 4
    Trivial code changes to handle just 5 ports:
    Code:
    const char szTeensy[] = "Teensy :: CORES event PR";
    //const char szTeensy[] = "Teensy :: TD 1.52";
    #if 0
    void yield() {}
    const char szTest[] = "TD 1.52 :: PRIVATE  yield() :: setup Test:";
    #else
    const char szTest[] = "TD 1.52 setup Test:";
    #endif
    //HardwareSerial *pSer[8] = { &Serial1, &Serial2, &Serial3, &Serial4, &Serial5, &Serial6, &Serial7, &Serial8 };
    //HardwareSerial *pSer[] = { &Serial1, &Serial2, &Serial3, &Serial4, &Serial5, &Serial6, &Serial7, &Serial8 };
    
    // T_3.6 // 
    HardwareSerial *pSer[] = { &Serial1, &Serial2, &Serial3, &Serial4, &Serial5 };
    
    #define NUM_SER_LOOP 5
    #define DO_SE2  1
    #define DO_SE1  1
    #define DO_SE3  1
    #define DO_SE4  1
    #define DO_SE5  1
    #if 0 // MOVE this to determine which are declared 
    #define DO_SE6  1
    #define DO_SE7  1
    #define DO_SE8  1
    #endif
    
    elapsedMillis serWait;
    elapsedMillis serWaitPrt;
    elapsedMillis loopTime;
    uint32_t loopCnt = 0;
    void loop() {
      if ( loopTime >= 1000 ) {
        loopTime -= 1000;
        Serial.printf("loop's per sec = %lu\n", loopCnt);
        loopCnt = 0;
      }
      loopCnt++;
      if ( serWait >= 1000 ) {
        serWait = 0;
        for ( int ii = 0; ii < NUM_SER_LOOP; ii++ ) {
          pSer[ii]->print( ii );
        }
        if ( serWaitPrt > 2500 ) {
          serWaitPrt = 0;
          for ( int ii = 0; ii < NUM_SER_LOOP; ii++ ) {
            if ( pSer[ii]->available() ) {
              while ( pSer[ii]->available() ) {
                Serial.printf( "%u] %c\t", ii, pSer[ii]->read() );
              }
              Serial.print( "\n");
            }
          }
        }
      }
    }
    
    void setup() {
      // put your setup code here, to run once:
      for ( int ii = 0; ii < NUM_SER_LOOP; ii++ ) {
        pSer[ii]->begin( 2000000 );
      }
      while (!Serial) ; // wait
      Serial.println("\n" __FILE__ " " __DATE__ " " __TIME__);
      Serial.println(szTeensy);
      Serial.println(szTest);
      loopCnt = 0;
    }
    
    
    void readMe( int ii ) {
      if ( pSer[ii]->available() ) {
        while ( pSer[ii]->available() ) {
          Serial.printf( "%u>> %c\t", ii, pSer[ii]->read() );
        }
        //Serial.print( "\n");
      }
    }
    
    #if DO_SE1
    void serialEvent1() {
      readMe( 0 );
    }
    #endif
    #if DO_SE2
    void serialEvent2() {
      readMe( 1 );
    }
    #endif
    #if DO_SE3
    void serialEvent3() {
      readMe( 2 );
    }
    #endif
    #if DO_SE4
    void serialEvent4() {
      readMe( 3 );
    }
    #endif
    #if DO_SE5
    void serialEvent5() {
      readMe( 4 );
    }
    #endif
    #if DO_SE6
    void serialEvent6() {
      readMe( 5 );
    }
    #endif
    #if DO_SE7
    void serialEvent7() {
      readMe( 6 );
    }
    #endif
    #if DO_SE8
    void serialEvent8() {
      readMe( 7 );
    }
    #endif
    Last edited by defragster; 05-23-2020 at 04:41 AM.

  19. #19
    Senior Member+ KurtE's Avatar
    Join Date
    Jan 2014
    Posts
    6,921
    Thanks @defragster - I can imagine that using serialEventX on T3.x could be slightly slower. I could reduce or eliminate that, with trade-offs.
    That is, the yield code could go back to directly calling code like:

    Code:
    if (Serial1.available()) serialEvent1();
    if (Serial2.available()) serialEvent2();
    ...
    The trade-off is that this will, as before, pull in the code/data for all serial objects.

    There may be a few shortcuts we can take. Again, not sure how far to take this. For example, we could maybe know whether any Serial object has any data in it and only continue then, or ...
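
    One way to picture the "know if any Serial object has any data" shortcut is a single shared mask that the receive path sets and the yield-side check tests first. The following is only a host-side sketch under assumptions: every name in it (g_rxPendingMask, markRxPending, clearRxPending, checkSerialEvents, the fakeEvent functions) is hypothetical and not part of the Teensy core, and it models three ports with plain functions rather than real UARTs.

    ```cpp
    #include <cstdint>

    // Hypothetical host-side model; none of these names exist in the Teensy core.
    constexpr int kNumPorts = 3;

    // One bit per port, set by the (simulated) receive path when a byte is queued.
    // On real hardware the read-modify-write below would need protecting against
    // the interrupt (for example by briefly disabling interrupts).
    static volatile uint32_t g_rxPendingMask = 0;

    // Counts handler invocations so the effect of the check can be observed.
    static int g_eventCalls[kNumPorts] = {0, 0, 0};

    // Stand-ins for the sketch's serialEvent1() .. serialEvent3().
    static void fakeEvent0() { g_eventCalls[0]++; }
    static void fakeEvent1() { g_eventCalls[1]++; }
    static void fakeEvent2() { g_eventCalls[2]++; }

    using EventFn = void (*)();
    static const EventFn g_events[kNumPorts] = { fakeEvent0, fakeEvent1, fakeEvent2 };

    // Would be called from the receive interrupt when data is queued for a port.
    static void markRxPending(int port) {
      g_rxPendingMask = g_rxPendingMask | (1u << port);
    }

    // Would be called once a port's buffer has been read down to empty.
    static void clearRxPending(int port) {
      g_rxPendingMask = g_rxPendingMask & ~(1u << port);
    }

    // Yield-side check: returns how many handlers were called.
    static int checkSerialEvents() {
      uint32_t mask = g_rxPendingMask;  // one read on the fast path
      if (mask == 0) return 0;          // nothing anywhere: skip all per-port work
      int called = 0;
      for (int port = 0; port < kNumPorts; port++) {
        if (mask & (1u << port)) {
          g_events[port]();
          called++;
        }
      }
      return called;
    }
    ```

    The idea is that the common "no data on any port" case costs one load and one compare, regardless of how many ports had begin() called. Whether that actually beats the per-port available() checks on real hardware would need measuring.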

  20. #20
    Senior Member+ defragster's Avatar
    Join Date
    Feb 2015
    Posts
    11,785
    Would it be too much to have each SerialX Rx code set a flag?

    And a way to put the same in the weak code - obviously always false.

    When the real code gets loaded on use, the flag goes true on an Rx event. It is safe to test without bringing in the 'real code and buffer alloc'. Then when true - it says the SerialX object has data.

    It may seem like this could lead to extra code on every Rx byte - but even 200K dataSerX=true writes {at 2M baud} should be less overhead than 50K * {1-5} NON_STOP calls to SerialX.available(), which calculates a number that doesn't matter, with:
    Code:
    int HardwareSerial::available(void)
    {
    	uint32_t head, tail;
    
    	head = rx_buffer_head_;
    	tail = rx_buffer_tail_;
    	if (head >= tail) return head - tail;
    	return rx_buffer_total_size_ + head - tail;
    }
    A quick glance shows the flag would be set in:
    Code:
    void HardwareSerial::IRQHandler() 
    // ...
    			} while (--avail > 0) ;
    			rx_buffer_head_ = head;
    			dataSerX=true;
    Then of course when the buffer is read down to empty (head==tail): dataSerX=false;
    Code:
    int HardwareSerial::read(void)
    {
    	uint32_t head, tail;
    	int c;
    
    	head = rx_buffer_head_;
    	tail = rx_buffer_tail_;
    	if (head == tail) return -1;
    	if (++tail >= rx_buffer_total_size_) tail = 0;
    	if (tail < rx_buffer_size_) {
    		c = rx_buffer_[tail];
    	} else {
    		c = rx_buffer_storage_[tail-rx_buffer_size_];
    	}
    	rx_buffer_tail_ = tail;
    	if (rts_pin_baseReg_) {
    		uint32_t avail;
    		if (head >= tail) avail = head - tail;
    		else avail = rx_buffer_total_size_ + head - tail;
    
    		if (avail <= rts_low_watermark_) rts_assert();
    	}
    	if (head == tail) dataSerX=false;
    	return c;
    }
    Not sure that is everything/everywhere - but hopefully that explains the idea ... apart from a good variable name and where to add it.
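
    To make the flag idea concrete, here is a small host-side model of a receive ring buffer that carries such a flag. It is a sketch under assumptions, not the core's code: FakeSerialRx, irqReceive and rxSome are hypothetical names, the buffer is a fixed 8 entries, and the "interrupt" is just a function call. One deliberate difference from the read() shown above: the empty test re-reads head instead of reusing the value captured on entry, so a byte that arrives during read() is less likely to be hidden by a stale compare.

    ```cpp
    #include <cstdint>

    // Host-side model only; FakeSerialRx and its members are hypothetical names.
    struct FakeSerialRx {
      static constexpr uint32_t kSize = 8;
      uint8_t buffer[kSize] = {0};
      volatile uint32_t head = 0;    // advanced by the receive path
      volatile uint32_t tail = 0;    // advanced by read()
      volatile bool rxSome = false;  // the proposed "has data" flag

      // Models the end of the IRQ handler: store one byte, publish head, set flag.
      void irqReceive(uint8_t c) {
        uint32_t h = head + 1;
        if (h >= kSize) h = 0;
        if (h == tail) return;       // buffer full: drop the byte
        buffer[h] = c;
        head = h;
        rxSome = true;
      }

      // Same head/tail arithmetic as the available() quoted above.
      int available() const {
        uint32_t h = head;
        uint32_t t = tail;
        if (h >= t) return static_cast<int>(h - t);
        return static_cast<int>(kSize + h - t);
      }

      // Returns the next byte, or -1 when the buffer is empty.
      int read() {
        uint32_t t = tail;
        if (head == t) return -1;
        if (++t >= kSize) t = 0;
        int c = buffer[t];
        tail = t;
        // Re-read head rather than using a copy taken on entry. On real hardware
        // an interrupt can still land between this compare and the store, so this
        // narrows the window but does not close it without masking interrupts.
        if (head == t) rxSome = false;
        return c;
      }
    };
    ```

    With this shape the yield-side test is a single load of rxSome, and available() is only needed by code that cares about the actual count.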

  21. #21
    Senior Member+ defragster's Avatar
    Join Date
    Feb 2015
    Posts
    11,785
    Quote Originally Posted by defragster View Post
    Would it be too much to have each SerialX Rx code set a flag?

    ...
    Tried the post #20 idea - somehow it runs at the same speed? Replacing the if ( available() ) test with a check of rx_some_ works ... but gives the same reduced loop()/sec :
    Code:
    	inline void doYieldCode()  {
    		if (rx_some_) (*hardware->_serialEvent)();
    	}

  22. #22
    Senior Member+ KurtE's Avatar
    Join Date
    Jan 2014
    Posts
    6,921
    Thanks @defragster

    Earlier I tried a few different approaches like that and many of them did not help much.

    Other approaches I have thought about, but have not tried include:

    Have each Serial object, when it receives characters and puts them into the software queue, remember something like the 32 bit microseconds count... We keep two 32 bit values: the one just mentioned, and one holding what that first value was at the start of the last pass in which we went through the list and found nothing was available...
    Something like:

    Code:
    uint32_t millis_last_empty = 0;
    volatile uint32_t millis_last_data = 0;
    
    ... 
    
    void check_hardware_serials() {
       if (millis_last_empty == millis_last_data ) return; // Nothing last time, nothing new
       bool event_called = false;
       uint32_t millis_last_data_start = millis_last_data;
       if (Serial1.available()) { event_called = true; serialEvent1(); }
       if (Serial2.available()) { event_called = true; serialEvent2(); }
    ...
       if (!event_called) millis_last_empty = millis_last_data_start;  // use the start time not current as something may have come in since
    }
    Another approach I was thinking about: I currently have that call as part of HardwareSerial, which calls available() on the object to see if it has data and then calls the saved event...
    Right now on T3.x all methods go through virtual functions, which then call off to an individual function for each serial class...
    I thought about getting the T3.x HardwareSerial code somewhat closer to the T4, where there is the root class... I had a full implementation earlier, which is where the T4.x code started from.
    But we could go halfway there... That is, maybe create a structure which has things like the head and tail pointers... and then have the HardwareSerial object hold a pointer to that data... The difficulty is that each one may be a different size, as the code in each of these source files declares rx_buffer_head, and likewise the tail, as 1 byte, 2 bytes or 4 bytes in length, depending on the value of
    SERIAL1_RX_BUFFER_SIZE ... So I punted on this approach.
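
    The size problem described here can be illustrated with a short sketch. It assumes thresholds of 255 and 65535 entries for the 1/2/4-byte choice, which are illustrative and should be checked against the actual core sources. RxIndexT, RxState and hasData are hypothetical names, not anything in the Teensy core.

    ```cpp
    #include <cstddef>
    #include <cstdint>
    #include <type_traits>

    // Picks the narrowest unsigned type able to index a buffer of N entries.
    // The thresholds are assumptions made for this illustration.
    template <std::size_t N>
    using RxIndexT = typename std::conditional<
        (N > 65535), uint32_t,
        typename std::conditional<(N > 255), uint16_t, uint8_t>::type>::type;

    // Per-port receive state. Its layout depends on N, which is exactly why a
    // single untyped pointer held by a common base class cannot read head/tail
    // without also knowing the index width.
    template <std::size_t N>
    struct RxState {
      volatile RxIndexT<N> head = 0;
      volatile RxIndexT<N> tail = 0;
      uint8_t buffer[N] = {0};
    };

    // A "has data" test that works for any buffer size, at the price of being
    // instantiated once per distinct size rather than shared.
    template <std::size_t N>
    bool hasData(const RxState<N>& s) {
      return s.head != s.tail;
    }
    ```

    Fixing the index width (for example always 16 bits) would give every port the same layout and allow one shared check, at the price of a little extra RAM and possibly slower index updates on small buffers. That is a design trade-off rather than something the thread settles.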

  23. #23
    Senior Member+ defragster's Avatar
    Join Date
    Feb 2015
    Posts
    11,785
    Seemed odd that "if (rx_some_) (*hardware->_serialEvent)();" had the same overhead as "if ( available() )".

    When all SerEvents are gone it was showing approximately just over 14M loops/sec instead of just under 15M - but enabling even one cuts it in half, and enabling them all cuts that in half again.
