Forum Rule: Always post complete source code & details to reproduce any issue!
Page 17 of 17 FirstFirst ... 7 15 16 17
Results 401 to 419 of 419

Thread: Teensy 3.x multithreading library first release

  1. #401
    Senior Member+ defragster's Avatar
    Join Date
    Feb 2015
    Posts
    9,378
    @ftrias - not done anything beyond initial test yet that worked as noted. Added yield() to threads for them to exit after being done some simple task - in minor change to the example\print.ino.

    I noticed I skewed the run/exec time of the threads shared with loop() - and it took some added code to discover how/why as it was odd behavior of the non-linear switching operation.

    >> Wondering if there would be an easy way to add '#ifdef DEBUG' instrumentation that could give thread operation feedback?
    like :
    Code:
    class ThreadInfo {
      public:
        uint32_t cyclesStart=0;  // On T_4 the CycCnt is always active - on T_3.x it currently is not - unless Audio starts it AFAIK
        uint32_t cyclesAccum=0;
    When a thread is started: thread_X.cyclesStart = ARM_DWT_CYCCNT

    When a thread is swapped for time or yield: thread_X.cyclesAccum += ARM_DWT_CYCCNT - thread_X.cyclesStart

    Then once per second or so a print of cycles/time spent per thread would show inactive or long running threads by getting and zeroing the cyclesAccum per thread? Along the lines of what the Audio Library shows with CPU_USE.


    >> Not followed the code develop or seen how time is allowed for loop() to run?

    >> There is 'thread 0, which is always running' - is that just the default task switch thread?

    >> Would be interesting to know how much time is spent in in loop() from the above - as it was only counting loop()/sec that told of the off thread behavior I created.


    Perhaps there are better or additional ways to get feedback that might give a feel for where the time goes when something isn't clicking right? The read of the ARM_DWT_CYCCNT is a fast 'memory' read so would add minimal over head '#ifdef DEBUG'.

  2. #402
    Senior Member
    Join Date
    Sep 2013
    Location
    Boston, MA
    Posts
    129
    Quote Originally Posted by defragster View Post
    @ftrias - not done anything beyond initial test yet that worked as noted. Added yield() to threads for them to exit after being done some simple task - in minor change to the example\print.ino.

    I noticed I skewed the run/exec time of the threads shared with loop() - and it took some added code to discover how/why as it was odd behavior of the non-linear switching operation.

    >> Wondering if there would be an easy way to add '#ifdef DEBUG' instrumentation that could give thread operation feedback?
    like :
    Code:
    class ThreadInfo {
      public:
        uint32_t cyclesStart=0;  // On T_4 the CycCnt is always active - on T_3.x it currently is not - unless Audio starts it AFAIK
        uint32_t cyclesAccum=0;
    When a thread is started: thread_X.cyclesStart = ARM_DWT_CYCCNT

    When a thread is swapped for time or yield: thread_X.cyclesAccum += ARM_DWT_CYCCNT - thread_X.cyclesStart

    Then once per second or so a print of cycles/time spent per thread would show inactive or long running threads by getting and zeroing the cyclesAccum per thread? Along the lines of what the Audio Library shows with CPU_USE.


    >> Not followed the code develop or seen how time is allowed for loop() to run?

    >> There is 'thread 0, which is always running' - is that just the default task switch thread?

    >> Would be interesting to know how much time is spent in in loop() from the above - as it was only counting loop()/sec that told of the off thread behavior I created.


    Perhaps there are better or additional ways to get feedback that might give a feel for where the time goes when something isn't clicking right? The read of the ARM_DWT_CYCCNT is a fast 'memory' read so would add minimal over head '#ifdef DEBUG'.
    This is an excellent idea. I committed a change along your suggestion: https://github.com/ftrias/TeensyThre...685c7cb531905e

    If you uncomment '#define DEBUG' in TeensyThreads.h, you get a new function 'unsigned long Threads::getCyclesUsed(int id)'. You pass the thread id returned by addThread as in:

    Code:
    int id = threads.addThread(func, 1);
    delay(1000);
    Serial.println(threads.getCyclesUsed(id));

  3. #403
    Senior Member+ defragster's Avatar
    Join Date
    Feb 2015
    Posts
    9,378
    Quote Originally Posted by ftrias View Post
    This is an excellent idea. I committed a change along your suggestion: https://github.com/ftrias/TeensyThre...685c7cb531905e

    If you uncomment '#define DEBUG' in TeensyThreads.h, you get a new function 'unsigned long Threads::getCyclesUsed(int id)'. You pass the thread id returned by addThread as in:

    Code:
    int id = threads.addThread(func, 1);
    delay(1000);
    Serial.println(threads.getCyclesUsed(id));
    Awesome! It seemed like it would be nice.

    Note: for T_3.x's the cycle counter is not always running so DEBUG should start it ( and T_LC doesn't have one ):
    Code:
      ARM_DEMCR |= ARM_DEMCR_TRCENA; // Assure Cycle Counter active
      ARM_DWT_CTRL |= ARM_DWT_CTRL_CYCCNTENA;
    Note: That Accum doesn't get reset - and that adds multiple calls in sketch with stop()/start(). So I saw this as a way to make it easy on the user and less intrusive in debug code:

    I get this output for initialized threads:
    Code:
    th#0 use=296021495	th#1 use=298854709	th#2 use=1816594	th#3 use=2
    Clearly this is a hack as I hardcoded 8? And just plopped a global "uint32_t threadCycles[ 8 ];", but it works with getCyclesUsedAll() prototyped in the class in .h and in .cpp added this code
    Code:
    uint32_t threadCycles[ 8 ];
    unsigned long *Threads::getCyclesUsedAll(void) {
      stop();
      for (int i = 0; i < MAX_THREADS; i++) {
        if (threadp[i]) { // initialized thread?
          if (threadp[current_thread]->flags == RUNNING) { // is it running?
            threadCycles[i] = threadp[i]->cyclesAccum;
            threadp[i]->cyclesAccum = 0;
            if ( threadCycles[i] <2 ) threadCycles[i] = 2; // show 2 - not run
          }
          else { // not running
            threadCycles[i] = 1;
          }
        }
        else { // not initialized
          threadCycles[i] = 0;
        }
      }
      start();
      return threadCycles;
    }
    Above output works for sketch to run this once per second from loop():
    Code:
    void thread_Activity() {
    	uint32_t *threadCycles;
    	threadCycles = threads.getCyclesUsedAll();
    	for (int i = 0; i < 8; i++) {
    		if ( threadCycles[i] > 0 )
    			Serial.printf( "th#%u use=%u\t", i, threadCycles[i]);
    	}
    	Serial.println();
    }
    Problem - I was trying to add this as a THREAD to just run as a way to add a thread and it runs ONCE and then never prints again? I just edited to print each time and it stops after the loop starts?


    Output - note the trailing "_TH" when the thread version prints - then it stops when loop is entered?
    Can you see if I have something BAD in the sketch - or is it triggering something stopping the thread from execution once loop() printing starts?
    Code:
    T:\tCode\tThreads\PrintExPlus\PrintExPlus.ino Aug 20 2019 13:45:11
    th#0 use=200723099	th#1 use=35862	th#2 use=223	th#3 use=2	  <TH_
    th#0 use=1	th#1 use=1	th#2 use=1	th#3 use=1	  <TH_
    th#0 use=1	th#1 use=1	th#2 use=1	th#3 use=1	  <TH_
    th#0 use=1	th#1 use=1	th#2 use=1	th#3 use=1	  <TH_
    th#0 use=1	th#1 use=1	th#2 use=1	th#3 use=1	  <TH_
    3160419	45376531	8039	48544989
    th#0 use=285902887	th#1 use=288282873	th#2 use=1687992	th#3 use=739886826	
    3274150	47041664	8334	50324148
    th#0 use=296265333	th#1 use=298898904	th#2 use=1750150	th#3 use=2	
    3274123	47036032	8333	50318488
    th#0 use=296229789	th#1 use=298863042	th#2 use=1749938	th#3 use=2
    Here is the edited example\print.ino edited and run on Teensy 4:
    Code:
    #include <TeensyThreads.h>
    
    volatile uint32_t countL = 0;
    volatile uint32_t count = 0;
    volatile uint32_t countB = 0;
    
    void thread_func(int inc) {
    	while (1) {
    		count += inc;
    		if ( 0 &&  !(count % 1667) )
    			threads.yield();
    	}
    }
    
    void thread_funcB(int inc) {
    	while (1) {
    		countB += inc;
    		threads.yield();
    	}
    }
    
    
    void thread_Activity() {
    	uint32_t *threadCycles;
    	threadCycles = threads.getCyclesUsedAll();
    	for (int i = 0; i < 8; i++) {
    		if ( threadCycles[i] > 0 )
    			Serial.printf( "th#%u use=%u\t", i, threadCycles[i]);
    	}
    	Serial.println();
    }
    
    void TH_thread_Activity() {
    	static uint32_t nextShow = 0;
    	static uint32_t cyclesShow = F_CPU_ACTUAL;
    	volatile static uint32_t cnt = 0;
    	nextShow = ARM_DWT_CYCCNT + cyclesShow;
    	while (1)
    	{
    		cnt++;
    		//if ( ARM_DWT_CYCCNT - nextShow > cyclesShow ) 
    		{
    			//if ( cnt > 10 ) {
    			cnt = 0;
    			nextShow = ARM_DWT_CYCCNT + cyclesShow;
    			uint32_t *threadCycles;
    			threadCycles = threads.getCyclesUsedAll();
    			for (int i = 0; i < 8; i++) {
    				if ( threadCycles[i] > 0 )
    					Serial.printf( "th#%u use=%u\t", i, threadCycles[i]);
    			}
    			Serial.println( "  <TH_");
    		}
    		threads.yield();
    	}
    }
    
    void setup() {
    	Serial.begin(115200);
    	while (!Serial && millis() < 4000 );
    	Serial.println("\n" __FILE__ " " __DATE__ " " __TIME__);
    	threads.setMicroTimer( 10 );
    	threads.addThread(thread_func, 1);
    	threads.addThread(thread_funcB, 1);
    	threads.addThread(TH_thread_Activity);
    }
    
    elapsedMillis tWait = 0;
    void loop() {
    	countL++;
    	if ( tWait >= 1000 ) {
    		tWait = 0;
    		Serial.print(countL);
    		Serial.print("\t");
    		Serial.print(count);
    		Serial.print("\t");
    		Serial.print(countB);
    		Serial.print("\t");
    		Serial.println(countB + count + countL);
    		count = 0;
    		countB = 0;
    		countL = 0;
    		// thread_Activity(); // ABOVE output was with this enabled - accts the same when commented - TH_thread_Activity() stops running?
    	}
    	//threads.delay(1000);
    }
    As posted the output does this with commented _Activity() line:
    Code:
    T:\tCode\tThreads\PrintExPlus\PrintExPlus.ino Aug 20 2019 13:52:15
    th#0 use=206723925	th#1 use=35863	th#2 use=222	th#3 use=2	  <TH_
    th#0 use=1	th#1 use=1	th#2 use=1	th#3 use=1	  <TH_
    th#0 use=1	th#1 use=1	th#2 use=1	th#3 use=1	  <TH_
    th#0 use=1	th#1 use=1	th#2 use=1	th#3 use=1	  <TH_
    th#0 use=1	th#1 use=1	th#2 use=1	th#3 use=1	  <TH_
    3125290	44923620	7956	48056866
    3272113	47052364	8333	50332810
    3271932	47058087	8334	50338353

  4. #404
    Senior Member+ defragster's Avatar
    Join Date
    Feb 2015
    Posts
    9,378
    @ftrias - I haven't gotten back to this - just saw your profiling post … made me return to this as it does similar for TeensyThreads

    still wondering why the task TH_thread_Activity() to dump the fails to run after loop() starts? But works when doing the same from loop() as thread_Activity() - which shows htat thread is not getting execution time?

    Also the zero of the accumulated values and ALL() in one version to capture the thread run times.

  5. #405
    Senior Member+
    Join Date
    Jul 2014
    Location
    New York
    Posts
    3,958
    @defragster

    Was this what you were expecting to see:
    Code:
    C:\Users\Merli\AppData\Local\Temp\arduino_modified_sketch_349377\Print.ino Aug 24 2019 00:41:47
    1	0	0	1
    th#0 use=2	th#1 use=2	th#2 use=2	
    3572205	47058752	8334	50639291
    th#0 use=1156543385	th#1 use=298940578	th#2 use=1750151	
    3571121	47061363	8333	50640817
    th#0 use=296296484	th#1 use=298954699	th#2 use=1799930	
    3571363	47061362	8333	50641058
    If it is I just didn't put TH_thread_Activity in its own thread. Loop is thread zero already and you just need to call it as a function. so the whole sketch looks like this:
    Code:
    #include <TeensyThreads.h>
    
    volatile uint32_t countL = 0;
    volatile uint32_t count = 0;
    volatile uint32_t countB = 0;
    
    void thread_func(int inc) {
      while (1) {
        count += inc;
        if ( 0 &&  !(count % 1667) )
          threads.yield();
      }
    }
    
    void thread_funcB(int inc) {
      while (1) {
        countB += inc;
        threads.yield();
      }
    }
    
    
    void thread_Activity() {
      uint32_t *threadCycles;
      threadCycles = threads.getCyclesUsedAll();
      for (int i = 0; i < 8; i++) {
        if ( threadCycles[i] > 0 )
          Serial.printf( "th#%u use=%u\t", i, threadCycles[i]);
      }
      Serial.println();
    }
    
    void TH_thread_Activity() {
      static uint32_t nextShow = 0;
      static uint32_t cyclesShow = F_CPU_ACTUAL;
      volatile static uint32_t cnt = 0;
      nextShow = ARM_DWT_CYCCNT + cyclesShow;
      while (1)
      {
        cnt++;
        //if ( ARM_DWT_CYCCNT - nextShow > cyclesShow ) 
        {
          //if ( cnt > 10 ) {
          cnt = 0;
          nextShow = ARM_DWT_CYCCNT + cyclesShow;
          uint32_t *threadCycles;
          threadCycles = threads.getCyclesUsedAll();
          for (int i = 0; i < 8; i++) {
            if ( threadCycles[i] > 0 )
              Serial.printf( "th#%u use=%u\t\t", i, threadCycles[i]);
          }
          Serial.println( "  <TH_");
        }
        threads.yield();
      }
    }
    
    void setup() {
      Serial.begin(115200);
      while (!Serial && millis() < 4000 );
      Serial.println("\n" __FILE__ " " __DATE__ " " __TIME__);
      threads.setMicroTimer( 10 );
      threads.addThread(thread_func, 1);
      threads.addThread(thread_funcB, 1);
      //threads.addThread(TH_thread_Activity);
    }
    
    elapsedMillis tWait = 0;
    void loop() {
      countL++;
      if ( tWait >= 1000 ) {
        tWait = 0;
        Serial.print(countL);
        Serial.print("\t");
        Serial.print(count);
        Serial.print("\t");
        Serial.print(countB);
        Serial.print("\t");
        Serial.println(countB + count + countL);
        count = 0;
        countB = 0;
        countL = 0;
        thread_Activity(); // ABOVE output was with this enabled - accts the same when commented - TH_thread_Activity() stops running?
      }
      //threads.delay(1000);
    }

  6. #406
    Senior Member+ defragster's Avatar
    Join Date
    Feb 2015
    Posts
    9,378
    Quote Originally Posted by mjs513 View Post
    @defragster

    Was this what you were expecting to see:

    ...
    I wanted there to be an active running thread printout it out - not in loop but lines like this from TH_thread_Activity():

    Code:
    th#0 use=1	th#1 use=1	th#2 use=1	th#3 use=1	  <TH_
    The comments in TH_thread_Activity() where trials to get it to wait and not print each time - and then dropping them showed that thread dying as loop() starts it seems based on posted output.

    After those efforts I quit for review here in case I was doing something wrong … got some more ideas to try ...

  7. #407
    Senior Member
    Join Date
    Jul 2014
    Posts
    2,296
    Quote Originally Posted by defragster View Post

    Problem - I was trying to add this as a THREAD to just run as a way to add a thread and it runs ONCE and then never prints again? I just edited to print each time and it stops after the loop starts?
    Sounds to me that Serial.print is not thread save.
    (I would be surprised if there are NO global variables in Serial/Stream/Print)

  8. #408
    Senior Member+ defragster's Avatar
    Join Date
    Feb 2015
    Posts
    9,378
    @ftrias - let me know if you like the post #403 threads.getCyclesUsedAll() version clearing ACCUM data better and can put it in [my hack is not clean with #8 hardcoded] - it seems @mjs513 did ... but he's been merging my code for over a year now

    @mjs513 - I see you answered post #401 question assumed but couldn't decipher in the code ::
    Loop is thread zero already
    QUESTION: Is loop() called on the same schedule as other threads or between each 'other' thread call?

    In the end I don't have ANY idea what was FIXED - but it is running as desired now! The three things THOUGHT to be problems were put back and it runs fine, dual thread Serial.print() and then loop() th#0 and th#1 each taking about 50% of CPU cycles ????

    I began to think it was using Serial.print in loop() and another thread … so I switched to sprint( showBuf) in TH_thread_Activity(), I got a valid thread buffer to show in loop(). And finally got it working like that - not sure of the real diff. I added two more threads and different task chores. And code below prints the sprint( showBuf ) in both threads - where func() name follows the string.

    MAYBE it was using a series of Serial.print() in the TH_() versus the sprintf to buffer and a single prints that is much faster to Serial? Though loop() uses a series as well?

    This is the desired output - using sprint( showBuf ) from TH_ AND loop() as labeled at line end - where the showBuf ACCUM data is calculated each F_CPU_ACTUAL cycles only in TH_thread_Activity():
    Code:
    T:\tCode\tThreads\PrintExPlus\PrintExPlus.ino Aug 24 2019 01:01:38
    th#0 use=204340286 [0.340567]	th#1 use=1311 [0.000002]	th#2 use=1425 [0.000002]	th#3 use=2 [0.000000]	th#4 use=2 [0.000000]	th#5 use=2 [0.000000]	TH_thread_Activity
    
    th#0 use=204340286 [0.340567]	th#1 use=1311 [0.000002]	th#2 use=1425 [0.000002]	th#3 use=2 [0.000000]	th#4 use=2 [0.000000]	th#5 use=2 [0.000000]	loop
    11	8743896	138792	277584	555168	8882699
    th#0 use=29168864 [0.048615]	th#1 use=178458391 [0.297431]	th#2 use=197037369 [0.328396]	th#3 use=31832225 [0.053054]	th#4 use=29436150 [0.049060]	th#5 use=30812309 [0.051354]	TH_thread_Activity
    
    th#0 use=29168864 [0.048615]	th#1 use=178458391 [0.297431]	th#2 use=197037369 [0.328396]	th#3 use=31832225 [0.053054]	th#4 use=29436150 [0.049060]	th#5 use=30812309 [0.051354]	loop
    10	8743770	138790	555164	1110328	8882570
    th#0 use=29169040 [0.048615]	th#1 use=178454480 [0.297424]	th#2 use=197038536 [0.328398]	th#3 use=31834063 [0.053057]	th#4 use=29438027 [0.049063]	th#5 use=30810788 [0.051351]	TH_thread_Activity
    
    th#0 use=29169040 [0.048615]	th#1 use=178454480 [0.297424]	th#2 use=197038536 [0.328398]	th#3 use=31834063 [0.053057]	th#4 use=29438027 [0.049063]	th#5 use=30810788 [0.051351]	loop
    10	8743896	138792	832748	1665496	8882698
    // ...
    This is the working code where the noted edits made to support threads.getCyclesUsedAll() are in place:
    Code:
    #include <TeensyThreads.h>
    
    volatile uint32_t countL = 0;
    volatile uint32_t count = 0;
    volatile uint32_t countB = 0;
    volatile uint32_t countC = 0;
    volatile uint32_t countD = 0;
    volatile uint32_t showNow = 0;
    
    void thread_func(int inc) {
    	while (1) {
    		count += inc;
    		if ( !(count % 63) )
    			threads.yield();
    	}
    }
    
    void thread_funcB(int inc) {
    	while (1) {
    		countB += inc;
    		delayMicroseconds(2);
    		threads.yield();
    	}
    }
    
    void thread_funcC(int inc) {
    	while (1) {
    		countC += inc;
    		countC += inc;
    		threads.yield();
    	}
    }
    
    void thread_funcD(int inc) {
    	while (1) {
    		countD += inc;
    		countD += inc;
    		countD += inc;
    		countD += inc;
    		threads.yield();
    	}
    }
    
    char showBuf[200];
    void TH_thread_Activity() {
    	uint32_t *threadCycles;
    	int jj;
    	static uint32_t nextShow = 0;
    	static uint32_t cyclesShow = F_CPU_ACTUAL;
    	while (1) {
    		if ( nextShow - ARM_DWT_CYCCNT > cyclesShow ) {
    			nextShow = ARM_DWT_CYCCNT + cyclesShow;
    			jj = 0;
    			threadCycles = threads.getCyclesUsedAll();
    			for (int i = 0; i < 8; i++) {
    				if ( threadCycles[i] > 0 ) {
    					jj += sprintf( &showBuf[jj], "th#%d use=%lu [%f]\t", i, threadCycles[i], threadCycles[i] / (float)cyclesShow);
    				}
    			}
    			Serial.print( showBuf );
    			Serial.println( __func__ );
    			showNow = 2;
    		}
    		threads.yield();
    	}
    }
    
    void setup() {
    	Serial.begin(115200);
    	while (!Serial && millis() < 4000 );
    	Serial.println("\n" __FILE__ " " __DATE__ " " __TIME__);
    	threads.setMicroTimer( 100 );
    	showBuf[0] = 0;
    	threads.addThread(thread_func, 1);
    	threads.addThread(thread_funcB, 1);
    	threads.addThread(TH_thread_Activity);
    	threads.addThread(thread_funcC, 1);
    	threads.addThread(thread_funcD, 1);
    }
    
    elapsedMillis tWait = 0;
    void loop() {
    	countL++;
    	if ( 2 == showNow ) {
    		Serial.println( );
    		Serial.print( showBuf );
    		Serial.println( __func__ );
    		showNow = 0;
    	}
    	if ( tWait >= 1000 ) {
    		tWait = 0;
    		Serial.print(countL);
    		Serial.print("\t");
    		Serial.print(count);
    		Serial.print("\t");
    		Serial.print(countB);
    		Serial.print("\t");
    		Serial.print(countC);
    		Serial.print("\t");
    		Serial.print(countD);
    		Serial.print("\t");
    		Serial.println(countB + count + countL);
    		count = 0;
    		countB = 0;
    		countL = 0;
    	}
    	threads.delay(100);
    }

  9. #409
    Senior Member+ defragster's Avatar
    Join Date
    Feb 2015
    Posts
    9,378
    Quote Originally Posted by WMXZ View Post
    Sounds to me that Serial.print is not thread save.
    (I would be surprised if there are NO global variables in Serial/Stream/Print)
    Just saw this post in the mix … it is working with last post - Not sure I cleaned up anything I did wrong by accident - but I did for sure 'reduce' the chance for Serial.print collision - and altered the timing if that is what it was. Though two threads are now printing … will see if it is running after some Zzzz's.

    Very odd because it would ALWAYS never run the Activity() thread after loop thread did it's first print. I saw the loop was entering up to the 1 sec timeout and then printing - then no more Activity() thread. In the process I made things WORSE before they seemed to go right where it was going dead needing Button to upload - but before all threads but the Activity thread kept being active ????

    May have been a combo of abuse - I had Micros Timing very fast at threads.setMicroTimer( 10 ) and loop() had no .yield() - now bigger slices as well at threads.setMicroTimer( 100 ); and loop() only 10 Hz not some millions.

    This is the kind of feedback the 'Cycles Per Thread' can help to correct errors where linear logic can fail on time slicing not knowing where the problem is?

    If would be nice perhaps if the thread name could optionally be passed - the compiler gives a pointer to string. That could also help in debugging:
    Code:
    void thread_funcC(int inc) {
            thread.pNameSet( __func__);
    	while (1) {
    		countC += inc;
    		countC += inc;
    		threads.yield();
    	}
    }
    It could start NULL and then be set as above with thread.pNameSet( __func__); for storage. Then be shown with thread.nameShow( threadId );

    Though not sure how threadId is known? On dual core ESP32 there is a function that returns what CORE# the function is running on?

    @ftrias would it make sense to do something like this so it could be used at compile time?:
    Code:
      #define DEF_MAX_THREADS 8
      static const int MAX_THREADS = DEF_MAX_THREADS;
    
    // for use like abused in the posted unsigned long *Threads::getCyclesUsedAll(void) {
    uint32_t threadCycles[ DEF_MAX_THREADS ];

  10. #410
    Senior Member+
    Join Date
    Jul 2014
    Location
    New York
    Posts
    3,958
    @WMXZ and @defragster

    Serial.print's are not thread safe - there were a few posts about that awhile ago. Was just setting up the code to use a thread lock for Serial when I saw the updated posts. Heres @tonton81 explaining locks to me a couple of years ago: https://forum.pjrc.com/threads/41504...l=1#post139717

  11. #411
    Senior Member
    Join Date
    Sep 2013
    Location
    Boston, MA
    Posts
    129
    Quote Originally Posted by defragster View Post
    @ftrias - I haven't gotten back to this - just saw your profiling post … made me return to this as it does similar for TeensyThreads

    still wondering why the task TH_thread_Activity() to dump the fails to run after loop() starts? But works when doing the same from loop() as thread_Activity() - which shows htat thread is not getting execution time?

    Also the zero of the accumulated values and ALL() in one version to capture the thread run times.
    I think it would still be useful to see per-thread CPU usage, but profiling is probably more useful.

    FYI, I just created a new TeensyGProf that's even better than TeensyProf. See https://github.com/ftrias/TeensyGProf

    On the forum: https://forum.pjrc.com/threads/57333...512#post213512

  12. #412
    Senior Member+ defragster's Avatar
    Join Date
    Feb 2015
    Posts
    9,378
    Quote Originally Posted by ftrias View Post
    I think it would still be useful to see per-thread CPU usage, but profiling is probably more useful.

    FYI, I just created a new TeensyGProf that's even better than TeensyProf. See https://github.com/ftrias/TeensyGProf

    On the forum: https://forum.pjrc.com/threads/57333...512#post213512
    Won't the profiling at 1ms intervals will miss a lot on 600 MHz T4? It does a great deal in 600,000 cycles It can cycle loop() 5 to 20 Million times in a second.

    And when it comes to THREADS watching each thread switch timing is exact to the CPU CYCLE per thread and will show when a thread is BROKEN either ALL ON or ALL OFF.

    For TThreads - the per thread seems critical as noted when going from linear coding to Time Slicing it is very easy to get lost/broken - of course then the usage report may get lost too ...

  13. #413
    Senior Member
    Join Date
    Sep 2013
    Location
    Boston, MA
    Posts
    129
    Quote Originally Posted by defragster View Post
    Won't the profiling at 1ms intervals will miss a lot on 600 MHz T4? It does a great deal in 600,000 cycles It can cycle loop() 5 to 20 Million times in a second.

    And when it comes to THREADS watching each thread switch timing is exact to the CPU CYCLE per thread and will show when a thread is BROKEN either ALL ON or ALL OFF.

    For TThreads - the per thread seems critical as noted when going from linear coding to Time Slicing it is very easy to get lost/broken - of course then the usage report may get lost too ...
    I suppose we could test more frequently, but it may not make a difference. gprof is a sampling profiler so it relies on statistical distributions. If you run it for 1 minutes (60000 samples), it will catch the biggest offenders at the risk of missing out on small fry. For more details on why this works see http://wwwcdf.pd.infn.it/localdoc/gprof.pdf

  14. #414
    Senior Member+ defragster's Avatar
    Join Date
    Feb 2015
    Posts
    9,378
    Quote Originally Posted by ftrias View Post
    I suppose we could test more frequently, but it may not make a difference. gprof is a sampling profiler so it relies on statistical distributions. If you run it for 1 minutes (60000 samples), it will catch the biggest offenders at the risk of missing out on small fry. For more details on why this works see http://wwwcdf.pd.infn.it/localdoc/gprof.pdf
    No doubt it will be statistically relevant and helpful in most cases. I didn't get as far as opening the source to see how it gathers the samples - but I should. But in the case of TThreads and forced time slices this info tells a great deal of general info more than random samples - especially where each thread has a known task set - and when trying to understand where it has gone wrong on the upper level.
    Code:
    thread_func is th#1:
    thread_funcB is th#2:
    TH_thread_Activity is th#3:
    thread_funcC is th#4:
    thread_funcD is th#5:
    
    th#0 use=207934324 [0.346557]	th#1 use=1303 [0.000002]	th#2 use=1417 [0.000002]	th#3 use=2 [0.000000]	th#4 use=2 [0.000000]	th#5 use=2 [0.000000]	loop
    Since I added the sprintf() on the th#3 there won't be contention for Serial.print with th#0/loop that does the print.

    Just did a quick test and for sure there is contention when a subthread prints the loop() thread goes mute on Serial.print. Need to scroll back and find the note @mjs513 referred to - which is why I came here ...

  15. #415
    Senior Member+ defragster's Avatar
    Join Date
    Feb 2015
    Posts
    9,378
    TeensyThreads task switcher Overhead measure: 16.8% ... 100M of the 600M cycles with my current sketch and 6 total threads.

    I added a DEBUG Accum of CycCnt's complementing the thread timing - that is when a thread is off the clock I put the Engine on the clock until the next thread is clocked in:
    Code:
    th_Eng use=101003188 [0.168339]	th#0 use=28416306 [0.047361]	th#1 use=175960630 [0.293268]	th#2 use=194081934 [0.323470]	th#3 use=30918336 [0.051531]
    	th#4 use=28799936 [0.048000]	th#5 use=30123054 [0.050205]	thSum=589303384 [0.982172]	loop
    Most of the tasks at hand quickly TH.yield() so the switching is much faster than normal time slice sharing no matter the clock switch rate so that 17% overhead is worse than full or longer slices would be. But the overhead on switching does look significant.

    In the TH_thread_Activity() I created also started counting the SUM of measured clock cycles for each task. That comes out to 98.2% so there is no double counting. I just overlapped the stop/start CYC_CNT on the one end and it jumped to 99.44% from 98.2% so the number of task switches and the couple of ticks involved in the tracking account for that 'loss' it seems.

    Just added count of threadEngineSwitches showing there are 827,466 - with each switch taking 123.67 cycles per switch:
    Code:
    th_Eng use=102330882 [0.170551] w/SW#=827466 [Cyc/=123.67]	th#0 use=28750668 [0.047918]	th#1 use=176506911 [0.294178]	th#2 use=195159196 [0.325265]	th#3 use=32442995 [0.054072]	th#4 use=29560783 [0.049268]	th#5 use=31115592 [0.051859]	thSum=595867027 [0.993112]	loop
    I took out the PRIORITY thread SEARCH CODE loop and the overhead drops over 66% from 17% to 5.2% - which yields 14+% more task switches per second:
    Code:
    th_Eng use=31468513 [0.052448] w/SW#=947844 [Cyc/=33.20]	th#0 use=32741061 [0.054568]	th#1 use=202109210 [0.336849]	th#2 use=223228089 [0.372047]	th#3 use=37719772 [0.062866]	th#4 use=34320755 [0.057201]	th#5 use=35571489 [0.059286]	thSum=597158889 [0.995265]	loop
    Here is the .h and .cpp file as used including my current sketch with it in use in TH_thread_Activity():

  16. #416
    Senior Member+ defragster's Avatar
    Join Date
    Feb 2015
    Posts
    9,378
    << IGNORE - RUNNING is not what assumed>.

    @ftrias / @mjs513 / ... ::
    I just edited the PRIORITY test loop and I get this - much improved at 7% over 17%:
    Code:
    th_Eng use=40984570 [0.068308] w/SW#=932166 [Cyc/=43.97]	th#0 use=32500477 [0.054167]	th#1 use=199464888 [0.332441]	th#2 use=220106337 [0.366844]	th#3 use=36990595 [0.061651]	th#4 use=33562497 [0.055937]	th#5 use=35459571 [0.059099]	thSum=599068935 [0.998448]	loop
    Is this ::
    Code:
      if (threadp[current_thread]->flags == RUNNING) { // is it running?
        if (threadp[current_thread]->priority) {
          priority_thread = current_thread;
          currentCount = threadp[current_thread]->ticks; // priority
          threadp[current_thread]->priority = 0; // SHOULD THIS be  -=1; ????
        }
      }
    Given that it only cares about the CURRENT_THREAD ---- Assuming current_thread is initialized - - - IS this the functional equivalent of this? ::
    Code:
      for (int i = 0; i < MAX_THREADS; i++) {
        if (threadp[i]) { // initialized thread?
          if (threadp[current_thread]->flags == RUNNING) { // is it running?
            if (threadp[i]->priority) {
              current_thread = i;
              priority_thread = i;
              currentCount = threadp[i]->ticks; // priority
              threadp[i]->priority = 0;
              break;
            }
          }
        }
      }
    Of course - asking this seems redundant :: if (threadp[current_thread]->flags == RUNNING) { // is it running?
    Last edited by defragster; 08-25-2019 at 05:14 AM.

  17. #417
    Senior Member+ defragster's Avatar
    Join Date
    Feb 2015
    Posts
    9,378
    Post #416 didn't fix the time spent in the thread scheduler where p#415 showed it consuming ~17% of Teensy 4 execution time.

    Question: to make the scheduler more efficient would it be possible to maintain a simpler list of 'next thread' and 'Priority thread' to get it started? This would require anything that alters the 'list' to update/maintain it by adding or removing the threads when their state was altered. The commands that could do that would be:
    >> Kill, Suspend, Restart, addThread. setState

    Also noted in p#416 was when priority was set to 1 or higher, current code always goes to ZERO taking it off the priority list - should it just decrement by one so if 2 or higher it would continue as a priority until is reduces to Zero?
    Code:
            if (threadp[i]->priority) {
    //…
              threadp[i]->priority = 0; //  should this be :: threadp[i]->priority -= 1;

  18. #418
    Senior Member
    Join Date
    Sep 2013
    Location
    Boston, MA
    Posts
    129
    Quote Originally Posted by defragster View Post
    Post #416 didn't fix the time spent in the thread scheduler where p#415 showed it consuming ~17% of Teensy 4 execution time.

    Question: to make the scheduler more efficient would it be possible to maintain a simpler list of 'next thread' and 'Priority thread' to get it started? This would require anything that alters the 'list' to update/maintain it by adding or removing the threads when their state was altered. The commands that could do that would be:
    >> Kill, Suspend, Restart, addThread. setState

    Also noted in p#416 was when priority was set to 1 or higher, current code always goes to ZERO taking it off the priority list - should it just decrement by one so if 2 or higher it would continue as a priority until is reduces to Zero?
    Code:
            if (threadp[i]->priority) {
    //…
              threadp[i]->priority = 0; //  should this be :: threadp[i]->priority -= 1;
    Looking at it now, it does seem like the priority thread support is not implemented very efficiently. If I recall, I added it at a later time. It was really just a way to give a boost to threads that had been suspended via a lock. Maybe it's best to just get rid of priorities. It doesn't make any practical difference and the current code is pretty inefficient.

  19. #419
    Junior Member
    Join Date
    Jun 2016
    Location
    Rennes France
    Posts
    18
    I have question about EEPROM Write action and thread switching. I know than eeprom write action can be critique and can't be interrupt during proceed. What's append if multi thread switch context during those operation ? Using mutex not seem a good solution for me because their don't avoid switching context. And like EEPROM write cycle is low level operation that can't be interrupt, switching context can crash it. Perhaps one thread Stop() before and thread Start() after can avoid this ? Is-it a good solution, or is there another ?

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •