
Thread: Teensy 4.0 vs 3.2 speed comparison

  1. #1

    Teensy 4.0 vs 3.2 speed comparison

    Hi all,

    I am working on speed comparisons between the Teensy 3.2 and 4.0 chips. We currently have numerous 3.2 chips in use around the lab, but we are thinking about upgrading them to 4.0 chips. I have run a comparison of the analog read function on the two chips at various clock speeds, but I get interesting results that I am unsure about. I have attached an image that shows the read speeds for both chips at the various clock speeds.

    [Attached image: bar_graph_comparison_nolegend.png]

    There are two things that confuse me. First, why does the Teensy 3.2 show a higher read speed for the 72 MHz trial than for any other clock speed? Second, why does the 4.0 read consistently at around 1 us for all clock speeds, when I have seen others mention that it is around 2 us? I have also played around with the averaging, but it does not change any of my numbers on the 4.0, while it changes them drastically on the 3.2.

    Here is my Arduino code:

    Code:
    long t = 0;
    long loopnumb = 0;
    
    void setup()
    {
     
      Serial.begin(115200);
      pinMode(A0, INPUT);
      //analogReadAveraging(4);  //tested different averaging 
      Serial.println();  //used for formatting in my text file
    }
    
    void loop()
    {
      noInterrupts(); //disable all interrupts to increase speed
      loopnumb = loopnumb + 1;
      for(int i=0; i<1000; i++) {  //Run 1000 reads to make data easier to read
        analogRead(A0);
      }
      t = micros();  // calculate elapsed time in microseconds
    
      //Serial.print("Time per sample: ");
      Serial.print(t);
      Serial.print(", ");
      //Serial.print("Loop Number: ");
      Serial.println(loopnumb);
      interrupts();
    }
    I am quite new to coding in Arduino, so I am sure that there are plenty of improvements to be made to my code. Any help on this would be wonderful. Thanks!
    Last edited by defragster; 08-07-2020 at 08:59 PM. Reason: added CODE # for readability

  2. #2
    Senior Member+ defragster's Avatar
    Join Date
    Feb 2015
    Posts
    12,178
    Notes:
    > A faster clock speed may not shorten the conversion time of the analog read, which to some degree uses a similar hardware process on both chips.
    > The code above shows "t = micros();" after the loop, but never records a start time before the loop.
    > micros() is best stored in a uint32_t ( unsigned long ), since it returns an unsigned value.
    > Both chips might use the same default analogReadAveraging() value, but always setting it explicitly makes sure the comparison is 'apples to apples' and known.
    > ... same for analogReadResolution()

    Code:
    uint32_t t = 0;
       // ...
      t = micros();  // get start elapsed time in microseconds
      for(int i=0; i<1000; i++) {  //Run 1000 reads to make data easier to read
        analogRead(A0);
      }
      t = micros() - t;  // calculate elapsed time in microseconds
       // ...
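
    One reason the unsigned type matters: subtracting two uint32_t timestamps stays correct even if the counter rolled over once between the two readings. Below is a minimal sketch in plain C++ (it builds on a desktop, without Teensy headers); elapsed_ticks and check_elapsed_ticks are hypothetical helper names, not part of any Teensy or Arduino API.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: elapsed ticks between two readings of a free-running
// 32-bit counter, such as the value returned by micros().
// Unsigned subtraction is modulo 2^32, so the result stays correct even if
// the counter wrapped once between the two readings.
uint32_t elapsed_ticks(uint32_t start, uint32_t end) {
    return end - start;
}

// Small self-check covering the normal case and a single rollover.
void check_elapsed_ticks() {
    assert(elapsed_ticks(1000u, 4500u) == 3500u);               // no rollover
    assert(elapsed_ticks(0xFFFFFF00u, 0x00000100u) == 0x200u);  // one rollover
}
```

    With a signed long, as in the original post, the same subtraction is not guaranteed to behave this way once the counter passes the signed range.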

  3. #3
    Senior Member PaulStoffregen's Avatar
    Join Date
    Nov 2012
    Posts
    22,459
    Yup, you need to compute the elapsed time.

    This is also likely to cause problems if you measure more than 1 ms.

    Code:
      noInterrupts(); //disable all interrupts to increase speed
    The timing functions like micros() depend on an interrupt. If you disable all interrupts for too long, you'll risk messing up the micros() function.

    Another alternative would be to read the ARM DWT cycle counter. It doesn't need interrupts. But it increments at 600 MHz on Teensy 4.x, so you do need to be careful about taking too long since the 32 bit count will overflow in just over 7 seconds. But 1000 analogRead() calls should be well under that threshold.
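
    The overflow figure follows from dividing 2^32 by the clock frequency. A small sketch of that arithmetic in plain C++ is shown here; wrap_period_seconds is a hypothetical helper name used only for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: seconds until a free-running 32-bit cycle counter
// wraps around when it increments once per CPU clock at clock_hz.
double wrap_period_seconds(uint32_t clock_hz) {
    assert(clock_hz > 0);
    // 2^32 counts divided by counts per second.
    return 4294967296.0 / static_cast<double>(clock_hz);
}
```

    Called with 600000000 this gives roughly 7.16 seconds, matching the "just over 7 seconds" above; at 96 MHz the same counter would take roughly 44.7 seconds to wrap.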

  4. #4
    Senior Member
    Join Date
    Jul 2020
    Posts
    308
    Be very careful with benchmarking code like this:
    Quote Originally Posted by Nick.swidinsky View Post
    Code:
      for(int i=0; i<1000; i++) {  //Run 1000 reads to make data easier to read
        analogRead(A0);
      }
    Compilers are smart enough to remove the call completely if the result isn't used and the function doesn't have
    side effects (in the compiler's eyes). The entire loop could get optimized away, (or not) depending on compiler
    versions and settings...

  5. #5
    Senior Member+ defragster's Avatar
    Join Date
    Feb 2015
    Posts
    12,178
    Two added good points I left off for brevity as I had some place to be ...

    Was going to note the loss of interrupts and switch to ARM_DWT_CYCCNT, also if the times look odd the loop may be optimized away - doing something with the return value adds some uniform overhead.
    Code:
    uint32_t t;
    uint32_t rs=0;
      // Two back-to-back reads match only if the counter is stopped
      // (T_3.2 doesn't enter setup() with the counter running)
      if ( ARM_DWT_CYCCNT == ARM_DWT_CYCCNT ) {
        // Enable CPU Cycle Count
        ARM_DEMCR |= ARM_DEMCR_TRCENA;
        ARM_DWT_CTRL |= ARM_DWT_CTRL_CYCCNTENA;
      }
       // ...
    
      t = ARM_DWT_CYCCNT;  // get start CPU cycle count
      for(int i=0; i<1000; i++) {  //Run 1000 reads to make data easier to read
        rs += analogRead(A0);
      }
      t = ARM_DWT_CYCCNT - t;  // calculate elapsed time in CPU cycles either F_CPU or F_CPU_ACTUAL
      Serial.print(rs);
       // ...

  6. #6
    Senior Member
    Join Date
    Dec 2016
    Location
    Montreal, Canada
    Posts
    3,416
    I don't know if this optimizes out (I'm not a compiler expert), but it does quiet the compiler:
    (void)rs;

  7. #7
    Quote Originally Posted by defragster View Post
    Two added good points I left off for brevity as I had some place to be ...

    Was going to note the loss of interrupts and switch to ARM_DWT_CYCCNT, also if the times look odd the loop may be optimized away - doing something with the return value adds some uniform overhead.

    Thanks for the help. With this comment and Paul's comment, I have a better idea of what I should be doing. However, I am now a bit confused about what values I am actually seeing for the t variable. When I upload the code, I get values of around 10 000 000. You mention that this is the elapsed time in CPU cycles, so do I have to divide that by the clock speed to get a time value, and does that correspond to doing 1000 reads of the analog pin? Sorry if this is a simple question; I am still very new to using Teensy chips, but I appreciate the help.

  8. #8
    Senior Member+ defragster's Avatar
    Join Date
    Feb 2015
    Posts
    12,178
    That is the raw CPU cycle count - yes, dividing by the clock frequency turns it into a time value. And yes, that would be for the whole loop of 1,000 reads.
    >> 600,000,000 per second on 600 MHz T_4.1 with F_CPU_ACTUAL as run time reference
    >> 96,000,000 per second on 96 MHz T_3.2 with F_CPU as run time reference
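
    The division can be sketched as follows, in plain C++ so it can be checked off-target. cycles_to_micros and micros_per_read are hypothetical helper names; on the board itself the clock value would come from F_CPU_ACTUAL or F_CPU as noted above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: convert an elapsed CPU cycle count to microseconds,
// given the CPU clock in Hz (e.g. 600000000 or 96000000).
double cycles_to_micros(uint32_t cycles, uint32_t clock_hz) {
    assert(clock_hz > 0);
    return static_cast<double>(cycles) * 1e6 / static_cast<double>(clock_hz);
}

// Hypothetical helper: average time per read, in microseconds, when the
// cycle count covers a loop of `reads` analogRead() calls.
double micros_per_read(uint32_t cycles, uint32_t clock_hz, uint32_t reads) {
    assert(reads > 0);
    return cycles_to_micros(cycles, clock_hz) / static_cast<double>(reads);
}
```

    Taking the roughly 10 000 000 cycles reported above as an example input, this works out to about 16.7 us per read if the count came from a board running at 600 MHz, or about 104 us per read at 96 MHz. The thread does not say which board produced that number.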
