Teensy 4.0 vs 3.2 speed comparison

Status
Not open for further replies.

Nick.swidinsky

New member
Hi all,

I am working on speed comparisons between the Teensy 3.2 and 4.0 chips. We currently have numerous 3.2 chips in use around the lab, but we are thinking about upgrading them to the 4.0. I have run a comparison of the analogRead() function on the two chips at various clock speeds, but I get results I am unsure about. I have attached an image that shows the read speeds for both chips at the various clock speeds.

[Attached image: bar_graph_comparison_nolegend.png]

Two things confuse me. First, why does the Teensy 3.2 show a higher read speed in the 72 MHz trial than at any other clock speed? Second, why does the 4.0 consistently read in around 1 µs at all clock speeds, when I have seen others mention that it is around 2 µs? I have also played around with the averaging, but it does not change any of my numbers on the 4.0, while it changes the 3.2 numbers drastically.

Here is my Arduino code:

Code:
long t = 0;
long loopnumb = 0;

void setup()
{
 
  Serial.begin(115200);
  pinMode(A0, INPUT);
  //analogReadAveraging(4);  //tested different averaging 
  Serial.println();  //used for formatting in my text file
}

void loop()
{
  noInterrupts(); //disable all interrupts to increase speed
  loopnumb = loopnumb + 1;
  for(int i=0; i<1000; i++) {  //Run 1000 reads to make data easier to read
    analogRead(A0);
  }
  t = micros();  // calculate elapsed time in microseconds

  //Serial.print("Time per sample: ");
  Serial.print(t);
  Serial.print(", ");
  //Serial.print("Loop Number: ");
  Serial.println(loopnumb);
  interrupts();
}

I am quite new to coding in Arduino, so I am sure there are plenty of improvements to be made to my code. Any help with this would be wonderful. Thanks!
 
Notes:
> A faster clock speed may not shorten the conversion time of analogRead(), which to some degree uses a similar hardware process on both chips.
> The code above sets "t = micros();" after the loop but never before it, so t ends up as a timestamp rather than an elapsed time.
> micros() returns an unsigned value, so it is best stored in a uint32_t (unsigned long).
> Both boards might use the same default analogReadAveraging() value, but setting it explicitly makes sure the comparison is known and 'apples to apples'.
> ... the same goes for analogReadResolution().

Code:
uint32_t t = 0;
   // ...
  t = micros();  // record the start time in microseconds
  for(int i=0; i<1000; i++) {  //Run 1000 reads to make data easier to read
    analogRead(A0);
  }
  t = micros() - t;  // calculate elapsed time in microseconds
   // ...
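Why uint32_t matters here: micros() wraps back to 0 after 2^32 µs (about 71.6 minutes), and unsigned 32-bit subtraction still yields the correct elapsed time across that wrap. A minimal host-side sketch of the `micros() - t` step; the helper name `elapsed_us()` is hypothetical:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: elapsed time between two micros() readings.
// Unsigned 32-bit subtraction is modulo 2^32, so the result stays correct
// even if the counter wrapped between start and end, as long as the
// interval itself is shorter than 2^32 us (about 71.6 minutes).
uint32_t elapsed_us(uint32_t start, uint32_t end) {
  return end - start;
}
```

For example, a start reading of 0xFFFFFF00 and an end reading of 0x00000100 (taken after the wrap) gives 0x200, i.e. 512 µs.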
 
Yup, you need to compute the elapsed time.

This is also likely to cause problems if you measure more than 1 ms.

Code:
  noInterrupts(); //disable all interrupts to increase speed

The timing functions like micros() depend on an interrupt. If you disable all interrupts for too long, you'll risk messing up the micros() function.

Another alternative would be to read the ARM DWT cycle counter, which doesn't need interrupts. But it increments at 600 MHz on Teensy 4.x, so you do need to be careful about taking too long: the 32-bit count will overflow in just over 7 seconds. 1000 analogRead() calls should be well under that threshold.
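The "just over 7 seconds" figure is 2^32 divided by the rate the counter ticks at (once per CPU cycle). A small sketch of that arithmetic; the function name `cyccnt_wrap_seconds()` is hypothetical:

```cpp
#include <cassert>

// Hypothetical helper: seconds until a 32-bit cycle counter such as
// ARM_DWT_CYCCNT wraps around, given the CPU clock in Hz.
double cyccnt_wrap_seconds(double clock_hz) {
  assert(clock_hz > 0.0);
  const double counts = 4294967296.0;  // 2^32 distinct counter values
  return counts / clock_hz;
}
```

At 600 MHz this gives about 7.16 s; at the T_3.2's 96 MHz the counter lasts about 44.7 s before wrapping.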
 
Be very careful with benchmarking code like this:
Code:
  for(int i=0; i<1000; i++) {  //Run 1000 reads to make data easier to read
    analogRead(A0);
  }
Compilers are smart enough to remove the call completely if the result isn't used and the function doesn't have side effects (in the compiler's eyes). The entire loop could get optimized away (or not), depending on compiler version and settings...
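One common way to keep a benchmark loop's work from being discarded is to make its result observable. The sketch below runs on a PC, so it uses a hypothetical `fake_read()` in place of analogRead(A0), accumulates every return value (as in `rs += analogRead(A0)`), and stores the sum in a volatile variable. The volatile sink is a swapped-in technique, not something from this thread; printing the sum, as done with Serial.print(rs), serves the same purpose on the Teensy.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for analogRead(A0) so this compiles on a PC.
// Returns a 10-bit value (0..1023) derived from the loop index.
static uint16_t fake_read(int i) {
  return static_cast<uint16_t>(i & 0x3FF);
}

// A store to a volatile object is an observable side effect, so the compiler
// must perform it; the sum it depends on therefore cannot be thrown away,
// although the compiler is still free to compute that sum more cleverly.
volatile uint32_t benchmark_sink = 0;

uint32_t run_reads(int n) {
  assert(n >= 0);
  uint32_t sum = 0;
  for (int i = 0; i < n; i++) {
    sum += fake_read(i);  // use each return value instead of ignoring it
  }
  benchmark_sink = sum;
  return sum;
}
```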
 
Two more good points I left out for brevity, as I had somewhere to be...

I was going to note the loss of interrupts and suggest switching to ARM_DWT_CYCCNT. Also, if the times look odd, the loop may have been optimized away; doing something with the return value prevents that, at the cost of some uniform overhead.
Code:
uint32_t t;
uint32_t rs=0;
	if ( ARM_DWT_CYCCNT == ARM_DWT_CYCCNT ) { // T_3.2 doesn't enter setup() with counter running
		// Enable CPU Cycle Count
		ARM_DEMCR |= ARM_DEMCR_TRCENA;
		ARM_DWT_CTRL |= ARM_DWT_CTRL_CYCCNTENA;
	}
   // ...

  t = ARM_DWT_CYCCNT;  // get start CPU cycle count
  for(int i=0; i<1000; i++) {  //Run 1000 reads to make data easier to read
    rs += analogRead(A0);
  }
  t = ARM_DWT_CYCCNT - t;  // calculate elapsed time in CPU cycles either F_CPU or F_CPU_ACTUAL
  Serial.print(rs);
   // ...
 
I don't know whether this gets optimized out, as I'm not a compiler expert, but it does quiet the compiler's unused-variable warning:
(void)rs;
 


Thanks for the help. With this comment and Paul's, I have a better idea of what I should be doing; however, I am now a bit confused about what values I am actually seeing for the t variable. When I upload the code, I get values around 10,000,000. You mention that this calculates the elapsed time in CPU cycles, so do I have to divide that by the clock speed to get the elapsed time, and does that correspond to doing 1000 reads of the analog pin? Sorry if this is a simple question; I am still very new to using Teensy chips, but I appreciate the help.
 
Those are raw CPU cycle counts; yes, dividing by the clock rate turns them into time values. And yes, that count covers the whole loop of 1,000 reads.
>> 600,000,000 per second on a 600 MHz T_4.1, with F_CPU_ACTUAL as the run-time reference
>> 96,000,000 per second on a 96 MHz T_3.2, with F_CPU as the run-time reference
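The division described above, written as a small helper; `us_per_read()` is a hypothetical name. Applied to the roughly 10,000,000-cycle count reported earlier, the answer depends heavily on which board produced it, which is why the right clock constant matters:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: convert a raw cycle count measured around `reads`
// analogRead() calls into microseconds per read.
// clock_hz is the CPU clock the counter ran at (F_CPU_ACTUAL on T_4.x,
// F_CPU on T_3.2).
double us_per_read(uint32_t cycles, double clock_hz, int reads) {
  assert(clock_hz > 0.0 && reads > 0);
  double total_us = cycles / clock_hz * 1e6;  // cycles -> seconds -> microseconds
  return total_us / reads;
}
```

With 10,000,000 cycles and 1,000 reads, this gives about 16.7 µs per read at 600 MHz and about 104.2 µs per read at 96 MHz.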
 