Reading Quad Timer Seems Slow

lerman

That's on a Teensy 4.1.

My code is:

Code:
    digitalWriteFast(READING_start_pin, HIGH); // create a leading edge
    // capture the counters by reading one
    TMR3->CH[0].CNTR;  // get all the values into TMR3->CH[n].HOLD
    digitalWriteFast(READING_end_pin, HIGH); // create a leading edge
I put the code in a loop and drop the pins at the beginning. I've also set up the timers outside the loop. I'm using the Arduino IDE.

That seems to take around 39 nanoseconds. That seems slow to me. Since everything is a constant, reading the timer should only be one instruction. When I comment out the TMR... line, it takes 3 nsec from leading edge to leading edge.

Where would I find out how long that should take?

(I should comment, that there may be something funky about my measurement setup. It might be a PICNIC -- Problem In Chair, Not In Computer problem.)

Thanks,

Ken
 
Technically it is one instruction, but not every instruction takes one cycle/tick.
The quad timers use the IPG clock (150MHz) so any access to their registers from the CPU has to be synchronized to that, so there's at least ~7ns accounted for.
Any chance there is something else using the bus at that moment, i.e. DMA being performed in the background?
 
Any chance there is something else using the bus at that moment
IIRC the GPIO registers are using the same bus as the quad timers.

I was interested in whether the compiler generates unnecessarily complicated code for this:
Code:
void setup()
{
    constexpr IMXRT_TMR_t* TMR3 = (IMXRT_TMR_t*)IMXRT_TMR3_ADDRESS;

    digitalWriteFast(17, HIGH); // create a leading edge
    TMR3->CH[0].CNTR;           // get all the values into TMR3->CH[n].HOLD
    digitalWriteFast(18, HIGH); // create a leading edge
}

void loop()
{
}

But as usual the generated code (gcc 5.4) seems to be pretty optimal (from the list file, cleaned up, comments added by me):
Code:
0000007c <setup>:
      7c:	mov.w	r3, #1107296256	; 0x42000000        r3 = IMXRT_GPIO6_ADDRESS (pin register)
      80:	mov.w	r0, #4194304	; 0x400000          r0 = mask for pin 17
      84:	ldr	r1, [pc, #16]	; (98 <setup+0x1c>) r1 = IMXRT_TMR3_ADDRESS
      86:	mov.w	r2, #131072	; 0x20000           r2 = mask for pin18

      8a:	str.w	r0, [r3, #132]	; 0x84              set pin 17 high / write mask to GPIO6.DR_SET  (IMXRT_GPIO6_ADDRESS + 0x84)
      8e:	ldrh	r1, [r1, #10]                       load  TMR3->CH[0].CNTR  (IMXRT_TMR3_ADDRESS+10) into r1
      90:	str.w	r2, [r3, #132]	; 0x84              set pin 18 high / write mask to GPIO6.DR_SET
      94:	bx	lr                                  return

      96:	nop
      98:	.word	0x401e4000

I can't measure this with the required accuracy, but if it takes 33ns then the bus sync needs quite some time here

EDIT: Just checked, gcc 11 generates the same code. Note: with gcc 11 you need to replace constexpr with const, since it is pickier about this cast.
 
Using the code below to measure in ARM cycle counts, I get 20-21 cycles = 33.3 - 35.0 ns at 600 MHz.

With IDE 1.8.19 and TD 1.59b3, "invalid cast" compile error occurred for "constexpr", so it is commented out.

Code:
void setup()
{
  Serial.begin(9600);
  while (!Serial) {}
  
  /*constexpr*/ IMXRT_TMR_t* TMR3 = (IMXRT_TMR_t*)IMXRT_TMR3_ADDRESS;
  uint32_t start = ARM_DWT_CYCCNT;
  uint16_t count = TMR3->CH[0].CNTR;           // get all the values into TMR3->CH[n].HOLD
  uint32_t end = ARM_DWT_CYCCNT;
  Serial.printf( "%5hu  %1lu\n", count, end-start );  // %hu: count is an unsigned 16-bit value
}

void loop()
{
}
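
For anyone checking the arithmetic: converting a cycle count to time is just cycles divided by the core frequency. A minimal sketch of that conversion follows; `cyclesToNs` and `kCpuHz` are names made up for this example and are not part of the Teensy core, and 600 MHz is the default clock mentioned in this thread.

```cpp
#include <cstdint>

// Hypothetical helper: convert a CPU cycle count to nanoseconds
// for a given core clock frequency in Hz.
constexpr double cyclesToNs(uint32_t cycles, double cpuHz) {
    return static_cast<double>(cycles) * 1e9 / cpuHz;
}

// Assumed default Teensy 4.1 core clock (600 MHz).
constexpr double kCpuHz = 600e6;

// One cycle at 600 MHz is about 1.667 ns, so:
//   cyclesToNs(20, kCpuHz) is about 33.3 ns
//   cyclesToNs(21, kCpuHz) is 35.0 ns
```

On the Teensy the result could be printed with something like `Serial.println(cyclesToNs(end - start, kCpuHz))`.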
 
Well, thanks to all. That answers the question, although it raises another. Why does it take so long?
Now that I know that the code seems to be doing what it is supposed to do, I can move on.

Thanks, again.

(I should note that this has been my first interaction with this community. It's been great.)

Regards,

Ken
 
I put the code in a loop and drop the pins at the beginning. I've also set up the timers outside the loop. I'm using the Arduino IDE.

That seems to take around 39 nanoseconds. That seems slow to me.
Most of that will be the loop overhead I suspect. The best way to time very short instruction sequences is to unroll the loop by a decent factor, 10 or more, so that the loop overhead is amortized by that factor.
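
One way to sketch the unrolling idea without copy-pasting the statement ten times is a small compile-time repeat helper. This is only an illustration: `Repeat` and `amortizedCycles` are hypothetical names, not part of any Teensy library, and whether the compiler fully inlines the calls depends on the optimization settings.

```cpp
#include <cstdint>

// Hypothetical helper: calls f() N times, with the repetition expanded
// by template recursion instead of a runtime loop counter.
template <int N>
struct Repeat {
    template <typename F>
    static inline void run(F&& f) {
        f();
        Repeat<N - 1>::run(f);
    }
};

// Recursion terminator: zero repetitions does nothing.
template <>
struct Repeat<0> {
    template <typename F>
    static inline void run(F&&) {}
};

// Average cost per operation when n operations are timed together.
// Any fixed measurement overhead included in totalCycles is divided
// by n as well, which is the amortization described above.
constexpr double amortizedCycles(uint32_t totalCycles, uint32_t n) {
    return static_cast<double>(totalCycles) / static_cast<double>(n);
}
```

On the Teensy, the lambda passed to `Repeat<10>::run` would contain the register read, with the cycle counter sampled before and after the call. With made-up numbers, a fixed overhead of 3 cycles plus 10 operations of 20 cycles each gives 203 cycles in total, or 20.3 cycles per operation, so the overhead contributes only 0.3 cycles to each.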
 
I used a scope to see how long it takes.

I set one pin, I do something, I set another pin. I use the scope to measure the time between seeing the pins being set. Without the instruction in the middle, the time between the signals is about 3 nanosec. With the code in the middle, it took around 39. @joepasquariello measured it with software and got pretty much the same value.

I didn't mention it, but the loop is executed once per millisecond.

I'd sure like to know why it takes that long. The docs say that the maximum counter update rate is BUS_CLOCK_ROOT/2 for external clocks. I'm wondering if that clock is used to transfer the data from the CNTR registers to the HOLD registers and if that clock is slow.

It isn't obvious to me what the speed of that clock is and how it is set.
 
Well, thanks to all. That answers the question, although it raises another. Why does it take so long?
...

See posts #2 and #3 - the 1062 has two halves on different clocks and busses in some fashion.

3 cycles is what this takes: start = ARM_DWT_CYCCNT;

That CYCCNT is an easy read. Putting them together they are always 3 cycles apart - so that seems to come from the full speed 600 MHz MCU core. The other half IIRC runs at 1/4th that speed so the overhead may be to schedule the access and return the value for use.

The extended time of 20-21 cycles for the read of the timer seems to have to do with the processor design, given the underlying ASM shown is 'simple' looking.
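
To put rough numbers on that, here is a small sketch that subtracts the 3-cycle baseline from the 20-21 cycle measurement and expresses the remainder in ticks of the slower clock. It assumes the baseline simply adds to the measurement and that the peripheral clock runs at exactly 1/4 of the core clock; the names are invented for this example.

```cpp
#include <cstdint>

// Assumption: two back-to-back ARM_DWT_CYCCNT reads are 3 cycles apart.
constexpr uint32_t kBaselineCycles = 3;

// Assumption: the peripheral (IPG) clock is the core clock divided by 4.
constexpr uint32_t kCpuCyclesPerIpgTick = 4;

// Cycles attributable to the timer read itself, clamped at zero.
constexpr uint32_t netCycles(uint32_t measuredCycles) {
    return measuredCycles > kBaselineCycles ? measuredCycles - kBaselineCycles : 0;
}

// The same span expressed in ticks of the slower peripheral clock.
constexpr double ipgTicks(uint32_t cpuCycles) {
    return static_cast<double>(cpuCycles) / kCpuCyclesPerIpgTick;
}

// netCycles(20) is 17, which is 4.25 IPG ticks.
// netCycles(21) is 18, which is 4.5 IPG ticks.
```

Under those assumptions the read costs roughly four to five ticks of the peripheral clock, which fits the idea that the time goes to crossing between the two clock domains rather than to the instruction itself.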
 
I'd sure like to know why it takes that long. The docs say that the maximum counter update rate is BUS_CLOCK_ROOT/2 for external clocks. I'm wondering if that clock is used to transfer the data from the CNTR registers to the HOLD registers and if that clock is slow.

It isn't obvious to me what the speed of that clock is and how it is set.

That would be the IPG bus clock mentioned earlier. It's always one quarter of the main CPU clock (default CPU clock = 600MHz, so default IPG = 150MHz).
 
That would be the IPG bus clock mentioned earlier. It's always one quarter of the main CPU clock (default CPU clock = 600MHz, so default IPG = 150MHz).
Almost. It is a bit more flexible - 150MHz maximum, and is calculated and set in clockspeed.c.
(I played a bit with overclocking the IPG, but in the end it wasn't really stable, and you should refrain from doing so)

The reference manual has a nice diagram of all clocks (a bunch), and shows how they can be used.
 