Teensy 4.0 First Beta Test

Status
Not open for further replies.
Kurt, the TCR has a bit RMSK:
Receive Data Mask
When set, receive data is masked (receive data is not stored in receive FIFO).
0b - Normal transfer
1b - Receive data is masked
This might be better for DMA than my initial NOSTALL :)
Setting the CONT bit has the effect that the transfers do not start... hm... I'm missing something.
 
ACMP and DAC

Here is a simple sketch to demonstrate T4 comparator ACMP using 6-bit DAC as reference/comparison voltage.
https://github.com/manitou48/teensy4/blob/master/acmpdac.ino

The sketch uses ACMP3-0, which corresponds to A4, and the DAC output should be visible on pin 26 (on the back side of the board). Jumper A4 to either GND or 3.3 V to see the printed output change. I don't have an easy way to get at the underside pins, so if someone with a breakout could check the voltage on pin 26 (it should be 3.3 V / 2, about 1.65 V), that would be nice. I didn't configure pin 26 for anything (OUTPUT?), so that may need to be added. There doesn't appear to be an ALT mode for "DAC out". Might it be possible to manipulate the DAC output without enabling the ACMP?

EDIT: I now have a breakout board, so I can try to reach pin 26 with a pogo pin, and I can try GPT capture on pin 30.

I clipped on to pin 26 and verified that with digitalWrite(26, HIGH) my meter saw 3.3 V. But the meter shows 0 V while the ACMP sketch is running. After I configured the pin 26 MUX (ALT 1), the meter shows 0 V or 3.3 V depending on the ACMP result. So pin 26 can't provide the DAC voltage, only the result of the ACMP comparison.
 
Interval Timer again

I did some more tests on the low performance of the IntervalTimer and found something strange. It looks like the ISR is called twice in a row.

To dig further into this I made a minimal example that shows the effect without using IntervalTimer at all; it just sets the corresponding registers. The code sets up the PIT and channel 0 with a reload value of 100 (T=4.1µs). In the PIT ISR I loop through all four channels and generate a pulse if the TFLG of a channel is set. The pulse pin number (0..3) corresponds to the PIT channel number. The strange thing is that, after the first call of the ISR (with the correct TFLG set), the ISR is called again, this time without any TFLG set, so the code falls through the loop and pulses pin 4.

This behavior might be the reason for the suboptimal performance of the IntervalTimer (see #830). The corresponding ISR in IntervalTimer has to cycle through all the channels twice, which costs about 1µs in total. Here is the simple test code:

Code:
void pitIsr()
{
  for (int i = 0; i < 4; i++)
  {
    if (IMXRT_PIT_CHANNELS[i].TFLG == 1) // if channel TFLG is set, delete it and generate pulse on pin i
    {
      IMXRT_PIT_CHANNELS[i].TFLG = 1;
      digitalWriteFast(i, HIGH);
      digitalWriteFast(i, LOW);
      return;
    }
  }
  //why is the code below called?
  digitalWriteFast(4, HIGH);
  digitalWriteFast(4, LOW);
}

void beginPIT(uint32_t cycles)
{
  CCM_CCGR1 |= CCM_CCGR1_PIT(CCM_CCGR_ON);
  PIT_MCR = 0;

  IMXRT_PIT_CHANNELS[0].LDVAL = cycles;
  IMXRT_PIT_CHANNELS[0].TFLG = 1;
  IMXRT_PIT_CHANNELS[0].TCTRL = PIT_TCTRL_TEN | PIT_TCTRL_TIE;

  attachInterruptVector(IRQ_PIT, pitIsr);
  NVIC_ENABLE_IRQ(IRQ_PIT);
}

void setup()
{
  pinMode(LED_BUILTIN, OUTPUT);
  for (int i = 0; i < 5; i++) pinMode(i, OUTPUT);

  beginPIT(100);
}

void loop()
{
  digitalWriteFast(LED_BUILTIN, !digitalReadFast(LED_BUILTIN));
  delay(250);
}

itimer3.jpg
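
For reference, the relation between the reload value and the interrupt period can be sketched on the host. This is only a rough check under two assumptions that the post does not state: that the PIT is clocked at 24 MHz, and that one period spans LDVAL + 1 timer clocks. The names kPitClockHz, pitPeriodUs and pitRateHz are made up for this illustration.

```cpp
#include <cstdint>

// Assumption: the PIT counts at 24 MHz (not stated in the post).
constexpr double kPitClockHz = 24000000.0;

// Period in microseconds for a given reload value.
// Assumes the counter runs from LDVAL down to 0, i.e. LDVAL + 1 clocks.
constexpr double pitPeriodUs(uint32_t ldval) {
  return (static_cast<double>(ldval) + 1.0) * 1e6 / kPitClockHz;
}

// Interrupt rate in Hz for a given reload value (same assumptions).
constexpr double pitRateHz(uint32_t ldval) {
  return kPitClockHz / (static_cast<double>(ldval) + 1.0);
}
```

Under these assumptions a reload value of 100 gives a period of a little over 4 µs, which is in the same range as the figure quoted in the post.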
 
Have you tried adding asm volatile ("dsb")?
 
Perfect, that fixes it.

Code:
for (int i = 0; i < 4; i++)
  {
    if (IMXRT_PIT_CHANNELS[i].TFLG == 1) // if channel TFLG is set, delete it and generate pulse on pin i
    {
      IMXRT_PIT_CHANNELS[i].TFLG = 1;
      digitalWriteFast(i, HIGH);
      digitalWriteFast(i, LOW);
      asm volatile ("dsb") ;  // <------ That fixes it. 
      return;
    }
  }
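
A likely explanation (my reading, not confirmed in this thread) is that the write that clears TFLG is still in flight when the ISR returns, so the interrupt request is still asserted and the ISR is entered a second time, by which point the flag already reads as cleared. The toy host-side model below only illustrates that idea. It is not Teensy code, and ToyTimer and isrEntriesPerEvent are invented names.

```cpp
// Toy model: a peripheral flag whose clear is "posted" and only takes
// effect when the posted write drains (barrier, or simply time passing).
struct ToyTimer {
  bool flag = false;          // what the interrupt controller sees
  bool clearPending = false;  // posted write that has not landed yet

  void postClear() { clearPending = true; }  // stands in for TFLG = 1
  void drain() {                             // stands in for dsb
    if (clearPending) {
      flag = false;
      clearPending = false;
    }
  }
};

// Counts how often the ISR is entered for a single timer event.
int isrEntriesPerEvent(bool useBarrier) {
  ToyTimer t;
  t.flag = true;  // the timer expired once
  int entries = 0;
  bool irqPending = true;
  while (irqPending) {
    ++entries;                    // ISR entered
    if (t.flag) t.postClear();    // handler sees the flag and clears it
    if (useBarrier) t.drain();    // barrier: wait until the clear lands
    irqPending = t.flag;          // request line sampled at ISR exit
    t.drain();                    // without a barrier it lands afterwards
  }
  return entries;
}
```

In this model the handler runs twice per event without the barrier and once with it, which matches the extra pulse on pin 4 described above.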
 
great.
I've read that it is sometimes better or faster to read a register of the device instead of using "dsb". Perhaps also try "/*dummy = */ IMXRT_PIT_CHANNELS[i].TFLG;"?
It would be interesting to know which is better in this case.
 
With DSB I can go down to an LDVAL of 4 (~2MHz). The register read seems to be slightly slower, but what's worse, it does not seem to work for all timings. E.g. LDVAL = 5 leads to a double interrupt on every 5th ISR call, and LDVAL = 6 gives one double interrupt every 1100 calls :-/

Edit: I did a single read into a volatile unsigned, not real polling.
 
As I recall, the ping lib uses pulseIn(), which in turn uses micros(). But on T4, micros() only has 10us resolution :( so I wonder how ping distance calculations are affected. (I'd really prefer that systick ran at the current CPU clock speed so we could get microsecond resolution from micros(). If the CPU clock is changed, a function could take care of updating the systick rate.)

I did a little reading (always dangerous in my case) and tried this sketch:
Code:
void setup() {
  pinMode(3, OUTPUT);
}

void loop() {
  digitalWrite(3, HIGH);
  delayUS_DWT(1000000);
  digitalWrite(3, LOW);
  delayUS_DWT(1000000);
}

void delayUS_DWT(uint32_t us) {
  volatile uint32_t cycles = (600000000L/1000000L)*us;
  volatile uint32_t start = ARM_DWT_CYCCNT;
  do  {
  } while(ARM_DWT_CYCCNT - start < cycles);
}
I put it on the scope and it looked pretty good, or am I reading it wrong?
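
The arithmetic behind such a cycle-counter delay can be checked on the host. The sketch below is only an illustration: usToCycles and stillWaiting are hypothetical helpers, and the clock frequency is passed in rather than hard-coded to 600 MHz. It shows why the unsigned subtraction in the wait loop keeps working when the 32-bit counter wraps.

```cpp
#include <cstdint>

// Hypothetical helper: cycles needed for `us` microseconds at `cpuHz`.
// The 64-bit intermediate avoids overflow in the multiplication; the
// result is still truncated if it does not fit into 32 bits.
constexpr uint32_t usToCycles(uint32_t us, uint32_t cpuHz) {
  return static_cast<uint32_t>((static_cast<uint64_t>(us) * cpuHz) / 1000000u);
}

// True while fewer than `cycles` have elapsed since `start`.
// Unsigned subtraction is modulo 2^32, so a counter wrap between
// `start` and `now` does not break the comparison.
constexpr bool stillWaiting(uint32_t now, uint32_t start, uint32_t cycles) {
  return static_cast<uint32_t>(now - start) < cycles;
}
```

Because the cycle count has to fit into 32 bits, the longest delay this approach supports at 600 MHz is a little over 7 seconds.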
 
Using a GPT is better than defective micros :) What would be the reasons not to use it - are there libraries that need a GPT, or both GPT?
 
Nothing uses GPTn as far as I know ... which goes to Paul's need to have a "timers consumers list" to know what is using which timers (PIT, GPT, QTMR, flexPWM) ... IntervalTimer, Tone, PWM, IRremote ...
 
Really? Using a "precious" 32-bit timer/counter for micros? That would definitely eliminate the timer for any other general use. I had just thought of using them for TeensyStep instead of the slow T4 IntervalTimers....
 
I started looking at the different timers now and was going to play around with them, and threw out what I did to start the conversation. I can start going through the libraries and see what they are using, if I can sort it out of course :)
 
Using a GPT for micros() would also be a great fix for ElapsedMicros, as it suffers from the same problem, and using the CycleCounter wraps in under 7.2 seconds with F_CPU==600MHz
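
The wrap time is just 2^32 counts divided by the counting rate. A minimal sketch of that arithmetic (wrapSeconds32 is a made-up name for illustration):

```cpp
// Seconds until a free-running 32-bit counter clocked at `hz` wraps.
constexpr double wrapSeconds32(double hz) {
  return 4294967296.0 / hz;  // 2^32 counts
}
```

At 600 MHz this comes out at roughly 7.16 seconds, while a counter fed with a steady 1 MHz would take roughly 71.6 minutes to wrap.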
 
@luni, it depends on what is considered more important: having a working micros() or having a spare timer (Paul's decision). You can never please everyone anyway.
 
Tim, just curious why would you want to use elapsed micros to count to 7.2 seconds? Do you have a use case?

Mike
 
Started with some of the libraries I touched so far:
Capture.JPG

Comments - suggestions?
 
Nothing in particular, just that it would roll over and be confusing. I was perhaps conflating it with micros rolling over in ~87 secs where it was used for longer as a reference.

Also the quick look I gave to the elapsedus code didn't turn out right - given everything coming or going needs to be handled with 1/F_CPU_ACTUAL in the proper fashion. Having a steady 1 MHz value to feed it would make it fit in less code and run faster. The T_3.x version of elapsedus and micros goes to some effort to resolve/round the current value - running overhead extra code.

Mike, does NeoPixel modify the CycleCounter or just start it and use it as a running value? <edit> It seems to have two places where it starts it, and then it just reads it for its value. Though it uses F_CPU for resolution, and that will be wrong.
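
To illustrate why the choice of frequency matters for cycle-based timing, here is a small host-side sketch. nsToCycles is a hypothetical helper, and the 400 ns and 396 MHz figures mentioned afterwards are arbitrary example values, not taken from the NeoPixel source. The point is only that the conversion should use whatever value reflects the real clock (F_CPU_ACTUAL was mentioned above) rather than a compile-time constant.

```cpp
#include <cstdint>

// Hypothetical helper: convert a duration in nanoseconds to CPU cycles
// for a clock of `cpuHz`, rounded to the nearest cycle.
constexpr uint32_t nsToCycles(uint32_t ns, uint32_t cpuHz) {
  return static_cast<uint32_t>(
      (static_cast<uint64_t>(ns) * cpuHz + 500000000u) / 1000000000u);
}
```

With these example numbers, 400 ns is 240 cycles at 600 MHz but only 158 cycles at 396 MHz, so a count computed from the wrong frequency stretches or shrinks the pulse accordingly.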
 