Teensy 4.0 First Beta Test

Status
Not open for further replies.
@KurtE
Just checked the link and couldn't find the repository.

It should hopefully be there now. For some reason it marked the repository as private, even though I don't have a paid version of GitHub and so should not be able to create private ones.

I now have some of the Receive code working... Current stuff pushed up there. I am getting either duplicated or missed stuff with my code, so I need to debug some more. But at least it might actually work sufficiently for normal devices...
 
Code:
volatile unsigned t1, c;
void myIrq();

void setup() {
  delay(1000);
  Serial.begin(9600);
  _VectorsRam[IRQ_Reserved1 + 16] = myIrq;
  NVIC_ENABLE_IRQ(IRQ_Reserved1);
  NVIC_SET_PRIORITY(IRQ_Reserved1, 0x0); 
  t1 = c = 0;
  unsigned t = ARM_DWT_CYCCNT;
  NVIC_TRIGGER_IRQ(IRQ_Reserved1);
  delay(1);
  Serial.printf("Cycles:%u, %u calls of myIrq\n", t1 - t, c);

}

void loop() {}

void myIrq(){
  unsigned t = ARM_DWT_CYCCNT;
  asm volatile("" ::: "memory");
  t1 = t;  
  c++;
}
Not sure why this works only with optimize for size... but that's another problem.
It shows 20 cycles, which is pretty close to the 10-12 cycles stated by NXP. The remaining 8-10 cycles are probably just register loading, or something else.
So, whatever the interrupt problem is, it's not there with software-triggered interrupts.
 

It's indeed a kind of bus/clock problem, I think.
This for example:
Code:
  start = ARM_DWT_CYCCNT;
  unsigned a = channel->TFLG;
  channel->TFLG = 1;
  end = ARM_DWT_CYCCNT;
takes 101 cycles.
I'll leave this for Paul or someone who wants to dig into the internal PIT-clocking.
Overclocking IPG to 300MHz reduced that to 85 - not enough.

I fear the PIT is not the only peripheral with this issue.
 
Looks like more things are unexpectedly slow on the T4
Code:
start = ARM_DWT_CYCCNT;

digitalWriteFast(LED_BUILTIN,!digitalReadFast(LED_BUILTIN));

end = ARM_DWT_CYCCNT;
Serial.println(end - start);
takes 119 cycles...
 
Looks like the reason is the peripheral clock.
Register CSCMR1, page 717 (same register as for SPI - EDIT: Err, no, but almost :) )
The clock selector is "1", which means "OSC" (is that the 24MHz crystal?)
OK.. 600 / 24 = 25 cycles just for waiting for the peripheral, + another 25 for the read/write.. if I'm right..
We need to change the clock source.
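
As a rough sketch of the arithmetic in this post (not a measurement): if a register access has to wait for one tick of the peripheral clock, the cost in CPU cycles is roughly the ratio of the two clocks. The helper name `cpuCyclesPerPeripheralTick` is made up for illustration, and the 600MHz core clock and 24MHz oscillator are the values assumed in the post.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: CPU cycles that elapse during one peripheral clock tick.
// Integer division, so non-integer ratios are rounded down.
inline uint32_t cpuCyclesPerPeripheralTick(uint32_t cpuHz, uint32_t peripheralHz) {
  assert(peripheralHz > 0 && peripheralHz <= cpuHz);
  return cpuHz / peripheralHz;
}
```

With a 600MHz core, a 24MHz peripheral clock gives 25 cycles per tick, 75MHz gives 8 and 150MHz gives 4.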

EDIT : BUT
Code:
static const uint32_t MAX_PERIOD = UINT32_MAX / (24000000 / 1000000);

That affects MAX_PERIOD.

On the other hand... we have way higher clocks on T3.5/3.6, too. So.. maybe >100MHz or so may be OK? That would be 4-5 times faster...
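
To see how much MAX_PERIOD shrinks with a faster timer clock, here is a minimal sketch that uses the same formula as the MAX_PERIOD line, with the clock as a parameter. The function name `maxPeriodMicros` is hypothetical and not part of IntervalTimer.h.

```cpp
#include <cassert>
#include <cstdint>

// Same formula as the MAX_PERIOD line, with the timer clock as a parameter.
// Returns the longest interval, in microseconds, that fits in a 32-bit counter.
inline uint32_t maxPeriodMicros(uint32_t timerClockHz) {
  assert(timerClockHz >= 1000000u);  // avoids dividing by zero below
  return UINT32_MAX / (timerClockHz / 1000000u);
}
```

This gives 178956970 us (about 179s) at 24MHz, 57266230 us (about 57s) at 75MHz, 42949672 us (about 43s) at 100MHz and 28633115 us (about 28.6s) at 150MHz.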
 
Re: double interrupts

for your viewing pleasure, here is another scope shot of PIT isr firing twice. scoped pin is set HIGH on entrance to ISR, and LOW just before exit.
dblisr.png
Adding asm("dsb") or while(IMXRT_PIT_CHANNELS.TFLG == 1); // spin
eliminates the double interrupt. Counting the number of cycles in the ISR with ARM_DWT_CYCCNT, I see 244 cycles for dsb and 349 cycles for the spin wait. The double interrupt takes about 456 cycles (760 ns on scope).
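
For comparing ARM_DWT_CYCCNT counts with scope readings, a small conversion sketch, assuming the 600MHz core clock used elsewhere in this thread. The helper name `cyclesToNanoseconds` is made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: convert a CPU cycle count to whole nanoseconds
// (fractions are truncated).
inline uint64_t cyclesToNanoseconds(uint64_t cycles, uint64_t cpuHz) {
  assert(cpuHz > 0);
  return cycles * 1000000000ull / cpuHz;
}
```

At 600MHz, 456 cycles comes out as 760 ns, which matches the scope reading; 244 cycles is about 406 ns and 349 cycles about 581 ns.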
 
Shall we try 75MHz? (150 IPG / 2)
At least we'll see whether your test sketch shows better numbers then, or whether there is another reason.
 
At least we'll see if your testsketch shows better numbers,
I personally think that a large MAX_PERIOD is not as important as an efficient interrupt handler. For 150MHz, MAX_PERIOD would still be some 28s. If you tell me what to change, I'll test it.
 
test this ;)
replace IntervalTimer.h with this:
Code:
Edit: Code was not functional
Unfortunately, the GPT will be influenced by this, too.
2019-01-24 21_26_31-Start.png
Page 676

Edit: Can someone check if the generated frequency is correct?
 
Great, this brings it down to 2.9% @200kHz interrupt frequency, which is now better than the T3.6 (3.8%).
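
To relate that percentage to the cycle counts discussed earlier, here is a back-of-the-envelope sketch. It assumes the 2.9% is the share of CPU time spent in the interrupt and that the core runs at 600MHz; both are my reading of the thread, and the helper name `cyclesPerInterrupt` is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: average CPU cycles spent per interrupt, given the
// share of CPU time taken by interrupts in tenths of a percent (29 = 2.9%).
inline uint64_t cyclesPerInterrupt(uint64_t loadPerMille, uint64_t cpuHz, uint64_t irqHz) {
  assert(irqHz > 0 && loadPerMille <= 1000);
  return loadPerMille * cpuHz / (1000ull * irqHz);
}
```

Under those assumptions, 2.9% at 200kHz on a 600MHz core works out to 87 cycles per interrupt.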
 
For 150/2 = 75MHz:
CCM_CSCMR1 = (CCM_CSCMR1 & ~0x3f) | CCM_CSCMR1_PERCLK_PODF(1);
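
A sketch of what that read-modify-write does, run on a plain value instead of the real register. The bit layout (PERCLK_PODF in bits 5:0 with divider = PODF + 1, PERCLK_CLK_SEL in bit 6) is my assumption about CSCMR1, and the names `setPerclkPodf` and `perclkHz` are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Assumed layout of the low bits of CCM_CSCMR1.
constexpr uint32_t PERCLK_PODF_MASK = 0x3Fu;   // bits 5:0, divider = PODF + 1
constexpr uint32_t PERCLK_CLK_SEL   = 1u << 6; // 0 = IPG clock, 1 = OSC

// Same read-modify-write as the line above, applied to a plain value.
inline uint32_t setPerclkPodf(uint32_t reg, uint32_t podf) {
  assert(podf <= PERCLK_PODF_MASK);
  return (reg & ~PERCLK_PODF_MASK) | podf;
}

// Resulting PERCLK frequency for a given register value.
inline uint32_t perclkHz(uint32_t reg, uint32_t ipgHz, uint32_t oscHz) {
  uint32_t source = (reg & PERCLK_CLK_SEL) ? oscHz : ipgHz;
  return source / ((reg & PERCLK_PODF_MASK) + 1u);
}
```

Note that the mask 0x3f leaves bit 6 alone. Under this layout the clock source still has to be switched away from OSC separately (for example by clearing CCM_CSCMR1_PERCLK_CLK_SEL, as in the snippet quoted later in the thread) to arrive at 150/2 = 75MHz; otherwise the result would be 24/2 = 12MHz.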

Shall we try GPIO, too ?

But I can't find the clock for it, at the moment.
The 1062 will have faster GPIOs - clocked with the same clock as the ARM.
 
GPIO seems to be clocked directly by IPG, so the only way to make it faster is to overclock IPG.
This will influence PIT and GPT, again...
If you want to try it:
There is a line
Code:
    if (div_ipg > 4) div_ipg = 4;
in clockspeed.c
you can use:
Code:
    if (div_ipg > 2) div_ipg = 2;
to boost it a little, to 300MHz, or more precisely 1/2 ARM speed.
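
A minimal model of the effect of that clamp. Only the clamp line itself comes from clockspeed.c as quoted here; how div_ipg is derived before the clamp (smallest divider that keeps IPG at or below a target) is my assumption, and the function name `ipgClockHz` is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model: pick the smallest divider that keeps IPG at or below
// targetIpgHz, then clamp it to maxDiv the way the quoted line does.
inline uint32_t ipgClockHz(uint32_t armHz, uint32_t targetIpgHz, uint32_t maxDiv) {
  assert(armHz > 0 && targetIpgHz > 0 && maxDiv >= 1);
  uint32_t div_ipg = (armHz + targetIpgHz - 1u) / targetIpgHz;  // round up
  if (div_ipg > maxDiv) div_ipg = maxDiv;
  return armHz / div_ipg;
}
```

With a 600MHz ARM clock and a 150MHz target, a clamp of 4 gives IPG = 150MHz and a clamp of 2 gives 300MHz, i.e. half the ARM speed.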


Good night.
I take a break and drink a beer.

I'll try to create a pull request for the PIT change in the next few days - IPG is 1/2 (1/4) of the ARM clock, so we don't have a fixed clock anymore if we do it this way.
 
@manitou - do you think we should add that DSB to the IntervalTimer interrupt - just to be sure?

Maybe.? In the NXP SDK examples almost every ISR had the dsb, with this comment
Code:
/* Add for ARM errata 838869, affects Cortex-M4, Cortex-M4F, Cortex-M7, Cortex-M7F Store immediate overlapping
  exception return operation might vector to incorrect interrupt */
#if defined __CORTEX_M && (__CORTEX_M == 4U || __CORTEX_M == 7U)
    __DSB();
#endif
Does the errata reason jibe with what we are seeing?
 
Re: double interrupts

for your viewing pleasure, here is another scope shot of PIT isr firing twice. scoped pin is set HIGH on entrance to ISR, and LOW just before exit.
View attachment 15712
Adding asm("dsb") or while(IMXRT_PIT_CHANNELS.TFLG == 1); // spin
eliminates the double interrupt. Counting the number of cycles in the ISR with ARM_DWT_CYCCNT, I see 244 cycles for dsb and 349 cycles for the spin wait. The double interrupt takes about 456 cycles (760 ns on scope).

I was also wondering whether this is slightly related: with the FlexIO stuff, I put some digitalWriteFast high/low calls around where my callback code is.
I also had two others defined in them as well (process RX or Process TX)...

And I am seeing things like: screenshot.jpg
Where the first 4 lines are the different TX/RX pins for Serial1 and my FlexSerial object.

The next line is when my Callback is called from ISR handler and next ones are when it processes either RX or TX...

The ISRs are called when the appropriate bit is set in the SHIFTSIEN register and the corresponding state is set in the SHIFTSTAT register. The actual states are cleared on TX if I write something to the appropriate SHIFTBUF, and on RX if I read something from the SHIFTBUF or, in this case, the alternate SHIFTBUFBYS (which has to do with swapping bits/bytes)...

But I was wondering whether these double interrupts are the same thing?
 
@Paul:

I think you read the conversation with Luni.
I can edit all the files and introduce some new "#defines", or better, variables with the different clock speeds.
I'd go away from the F_CPU/F_BUS scheme and add something like __imxrt_ipg_clk, __imxrt_per_clk etc.

Since I have done some PRs in the past which never got merged :) and this takes some time (I have to edit IntervalTimer too and adapt it to the new variable __imxrt_ipg_clk, edit clockspeed.c, perhaps more) -
I'd like to know beforehand whether you'll merge it or not.

I'll do it Saturday or Sunday - can you answer by then? That would be great.
 
What are the design goals and requirements for the 4.0? How does it differ, at a high level, from the 3.x boards?
 
test this ;)
replace IntervalTimer.h with this:
// ...
Unfortunately, the GPT will be influenced by this, too.
...
Page 676

Edit: Can someone check if the generated frequency is correct?

One of my early sketches was interval timer at 2 us and it was doing just var++ and showing right at 500K hits as expected.

When I swap this {posted .h code} in, the count from var++ is way low - about 80,000 instead of 500,000.

The sample is github.com ... /Blink_IntvTime

In my local copy I switched to Serial1 with .begin and output -
The output shows IntervalTimer count as "ITcnt=79753" versus "ITcnt=499294"

My CORES was copied yesterday - not sure if I need any other changes for this to work right - but it isn't now?

I just made the _isr look like this and the results don't change - so I don't see it double firing:
Code:
void TimeSome() {
  jj++;
  asm("dsb");
}
 
@defragster: Oh indeed -
@Luni, sorry, seems not to work.
Apart from the typo here:
constexpr IntervalTimer() {
CCM_CSCMR1 &= ~CCM_CSCMR1_PERCLK_CLK_SEL;
There seem to be other problems.

Ok, not much we can do, then.
Good night.
 
To shed some light on my earlier question...
I was also wondering whether this is slightly related: with the FlexIO stuff, I put some digitalWriteFast high/low calls around where my callback code is.
I also had two others defined in them as well (process RX or Process TX)...

But I was wondering whether these double interrupts are the same thing?
I went ahead and tried adding the dsb...

Code:
void IRQHandler_FlexIO1() {
	FlexIOHandlerCallback **ppfhc = flex1_Handler_callbacks;
	for (uint8_t i = 0; i < 4; i++) {
		if (*ppfhc) {
			if ((*ppfhc)->call_back(&flexIO1)) return;
		}
		ppfhc++;
	}
	flexIO1.IRQHandler();
	 asm("dsb");
}
And now it appears like the double interrupts went away... :D
Wondering if we need to sprinkle it in more places as well, like HardwareSerial, ...
 
Yep - I thought I would mention the stuff with Serial Monitor, in case Paul expected the Autoscroll stuff to be working.

I've put these on the issue list. These are long-standing problems with the Arduino serial monitor. The lack of proper freezing when auto-scroll is off and data continues pouring in at high speed has been with us for quite some time, but now that we're able to send so much faster it's more serious.

I'm also concerned about resource usage when we're able to send at ~50 Mbyte/sec...

For now I'm trying to avoid opening up the Java code. Definitely will look at it later, but probably not until we're well into 1062 testing.
 