Teensy 4.0 First Beta Test

Status
Not open for further replies.
@luni
Have to ask but are you calculating your load value correctly. Your load equation says to me (CPU cycles no interrupts)/(CPU Cycles w/interrupts). For load shouldn't be the other way around, (CPU Cycles w/interrupts)/(CPU cycles no interrupts)? Or am I missing something here on how you are calculating load?
 
Wow Kurt, good work.
Thanks.

Mostly just having some fun and learning :D
Your prior comment suggested you hadn't gotten to tweaking yet, assumed you'd get there :) Given CAN FD is likely taking 2nd SPI w/1062 it is very good the FlexIO can function - and the 1062 adds more of those it seems.
That is my thoughts as well, which is why I started playing with some of this stuff.

I still have lots to understand in this subsystem. So far I have taken their example setups like SPI Master, and put them to code, with changes like their examples would say things like using clock0 on pin0... So would convert those to which clock I allocated and which pin that was passed in... And now I have started playing more of the configuration of it, like, if their setup generated a 7.5mhz clock, can I configure it....

Example of now trying to learn something, I hacked in to my flexio main library function code, the ability to hopefully configure the clock for Flexio 1 and 2... And the ability to hopefully read in the current configuration and figure out what the FlexIO current speed is.
Code:
	// clk_sel (0-3): PLL4, PLL3 PFD2, PLL5, *PLL3_SW
	// clk_pred (0, 1, 2, 7): divide by (n+1)
	// clk_podf (0, *7): divide by (n+1)
	// So default is 480mhz/16

Right now trying to experiment to see what the Actual base clock speeds are and not all of them are working.
Code:
//-----------------------------------------------------------------------------
// Compute the Flex IO clock rate. 
//-----------------------------------------------------------------------------
uint32_t FlexIOHandler::computeClockRate()  {
	// Todo: add all of this stuff into hardware()... 
	uint32_t pll_clock = 480000000U;	// Assume PLL3_SW (clk_sel == 3)
	uint8_t clk_sel;
	uint8_t clk_pred;
	uint8_t clk_podf;
	if ((IMXRT_FLEXIO_t *)port_addr == &IMXRT_FLEXIO1_S) {
		// FlexIO1... 
		clk_sel = (CCM_CDCDR >> 7) & 0x3;
		clk_pred = (CCM_CDCDR >> 12) & 0x7;
		clk_podf = (CCM_CDCDR >> 9) & 0x7;
	} else {
		// FlexIO2... 
		clk_sel = (CCM_CSCMR2 >> 19) & 0x3;
		clk_pred = (CCM_CS1CDR >> 9) & 0x7;
		clk_podf = (CCM_CS1CDR >> 25) & 0x7;
	}
	// TODO - look at the actual clock select
	return pll_clock / (uint32_t)((clk_pred+1) * (clk_podf+1));
}

//-----------------------------------------------------------------------------
// Try to set clock settings
//-----------------------------------------------------------------------------
void FlexIOHandler::setClockSettings(uint8_t clk_sel, uint8_t clk_pred, uint8_t clk_podf)  {
	// Todo: add all of this stuff into hardware()... 
	// warning this does no validation of the values passed in...
	if ((IMXRT_FLEXIO_t *)port_addr == &IMXRT_FLEXIO1_S) {
		// FlexIO1... 
		// need to turn clock off...
		hardware().clock_gate_register &= ~hardware().clock_gate_mask;

		CCM_CDCDR = (CCM_CDCDR & ~(CCM_CDCDR_FLEXIO1_CLK_SEL(3) | CCM_CDCDR_FLEXIO1_CLK_PRED(7) | CCM_CDCDR_FLEXIO1_CLK_PODF(7))) 
			| CCM_CDCDR_FLEXIO1_CLK_SEL(clk_sel) | CCM_CDCDR_FLEXIO1_CLK_PRED(clk_pred) | CCM_CDCDR_FLEXIO1_CLK_PODF(clk_podf);

		// turn clock back on
		hardware().clock_gate_register |= hardware().clock_gate_mask;
	} else {
		// FlexIO2... 
		// need to turn clock off...
		hardware().clock_gate_register &= ~hardware().clock_gate_mask;

		CCM_CSCMR2 = (CCM_CSCMR2 & ~(CCM_CSCMR2_FLEXIO2_CLK_SEL(3))) | CCM_CSCMR2_FLEXIO2_CLK_SEL(clk_sel);
		CCM_CS1CDR = (CCM_CS1CDR & ~(CCM_CS1CDR_FLEXIO2_CLK_PRED(7)|CCM_CS1CDR_FLEXIO2_CLK_PODF(7)) )
			| CCM_CS1CDR_FLEXIO2_CLK_PRED(clk_pred) | CCM_CS1CDR_FLEXIO2_CLK_PODF(clk_podf);

		// turn clock back on
		hardware().clock_gate_register |= hardware().clock_gate_mask;
	}
}


Right now the above code assumes the default settings (clk_sel, clk_pred, clk_podf) = (3, 1, 7), i.e. 480mhz/16 = 30mhz. With the SPI default divide of 4, I see about 7.5mhz.
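
The default arithmetic above can be checked on a desktop compiler. This is only a minimal sketch, not the library code: `flexioClockHz` is a hypothetical helper name, and the 480 MHz source is the PLL3_SW figure assumed in the post.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of the FlexIO library): both divider
// fields divide by (n + 1), as noted in the register comments above.
inline uint32_t flexioClockHz(uint32_t sourceHz, uint8_t clkPred, uint8_t clkPodf) {
    assert(clkPred <= 7 && clkPodf <= 7);  // both fields are masked to 3 bits
    return sourceHz / ((clkPred + 1u) * (clkPodf + 1u));
}
```

With the defaults, `flexioClockHz(480000000u, 1, 7)` gives 30000000, and dividing that by the SPI default of 4 gives the 7.5 MHz seen on the pin.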

So the current experiment is to change the default clk_sel to each of the different possibilities. Current results are:
Code:
0 - PLL4 - Hang have to use program button to get it to program again
1 - Pll3 PFD2 - 7.937mhz * 4 = 31.748 * 16 = 507.968
2 - PLL5 - Hang ... 
3 - PLL3_sw - as mentioned 7.5*4 = 30 * 16 = 480
So the 507.968 is not far off what I expected of 508.24 from the clock switcher page 986 of the pdf...
As for PLL4 and PLL5, wonder if I need to do something to enable these clocks... Which I expect default to PLL4(786.43) and PLL5(649.52) ?

Also assuming I get some of this configuration stuff working, then I need to update the Flex Serial code as well to call off to get the configured clock speed...

Another thing I am trying to figure out is what I wish to do here with SPISettings for transaction support. Currently I can not use the standard SPI version as it is very hard coded to only work with SPI and only with the assumed clock setups... That is, if you ask for a 20mhz clock speed, it will at compile time try to reduce this down to the actual register settings for the LPSPI system with the assumed clock speeds...

Currently I have a FlexSPISetting class part of my FlexIO stuff, that currently just holds onto the values passed in. Then on beginTransaction, it checks to see if the settings are the same as before. If not, it then computes the dividers... (currently I am not looking at LSB/MSB or mode)... Wonder if this is the best way to go, or if somehow the SPISettings could do both? That is hold onto the values they were configured with and also precompute the SPI clock settings...
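
The "hold the values, recompute only when they change" idea could be sketched roughly as below. All names here (`FlexSettingsSketch`, `FlexSpiSketch`) are hypothetical and not taken from the library, and the rule that only even dividers of the FlexIO clock are available is an assumption of this sketch, not something stated above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical settings holder: stores the requested values unchanged.
struct FlexSettingsSketch {
    uint32_t clockHz;
    uint8_t bitOrder;
    uint8_t dataMode;
    bool operator==(const FlexSettingsSketch &o) const {
        return clockHz == o.clockHz && bitOrder == o.bitOrder && dataMode == o.dataMode;
    }
};

// Hypothetical device wrapper: recomputes the divider only when the
// settings differ from those of the previous transaction.
class FlexSpiSketch {
public:
    explicit FlexSpiSketch(uint32_t flexioClockHz) : flexioClockHz_(flexioClockHz) {}

    void beginTransaction(const FlexSettingsSketch &s) {
        if (haveLast_ && s == last_) return;  // unchanged: keep the cached divider
        assert(s.clockHz > 0);
        // Assumption: only even dividers >= 2 exist. Round up so the
        // resulting clock never exceeds the requested one.
        uint32_t div = (flexioClockHz_ + s.clockHz - 1) / s.clockHz;
        if (div < 2) div = 2;
        if (div & 1u) ++div;
        divider_ = div;
        last_ = s;
        haveLast_ = true;
        ++recomputeCount_;
    }

    uint32_t actualClockHz() const { return flexioClockHz_ / divider_; }
    uint32_t recomputeCount() const { return recomputeCount_; }

private:
    uint32_t flexioClockHz_;
    uint32_t divider_ = 2;
    uint32_t recomputeCount_ = 0;
    FlexSettingsSketch last_{};
    bool haveLast_ = false;
};
```

The alternative raised above, precomputing the dividers inside the settings object, would move the divider math into the settings constructor, but then the settings object has to know the FlexIO clock at the time it is built.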

Well now back to experimenting.... Searching to see if I can enable PLL4 or ...
 
Which hardware uses CANFD? Never worked with CAN. It's used in cars - are there more uses? Seems interesting...

A minimum of two CAN controllers is what enthusiasts need for isolated CAN gateways or bridging as well. On the other side, you can use it for communication between microcontrollers over more than 40 meters, as it's a differential network. My previous attempt at a use for it was CANquitto, which gave payload support to multiple teensies and peripheral port access (USBSerial, UART, SPI, I2C, TOUCH, digital pins, analog, etc) in a multi-master environment. Granted, that was built on teensy3.x's CAN2.0 8 byte frames; CANFD has up to 64 byte frames. Not only will fewer frames be needed for reassembly, but the flexdata speeds give it an extra boost in data rate :)
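
To put a number on "fewer frames for reassembly", here is a small sketch. `framesNeeded` is a hypothetical helper, and it ignores any per-frame reassembly header a real protocol would add.

```cpp
#include <cassert>
#include <cstdint>

// Frames needed to carry `payloadBytes` when each frame holds at most
// `bytesPerFrame` data bytes (rounded up).
inline uint32_t framesNeeded(uint32_t payloadBytes, uint32_t bytesPerFrame) {
    assert(bytesPerFrame > 0);
    return (payloadBytes + bytesPerFrame - 1) / bytesPerFrame;
}
```

For a 256-byte payload that is `framesNeeded(256, 8)` = 32 frames at 8 bytes each versus `framesNeeded(256, 64)` = 4 frames at 64 bytes each.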
 
PLL4 and PLL5

Maybe I won't setup to use these for FlexIO... But to follow on with previous post:
PLL4 I believe is the Audio PLL and PLL5 is the Video PLL...

My quick look through system, it appears like by default these are disabled. That is if I look at the register CCM_ANALOG_PLL_VIDEO I believe that this maybe controls this PLL. There are some bits like ENABLE(appears to default to off), POWERDOWN(default to 1) BYPASS(default to 1) (not sure what it is bypassing...)

Guess question will be for these it looks like PLL4(786.43) and PLL5(649.74) - If their additional speeds would be good to have here and/or if other sub-systems like Audio will be in direct competition to use/set these.
 
@luni
Have to ask but are you calculating your load value correctly. Your load equation says to me (CPU cycles no interrupts)/(CPU Cycles w/interrupts). For load shouldn't be the other way around, (CPU Cycles w/interrupts)/(CPU cycles no interrupts)? Or am I missing something here on how you are calculating load?

That's right, I calculated the load relative to the time with interrupts; it makes more sense to calculate it relative to the time without interrupts.

Load = 100% * ( cyclesWithInterrupts / cyclesWithoutInterrupts - 1.0)

SanityCheck (small numbers for easy calculation)
A background load of 20% means every 5th cycle is used by the background task -> If the loop takes 15 cycles without interrupts it would require 18cycles with a background load of 20%. The formula from above gives: Load = 100% * (18/15 -1 ) = 20%, I.e. the formula should be OK...

I'll change that, apply Frank's additional changes (24->150MHz clock, anything else?) and do the comparison again later today.
 
Load = 100% * ( cyclesWithInterrupts / cyclesWithoutInterrupts - 1.0)

Sorry, this is not a good definition since it gets larger than 100% for large loads. Switched back to my old definition

Load = 100% * (1.0 - cyclesWithoutInterrupts / cyclesWithInterrupts) ,

and printed the raw cycle numbers in case somebody wants to use a different load definition. I also changed the constructor of the IntervalTimer to
Code:
constexpr IntervalTimer() {
  CCM_CSCMR1 &= ~CCM_CSCMR1_PERCLK_CLK_SEL;
  CCM_CSCMR1 |= CCM_CSCMR1_PERCLK_PODF(0);  // OR-ing cannot clear divider bits that are already set
}
(@Frank: hope that's correct, unfortunately you deleted your originally posted code) ... adjusted the calculation of the LDVAL to 150MHz instead of 24MHz and verified the output frequency with my LA.

Summary:

Code:
1 timer 200kHz:       4 timers 200kHz
T4.0:  6.9% load      T4.0:  49.4% load
T3.6:  3.8% load      T3.6:  15.2% load

(Remark: with the currently set clock of 24MHz I get a load of 83% using all 4 timers with empty ISR)

Of course a T4.0 @50% load has still more computation power than a T3.6@15% load.

It is interesting that the T4 needs about 20% fewer cycles for the loop with disabled interrupts compared to a T3.6 (see below). Does the T4 have a more efficient instruction set? Both were compiled with -O2. I'll have a look at the *.lst file tomorrow.

Details:
Code:
T4.0 -----------------------------------------------------------------------

1 interval timer, empty ISR ----
f:200.0 kHz Load:   6.9  (w/o interrupts: 6500008 with interrupts 6979972)
f:100.0 kHz Load:   3.4  (w/o interrupts: 6500008 with interrupts 6727494)
f: 50.0 kHz Load:   1.7  (w/o interrupts: 6500008 with interrupts 6612375)
f: 25.0 kHz Load:   0.9  (w/o interrupts: 6500008 with interrupts 6556277)
f: 12.5 kHz Load:   0.4  (w/o interrupts: 6500008 with interrupts 6528713)
f:  6.2 kHz Load:   0.2  (w/o interrupts: 6500008 with interrupts 6514250)

4 interval timers, empty ISR
f:200.0 kHz Load:  49.4  (w/o interrupts: 6500009 with interrupts 12836482)
f:100.0 kHz Load:  12.2  (w/o interrupts: 6500009 with interrupts 7400713)
f: 50.0 kHz Load:   6.0  (w/o interrupts: 6500009 with interrupts 6917965)
f: 25.0 kHz Load:   3.0  (w/o interrupts: 6500009 with interrupts 6702975)
f: 12.5 kHz Load:   3.1  (w/o interrupts: 6500009 with interrupts 6707220)
f:  6.2 kHz Load:   1.6  (w/o interrupts: 6500009 with interrupts 6602729)

T3.6 -----------------------------------------------------------------------

1 interval timer, empty ISR ----
f:200.0 kHz Load:   3.8  (w/o interrupts: 8002773 with interrupts 8320619)
f:100.0 kHz Load:   1.9  (w/o interrupts: 8002773 with interrupts 8160598)
f: 50.0 kHz Load:   1.0  (w/o interrupts: 8002773 with interrupts 8085110)
f: 25.0 kHz Load:   0.6  (w/o interrupts: 8002773 with interrupts 8047330)
f: 12.5 kHz Load:   0.3  (w/o interrupts: 8002773 with interrupts 8028611)
f:  6.2 kHz Load:   0.2  (w/o interrupts: 8002773 with interrupts 8019045)

4 interval timers, empty ISR
f:200.0 kHz Load:  15.2  (w/o interrupts: 8002727 with interrupts 9436160)
f:100.0 kHz Load:   7.6  (w/o interrupts: 8002727 with interrupts 8662358)
f: 50.0 kHz Load:   3.8  (w/o interrupts: 8002727 with interrupts 8322371)
f: 25.0 kHz Load:   2.0  (w/o interrupts: 8002727 with interrupts 8163062)
f: 12.5 kHz Load:   1.0  (w/o interrupts: 8002727 with interrupts 8085143)
f:  6.2 kHz Load:   0.6  (w/o interrupts: 8002727 with interrupts 8047168)
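
The Load column can be reproduced from the raw cycle counts. The sketch below implements both definitions discussed in this post; the function names are hypothetical, chosen only for this example.

```cpp
#include <cassert>
#include <cstdint>

// Definition used for the tables: share of the total time taken by interrupts.
inline double loadPercent(uint32_t cyclesWithout, uint32_t cyclesWith) {
    assert(cyclesWith > 0);
    return 100.0 * (1.0 - static_cast<double>(cyclesWithout) / cyclesWith);
}

// Alternative definition: overhead relative to the undisturbed loop.
// It exceeds 100% once the loop takes more than twice as long.
inline double overheadPercent(uint32_t cyclesWithout, uint32_t cyclesWith) {
    assert(cyclesWithout > 0);
    return 100.0 * (static_cast<double>(cyclesWith) / cyclesWithout - 1.0);
}
```

For the T4.0 rows at 200 kHz, `loadPercent(6500008, 6979972)` comes out at about 6.9 and `loadPercent(6500009, 12836482)` at about 49.4, matching the table. For the 4-timer case `overheadPercent` already gives about 97.5, which shows how quickly that definition approaches and passes 100%.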
 
@luni : "It is interesting that the T4 needs about 20% fewer cycles for the loop with disabled interrupts compared to a T3.6 (see below). Does the T4 have a more efficient instruction set? Both were compiled with -O2. I'll have a look at the *.lst file tomorrow. "
The T4 has way more advanced/deeper multilevel pipeline and a cute TLA name for doing two things at once so that may be coming into play. <Mike :)>Paul made note of that when I misinterpreted the TLA in a post he made some time back when the wonders of this IMXRT MCU were called out.

Kurt: How do I keep Serial data moving when interrupts are disabled? Unless I keep pushing out new Serial1.print() data - some of the old data gets orphaned unmoved in the output buffers.
Paul did this on T_3.x:
Code:
 if (SIM_SCGC4 & SIM_SCGC4_UART0) uart0_status_isr();

Code is working and running to print when in a Fault from "C" code in a CPP - but not 'real time' to Serial1 - but USB Serial seems to be. The CPU is not taking 'PC software bootloader' request - but again that is the current USB stack and Paul has a TODO for that
 
Well now back to experimenting.... Searching to see if I can enable PLL4 or ...

Kurt, i installed/cloned your flexio_T4 stuff and ran the simple SPI example. flexSPI.cpp had wrong name for include file. fixed that and it ran and printed the setup info, but i didn't see any activity on any of pins 2,3,4,5 (assuming those are 2,3,4,5 on the T4). I'm running with stock beta7 core file, do i need other core file updates, or ? A serial.print at the start of loop() only prints once.

Curious, do you not have to set any of the values for the GPIO MUX PAD for the pins?

EDIT: ok fetched latest imxrt.h and example works, flexIO clock @160mhz. scope shows SPI clock at 19.6 mhz (setting requested 20mhz). Changed settings to 40mhz, and scope shows 40 mhz (Vpp 2.5v), interbyte gap 290 ns, so data rate is about 16.4 mbs for 8-bit frames. About 17.7mbs for 16-bit frames

EDIT 2: 8-bit frame, block-transfer (null RX buffer), @40mhz, 31.6 mbs. no change if disable hardware CS. inter-byte gap 47 ns
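
A rough model for how the inter-byte gap limits throughput: each frame costs its shift time plus a fixed gap. This is a simplified sketch with a hypothetical helper name, and it will not reproduce every measured figure exactly.

```cpp
#include <cassert>

// Effective data rate in Mbit/s for frames of `bitsPerFrame` bits clocked at
// `sckMHz`, with a fixed idle gap of `gapNs` nanoseconds after each frame.
inline double effectiveMbps(unsigned bitsPerFrame, double sckMHz, double gapNs) {
    assert(sckMHz > 0.0);
    const double frameNs = bitsPerFrame * 1000.0 / sckMHz;  // time spent shifting bits
    return bitsPerFrame * 1000.0 / (frameNs + gapNs);       // bits per microsecond
}
```

For 8-bit frames at 40 MHz, a 290 ns gap gives roughly 16.3 Mbit/s in this model, in line with the first measurement, and a 47 ns gap gives roughly 32.4 Mbit/s, a little above the 31.6 measured for the block transfer.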
 
Kurt: How do I keep Serial data moving when interrupts are disabled? Unless I keep pushing out new Serial1.print() data - some of the old data gets orphaned unmoved in the output buffers.
Paul did this on T_3.x:
Code:
 if (SIM_SCGC4 & SIM_SCGC4_UART0) uart0_status_isr();
A couple of things might work:
a) make the member function IRQHandler() public in HardwareSerial. Or maybe protected, maybe friend... If so you could maybe do something like:
Serial1.IRQHandler();

Not very clean, also if you did this for all 8 possible serial ports, this would pull in the objects, buffers, ... For all 8...

b) Even less clean, you might be able to get away with doing something similar to T3.x... Again not clean, but maybe something like:
Code:
if (CCM_CCGR3 & CCM_CCGR3_LPUART6(CCM_CCGR_ON)) IRQHandler_Serial1();
Note: again, IRQHandler_Serial1 is only defined in HardwareSerial.cpp and is part of a hardware table, so this would probably require either an extern declaration or exporting it, and again this would bring in all of these objects...

c) Wonder if you could simply look at the IRQ handler and if set called it...
That is, currently the code when you call Serial1.begin(...) will do: attachInterruptVector(hardware->irq, hardware->irq_handler);

Which in this case is: for: IRQ_LPUART6 and IRQHandler_Serial1 So would something like:
Code:
if (_VectorsRam[IRQ_LPUART6 +16]) (*_VectorsRam[IRQ_LPUART6 +16])();

Note: I have not taken a detailed look at how _VectorsRam is initialized, so not sure if test would be for non NULL or not being the default handler address.

Again not sure of best ways... But the last one, hopefully does not cause any additional code to be brought in.

EDIT: looks like default value installed...
So maybe more like:
Code:
if (_VectorsRam[IRQ_LPUART6 +16] != &unused_interrupt_vector) (*_VectorsRam[IRQ_LPUART6 +16])();
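
The "call the vector only if it is not the default" idea can be tried out on a desktop with a simulated table. Everything below is a stand-in with hypothetical names: `g_vectorsSim` plays the role of the RAM vector table and `unusedHandlerSim` the role of the default handler. It does not touch real hardware.

```cpp
#include <cassert>

using Handler = void (*)();

static int g_serviced = 0;                        // counts simulated handler calls
static void unusedHandlerSim() {}                 // stands in for the default vector
static void serialHandlerSim() { ++g_serviced; }  // stands in for a UART handler

// Simulated vector table: entry 0 is "not attached", entry 1 has a handler.
static Handler g_vectorsSim[2] = { unusedHandlerSim, serialHandlerSim };

// Call the handler at `index` only if something other than the default
// handler is installed there. Returns true if a handler was called.
inline bool pollVector(unsigned index) {
    assert(index < 2);
    Handler h = g_vectorsSim[index];
    if (h == unusedHandlerSim) return false;
    h();
    return true;
}
```

The point of going through the table is that the polling code needs no reference to any serial object, so nothing extra should get linked in.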
 
Thanks Kurt - Clean doesn't matter if it works - it is DEV time debug code - just needs to be safe and effective and not cause any trouble - and at this point the MCU is faulted and I just want to show the Log/Trace info in case it helps see how the fault came about. It works perfectly when not faulted - i.e. just calling the debug_tt functions for fun.

You may have missed my post 1067 then post #1109 the other day - I already hacked the "IRQHandler() public" the other day and it didn't seem to help - though I may have had other issues?

The data doesn't get lost - just buffered ... and each new print pushes out about 16 bytes?
Will look at these tips and post what I see. And the code doesn't have a Stream* until runtime ...

Just gave the last one a quick try as it seemed 'direct' - not seeing it help yet ... will study a bit
 
Hi @defragster,

You might have missed my edit of the post, where the last one should check against not being the default...
Code:
if (_VectorsRam[IRQ_LPUART6 +16] != &unused_interrupt_vector) (*_VectorsRam[IRQ_LPUART6 +16])();
And of course if you edited the ram vectors to something other than unused... Then need to check against that one...

As for in post 1067 calling: HardwareSerial::IRQHandler()
This is not a static function, but is instead called with pointer to this... That is for example for Serial1, we have:
Code:
void IRQHandler_Serial1()
{
	Serial1.IRQHandler();
}
So then it has pointers to all of the stuff specific for that hardware port and buffers and...
 
I did not see the edit - but just confirmed indeed Serial1 in use has been set to !=unused_interrupt_vector. And in the call to FlushPorts - I see it was called about 1027 times during the dump loop and it had no helpful effect.

It is like it is counting bytes in the buffer and deciding to leave before empty - or the FIFO is stalling? It turns out the number of bytes I'm printing in subsequent batches is ~14 and ~14 is what comes out belatedly - then I see this code/comment:
Code:
} while (((port->WATER >> 8) & 0x7) < 4); 	// need to computer properly
> it doesn't seem to be that as that is the xfer to the FIFO?

I see this when I force a fault - the bold comes out in good time - then it starts trickling out where underlined:
NOW FAULT ...

in Sketch ... Fault irq 3
stacked_r0 :: 1
stacked_r1 :: 16fe28
stacked_r2 :: 5e
stacked_r3 :: 418000f1
stacked_r12 :: 20001c8c
stacked_lr :: 233
stacked_pc :: 236
stacked_psr :: 1000000
_CFSR :: 400
_HFSR :: 40000000
_DFSR :: 0
_AFSR :: 0
_BFAR :: 0
_MMAR :: 0

HELLO WORLD ... Fault irq 3

userDebugDumptt() IN debug_tt ___

t:\tcode\libraries\debug_tt\debug_tt.cpp Jan 27 2019 18:01:18

F_CPU=600000000
userDebugDumptt() CALL >> debug_fault( 7 );

>>>> debug_fault >>>> debug_fault >>>> TYPE:T_4
Debug Info:
0 => 756765 0xB8C1D [L#94_C#2
1 => 1506856 0x16FE28 [L#94_C#3 _<< last func::loop
2 => 755809
0xB8861 [L#76_C#1

returned from CALL >> d

Just for fun and testing I put this in the FlushPorts() and it worked without #include or faulting:
Code:
  IRQHandler_Serial1();
  IRQHandler_Serial2();
  IRQHandler_Serial3();

That is the 'friendly' interface I thought I'd want to find - but it doesn't help.

Very odd because I print in parallel to Serial and Serial1 and Serial comes out real time and then Serial1 'trickles' so I'm not sure what is driving Serial as I'm doing nothing for that.

I added '__enable_irq();' and it doesn't help as in Fault state as we are in a Sub_Zero Priority interrupt.
 
@Defragster: The "world smallest softwareserial" I posted some days ago does not need interrupts..

Yeah - I saw that, except it generally works as is with standard print functions - I'd have to sprintf to buffers and then dump it out - and then taking user input to prompts gets involved ... Hopefully the workaround below adds needed value until perhaps we find out why output sticks.

More CORES pulled - the new DUMP regs labelled like p#1289 above is in and works as is or for user/debug_tt to take over the weak handler and go from there.

KurtE - I punted for now and in the case of a T4 fault I just print out strings of '...' to force the buffer through. It limits what can happen after that - no user input for next actions - but at least the dump data can come out ASAP if it helps. Back to the main debug_tt stuff.

<edit> Found Paul's code to put T4 to Bootloader - with a Prompted 'b' for bootloader from debug_tt [i.e. not a Fault] when TeensyLoader 'Auto' is active it programs and restarts. Didn't find code to just restart yet - but T_Loader can GUI restart if Auto is off. Frank has PowerOff code but not sure that helps here.
 
AUDIO LIBRARY

Started looking at the audio library yesterday and going through it slowly. Yes, I know Paul said he was looking at redoing audiostream, based on FrankB's pull request, but wanted to see what it would take. But ran into a silly problem. If I try to use F_BUS, it tells me not defined, if I use F_BUS_ACTUAL its happy. F_BUS is all over the place, I can do a workaround but where do I do the initial define for F_BUS? I tried in clockspeed.c but that didn't work.
 
I'm working on the library since yesterday. It compiles now, and tonight I'll try to add WMXZ's code.
Created a PR yesterday for some #defs.
 
Morning Frank,
Got it. I'll cease and desist :) on the lib. You know it better than me. Will wait for your changes to post.

Mike
 
Thanks for testing! Note I just pushed up some changes, which I believe allows me to not define using hardware (FLEX) pin for CS pin. Still WIP as I can and should update my code to not allocate second timer in this case as I am not using it.

But if I output at the default clocks... Asking for 20mhz, getting something like 14.7/15... With using the Hardware Flex pin you see:
screenshot.jpg

But if I turn off the hardware CS support and have the sketch put out start/stop CS instead, you see something like:
screenshot.jpg
So time went from 17.03us to 14.81us to output the 27 characters

Now probably fix the allocating of the extra timer and then play more with PLL4/PLL5 settings. I see that my last update enabled PLL5 to some extent, and a first test showed it was slower in this configuration...
 
Now probably fix the allocating of the extra timer and then play more with PLL4/PLL5 settings. I see that my last update enabled PLL5 to some extent, and a first test showed it was slower in this configuration...

FWIW, I should have been timing the block transfer. With 8-bit frame, 1024-byte block transfer (null RX buffer) @40mhz, I measure 31.6 mbs. Inter-byte gap 47 ns, the ubiquitous MCU synchronization delay for touching an IO register -- maybe DMA would eliminate the inter-frame delay.
 
Hi @manitou,

Yep I have not tried DMA here yet, but with the Transfer buffer, I would think it would be working at about the highest speed output as I believe it is all double buffered...

The code for transfer buffer could be optimized a bit more, like precompute the byte that goes out instead of figure out after the Shift buffer is filled with previous output. But again I believe that the Buffer register is moved into the shifter register... But at some point will have to try the dma!

Also at some point may want to try out using more DMA for other areas as well. Example I wonder if it makes sense to try to change things like HardwareSerial RX interrupts to try to use DMA (circular memory) to do the receives... Not sure how easy it would be to control, that is knowing when additional information came into the logical queue and move it to the user queue... I did do some of this for some other boards (OpenCM9.04, OpenCR), for both RX and TX... Not sure if that makes sense here or not.
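
One way to know "when additional information came in" with a circular DMA receive is to compare the DMA write position against a software read position. The sketch below only simulates that idea with hypothetical names. `dmaRemaining` models a hardware counter of bytes left before the DMA wraps to the start of the buffer, and a real buffer written by DMA would also need volatile and cache handling that is left out here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical reader for a buffer that a circular DMA channel fills.
template <std::size_t N>
class DmaRingReaderSketch {
public:
    explicit DmaRingReaderSketch(const uint8_t (&buf)[N]) : buf_(buf) {}

    // Copy bytes written since the last call into `out` (at most `outCap`).
    std::size_t drain(std::size_t dmaRemaining, uint8_t *out, std::size_t outCap) {
        assert(dmaRemaining >= 1 && dmaRemaining <= N);
        const std::size_t head = (N - dmaRemaining) % N;  // next index the DMA will write
        std::size_t n = 0;
        while (tail_ != head && n < outCap) {
            out[n++] = buf_[tail_];
            tail_ = (tail_ + 1) % N;
        }
        return n;
    }

private:
    const uint8_t (&buf_)[N];
    std::size_t tail_ = 0;  // next index software will read
};
```

A known weakness of this scheme is that it cannot tell when the DMA has lapped the reader, so `drain` would have to be called often enough (for example from a timer or an idle-line interrupt) to stay ahead of the incoming data.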

But that is a different story.
 
Hi @manitou,

Yep I have not tried DMA here yet, but with the Transfer buffer, I would think it would be working at about the highest speed output as I believe it is all double buffered...
Your 2nd logic analyzer plot seems to show no inter-frame gap (with software controlled CS), but i just fetched your latest from github, and I am still seeing an interframe gap with software CS. Here is scope of 1024-byte block transfer @40mhz, data rate 31.5 mbs
flex.png
Curiously the first two byte have no gap.

For those of us with case-sensitive OS, you need to change include in flexSPI.cpp to FlexSPI.h

EDIT: I just ran block transfer with SPI CLK@20mhz, and i see no interframe gap on scope, get data rate of 20 mbs. Tests were with SPIFLEX.flexIOHandler()->setClockSettings(3,2,0); (160mhz flexio clock)

? SPI@30mhz block transfer ran twice and hung ?
In flexSPI.cpp, if i change transfer() while to while (tx_count) instead of rx_count, sketch doesn't hang @30mhz (26.7 mbs), no inter-frame gaps
 
For those of us with case-sensitive OS, you need to change include in flexSPI.cpp to FlexSPI.h

I renamed some of the files so that they all hopefully start with the name Flex..., where a few were flex... And hopefully updated all of the #includes to use the correct file names...

Let me know if I missed some. I will try to update my MAC later to try it out. But was sort of waiting until next beta release.
 
@KurtE and others
I am driving myself crazy again - so what else is new. I am trying to convert the Encoder sketch I put together into a library. I have been using your SPI and Flex lib and reading about register addressing in C++ and getting confused (never did this so something new - again). So far I created a structure for the encoder offsets:
Code:
typedef struct {
  volatile uint16_t CTRL;
  volatile uint16_t FILT; 
…… and the rest...
} IMXRT_ENC_t;

then I found that you defined the base addresses using:
Code:
#define IMXRT_FLEXIO1_S		(*(IMXRT_FLEXIO_t *)0x401AC000)
#define IMXRT_FLEXIO2_S		(*(IMXRT_FLEXIO_t *)0x401B0000)
For the encoder I would need to create 4 defines using my struct. Then you have defined a function:
Code:
IMXRT_FLEXIO_t & port() { return *(IMXRT_FLEXIO_t *)port_addr; }
Where I am stuck is that I can't seem to find where this is defined:
Code:
port_addr
I have seen this all over the place and I get it up to that last part. Can you all explain how this works.

After this you can use port().CTRL as an example. I did find a decent reference on techniques for register access (https://accu.org/index.php/journals/281) and I think the structure example (Listing 5) is what you are all going for here.
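
For reference, here is a minimal desktop sketch of the pattern being asked about. It assumes `port_addr` is simply an integer member holding the peripheral base address, set when the handler object is constructed. That is a guess at the pattern, not a quote of the library, and `DemoRegs`, `DemoHandler` and the fake register block are hypothetical names made up for this example.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical two-register peripheral layout, for illustration only.
struct DemoRegs {
    volatile uint16_t CTRL;
    volatile uint16_t FILT;
};
static_assert(sizeof(DemoRegs) == 4, "registers must be packed back to back");

// Hypothetical handler: keeps the base address as an integer and converts
// it back to a register reference on demand.
class DemoHandler {
public:
    explicit DemoHandler(uintptr_t addr) : port_addr(addr) { assert(addr != 0); }
    DemoRegs &port() { return *reinterpret_cast<DemoRegs *>(port_addr); }
private:
    uintptr_t port_addr;
};

// On hardware the address would be a fixed peripheral base. Here ordinary
// memory stands in so the sketch can run on a desktop.
static DemoRegs g_fakeEnc1 = {0, 0};
static DemoHandler g_enc1(reinterpret_cast<uintptr_t>(&g_fakeEnc1));
```

Because `port()` returns a reference, registers are reached with a dot, e.g. `g_enc1.port().CTRL = 0x1234;`. Four encoder instances would then just be four handler objects, each constructed with a different base address.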

EDIT: Been kind of distracted last few days taking care of wife with her knee - she is fine now and trying to get focused again. Hate cooking and cleaning.
 