Teensy 4.1 How to start using FlexIO?

miciwan

Hi everyone,

I'm slowly working through different parts of the documentation to build enough understanding of the i.MX RT1060 to use it in one of my projects. Two important components will be an LCD screen and a camera sensor, and my aim is to have them both running at the highest possible frame rates, to provide a good user experience. For that, I would like to use the parallel interfaces of both the sensor and the screen, and I was looking for ways to drive them efficiently on the microcontroller side.

I initially looked at just using DMA. While digging through the docs, I kept writing up my thought process, and I posted it here: https://forum.pjrc.com/threads/6335...tart-using-DMA?p=266991&viewfull=1#post266991. It seemed, however, that with DMA alone on the GPIO I simply won't be able to get the speeds I would like (the camera sensor I have in mind can run at 75MHz; DMA seems to be losing data when going faster than 10MHz), so I started looking into some other options.

(Just for reference: the docs are here: https://www.pjrc.com/teensy/IMXRT1060RM_rev2.pdf, KurtE's pinout diagrams are here: https://github.com/KurtE/TeensyDocuments/blob/master/Teensy4.1 Pins.pdf and here: https://github.com/KurtE/TeensyDocuments/blob/master/Teensy4.1 Pins Mux.pdf)

After skimming through the docs, the CMOS sensor interface (Chapter 34 in the docs) and the Enhanced LCD interface (Chapter 35) seemed like options. The first one has two problems: first, my sensor outputs raw 12-bit data, but the CMOS interface only supports raw data up to 10 bits wide. Second, only 8 of its data lines are exposed as Teensy pins. These are actually the top 8 bits of the 10-bit bus supported by that interface, but I don't want to lose 4 bits of precision, so it's a no-go for me. The LCD interface has even fewer pins exposed (only 5 iirc), which makes it pretty much unusable. Oh well, back to the drawing board, or rather the docs.

If you scroll further down the table of contents, however, there's a section on FlexIO, and it seemed *exactly* like something I could use. Since there is fairly little information on it available (I think the only bigger project is KurtE's SPI code on GitHub), I thought I'd do the same thing as I did with DMA and document how I approached my problem, step by step, with references to the docs, so if you're unfamiliar with the system, just like I was, you can follow along and adapt it to your particular use case. Here it is.

Side comment: interestingly, once you get a general intuition for the microcontroller-docs lingo, the FlexIO documentation is actually not that hard to parse. It's all there in Chapter 50. While it's not as nicely packaged as the Raspberry Pi Pico PIO (which I think is absolutely amazing conceptually: moving from this convoluted, fixed-function setup to a dedicated interpreter for driving IO reminds me of how GPUs moved from fixed-function register combiners to assembly shaders back in the early 2000s), it doesn't seem to be any less expressive.

How to start using FlexIO

(I highly suggest going through the DMA example first, as it's simpler and introduces a lot of these concepts, which are not really complicated but take a while to get used to if you don't have a lot of experience with microcontrollers (like me).)

Under the hood, FlexIO is a collection of shifters and timers. The Teensy 4 version has 8 shifters and 8 timers; the numbers are actually reported in the PARAM register (page 2914). One caveat here: the docs only describe 4 shifters in the registers section (and only 4 shifters are exposed as defines in the Teensy core header files), while the textual description correctly talks about 8 shifters. You will need to add a couple of extra #defines if you want to access shifters 4-7. As usual, all the configuration is done through a number of memory-mapped registers; the table listing them is on page 2912.
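Since the counts live in PARAM, it's easy to check them at runtime. Below is a minimal sketch that decodes the register value. The field layout is my reading of the PARAM description (worth double-checking on page 2914), decodeParam is a made-up helper name, and I'm assuming the core defines FLEXIO2_PARAM:

```cpp
#include <cstdint>

// Field layout of the FlexIO PARAM register, as I read the reference manual:
// SHIFTER = bits 7:0, TIMER = bits 15:8, PIN = bits 23:16, TRIGGER = bits 31:24.
struct FlexIOParams {
  uint32_t shifters;
  uint32_t timers;
  uint32_t pins;
  uint32_t triggers;
};

// Hypothetical helper: split a raw PARAM value into its four counts.
FlexIOParams decodeParam(uint32_t param) {
  FlexIOParams p;
  p.shifters = param & 0xFF;
  p.timers   = (param >> 8) & 0xFF;
  p.pins     = (param >> 16) & 0xFF;
  p.triggers = (param >> 24) & 0xFF;
  return p;
}

// On the Teensy, after the FlexIO2 clock gate has been enabled
// (reading a clock-gated peripheral is asking for trouble):
//   FlexIOParams p = decodeParam(FLEXIO2_PARAM);
//   Serial.println(p.shifters);
```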

The shifters are 32-bit registers that can, well, shift the data. When shifting, the input appears in the top bits, everything already in the shifter shifts right, and the lowest bits are output by the shifter (nice diagram on page 2885). So if you configure the shifter in receive mode, it will shift the incoming data into the top bits. If you set it up in transmit mode, it will spit out the low bits. The cool feature here is that the data can be sourced from, or spit out to, either the pins or a neighboring shifter. This allows you to create a chain of shifters, each one outputting to the next, with only the last one actually driving the pins.

Why would you do something like that? For buffering. This way you can buffer the incoming/outgoing data and the CPU only has to take care of it every now and then, instead of every time a new piece of data appears (as was the case with the DMA approach), so exactly what we wanted. You might, however, wonder what happens to the data between the point in time when the shifters are already full and the moment the CPU picks the data up (or between the point when all the previous data has been shifted out and the CPU provides new data). There's actually one more crucial element here: the shifter buffer. Every now and then (it's for you to decide!) the data is moved from the shifter to the shifter buffer (when receiving; or from the shifter buffer to the shifter when transmitting) and the CPU gets notified (in the form of an interrupt request, a DMA request, or you can also manually check the status bit in a register). The new data can be fetched from the shifter buffer while the shifters continue their job and receive more data. After the data is read from the shifter buffer, the notification is automatically cleared and the shifters can put new data there when it's ready.

In short:
- data comes from the external pins into the shifter
- after a configurable number of shifts it gets transferred to the shifter buffer and the CPU gets notified
- the CPU reads the data from the shifter buffer
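To make the shifter / shifter buffer interplay concrete, here's a tiny host-side model of a single-bit receive shifter. It's a simplification for illustration only: RxShifterModel and its members are made-up names, and the real timing details are ignored. It does show where the received bits end up:

```cpp
#include <cstdint>

// Host-side model (not register-accurate) of one FlexIO receive shifter in
// 1-bit mode, plus its shifter buffer and status flag.
struct RxShifterModel {
  uint32_t shifter = 0;      // the 32-bit shift register
  uint32_t buffer = 0;       // plays the role of SHIFTBUFa
  bool statusFlag = false;   // set whenever the buffer receives new data
  int shiftsPerTransfer;     // in real hardware this comes from the timer setup
  int shiftCount = 0;

  explicit RxShifterModel(int shifts) : shiftsPerTransfer(shifts) {}

  // One shift clock: the new bit enters at bit 31, everything moves right.
  void shiftIn(uint32_t bit) {
    shifter = (shifter >> 1) | ((bit & 1u) << 31);
    if (++shiftCount == shiftsPerTransfer) {
      shiftCount = 0;
      buffer = shifter;      // shifter -> shifter buffer
      statusFlag = true;     // CPU/DMA gets notified
    }
  }

  // Reading the buffer clears the flag, like reading the real SHIFTBUF.
  uint32_t read() {
    statusFlag = false;
    return buffer;
  }
};
```

With 32 shifts per transfer, the oldest bit ends up in bit 0. With fewer shifts per transfer (say 16), the new bits only fill the top of the register, which matters later when decoding.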

The shifter buffers are available as the SHIFTBUFa registers. They are contiguous in memory, which is useful when filling them, and they also come in a bunch of versions with swapped bits, nibbles or bytes. Pages 2929+ describe all the variations.

All of the shifters can shift one bit at a time in from, or out to, the external pins. This doesn't mean that you can only implement serial protocols: anything parallel just needs the data to be transposed first and transmitted through some number of pins at the same time. Additionally, shifters 0 and 4 can shift data OUT to multiple pins at a time, and shifters 3 and 7 can shift data IN from multiple pins at a time. For example, if the shifter is set up for 4-bit parallel output, instead of shifting by 1 bit it shifts by 4, and the lowest 4 bits are put on the output pins. As usual, there are caveats: the pins used for parallel input/output must be contiguous, so if you want 8-bit parallel output, you need 8 consecutive FlexIO pins. To save you the trouble of looking it up: out of the three FlexIO modules, only the third one (FlexIO3) has more than 5 consecutive pins exposed on the Teensy (it actually has 20: FlexIO3:0 - FlexIO3:19). The problem is that FlexIO3 is nerfed and cannot generate DMA requests, which makes it more of a pain in the butt to use (you either have to busy-wait, polling the status and filling/reading the shifter buffers as necessary, or use interrupts for that; but at any decent data rate you'd pretty much just sit in the interrupt handler, so you might just as well do the busy polling). FlexIO2 has 2 runs of 4 consecutive pins (0-3, 16-19). FlexIO1 has a run of 5 consecutive pins (4-8), but it's only as good as 4, because FlexIO can only do parallel transfers in power-of-two widths. An important thing to note is that even though only certain shifters support parallel transfers to/from the pins, ALL of them support parallel transfers from/to neighboring shifters.
So you can set up shifter 0 to shift the data out 4 bits at a time to the pins, shifter 1 to shift 4 bits at a time into shifter 0, shifter 2 into shifter 1 and so on, creating a long chain of shifters and effectively getting an output queue 8 x 32 bits = 32 bytes long. The number of bits shifted by each shifter is controlled by the PWIDTH field in the shifter configuration register SHIFTCFGa, page 2927.
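A rough host-side model of such a transmit chain (4 bits per shift, shifter 0 driving the pins, shifter k+1 feeding shifter k) may help visualize the queue. It only models data movement, not timing, and TxChainModel is a made-up name:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Simplified model of a chain of transmit shifters, 4 bits per shift.
// s[0] drives the pins; the low nibble of s[k + 1] falls into the top of s[k].
struct TxChainModel {
  std::vector<uint32_t> s;

  explicit TxChainModel(const std::vector<uint32_t>& loaded) : s(loaded) {}

  // One shift clock; returns the nibble that appears on the 4 output pins.
  uint32_t shift() {
    uint32_t out = s[0] & 0xF;
    // Walk upwards so s[k + 1] is still the old value when s[k] reads it.
    for (std::size_t k = 0; k + 1 < s.size(); ++k)
      s[k] = (s[k] >> 4) | ((s[k + 1] & 0xF) << 28);
    s.back() >>= 4;  // nothing feeds the last shifter in this model
    return out;
  }
};
```

Loading all 8 shifters gives 8 x 32 bits = 64 nibbles, i.e. 64 shift clocks before the queue runs dry.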


There are two more pieces to the puzzle: when to shift, and when to transfer from the shifter to the shifter buffer (or from the shifter buffer to the shifter, when transmitting). Both are controlled by the internal timers. There are 8 timers, and each shifter can pick which timer it uses. Each timer has a corresponding counter and an output. The counter is decremented based on some signal: the FlexIO clock, a pin input or a trigger. When the counter is zero and decrements again, the timer output toggles. The shifter clock can be either that timer output, or alternatively the pin or trigger input (TIMDEC setting in the TIMCFGa register, page 2934). Each shifter can choose to shift on the positive or negative edge of that shift clock (TIMPOL setting in the shifter control register SHIFTCTLa, page 2925).

To make it a bit more convoluted, there are three different modes for the timer's counter, and together with the decrement logic they define when the data is moved between the shifter and its shifter buffer. In the simplest, 16-bit mode, whenever the counter is at zero and decrements, the timer toggles its output and the data is moved from the shifter to the buffer (or vice versa). In the 8-bit (baud) mode, the counter is divided into two 8-bit parts. Whenever the lower half gets to zero and decrements, the timer output toggles, the higher half decrements and the lower half is reloaded. When the higher half gets to zero, the data is transferred between the shifter and the buffer. The third mode is the PWM mode, where again the counter is divided into two 8-bit parts. The lower eight bits count down while the timer output is high; when they get to zero, the timer output toggles to low and the high 8 bits start decrementing. When they reach zero, the output toggles again and the data is copied between the shifter and the buffer. Technically, the timer output is available as an output from the FlexIO module (section 50.2.6.1 on page 2895), but it doesn't seem to be available as an input to any of the XBARs, so I'm not sure how useful that is. The timer output can, however, be routed back into FlexIO as a *trigger* for another timer.
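Put as arithmetic, the description above boils down to how many counter decrements pass between two transfers. This is a minimal model of my reading of the three modes, not a register-accurate one: decrementsPerTransfer and TimerMode are hypothetical names, and the real timer also has start/stop bit and reset behavior that this ignores:

```cpp
#include <cstdint>

// Simplified model of the three timer modes described above.
enum class TimerMode { Dual8BitBaud, Dual8BitPwm, Single16Bit };

// Counter decrements between two shifter <-> buffer transfers for a given
// TIMCMP reload value (low byte = bits 7:0, high byte = bits 15:8).
uint32_t decrementsPerTransfer(TimerMode mode, uint16_t timcmp) {
  uint32_t lo = timcmp & 0xFF;
  uint32_t hi = (timcmp >> 8) & 0xFF;
  switch (mode) {
    case TimerMode::Single16Bit:
      return uint32_t(timcmp) + 1;   // counts TIMCMP..0, transfers on the next decrement
    case TimerMode::Dual8BitBaud:
      return (lo + 1) * (hi + 1);    // the low byte runs hi + 1 times
    case TimerMode::Dual8BitPwm:
      return (lo + 1) + (hi + 1);    // high phase followed by low phase
  }
  return 0;
}
```

For example, a 16-bit reload value of 31 and an 8-bit baud value of 0x0F01 both come out at 32 decrements per transfer in this model.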

And to make it more flexible/confusing, timers also get yet *another* independent input: a trigger. A trigger can be used as the clocking signal for the timer, but, more interestingly, also to enable or disable the timer. The TIMCTLa register (page 2932) lets us pick an internal or an external trigger (TRGSRC field). The internal ones are pin inputs, shifter status flags (indicating whether data has been moved between the shifter and its buffer) or other timers' outputs. External triggers can be hooked up through the XBAR. The trigger is selected with the TRGSEL field, and the encoding is funky, to say the least (see page 2933). For an external trigger you simply select the index (corresponding to the FLEXIOa_TRIGGER_INb outputs from the XBAR, see page 72). Then the timer configuration register TIMCFGa lets us set when to enable or disable the timer, with the TIMENA and TIMDIS fields (page 2936). This makes it possible to implement really useful functionality, where the entire send/receive process is only performed while some signal is high or low (something like ENABLE/DATA_VALID/whatever).
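The "funky" TRGSEL encoding for internal triggers can be captured in a few helpers. This is a sketch based on my reading of the table on page 2933 (4*N: pin 2*N, 4*N+1: shifter N flag, 4*N+2: pin 2*N+1, 4*N+3: timer N output); the function names are hypothetical, so double-check against the manual before relying on them:

```cpp
#include <cstdint>

// TRGSEL values when TRGSRC selects an internal trigger (assumed encoding).
// Pins: 4*N -> pin 2*N and 4*N+2 -> pin 2*N+1, which collapses to 2*pin.
uint32_t trgselPin(uint32_t pin)         { return 2 * pin; }
// Shifter N status flag.
uint32_t trgselShifter(uint32_t shifter) { return 4 * shifter + 1; }
// Timer N trigger output.
uint32_t trgselTimer(uint32_t timer)     { return 4 * timer + 3; }

// With TRGSRC cleared (external trigger), TRGSEL is simply the index of the
// FLEXIOa_TRIGGER_INb input coming from the XBAR.
uint32_t trgselExternal(uint32_t index)  { return index; }
```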

The above describes only the most basic functionality of the shifters: the transmit and receive modes. They can also run in other modes; for instance, you can define a small state machine, flipping between different shifters based on the inputs, but for now we can ignore that (if you need it, all the details are described on pages 2886 and on). There's also some extra functionality for inserting start and stop bits, resetting the timers on different occasions and some other more exotic stuff. The pretty unusual bit (in a good sense; this doesn't happen too often in this documentation...) is section 50.4, starting on page 2896, which goes over the configuration for a number of popular protocols, including UART, SPI, I2C and some others. It's a good starting point when you're looking for examples.

Fairly realistic use case

Now we'll go through implementing a high-speed, 12-bit wide parallel input interface (it can totally run at 50MHz, probably faster too, though even at 50MHz it's tricky to process the data on the fly; it's perfectly fine if you just want to store it, assuming it fits in your memory budget).

The setup will be as follows:
- there are 12 parallel data lines that will present new data every clock cycle
- there's also a clock signal
- and a DATA_VALID signal that is high when the data presented on the parallel lines is valid; the data should be ignored when it's low. The clock signal keeps running while DATA_VALID is low.

which is a pretty common way of interfacing with CMOS sensors (which is why I even started looking into any of this).

First of all, we need to get these 12 data lines through. As noted before, to use the parallel input/output functionality the pins need to be contiguous. Let's disregard FlexIO3, because it doesn't support DMA (and has some other shortcomings; for instance, we cannot route signals to it through the XBAR). FlexIO1 only has a few pins exposed as Teensy pins, not enough for us. FlexIO2 has 13 pins available, which looks *almost* enough (12 data lines plus the clock plus DATA_VALID makes 14, so we're one short), but as it turns out, that's actually enough.

The problem is that the available pins are not contiguous. There's the 0-3 range and the 16-19 range, and the remaining pins are scattered all over the place. We definitely won't be able to use the parallel input functionality, at least not for all 12 bits of data (and parallel input only does power-of-two widths anyway). There's, however, a fairly simple way to work around this; it just requires massaging the data a bit.

To transfer n bits over n serial interfaces instead of a single n-bit parallel one, we need to transpose the data first. We take bit 0 from each of 32 words and pack them into a single unsigned int. Then we do the same with bit 1 from each of these words, and so on. In the end, we get n 32-bit words (twelve in our case) representing 32 elements of the data stream, and we can output them all at the same time, one bit at a time, getting a parallel data stream without the need for parallel output. The same goes for input, just in the opposite order.
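The transposition itself is short. Here's a straightforward bit-by-bit sketch (clear rather than fast; there are much quicker bit-matrix transposes if this ends up on a hot path). toBitPlanes and fromBitPlanes are made-up names:

```cpp
#include <cstdint>

// Transposes 32 samples of `bits` bits each (bits <= 32) into `bits` words:
// bit k of planes[b] is bit b of samples[k]. planes[b] is what the shifter
// handling data line b would see after 32 shifts (oldest sample in bit 0).
void toBitPlanes(const uint32_t samples[32], int bits, uint32_t planes[]) {
  for (int b = 0; b < bits; ++b) {
    uint32_t w = 0;
    for (int k = 0; k < 32; ++k)
      w |= ((samples[k] >> b) & 1u) << k;
    planes[b] = w;
  }
}

// The inverse, used on the receive side: rebuild 32 samples from the planes.
void fromBitPlanes(const uint32_t planes[], int bits, uint32_t samples[32]) {
  for (int k = 0; k < 32; ++k) {
    uint32_t v = 0;
    for (int b = 0; b < bits; ++b)
      v |= ((planes[b] >> k) & 1u) << b;
    samples[k] = v;
  }
}
```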

The i.MX RT1060, however, has only 8 shifters, so we can only get an 8-bit wide bus this way. This is where the parallel shifters come in handy: we can transfer some data in parallel, using the parallel shifters (instead of transposing the data bit by bit, we need to do it in chunks of the parallel width), and some serially. The Teensy has these two 4-bit ranges of contiguous FlexIO2 pins, and we can use them both, which gives us 8 bits; the remaining 4 bits will be processed serially.

The setup will be as follows:
- Teensy pins 6 and 9, corresponding to pads B0_10 and B0_11, will get the first two bits, each pin processing a single bit. They are FlexIO2 pins 10 and 11.
- Teensy pins 10, 12, 11, 13, corresponding to the native pads B0_00 - B0_03, which are FlexIO2 pins 0-3, will get the next four bits, grabbing them with a 4-bit wide parallel shift.
- Teensy pins 35 and 34, corresponding to pads B1_12 and B1_13, will get the next two bits, each processing a single bit. They are FlexIO2 pins 28 and 29.
- Teensy pins 8, 7, 36, 37, corresponding to pads B1_00 - B1_03, which are FlexIO2 pins 16-19, will get four bits again, in parallel.

It's quite possible that there's a better way of choosing that setup, one that results in simpler decoding code later, but I didn't bother to investigate that avenue.

To set it up, we first have to remember to actually enable the FlexIO2 clock:

Code:
CCM_CCGR3 |= CCM_CCGR3_FLEXIO2(CCM_CCGR_ON);

Another thing is the actual clock speed for FlexIO2, which defaults to 30MHz (see the nice diagram on page 1016; the default paths are marked with dots on the multiplexers!). This is way too slow for us; we want it at maximum. According to the table on page 1031, the maximum for FlexIO2 is 120MHz, so we need to set the dividers on the path from PLL3 (480MHz) to FlexIO2 in the clocking diagram. We want an overall divider of 4, so we set CS1CDR[FLEXIO2_CLK_PODF] to 1, which makes it divide by 2; together with CS1CDR[FLEXIO2_CLK_PRED] (which also divides by 2 here) this divides the PLL3 frequency by 4.
Code:
CCM_CS1CDR &= ~( CCM_CS1CDR_FLEXIO2_CLK_PODF( 7 ) );	// clear the 3-bit PODF field
CCM_CS1CDR |= CCM_CS1CDR_FLEXIO2_CLK_PODF( 1 );		// PODF = 1 -> divide by 2
A note about that maximum: I tried clocking it even higher (twice as high :) and it worked too. I haven't thoroughly tested it, so maybe it becomes unstable at times, or heats up like crazy, or there's a limiter somewhere and it doesn't really go faster; but anyway, I tried it and it doesn't blow up the chip instantly, so if you're in need, maybe you can actually run it faster.
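The divider arithmetic is easy to sanity-check. This sketch assumes the default clock select (PLL3 at 480MHz) and uses the usual "field value + 1" divider convention; flexio2ClockHz is a hypothetical helper, and the (PRED = 1, PODF = 7) default is inferred from the 30MHz figure above:

```cpp
#include <cstdint>

// FlexIO2 root clock on the PLL3 path:
//   f = f_pll3 / (FLEXIO2_CLK_PRED + 1) / (FLEXIO2_CLK_PODF + 1)
// Assumes PLL3 runs at 480 MHz and is the selected source.
constexpr uint32_t kPll3Hz = 480000000;

constexpr uint32_t flexio2ClockHz(uint32_t pred, uint32_t podf) {
  return kPll3Hz / (pred + 1) / (podf + 1);
}

static_assert(flexio2ClockHz(1, 7) == 30000000, "default: 480 / 2 / 8 = 30 MHz");
static_assert(flexio2ClockHz(1, 1) == 120000000, "PODF = 1: 480 / 2 / 2 = 120 MHz");
```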

Now we need to enable FlexIO2 itself. Besides the clock gate, the module has its own enable bit (FLEXEN, bit 0 of the CTRL register), so let's flip it on:


Code:
FLEXIO2_CTRL |= 1;	// FLEXEN: enable the FlexIO2 module

Now that we have the FlexIO2 up and running, we can start setting up the shifters.
First, let's set the external pads to actually use the FlexIO2 mode (see KurtE's diagrams linked earlier; if you want to find it in the docs, the registers are, for instance, on page 508, and the table with all the modes for each pad is on page 293):

Code:
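IOMUXC_SW_MUX_CTL_PAD_GPIO_B0_10 = 4;		// ALT4: FlexIO2 pin 10 (Teensy pin 6)
IOMUXC_SW_MUX_CTL_PAD_GPIO_B0_11 = 4;		// ALT4: FlexIO2 pin 11 (Teensy pin 9)

IOMUXC_SW_MUX_CTL_PAD_GPIO_B0_00 = 4;		// ALT4: FlexIO2 pin 0 (Teensy pin 10)
IOMUXC_SW_MUX_CTL_PAD_GPIO_B0_01 = 4;		// ALT4: FlexIO2 pin 1 (Teensy pin 12)
IOMUXC_SW_MUX_CTL_PAD_GPIO_B0_02 = 4;		// ALT4: FlexIO2 pin 2 (Teensy pin 11)
IOMUXC_SW_MUX_CTL_PAD_GPIO_B0_03 = 4;		// ALT4: FlexIO2 pin 3 (Teensy pin 13)

IOMUXC_SW_MUX_CTL_PAD_GPIO_B1_12 = 4;		// ALT4: FlexIO2 pin 28 (Teensy pin 35)
IOMUXC_SW_MUX_CTL_PAD_GPIO_B1_13 = 4;		// ALT4: FlexIO2 pin 29 (Teensy pin 34)

IOMUXC_SW_MUX_CTL_PAD_GPIO_B1_00 = 4;		// ALT4: FlexIO2 pin 16 (Teensy pin 8)
IOMUXC_SW_MUX_CTL_PAD_GPIO_B1_01 = 4;		// ALT4: FlexIO2 pin 17 (Teensy pin 7)
IOMUXC_SW_MUX_CTL_PAD_GPIO_B1_02 = 4;		// ALT4: FlexIO2 pin 18 (Teensy pin 36)
IOMUXC_SW_MUX_CTL_PAD_GPIO_B1_03 = 4;		// ALT4: FlexIO2 pin 19 (Teensy pin 37)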


Setting up a shifter requires filling in two registers: SHIFTCTLa and SHIFTCFGa (pages 2925 and on). Let's start with SHIFTCTL for shifter 0. We want it to be controlled by timer 0 and to shift on the positive edge; we disable the pin output (remember, we're reading the data in!), we want it to use pin 10 (that's the FlexIO2 pin index!), we want it active high, and we want it in receive mode:

Code:
FLEXIO2_SHIFTCTL0	= FLEXIO_SHIFTCTL_TIMSEL( 0 )		|			// timer 0
			  // FLEXIO_SHIFTCTL_TIMPOL		|			// on positive edge
			  FLEXIO_SHIFTCTL_PINCFG( 0 )		|			// pin output disabled
			  FLEXIO_SHIFTCTL_PINSEL( 10 )		|			// FlexIO2 pin 10
			  // FLEXIO_SHIFTCTL_PINPOL		|			// active high
			  FLEXIO_SHIFTCTL_SMOD( 1 );					// receive mode

Next the SHIFTCFG register: we want to shift in one bit at a time, from the pin, and we don't need any start or stop bits:

Code:
FLEXIO2_SHIFTCFG0	= FLEXIO_SHIFTCFG_PWIDTH( 0 )		|			// single bit
			  // FLEXIO_SHIFTCFG_INSRC		|			// from pin
			  FLEXIO_SHIFTCFG_SSTOP( 0 )		|			// stop bit disabled
			  FLEXIO_SHIFTCFG_SSTART( 0 );					// start bit disabled

The settings are the same for shifters 1, 4 and 5, since they also receive bits one at a time. The only thing changing is the pin number:
Code:
FLEXIO2_SHIFTCTL1	= FLEXIO_SHIFTCTL_TIMSEL( 0 ) | FLEXIO_SHIFTCTL_PINCFG( 0 ) | FLEXIO_SHIFTCTL_PINSEL( 11 ) | FLEXIO_SHIFTCTL_SMOD( 1 );
FLEXIO2_SHIFTCFG1	= FLEXIO_SHIFTCFG_PWIDTH( 0 ) | FLEXIO_SHIFTCFG_SSTOP( 0 ) | FLEXIO_SHIFTCFG_SSTART( 0 );

FLEXIO2_SHIFTCTL4	= FLEXIO_SHIFTCTL_TIMSEL( 0 ) | FLEXIO_SHIFTCTL_PINCFG( 0 ) | FLEXIO_SHIFTCTL_PINSEL( 28 ) | FLEXIO_SHIFTCTL_SMOD( 1 );
FLEXIO2_SHIFTCFG4	= FLEXIO_SHIFTCFG_PWIDTH( 0 ) | FLEXIO_SHIFTCFG_SSTOP( 0 ) | FLEXIO_SHIFTCFG_SSTART( 0 );

FLEXIO2_SHIFTCTL5	= FLEXIO_SHIFTCTL_TIMSEL( 0 ) | FLEXIO_SHIFTCTL_PINCFG( 0 ) | FLEXIO_SHIFTCTL_PINSEL( 29 ) | FLEXIO_SHIFTCTL_SMOD( 1 );
FLEXIO2_SHIFTCFG5	= FLEXIO_SHIFTCFG_PWIDTH( 0 ) | FLEXIO_SHIFTCFG_SSTOP( 0 ) | FLEXIO_SHIFTCFG_SSTART( 0 );

For the parallel input, the setup is slightly trickier. So far we've used 4 shifters, and we will now use another two that have the parallel input capability. This leaves two shifters unused. We will actually use them to receive the data from the two parallel input shifters. This way, when those are full, instead of just overflowing and losing the data (or forcing the CPU/DMA to pick it up), they will shift it into the neighboring shifters, effectively allowing for longer buffering. Shifters are 32 bits wide, so if we're shifting in 4 bits at a time, we fill one in 8 cycles. Adding these extra shifters as additional buffers gives us 16 cycles between transfers, giving us some more headroom. There's a slight imbalance here, as the single-bit shifters can buffer 32 cycles (because they are 32 bits wide), while the parallel ones, even with that extra backup, can only do 16 cycles; but oh well, shit happens, we need to live with that.

Shifter 3 is the parallel input one. For the pin, we just select the first one of the range (FlexIO2 pin 0), and for the width we set PWIDTH to 3 (any value from 1 to 3 means 4-bit shifts). Everything else stays the same:

Code:
FLEXIO2_SHIFTCTL3	= FLEXIO_SHIFTCTL_TIMSEL( 0 ) | FLEXIO_SHIFTCTL_PINCFG( 0 ) | FLEXIO_SHIFTCTL_PINSEL( 0 ) | FLEXIO_SHIFTCTL_SMOD( 1 );
FLEXIO2_SHIFTCFG3	= FLEXIO_SHIFTCFG_PWIDTH( 3 ) | FLEXIO_SHIFTCFG_SSTOP( 0 ) | FLEXIO_SHIFTCFG_SSTART( 0 );
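The PWIDTH mapping and the headroom numbers from the previous paragraph can be written down as a quick sketch. The 1/4/8/16/32 grouping is my reading of the SHIFTCFGa description (consistent with "1 to 3 means 4 bits"); bitsPerShift and shiftsToFill are made-up helper names:

```cpp
#include <cstdint>

// Bits moved per shift clock for a given PWIDTH value (assumed grouping):
// 0 -> 1 bit, 1-3 -> 4 bits, 4-7 -> 8 bits, 8-15 -> 16 bits, 16-31 -> 32 bits.
uint32_t bitsPerShift(uint32_t pwidth) {
  if (pwidth == 0) return 1;
  if (pwidth <= 3) return 4;
  if (pwidth <= 7) return 8;
  if (pwidth <= 15) return 16;
  return 32;
}

// Shift clocks needed to fill a chain of `shifters` 32-bit shifters.
uint32_t shiftsToFill(uint32_t pwidth, uint32_t shifters) {
  return 32 * shifters / bitsPerShift(pwidth);
}
```

shiftsToFill(3, 1) gives the 8 cycles of a lone 4-bit shifter, shiftsToFill(3, 2) the 16 cycles of the chained pair, and shiftsToFill(0, 1) the 32 cycles of a single-bit shifter.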

Shifter 2 is configured to grab the data falling out of shifter 3. SHIFTCTL is set up the same way (the pin number doesn't matter here), and in SHIFTCFG we set the INSRC bit, indicating that we're taking data from shifter N+1 (so 3 here) instead of the pin:

Code:
FLEXIO2_SHIFTCTL2	= FLEXIO_SHIFTCTL_TIMSEL( 0 ) | FLEXIO_SHIFTCTL_PINCFG( 0 ) | FLEXIO_SHIFTCTL_PINSEL( 0 ) | FLEXIO_SHIFTCTL_SMOD( 1 );
FLEXIO2_SHIFTCFG2	= FLEXIO_SHIFTCFG_PWIDTH( 3 ) | FLEXIO_SHIFTCFG_INSRC | FLEXIO_SHIFTCFG_SSTOP( 0 ) | FLEXIO_SHIFTCFG_SSTART( 0 );

Shifters 6 and 7 are set up in the same way:
Code:
FLEXIO2_SHIFTCTL6	= FLEXIO_SHIFTCTL_TIMSEL( 0 ) | FLEXIO_SHIFTCTL_PINCFG( 0 ) | FLEXIO_SHIFTCTL_PINSEL( 0 ) | FLEXIO_SHIFTCTL_SMOD( 1 );
FLEXIO2_SHIFTCFG6	= FLEXIO_SHIFTCFG_PWIDTH( 3 ) | FLEXIO_SHIFTCFG_INSRC | FLEXIO_SHIFTCFG_SSTOP( 0 ) | FLEXIO_SHIFTCFG_SSTART( 0 );

FLEXIO2_SHIFTCTL7	= FLEXIO_SHIFTCTL_TIMSEL( 0 ) | FLEXIO_SHIFTCTL_PINCFG( 0 ) | FLEXIO_SHIFTCTL_PINSEL( 16 ) | FLEXIO_SHIFTCTL_SMOD( 1 );
FLEXIO2_SHIFTCFG7	= FLEXIO_SHIFTCFG_PWIDTH( 3 ) | FLEXIO_SHIFTCFG_SSTOP( 0 ) | FLEXIO_SHIFTCFG_SSTART( 0 );

Here's the small caveat from earlier: the docs and the header files mention only 4 shifters, so we need to add the following macros:
Code:
#define FLEXIO2_SHIFTCTL4		(IMXRT_FLEXIO2.offset090)
#define FLEXIO2_SHIFTCTL5		(IMXRT_FLEXIO2.offset094)
#define FLEXIO2_SHIFTCTL6		(IMXRT_FLEXIO2.offset098)
#define FLEXIO2_SHIFTCTL7		(IMXRT_FLEXIO2.offset09C)
#define FLEXIO2_SHIFTCFG4		(IMXRT_FLEXIO2.offset110)
#define FLEXIO2_SHIFTCFG5		(IMXRT_FLEXIO2.offset114)
#define FLEXIO2_SHIFTCFG6		(IMXRT_FLEXIO2.offset118)
#define FLEXIO2_SHIFTCFG7		(IMXRT_FLEXIO2.offset11C)
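Those offsets follow a simple pattern: each per-shifter register array starts at a fixed base with a 4-byte stride. Here is a sketch that checks the macros against that pattern. The bases (SHIFTCTL at 0x080, SHIFTCFG at 0x100, SHIFTBUF at 0x200) are my reading of the register table on page 2912, and the helper names are hypothetical:

```cpp
#include <cstdint>

// Byte offsets of the per-shifter FlexIO registers (assumed bases, 4-byte stride).
constexpr uint32_t shiftctlOffset(uint32_t n) { return 0x080 + 4 * n; }
constexpr uint32_t shiftcfgOffset(uint32_t n) { return 0x100 + 4 * n; }
constexpr uint32_t shiftbufOffset(uint32_t n) { return 0x200 + 4 * n; }

// These match the offsetXXX members used in the #defines above.
static_assert(shiftctlOffset(4) == 0x090 && shiftctlOffset(7) == 0x09C, "SHIFTCTL4-7");
static_assert(shiftcfgOffset(4) == 0x110 && shiftcfgOffset(7) == 0x11C, "SHIFTCFG4-7");
```

The same pattern puts SHIFTBUF0..7 in one contiguous 32-byte block (0x200-0x21C), which is what lets a single DMA request read all eight buffers in one go.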


Now it's time for the clock signal. We still have one FlexIO2 pin left, so let's use it: it's Teensy pin 32, i.e. pad B0_12, FlexIO2 pin 12. Set it to FlexIO2 mode first:

Code:
IOMUXC_SW_MUX_CTL_PAD_GPIO_B0_12 = 4;

We also need to take care of the DATA_VALID line, which should enable/disable the whole clocking. But we're out of FlexIO2 pins. Fortunately, signals can be routed into FlexIO2 as triggers through the XBAR. This is exactly the same as in the DMA case (including enabling the XBAR1 clock with CCM_CCGR2 |= CCM_CCGR2_XBAR1(CCM_CCGR_ON), if you haven't done that already). Let's take Teensy pin 4 again (pad EMC_06), like before.

Set it to XBAR mode:

Code:
IOMUXC_SW_MUX_CTL_PAD_GPIO_EMC_06 = 3;

Set it to be XBAR input:

Code:
IOMUXC_GPR_GPR6 &= ~(IOMUXC_GPR_GPR6_IOMUXC_XBAR_DIR_SEL_8);

Set the daisy chain (input select) to pick EMC_06:

Code:
IOMUXC_XBAR1_IN08_SELECT_INPUT = 0;

And connect it to FlexIO2 as trigger 0 (xbar_connect() is the same helper as in the DMA post; it's also included in the full code below):

Code:
xbar_connect( XBARA1_IN_IOMUX_XBAR_INOUT08, XBARA1_OUT_FLEXIO2_TRIGGER_IN0 );

Now the timer setup, first the TIMCTLa register (page 2932). We want it to use trigger 0 (TRGSEL = 0), which is active high (TRGPOL not set) and external (TRGSRC not set); we disable the timer pin output (the clock comes from outside, we're not generating it, so PINCFG = 0), and we set the timer pin to pin 12 (FlexIO2 pin again!). We want the timer pin to be active high (PINPOL not set), and we want the 16-bit counter mode (TIMOD = 3).

Code:
FLEXIO2_TIMCTL0		= FLEXIO_TIMCTL_TRGSEL( 0 )		|			// src trigger 0
			  // FLEXIO_TIMCTL_TRGPOL		|			// trigger active high
			  // FLEXIO_TIMCTL_TRGSRC		|			// external trigger
			  FLEXIO_TIMCTL_PINCFG( 0 )		|			// timer pin output disabled
			  FLEXIO_TIMCTL_PINSEL( 12 )		|			// timer pin 12
			  // FLEXIO_TIMCTL_PINPOL		|			// timer pin active high
			  FLEXIO_TIMCTL_TIMOD( 3 );					// timer mode 16bit

Next, the TIMCFG register (page 2934). We don't really care about the timer output signal, so we just leave TIMOUT = 0 (output is logic one when enabled and not affected by reset; whatever). Now the really important bit: the timer decrement mode. We set it to 2, to decrement on pin input, on every edge, and to make the shift clock equal to the pin input (in the shifter config we set it to shift on the positive edge of that signal). The clock signal supplied to pin 12 will cause our 16-bit counter to decrement on every edge, both rising and falling. We want the timer to be disabled on a falling edge of the trigger (TIMDIS = 6) and enabled again on a rising edge of the trigger (TIMENA = 6); this way, the timer only runs while the trigger signal (our DATA_VALID) is high. We could reset the timer to its reload value when the trigger goes high (TIMRST = 6), but we can just as well not do that (TIMRST = 0). We don't need any extra start or stop bit handling (TSTOP = 0, TSTART not set):

Code:
FLEXIO2_TIMCFG0		= FLEXIO_TIMCFG_TIMOUT( 0 )		|			// timer output logic one when enabled, not affected by reset
			  FLEXIO_TIMCFG_TIMDEC( 2 )		|			// decrement on pin input (both edges), shift clock = pin input
			  FLEXIO_TIMCFG_TIMRST( 0 )		|			// don't reset timer
			  FLEXIO_TIMCFG_TIMDIS( 6 )		|			// disable timer on trigger falling edge
			  FLEXIO_TIMCFG_TIMENA( 6 )		|			// enable timer on trigger rising edge
			  FLEXIO_TIMCFG_TSTOP( 0 )		;			// stop bit disabled
			  //FLEXIO_TIMCFG_TSTART					// start bit disabled

We want to shift 16 times (to completely fill the chained parallel-input shifters) and then transfer the data from the shifters to the shifter buffers, so the reload value for the timer needs to be 31 (remember, it's decremented on *both* edges, and it does the transfer when it's at zero and tries to decrement):

Code:
FLEXIO2_TIMCMP0		= 31;	// move from shift to shiftbuf every 32 timer ticks (so 16 shift clock cycles)
We could also use the 8-bit counter mode, setting the low byte of the reload value to 1 (so two edges per shift) and the high byte to 15 (so 16 shifts until the transfer to the shifter buffers), or at least I think so; I haven't tried it.
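Going the other way, from "transfer every N shift clocks" to a TIMCMP value, is a one-liner for each mode when the counter decrements on both pin edges (TIMDEC = 2). The helpers are hypothetical, and the 8-bit variant inherits the "I haven't tried it" caveat from the paragraph above:

```cpp
#include <cstdint>

// TIMCMP for "transfer every `shifts` clock cycles" with two decrements per
// cycle (both edges). 16-bit mode: decrements per transfer = TIMCMP + 1.
constexpr uint16_t timcmp16BitBothEdges(uint16_t shifts) {
  return uint16_t(2 * shifts - 1);
}

// 8-bit variant (untested on hardware): low byte = 1 (two edges per shift),
// high byte = shifts - 1.
constexpr uint16_t timcmp8BitBothEdges(uint16_t shifts) {
  return uint16_t(((shifts - 1) << 8) | 1);
}

static_assert(timcmp16BitBothEdges(16) == 31, "value used for FLEXIO2_TIMCMP0");
static_assert(timcmp8BitBothEdges(16) == 0x0F01, "high byte 15, low byte 1");
```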

And that's pretty much it for the FlexIO setup. We still need to grab the received data from the shifter buffers and copy it into some buffer. And since we're already DMA champions, there's no better way than to use DMA. Because the setup is slightly trickier this time, we'll do most of it by hand, directly in the registers (actually... it might be possible to do all of that with just the available DMAChannel interface, but I was experimenting with all this a bit, had to touch the internals by hand, and it kind of stayed that way...)

Each DMA request should transfer 32 bytes in total (8 shifter buffers, each 32 bits, i.e. 4 bytes wide); this is the minor loop size, NBYTES:

Code:
dmaChannel.TCD->NBYTES		= 8 * 4;

The start address for the transfer is, of course, the address of the 0th shifter buffer register, and each read moves forward by 4 bytes:

Code:
dmaChannel.TCD->SADDR		= &FLEXIO2_SHIFTBUF0;
dmaChannel.TCD->SOFF		= 4;

We want 32-bit reads, and we want the source address to be computed modulo 32, so after reading 32 bytes the address effectively resets to the beginning. Note that SLAST alone wouldn't do the trick here: it's only applied once the whole major loop completes, so without the modulo, the reads would run past SHIFTBUF7 after the very first request. With the modulo in place, SLAST can stay 0:

Code:
dmaChannel.TCD->ATTR_SRC	= ( 5 << 3 ) | 2;								// 32 bit reads + 2^5 modulo
dmaChannel.TCD->SLAST		= 0;
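Here's a simplified model of what the modulo setting does to the source address. With SMOD = 5, only the low 5 address bits are allowed to change, so the address wraps inside a 32-byte window; this relies on SHIFTBUF0 sitting on a 32-byte boundary. nextSourceAddress is a hypothetical name and the base address in the usage note is made up:

```cpp
#include <cstdint>

// Next source address after one read, eDMA-modulo style: only the low
// `smod` bits change, the upper bits stay fixed (smod must be < 32).
uint32_t nextSourceAddress(uint32_t addr, int32_t soff, uint32_t smod) {
  uint32_t mask = (1u << smod) - 1;            // bits allowed to change
  return (addr & ~mask) | ((addr + soff) & mask);
}
```

Starting from a 32-byte aligned base, seven steps of SOFF = 4 reach base + 0x1C (SHIFTBUF7), and the eighth step lands back on the base (SHIFTBUF0), ready for the next request.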

For the destination, we just set our target buffer, also doing 4-byte writes, incrementing the address by 4 every time and rewinding to the beginning after the entire major loop is done:

Code:
dmaChannel.TCD->DADDR		= dmaBuffer;
dmaChannel.TCD->DOFF		= 4;
dmaChannel.TCD->ATTR_DST	= 2;											// 32 bit writes
dmaChannel.TCD->DLASTSGA	= -DMABUFFER_SIZE * 4;

We set the major loop count to DMABUFFER_SIZE / 8 (DMABUFFER_SIZE is in 32-bit words, and we copy 8 of them per request):

Code:
dmaChannel.TCD->BITER		= DMABUFFER_SIZE / 8;
dmaChannel.TCD->CITER		= DMABUFFER_SIZE / 8;
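A few back-of-the-envelope numbers for this setup, written as compile-time constants. DMABUFFER_SIZE = 4096 comes from the full code; the clock values in the usage note are just examples, the names are made up, and the timing assumes DATA_VALID stays high (while it's low, the timer is disabled and the buffer fills more slowly):

```cpp
#include <cstdint>

constexpr uint32_t kDmaBufferWords    = 4096;                   // DMABUFFER_SIZE
constexpr uint32_t kWordsPerRequest   = 8;                      // SHIFTBUF0..7
constexpr uint32_t kBytesPerRequest   = kWordsPerRequest * 4;   // NBYTES
constexpr uint32_t kSamplesPerRequest = 16;                     // TIMCMP0 = 31 -> 16 clocks
constexpr uint32_t kMajorLoops        = kDmaBufferWords / kWordsPerRequest;  // BITER/CITER
constexpr uint32_t kSamplesPerHalf    = kMajorLoops / 2 * kSamplesPerRequest;

// Time available to process one half while the other one fills, in ns,
// for a given input clock on FlexIO2 pin 12.
constexpr uint32_t halfBufferTimeNs(uint32_t inputClockHz) {
  return uint32_t(uint64_t(kSamplesPerHalf) * 1000000000ull / inputClockHz);
}

static_assert(kBytesPerRequest == 32, "");
static_assert(kMajorLoops == 512, "");
static_assert(kSamplesPerHalf == 4096, "");
```

At a 50MHz input clock that's about 82 microseconds per half buffer, which makes it easier to see why decoding on the fly gets tight at that speed.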

We also don't want the DMA to be disabled after it finishes; we just want it to keep going. But we want an interrupt when the transfer is half done and when it's fully done, effectively splitting the target buffer in half for a double-buffering scheme: processing one half while the other is being filled:

Code:
dmaChannel.TCD->CSR	&= ~(DMA_TCD_CSR_DREQ);				// do not disable the channel after it completes - so it just keeps going 
dmaChannel.TCD->CSR	|= DMA_TCD_CSR_INTMAJOR | DMA_TCD_CSR_INTHALF;	// interrupt at completion and at half completion

We now attach the interrupt handler and make the DMA triggered by request 0 from FlexIO2:

Code:
dmaChannel.attachInterrupt( inputDMAInterrupt );
dmaChannel.triggerAtHardwareEvent( DMAMUX_SOURCE_FLEXIO2_REQUEST0 );

The final bit is actually enabling that DMA request in FlexIO. We want it on the shifter status flag, which is set when data is transferred from the shifter to the shifter buffer (SHIFTSDEN register, page 2923). Since all the shifters work in sync, setting one bit is enough:

Code:
FLEXIO2_SHIFTSDEN |= 1 << 0;

We can now enable the DMA and the system is ready to receive the data:

Code:
dmaChannel.enable();

Of course, we need to remember that the received data has all the bits intertwined, so we need to shuffle them around to get the actual values!
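Here's a self-contained version of that shuffle for a single DMA request, using the same bit assignment as the full code below (bits 0-3 from pins 0-3, bits 4-7 from pins 16-19, bits 8-11 from pins 10, 11, 28, 29). decodeTransfer is a hypothetical helper name, and it assumes DATA_VALID stayed high for the whole transfer, so all 16 samples are valid:

```cpp
#include <cstdint>

// Decodes one transfer (SHIFTBUF0..7 as copied by the DMA, i.e. 16 clock
// cycles) into 16 12-bit samples. Sample 0 is the oldest one.
//   buf[2], buf[3]: pins 0-3   (shifter 3 chained into shifter 2)
//   buf[6], buf[7]: pins 16-19 (shifter 7 chained into shifter 6)
//   buf[0], buf[1], buf[4], buf[5]: pins 10, 11, 28, 29 (single-bit shifters)
void decodeTransfer(const uint32_t buf[8], uint16_t out[16]) {
  for (int i = 0; i < 16; ++i) {
    // 4-bit chains: the 8 oldest nibbles have moved on to the backup shifter.
    uint32_t lo4 = ((i < 8 ? buf[2] : buf[3]) >> ((i & 7) * 4)) & 0xF;
    uint32_t hi4 = ((i < 8 ? buf[6] : buf[7]) >> ((i & 7) * 4)) & 0xF;
    // Single-bit shifters: after only 16 shifts the samples sit in bits 16-31.
    uint32_t b8  = (buf[0] >> (16 + i)) & 1;
    uint32_t b9  = (buf[1] >> (16 + i)) & 1;
    uint32_t b10 = (buf[4] >> (16 + i)) & 1;
    uint32_t b11 = (buf[5] >> (16 + i)) & 1;
    out[i] = uint16_t(lo4 | (hi4 << 4) | (b8 << 8) | (b9 << 9) | (b10 << 10) | (b11 << 11));
  }
}
```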

The entire code is below. It includes the shuffling of the bits and basic consistency checks for the received values (I've been testing it with a binary counter). It works totally fine clocked at up to 25MHz (after I hooked up enough ground wires... see here: https://forum.pjrc.com/threads/66170-Crosstalk-with-many-multiple-inputs?p=269141#post269141). At 50MHz it works too, but I'm starting to get glitches caused by insufficient grounding, and it's actually too fast to decode the incoming data on the fly. If you're just aiming to capture the data and process it afterwards, it should be fine, though (maybe you can go even faster than that).

Hope it's useful to anyone, and post a correction if you spot any errors!


Code:
#include <DMAChannel.h>

#define FLEXIO2_SHIFTCTL4		(IMXRT_FLEXIO2.offset090)
#define FLEXIO2_SHIFTCTL5		(IMXRT_FLEXIO2.offset094)
#define FLEXIO2_SHIFTCTL6		(IMXRT_FLEXIO2.offset098)
#define FLEXIO2_SHIFTCTL7		(IMXRT_FLEXIO2.offset09C)
#define FLEXIO2_SHIFTCFG4		(IMXRT_FLEXIO2.offset110)
#define FLEXIO2_SHIFTCFG5		(IMXRT_FLEXIO2.offset114)
#define FLEXIO2_SHIFTCFG6		(IMXRT_FLEXIO2.offset118)
#define FLEXIO2_SHIFTCFG7		(IMXRT_FLEXIO2.offset11C)

DMAChannel dmaChannel;

// variables written in the DMA interrupt and read in loop() are volatile
volatile unsigned long dmaStartTime;
volatile unsigned long dmaEndTime;

volatile unsigned long prevTime;
volatile unsigned long currTime;

uint32_t lastSeenHalf = 0;

#define DMABUFFER_SIZE	4096

// data written by the DMA
uint32_t dmaBuffer[DMABUFFER_SIZE];
volatile uint32_t dmaBufferHalfCount = 0;

// deinterleaved data
uint32_t processedBuffer[DMABUFFER_SIZE * 2];

// data consistency check
uint32_t prevVal = 0;
volatile bool dataCorrect = true;

void xbar_connect(unsigned int input, unsigned int output)
{
	if (input >= 88) return;
	if (output >= 132) return;

	volatile uint16_t *xbar = &XBARA1_SEL0 + (output / 2);
	uint16_t val = *xbar;
	if (!(output & 1)) {
		val = (val & 0xFF00) | input;
	} else {
		val = (val & 0x00FF) | (input << 8);
	}
	*xbar = val;
}


void inputDMAInterrupt()
{
	dataCorrect = true;

	prevTime = currTime;
	currTime = micros();  

	dmaStartTime = micros();

	uint32_t* dmaData = dmaBuffer + ( DMABUFFER_SIZE / 2 ) * ( dmaBufferHalfCount & 1 );
	uint32_t* processedData = processedBuffer + ( DMABUFFER_SIZE / 2 ) * ( dmaBufferHalfCount & 1 ) * 2;	

	// (not used below) which shift buffer carries which pins
	uint32_t inData[] = { dmaData[2],		// pins 0-3
						  dmaData[3],		
						  dmaData[6],		// pins 16-19
						  dmaData[7],
						  dmaData[0],		// pin 10
						  dmaData[1],		// pin 11
						  dmaData[4],		// pin 28
						  dmaData[5] };		// pin 29


	for( int batch=0; batch< ( DMABUFFER_SIZE / 2 ) / 8; ++batch )
	{
		for( int i=0; i<16; ++i )
		{
			uint32_t pins_00_03 = ( ( ( i < 8 ) ? dmaData[2] : dmaData[3] ) >> ( ( i & 0x07 ) * 4 ) ) & 0x0F;
			uint32_t pins_16_19 = ( ( ( i < 8 ) ? dmaData[6] : dmaData[7] ) >> ( ( i & 0x07 ) * 4 ) ) & 0x0F;
			uint32_t pin_10 = ( dmaData[0] >> ( 16 + i ) ) &  1;
			uint32_t pin_11 = ( dmaData[1] >> ( 16 + i ) ) &  1;
			uint32_t pin_28 = ( dmaData[4] >> ( 16 + i ) ) &  1;
			uint32_t pin_29 = ( dmaData[5] >> ( 16 + i ) ) &  1;
	
			uint32_t outData = ( pins_00_03		 ) |
							   ( pins_16_19 << 4 ) |
							   ( pin_10 << 8 )	   |
							   ( pin_11 << 9 )	   |
							   ( pin_28 << 10 )	   |
							   ( pin_29 << 11 );
	
			processedData[i] = outData;

			if ( ( ( prevVal + 1 ) & 4095 ) != outData )
			{
				dataCorrect = false;
			}
			prevVal = outData;
		}
	
		dmaData += 8;
		processedData += 16;
	}

	dmaEndTime = micros();

	++dmaBufferHalfCount;

	dmaChannel.clearInterrupt();	// tell system we processed it.
	asm("DSB");						// this is a memory barrier
}



void setupFlexIOInput()
{
	// FlexIO2 works at 30Mhz by default, we need it faster!

	// set the FlexIO2 clock divider to 2 instead of 8
	CCM_CS1CDR &= ~( CCM_CS1CDR_FLEXIO2_CLK_PODF( 7 ) );
	CCM_CS1CDR |= CCM_CS1CDR_FLEXIO2_CLK_PODF( 1 );

	//CCM_CS1CDR |= CCM_CS1CDR_FLEXIO2_CLK_PODF( 0 );	// divide by 1: use in place of the PODF( 1 ) line above (OR-ing in 0 on its own changes nothing); even faster seems to work ;-)

	// enable clock for FlexIO2
	CCM_CCGR3 |= CCM_CCGR3_FLEXIO2(CCM_CCGR_ON); 

	// enable clock for clock XBAR
	CCM_CCGR2 |= CCM_CCGR2_XBAR1(CCM_CCGR_ON);

	// enable FlexIO2
	FLEXIO2_CTRL |= 1;

	// fast mode -if it's on, the 0/1 DMA requests do not work
	// FLEXIO2_CTRL |= 1 << 2;	

	///////////////////////////////////////////////
	// set the pads corresponding to the FlexIO2 pins 0-3, 10, 11, 28, 29, 16-19 to proper mode

	IOMUXC_SW_MUX_CTL_PAD_GPIO_B0_00 = 4;		// 0
	IOMUXC_SW_MUX_CTL_PAD_GPIO_B0_01 = 4;		// 1
	IOMUXC_SW_MUX_CTL_PAD_GPIO_B0_02 = 4;		// 2
	IOMUXC_SW_MUX_CTL_PAD_GPIO_B0_03 = 4;		// 3
	IOMUXC_SW_MUX_CTL_PAD_GPIO_B0_10 = 4;		// 10
	IOMUXC_SW_MUX_CTL_PAD_GPIO_B0_11 = 4;		// 11

	IOMUXC_SW_MUX_CTL_PAD_GPIO_B1_00 = 4;		// 16
	IOMUXC_SW_MUX_CTL_PAD_GPIO_B1_01 = 4;		// 17
	IOMUXC_SW_MUX_CTL_PAD_GPIO_B1_02 = 4;		// 18
	IOMUXC_SW_MUX_CTL_PAD_GPIO_B1_03 = 4;		// 19
	IOMUXC_SW_MUX_CTL_PAD_GPIO_B1_12 = 4;		// 28
	IOMUXC_SW_MUX_CTL_PAD_GPIO_B1_13 = 4;		// 29

	// set the mode for the clock pin
	IOMUXC_SW_MUX_CTL_PAD_GPIO_B0_12 = 4;		// 12

	//////////////////////////////////////////////
	// setup shifters and timer

	//		0 - single bits from pin 10
	FLEXIO2_SHIFTCTL0	= FLEXIO_SHIFTCTL_TIMSEL( 0 )		|			// timer 0
						  //FLEXIO_SHIFTCTL_TIMPOL			|			// on positive edge
						  FLEXIO_SHIFTCTL_PINCFG( 0 )		|			// pin output disabled
						  FLEXIO_SHIFTCTL_PINSEL( 10 )		|			// pin 10
						  //FLEXIO_SHIFTCTL_PINPOL			|			// active high
						  FLEXIO_SHIFTCTL_SMOD( 1 );					// receive mode
	
	FLEXIO2_SHIFTCFG0	= FLEXIO_SHIFTCFG_PWIDTH( 0 )		|			// single bit
						  //FLEXIO_SHIFTCFG_INSRC			|			// from pin
						  FLEXIO_SHIFTCFG_SSTOP( 0 )		|			// stop bit disabled
						  FLEXIO_SHIFTCFG_SSTART( 0 );					// start bit disabled
					
	////		1 - single bits from pin 11
	FLEXIO2_SHIFTCTL1	= FLEXIO_SHIFTCTL_TIMSEL( 0 ) | FLEXIO_SHIFTCTL_PINCFG( 0 ) | FLEXIO_SHIFTCTL_PINSEL( 11 ) | FLEXIO_SHIFTCTL_SMOD( 1 );
	FLEXIO2_SHIFTCFG1	= FLEXIO_SHIFTCFG_PWIDTH( 0 ) | FLEXIO_SHIFTCFG_SSTOP( 0 ) | FLEXIO_SHIFTCFG_SSTART( 0 );
	
	//		2 - 4 bits from shifter 3
	FLEXIO2_SHIFTCTL2	= FLEXIO_SHIFTCTL_TIMSEL( 0 ) | FLEXIO_SHIFTCTL_PINCFG( 0 ) | FLEXIO_SHIFTCTL_PINSEL( 0 ) | FLEXIO_SHIFTCTL_SMOD( 1 );
	FLEXIO2_SHIFTCFG2	= FLEXIO_SHIFTCFG_PWIDTH( 3 ) | FLEXIO_SHIFTCFG_INSRC | FLEXIO_SHIFTCFG_SSTOP( 0 ) | FLEXIO_SHIFTCFG_SSTART( 0 );
	
	//		3 - 4 bits from pins 0-3
	FLEXIO2_SHIFTCTL3	= FLEXIO_SHIFTCTL_TIMSEL( 0 ) | FLEXIO_SHIFTCTL_PINCFG( 0 ) | FLEXIO_SHIFTCTL_PINSEL( 0 ) | FLEXIO_SHIFTCTL_SMOD( 1 );
	FLEXIO2_SHIFTCFG3	= FLEXIO_SHIFTCFG_PWIDTH( 3 ) | FLEXIO_SHIFTCFG_SSTOP( 0 ) | FLEXIO_SHIFTCFG_SSTART( 0 );
	
	//		4 - single bit from pin 28
	FLEXIO2_SHIFTCTL4	= FLEXIO_SHIFTCTL_TIMSEL( 0 ) | FLEXIO_SHIFTCTL_PINCFG( 0 ) | FLEXIO_SHIFTCTL_PINSEL( 28 ) | FLEXIO_SHIFTCTL_SMOD( 1 );
	FLEXIO2_SHIFTCFG4	= FLEXIO_SHIFTCFG_PWIDTH( 0 ) | FLEXIO_SHIFTCFG_SSTOP( 0 ) | FLEXIO_SHIFTCFG_SSTART( 0 );
	
	//		5 - single bit from pin 29
	FLEXIO2_SHIFTCTL5	= FLEXIO_SHIFTCTL_TIMSEL( 0 ) | FLEXIO_SHIFTCTL_PINCFG( 0 ) | FLEXIO_SHIFTCTL_PINSEL( 29 ) | FLEXIO_SHIFTCTL_SMOD( 1 );
	FLEXIO2_SHIFTCFG5	= FLEXIO_SHIFTCFG_PWIDTH( 0 ) | FLEXIO_SHIFTCFG_SSTOP( 0 ) | FLEXIO_SHIFTCFG_SSTART( 0 );
	
	//		6 - 4 bits from shifter 7
	FLEXIO2_SHIFTCTL6	= FLEXIO_SHIFTCTL_TIMSEL( 0 ) | FLEXIO_SHIFTCTL_PINCFG( 0 ) | FLEXIO_SHIFTCTL_PINSEL( 0 ) | FLEXIO_SHIFTCTL_SMOD( 1 );
	FLEXIO2_SHIFTCFG6	= FLEXIO_SHIFTCFG_PWIDTH( 3 ) | FLEXIO_SHIFTCFG_INSRC | FLEXIO_SHIFTCFG_SSTOP( 0 ) | FLEXIO_SHIFTCFG_SSTART( 0 );
	
	//		7 - 4 bits from pins 16-19
	FLEXIO2_SHIFTCTL7	= FLEXIO_SHIFTCTL_TIMSEL( 0 ) | FLEXIO_SHIFTCTL_PINCFG( 0 ) | FLEXIO_SHIFTCTL_PINSEL( 16 ) | FLEXIO_SHIFTCTL_SMOD( 1 );
	FLEXIO2_SHIFTCFG7	= FLEXIO_SHIFTCFG_PWIDTH( 3 ) | FLEXIO_SHIFTCFG_SSTOP( 0 ) | FLEXIO_SHIFTCFG_SSTART( 0 );

	// timer 0 - clocked from pin 12, enabled by an external trigger rise, disabled by the external trigger fall
	FLEXIO2_TIMCTL0		= FLEXIO_TIMCTL_TRGSEL( 0 )		|			// src trigger 0
						  // FLEXIO_TIMCTL_TRGPOL		|			// trigger active high
						  // FLEXIO_TIMCTL_TRGSRC		|			// external trigger
						  FLEXIO_TIMCTL_PINCFG( 0 )		|			// timer pin output disabled
						  FLEXIO_TIMCTL_PINSEL( 12 )	|			// timer pin 12
						  // FLEXIO_TIMCTL_PINPOL		|			// timer pin active high
						  FLEXIO_TIMCTL_TIMOD( 3 );					// timer mode 16bit

	FLEXIO2_TIMCFG0		= FLEXIO_TIMCFG_TIMOUT( 0 )		|			// timer output = logic one when enabled, not affected by timer reset
						  FLEXIO_TIMCFG_TIMDEC( 2 )		|			// decrement on pin input (both edges), shift clock = pin input
						  FLEXIO_TIMCFG_TIMRST( 6 )		|			// timer reset on trigger rising (this resets the timer when line valid becomes asserted)
						  FLEXIO_TIMCFG_TIMDIS( 6 )		|			// disable timer on trigger falling
						  FLEXIO_TIMCFG_TIMENA( 6 )		|			// enable timer on trigger rising
						  FLEXIO_TIMCFG_TSTOP( 0 )		;			// stop bit disabled
						  //FLEXIO_TIMCFG_TSTART					// start bit disabled	

	FLEXIO2_TIMCMP0		= 31;										// move from shift to shiftbuf every 32 timer ticks (so 16 shift clock cycles)
			

	/////////////////////////////////////////////////////////
	// setup external trigger (line valid signal)

	// set the IOMUX mode to 3, to route it to XBAR
	IOMUXC_SW_MUX_CTL_PAD_GPIO_EMC_06 = 3;	
	
	// set XBAR1_IO008 to INPUT
	IOMUXC_GPR_GPR6 &= ~(IOMUXC_GPR_GPR6_IOMUXC_XBAR_DIR_SEL_8);
	
	// daisy chaining - select between EMC06 and SD_B0_04
	IOMUXC_XBAR1_IN08_SELECT_INPUT = 0;
	
	// connect the IOMUX_XBAR_INOUT08 to FlexIO2 trigger 0
	xbar_connect( XBARA1_IN_IOMUX_XBAR_INOUT08, XBARA1_OUT_FLEXIO2_TRIGGER_IN0 );
		
	////////////////////////////////////////////////////////
	// setup dma to pick up the data

	// configure DMA channels
	dmaChannel.begin();

	dmaChannel.TCD->SADDR		= &FLEXIO2_SHIFTBUF0;
	dmaChannel.TCD->SOFF		= 4;
	dmaChannel.TCD->ATTR_SRC	= ( 5 << 3 ) | 2;								// 32 bit reads + 2^5 modulo
	dmaChannel.TCD->SLAST		= 0;

	dmaChannel.TCD->DADDR		= dmaBuffer;
	dmaChannel.TCD->DOFF		= 4;
	dmaChannel.TCD->ATTR_DST	= 2;											// 32 bit writes
	dmaChannel.TCD->DLASTSGA	= -DMABUFFER_SIZE * 4;			

	dmaChannel.TCD->NBYTES		= 8 * 4;										// write 32 bytes - all the shiftbuf registers
	dmaChannel.TCD->BITER		= DMABUFFER_SIZE / 8;
	dmaChannel.TCD->CITER		= DMABUFFER_SIZE / 8;
	
	dmaChannel.TCD->CSR		   &= ~(DMA_TCD_CSR_DREQ);							// do not disable the channel after it completes - so it just keeps going 
	dmaChannel.TCD->CSR		   |= DMA_TCD_CSR_INTMAJOR | DMA_TCD_CSR_INTHALF;	// interrupt at completion and at half completion

	dmaChannel.attachInterrupt( inputDMAInterrupt );
	dmaChannel.triggerAtHardwareEvent( DMAMUX_SOURCE_FLEXIO2_REQUEST0 );
		 
	// enable DMA on shifter status
	FLEXIO2_SHIFTSDEN |= 1 << 0;	
	
	// enable DMA
	dmaChannel.enable();
}


void setup()
{
	Serial.begin(115200);	

	setupFlexIOInput();	
}


void loop()
{	
	delay( 100 );

	if ( lastSeenHalf != dmaBufferHalfCount )
	{ 
		// the ISR decodes half ( count & 1 ) and then increments the count,
		// so the most recently decoded half is ( count - 1 ) & 1
		uint32_t halfCount = dmaBufferHalfCount;
		uint32_t* dmaData = processedBuffer + DMABUFFER_SIZE * ( ( halfCount - 1 ) & 1 );

		Serial.printf( "%s %8lu, %8lu, 0x%08X 0x%08X 0x%08X 0x%08X 0x%08X 0x%08X 0x%08X 0x%08X\n", dataCorrect ? "" : "DATA INCORRECT!", currTime - prevTime, dmaEndTime - dmaStartTime, dmaData[0], dmaData[1], dmaData[2], dmaData[3], dmaData[4], dmaData[5], dmaData[6], dmaData[7]  );

		lastSeenHalf = halfCount;
	}
	else
	{
		//Serial.printf("Nothing\n" );
	}
}
 
Looks good, will read it in more detail later, have not had my coffee yet :D

Note: earlier I played around with SPI over FlexIO, and I have a GitHub project with code that tries to manage some of the resources, e.g. asking for a timer or shifter, or reserving them...
It is up at: https://github.com/KurtE/FlexIO_t4

And a thread about it: https://forum.pjrc.com/threads/58228-T4-FlexIO-Looking-back-at-my-T4-beta-testing-library-FlexIO_t4

Along with it, I put most if not all of the FlexIO definitions into the imxrt.h file, as a structure...
But I see I never went back and redefined all of the individual defines.

So instead of needing to do defines like: #define FLEXIO2_SHIFTCTL4 (IMXRT_FLEXIO2.offset090)

You should be able to use things like: IMXRT_FLEXIO2_S.SHIFTCTL[4];

But: as far as I know from the IMXRT manual there are only 4 SHIFTCTL registers?

Is there some place that says there are 8 of them now? If so, I should change the structure in imxrt.h to reflect it.
That is:
Code:
        volatile uint32_t SHIFTCTL[4];          // 0x80 84 88 8C
        const   uint32_t UNUSED4[28];           // 0x90 - 0xfc
I could simply change 4->8 and 28->24, but again, so far I have not seen anything that talks about shifters 4-7?
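The 4->8 / 28->24 change can be sanity-checked with `offsetof`. The struct below is a hypothetical stand-alone fragment named `FlexIOShifterRegsSketch`. It is not the real imxrt.h structure, whose members are volatile. It only reproduces the offsets quoted above, plus the ones implied by the `offset090`/`offset110` defines in the earlier listing (SHIFTCTL4 at 0x90, SHIFTCFG4 at 0x110).

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical layout fragment with 8 shifters instead of 4
struct FlexIOShifterRegsSketch {
    uint32_t regs00to7C[32];   // 0x00 - 0x7C: not modeled here
    uint32_t SHIFTCTL[8];      // 0x80 - 0x9C
    uint32_t UNUSED4[24];      // 0xA0 - 0xFC
    uint32_t SHIFTCFG[8];      // 0x100 - 0x11C
};

constexpr std::size_t shiftctlOffset(std::size_t n)
{
    return offsetof(FlexIOShifterRegsSketch, SHIFTCTL) + n * sizeof(uint32_t);
}

constexpr std::size_t shiftcfgOffset(std::size_t n)
{
    return offsetof(FlexIOShifterRegsSketch, SHIFTCFG) + n * sizeof(uint32_t);
}

// Compile-time checks against the offsets used by the hand-written defines
static_assert(shiftctlOffset(4) == 0x90, "SHIFTCTL4 should match offset090");
static_assert(shiftcfgOffset(4) == 0x110, "SHIFTCFG4 should match offset110");
```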
 
The manual is indeed pretty inconsistent. Most of the sections just talk about SHIFTERi. The text description of the parallel interface in 50.3.4.1 mentions shifters 0-7, and so does the example setup of the Motorola 68K/Intel 8080 Bus Interface in section 50.4.9.
But more importantly, the chip itself reports that there are 8 of them:

Code:
Serial.printf( "FlexIO1 PARAM: 0x%08X\nFlexIO2 PARAM: 0x%08X\nFlexIO3 PARAM: 0x%08X\n", FLEXIO1_PARAM, FLEXIO2_PARAM, FLEXIO3_PARAM );

Gives back:

Code:
FlexIO1 PARAM: 0x02100808
FlexIO2 PARAM: 0x02200808
FlexIO3 PARAM: 0x02200808

So 2 external triggers, 16 (1) or 32 (2 & 3) pins, 8 timers and 8 shifters.
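The decoding of those PARAM values can be written out as a small host-side sketch. It splits the 32-bit word into its four byte fields: TRIGGER in bits 31-24, PIN in 23-16, TIMER in 15-8 and SHIFTER in 7-0. `FlexIOParam` and `decodeParam()` are made-up names. On the Teensy you would pass `FLEXIO2_PARAM` etc. instead of the constants.

```cpp
#include <cstdint>

// Resource counts reported by the FlexIO PARAM register
struct FlexIOParam {
    uint8_t triggers;  // bits 31-24: external trigger inputs
    uint8_t pins;      // bits 23-16: pins
    uint8_t timers;    // bits 15-8: timers
    uint8_t shifters;  // bits 7-0: shifters
};

constexpr FlexIOParam decodeParam(uint32_t param)
{
    return FlexIOParam{ uint8_t(param >> 24), uint8_t(param >> 16),
                        uint8_t(param >> 8), uint8_t(param) };
}
```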

@luni no, I totally don't mind you putting it on the WIKI!
 
@luni no, I totally don't mind you putting it on the WIKI!

Thanks, copied your post into the "Connectivity" section of the user WIKI (https://github.com/TeensyUser/doc/wiki/FLEXIO) Feel free to edit and add other stuff to the wiki as you like (the text might benefit from a few structuring headings...:rolleyes:). In case you are not familiar with GitHub WIKIs or markdown formatting you find some general information on the homepage https://github.com/TeensyUser/doc/wiki. There is also a discussion section (https://github.com/TeensyUser/doc/discussions) for, well, discussions.
 
Awesome writeup!

I noticed too that the chip behaves as if it has extra shifters. In fact my SmartMatrix driver code makes use of an undocumented fifth FlexIO shifter, but somehow I was of the impression that the extra shifters were only partially functional... At least in some cases.

If anyone wishes to see another example of a FlexIO parallel interface, check out SmartMatrix which uses FlexIO2 in 16-bit shift mode to output on 6 data pins (and also uses a FlexIO timer output on a seventh pin to generate a clock signal).

https://github.com/pixelmatix/Smart...422/src/MatrixTeensy4Hub75Refresh_Impl.h#L400
 
Thanks will take a look...
Will update my semi library code, plus should do PR back to cores IMXRT.h...
 
Nice!

@easone do you happen to remember what problems you had when using the shifters 4-7? In my experiments, they were behaving just fine, but I haven't tested them really thoroughly. BTW, in the SmartMatrix code, you're setting the FlexIO clock to run at 480MHz (clocking source 3, so PLL3 and then both dividers to 1) - does it actually run at 480MHz? Have you seen any problems with running it that high?
 
@easone and @miciwan and others - I started hacking a little on my library to add the extra shifters and timers. Plus I will update imxrt.h ....

But wondering about some of the other registers:
For example, I'm assuming TIMIEN has 8 bits of data?
Also ones like SHIFTSTAT, SHIFTERR, TIMSTAT, ...

What about DMA? Again, only on FLEXIO1 and 2 but not 3... But FLEXIO1 has 2 DMAMUX sources:

Code:
#define DMAMUX_SOURCE_FLEXIO1_REQUEST0 0
#define DMAMUX_SOURCE_FLEXIO1_REQUEST1 0

and

Code:
#define DMAMUX_SOURCE_FLEXIO1_REQUEST2 64
#define DMAMUX_SOURCE_FLEXIO1_REQUEST3 64

I am assuming that 4-7 don't have DMA capability?

Still more to play with!
 
Nice!

@easone do you happen to remember what problems you had when using the shifters 4-7? In my experiments, they were behaving just fine, but I haven't tested them really thoroughly. BTW, in the SmartMatrix code, you're setting the FlexIO clock to run at 480MHz (clocking source 3, so PLL3 and then both dividers to 1) - does it actually run at 480MHz? Have you seen any problems with running it that high?

I'm not sure exactly what the problem was, maybe it was a DMA or triggering issue, or maybe it was just a mistake on my part. The fact that you got your example working on the extra shifters is great evidence that they do work though!

Yes, the FlexIO peripheral runs happily with the 480 MHz clock source, but as far as FlexIO is concerned this translates to 480 million timer edges per second so the fastest data rate is 240 MHz since the data is shifted on either rising or falling timer edges. For SmartMatrix we are actually configuring the timer in Dual 8-bit mode by configuring TIMCTL = FLEXIO_TIMCTL_TIMOD(1). In this mode, the upper 8 bits of the timer count down the number of shifts, and the lower 8 bits provide an additional clock divider in the following manner:

Code:
flexIO->TIMCMP[0] = ((shiftsPerReload * 2 - 1) << 8) | ((FLEXIO_CLOCK_DIVIDER / 2 - 1) << 0);

By default, SmartMatrix uses shiftsPerReload = 8 and FLEXIO_CLOCK_DIVIDER = 26, so that means it shifts out data 8 times per DMA transfer (the contents of four 32-bit shifters in 16-bit shift mode) at a speed of 480/26 = 18.462 MHz. That speed is chosen to work with the LED Matrix hardware limitations but also to avoid excessively taxing the DMA engine. When using really high speeds, I find that reloading the buffers by DMA is the limiting factor especially if there's other demands like SD, USB, or SPI. If more shifters are used, that decreases the number of DMA transfers and decreases the total overhead.
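As a worked example of that formula, here is a sketch that builds the dual 8-bit TIMCMP value and computes the resulting shift rate. `timcmpDual8()` and `shiftRateHz()` are hypothetical helper names. The numbers in the check are the SmartMatrix defaults quoted above: a 480MHz FlexIO clock, divider 26 and 8 shifts per reload.

```cpp
#include <cstdint>

// TIMCMP value for dual 8-bit baud/bit mode (TIMOD = 1), following the formula above:
//   bits 15-8: shiftsPerReload * 2 - 1 (shiftsPerReload in 1..128)
//   bits 7-0 : clockDivider / 2 - 1    (clockDivider even, 2..512)
constexpr uint32_t timcmpDual8(uint32_t shiftsPerReload, uint32_t clockDivider)
{
    return (((shiftsPerReload * 2 - 1) & 0xFF) << 8) | ((clockDivider / 2 - 1) & 0xFF);
}

// The lower counter toggles the shift clock every clockDivider / 2 FlexIO clocks,
// so one full shift-clock period takes clockDivider FlexIO clocks.
constexpr double shiftRateHz(double flexioClockHz, uint32_t clockDivider)
{
    return flexioClockHz / clockDivider;
}
```

With the SmartMatrix defaults this gives TIMCMP = 0x0F0C and about 18.46MHz. With divider 2 the lower byte is 0, which gives the 240MHz ceiling mentioned above. The single-beat LCD setup later in the thread (1 beat, divider 40) comes out as 0x0113, i.e. 12MHz from a 480MHz FlexIO clock.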

I got better performance using the dual 8-bit counter to slow down the clock, rather than using the actual clock source's pre- and post-dividers to get the desired frequency.
 
I am assuming that 4-7 don't have DMA capability?

I think you're right, shifters 4-7 don't have the ability to generate DMA triggers when their buffers are empty. However, depending on application it may suffice to use the triggers from shifters 0-3 to reload the buffers if all the shifters are operating simultaneously or in a daisy-chained configuration...
 
I am assuming that 4-7 don't have DMA capability?

Yup, they cannot generate DMA requests (or rather, they most likely can, it's just not routed to the DMA mux). With the interrupts I've generally been seeing some weird stuff going on. Enabling them for any shifter does generate them, but it looks a bit like the higher shifters generate them more often or something. Having just a simple interrupt handler that increments a value gives completely different behavior depending on which shifter I attach the interrupt to: attaching it to the 7th one gives ~6x more interrupts per second than the exact same code with the interrupt attached to the 0th one - but I haven't investigated tha


Yes, the FlexIO peripheral runs happily with the 480 MHz clock source

Oh, great to know that it doesn't burn out in the long run or something :) But yeah, I'm not really sure how useful that is either, unless you only have some limited data to transmit/receive and it's more of a burst than a continuous transfer. If you need to process the incoming/outgoing data on the fly, even at 50MHz you only get 12 core cycles (assuming a 600MHz core) per data word.
 
> DMA seems to be loosing data if going faster than 10MHz

It later turned out that you had some wiring issues. Would you still say that 10 MHz is the max speed for DMA input of GPIO pin data?
 
The comment about 10MHz was about doing DMA from GPIO pins. There, you're somewhat limited by the latency between the pin going high and the DMA request being generated/processed. With FlexIO, you have shifters that act as buffers, so DMA requests do not have to be generated that often - and you should be able to get much higher speeds.
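Here are some rough numbers for that. Reading GPIO pins directly needs one DMA request per sample. The FlexIO receiver earlier in the thread moves all eight SHIFTBUF registers (16 samples) per request. The sketch below uses the 75MHz camera clock from the first post as the example rate. The helper name is hypothetical, and the calculation ignores DMA latency and bus contention.

```cpp
#include <cstdint>

// DMA requests per second needed to keep up with a sample rate,
// given how many samples each request moves (rounded up).
constexpr uint64_t dmaRequestsPerSecond(uint64_t sampleHz, uint64_t samplesPerRequest)
{
    return (sampleHz + samplesPerRequest - 1) / samplesPerRequest;
}
```

At 75MHz that is 75 million requests per second for per-sample GPIO DMA, versus 4.6875 million with 16 samples per request.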
 
@miciwan
This is really great work you did there. Thank you for that. I really mean it, as I am part of NXP, but in Sales.

I also do tech stuff for myself. I intend to do a 16-bit or 18-bit parallel LCD interface. Do you think we can somehow link FlexIO1 and FlexIO2 to get a wider datapath?

Cheers.

Hobi.
 
@hobi I have an 8/16-bit library written by another forum member that uses the fast GPIOs.
A full screen update on an ILI9488 @16bit color / 16bit bus takes roughly 6ms - not too bad!
But, the fast GPIOs don't support DMA.

With a T4.1, you might be able to use FlexIO1 or 2 and shift 8 bits into the 32 bit register/shifters and use the relevant exposed pins. But all of this is worthwhile only if DMA can be utilized to offload the writing from the main core.

I'm actually looking forward to the Teensy MicroMod as it has 8 consecutive pins exposed on FlexIO2

Here are some interesting documents to read:
https://community.nxp.com/t5/Kinetis-Microcontrollers/Understanding-FlexIO/ta-p/1115419
https://www.nxp.com/docs/en/application-note/AN12822.pdf

Would be happy to collaborate on such a library if you move forward with it.
 
@miciwan I have a quick Q about the shifters/shifter buffers.
Looking at this image:
[Image: Screen Shot 2021-06-22 at 17.34.35.png]

If I configure the FlexIO instance to shift out 8 bits at a time, does that mean I can fill each buffer with 4*8 bits of data * 8 shifters = 32 bytes of data without refilling, or is it only 8 bits per shifter * 8 shifters = 8 bytes of data?
 
It will need to be refilled every 32 bytes.

On each shift, 8 bits are shifted out and the top bits are filled from the adjacent shifter (shifter N+1, assuming the setup in the picture). The entire set is shifted out after 32 cycles (32 bytes). Then the shifters are refilled from the buffers and the interrupt/DMA request is generated (and the appropriate status bit changes).
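The chained behaviour can be checked with a small software model. This is a sketch under simplifying assumptions:

- 8 shifters in transmit mode, with an 8-bit parallel width.
- Every shifter except the last takes its input from shifter N+1 (INSRC = 1).
- All shifters shift right by 8 bits on the same shift clock.
- Shifter 0 drives the pins with its low byte.

Timing, start/stop bits and the buffer-to-shifter reload are not modeled. `simulateChain()` is a hypothetical name.

```cpp
#include <cstdint>
#include <vector>

// Simulate 32 shift clocks of an 8-shifter, 8-bit-wide transmit chain.
// Returns the byte presented on the pins at each shift.
std::vector<uint8_t> simulateChain(const uint32_t shiftbuf[8])
{
    uint32_t sh[8];
    for (int n = 0; n < 8; ++n) sh[n] = shiftbuf[n];        // buffers loaded into the shifters
    std::vector<uint8_t> pins;
    for (int clk = 0; clk < 32; ++clk) {
        pins.push_back(uint8_t(sh[0] & 0xFF));               // shifter 0 drives D0-D7
        for (int n = 0; n < 8; ++n) {
            // input from shifter N+1 (not updated yet this clock), nothing into the last one
            uint32_t in = (n < 7) ? (sh[n + 1] & 0xFF) : 0;
            sh[n] = (sh[n] >> 8) | (in << 24);
        }
    }
    return pins;
}
```

In this model, all 32 bytes leave in order after 32 shift clocks. SHIFTBUF0's low byte goes first and SHIFTBUF7's high byte goes last, which matches the "refill every 32 bytes" answer.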
 
@miciwan thanks for confirming!

I have a few more questions up my sleeve :D

I'm in the process of adapting the NXP FlexIO 8080 LCD example to the Teensy 4.1/MM (focus is 8 bit width on MM) and I have some questions about several defines in imxrt.h

For example the following register config in the NXP example - shifter input source
FLEXIO_SHIFTCFG_INSRC(1U)

In the 1060 datasheet, the field has a value of 0 or 1:
Code:
Input Source
Selects the input source for the shifter.
0b - Pin
1b - Shifter N+1 Output


In imxrt.h it's configured as follows:
Code:
#define FLEXIO_SHIFTCFG_INSRC			((uint32_t)(1<<8))

In some places the library sets the input source to a pin. Could I set the value as follows?
Code:
((uint32_t)(0<<8))
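INSRC is a single bit, so `((uint32_t)(0<<8))` evaluates to 0. OR-ing it into SHIFTCFG is harmless but does nothing: selecting the pin as input source is the same as simply leaving `FLEXIO_SHIFTCFG_INSRC` out. Here is a minimal sketch using a local stand-in macro. `MY_SHIFTCFG_INSRC` and `shiftcfgInsrc()` are hypothetical names, not part of imxrt.h.

```cpp
#include <cstdint>

// Local stand-in with the same value as FLEXIO_SHIFTCFG_INSRC quoted above (bit 8)
#define MY_SHIFTCFG_INSRC ((uint32_t)(1 << 8))

// INSRC part of SHIFTCFG: set = input from shifter N+1 output, clear = input from pin
constexpr uint32_t shiftcfgInsrc(bool fromNextShifter)
{
    return fromNextShifter ? MY_SHIFTCFG_INSRC : 0u;
}
```

In the NXP example, `FLEXIO_SHIFTCFG_INSRC(1U)` corresponds to OR-ing in the imxrt.h flag. `FLEXIO_SHIFTCFG_INSRC(0U)` corresponds to leaving it out.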
 
I've hacked some code together based on one of NXPs code examples for FlexIO.
I was wondering if someone here (Paul, Kurt, miciwan) who knows the ins and outs just a bit (A LOT) better could confirm the setup. I am slowly starting to understand some of the settings, but not all of them.

FlexIO init
Code:
/* Get a FlexIO channel */
  pFlex = FlexIOHandler::flexIOHandler_list[1]; // use FlexIO2

  /* Pointer to the port structure in the FlexIO channel */
  p = &pFlex->port();
  
  /* Pointer to the hardware structure in the FlexIO channel */
  hw = &pFlex->hardware();

  /* Basic pin setup */
  pinMode(10, OUTPUT); // FlexIO2:0 - WR
  pinMode(40, OUTPUT); // FlexIO2:4 - D0
  pinMode(41, OUTPUT); // FlexIO2:5
  pinMode(42, OUTPUT); // FlexIO2:6
  pinMode(43, OUTPUT); // FlexIO2:7
  pinMode(44, OUTPUT); // FlexIO2:8
  pinMode(45, OUTPUT); // FlexIO2:9
  pinMode(6, OUTPUT); // FlexIO2:10
  pinMode(9, OUTPUT); // FlexIO2:11 - D7
  
  /* High speed and drive strength configuration */
  *(portControlRegister(10)) = 0xFF; 
  *(portControlRegister(40)) = 0xFF;
  *(portControlRegister(41)) = 0xFF;
  *(portControlRegister(42)) = 0xFF;
  *(portControlRegister(43)) = 0xFF;
  *(portControlRegister(44)) = 0xFF;
  *(portControlRegister(45)) = 0xFF;
  *(portControlRegister(6)) = 0xFF;
  *(portControlRegister(9)) = 0xFF;

  /* Set clock */
  pFlex->setClockSettings(3, 0, 0); // 480 MHz

  /* Set up pin mux */
  pFlex->setIOPinToFlexMode(10);
  pFlex->setIOPinToFlexMode(40);
  pFlex->setIOPinToFlexMode(41);
  pFlex->setIOPinToFlexMode(42);
  pFlex->setIOPinToFlexMode(43);
  pFlex->setIOPinToFlexMode(44);
  pFlex->setIOPinToFlexMode(45);
  pFlex->setIOPinToFlexMode(6);
  pFlex->setIOPinToFlexMode(9);

  /* Enable the clock */
  hw->clock_gate_register |= hw->clock_gate_mask;
  
  /* Enable the FlexIO with fast access */
  p->CTRL = FLEXIO_CTRL_FLEXEN | FLEXIO_CTRL_FASTACC;

Shifter & timer setup
Code:
/* Disable and reset FlexIO */
    p->CTRL &= ~FLEXIO_CTRL_FLEXEN;
    p->CTRL |= (1<<1);   // SWRST on: resets all FlexIO registers except CTRL
    p->CTRL &= ~(1<<1);  // SWRST off: clear only bit 1, so FASTACC stays set
    

    /* Configure the shifters */
    p->SHIFTCFG[0] = 
        FLEXIO_SHIFTCFG_INSRC                                                   /* Shifter input */
      | FLEXIO_SHIFTCFG_SSTOP(0U)                                               /* Shifter stop bit disabled */
      | FLEXIO_SHIFTCFG_SSTART(0U)                                              /* Shifter start bit disabled and loading data on enabled */
      | FLEXIO_SHIFTCFG_PWIDTH(8U-1U);                                          /* Bus width */
      
    p->SHIFTCTL[0] = 
        FLEXIO_SHIFTCTL_TIMSEL(0U)                                              /* Shifter's assigned timer index */
      | (0<<23) //FLEXIO_SHIFTCTL_TIMPOL(0U)                                    /* Shift on posedge of shift clock */
      | FLEXIO_SHIFTCTL_PINCFG(3U)                                              /* Shifter's pin configured as output */
      | FLEXIO_SHIFTCTL_PINSEL(4U)                                              /* Shifter's pin start index */
      | (0<<7)                                                                  /* Shifter's pin active high */
      | FLEXIO_SHIFTCTL_SMOD(2U);                                               /* Shifter mode as transmit */

    /* Configure the timer for shift clock */
    p->TIMCMP[0] = 
        ((1U * 2U - 1) << 8)                                                    /* TIMCMP[15:8] = number of beats x 2 – 1 */
      | (40U/2U - 1U); //(4U/2U - 1U)                                           /* TIMCMP[7:0] = baud rate divider / 2 – 1 */
    
    p->TIMCFG[0] = 
        FLEXIO_TIMCFG_TIMOUT(0U)                                                /* Timer output logic one when enabled and not affected by reset */
      | FLEXIO_TIMCFG_TIMDEC(0U)                                                /* Timer decrement on FlexIO clock, shift clock equals timer output */
      | FLEXIO_TIMCFG_TIMRST(0U)                                                /* Timer never reset */
      | FLEXIO_TIMCFG_TIMDIS(2U)                                                /* Timer disabled on timer compare */
      | FLEXIO_TIMCFG_TIMENA(2U)                                                /* Timer enabled on trigger high */
      | FLEXIO_TIMCFG_TSTOP(0U)                                                 /* Timer stop bit disabled */
      | (0<<1); //FLEXIO_TIMCFG_TSTART(0U);                                     /* Timer start bit disabled */
    
    
    p->TIMCTL[0] = 
        FLEXIO_TIMCTL_TRGSEL((((0U) << 2) | 1U))                                /* Timer trigger selected as shifter's status flag */
      | (1<<23) //FLEXIO_TIMCTL_TRGPOL(1U)                                      /* Timer trigger polarity as active low */
      | (1<<22)//FLEXIO_TIMCTL_TRGSRC(1U)                                       /* Timer trigger source as internal */
      | FLEXIO_TIMCTL_PINCFG(3U)                                                /* Timer' pin configured as output */
      | FLEXIO_TIMCTL_PINSEL(0)                                                 /* Timer' pin index: WR pin */
      | (1<<7) //FLEXIO_TIMCTL_PINPOL(1U)                                       /* Timer' pin active low */
      | FLEXIO_TIMCTL_TIMOD(1U);                                                /* Timer mode as dual 8-bit counters baud/bit */

    /* Enable FlexIO */
    p->CTRL |= FLEXIO_CTRL_FLEXEN;

Load the shifter buffer and check the timer status
Code:
/* Write command index */
    p->SHIFTBUF[0U] = data;

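    /* Wait for transfer to be completed */
    while(0 == (p->TIMSTAT & (1U << 0U)))
    {
    }

    /* Clear the timer status flag (write 1 to clear) before the next transfer */
    p->TIMSTAT = (1U << 0U);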

This FlexIO instance is set up to push out data as single-beat writes across an 8-bit-wide parallel bus, and will be used in a very simple TFT display driver.
 
Hi Kurt,

It's been a while since this thread was active, but I thought I would take a chance... I would like to push SPI out on 6 channels at 20+MHz. Seems like a job for FlexIO. My current solution (which I don't have quite working yet) is to use external ICs to MUX SPI0 from the Teensy. This ultimately _should_ work, but I want to dig in to alternate plans. Eliminating the external chips and wiring would be a big win :) Any thoughts or advice? Should I just dive in with the code from your repo? Have you messed around with it more since last year?

Thanks and all the best,
-Max
 
Thanks a lot to miciwan and the other folks on this thread for explaining this FlexIO stuff; I am doing my best to wrap my head around it. Before I get in too far over my head, can anyone tell me if it would be possible to set up FlexIO to interface with a QSPI PSRAM memory chip? I'd love to be able to use this to give the MicroMod external memory like the T4.1 has (when it's soldered on).
 
I've never messed with SPI, only with parallel FlexIO, but technically speaking (assuming QSPI is just three additional parallel serial lines) it would be a matter of increasing the shifter width from 1 to 4 bits and feeding the data accordingly.
Take Kurt's FlexIOSPI example and play around with the config, I'm sure you can get it running.

I would use the FlexIO1 pins, as it is DMA capable, and leave the wider FlexIO2 for a camera/LCD interface.
 