Reading Pins in Parallel -- Teensy 4.1

briuz

Hello,

I am looking for a fast way to read the digital pins. While there is a thread that shows it can be done, what is the exact mapping on a Teensy 4.1 to do so? I.e., if I want to read 8 (or 16, or whatever) pins at a time, what are they called?

(The thread is here: https://forum.pjrc.com/threads/5837...O-pins-on-the-Teensy-4-0-quot-atomically-quot)

I am not sure how to convert the Teensy 4.1 schematic into actual pin names.

It appears it can be done something like this:

register uint32_t data = IMXRT_GPIO6_DIRECT;

But what does IMXRT_GPIO6_DIRECT map to? What are the names for the other groups of pins and/or where do I look?

Thanks!
 
But what does IMXRT_GPIO6_DIRECT map to?
That's a define in the code you copied it from, in the thread you linked.
The thread answers your main question too, I think.

I'd suggest reading the reference manual, too.
 
As for the schematic: on one side it shows the Teensy pin number, and on the MCU end it shows the port and 'bit' number within the port.

Running this sketch github.com/TeensyUser/doc/tree/master/examples/GPIO/pinList may help, as it prints output like this:
Code:
PIN   GPIOn-BITm  |  GPIOn-BITm    PIN
------------------|-------------------
00  -> GPIO6-03   |   GPIO6-02  ->  01
01  -> GPIO6-02   |   GPIO6-03  ->  00
02  -> GPIO9-04   |   GPIO6-12  ->  24
03  -> GPIO9-05   |   GPIO6-13  ->  25
...
More info may be found at github.com/TeensyUser/doc/wiki.

They are out of order because, where possible, each pin keeps its historical function. As a result, sequential MCU port pins are spread across the board according to which functions each MCU pin supports.

Note: IIRC, the port numbers on the schematic are the ones for normal-speed (low-speed) pin operation. At reset, the Teensy startup code switches the pins to high-speed mode, which shifts the port numbers up by 5 (GPIO1 becomes GPIO6, GPIO2 becomes GPIO7, and so on). That offset is what makes the numbers in the example code line up. Running the GPIO/pinList sketch above shows the actual port and bit for every pin.
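As a small illustration of that offset (a sketch, not part of the Teensy core; `fastGpioPort` is a hypothetical name), the normal-to-fast port mapping is just +5 for GPIO1 through GPIO4, with the bit number unchanged. GPIO5 has no fast counterpart.

```cpp
// Hypothetical helper: map a normal-speed GPIO port number (as printed in the
// reference manual and on the schematic) to the fast GPIO port that the
// Teensy 4.x core uses after startup. The bit number within the port is unchanged.
// Returns 0 for ports that have no fast counterpart (e.g. GPIO5).
constexpr int fastGpioPort(int normalPort) {
    return (normalPort >= 1 && normalPort <= 4) ? normalPort + 5 : 0;
}

// Pin 0 is GPIO1_IO03 in the manual and shows up as GPIO6-03 in pinList.
static_assert(fastGpioPort(1) == 6, "GPIO1 is used as GPIO6");
static_assert(fastGpioPort(4) == 9, "GPIO4 is used as GPIO9");
```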

There are other examples that may clarify the port reading of pins - this one perhaps:
Simultaneously-reading-8-GPIO-pins
 
Sometimes it is hard to give one complete answer to a question like this; it often depends on your needs.

That is, how fast does it have to be? Teensy is pretty fast, so even simply emulating this by reading one pin at a time may be fast enough.

How to read the schematic and figure out the pins: I don't typically start from the schematic to figure out which pin is which. For most of the newer boards I started experimenting during the alpha or beta phase, when Paul had already generated the pin table, so I had to generate my tables starting from that side. I then usually generate an Excel document with the pin information in different orders, which I keep at: https://github.com/KurtE/TeensyDocuments

One page for T4.1 looks like this:
[Image: screenshot.jpg, T4.1 pin table page]

I have another page with the pins in GPIO order. Note: my tables use the normal-speed port numbers (GPIO1-5); GPIO1-4 correspond to GPIO6-9 in defragster's table. I kept the lower numbers because they match the reference manual.

Now, how to do it from the schematic: if you look at pin 0, the schematic shows it is connected to AD_B0_03 on the IMXRT.

Now, if you look in the IMXRT1060RM.pdf file (you can download it from the product page), the easiest place is probably Chapter 11 (IOMUXC). Search for AD_B0_03 and you will find the register IOMUXC_SW_MUX_CTL_PAD_GPIO_AD_B0_03 (p. 477 in the version of the PDF I am looking at).
There you will see, for mux mode 5: "101 ALT5 - Select mux mode: ALT5 mux port: GPIO1_IO03 of instance: gpio1".
So this pin is GPIO1 (or GPIO6), bit 3.

It also shows all of the other things this pin can do, such as a hardware serial port: "010 ALT2 - Select mux mode: ALT2 mux port: LPUART6_RX of instance: lpuart6".

Different ways to read multiple pins at once: the core has code to emulate some of the AVR port registers. I think on the early AVR Arduino boards PORTD was pins 0-7, and the Teensy code has classes to emulate this. Look at ...\teensy4\avr_emulation.h, starting at about line 512.
For example: uint8_t pins = PIND;
which does:
Code:
	operator int () const __attribute__((always_inline)) {
		int ret = 0;
		if (digitalReadFast(0)) ret |= (1<<0);
		if (digitalReadFast(1)) ret |= (1<<1);
		if (digitalReadFast(2)) ret |= (1<<2);
		if (digitalReadFast(3)) ret |= (1<<3);
		if (digitalReadFast(4)) ret |= (1<<4);
		if (digitalReadFast(5)) ret |= (1<<5);
		if (digitalReadFast(6)) ret |= (1<<6);
		if (digitalReadFast(7)) ret |= (1<<7);
		return ret;
	}
You are obviously not restricted to these pins; you could create your own similar one.

Using the tables above, you can find a port that has enough pins for you, read the whole register at once, and then either use the bits directly or rearrange them, depending on your needs.

Again, depending on why you need this, there may be other solutions. For example, with simple camera inputs like the OV7670:
we experimented with GPIO pins, choosing ones that allowed simple manipulation of the 32-bit values.
On the T4.1 we also experimented with using the CSI interface to read in 4 bits at a time.
On the MicroMod we experimented with using FlexIO for this.
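Along those lines, here is a sketch of "your own similar one" that builds the PIND byte from whole-port reads instead of eight digitalReadFast() calls. `readPins0to7` is a hypothetical helper; the port values are passed in as parameters so the bit logic can be checked off-target (on the Teensy you would pass GPIO6_PSR, GPIO7_PSR and GPIO9_PSR). The bit positions come from the pinList output posted in this thread.

```cpp
#include <cstdint>

// Hypothetical PIND-style helper for Teensy 4.1 pins 0-7.
// Bit positions from the pinList output:
//   pins 0, 1       -> GPIO6 bits 3, 2
//   pins 2, 3, 4, 5 -> GPIO9 bits 4, 5, 6, 8
//   pins 6, 7       -> GPIO7 bits 10, 17
// Each argument is one snapshot of that port's pad status register.
inline uint8_t readPins0to7(uint32_t gpio6, uint32_t gpio7, uint32_t gpio9) {
    uint32_t ret = 0;
    ret |= ((gpio6 >> 3) & 1u) << 0;   // pin 0
    ret |= ((gpio6 >> 2) & 1u) << 1;   // pin 1
    ret |= ((gpio9 >> 4) & 1u) << 2;   // pin 2
    ret |= ((gpio9 >> 5) & 1u) << 3;   // pin 3
    ret |= ((gpio9 >> 6) & 1u) << 4;   // pin 4
    ret |= ((gpio9 >> 8) & 1u) << 5;   // pin 5
    ret |= ((gpio7 >> 10) & 1u) << 6;  // pin 6
    ret |= ((gpio7 >> 17) & 1u) << 7;  // pin 7
    return static_cast<uint8_t>(ret);
}
```

Because pins 0-7 are spread over three ports, the three register reads still happen one after another, so the result is only "parallel" within each port.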

Hope some of this helps.
 
Thank you all for the answers. I am going to look into what everyone posted and will report back soon. What I am trying to do is have the Teensy read 16 address pins and 8 data pins, then respond on the data pins in under 250 ns. (I am basically attempting to race the bus on a 6502 chip running at almost 2 MHz.) I believe this may be possible if the pins are read in parallel.
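To put the 250 ns figure in perspective, here is a rough budget, assuming the Teensy 4.1 runs at its default 600 MHz and the 6502 at a full 2 MHz (the helper and constant names are hypothetical):

```cpp
// Back-of-the-envelope timing budget for racing a 6502 bus.
// Assumptions: Teensy 4.1 at its default 600 MHz; 6502 at 2 MHz
// (the post says "almost 2 MHz", so this is the worst case).
constexpr double kTeensyHz = 600e6;
constexpr double kBusHz = 2e6;

// Length of one 6502 bus cycle in nanoseconds.
constexpr double busCycleNs(double busHz) { return 1e9 / busHz; }

// Number of CPU clock cycles that fit into a window of windowNs nanoseconds.
constexpr double cpuCyclesIn(double windowNs, double cpuHz) {
    return windowNs * cpuHz / 1e9;
}
```

busCycleNs(kBusHz) gives 500 ns, so 250 ns is half a bus cycle, and cpuCyclesIn(250.0, kTeensyHz) gives 150 CPU clocks for the whole read, decide and drive sequence, before any interrupt latency is taken into account.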
 

That's hard.
You must remember that:
a) the Teensy has some interrupts running;
b) interrupts get globally disabled from time to time (and are serviced afterwards, which brings you back to a).

Both can kill the needed timing completely. The few ns you save by reading in parallel are negligible compared to that.
You may have to disable interrupts yourself, and then you'll lose USB communication, the systick, and much more.

It's probably more a job for an FPGA or other additional hardware (address decoder, etc.).
 


I ran the pinlist program and got this:

Code:
PIN GPIOn-BITm | GPIOn-BITm PIN
------------------|-------------------
00 -> GPIO6-03 | GPIO6-02 -> 01
01 -> GPIO6-02 | GPIO6-03 -> 00
02 -> GPIO9-04 | GPIO6-12 -> 24
03 -> GPIO9-05 | GPIO6-13 -> 25
04 -> GPIO9-06 | GPIO6-16 -> 19
05 -> GPIO9-08 | GPIO6-17 -> 18
06 -> GPIO7-10 | GPIO6-18 -> 14
07 -> GPIO7-17 | GPIO6-19 -> 15
08 -> GPIO7-16 | GPIO6-20 -> 40
09 -> GPIO7-11 | GPIO6-21 -> 41
10 -> GPIO7-00 | GPIO6-22 -> 17
11 -> GPIO7-02 | GPIO6-23 -> 16
12 -> GPIO7-01 | GPIO6-24 -> 22
13 -> GPIO7-03 | GPIO6-25 -> 23
14 -> GPIO6-18 | GPIO6-26 -> 20
15 -> GPIO6-19 | GPIO6-27 -> 21
16 -> GPIO6-23 | GPIO6-28 -> 38
17 -> GPIO6-22 | GPIO6-29 -> 39
18 -> GPIO6-17 | GPIO6-30 -> 26
19 -> GPIO6-16 | GPIO6-31 -> 27
20 -> GPIO6-26 | GPIO7-00 -> 10
21 -> GPIO6-27 | GPIO7-01 -> 12
22 -> GPIO6-24 | GPIO7-02 -> 11
23 -> GPIO6-25 | GPIO7-03 -> 13
24 -> GPIO6-12 | GPIO7-10 -> 06
25 -> GPIO6-13 | GPIO7-11 -> 09
26 -> GPIO6-30 | GPIO7-12 -> 32
27 -> GPIO6-31 | GPIO7-16 -> 08
28 -> GPIO8-18 | GPIO7-17 -> 07
29 -> GPIO9-31 | GPIO7-18 -> 36
30 -> GPIO8-23 | GPIO7-19 -> 37
31 -> GPIO8-22 | GPIO7-28 -> 35
32 -> GPIO7-12 | GPIO7-29 -> 34
33 -> GPIO9-07 | GPIO8-12 -> 45
34 -> GPIO7-29 | GPIO8-13 -> 44
35 -> GPIO7-28 | GPIO8-14 -> 43
36 -> GPIO7-18 | GPIO8-15 -> 42
37 -> GPIO7-19 | GPIO8-16 -> 47
38 -> GPIO6-28 | GPIO8-17 -> 46
39 -> GPIO6-29 | GPIO8-18 -> 28
40 -> GPIO6-20 | GPIO8-22 -> 31
41 -> GPIO6-21 | GPIO8-23 -> 30
42 -> GPIO8-15 | GPIO9-04 -> 02
43 -> GPIO8-14 | GPIO9-05 -> 03
44 -> GPIO8-13 | GPIO9-06 -> 04
45 -> GPIO8-12 | GPIO9-07 -> 33
46 -> GPIO8-17 | GPIO9-08 -> 05
47 -> GPIO8-16 | GPIO9-22 -> 51
48 -> GPIO9-24 | GPIO9-24 -> 48
49 -> GPIO9-27 | GPIO9-25 -> 53
50 -> GPIO9-28 | GPIO9-26 -> 52
51 -> GPIO9-22 | GPIO9-27 -> 49
52 -> GPIO9-26 | GPIO9-28 -> 50
53 -> GPIO9-25 | GPIO9-29 -> 54
54 -> GPIO9-29 | GPIO9-31 -> 29


If I understand correctly, pin 27, for example, is the left-most (highest) bit of GPIO6.

The Teensy 4.1 schematic for pin 27 shows AD_B1_15. On page 503 of the reference manual (revision 3) mentioned by KurtE, AD_B1_15 lists GPIO1_IO31, which would also be GPIO6_IO31.

So, given:

register uint32_t data = IMXRT_GPIO6_DIRECT;

pin 27 (digital) would be bit 31 (the left-most bit) of (uint32_t)data above.
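That reading can be checked with a one-line bit extraction. In this sketch `bitLevel` and the `kPin...Bit` constants are hypothetical names, and the port value is a parameter; on the Teensy it would be a single read of the port register (for example GPIO6_PSR).

```cpp
#include <cstdint>

// Return the level (0 or 1) of one bit of a 32-bit port snapshot.
constexpr uint32_t bitLevel(uint32_t portSnapshot, unsigned bit) {
    return (portSnapshot >> bit) & 1u;
}

// From the pinList table: pin 27 is GPIO6 bit 31, pin 0 is GPIO6 bit 3.
constexpr unsigned kPin27Bit = 31;
constexpr unsigned kPin0Bit = 3;
```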


Thanks!
 
Here is an excerpt from core_pins.h, sorted by GPIO group.
CORE_PINn is the Teensy pin number, GPIOx_DR is the data register, and the number after it is the bit within that port (in core_pins.h itself the bit is a separate CORE_PINn_BIT define).
There are also
GPIOx_DR_SET, GPIOx_DR_CLEAR, GPIOx_DR_TOGGLE,
GPIOx_GDIR and GPIOx_PSR.

Code:
#define CORE_PIN1_PORTREG	GPIO6_DR 2
#define CORE_PIN0_PORTREG	GPIO6_DR 3
#define CORE_PIN24_PORTREG	GPIO6_DR 12
#define CORE_PIN25_PORTREG	GPIO6_DR 13

// here is one continuous block of 16 bits (GPIO6 bits 16-31);
// the last column is the position within that 16-bit block.
// It should be possible to read it like this:
// uint16_t addr = (GPIO6_PSR >> 16) & 0xFFFF;

#define CORE_PIN19_PORTREG	GPIO6_DR 16  0
#define CORE_PIN18_PORTREG	GPIO6_DR 17  1
#define CORE_PIN14_PORTREG	GPIO6_DR 18  2
#define CORE_PIN15_PORTREG	GPIO6_DR 19  3
#define CORE_PIN40_PORTREG	GPIO6_DR 20  4
#define CORE_PIN41_PORTREG	GPIO6_DR 21  5
#define CORE_PIN17_PORTREG	GPIO6_DR 22  6
#define CORE_PIN16_PORTREG	GPIO6_DR 23  7 
#define CORE_PIN22_PORTREG	GPIO6_DR 24  8
#define CORE_PIN23_PORTREG	GPIO6_DR 25  9
#define CORE_PIN20_PORTREG	GPIO6_DR 26  10
#define CORE_PIN21_PORTREG	GPIO6_DR 27  11
#define CORE_PIN38_PORTREG	GPIO6_DR 28  12
#define CORE_PIN39_PORTREG	GPIO6_DR 29  13
#define CORE_PIN26_PORTREG	GPIO6_DR 30  14
#define CORE_PIN27_PORTREG	GPIO6_DR 31  15

// by the looks of it, there is no other continuous block of 8 bits

#define CORE_PIN10_PORTREG	GPIO7_DR 0
#define CORE_PIN12_PORTREG	GPIO7_DR 1
#define CORE_PIN11_PORTREG	GPIO7_DR 2
#define CORE_PIN13_PORTREG	GPIO7_DR 3
#define CORE_PIN6_PORTREG	GPIO7_DR 10
#define CORE_PIN9_PORTREG	GPIO7_DR 11
#define CORE_PIN32_PORTREG	GPIO7_DR 12
#define CORE_PIN8_PORTREG	GPIO7_DR 16
#define CORE_PIN7_PORTREG	GPIO7_DR 17
#define CORE_PIN36_PORTREG	GPIO7_DR 18
#define CORE_PIN37_PORTREG	GPIO7_DR 19
#define CORE_PIN35_PORTREG	GPIO7_DR 28
#define CORE_PIN34_PORTREG	GPIO7_DR 29

#define CORE_PIN45_PORTREG	GPIO8_DR 12
#define CORE_PIN44_PORTREG	GPIO8_DR 13
#define CORE_PIN43_PORTREG	GPIO8_DR 14
#define CORE_PIN42_PORTREG	GPIO8_DR 15
#define CORE_PIN47_PORTREG	GPIO8_DR 16
#define CORE_PIN46_PORTREG	GPIO8_DR 17
#define CORE_PIN28_PORTREG	GPIO8_DR 18
#define CORE_PIN31_PORTREG	GPIO8_DR 22
#define CORE_PIN30_PORTREG	GPIO8_DR 23

#define CORE_PIN2_PORTREG	GPIO9_DR 4
#define CORE_PIN3_PORTREG	GPIO9_DR 5
#define CORE_PIN4_PORTREG	GPIO9_DR 6
#define CORE_PIN33_PORTREG	GPIO9_DR 7
#define CORE_PIN5_PORTREG	GPIO9_DR 8
#define CORE_PIN51_PORTREG	GPIO9_DR 22
#define CORE_PIN48_PORTREG	GPIO9_DR 24
#define CORE_PIN53_PORTREG	GPIO9_DR 25
#define CORE_PIN52_PORTREG	GPIO9_DR 26
#define CORE_PIN49_PORTREG	GPIO9_DR 27
#define CORE_PIN50_PORTREG	GPIO9_DR 28
#define CORE_PIN54_PORTREG	GPIO9_DR 29
#define CORE_PIN29_PORTREG	GPIO9_DR 31

As the comments above show, there is no additional continuous block of 8 bits,
but that can easily be solved by multiplexing the address and data onto the 16-bit block using 3x 74HC245.
I think this approach will be the fastest possible.

[Image: multiplexing_addr_data.png, address/data multiplexing with 3x 74HC245]
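A sketch of reading that 16-bit block in one go, with hypothetical names and the GPIO6 value passed in as a parameter (on the Teensy it would be one read of GPIO6_PSR). The pin order is the one in the core_pins.h excerpt above.

```cpp
#include <cstdint>

// Teensy 4.1 pins on GPIO6 bits 16..31, in bit order (block index 0..15).
const uint8_t kGpio6UpperPins[16] = {19, 18, 14, 15, 40, 41, 17, 16,
                                     22, 23, 20, 21, 38, 39, 26, 27};

// One 32-bit port read gives all 16 pins at once; keep the upper half.
inline uint16_t upperBlock(uint32_t gpio6Snapshot) {
    return static_cast<uint16_t>(gpio6Snapshot >> 16);
}

// Position (0..15) of a Teensy pin inside that 16-bit value,
// or -1 if the pin is not part of the block.
inline int blockIndexOfPin(int pin) {
    for (int i = 0; i < 16; ++i) {
        if (kGpio6UpperPins[i] == pin) return i;
    }
    return -1;
}
```

With this wiring, the 6502 line connected to pin 19 ends up in bit 0 of the result and the one on pin 27 in bit 15, so the PCB routing decides how much shuffling is needed after the read.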
 
I also need to mention that you have to use
GPIO6_GDIR to set the data directions,
i.e.
GPIO6_GDIR &= 0x0000FFFF; // set pins (bits 16-31) to inputs
GPIO6_GDIR |= 0xFFFF0000; // set pins (bits 16-31) to outputs

and, from reading the datasheet:
GPIO6_DR is used to write data to the port
GPIO6_PSR is used to read data from the port
i.e.
// reading data
uint16_t data = (GPIO6_PSR >> 16) & 0xFFFF;
// writing data (and preserving the other pins' data)
GPIO6_DR = (GPIO6_DR & 0x0000FFFF) | ((uint32_t)data << 16);
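Putting the direction and data registers together, here is a minimal sketch of one bus turnaround on the upper 16 bits of GPIO6. The `FakeGpio` struct and the `readUpper`/`writeUpper` helpers are hypothetical stand-ins so the bit logic can be tested off-target; on the Teensy the fields would be GPIO6_GDIR, GPIO6_DR and GPIO6_PSR.

```cpp
#include <cstdint>

// Hypothetical stand-in for the GPIO6 registers used above.
struct FakeGpio {
    uint32_t gdir = 0;  // direction: 1 = output (GPIO6_GDIR)
    uint32_t dr = 0;    // output latch (GPIO6_DR)
    uint32_t psr = 0;   // pad status, what the pins read as (GPIO6_PSR)
};

constexpr uint32_t kUpper16 = 0xFFFF0000u;  // GPIO6 bits 16..31

// Switch bits 16..31 to inputs and return their levels.
inline uint16_t readUpper(FakeGpio& g) {
    g.gdir &= ~kUpper16;
    return static_cast<uint16_t>(g.psr >> 16);
}

// Load the value into bits 16..31 of the latch (bits 0..15 untouched),
// then switch those pins to outputs.
inline void writeUpper(FakeGpio& g, uint16_t value) {
    g.dr = (g.dr & ~kUpper16) | (static_cast<uint32_t>(value) << 16);
    g.gdir |= kUpper16;
}
```

writeUpper loads the output latch before switching the pins to outputs, so the bus is never briefly driven with stale data. The read-modify-write of DR is not atomic; if an interrupt handler also drives other GPIO6 pins, GPIO6_DR_SET and GPIO6_DR_CLEAR avoid clobbering them.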
 
Other options for speed or timing may include DMA.

A side note on DMA: it does not work with pins in high-speed mode; you have to switch them back to normal-speed GPIO (GPIO1 instead of GPIO6).
More about that in other threads...
 

I was thinking I could still set the data directions with the normal pinMode() function???
 

Yes, but that would be very slow, especially if you want to write data back to the 6502 bus.
And why not use GDIR directly? There is no problem doing so.
 

Good point. The data line level shifters will need to be flipped quickly. Thanks for pointing it out.
 
You'll need additional level shifters.

I've got 5 level shifters: 2 set as inputs to the Teensy for the address lines; 1 for the data lines, with the direction controlled by the Teensy; 1 set as input to the Teensy for certain signals (such as Phi2); and 1 more whose chip select and direction can both be controlled by the Teensy, for some other lines.

What I screwed up on, in retrospect, is that I should have arranged the connections according to the GPIO0x pin registers(?) rather than what made the most sense for the physical wiring. There are a couple of other mistakes on the board as well, and I have hacked it to work. I am hoping to at least prove the concept with this board, then get a new board made using all the lessons I learned.
 

I wasn't thinking about the interrupts...

Many times this is done with an FPGA, but somebody created a cartridge called the UNO Cart that uses a microcontroller.

I was originally thinking of having the Teensy manage a parallel SRAM chip that the Teensy loaded, but then decided to try the Teensy alone, since someone else got a microcontroller to race the bus in a project called "UNO Cart".
 



Maybe use a CPLD and a memory chip with the Teensy on the back-end to load and save from SD cards, load other memory features, etc.
 
Another approach would be to emulate the 6502+RAM on the Teensy :) That way all the timing would be in your hands, and you could still use the rest of the hardware...
 
Thanks for the great support! I decided to do things differently and include memory and a CPLD as well as a Teensy 4.1. The CPLD will decode and act as a register so the 6502 can communicate with the Teensy. I'll solve the loading and interfacing issues with software. I will, however, read the pins in parallel, like we discussed in this thread, so as to speed up communication.
 
I did this; nice that almost all the pins are on one side.
[Image: attachment 27261]
I have a project using 3x ADS8588SIPMR ADCs to make a 24-channel power meter. I'm trying to see how I can use FlexIO3 to read 16-bit parallel data, and I like the contiguous 16 bits available on this port. I'm only sampling at 20 kHz, so I won't lose too much time reading the ADC manually without DMA. If the Teensy 4.1 will do the job, I'll be able to use it in my design.

However, I'm really struggling to understand FlexIO...
Things I can't find answers to:
It appears to be possible in the FlexIO mux to configure only the 16 pins of the 32-bit port that I need (half the port). The user manual shows each pin's mode is individually programmable, with ALT9 being the one I need.
When reading the 32-bit port, will the top 16 bits be rubbish or the current state of the unmuxed pins? Either way I don't care, as I'll mask off the upper 16 bits to get just the lower 16-bit result.
If the other half of the port's pins aren't muxed, will their default functions still be available?

Are there any examples showing how to mux half the port and read it?
 