Teensy 4.1 - do I need another board with 2 cores to solve my problem?

Status
Not open for further replies.

Fiskusmati

Active member
Hello.

What I'm trying to achieve:
- 2x 16-bit parallel ADCs connected to GPIOs (32 GPIOs used)
- Sample the 32 GPIO states and save them in memory (an array of 10 000 records). Each measurement must be taken when an external interrupt occurs on a GPIO (about 1 000 000 samples per second)
- After 10 000 records are collected (40 000 bytes), send them over Ethernet via UDP
- Generate a 1 MHz square wave on one GPIO


Issue: sending that amount of data over Ethernet on the Teensy is blocking. It takes some time, e.g. 500 µs, so I lose 500 measurements. I thought that with the Ethernet PHY it would take nanoseconds and not block my program.
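
As a sanity check on the numbers above, here is a minimal sketch of the arithmetic in C++. The constants come from the post; the 100 Mbit/s link speed is an assumption, protocol overhead is ignored, and all names are made up for illustration.

```cpp
#include <stdint.h>

// Figures taken from the post above.
constexpr uint32_t kSamplesPerSecond = 1000000;  // external trigger rate
constexpr uint32_t kBytesPerSample   = 4;        // 32 GPIO bits packed into one word
constexpr uint32_t kSamplesPerBlock  = 10000;    // records per UDP transfer
constexpr uint32_t kBlockingMicros   = 500;      // observed blocking time of one send
constexpr uint32_t kLinkMbps         = 100;      // ASSUMPTION: 100 Mbit/s Ethernet link

// Payload size of one block in bytes.
constexpr uint32_t blockBytes() { return kSamplesPerBlock * kBytesPerSample; }

// Sustained payload rate in Mbit/s.
constexpr uint32_t payloadMbps() {
    return kSamplesPerSecond * kBytesPerSample * 8 / 1000000;
}

// Samples that arrive while one blocking send is in progress.
constexpr uint32_t samplesDuringSend() {
    return static_cast<uint32_t>(
        static_cast<uint64_t>(kSamplesPerSecond) * kBlockingMicros / 1000000);
}

// Time to fill one block, in microseconds.
constexpr uint32_t blockFillMicros() {
    return static_cast<uint32_t>(
        static_cast<uint64_t>(kSamplesPerBlock) * 1000000 / kSamplesPerSecond);
}

// Minimum time one block occupies the wire, in microseconds (payload bits only).
constexpr uint32_t wireMicros() { return blockBytes() * 8 / kLinkMbps; }
```

With these figures the payload needs about 32 Mbit/s, a block takes 10 ms to fill and at least 3.2 ms on the wire, so the link itself should not be the bottleneck. The 500 lost samples come from sampling being stalled while the send call runs.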

Normally I would use an ESP32, with one core taking measurements and the other sending UDP. But I need a stable wired Ethernet connection (this is more data than WiFi can handle) and I need a lot of GPIO pins.

The Teensy does not have a second core, but maybe there is a solution? An RTOS?
Or maybe interrupts can still fire while data is being sent through UDP, so that I can keep taking measurements (read the GPIO states and put them in memory)?



I'm using RJ45 connector with magnetics inside, soldered directly to the teensy board.


Regards Mateusz
 
Maybe I'm misunderstanding your goal, but I would think about this some more. Are you actually trying to get analog data or digital pin state (i.e. on or off)? If it's digital pin state, then you could treat each pin as a bit and compress your data into a single 32 bit (4 byte) integer. If you are using analog data, how much usable resolution are you expecting on your ADC and how much do you need? Assuming that you can get by with 16 bit resolution, then use an array of 16 bit integers instead of converting to float - you'll save half the size.

Off the top of my head, I would probably set up an ISR to collect the data and add to a ring buffer. Pop data off the buffer and send over ethernet in the main loop or a lower priority ISR.
 
If the sampling is done in an interrupt service routine and the ethernet writing is done in the main loop, then no samples should be missed. Use a double buffer (2x 40k).
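
A minimal sketch of such a double buffer, in plain C++ so it can be tried off-target. The struct and its member names are hypothetical; on the Teensy, `push()` would be called from the sampling ISR, and the main loop would poll `fullBlock()`, send the data, then call `release()`.

```cpp
#include <stddef.h>
#include <stdint.h>

constexpr size_t kBlockSamples = 10000;  // 10 000 records = 40 000 bytes per half

// Hypothetical double buffer: the ISR fills one half while the main loop sends the other.
struct DoubleBuffer {
    uint32_t          buf[2][kBlockSamples];
    volatile size_t   writeIndex = 0;      // next free slot in the half being filled
    volatile uint8_t  fillSlot   = 0;      // half currently owned by the ISR
    volatile bool     ready      = false;  // other half is full and not yet released
    volatile uint32_t overruns   = 0;      // blocks completed before the previous one was released

    // Call from the sampling ISR with the 32-bit word just read from the pins.
    void push(uint32_t sample) {
        size_t i = writeIndex;
        buf[fillSlot][i] = sample;
        if (++i == kBlockSamples) {
            i = 0;
            if (ready) overruns = overruns + 1;  // sender fell behind; old data gets overwritten
            fillSlot = static_cast<uint8_t>(fillSlot ^ 1u);
            ready = true;
        }
        writeIndex = i;
    }

    // Call from the main loop: returns the full half, or nullptr if none is pending.
    const uint32_t* fullBlock() {
        if (!ready) return nullptr;
        return buf[fillSlot ^ 1u];
    }

    // Call from the main loop once the block has been sent.
    void release() { ready = false; }
};
```

The `overruns` counter is only a diagnostic: if it ever becomes non-zero, sending a block took longer than filling the next one (10 ms at 1 MS/s).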
 
Yes, there will be binary data on the 32 GPIO pins (not analog). It can be packed into 32-bit (4-byte) integers, but the udp.write() function only accepts byte arrays, not integer arrays.
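
One common way around the byte-array restriction is to view the integer array as raw bytes rather than copying it. A small sketch, assuming the write call takes a `const uint8_t*` and a length as in the usual Arduino UDP API; `ByteView` and `asBytes` are hypothetical helper names.

```cpp
#include <stddef.h>
#include <stdint.h>

// Pointer/length pair describing a block of raw bytes.
struct ByteView {
    const uint8_t* data;
    size_t         size;
};

// View an array of 32-bit samples as bytes without copying.
// The bytes appear in the CPU's native order (little-endian on the Teensy 4.1),
// so the receiver has to reassemble the words accordingly.
inline ByteView asBytes(const uint32_t* samples, size_t count) {
    return { reinterpret_cast<const uint8_t*>(samples), count * sizeof(uint32_t) };
}
```

Usage would look something like `ByteView v = asBytes(block, 10000); Udp.write(v.data, v.size);`. Whether 40 000 bytes fit in a single packet depends on the library and is not checked here.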
 
The question is: is it safe for an interrupt to fire while UDP data is being transferred to the Ethernet PHY chip?
 
If you are using my Ethernet Wrapper Library, I know it currently isn't setup to be thread safe so it may not be interrupt safe either, I haven't tested interrupts at all. It's probably going to be best if you can test it yourself, see what happens and report your findings back here.
 
Yes, I use your library, and thank you for that!
A quick test with timer ISRs plus external interrupts showed no issues, but I will test it more thoroughly.

Code:
    // Stress test: send the reply buffer back-to-back, as fast as possible.
    do
    {
        Udp.beginPacket(Udp.remoteIP(), Udp.remotePort());
        Udp.write(ReplyBuffer, sizeof(ReplyBuffer));
        Udp.endPacket();
    } while (true);

This still generates 98-100 Mbps of Ethernet traffic, and the ISRs also work OK.
 
Does anyone know whether it is possible to read all 32 GPIO states at once, rather than using digitalRead(1); digitalRead(2); etc.?
Port reading was easy on ATmegas; how is it done with Teensy and Teensyduino?
 
The GPIO groups can be read all at once, but most pins are spread out between different groups, here’s the table for the T4.1:
Code:
PIN   GPIOn-BITm  |  GPIOn-BITm    PIN

------------------|-------------------

00  -> GPIO6-03   |   GIPO6-02  ->  01
01  -> GPIO6-02   |   GIPO6-03  ->  00
02  -> GPIO9-04   |   GIPO6-12  ->  24
03  -> GPIO9-05   |   GIPO6-13  ->  25
04  -> GPIO9-06   |   GIPO6-16  ->  19
05  -> GPIO9-08   |   GIPO6-17  ->  18
06  -> GPIO7-10   |   GIPO6-18  ->  14
07  -> GPIO7-17   |   GIPO6-19  ->  15
08  -> GPIO7-16   |   GIPO6-20  ->  40
09  -> GPIO7-11   |   GIPO6-21  ->  41
10  -> GPIO7-00   |   GIPO6-22  ->  17
11  -> GPIO7-02   |   GIPO6-23  ->  16
12  -> GPIO7-01   |   GIPO6-24  ->  22
13  -> GPIO7-03   |   GIPO6-25  ->  23
14  -> GPIO6-18   |   GIPO6-26  ->  20
15  -> GPIO6-19   |   GIPO6-27  ->  21
16  -> GPIO6-23   |   GIPO6-28  ->  38
17  -> GPIO6-22   |   GIPO6-29  ->  39
18  -> GPIO6-17   |   GIPO6-30  ->  26
19  -> GPIO6-16   |   GIPO6-31  ->  27
20  -> GPIO6-26   |   GIPO7-00  ->  10
21  -> GPIO6-27   |   GIPO7-01  ->  12
22  -> GPIO6-24   |   GIPO7-02  ->  11
23  -> GPIO6-25   |   GIPO7-03  ->  13
24  -> GPIO6-12   |   GIPO7-10  ->  06
25  -> GPIO6-13   |   GIPO7-11  ->  09
26  -> GPIO6-30   |   GIPO7-12  ->  32
27  -> GPIO6-31   |   GIPO7-16  ->  08
28  -> GPIO8-18   |   GIPO7-17  ->  07
29  -> GPIO9-31   |   GIPO7-18  ->  36
30  -> GPIO8-23   |   GIPO7-19  ->  37
31  -> GPIO8-22   |   GIPO7-28  ->  35
32  -> GPIO7-12   |   GIPO7-29  ->  34
33  -> GPIO9-07   |   GIPO8-12  ->  45
34  -> GPIO7-29   |   GIPO8-13  ->  44
35  -> GPIO7-28   |   GIPO8-14  ->  43
36  -> GPIO7-18   |   GIPO8-15  ->  42
37  -> GPIO7-19   |   GIPO8-16  ->  47
38  -> GPIO6-28   |   GIPO8-17  ->  46
39  -> GPIO6-29   |   GIPO8-18  ->  28
40  -> GPIO6-20   |   GIPO8-22  ->  31
41  -> GPIO6-21   |   GIPO8-23  ->  30
42  -> GPIO8-15   |   GIPO9-04  ->  02
43  -> GPIO8-14   |   GIPO9-05  ->  03
44  -> GPIO8-13   |   GIPO9-06  ->  04
45  -> GPIO8-12   |   GIPO9-07  ->  33
46  -> GPIO8-17   |   GIPO9-08  ->  05
47  -> GPIO8-16   |   GIPO9-22  ->  51
48  -> GPIO9-24   |   GIPO9-24  ->  48
49  -> GPIO9-27   |   GIPO9-25  ->  53
50  -> GPIO9-28   |   GIPO9-26  ->  52
51  -> GPIO9-22   |   GIPO9-27  ->  49
52  -> GPIO9-26   |   GIPO9-28  ->  50
53  -> GPIO9-25   |   GIPO9-29  ->  54
54  -> GPIO9-29   |   GIPO9-31  ->  29

As you can see, only GPIO6 has 16 consecutive bits in order (bits 16-31), which is only enough to read one of your ADCs in a single operation without rearranging any bits. With some external logic chips you could read both ADCs from those same 16 pins in a shared-bus configuration, which would cost the fewest processor cycles if you wanted to go that route.

To read them, just set your variable equal to GPIOn_DR. The registers are 32 bits wide, so make sure your variable is sized appropriately, such as:
Code:
uint32_t myVar = GPIO6_DR;
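
To turn that 32-bit read into one 16-bit ADC word, the upper half of the register can be shifted down. A minimal sketch, assuming the ADC is wired to GPIO6 bits 16-31 in bit order according to the table above; the function name is hypothetical, and the register value is passed in so the helper can be tried off-target.

```cpp
#include <stdint.h>

// Extract GPIO6 bits 16..31 from a register snapshot as one 16-bit word.
// Per the table above: result bit 0 is pin 19 (GPIO6-16), result bit 15 is pin 27 (GPIO6-31).
inline uint16_t adcWordFromGpio6(uint32_t gpio6) {
    return static_cast<uint16_t>(gpio6 >> 16);
}
```

On the Teensy this would be called as `adcWordFromGpio6(GPIO6_DR)`.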
 
I'm at home - here's a fix :)
Code:
PIN   GPIOn-BITm  |  GPIOn-BITm    PIN
                                      
------------------|-------------------
                                      
00  -> GPIO6-03   |   GPIO6-02  ->  01
01  -> GPIO6-02   |   GPIO6-03  ->  00
02  -> GPIO9-04   |   GPIO6-12  ->  24
03  -> GPIO9-05   |   GPIO6-13  ->  25
04  -> GPIO9-06   |   GPIO6-16  ->  19
05  -> GPIO9-08   |   GPIO6-17  ->  18
06  -> GPIO7-10   |   GPIO6-18  ->  14
07  -> GPIO7-17   |   GPIO6-19  ->  15
08  -> GPIO7-16   |   GPIO6-20  ->  40
09  -> GPIO7-11   |   GPIO6-21  ->  41
10  -> GPIO7-00   |   GPIO6-22  ->  17
11  -> GPIO7-02   |   GPIO6-23  ->  16
12  -> GPIO7-01   |   GPIO6-24  ->  22
13  -> GPIO7-03   |   GPIO6-25  ->  23
14  -> GPIO6-18   |   GPIO6-26  ->  20
15  -> GPIO6-19   |   GPIO6-27  ->  21
16  -> GPIO6-23   |   GPIO6-28  ->  38
17  -> GPIO6-22   |   GPIO6-29  ->  39
18  -> GPIO6-17   |   GPIO6-30  ->  26
19  -> GPIO6-16   |   GPIO6-31  ->  27
20  -> GPIO6-26   |   GPIO7-00  ->  10
21  -> GPIO6-27   |   GPIO7-01  ->  12
22  -> GPIO6-24   |   GPIO7-02  ->  11
23  -> GPIO6-25   |   GPIO7-03  ->  13
24  -> GPIO6-12   |   GPIO7-10  ->  06
25  -> GPIO6-13   |   GPIO7-11  ->  09
26  -> GPIO6-30   |   GPIO7-12  ->  32
27  -> GPIO6-31   |   GPIO7-16  ->  08
28  -> GPIO8-18   |   GPIO7-17  ->  07
29  -> GPIO9-31   |   GPIO7-18  ->  36
30  -> GPIO8-23   |   GPIO7-19  ->  37
31  -> GPIO8-22   |   GPIO7-28  ->  35
32  -> GPIO7-12   |   GPIO7-29  ->  34
33  -> GPIO9-07   |   GPIO8-12  ->  45
34  -> GPIO7-29   |   GPIO8-13  ->  44
35  -> GPIO7-28   |   GPIO8-14  ->  43
36  -> GPIO7-18   |   GPIO8-15  ->  42
37  -> GPIO7-19   |   GPIO8-16  ->  47
38  -> GPIO6-28   |   GPIO8-17  ->  46
39  -> GPIO6-29   |   GPIO8-18  ->  28
40  -> GPIO6-20   |   GPIO8-22  ->  31
41  -> GPIO6-21   |   GPIO8-23  ->  30
42  -> GPIO8-15   |   GPIO9-04  ->  02
43  -> GPIO8-14   |   GPIO9-05  ->  03
44  -> GPIO8-13   |   GPIO9-06  ->  04
45  -> GPIO8-12   |   GPIO9-07  ->  33
46  -> GPIO8-17   |   GPIO9-08  ->  05
47  -> GPIO8-16   |   GPIO9-22  ->  51
48  -> GPIO9-24   |   GPIO9-24  ->  48
49  -> GPIO9-27   |   GPIO9-25  ->  53
50  -> GPIO9-28   |   GPIO9-26  ->  52
51  -> GPIO9-22   |   GPIO9-27  ->  49
52  -> GPIO9-26   |   GPIO9-28  ->  50
53  -> GPIO9-25   |   GPIO9-29  ->  54
54  -> GPIO9-29   |   GPIO9-31  ->  29

Pete
 
Good point. So the answer might be a single read/shift for 16 bits and something similar to the T4.0 solution for the 2nd 16 bits?
 
The second 16 bits would be the trickier part. If you really want the most speed out of it, I would use some fast tri-state 16-bit buffer chips so the two ADCs can share the same 16 pins. It should be faster to simply toggle an output and do another GPIO6 read than to do any kind of multi-bit shifting to get 16 bits from the other registers.
 
You could put together some pins for the second 16 bits that would minimize the amount of bit manipulation that is required.
On GPIO7 use bits 0-3 (pins 10,12,11,13), on GPIO9 use bits 4-8 (pins 2,3,4,33,5) and on GPIO8 use bits 12-18 (pins 45,44,43,42,47,46,28).
The bits from GPIO7 and GPIO9 will already be in the correct place and only need to be masked off and then ORd together. The bits on GPIO8 will need to be masked, shifted right 3 places and ORd.
Note that the shift on T4.1 is a single-cycle instruction because the T4.1 (and T4.0 and T3.6) has a barrel shifter. Shifting 31 places "costs" no more than shifting one place.
I think the whole operation would take about 10 instruction cycles or about 17ns.

Pete
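
The mask/shift/OR scheme described in the post above could be sketched roughly as follows. The function name is hypothetical, and the three register snapshots are passed in as plain values so the bit manipulation can be checked off-target.

```cpp
#include <stdint.h>

// Assemble the second 16-bit word from snapshots of GPIO7, GPIO8 and GPIO9:
//   result bits 0..3   <- GPIO7 bits 0..3   (already in place, mask only)
//   result bits 4..8   <- GPIO9 bits 4..8   (already in place, mask only)
//   result bits 9..15  <- GPIO8 bits 12..18 (shift right 3, then mask)
inline uint16_t adcWordFromGpio789(uint32_t gpio7, uint32_t gpio8, uint32_t gpio9) {
    return static_cast<uint16_t>((gpio7 & 0x000Fu) |
                                 (gpio9 & 0x01F0u) |
                                 ((gpio8 >> 3) & 0xFE00u));
}
```

On the Teensy the call would be something like `adcWordFromGpio789(GPIO7_DR, GPIO8_DR, GPIO9_DR)`. Note that the three registers are read one after another, so the ADC outputs must be stable for the duration of all three reads.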
 
Honestly, I'm curious because I've never looked into it: how many instruction cycles, and how much time, does it take to read and write the GPIO registers?
 
Just in case someone is interested in the sketch producing this data for the various boards, here is a link to the corresponding user wiki page: https://github.com/TeensyUser/doc/wiki/Mapping-Pins-to-Ports
 
I don't understand why, for you, GPIO6 is the group that contains these 16 GPIOs.
Please see the attached PDF and the image below.
As I understand it, GPIO1 should be read to obtain these 16 GPIO states (this group is colored yellow in the docs below).
What is more, I have tested it: reading GPIO1_DR works perfectly fine and gives me access to the 16 bits AD_B1_00 to AD_B1_15.

schematic41.jpg
 

Attachments

  • i.MX RT1060 Crossover Processors for Consumer Prodcuts.pdf
Back during the beta, GPIO1-4 were being used; the short answer is that GPIO6-9 access the same pins, but faster. I'm not sure how you are accessing them with that register, since GPIO1-4 aren't made active in the Teensy startup.c file, so they shouldn't work as far as I'm aware.
 
I get roughly 13ns to read a port vs 2-3 ns for bit manipulation instructions. So you want to avoid port reads, even if it means more bit manipulation. Just ports 6 and 7 to get 32 bits?
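
For a rough feel of what those figures mean at 1 MS/s, here is a small back-of-the-envelope sketch. It uses the 13 ns port-read figure and the upper 3 ns bit-operation figure from the post above; the operation counts passed in are assumptions, and ISR entry/exit and the store to memory are not included.

```cpp
#include <stdint.h>

constexpr uint32_t kPortReadNs     = 13;    // measured port read time from the post above
constexpr uint32_t kBitOpNs        = 3;     // upper end of the 2-3 ns range quoted above
constexpr uint32_t kSamplePeriodNs = 1000;  // 1 000 000 samples per second

// Estimated cost of acquiring one sample, in nanoseconds.
constexpr uint32_t sampleCostNs(uint32_t portReads, uint32_t bitOps) {
    return portReads * kPortReadNs + bitOps * kBitOpNs;
}

// True if the estimated cost fits inside one sample period.
constexpr bool fitsInPeriod(uint32_t portReads, uint32_t bitOps) {
    return sampleCostNs(portReads, bitOps) < kSamplePeriodNs;
}
```

For example, four port reads plus an assumed seven bit operations come to about 73 ns, and two port reads plus the same seven operations to about 47 ns. Either way the reads themselves use well under a tenth of the 1 µs sample period.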
 
By "read a port", do you mean reading the whole 32 bits at once via GPIOx_DR?
 
Indeed, it is correct that GPIO6-9 access the same pins as GPIO1-4. The 1062 was changed to FAST GPIO mode, which invalidates access using GPIO1-4 and moves the pins to GPIO6-9 unless the FAST IO setting is undone.

The imxrt.h file contains function macro information for access to those pins as GPIO6,7,8,9
 