10ns to read 8 lines: feasible with Teensy 4.0?

nounours18200

Hi,

I usually program on Arduino, but this time I have a task that cannot be achieved with an Arduino, because it requires a much faster processor than any Arduino has.

I have to read 8 lines of a parallel interface, and (according to the timing diagram below) I have 10ns for doing this:
03.Parallel Timing Interface_§8.1_M.jpg


We initially planned to attach an interrupt to the /WR line and read the lines D0-D7.
But it seems that an Arduino (ATmega328 processor) would take longer than the available 10ns to process the interrupt and read the 8 lines.

Therefore I would like your opinion: would it be feasible with a Teensy 4.0?

Its Cortex-M7 is far more powerful than an ATmega328, so maybe it could do this [interrupt + reading] in 10ns?

Thank you very much for your reply,
Nounours
 
I would choose 9 pins which are all on the same 32 bit hardware GPIO port. When WR goes low, begin reading the port register in a tight loop. Save the prior reading in another variable before you read it again. On the first reading where you observe WR as 1, take the 8 data bits from the prior reading when WR was 0.
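
A minimal sketch of this polling idea, written as plain C++ that runs on a PC against a recorded list of port readings instead of real hardware. The bit layout (`kWrMask`, `kDataMask`) and the function name `captureByte` are hypothetical, chosen only for illustration; on a Teensy the readings would come from reading the GPIO port register inside the loop.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical bit layout inside the 32-bit port word (assumption).
constexpr uint32_t kWrMask   = 1u << 8;  // /WR on bit 8
constexpr uint32_t kDataMask = 0xFFu;    // D0..D7 on bits 0..7

// Scans successive port readings. Once /WR has gone low, it keeps the prior
// reading; on the first reading where /WR is high again, it returns the data
// byte of the prior reading (taken while /WR was still low).
// Returns -1 if the readings contain no complete write pulse.
int captureByte(const uint32_t* samples, std::size_t count) {
    assert(samples != nullptr || count == 0);
    std::size_t i = 0;
    // Wait for /WR to go low.
    while (i < count && (samples[i] & kWrMask)) ++i;
    if (i == count) return -1;

    uint32_t prior = samples[i];
    for (++i; i < count; ++i) {
        const uint32_t now = samples[i];
        if (now & kWrMask) {
            // /WR just went high: the prior reading holds the valid data.
            return static_cast<int>(prior & kDataMask);
        }
        prior = now;  // still low, remember this reading
    }
    return -1;
}
```

The point of keeping the prior reading is that the result does not depend on guessing a fixed delay after /WR falls.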
 
It looks like this interface is really designed to talk to something with an input latch register. Paul's strategy seems a good one.
 
Data is valid "Min 100ns". Just wait 160ns-100ns = 60ns after setting WR->0, then read. And isn't WR an input to the chip?
Am I reading the diagram wrong?
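
To put these windows in perspective, here is a small sketch that converts a time window into whole CPU cycles. The function name is hypothetical, and the clock figures are assumptions: 600 MHz for the Teensy 4.0 and 16 MHz for a typical ATmega328-based Arduino.

```cpp
#include <cassert>
#include <cstdint>

// Number of whole CPU cycles that fit in a time window (rounds down).
// windowNs is in nanoseconds, clockMHz in megahertz.
uint32_t cyclesInWindow(uint32_t windowNs, uint32_t clockMHz) {
    assert(clockMHz > 0);
    // ns * MHz = cycles * 1000, so divide by 1000 at the end.
    const uint64_t scaled = static_cast<uint64_t>(windowNs) * clockMHz;
    return static_cast<uint32_t>(scaled / 1000u);
}
```

Under those assumptions, `cyclesInWindow(10, 600)` gives 6 cycles and `cyclesInWindow(60, 600)` gives 36, while at 16 MHz both windows are shorter than a single cycle.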

In a German forum, you would have been criticized if you didn't think it necessary to say which chip it was. You wouldn't have gotten an answer either. Be glad :)
 
Data is valid "Min 100ns". Just wait 160ns-100ns = 60ns after setting WR->0, then read. And isn't WR an input to the chip?
Am I reading the diagram wrong?
If I understand what you mean, you assume that T1=60ns, don't you?
03.Parallel Timing Interface_§8.1_M2.jpg

It should be OK, but it is not 100% guaranteed, unless I am wrong?

It looks like this interface is really designed to talk to something with an input latch register. Paul's strategy seems a good one.
This parallel interface comes from a device and goes to a (very old) VFD screen display.
The VFD display has its own PCB with its own controller (a NEC FIPC8367), and to date we are not able to find the FIPC8367 datasheet.
Fortunately we have the datasheet of the display module, so we have this timing diagram.
I don't know if this kind of device (VFD display) uses an input latch register.


When WR goes low, begin reading the port register in a tight loop. Save the prior reading in another variable before you read it again. On the first reading where you observe WR as 1, take the 8 data bits from the prior reading when WR was 0.
Your approach is a bit more complex to program, but it seems more "secure": the result should be OK because it does not depend on T1.

What do you mean by "I would choose 9 pins which are all on the same 32 bit hardware GPIO port": on a Teensy 4.0 I planned to use:
-Pins 0 to 7 for [D0 to D7]
-Pin 8 for WR; Pin 9 for A0; Pin 10 for CS; Pin 11 for T0; Pin 14 for BUSY.
Do they belong to "the same 32-bit GPIO port"?

It would be the first time I use a Teensy 4.0, so do you think that it will be "powerful enough" to do the job?
And do you think that a "poor Arduino Nano" has a chance to do the job, or is it definitely too slow?

Thank you very much for your expertise !
 
Ah, okay, that sheds some light on the matter! I would first use a logic analyzer to obtain the actual numbers.
 
It would be the 1st time I use a Teensy 4.0, so do you think that it will be "powerful enough" to complete the job ?

I don't really understand what the job is. So far I've seen a lot of talk about reading 8 pins within a tight timing window, but unless I missed something, what is the actual job to be done?

Maybe you're reverse engineering how these 2 devices talk to each other?

Or maybe you're trying to create a replacement for one of them?


What do you mean by "I would choose 9 pins which are all on the same 32 bit hardware GPIO port": on a Teensy 4.0 I planned to use:
-Pins 0 to 7 for [D0 to D7]

Similar to how AVR has 3 ports with 8 bits each, Teensy 4.x has 4 ports with 32 bits each. You can read an entire 32 bit port at once, which is a lot faster than reading it one bit at a time.

You'll probably need to look at Kurt's large pinout chart, because which pins are assigned to which ports isn't shown on the normal small pinout card.


Teensy 4.1 will probably work better for this project, because it has more pins, and more pins can be read as a single group. If you look at the GPIO column, you'll see pin 19 is GPIO 1.16. Pin 18 is GPIO 1.17, and so on. Not all 32 possible bits of port 1 are available, but 16 to 23 definitely are. One way to accomplish this would be to connect D0 to D7 to the pins that are 1.16 to 1.23. Then you can read the entire 32 bits at once, and if you want the 8 data bits, just right shift by 16 and mask off the upper 24 bits.
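
The shift-and-mask step can be sketched as below. This is plain C++ operating on a 32-bit value that stands in for one port read; the function names are hypothetical, and it assumes D0 to D7 are wired to bits 16 to 23 as described above.

```cpp
#include <cassert>
#include <cstdint>

// True if the given bit (0..31) is set in the port value.
bool portBitSet(uint32_t portValue, uint8_t bit) {
    assert(bit < 32);
    return ((portValue >> bit) & 1u) != 0;
}

// Extracts D0..D7 from a 32-bit port value, assuming they sit on bits 16..23:
// shift right by 16 so D0 lands on bit 0, then keep only the low 8 bits.
uint8_t dataFromPort(uint32_t portValue) {
    return static_cast<uint8_t>((portValue >> 16) & 0xFFu);
}
```

For example, a port value of `0x00A50000` yields `0xA5`, and whatever the other 24 bits contain has no effect on the result.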
 
One other issue you might face is 5 volt signals. Teensy 4.x is not 5V tolerant. If this old system uses 5V signals, you'll need a buffer chip or other circuitry to reduce the signals to 3.3V level. For slower signals you can just use resistors. But for this sort of signal where you're dealing with timing in the ~100ns range, best to use a 3.3V buffer chip which has 5V tolerant input.
 
I don't really understand what the job is. So far I've seen a lot of talk about reading 8 pins within a tight timing window, but unless I missed something, what is the actual job to be done?

Maybe you're reverse engineering how these 2 devices talk to each other?

Or maybe you're trying to create a replacement for one of them?
I agree that it can be a little "cloudy" !...
The purpose is to replace the very old VFD display of a device by a modern LCD display.
The old VFD display is almost impossible to find (used it costs >1.000€ ....) and the available space for the display in the device is very limited, so another old VFD would not fit in.
Therefore we have to create a board that will read the signals coming from the device through the 34-pin ribbon, and will send the characters to a new LCD display (connected to the board via I2C).

Teensy 4.x has 4 ports with 32 bits each. You can read an entire 32 bit port at once, which is a lot faster than reading it one bit at a time.
I understand, I have the following lines to read:
-D0 to D7 ---> 8 bits (or lines coming from the ribbon), the byte representing the character.
-/CS, /WR, A0 ---> 3 control bits (or lines coming from the ribbon), representing the control and synchronization signals.

In my opinion, these 11 lines are "time reading critical" because they are involved in the reading LOOP.

In addition, there are 4 other lines:
-BUSY that seems to be an OUTPUT line to indicate to the device that it has to wait because the display is not ready to receive the next characters.
-/BL --> a control line to indicate that we have to blank the display.
-T0 and RXD that seem useless for our project.

Consequently, it seems to me that we have 11+2=13 lines that are "time-reading" critical, and BUSY is an output.
All the even-numbered lines of the ribbon (2, 4, 6, ..., 34) are permanently connected to ground.
In addition we need SDA and SCL, to control the I2C bus.

Consequently it seems to me that we just need 13 lines in the reading loop: is it feasible to do this with a Teensy 4.0?

I do have a Teensy 4.1 in stock, but it is larger than the 4.0, so it will be more difficult to fit into the device chassis --> please give me your viewpoint (is there a diagram showing the link between pins and GPIO bits, to check whether a Teensy 4.0 can handle the lines on a single GPIO port?).

Not all 32 possible bits of port 1 are available, but 16 to 23 definitely are. One way to accomplish this would have D0 to D7 connect to the pins that are 1.16 to 1.23.

True.
I can do this with a 4.1; tell me if a 4.0 can do the same (it is just a matter of physical size for me, the cost does not matter).
Then you can read the entire 32 bits at once, and if you want the 8 data bits, just right shift by 16 and mask off the upper 24 bits.
I am lost: I don't know how to read the 8 lines simultaneously. Perhaps you can explain that to me when we review the program (which I am working on).

One other issue you might face is 5 volt signals. Teensy 4.x is not 5V tolerant. If this old system uses 5V signals, you'll need a buffer chip or other circuitry to reduce the signals to 3.3V level. For slower signals you can just use resistors. But for this sort of signal where you're dealing with timing in the ~100ns range, best to use a 3.3V buffer chip which has 5V tolerant input

I thought about this, and I planned to use 5V<->3V3 bidirectional converters that I already have in stock, like this one:
5V to 3V3 converter.jpg


Is it OK, or do you recommend something else?

Thank you very much for your help; in the meantime I will go back to the program...
 
Hi Paul,
This is what I suspected, so I will go with the 74LCX125.
But before getting into the "hard part" of the program and the motherboard PCB design, I would like your viewpoint on an approach that has been proposed by a friend of mine, and that matches one of my ideas:
-since the beginning of this project, I have always been surprised that we have to use a powerful 600MHz processor to read an interface that was designed in the 90s, and that feeds a display using a 3.68MHz resonator...
-so a friend has suggested using a set of 74HCT573 latches as inputs, and just reading their outputs with a common µC (he means a less powerful one, or at least one with no need to deal with tight timing constraints) that feeds an LCD via I2C.

The BUSY line could be asserted (HIGH) by the µC as soon as the /WR line goes high, to ensure that the device will not send any new character until the LCD has finished its work.

What do you think about this approach?

Thank you very much,
 
Yes, adding digital circuitry could lessen your need for fast response. Exactly what circuitry you will need, I can't say for sure. Just 1 latch might be enough, or you might need more.

Obviously if you need to change the circuitry, that's not only a matter of loading new firmware. Well, unless you use a programmable logic chip.

Even with circuitry added, you might consider the effect of asserting the busy signal for longer times than the original display does. Maybe it will have no noticeable effect, or maybe too much could alter the behavior of that old controller.
 
IMHO, a small FPGA might be a better fit for such a task. Or programmable state machine like PIO in RPico with its wait instruction.
 
Yes, adding digital circuitry could lessen your need for fast response. Exactly what circuitry you will need, I can't say for sure. Just 1 latch might be enough, or you might need more.
...
Even with circuitry added, you might consider the effect of asserting the busy signal for longer times than the original display does. Maybe it will have no noticeable effect, or maybe too much could alter the behavior of that old controller.
I agree.
No need to change the circuitry: the motherboard and program will be designed only for this project.
I am also (like you) a bit concerned about the "response delay" if I use a less powerful µC, so I will use a Teensy 4.x to get the best processing time.

The Teensy 4.0 would be a better option given the limited space available in the device: the 4.1 is bigger and it would be extremely difficult to install it in the device....

Can you tell me which pins are connected to the same GPIO port on a Teensy 4.0? I have not found this info for the 4.0, only for the 4.1 (I will use the 4.1 only if I have to, because of its size...).

Thank you very much !
P.S.:
IMHO, a small FPGA might be a better fit for such a task. Or programmable state machine like PIO in RPico with its wait instruction.
True, but I have never used an FPGA, so for this project I prefer to stay on a Teensy.
But a future project that I have in mind may require an FPGA or a DSP...
 
-since the beginning of this project, I have always been surprised that we have to use a powerful 600MHz processor, to read an interface that has been designed in the 90s, and that feeds a display using a 3.68MHz resonator...
The Teensy 4.x is more than capable of doing this, but it's not really correct to assume the clock speed of 600MHz puts it on a par with something like an FPGA that is designed to handle high speed IO. That speed is really only applicable to the CPU core, as far as IO speeds go the Teensy is limited to around only 200MHz.

If you want to know what pins are on the same port, look at KurtE's spreadsheet: https://github.com/KurtE/TeensyDocuments/blob/master/Teensy4x Pins.xlsx
The "Alt5" column shows the GPIO bank and bit for each pin.
 
Thank you jmarsh: very useful information !
I was in the process of designing the PCB of the motherboard; I am trying to use the Teensy 4.0 because of the very limited space available in the original device...
It seems that I can link all the necessary input lines to the same GPIO port, as follows:
Line A0 -> Pin 0 -> GPIO 1.03
Line /CS -> Pin 19 -> GPIO 1.16
Line /WR -> Pin 18 -> GPIO 1.17
Line D7 -> Pin 14 -> GPIO 1.18
Line D6 -> Pin 15 -> GPIO 1.19
Line D5 -> Pin 17 -> GPIO 1.22
Line D4 -> Pin 16 -> GPIO 1.23
Line D3 -> Pin 22 -> GPIO 1.24
Line D2 -> Pin 23 -> GPIO 1.25
Line D1 -> Pin 20 -> GPIO 1.26
Line D0 -> Pin 21 -> GPIO 1.27
The 2 remaining lines, BL and BUSY, are not involved in the reading loop, so it seems that they are less time-critical...

I also need an I2C output (SDA, SCL), and according to the Excel file they could be:
-Pins 24 & 25: Wire2(4) SCL and Wire2(4) SDA
or
-Pins 37 & 36: Wire1(3) SCL and Wire1(3) SDA
None of them is easily accessible from the edge of the T4... that complicates the PCB and/or the usage.

Regarding the reading of the GPIO port, I have briefly read the suggested post, and it seems that something like GPIO1_PSR plus masking and OR operations should be the right approach.

But it is beyond my knowledge; it is too difficult for me to understand...
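
A rough sketch of the masking and OR operations for the pin list above, in plain C++ so it can be tried on a PC. The `port` argument stands in for one 32-bit read of the port register, and all function and constant names are hypothetical. Since the data lines are not on consecutive bits, and D0 sits on the highest bit, each line is tested individually and OR-ed into place.

```cpp
#include <cassert>
#include <cstdint>

// Port bit for each data line, indexed by data line number (D0..D7),
// copied from the pin list above.
constexpr uint8_t kDataBit[8] = {27, 26, 25, 24, 23, 22, 19, 18};
// Control lines, also from the list above.
constexpr uint8_t kA0Bit = 3;
constexpr uint8_t kCsBit = 16;
constexpr uint8_t kWrBit = 17;

// True if the given bit (0..31) is set in the port value.
bool bitSet(uint32_t port, uint8_t bit) {
    assert(bit < 32);
    return ((port >> bit) & 1u) != 0;
}

// Rebuilds the byte D7..D0 from the scattered port bits.
uint8_t decodeData(uint32_t port) {
    uint8_t value = 0;
    for (uint8_t n = 0; n < 8; ++n) {
        if (bitSet(port, kDataBit[n])) {
            value |= static_cast<uint8_t>(1u << n);  // OR line Dn into bit n
        }
    }
    return value;
}

// /CS and /WR are active low: a write is in progress when both read 0.
bool writeActive(uint32_t port) {
    return !bitSet(port, kCsBit) && !bitSet(port, kWrBit);
}

// State of the A0 line.
bool a0High(uint32_t port) {
    return bitSet(port, kA0Bit);
}
```

On the real board the value passed in would be the port's status register. As far as I know, the Teensy 4 core routes these pins to the fast GPIO6 bank by default, so GPIO6_PSR rather than GPIO1_PSR may be the register to read; that is worth checking before relying on it.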
 