Teensy 4.1: Storing the value of 18 pins' input quickly

Status
Not open for further replies.
I'm making a capture card for a DSi console and am currently working on the part that reads the data going to the screen. The screen is an 18-bit display with a parallel RGB interface. I have to read the 18 bits within a window of 0.00016 ms (160 ns), many times in succession (a total of ~60,000 readings per frame). Between frames I have about 4 ms to send the frame data to a computer, but I'm not focusing on that part yet. I'm using a Teensy 4.1 board.

The timing pin I have is the DCLK pin. I have to read the 18 bits and store them somewhere every time the DCLK pin changes (during the frame, it changes every ~0.00016 ms). I've calculated that this gives me ~100 CPU clock cycles per DCLK change when running the Teensy at 600 MHz.
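The ~100-cycle budget follows from the numbers in the post: 0.00016 ms is 160 ns, and one cycle at 600 MHz lasts about 1.67 ns, giving 96 cycles per DCLK change (about 130 at 816 MHz). A minimal sketch of that arithmetic, assuming the 160 ns spacing is exact; `cyclesPerEdge` is a hypothetical helper:

```cpp
#include <cstdint>

// Number of whole CPU cycles between two DCLK changes.
// One cycle lasts 1000 / cpuMhz ns, so the budget is periodNs * cpuMhz / 1000.
// Integer division rounds down, which gives the conservative (safe) budget.
constexpr uint32_t cyclesPerEdge(uint32_t periodNs, uint32_t cpuMhz) {
    return periodNs * cpuMhz / 1000;
}

// Assumption: DCLK changes every 160 ns, as stated in the post.
constexpr uint32_t kDclkPeriodNs = 160;
```

Everything done per pixel (reading the pins, splitting the bits, storing, updating the index, plus interrupt entry/exit if interrupts are used) has to fit inside that budget.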

I was easily able to get 1 of the 18 bits working with correct timing, but when I added digitalRead calls for 5 other bits, it couldn't keep up at 600 MHz, and it mostly kept up when overclocked to 816 MHz (some pixels were missing, but most were there). Since I was able to get 1 bit of color working, I know that the hardware is correct.

Pin Descriptions:
GSP - When it falls it marks the start of a new frame
GCK - Marks the beginning/end of rows
DCLK - Pixel Clock - changes and iterates through pixels, row by row
R0-R5 - 6 of the 18 bits of color. These update each time DCLK changes.

I have been using FASTRUN and interrupts to solve this, though I could switch to a loop that constantly polls the DCLK pin for changes until the frame is over (the only things that change between frames are the 18 bits of color; all other pins repeat).
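A polling loop like the one described would store one sample each time the DCLK level differs from the previous poll, which is equivalent to triggering on CHANGE. The sketch below only simulates that on a PC: each element of `samples` stands in for one read of a whole 32-bit GPIO port, and `captureOnClockChange` is a hypothetical name. On the Teensy, DCLK may sit on a different port than the data pins, in which case it would be a separate read inside the loop.

```cpp
#include <cstddef>
#include <cstdint>

// Simulated polling capture: walk through successive port snapshots and store
// the snapshot whenever the DCLK bit (selected by clkMask) has changed since
// the previous poll. Returns the number of snapshots stored in out[].
size_t captureOnClockChange(const uint32_t* samples, size_t n, uint32_t clkMask,
                            uint32_t* out, size_t outCap) {
    if (n == 0) return 0;
    size_t count = 0;
    uint32_t lastClk = samples[0] & clkMask;  // first poll only sets the reference level
    for (size_t i = 1; i < n && count < outCap; ++i) {
        uint32_t clk = samples[i] & clkMask;
        if (clk != lastClk) {                 // DCLK toggled: a new pixel is on the bus
            out[count++] = samples[i];
            lastClk = clk;
        }
    }
    return count;
}
```

The approach only works if the loop polls faster than DCLK changes; a poll that arrives late simply misses a pixel, so the cycle budget still applies.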

Please let me know if you have any advice. I'm using the Arduino IDE with Teensyduino, though I'm open to switching to another method - I just haven't figured out how to.

Code:
#include <SD.h>

File myFile;
const int chipSelect = BUILTIN_SDCARD;
volatile int state = 0;


#define DCLK 2//Pixel Clock
#define GCK 3//Essentially HSYNC
//SP__ are speaker pins
#define GSP 5//Essentially VSYNC

#define R0 23
#define R1 22
#define R2 21
#define R3 20
#define R4 19
#define R5 18
//GSP should be the same as SPS



volatile int numgck = 0;
volatile int numclk = 0;
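// FASTRUN is meant for functions; on Teensy 4.x global variables already go in fast RAM1 (DTCM) by default
volatile int filestream[100000];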
volatile int currentindex = 0;



FASTRUN void gspinterrupt(){
  
  if (state==0){
//This right here is just making sure it only reads one frame - for testing purposes
    Serial.println("Started2");
    state=1;
    addfilestream(4);
  }else{
    Serial.println("Writing files");
    detachInterrupt(GSP);
    detachInterrupt(GCK);
    detachInterrupt(DCLK);
    Serial.println("Writing files");
    for(int i=0;i<currentindex;i++){ // currentindex is one past the last stored value
      myFile.print(filestream[i]);
    }
    myFile.close();
    Serial.println("File Closed");
    while(1){
      
    }
  }
  
}
void gckrisinginterrupt(){
  //This isn't relevant to the problem I'm having, I just have it write a "3" in the file so I know when the GCK pin changes
  addfilestream(3);
  
  
}
FASTRUN void dclkinterrupt(){
  addfilestream(digitalRead(R5));

  //It was able to keep up with the screen before I added these digitalReads
  digitalRead(R4);
  digitalRead(R3);
  digitalRead(R2);
  digitalRead(R1);
  digitalRead(R0);
  
}

FASTRUN void addfilestream(int toAdd){
  if (currentindex < 100000) {  // don't write past the end of filestream[]
    filestream[currentindex]=toAdd;
    currentindex++;
  }
}

//Have just under 4 ms between frames
void setup() {
  // put your setup code here, to run once:
  pinMode(DCLK, INPUT);
  pinMode(GCK, INPUT);
  pinMode(GSP, INPUT);
  pinMode(R0, INPUT);
  pinMode(R1, INPUT);
  pinMode(R2, INPUT);
  pinMode(R3, INPUT);
  pinMode(R4, INPUT);
  pinMode(R5, INPUT);
  for(int i=0;i<100000;i++){ // pre-fill the buffer with a marker value (6)
    filestream[i] = 6;
  }
  Serial.begin(115200);
  while (!Serial) {
  }
  SD.begin(chipSelect);
  myFile = SD.open("tost.txt", FILE_WRITE);
  attachInterrupt(GSP, gspinterrupt, FALLING);
  attachInterrupt(GCK, gckrisinginterrupt, RISING);
  attachInterrupt(DCLK, dclkinterrupt, CHANGE);
  
}

void loop(){

}
 
I have been wondering about the same thing. I have a project where I am trying to get the relative timing of 34 pins going high and low. I haven't decided whether to use an interrupt on change on all of them, or just poll as quickly as possible.

For the polling, I think you can just read the entire port data registers. From the Teensy core code (see https://github.com/PaulStoffregen/c...9e3a4c0ba7a7d4786a18cfbe0/teensy4/core_pins.h), it looks like you should be able to use GPIO<n>_DR or GPIO<n>_PSR to read whole swaths of pins at once, where <n> is {6,7,8,9} (GPIO<n>_GDIR only sets the pin directions). That said, I don't think physically adjacent pins are necessarily on the same port register. I think the max on any given port is 16 bits, so you will need to read from at least two registers. That should still be pretty fast, and you will need to unscramble the bits if they aren't in the order you want.

Can anyone with more Teensy 4.x experience verify that this is a valid way of reading many pins simultaneously?
 
You can quickly read 18 pins with one read to GPIO6. Wire them in this order:

CORE_PIN24_BIT 12 GPIO6_DR
CORE_PIN25_BIT 13 GPIO6_DR
CORE_PIN19_BIT 16 GPIO6_DR
CORE_PIN18_BIT 17 GPIO6_DR
CORE_PIN14_BIT 18 GPIO6_DR
CORE_PIN15_BIT 19 GPIO6_DR
CORE_PIN40_BIT 20 GPIO6_DR
CORE_PIN41_BIT 21 GPIO6_DR
CORE_PIN17_BIT 22 GPIO6_DR
CORE_PIN16_BIT 23 GPIO6_DR
CORE_PIN22_BIT 24 GPIO6_DR
CORE_PIN23_BIT 25 GPIO6_DR
CORE_PIN20_BIT 26 GPIO6_DR
CORE_PIN21_BIT 27 GPIO6_DR
CORE_PIN38_BIT 28 GPIO6_DR
CORE_PIN39_BIT 29 GPIO6_DR
CORE_PIN26_BIT 30 GPIO6_DR
CORE_PIN27_BIT 31 GPIO6_DR
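Written as a table in code, the list is easy to sanity-check: the 18 pins should occupy GPIO6 bits 12-13 and 16-31, which is the mask 0xFFFF3000. A small sketch; the pin/bit pairs are copied from the list, while the names `PinBit`, `kGpio6Pins`, and `gpio6DataMask` are made up for illustration:

```cpp
#include <cstddef>
#include <cstdint>

// Teensy 4.1 pin number -> GPIO6 bit, in the wiring order given above.
struct PinBit { uint8_t pin; uint8_t bit; };

constexpr PinBit kGpio6Pins[18] = {
    {24, 12}, {25, 13}, {19, 16}, {18, 17}, {14, 18}, {15, 19},
    {40, 20}, {41, 21}, {17, 22}, {16, 23}, {22, 24}, {23, 25},
    {20, 26}, {21, 27}, {38, 28}, {39, 29}, {26, 30}, {27, 31},
};

// OR together one bit per pin: the result marks which GPIO6 bits carry data.
// If two pins shared a bit, the mask would have fewer than 18 bits set.
constexpr uint32_t gpio6DataMask() {
    uint32_t mask = 0;
    for (size_t i = 0; i < 18; ++i) mask |= (uint32_t{1} << kGpio6Pins[i].bit);
    return mask;
}
```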
 
You are correct. I was mistakenly looking at the Teensy 4.0 section of that header file when counting :(

As an additional question (though I should go home and test this on the scope), any idea how long it takes to read each GPIO<n> port register? Can it be read in a single instruction cycle?
 
I/O ports are slow, maybe 13ns. This should be the fastest way to read 18 parallel pins on a T4.1:

Code:
#define IMXRT_GPIO6_DIRECT  (*(volatile uint32_t *)0x42000000)

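// Returns the 18 data pins packed into bits 0..17 (pin 24 = bit 0, pin 25 = bit 1, ...)
inline uint32_t fixbits()
{
  uint32_t data  = IMXRT_GPIO6_DIRECT;  // used bits: 0B11111111111111110011000000000000
  uint32_t data2 = data >> 12;          // move bits 12-13 (pins 24, 25) down to bits 0-1
  // bit-field insert: copy the low 2 bits of data2 into bits 14-15 of data,
  // so all 18 data bits become contiguous in bits 14..31
  asm volatile("bfi %0, %1, 14, 2" : "+r"(data) : "r"(data2));
  return (data >> 14);
}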
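Because `bfi` is ARM-only inline assembly, it can help to check the bit shuffle on a desktop first. This plain-C++ sketch (the name `packGpio6` is hypothetical) is intended to give the same 18-bit result as `fixbits()` for a given raw GPIO6 value; the compiler may or may not turn it into a `bfi`, so the asm version could still be faster on the Teensy.

```cpp
#include <cstdint>

// Portable equivalent of fixbits(): takes a raw 32-bit GPIO6 value and packs
// the 18 data bits contiguously. GPIO6 bits 12-13 (pins 24, 25) become bits 0-1,
// GPIO6 bits 16-31 become bits 2-17. All other GPIO6 bits are discarded.
constexpr uint32_t packGpio6(uint32_t data) {
    return ((data >> 16) << 2) | ((data >> 12) & 0x3u);
}
```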
 
I can do all my timing analysis as a post-processing step, so my plan was to do something like the following (except extended to read all of GPIO6, GPIO7, GPIO8, and GPIO9, which span my 34 inputs). Maybe polling in this manner will actually be slower than just using an interrupt on change, but it seems like it should be pretty fast and it limits how much memory is used. I expect that only about 4-10 of my inputs will go high and then low in any particular experiment, and I'm trying to capture the most accurate possible timing of those rising and falling edges. Detecting the edge timing is literally the only thing the Teensy 4.1 will be doing during this period, which is why I thought polling would be best.

Code:
// UNTESTED CODE
#define READ_TIME_OFFSET 4
#define IMXRT_GPIO6_DIRECT  (*(volatile uint32_t *)0x42000000)
#define MAX_CHANGES 100
uint32_t gpio6Idx = 0;
uint32_t gpio6Vals[MAX_CHANGES]; // I have 100 because there may be some bouncing on occasion that I can filter later, but should have a maximum of about 20 on a perfect trial
uint32_t gpio6Times[MAX_CHANGES]; // ARM_DWT_CYCCNT is a 32-bit counter

// TODO: add code for GPIO7,GPIO8, GPIO9 also

inline void captureTimeChange()  // results are stored in the arrays above
{
  uint32_t data  = IMXRT_GPIO6_DIRECT;  // 0B11111111111111110011000000000000
  // TODO: Add code to read GPIO7, GPIO8, GPIO9 also
  uint32_t time = ARM_DWT_CYCCNT - READ_TIME_OFFSET;
  // log only when the port value changed, and never write past the end of the arrays
  if (gpio6Idx < MAX_CHANGES && (gpio6Idx == 0 || data != gpio6Vals[gpio6Idx-1]))
  {
    gpio6Vals[gpio6Idx] = data;
    gpio6Times[gpio6Idx] = time;
    gpio6Idx++;
  }
}
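Since the timing analysis happens afterwards, the logged (value, timestamp) pairs can be turned into per-pin edges on the PC. A minimal sketch, assuming the values and `ARM_DWT_CYCCNT` timestamps have been copied off the Teensy as 32-bit arrays; the names `Edge`, `extractEdges`, and `cyclesBetween` are hypothetical:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// One rising or falling edge on one GPIO bit.
struct Edge {
    uint8_t bit;      // GPIO bit that changed
    bool rising;      // true for low->high, false for high->low
    uint32_t cycles;  // cycle-count timestamp logged with the change
};

// Compare each logged value with the previous one; every bit that flipped
// becomes one edge. The first entry is only the starting state.
std::vector<Edge> extractEdges(const uint32_t* vals, const uint32_t* times, size_t n) {
    std::vector<Edge> edges;
    for (size_t i = 1; i < n; ++i) {
        uint32_t changed = vals[i] ^ vals[i - 1];
        for (uint8_t b = 0; b < 32; ++b) {
            if (changed & (uint32_t{1} << b)) {
                bool rising = (vals[i] >> b) & 1u;
                edges.push_back({b, rising, times[i]});
            }
        }
    }
    return edges;
}

// Elapsed cycles between two timestamps. Unsigned subtraction stays correct
// across one wrap of the 32-bit counter.
constexpr uint32_t cyclesBetween(uint32_t start, uint32_t end) {
    return end - start;
}
```

At 600 MHz the 32-bit cycle counter wraps roughly every 7.2 s, so intervals longer than that cannot be recovered from the counter alone.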
 
Polling 3 ports for 34 pins sounds reasonable. There will be some jitter in your timing results.
 

Does this mean that I should connect the wires to 12, 13, 16, 17... or connect them to 24, 25, 19, 18, 14...

Regarding your code sample, I'm not very experienced with C++. How would I break it up into 3 bytes, with 6 of the pins in each byte? What would be the fastest way to store multiple readings into memory?

Sorry for all the questions, but thank you so much for helping me!
 
> or connect them to 24, 25, 19, 18, 14...

Yes, pin 24 will be bit 0, 25 will be bit 1, etc.
 
> Does this mean that I should connect the wires to 12, 13, 16, 17... or connect them to 24, 25, 19, 18, 14...

You would connect one color (e.g. red) to pins 24 (LSB), 25, 19, 18, 14, 15 (MSB). The next color (green) to 40 (LSB) through 23 (MSB), and last color (blue) to 20 (LSB) through 27 (MSB).

> How would I break it up into 3 bytes, with 6 of the pins in each byte?

Code:
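uint32_t rawValue = GPIO6_PSR;

uint8_t red = ((rawValue & 0x000F0000) >> 14) | ((rawValue & 0x00003000) >> 12); // pins 24,25 -> bits 0-1; 19,18,14,15 -> bits 2-5
uint8_t green = (rawValue & 0x03F00000) >> 20; // pins 40 (LSB) through 23 (MSB)
uint8_t blue = (rawValue & 0xFC000000) >> 26;  // pins 20 (LSB) through 27 (MSB)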
(code has not been compiled or tested)
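The masks can be checked off-target before wiring anything up. A minimal host-side sketch, assuming the wiring described above; `Rgb666`, `splitGpio6`, and the test-only inverse `joinGpio6` are hypothetical names:

```cpp
#include <cstdint>

struct Rgb666 { uint8_t r, g, b; };  // three 6-bit color values

// Same masks and shifts as the snippet above, applied to a given raw GPIO6 value.
constexpr Rgb666 splitGpio6(uint32_t raw) {
    return Rgb666{
        static_cast<uint8_t>(((raw & 0x000F0000u) >> 14) | ((raw & 0x00003000u) >> 12)),
        static_cast<uint8_t>((raw & 0x03F00000u) >> 20),
        static_cast<uint8_t>((raw & 0xFC000000u) >> 26),
    };
}

// Inverse mapping, only for building test values: where each 6-bit color
// lands in the GPIO6 word (red bits 0-1 -> 12-13, red bits 2-5 -> 16-19,
// green -> 20-25, blue -> 26-31).
constexpr uint32_t joinGpio6(uint8_t r, uint8_t g, uint8_t b) {
    return ((uint32_t{r} & 0x3u) << 12) | (((uint32_t{r} >> 2) & 0xFu) << 16) |
           ((uint32_t{g} & 0x3Fu) << 20) | ((uint32_t{b} & 0x3Fu) << 26);
}
```

A round trip through `joinGpio6` and `splitGpio6` should give back the original three values for every 6-bit input.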
> What would be the fastest way to store multiple readings into memory?
With 100 clock cycles, there should be time to write each value to a byte array and increment the index. For example:
Code:
bytes[index++] = red;
bytes[index++] = green;
bytes[index++] = blue;
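A possible variation, offered as an assumption rather than a tested recommendation: during the frame, store only the raw 32-bit GPIO6 word per pixel (one load and one store), and split it into 3 bytes afterwards, for example in the ~4 ms gap between frames. For ~60,000 pixels that costs about 240 KB of RAM during capture instead of about 180 KB. A host-side sketch of the conversion step; `rawToRgbBytes` is a hypothetical name:

```cpp
#include <cstddef>
#include <cstdint>

// Convert raw GPIO6 words (one per pixel) into 3 bytes per pixel: red, green, blue.
// Stops early if the output buffer is too small. Returns the number of bytes written.
size_t rawToRgbBytes(const uint32_t* raw, size_t pixels, uint8_t* bytes, size_t cap) {
    size_t index = 0;
    for (size_t i = 0; i < pixels && index + 3 <= cap; ++i) {
        uint32_t v = raw[i];
        bytes[index++] = static_cast<uint8_t>(((v & 0x000F0000u) >> 14) | ((v & 0x00003000u) >> 12)); // red
        bytes[index++] = static_cast<uint8_t>((v & 0x03F00000u) >> 20);  // green
        bytes[index++] = static_cast<uint8_t>((v & 0xFC000000u) >> 26);  // blue
    }
    return index;
}
```

Whether this beats splitting inside the capture loop depends on how many of the ~96 cycles the split actually takes, which would need to be measured on the board.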
 
Thank you so much! I'll try it out soon.
 