
Thread: Teensy 4.1: Storing the value of 18 pins' input quickly

  1. #1
    Junior Member
    Join Date
    Jun 2020
    Posts
    5

    Teensy 4.1: Storing the value of 18 pins' input quickly

    I'm building a capture card for a DSi console and am currently working on the part where I read the pixel data sent to the screen. The screen is an 18-bit display with a parallel RGB interface. I have to read the 18 bits within a window of 0.00016 ms (160 ns), many times in succession (~60,000 readings per frame). Between frames I have about 4 ms to send the frame data to a computer, but I'm not focusing on that part yet. I'm using a Teensy 4.1 board.

    The timing pin I have is the DCLK pin. I have to read the 18 bits and store them somewhere every time the DCLK pin changes (during the frame, it changes every ~0.00016 ms). I've calculated that each window is ~100 Teensy clock cycles when running the board at 600 MHz.
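For reference, the cycle budget can be checked with a few lines of arithmetic. This is only a sketch: `kWindowNs` and `cyclesPerWindow` are hypothetical names, and the 160 ns window and the two clock speeds are the figures quoted in this post.

```cpp
#include <cstdint>

// Pixel window from the post: 0.00016 ms = 160 ns.
constexpr double kWindowNs = 160.0;

// CPU cycles available in one pixel window at a given clock (in MHz):
// cycles = window [ns] * clock [MHz] / 1000, rounded down.
constexpr uint32_t cyclesPerWindow(double cpuMHz) {
  return static_cast<uint32_t>(kWindowNs * cpuMHz / 1000.0);
}

static_assert(cyclesPerWindow(600.0) == 96, "roughly 100 cycles at 600 MHz");
static_assert(cyclesPerWindow(816.0) == 130, "roughly 130 cycles at 816 MHz");
```

So the "~100 cycles" estimate is 96 cycles at 600 MHz, and overclocking to 816 MHz buys about 34 more.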

    I was able to easily get 1 bit of the 18 working with all the timing, but when I added the digitalRead of 5 other bits, it wasn't able to keep up at 600 MHz, and was only somewhat able to keep up when overclocked to 816 MHz (some pixels were missing, but most were there). Since I was able to get 1 bit of color working, I know that the hardware is correct.

    Pin Descriptions:
    GSP - When it falls it marks the start of a new frame
    GCK - Marks the beginning/end of rows
    DCLK - Pixel Clock - changes and iterates through pixels, row by row
    R0-R5 - 6 of the 18 bits of color. These update each time DCLK changes.

    I have been using FASTRUN and interrupts in order to solve this, though I could change to a loop which constantly checks waiting for the DCLK signal to change until the frame is over (the only things that change between frames are the 18 bits of color, all other pins repeat).
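The polling alternative could look roughly like the sketch below. It is a host-testable outline, not Teensy code: `captureOnClockChange` and its parameters are hypothetical names, `readPort` stands in for whatever returns the current pin state, and it assumes the clock bit and the data bits arrive in the same read.

```cpp
#include <cstddef>
#include <cstdint>

// Polls readPort() up to maxPolls times and stores one sample each time the
// clock bit differs from its previous level. Returns the number of samples
// stored. Stops early if the output buffer is full.
template <typename ReadPort>
size_t captureOnClockChange(ReadPort readPort, uint32_t clockMask,
                            uint32_t* out, size_t capacity, size_t maxPolls) {
  size_t count = 0;
  uint32_t lastClock = readPort() & clockMask;  // initial clock level
  for (size_t i = 0; i < maxPolls && count < capacity; ++i) {
    uint32_t value = readPort();
    uint32_t clock = value & clockMask;
    if (clock != lastClock) {   // clock edge, rising or falling
      out[count++] = value;     // data bits come from the same read
      lastClock = clock;
    }
  }
  return count;
}
```

Whether the data lines have settled at the moment the clock edge is seen depends on the display's timing, so on real hardware a second read after the edge may be needed.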

    Please let me know if you have any advice. I'm using the Arduino IDE with Teensyduino, though I'm open to switching to another method - I just haven't figured out how to.

    Code:
    #include <SD.h>
    
    File myFile;
    const int chipSelect = BUILTIN_SDCARD;
    volatile int state = 0;
    
    
    #define DCLK 2//Pixel Clock
    #define GCK 3//Essentially HSYNC
    //SP__ are speaker pins
    #define GSP 5//Essentially VSYNC
    
    #define R0 23
    #define R1 22
    #define R2 21
    #define R3 20
    #define R4 19
    #define R5 18
    //GSP should be the same as SPS
    
    
    
    volatile int numgck = 0;
    volatile int numclk = 0;
    FASTRUN volatile int filestream[100000];
    volatile int currentindex = 0;
    
    
    
    FASTRUN void gspinterrupt(){
      
      if (state==0){
    //This right here is just making sure it only reads one frame - for testing purposes
        Serial.println("Started2");
        state=1;
        addfilestream(4);
      }else{
        Serial.println("Writing files");
        detachInterrupt(GSP);
        detachInterrupt(GCK);
        detachInterrupt(DCLK);
        Serial.println("Writing files");
        for(int i=0;i<currentindex;i++){
          myFile.print(filestream[i]);
        }
        myFile.close();
        Serial.println("File Closed");
        while(1){
          
        }
      }
      
    }
    void gckrisinginterrupt(){
      //This isn't relevant to the problem I'm having, I just have it write a "3" in the file so I know when the GCK pin changes
      addfilestream(3);
      
      
    }
    FASTRUN void dclkinterrupt(){
      addfilestream(digitalRead(R5));
    
      //It was able to keep up with the screen before I added these digitalReads
      digitalRead(R4);
      digitalRead(R3);
      digitalRead(R2);
      digitalRead(R1);
      digitalRead(R0);
      
    }
    
    FASTRUN void addfilestream(int toAdd){
      filestream[currentindex]=toAdd;
      currentindex++;
    }
    
    //Have just under 4 ms between frames
    void setup() {
      // put your setup code here, to run once:
      pinMode(DCLK, INPUT);
      pinMode(GCK, INPUT);
      pinMode(GSP, INPUT);
      pinMode(R0, INPUT);
      pinMode(R1, INPUT);
      pinMode(R2, INPUT);
      pinMode(R3, INPUT);
      pinMode(R4, INPUT);
      pinMode(R5, INPUT);
      for(int i=0;i<100000;i++){
        filestream[i] = 6;
      }
      Serial.begin(115200);
      while (!Serial) {
      }
      SD.begin(chipSelect);
      myFile = SD.open("tost.txt", FILE_WRITE);
      attachInterrupt(GSP, gspinterrupt, FALLING);
      attachInterrupt(GCK, gckrisinginterrupt, RISING);
      attachInterrupt(DCLK, dclkinterrupt, CHANGE);
      
    }
    
    void loop(){
    
    }

  2. #2
    Junior Member
    Join Date
    Jul 2020
    Posts
    16
    I have been wondering the same question. I have a project where I am trying to get the relative timing of 34 pins going high and low. I haven't decided whether to use an interrupt on change on all of them, or just poll as quickly as possible.

    For the polling, I think you can just read the entire port data registers. From the Teensy core code (see https://github.com/PaulStoffregen/co...y4/core_pins.h), it looks like you should just be able to use GPIO<n>_GDIR and GPIO<n>_PSR to read whole swaths of pins, where <n> is {6,7,8,9}. That being said, I don't think that physically contiguous pins are necessarily on the same port data register. I think the max on any given port data register is 16 bits, so you will need to read from at least two registers. That should still be pretty fast, but you will need to unscramble the bits if they aren't in the order you want.
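The "unscramble" step could be sketched as a generic bit gather. `gatherBits` is a hypothetical helper, not part of the Teensy core, and a loop like this is better suited to post-processing than to a tight capture loop.

```cpp
#include <cstddef>
#include <cstdint>

// Builds a value whose bit i is taken from bit srcBits[i] of raw.
// n must not exceed 32 and every entry of srcBits must be below 32.
inline uint32_t gatherBits(uint32_t raw, const uint8_t* srcBits, size_t n) {
  uint32_t result = 0;
  for (size_t i = 0; i < n; ++i) {
    result |= ((raw >> srcBits[i]) & 1u) << i;
  }
  return result;
}
```

For example, with `srcBits = {12, 13, 16}` the register bits 12, 13 and 16 become bits 0, 1 and 2 of the result.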

    Can anyone with more Teensy 4.x experience verify that this is a valid way of reading many pins simultaneously?

  3. #3
    Senior Member
    Join Date
    May 2015
    Location
    USA
    Posts
    728
    You can quickly read 18 pins with one read to GPIO6. Wire them in this order:

    CORE_PIN24_BIT 12 GPIO6_DR
    CORE_PIN25_BIT 13 GPIO6_DR
    CORE_PIN19_BIT 16 GPIO6_DR
    CORE_PIN18_BIT 17 GPIO6_DR
    CORE_PIN14_BIT 18 GPIO6_DR
    CORE_PIN15_BIT 19 GPIO6_DR
    CORE_PIN40_BIT 20 GPIO6_DR
    CORE_PIN41_BIT 21 GPIO6_DR
    CORE_PIN17_BIT 22 GPIO6_DR
    CORE_PIN16_BIT 23 GPIO6_DR
    CORE_PIN22_BIT 24 GPIO6_DR
    CORE_PIN23_BIT 25 GPIO6_DR
    CORE_PIN20_BIT 26 GPIO6_DR
    CORE_PIN21_BIT 27 GPIO6_DR
    CORE_PIN38_BIT 28 GPIO6_DR
    CORE_PIN39_BIT 29 GPIO6_DR
    CORE_PIN26_BIT 30 GPIO6_DR
    CORE_PIN27_BIT 31 GPIO6_DR
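The same mapping can be written as a table, which also gives the mask of GPIO6 bits that carry data. This is a sketch: `PinBit`, `kGpio6Pins` and `gpio6Mask` are hypothetical names, and the pin/bit pairs are copied from the list above.

```cpp
#include <cstddef>
#include <cstdint>

struct PinBit {
  uint8_t pin;  // Teensy 4.1 pin number
  uint8_t bit;  // bit position in GPIO6_DR
};

// Pin -> GPIO6 bit, in wiring order (data bit 0 first).
constexpr PinBit kGpio6Pins[18] = {
  {24, 12}, {25, 13}, {19, 16}, {18, 17}, {14, 18}, {15, 19},
  {40, 20}, {41, 21}, {17, 22}, {16, 23}, {22, 24}, {23, 25},
  {20, 26}, {21, 27}, {38, 28}, {39, 29}, {26, 30}, {27, 31},
};

// ORs together one bit per table entry.
constexpr uint32_t gpio6Mask() {
  uint32_t mask = 0;
  for (size_t i = 0; i < 18; ++i) {
    mask |= (1u << kGpio6Pins[i].bit);
  }
  return mask;
}

// Bits 12-13 and 16-31 are used; bits 14-15 are the gap in the list.
static_assert(gpio6Mask() == 0xFFFF3000u, "18 data bits on GPIO6");
```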

  4. #4
    Junior Member
    Join Date
    Jul 2020
    Posts
    16
    You are correct. I was mistakenly looking at the Teensy 4.0 section of that header file when counting.

    As an additional question (though I should go home and test this on the scope), any idea how long it takes to poll each GPIOn_GDIR port? Can it be read in a single instruction cycle?

  5. #5
    Senior Member
    Join Date
    May 2015
    Location
    USA
    Posts
    728
    I/O ports are slow, maybe 13ns. This should be the fastest way to read 18 parallel pins on a T4.1:

    Code:
    #define IMXRT_GPIO6_DIRECT  (*(volatile uint32_t *)0x42000000)
    
    inline uint32_t fixbits()
    {
      register uint32_t data  = IMXRT_GPIO6_DIRECT;  // 0B11111111111111110011000000000000
      register uint32_t data2 = data >> 12;
      asm volatile("bfi %0, %1, 14, 2" : "+r"(data) : "r"(data2));
      return (data >> 14);
    }
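For checking the bit layout on a PC, the packing done by `fixbits()` can be written without inline assembly. This is a sketch of an equivalent, not a replacement for the version above: `packGpio6` is a hypothetical name and takes the raw register value as an argument instead of reading the hardware.

```cpp
#include <cstdint>

// Portable equivalent of the bfi-based packing: drops the unused bits 14-15
// so that bits 12-13 and 16-31 of raw become one contiguous 18-bit value.
constexpr uint32_t packGpio6(uint32_t raw) {
  return ((raw >> 16) << 2) | ((raw >> 12) & 0x3u);
}

static_assert(packGpio6(0xFFFF3000u) == 0x3FFFFu, "all 18 inputs high");
static_assert(packGpio6(0x0000C000u) == 0u, "bits 14-15 are ignored");
static_assert(packGpio6(0x00001000u) == 1u, "GPIO6 bit 12 -> result bit 0");
static_assert(packGpio6(0x00010000u) == 4u, "GPIO6 bit 16 -> result bit 2");
```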

  6. #6
    Junior Member
    Join Date
    Jul 2020
    Posts
    16
    I can do all my timing analysis as a post-processing step, so my plan was to do something like the following (except extending it to read all of GPIO6, GPIO7, GPIO8, and GPIO9, which span my 34 inputs). Maybe polling in this manner will actually be slower than just using an interrupt on change, but it seems like it should be pretty fast and limit how much memory is used. Basically, I can expect that only about 4-10 of my inputs will go high then low for any particular experiment, and I'm just trying to figure out how to capture the most accurate timing of those rising and falling edges. Detecting the timing of the edges is literally the only thing the Teensy 4.1 will be doing during this time period, which is why I thought polling would be best.

    Code:
    // UNTESTED CODE
    #define READ_TIME_OFFSET 4
    #define IMXRT_GPIO6_DIRECT  (*(volatile uint32_t *)0x42000000)
    uint32_t gpio6Idx = 0;
    uint32_t gpio6Vals[100]; // I have 100 because there may be some bouncing on occasion that I can filter later, but should have a maximum of about 20 on a perfect trial
    long unsigned gpio6Times[100];
    
    // TODO: add code for GPIO7,GPIO8, GPIO9 also
    
    inline void captureTimeChange()
    {
      register uint32_t data  = IMXRT_GPIO6_DIRECT;  // 0B11111111111111110011000000000000
      // TODO: Add code to read GPIO7, GPIO8, GPIO9 also
      long unsigned time = ARM_DWT_CYCCNT - READ_TIME_OFFSET;
      // Record only changes, and never write past the end of the 100-entry arrays
      if (gpio6Idx < 100 && (gpio6Idx == 0 || data != gpio6Vals[gpio6Idx-1]))
      {
        gpio6Vals[gpio6Idx] = data;
        gpio6Times[gpio6Idx] = time;
        gpio6Idx++;
      }
    }
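For the post-processing step, the stored cycle counts can be turned into time differences roughly as follows. This is a sketch: `elapsedCycles` and `cyclesToNs` are hypothetical helpers, and the CPU clock is assumed to be known and constant during the capture.

```cpp
#include <cstdint>

// Elapsed cycles between two readings of a free-running 32-bit cycle counter.
// Unsigned subtraction stays correct across a single wrap-around.
constexpr uint32_t elapsedCycles(uint32_t start, uint32_t end) {
  return end - start;
}

// Converts a cycle count to nanoseconds for a given CPU clock in MHz.
constexpr double cyclesToNs(uint32_t cycles, double cpuMHz) {
  return cycles * 1000.0 / cpuMHz;
}
```

At 600 MHz a 32-bit cycle counter wraps about every 7.16 seconds, so differences are only meaningful between edges that are closer together than that.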

  7. #7
    Senior Member
    Join Date
    May 2015
    Location
    USA
    Posts
    728
    Polling 3 ports for 34 pins sounds reasonable. There will be some jitter in your timing results.

  8. #8
    Junior Member
    Join Date
    Jun 2020
    Posts
    5
    Quote: Originally posted by jonr
    You can quickly read 18 pins with one read to GPIO6. Wire them in this order:

    CORE_PIN24_BIT 12 GPIO6_DR
    CORE_PIN25_BIT 13 GPIO6_DR
    CORE_PIN19_BIT 16 GPIO6_DR
    CORE_PIN18_BIT 17 GPIO6_DR
    CORE_PIN14_BIT 18 GPIO6_DR
    CORE_PIN15_BIT 19 GPIO6_DR
    CORE_PIN40_BIT 20 GPIO6_DR
    CORE_PIN41_BIT 21 GPIO6_DR
    CORE_PIN17_BIT 22 GPIO6_DR
    CORE_PIN16_BIT 23 GPIO6_DR
    CORE_PIN22_BIT 24 GPIO6_DR
    CORE_PIN23_BIT 25 GPIO6_DR
    CORE_PIN20_BIT 26 GPIO6_DR
    CORE_PIN21_BIT 27 GPIO6_DR
    CORE_PIN38_BIT 28 GPIO6_DR
    CORE_PIN39_BIT 29 GPIO6_DR
    CORE_PIN26_BIT 30 GPIO6_DR
    CORE_PIN27_BIT 31 GPIO6_DR
    Does this mean that I should connect the wires to 12, 13, 16, 17... or connect them to 24, 25, 19, 18, 14...

    Regarding your code sample, I'm not very experienced with C++. How would I break it up into 3 bytes, with 6 of the pins in each byte? What would be the fastest way to store multiple readings into memory?

    Sorry for all the questions, but thank you so much for helping me!

  9. #9
    Senior Member
    Join Date
    May 2015
    Location
    USA
    Posts
    728
    > or connect them to 24, 25, 19, 18, 14...

    Yes, pin 24 will be bit 0, 25 will be bit 1, etc.

  10. #10
    Quote: Originally posted by Gymnast544
    Does this mean that I should connect the wires to 12, 13, 16, 17... or connect them to 24, 25, 19, 18, 14...
    You would connect one color (e.g. red) to pins 24 (LSB), 25, 19, 18, 14, 15 (MSB). The next color (green) to 40 (LSB) through 23 (MSB), and last color (blue) to 20 (LSB) through 27 (MSB).

    Quote: Originally posted by Gymnast544
    How would I break it up into 3 bytes, with 6 of the pins in each byte?
    Code:
    rawValue = GPIO6_PSR;
    
    red = ((rawValue & 0x000F0000) >> 14) + ((rawValue & 0x00003000) >> 12);
    green = (rawValue & 0x03F00000) >> 20;
    blue = (rawValue & 0xFC000000) >> 26;
    (code has not been compiled or tested)
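A version of the same masks and shifts that can be compiled and checked on a PC might look like this. It is a sketch: `Rgb666` and `splitGpio6` are hypothetical names, and the raw register value is passed in as an argument instead of being read from `GPIO6_PSR`.

```cpp
#include <cstdint>

struct Rgb666 {
  uint8_t red;    // 6 bits, from GPIO6 bits 12-13 and 16-19
  uint8_t green;  // 6 bits, from GPIO6 bits 20-25
  uint8_t blue;   // 6 bits, from GPIO6 bits 26-31
};

// Splits one raw GPIO6 reading into three 6-bit color values.
constexpr Rgb666 splitGpio6(uint32_t raw) {
  return Rgb666{
    static_cast<uint8_t>(((raw & 0x000F0000u) >> 14) | ((raw & 0x00003000u) >> 12)),
    static_cast<uint8_t>((raw & 0x03F00000u) >> 20),
    static_cast<uint8_t>((raw & 0xFC000000u) >> 26)
  };
}

static_assert(splitGpio6(0xFFFF3000u).red == 0x3F, "all red bits set");
static_assert(splitGpio6(0x00100000u).green == 1, "GPIO6 bit 20 is green LSB");
static_assert(splitGpio6(0x04000000u).blue == 1, "GPIO6 bit 26 is blue LSB");
```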
    Quote: Originally posted by Gymnast544
    What would be the fastest way to store multiple readings into memory?
    With 100 clock cycles, there should be time to write each value to a byte array and increment the index. For example:
    Code:
    bytes[index++] = red;
    bytes[index++] = green;
    bytes[index++] = blue;
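With a bounds check added, the storage step might be wrapped up as below. This is a sketch: `FrameBuffer`, `push` and the constants are hypothetical names, and the 60,000 readings per frame is the approximate figure from the first post, which works out to 180,000 bytes at 3 bytes per reading.

```cpp
#include <cstddef>
#include <cstdint>

constexpr size_t kReadingsPerFrame = 60000;  // approximate, from the first post
constexpr size_t kBytesPerReading = 3;       // red, green, blue
constexpr size_t kFrameBytes = kReadingsPerFrame * kBytesPerReading;  // 180,000

struct FrameBuffer {
  uint8_t bytes[kFrameBytes];
  size_t index = 0;

  // Appends one pixel; returns false and stores nothing when the buffer is full.
  bool push(uint8_t red, uint8_t green, uint8_t blue) {
    if (index + kBytesPerReading > kFrameBytes) return false;
    bytes[index++] = red;
    bytes[index++] = green;
    bytes[index++] = blue;
    return true;
  }
};
```

The check costs one compare per pixel and keeps an unexpectedly long frame from writing past the end of the array.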

  11. #11
    Junior Member
    Join Date
    Jun 2020
    Posts
    5
    Thank you so much! I'll try it out soon.
