Thread: Simultaneously reading 8 GPIO pins

  1. #1

    Hello everyone,

    I am trying to read and store eight (8) GPIO pins simultaneously. The ultimate goal is to complete this operation in as few CPU clock cycles as possible. One clock cycle would be ideal, but is most likely infeasible. I can't exceed ~10 instruction clocks, since I am only allotted 18 clock pulses total for the loop I am writing, and I need some wiggle room for the other operation, which is to write a pin high and then low again after the 8 bits are stored. AVR microcontrollers made this easier with the PINx register, but the ARM microcontroller doesn't seem to support it. And since I am limited in timing, I can't use the built-in functions like digitalRead(), digitalWrite(), etc., since each one consumes ~50 clock cycles according to post #3 on an Arduino forum: https://forum.arduino.cc/t/how-many-...-take/467153/2

    I have searched these forums extensively and came across some information, such as using IOMUX to force the pins to act as GPIO when ALT5 is used, and that GPIOx_GDIR = 0xFFFF will set the port to be input, after which GPIOx_DR will read the whole port. I am aware that a port read returns 32 bits, so I will need to mask off the bits I don't need and store only the 8 bits of interest.

    I also need some way to write a pin high and low on command, preferably in a single clock pulse of the 600 MHz CPU clock if possible, so once again digitalWrite(), and I suspect even digitalWriteFast(), is not fast enough.

    A few of the relevant forums/sources I have found so far:
    https://forum.pjrc.com/threads/64702...ntrol-register
    https://forum.pjrc.com/threads/54711...irst-Beta-Test
    https://forum.pjrc.com/threads/61497...lve-my-problem
    The IMXRT1060 datasheet

    I am new to programming microcontrollers with C/Arduino-C/Teensyduino. I have only been trained to use assembly for microcontroller programming, and I am very unsure how to program this loop to read and store the 8 parallel pins. I would use a standalone MCU, but adding a memory peripheral and interfacing with it in assembly sounds like more effort than it's worth, so I have landed on the Teensy 4.0.

    Any feedback on how to program this beyond just being told to read the datasheet would be greatly appreciated since the datasheet is pretty convoluted in my opinion.

    And thank you Paul, KurtE and TRNPep for your previous posts which have guided me up to this point.
    ~RRkt

  2. #2
    defragster (Senior Member+)
    Join Date
    Feb 2015
    Posts
    15,069
    Each Teensy 4.x port can be read with a single 32-bit read that returns all of its pins at once; the Teensy pins are 'distributed' across the bits of that word.

    The PJRC schematic, or other pin info, shows which pins are on which port and in what order.

    By reading that port, all pin values can then be seen and recorded at once.

    See this post and related :: Does-T4-0-have-PORTS-like-read-8-pins-at-once

    That links to Teensy-4-1-Storing-the-value-of-18-pins-input-quickly

    The concept is the same; only the pin mapping changes, as noted in the schematic or the linked @KurtE info.

  3. #3
    Senior Member
    Join Date
    May 2015
    Location
    USA
    Posts
    1,070
    This T4.1 code reads 12 usable bits of input and synchronizes reads to an external clock signal:

    Code:
        for (register unsigned i = 0; i < FFTSize; ++i) {
          register uint32_t data;
    
          do {    // wait for clock low (could also store this falling edge data)
          } while (clock_bit(GPIO6));
    
          do {    // wait for rising edge
            data = GPIO6;
          } while (!clock_bit(data));
    
          input[scan][i] = data >> 16;    // store the data
    
        } // for samples

  4. #4
    Thank you both for the replies so far.
    Here is the pseudocode I have thought up. Obviously, the code is not real, because I am still unsure how most of the built-in functions/commands work, such as the IOMUX ALT5 stuff that makes the pins behave as GPIO.

    Code:
    MasterClock = PinX //External clock
    SlaveClock = PinY //External clock
    SerialClock = PinZ //Coded clock used to drive a peripheral
    
    [Declare IOMUX somehow] //FIXME: Figure out how to use the IOMUX command to set port 'x' as GPIOx
    GPIOx_GDIR = 0xFFFF; //Set port 'x' to be input
    
    while(MasterClock == high){
         while(SlaveClock == high){
              SerialClock = high; //FIXME: How to write a single pin high without using digitalWrite(pin)
              SomeBuffer = GPIOx_DR; //FIXME: What type of array/matrix to use to store 8-bits
              SerialClock = low; //FIXME: How to write a single pin low without using digitalWrite(pin)
              break; //break out of the SlaveClock loop and wait for SlaveClock to go high again
         }
    }
    One problem I might encounter: if I use GPIOx_DR to poll the whole port, it might disrupt the logic level of the pin that feeds a peripheral as its clock. Additionally, I don't know how to set a single pin high at the beginning of the "SlaveClock" loop and then set it back to low at the end of the loop while leaving all other pins unaffected.
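
On the two worries above, a hedged note: reading a GPIO data register does not change any pin's output level, and the i.MX RT GPIO block provides separate "set" and "clear" registers so that one output bit can be changed without a read-modify-write of the others. The sketch below is a host-side model of those semantics only, not Teensy code; the struct `GpioModel`, its methods, and `pulse_bit` are invented for illustration. In the Teensy 4 core the corresponding registers are, as far as I know, named like `GPIO6_DR_SET` and `GPIO6_DR_CLEAR`, and in `GDIR` a 1 bit selects output while a 0 bit selects input.

```cpp
#include <cstdint>

// Host-side model of one GPIO port's output latch (hypothetical type).
struct GpioModel {
    uint32_t dr = 0;  // output latch, one bit per pin

    // Writing a mask to the "set" register raises only the masked bits.
    void write_dr_set(uint32_t mask) { dr |= mask; }
    // Writing a mask to the "clear" register lowers only the masked bits.
    void write_dr_clear(uint32_t mask) { dr &= ~mask; }
    // A read has no side effects on the latch.
    uint32_t read_dr() const { return dr; }
};

// Pulse one output bit high then low, leaving every other bit untouched.
inline void pulse_bit(GpioModel& port, unsigned bit) {
    const uint32_t mask = 1u << bit;
    port.write_dr_set(mask);
    port.write_dr_clear(mask);
}
```

On real hardware each of the two writes would be a single store to a memory-mapped register; whether that fits the cycle budget still has to be measured.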

  5. #5
    defragster (Senior Member+)
    Join Date
    Feb 2015
    Posts
    15,069
    Paul has written digitalWriteFast(pin, HIGH/LOW) so that, with a constant pin number, it resolves to the fastest raw write as inline code on output pins.

    Setting all desired input pins with pinMode() will ensure they are configured correctly for reading.

    The tight loop read should be reliable.

  6. #6
    Has Paul calculated/measured the instruction cycles needed to execute the digitalWriteFast() command? I am very limited with how many CPU clock cycles can execute in the main loop.

    Thank you

  7. #7
    defragster (Senior Member+)
    Join Date
    Feb 2015
    Posts
    15,069
    Easy to do using ARM_DWT_CYCCNT which is already running on T_4.x.

    Code:
    	uint32_t start;
    	uint32_t end;
    
    	start = ARM_DWT_CYCCNT;
    	for (unsigned i = 0; i < 100; i++)
    	{
    		// dummy++;
    	}
    	end = ARM_DWT_CYCCNT;
    
    	// end - start; // Cycles used plus ~3 for reading the CYCCNT

  8. #8
    Senior Member
    Join Date
    Jul 2020
    Posts
    1,224
    Only one variable is needed:
    Code:
      uint32_t count = 0;
      count -= ARM_DWT_CYCCNT;
      // ... code to time ...
      count += ARM_DWT_CYCCNT;
    Using the -= allows discontinuous time periods to be summed together easily if you want to time only certain parts of the code.

    Code:
      uint32_t count = 0;
      count -= ARM_DWT_CYCCNT;
      // ... code to time ...
      count += ARM_DWT_CYCCNT;
    
      // ... code not to be timed ...
    
      count -= ARM_DWT_CYCCNT;
      // ... code to time ...
      count += ARM_DWT_CYCCNT;
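
The `-=` / `+=` pattern above works because unsigned 32-bit arithmetic wraps modulo 2^32, so the sum stays correct even if the counter rolls over between the two reads. A minimal host-side sketch of that reasoning follows; `CycleAccumulator` is a hypothetical helper, and the `now` argument stands in for a read of the free-running cycle counter.

```cpp
#include <cstdint>

// Accumulates elapsed cycles over several timed sections (hypothetical helper).
// `now` stands in for a read of a free-running 32-bit cycle counter.
struct CycleAccumulator {
    uint32_t count = 0;
    void begin(uint32_t now) { count -= now; }  // mirrors: count -= counter
    void end(uint32_t now)   { count += now; }  // mirrors: count += counter
};
```

The result is only meaningful while the total timed span stays below 2^32 cycles, which is roughly 7 seconds at 600 MHz.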

  9. #9
    Senior Member
    Join Date
    Jul 2015
    Posts
    118
    If you really only have about 10-18 clock cycles, assembly language is probably the only way. I wrote a logic analyzer that reads and stores 32 bits in 8 clock cycles, but it is in assembly. On the Teensy 4, a GPIO read takes 8 clock cycles and a write takes 1 clock cycle. Luckily with the parallel processing, if done optimally, it can read GPIO while simultaneously writing the previous value to memory. So reading, writing, then toggling an output high and low might be possible in 10-12 clocks. This is without a loop - I have 1024 reads in a row for the logic analyzer. Adding a loop would add at least 3 clocks.

    Writing it in C may be almost as fast if you are very, very careful.

  10. #10
    Thank you everyone for your help so far, and especially defragster and MarkT for your help on recording CPU cycles.

    However, I am very lost right now trying to set up a port of GPIO pins and then read 8 of the pins, preferably simultaneously, so as to use as few CPU cycles as possible. I also need to store those values in a buffer of some sort, perhaps an array.

    If someone has some example code that does this type of port manipulation, I would be eternally grateful for your help.

    Thank you,
    ~RRkt

  11. #11
    defragster (Senior Member+)
    Join Date
    Feb 2015
    Posts
    15,069
    If the pins to be used fit in one port, then reading that port returns a 32-bit value containing ALL pins on that port.

    The useful pins could be extracted and saved in a byte at read time (time you may not have), or the 32-bit word could be saved in an array and parsed later, since each of the desired pins has a fixed location in that value. Or, if the pins are all in the high or low 16 bits, it might save space to store just those 16 bits.

    Hopefully links above demonstrate port reading - if not there are other posts.

    Spelling out what pins and port are of interest might allow better help.
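
As a rough illustration of the extraction options described above, here is a host-side sketch (plain C++, no Teensy headers) of pulling eight pin bits out of a 32-bit port word. The function names and the bit positions used in any call are hypothetical; the real positions depend on which pins are chosen and have to be taken from the schematic or the pin tables.

```cpp
#include <cstddef>
#include <cstdint>

// Case 1: the eight pins sit on adjacent port bits starting at `shift`.
// One shift and one mask recover the byte.
constexpr uint8_t extract_contiguous(uint32_t port_word, unsigned shift) {
    return static_cast<uint8_t>((port_word >> shift) & 0xFFu);
}

// Case 2: the pins are scattered. bit_pos[i] is the port bit that feeds
// result bit i (a caller-supplied, hypothetical mapping).
inline uint8_t extract_scattered(uint32_t port_word, const uint8_t (&bit_pos)[8]) {
    uint8_t out = 0;
    for (std::size_t i = 0; i < 8; ++i) {
        out |= static_cast<uint8_t>(((port_word >> bit_pos[i]) & 1u) << i);
    }
    return out;
}
```

If cycles are tight, the raw 32-bit words can be stored during capture and either function applied afterwards, when time no longer matters.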

  12. #12
    I'm flexible on which pins/ports to use, since nothing conflicts. I am just not knowledgeable at all about the T4.x microcontroller, and I have no idea how to read or store a whole port at once. At this point I just need it spelled out for me how to read the port, like the DDRx, PORTx, and PINx registers on the AVR microcontrollers, but in a form that the ARM-based T4.0 can understand.

  13. #13
    Senior Member
    Join Date
    Jul 2015
    Posts
    118
    Here's a short sketch that reads 1024 32-bit values from Port 6 in an efficient manner:

    Code:
    #define BUFFER_SIZE 1024
    
    uint32_t buffer[BUFFER_SIZE];
    
    void setup() {
      uint32_t *buffer_ptr = &buffer[0];
      uint32_t *end_of_buffer = &buffer[BUFFER_SIZE];
    
      while (buffer_ptr < end_of_buffer) {
        *(buffer_ptr) = GPIO6_PSR;
        ++buffer_ptr;
      }
    }
    
    void loop() {  
    }

  14. #14
    defragster (Senior Member+)
    Join Date
    Feb 2015
    Posts
    15,069
    @KurtE has summarized pin info here : github.com/KurtE/TeensyDocuments

    And PJRC schematic here gives it visually: pjrc.com/store/teensy41.html#tech

  15. #15
    Senior Member
    Join Date
    Jul 2014
    Posts
    3,317
    Quote: Originally Posted by LAtimes
    Here's a short sketch that reads 1024 32-bit values from Port 6 in an efficient manner:
    Talking about efficiency:
    a for loop can be about 100 times faster than a while loop, as shown next.
    Code:
    #define BUFFER_SIZE 1024
    
    uint32_t buffer[BUFFER_SIZE];
    
    void setup() {
      uint32_t *buffer_ptr = &buffer[0];
      uint32_t *end_of_buffer = &buffer[BUFFER_SIZE];
    
    while(!Serial);
    
    uint32_t to=ARM_DWT_CYCCNT;
      while (buffer_ptr < end_of_buffer) {
        *(buffer_ptr) = GPIO6_PSR;
        ++buffer_ptr;
      }
    Serial.print("while ");Serial.println(ARM_DWT_CYCCNT-to);
    
    to=ARM_DWT_CYCCNT;
      for (; buffer_ptr < end_of_buffer; )  *buffer_ptr++ = GPIO6_PSR;
    Serial.print("for "); Serial.println(ARM_DWT_CYCCNT-to);
    }
    
    void loop() {  
    }
    Serial output
    Code:
    while 9358
    for 87
    T4.1, 600MHz, faster
    Reason? I guess the compiler unrolls the for loop.

  16. #16
    Senior Member
    Join Date
    Apr 2021
    Location
    Cambridgeshire, UK
    Posts
    119
    Time difference for serial printing 6 characters rather than 4? You might want to read CYCCNT immediately after the loops end...

  17. #17
    defragster (Senior Member+)
    Join Date
    Feb 2015
    Posts
    15,069
    I was wondering how 1024 port reads could complete in 87 cycles. I didn't catch the dual print on one line, so I was playing with the code.

    Also, the OP's reference to 50 clocks per read is NOT at Teensy speed. digitalReadFast() (with bit shifting and OR'ing) is under 10 cycles per read, including loop overhead. Even the looped digitalRead() of pin numbers from an array, with bit shifting and OR'ing, is under 28 cycles per read, with the overhead of 2 loops.

    Quote: Originally Posted by h4yn0nnym0u5e
    Time difference for serial printing 6 characters rather than 4? You might want to read CYCCNT immediately after the loops end...
    Swapping the prints around and adding alternate cases, it looks like the compiler is optimizing away the "for" case?
    Code:
    T:\tCode\FORUM\GPIOreadPort\GPIOreadPort.ino Jul 29 2021 00:43:32
    9232 >> while 
    1 >>for 
    9255 >>for ii * 
    9251 >>for [ii] 
    222777 >>for pin Read 
    76398 >>for pin ReadFast
    Code:
    #define BUFFER_SIZE 1024
    
    // https://forum.pjrc.com/threads/67751-Simultaneously-reading-8-GPIO-pins?p=284737&viewfull=1#post284737
    
    uint32_t buffer[BUFFER_SIZE];
    
    void setup() {
      uint32_t *buffer_ptr = &buffer[0];
      uint32_t *end_of_buffer = &buffer[BUFFER_SIZE];
    
      while (!Serial);
      Serial.println("\n" __FILE__ " " __DATE__ " " __TIME__);
      int ii;
    
      if ( CrashReport ) Serial.print ( CrashReport );
      uint32_t to = ARM_DWT_CYCCNT;
      while (buffer_ptr < end_of_buffer) {
        *(buffer_ptr) = GPIO6_PSR;
        ++buffer_ptr;
      }
      Serial.print(ARM_DWT_CYCCNT - to); Serial.println(" >> while ");
    
      to = ARM_DWT_CYCCNT;
      for (; buffer_ptr < end_of_buffer; )  *buffer_ptr++ = GPIO6_PSR;
      Serial.print(ARM_DWT_CYCCNT - to); Serial.println(" >>for ");
    
      buffer_ptr = &buffer[0];
      to = ARM_DWT_CYCCNT;
      for (ii = 0; ii < BUFFER_SIZE; ii++ )  *buffer_ptr++ = GPIO6_PSR;
      Serial.print(ARM_DWT_CYCCNT - to); Serial.println(" >>for ii * ");
    
      to = ARM_DWT_CYCCNT;
      for (ii = 0; ii < BUFFER_SIZE; ii++ )  buffer[ii] = GPIO6_PSR;
      Serial.print(ARM_DWT_CYCCNT - to); Serial.println(" >>for [ii] ");
    
      int myPins[8] = {2, 4, 6, 8, 10, 12, 14, 16};
      for (ii = 0; ii < 8; ii++)  pinMode( myPins[ii], INPUT );
      byte myB = 0;
      to = ARM_DWT_CYCCNT;
      for (buffer_ptr = &buffer[0]; buffer_ptr < end_of_buffer; )  {
        for (ii = 0; ii < 8; ii++)  myB = (myB << 1) | digitalRead( myPins[ii] );  // bitwise OR, not logical ||
        *buffer_ptr++ = myB;
      }
      Serial.print(ARM_DWT_CYCCNT - to); Serial.println(" >>for pin Read ");
    
      to = ARM_DWT_CYCCNT;
      for (buffer_ptr = &buffer[0]; buffer_ptr < end_of_buffer; )  {
        // combine the eight reads with bitwise OR so each pin keeps its own bit
        myB = digitalReadFast( 2 ) << 7 | digitalReadFast( 4 ) << 6 | digitalReadFast( 6 ) << 5 | digitalReadFast( 8 ) << 4 |
              digitalReadFast( 10 ) << 3 | digitalReadFast( 12 ) << 2 | digitalReadFast( 14 ) << 1 | digitalReadFast( 16 );
        *buffer_ptr++ = myB;
      }
      Serial.print(ARM_DWT_CYCCNT - to); Serial.println(" >>for pin ReadFast ");
    
    }
    
    void loop() {
    }
    Last edited by defragster; 07-29-2021 at 08:10 AM.
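
When eight single-bit reads are packed into one byte, as in the per-pin loops above, the combining operator matters: bitwise `|` keeps each bit in its own position, while logical `||` collapses the whole expression to 0 or 1 and can short-circuit later reads. The host-side sketch below shows the difference; both function names are hypothetical, and the `bits` array stands in for eight pin reads.

```cpp
#include <cstdint>

// Pack eight single-bit samples into one byte, first sample in the MSB.
// `bits` stands in for eight pin reads (hypothetical input for the model).
inline uint8_t pack_bitwise(const bool (&bits)[8]) {
    uint8_t b = 0;
    for (bool bit : bits) {
        b = static_cast<uint8_t>((b << 1) | (bit ? 1u : 0u));  // keeps every bit
    }
    return b;
}

// Same loop with logical OR: the result can only ever be 0 or 1.
inline uint8_t pack_logical(const bool (&bits)[8]) {
    uint8_t b = 0;
    for (bool bit : bits) {
        b = static_cast<uint8_t>((b << 1) || bit);
    }
    return b;
}
```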

  18. #18
    defragster (Senior Member+)
    Join Date
    Feb 2015
    Posts
    15,069
    Changing the "for" case from post #17 as follows gives a cycle count similar to the while() case:

    Code:
    T:\tCode\FORUM\GPIOreadPort\GPIOreadPort.ino Jul 29 2021 00:55:33
    9232 >> while 
    9252 >>for 
    9251 >>for ii * 
    9253 >>for [ii] 
    222778 >>for pin Read 
    76394 >>for pin ReadFast
    Add initialization :: buffer_ptr = &buffer[0]
    Code:
      to = ARM_DWT_CYCCNT;
      for (buffer_ptr = &buffer[0]; buffer_ptr < end_of_buffer; )  *buffer_ptr++ = GPIO6_PSR;
      Serial.print(ARM_DWT_CYCCNT - to); Serial.println(" >>for ");
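
Dividing the totals above by the 1024 stored samples gives a per-sample cost that can be compared with the roughly 18-cycle budget mentioned in the first post. The helpers below are a hypothetical, host-side way of doing that arithmetic. Note that the measured port-read loops only read and store; they do not wait on an external clock or toggle an output, so the real loop will cost more.

```cpp
#include <cstdint>

// Average cycles per stored sample from a measured total.
constexpr double cycles_per_sample(uint32_t total_cycles, uint32_t samples) {
    return static_cast<double>(total_cycles) / static_cast<double>(samples);
}

// True when the average fits within a per-sample cycle budget.
constexpr bool fits_budget(uint32_t total_cycles, uint32_t samples, double budget) {
    return cycles_per_sample(total_cycles, samples) <= budget;
}
```

With the figures reported above, the direct port read (9232 cycles) averages about 9 cycles per sample, while the eight-pin digitalReadFast() version (76394 cycles) averages about 75 cycles per stored byte.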

  19. #19
    Member
    Join Date
    Jan 2013
    Posts
    29
    Quote: Originally Posted by WMXZ
    100 times faster
    Careful ... your second loop does nothing, because buffer_ptr hasn't been reset.

    Code:
    #include <Arduino.h>
    
    #define BUFFER_SIZE 1024
    
    uint32_t buffer [BUFFER_SIZE];
    
    void setup () {
        while(!Serial) {}
    
        auto buffer_ptr = buffer, end_of_buffer = buffer + BUFFER_SIZE;
        int t0, t1;
    
        t0 = ARM_DWT_CYCCNT;
        while (buffer_ptr < end_of_buffer)
            *buffer_ptr++ = GPIO6_PSR;
        t1 = ARM_DWT_CYCCNT;
    
        Serial.printf("while %d\n", t1-t0);
        //arm_dcache_flush_delete(buffer, sizeof buffer);
        buffer_ptr = buffer;
    
        t0 = ARM_DWT_CYCCNT;
        for (; buffer_ptr < end_of_buffer; )
            *buffer_ptr++ = GPIO6_PSR;
        t1 = ARM_DWT_CYCCNT;
    
        Serial.printf("for %d\n", t1-t0);
    }
    
    void loop () {}
    while 9232
    for 9234

    PS. With C++11, you can also use this notation (same cycle count):

    Code:
        t0 = ARM_DWT_CYCCNT;
        for (auto& e : buffer)
            e = GPIO6_PSR;
        t1 = ARM_DWT_CYCCNT;
    Last edited by jcw; 07-29-2021 at 08:13 AM.

  20. #20
    defragster (Senior Member+)
    Join Date
    Feb 2015
    Posts
    15,069
    Indeed @jcw that is the case.

    Also note post #17 code adds CrashReport - that didn't catch anything.

    But when one of those loops was first copied with changes [ and not resetting buffer_ptr ] it went storming off into memory causing an immediate RESET of Teensy

    No fault, just: reset ... after reset ...

  21. #21
    Senior Member
    Join Date
    Apr 2014
    Location
    Germany
    Posts
    1,623
    I was interested in how efficiently the compiler implements a range-based for loop, so I added one to the code above:
    Code:
     for (auto& slot : buffer) slot = GPIO6_PSR;
    Which also gives the same result:

    Code:
    while 9234
    for 9235
    range 9236
    Looking at the generated assembly (only the loops shown)
    Code:
    While loop
    aa:	6881      	ldr	r1, [r0, #8]
    ac:	f843 1b04 	str.w	r1, [r3], #4
    b0:	42a3      	cmp	r3, r4
    b2:	d1fa      	bne.n	aa <setup+0x2e>
    
    Range based
    ea:	6881      	ldr	r1, [r0, #8]
    ec:	f843 1b04 	str.w	r1, [r3], #4
    f0:	42a3      	cmp	r3, r4
    f2:	d1fa      	bne.n	ea <setup+0x6e>
    The manually written while loop and the range-based loop generate exactly the same assembly. IMHO, as so often, there is no need to do the compiler's work; a simple

    Code:
    uint32_t buffer [1024];
    
    void setup () {
          for (auto& slot : buffer) slot = GPIO6_PSR;
    }
    
    void loop () {}
    seems to do the job quite nicely.

    Edit: Sorry, cross post with jcw's answer
    Last edited by luni; 07-29-2021 at 09:04 AM.

  22. #22
    Senior Member
    Join Date
    Jul 2014
    Posts
    3,317
    I agree it was a bad (too quickly developed) test program; I should have known better.
    BUT
    I had an issue with a parallel ILI9341
    that was too slow with a while loop and that I could speed up with a for loop.
    Now, trying to reproduce it, I realize that with while() I may have introduced a side effect that slowed the display down by a factor of 100
    (filling the display with a single colour needed 1.6 seconds!!)
    Last edited by WMXZ; 07-29-2021 at 09:27 AM.

  23. #23
    Member
    Join Date
    Jan 2013
    Posts
    29
    Quote: Originally Posted by WMXZ
    while(h--) while(w--) ...
    Careful ... on first iteration the w-loop runs N times, but on the next iterations it won't, because w is now zero.

  24. #24
    Senior Member
    Join Date
    Jul 2014
    Posts
    3,317
    Quote: Originally Posted by jcw
    Careful ... on first iteration the w-loop runs N times, but on the next iterations it won't, because w is now zero.
    Yes, that was the problem. w is declared uint16_t so instead of 240 there were 32768 iterations (or a factor of 136)

  25. #25
    Member
    Join Date
    Jan 2013
    Posts
    29
    At the risk of drawing out a long discussion further ...

    Nope, that's not exactly what's going on. The first inner while runs w times, and then always 65535 (i.e. "(uint16_t) 0 - 1").
    If w were declared as uint8_t, the nested while loops would still run 255 times, which is not the same as the for loops.
    In short: you'll be better off by clearly stating intent. The compiler will optimise ... no point trying to outsmart it.

    PS. My previous comment was wrong: w will be 65535 every time it comes out of its while loop, not zero.
    PPS. Hrm, I was assuming unsigned. Probably incorrectly so. Oh well, enough yak-shaving, onwards
    Last edited by jcw; 07-29-2021 at 12:41 PM. Reason: note about unsigned ints
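
The behaviour described above can be checked on any host compiler. The sketch below counts how often the inner body runs for the nested `while (h--) while (w--)` form versus the equivalent for loops, assuming both counters are `uint16_t`; the function names are hypothetical.

```cpp
#include <cstdint>

// Count inner-body executions of: while (h--) while (w--) { ... }
// with 16-bit unsigned counters.
inline uint64_t count_nested_while(uint16_t h, uint16_t w) {
    uint64_t n = 0;
    while (h--) {
        while (w--) {
            ++n;
        }
        // w is now 65535: the failing test read 0, then the post-decrement wrapped.
    }
    return n;
}

// Count inner-body executions of the equivalent for loops, which
// restart the inner counter on every row.
inline uint64_t count_nested_for(uint16_t h, uint16_t w) {
    uint64_t n = 0;
    for (uint16_t y = 0; y < h; ++y)
        for (uint16_t x = 0; x < w; ++x)
            ++n;
    return n;
}
```

For example, with h = 3 and w = 240 the while form runs the body 240 + 2 × 65535 times, against 3 × 240 for the for loops.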
