Fast 8 bit parallel I/O for T4.0

markkimball

For various reasons I've been looking at ways to get reasonably fast 8-bit parallel I/O on a Teensy 4.0. None of the T4.0's GPIO ports used for digital I/O has 8 contiguous bits brought out to pins, but I did find that GPIO6 has two contiguous groups of 4 or more bits. Getting 8 bits still requires a small amount of bit twiddling and shifting, but those operations are fast. I wrote some code to test the idea and it seems to work OK.

I wrote it so it wouldn't be too difficult to turn the core functions into a library. It is simple enough that I have included it here:

Code:
/* 8 bit parallel I/O functions.
 *  For Teensy4.0
 *  The read/write routines directly access the GPIO6 data register for fast
 *  reads and writes.
 *  
 *  GPIO6 is the best choice because:
 *  pin #'s 21, 20, 23, 22, 16 and 17 (in MSB to LSB order, GPIO6 bit#'s 27-22) are contiguous;
 *  and pin#'s 15, 14, 18 and 19 are contiguous in the same order, bit#'s 19-16.
 *  So one register read could fetch 2 separate nybbles that can be combined to make an 8-bit word.
 *  If we choose bit #'s 27-24 for the upper nybble, we need to right-shift them by 20 bits;
 *  and for the lower 4 bits (#'s 19-16) we need to right-shift them by 16 bits.
 *  The T4's processor only supports 32-bit read/write accesses, so we're stuck with both shift operations.
 * 
 *  if raw_reg = GPIO6_DR, we can do something like this to read an 8-bit input:
 *  raw_reg &= 0x0f0f0000;
 *   value = ((raw_reg >> 20) | (raw_reg >> 16)) & 0xff;
 * 
 *  outputting an 8 bit value will require some masking to avoid changing the state of other pins associated with GPIO6.
 * 
 * 
 *  The bit ordering can be found in core_pins.h, in the section that defines CORE_PINX_BIT,
 *  where X = 0...39.
 *  If we use an external multiplexer to place either the upper 8 or the lower 8 bits of a 16-bit ADC result on these pins,
 *  we can then combine the two reads to acquire a 16-bit value.
 *  This is not optimal compared to fetching 16 bits directly, but the Teensy 4.0 board design doesn't permit that.
 * 
 *   NOTE:  we still will use pinMode() to configure the input/output pins.
 */

 #include <Arduino.h>
 #include <core_pins.h>

#define _pickbits 0x0f0f0000
#define ledpin 13

const int EightPins[8] = {21,20,23,22,15,14,18,19};  // listed from bit 7 (MSB) down to bit 0 (LSB)

// I'm using a somewhat-Arduino-like naming scheme for these.

uint8_t digitalRead8Bits(void)
{
	uint32_t rawregvalue;
	
	rawregvalue = GPIO6_DR & _pickbits;
	return (uint8_t)(((rawregvalue >> 20) | (rawregvalue >> 16)) & 0xff);  // bits 27-24 -> 7-4, bits 19-16 -> 3-0
}


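// For a bit more efficiency I'm using XOR to flip only the output bits that differ from the last value written.
// This avoids the need to explicitly protect the other GPIO6 register bits. Note that GPIO6_DR ^= ... is still a
// read-modify-write, so an interrupt that changes other GPIO6 pins between the read and the write could be undone.
// Since the boot-up logic states are all LOW, this should work even if digitalWrite8Bits() is called before pinMode8Bits().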
volatile uint32_t _last = 0;

void digitalWrite8Bits(uint8_t value)
{
  uint32_t writebits,newbits;
  
  newbits =  (( (uint32_t) value & 0xf0) << 20 ) + (((uint32_t) value & 0x0f)<<16); // we want to update _last with this value later
  writebits = newbits ^ _last;
  GPIO6_DR ^= writebits;
  _last = newbits;
}

void pinMode8Bits(int mode)
{
 int i;
 for(i = 0; i < 8; i++)
  pinMode(EightPins[i], mode);
}

void setup() {
   pinMode8Bits(OUTPUT);
   digitalWrite8Bits(LOW);
 } // end setup()


// We test this with a simple "walking one" routine
void loop() {
  int i,j = 1;

  for(i = 0; i < 8; i++)
  {
   digitalWrite8Bits(j);
   j <<= 1;
   delay(500);
 }

}
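The packing and unpacking arithmetic described in the comment block can be checked on a desktop compiler before any hardware is involved. This is a host-side model, not Teensy code: `pack8()`, `unpack8()` and `check_round_trip()` are hypothetical names that mirror the shifts in digitalWrite8Bits() and digitalRead8Bits(), with `kPickBits` playing the role of _pickbits.

```cpp
#include <cassert>
#include <cstdint>

// GPIO6 bits 27-24 (upper nybble) and 19-16 (lower nybble), same value as _pickbits.
constexpr uint32_t kPickBits = 0x0f0f0000u;

// Place an 8-bit value into the GPIO6 bit positions, as digitalWrite8Bits() does.
uint32_t pack8(uint8_t value) {
  return ((uint32_t(value) & 0xf0u) << 20) | ((uint32_t(value) & 0x0fu) << 16);
}

// Recover the 8-bit value from a raw 32-bit register image, as digitalRead8Bits() does.
uint8_t unpack8(uint32_t raw) {
  raw &= kPickBits;                                     // drop the other 24 GPIO6 bits
  return uint8_t(((raw >> 20) | (raw >> 16)) & 0xffu);  // 27-24 -> 7-4, 19-16 -> 3-0
}

// Every byte should survive a round trip, even with all unrelated register bits set.
void check_round_trip() {
  for (unsigned v = 0; v < 256; ++v) {
    uint32_t reg = pack8(uint8_t(v)) | ~kPickBits;  // other 24 bits forced high
    assert(unpack8(reg) == v);
  }
}
```

Forcing the unrelated bits high in the check is a cheap way to confirm the mask really isolates the 8 pins.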

I'm a get'er done kind of C/C++ programmer, so any suggestions on how to improve the code are welcome.
 
Code:
	return (uint8_t) ((rawregvalue >> 20) + (rawregvalue >> 16)) & 0x0f;
Shouldn't that be & 0xff?

Pete
 
I should add that el_supremo must have seen the unedited version of my first post (I caught the error and edited it before he posted his comment), so latecomers won't see my error... I think....

Maybe it's best to leave our misteaks <deliberate misspelling> alone and take our lumps :). At least that way everyone can be in on the conversation.
 
I think it is best to fix them in the original post, so when someone new comes along later and takes that code from the OP, they have a working version of it rather than potentially dealing with an error for a while themselves, before spending even more time trawling through a forum thread looking for and applying updates.
 
Benchmarking my routines gave me a read rate a bit over 70 MBytes/second and about half that for writes, which was quite a bit slower than I had expected for writing. Delving into the digitalWriteFast() code, I found that it uses a different set of GPIO registers, DR_SET and DR_CLEAR. I modified my 8-bit write routine to use them instead and achieved almost 170 MBytes/second.

The new digitalWrite8Bits routine looks like this:

Code:
// This faster version of digitalWrite8Bits() uses the GPIO SET and CLEAR registers, similar to how digitalWriteFast() works.
// Despite the fact that I'm writing to two registers instead of one, this approach benchmarks much faster.
void digitalWrite8Bits(uint8_t value)
{
  uint32_t writebits;

  writebits = (( (uint32_t) value & 0xf0) << 20 ) + (((uint32_t) value & 0x0f)<<16);
  GPIO6_DR_SET = writebits;
  GPIO6_DR_CLEAR = writebits ^ _pickbits;
}

BTW, _pickbits = 0x0f0f0000, as defined in my OP.

The disadvantage of this approach is that the 0-->1 transitions occur some nanoseconds before the 1-->0 transitions do. If this is an issue, it would be necessary to use an external 8-bit-wide latch clocked by a ninth Teensy pin. The precedence can also easily be swapped by reversing the order of the two register writes. You would not use a transparent latch; it would have to be an edge-triggered type, but anyone who cares about this kind of thing would know that anyway :).
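To see why the SET/CLEAR pair leaves the other GPIO6 pins alone, the two registers can be modelled on a plain integer. This is a host-side sketch under the assumption (per the i.MX RT1060 reference manual) that writing a 1 to DR_SET sets the matching DR bit, writing a 1 to DR_CLEAR clears it, and 0 bits have no effect; `FakeGpio` and `write8_set_clear()` are hypothetical names.

```cpp
#include <cassert>   // assert() for the example checks
#include <cstdint>

constexpr uint32_t kPickBits = 0x0f0f0000u;  // same mask as _pickbits

// Hypothetical model of one GPIO port: DR plus write-only SET/CLEAR semantics.
struct FakeGpio {
  uint32_t dr = 0;
  void write_set(uint32_t bits)   { dr |= bits; }   // 1 bits set, 0 bits ignored
  void write_clear(uint32_t bits) { dr &= ~bits; }  // 1 bits clear, 0 bits ignored
};

// Same bit placement as digitalWrite8Bits(): value bits 7-4 -> 27-24, 3-0 -> 19-16.
uint32_t pack8(uint8_t value) {
  return ((uint32_t(value) & 0xf0u) << 20) | ((uint32_t(value) & 0x0fu) << 16);
}

// Mirrors the SET/CLEAR version of digitalWrite8Bits().
void write8_set_clear(FakeGpio& port, uint8_t value) {
  uint32_t writebits = pack8(value);
  port.write_set(writebits);                // 0 -> 1 transitions happen first
  port.write_clear(writebits ^ kPickBits);  // then 1 -> 0, only inside the mask
}
```

Because the CLEAR mask is `writebits ^ kPickBits`, it can only contain bits inside the 8-pin mask, so the other 24 bits are never touched. Swapping the two calls reverses which set of transitions happens first.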

I don't think there's much hope of speeding up the read rate, since it's necessary to read the GPIO data register, and that read is what slowed down my original write function. Still, 71 MBytes/second isn't too bad.
Mark
 
Hi,

You could also remember the pin states and then use the GPIO6_DR_TOGGLE register instead, which avoids a delay between the high-to-low and low-to-high transitions.

All the details for the available registers can be found in the IMXRT1060 manual available here, specifically chapter 12.
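A minimal host-side sketch of the toggle idea, assuming (per the i.MX RT1060 manual) that writing a 1 to DR_TOGGLE inverts the matching DR bit and a 0 leaves it unchanged. The cached `last` value plays the role of _last; `FakeGpio` and `write8_toggle()` are hypothetical names. Since every changed bit is flipped by one register write, the rising and falling edges should not be split across two writes as they are with SET/CLEAR.

```cpp
#include <cassert>   // assert() for the example checks
#include <cstdint>

// Hypothetical model of one GPIO port: DR plus the write-only TOGGLE register.
struct FakeGpio {
  uint32_t dr = 0;
  void write_toggle(uint32_t bits) { dr ^= bits; }  // 1 bits invert, 0 bits ignored
};

// Same bit placement as digitalWrite8Bits(): value bits 7-4 -> 27-24, 3-0 -> 19-16.
uint32_t pack8(uint8_t value) {
  return ((uint32_t(value) & 0xf0u) << 20) | ((uint32_t(value) & 0x0fu) << 16);
}

// Toggle only the bits that differ from the cached value, in a single write.
void write8_toggle(FakeGpio& port, uint32_t& last, uint8_t value) {
  uint32_t newbits = pack8(value);
  port.write_toggle(newbits ^ last);
  last = newbits;
}
```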
 

RE: the toggle register -- good idea! That would take care of the precedence issue. It will actually be closer to my original approach, but it writes to a different register, so it should be a trivial change.

I downloaded the manual long ago. Big beast, it is.
 
Writing to the TOGGLE register benchmarks between the DR and SET/CLEAR versions, somewhere around 80 MBytes/second. Writing to SET/CLEAR is a bit more than twice as fast, according to my testing.

These raw numbers represent what the Teensy4.0 can do when it's running in a tight loop (no overclocking) with no additional data processing. So a real-world application that needs to run other code can't achieve these kinds of sample rates. That said, the 600MHz system clock is fast enough for the processor to do some concurrent data processing.
 

Hi,

I find your results quite surprising. I would expect the TOGGLE, SET and CLEAR registers to have the same access time... In my (admittedly very dirty) test, the write time is the same for all three registers: two cycles when all values are hard-coded. That is around 300 MHz when the T4 is clocked at 600 MHz. But of course, this kind of testing is mostly useless in practice, as many other factors will certainly play a more important role in real applications. Note also that Teensyduino moves the digital pins to the 'fast' ports GPIO6-9 at boot.
 
I modified my code to benchmark all three approaches and it reports a different result when modifying the TOGGLE register. I'm not sure why -- I didn't change the code other than rename the various versions of digitalWrite8Bitsxx, where xx is either _DR, _TOGGLE or _SET_CLEAR.

Anyway, my most-recent benchmarking produced the following result:


WRITE rate to GPIO6_DR, Mbytes/sec: 50.00
WRITE rate to GPIO6_TOGGLE, Mbytes/sec: 250.00
WRITE rate to GPIO6_SET_CLEAR, Mbytes/sec: 166.67
data READ rate, Mbytes/sec: 71.43


I have included the code below. It occurs to me that digitalWrite8Bits_DR may run more slowly because it uses a read-modify-write sequence in the form of
GPIO6_DR ^= writebits. It would be faster to simply write newbits straight to GPIO6_DR (no XOR performed on the register).
BUT this has the great disadvantage of smashing all the other bits that set the states of the remaining GPIO6 pins. Given that the other two forms preserve the other bits
and run much faster, it makes no sense to use digitalWrite8Bits_DR().

Code:
/* Code to benchmark my 8 bit parallel I/O functions.
 *  For Teensy4.0
 *  The read/write routines directly access the GPIO6 data registers for fast
 *  reads and writes.
 *  
 *  GPIO6 is the best choice because:
 *  pin #'s 21, 20, 23, 22, 16 and 17 (in MSB to LSB order, GPIO6 bit#'s 27-22) are contiguous;
 *  and pin#'s 15, 14, 18 and 19 are contiguous in the same order, bit#'s 19-16.
 *  So one register read could fetch 2 separate nybbles that can be combined to make an 8-bit word.
 *  If we choose bit #'s 27-24 for the upper nybble, we need to right-shift them by 20 bits;
 *  and for the lower 4 bits (#'s 19-16) we need to right-shift them by 16 bits.
 *  the T4's processor only supports 32-bit Read/Write accesses so we're stuck with both shift operations.
 * 
 *  if raw_reg = GPIO6_DR & 0x0f0f0000, we can do something like this to read an 8-bit input:
 *  value = ((raw_reg >> 20) | (raw_reg >> 16)) & 0xff;
 *  Outputting an 8-bit value will require some masking to avoid changing the state of other pins associated with GPIO6.
 * 
 * 
 *  The bit ordering can be found in core_pins.h, in the section that defines CORE_PINX_BIT,
 *  where X = 0...39.
 * 
 *   NOTE: pinMode() is used inside pinMode8Bits() to configure the input/output pins. Calling any of the digitalWrite8Bits_..() routines
 *   before setting the 8 pins to OUTPUT should behave the same way as calling digitalWrite() before pinMode().
 *   
 * 
 */

// #include <Arduino.h>
 #include <core_pins.h>

#define _pickbits 0x0f0f0000  // no trailing ';' so the macro is safe inside expressions
#define ledpin 13

const int EightPins[8] = {21,20,23,22,15,14,18,19};

// I'm using a somewhat-Arduino-like naming scheme for these.

uint8_t digitalRead8Bits(void)
{
	uint32_t rawregvalue;
	
	rawregvalue = GPIO6_DR & _pickbits;
  return (uint8_t) ((rawregvalue >> 20) + (rawregvalue >> 16));
}

void pinMode8Bits(int mode)
{
 int i;
 for(i = 0; i < 8; i++)
  pinMode(EightPins[i], mode);
}

uint32_t _last_DR = 0;

// This version XORs GPIO6_DR. _last_DR caches the last value written so only the changed bits are flipped;
// note that GPIO6_DR ^= ... is still a read-modify-write of the register.
void digitalWrite8Bits_DR(uint8_t value)
{
  uint32_t writebits,newbits;
  
  newbits =  (( (uint32_t) value & 0xf0) << 20 ) + (((uint32_t) value & 0x0f)<<16); // we want to update _last with this value later
  writebits = newbits ^ _last_DR;
  GPIO6_DR ^= writebits;
  _last_DR = newbits;
}

// A version that uses the toggle register.  Suggestion by vindar on the PJRC forum.
uint32_t _last_TOGGLE = 0;

void digitalWrite8Bits_TOGGLE(uint8_t value)
{
  uint32_t writebits,newbits;
  
  newbits =  (( (uint32_t) value & 0xf0) << 20 ) + (((uint32_t) value & 0x0f)<<16); // we want to update _last with this value later
  writebits = newbits ^ _last_TOGGLE;
  GPIO6_DR_TOGGLE = writebits;
  _last_TOGGLE = newbits;
}

// This version uses the GPIO SET and CLEAR registers, similar to how digitalWriteFast() works.
// Despite writing to two registers instead of one, it benchmarks much faster than the DR version.
void digitalWrite8Bits_SET_CLEAR(uint8_t value)
{
  uint32_t writebits;

  writebits = (( (uint32_t) value & 0xf0) << 20 ) + (((uint32_t) value & 0x0f)<<16); // same as above
  GPIO6_DR_SET = writebits;
  GPIO6_DR_CLEAR = writebits ^ _pickbits;
}

void setup() {
  float time0;
  int i;
  uint16_t temp,temp1,temp2,temp3,temp4,temp5,temp6,temp7,temp8,temp9;  // this is to keep the compiler from optimizing my benchmark code too much.  I hope...
  
  Serial.begin(115200);
  while(!Serial);
  
   pinMode8Bits(OUTPUT);

   // Benchmark for modifying GPIO6_DR
   time0 = (float) millis();
   
   for(i = 0; i < 100000; i++) // one hundred-thousand loops
   {
   digitalWrite8Bits_DR(0); // ten calls/loop = one million calls
   digitalWrite8Bits_DR(0xff);
   digitalWrite8Bits_DR(0);
   digitalWrite8Bits_DR(0xff);
   digitalWrite8Bits_DR(0);
   digitalWrite8Bits_DR(0xff);
   digitalWrite8Bits_DR(0);
   digitalWrite8Bits_DR(0xff);
   digitalWrite8Bits_DR(0);
   digitalWrite8Bits_DR(0xff);
}
  time0 = (float) millis() - time0; // This is how long it took to execute one million calls in milliseconds.
  digitalWrite8Bits_DR(0); // This is done so subsequent calls to digitalWrite8Bits_TOGGLE work right (probably doesn't matter in the context of benchmarking)
  Serial.print("WRITE rate to GPIO6_DR, Mbytes/sec: ");
  Serial.println(1000/time0);

   // Benchmark for modifying GPIO6_TOGGLE
   time0 = (float) millis();
   
   for(i = 0; i < 100000; i++) // one hundred-thousand loops
   {
   digitalWrite8Bits_TOGGLE(0); // ten calls/loop = one million calls
   digitalWrite8Bits_TOGGLE(0xff);
   digitalWrite8Bits_TOGGLE(0);
   digitalWrite8Bits_TOGGLE(0xff);
   digitalWrite8Bits_TOGGLE(0);
   digitalWrite8Bits_TOGGLE(0xff);
   digitalWrite8Bits_TOGGLE(0);
   digitalWrite8Bits_TOGGLE(0xff);
   digitalWrite8Bits_TOGGLE(0);
   digitalWrite8Bits_TOGGLE(0xff);
}
  time0 = (float) millis() - time0; // This is how long it took to execute one million calls in milliseconds.
  digitalWrite8Bits_TOGGLE(0);
  Serial.print("WRITE rate to GPIO6_TOGGLE, Mbytes/sec: ");
  Serial.println(1000/time0);

  // Benchmark for modifying GPIO6_SET and _CLEAR

   time0 = (float) millis();
   
   for(i = 0; i < 100000; i++) // one hundred-thousand loops
   {
   digitalWrite8Bits_SET_CLEAR(0); // ten calls/loop = one million calls
   digitalWrite8Bits_SET_CLEAR(0xff);
   digitalWrite8Bits_SET_CLEAR(0);
   digitalWrite8Bits_SET_CLEAR(0xff);
   digitalWrite8Bits_SET_CLEAR(0);
   digitalWrite8Bits_SET_CLEAR(0xff);
   digitalWrite8Bits_SET_CLEAR(0);
   digitalWrite8Bits_SET_CLEAR(0xff);
   digitalWrite8Bits_SET_CLEAR(0);
   digitalWrite8Bits_SET_CLEAR(0xff);
}
  time0 = (float) millis() - time0; // This is how long it took to execute one million calls in milliseconds.
  digitalWrite8Bits_SET_CLEAR(0);
  Serial.print("WRITE rate to GPIO6_SET_CLEAR, Mbytes/sec: ");
  Serial.println(1000/time0);

  // now see how long it takes to read 8 bits

  time0 = (float) millis();
  for(i =0; i < 100000; i++)
  {
    temp = digitalRead8Bits();
    temp1 = digitalRead8Bits();
    temp2 = digitalRead8Bits();
    temp3 = digitalRead8Bits();
    temp4 = digitalRead8Bits();
    temp5 = digitalRead8Bits();
    temp6 = digitalRead8Bits();
    temp7 = digitalRead8Bits();
    temp8 = digitalRead8Bits();
    temp9 = digitalRead8Bits();
  }
  time0 = (float) millis() - time0;
  Serial.print("data READ rate, Mbytes/sec: ");
  Serial.println(1000/time0);
 } // end setup()



void loop() {
  while(1);

/*
 //  A walking ones test to verify correct pin vs bit ordering.
  int i,j = 1;
  unsigned char readback;

  for(i = 0; i < 8; i++)
  {
   digitalWrite8Bits_TOGGLE(j);
   readback = digitalRead8Bits();
   Serial.println(readback, HEX);
   j <<= 1;
   delay(500);
   }
 */
}
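The trade-off described before the listing (a plain store to GPIO6_DR versus preserving the other bits) can be modelled with plain integers standing in for the register. This is a host-side sketch with hypothetical names, not the benchmark code. The masked read-modify-write has the same effect as the XOR form in digitalWrite8Bits_DR(), provided nothing else has changed the 8 pins since the last call; both still need a register read, which the SET/CLEAR and TOGGLE versions avoid.

```cpp
#include <cassert>   // assert() for the example checks
#include <cstdint>

constexpr uint32_t kPickBits = 0x0f0f0000u;  // same mask as _pickbits

uint32_t pack8(uint8_t value) {
  return ((uint32_t(value) & 0xf0u) << 20) | ((uint32_t(value) & 0x0fu) << 16);
}

// Plain store: one write, but every other GPIO6 bit is overwritten (forced to 0 here).
uint32_t plain_store(uint32_t /*old_dr*/, uint8_t value) {
  return pack8(value);
}

// Read-modify-write: keeps the other 24 bits, at the cost of reading DR first.
uint32_t masked_rmw(uint32_t old_dr, uint8_t value) {
  return (old_dr & ~kPickBits) | pack8(value);
}
```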
 
BTW, the reason the SET/CLEAR version runs slower than the TOGGLE version is that it performs two writes to GPIO registers, compared with a single write to TOGGLE. So the benchmark results are consistent with this.
 
I noticed some peculiar variations in the benchmark results that went away when I used micros() instead of millis() to get the execution time (along with a few minor code changes to account for the switch from milliseconds to microseconds). The improvement is probably due to the relatively poor time resolution of millis() in this situation. Writing to the TOGGLE register now benchmarks at a consistent 300 MBytes/second. Using the TOGGLE register for fast writes appears to be the best option at this point, particularly if you don't want the parallel output code to impose much of a processor load.
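The coarse millis() resolution also explains why the earlier figures were such round numbers. With one million one-byte writes per run, the rate in MBytes/second is simply 1000 divided by the elapsed milliseconds, so 50.00, 250.00, 166.67 and 71.43 correspond to whole-millisecond counts of 20, 4, 6 and 14. On a 4 ms run, a single millisecond of error is a 25% timing error. The helper names below are hypothetical; `round2()` mimics Serial.println() printing two decimal places by default.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// MBytes/s for `calls` one-byte transfers that took `elapsed_ms` milliseconds.
double mbytes_per_sec(uint32_t calls, double elapsed_ms) {
  double megabytes = calls / 1e6;
  double seconds = elapsed_ms / 1000.0;
  return megabytes / seconds;
}

// Round to two decimals, like the default Serial.println(float) output.
double round2(double x) { return std::round(x * 100.0) / 100.0; }
```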

Setting or clearing any of these 8 pins independently (for example with digitalWrite()) may cause problems with subsequent calls to digitalWrite8Bits_TOGGLE(), depending on what the state of that bit was, because the cached last value no longer matches the pins.
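A host-side model of that problem, with plain integers standing in for GPIO6_DR and DR_TOGGLE (hypothetical names throughout): if one of the 8 pins is changed behind the function's back, the next toggle-based write flips that pin into the wrong state. One possible remedy, sketched here as an assumption rather than something tested on hardware, is to resynchronize the cache from the register before the next toggle-based write.

```cpp
#include <cassert>   // assert() for the example checks
#include <cstdint>

constexpr uint32_t kPickBits = 0x0f0f0000u;  // same mask as _pickbits

uint32_t pack8(uint8_t value) {
  return ((uint32_t(value) & 0xf0u) << 20) | ((uint32_t(value) & 0x0fu) << 16);
}

// Toggle-based write: flips only the bits that differ from the cached value.
void write8_toggle(uint32_t& dr, uint32_t& last, uint8_t value) {
  uint32_t newbits = pack8(value);
  dr ^= newbits ^ last;  // models a single write to DR_TOGGLE
  last = newbits;
}

// Hypothetical recovery step: re-read the 8 pins so the cache matches reality.
void resync_cache(uint32_t dr, uint32_t& last) { last = dr & kPickBits; }
```

In the stale case, the cache still says bit 0 is high after another routine has cleared it, so writing 0x00 toggles that pin back high; after resync_cache() the same write leaves it low.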
 
@markkimball: declaring uint32_t time0; and using ARM_DWT_CYCCNT for timing should give even less error than micros():
Code:
   // Benchmark for modifying GPIO6_DR
   time0 = ARM_DWT_CYCCNT;  // changed: was (float) millis()
   
   for(i = 0; i < 100000; i++) // one hundred-thousand loops
   {
   digitalWrite8Bits_DR(0); // ten calls/loop = one million calls
   digitalWrite8Bits_DR(0xff);
   digitalWrite8Bits_DR(0);
   digitalWrite8Bits_DR(0xff);
   digitalWrite8Bits_DR(0);
   digitalWrite8Bits_DR(0xff);
   digitalWrite8Bits_DR(0);
   digitalWrite8Bits_DR(0xff);
   digitalWrite8Bits_DR(0);
   digitalWrite8Bits_DR(0xff);
}
  time0 = ARM_DWT_CYCCNT - time0; // changed: this is how long it took to execute one million calls, in CPU cycles.
  digitalWrite8Bits_DR(0); // This is done so subsequent calls to digitalWrite8Bits_TOGGLE work right (probably doesn't matter in the context of benchmarking)
  Serial.print("WRITE rate to GPIO6_DR, Mbytes/sec: ");
  Serial.println((float)F_CPU_ACTUAL/time0);  // changed: cycles -> MBytes/s

Code:
WRITE rate to GPIO6_DR, Mbytes/sec: 47.24
WRITE rate to GPIO6_TOGGLE, Mbytes/sec: 299.89
WRITE rate to GPIO6_SET_CLEAR, Mbytes/sec: 149.97
data READ rate, Mbytes/sec: 74.99
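Because time0 now holds CPU cycles rather than milliseconds, the conversion is MBytes/second = F_CPU_ACTUAL / cycles for one million one-byte operations. Turning the posted figures around at an assumed 600 MHz (the default, non-overclocked speed) gives roughly 2 cycles per TOGGLE write, 4 per SET/CLEAR write, 8 per read and 12.7 per DR write, which fits the two-cycle observation and the one-store versus two-store explanation earlier in the thread. The helper names are hypothetical.

```cpp
#include <cassert>   // assert() for the example checks
#include <cstdint>

// MBytes/s for 1,000,000 one-byte operations that took `cycles` CPU cycles.
double mbytes_per_sec_from_cycles(double f_cpu_hz, uint32_t cycles) {
  const double ops = 1e6;
  double seconds = cycles / f_cpu_hz;
  return ops / seconds / 1e6;  // with ops = 1e6 this is f_cpu_hz / cycles
}

// Inverse: average CPU cycles per operation for a reported MBytes/s figure.
double cycles_per_op(double f_cpu_hz, double mbytes_per_sec) {
  return f_cpu_hz / (mbytes_per_sec * 1e6);
}
```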
 
Thanks for the tidbit, @defragster. I wasn't aware of ARM_DWT_CYCCNT.

Glad to point it out. It is of course the MCU's cycle clock counter and at 600 MHz has 600 times more resolution than micros() - and on T_4.x it is used to resolve micros() between millis().

It is generally very useful even for the shortest set of instructions/elapsed time and takes about 3 clock cycles to read, versus the 35-40 cycles it takes to resolve micros() when used in a tight spot like _isr() timing.

It also works on the T_3.x family, but there ARM_DWT_CYCCNT currently isn't started at reset the way it is on the T_4.x family, where it is used in resolving micros().

Though at 600 MHz it does wrap every (2^32)/600,000,000 seconds (about 7.16 s) when using uint32_t's. @luni has made a lib with a 64-bit-aware version.
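As long as the interval being measured is shorter than that wrap period, the usual `end - start` subtraction on uint32_t values still gives the right answer across a single wrap, because unsigned arithmetic is modulo 2^32. A host-side sketch with hypothetical helper names:

```cpp
#include <cassert>   // assert() for the example checks
#include <cstdint>

// Seconds until a 32-bit cycle counter wraps at the given clock rate.
double wrap_seconds(double f_cpu_hz) { return 4294967296.0 / f_cpu_hz; }

// Elapsed cycles between two counter readings; correct across one wrap
// because uint32_t subtraction is performed modulo 2^32.
uint32_t elapsed_cycles(uint32_t start, uint32_t end) {
  return static_cast<uint32_t>(end - start);
}
```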
 

OK, given the name of the register I'd been wondering why it was updating only every microsecond, but I see that isn't the case at all. Good to know when using it to convert to time.

Does it update even after noInterrupts() is called?
 

Once enabled the ARM_DWT_CYCCNT ticks on each clock cycle of the MCU without pause.

In doing the micros()-based offset to millis() code, or other testing, I did something like this:

uint32_t anArray[10];

then in some fashion (a loop, or unrolled, using index 'ii') anArray[ii] = ARM_DWT_CYCCNT;

You'll see indications of about 3 clock ticks at 600 MHz when printing the difference anArray[ii+1] - anArray[ii].
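The timestamp-array technique can be sketched as follows. The capture loop appears only as a comment because it reads ARM_DWT_CYCCNT and so only runs on the Teensy; the difference calculation is plain C++ that can be checked anywhere. `stamp_deltas()` is a hypothetical name.

```cpp
#include <cassert>   // assert() for the example checks
#include <cstddef>
#include <cstdint>

// On a Teensy 4.x the stamps would be captured back to back, e.g.:
//   for (size_t ii = 0; ii < 10; ii++) anArray[ii] = ARM_DWT_CYCCNT;
// An unrolled version keeps the loop overhead out of the differences.

// Fill deltas[ii] with stamps[ii+1] - stamps[ii] for ii = 0 .. n-2.
void stamp_deltas(const uint32_t* stamps, size_t n, uint32_t* deltas) {
  for (size_t ii = 0; ii + 1 < n; ++ii) {
    deltas[ii] = stamps[ii + 1] - stamps[ii];  // wrap-safe unsigned subtraction
  }
}
```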
 

Alright! This will be useful for profiling code sections. I have some concerns about execution speed with an ISR inside something I'm working on, and this looks like a way to check it out.

Many thanks for the information.
 
Hello, I was very lucky to find this post of yours. I recently came across some code for the Teensy 4 that uses an ADS901E ADC to acquire signals. The ADS901E's 10 data pins are wired with pin 12 (MSB) to Teensy 4 pin 14, then 11 to 15, 10 to 16, ..., and 3 to 23. The code is as follows:

/* Return unsigned integer (0-1023) from 10 Continuous GPIO pins (14-23, MSb on 14) (takes ~50.1ns) */
uint16_t analogRead()
{
    // GPIO register bit order: 2, 3, 16, 17, 18, 19, 22, 23, 24, 25, 26, 27
    // Teensy pin order:        1, 0, 19, 18, 14, 15, 17, 16, 22, 23, 20, 21
    // All pins are on GPIO6

    uint16_t gpio_reg = *(&GPIO6_DR + 2) >> 16;
    uint16_t val = ((gpio_reg & 0x0200) >> 9) | // pin 23 (GPIO 25)
                   ((gpio_reg & 0x0100) >> 7) | // pin 22 (GPIO 24)
                   ((gpio_reg & 0x0800) >> 9) | // pin 21 (GPIO 27)
                   ((gpio_reg & 0x0400) >> 7) | // pin 20 (GPIO 26)
                   ((gpio_reg & 0x0003) << 4) | // pins 19,18 (GPIO 16,17)
                    (gpio_reg & 0x00C0)       | // pins 17,16 (GPIO 22,23)
                   ((gpio_reg & 0x0008) << 5) | // pin 15 (GPIO 19)
                   ((gpio_reg & 0x0004) << 7);  // pin 14 (GPIO 18)
    return val;
}

What puzzles me is why the code contains these shifts. Why are the bits shifted like this? If you see this question, could you take a moment to answer it? Thank you very much.
 

 
Translation of above:
Hello, I am very lucky to have found your blog post. I recently saw some code for the Teensy 4 that uses an ADS901E ADC to collect signals. The ADS901E's 10 data pins are connected with pin 12 (MSB) to Teensy 4 pin 14, then 11 to 15, 10 to 16, ..., 3 to 23. The code is as follows:

/* Return unsigned integer (0-1023) from 10 Continuous GPIO pins (14-23, MSb on 14) (takes ~50.1ns) */
uint16_t analogRead()
{
  // GPIO register bit sequence: 2, 3, 16, 17, 18, 19, 22, 23, 24, 25, 26, 27
  // Teensy pin order:           1, 0, 19, 18, 14, 15, 17, 16, 22, 23, 20, 21
  // All pins are on GPIO6

  uint16_t gpio_reg = *(&GPIO6_DR + 2) >> 16;
  uint16_t val = ((gpio_reg & 0x0200) >> 9) | // Pin 23 (GPIO 25)
                 ((gpio_reg & 0x0100) >> 7) | // Pin 22 (GPIO 24)
                 ((gpio_reg & 0x0800) >> 9) | // Pin 21 (GPIO 27)
                 ((gpio_reg & 0x0400) >> 7) | // Pin 20 (GPIO 26)
                 ((gpio_reg & 0x0003) << 4) | // Pins 19,18 (GPIO 16,17)
                  (gpio_reg & 0x00C0)       | // Pins 17,16 (GPIO 22,23)
                 ((gpio_reg & 0x0008) << 5) | // Pin 15 (GPIO 19)
                 ((gpio_reg & 0x0004) << 7);  // Pin 14 (GPIO 18)
  return val;
}

What confuses me most is why the code shifts the bits like this. Why are they shifted this way? If you see this question, could you take a moment to answer it? Thank you so much.
 
I assume you are asking about the (translated) statement "uint16_t gpio_reg=*(&GPIO6_DR+2)>>16".

I don't think it will work the way the writer wanted it to. When I started working on my scheme, I thought I could avoid some right-shifting by just reading the upper 16 bits of DR but it didn't work -- the data I got was incorrect when compared to a 32-bit read. I concluded that the processor doesn't "like" accessing just one half of a register. It's all 32 bits on integral register address boundaries, or nothing.
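One detail worth double-checking in that statement: in C and C++, adding 2 to a pointer advances it by two elements, not two bytes. Assuming GPIO6_DR is declared as a `volatile uint32_t` (as I understand the Teensyduino imxrt.h header does it), `&GPIO6_DR + 2` points 8 bytes past DR. In the i.MX RT1060 GPIO register layout (DR at offset 0x0, GDIR at 0x4, PSR at 0x8) that would be GPIO6_PSR, the pad status register, read as a full 32-bit word and then shifted right by 16. The sketch below uses a plain array as a hypothetical stand-in for the register block to show the offset.

```cpp
#include <cassert>   // assert() for the example checks
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the start of a GPIO register block:
// index 0 = DR (offset 0x0), 1 = GDIR (0x4), 2 = PSR (0x8).
volatile uint32_t fake_gpio[3] = {0x11110000u, 0x22220000u, 0xABCD1234u};

// Same expression shape as the quoted code: element offset 2, then >> 16.
uint16_t upper_half_of_word2() {
  return uint16_t(*(&fake_gpio[0] + 2) >> 16);
}

// Byte distance covered by "+ 2" on a uint32_t pointer.
std::ptrdiff_t byte_offset_of_plus_two() {
  const volatile uint32_t* base = &fake_gpio[0];
  return reinterpret_cast<const volatile char*>(base + 2) -
         reinterpret_cast<const volatile char*>(base);
}
```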

I also want to comment that the code which assembles the 10-bit output value is very inefficient. GPIO6_DR has 6 contiguous bits (27...22), so all of them could be masked off together and the result shifted to fill in bits 10...5 of "val". Finish off the lower 4 bits by again masking off GPIO6_DR bits 19...16, then shifting them over to fill in bits 4...1 of "val". (Bit 1 is the LSB in this discussion.)

My recommended approach depends on connecting the ADS901E's output pins to the T4.0 appropriately. They will NOT be connected in contiguous order as defined by the Teensyduino D0...Dnn pin numbering (besides scrambling the register bit order, consecutive pin numbers map to different registers). Connect them according to the GPIO6_DR bit-to-Arduino-pin mapping instead.
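A sketch of that mask-and-shift approach, assuming the ADC is rewired so its 10 data bits, MSB first, land on GPIO6 bits 27...22 and 19...16 (Teensy pins 21, 20, 23, 22, 16, 17, 15, 14, 18, 19, per the pin table in the first post). Bit numbers here are 0-based, so "bits 10...5" above become val bits 9...4. The function name is hypothetical, and it takes the raw 32-bit value as a parameter so it can be checked off-target; on the Teensy it would be passed GPIO6_DR.

```cpp
#include <cassert>   // assert() for the example checks
#include <cstdint>

// val bits 9-4 <- GPIO6 bits 27-22 (six contiguous bits, one shift)
// val bits 3-0 <- GPIO6 bits 19-16 (four contiguous bits, one shift)
uint16_t assemble10(uint32_t raw) {
  uint16_t hi6 = uint16_t((raw >> 18) & 0x3f0u);  // 27-22 -> 9-4
  uint16_t lo4 = uint16_t((raw >> 16) & 0x00fu);  // 19-16 -> 3-0
  return uint16_t(hi6 | lo4);
}
```

Two masks and two shifts replace the eight separate terms, which is the saving described above.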

A T4.1 would be even easier since it has 16 contiguous bits brought out to its pins. That's what I'm using for a 16 bit 1MSPS ADC board I designed.
 
Thanks for the answer! But I am still confused. Apart from the first line of code, why are the following lines shifted like this? I am not asking why they are shifted, but what the basis for these particular shifts is. Where do the shift amounts come from? Looking forward to your reply!

uint16_t val = ((gpio_reg & 0x0200) >> 9) | // Pin 23 (GPIO 25)
               ((gpio_reg & 0x0100) >> 7) | // Pin 22 (GPIO 24)
               ((gpio_reg & 0x0800) >> 9) | // Pin 21 (GPIO 27)
               ((gpio_reg & 0x0400) >> 7) | // Pin 20 (GPIO 26)
               ((gpio_reg & 0x0003) << 4) | // Pin 19,18 (GPIO 16,17)
                (gpio_reg & 0x00C0)       | // Pins 17,16 (GPIO 22,23)
               ((gpio_reg & 0x0008) << 5) | // Pin 15 (GPIO 19)
               ((gpio_reg & 0x0004) << 7);  // Pin 14 (GPIO 18)
 
Shifting is one of the fundamental operators in C and C++ (just like "+", "-" and "/" are). All you have to do is write ">>" and the compiler spits out the machine code needed for that operation. It is an operator that is extensively used when altering I/O registers.

Also, pay attention to the vertical bar at the end of each line. It is the OR operator so it combines the mask and shift operations. This section of code actually is ONE statement that assembles and combines the bits in gpio_reg and puts them into "val".
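To answer the "where do the numbers come from" part more directly: after `>> 16`, gpio_reg bit k holds GPIO6 bit k+16, and the wiring fixes which bit of the 10-bit result each Teensy pin carries (pin 14 is the MSB, bit 9, down to pin 23 as bit 0). Each term masks one gpio_reg bit and shifts it by (destination bit minus source bit): left if positive, right if negative. For example, pin 15 is GPIO6 bit 19, so it sits in gpio_reg bit 3 and must end up in result bit 8, hence `<< 5`. The sketch below (hypothetical names) derives every shift from a pin table and can be compared against the hand-written expression for all 65536 inputs.

```cpp
#include <cassert>   // assert() for the example checks
#include <cstdint>

// The hand-written expression from the question, on the value after ">> 16".
uint16_t original_expr(uint16_t gpio_reg) {
  return uint16_t(((gpio_reg & 0x0200) >> 9) |  // pin 23 (GPIO 25)
                  ((gpio_reg & 0x0100) >> 7) |  // pin 22 (GPIO 24)
                  ((gpio_reg & 0x0800) >> 9) |  // pin 21 (GPIO 27)
                  ((gpio_reg & 0x0400) >> 7) |  // pin 20 (GPIO 26)
                  ((gpio_reg & 0x0003) << 4) |  // pins 19,18 (GPIO 16,17)
                  (gpio_reg & 0x00C0) |         // pins 17,16 (GPIO 22,23)
                  ((gpio_reg & 0x0008) << 5) |  // pin 15 (GPIO 19)
                  ((gpio_reg & 0x0004) << 7));  // pin 14 (GPIO 18)
}

// Teensy pin -> GPIO6 bit (from core_pins.h) and its bit in the 10-bit result.
struct PinMap { int pin; int gpio_bit; int dest_bit; };
const PinMap kWiring[10] = {
  {14, 18, 9}, {15, 19, 8}, {16, 23, 7}, {17, 22, 6}, {18, 17, 5},
  {19, 16, 4}, {20, 26, 3}, {21, 27, 2}, {22, 24, 1}, {23, 25, 0},
};

// Same result built bit by bit: shift = dest_bit - (gpio_bit - 16).
uint16_t table_driven(uint16_t gpio_reg) {
  uint16_t val = 0;
  for (const PinMap& m : kWiring) {
    int src = m.gpio_bit - 16;     // position inside gpio_reg
    int shift = m.dest_bit - src;  // > 0 means <<, < 0 means >>
    uint16_t bit = uint16_t(gpio_reg & (1u << src));
    val |= uint16_t(shift >= 0 ? bit << shift : bit >> -shift);
  }
  return val;
}
```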
 