#define that can evaluate?

mhavoc

Hi All...

I'm pretty far down a rabbit hole trying to make a set of macros that will do all my GPIO bit shifting and logical OR'ing to build up 32-bit variables, in such a way that when I move I/O pins around on a schematic, it is much easier to update the code.

In short, I'm trying to move away from this type of thing, which is fast but a 'bit' of a nightmare to maintain. Thanks in advance!! :cool:

register uint16_t address;
register uint32_t GPIO6_data=0;
register uint32_t GPIO7_data=0;
register uint32_t GPIO8_data=0;
register uint32_t GPIO9_data=0;

GPIO6_data = GPIO6_DR;
GPIO7_data = GPIO7_DR;
GPIO8_data = GPIO8_DR;
GPIO9_data = GPIO9_DR;

address = ( (GPIO6_data&0x00020000) >> 17 ) | // A0 Teensy 4.0 PIN_18 GPIO6_DR[17]
( (GPIO6_data&0x00010000) >> 15 ) | // A1 Teensy 4.0 PIN_19 GPIO6_DR[16]
( (GPIO6_data&0x04000000) >> 24 ) | // A2 Teensy 4.0 PIN_20 GPIO6_DR[26]
( (GPIO7_data&0x00000800) >> 8 ) | // A3 Teensy 4.0 PIN_9 GPIO7_DR[11]

( (GPIO7_data&0x00020000) >> 13 ) | // A4 Teensy 4.0 PIN_7 GPIO7_DR[17]
( (GPIO9_data&0x00000100) >> 3 ) | // A5 Teensy 4.0 PIN_5 GPIO9_DR[8]
( (GPIO6_data&0x01000000) >> 18 ) | // A6 Teensy 4.0 PIN_22 GPIO6_DR[24]
( (GPIO9_data&0x00000040) << 1 ) | // A7 Teensy 4.0 PIN_4 GPIO9_DR[6]

( (GPIO6_data&0x02000000) >> 17 ) | // A8 Teensy 4.0 PIN_23 GPIO6_DR[25]
( (GPIO6_data&0x08000000) >> 18 ) | // A9 Teensy 4.0 PIN_21 GPIO6_DR[27]
( (GPIO7_data&0x00010000) >> 6 ) | // A10 Teensy 4.0 PIN_8 GPIO7_DR[16]
( (GPIO7_data&0x00000400) << 1 ) | // A11 Teensy 4.0 PIN_6 GPIO7_DR[10]

( (GPIO9_data&0x00000010) << 8 ) | // A12 Teensy 4.0 PIN_2 GPIO9_DR[4]
( (GPIO9_data&0x00000020) << 8 ) | // A13 Teensy 4.0 PIN_3 GPIO9_DR[5]
( (GPIO6_data&0x00000004) << 12 ) | // A14 Teensy 4.0 PIN_1 GPIO6_DR[2]
( (GPIO6_data&0x00000008) << 12 ) ; // A15 Teensy 4.0 PIN_0 GPIO6_DR[3]

... and more towards something like this. Note that the incrementing number in the second parameter to MAPR and MAPL is the bit position that I want the GPIO bit to end up in, in this case for a 16-bit variable.

address = (MAPR(PIN_ADDR0,0)) |
(MAPR(PIN_ADDR1,1))|
(MAPR(PIN_ADDR2,2))|
(MAPR(PIN_ADDR3,3))|
(MAPR(PIN_ADDR4,4))|
(MAPR(PIN_ADDR5,5))|
(MAPL(PIN_ADDR6,6))|
(MAPL(PIN_ADDR7,7))|
(MAPR(PIN_ADDR8,8))|
(MAPR(PIN_ADDR9,9))|
(MAPR(PIN_ADDR10,10))|
(MAPR(PIN_ADDR11,11))|
(MAPL(PIN_ADDR12,12))|
(MAPR(PIN_ADDR13,13))|
(MAPR(PIN_ADDR14,14))|
(MAPL(PIN_ADDR15,15));

First caveat... I KNOW this is just making something overly complicated in the opposite direction. But if I can define these MACROS, then I can use this approach on all my projects and it will save time.

The reason I needed to have a MAPR and a MAPL is that sometimes the original GPIO bit location is to the LEFT of the target/wanted bit position and sometimes it is to the RIGHT. So I end up having to have two different #defines as I can't pass negative numbers to a bitwise shift operator.

Second caveat... I'm going to this extreme as well because I'm trying to do the GPIO reads as fast as possible so I don't want any code execution, or am I being silly?

#define MAPR(pin,targetbit) ((REGISTER(pin) & (1u << BITSH(pin))) >> (BITSH(pin) - (targetbit)))
#define MAPL(pin,targetbit) ((REGISTER(pin) & (1u << BITSH(pin))) << ((targetbit) - BITSH(pin)))

My question is how can I get away from having double MACROS based upon if I need to shift right or left?

Ideas I had that don't work
1. negative shifting - compiler error
2. over-shifting (like rotating) - shifts can't rotate
3. #if inside of #define - not allowed

Here are all the DEFINES that allow this to work.

#define PIN_ADDR0 9
#define PIN_ADDR1 8
#define PIN_ADDR2 7
#define PIN_ADDR3 6
#define PIN_ADDR4 5
#define PIN_ADDR5 4
#define PIN_ADDR6 3
#define PIN_ADDR7 2
#define PIN_ADDR8 22
#define PIN_ADDR9 28
#define PIN_ADDR10 26
#define PIN_ADDR11 27
#define PIN_ADDR12 1
#define PIN_ADDR13 23
#define PIN_ADDR14 17
#define PIN_ADDR15 0



//The macros below are based upon the output of the GPIO map for the Teensy 4.0
//PIN GPIOn-BITm | GPIOn-BITm PIN
//------------------|-------------------
//00 -> GPIO6-03 | GPIO6-02 -> 01
//01 -> GPIO6-02 | GPIO6-03 -> 00
//02 -> GPIO9-04 | GPIO6-12 -> 24
//03 -> GPIO9-05 | GPIO6-13 -> 25
//04 -> GPIO9-06 | GPIO6-16 -> 19
//05 -> GPIO9-08 | GPIO6-17 -> 18
//06 -> GPIO7-10 | GPIO6-18 -> 14
//07 -> GPIO7-17 | GPIO6-19 -> 15
//08 -> GPIO7-16 | GPIO6-22 -> 17
//09 -> GPIO7-11 | GPIO6-23 -> 16
//10 -> GPIO7-00 | GPIO6-24 -> 22
//11 -> GPIO7-02 | GPIO6-25 -> 23
//12 -> GPIO7-01 | GPIO6-26 -> 20
//13 -> GPIO7-03 | GPIO6-27 -> 21
//14 -> GPIO6-18 | GPIO6-30 -> 26
//15 -> GPIO6-19 | GPIO6-31 -> 27
//16 -> GPIO6-23 | GPIO7-00 -> 10
//17 -> GPIO6-22 | GPIO7-01 -> 12
//18 -> GPIO6-17 | GPIO7-02 -> 11
//19 -> GPIO6-16 | GPIO7-03 -> 13
//20 -> GPIO6-26 | GPIO7-10 -> 06
//21 -> GPIO6-27 | GPIO7-11 -> 09
//22 -> GPIO6-24 | GPIO7-12 -> 32
//23 -> GPIO6-25 | GPIO7-16 -> 08
//24 -> GPIO6-12 | GPIO7-17 -> 07
//25 -> GPIO6-13 | GPIO8-12 -> 37
//26 -> GPIO6-30 | GPIO8-13 -> 36
//27 -> GPIO6-31 | GPIO8-14 -> 35
//28 -> GPIO8-18 | GPIO8-15 -> 34
//29 -> GPIO9-31 | GPIO8-16 -> 39
//30 -> GPIO8-23 | GPIO8-17 -> 38
//31 -> GPIO8-22 | GPIO8-18 -> 28
//32 -> GPIO7-12 | GPIO8-22 -> 31
//33 -> GPIO9-07 | GPIO8-23 -> 30
//34 -> GPIO8-15 | GPIO9-04 -> 02
//35 -> GPIO8-14 | GPIO9-05 -> 03
//36 -> GPIO8-13 | GPIO9-06 -> 04
//37 -> GPIO8-12 | GPIO9-07 -> 33
//38 -> GPIO8-17 | GPIO9-08 -> 05
//39 -> GPIO8-16 | GPIO9-31 -> 29

#define _r0 GPIO6_data
#define _p0 3
#define _r1 GPIO6_data
#define _p1 2
#define _r2 GPIO9_data
#define _p2 4
#define _r3 GPIO9_data
#define _p3 5
#define _r4 GPIO9_data
#define _p4 6
#define _r5 GPIO9_data
#define _p5 8
#define _r6 GPIO7_data
#define _p6 10
#define _r7 GPIO7_data
#define _p7 17
#define _r8 GPIO7_data
#define _p8 16
#define _r9 GPIO7_data
#define _p9 11
#define _r10 GPIO7_data
#define _p10 0
#define _r11 GPIO7_data
#define _p11 2
#define _r12 GPIO7_data
#define _p12 1
#define _r13 GPIO7_data
#define _p13 3
#define _r14 GPIO6_data
#define _p14 18
#define _r15 GPIO6_data
#define _p15 19
#define _r16 GPIO6_data
#define _p16 23
#define _r17 GPIO6_data
#define _p17 22
#define _r18 GPIO6_data
#define _p18 17
#define _r19 GPIO6_data
#define _p19 16
#define _r20 GPIO6_data
#define _p20 26
#define _r21 GPIO6_data
#define _p21 27
#define _r22 GPIO6_data
#define _p22 24
#define _r23 GPIO6_data
#define _p23 25
#define _r24 GPIO6_data
#define _p24 12
#define _r25 GPIO6_data
#define _p25 13
#define _r26 GPIO6_data
#define _p26 30
#define _r27 GPIO6_data
#define _p27 31
#define _r28 GPIO8_data
#define _p28 18
#define _r29 GPIO9_data
#define _p29 31
#define _r30 GPIO8_data
#define _p30 23
#define _r31 GPIO8_data
#define _p31 22
#define _r32 GPIO7_data
#define _p32 12
#define _r33 GPIO9_data
#define _p33 7
#define _r34 GPIO8_data
#define _p34 15
#define _r35 GPIO8_data
#define _p35 14
#define _r36 GPIO8_data
#define _p36 13
#define _r37 GPIO8_data
#define _p37 12
#define _r38 GPIO8_data
#define _p38 17
#define _r39 GPIO8_data
#define _p39 16


#define REGISTER(pin) _r##pin
#define BITSH(pin) _p##pin
 
1. Don't use "register". It's outdated, and the compiler knows better what should go in a register. ARM Cortex doesn't have that many registers to work with, so declaring something as "register" increases register pressure and can result in slower code.
2. You can simply use the "?" operator (e.g. (a<b)?true:false;) or even a normal if..{}else....
If everything is constant, the compiler will notice this and automatically generate the most compact code.

Example:
if (1) Serial.print("xy"); - the compiler will optimize the if(1) away completely. The same goes for more complex expressions, as long as they are constant. If they are not, gcc tries to optimize as much as possible (and will often be better at it than you ;)
or: x = 7>5?1+1:3+9; -> compiles as x=2;


Just trust the compiler. It is much better than you think.
And always remember that #define is nothing more than a super simple text search-and-replace before compiling. It does nothing else.


3. do you know digitalWriteFast(const,...)/digitalReadFast(const)?
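
To make point 2 concrete, here is a minimal sketch of a single macro that picks the shift direction with "?". MAP_BIT, SHIFT_R, SHIFT_L and map_two_bits are hypothetical names made up for this illustration, not something from the Teensy core. It assumes both bit positions are compile-time constants, in which case the compiler should be able to fold each use down to one mask and one shift.

```c
#include <stdint.h>

/* Hypothetical helpers: both branches of ?: are still compiled, so each
   shift count is clamped to 0 in the branch that is not taken. That avoids
   a negative shift count ever appearing in the expression. */
#define SHIFT_R(from, to) ((from) >= (to) ? (from) - (to) : 0)
#define SHIFT_L(from, to) ((to) > (from) ? (to) - (from) : 0)

/* Move bit `from` of `value` to bit position `to`, in either direction. */
#define MAP_BIT(value, from, to)                                               \
    (((from) >= (to))                                                          \
        ? (((uint32_t)(value) & (UINT32_C(1) << (from))) >> SHIFT_R(from, to)) \
        : (((uint32_t)(value) & (UINT32_C(1) << (from))) << SHIFT_L(from, to)))

/* Example: source bit 17 -> result bit 0, source bit 6 -> result bit 7. */
static uint32_t map_two_bits(uint32_t gpio)
{
    return MAP_BIT(gpio, 17, 0) | MAP_BIT(gpio, 6, 7);
}
```

The same macro handles the "right" and "left" cases, so the caller no longer has to know which side of the target bit the source bit sits on.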
 
do you know digitalWriteFast(const,...)/digitalReadFast(const)?

Yes you are being silly. digitalWriteFast will be as fast or faster than anything you write yourself. ( Unless you want to just toggle a pin where you can be faster by writing directly to the toggle register ).

Just define your pins with names and change the defines to 'rewire' your board.

Code:
#define TX_PIN 12
#define RX_PIN 11

  digitalWriteFast( TX_PIN, LOW );
 
I also once wanted to be as fast as possible with the I/O and wrote a speed test program for the Teensy 3.6 ( it may not run on Teensy 4 ). It proved to me that you won't invent anything faster than digitalWriteFast.

Code:
/* LED Blink, Teensyduino Tutorial #1
   http://www.pjrc.com/teensy/tutorial.html
 
   This example code is in the public domain.
*/

// Teensy 2.0 has the LED on pin 11
// Teensy++ 2.0 has the LED on pin 6
// Teensy 3.x / Teensy LC have the LED on pin 13
const int ledPin = 13;
elapsedMicros time1;
float time3, time4, time2;


// the setup() method runs once, when the sketch starts

void setup() {
  // initialize the digital pin as an output.
  pinMode(ledPin, OUTPUT);
  Serial.begin(9600);
  
}

// the loop() method runs over and over again,
// as long as the board has power

void loop() {
long i;

  time1 = 0;
  for( i = 0; i < 10000000L; ++i ){
    digitalWrite(ledPin, HIGH);   // set the LED on
    digitalWrite(ledPin, LOW);   // set the LED off
  }
  time3 = time1;

  time1 = 0;
  for( i = 0; i < 10000000L; ++i ){
  // use the internal register name for bit set, bit clear
     GPIOC_PSOR =  ( 1 << 5 );
     GPIOC_PCOR =  ( 1 << 5 );
  }
  time4 = time1;

  // just for fun toggle the LED
  // changed to test digitalWriteFast
  time1 = 0;
  for( i = 0; i < 10000000L; ++i ){
    //  GPIOC_PTOR =  ( 1 << 5 );
    digitalWriteFast(ledPin, HIGH);   // set the LED on
    digitalWriteFast(ledPin, LOW);   // set the LED off

  }
  time2 = time1;

  Serial.println( "Teensy 3.6 running at 120 mhz");
  Serial.print( "Digital Write Test ");  Serial.print( time3/1000000.0,3 ); Serial.println(" seconds");
  Serial.print( " Set Clear Test    ");  Serial.print( time4/1000000.0,3 ); Serial.println(" seconds");
  Serial.print( "Digital Write Fast ");  Serial.print( time2/1000000.0,3 ); Serial.println(" seconds");
  Serial.println();
  
  delay(1000);                  
  
}
 
Yes you are being silly.
I was not answering you.
I also once wanted to be as fast as possible with the I/O and wrote a speed test program for the Teensy 3.6 ( it may not run on Teensy 4 ). It proved to me that you won't invent anything faster than digitalWriteFast.
For single pins, true.
For more than one: Often, but not always. Depends....
 
I guess I'm doing all of this to do a single write to the register instead of 32 individual commands where my digital pins are all being set with a 6ns delay between them... I need my outputs to be *very* responsive (like 70ns)... if I go sequentially, then I will have a 16*6 ns delay in just writing.. which is 96ns. I will certainly do some benchmarking soon (my prototype will show up this week) but I still feel that the register approach is going to be FASTER than sequentially writing each pin.
 
thanks for the boilerplate... I will set up some benchmarks of both approaches and report back. Not sure my crappy digital scope will be fast enough to actually look at my signals but from the software side I should be able to get some timing info with your code..

 
If you want to use arbitrary pins on the Teensy as an I/O bus, you don't need such convoluted macros.

On Teensy 4.x, there are four banks of 32 bits each, so a single number between 0 and 127 suffices to identify a pin. 0-31 are GPIO6, 32-63 GPIO7, 64-95 GPIO8, and 96-127 GPIO9.
If their states are described by an array of four 32-bit values, uint32_t state[4], then bit i state is (state[i/32] & (1 << (i & 31))).
The state is boolean, i.e. zero if the bit is zero, and nonzero if the bit is nonzero. If you want it to be zero or one, use !!(state[i/32] & (1 << (i & 31))).

(! is just the logical not operator. The not-not operator evaluates to 1 if the argument is true/nonzero, and to 0 if false/zero. The compiler can optimize this kind of code quite well.)
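
A minimal sketch of that indexing scheme, assuming the 0-127 numbering described above. The helper names pin_state and pin_index are made up for illustration.

```c
#include <stdint.h>

/* Pin index convention from the text: 0-31 GPIO6, 32-63 GPIO7,
   64-95 GPIO8, 96-127 GPIO9. Returns 0 or 1. */
static inline int pin_state(const uint32_t state[4], unsigned i)
{
    return !!(state[i / 32] & (UINT32_C(1) << (i & 31)));
}

/* Hypothetical helper: GPIO bank number (6..9) and bit (0..31) -> index 0..127. */
static inline unsigned pin_index(unsigned gpio_bank, unsigned bit)
{
    return (gpio_bank - 6u) * 32u + (bit & 31u);
}
```

For example, GPIO7 bit 11 (Teensy 4.0 pin 9) becomes index 43, and pin_state(state, 43) reads it back as 0 or 1.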

If you want a parallel output bus using arbitrary Teensy pins for the bus, you use a look-up table. Each look-up table entry on Teensy 4.x is 4×32=128 bits as above. If the bus is N bits wide, you need 2^N entries. Thus, for example, an 8-bit bus will need an array uint32_t bus_lookup[256][4]; taking 4096 bytes of tightly coupled RAM. To save memory, you can of course split wide buses into narrower ones. For example, a 16-bit bus would need 1048576 bytes (so you'd need to put it in PSRAM), but two 8-bit buses only 8192 bytes. You initialize the look-up table to zeroes, then assign a pin (bit index, 0-127, as above) to a bus bit using a loop over the look-up table,
Code:
static void add_output_bus_mapping(uint32_t size, uint32_t table[][4], uint_fast8_t pinbit, uint_fast8_t busbit)
{
    const uint32_t  busmask = UINT32_C(1) << busbit;

    /* Every table entry whose index has the bus bit set gets the pin bit set. */
    for (uint32_t i = 0; i < size; i++)
        if (i & busmask)
            table[i][pinbit/32] |= UINT32_C(1) << (pinbit & 31);
}
where size is 2^(bus width in bits).

To initialize the bus state, you write the initial bus state via the look-up table to the port SET register; its inverted state (~) to the port CLEAR register, and save the bus value in a variable.
To change the bus state, you write the exclusive-OR (^) of the old and new states, via the look-up table, to the port TOGGLE register; then save the new state (as is) to the bus state variable.
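
The set/clear/toggle sequence described above can be sketched as follows. This is only an illustration under simplifying assumptions: a 4-bit bus on a single GPIO bank, and the SET, CLEAR and TOGGLE registers replaced by stand-in functions (port_set, port_clear, port_toggle) acting on a plain variable so the logic can be followed off-target. All names here are hypothetical.

```c
#include <stdint.h>

/* Stand-ins for one bank's SET / CLEAR / TOGGLE registers. On real
   hardware these would be single register writes. */
static uint32_t mock_port;
static void port_set(uint32_t m)    { mock_port |= m; }
static void port_clear(uint32_t m)  { mock_port &= ~m; }
static void port_toggle(uint32_t m) { mock_port ^= m; }

static uint32_t bus_table[16];   /* 4-bit bus, single bank: 2^4 entries */
static uint32_t bus_value;       /* last value written to the bus */

/* Assign bank bit `pinbit` (0-31) to bus bit `busbit` (0-3). */
static void bus_map(unsigned pinbit, unsigned busbit)
{
    for (uint32_t i = 0; i < 16; i++)
        if (i & (UINT32_C(1) << busbit))
            bus_table[i] |= UINT32_C(1) << pinbit;
}

/* Initial state: set the pins for 1 bits, clear the pins for 0 bits. */
static void bus_init(uint32_t value)
{
    port_set(bus_table[value & 15]);
    port_clear(bus_table[~value & 15]);
    bus_value = value & 15;
}

/* Later updates: toggle only the pins whose bus bit changed. */
static void bus_write(uint32_t value)
{
    port_toggle(bus_table[(bus_value ^ value) & 15]);
    bus_value = value & 15;
}
```

After bus_init, every bus_write costs one table lookup and one register write per bank, regardless of how many bus bits change.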

Note that you can even duplicate bus bits via the look-up method. There is a slight difference in the GPIO pin state changes in successive banks, a few clock cycles (and you might wish to disable interrupts while setting them if parallelism is important), but it is absolutely the fastest method to do this. If you can limit the pins to a specific GPIO bank, not only do you need only 32 bits per look-up entry (and thus only a quarter of RAM for the lookup table), but all pins on the bus will change their state at the same time.

An input bus is more complicated. A parallel bus on arbitrary pins, up to 32 bits wide, can be implemented using 4-bit (2048 bytes), 6-bit (6144 bytes), 7-bit (10240 bytes), 8-bit (16384 bytes), 11-bit (98304 bytes), or 16-bit (2097152 bytes) lookup tables; the smallest need the most computational work, the largest the least. I believe the 8-bit ones are most suitable.
The idea is that the 128-bit GPIO6-9 state is divided into slices of the same size, noting that there is a natural boundary at 32-bit intervals (making 4, 8, and 16-bit lookup tables most suitable).
If the slice is N bits wide, it has 2^N entries. Each lookup table entry is a 32-bit value, representing the input bus state.

With 8-bit slices, we have 16 slices: four slices per GPIO port, and four GPIO ports. Each slice has 256 entries, each entry being 32-bit or four bytes in size. Thus, 4×256×16 = 16384 bytes in the lookup tables.
When reading the bus state, you read the states of all four GPIO ports. You use a loop that extracts the slice width from the GPIO port state, looks it up in the corresponding slice, and OR's the lookup states together. The input bus state is the binary OR (|) of the lookup values.

(This means that if you map two Teensy pins to the same input bus bit, the result is the OR of their state. If you use exclusive-or (^) above instead, you get exclusive-OR of the pins also. But, you must choose exactly one behaviour – OR or exclusive OR – across all pins in the bus. The code changes depending on which one you use.)

Mapping from the four uint32_t's describing the GPIO pin states to an arbitrary bus using 8-bit slices can be done thus:
Code:
uint32_t  get_bus_state(const uint32_t gpio[4], const uint32_t lookup[4][4][256])
{
    uint32_t  result = 0;
    /* Slice 0 is the low byte of each bank, matching slice = (pinbit / 8) & 3 in the mapping function. */
    for (uint_fast8_t bank = 0; bank < 4; bank++)
        result |= lookup[bank][0][gpio[bank] & 255]
               |  lookup[bank][1][(gpio[bank] >> 8) & 255]
               |  lookup[bank][2][(gpio[bank] >> 16) & 255]
               |  lookup[bank][3][(gpio[bank] >> 24) & 255];
    return result;
}
and mapping input pin (bit numbers, 0-127 as above) to bus bit via
Code:
static void add_input_bus_mapping(uint32_t lookup[4][4][256], uint_fast8_t pinbit, uint_fast8_t busbit)
{
    const uint_fast8_t  bank = pinbit / 32;
    const uint_fast8_t  slice = (pinbit / 8) & 3;
    const uint_fast8_t  mask = 1 << (pinbit & 7);
    const uint32_t busmask = UINT32_C(1) << busbit;
    /* Every slice value that has the pin bit set contributes the bus bit. */
    for (uint32_t i = 0; i < 256; i++)
        if (i & mask)
            lookup[bank][slice][i] |= busmask;
}
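
To show the input lookup idea in a form that can be tried on a PC, here is a reduced sketch that assumes all bus pins sit in one GPIO bank, so the table shrinks to 4 slices of 256 entries (4096 bytes). The names lut, map_input and read_bus are hypothetical, and slice 0 is taken to be the low byte of the register.

```c
#include <stdint.h>

static uint32_t lut[4][256];   /* one bank: four byte-wide slices */

/* Map bank bit `pinbit` (0-31) to bus bit `busbit` (0-31). */
static void map_input(unsigned pinbit, unsigned busbit)
{
    const unsigned slice = pinbit / 8;          /* slice 0 = bits 0-7 */
    const unsigned mask  = 1u << (pinbit & 7);
    for (unsigned i = 0; i < 256; i++)
        if (i & mask)
            lut[slice][i] |= UINT32_C(1) << busbit;
}

/* Translate one raw 32-bit port reading into the bus value. */
static uint32_t read_bus(uint32_t gpio)
{
    return lut[0][gpio & 255]
         | lut[1][(gpio >> 8) & 255]
         | lut[2][(gpio >> 16) & 255]
         | lut[3][(gpio >> 24) & 255];
}
```

With map_input(17, 0), map_input(16, 1) and map_input(26, 2), a GPIO6 reading is turned into address bits A0-A2 of the original post with four lookups and three ORs, and moving a pin only means changing one map_input call.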

If you want to control, say, an ILI9341 display controller or similar with a parallel data interface, you might wish to have one 8-bit input and one 8-bit output bus for commands and data, and three 5 or 6-bit buses for the RGB data. Strobes and the D/C selector would be manipulated separately, to ensure you e.g. set data bits before strobing write, and so on. Depending on what you do, you probably have time to combine the RGB data channels from more than one source, say a 15-bit image plane and an 8-bit overlay plane with color masking, between bus state updates.



Teensy 4.0 has 16 pins in GPIO6 (2-3, 12-13, 16-19, 22-27, 30-31); 9 pins in GPIO7 (0-3, 10-12, 16-17); 9 pins in GPIO8 (12-18, 22-23); and 6 pins in GPIO9 (4-8, 31).
Teensy 4.1 has 20 pins in GPIO6 (2-3, 12-13, 16-31); 13 pins in GPIO7 (0-3, 10-12, 16-19, 28-29); 9 pins in GPIO8 (12-18, 22-23); and 13 pins in GPIO9 (4-8, 22, 24-29, 31).
Teensy Micromod has 16 pins in GPIO6 (2-3, 12-13, 16-19, 22-27, 30-31); 15 pins in GPIO7 (0-12, 16-17); 9 pins in GPIO8 (12-18, 22-23); and 6 pins in GPIO9 (4-8, 31).
The numbers and number ranges in parentheses refer to the bits in the GPIO registers, not Teensy pin numbers.

This means that for an arbitrary input or output bus, lookup table sizes can be reduced by compacting the input GPIO data. For example, GPIO8 only uses bits 12-23, with bits 12-18 contiguous, which makes a natural 7-bit lookup table. That leaves just two bits (22-23), which need only a two-bit, four-entry look-up table. So, although the absolutely generic way I described above will work, for an input bus you can still do better by using dedicated slicing instead of the generic one I showed. Limiting the bus to pins in one, two, or three banks will also significantly reduce the memory needed for the lookup tables.
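
As a rough sketch of that dedicated slicing for GPIO8, assuming the bit ranges listed above (12-18 and 22-23); the function and table names are hypothetical:

```c
#include <stdint.h>

/* GPIO8 only exposes bits 12-18 and 22-23, so the reading can be cut
   into one 7-bit slice and one 2-bit slice before the table lookup. */
static inline uint32_t gpio8_low7(uint32_t gpio8)  { return (gpio8 >> 12) & 0x7Fu; }
static inline uint32_t gpio8_high2(uint32_t gpio8) { return (gpio8 >> 22) & 0x03u; }

/* Dedicated tables: 128 + 4 entries instead of 4 x 256 for this bank. */
static uint32_t gpio8_lut7[128];
static uint32_t gpio8_lut2[4];

/* Bus contribution of the GPIO8 bank: two lookups, one OR. */
static uint32_t gpio8_bus(uint32_t gpio8)
{
    return gpio8_lut7[gpio8_low7(gpio8)] | gpio8_lut2[gpio8_high2(gpio8)];
}
```

That is 132 entries (528 bytes) for the bank instead of 1024 entries (4096 bytes), with two lookups instead of four.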
 
Sorry, I know I am late to the table.

Sometimes, with things like this, it is simplest to look at the sources. For example, for T4.x:

If you look at the implementation, it starts off like this:
https://github.com/PaulStoffregen/cores/blob/master/teensy4/core_pins.h#L2181

Code:
static inline void digitalWriteFast(uint8_t pin, uint8_t val)
{
	if (__builtin_constant_p(pin)) {
		if (val) {
			if (pin == 0) {
				CORE_PIN0_PORTSET = CORE_PIN0_BITMASK;
			} else if (pin == 1) {
				CORE_PIN1_PORTSET = CORE_PIN1_BITMASK;
			} else if (pin == 2) {
				CORE_PIN2_PORTSET = CORE_PIN2_BITMASK;
			} else if (pin == 3) {
				CORE_PIN3_PORTSET = CORE_PIN3_BITMASK;
			} else if (pin == 4) {
				CORE_PIN4_PORTSET = CORE_PIN4_BITMASK;
			} else if (pin == 5) {
				CORE_PIN5_PORTSET = CORE_PIN5_BITMASK;
			} else if (pin == 6) {
				CORE_PIN6_PORTSET = CORE_PIN6_BITMASK;
			} else if (pin == 7) {
				CORE_PIN7_PORTSET = CORE_PIN7_BITMASK;
			} else if (pin == 8) {
				CORE_PIN8_PORTSET = CORE_PIN8_BITMASK;
			} else if (pin == 9) {
				CORE_PIN9_PORTSET = CORE_PIN9_BITMASK;
			} else if (pin == 10) {
				CORE_PIN10_PORTSET = CORE_PIN10_BITMASK;
			} else if (pin == 11) {
				CORE_PIN11_PORTSET = CORE_PIN11_BITMASK;
			} else if (pin == 12) {
				CORE_PIN12_PORTSET = CORE_PIN12_BITMASK;
			} else if (pin == 13) {
				CORE_PIN13_PORTSET = CORE_PIN13_BITMASK;
			} else if (pin == 14) {
...
		} else {
			if (pin == 0) {
				CORE_PIN0_PORTCLEAR = CORE_PIN0_BITMASK;
			} else if (pin == 1) {
				CORE_PIN1_PORTCLEAR = CORE_PIN1_BITMASK;
			} else if (pin == 2) {
				CORE_PIN2_PORTCLEAR = CORE_PIN2_BITMASK;
			} else if (pin == 3) {
				CORE_PIN3_PORTCLEAR = CORE_PIN3_BITMASK;
			} else if (pin == 4) {
				CORE_PIN4_PORTCLEAR = CORE_PIN4_BITMASK;
			} else if (pin == 5) {
				CORE_PIN5_PORTCLEAR = CORE_PIN5_BITMASK;
			} else if (pin == 6) {
				CORE_PIN6_PORTCLEAR = CORE_PIN6_BITMASK;
			} else if (pin == 7) {
				CORE_PIN7_PORTCLEAR = CORE_PIN7_BITMASK;
			} else if (pin == 8) {
				CORE_PIN8_PORTCLEAR = CORE_PIN8_BITMASK;
			} else if (pin == 9) {
				CORE_PIN9_PORTCLEAR = CORE_PIN9_BITMASK;
			} else if (pin == 10) {
				CORE_PIN10_PORTCLEAR = CORE_PIN10_BITMASK;
			} else if (pin == 11) {
				CORE_PIN11_PORTCLEAR = CORE_PIN11_BITMASK;
			} else if (pin == 12) {
				CORE_PIN12_PORTCLEAR = CORE_PIN12_BITMASK;
			} else if (pin == 13) {
				CORE_PIN13_PORTCLEAR = CORE_PIN13_BITMASK;
			} else if (pin == 14) {
...
So when the compiler optimizes code for something like digitalWriteFast(13, HIGH);
it will detect that 13 is a constant and that HIGH is a constant, and will reduce the whole code down to:
CORE_PIN13_PORTSET = CORE_PIN13_BITMASK;

Where:
#define CORE_PIN13_PORTSET GPIO7_DR_SET
#define CORE_PIN13_BITMASK (1<<(CORE_PIN13_BIT))

This reduces down to a single register write.
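
The same pattern can be tried off-target with GCC or Clang. The sketch below is not the Teensy core code: the two mock "registers", the bit numbers, and the table-based fallback for non-constant pins are all assumptions made up for illustration. Only __builtin_constant_p itself is the real compiler builtin.

```c
#include <stdint.h>

/* Mock "port set registers" so the sketch runs on a PC. */
static uint32_t mock_set_a, mock_set_b;

static inline void write_high(uint8_t pin)
{
    if (__builtin_constant_p(pin)) {
        /* Constant pin with optimization enabled: the compiler can drop
           the whole if/else chain and keep only the one matching store. */
        if (pin == 0)      mock_set_a = UINT32_C(1) << 3;
        else if (pin == 1) mock_set_b = UINT32_C(1) << 2;
    } else {
        /* Run-time pin: fall back to a small lookup table. */
        static uint32_t *const reg[2] = { &mock_set_a, &mock_set_b };
        static const uint8_t bit[2]   = { 3, 2 };
        if (pin < 2)
            *reg[pin] = UINT32_C(1) << bit[pin];
    }
}
```

Both paths give the same result; the difference is only in how much code is left after optimization, which you can check by comparing the generated assembly for write_high(0) and write_high(some_variable).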

Note: digitalToggleFast(13) does similar:
Code:
static inline void digitalToggleFast(uint8_t pin)
{
	if (__builtin_constant_p(pin)) {
		if (pin == 0) {
			CORE_PIN0_PORTTOGGLE = CORE_PIN0_BITMASK;
		} else if (pin == 1) {
			CORE_PIN1_PORTTOGGLE = CORE_PIN1_BITMASK;
		} else if (pin == 2) {
			CORE_PIN2_PORTTOGGLE = CORE_PIN2_BITMASK;
		} else if (pin == 3) {
			CORE_PIN3_PORTTOGGLE = CORE_PIN3_BITMASK;
		} else if (pin == 4) {
			CORE_PIN4_PORTTOGGLE = CORE_PIN4_BITMASK;
		} else if (pin == 5) {
			CORE_PIN5_PORTTOGGLE = CORE_PIN5_BITMASK;
		} else if (pin == 6) {
			CORE_PIN6_PORTTOGGLE = CORE_PIN6_BITMASK;
		} else if (pin == 7) {
			CORE_PIN7_PORTTOGGLE = CORE_PIN7_BITMASK;
		} else if (pin == 8) {
			CORE_PIN8_PORTTOGGLE = CORE_PIN8_BITMASK;
		} else if (pin == 9) {
			CORE_PIN9_PORTTOGGLE = CORE_PIN9_BITMASK;
		} else if (pin == 10) {
			CORE_PIN10_PORTTOGGLE = CORE_PIN10_BITMASK;
		} else if (pin == 11) {
			CORE_PIN11_PORTTOGGLE = CORE_PIN11_BITMASK;
		} else if (pin == 12) {
			CORE_PIN12_PORTTOGGLE = CORE_PIN12_BITMASK;
		} else if (pin == 13) {
			CORE_PIN13_PORTTOGGLE = CORE_PIN13_BITMASK;
		} else if (pin == 14) {
where:
#define CORE_PIN13_PORTTOGGLE GPIO7_DR_TOGGLE


With T3.x, things are a bit different. There, digitalWrite uses the bit-band support that ARM Cortex-M3/M4 processors provide, so there is an address for each I/O pin's value. This gives you atomic operations without having to use set and clear registers.
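
For reference, a small sketch of how a bit-band alias address is computed for the Cortex-M3/M4 peripheral region (peripheral base 0x40000000, alias base 0x42000000). The function name is hypothetical, and the register address used in the example below is only an illustrative value, not taken from this thread.

```c
#include <stdint.h>

/* Each bit of a word in the peripheral bit-band region gets its own
   32-bit alias word: 32 alias bytes per register byte, 4 per bit. */
static inline uint32_t bitband_alias(uint32_t reg_addr, unsigned bit)
{
    return UINT32_C(0x42000000)
         + (reg_addr - UINT32_C(0x40000000)) * 32u
         + bit * 4u;
}
```

For example, bit 5 of a register assumed to live at 0x400FF080 maps to alias address 0x43FE1014; writing 0 or 1 to that word changes only that one bit.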
 
Very helpful, @KurtE. For anyone else wondering, here is a nice explanation from StackOverflow for why/when to use static inline functions in header files.

A static inline function is, in practice, likely (but not certain) to be inlined by some good optimizing compiler (e.g. by GCC when it is given -O2) at most of its call sites.

It is defined in a header file, because it then could be inlined at most call sites (perhaps all of them). If it was just declared (and simply "exported") the inlining is unlikely to happen (except if you compile and link with link-time optimizations, a.k.a. LTO, also, e.g. compile and link with gcc -flto -O2, and that increases a lot the build time).

In practice, the compiler needs to know the body of a function to be able to inline it. So a suitable place is to define it in some common header file (otherwise, it could be inlined only in the same translation unit defining it, unless you enable LTO), so that every translation unit would know the body of that inlinable function.

It is declared static to avoid multiple definitions (at link time) in case the compiler did not inline it (e.g. when you use its address).

In practice, in C99 or C11 code (except with LTO, which I rarely use), I would always put the short functions I want to be inlined as static inline definitions in common header files.
 
I was not answering you.

Let me fix my post just for you Mcu32.


do you know digitalWriteFast(const,...)/digitalReadFast(const)?

Mcu32 has the right idea here.


The OP said:
Second caveat... I'm going to this extreme as well because I'm trying to do the GPIO reads as fast as possible so I don't want any code execution, or am I being silly?



Yes you are being silly. digitalWriteFast will be as fast or faster than anything you write yourself. ( Unless you want to just toggle a pin where you can be faster by writing directly to the toggle register ).

Just define your pins with names and change the defines to 'rewire' your board.

Code:

#define TX_PIN 12
#define RX_PIN 11

digitalWriteFast( TX_PIN, LOW );
 
You can define the macro as:
Code:
#define MAP(pin,targetbit) (((REGISTER(pin) >> BITSH(pin)) & 1) << targetbit)
and the compiler will optimize the shifts for you as long as these BITSH() and targetbit are constant expressions.
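
A small self-contained check of this single-macro form, using only three entries from the thread's pin tables (Teensy 4.0 pins 18, 19 and 4). The build_address function and the choice of which address bits to map are just for illustration, and the GPIO readings are passed in as parameters so it can be run on a PC.

```c
#include <stdint.h>

/* Subset of the per-pin tables: register variable and bit position. */
#define _r18 GPIO6_data
#define _p18 17
#define _r19 GPIO6_data
#define _p19 16
#define _r4  GPIO9_data
#define _p4  6

#define REGISTER(pin) _r##pin
#define BITSH(pin)    _p##pin

/* Shift the wanted bit down to bit 0, mask it, then shift it up to the
   target position. No left/right distinction is needed. */
#define MAP(pin, targetbit) (((REGISTER(pin) >> BITSH(pin)) & 1u) << (targetbit))

#define PIN_ADDR0 18
#define PIN_ADDR1 19
#define PIN_ADDR7 4

static uint32_t build_address(uint32_t GPIO6_data, uint32_t GPIO9_data)
{
    return MAP(PIN_ADDR0, 0) | MAP(PIN_ADDR1, 1) | MAP(PIN_ADDR7, 7);
}
```

Rewiring the board then only means changing the PIN_ADDRn defines; the MAP lines stay the same.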
 