digitalWriteFast on non-constant pins can fault or at least trash memory location

KurtE

Senior Member+
I was trying out a sketch where I did not specify an optional pin number to the constructor.
The library code later did a
Code:
digitalWriteFast(_rd, HIGH);

The addr2line pointed to core_pins.h to the digitalWriteFast function
Which make sense, as:
C++:
static inline void digitalWriteFast(uint8_t pin, uint8_t val)
{
    if (__builtin_constant_p(pin)) {
        if (val) {
            if (pin == 0) {
                CORE_PIN0_PORTSET = CORE_PIN0_BITMASK;
            } else if (pin == 1) {
                CORE_PIN1_PORTSET = CORE_PIN1_BITMASK;
            } else if (pin == 2) {
                CORE_PIN2_PORTSET = CORE_PIN2_BITMASK;
...
            }
    } else {
        if(val) *portSetRegister(pin) = digitalPinToBitMask(pin);
        else *portClearRegister(pin) = digitalPinToBitMask(pin);
    }
}
As portSetRegister just blindly indexes into an array and grabs stuff from random location:
Code:
#define portSetRegister(pin)     ((digital_pin_to_info_PGM[(pin)].reg + 33))

And likewise for getting the mask and then writing that mask value into the random location
returned from the portSetRegister...

Note: This does not happen if you pass in a constant pin number like 0xff
as none of the tests like:
Code:
if (pin == 0) {
                CORE_PIN0_PORTSET = CORE_PIN0_BITMASK;
            } else if (pin == 1) {
                CORE_PIN1_PORTSET = CORE_PIN1_BITMASK;
            } else if (pin == 2) {
                CORE_PIN2_PORTSET = CORE_PIN2_BITMASK;
            } else if (pin == 3) {

will match so it does not do anything...

Two obvious options here:
a) Punt and say - as expected... And change library to check for this.
b) change digitalWriteFast (and probably others to be like:

Code:
} else if (pin < CORE_NUM_DIGITAL {
        if(val) *portSetRegister(pin) = digitalPinToBitMask(pin);
        else *portClearRegister(pin) = digitalPinToBitMask(pin);
    }

Back to playing
 
I believe digitalWriteFast/digitalReadFast are only intended to be used with (compile-time) constant arguments and the non-const cases are only there for completeness. They're not meant to be called with variable pin values because that gives no benefit over using the regular digitalWrite/digitalRead. Probably it would be better if the *Fast functions called the non-Fast function equivalent for the case where the argument isn't const, but really the "fix" here is to use the correct functions to suit the situation - a non-const pin value means use the non-fast functions.
 
That explains something seen {and noted} many months ago. Doing a { loop(variable_PIN) < CORE_NUM_DIGITAL } with the FAST functions was the same as with a constant_PIN!

In some cases the code shows something like this as expected based on __builtin_constant_p():

Code:
// ...\arduino-1.8.19\hardware\teensy\avr\cores\teensy4\core_pins.h
static inline void digitalToggleFast(uint8_t pin)
{
    if (__builtin_constant_p(pin)) {
// ...
    } else {
        digitalToggle(pin);
    }
}

But now seeing in other FAST functions the above
WRITE :: digitalWriteFast
} else { if(val) *portSetRegister(pin) = digitalPinToBitMask(pin); else *portClearRegister(pin) = digitalPinToBitMask(pin); }
READ :: digitalReadFast
} else { return (*portInputRegister(pin) & digitalPinToBitMask(pin)) ? 1 : 0; }

This is the Teensy4 and similar in Teensy3 in CORES.
 
I believe digitalWriteFast/digitalReadFast are only intended to be used with (compile-time) constant arguments and the non-const cases are only there for completeness. They're not meant to be called with variable pin values because that gives no benefit over using the regular digitalWrite/digitalRead. Probably it would be better if the *Fast functions called the non-Fast function equivalent for the case where the argument isn't const, but really the "fix" here is to use the correct functions to suit the situation - a non-const pin value means use the non-fast functions.
I hear you, I believe it used to simply call off to digitalRead if it was not contant... as for speed differences>

Here is a quick and dirty test:
Code:
void setup() {
  pinMode(0, OUTPUT);
  while(!Serial) {} 
  Serial.begin(115200);


}

void loop() {
  // put your main code here, to run repeatedly:
  elapsedMicros em = 0;
  int8_t mypin = 0;
  int i = 0;
  for (i = 0; i < 10000; i++) digitalWrite(0, HIGH);
  uint32_t dtdw = em;
  em = 0;
  for (i = 0; i < 10000; i++) digitalWrite(mypin, HIGH);
  uint32_t dtdwp = em;
  em = 0;
  for (i = 0; i < 10000; i++) digitalWriteFast(0, HIGH);
  uint32_t dtdwf = em;
  em = 0;
  for (i = 0; i < 10000; i++) digitalWriteFast(mypin, HIGH);
  uint32_t dtdwfp = em;
  Serial.printf("%u %u %u %u\n", dtdw, dtdwp, dtdwf, dtdwfp);
  delay(5000);
  }

Output:
Code:
300 300 33 34
300 300 33 34
300 300 33 34
Not much differences between passing in constant or variable in either case:

Also I did my option b:
and the timings did not change...
 
Pretty sure __builtin_constant_p(mypin) would evaluate to 1 in that case because the compiler can see that its value is never modified. Making it a (non-static) global may work around that (but it also might not, compiler can be pretty smart...).
 
This shows some diff ...
Running this changed code to repeat with for() and add Toggle:
Code:
#define SOME_NUM 3
void setup() {
  for ( int ii = 0; ii < SOME_NUM; ii++ )
    pinMode(ii, OUTPUT);
  while (!Serial) {}
  Serial.begin(115200);
}

void loop() {
  // put your main code here, to run repeatedly:
  elapsedMicros em = 0;
  int i = 0;
  for (i = 0; i < 10000; i++)
    for ( int ii = 0; ii < SOME_NUM; ii++ )
      digitalWrite(0, HIGH);
  uint32_t dtdw = em;
  em = 0;
  for (i = 0; i < 10000; i++)
    for ( int ii = 0; ii < SOME_NUM; ii++ )
      digitalWrite(ii, HIGH);
  uint32_t dtdwp = em;
  em = 0;
  for (i = 0; i < 10000; i++)
    for ( int ii = 0; ii < SOME_NUM; ii++ )
      digitalWriteFast(0, HIGH);
  uint32_t dtdwf = em;
  em = 0;
  for (i = 0; i < 10000; i++)
    for ( int ii = 0; ii < SOME_NUM; ii++ )
      digitalWriteFast(ii, HIGH);
  uint32_t dtdwfp = em;
  Serial.printf("%u %u %u %u\n", dtdw, dtdwp, dtdwf, dtdwfp);
  for (i = 0; i < 10000; i++)
    for ( int ii = 0; ii < SOME_NUM; ii++ )
      digitalToggle(0);
  dtdw = em;
  em = 0;
  for (i = 0; i < 10000; i++)
    for ( int ii = 0; ii < SOME_NUM; ii++ )
      digitalToggle(ii);
  dtdwp = em;
  em = 0;
  for (i = 0; i < 10000; i++)
    for ( int ii = 0; ii < SOME_NUM; ii++ )
      digitalToggleFast(0);
  dtdwf = em;
  em = 0;
  for (i = 0; i < 10000; i++)
    for ( int ii = 0; ii < SOME_NUM; ii++ )
      digitalToggleFast(ii);
  dtdwfp = em;
  Serial.printf("\t Toggle: %u %u %u %u\n", dtdw, dtdwp, dtdwf, dtdwfp);
  delay(5000);
}

Made this transition:
Code:
300 300 33 34
300 300 33 34
// Altered Code
1017 1016 100 100
     Toggle: 688 600 100 100
1017 1016 100 100
     Toggle: 688 600 100 100
 
The compiler would unroll the inner loops making the argument a constant. So it's the same as the last test except multiplied by SOME_NUM (3)... which is reflected in the timings.
 
The compiler would unroll the inner loops making the argument a constant. So it's the same as the last test except multiplied by SOME_NUM (3)... which is reflected in the timings.
EDITED - an EM=0 was missing after first Print
In theory perhaps but in practice would this edit preclude that?
Code:
1017 1016 100 100
     Toggle: 688 600 100 100
1017 1016 100 100
     Toggle: 688 600 100 100

random( PIN ):
Code:
#define SOME_NUM 3
void setup() {
  for ( int ii = 0; ii < CORE_NUM_DIGITAL; ii++ )
    pinMode(ii, OUTPUT);
  while (!Serial) {}
  Serial.begin(115200);


}

void loop() {
  // put your main code here, to run repeatedly:
  elapsedMicros em = 0;
  int i = 0;
  int jj;
  for (i = 0; i < 10000; i++)
    for ( int ii = 0; ii < SOME_NUM; ii++ )
      digitalWrite(0, HIGH);
  uint32_t dtdw = em;
  em = 0;
  for (i = 0; i < 10000; i++)
    for ( int ii = 0; ii < SOME_NUM; ii++ ) {
      jj = random(CORE_NUM_DIGITAL);
      digitalWrite(jj, HIGH);
    }
  uint32_t dtdwp = em;
  em = 0;
  for (i = 0; i < 10000; i++)
    for ( int ii = 0; ii < SOME_NUM; ii++ )
      digitalWriteFast(0, HIGH);
  uint32_t dtdwf = em;
  em = 0;
  for (i = 0; i < 10000; i++)
    for ( int ii = 0; ii < SOME_NUM; ii++ ) {
      jj = random(CORE_NUM_DIGITAL);
      digitalWriteFast(jj, HIGH);
    }
  uint32_t dtdwfp = em;
  Serial.printf("%u %u %u %u\n", dtdw, dtdwp, dtdwf, dtdwfp);
  em = 0;
  for (i = 0; i < 10000; i++)
    for ( int ii = 0; ii < SOME_NUM; ii++ )
      digitalToggle(0);
  dtdw = em;
  em = 0;
  for (i = 0; i < 10000; i++)
    for ( int ii = 0; ii < SOME_NUM; ii++ ) {
      jj = random(CORE_NUM_DIGITAL);
      digitalToggle(jj);
    }
  dtdwp = em;
  em = 0;
  for (i = 0; i < 10000; i++)
    for ( int ii = 0; ii < SOME_NUM; ii++ )
      digitalToggleFast(0);
  dtdwf = em;
  em = 0;
  for (i = 0; i < 10000; i++)
    for ( int ii = 0; ii < SOME_NUM; ii++ ) {
      jj = random(CORE_NUM_DIGITAL);
      digitalToggleFast(jj);
    }
  dtdwfp = em;
  Serial.printf("\t Toggle: %u %u %u %u\n", dtdw, dtdwp, dtdwf, dtdwfp);
  delay(5000);
}
 
Last edited:
I put together a simple program (attached at end) to look at the number of CPU cycles per pair of set/clear of a pin depending on whether the pin was defined as a const int or just an int and whether the set/clear pair was done with digitalWrite or digitalWriteFast. Here are the results:

Code:
DigitalWrite      const int time:       49.0  Clocks per Hi/Low Pair
DigitalWrite      Int time:             51.9
DigitalWriteFast  const int time:        3.8
DigitalWriteFast  Int time:              4.0


I could see the difference in pin toggling times on my oscilloscope, but the 25-year-old 60MHz scope couldn't resolve the pulses for the digitalWriteFast series.

Here's the code:
C:
// Digital pin setting and clearing test
// MJB 6/9/2024
#define NSAMPLES 100
const int CP1 = 1;
int IP1 = 1;
void setup() {
  // put your setup code here, to run once:
  Serial.begin(9600);

  delay(400);
  Serial.println("Pin Timing test");
  pinMode(1, OUTPUT);
  Serial.println("Pin 1 set to OUTPUT.\n");
}

void loop() {
  float ctime, itime;
  int i;
  // put your main code here, to run repeatedly:
  ctime = 0.0;
  for (i = 0; i < NSAMPLES; i++) {
    ctime += (float)constClocks(CP1);
  }
  Serial.printf("DigitalWrite      const int time:     %6.1f", ctime / NSAMPLES / 10);
  Serial.println("  Clocks per Hi/Low Pair");

  itime = 0.0;
  for (i = 0; i < NSAMPLES; i++) {
    itime += (float)intClocks(IP1);
  }
  Serial.printf("DigitalWrite      Int time:           %6.1f\n", itime / NSAMPLES / 10);

  ctime = 0.0;
  for (i = 0; i < NSAMPLES; i++) {
    ctime += (float)fastconstClocks(CP1);
  }
  Serial.printf("DigitalWriteFast  const int time:     %6.1f\n", ctime / NSAMPLES / 10);

  itime = 0.0;
  for (i = 0; i < NSAMPLES; i++) {
    itime += (float)fastintClocks(IP1);
  }
  Serial.printf("DigitalWriteFast  Int time:           %6.1f\n", itime / NSAMPLES / 10);
  Serial.println("\n");
  delay(2000);
}


uint32_t constClocks(const int CP1) {
  uint32_t startclk;
  // Now do 10 high/low cycles--without for loop that
  // the compiler might optimize
  noInterrupts();
  startclk = ARM_DWT_CYCCNT;
  digitalWrite(CP1, 1);
  digitalWrite(CP1, 0);
  digitalWrite(CP1, 1);
  digitalWrite(CP1, 0);
  digitalWrite(CP1, 1);
  digitalWrite(CP1, 0);
  digitalWrite(CP1, 1);
  digitalWrite(CP1, 0);
  digitalWrite(CP1, 1);
  digitalWrite(CP1, 0);
  digitalWrite(CP1, 1);
  digitalWrite(CP1, 0);
  digitalWrite(CP1, 0);
  digitalWrite(CP1, 1);
  digitalWrite(CP1, 0);
  digitalWrite(CP1, 1);
  digitalWrite(CP1, 0);
  digitalWrite(CP1, 1);
  digitalWrite(CP1, 0);
  startclk = (ARM_DWT_CYCCNT - startclk);
  interrupts();
  return (startclk);
}

uint32_t intClocks(int IP1) {
  uint32_t startclk;
  // Now do 10 high/low cycles--without for loop that
  // the compiler might optimize
  noInterrupts();
  startclk = ARM_DWT_CYCCNT;
  digitalWrite(IP1, 1);
  digitalWrite(IP1, 0);
  digitalWrite(IP1, 1);
  digitalWrite(IP1, 0);
  digitalWrite(IP1, 1);
  digitalWrite(IP1, 0);
  digitalWrite(IP1, 1);
  digitalWrite(IP1, 0);
  digitalWrite(IP1, 1);
  digitalWrite(IP1, 0);
  digitalWrite(IP1, 1);
  digitalWrite(IP1, 0);
  digitalWrite(IP1, 1);
  digitalWrite(IP1, 0);
  digitalWrite(IP1, 1);
  digitalWrite(IP1, 0);
  digitalWrite(IP1, 1);
  digitalWrite(IP1, 0);
  digitalWrite(IP1, 1);
  digitalWrite(IP1, 0);

  startclk = (ARM_DWT_CYCCNT - startclk);
  interrupts();
  return (startclk);
}


uint32_t fastconstClocks(const int CP1) {
  uint32_t startclk;
  // Now do 10 high/low cycles--without for loop that
  // the compiler might optimize
  noInterrupts();
  startclk = ARM_DWT_CYCCNT;
  digitalWriteFast(CP1, 1);
  digitalWriteFast(CP1, 0);
  digitalWriteFast(CP1, 1);
  digitalWriteFast(CP1, 0);
  digitalWriteFast(CP1, 1);
  digitalWriteFast(CP1, 0);
  digitalWriteFast(CP1, 1);
  digitalWriteFast(CP1, 0);
  digitalWriteFast(CP1, 1);
  digitalWriteFast(CP1, 0);
  digitalWriteFast(CP1, 1);
  digitalWriteFast(CP1, 0);
  digitalWriteFast(CP1, 0);
  digitalWriteFast(CP1, 1);
  digitalWriteFast(CP1, 0);
  digitalWriteFast(CP1, 1);
  digitalWriteFast(CP1, 0);
  digitalWriteFast(CP1, 1);
  digitalWriteFast(CP1, 0);
  startclk = (ARM_DWT_CYCCNT - startclk);
  interrupts();
  return (startclk);
}

uint32_t fastintClocks(int IP1) {
  uint32_t startclk;
  // Now do 10 high/low cycles--without for loop that
  // the compiler might optimize
  noInterrupts();
  startclk = ARM_DWT_CYCCNT;
  digitalWriteFast(IP1, 1);
  digitalWriteFast(IP1, 0);
  digitalWriteFast(IP1, 1);
  digitalWriteFast(IP1, 0);
  digitalWriteFast(IP1, 1);
  digitalWriteFast(IP1, 0);
  digitalWriteFast(IP1, 1);
  digitalWriteFast(IP1, 0);
  digitalWriteFast(IP1, 1);
  digitalWriteFast(IP1, 0);
  digitalWriteFast(IP1, 1);
  digitalWriteFast(IP1, 0);
  digitalWriteFast(IP1, 1);
  digitalWriteFast(IP1, 0);
  digitalWriteFast(IP1, 1);
  digitalWriteFast(IP1, 0);
  digitalWriteFast(IP1, 1);
  digitalWriteFast(IP1, 0);
  digitalWriteFast(IP1, 1);
  digitalWriteFast(IP1, 0);

  startclk = (ARM_DWT_CYCCNT - startclk);
  interrupts();
  return (startclk);
}
 
p#8 code above edited in a missing em=0;

Changed the CORES code for fast ELSE to call the non-FAST - LIKE TOGGLE already does:
Code:
1017 3353 100 3354
     Toggle: 585 2903 100 2954
1016 3353 100 3354
     Toggle: 584 2903 100 2953

Same fixed code called edited CORES: See post #8

This is expected result - when not a builtin pin - don't call Fast - only call FAST when super speed is desired.
 
Last edited:
Reverting to p#6 non-Random version of KurtE's extended code shows YES the compiler does INDEED see the FIXED values - even in the [For(ii) loop] variable - as __builtin_constant_p(mypin)

Code:
// Random pins here
1017 3304 100 3303
     Toggle: 601 2921 100 2854
1017 3303 100 3303
     Toggle: 601 2920 101 2853

// Reverted code here - 
1017 1016 100 100
     Toggle: 687 601 100 100
1017 1016 100 100
     Toggle: 688 600 100 100
 
Here is sort of a contrived code closer to what really is happing:
Code:
class Test {
  public:
    Test(uint8_t pin)
        : _pin(pin) {}
    inline void csHigh() {
        digitalWrite(_pin, HIGH);
    }
    inline void csLow() {
        digitalWrite(_pin, LOW);
    }
    inline void csHighFast() {
        digitalWriteFast(_pin, HIGH);
    }
    inline void csLowFast() {
        digitalWriteFast(_pin, LOW);
    }
    void drawPixel(int16_t x, int16_t y, uint32_t color) {
      csHigh();
      _color = color;
      csLow();
    }
    void drawPixelFast(int16_t x, int16_t y, uint32_t color) {
      csHighFast();
      _color = color;
      csLowFast();
    }
  private:
    uint8_t _pin;
    uint32_t _color;
};

Test test0(0);
Test test1(1);

void setup() {
  while (!Serial) {}
  Serial.begin(115200);
}

void loop() {
  elapsedMicros em;
  for (int i = 0; i < 10000; i++) {
    test0.csHigh();
    test0.csLow();
    test1.csHigh();
    test1.csLow();
  }
  uint32_t slow = em;
  em = 0;

  for (int i = 0; i < 10000; i++) {
    test0.csHighFast();
    test0.csLowFast();
    test1.csHighFast();
    test1.csLowFast();
  }
  uint32_t fast = em;
  em = 0;

  for (int i = 0; i < 10000; i++) {
    test0.drawPixel(i, i, i);
    test1.drawPixel(i, i, i);
  }
  uint32_t slowp = em;
  em = 0;

  for (int i = 0; i < 10000; i++) {
    test0.drawPixelFast(i, i, i);
    test1.drawPixelFast(i, i, i);
  }
  uint32_t fastp = em;


  Serial.printf("%u %u %u %u\n", slow, fast, slowp, fastp);
  delay(5000);
}
A class has one or more pin numbers passed into it as part of it's constructor... Sometimes the pin is not needed for the case
at hand. Example with some displays, maybe there is no CS pin and/or the user tied their pin to +3.3v and as such does not
want to waste a pin on their board...

And the library uses digitalWriteFast for that pin...
Which does still make a significant difference in speed:
Code:
2133 734 2133 700
2133 734 2133 700
2133 734 2133 700

Note: I am running with a modified version of digitalWriteFast where it checkes the pin is within valid range.

It is interesting that the compiler does appear to optimize the High/Low code in the Fast version inside a function which
is an added benefit
 
Back
Top