Speed of digitalRead and digitalWrite with Teensy3.0

Hi,

I was wondering if anyone has already measured how long digitalRead and digitalWrite take. In my Arduino project both functions caused a significant delay, and I had to access the ports directly instead. Is this delay also present on Teensy 3.0?

Best Nils
 
Yes, but it's much less delay.

Teensy 3.0 has 2 other functions, digitalReadFast() and digitalWriteFast() which you can use for the fastest possible performance. They are extremely quick.

But there are a couple of small caveats to these fast versions (which is why the normal ones aren't made fast). The main limitation is that the fast ones only work with a constant for the pin number. The fast ones also skip some fancy stuff that's needed for perfect Arduino compatibility, like using digitalWrite() to control the pullup resistor on a pin configured for input mode.

Here are a couple quick tests. First, the normal digitalWrite():

Code:
void setup() {
  pinMode(2, OUTPUT);
}
void loop() {
  while (1) {
    digitalWrite(2, HIGH);
    digitalWrite(2, LOW);
    digitalWrite(2, HIGH);
    digitalWrite(2, LOW);
    digitalWrite(2, HIGH);
    digitalWrite(2, LOW);
  }
}

[Scope capture: scope_0.png, digitalWrite() test]

As you can see from the 700 kHz waveform (a period of about 1.43 us, containing two writes), each digitalWrite() is taking approx 0.71 us.

Here's another test using digitalWriteFast():

Code:
void setup() {
  pinMode(2, OUTPUT);
}
void loop() {
  while (1) {
    digitalWriteFast(2, HIGH);
    digitalWriteFast(2, LOW);
    digitalWriteFast(2, HIGH);
    digitalWriteFast(2, LOW);
    digitalWriteFast(2, HIGH);
    digitalWriteFast(2, LOW);
  }
}

[Scope capture: scope_1.png, digitalWriteFast() test]

This shows the extreme speed that's possible. A pair of digitalWriteFast() calls takes only 21 ns. It's so fast that the rise and fall times of the digital output can easily be seen (the scope's horizontal scale is 50X faster than the one above). This was tested with a 2 inch wire and my scope probe using an ordinary 3 inch ground clip, so there's some overshoot. To make proper measurements at these high bandwidths, better probing with short wires and ground leads is needed. I just quickly hooked my scope up to a test board I had lying on my desk for the sake of this message. Also, my scope is only a 200 MHz model, and the pulses in this waveform are at 48 MHz, so this is pushing close to the limits of my equipment.

Another obvious feature of this waveform is the dead time between each group of 3 pulses. Some of that time is the loop overhead, to branch back and execute the same code over again. But it also includes some compiler overhead, and another thing I'll mention in a moment. On ARM, writing to the pin involves a store instruction, which needs 2 registers loaded with constants (the address to write to and the value to store). Some of that dead time is the compiler placing those constants into registers in preparation for the 6 digitalWriteFast() lines. The compiler is pretty smart about loading registers in advance to optimize loops. Without understanding this, it might seem like digitalWriteFast() only takes 10.5 ns, but in fact there is overhead to set up the registers.

The other factor at play here is a special hardware optimization in the Cortex-M4 chip for back-to-back bus operations. Normally a store instruction takes 2 cycles. But if your code uses multiple store (or load) instructions in a row, it uses a special bus burst mode where the 2nd, 3rd, etc. each take only a single cycle. In this waveform, we're seeing that effect. The first digitalWriteFast() actually took twice as long as the other 5, but the effect isn't visible because the line was already low, so the extra cycle just adds to the dead time before the first rising edge.

So digitalWriteFast() can give you extreme speed, and for some uses it can create tiny 10 ns wide pulses (which might be much too fast for some chips), but there is some register setup overhead, which the compiler will sometimes move outside of loops.
 
Paul, have you considered making digitalWrite a macro that uses __builtin_constant_p to call digitalWriteFast automagically? Something like:

Code:
void digitalWrite(uint8_t pin, uint8_t val);
static inline void digitalWriteFast(uint8_t pin, uint8_t val) __attribute__((always_inline, unused));
static inline void digitalWriteFast(uint8_t pin, uint8_t val)
{
  // stuff to do digitalWriteFast
}

// Convert digitalWrite into digitalWriteFast if the argument is constant
// the parentheses around digitalWrite ensure the function is called, and not the macro
#define digitalWrite(PIN, VAL) (__builtin_constant_p (PIN) ? digitalWriteFast (PIN, VAL) : (digitalWrite) (PIN, VAL))

You would need:

Code:
#undef digitalWrite

before the definition of the digitalWrite() function itself, so the macro doesn't expand inside it.

Looking at it further, since digitalWriteFast already does the __builtin_constant_p test, you could potentially move the guts to digitalWrite, and then use:

Code:
#define digitalWriteFast(PIN, VAL) digitalWrite(PIN, VAL)

This assumes that there is no additional processing that digitalWrite does for constant pins that isn't done for digitalWriteFast. Presumably the same would go for digitalRead/digitalReadFast.

The __builtin_constant_p function is documented at: http://gcc.gnu.org/onlinedocs/gcc-4.7.3/gcc/Other-Builtins.html#Other-Builtins
 
Yes, I did that on Teensy 2.0, planning to do it on 3.0 at some point.

But on 3.0, even using __builtin_constant_p, more stuff is needed to emulate the AVR quirks and handle PWM pins, so it won't ever be as fast as using digitalWriteFast which skips those checks.
 
I notice that the Arduino IDE doesn't highlight the digitalWriteFast() or digitalReadFast() functions... Am I doing something wrong?
 