Yes, but it's much less delay.
Teensy 3.0 has 2 other functions, digitalReadFast() and digitalWriteFast() which you can use for the fastest possible performance. They are extremely quick.
But there are a couple small caveats to these fast versions (the reason why the normal ones aren't made fast). The main limitation is the fast ones only work with a constant for the pin number. The fast ones also skip some fancy stuff that's needed for perfect Arduino compatibility, like using digitalWrite() to control the pullup resistor on a pin configured for input mode.
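To make those two caveats concrete, here is a minimal sketch of how they might be handled. The pin numbers, the `setPin()` helper and the host-only stand-ins inside `#ifndef ARDUINO` are my own assumptions for illustration, not part of the Teensy core. It assumes `pinMode()` accepts `INPUT_PULLUP`, as in current Arduino releases.

```cpp
#include <stdint.h>

#ifndef ARDUINO
// Host-only stand-ins so this sketch also compiles off the board.
// They are NOT the Teensy implementations; they only record the last call.
enum { LOW = 0, HIGH = 1, INPUT = 0, OUTPUT = 1, INPUT_PULLUP = 2 };
static uint8_t pinModes[64];
static uint8_t pinLevels[64];
inline void pinMode(uint8_t pin, uint8_t mode) { pinModes[pin] = mode; }
inline void digitalWrite(uint8_t pin, uint8_t val) { pinLevels[pin] = val; }
inline void digitalWriteFast(uint8_t pin, uint8_t val) { pinLevels[pin] = val; }
#endif

const uint8_t OUT_PIN = 2;  // compile-time constant, suitable for the fast version
const uint8_t IN_PIN = 3;   // hypothetical input pin for this example

void setup() {
  pinMode(OUT_PIN, OUTPUT);
  // Request the pullup through pinMode() instead of writing HIGH to an input pin.
  pinMode(IN_PIN, INPUT_PULLUP);
}

// Pin number only known at run time: use the normal function here.
void setPin(uint8_t pin, bool level) {
  digitalWrite(pin, level ? HIGH : LOW);
}

void loop() {
  // Constant pin number: the fast version applies.
  digitalWriteFast(OUT_PIN, HIGH);
  digitalWriteFast(OUT_PIN, LOW);
}
```

The idea is simply to keep digitalWriteFast() for pins named by a constant, and fall back to the normal functions anywhere the pin is a variable.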
Here are a couple quick tests. First, the normal digitalWrite():
Code:
void setup() {
  pinMode(2, OUTPUT);
}
void loop() {
  while (1) {
    digitalWrite(2, HIGH);
    digitalWrite(2, LOW);
    digitalWrite(2, HIGH);
    digitalWrite(2, LOW);
    digitalWrite(2, HIGH);
    digitalWrite(2, LOW);
  }
}
As you can see from the 700 kHz waveform, each digitalWrite() is taking approx 0.71 us.
Here's another test using digitalWriteFast():
Code:
void setup() {
  pinMode(2, OUTPUT);
}
void loop() {
  while (1) {
    digitalWriteFast(2, HIGH);
    digitalWriteFast(2, LOW);
    digitalWriteFast(2, HIGH);
    digitalWriteFast(2, LOW);
    digitalWriteFast(2, HIGH);
    digitalWriteFast(2, LOW);
  }
}
This shows the extreme speed that's possible. A pair of digitalWriteFast() calls takes only 21 ns. It's so fast that the rise and fall times of the digital output can be easily seen (the scope's horizontal scale is 50X faster than the one above). This was tested with a 2 inch wire and my scope probe using an ordinary 3 inch ground clip, so there's some overshoot. To make proper measurements at these high bandwidths, better probing with short wires and ground leads is needed. I just quickly hooked my scope up to a test board I had lying on my desk for the sake of this message. Also, my scope is only a 200 MHz model, and the pulses in this waveform are at 48 MHz, so this is pushing close to the limits of my equipment.
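For anyone who wants to check the arithmetic behind these numbers, here is a small helper. The function name `nsPerCall` is made up for this post. It just turns a measured square wave frequency into time per call, given that one full cycle is one HIGH write plus one LOW write.

```cpp
// Time per call in nanoseconds, given the measured waveform frequency in Hz
// and how many calls make up one full cycle (one HIGH plus one LOW = 2).
constexpr double nsPerCall(double waveformHz, int callsPerCycle) {
  return 1e9 / waveformHz / callsPerCycle;
}

// nsPerCall(700e3, 2) is about 714 ns   (normal digitalWrite, 700 kHz waveform)
// nsPerCall(48e6, 2)  is about 10.4 ns  (digitalWriteFast, 48 MHz pulses)
```

Two of those 10.4 ns writes give the roughly 21 ns per pair quoted above.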
Another obvious feature of this waveform is the dead time between each group of 3 pulses. Some of that time is the loop overhead, to branch back and execute the same code over again. But it also includes some compiler overhead, and another thing I'll mention in a moment. On ARM, writing to the pin involves a store instruction, which needs 2 registers loaded with constants. Some of that dead time is the compiler placing the constants into registers, in preparation for the 6 digitalWriteFast() lines. The compiler is pretty smart about loading registers in advance to optimize loops. Without understanding this, it might seem like digitalWriteFast() only takes 10.5 ns, but in fact there is overhead to set up the registers.
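Here is a rough sketch of why a constant pin number lets the write shrink to a single store. This is not the actual Teensy core code. `fakeSetReg`, `fakeClearReg` and `writeFastSketch` are hypothetical names, and the two variables are ordinary memory standing in for memory-mapped port registers. The point is only that with the pin fixed at compile time, both the register address and the bit mask are constants the compiler can load into registers once, ahead of the stores.

```cpp
#include <stdint.h>

// Hypothetical stand-ins for memory-mapped "set" and "clear" output registers.
static volatile uint32_t fakeSetReg = 0;
static volatile uint32_t fakeClearReg = 0;

// Pin is a template parameter, so it must be a compile-time constant.
template <uint8_t Pin>
inline void writeFastSketch(bool level) {
  static_assert(Pin < 32, "this sketch models a single 32-bit port");
  constexpr uint32_t mask = uint32_t(1) << Pin;  // computed at compile time
  if (level) {
    fakeSetReg = mask;    // one store: constant address, constant value
  } else {
    fakeClearReg = mask;  // one store: constant address, constant value
  }
}
```

Calling `writeFastSketch<2>(true)` with a literal level leaves the compiler nothing to compute at run time except the store itself, which matches the "2 registers loaded with constants" picture above.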
The other factor at play here is a special hardware optimization in the Cortex-M4 chip for back-to-back bus operations. Normally a store instruction takes 2 cycles. But if your code uses multiple store (or load) instructions in a row, it uses a special bus burst mode where the 2nd, 3rd, etc only take a single cycle. In this waveform, we're seeing that effect. The first digitalWriteFast() actually took twice as long as the other 5, but the effect isn't visible since the line was already low.
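As a back-of-envelope model of that burst effect (a simplification, using only the cycle counts stated above and ignoring register setup and the loop branch), the cost of a run of stores could be counted like this. Both function names are invented for illustration.

```cpp
// Cycles for n back-to-back stores under the simple model described above:
// the first store costs 2 cycles, each following store costs 1.
constexpr unsigned burstStoreCycles(unsigned n) {
  return n == 0 ? 0 : n + 1;
}

// The same n stores if every one of them paid the full 2 cycles.
constexpr unsigned isolatedStoreCycles(unsigned n) {
  return 2 * n;
}

// For the 6 writes in the test loop: 7 cycles in a burst versus 12 if isolated.
```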
So digitalWriteFast() can give you extreme speed, and for some uses it can create tiny 10 ns wide pulses (which might be much too fast for some chips), but there is some setup overhead, which the compiler will sometimes move outside of loops.