onehorse
I finally received a Teensy 3.6 to beta test (thanks Paul and Robin!), quickly configured the Arduino IDE to load programs, and started playing around with it. It's beautifully engineered, small, peripheral-rich, and easy to use. I joined the Kickstarter and bought a few more.
The Teensy 3.1 and 3.2 I started with are great too; I have used them extensively for many of my own and my customers' projects. But I have always been interested in the STM32 line of MCUs and struggled with mbed, Keil, etc. to make use of them. Why bother when we have the Teensy, right? Well, the STM32F4 offered a single-precision FPU that I wanted to make use of. When the low-power STM32L4 came out, it proved irresistible: Thomas Roell and I designed an STM32L476 development board with a Teensy 3.1 footprint, and Thomas wrote an Arduino core for it so we could have our cake and eat it too. That is, we have access to an 80 MHz MCU with an FPU, programmable over USB from the Arduino IDE just like the Teensy. Heaven!
At the time, I didn't know that Paul was working on what would become the Teensy 3.6. Now, with two small MCUs that both have an FPU, it is fair to ask: how do they compare? I intend to start answering this question in this thread.
First up is power usage, where we expect the STM32L4 to do very well; it is specifically designed as a low-power MCU.
I loaded a simple blink program,
Code:
/* LED Blink, Teensyduino Tutorial #1
   http://www.pjrc.com/teensy/tutorial.html
   This example code is in the public domain.
*/
#define myLed1 13 // on-board LED pin on the Teensy; use the LED pin of the STM32L4 board there

void setup() {
  // Serial.begin(38400); // serial output not used in this test
  pinMode(myLed1, OUTPUT);
  digitalWrite(myLed1, LOW);
  digitalWrite(myLed1, HIGH); // Test function: LED on for 5 s at startup
  delay(5000);
  digitalWrite(myLed1, LOW);
}

void loop() {
  // Toggle the LED every 5 s; the CPU spends almost all of its time in delay()
  digitalWrite(myLed1, !digitalRead(myLed1));
  delay(5000);
  digitalWrite(myLed1, !digitalRead(myLed1));
  delay(5000);
}
onto both the K66 and L4 and measured the current drawn with the LED on and with it off while changing the CPU clock speed in the IDE. This is a very simple test of baseline power usage: the demand on the MCU is very low, but every peripheral that is active by default still contributes to the current draw. Here are the results:
The on/off difference in each case comes down to which current-limiting resistor and LED color was chosen, so the useful comparison is with the LED off. Here the K66 draws ~350 µA per MHz of CPU clock while the L4 draws ~39 µA per MHz, about one-ninth the slope for the same task(s) (the difference in total current is smaller than the slope ratio suggests). Of course, the tasks aren't exactly the same, and that is partly the point: both MCUs are busy doing many things this application doesn't need. But ST designed the L4 to minimize the number of these background tasks and the power each one draws, and it lets the user (as does the K66, to a degree) configure the MCU for even lower power.
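Turning current readings taken at several clock speeds into a per-MHz slope and an intercept only takes an ordinary least-squares line fit. The sketch below is a host-side C++ helper, not part of the test sketches; the name `fitLine` and the units (MHz in, mA out) are assumptions. The intercept term is one reason the total-current ratio can be smaller than the ~9x slope ratio (350/39 ≈ 9).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Result of fitting I = intercept + slope * f by ordinary least squares.
struct LinearFit {
  double slope;      // e.g. mA per MHz (multiply by 1000 for uA per MHz)
  double intercept;  // e.g. mA extrapolated to a 0 MHz clock
};

// Hypothetical helper: OLS fit of y against x; needs at least two distinct x values.
LinearFit fitLine(const std::vector<double>& x, const std::vector<double>& y) {
  assert(x.size() == y.size() && x.size() >= 2);
  const double n = static_cast<double>(x.size());
  double sx = 0.0, sy = 0.0, sxx = 0.0, sxy = 0.0;
  for (std::size_t i = 0; i < x.size(); ++i) {
    sx += x[i];
    sy += y[i];
    sxx += x[i] * x[i];
    sxy += x[i] * y[i];
  }
  const double denom = n * sxx - sx * sx;
  assert(denom != 0.0);  // all x values identical: slope undefined
  const double slope = (n * sxy - sx * sy) / denom;
  return {slope, (sy - slope * sx) / n};
}
```

Feed it the LED-off readings as `fitLine(clockMHz, current_mA)` and multiply `slope` by 1000 to get µA/MHz.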
So what? If you want to use an MCU to control a megawatt string of LEDs, you won't care about the power drawn by the MCU! But if you have portable, remote, and/or wearable applications running from a small LiPo battery (like almost all of my applications), then MCU power usage is critical. A typical small application uses a 110 mAh LiPo battery for the smallest form factor. With the L4 running continuously at 72 MHz, this will last ~20 hours, or 2.5 eight-hour days (perhaps less once sensors, etc. are added). The K66 will last ~3 hours. For some applications frequent recharging is no big deal; for others it makes all the difference. This is why low power can matter.
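The runtime figures come from dividing battery capacity by average current. Working backwards, ~20 hours from 110 mAh implies an average draw of about 5.5 mA for the L4, and ~3 hours implies roughly 37 mA for the K66. A minimal sketch, assuming a constant draw and that the full rated capacity is usable (both optimistic for a real LiPo); the function names are hypothetical:

```cpp
#include <cassert>

// Runtime in hours of a battery of the given capacity at a constant average draw.
// Ignores capacity derating, regulator losses and the LiPo cutoff voltage.
double batteryHours(double capacity_mAh, double avgCurrent_mA) {
  assert(capacity_mAh > 0.0 && avgCurrent_mA > 0.0);
  return capacity_mAh / avgCurrent_mA;
}

// Average current (mA) implied by a quoted runtime in hours.
double impliedCurrent_mA(double capacity_mAh, double hours) {
  assert(capacity_mAh > 0.0 && hours > 0.0);
  return capacity_mAh / hours;
}
```

Adding a sensor's draw (e.g. the ~7 mA estimated for the MPU9250 below) to `avgCurrent_mA` shows how quickly the runtime shrinks.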
Next up: Madgwick sensor fusion filter rates, an excellent test of FPU performance.
Here are the results:
I ran very similar sketches (they can't be identical) using an interrupt-based data-ready scheme, with an MPU9250+MS5637 breakout board connected to pins 16/17 for I2C, 3V3, GND, and pin 9 for the interrupt, as I usually do on the Teensy 3.2. For the STM32 I used an identical setup. The I2C bus ran at 400 kHz on both, and apart from a few extra serial outputs on the L4, the sketches were functionally identical (I will claim).
The results compare well, and I would say both MCUs perform similarly. It is nice to see that the K66 fusion rate is linear with zero intercept, as I would expect (the rate being purely a function of raw processing speed), and that it gets up to the 130 kHz level, which is certainly overkill, although I note the orientation solution was very stable at that rate. It is odd that the L4 fusion rate is less linear and that its best fit doesn't go through the origin. I will keep looking at this, but it might have to do with the extra gravity/linear acceleration calculations and subsequent serial output that don't appear in the Teensy sketch.
Overall I would say both FPUs perform well, and it is for computations like these that the 180 MHz clock of the K66 shines.
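For readers who haven't seen it, the core of the Madgwick filter is a short burst of single-precision arithmetic per sample, which is why it exercises the FPU so well. The sketch below is the gyroscope + accelerometer (6-DOF) form of the update from Madgwick's open-source reference code, with the fast inverse square root replaced by `std::sqrt` and a guard added for a zero-length gradient. It is not the 9-DOF version with magnetometer that the sketches above presumably use, and the struct name, `beta` value, and fixed `dt` are illustrative assumptions.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical 6-DOF Madgwick filter state (gyro + accel only, no magnetometer).
struct MadgwickIMU {
  float q0 = 1.0f, q1 = 0.0f, q2 = 0.0f, q3 = 0.0f;  // orientation quaternion
  float beta = 0.1f;                                 // gradient-descent gain (assumed)

  // gx, gy, gz in rad/s; ax, ay, az in any consistent unit; dt in seconds.
  void update(float gx, float gy, float gz, float ax, float ay, float az, float dt) {
    assert(dt > 0.0f);
    // Quaternion rate of change from the gyroscope alone.
    float qDot0 = 0.5f * (-q1 * gx - q2 * gy - q3 * gz);
    float qDot1 = 0.5f * ( q0 * gx + q2 * gz - q3 * gy);
    float qDot2 = 0.5f * ( q0 * gy - q1 * gz + q3 * gx);
    float qDot3 = 0.5f * ( q0 * gz + q1 * gy - q2 * gx);

    float aNorm = std::sqrt(ax * ax + ay * ay + az * az);
    if (aNorm > 0.0f) {  // skip feedback for an all-zero accel reading
      ax /= aNorm; ay /= aNorm; az /= aNorm;
      // Gradient of the error between measured and predicted gravity direction.
      float s0 = 4*q0*q2*q2 + 2*q2*ax + 4*q0*q1*q1 - 2*q1*ay;
      float s1 = 4*q1*q3*q3 - 2*q3*ax + 4*q0*q0*q1 - 2*q0*ay - 4*q1
               + 8*q1*q1*q1 + 8*q1*q2*q2 + 4*q1*az;
      float s2 = 4*q0*q0*q2 + 2*q0*ax + 4*q2*q3*q3 - 2*q3*ay - 4*q2
               + 8*q2*q1*q1 + 8*q2*q2*q2 + 4*q2*az;
      float s3 = 4*q1*q1*q3 - 2*q1*ax + 4*q2*q2*q3 - 2*q2*ay;
      float sNorm = std::sqrt(s0 * s0 + s1 * s1 + s2 * s2 + s3 * s3);
      if (sNorm > 0.0f) {  // zero gradient: already aligned, no corrective step
        qDot0 -= beta * s0 / sNorm;
        qDot1 -= beta * s1 / sNorm;
        qDot2 -= beta * s2 / sNorm;
        qDot3 -= beta * s3 / sNorm;
      }
    }
    // Integrate the rate of change, then renormalise the quaternion.
    q0 += qDot0 * dt; q1 += qDot1 * dt; q2 += qDot2 * dt; q3 += qDot3 * dt;
    float qNorm = std::sqrt(q0 * q0 + q1 * q1 + q2 * q2 + q3 * q3);
    q0 /= qNorm; q1 /= qNorm; q2 /= qNorm; q3 /= qNorm;
  }

  // Gravity direction predicted by the current quaternion, in the sensor frame.
  void gravity(float& gx, float& gy, float& gz) const {
    gx = 2.0f * (q1 * q3 - q0 * q2);
    gy = 2.0f * (q0 * q1 + q2 * q3);
    gz = q0 * q0 - q1 * q1 - q2 * q2 + q3 * q3;
  }
};
```

On the MCU, `update()` would run once per data-ready interrupt, so the fusion rate is essentially how many of these updates the FPU can complete per second.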
Lastly, how does power usage scale with a more or less real-world computational task? I have plotted the measured current vs. the Madgwick sensor fusion rate for both the K66 and L4 at different CPU speeds:
Now that the task is mostly FPU calculations rather than waiting in a delay, the power usage of the two compares more closely, but there is still about a factor of two difference in current for the same sensor fusion rate. I judge that ~7 mA is drawn by the MPU9250 alone, so reducing the accel/gyro rate from 200 Hz to 100 Hz and the mag rate from 100 Hz to 8 Hz would save some power. But power minimization isn't the point of this series of tests. Still, is it possible to lower the power usage on the K66 further? I don't know what optimizations have been attempted already, but I can say we have worked pretty hard to get the power usage down on the L4 in software, beyond what the ST designers have already done in silicon.
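The two sensor rate changes map onto register settings. Going by my reading of the MPU-9250 and AK8963 register maps (worth double-checking against the datasheets), a host-side sketch of the values is below; the helper names are hypothetical, it assumes the MPU9250 digital low-pass filter is enabled so the internal sample rate is 1 kHz, and the I2C writes themselves are left out.

```cpp
#include <cassert>
#include <cstdint>

// MPU9250 accel/gyro output rate, assuming the DLPF is enabled (DLPF_CFG 1 to 6)
// so the internal sample rate is 1 kHz:
//   rate = 1000 Hz / (1 + SMPLRT_DIV)   (SMPLRT_DIV is register 0x19)
uint8_t mpu9250SmplrtDiv(unsigned rateHz) {
  assert(rateHz >= 4 && rateHz <= 1000 && 1000 % rateHz == 0);
  return static_cast<uint8_t>(1000 / rateHz - 1);
}

// AK8963 magnetometer CNTL1 (register 0x0A) value for continuous measurement:
// mode 1 (0b0010) = 8 Hz, mode 2 (0b0110) = 100 Hz; bit 4 selects 16-bit output.
uint8_t ak8963Cntl1(unsigned rateHz, bool output16bit) {
  assert(rateHz == 8 || rateHz == 100);
  const uint8_t mode = (rateHz == 8) ? 0x02 : 0x06;
  return static_cast<uint8_t>((output16bit ? 0x10 : 0x00) | mode);
}
```

Under these assumptions, 200 Hz to 100 Hz changes SMPLRT_DIV from 4 to 9, and 100 Hz to 8 Hz on the magnetometer changes CNTL1 from 0x16 to 0x12 (with 16-bit output).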