Here's a CPU speed benchmark I've been using. Tonight I finally took a few moments to clean it up, add some comments in the code, and run the test on several boards (see the readme file).
https://github.com/PaulStoffregen/RSA_signature_speed
Here are a few more MCUs doing RSAsign (times in seconds):
Code:
T4@600         0.085  faster
T3.6@180       0.474
T3.5@120       0.910
T3.2@120       1.223
ESP32@240      0.492  -O2 @240mhz
STM32F405      0.675  adafruit @168mhz
M4@120         0.816  faster SAMD51
DUE@84         1.901
dragon@80      1.162  faster dragonfly STM32L476
maple@72       1.964  faster
R4@48          2.254  -O3
pico@125       2.339  -O3
cpx@48         9.496  circuit playground express SAMD21
ZERO@48       11.022
1284p@16     119.068  16K enough
otto@180       0.565  mbed arm cc -O3 F469NI
leo@180        0.565  mbed -O3 F446RE
F767ZI@216     0.332  mbed -O3 0.203 mbedtls
1010@500       0.094  SDK -O3
1170@996       0.057  SDK -O3 (NXP SDK mbedtls lib + crypto-accel 0.0069 secs)
cm4@400        0.334  SDK -O3 (NXP SDK mbedtls lib + crypto-accel 0.0134 secs)
Teensy LC and 2++ and MEGA2560 won't work -- need more than 8K RAM?
The benchmark is based on mbedtls. The NXP SDK has upgraded its libs (mbedtls and wolfssl) to take advantage of the MCUs' crypto accelerators (CAU, DCP, or CAAM) for hashing (SHA256) and secret-key encryption (AES). One could also build and run the benchmark on desktops and the Raspberry Pi using the repository's C files, or using OpenSSL, mbedtls, or wolfssl.
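For a quick desktop sanity check before building any of those libraries, here is a minimal sketch that times a single 2048-bit modular exponentiation, the operation that dominates RSA signing. It is not the repository's benchmark: it uses Python's built-in `pow` and random numbers rather than a real RSA key, and the function name `time_modexp` is made up for illustration.

```python
import secrets
import time

def time_modexp(bits=2048, repeats=5):
    """Return the best-of-N wall time (seconds) for one modular exponentiation.

    Assumption: random full-length operands stand in for a real RSA key,
    so this only approximates the cost of a non-CRT RSA signature.
    """
    top = 1 << (bits - 1)
    n = secrets.randbits(bits) | top | 1   # odd modulus with the top bit set
    d = secrets.randbits(bits) | top       # full-length exponent
    m = secrets.randbits(bits - 1)         # message representative, below n
    best = float("inf")
    for _ in range(repeats):
        t0 = time.perf_counter()
        pow(m, d, n)
        best = min(best, time.perf_counter() - t0)
    return best

if __name__ == "__main__":
    print(f"2048-bit modexp: {time_modexp() * 1e6:.0f} us")
```

The number it prints is only comparable to the non-CRT RSA figures in a rough sense, since the big-integer code differs from mbedtls and wolfssl.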
Some more Teensy crypto performance numbers (microseconds) for mbedtls and wolfssl (optimize Faster), where the CRT column represents RSA signature with the CRT shortcut.
Code:
tls    SHA256   100!       DH      RSAs    RSAv       CRT  us Faster
T4         53    163    40279    286990    3413     81259  @600
T3.6      371    718   223952   1601592   19819    451616  @180
T3.5      593   1494   427386   3020459   36098    861444  @120 sketch 64K 6.5K
T3.2      844   1724   580803   4182796   47101   1169631  @120mhz
ESP32     278   1408   270480   1999836   25128    545525  @240 -O2
F767ZI    159    303   157470   1179640   13442    317300  @216 -O3 ARM cc
F469NI    326    792   266110   1980234   23927    536865  @180 -O3 ARM cc otto
M4        559   1384   380810   2641328   38555    769537  @120 SAMD51
dragon   1473   2333   548201   3857128   47745   1105866  STM32L476RE @80mhz
32F405    448   1272   313675   2214487   33196    633691  @168mhz -O2
pico     1691   1662  1117457   8488853   87161   2246416  @125mhz -O3
maple    1083   2498   969395   7132988   82426   1952519  -O2
DUE       887   2407   854304   6085038   74818   1722067  -Os
ZERO     3124   6972  5278327  40129636  425293  11133000  -Os cpx SAMD21
R4       1352   3785  1041835   7293808  104724   2104544  -O3 @48mhz
1010      123    286    44383    310865    4249     86645  -O3 @500mhz SDK
1170        8  11780     2849     19114     486      6178  -O3 SDK @996mhz crypto accel, Dcache off
1170       74     88    27108    195441    2325     54660  -O3 Paul's local tls, no crypto accel, Dcache enabled
ssl    SHA256   100!       DH      RSAs    RSAv       CRT  us Faster
T4         47     22    33947    213540    7918     67656
T3.6      347    135   223692   1478821   52545    444273
T3.5      558    204   344588   2233720   83411    684374
T3.2      769    210   348616   2241161   86211    692799  @120mhz
ESP32     550    224   254573   1641904   41192    508553  @240mhz -O2
F767ZI    138     76    96382    553786   21363    191819  @216 -O3
F469NI    700    168   693462   5216466  134668   1376524  @180
M4        442    243   355905   2319518   78639    704381  @120
dragon    798    371   539358   3488030  127341   1072544  @80mhz
32F405    397    171   243018   1582018   57316    479503  @168mhz -O2
pico     1225    606   115476   8442560  178083   2308845  @125mhz -O3
maple    1127    462   911500   6235716  193069   1810600  -O2
DUE       927    638  1199187   8311095  241489   2381622  -Os
ZERO     3081   2684  6385065  47938562  765075  12819002  -Os cpx
R4       1218    559   907313   5996682  216354   1792292  -O3 @48mhz
1010       06     51    38257    257315    9334     79592  -O3 @500mhz SDK
1170      110     85    19627    129262    4563     40022  -O3 SDK @996mhz
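To read the RSAs and CRT columns side by side, here is a small sketch that converts a few mbedtls rows from microseconds to seconds and prints the CRT speedup. The values are copied from the mbedtls table above. The helper name `crt_speedup` is made up for illustration.

```python
# (RSAs_us, CRT_us) pairs copied from the mbedtls table above.
MBEDTLS_ROWS = {
    "T4":    (286990, 81259),
    "T3.6":  (1601592, 451616),
    "ESP32": (1999836, 545525),
    "1010":  (310865, 86645),
}

def crt_speedup(rsas_us, crt_us):
    """Ratio of plain RSA signing time to CRT signing time."""
    return rsas_us / crt_us

if __name__ == "__main__":
    for board, (rsas, crt) in MBEDTLS_ROWS.items():
        print(f"{board:6s} {rsas / 1e6:.3f} s -> {crt / 1e6:.3f} s "
              f"({crt_speedup(rsas, crt):.2f}x)")
```

For these four rows the ratio falls between 3.5x and 3.7x.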
Factorial, Diffie-Hellman, and RSA are based on the respective library's big-integer implementation. RSA is 2048-bit encryption/decryption using N, P, Q, E, and D from Paul's RSA sketch, with or without the Chinese Remainder Theorem (CRT). With CRT disabled in Paul's sketch (#define MBEDTLS_RSA_NO_CRT in local_rsa.h), T4 signing takes 0.290 seconds.
Also see mini-gmp, T4 crypto accelerator DCP tests, T4 floating point benchmarks, and more MCU performance comparisons (crypto, floating point, coremark).
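For anyone wondering what the CRT shortcut actually does, here is a minimal sketch with a textbook-sized toy key (P=61, Q=53, E=17), not the 2048-bit key from Paul's sketch. It is not how mbedtls or wolfssl implement it internally; the function names are made up, and it needs Python 3.8+ for the modular inverse via `pow`. The idea is to do two half-size exponentiations mod P and mod Q, then recombine them (Garner's formula), instead of one full-size exponentiation mod N.

```python
# Toy RSA key: textbook-sized and NOT secure, for illustration only.
P, Q = 61, 53
N = P * Q
E = 17
D = pow(E, -1, (P - 1) * (Q - 1))   # private exponent (Python 3.8+)

def sign_plain(m):
    """RSA signature without CRT: one full-size exponentiation mod N."""
    return pow(m, D, N)

def sign_crt(m):
    """RSA signature with CRT: two half-size exponentiations, then recombine."""
    dp = D % (P - 1)                 # reduced exponent for the mod-P half
    dq = D % (Q - 1)                 # reduced exponent for the mod-Q half
    qinv = pow(Q, -1, P)             # Q^-1 mod P, precomputable per key
    sp = pow(m, dp, P)
    sq = pow(m, dq, Q)
    h = (qinv * (sp - sq)) % P       # Garner's recombination step
    return sq + h * Q

if __name__ == "__main__":
    m = 65
    sig = sign_crt(m)
    assert sig == sign_plain(m)      # both paths give the same signature
    assert pow(sig, E, N) == m       # and it verifies with the public key
    print("CRT and plain signatures match")
```

Each half works with numbers half as long, which is where the speedup comes from; with schoolbook multiplication the theoretical gain approaches 4x.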