Teensy 4.0 First Beta Test

Status
Not open for further replies.
Here's a CPU speed benchmark I've been using. Tonight I finally took a few moments to clean it up, add some comments in the code, and run the test on several boards (see the readme file).

https://github.com/PaulStoffregen/RSA_signature_speed

Is it safe to assume those numbers are at the default clock speeds (180/120/96 MHz for the 3.6/3.5/3.2)?

Edit: Indeed, a T_3.6 at 256 MHz gives "Signature computation took 0.333 seconds" (versus the posted 0.474).

For comparison, I just got some 'TinyPICO' boards based on the dual-core 240 MHz ESP32 PICO. The code as published ran one pass in a single task in 0.518 seconds.

Revisiting a quick RTOS version (standard Arduino) with a task on each core showed that each core, running in parallel, gave about the same result. I had to cut the loops down to 6 passes and adjust the tasks so they would not trigger the RTOS watchdog.

Hoping to make the two cores collide, I ran for() loops of 6 passes, with one rsa_init() in setup() and two copies of the rsa_sign code, each using its own signature[] buffer, then checked the signature after the 6th iteration. The output below shows the first two groups of 6 on each core; after this they hold at about 0.51 seconds:
Code:
Task1: 6 iterations in us=3072821	Signature computation took 0.512 seconds
Signature is good :-)

Task0: 6 iterations in us=3063082	Signature computation took 0.511 seconds
Signature is good :-)

Task1: 6 iterations in us=3059924	Signature computation took 0.510 seconds
Signature is good :-)

Task0: 6 iterations in us=3062031	Signature computation took 0.510 seconds
Signature is good :-)

So a T_3.6 at 256 MHz is about 50% faster than one ESP32 core, but takes about 30% longer per signature than the ESP32 when both 240 MHz cores are "used" (0.510/2). A T_4 at 600 MHz, at 0.085 seconds, is 3 times faster than even that 0.510/2 figure counting both cores.
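The comparison above can be checked with a few lines of arithmetic. This is a desktop-side Python sketch, not part of the benchmark. It uses only the numbers quoted in this post, and the variable names are made up for illustration. It assumes, as the post does, that two ESP32 cores signing in parallel halve the effective time per signature.

```python
# Timings quoted in this post (seconds per signature).
t36_256mhz = 0.333        # Teensy 3.6 overclocked to 256 MHz
t4_600mhz = 0.085         # Teensy 4.0 at 600 MHz
esp32_one_core = 0.510    # steady-state ESP32 result per core

# One group from the log: 6 iterations took 3072821 us on Task1.
per_signature = 3072821 / 6 / 1e6

# Assumption: both cores sign in parallel, so the effective time halves.
esp32_both_cores = esp32_one_core / 2

print(f"per signature from log:   {per_signature:.3f} s")
print(f"T3.6 vs one ESP32 core:   {esp32_one_core / t36_256mhz:.2f}x faster")
print(f"T3.6 vs both ESP32 cores: {t36_256mhz / esp32_both_cores:.2f}x the time")
print(f"T4 vs both ESP32 cores:   {esp32_both_cores / t4_600mhz:.2f}x faster")
```

Counting both cores this way is generous to the ESP32, since it only helps when there are two independent signatures to compute at once.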

@mjs513 - I wonder what your TeensyThreads on T4 would show running it similarly on two threads?
 
I also cleaned up the CoreMark benchmark.

https://github.com/PaulStoffregen/CoreMark

This copy has all the "no edits allowed" files unchanged, and as far as I know the porting I did is well within what's allowed to be claimed as a valid CoreMark result.

It prints to proper Serial on all Arduino compatible boards, so you can benchmark almost any board if it has enough RAM.


Is it safe to assume those numbers are at the default clock speeds (180/120/96 MHz for the 3.6/3.5/3.2)?

Yes, I tested every board with its default settings. Same for CoreMark - the results in the readme file are with default settings. Yes, I know Teensy 4.0 can do a little better if the other optimizations are used.
 
defragster said:
@mjs513 - I wonder what your TeensyThreads on T4 would show running it similarly on two threads?
Should have posted this before. My WIP TeensyThreads_t4 version is posted up on GitHub - https://github.com/mjs513/TeensyThreads/tree/TeensyThreads_t4.

I just woke up to all this activity and haven't even had my coffee yet :) Oh boy.

EDIT: Just tried running it in a thread, but I can't seem to get it to run properly from within the thread. The thread never seems to return. Probably something is conflicting with functions in the other tabs.
 
I'm offline here, but I just put this up; it shows the shifting for 2 cores as noted above. TinyPICO/.../RSA_signature_speed

Updated: minor tweaks to the GitHub copy for the ESP32 PICO using RTOS. No speed improvement, just playing with task adjustments. I also updated the speed table and notes in the readme.

The ARM cores in NXP's chips on Teensy 3.6 and 4 are more efficient at this task.
 
To test TeensyThreads I resurrected my GPS sketch that runs in a thread. The GPS is a BN-280 with PPS broken out, so I was also able to test a PPS signal of 2000 Hz with the FreqCount library. To change the PPS I used @ChrisO.'s ublox library, which provides a way to change the PPS rate. Needless to say, it was right on at 2000 Hz.

As for the GPS running in a thread it worked without a problem.
 
Here's a CPU speed benchmark I've been using. Tonight I finally took a few moments to clean it up, add some comments in the code, and run the test on several boards (see the readme file).

https://github.com/PaulStoffregen/RSA_signature_speed
Here are a few more MCUs doing RSAsign
Code:
      Board@MHz   Seconds  Notes
      T4@600        0.085  faster
      T3.6@180      0.474
      T3.5@120      0.910
      T3.2@120      1.223
      ESP32@240     0.492  -O2 @240mhz
      STM32F405     0.675  adafruit @168mhz
      M4@120        0.816  faster  SAMD51
      DUE@84        1.901
      dragon@80     1.162  faster  dragonfly STM32L476
      maple@72      1.964  faster
      R4@48         2.254  -O3
      pico@125      2.339  -O3
      cpx@48        9.496  circuit playground express SAMD21
      ZERO@48      11.022
      1284p@16    119.068  16K enough
      otto@180      0.565  mbed arm cc -O3  F469NI
      leo@180       0.565  mbed -O3  F446RE
      F767ZI@216    0.332  mbed -O3  0.203 mbedtls
      1010@500      0.094  SDK -O3
      1170@996      0.057  SDK -O3 (NXP SDK mbedtls lib +crypto-accel 0.0069 secs)
      cm4@400       0.334  SDK -O3 (NXP SDK mbedtls lib +crypto-accel 0.0134 secs)
rsasign.png
Teensy LC and 2++ and MEGA2560 won't work -- need more than 8K RAM?

The benchmark is based on mbedtls. The NXP SDK has upgraded its libraries (mbedtls and wolfssl) to take advantage of the MCUs' crypto accelerators (CAU, DCP, or CAAM) for hashing (SHA256) and secret-key encryption (AES). One could also build and test the benchmark on desktops and the Raspberry Pi using the repository C files, or using OpenSSL, mbedtls, or wolfssl.

Here are some more Teensy crypto performance numbers (microseconds) for mbedtls and wolfssl (optimize Faster), where the CRT column is the RSA signature time using the CRT shortcut.
Code:
     tls    SHA256 100!   DH       RSAs    RSAv    CRT   us    Faster
     T4       53    163   40279   286990   3413    81259    @600
     T3.6    371    718  223952  1601592  19819   451616    @180
     T3.5    593   1494  427386  3020459  36098   861444    @120  sketch 64K 6.5K
     T3.2    844   1724  580803  4182796  47101  1169631    @120mhz
     ESP32   278   1408  270480  1999836  25128   545525    @240 -O2
     F767ZI  159    303  157470  1179640  13442   317300    @216 -O3 ARM cc
     F469NI  326    792  266110  1980234  23927   536865    @180 -O3 ARM cc  otto
     M4      559   1384  380810  2641328  38555   769537    @120 SAMD51
     dragon 1473   2333  548201  3857128  47745  1105866    STML476RE @80mhz
     32F405  448   1272  313675  2214487  33196   633691    @168mhz -O2
     pico   1691   1662 1117457  8488853  87161  2246416    @125mhz -O3
     maple  1083   2498  969395  7132988  82426  1952519    -O2
     DUE     887   2407  854304  6085038  74818  1722067    -Os
     ZERO   3124   6972 5278327 40129636 425293 11133000    -Os  cpx  SAMD21
     R4     1352   3785 1041835  7293808 104724  2104544    -O3  @48mhz
     1010    123    286   44383   310865   4249    86645    -O3 @500mhz  SDK
     1170      8  11780    2849    19114    486     6178    -O3 SDK @996mhz crypto accel, Dcache off
     1170     74     88   27108   195441   2325    54660    -O3 Paul's local tls, no crypto accel, Dcache enabled


     ssl  SHA256 100!   DH       RSAs    RSAv    CRT     us    Faster
     T4      47    22    33947   213540   7918    67656
     T3.6   347   135   223692  1478821  52545   444273
     T3.5   558   204   344588  2233720  83411   684374
     T3.2   769   210   348616  2241161  86211   692799  @120mhz
     ESP32  550   224   254573  1641904  41192   508553  @240mhz -O2
     F767ZI 138    76    96382   553786  21363   191819  @216 -O3
     F469NI 700   168   693462  5216466 134668  1376524  @180
     M4     442   243   355905  2319518  78639   704381  @120
     dragon 798   371   539358  3488030 127341  1072544  @80mhz
     32F405 397   171   243018  1582018  57316   479503  @168mhz -O2
     pico  1225   606   115476  8442560 178083  2308845  @125mhz -O3
     maple 1127   462   911500  6235716 193069  1810600  -O2
     DUE    927   638  1199187  8311095 241489  2381622  -Os
     ZERO  3081  2684  6385065 47938562 765075 12819002  -Os cpx
     R4    1218   559   907313  5996682 216354  1792292  -O3 @48mhz
     1010    06    51    38257   257315   9334    79592  -O3 @500mhz SDK
     1170   110    85    19627   129262   4563    40022  -O3 SDK @996mhz
Factorial, Diffie-Hellman, and RSA are based on each library's big-integer implementation. RSA is 2048-bit encryption/decryption using N, P, Q, E, and D from Paul's RSA sketch, with or without the Chinese Remainder Theorem (CRT). With CRT disabled in Paul's sketch (#define MBEDTLS_RSA_NO_CRT in local_rsa.h), T4 signing takes 0.290 seconds. Also see mini-gmp
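For anyone wondering what the CRT shortcut does, here is a minimal desktop-side sketch in Python. It is not the mbedtls or wolfssl code and does not use the 2048-bit key from Paul's sketch. The parameters are a textbook-sized toy key and the function names are made up for illustration. It only shows that signing with two half-size exponentiations (mod P and mod Q) gives the same result as one full-size exponentiation mod N.

```python
# Toy RSA key (textbook-sized, NOT the 2048-bit key from the sketch).
p, q = 61, 53
n = p * q
e = 17
d = pow(e, -1, (p - 1) * (q - 1))   # private exponent (needs Python 3.8+)

# Values precomputed once for the CRT shortcut.
dp = d % (p - 1)
dq = d % (q - 1)
q_inv = pow(q, -1, p)

def sign_plain(m):
    """One full-size modular exponentiation: s = m^d mod n."""
    return pow(m, d, n)

def sign_crt(m):
    """Two half-size exponentiations, recombined (Garner's formula)."""
    s1 = pow(m, dp, p)              # signature mod p
    s2 = pow(m, dq, q)              # signature mod q
    h = (q_inv * (s1 - s2)) % p     # correction term, in the range 0..p-1
    return s2 + h * q

def verify(m, s):
    """Public-key check: s^e mod n must give back m."""
    return pow(s, e, n) == m

message = 65
assert sign_crt(message) == sign_plain(message)
assert verify(message, sign_crt(message))
print("CRT and plain signatures match")
```

With schoolbook big-integer multiplication, each half-size exponentiation costs roughly an eighth of the full-size one, so two of them give a theoretical speedup of about 4x. The measurements above are in that neighborhood: 286990 us versus 81259 us on the T4 mbedtls row, roughly 3.5x.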

T4 crypto accelerator DCP tests

T4 floating point benchmarks

more MCU performance comparisons crypto, floating point, coremark
 
Not sure if it's the best hardware, but I've been playing around some with:
IMG_0845-(002).jpg

Also I have some other boards I have fabricated but have not assembled one yet:
T4-RPI.jpg
 
Ok, it's official - Teensy 4.0 has been released and starts shipping today.

https://www.pjrc.com/store/teensy40.html

It's now ok to post photos of the best test hardware. :)

Congratulations!

A huge Thank You to Paul, the PJRC crew, and the beta testers/contributors!

Unfortunately, I was (and I am still) "under water" with lots of other things going on and could neither test nor contribute. But I'm happy to see that the T4 has now come to market, even without crowdfunding, and I wish it a huge success!
 
Great news!

I know this is hot off the press, but are there any preliminary thoughts on the evolution of the 4.0? Will there be a 4.6 with more available I/O pins and an SD card slot?

We have several hardware devices we use in scientific balloon projects that are based on the 3.6, and it would be great if we could migrate these to a 4.6 to take advantage of the new performance and features.
 
Oh, that is going to be a pain to get access to.

What is the best way to access those pins when designing a board for the T4?

I don't know if there is a best way, but two ways that have been used are pogo pins and an SMD header. For one of the breakout boards that I created, I am using the SMD header on the underside pins and pogo pins for the USBHost connector. Here is a photo of the breakout board and the underside of one of the Beta T4 boards we used for testing.
20190807_120441.jpg
 
I've checked this thread often just to see the crazy stuff being tested. I can't say I fully understood any of it but it was awesome to watch all the contributors working on this together. Congratulations to everyone and my hat is off to you Paul!
 
@KurtE

I like that, are they available?

If you mean are they available for purchase... Not exactly

I do often put some of the designs I am working with up on github in a hodge podge of stuff: https://github.com/KurtE/Teensy3.1-Breakout-Boards
There are DipTrace design and layout files for these, plus a zip file that you can send off to places like OSHPark, Seeedstudio, pcbway...

For the one partially assembled board, I put up an xls file with the parts on it, which is probably only partially correct. I just updated it a little, as I found that earlier I had called out the wrong resistors for R11 and R12...

Again, I only do this for my own fun, so there is no guarantee it is good for anything. Also, I have not fully assembled one yet. For example, I have not put an ILI9341 display (PJRC) on it yet, so I have not populated the two transistors that allow you to control the brightness... I also have not put a LoRa on it, so I don't remember everything I have not put on it yet... And I only just started to play around with Robotis servos, and I have not tried out the USB on it yet. The version you see had an issue with the power chip on the bottom, which I fixed and got new boards for, but I have not built one yet...
 