AES encryption/decryption for Teensy - test speeds

Status
Not open for further replies.

stevech

Well-known member
Just FYI

I adapted some C code I had on hand from a prior project, where that code ran on a Windows PC and an ARM7 that communicated via short-range wireless plus the internet (end-to-end encryption).

I ran the (US) NIST test data sets (test vectors). NIST specifies that, for encryption, these particular input bytes must produce these particular encrypted bytes, and vice versa for decryption.
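For anyone who wants to see what such a known-answer test looks like, here is a minimal sketch in C. It is not the code from my project: it is a compact, unoptimized AES-128 block encryption written only for illustration, checked against the single-block example vector in FIPS-197 Appendix C.1 (which may not be the same vector set as the 32-byte one mentioned here). The function names are made up for this sketch.

```c
#include <stdint.h>
#include <string.h>

static uint8_t sbox[256];

static uint8_t rotl8(uint8_t x, int n) { return (uint8_t)((x << n) | (x >> (8 - n))); }
/* Multiply by 2 in GF(2^8) with the AES reduction polynomial. */
static uint8_t xtime(uint8_t x) { return (uint8_t)((x << 1) ^ ((x & 0x80) ? 0x1B : 0)); }

/* Build the AES S-box at run time instead of storing a 256-byte table. */
static void init_sbox(void) {
    uint8_t p = 1, q = 1;                 /* invariant: p * q == 1 in GF(2^8) */
    do {
        p = (uint8_t)(p ^ xtime(p));      /* p = p * 3 */
        q ^= (uint8_t)(q << 1);           /* q = q / 3 */
        q ^= (uint8_t)(q << 2);
        q ^= (uint8_t)(q << 4);
        if (q & 0x80) q ^= 0x09;
        sbox[p] = (uint8_t)(q ^ rotl8(q, 1) ^ rotl8(q, 2) ^ rotl8(q, 3) ^ rotl8(q, 4) ^ 0x63);
    } while (p != 1);
    sbox[0] = 0x63;                       /* 0 has no inverse, handled separately */
}

/* Expand a 16-byte key into 11 round keys (176 bytes). */
static void expand_key(const uint8_t key[16], uint8_t rk[176]) {
    uint8_t rcon = 1;
    memcpy(rk, key, 16);
    for (int i = 16; i < 176; i += 4) {
        uint8_t t[4];
        memcpy(t, rk + i - 4, 4);
        if (i % 16 == 0) {                /* RotWord, SubWord, round constant */
            uint8_t first = t[0];
            t[0] = (uint8_t)(sbox[t[1]] ^ rcon);
            t[1] = sbox[t[2]];
            t[2] = sbox[t[3]];
            t[3] = sbox[first];
            rcon = xtime(rcon);
        }
        for (int j = 0; j < 4; j++) rk[i + j] = (uint8_t)(rk[i - 16 + j] ^ t[j]);
    }
}

/* Encrypt one 16-byte block; state index is 4 * column + row. */
static void encrypt_block(const uint8_t rk[176], const uint8_t in[16], uint8_t out[16]) {
    uint8_t s[16], t[16];
    for (int i = 0; i < 16; i++) s[i] = (uint8_t)(in[i] ^ rk[i]);
    for (int round = 1; round <= 10; round++) {
        for (int c = 0; c < 4; c++)       /* SubBytes and ShiftRows together */
            for (int r = 0; r < 4; r++)
                t[4 * c + r] = sbox[s[4 * ((c + r) % 4) + r]];
        if (round < 10) {                 /* MixColumns, skipped in the last round */
            for (int c = 0; c < 4; c++) {
                uint8_t *a = t + 4 * c, b[4];
                uint8_t all = (uint8_t)(a[0] ^ a[1] ^ a[2] ^ a[3]);
                for (int r = 0; r < 4; r++)
                    b[r] = (uint8_t)(a[r] ^ all ^ xtime((uint8_t)(a[r] ^ a[(r + 1) % 4])));
                memcpy(a, b, 4);
            }
        }
        for (int i = 0; i < 16; i++) s[i] = (uint8_t)(t[i] ^ rk[16 * round + i]);
    }
    memcpy(out, s, 16);
}

/* Known-answer test: returns 1 if the FIPS-197 Appendix C.1 vector matches. */
int aes128_kat_passes(void) {
    static const uint8_t expected[16] = {
        0x69, 0xc4, 0xe0, 0xd8, 0x6a, 0x7b, 0x04, 0x30,
        0xd8, 0xcd, 0xb7, 0x80, 0x70, 0xb4, 0xc5, 0x5a };
    uint8_t key[16], plain[16], rk[176], cipher[16];
    for (int i = 0; i < 16; i++) {
        key[i] = (uint8_t)i;              /* key   00 01 02 ... 0f */
        plain[i] = (uint8_t)(i * 0x11);   /* input 00 11 22 ... ff */
    }
    init_sbox();
    expand_key(key, rk);
    encrypt_block(rk, plain, cipher);
    return memcmp(cipher, expected, 16) == 0;
}
```

On a Teensy one would call aes128_kat_passes() from setup() and print the result; on a PC, from main(). A decryption test works the same way in the other direction.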
The speed was way faster than I expected based on prior work.

The NIST tests use a set of 32 bytes.
For embedded systems, I've used AES-128 with the CCM option. That makes the number of encrypted bytes the same as the number of unencrypted (plain text) bytes: the last encryption block is truncated so that pad bytes aren't needed. This is important for wireless links where the MAC/PHY frames have a small maximum size, like IEEE 802.15.4 at about 100 bytes max. The scheme adds a message authentication (and integrity) check of about 4 bytes. With these, and a mutual agreement on the "nonce", one can do mutual authentication and avoid using one key shared by all devices. This means that if a key is stolen, only one device is affected, unlike shared-key systems (like 802.11/WiFi). That is really important when there are dozens or thousands of devices communicating with a host and these are unattended embedded systems. A lost or stolen key can be revoked in the central server and all other nodes press on, each with its own unique key.
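To illustrate why no pad bytes are needed, here is a rough sketch of the counter-mode part only. The assumptions are loud ones: toy_block_encrypt() is a hypothetical stand-in for AES and is NOT secure, the counter block layout is simplified and is not CCM's exact format, and the authentication check is left out entirely. The only point is that the keystream is XORed onto the data and the last block is simply cut short.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_BYTES 16

/* Hypothetical stand-in for AES-128 block encryption. NOT secure; it only
   lets the sketch run without pulling in a real cipher. */
static void toy_block_encrypt(const uint8_t key[BLOCK_BYTES],
                              const uint8_t in[BLOCK_BYTES],
                              uint8_t out[BLOCK_BYTES]) {
    for (int i = 0; i < BLOCK_BYTES; i++)
        out[i] = (uint8_t)((in[i] ^ key[i]) * 167u + 13u + (unsigned)i);
}

/* Counter mode: encrypt (nonce || block number) to get a keystream block and
   XOR it onto the data in place. The same call encrypts and decrypts. */
void ctr_xor(const uint8_t key[BLOCK_BYTES], const uint8_t nonce[12],
             uint8_t *data, size_t len) {
    uint8_t counter_block[BLOCK_BYTES], keystream[BLOCK_BYTES];
    uint32_t block_no = 0;
    memcpy(counter_block, nonce, 12);
    for (size_t off = 0; off < len; off += BLOCK_BYTES, block_no++) {
        counter_block[12] = (uint8_t)(block_no >> 24);
        counter_block[13] = (uint8_t)(block_no >> 16);
        counter_block[14] = (uint8_t)(block_no >> 8);
        counter_block[15] = (uint8_t)block_no;
        toy_block_encrypt(key, counter_block, keystream);
        /* The last block uses only as many keystream bytes as there is data. */
        size_t n = (len - off < BLOCK_BYTES) ? (len - off) : BLOCK_BYTES;
        for (size_t i = 0; i < n; i++) data[off + i] ^= keystream[i];
    }
}

/* Returns 1 if len bytes encrypt to len different bytes and decrypt back. */
int ctr_roundtrip_ok(size_t len) {
    uint8_t key[BLOCK_BYTES], nonce[12], plain[128], buf[128];
    if (len == 0 || len > sizeof buf) return 0;
    for (int i = 0; i < BLOCK_BYTES; i++) key[i] = (uint8_t)(0xA0 + i);
    for (int i = 0; i < 12; i++) nonce[i] = (uint8_t)i;
    for (size_t i = 0; i < len; i++) plain[i] = (uint8_t)i;
    memcpy(buf, plain, len);
    ctr_xor(key, nonce, buf, len);               /* encrypt: still len bytes */
    if (memcmp(buf, plain, len) == 0) return 0;  /* data should have changed */
    ctr_xor(key, nonce, buf, len);               /* decrypt with the same call */
    return memcmp(buf, plain, len) == 0;
}
```

A 100-byte payload stays 100 bytes (six full blocks plus a 4-byte tail); in a real CCM frame the authentication check bytes are then appended on top of that.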

So the speed result for the NIST 32-byte data (one can extrapolate) is 2.4 milliseconds for 10 iterations of encryption, or about 0.24 ms per 32 bytes. Decryption is about the same. I'll try 100 bytes, but it should scale with no big surprises.
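The arithmetic behind those figures, as a small sketch. The helper names are made up for illustration, and the timing function uses the standard C clock() as a portable stand-in; on a Teensy one would more likely take the difference of two micros() readings around the loop.

```c
#include <time.h>

/* Run fn() a number of times and return the elapsed CPU time in milliseconds.
   clock() is a portable stand-in for a microsecond timer on the target. */
double time_iterations_ms(void (*fn)(void), unsigned iterations) {
    clock_t start = clock();
    for (unsigned i = 0; i < iterations; i++) fn();
    return 1000.0 * (double)(clock() - start) / CLOCKS_PER_SEC;
}

/* Average time of one iteration, e.g. 2.4 ms over 10 iterations gives 0.24 ms. */
double per_iteration_ms(double total_ms, unsigned iterations) {
    return total_ms / iterations;
}

/* Throughput implied by processing `bytes` in `ms` milliseconds. */
double bytes_per_second(double bytes, double ms) {
    return bytes / (ms / 1000.0);
}
```

With the numbers above, per_iteration_ms(2.4, 10) is 0.24 ms, and bytes_per_second(32, 0.24) works out to roughly 133,000 bytes per second.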

These numbers are with a Teensy 3.1. The 3.0 would be somewhat slower.
 
If you have the desire, you might be able to speed it up using asm and Cortex M4 specific instructions if you absolutely need the speed, and have plenty of spare time.
 
Yes, the ARM7 version I have is in asm. But milliseconds per frame speeds are more than adequate.
 
Additional to the first post:
Increased to 100 bytes, rather typical of telemetry systems.
AES-128/CCM encryption takes 5.1 milliseconds per 100 bytes; decryption takes 5 milliseconds.
 
Yes, the ARM7 version I have is in asm. But milliseconds per frame speeds are more than adequate.

Yes, if you are communicating with something else, it won't matter once the decode speed is faster than the time to transmit the data (assuming you can do the reading in the background, such as with hardware FIFOs). On server-class machines the equation is different, since there you are typically encrypting/decrypting multiple streams of traffic, including reads/writes to disk. That's why the high-end chips now have AES helper instructions.
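That comparison can be put into numbers with a short sketch. The assumptions here are mine: the link rate is a parameter (250 kbit/s is the nominal rate of the 2.4 GHz IEEE 802.15.4 PHY), and the air time ignores preamble, headers, backoff and acknowledgements. The function names are hypothetical.

```c
/* Time on air for a payload at a given raw bit rate, in milliseconds. */
double air_time_ms(unsigned payload_bytes, double bits_per_second) {
    return payload_bytes * 8.0 / bits_per_second * 1000.0;
}

/* 1 if the crypto finishes within the air time of the same payload. */
int crypto_keeps_up(double crypto_ms, unsigned payload_bytes, double bits_per_second) {
    return crypto_ms <= air_time_ms(payload_bytes, bits_per_second);
}

/* Fraction of CPU time spent on crypto when one frame is handled per interval. */
double cpu_load_fraction(double crypto_ms, double frame_interval_ms) {
    return crypto_ms / frame_interval_ms;
}
```

Under those assumptions a 100-byte payload is on air for about 3.2 ms at 250 kbit/s, so the 5.1 ms figure reported earlier in the thread would not keep pace with back-to-back frames at the full link rate. A telemetry node sending one such frame per second spends only about 0.5% of its time encrypting, which is why milliseconds per frame can still be plenty.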
 
Indeed, on servers. I'm investigating speed on microprocessors, where a couple of ms is adequate. The encryption speed is typically slower than decryption, due to the algorithms.
WiFi, 802.15.4 and others have MAC-layer AES, but that only protects the wireless hop. The idea here is that, in simple devices, the encryption is done at the application or network layer (if there is one). The data then moves across many IP networks, still encrypted, to the destination system, where the keys and counters exist to do mutual authentication and decryption. This avoids the complexity of digital certificates (impractical for many embedded systems that are remote/unattended).
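As a rough idea of what "keys and counters" on the destination system could look like, here is a hypothetical sketch: one table entry per device with its own key, the last accepted frame counter, and a revoked flag. The struct layout, the names and the rule that a counter must always increase are assumptions for illustration, not a description of any particular product.

```c
#include <stddef.h>
#include <stdint.h>

/* One entry per remote device (hypothetical layout). */
typedef struct {
    uint32_t device_id;
    uint8_t  key[16];        /* unique AES-128 key for this device */
    uint32_t last_counter;   /* highest frame counter accepted so far */
    int      revoked;        /* set when the key is reported lost or stolen */
} device_entry;

static device_entry *find_device(device_entry *table, size_t count, uint32_t id) {
    for (size_t i = 0; i < count; i++)
        if (table[i].device_id == id) return &table[i];
    return NULL;
}

/* Revoke one device's key; every other entry is left untouched. */
int revoke_device(device_entry *table, size_t count, uint32_t id) {
    device_entry *d = find_device(table, count, id);
    if (d == NULL) return 0;
    d->revoked = 1;
    return 1;
}

/* Key to use for an incoming frame, or NULL if the device is unknown or
   revoked, or if the counter is not newer than the last accepted one. */
const uint8_t *key_for_frame(device_entry *table, size_t count,
                             uint32_t id, uint32_t counter) {
    device_entry *d = find_device(table, count, id);
    if (d == NULL || d->revoked) return NULL;
    if (counter <= d->last_counter) return NULL;
    return d->key;
}

/* Record the counter only after the frame's authentication check has passed. */
void commit_counter(device_entry *table, size_t count, uint32_t id, uint32_t counter) {
    device_entry *d = find_device(table, count, id);
    if (d != NULL && counter > d->last_counter) d->last_counter = counter;
}
```

The server would call key_for_frame(), verify and decrypt the frame with that key, and only then call commit_counter(), so a forged frame cannot advance the counter.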
 