How to build a fairly secure way to authenticate/license product

Status
Not open for further replies.

Matthew

Member
Hi,

So as the topic stated I want to be sure that:

  1. the device I am selling is used untouched, as I provided it: the same hardware with the same software, which authenticates the product itself, and
  2. some persistent license info about the customer and product number is stored, making the device uniquely traceable.

All this concerns a Teensy 3.6 (it could also be a 3.2 or 3.5) connected to a Linux machine that runs the user application.

My thoughts:

1.
Firstly I wanted to use the MAC address. For example, I get the address from the controller, compare it with data stored on the user app side, and then unlock the device. But in my case I am buying the main chip and bootloader chip separately, so the MAC is useless (factory set to FF:FF:...:FF). Then I wanted to use the serial number; however, after some conversation with Paul, the serial number also seems to be useless.

However, the serial number alone probably isn't a useful way to check for authenticity, since it's easy to craft a USB device which simply sends the same descriptors.

Another question is the uniqueness of the serial number assigned by the bootloader - I made a new thread for that - https://forum.pjrc.com/threads/5688...number-to-brand-new-chips?p=209717#post209717

2.
  • I will use the option of blocking access to flash memory (FSEC set to secured), so I can flash the program with the customer data and product number already built in and, AFAIK, be as safe as the NXP security is - tell me if I am wrong. I can make a special app for this purpose to automate the build process, with customer data provided from the app and stored in a company-internal database as well. So it is not inconvenient for me.
  • However, 2a will generate a new .hex file for every device, which is not good in my case. I would rather keep one .hex and provide the customer data externally from a special app, using my custom protocol over USB, in "factory safe conditions". For example, when I flash the program for the first time, I can enter the data once, and after this process access is blocked by setting some data in EEPROM. Which one is better in security terms? Or maybe there is a better solution? I am also aiming at automating the process, but that is not a big issue given the small forecasted annual sales.

And finally, if something goes wrong (different hardware, license check failed, etc.) the device should not start - the application that controls the device should show an error, etc.

I imagine that during the in-company setup process the hardware and software should be securely and permanently paired with the user application once. So, as I wrote, I am thinking about combining the hardware IDs with some product-specific description provided in the firmware. I am waiting for your opinions and suggestions.

~Matthew
 
Hi,

if you are still able to modify your circuit you could add a preprogrammed UID chip.
These have a unique serial number already programmed in a read-only portion, so there is no way to overwrite it.
This way you can store all the serial numbers of your official products and:
  • If someone is able to replicate your hardware, you will be able to identify that by cross-checking your SN database.
  • You will be able to provide a restore feature to the end user, as a firmware rewrite procedure won't wipe the SN data, since it is "outside" the main MCU.
  • You will be able to use the same method with a different MCU, even with a different architecture, as long as the chip library can be compiled.

Hope this will help you solve this issue.
 
if you are still able to modify your circuit you could add a preprogrammed UID chip.
These have a unique serial number already programmed in a read-only portion, so there is no way to overwrite it.

It's still a work-in-progress so that can be done. Thanks for the advice. Do you know or can you recommend any chips?

However, the more I read about it the more I think that the unique serial number is only 50% of the work. The other half is end-to-end encryption, at least for the time of key exchange or authorization.
 
Hi,

So as the topic stated I want to be sure that:
...
And finally, if something goes wrong (different hardware, license check failed, etc.) the device should not start - the application that controls the device should show an error, etc.
...
~Matthew

Make this bomb-proof; otherwise you'll experience an increase in product support interaction, and I would expect users to eventually seek solutions elsewhere.
 
Assuming you're setting the code security so nobody can read a copy of your code, you could embed an RSA private key and digital signature code into your firmware.

Then when your PC-based program wants to check whether some USB device really is running your code, it could generate some random data and send it to your device. Your firmware would use its private key to create a digital signature of the random data, which it would send back to the PC. Then on the PC side, you'd embed the public key into your program, to verify if the signature is correct.

If someone snoops the communication, they'll only capture the correct response for that 1 set of random data. Even if someone reverse engineers your program on the PC side and learns everything about your checking protocol, at most they'll get the smaller public key which is only good for checking the signature. Without a copy of the private key, which is only inside the firmware, they can't create a correct signature for arbitrary random data your program sends.
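
To make the challenge/response flow above concrete, here is a minimal sketch in Python. It is not the code from the repository linked in this post: it uses textbook RSA with a toy modulus built from two known Mersenne primes and no padding, purely so it runs with the standard library. The function names (`device_sign`, `pc_verify`) are made up for illustration; a real design would use a proper library such as mbedtls with a 2048-bit key.

```python
import hashlib
import secrets

# Toy textbook-RSA parameters, for illustration only (ASSUMPTION): two known
# Mersenne primes give a ~150-bit modulus, far too small for real use, and
# there is no padding scheme.
P = 2**61 - 1
Q = 2**89 - 1
N = P * Q
E = 65537                            # public exponent, embedded in the PC program
D = pow(E, -1, (P - 1) * (Q - 1))    # private exponent, lives only in the firmware


def digest_as_int(challenge: bytes) -> int:
    """Hash the challenge and reduce it into the range of the modulus."""
    return int.from_bytes(hashlib.sha256(challenge).digest(), "big") % N


def device_sign(challenge: bytes) -> int:
    """Firmware side: sign the random data sent by the PC with the private key."""
    return pow(digest_as_int(challenge), D, N)


def pc_verify(challenge: bytes, signature: int) -> bool:
    """PC side: check the signature using only the public key (E, N)."""
    return pow(signature, E, N) == digest_as_int(challenge)


if __name__ == "__main__":
    challenge = secrets.token_bytes(32)      # fresh random data per check
    signature = device_sign(challenge)
    print("genuine device accepted:", pc_verify(challenge, signature))
    # A recorded signature is checked against a *new* challenge next time:
    print("replayed answer accepted:", pc_verify(secrets.token_bytes(32), signature))
```

Because the PC picks new random data for every check, a captured response is only valid for the challenge it was recorded with.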

As part of the CPU benchmarking for Teensy 4.0, I ported a copy of mbedtls specifically for digital signatures. Maybe this could help?

https://github.com/PaulStoffregen/RSA_signature_speed

On Teensy 3.6, the signing process takes about half a second, using a 2048 bit RSA key, which of course gives a signature that's 2048 bits.
 
One of the companies I worked for made a consumer product involving cryptographic authentication, using a dedicated chip doing a secure exchange between the device and a consumable pay-as-you-go token (I don't actually recall which chip we were using). I just wanted to mention, be prepared for a significant annoyance factor in both development and production that this kind of thing necessarily adds to the mix. Basically it's adding a significant intentional failure mode, and if the security is any good, a very limited subset of people are able to deal with any functional issues that may arise. No doubt there are cases where the business model demands it. Just don't underestimate the overall hassle factor to yourself.

(When marketing changed their minds and the new model got rid of the crypto system, there was some rejoicing in engineering and production.)
 
I also recommend public key cryptography for this, just like Paul above.

However, to combat against replay attacks, you'll want the microcontroller to generate a random number, to which the host program makes some mathematical modification, and then returns. This ensures that even if somebody records the USB transmission, they cannot replay the sequence. (For example, if your device was say an electronic lock, a replay attack would be to record the USB transmission for that particular lock, and replay it back later/without the computer, to open the lock.)

To ensure that an application and a device are paired, I'd use two key pairs: D and A. The device knows DPRI and APUB, and the application DPUB and APRI. This locks in a specific device to a specific application, and vice versa. The way public key cryptography works, is that anything encrypted with XPRI is decryptable only using XPUB, and vice versa; you cannot derive XPRI or XPUB even if you know the other one.

I am also rather paranoid, so I like to have both ends generate parts of the secret random number, to make all communication replay attacks fail, not just those against the device or the application. I like the secret to be long enough for it to be used as the initialization vectors for the symmetric encryption used for normal communications, with different vectors for each direction. For the symmetric encryption, AES256-CTR seems like an obvious choice here.

So, the sequence I would implement, would be
  1. Application sends an unencrypted "hello" message to the device.
    This is just a marker, something easily recognized even in the middle of encrypted communications.
    I recommend using a message that starts with a lot of zeros (longer than any single encrypted message), followed by a specific signature.
    (Consider the situation where the application crashes midway through, and is restarted: the device needs to restart the communications from scratch also. So, this message is basically "Let's restart", and the device should probably be ready to expect it at any time. It being long with lots of zeros means it is easy to detect.)
  2. Device responds with say a 256-bit random number, encrypted using DPRI.
  3. Application decrypts the random number with DPUB. It generates another 256-bit random number, concatenates it to the decrypted one from the device, encrypts both with APRI, and sends the encrypted data to the device.
  4. Device decrypts the result with APUB, and verifies that the first part matches what it sent.
    If the result does not match, the device sends some kind of "no, you're not the correct application" message, and refuses anything but "hello" messages from the application.
    If the result matches:
  5. Device uses specific 96 bits from its original random number, and another 96 bits from the application, to be used as the 192-bit nonce (with 64-bit counter) for SHA256-CTR encryption of the data sent from device to application.
  6. Application uses specific 96 bits from its own random number, and another 96 bits from the device random number, to be used as the 192-bit nonce (with 64-bit counter) for SHA256-CTR encryption of the data sent from application to device.

After the above has been successfully done, both the application and device know the 192-bit nonces used with 64-bit counters as the keys for SHA256-CTR, a fast symmetric encryption for messages between the two. Each direction has their own key, because that way the counter is explicit and easy to maintain.
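
As a rough sketch of steps 5 and 6, this is how both ends could derive the two per-direction nonces from the exchanged random numbers. Which "specific 96 bits" are taken is not fixed above, so the slices here (first 12 bytes of the sender's own number, last 12 bytes of the peer's) are purely an assumption, as is the function name.

```python
import secrets


def direction_nonces(device_rand: bytes, app_rand: bytes):
    """Return (device_to_app_nonce, app_to_device_nonce), 24 bytes (192 bits) each.

    ASSUMPTION: the bit selection below is hypothetical; any fixed choice
    works as long as device and application agree on it.
    """
    if len(device_rand) != 32 or len(app_rand) != 32:
        raise ValueError("both random numbers must be 256 bits")
    # Sender's own first 96 bits + peer's last 96 bits.
    device_to_app = device_rand[:12] + app_rand[20:]
    app_to_device = app_rand[:12] + device_rand[20:]
    return device_to_app, app_to_device


if __name__ == "__main__":
    device_rand = secrets.token_bytes(32)   # generated by the device in step 2
    app_rand = secrets.token_bytes(32)      # generated by the application in step 3
    d2a, a2d = direction_nonces(device_rand, app_rand)
    print(len(d2a) * 8, len(a2d) * 8)       # both are 192-bit nonces
```

Both sides run the same function on the same two numbers, so no further messages are needed to agree on the nonces.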

To encrypt or decrypt data using SHA256-CTR, you work in blocks of 256 bits (32 bytes). You use a 256-bit key, with 192 high bits from the nonce, and 64 low bits forming a counter (that increments by one for each message). You compute pad = sha256(key) (i.e. SHA256 hash of the current key) for each message. To encrypt or decrypt a block, you simply XOR with that particular pad. Remember, the pad changes for every single message.
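
A minimal sketch of the pad computation described above, using Python's `hashlib`. The big-endian counter encoding and the function names are my assumptions; the scheme itself (key = 192-bit nonce followed by a 64-bit counter, pad = SHA256 of the key, XOR with the 32-byte block) is as described.

```python
import hashlib

BLOCK_BYTES = 32   # one SHA-256 output, i.e. one 256-bit message block
NONCE_BYTES = 24   # 192-bit nonce


def pad_for(nonce: bytes, counter: int) -> bytes:
    """pad = sha256(key), where key = nonce (192 high bits) + counter (64 low bits)."""
    if len(nonce) != NONCE_BYTES:
        raise ValueError("nonce must be 192 bits")
    key = nonce + counter.to_bytes(8, "big")   # ASSUMPTION: big-endian counter
    return hashlib.sha256(key).digest()


def crypt_block(block: bytes, nonce: bytes, counter: int) -> bytes:
    """Encrypt or decrypt one 32-byte block; XOR makes both operations identical."""
    if len(block) != BLOCK_BYTES:
        raise ValueError("block must be 32 bytes")
    pad = pad_for(nonce, counter)
    return bytes(b ^ p for b, p in zip(block, pad))


if __name__ == "__main__":
    nonce = bytes(range(NONCE_BYTES))
    message = b"status: ok".ljust(BLOCK_BYTES, b"\0")
    sent = crypt_block(message, nonce, counter=0)
    received = crypt_block(sent, nonce, counter=0)
    print(received == message)
```

The sender and receiver each keep their own copy of the counter for a given direction and increment it after every message, so a pad is never reused.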

Personally, I would use USB Serial, and send and receive in chunks of 32 bytes. The internal structure of those chunks depends on the data transmitted. Typically, I'd use the first byte as a type or indicator, with 0 to 31 bytes of actual payload data.

For example, let's say the device and the application communicate using binary strings. (Say, ASCII or UTF-8 text.)
Then, I'd have the first byte in each chunk be a "type", followed by 31 data bytes. Type zero is reserved for the goodbye message. Types 33-63 indicate the number of data bytes (1 to 31) in this chunk and that the string continues; types 64-127 indicate that there are 0 to 31 bytes in this chunk, and that the string ends there.
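
Here is one way the chunk framing could look, as a sketch. The exact type values are my assumption: 32 + n for "n bytes, string continues" and 64 + n for "n bytes, string ends", with the low five bits read back as the length. Padding unused bytes with zeros is also an assumption. Each resulting 32-byte chunk would then be XORed with that message's pad before sending.

```python
CHUNK_BYTES = 32
MAX_DATA = CHUNK_BYTES - 1   # one type byte, up to 31 payload bytes


def encode_string(data: bytes):
    """Split a string into 32-byte chunks: type byte + payload + zero padding."""
    pieces = [data[i:i + MAX_DATA] for i in range(0, len(data), MAX_DATA)] or [b""]
    chunks = []
    for index, piece in enumerate(pieces):
        is_last = index == len(pieces) - 1
        # ASSUMPTION: 32 + n means "continues", 64 + n means "ends here".
        type_byte = (64 if is_last else 32) + len(piece)
        chunks.append(bytes([type_byte]) + piece.ljust(MAX_DATA, b"\0"))
    return chunks


def decode_chunks(chunks):
    """Reassemble one string; returns None if a goodbye (type 0) chunk arrives."""
    out = bytearray()
    for chunk in chunks:
        type_byte = chunk[0]
        if type_byte == 0:
            return None                      # goodbye message
        if not 33 <= type_byte <= 127:
            raise ValueError("unknown chunk type %d" % type_byte)
        count = type_byte & 0x1F             # low five bits hold the length
        out += chunk[1:1 + count]
        if type_byte & 0x40:                 # 64-127: string ends here
            return bytes(out)
    raise ValueError("string did not end")


if __name__ == "__main__":
    text = "temperature=23.5C; fan=on; uptime=123456s".encode("utf-8")
    chunks = encode_string(text)
    print(len(chunks), [c[0] for c in chunks])
    print(decode_chunks(chunks).decode("utf-8"))
```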

All of this is implementable using the code Paul showed, including the SHA256 cipher, except for the random number generator. (I am personally paranoid enough to suggest adding an avalanche diode circuit to generate additional physical randomness, and probably would use a couple of Xorshift* generators with entropy pools fed through SHA256 to get the 256-bit random numbers. The trick is to continually progress the generators even when nothing interesting happens, and refuse operation until there is enough hardware entropy (randomness) to work.)

I would consider this safe enough for any tool use, but not for transferring financial information, passwords, or secret keys. For those, I would require public-key authentication for each transaction separately, as the above scheme allows an attacker able to interrupt the application to potentially interject messages (if they find out the nonce somehow, even statistically).
 
If you're using any external entropy source, like a diode, you might also want some kind of statistical check to make sure it didn't get stuck on all 0's or 1's, accidentally or on purpose.
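
For instance, a very simple check could just count the fraction of 1 bits in a batch of samples and refuse the source if it is far from one half. This is only a sketch: the function name and the 0.1 tolerance are arbitrary assumptions, and a bit-count test like this catches stuck-at faults but not repeating patterns.

```python
def looks_stuck(samples: bytes, tolerance: float = 0.1) -> bool:
    """Return True if the share of 1 bits is suspiciously far from 50%.

    ASSUMPTION: the tolerance is an arbitrary illustrative value.
    """
    total_bits = len(samples) * 8
    if total_bits == 0:
        return True                      # no data at all is also a failure
    ones = sum(bin(byte).count("1") for byte in samples)
    return abs(ones / total_bits - 0.5) > tolerance


if __name__ == "__main__":
    print(looks_stuck(bytes(64)))        # all zeros
    print(looks_stuck(b"\xff" * 64))     # all ones
    print(looks_stuck(b"\xa5" * 64))     # balanced, but note: still a fixed pattern
```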
 
Fully agreed, JBeale. I alluded to that in the second-to-last paragraph.

Teensy 4.0 contains a hardware entropy source, TRNG, that might suffice; but personally, I'd like more than one entropy source. Using exclusive-or to mix in entropy retains the randomness (i.e. even if the mixed-in data contains just a fraction of a bit of randomness, it will never decrease the entropy in the pool, only increase it).

For example, if you use a white noise circuit and sample it using a 16-bit ADC, you could use say a 32-sample circular buffer for the readings. Whenever you read a new sample, you check if it matches any of the last 32 samples. If it does, discard it. Otherwise, XOR the sample with the entropy pool, and replace the oldest value in the buffer with this sample. This is trivial to implement, but does not try to estimate the randomness in the noise; it could even be a perfectly repeated signal (say, a sine wave). However, as long as there is a signal, any randomness in that signal, or in the times when that signal is sampled, adds to the randomness in the entropy pool. I like that kind of robustness myself. Only a flat signal is then completely useless!
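
A sketch of that circular-buffer filter, in Python for readability rather than as Teensy code. The 32-byte pool and the way each 16-bit sample is XORed in at a rotating position are my assumptions (the description above only says to XOR the sample into the pool); the class and method names are made up.

```python
from collections import deque


class NoisePool:
    """Mix 16-bit ADC samples into a pool, skipping recently seen values."""

    def __init__(self, history: int = 32, pool_bytes: int = 32):
        self.recent = deque(maxlen=history)   # circular buffer of last readings
        self.pool = bytearray(pool_bytes)     # ASSUMPTION: 256-bit pool
        self.position = 0
        self.accepted = 0

    def add_sample(self, sample: int) -> bool:
        sample &= 0xFFFF
        if sample in self.recent:
            return False                      # matches a recent sample: discard
        # ASSUMPTION: XOR the two sample bytes in at a rotating pool position.
        self.pool[self.position] ^= sample >> 8
        self.pool[self.position + 1] ^= sample & 0xFF
        self.position = (self.position + 2) % len(self.pool)
        self.recent.append(sample)            # deque drops the oldest value itself
        self.accepted += 1
        return True


if __name__ == "__main__":
    pool = NoisePool()
    flat = [pool.add_sample(1234) for _ in range(100)]
    print("flat signal accepted:", sum(flat), "of", len(flat))
```

Counting accepted samples gives the firmware a cheap way to refuse operation until enough distinct readings have been mixed in.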

The Xorshift family of pseudorandom number generators are nice in that they have only one invalid state, all zeros, but are extremely fast. This means that you can "churn" the generator state constantly (say in your main loop, tick interrupt, USB receive interrupt, and other interrupts -- it only adds a few cycles on any of the Teensies; especially as the multiplication step can be omitted as the generated numbers are discarded, so the entire operation is just a few shifts and XORs on the generator state!), making it impossible to derive the generator state from the outputs, as only a small fraction of the state is ever used, and it changes constantly, independent of the computer or the application. Running the output through SHA256 is just insurance: it both ensures all bits of the generated output are uniformly random, and gives mathematical assurance that the output does not reveal the internal generator state.
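
Roughly, the churning idea could look like this. The shift constants (12, 25, 27) and multiplier are the commonly published xorshift64* parameters; everything else (two generators, the class and method names, how entropy is mixed in) is an assumption for illustration. Note that two 64-bit states hold at most 128 bits of state, so a real design would want more generators or a wider pool behind a 256-bit output.

```python
import hashlib

MASK64 = (1 << 64) - 1
MULTIPLIER = 0x2545F4914F6CDD1D   # xorshift64* output multiplier


def xorshift_step(state: int) -> int:
    """Advance a 64-bit xorshift state; a nonzero state never becomes zero."""
    state ^= state >> 12
    state ^= (state << 25) & MASK64
    state ^= state >> 27
    return state


class Churner:
    def __init__(self, seed_a: int, seed_b: int):
        # All-zero is the one invalid state, so map it to 1.
        self.states = [(seed_a & MASK64) or 1, (seed_b & MASK64) or 1]

    def churn(self) -> None:
        """Cheap state advance for main loop / interrupts; output is discarded,
        so the multiplication step is skipped."""
        self.states = [xorshift_step(s) for s in self.states]

    def mix_entropy(self, value: int) -> None:
        """XOR hardware entropy into the first generator (ASSUMPTION)."""
        self.states[0] = (self.states[0] ^ (value & MASK64)) or 1

    def random256(self) -> bytes:
        """Produce 256 output bits by hashing the generators' scrambled outputs."""
        self.churn()
        raw = b"".join(((s * MULTIPLIER) & MASK64).to_bytes(8, "big")
                       for s in self.states)
        return hashlib.sha256(raw).digest()


if __name__ == "__main__":
    rng = Churner(seed_a=0x1234, seed_b=0x5678)
    for _ in range(1000):          # stands in for idle-time churning
        rng.churn()
    rng.mix_entropy(0xBEEF)        # e.g. a value from the entropy pool
    print(rng.random256().hex())
```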
 
First of all, I want to thank you guys; I didn't notice your answers earlier.

When the time comes I will try your suggestions and publish my results.
 