Is there a crypto analysis on this?
As in a reputable cryptographer saying this is okay, no, not exactly.
The current state of the art in CSPRNGs is fast-key-erasure random-number generators. The /dev/urandom generator in the Linux kernel is a well-known, well-examined implementation.
DJB describes the case of using AES-256-CTR (aka CTR-AES256; see RFC 3686). The state of the art comes in by making sure the key/nonce is erased and undiscoverable afterwards, even if the full internal CSPRNG state is divulged. DJB's recommendation is simple: generate, say, 6144 bits at a time, using the first two blocks (256 bits) to overwrite the stored key used for the next chunk, leaving you with 5888 (or 5920) random bits per chunk, i.e. 736 (or 740) bytes per chunk. The exact number of 128-bit blocks in any subsequence using the same key and nonce is not that important, as long as it is not "too" long.
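The fast-key-erasure structure itself is small enough to sketch. The code below is only an illustration of the structure: the keystream function here is a placeholder (a xorshift-based byte stream, NOT a cipher) standing in for AES-256-CTR or ChaCha20, and the `fke` names and the `FKE_*` sizes are mine. The essential part is that each refill immediately overwrites the stored key with the first bytes of the new keystream, so a later state compromise cannot reveal earlier output.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FKE_KEYBYTES  32    // 256-bit key
#define FKE_POOLBYTES 736   // Output bytes per refill (DJB's example size)

typedef struct {
    uint8_t key[FKE_KEYBYTES];
    uint8_t pool[FKE_POOLBYTES];
    size_t  avail;          // Unread bytes left in the pool
} fke;

// PLACEHOLDER keystream: deterministic bytes derived from the key.
// NOT cryptographic; a real implementation would use AES-256-CTR here.
static void placeholder_stream(const uint8_t key[FKE_KEYBYTES],
                               uint8_t *out, size_t len) {
    uint64_t s;
    memcpy(&s, key, sizeof s);
    s |= 1;                               // Keep the xorshift state nonzero
    for (size_t i = 0; i < len; i++) {
        s ^= s >> 12; s ^= s << 25; s ^= s >> 27;
        out[i] = (uint8_t)((s * UINT64_C(2685821657736338717)) >> 56);
    }
}

static void fke_refill(fke *g) {
    uint8_t buf[FKE_KEYBYTES + FKE_POOLBYTES];
    placeholder_stream(g->key, buf, sizeof buf);
    memcpy(g->key, buf, FKE_KEYBYTES);    // Erase the old key immediately
    memcpy(g->pool, buf + FKE_KEYBYTES, FKE_POOLBYTES);
    memset(buf, 0, sizeof buf);           // Wipe the temporary buffer too
    g->avail = FKE_POOLBYTES;
}

void fke_random(fke *g, uint8_t *out, size_t len) {
    while (len > 0) {
        if (g->avail == 0)
            fke_refill(g);
        size_t n = (len < g->avail) ? len : g->avail;
        memcpy(out, g->pool + (FKE_POOLBYTES - g->avail), n);
        memset(g->pool + (FKE_POOLBYTES - g->avail), 0, n);  // Erase as we go
        g->avail -= n;
        out += n;
        len -= n;
    }
}
```

With a real cipher swapped in, this is exactly the construction DJB describes: output bytes are erased as they are handed out, and the key never survives past one refill.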
For a CSPRNG, you don't need to follow RFC 3686, of course. If you limit each subsequence to 256 or fewer blocks with the same stored key, you could use the upper 24 bits of the counter from Xorshift64* to make the 128-bit block order random as well. (The only benefit from that is to stop any cryptanalysis involving the plaintext-ciphertext relation in a sequence of blocks, because the counter is no longer monotonic, only unique.)
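The "unique but not monotonic" property is easy to check directly. A minimal sketch of the Xorshift64* step (standard shift constants and multiplier; the function name is mine):

```c
#include <stdint.h>

// One Xorshift64* step.  Starting from any nonzero state, the state
// sequence visits every nonzero 64-bit value exactly once per period,
// so every value is unique -- but the ordering is scrambled, not
// monotonic.  The final odd-constant multiply is itself bijective,
// so the returned values are unique too.
uint64_t xs64_next(uint64_t *state) {
    uint64_t x = *state;
    x ^= x >> 12;
    x ^= x << 25;
    x ^= x >> 27;
    *state = x;
    return x * UINT64_C(2685821657736338717);
}
```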
_ _ _ _ _
There are different levels of "cryptographically secure" and "randomness". There are very valid reasons why Linux has both /dev/random and /dev/urandom, and why the former traditionally blocks when there is no entropy available.
For QNEthernet, my problem is that "entropy" is not the same as "cryptographically secure pseudorandom numbers". For example, the Pedersen2006 reference (corrected link) qualifies the jitter in the WDT (watchdog timer) as an entropy source; you probably shouldn't expect each sample to have more than one or two bits of randomness. The same applies to comparing the cycle counts of any two clocks where at least one is generated from a (separate) RC oscillator, especially an internal one. You use one of the clocks to measure some suitably long interval, and count the number of cycles of the other during it. The count will vary by a few values, but typically you'll only want to use the least significant bit as an entropy source.
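On a hosted system you can approximate this two-clock technique to get a feel for it. The sketch below (all names mine) counts busy-loop iterations per microsecond of `CLOCK_MONOTONIC` and keeps only the least significant bit of each count; on a microcontroller the two clocks would instead be, say, the core cycle counter and an RC-oscillator-driven timer. This is an illustration only, not a vetted entropy source.

```c
#define _POSIX_C_SOURCE 199309L   // For clock_gettime()
#include <stdint.h>
#include <time.h>

// One jitter sample: count loop iterations until the microsecond field
// of CLOCK_MONOTONIC changes, and keep only the LSB of the count.
static unsigned jitter_bit(void) {
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    uint32_t count = 0;
    do {
        count++;
        clock_gettime(CLOCK_MONOTONIC, &t1);
    } while (t1.tv_sec == t0.tv_sec &&
             t1.tv_nsec / 1000 == t0.tv_nsec / 1000);
    return count & 1u;            // Only the least significant bit
}

// Gather eight jitter bits into one (raw, unwhitened) byte.
uint8_t jitter_byte(void) {
    uint8_t b = 0;
    for (int i = 0; i < 8; i++)
        b = (uint8_t)((b << 1) | jitter_bit());
    return b;
}
```

Raw bits like these should still be debiased or fed through a conditioning function before use; the point is only that each sample contributes roughly one bit, not eight or thirty-two.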
The crux of the problem is the current approach of using the entropy pool directly, without any good PRNG implemented alongside it to make clear that the entropy pool is meant to initialize a PRNG's state, not to be used as random values itself.
Solution? Let's say you reuse the excellent SipHash function you already have (authored by Aumasson and DJB). Its only drawback is the "small" 128-bit key. For the 64-bit plaintext, we use a counter.
Plaintext that is known to be a monotonically increasing value with many constant bits may be vulnerable to various attacks. As it happens, Xorshift64* is bijective, which means it can also be used to mix the bits of a 64-bit value, and its sequence (starting from any nonzero 64-bit value) acts like a counter with a full 2⁶⁴-1 period. Cryptographically, each plaintext being unique suffices; statistical randomness is not needed here.
(Note: That explains why I use Xorshift64* so often. The full generator I use as a fast, 64-bit, nonzero, most-bits-change-every-step counter, although the sequence is predictable. The high 32-40 bits I use as a verified statistically random (but not cryptographically secure) PRNG.)
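Bijectivity is also easy to verify, because every step of the Xorshift64* mix can be undone: a right (or left) xorshift is inverted by repeated application, and the odd multiplier has a unique inverse modulo 2⁶⁴ (computed below with Newton's iteration). The function names here are mine, for illustration.

```c
#include <stdint.h>

// The forward Xorshift64* mixing step applied to a single value.
uint64_t xs64s_mix(uint64_t x) {
    x ^= x >> 12;
    x ^= x << 25;
    x ^= x >> 27;
    return x * UINT64_C(2685821657736338717);
}

// Invert y = x ^ (x >> s): each pass recovers s more correct bits
// from the top down.
static uint64_t unshift_right(uint64_t y, unsigned s) {
    uint64_t x = y;
    for (unsigned i = s; i < 64; i += s)
        x = y ^ (x >> s);
    return x;
}

// Invert y = x ^ (x << s): same idea, from the bottom up.
static uint64_t unshift_left(uint64_t y, unsigned s) {
    uint64_t x = y;
    for (unsigned i = s; i < 64; i += s)
        x = y ^ (x << s);
    return x;
}

// Multiplicative inverse of an odd m modulo 2^64 via Newton's iteration:
// the count of correct low bits doubles on each pass (3 -> 6 -> ... -> 96).
static uint64_t mul_inverse(uint64_t m) {
    uint64_t inv = m;
    for (int i = 0; i < 5; i++)
        inv *= 2 - m * inv;
    return inv;
}

// The inverse of xs64s_mix(): undo the steps in reverse order.
uint64_t xs64s_unmix(uint64_t y) {
    y *= mul_inverse(UINT64_C(2685821657736338717));
    y = unshift_right(y, 27);
    y = unshift_left(y, 25);
    y = unshift_right(y, 12);
    return y;
}
```

Because the mix is invertible, no two distinct inputs can collide, which is exactly the uniqueness property needed for the counter.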
This means that one quite valid implementation for random 64-bit words in QNEthernet would be
C:
#include <stddef.h>
#include <stdint.h>

// Assumes a SipHash implementation with this interface is available:
uint64_t siphash(uint16_t crounds, uint16_t drounds,
                 const uint32_t key[4], const void *data, size_t len);

typedef struct {
    uint32_t key[4];    // Key; for example, from the entropy buffer
    uint64_t prng;      // Xorshift64* state, nonzero
    uint16_t crounds;   // SipHash C rounds: typically 2; 4 recommended for security
    uint16_t drounds;   // SipHash D rounds: typically 4; 8 recommended for security
} sprng;

#define SPRNG_INIT { .key = { 0, 0, 0, 0 }, .prng = 1, .crounds = 4, .drounds = 8 }

static inline void sprng_init(sprng *ctx) {
    if (ctx) {
        ctx->key[0] = 0;  // TODO: Initialise from entropy pool
        ctx->key[1] = 0;  // TODO: Initialise from entropy pool
        ctx->key[2] = 0;  // TODO: Initialise from entropy pool
        ctx->key[3] = 0;  // TODO: Initialise from entropy pool
        ctx->prng = 1;    // Nonzero; initialising from the entropy pool would be better
        ctx->crounds = 4;
        ctx->drounds = 8;
    }
}

static inline uint64_t sprng_u64(sprng *ctx) {
    // Update the Xorshift64* state
    uint64_t x = ctx->prng;
    x ^= x >> 12;
    x ^= x << 25;
    x ^= x >> 27;
    ctx->prng = x;
    // Temporary buffer for SipHash, applying the Xorshift64* mixing step
    uint64_t sh_in[1] = { x * UINT64_C(2685821657736338717) };
    return siphash(ctx->crounds, ctx->drounds, ctx->key, sh_in, sizeof sh_in);
}
Even with the initial ctx->key = { 0, 0, 0, 0 } (and ctx->prng = 1) this will produce a "cryptographically secure random sequence"; it will just always be the same one. So: statistically random, but not cryptographically secure. Its cryptographic security depends on whether ctx->key is filled with unknown and unpredictable values.
Just like Linux /dev/urandom, this will always produce a new pseudorandom value. If the key state is unpredictable, the sequence of values will be cryptographically as secure as SipHash itself. The longer the sequence without a key change, the higher the risk of a successful attack by a resourceful antagonist.
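For completeness, here is a compact SipHash-c-d implementation matching the signature assumed in the sketch above (key passed as uint32_t[4], key words taken in little-endian order; caller picks the round counts). It follows the public-domain reference code by Aumasson and Bernstein.

```c
#include <stddef.h>
#include <stdint.h>

#define SIP_ROTL(x, b) (uint64_t)(((x) << (b)) | ((x) >> (64 - (b))))

// One SipRound, exactly as in the SipHash reference implementation.
static void sip_round(uint64_t v[4]) {
    v[0] += v[1]; v[1] = SIP_ROTL(v[1], 13); v[1] ^= v[0]; v[0] = SIP_ROTL(v[0], 32);
    v[2] += v[3]; v[3] = SIP_ROTL(v[3], 16); v[3] ^= v[2];
    v[0] += v[3]; v[3] = SIP_ROTL(v[3], 21); v[3] ^= v[0];
    v[2] += v[1]; v[1] = SIP_ROTL(v[1], 17); v[1] ^= v[2]; v[2] = SIP_ROTL(v[2], 32);
}

static uint64_t sip_load64(const uint8_t *p) {
    uint64_t r = 0;
    for (int i = 0; i < 8; i++)
        r |= (uint64_t)p[i] << (8 * i);   // Little-endian load
    return r;
}

uint64_t siphash(uint16_t crounds, uint16_t drounds,
                 const uint32_t key[4], const void *data, size_t len) {
    const uint64_t k0 = (uint64_t)key[0] | ((uint64_t)key[1] << 32);
    const uint64_t k1 = (uint64_t)key[2] | ((uint64_t)key[3] << 32);
    uint64_t v[4] = {
        UINT64_C(0x736f6d6570736575) ^ k0,
        UINT64_C(0x646f72616e646f6d) ^ k1,
        UINT64_C(0x6c7967656e657261) ^ k0,
        UINT64_C(0x7465646279746573) ^ k1,
    };
    const uint8_t *p = (const uint8_t *)data;
    const size_t full = len & ~(size_t)7;
    for (size_t i = 0; i < full; i += 8) {      // Whole 8-byte words
        const uint64_t m = sip_load64(p + i);
        v[3] ^= m;
        for (unsigned r = 0; r < crounds; r++) sip_round(v);
        v[0] ^= m;
    }
    uint64_t b = (uint64_t)(len & 0xff) << 56;  // Length byte in the tail word
    for (size_t i = full; i < len; i++)
        b |= (uint64_t)p[i] << (8 * (i - full));
    v[3] ^= b;
    for (unsigned r = 0; r < crounds; r++) sip_round(v);
    v[0] ^= b;
    v[2] ^= 0xff;                               // Finalization
    for (unsigned r = 0; r < drounds; r++) sip_round(v);
    return v[0] ^ v[1] ^ v[2] ^ v[3];
}
```

With this definition, the sprng sketch above compiles and runs as-is.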
Optimally, whenever TRNG data or other entropy is available, it should be mixed into the key. The simplest valid mixing procedure is 128-bit unsigned integer addition:
C:
void sprng_add_entropy(sprng *ctx, const uint32_t maybe_random) {
    const uint32_t old = ctx->key[3];   // key[3] is the least significant word
    const uint32_t sum = old + maybe_random;
    ctx->key[3] = sum;
    if (sum < old)                      // Carried out of the low word
        if (!(++ctx->key[2]))           // Propagate the carry upwards
            if (!(++ctx->key[1]))
                ++ctx->key[0];
}
Note that such mixing could be done to statically allocated CSPRNGs in one of the regularly occurring interrupts, triggering HW-TRNG generation at the end, with no active waiting involved.
Using a cryptographically secure hash function (other than SipHash) to generate the new 128-bit key from the above (i.e., add the maybe-random data to the current 128-bit key, compute a 128-bit hash of it, and store the result as the new key) would be much, much better, though: it would essentially "destroy" the old key, with excellent forward and backward security, as long as the maybe_random values applied are not observable by attackers. You don't want to use the same key for too much output, remember! The downside is the code size increase for another hash function. TLS/SSL libraries will already contain useful ones.
The Teensy 4.x data coprocessor (section 7.4, page 179) has many interesting and useful features that would help with all this (including chaining two SPI buses to do AES-128-CTR in hardware), but I don't think NXP is willing to provide the security programming details even under NDA to open source developers.