Teensy 3.6 Random Number Generator

Status
Not open for further replies.

randomvibe

Is there a Teensy 3.6 library or function that accesses the hardware random number generator in the K66 chip?
 

Thanks for that. It compiles and runs on my Teensy 3.6.

I was expecting the random number "r" to range from 0 to 2^32-1, but the values are consistently large, on the order of 1e9. Is "r" the 32-bit random number? Also, what is the significance of the "REPS" variable? Why is REPS=50?

Here is the @manitou code I ran for reference:

Code:
// random RNG
#define PRREG(x) Serial.print(#x" 0x"); Serial.println(x,HEX)

#define REPS 50

#define RNG_CR_GO_MASK                           0x1u
#define RNG_CR_HA_MASK                           0x2u
#define RNG_CR_INTM_MASK                         0x4u
#define RNG_CR_CLRI_MASK                         0x8u
#define RNG_CR_SLP_MASK                          0x10u
#define RNG_SR_OREG_LVL_MASK                     0xFF00u
#define RNG_SR_OREG_LVL_SHIFT                    8
#define RNG_SR_OREG_LVL(x)                       (((uint32_t)(((uint32_t)(x))<<RNG_SR_OREG_LVL_SHIFT))&RNG_SR_OREG_LVL_MASK)
#define SIM_SCGC6_RNGA    ((uint32_t)0x00000200)

//#define Serial Serial1
uint32_t trng(){
    RNG_CR |= RNG_CR_GO_MASK;
    while((RNG_SR & RNG_SR_OREG_LVL(0xF)) == 0); // wait
    return RNG_OR;
}

void setup() {
  Serial.begin(9600);
  Serial.println("hello");  delay(2000);
    SIM_SCGC6 |= SIM_SCGC6_RNGA; // enable RNG
    PRREG(SIM_SCGC6);
    RNG_CR &= ~RNG_CR_SLP_MASK;
    RNG_CR |= RNG_CR_HA_MASK;  // high assurance, not needed
    PRREG(RNG_CR);
    PRREG(RNG_SR);
}

void loop() {
  uint32_t t, r;
  int i;

  t=micros();
  for(i=0;i<REPS;i++) r=trng();
  t=micros()-t;
  float bps = REPS*32.e6/t;

  Serial.println(" ");
  Serial.println(t);
  Serial.println(bps,2);
  Serial.println(r,HEX);
  Serial.println(r);

  delay(2000);

}
 
It looks like REPS is the loop count for the repeated calls to trng() here: for(i=0;i<REPS;i++) r=trng(); Thus, only every 50th generated random number is printed in this sketch. The "decoration" around the loop is just there to measure the speed of the generation.

A simpler loop() would just retrieve a random number every n milliseconds and print it:

Code:
// random RNG
#define PRREG(x) Serial.print(#x" 0x"); Serial.println(x,HEX)

#define REPS 50 //not needed in this variant
#define MYDLY 500 //loop delay in milliseconds

#define RNG_CR_GO_MASK                           0x1u
#define RNG_CR_HA_MASK                           0x2u
#define RNG_CR_INTM_MASK                         0x4u
#define RNG_CR_CLRI_MASK                         0x8u
#define RNG_CR_SLP_MASK                          0x10u
#define RNG_SR_OREG_LVL_MASK                     0xFF00u
#define RNG_SR_OREG_LVL_SHIFT                    8
#define RNG_SR_OREG_LVL(x)                       (((uint32_t)(((uint32_t)(x))<<RNG_SR_OREG_LVL_SHIFT))&RNG_SR_OREG_LVL_MASK)
#define SIM_SCGC6_RNGA    ((uint32_t)0x00000200)

//#define Serial Serial1
uint32_t trng(){
    RNG_CR |= RNG_CR_GO_MASK;
    while((RNG_SR & RNG_SR_OREG_LVL(0xF)) == 0); // wait
    return RNG_OR;
}

void setup() {
  Serial.begin(9600);
  Serial.println("hello");  delay(2000);
    SIM_SCGC6 |= SIM_SCGC6_RNGA; // enable RNG
    PRREG(SIM_SCGC6);
    RNG_CR &= ~RNG_CR_SLP_MASK;
    RNG_CR |= RNG_CR_HA_MASK;  // high assurance, not needed
    PRREG(RNG_CR);
    PRREG(RNG_SR);
}

void loop() {
  uint32_t r;

  r=trng();
  
  Serial.println(" ");
  Serial.println(r,HEX);
  Serial.println(r);

  delay(MYDLY);

}
 
The sketch is a proof of concept and speed test. You can trim it down to suit your application's needs. REPS is just the number of times the for-loop runs to collect timing data. You need to initialize the RNG in setup() with

SIM_SCGC6 |= SIM_SCGC6_RNGA; // enable RNG
RNG_CR &= ~RNG_CR_SLP_MASK;
RNG_CR |= RNG_CR_HA_MASK; // high assurance, not needed

then retrieve an unsigned 32-bit random number with trng()

The hardware produces 32-bit random numbers in the range 0 to 2^32-1, as you note. The probability that you will see a 0 is about 1 in 4.3 billion (1/2^32). NXP has verified the randomness, though Ch 38 in the K66 reference manual suggests that for crypto strength you use only 1 bit out of each 32-bit word generated. Using all bits and collecting several million bytes of data, I have successfully run various randomness tests (ent, diehard, NIST sts-2.1.1).
 
then retrieve an unsigned 32-bit random number with trng()

Like I said, trng() seems to return values greater than 1e9 only. I'll run it for several minutes to satisfy myself.

In your timing loop, why did you multiply REPS by 32 million? Did you mean 2^32?

The statement... "for crypto strength you use only 1-bit out of each 32-bit word generated" is confusing. Does this mean a coin toss only?

Thanks for putting together the function.
 
REPS * 32, because the hardware RNG generates 32 random bits on each call; multiply that by 1 million and divide by the elapsed microseconds, and you get bits/second. The manual says the K66 generates 32 random bits every 256 F_BUS cycles, i.e. 7.5 megabits/sec with F_BUS at 60 MHz. Other anecdotal RNG performance numbers are at https://github.com/manitou48/DUEZoo/blob/master/RNGperf.txt
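
The arithmetic above can be written out as a small sketch. The function names measured_bps and expected_bps are made up here for illustration, and the code is plain C++ so it can be checked off the board.

```cpp
#include <cstdint>

// Measured rate: `reps` calls returning 32 bits each, taking `elapsed_us`
// microseconds in total. Same formula as REPS*32.e6/t in the sketch.
double measured_bps(uint32_t reps, uint32_t elapsed_us) {
    return reps * 32.0e6 / elapsed_us;
}

// Rate implied by "32 random bits every 256 F_BUS cycles".
double expected_bps(double f_bus_hz) {
    return f_bus_hz / 256.0 * 32.0;
}
```

With F_BUS at 60 MHz, expected_bps(60e6) gives 7.5e6, which is the 7.5 megabits/sec figure from the manual.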

Crypto strength should be a worry only if you are generating a random encryption key for your bitcoins or missile launch codes. For crypto strength, NXP suggests that the entropy from the 32 bits is really only worth about 1 bit, so if you needed an 8-bit random number, you would call the hardware RNG 8 times and take only 1 bit from each call.
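
A minimal sketch of the one-bit-per-call idea follows. The name conservative_byte is hypothetical, and taking the low-order bit is an arbitrary choice made here; the suggestion above only says to keep 1 bit per call. The 32-bit source is passed in as a callable so the code can be tried off the board; on the Teensy you would pass trng.

```cpp
#include <cstdint>

// Build one byte from eight calls to a 32-bit source, keeping a single
// bit (here: the low-order bit, an assumption) from each call.
template <typename Source>
uint8_t conservative_byte(Source&& next32) {
    uint8_t out = 0;
    for (int i = 0; i < 8; i++) {
        out = (uint8_t)((out << 1) | (next32() & 1u));
    }
    return out;
}
```

This costs eight hardware calls per byte instead of a quarter of one, so it only makes sense where key-grade randomness is really needed.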

trng() seems to return values greater than 1e9 only
To visually observe 32-bit random numbers and "determine their randomness" is a fool's errand. About 77% of all 32-bit values are larger than 1e9, so seeing mostly large numbers is expected. You need to run statistical tests over millions of bytes of the random bits (see post #6). Maybe just observe the low-order bit (odd vs even). Here is a sketch that does a weak real-time statistical test of the random bytes.

Code:
// random RNG

#define RNG_CR_GO_MASK                           0x1u
#define RNG_CR_HA_MASK                           0x2u
#define RNG_CR_INTM_MASK                         0x4u
#define RNG_CR_CLRI_MASK                         0x8u
#define RNG_CR_SLP_MASK                          0x10u
#define RNG_SR_OREG_LVL_MASK                     0xFF00u
#define RNG_SR_OREG_LVL_SHIFT                    8
#define RNG_SR_OREG_LVL(x)                       (((uint32_t)(((uint32_t)(x))<<RNG_SR_OREG_LVL_SHIFT))&RNG_SR_OREG_LVL_MASK)
#define SIM_SCGC6_RNGA    ((uint32_t)0x00000200)

uint32_t trng() {
  while ((RNG_SR & RNG_SR_OREG_LVL(0xF)) == 0); // wait
  return RNG_OR;
}

void setup() {
  Serial.begin(9600);
  SIM_SCGC6 |= SIM_SCGC6_RNGA; // enable RNG  0x40029000
  RNG_CR &= ~RNG_CR_SLP_MASK;
  RNG_CR |= RNG_CR_HA_MASK;  // high assurance, not needed
  RNG_CR |= RNG_CR_GO_MASK;       // ? only need to do this once?
}

//  quick entropy check: expect ~8 bits of entropy per byte,
//  each bit set ~0.5 of the time, and an average byte value of ~127.5
uint32_t bits[8], cnts[256], bytes;

void dobyte(uint8_t byte) {
  bytes++;
  cnts[byte]++;
  for (int j = 0; j < 8; j++) if ( byte &  1 << j) bits[j]++;
}

void loop() {
  static uint32_t ms = millis();
  uint32_t r = trng();
  uint8_t *b = (uint8_t *)  &r;

  for (int i = 0; i < 4; i++) dobyte(b[i]);
  if (millis() - ms > 5000) {
    float avrg = 0, p, e = 0;
    ms = millis();
    for (int i = 0; i < 256; i++) {
      avrg += 1.0 * i * cnts[i];
      if (cnts[i]) {
        p = (float) cnts[i] / bytes;
        e -= p * log(p) / log(2.0);
      }
    }
    Serial.printf("%d bytes avrg %f entropy %f\n", bytes, avrg / bytes, e);
    for (int j = 0; j < 8; j++) {
      Serial.print((float) bits[j] / bytes, 6);
      Serial.print(" ");
    }
    Serial.println();
  }
}
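
The entropy figure printed by the sketch is the Shannon entropy of the byte histogram, H = -sum(p_i * log2(p_i)), which tops out at 8 bits per byte when all 256 values are equally frequent. Below is the same calculation pulled out into a standalone function (the name byte_entropy is made up here), as a sketch that can be checked on a PC.

```cpp
#include <cmath>
#include <cstdint>

// Shannon entropy, in bits per byte, of a 256-bin histogram.
// counts[i] = number of times byte value i was seen; total = sum of counts.
double byte_entropy(const uint32_t counts[256], uint32_t total) {
    if (total == 0) return 0.0;
    double e = 0.0;
    for (int i = 0; i < 256; i++) {
        if (counts[i] == 0) continue;          // 0 * log(0) is taken as 0
        double p = (double)counts[i] / total;  // observed frequency of value i
        e -= p * std::log2(p);
    }
    return e;
}
```

A perfectly flat histogram gives 8, and a stuck generator that always returns the same byte gives 0.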
 
I wonder what the randomness tests would show with F_BUS changed in kinetis.h:
Code:
#if (F_CPU == 240000000)
 #define F_PLL 240000000
 #ifndef F_BUS
 // #define F_BUS 60000000
 //#define F_BUS 80000000   // uncomment these to try peripheral overclocking
 #define F_BUS 120000000  // all the usual overclocking caveats apply...

There was another recent thread about a test of the RNG output.
 
For crypto strength, NXP suggests that the entropy from the 32 bits is really only worth about 1 bit, so if you needed an 8-bit random number, you would call the hardware RNG 8 times and take only 1 bit from each call.

Yes, that makes sense.

I looped the K66 random number generator function. A plot of just a small subset appears very random and correctly ranges from ~0 to ~2^32-1 (of course those limits are never reached in just a million samples).

The histogram is virtually flat (a uniform distribution), in contrast to the Gaussian bell curve that often occurs naturally. The K66 random number generator is fantastic. See plots:

Plot of random numbers: (subset)
serialout.jpg

Histogram of ~0.5 million samples:
histogram.jpg
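
For anyone wanting to reproduce a histogram like this, one way to bin the 32-bit values is to use the top bits as the bin index. This is only a sketch of one possible approach, not the code used for the plots above; the function name and the choice of 16 bins are assumptions.

```cpp
#include <cstddef>
#include <cstdint>

constexpr int kLog2Bins = 4;                  // 2^4 = 16 equal-width bins (arbitrary)
constexpr size_t kBins = 1u << kLog2Bins;

// Add one 32-bit sample to the histogram; the top kLog2Bins bits pick the bin.
void add_sample(uint32_t hist[kBins], uint32_t r) {
    hist[r >> (32 - kLog2Bins)]++;
}
```

For a uniform source, each bin should collect roughly the same count once enough samples are in.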
 
Excellent. Just for the record, here is a plot of p-values for the NIST sts-2.1.1 and DIEHARD tests, using 1 MB and 12 MB of K66 random data.

k66.png
The tests return a p-value, which should be uniform on [0,1) if the input file contains truly independent random bits. Those p-values are obtained by p=1-F(X), where F is the assumed distribution of the sample random variable X---often normal. But that assumed F is often just an asymptotic approximation, for which the fit will be worst in the tails. Thus you should not be surprised with occasional p-values near 0 or 1, such as .0012 or .9983. When a bit stream really FAILS BIG, you will get p`s of 0 or 1 to six or more places. By all means, do not, as a Statistician might, think that a p < .025 or p> .975 means that the RNG has "failed the test at the .05 level". Such p`s happen among the hundreds that DIEHARD (239) or NIST (188) sts produces, even with good RNGs. So keep in mind that "p happens". (from DIEHARD description)
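
To make the "p happens" point concrete, here is a rough sketch that sorts a batch of p-values into the two cases the description distinguishes: merely in the tails (p < .025 or p > .975) versus failing big (0 or 1 to six places). The struct and function names are hypothetical, and the 1e-6 cutoff is just one reading of "six or more places".

```cpp
#include <vector>

struct PSummary {
    int tail = 0;      // p < .025 or p > .975: expected now and then
    int fail_big = 0;  // p within 1e-6 of 0 or 1: a real failure
};

// Classify each p-value; fail_big is checked first so the buckets don't overlap.
PSummary summarize(const std::vector<double>& pvals) {
    PSummary s;
    for (double p : pvals) {
        if (p < 1e-6 || p > 1.0 - 1e-6) s.fail_big++;
        else if (p < 0.025 || p > 0.975) s.tail++;
    }
    return s;
}
```

With a good RNG about 5% of p-values land in the tail bucket by chance alone, so roughly 12 of DIEHARD's 239 would be unremarkable.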

Another visual test is to treat the binary data as a raster of RGB values, here is a 256x256 scatter plot

tmpk.png

It took 63 seconds to generate and upload (Serial.write) the 12 MB of random data from the T3.6. By comparison, it took 61 seconds to write 12 MB of random data to the BUILTIN_SDCARD on T3.6.

T3.5/3.6 TRNG sketch

RNG performance data for various MCUs
 
Using manitou's post #8 code**, and to answer my own question from post #9: it looks like a 120 MHz F_BUS returns random numbers 2x faster, and the output measures about the same on the included entropy indicator.

** Added a quick 1,000,000 calls to trng() in setup()

F_CPU == 240000000 F_BUS == 60000000 1000000 trng() calls millis == 4266
4687812 bytes avrg 127.484650 entropy 7.999966
0.499969 0.499705 0.499510 0.499709 0.500060 0.500247 0.499891 0.499904
9372492 bytes avrg 127.496246 entropy 7.999980
0.499627 0.499808 0.499694 0.499999 0.499889 0.499783 0.500111 0.499999
14057184 bytes avrg 127.503380 entropy 7.999986
0.499836 0.499766 0.499733 0.500013 0.500028 0.499845 0.500095 0.500026
18741880 bytes avrg 127.505615 entropy 7.999991
0.499778 0.499804 0.499788 0.500079 0.499975 0.499886 0.500127 0.500019
23426580 bytes avrg 127.506622 entropy 7.999991
0.499829 0.499845 0.499809 0.500012 0.499976 0.499955 0.500212 0.499969
28111280 bytes avrg 127.507240 entropy 7.999993
0.499851 0.499872 0.499903 0.499988 0.499964 0.499885 0.500207 0.499993
32795980 bytes avrg 127.510254 entropy 7.999993
0.499796 0.499900 0.499938 0.500000 0.499978 0.499870 0.500199 0.500021
37480680 bytes avrg 127.507889 entropy 7.999997
0.499848 0.499843 0.499983 0.499969 0.500019 0.499866 0.500176 0.500011

F_CPU == 240000000 F_BUS == 120000000 1000000 trng() calls millis == 2133
7718948 bytes avrg 127.520798 entropy 7.999976
0.500197 0.500092 0.500216 0.500200 0.500287 0.500074 0.500177 0.499997
15432424 bytes avrg 127.503799 entropy 7.999989
0.500111 0.500096 0.500214 0.500055 0.500128 0.499946 0.500098 0.499966
23145864 bytes avrg 127.510590 entropy 7.999995
0.500093 0.500048 0.500210 0.499964 0.500083 0.500008 0.500114 0.500007
30859064 bytes avrg 127.504814 entropy 7.999996
0.500088 0.500044 0.500124 0.499950 0.500018 0.500070 0.500111 0.499961
38572636 bytes avrg 127.501213 entropy 7.999997
0.500035 0.500060 0.500071 0.499984 0.499987 0.500033 0.500058 0.499971
46285988 bytes avrg 127.502007 entropy 7.999996
0.500046 0.500106 0.500084 0.500021 0.500002 0.500044 0.500066 0.499966
53999544 bytes avrg 127.499855 entropy 7.999997
0.500019 0.500095 0.500083 0.499987 0.500006 0.500037 0.500068 0.499951
61712860 bytes avrg 127.505417 entropy 7.999997
0.500027 0.500111 0.500095 0.499998 0.500039 0.500036 0.500089 0.499979
 
HW-RNG speed

For fun, I compared the speed with some good software random number generators, and the HW is very slow. ISAAC is a cryptographically secure generator and uses about 2 KB of RAM; the code can be found here: http://www.burtleburtle.net/bob/rand/isaacafa.html
This shows the HW is only good for seeding if you are doing some heavy math. ISAAC is about 12x faster.


HW-RNG: 4266 ms
SIMPLE-RNG: 505 ms
ISAAC-RNG: 339 ms
(1,000,000 passes each, with 32-bit results)


Simple-RNG is quite good for statistical purposes and passes all diehard tests.
Code:
typedef struct ranctx { uint32_t a; uint32_t b; uint32_t c; uint32_t d; } ranctx;

#define rot(x,k) (((x)<<(k))|((x)>>(32-(k))))

uint32_t ranval( ranctx *x ) {
    uint32_t e = x->a - rot(x->b, 27);
    x->a = x->b ^ rot(x->c, 17);
    x->b = x->c + x->d;
    x->c = x->d + e;
    x->d = e + x->a;
    return x->d;
}

void raninit( ranctx *x, uint32_t seed ) {
    uint32_t i;
    x->a = 0xf1ea5eed, x->b = x->c = x->d = seed;
    for (i=0; i<20; ++i) {
        (void)ranval(x);
    }
}
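
A sketch of the "hardware for seeding, software for the bulk" idea: draw one 32-bit seed from the hardware generator, then produce everything else with the fast software generator. The generator is repeated from the code above (with the rotate macro turned into an inline function) so the block compiles on its own. seed_from is a made-up helper, and the seed source is passed in as a callable so that trng can be supplied on the Teensy.

```cpp
#include <cstdint>

struct ranctx { uint32_t a, b, c, d; };

// Rotate left by k bits (k is 17 or 27 here, never 0 or 32).
static inline uint32_t rot(uint32_t x, int k) { return (x << k) | (x >> (32 - k)); }

uint32_t ranval(ranctx *x) {
    uint32_t e = x->a - rot(x->b, 27);
    x->a = x->b ^ rot(x->c, 17);
    x->b = x->c + x->d;
    x->c = x->d + e;
    x->d = e + x->a;
    return x->d;
}

void raninit(ranctx *x, uint32_t seed) {
    x->a = 0xf1ea5eed;
    x->b = x->c = x->d = seed;
    for (int i = 0; i < 20; ++i) (void)ranval(x);  // discard the first outputs
}

// Hypothetical helper: one call to the (slow) seed source, after which
// the context is ready for fast ranval() calls.
template <typename Source>
void seed_from(ranctx *x, Source&& next32) {
    raninit(x, next32());
}
```

On the Teensy this would be seed_from(&ctx, trng); in setup(), after the RNG has been enabled.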
 