Perlin Noise implementation?

Status
Not open for further replies.

StefanPetrick

Well-known member
Hi guys, I´d really love to have a good quality and fast Perlin Noise implementation.
Maybe it would be of general interest for usecases beside led effects as well?!

The FastLEDs noise functions are okayish, but produce spikes here and there which interrupt the flow of the waveform.

I was adviced to have a look at this code here:

Code:
// JAVA REFERENCE IMPLEMENTATION OF IMPROVED NOISE - COPYRIGHT 2002 KEN PERLIN.

public final class ImprovedNoise {
   static public double noise(double x, double y, double z) {
      int X = (int)Math.floor(x) & 255,                  // FIND UNIT CUBE THAT
          Y = (int)Math.floor(y) & 255,                  // CONTAINS POINT.
          Z = (int)Math.floor(z) & 255;
      x -= Math.floor(x);                                // FIND RELATIVE X,Y,Z
      y -= Math.floor(y);                                // OF POINT IN CUBE.
      z -= Math.floor(z);
      double u = fade(x),                                // COMPUTE FADE CURVES
             v = fade(y),                                // FOR EACH OF X,Y,Z.
             w = fade(z);
      int A = p[X  ]+Y, AA = p[A]+Z, AB = p[A+1]+Z,      // HASH COORDINATES OF
          B = p[X+1]+Y, BA = p[B]+Z, BB = p[B+1]+Z;      // THE 8 CUBE CORNERS,

      return lerp(w, lerp(v, lerp(u, grad(p[AA  ], x  , y  , z   ),  // AND ADD
                                     grad(p[BA  ], x-1, y  , z   )), // BLENDED
                             lerp(u, grad(p[AB  ], x  , y-1, z   ),  // RESULTS
                                     grad(p[BB  ], x-1, y-1, z   ))),// FROM  8
                     lerp(v, lerp(u, grad(p[AA+1], x  , y  , z-1 ),  // CORNERS
                                     grad(p[BA+1], x-1, y  , z-1 )), // OF CUBE
                             lerp(u, grad(p[AB+1], x  , y-1, z-1 ),
                                     grad(p[BB+1], x-1, y-1, z-1 ))));
   }
   static double fade(double t) { return t * t * t * (t * (t * 6 - 15) + 10); }
   static double lerp(double t, double a, double b) { return a + t * (b - a); }
   static double grad(int hash, double x, double y, double z) {
      int h = hash & 15;                      // CONVERT LO 4 BITS OF HASH CODE
      double u = h<8 ? x : y,                 // INTO 12 GRADIENT DIRECTIONS.
             v = h<4 ? y : h==12||h==14 ? x : z;
      return ((h&1) == 0 ? u : -u) + ((h&2) == 0 ? v : -v);
   }
   static final int p[] = new int[512], permutation[] = { 151,160,137,91,90,15,
   131,13,201,95,96,53,194,233,7,225,140,36,103,30,69,142,8,99,37,240,21,10,23,
   190, 6,148,247,120,234,75,0,26,197,62,94,252,219,203,117,35,11,32,57,177,33,
   88,237,149,56,87,174,20,125,136,171,168, 68,175,74,165,71,134,139,48,27,166,
   77,146,158,231,83,111,229,122,60,211,133,230,220,105,92,41,55,46,245,40,244,
   102,143,54, 65,25,63,161, 1,216,80,73,209,76,132,187,208, 89,18,169,200,196,
   135,130,116,188,159,86,164,100,109,198,173,186, 3,64,52,217,226,250,124,123,
   5,202,38,147,118,126,255,82,85,212,207,206,59,227,47,16,58,17,182,189,28,42,
   223,183,170,213,119,248,152, 2,44,154,163, 70,221,153,101,155,167, 43,172,9,
   129,22,39,253, 19,98,108,110,79,113,224,232,178,185, 112,104,218,246,97,228,
   251,34,242,193,238,210,144,12,191,179,162,241, 81,51,145,235,249,14,239,107,
   49,192,214, 31,181,199,106,157,184, 84,204,176,115,121,50,45,127, 4,150,254,
   138,236,205,93,222,114,67,29,24,72,243,141,128,195,78,66,215,61,156,180
   };
   static { for (int i=0; i < 256 ; i++) p[256+i] = p[i] = permutation[i]; }
}

I have to admit that my coding skills and general knowledge of C is very limited yet.

So question from a novice: Do you have any advice how to get this algorithm running on a Teensy 3.6? Maybe even using the FPU?

Hints & help is highly appreciated.
 
Last edited:
The Perlin noise version I use is below, I think it's doing the same process as the PC side code but with some shortcuts with things like the array of bytes trading entropy for processing speed.

You feed it X/Y/Z values, so for the 2D matrix project this was cribed from X and Y where the pixels colour (green here) was called with a slowing shifting rise value to flow the colours:
byte g = 128*((pnoise(xfloat-sin(xfloat),yfloat-cos(yfloat),rise+10))+1);

Not clear on why x-sint and y-cosy is in there but it works, the times 128 and plus one is to shift results from +/-1 to 0-256 for useful colour results.

Code:
/*
 Ken Perlins improved noise   -  http://mrl.nyu.edu/~perlin/noise/
 C-port:  http://www.fundza.com/c4serious/noise/perlin/perlin.html
 by Malcolm Kesson;   arduino port by Peter Chiochetti, Sep 2007 :
 -  make permutation constant byte, obsoletes init(), lookup % 256
*/

static const byte p[] = {   151,160,137,91,90, 15,131, 13,201,95,96,
53,194,233, 7,225,140,36,103,30,69,142, 8,99,37,240,21,10,23,190, 6,
148,247,120,234,75, 0,26,197,62,94,252,219,203,117, 35,11,32,57,177,
33,88,237,149,56,87,174,20,125,136,171,168,68,175,74,165,71,134,139,
48,27,166, 77,146,158,231,83,111,229,122, 60,211,133,230,220,105,92,
41,55,46,245,40,244,102,143,54,65,25,63,161, 1,216,80,73,209,76,132,
187,208, 89, 18,169,200,196,135,130,116,188,159, 86,164,100,109,198,
173,186, 3,64,52,217,226,250,124,123,5,202,38,147,118,126,255,82,85,
212,207,206, 59,227, 47,16,58,17,182,189, 28,42,223,183,170,213,119,
248,152,2,44,154,163,70,221,153,101,155,167,43,172, 9,129,22,39,253,
19,98,108,110,79,113,224,232,178,185,112,104,218,246, 97,228,251,34,
242,193,238,210,144,12,191,179,162,241,81,51,145,235,249,14,239,107,
49,192,214, 31,181,199,106,157,184, 84,204,176,115,121,50,45,127, 4,
150,254,138,236,205, 93,222,114, 67,29,24, 72,243,141,128,195,78,66,
215,61,156,180
};

double fade(double t){ return t * t * t * (t * (t * 6 - 15) + 10); }
double lerp(double t, double a, double b){ return a + t * (b - a); }
double grad(int hash, double x, double y, double z)
{
int     h = hash & 15;          /* CONVERT LO 4 BITS OF HASH CODE */
double  u = h < 8 ? x : y,      /* INTO 12 GRADIENT DIRECTIONS.   */
          v = h < 4 ? y : h==12||h==14 ? x : z;
return ((h&1) == 0 ? u : -u) + ((h&2) == 0 ? v : -v);
}

#define P(x) p[(x) & 255]

double pnoise(double x, double y, double z)
{
int   X = (int)floor(x) & 255,             /* FIND UNIT CUBE THAT */
      Y = (int)floor(y) & 255,             /* CONTAINS POINT.     */
      Z = (int)floor(z) & 255;
x -= floor(x);                             /* FIND RELATIVE X,Y,Z */
y -= floor(y);                             /* OF POINT IN CUBE.   */
z -= floor(z);
double  u = fade(x),                       /* COMPUTE FADE CURVES */
        v = fade(y),                       /* FOR EACH OF X,Y,Z.  */
        w = fade(z);
int  A = P(X)+Y, 
     AA = P(A)+Z, 
     AB = P(A+1)+Z,                        /* HASH COORDINATES OF */
     B = P(X+1)+Y, 
     BA = P(B)+Z, 
     BB = P(B+1)+Z;                        /* THE 8 CUBE CORNERS, */

return lerp(w,lerp(v,lerp(u, grad(P(AA  ), x, y, z),   /* AND ADD */
                          grad(P(BA  ), x-1, y, z)),   /* BLENDED */
              lerp(u, grad(P(AB  ), x, y-1, z),        /* RESULTS */
                   grad(P(BB  ), x-1, y-1, z))),       /* FROM  8 */
            lerp(v, lerp(u, grad(P(AA+1), x, y, z-1),  /* CORNERS */
                 grad(P(BA+1), x-1, y, z-1)),          /* OF CUBE */
              lerp(u, grad(P(AB+1), x, y-1, z-1),
                   grad(P(BB+1), x-1, y-1, z-1))));
}

Edit: credit to the original authors listed at the top. Getting it running faster is a case of using the T3.6 trig commands, and I think moving from double to ordinary floats that fit inside the FPU, which I think is how this code runs on an 8 bit micro anyway.
https://www.arduino.cc/en/Reference/Double
 
Last edited:
Here is some code that compiles. It compares different noise implementations.

Unfortunately none of the versions produces a perfect smooth output. The results look like this (inoise16_mod)

graph3.jpg

or like that (FastLED inoise16)

graph4.jpg

or so (renderNoise)

graph5.jpg

I simply don´t understand the math behind which makes it impossible to hunt the bug.
 
Yes, the scaling of renderNoise is wrong but the bigger problem is that it performs similar erratic jumps like inoise16_mod.

I got your code running. The waveform looks really great so far (just observed for some minutes). Here a picture of pnoise:

graph6.jpg

Problem: The calculation is slow (edit: on a 3.2). Is that because of operating with 64 bit data? Basically I would be fine with results in the uint16 range. Until today I avoided the usage of any non integer data types. What would be the best for the Teensy 3.6?

Any easy way to change some datatypes in order to maybe get it performing faster?

edit: Noticed your edit. Replaced all doubles with floats. Working fine. Thanks again!
 
Last edited:
I did some short tests. Simple rendering of one 16x16 layer and writing to 256 APA102s.

Teensy 3.2@120 MHz (Fastest + LTO): 85 fps
Teensy 3.6@240 MHz (Fastest + LTO): 896 fps

I´m stunned and impressed! That board is a serious monster. :D
 
Last edited:
Along with using floats that fit, have you swapped out the trig for the hardware versions (sinf, cosf, logf)? Certainly an interesting metric there with a 2X clock rate getting more than 10 times the useful processing.

Never actually used that code at more than 32 pixels so hadn't thought to look for ways to accelerate it. May need to revisit some projects.

Some notes on what the Teensy 3.6 FPU does and doesn't do
https://forum.pjrc.com/threads/3854...and-ADC-speeds?p=120003&viewfull=1#post120003
 
Also make sure you change floor() to floorf().

Code:
int   X = (int)floor(x) & 255,             /* FIND UNIT CUBE THAT */
      Y = (int)floor(y) & 255,             /* CONTAINS POINT.     */
      Z = (int)floor(z) & 255;
x -= floor(x);                             /* FIND RELATIVE X,Y,Z */
y -= floor(y);                             /* OF POINT IN CUBE.   */
z -= floor(z);

Use of floor as in this code causes the floats to be converted to 64 bit double, then the double are turned into integers. Slow library routines are called.

Using floorf keeps it all within the fast FPU.
 
I don´t use the trig yet, GremlinWrangler. Too predictable to control the coordinates this way. I used some values from the noise data itself to add / subtract to the coordinates. Creates more vivid and organic appearing motion paths.

Here an example how it looks: https://forum.pjrc.com/threads/45027-Video-FastLED-simplex-noise-modulated-by-itself
But the noise implementation I used in the first 2 videos produces some discontinuities. Maybe you notice the jumps in the flow. Tried to hide it with datasmoothing.

In case I need trig I´ll come back to the hardware versions. Had always look up tables for that. Stacked sine waves with different frequencies (based on prime numbers) and modulated amplitudes are also nice for motion control but they contain "rythms" as well. Afer watching it a while the observer gets a feeling what will happen next, even if the sines meet again precicely only after ages. When noise parameters are controlled by the noise data itself the animation stays interesting and surprising. Never managed to get a similar feeling with trig yet.

Thanks for the FPU notes, sure ideas will come how to exploid it.

edit: Here a picture of the resulting waveforms from the selfmodulated noise. Beautiful.
graph7.jpg
 
Last edited:
Thanks for the advice, Paul! This improved the framrate from 450 to 960 for the animation I have right now. The numbers are not comparable with #7.
 
Last edited:
Status
Not open for further replies.
Back
Top