Execution Speed of the Teensy 3.1


Jake

I have been writing a program and wondering whether I will need to speed it up with fixed-point arithmetic or other tricks. To find out, I wrote a small program that times some specific operations. The results show that integer multiplies are about as fast as integer adds and subtracts. Floating-point operations take longer, as expected. Float and double variables also take the same time, which I expected because I understand both are 32-bit numbers on the Teensy.

My program does some trig, so I implemented a lookup table for the sine function and used it for cosine and tangent as well. Surprisingly, it is slower than the built-in sine function by a factor of about 4. A rational approximation for the arctangent, on the other hand, was significantly faster, though not as accurate.

I then hand-optimized the source file and was surprised that using brackets to eliminate divisions by constants improved the speed. I thought compiler optimization would take care of easy things like that.

Any comments on why the lookup table is so slow?

The calling program is: (I hope I posted this correctly)

Code:
#include "debug.h"

int n, time1, dummy;
float floaty;
byte timerNumber = 10; 

void setup() {
  Serial.begin(9600);
  delay(800);    // wait for above command to finish, 500 gives intermittent operation
  Serial.print("Starting Timer ");
  Serial.println(timerNumber);

  time1 = micros();
  for(n=0; n<1000; n++)
    dummy = 5 + n;
  Serial.print("Addition Time ");
  Serial.println(micros() - time1);  

  time1 = micros();
  for(n=0; n<1000; n++)
    dummy = 5 - n;
  Serial.print("Subtraction Time ");
  Serial.println(micros() - time1);  

  time1 = micros();
  for(n=0; n<1000; n++)
    dummy = 5 * n;
  Serial.print("Integer Multiply Time ");
  Serial.println(micros() - time1);  

 time1 = micros();
  for(n=0; n<1000; n++)
    floaty = 5.0 * (float) n;
  Serial.print("Float Multiply Time ");
  Serial.println(micros() - time1);  

 time1 = micros();
  for(n=0; n<1000; n++)
    floaty = 5.0 / (float) n;
  Serial.print("Float Divide Time ");
  Serial.println(micros() - time1);  

 time1 = micros();
  for(n=0; n<1000; n++)
    floaty = sin((float) n/ 100);
  Serial.print("Sine Evaluation Time ");
  Serial.println(micros() - time1);  
  
 time1 = micros();
  for(n=0; n<1000; n++)
    floaty = sint((float) n/ 100);
  Serial.print("Sin Table Lookup Time ");
  Serial.println(micros() - time1);  
  
 time1 = micros();
  for(n=0; n<1000; n++)
    floaty = atan((float) n/ 100);
  Serial.print("Arctangent Time ");
  Serial.println(micros() - time1);  
  
 time1 = micros();
  for(n=0; n<1000; n++)
    floaty = atanb((float) n/ 100);
  Serial.print("Rational Approximation B Arctangent Time ");
  Serial.println(micros() - time1);  

 time1 = micros();
  for(n=0; n<1000; n++)
    floaty = atanc((float) n/ 100);
  Serial.print("Rational Approximation C Arctangent Time ");
  Serial.println(micros() - time1);  
}

void loop() {
  // put your main code here, to run repeatedly:

}

with the header file:

Code:
// #include <Arduino.h>

// table lookup and rational approximation of trig functions
double interpolate(double x, double x1, double y1, double x2, double y2);
double sinTableBasic(double x);
double sint(double x);
double cost(double x);
double tant(double x);
double atana(double x);
double atanb(double x);
double atanc(double x);
double atana2(double y, double x);

and the table lookup file, which I called sinCosTanTable. The table values are more precise than required, but I assume that does not cost any performance.

Code:
//#include <stdio.h>
//#include <stdarg.h>
//#include <Arduino.h>
//#include <stdlib.h>
#include <math.h>

#include "debug.h"

#define PI M_PI
#define N 1024

// function to perform a linear interpolation of values
double interpolate(double x, double x1, double y1, double x2, double y2) {
  return(y1 + (x - x1)/(x2 - x1) * (y2 - y1));
}

// Sin lookup table valid from 0 <= x <= PI/2
// Values generated from transcendentalFunctionGenerator.m
double sinTableBasic(double x) {
  double sinTable[(int) N + 1] = {
    0,0.001533980186285,0.003067956762966,0.004601926120449,0.006135884649154,0.007669828739531,0.00920375478206,0.01073765916726,
    0.01227153828572,0.01380538852806,0.01533920628499,0.01687298794728,0.0184067299058,0.01994042855151,0.02147408027547,0.02300768146884,
    0.02454122852291,0.0260747178291,0.02760814577897,0.02914150876419,0.03067480317664,0.0322080254083,0.03374117185138,0.03527423889821,
    0.03680722294136,0.03834012037355,0.03987292758774,0.04140564097708,0.04293825693494,0.04447077185494,0.04600318213091,0.04753548415696,
    0.04906767432742,0.0505997490369,0.05213170468028,0.05366353765273,0.05519524434969,0.05672682116691,0.05825826450044,0.05978957074664,
    0.06132073630221,0.06285175756416,0.06438263092986, 0.065913352797,0.06744391956366,0.06897432762827,0.07050457338961,0.07203465324689,
    0.07356456359967,0.07509430084792,0.07662386139203,0.07815324163279,0.07968243797143,0.08121144680959,0.08274026454938,0.08426888759332,
    0.08579731234444,0.08732553520619,0.08885355258252,0.09038136087786,0.09190895649713,0.09343633584575,0.09496349532964,0.09649043135525,
    0.09801714032956,0.09954361866007,0.1010698627548,0.1025958690224,0.1041216338721,0.1056471537134,0.1071724249568,0.1086974440131,
    0.1102222072939,0.1117467112111,0.1132709521776,0.1147949266065,0.1163186309119,0.1178420615083, 0.119365214811,0.1208880872358,
    0.1224106751992,0.1239329751185,0.1254549834115,0.1269766964969,0.1284981107938,0.1300192227222,0.1315400287029,0.1330605251571,
    0.1345807085071,0.1361005751757,0.1376201215865,0.1391393441638,0.1406582393328,0.1421768035194,0.1436950331503,0.1452129246528,
    0.1467304744554,0.1482476789869,0.1497645346773,0.1512810379573,0.1527971852584, 0.154312973013,0.1558283976543,0.1573434556162,
    0.1588581433339,0.1603724572429,0.1618863937801, 0.163399949383,  0.16491312049,0.1664259035405,0.1679382949747, 0.169450291234,
    0.1709618887603,0.1724730839968,0.1739838733875,0.1754942533773,0.1770042204121, 0.178513770939,0.1800229014057,0.1815316082611,
    0.1830398879551,0.1845477369386,0.1860551516634,0.1875621285825,0.1890686641498,0.1905747548203,0.1920803970499,0.1935855872958,
    0.1950903220161,0.1965945976701, 0.198098410718,0.1996017576211,0.2011046348421,0.2026070388444,0.2041089660928,0.2056104130531,
    0.2071113761922,0.2086118519783,0.2101118368805,0.2116113273692,0.2131103199161,0.2146088109938,0.2161067970762,0.2176042746385,
    0.2191012401569,0.2205976901089,0.2220936209732,0.2235890292298,0.2250839113598,0.2265782638456,0.2280720831709,0.2295653658205,
    0.2310581082807,0.2325503070388,0.2340419585835, 0.235533059405,0.2370236059944,0.2385135948443,0.2400030224487,0.2414918853029,
    0.2429801799033,0.2444679027478,0.2459550503358,0.2474416191678,0.2489276057457, 0.250413006573,0.2518978181542,0.2533820369956,
    0.2548656596045,0.2563486824899,0.2578311021622,0.2593129151329,0.2607941179153,0.2622747070239,0.2637546789748,0.2652340302855,
    0.2667127574749,0.2681908570634,0.2696683255729,0.2711451595268,0.2726213554499,0.2740969098687, 0.275571819311,0.2770460803061,
    0.2785196893851,0.2799926430803,0.2814649379258,0.2829365704571,0.2844075372113,0.2858778347271,0.2873474595447, 0.288816408206,
    0.2902846772545, 0.291752263235,0.2932191626943,0.2946853721805,0.2961508882436,0.2976157074351, 0.299079826308,0.3005432414173,
    0.3020059493192, 0.303467946572,0.3049292297354,0.3063897953709,0.3078496400415,0.3093087603123,0.3107671527496,0.3122248139218,
    0.3136817403989,0.3151379287525,0.3165933755562, 0.318048077385, 0.319502030816,0.3209552324279,0.3224076788011,0.3238593665179,
    0.3253102921623,0.3267604523201,0.3282098435791,0.3296584625286,0.3311063057599, 0.332553369866, 0.333999651442,0.3354451470845,
    0.3368898533922,0.3383337669655,0.3397768844068,0.3412192023203, 0.342660717312,0.3441014259899, 0.345541324964,0.3469804108459,
    0.3484186802494,0.3498561297901,0.3512927560856,0.3527285557552,0.3541635254205,0.3555976617048,0.3570309612334,0.3584634206337,
    0.359895036535,0.3613258055685,0.3627557243674,0.3641847895671,0.3656129978048,0.3670403457198,0.3684668299534,0.3698924471489,
    0.3713171939518,0.3727410670095,0.3741640629715,0.3755861784892,0.3770074102164,0.3784277548088,0.3798472089241,0.3812657692222,
    0.3826834323651,0.3841001950169,0.3855160538439,0.3869310055144,0.3883450466988,0.3897581740699,0.3911703843023, 0.392581674073,
    0.393992040061,0.3954014789478,0.3968099874167,0.3982175621534,0.3996241998456,0.4010298971836,0.4024346508594,0.4038384575677,
    0.405241314005,0.4066432168704, 0.408044162865,0.4094441486923,0.4108431710579,0.4122412266699,0.4136383122384,0.4150344244761,
    0.4164295600976,0.4178237158202,0.4192168883632,0.4206090744484,0.4220002707998,0.4233904741438,0.4247796812091,0.4261678887268,
    0.4275550934303,0.4289412920553,0.4303264813401,0.4317106580251,0.4330938188532,0.4344759605697,0.4358570799223, 0.437237173661,
    0.4386162385385,0.4399942713096,0.4413712687317,0.4427472275646,0.4441221445704, 0.445496016514,0.4468688401624,0.4482406122852,
    0.4496113296546,0.4509809890451,0.4523495872338,0.4537171210002,0.4550835871263,0.4564489823969,0.4578133035989,0.4591765475219,
    0.4605387109582,0.4618997907025,0.4632597835519,0.4646186863062, 0.465976495768, 0.467333208742,0.4686888220358,0.4700433324596,
    0.471396736826,0.4727490319503,0.4741002146505,0.4754502817472,0.4767992300633,0.4781470564248,0.4794937576602,0.4808393306003,
    0.4821837720791,0.4835270789329,0.4848692480008,0.4862102761245,0.4875501601484,0.4888888969198,0.4902264832883,0.4915629161065,
    0.4928981922298, 0.494232308516,0.4955652618258,0.4968970490227,0.4982276669728,0.4995571125451,0.5008853826112,0.5022124740457,
    0.5035383837257,0.5048631085313,0.5061866453452, 0.507508991053,0.5088301425431,0.5101500967068, 0.511468850438,0.5127864006336,
    0.5141027441932,0.5154178780195,0.5167317990176, 0.518044504096,0.5193559901656,0.5206662541404,0.5219752929372,0.5232831034757,
    0.5245896826785,0.5258950274711,0.5271991347819,0.5285020015422,0.5298036246863,0.5311040011513,0.5324031278772,0.5337010018072,
    0.5349976198871, 0.536292979066,0.5375870762956, 0.538879908531,0.5401714727299,0.5414617658531,0.5427507848645,0.5440385267309,
    0.545324988422,0.5466101669108,0.5478940591731,0.5491766621877,0.5504579729366,0.5517379884047,  0.55301670558,0.5542941214536,
    0.5555702330196,0.5568450372752,0.5581185312206,0.5593907118591,0.5606615761973,0.5619311212447,0.5631993440138,0.5644662415205,
    0.5657318107836,0.5669960488251,0.5682589526701,0.5695205193469, 0.570780745887,0.5720396293248, 0.573297166698,0.5745533550477,
    0.5758081914178,0.5770616728557,0.5783137964117,0.5795645591394,0.5808139580958,0.5820619903408,0.5833086529377, 0.584553942953,
    0.5857978574564,0.5870403935209,0.5882815482226,0.5895213186411,0.5907597018589, 0.591996694962,0.5932322950398,0.5944664991847,
    0.5956993044924,0.5969307080622,0.5981607069963,0.5993892984006,0.6006164793839,0.6018422470586,0.6030665985403,0.6042895309482,
    0.6055110414043,0.6067311270345,0.6079497849678,0.6091670123365,0.6103828062763,0.6115971639265,0.6128100824294, 0.614021558931,
    0.6152315905806,0.6164401745309,0.6176473079378, 0.618852987961,0.6200572117633,0.6212599765111,0.6224612793741,0.6236611175257,
    0.6248594881424,0.6260563884043,0.6272518154951,0.6284457666018,0.6296382389149,0.6308292296284,0.6320187359398,0.6332067550501,
    0.6343932841636,0.6355783204886,0.6367618612363,0.6379439036218,0.6391244448638,0.6403034821842,0.6414810128086,0.6426570339662,
    0.6438315428898,0.6450045368155,0.6461760129833,0.6473459686365,0.6485144010221,0.6496813073907,0.6508466849964, 0.652010531097,
    0.6531728429538,0.6543336178318,0.6554928529996,0.6566505457294,0.6578066932971, 0.658961292982,0.6601143420674,  0.66126583784,
    0.6624157775902, 0.663564158612,0.6647109782033,0.6658562336655,0.6669999223036,0.6681420414265,0.6692825883466,0.6704215603802,
    0.671558954847,0.6726947690708,0.6738290003788, 0.674961646102,0.6760927035753,0.6772221701372,0.6783500431299,0.6794763198994,
    0.6806009977955,0.6817240741716,0.6828455463852,0.6839654117973,0.6850836677727,  0.68620031168,0.6873153408918,0.6884287527841,
    0.6895405447371,0.6906507141345,0.6917592583642,0.6928661748174,0.6939714608897,  0.69507511398,0.6961771314915,0.6972775108309,
    0.698376249409,0.6994733446403,0.7005687939432,0.7016625947402,0.7027547444572,0.7038452405245,0.7049340803759,0.7060212614493,
    0.7071067811865,0.7081906370332,0.7092728264389,0.7103533468571,0.7114321957452,0.7125093705647,0.7135848687808,0.7146586878628,
    0.7157308252838,0.7168012785211,0.7178700450557,0.7189371223728,0.7200025079614,0.7210661993145,0.7221281939292,0.7231884893065,
    0.7242470829515,0.7253039723731,0.7263591550843,0.7274126286024,0.7284643904482, 0.729514438147,0.7305627692278,0.7316093812239,
    0.7326542716724,0.7336974381147, 0.734738878096,0.7357785891657,0.7368165688774,0.7378528147885,0.7388873244606,0.7399200954595,
    0.740951125355,0.7419804117208,0.7430079521351,0.7440337441799,0.7450577854415,0.7460800735101,0.7471006059802,0.7481193804504,
    0.7491363945235,0.7501516458062,0.7511651319097, 0.752176850449,0.7531867990436,0.7541949753169,0.7552013768965,0.7562060014144,
    0.7572088465065, 0.758209909813,0.7592091889784,0.7602066816512,0.7612023854843,0.7621962981346,0.7631884172634,0.7641787405361,
    0.7651672656225,0.7661539901963,0.7671389119358,0.7681220285234,0.7691033376456,0.7700828369933,0.7710605242618,0.7720363971504,
    0.7730104533627,0.7739826906068,0.7749531065949,0.7759216990434,0.7768884656732,0.7778534042095,0.7788165123815, 0.779777787923,
    0.7807372285721,0.7816948320711,0.7826505961666,0.7836045186096,0.7845565971556,0.7855068295641,0.7864552135991, 0.787401747029,
    0.7883464276266,0.7892892531689,0.7902302214373,0.7911693302177,0.7921065773002,0.7930419604794,0.7939754775543,0.7949071263282,
    0.7958369046089,0.7967648102084,0.7976908409434,0.7986149946348,0.7995372691079,0.8004576621926,0.8013761717231,0.8022927955381,
    0.8032075314806,0.8041203773983, 0.805031331143,0.8059403905712,0.8068475535438,0.8077528179262,0.8086561815882,0.8095576424041,
    0.8104571982526,0.8113548470171,0.8122505865852,0.8131444148493,0.8140363297059,0.8149263290565,0.8158144108067,0.8167005728668,
    0.8175848131516,0.8184671295803,0.8193475200768,0.8202259825694,0.8211025149911,0.8219771152792,0.8228497813758,0.8237205112274,
    0.824589302785,0.8254561540044,0.8263210628457,0.8271840272737,0.8280450452578,0.8289041147719,0.8297612337945,0.8306164003088,
    0.8314696123025,0.8323208677679,0.8331701647019, 0.834017501106,0.8348628749864,0.8357062843538,0.8365477272235,0.8373872016157,
    0.8382247055548,0.8390602370703, 0.839893794196,0.8407253749705,0.8415549774369,0.8423825996432,0.8432082396418,0.8440318954901,
    0.8448535652497,0.8456732469873,0.8464909387741,0.8473066386859,0.8481203448033,0.8489320552116,0.8497417680009,0.8505494812656,
    0.8513551931053,0.8521589016239,0.8529606049304,0.8537603011381,0.8545579883654,0.8553536647352,0.8561473283752,0.8569389774178,
    0.8577286100003,0.8585162242644, 0.859301818357,0.8600853904294,0.8608669386378,0.8616464611431, 0.862423956111,0.8631994217121,
    0.8639728561216,0.8647442575195,0.8655136240906,0.8662809540245,0.8670462455157,0.8678094967633,0.8685707059713,0.8693298713486,
    0.8700869911087,0.8708420634701, 0.871595086656,0.8723460588944,0.8730949784183,0.8738418434654,0.8745866522782,0.8753294031041,
    0.8760700941954,0.8768087238091,0.8775452902073,0.8782797916565,0.8790122264286,   0.8797425928,0.8804708890522,0.8811971134712,
    0.8819212643484,0.8826433399796,0.8833633386657,0.8840812587126,0.8847970984309,0.8855108561362,0.8862225301489,0.8869321187943,
    0.8876396204029,0.8883450333096,0.8890483558547,0.8897495863831,0.8904487232448,0.8911457647946,0.8918407093923,0.8925335554028,
    0.8932243011955,0.8939129451452,0.8945994856314,0.8952839210386,0.8959662497562,0.8966464701787,0.8973245807054,0.8980005797407,
    0.898674465694,0.8993462369793,0.9000158920162,0.9006834292286, 0.901348847046,0.9020121439025,0.9026733182373,0.9033323684945,
    0.9039892931234,0.9046440905782,0.9052967593181,0.9059472978073,0.9065957045149,0.9072419779153,0.9078861164877,0.9085281187163,
    0.9091679830905,0.9098057081047,0.9104412922581,0.9110747340552,0.9117060320054,0.9123351846233,0.9129621904284,0.9135870479453,
    0.9142097557035,0.9148303122379,0.9154487160883,0.9160649657993, 0.916679059921,0.9172909970084,0.9179007756214,0.9185083943252,
    0.9191138516901,0.9197171462912,0.9203182767091,0.9209172415292, 0.921514039342,0.9221086687433,0.9227011283339,0.9232914167195,
    0.9238795325113,0.9244654743253,0.9250492407827,0.9256308305099,0.9262102421383,0.9267874743046,0.9273625256504,0.9279353948226,
    0.9285060804732,0.9290745812593,0.9296408958432,0.9302050228922, 0.930766961079,0.9313267090812,0.9318842655817,0.9324396292685,
    0.9329927988347,0.9335437729788,0.9340925504043,0.9346391298197,0.9351835099389,0.9357256894811,0.9362656671703,0.9368034417359,
    0.9373390119126,  0.93787237644,0.9384035340631,0.9389324835321,0.9394592236022, 0.939983753034,0.9405060705933,0.9410261750509,
    0.941544065183, 0.942059739771,0.9425731976014,0.9430844374661, 0.943593458162,0.9441002584913,0.9446048372615,0.9451071932853,
    0.9456073253805,0.9461052323704,0.9466009130833,0.9470943663528,0.9475855910177,0.9480745859223,0.9485613499157,0.9490458818527,
    0.949528180593,0.9500082450018,0.9504860739495,0.9509616663116, 0.951435020969,0.9519061368079,0.9523750127198,0.9528416476012,
    0.9533060403542, 0.953768189886,0.9542280951091,0.9546857549413,0.9551411683058,0.9555943341308,  0.95604525135,0.9564939189024,
    0.9569403357322, 0.957384500789,0.9578264130275, 0.958266071408,0.9587034748959,0.9591386224618, 0.959571513082,0.9600021457377,
    0.9604305194156,0.9608566331077,0.9612804858113,0.9617020765291, 0.962121404269,0.9625384680444,0.9629532668737, 0.963365799781,
    0.9637760657954,0.9641840639517,0.9645897932898,0.9649932528549,0.9653944416977,0.9657933588741,0.9661900034454,0.9665843744783,
    0.9669764710449,0.9673662922223,0.9677538370935,0.9681391047464,0.9685220942744,0.9689028047764,0.9692812353565,0.9696573851243,
    0.9700312531945,0.9704028386876, 0.970772140729,0.9711391584497,0.9715038909863,0.9718663374803,0.9722264970789,0.9725843689347,
    0.9729399522056,0.9732932460547,0.9736442496508, 0.973992962168,0.9743393827856,0.9746835106885, 0.975025345067,0.9753648851167,
    0.9757021300385, 0.976037079039,  0.97636973133,0.9767000861287,0.9770281426578,0.9773539001452,0.9776773578245,0.9779985149346,
    0.9783173707196,0.9786339244294,0.9789481753191,0.9792601226491,0.9795697656854,0.9798771036995,0.9801821359681,0.9804848617735,
    0.9807852804032,0.9810833911505,0.9813791933138, 0.981672686197,0.9819638691096,0.9822527413663,0.9825393022874,0.9828235511987,
    0.9831054874312,0.9833851103216,0.9836624192117,0.9839374134492,0.9842100923869,0.9844804553832,0.9847485018019,0.9850142310122,
    0.9852776423889,0.9855387353122,0.9857975091676,0.9860539633462,0.9863080972446,0.9865599102648,0.9868094018142,0.9870565713058,
    0.9873014181579,0.9875439417944,0.9877841416446,0.9880220171433,0.9882575677307,0.9884907928527,0.9887216919603,0.9889502645103,
    0.9891765099648,0.9894004277914,0.9896220174632,0.9898412784588,0.9900582102623,0.9902728123632,0.9904850842565,0.9906950254427,
    0.9909026354278,0.9911079137233,0.9913108598461,0.9915114733187,0.9917097536691,0.9919057004306,0.9920993131422,0.9922905913483,
    0.9924795345987,0.9926661424489,0.9928504144599,0.9930323501979,0.9932119492348,0.9933892111481,0.9935641355206,0.9937367219407,
    0.9939069700024,0.9940748793049,0.9942404494532,0.9944036800577,0.9945645707343,0.9947231211043,0.9948793307948,0.9950331994381,
    0.9951847266722,0.9953339121405,0.9954807554919, 0.995625256381,0.9957674144677,0.9959072294174,0.9960447009013,0.9961798285957,
    0.9963126121828,  0.99644305135,0.9965711457906,0.9966968952029,0.9968202992912, 0.996941357765,0.9970600703395,0.9971764367353,
    0.9972904566787,0.9974021299013,0.9975114561403,0.9976184351385,0.9977230666442,0.9978253504111,0.9979252861986,0.9980228737715,
    0.9981181129001,0.9982110033605,0.9983015449339,0.9983897374073,0.9984755805733,0.9985590742298,0.9986402181803,0.9987190122339,
    0.9987954562052,0.9988695499143,0.9989412931869,0.9990106858541,0.9990777277526,0.9991424187248,0.9992047586184,0.9992647472866,
    0.9993223845883, 0.999377670388,0.9994306045555,0.9994811869662,0.9995294175011,0.9995752960467,0.9996188224952, 0.999659996744,
    0.9996988186962,0.9997352882606,0.9997694053512,0.9998011698879,0.9998305817958,0.9998576410058,0.9998823474542,0.9999047010829,
    0.9999247018391, 0.999942349676, 0.999957644552, 0.999970586431,0.9999811752826,0.9999894110819,0.9999952938096,0.9999988234517,
    1};
    
    int nlow = (int) (x * (2.0/PI) * (double) N);
    if (nlow >= N) nlow = N - 1;   // x == PI/2 would otherwise read sinTable[N + 1]
    double xtemp1 = PI/(2.0*N);
    double x1 = (double) nlow * xtemp1;
    double x2 = (double) (nlow + 1) * xtemp1;
    double y1 = sinTable[nlow];
    double y2 = sinTable[nlow+1];
    double y = interpolate(x, x1, y1, x2, y2);
    return(y);
}

double sint(double x) {
//   put into range of 0 <x < 2*pi
  double y;
  double num = floor(x / (2.0*PI));
  x = x - num * 2.0*PI;
  if (x <= PI/2.0)
    y = (sinTableBasic(x));
  else if (x <= PI)
    y = (sinTableBasic(PI - x));
  else if (x <= 1.5*PI)
    y = (-sinTableBasic(x - PI));
  else //if (x <= 2.0*PI)
    y = (-sinTableBasic(2.0*PI - x));
  //else
    //Serial.print("sin range error");
  
  return(y);
}

double cost(double x) {
//   put into range of 0 <x < 2*pi
  double y;
  double num = floor(x / (2.0*PI));
  x = x - num * 2.0*PI;
  if (x <= PI/2.0)
    y = (sinTableBasic(PI/2.0-x));
  else if (x <= PI)
    y = (-sinTableBasic(x - PI/2.0));
  else if (x <= 1.5*PI)
    y = (-sinTableBasic(1.5*PI - x));
  else //if (x <= 2.0*PI)
    y = (sinTableBasic(x - 1.5*PI));
 // else
   // Serial.print("sin range error");
  
  return(y);
}

double tant(double x) {
  return(sint(x)/cost(x));
}

// Rational Approximation from Abramovitch & Stegun eq'n 4.4.48
// Error <= 5E-3 for -1 <= x <= 1
double atana(double x) {
  return(x/(1 + 0.28*x*x));
}

// from the internet errorMax = 0.162 degrees, valid x >= 0 without if statement
double atanb(double x) {
  double angle, B = 0.596227;
  if (x < 0) {
    x = -x;
    angle = -(PI/2.0 * (B + x)*x / (1.0 + (2.0*B + x)*x));
  } else {
    angle = (PI/2.0 * (B + x)*x / (1.0 + (2.0*B + x)*x));
  }
  return(angle);
}

// from the internet errorMax = 0.00811 degree, valid x >= 0 without if statement
double atanc(double x) {
  double angle, C = (1 + sqrt(17)) / 8.0;
  if (x < 0) {
    x = -x;
    angle = -(PI/2.0 * (x*(C + x*(1.0 + x))) / (1 + x*((C +1) + x*((C+1) + x))));
  } else {
    angle = (PI/2.0 * (x*(C + x*(1.0 + x))) / (1 + x*((C +1) + x*((C+1) + x))));
    //angle = (PI/2.0 * (x*(C + x*(1.0 + x))) / (1 + (C +1)*x + (C+1)*x*x + x*x*x));
  }
  return(angle);
}
  
// use the atanb approximation as a faster and accurate enough approximation
// returns a value between - Pi & PI
double atana2(double y, double x) {
    double angle;
    if (x==0.0)
      angle = PI/2;
    else
    angle = atanb(y/x);
  
//   printf("%9.6f  ",angle);
  if(x < 0 && y >= 0)
    angle = PI + angle;
  else if (x < 0 && y < 0)
    angle = -PI + angle;
  else if (x >= 0 && y < 0)
    angle = angle;
  else if (x >= 0 && y >= 0)
    angle = angle;
  //else
    //Serial.print("Problems determining angle in atana2 function!");
  
  return(angle);
}

/*
// main is for testing purposes to confirm that the function is working correctly in a PC environment
int main(void) {
  int i;
  double x, y;
  printf("x \t sint(x) \t sin(x) \t cost(x) \t cos(x) \t tant(x) \t tan(x) \t atana(x) \t atan(x)\n");
  
  // testing sin, cos, tan functions
  for (i=0; i<=100; i++) {
    x = (double) i/25.0 -1.0;
    printf("%9.6f\t%9.6f\t%9.6f\t%9.6f\t%9.6f\t%9.6f\t%9.6f\n",
            x, sint(x), sin(x), cost(x), cos(x), tant(x), tan(x));
  }
  
  // testing the atan functions
  for (i=0; i<=100; i++) {
    x = (double) i/15.0 -3.0;
    printf("%9.6f\t%9.6f\t%9.6f\t%9.6f\t%9.6f\n",
            x, atan(x), atana(x), atanb(x), atanc(x));
  }
  
  // testing the atan2 function
  for (x=-10; x< 20; x=x+2) {
    for (y=-10; y < 25; y=y+2) {
      printf("%9.6f\t%9.6f\t%9.6f\t%9.6f\n",
              x, y, atan2(y,x), atana2(y,x));
    }
  }
  return(0);
}
*/
 
There are a few things you should probably know....

double really is 64 bits on Teensy LC, 3.0 and 3.1.

Floating point constants are often treated as double, unless you append "f", and that can cause the entire calculation to be promoted to 64 bit double.

Likewise, the trig functions are double, unless you use sinf(), cosf(), tanf(), etc.
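
A minimal sketch of the promotion rules above, written in plain C++ so it can also be checked on a PC. It assumes a C++11 compiler (for static_assert); the helper names scaleMixed, scaleFloat and sineFloat are made up for illustration and are not from the original program.

```cpp
#include <math.h>
#include <type_traits>

// 5.0 is a double constant, so float * double is computed in double.
static_assert(std::is_same<decltype(5.0 * 1.0f), double>::value,
              "unsuffixed constant promotes the expression to double");

// 5.0f is a float constant, so the whole expression stays in float.
static_assert(std::is_same<decltype(5.0f * 1.0f), float>::value,
              "f-suffixed constant keeps the expression in float");

// Hypothetical helper: the multiply is done in double, then narrowed on return.
float scaleMixed(int n) {
  return 5.0 * (float) n;
}

// Hypothetical helper: the multiply is done in single precision throughout.
float scaleFloat(int n) {
  return 5.0f * (float) n;
}

// sinf() takes and returns float; sin() would work in double.
float sineFloat(float x) {
  return sinf(x);
}
```

Both scale helpers return the same value for small n; the difference is only in which arithmetic routines get used to compute it.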

The compiler is surprisingly good at replacing code at compile time if the inputs are constants. It can remove loops and calls to static functions, if all the inputs are constants known to the compiler. This can make benchmarking a challenge. Usually you have to get input from Serial or some non-constant source. Another approach is to use volatile, but that defeats optimizations which can give false results.
 
Optimization may have invalidated your code. Have you examined the assembly output? I would make “dummy” and “floaty” volatile at least.
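
As a rough sketch of the two suggestions above (non-constant inputs and volatile results), something like the following could be used. The function and variable names are hypothetical, and as noted, volatile adds the cost of the store to whatever is being timed, so treat the numbers with care.

```cpp
#include <stdint.h>

// volatile forces every store to actually happen, so the loop cannot be
// removed; the store itself becomes part of what is measured.
volatile int32_t sinkInt;
volatile float sinkFloat;

// 'count' and 'scale' are run-time arguments (hypothetical names). If the
// caller reads them from a non-constant source, the compiler cannot
// precompute the result.
void intMultiplyLoop(int32_t count, int32_t scale) {
  for (int32_t n = 0; n < count; n++)
    sinkInt = scale * n;
}

void floatMultiplyLoop(int32_t count, float scale) {
  for (int32_t n = 0; n < count; n++)
    sinkFloat = scale * (float) n;
}
```

On the Teensy one would presumably call each function between two micros() reads, with scale taken from something like Serial.parseInt(), and store the time difference in a variable before printing anything.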
 
Thanks Paul,

I prepared a table of performance based on what you told me. It must have been that integers and longs are both 32 bit. There is no performance difference between integer and long operations. But there is between float and double.

In tweaking the code, I noticed performance improvements when I replaced
PI / 2.0/N with PI / (2.0*N) where N is defined and in
(1 + (C +1)*x + (C+1)*x*x + x*x*x))
with
(1 + x*((C +1) + x*((C+1) + x)))

where you can see that the number of adds is the same but the number of multiplies has dropped from five to two. Both changes improved performance. I was surprised, as I thought the compiler would have done this optimization itself.
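
For reference, the second rewrite is Horner's (nested) form of the polynomial. A small sketch of the two variants side by side, with made-up function names and C passed in as a parameter for easy testing. As far as I know, the reason the compiler does not do this on its own is that floating-point operations are not associative once rounding is involved, so GCC will not regroup them unless it is explicitly allowed to (for example with -ffast-math).

```cpp
// Expanded form of the atanc denominator: five multiplies.
double denomExpanded(double x, double C) {
  return 1 + (C + 1)*x + (C + 1)*x*x + x*x*x;
}

// Horner (nested) form of the same polynomial: two multiplies.
double denomHorner(double x, double C) {
  return 1 + x*((C + 1) + x*((C + 1) + x));
}
```

The two functions agree mathematically but may differ in the last bits for general inputs, which is exactly why the compiler treats them as different computations.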

Code:
(times in us per 1000 iterations)
Operation                                 double/long   float/int
Addition Time                                      34          35
Subtraction Time                                   35          35
Integer Multiply Time                              39          38
float Multiply Time                              1255         967
float Divide Time                                8321        2567
Sine Evaluation Time                            50205       27847
Sin Table Lookup Time                          184100      103918
Arctangent Time                                 58017       53502
Rational Approximation B Arctangent Time        25989       20735
Rational Approximation C Arctangent Time        34659       25517
 
On the Cortex, and on most 32-bit processors, the C "int" is a signed 32-bit two's complement value, the same size as long. Unsigned int is also 32 bits.
A C short is 16 bits on any processor I've ever seen.
An int is either 16 or 32 bits, depending on the processor.

This is why many of us use
#include <stdint.h>
and avoid using int unless we KNOW the variable only holds small signed numbers.
Instead, we use
uint32_t x; // for unsigned 32 bit, because unsigned int might be 16 bit on some processors
and so on.

Code portability among processors really requires disciplined use of stdint.h.
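
A short sketch of what that looks like in practice. The variable and function names are hypothetical; the static_assert lines assume a C++11 compiler. The elapsedSince helper is included because the benchmark above stores micros() in an int, whereas an unsigned 32-bit difference stays correct even when the counter rolls over.

```cpp
#include <stdint.h>
#include <limits.h>

int32_t  signedCounter;   // exactly 32 bits, signed, on every platform
uint32_t elapsedUs;       // exactly 32 bits, unsigned
int16_t  smallSample;     // exactly 16 bits, signed
uint8_t  flags;           // exactly 8 bits, unsigned

static_assert(sizeof(int32_t) * CHAR_BIT == 32, "int32_t is 32 bits");
static_assert(sizeof(uint32_t) * CHAR_BIT == 32, "uint32_t is 32 bits");
static_assert(sizeof(int16_t) * CHAR_BIT == 16, "int16_t is 16 bits");

// Unsigned subtraction wraps modulo 2^32, so an interval measured across a
// counter rollover still comes out right.
uint32_t elapsedSince(uint32_t start, uint32_t now) {
  return now - start;
}
```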

On floats, no doubt you know that lesser processors in the ARM family lack hardware floating point (single precision), and better ARMs have hardware floating point, and perhaps none (?) have double precision (64 bit) hardware floating point.
 
Don't put the lookup table definition inside the sinTableBasic function.
Make it a global data structure outside the function.
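
A minimal sketch of that change, with the table cut down to 5 entries so it fits here (the values are taken from the original table). The likely reason it helps: a non-static local array with an initializer is typically rebuilt on the stack at every call, which for 1025 doubles means copying about 8 KB each time. The names TABLE_N, HALF_PI and sinTableGlobal are made up for this example, and float is used throughout as an assumption.

```cpp
#define TABLE_N 4   // shortened for illustration; the original uses 1024

static const float HALF_PI = 1.5707963f;

// File scope + static const: the table is set up once and can live in
// flash, instead of being copied onto the stack at every call.
static const float sinTable[TABLE_N + 1] = {
  0.0f, 0.3826834f, 0.7071068f, 0.9238795f, 1.0f
};

// Valid for 0 <= x <= PI/2, like sinTableBasic in the original post.
float sinTableGlobal(float x) {
  const float step = HALF_PI / TABLE_N;      // table spacing
  int nlow = (int) (x / step);
  if (nlow >= TABLE_N) nlow = TABLE_N - 1;   // keep nlow + 1 inside the table
  float frac = (x - (float) nlow * step) / step;   // position within the interval
  return sinTable[nlow] + frac * (sinTable[nlow + 1] - sinTable[nlow]);
}
```

With only 5 entries the interpolation error is of course much larger than with the full 1024-entry table; the point here is only where the table lives.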
 
And try float constants instead of double; a double is 8 bytes.
For functions: sin() returns double, sinf() returns float.
 
Just for the record :

With -O3 @ 96MHz all float and in one source file:

Starting Timer 10
Addition Time 4
Subtraction Time 4
Integer Multiply Time 7
Float Multiply Time 927
Float Divide Time 2441
Sine Evaluation Time 24908
Sin Table Lookup Time 99462
Arctangent Time 25226
Rational Approximation B Arctangent Time 18152
Rational Approximation C Arctangent Time 21511

sinetable global:
Sin Table Lookup Time 33833
 
units of timings?
 
In the low-power embedded world, the common chips have either no floating-point hardware or single precision only. However, many ARM chips do have 64-bit floating-point hardware instructions. For example, both the Raspberry Pi and BeagleBone Black run at much higher clock frequencies and have full 64-bit hardware support. Of course, those systems are designed under different constraints than chips like the Teensy.

The current Teensy 3.0, 3.1, and LC do not have any hardware floating-point support.

Paul has said the next high-end Teensy (the so-called Teensy 3.1++) will support single precision in hardware. When it comes out, you will need to add an 'f' suffix to all floating-point constants and use the float versions of the math routines (also with an 'f' suffix). Otherwise the compiler automatically promotes expressions from single to double precision and then converts back to float for the store. This goes back to C being developed on a PDP-11, where double precision was easier to use and faster than single precision, so the default was to convert everything to double. The AVR processors used in Arduino boards didn't help matters: due to space issues, the compiler there makes 'double' the same as 'float'.

So right now, if you need to do lots and lots of FP calculations, the Teensy is probably not the platform for you. A Raspberry Pi might work if you don't need bare-metal, real-time performance (with a Teensy as an off-board processor if you do). Another option might be a Navspark (http://www.navspark.com.tw/), which has a Sparc processor with full 64-bit floating point and appears to be targeted at real-time use with GPS/GNSS devices.
 
units of timings?

See the source code above: microseconds (us).
I just did a "search and replace": double -> float, sin -> sinf, atan -> atanf.

Edit:
Please note that the above code is not good for benchmarking.
For example, some of the loops can be optimized away completely, and calling micros() after the print gives unreliable results. Also, the USB transfer running in parallel with the calculations is suboptimal.
 
@MichaelMeissner:

This is optimized away (execution time = zero):

for(n=0; n<1000; n++)
dummy = 5 * n;

But this is not:

for(n=0; n<1000; n++)
floaty = 5.0 * (float) n;

Is there a special reason for this behavior, or is it a missed optimization?
 
It's because floaty is declared outside the function, I think. It should be possible to optimize away, since floaty is not declared volatile.

The interpolation for the table lookup can be improved, since the denominator in the division is constant, but having the interpolation in a separate function makes this hard for the compiler to optimize away.
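
A sketch of what the interpolation point could look like, assuming a uniformly spaced table so that x2 - x1 is always the same. The first function is the general form from the original post (converted to float); the second, interpolateUniform, and its parameter invStep are hypothetical names for a variant that takes the precomputed reciprocal of the spacing and multiplies instead of dividing.

```cpp
// General form from the original post: one divide per call.
float interpolate(float x, float x1, float y1, float x2, float y2) {
  return y1 + (x - x1) / (x2 - x1) * (y2 - y1);
}

// Uniform-table form: x2 - x1 is always the table spacing, so pass in its
// reciprocal, computed once, and multiply instead of divide.
// 'invStep' is a hypothetical parameter, equal to 1 / (x2 - x1).
static inline float interpolateUniform(float x, float x1, float y1, float y2,
                                       float invStep) {
  return y1 + (x - x1) * invStep * (y2 - y1);
}
```

Since a float divide measured roughly 2.5 times slower than a float multiply in the timings above, removing one divide per lookup should be noticeable, though I have not measured this variant.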
 
Hm, both dummy and floaty are global:
int n, time1, dummy;
float floaty;

It looks like a missed optimization to me, but I'm curious what Michael says.
Float multiplication is handled differently; I think there is a function call to an internal "mult". But that's not a reason to not optimize it... am I right?

edit:
Minimal example:
Code:
int n, time1, dummy;
float floaty;

void setup() {
  delay(1000);
  
  time1 = micros();
  for(n=0; n<1000; n++)
    dummy = 5 * n;
  time1 = micros()-time1;        
  Serial.print("Integer Multiply Time ");
  Serial.println(time1);  
  delay(50);
  
  time1 = micros();
  for(n=0; n<1000; n++)
    floaty = 5.0 * (float) n;
  time1 = micros()-time1;  
  Serial.print("Float Multiply Time ");
  Serial.println(time1);  
 
}

void loop() {}

Output is:
Integer Multiply Time 1
Float Multiply Time 939

even worse:
floaty = 5.0 * n;

-> Float Multiply Time 2499

(all -O2)
 
-O2:
Code:
int n, time1, time2, time3, dummy;
float floaty;

void setup() {
  delay(5000);
  
  time1 = micros();
  for(n=0; n<1000; n++)
    dummy = 5 * n;
  time1 = micros()-time1;        

  time2 = micros();
  for(n=0; n<1000; n++)
    floaty =   5.0 * (float) n;
  time2 = micros()-time2;  
  
  time3 = micros();
  for(n=0; n<1000; n++)
    floaty =  (float) n * 5.0 ;
  time3 = micros()-time3;  
  

  delay(2000);

  Serial.print("Integer Multiply Time ");
  Serial.println(time1);  
  
  Serial.print("Float Multiply Time 1 ");
  Serial.println(time2);    

  Serial.print("Float Multiply Time 2 ");
  Serial.println(time3);    

}

void loop() {}

Integer Multiply Time 0
Float Multiply Time 1 1167
Float Multiply Time 2 1044

WTF ???

Edit: Observation:
No difference between
- volatile float floaty;
and
- float floaty;

With local variables (not volatile), all loops are optimized to zero microseconds.
 
Might this have something to do with it:

"Note on built-in math functions
...
Functions are only evaluated at compile time if the rounding mode is known to be round to even (-fno-rounding-math), or if the result is exact and does not depend on the rounding mode. Compile-time evaluation always rounds correctly, even for transcendental functions."

ref:
https://gcc.gnu.org/wiki/FloatingPointMath
 
OK, forget the difference between Float Multiply Time 1 and Float Multiply Time 2; it's because of interrupts.
shame on me :)
 
It should be 0, or close to 0 microseconds; any other number is strange. The compiler does not detect that the result is not needed. Even worse, every single superfluous float multiply is calculated.
It's either a bug or something that I do not understand (and yes, I know, most of the time the problem is between the ears ;-) ). If you declare the variables locally, the time is 0, exactly what I expect.
 
Yeah, the -O3 optimizer level is buggy, broken and unusable on this architecture, and -O is not working hard enough to remove some obviously unneeded calculations. So we had better write good code and not rely too much on the optimizer.

Anyway, one possibility is that the optimizer works at the assembly level. Adds and multiplies are single operations, so it's easy for the optimizer to see that they are not needed. The floating point calculations are done with software library calls since the Teensy has no FPU, so the float multiplications use function calls that might or might not have side effects as far as the optimizer can tell, and they are kept. Perhaps the final assignment to floaty is optimized away.
 
:) It's the same for -O2. Most of the time (well, at least for me), this gives good results, better than -O.
Yes, the function call might be part of the problem, as I wrote in post #15. On the other hand, it works OK with local variables.

for(n=0; n<1000; n++)
floaty = (float)n;

is the simplest case. It fails too.
I think I'll go to Bugzilla and try to find this bug or report it.
 
...and it's not only with Cortex-M4! It's the same in ARM mode (not Thumb), independent of the CPU.
Just tested with the latest gcc 4.9.3.

Code:
int n, dummy;
float dummyfloat;

void bug(void) 
{
  for(n=0; n<1000; n++)
    dummy = n;
  for(n=0; n<1000; n++)
    dummyfloat = n;
}

arm-none-eabi-gcc -O2 bug.c -S
gives:
Code:
    .cpu arm7tdmi
    .fpu softvfp
    .eabi_attribute 20, 1
    .eabi_attribute 21, 1
    .eabi_attribute 23, 3
    .eabi_attribute 24, 1
    .eabi_attribute 25, 1
    .eabi_attribute 26, 1
    .eabi_attribute 30, 2
    .eabi_attribute 34, 0
    .eabi_attribute 18, 4
    .file    "bug.c"
    .global    __aeabi_i2f
    .text
    .align    2
    .global    bug
    .type    bug, %function
bug:
    @ Function supports interworking.
    @ args = 0, pretend = 0, frame = 0
    @ frame_needed = 0, uses_anonymous_args = 0
    stmfd    sp!, {r4, lr}
    mov    r4, #0
    ldr    r3, .L6
    ldr    r2, .L6+4
    str    r2, [r3]
.L2:
    mov    r0, r4
    add    r4, r4, #1
    bl    __aeabi_i2f
    cmp    r4, #1000
    bne    .L2
    ldr    r2, .L6+8
    ldr    r3, .L6+12
    str    r0, [r2]    @ float
    str    r4, [r3]
    ldmfd    sp!, {r4, lr}
    bx    lr
.L7:
    .align    2
.L6:
    .word    dummy
    .word    999
    .word    dummyfloat
    .word    n
    .size    bug, .-bug
    .comm    dummyfloat,4,4
    .comm    dummy,4,4
    .comm    n,4,4
    .ident    "GCC: (GNU Tools for ARM Embedded Processors) 4.9.3 20150529 (release) [ARM/embedded-4_9-branch revision 224288]"
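Read back into C, the listing above appears to do roughly the following. This is a hand translation for illustration only: int_to_float is a stand-in for the __aeabi_i2f library call, and bug_as_compiled and bug_fully_folded are invented names. The integer loop has been folded into a single store of 999, and the store to dummyfloat has been moved out of the loop, but the conversion is still called 1000 times.

```c
int n, dummy;
float dummyfloat;

/* Stand-in for the soft-float helper __aeabi_i2f (int to float). */
static float int_to_float(int value)
{
    return (float)value;
}

/* What the generated assembly does. */
void bug_as_compiled(void)
{
    int i = 0;
    float last;

    dummy = 999;                  /* first loop folded to one store */
    do {
        last = int_to_float(i);   /* bl __aeabi_i2f, result only kept in r0 */
        i++;
    } while (i != 1000);
    dummyfloat = last;            /* single store after the loop */
    n = i;
}

/* What a fully folded version would look like: same final state,
   no loop and no library calls. */
void bug_fully_folded(void)
{
    dummy = 999;
    dummyfloat = 999.0f;
    n = 1000;
}
```

Both functions leave dummy, dummyfloat and n with the same values, which is why the 1000 calls look superfluous.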
 
You must pass the hardware floating point flags to the compiler

FPU = -D__FPU_PRESENT -mfloat-abi=hard -mfpu=fpv4-sp-d16

in order to avoid the software function calls. This only helps on a chip that actually has an FPU.
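One way to check which floating point setup a build actually got is to look at the macros GCC predefines for ARM targets. The sketch below assumes GCC's behaviour of defining __ARM_PCS_VFP for the hard-float calling convention and __SOFTFP__ for pure software floating point; float_abi_name is an invented helper. Keep in mind that, as noted earlier in the thread, the Teensy 3.1 has no FPU, so on that board only the software variant applies.

```c
/* Hypothetical helper: reports the float ABI the compiler was set up for,
   based on predefined macros (assumed GCC behaviour for ARM targets). */
const char *float_abi_name(void)
{
#if defined(__ARM_PCS_VFP)
    return "hard";      /* -mfloat-abi=hard: FPU instructions and FPU registers */
#elif defined(__SOFTFP__)
    return "soft";      /* -mfloat-abi=soft: library calls such as __aeabi_i2f */
#elif defined(__arm__)
    return "softfp";    /* FPU instructions, integer-register calling convention */
#else
    return "not an ARM target";
#endif
}
```

Printing the result once at startup makes it easy to confirm that the flags in the makefile really reached the compiler.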
 