RAM and CPU usage

Status
Not open for further replies.

MattH

Hello,
I was trying to see how many resources my small project uses with this sketch:
https://forum.pjrc.com/threads/31664-measure-your-teensy-3-x-cpu-and-ram-usage!
I found another one here:
https://github.com/chrishonson/Arduino_CPU_Usage
I haven't tested it yet, but it looks pretty similar (except that it does not detect the clock speed).

When I used it as is (i.e. empty), I got CPU: 6% and free RAM: 60531.
When I pasted just a part of my project where the commented lines are in "inline void Ten_MS_Task(void)", I got CPU: 17% and free RAM: 60472. When I pasted it into "inline void One_MS_Task(void)" instead, I got CPU: 24% and free RAM: 60511.

Is it possible that I am only using 59 bytes of RAM? I have an array of floats, a 2D array of floats, an array of bytes, lots of long, float, and uint16_t variables, three functions that do float calculations...

Am I using this correctly?

Why do I get 100% CPU usage if I leave a delay in my loop ?

How do I extrapolate CPU usage for 1 loop from the x10=17% and x100=24% ?

Cheers
 
These counter-based methods for monitoring CPU load depend on compiler optimization, versions, etc., and have to be adjusted for each case. I have found another method that will always give a correct result.
It only works with event-driven programming, since it depends on wfi.

Working example below. Tested on a Teensy 3.2 at 72 MHz.

cpuload.ino
Code:
#include "Arduino.h"

volatile uint32_t cpu_load = 0;
volatile bool has_result = false;

IntervalTimer itimer;

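void cpuLoadSleep()
{
    uint32_t st;                       // cycle count just before sleeping
    static uint32_t wt = 0;            // cycle count at the last wake-up
    static uint32_t busy_time = 0;     // cycles spent awake in this window
    static uint32_t sleep_time = 0;    // cycles spent in wfi in this window

    // everything since the last wake-up counts as busy
    busy_time += ARM_DWT_CYCCNT - wt;
    st = ARM_DWT_CYCCNT;
    // with interrupts masked, wfi still wakes on a pending interrupt, but the
    // handler only runs after __enable_irq(), so its cycles count as busy
    __disable_irq();
    __asm volatile ("wfi \n");
    sleep_time += ARM_DWT_CYCCNT - st;
    wt = ARM_DWT_CYCCNT;
    __enable_irq();
    // roughly once per second (F_CPU cycles), publish the load in tenths of a percent
    if ((busy_time + sleep_time) > F_CPU) {
        cpu_load = busy_time / ((busy_time + sleep_time) / 1000);
        busy_time = 0;
        sleep_time = 0;
        has_result = true;
    }
}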

void wasteSomeTime()
{
    for (int i=0; i<5000; i++) {
        __asm volatile ("nop \n");
    }
}

void setup(void)
{
    Serial.begin(9600);

    // enable cpu cycle counter
    ARM_DEMCR |= ARM_DEMCR_TRCENA;
    ARM_DWT_CTRL |= ARM_DWT_CTRL_CYCCNTENA;

    // create some cpu load
    itimer.begin([] { wasteSomeTime(); }, 1000);
}

void loop(void)
{
    if (has_result) {
        // cpu_load is a uint32_t (unsigned long on Teensy), hence %lu
        Serial.printf("cpu load: %lu.%lu%%\n\r", cpu_load/10, cpu_load%10);
        has_result = false;
    }

    // do other useful stuff....

    cpuLoadSleep();
}
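
For anyone following the arithmetic in cpuLoadSleep(), here is a small host-side sketch (plain C++ for a PC, not Teensy code) of the two things it relies on. First, subtracting two readings of a free-running 32-bit counter with unsigned arithmetic stays correct when the counter wraps, which at 72 MHz happens roughly every 60 seconds. Second, busy / (total / 1000) gives the load in tenths of a percent. The function names elapsedCycles and loadTenthsOfPercent are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Cycles elapsed between two readings of a free-running 32-bit counter.
// Unsigned arithmetic is modulo 2^32, so the result is still right when
// the counter wrapped once between the two readings.
uint32_t elapsedCycles(uint32_t earlier, uint32_t later)
{
    return later - earlier;
}

// Same formula as in cpuLoadSleep(): load in tenths of a percent (0..1000).
uint32_t loadTenthsOfPercent(uint32_t busy, uint32_t sleep)
{
    uint32_t total = busy + sleep;
    assert(total >= 1000);          // otherwise total / 1000 would be zero
    return busy / (total / 1000);
}
```

With busy = 21,000,000 and sleep = 51,000,000 cycles (one second at 72 MHz), loadTenthsOfPercent returns 291, which the sketch prints as 29.1%.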
 
Thanks Jacob,

Stupid question: where do I insert my code? Where "// do other useful stuff...." is, or in the wasteSomeTime function? Obviously I need to add all the relevant stuff in the setup too.

Cheers !
 
You only have to call cpuLoadSleep() when your application is idle; the CPU will then stop until an interrupt starts it again. The timer and the wasteSomeTime function are only for testing and are not needed.
 
Free ram ;)
Code:
// ***********************************************************************************************************
// *  F R E E  R A M
// * Function from the sdFat library (SdFatUtil.cpp) licensed under MIT.
// * Full credit goes to Bill Greiman.
// ***********************************************************************************************************
extern "C" char* sbrk(int incr);
extern char *__brkval;
extern char __bss_end;
int FreeRam()
{
  char top;
  return __brkval ? &top - __brkval : &top - &__bss_end;
}

void setup() {
  Serial.begin(9600);
  while (!Serial && millis() < 5000) {} // wait for Arduino Serial Monitor
  
  Serial.print("FreeRam "), Serial.println(FreeRam());
}

void loop() {
  // put your main code here, to run repeatedly:
  //    ):
}
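
As a rough mental model of what FreeRam() measures: it takes the address of a local variable (which sits close to the current top of the stack) and subtracts the current end of the heap, or the end of .bss if the heap has never been used. The sketch below redoes that subtraction with plain integers so it can be tried on a PC. The name freeRamModel and the addresses used with it are invented for illustration and are not real Teensy values.

```cpp
#include <cassert>
#include <cstdint>

// Host-side model of the FreeRam() subtraction, using integer addresses.
// stackTop : address of a local variable (approximates the stack pointer)
// heapEnd  : current heap break, or 0 if the heap has never been used
// bssEnd   : first address after the statically allocated data
int32_t freeRamModel(uint32_t stackTop, uint32_t heapEnd, uint32_t bssEnd)
{
    uint32_t lowerBound = heapEnd ? heapEnd : bssEnd;
    assert(stackTop >= lowerBound);   // stack and heap must not have collided
    return static_cast<int32_t>(stackTop - lowerBound);
}
```

This is also why the reported value depends on where the function is called: a deeper call chain moves the stack top down, and heap allocations move the heap end up.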
 
**cpuload

Empty :
96MHz -> 22.1%
72MHz -> 29.2%
48MHz -> 43.5%
24MHz -> 85.7%

Project :
96MHz -> 38.4%
72MHz -> 41.3%
48MHz -> 58.9%
24MHz -> 94.8%

Comments:
My project involves a Bluefruit module and for some reason the serial port is not available simultaneously. I sent the CPU load via a ble.printf command and it worked fine.
I noticed that it takes a bit of time (~6 readings) before the value stabilises. Is that normal?
The CPU usage value for the empty sketch looks rather high (22%) and much higher than previously calculated with the first method I tested (6%). How come?
Apart from that everything makes sense, thanks Jacob!


**FREE RAM:
empty: 60587
project1: 60576

Comment: I don't understand why the free RAM is only worked out in the setup. It would make more sense if it was in the loop, right?
 
The idle CPU load at 72 MHz should be about 0.8% due to systick overhead; otherwise something is wrong. If you comment out the itimer.begin line in my example, it should be almost zero.
Of course, you should call FreeRam from the loop if you want a running update. It would be natural to print CPU load and RAM usage in the same statement.
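
A minimal sketch of such a combined line, written as a plain C++ helper so it can be tried on a PC. formatStatus is a hypothetical name; on the Teensy you would pass it cpu_load and the value returned by FreeRam(), then send the resulting text to Serial.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Build one status line from the load in tenths of a percent and the
// free RAM in bytes, e.g. "cpu load: 29.1%  free RAM: 59883 bytes".
std::string formatStatus(uint32_t loadTenths, int freeRam)
{
    char buf[64];
    int n = std::snprintf(buf, sizeof buf,
                          "cpu load: %lu.%lu%%  free RAM: %d bytes",
                          static_cast<unsigned long>(loadTenths / 10),
                          static_cast<unsigned long>(loadTenths % 10),
                          freeRam);
    assert(n > 0 && n < static_cast<int>(sizeof buf));  // line fits the buffer
    return std::string(buf);
}
```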
 
Remove the wasteSomeTime() function if you included it. It's only a dummy CPU load for testing, and it could explain the result you got.

Remove this line too:

// create some cpu load
itimer.begin([] { wasteSomeTime(); }, 1000);

Like this:
Code:
#include "Arduino.h"

volatile uint32_t cpu_load = 0;
volatile bool has_result = false;

void cpuLoadSleep()
{
    uint32_t st;
    static uint32_t wt = 0;
    static uint32_t busy_time = 0;
    static uint32_t sleep_time = 0;

    busy_time += ARM_DWT_CYCCNT - wt;
    st = ARM_DWT_CYCCNT;
    __disable_irq();
    __asm volatile ("wfi \n");
    sleep_time += ARM_DWT_CYCCNT - st;
    wt = ARM_DWT_CYCCNT;
    __enable_irq();
    if ((busy_time + sleep_time) > F_CPU) {
        cpu_load = busy_time / ((busy_time + sleep_time) / 1000);
        busy_time = 0;
        sleep_time = 0;
        has_result = true;
    }
}


void setup(void)
{
    Serial.begin(9600);

    // enable cpu cycle counter
    ARM_DEMCR |= ARM_DEMCR_TRCENA;
    ARM_DWT_CTRL |= ARM_DWT_CTRL_CYCCNTENA;
}

void loop(void)
{
    if (has_result) {
        // cpu_load is a uint32_t (unsigned long on Teensy), hence %lu
        Serial.printf("cpu load: %lu.%lu%%\n\r", cpu_load/10, cpu_load%10);
        has_result = false;
    }

    // do other useful stuff....

    cpuLoadSleep();
}
 
Hey Jacob,

That's much better, but it's weird that CPU usage is higher at 96 MHz than at 72 MHz... I ran the whole thing twice (i.e. changing the clock speed up and down) and got exactly the same values.

empty:
96MHz -> 0.8 %
72MHz -> 0.9 %
48MHz -> 1.1 %
24MHz -> 1.9 %

project:
96MHz -> 17.4 %
72MHz -> 14.1 %
48MHz -> 17.3 %
24MHz -> 22.4 %

Chris O's snippet still reads 59943 free RAM even when I include it in the loop of my main sketch, and 60599 when I run it just as he posted it (I just moved Serial.println(FreeRam()); into the main loop and added a 500 ms delay). That's just a 656 byte difference. I have no point of comparison but it looks a bit small to me. The Teensy 3.2 is supposed to have 64k of RAM, so 4k is already taken by what I can only guess are housekeeping functions. Can my sketch use so little of the RAM?

Thanks !



PS: This is just the part of the project I used for testing RAM usage. It's not the whole project but it has most of the calculations. Can that use only 656 bytes of RAM ?

Code:
#include <Arduino.h>
#define MODE_LED_BEHAVIOUR          "DISABLE"
#define PACKET_SIZE                 20 
#define DECIMAL_PRECISION           100

const int upperTH[]={7389,4004,1127,3349,9643,9365,3364,7563};
const int lowerTH[]={3824,7059,8529,7941,5294,3529,1471,2647};
const int ledPin = 6;
const int batPin = 23; // pin23=A9
unsigned long startTime;
uint8_t vBat=0x00;
const float coeffR1[8][8]= {  {0.086412,0.524236,1.08109,1.088562,1.393154,1.555173,527.3333,210.3451},
                                    {0.073565,0.412292,0.794877,0.931471,1.565486,1.622306,534.037,211.3498},
                                    {0.224957,1.002921,1.303058,0.673881,1.186716,1.492633,538.3929,229.9034},
                                    {0.066919,0.429193,0.980065,1.186178,1.540448,1.536392,549.7037,215.0949},
                                    {0.050312,0.32972,0.819643,1.20045,1.67268,1.543971,545.6111,211.6036},
                                    {0.158642,0.986599,2.045174,1.697897,1.182272,1.329056,577.5926,222.6724},
                                    {0.036896,0.297687,0.857206,1.26532,1.628319,1.528661,547.037,214.6597},
                                    {0.042915,0.319565,0.866577,1.242249,1.618421,1.532652,546.7037,214.6335} };

float calibrateprobe(int probeId, int probeOutput){
  float x=(probeOutput-coeffR1[probeId][6])/coeffR1[probeId][7]; // transpose
  float result=((((coeffR1[probeId][0]*x + coeffR1[probeId][1])*x + coeffR1[probeId][2])*x + coeffR1[probeId][3])*x + coeffR1[probeId][4])*x + coeffR1[probeId][5];
  return result;
}

uint8_t batteryRead(int batPin){
// express battery voltage in % of battery charge
  const int VBAT_MIN=3500; //in mV
  const int VBAT_MAX=4100; //in mV
  int uBat = batteryVoltage(batPin);
  return (uBat-VBAT_MIN)*100/(VBAT_MAX-VBAT_MIN);  
}

int batteryVoltage(int batPin){
// calculate battery voltage
  const int VDIV=205;  // voltage divider factor * 100 (e.g. 2.05->205); with R1=R2=27K the factor is, in theory, 2
  return analogRead(batPin)*33*VDIV/1024;  // vBat is a 10 bit value (1024 channels) and the reference voltage is 3.3V
}



void setup()
{

  pinMode(ledPin, OUTPUT);
  startTime=millis();
}


void loop()
{
  float probeValue[8];
  probeValue[0]=1; 
  probeValue[1]=2; 
  probeValue[2]=3; 
  probeValue[3]=4; 
  probeValue[4]=5; 
  probeValue[5]=6;
  probeValue[6]=7; 
  probeValue[7]=8;

  long x=0;
  long y=0;
  long z=1;   // placeholder; must be non-zero because x/z and y/z divide by it
   uint16_t THx=x/z;
   uint16_t THy=y/z;
   
// Read timer and update battery voltage (every 2 min)
   unsigned long elapsedTime = millis()-startTime;
   uint16_t myTime = elapsedTime % 65535;
   if (myTime <1) {
    vBat = batteryRead(batPin);
   } 
   
   uint16_t s0 = constrain(probeValue[0]*100, 0, 65535);
   uint16_t s1 = constrain(probeValue[1]*100, 0, 65535);
   uint16_t s2 = constrain(probeValue[2]*100, 0, 65535);
   uint16_t s3 = constrain(probeValue[3]*100, 0, 65535);
   uint16_t s4 = constrain(probeValue[4]*100, 0, 65535);
   uint16_t s5 = constrain(probeValue[5]*100, 0, 65535);
   uint16_t s6 = constrain(probeValue[6]*100, 0, 65535);
   uint16_t s7 = constrain(probeValue[7]*100, 0, 65535);

// build and send data packet
   byte data[PACKET_SIZE];
    data[0] = highByte(elapsedTime);
    data[1] = lowByte(elapsedTime);
    data[2] = highByte(s0);
    data[3] = lowByte(s0);
    data[4] = highByte(s1);
    data[5] = lowByte(s1);
    data[6] = highByte(s2);
    data[7] = lowByte(s2);  
    data[8] = highByte(s3);
    data[9] = lowByte(s3);
    data[10] = highByte(s4);
    data[11] = lowByte(s4);
    data[12] = highByte(s5);
    data[13] = lowByte(s5);
    data[14] = highByte(s6);
    data[15] = lowByte(s6);
    data[16] = highByte(s7);
    data[17] = lowByte(s7);
    data[18] = vBat;
    data[19] = 0x00; 
    //Serial.write(data,PACKET_SIZE);
}
 
Didn't look at the source code you provided in detail, but as a general rule, unused variables are optimized out (removed) by the compiler. data[] gets filled but is never really used, so all of that might be discarded (along with the code that fills it).
 
good point.

I printed the end result (data[]) and I also sprinkled FreeRam() calls throughout the code (I realised that dynamic memory usage was... dynamic). I ended up with a minimum free RAM of 59883, that is to say 716 bytes of used RAM. Again, I am not sure how much I should be expecting, but I was thinking of something around a few kB.
The good news is that it remains the same over time. To me that means that the heap is not growing.
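
For a sense of scale, the arrays in the test sketch above can be tallied with sizeof. This is a back-of-the-envelope check meant to be run on a PC, assuming a 4-byte float (true on the Teensy's Cortex-M4 and on typical desktop compilers); the constant and function names are made up. As far as I know, const data on Teensy 3.x is placed in flash rather than RAM, which would keep coeffR1 out of the free-RAM figure, but treat that as an assumption to verify.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

static_assert(sizeof(float) == 4, "this tally assumes a 4-byte float");

// Sizes of the arrays declared in the test sketch.
constexpr std::size_t kCoeffBytes  = sizeof(float[8][8]);   // coeffR1
constexpr std::size_t kProbeBytes  = sizeof(float[8]);      // probeValue
constexpr std::size_t kPacketBytes = sizeof(uint8_t[20]);   // data[PACKET_SIZE]

// Stack bytes needed by the two local arrays in loop().
constexpr std::size_t loopArrayBytes()
{
    return kProbeBytes + kPacketBytes;
}
```

The coefficient table is 256 bytes and the two local arrays add up to 52 bytes, so a total of a few hundred bytes for this sketch does not look unreasonable.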

Thanks for your inputs.
 
I just tested my first version with the dummy CPU load (wasteSomeTime) and it behaves as expected. If you have blocking I/O or delays in your program, the CPU load might not scale with the clock frequency.

72 MHz cpu load: 29.1%
96 MHz cpu load: 22.0%
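
One way to sanity-check these two readings: the dummy load runs a fixed number of instructions every millisecond, so the busy cycles per millisecond should be about the same at both clock speeds, and the percentage should scale inversely with F_CPU. The helper below (hypothetical name busyCyclesPerMs, plain C++ for a PC) converts a measured load back into cycles.

```cpp
#include <cassert>
#include <cstdint>

// Busy CPU cycles per millisecond implied by a measured load.
// loadTenths : load in tenths of a percent (e.g. 291 for 29.1%)
// cpuHz      : CPU clock in Hz
uint32_t busyCyclesPerMs(uint32_t loadTenths, uint32_t cpuHz)
{
    assert(loadTenths <= 1000);
    uint32_t cyclesPerMs = cpuHz / 1000;
    // 64-bit intermediate so the multiplication cannot overflow
    return static_cast<uint32_t>(
        static_cast<uint64_t>(cyclesPerMs) * loadTenths / 1000);
}
```

29.1% at 72 MHz corresponds to 20,952 cycles per millisecond and 22.0% at 96 MHz to 21,120, which agree to within 1%.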
 