Measuring Code Efficiency; Measuring CPU utilization

Status
Not open for further replies.

JimKazmer

Well-known member
I'm writing some code to control external devices using the Teensy 4.0. I'd like to measure the overhead my code is placing on the Teensy.

I am using DMA extensively, SPIs, flexPWM, and a few timer interrupts. I'd like to quantify the impact of my code on the general performance of the Teensy, so I can either forewarn others who may eventually use my code libraries, or focus on improving my code relative to a quantitative measure.

Has anyone written anything I might be able to leverage? Does anyone have suggestions?

My initial thoughts seem simplistic; while my code is controlling the external devices, run some algorithm like:
* Calculate pi : how many digits in X minutes
* Move memory around : how many bytes in X minutes
Then run the same tests without controlling the external devices, and compare performance.
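A minimal sketch of the "move memory around" idea, for illustration only. The function name bytesCopiedIn and the 4 KB buffer size are assumptions, not taken from any existing Teensy benchmark. The clock is passed in as a callable so the same code can be tried on a PC; on a Teensy it would presumably be micros.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical benchmark: copy a buffer over and over until durationUs has
// elapsed, then return the number of bytes moved.
// nowUs is any callable returning a free-running 32-bit microsecond count.
template <typename Clock>
uint64_t bytesCopiedIn(Clock nowUs, uint32_t durationUs) {
  static uint8_t src[4096];
  static uint8_t dst[4096];
  volatile uint8_t sink = 0;  // read back the result to discourage the
                              // compiler from removing the copy
  uint64_t bytes = 0;
  const uint32_t start = nowUs();
  // Unsigned subtraction keeps the comparison valid across counter overflow.
  while (static_cast<uint32_t>(nowUs() - start) < durationUs) {
    std::memcpy(dst, src, sizeof(src));
    sink = dst[0];
    src[0] = static_cast<uint8_t>(sink + 1);  // change the data each pass
    bytes += sizeof(src);
  }
  return bytes;
}

// On a Teensy (assumed usage):
//   uint64_t n = bytesCopiedIn(micros, 1000000);  // bytes moved in 1 second
```

Running it once with the device code idle and once with it active gives the two numbers to compare.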

I was hoping someone has already developed such a benchmarking program for the Teensy.

Thanks,
 
A very simple measure is number of loops per second with your code active as a fraction of number of loops per second for an (almost) empty sketch.
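One possible way to get that number, as a rough sketch: count calls to loop() and report a rate about once per second. LoopRateMeter and loopRateFraction are hypothetical names introduced here. The timestamp is passed in so the logic does not depend on the Teensy core.

```cpp
#include <cstdint>

// Hypothetical helper: call tick() once per loop() with the current micros().
struct LoopRateMeter {
  uint32_t windowStartUs = 0;
  uint32_t count = 0;
  uint32_t lastRate = 0;  // loops per second over the last completed window
  bool started = false;

  // Returns true when a fresh reading is available in lastRate.
  bool tick(uint32_t nowUs) {
    if (!started) {
      started = true;
      windowStartUs = nowUs;
      count = 0;
      return false;
    }
    ++count;
    const uint32_t elapsedUs = nowUs - windowStartUs;  // safe across overflow
    if (elapsedUs >= 1000000u) {
      // Scale by the real window length, which can be slightly over 1 s.
      lastRate = static_cast<uint32_t>(
          static_cast<uint64_t>(count) * 1000000u / elapsedUs);
      count = 0;
      windowStartUs = nowUs;
      return true;
    }
    return false;
  }
};

// Loop rate with the code active as a fraction of the (almost) empty sketch.
inline float loopRateFraction(uint32_t loadedRate, uint32_t baselineRate) {
  return baselineRate ? static_cast<float>(loadedRate) / baselineRate : 0.0f;
}

// In a sketch (assumed usage):
//   LoopRateMeter meter;
//   void loop() { if (meter.tick(micros())) Serial.println(meter.lastRate); }
```

The baseline rate has to be recorded separately from a run of the nearly empty sketch.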
 
I usually put a pair of digitalWriteFast() into the code and watch the pulse width on my oscilloscope. One reason I really like using my scope is all sorts of anomalous behaviors tend to show up on the screen as flickering, ghosting, etc. But you do need a fast updating scope and familiarity with how to use it.
 
...fraction of number of loops per second for an (almost) empty sketch.
Yes, I am heading in that direction. I'd like to include functions from some of the (IO) subsystems on the Teensy to measure more than just the CPU impact.

I usually put a pair of digitalWriteFast() into the code and watch the pulse width on my oscilloscope.
I have a Rigol DS1054 (4 channels, 1 GSa/s sampling rate for one channel, and 50 MHz input bandwidth). Is that too low for what you are talking about?

So I'd make the edge go high near the top of the loop (trigger the scope on the rising edge), and be able to see/measure when the edge goes low near the bottom of the loop... letting the scope run, I'd see the variability play out on the screen. Is that right?
 
You can also measure within your code using millis(), micros(), or ARM_DWT_CYCCNT for clock cycle resolution.

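// Times are in microseconds. min and max are Arduino macros, so use other names.
uint32_t start, period, minPeriod = 0xFFFFFFFF, maxPeriod = 0;

void loop() {
  start = micros();
  // .... your stuff ....
  period = micros() - start;  // unsigned subtraction is okay even w/ overflow
  if (period > maxPeriod) maxPeriod = period;
  if (period < minPeriod) minPeriod = period;
}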
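For clock-cycle resolution the same pattern works with ARM_DWT_CYCCNT in place of micros(). The sketch below shows only the arithmetic. elapsedCycles and cyclesToNanoseconds are hypothetical helper names, and using F_CPU_ACTUAL for the core clock is an assumption about the Teensy 4.x core.

```cpp
#include <cstdint>

// Cycles elapsed between two readings of a free-running 32-bit cycle counter.
// Unsigned subtraction stays correct across one wraparound (about 7 seconds
// at 600 MHz), so the measured interval must be shorter than that.
inline uint32_t elapsedCycles(uint32_t startCycles, uint32_t endCycles) {
  return endCycles - startCycles;
}

// Convert a cycle count to nanoseconds for a core clock given in Hz.
inline double cyclesToNanoseconds(uint32_t cycles, uint32_t cpuHz) {
  return static_cast<double>(cycles) * 1e9 / static_cast<double>(cpuHz);
}

// On a Teensy 4.x (assumed usage):
//   uint32_t t0 = ARM_DWT_CYCCNT;
//   // .... your stuff ....
//   uint32_t cycles = elapsedCycles(t0, ARM_DWT_CYCCNT);
//   double ns = cyclesToNanoseconds(cycles, F_CPU_ACTUAL);
```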
 
I have a Rigol DS1054 (4 channels, 1 GSa/s sampling rate for one channel, and 50 MHz input bandwidth). Is that too low for what you are talking about?
No, higher bandwidth will help with narrow pulses of course, but you should see something with that bandwidth even for pretty fast pulses.
 
I have a Rigol DS1054 (4 channels, 1 GSa/s sampling rate for one channel, and 50 MHz input bandwidth). Is that too low for what you are talking about?

Rigol DS1054 should work. Just don't configure for deep memory, as that causes those cheap scopes to run terribly slow. The bandwidth and sample rate aren't the main limitation. How quickly the scope can capture & render waveforms to the screen is what's important. I believe those Rigol scopes offer up to 30,000 waveform/sec rendering. That's nowhere near the maximum 1,000,000 waveform/sec speed if you spend several thousand dollars on a high end Keysight scope, but 30,000 is still pretty good.... and realistically, if you're testing code which runs once every millisecond, it's 30 times as much as you really need.

I've used those Rigol scopes a few times, but usually only briefly at a hackerspace gathering while trying to help troubleshoot someone's project. My impression is their triggering is far from the capability you might expect from only the specs. So it may not trigger really reliably if you have only a tiny amount of code between the 2 digitalWriteFast() lines and you're running Teensy 4.x at 600 MHz or higher. But if you're testing most realistic cases with code that runs for 100 ns or more (probably much more in scenarios where you would want to measure & optimize), it'll probably work fine.
 
You can also measure within your code using millis(), micros(), or ARM_DWT_CYCCNT for clock cycle resolution.

Yes, I am familiar with that technique. Time required per quantum of work is likely the core principle behind such metrics. Knowing average loop time, max loop time, std-dev of loop time, and a histogram of iteration times are all very useful for characterizing the program's behavior.
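As a rough illustration of those four statistics, here is a small accumulator that could be fed one loop time per iteration. LoopTimeStats, the 16 bins, and the 10 microsecond bin width are all arbitrary choices made for this sketch. It uses Welford's running method, so no samples need to be stored.

```cpp
#include <cmath>
#include <cstdint>

// Hypothetical accumulator for loop times given in microseconds.
struct LoopTimeStats {
  static constexpr uint32_t kBins = 16;        // histogram bins (arbitrary)
  static constexpr uint32_t kBinWidthUs = 10;  // width of each bin (arbitrary)

  uint32_t n = 0;
  uint32_t maxUs = 0;
  double mean = 0.0;
  double m2 = 0.0;  // running sum of squared deviations from the mean
  uint32_t histogram[kBins] = {};

  void add(uint32_t periodUs) {
    ++n;
    if (periodUs > maxUs) maxUs = periodUs;
    // Welford's update of mean and variance.
    const double delta = periodUs - mean;
    mean += delta / n;
    m2 += delta * (periodUs - mean);
    // Everything past the last bin is counted in the last bin.
    uint32_t bin = periodUs / kBinWidthUs;
    if (bin >= kBins) bin = kBins - 1;
    ++histogram[bin];
  }

  // Sample standard deviation; 0 until at least two samples have been added.
  double stddev() const { return n > 1 ? std::sqrt(m2 / (n - 1)) : 0.0; }
};
```

In a sketch this would be called as stats.add(period) right after the loop time is measured.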

I was hoping that someone had figured out some standard way of generating one or more metrics that could be used across different programs, yet give a relatively useful basis for comparison (metric examples listed below). I think such metrics are possible, yet they require someone to do some deep thinking about benchmarking and applying it to different aspects of the Teensy. For example, I could write a library with a simple-to-use interface to track and report "average loop-time, max loop time, std-dev loop time, and histogram of iteration-times"; but these are not the types of metrics I am discussing.

For my current work, writing peripheral libraries that are meant to do something helpful (and ideally 100% in the background), it's helpful to know how resource-intrusive they are to the rest of the program.

I think what I will end up doing is measuring the resource cost of running my "library" at various levels (e.g. 1 device, 2 devices, ... 20 devices).
Hopefully, I can measure:
* Impact on CPU (e.g. 6% CPU cycles consumed)
* Impact on interrupts (e.g. how many and which interrupts are being used)
* Impact on memory (e.g. how much and which type)
* Impact on DMA (e.g. DMA utilization 50%... that may seem crazy, but I'm moving a lot of data without any CPU involvement)
** A high DMA utilization must have some adverse impact on the program... How do I measure/quantify that?
* other resources that I should try to assess my impact on?

As you can see, in this case I don't care about loop times. I am trying to measure what my library's impact will be on other people's loop times (or my loop times for code I haven't written yet).
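One way the "6% CPU cycles consumed" style of figure could be derived, as a sketch only: run the same fixed-duration background workload with the library idle and with it driving N devices, then compare the iteration counts. cpuOverheadPercent is a hypothetical name, and the approach assumes the workload is otherwise identical between the two runs.

```cpp
#include <cstdint>

// Percentage of background throughput lost when the library is active.
// baselineIterations: workload iterations completed with the library idle.
// loadedIterations:   iterations completed in the same time with it running.
inline double cpuOverheadPercent(uint64_t baselineIterations,
                                 uint64_t loadedIterations) {
  if (baselineIterations == 0) return 0.0;  // no baseline, nothing to report
  if (loadedIterations >= baselineIterations) return 0.0;  // treat as noise
  return 100.0 * (baselineIterations - loadedIterations) / baselineIterations;
}
```

Repeating the measurement at 1, 2, ... 20 devices would give a cost curve for the library. If the workload's inner loop also touches memory, the same ratio may pick up some of the bus contention caused by DMA, though that is a guess and not something measured here.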
 