I just change something and measure the elapsed micros to see which version is faster. The unrolled version was twice as fast, so I kept it. It doesn’t matter what anyone says: if it’s faster, it’s faster.
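In case it's useful, the timing harness is basically this. It's a desktop sketch using std::chrono in place of the board's micros(), and the work function is just a stand-in, not my real calculation:

```cpp
#include <cassert>
#include <chrono>

// Hypothetical stand-in for the real calculation being timed.
static float WorkUnderTest() {
    float acc = 0.0f;
    for (int i = 0; i < 60; ++i) acc += static_cast<float>(i) * 0.5f;
    return acc;
}

// Time n calls and return the average elapsed microseconds per call,
// the same idea as reading micros() before/after a loop on the board.
double AverageMicrosPerCall(int n) {
    float acc = 0.0f;
    auto t0 = std::chrono::steady_clock::now();
    for (int i = 0; i < n; ++i) acc += WorkUnderTest();
    auto t1 = std::chrono::steady_clock::now();
    volatile float sink = acc;  // keep the compiler from discarding the work
    (void)sink;
    std::chrono::duration<double, std::micro> us = t1 - t0;
    return us.count() / n;
}
```

On the board I just subtract two micros() readings instead; the structure is the same.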
However the code before was...
Ah, I missed that. Thanks, will fix. It doesn’t really do anything because the signals are usually in the thousands. It’s just there so I don’t divide by zero, but that’s pretty much impossible anyway.
You mean "std::max(max, 10.0f)"?
That’s intentional for my use case. I don’t care about max values below 80 and 10 is just noise.
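To be concrete about what that guard is doing, here's a rough sketch (made-up names and a fixed 60-sample signal, not my exact code): clamp the peak to a floor of 10 before dividing, so a near-silent signal can't blow up the normalisation.

```cpp
#include <algorithm>
#include <array>
#include <cassert>

// Clamp the peak to a noise floor of 10 before dividing, so a quiet
// signal can't cause a divide-by-(almost-)zero in the normalisation.
std::array<float, 60> Normalise(std::array<float, 60> signal) {
    float max = 0.0f;
    for (float v : signal) max = std::max(max, v);
    max = std::max(max, 10.0f);   // anything below 10 is just noise
    for (float& v : signal) v /= max;
    return signal;
}
```

With the real signals up in the thousands, the clamp never actually fires; it only matters for an all-quiet input.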
I used a regular for loop but it was way slower so I unrolled it manually. I...
However, a big part of the optimisation is that I only process every other sensor reading, since the accuracy requirements for this calculation aren't that strict. So one time I process one sensor pair, and the next time I...
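The every-other-sample idea looks roughly like this (a sketch with made-up names and sizes, not my actual code): step the comparison loop by 2 so only half the samples are touched.

```cpp
#include <array>
#include <cassert>

// Sum of absolute differences against the template, skipping every
// other reading. Half the work, at the cost of some resolution.
float AbsDiffEveryOther(const std::array<float, 60>& signal,
                        const std::array<float, 60>& tmpl) {
    float sum = 0.0f;
    for (int i = 0; i < 60; i += 2) {  // only every other sample
        float d = signal[i] - tmpl[i];
        sum += d < 0.0f ? -d : d;
    }
    return sum;
}
```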
Sure:
```cpp
float CrossCorrelation::ComputeHead(int sensorID) {
    float max = 0.0f;
    // unrolled loop: the "0" index is incremented by hand on each line,
    // as a for loop counter would be
    m_targetSignal =...
```
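To show what I mean by the unrolling comment: the index a for loop would increment is just written out literally, line by line. This is a generic illustration, not the original ComputeHead body:

```cpp
#include <cassert>

// Manually unrolled max-scan over the first eight elements: the index
// that would be "i" in a for loop is spelled out on each line.
float MaxOfFirstEight(const float* s) {
    float max = 0.0f;
    if (s[0] > max) max = s[0];
    if (s[1] > max) max = s[1];
    if (s[2] > max) max = s[2];
    if (s[3] > max) max = s[3];
    if (s[4] > max) max = s[4];
    if (s[5] > max) max = s[5];
    if (s[6] > max) max = s[6];
    if (s[7] > max) max = s[7];
    return max;
}
```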
I timed 10 of these calculations and printed the result; it's like 10 times faster lol.
Thanks for the link and the suggestion for the optimization. This knowledge will come in handy later when I have to do a lot...
Well I should be able to save like 1000 cycles per sensor pair and that is a lot. I'm doing this for 5 sensor pairs in between the sensor readings and I only have 5 microseconds of total time to calculate a sensor pair...
I will try it, it would explain why it is so slow. I was expecting a lot fewer cycles. I assumed it was all the comparisons for the loop. Unrolling the loop made it twice as fast but if this works and is much faster...
Really? In that case it should make a big difference. I think I read in a forum post that it was 1 cycle for division/multiplication/addition/subtraction with floats and 2 cycles for doubles. I believe it...
I'm reading sensor pairs and processing the two previous pairs so it's as good as it can get I think. I'm doing all calculations while waiting for sensor reads. Problem before was I had about 5 micros before the sensor...
After pruning so that only every other sample is used, and unrolling, it's now really fast. Went from 3.5 to below 1 micros per call. It's fewer elements, but sufficient for my use case.
std::array was faster, but only negligibly....
I've solved it (sort of). Made it go from 3.5 microseconds per sensor to 1.7, which is good enough. I got rid of one of the loops by folding the normalisation into the function. I tried various things but unrolling the...
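The loop-fusion part looks roughly like this (a sketch with illustrative names and sizes, not the actual code): instead of a normalise pass that writes out a scaled copy and then a compare pass, I scale inline inside the comparison loop, so there's one loop fewer over the data.

```cpp
#include <algorithm>
#include <array>
#include <cassert>

// Normalisation folded into the comparison: divide by the (clamped)
// peak inside the abs-diff loop instead of in a separate pass.
float FusedAbsDiff(const std::array<float, 60>& signal,
                   const std::array<float, 60>& tmpl) {
    float max = 0.0f;
    for (float v : signal) max = std::max(max, v);
    const float inv = 1.0f / std::max(max, 10.0f);  // noise floor guard
    float sum = 0.0f;
    for (int i = 0; i < 60; ++i) {
        float d = signal[i] * inv - tmpl[i];
        sum += d < 0.0f ? -d : d;
    }
    return sum;
}
```

Multiplying by the reciprocal once also avoids doing a divide per element, which matters on the FPU.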
These are the functions that run continuously, and I'm wondering if I'm doing anything stupid that wastes cycles, because it's pretty slow right now.
I'm normalising a 1D signal of 60 elements and...
I wrote a class that does some calculations. It compares a 1D signal with a template of a peak and calculates the absolute difference after normalisation. I have a function that triggers based on a threshold, but I want...
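In outline the class looks like this (a minimal sketch with made-up names, sizes, and noise floor, not the real code): normalise the 60-element signal, sum the absolute difference against the peak template, and trigger when the difference is small enough.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>

// Sketch of the class described above: template match by normalised
// absolute difference, with a threshold-based trigger.
class PeakMatcher {
public:
    PeakMatcher(const std::array<float, 60>& tmpl, float threshold)
        : m_template(tmpl), m_threshold(threshold) {}

    // Sum of |normalised signal - template|.
    float Difference(const std::array<float, 60>& signal) const {
        float max = 0.0f;
        for (float v : signal) max = std::max(max, v);
        const float inv = 1.0f / std::max(max, 10.0f);  // noise floor guard
        float sum = 0.0f;
        for (std::size_t i = 0; i < signal.size(); ++i) {
            float d = signal[i] * inv - m_template[i];
            sum += d < 0.0f ? -d : d;
        }
        return sum;
    }

    // A close match (small difference) counts as a detection.
    bool Triggered(const std::array<float, 60>& signal) const {
        return Difference(signal) < m_threshold;
    }

private:
    std::array<float, 60> m_template;
    float m_threshold;
};
```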