Floats... don't use 'em!

Status
Not open for further replies.

skidd

Member
Just wanted to share the outcome of a two-week-long battle to find a bug in my project.

Cliff Notes: Don't underestimate the CPU cost of doing even simple floating point operations.

I'm building a POV clock, spinning at about 1000rpm, with 32 LEDs and 360 rotation segments. Needless to say, my timings need to be pretty accurate. I was using a "float" to keep the decimal precision when dividing the total time per rev in microseconds by 360. I needed that decimal accuracy because without it the timing would drift as the arm swung around closer to the end of its rotation. A simple single float operation doing just 1 divide. Well... I'd been struggling to find a strange "judder" issue. It was especially strange because it seemed to start out perfectly smooth, then slowly get worse and worse. It's worth noting that I'm using TeensyThreads to offload the much, much slower image generation for some of the animations. So I spent a good deal of time tweaking the thread timing, thinking that was where the problem was.

Finally, after stripping down the clock to its bare functionality of just lighting up the LEDs (360x/rev), the judder was still there. This is when I finally just dropped the "float" in favor of multiplying the int values by 100, since I only really needed 1-2 decimals of precision. Voila!! Judder gone.
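For anyone wanting to try the same trick, here is a minimal sketch of the scaled-integer idea. The project code was not posted, so every name here (kSegments, kScale, segmentTimeScaled and so on) is hypothetical, and it assumes one revolution takes well under about 42 seconds so the 32-bit multiply cannot overflow.

```cpp
#include <stdint.h>

// Hypothetical names; the original project code was not posted.
static const uint32_t kSegments = 360;  // rotation segments per revolution
static const uint32_t kScale    = 100;  // keep two decimal places in an integer

// Time per segment in hundredths of a microsecond (16666 means 166.66 us).
// Assumes revMicros * kScale fits in 32 bits (revolution shorter than ~42 s).
static inline uint32_t segmentTimeScaled(uint32_t revMicros) {
    return (revMicros * kScale) / kSegments;
}

// Offset in whole microseconds from the start of the revolution to segment k,
// computed from the scaled per-segment time so the fraction is not lost.
static inline uint32_t segmentStartMicros(uint32_t revMicros, uint32_t k) {
    return (segmentTimeScaled(revMicros) * k) / kScale;
}

// Alternative: multiply first and divide last, so nothing is rounded
// until the final step.
static inline uint32_t segmentStartDirect(uint32_t revMicros, uint32_t k) {
    return (revMicros * k) / kSegments;
}
```

At 1000rpm (60000us per rev) a plain integer divide gives 166us per segment, which puts the last segment about 240us early. The scaled version lands within a few microseconds, and the multiply-first version is only off by the final truncation.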

My conclusion has to be that the float was expensive enough to slow down each of the 360 output steps (approx 160us @ 1000rpm) by just enough that the arm had moved too far along between grabbing the "micros()" value and doing the math on it. I knew the Teensy 3.2 didn't have an FPU on board, but I guess I just figured one float should not have cost that much CPU. Alas, it did.

So.. avoid floats.. if you can. I know I will in the future.
 
Grabbing micros() is a bit expensive too. If you could adapt to the MCU cycle counter, you can track 'time' to the nearest F_CPU cycle with what seems to be a simple memory read. There are other posts about turning it on and then using it. The one I did here might be enough to get you a resolution of 1/F_CPU (i.e. 1/96,000,000 s) without the overhead of micros(), which stops interrupts, resolves the ticks and does the rounding.

Code:
uint32_t micros(void)
{
	uint32_t count, current, istatus;

	__disable_irq();
	current = SYST_CVR;
	count = systick_millis_count;
	istatus = SCB_ICSR;	// bit 26 indicates if systick exception pending
	__enable_irq();
	 //systick_current = current;
	 //systick_count = count;
	 //systick_istatus = istatus & SCB_ICSR_PENDSTSET ? 1 : 0;
	if ((istatus & SCB_ICSR_PENDSTSET) && current > 50) count++;
	current = ((F_CPU / 1000) - 1) - current;
#if defined(KINETISL) && F_CPU == 48000000
	return count * 1000 + ((current * (uint32_t)87381) >> 22);
#elif defined(KINETISL) && F_CPU == 24000000
	return count * 1000 + ((current * (uint32_t)174763) >> 22);
#endif
	return count * 1000 + current / (F_CPU / 1000000);
}
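As a rough sketch of how cycle-counter timing could replace micros() for the segment steps: the helpers below only do the arithmetic and are hypothetical names, not part of the Teensy core. On a Teensy 3.x the counter value itself would, as far as I know, come from reading ARM_DWT_CYCCNT after enabling it through ARM_DEMCR and ARM_DWT_CTRL; check your core headers before relying on that.

```cpp
#include <stdint.h>

// Cycles elapsed between two counter reads. Unsigned subtraction stays
// correct across a single wrap of the 32-bit counter.
static inline uint32_t cyclesElapsed(uint32_t start, uint32_t now) {
    return now - start;
}

// Convert an interval in microseconds to CPU cycles at the given clock.
// Assumes cpuHz is a whole number of MHz and the product fits in 32 bits.
static inline uint32_t microsToCycles(uint32_t us, uint32_t cpuHz) {
    return us * (cpuHz / 1000000u);
}

// True once at least `interval` cycles have passed since `start`.
// `now` is assumed to be a fresh read of the cycle counter.
static inline bool intervalPassed(uint32_t start, uint32_t now, uint32_t interval) {
    return cyclesElapsed(start, now) >= interval;
}
```

At 96MHz the 32-bit counter wraps roughly every 44 seconds, so this only works for intervals shorter than that, which is no problem for steps of about 160us.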
 
Thanks for that.. I might try and use some of that to allow for as much CPU time on the rendering thread as I can muster.
 
For your application, using scaled integers works.

Alternatively, spend $5-ish more and get a Teensy 3.5, which has hardware floating point and a faster clock (or $10 more to get the Teensy 3.6). Note: specifically use float and not double. If you use the math functions, be sure to use the functions with an 'f' suffix that take a float input and produce a float output. If you use the double math functions (i.e. without the 'f' suffix), the Teensy 3.5/3.6 will need to emulate the double precision arithmetic in software.
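To illustrate the float-versus-double point, here is a small sketch. The helper names are made up for the example, and the POV angle calculation is just an assumed use case. The thing to watch is that a literal without the 'f' suffix is a double, which quietly pulls the whole expression into double precision.

```cpp
#include <math.h>
#include <type_traits>

// Hypothetical helper: angle of segment k in radians, single precision only.
// Note the 'f' suffix on every literal.
static inline float segmentAngle(unsigned k) {
    return (float)k * (6.2831853f / 360.0f);
}

// sinf takes a float and returns a float, so no double math is involved.
static inline float xComponent(unsigned k) {
    return sinf(segmentAngle(k));
}

// Compile-time checks of the types involved.
static_assert(std::is_same<decltype(sinf(1.0f)), float>::value,
              "sinf stays in float");
static_assert(std::is_same<decltype(1.0f * 2.0), double>::value,
              "a bare 2.0 literal promotes the expression to double");
```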
 
In hindsight.. I should have gone with the Teensy 3.5 for sure. To be honest, I didn't realize the amount of CPU I was going to need until I got well into the software. I have this little 3.2 overclocked to 120MHz and it's keeping up, but I think only just. The stock 72 or 96MHz clock speeds were not enough for what I'm asking of it. But.. I've got it singing along pretty nicely now. And to think, I originally thought a little ATmega328P was going to have enough power. HA!!
 