PDA

View Full Version : millis() on teensy 3



manitou
01-31-2013, 01:49 AM
I just got my Teensy 3.0, tried some simple UNO programs. The little fibonacci benchmarks ran, but
it reported 0 ms for times?? (I know it's not taking 0 ms). is there some magic for millis()?
micros() reports good timings

Here is the UNO sketch.


unsigned long fibo (unsigned int n) {
if (n < 2) {
return (n);
}
else {
return fibo(n - 1) + fibo(n - 2);
}
}
void setup() {
Serial.begin(9600);
}

void loop() {
int n;
long result;
long startTime;
long endTime;
long ms;


for (n = 0; n <= 26; n++) {
Serial.print("fibo- ");
Serial.print(n);
startTime = millis();
result = fibo(n);
endTime = millis();
ms = (endTime - startTime);
Serial.print(" "); Serial.print(result);
Serial.print(" "); Serial.println(ms);
}
delay(2000);
}

Green
01-31-2013, 03:25 AM
nothing wrong for 0ms, because who action finished under 1 Milli second.

manitou
01-31-2013, 11:58 AM
nothing wrong for 0ms, because who action finished under 1 Milli second.

Nope. fibo(26) takes 84ms (as measured by micros() at 96MHz)
I'm using beta11 IDE.

looking at the dis-assembled listing of code, the compiler is doing strange things
with in-line millis() ... sigh

PaulStoffregen
01-31-2013, 02:37 PM
Wow, that is indeed very strange. It sure seems like a compiler bug. Maybe it's some obscure quirk (or "feature") of the C & C++ language?

Anyway, I did some fiddling with core_pins.h and this seems to fix it.



static inline uint32_t millis(void) __attribute__((always_inline, unused));
static inline uint32_t millis(void)
{
volatile uint32_t ret = systick_millis_count; // single aligned 32 bit is atomic;
return ret;
}


Please confirm if this works for you?

manitou
01-31-2013, 03:09 PM
Paul, your new code fixed the millis() problem.
I suspect it had something to do with C losing its notion of volatile for the millis counter ...


thanks

tni
02-08-2013, 06:43 PM
Nope. fibo(26) takes 84ms (as measured by micros() at 96MHz)
I'm using beta11 IDE.

looking at the dis-assembled listing of code, the compiler is doing strange things
with in-line millis() ... sigh

Looking at the dis-assembled code, the compiler is doing the the volatile systick_millis_count reads, but reordering the 'fibo' call - which it is allowed to do, since the function has no side effects.

A snippet of the disassembled code:
504: f7ff ffe8 bl 4d8 <_Z4fiboj>
508: 682e ldr r6, [r5, #0]
50a: 6829 ldr r1, [r5, #0]

504 is the 'fibo' call, result will be in r0 (not used). 508: startTime read from 'systick_millis_count' into r6. 50a: endTime read from 'systick_millis_count' into r1.

Declaring 'result' volatile results in:
504: f7ff ffe8 bl 4d8 <_Z4fiboj>
508: 682e ldr r6, [r5, #0]
50a: 9001 str r0, [sp, #4]
50c: 6829 ldr r1, [r5, #0]

504 is the 'fibo' call, result will be in r0. 508: startTime read from 'systick_millis_count' into r6. 50a: 'fibo' result from r0 written to the stack variable allocated for 'result'. 50c: endTime read from 'systick_millis_count' into r1. What the compiler is doing is perfectly legal and it properly preserves the ordering of the volatile accesses.

You really want to use a compiler barrier to prevent the reordering:
...
asm volatile("" ::: "memory");
result = fibo(n);
asm volatile("" ::: "memory");
...

So, Paul your fix 'works' by accident, if the compiler gets a bit smarter with optimization, it may break.

PaulStoffregen
02-08-2013, 07:05 PM
So, Paul your fix 'works' by accident, if the compiler gets a bit smarter with optimization, it may break.

For the moment, I'm happy it works....

Long term, how about putting a pair of 'asm volatile("" ::: "memory");' around the read inside the inline millis() function?

tni
02-08-2013, 07:32 PM
For the moment, I'm happy it works....

Long term, how about putting a pair of 'asm volatile("" ::: "memory");' around the read inside the inline millis() function?
That will work, but I think this is a user error (nothing wrong with your original function). On the other hand, it will help to keep support questions down - always a good thing... (And the performance impact should be quite small.)

manitou
05-21-2016, 03:24 PM
Deja vu

Well, I re-ran this old benchmark, and delta millis() was again 0s! I know that I had been admonished above for not providing a memory barrier, but Paul's adjustment to core_pins.h fixed it, so I didn't do anything. Now it's back because of May 2015 issue (https://github.com/PaulStoffregen/cores/commit/3bc1d5f9e8c76af26b8d8f8dcee418cfa78d8efe) that noted the volatile as redundant. I re-applied Paul's adjustment and it sill fixes timings by preventing the code re-ordering ... whatever....

stevech
05-21-2016, 06:21 PM
Nit, but relevant: original post's code used 'long' for time values whereas timers, counters, ticks, etc. are not arithmetic values; they are counts. So unsigned long or uint32_t complies.
Not an issue with count values less than 2^31.

PaulStoffregen
06-04-2016, 04:41 AM
I've returned the original seemingly-redundant millis()

https://github.com/PaulStoffregen/cores/commit/9b06bacb72f73b01e00eb278bc0bb043b5e6c5ac

This change didn't make it into 1.29-beta2, but it will be in the final 1.29 release. The code now has a lengthy comment explaining the issue, with a link to this thread, so hopefully it won't be optimized away again without checking this case.

Ros3
06-04-2016, 06:03 AM
I find this rather interesting. If I'm understanding correctly, the compiler is re-ordering non-consequential commands (for faster processing, I assume. It looks like it wants to keep everything in the cache rather than going to RAM, which is rather smart, seeing how SLOW RAM is when you're down at the processor's clock speed), but in this particular case, it moved the 2 millis() calls close enough that their difference was 0. huh, I guess in the real world, I don't care exactly the order my computer does work in, so long as it all gets done in a reasonable amount of time.
Would this ever cause a problem in real-world scenarios? It seems to only happen because nothing of worth is really changed between the 2 calls to millis(); (or am I just interpreting this incorrectly?)

Frank B
06-04-2016, 08:37 AM
Sorry, - i didn't notice this thread - at least for gcc 5 the "bug" only exists for "Optimize for size" (did not look @ older versions) , i don't think it is a good idea to fix it this way.
I'm sure, not millis() itself is the problem-
Changing the sketch
volatile long startTime; solves the "problem", too, so editing milis() to use volatile and redundant accesses is not more than a workaround.
We should invest some more time in this to find the real reason for this behavior. And perhaps using some lines assembler instead of C are the better way to solve the problem.

Frank B
06-04-2016, 09:29 AM
yes, i think the sketch is problem. nobody says the compiler that fibu() has to be called between the both millis() - they are all compkletely independent. if you want that, you have to edit the sketch, but not millis().

Changing millis() affects ALL sketches, and perhaps leads to more generated (and thus slower) code than needed.

Frank B
06-04-2016, 05:58 PM
I'd prefer something like this:


static inline uint32_t millis(void)
{
uint32_t r = systick_millis_count;
asm ("": : :"memory");
return r;
}

I think this gives the compiler a bit more room for optimizations than the very rigid double - "volatile". At least at higher levels than -Os (the only case were the "problem" occurs - I - still - think, that it is a problem with the skecth, not with millis() )