Cxxmark Compiler Performance LC /3.1

Frank B

Senior Member
Hi,

I ported the Coremark Benchmark to Arduino.
My goal was to get a reliable comparison of the performance between different compiler-versions.
I did a first run today with the newest "GCC4.9.3 20150529 (release) [ARM/embedded-4_9-branch revision 224288]" toolchain from Launchpad, which is maintained by ARM employees.

Tomorrow, I want to run the same tests with the "official" Teensyduino compiler version, which is 4.8.4.
Edit: Done.

These are the values:
[Image: results table, 2015-06-28 (coremark_teensy.xlsx)]
Disclaimer:
Important: These are NOT Coremark values. I had to change the source a little to be able to compile it for Arduino. Therefore, I am not allowed to call my test "Coremark" (if I understood the licence correctly). And unfortunately, I'm not allowed to publish the source (I think).

The main change was to rename the files from *.c to *.cpp and to replace printf with Serial.printf. Neither change should have any impact, but to be sure: please don't call it Coremark.

I learned that -O3 is not good for our targets. I think there is some work waiting for the GCC developers...
I'll do more tests in the next few days with more detailed optimization options.
 

I updated the table.
What I found:
- 4.9.3 is not better than 4.8.4 with this test
- Using -O3 is not good

So, for performance, there is no reason to update.
With 4.9, however, Newlib is newer and has more functions.
 
I found a slightly better set of options:
Code:
-O2 -mslow-flash-data -finline-functions -funswitch-loops -fpredictive-commoning -fgcse-after-reload  -fvect-cost-model -ftree-partial-pre -fipa-cp-clone

It is a subset of the flags that -O3 adds on top of -O2, plus -mslow-flash-data. The -O3 -ftree-* optimizations (or one of them; maybe I'll test this in more detail someday) seem to have a negative impact.
For 4.9.3 (4.8 not tested), this is the fastest set I have found so far. I find it remarkable that -mslow-flash-data indeed has a small positive effect. Perhaps due to the cache? It's only 32 bytes, correct?


My results are only valid for my "Coremark sketch"; you may get different results with your code.
 
Update with GCC 5.2.1

[Image: coremark.jpg, updated results table]

I have not tested -mslow-flash-data with GCC 5.2.1.

- Using -flto can save several KB of space, but it has a negative influence on speed.
- With GCC 5.2, -O1 is not good.
- With GCC 5.2, -O2 gives the best (fastest!) result of all tests so far: 301 iterations/s. Keep in mind, though, that the older tests were run with older Teensyduino versions, which may have an influence.
- Even the code size favors -O2: it creates smaller code than -O1!
- With GCC 5.2, -Os is slower than with GCC 4.8 and generates slightly more code.
- The difference between -Os and -O2 gives no reason to use -Os: -Os generates slightly smaller code, but it is much slower.

For detailed results, see the picture. I can upload the Excel sheet on request.

Important: My Coremark sketch crashes with -flto in combination with -Os.

Disclaimer: Remember, you may get different results with other benchmarks.
 
Any chance you could add -Og if you do more testing? Someday we might start using -Og quite often, regardless of its performance or code size...
 
GCC 5.3.1 20160307 (release) [ARM/embedded-5-branch revision 234589]

With -O2 it is faster again; this is now the fastest value measured: 305 iterations/s at 144 MHz (code size 41408 bytes).

I couldn't test with -Os or -Og (fctv() is missing here).
 