
Originally Posted by Frank B
..and it is great.
Oh, good to know! Does always_inline override that?
Just in case it isn't clear, GCC can only switch the options used at the function boundary level. So if you have:
Code:
__attribute__((__optimize__("O3,unroll-loops")))
void fast_inner (...)
{
}

void slow_outer (...)
{
  // ...
  if (test) {
    fast_inner ();
  }
  // ...
}
There is no support in the compiler to change the optimization level or target options inside the if statement. You can only change things at the function level. That means with __always_inline__, you lose the optimization attributes: once fast_inner is inlined, its body becomes part of slow_outer and is compiled with slow_outer's options.
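If the point is for the inner function to keep its own options, one approach is to mark it noinline instead, so it stays a separate function. This is a minimal sketch assuming GCC; fast_sum, the parameters, and the loop body are made up for illustration and are not from the code above.

```c
/* Hypothetical hot loop.  noinline keeps it a separate function, so the
   optimize attribute still governs how its body is compiled.  */
__attribute__((noinline, optimize("O3,unroll-loops")))
long fast_sum (const long *v, int n)
{
  long total = 0;
  for (int i = 0; i < n; i++)
    total += v[i];
  return total;
}

/* Compiled with whatever options were given on the command line.  */
long slow_outer (const long *v, int n, int test)
{
  long result = 0;
  if (test)
    result = fast_sum (v, n);
  return result;
}
```

The cost is a real call on every use, so this only pays off when the work inside the function outweighs the call overhead.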
I should mention my motivation for adding it was more from the hosted compiler perspective and not the embedded perspective. At AMD and now at IBM, you have generations of processors. Typically most code is compiled for the least common denominator, but for performance-critical code, you might want one function to be compiled with special options.
For the hosted folk who have full shared library support, there is now the target_clones attribute:
Code:
/* Power9 (aka, ISA 3.0) has a MODSD instruction to do modulus, while Power8
   (aka, ISA 2.07) has to do modulus with divide and multiply.  Make sure
   both clone functions are generated.

   Restrict ourselves to Linux, since IFUNC might not be supported in other
   operating systems.  */
__attribute__((target_clones("cpu=power9,default")))
long mod_func (long a, long b)
{
  return a % b;
}

long mod_func_or (long a, long b, long c)
{
  return mod_func (a, b) | c;
}
Now this example is silly, because calling a function through the PLT, where the target is chosen at initialization time (or lives in a shared library), takes a lot longer than a normal call, but it illustrates the usage.
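To make the idea concrete, here is roughly what you would write by hand without target_clones: two copies of the same function, one built with a target attribute, plus a dispatcher. This is only a sketch, and it is for x86 rather than Power because it leans on GCC's __builtin_cpu_supports. The function names are invented, and unlike the IFUNC mechanism it checks the CPU on every call instead of once at load time.

```c
/* Baseline copy, compiled with the command-line options.  */
static int count_bits_default (unsigned long x)
{
  return __builtin_popcountl (x);
}

#if defined(__x86_64__) || defined(__i386__)
/* Same source, but this copy is allowed to use the POPCNT instruction.  */
__attribute__((target("popcnt")))
static int count_bits_popcnt (unsigned long x)
{
  return __builtin_popcountl (x);
}
#endif

/* Hand-written dispatcher (hypothetical): pick a copy at run time.  */
int count_bits (unsigned long x)
{
#if defined(__x86_64__) || defined(__i386__)
  if (__builtin_cpu_supports ("popcnt"))
    return count_bits_popcnt (x);
#endif
  return count_bits_default (x);
}
```

On other architectures the preprocessor guard leaves only the baseline copy, so the file still builds.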
I did also think about the embedded use (such as Teensy). In the embedded world, you are typically only compiling for a single processor, so having different target options doesn't give you much, but I knew there were often reasons you might want different functions to be compiled differently:
- Often you have very little code space, and compiling most functions with -Os will allow you to fit into a fixed size flash memory region;
- There are times when an embedded processor is doing something speed critical, and you need every cycle you can get, but most of the program is not speed critical;
- Unfortunately there are bugs in complex software like GCC, and using an attribute to turn off a specific optimization that results in buggy code might be needed;
- In environments like Arduino, it is not always easy to modify what compilation options are used.
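The cases in the list above might look like this in a sketch, with the options attached in the source so the build flags don't need to be touched. This assumes GCC; the function names are invented, and -ftree-vectorize is only a stand-in for "some optimization you suspect", not a known-bad pass.

```c
/* Cold code: favour size, whatever the rest of the build uses.  */
__attribute__((optimize("Os")))
int parse_digit (char c)
{
  return (c >= '0' && c <= '9') ? c - '0' : -1;
}

/* Speed-critical code: favour speed.  */
__attribute__((optimize("O3")))
void scale_samples (int *buf, int n, int gain)
{
  for (int i = 0; i < n; i++)
    buf[i] *= gain;
}

/* Suspected miscompile: switch one optimization off for this function
   only.  Names without a leading O are treated as -f options, so this
   is -fno-tree-vectorize.  */
__attribute__((optimize("no-tree-vectorize")))
long checksum (const unsigned char *p, int n)
{
  long sum = 0;
  for (int i = 0; i < n; i++)
    sum += p[i];
  return sum;
}
```

The attribute changes only how each function is compiled, not what it computes, so callers are unaffected.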