Function multiversioning question

Function multiversioning question

Martin Reinecke
Hi,

I'm trying to use gcc's "target_clones" attribute for some functions in
a performance critical library. These functions use gcc builtins and
choose between different sets (standard code, SSE2, AVX) depending on
the predefined macros __SSE2__ and __AVX__.
Unfortunately these macros apparently are not set by the compiler when
it compiles for the individual targets.

Consider the code below:

#include <stdio.h>

__attribute__((target_clones("avx","sse2","default")))
void foo(void)
  {
#if defined(__AVX__)
  printf("AVX\n");
#elif defined(__SSE2__)
  printf("SSE2\n");
#else
  printf("nothing special\n");
#endif
  }

int main(void)
  {
  foo();
  return 0;
  }

Compiling and running this on an AVX-capable CPU prints "SSE2", where I
would have hoped to see "AVX".
Is there a way to achieve what I have in mind?

Thanks,
  Martin

Re: Function multiversioning question

Jonathan Wakely
On Thu, 25 Oct 2018 at 12:46, Martin Reinecke wrote:

>
> Hi,
>
> I'm trying to use gcc's "target_clones" attribute for some functions in
> a performance critical library. These functions use gcc builtins and
> choose between different sets (standard code, SSE2, AVX) depending on
> the predefined macros __SSE2__ and __AVX__.
> Unfortunately these macros apparently are not set by the compiler when
> it compiles for the individual targets.
>
> Consider the code below:
>
> #include <stdio.h>
>
> __attribute__((target_clones("avx","sse2","default")))
> void foo(void)
>   {
> #if defined(__AVX__)
>   printf("AVX\n");
> #elif defined(__SSE2__)
>   printf("SSE2\n");
> #else
>   printf("nothing special\n");
> #endif
>   }
>
> int main(void)
>   {
>   foo();
>   return 0;
>   }
>
> Compiling and running this on an AVX-capable CPU prints "SSE2", where I
> would have hoped to see "AVX".

Macros are defined during preprocessing, and the preprocessor doesn't
know anything about the target_clones attribute. When the compiler
sees the attribute it can't go back in time and alter the result of
earlier preprocessing.

> Is there a way to achieve what I have in mind?

If you want three different implementations of the function I think
you need three different clones. Or do runtime checks for the CPU
features inside the function, but that seems suboptimal.
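
A minimal sketch of the first option, assuming GCC or Clang on x86: three separately written functions, each tagged with the target attribute, plus a small dispatcher built on __builtin_cpu_supports. The names foo_avx, foo_sse2, foo_default and foo_dispatch are hypothetical and do not come from the thread.

```c
/* Fallback variant, compiled with the command-line target flags. */
static const char *foo_default(void) { return "nothing special"; }

#if defined(__x86_64__) || defined(__i386__)
/* Each variant is compiled for its own ISA level (hypothetical names). */
__attribute__((target("avx")))
static const char *foo_avx(void) { return "AVX"; }

__attribute__((target("sse2")))
static const char *foo_sse2(void) { return "SSE2"; }
#endif

/* Pick a variant at run time based on what the CPU reports. */
const char *foo_dispatch(void)
{
#if defined(__x86_64__) || defined(__i386__)
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx"))
        return foo_avx();
    if (__builtin_cpu_supports("sse2"))
        return foo_sse2();
#endif
    return foo_default();
}
```

Calling puts(foo_dispatch()) from main would then print the name of whichever variant was selected on the running CPU.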

Re: Function multiversioning question

Martin Reinecke
Hi Jonathan,

thanks for the quick reply!

> Macros are defined during preprocessing, and the preprocessor doesn't
> know anything about the target_clones attribute. When the compiler
> sees the attribute it can't go back in time and alter the result of
> earlier preprocessing.

I feared as much.
This creates a nasty asymmetry in the sense that gcc's own optimizations
will be able to use all target features (because the compiler knows that
it is OK to use specific features like AVX instructions) whereas the
user has no way to hand-optimize where this becomes necessary. At least
not using this nice mechanism.

>> Is there a way to achieve what I have in mind?
>
> If you want three different implementations of the function I think
> you need three different clones. Or do runtime checks for the CPU
> features inside the function, but that seems suboptimal.

I guess I'll just put all functions in question in a separate file and
compile this with different flags and name prefixes.

Cheers,
  Martin
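
The separate-file approach could look roughly like this. The file name kernels.c, the PREFIX macro and the FN helper are hypothetical, and the command lines in the comment are only one way to drive the build. Because each object file is compiled with its own -m flags, __AVX__ and __SSE2__ reflect the target of that object, so the original #if chain works unchanged.

```c
/* kernels.c (hypothetical): compiled once per target, for example
 *   gcc -c -mavx  -DPREFIX=avx_     kernels.c -o kernels_avx.o
 *   gcc -c -msse2 -DPREFIX=sse2_    kernels.c -o kernels_sse2.o
 *   gcc -c        -DPREFIX=default_ kernels.c -o kernels_default.o
 */
#ifndef PREFIX
#define PREFIX default_
#endif

/* Two-step paste so that PREFIX is expanded before ## is applied. */
#define PASTE2(a, b) a##b
#define PASTE(a, b) PASTE2(a, b)
#define FN(name) PASTE(PREFIX, name)

/* Expands to avx_foo, sse2_foo or default_foo depending on PREFIX. */
const char *FN(foo)(void)
{
#if defined(__AVX__)
    return "AVX";
#elif defined(__SSE2__)
    return "SSE2";
#else
    return "nothing special";
#endif
}
```

A dispatcher (for example one based on __builtin_cpu_supports) still has to choose between avx_foo, sse2_foo and default_foo at run time. On x86-64 the default build also defines __SSE2__, since SSE2 is part of the baseline there.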

Re: Function multiversioning question

Jonathan Wakely
On Thu, 25 Oct 2018 at 13:35, Martin Reinecke wrote:

>
> Hi Jonathan,
>
> thanks for the quick reply!
>
> > Macros are defined during preprocessing, and the preprocessor doesn't
> > know anything about the target_clones attribute. When the compiler
> > sees the attribute it can't go back in time and alter the result of
> > earlier preprocessing.
>
> I feared as much.
> This creates a nasty asymmetry in the sense that gcc's own optimizations
> will be able to use all target features (because the compiler knows that
> it is OK to use specific features like AVX instructions) whereas the
> user has no way to hand-optimize where this becomes necessary. At least
> not using this nice mechanism.

They can, just not based on preprocessor macros.


Re: Function multiversioning question

Marc Glisse
In reply to this post by Martin Reinecke
On Thu, 25 Oct 2018, Martin Reinecke wrote:

> Hi Jonathan,
>
> thanks for the quick reply!
>
>> Macros are defined during preprocessing, and the preprocessor doesn't
>> know anything about the target_clones attribute. When the compiler
>> sees the attribute it can't go back in time and alter the result of
>> earlier preprocessing.
>
> I feared as much.
> This creates a nasty asymmetry in the sense that gcc's own optimizations
> will be able to use all target features (because the compiler knows that
> it is OK to use specific features like AVX instructions) whereas the
> user has no way to hand-optimize where this becomes necessary. At least
> not using this nice mechanism.
>
>>> Is there a way to achieve what I have in mind?
>>
>> If you want three different implementations of the function I think
>> you need three different clones. Or do runtime checks for the CPU
>> features inside the function, but that seems suboptimal.
>
> I guess I'll just put all functions in question in a separate file and
> compile this with different flags and name prefixes.

target_clones does nothing magic; you can also look at the target and ifunc attributes.
https://gcc.gnu.org/wiki/FunctionMultiVersioning

--
Marc Glisse
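
A rough sketch of what combining target and ifunc by hand might look like. It assumes an ELF platform with GNU indirect-function support (for example x86 GNU/Linux with glibc); the names foo_avx, foo_sse2, foo_default and resolve_foo are hypothetical.

```c
static const char *foo_default(void) { return "nothing special"; }

#if defined(__x86_64__) || defined(__i386__)
/* Hand-written variants, each compiled for its own ISA level. */
__attribute__((target("avx")))
static const char *foo_avx(void) { return "AVX"; }

__attribute__((target("sse2")))
static const char *foo_sse2(void) { return "SSE2"; }
#endif

typedef const char *(*foo_fn)(void);

/* Resolver: called once when the symbol foo is bound; it returns the
   implementation that every later call to foo will use. */
static foo_fn resolve_foo(void)
{
#if defined(__x86_64__) || defined(__i386__)
    /* Resolvers may run before constructors, so initialize the
       CPU feature data explicitly. */
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx"))
        return foo_avx;
    if (__builtin_cpu_supports("sse2"))
        return foo_sse2;
#endif
    return foo_default;
}

/* foo itself has no body; the ifunc attribute names its resolver. */
const char *foo(void) __attribute__((ifunc("resolve_foo")));
```

Callers simply call foo(); the selection cost is paid once at binding time rather than on every call. The resolver runs very early, so it should avoid calling into the rest of the program or the C library.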

Re: Function multiversioning question

Martin Reinecke
In reply to this post by Jonathan Wakely

>> This creates a nasty asymmetry in the sense that gcc's own optimizations
>> will be able to use all target features (because the compiler knows that
>> it is OK to use specific features like AVX instructions) whereas the
>> user has no way to hand-optimize where this becomes necessary. At least
>> not using this nice mechanism.
>
> They can, just not based on preprocessor macros.

I was thinking about decisions at compile time (along the lines of "ah,
I'm in the AVX-specific version of the function, therefore I will call
AVX intrinsics"), and I don't see a way to make them without access to
macros.
At runtime this is of course possible.

Cheers,
  Martin
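
One way to make that compile-time choice without macros is to write each variant as its own function and tag it with the target attribute, so that "which intrinsics may I call" is decided by which function the code lives in. The sketch below assumes a reasonably recent GCC or Clang on x86, where intrinsics can be used inside a function carrying a matching target attribute even without -mavx on the command line. All function names (add_avx, add_sse2, add_default, add_floats) are hypothetical.

```c
#include <stddef.h>

/* Scalar fallback. */
static void add_default(float *dst, const float *a, const float *b, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        dst[i] = a[i] + b[i];
}

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>

/* AVX variant: 8 floats per iteration, scalar loop for the tail. */
__attribute__((target("avx")))
static void add_avx(float *dst, const float *a, const float *b, size_t n)
{
    size_t i = 0;
    for (; i + 8 <= n; i += 8)
        _mm256_storeu_ps(dst + i,
            _mm256_add_ps(_mm256_loadu_ps(a + i), _mm256_loadu_ps(b + i)));
    for (; i < n; ++i)
        dst[i] = a[i] + b[i];
}

/* SSE variant: 4 floats per iteration, scalar loop for the tail. */
__attribute__((target("sse2")))
static void add_sse2(float *dst, const float *a, const float *b, size_t n)
{
    size_t i = 0;
    for (; i + 4 <= n; i += 4)
        _mm_storeu_ps(dst + i,
            _mm_add_ps(_mm_loadu_ps(a + i), _mm_loadu_ps(b + i)));
    for (; i < n; ++i)
        dst[i] = a[i] + b[i];
}
#endif

/* Run-time dispatch to the best variant the CPU supports. */
void add_floats(float *dst, const float *a, const float *b, size_t n)
{
#if defined(__x86_64__) || defined(__i386__)
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx")) { add_avx(dst, a, b, n); return; }
    if (__builtin_cpu_supports("sse2")) { add_sse2(dst, a, b, n); return; }
#endif
    add_default(dst, a, b, n);
}
```

The price compared with target_clones is that the variants are written out by hand instead of being cloned from a single body.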