Function multiversioning question

Martin Reinecke
Hi,

I'm trying to use gcc's "target_clones" attribute for some functions in
a performance critical library. These functions use gcc builtins and
choose between different sets (standard code, SSE2, AVX) depending on
the predefined macros __SSE2__ and __AVX__.
Unfortunately these macros apparently are not set by the compiler when
it compiles for the individual targets.

Consider the code below:

#include <stdio.h>

__attribute__((target_clones("avx","sse2","default")))
void foo(void)
  {
#if defined(__AVX__)
  printf("AVX\n");
#elif defined(__SSE2__)
  printf("SSE2\n");
#else
  printf("nothing special\n");
#endif
  }

int main(void)
  {
  foo();
  return 0;
  }

Compiling and running this in an AVX-capable CPU prints "SSE2", where I
would have hoped to see "AVX".
Is there a way to achieve what I have in mind?

Thanks,
  Martin

Re: Function multiversioning question

Jonathan Wakely-4
On Thu, 25 Oct 2018 at 12:46, Martin Reinecke
<[hidden email]> wrote:

>
> Hi,
>
> I'm trying to use gcc's "target_clones" attribute for some functions in
> a performance critical library. These functions use gcc builtins and
> choose between different sets (standard code, SSE2, AVX) depending on
> the predefined macros __SSE2__ and __AVX__.
> Unfortunately these macros apparently are not set by the compiler when
> it compiles for the individual targets.
>
> Consider the code below:
>
> #include <stdio.h>
>
> __attribute__((target_clones("avx","sse2","default")))
> void foo(void)
>   {
> #if defined(__AVX__)
>   printf("AVX\n");
> #elif defined(__SSE2__)
>   printf("SSE2\n");
> #else
>   printf("nothing special\n");
> #endif
>   }
>
> int main(void)
>   {
>   foo();
>   return 0;
>   }
>
> Compiling and running this in an AVX-capable CPU prints "SSE2", where I
> would have hoped to see "AVX".

Macros are defined during preprocessing, and the preprocessor doesn't
know anything about the target_clones attribute. When the compiler
sees the attribute it can't go back in time and alter the result of
earlier preprocessing.

> Is there a way to achieve what I have in mind?

If you want three different implementations of the function I think
you need three different clones. Or do runtime checks for the CPU
features inside the function, but that seems suboptimal.
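The runtime-check option mentioned above could look roughly like the sketch below. It assumes an x86 target and a compiler that provides GCC's __builtin_cpu_init and __builtin_cpu_supports builtins; the helper name best_isa is made up for illustration and is not part of the original example.

```c
#include <stdio.h>

/* Hypothetical helper: reports the best instruction set found at run time.
   The x86 guard keeps the file compilable on other architectures. */
const char *best_isa(void)
{
#if defined(__x86_64__) || defined(__i386__)
    __builtin_cpu_init();               /* populate the CPU feature data */
    if (__builtin_cpu_supports("avx"))
        return "AVX";
    if (__builtin_cpu_supports("sse2"))
        return "SSE2";
#endif
    return "nothing special";
}

/* Same role as foo() in the original post, but the decision is made
   when the program runs instead of in the preprocessor. */
void foo(void)
{
    printf("%s\n", best_isa());
}
```

The check is repeated on every call here, which is the cost Jonathan calls suboptimal; caching the result in a static variable would be one way to reduce it.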

Re: Function multiversioning question

Martin Reinecke
Hi Jonathan,

thanks for the quick reply!

> Macros are defined during preprocessing, and the preprocessor doesn't
> know anything about the target_clones attribute. When the compiler
> sees the attribute it can't go back in time and alter the result of
> earlier preprocessing.

I feared as much.
This creates a nasty asymmetry in the sense that gcc's own optimizations
will be able to use all target features (because the compiler knows that
it is OK to use specific features like AVX instructions) whereas the
user has no way to hand-optimize where this becomes necessary. At least
not using this nice mechanism.

>> Is there a way to achieve what I have in mind?
>
> If you want three different implementations of the function I think
> you need three different clones. Or do runtime checks for the CPU
> features inside the function, but that seems suboptimal.

I guess I'll just put all functions in question in a separate file and
compile this with different flags and name prefixes.

Cheers,
  Martin
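A minimal sketch of the separate-file approach, assuming a hypothetical source file kernel.c and a hypothetical KERNEL_PREFIX macro supplied on the command line (neither name appears in the thread). Because each compilation is a separate preprocessor run with its own -m flags, __AVX__ and __SSE2__ behave as originally hoped.

```c
/* kernel.c: meant to be compiled once per target, for example
 *   gcc -O2 -c       -DKERNEL_PREFIX=generic_ kernel.c -o kernel_generic.o
 *   gcc -O2 -c -mavx -DKERNEL_PREFIX=avx_     kernel.c -o kernel_avx.o
 */
#ifndef KERNEL_PREFIX
#define KERNEL_PREFIX generic_          /* default when no prefix is given */
#endif

/* Two-level concatenation so KERNEL_PREFIX is expanded before pasting. */
#define KERNEL_CONCAT2(a, b) a##b
#define KERNEL_CONCAT(a, b)  KERNEL_CONCAT2(a, b)
#define KERNEL_NAME(name)    KERNEL_CONCAT(KERNEL_PREFIX, name)

/* Becomes generic_variant(), avx_variant(), ... depending on the prefix. */
const char *KERNEL_NAME(variant)(void)
{
#if defined(__AVX__)
    return "AVX";
#elif defined(__SSE2__)
    return "SSE2";
#else
    return "nothing special";
#endif
}
```

The caller still has to pick between generic_variant() and avx_variant() at run time, for example with a CPU feature check, since nothing in this scheme selects a version automatically.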

Re: Function multiversioning question

Jonathan Wakely-4
On Thu, 25 Oct 2018 at 13:35, Martin Reinecke
<[hidden email]> wrote:

>
> Hi Jonathan,
>
> thanks for the quick reply!
>
> > Macros are defined during preprocessing, and the preprocessor doesn't
> > know anything about the target_clones attribute. When the compiler
> > sees the attribute it can't go back in time and alter the result of
> > earlier preprocessing.
>
> I feared as much.
> This creates a nasty asymmetry in the sense that gcc's own optimizations
> will be able to use all target features (because the compiler knows that
> it is OK to use specific features like AVX instructions) whereas the
> user has no way to hand-optimize where this becomes necessary. At least
> not using this nice mechanism.

They can, just not based on preprocessor macros.


>
> >> Is there a way to achieve what I have in mind?
> >
> > If you want three different implementations of the function I think
> > you need three different clones. Or do runtime checks for the CPU
> > features inside the function, but that seems suboptimal.
>
> I guess I'll just put all functions in question in a separate file and
> compile this with different flags and name prefixes.
>
> Cheers,
>   Martin

Re: Function multiversioning question

Marc Glisse-6
In reply to this post by Martin Reinecke
On Thu, 25 Oct 2018, Martin Reinecke wrote:

> Hi Jonathan,
>
> thanks for the quick reply!
>
>> Macros are defined during preprocessing, and the preprocessor doesn't
>> know anything about the target_clones attribute. When the compiler
>> sees the attribute it can't go back in time and alter the result of
>> earlier preprocessing.
>
> I feared as much.
> This creates a nasty asymmetry in the sense that gcc's own optimizations
> will be able to use all target features (because the compiler knows that
> it is OK to use specific features like AVX instructions) whereas the
> user has no way to hand-optimize where this becomes necessary. At least
> not using this nice mechanism.
>
>>> Is there a way to achieve what I have in mind?
>>
>> If you want three different implementations of the function I think
>> you need three different clones. Or do runtime checks for the CPU
>> features inside the function, but that seems suboptimal.
>
> I guess I'll just put all functions in question in a separate file and
> compile this with different flags and name prefixes.

target_clones does nothing magic, you can also look at target and ifunc.
https://gcc.gnu.org/wiki/FunctionMultiVersioning

--
Marc Glisse
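As a rough illustration of the target attribute Marc mentions, the sketch below writes the AVX version by hand and gives it its own target attribute, so the AVX intrinsics can be used without passing -mavx for the whole file. The function names add, add_avx and add_generic are invented for this example, and the dispatch uses a plain runtime check rather than ifunc, which would move the same decision to load time.

```c
#include <stddef.h>

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>

/* Only this function is compiled with AVX enabled. */
__attribute__((target("avx")))
static void add_avx(const float *a, const float *b, float *out, size_t n)
{
    size_t i = 0;
    for (; i + 8 <= n; i += 8) {        /* 8 floats per 256-bit register */
        __m256 va = _mm256_loadu_ps(a + i);
        __m256 vb = _mm256_loadu_ps(b + i);
        _mm256_storeu_ps(out + i, _mm256_add_ps(va, vb));
    }
    for (; i < n; ++i)                  /* scalar tail */
        out[i] = a[i] + b[i];
}
#endif

/* Baseline version, built with the default target flags. */
static void add_generic(const float *a, const float *b, float *out, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        out[i] = a[i] + b[i];
}

/* Hypothetical entry point: picks an implementation at run time. */
void add(const float *a, const float *b, float *out, size_t n)
{
#if defined(__x86_64__) || defined(__i386__)
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx")) {
        add_avx(a, b, out, n);
        return;
    }
#endif
    add_generic(a, b, out, n);
}
```

The choice of implementation is expressed through which function body is written, not through __AVX__, which is why this works where the macro-based version in the original post does not.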

Re: Function multiversioning question

Martin Reinecke
In reply to this post by Jonathan Wakely-4

>> This creates a nasty asymmetry in the sense that gcc's own optimizations
>> will be able to use all target features (because the compiler knows that
>> it is OK to use specific features like AVX instructions) whereas the
>> user has no way to hand-optimize where this becomes necessary. At least
>> not using this nice mechanism.
>
> They can, just not based on preprocessor macros.

I was thinking about decisions at compile time (along the lines of "ah,
I'm in the AVX-specific version of the function, therefore I will call
AVX intrinsics"), and I don't see a way to make them without access to
macros.
At runtime this is of course possible.

Cheers,
  Martin

Re: Function multiversioning question

Martin Reinecke
In reply to this post by Marc Glisse-6
Hi,

I'm coming back to this after some experiments. If one compiles the
attached example with

gcc -c archtest1.c

one gets the output

archtest1.c:4:2: warning: #warning outer file: AVX512F not defined [-Wcpp]
 #warning outer file: AVX512F not defined
  ^~~~~~~
In file included from archtest1.c:8:
archtest2.c:2:2: warning: #warning inner file: AVX512F defined [-Wcpp]
 #warning inner file: AVX512F defined

which seems to contradict what Jonathan said about macros not being
influenced by the #pragmas.

However, if I compile the same code with clang, I get

martin@debian:~/tmp$ clang-7 -c archtest1.c
archtest1.c:4:2: warning: outer file: AVX512F not defined [-W#warnings]
#warning outer file: AVX512F not defined
 ^
In file included from archtest1.c:8:
./archtest2.c:4:2: warning: inner file: AVX512F not defined [-W#warnings]
#warning inner file: AVX512F not defined
 ^
2 warnings generated.

So the compilers behave differently, even though clang tries to emulate
the GCC pragma.

My question is now: is the fact that gcc defines the __AVX512F__ macro
in the included file a bug, or is this working as intended?

Thanks,
  Martin


archtest1.c (172 bytes) Download Attachment
archtest2.c (115 bytes) Download Attachment

Re: Function multiversioning question

Martin Reinecke
... sorry, please ignore my previous email!
clang actually ignores the #pragma completely, but in contrast to gcc it
does so silently, so I completely missed that.
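The attachments are not reproduced in this thread, so the following is only a guess at the kind of test involved, folded into a single file. It assumes the pragma in question is #pragma GCC target("avx512f"); the macro and function names are invented. With GCC on x86 the macro is expected to be visible only between the push and pop pragmas, while a compiler that ignores the pragma sees the same state in both places.

```c
/* State of __AVX512F__ as seen outside the pragma region. */
#ifdef __AVX512F__
#define OUTER_HAS_AVX512F 1
#else
#define OUTER_HAS_AVX512F 0
#endif

#if defined(__x86_64__) || defined(__i386__)
#pragma GCC push_options
#pragma GCC target("avx512f")
#endif

/* State of __AVX512F__ as seen inside the pragma region. */
#ifdef __AVX512F__
#define INNER_HAS_AVX512F 1
#else
#define INNER_HAS_AVX512F 0
#endif

#if defined(__x86_64__) || defined(__i386__)
#pragma GCC pop_options
#endif

/* Hypothetical accessors so the result can be inspected from a program. */
int outer_has_avx512f(void) { return OUTER_HAS_AVX512F; }
int inner_has_avx512f(void) { return INNER_HAS_AVX512F; }
```

Comparing the two return values under different compilers is a quieter alternative to the #warning lines used in the experiment above.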


Re: Function multiversioning question

Martin Sebor-2
In reply to this post by Martin Reinecke
On 10/25/18 7:13 AM, Martin Reinecke wrote:

>
>>> This creates a nasty asymmetry in the sense that gcc's own optimizations
>>> will be able to use all target features (because the compiler knows that
>>> it is OK to use specific features like AVX instructions) whereas the
>>> user has no way to hand-optimize where this becomes necessary. At least
>>> not using this nice mechanism.
>>
>> They can, just not based on preprocessor macros.
>
> I was thinking about decisions at compile time (along the lines of "ah,
> I'm in the AVX-specific version of the function, therefore I will call
> AVX intrinsics"), and I don't see a way to make them without access to
> macros.
> At runtime this is of course possible.

Since each of the clones has a target attribute attached to it there
should be a way to query that attribute at compile time.  GCC 9
provides a __builtin_has_attribute intrinsic for simple attribute
introspection so in principle it could be used for this.  One caveat
is that the clones only get created by the middle-end so evaluating
the attribute query would have to be deferred until then (right now
it's evaluated during parsing).  As a result, it would no longer
evaluate to a constant expression in this form.  Another challenge
is how to distinguish the clones from the default function so that

   __builtin_has_attribute (foo, target ("avx2"))

didn't refer to the default foo as it does now.  It might need some
special syntax to refer to the current function.

If this is something you would find useful, please open a request
in Bugzilla.

Martin

Re: Function multiversioning question

Florian Weimer-5
* Martin Sebor:

> On 10/25/18 7:13 AM, Martin Reinecke wrote:
>>
>>>> This creates a nasty asymmetry in the sense that gcc's own optimizations
>>>> will be able to use all target features (because the compiler knows that
>>>> it is OK to use specific features like AVX instructions) whereas the
>>>> user has no way to hand-optimize where this becomes necessary. At least
>>>> not using this nice mechanism.
>>>
>>> They can, just not based on preprocessor macros.
>>
>> I was thinking about decisions at compile time (along the lines of "ah,
>> I'm in the AVX-specific version of the function, therefore I will call
>> AVX intrinsics"), and I don't see a way to make them without access to
>> macros.
>> At runtime this is of course possible.
>
> Since each of the clones has a target attribute attached to it there
> should be a way to query that attribute at compile time.  GCC 9
> provides a __builtin_has_attribute intrinsic for simple attribute
> introspection so in principle it could be used for this.

Another option: __builtin_cpu_supports could be constant-folded
according to the configured architecture baseline.

Thanks,
Florian