How to get a vector FMA with GCC in a portable way?

Vincent Lefevre
I would like to know how to get a vector FMA with GCC in a portable
way.

By "portable way", I mean that the behavior must not depend on the
compilation options (e.g., if FP contraction is disabled, I still
want a true FMA) and that the code must not depend on the architecture
(thus intrinsics should not be used: even when restricting to x86,
there are FMA3 vs FMA4 issues).

For instance, for addition, one can write "a + b". But for FMA?
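A minimal sketch of what "a + b" looks like with GCC generic vector types, assuming the `vector_size` extension (the type name `v2df` and the function `vadd` are illustrative, not from the thread). Addition needs nothing more than the ordinary operator; there is no corresponding operator for a fused multiply-add.

```c
/* GCC generic vector type (vector_size extension): two doubles
   packed into 16 bytes. The name v2df is illustrative. */
typedef double v2df __attribute__((vector_size(16)));

/* Element-wise addition uses the ordinary + operator; GCC typically
   lowers it to a single packed instruction (e.g. addpd on x86_64). */
static inline v2df vadd(v2df a, v2df b)
{
    return a + b;
}
```

Lanes can be read with subscripts, e.g. `vadd(a, b)[1]`.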

Thanks,

--
Vincent Lefèvre <[hidden email]> - Web: <https://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <https://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)

Re: How to get a vector FMA with GCC in a portable way?

Alexander Monakov
On Tue, 15 Jan 2019, Vincent Lefevre wrote:

> I would like to know how to get a vector FMA with GCC in a portable
> way.
>
> By "portable way", I mean that the behavior must not depend on the
> compilation options (e.g., if FP contraction is disabled, I still
> want a true FMA) and that the code must not depend on the architecture
> (thus intrinsics should not be used... even when restricting to x86,
> one reason is FMA3 vs FMA4 issues).
>
> For instance, for addition, one can write "a + b". But for FMA?

In the context of autovectorized code or when using generic vector types?
When the source is meant to be autovectorized and operates on scalar
variables, using the fma function works (GCC recognizes it as a builtin;
__FP_FAST_FMA is predefined when the target has a hardware FMA instruction).

For generic vector types I'm afraid GCC does not provide such a facility.
I think it would make a reasonable feature request.

Alexander.

Re: How to get a vector FMA with GCC in a portable way?

Vincent Lefevre
On 2019-01-16 12:26:51 +0300, Alexander Monakov wrote:

> On Tue, 15 Jan 2019, Vincent Lefevre wrote:
>
> > I would like to know how to get a vector FMA with GCC in a portable
> > way.
> >
> > By "portable way", I mean that the behavior must not depend on the
> > compilation options (e.g., if FP contraction is disabled, I still
> > want a true FMA) and that the code must not depend on the architecture
> > (thus intrinsics should not be used... even when restricting to x86,
> > one reason is FMA3 vs FMA4 issues).
> >
> > For instance, for addition, one can write "a + b". But for FMA?
>
> In the context of autovectorized code or when using generic vector types?

It could be either (or both, see below). But it appears that I need
to use vector types due to ABI issues:

  https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65847#c1

(inlining might improve things, but I prefer to avoid ABI issues in
every case).

But if I use fma() (from either <math.h> or <tgmath.h>), it must be
applied to scalar types, which means that autovectorization must also
work on vector types decomposed into their elements. Unfortunately,
while this works with structures (which are affected by ABI issues),
it doesn't with vectors: on x86_64, I get two vfmadd132sd (plus unpack
instructions) instead of a single vfmadd132pd!

I've just reported the following bug:

  https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88873

> When the source is supposed to be autovectorized and operates on scalar
> variables, using fma function works (GCC recognizes it as a builtin;
> __FP_FAST_FMA is predefined when the fma instruction is available).
>
> For generic vector types I'm afraid GCC does not provide such a facility.
> I think it would make a reasonable feature request.

I've just filed it here:

  https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88874
