PPC64 libmvec implementation of sincos


GT
I am attempting to create a vector version of sincos for PPC64.
The relevant discussion thread is on the GLIBC libc-alpha mailing list.
Navigate it beginning at https://sourceware.org/ml/libc-alpha/2019-09/msg00334.html

The intention is to reuse as much as possible from the existing GCC implementation of other libmvec functions.
My questions are: which function(s) in GCC:

1. Gather scalar function input arguments, from multiple loop iterations, into a single vector input argument for the vector function version?
2. Distribute scalar function outputs, to appropriate loop iteration result, from the single vector function output result?

I am referring especially to vectorization of sin and cos.

Thanks.
Bert Tenjy.
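For the contiguous accesses in questions 1 and 2, "gathering" and "distributing" come down to loading consecutive scalar inputs into one vector argument and storing the result lanes back to consecutive outputs. The sketch below hand-writes that transformation for a loop `a[i] = f(b[i])`, assuming a two-lane `double` variant. `f_scalar`, `f_v2` and the loop functions are hypothetical stand-ins (not libmvec symbols), `v2df` uses GCC's `vector_size` extension, and this illustrates the idea only; it is not the code GCC emits.

```c
#include <assert.h>
#include <string.h>

/* GCC vector extension: two doubles in one 16-byte vector. */
typedef double v2df __attribute__((vector_size(16)));

/* Hypothetical scalar function and its hypothetical 2-lane variant.
   A trivial body (2x + 1) keeps the sketch independent of libm. */
static double f_scalar(double x) { return 2.0 * x + 1.0; }

static v2df f_v2(v2df x)
{
  const v2df two = {2.0, 2.0};
  const v2df one = {1.0, 1.0};
  return two * x + one;
}

/* Original scalar loop: one call per iteration. */
static void loop_scalar(double *restrict a, const double *restrict b, int n)
{
  for (int i = 0; i < n; i++)
    a[i] = f_scalar(b[i]);
}

/* Hand-vectorized loop: one vector call per two iterations,
   plus a scalar epilogue when n is odd. */
static void loop_vector(double *restrict a, const double *restrict b, int n)
{
  assert(n >= 0);
  int i = 0;
  for (; i + 2 <= n; i += 2)
    {
      v2df in, out;
      memcpy(&in, &b[i], sizeof in);    /* gather: b[i], b[i+1] -> one argument */
      out = f_v2(in);                   /* one call covers two iterations */
      memcpy(&a[i], &out, sizeof out);  /* distribute: lanes -> a[i], a[i+1] */
    }
  for (; i < n; i++)                    /* scalar epilogue */
    a[i] = f_scalar(b[i]);
}
```

For strided or indirect accesses the load and store steps would become real gathers and scatters; this sketch only covers the unit-stride case used in the example loops of this thread.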

Re: PPC64 libmvec implementation of sincos

Szabolcs Nagy-2
On 27/09/2019 20:23, GT wrote:

> I am attempting to create a vector version of sincos for PPC64.
> The relevant discussion thread is on the GLIBC libc-alpha mailing list.
> Navigate it beginning at https://sourceware.org/ml/libc-alpha/2019-09/msg00334.html
>
> The intention is to reuse as much as possible from the existing GCC implementation of other libmvec functions.
> My questions are: which function(s) in GCC:
>
> 1. Gather scalar function input arguments, from multiple loop iterations, into a single vector input argument for the vector function version?
> 2. Distribute scalar function outputs, to appropriate loop iteration result, from the single vector function output result?
>
> I am referring especially to vectorization of sin and cos.

i wonder if gcc can auto-vectorize scalar sincos
calls, the vectorizer seems to want the calls to
have no side-effect, but attribute pure or const
is not appropriate for sincos (which has no return
value but takes writable pointer args)

"#pragma omp simd" on a loop seems to work but i
could not get unannotated sincos loops to vectorize.

it seems it would be nice if we could add pure/const
somehow (maybe to the simd variant only? afaik openmp
requires no side-effects for simd variants, but that's
probably only for explicitly marked loops?)

Re: PPC64 libmvec implementation of sincos

GT
‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Monday, September 30, 2019 9:52 AM, Szabolcs Nagy <[hidden email]> wrote:

> On 27/09/2019 20:23, GT wrote:
>
> > I am attempting to create a vector version of sincos for PPC64.
> > The relevant discussion thread is on the GLIBC libc-alpha mailing list.
> > Navigate it beginning at https://sourceware.org/ml/libc-alpha/2019-09/msg00334.html
> > The intention is to reuse as much as possible from the existing GCC implementation of other libmvec functions.
> > My questions are: which function(s) in GCC:
> >
> > 1.  Gather scalar function input arguments, from multiple loop iterations, into a single vector input argument for the vector function version?
> > 2.  Distribute scalar function outputs, to appropriate loop iteration result, from the single vector function output result?
> >
> > I am referring especially to vectorization of sin and cos.
>
> i wonder if gcc can auto-vectorize scalar sincos
> calls, the vectorizer seems to want the calls to
> have no side-effect, but attribute pure or const
> is not appropriate for sincos (which has no return
> value but takes writable pointer args)

1.  Do you mean whether x86_64 already does auto-vectorize sincos?
2.  Where in the code do you see the vectorizer require no side-effect?

> "#pragma omp simd" on a loop seems to work but i
> could not get unannotated sincos loops to vectorize.
>
> it seems it would be nice if we could add pure/const
> somehow (maybe to the simd variant only? afaik openmp
> requires no side-effects for simd variants, but that's
> probably only for explicitly marked loops?)

1. Example 1 and Example 2 at https://sourceware.org/glibc/wiki/libmvec show the 2 different
ways to activate auto-vectorization. When you refer to "unannotated sincos", which of
the 2 techniques do you mean?
2. Which function was auto-vectorized by "#pragma omp simd" in the loop?

Re: PPC64 libmvec implementation of sincos

Szabolcs Nagy-2
On 30/09/2019 18:30, GT wrote:

> ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
> On Monday, September 30, 2019 9:52 AM, Szabolcs Nagy <[hidden email]> wrote:
>
>> On 27/09/2019 20:23, GT wrote:
>>
>>> I am attempting to create a vector version of sincos for PPC64.
>>> The relevant discussion thread is on the GLIBC libc-alpha mailing list.
>>> Navigate it beginning at https://sourceware.org/ml/libc-alpha/2019-09/msg00334.html
>>> The intention is to reuse as much as possible from the existing GCC implementation of other libmvec functions.
>>> My questions are: which function(s) in GCC:
>>>
>>> 1.  Gather scalar function input arguments, from multiple loop iterations, into a single vector input argument for the vector function version?
>>> 2.  Distribute scalar function outputs, to appropriate loop iteration result, from the single vector function output result?
>>>
>>> I am referring especially to vectorization of sin and cos.
>>
>> i wonder if gcc can auto-vectorize scalar sincos
>> calls, the vectorizer seems to want the calls to
>> have no side-effect, but attribute pure or const
>> is not appropriate for sincos (which has no return
>> value but takes writable pointer args)
>
> 1.  Do you mean whether x86_64 already does auto-vectorize sincos?

any current target with simd attribute or omp declare simd support.

> 2.  Where in the code do you see the vectorizer require no side-effect?

i don't know where it is in the code, but

__attribute__((simd)) float foo (float);

void bar (float *restrict a, float *restrict b)
{
        for(int i=0; i<4000; i++)
                a[i] = foo (b[i]);
}

is not vectorized, however it gets vectorized if

i add __attribute__((const)) to foo
OR
if i add '#pragma omp simd' to the loop and compile with
-fopenmp-simd.

(which makes sense to me: you don't want to vectorize
if you don't know the side-effects, otoh, there is no
attribute to say that i know there will be no side-effects
in functions taking pointer arguments so i don't see
how sincos can get vectorized)

>> "#pragma omp simd" on a loop seems to work but i
>> could not get unannotated sincos loops to vectorize.
>>
>> it seems it would be nice if we could add pure/const
>> somehow (maybe to the simd variant only? afaik openmp
>> requires no side-effects for simd variants, but that's
>> probably only for explicitly marked loops?)
>
> 1. Example 1 and Example 2 at https://sourceware.org/glibc/wiki/libmvec show the 2 different
> ways to activate auto-vectorization. When you refer to "unannotated sincos", which of
> the 2 techniques do you mean?

example 1 annotates the loop with #pragma omp simd.
(and requires -fopenmp-simd cflag to work)

example 2 is my goal where -ftree-vectorize is enough
without annotation.

> 2. Which function was auto-vectorized by "#pragma omp simd" in the loop?

see example above.

Re: PPC64 libmvec implementation of sincos

Richard Biener-2
On September 30, 2019 3:52:52 PM GMT+02:00, Szabolcs Nagy <[hidden email]> wrote:

>On 27/09/2019 20:23, GT wrote:
>> I am attempting to create a vector version of sincos for PPC64.
>> The relevant discussion thread is on the GLIBC libc-alpha mailing list.
>> Navigate it beginning at https://sourceware.org/ml/libc-alpha/2019-09/msg00334.html
>>
>> The intention is to reuse as much as possible from the existing GCC implementation of other libmvec functions.
>> My questions are: which function(s) in GCC:
>>
>> 1. Gather scalar function input arguments, from multiple loop iterations, into a single vector input argument for the vector function version?
>> 2. Distribute scalar function outputs, to appropriate loop iteration result, from the single vector function output result?
>>
>> I am referring especially to vectorization of sin and cos.
>
>i wonder if gcc can auto-vectorize scalar sincos
>calls, the vectorizer seems to want the calls to
>have no side-effect, but attribute pure or const
>is not appropriate for sincos (which has no return
>value but takes writable pointer args)

We have __builtin_cexpi for that but not sure if any of the mechanisms can provide a mapping to a vectorized variant.

>"#pragma omp simd" on a loop seems to work but i
>could not get unannotated sincos loops to vectorize.
>
>it seems it would be nice if we could add pure/const
>somehow (maybe to the simd variant only? afaik openmp
>requires no side-effects for simd variants, but that's
>probably only for explicitly marked loops?)


Re: PPC64 libmvec implementation of sincos

GT
> >
> > i wonder if gcc can auto-vectorize scalar sincos
> > calls, the vectorizer seems to want the calls to
> > have no side-effect, but attribute pure or const
> > is not appropriate for sincos (which has no return
> > value but takes writable pointer args)
>
> We have __builtin_cexpi for that but not sure if any of the mechanisms can provide a mapping to a vectorized variant.
>

1. Using flags -fopt-info-all and -fopt-info-internals, the failure to vectorize sincos
is reported as "unsupported data-type: complex double". The default GCC behavior is to
replace sincos calls with calls to __builtin_cexpi.

2. Using flags -fno-builtin-sincos and -fno-builtin-cexpi, the failure to vectorize
sincos is different. In this case, the failure to vectorize is due to "number of iterations
could not be computed". No calls to __builtin_cexpi; sincos calls retained.

Questions:
1. Should we aim to provide a vectorized version of __builtin_cexpi? If so, it would have
to be a PPC64-only vector __builtin_cexpi, right?

2. Or should we require that vectorized sincos be available only when the -fno-builtin-sincos flag
is used in compilation?

I don't think we need to fix both types of vectorization failures in order to obtain sincos
vectorization.

Thanks.
Bert.

Re: PPC64 libmvec implementation of sincos

Richard Biener-2
On Mon, Nov 25, 2019 at 5:53 PM GT <[hidden email]> wrote:

>
> > >
> > > i wonder if gcc can auto-vectorize scalar sincos
> > > calls, the vectorizer seems to want the calls to
> > > have no side-effect, but attribute pure or const
> > > is not appropriate for sincos (which has no return
> > > value but takes writable pointer args)
> >
> > We have __builtin_cexpi for that but not sure if any of the mechanisms can provide a mapping to a vectorized variant.
> >
>
> 1. Using flags -fopt-info-all and -fopt-info-internals, the failure to vectorize sincos
> is reported as "unsupported data-type: complex double". The default GCC behavior is to
> replace sincos calls with calls to __builtin_cexpi.
>
> 2. Using flags -fno-builtin-sincos and -fno-builtin-cexpi, the failure to vectorize
> sincos is different. In this case, the failure to vectorize is due to "number of iterations
> could not be computed". No calls to __builtin_cexpi; sincos calls retained.
>
> Questions:
> 1. Should we aim to provide a vectorized version of __builtin_cexpi? If so, it would have
> to be a PPC64-only vector __builtin_cexpi, right?
>
> 2. Or should we require that vectorized sincos be available only when the -fno-builtin-sincos flag
> is used in compilation?
>
> I don't think we need to fix both types of vectorization failures in order to obtain sincos
> vectorization.

I think we should have a vectorized cexpi since that has a sane ABI. The complex
return type of cexpi makes it a little awkward for the vectorizer, but handling this should
be manageable. It's a bit difficult to expose complex types to the vectorizer since
most cases are lowered early.

Richard.

> Thanks.
> Bert.

Re: PPC64 libmvec implementation of sincos

GT
‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Wednesday, November 27, 2019 3:19 AM, Richard Biener <[hidden email]> wrote:

...

> > Questions:
> >
> > 1.  Should we aim to provide a vectorized version of __builtin_cexpi? If so, it would have
> >     to be a PPC64-only vector __builtin_cexpi, right?
> >
> > 2.  Or should we require that vectorized sincos be available only when the -fno-builtin-sincos flag
> >     is used in compilation?
> >
> >
> > I don't think we need to fix both types of vectorization failures in order to obtain sincos
> > vectorization.
>
> I think we should have a vectorized cexpi since that has a sane ABI. The complex
> return type of cexpi makes it a little awkward for the vectorizer, but handling this should
> be manageable. It's a bit difficult to expose complex types to the vectorizer since
> most cases are lowered early.
>

I'm trying to identify the source code which needs modification but I need help proceeding.

I am comparing two compilations: The first is a simple file with a call to sin in a loop.
Vectorization succeeds. The second is an almost identical file but with a call to sincos
in the loop. Vectorization fails.

In gdb, the earliest code location where the two compilations differ is in function
number_of_iterations_exit_assumptions in file tree-ssa-loop-niter.c. Line

op0 = gimple_cond_lhs (stmt);

returns a tree which when analyzed in function instantiate_scev_r (in file tree-scalar-evolution.c)
results in the first branch of the switch being taken for sincos. For sin, the 2nd branch of the
switch is taken.

How can I correlate stmt in the source line above to the relevant line in any dump among those created
using debugging dump option -fdump-tree-all?

Thanks.
Bert.

Re: PPC64 libmvec implementation of sincos

Richard Biener-2
On Wed, Dec 4, 2019 at 9:53 PM GT <[hidden email]> wrote:

>
> ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
> On Wednesday, November 27, 2019 3:19 AM, Richard Biener <[hidden email]> wrote:
>
> ...
>
> > > Questions:
> > >
> > > 1.  Should we aim to provide a vectorized version of __builtin_cexpi? If so, it would have
> > >     to be a PPC64-only vector __builtin_cexpi, right?
> > >
> > > 2.  Or should we require that vectorized sincos be available only when the -fno-builtin-sincos flag
> > >     is used in compilation?
> > >
> > >
> > > I don't think we need to fix both types of vectorization failures in order to obtain sincos
> > > vectorization.
> >
> > I think we should have a vectorized cexpi since that has a sane ABI. The complex
> > return type of cexpi makes it a little awkward for the vectorizer, but handling this should
> > be manageable. It's a bit difficult to expose complex types to the vectorizer since
> > most cases are lowered early.
> >
>
> I'm trying to identify the source code which needs modification but I need help proceeding.
>
> I am comparing two compilations: The first is a simple file with a call to sin in a loop.
> Vectorization succeeds. The second is an almost identical file but with a call to sincos
> in the loop. Vectorization fails.
>
> In gdb, the earliest code location where the two compilations differ is in function
> number_of_iterations_exit_assumptions in file tree-ssa-loop-niter.c. Line
>
> op0 = gimple_cond_lhs (stmt);
>
> returns a tree which when analyzed in function instantiate_scev_r (in file tree-scalar-evolution.c)
> results in the first branch of the switch being taken for sincos. For sin, the 2nd branch of the
> switch is taken.
>
> How can I correlate stmt in the source line above to the relevant line in any dump among those created
> using debugging dump option -fdump-tree-all?

grep ;)

Can you provide a testcase with a simd attribute annotated cexpi that
one can play with?

Richard.

>
> Thanks.
> Bert.

Re: PPC64 libmvec implementation of sincos

GT

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Thursday, December 5, 2019 4:44 AM, Richard Biener <[hidden email]> wrote:

...
...
...

> >
> > I'm trying to identify the source code which needs modification but I need help proceeding.
> > I am comparing two compilations: The first is a simple file with a call to sin in a loop.
> > Vectorization succeeds. The second is an almost identical file but with a call to sincos
> > in the loop. Vectorization fails.
> > In gdb, the earliest code location where the two compilations differ is in function
> > number_of_iterations_exit_assumptions in file tree-ssa-loop-niter.c. Line
> > op0 = gimple_cond_lhs (stmt);
> > returns a tree which when analyzed in function instantiate_scev_r (in file tree-scalar-evolution.c)
> > results in the first branch of the switch being taken for sincos. For sin, the 2nd branch of the
> > switch is taken.
> > How can I correlate stmt in the source line above to the relevant line in any dump among those created
> > using debugging dump option -fdump-tree-all?
>
> grep ;)
>
> Can you provide a testcase with a simd attribute annotated cexpi that
> one can play with?
>

On an x86_64 system, run Example 2 at this link:

sourceware.org/glibc/wiki/libmvec

After verifying vectorization (by finding a name with prefix _ZGV and suffix _sin in a.out), replace
the call to sin by one to sincos. The file should be similar to this:

================

#include <math.h>

int N = 3200;
double c[3200];
double b[3200];
double a[3200];

int main (void)
{
  int i;

  for (i = 0; i < N; i += 1)
  {
    sincos (a[i], &b[i], &c[i]);
  }

  return (0);
}

================

In addition to the options shown in Example 2, I passed GCC flags -fopt-info-all, -fopt-info-internals and
-fdump-tree-all to obtain more verbose messages.

That should show vectorization failing for sincos, and diagnostics on the screen indicating reason(s) for
the failure.

Performing the runs on PPC64 requires building both GCC and GLIBC with modifications not yet accepted
into the main development branches of the projects.

Please let me know if you are able to run on x86_64; if not, then perhaps I can push the local GCC
changes to a GitHub repository. GLIBC changes are available at branch tuliom/libmvec of the
development repository.

Bert.

Re: PPC64 libmvec implementation of sincos

Richard Biener-2
On Thu, Dec 5, 2019 at 6:45 PM GT <[hidden email]> wrote:

>
>
> ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
> On Thursday, December 5, 2019 4:44 AM, Richard Biener <[hidden email]> wrote:
>
> ...
> ...
> ...
>
> > >
> > > I'm trying to identify the source code which needs modification but I need help proceeding.
> > > I am comparing two compilations: The first is a simple file with a call to sin in a loop.
> > > Vectorization succeeds. The second is an almost identical file but with a call to sincos
> > > in the loop. Vectorization fails.
> > > In gdb, the earliest code location where the two compilations differ is in function
> > > number_of_iterations_exit_assumptions in file tree-ssa-loop-niter.c. Line
> > > op0 = gimple_cond_lhs (stmt);
> > > returns a tree which when analyzed in function instantiate_scev_r (in file tree-scalar-evolution.c)
> > > results in the first branch of the switch being taken for sincos. For sin, the 2nd branch of the
> > > switch is taken.
> > > How can I correlate stmt in the source line above to the relevant line in any dump among those created
> > > using debugging dump option -fdump-tree-all?
> >
> > grep ;)
> >
> > Can you provide a testcase with a simd attribute annotated cexpi that
> > one can play with?
> >
>
> On an x86_64 system, run Example 2 at this link:
>
> sourceware.org/glibc/wiki/libmvec
>
> After verifying vectorization (by finding a name with prefix _ZGV and suffix _sin in a.out), replace
> the call to sin by one to sincos. The file should be similar to this:
>
> ================
>
> #include <math.h>
>
> int N = 3200;
> double c[3200];
> double b[3200];
> double a[3200];
>
> int main (void)
> {
>   int i;
>
>   for (i = 0; i < N; i += 1)
>   {
>     sincos (a[i], &b[i], &c[i]);
>   }
>
>   return (0);
> }
>
> ================
>
> In addition to the options shown in Example 2, I passed GCC flags -fopt-info-all, -fopt-info-internals and
> -fdump-tree-all to obtain more verbose messages.
>
> That should show vectorization failing for sincos, and diagnostics on the screen indicating reason(s) for
> the failure.
>
> Performing the runs on PPC64 requires building both GCC and GLIBC with modifications not yet accepted
> into the main development branches of the projects.
>
> Please let me know if you are able to run on x86_64; if not, then perhaps I can push the local GCC
> changes to a GitHub repository. GLIBC changes are available at branch tuliom/libmvec of the
> development repository.

So I used

void sincos(double x, double *sin, double *cos);
_Complex double __attribute__((__simd__("notinbranch")))
__builtin_cexpi (double);

int N = 3200;
double c[3200];
double b[3200];
double a[3200];

int main (void)
{
  int i;

  for (i = 0; i < N; i += 1)
  {
    sincos (a[i], &b[i], &c[i]);
  }

  return (0);
}

and get

t.c:2:58: warning: unsupported return type ‘complex double’ for simd

so I suppose that would need fixing / ABI adjustments.  Then vectorization
fails with the expected

t.c:13:3: note:   ==> examining statement: _8 = __builtin_cexpi (_1);
t.c:13:3: note:   get vectype for scalar type: complex double
t.c:15:5: missed:   not vectorized: unsupported data-type complex double
t.c:13:3: missed:  can't determine vectorization factor.

For the ABI thing the alternative is to go with "something" for sincos
and have the vectorizer query that something at cexpi vectorization
time, emitting code for that ABI.

But of course the vectorizer needs to be taught to deal with the cexpi
call in the IL, which was very low priority because there wasn't any
SIMD implementation of sincos (with whatever ABI). I can help with
that to some extent, but I wonder what OpenMP says about _Complex
types and simd functions for those? Jakub?

Richard.

> Bert.

Re: PPC64 libmvec implementation of sincos

Jakub Jelinek
On Fri, Dec 06, 2019 at 11:48:03AM +0100, Richard Biener wrote:
> So I used
>
> void sincos(double x, double *sin, double *cos);
> _Complex double __attribute__((__simd__("notinbranch")))
> __builtin_cexpi (double);

While Intel-ABI-Vector-Function-2015-v0.9.8.pdf talks about complex numbers,
the reason we punt:
unsupported return type ‘complex double’ for simd
etc. is that we really don't support VECTOR_TYPE with COMPLEX_TYPE element
type, I guess the vectorizer doesn't do anything with that either unless
some earlier optimization was able to scalarize the complex halves.
In theory we could represent the vector counterparts of complex types
as just vectors of double width with element type of COMPLEX_TYPE element
type, have a look at what exactly ICC does to find out if the vector
ordering is real0 complex0 real1 complex1 ... or
real0 real1 real2 ... complex0 complex1 complex2 ...
and tweak everything that needs to cope.

        Jakub
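The two orderings Jakub lists differ only in where the 2n doubles of n complex lanes go ("complex0", "complex1" there denote the imaginary parts). A small sketch converting between them, with illustrative function names; which layout ICC or the vector function ABI actually uses is the open question in the message above, so neither direction should be read as the answer. The interleaved form is also how a plain `_Complex double` array is laid out in memory.

```c
#include <assert.h>

/* Interleaved: re0 im0 re1 im1 ...  (array-of-_Complex layout)
   Split:       re0 re1 ... im0 im1 ... (all real parts first)  */

/* Hypothetical helper: interleaved[2n] -> split[2n]. */
static void interleaved_to_split(const double *in, double *out, int n)
{
  assert(n >= 0);
  for (int k = 0; k < n; k++)
    {
      out[k] = in[2 * k];          /* real part of lane k */
      out[n + k] = in[2 * k + 1];  /* imaginary part of lane k */
    }
}

/* Hypothetical helper: split[2n] -> interleaved[2n]. */
static void split_to_interleaved(const double *in, double *out, int n)
{
  assert(n >= 0);
  for (int k = 0; k < n; k++)
    {
      out[2 * k] = in[k];          /* real part of lane k */
      out[2 * k + 1] = in[n + k];  /* imaginary part of lane k */
    }
}
```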


Re: PPC64 libmvec implementation of sincos

Richard Biener-2
On Fri, Dec 6, 2019 at 12:15 PM Jakub Jelinek <[hidden email]> wrote:

>
> On Fri, Dec 06, 2019 at 11:48:03AM +0100, Richard Biener wrote:
> > So I used
> >
> > void sincos(double x, double *sin, double *cos);
> > _Complex double __attribute__((__simd__("notinbranch")))
> > __builtin_cexpi (double);
>
> While Intel-ABI-Vector-Function-2015-v0.9.8.pdf talks about complex numbers,
> the reason we punt:
> unsupported return type ‘complex double’ for simd
> etc. is that we really don't support VECTOR_TYPE with COMPLEX_TYPE element
> type, I guess the vectorizer doesn't do anything with that either unless
> some earlier optimization was able to scalarize the complex halves.
> In theory we could represent the vector counterparts of complex types
> as just vectors of double width with element type of COMPLEX_TYPE element
> type, have a look at what exactly ICC does to find out if the vector
> ordering is real0 complex0 real1 complex1 ... or
> real0 real1 real2 ... complex0 complex1 complex2 ...
> and tweak everything that needs to cope.

I hope real0 complex0, ...

Anyway, the first step is to support vectorizing code where parts of it are
already vectors:

typedef double v2df __attribute__((vector_size(16)));
#define N 1024
v2df a[N];
double b[N];
double c[N];
void foo()
{
  for (int i = 0; i < N; ++i)
    {
      v2df tem = a[i];
      b[i] = tem[0];
      c[i] = tem[1];
    }
}

that can be "re-vectorized" for AVX for example.  If you substitute
_Complex double for the vector type we only handle it during
vectorization because forwprop combines the load and the
__real/imag which helps.

Richard.

>         Jakub
>
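For reference, this is the `_Complex double` substitution Richard describes: the same loop shape, with each `a[i]` holding a real and an imaginary part instead of two vector lanes. `__real__`/`__imag__` are GCC extensions. Whether a particular GCC version vectorizes this loop is not shown here; that can be checked with `-fopt-info-vec`.

```c
#include <assert.h>

#define N 1024

/* A _Complex double is laid out like double[2] (real part first),
   i.e. the same 16 bytes as the v2df element in the loop above. */
static_assert(sizeof(_Complex double) == 2 * sizeof(double),
              "_Complex double is two packed doubles");

_Complex double a[N];
double b[N];
double c[N];

void foo (void)
{
  for (int i = 0; i < N; ++i)
    {
      _Complex double tem = a[i];
      b[i] = __real__ tem;   /* plays the role of tem[0] */
      c[i] = __imag__ tem;   /* plays the role of tem[1] */
    }
}
```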

Re: PPC64 libmvec implementation of sincos

GT
‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Friday, December 6, 2019 6:38 AM, Richard Biener <[hidden email]> wrote:

> On Fri, Dec 6, 2019 at 12:15 PM Jakub Jelinek [hidden email] wrote:
>
> > On Fri, Dec 06, 2019 at 11:48:03AM +0100, Richard Biener wrote:
> >
> > > So I used
> > > void sincos(double x, double *sin, double *cos);
> > > _Complex double __attribute__((__simd__("notinbranch")))
> > > __builtin_cexpi (double);
> >
> > While Intel-ABI-Vector-Function-2015-v0.9.8.pdf talks about complex numbers,
> > the reason we punt:
> > unsupported return type ‘complex double’ for simd
> > etc. is that we really don't support VECTOR_TYPE with COMPLEX_TYPE element
> > type, I guess the vectorizer doesn't do anything with that either unless
> > some earlier optimization was able to scalarize the complex halves.
> > In theory we could represent the vector counterparts of complex types
> > as just vectors of double width with element type of COMPLEX_TYPE element
> > type, have a look at what exactly ICC does to find out if the vector
> > ordering is real0 complex0 real1 complex1 ... or
> > real0 real1 real2 ... complex0 complex1 complex2 ...
> > and tweak everything that needs to cope.
>
> I hope real0 complex0, ...
>
> Anyway, the first step is to support vectorizing code where parts of it are
> already vectors:
>
> typedef double v2df __attribute__((vector_size(16)));
> #define N 1024
> v2df a[N];
> double b[N];
> double c[N];
> void foo()
> {
>   for (int i = 0; i < N; ++i)
>     {
>       v2df tem = a[i];
>       b[i] = tem[0];
>       c[i] = tem[1];
>     }
> }
>
> that can be "re-vectorized" for AVX for example. If you substitute
> _Complex double for the vector type we only handle it during
> vectorization because forwprop combines the load and the
> __real/imag which helps.
>

Are we certain the change we want is to support _Complex double so that cexpi is auto-vectorized?
Looking at the resulting executable of the code with sincos in the loop, the only function called
is sincos. Not builtin_cexpi or any variant of cexpi. File gcc/builtins.c expands calls to builtin_cexpi
to sincos! What is gained by the compiler going through the transformations sincos -> builtin_cexpi ->
sincos?

Bert.

Re: PPC64 libmvec implementation of sincos

Richard Biener-2
On December 6, 2019 5:50:25 PM GMT+01:00, GT <[hidden email]> wrote:

>‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
>On Friday, December 6, 2019 6:38 AM, Richard Biener <[hidden email]> wrote:
>
>> On Fri, Dec 6, 2019 at 12:15 PM Jakub Jelinek [hidden email] wrote:
>>
>> > On Fri, Dec 06, 2019 at 11:48:03AM +0100, Richard Biener wrote:
>> >
>> > > So I used
>> > > void sincos(double x, double *sin, double *cos);
>> > > _Complex double __attribute__((__simd__("notinbranch")))
>> > > __builtin_cexpi (double);
>> >
>> > While Intel-ABI-Vector-Function-2015-v0.9.8.pdf talks about complex numbers,
>> > the reason we punt:
>> > unsupported return type ‘complex double’ for simd
>> > etc. is that we really don't support VECTOR_TYPE with COMPLEX_TYPE element
>> > type, I guess the vectorizer doesn't do anything with that either unless
>> > some earlier optimization was able to scalarize the complex halves.
>> > In theory we could represent the vector counterparts of complex types
>> > as just vectors of double width with element type of COMPLEX_TYPE element
>> > type, have a look at what exactly ICC does to find out if the vector
>> > ordering is real0 complex0 real1 complex1 ... or
>> > real0 real1 real2 ... complex0 complex1 complex2 ...
>> > and tweak everything that needs to cope.
>>
>> I hope real0 complex0, ...
>>
>> Anyway, the first step is to support vectorizing code where parts of it are
>> already vectors:
>>
>> typedef double v2df __attribute__((vector_size(16)));
>> #define N 1024
>> v2df a[N];
>> double b[N];
>> double c[N];
>> void foo()
>> {
>>   for (int i = 0; i < N; ++i)
>>     {
>>       v2df tem = a[i];
>>       b[i] = tem[0];
>>       c[i] = tem[1];
>>     }
>> }
>>
>> that can be "re-vectorized" for AVX for example. If you substitute
>> _Complex double for the vector type we only handle it during
>> vectorization because forwprop combines the load and the
>> __real/imag which helps.
>>
>
>Are we certain the change we want is to support _Complex double so that cexpi is auto-vectorized?
>Looking at the resulting executable of the code with sincos in the loop, the only function called
>is sincos. Not builtin_cexpi or any variant of cexpi. File gcc/builtins.c expands calls to builtin_cexpi
>to sincos! What is gained by the compiler going through the transformations sincos -> builtin_cexpi ->
>sincos?

Yes, we want to support vectorizing cexpi because that is what the compiler will lower sincos to. The sincos API is painful to deal with due to the data dependences it introduces. Now, the vectorizer can of course emit calls to a vectorized sincos; it just needs to be able to deal with cexpi input IL.

Richard.

>Bert.
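One way to picture "emit calls to a vectorized sincos while dealing with cexpi input IL": a two-lane vector sincos yields separate sin and cos vectors, while a vectorized cexpi result would hold the same values as complex lanes (real part cos, imaginary part sin). The sketch shows the lane shuffles in both directions, assuming the interleaved layout Richard hopes for (real0 imag0 real1 imag1). The function names are hypothetical, no real libmvec symbol is called, and the result is written through an array so no 32-byte vector is passed by value.

```c
#include <assert.h>

typedef double v2df __attribute__((vector_size(16)));

/* Hypothetical glue: sin lanes s and cos lanes c (as a two-lane vector
   sincos would produce them) -> interleaved cexpi lanes
   { cos(x0), sin(x0), cos(x1), sin(x1) }. */
static void cexpi_from_sincos(v2df s, v2df c, double out[4])
{
  out[0] = c[0]; out[1] = s[0];
  out[2] = c[1]; out[3] = s[1];
}

/* Hypothetical inverse: interleaved cexpi lanes -> sin and cos vectors,
   i.e. the IMAGPART/REALPART extraction done lane-wise. */
static void sincos_from_cexpi(const double in[4], v2df *s, v2df *c)
{
  assert(s != 0 && c != 0);
  v2df sv = { in[1], in[3] };
  v2df cv = { in[0], in[2] };
  *s = sv;
  *c = cv;
}
```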


Re: PPC64 libmvec implementation of sincos

GT
‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Friday, December 6, 2019 12:43 PM, Richard Biener [hidden email] wrote:

...
...

> > Are we certain the change we want is to support _Complex double so that cexpi is auto-vectorized?
> > Looking at the resulting executable of the code with sincos in the loop, the only function called
> > is sincos. Not builtin_cexpi or any variant of cexpi. File gcc/builtins.c expands calls to builtin_cexpi
> > to sincos! What is gained by the compiler going through the transformations sincos -> builtin_cexpi ->
> > sincos?
>
> Yes, we want to support vectorizing cexpi because that is what the compiler will lower sincos to. The sincos API is painful to deal with due to the data dependences it introduces. Now, the vectorizer can of course emit calls to a vectorized sincos; it just needs to be able to deal with cexpi input IL.
>
> Richard.

I'm modifying the code trying to get complex double accepted as a valid type by the vectorizer.
This is the first time I'm dealing with GCC source so I ask for some patience.

Function mode_for_vector in gcc/stor-layout.c requires a new else-if for complex double. I cannot
seem to find a header file where MIN_MODE_VECTOR_FLOAT and similar macros are defined. I expect
a new MIN_MODE_COMPLEX_VECTOR_FLOAT to be defined in the same file as the existing similar macros.
How do I go about making this change?

Thanks.
Bert.

Re: PPC64 libmvec implementation of sincos

Richard Biener-2
On Sun, Dec 8, 2019 at 10:40 PM GT <[hidden email]> wrote:

>
> ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
> On Friday, December 6, 2019 12:43 PM, Richard Biener [hidden email] wrote:
>
> ...
> ...
>
> > > Are we certain the change we want is to support _Complex double so that
> > > cexpi is auto-vectorized?
> > > Looking at the resulting executable of the code with sincos in the
> > > loop, the only function called is sincos, not builtin_cexpi or any
> > > variant of cexpi. File
> > > gcc/builtins.c expands calls to builtin_cexpi
> > > to sincos! What is gained by the compiler going through the
> > > transformations sincos -> builtin_cexpi ->
> > > sincos?
> >
> > Yes, we want to support vectorizing cexpi because that is what the compiler will lower sincos to. The sincos API is painful to deal with due to the data dependences it introduces. Now, the vectorizer can of course emit calls to a vectorized sincos; it just needs to be able to deal with cexpi input IL.
> >
> > Richard.
>
> I'm modifying the code trying to get complex double accepted as a valid type by the vectorizer.
> This is the first time I'm dealing with GCC source so I ask for some patience.
>
> Function mode_for_vector in gcc/stor-layout.c requires a new else-if for complex double. I cannot
> seem to find a header file where MIN_MODE_VECTOR_FLOAT and similar macros are defined. I expect
> a new MIN_MODE_COMPLEX_VECTOR_FLOAT to be defined in the same file as the existing similar macros.
> How do I go about making this change?

You don't want to do it this way; instead, map _Complex double to a vector
of 2 * n doubles.
Look into get_related_vectype_for_scalar_type, which already has code to
"change" the scalar type into something that fits what we allow for vectors.

Richard.

> Thanks.
> Bert.
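A small C sketch of the layout behind the "2 * n doubles" suggestion, assuming GCC's vector extensions (the vector_size attribute and subscripting of vector values). v4df, load_complex_pair and split_lanes are hypothetical names. The sketch shows only the data layout, not how get_related_vectype_for_scalar_type is implemented. Because C guarantees that a _Complex double is laid out like two doubles (real part, then imaginary part), n complex values can be copied directly into a vector of 2 * n doubles. The real and imaginary parts (cos and sin, for a cexpi result) then sit in the even and odd lanes.

```c
#include <complex.h>
#include <string.h>

/* GCC vector extension type holding four doubles (name chosen for
   illustration).  Four doubles hold two _Complex double values,
   i.e. the "2 * n doubles" mapping with n = 2. */
typedef double v4df __attribute__ ((vector_size (4 * sizeof (double))));

/* C11 6.2.5p13: a complex type has the same representation and
   alignment as a two-element array of its real type, real part
   first.  So n complex values occupy exactly 2 * n doubles. */
void load_complex_pair (const double _Complex *p, v4df *out)
{
  memcpy (out, p, sizeof *out);   /* lanes: re0, im0, re1, im1 */
}

/* Distribute the interleaved lanes to separate arrays: even lanes
   are real parts (cos for cexpi), odd lanes are imaginary parts
   (sin for cexpi). */
void split_lanes (const v4df *v, double re[2], double im[2])
{
  for (int k = 0; k < 2; k++)
    {
      re[k] = (*v)[2 * k];
      im[k] = (*v)[2 * k + 1];
    }
}
```

On a target with 128-bit vectors such as VSX, the four doubles of one v4df would be split across two vector registers, one complex value per register.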

Re: PPC64 libmvec implementation of sincos

GT
‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Monday, December 9, 2019 3:39 AM, Richard Biener [hidden email] wrote:

> > I'm modifying the code trying to get complex double accepted as a valid type by the vectorizer.
> > This is the first time I'm dealing with GCC source so I ask for some patience.
> > Function mode_for_vector in gcc/stor-layout.c requires a new else-if for complex double. I cannot
> > seem to find a header file where MIN_MODE_VECTOR_FLOAT and similar macros are defined. I expect
> > a new MIN_MODE_COMPLEX_VECTOR_FLOAT to be defined in the same file as the existing similar macros.
> > How do I go about making this change?
>
> You don't want to do it this way; instead, map _Complex double to a vector
> of 2 * n doubles.
> Look into get_related_vectype_for_scalar_type, which already has code to
> "change" the scalar type into something that fits what we allow for vectors.
>

Function get_related_vectype_for_scalar_type doesn't exist. There is one named
get_vectype_for_scalar_type, which in turn calls get_vectype_for_scalar_type_and_size. In that
last function I have already made 2 changes to prevent NULL_TREE from being returned for _Complex double:

1.  In the first if statement of the function, I added a new condition !is_complex_float_mode (...),
    with arguments identical to those of the existing !is_int_mode and !is_float_mode conditions.

2.  In the 2nd if statement, the else-if has a new condition !COMPLEX_FLOAT_TYPE_P (scalar_type).

After those changes, NULL_TREE is still returned by a clause of the if statement whose first
condition is known_eq (size, 0U): the 2nd part of its else-if, !mode_for_vector (...), evaluates
to true.

Unless the correct path involves a call similar to build_nonstandard_integer_type in the
2nd if statement, I still need the change to mode_for_vector described in my last post.

Bert.

Re: PPC64 libmvec implementation of sincos

GT
‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Monday, December 9, 2019 12:36 PM, GT <[hidden email]> wrote:

> ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
> On Monday, December 9, 2019 3:39 AM, Richard Biener [hidden email] wrote:
>
> > > I'm modifying the code trying to get complex double accepted as a valid type by the vectorizer.
> > > This is the first time I'm dealing with GCC source so I ask for some patience.
> > > Function mode_for_vector in gcc/stor-layout.c requires a new else-if for complex double. I cannot
> > > seem to find a header file where MIN_MODE_VECTOR_FLOAT and similar macros are defined. I expect
> > > a new MIN_MODE_COMPLEX_VECTOR_FLOAT to be defined in the same file as the existing similar macros.
> > > How do I go about making this change?
> >
> > You don't want to do it this way; instead, map _Complex double to a vector
> > of 2 * n doubles.
> > Look into get_related_vectype_for_scalar_type, which already has code to
> > "change" the scalar type into something that fits what we allow for vectors.
>
> Function get_related_vectype_for_scalar_type doesn't exist. There is one named
> get_vectype_for_scalar_type, which in turn calls get_vectype_for_scalar_type_and_size. In that
> last function I have already made 2 changes to prevent NULL_TREE from being returned for _Complex double:
>
> 1.  In the first if statement of the function, I added a new condition !is_complex_float_mode (...),
>     with arguments identical to those of the existing !is_int_mode and !is_float_mode conditions.
>
> 2.  In the 2nd if statement, the else-if has a new condition !COMPLEX_FLOAT_TYPE_P (scalar_type).
>
> After those changes, NULL_TREE is still returned by a clause of the if statement whose first
> condition is known_eq (size, 0U): the 2nd part of its else-if, !mode_for_vector (...), evaluates
> to true.
>
> Unless the correct path involves a call similar to build_nonstandard_integer_type in the
> 2nd if statement, I still need the change to mode_for_vector described in my last post.
>
> Bert.
>

Please disregard my previous post. I was using an outdated repository; after updating it,
I can see function get_related_vectype_for_scalar_type in the code.

Bert.