[PATCH][vect]Account for epilogue's peeling for gaps when checking if we have enough niters for epilogue

Andre Vieira (lists)
Hi,

As I mentioned in the patch to disable epilogue vectorization for loops
with SIMDUID set, there were still some aarch64 libgomp failures. This
patch fixes those.

The problem was that we were vectorizing a reduction that was only using
one of the parts from a complex number, creating data accesses with
gaps. For this we set PEELING_FOR_GAPS which forces us to peel an extra
scalar iteration.
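For illustration, here is a minimal sketch of the kind of loop described above. It is not the actual testcase from the patch; the `struct cplx` layout and the name `sum_real` are assumptions made up for this example. The reduction reads only the real part, so the loads skip every other element, which is the "access with gaps" pattern.

```c
#include <assert.h>

/* Hypothetical layout: a complex number as two adjacent doubles.  */
struct cplx { double re; double im; };

/* Reduction over the real parts only.  The imaginary parts are never
   read, so the access pattern has a gap of one element per iteration.  */
static double
sum_real (const struct cplx *a, int n)
{
  assert (n >= 0);
  double sum = 0.0;
  for (int i = 0; i < n; i++)
    sum += a[i].re;
  return sum;
}
```

A vector load covering the last few `re` values may also touch the `im` slot after the final element, which is presumably why the last iteration has to be done in scalar code.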

In the testcase I looked at, niters was known to be 10. The first (main
loop) VF was 4, leaving 10 % 4 = 2 scalar iterations. The epilogue had
VF 2, so the current code thought we could vectorize it. However,
because of PEELING_FOR_GAPS the vectorized epilogue would itself need a
scalar epilogue, and we would end up doing too many iterations:
surprisingly 12, as I think the code assumed we hadn't created said
epilogue.
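The arithmetic above can be sketched as a small check. This is only a simplified model of the condition, not the code in vect_do_peeling; the function name and parameters are hypothetical, and it ignores any peeling for gaps that the main loop itself might need.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: are there enough iterations left after the main
   vector loop for a vector epilogue to run at least once?  If the
   epilogue needs peeling for gaps, one extra iteration has to be kept
   back for the scalar loop that follows it.  */
static bool
epilogue_has_enough_iters (unsigned niters, unsigned main_vf,
                           unsigned epilogue_vf, bool peeling_for_gaps)
{
  assert (main_vf > 0 && epilogue_vf > 0);
  unsigned left = niters % main_vf;
  unsigned needed = epilogue_vf + (peeling_for_gaps ? 1 : 0);
  return left >= needed;
}
```

With niters = 10, main VF 4 and epilogue VF 2, this gives 2 >= 2 when gaps are ignored but 2 >= 3 (false) once the extra scalar iteration is counted. With niters = 11 it gives 3 >= 3, so the epilogue can be vectorized and one scalar iteration remains.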

I ran a local check where I upped the iterations of the Fortran test to
11, and I see GCC vectorizing the epilogue with VF = 2 and a scalar
epilogue for one iteration, so that looks good too. I have transformed
it into a test that reproduces the issue in C and without OpenACC, so I
can run it in GCC's normal testsuite more easily.

Bootstrapped on aarch64 and x86_64.

Is this OK for trunk?

Cheers,
Andre

gcc/ChangeLog:
2019-11-08  Andre Vieira  <[hidden email]>

        * tree-vect-loop-manip.c (vect_do_peeling): Take epilogue gaps
        into account when checking if there are enough iterations to
        vectorize epilogue.

gcc/testsuite/ChangeLog:
2019-11-08  Andre Vieira  <[hidden email]>

        * gcc.dg/vect/vect-reduc-epilogue-gaps.c: New test.

Attachment: gaps.patch (2K)
Re: [PATCH][vect]Account for epilogue's peeling for gaps when checking if we have enough niters for epilogue

Richard Biener
On Fri, 8 Nov 2019, Andre Vieira (lists) wrote:

> Hi,
>
> As I mentioned in the patch to disable epilogue vectorization for loops with
> SIMDUID set, there were still some aarch64 libgomp failures. This patch fixes
> those.
>
> The problem was that we were vectorizing a reduction that was only using one
> of the parts from a complex number, creating data accesses with gaps. For this
> we set PEELING_FOR_GAPS which forces us to peel an extra scalar iteration.
>
> In the testcase I looked at, niters was known to be 10. The first (main
> loop) VF was 4, leaving 10 % 4 = 2 scalar iterations. The epilogue had VF 2,
> so the current code thought we could vectorize it. However, because of
> PEELING_FOR_GAPS the vectorized epilogue would itself need a scalar epilogue,
> and we would end up doing too many iterations: surprisingly 12, as I think
> the code assumed we hadn't created said epilogue.
>
> I ran a local check where I upped the iterations of the Fortran test to 11,
> and I see GCC vectorizing the epilogue with VF = 2 and a scalar epilogue for
> one iteration, so that looks good too. I have transformed it into a test that
> reproduces the issue in C and without OpenACC, so I can run it in GCC's
> normal testsuite more easily.
>
> Bootstrapped on aarch64 and x86_64.
>
> Is this OK for trunk?

OK.

Richard.

> Cheers,
> Andre
>
> gcc/ChangeLog:
> 2019-11-08  Andre Vieira  <[hidden email]>
>
> * tree-vect-loop-manip.c (vect_do_peeling): Take epilogue gaps
>         into account when checking if there are enough iterations to
>         vectorize epilogue.
>
> gcc/testsuite/ChangeLog:
> 2019-11-08  Andre Vieira  <[hidden email]>
>
> * gcc.dg/vect/vect-reduc-epilogue-gaps.c: New test.
>
>
--
Richard Biener <[hidden email]>
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)