Possibly latent issue with combine ?

classic Classic list List threaded Threaded
4 messages Options
Reply | Threaded
Open this post in threaded view
|

Possibly latent issue with combine ?

Prathamesh Kulkarni-2
Hi,
For following test-case, taken from pr88152.C:

#include <x86intrin.h>

template <typename T, size_t N>
using V [[gnu::vector_size(N)]] = T;

int f10 (V<unsigned long long, 16> a)
{
  return _mm_movemask_pd (reinterpret_cast<__m128d> (a > __LONG_LONG_MAX__));
}

.optimized dump shows:

f10 (V a)
{
  vector(2) signed long _1;
  vector(2) long int _2;
  vector(2) double _3;
  int _6;

  <bb 2> [local count: 1073741824]:
  _1 = VIEW_CONVERT_EXPR<vector(2) signed long>(a_4(D));
  _2 = VEC_COND_EXPR <_1 < { 0, 0 }, { -1, -1 }, { 0, 0 }>;
  _3 = VIEW_CONVERT_EXPR<__m128d>(_2);
  _6 = __builtin_ia32_movmskpd (_3); [tail call]
  return _6;

}

IIUC, we're using -1 to represent true and 0 as false.
combine then does following combinations:

Trying 7 -> 9:
    7: r90:V2DI=r89:V2DI>r93:V2DI
      REG_DEAD r93:V2DI
      REG_DEAD r89:V2DI
    9: r91:V2DF=r90:V2DI#0
      REG_DEAD r90:V2DI
Successfully matched this instruction:
(set (subreg:V2DI (reg:V2DF 91) 0)
    (gt:V2DI (reg:V2DI 89)
        (reg:V2DI 93)))
allowing combination of insns 7 and 9

Trying 6, 9 -> 10:
    6: r89:V2DI=const_vector
    9: r91:V2DF#0=r89:V2DI>r93:V2DI
      REG_DEAD r89:V2DI
      REG_DEAD r93:V2DI
   10: r87:SI=unspec[r91:V2DF] 43
      REG_DEAD r91:V2DF
Successfully matched this instruction:
(set (reg:SI 87)
    (unspec:SI [
            (lt:V2DF (reg:V2DI 93)
                (const_vector:V2DI [
                        (const_int 0 [0]) repeated x2
                    ]))
        ] UNSPEC_MOVMSK))

Is the above folding correct, since lt has V2DF mode,
and casting -1 (literally) to DFmode would result in -NaN ?
Also, should result of lt be having only integral modes ?

split2 then folds insn 10 into:

(insn 22 9 16 2 (set (reg:SI 0 ax [87])
        (unspec:SI [
                (reg:V2DF 20 xmm0 [93])
            ] UNSPEC_MOVMSK))
"../../stage1-build/gcc/include/emmintrin.h":958:34 4222
{sse2_movmskpd}
     (nil))

deleting insn 10.

The issue is my patch for PR88833 results in following propagation in forwprop1:

In insn 10, replacing
 (unspec:SI [
            (reg:V2DF 91)
        ] UNSPEC_MOVMSK)
 with (unspec:SI [
            (subreg:V2DF (reg:V2DI 90) 0)
        ] UNSPEC_MOVMSK)

deleting insn 9 and this inhibits the above combinations,
resulting in failure of PR88152.C

With patch, combine shows:
Trying 7 -> 10:
    7: r90:V2DI=r89:V2DI>r93:V2DI
      REG_DEAD r93:V2DI
      REG_DEAD r89:V2DI
   10: r87:SI=unspec[r90:V2DI#0] 43
      REG_DEAD r90:V2DI
Failed to match this instruction:
(set (reg:SI 87)
    (unspec:SI [
            (subreg:V2DF (gt:V2DI (reg:V2DI 89)
                    (reg:V2DI 93)) 0)
        ] UNSPEC_MOVMSK))

and subsequently fails to match 6, 7 -> 10

Patch:
http://people.linaro.org/~prathamesh.kulkarni/pr88833-10.diff

Upstream discussion about the issue:
https://gcc.gnu.org/ml/gcc-patches/2019-06/msg01651.html

Thanks,
Prathamesh
Reply | Threaded
Open this post in threaded view
|

Re: Possibly latent issue with combine ?

Segher Boessenkool
On Wed, Jun 26, 2019 at 07:27:20PM +0530, Prathamesh Kulkarni wrote:

> combine then does following combinations:
>
> Trying 7 -> 9:
>     7: r90:V2DI=r89:V2DI>r93:V2DI
>       REG_DEAD r93:V2DI
>       REG_DEAD r89:V2DI
>     9: r91:V2DF=r90:V2DI#0
>       REG_DEAD r90:V2DI
> Successfully matched this instruction:
> (set (subreg:V2DI (reg:V2DF 91) 0)
>     (gt:V2DI (reg:V2DI 89)
>         (reg:V2DI 93)))
> allowing combination of insns 7 and 9
>
> Trying 6, 9 -> 10:
>     6: r89:V2DI=const_vector
>     9: r91:V2DF#0=r89:V2DI>r93:V2DI
>       REG_DEAD r89:V2DI
>       REG_DEAD r93:V2DI
>    10: r87:SI=unspec[r91:V2DF] 43
>       REG_DEAD r91:V2DF
> Successfully matched this instruction:
> (set (reg:SI 87)
>     (unspec:SI [
>             (lt:V2DF (reg:V2DI 93)
>                 (const_vector:V2DI [
>                         (const_int 0 [0]) repeated x2
>                     ]))
>         ] UNSPEC_MOVMSK))

Both of these are obviously correct, they are simple substitutions.
If this does not do what you want, the problem is elsewhere, not in
combine, afaics?

> Is the above folding correct, since lt has V2DF mode,
> and casting -1 (literally) to DFmode would result in -NaN ?

Combine does not introduce any of that, it was there already.

> Also, should result of lt be having only integral modes ?

Apparently your machine description likes this fine.  Combine does not
ask questions.


Segher
Reply | Threaded
Open this post in threaded view
|

Re: Possibly latent issue with combine ?

Richard Sandiford-9
Segher Boessenkool <[hidden email]> writes:

> On Wed, Jun 26, 2019 at 07:27:20PM +0530, Prathamesh Kulkarni wrote:
>> combine then does following combinations:
>>
>> Trying 7 -> 9:
>>     7: r90:V2DI=r89:V2DI>r93:V2DI
>>       REG_DEAD r93:V2DI
>>       REG_DEAD r89:V2DI
>>     9: r91:V2DF=r90:V2DI#0
>>       REG_DEAD r90:V2DI
>> Successfully matched this instruction:
>> (set (subreg:V2DI (reg:V2DF 91) 0)
>>     (gt:V2DI (reg:V2DI 89)
>>         (reg:V2DI 93)))
>> allowing combination of insns 7 and 9
>>
>> Trying 6, 9 -> 10:
>>     6: r89:V2DI=const_vector
>>     9: r91:V2DF#0=r89:V2DI>r93:V2DI
>>       REG_DEAD r89:V2DI
>>       REG_DEAD r93:V2DI
>>    10: r87:SI=unspec[r91:V2DF] 43
>>       REG_DEAD r91:V2DF
>> Successfully matched this instruction:
>> (set (reg:SI 87)
>>     (unspec:SI [
>>             (lt:V2DF (reg:V2DI 93)
>>                 (const_vector:V2DI [
>>                         (const_int 0 [0]) repeated x2
>>                     ]))
>>         ] UNSPEC_MOVMSK))
>
> Both of these are obviously correct, they are simple substitutions.
> If this does not do what you want, the problem is elsewhere, not in
> combine, afaics?

"Obviously" correct seems a stretch :-)  We can only fold:

  (subreg:V2DF (foo:V2DI X) 0)

to:

  (foo:V2DF X)

for certain operations.  E.g. it'd be wrong to do it for foo=plus.
IMO it's wrong for comparisons too.  A comparison between integers
that produces a floating-point result makes no sense, whatever the
target thinks about it.

>> Is the above folding correct, since lt has V2DF mode,
>> and casting -1 (literally) to DFmode would result in -NaN ?
>
> Combine does not introduce any of that, it was there already.

The original insns had an lt:V2DI between V2DI inputs and a V2DF
subreg of the result.  It's combine that turns that into a lt:V2DF
between V2DI inputs.

Richard

>> Also, should result of lt be having only integral modes ?
>
> Apparently your machine description likes this fine.  Combine does not
> ask questions.
>
>
> Segher
Reply | Threaded
Open this post in threaded view
|

Re: Possibly latent issue with combine ?

Segher Boessenkool
On Wed, Jun 26, 2019 at 05:45:48PM +0100, Richard Sandiford wrote:

> "Obviously" correct seems a stretch :-)  We can only fold:
>
>   (subreg:V2DF (foo:V2DI X) 0)
>
> to:
>
>   (foo:V2DF X)
>
> for certain operations.
>
> E.g. it'd be wrong to do it for foo=plus.

You would need to change X then, sure, so you cannot get that by doing a
simple substitution.  But this is lt, and it makes (structurally) perfect
sense here, the mode of lt does not depend on the mode of its args.  The
target should refuse it if it doesn't like it.  Simply by not having too
lenient patterns in the machine descriptions, probably.

> IMO it's wrong for comparisons too.  A comparison between integers
> that produces a floating-point result makes no sense, whatever the
> target thinks about it.

Then the target should not say it makes sense?

> >> Is the above folding correct, since lt has V2DF mode,
> >> and casting -1 (literally) to DFmode would result in -NaN ?
> >
> > Combine does not introduce any of that, it was there already.
>
> The original insns had an lt:V2DI between V2DI inputs and a V2DF
> subreg of the result.  It's combine that turns that into a lt:V2DF
> between V2DI inputs.

Combine did only simple substitution as far as I can see.


Segher