DSE and maskstore trouble

classic Classic list List threaded Threaded
15 messages Options
Reply | Threaded
Open this post in threaded view
|

DSE and maskstore trouble

Andrew Stubbs
Hi All,

I'm trying to implement maskload/maskstore for AMD GCN, which has up-to
64-lane, 512-byte fully-masked vectors. All seems fine as far as the
vector operations themselves go, but I've found a problem with the RTL
Dead Store Elimination pass.

Testcase gcc.c-torture/execute/20050826-2.c uses a maskstore to write
the 14 DImode pointers all in one go. The problem is that DSE doesn't
know that the store is masked and judges the width at 512 bytes, not the
true 56 bytes. This leads it to eliminate prior writes to nearby stack
locations, and therefore bad code.

Has anyone encountered this problem with SVE or AVX maskstore at all?

I was thinking of solving the problem by adding a target hook to query
the true length of vector types. Any thoughts on that?

Andrew
Reply | Threaded
Open this post in threaded view
|

Re: DSE and maskstore trouble

Richard Biener-2
On Tue, Jul 3, 2018 at 11:59 AM Andrew Stubbs <[hidden email]> wrote:

>
> Hi All,
>
> I'm trying to implement maskload/maskstore for AMD GCN, which has up-to
> 64-lane, 512-byte fully-masked vectors. All seems fine as far as the
> vector operations themselves go, but I've found a problem with the RTL
> Dead Store Elimination pass.
>
> Testcase gcc.c-torture/execute/20050826-2.c uses a maskstore to write
> the 14 DImode pointers all in one go. The problem is that DSE doesn't
> know that the store is masked and judges the width at 512 bytes, not the
> true 56 bytes. This leads it to eliminate prior writes to nearby stack
> locations, and therefore bad code.
>
> Has anyone encountered this problem with SVE or AVX maskstore at all?

AVX ones are all UNSPECs I believe - how do your patterns look like?

> I was thinking of solving the problem by adding a target hook to query
> the true length of vector types. Any thoughts on that?

It isn't about the length but about the mask because there can be mask
values that do not affect the length?

Richard.

>
> Andrew
Reply | Threaded
Open this post in threaded view
|

Re: DSE and maskstore trouble

Andrew Stubbs
On 03/07/18 11:15, Richard Biener wrote:
> AVX ones are all UNSPECs I believe - how do your patterns look like?

AVX has both unspec and vec_merge variants (at least for define_expand,
in GCC8), but in any case, AFAICT dse.c only cares about the destination
MEM, and all the AVX and SVE patterns appear to use nothing special there.

>> I was thinking of solving the problem by adding a target hook to query
>> the true length of vector types. Any thoughts on that?
>
> It isn't about the length but about the mask because there can be mask
> values that do not affect the length?

The problem I have right now is that the vector write conflicts with
writes to distinct variables, in which case the vector length is what's
important, and it's probably(?) safe to assume that if the vector mask
is not constant then space for the whole vector has been allocated on
the stack.

But yes, in general it's true that subsequent writes to the same vector
could well write distinct elements, in which case the value of the mask
is significant to DSE analysis.

Andrew
Reply | Threaded
Open this post in threaded view
|

Re: DSE and maskstore trouble

Andrew Stubbs
On 03/07/18 11:33, Andrew Stubbs wrote:
> On 03/07/18 11:15, Richard Biener wrote:
>> AVX ones are all UNSPECs I believe - how do your patterns look like?
>
> AVX has both unspec and vec_merge variants (at least for define_expand,
> in GCC8), but in any case, AFAICT dse.c only cares about the destination
> MEM, and all the AVX and SVE patterns appear to use nothing special there.

Sorry, my patterns look something like this:

(set (mem:V64DI (reg:DI)
      (vec_merge:V64DI (reg:V64DI) (unspec ...) (reg:DI)))

Where the unspec just means that the destination remains unchanged. We
could also use (match_dup 0) there, but we don't, so probably there was
an issue with that at some point.

>>> I was thinking of solving the problem by adding a target hook to query
>>> the true length of vector types. Any thoughts on that?
>>
>> It isn't about the length but about the mask because there can be mask
>> values that do not affect the length?
>
> The problem I have right now is that the vector write conflicts with
> writes to distinct variables, in which case the vector length is what's
> important, and it's probably(?) safe to assume that if the vector mask
> is not constant then space for the whole vector has been allocated on
> the stack.
>
> But yes, in general it's true that subsequent writes to the same vector
> could well write distinct elements, in which case the value of the mask
> is significant to DSE analysis.
>
> Andrew

Reply | Threaded
Open this post in threaded view
|

Re: DSE and maskstore trouble

Richard Biener-2
On Tue, Jul 3, 2018 at 12:57 PM Andrew Stubbs <[hidden email]> wrote:

>
> On 03/07/18 11:33, Andrew Stubbs wrote:
> > On 03/07/18 11:15, Richard Biener wrote:
> >> AVX ones are all UNSPECs I believe - how do your patterns look like?
> >
> > AVX has both unspec and vec_merge variants (at least for define_expand,
> > in GCC8), but in any case, AFAICT dse.c only cares about the destination
> > MEM, and all the AVX and SVE patterns appear to use nothing special there.
>
> Sorry, my patterns look something like this:
>
> (set (mem:V64DI (reg:DI)
>       (vec_merge:V64DI (reg:V64DI) (unspec ...) (reg:DI)))
>
> Where the unspec just means that the destination remains unchanged. We
> could also use (match_dup 0) there, but we don't, so probably there was
> an issue with that at some point.

I believe that the AVX variants like

(define_expand "maskstore<mode><avx512fmaskmodelower>"
  [(set (match_operand:V48_AVX512VL 0 "memory_operand")
        (vec_merge:V48_AVX512VL
          (match_operand:V48_AVX512VL 1 "register_operand")
          (match_dup 0)
          (match_operand:<avx512fmaskmode> 2 "register_operand")))]
  "TARGET_AVX512F")

are OK since they represent a use of the memory due to the match_dup 0
while your UNSPEC one doesn't so as the store doesn't actually take
place to all of 0 your insn variant doesn't represent observable behavior.

> >>> I was thinking of solving the problem by adding a target hook to query
> >>> the true length of vector types. Any thoughts on that?
> >>
> >> It isn't about the length but about the mask because there can be mask
> >> values that do not affect the length?
> >
> > The problem I have right now is that the vector write conflicts with
> > writes to distinct variables, in which case the vector length is what's
> > important, and it's probably(?) safe to assume that if the vector mask
> > is not constant then space for the whole vector has been allocated on
> > the stack.
> >
> > But yes, in general it's true that subsequent writes to the same vector
> > could well write distinct elements, in which case the value of the mask
> > is significant to DSE analysis.
> >
> > Andrew
>
Reply | Threaded
Open this post in threaded view
|

Re: DSE and maskstore trouble

Andrew Stubbs
On 03/07/18 12:02, Richard Biener wrote:

> I believe that the AVX variants like
>
> (define_expand "maskstore<mode><avx512fmaskmodelower>"
>    [(set (match_operand:V48_AVX512VL 0 "memory_operand")
>          (vec_merge:V48_AVX512VL
>            (match_operand:V48_AVX512VL 1 "register_operand")
>            (match_dup 0)
>            (match_operand:<avx512fmaskmode> 2 "register_operand")))]
>    "TARGET_AVX512F")
>
> are OK since they represent a use of the memory due to the match_dup 0
> while your UNSPEC one doesn't so as the store doesn't actually take
> place to all of 0 your insn variant doesn't represent observable behavior.

Hmm, so they're safe, but may prevent the optimization of nearby variables?

What about the unspec AVX variant?

Andrew
Reply | Threaded
Open this post in threaded view
|

Re: DSE and maskstore trouble

Richard Biener-2
On Tue, Jul 3, 2018 at 1:06 PM Andrew Stubbs <[hidden email]> wrote:

>
> On 03/07/18 12:02, Richard Biener wrote:
> > I believe that the AVX variants like
> >
> > (define_expand "maskstore<mode><avx512fmaskmodelower>"
> >    [(set (match_operand:V48_AVX512VL 0 "memory_operand")
> >          (vec_merge:V48_AVX512VL
> >            (match_operand:V48_AVX512VL 1 "register_operand")
> >            (match_dup 0)
> >            (match_operand:<avx512fmaskmode> 2 "register_operand")))]
> >    "TARGET_AVX512F")
> >
> > are OK since they represent a use of the memory due to the match_dup 0
> > while your UNSPEC one doesn't so as the store doesn't actually take
> > place to all of 0 your insn variant doesn't represent observable behavior.
>
> Hmm, so they're safe, but may prevent the optimization of nearby variables?

Yes, they prevent earlier stores into lanes that are "really" written
to to be DSEd.

> What about the unspec AVX variant?

They also use match_dup 0 for an input to the UNSPEC so DSE should consider
those similarly (as use).

I think the non-unspec ones exist because AVX512 masks are integer registers
and actually match the vec_merge requirements for the mask operand.  For
the AVX2 variants the mask is a regular vector which is where we do not have
any suitable way to represent things on RTL.  Well, maybe sth like

 (vec_select (vec_concat A B) (mult (const_vector -1 -2 -3 ...) mask-vector))

might work (not exactly, sth more complex is required).  That said,
a "vector"_cond_exec or some other variant of explicit masked-vector
operation is really required to model this stuff exactly on RTL.

Richard.

>
> Andrew
Reply | Threaded
Open this post in threaded view
|

Re: DSE and maskstore trouble

Andrew Stubbs
On 03/07/18 12:30, Richard Biener wrote:
>> Hmm, so they're safe, but may prevent the optimization of nearby variables?
>
> Yes, they prevent earlier stores into lanes that are "really" written
> to to be DSEd.

Right, but I have unrelated variables allocated to the stack within the
"shadow" of the masked vector. I didn't ask it to do that, it just does,
so I presume this is an expect feature of masked vectors with a known mask.

Surely this prevents valid optimizations on those variables?

Andrew
Reply | Threaded
Open this post in threaded view
|

Re: DSE and maskstore trouble

Richard Biener-2
On Tue, Jul 3, 2018 at 1:38 PM Andrew Stubbs <[hidden email]> wrote:

>
> On 03/07/18 12:30, Richard Biener wrote:
> >> Hmm, so they're safe, but may prevent the optimization of nearby variables?
> >
> > Yes, they prevent earlier stores into lanes that are "really" written
> > to to be DSEd.
>
> Right, but I have unrelated variables allocated to the stack within the
> "shadow" of the masked vector. I didn't ask it to do that, it just does,
> so I presume this is an expect feature of masked vectors with a known mask.

Huh, I don't think so.  I guess that's the real error and I wonder how
that happens.
Are those just spills or real allocations?

> Surely this prevents valid optimizations on those variables?
>
> Andrew
Reply | Threaded
Open this post in threaded view
|

Re: DSE and maskstore trouble

Andrew Stubbs
On 03/07/18 12:45, Richard Biener wrote:

> On Tue, Jul 3, 2018 at 1:38 PM Andrew Stubbs <[hidden email]> wrote:
>>
>> On 03/07/18 12:30, Richard Biener wrote:
>>>> Hmm, so they're safe, but may prevent the optimization of nearby variables?
>>>
>>> Yes, they prevent earlier stores into lanes that are "really" written
>>> to to be DSEd.
>>
>> Right, but I have unrelated variables allocated to the stack within the
>> "shadow" of the masked vector. I didn't ask it to do that, it just does,
>> so I presume this is an expect feature of masked vectors with a known mask.
>
> Huh, I don't think so.  I guess that's the real error and I wonder how
> that happens.
> Are those just spills or real allocations?

The code from the testcase looks like this:

     struct rtattr rt[2];
     struct rtattr *rta[14];
     int i;

     rt[0].rta_len = sizeof (struct rtattr) + 8;
     rt[0].rta_type = 0;
     rt[1] = rt[0];
     for (i = 0; i < 14; i++)
       rta[i] = &rt[0];

The initialization of rt[0] and rt[1] are being deleted because the
write to rta[0..13] would overwrite rt if it had actually been the
maximum rta[0..63].

That, or I've been staring at dumps too long and gone crazy.

Andrew

P.S. I'm trying to test with (match_dup 0), but LRA exploded.
Reply | Threaded
Open this post in threaded view
|

Re: DSE and maskstore trouble

Richard Biener-2
On Tue, Jul 3, 2018 at 1:57 PM Andrew Stubbs <[hidden email]> wrote:

>
> On 03/07/18 12:45, Richard Biener wrote:
> > On Tue, Jul 3, 2018 at 1:38 PM Andrew Stubbs <[hidden email]> wrote:
> >>
> >> On 03/07/18 12:30, Richard Biener wrote:
> >>>> Hmm, so they're safe, but may prevent the optimization of nearby variables?
> >>>
> >>> Yes, they prevent earlier stores into lanes that are "really" written
> >>> to to be DSEd.
> >>
> >> Right, but I have unrelated variables allocated to the stack within the
> >> "shadow" of the masked vector. I didn't ask it to do that, it just does,
> >> so I presume this is an expect feature of masked vectors with a known mask.
> >
> > Huh, I don't think so.  I guess that's the real error and I wonder how
> > that happens.
> > Are those just spills or real allocations?
>
> The code from the testcase looks like this:
>
>      struct rtattr rt[2];
>      struct rtattr *rta[14];
>      int i;
>
>      rt[0].rta_len = sizeof (struct rtattr) + 8;
>      rt[0].rta_type = 0;
>      rt[1] = rt[0];
>      for (i = 0; i < 14; i++)
>        rta[i] = &rt[0];
>
> The initialization of rt[0] and rt[1] are being deleted because the
> write to rta[0..13] would overwrite rt if it had actually been the
> maximum rta[0..63].

Ok, so if we vectorize the above with 64 element masked stores
then indeed the RTL representation is _not_ safe.  That is because
while the uses in the masked stores should prevent things from
going bad there is also TBAA to consider which means those
uses might not actually _be_ uses (TBAA-wise) of the earlier
stores.  In the above case rtattr * doesn't alias int (or whatever
types rta_type or rta_len have).  That means to DSE the earlier
stores are dead.

The fix would be to, instead of (match_dup 0) use
(match_dup_but_change_MEM_ALIAS_SET_to_zero 0) ...
at least for the expanders.  That is, adjust the expanders
to not use (match_dup 0) and add insn variants that also
match alias-set zero (match_dup 0).  Or sth along that lines.

Richard.

> That, or I've been staring at dumps too long and gone crazy.
>
> Andrew
>
> P.S. I'm trying to test with (match_dup 0), but LRA exploded.
Reply | Threaded
Open this post in threaded view
|

Re: DSE and maskstore trouble

Andrew Stubbs
On 03/07/18 13:21, Richard Biener wrote:
> Ok, so if we vectorize the above with 64 element masked stores
> then indeed the RTL representation is _not_ safe.  That is because
> while the uses in the masked stores should prevent things from
> going bad there is also TBAA to consider which means those
> uses might not actually _be_ uses (TBAA-wise) of the earlier
> stores.  In the above case rtattr * doesn't alias int (or whatever
> types rta_type or rta_len have).  That means to DSE the earlier
> stores are dead.

I managed to get it to generate maskstore without the unspec, and the
code now runs correctly.

I don't follow your AA reasoning. You say the use stops it being bad,
and then you say the stores are dead, which sounds bad, yet it's not
deleting them now.

Confused. :-(

Andrew
Reply | Threaded
Open this post in threaded view
|

Re: DSE and maskstore trouble

Richard Biener-2
On Tue, Jul 3, 2018 at 2:46 PM Andrew Stubbs <[hidden email]> wrote:

>
> On 03/07/18 13:21, Richard Biener wrote:
> > Ok, so if we vectorize the above with 64 element masked stores
> > then indeed the RTL representation is _not_ safe.  That is because
> > while the uses in the masked stores should prevent things from
> > going bad there is also TBAA to consider which means those
> > uses might not actually _be_ uses (TBAA-wise) of the earlier
> > stores.  In the above case rtattr * doesn't alias int (or whatever
> > types rta_type or rta_len have).  That means to DSE the earlier
> > stores are dead.
>
> I managed to get it to generate maskstore without the unspec, and the
> code now runs correctly.

OK, that is good.

> I don't follow your AA reasoning. You say the use stops it being bad,
> and then you say the stores are dead, which sounds bad, yet it's not
> deleting them now.

If you look at RTL dumps (with -fstrict-aliasing, thus -O2+) you should
see MEM_ALIAS_SETs differing for the earlier stores and the masked
store uses.

Now I'm of course assuming DSE is perfect, maybe it isn't ... ;)

Richard.

>
> Confused. :-(
>
> Andrew
Reply | Threaded
Open this post in threaded view
|

Re: DSE and maskstore trouble

Andrew Stubbs
On 03/07/18 14:52, Richard Biener wrote:
> If you look at RTL dumps (with -fstrict-aliasing, thus -O2+) you should
> see MEM_ALIAS_SETs differing for the earlier stores and the masked
> store uses.
>
> Now I'm of course assuming DSE is perfect, maybe it isn't ... ;)

Ok, I see that the stores have MEMs with different alias sets, indeed. I
can't quite work out if that means it's safe, or unsafe? Do I still need
to zero the set?

For masked stores, clearly the current DSE implementation must be
sub-optimal because it ignores the mask. Writing it as a
load/modify/write means that stores are not erroneously removed, but
also means that redundant writes (to masked vector destinations) are
never removed. Anyway, at least I know how to make that part safe now.

Thanks

Andrew
Reply | Threaded
Open this post in threaded view
|

Re: DSE and maskstore trouble

Richard Biener-2
On July 3, 2018 5:19:24 PM GMT+02:00, Andrew Stubbs <[hidden email]> wrote:

>On 03/07/18 14:52, Richard Biener wrote:
>> If you look at RTL dumps (with -fstrict-aliasing, thus -O2+) you
>should
>> see MEM_ALIAS_SETs differing for the earlier stores and the masked
>> store uses.
>>
>> Now I'm of course assuming DSE is perfect, maybe it isn't ... ;)
>
>Ok, I see that the stores have MEMs with different alias sets, indeed.
>I
>can't quite work out if that means it's safe, or unsafe? Do I still
>need
>to zero the set?

I think you need to zero the set of the load in the masked store. The AVX patterns suffer from the same issue here.
Of course I'm not sure if we can construct a miscompilation here but I wouldn't be surprised if we can.

>For masked stores, clearly the current DSE implementation must be
>sub-optimal because it ignores the mask. Writing it as a
>load/modify/write means that stores are not erroneously removed, but
>also means that redundant writes (to masked vector destinations) are
>never removed. Anyway, at least I know how to make that part safe now.

Yeah.

Richard.
>
>Thanks
>
>Andrew