RTL alternative selection question

classic Classic list List threaded Threaded
10 messages Options
Reply | Threaded
Open this post in threaded view
|

RTL alternative selection question

Andrew Stubbs
Hi All,

I'm trying to figure out how to prevent LRA selecting alternatives that
result in values being copied from A to B for one instruction, and then
immediately back from B to A again, when there are apparently more
sensible alternatives available.

I have an insn with the following pattern (simplified here):

   [(set (match_operand:DI 0 "register_operand"  "=Sg,v")
         (ashift:DI
           (match_operand:DI 1 "gcn_alu_operand" " Sg,v")
           (match_operand:SI 2 "gcn_alu_operand" " Sg,v")))
    (clobber (match_scratch:BI 3                 "=cs,X"))]

There are two lshl instructions; one for scalar registers and one for
vector registers. The vector here has only a single element, so the two
are equivalent, but we need to pick one.

This operation works for both register files, but there are other
operations that exist only on one side or the other, so we want those to
determine in which register file the values are allocated.

Unfortunately, the compiler (almost?) exclusively selects the second
alternative, even when this means moving the values from one register
file to the other, and then back again.

The problem is that the scalar instruction clobbers the CC register,
which results in a "reject++" for that alternative in the LRA dump.

I can fix this by disparaging the second alternative in the pattern:

    (clobber (match_scratch:BI 3                 "=cs,?X"))

This appears to do the right thing. I can now see both kinds of shift
appearing in the assembly dumps.

But that does "reject+=6", which makes me worry that the balance has now
shifted too far the other way.

Does this make sense?

    (clobber (match_scratch:BI 3                 "=^cs,?X"))

Is there a better way to discourage the copies? Perhaps without editing
all the patterns?

What I want is for the two alternatives to appear equal when the CC
register is not live, and when CC is live for LRA to be able to choose
between reloading CC or switching to the other alternative according to
the situation, not on the pattern alone.

Thanks in advance.

Andrew
Reply | Threaded
Open this post in threaded view
|

Re: RTL alternative selection question

Richard Biener-2
On Mon, Sep 23, 2019 at 12:56 PM Andrew Stubbs <[hidden email]> wrote:

>
> Hi All,
>
> I'm trying to figure out how to prevent LRA selecting alternatives that
> result in values being copied from A to B for one instruction, and then
> immediately back from B to A again, when there are apparently more
> sensible alternatives available.
>
> I have an insn with the following pattern (simplified here):
>
>    [(set (match_operand:DI 0 "register_operand"  "=Sg,v")
>          (ashift:DI
>            (match_operand:DI 1 "gcn_alu_operand" " Sg,v")
>            (match_operand:SI 2 "gcn_alu_operand" " Sg,v")))
>     (clobber (match_scratch:BI 3                 "=cs,X"))]
>
> There are two lshl instructions; one for scalar registers and one for
> vector registers. The vector here has only a single element, so the two
> are equivalent, but we need to pick one.
>
> This operation works for both register files, but there are other
> operations that exist only on one side or the other, so we want those to
> determine in which register file the values are allocated.
>
> Unfortunately, the compiler (almost?) exclusively selects the second
> alternative, even when this means moving the values from one register
> file to the other, and then back again.
>
> The problem is that the scalar instruction clobbers the CC register,
> which results in a "reject++" for that alternative in the LRA dump.
>
> I can fix this by disparaging the second alternative in the pattern:
>
>     (clobber (match_scratch:BI 3                 "=cs,?X"))
>
> This appears to do the right thing. I can now see both kinds of shift
> appearing in the assembly dumps.
>
> But that does "reject+=6", which makes me worry that the balance has now
> shifted too far the other way.
>
> Does this make sense?
>
>     (clobber (match_scratch:BI 3                 "=^cs,?X"))
>
> Is there a better way to discourage the copies? Perhaps without editing
> all the patterns?
>
> What I want is for the two alternatives to appear equal when the CC
> register is not live, and when CC is live for LRA to be able to choose
> between reloading CC or switching to the other alternative according to
> the situation, not on the pattern alone.
>
> Thanks in advance.

I've faced a similar situation for PR91154 where for x86 there's a
vector min/max but no scalar one and the vector min/max is faster
than the conditional move sequence on the integer side.

In the end the extra freedom for the RA is bad since it doesn't
do a global decision on the initial regset preference.

So - don't do it.  In the simpler case as it is yours it would indeed
be nice if it just worked...

Richard.

>
> Andrew
Reply | Threaded
Open this post in threaded view
|

Re: RTL alternative selection question

Segher Boessenkool
In reply to this post by Andrew Stubbs
On Mon, Sep 23, 2019 at 11:56:27AM +0100, Andrew Stubbs wrote:
>   [(set (match_operand:DI 0 "register_operand"  "=Sg,v")
>         (ashift:DI
>           (match_operand:DI 1 "gcn_alu_operand" " Sg,v")
>           (match_operand:SI 2 "gcn_alu_operand" " Sg,v")))
>    (clobber (match_scratch:BI 3                 "=cs,X"))]

> Unfortunately, the compiler (almost?) exclusively selects the second
> alternative, even when this means moving the values from one register
> file to the other, and then back again.
>
> The problem is that the scalar instruction clobbers the CC register,
> which results in a "reject++" for that alternative in the LRA dump.

What kind of reject?  It prints a reason, too.

Maybe we should make a macro/hook to never do that for your target, for
those flags registers anyway.


Segher
Reply | Threaded
Open this post in threaded view
|

Re: RTL alternative selection question

Andrew Stubbs
On 23/09/2019 15:15, Segher Boessenkool wrote:

> On Mon, Sep 23, 2019 at 11:56:27AM +0100, Andrew Stubbs wrote:
>>    [(set (match_operand:DI 0 "register_operand"  "=Sg,v")
>>          (ashift:DI
>>            (match_operand:DI 1 "gcn_alu_operand" " Sg,v")
>>            (match_operand:SI 2 "gcn_alu_operand" " Sg,v")))
>>     (clobber (match_scratch:BI 3                 "=cs,X"))]
>
>> Unfortunately, the compiler (almost?) exclusively selects the second
>> alternative, even when this means moving the values from one register
>> file to the other, and then back again.
>>
>> The problem is that the scalar instruction clobbers the CC register,
>> which results in a "reject++" for that alternative in the LRA dump.
>
> What kind of reject?  It prints a reason, too.

      0 Non input pseudo reload: reject++

> Maybe we should make a macro/hook to never do that for your target, for
> those flags registers anyway.

That wouldn't be horrible. I suppose I could look at doing that. Any
suggestions how it should or should not work?

Andrew
Reply | Threaded
Open this post in threaded view
|

Re: RTL alternative selection question

Segher Boessenkool
On Mon, Sep 23, 2019 at 03:39:08PM +0100, Andrew Stubbs wrote:

> On 23/09/2019 15:15, Segher Boessenkool wrote:
> >On Mon, Sep 23, 2019 at 11:56:27AM +0100, Andrew Stubbs wrote:
> >>   [(set (match_operand:DI 0 "register_operand"  "=Sg,v")
> >>         (ashift:DI
> >>           (match_operand:DI 1 "gcn_alu_operand" " Sg,v")
> >>           (match_operand:SI 2 "gcn_alu_operand" " Sg,v")))
> >>    (clobber (match_scratch:BI 3                 "=cs,X"))]
> >
> >>Unfortunately, the compiler (almost?) exclusively selects the second
> >>alternative, even when this means moving the values from one register
> >>file to the other, and then back again.
> >>
> >>The problem is that the scalar instruction clobbers the CC register,
> >>which results in a "reject++" for that alternative in the LRA dump.
> >
> >What kind of reject?  It prints a reason, too.
>
>      0 Non input pseudo reload: reject++
>
> >Maybe we should make a macro/hook to never do that for your target, for
> >those flags registers anyway.
>
> That wouldn't be horrible. I suppose I could look at doing that. Any
> suggestions how it should or should not work?

Pass the register class or constraint or something like that to the hook,
then based on what the hook returns, either or not do the reject?  So your
hook would special-case SCC_CONDITIONAL_REG, maybe a few more similar ones
(those are confusing names btw, _REG but they are register classes).


Segher
Reply | Threaded
Open this post in threaded view
|

Re: RTL alternative selection question

Andrew Stubbs
On 23/09/2019 16:21, Segher Boessenkool wrote:
> Pass the register class or constraint or something like that to the hook,
> then based on what the hook returns, either or not do the reject?  So your
> hook would special-case SCC_CONDITIONAL_REG, maybe a few more similar ones
> (those are confusing names btw, _REG but they are register classes).

It's a class of one register!

Thanks

Andrew
Reply | Threaded
Open this post in threaded view
|

Re: RTL alternative selection question

Jeff Law
On 9/23/19 9:26 AM, Andrew Stubbs wrote:
> On 23/09/2019 16:21, Segher Boessenkool wrote:
>> Pass the register class or constraint or something like that to the hook,
>> then based on what the hook returns, either or not do the reject?  So
>> your
>> hook would special-case SCC_CONDITIONAL_REG, maybe a few more similar
>> ones
>> (those are confusing names btw, _REG but they are register classes).
>
> It's a class of one register!
So maybe use CLASS_LIKELY_SPILLED_P as the trigger?

jeff
Reply | Threaded
Open this post in threaded view
|

Re: RTL alternative selection question

Richard Sandiford-9
In reply to this post by Andrew Stubbs
Andrew Stubbs <[hidden email]> writes:

> On 23/09/2019 15:15, Segher Boessenkool wrote:
>> On Mon, Sep 23, 2019 at 11:56:27AM +0100, Andrew Stubbs wrote:
>>>    [(set (match_operand:DI 0 "register_operand"  "=Sg,v")
>>>          (ashift:DI
>>>            (match_operand:DI 1 "gcn_alu_operand" " Sg,v")
>>>            (match_operand:SI 2 "gcn_alu_operand" " Sg,v")))
>>>     (clobber (match_scratch:BI 3                 "=cs,X"))]
>>
>>> Unfortunately, the compiler (almost?) exclusively selects the second
>>> alternative, even when this means moving the values from one register
>>> file to the other, and then back again.
>>>
>>> The problem is that the scalar instruction clobbers the CC register,
>>> which results in a "reject++" for that alternative in the LRA dump.
>>
>> What kind of reject?  It prints a reason, too.
>
>       0 Non input pseudo reload: reject++

It looks like that comes from:

                  /* Input reloads can be inherited more often than
                     output reloads can be removed, so penalize output
                     reloads.  */
                  if (!REG_P (op) || curr_static_id->operand[nop].type != OP_IN)
                    {
                      if (lra_dump_file != NULL)
                        fprintf
                          (lra_dump_file,
                           "            %d Non input pseudo reload: reject++\n",
                           nop);
                      reject++;
                    }

That's a good reason for increasing the cost of output reloads of values
vs. input reloads of values, but I don't think it covers your case of
reloading a SCRATCH, which can never be inherited anyway.  So if the
reason in the comment is the real reason (definitely sounds plausible)
then perhaps we should just exclude SCRATCHes?

I don't think this is really target-specific.  MIPS has a very similar
situation with multiplication, and again the SCRATCH reloads shouldn't
count.  Same for some SVE patterns.

Richard
Reply | Threaded
Open this post in threaded view
|

Re: RTL alternative selection question

Andrew Stubbs
In reply to this post by Andrew Stubbs
On 23/09/2019 15:39, Andrew Stubbs wrote:

> On 23/09/2019 15:15, Segher Boessenkool wrote:
>> On Mon, Sep 23, 2019 at 11:56:27AM +0100, Andrew Stubbs wrote:
>>>    [(set (match_operand:DI 0 "register_operand"  "=Sg,v")
>>>          (ashift:DI
>>>            (match_operand:DI 1 "gcn_alu_operand" " Sg,v")
>>>            (match_operand:SI 2 "gcn_alu_operand" " Sg,v")))
>>>     (clobber (match_scratch:BI 3                 "=cs,X"))]
>>
>>> Unfortunately, the compiler (almost?) exclusively selects the second
>>> alternative, even when this means moving the values from one register
>>> file to the other, and then back again.
>>>
>>> The problem is that the scalar instruction clobbers the CC register,
>>> which results in a "reject++" for that alternative in the LRA dump.
>>
>> What kind of reject?  It prints a reason, too.
>
>       0 Non input pseudo reload: reject++

Apparently I was confused by operand "0" versus alternative "0". That
message did occur, but it wasn't the only one. Here's all of it:

              0 Non input pseudo reload: reject++
              3 Scratch win: reject+=2
            alt=0,overall=9,losers=1,rld_nregs=2
            alt=1,overall=6,losers=1,rld_nregs=2

I don't understand why the "reject++" occurs, but presumably has to do
with the "Sg" register availability somehow?

The "Scratch win" part comes from this code:

           /* We simulate the behavior of old reload here.
              Although scratches need hard registers and it
              might result in spilling other pseudos, no reload
              insns are generated for the scratches.  So it
              might cost something but probably less than old
              reload pass believes.  */
           if (scratch_p)
             {
               if (lra_dump_file != NULL)
                 fprintf (lra_dump_file,
                          "            %d Scratch win: reject+=2\n",
                          nop);
               reject += 2;
             }
         }

Would it make sense to skip this reject when CLASS_LIKELY_SPILLED_P, as
Jeff suggested?

Unfortunately, removing the "Scratch win" penalty alone is not enough
for LRA to select the first alternative -- at least, no in my testcase
-- so I need to understand the "non input pseudo reload" issue as well.
I can see why it fires for alt0, but not why it does not fire for alt1.

Andrew
Reply | Threaded
Open this post in threaded view
|

Re: RTL alternative selection question

Segher Boessenkool
On Tue, Oct 01, 2019 at 01:12:06PM +0100, Andrew Stubbs wrote:

> On 23/09/2019 15:39, Andrew Stubbs wrote:
> >On 23/09/2019 15:15, Segher Boessenkool wrote:
> >>On Mon, Sep 23, 2019 at 11:56:27AM +0100, Andrew Stubbs wrote:
> >>>   [(set (match_operand:DI 0 "register_operand"  "=Sg,v")
> >>>         (ashift:DI
> >>>           (match_operand:DI 1 "gcn_alu_operand" " Sg,v")
> >>>           (match_operand:SI 2 "gcn_alu_operand" " Sg,v")))
> >>>    (clobber (match_scratch:BI 3                
> >>>"=cs,X"))]
> >>
> >>>Unfortunately, the compiler (almost?) exclusively selects the second
> >>>alternative, even when this means moving the values from one register
> >>>file to the other, and then back again.
> >>>
> >>>The problem is that the scalar instruction clobbers the CC register,
> >>>which results in a "reject++" for that alternative in the LRA dump.
> >>
> >>What kind of reject?  It prints a reason, too.
> >
> >      0 Non input pseudo reload: reject++
>
> Apparently I was confused by operand "0" versus alternative "0". That
> message did occur, but it wasn't the only one. Here's all of it:
>
>              0 Non input pseudo reload: reject++
>              3 Scratch win: reject+=2
>            alt=0,overall=9,losers=1,rld_nregs=2
>            alt=1,overall=6,losers=1,rld_nregs=2
>
> I don't understand why the "reject++" occurs, but presumably has to do
> with the "Sg" register availability somehow?

operands[0] (the output) needed a reload, apparently.

[ snip ]

> Unfortunately, removing the "Scratch win" penalty alone is not enough
> for LRA to select the first alternative -- at least, no in my testcase
> -- so I need to understand the "non input pseudo reload" issue as well.
> I can see why it fires for alt0, but not why it does not fire for alt1.

The reload it decided to do there was for operands[1] or operands[2]?


Segher