Gimpel lowering question.

classic Classic list List threaded Threaded
11 messages Options
Reply | Threaded
Open this post in threaded view
|

Gimpel lowering question.

Michael Eager-2
I have a small test case which generates poor quality code on my target.
Here is the original:

   if (cond1 == 2048 || cond2 == 8)
     {
       x = x + y;
     }
   return x;

This ends up generating a series of instructions to compute a flag with
the result of the condition followed by a single compare with zero and
a jump.  Better code would be two compares and two jumps.

The gimple is

   _1 = cond1 == 2048;
   _2 = cond2 == 8;
   _3 = _1 | _2;
   if (_3 != 0) goto <D.1464>; else goto <D.1465>;
...

so this poor code sequence is essentially a correct translation.

On MIPS, for the same test case, the gimple is different:

   if (cond1 == 2048) goto <D.1491>; else goto <D.1493>;
   <D.1493>:
   if (cond2 == 8) goto <D.1491>; else goto <D.1492>;
   <D.1491>:
...

which generates the better code sequence.

Can someone give me a pointer where to find where the lowering
pass decides to break up a condition into multiple tests?  Is
there a target configuration option which I have overlooked or
maybe set incorrectly?

--
Michael Eager    [hidden email]
1960 Park Blvd., Palo Alto, CA 94306
Reply | Threaded
Open this post in threaded view
|

Re: Gimpel lowering question.

Jeff Law
On 11/28/18 10:00 AM, Michael Eager wrote:

> I have a small test case which generates poor quality code on my target.
> Here is the original:
>
>   if (cond1 == 2048 || cond2 == 8)
>     {
>       x = x + y;
>     }
>   return x;
>
> This ends up generating a series of instructions to compute a flag with
> the result of the condition followed by a single compare with zero and
> a jump.  Better code would be two compares and two jumps.
>
> The gimple is
>
>   _1 = cond1 == 2048;
>   _2 = cond2 == 8;
>   _3 = _1 | _2;
>   if (_3 != 0) goto <D.1464>; else goto <D.1465>;
> ...
>
> so this poor code sequence is essentially a correct translation.
>
> On MIPS, for the same test case, the gimple is different:
>
>   if (cond1 == 2048) goto <D.1491>; else goto <D.1493>;
>   <D.1493>:
>   if (cond2 == 8) goto <D.1491>; else goto <D.1492>;
>   <D.1491>:
> ...
>
> which generates the better code sequence.
>
> Can someone give me a pointer where to find where the lowering
> pass decides to break up a condition into multiple tests?  Is
> there a target configuration option which I have overlooked or
> maybe set incorrectly?
BRANCH_COST, which comes into play during generation of the initial
trees as well in passes which try to optimize branchy code into
straighter code.

jeff
Reply | Threaded
Open this post in threaded view
|

Re: Gimpel lowering question.

Michael Eager-2
On 11/28/18 09:10, Jeff Law wrote:

> On 11/28/18 10:00 AM, Michael Eager wrote:
>> I have a small test case which generates poor quality code on my target.
>> Here is the original:
>>
>>    if (cond1 == 2048 || cond2 == 8)
>>      {
>>        x = x + y;
>>      }
>>    return x;
>>
>> This ends up generating a series of instructions to compute a flag with
>> the result of the condition followed by a single compare with zero and
>> a jump.  Better code would be two compares and two jumps.
>>
>> The gimple is
>>
>>    _1 = cond1 == 2048;
>>    _2 = cond2 == 8;
>>    _3 = _1 | _2;
>>    if (_3 != 0) goto <D.1464>; else goto <D.1465>;
>> ...
>>
>> so this poor code sequence is essentially a correct translation.
>>
>> On MIPS, for the same test case, the gimple is different:
>>
>>    if (cond1 == 2048) goto <D.1491>; else goto <D.1493>;
>>    <D.1493>:
>>    if (cond2 == 8) goto <D.1491>; else goto <D.1492>;
>>    <D.1491>:
>> ...
>>
>> which generates the better code sequence.
>>
>> Can someone give me a pointer where to find where the lowering
>> pass decides to break up a condition into multiple tests?  Is
>> there a target configuration option which I have overlooked or
>> maybe set incorrectly?
> BRANCH_COST, which comes into play during generation of the initial
> trees as well in passes which try to optimize branchy code into
> straighter code.

Thanks.  I did look at BRANCH_COST, and played with the values, but it
didn't seem to have any affect.  I'll take a closer look.

--
Michael Eager    [hidden email]
1960 Park Blvd., Palo Alto, CA 94306
Reply | Threaded
Open this post in threaded view
|

Re: Gimpel lowering question.

Jeff Law
On 11/28/18 10:47 AM, Michael Eager wrote:

> On 11/28/18 09:10, Jeff Law wrote:
>> On 11/28/18 10:00 AM, Michael Eager wrote:
>>> I have a small test case which generates poor quality code on my target.
>>> Here is the original:
>>>
>>>    if (cond1 == 2048 || cond2 == 8)
>>>      {
>>>        x = x + y;
>>>      }
>>>    return x;
>>>
>>> This ends up generating a series of instructions to compute a flag with
>>> the result of the condition followed by a single compare with zero and
>>> a jump.  Better code would be two compares and two jumps.
>>>
>>> The gimple is
>>>
>>>    _1 = cond1 == 2048;
>>>    _2 = cond2 == 8;
>>>    _3 = _1 | _2;
>>>    if (_3 != 0) goto <D.1464>; else goto <D.1465>;
>>> ...
>>>
>>> so this poor code sequence is essentially a correct translation.
>>>
>>> On MIPS, for the same test case, the gimple is different:
>>>
>>>    if (cond1 == 2048) goto <D.1491>; else goto <D.1493>;
>>>    <D.1493>:
>>>    if (cond2 == 8) goto <D.1491>; else goto <D.1492>;
>>>    <D.1491>:
>>> ...
>>>
>>> which generates the better code sequence.
>>>
>>> Can someone give me a pointer where to find where the lowering
>>> pass decides to break up a condition into multiple tests?  Is
>>> there a target configuration option which I have overlooked or
>>> maybe set incorrectly?
>> BRANCH_COST, which comes into play during generation of the initial
>> trees as well in passes which try to optimize branchy code into
>> straighter code.
>
> Thanks.  I did look at BRANCH_COST, and played with the values, but it
> didn't seem to have any affect.  I'll take a closer look.
I'd start by looking at the state the trees during gimplification then
walk forward or backward as necessary.

Jeff
Reply | Threaded
Open this post in threaded view
|

Re: Gimpel lowering question.

Andrew Pinski-2
In reply to this post by Michael Eager-2
On Wed, Nov 28, 2018 at 9:47 AM Michael Eager <[hidden email]> wrote:

>
> On 11/28/18 09:10, Jeff Law wrote:
> > On 11/28/18 10:00 AM, Michael Eager wrote:
> >> I have a small test case which generates poor quality code on my target.
> >> Here is the original:
> >>
> >>    if (cond1 == 2048 || cond2 == 8)
> >>      {
> >>        x = x + y;
> >>      }
> >>    return x;
> >>
> >> This ends up generating a series of instructions to compute a flag with
> >> the result of the condition followed by a single compare with zero and
> >> a jump.  Better code would be two compares and two jumps.
> >>
> >> The gimple is
> >>
> >>    _1 = cond1 == 2048;
> >>    _2 = cond2 == 8;
> >>    _3 = _1 | _2;
> >>    if (_3 != 0) goto <D.1464>; else goto <D.1465>;
> >> ...
> >>
> >> so this poor code sequence is essentially a correct translation.
> >>
> >> On MIPS, for the same test case, the gimple is different:
> >>
> >>    if (cond1 == 2048) goto <D.1491>; else goto <D.1493>;
> >>    <D.1493>:
> >>    if (cond2 == 8) goto <D.1491>; else goto <D.1492>;
> >>    <D.1491>:
> >> ...
> >>
> >> which generates the better code sequence.
> >>
> >> Can someone give me a pointer where to find where the lowering
> >> pass decides to break up a condition into multiple tests?  Is
> >> there a target configuration option which I have overlooked or
> >> maybe set incorrectly?
> > BRANCH_COST, which comes into play during generation of the initial
> > trees as well in passes which try to optimize branchy code into
> > straighter code.
>
> Thanks.  I did look at BRANCH_COST, and played with the values, but it
> didn't seem to have any affect.  I'll take a closer look.

Look at LOGICAL_OP_NON_SHORT_CIRCUIT .  By defualt, it is defined as:
  (BRANCH_COST (optimize_function_for_speed_p (cfun), \
                false) >= 2)
But MIPS defines it as 0.

Changing  LOGICAL_OP_NON_SHORT_CIRCUIT to 0 will disable this optimization.

LOGICAL_OP_NON_SHORT_CIRCUIT controls both places where (cond1 == 2048
|| cond2 == 8) would remove the branch.

NOTE I think MIPS defines LOGICAL_OP_NON_SHORT_CIRCUIT incorrectly but
that is a different story.

Thanks,
Andrew Pinski


>
> --
> Michael Eager    [hidden email]
> 1960 Park Blvd., Palo Alto, CA 94306
Reply | Threaded
Open this post in threaded view
|

Re: Gimpel lowering question.

Michael Eager-2
On 11/28/18 14:37, Andrew Pinski wrote:

> On Wed, Nov 28, 2018 at 9:47 AM Michael Eager <[hidden email]> wrote:
>>
>> On 11/28/18 09:10, Jeff Law wrote:
>>> On 11/28/18 10:00 AM, Michael Eager wrote:
>>>> I have a small test case which generates poor quality code on my target.
>>>> Here is the original:
>>>>
>>>>     if (cond1 == 2048 || cond2 == 8)
>>>>       {
>>>>         x = x + y;
>>>>       }
>>>>     return x;
>>>>
>>>> This ends up generating a series of instructions to compute a flag with
>>>> the result of the condition followed by a single compare with zero and
>>>> a jump.  Better code would be two compares and two jumps.
>>>>
>>>> The gimple is
>>>>
>>>>     _1 = cond1 == 2048;
>>>>     _2 = cond2 == 8;
>>>>     _3 = _1 | _2;
>>>>     if (_3 != 0) goto <D.1464>; else goto <D.1465>;
>>>> ...
>>>>
>>>> so this poor code sequence is essentially a correct translation.
>>>>
>>>> On MIPS, for the same test case, the gimple is different:
>>>>
>>>>     if (cond1 == 2048) goto <D.1491>; else goto <D.1493>;
>>>>     <D.1493>:
>>>>     if (cond2 == 8) goto <D.1491>; else goto <D.1492>;
>>>>     <D.1491>:
>>>> ...
>>>>
>>>> which generates the better code sequence.
>>>>
>>>> Can someone give me a pointer where to find where the lowering
>>>> pass decides to break up a condition into multiple tests?  Is
>>>> there a target configuration option which I have overlooked or
>>>> maybe set incorrectly?
>>> BRANCH_COST, which comes into play during generation of the initial
>>> trees as well in passes which try to optimize branchy code into
>>> straighter code.
>>
>> Thanks.  I did look at BRANCH_COST, and played with the values, but it
>> didn't seem to have any affect.  I'll take a closer look.
>
> Look at LOGICAL_OP_NON_SHORT_CIRCUIT .  By defualt, it is defined as:
>    (BRANCH_COST (optimize_function_for_speed_p (cfun), \
>                  false) >= 2)
> But MIPS defines it as 0.
>
> Changing  LOGICAL_OP_NON_SHORT_CIRCUIT to 0 will disable this optimization.
>
> LOGICAL_OP_NON_SHORT_CIRCUIT controls both places where (cond1 == 2048
> || cond2 == 8) would remove the branch.
>
> NOTE I think MIPS defines LOGICAL_OP_NON_SHORT_CIRCUIT incorrectly but
> that is a different story.
>
> Thanks,
> Andrew Pinski

I set BRANCH_COST to 1 for both predicted and non-predicted branches.
This generates the shorter sequence with a compare, but I'm concerned
that it will adversely affect code generation elsewhere.  Branches are
higher cost than simple instructions.

Looking at the code generated with BRANCH_COST > 1, it doesn't do what
I indicated above.  This did not reduce the number of branches.
In both cases there are two branches in this short test case.  When
BRANCH_COST > 1, there are three instructions to do a compare vs one
when BRANCH_COST = 1.  I'm at a loss to see where there is any benefit.

I'll take a look at the LOGICAL_OP_NON_SHORT_CIRCUIT settings and see
if this makes a difference.

--
Michael Eager    [hidden email]
1960 Park Blvd., Palo Alto, CA 94306
Reply | Threaded
Open this post in threaded view
|

Re: Gimpel lowering question.

Richard Biener-2
In reply to this post by Jeff Law
On Wed, Nov 28, 2018 at 6:10 PM Jeff Law <[hidden email]> wrote:

>
> On 11/28/18 10:00 AM, Michael Eager wrote:
> > I have a small test case which generates poor quality code on my target.
> > Here is the original:
> >
> >   if (cond1 == 2048 || cond2 == 8)
> >     {
> >       x = x + y;
> >     }
> >   return x;
> >
> > This ends up generating a series of instructions to compute a flag with
> > the result of the condition followed by a single compare with zero and
> > a jump.  Better code would be two compares and two jumps.
> >
> > The gimple is
> >
> >   _1 = cond1 == 2048;
> >   _2 = cond2 == 8;
> >   _3 = _1 | _2;
> >   if (_3 != 0) goto <D.1464>; else goto <D.1465>;
> > ...
> >
> > so this poor code sequence is essentially a correct translation.
> >
> > On MIPS, for the same test case, the gimple is different:
> >
> >   if (cond1 == 2048) goto <D.1491>; else goto <D.1493>;
> >   <D.1493>:
> >   if (cond2 == 8) goto <D.1491>; else goto <D.1492>;
> >   <D.1491>:
> > ...
> >
> > which generates the better code sequence.
> >
> > Can someone give me a pointer where to find where the lowering
> > pass decides to break up a condition into multiple tests?  Is
> > there a target configuration option which I have overlooked or
> > maybe set incorrectly?
> BRANCH_COST, which comes into play during generation of the initial
> trees

Sth I don't like very much...  maybe we can revisit removing
the few cases in fold-const.c (thus GENERIC folding) and rely
on later GIMPLE passes instead plus on RTL expansion to
do the reverse transform.

Not for GCC 9 of course.

>as well in passes which try to optimize branchy code into
> straighter code.
>
> jeff
Reply | Threaded
Open this post in threaded view
|

Re: Gimpel lowering question.

Jeff Law
On 11/30/18 12:39 AM, Richard Biener wrote:

> On Wed, Nov 28, 2018 at 6:10 PM Jeff Law <[hidden email]> wrote:
>>
>> On 11/28/18 10:00 AM, Michael Eager wrote:
>>> I have a small test case which generates poor quality code on my target.
>>> Here is the original:
>>>
>>>   if (cond1 == 2048 || cond2 == 8)
>>>     {
>>>       x = x + y;
>>>     }
>>>   return x;
>>>
>>> This ends up generating a series of instructions to compute a flag with
>>> the result of the condition followed by a single compare with zero and
>>> a jump.  Better code would be two compares and two jumps.
>>>
>>> The gimple is
>>>
>>>   _1 = cond1 == 2048;
>>>   _2 = cond2 == 8;
>>>   _3 = _1 | _2;
>>>   if (_3 != 0) goto <D.1464>; else goto <D.1465>;
>>> ...
>>>
>>> so this poor code sequence is essentially a correct translation.
>>>
>>> On MIPS, for the same test case, the gimple is different:
>>>
>>>   if (cond1 == 2048) goto <D.1491>; else goto <D.1493>;
>>>   <D.1493>:
>>>   if (cond2 == 8) goto <D.1491>; else goto <D.1492>;
>>>   <D.1491>:
>>> ...
>>>
>>> which generates the better code sequence.
>>>
>>> Can someone give me a pointer where to find where the lowering
>>> pass decides to break up a condition into multiple tests?  Is
>>> there a target configuration option which I have overlooked or
>>> maybe set incorrectly?
>> BRANCH_COST, which comes into play during generation of the initial
>> trees
>
> Sth I don't like very much...  maybe we can revisit removing
> the few cases in fold-const.c (thus GENERIC folding) and rely
> on later GIMPLE passes instead plus on RTL expansion to
> do the reverse transform.
>
> Not for GCC 9 of course.
Agreed, 100%.  That was supposed to be where Kai's work was headed, but
we never seemed to make significant headway.   Getting BRANCH_COST out
of the tree generation/folding would be a good thing at many levels.

There'll be fallout, of course.

jeff
Reply | Threaded
Open this post in threaded view
|

Re: Gimpel lowering question.

Jakub Jelinek
On Fri, Nov 30, 2018 at 09:08:34AM -0700, Jeff Law wrote:

> > Sth I don't like very much...  maybe we can revisit removing
> > the few cases in fold-const.c (thus GENERIC folding) and rely
> > on later GIMPLE passes instead plus on RTL expansion to
> > do the reverse transform.
> >
> > Not for GCC 9 of course.
> Agreed, 100%.  That was supposed to be where Kai's work was headed, but
> we never seemed to make significant headway.   Getting BRANCH_COST out
> of the tree generation/folding would be a good thing at many levels.
>
> There'll be fallout, of course.

Note, for the GCC 9 timeframe, for PR85368 I wrote today
--param logical-op-non-short-circuit={0,1} support that allows
to override LOGICAL_OP_NON_SHORT_CIRCUIT, because clearly it is impossible
to handle it in testcases otherwise, just -mbranch-cost= and the defaults
aren't good enough, as some targets override LOGICAL_OP_NON_SHORT_CIRCUIT
and figuring out all the details is too hard in tcl.

I prefer a param to make it clear that it won't last too long if we
implement the above for GCC 10 or 11.

        Jakub
Reply | Threaded
Open this post in threaded view
|

Re: Gimpel lowering question.

Jeff Law
On 11/30/18 9:14 AM, Jakub Jelinek wrote:

> On Fri, Nov 30, 2018 at 09:08:34AM -0700, Jeff Law wrote:
>>> Sth I don't like very much...  maybe we can revisit removing
>>> the few cases in fold-const.c (thus GENERIC folding) and rely
>>> on later GIMPLE passes instead plus on RTL expansion to
>>> do the reverse transform.
>>>
>>> Not for GCC 9 of course.
>> Agreed, 100%.  That was supposed to be where Kai's work was headed, but
>> we never seemed to make significant headway.   Getting BRANCH_COST out
>> of the tree generation/folding would be a good thing at many levels.
>>
>> There'll be fallout, of course.
>
> Note, for the GCC 9 timeframe, for PR85368 I wrote today
> --param logical-op-non-short-circuit={0,1} support that allows
> to override LOGICAL_OP_NON_SHORT_CIRCUIT, because clearly it is impossible
> to handle it in testcases otherwise, just -mbranch-cost= and the defaults
> aren't good enough, as some targets override LOGICAL_OP_NON_SHORT_CIRCUIT
> and figuring out all the details is too hard in tcl.
Yea, that sounds quite reasonable.  The target conditions in those tests
are just insane and obviously unmaintainable.  A param to control
behavior seems like a good idea.

The param also makes it easier to experiment with this stuff as we try
to kill the BRANCH_COST and LOGICAL_OP_NON_SHORT_CIRCUIT bits early.
jeff
Reply | Threaded
Open this post in threaded view
|

Re: Gimpel lowering question.

Michael Eager-2
In reply to this post by Michael Eager-2
On 11/29/18 10:28, Michael Eager wrote:

> On 11/28/18 14:37, Andrew Pinski wrote:
>> On Wed, Nov 28, 2018 at 9:47 AM Michael Eager <[hidden email]> wrote:
>>>
>>> On 11/28/18 09:10, Jeff Law wrote:
>>>> On 11/28/18 10:00 AM, Michael Eager wrote:
>>>>> I have a small test case which generates poor quality code on my
>>>>> target.
>>>>> Here is the original:
>>>>>
>>>>>     if (cond1 == 2048 || cond2 == 8)
>>>>>       {
>>>>>         x = x + y;
>>>>>       }
>>>>>     return x;
>>>>>
>>>>> This ends up generating a series of instructions to compute a flag
>>>>> with
>>>>> the result of the condition followed by a single compare with zero and
>>>>> a jump.  Better code would be two compares and two jumps.
>>>>>
>>>>> The gimple is
>>>>>
>>>>>     _1 = cond1 == 2048;
>>>>>     _2 = cond2 == 8;
>>>>>     _3 = _1 | _2;
>>>>>     if (_3 != 0) goto <D.1464>; else goto <D.1465>;
>>>>> ...
>>>>>
>>>>> so this poor code sequence is essentially a correct translation.
>>>>>
>>>>> On MIPS, for the same test case, the gimple is different:
>>>>>
>>>>>     if (cond1 == 2048) goto <D.1491>; else goto <D.1493>;
>>>>>     <D.1493>:
>>>>>     if (cond2 == 8) goto <D.1491>; else goto <D.1492>;
>>>>>     <D.1491>:
>>>>> ...
>>>>>
>>>>> which generates the better code sequence.
>>>>>
>>>>> Can someone give me a pointer where to find where the lowering
>>>>> pass decides to break up a condition into multiple tests?  Is
>>>>> there a target configuration option which I have overlooked or
>>>>> maybe set incorrectly?
>>>> BRANCH_COST, which comes into play during generation of the initial
>>>> trees as well in passes which try to optimize branchy code into
>>>> straighter code.
>>>
>>> Thanks.  I did look at BRANCH_COST, and played with the values, but it
>>> didn't seem to have any affect.  I'll take a closer look.
>>
>> Look at LOGICAL_OP_NON_SHORT_CIRCUIT .  By defualt, it is defined as:
>>    (BRANCH_COST (optimize_function_for_speed_p (cfun), \
>>                  false) >= 2)
>> But MIPS defines it as 0.
>>
>> Changing  LOGICAL_OP_NON_SHORT_CIRCUIT to 0 will disable this
>> optimization.
>>
>> LOGICAL_OP_NON_SHORT_CIRCUIT controls both places where (cond1 == 2048
>> || cond2 == 8) would remove the branch.
>>
>> NOTE I think MIPS defines LOGICAL_OP_NON_SHORT_CIRCUIT incorrectly but
>> that is a different story.
>>
>> Thanks,
>> Andrew Pinski
>
> I set BRANCH_COST to 1 for both predicted and non-predicted branches.
> This generates the shorter sequence with a compare, but I'm concerned
> that it will adversely affect code generation elsewhere.  Branches are
> higher cost than simple instructions.
>
> Looking at the code generated with BRANCH_COST > 1, it doesn't do what
> I indicated above.  This did not reduce the number of branches.
> In both cases there are two branches in this short test case.  When
> BRANCH_COST > 1, there are three instructions to do a compare vs one
> when BRANCH_COST = 1.  I'm at a loss to see where there is any benefit.
>
> I'll take a look at the LOGICAL_OP_NON_SHORT_CIRCUIT settings and see
> if this makes a difference.


Setting LOGICAL_OP_NON_SHORT_CIRCUIT to zero generates the better code
sequence.  Thanks for the pointer.

I stepped through the code at the two places where
LOGICAL_OP_NON_SHORT_CIRCUIT is tested, with it set both to 0 and 1, and
it is not clear what the code is trying to do.  Or what this parameter
is really supposed to mean.  I'll have to go back and look again.

Setting LOGICAL_OP_NON_SHORT_CIRCUIT to 0 or 1 does not change the
number of branches.  Why this would depend on BRANCH_COST is not clear,
and especially why it would depend on the costs of branches which are
predicted to fail makes no sense to me.

Is there a use case which would help this make some sense?

--
Michael Eager    [hidden email]
1960 Park Blvd., Palo Alto, CA 94306