if (x > ((2^x)-1)) optimization

classic Classic list List threaded Threaded
6 messages Options
Reply | Threaded
Open this post in threaded view
|

if (x > ((2^x)-1)) optimization

Jason Duerstock
I was starting at the assembly from some of the Python source, and
came across this (simplified) comparison:

if (x > 2305843009213693951) {...}

This is the same as:

if (x > 0x1fffffffffffffff) {...}

This is equivalent to:

if (x >> 61) {...}

More generally, we can rewrite

if ( x > ((1 << z) -1)) { ...}

as

if ( x >> z ) { ... }

This does not appear to currently be a gcc optimization.  What is
involved in adding it?

Jason
Reply | Threaded
Open this post in threaded view
|

Re: if (x > ((2^x)-1)) optimization

Jeff Law
On 6/22/19 7:55 AM, Jason Duerstock wrote:

> I was starting at the assembly from some of the Python source, and
> came across this (simplified) comparison:
>
> if (x > 2305843009213693951) {...}
>
> This is the same as:
>
> if (x > 0x1fffffffffffffff) {...}
>
> This is equivalent to:
>
> if (x >> 61) {...}
>
> More generally, we can rewrite
>
> if ( x > ((1 << z) -1)) { ...}
>
> as
>
> if ( x >> z ) { ... }
>
> This does not appear to currently be a gcc optimization.  What is
> involved in adding it?
So first, when discussing this kind of stuff it's usually best to
actually have a compilable example.  Fragments usually are insufficient.

Adding it to the RTL optimizers would be painful because of the need for
type information (this is only valid when X is unsigned, right?)

Adding it to the gimple optimizers is painful because the optimized form
is actually de-optimized on some targets (say embedded targets with weak
shifters) and querying the target costs/capabilities is generally
frowned upon in the gimple optimizers.

I think the combination of those two factors would tend to argue for an
implementation in the gimple->rtl expanders.  You've still got type
information and you can query the backend for costing and capabilities.

cfgexpand::expand_gimple_cond might be a good place to start.

Another place to poke around would be expr:expand_expr_real_2, case
COND_EXPR.

Jeff
Reply | Threaded
Open this post in threaded view
|

Re: if (x > ((2^x)-1)) optimization

Segher Boessenkool
On Sat, Jun 22, 2019 at 09:46:52AM -0600, Jeff Law wrote:

> On 6/22/19 7:55 AM, Jason Duerstock wrote:
> > More generally, we can rewrite
> >
> > if ( x > ((1 << z) -1)) { ...}
> >
> > as
> >
> > if ( x >> z ) { ... }
> >
> > This does not appear to currently be a gcc optimization.  What is
> > involved in adding it?
> So first, when discussing this kind of stuff it's usually best to
> actually have a compilable example.  Fragments usually are insufficient.
>
> Adding it to the RTL optimizers would be painful because of the need for
> type information (this is only valid when X is unsigned, right?)
>
> Adding it to the gimple optimizers is painful because the optimized form
> is actually de-optimized on some targets (say embedded targets with weak
> shifters) and querying the target costs/capabilities is generally
> frowned upon in the gimple optimizers.
>
> I think the combination of those two factors would tend to argue for an
> implementation in the gimple->rtl expanders.  You've still got type
> information and you can query the backend for costing and capabilities.
>
> cfgexpand::expand_gimple_cond might be a good place to start.
>
> Another place to poke around would be expr:expand_expr_real_2, case
> COND_EXPR.

This is target-specific, so should just be done in the machine description?


Segher
Reply | Threaded
Open this post in threaded view
|

Re: if (x > ((2^x)-1)) optimization

Jeff Law
On 6/22/19 12:44 PM, Segher Boessenkool wrote:

> On Sat, Jun 22, 2019 at 09:46:52AM -0600, Jeff Law wrote:
>> On 6/22/19 7:55 AM, Jason Duerstock wrote:
>>> More generally, we can rewrite
>>>
>>> if ( x > ((1 << z) -1)) { ...}
>>>
>>> as
>>>
>>> if ( x >> z ) { ... }
>>>
>>> This does not appear to currently be a gcc optimization.  What is
>>> involved in adding it?
>> So first, when discussing this kind of stuff it's usually best to
>> actually have a compilable example.  Fragments usually are insufficient.
>>
>> Adding it to the RTL optimizers would be painful because of the need for
>> type information (this is only valid when X is unsigned, right?)
>>
>> Adding it to the gimple optimizers is painful because the optimized form
>> is actually de-optimized on some targets (say embedded targets with weak
>> shifters) and querying the target costs/capabilities is generally
>> frowned upon in the gimple optimizers.
>>
>> I think the combination of those two factors would tend to argue for an
>> implementation in the gimple->rtl expanders.  You've still got type
>> information and you can query the backend for costing and capabilities.
>>
>> cfgexpand::expand_gimple_cond might be a good place to start.
>>
>> Another place to poke around would be expr:expand_expr_real_2, case
>> COND_EXPR.
>
> This is target-specific, so should just be done in the machine description?
No because every target would have to reimplement it.  Drive it with
target costing and capability querying and we do it just once in the
expanders rather than for every target individually.

jeff
Reply | Threaded
Open this post in threaded view
|

Re: if (x > ((2^x)-1)) optimization

Segher Boessenkool
On Sat, Jun 22, 2019 at 03:30:15PM -0600, Jeff Law wrote:

> On 6/22/19 12:44 PM, Segher Boessenkool wrote:
> > On Sat, Jun 22, 2019 at 09:46:52AM -0600, Jeff Law wrote:
> >> On 6/22/19 7:55 AM, Jason Duerstock wrote:
> >>> More generally, we can rewrite
> >>>
> >>> if ( x > ((1 << z) -1)) { ...}
> >>>
> >>> as
> >>>
> >>> if ( x >> z ) { ... }
> >>>
> >>> This does not appear to currently be a gcc optimization.  What is
> >>> involved in adding it?
> >> So first, when discussing this kind of stuff it's usually best to
> >> actually have a compilable example.  Fragments usually are insufficient.
> >>
> >> Adding it to the RTL optimizers would be painful because of the need for
> >> type information (this is only valid when X is unsigned, right?)
> >>
> >> Adding it to the gimple optimizers is painful because the optimized form
> >> is actually de-optimized on some targets (say embedded targets with weak
> >> shifters) and querying the target costs/capabilities is generally
> >> frowned upon in the gimple optimizers.
> >>
> >> I think the combination of those two factors would tend to argue for an
> >> implementation in the gimple->rtl expanders.  You've still got type
> >> information and you can query the backend for costing and capabilities.
> >>
> >> cfgexpand::expand_gimple_cond might be a good place to start.
> >>
> >> Another place to poke around would be expr:expand_expr_real_2, case
> >> COND_EXPR.
> >
> > This is target-specific, so should just be done in the machine description?
> No because every target would have to reimplement it.  Drive it with
> target costing and capability querying and we do it just once in the
> expanders rather than for every target individually.

But you want to do this optimisation not only at expand time, for example you
probably want to do it after the RTL loop transforms as well.

This could be described in the machine description as just the relevant
comparison, generating only the zero/non-zero output.  Various passes will
pick it up automatically then.  This is the same way many similar patterns
are described already.


Segher
Reply | Threaded
Open this post in threaded view
|

Re: if (x > ((2^x)-1)) optimization

Jeff Law
On 6/23/19 5:50 AM, Segher Boessenkool wrote:

> On Sat, Jun 22, 2019 at 03:30:15PM -0600, Jeff Law wrote:
>> On 6/22/19 12:44 PM, Segher Boessenkool wrote:
>>> On Sat, Jun 22, 2019 at 09:46:52AM -0600, Jeff Law wrote:
>>>> On 6/22/19 7:55 AM, Jason Duerstock wrote:
>>>>> More generally, we can rewrite
>>>>>
>>>>> if ( x > ((1 << z) -1)) { ...}
>>>>>
>>>>> as
>>>>>
>>>>> if ( x >> z ) { ... }
>>>>>
>>>>> This does not appear to currently be a gcc optimization.  What is
>>>>> involved in adding it?
>>>> So first, when discussing this kind of stuff it's usually best to
>>>> actually have a compilable example.  Fragments usually are insufficient.
>>>>
>>>> Adding it to the RTL optimizers would be painful because of the need for
>>>> type information (this is only valid when X is unsigned, right?)
>>>>
>>>> Adding it to the gimple optimizers is painful because the optimized form
>>>> is actually de-optimized on some targets (say embedded targets with weak
>>>> shifters) and querying the target costs/capabilities is generally
>>>> frowned upon in the gimple optimizers.
>>>>
>>>> I think the combination of those two factors would tend to argue for an
>>>> implementation in the gimple->rtl expanders.  You've still got type
>>>> information and you can query the backend for costing and capabilities.
>>>>
>>>> cfgexpand::expand_gimple_cond might be a good place to start.
>>>>
>>>> Another place to poke around would be expr:expand_expr_real_2, case
>>>> COND_EXPR.
>>>
>>> This is target-specific, so should just be done in the machine description?
>> No because every target would have to reimplement it.  Drive it with
>> target costing and capability querying and we do it just once in the
>> expanders rather than for every target individually.
>
> But you want to do this optimisation not only at expand time, for example you
> probably want to do it after the RTL loop transforms as well.
Then some of the bits can be factored into helper routines.  It
shouldn't be terribly hard to handle generating good code at expansion
time as well as capturing cases that show up as a result of
transformations elsewhere in the RTL optimization pipeline.


>
> This could be described in the machine description as just the relevant
> comparison, generating only the zero/non-zero output.  Various passes will
> pick it up automatically then.  This is the same way many similar patterns
> are described already.
BUt again, every machine description would need these patterns.

Instead I'd be looking at simplify-rtx to pick up the cases that are
exposed by the RTL pipeline.

Jeff