[PATCH] ipa-inline: Adjust condition for caller_growth_limits

classic Classic list List threaded Threaded
9 messages Options
Reply | Threaded
Open this post in threaded view
|

[PATCH] ipa-inline: Adjust condition for caller_growth_limits

Xionghu Luo
Inline should return failure either (newsize > param_large_function_insns)
OR (newsize > limit).  Sometimes newsize is larger than
param_large_function_insns, but smaller than limit, inline doesn't return
failure even if the new function is a large function.
Therefore, param_large_function_insns and param_large_function_growth should be
OR instead of AND, otherwise --param large-function-growth won't
work correctly with --param large-function-insns.

gcc/ChangeLog:

        2020-01-06  Luo Xiong Hu  <[hidden email]>

        ipa-inline-analysis.c (estimate_growth): Fix typo.
        ipa-inline.c (caller_growth_limits): Use OR instead of AND.
---
 gcc/ipa-inline-analysis.c | 2 +-
 gcc/ipa-inline.c          | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/ipa-inline-analysis.c b/gcc/ipa-inline-analysis.c
index 711b85bd1a9..0e584a5a401 100644
--- a/gcc/ipa-inline-analysis.c
+++ b/gcc/ipa-inline-analysis.c
@@ -462,7 +462,7 @@ offline_size (struct cgraph_node *node, ipa_size_summary *info)
   return 0;
 }
 
-/* Estimate the growth caused by inlining NODE into all callees.  */
+/* Estimate the growth caused by inlining NODE into all callers.  */
 
 int
 estimate_growth (struct cgraph_node *node)
diff --git a/gcc/ipa-inline.c b/gcc/ipa-inline.c
index 0f87c476dde..33945538def 100644
--- a/gcc/ipa-inline.c
+++ b/gcc/ipa-inline.c
@@ -184,8 +184,8 @@ caller_growth_limits (struct cgraph_edge *e)
      the function to shrink if it went over the limits by forced inlining.  */
   newsize = estimate_size_after_inlining (to, e);
   if (newsize >= ipa_size_summaries->get (what)->size
-      && newsize > opt_for_fn (to->decl, param_large_function_insns)
-      && newsize > limit)
+      && (newsize > opt_for_fn (to->decl, param_large_function_insns)
+  || newsize > limit))
     {
       e->inline_failed = CIF_LARGE_FUNCTION_GROWTH_LIMIT;
       return false;
--
2.21.0.777.g83232e3864

Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] ipa-inline: Adjust condition for caller_growth_limits

Jeff Law
On Mon, 2020-01-06 at 01:03 -0600, Xiong Hu Luo wrote:

> Inline should return failure either (newsize > param_large_function_insns)
> OR (newsize > limit).  Sometimes newsize is larger than
> param_large_function_insns, but smaller than limit, inline doesn't return
> failure even if the new function is a large function.
> Therefore, param_large_function_insns and param_large_function_growth should be
> OR instead of AND, otherwise --param large-function-growth won't
> work correctly with --param large-function-insns.
>
> gcc/ChangeLog:
>
> 2020-01-06  Luo Xiong Hu  <[hidden email]>
>
> ipa-inline-analysis.c (estimate_growth): Fix typo.
> ipa-inline.c (caller_growth_limits): Use OR instead of AND.
OK
jeff

Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] ipa-inline: Adjust condition for caller_growth_limits

Xionghu Luo
On 2020/1/7 02:01, Jeff Law wrote:

> On Mon, 2020-01-06 at 01:03 -0600, Xiong Hu Luo wrote:
>> Inline should return failure either (newsize > param_large_function_insns)
>> OR (newsize > limit).  Sometimes newsize is larger than
>> param_large_function_insns, but smaller than limit, inline doesn't return
>> failure even if the new function is a large function.
>> Therefore, param_large_function_insns and param_large_function_growth should be
>> OR instead of AND, otherwise --param large-function-growth won't
>> work correctly with --param large-function-insns.
>>
>> gcc/ChangeLog:
>>
>> 2020-01-06  Luo Xiong Hu  <[hidden email]>
>>
>> ipa-inline-analysis.c (estimate_growth): Fix typo.
>> ipa-inline.c (caller_growth_limits): Use OR instead of AND.
> OK
> jeff

Thanks, committed in r279942.

XiongHu
>

Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] ipa-inline: Adjust condition for caller_growth_limits

Jan Hubicka-2
In reply to this post by Jeff Law
> On Mon, 2020-01-06 at 01:03 -0600, Xiong Hu Luo wrote:
> > Inline should return failure either (newsize > param_large_function_insns)
> > OR (newsize > limit).  Sometimes newsize is larger than
> > param_large_function_insns, but smaller than limit, inline doesn't return
> > failure even if the new function is a large function.
> > Therefore, param_large_function_insns and param_large_function_growth should be
> > OR instead of AND, otherwise --param large-function-growth won't
> > work correctly with --param large-function-insns.
> >
> > gcc/ChangeLog:
> >
> > 2020-01-06  Luo Xiong Hu  <[hidden email]>
> >
> > ipa-inline-analysis.c (estimate_growth): Fix typo.
> > ipa-inline.c (caller_growth_limits): Use OR instead of AND.
> OK

This patch also seems wrong.  Inliner should return failure when
 newsize > param_large_function_insns && newsize > limit
The reason is same as with large_unit_insns.  We used to have only
growth limits but it has turned out that this gets too
restrictive/random for very small units/growths.

So the logic is meant to be that inlining is OK either
 - if function is reasonably small (large_function_insns limit is not
   met)
 - or it does not grow too much (large-function-growth isnot met)

@item large-function-insns                                                      
The limit specifying really large functions.  For functions larger than
this    limit after inlining, inlining is constrained by @option{--param
large-function-growth}.  This parameter is useful primarily to avoid
extreme compilation time caused by non-linear algorithms used by the
back end.


Honza
> jeff
>
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] ipa-inline: Adjust condition for caller_growth_limits

Martin Liška-2
On 1/7/20 9:40 AM, Jan Hubicka wrote:

>> On Mon, 2020-01-06 at 01:03 -0600, Xiong Hu Luo wrote:
>>> Inline should return failure either (newsize > param_large_function_insns)
>>> OR (newsize > limit).  Sometimes newsize is larger than
>>> param_large_function_insns, but smaller than limit, inline doesn't return
>>> failure even if the new function is a large function.
>>> Therefore, param_large_function_insns and param_large_function_growth should be
>>> OR instead of AND, otherwise --param large-function-growth won't
>>> work correctly with --param large-function-insns.
>>>
>>> gcc/ChangeLog:
>>>
>>> 2020-01-06  Luo Xiong Hu  <[hidden email]>
>>>
>>> ipa-inline-analysis.c (estimate_growth): Fix typo.
>>> ipa-inline.c (caller_growth_limits): Use OR instead of AND.
>> OK
>
> This patch also seems wrong.  Inliner should return failure when
>   newsize > param_large_function_insns && newsize > limit
> The reason is same as with large_unit_insns.  We used to have only
> growth limits but it has turned out that this gets too
> restrictive/random for very small units/growths.
>
> So the logic is meant to be that inlining is OK either
>   - if function is reasonably small (large_function_insns limit is not
>     met)
>   - or it does not grow too much (large-function-growth isnot met)
>
> @item large-function-insns
> The limit specifying really large functions.  For functions larger than
> this    limit after inlining, inlining is constrained by @option{--param
> large-function-growth}.  This parameter is useful primarily to avoid
> extreme compilation time caused by non-linear algorithms used by the
> back end.

Hello.

The patch was already installed and caused quite significant changes in SPEC
benchmarks:
https://lnt.opensuse.org/db_default/v4/SPEC/latest_runs_report

May I revert the patch?

Thanks,
Martin

>
>
> Honza
>> jeff
>>

Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] ipa-inline: Adjust condition for caller_growth_limits

Xionghu Luo
In reply to this post by Jan Hubicka-2


On 2020/1/7 16:40, Jan Hubicka wrote:

>> On Mon, 2020-01-06 at 01:03 -0600, Xiong Hu Luo wrote:
>>> Inline should return failure either (newsize > param_large_function_insns)
>>> OR (newsize > limit).  Sometimes newsize is larger than
>>> param_large_function_insns, but smaller than limit, inline doesn't return
>>> failure even if the new function is a large function.
>>> Therefore, param_large_function_insns and param_large_function_growth should be
>>> OR instead of AND, otherwise --param large-function-growth won't
>>> work correctly with --param large-function-insns.
>>>
>>> gcc/ChangeLog:
>>>
>>> 2020-01-06  Luo Xiong Hu  <[hidden email]>
>>>
>>> ipa-inline-analysis.c (estimate_growth): Fix typo.
>>> ipa-inline.c (caller_growth_limits): Use OR instead of AND.
>> OK
>
> This patch also seems wrong.  Inliner should return failure when
>   newsize > param_large_function_insns && newsize > limit
> The reason is same as with large_unit_insns.  We used to have only
> growth limits but it has turned out that this gets too
> restrictive/random for very small units/growths.
>
> So the logic is meant to be that inlining is OK either
>   - if function is reasonably small (large_function_insns limit is not
>     met)
>   - or it does not grow too much (large-function-growth isnot met)

Sorry for checking too fast.  Will revert once confirmed.
large_function_insns is default to 2700, and large-function-growth is
default to 100, so inline a large function with insn 2700 to another large
function with insn 2700 is allowed here (limit reach to 5400), I observed
HUGE performance drop in some cases for this behavior(large function inline
large function due to register spilling) compared with not inlined, that's
why this patch comes out.

From performance point of view,it seems benefitial to inline small function to
large function, but negative when inline large function to large function(causing
register spill)?  2700 seems too large for defining a function to be large and and
5400 also seems too big for growth limit?

This inline happens at "Inlining one function called once", it's out of
inline_small_functions and since newsize > param_large_function_insns, so I  
suppose it won't change behavior for inlining small functions?

>
> @item large-function-insns
> The limit specifying really large functions.  For functions larger than
> this    limit after inlining, inlining is constrained by @option{--param
> large-function-growth}.  This parameter is useful primarily to avoid
> extreme compilation time caused by non-linear algorithms used by the
> back end.
>
>
> Honza
>> jeff
>>

Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] ipa-inline: Adjust condition for caller_growth_limits

Jan Hubicka-2
>
>
> On 2020/1/7 16:40, Jan Hubicka wrote:
> >> On Mon, 2020-01-06 at 01:03 -0600, Xiong Hu Luo wrote:
> >>> Inline should return failure either (newsize > param_large_function_insns)
> >>> OR (newsize > limit).  Sometimes newsize is larger than
> >>> param_large_function_insns, but smaller than limit, inline doesn't return
> >>> failure even if the new function is a large function.
> >>> Therefore, param_large_function_insns and param_large_function_growth should be
> >>> OR instead of AND, otherwise --param large-function-growth won't
> >>> work correctly with --param large-function-insns.
> >>>
> >>> gcc/ChangeLog:
> >>>
> >>> 2020-01-06  Luo Xiong Hu  <[hidden email]>
> >>>
> >>> ipa-inline-analysis.c (estimate_growth): Fix typo.
> >>> ipa-inline.c (caller_growth_limits): Use OR instead of AND.
> >> OK
> >
> > This patch also seems wrong.  Inliner should return failure when
> >   newsize > param_large_function_insns && newsize > limit
> > The reason is same as with large_unit_insns.  We used to have only
> > growth limits but it has turned out that this gets too
> > restrictive/random for very small units/growths.
> >
> > So the logic is meant to be that inlining is OK either
> >   - if function is reasonably small (large_function_insns limit is not
> >     met)
> >   - or it does not grow too much (large-function-growth isnot met)
>
> Sorry for checking too fast.  Will revert once confirmed.

Yes, i can confirm that the origina test was as intended and it is OK to
revert the patch (I am currently traveling)

> large_function_insns is default to 2700, and large-function-growth is
> default to 100, so inline a large function with insn 2700 to another large
> function with insn 2700 is allowed here (limit reach to 5400), I observed
> HUGE performance drop in some cases for this behavior(large function inline
> large function due to register spilling) compared with not inlined, that's
> why this patch comes out.
>
> From performance point of view,it seems benefitial to inline small function to
> large function, but negative when inline large function to large function(causing
> register spill)?  2700 seems too large for defining a function to be large and and
> 5400 also seems too big for growth limit?
>
> This inline happens at "Inlining one function called once", it's out of
> inline_small_functions and since newsize > param_large_function_insns, so I  
> suppose it won't change behavior for inlining small functions?

large-function-growth/insns is not meant to improve performance. The
reason why they exists is the fact that compiler is non-linar
in some cases and thus producing very large functions makes it slow / or
give up on some optimizations.

The problem that sometimes inlining a larger function into a cold path
of other function leads to slowdown is rather old problem (see PR49194).
In general most of logic in inliner is about determining when inlining
is going to be useful. It is hard to tell when it is *not* going to be
useful, so I am bit unsure what to do about these cases in general.

Recently I experimented path which disables inlining functions called
once when callee and caller frequency differ (i.e. one is hot other is
cold) etc, but it did not lead to consistent perormance improvements.

What situation did you run into?

Honza
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] ipa-inline: Adjust condition for caller_growth_limits

Xionghu Luo

On 2020/1/7 23:40, Jan Hubicka wrote:

>>
>>
>> On 2020/1/7 16:40, Jan Hubicka wrote:
>>>> On Mon, 2020-01-06 at 01:03 -0600, Xiong Hu Luo wrote:
>>>>> Inline should return failure either (newsize > param_large_function_insns)
>>>>> OR (newsize > limit).  Sometimes newsize is larger than
>>>>> param_large_function_insns, but smaller than limit, inline doesn't return
>>>>> failure even if the new function is a large function.
>>>>> Therefore, param_large_function_insns and param_large_function_growth should be
>>>>> OR instead of AND, otherwise --param large-function-growth won't
>>>>> work correctly with --param large-function-insns.
>>>>>
>>>>> gcc/ChangeLog:
>>>>>
>>>>> 2020-01-06  Luo Xiong Hu  <[hidden email]>
>>>>>
>>>>> ipa-inline-analysis.c (estimate_growth): Fix typo.
>>>>> ipa-inline.c (caller_growth_limits): Use OR instead of AND.
>>>> OK
>>>
>>> This patch also seems wrong.  Inliner should return failure when
>>>    newsize > param_large_function_insns && newsize > limit
>>> The reason is same as with large_unit_insns.  We used to have only
>>> growth limits but it has turned out that this gets too
>>> restrictive/random for very small units/growths.
>>>
>>> So the logic is meant to be that inlining is OK either
>>>    - if function is reasonably small (large_function_insns limit is not
>>>      met)
>>>    - or it does not grow too much (large-function-growth isnot met)
>>
>> Sorry for checking too fast.  Will revert once confirmed.
>
> Yes, i can confirm that the origina test was as intended and it is OK to
> revert the patch (I am currently traveling)

Partially reverted in r279986 and remained the typo fix as it is obvious.

>
>> large_function_insns is default to 2700, and large-function-growth is
>> default to 100, so inline a large function with insn 2700 to another large
>> function with insn 2700 is allowed here (limit reach to 5400), I observed
>> HUGE performance drop in some cases for this behavior(large function inline
>> large function due to register spilling) compared with not inlined, that's
>> why this patch comes out.
>>
>>  From performance point of view,it seems benefitial to inline small function to
>> large function, but negative when inline large function to large function(causing
>> register spill)?  2700 seems too large for defining a function to be large and and
>> 5400 also seems too big for growth limit?
>>
>> This inline happens at "Inlining one function called once", it's out of
>> inline_small_functions and since newsize > param_large_function_insns, so I
>> suppose it won't change behavior for inlining small functions?
>
> large-function-growth/insns is not meant to improve performance. The
> reason why they exists is the fact that compiler is non-linar
> in some cases and thus producing very large functions makes it slow / or
> give up on some optimizations.
>
> The problem that sometimes inlining a larger function into a cold path
> of other function leads to slowdown is rather old problem (see PR49194).
> In general most of logic in inliner is about determining when inlining
> is going to be useful. It is hard to tell when it is *not* going to be
> useful, so I am bit unsure what to do about these cases in general.
>
> Recently I experimented path which disables inlining functions called
> once when callee and caller frequency differ (i.e. one is hot other is
> cold) etc, but it did not lead to consistent perormance improvements.
>
> What situation did you run into?

Thanks.  So caller could be {hot, cold} + {large, small}, same for callee.  It may
produce up to 4 * 4 = 16 combinations.  Agree that hard to define useful,
and useful really doesn't reflect performance improvements certainly. :)

My case is A1(1) calls A2(2), A2(2) calls A3(3).  A1, A2, A3 are
specialized cloned nodes with different input, they are all hot, called once
and each have about 1000+ insns.  By default, large-function-growth/insns are
both not reached, so A1 will inline A2, A2 will inline A3, which is 40% slower than no-inline.

If adjust the large-function-growth/insns to allow
A2 inline A3 only, the performance is 20% slower then no-inline.
All the inlinings are generated in functions called once.


Xionghu

>
> Honza
>

Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] ipa-inline: Adjust condition for caller_growth_limits

Jan Hubicka-2
>
> Thanks.  So caller could be {hot, cold} + {large, small}, same for callee.  It may
> produce up to 4 * 4 = 16 combinations.  Agree that hard to define useful,
> and useful really doesn't reflect performance improvements certainly. :)
>
> My case is A1(1) calls A2(2), A2(2) calls A3(3).  A1, A2, A3 are
> specialized cloned nodes with different input, they are all hot, called once
> and each have about 1000+ insns.  By default, large-function-growth/insns are
> both not reached, so A1 will inline A2, A2 will inline A3, which is 40% slower than no-inline.
>
> If adjust the large-function-growth/insns to allow
> A2 inline A3 only, the performance is 20% slower then no-inline.
> All the inlinings are generated in functions called once.

I see, I assume that this is exchange.  What is difficult for GCC in
exchange is the large loop nest.  GCC generally assumes that what is
inside of deeper loop nest is more iportant and if I recall correctly
there are 10 nested loops wrapping the recursie call.

Basic observation is that for every self recursive function the
combined frequency of all self recursive calls must be less than entry
block frequency or the recursion tree will never end.

Some time ago we added PRED_LOOP_EXIT_WITH_RECURSION,
PRED_RECURSIVE_CALL, PRED_LOOP_GUARD_WITH_RECURSION which makes loops
leading to recursion less likely to iterate. But this may not be enough
to get profile correct.

I wonder if we can not help the situation by extending
esitmate_bb_frequencies to simply sum the frequencies of recursive calls
and if they exceeds entry block forcingly scale down corresponding BBs
accordingly (which would leave profile locally inconsistent, but I am
not sure how to do much better - one could identif control dependencies
and drop probabilities but after that one would need re-propagate the
loop nest I guess.

This may 1) make inliner less eager to perform the inline
         2) make tree optimizers to produce less damage on the outer
            loops if inlining happens.
Honza
>
>
> Xionghu
>
> >
> > Honza
> >
>