[RFC] ipa-cp: Fix PGO regression caused by r278808

classic Classic list List threaded Threaded
13 messages Options
Reply | Threaded
Open this post in threaded view
|

[RFC] ipa-cp: Fix PGO regression caused by r278808

Xionghu Luo
The performance of exchange2 built with PGO will decrease ~28% by r278808
due to profile count set incorrectly.  The cloned nodes are updated to a
very small count caused later pass cunroll fail to unroll the recursive
function in exchange2, This patch enables adding orig_sum to the new nodes
for self recursive node.

digits_2 ->
digits_2.constprop.0, digits_2.constprop.1, etc.

gcc/ChangeLog:

        2019-12-10  Luo Xiong Hu  <[hidden email]>

        * ipa-pure-const.c (self_recursive_p): Move it from ...
        (propagate_pure_const): Use cgraph_node::self_recursive_p.
        (pass_nothrow::execute): Likewise.
        * cgraph.h (cgraph_node::self_recursive_p): to ... this.
        * ipa-cp.c (update_profiling_info): Check self_recursive_p node.
---
 gcc/cgraph.h         | 18 ++++++++++++++++++
 gcc/ipa-cp.c         |  5 ++++-
 gcc/ipa-pure-const.c | 19 ++-----------------
 3 files changed, 24 insertions(+), 18 deletions(-)

diff --git a/gcc/cgraph.h b/gcc/cgraph.h
index cdeea4d9953..1aca7d114e9 100644
--- a/gcc/cgraph.h
+++ b/gcc/cgraph.h
@@ -1329,6 +1329,9 @@ struct GTY((tag ("SYMTAB_FUNCTION"))) cgraph_node : public symtab_node
   /* Return true if function should be optimized for size.  */
   bool optimize_for_size_p (void);
 
+  /* Return true if NODE is self recursive function.  */
+  inline bool self_recursive_p (void);
+
   /* Dump the callgraph to file F.  */
   static void dump_cgraph (FILE *f);
 
@@ -3285,6 +3288,21 @@ cgraph_node::optimize_for_size_p (void)
     return false;
 }
 
+/* Return true if NODE is self recursive function.
+   Indirectly recursive functions appears as non-trivial strongly
+   connected components, so we need to care about self recursion
+   only.  */
+
+inline bool
+cgraph_node::self_recursive_p (void)
+{
+  struct cgraph_edge *e;
+  for (e = this->callees; e; e = e->next_callee)
+    if (e->callee->function_symbol () == this)
+      return true;
+  return false;
+}
+
 /* Return symtab_node for NODE or create one if it is not present
    in symtab.  */
 
diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c
index 14064ae0034..76c1b309d04 100644
--- a/gcc/ipa-cp.c
+++ b/gcc/ipa-cp.c
@@ -4305,7 +4305,10 @@ update_profiling_info (struct cgraph_node *orig_node,
     remainder = remainder.guessed_local ();
 
   new_sum = orig_node_count.combine_with_ipa_count (new_sum);
-  new_node->count = new_sum;
+  if (orig_node->self_recursive_p ())
+    new_node->count = (orig_sum + new_sum).apply_scale (5, 10);
+  else
+    new_node->count = new_sum;
   orig_node->count = remainder;
 
   profile_count::adjust_for_ipa_scaling (&new_sum, &orig_new_node_count);
diff --git a/gcc/ipa-pure-const.c b/gcc/ipa-pure-const.c
index a142e0cc8f6..520ed39b476 100644
--- a/gcc/ipa-pure-const.c
+++ b/gcc/ipa-pure-const.c
@@ -1371,21 +1371,6 @@ ignore_edge_for_nothrow (struct cgraph_edge *e)
   || !opt_for_fn (ultimate_target->decl, flag_ipa_pure_const));
 }
 
-/* Return true if NODE is self recursive function.
-   Indirectly recursive functions appears as non-trivial strongly
-   connected components, so we need to care about self recursion
-   only.  */
-
-static bool
-self_recursive_p (struct cgraph_node *node)
-{
-  struct cgraph_edge *e;
-  for (e = node->callees; e; e = e->next_callee)
-    if (e->callee->function_symbol () == node)
-      return true;
-  return false;
-}
-
 /* Return true if N is cdtor that is not const or pure.  In this case we may
    need to remove unreachable function if it is marked const/pure.  */
 
@@ -1666,7 +1651,7 @@ propagate_pure_const (void)
       if (this_state == IPA_NEITHER)
         this_looping = w_l->looping_previously_known;
     }
-  if (!this_looping && self_recursive_p (w))
+  if (!this_looping && w->self_recursive_p ())
     this_looping = true;
   if (!w_l->looping_previously_known)
     this_looping = false;
@@ -2342,7 +2327,7 @@ pass_nothrow::execute (function *)
   node->set_nothrow_flag (true);
 
   bool cfg_changed = false;
-  if (self_recursive_p (node))
+  if (node->self_recursive_p ())
     FOR_EACH_BB_FN (this_block, cfun)
       if (gimple *g = last_stmt (this_block))
  if (is_gimple_call (g))
--
2.21.0.777.g83232e3864

Reply | Threaded
Open this post in threaded view
|

Re: [RFC] ipa-cp: Fix PGO regression caused by r278808

Martin Liška-2
CC'ing Martin and Honza.

Martin
Reply | Threaded
Open this post in threaded view
|

Re: [RFC] ipa-cp: Fix PGO regression caused by r278808

Jan Hubicka-2
Hi,
I think the updating should treat self recursive edges as loops: that is
calculate SUM of counts incomming edges which are not self recursive,
calculate probability of self recursing and then set count as
SUM * (1/(1-probability_of_recursion))
we do same thing when computing counts withing loops.

Martin, I woner how updating works for indirectly self recursive edges?

Honza
Reply | Threaded
Open this post in threaded view
|

Re: [RFC] ipa-cp: Fix PGO regression caused by r278808

Martin Jambor-3
Hi,

On Tue, Dec 10 2019, Jan Hubicka wrote:
> Hi,
> I think the updating should treat self recursive edges as loops: that is
> calculate SUM of counts incomming edges which are not self recursive,
> calculate probability of self recursing and then set count as
> SUM * (1/(1-probability_of_recursion))
> we do same thing when computing counts withing loops.
>
> Martin, I woner how updating works for indirectly self recursive edges?

It does not special-case them in any way, if that was the question.

Martin
Reply | Threaded
Open this post in threaded view
|

Re: [RFC] ipa-cp: Fix PGO regression caused by r278808

Jan Hubicka-2
> Hi,
>
> On Tue, Dec 10 2019, Jan Hubicka wrote:
> > Hi,
> > I think the updating should treat self recursive edges as loops: that is
> > calculate SUM of counts incomming edges which are not self recursive,
> > calculate probability of self recursing and then set count as
> > SUM * (1/(1-probability_of_recursion))
> > we do same thing when computing counts withing loops.
> >
> > Martin, I woner how updating works for indirectly self recursive edges?
>
> It does not special-case them in any way, if that was the question.

Well, it should, summing only frequencies of edges entering the SCC is
not going to give a good answer.
If SCC is having single entry and forms a loop the generalization of
algorithm above will work, but for general graphs this is hard problem.

I suppose we could special case self recurisve nodes (since they are
most common) and apply some fixed scale for other SCCs?

Also i wonder if the updating is done in RPO so we do not account edges
not updated yet?

Honza

>
> Martin
Reply | Threaded
Open this post in threaded view
|

Re: [RFC] ipa-cp: Fix PGO regression caused by r278808

Xionghu Luo
Thanks Honza,

On 2019/12/10 19:06, Jan Hubicka wrote:

>> Hi,
>>
>> On Tue, Dec 10 2019, Jan Hubicka wrote:
>>> Hi,
>>> I think the updating should treat self recursive edges as loops: that is
>>> calculate SUM of counts incomming edges which are not self recursive,
>>> calculate probability of self recursing and then set count as
>>> SUM * (1/(1-probability_of_recursion))
>>> we do same thing when computing counts withing loops.
>>>
>>> Martin, I woner how updating works for indirectly self recursive edges?
>>
>> It does not special-case them in any way, if that was the question.
>
> Well, it should, summing only frequencies of edges entering the SCC is
> not going to give a good answer.
> If SCC is having single entry and forms a loop the generalization of
> algorithm above will work, but for general graphs this is hard problem.
>
> I suppose we could special case self recurisve nodes (since they are
> most common) and apply some fixed scale for other SCCs?
>
> Also i wonder if the updating is done in RPO so we do not account edges
> not updated yet?

I have several questions with you comments.
1.  Where is the loops counts prob calculation code please?  And I am not
quite sure about the formula SUM * (1/(1-probability_of_recursion)).
At this moment, the self recursive node graph will be transformed as below:

            orig_node<-----
               |    |------|
               |  
  caller ---> new_node

Assuming:
e1: caller    ->  new_node:  count1
e2: orig_node -> orig_node:  count2
e3: new_node  -> orig_node:  count2 (same as back edge when cloned)

e3 is not considered to be a self recursive edge now?
So:
SUM = count2
probability_of_recursion should be count2/(count2 + count2)
=>
orig_node.count = SUM * (1/(1-probability_of_recursion)) = count2 * 2

But the count2 of e3 is not correct as it was copied when cloned which makes
the calculation inaccurate.  And the important thing is new_node.count is
still UNCERTAIN.

2.  Also, the orig_node_count is already inaccurate due to
ipa-cp.c:4289   orig_node_count = (orig_sum + new_sum).apply_scale (12, 10);
Since the parameter param_ipa_cp_max_recursive_depth defines how many nodes
are cloned, is it reasonable to update self_scc node as below to simplify the
calculation(replace self_recursive_p with info->node_is_self_scc)?

+  class ipa_node_params *info = IPA_NODE_REF (orig_node);
+  if (info && info->node_is_self_scc)
+    new_node->count
+      = (orig_sum + new_sum).apply_scale (1, param_ipa_cp_max_recursive_depth);
+  else
+    new_node->count = new_sum;

3.  The recursive ipa-cp will create multiple SCC nodes based on orig_node,
next new_node2 no need update previous new_node1, as each new_node takes
1/param_ipa_cp_max_recursive_depth execution count from orig_node.

BR
Xiong Hu

>
> Honza
>
>>
>> Martin

Reply | Threaded
Open this post in threaded view
|

[PATCH v2] ipa-cp: Fix PGO regression caused by r278808

Xionghu Luo
v2 Changes:
1. Enable proportion orig_sum to the new nodes for self recursive node:
   new_sum = (orig_sum + new_sum) \
   * self_recursive_probability * (1 / param_ipa_cp_max_recursive_depth).
2. Add value range for param_ipa_cp_max_recursive_depth.

The performance of exchange2 built with PGO will decrease ~28% by r278808
due to profile count set incorrectly.  The cloned nodes are updated to a
very small count caused later pass cunroll fail to unroll the recursive
function in exchange2.

digits_2 ->
digits_2.constprop.0, digits_2.constprop.1, etc.

gcc/ChangeLog:

        2019-12-26  Luo Xiong Hu  <[hidden email]>

        * ipa-cp.c (update_profiling_info): Check self_scc node.
        * params.opt (ipa-cp-max-recursive-depth): Set param range.
---
 gcc/ipa-cp.c   | 25 +++++++++++++++++++++++++
 gcc/params.opt |  2 +-
 2 files changed, 26 insertions(+), 1 deletion(-)

diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c
index 14064ae0034..947bf7c7199 100644
--- a/gcc/ipa-cp.c
+++ b/gcc/ipa-cp.c
@@ -4272,6 +4272,31 @@ update_profiling_info (struct cgraph_node *orig_node,
       false);
   new_sum = stats.count_sum;
 
+  class ipa_node_params *info = IPA_NODE_REF (orig_node);
+  if (info && info->node_is_self_scc)
+    {
+      profile_count self_recursive_count;
+
+      /* The self recursive edge is on the orig_node.  */
+      for (cs = orig_node->callees; cs; cs = cs->next_callee)
+ if (ipa_edge_within_scc (cs))
+  {
+    cgraph_node *callee = cs->callee->function_symbol ();
+    if (callee && cs->caller == cs->callee)
+      {
+ self_recursive_count = cs->count;
+ break;
+      }
+  }
+
+      /* Proportion count for self recursive node from all callers.  */
+      new_sum
+ = (orig_sum + new_sum).apply_scale (self_recursive_count, orig_sum);
+
+      /* Proportion count for specialized cloned node.  */
+      new_sum = new_sum.apply_scale (1, param_ipa_cp_max_recursive_depth);
+    }
+
   if (orig_node_count < orig_sum + new_sum)
     {
       if (dump_file)
diff --git a/gcc/params.opt b/gcc/params.opt
index d88ae0c468b..40a03b04580 100644
--- a/gcc/params.opt
+++ b/gcc/params.opt
@@ -199,7 +199,7 @@ Common Joined UInteger Var(param_ipa_cp_loop_hint_bonus) Init(64) Param
 Compile-time bonus IPA-CP assigns to candidates which make loop bounds or strides known.
 
 -param=ipa-cp-max-recursive-depth=
-Common Joined UInteger Var(param_ipa_cp_max_recursive_depth) Init(8) Param
+Common Joined UInteger Var(param_ipa_cp_max_recursive_depth) Init(8) IntegerRange(1, 10) Param
 Maximum depth of recursive cloning for self-recursive function.
 
 -param=ipa-cp-min-recursive-probability=
--
2.21.0.777.g83232e3864

Reply | Threaded
Open this post in threaded view
|

Re: [PATCH v2] ipa-cp: Fix PGO regression caused by r278808

Feng Xue OS
One comment: it's better to allow zero value for param_ipa_cp_max_recursive_depth,
this can be used to disable recursive cloning.

Feng
________________________________________
From: luoxhu <[hidden email]>
Sent: Monday, December 30, 2019 4:11 PM
To: Jan Hubicka; Martin Jambor
Cc: Martin Liška; [hidden email]; [hidden email]; [hidden email]; [hidden email]; [hidden email]; Feng Xue OS
Subject: [PATCH v2] ipa-cp: Fix PGO regression caused by r278808

v2 Changes:
1. Enable proportion orig_sum to the new nodes for self recursive node:
   new_sum = (orig_sum + new_sum) \
   * self_recursive_probability * (1 / param_ipa_cp_max_recursive_depth).
2. Add value range for param_ipa_cp_max_recursive_depth.

The performance of exchange2 built with PGO will decrease ~28% by r278808
due to profile count set incorrectly.  The cloned nodes are updated to a
very small count caused later pass cunroll fail to unroll the recursive
function in exchange2.

digits_2 ->
digits_2.constprop.0, digits_2.constprop.1, etc.

gcc/ChangeLog:

        2019-12-26  Luo Xiong Hu  <[hidden email]>

        * ipa-cp.c (update_profiling_info): Check self_scc node.
        * params.opt (ipa-cp-max-recursive-depth): Set param range.
---
 gcc/ipa-cp.c   | 25 +++++++++++++++++++++++++
 gcc/params.opt |  2 +-
 2 files changed, 26 insertions(+), 1 deletion(-)

diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c
index 14064ae0034..947bf7c7199 100644
--- a/gcc/ipa-cp.c
+++ b/gcc/ipa-cp.c
@@ -4272,6 +4272,31 @@ update_profiling_info (struct cgraph_node *orig_node,
                                              false);
   new_sum = stats.count_sum;

+  class ipa_node_params *info = IPA_NODE_REF (orig_node);
+  if (info && info->node_is_self_scc)
+    {
+      profile_count self_recursive_count;
+
+      /* The self recursive edge is on the orig_node.  */
+      for (cs = orig_node->callees; cs; cs = cs->next_callee)
+       if (ipa_edge_within_scc (cs))
+         {
+           cgraph_node *callee = cs->callee->function_symbol ();
+           if (callee && cs->caller == cs->callee)
+             {
+               self_recursive_count = cs->count;
+               break;
+             }
+         }
+
+      /* Proportion count for self recursive node from all callers.  */
+      new_sum
+       = (orig_sum + new_sum).apply_scale (self_recursive_count, orig_sum);
+
+      /* Proportion count for specialized cloned node.  */
+      new_sum = new_sum.apply_scale (1, param_ipa_cp_max_recursive_depth);
+    }
+
   if (orig_node_count < orig_sum + new_sum)
     {
       if (dump_file)
diff --git a/gcc/params.opt b/gcc/params.opt
index d88ae0c468b..40a03b04580 100644
--- a/gcc/params.opt
+++ b/gcc/params.opt
@@ -199,7 +199,7 @@ Common Joined UInteger Var(param_ipa_cp_loop_hint_bonus) Init(64) Param
 Compile-time bonus IPA-CP assigns to candidates which make loop bounds or strides known.

 -param=ipa-cp-max-recursive-depth=
-Common Joined UInteger Var(param_ipa_cp_max_recursive_depth) Init(8) Param
+Common Joined UInteger Var(param_ipa_cp_max_recursive_depth) Init(8) IntegerRange(1, 10) Param
 Maximum depth of recursive cloning for self-recursive function.

 -param=ipa-cp-min-recursive-probability=
--
2.21.0.777.g83232e3864

Reply | Threaded
Open this post in threaded view
|

Re: [PATCH v2] ipa-cp: Fix PGO regression caused by r278808

Xionghu Luo

On 2019/12/31 14:43, Feng Xue OS wrote:
> One comment: it's better to allow zero value for param_ipa_cp_max_recursive_depth,
> this can be used to disable recursive cloning.

Thanks, "1" means no recursive cloning but only constant propagation from
caller to callee in your code?

ipa-cp.c, line 2019:

       /* Recursively generate lattice values with a limited count.  */
       FOR_EACH_VEC_ELT (val_seeds, i, src_val)
        {
          for (int j = 1; j < param_ipa_cp_max_recursive_depth; j++)
            {
              tree cstval = get_val_across_arith_op (opcode, opnd1_type, opnd2,
                                                     src_val, res_type);
              if (!cstval)
                break;

              ret |= dest_lat->add_value (cstval, cs, src_val, src_idx,
                                          src_offset, &src_val, true);
              gcc_checking_assert (src_val);
            }
        }

XiongHu

>
> Feng
> ________________________________________
> From: luoxhu <[hidden email]>
> Sent: Monday, December 30, 2019 4:11 PM
> To: Jan Hubicka; Martin Jambor
> Cc: Martin Liška; [hidden email]; [hidden email]; [hidden email]; [hidden email]; [hidden email]; Feng Xue OS
> Subject: [PATCH v2] ipa-cp: Fix PGO regression caused by r278808
>
> v2 Changes:
> 1. Enable proportion orig_sum to the new nodes for self recursive node:
>     new_sum = (orig_sum + new_sum) \
>     * self_recursive_probability * (1 / param_ipa_cp_max_recursive_depth).
> 2. Add value range for param_ipa_cp_max_recursive_depth.
>
> The performance of exchange2 built with PGO will decrease ~28% by r278808
> due to profile count set incorrectly.  The cloned nodes are updated to a
> very small count caused later pass cunroll fail to unroll the recursive
> function in exchange2.
>
> digits_2 ->
> digits_2.constprop.0, digits_2.constprop.1, etc.
>
> gcc/ChangeLog:
>
>          2019-12-26  Luo Xiong Hu  <[hidden email]>
>
>          * ipa-cp.c (update_profiling_info): Check self_scc node.
>          * params.opt (ipa-cp-max-recursive-depth): Set param range.
> ---
>   gcc/ipa-cp.c   | 25 +++++++++++++++++++++++++
>   gcc/params.opt |  2 +-
>   2 files changed, 26 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c
> index 14064ae0034..947bf7c7199 100644
> --- a/gcc/ipa-cp.c
> +++ b/gcc/ipa-cp.c
> @@ -4272,6 +4272,31 @@ update_profiling_info (struct cgraph_node *orig_node,
>                                                false);
>     new_sum = stats.count_sum;
>
> +  class ipa_node_params *info = IPA_NODE_REF (orig_node);
> +  if (info && info->node_is_self_scc)
> +    {
> +      profile_count self_recursive_count;
> +
> +      /* The self recursive edge is on the orig_node.  */
> +      for (cs = orig_node->callees; cs; cs = cs->next_callee)
> +       if (ipa_edge_within_scc (cs))
> +         {
> +           cgraph_node *callee = cs->callee->function_symbol ();
> +           if (callee && cs->caller == cs->callee)
> +             {
> +               self_recursive_count = cs->count;
> +               break;
> +             }
> +         }
> +
> +      /* Proportion count for self recursive node from all callers.  */
> +      new_sum
> +       = (orig_sum + new_sum).apply_scale (self_recursive_count, orig_sum);
> +
> +      /* Proportion count for specialized cloned node.  */
> +      new_sum = new_sum.apply_scale (1, param_ipa_cp_max_recursive_depth);
> +    }
> +
>     if (orig_node_count < orig_sum + new_sum)
>       {
>         if (dump_file)
> diff --git a/gcc/params.opt b/gcc/params.opt
> index d88ae0c468b..40a03b04580 100644
> --- a/gcc/params.opt
> +++ b/gcc/params.opt
> @@ -199,7 +199,7 @@ Common Joined UInteger Var(param_ipa_cp_loop_hint_bonus) Init(64) Param
>   Compile-time bonus IPA-CP assigns to candidates which make loop bounds or strides known.
>
>   -param=ipa-cp-max-recursive-depth=
> -Common Joined UInteger Var(param_ipa_cp_max_recursive_depth) Init(8) Param
> +Common Joined UInteger Var(param_ipa_cp_max_recursive_depth) Init(8) IntegerRange(1, 10) Param
>   Maximum depth of recursive cloning for self-recursive function.
>
>   -param=ipa-cp-min-recursive-probability=
> --
> 2.21.0.777.g83232e3864
>

Reply | Threaded
Open this post in threaded view
|

Re: [PATCH v2] ipa-cp: Fix PGO regression caused by r278808

Feng Xue OS
The first level is ordinary clone, corresponding to a non-self caller,
and the param_ipa_cp_max_recursive_depth-1 is for recursive cloning.
Then it's ok that the least value is 1, with which disabling does happen.

Feng

________________________________________
From: luoxhu <[hidden email]>
Sent: Tuesday, December 31, 2019 3:43 PM
To: Feng Xue OS; Jan Hubicka; Martin Jambor
Cc: Martin Liška; [hidden email]; [hidden email]; [hidden email]; [hidden email]; [hidden email]
Subject: Re: [PATCH v2] ipa-cp: Fix PGO regression caused by r278808


On 2019/12/31 14:43, Feng Xue OS wrote:
> One comment: it's better to allow zero value for param_ipa_cp_max_recursive_depth,
> this can be used to disable recursive cloning.

Thanks, "1" means no recursive cloning but only constant propagation from
caller to callee in your code?

ipa-cp.c, line 2019:

       /* Recursively generate lattice values with a limited count.  */
       FOR_EACH_VEC_ELT (val_seeds, i, src_val)
        {
          for (int j = 1; j < param_ipa_cp_max_recursive_depth; j++)
            {
              tree cstval = get_val_across_arith_op (opcode, opnd1_type, opnd2,
                                                     src_val, res_type);
              if (!cstval)
                break;

              ret |= dest_lat->add_value (cstval, cs, src_val, src_idx,
                                          src_offset, &src_val, true);
              gcc_checking_assert (src_val);
            }
        }

XiongHu

>
> Feng
> ________________________________________
> From: luoxhu <[hidden email]>
> Sent: Monday, December 30, 2019 4:11 PM
> To: Jan Hubicka; Martin Jambor
> Cc: Martin Liška; [hidden email]; [hidden email]; [hidden email]; [hidden email]; [hidden email]; Feng Xue OS
> Subject: [PATCH v2] ipa-cp: Fix PGO regression caused by r278808
>
> v2 Changes:
> 1. Enable proportion orig_sum to the new nodes for self recursive node:
>     new_sum = (orig_sum + new_sum) \
>     * self_recursive_probability * (1 / param_ipa_cp_max_recursive_depth).
> 2. Add value range for param_ipa_cp_max_recursive_depth.
>
> The performance of exchange2 built with PGO will decrease ~28% by r278808
> due to profile count set incorrectly.  The cloned nodes are updated to a
> very small count caused later pass cunroll fail to unroll the recursive
> function in exchange2.
>
> digits_2 ->
> digits_2.constprop.0, digits_2.constprop.1, etc.
>
> gcc/ChangeLog:
>
>          2019-12-26  Luo Xiong Hu  <[hidden email]>
>
>          * ipa-cp.c (update_profiling_info): Check self_scc node.
>          * params.opt (ipa-cp-max-recursive-depth): Set param range.
> ---
>   gcc/ipa-cp.c   | 25 +++++++++++++++++++++++++
>   gcc/params.opt |  2 +-
>   2 files changed, 26 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c
> index 14064ae0034..947bf7c7199 100644
> --- a/gcc/ipa-cp.c
> +++ b/gcc/ipa-cp.c
> @@ -4272,6 +4272,31 @@ update_profiling_info (struct cgraph_node *orig_node,
>                                                false);
>     new_sum = stats.count_sum;
>
> +  class ipa_node_params *info = IPA_NODE_REF (orig_node);
> +  if (info && info->node_is_self_scc)
> +    {
> +      profile_count self_recursive_count;
> +
> +      /* The self recursive edge is on the orig_node.  */
> +      for (cs = orig_node->callees; cs; cs = cs->next_callee)
> +       if (ipa_edge_within_scc (cs))
> +         {
> +           cgraph_node *callee = cs->callee->function_symbol ();
> +           if (callee && cs->caller == cs->callee)
> +             {
> +               self_recursive_count = cs->count;
> +               break;
> +             }
> +         }
> +
> +      /* Proportion count for self recursive node from all callers.  */
> +      new_sum
> +       = (orig_sum + new_sum).apply_scale (self_recursive_count, orig_sum);
> +
> +      /* Proportion count for specialized cloned node.  */
> +      new_sum = new_sum.apply_scale (1, param_ipa_cp_max_recursive_depth);
> +    }
> +
>     if (orig_node_count < orig_sum + new_sum)
>       {
>         if (dump_file)
> diff --git a/gcc/params.opt b/gcc/params.opt
> index d88ae0c468b..40a03b04580 100644
> --- a/gcc/params.opt
> +++ b/gcc/params.opt
> @@ -199,7 +199,7 @@ Common Joined UInteger Var(param_ipa_cp_loop_hint_bonus) Init(64) Param
>   Compile-time bonus IPA-CP assigns to candidates which make loop bounds or strides known.
>
>   -param=ipa-cp-max-recursive-depth=
> -Common Joined UInteger Var(param_ipa_cp_max_recursive_depth) Init(8) Param
> +Common Joined UInteger Var(param_ipa_cp_max_recursive_depth) Init(8) IntegerRange(1, 10) Param
>   Maximum depth of recursive cloning for self-recursive function.
>
>   -param=ipa-cp-min-recursive-probability=
> --
> 2.21.0.777.g83232e3864
>

Reply | Threaded
Open this post in threaded view
|

Re: [PATCH v2] ipa-cp: Fix PGO regression caused by r278808

Jan Hubicka-2
In reply to this post by Xionghu Luo
> v2 Changes:
> 1. Enable proportion orig_sum to the new nodes for self recursive node:
>    new_sum = (orig_sum + new_sum) \
>    * self_recursive_probability * (1 / param_ipa_cp_max_recursive_depth).
> 2. Add value range for param_ipa_cp_max_recursive_depth.
>
> The performance of exchange2 built with PGO will decrease ~28% by r278808
> due to profile count set incorrectly.  The cloned nodes are updated to a
> very small count caused later pass cunroll fail to unroll the recursive
> function in exchange2.
>
> digits_2 ->
> digits_2.constprop.0, digits_2.constprop.1, etc.
>
> gcc/ChangeLog:
>
> 2019-12-26  Luo Xiong Hu  <[hidden email]>
>
> * ipa-cp.c (update_profiling_info): Check self_scc node.
> * params.opt (ipa-cp-max-recursive-depth): Set param range.
> ---
>  gcc/ipa-cp.c   | 25 +++++++++++++++++++++++++
>  gcc/params.opt |  2 +-
>  2 files changed, 26 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c
> index 14064ae0034..947bf7c7199 100644
> --- a/gcc/ipa-cp.c
> +++ b/gcc/ipa-cp.c
> @@ -4272,6 +4272,31 @@ update_profiling_info (struct cgraph_node *orig_node,
>        false);
>    new_sum = stats.count_sum;
>  
> +  class ipa_node_params *info = IPA_NODE_REF (orig_node);
> +  if (info && info->node_is_self_scc)
> +    {
> +      profile_count self_recursive_count;
> +
> +      /* The self recursive edge is on the orig_node.  */
> +      for (cs = orig_node->callees; cs; cs = cs->next_callee)
> + if (ipa_edge_within_scc (cs))
> +  {
> +    cgraph_node *callee = cs->callee->function_symbol ();
> +    if (callee && cs->caller == cs->callee)
> +      {
> + self_recursive_count = cs->count;
> + break;
> +      }
> +  }

What happens here when there are multiple self recursive calls or when
the SCC has two mutually recursive functions?

I am still confused by the logic of this function.  I will take a deeper
look at your previous email.

> +
> +      /* Proportion count for self recursive node from all callers.  */
> +      new_sum
> + = (orig_sum + new_sum).apply_scale (self_recursive_count, orig_sum);
> +
> +      /* Proportion count for specialized cloned node.  */
> +      new_sum = new_sum.apply_scale (1, param_ipa_cp_max_recursive_depth);
> +    }
> +
>    if (orig_node_count < orig_sum + new_sum)
>      {
>        if (dump_file)
> diff --git a/gcc/params.opt b/gcc/params.opt
> index d88ae0c468b..40a03b04580 100644
> --- a/gcc/params.opt
> +++ b/gcc/params.opt
> @@ -199,7 +199,7 @@ Common Joined UInteger Var(param_ipa_cp_loop_hint_bonus) Init(64) Param
>  Compile-time bonus IPA-CP assigns to candidates which make loop bounds or strides known.
>  
>  -param=ipa-cp-max-recursive-depth=
> -Common Joined UInteger Var(param_ipa_cp_max_recursive_depth) Init(8) Param
> +Common Joined UInteger Var(param_ipa_cp_max_recursive_depth) Init(8) IntegerRange(1, 10) Param
>  Maximum depth of recursive cloning for self-recursive function.

The values stats from 0 but I also wonder why 10 is a good upper bound?
If the function calls itself with one type of value (like depth-1) then
we may clone well over 10 times, if it calls itself with two different
sets then 8 is already quite high I would say...

I suppose the call probabilities will eventually drop to be very low,
but I am not quite sure about that because we do not handle any IP
frequency propagation.  Do we have some way to treat wide trees? Or do
we clone only all self recursive calls are the same?

Honza
>  
>  -param=ipa-cp-min-recursive-probability=
> --
> 2.21.0.777.g83232e3864
>
Reply | Threaded
Open this post in threaded view
|

[PATCH v3] ipa-cp: Fix PGO regression caused by r278808

Xionghu Luo
Hi,

On 2020/1/3 00:58, Jan Hubicka wrote:

>> diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c
>> index 14064ae0034..947bf7c7199 100644
>> --- a/gcc/ipa-cp.c
>> +++ b/gcc/ipa-cp.c
>> @@ -4272,6 +4272,31 @@ update_profiling_info (struct cgraph_node *orig_node,
>>        false);
>>     new_sum = stats.count_sum;
>>  
>> +  class ipa_node_params *info = IPA_NODE_REF (orig_node);
>> +  if (info && info->node_is_self_scc)
>> +    {
>> +      profile_count self_recursive_count;
>> +
>> +      /* The self recursive edge is on the orig_node.  */
>> +      for (cs = orig_node->callees; cs; cs = cs->next_callee)
>> + if (ipa_edge_within_scc (cs))
>> +  {
>> +    cgraph_node *callee = cs->callee->function_symbol ();
>> +    if (callee && cs->caller == cs->callee)
>> +      {
>> + self_recursive_count = cs->count;
>> + break;
>> +      }
>> +  }
>
> What happens here when there are multiple self recursive calls or when
> the SCC has two mutually recursive functions?
>
> I am still confused by the logic of this function.  I will take a deeper
> look at your previous email.
>> +
>> +      /* Proportion count for self recursive node from all callers.  */
>> +      new_sum
>> + = (orig_sum + new_sum).apply_scale (self_recursive_count, orig_sum);
>> +
>> +      /* Proportion count for specialized cloned node.  */
>> +      new_sum = new_sum.apply_scale (1, param_ipa_cp_max_recursive_depth);
>> +    }
>> +
>>     if (orig_node_count < orig_sum + new_sum)
>>       {
>>         if (dump_file)
>> diff --git a/gcc/params.opt b/gcc/params.opt
>> index d88ae0c468b..40a03b04580 100644
>> --- a/gcc/params.opt
>> +++ b/gcc/params.opt
>> @@ -199,7 +199,7 @@ Common Joined UInteger Var(param_ipa_cp_loop_hint_bonus) Init(64) Param
>>   Compile-time bonus IPA-CP assigns to candidates which make loop bounds or strides known.
>>  
>>   -param=ipa-cp-max-recursive-depth=
>> -Common Joined UInteger Var(param_ipa_cp_max_recursive_depth) Init(8) Param
>> +Common Joined UInteger Var(param_ipa_cp_max_recursive_depth) Init(8) IntegerRange(1, 10) Param
>>   Maximum depth of recursive cloning for self-recursive function.
>
> The values stats from 0 but I also wonder why 10 is a good upper bound?
> If the function calls itself with one type of value (like depth-1) then
> we may clone well over 10 times, if it calls itself with two different
> sets then 8 is already quite high I would say...
>
> I suppose the call probabilities will eventually drop to be very low,
> but I am not quite sure about that because we do not handle any IP
> frequency propagation.  Do we have some way to treat wide trees? Or do
> we clone only all self recursive calls are the same?
>
> Honza
>>  
Update v3 patch.  This regression could only be reproduced when built with
"-fprofile-generate/-fprofile-use --param ipa-cp-eval-threshold=0 --param
ipa-cp-unit-growth=80" on exchange_2, on some platforms -fno-inline may be
also needed, I attached 3 files (compressed to exchange.tar.gz)
exchange2_gcc.use.orig.wpa.071i.cp.old, exchange2_gcc.use.orig.wpa.071i.cp.new
and cp.diff to show the profile count difference of specialized node
digits_2.constprop/152 to digits_2.constprop/159 without/with this patch.

Profile count is decreasing slowly with this patch instead of keeping very
small from the first clone (very low count as cold will cause complete unroll
not working), it still differs from really execution (exchange2.png), but this
average model takes the recursive edge as feedback.  Thanks.


v3:
1. Enable proportion orig_sum to the new nodes for self recursive node (k means
   multiple self recursive calls):
   new_sum = (orig_sum + new_sum) * 1 / k \
   * self_recursive_probability * (1 / param_ipa_cp_max_recursive_depth).
2. Two mutually recursive functions are not supported in self recursive
   clone yet so also not supported in update_profiling_info here.
3. Improve value range for param_ipa_cp_max_recursive_depth to (0, 8).
   If it calls itself two different sets, usually recursive boudary limit
   will stop the specialize first, otherwise it is slow even without
   recursive specialize.

The performance of exchange2 built with PGO will decrease ~28% by r278808
due to profile count set incorrectly.  The cloned nodes are updated to a
very small count caused later pass cunroll fail to unroll the recursive
function in exchange2.

digits_2 ->
digits_2.constprop.0, digits_2.constprop.1, etc.

gcc/ChangeLog:

        2020-01-14  Luo Xiong Hu  <[hidden email]>

        * ipa-cp.c (update_profiling_info): Check self_scc node.
        * params.opt (ipa-cp-max-recursive-depth): Set param range.
---
 gcc/ipa-cp.c   | 33 ++++++++++++++++++++++++++++++++-
 gcc/params.opt |  2 +-
 2 files changed, 33 insertions(+), 2 deletions(-)

diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c
index 17da1d8e8a7..8b759168e24 100644
--- a/gcc/ipa-cp.c
+++ b/gcc/ipa-cp.c
@@ -2041,7 +2041,7 @@ propagate_vals_across_arith_jfunc (cgraph_edge *cs,
       /* Recursively generate lattice values with a limited count.  */
       FOR_EACH_VEC_ELT (val_seeds, i, src_val)
  {
-  for (int j = 1; j < max_recursive_depth; j++)
+  for (int j = 0; j < max_recursive_depth; j++)
     {
       tree cstval = get_val_across_arith_op (opcode, opnd1_type, opnd2,
      src_val, res_type);
@@ -4313,6 +4313,37 @@ update_profiling_info (struct cgraph_node *orig_node,
       false);
   new_sum = stats.count_sum;
 
+  class ipa_node_params *info = IPA_NODE_REF (orig_node);
+  if (info && info->node_is_self_scc)
+    {
+      profile_count self_recursive_count = profile_count::zero ();
+      unsigned k = 0;
+
+      /* The self recursive edge is on the orig_node.  */
+      for (cs = orig_node->callees; cs; cs = cs->next_callee)
+ if (ipa_edge_within_scc (cs))
+  {
+    cgraph_node *callee = cs->callee->function_symbol ();
+    if (callee && cs->caller == cs->callee)
+      {
+ self_recursive_count += cs->count;
+ k++;
+      }
+  }
+
+      /* Proportion count for multiple self recursive node.  */
+      if (k)
+ self_recursive_count.apply_scale (1, k);
+
+      /* Proportion count for self recursive node from all callers.  */
+      new_sum
+ = (orig_sum + new_sum).apply_scale (self_recursive_count, orig_sum);
+
+      /* Proportion count for specialized cloned node.  */
+      if (param_ipa_cp_max_recursive_depth)
+ new_sum = new_sum.apply_scale (1, param_ipa_cp_max_recursive_depth);
+    }
+
   if (orig_node_count < orig_sum + new_sum)
     {
       if (dump_file)
diff --git a/gcc/params.opt b/gcc/params.opt
index 31cc20031b1..9eceee5871f 100644
--- a/gcc/params.opt
+++ b/gcc/params.opt
@@ -199,7 +199,7 @@ Common Joined UInteger Var(param_ipa_cp_loop_hint_bonus) Init(64) Param Optimiza
 Compile-time bonus IPA-CP assigns to candidates which make loop bounds or strides known.
 
 -param=ipa-cp-max-recursive-depth=
-Common Joined UInteger Var(param_ipa_cp_max_recursive_depth) Init(8) Param Optimization
+Common Joined UInteger Var(param_ipa_cp_max_recursive_depth) Init(7) IntegerRange(0, 8) Param Optimization
 Maximum depth of recursive cloning for self-recursive function.
 
 -param=ipa-cp-min-recursive-probability=
--
2.21.0.777.g83232e3864


exchange2.tar.gz (211K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Ping^1: [PATCH v3] ipa-cp: Fix PGO regression caused by r278808

Xionghu Luo
Ping,

attachment of https://gcc.gnu.org/ml/gcc-patches/2020-01/msg00764/exchange2.tar.gz
shows the profile count difference on cloned nodes digits_2.constprop.[0...8]
without/with this patch.  Thanks!

Xiong Hu


On 2020/1/14 14:45, luoxhu wrote:

> Hi,
>
> On 2020/1/3 00:58, Jan Hubicka wrote:
>>> diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c
>>> index 14064ae0034..947bf7c7199 100644
>>> --- a/gcc/ipa-cp.c
>>> +++ b/gcc/ipa-cp.c
>>> @@ -4272,6 +4272,31 @@ update_profiling_info (struct cgraph_node
>>> *orig_node,
>>>                             false);
>>>     new_sum = stats.count_sum;
>>> +  class ipa_node_params *info = IPA_NODE_REF (orig_node);
>>> +  if (info && info->node_is_self_scc)
>>> +    {
>>> +      profile_count self_recursive_count;
>>> +
>>> +      /* The self recursive edge is on the orig_node.  */
>>> +      for (cs = orig_node->callees; cs; cs = cs->next_callee)
>>> +    if (ipa_edge_within_scc (cs))
>>> +      {
>>> +        cgraph_node *callee = cs->callee->function_symbol ();
>>> +        if (callee && cs->caller == cs->callee)
>>> +          {
>>> +        self_recursive_count = cs->count;
>>> +        break;
>>> +          }
>>> +      }
>>
>> What happens here when there are multiple self recursive calls or when
>> the SCC has two mutually recursive functions?
>>
>> I am still confused by the logic of this function.  I will take a deeper
>> look at your previous email.
>>> +
>>> +      /* Proportion count for self recursive node from all callers.  */
>>> +      new_sum
>>> +    = (orig_sum + new_sum).apply_scale (self_recursive_count, orig_sum);
>>> +
>>> +      /* Proportion count for specialized cloned node.  */
>>> +      new_sum = new_sum.apply_scale (1, param_ipa_cp_max_recursive_depth);
>>> +    }
>>> +
>>>     if (orig_node_count < orig_sum + new_sum)
>>>       {
>>>         if (dump_file)
>>> diff --git a/gcc/params.opt b/gcc/params.opt
>>> index d88ae0c468b..40a03b04580 100644
>>> --- a/gcc/params.opt
>>> +++ b/gcc/params.opt
>>> @@ -199,7 +199,7 @@ Common Joined UInteger
>>> Var(param_ipa_cp_loop_hint_bonus) Init(64) Param
>>>   Compile-time bonus IPA-CP assigns to candidates which make loop bounds
>>> or strides known.
>>>   -param=ipa-cp-max-recursive-depth=
>>> -Common Joined UInteger Var(param_ipa_cp_max_recursive_depth) Init(8) Param
>>> +Common Joined UInteger Var(param_ipa_cp_max_recursive_depth) Init(8)
>>> IntegerRange(1, 10) Param
>>>   Maximum depth of recursive cloning for self-recursive function.
>>
>> The values stats from 0 but I also wonder why 10 is a good upper bound?
>> If the function calls itself with one type of value (like depth-1) then
>> we may clone well over 10 times, if it calls itself with two different
>> sets then 8 is already quite high I would say...
>>
>> I suppose the call probabilities will eventually drop to be very low,
>> but I am not quite sure about that because we do not handle any IP
>> frequency propagation.  Do we have some way to treat wide trees? Or do
>> we clone only all self recursive calls are the same?
>>
>> Honza
>
> Update v3 patch.  This regression could only be reproduced when built with
> "-fprofile-generate/-fprofile-use --param ipa-cp-eval-threshold=0 --param
> ipa-cp-unit-growth=80" on exchange_2, on some platforms -fno-inline may be
> also needed, I attached 3 files (compressed to exchange.tar.gz)
> exchange2_gcc.use.orig.wpa.071i.cp.old, exchange2_gcc.use.orig.wpa.071i.cp.new
> and cp.diff to show the profile count difference of specialized node
> digits_2.constprop/152 to digits_2.constprop/159 without/with this patch.
>
> Profile count is decreasing slowly with this patch instead of keeping very
> small from the first clone (very low count as cold will cause complete unroll
> not working), it still differs from really execution (exchange2.png), but this
> average model takes the recursive edge as feedback.  Thanks.
>
>
> v3:
> 1. Enable proportion orig_sum to the new nodes for self recursive node (k
> means
>    multiple self recursive calls):
>    new_sum = (orig_sum + new_sum) * 1 / k \
>    * self_recursive_probability * (1 / param_ipa_cp_max_recursive_depth).
> 2. Two mutually recursive functions are not supported in self recursive
>    clone yet so also not supported in update_profiling_info here.
> 3. Improve value range for param_ipa_cp_max_recursive_depth to (0, 8).
>    If it calls itself two different sets, usually recursive boudary limit
>    will stop the specialize first, otherwise it is slow even without
>    recursive specialize.
>
> The performance of exchange2 built with PGO will decrease ~28% by r278808
> due to profile count set incorrectly.  The cloned nodes are updated to a
> very small count caused later pass cunroll fail to unroll the recursive
> function in exchange2.
>
> digits_2 ->
> digits_2.constprop.0, digits_2.constprop.1, etc.
>
> gcc/ChangeLog:
>
>      2020-01-14  Luo Xiong Hu  <[hidden email]>
>
>      * ipa-cp.c (update_profiling_info): Check self_scc node.
>      * params.opt (ipa-cp-max-recursive-depth): Set param range.
> ---
> gcc/ipa-cp.c   | 33 ++++++++++++++++++++++++++++++++-
> gcc/params.opt |  2 +-
> 2 files changed, 33 insertions(+), 2 deletions(-)
>
> diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c
> index 17da1d8e8a7..8b759168e24 100644
> --- a/gcc/ipa-cp.c
> +++ b/gcc/ipa-cp.c
> @@ -2041,7 +2041,7 @@ propagate_vals_across_arith_jfunc (cgraph_edge *cs,
>        /* Recursively generate lattice values with a limited count.  */
>        FOR_EACH_VEC_ELT (val_seeds, i, src_val)
>      {
> -      for (int j = 1; j < max_recursive_depth; j++)
> +      for (int j = 0; j < max_recursive_depth; j++)
>          {
>            tree cstval = get_val_across_arith_op (opcode, opnd1_type, opnd2,
>                               src_val, res_type);
> @@ -4313,6 +4313,37 @@ update_profiling_info (struct cgraph_node *orig_node,
>                            false);
>    new_sum = stats.count_sum;
>
> +  class ipa_node_params *info = IPA_NODE_REF (orig_node);
> +  if (info && info->node_is_self_scc)
> +    {
> +      profile_count self_recursive_count = profile_count::zero ();
> +      unsigned k = 0;
> +
> +      /* The self recursive edge is on the orig_node.  */
> +      for (cs = orig_node->callees; cs; cs = cs->next_callee)
> +    if (ipa_edge_within_scc (cs))
> +      {
> +        cgraph_node *callee = cs->callee->function_symbol ();
> +        if (callee && cs->caller == cs->callee)
> +          {
> +        self_recursive_count += cs->count;
> +        k++;
> +          }
> +      }
> +
> +      /* Proportion count for multiple self recursive node.  */
> +      if (k)
> +    self_recursive_count.apply_scale (1, k);
> +
> +      /* Proportion count for self recursive node from all callers.  */
> +      new_sum
> +    = (orig_sum + new_sum).apply_scale (self_recursive_count, orig_sum);
> +
> +      /* Proportion count for specialized cloned node.  */
> +      if (param_ipa_cp_max_recursive_depth)
> +    new_sum = new_sum.apply_scale (1, param_ipa_cp_max_recursive_depth);
> +    }
> +
>    if (orig_node_count < orig_sum + new_sum)
>      {
>        if (dump_file)
> diff --git a/gcc/params.opt b/gcc/params.opt
> index 31cc20031b1..9eceee5871f 100644
> --- a/gcc/params.opt
> +++ b/gcc/params.opt
> @@ -199,7 +199,7 @@ Common Joined UInteger
> Var(param_ipa_cp_loop_hint_bonus) Init(64) Param Optimiza
> Compile-time bonus IPA-CP assigns to candidates which make loop bounds or
> strides known.
>
> -param=ipa-cp-max-recursive-depth=
> -Common Joined UInteger Var(param_ipa_cp_max_recursive_depth) Init(8) Param
> Optimization
> +Common Joined UInteger Var(param_ipa_cp_max_recursive_depth) Init(7)
> IntegerRange(0, 8) Param Optimization
> Maximum depth of recursive cloning for self-recursive function.
>
> -param=ipa-cp-min-recursive-probability=