[RFC][GCC][AArch64] Add minmax phi-reduction pattern

classic Classic list List threaded Threaded
5 messages Options
Reply | Threaded
Open this post in threaded view
|

[RFC][GCC][AArch64] Add minmax phi-reduction pattern

Joel Hutton
Hi all,

Just looking for some feedback on the approach.

Currently the loop vectorizer can't vectorize the following typical loop
for getting max value and index from an array:

void test_vec(int *data, int n) {
         int best_i, best = 0;

         for (int i = 0; i < n; i++) {
                 if (data[i] > best) {
                         best = data[i];
                         best_i = i;
                 }
         }

         data[best_i] = data[0];
         data[0] = best;
}

This patch adds some support for this pattern.

This patch addresses Bug 88259.

Regression testing is still in progress,
This patch does not work correctly with vect-epilogues-nomask, and the
reason for that is still being investigated.

gcc/ChangeLog:


2019-11-15  Joel Hutton  <[hidden email]>
         Tamar Christina  <[hidden email]>

     PR tree-optimization/88259
     * tree-vect-loop.c (vect_reassociating_reduction_simple_p): New
function.
     (vect_recog_minmax_index_pattern): New function.
     (vect_is_simple_reduction): Add check for minmax pattern.
     (vect_model_reduction_cost): Add case for minmax pattern.
     (vect_create_epilog_for_reduction): Add fixup for minmax epilog.
     * tree-vectorizer.h (enum vect_reduction_type): Add
INDEX_MINMAX_REDUCTION reduction type.

rb12291.patch (12K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: [RFC][GCC][AArch64] Add minmax phi-reduction pattern

Joel Hutton
Forgot to CC maintainer.
On 15/11/2019 18:03, Joel wrote:

> Hi all,
>
> Just looking for some feedback on the approach.
>
> Currently the loop vectorizer can't vectorize the following typical
> loop for getting max value and index from an array:
>
> void test_vec(int *data, int n) {
>         int best_i, best = 0;
>
>         for (int i = 0; i < n; i++) {
>                 if (data[i] > best) {
>                         best = data[i];
>                         best_i = i;
>                 }
>         }
>
>         data[best_i] = data[0];
>         data[0] = best;
> }
>
> This patch adds some support for this pattern.
>
> This patch addresses Bug 88259.
>
> Regression testing is still in progress,
> This patch does not work correctly with vect-epilogues-nomask, and the
> reason for that is still being investigated.
>
> gcc/ChangeLog:
>
>
> 2019-11-15  Joel Hutton  <[hidden email]>
>         Tamar Christina  <[hidden email]>
>
>     PR tree-optimization/88259
>     * tree-vect-loop.c (vect_reassociating_reduction_simple_p): New
> function.
>     (vect_recog_minmax_index_pattern): New function.
>     (vect_is_simple_reduction): Add check for minmax pattern.
>     (vect_model_reduction_cost): Add case for minmax pattern.
>     (vect_create_epilog_for_reduction): Add fixup for minmax epilog.
>     * tree-vectorizer.h (enum vect_reduction_type): Add
> INDEX_MINMAX_REDUCTION reduction type.
Reply | Threaded
Open this post in threaded view
|

Re: [RFC][GCC][AArch64] Add minmax phi-reduction pattern

Richard Biener
On Fri, 15 Nov 2019, Joel Hutton wrote:

> Forgot to CC maintainer.
> On 15/11/2019 18:03, Joel wrote:
> > Hi all,
> >
> > Just looking for some feedback on the approach.
> >
> > Currently the loop vectorizer can't vectorize the following typical
> > loop for getting max value and index from an array:
> >
> > void test_vec(int *data, int n) {
> >         int best_i, best = 0;
> >
> >         for (int i = 0; i < n; i++) {
> >                 if (data[i] > best) {
> >                         best = data[i];
> >                         best_i = i;
> >                 }
> >         }
> >
> >         data[best_i] = data[0];
> >         data[0] = best;
> > }
> >
> > This patch adds some support for this pattern.
> >
> > This patch addresses Bug 88259.
> >
> > Regression testing is still in progress,
> > This patch does not work correctly with vect-epilogues-nomask, and the
> > reason for that is still being investigated.
Quick posting before stage1 ends, heh?

New functions lack comments, new fields in stmt_info as well,
accesses should go through (missing) STMT_VINFO_* macros.

You are removing a safety check without replacement elsewhere:

-      /* Check there's only a single stmt the op is used on inside
-         of the loop.  */
-      imm_use_iterator imm_iter;
-      gimple *op_use_stmt;
-      unsigned cnt = 0;
-      FOR_EACH_IMM_USE_STMT (op_use_stmt, imm_iter, op)
-       if (!is_gimple_debug (op_use_stmt)
-           && flow_bb_inside_loop_p (loop, gimple_bb (op_use_stmt)))
-         cnt++;
-      if (cnt != 1)
-       {
-         fail = true;
-         break;
-       }

AFAICS you still analyze both PHIs but the correct thing to do
here is to view both SSA cycles as a single one - when analyzing

  # best_i_25 = PHI <best_i_11(8), best_i_16(D)(18)>
  # best_26 = PHI <best_13(8), 0(18)>
...
  best_i_11 = _4 <= best_26 ? best_i_25 : i_27;
  best_13 = MAX_EXPR <_4, best_26>;

and starting from the best_i_25 PHI nothing prevents check_reduction_path
SCC finding to walk to the best_26 cycle but luck(?).

And the sanity check should be that all uses are within the
detected cycle (which then would be the case).

So your handling looks like an afterthought - exactly going backwards
of my attempts to make the code better structured :/

The code-gen you do in vect_create_epilog_for_reduction needs a
comment - vect_create_epilog_for_reduction will be called
twice in unspecified order, so clearly unconditionally
accessing stmt_info->reduc_related_stmt->reduc_minmax_epilog_stmt
(if it is what it looks like...) is not going to work.  You are
also using IFN_REDUC_MAX which possibly isn't available.

So I think what you want to do is "detect" the SCC with two PHIs,
then - maybe in tree-vect-patterns replace it with a single PHI
and a single operation, or somehow mangle it into a SLP tree
to make it a single reduction.

So please see to all this for next stage1.

Thanks,
Richard.

> > gcc/ChangeLog:
> >
> >
> > 2019-11-15  Joel Hutton  <[hidden email]>
> >         Tamar Christina  <[hidden email]>
> >
> >     PR tree-optimization/88259
> >     * tree-vect-loop.c (vect_reassociating_reduction_simple_p): New
> > function.
> >     (vect_recog_minmax_index_pattern): New function.
> >     (vect_is_simple_reduction): Add check for minmax pattern.
> >     (vect_model_reduction_cost): Add case for minmax pattern.
> >     (vect_create_epilog_for_reduction): Add fixup for minmax epilog.
> >     * tree-vectorizer.h (enum vect_reduction_type): Add
> > INDEX_MINMAX_REDUCTION reduction type.
Reply | Threaded
Open this post in threaded view
|

RE: [RFC][GCC][AArch64] Add minmax phi-reduction pattern

Tamar Christina
Hi Richi,

Thanks for the feedback, if you wouldn't mind indulging me quickly (for the version for next stage-1)

The original approach we had was indeed using a vec-pattern which turned

best_i_11 = _4 > best_26 ? i_27 : best_i_25;
best_13 = MAX_EXPR <_4, best_26>;

into

best_13 = MAX_EXPR <_4, best_26>;
best_i_11 = _4 == best_13 ? best_i_25 : i_27;

which was:

static gimple *
vect_recog_minmax_index_pattern (stmt_vec_info stmt_vinfo, tree *type_out)
{
  tree oprnd0, oprnd1;
  gimple *last_stmt = stmt_vinfo->stmt;
  vec_info *vinfo = stmt_vinfo->vinfo;
  tree type;
  gimple *use_stmt;
  imm_use_iterator iter;
  gimple_stmt_iterator gsi;

  /* Starting from LAST_STMT, follow the defs of its uses in search
     of the above pattern.  */

  if (!vect_reassociating_reduction_simple_p (stmt_vinfo, MAX_EXPR,
                                              &oprnd0, &oprnd1))
    return NULL;

  type = gimple_expr_type (last_stmt);

  if (!is_a <gphi *> (SSA_NAME_DEF_STMT (oprnd1)))
    return NULL;

  stmt_vec_info phy_vinfo = vinfo->lookup_def (oprnd1);
  if (!phy_vinfo)
    return NULL;

  basic_block top_bb = gimple_bb (last_stmt);
  bool found_p = false;

  FOR_EACH_IMM_USE_STMT (use_stmt, iter, oprnd1)
    {
      if (is_gimple_assign (use_stmt)
          && gimple_assign_rhs_code (use_stmt) == COND_EXPR)
        {
          basic_block bb = gimple_bb (use_stmt);

          if (bb == top_bb
              && gimple_uid (use_stmt) < gimple_uid (last_stmt))
            {
              tree cond = gimple_assign_rhs1 (use_stmt);
              if (TREE_CODE (cond) != GT_EXPR)
                continue;

              tree true_b = gimple_assign_rhs2 (use_stmt);
              tree false_b = gimple_assign_rhs3 (use_stmt);
              TREE_SET_CODE (cond, NE_EXPR);
              TREE_OPERAND (cond, 1) = gimple_assign_lhs (last_stmt);
              gimple_set_op (use_stmt, 3, false_b);
              gimple_set_op (use_stmt, 2, true_b);

              gimple_seq *def_seq = &STMT_VINFO_PATTERN_DEF_SEQ (stmt_vinfo);
              gsi = gsi_for_stmt (use_stmt, def_seq);
              gsi_remove (&gsi, false);
              gsi = gsi_for_stmt (last_stmt, def_seq);
              gimple_set_bb (use_stmt, bb);
              gsi_insert_after (&gsi, use_stmt, GSI_SAME_STMT);

              vect_pattern_detected ("vect_recog_minmax_index_pattern",
                                     last_stmt);
              found_p = true;
            }
        }
    }

  if (found_p)
    {
          STMT_VINFO_DEF_TYPE (phy_vinfo) = vect_min_max_reduction_def;
    }

  return NULL;
}

So it's rewriting the statement into one where the recursive phi node is only used once
And marking the reduction as a vect_min_max_reduction_def to handle it correctly later on.

There was two things that bugged be about it though. Returning NULL from the vect-pattern felt odd.
We could return the last statement here as a the pattern, but since we also have to re-order the statements in the BB it seemed better not to treat it as a simple pattern.
This made me think it really didn't belong here and that it's not really a vectorizer pattern, but rather a work around.

Maybe this should be done in ifconv? Where the conversions are created in the first place?

Also doing this seemed to require a lot of generic boiler plate changes to support vect_min_max_reduction_def, which also
seemed a bit messy.

Any thoughts here?

On a side note, this code in vect_create_epilog_for_reduction

          for (elt_offset = nelements / 2;
               elt_offset >= 1;
               elt_offset /= 2)
            {
              calc_vec_perm_mask_for_shift (elt_offset, nelements, &sel);
              indices.new_vector (sel, 2, nelements);
              tree mask = vect_gen_perm_mask_any (vectype1, indices);
              new_name = gimple_build (&stmts, VEC_PERM_EXPR, vectype1,
                                       new_temp, zero_vec, mask);
              new_temp = gimple_build (&stmts, code,
                                       vectype1, new_name, new_temp);
            }
           gsi_insert_seq_before (&exit_gsi, stmts, GSI_SAME_STMT);

           .....

          rhs = build3 (BIT_FIELD_REF, scalar_type, new_temp,
                        bitsize, bitsize_zero_node);
               epilog_stmt = gimple_build_assign (new_scalar_dest, rhs);

Seems like a very roundabout way of doing a reduction.. unless I have misunderstood something?
If "code" is PLUS, MIN, MAX etc wouldn't a reduction call here be better if the target supports it?  This is quite hard to patch up later in combine particularly for byte of hf modes where the number of statements you'd need to match exceed combine's maximum.

Thanks,
Tamar

> -----Original Message-----
> From: Richard Biener <[hidden email]>
> Sent: Monday, November 18, 2019 11:24
> To: Joel Hutton <[hidden email]>
> Cc: GCC Patches <[hidden email]>; Tamar Christina
> <[hidden email]>; nd <[hidden email]>
> Subject: Re: [RFC][GCC][AArch64] Add minmax phi-reduction pattern
>
> On Fri, 15 Nov 2019, Joel Hutton wrote:
>
> > Forgot to CC maintainer.
> > On 15/11/2019 18:03, Joel wrote:
> > > Hi all,
> > >
> > > Just looking for some feedback on the approach.
> > >
> > > Currently the loop vectorizer can't vectorize the following typical
> > > loop for getting max value and index from an array:
> > >
> > > void test_vec(int *data, int n) {
> > >         int best_i, best = 0;
> > >
> > >         for (int i = 0; i < n; i++) {
> > >                 if (data[i] > best) {
> > >                         best = data[i];
> > >                         best_i = i;
> > >                 }
> > >         }
> > >
> > >         data[best_i] = data[0];
> > >         data[0] = best;
> > > }
> > >
> > > This patch adds some support for this pattern.
> > >
> > > This patch addresses Bug 88259.
> > >
> > > Regression testing is still in progress, This patch does not work
> > > correctly with vect-epilogues-nomask, and the reason for that is
> > > still being investigated.
>
> Quick posting before stage1 ends, heh?
>
> New functions lack comments, new fields in stmt_info as well, accesses
> should go through (missing) STMT_VINFO_* macros.
>
> You are removing a safety check without replacement elsewhere:
>
> -      /* Check there's only a single stmt the op is used on inside
> -         of the loop.  */
> -      imm_use_iterator imm_iter;
> -      gimple *op_use_stmt;
> -      unsigned cnt = 0;
> -      FOR_EACH_IMM_USE_STMT (op_use_stmt, imm_iter, op)
> -       if (!is_gimple_debug (op_use_stmt)
> -           && flow_bb_inside_loop_p (loop, gimple_bb (op_use_stmt)))
> -         cnt++;
> -      if (cnt != 1)
> -       {
> -         fail = true;
> -         break;
> -       }
>
> AFAICS you still analyze both PHIs but the correct thing to do here is to view
> both SSA cycles as a single one - when analyzing
>
>   # best_i_25 = PHI <best_i_11(8), best_i_16(D)(18)>
>   # best_26 = PHI <best_13(8), 0(18)>
> ...
>   best_i_11 = _4 <= best_26 ? best_i_25 : i_27;
>   best_13 = MAX_EXPR <_4, best_26>;
>
> and starting from the best_i_25 PHI nothing prevents check_reduction_path
> SCC finding to walk to the best_26 cycle but luck(?).
>
> And the sanity check should be that all uses are within the detected cycle
> (which then would be the case).
>
> So your handling looks like an afterthought - exactly going backwards of my
> attempts to make the code better structured :/
>
> The code-gen you do in vect_create_epilog_for_reduction needs a comment
> - vect_create_epilog_for_reduction will be called twice in unspecified order,
> so clearly unconditionally accessing stmt_info->reduc_related_stmt-
> >reduc_minmax_epilog_stmt
> (if it is what it looks like...) is not going to work.  You are also using
> IFN_REDUC_MAX which possibly isn't available.
>
> So I think what you want to do is "detect" the SCC with two PHIs, then -
> maybe in tree-vect-patterns replace it with a single PHI and a single
> operation, or somehow mangle it into a SLP tree to make it a single reduction.
>
> So please see to all this for next stage1.
>
> Thanks,
> Richard.
>
> > > gcc/ChangeLog:
> > >
> > >
> > > 2019-11-15  Joel Hutton  <[hidden email]>
> > >         Tamar Christina  <[hidden email]>
> > >
> > >     PR tree-optimization/88259
> > >     * tree-vect-loop.c (vect_reassociating_reduction_simple_p): New
> > > function.
> > >     (vect_recog_minmax_index_pattern): New function.
> > >     (vect_is_simple_reduction): Add check for minmax pattern.
> > >     (vect_model_reduction_cost): Add case for minmax pattern.
> > >     (vect_create_epilog_for_reduction): Add fixup for minmax epilog.
> > >     * tree-vectorizer.h (enum vect_reduction_type): Add
> > > INDEX_MINMAX_REDUCTION reduction type.
Reply | Threaded
Open this post in threaded view
|

RE: [RFC][GCC][AArch64] Add minmax phi-reduction pattern

Richard Biener
On Mon, 18 Nov 2019, Tamar Christina wrote:

> Hi Richi,
>
> Thanks for the feedback, if you wouldn't mind indulging me quickly (for the version for next stage-1)
>
> The original approach we had was indeed using a vec-pattern which turned
>
> best_i_11 = _4 > best_26 ? i_27 : best_i_25;
> best_13 = MAX_EXPR <_4, best_26>;
>
> into
>
> best_13 = MAX_EXPR <_4, best_26>;
> best_i_11 = _4 == best_13 ? best_i_25 : i_27;
Ah, interesting way to rewrite this indeed (didn't think of it).  I
think for FP compares this might be a bit iffy though.

> which was:
>
> static gimple *
> vect_recog_minmax_index_pattern (stmt_vec_info stmt_vinfo, tree *type_out)
> {
>   tree oprnd0, oprnd1;
>   gimple *last_stmt = stmt_vinfo->stmt;
>   vec_info *vinfo = stmt_vinfo->vinfo;
>   tree type;
>   gimple *use_stmt;
>   imm_use_iterator iter;
>   gimple_stmt_iterator gsi;
>
>   /* Starting from LAST_STMT, follow the defs of its uses in search
>      of the above pattern.  */
>
>   if (!vect_reassociating_reduction_simple_p (stmt_vinfo, MAX_EXPR,
>      &oprnd0, &oprnd1))
>     return NULL;
>
>   type = gimple_expr_type (last_stmt);
>
>   if (!is_a <gphi *> (SSA_NAME_DEF_STMT (oprnd1)))
>     return NULL;
>
>   stmt_vec_info phy_vinfo = vinfo->lookup_def (oprnd1);
>   if (!phy_vinfo)
>     return NULL;
>
>   basic_block top_bb = gimple_bb (last_stmt);
>   bool found_p = false;
>
>   FOR_EACH_IMM_USE_STMT (use_stmt, iter, oprnd1)
>     {
>       if (is_gimple_assign (use_stmt)
>  && gimple_assign_rhs_code (use_stmt) == COND_EXPR)
> {
>  basic_block bb = gimple_bb (use_stmt);
>
>  if (bb == top_bb
>      && gimple_uid (use_stmt) < gimple_uid (last_stmt))
>    {
>      tree cond = gimple_assign_rhs1 (use_stmt);
>      if (TREE_CODE (cond) != GT_EXPR)
> continue;
>
>      tree true_b = gimple_assign_rhs2 (use_stmt);
>      tree false_b = gimple_assign_rhs3 (use_stmt);
>      TREE_SET_CODE (cond, NE_EXPR);
>      TREE_OPERAND (cond, 1) = gimple_assign_lhs (last_stmt);
>      gimple_set_op (use_stmt, 3, false_b);
>      gimple_set_op (use_stmt, 2, true_b);
>
>      gimple_seq *def_seq = &STMT_VINFO_PATTERN_DEF_SEQ (stmt_vinfo);
>      gsi = gsi_for_stmt (use_stmt, def_seq);
>      gsi_remove (&gsi, false);
>      gsi = gsi_for_stmt (last_stmt, def_seq);
>      gimple_set_bb (use_stmt, bb);
>      gsi_insert_after (&gsi, use_stmt, GSI_SAME_STMT);
>
>      vect_pattern_detected ("vect_recog_minmax_index_pattern",
>     last_stmt);
>      found_p = true;
>    }
> }
>     }
>
>   if (found_p)
>     {
>  STMT_VINFO_DEF_TYPE (phy_vinfo) = vect_min_max_reduction_def;
>     }
>
>   return NULL;
> }
>
> So it's rewriting the statement into one where the recursive phi node is
> only used once And marking the reduction as a vect_min_max_reduction_def
> to handle it correctly later on.
The main issue you likely run into is that cycle detection runs before
pattern detection and cycle detection already fails.

> There was two things that bugged be about it though. Returning NULL from the vect-pattern felt odd.
> We could return the last statement here as a the pattern, but since we also have to re-order the statements in the BB it seemed better not to treat it as a simple pattern.
> This made me think it really didn't belong here and that it's not really a vectorizer pattern, but rather a work around.
>
> Maybe this should be done in ifconv? Where the conversions are created
> in the first place?

That's also a possibility though the IL is valid also before if-conversion
so if-conversion might end up doing nothing.

> Also doing this seemed to require a lot of generic boiler plate changes
> to support vect_min_max_reduction_def, which also seemed a bit messy.

Well, that I'd simply defer to vectorizable_reduction which walks
the discovered cycle and computes STMT_VINFO_REDUC_TYPE
(COND_REDUCTION, etc.).

> Any thoughts here?
>
> On a side note, this code in vect_create_epilog_for_reduction
>
>           for (elt_offset = nelements / 2;
>                elt_offset >= 1;
>                elt_offset /= 2)
>             {
>      calc_vec_perm_mask_for_shift (elt_offset, nelements, &sel);
>      indices.new_vector (sel, 2, nelements);
>      tree mask = vect_gen_perm_mask_any (vectype1, indices);
>      new_name = gimple_build (&stmts, VEC_PERM_EXPR, vectype1,
>       new_temp, zero_vec, mask);
>      new_temp = gimple_build (&stmts, code,
>       vectype1, new_name, new_temp);
>             }
>            gsi_insert_seq_before (&exit_gsi, stmts, GSI_SAME_STMT);
>
>            .....
>
>  rhs = build3 (BIT_FIELD_REF, scalar_type, new_temp,
> bitsize, bitsize_zero_node);
>                epilog_stmt = gimple_build_assign (new_scalar_dest, rhs);
>
> Seems like a very roundabout way of doing a reduction.. unless I have misunderstood something?
> If "code" is PLUS, MIN, MAX etc wouldn't a reduction call here be better if the target supports it?  This is quite hard to patch up later in combine particularly for byte of hf modes where the number of statements you'd need to match exceed combine's maximum.
Yes, I think that's already done - vectorizable_reduction checks for
such supported epilogue operation.

So I think the main issue right now is that vect_is_simple_reduction
needs to identify the beast as a single reduction.

I think it's kind-of a COND_REDUCTION where the induction PHI is already
in the original source.

Richard.

> Thanks,
> Tamar
>
> > -----Original Message-----
> > From: Richard Biener <[hidden email]>
> > Sent: Monday, November 18, 2019 11:24
> > To: Joel Hutton <[hidden email]>
> > Cc: GCC Patches <[hidden email]>; Tamar Christina
> > <[hidden email]>; nd <[hidden email]>
> > Subject: Re: [RFC][GCC][AArch64] Add minmax phi-reduction pattern
> >
> > On Fri, 15 Nov 2019, Joel Hutton wrote:
> >
> > > Forgot to CC maintainer.
> > > On 15/11/2019 18:03, Joel wrote:
> > > > Hi all,
> > > >
> > > > Just looking for some feedback on the approach.
> > > >
> > > > Currently the loop vectorizer can't vectorize the following typical
> > > > loop for getting max value and index from an array:
> > > >
> > > > void test_vec(int *data, int n) {
> > > >         int best_i, best = 0;
> > > >
> > > >         for (int i = 0; i < n; i++) {
> > > >                 if (data[i] > best) {
> > > >                         best = data[i];
> > > >                         best_i = i;
> > > >                 }
> > > >         }
> > > >
> > > >         data[best_i] = data[0];
> > > >         data[0] = best;
> > > > }
> > > >
> > > > This patch adds some support for this pattern.
> > > >
> > > > This patch addresses Bug 88259.
> > > >
> > > > Regression testing is still in progress, This patch does not work
> > > > correctly with vect-epilogues-nomask, and the reason for that is
> > > > still being investigated.
> >
> > Quick posting before stage1 ends, heh?
> >
> > New functions lack comments, new fields in stmt_info as well, accesses
> > should go through (missing) STMT_VINFO_* macros.
> >
> > You are removing a safety check without replacement elsewhere:
> >
> > -      /* Check there's only a single stmt the op is used on inside
> > -         of the loop.  */
> > -      imm_use_iterator imm_iter;
> > -      gimple *op_use_stmt;
> > -      unsigned cnt = 0;
> > -      FOR_EACH_IMM_USE_STMT (op_use_stmt, imm_iter, op)
> > -       if (!is_gimple_debug (op_use_stmt)
> > -           && flow_bb_inside_loop_p (loop, gimple_bb (op_use_stmt)))
> > -         cnt++;
> > -      if (cnt != 1)
> > -       {
> > -         fail = true;
> > -         break;
> > -       }
> >
> > AFAICS you still analyze both PHIs but the correct thing to do here is to view
> > both SSA cycles as a single one - when analyzing
> >
> >   # best_i_25 = PHI <best_i_11(8), best_i_16(D)(18)>
> >   # best_26 = PHI <best_13(8), 0(18)>
> > ...
> >   best_i_11 = _4 <= best_26 ? best_i_25 : i_27;
> >   best_13 = MAX_EXPR <_4, best_26>;
> >
> > and starting from the best_i_25 PHI nothing prevents check_reduction_path
> > SCC finding to walk to the best_26 cycle but luck(?).
> >
> > And the sanity check should be that all uses are within the detected cycle
> > (which then would be the case).
> >
> > So your handling looks like an afterthought - exactly going backwards of my
> > attempts to make the code better structured :/
> >
> > The code-gen you do in vect_create_epilog_for_reduction needs a comment
> > - vect_create_epilog_for_reduction will be called twice in unspecified order,
> > so clearly unconditionally accessing stmt_info->reduc_related_stmt-
> > >reduc_minmax_epilog_stmt
> > (if it is what it looks like...) is not going to work.  You are also using
> > IFN_REDUC_MAX which possibly isn't available.
> >
> > So I think what you want to do is "detect" the SCC with two PHIs, then -
> > maybe in tree-vect-patterns replace it with a single PHI and a single
> > operation, or somehow mangle it into a SLP tree to make it a single reduction.
> >
> > So please see to all this for next stage1.
> >
> > Thanks,
> > Richard.
> >
> > > > gcc/ChangeLog:
> > > >
> > > >
> > > > 2019-11-15  Joel Hutton  <[hidden email]>
> > > >         Tamar Christina  <[hidden email]>
> > > >
> > > >     PR tree-optimization/88259
> > > >     * tree-vect-loop.c (vect_reassociating_reduction_simple_p): New
> > > > function.
> > > >     (vect_recog_minmax_index_pattern): New function.
> > > >     (vect_is_simple_reduction): Add check for minmax pattern.
> > > >     (vect_model_reduction_cost): Add case for minmax pattern.
> > > >     (vect_create_epilog_for_reduction): Add fixup for minmax epilog.
> > > >     * tree-vectorizer.h (enum vect_reduction_type): Add
> > > > INDEX_MINMAX_REDUCTION reduction type.
>
--
Richard Biener <[hidden email]>
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)