[Bug target/91522] New: [10 Regression] STV is slow

classic Classic list List threaded Threaded
6 messages Options
Reply | Threaded
Open this post in threaded view
|

[Bug target/91522] New: [10 Regression] STV is slow

slyfox at inbox dot ru
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91522

            Bug ID: 91522
           Summary: [10 Regression] STV is slow
           Product: gcc
           Version: 10.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: rguenth at gcc dot gnu.org
  Target Milestone: ---

Compiling module_domain.fppized.f90 from 521.wrf with -Ofast -march=haswell
reveals

 machine dep reorg                  :  89.07 ( 95%)   0.02 ( 18%)  89.10 ( 95%)
     54 kB (  0%)
 TOTAL                              :  95.13          0.13         95.32      
 103803 kB

perf reports (checking is enabled but -fno-checking supplied):

  83.74%  f951      f951                                     [.] (anonymous
namespace)::scalar_chain::analyze_register_chain
   5.80%  f951      f951                                     [.] bitmap_bit_p  
   5.13%  f951      f951                                     [.] (anonymous
namespace)::general_scalar_chain::mark_dual_mode_def

which is exactly what I noticed when reviewing the pass functionality:

  /* ???  The following is quadratic since analyze_register_chain
     iterates over all refs to look for dual-mode regs.  Instead this
     should be done separately for all regs mentioned in the chain once.  */
Reply | Threaded
Open this post in threaded view
|

[Bug target/91522] [10 Regression] STV is slow

slyfox at inbox dot ru
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91522

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |compile-time-hog
             Target|                            |x86_64-*-* i?86-*-*
             Status|UNCONFIRMED                 |ASSIGNED
   Last reconfirmed|                            |2019-08-22
           Assignee|unassigned at gcc dot gnu.org      |rguenth at gcc dot gnu.org
   Target Milestone|---                         |10.0
     Ever confirmed|0                           |1

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
I will have a look.
Reply | Threaded
Open this post in threaded view
|

[Bug target/91522] [10 Regression] STV is slow

slyfox at inbox dot ru
In reply to this post by slyfox at inbox dot ru
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91522

--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
So in particular

  for (ref = DF_INSN_UID_DEFS (insn_uid); ref; ref = DF_REF_NEXT_LOC (ref))
    if (!HARD_REGISTER_P (DF_REF_REG (ref)))
      for (def = DF_REG_DEF_CHAIN (DF_REF_REGNO (ref));
           def;
           def = DF_REF_NEXT_REG (def))
        analyze_register_chain (candidates, ref);

looks odd since that iterates over all defs in insn_uid, then over
all defs of that register everywhere else in the function and
analyze_register_chain then iterating over the corresponding def->use
chain.  I'd say the iteration over all defs everywhere else is
not necessary and it should be simply

Index: config/i386/i386-features.c
===================================================================
--- config/i386/i386-features.c (revision 274764)
+++ config/i386/i386-features.c (working copy)
@@ -419,10 +419,7 @@ scalar_chain::add_insn (bitmap candidate
   df_ref def;
   for (ref = DF_INSN_UID_DEFS (insn_uid); ref; ref = DF_REF_NEXT_LOC (ref))
     if (!HARD_REGISTER_P (DF_REF_REG (ref)))
-      for (def = DF_REG_DEF_CHAIN (DF_REF_REGNO (ref));
-          def;
-          def = DF_REF_NEXT_REG (def))
-       analyze_register_chain (candidates, def);
+      analyze_register_chain (candidates, ref);
   for (ref = DF_INSN_UID_USES (insn_uid); ref; ref = DF_REF_NEXT_LOC (ref))
     if (!DF_REF_REG_MEM_P (ref))
       analyze_register_chain (candidates, ref);

which fixes the slowness.  I'm going to test that.
Reply | Threaded
Open this post in threaded view
|

[Bug target/91522] [10 Regression] STV is slow

slyfox at inbox dot ru
In reply to this post by slyfox at inbox dot ru
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91522

--- Comment #3 from Richard Biener <rguenth at gcc dot gnu.org> ---
Uh, all other DF_REG_REG_CHAIN uses need to be updated as well I guess, how
we convert defs and uses seems to be a slight mess :/  I'm going to try
to rewrite this part to

 for insn in insns
   for def in insn
     if def in defs_conv
       replace def in insn with subreg of new pseudo from new defs-map
       emit copy from new def to original scalar def
     else
       replace def with subreg
   for use in insn
     if use in defs_conv & ~defs
       replace use in insn with subreg of new pseudo from new defs-map
       emit copy from old scalar def to new def
     else
       replace use with subreg [of new pseudo from new defs-map]

that should also support chains where not all defs of a pseudo are part
of the chain.  Since we have ud and du chains we can use those more.
Reply | Threaded
Open this post in threaded view
|

[Bug target/91522] [10 Regression] STV is slow

slyfox at inbox dot ru
In reply to this post by slyfox at inbox dot ru
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91522

--- Comment #4 from Richard Biener <rguenth at gcc dot gnu.org> ---
Author: rguenth
Date: Mon Aug 26 10:35:59 2019
New Revision: 274926

URL: https://gcc.gnu.org/viewcvs?rev=274926&root=gcc&view=rev
Log:
2019-08-26  Richard Biener  <[hidden email]>

        PR target/91522
        PR target/91527
        * config/i386/i386-features.h (general_scalar_chain::defs_map):
        New member.
        (general_scalar_chain::replace_with_subreg): Remove.
        (general_scalar_chain::replace_with_subreg_in_insn): Likewise.
        (general_scalar_chain::convert_reg): Adjust signature.
        * config/i386/i386-features.c (scalar_chain::add_insn): Do not
        iterate over all defs of a reg.
        (general_scalar_chain::replace_with_subreg): Remove.
        (general_scalar_chain::replace_with_subreg_in_insn): Likewise.
        (general_scalar_chain::make_vector_copies): Populate defs_map,
        place copy only after defs that are used as vectors in the chain.
        (general_scalar_chain::convert_reg): Emit a copy for a specific
        def in a specific instruction.
        (general_scalar_chain::convert_op): All reg uses are converted here.
        (general_scalar_chain::convert_insn): Emit copies for scalar
        uses of defs here.  Replace uses with the copies we created.
        Replace and convert the def.  Adjust REG_DEAD notes, remove
        REG_EQUIV/EQUAL notes.
        (general_scalar_chain::convert_registers): Only handle copies
        into the chain here.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/i386/i386-features.c
    trunk/gcc/config/i386/i386-features.h
Reply | Threaded
Open this post in threaded view
|

[Bug target/91522] [10 Regression] STV is slow

slyfox at inbox dot ru
In reply to this post by slyfox at inbox dot ru
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91522

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|---                         |FIXED

--- Comment #5 from Richard Biener <rguenth at gcc dot gnu.org> ---
Fixed.