[Bug rtl-optimization/57193] New: suboptimal register allocation for SSE registers


ian at gcc dot gnu.org

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57193

             Bug #: 57193
           Summary: suboptimal register allocation for SSE registers
    Classification: Unclassified
           Product: gcc
           Version: 4.9.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
        AssignedTo: [hidden email]
        ReportedBy: [hidden email]

This bug _might_ be related to PR56339, although that report talks about a
regression compared to 4.7, while this bug seems to be a regression compared to
4.4.

I was converting some hand-written asm code to SSE-intrinsics, but
unfortunately the version using intrinsics generates worse code. It contains
two unnecessary 'movdqa' instructions.

I managed to reduce my test to this routine:

//--------------------------------------------------------------
#include <emmintrin.h>

void test1(const __m128i* in1, const __m128i* in2, __m128i* out,
           __m128i f, __m128i zero)
{
    __m128i c = _mm_avg_epu8(*in1, *in2);   // rounded byte-wise average
    __m128i l = _mm_unpacklo_epi8(c, zero); // low 8 bytes interleaved with zero
    __m128i h = _mm_unpackhi_epi8(c, zero); // high 8 bytes interleaved with zero
    __m128i m = _mm_mulhi_epu16(l, f);      // high 16 bits of l * f
    __m128i n = _mm_mulhi_epu16(h, f);      // high 16 bits of h * f
    *out = _mm_packus_epi16(m, n);          // pack to bytes, unsigned saturation
}
//--------------------------------------------------------------



A (few days old) gcc snapshot generates the following code. Versions 4.5, 4.6
and 4.7 generate similar code:



   0:   66 0f 6f 17             movdqa (%rdi),%xmm2
   4:   66 0f e0 16             pavgb  (%rsi),%xmm2
   8:   66 0f 6f da             movdqa %xmm2,%xmm3
   c:   66 0f 68 d1             punpckhbw %xmm1,%xmm2
  10:   66 0f 60 d9             punpcklbw %xmm1,%xmm3
  14:   66 0f e4 d0             pmulhuw %xmm0,%xmm2
  18:   66 0f 6f cb             movdqa %xmm3,%xmm1
  1c:   66 0f e4 c8             pmulhuw %xmm0,%xmm1
  20:   66 0f 6f c1             movdqa %xmm1,%xmm0
  24:   66 0f 67 c2             packuswb %xmm2,%xmm0
  28:   66 0f 7f 02             movdqa %xmm0,(%rdx)
  2c:   c3                      retq



GCC versions 4.3 and 4.4 (and clang) generate the following optimal(?) code:

   0:   66 0f 6f 17             movdqa (%rdi),%xmm2
   4:   66 0f e0 16             pavgb  (%rsi),%xmm2
   8:   66 0f 6f da             movdqa %xmm2,%xmm3
   c:   66 0f 68 d1             punpckhbw %xmm1,%xmm2
  10:   66 0f 60 d9             punpcklbw %xmm1,%xmm3
  14:   66 0f e4 d8             pmulhuw %xmm0,%xmm3
  18:   66 0f e4 c2             pmulhuw %xmm2,%xmm0
  1c:   66 0f 67 d8             packuswb %xmm0,%xmm3
  20:   66 0f 7f 1a             movdqa %xmm3,(%rdx)
  24:   c3                      retq
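
As a reading aid for the reduced test case in this report, the routine can be
modelled one byte lane at a time in portable C. This is only a sketch: the
names scale_lane and test1_model are hypothetical, and it assumes the 'zero'
argument really is all zeros, which the report implies but does not state.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of one byte lane of test1().
   Assumption: the 'zero' argument of test1() is all zeros. */
uint8_t scale_lane(uint8_t a, uint8_t b, uint16_t f)
{
    /* pavgb: unsigned average, rounded up. */
    uint16_t c = (uint16_t)((a + b + 1) >> 1);

    /* punpcklbw/punpckhbw with zero widen c to 16 bits; pmulhuw keeps
       the high 16 bits of the 32-bit unsigned product. */
    uint16_t m = (uint16_t)(((uint32_t)c * f) >> 16);

    /* packuswb reads its input as signed 16-bit and saturates to 0..255. */
    int16_t s = (int16_t)m;
    return (uint8_t)(s < 0 ? 0 : (s > 255 ? 255 : s));
}

/* Whole-vector model: output byte i is scaled by 16-bit lane i % 8 of f. */
void test1_model(const uint8_t in1[16], const uint8_t in2[16],
                 uint8_t out[16], const uint16_t f[8])
{
    for (size_t i = 0; i < 16; i++)
        out[i] = scale_lane(in1[i], in2[i], f[i % 8]);
}
```

Because c is at most 255 and f at most 65535, the high half of the product
never exceeds 254, so the saturation step is never triggered in this routine.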

[Bug rtl-optimization/57193] suboptimal register allocation for SSE registers

ian at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57193

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization, ra
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2013-05-07
                 CC|                            |vmakarov at gcc dot gnu.org
     Ever Confirmed|0                           |1

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> 2013-05-07 11:47:25 UTC ---
Confirmed.

[Bug rtl-optimization/57193] [4.5/4.6/4.7/4.8/4.9 Regression] suboptimal register allocation for SSE registers

ian at gcc dot gnu.org
In reply to this post by ian at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57193

H.J. Lu <hjl.tools at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |hjl.tools at gmail dot com
            Summary|suboptimal register         |[4.5/4.6/4.7/4.8/4.9
                   |allocation for SSE          |Regression] suboptimal
                   |registers                   |register allocation for SSE
                   |                            |registers

--- Comment #2 from H.J. Lu <hjl.tools at gmail dot com> 2013-05-07 18:17:11 UTC ---
It is caused by revision 156641:

http://gcc.gnu.org/ml/gcc-cvs/2010-02/msg00222.html

[Bug rtl-optimization/57193] [4.7/4.8/4.9 Regression] suboptimal register allocation for SSE registers

ian at gcc dot gnu.org
In reply to this post by ian at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57193

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
      Known to work|                            |4.4.6
   Target Milestone|---                         |4.7.4
            Summary|[4.5/4.6/4.7/4.8/4.9        |[4.7/4.8/4.9 Regression]
                   |Regression] suboptimal      |suboptimal register
                   |register allocation for SSE |allocation for SSE
                   |registers                   |registers
      Known to fail|                            |4.5.3, 4.6.4

[Bug rtl-optimization/57193] [4.7/4.8/4.9 Regression] suboptimal register allocation for SSE registers

ian at gcc dot gnu.org
In reply to this post by ian at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57193

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P3                          |P2
      Known to fail|                            |4.9.0

--- Comment #3 from Richard Biener <rguenth at gcc dot gnu.org> ---
Re-confirmed on trunk.

[Bug rtl-optimization/57193] [4.7/4.8/4.9 Regression] suboptimal register allocation for SSE registers

ian at gcc dot gnu.org
In reply to this post by ian at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57193

Richard Henderson <rth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|2013-05-07 00:00:00         |2014-2-12
                 CC|                            |rth at gcc dot gnu.org

--- Comment #4 from Richard Henderson <rth at gcc dot gnu.org> ---
It seems like incomplete reload inheritance:

(insn 19 16 21 2 (set (reg:V8HI 107)
  (truncate:V8HI
    (lshiftrt:V8SI
      (mult:V8SI (zero_extend:V8SI (subreg:V8HI (reg:V16QI 105) 0))
                 (zero_extend:V8SI (subreg:V8HI (reg/v:V2DI 101 [ f ]) 0)))
      (const_int 16 [0x10]))))
  include/emmintrin.h:1362 2134 {*umulv8hi3_highpart}
  (expr_list:REG_DEAD (reg:V16QI 105) (nil)))

      Creating newreg=111 from oldreg=107, assigning class SSE_REGS to r111
   19: r111:V8HI=trunc(zero_extend(r111:V8HI)*zero_extend(r101:V2DI#0) 0>>0x10)
      REG_DEAD r105:V16QI
    Inserting insn reload before:
   31: r111:V8HI=r105:V16QI#0
    Inserting insn reload after:
   32: r107:V8HI=r111:V8HI

The new register r111 does wind up inheriting from r107, but not
transitively to r105.  Thus we wind up leaving the copy insn 31.

[Bug rtl-optimization/57193] [4.7/4.8/4.9/4.10 Regression] suboptimal register allocation for SSE registers

ian at gcc dot gnu.org
In reply to this post by ian at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57193

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|4.7.4                       |4.8.4

--- Comment #5 from Richard Biener <rguenth at gcc dot gnu.org> ---
The 4.7 branch is being closed, moving target milestone to 4.8.4.

[Bug rtl-optimization/57193] [4.8/4.9/5 Regression] suboptimal register allocation for SSE registers

ian at gcc dot gnu.org
In reply to this post by ian at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57193

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|4.8.4                       |4.8.5

--- Comment #6 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
GCC 4.8.4 has been released.

[Bug rtl-optimization/57193] [4.8/4.9/5/6 Regression] suboptimal register allocation for SSE registers

ian at gcc dot gnu.org
In reply to this post by ian at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57193

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|4.8.5                       |4.9.3

--- Comment #7 from Richard Biener <rguenth at gcc dot gnu.org> ---
The gcc-4_8-branch is being closed, re-targeting regressions to 4.9.3.

[Bug rtl-optimization/57193] [4.9/5/6 Regression] suboptimal register allocation for SSE registers

ian at gcc dot gnu.org
In reply to this post by ian at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57193

--- Comment #8 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
GCC 4.9.3 has been released.

[Bug rtl-optimization/57193] [4.9/5/6 Regression] suboptimal register allocation for SSE registers

ian at gcc dot gnu.org
In reply to this post by ian at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57193

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|4.9.3                       |4.9.4

[Bug rtl-optimization/57193] [4.9/5/6 Regression] suboptimal register allocation for SSE registers

ian at gcc dot gnu.org
In reply to this post by ian at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57193

Bernd Schmidt <bernds at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |bernds at gcc dot gnu.org

--- Comment #9 from Bernd Schmidt <bernds at gcc dot gnu.org> ---
It looks like the situation is as follows (X is the LRA-created reload reg):

X = a
op on X
b = X

where a and b are different registers already allocated by IRA, hence we can
avoid one copy at most. I'm not very familiar with LRA yet, but I see no code
to rethink such register allocation choices.

-frename-registers gets rid of one unnecessary copy; it was enhanced to detect
such situations for gcc-6. Maybe we should finally enable that for -O2 and
higher?
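
The "X = a; op on X; b = X" pattern in this comment can be illustrated with a
toy model. The sketch below is not LRA code: copies_needed and
best_case_copies are hypothetical helpers that treat hard registers as plain
integers and count how many register-to-register moves survive once the
assignments are fixed. It also assumes a is dead after the op, so that X may
reuse a's register.

```c
/* Toy model, not GCC code. Hard registers are plain integers.

   Moves left for "X = a; op on X; b = X" once hard registers are fixed.
   A move disappears only when both of its sides share a hard register.
   Assumption: a is dead after the op, so X may legally reuse a's register. */
int copies_needed(int hard_a, int hard_b, int hard_x)
{
    return (hard_x != hard_a) + (hard_x != hard_b);
}

/* Best result when a and b were already fixed by an earlier pass:
   X can match a or b, but it can match both only if they are equal. */
int best_case_copies(int hard_a, int hard_b)
{
    return hard_a == hard_b ? 0 : 1;
}
```

With a and b in different registers, any choice of X leaves at least one move,
which matches the "one copy at most" observation: removing both moves requires
revisiting the earlier choice of a or b.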

[Bug rtl-optimization/57193] [4.9/5/6 Regression] suboptimal register allocation for SSE registers

ian at gcc dot gnu.org
In reply to this post by ian at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57193

--- Comment #10 from Jeffrey A. Law <law at redhat dot com> ---
Look in lra-coalesce; if we have code to eliminate those copies, that's where
I'd expect to find it.

[Bug rtl-optimization/57193] [4.9/5/6/7 Regression] suboptimal register allocation for SSE registers

ian at gcc dot gnu.org
In reply to this post by ian at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57193

--- Comment #11 from Bernd Schmidt <bernds at gcc dot gnu.org> ---
Author: bernds
Date: Tue Apr 26 12:43:42 2016
New Revision: 235442

URL: https://gcc.gnu.org/viewcvs?rev=235442&root=gcc&view=rev
Log:
Enable -frename-registers at -O2.

        PR rtl-optimization/57193
        * opts.c (default_options_table): Add OPT_frename_registers at -O2
        and above.
        * doc/invoke.texi (-frename-registers, -O2): Update documentation.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/doc/invoke.texi
    trunk/gcc/opts.c

[Bug rtl-optimization/57193] [4.9/5/6/7 Regression] suboptimal register allocation for SSE registers

ian at gcc dot gnu.org
In reply to this post by ian at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57193

--- Comment #12 from Bernd Schmidt <bernds at gcc dot gnu.org> ---
Author: bernds
Date: Tue May  3 22:48:03 2016
New Revision: 235848

URL: https://gcc.gnu.org/viewcvs?rev=235848&root=gcc&view=rev
Log:
        PR rtl-optimization/57193
        * opts.c (default_options_table): Revert OPT_frename_registers change.
        * doc/invoke.texi (-frename-registers, -O2): Likewise.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/doc/invoke.texi
    trunk/gcc/opts.c

[Bug rtl-optimization/57193] [4.9/5/6/7 Regression] suboptimal register allocation for SSE registers

ian at gcc dot gnu.org
In reply to this post by ian at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57193

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|4.9.4                       |5.5

--- Comment #13 from Richard Biener <rguenth at gcc dot gnu.org> ---
GCC 4.9 branch is being closed

[Bug rtl-optimization/57193] [5/6/7/8 Regression] suboptimal register allocation for SSE registers

ian at gcc dot gnu.org
In reply to this post by ian at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57193

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|5.5                         |6.5

--- Comment #14 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
GCC 5 branch is being closed

[Bug rtl-optimization/57193] [6/7/8 Regression] suboptimal register allocation for SSE registers

ian at gcc dot gnu.org
In reply to this post by ian at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57193

Aldy Hernandez <aldyh at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|2015-06-01 00:00:00         |2018-2-3
                 CC|                            |aldyh at gcc dot gnu.org

--- Comment #15 from Aldy Hernandez <aldyh at gcc dot gnu.org> ---
Reconfirmed.

[Bug rtl-optimization/57193] [6/7/8 Regression] suboptimal register allocation for SSE registers

ian at gcc dot gnu.org
In reply to this post by ian at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57193

--- Comment #16 from Vladimir Makarov <vmakarov at gcc dot gnu.org> ---
Author: vmakarov
Date: Fri Feb  9 18:23:58 2018
New Revision: 257537

URL: https://gcc.gnu.org/viewcvs?rev=257537&root=gcc&view=rev
Log:
2018-02-09  Vladimir Makarov  <[hidden email]>

        PR rtl-optimization/57193
        * ira-color.c (struct allocno_color_data): Add member
        conflict_allocno_hard_prefs.
        (update_conflict_allocno_hard_prefs): New.
        (bucket_allocno_compare_func): Add a preference based on
        conflict_allocno_hard_prefs.
        (push_allocno_to_stack): Update conflict_allocno_hard_prefs.
        (color_allocnos): Remove a dead code.  Initiate
        conflict_allocno_hard_prefs.  Call update_costs_from_prefs.

2018-02-09  Vladimir Makarov  <[hidden email]>

        PR rtl-optimization/57193
        * gcc.target/i386/pr57193.c: New.


Added:
    trunk/gcc/testsuite/gcc.target/i386/pr57193.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/ira-color.c
    trunk/gcc/testsuite/ChangeLog
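
For readers unfamiliar with the GCC testsuite, a regression test for this kind
of problem is typically a compile-only file whose DejaGnu directives count
instructions in the generated assembly. The sketch below is NOT the committed
pr57193.c; the options and the expected count of 3 are assumptions, with the
count read off the GCC 4.4 listing in the original report (one load, one
register copy, one store).

```c
/* Hypothetical sketch of a DejaGnu-style test; not the committed file. */
/* { dg-do compile } */
/* { dg-options "-O2 -msse2" } */

#include <emmintrin.h>

/* Reduced routine from the original report. */
void test1(const __m128i *in1, const __m128i *in2, __m128i *out,
           __m128i f, __m128i zero)
{
    __m128i c = _mm_avg_epu8(*in1, *in2);
    __m128i l = _mm_unpacklo_epi8(c, zero);
    __m128i h = _mm_unpackhi_epi8(c, zero);
    __m128i m = _mm_mulhi_epu16(l, f);
    __m128i n = _mm_mulhi_epu16(h, f);
    *out = _mm_packus_epi16(m, n);
}

/* Assumed expectation, taken from the GCC 4.4 listing: 3 movdqa in total. */
/* { dg-final { scan-assembler-times "movdqa" 3 } } */
```

The directives are ordinary C comments, so the file also builds outside the
testsuite; only the testsuite harness interprets them.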

[Bug rtl-optimization/57193] [6/7 Regression] suboptimal register allocation for SSE registers

ian at gcc dot gnu.org
In reply to this post by ian at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57193

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|6.5                         |7.4

--- Comment #17 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
GCC 6 branch is being closed