[Bug tree-optimization/54589] New: [missed-optimization] struct offset add should be folded into address calculation

classic Classic list List threaded Threaded
14 messages Options
Reply | Threaded
Open this post in threaded view
|

[Bug tree-optimization/54589] New: [missed-optimization] struct offset add should be folded into address calculation

kargl at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54589

             Bug #: 54589
           Summary: [missed-optimization] struct offset add should be
                    folded into address calculation
    Classification: Unclassified
           Product: gcc
           Version: 4.8.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
        AssignedTo: [hidden email]
        ReportedBy: [hidden email]


Hi,

I found this in 4.4 (Ubuntu 10.04), and have confirmed it's still there in

  gcc (Debian 20120820-1) 4.8.0 20120820 (experimental) [trunk revision 190537]

This code:

  #include <emmintrin.h>

  struct param {
          int a, b, c, d;
          __m128i array[256];
  };

  void func(struct param *p, unsigned char *src, int *dst)
  {
          __m128i x = p->array[*src];
          *dst = _mm_cvtsi128_si32(x);
  }

compiles with -O2 on x86-64 to this assembler:

  0000000000000000 <func>:
     0:    0f b6 06                 movzbl (%rsi),%eax
     3:    48 83 c0 01              add    $0x1,%rax
     7:    48 c1 e0 04              shl    $0x4,%rax
     b:    8b 04 07                 mov    (%rdi,%rax,1),%eax
     e:    89 02                    mov    %eax,(%rdx)
    10:    c3                       retq  

The add should be folded into the address calculation here. (The shl can't,
because it's too big.) Curiously enough, if I misalign the struct element by
removing c and d, and declaring the struct __attribute__((packed)), GCC will do
that; the mov will then be from $8(%rdi,%rax,1),%eax and there is no redundant
add.
Reply | Threaded
Open this post in threaded view
|

[Bug target/54589] struct offset add should be folded into address calculation

kargl at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54589

Richard Guenther <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Target|                            |x86_64-*-*, i?86-*-*
             Status|UNCONFIRMED                 |NEW
           Keywords|                            |missed-optimization
   Last reconfirmed|                            |2012-09-17
          Component|tree-optimization           |target
     Ever Confirmed|0                           |1
            Summary|[missed-optimization]       |struct offset add should be
                   |struct offset add should be |folded into address
                   |folded into address         |calculation
                   |calculation                 |
      Known to fail|                            |4.6.3, 4.7.2, 4.8.0

--- Comment #1 from Richard Guenther <rguenth at gcc dot gnu.org> 2012-09-17 08:49:25 UTC ---
As there is no loop involved it is RTL fwprop or combines job to eventually
do this.  I see

func:
.LFB517:
        .cfi_startproc
        movzbl  (%rsi), %eax
        addq    $1, %rax
        salq    $4, %rax
        movl    (%rdi,%rax), %eax
        movl    %eax, (%rdx)
        ret

for both 4.7 and 4.8.  i?86 is even worse.
Reply | Threaded
Open this post in threaded view
|

[Bug target/54589] struct offset add should be folded into address calculation

kargl at gcc dot gnu.org
In reply to this post by kargl at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54589

--- Comment #2 from sgunderson at bigfoot dot com 2012-09-17 09:18:16 UTC ---
FWIW, in my original code, func() is a part of a loop body (it keeps reading
values from src in a loop). It doesn't really change anything in the generated
code, though.
Reply | Threaded
Open this post in threaded view
|

[Bug target/54589] struct offset add should be folded into address calculation

kargl at gcc dot gnu.org
In reply to this post by kargl at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54589

--- Comment #3 from sgunderson at bigfoot dot com ---
Still there in GCC 7.2.1 (exact same assembler output), and in 8.0 snapshot
20171017.
Reply | Threaded
Open this post in threaded view
|

[Bug target/54589] struct offset add should be folded into address calculation

kargl at gcc dot gnu.org
In reply to this post by kargl at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54589

--- Comment #4 from UroŇ° Bizjak <ubizjak at gmail dot com> ---
clang generates for x86_64:

        movzbl  (%rsi), %eax
        shlq    $4, %rax
        movl    16(%rdi,%rax), %eax
        movl    %eax, (%rdx)
        retq

and for i?86:

        movl    8(%esp), %edx
        movl    4(%esp), %ecx
        movl    12(%esp), %eax
        movzbl  (%edx), %edx
        shll    $4, %edx
        movl    16(%ecx,%edx), %ecx
        movl    %ecx, (%eax)
        retl

for the later case (-m32), gcc generates:

        movl    8(%esp), %eax
        movzbl  (%eax), %eax
        addl    $1, %eax
        sall    $4, %eax
        addl    4(%esp), %eax
        movl    (%eax), %edx
        movl    12(%esp), %eax
        movl    %edx, (%eax)
        ret

so, two extra additions.
Reply | Threaded
Open this post in threaded view
|

[Bug target/54589] struct offset add should be folded into address calculation

kargl at gcc dot gnu.org
In reply to this post by kargl at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54589

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jakub at gcc dot gnu.org,
                   |                            |segher at gcc dot gnu.org

--- Comment #5 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Note, in other cases combine is sucessful.  E.g. consider:
struct S { int a, b, c, d; };
struct T { struct S e[16]; struct S f[1024]; } t;

int
foo (unsigned long x)
{
  return t.f[x + 5].a;
}

int
bar (struct T *x, unsigned long y)
{
  return x->f[y + 5].b;
}

On x86_64-linux with -O2, we have before combine in both cases
(idx + 21) << 4 and in foo combine says:
Trying 7, 8 -> 10:
    7: {r87:DI=r91:DI+0x15;clobber flags:CC;}
      REG_DEAD r91:DI
      REG_UNUSED flags:CC
    8: {r88:DI=r87:DI<<0x4;clobber flags:CC;}
      REG_DEAD r87:DI
      REG_UNUSED flags:CC
   10: r90:SI=[r88:DI+`t']
      REG_DEAD r88:DI
Failed to match this instruction:
(set (reg:SI 90 [ t.f[_1].a ])
    (mem:SI (plus:DI (mult:DI (reg:DI 91)
                (const_int 16 [0x10]))
            (const:DI (plus:DI (symbol_ref:DI ("t") [flags 0x2]  <var_decl
0x7f837b94cab0 t>)
                    (const_int 336 [0x150])))) [1 t.f[_1].a+0 S4 A32]))
Successfully matched this instruction:
(set (reg:DI 88)
    (ashift:DI (reg:DI 91)
        (const_int 4 [0x4])))
Successfully matched this instruction:
(set (reg:SI 90 [ t.f[_1].a ])
    (mem:SI (plus:DI (reg:DI 88)
            (const:DI (plus:DI (symbol_ref:DI ("t") [flags 0x2]  <var_decl
0x7f837b94cab0 t>)
                    (const_int 336 [0x150])))) [1 t.f[_1].a+0 S4 A32]))
and it is merged the way we want:
        salq    $4, %rdi
        movl    t+336(%rdi), %eax
while in bar:
Failed to match this instruction:
(set (reg:SI 91 [ x_4(D)->f[_1].b ])
    (mem:SI (plus:DI (plus:DI (mult:DI (reg:DI 93)
                    (const_int 16 [0x10]))
                (reg:DI 92))
            (const_int 340 [0x154])) [1 x_4(D)->f[_1].b+0 S4 A32]))
Failed to match this instruction:
(set (reg:DI 88)
    (plus:DI (ashift:DI (reg:DI 93)
            (const_int 4 [0x4]))
        (reg:DI 92)))
and it is not merged:
        addq    $21, %rsi
        salq    $4, %rsi
        movl    4(%rsi,%rdi), %eax
when it could be:
        salq    $4, %rsi
        movl    340(%rsi,%rdi), %eax

It isn't x86 specific though, e.g. on powerpc64-linux we end up with:
        addi 4,4,21
        sldi 4,4,4
        add 4,3,4
        lwa 3,4(4)
for bar, where I guess:
        sldi 4,4,4
        add 4,3,4
        lwa 3,340(4)
would work too.
Reply | Threaded
Open this post in threaded view
|

[Bug target/54589] struct offset add should be folded into address calculation

kargl at gcc dot gnu.org
In reply to this post by kargl at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54589

--- Comment #6 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
--- gcc/combine.c.jj    2018-11-21 19:57:26.229726485 +0100
+++ gcc/combine.c       2018-11-29 17:57:48.069423874 +0100
@@ -4945,7 +4945,7 @@ find_split_point (rtx *loc, rtx_insn *in
        }

       /* If we have a PLUS whose second operand is a constant and the
-        address is not valid, perhaps will can split it up using
+        address is not valid, perhaps we can split it up using
         the machine-specific way to split large constants.  We use
         the first pseudo-reg (one of the virtual regs) as a placeholder;
         it will not remain in the result.  */
@@ -4960,7 +4960,7 @@ find_split_point (rtx *loc, rtx_insn *in

          /* This should have produced two insns, each of which sets our
             placeholder.  If the source of the second is a valid address,
-            we can make put both sources together and make a split point
+            we can put both sources together and make a split point
             in the middle.  */

          if (seq
@@ -5001,14 +5001,45 @@ find_split_point (rtx *loc, rtx_insn *in
                }
            }

+         /* If that didn't work and we have a nested plus, like:
+            ((REG1 * CONST1) + REG2) + CONST2 and (REG1 + REG2) + CONST2
+            is valid address, try to split (REG1 * CONST1).  */
+         if (GET_CODE (XEXP (XEXP (x, 0), 0)) == PLUS
+             && !OBJECT_P (XEXP (XEXP (XEXP (x, 0), 0), 0))
+             && OBJECT_P (XEXP (XEXP (XEXP (x, 0), 0), 1)))
+           {
+             rtx tem = XEXP (XEXP (XEXP (x, 0), 0), 0);
+             XEXP (XEXP (XEXP (x, 0), 0), 0) = reg;
+             if (memory_address_addr_space_p (GET_MODE (x), XEXP (x, 0),
+                                              MEM_ADDR_SPACE (x)))
+               {
+                 XEXP (XEXP (XEXP (x, 0), 0), 0) = tem;
+                 return &XEXP (XEXP (XEXP (x, 0), 0), 0);
+               }
+             XEXP (XEXP (XEXP (x, 0), 0), 0) = tem;
+           }
+         else if (GET_CODE (XEXP (XEXP (x, 0), 0)) == PLUS
+                  && OBJECT_P (XEXP (XEXP (XEXP (x, 0), 0), 0))
+                  && !OBJECT_P (XEXP (XEXP (XEXP (x, 0), 0), 1)))
+           {
+             rtx tem = XEXP (XEXP (XEXP (x, 0), 0), 1);
+             XEXP (XEXP (XEXP (x, 0), 0), 1) = reg;
+             if (memory_address_addr_space_p (GET_MODE (x), XEXP (x, 0),
+                                              MEM_ADDR_SPACE (x)))
+               {
+                 XEXP (XEXP (XEXP (x, 0), 0), 1) = tem;
+                 return &XEXP (XEXP (XEXP (x, 0), 0), 1);
+               }
+             XEXP (XEXP (XEXP (x, 0), 0), 1) = tem;
+           }
+
          /* If that didn't work, perhaps the first operand is complex and
             needs to be computed separately, so make a split point there.
             This will occur on machines that just support REG + CONST
             and have a constant moved through some previous computation.  */
-
-         else if (!OBJECT_P (XEXP (XEXP (x, 0), 0))
-                  && ! (GET_CODE (XEXP (XEXP (x, 0), 0)) == SUBREG
-                        && OBJECT_P (SUBREG_REG (XEXP (XEXP (x, 0), 0)))))
+         if (!OBJECT_P (XEXP (XEXP (x, 0), 0))
+             && ! (GET_CODE (XEXP (XEXP (x, 0), 0)) == SUBREG
+                   && OBJECT_P (SUBREG_REG (XEXP (XEXP (x, 0), 0)))))
            return &XEXP (XEXP (x, 0), 0);
        }


fixes this for me.
Reply | Threaded
Open this post in threaded view
|

[Bug target/54589] struct offset add should be folded into address calculation

kargl at gcc dot gnu.org
In reply to this post by kargl at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54589

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED
           Assignee|unassigned at gcc dot gnu.org      |jakub at gcc dot gnu.org

--- Comment #7 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Created attachment 45125
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=45125&action=edit
gcc9-pr54589.patch

Untested fix.
Reply | Threaded
Open this post in threaded view
|

[Bug target/54589] struct offset add should be folded into address calculation

kargl at gcc dot gnu.org
In reply to this post by kargl at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54589

--- Comment #8 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Author: jakub
Date: Sat Dec  1 07:27:58 2018
New Revision: 266707

URL: https://gcc.gnu.org/viewcvs?rev=266707&root=gcc&view=rev
Log:
        PR target/54589
        * combine.c (find_split_point): For invalid memory address
        nonobj + obj + const, if reg + obj + const is valid addressing
        mode, split at nonobj.  Use if rather than else if for the
        fallback.  Comment fixes.

        * gcc.target/i386/pr54589.c: New test.

Added:
    trunk/gcc/testsuite/gcc.target/i386/pr54589.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/combine.c
    trunk/gcc/testsuite/ChangeLog
Reply | Threaded
Open this post in threaded view
|

[Bug target/54589] struct offset add should be folded into address calculation

kargl at gcc dot gnu.org
In reply to this post by kargl at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54589

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|---                         |FIXED

--- Comment #9 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Fixed for GCC9 and later.
Reply | Threaded
Open this post in threaded view
|

[Bug target/54589] struct offset add should be folded into address calculation

kargl at gcc dot gnu.org
In reply to this post by kargl at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54589

Jaydeep Chauhan <jaydeepchauhan1494 at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jaydeepchauhan1494 at gmail dot co
                   |                            |m

--- Comment #10 from Jaydeep Chauhan <jaydeepchauhan1494 at gmail dot com> ---
 Hi,

 For the below case fix is not working.

 #include <emmintrin.h>

  struct param {
          int a, b, c, d;
          __m128i array[256];
  };

  struct param *p1;
  void func(struct param *p, unsigned char *src, int *dst)
  {
          __m128i x = p1->array[*src];
          *dst = _mm_cvtsi128_si32(x);
  }

  gcc generates for x86-64 with -O2:

  func:
        movzbl  (%rsi), %eax
        addq    $1, %rax
        salq    $4, %rax
        addq    p1(%rip), %rax
        movl    (%rax), %eax
        movl    %eax, (%rdx)
        ret

  clang generates for x86_64 with -O2:  

  func:                                  
        movq    p1(%rip), %rax
        movzbl  (%rsi), %ecx
        shlq    $4, %rcx
        movl    16(%rax,%rcx), %eax
        movl    %eax, (%rdx)
        retq
Reply | Threaded
Open this post in threaded view
|

[Bug target/54589] struct offset add should be folded into address calculation

kargl at gcc dot gnu.org
In reply to this post by kargl at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54589

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |REOPENED
         Resolution|FIXED                       |---

--- Comment #11 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
That is because e.g. on:
struct S { int a, b, c, d; };
struct T { struct S a; struct S b[256]; } *v;

int
foo (unsigned char *x)
{
  return v->b[*x].a;
}
we have:
(insn 6 3 7 2 (set (reg/f:DI 88 [ v ])
        (mem/f/c:DI (symbol_ref:DI ("v") [flags 0x2]  <var_decl 0x7f5f67c7aab0
v>) [1 v+0 S8 A64])) "pr54589-3.c":7:18 66 {*movdi_internal}
     (nil))
(insn 7 6 8 2 (set (reg:SI 89 [ *x_5(D) ])
        (zero_extend:SI (mem:QI (reg/v/f:DI 86 [ x ]) [0 *x_5(D)+0 S1 A8])))
"pr54589-3.c":7:15 119 {*zero_extendqisi2}
     (expr_list:REG_DEAD (reg/v/f:DI 86 [ x ])
        (nil)))
(insn 8 7 9 2 (set (reg:DI 90 [ *x_5(D) ])
        (sign_extend:DI (reg:SI 89 [ *x_5(D) ]))) "pr54589-3.c":7:18 128
{*extendsidi2_rex64}
     (expr_list:REG_DEAD (reg:SI 89 [ *x_5(D) ])
        (nil)))
(insn 9 8 10 2 (parallel [
            (set (reg:DI 91)
                (plus:DI (reg:DI 90 [ *x_5(D) ])
                    (const_int 1 [0x1])))
            (clobber (reg:CC 17 flags))
        ]) "pr54589-3.c":7:18 191 {*adddi_1}
     (expr_list:REG_DEAD (reg:DI 90 [ *x_5(D) ])
        (expr_list:REG_UNUSED (reg:CC 17 flags)
            (nil))))
(insn 10 9 11 2 (parallel [
            (set (reg:DI 92)
                (ashift:DI (reg:DI 91)
                    (const_int 4 [0x4])))
            (clobber (reg:CC 17 flags))
        ]) "pr54589-3.c":7:18 518 {*ashldi3_1}
     (expr_list:REG_DEAD (reg:DI 91)
        (expr_list:REG_UNUSED (reg:CC 17 flags)
            (nil))))
(insn 11 10 12 2 (parallel [
            (set (reg/f:DI 93)
                (plus:DI (reg/f:DI 88 [ v ])
                    (reg:DI 92)))
            (clobber (reg:CC 17 flags))
        ]) "pr54589-3.c":7:18 191 {*adddi_1}
     (expr_list:REG_DEAD (reg:DI 92)
        (expr_list:REG_DEAD (reg/f:DI 88 [ v ])
            (expr_list:REG_UNUSED (reg:CC 17 flags)
                (nil)))))
(insn 12 11 17 2 (set (reg:SI 94 [ v.0_1->b[_3].a ])
        (mem:SI (reg/f:DI 93) [2 v.0_1->b[_3].a+0 S4 A32])) "pr54589-3.c":7:18
67 {*movsi_internal}
     (expr_list:REG_DEAD (reg/f:DI 93)
        (nil)))
and combine would handle that fine if it tried to combine 9, 10, 11 -> 12
first, but unfortunately for this case it tries 6 -> 11 first and, if the
combination of everything doesn't simplify, we try to split it only into two
parts, which in this case isn't enough, we'd need to split those 4 instructions
into 3 (one to do the shift, one to do the v load and last to do the mem read
with the 16(reg1, reg2) addressing.

Unless the combiner grows the possibility to split into 3 functions, I'm afraid
this would need to be solved in machine reorg or something similar.
Reply | Threaded
Open this post in threaded view
|

[Bug target/54589] struct offset add should be folded into address calculation

kargl at gcc dot gnu.org
In reply to this post by kargl at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54589

--- Comment #12 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
(In reply to Jakub Jelinek from comment #11)
> Unless the combiner grows the possibility to split into 3 functions, I'm

I mean 3 instructions when trying to combine 4.

> afraid this would need to be solved in machine reorg or something similar.
Reply | Threaded
Open this post in threaded view
|

[Bug target/54589] struct offset add should be folded into address calculation

kargl at gcc dot gnu.org
In reply to this post by kargl at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54589

--- Comment #13 from Segher Boessenkool <segher at gcc dot gnu.org> ---
Yeah, that's not going to happen.

Will it help to do some define_split or define_insn_and_split for this?