[Bug tree-optimization/87954] New: VRP can transform a * b where a,b are [0,1] to a & b

[Bug tree-optimization/87954] New: VRP can transform a * b where a,b are [0,1] to a & b

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87954

            Bug ID: 87954
           Summary: VRP can transform a * b where a,b are [0,1] to a & b
           Product: gcc
           Version: 9.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: marxin at gcc dot gnu.org
  Target Milestone: ---

Code snippet reported by kernel developers:

$ cat mul.c
#define __GFP_DMA 1u
#define __GFP_RECLAIM 0x10u

#define KMALLOC_DMA 2
#define KMALLOC_RECLAIM 1

unsigned int and(unsigned int flags)
{
        int is_dma, type_dma, is_rec;

        is_dma = !!(flags & __GFP_DMA);
        type_dma = is_dma * KMALLOC_DMA;
        is_rec = !!(flags & __GFP_RECLAIM);

        return type_dma + (is_rec & !is_dma) * KMALLOC_RECLAIM;
}

unsigned int imul(unsigned int flags)
{
        int is_dma, type_dma, is_rec;

        is_dma = !!(flags & __GFP_DMA);
        type_dma = is_dma * KMALLOC_DMA;
        is_rec = !!(flags & __GFP_RECLAIM);

        return type_dma + (is_rec * !is_dma) * KMALLOC_RECLAIM;
}

The first function ends up with a fast and operation:
and:
.LFB0:
        .cfi_startproc
        movl    %edi, %edx
        movl    %edi, %eax
        andl    $1, %edi
        shrl    $4, %edx
        notl    %eax
        andl    %edx, %eax
        andl    $1, %eax
        leal    (%rax,%rdi,2), %eax
        ret

while the second one ends up with an imull:
imul:
.LFB1:
        .cfi_startproc
        movl    %edi, %eax
        movl    %edi, %edx
        andl    $1, %edi
        shrl    $4, %edx
        notl    %eax
        andl    $1, %eax
        andl    $1, %edx
        imull   %edx, %eax
        leal    (%rax,%rdi,2), %eax
        ret
        .cfi_endproc

VRP knows about the range of the operands:
  _7 = _6 * is_rec_12;

_6: [0, 1]
is_rec_12: [0, 1]
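
As a sanity check, here is a minimal standalone program (a sketch added for
illustration, not part of the report) verifying the identity the requested
transformation relies on: for operands restricted to [0, 1], a * b and a & b
always agree, so the multiply can be replaced by the cheaper and.

#include <assert.h>

int main(void)
{
        /* Exhaustive check: with both operands in [0, 1], the multiply
           and the bitwise and produce the same result.  */
        for (unsigned int a = 0; a <= 1; a++)
                for (unsigned int b = 0; b <= 1; b++)
                        assert (a * b == (a & b));
        return 0;
}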

[Bug tree-optimization/87954] VRP can transform a * b where a,b are [0,1] to a & b

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87954

Aldy Hernandez <aldyh at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2018-11-09
     Ever confirmed|0                           |1

--- Comment #1 from Aldy Hernandez <aldyh at gcc dot gnu.org> ---
Indeed, if you compile imul() with -fdump-tree-all-details-alias -O2 and look
at the vrp1 dump, one can see:

  # RANGE [0, 1] NONZERO 1
  is_rec_12 = (int) _4;
...
  # RANGE [0, 1] NONZERO 1
  _6 = (int) _15;
  # RANGE [0, 1] NONZERO 1
  _7 = _6 * is_rec_12;

This pattern persists throughout the optimization pipeline, so any remaining
optimizer could potentially see the range of the operands and strength reduce
this.

What would be the best place to do this?

[Bug tree-optimization/87954] VRP can transform a * b where a,b are [0,1] to a & b

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87954

--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Aldy Hernandez from comment #1)

> Indeed, if you compile imul() with -fdump-tree-all-details-alias -O2 and
> look at the vrp1 dump, one can see:
>
>   # RANGE [0, 1] NONZERO 1
>   is_rec_12 = (int) _4;
> ...
>   # RANGE [0, 1] NONZERO 1
>   _6 = (int) _15;
>   # RANGE [0, 1] NONZERO 1
>   _7 = _6 * is_rec_12;
>
> This pattern persists throughout the optimization pipeline, so any remaining
> optimizer could potentially see the range of the operands and strength
> reduce this.
>
> What would be the best place to do this?

The best place to do this is a pattern in match.pd:

(simplify (mult @1 @2)
 (if (INTEGRAL_TYPE_P (type)
      && wi::eq_p (get_nonzero_bits (@1), wi::one (TYPE_PRECISION (type)))
      && wi::eq_p (get_nonzero_bits (@2), wi::one (TYPE_PRECISION (type))))
  (and @1 @2)))

maybe even think of tricks like requiring only one operand, a, to be [0,1]
and sign-extending that from bit 0 before the and?  Thus a * b with a in [0,1]
-> (a ? -1 : 0) & b.  But that's probably something to do at RTL expansion
time because it depends on the cost of doing the sign extension.
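
A quick standalone check of that single-operand variant (a sketch, not from
the PR; the sample values of b are arbitrary), assuming a is known to be in
[0, 1]:

#include <assert.h>

int main(void)
{
        unsigned int samples[] = { 0u, 1u, 2u, 0x10u, 0x80000000u, ~0u };

        /* For a in [0, 1]:  a * b == (a ? -1 : 0) & b, i.e. sign-extend
           a from bit 0 and use it as a mask for b.  ~0u is the unsigned
           spelling of -1.  */
        for (unsigned int a = 0; a <= 1; a++)
                for (unsigned int i = 0;
                     i < sizeof (samples) / sizeof (samples[0]); i++) {
                        unsigned int b = samples[i];
                        assert (a * b == ((a ? ~0u : 0u) & b));
                }
        return 0;
}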

If nonzero-bits is 2 then we can do b + (a == 2 ? 1 : -1) * b, i.e.
transfer the bit to the sign bit and do an add.  This is similarly something
for RTL expansion.  I guess any power-of-two-or-zero duality can be
optimized a bit.
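
A similar standalone check (again a sketch, not from the PR) of the
nonzero-bits == 2 identity, with unsigned wrap-around making the a == 0 case
cancel to zero:

#include <assert.h>

int main(void)
{
        unsigned int a_vals[] = { 0u, 2u };   /* nonzero bits == 2 */
        unsigned int samples[] = { 0u, 1u, 3u, 0x10u, 0x7fffffffu, ~0u };

        for (unsigned int i = 0; i < 2; i++)
                for (unsigned int j = 0;
                     j < sizeof (samples) / sizeof (samples[0]); j++) {
                        unsigned int a = a_vals[i], b = samples[j];
                        /* a * b == b + (a == 2 ? 1 : -1) * b  */
                        assert (a * b == b + (a == 2 ? 1u : -1u) * b);
                }
        return 0;
}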

[Bug tree-optimization/87954] VRP can transform a * b where a,b are [0,1] to a & b

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87954

--- Comment #3 from Martin Liška <marxin at gcc dot gnu.org> ---
Marc, are you planning to implement that in the future?

[Bug tree-optimization/87954] VRP can transform a * b where a,b are [0,1] to a & b

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87954

--- Comment #4 from Marc Glisse <glisse at gcc dot gnu.org> ---
(In reply to Martin Liška from comment #3)
> Marc, are you planning to implement that in the future?

Not in the near future, no. It is all yours if you want it ;-)