[PATCH 00/29] [arm] Rewrite DImode arithmetic support

Richard Earnshaw (lists)

This series of patches rewrites all the DImode arithmetic patterns for
the Arm backend when compiling for Arm or Thumb2 to split the
operations during expand (the thumb1 code is unchanged and cannot
benefit from early splitting as we are unable to expose the carry
flag).

This has a number of benefits:
 - register allocation has more freedom to use independent
   registers for the upper and lower halves of the register
 - we can make better use of combine for spotting insn merge
   opportunities without needing many additional patterns that are
   only used for DImode
 - we eliminate a number of bugs in the machine description where
   the carry calculations were not correctly propagated by the
   split patterns (we mostly got away with this because the
   splitting previously happened only after most of the important
   optimization passes had been run).

The patch series starts by paring back all the DImode arithmetic
support to a very simple form without any splitting at all and then
progressively re-implementing the patterns with early split
operations.  This proved to be the only sane way of untangling the
existing code due to a number of latent bugs which would have been
exposed if a different approach had been taken.

Each patch should produce a working compiler (it did when it was
originally written), though since the patch set has been re-ordered
slightly there is a possibility that some of the intermediate steps
may have missing test updates that are only cleaned up later.
However, only the end of the series should be considered complete.
I've kept the patch as a series to permit easier regression hunting
should that prove necessary.

R.

Richard Earnshaw (29):
  [arm] Rip out DImode addition and subtraction splits.
  [arm] Perform early splitting of adddi3.
  [arm] Early split zero- and sign-extension
  [arm] Rewrite addsi3_carryin_shift_<optab> in canonical form
  [arm] fix constraints on addsi3_carryin_alt2
  [arm] Early split subdi3
  [arm] Remove redundant DImode subtract patterns
  [arm] Introduce arm_carry_operation
  [arm] Correctly cost addition with a carry-in
  [arm] Correct cost calculations involving borrow for subtracts.
  [arm] Reduce cost of insns that are simple reg-reg moves.
  [arm] Implement negscc using SBC when appropriate.
  [arm] Add alternative canonicalizations for subtract-with-carry +
    shift
  [arm] Early split simple DImode equality comparisons
  [arm] Improve handling of DImode comparisons against constants.
  [arm] early split most DImode comparison operations.
  [arm] Handle some constant comparisons using rsbs+rscs
  [arm] Cleanup dead code - old support for DImode comparisons
  [arm] Handle immediate values in uaddvsi4
  [arm] Early expansion of uaddvdi4.
  [arm] Improve code generation for addvsi4.
  [arm] Allow the summation result of signed add-with-overflow to be
    discarded.
  [arm] Early split addvdi4
  [arm] Improve constant handling for usubvsi4.
  [arm] Early expansion of usubvdi4.
  [arm] Improve constant handling for subvsi4.
  [arm] Early expansion of subvdi4
  [arm] Improvements to negvsi4 and negvdi4.
  [arm] Fix testsuite nit when compiling for thumb2

 gcc/config/arm/arm-modes.def                  |   19 +-
 gcc/config/arm/arm-protos.h                   |    1 +
 gcc/config/arm/arm.c                          |  598 ++++-
 gcc/config/arm/arm.md                         | 2020 ++++++++++-------
 gcc/config/arm/iterators.md                   |   15 +-
 gcc/config/arm/predicates.md                  |   29 +-
 gcc/config/arm/thumb2.md                      |    8 +-
 .../gcc.dg/builtin-arith-overflow-3.c         |   41 +
 gcc/testsuite/gcc.target/arm/negdi-3.c        |    4 +-
 9 files changed, 1757 insertions(+), 978 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/builtin-arith-overflow-3.c



[PATCH 01/29] [arm] Rip out DImode addition and subtraction splits.

Richard Earnshaw (lists)

The first step towards early splitting of addition and subtraction at
DImode is to rip out the old patterns that are designed to propagate
DImode through the RTL optimization passes and then do late splitting.

This patch does cause some code size regressions, but it should still
execute correctly.  We will progressively add back the optimizations
we had here in later patches.

A small number of tests in the Arm-specific testsuite do fail as a
result of this patch, but that's to be expected, since the
optimizations they are looking for have just been removed.  I've kept
the tests, but XFAILed them for now.

One small technical change is also done in this patch as part of the
cleanup: the uaddv<mode>4 expander is changed to use LTU as the branch
comparison.  This eliminates the need for CC_Cmode to recognize
somewhat bogus equality constraints.

gcc:
        * arm.md (adddi3): Only accept register operands.
        (arm_adddi3): Convert to simple insn with no split.  Do not accept
        constants.
        (adddi_sesidi_di): Delete pattern.
        (adddi_zesidi_di): Likewise.
        (uaddv<mode>4): Use LTU as condition for branch.
        (adddi3_compareV): Convert to simple insn with no split.
        (addsi3_compareV_upper): Delete pattern.
        (adddi3_compareC): Convert to simple insn with no split.  Correct
        flags setting expression.
        (addsi3_compareC_upper): Delete pattern.
        (addsi3_compareC): Correct flags setting expression.
        (subdi3_compare1): Convert to simple insn with no split.
        (subsi3_carryin_compare): Delete pattern.
        (arm_subdi3): Convert to simple insn with no split.
        (subdi_zesidi): Delete pattern.
        (subdi_di_sesidi): Delete pattern.
        (subdi_zesidi_di): Delete pattern.
        (subdi_sesidi_di): Delete pattern.
        (subdi_zesidi_zesidi): Delete pattern.
        (negvdi3): Use s_register_operand.
        (negdi2_compare): Convert to simple insn with no split.
        (negdi2_insn): Likewise.
        (negsi2_carryin_compare): Delete pattern.
        (negdi_zero_extendsidi): Delete pattern.
        (arm_cmpdi_insn): Convert to simple insn with no split.
        (negdi2): Don't call gen_negdi2_neon.
        * config/arm/neon.md (adddi3_neon): Delete pattern.
        (subdi3_neon): Delete pattern.
        (negdi2_neon): Delete pattern.
        (splits for negdi2_neon): Delete splits.

testsuite:
        * gcc.target/arm/negdi-3.c: Add XFAILS.
        * gcc.target/arm/pr53447-1.c: Likewise.
        * gcc.target/arm/pr53447-3.c: Likewise.
        * gcc.target/arm/pr53447-4.c: Likewise.
---
 gcc/config/arm/arm.c                     |   2 -
 gcc/config/arm/arm.md                    | 569 ++---------------------
 gcc/testsuite/gcc.target/arm/negdi-3.c   |   8 +-
 gcc/testsuite/gcc.target/arm/pr53447-1.c |   2 +-
 gcc/testsuite/gcc.target/arm/pr53447-3.c |   2 +-
 gcc/testsuite/gcc.target/arm/pr53447-4.c |   2 +-
 6 files changed, 56 insertions(+), 529 deletions(-)


Attachment: 0001-arm-Rip-out-DImode-addition-and-subtraction-splits.patch (29K)

[PATCH 02/29] [arm] Perform early splitting of adddi3.

Richard Earnshaw (lists)
In reply to this post by Richard Earnshaw (lists)

This patch causes the expansion of adddi3 to split the operation
immediately for Arm and Thumb-2.  This is desirable as it frees up the
register allocator to pick whatever combination of registers suits
best and reduces the number of auxiliary patterns that we need in the
back-end.  Three of the testcases that we disabled earlier are already
fixed by this patch.  Finally, we add a new pattern to match the
canonicalization of add-with-carry when using an immediate of zero.
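
As a rough illustration of what the split form computes, here is a
minimal C model of a DImode addition carried out on two independent
32-bit words.  The type di_pair and the function names are
hypothetical and exist only for this sketch; they are not taken from
the patch.

```c
#include <stdint.h>

/* Hypothetical model of a DImode value held as two independent 32-bit words. */
typedef struct { uint32_t lo, hi; } di_pair;

/* Model of the split addition: ADDS on the low words, ADC on the high words. */
static di_pair add_di_split(di_pair a, di_pair b)
{
    di_pair r;
    r.lo = a.lo + b.lo;             /* ADDS: low-word sum, wraps modulo 2^32 */
    uint32_t carry = r.lo < a.lo;   /* carry out, expressed as an LTU test on the sum */
    r.hi = a.hi + b.hi + carry;     /* ADC: high-word sum plus the carry in */
    return r;
}

/* Reassemble the pair so it can be compared with native 64-bit arithmetic. */
static uint64_t di_to_u64(di_pair v)
{
    return ((uint64_t)v.hi << 32) | v.lo;
}
```

Because the two halves are ordinary 32-bit values, nothing in this
model requires them to live in a consecutive register pair.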

gcc:
        * config/arm/arm-protos.h (arm_decompose_di_binop): New prototype.
        * config/arm/arm.c (arm_decompose_di_binop): New function.
        * config/arm/arm.md (adddi3): Also accept any const_int for op2.
        If not generating Thumb-1 code, decompose the operation into 32-bit
        pieces.
        (add0si3_carryin_<optab>): New pattern.

testsuite:
        * gcc.target/arm/pr53447-1.c: Remove XFAIL.
        * gcc.target/arm/pr53447-3.c: Remove XFAIL.
        * gcc.target/arm/pr53447-4.c: Remove XFAIL.
---
 gcc/config/arm/arm-protos.h              |  1 +
 gcc/config/arm/arm.c                     | 15 +++++
 gcc/config/arm/arm.md                    | 73 ++++++++++++++++++------
 gcc/testsuite/gcc.target/arm/pr53447-1.c |  2 +-
 gcc/testsuite/gcc.target/arm/pr53447-3.c |  2 +-
 gcc/testsuite/gcc.target/arm/pr53447-4.c |  2 +-
 6 files changed, 76 insertions(+), 19 deletions(-)


Attachment: 0002-arm-Perform-early-splitting-of-adddi3.patch (6K)

[PATCH 03/29] [arm] Early split zero- and sign-extension

Richard Earnshaw (lists)
In reply to this post by Richard Earnshaw (lists)

This patch changes the insn patterns for zero- and sign-extend into
define_expands that generate the appropriate word operations
immediately.
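
A small C sketch of the word operations involved, assuming the
result is modelled as a low/high pair (the di_words type and both
function names are made up for illustration):

```c
#include <stdint.h>

/* Hypothetical two-word representation of a DImode result. */
typedef struct { uint32_t lo, hi; } di_words;

/* Zero-extension: copy the low word and set the high word to zero. */
static di_words zero_extend_si(uint32_t x)
{
    di_words r = { x, 0u };
    return r;
}

/* Sign-extension: the high word is a copy of the sign bit in every
   position, which is what an arithmetic shift right by 31 produces. */
static di_words sign_extend_si(int32_t x)
{
    di_words r;
    r.lo = (uint32_t)x;
    r.hi = x < 0 ? 0xFFFFFFFFu : 0u;
    return r;
}
```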

        * config/arm/arm.md (zero_extend<mode>di2): Convert to define_expand.
        (extend<mode>di2): Likewise.
---
 gcc/config/arm/arm.md | 75 +++++++++++++++++++++++++++++++------------
 1 file changed, 54 insertions(+), 21 deletions(-)


Attachment: 0003-arm-Early-split-zero-and-sign-extension.patch (2K)

[PATCH 04/29] [arm] Rewrite addsi3_carryin_shift_<optab> in canonical form

Richard Earnshaw (lists)
In reply to this post by Richard Earnshaw (lists)

The add-with-carry operation which involves a shift doesn't match at present
because it isn't matching the canonical form generated by combine.  Fixing
this is simply a matter of re-ordering the operands.

        * config/arm/arm.md (addsi3_carryin_shift_<optab>): Reorder operands
        to match canonical form.
---
 gcc/config/arm/arm.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)


Attachment: 0004-arm-Rewrite-addsi3_carryin_shift_-optab-in-canonical.patch (675 bytes)

[PATCH 05/29] [arm] fix constraints on addsi3_carryin_alt2

Richard Earnshaw (lists)
In reply to this post by Richard Earnshaw (lists)

addsi3_carryin_alt2 has a more strict constraint than the predicate
when adding a constant.  This leads to sub-optimal code in some
circumstances.

        * config/arm/arm.md (addsi3_carryin_alt2): Use arm_not_operand for
        operand 2.
---
 gcc/config/arm/arm.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)


Attachment: 0005-arm-fix-constraints-on-addsi3_carryin_alt2.patch (634 bytes)

[PATCH 06/29] [arm] Early split subdi3

Richard Earnshaw (lists)
In reply to this post by Richard Earnshaw (lists)

This patch adds early splitting of subdi3 so that the individual
operations can be seen by the optimizers, particularly combine.  This
should allow us to do at least as good a job as previously, but with
far fewer patterns in the machine description.

This is just the initial patch to add the early splitting.  The
cleanups will follow later.

A special trick is used to handle the 'reverse subtract and compare'
where a register is subtracted from a constant.  The natural
comparison

    (COMPARE (const) (reg))

is not canonical in this case and combine will never correctly
generate it (it tries to swap the order of the operands).  To handle this
we write the comparison as

    (COMPARE (NOT (reg)) (~const)),

which has the same result for EQ, NE, LTU, LEU, GTU and GEU, which are
all the cases we are really interested in here.
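
The equivalence can be checked with a few lines of C.  This is only
a sketch for 32-bit unsigned values; the function names are
hypothetical.  It relies on bitwise NOT reversing unsigned order, so
swapping the operands and inverting both preserves each unsigned
test.

```c
#include <stdint.h>
#include <stdbool.h>

/* The "natural" form: compare the constant against the register. */
static bool ltu_natural(uint32_t c, uint32_t reg) { return c < reg; }
static bool eq_natural(uint32_t c, uint32_t reg)  { return c == reg; }

/* The rewritten form: compare ~reg against ~const.  Since ~x is
   0xFFFFFFFF - x, it reverses unsigned order, so the swapped and
   inverted operands give the same answer. */
static bool ltu_rewritten(uint32_t c, uint32_t reg)
{
    return (uint32_t)~reg < (uint32_t)~c;
}

static bool eq_rewritten(uint32_t c, uint32_t reg)
{
    return (uint32_t)~reg == (uint32_t)~c;
}
```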

Finally, we delete the negdi2 pattern.  The generic expanders will use
our new subdi3 expander if this pattern is missing and that can handle
the negate case just fine.

        * config/arm/arm-modes.def (CC_RSB): New CC mode.
        * config/arm/predicates.md (arm_borrow_operation): Handle CC_RSBmode.
        * config/arm/arm.c (arm_select_cc_mode): Detect when we should
        return CC_RSBmode.
        (maybe_get_arm_condition_code): Handle CC_RSBmode.
        * config/arm/arm.md (subsi3_carryin): Make this pattern available to
        expand.
        (subdi3): Rewrite to early-expand the sub-operations.
        (rsb_imm_compare): New pattern.
        (negdi2): Delete.
        (negdi2_insn): Delete.
        (arm_negsi2): Correct type attribute to alu_imm.
        (negsi2_0compare): New insn pattern.
        (negsi2_carryin): New insn pattern.
---
 gcc/config/arm/arm-modes.def |   4 +
 gcc/config/arm/arm.c         |  23 ++++++
 gcc/config/arm/arm.md        | 141 ++++++++++++++++++++++++++++-------
 gcc/config/arm/predicates.md |   2 +-
 4 files changed, 141 insertions(+), 29 deletions(-)


Attachment: 0006-arm-Early-split-subdi3.patch (8K)

[PATCH 07/29] [arm] Remove redundant DImode subtract patterns

Richard Earnshaw (lists)
In reply to this post by Richard Earnshaw (lists)

Now that we early split DImode subtracts, the patterns to emit the
original and to match zero-extend with subtraction or negation are
no longer useful.

        * config/arm/arm.md (arm_subdi3): Delete insn.
        (zextendsidi_negsi, negdi_extendsidi): Delete insn_and_split.
---
 gcc/config/arm/arm.md | 102 ------------------------------------------
 1 file changed, 102 deletions(-)


Attachment: 0007-arm-Remove-redundant-DImode-subtract-patterns.patch (3K)

[PATCH 08/29] [arm] Introduce arm_carry_operation

Richard Earnshaw (lists)
In reply to this post by Richard Earnshaw (lists)

An earlier patch introduced arm_borrow_operation; this one introduces
the carry variant, which is the same except that the logic of the
carry-setting is inverted.  Having done this we can now match more
cases where the carry flag is propagated from comparisons with
different modes without having to define even more patterns.  A few
small changes to the expand patterns are required to directly create
the carry representation.

The iterator LTUGEU is no longer needed and is removed, as is the code
attribute 'cnb'.

Finally, we fix a long-standing bug which was probably inert before:
in Thumb2 a shift with ADC can only be by an immediate amount;
register-specified shifts are not permitted.

        * config/arm/predicates.md (arm_carry_operation): New special
        predicate.
        * config/arm/iterators.md (LTUGEU): Delete iterator.
        (cnb): Delete code attribute.
        (optab): Delete ltu and geu elements.
        * config/arm/arm.md (addsi3_carryin): Renamed from
        addsi3_carryin_<optab>.  Remove iterator and use arm_carry_operation.
        (add0si3_carryin): Similarly, but from add0si3_carryin_<optab>.
        (addsi3_carryin_alt2): Similarly, but from addsi3_carryin_alt2_<optab>.
        (addsi3_carryin_clobercc): Similarly.
        (addsi3_carryin_shift): Similarly.  Do not allow register shifts in
        Thumb2 state.
---
 gcc/config/arm/arm.md        | 36 ++++++++++++++++++++----------------
 gcc/config/arm/iterators.md  | 11 +----------
 gcc/config/arm/predicates.md | 21 +++++++++++++++++++++
 3 files changed, 42 insertions(+), 26 deletions(-)


Attachment: 0008-arm-Introduce-arm_carry_operation.patch (6K)

[PATCH 09/29] [arm] Correctly cost addition with a carry-in

Richard Earnshaw (lists)
In reply to this post by Richard Earnshaw (lists)

The cost routine for Arm and Thumb2 was not recognising the idioms that
describe addition with carry.  This results in the instructions
appearing more expensive than they really are, which occasionally can lead
to poor choices by combine.  Recognising all the possible variants is
a little trickier than normal because the expressions can become complex
enough that there is no single canonical form.

        * config/arm/arm.c (strip_carry_operation): New function.
        (arm_rtx_costs_internal, case PLUS): Handle addition with carry-in
        for SImode.
---
 gcc/config/arm/arm.c | 76 +++++++++++++++++++++++++++++++++++++-------
 1 file changed, 65 insertions(+), 11 deletions(-)


Attachment: 0009-arm-Correctly-cost-addition-with-a-carry-in.patch (4K)

[PATCH 10/29] [arm] Correct cost calculations involving borrow for subtracts.

Richard Earnshaw (lists)
In reply to this post by Richard Earnshaw (lists)

The rtx_cost calculations when a borrow operation was being performed were
not being calculated correctly.  The borrow is free as part of the
subtract-with-carry instructions.  This patch recognizes the various
idioms that can describe this and returns the correct costs.

        * config/arm/arm.c (arm_rtx_costs_internal, case MINUS): Handle
        borrow operations.
---
 gcc/config/arm/arm.c | 49 +++++++++++++++++++++++++++++++++++++-------
 1 file changed, 42 insertions(+), 7 deletions(-)


Attachment: 0010-arm-Correct-cost-calculations-involving-borrow-for-s.patch (2K)

[PATCH 11/29] [arm] Reduce cost of insns that are simple reg-reg moves.

Richard Earnshaw (lists)
In reply to this post by Richard Earnshaw (lists)

Consider this sequence during combine:

Trying 18, 7 -> 22:
   18: r118:SI=r122:SI
      REG_DEAD r122:SI
    7: r114:SI=0x1-r118:SI-ltu(cc:CC_RSB,0)
      REG_DEAD r118:SI
      REG_DEAD cc:CC_RSB
   22: r1:SI=r114:SI
      REG_DEAD r114:SI
Failed to match this instruction:
(set (reg:SI 1 r1 [+4 ])
    (minus:SI (geu:SI (reg:CC_RSB 100 cc)
            (const_int 0 [0]))
        (reg:SI 122)))
Successfully matched this instruction:
(set (reg:SI 114)
    (geu:SI (reg:CC_RSB 100 cc)
        (const_int 0 [0])))
Successfully matched this instruction:
(set (reg:SI 1 r1 [+4 ])
    (minus:SI (reg:SI 114)
        (reg:SI 122)))
allowing combination of insns 18, 7 and 22
original costs 4 + 4 + 4 = 12
replacement costs 8 + 4 = 12

The costs are all correct, but we really don't want this combination
to take place.  The original costs contain an insn that is a simple
move of one pseudo register to another and it is extremely likely that
register allocation will eliminate this insn entirely.  On the other
hand, the resulting sequence really does expand into a sequence that
costs 12 (i.e. 3 insns).

We don't want to prevent combine from eliminating such moves, as this
can expose more combine opportunities, but we shouldn't rate them as
profitable in themselves.  We can do this by adjusting the costs
slightly so that the benefit of eliminating such a simple insn is
reduced.

We only do this before register allocation; after allocation we give
such insns their full cost.

        * config/arm/arm.c (arm_insn_cost): New function.
        (TARGET_INSN_COST): Override default definition.
---
 gcc/config/arm/arm.c | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)


Attachment: 0011-arm-Reduce-cost-of-insns-that-are-simple-reg-reg-mov.patch (1K)

[PATCH 12/29] [arm] Implement negscc using SBC when appropriate.

Richard Earnshaw (lists)
In reply to this post by Richard Earnshaw (lists)

When the carry flag is appropriately set by a comparison, negscc
patterns can expand into a simple SBC of a register with itself.  This
means we can convert two conditional instructions into a single
non-conditional instruction.  Furthermore, in Thumb2 we can avoid the
need for an IT instruction as well.  This patch also fixes the remaining
testcase that we initially XFAILed in the first patch of this series.
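
To see why a single SBC suffices, here is a minimal C model of the
instruction, assuming the usual Arm convention that the carry flag is
set after CMP when no borrow occurred.  The helper names are
hypothetical.

```c
#include <stdint.h>

/* Model of SBC Rd, Rn, Rm: Rd = Rn - Rm - (1 - C), with C the carry flag. */
static uint32_t sbc(uint32_t rn, uint32_t rm, unsigned carry)
{
    return rn - rm - (1u - carry);
}

/* Compute -(a < b) for unsigned a and b.  CMP a, b leaves C = (a >= b);
   subtracting a register from itself with SBC then yields 0 when C is
   set and all-ones when it is clear. */
static uint32_t neg_ltu(uint32_t a, uint32_t b)
{
    unsigned carry = a >= b;        /* carry flag after CMP a, b */
    uint32_t any = 0x12345678u;     /* the register's value does not matter */
    return sbc(any, any, carry);
}
```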

gcc:
        * config/arm/arm.md (negscc_borrow): New pattern.
        (mov_negscc): Don't split if the insn would match negscc_borrow.
        * config/arm/thumb2.md (thumb2_mov_negscc): Likewise.
        (thumb2_mov_negscc_strict_it): Likewise.

testsuite:
        * gcc.target/arm/negdi-3.c: Remove XFAIL markers.
---
 gcc/config/arm/arm.md                  | 14 ++++++++++++--
 gcc/config/arm/thumb2.md               |  8 ++++++--
 gcc/testsuite/gcc.target/arm/negdi-3.c |  8 ++++----
 3 files changed, 22 insertions(+), 8 deletions(-)


Attachment: 0012-arm-Implement-negscc-using-SBC-when-appropriate.patch (2K)

[PATCH 13/29] [arm] Add alternative canonicalizations for subtract-with-carry + shift

Richard Earnshaw (lists)
In reply to this post by Richard Earnshaw (lists)

This patch adds a couple of alternative canonicalizations to allow
combine to match a subtract-with-carry operation when one of the operands
is shifted first.  The most common case of this is when combining a
sign-extend of one operand with a long-long value during subtraction.
The RSC variant is only enabled for Arm, the SBC variant for any 32-bit
compilation.

        * config/arm/arm.md (subsi3_carryin_shift_alt): New pattern.
        (rsbsi3_carryin_shift_alt): Likewise.
---
 gcc/config/arm/arm.md | 34 ++++++++++++++++++++++++++++++++++
 1 file changed, 34 insertions(+)


Attachment: 0013-arm-Add-alternative-canonicalizations-for-subtract-w.patch (1K)

[PATCH 14/29] [arm] Early split simple DImode equality comparisons

Richard Earnshaw (lists)
In reply to this post by Richard Earnshaw (lists)

This is the first step of early splitting all the DImode comparison
operations.  We start by factoring the DImode handling out of
arm_gen_compare_reg into its own function.

Simple DImode equality comparisons (such as equality with zero, or
equality with a constant that is zero in one of the two word values
that it comprises) can be done using a single subtract followed by an
ORRS instruction.  This avoids the need for conditional execution.

For example, with a DImode value held in R0 (low word) and R1 (high
word), the test (value != 5) can be written as

        SUB Rt, R0, #5
        ORRS Rt, Rt, R1

The ORRS is now expanded using an SImode pattern that already exists
in the MD file and this gives the register allocator more freedom to
select registers (consecutive pairs are no longer required).
Furthermore, we can then delete the arm_cmpdi_zero pattern as it is
no longer required.  We use SUB for the value adjustment as this has a
generally more flexible range of immediates than XOR and what's more
has the opportunity to be relaxed in thumb2 to a 16-bit SUBS
instruction.
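
The SUB/ORRS sequence can be modelled in C as follows.  This is a
sketch that assumes the constant has a zero high word; the function
name is hypothetical.

```c
#include <stdint.h>
#include <stdbool.h>

/* Test (value != c) for a 64-bit value split into lo/hi, where the
   64-bit constant is c in the low word and zero in the high word. */
static bool di_ne_small_const(uint32_t lo, uint32_t hi, uint32_t c)
{
    uint32_t t = lo - c;    /* SUB Rt, R0, #c: zero iff the low word equals c */
    t |= hi;                /* ORRS Rt, Rt, R1: stays zero iff the high word is zero */
    return t != 0;          /* Z flag clear means "not equal" */
}
```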

        * config/arm/arm.c (arm_select_cc_mode): For DImode equality tests
        return CC_Zmode if comparing against a constant where one word is
        zero.
        (arm_gen_compare_reg): Split DImode handling to ...
        (arm_gen_dicompare_reg): ... here.  Handle equality comparisons
        against simple constants.
        * config/arm/arm.md (arm_cmpdi_zero): Delete pattern.
---
 gcc/config/arm/arm.c  | 87 +++++++++++++++++++++++++++++++++----------
 gcc/config/arm/arm.md | 11 ------
 2 files changed, 68 insertions(+), 30 deletions(-)


Attachment: 0014-arm-Early-split-simple-DImode-equality-comparisons.patch (5K)

[PATCH 15/29] [arm] Improve handling of DImode comparisons against constants.

Richard Earnshaw (lists)
In reply to this post by Richard Earnshaw (lists)

In almost all cases it is better to handle inequality comparisons against
constants by transforming comparisons of the form (reg <GT/LE/GTU/LEU> const)
into (reg <GE/LT/GEU/LTU> (const+1)).  However, there are many cases that we
could handle but previously failed to because we forced the constant into a
register too early in the pattern expansion.  To permit this we need to defer
forcing the constant into a register until after we've had the chance to do
the transform; in some cases that may even mean that we no longer need to
force the constant into a register at all.  For example, on Arm, the case:

_Bool f8 (unsigned long long a) { return a > 0xffffffff; }

previously compiled to

        mov     r3, #0
        cmp     r1, r3
        mvn     r2, #0
        cmpeq   r0, r2
        movhi   r0, #1
        movls   r0, #0
        bx      lr

But now compiles to

        cmp     r1, #1
        cmpeq   r0, #0
        movcs   r0, #1
        movcc   r0, #0
        bx      lr

Which although not yet completely optimal, is certainly better than
previously.
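
The rewrite behind this example can be sketched in C.  The function
names are hypothetical, and the adjusted form assumes the constant is
not the maximum value of the type, since const+1 would otherwise
wrap.

```c
#include <stdint.h>
#include <stdbool.h>

/* Original comparison: a > c (GTU). */
static bool gtu_original(uint64_t a, uint64_t c) { return a > c; }

/* Adjusted comparison: a >= c + 1 (GEU).  Only valid when c is not
   UINT64_MAX, because c + 1 would wrap to zero. */
static bool geu_adjusted(uint64_t a, uint64_t c) { return a >= c + 1; }

/* Two-word model of the new f8 sequence, testing a >= 0x100000000.
   CMP r1, #1 sets carry iff hi >= 1; the conditional CMPEQ r0, #0
   runs only when hi == 1 and always sets carry for an unsigned low
   word, so the low word cannot change the outcome. */
static bool f8_split(uint32_t lo, uint32_t hi)
{
    (void)lo;
    return hi >= 1;
}
```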

        * config/arm/arm.md (cbranchdi4): Accept reg_or_int_operand for
        operand 2.
        (cstoredi4): Similarly, but for operand 3.
        * config/arm/arm.c (arm_canonicalize_comparison): Allow canonicalization
        of unsigned compares with a constant on Arm.  Prefer using const+1 and
        adjusting the comparison over swapping the operands whenever the
        original constant was not valid.
        (arm_gen_dicompare_reg): If Y is not a valid operand, force it to a
        register here.
        (arm_validize_comparison): Do not force invalid DImode operands to
        registers here.
---
 gcc/config/arm/arm.c  | 37 +++++++++++++++++++++++--------------
 gcc/config/arm/arm.md |  4 ++--
 2 files changed, 25 insertions(+), 16 deletions(-)


Attachment: 0015-arm-Improve-handling-of-DImode-comparisions-against-.patch (4K)

[PATCH 16/29] [arm] early split most DImode comparison operations.

Richard Earnshaw (lists)
In reply to this post by Richard Earnshaw (lists)

This patch does most of the work for early splitting the DImode
comparisons.  We now handle LT, GE, LTU and GEU during early
expansion, in addition to EQ and NE, for which the expansion has now
been reworked to use a standard conditional-compare pattern already in
the back-end.

To handle this we introduce two new condition flag modes that are used
when comparing the upper words of decomposed DImode values: one for
signed, and one for unsigned comparisons.  CC_Bmode (B for Borrow) is
essentially the inverse of CC_Cmode and is used when the carry flag is
set by a subtraction of unsigned values.
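
As an illustration of the unsigned case, the following C sketch
models an LTU test on two-word values as a compare of the low words
followed by a subtract-with-borrow of the high words.  The function
name and the widening trick used to observe the borrow are choices
made for this example only.

```c
#include <stdint.h>
#include <stdbool.h>

/* Model of (a < b) for unsigned 64-bit values held as lo/hi words. */
static bool di_ltu_split(uint32_t alo, uint32_t ahi,
                         uint32_t blo, uint32_t bhi)
{
    /* Compare of the low words: a borrow is produced iff alo < blo. */
    uint32_t borrow = alo < blo;

    /* Subtract-with-borrow of the high words.  Doing the arithmetic in
       64 bits lets us see whether the 32-bit result would have wrapped. */
    uint64_t wide = (uint64_t)ahi - bhi - borrow;

    /* Any bits above bit 31 mean the subtraction borrowed, i.e. a < b. */
    return (wide >> 32) != 0;
}
```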

        * config/arm/arm-modes.def (CC_NV, CC_B): New CC modes.
        * config/arm/arm.c (arm_select_cc_mode): Recognize constructs that
        need these modes.
        (arm_gen_dicompare_reg): New code to early expand the sub-operations
        of EQ, NE, LT, GE, LTU and GEU.
        * config/arm/iterators.md (CC_EXTEND): New code attribute.
        * config/arm/predicates.md (arm_adcimm_operand): New predicate.
        * config/arm/arm.md (cmpsi3_carryin_<CC_EXTEND>out): New pattern.
        (cmpsi3_imm_carryin_<CC_EXTEND>out): Likewise.
        (cmpsi3_0_carryin_<CC_EXTEND>out): Likewise.
---
 gcc/config/arm/arm-modes.def |   6 +
 gcc/config/arm/arm.c         | 220 ++++++++++++++++++++++++++++++++++-
 gcc/config/arm/arm.md        |  45 +++++++
 gcc/config/arm/iterators.md  |   4 +
 gcc/config/arm/predicates.md |   6 +
 5 files changed, 278 insertions(+), 3 deletions(-)


Attachment: 0016-arm-early-split-most-DImode-comparison-operations.patch (12K)

[PATCH 17/29] [arm] Handle some constant comparisons using rsbs+rscs

Richard Earnshaw (lists)
In reply to this post by Richard Earnshaw (lists)

In a small number of cases it is preferable to handle comparisons with
constants using the sequence

        RSBS tmp, Xlo, constlo
        RSCS tmp, Xhi, consthi

which allows us to handle a small number of LE/GT/LEU/GTU cases when
changing the code to use LT/GE/LTU/GEU would make the constant more
expensive.  Sadly, we cannot do this on Thumb, since we need RSC, so we
now always use the incremented constant in that case since normally that
still works out cheaper than forcing the entire constant into a register.
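
A C sketch of the reverse-subtract idea for the unsigned LEU case,
assuming the same two-word model as the instruction sequence above
(the function name is hypothetical): X <= const holds exactly when
const - X produces no borrow.

```c
#include <stdint.h>
#include <stdbool.h>

/* Model of (X <= const) for unsigned 64-bit values held as lo/hi words,
   computed as const - X with the operands reversed. */
static bool di_leu_const_rsb(uint32_t xlo, uint32_t xhi,
                             uint32_t clo, uint32_t chi)
{
    /* RSBS tmp, Xlo, #clo: computes clo - xlo, borrowing iff clo < xlo. */
    uint32_t borrow = clo < xlo;

    /* RSCS tmp, Xhi, #chi: computes chi - xhi - borrow.  Widening to
       64 bits exposes whether the 32-bit subtraction would wrap. */
    uint64_t wide = (uint64_t)chi - xhi - borrow;

    /* No borrow out means const >= X, which is X <= const. */
    return (wide >> 32) == 0;
}
```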

Further investigation has also shown that the canonicalization of a
reverse subtract and compare is valid for signed as well as unsigned values,
so we relax the restriction on selecting CC_RSBmode to allow all types
of compare.

        * config/arm/arm.c (arm_const_double_prefer_rsbs_rsc): New function.
        (arm_canonicalize_comparison): For GT/LE/GTU/LEU, use the constant
        unchanged only if that will be cheaper.
        (arm_select_cc_mode): Recognize a swapped comparison that will
        be regenerated using RSBS or RSCS.  Relax restriction on selecting
        CC_RSBmode.
        (arm_gen_dicompare_reg): Handle LE/GT/LEU/GTU comparisons against
        a constant.
        (arm_gen_compare_reg): Handle compare (CONST, X) when the mode
        is CC_RSBmode.
        (maybe_get_arm_condition_code): CC_RSBmode now returns the same codes
        as CCmode.
        * config/arm/arm.md (rsb_imm_compare_scratch): New pattern.
        (rscsi3_<CC_EXTEND>out_scratch): New pattern.
---
 gcc/config/arm/arm.c  | 153 +++++++++++++++++++++++++++++-------------
 gcc/config/arm/arm.md |  27 ++++++++
 2 files changed, 134 insertions(+), 46 deletions(-)


Attachment: 0017-arm-Handle-some-constant-comparisons-using-rsbs-rscs.patch (11K)

[PATCH 18/29] [arm] Cleanup dead code - old support for DImode comparisons

Richard Earnshaw (lists)
In reply to this post by Richard Earnshaw (lists)

Now that all the major patterns for DImode have been converted to
early expansion, we can safely clean up some dead code for the old way
of handling DImode.

        * config/arm/arm-modes.def (CC_NCV, CC_CZ): Delete CC modes.
        * config/arm/arm.c (arm_select_cc_mode): Remove old selection code
        for DImode operands.
        (arm_gen_dicompare_reg): Remove unreachable expansion code.
        (maybe_get_arm_condition_code): Remove support for CC_CZmode and
        CC_NCVmode.
        * config/arm/arm.md (arm_cmpdi_insn): Delete.
        (arm_cmpdi_unsigned): Delete.
---
 gcc/config/arm/arm-modes.def |   5 --
 gcc/config/arm/arm.c         | 147 +----------------------------------
 gcc/config/arm/arm.md        |  45 -----------
 3 files changed, 1 insertion(+), 196 deletions(-)


Attachment: 0018-arm-Cleanup-dead-code-old-support-for-DImode-compari.patch (8K)

[PATCH 19/29] [arm] Handle immediate values in uaddvsi4

Richard Earnshaw (lists)
In reply to this post by Richard Earnshaw (lists)

The uaddv patterns in the arm back-end do not currently handle immediates
during expansion.  This patch adds this support for uaddvsi4.  It's really
a stepping-stone towards early expansion of uaddvdi4, but it is complete and
a useful change in its own right.
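
For context, the kind of source that exercises an unsigned
add-with-overflow against an immediate looks like the sketch below.
It uses the GCC builtin __builtin_add_overflow; the function names and
the constant 16 are arbitrary choices for the example, and whether a
given compiler version routes this through uaddvsi4 is an assumption
rather than something shown here.  The second function spells out the
same check as an LTU test on the sum.

```c
#include <stdint.h>
#include <stdbool.h>

/* Unsigned 32-bit add of an immediate with overflow detection,
   written with the GCC builtin. */
static bool add_imm_overflows(uint32_t x, uint32_t *sum)
{
    return __builtin_add_overflow(x, 16u, sum);
}

/* The same check written by hand: the addition carried out iff the
   wrapped sum is smaller than the original operand (an LTU test). */
static bool add_imm_overflows_ltu(uint32_t x, uint32_t *sum)
{
    *sum = x + 16u;
    return *sum < x;
}
```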

Whilst making this change I also observed that we really had two patterns
that did exactly the same thing, but with slightly different properties;
consequently I've cleaned up all of the add-and-compare patterns to bring
some consistency.

        * config/arm/arm.md (adddi3): Call gen_addsi3_compare_op1.
        (uaddv<mode>4): Delete expansion pattern.
        (uaddvsi4): New pattern.
        (uaddvdi4): Likewise.
        (addsi3_compareC): Delete pattern, change callers to use
        addsi3_compare_op1.
        (addsi3_compare_op1): No-longer anonymous.  Clean up constraints to
        reduce the number of alternatives and re-work type attribute handling.
        (addsi3_compare_op2): Clean up constraints to reduce the number of
        alternatives and re-work type attribute handling.
        (compare_addsi2_op0): Likewise.
        (compare_addsi2_op1): Likewise.
---
 gcc/config/arm/arm.md | 118 ++++++++++++++++++++++--------------------
 1 file changed, 62 insertions(+), 56 deletions(-)


Attachment: 0019-arm-Handle-immediate-values-in-uaddvsi4.patch (7K)