https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88784
Bug ID: 88784 Summary: Middle end is missing some optimizations about unsigned Product: gcc Version: 9.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: middle-end Assignee: unassigned at gcc dot gnu.org Reporter: helijia at gcc dot gnu.org Target Milestone: --- For both operands are unsigned, the following optimizations are valid, and missing: 1. X > Y && X != 0 --> X > Y 2. X > Y || X != 0 --> X != 0 3. X <= Y || X != 0 --> true 4. X <= Y || X == 0 --> X <= Y 5. X > Y && X == 0 --> false unsigned foo(unsigned x, unsigned y) { return x > y && x != 0; } should fold to x > y, but I found we haven't done it right now. I compile the code with the following command. g++ unsigned.cpp -Ofast -c -S -o unsigned.s -fdump-tree-all |
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88784
Richard Biener <rguenth at gcc dot gnu.org> changed: What |Removed |Added ---------------------------------------------------------------------------- Keywords| |missed-optimization Status|UNCONFIRMED |NEW Last reconfirmed| |2019-01-10 Ever confirmed|0 |1 --- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> --- Confirmed. Similar for singed and X > Y && X != INT_MIN, etc. ifcombine is one place to fix (maybe_fold_and_comparisons and friends), match.pd in case this gets lowered to BIT_AND/IOR. |
In reply to this post by marxin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88784
Qi Feng <ffengqi at gcc dot gnu.org> changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |ffengqi at gcc dot gnu.org --- Comment #2 from Qi Feng <ffengqi at gcc dot gnu.org> --- I tried some modifications in and_comparisons_1 and or_comparisons_1. It seems that `||' is transformed to `&&' somewhere. And that makes the changes in or_comparisons_1 noneffective. Should I find the place where the transformation happens and make changes before that? |
In reply to this post by marxin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88784
--- Comment #3 from Qi Feng <ffengqi at gcc dot gnu.org> --- I have extended the transformations as following, the first five are the original ones: * unsigned Use UINT_MAX for demonstration, similar for UCHAR_MAX, USHRT_MAX, UINT_MAX, ULONG_MAX, ULLONG_MAX x > y && x != 0 --> x > y x > y || x != 0 --> x != 0 x <= y || x != 0 --> true x <= y || x == 0 --> x <= y x > y && x == 0 --> false x < y && x != UINT_MAX --> x < y x < y || x != UINT_MAX --> x != UINT_MAX x >= y || x != UINT_MAX --> true x >= y || x == UINT_MAX --> x >= y x < y && x == UINT_MAX --> false * signed Use INT_MIN, INT_MAX for demonstration, similar for SCHAR_MIN, SHRT_MIN, INT_MIN, LONG_MIN, LLONG_MIN, SCHAR_MAX, SHRT_MAX, INT_MAX, LONG_MAX, LLONG_MAX x > y && x != INT_MIN --> x > y x > y || x != INT_MIN --> x != INT_MIN x <= y || x != INT_MIN --> true x <= y || x == INT_MIN --> x <= y x > y && x == INT_MIN --> false x < y && x != INT_MAX --> x < y x < y || x != INT_MAX --> x != UINT_MAX x >= y || x != INT_MAX --> true x >= y || x == INT_MAX --> x >= y x < y && x == INT_MAX --> false Want to know if I missed something. |
In reply to this post by marxin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88784
--- Comment #4 from Qi Feng <ffengqi at gcc dot gnu.org> --- The fourth to the last should be: x < y || x != INT_MAX --> x != UINT_MAX sorry for the typo. |
In reply to this post by marxin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88784
--- Comment #5 from Qi Feng <ffengqi at gcc dot gnu.org> --- (In reply to Qi Feng from comment #4) > The fourth to the last should be: > > x < y || x != INT_MAX --> x != UINT_MAX > > sorry for the typo. x < y || x != INT_MAX --> x != INT_MAX typo again... |
In reply to this post by marxin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88784
--- Comment #6 from Fredrik Hederstierna <[hidden email]> --- Created attachment 46397 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=46397&action=edit Some more patterns Looking into this I found some more places where it seems to be non-optimal code, maybe separate issue, but these are also example of equal evaluation for unsigned types ? Test 1 (x > y) || (x > (y / N)) equal to (x > (y / N)) Test 2 (x > y) || (x > (y >> N)) equal to (x > (y >> N)) Test 3 (x > y) && (x > (y / N)) equal to (x > y) Test 4 (x > y) && (x > (y >> N)) equal to (x > y) One thing to consider here maybe that depending on optimizing for size or speed, then the order of evaluation can be changed, so like if some operation is costy, then it could be avoided to obtain higher speed possibly assuming it will accept arguments prior in list I guess. But when optimizing for size, then I think always the more simplified expression would apply? Example for arm using above expressions, (code attached) 00000000 <test_xy_1_org>: 0: b510 push {r4, lr} 2: 000b movs r3, r1 4: 0004 movs r4, r0 6: 2001 movs r0, #1 8: 428c cmp r4, r1 a: d807 bhi.n 1c <test_xy_1_org+0x1c> c: 2103 movs r1, #3 e: 0018 movs r0, r3 10: f7ff fffe bl 0 <__aeabi_uidiv> 14: b2c0 uxtb r0, r0 16: 42a0 cmp r0, r4 18: 4180 sbcs r0, r0 1a: 4240 negs r0, r0 1c: bd10 pop {r4, pc} 0000001e <test_xy_1_equ>: 1e: b510 push {r4, lr} 20: 0004 movs r4, r0 22: 0008 movs r0, r1 24: 2103 movs r1, #3 26: f7ff fffe bl 0 <__aeabi_uidiv> 2a: b2c0 uxtb r0, r0 2c: 42a0 cmp r0, r4 2e: 4180 sbcs r0, r0 30: 4240 negs r0, r0 32: bd10 pop {r4, pc} 00000034 <test_xy_2_org>: 34: 0003 movs r3, r0 36: 2001 movs r0, #1 38: 428b cmp r3, r1 3a: d803 bhi.n 44 <test_xy_2_org+0x10> 3c: 08c9 lsrs r1, r1, #3 3e: 4299 cmp r1, r3 40: 4189 sbcs r1, r1 42: 4248 negs r0, r1 44: 4770 bx lr 00000046 <test_xy_2_equ>: 46: 08c9 lsrs r1, r1, #3 48: 4281 cmp r1, r0 4a: 4180 sbcs r0, r0 4c: 4240 negs r0, r0 4e: 4770 bx lr 00000050 <test_xy_3_org>: 50: b510 push {r4, lr} 52: 000b movs r3, r1 54: 0004 movs r4, r0 56: 2000 movs r0, #0 58: 428c cmp r4, r1 5a: d907 bls.n 6c <test_xy_3_org+0x1c> 5c: 2103 movs r1, #3 5e: 0018 movs r0, r3 60: f7ff fffe bl 0 <__aeabi_uidiv> 64: b2c0 uxtb r0, r0 66: 42a0 cmp r0, r4 68: 4180 sbcs r0, r0 6a: 4240 negs r0, r0 6c: bd10 pop {r4, pc} 0000006e <test_xy_3_equ>: 6e: 4281 cmp r1, r0 70: 4180 sbcs r0, r0 72: 4240 negs r0, r0 74: 4770 bx lr 00000076 <test_xy_4_org>: 76: 0003 movs r3, r0 78: 2000 movs r0, #0 7a: 428b cmp r3, r1 7c: d903 bls.n 86 <test_xy_4_org+0x10> 7e: 08c9 lsrs r1, r1, #3 80: 4299 cmp r1, r3 82: 4189 sbcs r1, r1 84: 4248 negs r0, r1 86: 4770 bx lr 00000088 <test_xy_4_equ>: 88: 4281 cmp r1, r0 8a: 4180 sbcs r0, r0 8c: 4240 negs r0, r0 8e: 4770 bx lr |
In reply to this post by marxin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88784
--- Comment #7 from Qi Feng <ffengqi at gcc dot gnu.org> --- I add some patterns in match.pd which handles the original 5 transformations. But I don't the language used in match.pd well, the patterns I wrote are very similar. And I haven't found predicates for constant values other than zero (INT_MIN, LONG_MIN, LLONG_MIN etc.). |
In reply to this post by marxin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88784
--- Comment #8 from rguenther at suse dot de <rguenther at suse dot de> --- On Wed, 22 May 2019, ffengqi at gcc dot gnu.org wrote: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88784 > > --- Comment #7 from Qi Feng <ffengqi at gcc dot gnu.org> --- > I add some patterns in match.pd which handles the original 5 transformations. > But I don't the language used in match.pd well, the patterns I wrote are very > similar. Sometimes iteration helps but sometimes it obfuscates things too much. match.pd also runs through the preprocessor (OK, I didn't really suggest to use macros to merge things ;)) > And I haven't found predicates for constant values other than zero (INT_MIN, > LONG_MIN, LLONG_MIN etc.). wi:eq_p (wi::to_wide (@1), wi::max_value (TYPE_PRECISION (TREE_TYPE (@1)), SIGNED)) (ok, a bit long...) |
In reply to this post by marxin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88784
--- Comment #9 from Qi Feng <ffengqi at gcc dot gnu.org> --- And there's another problem. Take `x > y && x != 0 --> x > y' for example, I would also like to do x < y && y != 0 --> x < y x != 0 && x > y --> x > y y != 0 && x < y --> x < y If the constant always comes in as the second operand is incorrect, these would have to be doubled. I tried to add :c to truth_andif, but got the `operation is not commutative' error. I also tried to make truth_andif commutative by modifying genmatch.c, but again, I don't know it well, afraid that I would break something. The patterns I wrote looks like: /* x > y && x != 0 --> x > y Only for unsigned x and y. */ (simplify (truth_andif:c (gt@2 @0 @1) (ne @0 integer_zerop)) (if (INTEGRAL_TYPE_P (TREE_TYPE(@0)) && TYPE_UNSIGNED (TREE_TYPE(@0)) && INTEGRAL_TYPE_P (TREE_TYPE(@1)) && TYPE_UNSIGNED (TREE_TYPE(@1))) @2)) I have to wrote 4 of this with minor modification for a single transformation. If there's better way to do it, please do leave a comment. |
In reply to this post by marxin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88784
--- Comment #10 from Richard Biener <rguenth at gcc dot gnu.org> --- (In reply to Qi Feng from comment #9) > And there's another problem. Take `x > y && x != 0 --> x > y' for > example, I would also like to do > > x < y && y != 0 --> x < y > x != 0 && x > y --> x > y > y != 0 && x < y --> x < y > > If the constant always comes in as the second operand is incorrect, these > would have to be doubled. > > I tried to add :c to truth_andif, but got the `operation is not commutative' > error. I also tried to make truth_andif commutative by modifying > genmatch.c, but again, I don't know it well, afraid that I would break > something. > > The patterns I wrote looks like: > > /* x > y && x != 0 --> x > y > Only for unsigned x and y. */ > (simplify > (truth_andif:c (gt@2 @0 @1) (ne @0 integer_zerop)) > (if (INTEGRAL_TYPE_P (TREE_TYPE(@0)) && TYPE_UNSIGNED (TREE_TYPE(@0)) > && INTEGRAL_TYPE_P (TREE_TYPE(@1)) && TYPE_UNSIGNED > (TREE_TYPE(@1))) > @2)) > > I have to wrote 4 of this with minor modification for a single > transformation. If there's better way to do it, please do leave a comment. I think first of all you do _not_ want to use truth_andif since that will only prevail iff x or y have side-effects. To match on GIMPLE you want bit_and instead/as well since all truth_ stuff doesn't prevail there. And obviously truth_andif is _not_ commutative. You can use :C if you want to force it though. Both truth_and and bit_and are commutative. So sth like (for and (truth_and bit_and) (for ltgtop (lt le) (simplify (and:c (ltgtop:c@2 @0 @1) (ne @0 integer_zerop)) (if (...) @2))) should cover all of x < y && y != 0 --> x < y x != 0 && x > y --> x > y y != 0 && x < y --> x < y x < y && y != 0 --> x < y note that from (and (lt:c@2 @0 @1) (ne @0 integer_zerop)) we generate (and (lt@2 @0 @1) (ne @0 integer_zerop)) (and (gt@2 @1 @0) (ne @0 integer_zerop)) so :c will ensure the semantically same operation will be present with swapped operands. As opposed to :C which would do lt@2 @1 @0. |
In reply to this post by marxin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88784
--- Comment #11 from Qi Feng <ffengqi at gcc dot gnu.org> --- I tried 2 patterns for the following test /* 1. x > y && x != 0 --> x > y */ /* 2. y < x && x != 0 --> y < x */ /* 3. x != 0 && x > y --> x > y */ /* 4. x != 0 && y < x --> y < x */ _Bool f1 (unsigned x, unsigned y) { return x > y && x != 0; } _Bool f2 (unsigned x, unsigned y) { return y < x && x != 0; } _Bool f3 (unsigned x, unsigned y) { return x != 0 && x > y; } _Bool f4 (unsigned x, unsigned y) { return x != 0 && y < x; } The first one is (for and (truth_and bit_and) (simplify (and:c (gt:c@2 @0 @1) (ne @0 integer_zerop)) (if (INTEGRAL_TYPE_P (TREE_TYPE(@0)) && TYPE_UNSIGNED (TREE_TYPE(@0)) && INTEGRAL_TYPE_P (TREE_TYPE(@1)) && TYPE_UNSIGNED (TREE_TYPE(@1))) @2))) The transformations did not happen as I checked the dump files of -fdump-tree-{original,optimized}. And the second one is (simplify (truth_andif:C (gt:c@2 @0 @1) (ne @0 integer_zerop)) (if (INTEGRAL_TYPE_P (TREE_TYPE(@0)) && TYPE_UNSIGNED (TREE_TYPE(@0)) && INTEGRAL_TYPE_P (TREE_TYPE(@1)) && TYPE_UNSIGNED (TREE_TYPE(@1))) @2)) The transformations did happen for all the 4 functions. And the dump of -fdump-tree-original showes they already happened, so I guess the pattern is matched before the process get to GIMPLE. |
In reply to this post by marxin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88784
--- Comment #12 from Marc Glisse <glisse at gcc dot gnu.org> --- (In reply to Qi Feng from comment #11) > I tried 2 patterns for the following test > > /* 1. x > y && x != 0 --> x > y */ > /* 2. y < x && x != 0 --> y < x */ > /* 3. x != 0 && x > y --> x > y */ > /* 4. x != 0 && y < x --> y < x */ > > _Bool f1 (unsigned x, unsigned y) > { > return x > y && x != 0; > } > > _Bool f2 (unsigned x, unsigned y) > { > return y < x && x != 0; > } > > _Bool f3 (unsigned x, unsigned y) > { > return x != 0 && x > y; > } > > _Bool f4 (unsigned x, unsigned y) > { > return x != 0 && y < x; > } > > The first one is > > (for and (truth_and bit_and) > (simplify > (and:c (gt:c@2 @0 @1) (ne @0 integer_zerop)) > (if (INTEGRAL_TYPE_P (TREE_TYPE(@0)) && TYPE_UNSIGNED (TREE_TYPE(@0)) > && INTEGRAL_TYPE_P (TREE_TYPE(@1)) && TYPE_UNSIGNED > (TREE_TYPE(@1))) > @2))) > > The transformations did not happen as I checked the dump files of > -fdump-tree-{original,optimized}. It isn't supposed to be done in original anyway. It does work in optimized (even forwprop1) for me. Did you forget to pass -O? Did you look at some old dump file? (I think you could use ANY_INTEGRAL_TYPE_P, this seems valid for vectors) |
In reply to this post by marxin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88784
--- Comment #13 from rguenther at suse dot de <rguenther at suse dot de> --- On Fri, 24 May 2019, glisse at gcc dot gnu.org wrote: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88784 > > --- Comment #12 from Marc Glisse <glisse at gcc dot gnu.org> --- > (In reply to Qi Feng from comment #11) > > I tried 2 patterns for the following test > > > > /* 1. x > y && x != 0 --> x > y */ > > /* 2. y < x && x != 0 --> y < x */ > > /* 3. x != 0 && x > y --> x > y */ > > /* 4. x != 0 && y < x --> y < x */ > > > > _Bool f1 (unsigned x, unsigned y) > > { > > return x > y && x != 0; > > } > > > > _Bool f2 (unsigned x, unsigned y) > > { > > return y < x && x != 0; > > } > > > > _Bool f3 (unsigned x, unsigned y) > > { > > return x != 0 && x > y; > > } > > > > _Bool f4 (unsigned x, unsigned y) > > { > > return x != 0 && y < x; > > } > > > > The first one is > > > > (for and (truth_and bit_and) > > (simplify > > (and:c (gt:c@2 @0 @1) (ne @0 integer_zerop)) > > (if (INTEGRAL_TYPE_P (TREE_TYPE(@0)) && TYPE_UNSIGNED (TREE_TYPE(@0)) > > && INTEGRAL_TYPE_P (TREE_TYPE(@1)) && TYPE_UNSIGNED > > (TREE_TYPE(@1))) > > @2))) > > > > The transformations did not happen as I checked the dump files of > > -fdump-tree-{original,optimized}. > > It isn't supposed to be done in original anyway. It does work in optimized > (even forwprop1) for me. Did you forget to pass -O? Did you look at some old > dump file? > > (I think you could use ANY_INTEGRAL_TYPE_P, this seems valid for vectors) I would have expected fold to first change the TRUTH_ANDIF to a TRUTH_AND and then your pattern match. But maybe I misremember that we have such transformation for cases where the 2nd operand doesn't have side-effects. While genmatch inserts checks for and rejects operands with side-effects I still wouldn't use truth_andif here. As Marc says, expecting the transform in .original is probably premature. OTOH whether it is applicable on GIMPLE in the end might depend on LOGICAL_OP_NON_SHORT_CIRCUIT and BRANCH_COST. |
In reply to this post by marxin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88784
--- Comment #14 from Qi Feng <ffengqi at gcc dot gnu.org> --- Checking .original and .optimized file is just a quick method I use to check whether an optimization happened (if not happen in first and last pass, probably it didn't happen). I didn't mean or think the transform should happen in these passes. Using the second pattern, the result of transform already showed in .original file, so I think it maybe happened earlier (like in GENERIC). And I tried the first pattern again using a totally clean debug build (bootstrap disabled), still didn't see the transformation happen. I don't know what went wrong. :( (I tested it on a ppc64le machine, don't know if that matters) |
In reply to this post by marxin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88784
--- Comment #15 from rguenther at suse dot de <rguenther at suse dot de> --- On Fri, 24 May 2019, ffengqi at gcc dot gnu.org wrote: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88784 > > --- Comment #14 from Qi Feng <ffengqi at gcc dot gnu.org> --- > Checking .original and .optimized file is just a quick method I use to check > whether an optimization happened (if not happen in first and last pass, > probably it didn't happen). I didn't mean or think the transform should happen > in these passes. > > Using the second pattern, the result of transform already showed in .original > file, so I think it maybe happened earlier (like in GENERIC). > > And I tried the first pattern again using a totally clean debug build > (bootstrap disabled), still didn't see the transformation happen. I don't know > what went wrong. :( > > (I tested it on a ppc64le machine, don't know if that matters) It can matter because we might have gimplified the && to control flow. |
In reply to this post by marxin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88784
--- Comment #16 from Marc Glisse <glisse at gcc dot gnu.org> --- (In reply to [hidden email] from comment #15) > It can matter because we might have gimplified the && to > control flow. Indeed. You want to look at the forwprop1 dump to see what gimple-match would need to match (and does match on x86_64). When X > Y && X != 0 is represented with control flow, in theory, VRP could deduce a range [1,inf] for X from X>Y, but in practice it uses a symbolic range [y_3(D) + 1, +INF] from which we fail to deduce X!=0. I think the new ranger thing that doesn't try to do symbolic ranges would handle that better ;-) |
In reply to this post by marxin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88784
--- Comment #17 from rguenther at suse dot de <rguenther at suse dot de> --- On Fri, 24 May 2019, glisse at gcc dot gnu.org wrote: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88784 > > --- Comment #16 from Marc Glisse <glisse at gcc dot gnu.org> --- > (In reply to [hidden email] from comment #15) > > It can matter because we might have gimplified the && to > > control flow. > > Indeed. You want to look at the forwprop1 dump to see what gimple-match would > need to match (and does match on x86_64). > > When X > Y && X != 0 is represented with control flow, in theory, VRP could > deduce a range [1,inf] for X from X>Y, but in practice it uses a symbolic range > [y_3(D) + 1, +INF] from which we fail to deduce X!=0. I think the new ranger > thing that doesn't try to do symbolic ranges would handle that better ;-) Note that in theory the ifcombine pass should pick up the opportunity to simplify the comparison via maybe_fold_and_comparisons. It looks like that code is quite old and could need a rewrite to match-and-simplify. But of course it exists to avoid creating trees for the combination parts which still has a point in match-and-simplify times. An opportunity for an alternate code-generator from match.pd since the patterns are still valid sources for the transform just the input is bigger exploded forms. |
In reply to this post by marxin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88784
--- Comment #18 from Qi Feng <ffengqi at gcc dot gnu.org> --- I only learned gcc for about 2 months, and I have to say that I don't fully understand what you guys were saying. Is match.pd the right place to fix this issue? Am I in the wrong direction? |
In reply to this post by marxin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88784
--- Comment #19 from rguenther at suse dot de <rguenther at suse dot de> --- On Mon, 27 May 2019, ffengqi at gcc dot gnu.org wrote: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88784 > > --- Comment #18 from Qi Feng <ffengqi at gcc dot gnu.org> --- > I only learned gcc for about 2 months, and I have to say that I don't fully > understand what you guys were saying. Is match.pd the right place to fix this > issue? Am I in the wrong direction? Yes, match.pd is the correct place to add this kind of simplifications. No, please don't use *_{and,or}if codes there if possible. Yes, as I said upthread it might be necessary to rewrite some of GCCs infrastructure to get the pattern applied in all situations. |
Free forum by Nabble | Edit this page |