PR83750: CSE erf/erfc pair

classic Classic list List threaded Threaded
14 messages Options
Reply | Threaded
Open this post in threaded view
|

PR83750: CSE erf/erfc pair

Prathamesh Kulkarni-2
Hi,
This patch adds two transforms to match.pd to CSE erf/erfc pair.
erfc(x) is canonicalized to 1 - erf(x) and is then reversed to 1 -
erf(x) when canonicalization is disabled and result of erf(x) has
single use within 1 - erf(x).

The patch regressed builtin-nonneg-1.c. The following test-case
reproduces the issue with patch:

void test(double d1) {
  if (signbit(erfc(d1)))
    link_failure_erfc();
}

ssa dump:

  <bb 2> :
  _5 = __builtin_erf (d1_4(D));
  _1 = 1.0e+0 - _5;
  _6 = _1 < 0.0;
  _2 = (int) _6;
  if (_2 != 0)
    goto <bb 3>; [INV]
  else
    goto <bb 4>; [INV]

  <bb 3> :
  link_failure_erfc ();

  <bb 4> :
  return;

As can be seen, erfc(d1) is folded to 1 - erf(d1).
forwprop then transforms the if condition from _2 != 0
to _5 > 1.0e+0 and that defeats DCE thus resulting in link failure
in undefined reference to link_failure_erfc().

So, the patch adds another transform erf(x) > 1 -> 0
which resolves the regression.

Bootstrapped+tested on x86_64-unknown-linux-gnu.
Cross-testing on arm and aarch64 variants in progress.
OK for trunk if passes ?

Thanks,
Prathamesh

pr83750-4.txt (3K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: PR83750: CSE erf/erfc pair

Ulrich Drepper-3
On Fri, Nov 2, 2018 at 10:36 AM Prathamesh Kulkarni
<[hidden email]> wrote:
> So, the patch adds another transform erf(x) > 1 -> 0
> which resolves the regression.

Why don't you match for any constant with absolute value >= 1.0
instead of just 1.0?
Reply | Threaded
Open this post in threaded view
|

Re: PR83750: CSE erf/erfc pair

Jeff Law
In reply to this post by Prathamesh Kulkarni-2
On 11/2/18 3:36 AM, Prathamesh Kulkarni wrote:

> Hi,
> This patch adds two transforms to match.pd to CSE erf/erfc pair.
> erfc(x) is canonicalized to 1 - erf(x) and is then reversed to 1 -
> erf(x) when canonicalization is disabled and result of erf(x) has
> single use within 1 - erf(x).
>
> The patch regressed builtin-nonneg-1.c. The following test-case
> reproduces the issue with patch:
>
> void test(double d1) {
>   if (signbit(erfc(d1)))
>     link_failure_erfc();
> }
>
> ssa dump:
>
>   <bb 2> :
>   _5 = __builtin_erf (d1_4(D));
>   _1 = 1.0e+0 - _5;
>   _6 = _1 < 0.0;
>   _2 = (int) _6;
>   if (_2 != 0)
>     goto <bb 3>; [INV]
>   else
>     goto <bb 4>; [INV]
>
>   <bb 3> :
>   link_failure_erfc ();
>
>   <bb 4> :
>   return;
>
> As can be seen, erfc(d1) is folded to 1 - erf(d1).
> forwprop then transforms the if condition from _2 != 0
> to _5 > 1.0e+0 and that defeats DCE thus resulting in link failure
> in undefined reference to link_failure_erfc().
>
> So, the patch adds another transform erf(x) > 1 -> 0
> which resolves the regression.
>
> Bootstrapped+tested on x86_64-unknown-linux-gnu.
> Cross-testing on arm and aarch64 variants in progress.
> OK for trunk if passes ?
>
> Thanks,
> Prathamesh
>
>
> pr83750-4.txt
>
> 2018-11-02  Prathamesh Kulkarni  <[hidden email]>
>
> * match.pd (erfc(x) -> 1 - erf(x)): New pattern.
> (1 - erf(x) -> erfc(x)): Likewise.
> (erf(x) > 1 -> 0): Likewise.
>
> testsuite/
> * gcc.dg/tree-ssa/pr83750-1.c: New test
> * gcc.dg/tree-ssa/pr83750-2.c: Likewise.
Don't we have a flag specific to honoring nans?  Would that be better to
use than flag_unsafe_math_optimizations?  As Uli mentioned, there's
other cases (where ABS (const) >= 1.0.).

jeff
Reply | Threaded
Open this post in threaded view
|

Re: PR83750: CSE erf/erfc pair

Richard Biener-2
In reply to this post by Prathamesh Kulkarni-2
On Fri, Nov 2, 2018 at 10:37 AM Prathamesh Kulkarni
<[hidden email]> wrote:

>
> Hi,
> This patch adds two transforms to match.pd to CSE erf/erfc pair.
> erfc(x) is canonicalized to 1 - erf(x) and is then reversed to 1 -
> erf(x) when canonicalization is disabled and result of erf(x) has
> single use within 1 - erf(x).
>
> The patch regressed builtin-nonneg-1.c. The following test-case
> reproduces the issue with patch:
>
> void test(double d1) {
>   if (signbit(erfc(d1)))
>     link_failure_erfc();
> }
>
> ssa dump:
>
>   <bb 2> :
>   _5 = __builtin_erf (d1_4(D));
>   _1 = 1.0e+0 - _5;
>   _6 = _1 < 0.0;
>   _2 = (int) _6;
>   if (_2 != 0)
>     goto <bb 3>; [INV]
>   else
>     goto <bb 4>; [INV]
>
>   <bb 3> :
>   link_failure_erfc ();
>
>   <bb 4> :
>   return;
>
> As can be seen, erfc(d1) is folded to 1 - erf(d1).
> forwprop then transforms the if condition from _2 != 0
> to _5 > 1.0e+0 and that defeats DCE thus resulting in link failure
> in undefined reference to link_failure_erfc().
>
> So, the patch adds another transform erf(x) > 1 -> 0.

Ick.

Why not canonicalize erf (x) to 1-erfc(x) instead?

> which resolves the regression.
>
> Bootstrapped+tested on x86_64-unknown-linux-gnu.
> Cross-testing on arm and aarch64 variants in progress.
> OK for trunk if passes ?
>
> Thanks,
> Prathamesh
Reply | Threaded
Open this post in threaded view
|

Re: PR83750: CSE erf/erfc pair

Prathamesh Kulkarni-2
On Mon, 5 Nov 2018 at 15:10, Richard Biener <[hidden email]> wrote:

>
> On Fri, Nov 2, 2018 at 10:37 AM Prathamesh Kulkarni
> <[hidden email]> wrote:
> >
> > Hi,
> > This patch adds two transforms to match.pd to CSE erf/erfc pair.
> > erfc(x) is canonicalized to 1 - erf(x) and is then reversed to 1 -
> > erf(x) when canonicalization is disabled and result of erf(x) has
> > single use within 1 - erf(x).
> >
> > The patch regressed builtin-nonneg-1.c. The following test-case
> > reproduces the issue with patch:
> >
> > void test(double d1) {
> >   if (signbit(erfc(d1)))
> >     link_failure_erfc();
> > }
> >
> > ssa dump:
> >
> >   <bb 2> :
> >   _5 = __builtin_erf (d1_4(D));
> >   _1 = 1.0e+0 - _5;
> >   _6 = _1 < 0.0;
> >   _2 = (int) _6;
> >   if (_2 != 0)
> >     goto <bb 3>; [INV]
> >   else
> >     goto <bb 4>; [INV]
> >
> >   <bb 3> :
> >   link_failure_erfc ();
> >
> >   <bb 4> :
> >   return;
> >
> > As can be seen, erfc(d1) is folded to 1 - erf(d1).
> > forwprop then transforms the if condition from _2 != 0
> > to _5 > 1.0e+0 and that defeats DCE thus resulting in link failure
> > in undefined reference to link_failure_erfc().
> >
> > So, the patch adds another transform erf(x) > 1 -> 0.
>
> Ick.
>
> Why not canonicalize erf (x) to 1-erfc(x) instead?
Sorry I didn't quite follow, won't this cause similar issue with erf ?
I changed the pattern to canonicalize erf(x) -> 1 - erfc(x)
and 1 - erfc(x) -> erf(x) after canonicalization is disabled.

This caused undefined reference to link_failure_erf() in following test-case:

extern int signbit(double);
extern void link_failure_erf(void);
extern double erf(double);

void test(double d1) {
  if (signbit(erf(d1)))
    link_failure_erf();
}

forwprop1 shows:

   <bb 2> :
  _5 = __builtin_erfc (d1_4(D));
  _1 = 1.0e+0 - _5;
  _6 = _5 > 1.0e+0;
  _2 = (int) _6;
  if (_5 > 1.0e+0)
    goto <bb 3>; [INV]
  else
    goto <bb 4>; [INV]

  <bb 3> :
  link_failure_erf ();

  <bb 4> :
  return;

which defeats DCE to remove call to link_failure_erf.

Thanks,
Prathamesh
>
> > which resolves the regression.
> >
> > Bootstrapped+tested on x86_64-unknown-linux-gnu.
> > Cross-testing on arm and aarch64 variants in progress.
> > OK for trunk if passes ?
> >
> > Thanks,
> > Prathamesh
Reply | Threaded
Open this post in threaded view
|

Re: PR83750: CSE erf/erfc pair

Richard Biener-2
On Mon, Nov 5, 2018 at 1:11 PM Prathamesh Kulkarni
<[hidden email]> wrote:

>
> On Mon, 5 Nov 2018 at 15:10, Richard Biener <[hidden email]> wrote:
> >
> > On Fri, Nov 2, 2018 at 10:37 AM Prathamesh Kulkarni
> > <[hidden email]> wrote:
> > >
> > > Hi,
> > > This patch adds two transforms to match.pd to CSE erf/erfc pair.
> > > erfc(x) is canonicalized to 1 - erf(x) and is then reversed to 1 -
> > > erf(x) when canonicalization is disabled and result of erf(x) has
> > > single use within 1 - erf(x).
> > >
> > > The patch regressed builtin-nonneg-1.c. The following test-case
> > > reproduces the issue with patch:
> > >
> > > void test(double d1) {
> > >   if (signbit(erfc(d1)))
> > >     link_failure_erfc();
> > > }
> > >
> > > ssa dump:
> > >
> > >   <bb 2> :
> > >   _5 = __builtin_erf (d1_4(D));
> > >   _1 = 1.0e+0 - _5;
> > >   _6 = _1 < 0.0;
> > >   _2 = (int) _6;
> > >   if (_2 != 0)
> > >     goto <bb 3>; [INV]
> > >   else
> > >     goto <bb 4>; [INV]
> > >
> > >   <bb 3> :
> > >   link_failure_erfc ();
> > >
> > >   <bb 4> :
> > >   return;
> > >
> > > As can be seen, erfc(d1) is folded to 1 - erf(d1).
> > > forwprop then transforms the if condition from _2 != 0
> > > to _5 > 1.0e+0 and that defeats DCE thus resulting in link failure
> > > in undefined reference to link_failure_erfc().
> > >
> > > So, the patch adds another transform erf(x) > 1 -> 0.
> >
> > Ick.
> >
> > Why not canonicalize erf (x) to 1-erfc(x) instead?
> Sorry I didn't quite follow, won't this cause similar issue with erf ?
> I changed the pattern to canonicalize erf(x) -> 1 - erfc(x)
> and 1 - erfc(x) -> erf(x) after canonicalization is disabled.
>
> This caused undefined reference to link_failure_erf() in following test-case:
>
> extern int signbit(double);
> extern void link_failure_erf(void);
> extern double erf(double);
>
> void test(double d1) {
>   if (signbit(erf(d1)))
>     link_failure_erf();
> }

But that's already not optimized without any canonicalization
because erf returns sth in range [-1, 1].

I suggested the change because we have limited support for FP
value-ranges and nonnegative is one thing we can compute
(and erfc as opposed to erf is nonnegative).

> forwprop1 shows:
>
>    <bb 2> :
>   _5 = __builtin_erfc (d1_4(D));
>   _1 = 1.0e+0 - _5;
>   _6 = _5 > 1.0e+0;
>   _2 = (int) _6;
>   if (_5 > 1.0e+0)
>     goto <bb 3>; [INV]
>   else
>     goto <bb 4>; [INV]
>
>   <bb 3> :
>   link_failure_erf ();
>
>   <bb 4> :
>   return;
>
> which defeats DCE to remove call to link_failure_erf.
>
> Thanks,
> Prathamesh
> >
> > > which resolves the regression.
> > >
> > > Bootstrapped+tested on x86_64-unknown-linux-gnu.
> > > Cross-testing on arm and aarch64 variants in progress.
> > > OK for trunk if passes ?
> > >
> > > Thanks,
> > > Prathamesh
Reply | Threaded
Open this post in threaded view
|

Re: PR83750: CSE erf/erfc pair

Prathamesh Kulkarni-2
On Mon, 5 Nov 2018 at 18:14, Richard Biener <[hidden email]> wrote:

>
> On Mon, Nov 5, 2018 at 1:11 PM Prathamesh Kulkarni
> <[hidden email]> wrote:
> >
> > On Mon, 5 Nov 2018 at 15:10, Richard Biener <[hidden email]> wrote:
> > >
> > > On Fri, Nov 2, 2018 at 10:37 AM Prathamesh Kulkarni
> > > <[hidden email]> wrote:
> > > >
> > > > Hi,
> > > > This patch adds two transforms to match.pd to CSE erf/erfc pair.
> > > > erfc(x) is canonicalized to 1 - erf(x) and is then reversed to 1 -
> > > > erf(x) when canonicalization is disabled and result of erf(x) has
> > > > single use within 1 - erf(x).
> > > >
> > > > The patch regressed builtin-nonneg-1.c. The following test-case
> > > > reproduces the issue with patch:
> > > >
> > > > void test(double d1) {
> > > >   if (signbit(erfc(d1)))
> > > >     link_failure_erfc();
> > > > }
> > > >
> > > > ssa dump:
> > > >
> > > >   <bb 2> :
> > > >   _5 = __builtin_erf (d1_4(D));
> > > >   _1 = 1.0e+0 - _5;
> > > >   _6 = _1 < 0.0;
> > > >   _2 = (int) _6;
> > > >   if (_2 != 0)
> > > >     goto <bb 3>; [INV]
> > > >   else
> > > >     goto <bb 4>; [INV]
> > > >
> > > >   <bb 3> :
> > > >   link_failure_erfc ();
> > > >
> > > >   <bb 4> :
> > > >   return;
> > > >
> > > > As can be seen, erfc(d1) is folded to 1 - erf(d1).
> > > > forwprop then transforms the if condition from _2 != 0
> > > > to _5 > 1.0e+0 and that defeats DCE thus resulting in link failure
> > > > in undefined reference to link_failure_erfc().
> > > >
> > > > So, the patch adds another transform erf(x) > 1 -> 0.
> > >
> > > Ick.
> > >
> > > Why not canonicalize erf (x) to 1-erfc(x) instead?
> > Sorry I didn't quite follow, won't this cause similar issue with erf ?
> > I changed the pattern to canonicalize erf(x) -> 1 - erfc(x)
> > and 1 - erfc(x) -> erf(x) after canonicalization is disabled.
> >
> > This caused undefined reference to link_failure_erf() in following test-case:
> >
> > extern int signbit(double);
> > extern void link_failure_erf(void);
> > extern double erf(double);
> >
> > void test(double d1) {
> >   if (signbit(erf(d1)))
> >     link_failure_erf();
> > }
>
> But that's already not optimized without any canonicalization
> because erf returns sth in range [-1, 1].
>
> I suggested the change because we have limited support for FP
> value-ranges and nonnegative is one thing we can compute
> (and erfc as opposed to erf is nonnegative).
Ah right, thanks for the explanation.
Unfortunately this still regresses builtin-nonneg-1.c, which can be
reproduced with following test-case:

extern int signbit(double);
extern void link_failure_erf(void);
extern double erf(double);
extern double fabs(double);

void test(double d1) {
  if (signbit(erf(fabs(d1))))
    link_failure_erf();
}

signbit(erf(fabs(d1)) is transformed to 0 without patch but with patch
it gets canonicalized to signbit(1 - erfc(fabs(d1))) which similarly
defeats DCE.

forwprop1 shows:
<bb 2> :
  _1 = ABS_EXPR <d1_5(D)>;
  _6 = __builtin_erfc (_1);
  _2 = 1.0e+0 - _6;
  _7 = _6 > 1.0e+0;
  _3 = (int) _7;
  if (_6 > 1.0e+0)
    goto <bb 3>; [INV]
  else
    goto <bb 4>; [INV]

  <bb 3> :
  link_failure_erf ();

  <bb 4> :
  return;

I assume we would need to somehow tell gcc that the canonicalized
expression 1 - erfc(x) would not exceed 1.0 ?
Is there a better way to do that apart from defining pattern (1 -
erfc(x)) > 1.0 -> 0
which I agree doesn't look ideal to add in match.pd ?

Thanks
Prathamesh

>
> > forwprop1 shows:
> >
> >    <bb 2> :
> >   _5 = __builtin_erfc (d1_4(D));
> >   _1 = 1.0e+0 - _5;
> >   _6 = _5 > 1.0e+0;
> >   _2 = (int) _6;
> >   if (_5 > 1.0e+0)
> >     goto <bb 3>; [INV]
> >   else
> >     goto <bb 4>; [INV]
> >
> >   <bb 3> :
> >   link_failure_erf ();
> >
> >   <bb 4> :
> >   return;
> >
> > which defeats DCE to remove call to link_failure_erf.
> >
> > Thanks,
> > Prathamesh
> > >
> > > > which resolves the regression.
> > > >
> > > > Bootstrapped+tested on x86_64-unknown-linux-gnu.
> > > > Cross-testing on arm and aarch64 variants in progress.
> > > > OK for trunk if passes ?
> > > >
> > > > Thanks,
> > > > Prathamesh
Reply | Threaded
Open this post in threaded view
|

Re: PR83750: CSE erf/erfc pair

Joseph Myers
In reply to this post by Jeff Law
On Sun, 4 Nov 2018, Jeff Law wrote:

> Don't we have a flag specific to honoring nans?  Would that be better to
> use than flag_unsafe_math_optimizations?  As Uli mentioned, there's

That's only relevant for the comparison optimization, of course.

Converting erfc to 1-erf is dubious, since the whole point of erfc is for
cases where 1-erf is inaccurate.  (Conversion in the other direction also
needs flag_unsafe_math_optimizations.)

--
Joseph S. Myers
[hidden email]
Reply | Threaded
Open this post in threaded view
|

Re: PR83750: CSE erf/erfc pair

Jeff Law
On 11/5/18 11:27 AM, Joseph Myers wrote:

> On Sun, 4 Nov 2018, Jeff Law wrote:
>
>> Don't we have a flag specific to honoring nans?  Would that be better to
>> use than flag_unsafe_math_optimizations?  As Uli mentioned, there's
>
> That's only relevant for the comparison optimization, of course.
>
> Converting erfc to 1-erf is dubious, since the whole point of erfc is for
> cases where 1-erf is inaccurate.  (Conversion in the other direction also
> needs flag_unsafe_math_optimizations.)
>
Understood.  Thanks for clarifying.  It seems like
unsafe-math-optimization is a better fit than the nan specific flag.

jeff
Reply | Threaded
Open this post in threaded view
|

Re: PR83750: CSE erf/erfc pair

Michael Matz
Hi,

On Mon, 5 Nov 2018, Jeff Law wrote:

> >> Don't we have a flag specific to honoring nans?  Would that be better
> >> to use than flag_unsafe_math_optimizations?  As Uli mentioned,
> >> there's
> >
> > That's only relevant for the comparison optimization, of course.
> >
> > Converting erfc to 1-erf is dubious, since the whole point of erfc is
> > for cases where 1-erf is inaccurate.  (Conversion in the other
> > direction also needs flag_unsafe_math_optimizations.)
> >
> Understood.  Thanks for clarifying.  It seems like
> unsafe-math-optimization is a better fit than the nan specific flag.

But still we should consider general usefullness, even with unsafe-math.  
In this case we would remove a usage of a slow function that the user
specifically used to deal with inaccuracies with an equally slow function
(plus a little arithmetic that is shadows by the functions slowness) that
now exacly produces the inaccuracies the user wanted to avoid.  I.e. the
speed gain is zero.  The "canonicalization gain" referred to in the PR
might be real, but it comes at the cost of introducing definite
catastrophic cancellation.

IMHO that's not a sensible transformation to do, under any flags.


Ciao,
Michael.
Reply | Threaded
Open this post in threaded view
|

Re: PR83750: CSE erf/erfc pair

Paul Koning-6


> On Nov 5, 2018, at 1:48 PM, Michael Matz <[hidden email]> wrote:
>
> Hi,
>
> On Mon, 5 Nov 2018, Jeff Law wrote:
>
>>>> Don't we have a flag specific to honoring nans?  Would that be better
>>>> to use than flag_unsafe_math_optimizations?  As Uli mentioned,
>>>> there's
>>>
>>> That's only relevant for the comparison optimization, of course.
>>>
>>> Converting erfc to 1-erf is dubious, since the whole point of erfc is
>>> for cases where 1-erf is inaccurate.  (Conversion in the other
>>> direction also needs flag_unsafe_math_optimizations.)
>>>
>> Understood.  Thanks for clarifying.  It seems like
>> unsafe-math-optimization is a better fit than the nan specific flag.
>
> But still we should consider general usefullness, even with unsafe-math.  
> In this case we would remove a usage of a slow function that the user
> specifically used to deal with inaccuracies with an equally slow function
> (plus a little arithmetic that is shadows by the functions slowness) that
> now exacly produces the inaccuracies the user wanted to avoid.  I.e. the
> speed gain is zero.  The "canonicalization gain" referred to in the PR
> might be real, but it comes at the cost of introducing definite
> catastrophic cancellation.
>
> IMHO that's not a sensible transformation to do, under any flags.

That seems right.  The same goes for log vs. logp1, and exp vs. expm1.

        paul

Reply | Threaded
Open this post in threaded view
|

Re: PR83750: CSE erf/erfc pair

Richard Biener-2
In reply to this post by Prathamesh Kulkarni-2
On Mon, Nov 5, 2018 at 3:11 PM Prathamesh Kulkarni
<[hidden email]> wrote:

>
> On Mon, 5 Nov 2018 at 18:14, Richard Biener <[hidden email]> wrote:
> >
> > On Mon, Nov 5, 2018 at 1:11 PM Prathamesh Kulkarni
> > <[hidden email]> wrote:
> > >
> > > On Mon, 5 Nov 2018 at 15:10, Richard Biener <[hidden email]> wrote:
> > > >
> > > > On Fri, Nov 2, 2018 at 10:37 AM Prathamesh Kulkarni
> > > > <[hidden email]> wrote:
> > > > >
> > > > > Hi,
> > > > > This patch adds two transforms to match.pd to CSE erf/erfc pair.
> > > > > erfc(x) is canonicalized to 1 - erf(x) and is then reversed to 1 -
> > > > > erf(x) when canonicalization is disabled and result of erf(x) has
> > > > > single use within 1 - erf(x).
> > > > >
> > > > > The patch regressed builtin-nonneg-1.c. The following test-case
> > > > > reproduces the issue with patch:
> > > > >
> > > > > void test(double d1) {
> > > > >   if (signbit(erfc(d1)))
> > > > >     link_failure_erfc();
> > > > > }
> > > > >
> > > > > ssa dump:
> > > > >
> > > > >   <bb 2> :
> > > > >   _5 = __builtin_erf (d1_4(D));
> > > > >   _1 = 1.0e+0 - _5;
> > > > >   _6 = _1 < 0.0;
> > > > >   _2 = (int) _6;
> > > > >   if (_2 != 0)
> > > > >     goto <bb 3>; [INV]
> > > > >   else
> > > > >     goto <bb 4>; [INV]
> > > > >
> > > > >   <bb 3> :
> > > > >   link_failure_erfc ();
> > > > >
> > > > >   <bb 4> :
> > > > >   return;
> > > > >
> > > > > As can be seen, erfc(d1) is folded to 1 - erf(d1).
> > > > > forwprop then transforms the if condition from _2 != 0
> > > > > to _5 > 1.0e+0 and that defeats DCE thus resulting in link failure
> > > > > in undefined reference to link_failure_erfc().
> > > > >
> > > > > So, the patch adds another transform erf(x) > 1 -> 0.
> > > >
> > > > Ick.
> > > >
> > > > Why not canonicalize erf (x) to 1-erfc(x) instead?
> > > Sorry I didn't quite follow, won't this cause similar issue with erf ?
> > > I changed the pattern to canonicalize erf(x) -> 1 - erfc(x)
> > > and 1 - erfc(x) -> erf(x) after canonicalization is disabled.
> > >
> > > This caused undefined reference to link_failure_erf() in following test-case:
> > >
> > > extern int signbit(double);
> > > extern void link_failure_erf(void);
> > > extern double erf(double);
> > >
> > > void test(double d1) {
> > >   if (signbit(erf(d1)))
> > >     link_failure_erf();
> > > }
> >
> > But that's already not optimized without any canonicalization
> > because erf returns sth in range [-1, 1].
> >
> > I suggested the change because we have limited support for FP
> > value-ranges and nonnegative is one thing we can compute
> > (and erfc as opposed to erf is nonnegative).
> Ah right, thanks for the explanation.
> Unfortunately this still regresses builtin-nonneg-1.c, which can be
> reproduced with following test-case:
>
> extern int signbit(double);
> extern void link_failure_erf(void);
> extern double erf(double);
> extern double fabs(double);
>
> void test(double d1) {
>   if (signbit(erf(fabs(d1))))
>     link_failure_erf();
> }
>
> signbit(erf(fabs(d1)) is transformed to 0 without patch but with patch
> it gets canonicalized to signbit(1 - erfc(fabs(d1))) which similarly
> defeats DCE.
>
> forwprop1 shows:
> <bb 2> :
>   _1 = ABS_EXPR <d1_5(D)>;
>   _6 = __builtin_erfc (_1);
>   _2 = 1.0e+0 - _6;
>   _7 = _6 > 1.0e+0;
>   _3 = (int) _7;
>   if (_6 > 1.0e+0)
>     goto <bb 3>; [INV]
>   else
>     goto <bb 4>; [INV]
>
>   <bb 3> :
>   link_failure_erf ();
>
>   <bb 4> :
>   return;
>
> I assume we would need to somehow tell gcc that the canonicalized
> expression 1 - erfc(x) would not exceed 1.0 ?
> Is there a better way to do that apart from defining pattern (1 -
> erfc(x)) > 1.0 -> 0
> which I agree doesn't look ideal to add in match.pd ?

You could handle a MINUS_EXPR of 1 and erfc() in
gimple_assign_nonnegative_warnv_p (I wouldn't bother
to do it for tree_binary_nonnegative_warnv_p)

This is of course similarly "lame" but a bit cleaner than
a match.pd pattern that just works for the comparison.

In reality the proper long-term fix is to add basic range
propagation to floats.

Richard.

>
> Thanks
> Prathamesh
> >
> > > forwprop1 shows:
> > >
> > >    <bb 2> :
> > >   _5 = __builtin_erfc (d1_4(D));
> > >   _1 = 1.0e+0 - _5;
> > >   _6 = _5 > 1.0e+0;
> > >   _2 = (int) _6;
> > >   if (_5 > 1.0e+0)
> > >     goto <bb 3>; [INV]
> > >   else
> > >     goto <bb 4>; [INV]
> > >
> > >   <bb 3> :
> > >   link_failure_erf ();
> > >
> > >   <bb 4> :
> > >   return;
> > >
> > > which defeats DCE to remove call to link_failure_erf.
> > >
> > > Thanks,
> > > Prathamesh
> > > >
> > > > > which resolves the regression.
> > > > >
> > > > > Bootstrapped+tested on x86_64-unknown-linux-gnu.
> > > > > Cross-testing on arm and aarch64 variants in progress.
> > > > > OK for trunk if passes ?
> > > > >
> > > > > Thanks,
> > > > > Prathamesh
Reply | Threaded
Open this post in threaded view
|

Re: PR83750: CSE erf/erfc pair

Prathamesh Kulkarni-2
On Tue, 6 Nov 2018 at 16:04, Richard Biener <[hidden email]> wrote:

>
> On Mon, Nov 5, 2018 at 3:11 PM Prathamesh Kulkarni
> <[hidden email]> wrote:
> >
> > On Mon, 5 Nov 2018 at 18:14, Richard Biener <[hidden email]> wrote:
> > >
> > > On Mon, Nov 5, 2018 at 1:11 PM Prathamesh Kulkarni
> > > <[hidden email]> wrote:
> > > >
> > > > On Mon, 5 Nov 2018 at 15:10, Richard Biener <[hidden email]> wrote:
> > > > >
> > > > > On Fri, Nov 2, 2018 at 10:37 AM Prathamesh Kulkarni
> > > > > <[hidden email]> wrote:
> > > > > >
> > > > > > Hi,
> > > > > > This patch adds two transforms to match.pd to CSE erf/erfc pair.
> > > > > > erfc(x) is canonicalized to 1 - erf(x) and is then reversed to 1 -
> > > > > > erf(x) when canonicalization is disabled and result of erf(x) has
> > > > > > single use within 1 - erf(x).
> > > > > >
> > > > > > The patch regressed builtin-nonneg-1.c. The following test-case
> > > > > > reproduces the issue with patch:
> > > > > >
> > > > > > void test(double d1) {
> > > > > >   if (signbit(erfc(d1)))
> > > > > >     link_failure_erfc();
> > > > > > }
> > > > > >
> > > > > > ssa dump:
> > > > > >
> > > > > >   <bb 2> :
> > > > > >   _5 = __builtin_erf (d1_4(D));
> > > > > >   _1 = 1.0e+0 - _5;
> > > > > >   _6 = _1 < 0.0;
> > > > > >   _2 = (int) _6;
> > > > > >   if (_2 != 0)
> > > > > >     goto <bb 3>; [INV]
> > > > > >   else
> > > > > >     goto <bb 4>; [INV]
> > > > > >
> > > > > >   <bb 3> :
> > > > > >   link_failure_erfc ();
> > > > > >
> > > > > >   <bb 4> :
> > > > > >   return;
> > > > > >
> > > > > > As can be seen, erfc(d1) is folded to 1 - erf(d1).
> > > > > > forwprop then transforms the if condition from _2 != 0
> > > > > > to _5 > 1.0e+0 and that defeats DCE thus resulting in link failure
> > > > > > in undefined reference to link_failure_erfc().
> > > > > >
> > > > > > So, the patch adds another transform erf(x) > 1 -> 0.
> > > > >
> > > > > Ick.
> > > > >
> > > > > Why not canonicalize erf (x) to 1-erfc(x) instead?
> > > > Sorry I didn't quite follow, won't this cause similar issue with erf ?
> > > > I changed the pattern to canonicalize erf(x) -> 1 - erfc(x)
> > > > and 1 - erfc(x) -> erf(x) after canonicalization is disabled.
> > > >
> > > > This caused undefined reference to link_failure_erf() in following test-case:
> > > >
> > > > extern int signbit(double);
> > > > extern void link_failure_erf(void);
> > > > extern double erf(double);
> > > >
> > > > void test(double d1) {
> > > >   if (signbit(erf(d1)))
> > > >     link_failure_erf();
> > > > }
> > >
> > > But that's already not optimized without any canonicalization
> > > because erf returns sth in range [-1, 1].
> > >
> > > I suggested the change because we have limited support for FP
> > > value-ranges and nonnegative is one thing we can compute
> > > (and erfc as opposed to erf is nonnegative).
> > Ah right, thanks for the explanation.
> > Unfortunately this still regresses builtin-nonneg-1.c, which can be
> > reproduced with following test-case:
> >
> > extern int signbit(double);
> > extern void link_failure_erf(void);
> > extern double erf(double);
> > extern double fabs(double);
> >
> > void test(double d1) {
> >   if (signbit(erf(fabs(d1))))
> >     link_failure_erf();
> > }
> >
> > signbit(erf(fabs(d1)) is transformed to 0 without patch but with patch
> > it gets canonicalized to signbit(1 - erfc(fabs(d1))) which similarly
> > defeats DCE.
> >
> > forwprop1 shows:
> > <bb 2> :
> >   _1 = ABS_EXPR <d1_5(D)>;
> >   _6 = __builtin_erfc (_1);
> >   _2 = 1.0e+0 - _6;
> >   _7 = _6 > 1.0e+0;
> >   _3 = (int) _7;
> >   if (_6 > 1.0e+0)
> >     goto <bb 3>; [INV]
> >   else
> >     goto <bb 4>; [INV]
> >
> >   <bb 3> :
> >   link_failure_erf ();
> >
> >   <bb 4> :
> >   return;
> >
> > I assume we would need to somehow tell gcc that the canonicalized
> > expression 1 - erfc(x) would not exceed 1.0 ?
> > Is there a better way to do that apart from defining pattern (1 -
> > erfc(x)) > 1.0 -> 0
> > which I agree doesn't look ideal to add in match.pd ?
>
> You could handle a MINUS_EXPR of 1 and erfc() in
> gimple_assign_nonnegative_warnv_p (I wouldn't bother
> to do it for tree_binary_nonnegative_warnv_p)
>
Um won't the value range of 1 - erfc(x) be [-1.0, 1.0] (same as
erf(x)), so would adding it to gimple_assign_nonnegative_warnv_p be
correct ?
erf(fabs(x)) is non-negative and between [0.0, 1.0] although I am not
sure if 1 - erfc(fabs(x)) would be always non-negative too if
erfc(fabs(x)) can exceed 1.0 for some value of x ?
AFAIU for the above case, gcc doesn't know that 1 - erfc(x) cannot
exceed 1.0, and that's because we don't propagate value range for
floats as you pointed out.

Thanks,
Prathamesh

> This is of course similarly "lame" but a bit cleaner than
> a match.pd pattern that just works for the comparison.
>
> In reality the proper long-term fix is to add basic range
> propagation to floats.
>
> Richard.
>
> >
> > Thanks
> > Prathamesh
> > >
> > > > forwprop1 shows:
> > > >
> > > >    <bb 2> :
> > > >   _5 = __builtin_erfc (d1_4(D));
> > > >   _1 = 1.0e+0 - _5;
> > > >   _6 = _5 > 1.0e+0;
> > > >   _2 = (int) _6;
> > > >   if (_5 > 1.0e+0)
> > > >     goto <bb 3>; [INV]
> > > >   else
> > > >     goto <bb 4>; [INV]
> > > >
> > > >   <bb 3> :
> > > >   link_failure_erf ();
> > > >
> > > >   <bb 4> :
> > > >   return;
> > > >
> > > > which defeats DCE to remove call to link_failure_erf.
> > > >
> > > > Thanks,
> > > > Prathamesh
> > > > >
> > > > > > which resolves the regression.
> > > > > >
> > > > > > Bootstrapped+tested on x86_64-unknown-linux-gnu.
> > > > > > Cross-testing on arm and aarch64 variants in progress.
> > > > > > OK for trunk if passes ?
> > > > > >
> > > > > > Thanks,
> > > > > > Prathamesh
Reply | Threaded
Open this post in threaded view
|

Re: PR83750: CSE erf/erfc pair

Richard Biener-2
On Fri, Nov 9, 2018 at 8:11 AM Prathamesh Kulkarni
<[hidden email]> wrote:

>
> On Tue, 6 Nov 2018 at 16:04, Richard Biener <[hidden email]> wrote:
> >
> > On Mon, Nov 5, 2018 at 3:11 PM Prathamesh Kulkarni
> > <[hidden email]> wrote:
> > >
> > > On Mon, 5 Nov 2018 at 18:14, Richard Biener <[hidden email]> wrote:
> > > >
> > > > On Mon, Nov 5, 2018 at 1:11 PM Prathamesh Kulkarni
> > > > <[hidden email]> wrote:
> > > > >
> > > > > On Mon, 5 Nov 2018 at 15:10, Richard Biener <[hidden email]> wrote:
> > > > > >
> > > > > > On Fri, Nov 2, 2018 at 10:37 AM Prathamesh Kulkarni
> > > > > > <[hidden email]> wrote:
> > > > > > >
> > > > > > > Hi,
> > > > > > > This patch adds two transforms to match.pd to CSE erf/erfc pair.
> > > > > > > erfc(x) is canonicalized to 1 - erf(x) and is then reversed to 1 -
> > > > > > > erf(x) when canonicalization is disabled and result of erf(x) has
> > > > > > > single use within 1 - erf(x).
> > > > > > >
> > > > > > > The patch regressed builtin-nonneg-1.c. The following test-case
> > > > > > > reproduces the issue with patch:
> > > > > > >
> > > > > > > void test(double d1) {
> > > > > > >   if (signbit(erfc(d1)))
> > > > > > >     link_failure_erfc();
> > > > > > > }
> > > > > > >
> > > > > > > ssa dump:
> > > > > > >
> > > > > > >   <bb 2> :
> > > > > > >   _5 = __builtin_erf (d1_4(D));
> > > > > > >   _1 = 1.0e+0 - _5;
> > > > > > >   _6 = _1 < 0.0;
> > > > > > >   _2 = (int) _6;
> > > > > > >   if (_2 != 0)
> > > > > > >     goto <bb 3>; [INV]
> > > > > > >   else
> > > > > > >     goto <bb 4>; [INV]
> > > > > > >
> > > > > > >   <bb 3> :
> > > > > > >   link_failure_erfc ();
> > > > > > >
> > > > > > >   <bb 4> :
> > > > > > >   return;
> > > > > > >
> > > > > > > As can be seen, erfc(d1) is folded to 1 - erf(d1).
> > > > > > > forwprop then transforms the if condition from _2 != 0
> > > > > > > to _5 > 1.0e+0 and that defeats DCE thus resulting in link failure
> > > > > > > in undefined reference to link_failure_erfc().
> > > > > > >
> > > > > > > So, the patch adds another transform erf(x) > 1 -> 0.
> > > > > >
> > > > > > Ick.
> > > > > >
> > > > > > Why not canonicalize erf (x) to 1-erfc(x) instead?
> > > > > Sorry I didn't quite follow, won't this cause similar issue with erf ?
> > > > > I changed the pattern to canonicalize erf(x) -> 1 - erfc(x)
> > > > > and 1 - erfc(x) -> erf(x) after canonicalization is disabled.
> > > > >
> > > > > This caused undefined reference to link_failure_erf() in following test-case:
> > > > >
> > > > > extern int signbit(double);
> > > > > extern void link_failure_erf(void);
> > > > > extern double erf(double);
> > > > >
> > > > > void test(double d1) {
> > > > >   if (signbit(erf(d1)))
> > > > >     link_failure_erf();
> > > > > }
> > > >
> > > > But that's already not optimized without any canonicalization
> > > > because erf returns sth in range [-1, 1].
> > > >
> > > > I suggested the change because we have limited support for FP
> > > > value-ranges and nonnegative is one thing we can compute
> > > > (and erfc as opposed to erf is nonnegative).
> > > Ah right, thanks for the explanation.
> > > Unfortunately this still regresses builtin-nonneg-1.c, which can be
> > > reproduced with following test-case:
> > >
> > > extern int signbit(double);
> > > extern void link_failure_erf(void);
> > > extern double erf(double);
> > > extern double fabs(double);
> > >
> > > void test(double d1) {
> > >   if (signbit(erf(fabs(d1))))
> > >     link_failure_erf();
> > > }
> > >
> > > signbit(erf(fabs(d1)) is transformed to 0 without patch but with patch
> > > it gets canonicalized to signbit(1 - erfc(fabs(d1))) which similarly
> > > defeats DCE.
> > >
> > > forwprop1 shows:
> > > <bb 2> :
> > >   _1 = ABS_EXPR <d1_5(D)>;
> > >   _6 = __builtin_erfc (_1);
> > >   _2 = 1.0e+0 - _6;
> > >   _7 = _6 > 1.0e+0;
> > >   _3 = (int) _7;
> > >   if (_6 > 1.0e+0)
> > >     goto <bb 3>; [INV]
> > >   else
> > >     goto <bb 4>; [INV]
> > >
> > >   <bb 3> :
> > >   link_failure_erf ();
> > >
> > >   <bb 4> :
> > >   return;
> > >
> > > I assume we would need to somehow tell gcc that the canonicalized
> > > expression 1 - erfc(x) would not exceed 1.0 ?
> > > Is there a better way to do that apart from defining pattern (1 -
> > > erfc(x)) > 1.0 -> 0
> > > which I agree doesn't look ideal to add in match.pd ?
> >
> > You could handle a MINUS_EXPR of 1 and erfc() in
> > gimple_assign_nonnegative_warnv_p (I wouldn't bother
> > to do it for tree_binary_nonnegative_warnv_p)
> >
> Um won't the value range of 1 - erfc(x) be [-1.0, 1.0] (same as
> erf(x)), so would adding it to gimple_assign_nonnegative_warnv_p be
> correct ?

Under the same condition as erf, when the argument is positive.
This is what the testcase is testing.

> erf(fabs(x)) is non-negative and between [0.0, 1.0] although I am not
> sure if 1 - erfc(fabs(x)) would be always non-negative too if
> erfc(fabs(x)) can exceed 1.0 for some value of x ?
> AFAIU for the above case, gcc doesn't know that 1 - erfc(x) cannot
> exceed 1.0, and that's because we don't propagate value range for
> floats as you pointed out.
>
> Thanks,
> Prathamesh
> > This is of course similarly "lame" but a bit cleaner than
> > a match.pd pattern that just works for the comparison.
> >
> > In reality the proper long-term fix is to add basic range
> > propagation to floats.
> >
> > Richard.
> >
> > >
> > > Thanks
> > > Prathamesh
> > > >
> > > > > forwprop1 shows:
> > > > >
> > > > >    <bb 2> :
> > > > >   _5 = __builtin_erfc (d1_4(D));
> > > > >   _1 = 1.0e+0 - _5;
> > > > >   _6 = _5 > 1.0e+0;
> > > > >   _2 = (int) _6;
> > > > >   if (_5 > 1.0e+0)
> > > > >     goto <bb 3>; [INV]
> > > > >   else
> > > > >     goto <bb 4>; [INV]
> > > > >
> > > > >   <bb 3> :
> > > > >   link_failure_erf ();
> > > > >
> > > > >   <bb 4> :
> > > > >   return;
> > > > >
> > > > > which defeats DCE to remove call to link_failure_erf.
> > > > >
> > > > > Thanks,
> > > > > Prathamesh
> > > > > >
> > > > > > > which resolves the regression.
> > > > > > >
> > > > > > > Bootstrapped+tested on x86_64-unknown-linux-gnu.
> > > > > > > Cross-testing on arm and aarch64 variants in progress.
> > > > > > > OK for trunk if passes ?
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Prathamesh