[Bug fortran/91690] New: Slow IEEE intrinsics

classic Classic list List threaded Threaded
9 messages Options
Reply | Threaded
Open this post in threaded view
|

[Bug fortran/91690] New: Slow IEEE intrinsics

jason at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91690

            Bug ID: 91690
           Summary: Slow IEEE intrinsics
           Product: gcc
           Version: 9.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: fortran
          Assignee: unassigned at gcc dot gnu.org
          Reporter: townsend at astro dot wisc.edu
  Target Milestone: ---

Created attachment 46844
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=46844&action=edit
Demo program

The intrinsics provided by the IEEE_ARITHMETIC module appear to be
significantly slower than an equivalent implementation based on inspecting
bits.

I attach a short program that demonstrates this. Compiling with

gfortran -O2 -o test_ieee test_ieee.f90

...I get the following output:

  Using IEEE
    time:   16.1860008    
 Using bits
    time:  0.101000004

So, the IEEE_IS_NAN() intrinsic appears to be ~160 times slower than inspecting
bits directly. Other intrinsics seem to show similar issues.

cheers,

Rich
Reply | Threaded
Open this post in threaded view
|

[Bug fortran/91690] Slow IEEE intrinsics

jason at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91690

kargl at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |kargl at gcc dot gnu.org

--- Comment #1 from kargl at gcc dot gnu.org ---

(In reply to Rich Townsend from comment #0)
> Created attachment 46844 [details]
> Demo program
>
> The intrinsics provided by the IEEE_ARITHMETIC module appear to be
> significantly slower than an equivalent implementation based on inspecting
> bits.
>

The implementations are not equivalent.  -fdump-tree-original may
shield some light.

is_nan_bits() isn't worried about FPU state (ie., keeping track of
exceptions).  is_nan_bits is reduced to something simply using simple
bit twiddling.  The following is somewhat edited to reduce length

is_nan_bits (real(kind=4) & x)
{
  integer(kind=4) expn, frac, is_nan, ix, sign;
  {
    integer(kind=8) D.3904, D.3905, D.3906, transfer.0;
    D.3904 = 4;
    D.3905 = 4;
    __builtin_memcpy ((void *) &transfer.0, (void *) x,
                      (unsigned long) MAX_EXPR <MIN_EXPR <D.3905, D.3904>, 0>);
    ix = transfer.0;
  }
  frac = NON_LVALUE_EXPR <ix> & 8388607;
  expn = ix >> 23 & 255;
  sign = (integer(kind=4)) ((unsigned int) ix >> 31);
  is_nan = expn == 255 && frac != 0;
  return is_nan;
}

is_nan_ieee is reduced to something with 2 function calls
to maintain fpu state.

is_nan_ieee (real(kind=4) & restrict x)
{
  c_char fpstate.1[33];
  logical(kind=4) is_nan;
  try
    {
      _gfortran_ieee_procedure_entry ((void *) &fpstate.1);
      is_nan = SAVE_EXPR <*x> unord SAVE_EXPR <*x>;
      return is_nan;
    }
  finally
    {
      _gfortran_ieee_procedure_exit ((void *) &fpstate.1);
    }
}

Both _entry and _exit are manipulating the FPU.
Reply | Threaded
Open this post in threaded view
|

[Bug fortran/91690] Slow IEEE intrinsics

jason at gcc dot gnu.org
In reply to this post by jason at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91690

--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
__builtin_isnan is required to preserve FPU state, not sure why the Fortran FE
thinks it needs some extra magic around it?
Reply | Threaded
Open this post in threaded view
|

[Bug fortran/91690] Slow IEEE intrinsics

jason at gcc dot gnu.org
In reply to this post by jason at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91690

--- Comment #3 from Steve Kargl <sgk at troutmask dot apl.washington.edu> ---
On Mon, Sep 09, 2019 at 08:05:33AM +0000, rguenth at gcc dot gnu.org wrote:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91690
>
> --- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
> __builtin_isnan is required to preserve FPU state, not sure why the Fortran FE
> thinks it needs some extra magic around it?
>

The magic seems to come from trans-decl.c

/* Generate code for a function.  */

void
gfc_generate_function_code (gfc_namespace * ns)
{

...

  /* Check if an IEEE module is used in the procedure.  If so, save
     the floating point state.  */
  ieee = is_ieee_module_used (ns);
  if (ieee)
    fpstate = gfc_save_fp_state (&init);

...

  /* If IEEE modules are loaded, restore the floating-point state.  */
  if (ieee)
    gfc_restore_fp_state (&cleanup, fpstate);

}

The Fortran standard may require this behavior.  18-007r1 page 435

  In a scoping unit that has access to IEEE_EXCEPTIONS or IEEE_ARITHMETIC,
  if an intrinsic procedure or a procedure defined in an intrinsic module
  executes normally, the values of the flags IEEE_OVERFLOW,
IEEE_DIVIDE_BY_ZERO,
  and IEEE_INVALID shall be as on entry to the procedure, even if one or more
  of them signals during the calculation.

then on page 437

  In a procedure other than IEEE_SET_ROUNDING_MODE or IEEE_SET_STATUS, the
  processor shall not change the rounding modes on entry, and on return
  shall ensure that the rounding modes are the same as they were on entry.
...
  In a procedure other than IEEE_SET_UNDERFLOW_MODE or IEEE_SET_STATUS, the
  processor shall not change the underflow mode on entry, and on return
  shall ensure that the underflow mode is the same as it was on entry.

then on page 438

  In a procedure other than IEEE_SET_HALTING_MODE or IEEE_SET_STATUS, the
  processor shall not change the halting mode on entry, and on return shall
  ensure that the halting mode is the same as it was on entry.
Reply | Threaded
Open this post in threaded view
|

[Bug fortran/91690] Slow IEEE intrinsics

jason at gcc dot gnu.org
In reply to this post by jason at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91690

Wilco <wilco at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |wilco at gcc dot gnu.org

--- Comment #4 from Wilco <wilco at gcc dot gnu.org> ---
(In reply to Steve Kargl from comment #3)

> On Mon, Sep 09, 2019 at 08:05:33AM +0000, rguenth at gcc dot gnu.org wrote:
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91690
> >
> > --- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
> > __builtin_isnan is required to preserve FPU state, not sure why the Fortran FE
> > thinks it needs some extra magic around it?
> >
>
> The magic seems to come from trans-decl.c
>
> /* Generate code for a function.  */
>
> void
> gfc_generate_function_code (gfc_namespace * ns)
> {
>
> ...
>
>   /* Check if an IEEE module is used in the procedure.  If so, save
>      the floating point state.  */
>   ieee = is_ieee_module_used (ns);
>   if (ieee)
>     fpstate = gfc_save_fp_state (&init);
>
> ...
>
>   /* If IEEE modules are loaded, restore the floating-point state.  */
>   if (ieee)
>     gfc_restore_fp_state (&cleanup, fpstate);
>
> }
>
> The Fortran standard may require this behavior.  18-007r1 page 435

But none of that is needed since a correct implementation of isnan etc does not
affect the floating point state at all (see PR66462).
Reply | Threaded
Open this post in threaded view
|

[Bug fortran/91690] Slow IEEE intrinsics

jason at gcc dot gnu.org
In reply to this post by jason at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91690

--- Comment #5 from Steve Kargl <sgk at troutmask dot apl.washington.edu> ---
On Mon, Sep 09, 2019 at 06:25:53PM +0000, wilco at gcc dot gnu.org wrote:
> >
> > The Fortran standard may require this behavior.  18-007r1 page 435
>
> But none of that is needed since a correct implementation of isnan
> etc does not affect the floating point state at all (see PR66462).
>

OK.  So you can special case IEEE_IS_NAN().  What about
the other IEEE_ARITHMETIC functions?

PR66462 has been re-opened.  Tamir seemed to have developed
4 different patches, yet no one can agree on "a correct
implementation" for all targets.
Reply | Threaded
Open this post in threaded view
|

[Bug fortran/91690] Slow IEEE intrinsics

jason at gcc dot gnu.org
In reply to this post by jason at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91690

Wilco <wdijkstr at arm dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |wdijkstr at arm dot com

--- Comment #6 from Wilco <wdijkstr at arm dot com> ---
(In reply to Steve Kargl from comment #5)

> On Mon, Sep 09, 2019 at 06:25:53PM +0000, wilco at gcc dot gnu.org wrote:
> > >
> > > The Fortran standard may require this behavior.  18-007r1 page 435
> >
> > But none of that is needed since a correct implementation of isnan
> > etc does not affect the floating point state at all (see PR66462).
> >
>
> OK.  So you can special case IEEE_IS_NAN().  What about
> the other IEEE_ARITHMETIC functions?

The same is true for the other isxxx functions from C99 and the new iszero and
isdenormal. Are there other functions besides these in IEEE_ARITHMETIC?

> PR66462 has been re-opened.  Tamir seemed to have developed
> 4 different patches, yet no one can agree on "a correct
> implementation" for all targets.

As his latest comments explain, Tamar's last patch is correct. That should be
the default since an integer implementation which doesn't affect floating point
state is the only correct one. It happens to be the fastest as well as a bonus.

Either way, I don't believe the Fortran front-end should try to work around
mid-end bugs. As reported a save/restore is prohibitive expensive on all
targets so shouldn't even be contemplated.
Reply | Threaded
Open this post in threaded view
|

[Bug fortran/91690] Slow IEEE intrinsics

jason at gcc dot gnu.org
In reply to this post by jason at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91690

--- Comment #7 from Steve Kargl <sgk at troutmask dot apl.washington.edu> ---
On Mon, Sep 09, 2019 at 07:22:24PM +0000, wdijkstr at arm dot com wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91690
>
> Wilco <wdijkstr at arm dot com> changed:
>
>            What    |Removed                     |Added
> ----------------------------------------------------------------------------
>                  CC|                            |wdijkstr at arm dot com
>
> --- Comment #6 from Wilco <wdijkstr at arm dot com> ---
> (In reply to Steve Kargl from comment #5)
> > On Mon, Sep 09, 2019 at 06:25:53PM +0000, wilco at gcc dot gnu.org wrote:
> > > >
> > > > The Fortran standard may require this behavior.  18-007r1 page 435
> > >
> > > But none of that is needed since a correct implementation of isnan
> > > etc does not affect the floating point state at all (see PR66462).
> > >
> >
> > OK.  So you can special case IEEE_IS_NAN().  What about
> > the other IEEE_ARITHMETIC functions?
>
> The same is true for the other isxxx functions from C99 and the new iszero and
> isdenormal. Are there other functions besides these in IEEE_ARITHMETIC?

Don't know.  I haven't tried cross-referencing C99 against Fortran 2018.
Table 17.2i lists the functions made available by IEEE_ARITHMETIC.

> > PR66462 has been re-opened.  Tamir seemed to have developed
> > 4 different patches, yet no one can agree on "a correct
> > implementation" for all targets.
>
> As his latest comments explain, Tamar's last patch is correct. That should be
> the default since an integer implementation which doesn't affect floating point
> state is the only correct one. It happens to be the fastest as well as a bonus.
>
> Either way, I don't believe the Fortran front-end should try to work around
> mid-end bugs. As reported a save/restore is prohibitive expensive on all
> targets so shouldn't even be contemplated.

The saving and restoring FPU state was implemented in

troutmask:sgk[206] svn log -r 216036
------------------------------------------------------------------------
r216036 | fxcoudert | 2014-10-09 02:47:25 -0700 (Thu, 09 Oct 2014) | 25 lines

The above is a full 8 months before PR66462.

PR66462 was opened on 2015-06-08.  A patch was committed
on 2017-06-08 and the PR was closed.  PR66462 was re-opened
on 2017-06-09.  The patch seems to have been reverted in
r249050, but this action isn't recorded in the PR.
Tamar's last comment #19 contains "Yes it was committed and
reverted and fixed, but never committed again."

So, it seems that the Fortran FE isn't trying to workaround
anything.  It is seemingly trying to ensure the requirements
of the Fortran standard are met.  If and when someone has
the time to review the implementation IEEE_ARITHMETIC,
perhaps, the saving/restoring of fpu state can be addressed.
Reply | Threaded
Open this post in threaded view
|

[Bug fortran/91690] Slow IEEE intrinsics

jason at gcc dot gnu.org
In reply to this post by jason at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91690

Dominique d'Humieres <dominiq at lps dot ens.fr> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2019-09-16
     Ever confirmed|0                           |1