https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91690
Bug ID: 91690 Summary: Slow IEEE intrinsics Product: gcc Version: 9.2.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: fortran Assignee: unassigned at gcc dot gnu.org Reporter: townsend at astro dot wisc.edu Target Milestone: --- Created attachment 46844 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=46844&action=edit Demo program The intrinsics provided by the IEEE_ARITHMETIC module appear to be significantly slower than an equivalent implementation based on inspecting bits. I attach a short program that demonstrates this. Compiling with gfortran -O2 -o test_ieee test_ieee.f90 ...I get the following output: Using IEEE time: 16.1860008 Using bits time: 0.101000004 So, the IEEE_IS_NAN() intrinsic appears to be ~160 times slower than inspecting bits directly. Other intrinsics seem to show similar issues. cheers, Rich |
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91690
kargl at gcc dot gnu.org changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |kargl at gcc dot gnu.org --- Comment #1 from kargl at gcc dot gnu.org --- (In reply to Rich Townsend from comment #0) > Created attachment 46844 [details] > Demo program > > The intrinsics provided by the IEEE_ARITHMETIC module appear to be > significantly slower than an equivalent implementation based on inspecting > bits. > The implementations are not equivalent. -fdump-tree-original may shield some light. is_nan_bits() isn't worried about FPU state (ie., keeping track of exceptions). is_nan_bits is reduced to something simply using simple bit twiddling. The following is somewhat edited to reduce length is_nan_bits (real(kind=4) & x) { integer(kind=4) expn, frac, is_nan, ix, sign; { integer(kind=8) D.3904, D.3905, D.3906, transfer.0; D.3904 = 4; D.3905 = 4; __builtin_memcpy ((void *) &transfer.0, (void *) x, (unsigned long) MAX_EXPR <MIN_EXPR <D.3905, D.3904>, 0>); ix = transfer.0; } frac = NON_LVALUE_EXPR <ix> & 8388607; expn = ix >> 23 & 255; sign = (integer(kind=4)) ((unsigned int) ix >> 31); is_nan = expn == 255 && frac != 0; return is_nan; } is_nan_ieee is reduced to something with 2 function calls to maintain fpu state. is_nan_ieee (real(kind=4) & restrict x) { c_char fpstate.1[33]; logical(kind=4) is_nan; try { _gfortran_ieee_procedure_entry ((void *) &fpstate.1); is_nan = SAVE_EXPR <*x> unord SAVE_EXPR <*x>; return is_nan; } finally { _gfortran_ieee_procedure_exit ((void *) &fpstate.1); } } Both _entry and _exit are manipulating the FPU. |
In reply to this post by jason at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91690
--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> --- __builtin_isnan is required to preserve FPU state, not sure why the Fortran FE thinks it needs some extra magic around it? |
In reply to this post by jason at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91690
--- Comment #3 from Steve Kargl <sgk at troutmask dot apl.washington.edu> --- On Mon, Sep 09, 2019 at 08:05:33AM +0000, rguenth at gcc dot gnu.org wrote: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91690 > > --- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> --- > __builtin_isnan is required to preserve FPU state, not sure why the Fortran FE > thinks it needs some extra magic around it? > The magic seems to come from trans-decl.c /* Generate code for a function. */ void gfc_generate_function_code (gfc_namespace * ns) { ... /* Check if an IEEE module is used in the procedure. If so, save the floating point state. */ ieee = is_ieee_module_used (ns); if (ieee) fpstate = gfc_save_fp_state (&init); ... /* If IEEE modules are loaded, restore the floating-point state. */ if (ieee) gfc_restore_fp_state (&cleanup, fpstate); } The Fortran standard may require this behavior. 18-007r1 page 435 In a scoping unit that has access to IEEE_EXCEPTIONS or IEEE_ARITHMETIC, if an intrinsic procedure or a procedure defined in an intrinsic module executes normally, the values of the flags IEEE_OVERFLOW, IEEE_DIVIDE_BY_ZERO, and IEEE_INVALID shall be as on entry to the procedure, even if one or more of them signals during the calculation. then on page 437 In a procedure other than IEEE_SET_ROUNDING_MODE or IEEE_SET_STATUS, the processor shall not change the rounding modes on entry, and on return shall ensure that the rounding modes are the same as they were on entry. ... In a procedure other than IEEE_SET_UNDERFLOW_MODE or IEEE_SET_STATUS, the processor shall not change the underflow mode on entry, and on return shall ensure that the underflow mode is the same as it was on entry. then on page 438 In a procedure other than IEEE_SET_HALTING_MODE or IEEE_SET_STATUS, the processor shall not change the halting mode on entry, and on return shall ensure that the halting mode is the same as it was on entry. |
In reply to this post by jason at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91690
Wilco <wilco at gcc dot gnu.org> changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |wilco at gcc dot gnu.org --- Comment #4 from Wilco <wilco at gcc dot gnu.org> --- (In reply to Steve Kargl from comment #3) > On Mon, Sep 09, 2019 at 08:05:33AM +0000, rguenth at gcc dot gnu.org wrote: > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91690 > > > > --- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> --- > > __builtin_isnan is required to preserve FPU state, not sure why the Fortran FE > > thinks it needs some extra magic around it? > > > > The magic seems to come from trans-decl.c > > /* Generate code for a function. */ > > void > gfc_generate_function_code (gfc_namespace * ns) > { > > ... > > /* Check if an IEEE module is used in the procedure. If so, save > the floating point state. */ > ieee = is_ieee_module_used (ns); > if (ieee) > fpstate = gfc_save_fp_state (&init); > > ... > > /* If IEEE modules are loaded, restore the floating-point state. */ > if (ieee) > gfc_restore_fp_state (&cleanup, fpstate); > > } > > The Fortran standard may require this behavior. 18-007r1 page 435 But none of that is needed since a correct implementation of isnan etc does not affect the floating point state at all (see PR66462). |
In reply to this post by jason at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91690
--- Comment #5 from Steve Kargl <sgk at troutmask dot apl.washington.edu> --- On Mon, Sep 09, 2019 at 06:25:53PM +0000, wilco at gcc dot gnu.org wrote: > > > > The Fortran standard may require this behavior. 18-007r1 page 435 > > But none of that is needed since a correct implementation of isnan > etc does not affect the floating point state at all (see PR66462). > OK. So you can special case IEEE_IS_NAN(). What about the other IEEE_ARITHMETIC functions? PR66462 has been re-opened. Tamir seemed to have developed 4 different patches, yet no one can agree on "a correct implementation" for all targets. |
In reply to this post by jason at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91690
Wilco <wdijkstr at arm dot com> changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |wdijkstr at arm dot com --- Comment #6 from Wilco <wdijkstr at arm dot com> --- (In reply to Steve Kargl from comment #5) > On Mon, Sep 09, 2019 at 06:25:53PM +0000, wilco at gcc dot gnu.org wrote: > > > > > > The Fortran standard may require this behavior. 18-007r1 page 435 > > > > But none of that is needed since a correct implementation of isnan > > etc does not affect the floating point state at all (see PR66462). > > > > OK. So you can special case IEEE_IS_NAN(). What about > the other IEEE_ARITHMETIC functions? The same is true for the other isxxx functions from C99 and the new iszero and isdenormal. Are there other functions besides these in IEEE_ARITHMETIC? > PR66462 has been re-opened. Tamir seemed to have developed > 4 different patches, yet no one can agree on "a correct > implementation" for all targets. As his latest comments explain, Tamar's last patch is correct. That should be the default since an integer implementation which doesn't affect floating point state is the only correct one. It happens to be the fastest as well as a bonus. Either way, I don't believe the Fortran front-end should try to work around mid-end bugs. As reported a save/restore is prohibitive expensive on all targets so shouldn't even be contemplated. |
In reply to this post by jason at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91690
--- Comment #7 from Steve Kargl <sgk at troutmask dot apl.washington.edu> --- On Mon, Sep 09, 2019 at 07:22:24PM +0000, wdijkstr at arm dot com wrote: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91690 > > Wilco <wdijkstr at arm dot com> changed: > > What |Removed |Added > ---------------------------------------------------------------------------- > CC| |wdijkstr at arm dot com > > --- Comment #6 from Wilco <wdijkstr at arm dot com> --- > (In reply to Steve Kargl from comment #5) > > On Mon, Sep 09, 2019 at 06:25:53PM +0000, wilco at gcc dot gnu.org wrote: > > > > > > > > The Fortran standard may require this behavior. 18-007r1 page 435 > > > > > > But none of that is needed since a correct implementation of isnan > > > etc does not affect the floating point state at all (see PR66462). > > > > > > > OK. So you can special case IEEE_IS_NAN(). What about > > the other IEEE_ARITHMETIC functions? > > The same is true for the other isxxx functions from C99 and the new iszero and > isdenormal. Are there other functions besides these in IEEE_ARITHMETIC? Don't know. I haven't tried cross-referencing C99 against Fortran 2018. Table 17.2i lists the functions made available by IEEE_ARITHMETIC. > > PR66462 has been re-opened. Tamir seemed to have developed > > 4 different patches, yet no one can agree on "a correct > > implementation" for all targets. > > As his latest comments explain, Tamar's last patch is correct. That should be > the default since an integer implementation which doesn't affect floating point > state is the only correct one. It happens to be the fastest as well as a bonus. > > Either way, I don't believe the Fortran front-end should try to work around > mid-end bugs. As reported a save/restore is prohibitive expensive on all > targets so shouldn't even be contemplated. The saving and restoring FPU state was implemented in troutmask:sgk[206] svn log -r 216036 ------------------------------------------------------------------------ r216036 | fxcoudert | 2014-10-09 02:47:25 -0700 (Thu, 09 Oct 2014) | 25 lines The above is a full 8 months before PR66462. PR66462 was opened on 2015-06-08. A patch was committed on 2017-06-08 and the PR was closed. PR66462 was re-opened on 2017-06-09. The patch seems to have been reverted in r249050, but this action isn't recorded in the PR. Tamar's last comment #19 contains "Yes it was committed and reverted and fixed, but never committed again." So, it seems that the Fortran FE isn't trying to workaround anything. It is seemingly trying to ensure the requirements of the Fortran standard are met. If and when someone has the time to review the implementation IEEE_ARITHMETIC, perhaps, the saving/restoring of fpu state can be addressed. |
In reply to this post by jason at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91690
Dominique d'Humieres <dominiq at lps dot ens.fr> changed: What |Removed |Added ---------------------------------------------------------------------------- Status|UNCONFIRMED |NEW Last reconfirmed| |2019-09-16 Ever confirmed|0 |1 |
Free forum by Nabble | Edit this page |