On Tue, Jul 17, 2018 at 2:35 PM Kyrill Tkachov <[hidden email]> wrote:
>
> Hi all,
>
> This is my first Fortran patch, so apologies if I'm missing something.
> The current expansion of the min and max intrinsics explicitly expands
> the comparisons between each argument to calculate the global min/max.
> Some targets, like aarch64, have instructions that can calculate the min/max
> of two real (floating-point) numbers with the proper NaN-handling semantics
> (if both inputs are NaN, return NaN. If one is NaN, return the other) and those
> are the semantics provided by the __builtin_fmin/max family of functions that expand
> to these instructions.
>
> This patch makes the frontend emit __builtin_fmin/max directly to compare each
> pair of numbers when the numbers are floating-point, and use MIN_EXPR/MAX_EXPR otherwise
> (integral types and -ffast-math) which should hopefully be easier to recognise in the

What is Fortran's requirement on min/max intrinsics? Doesn't it only
require things that are guaranteed by MIN/MAX_EXPR anyway? The only
restriction here is

/* Minimum and maximum values. When used with floating point, if both
   operands are zeros, or if either operand is NaN, then it is unspecified
   which of the two operands is returned as the result. */

which means MIN/MAX_EXPR are not strictly IEEE compliant with signed
zeros or NaNs. Thus the correct test would be
!HONOR_SIGNED_ZEROS && !HONOR_NANS if signed zeros are significant.

I'm not sure if using fmin/max calls when we cannot use MIN/MAX_EXPR
is a good idea, this may both generate bigger code and be slower.

Richard.

> midend and optimise. The previous approach of generating the open-coded version of that
> is used when we don't have an appropriate __builtin_fmin/max available.
> For example, for a configuration of x86_64-unknown-linux-gnu that I tested there was no
> 128-bit __builtin_fminl available.
>
> With this patch I'm seeing more than 7000 FMINNM/FMAXNM instructions being generated at -O3
> on aarch64 for 521.wrf from fprate SPEC2017 where none before were generated
> (we were generating explicit comparisons and NaN checks). This gave a 2.4% improvement
> in performance on a Cortex-A72.
>
> Bootstrapped and tested on aarch64-none-linux-gnu and x86_64-unknown-linux-gnu.
>
> Ok for trunk?
> Thanks,
> Kyrill
>
> 2018-07-17 Kyrylo Tkachov <[hidden email]>
>
> * f95-lang.c (gfc_init_builtin_functions): Define __builtin_fmin,
> __builtin_fminf, __builtin_fminl, __builtin_fmax, __builtin_fmaxf,
> __builtin_fmaxl.
> * trans-intrinsic.c: Include builtins.h.
> (gfc_conv_intrinsic_minmax): Emit __builtin_fmin/max or MIN/MAX_EXPR
> functions to calculate the min/max.
>
> 2018-07-17 Kyrylo Tkachov <[hidden email]>
>
> * gfortran.dg/max_fmaxf.f90: New test.
> * gfortran.dg/min_fminf.f90: Likewise.
> * gfortran.dg/minmax_integer.f90: Likewise.
> * gfortran.dg/max_fmaxl_aarch64.f90: Likewise.
> * gfortran.dg/min_fminl_aarch64.f90: Likewise.
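
The point about signed zeros and NaNs being "unspecified" for MIN/MAX_EXPR can be illustrated with a small C sketch. It assumes, purely for illustration, that a minimum is lowered to a single comparison; `min_by_compare` and `order_matters` are hypothetical helper names and not GCC code, and the behavior shown assumes compilation without -ffast-math.

```c
#include <math.h>

/* Hypothetical helper, not GCC code: a minimum written as one plain
   comparison.  Whenever the comparison is false, the second operand
   is returned.  */
double
min_by_compare (double a, double b)
{
  return a < b ? a : b;
}

/* Returns 1 if swapping the operands changes the sign bit or the
   NaN-ness of the result, 0 otherwise.  */
int
order_matters (double a, double b)
{
  double r1 = min_by_compare (a, b);
  double r2 = min_by_compare (b, a);
  return (signbit (r1) != 0) != (signbit (r2) != 0)
         || (isnan (r1) != 0) != (isnan (r2) != 0);
}
```

With this sketch, `order_matters (0.0, -0.0)` and `order_matters (1.0, NAN)` both report a difference, while ordinary operands such as `(1.0, 2.0)` do not. That is the kind of freedom the MIN/MAX_EXPR comment leaves to the implementation.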
Hi Kyrill,

> The current implementation expands to:
>     mvar = a1;
>     if (a2 .op. mvar || isnan (mvar))
>       mvar = a2;
>     if (a3 .op. mvar || isnan (mvar))
>       mvar = a3;
>     ...
>     return mvar;
>
> That is, if one of the operands is a NaN it will return the other argument.
> If both (all) are NaNs, it will return NaN. This is the same as the
> semantics of fmin/max as far as I can tell.

I've looked at the F2008 standard, and, interestingly enough, the
requirements on MIN and MAX do not mention NaNs at all. 13.7.106
has, for MAX,

Result Value. The value of the result is that of the largest argument.

plus some stuff about character variables (not relevant here). Similar
for MIN.

Also, the section on IEEE_ARITHMETIC (14.9) does not mention
comparisons; also, "Complete conformance with IEC 60559:1989 is not
required", what is required is the correct support for +, -, and *,
plus support for / if IEEE_SUPPORT_DIVIDE is covered.

So, the Fortran standard does not impose many requirements. I do
think that a patch such as yours should not change the current
behavior unless we know what it does and think it is a good idea.
Hmm...

Having said that, I think we pretty much cover all the corner cases
in nan_1.f90, so if that test passes without regression, then that
aspect should be fine.

Question: You have found an advantage on AArch64. Do you have
access to other architectures to see if there is also a speed
advantage, or maybe a disadvantage?

Regards

Thomas
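
The open-coded expansion quoted above can be sketched as runnable C for the MIN case (`.op.` becomes `<`). This is only an illustration of the quoted pseudo-code; the function name `open_coded_min` and the array-based interface are assumptions, and it is not the code the gfortran frontend emits.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Sketch of the quoted open-coded MIN expansion for n >= 1 arguments.
   The name and the array interface are illustrative only.  */
double
open_coded_min (const double *args, size_t n)
{
  assert (n >= 1);
  double mvar = args[0];
  for (size_t i = 1; i < n; i++)
    {
      /* Take the next argument if it is smaller, or if the running
         value is NaN, so a NaN is replaced by whatever follows it.  */
      if (args[i] < mvar || isnan (mvar))
        mvar = args[i];
    }
  return mvar;
}
```

For the arguments (NaN, 1.0, 2.0) or (1.0, NaN, 2.0) this returns 1.0, and it returns NaN only when every argument is NaN, which matches the NaN behavior described in the quoted text.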
Hi Kyrill,

> Because the expansion now emits straightline code rather than
> conditionals and branches it should be easier to optimise in general,
> so I'd expect this to be an improvement overall.
> That said, I have benchmarked it on SPEC2017 on aarch64.
> If you have any benchmarks of interest to you that you (or somebody else)
> can run on a target that you care about I would be very grateful for
> any results.

Well, most people currently use x86_64 for scientific computing, so
I would be concerned most about this architecture.

As for the test case, min / max performance clearly has an effect on
521.wrf, so this would be ideal.

If you could run 521.wrf on x86_64, and find that it does not regress
measurably (or even shows an improvement), the patch is OK.

I'd be interested in the timings you get.

Regards

Thomas
On Tue, Jul 17, 2018 at 11:06 PM, Janne Blomqvist <[hidden email]> wrote:

> On Tue, Jul 17, 2018 at 6:36 PM, Thomas Koenig <[hidden email]> wrote:
>
>> Hi Kyrill,
>>
>>> The current implementation expands to:
>>>     mvar = a1;
>>>     if (a2 .op. mvar || isnan (mvar))
>>>       mvar = a2;
>>>     if (a3 .op. mvar || isnan (mvar))
>>>       mvar = a3;
>>>     ...
>>>     return mvar;
>>>
>>> That is, if one of the operands is a NaN it will return the other
>>> argument.
>>> If both (all) are NaNs, it will return NaN. This is the same as the
>>> semantics of fmin/max as far as I can tell.
>>>
>>
>> I've looked at the F2008 standard, and, interestingly enough, the
>> requirements on MIN and MAX do not mention NaNs at all. 13.7.106
>> has, for MAX,
>>
>> Result Value. The value of the result is that of the largest argument.
>>
>> plus some stuff about character variables (not relevant here). Similar
>> for MIN.
>>
>
> FWIW, this has not changed in the latest(?) draft for F2018 (N2146), see
> 16.9.125.
>
>> Also, the section on IEEE_ARITHMETIC (14.9) does not mention
>> comparisons; also, "Complete conformance with IEC 60559:1989 is not
>> required", what is required is the correct support for +, -, and *,
>> plus support for / if IEEE_SUPPORT_DIVIDE is covered.
>>
>
> Interestingly, here the F2018 draft has new intrinsics in the
> IEEE_ARITHMETIC module, IEEE_MAX_NUM, IEEE_MAX_NUM_MAG, IEEE_MIN_NUM,
> IEEE_MIN_NUM_MAG. These correspond to the {max,min}num{,_mag} operations in
> IEEE 754-2008, which AFAICT have the same NaN semantics as __builtin_fmax
> etc.
>
>> So, the Fortran standard does not impose many requirements.
>
> If so, why don't we just use {MAX,MIN}_EXPR unconditionally? Those who
> worry about the behavior wrt. NaNs, infinities etc. can use the intrinsics
> from IEEE_ARITHMETIC?
>
> This thread also has some interesting discussion on the topic:
> https://github.com/JuliaLang/julia/issues/7866
>

Oh, and on http://754r.ucbtest.org/ there is information about the next
update after IEEE 754-2008. In particular,
http://754r.ucbtest.org/changes.html notes that the above mentioned
{max,min}num{,_mag} have been deleted, and "new
{min,max}imum{,Number,Magnitude,MagnitudeNumber} operations are
recommended; NaN and signed zero handling are changed from 754-2008
5.3.1.".

--
Janne Blomqvist
On Tue, Jul 17, 2018 at 3:46 PM Kyrill Tkachov <[hidden email]> wrote:
>
> Hi Richard,
>
> On 17/07/18 14:27, Richard Biener wrote:
> > On Tue, Jul 17, 2018 at 2:35 PM Kyrill Tkachov
> > <[hidden email]> wrote:
> >> Hi all,
> >>
> >> This is my first Fortran patch, so apologies if I'm missing something.
> >> The current expansion of the min and max intrinsics explicitly expands
> >> the comparisons between each argument to calculate the global min/max.
> >> Some targets, like aarch64, have instructions that can calculate the min/max
> >> of two real (floating-point) numbers with the proper NaN-handling semantics
> >> (if both inputs are NaN, return NaN. If one is NaN, return the other) and those
> >> are the semantics provided by the __builtin_fmin/max family of functions that expand
> >> to these instructions.
> >>
> >> This patch makes the frontend emit __builtin_fmin/max directly to compare each
> >> pair of numbers when the numbers are floating-point, and use MIN_EXPR/MAX_EXPR otherwise
> >> (integral types and -ffast-math) which should hopefully be easier to recognise in the
> > What is Fortran's requirement on min/max intrinsics? Doesn't it only
> > require things that are guaranteed by MIN/MAX_EXPR anyway? The only
> > restriction here is
>
> The current implementation expands to:
>     mvar = a1;
>     if (a2 .op. mvar || isnan (mvar))
>       mvar = a2;
>     if (a3 .op. mvar || isnan (mvar))
>       mvar = a3;
>     ...
>     return mvar;
>
> That is, if one of the operands is a NaN it will return the other argument.
> If both (all) are NaNs, it will return NaN. This is the same as the semantics of fmin/max
> as far as I can tell.
>
> > /* Minimum and maximum values. When used with floating point, if both
> > operands are zeros, or if either operand is NaN, then it is unspecified
> > which of the two operands is returned as the result. */
> >
> > which means MIN/MAX_EXPR are not strictly IEEE compliant with signed
> > zeros or NaNs.
> > Thus the correct test would be !HONOR_SIGNED_ZEROS && !HONOR_NANS if signed
> > zeros are significant.
>
> True, MIN/MAX_EXPR would not be appropriate in that condition. I guarded their use
> on !HONOR_NANS (type) only. I'll update it to !HONOR_SIGNED_ZEROS (type) && !HONOR_NANS (type).
>
> >
> > I'm not sure if using fmin/max calls when we cannot use MIN/MAX_EXPR
> > is a good idea, this may both generate bigger code and be slower.
>
> The patch will generate fmin/fmax calls (or the fminf,fminl variants) when mathfn_built_in advertises
> them as available (does that mean they'll have a fast inline implementation?)

This doesn't mean anything given you make them available with your
patch ;) So I expect it may cause issues for !c99_runtime targets
(and long double at least).

> If the above doesn't hold and we can't use either MIN/MAX_EXPR or fmin/fmax then the patch falls back
> to the existing expansion.

As said I would not use fmin/fmax calls here at all.

> FWIW, this patch does improve performance on 521.wrf from SPEC2017 on aarch64.

You said that, yes. Even without -ffast-math?

Richard.

> Thanks,
> Kyrill
>
> >
> > Richard.
> >
> >> midend and optimise. The previous approach of generating the open-coded version of that
> >> is used when we don't have an appropriate __builtin_fmin/max available.
> >> For example, for a configuration of x86_64-unknown-linux-gnu that I tested there was no
> >> 128-bit __builtin_fminl available.
> >>
> >> With this patch I'm seeing more than 7000 FMINNM/FMAXNM instructions being generated at -O3
> >> on aarch64 for 521.wrf from fprate SPEC2017 where none before were generated
> >> (we were generating explicit comparisons and NaN checks). This gave a 2.4% improvement
> >> in performance on a Cortex-A72.
> >>
> >> Bootstrapped and tested on aarch64-none-linux-gnu and x86_64-unknown-linux-gnu.
> >>
> >> Ok for trunk?
> >> Thanks,
> >> Kyrill
> >>
> >> 2018-07-17 Kyrylo Tkachov <[hidden email]>
> >>
> >> * f95-lang.c (gfc_init_builtin_functions): Define __builtin_fmin,
> >> __builtin_fminf, __builtin_fminl, __builtin_fmax, __builtin_fmaxf,
> >> __builtin_fmaxl.
> >> * trans-intrinsic.c: Include builtins.h.
> >> (gfc_conv_intrinsic_minmax): Emit __builtin_fmin/max or MIN/MAX_EXPR
> >> functions to calculate the min/max.
> >>
> >> 2018-07-17 Kyrylo Tkachov <[hidden email]>
> >>
> >> * gfortran.dg/max_fmaxf.f90: New test.
> >> * gfortran.dg/min_fminf.f90: Likewise.
> >> * gfortran.dg/minmax_integer.f90: Likewise.
> >> * gfortran.dg/max_fmaxl_aarch64.f90: Likewise.
> >> * gfortran.dg/min_fminl_aarch64.f90: Likewise.
>
On 18/07/18 10:44, Richard Biener wrote:
> On Tue, Jul 17, 2018 at 3:46 PM Kyrill Tkachov
> <[hidden email]> wrote:
>> Hi Richard,
>>
>> On 17/07/18 14:27, Richard Biener wrote:
>>> On Tue, Jul 17, 2018 at 2:35 PM Kyrill Tkachov
>>> <[hidden email]> wrote:
>>>> Hi all,
>>>>
>>>> This is my first Fortran patch, so apologies if I'm missing something.
>>>> The current expansion of the min and max intrinsics explicitly expands
>>>> the comparisons between each argument to calculate the global min/max.
>>>> Some targets, like aarch64, have instructions that can calculate the min/max
>>>> of two real (floating-point) numbers with the proper NaN-handling semantics
>>>> (if both inputs are NaN, return NaN. If one is NaN, return the other) and those
>>>> are the semantics provided by the __builtin_fmin/max family of functions that expand
>>>> to these instructions.
>>>>
>>>> This patch makes the frontend emit __builtin_fmin/max directly to compare each
>>>> pair of numbers when the numbers are floating-point, and use MIN_EXPR/MAX_EXPR otherwise
>>>> (integral types and -ffast-math) which should hopefully be easier to recognise in the
>>> What is Fortran's requirement on min/max intrinsics? Doesn't it only
>>> require things that are guaranteed by MIN/MAX_EXPR anyway? The only
>>> restriction here is
>> The current implementation expands to:
>>     mvar = a1;
>>     if (a2 .op. mvar || isnan (mvar))
>>       mvar = a2;
>>     if (a3 .op. mvar || isnan (mvar))
>>       mvar = a3;
>>     ...
>>     return mvar;
>>
>> That is, if one of the operands is a NaN it will return the other argument.
>> If both (all) are NaNs, it will return NaN. This is the same as the semantics of fmin/max
>> as far as I can tell.
>>
>>> /* Minimum and maximum values. When used with floating point, if both
>>> operands are zeros, or if either operand is NaN, then it is unspecified
>>> which of the two operands is returned as the result. */
>>>
>>> which means MIN/MAX_EXPR are not strictly IEEE compliant with signed
>>> zeros or NaNs.
>>> Thus the correct test would be !HONOR_SIGNED_ZEROS && !HONOR_NANS if signed
>>> zeros are significant.
>> True, MIN/MAX_EXPR would not be appropriate in that condition. I guarded their use
>> on !HONOR_NANS (type) only. I'll update it to !HONOR_SIGNED_ZEROS (type) && !HONOR_NANS (type).
>>
>>
>>> I'm not sure if using fmin/max calls when we cannot use MIN/MAX_EXPR
>>> is a good idea, this may both generate bigger code and be slower.
>> The patch will generate fmin/fmax calls (or the fminf,fminl variants) when mathfn_built_in advertises
>> them as available (does that mean they'll have a fast inline implementation?)
> This doesn't mean anything given you make them available with your
> patch ;) So I expect it may
> cause issues for !c99_runtime targets (and long double at least).

Urgh, that can cause headaches...

>> If the above doesn't hold and we can't use either MIN/MAX_EXPR or fmin/fmax then the patch falls back
>> to the existing expansion.
> As said I would not use fmin/fmax calls here at all.

... Given the comments from Thomas and Janne, maybe we should just emit MIN/MAX_EXPRs here
since there is no language requirement on NaN/signed zero handling on these intrinsics?
That should make it simpler and more portable.

>> FWIW, this patch does improve performance on 521.wrf from SPEC2017 on aarch64.
> You said that, yes. Even without -ffast-math?

It improves at -O3 without -ffast-math in particular. With -ffast-math phiopt optimisation
is more aggressive and merges the conditionals into MIN/MAX_EXPRs (minmax_replacement in tree-ssa-phiopt.c).

Thanks,
Kyrill

> Richard.
>
>> Thanks,
>> Kyrill
>>
>>> Richard.
>>>
>>>> midend and optimise. The previous approach of generating the open-coded version of that
>>>> is used when we don't have an appropriate __builtin_fmin/max available.
>>>> For example, for a configuration of x86_64-unknown-linux-gnu that I tested there was no
>>>> 128-bit __builtin_fminl available.
>>>>
>>>> With this patch I'm seeing more than 7000 FMINNM/FMAXNM instructions being generated at -O3
>>>> on aarch64 for 521.wrf from fprate SPEC2017 where none before were generated
>>>> (we were generating explicit comparisons and NaN checks). This gave a 2.4% improvement
>>>> in performance on a Cortex-A72.
>>>>
>>>> Bootstrapped and tested on aarch64-none-linux-gnu and x86_64-unknown-linux-gnu.
>>>>
>>>> Ok for trunk?
>>>> Thanks,
>>>> Kyrill
>>>>
>>>> 2018-07-17 Kyrylo Tkachov <[hidden email]>
>>>>
>>>> * f95-lang.c (gfc_init_builtin_functions): Define __builtin_fmin,
>>>> __builtin_fminf, __builtin_fminl, __builtin_fmax, __builtin_fmaxf,
>>>> __builtin_fmaxl.
>>>> * trans-intrinsic.c: Include builtins.h.
>>>> (gfc_conv_intrinsic_minmax): Emit __builtin_fmin/max or MIN/MAX_EXPR
>>>> functions to calculate the min/max.
>>>>
>>>> 2018-07-17 Kyrylo Tkachov <[hidden email]>
>>>>
>>>> * gfortran.dg/max_fmaxf.f90: New test.
>>>> * gfortran.dg/min_fminf.f90: Likewise.
>>>> * gfortran.dg/minmax_integer.f90: Likewise.
>>>> * gfortran.dg/max_fmaxl_aarch64.f90: Likewise.
>>>> * gfortran.dg/min_fminl_aarch64.f90: Likewise.
On Wed, Jul 18, 2018 at 11:50 AM Kyrill Tkachov <[hidden email]> wrote:
>
>
> On 18/07/18 10:44, Richard Biener wrote:
> > On Tue, Jul 17, 2018 at 3:46 PM Kyrill Tkachov
> > <[hidden email]> wrote:
> >> Hi Richard,
> >>
> >> On 17/07/18 14:27, Richard Biener wrote:
> >>> On Tue, Jul 17, 2018 at 2:35 PM Kyrill Tkachov
> >>> <[hidden email]> wrote:
> >>>> Hi all,
> >>>>
> >>>> This is my first Fortran patch, so apologies if I'm missing something.
> >>>> The current expansion of the min and max intrinsics explicitly expands
> >>>> the comparisons between each argument to calculate the global min/max.
> >>>> Some targets, like aarch64, have instructions that can calculate the min/max
> >>>> of two real (floating-point) numbers with the proper NaN-handling semantics
> >>>> (if both inputs are NaN, return NaN. If one is NaN, return the other) and those
> >>>> are the semantics provided by the __builtin_fmin/max family of functions that expand
> >>>> to these instructions.
> >>>>
> >>>> This patch makes the frontend emit __builtin_fmin/max directly to compare each
> >>>> pair of numbers when the numbers are floating-point, and use MIN_EXPR/MAX_EXPR otherwise
> >>>> (integral types and -ffast-math) which should hopefully be easier to recognise in the
> >>> What is Fortran's requirement on min/max intrinsics? Doesn't it only
> >>> require things that are guaranteed by MIN/MAX_EXPR anyway? The only
> >>> restriction here is
> >> The current implementation expands to:
> >>     mvar = a1;
> >>     if (a2 .op. mvar || isnan (mvar))
> >>       mvar = a2;
> >>     if (a3 .op. mvar || isnan (mvar))
> >>       mvar = a3;
> >>     ...
> >>     return mvar;
> >>
> >> That is, if one of the operands is a NaN it will return the other argument.
> >> If both (all) are NaNs, it will return NaN. This is the same as the semantics of fmin/max
> >> as far as I can tell.
> >>
> >>> /* Minimum and maximum values. When used with floating point, if both
> >>> operands are zeros, or if either operand is NaN, then it is unspecified
> >>> which of the two operands is returned as the result. */
> >>>
> >>> which means MIN/MAX_EXPR are not strictly IEEE compliant with signed
> >>> zeros or NaNs.
> >>> Thus the correct test would be !HONOR_SIGNED_ZEROS && !HONOR_NANS if signed
> >>> zeros are significant.
> >> True, MIN/MAX_EXPR would not be appropriate in that condition. I guarded their use
> >> on !HONOR_NANS (type) only. I'll update it to !HONOR_SIGNED_ZEROS (type) && !HONOR_NANS (type).
> >>
> >>
> >>> I'm not sure if using fmin/max calls when we cannot use MIN/MAX_EXPR
> >>> is a good idea, this may both generate bigger code and be slower.
> >> The patch will generate fmin/fmax calls (or the fminf,fminl variants) when mathfn_built_in advertises
> >> them as available (does that mean they'll have a fast inline implementation?)
> > This doesn't mean anything given you make them available with your
> > patch ;) So I expect it may
> > cause issues for !c99_runtime targets (and long double at least).
>
> Urgh, that can cause headaches...
>
> >> If the above doesn't hold and we can't use either MIN/MAX_EXPR or fmin/fmax then the patch falls back
> >> to the existing expansion.
> > As said I would not use fmin/fmax calls here at all.
>
> ... Given the comments from Thomas and Janne, maybe we should just emit MIN/MAX_EXPRs here
> since there is no language requirement on NaN/signed zero handling on these intrinsics?
> That should make it simpler and more portable.

That's the Fortran maintainers' call.

> >> FWIW, this patch does improve performance on 521.wrf from SPEC2017 on aarch64.
> > You said that, yes. Even without -ffast-math?
>
> It improves at -O3 without -ffast-math in particular. With -ffast-math phiopt optimisation
> is more aggressive and merges the conditionals into MIN/MAX_EXPRs (minmax_replacement in tree-ssa-phiopt.c)

The question is whether it will be slower without -ffast-math, that is,
when fmin/max() calls are emitted rather than inline conditionals.

I think a patch just using MAX/MIN_EXPR within the existing constraints
and otherwise falling back to the current code would be more obvious,
and other changes should be made independently.

Richard.

> Thanks,
> Kyrill
>
> > Richard.
> >
> >> Thanks,
> >> Kyrill
> >>
> >>> Richard.
> >>>
> >>>> midend and optimise. The previous approach of generating the open-coded version of that
> >>>> is used when we don't have an appropriate __builtin_fmin/max available.
> >>>> For example, for a configuration of x86_64-unknown-linux-gnu that I tested there was no
> >>>> 128-bit __builtin_fminl available.
> >>>>
> >>>> With this patch I'm seeing more than 7000 FMINNM/FMAXNM instructions being generated at -O3
> >>>> on aarch64 for 521.wrf from fprate SPEC2017 where none before were generated
> >>>> (we were generating explicit comparisons and NaN checks). This gave a 2.4% improvement
> >>>> in performance on a Cortex-A72.
> >>>>
> >>>> Bootstrapped and tested on aarch64-none-linux-gnu and x86_64-unknown-linux-gnu.
> >>>>
> >>>> Ok for trunk?
> >>>> Thanks,
> >>>> Kyrill
> >>>>
> >>>> 2018-07-17 Kyrylo Tkachov <[hidden email]>
> >>>>
> >>>> * f95-lang.c (gfc_init_builtin_functions): Define __builtin_fmin,
> >>>> __builtin_fminf, __builtin_fminl, __builtin_fmax, __builtin_fmaxf,
> >>>> __builtin_fmaxl.
> >>>> * trans-intrinsic.c: Include builtins.h.
> >>>> (gfc_conv_intrinsic_minmax): Emit __builtin_fmin/max or MIN/MAX_EXPR
> >>>> functions to calculate the min/max.
> >>>>
> >>>> 2018-07-17 Kyrylo Tkachov <[hidden email]>
> >>>>
> >>>> * gfortran.dg/max_fmaxf.f90: New test.
> >>>> * gfortran.dg/min_fminf.f90: Likewise.
> >>>> * gfortran.dg/minmax_integer.f90: Likewise.
> >>>> * gfortran.dg/max_fmaxl_aarch64.f90: Likewise.
> >>>> * gfortran.dg/min_fminl_aarch64.f90: Likewise.
>
On Wed, Jul 18, 2018 at 4:26 PM, Thomas König <[hidden email]> wrote:

> Hi Kyrill,
>
> > On 18.07.2018 at 13:17, Kyrill Tkachov <[hidden email]> wrote:
> >
> > Thomas, Janne, would this relaxation of NaN handling be acceptable given
> > the benefits mentioned above? If so, what would be the recommended
> > adjustment to the nan_1.f90 test?
>
> I would be a bit careful about changing behavior in such a major way. What
> would the results with NaN and infinity then be, with or without
> optimization? Would the results be consistent with min(nan,num) vs
> min(num,nan)? Would they be consistent with the new IEEE standard?
>

AFAIU, MIN/MAX_EXPR do the right thing when comparing a normal number with
Inf. For NaN the result is undefined, and you might indeed have

min(a, NaN) = a
min(NaN, a) = NaN

where "a" is a normal number.

(I think that happens at least on x86 if MIN_EXPR is expanded to
minsd/minpd.)

Apparently what the proper result for min(a, NaN) should be is contentious
enough that minnum was removed from the upcoming IEEE 754 revision, and the
new operations AFAICS have the semantics

minimum(a, NaN) = minimum(NaN, a) = NaN
minimumNumber(a, NaN) = minimumNumber(NaN, a) = a

That is, minimumNumber corresponds to minnum in IEEE 754-2008 and fmin* in
C, and to the current behavior of gfortran.

> In general, I think that min(nan,num) should be nan and that our current
> behavior is not the best.
>

There was some extensive discussion of that in the Julia bug report I
linked to in an earlier message, and they came to the same conclusion and
changed their behavior.

> Does anybody have data points on how this is handled by other compilers?
>

The only other compiler I have access to at the moment is ifort (and not
the latest version), but maybe somebody has access to a wider variety?

> Oh, and if anything is changed, then compile and runtime behavior should
> always be the same.
>

Well, IFF we place some weight on the runtime behavior being particularly
sensible wrt NaNs, which it wouldn't be if we just use a plain
MIN/MAX_EXPR. Is it worth taking a performance hit for, though? In
particular, if other compilers are inconsistent, we might as well do
whatever is fastest.

--
Janne Blomqvist
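
The three NaN behaviors discussed in this message can be compared side by side in a short C sketch. The function names are hypothetical, the sketch deliberately ignores the signed-zero rules of the IEEE operations, and it is written by hand rather than taken from any library. In `min_compare` the second operand is returned whenever the comparison is false; which operand order ends up producing NaN in compiled code depends on how the operands are mapped onto the instruction.

```c
#include <math.h>

/* Plain comparison: the second operand is returned whenever the
   comparison is false, so the NaN result depends on operand order.  */
double
min_compare (double a, double b)
{
  return a < b ? a : b;
}

/* "minimum"-style NaN handling: any NaN operand makes the result NaN.
   Signed zeros are not treated specially in this sketch.  */
double
min_propagate_nan (double a, double b)
{
  if (isnan (a))
    return a;
  if (isnan (b))
    return b;
  return a < b ? a : b;
}

/* "minimumNumber"-style NaN handling (also what the thread describes
   for fmin and for gfortran at the time): a NaN operand is ignored
   unless both operands are NaN.  */
double
min_prefer_number (double a, double b)
{
  if (isnan (a))
    return b;
  if (isnan (b))
    return a;
  return a < b ? a : b;
}
```

Only `min_compare` gives different answers for (1.0, NaN) and (NaN, 1.0); the other two are symmetric in their arguments, which is the consistency property asked about in the quoted questions.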
Hi Richard,

On 18/07/18 16:27, Richard Sandiford wrote:
> Thanks for doing this.
>
> Kyrill Tkachov <[hidden email]> writes:
>> + calc = build_call_expr_internal_loc (input_location, ifn, type,
>> + 2, mvar, convert (type, val));
> (indentation looks off)
>
>> diff --git a/gcc/testsuite/gfortran.dg/max_fmaxl_aarch64.f90 b/gcc/testsuite/gfortran.dg/max_fmaxl_aarch64.f90
>> new file mode 100644
>> index 0000000000000000000000000000000000000000..8c8ea063e5d0718dc829c1f5574c5b46040e6786
>> --- /dev/null
>> +++ b/gcc/testsuite/gfortran.dg/max_fmaxl_aarch64.f90
>> @@ -0,0 +1,9 @@
>> +! { dg-do compile { target aarch64*-*-* } }
>> +! { dg-options "-O2 -fdump-tree-optimized" }
>> +
>> +subroutine fool (a, b, c, d, e, f, g, h)
>> +  real (kind=16) :: a, b, c, d, e, f, g, h
>> +  a = max (a, b, c, d, e, f, g, h)
>> +end subroutine
>> +
>> +! { dg-final { scan-tree-dump-times "__builtin_fmaxl " 7 "optimized" } }
>> diff --git a/gcc/testsuite/gfortran.dg/min_fminl_aarch64.f90 b/gcc/testsuite/gfortran.dg/min_fminl_aarch64.f90
>> new file mode 100644
>> index 0000000000000000000000000000000000000000..92368917fb48e0c468a16d080ab3a9ac842e01a7
>> --- /dev/null
>> +++ b/gcc/testsuite/gfortran.dg/min_fminl_aarch64.f90
>> @@ -0,0 +1,9 @@
>> +! { dg-do compile { target aarch64*-*-* } }
>> +! { dg-options "-O2 -fdump-tree-optimized" }
>> +
>> +subroutine fool (a, b, c, d, e, f, g, h)
>> +  real (kind=16) :: a, b, c, d, e, f, g, h
>> +  a = min (a, b, c, d, e, f, g, h)
>> +end subroutine
>> +
>> +! { dg-final { scan-tree-dump-times "__builtin_fminl " 7 "optimized" } }
> Do these still pass? I wouldn't have expected us to use __builtin_fmin*
> and __builtin_fmax* now.
>
> It would be good to have tests that we use ".FMIN" and ".FMAX" for kind=4
> and kind=8 on AArch64, since that's really the end goal here.

I've fixed that and the indentation issue in this small revision.
Given Janne's comments I will commit this tomorrow if there are no objections.
This patch should be a conservative improvement. If the Fortran folks decide to sacrifice the
more predictable NaN handling in favour of more optimisation leeway by using MIN/MAX_EXPR unconditionally
we can do that as a follow-up.

Thanks for the help,
Kyrill

2018-07-18 Kyrylo Tkachov <[hidden email]>

* trans-intrinsic.c: (gfc_conv_intrinsic_minmax): Emit MIN_MAX_EXPR or
IFN_FMIN/FMAX sequence to calculate the min/max when possible.

2018-07-18 Kyrylo Tkachov <[hidden email]>

* gfortran.dg/max_fmax_aarch64.f90: New test.
* gfortran.dg/min_fmin_aarch64.f90: Likewise.
* gfortran.dg/minmax_integer.f90: Likewise.

Attachment: fort-v4.patch (7K)
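
One possible reading of the selection logic discussed across this thread is sketched below in C. This is an assumption pieced together from the messages (MIN/MAX_EXPR when NaNs and signed zeros need not be honored, the FMIN/FMAX internal functions when the target supports them, the existing open-coded expansion otherwise). It is not taken from fort-v4.patch, and every type and function name in it is hypothetical.

```c
#include <stdbool.h>

/* Hypothetical names throughout; this is not the code from the patch.  */
enum minmax_strategy
{
  USE_MINMAX_EXPR,   /* plain MIN_EXPR / MAX_EXPR */
  USE_IFN_FMINMAX,   /* FMIN / FMAX internal functions */
  USE_OPEN_CODED     /* existing comparison + isnan expansion */
};

struct minmax_query
{
  bool is_float;            /* operands are floating-point */
  bool honor_nans;          /* NaNs must be honored for the type */
  bool honor_signed_zeros;  /* signed zeros must be honored for the type */
  bool ifn_supported;       /* target supports FMIN/FMAX directly */
};

enum minmax_strategy
choose_minmax_strategy (struct minmax_query q)
{
  /* Integers, or floats where neither NaNs nor signed zeros matter.  */
  if (!q.is_float || (!q.honor_nans && !q.honor_signed_zeros))
    return USE_MINMAX_EXPR;
  /* Otherwise prefer a direct fmin/fmax-style operation if available.  */
  if (q.ifn_supported)
    return USE_IFN_FMINMAX;
  /* Fall back to the open-coded sequence.  */
  return USE_OPEN_CODED;
}
```

Under this reading, a default floating-point compile on a target without a suitable instruction keeps the existing open-coded sequence, which would be consistent with the patch being described as a conservative improvement.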