Re: [PATCH][Fortran] Use MIN/MAX_EXPR for intrinsics or __builtin_fmin/max when appropriate

Re: [PATCH][Fortran] Use MIN/MAX_EXPR for intrinsics or __builtin_fmin/max when appropriate

Richard Biener-2
On Tue, Jul 17, 2018 at 2:35 PM Kyrill Tkachov
<[hidden email]> wrote:

>
> Hi all,
>
> This is my first Fortran patch, so apologies if I'm missing something.
> The current expansion of the min and max intrinsics explicitly expands
> the comparisons between each argument to calculate the global min/max.
> Some targets, like aarch64, have instructions that can calculate the min/max
> of two real (floating-point) numbers with the proper NaN-handling semantics
> (if both inputs are NaN, return NaN. If one is NaN, return the other) and those
> are the semantics provided by the __builtin_fmin/max family of functions that expand
> to these instructions.
>
> This patch makes the frontend emit __builtin_fmin/max directly to compare each
> pair of numbers when the numbers are floating-point, and use MIN_EXPR/MAX_EXPR otherwise
> (integral types and -ffast-math) which should hopefully be easier to recognise in the

What is Fortran's requirement on the min/max intrinsics?  Doesn't it only
require things that are guaranteed by MIN/MAX_EXPR anyway?  The only
restriction here is

/* Minimum and maximum values.  When used with floating point, if both
   operands are zeros, or if either operand is NaN, then it is unspecified
   which of the two operands is returned as the result.  */

which means MIN/MAX_EXPR are not strictly IEEE compliant with signed
zeros or NaNs.
Thus the correct test would be !HONOR_SIGNED_ZEROS && !HONOR_NANS if signed
zeros are significant.
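
To make the signed-zero concern concrete, here is a minimal C sketch. It assumes a plain compare-and-select minimum (the helper name min_select is hypothetical and is not what GCC emits for MIN_EXPR); it only shows that such a selection can return either zero depending on operand order, which is the freedom the tree.def comment above allows.

```c
#include <math.h>

/* Hypothetical helper: a plain compare-and-select minimum.  This is an
   illustration only, not the code GCC generates for MIN_EXPR.  */
static double min_select(double a, double b)
{
    return b < a ? b : a;
}

/* Returns 1 when swapping +0.0 and -0.0 changes the sign of the result.
   +0.0 and -0.0 compare equal, so the comparison is false either way
   and the first operand is returned unchanged.  */
static int zero_sign_depends_on_order(void)
{
    double r1 = min_select(0.0, -0.0);   /* yields +0.0 */
    double r2 = min_select(-0.0, 0.0);   /* yields -0.0 */
    return (signbit(r1) != 0) != (signbit(r2) != 0);
}
```

Code that later divides by the result, or inspects its sign, would see different behaviour for the two operand orders, which is why the guard needs to include !HONOR_SIGNED_ZEROS.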

I'm not sure that using fmin/max calls when we cannot use MIN/MAX_EXPR
is a good idea; it may both generate bigger code and be slower.

Richard.

> midend and optimise. The previous approach of generating the open-coded version of that
> is used when we don't have an appropriate __builtin_fmin/max available.
> For example, for a configuration of x86_64-unknown-linux-gnu that I tested there was no
> 128-bit __builtin_fminl available.
>
> With this patch I'm seeing more than 7000 FMINNM/FMAXNM instructions being generated at -O3
> on aarch64 for 521.wrf from fprate SPEC2017 where none before were generated
> (we were generating explicit comparisons and NaN checks). This gave a 2.4% improvement
> in performance on a Cortex-A72.
>
> Bootstrapped and tested on aarch64-none-linux-gnu and x86_64-unknown-linux-gnu.
>
> Ok for trunk?
> Thanks,
> Kyrill
>
> 2018-07-17  Kyrylo Tkachov  <[hidden email]>
>
>      * f95-lang.c (gfc_init_builtin_functions): Define __builtin_fmin,
>      __builtin_fminf, __builtin_fminl, __builtin_fmax, __builtin_fmaxf,
>      __builtin_fmaxl.
>      * trans-intrinsic.c: Include builtins.h.
>      (gfc_conv_intrinsic_minmax): Emit __builtin_fmin/max or MIN/MAX_EXPR
>      functions to calculate the min/max.
>
> 2018-07-17  Kyrylo Tkachov  <[hidden email]>
>
>      * gfortran.dg/max_fmaxf.f90: New test.
>      * gfortran.dg/min_fminf.f90: Likewise.
>      * gfortran.dg/minmax_integer.f90: Likewise.
>      * gfortran.dg/max_fmaxl_aarch64.f90: Likewise.
>      * gfortran.dg/min_fminl_aarch64.f90: Likewise.

Re: [PATCH][Fortran] Use MIN/MAX_EXPR for intrinsics or __builtin_fmin/max when appropriate

Thomas Koenig-6
Hi Kyrill,

> The current implementation expands to:
>      mvar = a1;
>      if (a2 .op. mvar || isnan (mvar))
>        mvar = a2;
>      if (a3 .op. mvar || isnan (mvar))
>        mvar = a3;
>      ...
>      return mvar;
>
> That is, if one of the operands is a NaN it will return the other argument.
> If both (all) are NaNs, it will return NaN. This is the same as the
> semantics of fmin/max
> as far as I can tell.
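
For readers who want to check the claim about NaN handling, the quoted expansion can be transcribed into C roughly as follows. This is a sketch under assumptions: ".op." is taken to be ">" (the MAX case), the arguments are passed as an array, and the function name max_open_coded is made up for the illustration.

```c
#include <math.h>
#include <stddef.h>

/* Open-coded MAX over n >= 1 values, following the quoted pseudocode
   with ".op." taken to be ">".  Hypothetical name, illustration only.  */
static double max_open_coded(const double *a, size_t n)
{
    double mvar = a[0];
    for (size_t i = 1; i < n; i++) {
        /* A NaN in a[i] makes the comparison false, so it is skipped;
           a NaN already held in mvar is replaced by the next argument.  */
        if (a[i] > mvar || isnan(mvar))
            mvar = a[i];
    }
    return mvar;
}
```

With this loop, a NaN among the arguments is ignored as long as at least one argument is a number, and the result is NaN only when every argument is NaN.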

I've looked at the F2008 standard, and, interestingly enough, the
requirements on MIN and MAX do not mention NaNs at all. 13.7.106
has, for MAX,

Result Value. The value of the result is that of the largest argument.

plus some stuff about character variables (not relevant here).  Similar
for MIN.

Also, the section on IEEE_ARITHMETIC (14.9) does not mention
comparisons; also, "Complete conformance with IEC 60559:1989 is not
required", what is required is the correct support for +,-, and *,
plus support for / if IEEE_SUPPORT_DIVIDE is covered.

So, the Fortran standard does not impose many requirements. I do think
that a patch such as yours should not change the current behavior unless
we know what it does and do think it is a good idea.  Hmm...

Having said that, I think we pretty much cover all the corner cases
in nan_1.f90, so if that test passes without regression, then that
aspect should be fine.

Question: You have found an advantage on AArch64. Do you have
access to other architectures to see if there is also a speed
advantage, or maybe a disadvantage?

Regards

        Thomas

Re: [PATCH][Fortran] Use MIN/MAX_EXPR for intrinsics or __builtin_fmin/max when appropriate

Thomas Koenig-6
Hi Kyrill,

> Because the expansion now emits straightline code rather than
> conditionals and branches
> it should be easier to optimise in general, so I'd expect this to be an
> improvement overall.
> That said, I have benchmarked it on SPEC2017 on aarch64.

> If you have any benchmarks of interest to you that you (or somebody else) can
> run on a target that you
> care about I would be very grateful for any results.

Well, most people currently use x86_64 for scientific computing, so I
would be concerned most about this architecture. As for the test case,
min / max performance clearly has an effect on 521.wrf, so this would
be ideal.

If you could run 521.wrf on x86_64, and find that it does not
regress measurably (or even shows an improvement), the patch is OK.
I'd be interested in the timings you get.

Regards

        Thomas

Re: [PATCH][Fortran] Use MIN/MAX_EXPR for intrinsics or __builtin_fmin/max when appropriate

Janne Blomqvist-3
In reply to this post by Thomas Koenig-6
On Tue, Jul 17, 2018 at 11:06 PM, Janne Blomqvist <[hidden email]> wrote:

> On Tue, Jul 17, 2018 at 6:36 PM, Thomas Koenig <[hidden email]>
> wrote:
>
>> Hi Kyrill,
>>
>> The current implementation expands to:
>>>      mvar = a1;
>>>      if (a2 .op. mvar || isnan (mvar))
>>>        mvar = a2;
>>>      if (a3 .op. mvar || isnan (mvar))
>>>        mvar = a3;
>>>      ...
>>>      return mvar;
>>>
>>> That is, if one of the operands is a NaN it will return the other
>>> argument.
>>> If both (all) are NaNs, it will return NaN. This is the same as the
>>> semantics of fmin/max
>>> as far as I can tell.
>>>
>>
>> I've looked at the F2008 standard, and, interestingly enough, the
>> requirements on MIN and MAX do not mention NaNs at all. 13.7.106
>> has, for MAX,
>>
>> Result Value. The value of the result is that of the largest argument.
>>
>> plus some stuff about character variables (not relevant here).  Similar
>> for MIN.
>>
>
> FWIW, this has not changed in the latest(?) draft for F2018 (N2146), see
> 16.9.125.
>
> Also, the section on IEEE_ARITHMETIC (14.9) does not mention
>> comparisons; also, "Complete conformance with IEC 60559:1989 is not
>> required", what is required is the correct support for +,-, and *,
>> plus support for / if IEEE_SUPPORT_DIVIDE is covered.
>>
>
> Interestingly, here the F2018 draft has new intrinsics in the
> IEEE_ARITHMETIC module, IEEE_MAX_NUM, IEEE_MAX_NUM_MAG, IEEE_MIN_NUM,
> IEEE_MIN_NUM_MAG. These correspond to the {max,min}num{,_mag} operations in
> IEEE 754-2008, which AFAICT have the same NaN semantics as __builtin_fmax
> etc.
>
>
>> So, the Fortran standard does not impose many requirements.
>
>
> If so, why don't we just use {MAX,MIN}_EXPR unconditionally? Those who
> worry about the behavior wrt. NaNs, infinities etc. can use the intrinsics
> from IEEE_ARITHMETIC?
>
>
> This thread also has some interesting discussion on the topic:
> https://github.com/JuliaLang/julia/issues/7866
>

Oh, and on http://754r.ucbtest.org/ there is information about the next
update after IEEE 754-2008. In particular,
http://754r.ucbtest.org/changes.html notes that the above mentioned
{max,min}num{,_mag} have been deleted, and "new
{min,max}imum{,Number,Magnitude,MagnitudeNumber} operations are
recommended; NaN and signed zero handling are changed from 754-2008 5.3.1."


--
Janne Blomqvist

Re: [PATCH][Fortran] Use MIN/MAX_EXPR for intrinsics or __builtin_fmin/max when appropriate

Richard Biener-2
In reply to this post by Richard Biener-2
On Tue, Jul 17, 2018 at 3:46 PM Kyrill Tkachov
<[hidden email]> wrote:

>
> Hi Richard,
>
> On 17/07/18 14:27, Richard Biener wrote:
> > On Tue, Jul 17, 2018 at 2:35 PM Kyrill Tkachov
> > <[hidden email]> wrote:
> >> Hi all,
> >>
> >> This is my first Fortran patch, so apologies if I'm missing something.
> >> The current expansion of the min and max intrinsics explicitly expands
> >> the comparisons between each argument to calculate the global min/max.
> >> Some targets, like aarch64, have instructions that can calculate the min/max
> >> of two real (floating-point) numbers with the proper NaN-handling semantics
> >> (if both inputs are NaN, return NaN. If one is NaN, return the other) and those
> >> are the semantics provided by the __builtin_fmin/max family of functions that expand
> >> to these instructions.
> >>
> >> This patch makes the frontend emit __builtin_fmin/max directly to compare each
> >> pair of numbers when the numbers are floating-point, and use MIN_EXPR/MAX_EXPR otherwise
> >> (integral types and -ffast-math) which should hopefully be easier to recognise in the
> > What is Fortran's requirement on min/max intrinsics?  Doesn't it only
> > require things that
> > are guaranteed by MIN/MAX_EXPR anyways?  The only restriction here is
>
> The current implementation expands to:
>      mvar = a1;
>      if (a2 .op. mvar || isnan (mvar))
>        mvar = a2;
>      if (a3 .op. mvar || isnan (mvar))
>        mvar = a3;
>      ...
>      return mvar;
>
> That is, if one of the operands is a NaN it will return the other argument.
> If both (all) are NaNs, it will return NaN. This is the same as the semantics of fmin/max
> as far as I can tell.
>
> > /* Minimum and maximum values.  When used with floating point, if both
> >     operands are zeros, or if either operand is NaN, then it is unspecified
> >     which of the two operands is returned as the result.  */
> >
> > which means MIN/MAX_EXPR are not strictly IEEE compliant with signed
> > zeros or NaNs.
> > Thus the correct test would be !HONOR_SIGNED_ZEROS && !HONOR_NANS if signed
> > zeros are significant.
>
> True, MIN/MAX_EXPR would not be appropriate in that condition. I guarded their use
> on !HONOR_NANS (type) only. I'll update it to !HONOR_SIGNED_ZEROS (type) && !HONOR_NANS (type).
>
>
> >
> > I'm not sure if using fmin/max calls when we cannot use MIN/MAX_EXPR
> > is a good idea,
> > this may both generate bigger code and be slower.
>
> The patch will generate fmin/fmax calls (or the fminf,fminl variants) when mathfn_built_in advertises
> them as available (does that mean they'll have a fast inline implementation?)

This doesn't mean anything given you make them available with your
patch ;)  So I expect it may cause issues for !c99_runtime targets
(and long double at least).

> If the above doesn't hold and we can't use either MIN/MAX_EXPR or fmin/fmax then the patch falls back
> to the existing expansion.

As said I would not use fmin/fmax calls here at all.

> FWIW, this patch does improve performance on 521.wrf from SPEC2017 on aarch64.

You said that, yes.  Even without -ffast-math?

Richard.


Re: [PATCH][Fortran] Use MIN/MAX_EXPR for intrinsics or __builtin_fmin/max when appropriate

Kyrill  Tkachov-2

On 18/07/18 10:44, Richard Biener wrote:

> On Tue, Jul 17, 2018 at 3:46 PM Kyrill Tkachov
> <[hidden email]> wrote:
>> Hi Richard,
>>
>> On 17/07/18 14:27, Richard Biener wrote:
>>> On Tue, Jul 17, 2018 at 2:35 PM Kyrill Tkachov
>>> <[hidden email]> wrote:
>>>> Hi all,
>>>>
>>>> This is my first Fortran patch, so apologies if I'm missing something.
>>>> The current expansion of the min and max intrinsics explicitly expands
>>>> the comparisons between each argument to calculate the global min/max.
>>>> Some targets, like aarch64, have instructions that can calculate the min/max
>>>> of two real (floating-point) numbers with the proper NaN-handling semantics
>>>> (if both inputs are NaN, return NaN. If one is NaN, return the other) and those
>>>> are the semantics provided by the __builtin_fmin/max family of functions that expand
>>>> to these instructions.
>>>>
>>>> This patch makes the frontend emit __builtin_fmin/max directly to compare each
>>>> pair of numbers when the numbers are floating-point, and use MIN_EXPR/MAX_EXPR otherwise
>>>> (integral types and -ffast-math) which should hopefully be easier to recognise in the
>>> What is Fortran's requirement on min/max intrinsics?  Doesn't it only
>>> require things that
>>> are guaranteed by MIN/MAX_EXPR anyways?  The only restriction here is
>> The current implementation expands to:
>>       mvar = a1;
>>       if (a2 .op. mvar || isnan (mvar))
>>         mvar = a2;
>>       if (a3 .op. mvar || isnan (mvar))
>>         mvar = a3;
>>       ...
>>       return mvar;
>>
>> That is, if one of the operands is a NaN it will return the other argument.
>> If both (all) are NaNs, it will return NaN. This is the same as the semantics of fmin/max
>> as far as I can tell.
>>
>>> /* Minimum and maximum values.  When used with floating point, if both
>>>      operands are zeros, or if either operand is NaN, then it is unspecified
>>>      which of the two operands is returned as the result.  */
>>>
>>> which means MIN/MAX_EXPR are not strictly IEEE compliant with signed
>>> zeros or NaNs.
>>> Thus the correct test would be !HONOR_SIGNED_ZEROS && !HONOR_NANS if signed
>>> zeros are significant.
>> True, MIN/MAX_EXPR would not be appropriate in that condition. I guarded their use
>> on !HONOR_NANS (type) only. I'll update it to !HONOR_SIGNED_ZEROS (type) && !HONOR_NANS (type).
>>
>>
>>> I'm not sure if using fmin/max calls when we cannot use MIN/MAX_EXPR
>>> is a good idea,
>>> this may both generate bigger code and be slower.
>> The patch will generate fmin/fmax calls (or the fminf,fminl variants) when mathfn_built_in advertises
>> them as available (does that mean they'll have a fast inline implementation?)
> This doesn't mean anything given you make them available with your
> patch ;)  So I expect it may
> cause issues for !c99_runtime targets (and long double at least).

Urgh, that can cause headaches...

>> If the above doesn't hold and we can't use either MIN/MAX_EXPR or fmin/fmax then the patch falls back
>> to the existing expansion.
> As said I would not use fmin/fmax calls here at all.

... Given the comments from Thomas and Janne, maybe we should just emit MIN/MAX_EXPRs here
since there is no language requirement on NaN/signed zero handling on these intrinsics?
That should make it simpler and more portable.

>> FWIW, this patch does improve performance on 521.wrf from SPEC2017 on aarch64.
> You said that, yes.  Even without -ffast-math?

It improves at -O3 without -ffast-math in particular. With -ffast-math phiopt optimisation
is more aggressive and merges the conditionals into MIN/MAX_EXPRs (minmax_replacement in tree-ssa-phiopt.c)

Thanks,
Kyrill



Re: [PATCH][Fortran] Use MIN/MAX_EXPR for intrinsics or __builtin_fmin/max when appropriate

Richard Biener-2
On Wed, Jul 18, 2018 at 11:50 AM Kyrill Tkachov
<[hidden email]> wrote:

>
>
> On 18/07/18 10:44, Richard Biener wrote:
> > On Tue, Jul 17, 2018 at 3:46 PM Kyrill Tkachov
> > <[hidden email]> wrote:
> >> Hi Richard,
> >>
> >> On 17/07/18 14:27, Richard Biener wrote:
> >>> On Tue, Jul 17, 2018 at 2:35 PM Kyrill Tkachov
> >>> <[hidden email]> wrote:
> >>>> Hi all,
> >>>>
> >>>> This is my first Fortran patch, so apologies if I'm missing something.
> >>>> The current expansion of the min and max intrinsics explicitly expands
> >>>> the comparisons between each argument to calculate the global min/max.
> >>>> Some targets, like aarch64, have instructions that can calculate the min/max
> >>>> of two real (floating-point) numbers with the proper NaN-handling semantics
> >>>> (if both inputs are NaN, return NaN. If one is NaN, return the other) and those
> >>>> are the semantics provided by the __builtin_fmin/max family of functions that expand
> >>>> to these instructions.
> >>>>
> >>>> This patch makes the frontend emit __builtin_fmin/max directly to compare each
> >>>> pair of numbers when the numbers are floating-point, and use MIN_EXPR/MAX_EXPR otherwise
> >>>> (integral types and -ffast-math) which should hopefully be easier to recognise in the
> >>> What is Fortran's requirement on min/max intrinsics?  Doesn't it only
> >>> require things that
> >>> are guaranteed by MIN/MAX_EXPR anyways?  The only restriction here is
> >> The current implementation expands to:
> >>       mvar = a1;
> >>       if (a2 .op. mvar || isnan (mvar))
> >>         mvar = a2;
> >>       if (a3 .op. mvar || isnan (mvar))
> >>         mvar = a3;
> >>       ...
> >>       return mvar;
> >>
> >> That is, if one of the operands is a NaN it will return the other argument.
> >> If both (all) are NaNs, it will return NaN. This is the same as the semantics of fmin/max
> >> as far as I can tell.
> >>
> >>> /* Minimum and maximum values.  When used with floating point, if both
> >>>      operands are zeros, or if either operand is NaN, then it is unspecified
> >>>      which of the two operands is returned as the result.  */
> >>>
> >>> which means MIN/MAX_EXPR are not strictly IEEE compliant with signed
> >>> zeros or NaNs.
> >>> Thus the correct test would be !HONOR_SIGNED_ZEROS && !HONOR_NANS if signed
> >>> zeros are significant.
> >> True, MIN/MAX_EXPR would not be appropriate in that condition. I guarded their use
> >> on !HONOR_NANS (type) only. I'll update it to !HONOR_SIGNED_ZEROS (type) && !HONOR_NANS (type).
> >>
> >>
> >>> I'm not sure if using fmin/max calls when we cannot use MIN/MAX_EXPR
> >>> is a good idea,
> >>> this may both generate bigger code and be slower.
> >> The patch will generate fmin/fmax calls (or the fminf,fminl variants) when mathfn_built_in advertises
> >> them as available (does that mean they'll have a fast inline implementation?)
> > This doesn't mean anything given you make them available with your
> > patch ;)  So I expect it may
> > cause issues for !c99_runtime targets (and long double at least).
>
> Urgh, that can cause headaches...
>
> >> If the above doesn't hold and we can't use either MIN/MAX_EXPR or fmin/fmax then the patch falls back
> >> to the existing expansion.
> > As said I would not use fmin/fmax calls here at all.
>
> ... Given the comments from Thomas and Janne, maybe we should just emit MIN/MAX_EXPRs here
> since there is no language requirement on NaN/signed zero handling on these intrinsics?
> That should make it simpler and more portable.

That's the Fortran maintainers' call.

> >> FWIW, this patch does improve performance on 521.wrf from SPEC2017 on aarch64.
> > You said that, yes.  Even without -ffast-math?
>
> It improves at -O3 without -ffast-math in particular. With -ffast-math phiopt optimisation
> is more aggressive and merges the conditionals into MIN/MAX_EXPRs (minmax_replacement in tree-ssa-phiopt.c)

The question is whether it will be slower without -ffast-math, that is,
when fmin/max() calls are emitted rather than inline conditionals.

I think a patch just using MAX/MIN_EXPR within the existing
constraints and otherwise falling back to the current code would be
more obvious, and other changes should be made independently.

Richard.


Re: [PATCH][Fortran][v2] Use MIN/MAX_EXPR for min/max intrinsics

Janne Blomqvist-3
In reply to this post by Janne Blomqvist-3
On Wed, Jul 18, 2018 at 4:26 PM, Thomas König <[hidden email]> wrote:

> Hi Kyrill,
>
> > On 18.07.2018 at 13:17, Kyrill Tkachov <[hidden email]> wrote:
> >
> > Thomas, Janne, would this relaxation of NaN handling be acceptable given
> > the benefits mentioned above? If so, what would be the recommended
> > adjustment to the nan_1.f90 test?
>
> I would be a bit careful about changing behavior in such a major way. What
> would the results with NaN and infinity then be, with or without
> optimization? Would the results be consistent with min(nan,num) vs
> min(num,nan)? Would they be consistent with the new IEEE standard?
>

AFAIU, MIN/MAX_EXPR do the right thing when comparing a normal number with
Inf. For NaN the result is undefined, and you might indeed have

min(a, NaN) = a
min(NaN, a) = NaN

where "a" is a normal number.

(I think that happens at least on x86 if MIN_EXPR is expanded to
minsd/minpd.)

Apparently what the proper result for min(a, NaN) should be is contentious
enough that minnum was removed from the upcoming IEEE 754 revision, and new
operations AFAICS have the semantics

minimum(a, NaN) = minimum(NaN, a) = NaN
minimumNumber(a, NaN) = minimumNumber(NaN, a) = a

That is, minimumNumber corresponds to minnum in IEEE 754-2008 and fmin* in
C, and to the current behavior of gfortran.
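
The three behaviours discussed here can be put side by side in a small C sketch. The function names are hypothetical, signed-zero ordering is deliberately not modelled, and min_select is only one possible compare-and-select form (chosen because it reproduces the asymmetry listed above), not a statement about what any particular target emits for MIN_EXPR.

```c
#include <math.h>

/* Compare-and-select: with a NaN operand the comparison is false, so the
   first operand is returned and the result depends on operand order.  */
static double min_select(double a, double b)
{
    return b < a ? b : a;
}

/* NaN-propagating, in the spirit of "minimum": any NaN operand gives NaN.  */
static double minimum_like(double a, double b)
{
    if (isnan(a)) return a;
    if (isnan(b)) return b;
    return b < a ? b : a;
}

/* NaN-ignoring, in the spirit of "minimumNumber" (and of fmin in C):
   a NaN operand is dropped unless both operands are NaN.  */
static double minimum_number_like(double a, double b)
{
    if (isnan(a)) return b;
    if (isnan(b)) return a;
    return b < a ? b : a;
}
```

Only the last two are symmetric in their arguments; the first gives min(a, NaN) = a but min(NaN, a) = NaN, which is the inconsistency Thomas asked about.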


> In general, I think that min(nan,num) should be nan and that our current
> behavior is not the best.
>

There was some extensive discussion of that in the Julia bug report I
linked to in an earlier message, and they came to the same conclusion and
changed their behavior.


> Does anybody have data points on how this is handled by other compilers?
>

The only other compiler I have access to at the moment is ifort (and not
the latest version), but maybe somebody has access to a wider variety?


> Oh, and if anything is changed, then compile and runtime behavior should
> always be the same.
>

Well, IFF we place some weight on the runtime behavior being particularly
sensible wrt NaN's, which it wouldn't be if we just use a plain
MIN/MAX_EXPR. Is it worth taking a performance hit for, though? In
particular, if other compilers are inconsistent, we might as well do
whatever is fastest.


--
Janne Blomqvist

Re: [PATCH][Fortran][v2] Use MIN/MAX_EXPR for min/max intrinsics

Kyrill  Tkachov-2
In reply to this post by Janne Blomqvist-3
Hi Richard,

On 18/07/18 16:27, Richard Sandiford wrote:

> Thanks for doing this.
>
> Kyrill  Tkachov <[hidden email]> writes:
>> +  calc = build_call_expr_internal_loc (input_location, ifn, type,
>> +      2, mvar, convert (type, val));
> (indentation looks off)
>
>> diff --git a/gcc/testsuite/gfortran.dg/max_fmaxl_aarch64.f90 b/gcc/testsuite/gfortran.dg/max_fmaxl_aarch64.f90
>> new file mode 100644
>> index 0000000000000000000000000000000000000000..8c8ea063e5d0718dc829c1f5574c5b46040e6786
>> --- /dev/null
>> +++ b/gcc/testsuite/gfortran.dg/max_fmaxl_aarch64.f90
>> @@ -0,0 +1,9 @@
>> +! { dg-do compile { target aarch64*-*-* } }
>> +! { dg-options "-O2 -fdump-tree-optimized" }
>> +
>> +subroutine fool (a, b, c, d, e, f, g, h)
>> +  real (kind=16) :: a, b, c, d, e, f, g, h
>> +  a = max (a, b, c, d, e, f, g, h)
>> +end subroutine
>> +
>> +! { dg-final { scan-tree-dump-times "__builtin_fmaxl " 7 "optimized" } }
>> diff --git a/gcc/testsuite/gfortran.dg/min_fminl_aarch64.f90 b/gcc/testsuite/gfortran.dg/min_fminl_aarch64.f90
>> new file mode 100644
>> index 0000000000000000000000000000000000000000..92368917fb48e0c468a16d080ab3a9ac842e01a7
>> --- /dev/null
>> +++ b/gcc/testsuite/gfortran.dg/min_fminl_aarch64.f90
>> @@ -0,0 +1,9 @@
>> +! { dg-do compile { target aarch64*-*-* } }
>> +! { dg-options "-O2 -fdump-tree-optimized" }
>> +
>> +subroutine fool (a, b, c, d, e, f, g, h)
>> +  real (kind=16) :: a, b, c, d, e, f, g, h
>> +  a = min (a, b, c, d, e, f, g, h)
>> +end subroutine
>> +
>> +! { dg-final { scan-tree-dump-times "__builtin_fminl " 7 "optimized" } }
> Do these still pass?  I wouldn't have expected us to use __builtin_fmin*
> and __builtin_fmax* now.
>
> It would be good to have tests that we use ".FMIN" and ".FMAX" for kind=4
> and kind=8 on AArch64, since that's really the end goal here.
Doh, yes. I had spotted that myself after I had sent out the patch.
I've fixed that and the indentation issue in this small revision.

Given Janne's comments I will commit this tomorrow if there are no objections.
This patch should be a conservative improvement. If the Fortran folks decide
to sacrifice the more predictable NaN handling in favour of more optimisation
leeway by using MIN/MAX_EXPR unconditionally we can do that as a follow-up.
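
Read together with the earlier messages, the selection logic being described can be summarised by the following C sketch. Everything in it is an assumption made for illustration: the enum, the function name and the boolean parameters are hypothetical, the real patch works on GCC trees in gfc_conv_intrinsic_minmax, and the exact condition under which IFN_FMIN/FMAX is chosen is not spelled out in this thread (a "target supports it" flag is assumed here).

```c
/* Hypothetical names throughout; this is not GCC's actual interface.  */
enum minmax_strategy {
    USE_MINMAX_EXPR,   /* plain MIN_EXPR / MAX_EXPR */
    USE_IFN_FMINMAX,   /* IFN_FMIN / IFN_FMAX internal function */
    USE_OPEN_CODED     /* existing comparisons plus isnan checks */
};

static enum minmax_strategy
choose_minmax_strategy(int is_integer, int honor_nans,
                       int honor_signed_zeros, int target_has_fminmax)
{
    /* Integers, or floats where neither NaNs nor signed zeros matter
       (for example under -ffast-math), can use MIN/MAX_EXPR.  */
    if (is_integer || (!honor_nans && !honor_signed_zeros))
        return USE_MINMAX_EXPR;
    /* Assumed condition: the target can expand FMIN/FMAX directly.  */
    if (target_has_fminmax)
        return USE_IFN_FMINMAX;
    /* Otherwise keep the current expansion and its NaN behaviour.  */
    return USE_OPEN_CODED;
}
```

Under these assumptions the observable NaN behaviour stays the same in all three branches where NaNs are honoured, which is what makes the change conservative.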

Thanks for the help,
Kyrill

2018-07-18  Kyrylo Tkachov  <[hidden email]>

     * trans-intrinsic.c (gfc_conv_intrinsic_minmax): Emit MIN/MAX_EXPR
     or IFN_FMIN/FMAX sequence to calculate the min/max when possible.

2018-07-18  Kyrylo Tkachov  <[hidden email]>

     * gfortran.dg/max_fmax_aarch64.f90: New test.
     * gfortran.dg/min_fmin_aarch64.f90: Likewise.
     * gfortran.dg/minmax_integer.f90: Likewise.


Attachment: fort-v4.patch (7K)