[PATCH][ARM] Update max_cond_insns settings

classic Classic list List threaded Threaded
11 messages Options
Reply | Threaded
Open this post in threaded view
|

[PATCH][ARM] Update max_cond_insns settings

Wilco Dijkstra-2
The existing setting of max_cond_insns for most cores is non-optimal.
Thumb-2 IT has a maximum limit of 4, so 5 means emitting 2 IT sequences.
Also such long sequences of conditional instructions can increase the number
of executed instructions significantly, so using 5 for max_cond_insns is
non-optimal.

Previous benchmarking showed that setting max_cond_insn to 2 was the best value
for Cortex-A15 and Cortex-A57.  All ARMv8-A cores use 2 - apart from Cortex-A35
and Cortex-A53.  Given that using 5 is worse, set it to 2.  This also has the
advantage of producing more uniform code.

Bootstrap and regress OK on arm-none-linux-gnueabihf.

OK for stage 1?

ChangeLog:
2017-04-12  Wilco Dijkstra  <[hidden email]>

        * gcc/config/arm/arm.c (arm_cortex_a53_tune): Set max_cond_insns to 2.
        (arm_cortex_a35_tune): Likewise.
---

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 29e8d1d07d918fbb2a627a653510dfc8587ee01a..1a6d552aa322114795acbb3667c6ea36963bf193 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -1967,7 +1967,7 @@ const struct tune_params arm_cortex_a35_tune =
   arm_default_branch_cost,
   &arm_default_vec_cost,
   1, /* Constant limit.  */
-  5, /* Max cond insns.  */
+  2, /* Max cond insns.  */
   8, /* Memset max inline.  */
   1, /* Issue rate.  */
   ARM_PREFETCH_NOT_BENEFICIAL,
@@ -1990,7 +1990,7 @@ const struct tune_params arm_cortex_a53_tune =
   arm_default_branch_cost,
   &arm_default_vec_cost,
   1, /* Constant limit.  */
-  5, /* Max cond insns.  */
+  2, /* Max cond insns.  */
   8, /* Memset max inline.  */
   2, /* Issue rate.  */
   ARM_PREFETCH_NOT_BENEFICIAL,
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH][ARM] Update max_cond_insns settings

Wilco Dijkstra-2

ping


From: Wilco Dijkstra
Sent: 12 April 2017 14:02
To: GCC Patches
Cc: nd; Kyrylo Tkachov
Subject: [PATCH][ARM] Update max_cond_insns settings
   
The existing setting of max_cond_insns for most cores is non-optimal.
Thumb-2 IT has a maximum limit of 4, so 5 means emitting 2 IT sequences.
Also such long sequences of conditional instructions can increase the number
of executed instructions significantly, so using 5 for max_cond_insns is
non-optimal.

Previous benchmarking showed that setting max_cond_insn to 2 was the best value
for Cortex-A15 and Cortex-A57.  All ARMv8-A cores use 2 - apart from Cortex-A35
and Cortex-A53.  Given that using 5 is worse, set it to 2.  This also has the
advantage of producing more uniform code.

Bootstrap and regress OK on arm-none-linux-gnueabihf.

OK for stage 1?

ChangeLog:
2017-04-12  Wilco Dijkstra  <[hidden email]>

        * gcc/config/arm/arm.c (arm_cortex_a53_tune): Set max_cond_insns to 2.
        (arm_cortex_a35_tune): Likewise.
---

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 29e8d1d07d918fbb2a627a653510dfc8587ee01a..1a6d552aa322114795acbb3667c6ea36963bf193 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -1967,7 +1967,7 @@ const struct tune_params arm_cortex_a35_tune =
   arm_default_branch_cost,
   &arm_default_vec_cost,
   1,                                           /* Constant limit.  */
-  5,                                           /* Max cond insns.  */
+  2,                                           /* Max cond insns.  */
   8,                                           /* Memset max inline.  */
   1,                                           /* Issue rate.  */
   ARM_PREFETCH_NOT_BENEFICIAL,
@@ -1990,7 +1990,7 @@ const struct tune_params arm_cortex_a53_tune =
   arm_default_branch_cost,
   &arm_default_vec_cost,
   1,                                           /* Constant limit.  */
-  5,                                           /* Max cond insns.  */
+  2,                                           /* Max cond insns.  */
   8,                                           /* Memset max inline.  */
   2,                                           /* Issue rate.  */
   ARM_PREFETCH_NOT_BENEFICIAL,    
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH][ARM] Update max_cond_insns settings

Richard Earnshaw (lists)
In reply to this post by Wilco Dijkstra-2
On 12/04/17 14:02, Wilco Dijkstra wrote:

> The existing setting of max_cond_insns for most cores is non-optimal.
> Thumb-2 IT has a maximum limit of 4, so 5 means emitting 2 IT sequences.
> Also such long sequences of conditional instructions can increase the number
> of executed instructions significantly, so using 5 for max_cond_insns is
> non-optimal.
>
> Previous benchmarking showed that setting max_cond_insn to 2 was the best value
> for Cortex-A15 and Cortex-A57.  All ARMv8-A cores use 2 - apart from Cortex-A35
> and Cortex-A53.  Given that using 5 is worse, set it to 2.  This also has the
> advantage of producing more uniform code.
>
> Bootstrap and regress OK on arm-none-linux-gnueabihf.
>
> OK for stage 1?
>
> ChangeLog:
> 2017-04-12  Wilco Dijkstra  <[hidden email]>
>
>         * gcc/config/arm/arm.c (arm_cortex_a53_tune): Set max_cond_insns to 2.
>         (arm_cortex_a35_tune): Likewise.
> ---
>
> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
> index 29e8d1d07d918fbb2a627a653510dfc8587ee01a..1a6d552aa322114795acbb3667c6ea36963bf193 100644
> --- a/gcc/config/arm/arm.c
> +++ b/gcc/config/arm/arm.c
> @@ -1967,7 +1967,7 @@ const struct tune_params arm_cortex_a35_tune =
>    arm_default_branch_cost,
>    &arm_default_vec_cost,
>    1, /* Constant limit.  */
> -  5, /* Max cond insns.  */
> +  2, /* Max cond insns.  */
>    8, /* Memset max inline.  */
>    1, /* Issue rate.  */
>    ARM_PREFETCH_NOT_BENEFICIAL,
> @@ -1990,7 +1990,7 @@ const struct tune_params arm_cortex_a53_tune =
>    arm_default_branch_cost,
>    &arm_default_vec_cost,
>    1, /* Constant limit.  */
> -  5, /* Max cond insns.  */
> +  2, /* Max cond insns.  */
>    8, /* Memset max inline.  */
>    2, /* Issue rate.  */
>    ARM_PREFETCH_NOT_BENEFICIAL,
>


This parameter is also used for A32 code.  Is that really the right
number there as well?

I do wonder if the code in arm_option_params_internal should be tweaked
to hard-limit the number of skipped insns for Thumb2 to one IT block.  So

    /* When -mrestrict-it is in use tone down the if-conversion.  */
    max_insns_skipped = (TARGET_THUMB2 && arm_restrict_it)
      ? 1
      : (TARGET_THUMB2 ? MIN (current_tune->max_insns_skipped, 4)
         | current_tune->max_insns_skipped);


Reply | Threaded
Open this post in threaded view
|

Re: [PATCH][ARM] Update max_cond_insns settings

Wilco Dijkstra-2
Richard Earnshaw wrote:

> -  5,                                         /* Max cond insns.  */
> +  2,                                         /* Max cond insns.  */

> This parameter is also used for A32 code.  Is that really the right
> number there as well?

Yes, this parameter has always been the same for ARM and Thumb-2.

> I do wonder if the code in arm_option_params_internal should be tweaked
> to hard-limit the number of skipped insns for Thumb2 to one IT block.  So

You mean https://gcc.gnu.org/ml/gcc-patches/2017-01/msg01191.html ? :-)

Wilco
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH][ARM] Update max_cond_insns settings

Richard Earnshaw (lists)
On 04/05/17 18:38, Wilco Dijkstra wrote:
> Richard Earnshaw wrote:
>
>> -  5,                                         /* Max cond insns.  */
>> +  2,                                         /* Max cond insns.  */
>
>> This parameter is also used for A32 code.  Is that really the right
>> number there as well?
>
> Yes, this parameter has always been the same for ARM and Thumb-2.

I know that.  I'm questioning whether that number (2) is right when on
ARM.  It seems very low to me, especially when branches are unpredictable.

>
>> I do wonder if the code in arm_option_params_internal should be tweaked
>> to hard-limit the number of skipped insns for Thumb2 to one IT block.  So
>
> You mean https://gcc.gnu.org/ml/gcc-patches/2017-01/msg01191.html ? :-)
>

Haven't got as far as that one yet.

R.

> Wilco
>

Reply | Threaded
Open this post in threaded view
|

Re: [PATCH][ARM] Update max_cond_insns settings

Wilco Dijkstra-2
Richard Earnshaw (lists) wrote:

> On 04/05/17 18:38, Wilco Dijkstra wrote:
> > Richard Earnshaw wrote:
> >
>>> -  5,                                         /* Max cond insns.  */
>>> +  2,                                         /* Max cond insns.  */
>>
>>> This parameter is also used for A32 code.  Is that really the right
>>> number there as well?
>>
>> Yes, this parameter has always been the same for ARM and Thumb-2.
>
> I know that.  I'm questioning whether that number (2) is right when on
> ARM.  It seems very low to me, especially when branches are unpredictable.

Why does it seem low? Benchmarking showed 2 was the best value for modern
cores. The same branch predictor is used, so the same settings should be used
for ARM and Thumb-2.

Wilco

   
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH][ARM] Update max_cond_insns settings

Richard Earnshaw (lists)
On 05/05/17 13:42, Wilco Dijkstra wrote:

> Richard Earnshaw (lists) wrote:
>> On 04/05/17 18:38, Wilco Dijkstra wrote:
>> > Richard Earnshaw wrote:
>> >
>>>> -  5,                                         /* Max cond insns.  */
>>>> +  2,                                         /* Max cond insns.  */
>>>
>>>> This parameter is also used for A32 code.  Is that really the right
>>>> number there as well?
>>>
>>> Yes, this parameter has always been the same for ARM and Thumb-2.
>>
>> I know that.  I'm questioning whether that number (2) is right when on
>> ARM.  It seems very low to me, especially when branches are unpredictable.
>
> Why does it seem low? Benchmarking showed 2 was the best value for modern
> cores. The same branch predictor is used, so the same settings should be
> used
> for ARM and Thumb-2.
>
> Wilco
>
>    

Thumb2 code has to execute an additional instruction to start an IT
sequence.  It might therefore seem reasonable for the ARM sequence to be
one instruction longer.

R.
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH][ARM] Update max_cond_insns settings

Wilco Dijkstra-2
Richard Earnshaw (lists) wrote:

> On 05/05/17 13:42, Wilco Dijkstra wrote:
>> Richard Earnshaw (lists) wrote:
>>> On 04/05/17 18:38, Wilco Dijkstra wrote:
>>> > Richard Earnshaw wrote:
>>> >
>>>>> -  5,                                         /* Max cond insns.  */
>>>>> +  2,                                         /* Max cond insns.  */
>>>>
>>>>> This parameter is also used for A32 code.  Is that really the right
>>>>> number there as well?
>>>>
>>>> Yes, this parameter has always been the same for ARM and Thumb-2.
>>>
>>> I know that.  I'm questioning whether that number (2) is right when on
>>> ARM.  It seems very low to me, especially when branches are unpredictable.
>>
>> Why does it seem low? Benchmarking showed 2 was the best value for modern
>> cores. The same branch predictor is used, so the same settings should be
>> used
>> for ARM and Thumb-2.
>
> Thumb2 code has to execute an additional instruction to start an IT
> sequence.  It might therefore seem reasonable for the ARM sequence to be
> one instruction longer.

The IT instruction has no inputs/outputs and thus behaves like a NOP - unlike
conditional instructions which have real latencies and additional dependencies due
to being conditional. So the overhead of IT itself is small.

Wilco
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH][ARM] Update max_cond_insns settings

Wilco Dijkstra-2
ping
   
Richard Earnshaw (lists) wrote:

> On 05/05/17 13:42, Wilco Dijkstra wrote:
>> Richard Earnshaw (lists) wrote:
>>> On 04/05/17 18:38, Wilco Dijkstra wrote:
>>> > Richard Earnshaw wrote:
>>> >
>>>>> -  5,                                         /* Max cond insns.  */
>>>>> +  2,                                         /* Max cond insns.  */
>>>>
>>>>> This parameter is also used for A32 code.  Is that really the right
>>>>> number there as well?
>>>>
>>>> Yes, this parameter has always been the same for ARM and Thumb-2.
>>>
>>> I know that.  I'm questioning whether that number (2) is right when on
>>> ARM.  It seems very low to me, especially when branches are unpredictable.
>>
>> Why does it seem low? Benchmarking showed 2 was the best value for modern
>> cores. The same branch predictor is used, so the same settings should be
>> used
>> for ARM and Thumb-2.
>
> Thumb2 code has to execute an additional instruction to start an IT
> sequence.  It might therefore seem reasonable for the ARM sequence to be
> one instruction longer.

The IT instruction has no inputs/outputs and thus behaves like a NOP - unlike
conditional instructions which have real latencies and additional dependencies due
to being conditional. So the overhead of IT itself is small.

Wilco    
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH][ARM] Update max_cond_insns settings

Wilco Dijkstra-2

   
ping
   
Richard Earnshaw (lists) wrote:

> On 05/05/17 13:42, Wilco Dijkstra wrote:
>> Richard Earnshaw (lists) wrote:
>>> On 04/05/17 18:38, Wilco Dijkstra wrote:
>>> > Richard Earnshaw wrote:
>>> >
>>>>> -  5,                                         /* Max cond insns.  */
>>>>> +  2,                                         /* Max cond insns.  */
>>>>
>>>>> This parameter is also used for A32 code.  Is that really the right
>>>>> number there as well?
>>>>
>>>> Yes, this parameter has always been the same for ARM and Thumb-2.
>>>
>>> I know that.  I'm questioning whether that number (2) is right when on
>>> ARM.  It seems very low to me, especially when branches are unpredictable.
>>
>> Why does it seem low? Benchmarking showed 2 was the best value for modern
>> cores. The same branch predictor is used, so the same settings should be
>> used
>> for ARM and Thumb-2.
>
> Thumb2 code has to execute an additional instruction to start an IT
> sequence.  It might therefore seem reasonable for the ARM sequence to be
> one instruction longer.

The IT instruction has no inputs/outputs and thus behaves like a NOP - unlike
conditional instructions which have real latencies and additional dependencies due
to being conditional. So the overhead of IT itself is small.

Wilco       
Reply | Threaded
Open this post in threaded view
|

[PATCH][ARM] Update max_cond_insns settings

Wilco Dijkstra-2
In reply to this post by Wilco Dijkstra-2
The existing setting of max_cond_insns for most cores is non-optimal.
Thumb-2 IT has a maximum limit of 4, so 5 means emitting 2 IT sequences.
Also such long sequences of conditional instructions can increase the number
of executed instructions significantly, so using 5 for max_cond_insns is
non-optimal.

Previous benchmarking showed that setting max_cond_insn to 2 was the best value
for Cortex-A15 and Cortex-A57.  All ARMv8-A cores use 2 - apart from Cortex-A35
and Cortex-A53.  Given that using 5 is worse, set it to 2.  This also has the
advantage of producing more uniform code.

Bootstrap and regress OK on arm-none-linux-gnueabihf.

OK for stage 1?

ChangeLog:
2017-04-12  Wilco Dijkstra  <[hidden email]>

        * gcc/config/arm/arm.c (arm_cortex_a53_tune): Set max_cond_insns to 2.
        (arm_cortex_a35_tune): Likewise.
---

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 29e8d1d07d918fbb2a627a653510dfc8587ee01a..1a6d552aa322114795acbb3667c6ea36963bf193 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -1967,7 +1967,7 @@ const struct tune_params arm_cortex_a35_tune =
   arm_default_branch_cost,
   &arm_default_vec_cost,
   1,                                           /* Constant limit.  */
-  5,                                           /* Max cond insns.  */
+  2,                                           /* Max cond insns.  */
   8,                                           /* Memset max inline.  */
   1,                                           /* Issue rate.  */
   ARM_PREFETCH_NOT_BENEFICIAL,
@@ -1990,7 +1990,7 @@ const struct tune_params arm_cortex_a53_tune =
   arm_default_branch_cost,
   &arm_default_vec_cost,
   1,                                           /* Constant limit.  */
-  5,                                           /* Max cond insns.  */
+  2,                                           /* Max cond insns.  */
   8,                                           /* Memset max inline.  */
   2,                                           /* Issue rate.  */
   ARM_PREFETCH_NOT_BENEFICIAL,