[Bug target/87949] New: PowerPC saves CR registers across calls

segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87949

            Bug ID: 87949
           Summary: PowerPC saves CR registers across calls
           Product: gcc
           Version: 9.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: meissner at gcc dot gnu.org
  Target Milestone: ---

While there are 3 saved CR registers (CR2, CR3, CR4) in the PowerPC, we really,
really, really should not be saving CR values across calls due to the amount of
time it takes to save and restore these registers.

This shows up in the SPEC 2006 perlbench benchmark where the hot function
(S_regmatch in regexec.c) saves all 3 CRs in the function prologue, and has to
restore these registers in the epilogue.

It also shows up in the gamess benchmark (which is where I found it while
working on some future code).  Not only do functions in gamess save all 3 CR
registers, at least one function decides to use caller saves to save a 4th CR
register across a call.

I'm not sure whether this is a target feature or a machine independent feature
at this point.  My first attempt at fixing it via HARD_REGNO_CALLER_SAVE_MODE
fails due to LRA not supporting it returning VOIDmode (PR 87948).

[Bug target/87949] PowerPC saves CR registers across calls

segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87949

--- Comment #1 from Michael Meissner <meissner at gcc dot gnu.org> ---
Created attachment 44980
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=44980&action=edit
Fortran file showing problem from gamess

Compile this file with the '-Ofast -g -S -std=legacy -mcpu=power9' options.

In the assembly file, at line 1262, there is a store from a MFCR instruction to
save a CR register, and on line 1381 there is a load and a MTCRF instruction to
restore it.

[Bug target/87949] PowerPC saves CR registers across calls

segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87949

--- Comment #2 from Michael Meissner <meissner at gcc dot gnu.org> ---
Created attachment 44981
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=44981&action=edit
Bzip2 assembly file from the fortran source

In the assembly file, at line 1262, there is a store from a MFCR instruction to
save a CR register, and on line 1381 there is a load and a MTCRF instruction to
restore it.

[Bug target/87949] PowerPC saves CR registers across calls

segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87949

--- Comment #3 from Peter Bergner <bergner at gcc dot gnu.org> ---
What do you think we can do about that?  The call clobbers the ABI defined
non-volatile CR regs, so we have to save/restore them.  I don't think we have
any other option, other than telling GCC to never use the non-volatile CR regs
so they never have to be saved/restored.  Does using -ffixed-cr2 -ffixed-cr3
-ffixed-cr4 help?

[Bug target/87949] PowerPC saves CR registers across calls

segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87949

--- Comment #4 from Bill Schmidt <wschmidt at gcc dot gnu.org> ---
Seems like a potential opportunity for shrink-wrap separate on the CRs.  I'm
not sure whether that's implemented yet.

[Bug target/87949] PowerPC saves CR registers across calls

segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87949

--- Comment #5 from meissner at linux dot ibm.com ---
On Fri, Nov 09, 2018 at 02:28:28AM +0000, bergner at gcc dot gnu.org wrote:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87949
>
> --- Comment #3 from Peter Bergner <bergner at gcc dot gnu.org> ---
> What do you think we can do about that?  The call clobbers the ABI defined
> non-volatile CR regs, so we have to save/restore them.  I don't think we have
> any other option, other than telling GCC to never use the non-volatile CR regs
> so they never have to be saved/restored.  Does using -ffixed-cr2 -ffixed-cr3
> -ffixed-cr4 help?

I should clarify what I would like.  Yes, if the compiler needs to preserve the
CRs, the current code works.  But it would be helpful if, instead of trying to
preserve the CR across calls, we redid the initial comparison rather than
keeping the CR live for so long.
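
A source-level sketch of the two shapes being contrasted here may help.  It is
only an illustration: the function names and the helper opaque_call are
hypothetical, and whether GCC actually keeps the comparison result in a
non-volatile CR field for the first form depends on the compiler version,
options and surrounding code.

```c
/* Hypothetical stand-in for any call the compiler cannot see through. */
__attribute__((noinline)) static int opaque_call(int x)
{
    return x + 1;
}

/* The comparison is done once, before the call, and its result is
   consumed after the call.  The result therefore has to survive the
   call somewhere: in a call-preserved register or in a stack slot. */
int keep_flag_live(int a, int b)
{
    int lt = a < b;
    int r = opaque_call(a);
    if (lt)
        r += b;
    return r;
}

/* The same logic with the comparison redone after the call.  Only the
   operands a and b need to survive the call; the comparison result
   itself is never live across it. */
int recompute(int a, int b)
{
    int r = opaque_call(a);
    if (a < b)
        r += b;
    return r;
}
```

Both functions return the same value for any inputs; the difference is only in
what has to be kept live across the call.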

[Bug target/87949] PowerPC saves CR registers across calls

segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87949

--- Comment #6 from meissner at linux dot ibm.com ---
On Fri, Nov 09, 2018 at 02:47:31PM +0000, wschmidt at gcc dot gnu.org wrote:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87949
>
> --- Comment #4 from Bill Schmidt <wschmidt at gcc dot gnu.org> ---
> Seems like a potential opportunity for shrink-wrap separate on the CRs.  I'm
> not sure whether that's implemented yet.

Not really, since shrink wrap only occurs if the program has an early exit
condition.  In this case, I don't think it is early exit, but instead the
compiler is just lengthening the lifetime of the CR and moving it out of loops,
etc.

[Bug target/87949] PowerPC saves CR registers across calls

segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87949

David Edelsohn <dje at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Target|powerpc                     |powerpc-*-*
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2018-11-09
     Ever confirmed|0                           |1

--- Comment #7 from David Edelsohn <dje at gcc dot gnu.org> ---
This sounds like the general problem of the first RA pass creating excessively
long live ranges.

Does GCC know how to re-materialize a comparison?

[Bug target/87949] PowerPC saves CR registers across calls

segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87949

--- Comment #8 from Segher Boessenkool <segher at gcc dot gnu.org> ---
(In reply to Bill Schmidt from comment #4)
> Seems like a potential opportunity for shrink-wrap separate on the CRs.  I'm
> not sure whether that's implemented yet.

It isn't; there are some technical problems with implementing it.  Nothing that
cannot be overcome though.

The benefit wouldn't be huge; a much bigger benefit can be had by not using
non-volatile CR fields in the first place.

[Bug target/87949] PowerPC saves CR registers across calls

segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87949

--- Comment #9 from Segher Boessenkool <segher at gcc dot gnu.org> ---
(In reply to meissner from comment #6)
> > --- Comment #4 from Bill Schmidt <wschmidt at gcc dot gnu.org> ---
> > Seems like a potential opportunity for shrink-wrap separate on the CRs.  I'm
> > not sure whether that's implemented yet.
>
> Not really, since shrink wrap only occurs if the program has an early exit
> condition.

Separate shrink-wrapping can help in all cases where not all saves or restores
are needed on every path through the function, not just in cases where on some
path through the function *no* save/restore is needed.

[Bug target/87949] PowerPC saves CR registers across calls

segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87949

--- Comment #10 from Segher Boessenkool <segher at gcc dot gnu.org> ---
(In reply to David Edelsohn from comment #7)
> This sounds like the general problem of the first RA pass creating
> excessively long live ranges.

Yeah.

> Does GCC know how to re-materialize a comparison?

Not sure...  Maybe secondary reloads can help?

But there also is TARGET_SELECT_EARLY_REMAT_MODES nowadays, that sounds like
just what we need here!

[Bug target/87949] PowerPC saves CR registers across calls

segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87949

--- Comment #11 from Segher Boessenkool <segher at gcc dot gnu.org> ---
Trying that out now.

[Bug target/87949] PowerPC saves CR registers across calls

segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87949

Segher Boessenkool <segher at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED
           Assignee|unassigned at gcc dot gnu.org      |segher at gcc dot gnu.org

--- Comment #12 from Segher Boessenkool <segher at gcc dot gnu.org> ---
Wow, this works!  Mine :-)