Register allocation trouble

classic Classic list List threaded Threaded
11 messages Options
Reply | Threaded
Open this post in threaded view
|

Register allocation trouble

Andrew Stubbs
Hi all,

I have an architecture that has two register files. Let's call them
class A and class B. There are some differences between their
capabilities, but for the purposes of this problem, they can be
considered to be identical, both holding SImode values, and both able to
receive values without a secondary reload.

In order to load data into A, the address register must also be in A.
Similarly, in order to load data into B, the address register must also
be in B.

So I set up my (simplified) pattern:

(set (match_operand:SI "register_operand" "=a,b")
      (match_operand:SI "memory_operand"   "Ra,Rb"))

where
   "a" is a register in A
   "b" is a register in B
   "Ra" is a mem with base address in A
   "Rb" is a mem with base address in B

(Obviously, there are stores and moves and whatnot too, but you get the
idea.)

The problem is that the register allocator cannot see inside Ra and Rb
to know that it must allocate the base register correctly. It only knows
that, some of the time, the instruction "does not satisfy its constraints".

I believe the register allocator relies on base_reg_class to choose a
class for the base register, but that just says "AB" (the union of class
A and class B), because it has no context to say otherwise. Only the
constraints define what hard registers are acceptable, and all they see
is pseudoregs during reload.

Similarly for the other hooks I've investigated: they don't have enough
context to know what's going on, and the rtx given always has pseudo
registers anyway.

The only solutions I can think of are to either preallocate the loads to
register classes via some means MODE_CODE_BASE_REG_CLASS can see
(address spaces, perhaps), or to have an alternative that catches the
mismatching cases and splits to a load and move, or set the base class
to A and always load via secondary reload for B.

Any suggestions would be greatly appreciated.

Andrew
Reply | Threaded
Open this post in threaded view
|

Re: Register allocation trouble

Jeff Law
On 07/21/2017 05:50 AM, Andrew Stubbs wrote:

> Hi all,
>
> I have an architecture that has two register files. Let's call them
> class A and class B. There are some differences between their
> capabilities, but for the purposes of this problem, they can be
> considered to be identical, both holding SImode values, and both able to
> receive values without a secondary reload.
>
> In order to load data into A, the address register must also be in A.
> Similarly, in order to load data into B, the address register must also
> be in B.
>
> So I set up my (simplified) pattern:
>
> (set (match_operand:SI "register_operand" "=a,b")
>      (match_operand:SI "memory_operand"   "Ra,Rb"))
>
> where
>   "a" is a register in A
>   "b" is a register in B
>   "Ra" is a mem with base address in A
>   "Rb" is a mem with base address in B
>
> (Obviously, there are stores and moves and whatnot too, but you get the
> idea.)
>
> The problem is that the register allocator cannot see inside Ra and Rb
> to know that it must allocate the base register correctly. It only knows
> that, some of the time, the instruction "does not satisfy its constraints".
Wen determining if any particular memory address is legitimate, you need
to know something about the other operand.  Ie, in a load from memory
the validity of the base register varies based on the target register of
the load.

GCC, in general, doesn't support that -- when determining address
validity if knows the address and the size of the memory reference.  You
don't know if it's a load or a store, nor do you know anything about the
other operand.

Last I looked, this was deeply baked into reload.  I haven't looked
recently to see if that's changed nor have I looked at LRA to see if it
handles this scenario better.

So not only do you have a constraint issue, you have a problem with
GO_IF_LEGITIMATE_ADDRESS (or whatever the hook is called these days).




>
> I believe the register allocator relies on base_reg_class to choose a
> class for the base register, but that just says "AB" (the union of class
> A and class B), because it has no context to say otherwise. Only the
> constraints define what hard registers are acceptable, and all they see
> is pseudoregs during reload.
>
> Similarly for the other hooks I've investigated: they don't have enough
> context to know what's going on, and the rtx given always has pseudo
> registers anyway.
Note that you may be able to look at reg_renumber to see a tentative
assignment.  But yes, you've got a fundamental problem in that there's
not enough context to make correct decisions.

>
> The only solutions I can think of are to either preallocate the loads to
> register classes via some means MODE_CODE_BASE_REG_CLASS can see
> (address spaces, perhaps), or to have an alternative that catches the
> mismatching cases and splits to a load and move, or set the base class
> to A and always load via secondary reload for B.
>
> Any suggestions would be greatly appreciated.
I wish I had suggestions.  This was a long standing nasty problem on the
PA as well.  For example, you needed to look at the destination register
of for a load from memory to know if the displacement in the memory
address was valid, or whether or not certain indexing address modes were
valid.

We went through many iterations of hacks, none of which were
particularly clean, all of which were weak in terms of GCC's ability to
use the addressing modes available when it was profitable.

You're likely to have to define some secondary reloads.

Good luck,
jeff
>
> Andrew

Reply | Threaded
Open this post in threaded view
|

Re: Register allocation trouble

Georg-Johann Lay-4
In reply to this post by Andrew Stubbs
Andrew Stubbs schrieb:

> Hi all,
>
> I have an architecture that has two register files. Let's call them
> class A and class B. There are some differences between their
> capabilities, but for the purposes of this problem, they can be
> considered to be identical, both holding SImode values, and both able to
> receive values without a secondary reload.
>
> In order to load data into A, the address register must also be in A.
> Similarly, in order to load data into B, the address register must also
> be in B.
>
> So I set up my (simplified) pattern:
>
> (set (match_operand:SI "register_operand" "=a,b")
>      (match_operand:SI "memory_operand"   "Ra,Rb"))
>
> where
>   "a" is a register in A
>   "b" is a register in B
>   "Ra" is a mem with base address in A
>   "Rb" is a mem with base address in B
>
> (Obviously, there are stores and moves and whatnot too, but you get the
> idea.)

1) You must not have more than 1 mov insn per mode.

2) You must handle any (reg, mem) x (reg, mem, constant) operand
    combination (constants in pools are handled separately).

> The problem is that the register allocator cannot see inside Ra and Rb
> to know that it must allocate the base register correctly. It only knows
> that, some of the time, the instruction "does not satisfy its constraints".

Dis you try secondary reload?

> I believe the register allocator relies on base_reg_class to choose a
> class for the base register, but that just says "AB" (the union of class
> A and class B), because it has no context to say otherwise. Only the
> constraints define what hard registers are acceptable, and all they see
> is pseudoregs during reload.
>
> Similarly for the other hooks I've investigated: they don't have enough
> context to know what's going on, and the rtx given always has pseudo
> registers anyway.

This might even be the case with secondary reloads. Sometimes you
just get "regclass is GENERAL_REGS" ... great.

> The only solutions I can think of are to either preallocate the loads to
> register classes via some means MODE_CODE_BASE_REG_CLASS can see
> (address spaces, perhaps), or to have an alternative that catches the
> mismatching cases and splits to a load and move, or set the base class
> to A and always load via secondary reload for B.

I don't see how address spaces could work.  Who would set the address
spaces of all types and decls? You don't even have enough information
to set the address space (no reg classes, just types and trees).

Dedicated machine mode is also no solution.  You'd just waste time
duplication all SI pattern to also, say, PSI of same mode precision,
but you still have to handle the complete cross product in the mov
insns.  Maybe could could deny specific modes to go into specific
reg classes, but even if that works the resulting code will be sub
optimal and you then need secondary reloads for reg-reg moves.
And the problem of who shall decide which mode to use is still there.

Johann

> Any suggestions would be greatly appreciated.
>
> Andrew
>

Reply | Threaded
Open this post in threaded view
|

Re: Register allocation trouble

Nathan Sidwell-2
In reply to this post by Andrew Stubbs
On 07/21/2017 07:50 AM, Andrew Stubbs wrote:

> (set (match_operand:SI "register_operand" "=a,b")
>       (match_operand:SI "memory_operand"   "Ra,Rb"))


How horrible would it be to split expose the entire mem:

(set (match_operand:SI "register" "=a,b")
      (mem:SI (match_operand:SI "register" "a,b")))
+ variants for reg+const if you have them?

nathan

--
Nathan Sidwell
Reply | Threaded
Open this post in threaded view
|

Re: Register allocation trouble

Richard Earnshaw-4
On 21/07/17 19:40, Nathan Sidwell wrote:

> On 07/21/2017 07:50 AM, Andrew Stubbs wrote:
>
>> (set (match_operand:SI "register_operand" "=a,b")
>>       (match_operand:SI "memory_operand"   "Ra,Rb"))
>
>
> How horrible would it be to split expose the entire mem:
>
> (set (match_operand:SI "register" "=a,b")
>      (mem:SI (match_operand:SI "register" "a,b")))
> + variants for reg+const if you have them?
>
> nathan
>

I have in the past pondered some form of MD rules for addressing modes:
something like

(define_mem_addr <name>
  (parallel [
    (match_operand:M 0 "register_operand" "constraint")
    (plus:M (match_operand:M 0 "register_operand" "constraint")
            (match_operand:M 1 "mem_offset_operand" "constraint"))
    ...
   ])
  "@
  [%0]
  [%0, %1]
  ..."
)

such that you could express all the details of each addressing form.
You could then match these rules in your patterns and be explicit about
what was permitted in each case (including adding early-clobber
constraints -- really needed on ARM).  The parallel lists all the forms
permitted by this addressing rule.

It would be a lot of work however, and I've never pushed it beyond the
initial sketch above.  The hardest part is that this would involve
serious surgery in the register allocators.


Reply | Threaded
Open this post in threaded view
|

Re: Register allocation trouble

Jeff Law
In reply to this post by Nathan Sidwell-2
On 07/21/2017 12:40 PM, Nathan Sidwell wrote:

> On 07/21/2017 07:50 AM, Andrew Stubbs wrote:
>
>> (set (match_operand:SI "register_operand" "=a,b")
>>       (match_operand:SI "memory_operand"   "Ra,Rb"))
>
>
> How horrible would it be to split expose the entire mem:
>
> (set (match_operand:SI "register" "=a,b")
>      (mem:SI (match_operand:SI "register" "a,b")))
> + variants for reg+const if you have them?
Doesn't that run afoul of the various restrictions on the movXX patterns?

jeff
Reply | Threaded
Open this post in threaded view
|

Re: Register allocation trouble

Andrew Stubbs
In reply to this post by Andrew Stubbs
Thanks to all those who replied. :-)

Here's what I've done to fix the problem:

1. Set the base rclass to A only.

2. Configured secondary reloads to B via A.

3. Disabled the Rb constraint. [*]

That's enough to create correct code, but it's pretty horrible, so I
also added new patterns of the form Nathan suggested so that the base
register can be allocated directly, as an optimization. These occur
before the main mov insn in the search order, and catch only valid MEMs
that won't get meddled with, so I believe that the "one mov per mode"
rule can be safely ignored. The main mov pattern can still do the
loads/stores, via the secondary reloads, so I believe that to be safe too.

Thanks again

Andrew

[*] I've not removed it because actually it's still active for some
address spaces, but that's a detail I glossed over previously.
Reply | Threaded
Open this post in threaded view
|

Re: Register allocation trouble

Georg-Johann Lay-2
On 24.07.2017 13:38, Andrew Stubbs wrote:

> Thanks to all those who replied. :-)
>
> Here's what I've done to fix the problem:
>
> 1. Set the base rclass to A only.
>
> 2. Configured secondary reloads to B via A.
>
> 3. Disabled the Rb constraint. [*]
>
> That's enough to create correct code, but it's pretty horrible, so I
> also added new patterns of the form Nathan suggested so that the base
> register can be allocated directly, as an optimization. These occur
> before the main mov insn in the search order, and catch only valid MEMs
> that won't get meddled with, so I believe that the "one mov per mode"
> rule can be safely ignored. The main mov pattern can still do the
> loads/stores, via the secondary reloads, so I believe that to be safe too.

Dunno if that works in all situation.  For example, when the register
allocator is facing high register pressure and decides to spill the
target register, it uses the constraints of the matched insn.

Johann

> Thanks again
>
> Andrew
>
> [*] I've not removed it because actually it's still active for some
> address spaces, but that's a detail I glossed over previously.
>

Reply | Threaded
Open this post in threaded view
|

Re: Register allocation trouble

Andrew Stubbs
On 24/07/17 14:58, Georg-Johann Lay wrote:
> Dunno if that works in all situation.  For example, when the register
> allocator is facing high register pressure and decides to spill the
> target register, it uses the constraints of the matched insn.

That would be a memory to memory move, and therefore not valid in any
mov insn on many architectures. Are you sure that's a real thing?

Confused now.

Andrew
Reply | Threaded
Open this post in threaded view
|

Re: Register allocation trouble

Georg-Johann Lay-2
Andrew Stubbs schrieb:
> On 24/07/17 14:58, Georg-Johann Lay wrote:
>> Dunno if that works in all situation.  For example, when the register
>> allocator is facing high register pressure and decides to spill the
>> target register, it uses the constraints of the matched insn.
>
> That would be a memory to memory move, and therefore not valid in any
> mov insn on many architectures. Are you sure that's a real thing?

Hard to tell, it depends on many gory details...

If the additional insn is a mov insn that only catches specific cases
by its predicates, then you have definitely a problem (more than 1 mov
insn per mode).

If it has an explicit mem: then the problems are completely different.
For example, it will also match volatile MEMs even for cases when
volatile_ok is false.

In general, it's hard to find test cases which show that something
breaks.

Who generates the additional insn?

Johann

> Confused now.
>
> Andrew


Reply | Threaded
Open this post in threaded view
|

Re: Register allocation trouble

Jeff Law
On 07/24/2017 11:37 AM, Georg-Johann Lay wrote:

> Andrew Stubbs schrieb:
>> On 24/07/17 14:58, Georg-Johann Lay wrote:
>>> Dunno if that works in all situation.  For example, when the register
>>> allocator is facing high register pressure and decides to spill the
>>> target register, it uses the constraints of the matched insn.
>>
>> That would be a memory to memory move, and therefore not valid in any
>> mov insn on many architectures. Are you sure that's a real thing?
>
> Hard to tell, it depends on many gory details...
>
> If the additional insn is a mov insn that only catches specific cases
> by its predicates, then you have definitely a problem (more than 1 mov
> insn per mode).
Correct.  Sadly everything will seem OK here, then one day you'll have
an abort or wrong code generation in reload.  Been there, done that.

>
> If it has an explicit mem: then the problems are completely different.
> For example, it will also match volatile MEMs even for cases when
> volatile_ok is false.
Though presumably you could extract the MEM out of the insn so that you
could check the volatile flag.  Though that's fairly gross.

>
> In general, it's hard to find test cases which show that something
> breaks.
Right.

Jeff