movmem pattern and missed alignment

Paul Koning-6
I have a movmem pattern in my target that pays attention to the alignment argument.

GCC isn't passing in the expected alignment part of the time.  I have this test case:

extern int *i, *j;
extern int iv[40], jv[40];

void f1(void)
{
    __builtin_memcpy (i, j, 32);
}

void f2(void)
{
    __builtin_memcpy (iv, jv, 32);
}

When the movmem pattern is called for f1, alignment is 1.  In f2, it is 2 (int is 2 bytes in pdp11) as expected.

The compiler clearly knows that int* points to aligned data, since it generates instructions that assume alignment (this is a strict-alignment target) when I dereference the pointer.  But somehow it gets it wrong for block move.

I also see this for the individual move operations that are generated for very short memcpy operations; if the count is 4, I get four move byte operations for f1, but two move word operations for f2.  

This seems like a bug.  Am I missing something?

        paul


Re: movmem pattern and missed alignment

Richard Biener-2
On Mon, Oct 8, 2018 at 3:57 PM Paul Koning <[hidden email]> wrote:

>
> I have a movmem pattern in my target that pays attention to the alignment argument.
>
> GCC isn't passing in the expected alignment part of the time.  I have this test case:
>
> extern int *i, *j;
> extern int iv[40], jv[40];
>
> void f1(void)
> {
>     __builtin_memcpy (i, j, 32);
> }
>
> void f2(void)
> {
>     __builtin_memcpy (iv, jv, 32);
> }
>
> When the movmem pattern is called for f1, alignment is 1.  In f2, it is 2 (int is 2 bytes in pdp11) as expected.
>
> The compiler clearly knows that int* points to aligned data, since it generates instructions that assume alignment (this is a strict-alignment target) when I dereference the pointer.  But somehow it gets it wrong for block move.
>
> I also see this for the individual move operations that are generated for very short memcpy operations; if the count is 4, I get four move byte operations for f1, but two move word operations for f2.
>
> This seems like a bug.  Am I missing something?

Yes, memcpy doesn't require anything bigger than byte alignment, and GCC infers
alignment only from actual memory references or from declarations (like iv / jv).
For i and j there are no dereferences, and thus you get an alignment of 1.

Richard.

>
>         paul
>

Re: movmem pattern and missed alignment

Paul Koning-6


> On Oct 8, 2018, at 11:09 AM, Richard Biener <[hidden email]> wrote:
>
> On Mon, Oct 8, 2018 at 3:57 PM Paul Koning <[hidden email]> wrote:
>>
>> I have a movmem pattern in my target that pays attention to the alignment argument.
>>
>> GCC isn't passing in the expected alignment part of the time.  I have this test case:
>>
>> extern int *i, *j;
>> extern int iv[40], jv[40];
>>
>> void f1(void)
>> {
>>    __builtin_memcpy (i, j, 32);
>> }
>>
>> void f2(void)
>> {
>>    __builtin_memcpy (iv, jv, 32);
>> }
>>
>> When the movmem pattern is called for f1, alignment is 1.  In f2, it is 2 (int is 2 bytes in pdp11) as expected.
>>
>> The compiler clearly knows that int* points to aligned data, since it generates instructions that assume alignment (this is a strict-alignment target) when I dereference the pointer.  But somehow it gets it wrong for block move.
>>
>> I also see this for the individual move operations that are generated for very short memcpy operations; if the count is 4, I get four move byte operations for f1, but two move word operations for f2.
>>
>> This seems like a bug.  Am I missing something?
>
> Yes, memcpy doesn't require anything bigger than byte alignment and
> GCC infers alignment
> only from actual memory references or from declarations (like iv /
> jv).  For i and j there
> are no dereferences and thus you get alignment of 1.
>
> Richard.

Ok, but why is that not a bug?  The whole point of passing alignment to the movmem pattern is to let it generate code that takes advantage of the alignment.  So we get a missed optimization.

        paul


Re: movmem pattern and missed alignment

Michael Matz
Hi,

On Mon, 8 Oct 2018, Paul Koning wrote:

> >> extern int *i, *j;
> >> extern int iv[40], jv[40];
> >>
> >> void f1(void)
> >> {
> >>    __builtin_memcpy (i, j, 32);
> >> }
> >>
> >> void f2(void)
> >> {
> >>    __builtin_memcpy (iv, jv, 32);
> >> }
> >
> > Yes, memcpy doesn't require anything bigger than byte alignment and
> > GCC infers alignment
> > only from actual memory references or from declarations (like iv /
> > jv).  For i and j there
> > are no dereferences and thus you get alignment of 1.
> >
> > Richard.
>
> Ok, but why is that not a bug?  The whole point of passing alignment to
> the movmem pattern is to let it generate code that takes advantage of
> the alignment.  So we get a missed optimization.

Only if you somewhere visibly add accesses to *i and *j.  Without them you
only have the "accesses" via memcpy, and as Richi says, those don't imply
any alignment requirements.  The i and j pointers might validly be char*
pointers in disguise and hence be in fact only 1-aligned.  I.e. there's
nothing in your small example program from which GCC can infer that those
two global pointers are in fact 2-aligned.


Ciao,
Michael.

Re: movmem pattern and missed alignment

Andrew Haley
On 10/08/2018 06:20 PM, Michael Matz wrote:
> Only if you somewhere visibly add accesses to *i and *j.  Without them you
> only have the "accesses" via memcpy, and as Richi says, those don't imply
> any alignment requirements.  The i and j pointers might validly be char*
> pointers in disguise and hence be in fact only 1-aligned.  I.e. there's
> nothing in your small example program from which GCC can infer that those
> two global pointers are in fact 2-aligned.

So all you'd actually have to say is

void f1(void)
{
    *i; *j;
    __builtin_memcpy (i, j, 32);
}

--
Andrew Haley
Java Platform Lead Engineer
Red Hat UK Ltd. <https://www.redhat.com>
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671

Re: movmem pattern and missed alignment

Alexander Monakov-4
In reply to this post by Michael Matz
On Mon, 8 Oct 2018, Michael Matz wrote:

> > Ok, but why is that not a bug?  The whole point of passing alignment to
> > the movmem pattern is to let it generate code that takes advantage of
> > the alignment.  So we get a missed optimization.
>
> Only if you somewhere visibly add accesses to *i and *j.  Without them you
> only have the "accesses" via memcpy, and as Richi says, those don't imply
> any alignment requirements.  The i and j pointers might validly be char*
> pointers in disguise and hence be in fact only 1-aligned.  I.e. there's
> nothing in your small example program from which GCC can infer that those
> two global pointers are in fact 2-aligned.

Well, it's not that simple. C11 6.3.2.3 p7 makes it undefined to form an
'int *' value that is not suitably aligned:

  A pointer to an object type may be converted to a pointer to a different
  object type. If the resulting pointer is not correctly aligned for the
  referenced type, the behavior is undefined.

So in addition to what you said, we should probably say that GCC decides
not to exploit this UB in order to allow code to round-trip pointer values
via arbitrary pointer types?


To put Michael's explanation in different words:

This is not obviously a bug, because static pointer type does not imply the
dynamic pointed-to type. The caller of 'f1' could look like

void call_f1(void)
{
  short ibuf[20] = {0}, jbuf[20] = {0};
  i = (void *) ibuf;
  j = (void *) jbuf;
  f1();
}

and it's valid to memcpy from jbuf to ibuf, memcpy does not "see" the
static pointer type, and works as if by dereferencing 'char *' pointers.
(although as mentioned above it's more subtly invalid when assigning to
i and j).

If 'f1' dereferences 'i', GCC may deduce that dynamic type of '*i' is 'int' and
therefore 'i' must be suitably aligned. But in absence of dereferences GCC
does not make assumptions about dynamic type and alignment.

Alexander
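
The gap between the static pointer type and the address a pointer actually holds can be observed at run time. The sketch below is an illustration only and does not come from the thread: `is_aligned_for_int` and `bytes` are hypothetical names, and the check assumes that converting a pointer to `uintptr_t` yields the numeric address, which is implementation-defined but holds on mainstream targets. It also assumes `alignof(int)` is greater than 1, as on typical hosts.

```c
#include <stdalign.h>
#include <stdint.h>

/* Hypothetical helper: nonzero if the address held in p is a multiple
   of alignof(int).  Assumes the uintptr_t conversion exposes the
   numeric address (implementation-defined). */
static int is_aligned_for_int(const void *p)
{
    return ((uintptr_t)p % alignof(int)) == 0;
}

/* Byte storage that is explicitly given int alignment. */
static alignas(int) unsigned char bytes[2 * sizeof(int)];
```

Passing `bytes` satisfies the check, while `bytes + 1` does not, even though either address could be stored in a `void *` and handed to memcpy without any problem.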

Re: movmem pattern and missed alignment

Michael Matz
Hi,

On Mon, 8 Oct 2018, Alexander Monakov wrote:

> > Only if you somewhere visibly add accesses to *i and *j.  Without them
> > you only have the "accesses" via memcpy, and as Richi says, those
> > don't imply any alignment requirements.  The i and j pointers might
> > validly be char* pointers in disguise and hence be in fact only
> > 1-aligned.  I.e. there's nothing in your small example program from
> > which GCC can infer that those two global pointers are in fact
> > 2-aligned.
>
> Well, it's not that simple. C11 6.3.2.3 p7 makes it undefined to form an
> 'int *' value that is not suitably aligned:
>
> So in addition to what you said, we should probably say that GCC decides
> not to exploit this UB in order to allow code to round-trip pointer values
> via arbitrary pointer types?

That's correct, I was explaining from the middle-end perspective.  There
we are consciously more lenient as we have to support the real world and
other languages than C.  This is one of the cases.


Ciao,
Michael.

Re: movmem pattern and missed alignment

Paul Koning-6
In reply to this post by Andrew Haley


> On Oct 8, 2018, at 1:29 PM, Andrew Haley <[hidden email]> wrote:
>
> On 10/08/2018 06:20 PM, Michael Matz wrote:
>> Only if you somewhere visibly add accesses to *i and *j.  Without them you
>> only have the "accesses" via memcpy, and as Richi says, those don't imply
>> any alignment requirements.  The i and j pointers might validly be char*
>> pointers in disguise and hence be in fact only 1-aligned.  I.e. there's
>> nothing in your small example program from which GCC can infer that those
>> two global pointers are in fact 2-aligned.
>
> So all you'd actually have to say is
>
> void f1(void)
> {
>    *i; *j;
>    __builtin_memcpy (i, j, 32);
> }

No, that doesn't help.  Not even if I make it:

void f1(void)
{
    k = *i + *j;
    __builtin_memcpy (i, j, 4);
}

The first line does word aligned references to *i and *j, but the memcpy stubbornly remains a byte move.

        paul


Re: movmem pattern and missed alignment

Michael Matz
Hi,

On Mon, 8 Oct 2018, Paul Koning wrote:

> > So all you'd actually have to say is
> >
> > void f1(void)
> > {
> >    *i; *j;
> >    __builtin_memcpy (i, j, 32);
> > }
>
> No, that doesn't help.  Not even if I make it:
>
> void f1(void)
> {
>     k = *i + *j;
>     __builtin_memcpy (i, j, 4);
> }
>
> The first line does word aligned references to *i and *j, but the memcpy stubbornly remains a byte move.

k is a global, so the loads from i/j can't be optimized away?  If so, now
you have a missed optimization bug ;-)  Might be non-trivial to fix for
general situations (basically the natural alignment can only be inferred
in regions that are dominated by such accesses, but not e.g. for:
   if (cond()) k = *i+*j;
   memcpy(i,j,4);
as cond() might be always false).


Ciao,
Michael.
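
The dominance condition Michael describes can be written out as two complete functions. This is a sketch under stated assumptions: `ip`, `jp` and `kv` are hypothetical stand-ins for the thread's `i`, `j` and `k`, and the copy length is `sizeof(int)` rather than the literal 4 so that the code is valid on hosts where int is not 2 bytes. It only illustrates where alignment could in principle be inferred, not what any particular GCC release does.

```c
#include <string.h>

/* Hypothetical stand-ins for the thread's globals i, j and k. */
int *ip, *jp;
int kv;

/* The loads execute on every path that reaches the memcpy, so they
   dominate it: reaching the copy implies *ip and *jp were already
   accessed as int. */
void copy_dominated(void)
{
    kv = *ip + *jp;
    memcpy(ip, jp, sizeof(int));
}

/* The loads are conditional, so they do not dominate the memcpy.
   When cond is zero, nothing has accessed *ip or *jp as int before
   the copy runs. */
void copy_not_dominated(int cond)
{
    if (cond)
        kv = *ip + *jp;
    memcpy(ip, jp, sizeof(int));
}
```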

Re: movmem pattern and missed alignment

Eric Botcazou-3
In reply to this post by Michael Matz
> That's correct, I was explaining from the middle-end perspective.  There
> we are consciously more lenient as we have to support the real world and
> other languages than C.  This is one of the cases.

This had worked as Paul expects until GCC 4.4 IIRC and this was perfectly OK
for every language on strict-alignment platforms.  This was changed only
because of SSE on x86.

--
Eric Botcazou

Re: movmem pattern and missed alignment

Paul Koning-6


> On Oct 8, 2018, at 5:43 PM, Eric Botcazou <[hidden email]> wrote:
>
>> That's correct, I was explaining from the middle-end perspective.  There
>> we are consciously more lenient as we have to support the real world and
>> other languages than C.  This is one of the cases.
>
> This had worked as Paul expects until GCC 4.4 IIRC and this was perfectly OK
> for every language on strict-alignment platforms.  This was changed only
> because of SSE on x86.
>
> --
> Eric Botcazou

So does that mean this should be a target-specific behavior, but it isn't at the moment?

        paul


Re: movmem pattern and missed alignment

Richard Biener-2
In reply to this post by Eric Botcazou-3
On October 8, 2018 11:43:00 PM GMT+02:00, Eric Botcazou <[hidden email]> wrote:

>> That's correct, I was explaining from the middle-end perspective.  There
>> we are consciously more lenient as we have to support the real world and
>> other languages than C.  This is one of the cases.
>
> This had worked as Paul expects until GCC 4.4 IIRC and this was perfectly OK
> for every language on strict-alignment platforms.  This was changed only
> because of SSE on x86.

And because we ended up ignoring all pointer casts.

Richard.


Re: movmem pattern and missed alignment

Alexander Monakov-4
On Tue, 9 Oct 2018, Richard Biener wrote:
> >This had worked as Paul expects until GCC 4.4 IIRC and this was perfectly OK
> >for every language on strict-alignment platforms.  This was changed only
> >because of SSE on x86.
>
> And because we ended up ignoring all pointer casts.

It's not quite obvious what SSE has to do with this - any hint please?

(according to my quick check this changed between gcc-4.5 and gcc-4.6)

Alexander

Re: movmem pattern and missed alignment

Eric Botcazou-3
> It's not quite obvious what SSE has to do with this - any hint please?

SSE introduced alignment constraints into the non-strict-alignment target x86
so people didn't really want to play by the rules of strict-alignment targets.

> (according to my quick check this changed between gcc-4.5 and gcc-4.6)

Possibly indeed, I remembered GCC 4.5 as being the turning point.

--
Eric Botcazou

Re: movmem pattern and missed alignment

Andrew Haley
In reply to this post by Paul Koning-6
On 10/08/2018 07:38 PM, Paul Koning wrote:

>
>
>> On Oct 8, 2018, at 1:29 PM, Andrew Haley <[hidden email]> wrote:
>>
>> On 10/08/2018 06:20 PM, Michael Matz wrote:
>>> Only if you somewhere visibly add accesses to *i and *j.  Without them you
>>> only have the "accesses" via memcpy, and as Richi says, those don't imply
>>> any alignment requirements.  The i and j pointers might validly be char*
>>> pointers in disguise and hence be in fact only 1-aligned.  I.e. there's
>>> nothing in your small example program from which GCC can infer that those
>>> two global pointers are in fact 2-aligned.
>>
>> So all you'd actually have to say is
>>
>> void f1(void)
>> {
>>    *i; *j;
>>    __builtin_memcpy (i, j, 32);
>> }
>
> No, that doesn't help.

It could do.

> Not even if I make it:
>
> void f1(void)
> {
>     k = *i + *j;
>     __builtin_memcpy (i, j, 4);
> }
>
> The first line does word aligned references to *i and *j, but the memcpy stubbornly remains a byte move.

Right, so that is a missed optimization.

--
Andrew Haley
Java Platform Lead Engineer
Red Hat UK Ltd. <https://www.redhat.com>
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671

Re: movmem pattern and missed alignment

Richard Biener-2
In reply to this post by Eric Botcazou-3
On Tue, Oct 9, 2018 at 8:41 AM Eric Botcazou <[hidden email]> wrote:
>
> > It's not quite obvious what SSE has to do with this - any hint please?
>
> SSE introduced alignment constraints into the non-strict-alignment target x86
> so people didn't really want to play by the rules of strict-alignment targets.

Yeah.  We've walked back and forth for that very issue though.  We now require
all targets to play by the same rules -- if you have a *(double *) access then
that has to be aligned according to double.

We couldn't realistically walk back and rely on alignment of addresses based
on their type (like C would allow us to do) because we've thrown away types
on addresses.  See also the thread about string-length warning stuff where
we've posted testcases that show you can get arbitrarily typed addresses
into your strlen() calls for example by means of CSE.  The middle-end is
simply not prepared to preserve that information.

It was repeatedly suggested that we _could_ derive alignment info from
function parameter types since we rely on precise typing there for example
for points-to analysis (albeit only for restrict qualification processing and
for DECL_BY_REFERENCE "pointers").  That would fix the simple testcase
that was presented here.
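
For illustration, the kind of code that suggestion targets might look like the sketch below. `f1_params` is a hypothetical variant of the test case in which the pointers arrive as typed parameters instead of globals, and the length is written as `8 * sizeof(int)` so it does not depend on the size of int. Under the suggestion, a compiler could treat `dst` and `src` as int-aligned on entry; the thread does not establish that any GCC release does so.

```c
#include <string.h>

/* Hypothetical variant of f1: the pointers are typed parameters,
   so their declared types are visible at function entry. */
void f1_params(int *dst, const int *src)
{
    memcpy(dst, src, 8 * sizeof(int));
}
```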

> > (according to my quick check this changed between gcc-4.5 and gcc-4.6)
>
> Possibly indeed, I remembered GCC 4.5 as being the turning point.

It was really changing over several releases, but yes.

Richard.

>
> --
> Eric Botcazou

Re: movmem pattern and missed alignment

Richard Biener-2
In reply to this post by Andrew Haley
On Tue, Oct 9, 2018 at 10:02 AM Andrew Haley <[hidden email]> wrote:

>
> On 10/08/2018 07:38 PM, Paul Koning wrote:
> >
> >
> >> On Oct 8, 2018, at 1:29 PM, Andrew Haley <[hidden email]> wrote:
> >>
> >> On 10/08/2018 06:20 PM, Michael Matz wrote:
> >>> Only if you somewhere visibly add accesses to *i and *j.  Without them you
> >>> only have the "accesses" via memcpy, and as Richi says, those don't imply
> >>> any alignment requirements.  The i and j pointers might validly be char*
> >>> pointers in disguise and hence be in fact only 1-aligned.  I.e. there's
> >>> nothing in your small example program from which GCC can infer that those
> >>> two global pointers are in fact 2-aligned.
> >>
> >> So all you'd actually have to say is
> >>
> >> void f1(void)
> >> {
> >>    *i; *j;
> >>    __builtin_memcpy (i, j, 32);
> >> }
> >
> > No, that doesn't help.
>
> It could do.
>
> > Not even if I make it:
> >
> > void f1(void)
> > {
> >     k = *i + *j;
> >     __builtin_memcpy (i, j, 4);
> > }
> >
> > The first line does word aligned references to *i and *j, but the memcpy stubbornly remains a byte move.
>
> Right, so that is a missed optimization.

Yes.  Note that on GIMPLE, pointer alignment info is carried as side-info for
SSA names, which makes the above cases difficult to deal with since the
dereference and the call argument use the same SSA names.  So if you consider

  if ((i_1 & 7) == 0)
   {
     k = *i_1;
     __builtin_memcpy (i_1, j, 4);
   }

then we cannot set the alignment of i_1 at/after k = *i_1 because doing so would
affect the alignment test which we'd then optimize away.  We'd need to introduce
a SSA copy to get a new SSA name but that would be optimized away quickly.

So the option would be to change the representation of __builtin_memcpy
either by making it an aggregate assignment or by using a builtin with
explicit alignment or compute alignment at RTL expansion time.

Note the pass that "computes" alignment is currently SSA based (it's
the CCP pass).

Richard.
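
The first option, treating the copy as an aggregate assignment, has a rough source-level analogue, sketched below. This is not the GIMPLE representation change being discussed: `struct block` and `copy_block` are hypothetical names, and the block size of 8 ints is chosen arbitrarily. The point is that a struct assignment is an actual memory reference of struct type, so, unlike a memcpy call, it carries that type's alignment requirement.

```c
/* Hypothetical block type: 8 ints, so the assignment below copies
   8 * sizeof(int) bytes. */
struct block { int w[8]; };

/* The copy is an access through struct block lvalues, so both
   pointers must be aligned as struct block requires. */
void copy_block(struct block *dst, const struct block *src)
{
    *dst = *src;
}
```

The trade-off is that the caller now has to guarantee the alignment: passing a pointer that is not suitably aligned for `struct block` is undefined behavior, whereas memcpy accepts any address.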

> --
> Andrew Haley
> Java Platform Lead Engineer
> Red Hat UK Ltd. <https://www.redhat.com>
> EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671

Re: movmem pattern and missed alignment

Eric Botcazou-3
In reply to this post by Richard Biener-2
> It was repeatedly suggested that we _could_ derive alignment info from
> function parameter types since we rely on precise typing there for example
> for points-to analysis (albeit only for restrict qualification processing
> and for DECL_BY_REFERENCE "pointers").  That would fix the simple testcase
> that was presented here.

OK, I keep forgetting it and that would be a good compromise indeed.

--
Eric Botcazou

Re: movmem pattern and missed alignment

Alexander Monakov-4
In reply to this post by Richard Biener-2
On Tue, 9 Oct 2018, Richard Biener wrote:
>
> then we cannot set the alignment of i_1 at/after k = *i_1 because doing so would
> affect the alignment test which we'd then optimize away.  We'd need to introduce
> a SSA copy to get a new SSA name but that would be optimized away quickly.

We preserve __builtin_assume_aligned up to pass-fold-all-builtins, so would it
work to emit it just before the memcpy

  i_2 = __builtin_assume_aligned(i_1, 4);
  __builtin_memcpy(j, i_2, 32);

in theory?

Alexander
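
The same builtin is also available to user code, so the idea can be tried at source level. The sketch below assumes GCC or Clang; `copy_int_aligned` is a hypothetical wrapper, and the alignment is taken from `__alignof__(int)` rather than the literal 4 used above. Whether this changes the alignment that reaches a target's movmem pattern is not confirmed anywhere in the thread.

```c
#include <stddef.h>

/* Hypothetical wrapper: copies n bytes while promising the compiler
   that both pointers are aligned for int.  Calling it with a
   misaligned pointer is undefined behavior. */
void copy_int_aligned(void *dst, const void *src, size_t n)
{
    int *d = __builtin_assume_aligned(dst, __alignof__(int));
    const int *s = __builtin_assume_aligned(src, __alignof__(int));
    __builtin_memcpy(d, s, n);
}
```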

Re: movmem pattern and missed alignment

Richard Biener-2
On Tue, Oct 9, 2018 at 11:00 AM Alexander Monakov <[hidden email]> wrote:

>
> On Tue, 9 Oct 2018, Richard Biener wrote:
> >
> > then we cannot set the alignment of i_1 at/after k = *i_1 because doing so would
> > affect the alignment test which we'd then optimize away.  We'd need to introduce
> > a SSA copy to get a new SSA name but that would be optimized away quickly.
>
> We preserve __builtin_assume_aligned up to pass-fold-all-builtins, so would it
> work to emit it just before the memcpy
>
>   i_2 = __builtin_assume_aligned(i_1, 4);
>   __builtin_memcpy(j, i_2, 32);
>
> in theory?

That's still before RTL expansion so I'm not sure that is enough.

Richard.

>
> Alexander