Re: [PATCH] x86: fix CVT{,T}PD2PI insns

classic Classic list List threaded Threaded
8 messages Options
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] x86: fix CVT{,T}PD2PI insns

Jan Beulich-2
>>> On 27.06.19 at 11:03,  wrote:
> With just an "m" constraint misaligned memory operands won't be forced
> into a register, and hence cause #GP. So far this was guaranteed only
> in the case that CVT{,T}PD2DQ were chosen (which looks to be the case on
> x86-64 only).
>
> Instead of switching the second alternative to Bm, use just m on the
> first and replace nonimmediate_operand by vector_operand.

While doing this and the others where I'm also replacing Bm by uses of
vector_operand, I've started wondering whether Bm couldn't (and then
shouldn't) be dropped altogether, replacing it everywhere by "m"
combined with vector_operand (or vector_memory_operand when
register operands aren't allowed anyway).

Furthermore there's an issue with Bm and vector_memory_operand:
Whether alignment gets enforced depends on TARGET_AVX. However,
just like the two insns in question here, the SHA ones too don't have
VEX-encoded equivalents, and hence require alignment enforced even
with -mavx. Together with the above I wonder whether Bm shouldn't
be re-purposed to express this special requirement.

Jan


Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] x86: fix CVT{,T}PD2PI insns

Uros Bizjak-3
On Thu, Jun 27, 2019 at 11:10 AM Jan Beulich <[hidden email]> wrote:

>
> >>> On 27.06.19 at 11:03,  wrote:
> > With just an "m" constraint misaligned memory operands won't be forced
> > into a register, and hence cause #GP. So far this was guaranteed only
> > in the case that CVT{,T}PD2DQ were chosen (which looks to be the case on
> > x86-64 only).
> >
> > Instead of switching the second alternative to Bm, use just m on the
> > first and replace nonimmediate_operand by vector_operand.
>
> While doing this and the others where I'm also replacing Bm by uses of
> vector_operand, I've started wondering whether Bm couldn't (and then
> shouldn't) be dropped altogether, replacing it everywhere by "m"
> combined with vector_operand (or vector_memory_operand when
> register operands aren't allowed anyway).

No. Register allocator will propagate unaligned memory in non-AVX
case, which is not allowed with vector_operand.

Uros.

> Furthermore there's an issue with Bm and vector_memory_operand:
> Whether alignment gets enforced depends on TARGET_AVX. However,
> just like the two insns in question here, the SHA ones too don't have
> VEX-encoded equivalents, and hence require alignment enforced even
> with -mavx. Together with the above I wonder whether Bm shouldn't
> be re-purposed to express this special requirement.
>
> Jan
>
>
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] x86: fix CVT{,T}PD2PI insns

Jan Beulich-2
>>> On 27.06.19 at 12:22, <[hidden email]> wrote:
> On Thu, Jun 27, 2019 at 11:10 AM Jan Beulich <[hidden email]> wrote:
>>
>> >>> On 27.06.19 at 11:03,  wrote:
>> > With just an "m" constraint misaligned memory operands won't be forced
>> > into a register, and hence cause #GP. So far this was guaranteed only
>> > in the case that CVT{,T}PD2DQ were chosen (which looks to be the case on
>> > x86-64 only).
>> >
>> > Instead of switching the second alternative to Bm, use just m on the
>> > first and replace nonimmediate_operand by vector_operand.
>>
>> While doing this and the others where I'm also replacing Bm by uses of
>> vector_operand, I've started wondering whether Bm couldn't (and then
>> shouldn't) be dropped altogether, replacing it everywhere by "m"
>> combined with vector_operand (or vector_memory_operand when
>> register operands aren't allowed anyway).
>
> No. Register allocator will propagate unaligned memory in non-AVX
> case, which is not allowed with vector_operand.

I'm afraid I don't understand: Unaligned SIMD memory accesses will
generally fault in non-AVX mode, so such propagation would seem
wrong to me and hence would seem to be correctly not allowed.
Furthermore both vector_operand and Bm resolve to the same
vector_memory_operand. The TARGET_AVX check actually is inside
vector_memory_operand, i.e. affects both the same way.

Jan


Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] x86: fix CVT{,T}PD2PI insns

Uros Bizjak-3
On Thu, Jun 27, 2019 at 12:47 PM Jan Beulich <[hidden email]> wrote:

>
> >>> On 27.06.19 at 12:22, <[hidden email]> wrote:
> > On Thu, Jun 27, 2019 at 11:10 AM Jan Beulich <[hidden email]> wrote:
> >>
> >> >>> On 27.06.19 at 11:03,  wrote:
> >> > With just an "m" constraint misaligned memory operands won't be forced
> >> > into a register, and hence cause #GP. So far this was guaranteed only
> >> > in the case that CVT{,T}PD2DQ were chosen (which looks to be the case on
> >> > x86-64 only).
> >> >
> >> > Instead of switching the second alternative to Bm, use just m on the
> >> > first and replace nonimmediate_operand by vector_operand.
> >>
> >> While doing this and the others where I'm also replacing Bm by uses of
> >> vector_operand, I've started wondering whether Bm couldn't (and then
> >> shouldn't) be dropped altogether, replacing it everywhere by "m"
> >> combined with vector_operand (or vector_memory_operand when
> >> register operands aren't allowed anyway).
> >
> > No. Register allocator will propagate unaligned memory in non-AVX
> > case, which is not allowed with vector_operand.
>
> I'm afraid I don't understand: Unaligned SIMD memory accesses will
> generally fault in non-AVX mode, so such propagation would seem
> wrong to me and hence would seem to be correctly not allowed.
> Furthermore both vector_operand and Bm resolve to the same
> vector_memory_operand. The TARGET_AVX check actually is inside
> vector_memory_operand, i.e. affects both the same way.

"Bm" *prevents* propagation of unaligned access for non-AVX targets.
As said, register allocator does not care for operand predicates (it
only looks at operand constraints), so it will propagate unaligned
access with "m" operand. To avoid propagation, "Bm" should and does
use vector_memory_operand constraint internally.

Uros.
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] x86: fix CVT{,T}PD2PI insns

Jan Beulich-2
>>> On 27.06.19 at 13:02, <[hidden email]> wrote:
> On Thu, Jun 27, 2019 at 12:47 PM Jan Beulich <[hidden email]> wrote:
>>
>> >>> On 27.06.19 at 12:22, <[hidden email]> wrote:
>> > On Thu, Jun 27, 2019 at 11:10 AM Jan Beulich <[hidden email]> wrote:
>> >>
>> >> >>> On 27.06.19 at 11:03,  wrote:
>> >> > With just an "m" constraint misaligned memory operands won't be forced
>> >> > into a register, and hence cause #GP. So far this was guaranteed only
>> >> > in the case that CVT{,T}PD2DQ were chosen (which looks to be the case on
>> >> > x86-64 only).
>> >> >
>> >> > Instead of switching the second alternative to Bm, use just m on the
>> >> > first and replace nonimmediate_operand by vector_operand.
>> >>
>> >> While doing this and the others where I'm also replacing Bm by uses of
>> >> vector_operand, I've started wondering whether Bm couldn't (and then
>> >> shouldn't) be dropped altogether, replacing it everywhere by "m"
>> >> combined with vector_operand (or vector_memory_operand when
>> >> register operands aren't allowed anyway).
>> >
>> > No. Register allocator will propagate unaligned memory in non-AVX
>> > case, which is not allowed with vector_operand.
>>
>> I'm afraid I don't understand: Unaligned SIMD memory accesses will
>> generally fault in non-AVX mode, so such propagation would seem
>> wrong to me and hence would seem to be correctly not allowed.
>> Furthermore both vector_operand and Bm resolve to the same
>> vector_memory_operand. The TARGET_AVX check actually is inside
>> vector_memory_operand, i.e. affects both the same way.
>
> "Bm" *prevents* propagation of unaligned access for non-AVX targets.
> As said, register allocator does not care for operand predicates (it
> only looks at operand constraints), so it will propagate unaligned
> access with "m" operand. To avoid propagation, "Bm" should and does
> use vector_memory_operand constraint internally.

Okay, I think I got it now (also because of your reply on the other
thread). It means in the patch here I need to retain Bm rather than
dropping it, too, and additionally use it on the other alternative.

Jan


Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] x86: fix CVT{,T}PD2PI insns

Uros Bizjak-3
On Thu, Jun 27, 2019 at 1:31 PM Jan Beulich <[hidden email]> wrote:

>
> >>> On 27.06.19 at 13:02, <[hidden email]> wrote:
> > On Thu, Jun 27, 2019 at 12:47 PM Jan Beulich <[hidden email]> wrote:
> >>
> >> >>> On 27.06.19 at 12:22, <[hidden email]> wrote:
> >> > On Thu, Jun 27, 2019 at 11:10 AM Jan Beulich <[hidden email]> wrote:
> >> >>
> >> >> >>> On 27.06.19 at 11:03,  wrote:
> >> >> > With just an "m" constraint misaligned memory operands won't be forced
> >> >> > into a register, and hence cause #GP. So far this was guaranteed only
> >> >> > in the case that CVT{,T}PD2DQ were chosen (which looks to be the case on
> >> >> > x86-64 only).
> >> >> >
> >> >> > Instead of switching the second alternative to Bm, use just m on the
> >> >> > first and replace nonimmediate_operand by vector_operand.
> >> >>
> >> >> While doing this and the others where I'm also replacing Bm by uses of
> >> >> vector_operand, I've started wondering whether Bm couldn't (and then
> >> >> shouldn't) be dropped altogether, replacing it everywhere by "m"
> >> >> combined with vector_operand (or vector_memory_operand when
> >> >> register operands aren't allowed anyway).
> >> >
> >> > No. Register allocator will propagate unaligned memory in non-AVX
> >> > case, which is not allowed with vector_operand.
> >>
> >> I'm afraid I don't understand: Unaligned SIMD memory accesses will
> >> generally fault in non-AVX mode, so such propagation would seem
> >> wrong to me and hence would seem to be correctly not allowed.
> >> Furthermore both vector_operand and Bm resolve to the same
> >> vector_memory_operand. The TARGET_AVX check actually is inside
> >> vector_memory_operand, i.e. affects both the same way.
> >
> > "Bm" *prevents* propagation of unaligned access for non-AVX targets.
> > As said, register allocator does not care for operand predicates (it
> > only looks at operand constraints), so it will propagate unaligned
> > access with "m" operand. To avoid propagation, "Bm" should and does
> > use vector_memory_operand constraint internally.
>
> Okay, I think I got it now (also because of your reply on the other
> thread). It means in the patch here I need to retain Bm rather than
> dropping it, too, and additionally use it on the other alternative.

The correct solution is a bit more complicated. I don't know if these
instructions tolerate unaligned operand in non-AVX case. If they
don't, then vector_operand should be used and the first alternative
should be split to avx and non-avx part, where non-avx part uses Bm
constraint.

Uros.
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] x86: fix CVT{,T}PD2PI insns

Jan Beulich-2
>>> On 27.06.19 at 14:07, <[hidden email]> wrote:
> On Thu, Jun 27, 2019 at 1:31 PM Jan Beulich <[hidden email]> wrote:
>>
>> >>> On 27.06.19 at 13:02, <[hidden email]> wrote:
>> > On Thu, Jun 27, 2019 at 12:47 PM Jan Beulich <[hidden email]> wrote:
>> >>
>> >> >>> On 27.06.19 at 12:22, <[hidden email]> wrote:
>> >> > On Thu, Jun 27, 2019 at 11:10 AM Jan Beulich <[hidden email]> wrote:
>> >> >>
>> >> >> >>> On 27.06.19 at 11:03,  wrote:
>> >> >> > With just an "m" constraint misaligned memory operands won't be forced
>> >> >> > into a register, and hence cause #GP. So far this was guaranteed only
>> >> >> > in the case that CVT{,T}PD2DQ were chosen (which looks to be the case on
>> >> >> > x86-64 only).
>> >> >> >
>> >> >> > Instead of switching the second alternative to Bm, use just m on the
>> >> >> > first and replace nonimmediate_operand by vector_operand.
>> >> >>
>> >> >> While doing this and the others where I'm also replacing Bm by uses of
>> >> >> vector_operand, I've started wondering whether Bm couldn't (and then
>> >> >> shouldn't) be dropped altogether, replacing it everywhere by "m"
>> >> >> combined with vector_operand (or vector_memory_operand when
>> >> >> register operands aren't allowed anyway).
>> >> >
>> >> > No. Register allocator will propagate unaligned memory in non-AVX
>> >> > case, which is not allowed with vector_operand.
>> >>
>> >> I'm afraid I don't understand: Unaligned SIMD memory accesses will
>> >> generally fault in non-AVX mode, so such propagation would seem
>> >> wrong to me and hence would seem to be correctly not allowed.
>> >> Furthermore both vector_operand and Bm resolve to the same
>> >> vector_memory_operand. The TARGET_AVX check actually is inside
>> >> vector_memory_operand, i.e. affects both the same way.
>> >
>> > "Bm" *prevents* propagation of unaligned access for non-AVX targets.
>> > As said, register allocator does not care for operand predicates (it
>> > only looks at operand constraints), so it will propagate unaligned
>> > access with "m" operand. To avoid propagation, "Bm" should and does
>> > use vector_memory_operand constraint internally.
>>
>> Okay, I think I got it now (also because of your reply on the other
>> thread). It means in the patch here I need to retain Bm rather than
>> dropping it, too, and additionally use it on the other alternative.
>
> The correct solution is a bit more complicated. I don't know if these
> instructions tolerate unaligned operand in non-AVX case.

They don't.

> If they
> don't, then vector_operand should be used and the first alternative
> should be split to avx and non-avx part, where non-avx part uses Bm
> constraint.

Why? Bm takes care to distinguish the AVX and non-AVX cases. That's
how things work elsewhere too, afaict. The bug here really is that the
(non-AVX-only) second alternative didn't also use Bm.

Jan


Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] x86: fix CVT{,T}PD2PI insns

Uros Bizjak-3
On Thu, Jun 27, 2019 at 2:14 PM Jan Beulich <[hidden email]> wrote:

>
> >>> On 27.06.19 at 14:07, <[hidden email]> wrote:
> > On Thu, Jun 27, 2019 at 1:31 PM Jan Beulich <[hidden email]> wrote:
> >>
> >> >>> On 27.06.19 at 13:02, <[hidden email]> wrote:
> >> > On Thu, Jun 27, 2019 at 12:47 PM Jan Beulich <[hidden email]> wrote:
> >> >>
> >> >> >>> On 27.06.19 at 12:22, <[hidden email]> wrote:
> >> >> > On Thu, Jun 27, 2019 at 11:10 AM Jan Beulich <[hidden email]> wrote:
> >> >> >>
> >> >> >> >>> On 27.06.19 at 11:03,  wrote:
> >> >> >> > With just an "m" constraint misaligned memory operands won't be forced
> >> >> >> > into a register, and hence cause #GP. So far this was guaranteed only
> >> >> >> > in the case that CVT{,T}PD2DQ were chosen (which looks to be the case on
> >> >> >> > x86-64 only).
> >> >> >> >
> >> >> >> > Instead of switching the second alternative to Bm, use just m on the
> >> >> >> > first and replace nonimmediate_operand by vector_operand.
> >> >> >>
> >> >> >> While doing this and the others where I'm also replacing Bm by uses of
> >> >> >> vector_operand, I've started wondering whether Bm couldn't (and then
> >> >> >> shouldn't) be dropped altogether, replacing it everywhere by "m"
> >> >> >> combined with vector_operand (or vector_memory_operand when
> >> >> >> register operands aren't allowed anyway).
> >> >> >
> >> >> > No. Register allocator will propagate unaligned memory in non-AVX
> >> >> > case, which is not allowed with vector_operand.
> >> >>
> >> >> I'm afraid I don't understand: Unaligned SIMD memory accesses will
> >> >> generally fault in non-AVX mode, so such propagation would seem
> >> >> wrong to me and hence would seem to be correctly not allowed.
> >> >> Furthermore both vector_operand and Bm resolve to the same
> >> >> vector_memory_operand. The TARGET_AVX check actually is inside
> >> >> vector_memory_operand, i.e. affects both the same way.
> >> >
> >> > "Bm" *prevents* propagation of unaligned access for non-AVX targets.
> >> > As said, register allocator does not care for operand predicates (it
> >> > only looks at operand constraints), so it will propagate unaligned
> >> > access with "m" operand. To avoid propagation, "Bm" should and does
> >> > use vector_memory_operand constraint internally.
> >>
> >> Okay, I think I got it now (also because of your reply on the other
> >> thread). It means in the patch here I need to retain Bm rather than
> >> dropping it, too, and additionally use it on the other alternative.
> >
> > The correct solution is a bit more complicated. I don't know if these
> > instructions tolerate unaligned operand in non-AVX case.
>
> They don't.
>
> > If they
> > don't, then vector_operand should be used and the first alternative
> > should be split to avx and non-avx part, where non-avx part uses Bm
> > constraint.
>
> Why? Bm takes care to distinguish the AVX and non-AVX cases. That's
> how things work elsewhere too, afaict. The bug here really is that the
> (non-AVX-only) second alternative didn't also use Bm.

Ah, yes. So, please use vector_operand here and change the second
alternative to use Bm.

Patch is OK with this change.

Thanks,
Uros.