clarification on the intent of X86_64 psABI vector return.

classic Classic list List threaded Threaded
3 messages Options
Reply | Threaded
Open this post in threaded view
|

clarification on the intent of X86_64 psABI vector return.

Iain Sandoe-5
Hi,

For a processor that supports SSE, but not AVX.

the following code:

typedef int __attribute__((mode(QI))) qi;
typedef qi __attribute__((vector_size (32))) v32qi;

v32qi foo (int x)
{
  v32qi y = {'0','1','2','3','4','5','6','7','8','9','a','b','c','d','e','f',
          '0','1','2','3','4','5','6','7','8','9','a','b','c','d','e','f'};
  return y;
}

produces a warning " warning: AVX vector return without AVX enabled changes the ABI [-Wpsabi]”.

so - the question is what is the resultant ABI in the changed case (since _m256 is supported for such processors)

====

Looking at the psABI v1.0

* pp24 Returning of Values

The returning of values is done according to the following algorithm:

        • Classify the return type with the classification algorithm.


        • If the class is SSE, the next available vector register of the sequence %xmm0, %xmm1 is used.

        • If the class is SSEUP, the eight byte is returned in the next available eightbyte chunk of the last used vector register.

...

* classification algorithm : pp20

        • Arguments of type __m256 are split into four eightbyte chunks. The least significant one belongs to class SSE and all the others to class SSEUP.

        • Arguments of type __m512 are split into eight eightbyte chunks. The least significant one belongs to class SSE and all the others to class SSEUP.

*  footnote on pp21

12 The post merger clean up described later ensures that, for the processors that do not support the __m256 type, if the size of an object is larger than two eightbytes and the first eightbyte is not SSE or any other eightbyte is not SSEUP, it still has class MEMORY.

This in turn ensures that for processors that do support the __m256 type, if the size of an object is four eightbytes and the first eightbyte is SSE and all other eightbytes are SSEUP, it can be passed in a register. This also applies to the __m512 type. That is for processors that support the __m512 type, if the size of an object is eight eightbytes and the first eightbyte is SSE and all other eightbytes are SSEUP, it can be passed in a register, otherwise, it will be passed in memory.

---

However : the case where the processor does *not* support __m256 but the first eightbyte *is* SSE and the following eighbytes *are* SSEUP is not clarified.

The intent for SSE seems clear - use a reg
The intent for following SSEUP is less clear.

Nevertheless, it seems to imply that the intent for processors with SSE that the __m256 (and __m512) returns should be passed in xmm0:1(:3, maybe).

figure 3.4 pp23 does not clarify xmm* use for vector return at all - only mentioning floating point.

===== status

In any event, GCC passes the vec32 return in memory,
LLVM conversely passes it in xmm0:1 (at least for the versions I’ve tried).

which leads to an ABI discrepancy when GCC is used to build code on systems based on LLVM.

Please could the X86 maintainers clarify the intent (and maybe consider enhancing the footnote classification notes to make things clearer)?

- and then we can figure out how to deal with the systems that are already implemented - and how to move forward.

(as an aside, in any event, it seems inefficient to pass through memory when at least xmm0:1 are already set aside for return value use).

thanks
Iain

Reply | Threaded
Open this post in threaded view
|

Re: clarification on the intent of X86_64 psABI vector return.

H.J. Lu-30
In Tue, Oct 30, 2018 at 4:28 AM Iain Sandoe <[hidden email]> wrote:

>
> Hi,
>
> For a processor that supports SSE, but not AVX.
>
> the following code:
>
> typedef int __attribute__((mode(QI))) qi;
> typedef qi __attribute__((vector_size (32))) v32qi;
>
> v32qi foo (int x)
> {
>   v32qi y = {'0','1','2','3','4','5','6','7','8','9','a','b','c','d','e','f',
>           '0','1','2','3','4','5','6','7','8','9','a','b','c','d','e','f'};
>   return y;
> }
>
> produces a warning " warning: AVX vector return without AVX enabled changes the ABI [-Wpsabi]”.
>
> so - the question is what is the resultant ABI in the changed case (since _m256 is supported for such processors)
>
> ====
>
> Looking at the psABI v1.0
>
> * pp24 Returning of Values
>
> The returning of values is done according to the following algorithm:
>
>         • Classify the return type with the classification algorithm.
>
> …
>         • If the class is SSE, the next available vector register of the sequence %xmm0, %xmm1 is used.
>
>         • If the class is SSEUP, the eight byte is returned in the next available eightbyte chunk of the last used vector register.
>
> ...
>
> * classification algorithm : pp20
>
>         • Arguments of type __m256 are split into four eightbyte chunks. The least significant one belongs to class SSE and all the others to class SSEUP.
>
>         • Arguments of type __m512 are split into eight eightbyte chunks. The least significant one belongs to class SSE and all the others to class SSEUP.
>
> *  footnote on pp21
>
> 12 The post merger clean up described later ensures that, for the processors that do not support the __m256 type, if the size of an object is larger than two eightbytes and the first eightbyte is not SSE or any other eightbyte is not SSEUP, it still has class MEMORY.
>
> This in turn ensures that for processors that do support the __m256 type, if the size of an object is four eightbytes and the first eightbyte is SSE and all other eightbytes are SSEUP, it can be passed in a register. This also applies to the __m512 type. That is for processors that support the __m512 type, if the size of an object is eight eightbytes and the first eightbyte is SSE and all other eightbytes are SSEUP, it can be passed in a register, otherwise, it will be passed in memory.
>
> ---
>
> However : the case where the processor does *not* support __m256 but the first eightbyte *is* SSE and the following eighbytes *are* SSEUP is not clarified.
>
> The intent for SSE seems clear - use a reg
> The intent for following SSEUP is less clear.
>
> Nevertheless, it seems to imply that the intent for processors with SSE that the __m256 (and __m512) returns should be passed in xmm0:1(:3, maybe).
>
> figure 3.4 pp23 does not clarify xmm* use for vector return at all - only mentioning floating point.
>
> ===== status
>
> In any event, GCC passes the vec32 return in memory,
> LLVM conversely passes it in xmm0:1 (at least for the versions I’ve tried).
>
> which leads to an ABI discrepancy when GCC is used to build code on systems based on LLVM.
>
> Please could the X86 maintainers clarify the intent (and maybe consider enhancing the footnote classification notes to make things clearer)?
>
> - and then we can figure out how to deal with the systems that are already implemented - and how to move forward.
>
> (as an aside, in any event, it seems inefficient to pass through memory when at least xmm0:1 are already set aside for return value use).

Please open a bug to keep track.

Thanks.

--
H.J.
Reply | Threaded
Open this post in threaded view
|

Re: clarification on the intent of X86_64 psABI vector return.

Iain Sandoe-5

> On 30 Oct 2018, at 13:26, H.J. Lu <[hidden email]> wrote:
>
> In Tue, Oct 30, 2018 at 4:28 AM Iain Sandoe <[hidden email]> wrote:
>>

>
> Please open a bug to keep track.

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87812