PROPOSAL: Extend inline asm syntax with size spec

classic Classic list List threaded Threaded
40 messages Options
12
Reply | Threaded
Open this post in threaded view
|

PROPOSAL: Extend inline asm syntax with size spec

Borislav Petkov-2
Hi people,

this is an attempt to see whether gcc's inline asm heuristic when
estimating inline asm statements' cost for better inlining can be
improved.

AFAIU, the problematic arises when one ends up using a lot of inline
asm statements in the kernel but due to the inline asm cost estimation
heuristic which counts lines, I think, for example like in this here
macro:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/x86/include/asm/cpufeature.h#n162

the resulting code ends up not inlining the functions themselves which
use this macro. I.e., you see a CALL <function> instead of its body
getting inlined directly.

Even though it should be because the actual instructions are only a
couple in most cases and all those other directives end up in another
section anyway.

The issue is explained below in the forwarded mail in a larger detail
too.

Now, Richard suggested doing something like:

 1) inline asm ("...")
 2) asm ("..." : : : : <size-expr>)
 3) asm ("...") __attribute__((asm_size(<size-expr>)));

with which user can tell gcc what the size of that inline asm statement
is and thus allow for more precise cost estimation and in the end better
inlining.

And FWIW 3) looks pretty straight-forward to me because attributes are
pretty common anyways.

But I'm sure there are other options and I'm sure people will have
better/different ideas so feel free to chime in.

Thx.

On Wed, Oct 03, 2018 at 02:30:50PM -0700, Nadav Amit wrote:

> This patch-set deals with an interesting yet stupid problem: kernel code
> that does not get inlined despite its simplicity. There are several
> causes for this behavior: "cold" attribute on __init, different function
> optimization levels; conditional constant computations based on
> __builtin_constant_p(); and finally large inline assembly blocks.
>
> This patch-set deals with the inline assembly problem. I separated these
> patches from the others (that were sent in the RFC) for easier
> inclusion. I also separated the removal of unnecessary new-lines which
> would be sent separately.
>
> The problem with inline assembly is that inline assembly is often used
> by the kernel for things that are other than code - for example,
> assembly directives and data. GCC however is oblivious to the content of
> the blocks and assumes their cost in space and time is proportional to
> the number of the perceived assembly "instruction", according to the
> number of newlines and semicolons. Alternatives, paravirt and other
> mechanisms are affected, causing code not to be inlined, and degrading
> compilation quality in general.
>
> The solution that this patch-set carries for this problem is to create
> an assembly macro, and then call it from the inline assembly block.  As
> a result, the compiler sees a single "instruction" and assigns the more
> appropriate cost to the code.
>
> To avoid uglification of the code, as many noted, the macros are first
> precompiled into an assembly file, which is later assembled together
> with the C files. This also enables to avoid duplicate implementation
> that was set before for the asm and C code. This can be seen in the
> exception table changes.
>
> Overall this patch-set slightly increases the kernel size (my build was
> done using my Ubuntu 18.04 config + localyesconfig for the record):
>
>    text   data    bss    dec    hex filename
> 18140829 10224724 2957312 31322865 1ddf2f1 ./vmlinux before
> 18163608 10227348 2957312 31348268 1de562c ./vmlinux after (+0.1%)
>
> The number of static functions in the image is reduced by 379, but
> actually inlining is even better, which does not always shows in these
> numbers: a function may be inlined causing the calling function not to
> be inlined.
>
> I ran some limited number of benchmarks, and in general the performance
> impact is not very notable. You can still see >10 cycles shaved off some
> syscalls that manipulate page-tables (e.g., mprotect()), in which
> paravirt caused many functions not to be inlined. In addition this
> patch-set can prevent issues such as [1], and improves code readability
> and maintainability.
>
> Update: Rasmus recently caused me (inadvertently) to become paranoid
> about the dependencies. To clarify: if any of the headers changes, any c
> file which uses macros that are included in macros.S would be fine as
> long as it includes the header as well (as it should). Adding an
> assertion to check this is done might become slightly ugly, and nobody
> else is concerned about it. Another minor issue is that changes of
> macros.S would not trigger a global rebuild, but that is pretty similar
> to changes of the Makefile that do not trigger a rebuild.
>
> [1] https://patchwork.kernel.org/patch/10450037/
>
> v8->v9: * Restoring the '-pipe' parameter (Rasmus)
> * Adding Kees's tested-by tag (Kees)
>
> v7->v8: * Add acks (Masahiro, Max)
> * Rebase on 4.19 (Ingo)
>
> v6->v7: * Fix context switch tracking (Ingo)
> * Fix xtensa build error (Ingo)
> * Rebase on 4.18-rc8
>
> v5->v6: * Removing more code from jump-labels (PeterZ)
> * Fix build issue on i386 (0-day, PeterZ)
>
> v4->v5: * Makefile fixes (Masahiro, Sam)
>
> v3->v4: * Changed naming of macros in 2 patches (PeterZ)
> * Minor cleanup of the paravirt patch
>
> v2->v3: * Several build issues resolved (0-day)
> * Wrong comments fix (Josh)
> * Change asm vs C order in refcount (Kees)
>
> v1->v2: * Compiling the macros into a separate .s file, improving
>  readability (Linus)
> * Improving assembly formatting, applying most of the comments
>  according to my judgment (Jan)
> * Adding exception-table, cpufeature and jump-labels
> * Removing new-line cleanup; to be submitted separately
>
> Cc: Masahiro Yamada <[hidden email]>
> Cc: Sam Ravnborg <[hidden email]>
> Cc: Alok Kataria <[hidden email]>
> Cc: Christopher Li <[hidden email]>
> Cc: Greg Kroah-Hartman <[hidden email]>
> Cc: "H. Peter Anvin" <[hidden email]>
> Cc: Ingo Molnar <[hidden email]>
> Cc: Jan Beulich <[hidden email]>
> Cc: Josh Poimboeuf <[hidden email]>
> Cc: Juergen Gross <[hidden email]>
> Cc: Kate Stewart <[hidden email]>
> Cc: Kees Cook <[hidden email]>
> Cc: [hidden email]
> Cc: Peter Zijlstra <[hidden email]>
> Cc: Philippe Ombredanne <[hidden email]>
> Cc: Thomas Gleixner <[hidden email]>
> Cc: [hidden email]
> Cc: Linus Torvalds <[hidden email]>
> Cc: [hidden email]
> Cc: Chris Zankel <[hidden email]>
> Cc: Max Filippov <[hidden email]>
> Cc: [hidden email]
>
> Nadav Amit (10):
>   xtensa: defining LINKER_SCRIPT for the linker script
>   Makefile: Prepare for using macros for inline asm
>   x86: objtool: use asm macro for better compiler decisions
>   x86: refcount: prevent gcc distortions
>   x86: alternatives: macrofy locks for better inlining
>   x86: bug: prevent gcc distortions
>   x86: prevent inline distortion by paravirt ops
>   x86: extable: use macros instead of inline assembly
>   x86: cpufeature: use macros instead of inline assembly
>   x86: jump-labels: use macros instead of inline assembly
>
>  Makefile                               |  9 ++-
>  arch/x86/Makefile                      |  7 ++
>  arch/x86/entry/calling.h               |  2 +-
>  arch/x86/include/asm/alternative-asm.h | 20 ++++--
>  arch/x86/include/asm/alternative.h     | 11 +--
>  arch/x86/include/asm/asm.h             | 61 +++++++---------
>  arch/x86/include/asm/bug.h             | 98 +++++++++++++++-----------
>  arch/x86/include/asm/cpufeature.h      | 82 ++++++++++++---------
>  arch/x86/include/asm/jump_label.h      | 77 ++++++++------------
>  arch/x86/include/asm/paravirt_types.h  | 56 +++++++--------
>  arch/x86/include/asm/refcount.h        | 74 +++++++++++--------
>  arch/x86/kernel/macros.S               | 16 +++++
>  arch/xtensa/kernel/Makefile            |  4 +-
>  include/asm-generic/bug.h              |  8 +--
>  include/linux/compiler.h               | 56 +++++++++++----
>  scripts/Kbuild.include                 |  4 +-
>  scripts/mod/Makefile                   |  2 +
>  17 files changed, 331 insertions(+), 256 deletions(-)
>  create mode 100644 arch/x86/kernel/macros.S
>
> --
> 2.17.1
>

--
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.
Reply | Threaded
Open this post in threaded view
|

Re: PROPOSAL: Extend inline asm syntax with size spec

Segher Boessenkool
On Sun, Oct 07, 2018 at 11:18:06AM +0200, Borislav Petkov wrote:
> this is an attempt to see whether gcc's inline asm heuristic when
> estimating inline asm statements' cost for better inlining can be
> improved.

GCC already estimates the *size* of inline asm, and this is required
*for correctness*.  So any workaround that works against this will only
end in tears.

Taking size as an estimate of cost is not very good.  But in your
motivating example you actually *do* care mostly about size, namely, for
the inlining decisions.

So I guess the real issue is that the inline asm size estimate for x86
isn't very good (since it has to be pessimistic, and x86 insns can be
huge)?

> Now, Richard suggested doing something like:
>
>  1) inline asm ("...")

What would the semantics of this be?

>  2) asm ("..." : : : : <size-expr>)

This potentially conflicts with the syntax for asm goto.

>  3) asm ("...") __attribute__((asm_size(<size-expr>)));

Eww.

> with which user can tell gcc what the size of that inline asm statement
> is and thus allow for more precise cost estimation and in the end better
> inlining.

More precise *size* estimates, yes.  And if the user lies he should not
be surprised to get assembler errors, etc.

> And FWIW 3) looks pretty straight-forward to me because attributes are
> pretty common anyways.

I don't like 2) either.  But 1) looks interesting, depends what its
semantics would be?  "Don't count this insn's size for inlining decisions",
maybe?


Another option is to just force inlining for those few functions where
GCC currently makes an inlining decision you don't like.  Or are there
more than a few?


Segher
Reply | Threaded
Open this post in threaded view
|

Re: PROPOSAL: Extend inline asm syntax with size spec

Borislav Petkov-2
On Sun, Oct 07, 2018 at 08:22:28AM -0500, Segher Boessenkool wrote:
> GCC already estimates the *size* of inline asm, and this is required
> *for correctness*.

I didn't say it didn't - but the heuristic could use improving.

> So I guess the real issue is that the inline asm size estimate for x86
> isn't very good (since it has to be pessimistic, and x86 insns can be
> huge)?

Well, the size thing could be just a "parameter" or "hint" of sorts, to
tell gcc to inline the function X which is inlining the asm statement
into the function Y which is calling function X. If you look at the
patchset, it is moving everything to asm macros where gcc is apparently
able to do better inlining.

> >  3) asm ("...") __attribute__((asm_size(<size-expr>)));
>
> Eww.

Why?

> More precise *size* estimates, yes.  And if the user lies he should not
> be surprised to get assembler errors, etc.

Yes.

Another option would be if gcc parses the inline asm directly and
does a more precise size estimation. Which is a lot more involved and
complicated solution so I guess we wanna look at the simpler ones first.

:-)

> I don't like 2) either.  But 1) looks interesting, depends what its
> semantics would be?  "Don't count this insn's size for inlining decisions",
> maybe?

Or simply "this asm statement has a size of 1" to mean, inline it
everywhere. Which has the same caveats as above.

> Another option is to just force inlining for those few functions where
> GCC currently makes an inlining decision you don't like.  Or are there
> more than a few?

I'm afraid they're more than a few and this should work automatically,
if possible.

Thx.

--
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.
Reply | Threaded
Open this post in threaded view
|

Re: PROPOSAL: Extend inline asm syntax with size spec

Segher Boessenkool
On Sun, Oct 07, 2018 at 04:13:49PM +0200, Borislav Petkov wrote:
> On Sun, Oct 07, 2018 at 08:22:28AM -0500, Segher Boessenkool wrote:
> > GCC already estimates the *size* of inline asm, and this is required
> > *for correctness*.
>
> I didn't say it didn't - but the heuristic could use improving.

How?  It is as sharp an estimate as can be *already*: number of insns
times maximum size per insn.

If you get this wrong, conditional branches (and similar things, but
conditional branches usually hit first, and hurt most) will stop working
correctly, unless binutils uses relaxation for those on your architecture
(most don't).

> > So I guess the real issue is that the inline asm size estimate for x86
> > isn't very good (since it has to be pessimistic, and x86 insns can be
> > huge)?
>
> Well, the size thing could be just a "parameter" or "hint" of sorts, to
> tell gcc to inline the function X which is inlining the asm statement
> into the function Y which is calling function X. If you look at the
> patchset, it is moving everything to asm macros where gcc is apparently
> able to do better inlining.

Yes, that will cause fewer problems I think: do not override size _at all_,
but give a hint to the inliner.

> > >  3) asm ("...") __attribute__((asm_size(<size-expr>)));
> >
> > Eww.
>
> Why?

Attributes are clumsy and clunky and kludgy.

It never is well-defined how attributes interact, and the more attributes
we define and use, the more that matters.

> > More precise *size* estimates, yes.  And if the user lies he should not
> > be surprised to get assembler errors, etc.
>
> Yes.
>
> Another option would be if gcc parses the inline asm directly and
> does a more precise size estimation. Which is a lot more involved and
> complicated solution so I guess we wanna look at the simpler ones first.
>
> :-)

Which is *impossible* to do.  Inline assembler is free-form text.

> > I don't like 2) either.  But 1) looks interesting, depends what its
> > semantics would be?  "Don't count this insn's size for inlining decisions",
> > maybe?
>
> Or simply "this asm statement has a size of 1" to mean, inline it
> everywhere. Which has the same caveats as above.

"Has minimum length" then (size 1 cannot work on most archs).

> > Another option is to just force inlining for those few functions where
> > GCC currently makes an inlining decision you don't like.  Or are there
> > more than a few?
>
> I'm afraid they're more than a few and this should work automatically,
> if possible.

Would counting *all* asms as having minimum length for inlining decisions
work?  Will that give bad side effects?

Or since this problem is quite specific to x86, maybe some target hook is
wanted?  Things work quite well elsewhere as-is, degrading that is not a
good idea.


Segher
Reply | Threaded
Open this post in threaded view
|

Re: PROPOSAL: Extend inline asm syntax with size spec

Michael Matz
In reply to this post by Segher Boessenkool
Hi Segher,

On Sun, 7 Oct 2018, Segher Boessenkool wrote:

> On Sun, Oct 07, 2018 at 11:18:06AM +0200, Borislav Petkov wrote:
> > this is an attempt to see whether gcc's inline asm heuristic when
> > estimating inline asm statements' cost for better inlining can be
> > improved.
>
> GCC already estimates the *size* of inline asm, and this is required
> *for correctness*.  So any workaround that works against this will only
> end in tears.

You're right and wrong.  GCC can't even estimate the size of mildly
complicated inline asms right now, so your claim of it being necessary for
correctness can't be true in this absolute form.  I know what you try to
say, but still, consider inline asms like this:

     insn1
  .section bla
     insn2
  .previous

or
   invoke_asm_macro foo,bar

in both cases GCCs size estimate will be wrong however you want to count
it.  This is actually the motivating example for the kernel guys, the
games they play within their inline asms make the estimates be wildly
wrong to a point it interacts with the inliner.

> So I guess the real issue is that the inline asm size estimate for x86
> isn't very good (since it has to be pessimistic, and x86 insns can be
> huge)?

No, see above, even if we were to improve the size estimates (e.g. based
on some average instruction size) the kernel examples would still be off
because they switch sections back and forth, use asm macros and computed
.fill directives and maybe further stuff.  GCC will never be able to
accurately calculate these sizes (without an built-in assembler which
hopefully noone proposes).

So, there is a case for extending the inline-asm facility to say
"size is complicated here, assume this for inline decisions".

> > Now, Richard suggested doing something like:
> >
> >  1) inline asm ("...")
>
> What would the semantics of this be?

The size of the inline asm wouldn't be counted towards the inliner size
limits (or be counted as "1").

> I don't like 2) either.  But 1) looks interesting, depends what its
> semantics would be?  "Don't count this insn's size for inlining decisions",
> maybe?

TBH, I like the inline asm (...) suggestion most currently, but what if we
want to add more attributes to asms?  We could add further special
keywords to the clobber list:
  asm ("...." : : : "cc,memory,inline");
sure, it might seem strange to "clobber" inline, but if we reinterpret the
clobber list as arbitrary set of attributes for this asm, it'd be fine.

> Another option is to just force inlining for those few functions where
> GCC currently makes an inlining decision you don't like.  Or are there
> more than a few?

I think the examples I saw from Boris were all indirect inlines:

  static inline void foo() { asm("large-looking-but-small-asm"); }
  static void bar1() { ... foo() ... }
  static void bar2() { ... foo() ... }
  void goo (void) { bar1(); }  // bar1 should have been inlined

So, while the immediate asm user was marked as always inline that in turn
caused users of it to become non-inlined.  I'm assuming the kernel guys
did proper measurements that they _really_ get some non-trivial speed
benefit by inlining bar1/bar2, but for some reasons (I didn't inquire)
didn't want to mark them all as inline as well.


Ciao,
Michael.
Reply | Threaded
Open this post in threaded view
|

[RESEND] PROPOSAL: Extend inline asm syntax with size spec

Nadav Amit
In reply to this post by Borislav Petkov-2
Second try after being blocked by gcc mailing list:

at 9:09 AM, Nadav Amit <[hidden email]<mailto:[hidden email]>> wrote:

at 2:18 AM, Borislav Petkov <[hidden email]<mailto:[hidden email]>> wrote:

Hi people,

this is an attempt to see whether gcc's inline asm heuristic when
estimating inline asm statements' cost for better inlining can be
improved.

AFAIU, the problematic arises when one ends up using a lot of inline
asm statements in the kernel but due to the inline asm cost estimation
heuristic which counts lines, I think, for example like in this here
macro:

https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgit.kernel.org%2Fpub%2Fscm%2Flinux%2Fkernel%2Fgit%2Ftorvalds%2Flinux.git%2Ftree%2Farch%2Fx86%2Finclude%2Fasm%2Fcpufeature.h%23n162&amp;data=02%7C01%7Cnamit%40vmware.com%7C6db1258c65ea45bbe4ea08d62c35ceec%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C1%7C0%7C636745007006838299&amp;sdata=iehl%2Fb8h%2BZE%2Frqb4qjac19WekSgOObc9%2BM1Jto1VgF4%3D&amp;reserved=0

the resulting code ends up not inlining the functions themselves which
use this macro. I.e., you see a CALL <function> instead of its body
getting inlined directly.

Even though it should be because the actual instructions are only a
couple in most cases and all those other directives end up in another
section anyway.

The issue is explained below in the forwarded mail in a larger detail
too.

Now, Richard suggested doing something like:

1) inline asm ("...")
2) asm ("..." : : : : <size-expr>)
3) asm ("...") __attribute__((asm_size(<size-expr>)));

with which user can tell gcc what the size of that inline asm statement
is and thus allow for more precise cost estimation and in the end better
inlining.

And FWIW 3) looks pretty straight-forward to me because attributes are
pretty common anyways.

But I'm sure there are other options and I'm sure people will have
better/different ideas so feel free to chime in.

Thanks for taking care of it. I would like to mention a second issue, since
you may want to resolve both with a single solution: not inlining
conditional __builtin_constant_p(), in which there are two code-paths - one
for constants and one for variables.

Consider for example the Linux kernel ilog2 macro, which has a condition
based on __builtin_constant_p() (
https://elixir.bootlin.com/linux/v4.19-rc7/source/include/linux/log2.h#L160
). The compiler mistakenly considers the “heavy” code-path that is supposed
to be evaluated only in compilation time to evaluate the code size. This
causes the kernel to consider functions such as kmalloc() as “big”.
kmalloc() is marked with always_inline attribute, so instead the calling
functions, such as kzalloc() are not inlined.

When I thought about hacking gcc to solve this issue, I considered an
intrinsic that would override the cost of a given statement. This solution
is not too nice, but may solve both issues.

In addition, note that AFAIU the impact of a wrong cost of code estimation
can also impact loop and other optimizations.

Regards,
Nadav


Reply | Threaded
Open this post in threaded view
|

Re: PROPOSAL: Extend inline asm syntax with size spec

Richard Biener
In reply to this post by Borislav Petkov-2
On October 7, 2018 6:09:30 PM GMT+02:00, Nadav Amit <[hidden email]> wrote:

>at 2:18 AM, Borislav Petkov <[hidden email]> wrote:
>
>> Hi people,
>>
>> this is an attempt to see whether gcc's inline asm heuristic when
>> estimating inline asm statements' cost for better inlining can be
>> improved.
>>
>> AFAIU, the problematic arises when one ends up using a lot of inline
>> asm statements in the kernel but due to the inline asm cost
>estimation
>> heuristic which counts lines, I think, for example like in this here
>> macro:
>>
>>
>https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgit.kernel.org%2Fpub%2Fscm%2Flinux%2Fkernel%2Fgit%2Ftorvalds%2Flinux.git%2Ftree%2Farch%2Fx86%2Finclude%2Fasm%2Fcpufeature.h%23n162&amp;data=02%7C01%7Cnamit%40vmware.com%7C6db1258c65ea45bbe4ea08d62c35ceec%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C1%7C0%7C636745007006838299&amp;sdata=iehl%2Fb8h%2BZE%2Frqb4qjac19WekSgOObc9%2BM1Jto1VgF4%3D&amp;reserved=0
>>
>> the resulting code ends up not inlining the functions themselves
>which
>> use this macro. I.e., you see a CALL <function> instead of its body
>> getting inlined directly.
>>
>> Even though it should be because the actual instructions are only a
>> couple in most cases and all those other directives end up in another
>> section anyway.
>>
>> The issue is explained below in the forwarded mail in a larger detail
>> too.
>>
>> Now, Richard suggested doing something like:
>>
>> 1) inline asm ("...")
>> 2) asm ("..." : : : : <size-expr>)
>> 3) asm ("...") __attribute__((asm_size(<size-expr>)));
>>
>> with which user can tell gcc what the size of that inline asm
>statement
>> is and thus allow for more precise cost estimation and in the end
>better
>> inlining.
>>
>> And FWIW 3) looks pretty straight-forward to me because attributes
>are
>> pretty common anyways.
>>
>> But I'm sure there are other options and I'm sure people will have
>> better/different ideas so feel free to chime in.
>
>Thanks for taking care of it. I would like to mention a second issue,
>since
>you may want to resolve both with a single solution: not inlining
>conditional __builtin_constant_p(), in which there are two code-paths -
>one
>for constants and one for variables.
>
>Consider for example the Linux kernel ilog2 macro, which has a
>condition
>based on __builtin_constant_p() (
>https://elixir.bootlin.com/linux/v4.19-rc7/source/include/linux/log2.h#L160
>). The compiler mistakenly considers the “heavy” code-path that is
>supposed
>to be evaluated only in compilation time to evaluate the code size.

But this is a misconception about __builtin_constant_p. It doesn't guard sth like 'constexpr' regions. If you try to use it with those semantics you'll fail (appearantly you do).

Of course IPA CP code size estimates when seeing a constant fed to bcp might be not optimal, that's another issue of course.

Richard.

>This
>causes the kernel to consider functions such as kmalloc() as “big”.
>kmalloc() is marked with always_inline attribute, so instead the
>calling
>functions, such as kzalloc() are not inlined.
>
>When I thought about hacking gcc to solve this issue, I considered an
>intrinsic that would override the cost of a given statement. This
>solution
>is not too nice, but may solve both issues.
>
>In addition, note that AFAIU the impact of a wrong cost of code
>estimation
>can also impact loop and other optimizations.
>
>Regards,
>Nadav

Reply | Threaded
Open this post in threaded view
|

Re: PROPOSAL: Extend inline asm syntax with size spec

Nadav Amit
at 9:46 AM, Richard Biener <[hidden email]> wrote:

> On October 7, 2018 6:09:30 PM GMT+02:00, Nadav Amit <[hidden email]> wrote:
>> at 2:18 AM, Borislav Petkov <[hidden email]> wrote:
>>
>>> Hi people,
>>>
>>> this is an attempt to see whether gcc's inline asm heuristic when
>>> estimating inline asm statements' cost for better inlining can be
>>> improved.
>>>
>>> AFAIU, the problematic arises when one ends up using a lot of inline
>>> asm statements in the kernel but due to the inline asm cost
>> estimation
>>> heuristic which counts lines, I think, for example like in this here
>>> macro:
>> https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgit.kernel.org%2Fpub%2Fscm%2Flinux%2Fkernel%2Fgit%2Ftorvalds%2Flinux.git%2Ftree%2Farch%2Fx86%2Finclude%2Fasm%2Fcpufeature.h%23n162&amp;data=02%7C01%7Cnamit%40vmware.com%7C860403cecb874db64b7e08d62c746f46%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C1%7C0%7C636745275975505381&amp;sdata=Nd0636K9Z1IsUs1RWSRAhVuVboLxlBCB4peiAMfmQzQ%3D&amp;reserved=0
>>> the resulting code ends up not inlining the functions themselves
>> which
>>> use this macro. I.e., you see a CALL <function> instead of its body
>>> getting inlined directly.
>>>
>>> Even though it should be because the actual instructions are only a
>>> couple in most cases and all those other directives end up in another
>>> section anyway.
>>>
>>> The issue is explained below in the forwarded mail in a larger detail
>>> too.
>>>
>>> Now, Richard suggested doing something like:
>>>
>>> 1) inline asm ("...")
>>> 2) asm ("..." : : : : <size-expr>)
>>> 3) asm ("...") __attribute__((asm_size(<size-expr>)));
>>>
>>> with which user can tell gcc what the size of that inline asm
>> statement
>>> is and thus allow for more precise cost estimation and in the end
>> better
>>> inlining.
>>>
>>> And FWIW 3) looks pretty straight-forward to me because attributes
>> are
>>> pretty common anyways.
>>>
>>> But I'm sure there are other options and I'm sure people will have
>>> better/different ideas so feel free to chime in.
>>
>> Thanks for taking care of it. I would like to mention a second issue,
>> since
>> you may want to resolve both with a single solution: not inlining
>> conditional __builtin_constant_p(), in which there are two code-paths -
>> one
>> for constants and one for variables.
>>
>> Consider for example the Linux kernel ilog2 macro, which has a
>> condition
>> based on __builtin_constant_p() (
>> https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Felixir.bootlin.com%2Flinux%2Fv4.19-rc7%2Fsource%2Finclude%2Flinux%2Flog2.h%23L160&amp;data=02%7C01%7Cnamit%40vmware.com%7C860403cecb874db64b7e08d62c746f46%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C1%7C0%7C636745275975515386&amp;sdata=Hk39Za9%2FxcFyK0sGENB24d6QySjsDGzF%2FwqjnUEMiGk%3D&amp;reserved=0
>> ). The compiler mistakenly considers the “heavy” code-path that is
>> supposed
>> to be evaluated only in compilation time to evaluate the code size.
>
> But this is a misconception about __builtin_constant_p. It doesn't guard sth like 'constexpr' regions. If you try to use it with those semantics you'll fail (appearantly you do).
>
> Of course IPA CP code size estimates when seeing a constant fed to bcp might be not optimal, that's another issue of course.

I understand that this is might not be the right way to implement macros
such as ilog2() and test_bit(), but this code is around for some time.

I thought of using __builtin_choose_expr() instead of ternary operator, but
this introduces a different problem, as the variable version is used instead
of the constant one in many cases. From my brief experiments with llvm, it
appears that llvm does not have both of these issues (wrong cost attributed
to inline asm and conditions based on __builtin_constant_p()).

So what alternative do you propose to implement ilog2() like behavior? I was
digging through the gcc code to find a workaround with no success.

Thanks,
Nadav

Reply | Threaded
Open this post in threaded view
|

Re: PROPOSAL: Extend inline asm syntax with size spec

Jeff Law
On 10/7/18 1:06 PM, Nadav Amit wrote:

> at 9:46 AM, Richard Biener <[hidden email]> wrote:
>
>> On October 7, 2018 6:09:30 PM GMT+02:00, Nadav Amit <[hidden email]> wrote:
>>> at 2:18 AM, Borislav Petkov <[hidden email]> wrote:
>>>
>>>> Hi people,
>>>>
>>>> this is an attempt to see whether gcc's inline asm heuristic when
>>>> estimating inline asm statements' cost for better inlining can be
>>>> improved.
>>>>
>>>> AFAIU, the problematic arises when one ends up using a lot of inline
>>>> asm statements in the kernel but due to the inline asm cost
>>> estimation
>>>> heuristic which counts lines, I think, for example like in this here
>>>> macro:
>>> https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgit.kernel.org%2Fpub%2Fscm%2Flinux%2Fkernel%2Fgit%2Ftorvalds%2Flinux.git%2Ftree%2Farch%2Fx86%2Finclude%2Fasm%2Fcpufeature.h%23n162&amp;data=02%7C01%7Cnamit%40vmware.com%7C860403cecb874db64b7e08d62c746f46%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C1%7C0%7C636745275975505381&amp;sdata=Nd0636K9Z1IsUs1RWSRAhVuVboLxlBCB4peiAMfmQzQ%3D&amp;reserved=0
>>>> the resulting code ends up not inlining the functions themselves
>>> which
>>>> use this macro. I.e., you see a CALL <function> instead of its body
>>>> getting inlined directly.
>>>>
>>>> Even though it should be because the actual instructions are only a
>>>> couple in most cases and all those other directives end up in another
>>>> section anyway.
>>>>
>>>> The issue is explained below in the forwarded mail in a larger detail
>>>> too.
>>>>
>>>> Now, Richard suggested doing something like:
>>>>
>>>> 1) inline asm ("...")
>>>> 2) asm ("..." : : : : <size-expr>)
>>>> 3) asm ("...") __attribute__((asm_size(<size-expr>)));
>>>>
>>>> with which user can tell gcc what the size of that inline asm
>>> statement
>>>> is and thus allow for more precise cost estimation and in the end
>>> better
>>>> inlining.
>>>>
>>>> And FWIW 3) looks pretty straight-forward to me because attributes
>>> are
>>>> pretty common anyways.
>>>>
>>>> But I'm sure there are other options and I'm sure people will have
>>>> better/different ideas so feel free to chime in.
>>>
>>> Thanks for taking care of it. I would like to mention a second issue,
>>> since
>>> you may want to resolve both with a single solution: not inlining
>>> conditional __builtin_constant_p(), in which there are two code-paths -
>>> one
>>> for constants and one for variables.
>>>
>>> Consider for example the Linux kernel ilog2 macro, which has a
>>> condition
>>> based on __builtin_constant_p() (
>>> https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Felixir.bootlin.com%2Flinux%2Fv4.19-rc7%2Fsource%2Finclude%2Flinux%2Flog2.h%23L160&amp;data=02%7C01%7Cnamit%40vmware.com%7C860403cecb874db64b7e08d62c746f46%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C1%7C0%7C636745275975515386&amp;sdata=Hk39Za9%2FxcFyK0sGENB24d6QySjsDGzF%2FwqjnUEMiGk%3D&amp;reserved=0
>>> ). The compiler mistakenly considers the “heavy” code-path that is
>>> supposed
>>> to be evaluated only in compilation time to evaluate the code size.
>>
>> But this is a misconception about __builtin_constant_p. It doesn't guard sth like 'constexpr' regions. If you try to use it with those semantics you'll fail (appearantly you do).
>>
>> Of course IPA CP code size estimates when seeing a constant fed to bcp might be not optimal, that's another issue of course.
>
> I understand that this is might not be the right way to implement macros
> such as ilog2() and test_bit(), but this code is around for some time.
That doesn't make it right -- and there's been numerous bogus bugs
reported against ilog2 because the authors of ilog2 haven't had a clear
understanding of the semantics of builtin_constant_p.


Jeff
Reply | Threaded
Open this post in threaded view
|

Re: PROPOSAL: Extend inline asm syntax with size spec

Segher Boessenkool
In reply to this post by Michael Matz
Hi!

On Sun, Oct 07, 2018 at 03:53:26PM +0000, Michael Matz wrote:

> On Sun, 7 Oct 2018, Segher Boessenkool wrote:
> > On Sun, Oct 07, 2018 at 11:18:06AM +0200, Borislav Petkov wrote:
> > > this is an attempt to see whether gcc's inline asm heuristic when
> > > estimating inline asm statements' cost for better inlining can be
> > > improved.
> >
> > GCC already estimates the *size* of inline asm, and this is required
> > *for correctness*.  So any workaround that works against this will only
> > end in tears.
>
> You're right and wrong.  GCC can't even estimate the size of mildly
> complicated inline asms right now, so your claim of it being necessary for
> correctness can't be true in this absolute form.  I know what you try to
> say, but still, consider inline asms like this:
>
>      insn1
>   .section bla
>      insn2
>   .previous
>
> or
>    invoke_asm_macro foo,bar
>
> in both cases GCCs size estimate will be wrong however you want to count
> it.  This is actually the motivating example for the kernel guys, the
> games they play within their inline asms make the estimates be wildly
> wrong to a point it interacts with the inliner.

Right.  The manual says:

"""
Some targets require that GCC track the size of each instruction used
in order to generate correct code.  Because the final length of the
code produced by an @code{asm} statement is only known by the
assembler, GCC must make an estimate as to how big it will be.  It
does this by counting the number of instructions in the pattern of the
@code{asm} and multiplying that by the length of the longest
instruction supported by that processor.  (When working out the number
of instructions, it assumes that any occurrence of a newline or of
whatever statement separator character is supported by the assembler --
typically @samp{;} --- indicates the end of an instruction.)

Normally, GCC's estimate is adequate to ensure that correct
code is generated, but it is possible to confuse the compiler if you use
pseudo instructions or assembler macros that expand into multiple real
instructions, or if you use assembler directives that expand to more
space in the object file than is needed for a single instruction.
If this happens then the assembler may produce a diagnostic saying that
a label is unreachable.
"""

It *is* necessary for correctness, except you can do things that can
confuse the compiler and then you are on your own anyway.

> > So I guess the real issue is that the inline asm size estimate for x86
> > isn't very good (since it has to be pessimistic, and x86 insns can be
> > huge)?
>
> No, see above, even if we were to improve the size estimates (e.g. based
> on some average instruction size) the kernel examples would still be off
> because they switch sections back and forth, use asm macros and computed
> .fill directives and maybe further stuff.  GCC will never be able to
> accurately calculate these sizes

What *is* such a size, anyway?  If it can be spread over multiple sections
(some of which support section merging), and you can have huge alignments,
etc.  What is needed here is not knowing the maximum size of the binary
output (however you want to define that), but some way for the compiler
to understand how bad it is to inline some assembler.  Maybe manual
direction, maybe just the current jeuristics can be tweaked a bit, maybe
we need to invent some attribute or two.

> (without an built-in assembler which hopefully noone proposes).

Not me, that's for sure.

> So, there is a case for extending the inline-asm facility to say
> "size is complicated here, assume this for inline decisions".

Yeah, that's an option.  It may be too complicated though, or just not
useful in its generality, so that everyone will use "1" (or "1 normal
size instruction"), and then we are better off just making something
for _that_ (or making it the default).

> > > Now, Richard suggested doing something like:
> > >
> > >  1) inline asm ("...")
> >
> > What would the semantics of this be?
>
> The size of the inline asm wouldn't be counted towards the inliner size
> limits (or be counted as "1").

That sounds like a good option.

> > I don't like 2) either.  But 1) looks interesting, depends what its
> > semantics would be?  "Don't count this insn's size for inlining decisions",
> > maybe?
>
> TBH, I like the inline asm (...) suggestion most currently, but what if we
> want to add more attributes to asms?  We could add further special
> keywords to the clobber list:
>   asm ("...." : : : "cc,memory,inline");
> sure, it might seem strange to "clobber" inline, but if we reinterpret the
> clobber list as arbitrary set of attributes for this asm, it'd be fine.

All of a targets register names and alternative register names are
allowed in the clobber list.  Will that never conflict with an attribute
name?  We already *have* syntax for specifying attributes on an asm (on
*any* statement even), so mixing these two things has no advantage.

Both "cc" and "memory" have their own problems of course, adding more
things to this just feels bad.  It may not be so bad ;-)

> > Another option is to just force inlining for those few functions where
> > GCC currently makes an inlining decision you don't like.  Or are there
> > more than a few?
>
> I think the examples I saw from Boris were all indirect inlines:
>
>   static inline void foo() { asm("large-looking-but-small-asm"); }
>   static void bar1() { ... foo() ... }
>   static void bar2() { ... foo() ... }
>   void goo (void) { bar1(); }  // bar1 should have been inlined
>
> So, while the immediate asm user was marked as always inline that in turn
> caused users of it to become non-inlined.  I'm assuming the kernel guys
> did proper measurements that they _really_ get some non-trivial speed
> benefit by inlining bar1/bar2, but for some reasons (I didn't inquire)
> didn't want to mark them all as inline as well.

Yeah that makes sense, like if this happens with the fixup stuff, it will
quickly spiral out of control.


Segher
Reply | Threaded
Open this post in threaded view
|

Re: PROPOSAL: Extend inline asm syntax with size spec

Richard Biener
In reply to this post by Nadav Amit
On Sun, 7 Oct 2018, Nadav Amit wrote:

> at 9:46 AM, Richard Biener <[hidden email]> wrote:
>
> > On October 7, 2018 6:09:30 PM GMT+02:00, Nadav Amit <[hidden email]> wrote:
> >> at 2:18 AM, Borislav Petkov <[hidden email]> wrote:
> >>
> >>> Hi people,
> >>>
> >>> this is an attempt to see whether gcc's inline asm heuristic when
> >>> estimating inline asm statements' cost for better inlining can be
> >>> improved.
> >>>
> >>> AFAIU, the problematic arises when one ends up using a lot of inline
> >>> asm statements in the kernel but due to the inline asm cost
> >> estimation
> >>> heuristic which counts lines, I think, for example like in this here
> >>> macro:
> >> https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgit.kernel.org%2Fpub%2Fscm%2Flinux%2Fkernel%2Fgit%2Ftorvalds%2Flinux.git%2Ftree%2Farch%2Fx86%2Finclude%2Fasm%2Fcpufeature.h%23n162&amp;data=02%7C01%7Cnamit%40vmware.com%7C860403cecb874db64b7e08d62c746f46%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C1%7C0%7C636745275975505381&amp;sdata=Nd0636K9Z1IsUs1RWSRAhVuVboLxlBCB4peiAMfmQzQ%3D&amp;reserved=0
> >>> the resulting code ends up not inlining the functions themselves
> >> which
> >>> use this macro. I.e., you see a CALL <function> instead of its body
> >>> getting inlined directly.
> >>>
> >>> Even though it should be because the actual instructions are only a
> >>> couple in most cases and all those other directives end up in another
> >>> section anyway.
> >>>
> >>> The issue is explained below in the forwarded mail in a larger detail
> >>> too.
> >>>
> >>> Now, Richard suggested doing something like:
> >>>
> >>> 1) inline asm ("...")
> >>> 2) asm ("..." : : : : <size-expr>)
> >>> 3) asm ("...") __attribute__((asm_size(<size-expr>)));
> >>>
> >>> with which user can tell gcc what the size of that inline asm
> >> statement
> >>> is and thus allow for more precise cost estimation and in the end
> >> better
> >>> inlining.
> >>>
> >>> And FWIW 3) looks pretty straight-forward to me because attributes
> >> are
> >>> pretty common anyways.
> >>>
> >>> But I'm sure there are other options and I'm sure people will have
> >>> better/different ideas so feel free to chime in.
> >>
> >> Thanks for taking care of it. I would like to mention a second issue,
> >> since
> >> you may want to resolve both with a single solution: not inlining
> >> conditional __builtin_constant_p(), in which there are two code-paths -
> >> one
> >> for constants and one for variables.
> >>
> >> Consider for example the Linux kernel ilog2 macro, which has a
> >> condition
> >> based on __builtin_constant_p() (
> >> https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Felixir.bootlin.com%2Flinux%2Fv4.19-rc7%2Fsource%2Finclude%2Flinux%2Flog2.h%23L160&amp;data=02%7C01%7Cnamit%40vmware.com%7C860403cecb874db64b7e08d62c746f46%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C1%7C0%7C636745275975515386&amp;sdata=Hk39Za9%2FxcFyK0sGENB24d6QySjsDGzF%2FwqjnUEMiGk%3D&amp;reserved=0
> >> ). The compiler mistakenly considers the “heavy” code-path that is
> >> supposed
> >> to be evaluated only in compilation time to evaluate the code size.
> >
> > But this is a misconception about __builtin_constant_p. It doesn't guard sth like 'constexpr' regions. If you try to use it with those semantics you'll fail (appearantly you do).
> >
> > Of course IPA CP code size estimates when seeing a constant fed to bcp might be not optimal, that's another issue of course.
>
> I understand that this is might not be the right way to implement macros
> such as ilog2() and test_bit(), but this code is around for some time.
>
> I thought of using __builtin_choose_expr() instead of ternary operator, but
> this introduces a different problem, as the variable version is used instead
> of the constant one in many cases. From my brief experiments with llvm, it
> appears that llvm does not have both of these issues (wrong cost attributed
> to inline asm and conditions based on __builtin_constant_p()).
>
> So what alternative do you propose to implement ilog2() like behavior? I was
> digging through the gcc code to find a workaround with no success.
1) Don't try to cheat the compilers constant propagation abilities
2) Use a language that allows this (C++)
3) Define (and implement) the corresponding GNU C extension

__builtin_constant_p() isn't the right fit (I wonder what it was
implemented for in the first place though...).

I suppose you want sth like

 if (__builtin_constant_p (x))
   return __constexpr ...;

or use a call and have constexpr functions.  Note it wouldn't be
C++-constexpr like since you want the constexpr evaluation to
happen very late in the compilation to benefit from optimizations
and you are fine with the non-constexpr path.

Properly defining a language extension is hard.

Richard.

>
> Thanks,
> Nadav
>
>

--
Richard Biener <[hidden email]>
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
Reply | Threaded
Open this post in threaded view
|

Re: PROPOSAL: Extend inline asm syntax with size spec

Segher Boessenkool
In reply to this post by Segher Boessenkool
On Mon, Oct 08, 2018 at 07:58:38AM +0200, Ingo Molnar wrote:

> * Segher Boessenkool <[hidden email]> wrote:
> > > > More precise *size* estimates, yes.  And if the user lies he should not
> > > > be surprised to get assembler errors, etc.
> > >
> > > Yes.
> > >
> > > Another option would be if gcc parses the inline asm directly and
> > > does a more precise size estimation. Which is a lot more involved and
> > > complicated solution so I guess we wanna look at the simpler ones first.
> > >
> > > :-)
> >
> > Which is *impossible* to do.  Inline assembler is free-form text.
>
> "Impossible" is false: only under GCC's model and semantics of inline
> asm that is, and only under the (false) assumption that the semantics
> of the asm statement (which is a GCC extension to begin with) cannot
> be changed like it has been changed multiple times in the past.
>
> "Difficult", "not worth our while", perhaps.

If we throw out our current definition of inline assembler, and of the
internal backend interfaces, then sure you can do it.  This of course
invalidates all code that uses GCC inline assembler, and all GCC backends
(both in-tree and out-of-tree, both current and historical).

If other compilers think everyone should rewrite all of their code because
those compiler do inline asm wro^H^H^Hdifferently, that is their problem;
GCC should not deny all history and screw over all its users.


Segher
Reply | Threaded
Open this post in threaded view
|

Re: PROPOSAL: Extend inline asm syntax with size spec

Segher Boessenkool
In reply to this post by Michael Matz
On Mon, Oct 08, 2018 at 08:13:23AM +0200, Ingo Molnar wrote:
> * Michael Matz <[hidden email]> wrote:
> > (without an built-in assembler which hopefully noone proposes).
> There are disadvantages (the main one is having to implement it), but a built-in assembler has
> numerous advantages as well:
>
>  - Better optimizations: for example -Os could more accurately estimate true instruction size.

GCC already knows the exact instruction size in almost all cases, for most
backends.  It is an advantage to not *have to* keep track of exact insn
size in all cases.

>  - Better inlining: as the examples in this thread are showing.

That's a red herring, the actual issue is inlining makes some spectacularly
wrong decisions for code involving asm currently.  That would not be solved
by outputting binary code instead of assembler code.  All those decisions
are done long before code is output.

>  - Better padding/alignment: right now GCC has no notion about the precise cache layout of the
>    assembly code it generates and the code alignment options it has are crude.

This isn't true.  Maybe some targets do not care.

And of course GCC only knows this as far as it knows the alignments of the
sections it outputs into; for example, ASLR is a nice performance killer
at times.  And if your linker scripts align sections to less than a cache
line things do not look rosy either, etc.

> It got away with
>    this so far because the x86 rule of thumb is that dense code is usually the right choice.

>  - Better compiler performance: it would be faster as well to immediately emit assembly
>    instructions, just like GCC's preprocessor library use speeds up compilation *significantly*
>    instead of creating a separate preprocessor task.

So how much faster do you think it would be?  Do you have actual numbers?
And no, -O0 does not count.

>  - Better future integration of assembly blocks: GCC could begin to actually understand the
>    assembly statements in inline asm and allow more user-friendly extensions to its
>    historically complex and difficult to master inline asm syntax.

If you want to add a different kind of inline assembler, you can already.
There is no need for outputting binary code for that.

> I mean, it's a fact that the GNU project has *already* defined their own assembly syntax which
> departs from decades old platform assembly syntax

GCC's asm syntax is over three decades old itself.  When it was defined
all other C inline assembler syntaxes were much younger than that.  I don't
see what your argument is here.

> - and how the assembler is called by the
> compiler is basically an implementation detail, not a conceptual choice.

But the *format* of the interface representation, in this case textual
assembler code, is quite fundamental.

> The random
> multi-process unidirectional assembler choice of the past should not be treated as orthodoxy.

Then I have good news for you: no assembler works that way these days, and
that has been true for decades.


Segher
Reply | Threaded
Open this post in threaded view
|

Re: PROPOSAL: Extend inline asm syntax with size spec

Richard Biener
In reply to this post by Segher Boessenkool
On Mon, 8 Oct 2018, Segher Boessenkool wrote:

> Hi!
>
> On Sun, Oct 07, 2018 at 03:53:26PM +0000, Michael Matz wrote:
> > On Sun, 7 Oct 2018, Segher Boessenkool wrote:
> > > On Sun, Oct 07, 2018 at 11:18:06AM +0200, Borislav Petkov wrote:
> > > > this is an attempt to see whether gcc's inline asm heuristic when
> > > > estimating inline asm statements' cost for better inlining can be
> > > > improved.
> > >
> > > GCC already estimates the *size* of inline asm, and this is required
> > > *for correctness*.  So any workaround that works against this will only
> > > end in tears.
> >
> > You're right and wrong.  GCC can't even estimate the size of mildly
> > complicated inline asms right now, so your claim of it being necessary for
> > correctness can't be true in this absolute form.  I know what you try to
> > say, but still, consider inline asms like this:
> >
> >      insn1
> >   .section bla
> >      insn2
> >   .previous
> >
> > or
> >    invoke_asm_macro foo,bar
> >
> > in both cases GCCs size estimate will be wrong however you want to count
> > it.  This is actually the motivating example for the kernel guys, the
> > games they play within their inline asms make the estimates be wildly
> > wrong to a point it interacts with the inliner.
>
> Right.  The manual says:
>
> """
> Some targets require that GCC track the size of each instruction used
> in order to generate correct code.  Because the final length of the
> code produced by an @code{asm} statement is only known by the
> assembler, GCC must make an estimate as to how big it will be.  It
> does this by counting the number of instructions in the pattern of the
> @code{asm} and multiplying that by the length of the longest
> instruction supported by that processor.  (When working out the number
> of instructions, it assumes that any occurrence of a newline or of
> whatever statement separator character is supported by the assembler --
> typically @samp{;} --- indicates the end of an instruction.)
>
> Normally, GCC's estimate is adequate to ensure that correct
> code is generated, but it is possible to confuse the compiler if you use
> pseudo instructions or assembler macros that expand into multiple real
> instructions, or if you use assembler directives that expand to more
> space in the object file than is needed for a single instruction.
> If this happens then the assembler may produce a diagnostic saying that
> a label is unreachable.
> """
>
> It *is* necessary for correctness, except you can do things that can
> confuse the compiler and then you are on your own anyway.
>
> > > So I guess the real issue is that the inline asm size estimate for x86
> > > isn't very good (since it has to be pessimistic, and x86 insns can be
> > > huge)?
> >
> > No, see above, even if we were to improve the size estimates (e.g. based
> > on some average instruction size) the kernel examples would still be off
> > because they switch sections back and forth, use asm macros and computed
> > .fill directives and maybe further stuff.  GCC will never be able to
> > accurately calculate these sizes
>
> What *is* such a size, anyway?  If it can be spread over multiple sections
> (some of which support section merging), and you can have huge alignments,
> etc.  What is needed here is not knowing the maximum size of the binary
> output (however you want to define that), but some way for the compiler
> to understand how bad it is to inline some assembler.  Maybe manual
> direction, maybe just the current jeuristics can be tweaked a bit, maybe
> we need to invent some attribute or two.
>
> > (without an built-in assembler which hopefully noone proposes).
>
> Not me, that's for sure.
>
> > So, there is a case for extending the inline-asm facility to say
> > "size is complicated here, assume this for inline decisions".
>
> Yeah, that's an option.  It may be too complicated though, or just not
> useful in its generality, so that everyone will use "1" (or "1 normal
> size instruction"), and then we are better off just making something
> for _that_ (or making it the default).
>
> > > > Now, Richard suggested doing something like:
> > > >
> > > >  1) inline asm ("...")
> > >
> > > What would the semantics of this be?
> >
> > The size of the inline asm wouldn't be counted towards the inliner size
> > limits (or be counted as "1").
>
> That sounds like a good option.

Yes, I also like it for simplicity.  It also avoids the requirement
of translating the number (in bytes?) given by the user to
"number of GIMPLE instructions" as needed by the inliner.

> > > I don't like 2) either.  But 1) looks interesting, depends what its
> > > semantics would be?  "Don't count this insn's size for inlining decisions",
> > > maybe?
> >
> > TBH, I like the inline asm (...) suggestion most currently, but what if we
> > want to add more attributes to asms?  We could add further special
> > keywords to the clobber list:
> >   asm ("...." : : : "cc,memory,inline");
> > sure, it might seem strange to "clobber" inline, but if we reinterpret the
> > clobber list as arbitrary set of attributes for this asm, it'd be fine.
>
> All of a targets register names and alternative register names are
> allowed in the clobber list.  Will that never conflict with an attribute
> name?  We already *have* syntax for specifying attributes on an asm (on
> *any* statement even), so mixing these two things has no advantage.

Heh, but I failed to make an example with attribute synatx working.
IIRC attributes do not work on stmts.  What could work is to use
a #pragma though.

Richard.

> Both "cc" and "memory" have their own problems of course, adding more
> things to this just feels bad.  It may not be so bad ;-)
>
> > > Another option is to just force inlining for those few functions where
> > > GCC currently makes an inlining decision you don't like.  Or are there
> > > more than a few?
> >
> > I think the examples I saw from Boris were all indirect inlines:
> >
> >   static inline void foo() { asm("large-looking-but-small-asm"); }
> >   static void bar1() { ... foo() ... }
> >   static void bar2() { ... foo() ... }
> >   void goo (void) { bar1(); }  // bar1 should have been inlined
> >
> > So, while the immediate asm user was marked as always inline that in turn
> > caused users of it to become non-inlined.  I'm assuming the kernel guys
> > did proper measurements that they _really_ get some non-trivial speed
> > benefit by inlining bar1/bar2, but for some reasons (I didn't inquire)
> > didn't want to mark them all as inline as well.
>
> Yeah that makes sense, like if this happens with the fixup stuff, it will
> quickly spiral out of control.
Reply | Threaded
Open this post in threaded view
|

Re: PROPOSAL: Extend inline asm syntax with size spec

Segher Boessenkool
On Mon, Oct 08, 2018 at 11:07:46AM +0200, Richard Biener wrote:
> > All of a targets register names and alternative register names are
> > allowed in the clobber list.  Will that never conflict with an attribute
> > name?  We already *have* syntax for specifying attributes on an asm (on
> > *any* statement even), so mixing these two things has no advantage.
>
> Heh, but I failed to make an example with attribute synatx working.
> IIRC attributes do not work on stmts.  What could work is to use
> a #pragma though.

Apparently statement attributes currently(?) only work for null statements.
Oh well.


Segher
Reply | Threaded
Open this post in threaded view
|

Re: PROPOSAL: Extend inline asm syntax with size spec

Segher Boessenkool
In reply to this post by Richard Biener
On Mon, Oct 08, 2018 at 11:07:46AM +0200, Richard Biener wrote:

> On Mon, 8 Oct 2018, Segher Boessenkool wrote:
> > On Sun, Oct 07, 2018 at 03:53:26PM +0000, Michael Matz wrote:
> > > On Sun, 7 Oct 2018, Segher Boessenkool wrote:
> > > > On Sun, Oct 07, 2018 at 11:18:06AM +0200, Borislav Petkov wrote:
> > > > > Now, Richard suggested doing something like:
> > > > >
> > > > >  1) inline asm ("...")
> > > >
> > > > What would the semantics of this be?
> > >
> > > The size of the inline asm wouldn't be counted towards the inliner size
> > > limits (or be counted as "1").
> >
> > That sounds like a good option.
>
> Yes, I also like it for simplicity.  It also avoids the requirement
> of translating the number (in bytes?) given by the user to
> "number of GIMPLE instructions" as needed by the inliner.

This patch implements this, for C only so far.  And the syntax is
"asm inline", which is more in line with other syntax.

How does this look?


Segher



diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c
index 1f173fc..94b1d41 100644
--- a/gcc/c/c-parser.c
+++ b/gcc/c/c-parser.c
@@ -6287,7 +6287,7 @@ static tree
 c_parser_asm_statement (c_parser *parser)
 {
   tree quals, str, outputs, inputs, clobbers, labels, ret;
-  bool simple, is_goto;
+  bool simple, is_goto, is_inline;
   location_t asm_loc = c_parser_peek_token (parser)->location;
   int section, nsections;
 
@@ -6318,6 +6318,13 @@ c_parser_asm_statement (c_parser *parser)
       is_goto = true;
     }
 
+  is_inline = false;
+  if (c_parser_next_token_is_keyword (parser, RID_INLINE))
+    {
+      c_parser_consume_token (parser);
+      is_inline = true;
+    }
+
   /* ??? Follow the C++ parser rather than using the
      lex_untranslated_string kludge.  */
   parser->lex_untranslated_string = true;
@@ -6393,7 +6400,8 @@ c_parser_asm_statement (c_parser *parser)
     c_parser_skip_to_end_of_block_or_statement (parser);
 
   ret = build_asm_stmt (quals, build_asm_expr (asm_loc, str, outputs, inputs,
-       clobbers, labels, simple));
+       clobbers, labels, simple,
+       is_inline));
 
  error:
   parser->lex_untranslated_string = false;
diff --git a/gcc/c/c-tree.h b/gcc/c/c-tree.h
index 017c01c..f5629300 100644
--- a/gcc/c/c-tree.h
+++ b/gcc/c/c-tree.h
@@ -677,7 +677,8 @@ extern tree build_compound_literal (location_t, tree, tree, bool,
 extern void check_compound_literal_type (location_t, struct c_type_name *);
 extern tree c_start_case (location_t, location_t, tree, bool);
 extern void c_finish_case (tree, tree);
-extern tree build_asm_expr (location_t, tree, tree, tree, tree, tree, bool);
+extern tree build_asm_expr (location_t, tree, tree, tree, tree, tree, bool,
+    bool);
 extern tree build_asm_stmt (tree, tree);
 extern int c_types_compatible_p (tree, tree);
 extern tree c_begin_compound_stmt (bool);
diff --git a/gcc/c/c-typeck.c b/gcc/c/c-typeck.c
index 9d09b8d..e013100 100644
--- a/gcc/c/c-typeck.c
+++ b/gcc/c/c-typeck.c
@@ -10064,7 +10064,7 @@ build_asm_stmt (tree cv_qualifier, tree args)
    are subtly different.  We use a ASM_EXPR node to represent this.  */
 tree
 build_asm_expr (location_t loc, tree string, tree outputs, tree inputs,
- tree clobbers, tree labels, bool simple)
+ tree clobbers, tree labels, bool simple, bool is_inline)
 {
   tree tail;
   tree args;
@@ -10182,6 +10182,7 @@ build_asm_expr (location_t loc, tree string, tree outputs, tree inputs,
      as volatile.  */
   ASM_INPUT_P (args) = simple;
   ASM_VOLATILE_P (args) = (noutputs == 0);
+  ASM_INLINE_P (args) = is_inline;
 
   return args;
 }
diff --git a/gcc/gimple-pretty-print.c b/gcc/gimple-pretty-print.c
index 83e2273..1a00fa3 100644
--- a/gcc/gimple-pretty-print.c
+++ b/gcc/gimple-pretty-print.c
@@ -2019,6 +2019,8 @@ dump_gimple_asm (pretty_printer *buffer, gasm *gs, int spc, dump_flags_t flags)
       pp_string (buffer, "__asm__");
       if (gimple_asm_volatile_p (gs))
  pp_string (buffer, " __volatile__");
+      if (gimple_asm_inline_p (gs))
+ pp_string (buffer, " __inline__");
       if (gimple_asm_nlabels (gs))
  pp_string (buffer, " goto");
       pp_string (buffer, "(\"");
diff --git a/gcc/gimple.h b/gcc/gimple.h
index a5dda93..8a58e07 100644
--- a/gcc/gimple.h
+++ b/gcc/gimple.h
@@ -137,6 +137,7 @@ enum gimple_rhs_class
 enum gf_mask {
     GF_ASM_INPUT = 1 << 0,
     GF_ASM_VOLATILE = 1 << 1,
+    GF_ASM_INLINE = 1 << 2,
     GF_CALL_FROM_THUNK = 1 << 0,
     GF_CALL_RETURN_SLOT_OPT = 1 << 1,
     GF_CALL_TAILCALL = 1 << 2,
@@ -3911,7 +3912,7 @@ gimple_asm_volatile_p (const gasm *asm_stmt)
 }
 
 
-/* If VOLATLE_P is true, mark asm statement ASM_STMT as volatile.  */
+/* If VOLATILE_P is true, mark asm statement ASM_STMT as volatile.  */
 
 static inline void
 gimple_asm_set_volatile (gasm *asm_stmt, bool volatile_p)
@@ -3923,6 +3924,27 @@ gimple_asm_set_volatile (gasm *asm_stmt, bool volatile_p)
 }
 
 
+/* Return true ASM_STMT ASM_STMT is an asm statement marked inline.  */
+
+static inline bool
+gimple_asm_inline_p (const gasm *asm_stmt)
+{
+  return (asm_stmt->subcode & GF_ASM_INLINE) != 0;
+}
+
+
+/* If INLINE_P is true, mark asm statement ASM_STMT as inline.  */
+
+static inline void
+gimple_asm_set_inline (gasm *asm_stmt, bool inline_p)
+{
+  if (inline_p)
+    asm_stmt->subcode |= GF_ASM_INLINE;
+  else
+    asm_stmt->subcode &= ~GF_ASM_INLINE;
+}
+
+
 /* If INPUT_P is true, mark asm ASM_STMT as an ASM_INPUT.  */
 
 static inline void
diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index 509fc2f..10b80f2 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -6315,6 +6315,7 @@ gimplify_asm_expr (tree *expr_p, gimple_seq *pre_p, gimple_seq *post_p)
 
       gimple_asm_set_volatile (stmt, ASM_VOLATILE_P (expr) || noutputs == 0);
       gimple_asm_set_input (stmt, ASM_INPUT_P (expr));
+      gimple_asm_set_inline (stmt, ASM_INLINE_P (expr));
 
       gimplify_seq_add_stmt (pre_p, stmt);
     }
diff --git a/gcc/ipa-icf-gimple.c b/gcc/ipa-icf-gimple.c
index ba39ea3..5361139 100644
--- a/gcc/ipa-icf-gimple.c
+++ b/gcc/ipa-icf-gimple.c
@@ -993,6 +993,9 @@ func_checker::compare_gimple_asm (const gasm *g1, const gasm *g2)
   if (gimple_asm_input_p (g1) != gimple_asm_input_p (g2))
     return false;
 
+  if (gimple_asm_inline_p (g1) != gimple_asm_inline_p (g2))
+    return false;
+
   if (gimple_asm_ninputs (g1) != gimple_asm_ninputs (g2))
     return false;
 
diff --git a/gcc/tree-inline.c b/gcc/tree-inline.c
index 9134253..6b1d2ea 100644
--- a/gcc/tree-inline.c
+++ b/gcc/tree-inline.c
@@ -4108,6 +4108,8 @@ estimate_num_insns (gimple *stmt, eni_weights *weights)
    with very long asm statements.  */
  if (count > 1000)
   count = 1000;
+ if (gimple_asm_inline_p (as_a <gasm *> (stmt)))
+  count = !!count;
  return MAX (1, count);
       }
 
diff --git a/gcc/tree.h b/gcc/tree.h
index 2e45f9d..160b3a7 100644
--- a/gcc/tree.h
+++ b/gcc/tree.h
@@ -1245,6 +1245,9 @@ extern tree maybe_wrap_with_location (tree, location_t);
    ASM_OPERAND with no operands.  */
 #define ASM_INPUT_P(NODE) (ASM_EXPR_CHECK (NODE)->base.static_flag)
 #define ASM_VOLATILE_P(NODE) (ASM_EXPR_CHECK (NODE)->base.public_flag)
+/* Nonzero if we want to consider this asm as minimum length and cost
+   for inlining decisions.  */
+#define ASM_INLINE_P(NODE) (ASM_EXPR_CHECK (NODE)->base.protected_flag)
 
 /* COND_EXPR accessors.  */
 #define COND_EXPR_COND(NODE) (TREE_OPERAND (COND_EXPR_CHECK (NODE), 0))
Reply | Threaded
Open this post in threaded view
|

Re: PROPOSAL: Extend inline asm syntax with size spec

Richard Biener
On Tue, 9 Oct 2018, Segher Boessenkool wrote:

> On Mon, Oct 08, 2018 at 11:07:46AM +0200, Richard Biener wrote:
> > On Mon, 8 Oct 2018, Segher Boessenkool wrote:
> > > On Sun, Oct 07, 2018 at 03:53:26PM +0000, Michael Matz wrote:
> > > > On Sun, 7 Oct 2018, Segher Boessenkool wrote:
> > > > > On Sun, Oct 07, 2018 at 11:18:06AM +0200, Borislav Petkov wrote:
> > > > > > Now, Richard suggested doing something like:
> > > > > >
> > > > > >  1) inline asm ("...")
> > > > >
> > > > > What would the semantics of this be?
> > > >
> > > > The size of the inline asm wouldn't be counted towards the inliner size
> > > > limits (or be counted as "1").
> > >
> > > That sounds like a good option.
> >
> > Yes, I also like it for simplicity.  It also avoids the requirement
> > of translating the number (in bytes?) given by the user to
> > "number of GIMPLE instructions" as needed by the inliner.
>
> This patch implements this, for C only so far.  And the syntax is
> "asm inline", which is more in line with other syntax.
>
> How does this look?

Looks good.  A few nits - you need to document this in extend.texi, the
tree flag use needs documenting in tree-core.h, and we need a testcase
(I'd suggest one that shows we inline a function with "large" asm inline
() even at -Os).

Oh, and I don't think we want C and C++ to diverge - so you need to
cook up C++ support as well.

Can kernel folks give this a second and third thought please so we
don't implement sth that in the end won't satisfy you guys?

Thanks for doing this,
Richard.

>
> Segher
>
>
>
> diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c
> index 1f173fc..94b1d41 100644
> --- a/gcc/c/c-parser.c
> +++ b/gcc/c/c-parser.c
> @@ -6287,7 +6287,7 @@ static tree
>  c_parser_asm_statement (c_parser *parser)
>  {
>    tree quals, str, outputs, inputs, clobbers, labels, ret;
> -  bool simple, is_goto;
> +  bool simple, is_goto, is_inline;
>    location_t asm_loc = c_parser_peek_token (parser)->location;
>    int section, nsections;
>  
> @@ -6318,6 +6318,13 @@ c_parser_asm_statement (c_parser *parser)
>        is_goto = true;
>      }
>  
> +  is_inline = false;
> +  if (c_parser_next_token_is_keyword (parser, RID_INLINE))
> +    {
> +      c_parser_consume_token (parser);
> +      is_inline = true;
> +    }
> +
>    /* ??? Follow the C++ parser rather than using the
>       lex_untranslated_string kludge.  */
>    parser->lex_untranslated_string = true;
> @@ -6393,7 +6400,8 @@ c_parser_asm_statement (c_parser *parser)
>      c_parser_skip_to_end_of_block_or_statement (parser);
>  
>    ret = build_asm_stmt (quals, build_asm_expr (asm_loc, str, outputs, inputs,
> -       clobbers, labels, simple));
> +       clobbers, labels, simple,
> +       is_inline));
>  
>   error:
>    parser->lex_untranslated_string = false;
> diff --git a/gcc/c/c-tree.h b/gcc/c/c-tree.h
> index 017c01c..f5629300 100644
> --- a/gcc/c/c-tree.h
> +++ b/gcc/c/c-tree.h
> @@ -677,7 +677,8 @@ extern tree build_compound_literal (location_t, tree, tree, bool,
>  extern void check_compound_literal_type (location_t, struct c_type_name *);
>  extern tree c_start_case (location_t, location_t, tree, bool);
>  extern void c_finish_case (tree, tree);
> -extern tree build_asm_expr (location_t, tree, tree, tree, tree, tree, bool);
> +extern tree build_asm_expr (location_t, tree, tree, tree, tree, tree, bool,
> +    bool);
>  extern tree build_asm_stmt (tree, tree);
>  extern int c_types_compatible_p (tree, tree);
>  extern tree c_begin_compound_stmt (bool);
> diff --git a/gcc/c/c-typeck.c b/gcc/c/c-typeck.c
> index 9d09b8d..e013100 100644
> --- a/gcc/c/c-typeck.c
> +++ b/gcc/c/c-typeck.c
> @@ -10064,7 +10064,7 @@ build_asm_stmt (tree cv_qualifier, tree args)
>     are subtly different.  We use a ASM_EXPR node to represent this.  */
>  tree
>  build_asm_expr (location_t loc, tree string, tree outputs, tree inputs,
> - tree clobbers, tree labels, bool simple)
> + tree clobbers, tree labels, bool simple, bool is_inline)
>  {
>    tree tail;
>    tree args;
> @@ -10182,6 +10182,7 @@ build_asm_expr (location_t loc, tree string, tree outputs, tree inputs,
>       as volatile.  */
>    ASM_INPUT_P (args) = simple;
>    ASM_VOLATILE_P (args) = (noutputs == 0);
> +  ASM_INLINE_P (args) = is_inline;
>  
>    return args;
>  }
> diff --git a/gcc/gimple-pretty-print.c b/gcc/gimple-pretty-print.c
> index 83e2273..1a00fa3 100644
> --- a/gcc/gimple-pretty-print.c
> +++ b/gcc/gimple-pretty-print.c
> @@ -2019,6 +2019,8 @@ dump_gimple_asm (pretty_printer *buffer, gasm *gs, int spc, dump_flags_t flags)
>        pp_string (buffer, "__asm__");
>        if (gimple_asm_volatile_p (gs))
>   pp_string (buffer, " __volatile__");
> +      if (gimple_asm_inline_p (gs))
> + pp_string (buffer, " __inline__");
>        if (gimple_asm_nlabels (gs))
>   pp_string (buffer, " goto");
>        pp_string (buffer, "(\"");
> diff --git a/gcc/gimple.h b/gcc/gimple.h
> index a5dda93..8a58e07 100644
> --- a/gcc/gimple.h
> +++ b/gcc/gimple.h
> @@ -137,6 +137,7 @@ enum gimple_rhs_class
>  enum gf_mask {
>      GF_ASM_INPUT = 1 << 0,
>      GF_ASM_VOLATILE = 1 << 1,
> +    GF_ASM_INLINE = 1 << 2,
>      GF_CALL_FROM_THUNK = 1 << 0,
>      GF_CALL_RETURN_SLOT_OPT = 1 << 1,
>      GF_CALL_TAILCALL = 1 << 2,
> @@ -3911,7 +3912,7 @@ gimple_asm_volatile_p (const gasm *asm_stmt)
>  }
>  
>  
> -/* If VOLATLE_P is true, mark asm statement ASM_STMT as volatile.  */
> +/* If VOLATILE_P is true, mark asm statement ASM_STMT as volatile.  */
>  
>  static inline void
>  gimple_asm_set_volatile (gasm *asm_stmt, bool volatile_p)
> @@ -3923,6 +3924,27 @@ gimple_asm_set_volatile (gasm *asm_stmt, bool volatile_p)
>  }
>  
>  
> +/* Return true ASM_STMT ASM_STMT is an asm statement marked inline.  */
> +
> +static inline bool
> +gimple_asm_inline_p (const gasm *asm_stmt)
> +{
> +  return (asm_stmt->subcode & GF_ASM_INLINE) != 0;
> +}
> +
> +
> +/* If INLINE_P is true, mark asm statement ASM_STMT as inline.  */
> +
> +static inline void
> +gimple_asm_set_inline (gasm *asm_stmt, bool inline_p)
> +{
> +  if (inline_p)
> +    asm_stmt->subcode |= GF_ASM_INLINE;
> +  else
> +    asm_stmt->subcode &= ~GF_ASM_INLINE;
> +}
> +
> +
>  /* If INPUT_P is true, mark asm ASM_STMT as an ASM_INPUT.  */
>  
>  static inline void
> diff --git a/gcc/gimplify.c b/gcc/gimplify.c
> index 509fc2f..10b80f2 100644
> --- a/gcc/gimplify.c
> +++ b/gcc/gimplify.c
> @@ -6315,6 +6315,7 @@ gimplify_asm_expr (tree *expr_p, gimple_seq *pre_p, gimple_seq *post_p)
>  
>        gimple_asm_set_volatile (stmt, ASM_VOLATILE_P (expr) || noutputs == 0);
>        gimple_asm_set_input (stmt, ASM_INPUT_P (expr));
> +      gimple_asm_set_inline (stmt, ASM_INLINE_P (expr));
>  
>        gimplify_seq_add_stmt (pre_p, stmt);
>      }
> diff --git a/gcc/ipa-icf-gimple.c b/gcc/ipa-icf-gimple.c
> index ba39ea3..5361139 100644
> --- a/gcc/ipa-icf-gimple.c
> +++ b/gcc/ipa-icf-gimple.c
> @@ -993,6 +993,9 @@ func_checker::compare_gimple_asm (const gasm *g1, const gasm *g2)
>    if (gimple_asm_input_p (g1) != gimple_asm_input_p (g2))
>      return false;
>  
> +  if (gimple_asm_inline_p (g1) != gimple_asm_inline_p (g2))
> +    return false;
> +
>    if (gimple_asm_ninputs (g1) != gimple_asm_ninputs (g2))
>      return false;
>  
> diff --git a/gcc/tree-inline.c b/gcc/tree-inline.c
> index 9134253..6b1d2ea 100644
> --- a/gcc/tree-inline.c
> +++ b/gcc/tree-inline.c
> @@ -4108,6 +4108,8 @@ estimate_num_insns (gimple *stmt, eni_weights *weights)
>     with very long asm statements.  */
>   if (count > 1000)
>    count = 1000;
> + if (gimple_asm_inline_p (as_a <gasm *> (stmt)))
> +  count = !!count;
>   return MAX (1, count);
>        }
>  
> diff --git a/gcc/tree.h b/gcc/tree.h
> index 2e45f9d..160b3a7 100644
> --- a/gcc/tree.h
> +++ b/gcc/tree.h
> @@ -1245,6 +1245,9 @@ extern tree maybe_wrap_with_location (tree, location_t);
>     ASM_OPERAND with no operands.  */
>  #define ASM_INPUT_P(NODE) (ASM_EXPR_CHECK (NODE)->base.static_flag)
>  #define ASM_VOLATILE_P(NODE) (ASM_EXPR_CHECK (NODE)->base.public_flag)
> +/* Nonzero if we want to consider this asm as minimum length and cost
> +   for inlining decisions.  */
> +#define ASM_INLINE_P(NODE) (ASM_EXPR_CHECK (NODE)->base.protected_flag)
>  
>  /* COND_EXPR accessors.  */
>  #define COND_EXPR_COND(NODE) (TREE_OPERAND (COND_EXPR_CHECK (NODE), 0))
>
>

--
Richard Biener <[hidden email]>
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
Reply | Threaded
Open this post in threaded view
|

Re: PROPOSAL: Extend inline asm syntax with size spec

Segher Boessenkool
On Wed, Oct 10, 2018 at 09:12:48AM +0200, Richard Biener wrote:

> On Tue, 9 Oct 2018, Segher Boessenkool wrote:
> > On Mon, Oct 08, 2018 at 11:07:46AM +0200, Richard Biener wrote:
> > > On Mon, 8 Oct 2018, Segher Boessenkool wrote:
> > > > On Sun, Oct 07, 2018 at 03:53:26PM +0000, Michael Matz wrote:
> > > > > On Sun, 7 Oct 2018, Segher Boessenkool wrote:
> > > > > > On Sun, Oct 07, 2018 at 11:18:06AM +0200, Borislav Petkov wrote:
> > > > > > > Now, Richard suggested doing something like:
> > > > > > >
> > > > > > >  1) inline asm ("...")
> > > > > >
> > > > > > What would the semantics of this be?
> > > > >
> > > > > The size of the inline asm wouldn't be counted towards the inliner size
> > > > > limits (or be counted as "1").
> > > >
> > > > That sounds like a good option.
> > >
> > > Yes, I also like it for simplicity.  It also avoids the requirement
> > > of translating the number (in bytes?) given by the user to
> > > "number of GIMPLE instructions" as needed by the inliner.
> >
> > This patch implements this, for C only so far.  And the syntax is
> > "asm inline", which is more in line with other syntax.
> >
> > How does this look?
>
> Looks good.  A few nits - you need to document this in extend.texi, the

Yup.

> tree flag use needs documenting in tree-core.h,

Ah yes.

> and we need a testcase
> (I'd suggest one that shows we inline a function with "large" asm inline
> () even at -Os).

I have one.  Oh, and I probably should do a comment at the one line of
code that isn't just bookkeeping ;-)

> Oh, and I don't think we want C and C++ to diverge - so you need to
> cook up C++ support as well.

Right, that's why I said "C only so far".

> Can kernel folks give this a second and third thought please so we
> don't implement sth that in the end won't satisfy you guys?

Or actually try it out and see if it has the desired effect!  Nothing
beats field trials.

I'll do the C++ thing today hopefully, and send things to gcc-patches@.


Segher
Reply | Threaded
Open this post in threaded view
|

Re: PROPOSAL: Extend inline asm syntax with size spec

Segher Boessenkool
In reply to this post by Richard Biener
On Wed, Oct 10, 2018 at 09:22:40AM +0200, Ingo Molnar wrote:
> * Richard Biener <[hidden email]> wrote:
> > Can kernel folks give this a second and third thought please so we
> > don't implement sth that in the end won't satisfy you guys?
>
> So this basically passes '0 size' to the inliner, which should be better
> than passing in the explicit size, as we'd inevitably get it wrong
> in cases.

The code immediately after this makes it size 1, even for things like
asm(""), I suppose this works better for the inliner.  But that's a detail
(and it might change); the description says "consider this asm as minimum
length and cost for inlining decisions", which works for either 0 or 1.

> I also like 'size 0' for the reason that we tend to write assembly code
> and mark it 'inline' if we really think it matters to performance,
> so making it more likely to be inlined when used within another inline
> function is a plus as well.

You can think of it as meaning "we want this asm inlined always", and then
whether that actually happens depends on if the function around it is
inlined or not.


Segher
Reply | Threaded
Open this post in threaded view
|

Re: PROPOSAL: Extend inline asm syntax with size spec

Borislav Petkov-2
On Wed, Oct 10, 2018 at 03:03:25AM -0500, Segher Boessenkool wrote:
> The code immediately after this makes it size 1, even for things like
> asm(""), I suppose this works better for the inliner.  But that's a detail
> (and it might change); the description says "consider this asm as minimum
> length and cost for inlining decisions", which works for either 0 or 1.

Thanks for implementing this, much appreciated. If you need people to
test stuff, lemme know.

> You can think of it as meaning "we want this asm inlined always", and then
> whether that actually happens depends on if the function around it is
> inlined or not.

My only concern is how we would catch the other extremity where the
inline asm grows too big and we end up inlining it everywhere and thus
getting fat. The 0day bot already builds tinyconfigs but we should be
looking at vmlinux size growth too.

Thx.

--
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.
12