[Bug libstdc++/91719] New: gcc compiles seq_cst store on x86-64 differently from clang/icc

classic Classic list List threaded Threaded
14 messages Options
Reply | Threaded
Open this post in threaded view
|

[Bug libstdc++/91719] New: gcc compiles seq_cst store on x86-64 differently from clang/icc

jason at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91719

            Bug ID: 91719
           Summary: gcc compiles seq_cst store on x86-64 differently from
                    clang/icc
           Product: gcc
           Version: 9.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: libstdc++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: [hidden email]
  Target Milestone: ---

On x86-64 all versions of gcc compile a sequentially consistent atomic store as
'mov + mfence'. Both clang and icc on the other hand instead compile such a
store using only a 'xchg' instruction, which appears to be more efficient
according to my tests.

Example code:
------------------------------
#include <atomic>

std::atomic<int> G{0};

void store() {
  G.store(1);
}
------------------------------
Reply | Threaded
Open this post in threaded view
|

[Bug c++/91719] gcc compiles seq_cst store on x86-64 differently from clang/icc

jason at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91719

--- Comment #1 from Jonathan Wakely <redi at gcc dot gnu.org> ---
Surely this is a compiler bug not libstdc++ bug, since std::atomic just uses
the __atomic built-in function.
Reply | Threaded
Open this post in threaded view
|

[Bug target/91719] gcc compiles seq_cst store on x86-64 differently from clang/icc

jason at gcc dot gnu.org
In reply to this post by jason at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91719

Uroš Bizjak <ubizjak at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |ASSIGNED
   Last reconfirmed|                            |2019-09-10
           Assignee|unassigned at gcc dot gnu.org      |ubizjak at gmail dot com
   Target Milestone|---                         |10.0
     Ever confirmed|0                           |1

--- Comment #2 from Uroš Bizjak <ubizjak at gmail dot com> ---
Proposed patch:

--cut here--
diff --git a/gcc/config/i386/sync.md b/gcc/config/i386/sync.md
index ba146e3c8f81..b23dbf93bde7 100644
--- a/gcc/config/i386/sync.md
+++ b/gcc/config/i386/sync.md
@@ -306,8 +306,8 @@
     {
       operands[1] = force_reg (<MODE>mode, operands[1]);

-      /* For seq-cst stores, when we lack MFENCE, use XCHG.  */
-      if (is_mm_seq_cst (model) && !(TARGET_64BIT || TARGET_SSE2))
+      /* For seq-cst stores use XCHG.  */
+      if (is_mm_seq_cst (model))
        {
          emit_insn (gen_atomic_exchange<mode> (gen_reg_rtx (<MODE>mode),
                                                operands[0], operands[1],
--cut here--
Reply | Threaded
Open this post in threaded view
|

[Bug target/91719] gcc compiles seq_cst store on x86-64 differently from clang/icc

jason at gcc dot gnu.org
In reply to this post by jason at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91719

--- Comment #3 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
We do that if mfence isn't supported:
      /* For seq-cst stores, when we lack MFENCE, use XCHG.  */
      if (is_mm_seq_cst (model) && !(TARGET_64BIT || TARGET_SSE2))
        {
          emit_insn (gen_atomic_exchange<mode> (gen_reg_rtx (<MODE>mode),
                                                operands[0], operands[1],
                                                operands[2]));
          DONE;
        }

      /* Otherwise use a store.  */
      emit_insn (gen_atomic_store<mode>_1 (operands[0], operands[1],
                                           operands[2]));
    }
  /* ... followed by an MFENCE, if required.  */
  if (is_mm_seq_cst (model))
    emit_insn (gen_mem_thread_fence (operands[2]));

So, I guess we need to benchmark both and if xchg is beneficial on some CPUs,
use it there guarded by some tuning flag.
Reply | Threaded
Open this post in threaded view
|

[Bug target/91719] gcc compiles seq_cst store on x86-64 differently from clang/icc

jason at gcc dot gnu.org
In reply to this post by jason at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91719

--- Comment #4 from Uroš Bizjak <ubizjak at gmail dot com> ---
(In reply to Jakub Jelinek from comment #3)
> So, I guess we need to benchmark both and if xchg is beneficial on some
> CPUs, use it there guarded by some tuning flag.

Yes, my patch assumes that XCHG is beneficial on all CPUs, which may not be the
case.
Reply | Threaded
Open this post in threaded view
|

[Bug target/91719] gcc compiles seq_cst store on x86-64 differently from clang/icc

jason at gcc dot gnu.org
In reply to this post by jason at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91719

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rth at gcc dot gnu.org

--- Comment #5 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
This has been added in https://gcc.gnu.org/ml/gcc-patches/2011-11/msg00280.html
, CCing Richard just in case he remembers some details.
Reply | Threaded
Open this post in threaded view
|

[Bug target/91719] gcc compiles seq_cst store on x86-64 differently from clang/icc

jason at gcc dot gnu.org
In reply to this post by jason at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91719

--- Comment #6 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Tried:
unsigned int a;

int
main ()
{
  for (int i = 0; i < 100000000; i++)
    __atomic_store_n (&a, i, __ATOMIC_SEQ_CST);
  return 0;
}
and:
unsigned int a;

int
main ()
{
  for (int i = 0; i < 100000000; i++)
    __atomic_exchange_n (&a, i, __ATOMIC_SEQ_CST);
  return 0;
}
and got (microbenchmark, sure):
Intel(R) Core(TM) i9-7960X CPU @ 2.80GHz
user    0m1.045s
user    0m0.441s
Intel(R) Xeon(R) CPU E5-2697 v3 @ 2.60GHz
user    0m1.216s
user    0m0.529s
Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz
user    0m1.185s
user    0m0.627s
Intel Xeon E312xx (Sandy Bridge)
user    0m1.600s
user    0m0.846s
Quad-Core AMD Opteron(tm) Processor 8354
user    0m1.720s
user    0m0.724s
Reply | Threaded
Open this post in threaded view
|

[Bug target/91719] gcc compiles seq_cst store on x86-64 differently from clang/icc

jason at gcc dot gnu.org
In reply to this post by jason at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91719

--- Comment #7 from Uroš Bizjak <ubizjak at gmail dot com> ---
Created attachment 46861
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=46861&action=edit
Proposed patch that introduces use_xchg_for_atomic_store

The patch introduces use_xchg_for_atomic_store and enables it for
(m_INTEL | m_CORE_ALL). We need some confirmation from Intel and AMD, what
targets should have this tune enabled.
Reply | Threaded
Open this post in threaded view
|

[Bug target/91719] gcc compiles seq_cst store on x86-64 differently from clang/icc

jason at gcc dot gnu.org
In reply to this post by jason at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91719

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |vekumar at gcc dot gnu.org

--- Comment #8 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
CCing AMD too.
Reply | Threaded
Open this post in threaded view
|

[Bug target/91719] gcc compiles seq_cst store on x86-64 differently from clang/icc

jason at gcc dot gnu.org
In reply to this post by jason at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91719

--- Comment #9 from vekumar at gcc dot gnu.org ---
(In reply to Jakub Jelinek from comment #8)
> CCing AMD too.
Sure Let me check if this tuning helps AMD Zen Arch.
Reply | Threaded
Open this post in threaded view
|

[Bug target/91719] gcc compiles seq_cst store on x86-64 differently from clang/icc

jason at gcc dot gnu.org
In reply to this post by jason at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91719

--- Comment #10 from vekumar at gcc dot gnu.org ---
xchg is faster than mov+mfence on AMD Zen. We can add m_ZNVER1 | m_ZNVER2 to
the tuning.
Reply | Threaded
Open this post in threaded view
|

[Bug target/91719] gcc compiles seq_cst store on x86-64 differently from clang/icc

jason at gcc dot gnu.org
In reply to this post by jason at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91719

--- Comment #11 from uros at gcc dot gnu.org ---
Author: uros
Date: Mon Sep 16 18:37:28 2019
New Revision: 275754

URL: https://gcc.gnu.org/viewcvs?rev=275754&root=gcc&view=rev
Log:
        PR target/91719
        * config/i386/i386.h (TARGET_USE_XCHG_FOR_ATOMIC_STORE): New macro.
        * config/i386/x86-tune.def (X86_TUNE_USE_XCHG_FOR_ATOMIC_STORE): New.
        * config/i386/sync.md (atomic_store<mode>): emit XCHG for
        TARGET_USE_XCHG_FOR_ATOMIC_STORE.


Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/i386/i386.h
    trunk/gcc/config/i386/sync.md
    trunk/gcc/config/i386/x86-tune.def
Reply | Threaded
Open this post in threaded view
|

[Bug target/91719] gcc compiles seq_cst store on x86-64 differently from clang/icc

jason at gcc dot gnu.org
In reply to this post by jason at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91719

Uroš Bizjak <ubizjak at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |maxim.yegorushkin at gmail dot com

--- Comment #12 from Uroš Bizjak <ubizjak at gmail dot com> ---
*** Bug 90606 has been marked as a duplicate of this bug. ***
Reply | Threaded
Open this post in threaded view
|

[Bug target/91719] gcc compiles seq_cst store on x86-64 differently from clang/icc

jason at gcc dot gnu.org
In reply to this post by jason at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91719

Uroš Bizjak <ubizjak at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|---                         |FIXED

--- Comment #13 from Uroš Bizjak <ubizjak at gmail dot com> ---
Implemented for gcc-10.