[PATCH 2/6, OpenACC, libgomp] Async re-work, oacc-* parts

classic Classic list List threaded Threaded
11 messages Options
Reply | Threaded
Open this post in threaded view
|

[PATCH 2/6, OpenACC, libgomp] Async re-work, oacc-* parts

Chung-Lin Tang-5
Hi Thomas,
These are the OpenACC specific changes, mostly the re-implementation of async-related acc_* runtime
library API functions to use the new backend plugin interfaces, in a non-target specific way.

Thanks,
Chung-Lin


        * oacc-async.c (get_goacc_thread): New function.
        (get_goacc_thread_device): New function.
        (lookup_goacc_asyncqueue): New function.
        (get_goacc_asyncqueue): New function.
        (acc_async_test): Adjust code to use new async design.
        (acc_async_test_all): Likewise.
        (acc_wait): Likewise.
        (acc_wait_async): Likewise.
        (acc_wait_all): Likewise.
        (acc_wait_all_async): Likewise.
        (acc_get_default_async): New API function.
        (acc_set_default_async): Likewise.
        (goacc_async_unmap_tgt): New function.
        (goacc_async_copyout_unmap_vars): Likewise.
        (goacc_async_free): Likewise.
        (goacc_init_asyncqueues): Likewise.
        (goacc_fini_asyncqueues): Likewise.
        * oacc-cuda.c (acc_get_cuda_stream): Adjust code to use new async
        design.
        (acc_set_cuda_stream): Likewise.
        * oacc-host.c (host_openacc_exec): Adjust parameters, remove 'async'.
        (host_openacc_register_async_cleanup): Remove.
        (host_openacc_async_exec): New function.
        (host_openacc_async_test): Adjust parameters.
        (host_openacc_async_test_all): Remove.
        (host_openacc_async_wait): Remove.
        (host_openacc_async_wait_async): Remove.
        (host_openacc_async_wait_all): Remove.
        (host_openacc_async_wait_all_async): Remove.
        (host_openacc_async_set_async): Remove.
        (host_openacc_async_synchronize): New function.
        (host_openacc_async_serialize): New function.
        (host_openacc_async_host2dev): New function.
        (host_openacc_async_dev2host): New function.
        (host_openacc_async_queue_callback): New function.
        (host_openacc_async_construct): New function.
        (host_openacc_async_destruct): New function.
        (struct gomp_device_descr host_dispatch): Remove initialization of old
        interface, add intialization of new async sub-struct.
        * oacc-init.c (acc_shutdown_1): Adjust to use gomp_fini_device.
        (goacc_attach_host_thread_to_device): Remove old async code usage, add
        initialization of per-thread default_async.
        * oacc-int.h (struct goacc_thread): Add default_async field.
        (goacc_init_asyncqueues): New declaration.
        (goacc_fini_asyncqueues): Likewise.
        (goacc_async_copyout_unmap_vars): Likewise.
        (goacc_async_free): Likewise.
        (get_goacc_asyncqueue): Likewise.
        (lookup_goacc_asyncqueue): Likewise.

        * oacc-mem.c (memcpy_tofrom_device): Adjust code to use new async
        design.
        (present_create_copy): Likewise.
        (delete_copyout): Likewise.
        (update_dev_host): Likewise.
        (gomp_acc_insert_pointer): Add async parameter, adjust code to use new
        async design.
        (gomp_acc_remove_pointer): Adjust code to use new async design.
        * oacc-parallel.c (GOACC_parallel_keyed): Likewise.
        (GOACC_enter_exit_data): Likewise.
        (goacc_wait): Likewise.
        (GOACC_update): Likewise.
        * oacc-plugin.c (GOMP_PLUGIN_async_unmap_vars): Remove.


async-02.openacc-parts.patch (34K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH 2/6, OpenACC, libgomp] Async re-work, oacc-* parts

Thomas Schwinge-8
Hi Chung-Lin!

On Tue, 25 Sep 2018 21:10:47 +0800, Chung-Lin Tang <[hidden email]> wrote:
> These are the OpenACC specific changes, mostly the re-implementation of async-related acc_* runtime
> library API functions to use the new backend plugin interfaces, in a non-target specific way.

(The patch was missing the "libgomp/oacc-int.h" changes, but I had no
problem replicating these from openacc-gcc-8-branch.)


Does the following make sense?

commit f86b403dbe6ed17afa8d157ec4089ff169a63680
Author: Thomas Schwinge <[hidden email]>
Date:   Fri Dec 7 12:19:56 2018 +0100

    Don't create an asyncqueue just to then test/synchronize with it
---
 libgomp/oacc-async.c    | 12 ++++++++----
 libgomp/oacc-parallel.c |  4 +++-
 2 files changed, 11 insertions(+), 5 deletions(-)

diff --git libgomp/oacc-async.c libgomp/oacc-async.c
index 553082fe3d4a..c9b134ac3380 100644
--- libgomp/oacc-async.c
+++ libgomp/oacc-async.c
@@ -119,8 +119,11 @@ acc_async_test (int async)
   if (!thr || !thr->dev)
     gomp_fatal ("no device active");
 
-  goacc_aq aq = lookup_goacc_asyncqueue (thr, true, async);
-  return thr->dev->openacc.async.test_func (aq);
+  goacc_aq aq = lookup_goacc_asyncqueue (thr, false, async);
+  if (!aq)
+    return 1;
+  else
+    return thr->dev->openacc.async.test_func (aq);
 }
 
 int
@@ -148,8 +151,9 @@ acc_wait (int async)
 
   struct goacc_thread *thr = get_goacc_thread ();
 
-  goacc_aq aq = lookup_goacc_asyncqueue (thr, true, async);
-  thr->dev->openacc.async.synchronize_func (aq);
+  goacc_aq aq = lookup_goacc_asyncqueue (thr, false, async);
+  if (aq)
+    thr->dev->openacc.async.synchronize_func (aq);
 }
 
 /* acc_async_wait is an OpenACC 1.0 compatibility name for acc_wait.  */
diff --git libgomp/oacc-parallel.c libgomp/oacc-parallel.c
index 7b6a6e515018..0be48f98036f 100644
--- libgomp/oacc-parallel.c
+++ libgomp/oacc-parallel.c
@@ -491,7 +491,9 @@ goacc_wait (int async, int num_waits, va_list *ap)
     {
       int qid = va_arg (*ap, int);
       
-      goacc_aq aq = get_goacc_asyncqueue (qid);
+      goacc_aq aq = lookup_goacc_asyncqueue (thr, false, qid);
+      if (!aq)
+ continue;
       if (acc_dev->openacc.async.test_func (aq))
  continue;
       if (async == acc_async_sync)


Grüße
 Thomas
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH 2/6, OpenACC, libgomp] Async re-work, oacc-* parts

Chung-Lin Tang-2
On 2018/12/7 07:32 PM, Thomas Schwinge wrote:

> Hi Chung-Lin!
>
> On Tue, 25 Sep 2018 21:10:47 +0800, Chung-Lin Tang <[hidden email]> wrote:
>> These are the OpenACC specific changes, mostly the re-implementation of async-related acc_* runtime
>> library API functions to use the new backend plugin interfaces, in a non-target specific way.
>
> (The patch was missing the "libgomp/oacc-int.h" changes, but I had no
> problem replicating these from openacc-gcc-8-branch.)
>
>
> Does the following make sense?

I don't quite remember why I simply ensured asyncqueue creation here at the time,
maybe simply because it allowed simpler code at this level.

OTOH, the old logic is to GOMP_fatal upon such an unknown queue case, so maybe that's
the right thing to do (inside lookup_goacc_asyncqueue()), instead of silently allowing it to pass.

WDYT?

Chung-Lin


> commit f86b403dbe6ed17afa8d157ec4089ff169a63680
> Author: Thomas Schwinge <[hidden email]>
> Date:   Fri Dec 7 12:19:56 2018 +0100
>
>     Don't create an asyncqueue just to then test/synchronize with it
> ---
>  libgomp/oacc-async.c    | 12 ++++++++----
>  libgomp/oacc-parallel.c |  4 +++-
>  2 files changed, 11 insertions(+), 5 deletions(-)
>
> diff --git libgomp/oacc-async.c libgomp/oacc-async.c
> index 553082fe3d4a..c9b134ac3380 100644
> --- libgomp/oacc-async.c
> +++ libgomp/oacc-async.c
> @@ -119,8 +119,11 @@ acc_async_test (int async)
>    if (!thr || !thr->dev)
>      gomp_fatal ("no device active");
>  
> -  goacc_aq aq = lookup_goacc_asyncqueue (thr, true, async);
> -  return thr->dev->openacc.async.test_func (aq);
> +  goacc_aq aq = lookup_goacc_asyncqueue (thr, false, async);
> +  if (!aq)
> +    return 1;
> +  else
> +    return thr->dev->openacc.async.test_func (aq);
>  }
>  
>  int
> @@ -148,8 +151,9 @@ acc_wait (int async)
>  
>    struct goacc_thread *thr = get_goacc_thread ();
>  
> -  goacc_aq aq = lookup_goacc_asyncqueue (thr, true, async);
> -  thr->dev->openacc.async.synchronize_func (aq);
> +  goacc_aq aq = lookup_goacc_asyncqueue (thr, false, async);
> +  if (aq)
> +    thr->dev->openacc.async.synchronize_func (aq);
>  }
>  
>  /* acc_async_wait is an OpenACC 1.0 compatibility name for acc_wait.  */
> diff --git libgomp/oacc-parallel.c libgomp/oacc-parallel.c
> index 7b6a6e515018..0be48f98036f 100644
> --- libgomp/oacc-parallel.c
> +++ libgomp/oacc-parallel.c
> @@ -491,7 +491,9 @@ goacc_wait (int async, int num_waits, va_list *ap)
>      {
>        int qid = va_arg (*ap, int);
>        
> -      goacc_aq aq = get_goacc_asyncqueue (qid);
> +      goacc_aq aq = lookup_goacc_asyncqueue (thr, false, qid);
> +      if (!aq)
> + continue;
>        if (acc_dev->openacc.async.test_func (aq))
>   continue;
>        if (async == acc_async_sync)
>
>
> Grüße
>  Thomas
>
>

Reply | Threaded
Open this post in threaded view
|

Re: [PATCH 2/6, OpenACC, libgomp] Async re-work, oacc-* parts

Thomas Schwinge-8
Hi Chung-Lin!

On Fri, 7 Dec 2018 22:19:14 +0800, Chung-Lin Tang <[hidden email]> wrote:
> On 2018/12/7 07:32 PM, Thomas Schwinge wrote:
> > Does the following make sense?
>
> I don't quite remember why I simply ensured asyncqueue creation here at the time,
> maybe simply because it allowed simpler code at this level.

Well, I think it's just overhead we can avoid.  ;-)

> OTOH, the old logic is to GOMP_fatal upon such an unknown queue case, so maybe that's
> the right thing to do (inside lookup_goacc_asyncqueue()), instead of silently allowing it to pass.
>
> WDYT?

I argued and posted patches (or will post if not yet done) to make this
defined, valid behavior, <https://gcc.gnu.org/PR88407> "[OpenACC]
Correctly handle unseen async-arguments".  Please speak up soon if you
disagree.

Thus, I still propose that you include the following.

Please especially review the "libgomp/oacc-parallel.c:goacc_wait" change,
and confirm no corresponding "libgomp/oacc-parallel.c:GOACC_wait" change
to be done, because that code is structured differently.

commit c96c6607b77bdbf562f35209718d8b8c5705c603
Author: Thomas Schwinge <[hidden email]>
Date:   Fri Dec 7 12:19:56 2018 +0100

    into async re-work: don't create an asyncqueue just to then test/synchronize with it
---
 libgomp/oacc-async.c    | 12 ++++++++----
 libgomp/oacc-parallel.c |  4 +++-
 2 files changed, 11 insertions(+), 5 deletions(-)

diff --git libgomp/oacc-async.c libgomp/oacc-async.c
index 553082fe3d4a..c9b134ac3380 100644
--- libgomp/oacc-async.c
+++ libgomp/oacc-async.c
@@ -119,8 +119,11 @@ acc_async_test (int async)
   if (!thr || !thr->dev)
     gomp_fatal ("no device active");
 
-  goacc_aq aq = lookup_goacc_asyncqueue (thr, true, async);
-  return thr->dev->openacc.async.test_func (aq);
+  goacc_aq aq = lookup_goacc_asyncqueue (thr, false, async);
+  if (!aq)
+    return 1;
+  else
+    return thr->dev->openacc.async.test_func (aq);
 }
 
 int
@@ -148,8 +151,9 @@ acc_wait (int async)
 
   struct goacc_thread *thr = get_goacc_thread ();
 
-  goacc_aq aq = lookup_goacc_asyncqueue (thr, true, async);
-  thr->dev->openacc.async.synchronize_func (aq);
+  goacc_aq aq = lookup_goacc_asyncqueue (thr, false, async);
+  if (aq)
+    thr->dev->openacc.async.synchronize_func (aq);
 }
 
 /* acc_async_wait is an OpenACC 1.0 compatibility name for acc_wait.  */
diff --git libgomp/oacc-parallel.c libgomp/oacc-parallel.c
index 2815a10f0386..9519abeccc2c 100644
--- libgomp/oacc-parallel.c
+++ libgomp/oacc-parallel.c
@@ -493,7 +493,9 @@ goacc_wait (int async, int num_waits, va_list *ap)
     {
       int qid = va_arg (*ap, int);
       
-      goacc_aq aq = get_goacc_asyncqueue (qid);
+      goacc_aq aq = lookup_goacc_asyncqueue (thr, false, qid);
+      if (!aq)
+ continue;
       if (acc_dev->openacc.async.test_func (aq))
  continue;
       if (async == acc_async_sync)


Grüße
 Thomas
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH 2/6, OpenACC, libgomp] Async re-work, oacc-* parts

Thomas Schwinge-8
In reply to this post by Chung-Lin Tang-5
Hi Chung-Lin!

On Tue, 25 Sep 2018 21:10:47 +0800, Chung-Lin Tang <[hidden email]> wrote:
> --- a/libgomp/oacc-async.c
> +++ b/libgomp/oacc-async.c

> +attribute_hidden struct goacc_asyncqueue *
> +lookup_goacc_asyncqueue (struct goacc_thread *thr, bool create, int async)
> +{
> +  /* The special value acc_async_noval (-1) maps to the thread-specific
> +     default async stream.  */
> +  if (async == acc_async_noval)
> +    async = thr->default_async;
> +
> +  if (async == acc_async_sync)
> +    return NULL;
> +
> +  if (async < 0)
> +    gomp_fatal ("bad async %d", async);

To make this "resolve" part more obvious, that is, the translation from
the "async" argument to an "asyncqueue" array index:

> +  if (!create
> +      && (async >= dev->openacc.async.nasyncqueue
> +  || !dev->openacc.async.asyncqueue[async]))
> +    return NULL;
> +[...]

..., I propose adding a "async2id" function for that, and then rename all
"asyncqueue[async]" to "asyncqueue[id]".

And, this also restores the current trunk behavior, so that
"acc_async_noval" gets its own, separate "asyncqueue".

commit e0d10cd744906c031af536bbf523ed6607370bf7
Author: Thomas Schwinge <[hidden email]>
Date:   Wed Dec 12 15:22:29 2018 +0100

    into async re-work: libgomp/oacc-async.c:async2id
---
 libgomp/oacc-async.c | 58 +++++++++++++++++++++++++++++++++++-----------------
 1 file changed, 39 insertions(+), 19 deletions(-)

diff --git libgomp/oacc-async.c libgomp/oacc-async.c
index c9b134ac3380..b091ba2460ac 100644
--- libgomp/oacc-async.c
+++ libgomp/oacc-async.c
@@ -54,53 +54,73 @@ get_goacc_thread_device (void)
   return thr->dev;
 }
 
-attribute_hidden struct goacc_asyncqueue *
-lookup_goacc_asyncqueue (struct goacc_thread *thr, bool create, int async)
+/* Translate from an OpenACC async-argument to an internal asyncqueue ID, or -1
+   if no asyncqueue is to be used.  */
+
+static int
+async2id (int async)
 {
-  /* The special value acc_async_noval (-1) maps to the thread-specific
-     default async stream.  */
-  if (async == acc_async_noval)
-    async = 0; //TODO thr->default_async;
+  if (!async_valid_p (async))
+    gomp_fatal ("invalid async-argument: %d", async);
 
   if (async == acc_async_sync)
+    return -1;
+  else if (async == acc_async_noval)
+    return 0;
+  else if (async >= 0)
+    return 1 + async;
+  else
+    __builtin_unreachable ();
+}
+
+/* Return the asyncqueue to be used for OpenACC async-argument ASYNC.  This
+   might return NULL if no asyncqueue is to be used.  Otherwise, if CREATE,
+   create the asyncqueue if it doesn't exist yet.  */
+
+attribute_hidden struct goacc_asyncqueue *
+lookup_goacc_asyncqueue (struct goacc_thread *thr, bool create, int async)
+{
+  int id = async2id (async);
+  if (id < 0)
     return NULL;
 
-  if (async < 0)
-    gomp_fatal ("bad async %d", async);
-
   struct gomp_device_descr *dev = thr->dev;
 
   if (!create
-      && (async >= dev->openacc.async.nasyncqueue
-  || !dev->openacc.async.asyncqueue[async]))
+      && (id >= dev->openacc.async.nasyncqueue
+  || !dev->openacc.async.asyncqueue[id]))
     return NULL;
 
   gomp_mutex_lock (&dev->openacc.async.lock);
-  if (async >= dev->openacc.async.nasyncqueue)
+  if (id >= dev->openacc.async.nasyncqueue)
     {
-      int diff = async + 1 - dev->openacc.async.nasyncqueue;
+      int diff = id + 1 - dev->openacc.async.nasyncqueue;
       dev->openacc.async.asyncqueue
  = gomp_realloc (dev->openacc.async.asyncqueue,
- sizeof (goacc_aq) * (async + 1));
+ sizeof (goacc_aq) * (id + 1));
       memset (dev->openacc.async.asyncqueue + dev->openacc.async.nasyncqueue,
       0, sizeof (goacc_aq) * diff);
-      dev->openacc.async.nasyncqueue = async + 1;
+      dev->openacc.async.nasyncqueue = id + 1;
     }
 
-  if (!dev->openacc.async.asyncqueue[async])
+  if (!dev->openacc.async.asyncqueue[id])
     {
-      dev->openacc.async.asyncqueue[async] = dev->openacc.async.construct_func ();
+      dev->openacc.async.asyncqueue[id] = dev->openacc.async.construct_func ();
 
       /* Link new async queue into active list.  */
       goacc_aq_list n = gomp_malloc (sizeof (struct goacc_asyncqueue_list));
-      n->aq = dev->openacc.async.asyncqueue[async];
+      n->aq = dev->openacc.async.asyncqueue[id];
       n->next = dev->openacc.async.active;
       dev->openacc.async.active = n;
     }
   gomp_mutex_unlock (&dev->openacc.async.lock);
-  return dev->openacc.async.asyncqueue[async];
+  return dev->openacc.async.asyncqueue[id];
 }
 
+/* Return the asyncqueue to be used for OpenACC async-argument ASYNC.  This
+   might return NULL if no asyncqueue is to be used.  Otherwise, create the
+   asyncqueue if it doesn't exist yet.  */
+
 attribute_hidden struct goacc_asyncqueue *
 get_goacc_asyncqueue (int async)
 {


Grüße
 Thomas
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH 2/6, OpenACC, libgomp] Async re-work, oacc-* parts

Thomas Schwinge-8
In reply to this post by Chung-Lin Tang-5
Hi Chung-Lin!

On Tue, 25 Sep 2018 21:10:47 +0800, Chung-Lin Tang <[hidden email]> wrote:

>  void
>  acc_wait_async (int async1, int async2)
>  {
> +  struct goacc_thread *thr = get_goacc_thread ();
>  
> +  goacc_aq aq2 = lookup_goacc_asyncqueue (thr, true, async2);
> +  goacc_aq aq1 = lookup_goacc_asyncqueue (thr, false, async1);
> +  if (!aq1)
> +    gomp_fatal ("invalid async 1");
> +  if (aq1 == aq2)
> +    gomp_fatal ("identical parameters");
>  
> +  thr->dev->openacc.async.synchronize_func (aq1);
> +  thr->dev->openacc.async.serialize_func (aq1, aq2);
>  }

Invoked as "acc_wait_async ([...], acc_async_sync)" (as used in a test
case that I'll soon submit/commit), we'll end up with "aq2 == NULL", and
will segfault in the nvptx "openacc.async.serialize_func".

Good to fix as follows?

commit 448ff855bd954a72b5edb19fc1f3d481833fcb59
Author: Thomas Schwinge <[hidden email]>
Date:   Thu Dec 13 17:43:42 2018 +0100

    into async re-work: adjust for test case added in "[PR88484] OpenACC wait directive without wait argument but with async clause"
---
 libgomp/oacc-async.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git libgomp/oacc-async.c libgomp/oacc-async.c
index 7e61b5dc0a05..a38e42781aa0 100644
--- libgomp/oacc-async.c
+++ libgomp/oacc-async.c
@@ -196,7 +196,8 @@ acc_wait_async (int async1, int async2)
     gomp_fatal ("identical parameters");
 
   thr->dev->openacc.async.synchronize_func (aq1);
-  thr->dev->openacc.async.serialize_func (aq1, aq2);
+  if (aq2)
+    thr->dev->openacc.async.serialize_func (aq1, aq2);
 }
 
 void


Grüße
 Thomas
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH 2/6, OpenACC, libgomp] Async re-work, oacc-* parts

Chung-Lin Tang-5
On 2018/12/14 10:32 PM, Thomas Schwinge wrote:
> Invoked as "acc_wait_async ([...], acc_async_sync)" (as used in a test
> case that I'll soon submit/commit), we'll end up with "aq2 == NULL", and
> will segfault in the nvptx "openacc.async.serialize_func".

What does "wait async(acc_async_sync)" supposed to mean? Instead of fixing
it here, will it make more sense to have the serialize_func hook to accommodate
the NULL asyncqueue?

Chung-Lin

Reply | Threaded
Open this post in threaded view
|

Re: [PATCH 2/6, OpenACC, libgomp] Async re-work, oacc-* parts

Chung-Lin Tang-5
In reply to this post by Thomas Schwinge-8
On 2018/12/14 10:17 PM, Thomas Schwinge wrote:

> Hi Chung-Lin!
>
> On Tue, 25 Sep 2018 21:10:47 +0800, Chung-Lin Tang <[hidden email]> wrote:
>> --- a/libgomp/oacc-async.c
>> +++ b/libgomp/oacc-async.c
>
>> +attribute_hidden struct goacc_asyncqueue *
>> +lookup_goacc_asyncqueue (struct goacc_thread *thr, bool create, int async)
>> +{
>> +  /* The special value acc_async_noval (-1) maps to the thread-specific
>> +     default async stream.  */
>> +  if (async == acc_async_noval)
>> +    async = thr->default_async;
>> +
>> +  if (async == acc_async_sync)
>> +    return NULL;
>> +
>> +  if (async < 0)
>> +    gomp_fatal ("bad async %d", async);
>
> To make this "resolve" part more obvious, that is, the translation from
> the "async" argument to an "asyncqueue" array index:
>
>> +  if (!create
>> +      && (async >= dev->openacc.async.nasyncqueue
>> +  || !dev->openacc.async.asyncqueue[async]))
>> +    return NULL;
>> +[...]
>
> ..., I propose adding a "async2id" function for that, and then rename all
> "asyncqueue[async]" to "asyncqueue[id]".

I don't think this is needed. This is the only place in the entire runtime that
does asyncqueue indexing, adding more conceptual layers of re-directed indexing
seems unneeded.

I do think the more descriptive comments are nice though.

> And, this also restores the current trunk behavior, so that
> "acc_async_noval" gets its own, separate "asyncqueue".

Is there a reason we need to restore that behavior right now?

Thanks,
Chung-Lin

Reply | Threaded
Open this post in threaded view
|

Re: [PATCH 2/6, OpenACC, libgomp] Async re-work, oacc-* parts

Thomas Schwinge-8
In reply to this post by Chung-Lin Tang-5
Hi Chung-Lin!

On Tue, 25 Sep 2018 21:10:47 +0800, Chung-Lin Tang <[hidden email]> wrote:

> --- a/libgomp/oacc-parallel.c
> +++ b/libgomp/oacc-parallel.c
> @@ -377,8 +360,6 @@ GOACC_enter_exit_data (int device, size_t mapnum,
>   finalize = true;
>      }
>  
> -  acc_dev->openacc.async_set_async_func (async);
> -
>    /* Determine if this is an "acc enter data".  */
>    for (i = 0; i < mapnum; ++i)
>      {
> @@ -450,7 +431,7 @@ GOACC_enter_exit_data (int device, size_t mapnum,
>    else
>      {
>        gomp_acc_insert_pointer (pointer, &hostaddrs[i],
> -       &sizes[i], &kinds[i]);
> +       &sizes[i], &kinds[i], async);
>        /* Increment 'i' by two because OpenACC requires fortran
>   arrays to be contiguous, so each PSET is associated with
>   one of MAP_FORCE_ALLOC/MAP_FORCE_PRESET/MAP_FORCE_TO, and
> @@ -475,17 +456,17 @@ GOACC_enter_exit_data (int device, size_t mapnum,
>   if (acc_is_present (hostaddrs[i], sizes[i]))
>    {
>      if (finalize)
> -      acc_delete_finalize (hostaddrs[i], sizes[i]);
> +      acc_delete_finalize_async (hostaddrs[i], sizes[i], async);
>      else
> -      acc_delete (hostaddrs[i], sizes[i]);
> +      acc_delete_async (hostaddrs[i], sizes[i], async);
>    }
>   break;
>        case GOMP_MAP_FROM:
>        case GOMP_MAP_FORCE_FROM:
>   if (finalize)
> -  acc_copyout_finalize (hostaddrs[i], sizes[i]);
> +  acc_copyout_finalize_async (hostaddrs[i], sizes[i], async);
>   else
> -  acc_copyout (hostaddrs[i], sizes[i]);
> +  acc_copyout_async (hostaddrs[i], sizes[i], async);
>   break;
>        default:
>   gomp_fatal (">>>> GOACC_enter_exit_data UNHANDLED kind 0x%.2x",
> @@ -503,8 +484,6 @@ GOACC_enter_exit_data (int device, size_t mapnum,
>      i += pointer - 1;
>    }
>        }
> -
> -  acc_dev->openacc.async_set_async_func (acc_async_sync);
>  }

Additionally the following, or why not?  Please comment on the one TODO
which before your async re-work also was -- incorrectly? -- run
asynchronously?

commit 34c9ce65ad1f9865d0716d18c364d8c6928e694c
Author: Thomas Schwinge <[hidden email]>
Date:   Fri Dec 14 14:34:17 2018 +0100

    into async re-work: more async function usage
---
 libgomp/oacc-parallel.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git libgomp/oacc-parallel.c libgomp/oacc-parallel.c
index 5a441c9efe38..91875c57fc97 100644
--- libgomp/oacc-parallel.c
+++ libgomp/oacc-parallel.c
@@ -413,11 +413,11 @@ GOACC_enter_exit_data (int device, size_t mapnum,
  {
  case GOMP_MAP_ALLOC:
  case GOMP_MAP_FORCE_ALLOC:
-  acc_create (hostaddrs[i], sizes[i]);
+  acc_create_async (hostaddrs[i], sizes[i], async);
   break;
  case GOMP_MAP_TO:
  case GOMP_MAP_FORCE_TO:
-  acc_copyin (hostaddrs[i], sizes[i]);
+  acc_copyin_async (hostaddrs[i], sizes[i], async);
   break;
  default:
   gomp_fatal (">>>> GOACC_enter_exit_data UNHANDLED kind 0x%.2x",
@@ -563,6 +563,8 @@ GOACC_update (int device, size_t mapnum,
  the value of the allocated device memory in the
  previous pointer.  */
       *(uintptr_t *) hostaddrs[i] = (uintptr_t)dptr;
+      /* This is intentionally no calling acc_update_device_async,
+ because TODO.  */
       acc_update_device (hostaddrs[i], sizeof (uintptr_t));
 
       /* Restore the host pointer.  */


Grüße
 Thomas
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH 2/6, OpenACC, libgomp] Async re-work, oacc-* parts

Thomas Schwinge
In reply to this post by Chung-Lin Tang-5
Hi Chung-Lin!

On Tue, 25 Sep 2018 21:10:47 +0800, Chung-Lin Tang <[hidden email]> wrote:
> --- a/libgomp/oacc-async.c
> +++ b/libgomp/oacc-async.c

> +attribute_hidden struct goacc_asyncqueue *
> +lookup_goacc_asyncqueue (struct goacc_thread *thr, bool create, int async)
> +{
> +  /* The special value acc_async_noval (-1) maps to the thread-specific
> +     default async stream.  */
> +  if (async == acc_async_noval)
> +    async = thr->default_async;
> +
> +  if (async == acc_async_sync)
> +    return NULL;
> +
> +  if (async < 0)
> +    gomp_fatal ("bad async %d", async);
> +
> +  struct gomp_device_descr *dev = thr->dev;
> +
> +  if (!create
> +      && (async >= dev->openacc.async.nasyncqueue
> +  || !dev->openacc.async.asyncqueue[async]))
> +    return NULL;
> +

Doesn't this last block also have to be included in the lock you're
taking below?

> +  gomp_mutex_lock (&dev->openacc.async.lock);
> +  if (async >= dev->openacc.async.nasyncqueue)
> +    {
> +      int diff = async + 1 - dev->openacc.async.nasyncqueue;
> +      dev->openacc.async.asyncqueue
> + = gomp_realloc (dev->openacc.async.asyncqueue,
> + sizeof (goacc_aq) * (async + 1));
> +      memset (dev->openacc.async.asyncqueue + dev->openacc.async.nasyncqueue,
> +      0, sizeof (goacc_aq) * diff);
> +      dev->openacc.async.nasyncqueue = async + 1;
> +    }
> +
> +  if (!dev->openacc.async.asyncqueue[async])
> +    {
> +      dev->openacc.async.asyncqueue[async] = dev->openacc.async.construct_func ();
> +
> +      /* Link new async queue into active list.  */
> +      goacc_aq_list n = gomp_malloc (sizeof (struct goacc_asyncqueue_list));
> +      n->aq = dev->openacc.async.asyncqueue[async];
> +      n->next = dev->openacc.async.active;
> +      dev->openacc.async.active = n;
> +    }
> +  gomp_mutex_unlock (&dev->openacc.async.lock);
> +  return dev->openacc.async.asyncqueue[async];
> +}

And then, some more concerns, as encoded in the following patch (but
please also continue reading below):

commit d2d6aaeca840debbec14e421be705ef56d444ac7
Author: Thomas Schwinge <[hidden email]>
Date:   Wed Dec 12 15:57:30 2018 +0100

    into async re-work: locking concerns
---
 libgomp/oacc-async.c          | 18 +++++++++++++++---
 libgomp/plugin/plugin-nvptx.c |  6 ++++++
 2 files changed, 21 insertions(+), 3 deletions(-)

diff --git libgomp/oacc-async.c libgomp/oacc-async.c
index 89a405ebcdb1..68e4e65e8182 100644
--- libgomp/oacc-async.c
+++ libgomp/oacc-async.c
@@ -84,17 +84,21 @@ lookup_goacc_asyncqueue (struct goacc_thread *thr, bool create, int async)
   if (id < 0)
     return NULL;
 
+  struct goacc_asyncqueue *ret = NULL;
+
   struct gomp_device_descr *dev = thr->dev;
 
+  gomp_mutex_lock (&dev->openacc.async.lock);
+
   if (!create
       && (id >= dev->openacc.async.nasyncqueue
   || !dev->openacc.async.asyncqueue[id]))
-    return NULL;
+    goto out;
 
-  gomp_mutex_lock (&dev->openacc.async.lock);
   if (id >= dev->openacc.async.nasyncqueue)
     {
       int diff = id + 1 - dev->openacc.async.nasyncqueue;
+      // TODO gomp_realloc might call "gomp_fatal" with "&dev->openacc.async.lock" locked.  Might cause deadlock?
       dev->openacc.async.asyncqueue
  = gomp_realloc (dev->openacc.async.asyncqueue,
  sizeof (goacc_aq) * (id + 1));
@@ -105,16 +109,23 @@ lookup_goacc_asyncqueue (struct goacc_thread *thr, bool create, int async)
 
   if (!dev->openacc.async.asyncqueue[id])
     {
+      //TODO We have "&dev->openacc.async.lock" locked here, and if "openacc.async.construct_func" calls "GOMP_PLUGIN_fatal" (via "CUDA_CALL_ASSERT", for example), that might cause deadlock?
+      //TODO Change the interface to emit an error in the plugin, but then "return NULL", and we catch that here, unlock, and bail out?
       dev->openacc.async.asyncqueue[id] = dev->openacc.async.construct_func ();
 
       /* Link new async queue into active list.  */
+      // TODO gomp_malloc might call "gomp_fatal" with "&dev->openacc.async.lock" locked.  Might cause deadlock?
       goacc_aq_list n = gomp_malloc (sizeof (struct goacc_asyncqueue_list));
       n->aq = dev->openacc.async.asyncqueue[id];
       n->next = dev->openacc.async.active;
       dev->openacc.async.active = n;
     }
+  ret = dev->openacc.async.asyncqueue[id];
+
+ out:
   gomp_mutex_unlock (&dev->openacc.async.lock);
-  return dev->openacc.async.asyncqueue[id];
+
+  return ret;
 }
 
 /* Return the asyncqueue to be used for OpenACC async-argument ASYNC.  This
@@ -305,6 +316,7 @@ goacc_fini_asyncqueues (struct gomp_device_descr *devicep)
       goacc_aq_list next;
       for (goacc_aq_list l = devicep->openacc.async.active; l; l = next)
  {
+  //TODO Can/should/must we "synchronize" here (how?), so as to make sure that no other operation on this asyncqueue is going on while/after we've destructed it here?
   ret &= devicep->openacc.async.destruct_func (l->aq);
   next = l->next;
   free (l);
diff --git libgomp/plugin/plugin-nvptx.c libgomp/plugin/plugin-nvptx.c
index 577ed39ef3f6..872e91f05e78 100644
--- libgomp/plugin/plugin-nvptx.c
+++ libgomp/plugin/plugin-nvptx.c
@@ -1389,6 +1389,7 @@ GOMP_OFFLOAD_openacc_async_test (struct goacc_asyncqueue *aq)
   if (r == CUDA_ERROR_NOT_READY)
     return 0;
 
+  //TODO Is this safe to call, or might this cause deadlock if something's locked?
   GOMP_PLUGIN_error ("cuStreamQuery error: %s", cuda_error (r));
   return -1;
 }
@@ -1396,6 +1397,7 @@ GOMP_OFFLOAD_openacc_async_test (struct goacc_asyncqueue *aq)
 void
 GOMP_OFFLOAD_openacc_async_synchronize (struct goacc_asyncqueue *aq)
 {
+  //TODO Is this safe to call, or might this cause deadlock if something's locked?
   CUDA_CALL_ASSERT (cuStreamSynchronize, aq->cuda_stream);
 }
 
@@ -1404,6 +1406,7 @@ GOMP_OFFLOAD_openacc_async_serialize (struct goacc_asyncqueue *aq1,
       struct goacc_asyncqueue *aq2)
 {
   CUevent e;
+  //TODO Are these safe to call, or might this cause deadlock if something's locked?
   CUDA_CALL_ASSERT (cuEventCreate, &e, CU_EVENT_DISABLE_TIMING);
   CUDA_CALL_ASSERT (cuEventRecord, e, aq1->cuda_stream);
   CUDA_CALL_ASSERT (cuStreamWaitEvent, aq2->cuda_stream, e, 0);
@@ -1413,6 +1416,7 @@ static void
 cuda_callback_wrapper (CUstream stream, CUresult res, void *ptr)
 {
   if (res != CUDA_SUCCESS)
+    //TODO Is this safe to call, or might this cause deadlock if something's locked?
     GOMP_PLUGIN_fatal ("%s error: %s", __FUNCTION__, cuda_error (res));
   struct nvptx_callback *cb = (struct nvptx_callback *) ptr;
   cb->fn (cb->ptr);
@@ -1424,10 +1428,12 @@ GOMP_OFFLOAD_openacc_async_queue_callback (struct goacc_asyncqueue *aq,
    void (*callback_fn)(void *),
    void *userptr)
 {
+  //TODO Is this safe to call, or might this cause deadlock if something's locked?
   struct nvptx_callback *b = GOMP_PLUGIN_malloc (sizeof (*b));
   b->fn = callback_fn;
   b->ptr = userptr;
   b->aq = aq;
+  //TODO Is this safe to call, or might this cause deadlock if something's locked?
   CUDA_CALL_ASSERT (cuStreamAddCallback, aq->cuda_stream,
     cuda_callback_wrapper, (void *) b, 0);
 }


But then, I wonder if we couldn't skip all that locking, if we moved the
"asyncqueue"s from "acc_dispatch_t" into "goacc_thread"?

commit c9282e058f67cb8f8ca1720d7f9e3fe0c04b6c89
Author: Thomas Schwinge <[hidden email]>
Date:   Thu Dec 13 18:00:16 2018 +0100

    [TODO] into async re-work: move "asyncqueue"s from "acc_dispatch_t" into "goacc_thread"?
---
 libgomp/libgomp.h | 5 +++++
 1 file changed, 5 insertions(+)

diff --git libgomp/libgomp.h libgomp/libgomp.h
index 574fcd1ee4ad..09852589d2f1 100644
--- libgomp/libgomp.h
+++ libgomp/libgomp.h
@@ -949,6 +949,11 @@ typedef struct acc_dispatch_t
   __typeof (GOMP_OFFLOAD_openacc_exec) *exec_func;
 
   struct {
+    //TODO Why do these live in the "device" data structure, and not in the "per-thread" data structure?
+    //TODO Aren't they meant to be separate per thread?
+    //TODO That is, as far as I remember right now, OpenACC explicitly states that an asyncqueue doesn't entail any synchronization between different host threads.
+    //TODO Verify OpenACC.
+    //TODO With that moved into "goacc_thread", we could get rid of all the locking needed here?
     /* Once created and put into the "active" list, asyncqueues are then never
        destructed and removed from the "active" list, other than if the TODO
        device is shut down.  */

At this point, I will again (as in that other email) state that my
understanding of OpenACC is that an async queue does not entail any
inter-thread synchronization, so it would seem reasonable that all async
queues are separate per thread.


Grüße
 Thomas
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH 2/6, OpenACC, libgomp] Async re-work, oacc-* parts

Chung-Lin Tang-5
In reply to this post by Thomas Schwinge-8
On 2018/12/14 10:53 PM, Thomas Schwinge wrote:
> Additionally the following, or why not?  Please comment on the one TODO
> which before your async re-work also was -- incorrectly? -- run
> asynchronously?


> diff --git libgomp/oacc-parallel.c libgomp/oacc-parallel.c
> index 5a441c9efe38..91875c57fc97 100644
> --- libgomp/oacc-parallel.c
> +++ libgomp/oacc-parallel.c
> @@ -413,11 +413,11 @@ GOACC_enter_exit_data (int device, size_t mapnum,
>   {
>   case GOMP_MAP_ALLOC:
>   case GOMP_MAP_FORCE_ALLOC:
> -  acc_create (hostaddrs[i], sizes[i]);
> +  acc_create_async (hostaddrs[i], sizes[i], async);
>    break;
>   case GOMP_MAP_TO:
>   case GOMP_MAP_FORCE_TO:
> -  acc_copyin (hostaddrs[i], sizes[i]);
> +  acc_copyin_async (hostaddrs[i], sizes[i], async);
>    break;
>   default:

Yes! I think these were somehow missed by mistake. Thanks for catching!

>    gomp_fatal (">>>> GOACC_enter_exit_data UNHANDLED kind 0x%.2x",
> @@ -563,6 +563,8 @@ GOACC_update (int device, size_t mapnum,
>   the value of the allocated device memory in the
>   previous pointer.  */
>        *(uintptr_t *) hostaddrs[i] = (uintptr_t)dptr;
> +      /* This is intentionally no calling acc_update_device_async,
> + because TODO.  */
>        acc_update_device (hostaddrs[i], sizeof (uintptr_t));
>  
>        /* Restore the host pointer.  */

I don't remember adding this piece of comment, it might have been Cesar I guess?
I'm not sure if there's any real reason not to use acc_update_device_async here...
Change and test to see?

Thanks,
Chung-Lin