[PATCH,openacc] check for compatible loop parallelism with acc routine calls

classic Classic list List threaded Threaded
12 messages Options
Reply | Threaded
Open this post in threaded view
|

[PATCH,openacc] check for compatible loop parallelism with acc routine calls

Cesar Philippidis
This patch addresses the following problems with acc routines:

 * incorrectly permitting 'acc seq' loops to call gang, worker and
   vector routines

 * lto-wrapper errors when a function or subroutine isn't marked as
   'acc routine'

The solution to the first problem is straightforward. It only required a
small change to oacc_loop_fixed_partitions. The solution to the second
problem is more involved, since it required changes to the fortran FE,
gimplifier, the behavior of flag_generate_offload, and libgomp.

Starting with the the fortran changes, this patch updates the way that
the fortran FE handles the 'acc routine' attribute in modules. Before,
it only recorded that a function was marked as an acc routine. With this
patch, it now records the level of parallelism the routine has. This is
necessary for the middle end to validate compatible parallelism between
the loop calling the routine and the routine itself.

The second set of changes involves teaching the gimplifier to error when
it detects a function call to an non-acc routines inside an OpenACC
offloaded region. Actually, I relaxed non-acc routines by excluding
calls to builtin functions, including those prefixed with _gfortran_.
Nvptx does have a newlib c library, and it also has a subset of
libgfortran. Still, this solution is probably not optimal.

Next, I had to modify the openacc header files in libgomp to mark
acc_on_device as an acc routine. Unfortunately, this meant that I had to
build the opeancc.mod module for gfortran with -fopenacc. But doing
that, caused caused gcc to stream offloaded code to the openacc.o object
file. So, I've updated the behavior of flag_generate_offload such that
minus one indicates that the user specified -foffload=disable, and that
will prevent gcc from streaming offloaded lto code. The alternative was
to hack libtool to build libgomp with -foffload=disable.

Is this patch OK for trunk?

There are still a couple of other quirks with routines we'll need to
address with a follow up patch. Namely, passing scalar dummy arguments
causes to subroutines trips up the nvptx worker and vector state
propagator if the actual argument is a local variable. That's because
the nvptx state propagator only forwards the pointer to the worker and
vector threads, and not the actual variable itself. Consequently, those
pointers dereference garbage. This is a problem with pass-by-reference
in general.

Cesar


acc-subroutines-20160615.diff (24K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH,openacc] check for compatible loop parallelism with acc routine calls

Jakub Jelinek
On Wed, Jun 15, 2016 at 08:12:15PM -0700, Cesar Philippidis wrote:
> The second set of changes involves teaching the gimplifier to error when
> it detects a function call to an non-acc routines inside an OpenACC
> offloaded region. Actually, I relaxed non-acc routines by excluding
> calls to builtin functions, including those prefixed with _gfortran_.
> Nvptx does have a newlib c library, and it also has a subset of
> libgfortran. Still, this solution is probably not optimal.

I don't really like that, hardcoding prefixes or whatever is available
(you have quite some subset of libc, libm etc. available too) in the
compiler looks very hackish.  What is wrong with complaining during
linking of the offloaded code?

> Next, I had to modify the openacc header files in libgomp to mark
> acc_on_device as an acc routine. Unfortunately, this meant that I had to
> build the opeancc.mod module for gfortran with -fopenacc. But doing
> that, caused caused gcc to stream offloaded code to the openacc.o object
> file. So, I've updated the behavior of flag_generate_offload such that
> minus one indicates that the user specified -foffload=disable, and that
> will prevent gcc from streaming offloaded lto code. The alternative was
> to hack libtool to build libgomp with -foffload=disable.

This also looks wrong.  I'd say the right thing is when loading modules
that have OpenACC bits set in it (and also OpenMP bits, I admit I haven't
handled this well) into CU with the corresponding flags unset (-fopenacc,
-fopenmp, -fopenmp-simd here, depending on which bit it is), then
IMHO the module loading code should just ignore it, pretend it wasn't there.
Similarly e.g. to how lto1 with -g0 should ignore debug statements that
could be in the LTO inputs.

        Jakub
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH,openacc] check for compatible loop parallelism with acc routine calls

Cesar Philippidis
On 06/17/2016 07:42 AM, Jakub Jelinek wrote:

> On Wed, Jun 15, 2016 at 08:12:15PM -0700, Cesar Philippidis wrote:
>> The second set of changes involves teaching the gimplifier to error when
>> it detects a function call to an non-acc routines inside an OpenACC
>> offloaded region. Actually, I relaxed non-acc routines by excluding
>> calls to builtin functions, including those prefixed with _gfortran_.
>> Nvptx does have a newlib c library, and it also has a subset of
>> libgfortran. Still, this solution is probably not optimal.
>
> I don't really like that, hardcoding prefixes or whatever is available
> (you have quite some subset of libc, libm etc. available too) in the
> compiler looks very hackish.  What is wrong with complaining during
> linking of the offloaded code?
Wouldn't the error get reported multiple times then, i.e. once per
target? Then again, maybe this error could have been restrained to the
host compiler.

Anyway, this patch now reduces that error to a warning. Furthermore,
that warning is being thrown in lower_omp_1 instead of
gimplify_call_expr because the latter is called multiple times and that
causes duplicate warnings. The only bit of fallout I had with this
change was with the fortran FE's usage of BUILT_IN_EXPECT in
gfc_{un}likely. Since these are generated implicitly by the FE, I just
added an oacc_function attribute to those calls when flag_openacc is set.

>> Next, I had to modify the openacc header files in libgomp to mark
>> acc_on_device as an acc routine. Unfortunately, this meant that I had to
>> build the opeancc.mod module for gfortran with -fopenacc. But doing
>> that, caused caused gcc to stream offloaded code to the openacc.o object
>> file. So, I've updated the behavior of flag_generate_offload such that
>> minus one indicates that the user specified -foffload=disable, and that
>> will prevent gcc from streaming offloaded lto code. The alternative was
>> to hack libtool to build libgomp with -foffload=disable.
>
> This also looks wrong.  I'd say the right thing is when loading modules
> that have OpenACC bits set in it (and also OpenMP bits, I admit I haven't
> handled this well) into CU with the corresponding flags unset (-fopenacc,
> -fopenmp, -fopenmp-simd here, depending on which bit it is), then
> IMHO the module loading code should just ignore it, pretend it wasn't there.
> Similarly e.g. to how lto1 with -g0 should ignore debug statements that
> could be in the LTO inputs.
This required two changes. First, I had to teach lto-cgraph.c how to
report an error rather then fail an assert when partitions are missing
decls. Second, I taught the lto wrapper how to stream offloaded code on
the absence of -fopen*. The only kink with this approach is that I had
to build libgomp/openacc.f90 with -frandom-seed=1 to prevent lto related
bootstrap failures.

By the way, Thomas, I've added

 #pragma acc routine(__builtin_acc_on_device) seq

to openacc.h. Is this OK, or should I just modify the various
libgomp.oacc-c-c++-common/loop* tests to use that pragma directly? Or
another option is to have the compiler add that attribute directly. I
don't think we're really expecting the end user to use
__builtin_acc_on_device directly since this is a gcc-ism.

Cesar

acc-subroutines-20160623.diff (52K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH,openacc] check for compatible loop parallelism with acc routine calls

Thomas Schwinge-8
Hi!

Cesar, I have not yet fully digested this, but do I understand right that
you're really fixing two issues here, that are related (OpenACC routines)
but still can be addressed independently of each other?  Do I understand
right that the first one, the "problems with acc routines [...]
incorrectly permitting 'acc seq' loops to call gang, worker and vector
routines" is just a Fortran front end patch?  If yes, please split that
one out, so as to reduce the volume of remaining changes that remain to
be discussed.

On Thu, 23 Jun 2016 09:05:38 -0700, Cesar Philippidis <[hidden email]> wrote:
> On 06/17/2016 07:42 AM, Jakub Jelinek wrote:
> > On Wed, Jun 15, 2016 at 08:12:15PM -0700, Cesar Philippidis wrote:
> >> The second set of changes involves teaching the gimplifier to error when
> >> it detects a function call to an non-acc routines inside an OpenACC
> >> offloaded region.

As I understand, that's the same problem as has been discussed before
(Ilya CCed), and has recently again been filed in
<https://gcc.gnu.org/PR71499> "ICE in LTO1 when attempting NVPTX
offloading (-fopenacc)", and <https://gcc.gnu.org/PR71535> "ICE in LTO1
with -fopenmp offloading" (Alexander CCed).  Some earlier discussion
threads include:
<http://news.gmane.org/find-root.php?message_id=%3C20150109145702.GA45210%40msticlxl57.ims.intel.com%3E>,
<http://news.gmane.org/find-root.php?message_id=%3C20150724152119.GA41292%40msticlxl57.ims.intel.com%3E>,
<http://news.gmane.org/find-root.php?message_id=%3C56269C05.6030502%40acm.org%3E>.

> >> Actually, I relaxed non-acc routines by excluding
> >> calls to builtin functions, including those prefixed with _gfortran_.
> >> Nvptx does have a newlib c library, and it also has a subset of
> >> libgfortran. Still, this solution is probably not optimal.
> >
> > I don't really like that, hardcoding prefixes or whatever is available
> > (you have quite some subset of libc, libm etc. available too) in the
> > compiler looks very hackish.  What is wrong with complaining during
> > linking of the offloaded code?

ACK.  Jakub, do I understand you correctly, that you basically say that
every function declaration that is in scope inside offloaded regions (for
example, GCC builtin functions, or standard library functions declared in
target compiler's header files) is permitted to be called in offloaded
regions, and the offloading compiler will then either be able to resolve
these (nvptx back end knows about trigonometric functions, for example,
and a lot of functions are available in the nvptx libc), or otherwise
error out during the offloading compilation (during linking), gracefully
without terminating the target compilation (that "gracefully" bit is
currently missing -- that's for another day).  That is, all such
functions are implicitly callable as OpenACC "seq" functions (which means
that they don't internally use gang/worker/vector parallelism).  In
particular, all these functions do *not* need to be marked with an
explicit "#pragma acc routine seq" directive.  (Functions internally
using gang/worker/vector parallelism will need to be marked
appropriately, using a "#pragma acc routine gang/worker/vector"
directive.)  That's how I understand your comment above, and your earlier
comments on this topic, and also is what I think should be done.

> Wouldn't the error get reported multiple times then, i.e. once per
> target? Then again, maybe this error could have been restrained to the
> host compiler.

That's not something I would care about right now.  :-)

> Anyway, this patch now reduces that error to a warning. Furthermore,
> that warning is being thrown in lower_omp_1 instead of
> gimplify_call_expr because the latter is called multiple times and that
> causes duplicate warnings. The only bit of fallout I had with this
> change was with the fortran FE's usage of BUILT_IN_EXPECT in
> gfc_{un}likely. Since these are generated implicitly by the FE, I just
> added an oacc_function attribute to those calls when flag_openacc is set.
>
> >> Next, I had to modify the openacc header files in libgomp to mark
> >> acc_on_device as an acc routine. Unfortunately, this meant that I had to
> >> build the opeancc.mod module for gfortran with -fopenacc. But doing
> >> that, caused caused gcc to stream offloaded code to the openacc.o object
> >> file. So, I've updated the behavior of flag_generate_offload such that
> >> minus one indicates that the user specified -foffload=disable, and that
> >> will prevent gcc from streaming offloaded lto code. The alternative was
> >> to hack libtool to build libgomp with -foffload=disable.
> >
> > This also looks wrong.  I'd say the right thing is when loading modules
> > that have OpenACC bits set in it (and also OpenMP bits, I admit I haven't
> > handled this well) into CU with the corresponding flags unset (-fopenacc,
> > -fopenmp, -fopenmp-simd here, depending on which bit it is), then
> > IMHO the module loading code should just ignore it, pretend it wasn't there.
> > Similarly e.g. to how lto1 with -g0 should ignore debug statements that
> > could be in the LTO inputs.
(Also a task for another day, in my opinion.)

> This required two changes. First, I had to teach lto-cgraph.c how to
> report an error rather then fail an assert when partitions are missing
> decls.

Something like that may make sense (conceptually).

> Second, I taught the lto wrapper how to stream offloaded code on
> the absence of -fopen*. The only kink with this approach is that I had
> to build libgomp/openacc.f90 with -frandom-seed=1 to prevent lto related
> bootstrap failures.

Uh.  Hopefully we're not going to need something like that.

> By the way, Thomas, I've added
>
>  #pragma acc routine(__builtin_acc_on_device) seq
>
> to openacc.h. Is this OK, or should I just modify the various
> libgomp.oacc-c-c++-common/loop* tests to use that pragma directly? Or
> another option is to have the compiler add that attribute directly. I
> don't think we're really expecting the end user to use
> __builtin_acc_on_device directly since this is a gcc-ism.

As per my reasoning above, all that should not be needed.


A few random comments on the patch:

> --- a/gcc/fortran/gfortran.h
> +++ b/gcc/fortran/gfortran.h
> @@ -303,6 +303,15 @@ enum save_state
>  { SAVE_NONE = 0, SAVE_EXPLICIT, SAVE_IMPLICIT
>  };
>  
> +/* Flags to keep track of ACC routine states.  */
> +enum oacc_function
> +{ OACC_FUNCTION_NONE = 0,
> +  OACC_FUNCTION_SEQ,
> +  OACC_FUNCTION_GANG,
> +  OACC_FUNCTION_WORKER,
> +  OACC_FUNCTION_VECTOR
> +};
What's the purpose of OACC_FUNCTION_NONE?  It's not used anywhere, as far
as I can tell?

> --- a/gcc/fortran/openmp.c
> +++ b/gcc/fortran/openmp.c
> @@ -1664,21 +1664,31 @@ gfc_match_oacc_cache (void)
>  
>  /* Determine the loop level for a routine.   */
>  
> -static int
> +static oacc_function
>  gfc_oacc_routine_dims (gfc_omp_clauses *clauses)
>  {
>    int level = -1;
> +  oacc_function ret = OACC_FUNCTION_SEQ;
>  
>    if (clauses)
>      {
>        unsigned mask = 0;
>  
>        if (clauses->gang)
> - level = GOMP_DIM_GANG, mask |= GOMP_DIM_MASK (level);
> + {
> +  level = GOMP_DIM_GANG, mask |= GOMP_DIM_MASK (level);
> +  ret = OACC_FUNCTION_GANG;
> + }
>        if (clauses->worker)
> - level = GOMP_DIM_WORKER, mask |= GOMP_DIM_MASK (level);
> + {
> +  level = GOMP_DIM_WORKER, mask |= GOMP_DIM_MASK (level);
> +  ret = OACC_FUNCTION_WORKER;
> + }
>        if (clauses->vector)
> - level = GOMP_DIM_VECTOR, mask |= GOMP_DIM_MASK (level);
> + {
> +  level = GOMP_DIM_VECTOR, mask |= GOMP_DIM_MASK (level);
> +  ret = OACC_FUNCTION_VECTOR;
> + }
>        if (clauses->seq)
>   level = GOMP_DIM_MAX, mask |= GOMP_DIM_MASK (level);
>  
I have not looked in detail, so maybe I'm misunderstanding what is being
done here -- but how do "clauses->seq" and "level = GOMP_DIM_MAX" fit
together?

> @@ -1689,7 +1699,7 @@ gfc_oacc_routine_dims (gfc_omp_clauses *clauses)
>    if (level < 0)
>      level = GOMP_DIM_MAX;
>  
> -  return level;
> +  return ret;
>  }

Just from that last hunk, it seems that the assignment to "level" is a
dead store?

> --- a/gcc/fortran/trans-decl.c
> +++ b/gcc/fortran/trans-decl.c
> @@ -1308,30 +1308,34 @@ gfc_add_assign_aux_vars (gfc_symbol * sym)
>  }
>  
>  
> -static tree
> -add_attributes_to_decl (symbol_attribute sym_attr, tree list)
> +tree
> +add_omp_offloading_attributes (unsigned omp_declare_target,
> +       enum oacc_function acc_routine, tree list)
>  {
> -  unsigned id;
> -  tree attr;
> -
> -  for (id = 0; id < EXT_ATTR_NUM; id++)
> -    if (sym_attr.ext_attr & (1 << id))
> -      {
> - attr = build_tree_list (
> - get_identifier (ext_attr_list[id].middle_end_name),
> - NULL_TREE);
> - list = chainon (list, attr);
> -      }
> -
> -  if (sym_attr.omp_declare_target)
> +  if (omp_declare_target)
>      list = tree_cons (get_identifier ("omp declare target"),
>        NULL_TREE, list);
>  
> -  if (sym_attr.oacc_function)
> +  if (acc_routine)
>      {
>        tree dims = NULL_TREE;
>        int ix;
> -      int level = sym_attr.oacc_function - 1;
> +      int level = GOMP_DIM_MAX;
> +
> +      switch (acc_routine)
> + {
> + case OACC_FUNCTION_GANG:
> +  level = GOMP_DIM_GANG;
> +  break;
> + case OACC_FUNCTION_WORKER:
> +  level = GOMP_DIM_WORKER;
> +  break;
> + case OACC_FUNCTION_VECTOR:
> +  level = GOMP_DIM_VECTOR;
> +  break;
> + case OACC_FUNCTION_SEQ:
> + default:;
> + }
>  
>        for (ix = GOMP_DIM_MAX; ix--;)
>   dims = tree_cons (build_int_cst (boolean_type_node, ix >= level),
> @@ -1344,6 +1348,27 @@ add_attributes_to_decl (symbol_attribute sym_attr, tree list)
>    return list;
>  }
>  
> +static tree
> +add_attributes_to_decl (symbol_attribute sym_attr, tree list)
> +{
> +  unsigned id;
> +  tree attr;
> +
> +  for (id = 0; id < EXT_ATTR_NUM; id++)
> +    if (sym_attr.ext_attr & (1 << id))
> +      {
> + attr = build_tree_list (
> + get_identifier (ext_attr_list[id].middle_end_name),
> + NULL_TREE);
> + list = chainon (list, attr);
> +      }
> +
> +  list = add_omp_offloading_attributes (sym_attr.omp_declare_target,
> + sym_attr.oacc_function, list);
> +
> +  return list;
> +}
Something that I had noticed before, possibly related here: code in
gcc/fortran/ does never call replace_oacc_fn_attrib, but the C and C++
front ends do.  Is that function what you've re-implemented here?

> --- a/gcc/lto-cgraph.c
> +++ b/gcc/lto-cgraph.c
> @@ -1201,9 +1201,11 @@ input_overwrite_node (struct lto_file_decl_data *file_data,
>       LDPR_NUM_KNOWN);
>    node->instrumentation_clone = bp_unpack_value (bp, 1);
>    node->split_part = bp_unpack_value (bp, 1);
> -  gcc_assert (flag_ltrans
> -      || (!node->in_other_partition
> -  && !node->used_from_other_partition));
> +
> +  int success = flag_ltrans || (!node->in_other_partition
> + && !node->used_from_other_partition);
> +  if (!success)
> +    error ("Missing %<%s%>", node->name ());
>  }
>  
>  /* Return string alias is alias of.  */
> @@ -1416,9 +1418,11 @@ input_varpool_node (struct lto_file_decl_data *file_data,
>      node->set_section_for_node (section);
>    node->resolution = streamer_read_enum (ib, ld_plugin_symbol_resolution,
>          LDPR_NUM_KNOWN);
> -  gcc_assert (flag_ltrans
> -      || (!node->in_other_partition
> -  && !node->used_from_other_partition));
> +
> +  int success = flag_ltrans || (!node->in_other_partition
> + && !node->used_from_other_partition);
> +  if (!success)
> +    error ("Missing %<%s%>", node->name ());
>  
>    return node;
>  }
That looks similar to what I remember from earlier, simiar patches, as
referenced above.

> --- a/gcc/omp-low.c
> +++ b/gcc/omp-low.c
> @@ -17114,6 +17114,28 @@ lower_omp_1 (gimple_stmt_iterator *gsi_p, omp_context *ctx)
>    default:
>      break;
>    }
> +      /* Warn if a non-'acc routine' function is called from an OpenACC
> + offloaded region.  */
> +      if (fndecl)
> + {
> +  omp_context *octx = ctx;
> +  bool is_oacc_offloaded = false;
> +
> +  /* Check if the current function is an 'acc routine'.  */
> +  if (get_oacc_fn_attrib (current_function_decl) != NULL_TREE)
> +    is_oacc_offloaded = true;
> +
> +  while (!is_oacc_offloaded && octx)
> +    {
> +      if (is_oacc_parallel (octx) || is_oacc_kernels (octx))
> + is_oacc_offloaded = true;
> +      octx = octx->outer;
> +    }
> +
> +  if (is_oacc_offloaded && get_oacc_fn_attrib (fndecl) == NULL_TREE)
> +    warning_at (gimple_location (call_stmt), 0,
> + "%qE is not an %<acc routine%>", fndecl);
> + }
>        /* FALLTHRU */
>      default:
>        if ((ctx || task_shared_vars)
Per my reasoning above, we should either get a undeclared symbol error
(if the target compiler doesn't know about the routine), or should get a
offloading compiler link-time error, if the -- implicit "seq" -- routine
is missing there.

> @@ -19420,7 +19442,8 @@ oacc_loop_fixed_partitions (oacc_loop *loop, unsigned outer_mask)
>      {
>        unsigned outermost = this_mask & -this_mask;
>  
> -      if (outermost && outermost <= outer_mask)
> +      if ((outermost && outermost <= outer_mask)
> +  || (this_mask && (loop->parent->flags & OLF_SEQ)))
>   {
>    if (noisy)
>      {

> --- a/gcc/testsuite/c-c++-common/goacc/routine-3.c
> +++ b/gcc/testsuite/c-c++-common/goacc/routine-3.c
> @@ -49,7 +49,7 @@ main ()
>    int red = 0;
>  #pragma acc parallel copy (red)
>    {
> -    /* Independent/seq loop tests.  */
> +    /* Independent loop tests.  */
>  #pragma acc loop reduction (+:red) // { dg-warning "insufficient partitioning" }
>      for (int i = 0; i < 10; i++)
>        red += gang ();
> @@ -62,6 +62,19 @@ main ()
>      for (int i = 0; i < 10; i++)
>        red += vector ();
>  
> +    /* Seq loop tests.  */
> +#pragma acc loop seq reduction (+:red)  /* { dg-message "containing loop" } */
> +    for (int i = 0; i < 10; i++)
> +      red += gang (); /* { dg-error "incorrectly nested" } */
> +
> +#pragma acc loop seq reduction (+:red)  /* { dg-message "containing loop" } */
> +    for (int i = 0; i < 10; i++)
> +      red += worker (); /* { dg-error "incorrectly nested" } */
> +
> +#pragma acc loop seq reduction (+:red)  /* { dg-message "containing loop" } */
> +    for (int i = 0; i < 10; i++)
> +      red += vector (); /* { dg-error "incorrectly nested" } */
> +    
>      /* Gang routine tests.  */
>  #pragma acc loop gang reduction (+:red)  /* { dg-message "containing loop" } */
>      for (int i = 0; i < 10; i++)
Do these test case changes actually relate to any of the compiler changes
discussed above?  Maybe to the oacc_loop_fixed_partitions cited just
above?  Is that a separate issue to fix?  Eh, or is that actually the fix
for your first issue, the "problems with acc routines [...] incorrectly
permitting 'acc seq' loops to call gang, worker and vector routines"?

> --- a/libgomp/testsuite/libgomp.oacc-c-c++-common/data-clauses-kernels-ipa-pta.c
> +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/data-clauses-kernels-ipa-pta.c
> @@ -1,4 +1,4 @@
>  /* { dg-do run { target lto } } */
> -/* { dg-additional-options "-fipa-pta -flto -flto-partition=max" } */
> +/* { dg-additional-options "-fipa-pta -flto -flto-partition=max -fno-exceptions" } */
>  
>  #include "data-clauses-kernels.c"

> --- a/libgomp/testsuite/libgomp.oacc-c-c++-common/data-clauses-kernels.c
> +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/data-clauses-kernels.c
> @@ -1,2 +1,4 @@
> +/* { dg-additional-options "-fno-exceptions" }  */
> +
>  #define CONSTRUCT kernels
>  #include "data-clauses.h"

> --- a/libgomp/testsuite/libgomp.oacc-c-c++-common/data-clauses-parallel-ipa-pta.c
> +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/data-clauses-parallel-ipa-pta.c
> @@ -1,4 +1,4 @@
>  /* { dg-do run { target lto } } */
> -/* { dg-additional-options "-fipa-pta -flto -flto-partition=max" } */
> +/* { dg-additional-options "-fipa-pta -flto -flto-partition=max -fno-exceptions" } */
>  
>  #include "data-clauses-parallel.c"

> --- a/libgomp/testsuite/libgomp.oacc-c-c++-common/data-clauses-parallel.c
> +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/data-clauses-parallel.c
> @@ -1,2 +1,4 @@
> +/* { dg-additional-options "-fno-exceptions" }  */
> +
>  #define CONSTRUCT parallel
>  #include "data-clauses.h"

Hmm?


Grüße
 Thomas

signature.asc (482 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH,openacc] check for compatible loop parallelism with acc routine calls

Jakub Jelinek
On Wed, Jun 29, 2016 at 04:11:31PM +0200, Thomas Schwinge wrote:

> > >> Actually, I relaxed non-acc routines by excluding
> > >> calls to builtin functions, including those prefixed with _gfortran_.
> > >> Nvptx does have a newlib c library, and it also has a subset of
> > >> libgfortran. Still, this solution is probably not optimal.
> > >
> > > I don't really like that, hardcoding prefixes or whatever is available
> > > (you have quite some subset of libc, libm etc. available too) in the
> > > compiler looks very hackish.  What is wrong with complaining during
> > > linking of the offloaded code?
>
> ACK.  Jakub, do I understand you correctly, that you basically say that
> every function declaration that is in scope inside offloaded regions (for
> example, GCC builtin functions, or standard library functions declared in
> target compiler's header files) is permitted to be called in offloaded
> regions, and the offloading compiler will then either be able to resolve
> these (nvptx back end knows about trigonometric functions, for example,
> and a lot of functions are available in the nvptx libc), or otherwise
> error out during the offloading compilation (during linking), gracefully
> without terminating the target compilation (that "gracefully" bit is
> currently missing -- that's for another day).  That is, all such
> functions are implicitly callable as OpenACC "seq" functions (which means
> that they don't internally use gang/worker/vector parallelism).  In
> particular, all these functions do *not* need to be marked with an
> explicit "#pragma acc routine seq" directive.  (Functions internally
> using gang/worker/vector parallelism will need to be marked
> appropriately, using a "#pragma acc routine gang/worker/vector"
> directive.)  That's how I understand your comment above, and your earlier
> comments on this topic, and also is what I think should be done.

Yes.  Well, OpenMP doesn't have different kinds of target functions, just
one.  And at least the current spec doesn't require that target regions or
declare target functions only call functions declared target, I guess mainly
because that would require that all the C/C++ headers are OpenMP aware and
declare everything that has the offloading counterpart.
For user code, of course users have to declare their routines, otherwise it
just can't be offloaded, and the implementation runtime is a very fuzzy
thing outside of the standard.

        Jakub
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH,openacc] check for compatible loop parallelism with acc routine calls

Cesar Philippidis
In reply to this post by Thomas Schwinge-8
On 06/29/2016 07:11 AM, Thomas Schwinge wrote:

> Cesar, I have not yet fully digested this, but do I understand right that
> you're really fixing two issues here, that are related (OpenACC routines)
> but still can be addressed independently of each other?  Do I understand
> right that the first one, the "problems with acc routines [...]
> incorrectly permitting 'acc seq' loops to call gang, worker and vector
> routines" is just a Fortran front end patch?  If yes, please split that
> one out, so as to reduce the volume of remaining changes that remain to
> be discussed.

This patch addresses the following issues:

 1. Issues warnings when a non-acc routine function is called inside an
    OpenACC offloaded region.

 2. It corrects a bug what was allowing seq loops to call gang, worker
    and vector routines.

 3. It adds supports for acc routines in fortran modules (which I
    noticed was missing when I added 'acc routine seq' to acc_on_device
    in the fortran openacc include files).

I'll split these into separate patches.

> On Thu, 23 Jun 2016 09:05:38 -0700, Cesar Philippidis <[hidden email]> wrote:
>> On 06/17/2016 07:42 AM, Jakub Jelinek wrote:
>>> On Wed, Jun 15, 2016 at 08:12:15PM -0700, Cesar Philippidis wrote:
>>>> The second set of changes involves teaching the gimplifier to error when
>>>> it detects a function call to an non-acc routines inside an OpenACC
>>>> offloaded region.
>
> As I understand, that's the same problem as has been discussed before
> (Ilya CCed), and has recently again been filed in
> <https://gcc.gnu.org/PR71499> "ICE in LTO1 when attempting NVPTX
> offloading (-fopenacc)", and <https://gcc.gnu.org/PR71535> "ICE in LTO1
> with -fopenmp offloading" (Alexander CCed).  Some earlier discussion
> threads include:
> <http://news.gmane.org/find-root.php?message_id=%3C20150109145702.GA45210%40msticlxl57.ims.intel.com%3E>,
> <http://news.gmane.org/find-root.php?message_id=%3C20150724152119.GA41292%40msticlxl57.ims.intel.com%3E>,
> <http://news.gmane.org/find-root.php?message_id=%3C56269C05.6030502%40acm.org%3E>.
>
>>>> Actually, I relaxed non-acc routines by excluding
>>>> calls to builtin functions, including those prefixed with _gfortran_.
>>>> Nvptx does have a newlib c library, and it also has a subset of
>>>> libgfortran. Still, this solution is probably not optimal.
>>>
>>> I don't really like that, hardcoding prefixes or whatever is available
>>> (you have quite some subset of libc, libm etc. available too) in the
>>> compiler looks very hackish.  What is wrong with complaining during
>>> linking of the offloaded code?
>
> ACK.  Jakub, do I understand you correctly, that you basically say that
> every function declaration that is in scope inside offloaded regions (for
> example, GCC builtin functions, or standard library functions declared in
> target compiler's header files) is permitted to be called in offloaded
> regions, and the offloading compiler will then either be able to resolve
> these (nvptx back end knows about trigonometric functions, for example,
> and a lot of functions are available in the nvptx libc), or otherwise
> error out during the offloading compilation (during linking), gracefully
> without terminating the target compilation (that "gracefully" bit is
> currently missing -- that's for another day).  That is, all such
> functions are implicitly callable as OpenACC "seq" functions (which means
> that they don't internally use gang/worker/vector parallelism).  In
> particular, all these functions do *not* need to be marked with an
> explicit "#pragma acc routine seq" directive.  (Functions internally
> using gang/worker/vector parallelism will need to be marked
> appropriately, using a "#pragma acc routine gang/worker/vector"
> directive.)  That's how I understand your comment above, and your earlier
> comments on this topic, and also is what I think should be done.

OK. I'll drop the warning changes from my patch set then unless you want
to keep it.

> A few random comments on the patch:
>
>> --- a/gcc/fortran/gfortran.h
>> +++ b/gcc/fortran/gfortran.h
>> @@ -303,6 +303,15 @@ enum save_state
>>  { SAVE_NONE = 0, SAVE_EXPLICIT, SAVE_IMPLICIT
>>  };
>>  
>> +/* Flags to keep track of ACC routine states.  */
>> +enum oacc_function
>> +{ OACC_FUNCTION_NONE = 0,
>> +  OACC_FUNCTION_SEQ,
>> +  OACC_FUNCTION_GANG,
>> +  OACC_FUNCTION_WORKER,
>> +  OACC_FUNCTION_VECTOR
>> +};
>
> What's the purpose of OACC_FUNCTION_NONE?  It's not used anywhere, as far
> as I can tell?

It's used by the fortran module code. It controls how parallelism gets
encoded in the .mod files.

>> --- a/gcc/fortran/openmp.c
>> +++ b/gcc/fortran/openmp.c
>> @@ -1664,21 +1664,31 @@ gfc_match_oacc_cache (void)
>>  
>>  /* Determine the loop level for a routine.   */
>>  
>> -static int
>> +static oacc_function
>>  gfc_oacc_routine_dims (gfc_omp_clauses *clauses)
>>  {
>>    int level = -1;
>> +  oacc_function ret = OACC_FUNCTION_SEQ;
>>  
>>    if (clauses)
>>      {
>>        unsigned mask = 0;
>>  
>>        if (clauses->gang)
>> - level = GOMP_DIM_GANG, mask |= GOMP_DIM_MASK (level);
>> + {
>> +  level = GOMP_DIM_GANG, mask |= GOMP_DIM_MASK (level);
>> +  ret = OACC_FUNCTION_GANG;
>> + }
>>        if (clauses->worker)
>> - level = GOMP_DIM_WORKER, mask |= GOMP_DIM_MASK (level);
>> + {
>> +  level = GOMP_DIM_WORKER, mask |= GOMP_DIM_MASK (level);
>> +  ret = OACC_FUNCTION_WORKER;
>> + }
>>        if (clauses->vector)
>> - level = GOMP_DIM_VECTOR, mask |= GOMP_DIM_MASK (level);
>> + {
>> +  level = GOMP_DIM_VECTOR, mask |= GOMP_DIM_MASK (level);
>> +  ret = OACC_FUNCTION_VECTOR;
>> + }
>>        if (clauses->seq)
>>   level = GOMP_DIM_MAX, mask |= GOMP_DIM_MASK (level);
>>  
>
> I have not looked in detail, so maybe I'm misunderstanding what is being
> done here -- but how do "clauses->seq" and "level = GOMP_DIM_MAX" fit
> together?

Conceptually, if you take a look at the oacc_function attribute in a
tree dump, you'll see an array with three elements. Basically, each
element in that array represents a gang, worker or vector parallelism.
By definition, a gang loop permits a worker and vector loop to be nested
inside it. So, for a gang routine, the oacc_function attribute is
constructed such that it permits gang, worker and vector level
parallelism. Similarly, for a worker routine, the oacc_function
attribute has the worker and vector level parallelism 'bits' set.

With that in mind, setting seq to GOMP_DIM_MASK allows the loop creating
that oacc_function attribute to mask out any gang, worker and vector
parallelism.

>> @@ -1689,7 +1699,7 @@ gfc_oacc_routine_dims (gfc_omp_clauses *clauses)
>>    if (level < 0)
>>      level = GOMP_DIM_MAX;
>>  
>> -  return level;
>> +  return ret;
>>  }
>
> Just from that last hunk, it seems that the assignment to "level" is a
> dead store?

I'll need to check this when I split out the patch.

>> +static tree
>> +add_attributes_to_decl (symbol_attribute sym_attr, tree list)
>> +{
>> +  unsigned id;
>> +  tree attr;
>> +
>> +  for (id = 0; id < EXT_ATTR_NUM; id++)
>> +    if (sym_attr.ext_attr & (1 << id))
>> +      {
>> + attr = build_tree_list (
>> + get_identifier (ext_attr_list[id].middle_end_name),
>> + NULL_TREE);
>> + list = chainon (list, attr);
>> +      }
>> +
>> +  list = add_omp_offloading_attributes (sym_attr.omp_declare_target,
>> + sym_attr.oacc_function, list);
>> +
>> +  return list;
>> +}
>
> Something that I had noticed before, possibly related here: code in
> gcc/fortran/ does never call replace_oacc_fn_attrib, but the C and C++
> front ends do.  Is that function what you've re-implemented here?

Similar, but I broke this code out from another function to handle
BUILT_IN_EXPECT. But I can revert this change now, since BUILT_IN_EXPECT
will be treated as an implicit SEQ routine.

>> --- a/gcc/lto-cgraph.c
>> +++ b/gcc/lto-cgraph.c
>> @@ -1201,9 +1201,11 @@ input_overwrite_node (struct lto_file_decl_data *file_data,
>>       LDPR_NUM_KNOWN);
>>    node->instrumentation_clone = bp_unpack_value (bp, 1);
>>    node->split_part = bp_unpack_value (bp, 1);
>> -  gcc_assert (flag_ltrans
>> -      || (!node->in_other_partition
>> -  && !node->used_from_other_partition));
>> +
>> +  int success = flag_ltrans || (!node->in_other_partition
>> + && !node->used_from_other_partition);
>> +  if (!success)
>> +    error ("Missing %<%s%>", node->name ());
>>  }
>>  
>>  /* Return string alias is alias of.  */
>> @@ -1416,9 +1418,11 @@ input_varpool_node (struct lto_file_decl_data *file_data,
>>      node->set_section_for_node (section);
>>    node->resolution = streamer_read_enum (ib, ld_plugin_symbol_resolution,
>>          LDPR_NUM_KNOWN);
>> -  gcc_assert (flag_ltrans
>> -      || (!node->in_other_partition
>> -  && !node->used_from_other_partition));
>> +
>> +  int success = flag_ltrans || (!node->in_other_partition
>> + && !node->used_from_other_partition);
>> +  if (!success)
>> +    error ("Missing %<%s%>", node->name ());
>>  
>>    return node;
>>  }
>
> That looks similar to what I remember from earlier, simiar patches, as
> referenced above.

It is. I never got around to pushing that patch very strongly because I
thought those link failures were legitimate compiler bugs.

>> @@ -19420,7 +19442,8 @@ oacc_loop_fixed_partitions (oacc_loop *loop, unsigned outer_mask)
>>      {
>>        unsigned outermost = this_mask & -this_mask;
>>  
>> -      if (outermost && outermost <= outer_mask)
>> +      if ((outermost && outermost <= outer_mask)
>> +  || (this_mask && (loop->parent->flags & OLF_SEQ)))
>>   {
>>    if (noisy)
>>      {
>
>> --- a/gcc/testsuite/c-c++-common/goacc/routine-3.c
>> +++ b/gcc/testsuite/c-c++-common/goacc/routine-3.c
>> @@ -49,7 +49,7 @@ main ()
>>    int red = 0;
>>  #pragma acc parallel copy (red)
>>    {
>> -    /* Independent/seq loop tests.  */
>> +    /* Independent loop tests.  */
>>  #pragma acc loop reduction (+:red) // { dg-warning "insufficient partitioning" }
>>      for (int i = 0; i < 10; i++)
>>        red += gang ();
>> @@ -62,6 +62,19 @@ main ()
>>      for (int i = 0; i < 10; i++)
>>        red += vector ();
>>  
>> +    /* Seq loop tests.  */
>> +#pragma acc loop seq reduction (+:red)  /* { dg-message "containing loop" } */
>> +    for (int i = 0; i < 10; i++)
>> +      red += gang (); /* { dg-error "incorrectly nested" } */
>> +
>> +#pragma acc loop seq reduction (+:red)  /* { dg-message "containing loop" } */
>> +    for (int i = 0; i < 10; i++)
>> +      red += worker (); /* { dg-error "incorrectly nested" } */
>> +
>> +#pragma acc loop seq reduction (+:red)  /* { dg-message "containing loop" } */
>> +    for (int i = 0; i < 10; i++)
>> +      red += vector (); /* { dg-error "incorrectly nested" } */
>> +    
>>      /* Gang routine tests.  */
>>  #pragma acc loop gang reduction (+:red)  /* { dg-message "containing loop" } */
>>      for (int i = 0; i < 10; i++)
>
> Do these test case changes actually relate to any of the compiler changes
> discussed above?  Maybe to the oacc_loop_fixed_partitions cited just
> above?  Is that a separate issue to fix?  Eh, or is that actually the fix
> for your first issue, the "problems with acc routines [...] incorrectly
> permitting 'acc seq' loops to call gang, worker and vector routines"?

This is issue 2, and I'll break it out into a separate patch.

>> --- a/libgomp/testsuite/libgomp.oacc-c-c++-common/data-clauses-kernels-ipa-pta.c
>> +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/data-clauses-kernels-ipa-pta.c
>> @@ -1,4 +1,4 @@
>>  /* { dg-do run { target lto } } */
>> -/* { dg-additional-options "-fipa-pta -flto -flto-partition=max" } */
>> +/* { dg-additional-options "-fipa-pta -flto -flto-partition=max -fno-exceptions" } */
>>  
>>  #include "data-clauses-kernels.c"
>
>> --- a/libgomp/testsuite/libgomp.oacc-c-c++-common/data-clauses-kernels.c
>> +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/data-clauses-kernels.c
>> @@ -1,2 +1,4 @@
>> +/* { dg-additional-options "-fno-exceptions" }  */
>> +
>>  #define CONSTRUCT kernels
>>  #include "data-clauses.h"
>
>> --- a/libgomp/testsuite/libgomp.oacc-c-c++-common/data-clauses-parallel-ipa-pta.c
>> +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/data-clauses-parallel-ipa-pta.c
>> @@ -1,4 +1,4 @@
>>  /* { dg-do run { target lto } } */
>> -/* { dg-additional-options "-fipa-pta -flto -flto-partition=max" } */
>> +/* { dg-additional-options "-fipa-pta -flto -flto-partition=max -fno-exceptions" } */
>>  
>>  #include "data-clauses-parallel.c"
>
>> --- a/libgomp/testsuite/libgomp.oacc-c-c++-common/data-clauses-parallel.c
>> +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/data-clauses-parallel.c
>> @@ -1,2 +1,4 @@
>> +/* { dg-additional-options "-fno-exceptions" }  */
>> +
>>  #define CONSTRUCT parallel
>>  #include "data-clauses.h"
>
> Hmm?

I'm not sure what happened here either. Maybe adding the 'acc routine'
directive to acc_on_device is preventing that function from expanding to
its builtin function counterpart, which caused gcc to generate exception
code?

Cesar

Reply | Threaded
Open this post in threaded view
|

[PR72741] Encode OpenACC 'routine' directive inside Fortran module files

Thomas Schwinge-8
In reply to this post by Cesar Philippidis
Hi!

On Wed, 15 Jun 2016 20:12:15 -0700, Cesar Philippidis <[hidden email]> wrote:
> [...], this patch updates the way that
> the fortran FE handles the 'acc routine' attribute in modules. Before,
> it only recorded that a function was marked as an acc routine.

(By means of 'OMP_DECLARE_TARGET', that is.)

> With this
> patch, it now records the level of parallelism the routine has. This is
> necessary for the middle end to validate compatible parallelism between
> the loop calling the routine and the routine itself.

This patch has seen a bunch of further revisions later on.  I've now
singled out the changes that are actually relevant for the feature under
discussion here, and added test cases that actually test what they
describe to be testing...  ;-)


The code changes now are actually very simple.  The "problem" is that
we're incrementing the Fortran module version, 'MOD_VERSION', which
breaks binary compatibility with Fortran module files created with
earlier versions of GCC, which is something that is to be avoided, as
I've heard.  Or, is it not that bad actually?

We might be able to resolve this by encoding individual "bits" for the
'gang'/'worker'/'vector'/'seq' clauses, instead of using the values for
'enum oacc_routine_lop' via 'oacc_routine_lop_types'.  The earlier
Fortran module files would simply not have these bits set, and could thus
still be read (given 'MOD_VERSION' not incremented).  Would that be a
solution to this issue?  (You'd then get a parse error when trying to use
with an older version of GCC any Fortran module files created with a
newer version, which seems OK?)

Or, an idea I just had (but not yet verified), guard the stream-out and
stream-in of 'attr->oacc_routine_lop' to just happen if '-fopenacc' is
active (or some such), which thus won't affect that majority of users.
Would that be a solution to this issue?


See attached, any comments?

And, can this still go into trunk now?  (Rationale: I'd like to later fix
this issue on GCC release branches also, because this is quite a
limitation to usage.  This will thus again bring up the 'MOD_VERSION'
issue.)


Grüße
 Thomas



From f2c4627045e70a6f6c52914cf6334392aca75230 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge <[hidden email]>
Date: Thu, 28 Feb 2019 17:45:13 +0100
Subject: [PATCH 1/2] [testsuite, Fortran] Provide 'dg-compile-aux-modules' in
 'gfortran.dg/goacc/goacc.exp'

..., as yet another copy from 'gfortran.dg/dg.exp', which there are a few
already.

        gcc/testsuite/
        * gfortran.dg/goacc/goacc.exp (dg-compile-aux-modules): New proc.
---
 gcc/testsuite/gfortran.dg/goacc/goacc.exp | 25 +++++++++++++++++++++++
 1 file changed, 25 insertions(+)

diff --git a/gcc/testsuite/gfortran.dg/goacc/goacc.exp b/gcc/testsuite/gfortran.dg/goacc/goacc.exp
index f1adb186a1e4..409c5fe54003 100644
--- a/gcc/testsuite/gfortran.dg/goacc/goacc.exp
+++ b/gcc/testsuite/gfortran.dg/goacc/goacc.exp
@@ -28,6 +28,31 @@ if ![check_effective_target_fopenacc] {
 # Initialize `dg'.
 dg-init
 
+global gfortran_test_path
+global gfortran_aux_module_flags
+set gfortran_test_path $srcdir/$subdir
+set gfortran_aux_module_flags "-fopenacc"
+proc dg-compile-aux-modules { args } {
+    global gfortran_test_path
+    global gfortran_aux_module_flags
+    if { [llength $args] != 2 } {
+ error "dg-set-target-env-var: needs one argument"
+ return
+    }
+
+    set level [info level]
+    if { [info procs dg-save-unknown] != [list] } {
+ rename dg-save-unknown dg-save-unknown-level-$level
+    }
+
+    dg-test $gfortran_test_path/[lindex $args 1] "" $gfortran_aux_module_flags
+    # cleanup-modules is intentionally not invoked here.
+
+    if { [info procs dg-save-unknown-level-$level] != [list] } {
+ rename dg-save-unknown-level-$level dg-save-unknown
+    }
+}
+
 # Main loop.
 gfortran-dg-runtest [lsort \
        [find $srcdir/$subdir *.\[fF\]{,90,95,03,08} ] ] "" "-fopenacc"
--
2.17.1


From d947a297d224ced389abd7ad74d3519b4a0e8d32 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge <[hidden email]>
Date: Thu, 28 Feb 2019 21:58:14 +0100
Subject: [PATCH 2/2] [PR72741] Encode OpenACC 'routine' directive inside
 Fortran module files

        gcc/fortran/
        * gfortran.h (oacc_routine_lop_types): Declare.
        * module.c (MOD_VERSION): Increment
        (oacc_routine_lop): New DECL_MIO_NAME.
        (mio_symbol_attribute): Set the oacc_routine_lop attribute.
        * symbol.c (oacc_routine_lop_types): Define.
        gcc/testsuite/
        * gfortran.dg/goacc/routine-module-1.f90: New file.
        * gfortran.dg/goacc/routine-module-2.f90: Likewise.
        * gfortran.dg/goacc/routine-module-mod-1.f90: Likewise.
---
 gcc/fortran/gfortran.h                        |  1 +
 gcc/fortran/module.c                          |  6 +-
 gcc/fortran/symbol.c                          | 12 +++
 .../gfortran.dg/goacc/routine-module-1.f90    | 47 +++++++++++
 .../gfortran.dg/goacc/routine-module-2.f90    | 31 ++++++++
 .../goacc/routine-module-mod-1.f90            | 79 +++++++++++++++++++
 6 files changed, 175 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/routine-module-1.f90
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/routine-module-2.f90
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/routine-module-mod-1.f90

diff --git a/gcc/fortran/gfortran.h b/gcc/fortran/gfortran.h
index 3e0f634c3a8e..0d929d4d0c2c 100644
--- a/gcc/fortran/gfortran.h
+++ b/gcc/fortran/gfortran.h
@@ -336,6 +336,7 @@ extern const mstring intents[];
 extern const mstring access_types[];
 extern const mstring ifsrc_types[];
 extern const mstring save_status[];
+extern const mstring oacc_routine_lop_types[];
 
 /* Strings for DTIO procedure names.  In symbol.c.  */
 extern const mstring dtio_procs[];
diff --git a/gcc/fortran/module.c b/gcc/fortran/module.c
index 320b30c529ac..ce43997ccf48 100644
--- a/gcc/fortran/module.c
+++ b/gcc/fortran/module.c
@@ -84,7 +84,7 @@ along with GCC; see the file COPYING3.  If not see
 
 /* Don't put any single quote (') in MOD_VERSION, if you want it to be
    recognized.  */
-#define MOD_VERSION "15"
+#define MOD_VERSION "16"
 
 
 /* Structure that describes a position within a module file.  */
@@ -2126,6 +2126,7 @@ DECL_MIO_NAME (ref_type)
 DECL_MIO_NAME (sym_flavor)
 DECL_MIO_NAME (sym_intent)
 DECL_MIO_NAME (inquiry_type)
+DECL_MIO_NAME (oacc_routine_lop)
 #undef DECL_MIO_NAME
 
 /* Symbol attributes are stored in list with the first three elements
@@ -2147,6 +2148,9 @@ mio_symbol_attribute (symbol_attribute *attr)
   attr->proc = MIO_NAME (procedure_type) (attr->proc, procedures);
   attr->if_source = MIO_NAME (ifsrc) (attr->if_source, ifsrc_types);
   attr->save = MIO_NAME (save_state) (attr->save, save_status);
+  attr->oacc_routine_lop
+    = MIO_NAME (oacc_routine_lop) (attr->oacc_routine_lop,
+   oacc_routine_lop_types);
 
   ext_attr = attr->ext_attr;
   mio_integer ((int *) &ext_attr);
diff --git a/gcc/fortran/symbol.c b/gcc/fortran/symbol.c
index c8a1f842d353..3c3d8cb22f7b 100644
--- a/gcc/fortran/symbol.c
+++ b/gcc/fortran/symbol.c
@@ -96,6 +96,18 @@ const mstring dtio_procs[] =
     minit ("_dtio_unformatted_write", DTIO_WUF),
 };
 
+const mstring oacc_routine_lop_types[] =
+{
+    minit ("OACC_ROUTINE_LOP_NONE", OACC_ROUTINE_LOP_NONE),
+    minit ("OACC_ROUTINE_LOP_GANG", OACC_ROUTINE_LOP_GANG),
+    minit ("OACC_ROUTINE_LOP_WORKER", OACC_ROUTINE_LOP_WORKER),
+    minit ("OACC_ROUTINE_LOP_VECTOR", OACC_ROUTINE_LOP_VECTOR),
+    minit ("OACC_ROUTINE_LOP_SEQ", OACC_ROUTINE_LOP_SEQ),
+    /* 'OACC_ROUTINE_LOP_ERROR' intentionally ommitted here; it's only unsed
+       internally.  */
+    minit (NULL, -1)
+};
+
 /* This is to make sure the backend generates setup code in the correct
    order.  */
 
diff --git a/gcc/testsuite/gfortran.dg/goacc/routine-module-1.f90 b/gcc/testsuite/gfortran.dg/goacc/routine-module-1.f90
new file mode 100644
index 000000000000..4e81f11fec86
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/goacc/routine-module-1.f90
@@ -0,0 +1,47 @@
+! Valid use of routines defined inside a Fortran module.
+
+! { dg-compile-aux-modules "routine-module-mod-1.f90" }
+! { dg-additional-options "-fopt-info-optimized-omp" }
+
+program main
+  use routine_module_mod_1
+  implicit none
+
+  integer :: i
+
+  call pl_1
+
+  !$acc parallel loop seq ! { dg-message "optimized: assigned OpenACC seq loop parallelism" }
+  do i = 1, 10
+     call s_1 ! { dg-message "optimized: assigned OpenACC seq loop parallelism" }
+     call s_2 ! { dg-message "optimized: assigned OpenACC seq loop parallelism" }
+     call g_1 ! { dg-message "optimized: assigned OpenACC gang worker vector loop parallelism" }
+     call w_1 ! { dg-message "optimized: assigned OpenACC worker vector loop parallelism" }
+     call v_1 ! { dg-message "optimized: assigned OpenACC vector loop parallelism" }
+  end do
+  !$acc end parallel loop
+
+  !$acc parallel loop gang ! { dg-message "optimized: assigned OpenACC gang loop parallelism" }
+  do i = 1, 10
+     call s_1 ! { dg-message "optimized: assigned OpenACC seq loop parallelism" }
+     call s_2 ! { dg-message "optimized: assigned OpenACC seq loop parallelism" }
+     call w_1 ! { dg-message "optimized: assigned OpenACC worker vector loop parallelism" }
+     call v_1 ! { dg-message "optimized: assigned OpenACC vector loop parallelism" }
+  end do
+  !$acc end parallel loop
+
+  !$acc parallel loop worker ! { dg-message "optimized: assigned OpenACC worker loop parallelism" }
+  do i = 1, 10
+     call s_1 ! { dg-message "optimized: assigned OpenACC seq loop parallelism" }
+     call s_2 ! { dg-message "optimized: assigned OpenACC seq loop parallelism" }
+     call v_1 ! { dg-message "optimized: assigned OpenACC vector loop parallelism" }
+  end do
+  !$acc end parallel loop
+
+  !$acc parallel loop vector ! { dg-message "optimized: assigned OpenACC vector loop parallelism" }
+  do i = 1, 10
+     call s_1 ! { dg-message "optimized: assigned OpenACC seq loop parallelism" }
+     call s_2 ! { dg-message "optimized: assigned OpenACC seq loop parallelism" }
+  end do
+  !$acc end parallel loop
+end program main
diff --git a/gcc/testsuite/gfortran.dg/goacc/routine-module-2.f90 b/gcc/testsuite/gfortran.dg/goacc/routine-module-2.f90
new file mode 100644
index 000000000000..eae0807643c9
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/goacc/routine-module-2.f90
@@ -0,0 +1,31 @@
+! Invalid use of routines defined inside a Fortran module.
+
+! { dg-compile-aux-modules "routine-module-mod-1.f90" }
+
+program main
+  use routine_module_mod_1
+  implicit none
+
+  integer :: i
+
+  !$acc parallel loop gang
+  do i = 1, 10
+     call g_1 ! { dg-error "routine call uses same OpenACC parallelism as containing loop" }
+  end do
+  !$acc end parallel loop
+
+  !$acc parallel loop worker
+  do i = 1, 10
+     call g_1 ! { dg-error "routine call uses same OpenACC parallelism as containing loop" }
+     call w_1 ! { dg-error "routine call uses same OpenACC parallelism as containing loop" }
+  end do
+  !$acc end parallel loop
+
+  !$acc parallel loop vector
+  do i = 1, 10
+     call g_1 ! { dg-error "routine call uses same OpenACC parallelism as containing loop" }
+     call w_1 ! { dg-error "routine call uses same OpenACC parallelism as containing loop" }
+     call v_1 ! { dg-error "routine call uses same OpenACC parallelism as containing loop" }
+  end do
+  !$acc end parallel loop
+end program main
diff --git a/gcc/testsuite/gfortran.dg/goacc/routine-module-mod-1.f90 b/gcc/testsuite/gfortran.dg/goacc/routine-module-mod-1.f90
new file mode 100644
index 000000000000..8f73db41d523
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/goacc/routine-module-mod-1.f90
@@ -0,0 +1,79 @@
+! OpenACC 'routine' directives inside a Fortran module.
+
+! { dg-additional-options "-fopt-info-optimized-omp" }
+
+module routine_module_mod_1
+contains
+  subroutine s_1
+    implicit none
+    !$acc routine
+
+    integer :: i
+
+    !$acc loop ! { dg-message "optimized: assigned OpenACC seq loop parallelism" }
+    ! { dg-warning "insufficient partitioning available to parallelize loop" "" { target *-*-* } .-1 }
+    do i = 1, 3
+    end do
+  end subroutine s_1
+
+  subroutine s_2
+    implicit none
+    !$acc routine seq
+
+    integer :: i
+
+    !$acc loop ! { dg-message "optimized: assigned OpenACC seq loop parallelism" }
+    ! { dg-warning "insufficient partitioning available to parallelize loop" "" { target *-*-* } .-1 }
+    do i = 1, 3
+    end do
+  end subroutine s_2
+
+  subroutine v_1
+    implicit none
+    !$acc routine vector
+
+    integer :: i
+
+    !$acc loop ! { dg-message "optimized: assigned OpenACC vector loop parallelism" }
+    do i = 1, 3
+    end do
+  end subroutine v_1
+
+  subroutine w_1
+    implicit none
+    !$acc routine worker
+
+    integer :: i
+
+    !$acc loop ! { dg-message "optimized: assigned OpenACC worker vector loop parallelism" }
+    do i = 1, 3
+    end do
+  end subroutine w_1
+
+  subroutine g_1
+    implicit none
+    !$acc routine gang
+
+    integer :: i
+
+    !$acc loop ! { dg-message "optimized: assigned OpenACC gang vector loop parallelism" }
+    do i = 1, 3
+    end do
+  end subroutine g_1
+
+  subroutine pl_1
+    implicit none
+
+    integer :: i
+
+    !$acc parallel loop ! { dg-message "optimized: assigned OpenACC seq loop parallelism" }
+    ! { dg-warning "insufficient partitioning available to parallelize loop" "" { target *-*-* } .-1 }
+    do i = 1, 3
+       call s_1 ! { dg-message "optimized: assigned OpenACC seq loop parallelism" }
+       call s_2 ! { dg-message "optimized: assigned OpenACC seq loop parallelism" }
+       call g_1 ! { dg-message "optimized: assigned OpenACC gang worker vector loop parallelism" }
+       call w_1 ! { dg-message "optimized: assigned OpenACC worker vector loop parallelism" }
+       call v_1 ! { dg-message "optimized: assigned OpenACC vector loop parallelism" }
+    end do
+  end subroutine pl_1
+end module routine_module_mod_1
--
2.17.1


signature.asc (671 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: [PR72741] Encode OpenACC 'routine' directive inside Fortran module files

Jakub Jelinek
On Thu, Feb 28, 2019 at 10:12:00PM +0100, Thomas Schwinge wrote:
> On Wed, 15 Jun 2016 20:12:15 -0700, Cesar Philippidis <[hidden email]> wrote:
> The code changes now are actually very simple.  The "problem" is that
> we're incrementing the Fortran module version, 'MOD_VERSION', which
> breaks binary compatibility with Fortran module files created with
> earlier versions of GCC, which is something that is to be avoided, as
> I've heard.  Or, is it not that bad actually?

It is bad and we certainly shouldn't change it on release branches.
There are many ways to deal with it without bumping MOD_VERSION in a
backwards but not forwards compatible way, so that a newer compiler will be
able to parse old *.mod files, and newer compiler new ones as long as this
problematic stuff doesn't appear in.

        Jakub
Reply | Threaded
Open this post in threaded view
|

[PR72741] Encode OpenACC 'routine' directive's level of parallelism inside Fortran module files

Thomas Schwinge-8
Hi!

On Thu, 28 Feb 2019 22:17:01 +0100, Jakub Jelinek <[hidden email]> wrote:
> On Thu, Feb 28, 2019 at 10:12:00PM +0100, Thomas Schwinge wrote:
> > On Wed, 15 Jun 2016 20:12:15 -0700, Cesar Philippidis <[hidden email]> wrote:
> > The code changes now are actually very simple.  The "problem" is that
> > we're incrementing the Fortran module version, 'MOD_VERSION', which
> > breaks binary compatibility with Fortran module files created with
> > earlier versions of GCC, which is something that is to be avoided, as
> > I've heard.  Or, is it not that bad actually?
>
> It is bad and we certainly shouldn't change it on release branches.

ACK.

> There are many ways to deal with it without bumping MOD_VERSION in a
> backwards but not forwards compatible way, so that a newer compiler will be
> able to parse old *.mod files, and newer compiler new ones as long as this
> problematic stuff doesn't appear in.

Like the attached, actually pretty simple now?

It may seem wasteful to use individual bits for 'gang', 'worker',
'vector', 'seq', but that makes it easy to implement the behavior
described above by Jakub, and I've heard rumors that OpenACC might at
some point allow several of these level of parallelism clauses to be
specified (plus a new 'auto' clause?), and then this will be necessary
anyway (several of these bits can then in fact appear).

If approving this patch, please respond with "Reviewed-by: NAME <EMAIL>"
so that your effort will be recorded in the commit log, see
<https://gcc.gnu.org/wiki/Reviewed-by>.


Grüße
 Thomas



From b2f200d24d040c6d34b5b4421e4cb1be9786030f Mon Sep 17 00:00:00 2001
From: Thomas Schwinge <[hidden email]>
Date: Wed, 13 Mar 2019 18:39:53 +0100
Subject: [PATCH] [PR72741] Encode OpenACC 'routine' directive's level of
 parallelism inside Fortran module files

If 'use'ing with an old GCC a new module file (with OpenACC 'routine'
directive's level of parallelism encoded), then that expectedly fails as
follows:

    f951: Fatal Error: Reading module 'routine_module_mod_1' at line 27 column 21: find_enum(): Enum not found

If 'use'ing with a new GCC an old module file (without OpenACC 'routine'
directive's level of parallelism encoded), then that (silently) continues to
accept the module file, and will proceed with the previous, erroneous behavior.

These seem to be acceptable compromises, instead of incrementing 'MOD_VERSION'.

        gcc/fortran/
        * module.c (verify_OACC_ROUTINE_LOP_NONE): New function.
        (enum ab_attribute): Add AB_OACC_ROUTINE_LOP_GANG,
        AB_OACC_ROUTINE_LOP_WORKER, AB_OACC_ROUTINE_LOP_VECTOR,
        AB_OACC_ROUTINE_LOP_SEQ.
        (attr_bits): Add these.
        (mio_symbol_attribute): Handle these.
        gcc/testsuite/
        * gfortran.dg/goacc/routine-module-1.f90: New file.
        * gfortran.dg/goacc/routine-module-2.f90: Likewise.
        * gfortran.dg/goacc/routine-module-mod-1.f90: Likewise.
---
 gcc/fortran/module.c                          | 57 ++++++++++++-
 .../gfortran.dg/goacc/routine-module-1.f90    | 47 +++++++++++
 .../gfortran.dg/goacc/routine-module-2.f90    | 31 ++++++++
 .../goacc/routine-module-mod-1.f90            | 79 +++++++++++++++++++
 4 files changed, 213 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/routine-module-1.f90
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/routine-module-2.f90
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/routine-module-mod-1.f90

diff --git a/gcc/fortran/module.c b/gcc/fortran/module.c
index 0572b8e02c17..39b420039dff 100644
--- a/gcc/fortran/module.c
+++ b/gcc/fortran/module.c
@@ -2011,7 +2011,9 @@ enum ab_attribute
   AB_OACC_DECLARE_COPYIN, AB_OACC_DECLARE_DEVICEPTR,
   AB_OACC_DECLARE_DEVICE_RESIDENT, AB_OACC_DECLARE_LINK,
   AB_OMP_DECLARE_TARGET_LINK, AB_PDT_KIND, AB_PDT_LEN, AB_PDT_TYPE,
-  AB_PDT_TEMPLATE, AB_PDT_ARRAY, AB_PDT_STRING
+  AB_PDT_TEMPLATE, AB_PDT_ARRAY, AB_PDT_STRING,
+  AB_OACC_ROUTINE_LOP_GANG, AB_OACC_ROUTINE_LOP_WORKER,
+  AB_OACC_ROUTINE_LOP_VECTOR, AB_OACC_ROUTINE_LOP_SEQ
 };
 
 static const mstring attr_bits[] =
@@ -2081,6 +2083,10 @@ static const mstring attr_bits[] =
     minit ("PDT_TEMPLATE", AB_PDT_TEMPLATE),
     minit ("PDT_ARRAY", AB_PDT_ARRAY),
     minit ("PDT_STRING", AB_PDT_STRING),
+    minit ("OACC_ROUTINE_LOP_GANG", AB_OACC_ROUTINE_LOP_GANG),
+    minit ("OACC_ROUTINE_LOP_WORKER", AB_OACC_ROUTINE_LOP_WORKER),
+    minit ("OACC_ROUTINE_LOP_VECTOR", AB_OACC_ROUTINE_LOP_VECTOR),
+    minit ("OACC_ROUTINE_LOP_SEQ", AB_OACC_ROUTINE_LOP_SEQ),
     minit (NULL, -1)
 };
 
@@ -2128,6 +2134,15 @@ DECL_MIO_NAME (sym_intent)
 DECL_MIO_NAME (inquiry_type)
 #undef DECL_MIO_NAME
 
+/* Verify OACC_ROUTINE_LOP_NONE.  */
+
+static void
+verify_OACC_ROUTINE_LOP_NONE (enum oacc_routine_lop lop)
+{
+  if (lop != OACC_ROUTINE_LOP_NONE)
+    bad_module ("Unsupported: multiple OpenACC 'routine' levels of parallelism");
+}
+
 /* Symbol attributes are stored in list with the first three elements
    being the enumerated fields, while the remaining elements (if any)
    indicate the individual attribute bits.  The access field is not
@@ -2292,6 +2307,30 @@ mio_symbol_attribute (symbol_attribute *attr)
  MIO_NAME (ab_attribute) (AB_PDT_ARRAY, attr_bits);
       if (attr->pdt_string)
  MIO_NAME (ab_attribute) (AB_PDT_STRING, attr_bits);
+      switch (attr->oacc_routine_lop)
+ {
+ case OACC_ROUTINE_LOP_NONE:
+  /* This is the default anyway, and for maintaining compatibility with
+     the current MOD_VERSION, we're not emitting anything in that
+     case.  */
+  break;
+ case OACC_ROUTINE_LOP_GANG:
+  MIO_NAME (ab_attribute) (AB_OACC_ROUTINE_LOP_GANG, attr_bits);
+  break;
+ case OACC_ROUTINE_LOP_WORKER:
+  MIO_NAME (ab_attribute) (AB_OACC_ROUTINE_LOP_WORKER, attr_bits);
+  break;
+ case OACC_ROUTINE_LOP_VECTOR:
+  MIO_NAME (ab_attribute) (AB_OACC_ROUTINE_LOP_VECTOR, attr_bits);
+  break;
+ case OACC_ROUTINE_LOP_SEQ:
+  MIO_NAME (ab_attribute) (AB_OACC_ROUTINE_LOP_SEQ, attr_bits);
+  break;
+ case OACC_ROUTINE_LOP_ERROR:
+  /* ... intentionally omitted here; it's only unsed internally.  */
+ default:
+  gcc_unreachable ();
+ }
 
       mio_rparen ();
 
@@ -2503,6 +2542,22 @@ mio_symbol_attribute (symbol_attribute *attr)
     case AB_PDT_STRING:
       attr->pdt_string = 1;
       break;
+    case AB_OACC_ROUTINE_LOP_GANG:
+      verify_OACC_ROUTINE_LOP_NONE (attr->oacc_routine_lop);
+      attr->oacc_routine_lop = OACC_ROUTINE_LOP_GANG;
+      break;
+    case AB_OACC_ROUTINE_LOP_WORKER:
+      verify_OACC_ROUTINE_LOP_NONE (attr->oacc_routine_lop);
+      attr->oacc_routine_lop = OACC_ROUTINE_LOP_WORKER;
+      break;
+    case AB_OACC_ROUTINE_LOP_VECTOR:
+      verify_OACC_ROUTINE_LOP_NONE (attr->oacc_routine_lop);
+      attr->oacc_routine_lop = OACC_ROUTINE_LOP_VECTOR;
+      break;
+    case AB_OACC_ROUTINE_LOP_SEQ:
+      verify_OACC_ROUTINE_LOP_NONE (attr->oacc_routine_lop);
+      attr->oacc_routine_lop = OACC_ROUTINE_LOP_SEQ;
+      break;
     }
  }
     }
diff --git a/gcc/testsuite/gfortran.dg/goacc/routine-module-1.f90 b/gcc/testsuite/gfortran.dg/goacc/routine-module-1.f90
new file mode 100644
index 000000000000..4e81f11fec86
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/goacc/routine-module-1.f90
@@ -0,0 +1,47 @@
+! Valid use of routines defined inside a Fortran module.
+
+! { dg-compile-aux-modules "routine-module-mod-1.f90" }
+! { dg-additional-options "-fopt-info-optimized-omp" }
+
+program main
+  use routine_module_mod_1
+  implicit none
+
+  integer :: i
+
+  call pl_1
+
+  !$acc parallel loop seq ! { dg-message "optimized: assigned OpenACC seq loop parallelism" }
+  do i = 1, 10
+     call s_1 ! { dg-message "optimized: assigned OpenACC seq loop parallelism" }
+     call s_2 ! { dg-message "optimized: assigned OpenACC seq loop parallelism" }
+     call g_1 ! { dg-message "optimized: assigned OpenACC gang worker vector loop parallelism" }
+     call w_1 ! { dg-message "optimized: assigned OpenACC worker vector loop parallelism" }
+     call v_1 ! { dg-message "optimized: assigned OpenACC vector loop parallelism" }
+  end do
+  !$acc end parallel loop
+
+  !$acc parallel loop gang ! { dg-message "optimized: assigned OpenACC gang loop parallelism" }
+  do i = 1, 10
+     call s_1 ! { dg-message "optimized: assigned OpenACC seq loop parallelism" }
+     call s_2 ! { dg-message "optimized: assigned OpenACC seq loop parallelism" }
+     call w_1 ! { dg-message "optimized: assigned OpenACC worker vector loop parallelism" }
+     call v_1 ! { dg-message "optimized: assigned OpenACC vector loop parallelism" }
+  end do
+  !$acc end parallel loop
+
+  !$acc parallel loop worker ! { dg-message "optimized: assigned OpenACC worker loop parallelism" }
+  do i = 1, 10
+     call s_1 ! { dg-message "optimized: assigned OpenACC seq loop parallelism" }
+     call s_2 ! { dg-message "optimized: assigned OpenACC seq loop parallelism" }
+     call v_1 ! { dg-message "optimized: assigned OpenACC vector loop parallelism" }
+  end do
+  !$acc end parallel loop
+
+  !$acc parallel loop vector ! { dg-message "optimized: assigned OpenACC vector loop parallelism" }
+  do i = 1, 10
+     call s_1 ! { dg-message "optimized: assigned OpenACC seq loop parallelism" }
+     call s_2 ! { dg-message "optimized: assigned OpenACC seq loop parallelism" }
+  end do
+  !$acc end parallel loop
+end program main
diff --git a/gcc/testsuite/gfortran.dg/goacc/routine-module-2.f90 b/gcc/testsuite/gfortran.dg/goacc/routine-module-2.f90
new file mode 100644
index 000000000000..eae0807643c9
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/goacc/routine-module-2.f90
@@ -0,0 +1,31 @@
+! Invalid use of routines defined inside a Fortran module.
+
+! { dg-compile-aux-modules "routine-module-mod-1.f90" }
+
+program main
+  use routine_module_mod_1
+  implicit none
+
+  integer :: i
+
+  !$acc parallel loop gang
+  do i = 1, 10
+     call g_1 ! { dg-error "routine call uses same OpenACC parallelism as containing loop" }
+  end do
+  !$acc end parallel loop
+
+  !$acc parallel loop worker
+  do i = 1, 10
+     call g_1 ! { dg-error "routine call uses same OpenACC parallelism as containing loop" }
+     call w_1 ! { dg-error "routine call uses same OpenACC parallelism as containing loop" }
+  end do
+  !$acc end parallel loop
+
+  !$acc parallel loop vector
+  do i = 1, 10
+     call g_1 ! { dg-error "routine call uses same OpenACC parallelism as containing loop" }
+     call w_1 ! { dg-error "routine call uses same OpenACC parallelism as containing loop" }
+     call v_1 ! { dg-error "routine call uses same OpenACC parallelism as containing loop" }
+  end do
+  !$acc end parallel loop
+end program main
diff --git a/gcc/testsuite/gfortran.dg/goacc/routine-module-mod-1.f90 b/gcc/testsuite/gfortran.dg/goacc/routine-module-mod-1.f90
new file mode 100644
index 000000000000..3855b8c88596
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/goacc/routine-module-mod-1.f90
@@ -0,0 +1,79 @@
+! OpenACC 'routine' directives inside a Fortran module.
+
+! { dg-additional-options "-fopt-info-optimized-omp" }
+
+module routine_module_mod_1
+contains
+  subroutine s_1
+    implicit none
+    !$acc routine
+
+    integer :: i
+
+    !$acc loop ! { dg-message "optimized: assigned OpenACC seq loop parallelism" }
+    ! { dg-warning "insufficient partitioning available to parallelize loop" "" { target *-*-* } .-1 }
+    do i = 1, 3
+    end do
+  end subroutine s_1
+
+  subroutine s_2
+    implicit none
+    !$acc routine seq
+
+    integer :: i
+
+    !$acc loop ! { dg-message "optimized: assigned OpenACC seq loop parallelism" }
+    ! { dg-warning "insufficient partitioning available to parallelize loop" "" { target *-*-* } .-1 }
+    do i = 1, 3
+    end do
+  end subroutine s_2
+
+  subroutine v_1
+    implicit none
+    !$acc routine vector
+
+    integer :: i
+
+    !$acc loop ! { dg-message "optimized: assigned OpenACC vector loop parallelism" }
+    do i = 1, 3
+    end do
+  end subroutine v_1
+
+  subroutine w_1
+    implicit none
+    !$acc routine worker
+
+    integer :: i
+
+    !$acc loop ! { dg-message "optimized: assigned OpenACC worker vector loop parallelism" }
+    do i = 1, 3
+    end do
+  end subroutine w_1
+
+  subroutine g_1
+    implicit none
+    !$acc routine gang
+
+    integer :: i
+
+    !$acc loop ! { dg-message "optimized: assigned OpenACC gang vector loop parallelism" }
+    do i = 1, 3
+    end do
+  end subroutine g_1
+
+  subroutine pl_1
+    implicit none
+
+    integer :: i
+
+    !$acc parallel loop ! { dg-message "optimized: assigned OpenACC seq loop parallelism" }
+    ! { dg-warning "insufficient partitioning available to parallelize loop" "" { target *-*-* } .-1 }
+    do i = 1, 3
+       call s_1 ! { dg-message "optimized: assigned OpenACC seq loop parallelism" }
+       call s_2 ! { dg-message "optimized: assigned OpenACC seq loop parallelism" }
+       call v_1 ! { dg-message "optimized: assigned OpenACC vector loop parallelism" }
+       call w_1 ! { dg-message "optimized: assigned OpenACC worker vector loop parallelism" }
+       call g_1 ! { dg-message "optimized: assigned OpenACC gang worker vector loop parallelism" }
+    end do
+  end subroutine pl_1
+end module routine_module_mod_1
--
2.17.1

Reply | Threaded
Open this post in threaded view
|

Re: [PR72741] Encode OpenACC 'routine' directive's level of parallelism inside Fortran module files

Thomas Koenig-6
Am 13.03.19 um 18:50 schrieb Thomas Schwinge:
>> There are many ways to deal with it without bumping MOD_VERSION in a
>> backwards but not forwards compatible way, so that a newer compiler will be
>> able to parse old *.mod files, and newer compiler new ones as long as this
>> problematic stuff doesn't appear in.
> Like the attached, actually pretty simple now?

Can you explain a) how this works, and b) how you tested it?
Reply | Threaded
Open this post in threaded view
|

Re: [PR72741] Encode OpenACC 'routine' directive's level of parallelism inside Fortran module files

Thomas Schwinge-8
Hi Thomas!

On Wed, 13 Mar 2019 23:13:46 +0100, Thomas Koenig <[hidden email]> wrote:
> Am 13.03.19 um 18:50 schrieb Thomas Schwinge:
> >> There are many ways to deal with it without bumping MOD_VERSION in a
> >> backwards but not forwards compatible way, so that a newer compiler will be
> >> able to parse old *.mod files, and newer compiler new ones as long as this
> >> problematic stuff doesn't appear in.
> > Like the attached, actually pretty simple now?
>
> Can you explain a) how this works

I'll be happy to elaborate, but I'm not sure at which level you'd like me
to explain?

This is basically the very same thing that's being done for other 'struct
symbol_attribute' flag fields, by interpreting the 'enum
oacc_routine_lop' values as individual flags, and with the corollary (to
maintain MOD_VERSION compatibility as best as we can) that we don't
stream out its default value (so doing correspondingly to when a "real"
flag's value is 'false').

> and b) how you tested it?

I had mentioned that in the commit message: with the relevant old/new GCC
combinations, using the included test case.

Happy to explain further, if necessary.


Grüße
 Thomas
Reply | Threaded
Open this post in threaded view
|

Re: [PR72741] Encode OpenACC 'routine' directive's level of parallelism inside Fortran module files

Thomas Koenig-6
Hi Thomas,

> Are there any further questions, or am I good to commit my patch as
> posted?

Problem is, I have never looked into module writing / reading in any
detail, so it will take a few days for me to get up to speed so I can
really review your patch.

If, in the meantime, maybe somebody else with more knowledge in that
area (Janne?) could step in.

Regards

        Thomas