[PATCH, libgfortran] Remove recursion check

classic Classic list List threaded Threaded
8 messages Options
Reply | Threaded
Open this post in threaded view
|

[PATCH, libgfortran] Remove recursion check

Janne Blomqvist-3
libgfortran has a recursion check in the error handling paths.  This
works by checking the value of a static variable, and if it matches,
aborting immediately instead of continuing error processing.
Unfortunately, in a multi-threaded program, if two threads report an
error at the same time, this can trigger this check, and thus the
underlying reason for the error is never reported.

The downside is that without the recursion check, it might be possible
to get into some infinite recursion in the error handling.  But by my
audit of the code, this shouldn't happen.

Regtested on x86_64-pc-linux-gnu, Ok for trunk?

libgfortran/ChangeLog:

2018-09-24  Janne Blomqvist  <[hidden email]>

        * runtime/error.c (MAGIC): Remove.
        (recursion_check): Remove function.
        (os_error): Remove recursion_check call.
        (runtime_error): Likewise.
        (runtime_error_at): Likewise.
        (internal_error): Likewise.
        (generate_error_common): Likewise.
        (notify_std): Likewise.
        * runtime/minimal.c (MAGIC): Remove.
        (recursion_check): Remove function.
        (os_error): Remove recursion_check call.
        (runtime_error): Likewise.
        (runtime_error_at): Likewise.
        (internal_error): Likewise.
---
 libgfortran/runtime/error.c   | 25 -------------------------
 libgfortran/runtime/minimal.c | 22 ----------------------
 2 files changed, 47 deletions(-)

diff --git a/libgfortran/runtime/error.c b/libgfortran/runtime/error.c
index b07a4c0b12a..b6cbd14ff6d 100644
--- a/libgfortran/runtime/error.c
+++ b/libgfortran/runtime/error.c
@@ -330,25 +330,6 @@ show_locus (st_parameter_common *cmp)
 }
 
 
-/* recursion_check()-- It's possible for additional errors to occur
- * during fatal error processing.  We detect this condition here and
- * exit with code 4 immediately. */
-
-#define MAGIC 0x20DE8101
-
-static void
-recursion_check (void)
-{
-  static int magic = 0;
-
-  /* Don't even try to print something at this point */
-  if (magic == MAGIC)
-    sys_abort ();
-
-  magic = MAGIC;
-}
-
-
 #define STRERR_MAXSZ 256
 
 /* os_error()-- Operating system error.  We get a message from the
@@ -360,7 +341,6 @@ os_error (const char *message)
 {
   char errmsg[STRERR_MAXSZ];
   struct iovec iov[5];
-  recursion_check ();
   iov[0].iov_base = (char*) "Operating system error: ";
   iov[0].iov_len = strlen (iov[0].iov_base);
   iov[1].iov_base = gf_strerror (errno, errmsg, STRERR_MAXSZ);
@@ -388,7 +368,6 @@ runtime_error (const char *message, ...)
   va_list ap;
   int written;
 
-  recursion_check ();
   iov[0].iov_base = (char*) "Fortran runtime error: ";
   iov[0].iov_len = strlen (iov[0].iov_base);
   va_start (ap, message);
@@ -417,7 +396,6 @@ runtime_error_at (const char *where, const char *message, ...)
   struct iovec iov[4];
   int written;
 
-  recursion_check ();
   iov[0].iov_base = (char*) where;
   iov[0].iov_len = strlen (where);
   iov[1].iov_base = (char*) "\nFortran runtime error: ";
@@ -473,7 +451,6 @@ internal_error (st_parameter_common *cmp, const char *message)
 {
   struct iovec iov[3];
 
-  recursion_check ();
   show_locus (cmp);
   iov[0].iov_base = (char*) "Internal Error: ";
   iov[0].iov_len = strlen (iov[0].iov_base);
@@ -674,7 +651,6 @@ generate_error_common (st_parameter_common *cmp, int family, const char *message
   /* Return code, caller is responsible for terminating
    the program if necessary.  */
 
-  recursion_check ();
   show_locus (cmp);
   struct iovec iov[3];
   iov[0].iov_base = (char*) "Fortran runtime error: ";
@@ -766,7 +742,6 @@ notify_std (st_parameter_common *cmp, int std, const char * message)
 
   if (!warning)
     {
-      recursion_check ();
       show_locus (cmp);
       iov[0].iov_base = (char*) "Fortran runtime error: ";
       iov[0].iov_len = strlen (iov[0].iov_base);
diff --git a/libgfortran/runtime/minimal.c b/libgfortran/runtime/minimal.c
index b6d26fd2228..44ebd99af07 100644
--- a/libgfortran/runtime/minimal.c
+++ b/libgfortran/runtime/minimal.c
@@ -43,24 +43,6 @@ options_t options;
 static int argc_save;
 static char **argv_save;
 
-/* recursion_check()-- It's possible for additional errors to occur
- * during fatal error processing.  We detect this condition here and
- * exit with code 4 immediately. */
-
-#define MAGIC 0x20DE8101
-
-static void
-recursion_check (void)
-{
-  static int magic = 0;
-
-  /* Don't even try to print something at this point */
-  if (magic == MAGIC)
-    sys_abort ();
-
-  magic = MAGIC;
-}
-
 
 /* os_error()-- Operating system error.  We get a message from the
  * operating system, show it and leave.  Some operating system errors
@@ -69,7 +51,6 @@ recursion_check (void)
 void
 os_error (const char *message)
 {
-  recursion_check ();
   printf ("Operating system error: ");
   printf ("%s\n", message);
   exit (1);
@@ -85,7 +66,6 @@ runtime_error (const char *message, ...)
 {
   va_list ap;
 
-  recursion_check ();
   printf ("Fortran runtime error: ");
   va_start (ap, message);
   vprintf (message, ap);
@@ -103,7 +83,6 @@ runtime_error_at (const char *where, const char *message, ...)
 {
   va_list ap;
 
-  recursion_check ();
   printf ("%s", where);
   printf ("\nFortran runtime error: ");
   va_start (ap, message);
@@ -136,7 +115,6 @@ iexport(runtime_warning_at);
 void
 internal_error (st_parameter_common *cmp, const char *message)
 {
-  recursion_check ();
   printf ("Internal Error: ");
   printf ("%s", message);
   printf ("\n");
--
2.17.1

Reply | Threaded
Open this post in threaded view
|

Re: [PATCH, libgfortran] Remove recursion check

Thomas Koenig-6
Hi Janne,

> libgfortran has a recursion check in the error handling paths.  This
> works by checking the value of a static variable, and if it matches,
> aborting immediately instead of continuing error processing.
> Unfortunately, in a multi-threaded program, if two threads report an
> error at the same time, this can trigger this check, and thus the
> underlying reason for the error is never reported.

I agree that this is a problem that should be addressed.  Hm...

What do you think, would it be desirable / possible to keep the
recursive error check, but ensure thread safety by a suitable
locking mechanism?  As a first step, we should probably specify
what exactly we would like to happen.

Let us assume that the aim is to keep the current behavior for normal
processes and allow concurrent error processing for multiple threads.

This could be done by making the static variable thread-local.
I'm not sure that this would work reliably, or if some sort of
locking mechanism (probably involving PTHREAD_MUTEX_RECURSIVE)
would be required - only have one thread at a time process
an error.

Do we actually want to keep the current behavior for non-threaded
programs?  I'd be in favor, but I do not feel strongly about that.

Regards

        Thomas
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH, libgfortran] Remove recursion check

Janne Blomqvist-3
On Mon, Sep 24, 2018 at 7:48 PM Thomas Koenig <[hidden email]> wrote:

> Hi Janne,
>
> > libgfortran has a recursion check in the error handling paths.  This
> > works by checking the value of a static variable, and if it matches,
> > aborting immediately instead of continuing error processing.
> > Unfortunately, in a multi-threaded program, if two threads report an
> > error at the same time, this can trigger this check, and thus the
> > underlying reason for the error is never reported.
>
> I agree that this is a problem that should be addressed.  Hm...
>
> What do you think, would it be desirable / possible to keep the
> recursive error check, but ensure thread safety by a suitable
> locking mechanism?  As a first step, we should probably specify
> what exactly we would like to happen.
>
> Let us assume that the aim is to keep the current behavior for normal
> processes and allow concurrent error processing for multiple threads.
>
> This could be done by making the static variable thread-local.
> I'm not sure that this would work reliably, or if some sort of
> locking mechanism (probably involving PTHREAD_MUTEX_RECURSIVE)
> would be required - only have one thread at a time process
> an error.
>
> Do we actually want to keep the current behavior for non-threaded
> programs?  I'd be in favor, but I do not feel strongly about that.
>

What I briefly was thinking about doing, was to use TLS. Or rather, since
TLS is not supported on all targets, something like I did in
intrinsics/random.c:get_rand_state(). But, since the error handling stuff
should be async-signal-safe, the allocation in the setup path should be
done separately, e.g. as part of library initialization.

In the end I decided against it because

1) It's more complicated, in order to handle a quite unlikely edge case.

2) If recursion happens anyway, it's a bug in our error handling flow which
should be fixed and not be papered over.

Anyway, I'm not massively against this, if people have any particular
opinion lets hear it.

--
Janne Blomqvist
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH, libgfortran] Remove recursion check

jerry DeLisle-3
On 10/6/18 10:00 AM, Janne Blomqvist wrote:

> On Mon, Sep 24, 2018 at 10:18 PM Janne Blomqvist <[hidden email]>
> wrote:
>
>> On Mon, Sep 24, 2018 at 7:48 PM Thomas Koenig <[hidden email]>
>> wrote:
>>
>>> Hi Janne,
>>>
>>>> libgfortran has a recursion check in the error handling paths.  This
>>>> works by checking the value of a static variable, and if it matches,
>>>> aborting immediately instead of continuing error processing.
>>>> Unfortunately, in a multi-threaded program, if two threads report an
>>>> error at the same time, this can trigger this check, and thus the
>>>> underlying reason for the error is never reported.
>>>
>>> I agree that this is a problem that should be addressed.  Hm...
>>>
>>> What do you think, would it be desirable / possible to keep the
>>> recursive error check, but ensure thread safety by a suitable
>>> locking mechanism?  As a first step, we should probably specify
>>> what exactly we would like to happen.
>>>
>>> Let us assume that the aim is to keep the current behavior for normal
>>> processes and allow concurrent error processing for multiple threads.
>>>
>>> This could be done by making the static variable thread-local.
>>> I'm not sure that this would work reliably, or if some sort of
>>> locking mechanism (probably involving PTHREAD_MUTEX_RECURSIVE)
>>> would be required - only have one thread at a time process
>>> an error.
>>>
>>> Do we actually want to keep the current behavior for non-threaded
>>> programs?  I'd be in favor, but I do not feel strongly about that.
>>>
>>
>> What I briefly was thinking about doing, was to use TLS. Or rather, since
>> TLS is not supported on all targets, something like I did in
>> intrinsics/random.c:get_rand_state(). But, since the error handling stuff
>> should be async-signal-safe, the allocation in the setup path should be
>> done separately, e.g. as part of library initialization.
>>
>> In the end I decided against it because
>>
>> 1) It's more complicated, in order to handle a quite unlikely edge case.
>>
>> 2) If recursion happens anyway, it's a bug in our error handling flow
>> which should be fixed and not be papered over.
>>
>> Anyway, I'm not massively against this, if people have any particular
>> opinion lets hear it.
>>
>
> PING! Any opinions on the above?
>

Agree it should be fixed. I would think a mutex lock should work. Lock,
issue error message, unlock. Even if there is recursion, since there is
at least one error somewhere, It should be OK to pause the world to
issue the error, and come back and finish. AS I think about it, maybe
one global lock variable so that only one thread can get it at a time
and others wait until it is unlcoked to proceed with their lock and error.

Would there be a problem with some sort of endless loop waiting for the
unlock?

Jerry
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH, libgfortran] Remove recursion check

Thomas Koenig-6
Hi Janne,

> The error handling functions can be called from a signal handler, so they
> need to be async-signal-safe.

I didn't know that. How can this happen?

Regards

        Thomas
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH, libgfortran] Remove recursion check

Janne Blomqvist-3
On Sun, Oct 7, 2018 at 12:36 AM Thomas Koenig <[hidden email]> wrote:

> Hi Janne,
>
> > The error handling functions can be called from a signal handler, so they
> > need to be async-signal-safe.
>
> I didn't know that. How can this happen?
>

 Hmm, seems I was imagining things, I can't find anything like that in the
code.

Anyway, the more I think about it the better I like my original patch

1) It's KISS

2) I can't find anything in the code that would lead to endless recursive
invocation of the error printing functions.

So, Ok for trunk?


--
Janne Blomqvist
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH, libgfortran] Remove recursion check

Thomas Koenig-6
Hi Janne,

> 1) It's KISS
>
> 2) I can't find anything in the code that would lead to endless
> recursive invocation of the error printing functions.
>
> So, Ok for trunk?

With async I/O, I think the possibilities of hitting
concurrent errors have increased, so I'd still prefer the
solution with wrapping a recursive lock around error handling.

Regards

        Thomas
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH,fortran] Make recursion_check work for multiple threads

Janne Blomqvist-3
On Fri, Nov 23, 2018 at 10:24 PM Jerry DeLisle <[hidden email]>
wrote:

> On 11/23/18 11:58 AM, Janne Blomqvist wrote:
> > With multiple threads, using an unprotected static variable to check
> > whether recursion has occured isn't valid, as one thread might have
> > modified the variable, thus causing another thread to incorrectly
> > conclude that recursion has occured.  This patch avoids this problem
> > by using a thread-specific variable for the recursion check.
> >
> > Regtested on x86_64-pc-linux-gnu, Ok for trunk?
> >
>
> OK and thanks Janne.
>

Thanks for the quick review, committed as r266419.


--
Janne Blomqvist