[PATCH] avoid warning on constant strncpy until next statement is reachable (PR 87028)

Martin Sebor-2
The warning suppression for -Wstringop-truncation looks for
the next statement after a truncating strncpy to see if it
adds a terminating nul.  This only works when the next
statement can be reached using the Gimple statement iterator
which isn't until after gimplification.  As a result, strncpy
calls that truncate their constant argument that are being
folded to memcpy this early get diagnosed even if they are
followed by the nul assignment:

   const char s[] = "12345";
   char d[3];

   void f (void)
   {
     strncpy (d, s, sizeof d - 1);   // -Wstringop-truncation
     d[sizeof d - 1] = 0;
   }
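The explicit nul store matters because of how `strncpy` behaves when the source is longer than the bound. In that case it writes no terminating nul; it pads with nuls only when the source is shorter. The sketch below is illustrative only and its function name is made up. It fills a buffer with a marker byte, copies into it using the same kind of bound as `f` above, and reports whether `strncpy` left any nul behind.

```c
#include <stddef.h>
#include <string.h>

/* Fill BUF (SIZE bytes) with 'x', copy SRC into it with the same kind
   of bound as f() above (SIZE - 1), and return 1 if strncpy itself
   left a nul anywhere in BUF, 0 otherwise.  */
int strncpy_terminates (char *buf, size_t size, const char *src)
{
  memset (buf, 'x', size);
  strncpy (buf, src, size - 1);
  return memchr (buf, '\0', size) != NULL;
}
```

With `"12345"` and a 3-byte buffer, the result is `'1', '2', 'x'` with no nul anywhere. That is exactly the gap `d[sizeof d - 1] = 0;` fills, and it is why the warning is meant to stay quiet when that store follows the call.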

To avoid the warning I propose to defer folding strncpy to
memcpy until the pointer to the basic block the strncpy call
is in can be used to try to reach the next statement (this
happens as early as ccp1).  I'm aware of the preference to
fold things early but in the case of strncpy (a relatively
rarely used function that is often misused), getting
the warning right while folding a bit later but still fairly
early on seems like a reasonable compromise.  I fear that
otherwise, the false positives will drive users to adopt
other unsafe solutions (like memcpy) where these kinds of
bugs cannot be as readily detected.
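A toy model of the suppression described above may help. The types and names below are hypothetical and far simpler than GIMPLE. In particular, the real check in `maybe_diag_stxncpy_trunc` also compares base offsets, while this toy ignores them. The only point is that the check needs the statement *after* the call. When that statement is not yet part of the sequence, as during gimplification, the warning cannot be suppressed.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified statement model; not GCC's GIMPLE.  */
enum toy_code { TOY_STRNCPY, TOY_STORE };

struct toy_stmt
{
  enum toy_code code;
  int base;   /* Identifier of the destination object.  */
  int value;  /* TOY_STORE: the value stored; unused for TOY_STRNCPY.  */
};

/* Return true if the truncating strncpy at STMTS[I] should be
   diagnosed, i.e., unless the statement right after it stores a nul
   into the same destination object.  If there is no next statement
   yet (I is the last one available), the suppression cannot apply.  */
bool toy_should_warn (const struct toy_stmt *stmts, size_t n, size_t i)
{
  const struct toy_stmt *call = &stmts[i];
  if (i + 1 >= n)
    return true;

  const struct toy_stmt *next = &stmts[i + 1];
  bool terminates = (next->code == TOY_STORE
                     && next->base == call->base
                     && next->value == 0);
  return !terminates;
}
```

Calling `toy_should_warn (seq, 1, 0)` on a sequence whose nul store has not been appended yet warns. Calling it with `n == 2` after the store is in place does not warn. Deferring the fold is meant to produce that second situation.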

Tested on x86_64-linux.

Martin

PS There still are outstanding cases where the warning can
be avoided.  I xfailed them in the test for now but will
still try to get them to work for GCC 9.

gcc-87028.diff (6K)

Re: [PATCH] avoid warning on constant strncpy until next statement is reachable (PR 87028)

Jeff Law
On 08/24/2018 09:58 AM, Martin Sebor wrote:

> The warning suppression for -Wstringop-truncation looks for
> the next statement after a truncating strncpy to see if it
> adds a terminating nul.  This only works when the next
> statement can be reached using the Gimple statement iterator
> which isn't until after gimplification.  As a result, strncpy
> calls that truncate their constant argument that are being
> folded to memcpy this early get diagnosed even if they are
> followed by the nul assignment:
>
>   const char s[] = "12345";
>   char d[3];
>
>   void f (void)
>   {
>     strncpy (d, s, sizeof d - 1);   // -Wstringop-truncation
>     d[sizeof d - 1] = 0;
>   }
>
> To avoid the warning I propose to defer folding strncpy to
> memcpy until the pointer to the basic block the strncpy call
> is in can be used to try to reach the next statement (this
> happens as early as ccp1).  I'm aware of the preference to
> fold things early but in the case of strncpy (a relatively
> rarely used function that is often misused), getting
> the warning right while folding a bit later but still fairly
> early on seems like a reasonable compromise.  I fear that
> otherwise, the false positives will drive users to adopt
> other unsafe solutions (like memcpy) where these kinds of
> bugs cannot be as readily detected.
>
> Tested on x86_64-linux.
>
> Martin
>
> PS There still are outstanding cases where the warning can
> be avoided.  I xfailed them in the test for now but will
> still try to get them to work for GCC 9.
>
> gcc-87028.diff
>
>
> PR tree-optimization/87028 - false positive -Wstringop-truncation strncpy with global variable source string
> gcc/ChangeLog:
>
>       PR tree-optimization/87028
>       * gimple-fold.c (gimple_fold_builtin_strncpy): Avoid folding when
>       statement doesn't belong to a basic block.
>       * tree-ssa-strlen.c (maybe_diag_stxncpy_trunc): Handle MEM_REF on
>       the left hand side of assignment.
>
> gcc/testsuite/ChangeLog:
>
>       PR tree-optimization/87028
>       * c-c++-common/Wstringop-truncation.c: Remove xfails.
>       * gcc.dg/Wstringop-truncation-5.c: New test.
>
> diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c
> index 07341eb..284c2fb 100644
> --- a/gcc/gimple-fold.c
> +++ b/gcc/gimple-fold.c
> @@ -1702,6 +1702,11 @@ gimple_fold_builtin_strncpy (gimple_stmt_iterator *gsi,
>    if (tree_int_cst_lt (ssize, len))
>      return false;
>  
> +  /* Defer warning (and folding) until the next statement in the basic
> +     block is reachable.  */
> +  if (!gimple_bb (stmt))
> +    return false;
I think you want cfun->cfg as the test here.  They should be equivalent
in practice.


> diff --git a/gcc/tree-ssa-strlen.c b/gcc/tree-ssa-strlen.c
> index d0792aa..f1988f6 100644
> --- a/gcc/tree-ssa-strlen.c
> +++ b/gcc/tree-ssa-strlen.c
> @@ -1981,6 +1981,23 @@ maybe_diag_stxncpy_trunc (gimple_stmt_iterator gsi, tree src, tree cnt)
>         && known_eq (dstoff, lhsoff)
>         && operand_equal_p (dstbase, lhsbase, 0))
>       return false;
> +
> +      if (code == MEM_REF
> +       && TREE_CODE (lhsbase) == SSA_NAME
> +       && known_eq (dstoff, lhsoff))
> +     {
> +       /* Extract the referenced variable from something like
> +            MEM[(char *)d_3(D) + 3B] = 0;  */
> +       gimple *def = SSA_NAME_DEF_STMT (lhsbase);
> +       if (gimple_nop_p (def))
> +         {
> +           lhsbase = SSA_NAME_VAR (lhsbase);
> +           if (lhsbase
> +               && dstbase
> +               && operand_equal_p (dstbase, lhsbase, 0))
> +             return false;
> +         }
> +     }
If you find yourself looking at SSA_NAME_VAR, you're usually barking up
the wrong tree.  It'd be easier to suggest something here if I could see
the gimple (with virtual operands).  But at some level what you really
want to do is make sure the base of the MEM_REF is the same as what got
passed as the destination of the strncpy.  You'd want to be testing
SSA_NAMEs in that case.

Jeff

Re: [PATCH] avoid warning on constant strncpy until next statement is reachable (PR 87028)

Richard Biener-2
On Sun, Aug 26, 2018 at 7:26 AM Jeff Law <[hidden email]> wrote:

>
> On 08/24/2018 09:58 AM, Martin Sebor wrote:
> > The warning suppression for -Wstringop-truncation looks for
> > the next statement after a truncating strncpy to see if it
> > adds a terminating nul.  This only works when the next
> > statement can be reached using the Gimple statement iterator
> > which isn't until after gimplification.  As a result, strncpy
> > calls that truncate their constant argument that are being
> > folded to memcpy this early get diagnosed even if they are
> > followed by the nul assignment:
> >
> >   const char s[] = "12345";
> >   char d[3];
> >
> >   void f (void)
> >   {
> >     strncpy (d, s, sizeof d - 1);   // -Wstringop-truncation
> >     d[sizeof d - 1] = 0;
> >   }
> >
> > To avoid the warning I propose to defer folding strncpy to
> > memcpy until the pointer to the basic block the strncpy call
> > is in can be used to try to reach the next statement (this
> > happens as early as ccp1).  I'm aware of the preference to
> > fold things early but in the case of strncpy (a relatively
> > rarely used function that is often misused), getting
> > the warning right while folding a bit later but still fairly
> > early on seems like a reasonable compromise.  I fear that
> > otherwise, the false positives will drive users to adopt
> > other unsafe solutions (like memcpy) where these kinds of
> > bugs cannot be as readily detected.
> >
> > Tested on x86_64-linux.
> >
> > Martin
> >
> > PS There still are outstanding cases where the warning can
> > be avoided.  I xfailed them in the test for now but will
> > still try to get them to work for GCC 9.
> >
> > gcc-87028.diff
> >
> >
> > PR tree-optimization/87028 - false positive -Wstringop-truncation strncpy with global variable source string
> > gcc/ChangeLog:
> >
> >       PR tree-optimization/87028
> >       * gimple-fold.c (gimple_fold_builtin_strncpy): Avoid folding when
> >       statement doesn't belong to a basic block.
> >       * tree-ssa-strlen.c (maybe_diag_stxncpy_trunc): Handle MEM_REF on
> >       the left hand side of assignment.
> >
> > gcc/testsuite/ChangeLog:
> >
> >       PR tree-optimization/87028
> >       * c-c++-common/Wstringop-truncation.c: Remove xfails.
> >       * gcc.dg/Wstringop-truncation-5.c: New test.
> >
> > diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c
> > index 07341eb..284c2fb 100644
> > --- a/gcc/gimple-fold.c
> > +++ b/gcc/gimple-fold.c
> > @@ -1702,6 +1702,11 @@ gimple_fold_builtin_strncpy (gimple_stmt_iterator *gsi,
> >    if (tree_int_cst_lt (ssize, len))
> >      return false;
> >
> > +  /* Defer warning (and folding) until the next statement in the basic
> > +     block is reachable.  */
> > +  if (!gimple_bb (stmt))
> > +    return false;
> I think you want cfun->cfg as the test here.  They should be equivalent
> in practice.

Please do not add 'cfun' references.  Note that the next stmt is also accessible
when there is no CFG.  I guess the issue is that we fold this during
gimplification where the next stmt is not yet "there" (but still in GENERIC)?

We generally do not want to have unfolded stmts in the IL when we can avoid that
which is why we fold most stmts during gimplification.  We also do that because
we now do less folding on GENERIC.

There may be the possibility to refactor gimplification time folding to what we
do during inlining - queue stmts we want to fold and perform all
folding delayed.
This of course means bigger compile-time due to cache effects.
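The queue-and-fold-later idea might look roughly like the sketch below. Everything in it is hypothetical, and GCC has no such types. It only shows the shape of the idea: statements are recorded as they are gimplified and folded in one batch once the whole sequence exists, including each statement's successor.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical statement type: only records whether it was folded.  */
struct toy_queued_stmt { int id; bool folded; };

#define TOY_QUEUE_MAX 64

/* Statements recorded during "gimplification", folded later as a batch.  */
struct toy_fold_queue
{
  struct toy_queued_stmt *stmts[TOY_QUEUE_MAX];
  size_t n;
};

/* Record STMT instead of folding it right away.  Returns false if the
   queue is full, in which case a caller would fold immediately.  */
bool toy_queue_push (struct toy_fold_queue *q, struct toy_queued_stmt *stmt)
{
  if (q->n == TOY_QUEUE_MAX)
    return false;
  q->stmts[q->n++] = stmt;
  return true;
}

/* Example folder: just marks the statement as folded.  */
void toy_mark_folded (struct toy_queued_stmt *stmt)
{
  stmt->folded = true;
}

/* Fold every queued statement in order, then empty the queue.  In the
   scenario this models, each statement's successor is in place by now,
   so a folder that inspects the next statement could see it.  */
void toy_queue_flush (struct toy_fold_queue *q,
                      void (*fold) (struct toy_queued_stmt *))
{
  for (size_t i = 0; i < q->n; i++)
    fold (q->stmts[i]);
  q->n = 0;
}
```

The compile-time cost mentioned above is visible even in this toy. Each queued statement is touched twice, once when recorded and again when folded, instead of once while it is presumably still hot in the cache.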

>
> > diff --git a/gcc/tree-ssa-strlen.c b/gcc/tree-ssa-strlen.c
> > index d0792aa..f1988f6 100644
> > --- a/gcc/tree-ssa-strlen.c
> > +++ b/gcc/tree-ssa-strlen.c
> > @@ -1981,6 +1981,23 @@ maybe_diag_stxncpy_trunc (gimple_stmt_iterator gsi, tree src, tree cnt)
> >         && known_eq (dstoff, lhsoff)
> >         && operand_equal_p (dstbase, lhsbase, 0))
> >       return false;
> > +
> > +      if (code == MEM_REF
> > +       && TREE_CODE (lhsbase) == SSA_NAME
> > +       && known_eq (dstoff, lhsoff))
> > +     {
> > +       /* Extract the referenced variable from something like
> > +            MEM[(char *)d_3(D) + 3B] = 0;  */
> > +       gimple *def = SSA_NAME_DEF_STMT (lhsbase);
> > +       if (gimple_nop_p (def))
> > +         {
> > +           lhsbase = SSA_NAME_VAR (lhsbase);
> > +           if (lhsbase
> > +               && dstbase
> > +               && operand_equal_p (dstbase, lhsbase, 0))
> > +             return false;
> > +         }
> > +     }
> If you find yourself looking at SSA_NAME_VAR, you're usually barking up
> the wrong tree.  It'd be easier to suggest something here if I could see
> the gimple (with virtual operands).  But at some level what you really
> want to do is make sure the base of the MEM_REF is the same as what got
> passed as the destination of the strncpy.  You'd want to be testing
> SSA_NAMEs in that case.

Yes.  Why not simply compare the SSA names?  Why would it not be
OK to do that when !lhsbase?

Richard.

>
> Jeff

Re: [PATCH] avoid warning on constant strncpy until next statement is reachable (PR 87028)

Jeff Law
On 08/27/2018 02:29 AM, Richard Biener wrote:

> On Sun, Aug 26, 2018 at 7:26 AM Jeff Law <[hidden email]> wrote:
>>
>> On 08/24/2018 09:58 AM, Martin Sebor wrote:
>>> The warning suppression for -Wstringop-truncation looks for
>>> the next statement after a truncating strncpy to see if it
>>> adds a terminating nul.  This only works when the next
>>> statement can be reached using the Gimple statement iterator
>>> which isn't until after gimplification.  As a result, strncpy
>>> calls that truncate their constant argument that are being
>>> folded to memcpy this early get diagnosed even if they are
>>> followed by the nul assignment:
>>>
>>>   const char s[] = "12345";
>>>   char d[3];
>>>
>>>   void f (void)
>>>   {
>>>     strncpy (d, s, sizeof d - 1);   // -Wstringop-truncation
>>>     d[sizeof d - 1] = 0;
>>>   }
>>>
>>> To avoid the warning I propose to defer folding strncpy to
>>> memcpy until the pointer to the basic block the strncpy call
>>> is in can be used to try to reach the next statement (this
>>> happens as early as ccp1).  I'm aware of the preference to
>>> fold things early but in the case of strncpy (a relatively
>>> rarely used function that is often misused), getting
>>> the warning right while folding a bit later but still fairly
>>> early on seems like a reasonable compromise.  I fear that
>>> otherwise, the false positives will drive users to adopt
>>> other unsafe solutions (like memcpy) where these kinds of
>>> bugs cannot be as readily detected.
>>>
>>> Tested on x86_64-linux.
>>>
>>> Martin
>>>
>>> PS There still are outstanding cases where the warning can
>>> be avoided.  I xfailed them in the test for now but will
>>> still try to get them to work for GCC 9.
>>>
>>> gcc-87028.diff
>>>
>>>
>>> PR tree-optimization/87028 - false positive -Wstringop-truncation strncpy with global variable source string
>>> gcc/ChangeLog:
>>>
>>>       PR tree-optimization/87028
>>>       * gimple-fold.c (gimple_fold_builtin_strncpy): Avoid folding when
>>>       statement doesn't belong to a basic block.
>>>       * tree-ssa-strlen.c (maybe_diag_stxncpy_trunc): Handle MEM_REF on
>>>       the left hand side of assignment.
>>>
>>> gcc/testsuite/ChangeLog:
>>>
>>>       PR tree-optimization/87028
>>>       * c-c++-common/Wstringop-truncation.c: Remove xfails.
>>>       * gcc.dg/Wstringop-truncation-5.c: New test.
>>>
>>> diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c
>>> index 07341eb..284c2fb 100644
>>> --- a/gcc/gimple-fold.c
>>> +++ b/gcc/gimple-fold.c
>>> @@ -1702,6 +1702,11 @@ gimple_fold_builtin_strncpy (gimple_stmt_iterator *gsi,
>>>    if (tree_int_cst_lt (ssize, len))
>>>      return false;
>>>
>>> +  /* Defer warning (and folding) until the next statement in the basic
>>> +     block is reachable.  */
>>> +  if (!gimple_bb (stmt))
>>> +    return false;
>> I think you want cfun->cfg as the test here.  They should be equivalent
>> in practice.
>
> Please do not add 'cfun' references.  Note that the next stmt is also accessible
> when there is no CFG.  I guess the issue is that we fold this during
> gimplification where the next stmt is not yet "there" (but still in GENERIC)?
That was my assumption.  I almost suggested peeking at gsi_next and
avoiding folding in that case.

>
> We generally do not want to have unfolded stmts in the IL when we can avoid that
> which is why we fold most stmts during gimplification.  We also do that because
> we now do less folding on GENERIC.
But an unfolded call in the IL should always be safe and we've got
plenty of opportunities to fold it later.

Jeff

Re: [PATCH] avoid warning on constant strncpy until next statement is reachable (PR 87028)

Richard Biener-2
On Mon, Aug 27, 2018 at 5:32 PM Jeff Law <[hidden email]> wrote:

>
> On 08/27/2018 02:29 AM, Richard Biener wrote:
> > On Sun, Aug 26, 2018 at 7:26 AM Jeff Law <[hidden email]> wrote:
> >>
> >> On 08/24/2018 09:58 AM, Martin Sebor wrote:
> >>> The warning suppression for -Wstringop-truncation looks for
> >>> the next statement after a truncating strncpy to see if it
> >>> adds a terminating nul.  This only works when the next
> >>> statement can be reached using the Gimple statement iterator
> >>> which isn't until after gimplification.  As a result, strncpy
> >>> calls that truncate their constant argument that are being
> >>> folded to memcpy this early get diagnosed even if they are
> >>> followed by the nul assignment:
> >>>
> >>>   const char s[] = "12345";
> >>>   char d[3];
> >>>
> >>>   void f (void)
> >>>   {
> >>>     strncpy (d, s, sizeof d - 1);   // -Wstringop-truncation
> >>>     d[sizeof d - 1] = 0;
> >>>   }
> >>>
> >>> To avoid the warning I propose to defer folding strncpy to
> >>> memcpy until the pointer to the basic block the strncpy call
> >>> is in can be used to try to reach the next statement (this
> >>> happens as early as ccp1).  I'm aware of the preference to
> >>> fold things early but in the case of strncpy (a relatively
> >>> rarely used function that is often misused), getting
> >>> the warning right while folding a bit later but still fairly
> >>> early on seems like a reasonable compromise.  I fear that
> >>> otherwise, the false positives will drive users to adopt
> >>> other unsafe solutions (like memcpy) where these kinds of
> >>> bugs cannot be as readily detected.
> >>>
> >>> Tested on x86_64-linux.
> >>>
> >>> Martin
> >>>
> >>> PS There still are outstanding cases where the warning can
> >>> be avoided.  I xfailed them in the test for now but will
> >>> still try to get them to work for GCC 9.
> >>>
> >>> gcc-87028.diff
> >>>
> >>>
> >>> PR tree-optimization/87028 - false positive -Wstringop-truncation strncpy with global variable source string
> >>> gcc/ChangeLog:
> >>>
> >>>       PR tree-optimization/87028
> >>>       * gimple-fold.c (gimple_fold_builtin_strncpy): Avoid folding when
> >>>       statement doesn't belong to a basic block.
> >>>       * tree-ssa-strlen.c (maybe_diag_stxncpy_trunc): Handle MEM_REF on
> >>>       the left hand side of assignment.
> >>>
> >>> gcc/testsuite/ChangeLog:
> >>>
> >>>       PR tree-optimization/87028
> >>>       * c-c++-common/Wstringop-truncation.c: Remove xfails.
> >>>       * gcc.dg/Wstringop-truncation-5.c: New test.
> >>>
> >>> diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c
> >>> index 07341eb..284c2fb 100644
> >>> --- a/gcc/gimple-fold.c
> >>> +++ b/gcc/gimple-fold.c
> >>> @@ -1702,6 +1702,11 @@ gimple_fold_builtin_strncpy (gimple_stmt_iterator *gsi,
> >>>    if (tree_int_cst_lt (ssize, len))
> >>>      return false;
> >>>
> >>> +  /* Defer warning (and folding) until the next statement in the basic
> >>> +     block is reachable.  */
> >>> +  if (!gimple_bb (stmt))
> >>> +    return false;
> >> I think you want cfun->cfg as the test here.  They should be equivalent
> >> in practice.
> >
> > Please do not add 'cfun' references.  Note that the next stmt is also accessible
> > when there is no CFG.  I guess the issue is that we fold this during
> > gimplification where the next stmt is not yet "there" (but still in GENERIC)?
> That was my assumption.  I almost suggested peeking at gsi_next and
> avoiding folding in that case.

So I'd rather add guards to maybe_fold_stmt in the gimplifier then.

> >
> > We generally do not want to have unfolded stmts in the IL when we can avoid that
> > which is why we fold most stmts during gimplification.  We also do that because
> > we now do less folding on GENERIC.
> But an unfolded call in the IL should always be safe and we've got
> plenty of opportunities to fold it later.

Well - we do.  The very first one is forwprop though, which means we'll miss
rewriting some memcpy parts into SSA:

          NEXT_PASS (pass_ccp, false /* nonzero_p */);
          /* After CCP we rewrite no longer addressed locals into SSA
             form if possible.  */
          NEXT_PASS (pass_forwprop);

likewise early object-size will be confused by memcpy calls that just exist
to avoid TBAA issues (another of our recommendations besides using unions).
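For readers unfamiliar with the idiom, the `memcpy` calls that "exist to avoid TBAA issues" are typically type-punning copies like the first function below. The second function shows the union alternative. This is an illustrative sketch that assumes a 32-bit IEEE 754 `float`, and the function names are made up. Once GCC folds a fixed-size `memcpy` like this into a plain load and store, the locals no longer need to be addressable and can become candidates for rewriting into SSA. That is the benefit of folding early.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret the bits of F as a uint32_t by copying its object
   representation; memcpy is not subject to type-based aliasing rules.
   Assumes float is 32 bits.  */
uint32_t float_bits (float f)
{
  uint32_t u;
  _Static_assert (sizeof u == sizeof f, "float must be 32 bits");
  memcpy (&u, &f, sizeof u);
  return u;
}

/* The union alternative: write one member, read the other.  GCC's
   -fstrict-aliasing documentation allows this kind of type punning
   as long as the access goes through the union type.  */
uint32_t float_bits_union (float f)
{
  union { float f; uint32_t u; } pun = { .f = f };
  return pun.u;
}
```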

We do fold mem* early for a reason ;)

"We can always do warnings earlier" would be a similar true sentence.

Both come at a cost.  You know I usually declare GCC to be an
optimizing compiler and not a static analysis engine ;)  So I'm not
very convinced when I see folding disabled or delayed here and there
to catch some false negatives for -Wxyz.

We need to work out a plan rather than throwing sticks here and there.

Richard.

>
> Jeff

Re: [PATCH] avoid warning on constant strncpy until next statement is reachable (PR 87028)

Martin Sebor-2
In reply to this post by Richard Biener-2
On 08/27/2018 02:29 AM, Richard Biener wrote:

> On Sun, Aug 26, 2018 at 7:26 AM Jeff Law <[hidden email]> wrote:
>>
>> On 08/24/2018 09:58 AM, Martin Sebor wrote:
>>> The warning suppression for -Wstringop-truncation looks for
>>> the next statement after a truncating strncpy to see if it
>>> adds a terminating nul.  This only works when the next
>>> statement can be reached using the Gimple statement iterator
>>> which isn't until after gimplification.  As a result, strncpy
>>> calls that truncate their constant argument that are being
>>> folded to memcpy this early get diagnosed even if they are
>>> followed by the nul assignment:
>>>
>>>   const char s[] = "12345";
>>>   char d[3];
>>>
>>>   void f (void)
>>>   {
>>>     strncpy (d, s, sizeof d - 1);   // -Wstringop-truncation
>>>     d[sizeof d - 1] = 0;
>>>   }
>>>
>>> To avoid the warning I propose to defer folding strncpy to
>>> memcpy until the pointer to the basic block the strncpy call
>>> is in can be used to try to reach the next statement (this
>>> happens as early as ccp1).  I'm aware of the preference to
>>> fold things early but in the case of strncpy (a relatively
>>> rarely used function that is often misused), getting
>>> the warning right while folding a bit later but still fairly
>>> early on seems like a reasonable compromise.  I fear that
>>> otherwise, the false positives will drive users to adopt
>>> other unsafe solutions (like memcpy) where these kinds of
>>> bugs cannot be as readily detected.
>>>
>>> Tested on x86_64-linux.
>>>
>>> Martin
>>>
>>> PS There still are outstanding cases where the warning can
>>> be avoided.  I xfailed them in the test for now but will
>>> still try to get them to work for GCC 9.
>>>
>>> gcc-87028.diff
>>>
>>>
>>> PR tree-optimization/87028 - false positive -Wstringop-truncation strncpy with global variable source string
>>> gcc/ChangeLog:
>>>
>>>       PR tree-optimization/87028
>>>       * gimple-fold.c (gimple_fold_builtin_strncpy): Avoid folding when
>>>       statement doesn't belong to a basic block.
>>>       * tree-ssa-strlen.c (maybe_diag_stxncpy_trunc): Handle MEM_REF on
>>>       the left hand side of assignment.
>>>
>>> gcc/testsuite/ChangeLog:
>>>
>>>       PR tree-optimization/87028
>>>       * c-c++-common/Wstringop-truncation.c: Remove xfails.
>>>       * gcc.dg/Wstringop-truncation-5.c: New test.
>>>
>>> diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c
>>> index 07341eb..284c2fb 100644
>>> --- a/gcc/gimple-fold.c
>>> +++ b/gcc/gimple-fold.c
>>> @@ -1702,6 +1702,11 @@ gimple_fold_builtin_strncpy (gimple_stmt_iterator *gsi,
>>>    if (tree_int_cst_lt (ssize, len))
>>>      return false;
>>>
>>> +  /* Defer warning (and folding) until the next statement in the basic
>>> +     block is reachable.  */
>>> +  if (!gimple_bb (stmt))
>>> +    return false;
>> I think you want cfun->cfg as the test here.  They should be equivalent
>> in practice.
>
> Please do not add 'cfun' references.  Note that the next stmt is also accessible
> when there is no CFG.  I guess the issue is that we fold this during
> gimplification where the next stmt is not yet "there" (but still in GENERIC)?
>
> We generally do not want to have unfolded stmts in the IL when we can avoid that
> which is why we fold most stmts during gimplification.  We also do that because
> we now do less folding on GENERIC.
>
> There may be the possibility to refactor gimplification time folding to what we
> do during inlining - queue stmts we want to fold and perform all
> folding delayed.
> This of course means bigger compile-time due to cache effects.
>
>>
>>> diff --git a/gcc/tree-ssa-strlen.c b/gcc/tree-ssa-strlen.c
>>> index d0792aa..f1988f6 100644
>>> --- a/gcc/tree-ssa-strlen.c
>>> +++ b/gcc/tree-ssa-strlen.c
>>> @@ -1981,6 +1981,23 @@ maybe_diag_stxncpy_trunc (gimple_stmt_iterator gsi, tree src, tree cnt)
>>>         && known_eq (dstoff, lhsoff)
>>>         && operand_equal_p (dstbase, lhsbase, 0))
>>>       return false;
>>> +
>>> +      if (code == MEM_REF
>>> +       && TREE_CODE (lhsbase) == SSA_NAME
>>> +       && known_eq (dstoff, lhsoff))
>>> +     {
>>> +       /* Extract the referenced variable from something like
>>> +            MEM[(char *)d_3(D) + 3B] = 0;  */
>>> +       gimple *def = SSA_NAME_DEF_STMT (lhsbase);
>>> +       if (gimple_nop_p (def))
>>> +         {
>>> +           lhsbase = SSA_NAME_VAR (lhsbase);
>>> +           if (lhsbase
>>> +               && dstbase
>>> +               && operand_equal_p (dstbase, lhsbase, 0))
>>> +             return false;
>>> +         }
>>> +     }
>> If you find yourself looking at SSA_NAME_VAR, you're usually barking up
>> the wrong tree.  It'd be easier to suggest something here if I could see
>> the gimple (with virtual operands).  But at some level what you really
>> want to do is make sure the base of the MEM_REF is the same as what got
>> passed as the destination of the strncpy.  You'd want to be testing
>> SSA_NAMEs in that case.
>
> Yes.  Why not simply compare the SSA names?  Why would it not be
> OK to do that when !lhsbase?

The added code handles this case:

   void f (char *d)
   {
     __builtin_strncpy (d, "12345", 4);
     d[3] = 0;
   }

where during forwprop we see:

   __builtin_strncpy (d_3(D), "12345", 4);
   MEM[(char *)d_3(D) + 3B] = 0;

The next statement after the strncpy is the assignment whose
lhs is a MEM_REF whose operand is an SSA_NAME defined by
a GIMPLE_NOP.  There is no other information in the GIMPLE_NOP
that I can see to tell that the operand is d_3(D) or that it's
the same as the strncpy argument (i.e., the PARAM_DECL d).  Having to
open-code this all the time seems so cumbersome -- is
there some API that would do this for me?  (I thought
get_addr_base_and_unit_offset was that API but clearly in
this case it doesn't do what I expect -- it just returns
the argument.)
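The reviewers' suggestion is to compare the SSA names directly rather than going through `SSA_NAME_VAR`. A toy model with made-up types illustrates why. In GCC each SSA name, such as `d_3(D)`, is a single shared node, so the `d_3(D)` in the `MEM_REF` and the one passed to `strncpy` can be compared by identity. Comparing only the underlying variable would also match other versions of `d`, which may hold different pointer values.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for an SSA name such as d_3(D): the underlying
   user variable plus a version number.  */
struct toy_ssa_name { const char *var; unsigned version; };

/* Hypothetical stand-in for MEM[(char *)BASE + OFFSET B].  */
struct toy_mem_ref { const struct toy_ssa_name *base; long offset; };

/* Compare the SSA names themselves: each name is one object, so
   identity implies the same pointer value.  */
bool toy_same_ssa_name (const struct toy_mem_ref *ref,
                        const struct toy_ssa_name *dst)
{
  return ref->base == dst;
}

/* Compare only the underlying variables, roughly what going through
   SSA_NAME_VAR amounts to: distinct versions of d (say d_3 and d_5
   after d++) compare equal even though they may point elsewhere.  */
bool toy_same_var (const struct toy_mem_ref *ref,
                   const struct toy_ssa_name *dst)
{
  return strcmp (ref->base->var, dst->var) == 0;
}
```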

Martin

Re: [PATCH] avoid warning on constant strncpy until next statement is reachable (PR 87028)

Martin Sebor-2
In reply to this post by Jeff Law
On 08/25/2018 11:24 PM, Jeff Law wrote:

> On 08/24/2018 09:58 AM, Martin Sebor wrote:
>> The warning suppression for -Wstringop-truncation looks for
>> the next statement after a truncating strncpy to see if it
>> adds a terminating nul.  This only works when the next
>> statement can be reached using the Gimple statement iterator
>> which isn't until after gimplification.  As a result, strncpy
>> calls that truncate their constant argument that are being
>> folded to memcpy this early get diagnosed even if they are
>> followed by the nul assignment:
>>
>>   const char s[] = "12345";
>>   char d[3];
>>
>>   void f (void)
>>   {
>>     strncpy (d, s, sizeof d - 1);   // -Wstringop-truncation
>>     d[sizeof d - 1] = 0;
>>   }
>>
>> To avoid the warning I propose to defer folding strncpy to
>> memcpy until the pointer to the basic block the strncpy call
>> is in can be used to try to reach the next statement (this
>> happens as early as ccp1).  I'm aware of the preference to
>> fold things early but in the case of strncpy (a relatively
>> rarely used function that is often misused), getting
>> the warning right while folding a bit later but still fairly
>> early on seems like a reasonable compromise.  I fear that
>> otherwise, the false positives will drive users to adopt
>> other unsafe solutions (like memcpy) where these kinds of
>> bugs cannot be as readily detected.
>>
>> Tested on x86_64-linux.
>>
>> Martin
>>
>> PS There still are outstanding cases where the warning can
>> be avoided.  I xfailed them in the test for now but will
>> still try to get them to work for GCC 9.
>>
>> gcc-87028.diff
>>
>>
>> PR tree-optimization/87028 - false positive -Wstringop-truncation strncpy with global variable source string
>> gcc/ChangeLog:
>>
>>       PR tree-optimization/87028
>>       * gimple-fold.c (gimple_fold_builtin_strncpy): Avoid folding when
>>       statement doesn't belong to a basic block.
>>       * tree-ssa-strlen.c (maybe_diag_stxncpy_trunc): Handle MEM_REF on
>>       the left hand side of assignment.
>>
>> gcc/testsuite/ChangeLog:
>>
>>       PR tree-optimization/87028
>>       * c-c++-common/Wstringop-truncation.c: Remove xfails.
>>       * gcc.dg/Wstringop-truncation-5.c: New test.
>>
>> diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c
>> index 07341eb..284c2fb 100644
>> --- a/gcc/gimple-fold.c
>> +++ b/gcc/gimple-fold.c
>> @@ -1702,6 +1702,11 @@ gimple_fold_builtin_strncpy (gimple_stmt_iterator *gsi,
>>    if (tree_int_cst_lt (ssize, len))
>>      return false;
>>
>> +  /* Defer warning (and folding) until the next statement in the basic
>> +     block is reachable.  */
>> +  if (!gimple_bb (stmt))
>> +    return false;
> I think you want cfun->cfg as the test here.  They should be equivalent
> in practice.
>
>
>> diff --git a/gcc/tree-ssa-strlen.c b/gcc/tree-ssa-strlen.c
>> index d0792aa..f1988f6 100644
>> --- a/gcc/tree-ssa-strlen.c
>> +++ b/gcc/tree-ssa-strlen.c
>> @@ -1981,6 +1981,23 @@ maybe_diag_stxncpy_trunc (gimple_stmt_iterator gsi, tree src, tree cnt)
>>         && known_eq (dstoff, lhsoff)
>>         && operand_equal_p (dstbase, lhsbase, 0))
>>       return false;
>> +
>> +      if (code == MEM_REF
>> +       && TREE_CODE (lhsbase) == SSA_NAME
>> +       && known_eq (dstoff, lhsoff))
>> +     {
>> +       /* Extract the referenced variable from something like
>> +            MEM[(char *)d_3(D) + 3B] = 0;  */
>> +       gimple *def = SSA_NAME_DEF_STMT (lhsbase);
>> +       if (gimple_nop_p (def))
>> +         {
>> +           lhsbase = SSA_NAME_VAR (lhsbase);
>> +           if (lhsbase
>> +               && dstbase
>> +               && operand_equal_p (dstbase, lhsbase, 0))
>> +             return false;
>> +         }
>> +     }
> If you find yourself looking at SSA_NAME_VAR, you're usually barking up
> the wrong tree.  It'd be easier to suggest something here if I could see
> the gimple (with virtual operands).  But at some level what you really
> want to do is make sure the base of the MEM_REF is the same as what got
> passed as the destination of the strncpy.  You'd want to be testing
> SSA_NAMEs in that case.

I replied to Richard with the code that this hunk handles:

   https://gcc.gnu.org/ml/gcc-patches/2018-08/msg01697.html

I couldn't find any other way to determine that d_3(D) in

   MEM[(char *)d_3(D) + 3B] = 0;

is the same as the first argument in:

   __builtin_strncpy (d_3(D), "12345", 4);

The MEM_REF operand is an SSA_NAME whose DEF_STMT is
a GIMPLE_NOP and whose SSA_NAME_VAR is the PARAM_DECL d.
Where else can I get the variable from?

Martin

Re: [PATCH] avoid warning on constant strncpy until next statement is reachable (PR 87028)

Jeff Law
In reply to this post by Martin Sebor-2
On 08/27/2018 10:27 AM, Martin Sebor wrote:

> On 08/27/2018 02:29 AM, Richard Biener wrote:
>> On Sun, Aug 26, 2018 at 7:26 AM Jeff Law <[hidden email]> wrote:
>>>
>>> On 08/24/2018 09:58 AM, Martin Sebor wrote:
>>>> The warning suppression for -Wstringop-truncation looks for
>>>> the next statement after a truncating strncpy to see if it
>>>> adds a terminating nul.  This only works when the next
>>>> statement can be reached using the Gimple statement iterator
>>>> which isn't until after gimplification.  As a result, strncpy
>>>> calls that truncate their constant argument that are being
>>>> folded to memcpy this early get diagnosed even if they are
>>>> followed by the nul assignment:
>>>>
>>>>   const char s[] = "12345";
>>>>   char d[3];
>>>>
>>>>   void f (void)
>>>>   {
>>>>     strncpy (d, s, sizeof d - 1);   // -Wstringop-truncation
>>>>     d[sizeof d - 1] = 0;
>>>>   }
>>>>
>>>> To avoid the warning I propose to defer folding strncpy to
>>>> memcpy until the pointer to the basic block the strnpy call
>>>> is in can be used to try to reach the next statement (this
>>>> happens as early as ccp1).  I'm aware of the preference to
>>>> fold things early but in the case of strncpy (a relatively
>>>> rarely used function that is often misused), getting
>>>> the warning right while folding a bit later but still fairly
>>>> early on seems like a reasonable compromise.  I fear that
>>>> otherwise, the false positives will drive users to adopt
>>>> other unsafe solutions (like memcpy) where these kinds of
>>>> bugs cannot be as readily detected.
>>>>
>>>> Tested on x86_64-linux.
>>>>
>>>> Martin
>>>>
>>>> PS There still are outstanding cases where the warning can
>>>> be avoided.  I xfailed them in the test for now but will
>>>> still try to get them to work for GCC 9.
>>>>
>>>> gcc-87028.diff
>>>>
>>>>
>>>> PR tree-optimization/87028 - false positive -Wstringop-truncation
>>>> strncpy with global variable source string
>>>> gcc/ChangeLog:
>>>>
>>>>       PR tree-optimization/87028
>>>>       * gimple-fold.c (gimple_fold_builtin_strncpy): Avoid folding when
>>>>       statement doesn't belong to a basic block.
>>>>       * tree-ssa-strlen.c (maybe_diag_stxncpy_trunc): Handle MEM_REF on
>>>>       the left hand side of assignment.
>>>>
>>>> gcc/testsuite/ChangeLog:
>>>>
>>>>       PR tree-optimization/87028
>>>>       * c-c++-common/Wstringop-truncation.c: Remove xfails.
>>>>       * gcc.dg/Wstringop-truncation-5.c: New test.
>>>>
>>>> diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c
>>>> index 07341eb..284c2fb 100644
>>>> --- a/gcc/gimple-fold.c
>>>> +++ b/gcc/gimple-fold.c
>>>> @@ -1702,6 +1702,11 @@ gimple_fold_builtin_strncpy
>>>> (gimple_stmt_iterator *gsi,
>>>>    if (tree_int_cst_lt (ssize, len))
>>>>      return false;
>>>>
>>>> +  /* Defer warning (and folding) until the next statement in the basic
>>>> +     block is reachable.  */
>>>> +  if (!gimple_bb (stmt))
>>>> +    return false;
>>> I think you want cfun->cfg as the test here.  They should be equivalent
>>> in practice.
>>
>> Please do not add 'cfun' references.  Note that the next stmt is also
>> accessible
>> when there is no CFG.  I guess the issue is that we fold this during
>> gimplification
>> where the next stmt is not yet "there" (but still in GENERIC)?
>>
>> We generally do not want to have unfolded stmts in the IL when we can
>> avoid that
>> which is why we fold most stmts during gimplification.  We also do
>> that because
>> we now do less folding on GENERIC.
>>
>> There may be the possibility to refactor gimplification time folding
>> to what we
>> do during inlining - queue stmts we want to fold and perform all
>> folding delayed.
>> This of course means bigger compile-time due to cache effects.
>>
>>>
>>>> diff --git a/gcc/tree-ssa-strlen.c b/gcc/tree-ssa-strlen.c
>>>> index d0792aa..f1988f6 100644
>>>> --- a/gcc/tree-ssa-strlen.c
>>>> +++ b/gcc/tree-ssa-strlen.c
>>>> @@ -1981,6 +1981,23 @@ maybe_diag_stxncpy_trunc
>>>> (gimple_stmt_iterator gsi, tree src, tree cnt)
>>>>         && known_eq (dstoff, lhsoff)
>>>>         && operand_equal_p (dstbase, lhsbase, 0))
>>>>       return false;
>>>> +
>>>> +      if (code == MEM_REF
>>>> +       && TREE_CODE (lhsbase) == SSA_NAME
>>>> +       && known_eq (dstoff, lhsoff))
>>>> +     {
>>>> +       /* Extract the referenced variable from something like
>>>> +            MEM[(char *)d_3(D) + 3B] = 0;  */
>>>> +       gimple *def = SSA_NAME_DEF_STMT (lhsbase);
>>>> +       if (gimple_nop_p (def))
>>>> +         {
>>>> +           lhsbase = SSA_NAME_VAR (lhsbase);
>>>> +           if (lhsbase
>>>> +               && dstbase
>>>> +               && operand_equal_p (dstbase, lhsbase, 0))
>>>> +             return false;
>>>> +         }
>>>> +     }
>>> If you find yourself looking at SSA_NAME_VAR, you're usually barking up
>>> the wrong tree.  It'd be easier to suggest something here if I could see
>>> the gimple (with virtual operands).  BUt at some level what you really
>>> want to do is make sure the base of the MEM_REF is the same as what got
>>> passed as the destination of the strncpy.  You'd want to be testing
>>> SSA_NAMEs in that case.
>>
>> Yes.  Why not simply compare the SSA names?  Why would it be
>> not OK to do that when !lhsbase?
>
> The added code handles this case:
>
>   void f (char *d)
>   {
>     __builtin_strncpy (d, "12345", 4);
>     d[3] = 0;
>   }
>
> where during forwprop we see:
>
>   __builtin_strncpy (d_3(D), "12345", 4);
>   MEM[(char *)d_3(D) + 3B] = 0;
>
> The next statement after the strncpy is the assignment whose
> lhs is the MEM_REF with a GIMPLE_NOP as an operand.  There
> is no other information in the GIMPLE_NOP that I can see to
> tell that the operand is d_3(D) or that it's the same as
> the strncpy argument (i.e., the PARM_DECL d).  Having to
> open-code this all the time seems so cumbersome -- is
> there some API that would do this for me?  (I thought
> get_addr_base_and_unit_offset was that API but clearly in
> this case it doesn't do what I expect -- it just returns
> the argument.)

I think you need to look harder at that MEM_REF.  It references d_3.
That's what you need to be checking.  The base (d_3) is the first
operand of the MEM_REF, the offset is the second operand of the MEM_REF.

(gdb) p debug_gimple_stmt ($2)
# .MEM_5 = VDEF <.MEM_4>
MEM[(char *)d_3(D) + 3B] = 0;


(gdb) p gimple_assign_lhs ($2)
$5 = (tree_node *) 0x7ffff01a6208

(gdb) p debug_tree ($5)
 <mem_ref 0x7ffff01a6208
    type <integer_type 0x7ffff00723f0 char public string-flag QI
        size <integer_cst 0x7ffff0059d80 constant 8>
        unit-size <integer_cst 0x7ffff0059d98 constant 1>
        align:8 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 0x7ffff00723f0 precision:8 min <integer_cst 0x7ffff0059dc8 -128> max <integer_cst 0x7ffff0059df8 127>
        pointer_to_this <pointer_type 0x7ffff007de70>>

    arg:0 <ssa_name 0x7ffff0063dc8
        type <pointer_type 0x7ffff007de70 type <integer_type 0x7ffff00723f0 char>
            public unsigned DI
            size <integer_cst 0x7ffff0059c90 constant 64>
            unit-size <integer_cst 0x7ffff0059ca8 constant 8>
            align:64 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 0x7ffff007de70 reference_to_this <reference_type 0x7ffff017d738>>
        visited var <parm_decl 0x7ffff01a5000 d>
        def_stmt GIMPLE_NOP
        version:3>
    arg:1 <integer_cst 0x7ffff018ae40 type <pointer_type 0x7ffff007de70> constant 3>
    j.c:4:6 start: j.c:4:5 finish: j.c:4:8>


Note arg:0 is the SSA_NAME d_3.  And not surprising that's lhsbase:

(gdb) p debug_tree (lhsbase)
<ssa_name 0x7ffff0063dc8
    type <pointer_type 0x7ffff007de70
        type <integer_type 0x7ffff00723f0 char public string-flag QI
            size <integer_cst 0x7ffff0059d80 constant 8>
            unit-size <integer_cst 0x7ffff0059d98 constant 1>
            align:8 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 0x7ffff00723f0 precision:8 min <integer_cst 0x7ffff0059dc8 -128> max <integer_cst 0x7ffff0059df8 127>
            pointer_to_this <pointer_type 0x7ffff007de70>>
        public unsigned DI
        size <integer_cst 0x7ffff0059c90 constant 64>
        unit-size <integer_cst 0x7ffff0059ca8 constant 8>
        align:64 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 0x7ffff007de70 reference_to_this <reference_type 0x7ffff017d738>>
    visited var <parm_decl 0x7ffff01a5000 d>
    def_stmt GIMPLE_NOP
    version:3>


Sadly, dstbase is the PARM_DECL for d.  That's where things are going
"wrong".  Not sure why you're getting the PARM_DECL in that case.  I'd
debug get_addr_base_and_unit_offset to understand what's going on.
Essentially you're getting different results of
get_addr_base_and_unit_offset in a case where they arguably should be
the same.
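
To make the comparison concrete, here is a toy model in plain C. Everything in it (toy_tree, toy_base_and_offset, toy_nul_store_follows) is hypothetical and is not GCC's tree API; it only mirrors the shape of MEM[(char *)d_3(D) + 3B] = 0. It shows that the check succeeds when the MEM_REF base is compared with the SSA_NAME passed to strncpy, and fails when it is compared with the PARM_DECL that SSA_NAME_VAR points to.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for GCC trees.  These are
   NOT GCC's data structures; they only model a store through a pointer.  */
enum toy_code { TOY_PARM_DECL, TOY_SSA_NAME, TOY_MEM_REF };

struct toy_tree
{
  enum toy_code code;
  const struct toy_tree *op0; /* SSA_NAME: its variable (like SSA_NAME_VAR);
                                 MEM_REF: its base pointer (operand 0).  */
  long offset;                /* MEM_REF: constant byte offset (operand 1).  */
};

/* Split an lvalue into base and constant offset.  For a MEM_REF the base
   is operand 0 (an SSA_NAME), not the declaration behind it.  */
static const struct toy_tree *
toy_base_and_offset (const struct toy_tree *t, long *off)
{
  assert (t != NULL && off != NULL);
  if (t->code == TOY_MEM_REF)
    {
      *off = t->offset;
      return t->op0;
    }
  *off = 0;
  return t;
}

/* Return true if LHS stores to byte CNT - 1 of the object that DST points
   to, i.e. the store that terminates strncpy (DST, SRC, CNT).  DST must be
   the original strncpy argument, so SSA names are compared with SSA names.  */
static bool
toy_nul_store_follows (const struct toy_tree *dst, long cnt,
                       const struct toy_tree *lhs)
{
  long off;
  const struct toy_tree *base = toy_base_and_offset (lhs, &off);
  return base == dst && off == cnt - 1;
}
```

Pointer identity stands in for SSA name equality here, on the assumption that each SSA name is a single shared node.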

Jeff


Re: [PATCH] avoid warning on constant strncpy until next statement is reachable (PR 87028)

Richard Biener-2
On Tue, Aug 28, 2018 at 6:27 AM Jeff Law <[hidden email]> wrote:

>
> Sadly, dstbase is the PARM_DECL for d.  That's where things are going
> "wrong".  Not sure why you're getting the PARM_DECL in that case.  I'd
> debug get_addr_base_and_unit_offset to understand what's going on.
> Essentially you're getting different results of
> get_addr_base_and_unit_offset in a case where they arguably should be
> the same.

Probably get_attr_nonstring_decl has the same "mistake" and returns
the PARM_DECL instead of the SSA name pointer.  So we're comparing
apples and oranges here.

Yeah:

/* If EXPR refers to a character array or pointer declared attribute
   nonstring return a decl for that array or pointer and set *REF to
   the referenced enclosing object or pointer.  Otherwise returns
   null.  */

tree
get_attr_nonstring_decl (tree expr, tree *ref)
{
  tree decl = expr;
  if (TREE_CODE (decl) == SSA_NAME)
    {
      gimple *def = SSA_NAME_DEF_STMT (decl);

      if (is_gimple_assign (def))
        {
          tree_code code = gimple_assign_rhs_code (def);
          if (code == ADDR_EXPR
              || code == COMPONENT_REF
              || code == VAR_DECL)
            decl = gimple_assign_rhs1 (def);
        }
      else if (tree var = SSA_NAME_VAR (decl))
        decl = var;
    }

  if (TREE_CODE (decl) == ADDR_EXPR)
    decl = TREE_OPERAND (decl, 0);

  if (ref)
    *ref = decl;

I see a lot of "magic" here again in the attempt to "propagate"
a nonstring attribute.  Note

void foo (char *p __attribute__ ((nonstring)))
{
  p = "bar";
  strlen (p); // or whatever is necessary to call get_attr_nonstring_decl
}

is perfectly valid and p as passed to strlen is _not_ nonstring(?).

I think in your code comparing bases you want to look at the _original_
argument to the string function rather than what get_attr_nonstring_decl
returned as ref.
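
For contrast, the intended use of the attribute described in the GCC manual is on a character array that need not hold a nul-terminated string, where it is documented to suppress -Wstringop-truncation for strncpy calls that fill the array. A minimal sketch; struct record, set_tag and tag_equals are made-up names:

```c
#include <assert.h>
#include <string.h>

/* A fixed-width field that is not required to be nul-terminated.  */
struct record
{
  char tag[4] __attribute__ ((nonstring));
};

/* Fill the whole field.  A 4-character source leaves no room for a nul,
   which is intended here; the attribute tells GCC so.  */
void set_tag (struct record *r, const char *src)
{
  assert (r != NULL && src != NULL);
  strncpy (r->tag, src, sizeof r->tag);
}

/* Compare the field with a string without assuming a terminating nul.  */
int tag_equals (const struct record *r, const char *s)
{
  return strncmp (r->tag, s, sizeof r->tag) == 0;
}
```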

Richard.


Re: [PATCH] avoid warning on constant strncpy until next statement is reachable (PR 87028)

Richard Biener-2
On Tue, Aug 28, 2018 at 11:55 AM Richard Biener
<[hidden email]> wrote:

>
> I think in your code comparing bases you want to look at the _original_
> argument to the string function rather than what get_attr_nonstring_decl
> returned as ref.

Oh, and this 'nonstring' feels like something that could be propagated by points-to
analysis.


Re: [PATCH] avoid warning on constant strncpy until next statement is reachable (PR 87028)

Martin Sebor-2
In reply to this post by Jeff Law
On 08/27/2018 10:27 PM, Jeff Law wrote:

> On 08/27/2018 10:27 AM, Martin Sebor wrote:
>> On 08/27/2018 02:29 AM, Richard Biener wrote:
>>> On Sun, Aug 26, 2018 at 7:26 AM Jeff Law <[hidden email]> wrote:
>>>>
>>>> On 08/24/2018 09:58 AM, Martin Sebor wrote:
>>>>> The warning suppression for -Wstringop-truncation looks for
>>>>> the next statement after a truncating strncpy to see if it
>>>>> adds a terminating nul.  This only works when the next
>>>>> statement can be reached using the Gimple statement iterator
>>>>> which isn't until after gimplification.  As a result, strncpy
>>>>> calls that truncate their constant argument that are being
>>>>> folded to memcpy this early get diagnosed even if they are
>>>>> followed by the nul assignment:
>>>>>
>>>>>   const char s[] = "12345";
>>>>>   char d[3];
>>>>>
>>>>>   void f (void)
>>>>>   {
>>>>>     strncpy (d, s, sizeof d - 1);   // -Wstringop-truncation
>>>>>     d[sizeof d - 1] = 0;
>>>>>   }
>>>>>
>>>>> To avoid the warning I propose to defer folding strncpy to
>>>>> memcpy until the pointer to the basic block the strnpy call
>>>>> is in can be used to try to reach the next statement (this
>>>>> happens as early as ccp1).  I'm aware of the preference to
>>>>> fold things early but in the case of strncpy (a relatively
>>>>> rarely used function that is often misused), getting
>>>>> the warning right while folding a bit later but still fairly
>>>>> early on seems like a reasonable compromise.  I fear that
>>>>> otherwise, the false positives will drive users to adopt
>>>>> other unsafe solutions (like memcpy) where these kinds of
>>>>> bugs cannot be as readily detected.
>>>>>
>>>>> Tested on x86_64-linux.
>>>>>
>>>>> Martin
>>>>>
>>>>> PS There still are outstanding cases where the warning can
>>>>> be avoided.  I xfailed them in the test for now but will
>>>>> still try to get them to work for GCC 9.
>>>>>
>>>>> gcc-87028.diff
>>>>>
>>>>>
>>>>> PR tree-optimization/87028 - false positive -Wstringop-truncation
>>>>> strncpy with global variable source string
>>>>> gcc/ChangeLog:
>>>>>
>>>>>       PR tree-optimization/87028
>>>>>       * gimple-fold.c (gimple_fold_builtin_strncpy): Avoid folding when
>>>>>       statement doesn't belong to a basic block.
>>>>>       * tree-ssa-strlen.c (maybe_diag_stxncpy_trunc): Handle MEM_REF on
>>>>>       the left hand side of assignment.
>>>>>
>>>>> gcc/testsuite/ChangeLog:
>>>>>
>>>>>       PR tree-optimization/87028
>>>>>       * c-c++-common/Wstringop-truncation.c: Remove xfails.
>>>>>       * gcc.dg/Wstringop-truncation-5.c: New test.
>>>>>
>>>>> diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c
>>>>> index 07341eb..284c2fb 100644
>>>>> --- a/gcc/gimple-fold.c
>>>>> +++ b/gcc/gimple-fold.c
>>>>> @@ -1702,6 +1702,11 @@ gimple_fold_builtin_strncpy (gimple_stmt_iterator *gsi,
>>>>>    if (tree_int_cst_lt (ssize, len))
>>>>>      return false;
>>>>>
>>>>> +  /* Defer warning (and folding) until the next statement in the basic
>>>>> +     block is reachable.  */
>>>>> +  if (!gimple_bb (stmt))
>>>>> +    return false;
>>>> I think you want cfun->cfg as the test here.  They should be equivalent
>>>> in practice.
>>>
>>> Please do not add 'cfun' references.  Note that the next stmt is also
>>> accessible when there is no CFG.  I guess the issue is that we fold this
>>> during gimplification where the next stmt is not yet "there" (but still
>>> in GENERIC)?
>>>
>>> We generally do not want to have unfolded stmts in the IL when we can
>>> avoid that which is why we fold most stmts during gimplification.  We
>>> also do that because we now do less folding on GENERIC.
>>>
>>> There may be the possibility to refactor gimplification time folding
>>> to what we do during inlining - queue stmts we want to fold and perform
>>> all folding delayed.  This of course means bigger compile-time due to
>>> cache effects.
>>>
>>>>
>>>>> diff --git a/gcc/tree-ssa-strlen.c b/gcc/tree-ssa-strlen.c
>>>>> index d0792aa..f1988f6 100644
>>>>> --- a/gcc/tree-ssa-strlen.c
>>>>> +++ b/gcc/tree-ssa-strlen.c
>>>>> @@ -1981,6 +1981,23 @@ maybe_diag_stxncpy_trunc (gimple_stmt_iterator gsi, tree src, tree cnt)
>>>>>         && known_eq (dstoff, lhsoff)
>>>>>         && operand_equal_p (dstbase, lhsbase, 0))
>>>>>       return false;
>>>>> +
>>>>> +      if (code == MEM_REF
>>>>> +       && TREE_CODE (lhsbase) == SSA_NAME
>>>>> +       && known_eq (dstoff, lhsoff))
>>>>> +     {
>>>>> +       /* Extract the referenced variable from something like
>>>>> +            MEM[(char *)d_3(D) + 3B] = 0;  */
>>>>> +       gimple *def = SSA_NAME_DEF_STMT (lhsbase);
>>>>> +       if (gimple_nop_p (def))
>>>>> +         {
>>>>> +           lhsbase = SSA_NAME_VAR (lhsbase);
>>>>> +           if (lhsbase
>>>>> +               && dstbase
>>>>> +               && operand_equal_p (dstbase, lhsbase, 0))
>>>>> +             return false;
>>>>> +         }
>>>>> +     }
>>>> If you find yourself looking at SSA_NAME_VAR, you're usually barking up
>>>> the wrong tree.  It'd be easier to suggest something here if I could see
>>>> the gimple (with virtual operands).  But at some level what you really
>>>> want to do is make sure the base of the MEM_REF is the same as what got
>>>> passed as the destination of the strncpy.  You'd want to be testing
>>>> SSA_NAMEs in that case.
>>>
>>> Yes.  Why not simply compare the SSA names?  Why would it be
>>> not OK to do that when !lhsbase?
>>
>> The added code handles this case:
>>
>>   void f (char *d)
>>   {
>>     __builtin_strncpy (d, "12345", 4);
>>     d[3] = 0;
>>   }
>>
>> where during forwprop we see:
>>
>>   __builtin_strncpy (d_3(D), "12345", 4);
>>   MEM[(char *)d_3(D) + 3B] = 0;
>>
>> The next statement after the strncpy is the assignment whose
>> lhs is the MEM_REF with a GIMPLE_NOP as an operand.  There
>> is no other information in the GIMPLE_NOP that I can see to
>> tell that the operand is d_3(D) or that it's the same as
>> the strncpy argument (i.e., the PARM_DECL d).  Having to
>> open-code this all the time seems so cumbersome -- is
>> there some API that would do this for me?  (I thought
>> get_addr_base_and_unit_offset was that API but clearly in
>> this case it doesn't do what I expect -- it just returns
>> the argument.)
>
> I think you need to look harder at that MEM_REF.  It references d_3.
> That's what you need to be checking.  The base (d_3) is the first
> operand of the MEM_REF, the offset is the second operand of the MEM_REF.
>
> (gdb) p debug_gimple_stmt ($2)
> # .MEM_5 = VDEF <.MEM_4>
> MEM[(char *)d_3(D) + 3B] = 0;
>
>
> (gdb) p gimple_assign_lhs ($2)
> $5 = (tree_node *) 0x7ffff01a6208
>
> (gdb) p debug_tree ($5)
>  <mem_ref 0x7ffff01a6208
>     type <integer_type 0x7ffff00723f0 char public string-flag QI
>         size <integer_cst 0x7ffff0059d80 constant 8>
>         unit-size <integer_cst 0x7ffff0059d98 constant 1>
>         align:8 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 0x7ffff00723f0 precision:8 min <integer_cst 0x7ffff0059dc8 -128> max <integer_cst 0x7ffff0059df8 127>
>         pointer_to_this <pointer_type 0x7ffff007de70>>
>
>     arg:0 <ssa_name 0x7ffff0063dc8
>         type <pointer_type 0x7ffff007de70 type <integer_type 0x7ffff00723f0 char>
>             public unsigned DI
>             size <integer_cst 0x7ffff0059c90 constant 64>
>             unit-size <integer_cst 0x7ffff0059ca8 constant 8>
>             align:64 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 0x7ffff007de70 reference_to_this <reference_type 0x7ffff017d738>>
>         visited var <parm_decl 0x7ffff01a5000 d>
>         def_stmt GIMPLE_NOP
>         version:3>
>     arg:1 <integer_cst 0x7ffff018ae40 type <pointer_type 0x7ffff007de70> constant 3>
>     j.c:4:6 start: j.c:4:5 finish: j.c:4:8>
>
>
> Note arg:0 is the SSA_NAME d_3.  And not surprising that's lhsbase:

The d in the MEM_REF you see in the dump above is the SSA_NAME's
SSA_NAME_VAR:

           visited var <parm_decl 0x7ffff01a5000 d>

Here's the print_node() code that prints it:

          print_node_brief (file, "var", SSA_NAME_VAR (node), indent + 4);

There is nothing else in the MEM_REF operand that tells me that.
Why is it wrong to look at the SSA_NAME_VAR?

> (gdb) p debug_tree (lhsbase)
> <ssa_name 0x7ffff0063dc8
>     type <pointer_type 0x7ffff007de70
>         type <integer_type 0x7ffff00723f0 char public string-flag QI
>             size <integer_cst 0x7ffff0059d80 constant 8>
>             unit-size <integer_cst 0x7ffff0059d98 constant 1>
>             align:8 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 0x7ffff00723f0 precision:8 min <integer_cst 0x7ffff0059dc8 -128> max <integer_cst 0x7ffff0059df8 127>
>             pointer_to_this <pointer_type 0x7ffff007de70>>
>         public unsigned DI
>         size <integer_cst 0x7ffff0059c90 constant 64>
>         unit-size <integer_cst 0x7ffff0059ca8 constant 8>
>         align:64 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 0x7ffff007de70 reference_to_this <reference_type 0x7ffff017d738>>
>     visited var <parm_decl 0x7ffff01a5000 d>
>     def_stmt GIMPLE_NOP
>     version:3>
>
> Sadly, dstbase is the PARM_DECL for d.  That's where things are going
> "wrong".

As Richard observed, that's because get_attr_nonstring_decl()
returns the DECL that the expression refers to.  It does that
because that's where it looks for attribute nonstring, and so
that the warning can mention the DECL with the attribute.

I suppose since I'm not supposed to be using SSA_NAME_VAR
(I still don't understand why it's taboo) I'll have to avoid
using the get_attr_nonstring_decl() return value and instead
look into comparing the SSA_NAMEs.

Martin

> Not sure why you're getting the PARM_DECL in that case.  I'd
> debug get_addr_base_and_unit_offset to understand what's going on.
> Essentially you're getting different results of
> get_addr_base_and_unit_offset in a case where they arguably should be
> the same.
>
> Jeff
>


Re: [PATCH] avoid warning on constant strncpy until next statement is reachable (PR 87028)

Jeff Law
On 08/28/2018 02:43 PM, Martin Sebor wrote:

> I suppose since I'm not supposed to be using SSA_NAME_VAR
> (I still don't understand why it's taboo) I'll have to avoid
> using the get_attr_nonstring_decl() return value and instead
> look into comparing the SSA_NAMEs.

Because it's not generally useful: it has no dataflow information
associated with it.  SSA_NAMEs are what carry dataflow information, and
they are what you need to check if you want to know whether two objects
are the same.

SSA_NAME_VAR's primary use is for diagnostic messages and debugging.  We
do hang attributes off the _DECL node it refers to, so you can take an
SSA_NAME and query its SSA_NAME_VAR if you need to check whether the
SSA_NAME has a particular attribute.  But if you're trying to see whether
two objects in the IL are the same, you need to be looking at the SSA_NAME.

jeff
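To make the distinction concrete, here is a minimal C sketch built on a hypothetical toy model: the `toy_decl` and `toy_ssa_name` types and both helper functions are invented for illustration and are not GCC's `tree` or `gimple` APIs. It models Richard's `p = "bar";` example: the incoming parameter value `p_1(D)` and the value after the assignment, `p_2`, are different SSA names that share one underlying decl, so comparing the SSA_NAME_VAR-like field wrongly treats them as the same object.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model, not GCC's data structures.  A toy_decl stands
   for a user-visible variable such as the PARM_DECL p; a toy_ssa_name
   stands for one SSA version of it (p_1(D), p_2, ...).  */
struct toy_decl
{
  const char *name;
};

struct toy_ssa_name
{
  const struct toy_decl *var;  /* plays the role of SSA_NAME_VAR; may be NULL */
  unsigned version;            /* one version per definition */
};

/* Misleading identity test: different SSA versions of the same variable
   may hold different values, yet they share one decl.  */
static bool
same_object_by_var (const struct toy_ssa_name *a, const struct toy_ssa_name *b)
{
  return a->var != NULL && a->var == b->var;
}

/* Dataflow-aware identity test: each SSA name has exactly one definition,
   so the same name always denotes the same value.  Distinct names may
   still hold equal values (e.g. copies), so this test is conservative.  */
static bool
same_object_by_ssa_name (const struct toy_ssa_name *a,
                         const struct toy_ssa_name *b)
{
  return a == b;
}
```

In the toy, `same_object_by_var (&p_1, &p_2)` is true while `same_object_by_ssa_name (&p_1, &p_2)` is false, which is the mismatch between the two kinds of check.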


Re: [PATCH] avoid warning on constant strncpy until next statement is reachable (PR 87028)

Martin Sebor-2
>> Sadly, dstbase is the PARM_DECL for d.  That's where things are going
>> "wrong".  Not sure why you're getting the PARM_DECL in that case.  I'd
>> debug get_addr_base_and_unit_offset to understand what's going on.
>> Essentially you're getting different results of
>> get_addr_base_and_unit_offset in a case where they arguably should be
>> the same.
>
> Probably get_attr_nonstring_decl has the same "mistake" and returns
> the PARM_DECL instead of the SSA name pointer.  So we're comparing
> apples and oranges here.

Returning the SSA_NAME_VAR from get_attr_nonstring_decl() is
intentional but the function need not (perhaps should not)
also set *REF to it.

>
> Yeah:
>
> /* If EXPR refers to a character array or pointer declared attribute
>    nonstring return a decl for that array or pointer and set *REF to
>    the referenced enclosing object or pointer.  Otherwise returns
>    null.  */
>
> tree
> get_attr_nonstring_decl (tree expr, tree *ref)
> {
>   tree decl = expr;
>   if (TREE_CODE (decl) == SSA_NAME)
>     {
>       gimple *def = SSA_NAME_DEF_STMT (decl);
>
>       if (is_gimple_assign (def))
>         {
>           tree_code code = gimple_assign_rhs_code (def);
>           if (code == ADDR_EXPR
>               || code == COMPONENT_REF
>               || code == VAR_DECL)
>             decl = gimple_assign_rhs1 (def);
>         }
>       else if (tree var = SSA_NAME_VAR (decl))
>         decl = var;
>     }
>
>   if (TREE_CODE (decl) == ADDR_EXPR)
>     decl = TREE_OPERAND (decl, 0);
>
>   if (ref)
>     *ref = decl;
>
> I see a lot of "magic" here again in the attempt to "propagate"
> a nonstring attribute.

That's the function's purpose: to look for the attribute.  Is
there a better way to do this?

> Note
>
> void foo (char *p __attribute__ ((nonstring)))
> {
>   p = "bar";
>   strlen (p); // or whatever is necessary to call get_attr_nonstring_decl
> }
>
> is perfectly valid and p as passed to strlen is _not_ nonstring(?).

I don't know if you're saying that it should get a warning or
shouldn't.  Right now it doesn't because the strlen() call is
folded before we check for nonstring.

I could see an argument for diagnosing it, but I suspect you
wouldn't like it because it would mean more warnings from
the folder.  I could also see an argument against it because,
as you said, it's safe.

If you take the assignment to p away then a warning is issued,
and that's because p is declared with attribute nonstring.
That's also why get_attr_nonstring_decl looks at SSA_NAME_VAR.

> I think in your code comparing bases you want to look at the _original_
> argument to the string function rather than what get_attr_nonstring_decl
> returned as ref.

I've adjusted get_attr_nonstring_decl() to avoid setting *REF
to SSA_NAME_VAR.  That let me remove the GIMPLE_NOP code from
the patch.  I've also updated the comment above SSA_NAME_VAR
to clarify its purpose per Jeff's comments.

Attached is an updated revision with these changes.

Martin
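For reference, the source-level pattern that the removed GIMPLE_NOP handling was aimed at is Martin's `void f (char *d)` example quoted earlier in this thread. The minimal C sketch below only shows what that pair of statements does at run time; it says nothing about how GCC recognizes the pattern internally.

```c
#include <string.h>

/* Copy at most 4 bytes of "12345" into D.  Because the source is longer
   than the bound, strncpy stores '1', '2', '3', '4' and no terminating
   nul.  The following store then overwrites D[3] with a nul, leaving the
   properly terminated string "123".  */
void
f (char *d)
{
  strncpy (d, "12345", 4);
  d[3] = 0;
}
```

The nul lands on the last byte strncpy wrote, so bytes past `d[3]` are left untouched; a nul store like this in the statement right after the call is what the -Wstringop-truncation suppression discussed in the thread looks for.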

gcc-87028.diff (7K)

Re: [PATCH] avoid warning on constant strncpy until next statement is reachable (PR 87028)

Richard Biener-2
On Wed, Aug 29, 2018 at 2:12 AM Martin Sebor <[hidden email]> wrote:

>
> >> Sadly, dstbase is the PARM_DECL for d.  That's where things are going
> >> "wrong".  Not sure why you're getting the PARM_DECL in that case.  I'd
> >> debug get_addr_base_and_unit_offset to understand what's going on.
> >> Essentially you're getting different results of
> >> get_addr_base_and_unit_offset in a case where they arguably should be
> >> the same.
> >
> > Probably get_attr_nonstring_decl has the same "mistake" and returns
> > the PARM_DECL instead of the SSA name pointer.  So we're comparing
> > apples and oranges here.
>
> Returning the SSA_NAME_VAR from get_attr_nonstring_decl() is
> intentional but the function need not (perhaps should not)
> also set *REF to it.
>
> >
> > Yeah:
> >
> > /* If EXPR refers to a character array or pointer declared attribute
> >    nonstring return a decl for that array or pointer and set *REF to
> >    the referenced enclosing object or pointer.  Otherwise returns
> >    null.  */
> >
> > tree
> > get_attr_nonstring_decl (tree expr, tree *ref)
> > {
> >   tree decl = expr;
> >   if (TREE_CODE (decl) == SSA_NAME)
> >     {
> >       gimple *def = SSA_NAME_DEF_STMT (decl);
> >
> >       if (is_gimple_assign (def))
> >         {
> >           tree_code code = gimple_assign_rhs_code (def);
> >           if (code == ADDR_EXPR
> >               || code == COMPONENT_REF
> >               || code == VAR_DECL)
> >             decl = gimple_assign_rhs1 (def);
> >         }
> >       else if (tree var = SSA_NAME_VAR (decl))
> >         decl = var;
> >     }
> >
> >   if (TREE_CODE (decl) == ADDR_EXPR)
> >     decl = TREE_OPERAND (decl, 0);
> >
> >   if (ref)
> >     *ref = decl;
> >
> > I see a lot of "magic" here again in the attempt to "propagate"
> > a nonstring attribute.
>
> That's the function's purpose: to look for the attribute.  Is
> there a better way to do this?

Well, the question is what "nonstring" is, semantically.  I read it
as something like __restrict - a pointer with "nonstring" attribute points
to a non-string.  So I suspect your function either computes
"may expr point to a nonstring" or "must expr point to a nonstring"
if it gets a pointer argument.  If it gets a (string) object it checks whether
that object is declared "nonstring" (thus, if you built a pointer to expr,
whether that pointer _must_ point to a nonstring).  So I guess the first
one is "must".  Clearly looking at SSA_NAME_VAR isn't good here;
it would be semantically correct only for SSA_NAME_IS_DEFAULT_DEF
and SSA_NAME_VAR being a PARM_DECL.

I guess it would be nice to clearly separate the pointer vs. object case
by documentation in the function - all of the quoted parts above seem
to be for the address case, so a gcc_assert (POINTER_TYPE_P (TREE_TYPE (decl)))
inside the if (TREE_CODE (decl) == SSA_NAME) path should never trigger?
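The "must" rule above can be sketched with a hypothetical toy model in C (all type and function names are invented for illustration and are not GCC's APIs): the nonstring attribute on a parameter's decl is trusted only for the SSA name that is the parameter's default definition, i.e. the value the caller passed in. Any later definition, such as `p = "bar"`, produces a different SSA name that the attribute says nothing about under this reading.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model, not GCC's data structures.  */
enum toy_decl_kind { TOY_PARM_DECL, TOY_VAR_DECL };

struct toy_decl
{
  enum toy_decl_kind kind;
  bool nonstring;               /* declared with attribute nonstring */
};

struct toy_ssa_name
{
  const struct toy_decl *var;   /* like SSA_NAME_VAR; may be NULL */
  bool is_default_def;          /* like SSA_NAME_IS_DEFAULT_DEF */
};

/* True only when PTR must hold the value the caller passed for a
   parameter declared nonstring.  Non-default definitions (later
   assignments) and non-parameter decls are rejected.  */
static bool
toy_must_point_to_nonstring (const struct toy_ssa_name *ptr)
{
  return ptr->is_default_def
         && ptr->var != NULL
         && ptr->var->kind == TOY_PARM_DECL
         && ptr->var->nonstring;
}
```

This encodes only the reading proposed here; the reply that follows in the thread argues for tying the property to the declaration instead.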

> > Note
> >
> > void foo (char *p __attribute__ ((nonstring)))
> > {
> >   p = "bar";
> >   strlen (p); // or whatever is necessary to call get_attr_nonstring_decl
> > }
> >
> > is perfectly valid and p as passed to strlen is _not_ nonstring(?).
>
> I don't know if you're saying that it should get a warning or
> shouldn't.  Right now it doesn't because the strlen() call is
> folded before we check for nonstring.

I say it shouldn't because I assign "bar" to p and after that p isn't
the original parameter anymore?


Re: [PATCH] avoid warning on constant strncpy until next statement is reachable (PR 87028)

Martin Sebor-2
On 08/29/2018 01:29 AM, Richard Biener wrote:

> On Wed, Aug 29, 2018 at 2:12 AM Martin Sebor <[hidden email]> wrote:
>>
>>>> Sadly, dstbase is the PARM_DECL for d.  That's where things are going
>>>> "wrong".  Not sure why you're getting the PARM_DECL in that case.  I'd
>>>> debug get_addr_base_and_unit_offset to understand what's going on.
>>>> Essentially you're getting different results of
>>>> get_addr_base_and_unit_offset in a case where they arguably should be
>>>> the same.
>>>
>>> Probably get_attr_nonstring_decl has the same "mistake" and returns
>>> the PARM_DECL instead of the SSA name pointer.  So we're comparing
>>> apples and oranges here.
>>
>> Returning the SSA_NAME_VAR from get_attr_nonstring_decl() is
>> intentional but the function need not (perhaps should not)
>> also set *REF to it.
>>
>>>
>>> Yeah:
>>>
>>> /* If EXPR refers to a character array or pointer declared attribute
>>>    nonstring return a decl for that array or pointer and set *REF to
>>>    the referenced enclosing object or pointer.  Otherwise returns
>>>    null.  */
>>>
>>> tree
>>> get_attr_nonstring_decl (tree expr, tree *ref)
>>> {
>>>   tree decl = expr;
>>>   if (TREE_CODE (decl) == SSA_NAME)
>>>     {
>>>       gimple *def = SSA_NAME_DEF_STMT (decl);
>>>
>>>       if (is_gimple_assign (def))
>>>         {
>>>           tree_code code = gimple_assign_rhs_code (def);
>>>           if (code == ADDR_EXPR
>>>               || code == COMPONENT_REF
>>>               || code == VAR_DECL)
>>>             decl = gimple_assign_rhs1 (def);
>>>         }
>>>       else if (tree var = SSA_NAME_VAR (decl))
>>>         decl = var;
>>>     }
>>>
>>>   if (TREE_CODE (decl) == ADDR_EXPR)
>>>     decl = TREE_OPERAND (decl, 0);
>>>
>>>   if (ref)
>>>     *ref = decl;
>>>
>>> I see a lot of "magic" here again in the attempt to "propagate"
>>> a nonstring attribute.
>>
>> That's the function's purpose: to look for the attribute.  Is
>> there a better way to do this?
>
> Well, the question is what "nonstring" is, semantically.  I read it
> as something like __restrict - a pointer with "nonstring" attribute points
> to a non-string.  So I suspect your function either computes
> "may expr point to a nonstring" or "must expr point to a nonstring"
> if it gets a pointer argument.  If it gets a (string) object it checks whether
> that object is declared "nonstring" (thus, if you built a pointer to expr,
> whether that pointer _must_ point to a nonstring).  So I guess the first
> one is "must".  Clearly looking at SSA_NAME_VAR isn't good here;
> it would be semantically correct only for SSA_NAME_IS_DEFAULT_DEF
> and SSA_NAME_VAR being a PARM_DECL.
>
> I guess it would be nice to clearly separate the pointer vs. object case
> by documentation in the function - all of the quoted parts above seem
> to be for the address case, so a gcc_assert (POINTER_TYPE_P (TREE_TYPE (decl)))
> inside the if (TREE_CODE (decl) == SSA_NAME) path should never trigger?

Attribute nonstring on either an array or a pointer decl means
"it need not be a nul-terminated string."  I.e., it's just
a sequence of bytes.  If it happens to have a nul in it, then it
is a string.  I don't think of the pointer case as different
from the array.

The get_attr_nonstring_decl() function isn't a predicate telling
us whether or not an expression refers to a string.  It returns
a (non-null) decl if the expression refers to an object declared
nonstring.  Whether what the object contains or points to is in
fact a string is determined somewhere else.
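A minimal C sketch of the "just a sequence of bytes" reading, assuming a hypothetical fixed-width record field (`struct record`, `set_tag`, and `tag_length` are illustrative names, not from the patch). GCC 8 and later accept `__attribute__ ((nonstring))` on such a member, and it keeps bounded copies like the `strncpy` below from being diagnosed as truncating; code that reads the field must bound every access by the array size rather than rely on a terminating nul.

```c
#include <stddef.h>
#include <string.h>

/* A fixed-width field that need not be nul-terminated, e.g. a layout
   dictated by a file format.  Compilers without the nonstring
   attribute may ignore it (possibly with a warning).  */
struct record
{
  char tag[4] __attribute__ ((nonstring));
};

/* Store up to sizeof r->tag bytes of S.  The field is left without a
   terminating nul when S has that many characters or more; shorter
   strings are padded with nuls by strncpy.  */
static void
set_tag (struct record *r, const char *s)
{
  strncpy (r->tag, s, sizeof r->tag);
}

/* Length of the tag: the bytes before the first nul, or the full field
   width when there is none.  Never reads past the array, unlike strlen.  */
static size_t
tag_length (const struct record *r)
{
  const char *nul = memchr (r->tag, '\0', sizeof r->tag);
  return nul ? (size_t) (nul - r->tag) : sizeof r->tag;
}
```

Whether a given `tag` happens to hold a string is a property of its contents at run time; the attribute only records that it need not.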

>
>>> Note
>>>
>>> void foo (char *p __attribute__ ((nonstring)))
>>> {
>>>   p = "bar";
>>>   strlen (p); // or whatever is necessary to call get_attr_nonstring_decl
>>> }
>>>
>>> is perfectly valid and p as passed to strlen is _not_ nonstring(?).
>>
>> I don't know if you're saying that it should get a warning or
>> shouldn't.  Right now it doesn't because the strlen() call is
>> folded before we check for nonstring.
>
> I say it shouldn't because I assign "bar" to p and after that p isn't
> the original parameter anymore?

I agree with not warning here, but I don't think of p's nonstring
property as changing with an assignment.  It's still nonstring,
we just know that what it points to at the moment is a string.
If the code were instead:

   extern char a[];
   p = a;
   return strlen (a) + strlen (p);

a warning would be expected for strlen (p) because p is declared
to point to what need not be a string.  A warning would not be
expected for strlen (a) because it is not declared nonstring so
when we don't know, the assumption is that it is a string.

Does that make sense?

Martin

PS Since restrict is a property of a pointer and part of the type
system, while nonstring is a property of what the pointer points to
and not part of the type system, I don't think of them as similar.
In my mind, nonstring is analogous to the notions of object constness
and volatility (but not the const and volatile qualifiers).  It's
okay to assign the address of a const object to a non-const pointer
(with a cast), but it's an error to try to modify the object through
the pointer.  (It would be nice to add a warning to detect these
kinds of errors as well.)


Re: [PATCH] avoid warning on constant strncpy until next statement is reachable (PR 87028)

Jeff Law
In reply to this post by Martin Sebor-2
On 08/28/2018 06:12 PM, Martin Sebor wrote:

>>> Sadly, dstbase is the PARM_DECL for d.  That's where things are going
>>> "wrong".  Not sure why you're getting the PARM_DECL in that case.  I'd
>>> debug get_addr_base_and_unit_offset to understand what's going on.
>>> Essentially you're getting different results of
>>> get_addr_base_and_unit_offset in a case where they arguably should be
>>> the same.
>>
>> Probably get_attr_nonstring_decl has the same "mistake" and returns
>> the PARM_DECL instead of the SSA name pointer.  So we're comparing
>> apples and oranges here.
>
> Returning the SSA_NAME_VAR from get_attr_nonstring_decl() is
> intentional but the function need not (perhaps should not)
> also set *REF to it.
>
>>
>> Yeah:
>>
>> /* If EXPR refers to a character array or pointer declared attribute
>>    nonstring return a decl for that array or pointer and set *REF to
>>    the referenced enclosing object or pointer.  Otherwise returns
>>    null.  */
>>
>> tree
>> get_attr_nonstring_decl (tree expr, tree *ref)
>> {
>>   tree decl = expr;
>>   if (TREE_CODE (decl) == SSA_NAME)
>>     {
>>       gimple *def = SSA_NAME_DEF_STMT (decl);
>>
>>       if (is_gimple_assign (def))
>>         {
>>           tree_code code = gimple_assign_rhs_code (def);
>>           if (code == ADDR_EXPR
>>               || code == COMPONENT_REF
>>               || code == VAR_DECL)
>>             decl = gimple_assign_rhs1 (def);
>>         }
>>       else if (tree var = SSA_NAME_VAR (decl))
>>         decl = var;
>>     }
>>
>>   if (TREE_CODE (decl) == ADDR_EXPR)
>>     decl = TREE_OPERAND (decl, 0);
>>
>>   if (ref)
>>     *ref = decl;
>>
>> I see a lot of "magic" here again in the attempt to "propagate"
>> a nonstring attribute.
>
> That's the function's purpose: to look for the attribute.  Is
> there a better way to do this?
Well, there's a distinction between looking for the attribute (which
will be on the _DECL node) and determining if the current instance (an
SSA_NAME) has that attribute.

What I think Richard is implying is that it might be better to propagate
the state of the attribute to instances rather than going from an
SSA_NAME backwards through the use-def chains or SSA_NAME_VAR to get to
a potentially related _DECL node.

This could be built into the alias oracle, or via a propagation engine.
In either approach you should be able to cut down on false positives as
well as false negatives.

>
>> Note
>>
>> foo (char *p __attribute__(("nonstring")))
>> {
>>   p = "bar";
>>   strlen (p); // or whatever is necessary to call get_attr_nonstring_decl
>> }
>>
>> is perfectly valid and p as passed to strlen is _not_ nonstring(?).
>
> I don't know if you're saying that it should get a warning or
> shouldn't.  Right now it doesn't because the strlen() call is
> folded before we check for nonstring.
>
> I could see an argument for diagnosing it but I suspect you
> wouldn't like it because it would mean more warning from
> the folder.  I could also see an argument against it because,
> as you said, it's safe.
Well, this is where propagating the bit would help.  The assignment p =
"bar" would clobber the nonstring property because we know "bar" is
properly terminated. Pointer arithmetic, casts and the like would
preserve the property and so on.

If it were done via the aliasing oracle, the instance of P in the strlen
call would be known to point to a proper string and thus the call is safe.
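A minimal sketch of the per-instance propagation described above, over a made-up straight-line IR (toy_ssa_def and its opcodes are hypothetical, not GCC's GIMPLE API). Because SSA gives the assignment p = "bar" a new name, the clobbering falls out naturally: the new name simply starts out as a string. Copies, casts and pointer arithmetic inherit the bit from their operand; merging at PHI nodes is left out.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy IR: each statement defines one SSA name.  */
enum toy_op
{
  TOY_PARM_DEFAULT,   /* default definition of a parameter */
  TOY_STRING_CST,     /* def = "literal" */
  TOY_COPY,           /* def = src (plain copy or cast) */
  TOY_POINTER_PLUS    /* def = src + offset */
};

struct toy_ssa_def
{
  enum toy_op op;
  int def;              /* SSA name defined by the statement */
  int src;              /* operand SSA name for COPY and POINTER_PLUS */
  bool parm_nonstring;  /* PARM_DEFAULT: parameter declared nonstring */
};

/* One forward pass in statement order sets NONSTRING[def] for every
   SSA name defined in straight-line code.  */
static void
toy_propagate_nonstring (const struct toy_ssa_def *stmts, size_t n,
                         bool *nonstring)
{
  for (size_t i = 0; i < n; ++i)
    {
      const struct toy_ssa_def *s = &stmts[i];
      switch (s->op)
        {
        case TOY_PARM_DEFAULT:
          nonstring[s->def] = s->parm_nonstring;
          break;
        case TOY_STRING_CST:
          nonstring[s->def] = false;   /* a literal is nul-terminated */
          break;
        case TOY_COPY:
        case TOY_POINTER_PLUS:
          nonstring[s->def] = nonstring[s->src];
          break;
        }
    }
}
```

For p_1 (the nonstring parameter), p_2 = "bar", p_3 = p_2 + 1 and q_4 = (char *) p_1, only p_1 and q_4 end up marked nonstring.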

Hope this helps...


Jeff

Re: [PATCH] avoid warning on constant strncpy until next statement is reachable (PR 87028)

Richard Biener-2
On Thu, Aug 30, 2018 at 2:27 AM Jeff Law <[hidden email]> wrote:

>
> On 08/28/2018 06:12 PM, Martin Sebor wrote:
> >>> Sadly, dstbase is the PARM_DECL for d.  That's where things are going
> >>> "wrong".  Not sure why you're getting the PARM_DECL in that case.  I'd
> >>> debug get_addr_base_and_unit_offset to understand what's going on.
> >>> Essentially you're getting different results of
> >>> get_addr_base_and_unit_offset in a case where they arguably should be
> >>> the same.
> >>
> >> Probably get_attr_nonstring_decl has the same "mistake" and returns
> >> the PARM_DECL instead of the SSA name pointer.  So we're comparing
> >> apples and oranges here.
> >
> > Returning the SSA_NAME_VAR from get_attr_nonstring_decl() is
> > intentional but the function need not (perhaps should not)
> > also set *REF to it.
> >
> >>
> >> Yeah:
> >>
> >> /* If EXPR refers to a character array or pointer declared attribute
> >>    nonstring return a decl for that array or pointer and set *REF to
> >>    the referenced enclosing object or pointer.  Otherwise returns
> >>    null.  */
> >>
> >> tree
> >> get_attr_nonstring_decl (tree expr, tree *ref)
> >> {
> >>   tree decl = expr;
> >>   if (TREE_CODE (decl) == SSA_NAME)
> >>     {
> >>       gimple *def = SSA_NAME_DEF_STMT (decl);
> >>
> >>       if (is_gimple_assign (def))
> >>         {
> >>           tree_code code = gimple_assign_rhs_code (def);
> >>           if (code == ADDR_EXPR
> >>               || code == COMPONENT_REF
> >>               || code == VAR_DECL)
> >>             decl = gimple_assign_rhs1 (def);
> >>         }
> >>       else if (tree var = SSA_NAME_VAR (decl))
> >>         decl = var;
> >>     }
> >>
> >>   if (TREE_CODE (decl) == ADDR_EXPR)
> >>     decl = TREE_OPERAND (decl, 0);
> >>
> >>   if (ref)
> >>     *ref = decl;
> >>
> >> I see a lot of "magic" here again in the attempt to "propagate"
> >> a nonstring attribute.
> >
> > That's the function's purpose: to look for the attribute.  Is
> > there a better way to do this?
> Well, there's a distinction between looking for the attribute (which
> will be on the _DECL node) and determining if the current instance (an
> SSA_NAME) has that attribute.
>
> What I think Richard is implying is that it might be better to propagate
> the state of the attribute to instances rather than going from an
> SSA_NAME backwards through the use-def chains or SSA_NAME_VAR to get to
> a potentially related _DECL node.
>
> This could be built into the alias oracle, or via a propagation engine.
> In either approach you should be able to cut down on false positives as
> well as false negatives.

It's more that the underlying decl of an SSA name doesn't guarantee
the entity was originally related to that decl.

Maybe we should be more strict here because we use the underlying
decl for debug info purposes.

Given there are really no semantics attached to the attribute, and it
just suppresses warnings, I'm OK with looking at the underlying decl.
Yes, propagating would eventually improve things but it might be
overkill at the same time (just costing compile-time).

> >
> >> Note
> >>
> >> foo (char *p __attribute__(("nonstring")))
> >> {
> >>   p = "bar";
> >>   strlen (p); // or whatever is necessary to call get_attr_nonstring_decl
> >> }
> >>
> >> is perfectly valid and p as passed to strlen is _not_ nonstring(?).
> >
> > I don't know if you're saying that it should get a warning or
> > shouldn't.  Right now it doesn't because the strlen() call is
> > folded before we check for nonstring.
> >
> > I could see an argument for diagnosing it but I suspect you
> > wouldn't like it because it would mean more warning from
> > the folder.  I could also see an argument against it because,
> > as you said, it's safe.
> Well, this is where propagating the bit would help.  The assignment p =
> "bar" would clobber the nonstring property because we know "bar" is
> properly terminated. Pointer arithmetic, casts and the like would
> preserve the property and so on.
>
> If it were done via the aliasing oracle, the instance of P in the strlen
> call would be known to point to a proper string and thus the call is safe.
>
> Hope this helps...

So to elaborate a bit here - to propagate these kind of attributes
in PTA analysis (for example) you'd need to introduce fake
pointed-to objects (just special ids like nonlocal), nonstring
and string and have "sources" of those generate constraints.
After propagation finished you could then see whether an
SSA name points to either string or nonstring exclusively or
to both and set a bit in the pointer-info according to that
result.

It comes at the cost of increasing points-to bitmaps and
more constraints during propagation.

If you can make do with just knowing whether any nonstring source
can possibly be pointed to, the effect on code not using that
attribute would be none.  Just be aware that with points-to
analysis this stuff leaks quite a bit since it is conservative
propagation (may point to nonstring) - separately tracking
may point to string allows you to get an idea of
"must point to nonstring".  But that comes at a cost.

A "must point to" propagator would be a useful thing to have
as well I guess.  That would fit in a value-numbering kind
of framework.
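A rough model of the points-to scheme described above, assuming only copy constraints (which also cover casts and PHI arguments); loads, stores and address-taking, which real points-to analysis must handle, are omitted. The pseudo-object ids and function names are hypothetical. "May point to nonstring" is a single bit test, while "must point to nonstring" additionally requires that the string pseudo object was never reached, which is why both are tracked.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical pseudo-object ids, in the spirit of PTA's special ids.  */
enum { PSEUDO_STRING = 1 << 0, PSEUDO_NONSTRING = 1 << 1 };

/* Constraint: the points-to set of DST includes that of SRC.  */
struct copy_constraint { int dst, src; };

/* Iterate to a fixed point.  PTS must already hold the base
   constraints, i.e. the sources of string and nonstring pointers.  */
static void
toy_solve (unsigned *pts, const struct copy_constraint *c, size_t n)
{
  bool changed = true;
  while (changed)
    {
      changed = false;
      for (size_t i = 0; i < n; ++i)
        {
          unsigned merged = pts[c[i].dst] | pts[c[i].src];
          if (merged != pts[c[i].dst])
            {
              pts[c[i].dst] = merged;
              changed = true;
            }
        }
    }
}

/* Conservative answer: some nonstring source may be pointed to.  */
static bool
toy_may_be_nonstring (unsigned pts)
{
  return (pts & PSEUDO_NONSTRING) != 0;
}

/* Stronger answer: nonstring may be pointed to and string may not.  */
static bool
toy_must_be_nonstring (unsigned pts)
{
  return toy_may_be_nonstring (pts) && (pts & PSEUDO_STRING) == 0;
}
```

With p a nonstring parameter, a PHI <p, "bar"> only "may" point to nonstring, while a plain copy of p "must".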

Richard.

>
>
> Jeff

Re: [PATCH] avoid warning on constant strncpy until next statement is reachable (PR 87028)

Martin Sebor-2
In reply to this post by Martin Sebor-2
PING: https://gcc.gnu.org/ml/gcc-patches/2018-08/msg01818.html

There have been follow-up comments in this thread suggesting
alternate designs for the nonstring attribute but (AFAICT) no
objections to the bug fix.  I don't expect to have the time
to redesign and reimplement the attribute for GCC 9 in terms
of the alias oracle as was suggested but I would like to avoid
the warning in the report.
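For reference, the false positive in the report is the one from the original post at the top of this thread: a bounded strncpy from a global constant string, immediately followed by an explicit nul store. As a complete translation unit, with the identifiers used in the report:

```c
#include <string.h>

const char s[] = "12345";
char d[3];

/* Copy at most sizeof d - 1 characters and then terminate explicitly:
   the idiom the truncation warning is meant to accept.  */
void
f (void)
{
  strncpy (d, s, sizeof d - 1);
  d[sizeof d - 1] = 0;
}
```

The patch is meant to defer the diagnostic until the nul store that follows the call is visible; at run time the sketch simply leaves the properly terminated string "12" in d.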

Is the final patch okay to commit?

Martin

On 08/28/2018 06:12 PM, Martin Sebor wrote:

>>> Sadly, dstbase is the PARM_DECL for d.  That's where things are going
>>> "wrong".  Not sure why you're getting the PARM_DECL in that case.  I'd
>>> debug get_addr_base_and_unit_offset to understand what's going on.
>>> Essentially you're getting different results of
>>> get_addr_base_and_unit_offset in a case where they arguably should be
>>> the same.
>>
>> Probably get_attr_nonstring_decl has the same "mistake" and returns
>> the PARM_DECL instead of the SSA name pointer.  So we're comparing
>> apples and oranges here.
>
> Returning the SSA_NAME_VAR from get_attr_nonstring_decl() is
> intentional but the function need not (perhaps should not)
> also set *REF to it.
>
>>
>> Yeah:
>>
>> /* If EXPR refers to a character array or pointer declared attribute
>>    nonstring return a decl for that array or pointer and set *REF to
>>    the referenced enclosing object or pointer.  Otherwise returns
>>    null.  */
>>
>> tree
>> get_attr_nonstring_decl (tree expr, tree *ref)
>> {
>>   tree decl = expr;
>>   if (TREE_CODE (decl) == SSA_NAME)
>>     {
>>       gimple *def = SSA_NAME_DEF_STMT (decl);
>>
>>       if (is_gimple_assign (def))
>>         {
>>           tree_code code = gimple_assign_rhs_code (def);
>>           if (code == ADDR_EXPR
>>               || code == COMPONENT_REF
>>               || code == VAR_DECL)
>>             decl = gimple_assign_rhs1 (def);
>>         }
>>       else if (tree var = SSA_NAME_VAR (decl))
>>         decl = var;
>>     }
>>
>>   if (TREE_CODE (decl) == ADDR_EXPR)
>>     decl = TREE_OPERAND (decl, 0);
>>
>>   if (ref)
>>     *ref = decl;
>>
>> I see a lot of "magic" here again in the attempt to "propagate"
>> a nonstring attribute.
>
> That's the function's purpose: to look for the attribute.  Is
> there a better way to do this?
>
>> Note
>>
>> foo (char *p __attribute__(("nonstring")))
>> {
>>   p = "bar";
>>   strlen (p); // or whatever is necessary to call get_attr_nonstring_decl
>> }
>>
>> is perfectly valid and p as passed to strlen is _not_ nonstring(?).
>
> I don't know if you're saying that it should get a warning or
> shouldn't.  Right now it doesn't because the strlen() call is
> folded before we check for nonstring.
>
> I could see an argument for diagnosing it but I suspect you
> wouldn't like it because it would mean more warning from
> the folder.  I could also see an argument against it because,
> as you said, it's safe.
>
> If you take the assignment to p away then a warning is issued,
> and that's because p is declared with attribute nonstring.
> That's also why get_attr_nonstring_decl looks at SSA_NAME_VAR.
>
>> I think in your code comparing bases you want to look at the _original_
>> argument to the string function rather than what get_attr_nonstring_decl
>> returned as ref.
>
> I've adjusted get_attr_nonstring_decl() to avoid setting *REF
> to SSA_NAME_VAR.  That let me remove the GIMPLE_NOP code from
> the patch.  I've also updated the comment above SSA_NAME_VAR
> to clarify its purpose per Jeff's comments.
>
> Attached is an updated revision with these changes.
>
> Martin


Re: [PATCH] avoid warning on constant strncpy until next statement is reachable (PR 87028)

Jeff Law
In reply to this post by Martin Sebor-2
On 8/28/18 6:12 PM, Martin Sebor wrote:

>>> Sadly, dstbase is the PARM_DECL for d.  That's where things are going
>>> "wrong".  Not sure why you're getting the PARM_DECL in that case.  I'd
>>> debug get_addr_base_and_unit_offset to understand what's going on.
>>> Essentially you're getting different results of
>>> get_addr_base_and_unit_offset in a case where they arguably should be
>>> the same.
>>
>> Probably get_attr_nonstring_decl has the same "mistake" and returns
>> the PARM_DECL instead of the SSA name pointer.  So we're comparing
>> apples and oranges here.
>
> Returning the SSA_NAME_VAR from get_attr_nonstring_decl() is
> intentional but the function need not (perhaps should not)
> also set *REF to it.
>
>>
>> Yeah:
>>
>> /* If EXPR refers to a character array or pointer declared attribute
>>    nonstring return a decl for that array or pointer and set *REF to
>>    the referenced enclosing object or pointer.  Otherwise returns
>>    null.  */
>>
>> tree
>> get_attr_nonstring_decl (tree expr, tree *ref)
>> {
>>   tree decl = expr;
>>   if (TREE_CODE (decl) == SSA_NAME)
>>     {
>>       gimple *def = SSA_NAME_DEF_STMT (decl);
>>
>>       if (is_gimple_assign (def))
>>         {
>>           tree_code code = gimple_assign_rhs_code (def);
>>           if (code == ADDR_EXPR
>>               || code == COMPONENT_REF
>>               || code == VAR_DECL)
>>             decl = gimple_assign_rhs1 (def);
>>         }
>>       else if (tree var = SSA_NAME_VAR (decl))
>>         decl = var;
>>     }
>>
>>   if (TREE_CODE (decl) == ADDR_EXPR)
>>     decl = TREE_OPERAND (decl, 0);
>>
>>   if (ref)
>>     *ref = decl;
>>
>> I see a lot of "magic" here again in the attempt to "propagate"
>> a nonstring attribute.
>
> That's the function's purpose: to look for the attribute.  Is
> there a better way to do this?
>
>> Note
>>
>> foo (char *p __attribute__(("nonstring")))
>> {
>>   p = "bar";
>>   strlen (p); // or whatever is necessary to call get_attr_nonstring_decl
>> }
>>
>> is perfectly valid and p as passed to strlen is _not_ nonstring(?).
>
> I don't know if you're saying that it should get a warning or
> shouldn't.  Right now it doesn't because the strlen() call is
> folded before we check for nonstring.
>
> I could see an argument for diagnosing it but I suspect you
> wouldn't like it because it would mean more warning from
> the folder.  I could also see an argument against it because,
> as you said, it's safe.
>
> If you take the assignment to p away then a warning is issued,
> and that's because p is declared with attribute nonstring.
> That's also why get_attr_nonstring_decl looks at SSA_NAME_VAR.
>
>> I think in your code comparing bases you want to look at the _original_
>> argument to the string function rather than what get_attr_nonstring_decl
>> returned as ref.
>
> I've adjusted get_attr_nonstring_decl() to avoid setting *REF
> to SSA_NAME_VAR.  That let me remove the GIMPLE_NOP code from
> the patch.  I've also updated the comment above SSA_NAME_VAR
> to clarify its purpose per Jeff's comments.
>
> Attached is an updated revision with these changes.
>
> Martin
>
> gcc-87028.diff
>
> PR tree-optimization/87028 - false positive -Wstringop-truncation strncpy with global variable source string
> gcc/ChangeLog:
>
> PR tree-optimization/87028
> * calls.c (get_attr_nonstring_decl): Avoid setting *REF to
> SSA_NAME_VAR.
> * gimple-fold.c (gimple_fold_builtin_strncpy): Avoid folding
> when statement doesn't belong to a basic block.
> * tree.h (SSA_NAME_VAR): Update comment.
> * tree-ssa-strlen.c (maybe_diag_stxncpy_trunc): Simplify.
>
> gcc/testsuite/ChangeLog:
>
> PR tree-optimization/87028
> * c-c++-common/Wstringop-truncation.c: Remove xfails.
> * gcc.dg/Wstringop-truncation-5.c: New test.
>

> Index: gcc/calls.c
> ===================================================================
> --- gcc/calls.c (revision 263928)
> +++ gcc/calls.c (working copy)
> @@ -1503,6 +1503,7 @@ tree
>  get_attr_nonstring_decl (tree expr, tree *ref)
>  {
>    tree decl = expr;
> +  tree var = NULL_TREE;
>    if (TREE_CODE (decl) == SSA_NAME)
>      {
>        gimple *def = SSA_NAME_DEF_STMT (decl);
> @@ -1515,17 +1516,25 @@ get_attr_nonstring_decl (tree expr, tree *ref)
>        || code == VAR_DECL)
>      decl = gimple_assign_rhs1 (def);
>   }
> -      else if (tree var = SSA_NAME_VAR (decl))
> - decl = var;
> +      else
> + var = SSA_NAME_VAR (decl);
>      }
>  
>    if (TREE_CODE (decl) == ADDR_EXPR)
>      decl = TREE_OPERAND (decl, 0);
>  
> +  /* To simplify calling code, store the referenced DECL regardless of
> +     the attribute determined below, but avoid storing the SSA_NAME_VAR
> +     obtained above (it's not useful for dataflow purposes).  */
>    if (ref)
>      *ref = decl;
>  
> -  if (TREE_CODE (decl) == ARRAY_REF)
> +  /* Use the SSA_NAME_VAR that was determined above to see if it's
> +     declared nonstring.  Otherwise drill down into the referenced
> +     DECL.  */
> +  if (var)
> +    decl = var;
> +  else if (TREE_CODE (decl) == ARRAY_REF)
>      decl = TREE_OPERAND (decl, 0);
>    else if (TREE_CODE (decl) == COMPONENT_REF)
>      decl = TREE_OPERAND (decl, 1);
The more I look at this the more I think what we really want to be doing
is real propagation of the property either via the alias oracle or a
propagation engine.   You can't even guarantee that if you've got an
SSA_NAME that the value it holds has any relation to its underlying
SSA_NAME_VAR -- the value in the SSA_NAME could well have been copied
from some other SSA_NAME with a different underlying SSA_NAME_VAR.

I'm not going to insist on it, but I think if we find ourselves
extending this again in a way that is really working around lack of
propagation of the property then we should go back and fix the
propagation problem.
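To make the control flow of the revised get_attr_nonstring_decl easier to follow, here is a toy model with made-up types standing in for GCC trees (toy_tree and the TOY_* codes are not GCC's API). An SSA name whose def_rhs is set stands for one defined by an assignment the real function looks through; any other SSA name is treated like a default definition and falls back to its underlying variable. The ARRAY_REF and COMPONENT_REF cases are omitted. The model trusts the var field unconditionally, which is exactly the assumption questioned above.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for GCC tree codes.  */
enum toy_code { TOY_VAR_DECL, TOY_PARM_DECL, TOY_ADDR_EXPR, TOY_SSA_NAME };

struct toy_tree
{
  enum toy_code code;
  bool nonstring;             /* decl declared with attribute nonstring */
  struct toy_tree *operand;   /* ADDR_EXPR: the object whose address is taken */
  struct toy_tree *def_rhs;   /* SSA_NAME: rhs of a looked-through def, or NULL */
  struct toy_tree *var;       /* SSA_NAME: underlying variable (SSA_NAME_VAR) */
};

/* *REF receives the object relevant for dataflow, never SSA_NAME_VAR;
   the attribute itself is looked up on SSA_NAME_VAR when there is no
   defining assignment to look through.  */
static struct toy_tree *
toy_get_attr_nonstring_decl (struct toy_tree *expr, struct toy_tree **ref)
{
  struct toy_tree *decl = expr;
  struct toy_tree *var = NULL;

  if (decl->code == TOY_SSA_NAME)
    {
      if (decl->def_rhs)
        decl = decl->def_rhs;   /* e.g. p_2 = &a */
      else
        var = decl->var;        /* e.g. default def of parameter p */
    }

  if (decl->code == TOY_ADDR_EXPR)
    decl = decl->operand;

  if (ref)
    *ref = decl;

  if (var)
    decl = var;

  return decl->nonstring ? decl : NULL;
}
```

For the default definition of a nonstring parameter the model returns the parameter while *REF stays the SSA name; for p_2 = &a it returns null and sets *REF to a.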



> Index: gcc/gimple-fold.c
> ===================================================================
> --- gcc/gimple-fold.c (revision 263925)
> +++ gcc/gimple-fold.c (working copy)
> @@ -1702,6 +1702,11 @@ gimple_fold_builtin_strncpy (gimple_stmt_iterator
>    if (tree_int_cst_lt (ssize, len))
>      return false;
>  
> +  /* Defer warning (and folding) until the next statement in the basic
> +     block is reachable.  */
> +  if (!gimple_bb (stmt))
> +    return false;
> +
>    /* Diagnose truncation that leaves the copy unterminated.  */
>    maybe_diag_stxncpy_trunc (*gsi, src, len);
I thought Richi wanted the guard earlier (maybe_fold_stmt) -- it wasn't
entirely clear to me if the subsequent comments about needing to fold
early were meant to raise issues with guarding earlier or not.
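The ordering the new guard enforces can be modeled in a few lines. Everything here is hypothetical (toy_bb, toy_seq_stmt, toy_fold_strncpy and the three-way result), and matching the next statement by a destination id is a simplification of the checks maybe_diag_stxncpy_trunc performs. Without a basic block the following statement cannot be reached, so the call is left alone, neither diagnosed nor folded, for a later pass to revisit.

```c
#include <stdbool.h>
#include <stddef.h>

struct toy_bb { int index; };

struct toy_seq_stmt
{
  struct toy_bb *bb;                 /* NULL until the CFG exists */
  const struct toy_seq_stmt *next;   /* next statement in the block */
  bool stores_nul;                   /* e.g. d[sizeof d - 1] = 0 */
  int dest;                          /* id of the object written */
};

enum toy_fold_result { TOY_DEFERRED, TOY_FOLDED_QUIET, TOY_FOLDED_WARNED };

/* Model of folding strncpy when the copy TRUNCATES its source.  */
static enum toy_fold_result
toy_fold_strncpy (const struct toy_seq_stmt *call, bool truncates)
{
  /* Mirrors the guard: with no basic block there is no reliable way
     to look at the following statement, so neither warn nor fold.  */
  if (!call->bb)
    return TOY_DEFERRED;

  /* Suppress the warning when the next statement nul-terminates the
     same destination.  */
  const struct toy_seq_stmt *next = call->next;
  bool terminated = next && next->stores_nul && next->dest == call->dest;
  return truncates && !terminated ? TOY_FOLDED_WARNED : TOY_FOLDED_QUIET;
}
```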

Jeff

Re: [PATCH] avoid warning on constant strncpy until next statement is reachable (PR 87028)

Martin Sebor-2
On 09/17/2018 07:30 PM, Jeff Law wrote:

> On 8/28/18 6:12 PM, Martin Sebor wrote:
>>>> Sadly, dstbase is the PARM_DECL for d.  That's where things are going
>>>> "wrong".  Not sure why you're getting the PARM_DECL in that case.  I'd
>>>> debug get_addr_base_and_unit_offset to understand what's going on.
>>>> Essentially you're getting different results of
>>>> get_addr_base_and_unit_offset in a case where they arguably should be
>>>> the same.
>>>
>>> Probably get_attr_nonstring_decl has the same "mistake" and returns
>>> the PARM_DECL instead of the SSA name pointer.  So we're comparing
>>> apples and oranges here.
>>
>> Returning the SSA_NAME_VAR from get_attr_nonstring_decl() is
>> intentional but the function need not (perhaps should not)
>> also set *REF to it.
>>
>>>
>>> Yeah:
>>>
>>> /* If EXPR refers to a character array or pointer declared attribute
>>>    nonstring return a decl for that array or pointer and set *REF to
>>>    the referenced enclosing object or pointer.  Otherwise returns
>>>    null.  */
>>>
>>> tree
>>> get_attr_nonstring_decl (tree expr, tree *ref)
>>> {
>>>   tree decl = expr;
>>>   if (TREE_CODE (decl) == SSA_NAME)
>>>     {
>>>       gimple *def = SSA_NAME_DEF_STMT (decl);
>>>
>>>       if (is_gimple_assign (def))
>>>         {
>>>           tree_code code = gimple_assign_rhs_code (def);
>>>           if (code == ADDR_EXPR
>>>               || code == COMPONENT_REF
>>>               || code == VAR_DECL)
>>>             decl = gimple_assign_rhs1 (def);
>>>         }
>>>       else if (tree var = SSA_NAME_VAR (decl))
>>>         decl = var;
>>>     }
>>>
>>>   if (TREE_CODE (decl) == ADDR_EXPR)
>>>     decl = TREE_OPERAND (decl, 0);
>>>
>>>   if (ref)
>>>     *ref = decl;
>>>
>>> I see a lot of "magic" here again in the attempt to "propagate"
>>> a nonstring attribute.
>>
>> That's the function's purpose: to look for the attribute.  Is
>> there a better way to do this?
>>
>>> Note
>>>
>>> foo (char *p __attribute__(("nonstring")))
>>> {
>>>   p = "bar";
>>>   strlen (p); // or whatever is necessary to call get_attr_nonstring_decl
>>> }
>>>
>>> is perfectly valid and p as passed to strlen is _not_ nonstring(?).
>>
>> I don't know if you're saying that it should get a warning or
>> shouldn't.  Right now it doesn't because the strlen() call is
>> folded before we check for nonstring.
>>
>> I could see an argument for diagnosing it but I suspect you
>> wouldn't like it because it would mean more warning from
>> the folder.  I could also see an argument against it because,
>> as you said, it's safe.
>>
>> If you take the assignment to p away then a warning is issued,
>> and that's because p is declared with attribute nonstring.
>> That's also why get_attr_nonstring_decl looks at SSA_NAME_VAR.
>>
>>> I think in your code comparing bases you want to look at the _original_
>>> argument to the string function rather than what get_attr_nonstring_decl
>>> returned as ref.
>>
>> I've adjusted get_attr_nonstring_decl() to avoid setting *REF
>> to SSA_NAME_VAR.  That let me remove the GIMPLE_NOP code from
>> the patch.  I've also updated the comment above SSA_NAME_VAR
>> to clarify its purpose per Jeff's comments.
>>
>> Attached is an updated revision with these changes.
>>
>> Martin
>>
>> gcc-87028.diff
>>
>> PR tree-optimization/87028 - false positive -Wstringop-truncation strncpy with global variable source string
>> gcc/ChangeLog:
>>
>> PR tree-optimization/87028
>> * calls.c (get_attr_nonstring_decl): Avoid setting *REF to
>> SSA_NAME_VAR.
>> * gimple-fold.c (gimple_fold_builtin_strncpy): Avoid folding
>> when statement doesn't belong to a basic block.
>> * tree.h (SSA_NAME_VAR): Update comment.
>> * tree-ssa-strlen.c (maybe_diag_stxncpy_trunc): Simplify.
>>
>> gcc/testsuite/ChangeLog:
>>
>> PR tree-optimization/87028
>> * c-c++-common/Wstringop-truncation.c: Remove xfails.
>> * gcc.dg/Wstringop-truncation-5.c: New test.
>>
>
>> Index: gcc/calls.c
>> ===================================================================
>> --- gcc/calls.c (revision 263928)
>> +++ gcc/calls.c (working copy)
>> @@ -1503,6 +1503,7 @@ tree
>>  get_attr_nonstring_decl (tree expr, tree *ref)
>>  {
>>    tree decl = expr;
>> +  tree var = NULL_TREE;
>>    if (TREE_CODE (decl) == SSA_NAME)
>>      {
>>        gimple *def = SSA_NAME_DEF_STMT (decl);
>> @@ -1515,17 +1516,25 @@ get_attr_nonstring_decl (tree expr, tree *ref)
>>        || code == VAR_DECL)
>>      decl = gimple_assign_rhs1 (def);
>>   }
>> -      else if (tree var = SSA_NAME_VAR (decl))
>> - decl = var;
>> +      else
>> + var = SSA_NAME_VAR (decl);
>>      }
>>
>>    if (TREE_CODE (decl) == ADDR_EXPR)
>>      decl = TREE_OPERAND (decl, 0);
>>
>> +  /* To simplify calling code, store the referenced DECL regardless of
>> +     the attribute determined below, but avoid storing the SSA_NAME_VAR
>> +     obtained above (it's not useful for dataflow purposes).  */
>>    if (ref)
>>      *ref = decl;
>>
>> -  if (TREE_CODE (decl) == ARRAY_REF)
>> +  /* Use the SSA_NAME_VAR that was determined above to see if it's
>> +     declared nonstring.  Otherwise drill down into the referenced
>> +     DECL.  */
>> +  if (var)
>> +    decl = var;
>> +  else if (TREE_CODE (decl) == ARRAY_REF)
>>      decl = TREE_OPERAND (decl, 0);
>>    else if (TREE_CODE (decl) == COMPONENT_REF)
>>      decl = TREE_OPERAND (decl, 1);
> The more I look at this the more I think what we really want to be doing
> is real propagation of the property either via the alias oracle or a
> propagation engine.   You can't even guarantee that if you've got an
> SSA_NAME that the value it holds has any relation to its underlying
> SSA_NAME_VAR -- the value in the SSA_NAME could well have been copied
> from some other SSA_NAME with a different underlying SSA_NAME_VAR.
>
> I'm not going to insist on it, but I think if we find ourselves
> extending this again in a way that is really working around lack of
> propagation of the property then we should go back and fix the
> propagation problem.

We talked about improving this back in the GCC 8 cycle.  I've
been collecting input (and test cases) from Miguel Ojeda from
the adoption of the attribute in the Linux kernel.  There are
a number of issues I was hoping to get to in stage 1 but that
has been derailed by all the strlen back and forth.  I'm still
hoping to be able to fix some of the false positives here in
stage 3 but, IIUC the constraints, a redesign along the lines
you suggest would be considered overly intrusive.  (If not,
I'm willing to look into it.)

That said, I had the impression from Richard's comments that
implementing the propagation in points-to analysis would come
at a cost and have its own downsides:

   https://gcc.gnu.org/ml/gcc-patches/2018-08/msg01954.html

So I wasn't sure it was necessarily an endorsement of
the approach as the ideal solution or just a passing thought.

>> Index: gcc/gimple-fold.c
>> ===================================================================
>> --- gcc/gimple-fold.c (revision 263925)
>> +++ gcc/gimple-fold.c (working copy)
>> @@ -1702,6 +1702,11 @@ gimple_fold_builtin_strncpy (gimple_stmt_iterator
>>    if (tree_int_cst_lt (ssize, len))
>>      return false;
>>
>> +  /* Defer warning (and folding) until the next statement in the basic
>> +     block is reachable.  */
>> +  if (!gimple_bb (stmt))
>> +    return false;
>> +
>>    /* Diagnose truncation that leaves the copy unterminated.  */
>>    maybe_diag_stxncpy_trunc (*gsi, src, len);
> I thought Richi wanted the guard earlier (maybe_fold_stmt) -- it wasn't
> entirely clear to me if the subsequent comments about needing to fold
> early were meant to raise issues with guarding earlier or not.

I'm fine with moving it if that's preferable.

Moving the test to maybe_fold_stmt() would, IMO, be the right
change to make in general, at least for library built-ins.
I have been meaning to suggest it independently of this fix
but because of its pervasive impact I've been holding off,
expecting it to be controversial.  If there is consensus I'm
happy to make this change but I would prefer to do it separately
since it causes a number of regressions in tests that expect
built-ins to be folded very early on (i.e., look for evidence
of the folding in the output of -fdump-tree-gimple or
-fdump-tree-ccp1).  Some of the regressions would go away if
maybe_fold_stmt() only avoided folding of library built-in
functions.  Resolving the others would require adjusting
the tests to either enable optimization or look for the evidence
of folding in passes later than gimple or ccp1.  I think all
that is reasonable and won't impact the efficiency of
the emitted object code, but it's obviously a much bigger
change than a simple fix for a false positive warning.

If that sounds reasonable, is the patch acceptable as is?

The latest version is here:

   https://gcc.gnu.org/ml/gcc-patches/2018-08/msg01818.html

Martin