[PATCH] include size and offset in -Wstringop-overflow

classic Classic list List threaded Threaded
24 messages Options
12
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] include size and offset in -Wstringop-overflow

Martin Sebor-2
On 11/17/19 12:03 PM, Jeff Law wrote:

> On 11/12/19 12:55 PM, Martin Sebor wrote:
>> On 11/12/19 10:54 AM, Jeff Law wrote:
>>> On 11/12/19 1:15 AM, Richard Biener wrote:
>>>> On Tue, Nov 12, 2019 at 6:10 AM Jeff Law <[hidden email]> wrote:
>>>>>
>>>>> On 11/6/19 3:34 PM, Martin Sebor wrote:
>>>>>> On 11/6/19 2:06 PM, Martin Sebor wrote:
>>>>>>> On 11/6/19 1:39 PM, Jeff Law wrote:
>>>>>>>> On 11/6/19 1:27 PM, Martin Sebor wrote:
>>>>>>>>> On 11/6/19 11:55 AM, Jeff Law wrote:
>>>>>>>>>> On 11/6/19 11:00 AM, Martin Sebor wrote:
>>>>>>>>>>> The -Wstringop-overflow warnings for single-byte and multi-byte
>>>>>>>>>>> stores mention the amount of data being stored and the amount of
>>>>>>>>>>> space remaining in the destination, such as:
>>>>>>>>>>>
>>>>>>>>>>> warning: writing 4 bytes into a region of size 0
>>>>>>>>>>> [-Wstringop-overflow=]
>>>>>>>>>>>
>>>>>>>>>>>       123 |   *p = 0;
>>>>>>>>>>>           |   ~~~^~~
>>>>>>>>>>> note: destination object declared here
>>>>>>>>>>>        45 |   char b[N];
>>>>>>>>>>>           |        ^
>>>>>>>>>>>
>>>>>>>>>>> A warning like this can take some time to analyze.  First, the
>>>>>>>>>>> size
>>>>>>>>>>> of the destination isn't mentioned and may not be easy to tell
>>>>>>>>>>> from
>>>>>>>>>>> the sources.  In the note above, when N's value is the result of
>>>>>>>>>>> some non-trivial computation, chasing it down may be a small
>>>>>>>>>>> project
>>>>>>>>>>> in and of itself.  Second, it's also not clear why the region
>>>>>>>>>>> size
>>>>>>>>>>> is zero.  It could be because the offset is exactly N, or because
>>>>>>>>>>> it's negative, or because it's in some range greater than N.
>>>>>>>>>>>
>>>>>>>>>>> Mentioning both the size of the destination object and the offset
>>>>>>>>>>> makes the existing messages clearer, are will become essential
>>>>>>>>>>> when
>>>>>>>>>>> GCC starts diagnosing overflow into allocated buffers (as my
>>>>>>>>>>> follow-on patch does).
>>>>>>>>>>>
>>>>>>>>>>> The attached patch enhances -Wstringop-overflow to do this by
>>>>>>>>>>> letting compute_objsize return the offset to its caller, doing
>>>>>>>>>>> something similar in get_stridx, and adding a new function to
>>>>>>>>>>> the strlen pass to issue this enhanced warning (eventually, I'd
>>>>>>>>>>> like the function to replace the -Wstringop-overflow handler in
>>>>>>>>>>> builtins.c).  With the change, the note above might read
>>>>>>>>>>> something
>>>>>>>>>>> like:
>>>>>>>>>>>
>>>>>>>>>>> note: at offset 11 to object ‘b’ with size 8 declared here
>>>>>>>>>>>        45 |   char b[N];
>>>>>>>>>>>           |        ^
>>>>>>>>>>>
>>>>>>>>>>> Tested on x86_64-linux.
>>>>>>>>>>>
>>>>>>>>>>> Martin
>>>>>>>>>>>
>>>>>>>>>>> gcc-store-offset.diff
>>>>>>>>>>>
>>>>>>>>>>> gcc/ChangeLog:
>>>>>>>>>>>
>>>>>>>>>>>        * builtins.c (compute_objsize): Add an argument and set
>>>>>>>>>>> it to
>>>>>>>>>>> offset
>>>>>>>>>>>        into destination.
>>>>>>>>>>>        * builtins.h (compute_objsize): Add an argument.
>>>>>>>>>>>        * tree-object-size.c (addr_object_size): Add an argument
>>>>>>>>>>> and
>>>>>>>>>>> set it
>>>>>>>>>>>        to offset into destination.
>>>>>>>>>>>        (compute_builtin_object_size): Same.
>>>>>>>>>>>        * tree-object-size.h (compute_builtin_object_size): Add an
>>>>>>>>>>> argument.
>>>>>>>>>>>        * tree-ssa-strlen.c (get_addr_stridx): Add an argument and
>>>>>>>>>>> set it
>>>>>>>>>>>        to offset into destination.
>>>>>>>>>>>        (maybe_warn_overflow): New function.
>>>>>>>>>>>        (handle_store): Call maybe_warn_overflow to issue warnings.
>>>>>>>>>>>
>>>>>>>>>>> gcc/testsuite/ChangeLog:
>>>>>>>>>>>
>>>>>>>>>>>        * c-c++-common/Wstringop-overflow-2.c: Adjust text of
>>>>>>>>>>> expected
>>>>>>>>>>> messages.
>>>>>>>>>>>        * g++.dg/warn/Wstringop-overflow-3.C: Same.
>>>>>>>>>>>        * gcc.dg/Wstringop-overflow-17.c: Same.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> Index: gcc/tree-ssa-strlen.c
>>>>>>>>>>> ===================================================================
>>>>>>>>>>>
>>>>>>>>>>> --- gcc/tree-ssa-strlen.c    (revision 277886)
>>>>>>>>>>> +++ gcc/tree-ssa-strlen.c    (working copy)
>>>>>>>>>>> @@ -189,6 +189,52 @@ struct laststmt_struct
>>>>>>>>>>>      static int get_stridx_plus_constant (strinfo *, unsigned
>>>>>>>>>>> HOST_WIDE_INT, tree);
>>>>>>>>>>>      static void handle_builtin_stxncpy (built_in_function,
>>>>>>>>>>> gimple_stmt_iterator *);
>>>>>>>>>>>      +/* Sets MINMAX to either the constant value or the range VAL
>>>>>>>>>>> is in
>>>>>>>>>>> +   and returns true on success.  */
>>>>>>>>>>> +
>>>>>>>>>>> +static bool
>>>>>>>>>>> +get_range (tree val, wide_int minmax[2], const vr_values
>>>>>>>>>>> *rvals =
>>>>>>>>>>> NULL)
>>>>>>>>>>> +{
>>>>>>>>>>> +  if (tree_fits_uhwi_p (val))
>>>>>>>>>>> +    {
>>>>>>>>>>> +      minmax[0] = minmax[1] = wi::to_wide (val);
>>>>>>>>>>> +      return true;
>>>>>>>>>>> +    }
>>>>>>>>>>> +
>>>>>>>>>>> +  if (TREE_CODE (val) != SSA_NAME)
>>>>>>>>>>> +    return false;
>>>>>>>>>>> +
>>>>>>>>>>> +  if (rvals)
>>>>>>>>>>> +    {
>>>>>>>>>>> +      gimple *def = SSA_NAME_DEF_STMT (val);
>>>>>>>>>>> +      if (gimple_assign_single_p (def)
>>>>>>>>>>> +      && gimple_assign_rhs_code (def) == INTEGER_CST)
>>>>>>>>>>> +    {
>>>>>>>>>>> +      /* get_value_range returns [0, N] for constant
>>>>>>>>>>> assignments.  */
>>>>>>>>>>> +      val = gimple_assign_rhs1 (def);
>>>>>>>>>>> +      minmax[0] = minmax[1] = wi::to_wide (val);
>>>>>>>>>>> +      return true;
>>>>>>>>>>> +    }
>>>>>>>>>> Umm, something seems really off with this hunk.  If the
>>>>>>>>>> SSA_NAME is
>>>>>>>>>> set
>>>>>>>>>> via a simple constant assignment, then the range ought to be a
>>>>>>>>>> singleton
>>>>>>>>>> ie [CONST,CONST].   Is there are particular test were this is not
>>>>>>>>>> true?
>>>>>>>>>>
>>>>>>>>>> The only way offhand I could see this happening is if originally
>>>>>>>>>> the RHS
>>>>>>>>>> wasn't a constant, but due to optimizations it either simplified
>>>>>>>>>> into a
>>>>>>>>>> constant or a constant was propagated into an SSA_NAME
>>>>>>>>>> appearing on
>>>>>>>>>> the
>>>>>>>>>> RHS.  This would have to happen between the last range analysis
>>>>>>>>>> and
>>>>>>>>>> the
>>>>>>>>>> point where you're making this query.
>>>>>>>>>
>>>>>>>>> Yes, I think that's right.  Here's an example where it happens:
>>>>>>>>>
>>>>>>>>>      void f (void)
>>>>>>>>>      {
>>>>>>>>>        char s[] = "1234";
>>>>>>>>>        unsigned n = strlen (s);
>>>>>>>>>        char vla[n];   // or malloc (n)
>>>>>>>>>        vla[n] = 0;    // n = [4, 4]
>>>>>>>>>        ...
>>>>>>>>>      }
>>>>>>>>>
>>>>>>>>> The strlen call is folded to 4 but that's not propagated to
>>>>>>>>> the access until sometime after the strlen pass is done.
>>>>>>>> Hmm.  Are we calling set_range_info in that case?  That goes
>>>>>>>> behind the
>>>>>>>> back of pass instance of vr_values.  If so, that might argue we
>>>>>>>> want to
>>>>>>>> be setting it in vr_values too.
>>>>>>>
>>>>>>> No, set_range_info is only called for ranges.  In this case,
>>>>>>> handle_builtin_strlen replaces the strlen() call with 4:
>>>>>>>
>>>>>>>      s = "1234";
>>>>>>>      _1 = __builtin_strlen (&s);
>>>>>>>      n_2 = (unsigned int) _1;
>>>>>>>      a.1_8 = __builtin_alloca_with_align (_1, 8);
>>>>>>>      (*a.1_8)[n_2] = 0;
>>>>>>>
>>>>>>> When the access is made, the __builtin_alloca_with_align call
>>>>>>> is found as the destination and the _1 SSA_NAME is used to
>>>>>>> get its size.  We get back the range [4, 4].
>>>>>>
>>>>>> By the way, I glossed over one detail.  The above doesn't work
>>>>>> exactly as is because the allocation size is the SSA_NAME _1
>>>>>> (with the range [4, 4]) but the index is the SSA_NAME n_2 (with
>>>>>> the range [0, 4]; the range is [0, 4] because it was set based
>>>>>> on the size of the argument to the strlen() call well before
>>>>>> the strlen pass even ran).
>>>>> Which would tend to argue that we should forward propagate the constant
>>>>> to the uses of _1.  That should expose that the RHS of the
>>>>> assignment to
>>>>> n_2 is a constant as well.
>>>>>
>>>>>
>>>>>>
>>>>>> To make it work across assignments we need to propagate the strlen
>>>>>> results down the CFG somehow.  I'm hoping the on-demand VRP will
>>>>>> do this automagically.
>>>>> It would, but it's probably more heavyweight than we need.  We just
>>>>> need
>>>>> to forward propagate the discovered constant to the use points and pick
>>>>> up any secondary opportunities that arise.
>>>>
>>>> Yes.  And the usual way of doing this is to keep a constant-and-copy
>>>> lattice (and for copies you'd need to track availability) and before
>>>> optimizing
>>>> a stmt substitute its operands with the lattice contents.
>>>>
>>>> forwprop has a scheme that can be followed doing a RPO walk, strlen
>>>> does a DOM walk, there you can follow what DOM/PRE elimination do
>>>> (for tracking copy availability - if you just track constants you can
>>>> elide that).
>>> I'm also note sure handling copies is all that interesting here and if
>>> we just handle constants/invariants, then it's pretty easy.
>>>
>>> Whenever we replace a strlen call with a const, we put the LHS (assuming
>>> its an SSA_NAME) of the statement on a worklist.
>>>
>>> We pull items off the worklist and propagate the value to the use points
>>> and try to simplify the resulting statement.  If the RHS of the use
>>> point simplified to a constant, then put the LHS of the use statement
>>> onto the worklist.  Iterate until the list is empty.
>>>
>>> That would capture everything of interest I suspect and ought to be
>>> cheap.
>>
>> This sounds like a significant project of its own, and well beyond
>> the scope of the simple infrastructure enhancement I'm making here:
>> all this does is improve the accuracy of the diagnostics.  Is
>> implementing it a precondition for accepting this patch?
> It's not bad as probably 90%+ of the code could be cribbed from
> elsewhere and a simple API built around it.

Maybe.  I just don't expect to have the time to do it.  Plus, with
the on-demand VRP expected to land in GCC 11, I don't think it would
be time well spent.  We'd be adding code only to remove it a few
months from now.  (Unless, of course, the new VRP doesn't handle it.)

>
>>
>> If yes, since I don't think I have the time for it for GCC 10
>> I need to decide whether to drop just this improvement or all
>> of the buffer overflow checks that depend on it.
>>
>> Let me know which you prefer.
> As I mentioned in my previous message, I think we've got two potential
> paths.  We could just drop the problem hunk and adjust the tests with
> xfails, but I'm not sure how badly that compromises what you're trying
> to do.  We could also leave the hunk in with a fixme or somesuch.

It would compromise the buffer overflow detection in cases where
the length of a string computed by the pass is then used in the same
function to access another string.  There are more of these now as
the strlen has been getting better and better at tracking the string
lenghts.

Rather than using fixed size arrays, code that deals with strings
of unlimited lengths has to allocate space for them dynamically.
Until now, the strlen pass hasn't tried to detect or prevent
overflow into those (and transformed writes to them in ways that
prevented it from being detected later by _FORTIFY_SOURCE).  But
the follow-on enhancement to this patch changes that, and so this
limitation becomes more of an impediment.

But I don't understand why you think using the RHS of an SSA_NAME
assignment is a problem when it's an INTEGER_CST.  Is that unsafe
for some reason, or likely to fail somehow?

Martin
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] include size and offset in -Wstringop-overflow

Jeff Law
On 11/18/19 9:15 AM, Martin Sebor wrote:

> On 11/17/19 12:03 PM, Jeff Law wrote:
>> On 11/12/19 12:55 PM, Martin Sebor wrote:
>>> On 11/12/19 10:54 AM, Jeff Law wrote:
>>>> On 11/12/19 1:15 AM, Richard Biener wrote:
>>>>> On Tue, Nov 12, 2019 at 6:10 AM Jeff Law <[hidden email]> wrote:
>>>>>>
>>>>>> On 11/6/19 3:34 PM, Martin Sebor wrote:
>>>>>>> On 11/6/19 2:06 PM, Martin Sebor wrote:
>>>>>>>> On 11/6/19 1:39 PM, Jeff Law wrote:
>>>>>>>>> On 11/6/19 1:27 PM, Martin Sebor wrote:
>>>>>>>>>> On 11/6/19 11:55 AM, Jeff Law wrote:
>>>>>>>>>>> On 11/6/19 11:00 AM, Martin Sebor wrote:
>>>>>>>>>>>> The -Wstringop-overflow warnings for single-byte and multi-byte
>>>>>>>>>>>> stores mention the amount of data being stored and the
>>>>>>>>>>>> amount of
>>>>>>>>>>>> space remaining in the destination, such as:
>>>>>>>>>>>>
>>>>>>>>>>>> warning: writing 4 bytes into a region of size 0
>>>>>>>>>>>> [-Wstringop-overflow=]
>>>>>>>>>>>>
>>>>>>>>>>>>       123 |   *p = 0;
>>>>>>>>>>>>           |   ~~~^~~
>>>>>>>>>>>> note: destination object declared here
>>>>>>>>>>>>        45 |   char b[N];
>>>>>>>>>>>>           |        ^
>>>>>>>>>>>>
>>>>>>>>>>>> A warning like this can take some time to analyze.  First, the
>>>>>>>>>>>> size
>>>>>>>>>>>> of the destination isn't mentioned and may not be easy to tell
>>>>>>>>>>>> from
>>>>>>>>>>>> the sources.  In the note above, when N's value is the
>>>>>>>>>>>> result of
>>>>>>>>>>>> some non-trivial computation, chasing it down may be a small
>>>>>>>>>>>> project
>>>>>>>>>>>> in and of itself.  Second, it's also not clear why the region
>>>>>>>>>>>> size
>>>>>>>>>>>> is zero.  It could be because the offset is exactly N, or
>>>>>>>>>>>> because
>>>>>>>>>>>> it's negative, or because it's in some range greater than N.
>>>>>>>>>>>>
>>>>>>>>>>>> Mentioning both the size of the destination object and the
>>>>>>>>>>>> offset
>>>>>>>>>>>> makes the existing messages clearer, are will become essential
>>>>>>>>>>>> when
>>>>>>>>>>>> GCC starts diagnosing overflow into allocated buffers (as my
>>>>>>>>>>>> follow-on patch does).
>>>>>>>>>>>>
>>>>>>>>>>>> The attached patch enhances -Wstringop-overflow to do this by
>>>>>>>>>>>> letting compute_objsize return the offset to its caller, doing
>>>>>>>>>>>> something similar in get_stridx, and adding a new function to
>>>>>>>>>>>> the strlen pass to issue this enhanced warning (eventually, I'd
>>>>>>>>>>>> like the function to replace the -Wstringop-overflow handler in
>>>>>>>>>>>> builtins.c).  With the change, the note above might read
>>>>>>>>>>>> something
>>>>>>>>>>>> like:
>>>>>>>>>>>>
>>>>>>>>>>>> note: at offset 11 to object ‘b’ with size 8 declared here
>>>>>>>>>>>>        45 |   char b[N];
>>>>>>>>>>>>           |        ^
>>>>>>>>>>>>
>>>>>>>>>>>> Tested on x86_64-linux.
>>>>>>>>>>>>
>>>>>>>>>>>> Martin
>>>>>>>>>>>>
>>>>>>>>>>>> gcc-store-offset.diff
>>>>>>>>>>>>
>>>>>>>>>>>> gcc/ChangeLog:
>>>>>>>>>>>>
>>>>>>>>>>>>        * builtins.c (compute_objsize): Add an argument and set
>>>>>>>>>>>> it to
>>>>>>>>>>>> offset
>>>>>>>>>>>>        into destination.
>>>>>>>>>>>>        * builtins.h (compute_objsize): Add an argument.
>>>>>>>>>>>>        * tree-object-size.c (addr_object_size): Add an argument
>>>>>>>>>>>> and
>>>>>>>>>>>> set it
>>>>>>>>>>>>        to offset into destination.
>>>>>>>>>>>>        (compute_builtin_object_size): Same.
>>>>>>>>>>>>        * tree-object-size.h (compute_builtin_object_size):
>>>>>>>>>>>> Add an
>>>>>>>>>>>> argument.
>>>>>>>>>>>>        * tree-ssa-strlen.c (get_addr_stridx): Add an
>>>>>>>>>>>> argument and
>>>>>>>>>>>> set it
>>>>>>>>>>>>        to offset into destination.
>>>>>>>>>>>>        (maybe_warn_overflow): New function.
>>>>>>>>>>>>        (handle_store): Call maybe_warn_overflow to issue
>>>>>>>>>>>> warnings.
>>>>>>>>>>>>
>>>>>>>>>>>> gcc/testsuite/ChangeLog:
>>>>>>>>>>>>
>>>>>>>>>>>>        * c-c++-common/Wstringop-overflow-2.c: Adjust text of
>>>>>>>>>>>> expected
>>>>>>>>>>>> messages.
>>>>>>>>>>>>        * g++.dg/warn/Wstringop-overflow-3.C: Same.
>>>>>>>>>>>>        * gcc.dg/Wstringop-overflow-17.c: Same.
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> Index: gcc/tree-ssa-strlen.c
>>>>>>>>>>>> ===================================================================
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> --- gcc/tree-ssa-strlen.c    (revision 277886)
>>>>>>>>>>>> +++ gcc/tree-ssa-strlen.c    (working copy)
>>>>>>>>>>>> @@ -189,6 +189,52 @@ struct laststmt_struct
>>>>>>>>>>>>      static int get_stridx_plus_constant (strinfo *, unsigned
>>>>>>>>>>>> HOST_WIDE_INT, tree);
>>>>>>>>>>>>      static void handle_builtin_stxncpy (built_in_function,
>>>>>>>>>>>> gimple_stmt_iterator *);
>>>>>>>>>>>>      +/* Sets MINMAX to either the constant value or the
>>>>>>>>>>>> range VAL
>>>>>>>>>>>> is in
>>>>>>>>>>>> +   and returns true on success.  */
>>>>>>>>>>>> +
>>>>>>>>>>>> +static bool
>>>>>>>>>>>> +get_range (tree val, wide_int minmax[2], const vr_values
>>>>>>>>>>>> *rvals =
>>>>>>>>>>>> NULL)
>>>>>>>>>>>> +{
>>>>>>>>>>>> +  if (tree_fits_uhwi_p (val))
>>>>>>>>>>>> +    {
>>>>>>>>>>>> +      minmax[0] = minmax[1] = wi::to_wide (val);
>>>>>>>>>>>> +      return true;
>>>>>>>>>>>> +    }
>>>>>>>>>>>> +
>>>>>>>>>>>> +  if (TREE_CODE (val) != SSA_NAME)
>>>>>>>>>>>> +    return false;
>>>>>>>>>>>> +
>>>>>>>>>>>> +  if (rvals)
>>>>>>>>>>>> +    {
>>>>>>>>>>>> +      gimple *def = SSA_NAME_DEF_STMT (val);
>>>>>>>>>>>> +      if (gimple_assign_single_p (def)
>>>>>>>>>>>> +      && gimple_assign_rhs_code (def) == INTEGER_CST)
>>>>>>>>>>>> +    {
>>>>>>>>>>>> +      /* get_value_range returns [0, N] for constant
>>>>>>>>>>>> assignments.  */
>>>>>>>>>>>> +      val = gimple_assign_rhs1 (def);
>>>>>>>>>>>> +      minmax[0] = minmax[1] = wi::to_wide (val);
>>>>>>>>>>>> +      return true;
>>>>>>>>>>>> +    }
>>>>>>>>>>> Umm, something seems really off with this hunk.  If the
>>>>>>>>>>> SSA_NAME is
>>>>>>>>>>> set
>>>>>>>>>>> via a simple constant assignment, then the range ought to be a
>>>>>>>>>>> singleton
>>>>>>>>>>> ie [CONST,CONST].   Is there are particular test were this is
>>>>>>>>>>> not
>>>>>>>>>>> true?
>>>>>>>>>>>
>>>>>>>>>>> The only way offhand I could see this happening is if originally
>>>>>>>>>>> the RHS
>>>>>>>>>>> wasn't a constant, but due to optimizations it either simplified
>>>>>>>>>>> into a
>>>>>>>>>>> constant or a constant was propagated into an SSA_NAME
>>>>>>>>>>> appearing on
>>>>>>>>>>> the
>>>>>>>>>>> RHS.  This would have to happen between the last range analysis
>>>>>>>>>>> and
>>>>>>>>>>> the
>>>>>>>>>>> point where you're making this query.
>>>>>>>>>>
>>>>>>>>>> Yes, I think that's right.  Here's an example where it happens:
>>>>>>>>>>
>>>>>>>>>>      void f (void)
>>>>>>>>>>      {
>>>>>>>>>>        char s[] = "1234";
>>>>>>>>>>        unsigned n = strlen (s);
>>>>>>>>>>        char vla[n];   // or malloc (n)
>>>>>>>>>>        vla[n] = 0;    // n = [4, 4]
>>>>>>>>>>        ...
>>>>>>>>>>      }
>>>>>>>>>>
>>>>>>>>>> The strlen call is folded to 4 but that's not propagated to
>>>>>>>>>> the access until sometime after the strlen pass is done.
>>>>>>>>> Hmm.  Are we calling set_range_info in that case?  That goes
>>>>>>>>> behind the
>>>>>>>>> back of pass instance of vr_values.  If so, that might argue we
>>>>>>>>> want to
>>>>>>>>> be setting it in vr_values too.
>>>>>>>>
>>>>>>>> No, set_range_info is only called for ranges.  In this case,
>>>>>>>> handle_builtin_strlen replaces the strlen() call with 4:
>>>>>>>>
>>>>>>>>      s = "1234";
>>>>>>>>      _1 = __builtin_strlen (&s);
>>>>>>>>      n_2 = (unsigned int) _1;
>>>>>>>>      a.1_8 = __builtin_alloca_with_align (_1, 8);
>>>>>>>>      (*a.1_8)[n_2] = 0;
>>>>>>>>
>>>>>>>> When the access is made, the __builtin_alloca_with_align call
>>>>>>>> is found as the destination and the _1 SSA_NAME is used to
>>>>>>>> get its size.  We get back the range [4, 4].
>>>>>>>
>>>>>>> By the way, I glossed over one detail.  The above doesn't work
>>>>>>> exactly as is because the allocation size is the SSA_NAME _1
>>>>>>> (with the range [4, 4]) but the index is the SSA_NAME n_2 (with
>>>>>>> the range [0, 4]; the range is [0, 4] because it was set based
>>>>>>> on the size of the argument to the strlen() call well before
>>>>>>> the strlen pass even ran).
>>>>>> Which would tend to argue that we should forward propagate the
>>>>>> constant
>>>>>> to the uses of _1.  That should expose that the RHS of the
>>>>>> assignment to
>>>>>> n_2 is a constant as well.
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> To make it work across assignments we need to propagate the strlen
>>>>>>> results down the CFG somehow.  I'm hoping the on-demand VRP will
>>>>>>> do this automagically.
>>>>>> It would, but it's probably more heavyweight than we need.  We just
>>>>>> need
>>>>>> to forward propagate the discovered constant to the use points and
>>>>>> pick
>>>>>> up any secondary opportunities that arise.
>>>>>
>>>>> Yes.  And the usual way of doing this is to keep a constant-and-copy
>>>>> lattice (and for copies you'd need to track availability) and before
>>>>> optimizing
>>>>> a stmt substitute its operands with the lattice contents.
>>>>>
>>>>> forwprop has a scheme that can be followed doing a RPO walk, strlen
>>>>> does a DOM walk, there you can follow what DOM/PRE elimination do
>>>>> (for tracking copy availability - if you just track constants you can
>>>>> elide that).
>>>> I'm also note sure handling copies is all that interesting here and if
>>>> we just handle constants/invariants, then it's pretty easy.
>>>>
>>>> Whenever we replace a strlen call with a const, we put the LHS
>>>> (assuming
>>>> its an SSA_NAME) of the statement on a worklist.
>>>>
>>>> We pull items off the worklist and propagate the value to the use
>>>> points
>>>> and try to simplify the resulting statement.  If the RHS of the use
>>>> point simplified to a constant, then put the LHS of the use statement
>>>> onto the worklist.  Iterate until the list is empty.
>>>>
>>>> That would capture everything of interest I suspect and ought to be
>>>> cheap.
>>>
>>> This sounds like a significant project of its own, and well beyond
>>> the scope of the simple infrastructure enhancement I'm making here:
>>> all this does is improve the accuracy of the diagnostics.  Is
>>> implementing it a precondition for accepting this patch?
>> It's not bad as probably 90%+ of the code could be cribbed from
>> elsewhere and a simple API built around it.
>
> Maybe.  I just don't expect to have the time to do it.  Plus, with
> the on-demand VRP expected to land in GCC 11, I don't think it would
> be time well spent.  We'd be adding code only to remove it a few
> months from now.  (Unless, of course, the new VRP doesn't handle it.)
>
>>
>>>
>>> If yes, since I don't think I have the time for it for GCC 10
>>> I need to decide whether to drop just this improvement or all
>>> of the buffer overflow checks that depend on it.
>>>
>>> Let me know which you prefer.
>> As I mentioned in my previous message, I think we've got two potential
>> paths.  We could just drop the problem hunk and adjust the tests with
>> xfails, but I'm not sure how badly that compromises what you're trying
>> to do.  We could also leave the hunk in with a fixme or somesuch.
>
> It would compromise the buffer overflow detection in cases where
> the length of a string computed by the pass is then used in the same
> function to access another string.  There are more of these now as
> the strlen has been getting better and better at tracking the string
> lenghts.
>
> Rather than using fixed size arrays, code that deals with strings
> of unlimited lengths has to allocate space for them dynamically.
> Until now, the strlen pass hasn't tried to detect or prevent
> overflow into those (and transformed writes to them in ways that
> prevented it from being detected later by _FORTIFY_SOURCE).  But
> the follow-on enhancement to this patch changes that, and so this
> limitation becomes more of an impediment.
Maybe I would ask the question differently.  In real world scenarios how
often does this happen?  How about in tests you're adding?  Is there
value in the patch without this hunk?

Essentially I'm trying to make a determination if the patch and its
follow-ups have value without this hack.

>
> But I don't understand why you think using the RHS of an SSA_NAME
> assignment is a problem when it's an INTEGER_CST.  Is that unsafe
> for some reason, or likely to fail somehow?
Because it's just working around a failing in our optimizers and a
fairly trivial one at that.  I'd rather fix the real problem rather than
just paper over it.

jeff

Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] include size and offset in -Wstringop-overflow

Martin Sebor-2
>>>> If yes, since I don't think I have the time for it for GCC 10

>>>> I need to decide whether to drop just this improvement or all
>>>> of the buffer overflow checks that depend on it.
>>>>
>>>> Let me know which you prefer.
>>> As I mentioned in my previous message, I think we've got two potential
>>> paths.  We could just drop the problem hunk and adjust the tests with
>>> xfails, but I'm not sure how badly that compromises what you're trying
>>> to do.  We could also leave the hunk in with a fixme or somesuch.
>>
>> It would compromise the buffer overflow detection in cases where
>> the length of a string computed by the pass is then used in the same
>> function to access another string.  There are more of these now as
>> the strlen has been getting better and better at tracking the string
>> lenghts.
>>
>> Rather than using fixed size arrays, code that deals with strings
>> of unlimited lengths has to allocate space for them dynamically.
>> Until now, the strlen pass hasn't tried to detect or prevent
>> overflow into those (and transformed writes to them in ways that
>> prevented it from being detected later by _FORTIFY_SOURCE).  But
>> the follow-on enhancement to this patch changes that, and so this
>> limitation becomes more of an impediment.
> Maybe I would ask the question differently.  In real world scenarios how
> often does this happen?  How about in tests you're adding?  Is there
> value in the patch without this hunk?
To reiterate what we discussed privately: the patches in this
series (the dynamic buffer overflow detection) have considerable
value even without the hunk.  The detection will work correctly
in most cases.  It will fail and cause false negatives in
the specific cases I mentioned uptread, namely:

where the out-of-bounds store is to a character array whose size
or offset depend on the length of a string previously computed
by the strlen pass.  The class of these cases is large but, due
to the limitation (it has to be the same SSA_NAME that's used
in both the allocation context and the destination offset),
the subset where it is effective is relatively small.

>
> Essentially I'm trying to make a determination if the patch and its
> follow-ups have value without this hack.
>
>>
>> But I don't understand why you think using the RHS of an SSA_NAME
>> assignment is a problem when it's an INTEGER_CST.  Is that unsafe
>> for some reason, or likely to fail somehow?
> Because it's just working around a failing in our optimizers and a
> fairly trivial one at that.  I'd rather fix the real problem rather than
> just paper over it.
The robust solution is to implement the full constant propagation
in (or for) the strlen pass.  Hopefully, the on-demand VRP will
give us that in GCC 11.  If not, I will look into implementing
it myself.

Since the workaround stands in the way of getting the patch
approved I have removed it.  Attached is a revised patch, rebased
on the top of trunk and retested on x86_64-linux.

Martin

gcc-store-offset.diff (33K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] include size and offset in -Wstringop-overflow

Martin Sebor-2
On 12/5/19 4:37 PM, Martin Sebor wrote:

>>>>> If yes, since I don't think I have the time for it for GCC 10
>>>>> I need to decide whether to drop just this improvement or all
>>>>> of the buffer overflow checks that depend on it.
>>>>>
>>>>> Let me know which you prefer.
>>>> As I mentioned in my previous message, I think we've got two potential
>>>> paths.  We could just drop the problem hunk and adjust the tests with
>>>> xfails, but I'm not sure how badly that compromises what you're trying
>>>> to do.  We could also leave the hunk in with a fixme or somesuch.
>>>
>>> It would compromise the buffer overflow detection in cases where
>>> the length of a string computed by the pass is then used in the same
>>> function to access another string.  There are more of these now as
>>> the strlen has been getting better and better at tracking the string
>>> lenghts.
>>>
>>> Rather than using fixed size arrays, code that deals with strings
>>> of unlimited lengths has to allocate space for them dynamically.
>>> Until now, the strlen pass hasn't tried to detect or prevent
>>> overflow into those (and transformed writes to them in ways that
>>> prevented it from being detected later by _FORTIFY_SOURCE).  But
>>> the follow-on enhancement to this patch changes that, and so this
>>> limitation becomes more of an impediment.
>> Maybe I would ask the question differently.  In real world scenarios how
>> often does this happen?  How about in tests you're adding?  Is there
>> value in the patch without this hunk?
>
> To reiterate what we discussed privately: the patches in this
> series (the dynamic buffer overflow detection) have considerable
> value even without the hunk.  The detection will work correctly
> in most cases.  It will fail and cause false negatives in
> the specific cases I mentioned uptread, namely:
>
> where the out-of-bounds store is to a character array whose size
> or offset depend on the length of a string previously computed
> by the strlen pass.  The class of these cases is large but, due
> to the limitation (it has to be the same SSA_NAME that's used
> in both the allocation context and the destination offset),
> the subset where it is effective is relatively small.
>
>>
>> Essentially I'm trying to make a determination if the patch and its
>> follow-ups have value without this hack.
>>
>>>
>>> But I don't understand why you think using the RHS of an SSA_NAME
>>> assignment is a problem when it's an INTEGER_CST.  Is that unsafe
>>> for some reason, or likely to fail somehow?
>> Because it's just working around a failing in our optimizers and a
>> fairly trivial one at that.  I'd rather fix the real problem rather than
>> just paper over it.
>
> The robust solution is to implement the full constant propagation
> in (or for) the strlen pass.  Hopefully, the on-demand VRP will
> give us that in GCC 11.  If not, I will look into implementing
> it myself.
>
> Since the workaround stands in the way of getting the patch
> approved I have removed it.  Attached is a revised patch, rebased
> on the top of trunk and retested on x86_64-linux.

I have committed this in r279248 with Jeff's off-list approval.

Martin
12