[RFA] [tree-optimization/80576] Handle non-constant sizes in DSE

classic Classic list List threaded Threaded
21 messages Options
12
Reply | Threaded
Open this post in threaded view
|

[RFA] [tree-optimization/80576] Handle non-constant sizes in DSE

Jeff Law

This patch improves our ability to detect dead stores by handling cases
where the size memcpy, memset, strncpy, etc call is not constant.  This
addresses some, but not all, of the issues in 80576.

The key here is when the size is not constant we can make conservative
decisions that still give us a chance to analyze the code for dead stores.

Remember that for dead store elimination, we're trying to prove that
given two stores, the second store overwrites (partially or fully) the
same memory locations as the first store.  That makes the first store
either partially or fully dead.

When we encounter the first store, we set up a bitmap of bytes written
by that store (live_bytes).  We then look at subsequent stores and clear
the appropriate entries in the bitmap.

If the first store has a nonconstant length argument we can use the
range of the length argument (max) and the size of the destination
object to make a conservative estimation of how many bytes are written.

For the second store the conservative thing to do for a non-constant
length is to use the minimum of the range of the length argument.

This doesn't come up a lot in practice.  But it also happens to put some
of the infrastructure in place to handle strcpy and strcpy_chk which are
needed to fully resolve 80576.

Bootstrapped and regression tested on x86, x86_64, ppc64le, ppc64,
ppc32, aarch64, sparc, s390x and probably others.  Also verified that
the tests work on the various *-elf targets in my tester.

OK for the trunk?

Jeff
Reply | Threaded
Open this post in threaded view
|

Re: [RFA] [tree-optimization/80576] Handle non-constant sizes in DSE

Marc Glisse-6
On Fri, 16 Aug 2019, Jeff Law wrote:

> This patch improves our ability to detect dead stores by handling cases
> where the size memcpy, memset, strncpy, etc call is not constant.  This
> addresses some, but not all, of the issues in 80576.
>
> The key here is when the size is not constant we can make conservative
> decisions that still give us a chance to analyze the code for dead stores.
>
> Remember that for dead store elimination, we're trying to prove that
> given two stores, the second store overwrites (partially or fully) the
> same memory locations as the first store.  That makes the first store
> either partially or fully dead.
>
> When we encounter the first store, we set up a bitmap of bytes written
> by that store (live_bytes).  We then look at subsequent stores and clear
> the appropriate entries in the bitmap.
>
> If the first store has a nonconstant length argument we can use the
> range of the length argument (max) and the size of the destination
> object to make a conservative estimation of how many bytes are written.
>
> For the second store the conservative thing to do for a non-constant
> length is to use the minimum of the range of the length argument.

So I guess it won't handle things like

void f(char*p,int n){
   __builtin_memset(p,3,n);
   __builtin_memset(p,7,n);
}

where we know nothing about the length, except that it is the same? Or do
you look at symbolic ranges?

> This doesn't come up a lot in practice.  But it also happens to put some
> of the infrastructure in place to handle strcpy and strcpy_chk which are
> needed to fully resolve 80576.
>
> Bootstrapped and regression tested on x86, x86_64, ppc64le, ppc64,
> ppc32, aarch64, sparc, s390x and probably others.  Also verified that
> the tests work on the various *-elf targets in my tester.
>
> OK for the trunk?

ENOPATCH

--
Marc Glisse
Reply | Threaded
Open this post in threaded view
|

Re: [RFA] [tree-optimization/80576] Handle non-constant sizes in DSE

Jeff Law
On 8/16/19 12:09 PM, Marc Glisse wrote:

> On Fri, 16 Aug 2019, Jeff Law wrote:
>
>> This patch improves our ability to detect dead stores by handling cases
>> where the size memcpy, memset, strncpy, etc call is not constant.  This
>> addresses some, but not all, of the issues in 80576.
>>
>> The key here is when the size is not constant we can make conservative
>> decisions that still give us a chance to analyze the code for dead
>> stores.
>>
>> Remember that for dead store elimination, we're trying to prove that
>> given two stores, the second store overwrites (partially or fully) the
>> same memory locations as the first store.  That makes the first store
>> either partially or fully dead.
>>
>> When we encounter the first store, we set up a bitmap of bytes written
>> by that store (live_bytes).  We then look at subsequent stores and clear
>> the appropriate entries in the bitmap.
>>
>> If the first store has a nonconstant length argument we can use the
>> range of the length argument (max) and the size of the destination
>> object to make a conservative estimation of how many bytes are written.
>>
>> For the second store the conservative thing to do for a non-constant
>> length is to use the minimum of the range of the length argument.
>
> So I guess it won't handle things like
>
> void f(char*p,int n){
>   __builtin_memset(p,3,n);
>   __builtin_memset(p,7,n);
> }
>
> where we know nothing about the length, except that it is the same? Or
> do you look at symbolic ranges?
Nope.  I think ao_ref can represent that, so it'd just be a matter of
recording "n" as the length, then verifying that the second call's
length is "n" as well.  That makes the first call dead.  We'd have to
bypass the byte tracking in that case, but I think that's trivial
because we already have a means to do that when the sizes are too large.

>
>> This doesn't come up a lot in practice.  But it also happens to put some
>> of the infrastructure in place to handle strcpy and strcpy_chk which are
>> needed to fully resolve 80576.
>>
>> Bootstrapped and regression tested on x86, x86_64, ppc64le, ppc64,
>> ppc32, aarch64, sparc, s390x and probably others.  Also verified that
>> the tests work on the various *-elf targets in my tester.
>>
>> OK for the trunk?
>
> ENOPATCH
Opps.    Attached.

Jeff



        PR tree-optimizatoin/80576
        * tree-ssa-dse.c: Include builtins.h and gimple-fold.h.
        (objsize_from_type): New function.
        (initialize_ao_ref_for_dse): Add new argument.  Handle non
        constant sizes for currently supported builtin calls.
        (clear_bytes_written_by): Pass new argument to
        initialize_ao_ref_for_dse.
        (dse_optimize_redundant_stores): Likewise.
        (dse_dom_walker::dse_optimize_stmt): Likewise.  Do not trim
        calls if the size is not constant.

        * gcc.dg/tree-ssa/ssa-dse-39.c: New test.
        * gcc.dg/tree-ssa/ssa-dse-40.c: New test.

diff --git a/gcc/tree-ssa-dse.c b/gcc/tree-ssa-dse.c
index 5b7c4fc6d1a..ae03980f792 100644
--- a/gcc/tree-ssa-dse.c
+++ b/gcc/tree-ssa-dse.c
@@ -36,6 +36,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "params.h"
 #include "alias.h"
 #include "tree-ssa-loop.h"
+#include "builtins.h"
+#include "gimple-fold.h"
 
 /* This file implements dead store elimination.
 
@@ -91,16 +93,47 @@ enum dse_store_status
   DSE_STORE_DEAD
 };
 
+/* OBJECT is an ADDR_EXPR.  If its underlying type is an array,
+   return the size of the array in bytes.  */
+
+static tree
+objsize_from_type (tree object)
+{
+  if (TREE_CODE (object) != ADDR_EXPR)
+    return NULL_TREE;
+
+  tree type = TREE_TYPE (object);
+  if (POINTER_TYPE_P (type))
+    type = TREE_TYPE (type);
+
+  type = TYPE_MAIN_VARIANT (type);
+
+  if (TREE_CODE (type) == ARRAY_TYPE && !array_at_struct_end_p (object))
+    {
+      tree t = TYPE_SIZE_UNIT (type);
+      if (t && TREE_CODE (t) == INTEGER_CST && !integer_zerop (t))
+ return t;
+    }
+
+  return NULL_TREE;
+}
+
 /* STMT is a statement that may write into memory.  Analyze it and
    initialize WRITE to describe how STMT affects memory.
 
+   If MAXLEN is true, then we are computing how many bytes this write
+   might perform.  When false we are computing the minimum number of
+   bytes this write may perform.  This only matters for calls like
+   strcpy where the number of bytes written is determined by the length
+   of the input string operand.
+
    Return TRUE if the the statement was analyzed, FALSE otherwise.
 
    It is always safe to return FALSE.  But typically better optimziation
    can be achieved by analyzing more statements.  */
 
 static bool
-initialize_ao_ref_for_dse (gimple *stmt, ao_ref *write)
+initialize_ao_ref_for_dse (gimple *stmt, ao_ref *write, bool maxlen)
 {
   /* It's advantageous to handle certain mem* functions.  */
   if (gimple_call_builtin_p (stmt, BUILT_IN_NORMAL))
@@ -117,8 +150,34 @@ initialize_ao_ref_for_dse (gimple *stmt, ao_ref *write)
  case BUILT_IN_STRNCPY_CHK:
   {
     tree size = gimple_call_arg (stmt, 2);
-    tree ptr = gimple_call_arg (stmt, 0);
-    ao_ref_init_from_ptr_and_size (write, ptr, size);
+    tree dest = gimple_call_arg (stmt, 0);
+
+    if (TREE_CODE (size) != INTEGER_CST)
+      {
+ wide_int minbound, maxbound;
+ value_range_kind rng = get_range_info (size, &minbound, &maxbound);
+ if (rng == VR_RANGE)
+  size = wide_int_to_tree (TREE_TYPE (size),
+   maxlen ? maxbound : minbound);
+      }
+
+    /* Constrain the maximum length to the size of the destination
+       object.  This is primarily useful when we have a nonconstant
+       range which might be somewhat pessimistic.  */
+    if (maxlen)
+      {
+ tree type_size = objsize_from_type (dest);
+ if (TREE_CODE (size) != INTEGER_CST)
+  size = type_size;
+ else if (type_size)
+  size = fold_build2 (MIN_EXPR, TREE_TYPE (dest),
+      size, type_size);
+      }
+
+    if (size == NULL_TREE || TREE_CODE (size) != INTEGER_CST)
+      return false;
+
+    ao_ref_init_from_ptr_and_size (write, dest, size);
     return true;
   }
 
@@ -219,7 +278,7 @@ static void
 clear_bytes_written_by (sbitmap live_bytes, gimple *stmt, ao_ref *ref)
 {
   ao_ref write;
-  if (!initialize_ao_ref_for_dse (stmt, &write))
+  if (!initialize_ao_ref_for_dse (stmt, &write, false))
     return;
 
   /* Verify we have the same base memory address, the write
@@ -641,7 +700,7 @@ dse_optimize_redundant_stores (gimple *stmt)
  {
   ao_ref write;
 
-  if (!initialize_ao_ref_for_dse (use_stmt, &write))
+  if (!initialize_ao_ref_for_dse (use_stmt, &write, false))
     BREAK_FROM_IMM_USE_STMT (ui)
 
   if (valid_ao_ref_for_dse (&write)
@@ -956,7 +1015,7 @@ dse_dom_walker::dse_optimize_stmt (gimple_stmt_iterator *gsi)
     return;
 
   ao_ref ref;
-  if (!initialize_ao_ref_for_dse (stmt, &ref))
+  if (!initialize_ao_ref_for_dse (stmt, &ref, true))
     return;
 
   /* We know we have virtual definitions.  We can handle assignments and
@@ -1002,7 +1061,13 @@ dse_dom_walker::dse_optimize_stmt (gimple_stmt_iterator *gsi)
     if (store_status == DSE_STORE_LIVE)
       return;
 
-    if (store_status == DSE_STORE_MAYBE_PARTIAL_DEAD)
+    /* If this store did not have a constant size, then do not
+       try to trim the partially redundant store.
+
+       We could possible compute a new size at runtime which
+       could be faster because we avoid memory traffic.  */
+    if (TREE_CODE (size) == INTEGER_CST
+ && store_status == DSE_STORE_MAYBE_PARTIAL_DEAD)
       {
  maybe_trim_memstar_call (&ref, m_live_bytes, stmt);
  return;
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-39.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-39.c
new file mode 100644
index 00000000000..0b0fa618844
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-39.c
@@ -0,0 +1,44 @@
+/* { dg-options "-O2 -fdump-tree-dse-details" } */
+
+extern void frob (char *);
+
+void g (char *s, int x)
+{
+  char a[8];
+  __builtin_strncpy (a, s, x);
+  __builtin_memset (a, 0, sizeof a);
+  frob (a);
+}
+
+void h (char *s, int x)
+{
+  char a[8];
+  __builtin_memset (a, 0, x);
+  __builtin_strncpy (a, s, sizeof a);
+  frob (a);
+}
+
+void i (char *s, int x)
+{
+  char a[8];
+  __builtin_strncpy (a, s, sizeof a);
+  __builtin_memset (a, 0, x);
+  frob (a);
+}
+
+void j (char *s, int x)
+{
+  char a[8];
+  __builtin_memset (a, 0, sizeof a);
+  __builtin_strncpy (a, s, x);
+  frob (a);
+}
+
+/* We can remove the dead stores/calls in the first two functions
+   since in both cases the second call wipes the entire object.
+
+   There's nothing we can do in the last two functions since we
+   know nothing about the extent of the second store.  */
+/* { dg-final { scan-tree-dump-times "Deleted dead call" 2 "dse1" } } */
+/* { dg-final { scan-tree-dump-not "Trimming statement " "dse1" } } */
+
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-40.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-40.c
new file mode 100644
index 00000000000..0f431b8089f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-40.c
@@ -0,0 +1,63 @@
+/* { dg-options "-O2 -fdump-tree-dse-details" } */
+
+extern void frob (char *);
+
+void g (char *s, int x)
+{
+  char a[8];
+  __builtin_strncpy (a, s, sizeof a);
+  __builtin_memset (a, 0, x ? 8 : 10);
+  frob (a);
+}
+
+void h (char *s, int x)
+{
+  char a[8];
+  __builtin_memset (a, 0, sizeof a);
+  __builtin_strncpy (a, s, x ? 8 : 10);
+  frob (a);
+}
+
+void i (char *s, int x)
+{
+  char a[8];
+  __builtin_strncpy (a, s, sizeof a);
+  __builtin_memset (a, 0, x ? 6 : 8);
+  frob (a);
+}
+
+void j (char *s, int x)
+{
+  char a[8];
+  __builtin_memset (a, 0, sizeof a);
+  __builtin_strncpy (a, s, x ? 6 : 8);
+  frob (a);
+}
+
+void k (char *s, int x)
+{
+  char a[8];
+  __builtin_strncpy (a, s, sizeof a);
+  __builtin_memset (&a[6], 0, x ? 2 : 4);
+  frob (a);
+}
+
+void l (char *s, int x)
+{
+  char a[8];
+  __builtin_memset (a, 0, sizeof a);
+  __builtin_strncpy (&a[6], s, x ? 2 : 4);
+  frob (a);
+}
+
+/* We can remove the dead stores/calls in the first two functions
+   since in both cases the second call wipes the entire object.
+
+   In the 3rd and 4th case we can trim up to 6 bytes off the
+   head of the store.
+
+   In the last two tests we can trim 2 bytes from the tail.  */
+
+/* { dg-final { scan-tree-dump-times "Deleted dead call" 2 "dse1" } } */
+/* { dg-final { scan-tree-dump-times "Trimming statement"  4 "dse1" } } */
+
Reply | Threaded
Open this post in threaded view
|

Re: [RFA] [tree-optimization/80576] Handle non-constant sizes in DSE

Jeff Law
In reply to this post by Marc Glisse-6
On 8/16/19 12:09 PM, Marc Glisse wrote:

> On Fri, 16 Aug 2019, Jeff Law wrote:
>
>> This patch improves our ability to detect dead stores by handling cases
>> where the size memcpy, memset, strncpy, etc call is not constant.  This
>> addresses some, but not all, of the issues in 80576.
>>
>> The key here is when the size is not constant we can make conservative
>> decisions that still give us a chance to analyze the code for dead
>> stores.
>>
>> Remember that for dead store elimination, we're trying to prove that
>> given two stores, the second store overwrites (partially or fully) the
>> same memory locations as the first store.  That makes the first store
>> either partially or fully dead.
>>
>> When we encounter the first store, we set up a bitmap of bytes written
>> by that store (live_bytes).  We then look at subsequent stores and clear
>> the appropriate entries in the bitmap.
>>
>> If the first store has a nonconstant length argument we can use the
>> range of the length argument (max) and the size of the destination
>> object to make a conservative estimation of how many bytes are written.
>>
>> For the second store the conservative thing to do for a non-constant
>> length is to use the minimum of the range of the length argument.
>
> So I guess it won't handle things like
>
> void f(char*p,int n){
>   __builtin_memset(p,3,n);
>   __builtin_memset(p,7,n);
> }
>
> where we know nothing about the length, except that it is the same? Or
> do you look at symbolic ranges?

So handling this was slightly uglier than I'd hoped.  I mis-remembered
what we had in an ao_ref.  We have a way to describe constant sizes and
to mark that something was non-constant (size of -1), but not what the
non-constant value was.

I looked at two approaches.  One created a dse_ref structure that had an
embedded ao_ref.  That creates a fair amount of churn.  But it certainly
looks do-able.

The second derived a dse_ref from an ao_ref which allows the vast
majority of the code in tree-ssa-dse.c to just work as-is.  With that in
place it was fairly simply to initialize the new field and check it in a
couple places.  Resulting in:

> ; Function f (f, funcdef_no=0, decl_uid=1909, cgraph_uid=1, symbol_order=0)
>
>   Deleted dead call: __builtin_memset (p_5(D), 3, _1);
>
> f (char * p, int n)
> {
>   long unsigned int _1;
>
> ;;   basic block 2, loop depth 0, maybe hot
> ;;    prev block 0, next block 1, flags: (NEW, VISITED)
> ;;    pred:       ENTRY (FALLTHRU,EXECUTABLE)
>   _1 = (long unsigned int) n_3(D);
>   __builtin_memset (p_5(D), 7, _1);
>   return;
> ;;    succ:       EXIT (EXECUTABLE)
>
> }
>

Not sure how useful this will be in practice though.

Jeff
Reply | Threaded
Open this post in threaded view
|

Re: [RFA] [tree-optimization/80576] Handle non-constant sizes in DSE

Marc Glisse-6
On Fri, 16 Aug 2019, Jeff Law wrote:

> On 8/16/19 12:09 PM, Marc Glisse wrote:
>> On Fri, 16 Aug 2019, Jeff Law wrote:
>>
>>> This patch improves our ability to detect dead stores by handling cases
>>> where the size memcpy, memset, strncpy, etc call is not constant.  This
>>> addresses some, but not all, of the issues in 80576.
>>>
>>> The key here is when the size is not constant we can make conservative
>>> decisions that still give us a chance to analyze the code for dead
>>> stores.
>>>
>>> Remember that for dead store elimination, we're trying to prove that
>>> given two stores, the second store overwrites (partially or fully) the
>>> same memory locations as the first store.  That makes the first store
>>> either partially or fully dead.
>>>
>>> When we encounter the first store, we set up a bitmap of bytes written
>>> by that store (live_bytes).  We then look at subsequent stores and clear
>>> the appropriate entries in the bitmap.
>>>
>>> If the first store has a nonconstant length argument we can use the
>>> range of the length argument (max) and the size of the destination
>>> object to make a conservative estimation of how many bytes are written.
>>>
>>> For the second store the conservative thing to do for a non-constant
>>> length is to use the minimum of the range of the length argument.
>>
>> So I guess it won't handle things like
>>
>> void f(char*p,int n){
>>   __builtin_memset(p,3,n);
>>   __builtin_memset(p,7,n);
>> }
>>
>> where we know nothing about the length, except that it is the same? Or
>> do you look at symbolic ranges?
>
> So handling this was slightly uglier than I'd hoped.  I mis-remembered
> what we had in an ao_ref.  We have a way to describe constant sizes and
> to mark that something was non-constant (size of -1), but not what the
> non-constant value was.
>
> I looked at two approaches.  One created a dse_ref structure that had an
> embedded ao_ref.  That creates a fair amount of churn.  But it certainly
> looks do-able.
>
> The second derived a dse_ref from an ao_ref which allows the vast
> majority of the code in tree-ssa-dse.c to just work as-is.  With that in
> place it was fairly simply to initialize the new field and check it in a
> couple places.  Resulting in:
>
>> ; Function f (f, funcdef_no=0, decl_uid=1909, cgraph_uid=1, symbol_order=0)
>>
>>   Deleted dead call: __builtin_memset (p_5(D), 3, _1);
>>
>> f (char * p, int n)
>> {
>>   long unsigned int _1;
>>
>> ;;   basic block 2, loop depth 0, maybe hot
>> ;;    prev block 0, next block 1, flags: (NEW, VISITED)
>> ;;    pred:       ENTRY (FALLTHRU,EXECUTABLE)
>>   _1 = (long unsigned int) n_3(D);
>>   __builtin_memset (p_5(D), 7, _1);
>>   return;
>> ;;    succ:       EXIT (EXECUTABLE)
>>
>> }
>>
>
> Not sure how useful this will be in practice though.

Oh, I wasn't expecting you to handle it right now...

I've seen this kind of thing (not necessarily with memset, memcpy was
probably involved as well) a few times, enough that it immediately came to
mind when I saw the title of your message. There are probably traces in
bugzilla, although I don't think there is a convenient search query to
find them. I found PR 79716 that seems related at least, but not what I
was looking for.

--
Marc Glisse
Reply | Threaded
Open this post in threaded view
|

Re: [RFA] [tree-optimization/80576] Handle non-constant sizes in DSE

Jeff Law
On 8/16/19 4:06 PM, Marc Glisse wrote:

> On Fri, 16 Aug 2019, Jeff Law wrote:
>
>> On 8/16/19 12:09 PM, Marc Glisse wrote:
>>> On Fri, 16 Aug 2019, Jeff Law wrote:
>>>
>>>> This patch improves our ability to detect dead stores by handling cases
>>>> where the size memcpy, memset, strncpy, etc call is not constant.  This
>>>> addresses some, but not all, of the issues in 80576.
>>>>
>>>> The key here is when the size is not constant we can make conservative
>>>> decisions that still give us a chance to analyze the code for dead
>>>> stores.
>>>>
>>>> Remember that for dead store elimination, we're trying to prove that
>>>> given two stores, the second store overwrites (partially or fully) the
>>>> same memory locations as the first store.  That makes the first store
>>>> either partially or fully dead.
>>>>
>>>> When we encounter the first store, we set up a bitmap of bytes written
>>>> by that store (live_bytes).  We then look at subsequent stores and
>>>> clear
>>>> the appropriate entries in the bitmap.
>>>>
>>>> If the first store has a nonconstant length argument we can use the
>>>> range of the length argument (max) and the size of the destination
>>>> object to make a conservative estimation of how many bytes are written.
>>>>
>>>> For the second store the conservative thing to do for a non-constant
>>>> length is to use the minimum of the range of the length argument.
>>>
>>> So I guess it won't handle things like
>>>
>>> void f(char*p,int n){
>>>   __builtin_memset(p,3,n);
>>>   __builtin_memset(p,7,n);
>>> }
>>>
>>> where we know nothing about the length, except that it is the same? Or
>>> do you look at symbolic ranges?
>>
>> So handling this was slightly uglier than I'd hoped.  I mis-remembered
>> what we had in an ao_ref.  We have a way to describe constant sizes and
>> to mark that something was non-constant (size of -1), but not what the
>> non-constant value was.
>>
>> I looked at two approaches.  One created a dse_ref structure that had an
>> embedded ao_ref.  That creates a fair amount of churn.  But it certainly
>> looks do-able.
>>
>> The second derived a dse_ref from an ao_ref which allows the vast
>> majority of the code in tree-ssa-dse.c to just work as-is.  With that in
>> place it was fairly simply to initialize the new field and check it in a
>> couple places.  Resulting in:
>>
>>> ; Function f (f, funcdef_no=0, decl_uid=1909, cgraph_uid=1,
>>> symbol_order=0)
>>>
>>>   Deleted dead call: __builtin_memset (p_5(D), 3, _1);
>>>
>>> f (char * p, int n)
>>> {
>>>   long unsigned int _1;
>>>
>>> ;;   basic block 2, loop depth 0, maybe hot
>>> ;;    prev block 0, next block 1, flags: (NEW, VISITED)
>>> ;;    pred:       ENTRY (FALLTHRU,EXECUTABLE)
>>>   _1 = (long unsigned int) n_3(D);
>>>   __builtin_memset (p_5(D), 7, _1);
>>>   return;
>>> ;;    succ:       EXIT (EXECUTABLE)
>>>
>>> }
>>>
>>
>> Not sure how useful this will be in practice though.
>
> Oh, I wasn't expecting you to handle it right now...
Might as well since I'm already in the code :-)


>
> I've seen this kind of thing (not necessarily with memset, memcpy was
> probably involved as well) a few times, enough that it immediately came
> to mind when I saw the title of your message. There are probably traces
> in bugzilla, although I don't think there is a convenient search query
> to find them. I found PR 79716 that seems related at least, but not what
> I was looking for.
I didn't realize this was 79716.  I'd just glossed over that today
looking for something else in this space :-)  With the way DSE works it
really shouldn't matter if it's memset, or memcpy.

79716 also raises the issue of turning a calloc back into a simple
malloc when the zero initialization was overwritten.  We're not handling
that either.  But that seems even less likely to be seen with any
regularity.

jeff

Reply | Threaded
Open this post in threaded view
|

Re: [RFA] [tree-optimization/80576] Handle non-constant sizes in DSE

Martin Sebor-2
In reply to this post by Jeff Law
On 8/16/19 12:15 PM, Jeff Law wrote:

> On 8/16/19 12:09 PM, Marc Glisse wrote:
>> On Fri, 16 Aug 2019, Jeff Law wrote:
>>
>>> This patch improves our ability to detect dead stores by handling cases
>>> where the size memcpy, memset, strncpy, etc call is not constant.  This
>>> addresses some, but not all, of the issues in 80576.
>>>
>>> The key here is when the size is not constant we can make conservative
>>> decisions that still give us a chance to analyze the code for dead
>>> stores.
>>>
>>> Remember that for dead store elimination, we're trying to prove that
>>> given two stores, the second store overwrites (partially or fully) the
>>> same memory locations as the first store.  That makes the first store
>>> either partially or fully dead.
>>>
>>> When we encounter the first store, we set up a bitmap of bytes written
>>> by that store (live_bytes).  We then look at subsequent stores and clear
>>> the appropriate entries in the bitmap.
>>>
>>> If the first store has a nonconstant length argument we can use the
>>> range of the length argument (max) and the size of the destination
>>> object to make a conservative estimation of how many bytes are written.
>>>
>>> For the second store the conservative thing to do for a non-constant
>>> length is to use the minimum of the range of the length argument.
>>
>> So I guess it won't handle things like
>>
>> void f(char*p,int n){
>>    __builtin_memset(p,3,n);
>>    __builtin_memset(p,7,n);
>> }
>>
>> where we know nothing about the length, except that it is the same? Or
>> do you look at symbolic ranges?
> Nope.  I think ao_ref can represent that, so it'd just be a matter of
> recording "n" as the length, then verifying that the second call's
> length is "n" as well.  That makes the first call dead.  We'd have to
> bypass the byte tracking in that case, but I think that's trivial
> because we already have a means to do that when the sizes are too large.
>
>>
>>> This doesn't come up a lot in practice.  But it also happens to put some
>>> of the infrastructure in place to handle strcpy and strcpy_chk which are
>>> needed to fully resolve 80576.
>>>
>>> Bootstrapped and regression tested on x86, x86_64, ppc64le, ppc64,
>>> ppc32, aarch64, sparc, s390x and probably others.  Also verified that
>>> the tests work on the various *-elf targets in my tester.
>>>
>>> OK for the trunk?
>>
>> ENOPATCH
> Opps.    Attached.

It's an improvement and I realize you said it doesn't handle
everything and that you don't think it comes up a lot, but...
I would actually expect the following example (from the bug)
not to be that uncommon:

   void g (char *s)
   {
     char a[8];
     __builtin_strcpy (a, s);
     __builtin_memset (a, 0, sizeof a);
     f (a);
   }

or at least to be more common than the equivalent alternative
the improvement does optimize:

   void g (char *s)
   {
     char a[8];
     __builtin_memcpy (a, s,  __builtin_strlen (s));
     __builtin_memset (a, 0, 8);
     f (a);
   }

It seems that making the first one work should be just a matter
of handling strcpy analogously (the string length doesn't matter).

As an aside, the new tests make me realize that -Wstringop-overflow
should be enhanced to detect this problem (i.e., a consider
the largest size in a PHI).

Martin
Reply | Threaded
Open this post in threaded view
|

Re: [RFA] [tree-optimization/80576] Handle non-constant sizes in DSE

Richard Biener-2
In reply to this post by Jeff Law
On Fri, Aug 16, 2019 at 8:15 PM Jeff Law <[hidden email]> wrote:

>
> On 8/16/19 12:09 PM, Marc Glisse wrote:
> > On Fri, 16 Aug 2019, Jeff Law wrote:
> >
> >> This patch improves our ability to detect dead stores by handling cases
> >> where the size memcpy, memset, strncpy, etc call is not constant.  This
> >> addresses some, but not all, of the issues in 80576.
> >>
> >> The key here is when the size is not constant we can make conservative
> >> decisions that still give us a chance to analyze the code for dead
> >> stores.
> >>
> >> Remember that for dead store elimination, we're trying to prove that
> >> given two stores, the second store overwrites (partially or fully) the
> >> same memory locations as the first store.  That makes the first store
> >> either partially or fully dead.
> >>
> >> When we encounter the first store, we set up a bitmap of bytes written
> >> by that store (live_bytes).  We then look at subsequent stores and clear
> >> the appropriate entries in the bitmap.
> >>
> >> If the first store has a nonconstant length argument we can use the
> >> range of the length argument (max) and the size of the destination
> >> object to make a conservative estimation of how many bytes are written.
> >>
> >> For the second store the conservative thing to do for a non-constant
> >> length is to use the minimum of the range of the length argument.
> >
> > So I guess it won't handle things like
> >
> > void f(char*p,int n){
> >   __builtin_memset(p,3,n);
> >   __builtin_memset(p,7,n);
> > }
> >
> > where we know nothing about the length, except that it is the same? Or
> > do you look at symbolic ranges?
> Nope.  I think ao_ref can represent that, so it'd just be a matter of
> recording "n" as the length, then verifying that the second call's
> length is "n" as well.  That makes the first call dead.  We'd have to
> bypass the byte tracking in that case, but I think that's trivial
> because we already have a means to do that when the sizes are too large.
>
> >
> >> This doesn't come up a lot in practice.  But it also happens to put some
> >> of the infrastructure in place to handle strcpy and strcpy_chk which are
> >> needed to fully resolve 80576.
> >>
> >> Bootstrapped and regression tested on x86, x86_64, ppc64le, ppc64,
> >> ppc32, aarch64, sparc, s390x and probably others.  Also verified that
> >> the tests work on the various *-elf targets in my tester.
> >>
> >> OK for the trunk?
> >
> > ENOPATCH
> Opps.    Attached.

+static tree
+objsize_from_type (tree object)
+{
+  if (TREE_CODE (object) != ADDR_EXPR)
+    return NULL_TREE;
+
+  tree type = TREE_TYPE (object);
+  if (POINTER_TYPE_P (type))

that if looks suspicious...  I'd say
  if (!POINTER_TYPE_P (type))
    return NULL_TREE;

is better

+    type = TREE_TYPE (type);

+  if (TREE_CODE (type) == ARRAY_TYPE && !array_at_struct_end_p (object))
+    {

array_at_struct_end_p will never return true here since you pass it
an ADDR_EXPR...  you wanted to pass TREE_OPERAND (object, 0) here?

Also you seem to use this info to constrain optimization when you
might remember that types of addresses do not carry such information...
Thus it should be "trivially" possible to write a testcase that is miscompiled
after your patch.  I also don't see this really exercised in the
testcases you add?
All of them reference a static array a ...

+           if (TREE_CODE (size) != INTEGER_CST)

TREE_CODE (size) == SSA_NAME you want

+             {
+               wide_int minbound, maxbound;
+               value_range_kind rng = get_range_info (size,
&minbound, &maxbound);
+               if (rng == VR_RANGE)

Richard.

>
> Jeff
>
>
Reply | Threaded
Open this post in threaded view
|

Re: [RFA] [tree-optimization/80576] Handle non-constant sizes in DSE

Jeff Law
In reply to this post by Martin Sebor-2
On 8/16/19 4:21 PM, Martin Sebor wrote:

> On 8/16/19 12:15 PM, Jeff Law wrote:
>> On 8/16/19 12:09 PM, Marc Glisse wrote:
>>> On Fri, 16 Aug 2019, Jeff Law wrote:
>>>
>>>> This patch improves our ability to detect dead stores by handling cases
>>>> where the size memcpy, memset, strncpy, etc call is not constant.  This
>>>> addresses some, but not all, of the issues in 80576.
>>>>
>>>> The key here is when the size is not constant we can make conservative
>>>> decisions that still give us a chance to analyze the code for dead
>>>> stores.
>>>>
>>>> Remember that for dead store elimination, we're trying to prove that
>>>> given two stores, the second store overwrites (partially or fully) the
>>>> same memory locations as the first store.  That makes the first store
>>>> either partially or fully dead.
>>>>
>>>> When we encounter the first store, we set up a bitmap of bytes written
>>>> by that store (live_bytes).  We then look at subsequent stores and
>>>> clear
>>>> the appropriate entries in the bitmap.
>>>>
>>>> If the first store has a nonconstant length argument we can use the
>>>> range of the length argument (max) and the size of the destination
>>>> object to make a conservative estimation of how many bytes are written.
>>>>
>>>> For the second store the conservative thing to do for a non-constant
>>>> length is to use the minimum of the range of the length argument.
>>>
>>> So I guess it won't handle things like
>>>
>>> void f(char*p,int n){
>>>    __builtin_memset(p,3,n);
>>>    __builtin_memset(p,7,n);
>>> }
>>>
>>> where we know nothing about the length, except that it is the same? Or
>>> do you look at symbolic ranges?
>> Nope.  I think ao_ref can represent that, so it'd just be a matter of
>> recording "n" as the length, then verifying that the second call's
>> length is "n" as well.  That makes the first call dead.  We'd have to
>> bypass the byte tracking in that case, but I think that's trivial
>> because we already have a means to do that when the sizes are too large.
>>
>>>
>>>> This doesn't come up a lot in practice.  But it also happens to put
>>>> some
>>>> of the infrastructure in place to handle strcpy and strcpy_chk which
>>>> are
>>>> needed to fully resolve 80576.
>>>>
>>>> Bootstrapped and regression tested on x86, x86_64, ppc64le, ppc64,
>>>> ppc32, aarch64, sparc, s390x and probably others.  Also verified that
>>>> the tests work on the various *-elf targets in my tester.
>>>>
>>>> OK for the trunk?
>>>
>>> ENOPATCH
>> Opps.    Attached.
>
> It's an improvement and I realize you said it doesn't handle
> everything and that you don't think it comes up a lot, but...
> I would actually expect the following example (from the bug)
> not to be that uncommon:
>
>   void g (char *s)
>   {
>     char a[8];
>     __builtin_strcpy (a, s);
>     __builtin_memset (a, 0, sizeof a);
>     f (a);
>   }
>
> or at least to be more common than the equivalent alternative
> the improvement does optimize:
>
>   void g (char *s)
>   {
>     char a[8];
>     __builtin_memcpy (a, s,  __builtin_strlen (s));
>     __builtin_memset (a, 0, 8);
>     f (a);
>   }
>
> It seems that making the first one work should be just a matter
> of handling strcpy analogously (the string length doesn't matter).
>
> As an aside, the new tests make me realize that -Wstringop-overflow
> should be enhanced to detect this problem (i.e., a consider
> the largest size in a PHI).
Certainly not seeing much of these in the code I've looked at.  It may
be a case that aliasing gets in the way often.

Also note that we've got the same issues in this space that we did with
the strlen optimization improvements for last year.  I totally spaced
that and the net result may be we have to avoid using the type size for
anything in DSE which may make it impossible to really do a good job.

JEff
Reply | Threaded
Open this post in threaded view
|

Re: [RFA] [tree-optimization/80576] Handle non-constant sizes in DSE

Jeff Law
In reply to this post by Richard Biener-2
On 8/19/19 7:57 AM, Richard Biener wrote:

>
> +static tree
> +objsize_from_type (tree object)
> +{
> +  if (TREE_CODE (object) != ADDR_EXPR)
> +    return NULL_TREE;
> +
> +  tree type = TREE_TYPE (object);
> +  if (POINTER_TYPE_P (type))
>
> that if looks suspicious...  I'd say
>   if (!POINTER_TYPE_P (type))
>     return NULL_TREE;
>
> is better
Sure.  But we may not be able to use this anyway as you noted later,
depending on the type is not safe.

>
> +    type = TREE_TYPE (type);
>
> +  if (TREE_CODE (type) == ARRAY_TYPE && !array_at_struct_end_p (object))
> +    {
>
> array_at_struct_end_p will never return true here since you pass it
> an ADDR_EXPR...  you wanted to pass TREE_OPERAND (object, 0) here?
I think you're right.  Given this was cribbed from the tail of another
function elsewhere, I suspect that other function has the same issue.

I suspect ultimately we want to fix that other copy and drop
objsize_from_type.


>
> Also you seem to use this info to constrain optimization when you
> might remember that types of addresses do not carry such information...
> Thus it should be "trivially" possible to write a testcase that is miscompiled
> after your patch.  I also don't see this really exercised in the
> testcases you add?
Arggh.  You're absolutely correct.  I must be blocking out that entire
discussion from last summer due to the trama :-)

If the destination is the address of a _DECL node, can we use the size
of the _DECL?


> All of them reference a static array a ...
>
> +           if (TREE_CODE (size) != INTEGER_CST)
>
> TREE_CODE (size) == SSA_NAME you want
Agreed.  Will fix.

jeff
Reply | Threaded
Open this post in threaded view
|

Re: [RFA] [tree-optimization/80576] Handle non-constant sizes in DSE

Richard Biener-2
On Thu, Aug 22, 2019 at 12:48 AM Jeff Law <[hidden email]> wrote:

>
> On 8/19/19 7:57 AM, Richard Biener wrote:
>
> >
> > +static tree
> > +objsize_from_type (tree object)
> > +{
> > +  if (TREE_CODE (object) != ADDR_EXPR)
> > +    return NULL_TREE;
> > +
> > +  tree type = TREE_TYPE (object);
> > +  if (POINTER_TYPE_P (type))
> >
> > that if looks suspicious...  I'd say
> >   if (!POINTER_TYPE_P (type))
> >     return NULL_TREE;
> >
> > is better
> Sure.  But we may not be able to use this anyway as you noted later,
> depending on the type is not safe.
>
> >
> > +    type = TREE_TYPE (type);
> >
> > +  if (TREE_CODE (type) == ARRAY_TYPE && !array_at_struct_end_p (object))
> > +    {
> >
> > array_at_struct_end_p will never return true here since you pass it
> > an ADDR_EXPR...  you wanted to pass TREE_OPERAND (object, 0) here?
> I think you're right.  Given this was cribbed from the tail of another
> function elsewhere, I suspect that other function has the same issue.
>
> I suspect ultimately we want to fix that other copy and drop
> objsize_from_type.
>
>
> >
> > Also you seem to use this info to constrain optimization when you
> > might remember that types of addresses do not carry such information...
> > Thus it should be "trivially" possible to write a testcase that is miscompiled
> > after your patch.  I also don't see this really exercised in the
> > testcases you add?
> Arggh.  You're absolutely correct.  I must be blocking out that entire
> discussion from last summer due to the trama :-)
>
> If the destination is the address of a _DECL node, can we use the size
> of the _DECL?

Yes, but this should already happen for both invariant ones like &a.b.c
and variant ones like &a.b[i].c in ao_ref_init_from_ptr_and_size.

Richard.

>
>
> > All of them reference a static array a ...
> >
> > +           if (TREE_CODE (size) != INTEGER_CST)
> >
> > TREE_CODE (size) == SSA_NAME you want
> Agreed.  Will fix.
>
> jeff
Reply | Threaded
Open this post in threaded view
|

Re: [RFA] [tree-optimization/80576] Handle non-constant sizes in DSE

Martin Sebor-2
In reply to this post by Jeff Law
On 8/21/19 4:47 PM, Jeff Law wrote:

> On 8/19/19 7:57 AM, Richard Biener wrote:
>
>>
>> +static tree
>> +objsize_from_type (tree object)
>> +{
>> +  if (TREE_CODE (object) != ADDR_EXPR)
>> +    return NULL_TREE;
>> +
>> +  tree type = TREE_TYPE (object);
>> +  if (POINTER_TYPE_P (type))
>>
>> that if looks suspicious...  I'd say
>>    if (!POINTER_TYPE_P (type))
>>      return NULL_TREE;
>>
>> is better
> Sure.  But we may not be able to use this anyway as you noted later,
> depending on the type is not safe.
>
>>
>> +    type = TREE_TYPE (type);
>>
>> +  if (TREE_CODE (type) == ARRAY_TYPE && !array_at_struct_end_p (object))
>> +    {
>>
>> array_at_struct_end_p will never return true here since you pass it
>> an ADDR_EXPR...  you wanted to pass TREE_OPERAND (object, 0) here?
> I think you're right.  Given this was cribbed from the tail of another
> function elsewhere, I suspect that other function has the same issue.
>
> I suspect ultimately we want to fix that other copy and drop
> objsize_from_type.

Is the other copy compute_objsize in builtins.c?  The function
has a comment that says it "is intended for diagnostics a should
not be used to influence code generation or optimization."
I added that comment on your request, not because the function
relies the type of objects but because it conservatively uses
the lower bound of ranges of offsets into arrays to determine
the minimum size.

Martin
Reply | Threaded
Open this post in threaded view
|

Re: [RFA] [tree-optimization/80576] Handle non-constant sizes in DSE

Martin Sebor-2
In reply to this post by Jeff Law
On 8/21/19 4:34 PM, Jeff Law wrote:

> On 8/16/19 4:21 PM, Martin Sebor wrote:
>> On 8/16/19 12:15 PM, Jeff Law wrote:
>>> On 8/16/19 12:09 PM, Marc Glisse wrote:
>>>> On Fri, 16 Aug 2019, Jeff Law wrote:
>>>>
>>>>> This patch improves our ability to detect dead stores by handling cases
>>>>> where the size memcpy, memset, strncpy, etc call is not constant.  This
>>>>> addresses some, but not all, of the issues in 80576.
>>>>>
>>>>> The key here is when the size is not constant we can make conservative
>>>>> decisions that still give us a chance to analyze the code for dead
>>>>> stores.
>>>>>
>>>>> Remember that for dead store elimination, we're trying to prove that
>>>>> given two stores, the second store overwrites (partially or fully) the
>>>>> same memory locations as the first store.  That makes the first store
>>>>> either partially or fully dead.
>>>>>
>>>>> When we encounter the first store, we set up a bitmap of bytes written
>>>>> by that store (live_bytes).  We then look at subsequent stores and
>>>>> clear
>>>>> the appropriate entries in the bitmap.
>>>>>
>>>>> If the first store has a nonconstant length argument we can use the
>>>>> range of the length argument (max) and the size of the destination
>>>>> object to make a conservative estimation of how many bytes are written.
>>>>>
>>>>> For the second store the conservative thing to do for a non-constant
>>>>> length is to use the minimum of the range of the length argument.
>>>>
>>>> So I guess it won't handle things like
>>>>
>>>> void f(char*p,int n){
>>>>     __builtin_memset(p,3,n);
>>>>     __builtin_memset(p,7,n);
>>>> }
>>>>
>>>> where we know nothing about the length, except that it is the same? Or
>>>> do you look at symbolic ranges?
>>> Nope.  I think ao_ref can represent that, so it'd just be a matter of
>>> recording "n" as the length, then verifying that the second call's
>>> length is "n" as well.  That makes the first call dead.  We'd have to
>>> bypass the byte tracking in that case, but I think that's trivial
>>> because we already have a means to do that when the sizes are too large.
>>>
>>>>
>>>>> This doesn't come up a lot in practice.  But it also happens to put
>>>>> some
>>>>> of the infrastructure in place to handle strcpy and strcpy_chk which
>>>>> are
>>>>> needed to fully resolve 80576.
>>>>>
>>>>> Bootstrapped and regression tested on x86, x86_64, ppc64le, ppc64,
>>>>> ppc32, aarch64, sparc, s390x and probably others.  Also verified that
>>>>> the tests work on the various *-elf targets in my tester.
>>>>>
>>>>> OK for the trunk?
>>>>
>>>> ENOPATCH
>>> Opps.    Attached.
>>
>> It's an improvement and I realize you said it doesn't handle
>> everything and that you don't think it comes up a lot, but...
>> I would actually expect the following example (from the bug)
>> not to be that uncommon:
>>
>>    void g (char *s)
>>    {
>>      char a[8];
>>      __builtin_strcpy (a, s);
>>      __builtin_memset (a, 0, sizeof a);
>>      f (a);
>>    }
>>
>> or at least to be more common than the equivalent alternative
>> the improvement does optimize:
>>
>>    void g (char *s)
>>    {
>>      char a[8];
>>      __builtin_memcpy (a, s,  __builtin_strlen (s));
>>      __builtin_memset (a, 0, 8);
>>      f (a);
>>    }
>>
>> It seems that making the first one work should be just a matter
>> of handling strcpy analogously (the string length doesn't matter).
>>
>> As an aside, the new tests make me realize that -Wstringop-overflow
>> should be enhanced to detect this problem (i.e., a consider
>> the largest size in a PHI).
> Certainly not seeing much of these in the code I've looked at.  It may
> be a case that aliasing gets in the way often.
>
> Also note that we've got the same issues in this space that we did with
> the strlen optimization improvements for last year.  I totally spaced
> that and the net result may be we have to avoid using the type size for
> anything in DSE which may make it impossible to really do a good job.

The issue with the strlen optimization was that it relied on
the type of subobjects of multidimensional arrays and structs
to determine the maximum valid size of the access to them.
This ran afoul of assumptions in code that doesn't respect
subobject boundaries.

This is not a concern for outermost objects like in the example
above.  strlen and other parts of GCC still make use of their
size for optimization.  The middle-end diagnostics I've been
adding still expect code to respect subobject boundaries.  I've
been doing that for this very reason: to let us do a better job
not just optimizing code, but also diagnosing bugs: find more
real problems with fewer false positives.

Martin
Reply | Threaded
Open this post in threaded view
|

Re: [RFA] [tree-optimization/80576] Handle non-constant sizes in DSE

Jeff Law
On 8/22/19 10:39 AM, Martin Sebor wrote:

> On 8/21/19 4:34 PM, Jeff Law wrote:
>> On 8/16/19 4:21 PM, Martin Sebor wrote:
>>> On 8/16/19 12:15 PM, Jeff Law wrote:
>>>> On 8/16/19 12:09 PM, Marc Glisse wrote:
>>>>> On Fri, 16 Aug 2019, Jeff Law wrote:
>>>>>
>>>>>> This patch improves our ability to detect dead stores by handling
>>>>>> cases
>>>>>> where the size memcpy, memset, strncpy, etc call is not constant. 
>>>>>> This
>>>>>> addresses some, but not all, of the issues in 80576.
>>>>>>
>>>>>> The key here is when the size is not constant we can make
>>>>>> conservative
>>>>>> decisions that still give us a chance to analyze the code for dead
>>>>>> stores.
>>>>>>
>>>>>> Remember that for dead store elimination, we're trying to prove that
>>>>>> given two stores, the second store overwrites (partially or fully)
>>>>>> the
>>>>>> same memory locations as the first store.  That makes the first store
>>>>>> either partially or fully dead.
>>>>>>
>>>>>> When we encounter the first store, we set up a bitmap of bytes
>>>>>> written
>>>>>> by that store (live_bytes).  We then look at subsequent stores and
>>>>>> clear
>>>>>> the appropriate entries in the bitmap.
>>>>>>
>>>>>> If the first store has a nonconstant length argument we can use the
>>>>>> range of the length argument (max) and the size of the destination
>>>>>> object to make a conservative estimation of how many bytes are
>>>>>> written.
>>>>>>
>>>>>> For the second store the conservative thing to do for a non-constant
>>>>>> length is to use the minimum of the range of the length argument.
>>>>>
>>>>> So I guess it won't handle things like
>>>>>
>>>>> void f(char*p,int n){
>>>>>     __builtin_memset(p,3,n);
>>>>>     __builtin_memset(p,7,n);
>>>>> }
>>>>>
>>>>> where we know nothing about the length, except that it is the same? Or
>>>>> do you look at symbolic ranges?
>>>> Nope.  I think ao_ref can represent that, so it'd just be a matter of
>>>> recording "n" as the length, then verifying that the second call's
>>>> length is "n" as well.  That makes the first call dead.  We'd have to
>>>> bypass the byte tracking in that case, but I think that's trivial
>>>> because we already have a means to do that when the sizes are too
>>>> large.
>>>>
>>>>>
>>>>>> This doesn't come up a lot in practice.  But it also happens to put
>>>>>> some
>>>>>> of the infrastructure in place to handle strcpy and strcpy_chk which
>>>>>> are
>>>>>> needed to fully resolve 80576.
>>>>>>
>>>>>> Bootstrapped and regression tested on x86, x86_64, ppc64le, ppc64,
>>>>>> ppc32, aarch64, sparc, s390x and probably others.  Also verified that
>>>>>> the tests work on the various *-elf targets in my tester.
>>>>>>
>>>>>> OK for the trunk?
>>>>>
>>>>> ENOPATCH
>>>> Opps.    Attached.
>>>
>>> It's an improvement and I realize you said it doesn't handle
>>> everything and that you don't think it comes up a lot, but...
>>> I would actually expect the following example (from the bug)
>>> not to be that uncommon:
>>>
>>>    void g (char *s)
>>>    {
>>>      char a[8];
>>>      __builtin_strcpy (a, s);
>>>      __builtin_memset (a, 0, sizeof a);
>>>      f (a);
>>>    }
>>>
>>> or at least to be more common than the equivalent alternative
>>> the improvement does optimize:
>>>
>>>    void g (char *s)
>>>    {
>>>      char a[8];
>>>      __builtin_memcpy (a, s,  __builtin_strlen (s));
>>>      __builtin_memset (a, 0, 8);
>>>      f (a);
>>>    }
>>>
>>> It seems that making the first one work should be just a matter
>>> of handling strcpy analogously (the string length doesn't matter).
>>>
>>> As an aside, the new tests make me realize that -Wstringop-overflow
>>> should be enhanced to detect this problem (i.e., a consider
>>> the largest size in a PHI).
>> Certainly not seeing much of these in the code I've looked at.  It may
>> be a case that aliasing gets in the way often.
>>
>> Also note that we've got the same issues in this space that we did with
>> the strlen optimization improvements for last year.  I totally spaced
>> that and the net result may be we have to avoid using the type size for
>> anything in DSE which may make it impossible to really do a good job.
>
> The issue with the strlen optimization was that it relied on
> the type of subobjects of multidimensional arrays and structs
> to determine the maximum valid size of the access to them.
> This ran afoul of assumptions in code that doesn't respect
> subobject boundaries.
Right.  And my proposed DSE changes have the same fundamental problem.

>
> This is not a concern for outermost objects like in the example
> above.  strlen and other parts of GCC still make use of their
> size for optimization.  The middle-end diagnostics I've been
> adding still expect code to respect subobject boundaries.  I've
> been doing that for this very reason: to let us do a better job
> not just optimizing code, but also diagnosing bugs: find more
> real problems with fewer false positives.
True, but the proposed DSE code does not restrict itself to outermost
objects.  That's what I need to fix before resubmitting.  Thankfully
it's not terribly hard.

jeff
Reply | Threaded
Open this post in threaded view
|

Re: [RFA] [tree-optimization/80576] Handle non-constant sizes in DSE

Jeff Law
In reply to this post by Martin Sebor-2
On 8/22/19 8:40 AM, Martin Sebor wrote:

> On 8/21/19 4:47 PM, Jeff Law wrote:
>> On 8/19/19 7:57 AM, Richard Biener wrote:
>>
>>>
>>> +static tree
>>> +objsize_from_type (tree object)
>>> +{
>>> +  if (TREE_CODE (object) != ADDR_EXPR)
>>> +    return NULL_TREE;
>>> +
>>> +  tree type = TREE_TYPE (object);
>>> +  if (POINTER_TYPE_P (type))
>>>
>>> that if looks suspicious...  I'd say
>>>    if (!POINTER_TYPE_P (type))
>>>      return NULL_TREE;
>>>
>>> is better
>> Sure.  But we may not be able to use this anyway as you noted later,
>> depending on the type is not safe.
>>
>>>
>>> +    type = TREE_TYPE (type);
>>>
>>> +  if (TREE_CODE (type) == ARRAY_TYPE && !array_at_struct_end_p
>>> (object))
>>> +    {
>>>
>>> array_at_struct_end_p will never return true here since you pass it
>>> an ADDR_EXPR...  you wanted to pass TREE_OPERAND (object, 0) here?
>> I think you're right.  Given this was cribbed from the tail of another
>> function elsewhere, I suspect that other function has the same issue.
>>
>> I suspect ultimately we want to fix that other copy and drop
>> objsize_from_type.
>
> Is the other copy compute_objsize in builtins.c?  The function
> has a comment that says it "is intended for diagnostics a should
> not be used to influence code generation or optimization."
> I added that comment on your request, not because the function
> relies the type of objects but because it conservatively uses
> the lower bound of ranges of offsets into arrays to determine
> the minimum size.
That's the one.  What's sad here is when I copied the tail of that
routine I noted the comment.  By the time I came back to the changes
several days (or weeks?) later the issue had been totally wiped from my
working memory.

I think I can just call get_base_address on the 0th operand of the
ADDR_EXPR and if I get back a _DECL node extract its DECL_UNIT_SIZE.

Jeff
Reply | Threaded
Open this post in threaded view
|

Re: [RFA] [tree-optimization/80576] Handle non-constant sizes in DSE

Jeff Law
In reply to this post by Richard Biener-2
On 8/22/19 4:46 AM, Richard Biener wrote:

>>> Also you seem to use this info to constrain optimization when you
>>> might remember that types of addresses do not carry such information...
>>> Thus it should be "trivially" possible to write a testcase that is miscompiled
>>> after your patch.  I also don't see this really exercised in the
>>> testcases you add?
>> Arggh.  You're absolutely correct.  I must be blocking out that entire
>> discussion from last summer due to the trama :-)
>>
>> If the destination is the address of a _DECL node, can we use the size
>> of the _DECL?
>
> Yes, but this should already happen for both invariant ones like &a.b.c
> and variant ones like &a.b[i].c in ao_ref_init_from_ptr_and_size.
I don't see that in ao_ref_init_from_ptr_and_size.  AFAICT if you don't
know the size when you call that routine (size == NULL), then you end up
with the ref->size and ref->max_size set to -1.

Am I missing something here?

Jeff
Reply | Threaded
Open this post in threaded view
|

Re: [RFA] [tree-optimization/80576] Handle non-constant sizes in DSE

Richard Biener-2
On Fri, Aug 23, 2019 at 9:19 PM Jeff Law <[hidden email]> wrote:

>
> On 8/22/19 4:46 AM, Richard Biener wrote:
> >>> Also you seem to use this info to constrain optimization when you
> >>> might remember that types of addresses do not carry such information...
> >>> Thus it should be "trivially" possible to write a testcase that is miscompiled
> >>> after your patch.  I also don't see this really exercised in the
> >>> testcases you add?
> >> Arggh.  You're absolutely correct.  I must be blocking out that entire
> >> discussion from last summer due to the trama :-)
> >>
> >> If the destination is the address of a _DECL node, can we use the size
> >> of the _DECL?
> >
> > Yes, but this should already happen for both invariant ones like &a.b.c
> > and variant ones like &a.b[i].c in ao_ref_init_from_ptr_and_size.
> I don't see that in ao_ref_init_from_ptr_and_size.  AFAICT if you don't
> know the size when you call that routine (size == NULL), then you end up
> with the ref->size and ref->max_size set to -1.
>
> Am I missing something here?

Ah, of course.  ao_ref_from_ptr_and_size would need to be extended
to constrain max_size.  So what I was
saying is that ao_ref_init_from_ptr_and_size should get you
a DECL ao_ref_base () from which you could constrain max_size with.
Or rather ao_ref_from_ptr_and_size should be extended do that,
mimicing what get_ref_base_and_extent does at the end in the
if (DECL_P (exp)) case (mind flag_unconstrained_commons!).

Richard.

>
> Jeff
Reply | Threaded
Open this post in threaded view
|

Re: [RFA] [tree-optimization/80576] Handle non-constant sizes in DSE

Jeff Law
On 8/26/19 3:00 AM, Richard Biener wrote:

> On Fri, Aug 23, 2019 at 9:19 PM Jeff Law <[hidden email]> wrote:
>>
>> On 8/22/19 4:46 AM, Richard Biener wrote:
>>>>> Also you seem to use this info to constrain optimization when you
>>>>> might remember that types of addresses do not carry such information...
>>>>> Thus it should be "trivially" possible to write a testcase that is miscompiled
>>>>> after your patch.  I also don't see this really exercised in the
>>>>> testcases you add?
>>>> Arggh.  You're absolutely correct.  I must be blocking out that entire
>>>> discussion from last summer due to the trama :-)
>>>>
>>>> If the destination is the address of a _DECL node, can we use the size
>>>> of the _DECL?
>>>
>>> Yes, but this should already happen for both invariant ones like &a.b.c
>>> and variant ones like &a.b[i].c in ao_ref_init_from_ptr_and_size.
>> I don't see that in ao_ref_init_from_ptr_and_size.  AFAICT if you don't
>> know the size when you call that routine (size == NULL), then you end up
>> with the ref->size and ref->max_size set to -1.
>>
>> Am I missing something here?
>
> Ah, of course.  ao_ref_from_ptr_and_size would need to be extended
> to constrain max_size.  So what I was
> saying is that ao_ref_init_from_ptr_and_size should get you
> a DECL ao_ref_base () from which you could constrain max_size with.
> Or rather ao_ref_from_ptr_and_size should be extended do that,
> mimicing what get_ref_base_and_extent does at the end in the
> if (DECL_P (exp)) case (mind flag_unconstrained_commons!).
Not a bad idea to constrain ao_ref's max_size this way.  Not offhand
sure if other passes would be able to exploit having that max_size set,
but DSE certainly could.

I'll see if I can add that and drop the equivalent DSE bits.

Jeff
Reply | Threaded
Open this post in threaded view
|

Re: [RFA] [tree-optimization/80576] Handle non-constant sizes in DSE

Jeff Law
In reply to this post by Richard Biener-2
On 8/26/19 3:00 AM, Richard Biener wrote:

> On Fri, Aug 23, 2019 at 9:19 PM Jeff Law <[hidden email]> wrote:
>>
>> On 8/22/19 4:46 AM, Richard Biener wrote:
>>>>> Also you seem to use this info to constrain optimization when you
>>>>> might remember that types of addresses do not carry such information...
>>>>> Thus it should be "trivially" possible to write a testcase that is miscompiled
>>>>> after your patch.  I also don't see this really exercised in the
>>>>> testcases you add?
>>>> Arggh.  You're absolutely correct.  I must be blocking out that entire
>>>> discussion from last summer due to the trama :-)
>>>>
>>>> If the destination is the address of a _DECL node, can we use the size
>>>> of the _DECL?
>>>
>>> Yes, but this should already happen for both invariant ones like &a.b.c
>>> and variant ones like &a.b[i].c in ao_ref_init_from_ptr_and_size.
>> I don't see that in ao_ref_init_from_ptr_and_size.  AFAICT if you don't
>> know the size when you call that routine (size == NULL), then you end up
>> with the ref->size and ref->max_size set to -1.
>>
>> Am I missing something here?
>
> Ah, of course.  ao_ref_from_ptr_and_size would need to be extended
> to constrain max_size.  So what I was
> saying is that ao_ref_init_from_ptr_and_size should get you
> a DECL ao_ref_base () from which you could constrain max_size with.
> Or rather ao_ref_from_ptr_and_size should be extended do that,
> mimicing what get_ref_base_and_extent does at the end in the
> if (DECL_P (exp)) case (mind flag_unconstrained_commons!).
So I was going to use get_ref_base_and_extent from within
ao_ref_init_from_ptr_and_size to capture these cases, but
get_ref_base_and_extent internally uses TYPE_SIZE to get the maximum
size of the referenced object.

That likely represents a codegen bug waiting to happen.

I'll see if I can refactor just the bits we want so that we're not
duplicating anything.

Jeff
Reply | Threaded
Open this post in threaded view
|

Re: [RFA] [tree-optimization/80576] Handle non-constant sizes in DSE

Richard Biener-2
On Mon, Sep 9, 2019 at 10:10 PM Jeff Law <[hidden email]> wrote:

>
> On 8/26/19 3:00 AM, Richard Biener wrote:
> > On Fri, Aug 23, 2019 at 9:19 PM Jeff Law <[hidden email]> wrote:
> >>
> >> On 8/22/19 4:46 AM, Richard Biener wrote:
> >>>>> Also you seem to use this info to constrain optimization when you
> >>>>> might remember that types of addresses do not carry such information...
> >>>>> Thus it should be "trivially" possible to write a testcase that is miscompiled
> >>>>> after your patch.  I also don't see this really exercised in the
> >>>>> testcases you add?
> >>>> Arggh.  You're absolutely correct.  I must be blocking out that entire
> >>>> discussion from last summer due to the trama :-)
> >>>>
> >>>> If the destination is the address of a _DECL node, can we use the size
> >>>> of the _DECL?
> >>>
> >>> Yes, but this should already happen for both invariant ones like &a.b.c
> >>> and variant ones like &a.b[i].c in ao_ref_init_from_ptr_and_size.
> >> I don't see that in ao_ref_init_from_ptr_and_size.  AFAICT if you don't
> >> know the size when you call that routine (size == NULL), then you end up
> >> with the ref->size and ref->max_size set to -1.
> >>
> >> Am I missing something here?
> >
> > Ah, of course.  ao_ref_from_ptr_and_size would need to be extended
> > to constrain max_size.  So what I was
> > saying is that ao_ref_init_from_ptr_and_size should get you
> > a DECL ao_ref_base () from which you could constrain max_size with.
> > Or rather ao_ref_from_ptr_and_size should be extended do that,
> > mimicing what get_ref_base_and_extent does at the end in the
> > if (DECL_P (exp)) case (mind flag_unconstrained_commons!).
> So I was going to use get_ref_base_and_extent from within
> ao_ref_init_from_ptr_and_size to capture these cases, but
> get_ref_base_and_extent internally uses TYPE_SIZE to get the maximum
> size of the referenced object.
>
> That likely represents a codegen bug waiting to happen.

Yeah, you can't use get_ref_base_and_extent literally here.

> I'll see if I can refactor just the bits we want so that we're not
> duplicating anything.

Not sure if that's too important.  But yes, splitting out

  if (DECL_P (exp))
    {
      if (VAR_P (exp)
          && ((flag_unconstrained_commons && DECL_COMMON (exp))
              || (DECL_EXTERNAL (exp) && seen_variable_array_ref)))
        {
          tree sz_tree = TYPE_SIZE (TREE_TYPE (exp));
          /* If size is unknown, or we have read to the end, assume there
             may be more to the structure than we are told.  */
          if (TREE_CODE (TREE_TYPE (exp)) == ARRAY_TYPE
              || (seen_variable_array_ref
                  && (sz_tree == NULL_TREE
                      || !poly_int_tree_p (sz_tree)
                      || maybe_eq (bit_offset + maxsize,
                                   wi::to_poly_offset (sz_tree)))))
            maxsize = -1;
        }
      /* If maxsize is unknown adjust it according to the size of the
         base decl.  */
      else if (!known_size_p (maxsize)
               && DECL_SIZE (exp)
               && poly_int_tree_p (DECL_SIZE (exp)))
        maxsize = wi::to_poly_offset (DECL_SIZE (exp)) - bit_offset;
    }
  else if (CONSTANT_CLASS_P (exp))
    {
      /* If maxsize is unknown adjust it according to the size of the
         base type constant.  */
      if (!known_size_p (maxsize)
          && TYPE_SIZE (TREE_TYPE (exp))
          && poly_int_tree_p (TYPE_SIZE (TREE_TYPE (exp))))
        maxsize = (wi::to_poly_offset (TYPE_SIZE (TREE_TYPE (exp)))
                   - bit_offset);
    }

into a helper with just the computed offset as argument
(plus maybe that seen_variable_array_ref which is meaningless
for the address case or rather has to be assumed true(?)) should be possible.

Richard.

>
> Jeff
12