[GSoC'19] Parallelize GCC with Threads -- Second Evaluation Status

Giuliano Belinassi-2
Hi all,

Here is my second evaluation report, together with a simple program that
I was able to compile with my parallel version of GCC. Keep in mind that
I still have lots of concurrency issues inside the compiler and therefore
my branch will fail to compile pretty much anything else.

To reproduce my current branch, use the following steps:

1-) Clone https://gitlab.com/flusp/gcc/tree/giulianob_parallel

2-) Set the variable `num_threads` in gcc/cgraphunit.c to 1.

3-) Configure with --disable-bootstrap --enable-languages=c

4-) make

5-) Set the variable `num_threads` in gcc/cgraphunit.c to 2, for instance.

6-) make install DESTDIR="somewhere_else_that_doesnt_break_your_gcc"

7-) Compile the attached program using -O2

I am attaching my report in Markdown format, which you can convert to PDF
using `pandoc` if you find it difficult to read in the current format.

I am also open to suggestions. Please do not hesitate to comment :)

Thank you,
Giuliano.

Attachments: test_atan2.c (2K), parallel_deliver_2.md (19K)

Re: [GSoC'19] Parallelize GCC with Threads -- Second Evaluation Status

Richard Biener
On Sun, 21 Jul 2019, Giuliano Belinassi wrote:

> [...]

Thanks for the report and it's great that you are making progress!

I suggest you add a --param (edit params.def) so one can choose
num_threads on the command-line instead of needing to recompile GCC.
Just keep the default "safe" so that the GCC build itself will still work.
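
To make the intent concrete, here is a standalone sketch of the
behaviour such a parameter would have.  It is not GCC's option
machinery (a real patch would add an entry to params.def); the
parameter name `num-threads`, the helper `parse_num_threads` and the
upper bound of 64 are all made up for illustration.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// Hypothetical helper: scan argv for "--param num-threads=N".
// The default of 1 is the "safe" value: without the parameter the
// compiler stays single-threaded, so building GCC itself is unaffected.
int parse_num_threads (int argc, const char *const *argv)
{
  const char *key = "num-threads=";
  const std::size_t key_len = std::strlen (key);
  int threads = 1;

  for (int i = 1; i + 1 < argc; ++i)
    if (std::strcmp (argv[i], "--param") == 0
        && std::strncmp (argv[i + 1], key, key_len) == 0)
      {
        long v = std::strtol (argv[i + 1] + key_len, nullptr, 10);
        // Ignore values outside the (assumed) valid range.
        if (v >= 1 && v <= 64)
          threads = static_cast<int> (v);
      }
  return threads;
}
```

With something like this in place, switching between one and two
threads is a matter of changing the command line rather than
rebuilding the compiler.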

For the allocators I think that in the end we want to
keep most of them global but have either per-thread freelists
or a freelist implementation that can work (allocate and free)
without locking, employing some RCU scheme.  Not introducing
per-thread state is probably leaner on the implementation.
It would of course mean taking a lock when the freelist needs to
be re-filled from the main pool but that's hopefully not common.
I don't know of an RCU allocator freelist implementation to copy/learn
from, but experimenting with such before going the per-thread freelist
route might be interesting.  Maybe not all allocators need to be treated
equally either.
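
As a rough illustration of the per-thread freelist variant, here is a
minimal sketch.  It is not GCC code: `block_pool`, its members and the
batch size are invented for the example, and the "main pool" is simply
`operator new` behind a mutex.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <new>
#include <vector>

// Hypothetical fixed-size block pool; all names are illustrative.
class block_pool
{
public:
  static constexpr std::size_t block_size = 64;
  static constexpr std::size_t refill_count = 16;

  // Fast path: pop from the calling thread's own freelist, no lock.
  static void *allocate ()
  {
    if (local_free.empty ())
      refill ();
    void *p = local_free.back ();
    local_free.pop_back ();
    return p;
  }

  // Fast path: push onto the calling thread's own freelist, no lock.
  static void release (void *p) { local_free.push_back (p); }

  // Number of times any thread had to go to the main pool.
  static std::size_t refills ()
  {
    std::lock_guard<std::mutex> guard (pool_lock);
    return refill_total;
  }

private:
  // Slow path: take the global lock and fetch a batch of blocks.
  static void refill ()
  {
    std::lock_guard<std::mutex> guard (pool_lock);
    ++refill_total;
    for (std::size_t i = 0; i < refill_count; ++i)
      local_free.push_back (::operator new (block_size));
  }

  static thread_local std::vector<void *> local_free;
  static std::mutex pool_lock;
  static std::size_t refill_total;
};

thread_local std::vector<void *> block_pool::local_free;
std::mutex block_pool::pool_lock;
std::size_t block_pool::refill_total = 0;
```

The lock is only taken once per batch of 16 allocations here.  The
sketch never hands blocks back to the main pool, and a block released
on a different thread than the one that allocated it simply joins that
thread's list; a real allocator would need a policy for both.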

Your memory-block issue is likely that you added

{
  if (!instance)
    instance = XNEW (memory_block_pool);

but as misleading as it is, XNEW doesn't invoke C++ new but
just malloc so the allocated structure isn't initialized
since its constructor isn't invoked.  Just use

    instance = new memory_block_pool;

with that I get helgrind to run (without complaining!) on your
testcase.  I also get to compile gimple-match.c with two threads
for more than one minute before crashing on some EVRP global
state (somehow I knew the passes global state would be quite a
distraction...).
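
The difference between the two allocation styles can be seen in a
small standalone sketch.  The `pool` struct and both helper functions
are made up for illustration; `alloc_raw` mimics what XNEW amounts to
(a cast around a malloc-style call), using plain `malloc` here.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical stand-in for a class with a non-trivial constructor.
struct pool
{
  static int constructed;   // counts constructor invocations
  int max_blocks;
  pool () : max_blocks (8) { ++constructed; }
};
int pool::constructed = 0;

// Roughly what XNEW (pool) does: raw storage, no constructor call,
// so max_blocks is left uninitialized.
pool *alloc_raw ()
{
  return static_cast<pool *> (std::malloc (sizeof (pool)));
}

// Operator new allocates storage and then runs the constructor.
pool *alloc_constructed ()
{
  return new pool;
}
```

After `alloc_raw ()` the counter `pool::constructed` is unchanged,
while `alloc_constructed ()` increments it and leaves `max_blocks`
set to 8.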

I hope the project will be a motivation to clean up the way we
handle pass-specific global state.

Thanks again,
Richard.

Re: [GSoC'19] Parallelize GCC with Threads -- Second Evaluation Status

Richard Biener
On Mon, 22 Jul 2019, Richard Biener wrote:

> [...]

Btw, to get to a "working" state quicker you might consider
concentrating on a pass subset, which you can conveniently do by
restricting optimization to just -Og, effectively parallelizing
pass_all_optimizations_g only.  You will then probably hit more
issues in the infrastructure, which is more interesting for the
project (we know there's a lot of pass-specific global state...).
Of course the time spent in pass_all_optimizations_g is minimal...

I then hit tree-ssa-live.c:usedvars quickly (slap __thread on it)
and after that the EVRP issue via the sprintf_length pass.
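
For reference, the effect of making such a global thread-local can be
sketched as follows.  This is a standalone example, not the
tree-ssa-live.c code: `scratch_count` and both functions are invented,
and it uses the standard `thread_local` keyword, of which `__thread`
is the GNU spelling for simple data like this.

```cpp
#include <cassert>
#include <thread>

// Hypothetical pass-local global.  With thread_local every thread
// gets its own copy, so concurrent passes no longer share it.
thread_local int scratch_count = 0;

// Bump the calling thread's copy n times and return what it sees.
int bump (int n)
{
  for (int i = 0; i < n; ++i)
    ++scratch_count;
  return scratch_count;
}

// Run bump (n) on a fresh thread and return that thread's final value.
int bump_on_new_thread (int n)
{
  int result = 0;
  std::thread t ([&] { result = bump (n); });
  t.join ();
  return result;
}
```

A new thread always starts from its own zero-initialized copy, and
whatever it does to `scratch_count` is invisible to the thread that
spawned it.  This only silences the race; it does not remove the
global state itself.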

Richard.