Jan Kratochvil | 1 Feb 2012 14:23

Inter-CU DWARF size optimizations and gcc -flto

Hi,

I am sorry if this is already clear to everyone; I admit I only played with
it yesterday.

With
	gcc -flto -flto-partition=none

gcc outputs only a single CU (Compilation Unit).  With the default (omitting
-flto-partition) there are multiple CUs, but still only a few compared to the
number of .o files.
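
One way to check the CU count of a given binary is to count the CU headers
that binutils readelf prints for .debug_info.  The following is a minimal
sketch, not part of the original message; the helper names are hypothetical,
and it assumes readelf is installed and prints one "Compilation Unit @ offset"
line per CU.

```python
import re
import subprocess

# readelf prints one such header line per CU in .debug_info.
CU_HEADER = re.compile(r"^\s*Compilation Unit @ offset", re.MULTILINE)


def count_cus(readelf_text: str) -> int:
    """Count CU headers in the text of `readelf --debug-dump=info`."""
    return len(CU_HEADER.findall(readelf_text))


def count_cus_in_file(path: str) -> int:
    """Run readelf on `path` and count its CUs (needs binutils installed)."""
    out = subprocess.run(
        ["readelf", "--debug-dump=info", path],
        check=True, capture_output=True, text=True,
    ).stdout
    return count_cus(out)
```

Running count_cus_in_file on a binary built with and without
-flto-partition=none should show the difference described above.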

-flto is AFAIK the future for all compilations.  It is well known that -flto
debug info is somehow broken now, but that needs to be fixed anyway.

DWARF size has been discussed for the 5+ years I have been in Tools, so this
is a long-term project, and waiting for (helping with, heh) a working -flto is
an acceptable solution.

This has some implications:

(a) A DWARF post-processing optimization tool no longer makes sense with -flto.

    (a1) Intra-CU optimizations in GCC make sense as it is the final output.

(b) .gdb_index will have limited scope, only to select which objfiles to expand,
    no longer to select which CUs to expand.

(c) The partial CU expansion Tom Tromey talks about is a must in such a case.
    Although the smaller LTO debug info takes only 63% of GDB memory

Jakub Jelinek | 1 Feb 2012 14:32

Re: Inter-CU DWARF size optimizations and gcc -flto

On Wed, Feb 01, 2012 at 02:23:09PM +0100, Jan Kratochvil wrote:
> I am sorry if this is already clear to everyone; I admit I only played with
> it yesterday.
> 
> With
> 	gcc -flto -flto-partition=none
> 
> gcc outputs only a single CU (Compilation Unit).  With the default (omitting
> -flto-partition) there are multiple CUs, but still only a few compared to the
> number of .o files.
> 
> -flto is AFAIK the future for all compilations.  It is well known that -flto
> debug info is somehow broken now, but that needs to be fixed anyway.

It isn't only somehow broken, it is quite fundamentally broken.  And even
with LTO, GCC should IMHO output CUs matching the original source, one CU per
source.  That is admittedly going to be very difficult, especially when
partitioning the compilation, because multiple partitions might need to add
stuff to a single CU.  At least for us, -flto is a no-go until these problems
are solved.

	Jakub

Tom Tromey | 22 Feb 2012 22:56

Re: Inter-CU DWARF size optimizations and gcc -flto

Jan> (b) .gdb_index will have limited scope, only to select which
Jan> objfiles to expand, no longer to select which CUs to expand.

I suspect we are going to need a better approach here anyway.
I sometimes hear about programs with more than 800 shared libraries.
If you assume separate debuginfo this means 1600 objfiles.
I think this will just crush most of the existing algorithms in gdb.
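
To make the scaling concern concrete, here is a toy sketch (not GDB code; all
names are hypothetical) that contrasts probing every objfile in turn with a
single index built across all of them.  It only models the lookup pattern, on
the assumption that much of the cost comes from per-objfile iteration.

```python
from typing import Dict, List, Optional, Tuple

Table = Dict[str, int]  # symbol name -> address, one table per objfile


def lookup_linear(objfiles: List[Table], name: str
                  ) -> Tuple[Optional[int], Optional[int], int]:
    """Probe each objfile in order; return (objfile index, address, probes)."""
    probes = 0
    for i, table in enumerate(objfiles):
        probes += 1
        if name in table:
            return i, table[name], probes
    return None, None, probes


def build_global_index(objfiles: List[Table]) -> Dict[str, Tuple[int, int]]:
    """Build one name -> (objfile index, address) map over all objfiles."""
    index: Dict[str, Tuple[int, int]] = {}
    for i, table in enumerate(objfiles):
        for name, addr in table.items():
            index.setdefault(name, (i, addr))  # first definition wins
    return index
```

With 1600 objfiles, a lookup that misses (or hits in the last objfile) costs
1600 probes in the linear version, while the global index answers with a
single dictionary lookup after a one-time build.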

Jan> (c) The partial CU expansion Tom Tromey talks about is a must in such a case.

I realized I never wrote up how this could work.  The below is sort of a
sketch that devolves into random thoughts.

I have been thinking about it since we discussed it and I think it has a
potentially severe problem.

The basic idea is simple: right now we have two DWARF readers in
dwarf2read.c, the psymtab reader and the full symbol reader.

Right now when we find a psymbol, we expand the whole CU to full
symbols.  This normally isn't too bad -- but there are some CUs out
there in practice that are quite large, and the delay reading them is
noticeable.

So, what if we unified the two readers -- eliminating one source of bugs
-- and also changed CU expansion to be DIE-based?  That is, in symtab.c,
before returning a symbol from a symtab, we would call some back-end
function to expand the symbol.  The DWARF reader would then just read
the DIEs needed to instantiate that one particular symbol plus whatever
dependencies (types usually) it has.
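
The DIE-based expansion idea can be sketched roughly as follows.  This is an
illustration only, not dwarf2read.c: the LazyCU class, its dictionary layout,
and the sample DIE names are all assumptions made for the example.

```python
from typing import Dict, List, Tuple


class LazyCU:
    """A CU whose symbols are instantiated one at a time, on demand."""

    def __init__(self, dies: Dict[str, Tuple[str, List[str]]]):
        # dies maps a DIE name to (kind, names of the DIEs it depends on).
        self._dies = dies
        self._expanded: Dict[str, dict] = {}

    def expand(self, name: str) -> dict:
        """Instantiate one symbol plus its transitive dependencies, once."""
        if name in self._expanded:
            return self._expanded[name]
        kind, deps = self._dies[name]
        sym = {"name": name, "kind": kind, "deps": []}
        # Register before recursing so self-referential types terminate.
        self._expanded[name] = sym
        sym["deps"] = [self.expand(d) for d in deps]
        return sym

    def expanded_names(self) -> List[str]:
        """Names of the DIEs that have been instantiated so far."""
        return sorted(self._expanded)
```

Looking up one function in such a CU reads only that function and the types
it refers to; unrelated DIEs in the same CU stay unread until something asks
for them.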

Daniel Jacobowitz | 26 Feb 2012 16:08

Re: Inter-CU DWARF size optimizations and gcc -flto

On Wed, Feb 22, 2012 at 4:56 PM, Tom Tromey <tromey@...> wrote:
> Jan> (b) .gdb_index will have limited scope, only to select which
> Jan> objfiles to expand, no longer to select which CUs to expand.
>
> I suspect we are going to need a better approach here anyway.
> I sometimes hear about programs with more than 800 shared libraries.
> If you assume separate debuginfo this means 1600 objfiles.
> I think this will just crush most of the existing algorithms in gdb.

You are correct, it does crush GDB :-)  I routinely try - emphasis on
try - to use GDB on programs with between 2500 and 5500 shared
libraries.  It's agonizing.  I have another project I want to work on
first, and not much time for GDB lately, but this is absolutely on my
list to improve.

-- 
Thanks,
Daniel

Tom Tromey | 3 Mar 2012 03:54

Re: Inter-CU DWARF size optimizations and gcc -flto

>>>>> "Daniel" == Daniel Jacobowitz <drow@...> writes:

Daniel> You are correct, it does crush GDB :-)  I routinely try - emphasis on
Daniel> try - to use GDB on programs with between 2500 and 5500 shared
Daniel> libraries.  It's agonizing.  I have another project I want to work on
Daniel> first, and not much time for GDB lately, but this is absolutely on my
Daniel> list to improve.

I am curious how you plan to improve it.

The plan I mentioned upthread is probably pretty good for scaling to
distro-sized programs, say 200 shared libraries or less (this is
LibreOffice or Mozilla).  Maybe we could get a bit more by putting
minsyms into the index.

I am not so confident it would let gdb scale to 5000 shared libraries
though.

For that size I've had two ideas.

First, and simplest, punt.  Make the user disable automatic reading of
shared library debuginfo (or even minsyms) and make the user explicitly
mention which ones should be used -- either by 'sharedlibrary' or by a
linespec extension.

I guess this one would sort of work today.  (I haven't tried.)
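
For reference, the existing GDB commands that come closest to this are
"set auto-solib-add off" and "sharedlibrary REGEX".  The sketch below only
generates the command lines one would type; the helper name and the library
names are made up, and as far as I understand "sharedlibrary" only affects
libraries the inferior has already loaded, so it would be issued once the
program is stopped.

```python
from typing import Iterable, List


def manual_solib_commands(wanted_regexes: Iterable[str]) -> List[str]:
    """Return GDB commands that load symbols only for selected libraries."""
    # Stop GDB from reading shared library symbols automatically.
    cmds = ["set auto-solib-add off"]
    # Then ask explicitly for each library whose symbols are wanted.
    cmds += [f"sharedlibrary {rx}" for rx in wanted_regexes]
    return cmds
```

The resulting lines could be pasted into a GDB session or written to a
command file.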

Second, and harder, is the "big data" approach.  This would be something
like -- load all the debuginfo into a server, tagged by build-id,
ideally with global type- and symbol-interning; then change gdb to send

Daniel Jacobowitz | 5 Mar 2012 01:25

Re: Inter-CU DWARF size optimizations and gcc -flto

On Fri, Mar 2, 2012 at 9:54 PM, Tom Tromey <tromey@...> wrote:
>>>>>> "Daniel" == Daniel Jacobowitz <drow@...> writes:
>
> Daniel> You are correct, it does crush GDB :-)  I routinely try - emphasis on
> Daniel> try - to use GDB on programs with between 2500 and 5500 shared
> Daniel> libraries.  It's agonizing.  I have another project I want to work on
> Daniel> first, and not much time for GDB lately, but this is absolutely on my
> Daniel> list to improve.
>
> I am curious how you plan to improve it.

I have no idea.  One thing I'd like to revisit is your work on
threaded symbol load; I have plenty of cores available, and the
machine is pretty much useless to me until my test starts.  There's
also a lot of room for profiling to identify bad algorithms; I think
we spend a lot of time reading the solib list from the inferior
(something I thought I and others had fixed thoroughly already...) and
I routinely hit inefficient algorithms e.g. during "next".

>
>
> The plan I mentioned upthread is probably pretty good for scaling to
> distro-sized programs, say 200 shared libraries or less (this is
> LibreOffice or Mozilla).  Maybe we could get a bit more by putting
> minsyms into the index.
>
> I am not so confident it would let gdb scale to 5000 shared libraries
> though.
>
> For that size I've had two ideas.

Tom Tromey | 5 Mar 2012 23:03

Re: Inter-CU DWARF size optimizations and gcc -flto

Daniel> I have no idea.  One thing I'd like to revisit is your work on
Daniel> threaded symbol load; I have plenty of cores available, and the
Daniel> machine is pretty much useless to me until my test starts.

This might help, it would be worth trying at least.
I am mildly skeptical about it working well with a very big program.
It seems like you could get into memory trouble, which would need a
different sort of scaling approach.

Also, with .gdb_index, in my tests the startup time of gdb is dominated
by minsym reading, even banal stuff like sorting them.  I think you'd
have to insert some threading bits in there too... easy though.
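
The shape of "threading the minsym sort" might look like the sketch below:
one sort task per objfile, handed to a worker pool.  It is an assumption-laden
illustration, not GDB code; the function name and the data layout are
hypothetical, and in CPython the global interpreter lock means pure-Python
sorts gain little from threads, so only the structure carries over.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

MinSym = Tuple[int, str]  # (address, symbol name)


def sort_minsyms_parallel(tables: Dict[str, List[MinSym]],
                          workers: int = 4) -> Dict[str, List[MinSym]]:
    """Sort each objfile's minimal symbols by address, one task per objfile."""
    names = list(tables)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in input order, so zip pairs them correctly.
        sorted_lists = pool.map(sorted, (tables[n] for n in names))
        return dict(zip(names, sorted_lists))
```

Each table is independent, which is what makes this step easy to parallelize;
the memory concern raised above is untouched by it.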

Daniel> There's
Daniel> also a lot of room for profiling to identify bad algorithms; I think
Daniel> we spend a lot of time reading the solib list from the inferior
Daniel> (something I thought I and others had fixed thoroughly already...) and
Daniel> I routinely hit inefficient algorithms e.g. during "next".

Yeah, I hadn't even gotten to thinking about anything other than the
symbol tables.

Tom> First, and simplest, punt.  Make the user disable automatic reading of
Tom> shared library debuginfo (or even minsyms) and make the user explicitly
Tom> mention which ones should be used -- either by 'sharedlibrary' or by a
Tom> linespec extension.

Daniel> I am hugely unexcited by this.

Yeah, me too.  It would "work" but the user experience would not be

Gary Benson | 15 Mar 2012 13:51

Re: Inter-CU DWARF size optimizations and gcc -flto

Daniel Jacobowitz wrote:
> There's also a lot of room for profiling to identify bad algorithms;
> I think we spend a lot of time reading the solib list from the
> inferior (something I thought I and others had fixed thoroughly
> already...) and I routinely hit inefficient algorithms e.g. during
> "next".

I did some work on this recently.  On my setup (with gdb and the
inferior on the same machine) it was spending a huge chunk of time
regenerating symbol tables every time the solib_event_breakpoint
hit.  The final patch I committed is here:

  http://www.cygwin.com/ml/gdb-patches/2011-10/msg00068.html

If you're seeing some sort of qsort comparison function at the top
of the profile it could be that something is bypassing this.

If you find the time is taken up mostly with transferring data from
the inferior to gdb (I never tried remote, for instance) then you
might be interested in some work I did last year on a SystemTap based
interface between glibc and gdb that should be able to be extended to
allow selective reading of the solib list.  That's waiting on Sergio's
SystemTap stuff... also the glibc maintainers seem hostile to the
idea of us inserting SystemTap probes in there.  I can dig up the code
I had for this if you're interested.

I also had a patch floating around that disabled the solib event
breakpoint under certain conditions, but I think the ambiguous
linespec stuff makes this patch invalid as you always have to be
looking out for new functions turning up.  If you're interested the