Jan Kratochvil | 18 Oct 2011 11:44

gcc dwarf2out: Drop the size + performance overhead of DW_AT_sibling

Hi Mark,

<warning> moved to a public list </warning>

On Tue, 18 Oct 2011 11:26:03 +0200, Mark Wielaard wrote:
> On Mon, 2011-10-17 at 15:36 +0200, Jan Kratochvil wrote:
> > gcc.post: Drop DW_AT_sibling; remove 27 LoC: -3.49% .debug size, -1.7%
> > GDB time.
> 
> Do you have more information about that? Systemtap for example, which
> uses elfutils libdw, uses DW_AT_sibling to more efficiently go through
> the debug_info DIEs.

The patch with various benchmarks is:
	[patch] dwarf2out: Drop the size + performance overhead of DW_AT_sibling
	http://gcc.gnu.org/ml/gcc-patches/2011-10/msg00992.html

GDB also uses DW_AT_sibling when available (skip_one_die and
locate_pdi_sibling).  To quote the mail above:
# I guess DW_AT_sibling had real performance gains on CPUs with 1x (=no) clock
# multipliers.  Nowadays mostly only the data size transferred over FSB matters.

The problem is that skipping DIEs is so cheap on current CPUs that the saving
cannot make up for the overhead of providing the helper data for it.  I did
not expect that dropping DW_AT_sibling would even be a consumer performance
_improvement_.  I rather expected the difference to be either not measurable
or not significant enough to justify the .debug on-disk size cost.
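
To illustrate the trade-off, here is a minimal sketch in Python.  It is not
GDB or libdw code: the flat list, the dictionary keys and both function names
are assumptions made for illustration only.  It models .debug_info as a
pre-order list in which a DIE with children is followed by those children and
a null entry, and it counts how many entries a consumer has to look at to get
past one DIE.

```python
NULL = None  # models the null entry that terminates a chain of siblings

def skip_without_sibling(flat, i):
    """Walk past DIE i and its whole subtree; return (next index, entries visited)."""
    visited = 1
    has_children = flat[i]["has_children"]
    i += 1
    if not has_children:
        return i, visited
    depth = 1
    while depth > 0:
        entry = flat[i]
        visited += 1
        if entry is NULL:
            depth -= 1          # end of one level of children
        elif entry["has_children"]:
            depth += 1          # descend into a nested subtree
        i += 1
    return i, visited

def skip_with_sibling(flat, i):
    """Jump straight to the sibling when the (modelled) DW_AT_sibling is present."""
    sibling = flat[i].get("sibling")
    if sibling is not None:
        return sibling, 1
    return skip_without_sibling(flat, i)

# Hypothetical CU fragment: a struct with two members, followed by a base type.
flat = [
    {"name": "struct A", "has_children": True, "sibling": 4},
    {"name": "member x", "has_children": False},
    {"name": "member y", "has_children": False},
    NULL,
    {"name": "int", "has_children": False},
]

print(skip_without_sibling(flat, 0))
print(skip_with_sibling(flat, 0))
```

The walk touches every entry of the subtree while the jump touches one, but
every DIE carrying the attribute also costs extra bytes that have to be read
from disk and parsed, which is the overhead discussed above.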

I only ran gdb and idb benchmarks.  A systemtap benchmark is welcome; here are
the libstdc++ files I used for benchmarking, if they are enough for systemtap
this way:
	http://people.redhat.com/jkratoch/ns.tar.xz

Jan Kratochvil | 18 Oct 2011 11:49

Re: gcc dwarf2out: Drop the size + performance overhead of DW_AT_sibling

On Tue, 18 Oct 2011 11:44:57 +0200, Jan Kratochvil wrote:
> The problem is that skipping DIEs is so cheap on current CPUs that the saving
> cannot make up for the overhead of providing the helper data for it.  I did
> not expect that dropping DW_AT_sibling would even be a consumer performance
> _improvement_.  I rather expected the difference to be either not measurable
> or not significant enough to justify the .debug on-disk size cost.

Maybe it could be worth tuning for the specific special cases where
DW_AT_sibling skips a larger set of DIEs and any of the consumers benefits
from that case.

At least in the case of GDB there are so many performance issues several
orders of magnitude worse than reading the CU data that I do not think it
matters much.  The on-disk size should be the primary concern even if that
meant some performance degradation, which would be barely measurable.

It is true systemtap is a different kind of consumer, thanks for pointing it
out.

Thanks,
Jan

Mark Wielaard | 19 Oct 2011 11:34

Re: gcc dwarf2out: Drop the size + performance overhead of DW_AT_sibling

Hi Jan,

On Tue, 2011-10-18 at 11:44 +0200, Jan Kratochvil wrote:
> On Tue, 18 Oct 2011 11:26:03 +0200, Mark Wielaard wrote:
> > On Mon, 2011-10-17 at 15:36 +0200, Jan Kratochvil wrote:
> > > gcc.post: Drop DW_AT_sibling; remove 27 LoC: -3.49% .debug size, -1.7%
> > > GDB time.
> > 
> > Do you have more information about that? Systemtap for example, which
> > uses elfutils libdw, uses DW_AT_sibling to more efficiently go through
> > the debug_info DIEs.
> 
> The patch with various benchmarks is:
> 	[patch] dwarf2out: Drop the size + performance overhead of DW_AT_sibling
> 	http://gcc.gnu.org/ml/gcc-patches/2011-10/msg00992.html
> 
> GDB also uses DW_AT_sibling when available (skip_one_die and
> locate_pdi_sibling).  To quote the mail above:
> # I guess DW_AT_sibling had real performance gains on CPUs with 1x (=no) clock
> # multipliers.  Nowadays mostly only the data size transferred over FSB matters.
> 
> The problem is that skipping DIEs is so cheap on current CPUs that the saving
> cannot make up for the overhead of providing the helper data for it.  I did
> not expect that dropping DW_AT_sibling would even be a consumer performance
> _improvement_.  I rather expected the difference to be either not measurable
> or not significant enough to justify the .debug on-disk size cost.
> 
> I only ran gdb and idb benchmarks.  A systemtap benchmark is welcome; here are
> the libstdc++ files I used for benchmarking, if they are enough for systemtap
> this way:
> 	http://people.redhat.com/jkratoch/ns.tar.xz

Jan Kratochvil | 19 Oct 2011 13:28

Re: gcc dwarf2out: Drop the size + performance overhead of DW_AT_sibling

Hi Mark,

On Wed, 19 Oct 2011 11:34:18 +0200, Mark Wielaard wrote:
> real	0m0.558s
->
> real	0m0.603s

I find that (about 8% slower) a significant enough performance degradation.

I will look into a compromise that emits DW_AT_sibling entries selectively.
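
One possible shape of such a compromise, sketched in Python under stated
assumptions: emit the attribute only for DIEs whose subtree is large enough
that skipping it pays off.  The dictionary-based DIE model, the function names,
the `threshold` parameter and the choice of counting all descendants are
hypothetical; this is not what dwarf2out.c does.

```python
def count_descendants(die):
    """Number of DIEs below `die` at any depth."""
    return sum(1 + count_descendants(child) for child in die["children"])

def should_emit_sibling(die, has_next_sibling, threshold):
    """Hypothetical policy: emit DW_AT_sibling only when it lets a
    consumer skip at least `threshold` DIEs."""
    return has_next_sibling and count_descendants(die) >= threshold

def leaf(name):
    """Build a DIE without children."""
    return {"name": name, "children": []}

# Two made-up DIEs: a small struct and a class with six methods.
small = {"name": "struct small", "children": [leaf("a"), leaf("b")]}
big = {
    "name": "class big",
    "children": [
        {"name": f"method{i}", "children": [leaf("this"), leaf("arg")]}
        for i in range(6)
    ],
}

for die in (small, big):
    print(die["name"], count_descendants(die),
          should_emit_sibling(die, True, threshold=16))
```

With a threshold of 16 the small struct gets no attribute and the class does,
so the size cost is paid only where a consumer can skip a larger set of DIEs.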

Thanks for the stap commands for performance tuning.

Regards,
Jan

Jan Kratochvil | 19 Oct 2011 17:37

Re: gcc dwarf2out: Drop the size + performance overhead of DW_AT_sibling

Hi Mark,

(.debug size)
5059640 = -3.71%; no DW_AT_sibling
stap	0m0.577s (real)
gdb	0m0.234s (real)

5061160 = -3.68%; DW_AT_sibling if >= 256 total children
stap	0m0.572s
gdb	0m0.232s

5084040 = -3.25%; DW_AT_sibling if >= 16 total children
stap	0m0.545s
gdb	0m0.231s

5169888 = -1.62%; DW_AT_sibling if >= 4 total children
stap	0m0.540s
gdb	0m0.235s

5254792 = ------; all DW_AT_sibling
stap	0m0.536s
gdb	0m0.243s

So the stap and gdb results point in exactly opposite directions: stap is
fastest with all DW_AT_sibling entries present, while gdb is slowest.
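
For reference, the size percentages above are relative to the build with all
DW_AT_sibling entries.  A short Python check of that arithmetic, using only
the byte counts listed in this mail (the variable and function names are just
for illustration):

```python
baseline = 5254792  # .debug size in bytes with all DW_AT_sibling entries

sizes = {
    "no DW_AT_sibling": 5059640,
    ">= 256 total children": 5061160,
    ">= 16 total children": 5084040,
    ">= 4 total children": 5169888,
}

def reduction_percent(size, base):
    """Size reduction relative to the baseline, in percent."""
    return (base - size) / base * 100

for label, size in sizes.items():
    print(f"{label}: -{reduction_percent(size, baseline):.2f}%")
```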

I will redo the timings after another change (DW_FORM_ref_udata) which may yet
change the timing / magic threshold.
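
As background on why that form may matter: DW_FORM_ref_udata stores the
CU-relative offset as an unsigned LEB128 number, so small offsets take fewer
bytes than a fixed-width reference.  The sketch below is a generic ULEB128
encoder in Python, not gcc code, and it assumes the references are currently
emitted as fixed 4-byte DW_FORM_ref4.

```python
def uleb128(value):
    """Encode a non-negative integer as unsigned LEB128."""
    out = bytearray()
    while True:
        byte = value & 0x7F      # low 7 bits go into this byte
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)

# Encoded size of a few CU-relative offsets, next to a fixed 4-byte reference.
for offset in (0x7F, 0x80, 0x3FFF, 0x4000, 0x1FFFFF):
    print(hex(offset), len(uleb128(offset)), "bytes vs 4")
```

Under that assumption, any offset below 2^21 needs at most 3 bytes, so each
remaining DW_AT_sibling would become cheaper in CUs smaller than 2 MiB.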

Thanks,
Jan