Mouse | 21 Apr 2012 06:07

Incorrect demangling

I see this behaviour on 4.0.1 i386, 5.1 i386, 1.4T sparc, and
4.0_STABLE i386.  I'm curious how widespread it is.  (In particular,
I'd very much like to know if others on such systems don't see it.)

% cat z.c
extern void foo__prep(void);
int main(void);
int main(void)
{
 foo__prep();
 return(0);
}
% cc -o z z.c
/var/tmp//ccKokvnE.o: In function `main':
z.c:(.text+0x12): undefined reference to `foo(long double *,...)( *)'
% 

The issue, of course, is that it should be reporting foo__prep, not a
bizarre prototypeish thing, in the complaint.  This looks to me like a
misfire of C++ demangling, but that doesn't really help all that much,
especially since there's no C++ involved.

/~\ The ASCII				  Mouse
\ / Ribbon Campaign
 X  Against HTML		mouse <at> rodents-montreal.org
/ \ Email!	     7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B

Iain Hibbert | 21 Apr 2012 08:33

Re: Incorrect demangling

On Sat, 21 Apr 2012, Mouse wrote:

> I see this behaviour on 4.0.1 i386, 5.1 i386, 1.4T sparc, and
> 4.0_STABLE i386.  I'm curious how widespread it is.  (In particular,
> I'd very much like to know if others on such systems don't see it.)
>
> % cat z.c
> extern void foo__prep(void);
> int main(void);
> int main(void)
> {
>  foo__prep();
>  return(0);
> }
> % cc -o z z.c
> /var/tmp//ccKokvnE.o: In function `main':
> z.c:(.text+0x12): undefined reference to `foo(long double *,...)( *)'
> %
>
> The issue, of course, is that it should be reporting foo__prep, not a
> bizarre prototypeish thing, in the complaint.  This looks to me like a
> misfire of C++ demangling, but that doesn't really help all that much,
> especially since there's no C++ involved.

well if you investigated a bit more you might find that it is not the C
compiler that is doing this.

iain

Mouse | 21 Apr 2012 09:32

Re: Incorrect demangling

>> % cc -o z z.c
>> /var/tmp//ccKokvnE.o: In function `main':
>> z.c:(.text+0x12): undefined reference to `foo(long double *,...)( *)'
>> %

> well if you investigated a bit more you might find that it is not the
> C compiler that is doing this.

(a) I didn't say anything about what was doing it.  I assume it's
somewhere in ld, since that's where undefined references would normally
be discovered, but I don't see any particular relevance to that even if
it's true.

(b) There's a sense in which it is; if cc chooses to use other tools as
an implementation technique, it's still cc that they're implementing.

/~\ The ASCII				  Mouse
\ / Ribbon Campaign
 X  Against HTML		mouse <at> rodents-montreal.org
/ \ Email!	     7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B

Iain Hibbert | 21 Apr 2012 10:54

Re: Incorrect demangling

On Sat, 21 Apr 2012, Mouse wrote:

> >> % cc -o z z.c
> >> /var/tmp//ccKokvnE.o: In function `main':
> >> z.c:(.text+0x12): undefined reference to `foo(long double *,...)( *)'
> >> %
>
> > well if you investigated a bit more you might find that it is not the
> > C compiler that is doing this.
>
> (a) I didn't say anything about what was doing it.

 "This looks to me like a misfire of C++ demangling, but that doesn't
  really help all that much, especially since there's no C++ involved."

in case you did not investigate further as yet:

% gcc -o foo foo.c
/var/tmp//ccOOsL8j.o: In function `main':
foo.c:(.text+0x7): undefined reference to `foo(long double *,...)( *)'

% gcc -Wl,--no-demangle -o foo foo.c
/var/tmp//ccQoqdGk.o: In function `main':
foo.c:(.text+0x7): undefined reference to `foo__prep'

regards,
iain

Mouse | 21 Apr 2012 17:10

Re: Incorrect demangling

>>>> z.c:(.text+0x12): undefined reference to `foo(long double *,...)( *)'
>>> well if you investigated a bit more you might find that it is not
>>> the C compiler that is doing this.
>> (a) I didn't say anything about what was doing it.
> "This looks to me like a misfire of C++ demangling, but that doesn't
> really help all that much, especially since there's no C++ involved."

I'm not quite sure how that amounts to a claim that it's the C compiler
doing it, but if you want to take it that way, then sure.

> in case you did not investigate further as yet:
> [...]
> % gcc -Wl,--no-demangle -o foo foo.c

Okay, so there's a workaround, at least.

I still maintain it's a bug; (a) I shouldn't have to tell it whether
symbols deserve demangling and (b) even the workaround is rather crude,
and won't work at all when using both C and C++ in the same program.
(Not to mention that this makes me suspect it's possible to get a name
conflict between a C symbol and a (mangled) C++ symbol when it really
shouldn't be.)  I'd say the right fix is for mangling to generate
symbols C can't generate; my first impulse is to use . instead of __ to
separate the name from the encoded data types.  Better would be to
encode the "this is a mangled C++ name" somewhere other than the
symbol's name in the first place.
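
As a rough illustration of why a dot separator would keep mangled names
out of C's way: a portable C identifier is a letter or underscore
followed by letters, digits, or underscores, so foo__prep is an ordinary
C name while a name containing a dot could never come from C source.
The sketch below only checks that spelling rule.  The helper name
is_portable_c_identifier and the sample name foo..prep are made up for
illustration, and extensions such as '$' in identifiers are deliberately
not accepted.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: returns true if s is spelled like a portable C
 * identifier, i.e. [A-Za-z_][A-Za-z0-9_]*.  Compiler extensions such as
 * '$' in identifiers are treated as non-portable and rejected. */
bool is_portable_c_identifier(const char *s)
{
    /* The first character must be a letter or an underscore. */
    if (s == NULL || !(isalpha((unsigned char)s[0]) || s[0] == '_'))
        return false;

    /* Every later character must be a letter, digit, or underscore. */
    for (size_t i = 1; s[i] != '\0'; i++) {
        if (!(isalnum((unsigned char)s[i]) || s[i] == '_'))
            return false;
    }
    return true;
}
```

Under this rule foo__prep is accepted and foo..prep is rejected, which
is the property a mangling scheme would need in order to be unambiguous
from the symbol name alone.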

At any rate, it appears the bug isn't one I introduced, since you seem
to be seeing it too.

Martin Husemann | 21 Apr 2012 20:32

Re: Incorrect demangling

On Sat, Apr 21, 2012 at 11:10:57AM -0400, Mouse wrote:
> I still maintain it's a bug;

Yes, it is a bug.

An easy papering over would be to have {g}cc invoke ld with the no-demangle
flag, but not {g}c++. This would cover your example, but (as you said) still
leave mixed C/C++ programs broken.

A real fix would involve changing the (de)mangling scheme and force usage
of a special character ("$" or something) in all mangled names. However,
this would break binary compatibility.

This should be solved upstream, any volunteers for opening a bugzilla ticket?

Martin

Mouse | 21 Apr 2012 20:59

Re: Incorrect demangling

> A real fix would involve changing the (de)mangling scheme and force
> usage of a special character ("$" or something) in all mangled names.
> However, this would break binary compatibility.

I'd prefer something else, because gcc lets you use $ in identifiers on
some arches (one extend.texi I have handy says that it's supported in
general, but not on "a few target machines, typically because the
target assembler does not allow them").  That's why I suggested a dot,
but I'm hardly attached to the idea.

As for binary compatibility, my inclination would be to switch mangling
styles and at the same time make demangling recognize both kinds of
name, with a message for the old style.  Then, next release, remove the
old style of demangling.
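
A minimal sketch of what such a transitional demangler front end might
look like, assuming (purely for illustration) that the new style uses
".." where the old style uses "__".  The enum, the function names, and
the message text are all hypothetical, and the check only flags
candidates; it does not decode any type information.

```c
#include <string.h>

/* Hypothetical classification of a linker symbol name. */
enum symbol_style { STYLE_PLAIN, STYLE_OLD_MANGLED, STYLE_NEW_MANGLED };

/* Look for a separator after at least one leading character, so that
 * names such as "__start" are not mistaken for mangled names.
 * Assumption: ".." marks the new style and "__" marks the old style. */
enum symbol_style classify_symbol(const char *name)
{
    if (name[0] == '\0')
        return STYLE_PLAIN;
    if (strstr(name + 1, "..") != NULL)
        return STYLE_NEW_MANGLED;
    if (strstr(name + 1, "__") != NULL)
        return STYLE_OLD_MANGLED;
    return STYLE_PLAIN;
}

/* Message a transitional tool could attach to old-style names. */
const char *style_note(enum symbol_style style)
{
    return style == STYLE_OLD_MANGLED
        ? "old-style mangled name; support will be removed"
        : "";
}
```

Note that a C symbol like foo__prep would still be classified as an
old-style candidate for as long as the old style is recognized; the
ambiguity only goes away once that branch is removed.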

I'm not sure whether I'm talking about NetBSD releases or gcc releases
here.  I could argue it either way.

And, of course, it is likely to become irrelevant to NetBSD if/when
NetBSD switches to pcc, or clang, or whatever - well, depending on
whether the replacement does mangling, and if so, how, but it's an
opportunity to take advantage of a free flag day.

> This should be solved upstream, any volunteers for opening a bugzilla
> ticket?

Heh.  Not me; it's somewhere between difficult and impossible (and
certainly unpleasant) for me to use bug-reporting systems that force
reports through a Web interface (which AIUI is true of bugzilla).

David Holland | 21 Apr 2012 23:51

Re: Incorrect demangling

On Sat, Apr 21, 2012 at 02:59:05PM -0400, Mouse wrote:
 > As for binary compatibility, my inclination would be to switch mangling
 > styles and at the same time make demangling recognize both kinds of
 > name, with a message for the old style.  Then, next release, remove the
 > old style of demangling.

It's not just that, it will break all C++ shared libraries too.

Anyhow, it's not likely to get fixed, upstream or otherwise;
identifiers containing __ were made a de facto reserved part of the
linker namespace when C++ first started to become popular... which was
well over 20 years ago.

Maybe if we ever ditch ELF we can include some kind of per-symbol
language or naming domain tags into the new format; but I'm not
convinced it's worthwhile or even necessarily desirable.

-- 
David A. Holland
dholland <at> netbsd.org

Mouse | 22 Apr 2012 00:01

Re: Incorrect demangling

> identifiers containing __ were made a de facto reserved part of the
> linker namespace when C++ first started to become popular... which
> was well over 20 years ago.

Someone should have told the rest of the world, then; if it's taken 20
years for me to even hear about it, it was _abysmally_ advertised, to
the point where I'm not even sure it's fair to call it "de-facto
reserved".

In any case...okay, so I'm on my own here.  I'll see how hard it is to
change the __ to something saner.  (I have some uses where sane
behaviour for linker messages is more important than binary
compatibility with code compiled under the broken paradigm....)

/~\ The ASCII				  Mouse
\ / Ribbon Campaign
 X  Against HTML		mouse <at> rodents-montreal.org
/ \ Email!	     7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B

Rich Neswold | 24 Apr 2012 20:17

Re: Incorrect demangling

On Sat, Apr 21, 2012 at 5:01 PM, Mouse <mouse <at> rodents-montreal.org> wrote:
>
> > identifiers containing __ were made a de facto reserved part of the
> > linker namespace when C++ first started to become popular... which
> > was well over 20 years ago.
>
> Someone should have told the rest of the world, then; if it's taken 20
> years for me to even hear about it, it was _abysmally_ advertised, to
> the point where I'm not even sure it's fair to call it "de-facto
> reserved".

My "C++ Programming Reference, 2nd edition" (1994), section r.2.4 says:

"In addition, identifiers containing a double underscore (__) are
reserved for use by C++ implementations and standard libraries and
should be avoided by users."

So it looks like you won't convince the GCC maintainers to change
the compiler. Plus, PCC and CLANG will probably give you similar
problems.

--
Rich

Alan Barrett | 26 Apr 2012 06:33

Re: Incorrect demangling

On Tue, 24 Apr 2012, Rich Neswold wrote:
> My "C++ Programming Reference, 2nd edition" (1994), section 
> r.2.4 says:
>
> "In addition, identifiers containing a double underscore (__) 
> are reserved for use by C++ implementations and standard 
> libraries and should be avoided by users."

That's in the C++ reference manual, so "users" should be 
understood to mean "users of the C++ language".  The C++ reference 
manual should have no jurisdiction over users of languages other 
than C++.

Users of the C language should be able to find everything they 
need in the C reference manual and associated library reference 
manuals, plus the implementation-specific documentation (e.g. the 
compiler manual) if they rely on implementation-specific features 
of the language.  Section 7.1.3 of the N1256 draft of the C99 
standard provides a list of rules for determining whether an 
identifier is reserved, and double underscore is featured only 
when it appears at the beginning of an identifier.  The same 
clause also says "No other identifiers are reserved", so it seems 
pretty clear to me that identifiers that contain double underscore 
are not reserved in the C language (unless they also hit one of 
the other restrictions).
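
To make the spelling rule concrete, here is a small sketch of the
"always reserved" check from 7.1.3: an identifier is reserved for any
use if it begins with an underscore followed by an uppercase letter or
a second underscore.  The function name is hypothetical, and the sketch
covers only that one rule; it does not model the file-scope rule for
other leading underscores or the names reserved by library headers.

```c
#include <ctype.h>
#include <stdbool.h>

/* Hypothetical helper: true if id begins with "__" or with '_' followed
 * by an uppercase letter, the spellings C reserves for any use.
 * A double underscore later in the name does not count. */
bool is_always_reserved_spelling(const char *id)
{
    return id[0] == '_' &&
           (id[1] == '_' || isupper((unsigned char)id[1]));
}
```

By this rule __foo and _Foo are reserved, while foo__prep is not, which
matches the reading above: a double underscore matters to C only at the
start of an identifier.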

> So it looks like you won't convince the GCC maintainers to 
> change the compiler. Plus, PCC and CLANG will probably give 
> you similar problems.

[...]

Joerg Sonnenberger | 26 Apr 2012 12:12

Re: Incorrect demangling

On Wed, Apr 25, 2012 at 09:33:01PM -0700, Alan Barrett wrote:
> >So it looks like you won't convince the GCC maintainers to
> >change the compiler. Plus, PCC and CLANG will probably give you
> >similar problems.
> 
> It looks as though anything that performs demangling should have
> some way other than the symbol name of figuring out whether the
> symbol came from C++, so that it is not tripped up by valid C
> symbols that happen to mean something special when viewed as C++
> symbols.

It can't. Just look at the history of C++ and the way it has always been
implemented. On most systems it is simply not possible to use
non-C-compatible identifiers as result of the mangling, because the
assembler or linker would reject them. There is no bug here. Newer ld
defaults to demangling off again. It is purely cosmetic anyway. If you
want to use C and C++ in the same program, you either *depend* on being
able to create functions that can be used from C++ (including
overloading etc) or you will get a linker complaint about redefinition on
accidental matches.

Joerg

Mouse | 26 Apr 2012 17:02

Re: Incorrect demangling

>> It looks as though anything that performs demangling should have
>> some way other than the symbol name of figuring out whether the
>> symbol came from C++, so that it is not tripped up by valid C
>> symbols that happen to mean something special when viewed as C++
>> symbols.
> It can't.

Sure it can.  There are at least two ways of doing it.

> Just look at the history of C++ and the way it has always been
> implemented.

"It's always been done this way, therefore it must always be done this
way"?  I sure hope that's not what you mean here, but it's what it
sounds like.  While I don't actually know, I also suspect that there's
more variety in existence than you credit.  A few very common
implementations != all implementations, in most cases.

> On most systems it is simply not possible to use non-C-compatible
> identifiers as result of the mangling, because the assembler or
> linker would reject them.

Even if true, this means "it can't be done using the legacy assembler
and linker", not "it can't be done".

I also suspect it's often not true.  Many, possibly even most,
assemblers and linkers allow dots in symbols; using .. instead of __
would completely fix the C-vs-mangled-C++ issue.  (So would extending
the symbol table entry with a bit somewhere that says "this is a
mangled C++ symbol", though that one requires changing more things that
[...]

David Holland | 27 Apr 2012 17:42

Re: Incorrect demangling

On Thu, Apr 26, 2012 at 12:12:35PM +0200, Joerg Sonnenberger wrote:
 > > >So it looks like you won't convince the GCC maintainers to
 > > >change the compiler. Plus, PCC and CLANG will probably give you
 > > >similar problems.
 > > 
 > > It looks as though anything that performs demangling should have
 > > some way other than the symbol name of figuring out whether the
 > > symbol came from C++, so that it is not tripped up by valid C
 > > symbols that happen to mean something special when viewed as C++
 > > symbols.
 > 
 > It can't. Just look at the history of C++ and the way it has always been
 > implemented. On most systems it is simply not possible to use
 > non-C-compatible identifiers as result of the mangling, because the
 > assembler or linker would reject them. There is no bug here. Newer ld
 > defaults to demangling off again. It is purely cosmetic anyway. If you
 > want to use C and C++ in the same program, you either *depend* on being
 > able to create functions that can be used from C++ (including
 > overloading etc) or you will get a linker complaint about redefinition on
 > accidental matches.

Buncombe.

You may recall that in a.out days, the C symbol "foo" appeared in the
assembler and linker namespace as "_foo", providing a large namespace
that C programs wouldn't overlap with. For some reason this practice
was dropped by the guys who designed ELF.

...furthermore, nothing prevents attaching a separate language code to
symbol names, except that our antiquated binary format doesn't support
[...]

