Trent Piepho | 1 Nov 2006 12:42

Re: Improved remove-logo filter

On Wed, 1 Nov 2006, Michael Niedermayer wrote:
> On Tue, Oct 31, 2006 at 01:13:29PM -0800, Trent Piepho wrote:
> > On Tue, 31 Oct 2006, Guillaume POIRIER wrote:
> >
> > > Hi Trent,
> > >
> > > Were you able to work on improving your patch?
> >
> > What was there to do?  Change the asm to use "+" constraints
> > even though it doesn't always work with gcc 2.7.2?
>
> does gcc 2.7.2 compile mplayer at all?

Sorry, I meant 2.95, but I'm not sure if that's the case.  Some older gcc
docs explicitly say you can't use "+", newer ones say you can, but with
various conditions (which change from version to version).

You might also want to look at these threads:
http://marc.theaimsgroup.com/?l=linux-kernel&m=107475162200773&w=2
http://lkml.org/lkml/2006/7/8/251

Using "+m" vs "=m"/"m" is a complex issue.

>
> and if you use "=something" then you should also be aware of that for example
>  "=r"(a)
> :"r"(b)
>
> does not prevent %0 == %1 if you want an output to not be able to use the
> same register or memory location as an random input then you must use "=&..."
(Continue reading)
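
The early-clobber point quoted above can be illustrated with a small sketch. It assumes GCC or Clang on x86 (other targets fall back to plain C), and the helper name double_it is made up for illustration; it is not from the patch under discussion. Without the '&', the compiler may give %0 and %1 the same register, and the first instruction would then destroy the input before it is read.

```c
/* Hypothetical helper: computes 2*b, writing the output operand
 * before the input operand has been read for the last time. */
static int double_it(int b)
{
    int a;
#if defined(__i386__) || defined(__x86_64__)
    __asm__("movl $0, %0\n\t"   /* output is written first ...          */
            "addl %1, %0\n\t"   /* ... while the input is still needed  */
            "addl %1, %0"
            : "=&r"(a)          /* '&': may not share a register with any input */
            : "r"(b)
            : "cc");            /* addl modifies the flags */
#else
    a = b + b;                  /* plain C fallback for non-x86 targets */
#endif
    return a;
}
```

With "=r" instead of "=&r" the result would depend on register allocation, which is the hazard being described.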

Michael Niedermayer | 1 Nov 2006 14:18

Re: Improved remove-logo filter

Hi

On Wed, Nov 01, 2006 at 03:42:30AM -0800, Trent Piepho wrote:
> On Wed, 1 Nov 2006, Michael Niedermayer wrote:
> > On Tue, Oct 31, 2006 at 01:13:29PM -0800, Trent Piepho wrote:
> > > On Tue, 31 Oct 2006, Guillaume POIRIER wrote:
> > >
> > > > Hi Trent,
> > > >
> > > > Were you able to work on improving your patch?
> > >
> > > What was there to do?  Change the asm to use "+" constraints
> > > even though it doesn't always work with gcc 2.7.2?
> >
> > does gcc 2.7.2 compile mplayer at all?
> 
> Sorry, I meant 2.95, but I'm not sure if that's the case.  Some older gcc
> docs explictly say you can't use "+", newer ones say you can, but with
> various conditions (which change from version to version).
> 
> You might also want to look at these threads:
> http://marc.theaimsgroup.com/?l=linux-kernel&m=107475162200773&w=2
> http://lkml.org/lkml/2006/7/8/251
> 
> Using "+m" vs "=m"/"m" is a complex issue.
> 
> >
> > and if you use "=something" then you should also be aware of that for example
> >  "=r"(a)
> > :"r"(b)
(Continue reading)

Trent Piepho | 1 Nov 2006 20:47

Re: Improved remove-logo filter

On Wed, 1 Nov 2006, Michael Niedermayer wrote:
> On Wed, Nov 01, 2006 at 03:42:30AM -0800, Trent Piepho wrote:
> > On Wed, 1 Nov 2006, Michael Niedermayer wrote:
> > > On Tue, Oct 31, 2006 at 01:13:29PM -0800, Trent Piepho wrote:
> > > does gcc 2.7.2 compile mplayer at all?
> >
> > Sorry, I meant 2.95, but I'm not sure if that's the case.  Some older gcc

I'm now more sure of how this worked.  In 2.7.2, '+' wasn't allowed.  In
2.95, it was allowed but didn't work correctly.  I'm not sure when, or if,
it was fixed.

> > You might also want to look at these threads:
> > http://marc.theaimsgroup.com/?l=linux-kernel&m=107475162200773&w=2

> > > does not prevent %0 == %1 if you want an output to not be able to use the
> > > same register or memory location as an random input then you must use "=&..."
> > > iam not sure if that could cause any problems with your code as i didnt look
> > > at it, just the constraints quoted above in which "=m" (accumulator) and
> > > "m" (accumulator) could be in the same memory location or a different one
> > > or "=m" (accumulator) and "g" (stride) could be in the same memory location
> >
> > How could accumulator be in two different memory locations?
>
> an opimizing compiler can make a copy, for example it could copy it to
> the stack, gcc may or may not be capable of that but that doesnt matter
> for the validity of the code ...

It isn't allowed to do that.  If it did, it would be impossible to write
atomic operations.  There would be no way to write something like a
(Continue reading)

Michael Niedermayer | 2 Nov 2006 03:08

Re: Improved remove-logo filter

Hi

On Wed, Nov 01, 2006 at 11:47:22AM -0800, Trent Piepho wrote:
> On Wed, 1 Nov 2006, Michael Niedermayer wrote:
> > On Wed, Nov 01, 2006 at 03:42:30AM -0800, Trent Piepho wrote:
> > > On Wed, 1 Nov 2006, Michael Niedermayer wrote:
> > > > On Tue, Oct 31, 2006 at 01:13:29PM -0800, Trent Piepho wrote:
> > > > does gcc 2.7.2 compile mplayer at all?
> > >
> > > Sorry, I meant 2.95, but I'm not sure if that's the case.  Some older gcc
> 
> I'm now more sure of how this worked.  In 2.7.2, '+' wasn't allowed.  In
> 2.95, it was allowed but didn't work correctly.  I'm not sure when, or if,
> it was fixed.

well, the fact is mplayer uses "+" frequently and mplayer works very well with
2.95.3+; there were some big bugs in 2.95.2 related to asm, though i dunno
if they were related to "+"

> 
> > > You might also want to look at these threads:
> > > http://marc.theaimsgroup.com/?l=linux-kernel&m=107475162200773&w=2
> 
> > > > does not prevent %0 == %1 if you want an output to not be able to use the
> > > > same register or memory location as an random input then you must use "=&..."
> > > > iam not sure if that could cause any problems with your code as i didnt look
> > > > at it, just the constraints quoted above in which "=m" (accumulator) and
> > > > "m" (accumulator) could be in the same memory location or a different one
> > > > or "=m" (accumulator) and "g" (stride) could be in the same memory location
> > >
(Continue reading)

Trent Piepho | 3 Nov 2006 00:22

Re: Improved remove-logo filter

On Thu, 2 Nov 2006, Michael Niedermayer wrote:
> > > >
> > > > Sorry, I meant 2.95, but I'm not sure if that's the case.  Some older gcc
> >
> > I'm now more sure of how this worked.  In 2.7.2, '+' wasn't allowed.  In
> > 2.95, it was allowed but didn't work correctly.  I'm not sure when, or if,
> > it was fixed.
>
> well, the fact is mplayer uses "+" frequently and mplayer works very well with
> 2.95.3+; there were some big bugs in 2.95.2 related to asm, though i dunno
> if they were related to "+"

Comments like these make me think everything doesn't always work that well:
//FIXME this is fragile gcc either runs out of registers or misscompiles it (for
example if "+a"(bit) or "+m"(*state) is used
//Note "+bm" and "+mb" are buggy too (with gcc 3.2.2 at least) and cant be used

And then all the hard coded registers...

If one of the gcc developers says that there were problems with '+m' in asm
constraints, I'd be inclined to believe him.  Freebsd people seem to have
found the same thing:
http://www.mail-archive.com/cvs-all@freebsd.org/msg49149.html

> > > > You might also want to look at these threads:
> > > > http://marc.theaimsgroup.com/?l=linux-kernel&m=107475162200773&w=2
> >
> > > > > does not prevent %0 == %1 if you want an output to not be able to use the
> > > > > same register or memory location as an random input then you must use "=&..."
> > > > > iam not sure if that could cause any problems with your code as i didnt look
(Continue reading)

Uoti Urpala | 3 Nov 2006 01:22

Re: Improved remove-logo filter

On Thu, 2006-11-02 at 15:22 -0800, Trent Piepho wrote:
> And in order to be able
> to write one in asm, you need to be able to modify a non-copy of a
> memory location.

I already explained once that you can use the volatile keyword for
this...

Are you really seriously arguing that gcc should disable all
optimizations in a function if there's any asm? That it must not keep
variables in registers since original memory copy of the variable could
be changed from outside etc?

> Did you read the same thread I did?
> 
> "=m"(x) : "0"(y)  isn't allowed, as a and b can't be in the same location.

I'm not sure exactly what you're arguing for in here and the rest of
your message, but I think it's not about the same thing as your original
disagreement with Michael.

Michael's original comment which you disagreed with said that if you do
use [out]"=m"(a):[in]"m"(a) then [out] and [in] could be the same memory
location or different ones, and [out] might overlap some other input
parameter. I don't see anything in your latest reply which would show
that gcc guarantees the opposite will always be true.

Trent Piepho | 3 Nov 2006 17:13

Re: Improved remove-logo filter

On Fri, 3 Nov 2006, Uoti Urpala wrote:
> On Thu, 2006-11-02 at 15:22 -0800, Trent Piepho wrote:
> > And in order to be able
> > to write one in asm, you need to be able to modify a non-copy of a
> > memory location.
>
> I already explained once that you can use the volatile keyword for
> this...

That's not how it works...  Say I write this:

volatile int x;
asm("# atomic test and set %0" : "+r"(x));

Obviously x must be copied into a register.  It's volatile, yet the asm
instruction gets a copy.  Obviously, this won't work for a spin-lock.

volatile just means that all the loads and stores implied in the code must
be present.  gcc can copy all the inputs to a new location before an asm
block, and then copy all the outputs back out after the block, and this
satisfies volatile.  This isn't enough for something like a spin-lock.
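
A minimal sketch of the spin-lock case, assuming GCC or Clang on x86; the function name test_and_set is made up for illustration. The lock word is passed as a memory operand so that the instruction acts on the variable itself and not on a register copy. Note that it uses "+m", which current GCC documentation accepts as far as I know, but which is exactly the form disputed in this thread.

```c
/* Hypothetical test-and-set: stores 1 into *lock and returns the old value. */
static int test_and_set(volatile int *lock)
{
    int old = 1;
#if defined(__i386__) || defined(__x86_64__)
    /* xchg with a memory operand is atomic on x86.  "+m" makes the
     * instruction operate on *lock in place; "memory" keeps the compiler
     * from caching other memory values across the lock operation. */
    __asm__ __volatile__("xchgl %0, %1"
                         : "+r"(old), "+m"(*lock)
                         :
                         : "memory");
#else
    /* GCC builtin used as a fallback on non-x86 targets. */
    old = __sync_lock_test_and_set(lock, 1);
#endif
    return old;
}
```

A caller would spin with something like `while (test_and_set(&lock)) ;` and release the lock by storing 0.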

> Are you really seriously arguing that gcc should disable all
> optimizations in a function if there's any asm? That it must not keep
> variables in registers since original memory copy of the variable could
> be changed from outside etc?

Where did I say anything like that?  I said that in order to be able to
write atomic operations, gcc must make "m" constraints not be copies, but
the actual variable.
(Continue reading)

Michael Niedermayer | 4 Nov 2006 02:22

Re: Improved remove-logo filter

Hi

On Fri, Nov 03, 2006 at 08:13:37AM -0800, Trent Piepho wrote:
[...]
> 
> > Are you really seriously arguing that gcc should disable all
> > optimizations in a function if there's any asm? That it must not keep
> > variables in registers since original memory copy of the variable could
> > be changed from outside etc?
> 
> Where did I say anything like that?  I said that in order to be able to
> write atomic operations, gcc must make "m" constraints not be copies, but
> the actual variable.

no, there just needs to be some syntactical way to say "don't copy this anywhere
and give me the real location"; this can be volatile or "m" or anything else,
it's the decision of the compiler developers

if "m" has this additional (undocumented) constraint, it's useless for anything
but spinlocks and similar, and an "m2" would be needed for normal applications

> 
> > > Did you read the same thread I did?
> > >
> > > "=m"(x) : "0"(y)  isn't allowed, as a and b can't be in the same location.
> >
> > I'm not sure exactly what you're arguing for in here and the rest of
> > your message, but I think it's not about the same thing as your original
> > disagreement with Michael.
> 
(Continue reading)

Trent Piepho | 4 Nov 2006 11:37

Re: Improved remove-logo filter

On Sat, 4 Nov 2006, Michael Niedermayer wrote:
> so to summarize, please correct me if i am wrong:
> "=m"/"m" works best on most gcc versions according to you; according
> to you and one gcc developer it is guaranteed to have both pointing to the same
> spot in memory, while nothing (?) in the gcc docs would support such a view, and
> consequently according to the docs it has undefined behavior
>
> so until the gcc docs contain a guarantee that "=m"(a)/"m"(a) will always
> point to the same spot in memory, i assume that this is not guaranteed
> yes, the official docs are more authoritative than 2 random people,
> and so code which depends on such assumptions is consequently buggy and
> rejected
> if your claim is really true, please bug the gcc devels to add that to the
> docs or point me to the part of the docs which confirms your claim

It looks like you didn't notice this bit in the gcc 4.1.1 docs:

	Use the constraint character `+' to indicate [a read-write
	operand].  You should ONLY use read-write operands when the
	constraints for the operand (or the operand in which only some of
	the bits are to be changed) ALLOW A REGISTER.

The docs say, only use "+" with constraints that allow a register, which
obviously means don't use it with "m"!  So the "+m" form is out, the docs
(and the various other things I've pointed out) say so.

If you use "=m":"0", you get a warning:
void foo(void) { int x; asm("# %0":"=m"(x):"0"(x)); }
test.c:1: warning: matching constraint does not allow a register
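
For comparison, here is a sketch of the "=m"/"m" pair applied to a read-modify-write of a memory location, assuming GCC or Clang on x86. The function name add_to_memory is hypothetical. Whether the compiler is required to give %0 and %1 the same address is the very question argued about in this thread, so treat this as an illustration of the form, not as a statement of what the docs guarantee.

```c
/* Hypothetical helper: adds n to *acc using one memory-destination add. */
static void add_to_memory(int *acc, int n)
{
#if defined(__i386__) || defined(__x86_64__)
    __asm__("addl %2, %0"
            : "=m"(*acc)            /* %0: written by the instruction           */
            : "m"(*acc), "r"(n)     /* %1: same object, listed as read; %2: n   */
            : "cc");                /* addl modifies the flags */
#else
    *acc += n;                      /* plain C fallback for non-x86 targets */
#endif
}
```

The "m" input is not referenced in the template; it is there only to tell the compiler that the old value of *acc is consumed.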

(Continue reading)

Michael Niedermayer | 5 Nov 2006 14:14

Re: Improved remove-logo filter

Hi

On Sat, Nov 04, 2006 at 02:37:58AM -0800, Trent Piepho wrote:
> On Sat, 4 Nov 2006, Michael Niedermayer wrote:
> > so to summarize, please correct me if i am wrong:
> > "=m"/"m" works best on most gcc versions according to you; according
> > to you and one gcc developer it is guaranteed to have both pointing to the same
> > spot in memory, while nothing (?) in the gcc docs would support such a view, and
> > consequently according to the docs it has undefined behavior
> >
> > so until the gcc docs contain a guarantee that "=m"(a)/"m"(a) will always
> > point to the same spot in memory, i assume that this is not guaranteed
> > yes, the official docs are more authoritative than 2 random people,
> > and so code which depends on such assumptions is consequently buggy and
> > rejected
> > if your claim is really true, please bug the gcc devels to add that to the
> > docs or point me to the part of the docs which confirms your claim
> 
> I looks like you didn't notice this bit in the gcc 4.1.1 docs:
> 
> 	Use the constraint character `+' to indicate [a read-write
> 	operand].  You should ONLY use read-write operands when the
> 	constraints for the operand (or the operand in which only some of
> 	the bits are to be changed) ALLOW A REGISTER.
> 
> The docs say, only use "+" with constraints that allow a register, which
> obviously means don't use it with "m"!  So the "+m" form is out, the docs
> (and the various other things I've pointed out) say so.

do you know what the word "should" means? apparently not also my copy of
(Continue reading)

Trent Piepho | 4 Nov 2006 08:53

Re: Improved remove-logo filter

On Sat, 4 Nov 2006, Michael Niedermayer wrote:
> On Fri, Nov 03, 2006 at 08:13:37AM -0800, Trent Piepho wrote:
> [...]
> >
> > > Are you really seriously arguing that gcc should disable all
> > > optimizations in a function if there's any asm? That it must not keep
> > > variables in registers since original memory copy of the variable could
> > > be changed from outside etc?
> >
> > Where did I say anything like that?  I said that in order to be able to
> > write atomic operations, gcc must make "m" constraints not be copies, but
> > the actual variable.
>
> no, there just needs to be any syntactical way to say "dont copy this anywhere
> and give me the real location" this can be volatile or "m" or anything else,
> its the decission of the compiler developers
>
> if "m" has this additional (undocumented) constraint, its useless for anything
> but spinlocks and similar, and a "m2" would be needed for normal applications

What makes it useless?

> > > > Did you read the same thread I did?
> > > >
> > > > "=m"(x) : "0"(y)  isn't allowed, as a and b can't be in the same location.
> > >
> > > I'm not sure exactly what you're arguing for in here and the rest of
> > > your message, but I think it's not about the same thing as your original
> > > disagreement with Michael.
> >
(Continue reading)

Michael Niedermayer | 5 Nov 2006 13:52

Re: Improved remove-logo filter

Hi

On Fri, Nov 03, 2006 at 11:53:13PM -0800, Trent Piepho wrote:
> On Sat, 4 Nov 2006, Michael Niedermayer wrote:
> > On Fri, Nov 03, 2006 at 08:13:37AM -0800, Trent Piepho wrote:
> > [...]
> > >
> > > > Are you really seriously arguing that gcc should disable all
> > > > optimizations in a function if there's any asm? That it must not keep
> > > > variables in registers since original memory copy of the variable could
> > > > be changed from outside etc?
> > >
> > > Where did I say anything like that?  I said that in order to be able to
> > > write atomic operations, gcc must make "m" constraints not be copies, but
> > > the actual variable.
> >
> > no, there just needs to be any syntactical way to say "dont copy this anywhere
> > and give me the real location" this can be volatile or "m" or anything else,
> > its the decission of the compiler developers
> >
> > if "m" has this additional (undocumented) constraint, its useless for anything
> > but spinlocks and similar, and a "m2" would be needed for normal applications
> 
> What makes it useless?

well, look at all the asm gcc bugs on gcc's bugzilla
if gcc cannot copy "m" onto the stack and use it from there, then it will run
into problems if the number of "m" and "r" operands exceeds the number of
registers minus the ones gcc can't use due to its architecture in case of PIC
so with PIC and without -fomit-frame-pointer on x86 you are limited to 5 asm 
(Continue reading)

Trent Piepho | 5 Nov 2006 22:37

Re: Improved remove-logo filter

On Sun, 5 Nov 2006, Michael Niedermayer wrote:
> On Fri, Nov 03, 2006 at 11:53:13PM -0800, Trent Piepho wrote:
> > On Sat, 4 Nov 2006, Michael Niedermayer wrote:
> > > no, there just needs to be any syntactical way to say "dont copy this anywhere
> > > and give me the real location" this can be volatile or "m" or anything else,
> > > its the decission of the compiler developers
> > >
> > > if "m" has this additional (undocumented) constraint, its useless for anything
> > > but spinlocks and similar, and a "m2" would be needed for normal applications
> >
> > What makes it useless?
>
> well, look at all the asm gcc bugs on gccs bugzilla
> if gcc cannot copy "m" onto the stack and use it from there then it will end
> up with some problems if the number of "m" and "r" exceeds the number of
> registers minus the ones gcc cant use due to its architecture in case of PIC
> so with PIC and without -fomit-frame-pointer on x86 you are limited to 5 asm
> operands which is simply not enough

Which is the case (and what I have been saying all along):

int *a,*b,*c,*d, *e,*f,*g,*h;
void foo(void) {
asm volatile("# %0 %1 %2 %3\n\t# %4 %5"/*" %6 %7"*/ :
    "=m"(*a), "=m"(*b), "=m"(*c), "=m"(*d),
    "=m"(*e), "=m"(*f)/*, "=m"(*g) , "=m"(*h) */); }

gcc -Wall -O4 -S -fpic -fno-omit-frame-pointer test.c
test.c:3: error: can't find a register in class 'GENERAL_REGS' while reloading 'asm'

(Continue reading)

Michael Niedermayer | 7 Nov 2006 15:31

Re: Improved remove-logo filter

Hi

On Sun, Nov 05, 2006 at 01:37:35PM -0800, Trent Piepho wrote:
[...]
> > and copying variables onto the stack can improve speed a lot, gcc svn does
> > this according to uoti, maybe violating your "don't copy" constraint?
>
> It sounded to me like what Uoti found was an asm construct that modified a
> variable through a pointer without telling gcc about it.  Something like:
>
> int x, *p = &x;
> asm("movl $0, (%0)" : : "r"(p));
>
> Of course something like that will not work, because gcc doesn't know the
> value of x has changed.  It needs "memory" on the clobber list, or better
> yet, "=m"(x) as an output.

"=m" needs an additional register in some cases (the ones where no
unmodified register points to it or to a constant offset from it)
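
The two remedies mentioned in the quoted text can be sketched side by side, assuming GCC or Clang on x86. Both function names are made up for illustration. The first keeps the pointer operand and adds "memory" to the clobber list; the second drops the pointer and names the variable as an "=m" output.

```c
/* Hypothetical: store through a pointer, telling the compiler via "memory". */
static int store_zero_with_clobber(void)
{
    int x = 5, *p = &x;
#if defined(__i386__) || defined(__x86_64__)
    __asm__ __volatile__("movl $0, (%0)"
                         :
                         : "r"(p)
                         : "memory");   /* x must be reloaded after the asm */
#else
    *p = 0;                             /* plain C fallback for non-x86 targets */
#endif
    return x;
}

/* Hypothetical: name the variable itself as a memory output. */
static int store_zero_with_output(void)
{
    int x = 5;
#if defined(__i386__) || defined(__x86_64__)
    __asm__("movl $0, %0" : "=m"(x));   /* compiler knows exactly what changed */
#else
    x = 0;                              /* plain C fallback for non-x86 targets */
#endif
    return x;
}
```

The "memory" clobber is the blunter tool, since it invalidates everything the compiler has cached from memory, while the "=m" output is precise but may cost a register for the address, as noted above.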

[...]
> > > > > > > Did you read the same thread I did?
> > > > > > >
> > > > > > > "=m"(x) : "0"(y)  isn't allowed, as a and b can't be in the same location.
> > > > > >
> > > > > > I'm not sure exactly what you're arguing for in here and the rest of
> > > > > > your message, but I think it's not about the same thing as your original
> > > > > > disagreement with Michael.
> > > > >
> > > > > Michael said don't use [out]"=m"(a):[in]"m"(a), use [out]"=m"(a):[in]"0"(a)
(Continue reading)

Trent Piepho | 8 Nov 2006 04:38

Re: Improved remove-logo filter

On Tue, 7 Nov 2006, Michael Niedermayer wrote:
> > It sounded to me like what Uoti found was an asm construct that modified a
> > variable through a pointer without telling gcc about it.  Something like:
> >
> > int x, *p = &x;
> > asm("movl $0, (%0)" : : "r"(p));
> >
> > Of course something like that will not work, because gcc doesn't know the
> > value of x has changed.  It needs "memory" on the clobber list, or better
> > yet, "=m"(x) as an output.
>
> "=m" needs an additional register in some cases (the ones where no
> unmodified register points to it or to a constant offset from it)

True, if "r"(p) was an output instead of an input gcc wouldn't be able to
use the same register.  In that case there is no good way to tell gcc you
are going to dereference a pointer, without either using extra register(s)
with "=m" or having gcc re-load everything with "memory".

> > > you really seems to be intentionally trying to sidestep the issue, its not
> > > what gcc does, it is what something means and what its guranteed to do, its
> > > a question of specification vs. implementation
> >
> > Suppose the docs are not some kind of specification that the inline asm
> > extension was written to follow, but the other way around.  That the inline
> > asm extension is just a way to access gcc's internal structures.  That the
> > docs are just an attempt to document how to make use of gcc's internal
> > workings.  Have you considered that?  Maybe future versions of gcc will not
> > have an asm extension that works differently, but different documentation?
>
(Continue reading)

Michael Niedermayer | 8 Nov 2006 13:27

Re: Improved remove-logo filter

Hi

On Tue, Nov 07, 2006 at 07:38:01PM -0800, Trent Piepho wrote:
[...]
> > > You have instead dismissed this as irrelevant, and claimed what is
> > > relevant is how you think future versions of gcc will work.  At that
> > > point it is no longer a discussion of facts, but one of opinions.
> >
> > you repeated many times how different gcc versions had difficulty with
> > specific asm constructs which where working with others, you yourself
> > claimed with that that gccs interpretation and implementation of asm
> > changes confirming what i said now it doesnt fit into your newly twisted
> > argumentation so you claim the opposit
> 
> My argument has always been that my claims about what asm constructs will
> work with past and present gcc versions and which will not work are
> correct.  Where you have disagreed with me about that, you have been wrong.
> When you said that one must use "=&m" to keep a memory operand from
> overlapping an input, you were wrong.

my argument has always been about the specification of asm in the docs
not the limitations of specific implementations
asm code in mplayer and ffmpeg must conform to the asm
specification in the docs if it does not it is rejected
you can twist what i said if you like, it won't help you. maybe some
people who didn't follow the thread carefully will be fooled by your
misrepresentations of what others said; if so that's sad, but it can't
be helped, i don't have the time to repeatedly reply and say the same thing

>  "=&m" won't even compile!  
(Continue reading)

Trent Piepho | 8 Nov 2006 22:13

Re: Improved remove-logo filter

On Wed, 8 Nov 2006, Michael Niedermayer wrote:
> On Tue, Nov 07, 2006 at 07:38:01PM -0800, Trent Piepho wrote:
> [...]
> > > > You have instead dismissed this as irrelevant, and claimed what is
> > > > relevant is how you think future versions of gcc will work.  At that
> > > > point it is no longer a discussion of facts, but one of opinions.
> > >
> > > you repeated many times how different gcc versions had difficulty with
> > > specific asm constructs which where working with others, you yourself
> > > claimed with that that gccs interpretation and implementation of asm
> > > changes confirming what i said now it doesnt fit into your newly twisted
> > > argumentation so you claim the opposit
> >
> > My argument has always been that my claims about what asm constructs will
> > work with past and present gcc versions and which will not work are
> > correct.  Where you have disagreed with me about that, you have been wrong.
> > When you said that one must use "=&m" to keep a memory operand from
> > overlapping an input, you were wrong.
>
> my argument has always been about the specification of asm in the docs
> not the limitations of specific implementations
> asm code in mplayer and ffmpeg must conform to the asm
> specification in the docs if it does not it is rejected

The gcc docs specifically say that "+" should not be used with memory
operands.  Yet lavc has code which uses "+m", in violation of the
specification in the docs.  So why is that ok?  Because it works, even
though the docs say not to do it?

You said, "if you want an output to not be able to use the same register or
(Continue reading)

Michael Niedermayer | 9 Nov 2006 03:43

Re: Improved remove-logo filter

Hi

On Wed, Nov 08, 2006 at 01:13:39PM -0800, Trent Piepho wrote:
> On Wed, 8 Nov 2006, Michael Niedermayer wrote:
> > On Tue, Nov 07, 2006 at 07:38:01PM -0800, Trent Piepho wrote:
> > [...]
> > > > > You have instead dismissed this as irrelevant, and claimed what is
> > > > > relevant is how you think future versions of gcc will work.  At that
> > > > > point it is no longer a discussion of facts, but one of opinions.
> > > >
> > > > you repeated many times how different gcc versions had difficulty with
> > > > specific asm constructs which where working with others, you yourself
> > > > claimed with that that gccs interpretation and implementation of asm
> > > > changes confirming what i said now it doesnt fit into your newly twisted
> > > > argumentation so you claim the opposit
> > >
> > > My argument has always been that my claims about what asm constructs will
> > > work with past and present gcc versions and which will not work are
> > > correct.  Where you have disagreed with me about that, you have been wrong.
> > > When you said that one must use "=&m" to keep a memory operand from
> > > overlapping an input, you were wrong.
> >
> > my argument has always been about the specification of asm in the docs
> > not the limitations of specific implementations
> > asm code in mplayer and ffmpeg must conform to the asm
> > specification in the docs if it does not it is rejected
> 
> The gcc docs specifically say that "+" should not be used with memory
> operands.  Yet lavc has code which uses "+m", 

(Continue reading)

Uoti Urpala | 5 Nov 2006 23:57

Re: Improved remove-logo filter

On Sun, 2006-11-05 at 13:37 -0800, Trent Piepho wrote:
> On Sun, 5 Nov 2006, Michael Niedermayer wrote:
> > and copying variables onto the stack can improve speed a lot, gcc svn does
> > this according to uoti, maybe violating your "don't copy" constraint?
>
> It sounded to me like what Uoti found was an asm construct that modified a
> variable through a pointer without telling gcc about it.  Something like:
>
> int x, *p = &x;
> asm("movl $0, (%0)" : : "r"(p));
> 
> Of course something like that will not work, because gcc doesn't know the
> value of x has changed.

The construct was similar to that, but there were no free registers in
which gcc could keep anything extra during the asm, and there was no
optimization gcc could do based on just "it must be the same as last
time". It failed because gcc made a copy of the variable *on the stack*.
For an example where this obviously makes sense consider a chain of
pointers like a->b->c->d->e->f used in several places in the code. The
final value of f is a common subexpression; even if it cannot fit in a
register it makes a lot more sense to make a copy of it on the stack
than to go through the whole chain every time the value is accessed.

>   It needs "memory" on the clobber list, or better
> yet, "=m"(x) as an output.

... except that if "=m" cannot point to an alias of the variable, your
"better" alternative is a bad idea since it requires allocating a
register to hold &x unless x is a stack variable or similar that can be
(Continue reading)

Trent Piepho | 6 Nov 2006 12:20

Re: Improved remove-logo filter

On Mon, 6 Nov 2006, Uoti Urpala wrote:
> On Sun, 2006-11-05 at 13:37 -0800, Trent Piepho wrote:
> > On Sun, 5 Nov 2006, Michael Niedermayer wrote:
> > > and copying variables onto the stack can improve speed a lot, gcc svn does
> > > this according to uoti, maybe violating your "don't copy" constraint?
> >
> > It sounded to me like what Uoti found was an asm construct that modified a
> > variable through a pointer without telling gcc about it.  Something like:
> >
> > int x, *p = &x;
> > asm("movl $0, (%0)" : : "r"(p));
> >
> > Of course something like that will not work, because gcc doesn't know the
> > value of x has changed.
>
> The construct was similar to that, but there were no free registers in
> which gcc could keep anything extra during the asm, and there was no
> optimization gcc could do based on just "it must be the same as last
> time". It failed because gcc made a copy of the variable *on the stack*.
> For an example where this obviously makes sense consider a chain of
> pointers like a->b->c->d->e->f used in several places in the code. The
> final value of f is a common subexpression; even if it cannot fit in a
> register it makes a lot more sense to make a copy of it on the stack
> than to go through the whole chain every time the value is accessed.
>
> >   It needs "memory" on the clobber list, or better
> > yet, "=m"(x) as an output.
>
> ... except that if "=m" cannot point to an alias of the variable, your
> "better" alternative is a bad idea since it requires allocating a
(Continue reading)

Uoti Urpala | 8 Nov 2006 04:22

Re: Improved remove-logo filter

On Mon, 2006-11-06 at 03:20 -0800, Trent Piepho wrote:
> On Mon, 6 Nov 2006, Uoti Urpala wrote:
> > >   It needs "memory" on the clobber list, or better
> > > yet, "=m"(x) as an output.
> >
> > ... except that if "=m" cannot point to an alias of the variable, your
> > "better" alternative is a bad idea since it requires allocating a
> > register to hold &x unless x is a stack variable or similar that can be
> > accessed without needing an explicit pointer.
> 
> You're wrong about that.  Try it.

About what? Depending on what x is, having "=m"(x) in the output list
does require an extra register; I have tested that. Probably best would
be for gcc to support memory variables in the clobber list, but
currently it doesn't.

> > I've already explained this exact thing TWICE. Are you retarted or what?
> > If you need side effects which don't directly affect the current thread
> > to happen then you must specify "volatile". This is true even for plain
> > C, without any asm whatsoever.
> 
> And I already explained to you that you are wrong about what volatile
> means.  It just means that all the loads and stores implied by the code
> must exist.  If you write this:

gcc cannot guarantee that all the loads and stores in asm exist if it
uses a copy of the variable. The asm can do multiple loads or stores and
there's no way they're all guaranteed to exist unless the asm gets the
real address. Of course gcc can specify whatever semantics for asm
(Continue reading)

Trent Piepho | 10 Nov 2006 02:43

Re: Improved remove-logo filter

On Wed, 8 Nov 2006, Uoti Urpala wrote:
> On Mon, 2006-11-06 at 03:20 -0800, Trent Piepho wrote:
> > On Mon, 6 Nov 2006, Uoti Urpala wrote:
> > > >   It needs "memory" on the clobber list, or better
> > > > yet, "=m"(x) as an output.
> > >
> > > ... except that if "=m" cannot point to an alias of the variable, your
> > > "better" alternative is a bad idea since it requires allocating a
> > > register to hold &x unless x is a stack variable or similar that can be
> > > accessed without needing an explicit pointer.
> >
> > You're wrong about that.  Try it.
>
> About what? Depending on what x is having "=m"(x) in the output list

About what you said, "it requires allocating a register to hold &x unless x
is a stack variable or similar."

I posted an example, which you edited out, of gcc using the same register
for "=m"(x) and "r"(&x) when x was not a stack variable.  You said this
could not be done, but I posted an example that proves that wrong.
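
A minimal sketch of the pattern being argued about here, assuming x86 and
GCC-style extended asm. The names store_five and global_x are made up for
illustration and do not come from the thread. The "=m"(*p) output only
declares that the asm writes *p; the instruction itself addresses memory
through the "r"(p) input. Whether gcc needs an extra register for the "=m"
operand likely depends on compiler version and optimization level, so the
way to settle it is to compile with -O2 -S and read the output.

```c
/* Illustrative only: store_five() and global_x are hypothetical names. */
int global_x;                       /* deliberately not a stack variable */

void store_five(int *p)
{
#if defined(__i386__) || defined(__x86_64__)
    /* %0 ("=m") is not used in the template; it only declares that *p is
     * written.  %1 ("r") holds the pointer the instruction actually uses. */
    __asm__ ("movl $5, (%1)"
             : "=m" (*p)
             : "r" (p));
#else
    *p = 5;                         /* plain C fallback on other targets */
#endif
}
```

Calling store_five(&global_x) and then reading global_x in C is safe because
the "=m" output tells gcc that the location was written.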

> does require an extra register; I have tested that. Probably best would
> be for gcc to support memory variables in the clobber list, but
> currently it doesn't.

If you've tested it, where is your example?  I posted two that support my
claim, you've posted none that support yours.  If you have "r"(&x) as an
input (not an input/output) and add "=m"(x) as an output, gcc will be able
to use the same register for both.  You might need to turn optimization on,
[...]

Uoti Urpala | 10 Nov 2006 05:15

Re: Improved remove-logo filter

On Thu, 2006-11-09 at 17:43 -0800, Trent Piepho wrote:
> On Wed, 8 Nov 2006, Uoti Urpala wrote:
> > > > register to hold &x unless x is a stack variable or similar that can be
> > > > accessed without needing an explicit pointer.
> > >
> > > You're wrong about that.  Try it.
> >
> > About what? Depending on what x is having "=m"(x) in the output list
> 
> About what you said, "it requires allocating a register to hold &x unless x
> is a stack variable or similar."

What I said was "or similar that can be accessed without needing an
explicit pointer".

> I posted an example, which you edited out, of gcc using the same register
> for "=m"(x) and "r"(&x) when x was not a stack variable.  You said this

I wasn't talking about "=m"(x) and "r"(&x). It was about a case where
asm was modifying x but the code failed because gcc was using a stack
copy of x instead of the original variable which was modified (x didn't
appear in the asm constraints at all in this case).

Btw the asm in cabac.h which has "r"(c) in input arguments still
compiles after adding "=m"(c->low) to the output arguments if you
disable inlining, but runs out of registers with inlining enabled...

> could not be done, but I posted an example that proves that wrong.

You proved that a claim you made up yourself was wrong. I was neither
[...]

Trent Piepho | 15 Nov 2006 02:42

Re: Improved remove-logo filter

On Fri, 10 Nov 2006, Uoti Urpala wrote:
> On Thu, 2006-11-09 at 17:43 -0800, Trent Piepho wrote:
> > On Wed, 8 Nov 2006, Uoti Urpala wrote:
> > > > > register to hold &x unless x is a stack variable or similar that can be
> > > > > accessed without needing an explicit pointer.
> > > >
> > > > You're wrong about that.  Try it.
> > >
> > > About what? Depending on what x is having "=m"(x) in the output list
> >
> > About what you said, "it requires allocating a register to hold &x unless x
> > is a stack variable or similar."
>
> What I said was "or similar that can be accessed without needing an
> explicit pointer".

Which is what I provided an example of.

>
> > I posted an example, which you edited out, of gcc using the same register
> > for "=m"(x) and "r"(&x) when x was not a stack variable.  You said this
>
> I wasn't talking about "=m"(x) and "r"(&x). It was about a case where
> asm was modifying x but the code failed because gcc was using a stack
> copy of x instead of the original variable which was modified (x didn't
> appear in the asm constraints at all in this case).

I think you need to go back and look at the message you replied to:

> Something like:
[...]

Rich Felker | 2 Nov 2006 20:36

Re: Improved remove-logo filter

On Thu, Nov 02, 2006 at 03:08:41AM +0100, Michael Niedermayer wrote:
> > There would be no way to write something like a
> > spin-lock.
> 
> iam no spinlock expert but i think you cannot write a spinlock in pure
> iso/ansi C,

The concept of spinlock makes no sense in iso/ansi C since there is no
concurrency in the language. :)

Rich
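
For readers who want something concrete: a spinlock needs an atomic
read-modify-write, which means leaving ISO C as it stood in 2006 and using
a compiler extension or asm. The sketch below assumes GCC's __sync builtins
(a GCC extension, not ISO C); the demo_spinlock type and the demo_*
function names are hypothetical. "volatile" is used only so the inner
polling loop re-reads the flag each time; it is not what makes the
acquire atomic.

```c
/* Sketch only: demo_spinlock and the demo_* functions are hypothetical. */
typedef struct {
    volatile int locked;            /* 0 = free, 1 = held */
} demo_spinlock;

/* Try once; returns 1 if the lock was acquired, 0 if it was already held. */
static inline int demo_spin_trylock(demo_spinlock *l)
{
    /* Atomically writes 1 and returns the previous value (acquire barrier). */
    return __sync_lock_test_and_set(&l->locked, 1) == 0;
}

static inline void demo_spin_lock(demo_spinlock *l)
{
    while (!demo_spin_trylock(l)) {
        /* Spin on ordinary volatile reads until the lock looks free. */
        while (l->locked)
            ;
    }
}

static inline void demo_spin_unlock(demo_spinlock *l)
{
    __sync_lock_release(&l->locked);   /* writes 0 with a release barrier */
}
```

Typical use is demo_spin_lock(&l); ...critical section...;
demo_spin_unlock(&l); with l initialized to { 0 }.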
Uoti Urpala | 2 Nov 2006 06:29

Re: Improved remove-logo filter

On Thu, 2006-11-02 at 03:08 +0100, Michael Niedermayer wrote:
> also iam not entirely sure but gcc-svn did break our code once
> due to missing "memory" on the clobber list and even though iam not
> completely sure why that happened, the most obvious reason why it failed
> would be that gcc did copy a variable onto the stack and used it from
> there (it could not have had a copy in a register as my asm code used
> all for other stuff)

I posted the reason on ffmpeg-dev earlier; it was that gcc did create a
copy of a variable (read-only in the function as seen by gcc) on the
stack and used that within the function, while the asm modified the
original location through a pointer.
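
A minimal sketch of that failure mode and the usual fix, assuming x86 and
GCC extended asm. This is not the ffmpeg code in question; the function
names are hypothetical. The asm writes through a pointer that appears only
as an "r" input, so nothing in the operand list says memory is modified.
The "memory" clobber is what tells gcc that any copy of the variable it is
holding may be stale.

```c
/* Hypothetical helper: bump_through_pointer() is not from the thread. */
static inline void bump_through_pointer(int *p)
{
#if defined(__i386__) || defined(__x86_64__)
    /* The asm writes *p, but no operand says so; the "memory" clobber
     * tells gcc that memory may have changed behind its back. */
    __asm__ volatile ("incl (%0)"
                      : /* no outputs */
                      : "r" (p)
                      : "memory", "cc");
#else
    (*p)++;                                   /* plain C fallback */
    __asm__ volatile ("" : : "r" (p) : "memory");
#endif
}

int count_twice(void)
{
    int x = 0;
    bump_through_pointer(&x);
    bump_through_pointer(&x);
    return x;   /* re-read from memory because of the clobber */
}
```

Dropping "memory" from the clobber list would leave gcc free to assume x
is still 0 at the return, which is the kind of breakage described above.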
Uoti Urpala | 1 Nov 2006 21:42

Asm parameter constraints (was: Improved remove-logo filter)

On Wed, 2006-11-01 at 11:47 -0800, Trent Piepho wrote:
> > > How could accumulator be in two different memory locations?
> >
> > an optimizing compiler can make a copy, for example it could copy it to
> > the stack, gcc may or may not be capable of that but that doesn't matter
> > for the validity of the code ...
> 
> It isn't allowed do that.  If it did, it would be impossible to write
> atomic operations.  There would be no way to write something like a
> spin-lock.

It is allowed to do that. There is a way to write a spinlock: use the
"volatile" keyword.

> Linus is saying allow "+m"(x) constraints by making them the same as
> "=m"(x):"m"(x).  He's not saying gcc should be able to copy variables in
> and out of a temporary memory location for asm constructs.  In fact, he
> specifically says something like "=m"(x) :  "0"(y) that would require two
> different variables to be in the same place in memory shouldn't be allowed.

So he says a construct that would *require* the variables to be in
different locations shouldn't be allowed. That has nothing whatsoever to
do with whether the compiler is *allowed* to keep copies in multiple
locations.
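
For reference, the two spellings being compared look like this. It is a
sketch assuming x86 and GCC extended asm; demo_counter and the bump_*
names are hypothetical. The first uses a single read-write operand, the
second names the same variable once as an output and once as an input,
which is the form whose guarantees are disputed in this thread.

```c
int demo_counter;   /* hypothetical variable for illustration */

/* Read-modify-write through one "+m" operand. */
void bump_plus(void)
{
#if defined(__i386__) || defined(__x86_64__)
    __asm__ volatile ("incl %0" : "+m" (demo_counter) : : "cc");
#else
    demo_counter++;                 /* plain C fallback */
#endif
}

/* Same variable listed separately as "=m" output and "m" input.
 * Only %0 appears in the template; %1 just declares the read. */
void bump_split(void)
{
#if defined(__i386__) || defined(__x86_64__)
    __asm__ volatile ("incl %0"
                      : "=m" (demo_counter)
                      : "m" (demo_counter)
                      : "cc");
#else
    demo_counter++;                 /* plain C fallback */
#endif
}
```

With a plain global like this both operands of the split form resolve to
the same symbol in practice; the open question in the thread is whether
gcc promises that in general.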
Trent Piepho | 4 Nov 2006 22:12

Re: Asm parameter constraints (was: Improved remove-logo filter)

On Fri, 3 Nov 2006, Uoti Urpala wrote:
> Michael's original comment which you disagreed with said that if you do
> use [out]"=m"(a):[in]"m"(a) then [out] and [in] could be the same memory
> location or different ones, and [out] might overlap some other input
> parameter. I don't see anything in your latest reply which would show
> that gcc guarantees the opposite will always be true.

You are saying that if you write [out]"=m"(a):[in]"m"(b) then out and in
could overlap?  This is what you mean when you say, "[out] might overlap
some other input parameter?"

If that is the case, what would you write so that out and in can't overlap?
Or do you think there is no way to keep [out]"=m"(a) from overlapping some
other input parameters, and it is thus impossible to have a "=m" parameter
that is used before all inputs are consumed?
Loren Merritt | 5 Nov 2006 08:46

Re: Asm parameter constraints (was: Improved remove-logo filter)

On Sat, 4 Nov 2006, Trent Piepho wrote:
> On Fri, 3 Nov 2006, Uoti Urpala wrote:
>
>> Michael's original comment which you disagreed with said that if you do
>> use [out]"=m"(a):[in]"m"(a) then [out] and [in] could be the same memory
>> location or different ones, and [out] might overlap some other input
>> parameter. I don't see anything in your latest reply which would show
>> that gcc guarantees the opposite will always be true.
>
> You are saying that if you write [out]"=m"(a):[in]"m"(b) then out and in
> could overlap?  This is what you mean when you say, "[out] might overlap
> some other input parameter?"
>
> If that is the case, what would you write so that out and in can't overlap?
> Or do you think there is no way to keep [out]"=m"(a) from overlapping some
> other input parameters, and it is thus impossible to have a "=m" parameter
> that is used before all inputs are consumed?

[out]"=&m"(a)

earlyclobber ("&") is specifically for that situation.

--Loren Merritt
Trent Piepho | 5 Nov 2006 21:00

Re: Asm parameter constraints (was: Improved remove-logo filter)

On Sun, 5 Nov 2006, Loren Merritt wrote:
> On Sat, 4 Nov 2006, Trent Piepho wrote:
> > On Fri, 3 Nov 2006, Uoti Urpala wrote:
> >
> >> Michael's original comment which you disagreed with said that if you do
> >> use [out]"=m"(a):[in]"m"(a) then [out] and [in] could be the same memory
> >> location or different ones, and [out] might overlap some other input
> >> parameter. I don't see anything in your latest reply which would show
> >> that gcc guarantees the opposite will always be true.
> >
> > You are saying that if you write [out]"=m"(a):[in]"m"(b) then out and in
> > could overlap?  This is what you mean when you say, "[out] might overlap
> > some other input parameter?"
> >
> > If that is the case, what would you write so that out and in can't overlap?
> > Or do you think there is no way to keep [out]"=m"(a) from overlapping some
> > other input parameters, and it is thus impossible to have a "=m" parameter
> > that is used before all inputs are consumed?
>
> [out]"=&m"(a)
>
> earlyclobber ("&") is specifically for that situation.

void foo(void) { int a, b; asm("# %0":"=&m"(a):"g"(b)); }

test.c: In function 'foo':
test.c:1: error: '&' constraint used with no register class
test.c:1: error: '&' constraint used with no register class
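
For contrast, "&" is accepted when the operand has a register class. The
sketch below assumes x86 and GCC extended asm; add_twice is a hypothetical
function, not code from the patch. The output is written by the first
instruction while input %2 is still needed by the later ones, which is the
situation earlyclobber is documented for.

```c
/* Hypothetical example: computes a + b + b. */
int add_twice(int a, int b)
{
    int out;
#if defined(__i386__) || defined(__x86_64__)
    /* out (%0) is written first, but b (%2) is read afterwards, so out
     * must not be allocated to the same register as an input: "=&r". */
    __asm__ ("movl %1, %0\n\t"
             "addl %2, %0\n\t"
             "addl %2, %0"
             : "=&r" (out)
             : "r" (a), "r" (b)
             : "cc");
#else
    out = a + b + b;                /* plain C fallback */
#endif
    return out;
}
```

Without the "&", gcc would be free to put out and b in the same register,
and the first movl would destroy b before it is added.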

My question was merely rhetorical, as I already knew (unlike those trying
[...]

Guillaume POIRIER | 1 Nov 2006 14:48

Re: Improved remove-logo filter

Hi,

On 11/1/06, Michael Niedermayer <michaelni <at> gmx.at> wrote:

[..]

> an optimizing compiler can make a copy, for example it could copy it to
> the stack, gcc may or may not be capable of that but that doesn't matter
> for the validity of the code ...
> also note that we had very significant speed gains from manually copying
> the cabac stuff in h.264 onto the stack so this is not a silly choice
> and the question why use one as input and one as output?
> well one might be somewhere (not the stack) but hold the up-to-date value
> and one might be on the stack and be the one used by later code

BTW, what's the part of CABAC routines that uses the stack extensively?
I'd like to see if I can write an AMD64 version that uses the
registers rather than using the stack, hoping that it's faster (it
should, though you never know with recent x86 CPUs)

Guillaume
-- 
With DADVSI (http://en.wikipedia.org/wiki/DADVSI), France finally has
a lead on USA on selling out individuals right to corporations!
Vive la France!
Michael Niedermayer | 1 Nov 2006 14:58

Re: Improved remove-logo filter

Hi

On Wed, Nov 01, 2006 at 02:48:48PM +0100, Guillaume POIRIER wrote:
> Hi,
> 
> On 11/1/06, Michael Niedermayer <michaelni <at> gmx.at> wrote:
> 
> [..]
> 
> >an opimizing compiler can make a copy, for example it could copy it to
> >the stack, gcc may or may not be capable of that but that doesnt matter
> >for the validity of the code ...
> >also note that we had very significant speed gains from manually copying
> >the cabac stuff in h.264 onto the stack so this is not a silly choice
> >and the question why use one as input and one as output?
> >well one might be somewhere (not the stack) but hold the uptodate value
> >and one miht be on the stack and be the one used by later code
> 
> BTW, what's the part of CABAC routines that uses the stack extensively?
> I'd like to see if I can write an AMD64 version that uses the
> registers rather than using the stack, hoping that it's faster (it
> should, though you never know with recent x86 CPUs)

decode_significance_*x86() probably would benefit most from more registers

[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

In the past you could go to a library and read, borrow or copy any book
[...]

