Yuras Shumovich | 9 Dec 00:12 2012

How to use C-land variable from Cmm-land?

Hi,

I'm working on this issue as an exercise/playground while studying the
GHC internals: http://hackage.haskell.org/trac/ghc/ticket/693

First I tried just replacing "ccall lockClosure(mvar "ptr")" with
GET_INFO(mvar) in stg_takeMVarzh and stg_putMVarzh and got a 60% speedup
(see the test case at the end).

Then I changed lockClosure to read the header info directly when
enabled_capabilities == 1. The speedup was significantly lower: under 20%.

I tried to hack stg_putMVarzh directly:

    if (enabled_capabilities == 1) {
        info = GET_INFO(mvar);
    } else {
        ("ptr" info) = ccall lockClosure(mvar "ptr");
    }

But I got no speedup at all.
The generated asm (amd64):

        movl $enabled_capabilities,%eax
        cmpq $1,%rax
        je .Lcgq
.Lcgp:
        movq %rbx,%rdi
        subq $8,%rsp
        movl $0,%eax
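        call lockClosure
        addq $8,%rsp
.Lcgr:
        cmpq $stg_MVAR_CLEAN_info,%rax
        jne .Lcgu
{...}
.Lcgq:
        movq (%rbx),%rax
        jmp .Lcgr

It moves enabled_capabilities into %eax and then compares 1 with %rax.
It looks wrong to me: the upper half of %rax remains uninitialized.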

Axel Simon | 9 Dec 09:07 2012

Re: How to use C-land variable from Cmm-land?


On 09.12.2012, at 00:12, Yuras Shumovich <shumovichy <at> gmail.com> wrote:

> It looks wrong to me: the upper half of %rax remains uninitialized.

When 32 bits are assigned to any of the standard registers, the upper 32 bits are implicitly set to zero.
Intel is weird.

Axel
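
The zero-extension rule can be checked from C with a short inline-assembly
sketch. It assumes GCC or Clang on x86-64, where the %k0 operand modifier
selects the 32-bit name of a register. On any other target the helper falls
back to a plain widening conversion, which only models the result. The names
write_low32, upper_half_is_cleared and check_zero_extension are invented for
illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: overwrite only the low 32 bits of a 64-bit register
 * that currently holds `full`, then return the whole 64-bit register. */
static uint64_t write_low32(uint64_t full, uint32_t low) {
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
    /* movl writes the 32-bit sub-register; the CPU clears bits 63..32. */
    __asm__("movl %1, %k0" : "+r"(full) : "r"(low));
    return full;
#else
    /* Assumption: other targets are not covered, so model the result. */
    (void)full;
    return (uint64_t)low;
#endif
}

/* Returns 1 when no stale upper bits survive the 32-bit write. */
static int upper_half_is_cleared(void) {
    return write_low32(UINT64_C(0xFFFFFFFFFFFFFFFF), 1u) == UINT64_C(1);
}

/* Aborts if the upper half was not cleared. */
static void check_zero_extension(void) {
    assert(upper_half_is_cleared());
}
```

So a movl into %eax followed by a cmpq against %rax is well defined: the
comparison sees the 32-bit value zero-extended to 64 bits.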
Kim-Ee Yeoh | 9 Dec 09:22 2012

Re: How to use C-land variable from Cmm-land?

> When 32 bits are assigned to any of the standard registers, the upper 32 bits are implicitly set to zero. Intel is weird.


Didn't AMD invent the 64-bit extensions?

-- Kim-Ee





_______________________________________________
Glasgow-haskell-users mailing list
Glasgow-haskell-users <at> haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users

Simon Marlow | 10 Dec 11:58 2012

Re: How to use C-land variable from Cmm-land?

On 08/12/12 23:12, Yuras Shumovich wrote:
> Hi,
>
> I'm working on this issue as an exercise/playground while studying the
> GHC internals: http://hackage.haskell.org/trac/ghc/ticket/693

It's not at all clear that we want to do this.  Perhaps you'll be able 
to put the question to rest and close the ticket!

> First I tried just to replace "ccall lockClosure(mvar "ptr")" with
> GET_INFO(mvar) in stg_takeMVarzh and stg_putMVarzh and got 60% speedup
> (see the test case at the end.)
>
> Then I changed lockClosure to read header info directly when
> enabled_capabilities == 1. The speedup was significantly lower, <20%
>
> I tried to hack stg_putMVarzh directly:
>
>      if (enabled_capabilities == 1) {
>          info = GET_INFO(mvar);
>      } else {
>          ("ptr" info) = ccall lockClosure(mvar "ptr");
>      }

You should use n_capabilities, not enabled_capabilities.  The latter 
might be 1, even when there are multiple capabilities actually in use, 
while the RTS is in the process of migrating threads.

> But got no speedup at all.
> The generated asm (amd64):
>
>          movl $enabled_capabilities,%eax
>          cmpq $1,%rax
>          je .Lcgq
> .Lcgp:
>          movq %rbx,%rdi
>          subq $8,%rsp
>          movl $0,%eax
>          call lockClosure
>          addq $8,%rsp
> .Lcgr:
>          cmpq $stg_MVAR_CLEAN_info,%rax
>          jne .Lcgu
> {...}
> .Lcgq:
>          movq (%rbx),%rax
>          jmp .Lcgr
>
>
> It moves enabled_capabilities into %eax and then compares 1 with %rax.
> It looks wrong to me: the upper half of %rax remains uninitialized.

As Axel noted, this is correct.

Cheers,
	Simon
Yuras Shumovich | 10 Dec 13:46 2012

Re: How to use C-land variable from Cmm-land?

On Mon, 2012-12-10 at 10:58 +0000, Simon Marlow wrote:
> On 08/12/12 23:12, Yuras Shumovich wrote:
> > I tried to hack stg_putMVarzh directly:
> >
> >      if (enabled_capabilities == 1) {
> >          info = GET_INFO(mvar);
> >      } else {
> >          ("ptr" info) = ccall lockClosure(mvar "ptr");
> >      }
> 
> You should use n_capabilities, not enabled_capabilities.  The latter 
> might be 1, even when there are multiple capabilities actually in use, 
> while the RTS is in the process of migrating threads.

Could you please elaborate? setNumCapabilities is guarded with
acquireAllCapabilities, so all threads are in the scheduler. And threads
will be migrated from disabled capabilities before they get a chance to
put/take an MVar.
I changed my mind about enabled_capabilities/n_capabilities a number of
times during the weekend. Most likely you are right, and I should use
n_capabilities. But I'd appreciate it if you could find time to explain it
to me.

> 
> > But got no speedup at all.
> > The generated asm (amd64):
> >
> >          movl $enabled_capabilities,%eax
> >          cmpq $1,%rax
> >          je .Lcgq
> > .Lcgp:
> >          movq %rbx,%rdi
> >          subq $8,%rsp
> >          movl $0,%eax
> >          call lockClosure
> >          addq $8,%rsp
> > .Lcgr:
> >          cmpq $stg_MVAR_CLEAN_info,%rax
> >          jne .Lcgu
> > {...}
> > .Lcgq:
> >          movq (%rbx),%rax
> >          jmp .Lcgr
> >
> >
> > It moves enabled_capabilities into %eax and then compares 1 with %rax.
> > It looks wrong to me: the upper half of %rax remains uninitialized.
> 
> As Axel noted, this is correct.

The problem was that "movl $enabled_capabilities,%eax" loaded the
address of enabled_capabilities, not its value. (Again, why does it use
a 32-bit register? The value is 32-bit on Linux, but the address is 64-bit,
isn't it?) So the correct way to use a C-land variable is:

if (CInt[enabled_capabilities] == 1) {...}

Not very intuitive, but at least it works.

Thanks,
Yuras
Simon Marlow | 11 Dec 09:43 2012

Re: How to use C-land variable from Cmm-land?

On 10/12/12 12:46, Yuras Shumovich wrote:
> On Mon, 2012-12-10 at 10:58 +0000, Simon Marlow wrote:
>> On 08/12/12 23:12, Yuras Shumovich wrote:
>>> I tried to hack stg_putMVarzh directly:
>>>
>>>       if (enabled_capabilities == 1) {
>>>           info = GET_INFO(mvar);
>>>       } else {
>>>           ("ptr" info) = ccall lockClosure(mvar "ptr");
>>>       }
>>
>> You should use n_capabilities, not enabled_capabilities.  The latter
>> might be 1, even when there are multiple capabilities actually in use,
>> while the RTS is in the process of migrating threads.
>
> Could you please elaborate? setNumCapabilities is guarded with
> acquireAllCapabilities, so all threads are in the scheduler. And threads
> will be migrated from disabled capabilities before they get a chance to
> put/take an MVar.
> I changed my mind about enabled_capabilities/n_capabilities a number of
> times during the weekend. Most likely you are right, and I should use
> n_capabilities. But I'd appreciate it if you could find time to explain it
> to me.

n_capabilities is the actual number of capabilities, and can only 
increase, never decrease.  enabled_capabilities is the number of 
capabilities we are currently aiming to use, which might be less than 
n_capabilities.  If enabled_capabilities is less than n_capabilities, 
there might still be activity on the other capabilities, but the idea is 
that threads get migrated away from the inactive capabilities as quickly 
as possible.  It's hard to do this immediately, which is why we have 
enabled_capabilities and we don't just change n_capabilities.
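
The relationship between the two counters can be sketched as a toy model in
C. This is not the RTS implementation: the CapCounts struct and both
functions are hypothetical, and the sketch assumes the requested count is at
least 1. It only encodes the two properties stated above: n_capabilities
never decreases, and enabled_capabilities may be lower than it.

```c
#include <assert.h>

/* Hypothetical model of the two counters described above. */
typedef struct {
    unsigned n_capabilities;       /* capabilities that exist; never shrinks */
    unsigned enabled_capabilities; /* capabilities we currently aim to use   */
} CapCounts;

/* Toy stand-in for changing the target count (assumes wanted >= 1). */
static void set_num_capabilities(CapCounts *c, unsigned wanted) {
    assert(wanted >= 1);
    if (wanted > c->n_capabilities) {
        c->n_capabilities = wanted;   /* grow, but never shrink */
    }
    c->enabled_capabilities = wanted; /* the target can go up or down */
}

/* The lock may only be skipped when a single capability has ever existed. */
static int can_skip_lock(const CapCounts *c) {
    return c->n_capabilities == 1;
}
```

After growing to 4 and shrinking the target back to 1, the model has
enabled_capabilities == 1 while n_capabilities == 4, so a test on
enabled_capabilities would skip the lock even though other capabilities may
still be active.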

> The problem was that "movl $enabled_capabilities,%eax" loaded the
> address of enabled_capabilities, not its value.

Yes, sorry, you are right.

> (Again, why does it use
> a 32-bit register? The value is 32-bit on Linux, but the address is 64-bit,
> isn't it?) So the correct way to use a C-land variable is:
>
> if (CInt[enabled_capabilities] == 1) {...}
>
> Not very intuitive, but at least it works.

That's C-- syntax for a memory load of a CInt value (CInt is a CPP 
symbol that expands to a real C-- type, like bits32).  Unlike in C, 
memory loads are explicit in C--.

Cheers,
	Simon
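
The address-versus-value distinction discussed in this thread can be shown
with a C analogy. This is a sketch, not RTS code: the variable below is a
local stand-in for the real global, and all function names are invented. In
C-- a bare symbol name denotes its address, which in C corresponds to
writing &enabled_capabilities; the CInt[...] load corresponds to
dereferencing that address.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the RTS global; the real one lives in the GHC runtime. */
static unsigned int enabled_capabilities = 1;

/* What the bare name means in C--: the address of the symbol. */
static uintptr_t symbol_address(void) {
    return (uintptr_t)&enabled_capabilities;
}

/* What CInt[enabled_capabilities] means: a 32-bit load from that address. */
static uint32_t load_cint(const unsigned int *addr) {
    return (uint32_t)*addr;
}

/* Broken check: compares the address with 1, which never matches. */
static int single_capability_broken(void) {
    return symbol_address() == 1;
}

/* Fixed check: compares the loaded value with 1. */
static int single_capability_fixed(void) {
    return load_cint(&enabled_capabilities) == 1;
}
```

The broken variant mirrors the original "enabled_capabilities == 1" test,
which always took the lockClosure branch and therefore showed no speedup.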
