Nicolas Trangez | 7 Jul 2012 21:13

vector-simd: some code available, and some questions

All, 

After my message of yesterday [1] I got down to it and implemented
something along those lines. I created a playground repository
containing the code at [2]. Initial benchmark results at [3]. More about
the benchmark at the end of this email.

First some questions and requests for help:

- I'm stuck with a typing issue related to 'sizeOf' calculation at [4].
I tried a couple of things, but wasn't able to figure out how to fix it.
- I'm using unsafePerformIO at [5], yet I'm not certain it's OK to do
so. Are there better (safer/performant/...) ways to get this working?
- Currently Alignment phantom types (e.g. A8 and A16) are not related to
each other: a function (like Data.Vector.SIMD.Algorithms.unsafeXorSSE42)
can have this signature:

unsafeXorSSE42 :: Storable a => SV.Vector SV.A16 a -> SV.Vector SV.A16 a
-> SV.Vector SV.A16 a

Yet, imaging I'd have an "SV.Vector SV.A32 Word8" vector at hand, the
function should accept it as well (a 32-byte aligned vector is also
16-byte aligned). Is there any way to encode this at the type level?

That's about it :-)

As of now, I only implemented a couple of the vector API functions (the
ones required to execute my benchmark). Adding the others should be
trivial.

(Continue reading)

Gábor Lehel | 7 Jul 2012 21:59
Picon

Re: vector-simd: some code available, and some questions

On Sat, Jul 7, 2012 at 9:13 PM, Nicolas Trangez <nicolas <at> incubaid.com> wrote:
> - Currently Alignment phantom types (e.g. A8 and A16) are not related to
> each other: a function (like Data.Vector.SIMD.Algorithms.unsafeXorSSE42)
> can have this signature:
>
> unsafeXorSSE42 :: Storable a => SV.Vector SV.A16 a -> SV.Vector SV.A16 a
> -> SV.Vector SV.A16 a
>
> Yet, imaging I'd have an "SV.Vector SV.A32 Word8" vector at hand, the
> function should accept it as well (a 32-byte aligned vector is also
> 16-byte aligned). Is there any way to encode this at the type level?

If the set of alignments you care about is finite, you could do:

class AlignedToAtLeast n a
instance AlignedToAtLeast A1 A1
instance AlignedToAtLeast A4 A1
instance AlignedToAtLeast A4 A4
instance AlignedToAtLeast A8 A1
instance AlignedToAtLeast A8 A4
instance AlignedToAtLeast A8 A8
instance AlignedToAtLeast A16 A1
instance AlignedToAtLeast A16 A4
instance AlignedToAtLeast A16 A8
instance AlignedToAtLeast A16 A16
instance AlignedToAtLeast A32 A1
instance AlignedToAtLeast A32 A4
instance AlignedToAtLeast A32 A8
instance AlignedToAtLeast A32 A16
instance AlignedToAtLeast A32 A32
(Continue reading)

Gábor Lehel | 7 Jul 2012 22:16
Picon

Re: vector-simd: some code available, and some questions

On Sat, Jul 7, 2012 at 9:59 PM, Gábor Lehel <illissius <at> gmail.com> wrote:
> class AlignedToAtLeast n a
> instance AlignedToAtLeast A1 A1
> instance AlignedToAtLeast A4 A1
> instance AlignedToAtLeast A4 A4
> instance AlignedToAtLeast A8 A1
> instance AlignedToAtLeast A8 A4
> instance AlignedToAtLeast A8 A8
> instance AlignedToAtLeast A16 A1
> instance AlignedToAtLeast A16 A4
> instance AlignedToAtLeast A16 A8
> instance AlignedToAtLeast A16 A16
> instance AlignedToAtLeast A32 A1
> instance AlignedToAtLeast A32 A4
> instance AlignedToAtLeast A32 A8
> instance AlignedToAtLeast A32 A16
> instance AlignedToAtLeast A32 A32

Oh dang it. Sorry. All of these should be the other way around.
Proofreading doesn't help if it's your brain that's fried.

--

-- 
Your ship was caught in a monadic eruption.
Nicolas Trangez | 8 Jul 2012 00:21

Re: vector-simd: some code available, and some questions

On Sat, 2012-07-07 at 21:59 +0200, Gábor Lehel wrote:
> An alternative solution is to encode all of the alignments in unary,
> which is more general; if they're all going to be a power of two you
> can "store" just the logarithm:
> 
> data One
> data Twice n -- not practical to call it Double :)
> 
> class AlignedToAtLeast n a
> instance AlignedToAtLeast One One
> instance AlignedToAtLeast One (Twice a)
> instance AlignedToAtLeast n a => AlignedToAtLeast (Twice n) (Twice a)
> 
> type A1 = One
> type A4 = Twice (Twice A1)
> type A8 = Twice A4
> type A16 = Twice A8
> type A32 = Twice A16
> 
> and you can apply the same private class thing from above if you
> want. 

Very ingenious, thanks! I pushed this into [1], although export lists of
all modules most likely will need some love once things get into shape.

This also allows functions to become more general:

unsafeXorSSE42 :: (Storable a,
    SV.AlignedToAtLeast SV.A16 o1, SV.Alignment o1,
    SV.AlignedToAtLeast SV.A16 o2, SV.Alignment o2,
(Continue reading)

Gábor Lehel | 8 Jul 2012 01:40
Picon

Re: vector-simd: some code available, and some questions

On Sun, Jul 8, 2012 at 12:21 AM, Nicolas Trangez <nicolas <at> incubaid.com> wrote:
> On Sat, 2012-07-07 at 21:59 +0200, Gábor Lehel wrote:
>> An alternative solution is to encode all of the alignments in unary,
>> which is more general; if they're all going to be a power of two you
>> can "store" just the logarithm:
>>
>> data One
>> data Twice n -- not practical to call it Double :)
>>
>> class AlignedToAtLeast n a
>> instance AlignedToAtLeast One One
>> instance AlignedToAtLeast One (Twice a)
>> instance AlignedToAtLeast n a => AlignedToAtLeast (Twice n) (Twice a)
>>
>> type A1 = One
>> type A4 = Twice (Twice A1)
>> type A8 = Twice A4
>> type A16 = Twice A8
>> type A32 = Twice A16
>>
>> and you can apply the same private class thing from above if you
>> want.
>
> Very ingenious, thanks! I pushed this into [1], although export lists of
> all modules most likely will need some love once things get into shape.
>
> This also allows functions to become more general:
>
> unsafeXorSSE42 :: (Storable a,
>     SV.AlignedToAtLeast SV.A16 o1, SV.Alignment o1,
(Continue reading)

Nicolas Trangez | 8 Jul 2012 03:05

Re: vector-simd: some code available, and some questions

On Sun, 2012-07-08 at 01:40 +0200, Gábor Lehel wrote:
>  unsafeXorSSE42 :: (Storable a,
> >     SV.AlignedToAtLeast SV.A16 o1, SV.Alignment o1,
> >     SV.AlignedToAtLeast SV.A16 o2, SV.Alignment o2,
> >     SV.AlignedToAtLeast SV.A16 o3, SV.Alignment o3) =>
> >     SV.Vector o1 a -> SV.Vector o2 a -> SV.Vector o3 a
> 
> I wonder if you could get that a bit shorter... I suppose you could write:
> 
> instance (AlignedToAtLeast n a, AlignedToAtLeast n b) =>
> AlignedToAtLeast n (a, b)
> instance (AlignedToAtLeast n a, AlignedToAtLeast n b, AlignedToAtLeast
> n c) => AlignedToAtLeast n (a, b, c)
> ...and so on...

Once again, nifty! And implemented in [1].

> though it feels a little strange semantically (relating a tuple to a
> scalar), but I don't see what harm can come of it. And then you can
> just write SV.AlignedToAtLeast SV.A16 (o1, o2, o3) in signatures.

> You
> can also make (Alignment n, Alignment a) a superclass constraint of
> AlignedToAtLeast, and write instances for Alignment inductively on One
> and Twice, and then you don't have to write Alignment o1 etc.
> separately either. So the signature would be just:
> 
> unsafeXorSSE42 :: (Storable a, SV.AlignedToAtLeast SV.A16 (o1, o2,
> o3)) =>      SV.Vector o1 a -> SV.Vector o2 a -> SV.Vector o3 a
> 
(Continue reading)

Gábor Lehel | 8 Jul 2012 10:27
Picon

Re: vector-simd: some code available, and some questions

On Sun, Jul 8, 2012 at 3:05 AM, Nicolas Trangez <nicolas <at> incubaid.com> wrote:
> On Sun, 2012-07-08 at 01:40 +0200, Gábor Lehel wrote:
>>  unsafeXorSSE42 :: (Storable a,
>> >     SV.AlignedToAtLeast SV.A16 o1, SV.Alignment o1,
>> >     SV.AlignedToAtLeast SV.A16 o2, SV.Alignment o2,
>> >     SV.AlignedToAtLeast SV.A16 o3, SV.Alignment o3) =>
>> >     SV.Vector o1 a -> SV.Vector o2 a -> SV.Vector o3 a
>>
>> I wonder if you could get that a bit shorter... I suppose you could write:
>>
>> instance (AlignedToAtLeast n a, AlignedToAtLeast n b) =>
>> AlignedToAtLeast n (a, b)
>> instance (AlignedToAtLeast n a, AlignedToAtLeast n b, AlignedToAtLeast
>> n c) => AlignedToAtLeast n (a, b, c)
>> ...and so on...
>
> Once again, nifty! And implemented in [1].
>
>> though it feels a little strange semantically (relating a tuple to a
>> scalar), but I don't see what harm can come of it. And then you can
>> just write SV.AlignedToAtLeast SV.A16 (o1, o2, o3) in signatures.
>
>> You
>> can also make (Alignment n, Alignment a) a superclass constraint of
>> AlignedToAtLeast, and write instances for Alignment inductively on One
>> and Twice, and then you don't have to write Alignment o1 etc.
>> separately either. So the signature would be just:
>>
>> unsafeXorSSE42 :: (Storable a, SV.AlignedToAtLeast SV.A16 (o1, o2,
>> o3)) =>      SV.Vector o1 a -> SV.Vector o2 a -> SV.Vector o3 a
(Continue reading)

Nicolas Trangez | 8 Jul 2012 17:09

Re: vector-simd: some code available, and some questions

On Sun, 2012-07-08 at 10:27 +0200, Gábor Lehel wrote:
> On Sun, Jul 8, 2012 at 3:05 AM, Nicolas Trangez <nicolas <at> incubaid.com> wrote:
> > I implemented the inductive alignment calculation over One and Twice
> > (good idea, and easy to do), but I don't get the thing about
> > superclasses. I've been trying several approaches (including definitions
> > based on forall and other trickery I never used before), but didn't get
> > things to work, at least: the compiler always said I'd need
> > UndecidableInstances, and that sounds scary... Care to elaborate?
> 
> All I meant was
> 
> class (Alignment n, Alignment a) => AlignedToAtLeast n a
> 
> but I got a bit ahead of myself, because that rules out the instance
> on tuples.

Maybe that's one of the things I ran into, can't remember OTOH.

>  (I suppose you *could* write some kind of Alignment
> instance for them, taking their minimum or something, but that's
> getting a bit too subversive for me).

Heh, it already feels evil-but-in-a-good-and-powerful-way to me right
now ;-) Thanks for all your input!

>  The alternative, if you want
> both Alignment as a superclass and the ability to constrain multiple
> types at once, is to have the above, remove the instance on tuples,
> and instead something like:
> 
(Continue reading)

Nicolas Trangez | 8 Jul 2012 02:22

Re: vector-simd: some code available, and some questions

On Sat, 2012-07-07 at 21:13 +0200, Nicolas Trangez wrote:
> As you can see, the zipWith Data.Vector.SIMD implementation is slightly
> slower than the Data.Vector.Storable based one. I didn't perform much
> profiling yet, but I suspect allocation and ForeignPtr creation is to
> blame, this seems to be highly optimized in
> GHC.ForeignPtr.mallocPlainForeignPtrBytes as used by
> Data.Vector.Storable.

I got the MV benchmark on-par with SV by reworking the allocation
mechanism: no more FFI involved, but based on
GHC.Exts.newAlignedPinnedByteArray# and some other trickery, see [1].
This could still be improved a little by using PlainPtr, but this is not
exported by GHC.ForeignPtr.

This did have a pretty big performance-impact on the SIMD-based
benchmark, compare [2] to the old one [3]. I have no clue why the 4096
case now only uses twice the time of the 1024 one, unlike the expected
4x (+- as before).

Nicolas

[1]
https://github.com/NicolasT/vector-simd/commit/5ec539167254435ef4e7d308706dcafae09504d2
[2] http://linode2.nicolast.be/files/vector-simd-xor2.html
[3] http://linode2.nicolast.be/files/vector-simd-xor1.html
Reiner Pope | 8 Jul 2012 12:49
Picon
Gravatar

Re: vector-simd: some code available, and some questions

I've not been following this thread very closely, but it seems like what you're trying to do may be related to Geoffrey Mainland's work on SIMD support in GHC. See [1] for his "SIMD-enabled version of the vector library". He's also written some blog posts about this [2].


Reiner

[2] http://ghc-simd.blogspot.com.au/

On 8 July 2012 05:13, Nicolas Trangez <nicolas <at> incubaid.com> wrote:
All,

After my message of yesterday [1] I got down to it and implemented
something along those lines. I created a playground repository
containing the code at [2]. Initial benchmark results at [3]. More about
the benchmark at the end of this email.

First some questions and requests for help:

- I'm stuck with a typing issue related to 'sizeOf' calculation at [4].
I tried a couple of things, but wasn't able to figure out how to fix it.
- I'm using unsafePerformIO at [5], yet I'm not certain it's OK to do
so. Are there better (safer/performant/...) ways to get this working?
- Currently Alignment phantom types (e.g. A8 and A16) are not related to
each other: a function (like Data.Vector.SIMD.Algorithms.unsafeXorSSE42)
can have this signature:

unsafeXorSSE42 :: Storable a => SV.Vector SV.A16 a -> SV.Vector SV.A16 a
-> SV.Vector SV.A16 a

Yet, imaging I'd have an "SV.Vector SV.A32 Word8" vector at hand, the
function should accept it as well (a 32-byte aligned vector is also
16-byte aligned). Is there any way to encode this at the type level?

That's about it :-)

As of now, I only implemented a couple of the vector API functions (the
ones required to execute my benchmark). Adding the others should be
trivial.

The benchmark works with Data.Vector.{Unboxed|Storable}.Vector (UV and
SV) vectors of Word8 values, as well as my custom
Data.Vector.SIMD.Vector type (MV) using 16-byte alignment (MV.Vector
MV.A16 Word8).

benchUV, benchSV and benchMV all take 2 pre-calculated Word8 vectors of
given size (1024 and 4096) and xor them pairwise into the result using
"zipWith xor". benchMVA takes 2 suitable MV vectors and xor's them into
a third using a rather simple and unoptimized C implementation using
SSE4.2 intrinsics [6]. This could be enhanced quite a bit (I guess using
the prim calling convention, FFI overhead can be reduced as well).
Currently, only vectors of a multiple of 32 bytes are supported (mostly
because of laziness on my part).

As you can see, the zipWith Data.Vector.SIMD implementation is slightly
slower than the Data.Vector.Storable based one. I didn't perform much
profiling yet, but I suspect allocation and ForeignPtr creation is to
blame, this seems to be highly optimized in
GHC.ForeignPtr.mallocPlainForeignPtrBytes as used by
Data.Vector.Storable.

Thanks for any input,

Nicolas

[1] http://www.haskell.org/pipermail/haskell-cafe/2012-July/102167.html
[2] https://github.com/NicolasT/vector-simd/
[3] http://linode2.nicolast.be/files/vector-simd-xor1.html
[4]
https://github.com/NicolasT/vector-simd/blob/master/src/Data/Vector/SIMD/Algorithms.hs#L46
[5]
https://github.com/NicolasT/vector-simd/blob/master/src/Data/Vector/SIMD/Algorithms.hs#L43
[6]
https://github.com/NicolasT/vector-simd/blob/master/cbits/vector-simd.c#L47


_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe <at> haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe <at> haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe
Nicolas Trangez | 8 Jul 2012 15:43

Re: vector-simd: some code available, and some questions

On Sun, 2012-07-08 at 20:49 +1000, Reiner Pope wrote:
> I've not been following this thread very closely, but it seems like
> what you're trying to do may be related to Geoffrey Mainland's work on
> SIMD support in GHC. See [1] for his "SIMD-enabled version of the
> vector library". He's also written some blog posts about this [2].

Thanks. I knew about this before (did at least some research ;-)), yet I
think the scope is somewhat different (assuming my playground work has
any scope at all...).

I think Geoffreys work goes much further than what I'm trying to
accomplish: it's not my intention to write SIMD-based code at a high
level (which is, obviously, a great thing, but beyond my capabilities as
of now I'm afraid). What I'd like to achieve is to be able to write SIMD
code at the C level (or have SIMD code generated somehow through
LLVM/Clang(/Poly?), ISPC or others), and call into this from Haskell,
providing some guarantees w.r.t. alignment (and maybe length-in-bytes as
well?) of the passed vector pointers (which you don't get when using
plain Storable vectors as far as I could find/see).

Consider it a very short-term stop-gap solution until the tools are in
place to write high-level code which yields the same performance as a
more low-level (assembly-level) implementation.

Nicolas

Gmane