Clark Gaebel | 3 Dec 07:07 2012
Picon
Picon

Naive matrix multiplication with Accelerate

Hello cafe,

I've recently started learning about cuda and hetrogenous programming, and have been using accelerate [1] to help me out. Right now, I'm running into trouble in that I can't call parallel code from sequential code. Turns out GPUs aren't exactly like Repa =P.

Here's what I have so far:

import qualified Data.Array.Accelerate as A
import Data.Array.Accelerate ( (:.)(..)
                             , Acc
                             , Vector
                             , Scalar
                             , Elt
                             , fold
                             , slice
                             , constant
                             , Array
                             , Z(..), DIM1, DIM2
                             , fromList
                             , All(..)
                             , generate
                             , lift, unlift
                             , shape
                             )
import Data.Array.Accelerate.Interpreter ( run )

dotP :: (Num a, Elt a) => Acc (Vector a) -> Acc (Vector a) -> Acc (Scalar a)
dotP xs ys = fold (+) 0 $ A.zipWith (*) xs ys

type Matrix a = Array DIM2 a

getRow :: Elt a => Int -> Acc (Matrix a) -> Acc (Vector a)
getRow n mat = slice mat . constant $ Z :. n :. All

-- Naive matrix multiplication:
--
-- index (i, j) is equal to the ith row of 'a' `dot` the jth row of 'b'
matMul :: A.Acc (Matrix Double) -> A.Acc (Matrix Double) -> A.Acc (Matrix Double)
matMul a b' = A.generate (constant $ Z :. nrows :. ncols) $
                \ix ->
                  let (Z :. i :. j) = unlift ix
                   in getRow i a `dotP` getRow j b
    where
        b = A.transpose b' -- I assume row indexing is faster than column indexing...
        (Z :. nrows :.   _  ) = unlift $ shape a
        (Z :.   _   :. ncols) = unlift $ shape b


This, of course, gives me errors right now because I'm calling getRow and dotP from within the generation function, which expects Exp[ression]s, not Acc[elerated computation]s.

So maybe I need to replace that line with an inner for loop? Is there an easy way to do that with Accelerate?

Thanks for your help,
  - Clark

[1] http://hackage.haskell.org/package/accelerate
_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe <at> haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe
Trevor L. McDonell | 3 Dec 08:06 2012
Picon
Picon

Re: Naive matrix multiplication with Accelerate

Hi Clark,

The trick is that most accelerate operations work over multidimensional arrays, so you can still get around the fact that we are limited to flat data-parallelism only.

Here is matrix multiplication in Accelerate, lifted from the first Repa paper [1].


import Data.Array.Accelerate as A

type Matrix a = Array DIM2 a

matMul :: (IsNum e, Elt e) => Acc (Matrix e) -> Acc (Matrix e) -> Acc (Matrix e)
matMul arr brr
  = A.fold (+) 0
  $ A.zipWith (*) arrRepl brrRepl
  where
    Z :. rowsA :. _     = unlift (shape arr)    :: Z :. Exp Int :. Exp Int
    Z :. _     :. colsB = unlift (shape brr)    :: Z :. Exp Int :. Exp Int

    arrRepl             = A.replicate (lift $ Z :. All   :. colsB :. All) arr
    brrRepl             = A.replicate (lift $ Z :. rowsA :. All   :. All) (A.transpose brr)


If you use github sources rather than the hackage package, those intermediate replicates will get fused away.


Cheers,
-Trev





On 03/12/2012, at 5:07 PM, Clark Gaebel <cgaebel <at> uwaterloo.ca> wrote:

Hello cafe,

I've recently started learning about cuda and hetrogenous programming, and have been using accelerate [1] to help me out. Right now, I'm running into trouble in that I can't call parallel code from sequential code. Turns out GPUs aren't exactly like Repa =P.

Here's what I have so far:

import qualified Data.Array.Accelerate as A
import Data.Array.Accelerate ( (:.)(..)
                             , Acc
                             , Vector
                             , Scalar
                             , Elt
                             , fold
                             , slice
                             , constant
                             , Array
                             , Z(..), DIM1, DIM2
                             , fromList
                             , All(..)
                             , generate
                             , lift, unlift
                             , shape
                             )
import Data.Array.Accelerate.Interpreter ( run )

dotP :: (Num a, Elt a) => Acc (Vector a) -> Acc (Vector a) -> Acc (Scalar a)
dotP xs ys = fold (+) 0 $ A.zipWith (*) xs ys

type Matrix a = Array DIM2 a

getRow :: Elt a => Int -> Acc (Matrix a) -> Acc (Vector a)
getRow n mat = slice mat . constant $ Z :. n :. All

-- Naive matrix multiplication:
--
-- index (i, j) is equal to the ith row of 'a' `dot` the jth row of 'b'
matMul :: A.Acc (Matrix Double) -> A.Acc (Matrix Double) -> A.Acc (Matrix Double)
matMul a b' = A.generate (constant $ Z :. nrows :. ncols) $
                \ix ->
                  let (Z :. i :. j) = unlift ix
                   in getRow i a `dotP` getRow j b
    where
        b = A.transpose b' -- I assume row indexing is faster than column indexing...
        (Z :. nrows :.   _  ) = unlift $ shape a
        (Z :.   _   :. ncols) = unlift $ shape b


This, of course, gives me errors right now because I'm calling getRow and dotP from within the generation function, which expects Exp[ression]s, not Acc[elerated computation]s.

So maybe I need to replace that line with an inner for loop? Is there an easy way to do that with Accelerate?

Thanks for your help,
  - Clark

[1] http://hackage.haskell.org/package/accelerate
_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe <at> haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe <at> haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe
Clark Gaebel | 3 Dec 17:41 2012
Picon
Picon

Re: Naive matrix multiplication with Accelerate

Ah. I see now. Silly Haskell making inefficient algorithms hard to write and efficient ones easy. It's actually kind of annoying when learning, but probably for the best.

Is there a good write-up of the algorithm you're using somewhere? The Repa paper was very brief in its explaination, and I'm having trouble visualizing the mapping of the 2D matricies into 3 dimensions.

  - Clark


On Mon, Dec 3, 2012 at 2:06 AM, Trevor L. McDonell <tmcdonell <at> cse.unsw.edu.au> wrote:
Hi Clark,

The trick is that most accelerate operations work over multidimensional arrays, so you can still get around the fact that we are limited to flat data-parallelism only.

Here is matrix multiplication in Accelerate, lifted from the first Repa paper [1].


import Data.Array.Accelerate as A

type Matrix a = Array DIM2 a

matMul :: (IsNum e, Elt e) => Acc (Matrix e) -> Acc (Matrix e) -> Acc (Matrix e)
matMul arr brr
  = A.fold (+) 0
  $ A.zipWith (*) arrRepl brrRepl
  where
    Z :. rowsA :. _     = unlift (shape arr)    :: Z :. Exp Int :. Exp Int
    Z :. _     :. colsB = unlift (shape brr)    :: Z :. Exp Int :. Exp Int

    arrRepl             = A.replicate (lift $ Z :. All   :. colsB :. All) arr
    brrRepl             = A.replicate (lift $ Z :. rowsA :. All   :. All) (A.transpose brr)


If you use github sources rather than the hackage package, those intermediate replicates will get fused away.


Cheers,
-Trev





On 03/12/2012, at 5:07 PM, Clark Gaebel <cgaebel <at> uwaterloo.ca> wrote:

Hello cafe,

I've recently started learning about cuda and hetrogenous programming, and have been using accelerate [1] to help me out. Right now, I'm running into trouble in that I can't call parallel code from sequential code. Turns out GPUs aren't exactly like Repa =P.

Here's what I have so far:

import qualified Data.Array.Accelerate as A
import Data.Array.Accelerate ( (:.)(..)
                             , Acc
                             , Vector
                             , Scalar
                             , Elt
                             , fold
                             , slice
                             , constant
                             , Array
                             , Z(..), DIM1, DIM2
                             , fromList
                             , All(..)
                             , generate
                             , lift, unlift
                             , shape
                             )
import Data.Array.Accelerate.Interpreter ( run )

dotP :: (Num a, Elt a) => Acc (Vector a) -> Acc (Vector a) -> Acc (Scalar a)
dotP xs ys = fold (+) 0 $ A.zipWith (*) xs ys

type Matrix a = Array DIM2 a

getRow :: Elt a => Int -> Acc (Matrix a) -> Acc (Vector a)
getRow n mat = slice mat . constant $ Z :. n :. All

-- Naive matrix multiplication:
--
-- index (i, j) is equal to the ith row of 'a' `dot` the jth row of 'b'
matMul :: A.Acc (Matrix Double) -> A.Acc (Matrix Double) -> A.Acc (Matrix Double)
matMul a b' = A.generate (constant $ Z :. nrows :. ncols) $
                \ix ->
                  let (Z :. i :. j) = unlift ix
                   in getRow i a `dotP` getRow j b
    where
        b = A.transpose b' -- I assume row indexing is faster than column indexing...
        (Z :. nrows :.   _  ) = unlift $ shape a
        (Z :.   _   :. ncols) = unlift $ shape b


This, of course, gives me errors right now because I'm calling getRow and dotP from within the generation function, which expects Exp[ression]s, not Acc[elerated computation]s.

So maybe I need to replace that line with an inner for loop? Is there an easy way to do that with Accelerate?

Thanks for your help,
  - Clark

[1] http://hackage.haskell.org/package/accelerate
_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe <at> haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe <at> haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe <at> haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe
Trevor L. McDonell | 4 Dec 06:21 2012
Picon
Picon

Re: Naive matrix multiplication with Accelerate

As far as I am aware, the only description is in the Repa paper. I you are right, it really should be explained properly somewhere… 

At a simpler example, here is the outer product of two vectors [1].

vvProd :: (IsNum e, Elt e) => Acc (Vector e) -> Acc (Vector e) -> Acc (Matrix e)
vvProd xs ys = A.zipWith (*) xsRepl ysRepl
  where
    n           = A.size xs
    m           = A.size ys

    xsRepl      = A.replicate (lift (Z :. All :. m  )) xs
    ysRepl      = A.replicate (lift (Z :. n   :. All)) ys

If we then `A.fold (+) 0` the matrix, it would reduce along each row producing a vector. So the first element of that vector is going to be calculated as (xs[0] * ys[0] + xs[0] * ys[1] +  … xs[0] * ys[m-1]). That's the idea we want for our matrix multiplication … but I agree, it is difficult for me to visualise as well.

I do the same sort of trick with the n-body demo to get all n^2 particle interactions.

-Trev





On 04/12/2012, at 3:41 AM, Clark Gaebel <cgaebel <at> uwaterloo.ca> wrote:

Ah. I see now. Silly Haskell making inefficient algorithms hard to write and efficient ones easy. It's actually kind of annoying when learning, but probably for the best.

Is there a good write-up of the algorithm you're using somewhere? The Repa paper was very brief in its explaination, and I'm having trouble visualizing the mapping of the 2D matricies into 3 dimensions.

  - Clark


On Mon, Dec 3, 2012 at 2:06 AM, Trevor L. McDonell <tmcdonell <at> cse.unsw.edu.au> wrote:
Hi Clark,

The trick is that most accelerate operations work over multidimensional arrays, so you can still get around the fact that we are limited to flat data-parallelism only.

Here is matrix multiplication in Accelerate, lifted from the first Repa paper [1].


import Data.Array.Accelerate as A

type Matrix a = Array DIM2 a

matMul :: (IsNum e, Elt e) => Acc (Matrix e) -> Acc (Matrix e) -> Acc (Matrix e)
matMul arr brr
  = A.fold (+) 0
  $ A.zipWith (*) arrRepl brrRepl
  where
    Z :. rowsA :. _     = unlift (shape arr)    :: Z :. Exp Int :. Exp Int
    Z :. _     :. colsB = unlift (shape brr)    :: Z :. Exp Int :. Exp Int

    arrRepl             = A.replicate (lift $ Z :. All   :. colsB :. All) arr
    brrRepl             = A.replicate (lift $ Z :. rowsA :. All   :. All) (A.transpose brr)


If you use github sources rather than the hackage package, those intermediate replicates will get fused away.


Cheers,
-Trev





On 03/12/2012, at 5:07 PM, Clark Gaebel <cgaebel <at> uwaterloo.ca> wrote:

Hello cafe,

I've recently started learning about cuda and hetrogenous programming, and have been using accelerate [1] to help me out. Right now, I'm running into trouble in that I can't call parallel code from sequential code. Turns out GPUs aren't exactly like Repa =P.

Here's what I have so far:

import qualified Data.Array.Accelerate as A
import Data.Array.Accelerate ( (:.)(..)
                             , Acc
                             , Vector
                             , Scalar
                             , Elt
                             , fold
                             , slice
                             , constant
                             , Array
                             , Z(..), DIM1, DIM2
                             , fromList
                             , All(..)
                             , generate
                             , lift, unlift
                             , shape
                             )
import Data.Array.Accelerate.Interpreter ( run )

dotP :: (Num a, Elt a) => Acc (Vector a) -> Acc (Vector a) -> Acc (Scalar a)
dotP xs ys = fold (+) 0 $ A.zipWith (*) xs ys

type Matrix a = Array DIM2 a

getRow :: Elt a => Int -> Acc (Matrix a) -> Acc (Vector a)
getRow n mat = slice mat . constant $ Z :. n :. All

-- Naive matrix multiplication:
--
-- index (i, j) is equal to the ith row of 'a' `dot` the jth row of 'b'
matMul :: A.Acc (Matrix Double) -> A.Acc (Matrix Double) -> A.Acc (Matrix Double)
matMul a b' = A.generate (constant $ Z :. nrows :. ncols) $
                \ix ->
                  let (Z :. i :. j) = unlift ix
                   in getRow i a `dotP` getRow j b
    where
        b = A.transpose b' -- I assume row indexing is faster than column indexing...
        (Z :. nrows :.   _  ) = unlift $ shape a
        (Z :.   _   :. ncols) = unlift $ shape b


This, of course, gives me errors right now because I'm calling getRow and dotP from within the generation function, which expects Exp[ression]s, not Acc[elerated computation]s.

So maybe I need to replace that line with an inner for loop? Is there an easy way to do that with Accelerate?

Thanks for your help,
  - Clark

[1] http://hackage.haskell.org/package/accelerate
_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe <at> haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe <at> haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe



_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe <at> haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe
Alexander Solla | 5 Dec 01:08 2012
Picon

Re: Naive matrix multiplication with Accelerate

I don't mean to be blunt, but have you guys taken a course in linear algebra?



On Mon, Dec 3, 2012 at 9:21 PM, Trevor L. McDonell <tmcdonell <at> cse.unsw.edu.au> wrote:
As far as I am aware, the only description is in the Repa paper. I you are right, it really should be explained properly somewhere… 

At a simpler example, here is the outer product of two vectors [1].

vvProd :: (IsNum e, Elt e) => Acc (Vector e) -> Acc (Vector e) -> Acc (Matrix e)
vvProd xs ys = A.zipWith (*) xsRepl ysRepl
  where
    n           = A.size xs
    m           = A.size ys

    xsRepl      = A.replicate (lift (Z :. All :. m  )) xs
    ysRepl      = A.replicate (lift (Z :. n   :. All)) ys

If we then `A.fold (+) 0` the matrix, it would reduce along each row producing a vector. So the first element of that vector is going to be calculated as (xs[0] * ys[0] + xs[0] * ys[1] +  … xs[0] * ys[m-1]). That's the idea we want for our matrix multiplication … but I agree, it is difficult for me to visualise as well.

I do the same sort of trick with the n-body demo to get all n^2 particle interactions.

-Trev





On 04/12/2012, at 3:41 AM, Clark Gaebel <cgaebel <at> uwaterloo.ca> wrote:

Ah. I see now. Silly Haskell making inefficient algorithms hard to write and efficient ones easy. It's actually kind of annoying when learning, but probably for the best.

Is there a good write-up of the algorithm you're using somewhere? The Repa paper was very brief in its explaination, and I'm having trouble visualizing the mapping of the 2D matricies into 3 dimensions.

  - Clark


On Mon, Dec 3, 2012 at 2:06 AM, Trevor L. McDonell <tmcdonell <at> cse.unsw.edu.au> wrote:
Hi Clark,

The trick is that most accelerate operations work over multidimensional arrays, so you can still get around the fact that we are limited to flat data-parallelism only.

Here is matrix multiplication in Accelerate, lifted from the first Repa paper [1].


import Data.Array.Accelerate as A

type Matrix a = Array DIM2 a

matMul :: (IsNum e, Elt e) => Acc (Matrix e) -> Acc (Matrix e) -> Acc (Matrix e)
matMul arr brr
  = A.fold (+) 0
  $ A.zipWith (*) arrRepl brrRepl
  where
    Z :. rowsA :. _     = unlift (shape arr)    :: Z :. Exp Int :. Exp Int
    Z :. _     :. colsB = unlift (shape brr)    :: Z :. Exp Int :. Exp Int

    arrRepl             = A.replicate (lift $ Z :. All   :. colsB :. All) arr
    brrRepl             = A.replicate (lift $ Z :. rowsA :. All   :. All) (A.transpose brr)


If you use github sources rather than the hackage package, those intermediate replicates will get fused away.


Cheers,
-Trev





On 03/12/2012, at 5:07 PM, Clark Gaebel <cgaebel <at> uwaterloo.ca> wrote:

Hello cafe,

I've recently started learning about cuda and hetrogenous programming, and have been using accelerate [1] to help me out. Right now, I'm running into trouble in that I can't call parallel code from sequential code. Turns out GPUs aren't exactly like Repa =P.

Here's what I have so far:

import qualified Data.Array.Accelerate as A
import Data.Array.Accelerate ( (:.)(..)
                             , Acc
                             , Vector
                             , Scalar
                             , Elt
                             , fold
                             , slice
                             , constant
                             , Array
                             , Z(..), DIM1, DIM2
                             , fromList
                             , All(..)
                             , generate
                             , lift, unlift
                             , shape
                             )
import Data.Array.Accelerate.Interpreter ( run )

dotP :: (Num a, Elt a) => Acc (Vector a) -> Acc (Vector a) -> Acc (Scalar a)
dotP xs ys = fold (+) 0 $ A.zipWith (*) xs ys

type Matrix a = Array DIM2 a

getRow :: Elt a => Int -> Acc (Matrix a) -> Acc (Vector a)
getRow n mat = slice mat . constant $ Z :. n :. All

-- Naive matrix multiplication:
--
-- index (i, j) is equal to the ith row of 'a' `dot` the jth row of 'b'
matMul :: A.Acc (Matrix Double) -> A.Acc (Matrix Double) -> A.Acc (Matrix Double)
matMul a b' = A.generate (constant $ Z :. nrows :. ncols) $
                \ix ->
                  let (Z :. i :. j) = unlift ix
                   in getRow i a `dotP` getRow j b
    where
        b = A.transpose b' -- I assume row indexing is faster than column indexing...
        (Z :. nrows :.   _  ) = unlift $ shape a
        (Z :.   _   :. ncols) = unlift $ shape b


This, of course, gives me errors right now because I'm calling getRow and dotP from within the generation function, which expects Exp[ression]s, not Acc[elerated computation]s.

So maybe I need to replace that line with an inner for loop? Is there an easy way to do that with Accelerate?

Thanks for your help,
  - Clark

[1] http://hackage.haskell.org/package/accelerate
_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe <at> haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe <at> haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe




_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe <at> haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe <at> haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe
Clark Gaebel | 5 Dec 01:13 2012
Picon
Picon

Re: Naive matrix multiplication with Accelerate

No. But that doesn't stop me from being curious with Accelerate. Might you have a better explaination for what's happening here than Trevor's?

  - Clark


On Tue, Dec 4, 2012 at 7:08 PM, Alexander Solla <alex.solla <at> gmail.com> wrote:
I don't mean to be blunt, but have you guys taken a course in linear algebra?


On Mon, Dec 3, 2012 at 9:21 PM, Trevor L. McDonell <tmcdonell <at> cse.unsw.edu.au> wrote:
As far as I am aware, the only description is in the Repa paper. I you are right, it really should be explained properly somewhere… 

At a simpler example, here is the outer product of two vectors [1].

vvProd :: (IsNum e, Elt e) => Acc (Vector e) -> Acc (Vector e) -> Acc (Matrix e)
vvProd xs ys = A.zipWith (*) xsRepl ysRepl
  where
    n           = A.size xs
    m           = A.size ys

    xsRepl      = A.replicate (lift (Z :. All :. m  )) xs
    ysRepl      = A.replicate (lift (Z :. n   :. All)) ys

If we then `A.fold (+) 0` the matrix, it would reduce along each row producing a vector. So the first element of that vector is going to be calculated as (xs[0] * ys[0] + xs[0] * ys[1] +  … xs[0] * ys[m-1]). That's the idea we want for our matrix multiplication … but I agree, it is difficult for me to visualise as well.

I do the same sort of trick with the n-body demo to get all n^2 particle interactions.

-Trev





On 04/12/2012, at 3:41 AM, Clark Gaebel <cgaebel <at> uwaterloo.ca> wrote:

Ah. I see now. Silly Haskell making inefficient algorithms hard to write and efficient ones easy. It's actually kind of annoying when learning, but probably for the best.

Is there a good write-up of the algorithm you're using somewhere? The Repa paper was very brief in its explaination, and I'm having trouble visualizing the mapping of the 2D matricies into 3 dimensions.

  - Clark


On Mon, Dec 3, 2012 at 2:06 AM, Trevor L. McDonell <tmcdonell <at> cse.unsw.edu.au> wrote:
Hi Clark,

The trick is that most accelerate operations work over multidimensional arrays, so you can still get around the fact that we are limited to flat data-parallelism only.

Here is matrix multiplication in Accelerate, lifted from the first Repa paper [1].


import Data.Array.Accelerate as A

type Matrix a = Array DIM2 a

matMul :: (IsNum e, Elt e) => Acc (Matrix e) -> Acc (Matrix e) -> Acc (Matrix e)
matMul arr brr
  = A.fold (+) 0
  $ A.zipWith (*) arrRepl brrRepl
  where
    Z :. rowsA :. _     = unlift (shape arr)    :: Z :. Exp Int :. Exp Int
    Z :. _     :. colsB = unlift (shape brr)    :: Z :. Exp Int :. Exp Int

    arrRepl             = A.replicate (lift $ Z :. All   :. colsB :. All) arr
    brrRepl             = A.replicate (lift $ Z :. rowsA :. All   :. All) (A.transpose brr)


If you use github sources rather than the hackage package, those intermediate replicates will get fused away.


Cheers,
-Trev





On 03/12/2012, at 5:07 PM, Clark Gaebel <cgaebel <at> uwaterloo.ca> wrote:

Hello cafe,

I've recently started learning about cuda and hetrogenous programming, and have been using accelerate [1] to help me out. Right now, I'm running into trouble in that I can't call parallel code from sequential code. Turns out GPUs aren't exactly like Repa =P.

Here's what I have so far:

import qualified Data.Array.Accelerate as A
import Data.Array.Accelerate ( (:.)(..)
                             , Acc
                             , Vector
                             , Scalar
                             , Elt
                             , fold
                             , slice
                             , constant
                             , Array
                             , Z(..), DIM1, DIM2
                             , fromList
                             , All(..)
                             , generate
                             , lift, unlift
                             , shape
                             )
import Data.Array.Accelerate.Interpreter ( run )

dotP :: (Num a, Elt a) => Acc (Vector a) -> Acc (Vector a) -> Acc (Scalar a)
dotP xs ys = fold (+) 0 $ A.zipWith (*) xs ys

type Matrix a = Array DIM2 a

getRow :: Elt a => Int -> Acc (Matrix a) -> Acc (Vector a)
getRow n mat = slice mat . constant $ Z :. n :. All

-- Naive matrix multiplication:
--
-- index (i, j) is equal to the ith row of 'a' `dot` the jth row of 'b'
matMul :: A.Acc (Matrix Double) -> A.Acc (Matrix Double) -> A.Acc (Matrix Double)
matMul a b' = A.generate (constant $ Z :. nrows :. ncols) $
                \ix ->
                  let (Z :. i :. j) = unlift ix
                   in getRow i a `dotP` getRow j b
    where
        b = A.transpose b' -- I assume row indexing is faster than column indexing...
        (Z :. nrows :.   _  ) = unlift $ shape a
        (Z :.   _   :. ncols) = unlift $ shape b


This, of course, gives me errors right now because I'm calling getRow and dotP from within the generation function, which expects Exp[ression]s, not Acc[elerated computation]s.

So maybe I need to replace that line with an inner for loop? Is there an easy way to do that with Accelerate?

Thanks for your help,
  - Clark

[1] http://hackage.haskell.org/package/accelerate
_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe <at> haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe <at> haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe




_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe <at> haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe



_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe <at> haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe
Alexander Solla | 5 Dec 01:37 2012
Picon

Re: Naive matrix multiplication with Accelerate

Well, an m x n matrix corresponds to a linear transformation in at most min{m,n} dimensions.  In particular, this means that a 2x2 matrix corresponds to a plane, line, or the origin of 3-space, as a linear subspace.  Which of those the matrix corresponds to depends on the matrix's "rank", which is the number of linearly independent columns (or rows) in the matrix.


Do you really need to know /which/ plane or line a matrix corresponds to?  If so, reduce it using Gaussian elimination and, if appropriate, compute its eigenvectors or span.  Otherwise, think of it as a generic plane/line/0-point.

Outer products represent more of these simple facts about linear algebra.  The product of an mx1 and 1xn matrices is an mxn matrix with rank at most 1.  Trouble visualizing this means you are missing the essential facts (for the general picture as the product as a line or origin), or requires some computational details -- reducing the matrix using Gaussian elimination and determining its span.

As I said, I don't mean to be harsh, but playing with a vector algebra package without understanding vectors is like playing with a calculator without understanding multiplication.  You're better off learning what multiplication represents first, before using a machine to do it fast.  So, I can humbly recommend taking a course on the subject.  For example, https://www.coursera.org/course/matrix


On Tue, Dec 4, 2012 at 4:13 PM, Clark Gaebel <cgaebel <at> uwaterloo.ca> wrote:
No. But that doesn't stop me from being curious with Accelerate. Might you have a better explaination for what's happening here than Trevor's?

  - Clark


On Tue, Dec 4, 2012 at 7:08 PM, Alexander Solla <alex.solla <at> gmail.com> wrote:
I don't mean to be blunt, but have you guys taken a course in linear algebra?


On Mon, Dec 3, 2012 at 9:21 PM, Trevor L. McDonell <tmcdonell <at> cse.unsw.edu.au> wrote:
As far as I am aware, the only description is in the Repa paper. I you are right, it really should be explained properly somewhere… 

At a simpler example, here is the outer product of two vectors [1].

vvProd :: (IsNum e, Elt e) => Acc (Vector e) -> Acc (Vector e) -> Acc (Matrix e)
vvProd xs ys = A.zipWith (*) xsRepl ysRepl
  where
    n           = A.size xs
    m           = A.size ys

    xsRepl      = A.replicate (lift (Z :. All :. m  )) xs
    ysRepl      = A.replicate (lift (Z :. n   :. All)) ys

If we then `A.fold (+) 0` the matrix, it would reduce along each row producing a vector. So the first element of that vector is going to be calculated as (xs[0] * ys[0] + xs[0] * ys[1] +  … xs[0] * ys[m-1]). That's the idea we want for our matrix multiplication … but I agree, it is difficult for me to visualise as well.

I do the same sort of trick with the n-body demo to get all n^2 particle interactions.

-Trev





On 04/12/2012, at 3:41 AM, Clark Gaebel <cgaebel <at> uwaterloo.ca> wrote:

Ah. I see now. Silly Haskell making inefficient algorithms hard to write and efficient ones easy. It's actually kind of annoying when learning, but probably for the best.

Is there a good write-up of the algorithm you're using somewhere? The Repa paper was very brief in its explaination, and I'm having trouble visualizing the mapping of the 2D matricies into 3 dimensions.

  - Clark


On Mon, Dec 3, 2012 at 2:06 AM, Trevor L. McDonell <tmcdonell <at> cse.unsw.edu.au> wrote:
Hi Clark,

The trick is that most accelerate operations work over multidimensional arrays, so you can still get around the fact that we are limited to flat data-parallelism only.

Here is matrix multiplication in Accelerate, lifted from the first Repa paper [1].


import Data.Array.Accelerate as A

type Matrix a = Array DIM2 a

matMul :: (IsNum e, Elt e) => Acc (Matrix e) -> Acc (Matrix e) -> Acc (Matrix e)
matMul arr brr
  = A.fold (+) 0
  $ A.zipWith (*) arrRepl brrRepl
  where
    Z :. rowsA :. _     = unlift (shape arr)    :: Z :. Exp Int :. Exp Int
    Z :. _     :. colsB = unlift (shape brr)    :: Z :. Exp Int :. Exp Int

    arrRepl             = A.replicate (lift $ Z :. All   :. colsB :. All) arr
    brrRepl             = A.replicate (lift $ Z :. rowsA :. All   :. All) (A.transpose brr)


If you use github sources rather than the hackage package, those intermediate replicates will get fused away.


Cheers,
-Trev





On 03/12/2012, at 5:07 PM, Clark Gaebel <cgaebel <at> uwaterloo.ca> wrote:

Hello cafe,

I've recently started learning about cuda and hetrogenous programming, and have been using accelerate [1] to help me out. Right now, I'm running into trouble in that I can't call parallel code from sequential code. Turns out GPUs aren't exactly like Repa =P.

Here's what I have so far:

import qualified Data.Array.Accelerate as A
import Data.Array.Accelerate ( (:.)(..)
                             , Acc
                             , Vector
                             , Scalar
                             , Elt
                             , fold
                             , slice
                             , constant
                             , Array
                             , Z(..), DIM1, DIM2
                             , fromList
                             , All(..)
                             , generate
                             , lift, unlift
                             , shape
                             )
import Data.Array.Accelerate.Interpreter ( run )

dotP :: (Num a, Elt a) => Acc (Vector a) -> Acc (Vector a) -> Acc (Scalar a)
dotP xs ys = fold (+) 0 $ A.zipWith (*) xs ys

type Matrix a = Array DIM2 a

getRow :: Elt a => Int -> Acc (Matrix a) -> Acc (Vector a)
getRow n mat = slice mat . constant $ Z :. n :. All

-- Naive matrix multiplication:
--
-- index (i, j) is equal to the ith row of 'a' `dot` the jth row of 'b'
matMul :: A.Acc (Matrix Double) -> A.Acc (Matrix Double) -> A.Acc (Matrix Double)
matMul a b' = A.generate (constant $ Z :. nrows :. ncols) $
                \ix ->
                  let (Z :. i :. j) = unlift ix
                   in getRow i a `dotP` getRow j b
    where
        b = A.transpose b' -- I assume row indexing is faster than column indexing...
        (Z :. nrows :.   _  ) = unlift $ shape a
        (Z :.   _   :. ncols) = unlift $ shape b


This, of course, gives me errors right now because I'm calling getRow and dotP from within the generation function, which expects Exp[ression]s, not Acc[elerated computation]s.

So maybe I need to replace that line with an inner for loop? Is there an easy way to do that with Accelerate?

Thanks for your help,
  - Clark

[1] http://hackage.haskell.org/package/accelerate
_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe <at> haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe <at> haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe




_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe <at> haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe




_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe <at> haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe
Alexander Solla | 5 Dec 01:43 2012
Picon

Re: Naive matrix multiplication with Accelerate

Sorry, I didn't realize that course was offered next year.  I read through "Matrices and Linear Algebra" when I was in high school.  And used Friedberg, Insel, Spence's "Linear Algebra" in college.



On Tue, Dec 4, 2012 at 4:37 PM, Alexander Solla <alex.solla <at> gmail.com> wrote:
Well, an m x n matrix corresponds to a linear transformation in at most min{m,n} dimensions.  In particular, this means that a 2x2 matrix corresponds to a plane, line, or the origin of 3-space, as a linear subspace.  Which of those the matrix corresponds to depends on the matrix's "rank", which is the number of linearly independent columns (or rows) in the matrix.

Do you really need to know /which/ plane or line a matrix corresponds to?  If so, reduce it using Gaussian elimination and, if appropriate, compute its eigenvectors or span.  Otherwise, think of it as a generic plane/line/0-point.

Outer products represent more of these simple facts about linear algebra.  The product of an mx1 and 1xn matrices is an mxn matrix with rank at most 1.  Trouble visualizing this means you are missing the essential facts (for the general picture as the product as a line or origin), or requires some computational details -- reducing the matrix using Gaussian elimination and determining its span.

As I said, I don't mean to be harsh, but playing with a vector algebra package without understanding vectors is like playing with a calculator without understanding multiplication.  You're better off learning what multiplication represents first, before using a machine to do it fast.  So, I can humbly recommend taking a course on the subject.  For example, https://www.coursera.org/course/matrix


On Tue, Dec 4, 2012 at 4:13 PM, Clark Gaebel <cgaebel <at> uwaterloo.ca> wrote:
No. But that doesn't stop me from being curious with Accelerate. Might you have a better explaination for what's happening here than Trevor's?

  - Clark


On Tue, Dec 4, 2012 at 7:08 PM, Alexander Solla <alex.solla <at> gmail.com> wrote:
I don't mean to be blunt, but have you guys taken a course in linear algebra?


On Mon, Dec 3, 2012 at 9:21 PM, Trevor L. McDonell <tmcdonell <at> cse.unsw.edu.au> wrote:
As far as I am aware, the only description is in the Repa paper. I you are right, it really should be explained properly somewhere… 

At a simpler example, here is the outer product of two vectors [1].

vvProd :: (IsNum e, Elt e) => Acc (Vector e) -> Acc (Vector e) -> Acc (Matrix e)
vvProd xs ys = A.zipWith (*) xsRepl ysRepl
  where
    n           = A.size xs
    m           = A.size ys

    xsRepl      = A.replicate (lift (Z :. All :. m  )) xs
    ysRepl      = A.replicate (lift (Z :. n   :. All)) ys

If we then `A.fold (+) 0` the matrix, it would reduce along each row producing a vector. So the first element of that vector is going to be calculated as (xs[0] * ys[0] + xs[0] * ys[1] +  … xs[0] * ys[m-1]). That's the idea we want for our matrix multiplication … but I agree, it is difficult for me to visualise as well.

I do the same sort of trick with the n-body demo to get all n^2 particle interactions.

-Trev





On 04/12/2012, at 3:41 AM, Clark Gaebel <cgaebel <at> uwaterloo.ca> wrote:

Ah. I see now. Silly Haskell making inefficient algorithms hard to write and efficient ones easy. It's actually kind of annoying when learning, but probably for the best.

Is there a good write-up of the algorithm you're using somewhere? The Repa paper was very brief in its explaination, and I'm having trouble visualizing the mapping of the 2D matricies into 3 dimensions.

  - Clark


On Mon, Dec 3, 2012 at 2:06 AM, Trevor L. McDonell <tmcdonell <at> cse.unsw.edu.au> wrote:
Hi Clark,

The trick is that most accelerate operations work over multidimensional arrays, so you can still get around the fact that we are limited to flat data-parallelism only.

Here is matrix multiplication in Accelerate, lifted from the first Repa paper [1].


import Data.Array.Accelerate as A

type Matrix a = Array DIM2 a

matMul :: (IsNum e, Elt e) => Acc (Matrix e) -> Acc (Matrix e) -> Acc (Matrix e)
matMul arr brr
  = A.fold (+) 0
  $ A.zipWith (*) arrRepl brrRepl
  where
    Z :. rowsA :. _     = unlift (shape arr)    :: Z :. Exp Int :. Exp Int
    Z :. _     :. colsB = unlift (shape brr)    :: Z :. Exp Int :. Exp Int

    arrRepl             = A.replicate (lift $ Z :. All   :. colsB :. All) arr
    brrRepl             = A.replicate (lift $ Z :. rowsA :. All   :. All) (A.transpose brr)


If you use github sources rather than the hackage package, those intermediate replicates will get fused away.


Cheers,
-Trev





On 03/12/2012, at 5:07 PM, Clark Gaebel <cgaebel <at> uwaterloo.ca> wrote:

Hello cafe,

I've recently started learning about cuda and hetrogenous programming, and have been using accelerate [1] to help me out. Right now, I'm running into trouble in that I can't call parallel code from sequential code. Turns out GPUs aren't exactly like Repa =P.

Here's what I have so far:

import qualified Data.Array.Accelerate as A
import Data.Array.Accelerate ( (:.)(..)
                             , Acc
                             , Vector
                             , Scalar
                             , Elt
                             , fold
                             , slice
                             , constant
                             , Array
                             , Z(..), DIM1, DIM2
                             , fromList
                             , All(..)
                             , generate
                             , lift, unlift
                             , shape
                             )
import Data.Array.Accelerate.Interpreter ( run )

dotP :: (Num a, Elt a) => Acc (Vector a) -> Acc (Vector a) -> Acc (Scalar a)
dotP xs ys = fold (+) 0 $ A.zipWith (*) xs ys

type Matrix a = Array DIM2 a

getRow :: Elt a => Int -> Acc (Matrix a) -> Acc (Vector a)
getRow n mat = slice mat . constant $ Z :. n :. All

-- Naive matrix multiplication:
--
-- index (i, j) is equal to the ith row of 'a' `dot` the jth row of 'b'
matMul :: A.Acc (Matrix Double) -> A.Acc (Matrix Double) -> A.Acc (Matrix Double)
matMul a b' = A.generate (constant $ Z :. nrows :. ncols) $
                \ix ->
                  let (Z :. i :. j) = unlift ix
                   in getRow i a `dotP` getRow j b
    where
        b = A.transpose b' -- I assume row indexing is faster than column indexing...
        (Z :. nrows :.   _  ) = unlift $ shape a
        (Z :.   _   :. ncols) = unlift $ shape b


This, of course, gives me errors right now because I'm calling getRow and dotP from within the generation function, which expects Exp[ression]s, not Acc[elerated computation]s.

So maybe I need to replace that line with an inner for loop? Is there an easy way to do that with Accelerate?

Thanks for your help,
  - Clark

[1] http://hackage.haskell.org/package/accelerate
_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe <at> haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe <at> haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe




_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe <at> haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe





_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe <at> haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe
Trevor L. McDonell | 5 Dec 02:13 2012
Picon
Picon

Re: Naive matrix multiplication with Accelerate

Thanks for the insight Alex.
In case you are concerned, Accelerate is not in fact a linear algebra library.


On 05/12/2012, at 11:43 AM, Alexander Solla <alex.solla <at> gmail.com> wrote:

Sorry, I didn't realize that course was offered next year.  I read through "Matrices and Linear Algebra" when I was in high school.  And used Friedberg, Insel, Spence's "Linear Algebra" in college.


On Tue, Dec 4, 2012 at 4:37 PM, Alexander Solla <alex.solla <at> gmail.com> wrote:
Well, an m x n matrix corresponds to a linear transformation in at most min{m,n} dimensions.  In particular, this means that a 2x2 matrix corresponds to a plane, line, or the origin of 3-space, as a linear subspace.  Which of those the matrix corresponds to depends on the matrix's "rank", which is the number of linearly independent columns (or rows) in the matrix.

Do you really need to know /which/ plane or line a matrix corresponds to?  If so, reduce it using Gaussian elimination and, if appropriate, compute its eigenvectors or span.  Otherwise, think of it as a generic plane/line/0-point.

Outer products represent more of these simple facts about linear algebra.  The product of an mx1 and 1xn matrices is an mxn matrix with rank at most 1.  Trouble visualizing this means you are missing the essential facts (for the general picture as the product as a line or origin), or requires some computational details -- reducing the matrix using Gaussian elimination and determining its span.

As I said, I don't mean to be harsh, but playing with a vector algebra package without understanding vectors is like playing with a calculator without understanding multiplication.  You're better off learning what multiplication represents first, before using a machine to do it fast.  So, I can humbly recommend taking a course on the subject.  For example, https://www.coursera.org/course/matrix


On Tue, Dec 4, 2012 at 4:13 PM, Clark Gaebel <cgaebel <at> uwaterloo.ca> wrote:
No. But that doesn't stop me from being curious with Accelerate. Might you have a better explaination for what's happening here than Trevor's?

  - Clark


On Tue, Dec 4, 2012 at 7:08 PM, Alexander Solla <alex.solla <at> gmail.com> wrote:
I don't mean to be blunt, but have you guys taken a course in linear algebra?


On Mon, Dec 3, 2012 at 9:21 PM, Trevor L. McDonell <tmcdonell <at> cse.unsw.edu.au> wrote:
As far as I am aware, the only description is in the Repa paper. I you are right, it really should be explained properly somewhere… 

At a simpler example, here is the outer product of two vectors [1].

vvProd :: (IsNum e, Elt e) => Acc (Vector e) -> Acc (Vector e) -> Acc (Matrix e)
vvProd xs ys = A.zipWith (*) xsRepl ysRepl
  where
    n           = A.size xs
    m           = A.size ys

    xsRepl      = A.replicate (lift (Z :. All :. m  )) xs
    ysRepl      = A.replicate (lift (Z :. n   :. All)) ys

If we then `A.fold (+) 0` the matrix, it would reduce along each row producing a vector. So the first element of that vector is going to be calculated as (xs[0] * ys[0] + xs[0] * ys[1] +  … xs[0] * ys[m-1]). That's the idea we want for our matrix multiplication … but I agree, it is difficult for me to visualise as well.

I do the same sort of trick with the n-body demo to get all n^2 particle interactions.

-Trev





On 04/12/2012, at 3:41 AM, Clark Gaebel <cgaebel <at> uwaterloo.ca> wrote:

Ah. I see now. Silly Haskell making inefficient algorithms hard to write and efficient ones easy. It's actually kind of annoying when learning, but probably for the best.

Is there a good write-up of the algorithm you're using somewhere? The Repa paper was very brief in its explaination, and I'm having trouble visualizing the mapping of the 2D matricies into 3 dimensions.

  - Clark


On Mon, Dec 3, 2012 at 2:06 AM, Trevor L. McDonell <tmcdonell <at> cse.unsw.edu.au> wrote:
Hi Clark,

The trick is that most accelerate operations work over multidimensional arrays, so you can still get around the fact that we are limited to flat data-parallelism only.

Here is matrix multiplication in Accelerate, lifted from the first Repa paper [1].


import Data.Array.Accelerate as A

type Matrix a = Array DIM2 a

matMul :: (IsNum e, Elt e) => Acc (Matrix e) -> Acc (Matrix e) -> Acc (Matrix e)
matMul arr brr
  = A.fold (+) 0
  $ A.zipWith (*) arrRepl brrRepl
  where
    Z :. rowsA :. _     = unlift (shape arr)    :: Z :. Exp Int :. Exp Int
    Z :. _     :. colsB = unlift (shape brr)    :: Z :. Exp Int :. Exp Int

    arrRepl             = A.replicate (lift $ Z :. All   :. colsB :. All) arr
    brrRepl             = A.replicate (lift $ Z :. rowsA :. All   :. All) (A.transpose brr)


If you use github sources rather than the hackage package, those intermediate replicates will get fused away.


Cheers,
-Trev





On 03/12/2012, at 5:07 PM, Clark Gaebel <cgaebel <at> uwaterloo.ca> wrote:

Hello cafe,

I've recently started learning about cuda and hetrogenous programming, and have been using accelerate [1] to help me out. Right now, I'm running into trouble in that I can't call parallel code from sequential code. Turns out GPUs aren't exactly like Repa =P.

Here's what I have so far:

import qualified Data.Array.Accelerate as A
import Data.Array.Accelerate ( (:.)(..)
                             , Acc
                             , Vector
                             , Scalar
                             , Elt
                             , fold
                             , slice
                             , constant
                             , Array
                             , Z(..), DIM1, DIM2
                             , fromList
                             , All(..)
                             , generate
                             , lift, unlift
                             , shape
                             )
import Data.Array.Accelerate.Interpreter ( run )

dotP :: (Num a, Elt a) => Acc (Vector a) -> Acc (Vector a) -> Acc (Scalar a)
dotP xs ys = fold (+) 0 $ A.zipWith (*) xs ys

type Matrix a = Array DIM2 a

getRow :: Elt a => Int -> Acc (Matrix a) -> Acc (Vector a)
getRow n mat = slice mat . constant $ Z :. n :. All

-- Naive matrix multiplication:
--
-- index (i, j) is equal to the ith row of 'a' `dot` the jth row of 'b'
matMul :: A.Acc (Matrix Double) -> A.Acc (Matrix Double) -> A.Acc (Matrix Double)
matMul a b' = A.generate (constant $ Z :. nrows :. ncols) $
                \ix ->
                  let (Z :. i :. j) = unlift ix
                   in getRow i a `dotP` getRow j b
    where
        b = A.transpose b' -- I assume row indexing is faster than column indexing...
        (Z :. nrows :.   _  ) = unlift $ shape a
        (Z :.   _   :. ncols) = unlift $ shape b


This, of course, gives me errors right now because I'm calling getRow and dotP from within the generation function, which expects Exp[ression]s, not Acc[elerated computation]s.

So maybe I need to replace that line with an inner for loop? Is there an easy way to do that with Accelerate?

Thanks for your help,
  - Clark

[1] http://hackage.haskell.org/package/accelerate
_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe <at> haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe <at> haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe




_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe <at> haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe






_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe <at> haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe
Clark Gaebel | 5 Dec 02:20 2012
Picon
Picon

Re: Naive matrix multiplication with Accelerate

Thanks! I'll read through "Matricies and Linear Algebra" over the next few days.

  - Clark


On Tue, Dec 4, 2012 at 7:43 PM, Alexander Solla <alex.solla <at> gmail.com> wrote:
Sorry, I didn't realize that course was offered next year.  I read through "Matrices and Linear Algebra" when I was in high school.  And used Friedberg, Insel, Spence's "Linear Algebra" in college.


On Tue, Dec 4, 2012 at 4:37 PM, Alexander Solla <alex.solla <at> gmail.com> wrote:
Well, an m x n matrix corresponds to a linear transformation in at most min{m,n} dimensions.  In particular, this means that a 2x2 matrix corresponds to a plane, line, or the origin of 3-space, as a linear subspace.  Which of those the matrix corresponds to depends on the matrix's "rank", which is the number of linearly independent columns (or rows) in the matrix.

Do you really need to know /which/ plane or line a matrix corresponds to?  If so, reduce it using Gaussian elimination and, if appropriate, compute its eigenvectors or span.  Otherwise, think of it as a generic plane/line/0-point.

Outer products represent more of these simple facts about linear algebra.  The product of an mx1 and 1xn matrices is an mxn matrix with rank at most 1.  Trouble visualizing this means you are missing the essential facts (for the general picture as the product as a line or origin), or requires some computational details -- reducing the matrix using Gaussian elimination and determining its span.

As I said, I don't mean to be harsh, but playing with a vector algebra package without understanding vectors is like playing with a calculator without understanding multiplication.  You're better off learning what multiplication represents first, before using a machine to do it fast.  So, I can humbly recommend taking a course on the subject.  For example, https://www.coursera.org/course/matrix


On Tue, Dec 4, 2012 at 4:13 PM, Clark Gaebel <cgaebel <at> uwaterloo.ca> wrote:
No. But that doesn't stop me from being curious with Accelerate. Might you have a better explaination for what's happening here than Trevor's?

  - Clark


On Tue, Dec 4, 2012 at 7:08 PM, Alexander Solla <alex.solla <at> gmail.com> wrote:
I don't mean to be blunt, but have you guys taken a course in linear algebra?


On Mon, Dec 3, 2012 at 9:21 PM, Trevor L. McDonell <tmcdonell <at> cse.unsw.edu.au> wrote:
As far as I am aware, the only description is in the Repa paper. I you are right, it really should be explained properly somewhere… 

At a simpler example, here is the outer product of two vectors [1].

vvProd :: (IsNum e, Elt e) => Acc (Vector e) -> Acc (Vector e) -> Acc (Matrix e)
vvProd xs ys = A.zipWith (*) xsRepl ysRepl
  where
    n           = A.size xs
    m           = A.size ys

    xsRepl      = A.replicate (lift (Z :. All :. m  )) xs
    ysRepl      = A.replicate (lift (Z :. n   :. All)) ys

If we then `A.fold (+) 0` the matrix, it would reduce along each row producing a vector. So the first element of that vector is going to be calculated as (xs[0] * ys[0] + xs[0] * ys[1] +  … xs[0] * ys[m-1]). That's the idea we want for our matrix multiplication … but I agree, it is difficult for me to visualise as well.

I do the same sort of trick with the n-body demo to get all n^2 particle interactions.

-Trev





On 04/12/2012, at 3:41 AM, Clark Gaebel <cgaebel <at> uwaterloo.ca> wrote:

Ah. I see now. Silly Haskell making inefficient algorithms hard to write and efficient ones easy. It's actually kind of annoying when learning, but probably for the best.

Is there a good write-up of the algorithm you're using somewhere? The Repa paper was very brief in its explaination, and I'm having trouble visualizing the mapping of the 2D matricies into 3 dimensions.

  - Clark


On Mon, Dec 3, 2012 at 2:06 AM, Trevor L. McDonell <tmcdonell <at> cse.unsw.edu.au> wrote:
Hi Clark,

The trick is that most accelerate operations work over multidimensional arrays, so you can still get around the fact that we are limited to flat data-parallelism only.

Here is matrix multiplication in Accelerate, lifted from the first Repa paper [1].


import Data.Array.Accelerate as A

type Matrix a = Array DIM2 a

matMul :: (IsNum e, Elt e) => Acc (Matrix e) -> Acc (Matrix e) -> Acc (Matrix e)
matMul arr brr
  = A.fold (+) 0
  $ A.zipWith (*) arrRepl brrRepl
  where
    Z :. rowsA :. _     = unlift (shape arr)    :: Z :. Exp Int :. Exp Int
    Z :. _     :. colsB = unlift (shape brr)    :: Z :. Exp Int :. Exp Int

    arrRepl             = A.replicate (lift $ Z :. All   :. colsB :. All) arr
    brrRepl             = A.replicate (lift $ Z :. rowsA :. All   :. All) (A.transpose brr)


If you use github sources rather than the hackage package, those intermediate replicates will get fused away.


Cheers,
-Trev





On 03/12/2012, at 5:07 PM, Clark Gaebel <cgaebel <at> uwaterloo.ca> wrote:

Hello cafe,

I've recently started learning about cuda and hetrogenous programming, and have been using accelerate [1] to help me out. Right now, I'm running into trouble in that I can't call parallel code from sequential code. Turns out GPUs aren't exactly like Repa =P.

Here's what I have so far:

import qualified Data.Array.Accelerate as A
import Data.Array.Accelerate ( (:.)(..)
                             , Acc
                             , Vector
                             , Scalar
                             , Elt
                             , fold
                             , slice
                             , constant
                             , Array
                             , Z(..), DIM1, DIM2
                             , fromList
                             , All(..)
                             , generate
                             , lift, unlift
                             , shape
                             )
import Data.Array.Accelerate.Interpreter ( run )

dotP :: (Num a, Elt a) => Acc (Vector a) -> Acc (Vector a) -> Acc (Scalar a)
dotP xs ys = fold (+) 0 $ A.zipWith (*) xs ys

type Matrix a = Array DIM2 a

getRow :: Elt a => Int -> Acc (Matrix a) -> Acc (Vector a)
getRow n mat = slice mat . constant $ Z :. n :. All

-- Naive matrix multiplication:
--
-- index (i, j) is equal to the ith row of 'a' `dot` the jth row of 'b'
matMul :: A.Acc (Matrix Double) -> A.Acc (Matrix Double) -> A.Acc (Matrix Double)
matMul a b' = A.generate (constant $ Z :. nrows :. ncols) $
                \ix ->
                  let (Z :. i :. j) = unlift ix
                   in getRow i a `dotP` getRow j b
    where
        b = A.transpose b' -- I assume row indexing is faster than column indexing...
        (Z :. nrows :.   _  ) = unlift $ shape a
        (Z :.   _   :. ncols) = unlift $ shape b


This, of course, gives me errors right now because I'm calling getRow and dotP from within the generation function, which expects Exp[ression]s, not Acc[elerated computation]s.

So maybe I need to replace that line with an inner for loop? Is there an easy way to do that with Accelerate?

Thanks for your help,
  - Clark

[1] http://hackage.haskell.org/package/accelerate
_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe <at> haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe <at> haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe




_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe <at> haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe






_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe <at> haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Gmane