3 Dec 07:07 2012

## Naive matrix multiplication with Accelerate

Clark Gaebel <cgaebel <at> uwaterloo.ca>

2012-12-03 06:07:49 GMT

2012-12-03 06:07:49 GMT

Hello cafe,

I've recently started learning about cuda and hetrogenous programming, and have been using accelerate [1] to help me out. Right now, I'm running into trouble in that I can't call parallel code from sequential code. Turns out GPUs aren't exactly like Repa =P.

Here's what I have so far:

import qualified Data.Array.Accelerate as A

import Data.Array.Accelerate ( (:.)(..)

, Acc

, Vector

, Scalar

, Elt

, fold

, slice

, constant

, Array

, Z(..), DIM1, DIM2

, fromList

, All(..)

, generate

, lift, unlift

, shape

)

import Data.Array.Accelerate.Interpreter ( run )

dotP :: (Num a, Elt a) => Acc (Vector a) -> Acc (Vector a) -> Acc (Scalar a)

dotP xs ys = fold (+) 0 $ A.zipWith (*) xs ys

type Matrix a = Array DIM2 a

getRow :: Elt a => Int -> Acc (Matrix a) -> Acc (Vector a)

getRow n mat = slice mat . constant $ Z :. n :. All

-- Naive matrix multiplication:

--

-- index (i, j) is equal to the ith row of 'a' `dot` the jth row of 'b'

matMul :: A.Acc (Matrix Double) -> A.Acc (Matrix Double) -> A.Acc (Matrix Double)

matMul a b' = A.generate (constant $ Z :. nrows :. ncols) $

\ix ->

let (Z :. i :. j) = unlift ix

in getRow i a `dotP` getRow j b

where

b = A.transpose b' -- I assume row indexing is faster than column indexing...

(Z :. nrows :. _ ) = unlift $ shape a

(Z :. _ :. ncols) = unlift $ shape b

This, of course, gives me errors right now because I'm calling getRow and dotP from within the generation function, which expects Exp[ression]s, not Acc[elerated computation]s.

So maybe I need to replace that line with an inner for loop? Is there an easy way to do that with Accelerate?

Thanks for your help,

- Clark

[1] http://hackage.haskell.org/package/accelerate

I've recently started learning about cuda and hetrogenous programming, and have been using accelerate [1] to help me out. Right now, I'm running into trouble in that I can't call parallel code from sequential code. Turns out GPUs aren't exactly like Repa =P.

Here's what I have so far:

import qualified Data.Array.Accelerate as A

import Data.Array.Accelerate ( (:.)(..)

, Acc

, Vector

, Scalar

, Elt

, fold

, slice

, constant

, Array

, Z(..), DIM1, DIM2

, fromList

, All(..)

, generate

, lift, unlift

, shape

)

import Data.Array.Accelerate.Interpreter ( run )

dotP :: (Num a, Elt a) => Acc (Vector a) -> Acc (Vector a) -> Acc (Scalar a)

dotP xs ys = fold (+) 0 $ A.zipWith (*) xs ys

type Matrix a = Array DIM2 a

getRow :: Elt a => Int -> Acc (Matrix a) -> Acc (Vector a)

getRow n mat = slice mat . constant $ Z :. n :. All

-- Naive matrix multiplication:

--

-- index (i, j) is equal to the ith row of 'a' `dot` the jth row of 'b'

matMul :: A.Acc (Matrix Double) -> A.Acc (Matrix Double) -> A.Acc (Matrix Double)

matMul a b' = A.generate (constant $ Z :. nrows :. ncols) $

\ix ->

let (Z :. i :. j) = unlift ix

in getRow i a `dotP` getRow j b

where

b = A.transpose b' -- I assume row indexing is faster than column indexing...

(Z :. nrows :. _ ) = unlift $ shape a

(Z :. _ :. ncols) = unlift $ shape b

This, of course, gives me errors right now because I'm calling getRow and dotP from within the generation function, which expects Exp[ression]s, not Acc[elerated computation]s.

So maybe I need to replace that line with an inner for loop? Is there an easy way to do that with Accelerate?

Thanks for your help,

- Clark

[1] http://hackage.haskell.org/package/accelerate

_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe <at> haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe