Andrey Yankin | 13 Jan 22:58 2013
Picon

Repa: traverse delayed array multiple times

Greetings to all!

Currently I'm trying to do physical simulations using finite difference method. To accomplish this I need to repeat some manipulations on 2-dimensional grid several thousands or millions times, you know. Is it possible to do using repa?

I wrote this rough sketch that shows what I am into. Apparently, program is severely slow. I think reason is:
"Every time an element is requested from a delayed array it is calculated anew, which means that delayed arrays are inefficient when the data is needed multiple times"
(http://www.haskell.org/haskellwiki/Numeric_Haskell:_A_Repa_Tutorial).

I started diving into Guiding Parallel Array Fusion with Indexed Types mentioned there. Is it a right place to find any answer to my question?

To summarize, I want to apply my function to Array several thousand times as fast as possible (using multi-threading and GPGPU). It seems I need a lot of memory and throughput. Do I have to call iterate only once at a time or in some kind of batches. What are my possibilities?

Thanks in advance!

Example program:

import System.Environment
import Data.Array.Repa as Repa

h = 50
w = 50

grid = fromListUnboxed (Z :. h :. w) $ take (h * w) [1, 1..]

main = do
  args <- getArgs
  let n = read $ head args
  g <- computeP $ accumulate n grid :: IO (Array U DIM2 Int)
  putStrLn "Ok"

accumulate :: Int -> Array U DIM2 Int -> Array D DIM2 Int
accumulate n g = repaIterate step n $ delay g

step :: (DIM2 -> Int) -> DIM2 -> Int
step f (Z :. i :. j) = center + right + left + top + bottom
 where
  left   = f'  j      (i - 1)
  center = f'  j       i
  right  = f'  j      (i + 1)
  top    = f' (j + 1)  i
  bottom = f' (j - 1)  i
  f' j i   =
    if j<0 || i<0 || i>=h || j>=w
    then 0
    else f (Z :. j :. i)

repaIterate :: ((DIM2 -> Int) -> DIM2 -> Int) -> Int -> Array D DIM2 Int -> Array D DIM2 Int
repaIterate f n = (!! n) . iterate (\array -> traverse array id f)


Compilation:

ghc -O2 -threaded -rtsopts --make repatest.hs

time output:

./repatest 3 +RTS -N2 -H  0,02s user 0,02s system 109% cpu 0,033 total
./repatest 4 +RTS -N2 -H  0,09s user 0,02s system 126% cpu 0,089 total
./repatest 5 +RTS -N2 -H  0,30s user 0,02s system 155% cpu 0,200 total
./repatest 6 +RTS -N2 -H  1,16s user 0,06s system 185% cpu 0,663 total
./repatest 7 +RTS -N2 -H  5,63s user 0,24s system 191% cpu 3,071 total
./repatest 8 +RTS -N2 -H  27,72s user 0,89s system 191% cpu 14,960 total
./repatest 8 +RTS -N1 -H  26,23s user 0,19s system 100% cpu 26,388 total

_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe <at> haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe
Dominic Steinitz | 14 Jan 09:08 2013

Re: Repa: traverse delayed array multiple times

Andrey Yankin <yankin013 <at> gmail.com> writes:

> 
> 
> Greetings to all!
> 
repa?I wrote this rough sketch that shows what I am into. Apparently, program 
is severely slow. I think reason is:"Every time an element is requested from a 
delayed array it is calculated
>  anew, which means that delayed arrays are inefficient when the data is 
> needed multiple 
times"(http://www.haskell.org/haskellwiki/Numeric_Haskell:_A_Repa_Tutorial).I 
started diving into Guiding Parallel Array Fusion with Indexed Types mentioned 
there. Is it a right place to find any answer to my question?
> 

It's certainly possible to do this in repa. As I understand it, that is pretty 
much what repa is for. I'm not sure about your explanation for the slowdown 
although it sounds plausible.

I amended your code slightly and it runs pretty quickly for me using my 2 
processors!

updater :: Source r Int => Array r DIM2 Int -> Array D DIM2 Int
updater a = traverse a id step

updaterM :: Monad m => Int -> Array U DIM2 Int -> m (Array U DIM2 Int)
updaterM n = foldr (>=>) return (replicate n (computeP . updater))

and replace

  g <- computeP $ accumulate n grid :: IO (Array U DIM2 Int)

by

  g <- updaterM n grid

BTW I think you can use stencils for what you are doing (I assume it is some 
sort of relaxation method) as the coefficients are constant. I think this 
should speed things up further as you won't be testing the boundary conditions 
on every loop.

Dominic.
Andrey Yankin | 14 Jan 11:18 2013
Picon

Re: Repa: traverse delayed array multiple times

Hi, Dominic!

Thanks for advice. I will definitely use it for now.


When I wrote

> Do I have to call iterate only once at a time

It was obscure and has wrong words.
I was trying to ask if it is necessary to run computeP every time I want to modify(traverse) an array?
I was contemplating similar solution as you advise, but there is a concern.
I want to modify it thousands of times and do not want to allocate and garbage more memory than needed.

Here is the implementation I dream of:
There must be function traverseNTimes that takes also an arbitrary number (the number of times to repeat traversal).
Then it allocates but two new blocks and uses them to iterate the process as needed. No superfluous memory consumption.
Is it unusual case? Is it worth thinking about? Do it give any performance gain?

Actually I started with repa only yesterday, so I was thinking of delayed or cursored arrays which are capable of magic.
Now I am beginning to see that it is not the case, though I don't know it for sure yet... )

> BTW I think you can use stencils for what you are doing (I assume it is some
> sort of relaxation method) as the coefficients are constant. I think this
> should speed things up further as you won't be testing the boundary conditions
> on every loop.
Sadly, there are much more sophisticated formulae and more than one boundary conditions on each side (and these conditions are moving)...


2013/1/14 Dominic Steinitz <dominic <at> steinitz.org>
Andrey Yankin <yankin013 <at> gmail.com> writes:

>
>
> Greetings to all!
>
repa?I wrote this rough sketch that shows what I am into. Apparently, program
is severely slow. I think reason is:"Every time an element is requested from a
delayed array it is calculated
>  anew, which means that delayed arrays are inefficient when the data is
> needed multiple
times"(http://www.haskell.org/haskellwiki/Numeric_Haskell:_A_Repa_Tutorial).I
started diving into Guiding Parallel Array Fusion with Indexed Types mentioned
there. Is it a right place to find any answer to my question?
>

It's certainly possible to do this in repa. As I understand it, that is pretty
much what repa is for. I'm not sure about your explanation for the slowdown
although it sounds plausible.

I amended your code slightly and it runs pretty quickly for me using my 2
processors!

updater :: Source r Int => Array r DIM2 Int -> Array D DIM2 Int
updater a = traverse a id step

updaterM :: Monad m => Int -> Array U DIM2 Int -> m (Array U DIM2 Int)
updaterM n = foldr (>=>) return (replicate n (computeP . updater))

and replace

  g <- computeP $ accumulate n grid :: IO (Array U DIM2 Int)

by

  g <- updaterM n grid

BTW I think you can use stencils for what you are doing (I assume it is some
sort of relaxation method) as the coefficients are constant. I think this
should speed things up further as you won't be testing the boundary conditions
on every loop.

Dominic.




_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe <at> haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe <at> haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Gmane