Branimir Maksimovic | 18 Dec 21:42 2012
Picon

my Fasta is slow ;(

This time I have tried fasta benchmark since current entries does not
display correct output.
Program is copy of mine http://benchmarksgame.alioth.debian.org/u64q/program.php?test=fasta&lang=gpp&id=1
c++ benchmark, but unfortunately executes more than twice time.

Seems to me that culprit  is in function random as I have tested rest of code
and didn't found speed related  problems.

bmaxa <at> maxa:~/shootout/fasta$ time ./fastahs 25000000 > /dev/null

real    0m5.262s
user    0m5.228s
sys     0m0.020s

bmaxa <at> maxa:~/shootout/fasta$ time ./fastacpp 25000000 > /dev/null

real    0m2.075s
user    0m2.056s
sys     0m0.012s

Since I am planning to contribute program, perhaps someone can
see a problem to speed it up at least around 3.5 secs which is 
speed of bench that display incorrect result  (in 7.6.1).

Program follows:

{-# LANGUAGE BangPatterns #-}
{-  The Computer Language Benchmarks Game

    http://shootout.alioth.debian.org/

    contributed by Branimir Maksimovic
-}

import System.Environment
import System.IO.Unsafe

import Data.IORef
import Data.Array.Unboxed
import Data.Array.Storable
import Data.Array.Base
import Data.Word

import Foreign.Ptr
import Foreign.C.Types

type A = UArray Int Word8
type B = StorableArray Int Word8
type C = (UArray Int Word8,UArray Int Double)

foreign import ccall unsafe "stdio.h" 
     puts  :: Ptr a -> IO ()
foreign import ccall unsafe "string.h" 
     strlen :: Ptr a -> IO CInt

main :: IO ()     
main = do
    n <- getArgs >>= readIO.head

    let !a = (listArray (0,(length alu)-1) 
             $ map (fromIntegral. fromEnum) alu:: A)
    make "ONE" "Homo sapiens alu" (n*2) $ Main.repeat a (length alu)
    make "TWO"  "IUB ambiguity codes" (n*3) $ random iub
    make "THREE" "Homo sapiens frequency" (n*5) $ random homosapiens

make :: String -> String -> Int -> IO Word8 -> IO ()
{-# INLINE make #-}
make id desc n f = do
    let lst = ">" ++ id ++ " " ++ desc
    a <- (newListArray (0,length lst) 
        $ map (fromIntegral. fromEnum) lst:: IO B)
    unsafeWrite a (length lst) 0
    pr a
    make' n 0
    where 
        make' :: Int -> Int -> IO ()
        make' !n !i = do
            let line = (unsafePerformIO $ 
                        newArray (0,60) 0 :: B)
            if n > 0
                then do
                    !c <- f
                    unsafeWrite line i c
                    if i+1 >= 60 
                        then do
                            pr line
                            make' (n-1) 0
                        else 
                            make' (n-1) (i+1)
                else do
                    unsafeWrite line i 0
                    l <- len line
                    if l /= 0
                        then pr line
                        else return ()

pr :: B -> IO ()
pr line = withStorableArray line (\ptr -> puts ptr)
len :: B -> IO CInt
len line  = withStorableArray line (\ptr -> strlen ptr)

repeat :: A -> Int -> IO Word8
repeat xs !n = do
        let v = unsafePerformIO $ newIORef 0
        !i <- readIORef v
        if i+1 >= n
            then writeIORef v 0
            else writeIORef v (i+1)
        return $ xs `unsafeAt` i

random :: C -> IO Word8
random (a,b) = do 
        !rnd <- rand
        let 
            find :: Int -> IO Word8
            find !i = 
                let 
                    !c = a `unsafeAt` i
                    !p = b `unsafeAt` i
                in if p >= rnd
                    then return c
                    else find (i+1)
        find 0

rand :: IO Double
{-# INLINE rand #-}
rand = do
    !seed <- readIORef last
    let
        newseed = (seed * ia + ic) `rem` im
        newran  =  fromIntegral newseed * rimd
        rimd      = 1.0 / (fromIntegral im)
        im, ia, ic :: Int
        im  = 139968
        ia  = 3877
        ic  = 29573
    writeIORef last newseed
    return newran
    where 
        last = unsafePerformIO $ newIORef 42
    
alu    :: [Char]    
alu = 
    "GGCCGGGCGCGGTGGCTCACGCCTGTAATCCCAGCACTTTGG\
    \GAGGCCGAGGCGGGCGGATCACCTGAGGTCAGGAGTTCGAGA\
    \CCAGCCTGGCCAACATGGTGAAACCCCGTCTCTACTAAAAAT\
    \ACAAAAATTAGCCGGGCGTGGTGGCGCGCGCCTGTAATCCCA\
    \GCTACTCGGGAGGCTGAGGCAGGAGAATCGCTTGAACCCGGG\
    \AGGCGGAGGTTGCAGTGAGCCGAGATCGCGCCACTGCACTCC\
    \AGCCTGGGCGACAGAGCGAGACTCCGTCTCAAAAA"

mkCum :: [(Char,Double)] -> [(Word8,Double)]
mkCum lst = map (\(c,p) -> ((fromIntegral.fromEnum) c,p)) $
              scanl1 (\(_,p) (c',p') -> (c', p+p')) lst

homosapiens, iub :: C

iub' = mkCum [('a',0.27),('c',0.12),('g',0.12),('t',0.27),('B',0.02)
        ,('D',0.02),('H',0.02),('K',0.02),('M',0.02),('N',0.02)
        ,('R',0.02),('S',0.02),('V',0.02),('W',0.02),('Y',0.02)]

homosapiens' = mkCum [('a',0.3029549426680),('c',0.1979883004921)
                ,('g',0.1975473066391),('t',0.3015094502008)]

iub = (listArray (0, (length iub')-1) $ map fst iub',
        listArray (0, (length iub')-1) $ map snd iub')

homosapiens = (listArray (0, (length homosapiens')-1) $ map fst homosapiens',
                listArray (0, (length homosapiens')-1) $ map snd homosapiens')

_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe <at> haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe
Bryan O'Sullivan | 20 Dec 00:31 2012

Re: my Fasta is slow ;(

I took your Haskell program as a base and have refactored it into a version that is about the same speed as your original C++ program. Will follow up with details when I have a little more time.

On Tue, Dec 18, 2012 at 12:42 PM, Branimir Maksimovic <bmaxa <at> hotmail.com> wrote:
This time I have tried fasta benchmark since current entries does not
display correct output.
c++ benchmark, but unfortunately executes more than twice time.

Seems to me that culprit  is in function random as I have tested rest of code
and didn't found speed related  problems.

bmaxa <at> maxa:~/shootout/fasta$ time ./fastahs 25000000 > /dev/null

real    0m5.262s
user    0m5.228s
sys     0m0.020s

bmaxa <at> maxa:~/shootout/fasta$ time ./fastacpp 25000000 > /dev/null

real    0m2.075s
user    0m2.056s
sys     0m0.012s

Since I am planning to contribute program, perhaps someone can
see a problem to speed it up at least around 3.5 secs which is 
speed of bench that display incorrect result  (in 7.6.1).

Program follows:

{-# LANGUAGE BangPatterns #-}
{-  The Computer Language Benchmarks Game


    contributed by Branimir Maksimovic
-}

import System.Environment
import System.IO.Unsafe

import Data.IORef
import Data.Array.Unboxed
import Data.Array.Storable
import Data.Array.Base
import Data.Word

import Foreign.Ptr
import Foreign.C.Types

type A = UArray Int Word8
type B = StorableArray Int Word8
type C = (UArray Int Word8,UArray Int Double)

foreign import ccall unsafe "stdio.h" 
     puts  :: Ptr a -> IO ()
foreign import ccall unsafe "string.h" 
     strlen :: Ptr a -> IO CInt

main :: IO ()     
main = do
    n <- getArgs >>= readIO.head

    let !a = (listArray (0,(length alu)-1) 
             $ map (fromIntegral. fromEnum) alu:: A)
    make "ONE" "Homo sapiens alu" (n*2) $ Main.repeat a (length alu)
    make "TWO"  "IUB ambiguity codes" (n*3) $ random iub
    make "THREE" "Homo sapiens frequency" (n*5) $ random homosapiens

make :: String -> String -> Int -> IO Word8 -> IO ()
{-# INLINE make #-}
make id desc n f = do
    let lst = ">" ++ id ++ " " ++ desc
    a <- (newListArray (0,length lst) 
        $ map (fromIntegral. fromEnum) lst:: IO B)
    unsafeWrite a (length lst) 0
    pr a
    make' n 0
    where 
        make' :: Int -> Int -> IO ()
        make' !n !i = do
            let line = (unsafePerformIO $ 
                        newArray (0,60) 0 :: B)
            if n > 0
                then do
                    !c <- f
                    unsafeWrite line i c
                    if i+1 >= 60 
                        then do
                            pr line
                            make' (n-1) 0
                        else 
                            make' (n-1) (i+1)
                else do
                    unsafeWrite line i 0
                    l <- len line
                    if l /= 0
                        then pr line
                        else return ()

pr :: B -> IO ()
pr line = withStorableArray line (\ptr -> puts ptr)
len :: B -> IO CInt
len line  = withStorableArray line (\ptr -> strlen ptr)

repeat :: A -> Int -> IO Word8
repeat xs !n = do
        let v = unsafePerformIO $ newIORef 0
        !i <- readIORef v
        if i+1 >= n
            then writeIORef v 0
            else writeIORef v (i+1)
        return $ xs `unsafeAt` i

random :: C -> IO Word8
random (a,b) = do 
        !rnd <- rand
        let 
            find :: Int -> IO Word8
            find !i = 
                let 
                    !c = a `unsafeAt` i
                    !p = b `unsafeAt` i
                in if p >= rnd
                    then return c
                    else find (i+1)
        find 0

rand :: IO Double
{-# INLINE rand #-}
rand = do
    !seed <- readIORef last
    let
        newseed = (seed * ia + ic) `rem` im
        newran  =  fromIntegral newseed * rimd
        rimd      = 1.0 / (fromIntegral im)
        im, ia, ic :: Int
        im  = 139968
        ia  = 3877
        ic  = 29573
    writeIORef last newseed
    return newran
    where 
        last = unsafePerformIO $ newIORef 42
    
alu    :: [Char]    
alu = 
    "GGCCGGGCGCGGTGGCTCACGCCTGTAATCCCAGCACTTTGG\
    \GAGGCCGAGGCGGGCGGATCACCTGAGGTCAGGAGTTCGAGA\
    \CCAGCCTGGCCAACATGGTGAAACCCCGTCTCTACTAAAAAT\
    \ACAAAAATTAGCCGGGCGTGGTGGCGCGCGCCTGTAATCCCA\
    \GCTACTCGGGAGGCTGAGGCAGGAGAATCGCTTGAACCCGGG\
    \AGGCGGAGGTTGCAGTGAGCCGAGATCGCGCCACTGCACTCC\
    \AGCCTGGGCGACAGAGCGAGACTCCGTCTCAAAAA"

mkCum :: [(Char,Double)] -> [(Word8,Double)]
mkCum lst = map (\(c,p) -> ((fromIntegral.fromEnum) c,p)) $
              scanl1 (\(_,p) (c',p') -> (c', p+p')) lst

homosapiens, iub :: C

iub' = mkCum [('a',0.27),('c',0.12),('g',0.12),('t',0.27),('B',0.02)
        ,('D',0.02),('H',0.02),('K',0.02),('M',0.02),('N',0.02)
        ,('R',0.02),('S',0.02),('V',0.02),('W',0.02),('Y',0.02)]

homosapiens' = mkCum [('a',0.3029549426680),('c',0.1979883004921)
                ,('g',0.1975473066391),('t',0.3015094502008)]

iub = (listArray (0, (length iub')-1) $ map fst iub',
        listArray (0, (length iub')-1) $ map snd iub')

homosapiens = (listArray (0, (length homosapiens')-1) $ map fst homosapiens',
                listArray (0, (length homosapiens')-1) $ map snd homosapiens')


_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe <at> haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe <at> haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe
Bryan O'Sullivan | 28 Dec 00:58 2012

Re: my Fasta is slow ;(

On Tue, Dec 18, 2012 at 12:42 PM, Branimir Maksimovic <bmaxa <at> hotmail.com> wrote:


Seems to me that culprit  is in function random as I have tested rest of code
and didn't found speed related  problems.

The problem with your original program was that it was not pure enough. Because you stored your PRNG state in an IORef, you forced the program to allocate and case-inspect boxed Ints in its inner loop.

I refactored it slightly to make genRand and genRandom pure, and combined with using the LLVM back end, this doubled the program's performance, so that the Haskell program ran at the same speed as your C++ version.

The next bottleneck was that your program was performing floating point arithmetic in the inner loop. I changed it to precompute a small lookup table, followed by only using integer arithmetic in the inner loop (the same technique used by the fastest C fasta program). This further improved performance: the new Haskell code is 40% faster than the C++ program, and only ~20% slower than the C program that currently tops the shootout chart. The Haskell source is a little over half the size of the C source.

You can follow the work I did here: https://github.com/bos/shootout-fasta
_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe <at> haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe
Branimir Maksimovic | 28 Dec 03:05 2012
Picon

Re: my Fasta is slow ;(

Thank you. Your entry is great. Faster than fortran entry!
Dou you want to contribute at the site, or you want me to do it for you?

Date: Thu, 27 Dec 2012 15:58:40 -0800
Subject: Re: [Haskell-cafe] my Fasta is slow ;(
From: bos <at> serpentine.com
To: bmaxa <at> hotmail.com
CC: haskell-cafe <at> haskell.org

On Tue, Dec 18, 2012 at 12:42 PM, Branimir Maksimovic <bmaxa <at> hotmail.com> wrote:

Seems to me that culprit  is in function random as I have tested rest of code
and didn't found speed related  problems.

The problem with your original program was that it was not pure enough. Because you stored your PRNG state in an IORef, you forced the program to allocate and case-inspect boxed Ints in its inner loop.

I refactored it slightly to make genRand and genRandom pure, and combined with using the LLVM back end, this doubled the program's performance, so that the Haskell program ran at the same speed as your C++ version.

The next bottleneck was that your program was performing floating point arithmetic in the inner loop. I changed it to precompute a small lookup table, followed by only using integer arithmetic in the inner loop (the same technique used by the fastest C fasta program). This further improved performance: the new Haskell code is 40% faster than the C++ program, and only ~20% slower than the C program that currently tops the shootout chart. The Haskell source is a little over half the size of the C source.

You can follow the work I did here: https://github.com/bos/shootout-fasta
_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe <at> haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe
Bryan O'Sullivan | 28 Dec 10:18 2012

Re: my Fasta is slow ;(

I've already submitted it, thanks.

The Fortran program commits the same sin as the C++ one, of doing floating point arithmetic in the inner loop; that's why it's slow.

On Dec 27, 2012, at 18:05, Branimir Maksimovic <bmaxa <at> hotmail.com> wrote:

<!-- .hmmessage P { margin:0px; padding:0px } body.hmmessage { font-size: 10pt; font-family:Tahoma } -->
Thank you. Your entry is great. Faster than fortran entry!
Dou you want to contribute at the site, or you want me to do it for you?

Date: Thu, 27 Dec 2012 15:58:40 -0800
Subject: Re: [Haskell-cafe] my Fasta is slow ;(
From: bos <at> serpentine.com
To: bmaxa <at> hotmail.com
CC: haskell-cafe <at> haskell.org

On Tue, Dec 18, 2012 at 12:42 PM, Branimir Maksimovic <bmaxa <at> hotmail.com> wrote:

Seems to me that culprit  is in function random as I have tested rest of code
and didn't found speed related  problems.

The problem with your original program was that it was not pure enough. Because you stored your PRNG state in an IORef, you forced the program to allocate and case-inspect boxed Ints in its inner loop.

I refactored it slightly to make genRand and genRandom pure, and combined with using the LLVM back end, this doubled the program's performance, so that the Haskell program ran at the same speed as your C++ version.

The next bottleneck was that your program was performing floating point arithmetic in the inner loop. I changed it to precompute a small lookup table, followed by only using integer arithmetic in the inner loop (the same technique used by the fastest C fasta program). This further improved performance: the new Haskell code is 40% faster than the C++ program, and only ~20% slower than the C program that currently tops the shootout chart. The Haskell source is a little over half the size of the C source.

You can follow the work I did here: https://github.com/bos/shootout-fasta
_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe <at> haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Gmane