Dmitry Vyal | 17 Oct 09:07 2012

poor performance when generating random text

Hello everyone,

I've written a snippet which generates a file full of random strings.
When compiled with -O2 on ghc-7.6, the generation speed is about 2 MB per
second, which is on par with interpreted PHP. I find that rather
disappointing. Maybe I've missed something trivial? Any suggestions and
explanations are welcome. :)

% cat ext_sort.hs
import qualified Data.Text as T
import System.Random
import Control.Exception
import Control.Monad

import System.IO
import qualified Data.Text.IO as TI

gen_string g = let (len, g') = randomR (50, 450) g
                in T.unfoldrN len rand_text (len, g')
  where rand_text (0,_) = Nothing
        rand_text (k,g) = let (c, g') = randomR ('a','z') g
                          in Just (c, ((k-1), g'))

write_corpus file = bracket (openFile file WriteMode) hClose $ \h -> do
   let size = 100000
   sequence $ replicate size $ do
     g <- newStdGen
     let text = gen_string g
     TI.hPutStrLn h text


Gregory Collins | 17 Oct 09:36 2012

Re: poor performance when generating random text

System.Random is very slow. Try the mwc-random package from Hackage.
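
For anyone who has not used mwc-random before, here is a minimal sketch of drawing random lowercase letters with it. The helper name `randomLetter` is made up for illustration, and sampling the code point as an Int and converting with `chr` is just one way to do it. The generator is created once and then reused for every draw.

```haskell
import Control.Monad (replicateM)
import Data.Char (chr, ord)
import System.Random.MWC (GenIO, createSystemRandom, uniformR)

-- Hypothetical helper: draw one lowercase letter by sampling its
-- code point as an Int in the inclusive range ['a' .. 'z'].
randomLetter :: GenIO -> IO Char
randomLetter gen = chr <$> uniformR (ord 'a', ord 'z') gen

main :: IO ()
main = do
  gen <- createSystemRandom              -- seed once, then reuse
  letters <- replicateM 10 (randomLetter gen)
  putStrLn letters
```

The generator is mutable state living in IO, so there is no new generator value to thread through each call as there is with `randomR` from System.Random.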

On Wed, Oct 17, 2012 at 9:07 AM, Dmitry Vyal <akamaus <at> gmail.com> wrote:
Hello everyone,

I've written a snippet which generates a file full of random strings. When compiled with -O2 on ghc-7.6, the generation speed is about 2 MB per second, which is on par with interpreted PHP. I find that rather disappointing. Maybe I've missed something trivial? Any suggestions and explanations are welcome. :)

% cat ext_sort.hs
import qualified Data.Text as T
import System.Random
import Control.Exception
import Control.Monad

import System.IO
import qualified Data.Text.IO as TI

gen_string g = let (len, g') = randomR (50, 450) g
               in T.unfoldrN len rand_text (len, g')
 where rand_text (0,_) = Nothing
       rand_text (k,g) = let (c, g') = randomR ('a','z') g
                         in Just (c, ((k-1), g'))

write_corpus file = bracket (openFile file WriteMode) hClose $ \h -> do
  let size = 100000
  sequence $ replicate size $ do
    g <- newStdGen
    let text = gen_string g
    TI.hPutStrLn h text

main = do
  putStrLn "generating text corpus"
  write_corpus "test.txt"



% cat ext_sort.prof
        Wed Oct 17 10:59 2012 Time and Allocation Profiling Report (Final)

           ext_sort +RTS -p -RTS

        total time  =       32.56 secs   (32558 ticks @ 1000 us, 1 processor)
        total alloc = 12,742,917,332 bytes  (excludes profiling overheads)

COST CENTRE                MODULE  %time %alloc

gen_string.rand_text.(...) Main     70.7   69.8
gen_string                 Main     17.6   15.8
gen_string.rand_text       Main      5.4   13.3
write_corpus.\             Main      4.3    0.8


                                                           individual      inherited
COST CENTRE                       MODULE  no.   entries  %time %alloc   %time %alloc

MAIN                              MAIN     67         0    0.0    0.0   100.0  100.0
 main                             Main    135         0    0.0    0.0   100.0  100.0
  write_corpus                    Main    137         0    0.0    0.0   100.0  100.0
   write_corpus.\                 Main    138         1    4.3    0.8   100.0  100.0
    write_corpus.\.text           Main    140    100000    0.0    0.0    95.7   99.2
     gen_string                   Main    141    100000   17.6   15.8    95.7   99.2
      gen_string.g'               Main    147    100000    0.0    0.0     0.0    0.0
      gen_string.rand_text        Main    144  25109743    5.4   13.3    77.5   83.2
       gen_string.rand_text.g'    Main    148  24909743    0.6    0.0     0.6    0.0
       gen_string.rand_text.(...) Main    146  25009743   70.7   69.8    70.7   69.8
       gen_string.rand_text.c     Main    145  25009743    0.8    0.0     0.8    0.0
      gen_string.len              Main    143    100000    0.0    0.0     0.0    0.0
      gen_string.(...)            Main    142    100000    0.6    0.3     0.6    0.3

_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe <at> haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe



--
Gregory Collins <greg <at> gregorycollins.net>
Alfredo Di Napoli | 17 Oct 10:45 2012

Re: poor performance when generating random text

What about this? I've tested it on my PC and it seems pretty fast. The trick is to create the generator only once. Not sure if the INLINE pragmas help, though:


import Control.Monad (replicateM, replicateM_)
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as CB
import Data.Word (Word8)
import System.IO
import System.Random.MWC


{- | Converts a Char to a Word8. Taken from MissingH. -}
c2w8 :: Char -> Word8
c2w8 = fromIntegral . fromEnum


charRangeStart :: Word8
charRangeStart = c2w8 'a'
{-# INLINE charRangeStart #-}

charRangeEnd :: Word8
charRangeEnd = c2w8 'z'
{-# INLINE charRangeEnd #-}

-- | Generates a random string of 50 to 450 lowercase ASCII letters.
genString :: GenIO -> IO B.ByteString
genString g = do
    randomLen <- uniformR (50 :: Int, 450 :: Int) g
    str <- replicateM randomLen $ uniformR (charRangeStart, charRangeEnd) g
    return $ B.pack str


-- | Writes 100000 random lines to the file, reusing a single generator.
writeCorpus :: FilePath -> IO ()
writeCorpus file = withFile file WriteMode $ \h -> do
  let size = 100000
  withSystemRandom $ \gen ->
      replicateM_ size $ do
        text <- genString gen
        CB.hPutStrLn h text

main :: IO ()
main = writeCorpus "test.txt"
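
The "create the generator only once" idea can also be tried with plain System.Random, by threading one StdGen through every string instead of calling newStdGen per line. The following is only a sketch: the names `genString` and `corpus` are chosen here for illustration, it builds plain Strings rather than Text, and the thread does not establish how much of the speedup this change alone would give (the profile attributes most of the time to the `randomR ('a','z')` binding itself).

```haskell
import Data.List (unfoldr)
import System.Random (StdGen, newStdGen, randomR)

-- Produce one random lowercase string of length 50..450, together
-- with the advanced generator so the caller can keep using it.
genString :: StdGen -> (String, StdGen)
genString g0 = go len g1
  where
    (len, g1) = randomR (50, 450 :: Int) g0
    go :: Int -> StdGen -> (String, StdGen)
    go 0 g = ("", g)
    go k g =
      let (c, g')   = randomR ('a', 'z') g
          (cs, g'') = go (k - 1) g'
      in (c : cs, g'')

-- An infinite stream of strings, all drawn from a single generator.
corpus :: StdGen -> [String]
corpus = unfoldr (Just . genString)

main :: IO ()
main = do
  g <- newStdGen                        -- called once, not once per line
  mapM_ putStrLn (take 5 (corpus g))
```

Because `genString` returns the advanced generator, each string starts from where the previous one stopped, so no generator state is reused between lines.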



A.
Dmitry Vyal | 17 Oct 21:10 2012

Re: poor performance when generating random text

On 10/17/2012 12:45 PM, Alfredo Di Napoli wrote:
> What about this? I've tested on my pc and seems pretty fast. The trick
> is to generate the gen only once. Not sure if the inlines helps, though:
> ...

Wow, haskell-cafe is a wonderful place! In just two hours the program's run
time automagically improved 20x ;) Thanks Alfredo, the code works wonderfully.
Compared to my implementation it's 2.5 sec vs 50 sec on my laptop.
It would be interesting to see how it compares to C now.

Inlining makes about a 50x difference when the code is compiled without
optimization. A nice example.

Best wishes,
Dmitry
Alfredo Di Napoli | 17 Oct 21:28 2012

Re: poor performance when generating random text

Glad to have been helpful :)

Bests,
Alfredo
