Michael Lesniak | 13 Nov 15:15

Strange parallel behaviour with Ubuntu Karmic / GHC 6.10.4

Hello,

I'm currently developing some applications with explicit threading
using forkIO and have strange behaviour on my freshly installed Ubuntu
Karmic 9.10 (Kernel 2.6.31-14 SMP).

Setup:
Machine A: Quadcore, Ubuntu 9.04, Kernel  2.6.28-13 SMP
Machine B: AMD Opteron 875, 8 cores, Kernel 2.6.18-164 SMP (some Red Hat)
Machine C: Dual-Core, Ubuntu 9.10, Kernel 2.6.31-14 SMP
Compiler on all machines: ghc 6.10.4 (downloaded from GHC's official website)

Program, Compilation, Execution:
A simple task queue with independent tasks and explicit parallelization
(hence it should deliver more or less perfect speedup).
With one core, wall times of around 16 seconds are expected; with two,
a bit more than 8 seconds.

Since I used the same sources and Makefiles on all machines, all files
were compiled with -threaded and started with +RTS -N2 -RTS.

Testing:
Machine A: Ok (meaning works and delivers the expected speedup)
Machine B: Ok
Machine C: Not ok (with -N2 wall times around 14-15 seconds)

Looking at the core usage, for example with htop, I see that the
second core is not really used on C. Executing OpenMP programs shows
the expected speedup and usage of both cores, hence I do not think it's
a general Linux configuration problem.

So, after all the testing, I think it's either the Linux kernel or some
other component of Ubuntu 9.10. But Ubuntu is widely used and I did
not find any information regarding this problem. The simple solution
of installing the old version of Ubuntu would probably help but should
not be the way to go, should it?

I'd be glad for any hints or comments,
Michael

-- 
Dipl.-Inf. Michael C. Lesniak
University of Kassel
Programming Languages / Methodologies Research Group
Department of Computer Science and Electrical Engineering

Wilhelmshöher Allee 73
34121 Kassel

Phone: +49-(0)561-804-6269
Neil Brown | 15 Nov 22:49

Re: Strange parallel behaviour with Ubuntu Karmic / GHC 6.10.4


Michael Lesniak wrote:
> Hello,
>
> I'm currently developing some applications with explicit threading
> using forkIO and have strange behaviour on my freshly installed Ubuntu
> Karmic 9.10 (Kernel 2.6.31-14 SMP).
>
> Setup:
> Machine A: Quadcore, Ubuntu 9.04, Kernel 2.6.28-13 SMP
> Machine B: AMD Opteron 875, 8 cores, 2.6.18-164 SMP- (some redhat)
> Machine C: Dual-Core, Ubuntu 9.10, Kernel 2.6.31-14 SMP
> Compiler on all machines: ghc 6.10.4 (downloaded from GHCs official website)
>

Hi,

I have a dual-core Ubuntu 9.10 machine (running whatever GHC comes with
the distro -- 6.10.x), so if you put your test code somewhere that I can
get at, I can run it and see if I get the same effect.

Thanks,

Neil.
Michael Lesniak | 16 Nov 00:48

Re: Strange parallel behaviour with Ubuntu Karmic / GHC 6.10.4

Hello,

I've written a smaller example which reproduces the unusual behaviour.
Should I open a GHC-Ticket, too?

-- A small working example which describes the problems (I have) with GHC
-- 6.10.4, Ubuntu Karmic 9.10, explicit threading and core usage.
--
-- See http://www.haskell.org/pipermail/haskell-cafe/2009-November/069144.html
-- for the general description of the problem.
--
-- For comparison:
-- Compilation on both machines with
-- 
--     ghc --make -O2 -threaded Example.hs -o e -Wall
--
-- 
-- 1. Machine A: (Quadcore, Ubuntu 9.04)
-- a. With 1 thread:
-- time e +RTS -N1 -RTS 16
-- e +RTS -N1 -RTS 16  11,00s user 5,00s system 100% cpu 16,004 total
--
-- b. With 2 threads:
-- time e +RTS -N2 -RTS 16
-- e +RTS -N2 -RTS 16  11,44s user 4,58s system 197% cpu 8,102 total
--
--
-- 2. Machine C: (Dualcore, Ubuntu 9.10)
-- a. With 1 thread:
-- time e +RTS -N1 -RTS  16
--
-- real 0m16.414s
-- user 0m11.360s
-- sys  0m4.650s
--
-- b. With 2 threads:
-- time e +RTS -N2 -RTS  16
--
-- real 0m18.484s
-- user 0m14.320s
-- sys  0m5.940s
--
-------------------------------------------------------------------------------
module Main where

import GHC.Conc
import Control.Concurrent
import Control.Monad
import System.Posix.Clock
import System.Environment

-------------------------------------------------------------------------------
main :: IO ()
main = do
    -- Configuration
    args <- getArgs
    let threads = numCapabilities    -- number of threads determined by -N<...>
        taskDur = 1.0                -- seconds each task takes
        taskNum = (read . head) args -- Number of tasks is 1st parameter

    -- Generate a channel for the tasks to do and fill it with uniform and
    -- independent tasks. The other channel receives a message for each task
    -- which is finished.
    queue    <- newChan
    finished <- newChan
    writeList2Chan queue (replicate taskNum taskDur)

    -- Fork threads
    replicateM_ threads (forkIO (thread queue finished))

    -- Wait until the queue is empty
    replicateM_ taskNum (readChan finished)

-------------------------------------------------------------------------------
thread :: Chan Double -> Chan Int -> IO ()
thread queue finished =
    forever $ do
        task <- readChan queue
        workFor task
        writeChan finished 1

-------------------------------------------------------------------------------
-- | Generates work for @s@ seconds.
workFor :: Double -> IO ()
workFor s = do
    now <- getTime ThreadCPUTime
    repeat (time2Double now + s)
  where repeat fs = do
            now <- nSqrt 10000 `pseq` getTime ThreadCPUTime
            let f = time2Double now
            unless (f >= fs) $ repeat fs
        time2Double t =
            fromIntegral (sec t) + (fromIntegral (nsec t) / 1000000000)
        -- Takes the square root of 2^1000 repeatedly (50 iterations), once
        -- for each of the n list elements. The parameter n is meant to
        -- ensure that GHC does not optimize the work away.
        -- (In fact, I'm not sure this is needed...)
        nSqrt n =
            let sqs = map (\_ -> iterate sqrt (2^1000) !! 50) [1..n]
            in foldr seq 1 sqs
Neil Brown | 16 Nov 10:38

Re: Strange parallel behaviour with Ubuntu Karmic / GHC 6.10.4


Michael Lesniak wrote:
> Hello,
>
> I've written a smaller example which reproduces the unusual behaviour.
> Should I open a GHC-Ticket, too?
>

Hi,

I get these results:

$ time ./Temp +RTS -N1 -RTS 16

real 0m16.010s
user 0m10.869s
sys  0m5.144s

$ time ./Temp +RTS -N2 -RTS 16

real 0m12.794s
user 0m13.341s
sys  0m7.136s

Looking at top, the second version used ~160% CPU time (i.e. it was
using both cores fairly well). So I don't think I get the same bad
behaviour as you. Those sys times look high by the way -- I guess it's
all the calls to getTime? I wonder if that number might be causing the
problem; can you replicate it with lower sys times?

Thanks,

Neil.
Michael Lesniak | 16 Nov 10:59

Re: Strange parallel behaviour with Ubuntu Karmic / GHC 6.10.4

Hello,


> getTime?  I wonder if that number might be causing the problem; can you
> replicate it with lower sys times?

That was it! Thanks Neil!

When I'm using some number crunching without getTime it works (with
more or less the expected speedup and usage of two cores) on my Ubuntu
9.10, too.

Out of curiosity, the question is still open: Why does the old example
(using getTime) work so much better on an older version of
Ubuntu/RedHat and not on the new ones?

Kind regards,
Michael
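
For reference, a getTime-free variant of the example might look like the following. This is only a sketch: the thread does not show the number crunching that was actually used, so the function crunch, the task size of 5000000, and the default of 16 tasks are hypothetical stand-ins. Each task does a fixed amount of arithmetic instead of polling the thread CPU clock, so the worker loop makes no clock system calls.

```haskell
module Main where

import Control.Concurrent (Chan, forkIO, newChan, readChan, writeChan,
                           writeList2Chan)
import Control.Exception (evaluate)
import Control.Monad (forever, replicateM, replicateM_)
import Data.List (foldl')
import GHC.Conc (numCapabilities)
import System.Environment (getArgs)

-- Hypothetical stand-in for workFor: sums the square roots of 1..n.
-- The amount of work is fixed by n, not by a clock.
crunch :: Int -> Double
crunch n = foldl' (\acc i -> acc + sqrt (fromIntegral i)) 0 [1 .. n]

-- Takes task sizes from the queue, forces the result inside the worker
-- thread, and reports it on the second channel.
worker :: Chan Int -> Chan Double -> IO ()
worker queue finished = forever $ do
    n <- readChan queue
    r <- evaluate (crunch n)
    writeChan finished r

main :: IO ()
main = do
    args <- getArgs
    let taskNum  = case args of          -- number of tasks, default 16
                     (a:_) -> read a
                     []    -> 16
        taskSize = 5000000               -- assumed size of each task
    queue    <- newChan
    finished <- newChan
    writeList2Chan queue (replicate taskNum taskSize)
    -- One worker per capability, as set by +RTS -N<...>
    replicateM_ numCapabilities (forkIO (worker queue finished))
    results <- replicateM taskNum (readChan finished)
    print (length results)
```

Compiled with -threaded and run with +RTS -N1 and +RTS -N2 as in the original example, the sys time reported by time should stay small, which makes it easier to tell whether the missing speedup comes from the clock calls.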
Neil Brown | 16 Nov 11:33

Re: Strange parallel behaviour with Ubuntu Karmic / GHC 6.10.4


Michael Lesniak wrote:
> Hello,
>
>
>> getTime? I wonder if that number might be causing the problem; can you
>> replicate it with lower sys times?
>>
> That was it! Thanks Neil!
>
> When I'm using some number crunching without getTime it works (with
> more or less the expected speedup and usage of two cores) on my Ubuntu
> 9.10, too.
>
> Out of curiosity, the question is still open: Why does the old example
> (using getTime) work so much better on an older version of
> Ubuntu/RedHat and not on the new ones?
>
>

Your kernels were:

Setup:
Machine A: Quadcore, Ubuntu 9.04, Kernel 2.6.28-13 SMP
Machine B: AMD Opteron 875, 8 cores, 2.6.18-164 SMP- (some redhat)
Machine C: Dual-Core, Ubuntu 9.10, Kernel 2.6.31-14 SMP

Looking at the implementation of getTime ThreadCPUTime in the clock
package, it calls clock_gettime(CLOCK_THREAD_CPUTIME_ID,..). According
to this page
(http://www.h-online.com/open/news/item/Kernel-Log-What-s-new-in-2-6-29-Part-8-Faster-start-up-and-other-behind-the-scenes-changes-740591.html),
the changes in 2.6.29 (changes which only your Ubuntu 9.10 machine has)
included a patch
(http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=c742b31c03f37c5c499178f09f57381aa6c70131)
which altered the implementation of that function.

Perhaps on some multi-processor machines the new implementation
effectively serialises the code? I know there used to be issues of
whether some of the timers were synchronised across processors/cores
(to stop them appearing to go backwards), so maybe something with the
timers and their synchronisations effectively stops your program
running in parallel.

If it helps, my machine is: "Intel(R) Core(TM)2 Duo CPU E8400 @ 3.00GHz"
according to /proc/cpuinfo.

Thanks,

Neil.
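
One way to probe the hypothesis above is to measure how long a single clock call takes on each machine. The sketch below is an assumption-laden stand-in, not the code from the thread: perCall is a hypothetical helper, and it times getCPUTime from base, which reads the process CPU time rather than the thread CPU time. To test the call discussed here, getTime ThreadCPUTime from the clock package would be passed to perCall instead.

```haskell
module Main where

import Control.Monad (replicateM_)
import GHC.Clock (getMonotonicTime)
import System.CPUTime (getCPUTime)

-- | Average wall-clock seconds per call of an IO action, over n calls.
-- Hypothetical helper; n should be positive.
perCall :: Int -> IO a -> IO Double
perCall n act = do
    start <- getMonotonicTime
    replicateM_ n act
    end <- getMonotonicTime
    return ((end - start) / fromIntegral n)

main :: IO ()
main = do
    -- getCPUTime is a stand-in (process CPU time, not thread CPU time).
    t <- perCall 100000 getCPUTime
    putStrLn ("seconds per clock call: " ++ show t)
```

Comparing the reported figure between the 2.6.28 and 2.6.31 machines, and between a single thread and several threads calling the clock concurrently, would show whether the call itself became slower or whether it only hurts under contention.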
