J Baptist | 27 Aug 22:43 2012

I/O overhead in opening and writing files

I'm looking into high-performance I/O, particularly on a tmpfs (in-memory) filesystem. This involves creating lots of little files. Unfortunately, it seems that Haskell's performance in this area is not comparable to that of C. I assume that this is because of the overhead involved in opening and closing files. Some cursory profiling confirmed this: most of the program's runtime is taken by openFile, hPutStr, and hClose.

I thought that it might be faster to call the C library functions exposed as foreign imports in System.Posix.Internals, and thereby cut out some of Haskell's overhead. This indeed improved performance, but the program is still nearly twice as slow as the corresponding C program.

I took some benchmarks. I wrote a program to create 500,000 files on a tmpfs filesystem and write an integer into each of them. I did this once in C, using open, and twice in Haskell, using openFile and c_open. Here are the results:

C program, using open and friends (gcc 4.4.3)
real    0m4.614s
user    0m0.380s
sys     0m4.200s

Haskell, using System.IO.openFile and friends (ghc 7.4.2)
real    0m14.892s
user    0m7.700s
sys     0m6.890s

Haskell, using System.Posix.Internals.c_open and friends (ghc 7.4.2)
real    0m7.372s
user    0m2.390s
sys     0m4.570s

My question is: why is this so slow? Could the culprit be the marshaling necessary to pass the parameters to the foreign functions? If I'm calling the low-level function c_open anyway, shouldn't performance be closer to C? Does anyone have suggestions for how to improve this?

If anyone is interested, I can provide the code I used for these benchmarks.
_______________________________________________
Glasgow-haskell-users mailing list
Glasgow-haskell-users <at> haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Johan Tibell | 27 Aug 22:48 2012

Re: I/O overhead in opening and writing files

On Mon, Aug 27, 2012 at 1:43 PM, J Baptist <arc38813 <at> hotmail.com> wrote:
> If anyone is interested, I can provide the code I used for these benchmarks.

Please do. You can paste them at http://hpaste.org/

Could you try using the Data.ByteString API? I don't have the code in
front of me, so I don't know whether the System.Posix API uses Strings.
If it does, that's most likely the issue.

-- Johan
Austin Seipp | 27 Aug 22:52 2012

Re: I/O overhead in opening and writing files

In this vein, you may be interested in trying out the unix-bytestring
package (it contains ByteString based bindings for POSIX I/O - but
you'll still need the unix package to get at the underlying file
descriptor.)

http://hackage.haskell.org/packages/archive/unix-bytestring/0.3.5.4/doc/html/System-Posix-IO-ByteString.html

-- 
Regards,
Austin
Donn Cave | 27 Aug 23:33 2012

Re: I/O overhead in opening and writing files

Quoth Johan Tibell <johan.tibell <at> gmail.com>,
...
> Could you try using the Data.ByteString API. I don't have the code in
> front of me so I don't know if the System.Posix API uses Strings. If
> it does, that's most likely the issue.

It does, but it can also read directly to Ptr Word8 (fdReadBuf), which
you'd think would be closer to hardware speed - but then you might lose
the advantage trying to peek the data out of the buffer.  In principle
you ought to be able to stuff that pointer right into a ByteString,
but don't know for sure that there's any public API for such.  I guess
you may be proposing to use Data.ByteString.hGet?
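
For illustration, here is a minimal sketch of reading from a file descriptor straight into a ByteString's buffer, with no String and no peeking in between. It assumes createAndTrim from Data.ByteString.Internal (an exposed but internal module of the bytestring package) together with fdReadBuf from the unix package. The name fdReadBS and the pipe used to feed it are hypothetical and exist only to keep the example self-contained.

```haskell
module Main (main) where

import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as BC
import Data.ByteString.Internal (createAndTrim)
import System.Posix.IO (closeFd, createPipe, fdReadBuf, fdWrite)
import System.Posix.Types (Fd)

-- | Read up to n bytes from fd directly into a freshly allocated
-- ByteString buffer; the result is trimmed to the bytes actually read.
-- (Hypothetical helper name, not part of any library.)
fdReadBS :: Fd -> Int -> IO B.ByteString
fdReadBS fd n =
  createAndTrim n $ \buf ->
    fromIntegral <$> fdReadBuf fd buf (fromIntegral n)

main :: IO ()
main = do
  -- A pipe stands in for a real file so the example needs no setup.
  (readEnd, writeEnd) <- createPipe
  _ <- fdWrite writeEnd "12345"  -- String write, only to feed the pipe
  closeFd writeEnd
  bytes <- fdReadBS readEnd 64
  closeFd readEnd
  BC.putStrLn bytes
```

No character set conversion happens on this path: the bytes arrive exactly as read(2) delivered them.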

Look out for character set conversions!  Even if String were affordable
in terms of resources, I'm not sure there's any way to avoid this
problem.

	Donn
wren ng thornton | 28 Aug 05:11 2012

Re: I/O overhead in opening and writing files

On 8/27/12 5:33 PM, Donn Cave wrote:
> It does, but it can also read directly to Ptr Word8 (fdReadBuf), which
> you'd think would be closer to hardware speed - but then you might lose
> the advantage trying to peek the data out of the buffer.  In principle
> you ought to be able to stuff that pointer right into a ByteString,
> but don't know for sure that there's any public API for such.

As Austin Seipp mentioned, there's unix-bytestring[1] which minimizes 
the amount of marshaling/conversion imposed by using a high-level 
language. And it includes the obvious conversion between Ptr Word8 and 
ByteString. If there's any remaining overhead, let me know and I'll do 
my best to eliminate it.

But, that's only for the reading and writing; opening files is another 
matter. If it's the *opening* of files that's causing the slowdown, then 
that has to be due to something in how GHC handles filename conversion 
et al.

[1] http://hackage.haskell.org/package/unix-bytestring
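
As a rough illustration of the write path without String or Handle overhead, the sketch below hands a ByteString's buffer directly to write(2). It deliberately uses only the unix and bytestring packages (createFile, fdWriteBuf, unsafeUseAsCStringLen) rather than unix-bytestring itself, so it is a stand-in for the idea, not that package's API. The helper name writeSmallFile and the path under /tmp are hypothetical. Note that the file name is still passed as a FilePath, so any cost of filename conversion on open remains.

```haskell
module Main (main) where

import qualified Data.ByteString.Char8 as BC
import Data.ByteString.Unsafe (unsafeUseAsCStringLen)
import Foreign.Ptr (castPtr)
import System.Posix.Files (ownerReadMode, ownerWriteMode, unionFileModes)
import System.Posix.IO (closeFd, createFile, fdWriteBuf)
import System.Posix.Types (ByteCount)

-- | Create (or truncate) a file and write the bytes to it without going
-- through String or a Handle. Returns the number of bytes written.
-- A robust version would loop on short writes and close the descriptor
-- in a bracket; both are omitted to keep the sketch short.
writeSmallFile :: FilePath -> BC.ByteString -> IO ByteCount
writeSmallFile path bytes = do
  fd <- createFile path (ownerReadMode `unionFileModes` ownerWriteMode)
  n  <- unsafeUseAsCStringLen bytes $ \(ptr, len) ->
          fdWriteBuf fd (castPtr ptr) (fromIntegral len)
  closeFd fd
  return n

main :: IO ()
main = do
  -- Hypothetical path; the benchmark in this thread wrote to a tmpfs mount.
  n <- writeSmallFile "/tmp/write-small-file-sketch.txt" (BC.pack "12345")
  print n
```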

-- 
Live well,
~wren
J Baptist | 28 Aug 00:25 2012

RE: I/O overhead in opening and writing files

Using ByteStrings and the C calls does indeed speed things up a bit, but not much.

real 0m6.053s
user 0m1.480s
sys 0m4.550s

For your interest:
The original version (with Strings and openFile): http://hpaste.org/73803
Faster (with Strings and c_open): http://hpaste.org/73802
Even faster (with ByteStrings and c_open): http://hpaste.org/73801

The problem may be that even with ByteStrings, we are stuck using show, and thus Strings, at some point.

Ideas?
Don Stewart | 28 Aug 00:30 2012

Re: I/O overhead in opening and writing files

Why are you using Show?


bytestring-show might be an option.

Remember: for speed, don't convert between String types.

Consider bytestring-mmap too.

Felipe Almeida Lessa | 28 Aug 03:19 2012

Re: I/O overhead in opening and writing files

On Mon, Aug 27, 2012 at 7:25 PM, J Baptist <arc38813 <at> hotmail.com> wrote:
> real 0m6.053s
> user 0m1.480s
> sys 0m4.550s

Do these timings include RTS startup?

-- 
Felipe.
J Baptist | 28 Aug 04:09 2012

RE: I/O overhead in opening and writing files

> From: felipe.lessa <at> gmail.com
> Do these timings include RTS startup?

Yes, this is the result of the time command on the whole executable.
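
Because time on the whole executable also counts RTS startup and shutdown, one way to separate the two is to time only the loop from inside the program. Below is a minimal sketch, assuming getMonotonicTime from GHC.Clock, which ships with base versions newer than the one in GHC 7.4.2. The helper timed and the placeholder workload are hypothetical; the real file-creation loop would take the workload's place.

```haskell
module Main (main) where

import Control.Monad (forM_)
import GHC.Clock (getMonotonicTime)

-- | Run an action and return its result together with the elapsed
-- wall-clock time in seconds. (Hypothetical helper name.)
timed :: IO a -> IO (a, Double)
timed act = do
  t0 <- getMonotonicTime
  r  <- act
  t1 <- getMonotonicTime
  return (r, t1 - t0)

main :: IO ()
main = do
  -- Placeholder workload; the benchmark loop would go here instead.
  (_, secs) <- timed (forM_ [1 :: Int .. 1000] (\i -> i `seq` return ()))
  putStrLn ("loop took " ++ show secs ++ " s, RTS startup excluded")
```

Comparing this figure with the real time reported by time gives an estimate of how much of the gap to C is fixed startup cost and how much is per-file overhead.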
Bryan O'Sullivan | 28 Aug 05:05 2012

Re: I/O overhead in opening and writing files

On Mon, Aug 27, 2012 at 3:25 PM, J Baptist <arc38813 <at> hotmail.com> wrote:

> The problem may be that even with ByteStrings, we are stuck using show, and thus Strings, at some point.

Wait, what are you actually trying to do? If you have a benchmark that's half sane and half bonkers (cf. use of show), then yeah, it's not going to do so well.
wren ng thornton | 28 Aug 05:24 2012

Re: I/O overhead in opening and writing files

On 8/27/12 6:25 PM, J Baptist wrote:
> Using ByteStrings and the C calls does indeed speed things up a bit, but not much.
>
> real	0m6.053s
> user	0m1.480s
> sys	0m4.550s
>
> For your interest:
> The original version (with Strings and openFile): http://hpaste.org/73803
> Faster (with Strings and c_open): http://hpaste.org/73802
> Even faster (with ByteStrings and c_open): http://hpaste.org/73801
>
> The problem may be that even with ByteStrings, we are stuck using show, and thus Strings, at some point.
>
> Ideas?

Don't use Show.

Show is only there for printing things at the REPL and has no place in 
performance-centric code. The bytestring-lexing[1] package has efficient 
functions for rendering integral numbers into ByteStrings[2]. And there 
are a number of other efficient ByteString renderers as well, such as 
blaze-builder[3].

[1] http://hackage.haskell.org/package/bytestring-lexing

[2] Albeit, they haven't been hyper-aggressively optimized (cf., 
readDecimal), but that's because I haven't run into the need for doing 
so. If you can demonstrate a real need, I'm willing to spend some time 
on it.

[3] http://hackage.haskell.org/package/blaze-builder
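
To make the difference concrete, here is a minimal sketch of rendering an Int to a ByteString without an intermediate String. It swaps in Data.ByteString.Builder from current versions of the bytestring package as an assumption, since that needs no extra dependency; it is neither the bytestring-lexing nor the blaze-builder API named above. The names renderInt and renderIntViaShow are hypothetical.

```haskell
module Main (main) where

import qualified Data.ByteString as B
import qualified Data.ByteString.Builder as BB
import qualified Data.ByteString.Char8 as BC
import qualified Data.ByteString.Lazy as BL

-- | Render an Int as decimal ASCII without building a String first.
renderInt :: Int -> B.ByteString
renderInt = BL.toStrict . BB.toLazyByteString . BB.intDec

-- | The Show-based version, for comparison: it allocates a String
-- (a linked list of Char) and then packs it into a ByteString.
renderIntViaShow :: Int -> B.ByteString
renderIntViaShow = BC.pack . show

main :: IO ()
main = do
  BC.putStrLn (renderInt 500000)
  -- Both renderings agree on the bytes; only the allocation differs.
  print (all (\n -> renderInt n == renderIntViaShow n) [-3, 0, 7, 500000])
```

When many small writes are involved, keeping values as a Builder and converting once at the end usually allocates less than rendering each number to its own strict ByteString.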

-- 
Live well,
~wren
