Leon Smith | 28 Nov 20:38 2012
Picon

How can I avoid buffered reads?

I have some code that reads (infrequently) small amounts of data from /dev/urandom,  and because this is pretty infrequent,  I simply open the handle and close it every time I need some random bytes.   

The problem is that I recently discovered that,  thanks to buffering within GHC,   I was actually reading 8096 bytes when I only need 16 bytes,  and thus wasting entropy.   Moreover  calling hSetBuffering  handle NoBuffering did not change this behavior.   

I'm not sure if this behavior is a bug or a feature,  but in any case it's unacceptable for dealing with /dev/urandom.   Probably the simplest way to fix this is to write a little C helper function that will read from /dev/urandom for me,  so that I have precise control over the system calls involved.     But I'm curious if GHC can manage this use case correctly;   I've just started digging into the GHC.IO code myself.

Best,
Leon

{-# LANGUAGE BangPatterns, ViewPatterns #-} import Control.Applicative import Data.Bits import Data.Word(Word64) import qualified Data.ByteString as S import qualified Data.ByteString.Lazy as L import Data.ByteString.Internal (c2w) import qualified System.IO as IO import qualified Data.Binary.Get as Get showHex :: Word64 -> S.ByteString showHex n = s where (!s,_) = S.unfoldrN 16 f n f n = Just (char (n `shiftR` 60), n `shiftL` 4) char (fromIntegral -> i) | i < 10 = (c2w '0' - 0) + i | otherwise = (c2w 'a' - 10) + i twoRandomWord64s :: IO (Word64,Word64) twoRandomWord64s = IO.withBinaryFile "/dev/urandom" IO.ReadMode $ \handle -> do IO.hSetBuffering handle IO.NoBuffering Get.runGet ((,) <$> Get.getWord64host <*> Get.getWord64host) <$> L.hGet handle 16 main = do (x,y) <- twoRandomWord64s S.hPutStrLn IO.stdout (S.append (showHex x) (showHex y)) {- Relevant part of strace: open("/dev/urandom", O_RDONLY|O_NOCTTY|O_NONBLOCK) = 3 fstat(3, {st_mode=S_IFCHR|0666, st_rdev=makedev(1, 9), ...}) = 0 ioctl(3, SNDCTL_TMR_TIMEBASE or TCGETS, 0x7ffff367e528) = -1 EINVAL (Invalid argument) ioctl(3, SNDCTL_TMR_TIMEBASE or TCGETS, 0x7ffff367e528) = -1 EINVAL (Invalid argument) read(3, "N\304\4\367/\26c\"\3218\237f\214yKg~i\310\r\262\"\224H\340y\n\376V?\265\344"..., 8096) = 8096 close(3) = 0 -}
_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe <at> haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe
Thomas DuBuisson | 28 Nov 20:45 2012
Picon

Re: How can I avoid buffered reads?

As an alternative, If there existed a Haskell package to give you fast
cryptographically secure random numbers or use the new Intel RDRAND
instruction (when available) would that interest you?

Also, what you are doing is identical to the "entropy" package on
hackage, which probably suffers from the same bug/performance issue.

Cheers,
Thomas

On Wed, Nov 28, 2012 at 11:38 AM, Leon Smith <leon.p.smith <at> gmail.com> wrote:
> I have some code that reads (infrequently) small amounts of data from
> /dev/urandom,  and because this is pretty infrequent,  I simply open the
> handle and close it every time I need some random bytes.
>
> The problem is that I recently discovered that,  thanks to buffering within
> GHC,   I was actually reading 8096 bytes when I only need 16 bytes,  and
> thus wasting entropy.   Moreover  calling hSetBuffering  handle NoBuffering
> did not change this behavior.
>
> I'm not sure if this behavior is a bug or a feature,  but in any case it's
> unacceptable for dealing with /dev/urandom.   Probably the simplest way to
> fix this is to write a little C helper function that will read from
> /dev/urandom for me,  so that I have precise control over the system calls
> involved.     But I'm curious if GHC can manage this use case correctly;
> I've just started digging into the GHC.IO code myself.
>
> Best,
> Leon
>
> {-# LANGUAGE BangPatterns, ViewPatterns #-}
>
> import           Control.Applicative
> import           Data.Bits
> import           Data.Word(Word64)
> import qualified Data.ByteString as S
> import qualified Data.ByteString.Lazy as L
> import           Data.ByteString.Internal (c2w)
> import qualified System.IO        as IO
> import qualified Data.Binary.Get        as Get
>
> showHex :: Word64 -> S.ByteString
> showHex n = s
>   where
>     (!s,_) = S.unfoldrN 16 f n
>
>     f n = Just (char (n `shiftR` 60), n `shiftL` 4)
>
>     char (fromIntegral -> i)
>       | i < 10    = (c2w '0' -  0) + i
>       | otherwise = (c2w 'a' - 10) + i
>
> twoRandomWord64s :: IO (Word64,Word64)
> twoRandomWord64s = IO.withBinaryFile "/dev/urandom" IO.ReadMode $ \handle ->
> do
>    IO.hSetBuffering handle IO.NoBuffering
>    Get.runGet ((,) <$> Get.getWord64host <*> Get.getWord64host) <$> L.hGet
> handle 16
>
> main = do
>    (x,y) <- twoRandomWord64s
>    S.hPutStrLn IO.stdout (S.append (showHex x) (showHex y))
>
>
> {- Relevant part of strace:
>
> open("/dev/urandom", O_RDONLY|O_NOCTTY|O_NONBLOCK) = 3
> fstat(3, {st_mode=S_IFCHR|0666, st_rdev=makedev(1, 9), ...}) = 0
> ioctl(3, SNDCTL_TMR_TIMEBASE or TCGETS, 0x7ffff367e528) = -1 EINVAL (Invalid
> argument)
> ioctl(3, SNDCTL_TMR_TIMEBASE or TCGETS, 0x7ffff367e528) = -1 EINVAL (Invalid
> argument)
> read(3,
> "N\304\4\367/\26c\"\3218\237f\214yKg~i\310\r\262\"\224H\340y\n\376V?\265\344"...,
> 8096) = 8096
> close(3)                                = 0
>
> -}
>
>
> _______________________________________________
> Haskell-Cafe mailing list
> Haskell-Cafe <at> haskell.org
> http://www.haskell.org/mailman/listinfo/haskell-cafe
>
Leon Smith | 28 Nov 22:31 2012
Picon

Re: How can I avoid buffered reads?

Quite possibly,  entropy does seem to be a pretty lightweight dependency...


Though doesn't recent kernels use rdrand to seed /dev/urandom if it's available?   So /dev/urandom is the most portable source of random numbers on unix systems,  though rdrand does have the advantage of avoiding system calls,  so it certainly would be preferable, especially if you need large numbers of random numbers.

Best,
Leon

On Wed, Nov 28, 2012 at 2:45 PM, Thomas DuBuisson <thomas.dubuisson <at> gmail.com> wrote:
As an alternative, If there existed a Haskell package to give you fast
cryptographically secure random numbers or use the new Intel RDRAND
instruction (when available) would that interest you?

Also, what you are doing is identical to the "entropy" package on
hackage, which probably suffers from the same bug/performance issue.

Cheers,
Thomas

On Wed, Nov 28, 2012 at 11:38 AM, Leon Smith <leon.p.smith <at> gmail.com> wrote:
> I have some code that reads (infrequently) small amounts of data from
> /dev/urandom,  and because this is pretty infrequent,  I simply open the
> handle and close it every time I need some random bytes.
>
> The problem is that I recently discovered that,  thanks to buffering within
> GHC,   I was actually reading 8096 bytes when I only need 16 bytes,  and
> thus wasting entropy.   Moreover  calling hSetBuffering  handle NoBuffering
> did not change this behavior.
>
> I'm not sure if this behavior is a bug or a feature,  but in any case it's
> unacceptable for dealing with /dev/urandom.   Probably the simplest way to
> fix this is to write a little C helper function that will read from
> /dev/urandom for me,  so that I have precise control over the system calls
> involved.     But I'm curious if GHC can manage this use case correctly;
> I've just started digging into the GHC.IO code myself.
>
> Best,
> Leon
>
> {-# LANGUAGE BangPatterns, ViewPatterns #-}
>
> import           Control.Applicative
> import           Data.Bits
> import           Data.Word(Word64)
> import qualified Data.ByteString as S
> import qualified Data.ByteString.Lazy as L
> import           Data.ByteString.Internal (c2w)
> import qualified System.IO        as IO
> import qualified Data.Binary.Get        as Get
>
> showHex :: Word64 -> S.ByteString
> showHex n = s
>   where
>     (!s,_) = S.unfoldrN 16 f n
>
>     f n = Just (char (n `shiftR` 60), n `shiftL` 4)
>
>     char (fromIntegral -> i)
>       | i < 10    = (c2w '0' -  0) + i
>       | otherwise = (c2w 'a' - 10) + i
>
> twoRandomWord64s :: IO (Word64,Word64)
> twoRandomWord64s = IO.withBinaryFile "/dev/urandom" IO.ReadMode $ \handle ->
> do
>    IO.hSetBuffering handle IO.NoBuffering
>    Get.runGet ((,) <$> Get.getWord64host <*> Get.getWord64host) <$> L.hGet
> handle 16
>
> main = do
>    (x,y) <- twoRandomWord64s
>    S.hPutStrLn IO.stdout (S.append (showHex x) (showHex y))
>
>
> {- Relevant part of strace:
>
> open("/dev/urandom", O_RDONLY|O_NOCTTY|O_NONBLOCK) = 3
> fstat(3, {st_mode=S_IFCHR|0666, st_rdev=makedev(1, 9), ...}) = 0
> ioctl(3, SNDCTL_TMR_TIMEBASE or TCGETS, 0x7ffff367e528) = -1 EINVAL (Invalid
> argument)
> ioctl(3, SNDCTL_TMR_TIMEBASE or TCGETS, 0x7ffff367e528) = -1 EINVAL (Invalid
> argument)
> read(3,
> "N\304\4\367/\26c\"\3218\237f\214yKg~i\310\r\262\"\224H\340y\n\376V?\265\344"...,
> 8096) = 8096
> close(3)                                = 0
>
> -}
>
>
> _______________________________________________
> Haskell-Cafe mailing list
> Haskell-Cafe <at> haskell.org
> http://www.haskell.org/mailman/listinfo/haskell-cafe
>

_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe <at> haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe
Vincent Hanquez | 28 Nov 23:00 2012

Re: How can I avoid buffered reads?

On 11/28/2012 09:31 PM, Leon Smith wrote:
> Quite possibly,  entropy does seem to be a pretty lightweight 
> dependency...
>
> Though doesn't recent kernels use rdrand to seed /dev/urandom if it's 
> available?   So /dev/urandom is the most portable source of random 
> numbers on unix systems,  though rdrand does have the advantage of 
> avoiding system calls,  so it certainly would be preferable, 
> especially if you need large numbers of random numbers.
There's no much information on this i think, but if you need large 
number of random numbers you should build a PRNG yourself on top of the 
best random seed you can get, and make sure you reseed your prng 
casually with more entropy bytes. Also if
you don't have enough initial entropy, you should block.

/dev/urandom is not the same thing on every unix system. leading to 
various assumptions broken when varying the unixes. It also varies with 
the hardware context: for example on an embedded or some virtualized 
platform, giving you really terrible entropy.

--

-- 
Vincent
Leon Smith | 29 Nov 00:05 2012
Picon

Re: How can I avoid buffered reads?

If you have rdrand,  there is no need to build your own PRNG on top of rdrand.   RdRand already incorporates one so that it can produce random numbers as fast as they can be requested,  and this number is continuously re-seeded with the on-chip entropy source.


It would be nice to have a little more information about /dev/urandom and how it varies by OS and hardware,   but on Linux and FreeBSD at least it's supposed to be a cryptographically secure RNG that incorporates a PRNG to produce numbers in case you exhaust the entropy pool.

On Wed, Nov 28, 2012 at 5:00 PM, Vincent Hanquez <tab <at> snarc.org> wrote:
On 11/28/2012 09:31 PM, Leon Smith wrote:
Quite possibly,  entropy does seem to be a pretty lightweight dependency...

Though doesn't recent kernels use rdrand to seed /dev/urandom if it's available?   So /dev/urandom is the most portable source of random numbers on unix systems,  though rdrand does have the advantage of avoiding system calls,  so it certainly would be preferable, especially if you need large numbers of random numbers.
There's no much information on this i think, but if you need large number of random numbers you should build a PRNG yourself on top of the best random seed you can get, and make sure you reseed your prng casually with more entropy bytes. Also if
you don't have enough initial entropy, you should block.

/dev/urandom is not the same thing on every unix system. leading to various assumptions broken when varying the unixes. It also varies with the hardware context: for example on an embedded or some virtualized platform, giving you really terrible entropy.

--
Vincent

_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe <at> haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe
Leon Smith | 29 Nov 11:26 2012
Picon

Re: How can I avoid buffered reads?

Well,  I took Bardur's suggestion and avoided all the complexities of GHC's IO stack and simply used System.Posix.IO and Foreign.    This appears to work,  but for better or worse,   it is using blocking calls to the "read" system call and is not integrated with GHC's IO manager.   This shouldn't be an issue for my purposes,  but I suppose it's worth pointing out.  


{-# LANGUAGE BangPatterns, ViewPatterns #-}

import           Control.Applicative
import           Data.Bits
import           Data.Word(Word64)
import qualified Data.ByteString as S
import qualified Data.ByteString.Lazy as L
import           Data.ByteString.Internal (c2w)
import           Control.Exception
import           System.Posix.IO
import           Foreign
import qualified System.IO              as IO
import qualified Data.Binary.Get        as Get

showHex :: Word64 -> S.ByteString
showHex n = s
  where
    (!s,_) = S.unfoldrN 16 f n

    f n = Just (char (n `shiftR` 60), n `shiftL` 4)

    char (fromIntegral -> i)
      | i < 10    = (c2w '0' -  0) + i
      | otherwise = (c2w 'a' - 10) + i

twoRandomWord64s :: IO (Word64,Word64)
twoRandomWord64s = bracket openRd closeRd readRd
  where
    openRd = openFd "/dev/urandom" ReadOnly Nothing defaultFileFlags { noctty = True }
    readRd = \fd -> allocaBytes 16 $ \ptr -> do
                fdReadAll fd ptr 16
                x <- peek (castPtr ptr)
                y <- peek (castPtr ptr `plusPtr` 8)
                return (x,y)
    closeRd = closeFd
    fdReadAll fd ptr n = do
      n' <- fdReadBuf fd ptr n
      if n /= n'
      then fdReadAll fd (ptr `plusPtr` n') (n - n')
      else return ()

main = do
   (x,y) <- twoRandomWord64s
   S.hPutStrLn IO.stdout (S.append (showHex x) (showHex y))


On Wed, Nov 28, 2012 at 6:05 PM, Leon Smith <leon.p.smith <at> gmail.com> wrote:
If you have rdrand,  there is no need to build your own PRNG on top of rdrand.   RdRand already incorporates one so that it can produce random numbers as fast as they can be requested,  and this number is continuously re-seeded with the on-chip entropy source.

It would be nice to have a little more information about /dev/urandom and how it varies by OS and hardware,   but on Linux and FreeBSD at least it's supposed to be a cryptographically secure RNG that incorporates a PRNG to produce numbers in case you exhaust the entropy pool.

On Wed, Nov 28, 2012 at 5:00 PM, Vincent Hanquez <tab <at> snarc.org> wrote:
On 11/28/2012 09:31 PM, Leon Smith wrote:
Quite possibly,  entropy does seem to be a pretty lightweight dependency...

Though doesn't recent kernels use rdrand to seed /dev/urandom if it's available?   So /dev/urandom is the most portable source of random numbers on unix systems,  though rdrand does have the advantage of avoiding system calls,  so it certainly would be preferable, especially if you need large numbers of random numbers.
There's no much information on this i think, but if you need large number of random numbers you should build a PRNG yourself on top of the best random seed you can get, and make sure you reseed your prng casually with more entropy bytes. Also if
you don't have enough initial entropy, you should block.

/dev/urandom is not the same thing on every unix system. leading to various assumptions broken when varying the unixes. It also varies with the hardware context: for example on an embedded or some virtualized platform, giving you really terrible entropy.

--
Vincent


_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe <at> haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe
Tristan Seligmann | 6 Dec 21:24 2012
Picon

Re: How can I avoid buffered reads?


On 29 Nov 2012 12:27 PM, "Leon Smith" <leon.p.smith <at> gmail.com> wrote:
>
> Well,  I took Bardur's suggestion and avoided all the complexities of GHC's IO stack and simply used System.Posix.IO and Foreign.    This appears to work,  but for better or worse,   it is using blocking calls to the "read" system call and is not integrated with GHC's IO manager.   This shouldn't be an issue for my purposes,  but I suppose it's worth pointing out.  

Reading from an fd corresponding to an actual file is always blocking. select() will always indicate that the fd is ready for reading and writing, and I think epoll() will refuse to operate on the fd at all.

_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe <at> haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe
Brandon Allbery | 6 Dec 22:39 2012
Picon

Re: How can I avoid buffered reads?

On Thu, Dec 6, 2012 at 3:24 PM, Tristan Seligmann <mithrandi <at> mithrandi.net> wrote:

On 29 Nov 2012 12:27 PM, "Leon Smith" <leon.p.smith <at> gmail.com> wrote:
System.Posix.IO and Foreign.    This appears to work,  but for better or worse,   it is using blocking calls to the "read" system call and is not integrated with GHC's IO manager.   This shouldn't be an issue for my purposes,  but I suppose it's worth pointing out.  

Reading from an fd corresponding to an actual file is always blocking. select() will always indicate that the fd is ready for reading and writing, and I think epoll() will refuse to operate on the fd at all.

This; it's a longstanding gripe among those of us who use network filesystems heavily, since it's entirely possible those reads *will* block... but the usual architecture of Unix-like kernel filesystem code doesn't provide any way to see it or do anything about it.  (This is also why NFS "hard" mounts are annoying and "soft" mounts are terrible fragile hacks.  It's no better with any other network filesystem; they just default to the "hard" behavior because the "soft" hack, when even possible, is even worse for most of them.)

--
brandon s allbery kf8nh                               sine nomine associates
allbery.b <at> gmail.com                                  ballbery <at> sinenomine.net
unix, openafs, kerberos, infrastructure, xmonad        http://sinenomine.net

_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe <at> haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe
Donn Cave | 6 Dec 23:14 2012

Re: How can I avoid buffered reads?

Quoth Brandon Allbery <allbery.b <at> gmail.com>,
> On Thu, Dec 6, 2012 at 3:24 PM, Tristan Seligmann
> <mithrandi <at> mithrandi.net>wrote:
>>> On 29 Nov 2012 12:27 PM, "Leon Smith" <leon.p.smith <at> gmail.com> wrote:
>>> System.Posix.IO and Foreign.    This appears to work,  but for better or
>>> worse,   it is using blocking calls to the "read" system call and is not
>>> integrated with GHC's IO manager.   This shouldn't be an issue for my
>>> purposes,  but I suppose it's worth pointing out.
>>
>> Reading from an fd corresponding to an actual file is always blocking.
>> select() will always indicate that the fd is ready for reading and writing,
>> and I think epoll() will refuse to operate on the fd at all.
>
> This; it's a longstanding gripe among those of us who use network
> filesystems heavily, since it's entirely possible those reads *will*
> block... but the usual architecture of Unix-like kernel filesystem code
> doesn't provide any way to see it or do anything about it.

A wretched state of affairs indeed, but is that the same problem?

While I guess /dev/urandom isn't supposed to block, so it would look
about the same to select(2) either way, /dev/random is select-able, true?
If GHC IO is using blocking I/O on everything opened by name, on the
assumption it's talking to a filesystem, then that looks to me like
GHC's error, not UNIX's.

	Donn
Brandon Allbery | 6 Dec 23:23 2012
Picon

Re: How can I avoid buffered reads?

On Thu, Dec 6, 2012 at 5:14 PM, Donn Cave <donn <at> avvanta.com> wrote:
While I guess /dev/urandom isn't supposed to block, so it would look
about the same to select(2) either way, /dev/random is select-able, true?

Both should be cdevs, not files, so they do not go through the normal filesystem I/O pathway in the kernel and should support select()/poll().  ("ls -l", the first character should be "c" instead of "-" indicating character-mode device nodes.)  If ghc is not detecting that, then *that* is indeed an I/O manager issue.  More generally, anything which is not a regular file according to stat() (in practice this means block devices, character devices, and fifos; directories, sockets, doors, Xenix name files, and other exotics in the filesystem namespace are not generally accessible via standard I/O routines and should probably be failed on open --- in fact, you may have a kernel bug if the system does not fail them on open) should go through the I/O manager.

--
brandon s allbery kf8nh                               sine nomine associates
allbery.b <at> gmail.com                                  ballbery <at> sinenomine.net
unix, openafs, kerberos, infrastructure, xmonad        http://sinenomine.net

_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe <at> haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe
Leon Smith | 9 Dec 10:29 2012
Picon

Re: How can I avoid buffered reads?

On Thu, Dec 6, 2012 at 5:23 PM, Brandon Allbery <allbery.b <at> gmail.com> wro\

Both should be cdevs, not files, so they do not go through the normal filesystem I/O pathway in the kernel and should support select()/poll().  ("ls -l", the first character should be "c" instead of "-" indicating character-mode device nodes.)  If ghc is not detecting that, then *that* is indeed an I/O manager issue.

The issue here is that if you look at the source of fdReadBuf,  you see that it's a plain system call without any reference to GHC's (relatively new) IO manager.

Best,
Leon
_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe <at> haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe
Bas van Dijk | 10 Dec 00:01 2012
Picon

Re: How can I avoid buffered reads?

On 9 December 2012 10:29, Leon Smith <leon.p.smith <at> gmail.com> wrote:
> On Thu, Dec 6, 2012 at 5:23 PM, Brandon Allbery <allbery.b <at> gmail.com> wro\
>>
>> Both should be cdevs, not files, so they do not go through the normal
>> filesystem I/O pathway in the kernel and should support select()/poll().
>> ("ls -l", the first character should be "c" instead of "-" indicating
>> character-mode device nodes.)  If ghc is not detecting that, then *that* is
>> indeed an I/O manager issue.
>
>
> The issue here is that if you look at the source of fdReadBuf,  you see that
> it's a plain system call without any reference to GHC's (relatively new) IO
> manager.

What if you use threadWaitRead on the fd before you read from it?

http://hackage.haskell.org/packages/archive/base/latest/doc/html/Control-Concurrent.html#v:threadWaitRead

Bas
Bardur Arantsson | 28 Nov 21:03 2012
Picon

Re: How can I avoid buffered reads?

On 11/28/2012 08:38 PM, Leon Smith wrote:
> I have some code that reads (infrequently) small amounts of data from
> /dev/urandom,  and because this is pretty infrequent,  I simply open the
> handle and close it every time I need some random bytes.
> 
> The problem is that I recently discovered that,  thanks to buffering within
> GHC,   I was actually reading 8096 bytes when I only need 16 bytes,  and
> thus wasting entropy.   Moreover  calling hSetBuffering  handle NoBuffering
> did not change this behavior.
> 
> I'm not sure if this behavior is a bug or a feature,  but in any case it's
> unacceptable for dealing with /dev/urandom.   Probably the simplest way to
> fix this is to write a little C helper function that will read from
> /dev/urandom for me,  so that I have precise control over the system calls
> involved.     But I'm curious if GHC can manage this use case correctly;
> I've just started digging into the GHC.IO code myself.
> 

Use openFd, fdReadBuf and closeFd from the System.Posix.IO.ByteString
module in the 'unix' package.

Those correspond directly to system calls and are thus unbuffered.

Gmane