Chris Kuklewicz | 30 Jul 03:21

A fancier Get monad or two (a la binary and binary-strict)

Summary: I have two new Get-like monads for binary data (byte-aligned) that also
   (*) Suspend parsing and request more import when reading past the end of data.
       It is possible to respond with Nothing to indicate the hard end of data.
   (*) Return failures instead of calling error
   (*) Offer lookAhead,lookAheadM,lookAheadE like Data.Binary.Get
   (*) Are BinaryParser instances from Data.Binary.Strict.Class
   (*) Are Monad Transformers (and thus MonadIO)
   (*) Are MonadError,Plus,Reader,Writer,State and Applicative,Alternative
   (*) They differ because one is also MonadCont while the other is simpler
   (*) Simplified Non-transformer versions (applied to Identity) are defined

Note: Making the suspend/resume-with-more-input work efficiently with 
MonadError/Alternative is one improvement to binary-strict.  Make callCC work 
with sane semantics at the same time was non-trivial.

Long version and where to find the source:

In the process of writing an implementation of google's protocol buffers, I used 
binary's Put monad and binary-strict's BinaryParser class (i.e. binary-strict's 
Incremental Get monad).

The Get in the binary package works on Lazy ByteString (good for me) but tends 
to call "error" when unhappy, which was not ideal.  The Incremental Get from the 
binary-strict package does not want to work on Lazy bytestrings (it can be 
adapted), but the internals hold onto data too long and obstruct garbage 
collection (this is how it appears to me when I read the code).

So I have made a fancier Get monad that offers a large collection of features. 
I am announcing this in case other people might want to further adapt my 
solution for their own ends (or just use it outright).  Removing feature from my 
(Continue reading)

Chris Kuklewicz | 30 Jul 03:41

Re: A fancier Get monad or two (a la binary and binary-strict)

Silly typo in the first bullet point...

   (*) Suspend parsing and request more input when reading past the end of data.
                                        ^^^^^
I do not have to wonder why "import" is so easy to type.

--

-- 
Chris
Duncan Coutts | 30 Jul 09:26
Gravatar

Re: A fancier Get monad or two (a la binary and binary-strict)


On Wed, 2008-07-30 at 02:23 +0100, Chris Kuklewicz wrote:
> Summary: I have two new Get-like monads for binary data (byte-aligned) that also
>    (*) Suspend parsing and request more import when reading past the end of data.
>        It is possible to respond with Nothing to indicate the hard end of data.
>    (*) Return failures instead of calling error
>    (*) Offer lookAhead,lookAheadM,lookAheadE like Data.Binary.Get
>    (*) Are BinaryParser instances from Data.Binary.Strict.Class
>    (*) Are Monad Transformers (and thus MonadIO)
>    (*) Are MonadError,Plus,Reader,Writer,State and Applicative,Alternative
>    (*) They differ because one is also MonadCont while the other is simpler
>    (*) Simplified Non-transformer versions (applied to Identity) are defined

[..]

> I have tested the code, but it is still quite new.  I doubt I will have time to 
> make a proper cabal release. I may see if the maintainers of the binary-strict 
> or binary packages are interested in a fancier Get monad.

Yep, definitely interested. Sounds like we could make something that
would satisfy the needs of existing users of the binary and
binary-strict packages.

We'll have to look closely at the performance costs of the new features
but my intuition is that a non-transformer but continuation based
version that has error handling (and plus/alternative) and can request
more input should have minimal cost.

Duncan
(Continue reading)

Chris Kuklewicz | 30 Jul 12:14

Re: A fancier Get monad or two (a la binary and binary-strict)

Summary: I have made the simplified version Duncan speculates about...

Duncan Coutts wrote:
> Yep, definitely interested. Sounds like we could make something that
> would satisfy the needs of existing users of the binary and
> binary-strict packages.

I see this as incremental development of incremental get.  The comments in 
binary-strict's code point to one additional useful feature:  When the parser 
suspends and asks for more input it could potentially also return a list or 
Sequence of the output-so-far.

I believe this is possible to add to my existing code (even the simplified 
version below), so there will be an even fancier version eventually.  This would 
make the parser a controllable part of a pipeline.

> We'll have to look closely at the performance costs of the new features
> but my intuition is that a non-transformer but continuation based
> version that has error handling (and plus/alternative) and can request
> more input should have minimal cost.
> 
> Duncan
> 

To modify down "MyGet.hs" to produce that type is a matter of using the delete 
key (which I have done, see below).  I only have my Apple powerpc/G4 laptop to 
run Haskell, this and the lack of either need or available time means I will not 
be making performance measurements.  I have written for the language shootout 
before and I had binary's Get.hs to look at, and so I claim my code has no 
show-stopping performance killers in it.  A bit of !strictness and even more 
(Continue reading)

Johan Tibell | 30 Jul 17:16

Re: A fancier Get monad or two (a la binary and binary-strict)

On Wed, Jul 30, 2008 at 9:26 AM, Duncan Coutts
<duncan.coutts <at> worc.ox.ac.uk> wrote:
> On Wed, 2008-07-30 at 02:23 +0100, Chris Kuklewicz wrote:
>> Summary: I have two new Get-like monads for binary data (byte-aligned) that also
>>    (*) Suspend parsing and request more import when reading past the end of data.
>>        It is possible to respond with Nothing to indicate the hard end of data.
>>    (*) Return failures instead of calling error
>>    (*) Offer lookAhead,lookAheadM,lookAheadE like Data.Binary.Get
>>    (*) Are BinaryParser instances from Data.Binary.Strict.Class
>>    (*) Are Monad Transformers (and thus MonadIO)
>>    (*) Are MonadError,Plus,Reader,Writer,State and Applicative,Alternative
>>    (*) They differ because one is also MonadCont while the other is simpler
>>    (*) Simplified Non-transformer versions (applied to Identity) are defined
>
> [..]
>
> We'll have to look closely at the performance costs of the new features
> but my intuition is that a non-transformer but continuation based
> version that has error handling (and plus/alternative) and can request
> more input should have minimal cost.

I've written what I believe to be a similar, continuation-based
parser. I haven't uploaded my latest patches (basically faster
combinators) but the idea can be seen in the file here:

http://www.johantibell.com/cgi-bin/gitweb.cgi?p=hyena.git;a=blob;f=Hyena/Parser.hs;h=8086b11bfeb3bca15bfd16ec9c6a4b34aadf528e;hb=HEAD

The use case is parsing HTTP without resorting to lazy I/O.

Cheers,
(Continue reading)

Chris Kuklewicz | 30 Jul 23:34

Re: A fancier Get monad or two (a la binary and binary-strict)

Johan Tibell wrote:>
> I've written what I believe to be a similar, continuation-based
> parser. I haven't uploaded my latest patches (basically faster
> combinators) but the idea can be seen in the file here:
> 
> http://www.johantibell.com/cgi-bin/gitweb.cgi?p=hyena.git;a=blob;f=Hyena/Parser.hs;h=8086b11bfeb3bca15bfd16ec9c6a4b34aadf528e;hb=HEAD
> 
> The use case is parsing HTTP without resorting to lazy I/O.

I have just read through your code.  It is quite similar.

The differences:
   Your error handling is via Alternative, and if the first branch advances 
(consumes input) then the second branch is not attempted.  The state can only go 
forward (by 1 byte) or remain in place (there is no look-ahead).  If reading 
past the end, then either (*) the new first byte works (*) the new first fails 
and the Alternative is ready to try it.

   In MyGet/MyGetW/MyGetSimplified the MonadError semantics are different.  If 
the first alternative fails then the parser state is rolled back and the second 
alternative is tried from the same starting point as the first was tried.  If 
the first alternative trigged more input from IPartial then this input is still 
visible to the second alternative.

   The management of saved state on the stack of pending operations is much 
simpler with your commit-if-advance semantics and much more complicated with my 
rollback semantics.  Oddly, it seems your committed operations do not 
immediately release the pending handlers so they can be garbage collected.  This 
same kind of issue motivated me to improve the implementation of binary-strict's 
incremental get.
(Continue reading)

Johan Tibell | 31 Jul 09:33

Re: A fancier Get monad or two (a la binary and binary-strict)

Hi Chris,

Thanks for providing feedback. It's much appreciated.

On Wed, Jul 30, 2008 at 11:34 PM, Chris Kuklewicz
<haskell <at> list.mightyreason.com> wrote:
> The differences:
>  Your error handling is via Alternative, and if the first branch advances
> (consumes input) then the second branch is not attempted.  The state can
> only go forward (by 1 byte) or remain in place (there is no look-ahead).  If
> reading past the end, then either (*) the new first byte works (*) the new
> first fails and the Alternative is ready to try it.

This is by design. This is enough for parsing many network protocols
like HTTP and avoids having to deal with backtracking and the
potential space leaks and bookkeeping overhead that might come with
it.

>  The management of saved state on the stack of pending operations is much
> simpler with your commit-if-advance semantics and much more complicated with
> my rollback semantics.  Oddly, it seems your committed operations do not
> immediately release the pending handlers so they can be garbage collected.
>  This same kind of issue motivated me to improve the implementation of
> binary-strict's incremental get.

I'm not sure what you mean by releasing pending handlers. It's
probably something I've not considered. Could you please elaborate?

Cheers,

(Continue reading)

Chris Kuklewicz | 31 Jul 14:00

Re: A fancier Get monad or two (a la binary and binary-strict)

Johan Tibell wrote:
> Hi Chris,
> 
> Thanks for providing feedback. It's much appreciated.
> 
> On Wed, Jul 30, 2008 at 11:34 PM, Chris Kuklewicz
> <haskell <at> list.mightyreason.com> wrote:
>> The differences:
>>  Your error handling is via Alternative, and if the first branch advances
>> (consumes input) then the second branch is not attempted.  The state can
>> only go forward (by 1 byte) or remain in place (there is no look-ahead).  If
>> reading past the end, then either (*) the new first byte works (*) the new
>> first fails and the Alternative is ready to try it.
> 
> This is by design. This is enough for parsing many network protocols
> like HTTP and avoids having to deal with backtracking and the
> potential space leaks and bookkeeping overhead that might come with
> it.

I agree that the design space of get and parsing monads is quite broad. That is 
why my small endeavor now has 3 monads: MyGetW with MonadCont, MyGet without 
MonadCont, and MyGetSimplified without MonadCont,Reader,Writer,State,Trans.

> 
>>  The management of saved state on the stack of pending operations is much
>> simpler with your commit-if-advance semantics and much more complicated with
>> my rollback semantics.  Oddly, it seems your committed operations do not
>> immediately release the pending handlers so they can be garbage collected.
>>  This same kind of issue motivated me to improve the implementation of
>> binary-strict's incremental get.
(Continue reading)

Johan Tibell | 4 Aug 01:54

Re: A fancier Get monad or two (a la binary and binary-strict)

On Thu, Jul 31, 2008 at 2:00 PM, Chris Kuklewicz
<haskell <at> list.mightyreason.com> wrote:
> [terrific explanation of the problem]

Thanks a lot for the explanation. It's perfectly clear to me now. I
changed by implementation along the lines with the one you attached
except I didn't use Maybe to wrap the error handler in the parse state
but instead set the failure handler to `failed' instead of Nothing on
commit.

Cheers,

Johan
Chris Kuklewicz | 5 Aug 19:48

An even fancier Get monad, last in a series?

And now MyGetW.hs [1] is even more capable:

During the parsing one can use a "yieldItem y" command to queue y for delivery 
to the code running the Get monad.

The length of this queue can be examined with "pendingItems" which returns an 
Int, which will be non-negative.

To immediately pause parsing and return a Data.Sequence of yielded items to the 
user one calls "flushItems" which returns the sequence and a lazy value which is 
the future of the paused parsing.

The pending queue of items is also returned every time the parser needs more 
input, sees an error, or completes.

Thus the parser can be a lazy participant in a chain of processing.

The queued items are undisturbed by fail/throwError/mzero all lookAhead* 
functions and any use of callCC and the continuations it returns.  Thus 
"yieldItem" is for life and cannot be undone.

The continuation returned by "callCC" was improved internally to not hold onto 
the old input state of the BinaryParser or the old user state.  It does not use 
these things and should not keep them alive.

Frankly, I cannot think of any more features to add.  Once Haddock can be used 
with a released Cabal I may make a more proper release of these files.

Cheers,
   Chris
(Continue reading)

Michael Karcher | 6 Aug 15:17

Re: An even fancier Get monad, last in a series?

Chris Kuklewicz <haskell <at> list.mightyreason.com> wrote:
> The length of this queue can be examined with "pendingItems" which returns an 
> Int, which will be non-negative.
Wouldn't it be more in the spirit of Haskell to use a data type that has
"non-negative" already built in, like Data.Word.Word?

Regards,
  Michael Karcher

Gmane