Mike Meyer | 18 May 13:30 2014

Conduits vs. lazy byte strings

I'm working on a web app that loads a file, tweaks it a bit, then downloads the results. I'd like it to use as little memory as possible, just as good practice, especially since the tweaking all happens in the first K or so of the file and the rest of it is passed through untouched.

The current version uses a conduit that just reads the data to a sinkLbs to get a lazy bytestring, which is then processed.

I think this will have the desired behavior (after all, the bytestring is lazy), but I have this itch that says I should be doing the processing in the conduit.

Someone want to tell me if I correctly understand things and the itch is just leftover imperative thinking, or the itch is right and I need to fix the code?

If you're interested, you can find the code at
_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe <at> haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe
John Wiegley | 18 May 20:51 2014

Re: Conduits vs. lazy byte strings

>>>>> Mike Meyer <mwm <at> mired.org> writes:

> The current version uses a conduit that just reads the data to a sinkLbs to
> get a lazy bytestring, which is then processed.

sinkLbs reads the entire contents into memory, so this is the exact opposite
of what you want.

> Someone want to tell me if I correctly understand things and the itch is
> just leftover imperative thinking, or the itch is right and I need to fix
> the code?

You should write a Conduit or a Sink which will do the processing you need.
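
As a rough illustration of that advice, here is a minimal sketch of a Conduit
that rewrites only the first n bytes and streams everything after them
unchanged. It assumes the combined "Conduit" module from conduit 1.3 or later
rather than the 1.1 API current in this thread. The names takeBytes,
tweakHeader and tweakFile, and the tweak function argument, are hypothetical
and not taken from the original code.

```haskell
import Conduit
import Control.Monad (unless)
import Data.ByteString (ByteString)
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as BC
import Data.Char (toUpper)

-- | Collect up to n bytes from upstream. Bytes beyond n in the last chunk
-- read are pushed back with 'leftover' so later code still sees them.
takeBytes :: Monad m => Int -> ConduitT ByteString o m ByteString
takeBytes n = go [] n
  where
    go acc k
      | k <= 0 = finish acc
      | otherwise = do
          mchunk <- await
          case mchunk of
            Nothing -> finish acc
            Just chunk -> do
              let (now, later) = B.splitAt k chunk
              unless (B.null later) (leftover later)
              go (now : acc) (k - B.length now)
    finish acc = return (B.concat (reverse acc))

-- | Apply the (hypothetical) tweak to the first n bytes, then pass every
-- remaining chunk through untouched, one chunk at a time.
tweakHeader :: Monad m
            => Int -> (ByteString -> ByteString)
            -> ConduitT ByteString ByteString m ()
tweakHeader n tweak = do
  header <- takeBytes n
  unless (B.null header) (yield (tweak header))
  awaitForever yield

-- | File-to-file version; only the header and the current chunk are held
-- in memory at any moment.
tweakFile :: Int -> (ByteString -> ByteString) -> FilePath -> FilePath -> IO ()
tweakFile n tweak src dst =
  runConduitRes (sourceFile src .| tweakHeader n tweak .| sinkFile dst)

-- Small in-memory demonstration: upper-case the first 5 bytes only.
main :: IO ()
main = do
  let chunks = map BC.pack ["hel", "lo wor", "ld"]
  print (runConduitPure
           (yieldMany chunks .| tweakHeader 5 (BC.map toUpper) .| sinkList))
```

The demonstration in main should print ["HELLO"," wor","ld"]: the header is
assembled across chunk boundaries, and the surplus of the second chunk is
handed back upstream instead of being buffered.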

By default, you'll receive "chunks" at each call to "await".  If you need
lines, there is a linesUnbounded Conduit (as of conduit 1.1), but it still
reads whole chunks into memory at a time (I believe the default chunk size is
32k).  But that's the same behavior as plain lazy I/O.
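
To make the chunk-versus-line distinction concrete, here is a small sketch,
again assuming the "Conduit" module from conduit 1.3 or later. There the
ByteString variant of the line splitter is called linesUnboundedAsciiC. The
helper name splitLines is hypothetical.

```haskell
import Conduit
import Data.ByteString (ByteString)
import qualified Data.ByteString.Char8 as BC

-- | Re-chunk an arbitrary sequence of ByteString chunks into lines.
-- Chunk boundaries from upstream need not line up with newlines.
splitLines :: [ByteString] -> [ByteString]
splitLines chunks =
  runConduitPure (yieldMany chunks .| linesUnboundedAsciiC .| sinkList)

main :: IO ()
main = mapM_ BC.putStrLn (splitLines (map BC.pack ["ab\nc", "d\ne"]))
```

Here the line "cd" is reassembled from two upstream chunks. Being unbounded,
the splitter will buffer an arbitrarily long line, so it is only a good fit
when line lengths are known to be reasonable.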

Once your Conduit or Sink (i.e., Consumer) finds the data it needs, it should
simply end, and not call await anymore.  This will inform upstream that
processing is done and that all finalizers should be executed.
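
A minimal sketch of such an early-terminating Consumer, assuming the "Conduit"
module from conduit 1.3 or later; the name firstLine is hypothetical. It
returns the bytes before the first newline and then simply stops calling
await, so the rest of the input is never requested.

```haskell
import Conduit
import Data.ByteString (ByteString)
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as BC

-- | Consume chunks until a newline is seen, return everything before it,
-- and then finish without awaiting any further input.
firstLine :: Monad m => ConduitT ByteString o m ByteString
firstLine = go []
  where
    go acc = do
      mchunk <- await
      case mchunk of
        Nothing -> return (B.concat (reverse acc))
        Just chunk ->
          case BC.elemIndex '\n' chunk of
            Nothing -> go (chunk : acc)
            -- Found the newline: return, and never call await again.
            Just i  -> return (B.concat (reverse (B.take i chunk : acc)))

main :: IO ()
main =
  -- The source is infinite, yet the pipeline terminates because the
  -- consumer stops asking for input after the first newline.
  BC.putStrLn (runConduitPure (repeatC (BC.pack "ab\ncd") .| firstLine))
```

With a file-backed source run under runConduitRes, the file handle is released
when the pipeline finishes, whether or not the whole file was read.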

John
