Malcolm Wallace | 25 Feb 11:47 2013

ANN: lazy-csv - the fastest and most space-efficient parser for CSV

There are lots of Haskell CSV parsers out there.  Most have poor error-reporting, and do not scale to large
inputs.  I am pleased to announce an industrial-strength library that is robust, fast, space-efficient,
lazy, and scales to gigantic inputs with no loss of performance.

    http://code.haskell.org/lazy-csv/

Downloads from Hackage:

    http://hackage.haskell.org/package/lazy-csv

This library has been in industrial use for several years now, but this is the first public release.  No doubt
the API is not as general as it could be, but it already serves many purposes very well.  I'm happy to receive
bug reports and suggestions for improvements.

Regards,
    Malcolm
Oliver Charles | 25 Feb 12:14 2013

Re: ANN: lazy-csv - the fastest and most space-efficient parser for CSV

On 02/25/2013 10:47 AM, Malcolm Wallace wrote:
> There are lots of Haskell CSV parsers out there. Most have poor error-reporting, and do not scale to large
> inputs. I am pleased to announce an industrial-strength library that is robust, fast, space-efficient,
> lazy, and scales to gigantic inputs with no loss of performance.
>
>     http://code.haskell.org/lazy-csv/
>
> Downloads from Hackage:
>
>     http://hackage.haskell.org/package/lazy-csv
>
> This library has been in industrial use for several years now, but this is the first public release. No doubt
> the API is not as general as it could be, but it already serves many purposes very well. I'm happy to receive
> bug reports and suggestions for improvements.
>
> Regards,
>     Malcolm

Obvious question: how does this compare to cassava, especially cassava's Data.Csv.Incremental module? I ask specifically because you state that "It is lazier, faster, more space-efficient, and more flexible in its treatment of errors, than any other extant Haskell CSV library on Hackage", but there is no mention of cassava on the website.

- Ollie
_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe <at> haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe
Malcolm Wallace | 25 Feb 12:21 2013

Re: ANN: lazy-csv - the fastest and most space-efficient parser for CSV


On 25 Feb 2013, at 11:14, Oliver Charles wrote:

> Obvious question: how does this compare to cassava, especially cassava's Data.Csv.Incremental
> module? I ask specifically because you state that "It is lazier, faster, more space-efficient,
> and more flexible in its treatment of errors, than any other extant Haskell CSV library on Hackage", but
> there is no mention of cassava on the website.

Simple answer - I have never heard of cassava, and suspect it did not exist when I first did the benchmarking.
I'd be happy to re-do my performance comparison, including cassava and any other recent-ish CSV
libraries, if I can find them.

Regards,
    Malcolm
John Wiegley | 25 Feb 23:16 2013

Re: ANN: lazy-csv - the fastest and most space-efficient parser for CSV

>>>>> Malcolm Wallace <malcolm.wallace <at> me.com> writes:

> Simple answer - I have never heard of cassava, and suspect it did not exist
> when I first did the benchmarking. I'd be happy to re-do my performance
> comparison, including cassava and any other recent-ish CSV libraries, if I
> can find them.

I would be very interested in those results, Malcolm.

Thanks,
-- 
John Wiegley
FP Complete                         Haskell tools, training and consulting
http://fpcomplete.com               johnw on #haskell/irc.freenode.net
Ozgun Ataman | 25 Feb 23:25 2013

Re: ANN: lazy-csv - the fastest and most space-efficient parser for CSV

I'd also like to point to a couple of CSV libraries I released a long time ago and have been maintaining that both target constant-space operation and try (and hope) for the best in terms of speed. I'd be very interested to know how they fare in terms of performance benchmarking:

Latest, based on conduit: http://hackage.haskell.org/package/csv-conduit (just released the latest version)


Notice how both are based on well-known IO streaming libraries, which gives them both constant-space operation and good interoperability with their respective ecosystems. I have found this to be especially true in the case of conduit.

If you end up designing a benchmark, I'd be happy to get it working with my library.

- Oz

Don Stewart | 25 Feb 23:32 2013

Re: ANN: lazy-csv - the fastest and most space-efficient parser for CSV

Cassava is quite new, but has the same goals as lazy-csv.

It's about a year old now - http://blog.johantibell.com/2012/08/a-new-fast-and-easy-to-use-csv-library.html

I know Johan has been working on the benchmarks of late - it would be very good to know how the two compare in features.

Johan Tibell | 26 Feb 00:46 2013

Re: ANN: lazy-csv - the fastest and most space-efficient parser for CSV

On Mon, Feb 25, 2013 at 2:32 PM, Don Stewart <dons00 <at> gmail.com> wrote:

> Cassava is quite new, but has the same goals as lazy-csv.
>
> It's about a year old now - http://blog.johantibell.com/2012/08/a-new-fast-and-easy-to-use-csv-library.html
>
> I know Johan has been working on the benchmarks of late - it would be very good to know how the two compare in features.


To run, check out the cassava repo on GitHub and run: cabal configure --enable-benchmarks && cabal build && cabal bench

Here are the results (all the normal caveats for benchmarking apply):

benchmarking positional/decode/presidents/without conversion
mean: 62.85965 us, lb 62.56705 us, ub 63.26101 us, ci 0.950
std dev: 1.751446 us, lb 1.371323 us, ub 2.295576 us, ci 0.950

benchmarking positional/decode/streaming/presidents/without conversion
mean: 93.81925 us, lb 91.14701 us, ub 98.19217 us, ci 0.950
std dev: 17.20842 us, lb 11.58690 us, ub 23.41786 us, ci 0.950

benchmarking comparison/lazy-csv
mean: 133.2609 us, lb 132.4415 us, ub 135.3085 us, ci 0.950
std dev: 6.193178 us, lb 3.123661 us, ub 12.83148 us, ci 0.950

The first two sets of numbers are for cassava (all-at-once and streaming modes, respectively). The last set is for lazy-csv.
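
To put the three means on a common scale, here is a small sketch that divides each by the fastest one. The figures are copied from the output above; the names `means` and `relative` are made up for this illustration and belong to neither library. On these numbers the streaming mode comes out at roughly 1.5 times, and lazy-csv at roughly 2.1 times, the all-at-once mean.

```haskell
import Text.Printf (printf)

-- Mean times in microseconds, copied from the benchmark output above.
means :: [(String, Double)]
means =
  [ ("cassava (all-at-once)", 62.85965)
  , ("cassava (streaming)",   93.81925)
  , ("lazy-csv",              133.2609)
  ]

-- Ratio of each mean to the fastest mean in the list.
-- Assumes a non-empty list (true for the fixed data above).
relative :: [(String, Double)] -> [(String, Double)]
relative xs = [ (label, t / fastest) | (label, t) <- xs ]
  where fastest = minimum (map snd xs)

main :: IO ()
main = mapM_ (\(label, r) -> printf "%-22s %.2fx\n" label r) (relative means)
```

This compares means on one small input only, so the usual benchmarking caveats still hold.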

The feature sets of the two libraries are quite different. Both do basic CSV parsing (with some extensions).

 * lazy-csv parses CSV data to something akin to [[ByteString]], but with a heavy focus on error recovery and precise error messages.
 * cassava parses CSV data to [a], where a is a user-defined type that represents a CSV record. There are options to recover from *type conversion* errors, but not from malformed CSV. cassava has several parsing modes: incremental for parsing interleaved with I/O, streaming for lazy parsing (with or without I/O), and all-at-once parsing for when you want to hold all the data in memory.
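
The difference in result shape can be sketched with nothing but base. This is a toy illustration, not the API of either library: `splitOn`, `rawRows`, `toPerson` and the `Person` type are all hypothetical, and the splitter ignores quoting and escaping entirely. `rawRows` gives the untyped list-of-fields view, while `toPerson` shows per-record conversion to a user-defined type, where a type conversion failure is reported for that row without aborting the rest.

```haskell
import Text.Read (readMaybe)

-- Hypothetical record type standing in for a user-defined 'a'.
data Person = Person { name :: String, age :: Int }
  deriving (Show, Eq)

-- Naive field splitter: no quoting or escaping, unlike a real CSV parser.
splitOn :: Char -> String -> [String]
splitOn sep s = case break (== sep) s of
  (field, [])       -> [field]
  (field, _ : rest) -> field : splitOn sep rest

-- Untyped view: every record is just a list of fields.
rawRows :: String -> [[String]]
rawRows = map (splitOn ',') . lines

-- Typed view: each row either converts to a Person or yields a message.
toPerson :: [String] -> Either String Person
toPerson [n, a] = case readMaybe a of
  Just v  -> Right (Person n v)
  Nothing -> Left ("age is not a number: " ++ show a)
toPerson row = Left ("expected 2 fields, got " ++ show (length row))

main :: IO ()
main = mapM_ (print . toPerson) (rawRows "John,27\nJane,twenty\n")
```

Because both `lines` and `map` are lazy, rows are converted one at a time as they are printed, which loosely mirrors the lazy and streaming modes mentioned above.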

-- Johan

Ivan Lazar Miljenovic | 25 Feb 12:23 2013

Re: ANN: lazy-csv - the fastest and most space-efficient parser for CSV

On 25 February 2013 21:47, Malcolm Wallace <malcolm.wallace <at> me.com> wrote:
> There are lots of Haskell CSV parsers out there.  Most have poor error-reporting, and do not scale to large
> inputs.  I am pleased to announce an industrial-strength library that is robust, fast, space-efficient,
> lazy, and scales to gigantic inputs with no loss of performance.
>
>     http://code.haskell.org/lazy-csv/
>
> Downloads from Hackage:
>
>     http://hackage.haskell.org/package/lazy-csv

Note that on your website, you list the Hackage URL as having
"packages" rather than "package"...

>
> This library has been in industrial use for several years now, but this is the first public release.  No
> doubt the API is not as general as it could be, but it already serves many purposes very well.  I'm happy to
> receive bug reports and suggestions for improvements.
>
> Regards,
>     Malcolm

-- 
Ivan Lazar Miljenovic
Ivan.Miljenovic <at> gmail.com
http://IvanMiljenovic.wordpress.com
