Andrew Cowie | 18 Sep 03:43 2013

Telling Cassava to ignore lines

I'm happily using Cassava to parse CSV, only to discover that
non-conforming lines in the input data are causing the parser to error
out.

    let e = decodeByName y' :: Either String (Header, Vector Person)

chugs along fine until line 461 of the input when 

	"parse error (endOfInput) at ..."

Ironically when my Person (ha) data type was all fields of :: Text it
just worked, but now that I've specified one or two of the fields as Int
or Float or whatever, it's mis-parsing.

Is there a way to tell it to just ignore lines that don't parse, rather
than it killing the whole run? Cassava understands skipping the *header*
line (and indeed using it to do the -by-name field mapping).

Otherwise the only thing I can see is going back to all the fields
being :: Text, and then running over that as an intermediate structure
and validating whether or not things parse to e.g. Float.
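
A rough sketch of that fallback, assuming hypothetical RawPerson and Person types (the real Person fields are not shown in this message) and using Data.Text.Read from the text package for the validation step. The CSV decoding itself is left out; the rows are built by hand here:

```haskell
import Data.Maybe (mapMaybe)
import Data.Text (Text)
import qualified Data.Text as T
import Data.Text.Read (Reader, decimal, double, signed)

-- Hypothetical row as decoded with every field left as Text.
data RawPerson = RawPerson
    { rawName   :: Text
    , rawAge    :: Text
    , rawHeight :: Text
    }

-- Hypothetical validated row with typed fields.
data Person = Person
    { name   :: Text
    , age    :: Int
    , height :: Double
    } deriving (Show, Eq)

-- Run a Data.Text.Read reader and accept the result only if it
-- consumed the whole (whitespace-stripped) field.
readWhole :: Reader a -> Text -> Maybe a
readWhole reader t = case reader (T.strip t) of
    Right (x, rest) | T.null rest -> Just x
    _                             -> Nothing

-- Nothing if any numeric field fails to parse.
validate :: RawPerson -> Maybe Person
validate (RawPerson n a h) =
    Person n <$> readWhole (signed decimal) a <*> readWhole double h

main :: IO ()
main = do
    let rows =
            [ RawPerson (T.pack "Alice") (T.pack "30") (T.pack "1.70")
            , RawPerson (T.pack "Bob") (T.pack "unknown") (T.pack "1.82")
            ]
    -- Rows that fail validation are dropped rather than aborting the run.
    mapM_ print (mapMaybe validate rows)
```

The cost is a second pass over the data and a second record type, which is why a way to skip bad lines during decoding would be preferable.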

AfC
Sydney
Johan Tibell | 18 Sep 04:03 2013

Re: Telling Cassava to ignore lines

Hi,

It depends on what you mean by "doesn't parse". From your message I assume the CSV is valid, but some of the actual values fail to convert (using FromField). There are a couple of things you could try:

 1. Define a newtype for your field that calls runParser using e.g. the Int parser and if it fails, return some other value. I should probably add an Either instance that covers this case, but there's none there now.

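import Data.Csv (FromField (..), Parser, runParser)

data MaybeInt = JustI !Int | ParseFailed

instance FromField MaybeInt where
    -- Run the stock Int parser; fall back to ParseFailed instead of failing the record.
    parseField s = case runParser (parseField s :: Parser Int) of
        Left _  -> pure ParseFailed
        Right n -> pure (JustI n)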

(This is from memory, so I might have gotten some of the details wrong.)

 2. Use the Streaming module, which lets you skip whole records that fail to parse (see the docs for the Cons constructor).
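
A minimal sketch of option 2, assuming a hypothetical Person with name and age columns (the real field names are not given in this thread) and a small inline input. It walks the Records stream by hand, matching on Cons and Nil, and sets aside the records that fail to convert:

```haskell
{-# LANGUAGE OverloadedStrings #-}

import qualified Data.ByteString.Lazy.Char8 as BL
import Data.Csv (FromNamedRecord (..), (.:))
import Data.Csv.Streaming (Records (..), decodeByName)
import Data.Text (Text)

-- Hypothetical record type; substitute the real Person fields.
data Person = Person
    { name :: !Text
    , age  :: !Int
    } deriving (Show, Eq)

instance FromNamedRecord Person where
    parseNamedRecord m = Person <$> m .: "name" <*> m .: "age"

-- Walk the lazy Records stream, keeping converted rows and collecting
-- the error message of every row that failed to convert.
partitionRecords :: Records a -> ([String], [a])
partitionRecords (Cons (Left err) rest) =
    let (errs, xs) = partitionRecords rest in (err : errs, xs)
partitionRecords (Cons (Right x) rest) =
    let (errs, xs) = partitionRecords rest in (errs, x : xs)
partitionRecords (Nil (Just err) _) = ([err], [])  -- the CSV itself was malformed
partitionRecords (Nil Nothing _)    = ([], [])

main :: IO ()
main = do
    let input = "name,age\nAlice,30\nBob,unknown\nCarol,41\n" :: BL.ByteString
    case decodeByName input of
        Left err -> putStrLn ("header error: " ++ err)
        Right (_, records) -> do
            let (errs, people) = partitionRecords (records :: Records Person)
            mapM_ print people
            putStrLn (show (length errs) ++ " record(s) skipped")
```

Matching on the constructors directly, rather than folding, keeps the error messages around so you can log which lines were dropped.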

-- Johan



_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe <at> haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Andrew Cowie | 18 Sep 05:30 2013

Re: Telling Cassava to ignore lines

On Tue, 2013-09-17 at 19:03 -0700, Johan Tibell wrote:

>  2. Use the Streaming module, which lets you skip whole records that
> fail to parse (see the docs for the Cons constructor).

Ah, that's sure to be it. Totally missed Data.Csv.Streaming. Thanks!

AfC
Sydney
