Slava Pestov | 6 May 00:49

Bug in CSV parser

Hi Phil,

Try parsing the following:

"foo \"bar\" bar" <string-reader> csv

Also perhaps lines starting with # should be ignored as comments?

Slava

-------------------------------------------------------------------------
This SF.net email is sponsored by the 2008 JavaOne(SM) Conference 
Don't miss this year's exciting event. There's still time to save $100. 
Use priority code J8TL2D2. 
http://ad.doubleclick.net/clk;198757673;13503038;p?http://java.sun.com/javaone
Phil Dawes | 6 May 07:40

Re: Bug in CSV parser

Hi Slava,

Ok - will check it out...

Cheers,

Phil

Slava Pestov wrote:
> Hi Phil,
> 
> Try parsing the following:
> 
> "foo \"bar\" bar" <string-reader> csv
> 
> Also perhaps lines starting with # should be ignored as comments?
> 
> Slava
> 
> _______________________________________________
> Factor-talk mailing list
> Factor-talk@...
> https://lists.sourceforge.net/lists/listinfo/factor-talk
> 


Phil Dawes | 6 May 10:05

Re: Bug in CSV parser

Actually the problem is that this isn't valid CSV, at least according to
the RFC and the Wikipedia page IIRC (am on a bus so can't check this at
the moment).
If there's a quote in the field, the whole field must be quoted and each
embedded quote escaped by doubling it (""), e.g. try:

"\"foo \"\"bar\"\" bar\"" <string-reader> csv

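For comparison outside Factor, here is a minimal sketch of the same quoting rule using Python's standard csv module. The choice of Python is an assumption made purely for illustration; it is not the Factor csv vocabulary under discussion. The module follows the same convention (it matches RFC 4180): the whole field is quoted and each embedded quote is doubled.

```python
import csv
import io

# Valid CSV: the whole field is quoted and each embedded quote is doubled.
valid = '"foo ""bar"" bar"'
rows = list(csv.reader(io.StringIO(valid)))

# Writing goes the other way: the writer quotes the field and doubles the quotes.
out = io.StringIO()
csv.writer(out, lineterminator="\n").writerow(['foo "bar" bar'])
encoded = out.getvalue()

print(rows)     # one row holding one field: foo "bar" bar
print(encoded)  # the field as it appears on disk
```

The unquoted form with backslash-escaped quotes from the original report is not covered by this convention, so a conforming parser is free to handle it differently.
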
Cheers,

Phil

Phil Dawes wrote:
> Hi Slava,
> 
> Ok - will check it out...
> 
> Cheers,
> 
> Phil
> 
> Slava Pestov wrote:
>> Hi Phil,
>>
>> Try parsing the following:
>>
>> "foo \"bar\" bar" <string-reader> csv
>>
>> Also perhaps lines starting with # should be ignored as comments?
>>

Slava Pestov | 6 May 11:18

Re: Bug in CSV parser

Fair enough.

To put this in context, I'm working on extra/geo-ip. It doesn't work
properly yet, but the idea is to parse the data at
http://software77.net/cgi-bin/ip-country/geo-ip.pl?action=download
to implement an IP geolocation tool for parsing log files, etc.

Slava

On May 6, 2008, at 3:05 AM, Phil Dawes wrote:

> Actually the problem is that this isn't valid csv, at least  
> according to
> the rfc and wikipedia page IIRC (am on a bus so can't check this at  
> the
> moment).
> If there's a quote in the field, the whole field must be quoted and  
> the
> quotes escaped '""', e.g. try:
>
> "\"foo \"\"bar\"\" bar\"" <string-reader> csv
>
> Cheers,
>
> Phil


Stefan Scholl | 6 May 08:20

Re: Bug in CSV parser

Slava Pestov <slava@...> wrote:
> Also perhaps lines starting with # should be ignored as comments?

I haven't seen comments in any csv documentation so far. This
kind of in-band signalling should be handled by the application
itself and not by the library.

-- 
Web (en): http://www.no-spoon.de/ -*- Web (de): http://www.frell.de/

Slava Pestov | 6 May 11:17

Re: Bug in CSV parser

This would be a good approach if the csv parser had an alternative  
entry point taking an array of strings.

Then I could do

"foo.txt" ascii file-lines [ "#" head? not ] filter csv-lines

Slava

On May 6, 2008, at 1:20 AM, Stefan Scholl wrote:

> I haven't seen comments in any csv documentation so far. This
> kind of in-band signalling should be handled by the application
> itself and not by the library.

Stefan Scholl | 6 May 15:37

Re: Bug in CSV parser

Or the application just ignores every row with the first field
starting with "#".

"123,456,789\n#stuff\n111,222,333" <string-reader> csv
    [ first "#" head? not ] filter .
{ { "123" "456" "789" } { "111" "222" "333" } }

! Or:

: process-row ( seq -- n )
    dup first "#" head? not
    [ 0 [ string>number + ] reduce ]
    [ 0 ] if ;

"123,456,789\n#stuff\n111,222,333" <string-reader> csv [ process-row ] map .
{ 1368 0 666 }

Slava Pestov <slava@...> wrote:
> This would be a good approach if the csv parser had an alternative  
> entry point taking an array of strings.
> 
> Then I could do
> 
> "foo.txt" ascii file-lines [ "#" head? not ] filter csv-lines
> 
> Slava
> 
> On May 6, 2008, at 1:20 AM, Stefan Scholl wrote:
> 
>> I haven't seen comments in any csv documentation so far. This

Slava Pestov | 6 May 15:41

Re: Bug in CSV parser

This won't work if the comment text itself contains a quoted string  
"foo", because then Phil's code simply returns "foo" as the content of  
that line.
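
The difference between filtering before and after parsing can be sketched as follows. This is an illustration in Python's standard csv module, not Factor code, and `parse_skipping_comments` is a hypothetical helper name. It plays the role of the `csv-lines` entry point proposed earlier in the thread: comment lines are dropped as raw text, so the parser never sees any quotes they contain.

```python
import csv

def parse_skipping_comments(lines):
    """Drop raw lines that start with '#', then parse the rest as CSV."""
    data_lines = (line for line in lines if not line.startswith("#"))
    # csv.reader accepts any iterable of strings, one per line.
    return list(csv.reader(data_lines))

# The comment line contains a quoted string and a comma; it is removed
# before parsing, so neither can affect the result.
sample = ['123,456,789', '# note: "foo", see below', '111,222,333']
rows = parse_skipping_comments(sample)
print(rows)
```

One limitation of this sketch: a quoted field that spans several lines could have a continuation line beginning with "#", and that line would be dropped incorrectly.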

Slava

On May 6, 2008, at 8:37 AM, Stefan Scholl wrote:

> Or the application just ignores every row with the first field
> starting with "#".
>
>
> "123,456,789\n#stuff\n111,222,333" <string-reader> csv
>    [ first "#" head? not ] filter .
> { { "123" "456" "789" } { "111" "222" "333" } }
>
>
> ! Or:
>
> : process-row ( seq -- n )
>    dup first "#" head? not
>    [ 0 [ string>number + ] reduce ]
>    [ 0 ] if ;
>
> "123,456,789\n#stuff\n111,222,333" <string-reader> csv [ process-row ] map .
> { 1368 0 666 }
>
>
> Slava Pestov <slava@...> wrote:

Phil Dawes | 6 May 18:36

Re: Bug in CSV parser

:)
How about an abstraction that lets you apply a quot to each element of a 
stream? (I'm still thinking about stream/collection unification, but in 
the meantime...)

Slava Pestov wrote:
> This won't work if the comment text itself contains a quoted string  
> "foo", because then Phil's code simply returns "foo" as the content of  
> that line.
> 
> Slava
> 
> On May 6, 2008, at 8:37 AM, Stefan Scholl wrote:
> 
>> Or the application just ignores every row with the first field
>> starting with "#".
>>
>>
>> "123,456,789\n#stuff\n111,222,333" <string-reader> csv
>>    [ first "#" head? not ] filter .
>> { { "123" "456" "789" } { "111" "222" "333" } }
>>
>>
>> ! Or:
>>
>> : process-row ( seq -- n )
>>    dup first "#" head? not
>>    [ 0 [ string>number + ] reduce ]
>>    [ 0 ] if ;
>>

Stefan Scholl | 6 May 20:50

Re: Bug in CSV parser

One other idea: could streams be extended so that you can "pipe"
them through a quotation?

Something that reads from an input stream and "emits" data into
an output stream?
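
A rough sketch of what such a pipe might look like, written in Python rather than Factor purely for illustration. The names `pipe` and `drop_comments` are hypothetical and do not correspond to any existing Factor word. The transform may emit zero, one, or several items per input line, which is what lets it act as a filter.

```python
import io

def pipe(instream, outstream, transform):
    """Read lines from instream and write whatever transform emits for each."""
    for line in instream:
        for emitted in transform(line):
            outstream.write(emitted)

def drop_comments(line):
    # Emit the line unchanged unless it starts with '#'.
    if not line.startswith("#"):
        yield line

source = io.StringIO("123,456,789\n#stuff\n111,222,333\n")
sink = io.StringIO()
pipe(source, sink, drop_comments)
filtered = sink.getvalue()
print(filtered)
```

The filtered output stream could then be handed to the CSV parser as if it were the original input.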

Slava Pestov <slava@...> wrote:
> This would be a good approach if the csv parser had an alternative  
> entry point taking an array of strings.
> 
> Then I could do
> 
> "foo.txt" ascii file-lines [ "#" head? not ] filter csv-lines
> 
> Slava
> 
> On May 6, 2008, at 1:20 AM, Stefan Scholl wrote:
> 
>> I haven't seen comments in any csv documentation so far. This
>> kind of in-band signalling should be handled by the application
>> itself and not by the library.

-- 
Web (en): http://www.no-spoon.de/ -*- Web (de): http://www.frell.de/

Re: Bug in CSV parser

Slava Pestov <slava@...> wrote:
> This would be a good approach if the csv parser had an alternative
>  entry point taking an array of strings.

Personally, I'd think a stream/pipe processing API would be more
general. That would work both with your comment/csv mashup, and with a
more complex mashup where the two components don't always agree on
what a "record" is (for example, one adding SQL statements onto a C
syntax, or one adding C++ comments into an old C parser). The point is
that both CSV and # comments have the same idea of when a record ends
-- the end of the current line. In general that neat coincidence
doesn't hold.

>  Slava

-Wm
