Evan Laforge | 15 Mar 20:29 2013
Picon

attoparsec and backtracking

I have a couple of problems with attoparsec which I think are related
to its always backtrack nature, but maybe there's some other way to
solve the same problems.

The first is that it's hard to get the right error msg out.  For
instance, I have a parser that tries to parse a number with an
optional type suffix.  It's an error if the suffix is unrecognized:

p_num :: A.Parser Score.TypedVal
p_num = do
    num <- p_untyped_num
    suffix <- A.option "" ((:"") <$> A.letter_ascii)
    case Score.code_to_type suffix of
        Nothing -> fail $ "p_num expected suffix in [cdsr]: " ++ show suffix
        Just typ -> return $ Score.Typed typ num

However, which error msg shows up depends on the order of the (<|>)
alternatives, and in general the global structure of the entire
parser, because I think it just backtracks and then picks the last
failing backtrack.  Even after carefully rearranging all the parsers
it seems impossible to get this particular error to bubble up to the
top.  The thing is, as soon as I see an unexpected suffix I know I can
fail entirely right there, with exactly that error msg, but since
there's no way to turn off backtracking I think there's no way to do
that.

The second thing is that I want to lex a single token.  I thought I
could just parse a term and then see how far I got and then use that
index to splitAt the input.  But attoparsec doesn't keep track of the
current index, so I wrote ((,) <$> p_term <*> A.takeByteString), and
(Continue reading)

wren ng thornton | 16 Mar 00:26 2013

Re: attoparsec and backtracking

On 3/15/13 3:29 PM, Evan Laforge wrote:
> However, which error msg shows up depends on the order of the (<|>)
> alternatives, and in general the global structure of the entire
> parser, because I think it just backtracks and then picks the last
> failing backtrack.  Even after carefully rearranging all the parsers
> it seems impossible to get this particular error to bubble up to the
> top.  The thing is, as soon as I see an unexpected suffix I know I can
> fail entirely right there, with exactly that error msg, but since
> there's no way to turn off backtracking I think there's no way to do
> that.

I had some similar issues recently. The trick is figuring out how to
convince attoparsec to commit to a particular alternative. For example,
consider the grammar: A (B A)* C; where if the B succeeds then we want to
commit to parsing an A (and if it fails then return A's error, not C's).

To simplify things, let's drop the leading A since it's not part of the
problem. And let's try to parse an invalid string like "BX" (or "BABX").
The key point is that,

    bad = (pB *> pure (:) <*> pA <*> bad) <|> (pC *> pure [])

is different than,

    good = do
        e <- eitherP pB pC -- (Left <$> pB) <|> (Right <$> pC)
        case e of
            Left  _ -> (:) <$> pA <*> good
            Right _ -> pure []

(Continue reading)

Erik de Castro Lopo | 16 Mar 02:54 2013

Re: attoparsec and backtracking

Evan Laforge wrote:

> The first is that it's hard to get the right error msg out.  For
> instance, I have a parser that tries to parse a number with an
> optional type suffix.  It's an error if the suffix is unrecognized:
> 
> p_num :: A.Parser Score.TypedVal
> p_num = do
>     num <- p_untyped_num
>     suffix <- A.option "" ((:"") <$> A.letter_ascii)
>     case Score.code_to_type suffix of
>         Nothing -> fail $ "p_num expected suffix in [cdsr]: " ++ show suffix
>         Just typ -> return $ Score.Typed typ num

I think the mistake here is to parse something and then decide if
its it valid. It should be the parser which decides whether its
valid. So rather than:

     suffix <- A.option "" ((:"") <$> A.letter_ascii)

try:

     typ <- A.choice [ {- list or valid suffix parsers -} ]
     return $ Score.Typed typ num

> However, which error msg shows up depends on the order of the (<|>)
> alternatives, and in general the global structure of the entire
> parser, because I think it just backtracks and then picks the last
> failing backtrack.

(Continue reading)

Ivan Lazar Miljenovic | 16 Mar 04:26 2013
Picon

Re: attoparsec and backtracking

On 16 March 2013 12:54, Erik de Castro Lopo <mle+hs <at> mega-nerd.com> wrote:
> Evan Laforge wrote:
>
>> The first is that it's hard to get the right error msg out.  For
>> instance, I have a parser that tries to parse a number with an
>> optional type suffix.  It's an error if the suffix is unrecognized:
>>
>> p_num :: A.Parser Score.TypedVal
>> p_num = do
>>     num <- p_untyped_num
>>     suffix <- A.option "" ((:"") <$> A.letter_ascii)
>>     case Score.code_to_type suffix of
>>         Nothing -> fail $ "p_num expected suffix in [cdsr]: " ++ show suffix
>>         Just typ -> return $ Score.Typed typ num
>
> I think the mistake here is to parse something and then decide if
> its it valid. It should be the parser which decides whether its
> valid. So rather than:
>
>      suffix <- A.option "" ((:"") <$> A.letter_ascii)
>
> try:
>
>      typ <- A.choice [ {- list or valid suffix parsers -} ]
>      return $ Score.Typed typ num
>
>> However, which error msg shows up depends on the order of the (<|>)
>> alternatives, and in general the global structure of the entire
>> parser, because I think it just backtracks and then picks the last
>> failing backtrack.
(Continue reading)

Niklas Hambüchen | 16 Mar 04:49 2013

Re: attoparsec and backtracking

Is it not possible to add an alternative (no pun intended) to <|> that
supports the semantics Evan wants?

I would agree that what attoparsec does for <|> of Alternative and mplus
for MonadPlus is correct since e.g. the mplus laws say that a failure
must be identity and therefore the following alternatives must be
considered. I also find it very convenient that attoparsec works this
way, and prefer it to what parsec does by default.

However, I do not see why attoparsec cannot have a function <||> that on
failure with consumed input does not evaluate the remaining alternatives.

On 16/03/13 01:54, Erik de Castro Lopo wrote:
> Evan Laforge wrote:
>> However, which error msg shows up depends on the order of the (<|>)
>> alternatives, and in general the global structure of the entire
>> parser, because I think it just backtracks and then picks the last
>> failing backtrack.
> 
> I'm not sure if what I've offered will help, but its worth a try.
Roman Cheplyaka | 16 Mar 09:40 2013

Re: attoparsec and backtracking

* Niklas Hambüchen <mail <at> nh2.me> [2013-03-16 03:49:29+0000]
> I would agree that what attoparsec does for <|> of Alternative and mplus
> for MonadPlus is correct since e.g. the mplus laws say that a failure
> must be identity and therefore the following alternatives must be
> considered. I also find it very convenient that attoparsec works this
> way, and prefer it to what parsec does by default.

empty/mzero are indeed identities in Parsec.

What doesn't hold is the law

   v >> mzero = mzero

But this one is often violated:

  > flip runState 0 $ runMaybeT mzero
  (Nothing,0)

  > flip runState 0 $ runMaybeT $ lift (modify (+1)) >> mzero
  (Nothing,1)

Roman
Evan Laforge | 17 Mar 03:02 2013
Picon

Re: attoparsec and backtracking

On Fri, Mar 15, 2013 at 8:49 PM, Niklas Hambüchen <mail <at> nh2.me> wrote:
> Is it not possible to add an alternative (no pun intended) to <|> that
> supports the semantics Evan wants?

I assume it's the performance thing.  Presumably it would need to pass
an extra flag with to the failure continuation to tell it to not
retry, though that doesn't sound so bad.  Actually, Bryan's response
here:

https://github.com/bos/attoparsec/issues/42

makes it sound like he's not opposed to <||> on performance grounds,
just that <|> is more intuitive.  I agree in general, but not in the
case of error msgs!  So maybe I just need to see if I can make a patch
to add <||>, sounds like Johan at least would be into that.

_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe <at> haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe
Niklas Hambüchen | 17 Mar 03:41 2013

Re: attoparsec and backtracking

 <at> Evan Thanks for that link, I posted a somewhat longer argument in 
there.

Personally, I'd love <||>.

On Sun 17 Mar 2013 02:02:44 GMT, Evan Laforge wrote:
> On Fri, Mar 15, 2013 at 8:49 PM, Niklas Hambüchen <mail <at> nh2.me> wrote:
>> Is it not possible to add an alternative (no pun intended) to <|> that
>> supports the semantics Evan wants?
>
> I assume it's the performance thing.  Presumably it would need to pass
> an extra flag with to the failure continuation to tell it to not
> retry, though that doesn't sound so bad.  Actually, Bryan's response
> here:
>
> https://github.com/bos/attoparsec/issues/42
>
> makes it sound like he's not opposed to <||> on performance grounds,
> just that <|> is more intuitive.  I agree in general, but not in the
> case of error msgs!  So maybe I just need to see if I can make a patch
> to add <||>, sounds like Johan at least would be into that.

_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe <at> haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe
Evan Laforge | 17 Mar 02:48 2013
Picon

Re: attoparsec and backtracking

> I think the mistake here is to parse something and then decide if
> its it valid. It should be the parser which decides whether its
> valid. So rather than:
>
>      suffix <- A.option "" ((:"") <$> A.letter_ascii)
>
> try:
>
>      typ <- A.choice [ {- list or valid suffix parsers -} ]
>      return $ Score.Typed typ num

I actually had that originally, but but switched to fail-after for the
better error msg.  It worked with parsec, but then I lost it again
when I switched to attoparsec.  I think Wren is right, I really would
need to refactor the parser to put the decisions in the right spot.

> We you using Parsec as a token parser or as a Char parser. Obviously
> the second is going to be slow in comparison to the first.

It was actually the Text version of parsec, back when that was new.  I
should go do a profile again someday, but since attoparsec and parsec
APIs are almost but not quite the same, it's kind of a pain.  I
actually tried the Text version of attoparsec, back when that was not
yet integrated into attoparsec itself, and bytestring was still
significantly faster.  So I don't know how much was Text vs.
ByteString and how much was parsec vs. attoparsec.
oleg | 19 Mar 07:52 2013

Re: attoparsec and backtracking


Wren Thornton wrote:
> I had some similar issues recently. The trick is figuring out how to
> convince attoparsec to commit to a particular alternative. For example,
> consider the grammar: A (B A)* C; where if the B succeeds then we want to
> commit to parsing an A (and if it fails then return A's error, not C's).

Indeed. Consider the following (greatly simplified) fragment from the
OCaml grammar

        | "let"; r = opt_rec; bi = binding; "in";
           x = expr LEVEL ";" ->
        | "function"; a = match_case ->
        | "if"; e1 = SELF; "then"; e2 = expr LEVEL "top";
          "else"; e3 = expr LEVEL "top" ->
...
        | "false" -> 
        | "true"  -> 

It would be bizarre if the parser -- upon seeing "if" but not finding
"then" -- would've reported the error that `found "if" when "true" was
expected'. Many people would think that when the parser comes across
"if", it should commit to parsing the conditional. And if it fails later, it
should report the error with the conditional, rather than trying to
test how else the conditional cannot be parsed. This is exactly the
intuition of pattern matching. For example, given

> foo ("if":t) = case t of
>                  (e:"then":_) -> e
> foo _ = ""
(Continue reading)


Gmane