Mateusz Kowalczyk | 31 Jul 00:35 2013

Haddock GSOC project progress

Greetings cafe,

As some of you might know, I'm hacking on Haddock as part of Google
Summer of Code. I was recently advised to create a blog and document
some of what I have been doing recently. You can find the blog at [1] if
you're interested. The first post goes over the work from the last month
or so. Future posts should be shorter and on more specific topics.
There's an overview of what has happened/changed/will change at the
bottom of the post if you're short on time.

Thanks.

[1] - http://fuuzetsu.co.uk/blog

-- 
Mateusz K.
Attachment (0x2ADA9A97.asc): application/pgp-keys, 5619 bytes
_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe <at> haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe
Roman Cheplyaka | 31 Jul 07:37 2013

Re: Haddock GSOC project progress

Hi Mateusz,

This looks great — I'm especially excited about "List entries no longer
have to be separated by empty lines"!

However, the decision to use Attoparsec (instead of Parsec, say) strikes
me as a bit odd, as it wasn't intended for parsing source code. In
particular, I'm concerned with error messages this parser would produce.

Roman

* Mateusz Kowalczyk <fuuzetsu <at> fuuzetsu.co.uk> [2013-07-30 23:35:45+0100]
> Greetings cafe,
> 
> As some of you might know, I'm hacking on Haddock as part of Google
> Summer of Code. I was recently advised to create a blog and document
> some of what I have been doing recently. You can find the blog at [1] if
> you're interested. The first post goes over the work from the last month
> or so. Future posts should be shorter and on more specific topics.
> There's an overview of what has happened/changed/will change at the
> bottom of the post if you're short on time.
> 
> Thanks.
> 
> [1] - http://fuuzetsu.co.uk/blog


Mateusz Kowalczyk | 31 Jul 08:39 2013

Re: Haddock GSOC project progress

On 31/07/13 06:37, Roman Cheplyaka wrote:
> Hi Mateusz,
>
> This looks great — I'm especially excited about "List entries no longer
> have to be separated by empty lines"!
Glad to hear that.

>
> However, the decision to use Attoparsec (instead of Parsec, say) strikes
> me as a bit odd,
Parsec has a dependency on Data.Text that you can't easily get rid of.
With Attoparsec, I was able to simply get rid of the modules I was not
interested in (anything with Text) and only keep the ByteString part.

> as it wasn't intended for parsing source code.
We're not parsing source code. As I mention, we get comment content out
from GHC and parse the markup there.

> In particular, I'm concerned with error messages this parser would produce.
Currently there exist only two error messages: one for when module
header parsing fails and another one for when parsing of anything else
fails. Currently the parsing functions have the type
‘DynFlags -> String -> Maybe (Doc RdrName)’ and if we get out Nothing
then you get a generic error message and no guidance. This is also the
current behaviour.
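
A minimal sketch of that shape, for illustration only. DynFlags and Doc
below are stand-in types (the real ones come from GHC and Haddock), and
parseComment and render are hypothetical names with a toy rule, not the
actual Haddock parser. The point is only that a Maybe result carries no
position or reason, so the caller can do no better than one fixed message.

```haskell
-- Stand-ins for GHC's DynFlags and Haddock's Doc RdrName (assumptions).
data DynFlags = DynFlags
newtype Doc = DocString String deriving (Eq, Show)

-- Hypothetical parser with the same shape as the type quoted above.
-- Toy rule: the '@' delimiters must come in pairs, otherwise we fail.
parseComment :: DynFlags -> String -> Maybe Doc
parseComment _ s
  | even (length (filter (== '@') s)) = Just (DocString s)
  | otherwise = Nothing

-- Nothing says neither where nor why parsing failed, so every failure
-- collapses into the same generic message.
render :: DynFlags -> String -> Either String Doc
render flags s =
  maybe (Left "parse error in doc string") Right (parseComment flags s)
```

Moving to an Either-returning parser would be the usual way to carry a
reason along, at the cost of changing that type.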

Now, I agree that this sounds horrible BUT in actuality, there's not

Mats Rauhala | 31 Jul 09:21 2013

Re: Haddock GSOC project progress

Is Data.Text as an extra dependency really that bad? Remember that you
are parsing comments, prose, human produced text, where Data.Text is way
more useful than ByteString.

-- 
Mats Rauhala
MasseR
Mateusz Kowalczyk | 31 Jul 09:40 2013

Re: Haddock GSOC project progress

On 31/07/13 08:21, Mats Rauhala wrote:
> Is Data.Text as an extra dependency really that bad? Remember that you
> are parsing comments, prose, human produced text, where Data.Text is way
> more useful than ByteString.
> 
It has to come with GHC boot packages and it currently doesn't. I have
updated my post accordingly to mention it.

ByteString indeed has its problems (I have to be quite careful to make
sure Unicode doesn't get mangled) but that's just how it is at the
moment. If Text ever makes it in, the transition will be trivial. We're
not doing anything fancy to the text we get out anyway so any
performance difference it might bring is negligible. The only difference
I can think of would be that we would no longer have to worry about
preserving Unicode by hand.

It's an inconvenience, but that's about it. Nothing mission-critical.
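
To illustrate the kind of mangling meant above, here is a minimal sketch
using only the bytestring package. It is an assumption that the Char8
interface is where the danger lies in practice; which ByteString
functions the Haddock branch actually uses is not stated in this thread.
The names truncated and utf8Bytes are made up for the example.

```haskell
import qualified Data.ByteString.Builder as B
import qualified Data.ByteString.Char8 as BC
import qualified Data.ByteString.Lazy as BL
import Data.Word (Word8)

-- Char8.pack keeps only the low 8 bits of every Char, so anything
-- outside Latin-1 comes back as a different character.
truncated :: String -> String
truncated = BC.unpack . BC.pack

-- Encoding explicitly as UTF-8 keeps the full code point, at the cost
-- of having to decode by hand on the way out.
utf8Bytes :: String -> [Word8]
utf8Bytes = BL.unpack . B.toLazyByteString . B.stringUtf8
```

For example, truncated "\955" (a lambda, U+03BB) gives "\187" (U+00BB),
while utf8Bytes "\955" gives the two bytes 0xCE 0xBB.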

-- 
Mateusz K.
Simon Hengel | 31 Jul 10:16 2013

Re: Haddock GSOC project progress

Hi Roman,

> However, the decision to use Attoparsec (instead of Parsec, say)
> strikes me as a bit odd, as it wasn't intended for parsing source
> code. In particular, I'm concerned with error messages this parser
> would produce.

In addition to what Mateusz already said, I want to briefly summarize my
justification for using Attoparsec:

 * Attoparsec's backtracking behavior is much easier to work with than
   Parsec's

 * There is no such thing as a parse error in Markdown, and I think we
   should try to make this true for Haddock markup, too
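
As a rough illustration of the second point, here is a sketch of a total
inline parser for a toy markup in which /text/ means emphasis. It is
plain Haskell rather than Attoparsec, and Inline and parseInline are
made-up names, not Haddock's. An unmatched delimiter falls back to
literal text, so every input produces some document and there is no
failure case to report.

```haskell
-- Inline nodes for a toy markup where /text/ means emphasis.
data Inline = Plain String | Emph String deriving (Eq, Show)

-- Total: every input string yields a result, never a parse error.
parseInline :: String -> [Inline]
parseInline = merge . go
  where
    go [] = []
    go ('/' : rest) =
      case break (== '/') rest of
        -- A closing delimiter with a non-empty body: emphasis.
        (body, '/' : after) | not (null body) -> Emph body : go after
        -- Otherwise keep the slash literally and carry on.
        _ -> Plain "/" : go rest
    go s =
      let (txt, rest) = break (== '/') s
      in Plain txt : go rest

    -- Join adjacent Plain nodes produced by the fallback case.
    merge (Plain a : Plain b : xs) = merge (Plain (a ++ b) : xs)
    merge (x : xs) = x : merge xs
    merge [] = []
```

Here parseInline "a /b/ c" gives [Plain "a ", Emph "b", Plain " c"],
while the unterminated "a /b" comes back unchanged as [Plain "a /b"].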

Cheers,
Simon
Richard A. O'Keefe | 1 Aug 01:14 2013

Re: Haddock GSOC project progress


On 31/07/2013, at 8:16 PM, Simon Hengel wrote:
> 
> * There is no such thing as a parse error in Markdown, and I think we
>   should try to make this true for Haddock markup, too

It is very far from clear that this is a virtue in Markdown.
In trying to learn Markdown, I found it an excessively tiresome
defect.  Whenever I was trying to learn how to produce some
combination of effects, instead of Markdown telling me
"at THIS point you had something I wasn't expecting", it would
just produce incorrect output, defined as "anything other than
what I intended".  It also meant that two different Markdown
processors would accept the same text silently but do different
things with it.

This is one of the reasons I won't use Markdown.
S. Doaitse Swierstra | 5 Aug 17:30 2013

Re: Haddock GSOC project progress

Why not use uu-parsinglib, which will tell you what is wrong and nevertheless will continue parsing? 

Currently Jacco Krijnen is working on an extensible version of Pandoc, based on the AspectAG and the Murder
packages, so you can define your own plugins for syntax and semantics.

  Doaitse Swierstra

On Aug 1, 2013, at 1:14 , Richard A. O'Keefe <ok <at> cs.otago.ac.nz> wrote:

> 
> On 31/07/2013, at 8:16 PM, Simon Hengel wrote:
>> 
>> * There is no such thing as a parse error in Markdown, and I think we
>>  should try to make this true for Haddock markup, too
> 
> It is very far from clear that this is a virtue in Markdown.
> In trying to learn Markdown, I found it an excessively tiresome
> defect.  Whenever I was trying to learn how to produce some
> combination of effects, instead of Markdown telling me
> "at THIS point you had something I wasn't expecting", it would
> just produce incorrect output, defined as "anything other than
> what I intended".  It also meant that two different Markdown
> processors would accept the same text silently but do different
> things with it.
> 
> This is one of the reasons I won't use Markdown.