Jonathan Lee | 8 Feb 19:42

Macro preprocessor

Hello everyone,

I am very new to SableCC, so please bear with me.  I would like to
implement an Erlang parser, and I've written the main grammar.
However, Erlang supports preprocessing.  For example, macros can be
defined and applied as:

  -define(MACRO, "abc").
  f() -> ?MACRO.

I am trying to understand how to create a preprocessor that integrates
with SableCC.  I've looked at the Java 1.5 unicode preprocessor, but
that is much simpler than what I need -- it simply replaces escaped
tokens as they occur.

I've thought of writing a preprocessor grammar, parsing the text once,
modifying the AST, outputting back to a string, and finally reparsing
with the main grammar.  But my gut feel is that that's a poor
approach, especially with the need to output an intermediate string.

Is there a standard way of doing this, or an example of this anywhere?
Is there a simple way to chain a preprocessor and parser together in
SableCC?

Thanks for your help!

Jonathan
Etienne M. Gagnon | 9 Feb 03:17

Re: Macro preprocessor

Hi Jonathan,

I did some reading of the Erlang specification (http://www.erlang.org/download/erl_spec47.ps.gz). The steps for compiling Erlang are enumerated on page 107.

The Erlang preprocessor does not operate on a character stream (i.e. strings). Instead, it operates on a token stream and it delivers a new token stream (not a string).

To achieve this in SableCC, I would personally implement the system as follows:
  1. Develop a Lexer using SableCC to scan the original text file.
  2. Develop a Parser/AST for the preprocessor grammar using SableCC.
  3. Implement the preprocessor using tree walkers (DepthFirstAdapter). I would create/forward generated tokens into a token list.
  4. Develop a custom Lexer that will return the tokens stored in the token list.
  5. Develop a Parser/AST for the main Erlang grammar. The parser would be instantiated with the custom lexer as parameter.
  6. Implement the rest of the compiler.

There is no "standard way" of doing this. The approach has to be adapted to the needs of the parsed language.
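
Steps 3 and 4 can be illustrated with a minimal, self-contained sketch. It is not SableCC-generated code: Tok, MacroExpander, ListLexer and the token kind names are hypothetical stand-ins, and the expander works on a flat token list rather than walking a preprocessor AST. It also assumes macros take no arguments and that macro bodies contain no further macro uses.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a generated token class (kind + text only).
final class Tok {
    final String kind; // e.g. "atom", "string", "macro_use", "eof"
    final String text;
    Tok(String kind, String text) { this.kind = kind; this.text = text; }
    @Override public String toString() { return kind + ":" + text; }
}

// Step 3 (simplified): copy tokens to an output list, replacing each
// macro use with the tokens of the macro's body.
final class MacroExpander {
    private final Map<String, List<Tok>> macros = new HashMap<>();

    void define(String name, List<Tok> body) { macros.put(name, body); }

    List<Tok> expand(List<Tok> input) {
        List<Tok> out = new ArrayList<>();
        for (Tok t : input) {
            if (t.kind.equals("macro_use")) {
                List<Tok> body = macros.get(t.text);
                if (body == null) {
                    throw new IllegalStateException("undefined macro: " + t.text);
                }
                out.addAll(body); // no nested expansion in this sketch
            } else {
                out.add(t);
            }
        }
        return out;
    }
}

// Step 4 (simplified): a lexer that replays a token list and then
// returns an end-of-file token forever.
final class ListLexer {
    private final Iterator<Tok> it;
    ListLexer(List<Tok> tokens) { this.it = tokens.iterator(); }
    Tok next() { return it.hasNext() ? it.next() : new Tok("eof", ""); }
}

final class PreprocessorSketch {
    public static void main(String[] args) {
        MacroExpander expander = new MacroExpander();
        // -define(MACRO, "abc").
        expander.define("MACRO", Arrays.asList(new Tok("string", "\"abc\"")));
        // f() -> ?MACRO.
        List<Tok> expanded = expander.expand(Arrays.asList(
            new Tok("atom", "f"), new Tok("l_par", "("), new Tok("r_par", ")"),
            new Tok("arrow", "->"), new Tok("macro_use", "MACRO"), new Tok("dot", ".")));
        ListLexer lexer = new ListLexer(expanded);
        for (Tok t = lexer.next(); !t.kind.equals("eof"); t = lexer.next()) {
            System.out.println(t);
        }
    }
}
```

In a real SableCC setup, the custom lexer would instead extend the Lexer class generated for the main grammar, so that it can be passed to the generated Parser.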

Just ask if you need more help. You have an interesting project that offers neat challenges.

Have fun,

Etienne

--
Etienne M. Gagnon, Ph.D.
SableCC: http://sablecc.org
_______________________________________________
SableCC-Discussion mailing list
SableCC-Discussion <at> lists.sablecc.org
http://lists.sablecc.org/listinfo/sablecc-discussion
Jonathan Lee | 9 Feb 18:46

Re: Macro preprocessor

Hi Etienne,

Thank you for the quick response! I followed your recommendation of
creating a preprocessor grammar, parsing the input, then traversing
the AST saving encountered tokens into a separate list. When a macro
is found, the macro's tokens are added to the list instead.  A custom
lexer returns these tokens to the main parser. This works for me so
far, and I'm able to do it entirely within SableCC!

One caveat is that the two parsers don't share the same Token classes,
so the preprocessor tokens need to be translated to the ones used by the
main grammar. I assume that there's no way for the two generated parsers
to share Tokens, but please let me know if that's not the case.
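
For readers following along, the translation step might look roughly like the sketch below. All class names here (PpTok, MainTok, TokenTranslator and their subclasses) are hypothetical stand-ins, not the classes SableCC generates, and carrying the line and position across is an assumption made so that main-parser errors can still point at the original source.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for tokens generated from the preprocessor grammar.
abstract class PpTok {
    final String text; final int line; final int pos;
    PpTok(String text, int line, int pos) { this.text = text; this.line = line; this.pos = pos; }
}
final class PpIdentifier extends PpTok {
    PpIdentifier(String text, int line, int pos) { super(text, line, pos); }
}
final class PpLPar extends PpTok {
    PpLPar(int line, int pos) { super("(", line, pos); }
}

// Hypothetical stand-ins for tokens generated from the main grammar.
abstract class MainTok {
    final String text; final int line; final int pos;
    MainTok(String text, int line, int pos) { this.text = text; this.line = line; this.pos = pos; }
}
final class MainIdentifier extends MainTok {
    MainIdentifier(String text, int line, int pos) { super(text, line, pos); }
}
final class MainLPar extends MainTok {
    MainLPar(int line, int pos) { super("(", line, pos); }
}

final class TokenTranslator {
    // Map one preprocessor token to its main-grammar counterpart,
    // preserving text and source position.
    static MainTok translate(PpTok t) {
        if (t instanceof PpIdentifier) {
            return new MainIdentifier(t.text, t.line, t.pos);
        }
        if (t instanceof PpLPar) {
            return new MainLPar(t.line, t.pos);
        }
        throw new IllegalArgumentException(
            "no main-grammar token for " + t.getClass().getSimpleName());
    }

    // Translate a whole token list in order.
    static List<MainTok> translateAll(List<PpTok> input) {
        List<MainTok> out = new ArrayList<>();
        for (PpTok t : input) {
            out.add(translate(t));
        }
        return out;
    }
}
```

Failing loudly on an unmapped token class makes it obvious when a token is added to one grammar but not the other.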

Thanks for your help!

Jonathan

Etienne M. Gagnon | 9 Feb 20:22

Re: Macro preprocessor

Hi Jonathan,

You are right; the two parsers can't share the same tokens.

Here's an additional trick, in case it helps. The main grammar parser does not need a real lexer, only your custom one. I recommend that you define the tokens without providing a regular expression:

  Tokens
    l_par = ;
    r_par = ;
    identifier = ;
    ...

This will cause SableCC to generate the token classes without wasting time generating an unneeded lexical automaton.

I'm glad my recommendation helped.

Have fun and thanks for using SableCC!

Etienne

--
Etienne M. Gagnon, Ph.D.
SableCC: http://sablecc.org