Adam Strzelecki | 30 Dec 2011 13:58

[SVN] C based grammars broken for embedding (restricting $base reference)

Hello TM developers,

I've just realized that posting to the general TM list about these grammar problems wasn't a wise idea, so
I've let myself report them to the dev list here.

Since the C grammar is intended to be the base for other grammars such as C++ or Obj-C, it uses `include = '$base'`
instead of `include = '$self'` for all recursive sub-block parsing, so that it points back to the original
grammar (if possible). This works perfectly well for a standalone C or C++ file; however, when trying to embed
C source into another language we get a problem, e.g. with the Ruby grammar:

#!/usr/bin/env ruby
# trying to embed something into Ruby (using TM2 grammar)
variable = <<-C
/* we are parsed by C grammar here */
enum {
 /* ooops this comment isn't parsed anymore by C grammar but Ruby again! */
}
C

The problem is at '{', which starts a new C block that does `include = '$base'`. Unfortunately $base is Ruby
here, not C. The same happens if we change C into CPP in the example above: $base is still Ruby.
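
To make the failure concrete, here is a minimal toy model of how the two include targets resolve. It is only a sketch, not TextMate's actual parser: the `resolve` helper and the idea of a plain "grammar stack" are assumptions made for illustration.

```ruby
# Toy model (NOT TextMate's real implementation): resolve an include
# target against the stack of grammars entered so far.
# stack.first is the grammar the document was opened with,
# stack.last is the grammar that owns the rule being evaluated.
def resolve(target, stack)
  case target
  when '$self' then stack.last   # the grammar containing the rule
  when '$base' then stack.first  # the document's root grammar
  else target                    # explicit scope name, e.g. 'source.c'
  end
end

standalone = ['source.c']                 # plain C file
cpp_file   = ['source.c++', 'source.c']   # C++ file that includes C rules
embedded   = ['source.ruby', 'source.c']  # C inside a Ruby heredoc

puts resolve('$base', standalone)  # source.c
puts resolve('$base', cpp_file)    # source.c++
puts resolve('$base', embedded)    # source.ruby
puts resolve('$self', embedded)    # source.c
```

In this model the C++ case is exactly what `$base` is meant for, while the embedded case hands the block contents back to Ruby.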

I can see two solutions here:
(1) caller should be able to block/change $base, e.g. using some new keyword:

{ safeInclude = 'source.c'; }
or
{ base = 'source.c'; include = 'source.c'; }

(2) callee should be able to specify language grammars that are allowed to be base for it.

Allan Odgaard | 6 Jan 2012 01:11

[SVN] Re: C based grammars broken for embedding (restricting $base reference)

On 30/12/2011, at 19.58, Adam Strzelecki wrote:

> […] C grammar […] uses `include = '$base'` […] This works perfectly well for standalone C or C++ file

I think there might be a few edge-cases. For example using #if 0 / #else / #endif inside @implementation …
@end and such.

Although this specific example is perhaps a slightly different issue than the self/base one (as what we
want there is to include “current context’s rules”).

> […] when trying to embed C source into other language we get a problem […]
> 
> I can see two solutions here:
> (1) caller should be able to block/change $base […]
> (2) callee should be able to specify language grammars that are allowed to be base for it.
> 
> Or maybe there's already some undocumented solution?

I have not thought it through but I think we need to rewrite the current (Objective-)C(++) grammars to be
injection based.

So we have 4 “dummy” grammars which basically just assigns a root scope to the document (e.g.
source.c++) and the C, C++, and Objective-C support is then injected into the scopes for where they should
be active. This would fix embedding as well.

Adam Strzelecki | 16 Jan 2012 18:54

[SVN] Re: C based grammars broken for embedding (restricting $base reference)

> So we have 4 “dummy” grammars which basically just assigns a root scope to the document (e.g.
> source.c++) and the C, C++, and Objective-C support is then injected into the scopes for where they should
> be active. This would fix embedding as well.

Okay I think I get it. Still I am not sure what makes injection more powerful than inclusion?

So we would have 4 dummy grammars for root scope (C, C++, Obj-C, Obj-C++), but also 3 extra grammars (C, C++,
Obj-C) that do the real job and have proper injection scope selectors set. Right?

But then if Obj-C++ dummy grammar sets source.objc++ and all support grammars have source.objc++
injection selector assigned, then what would be the order of injection? How can we ensure C support
grammar goes last?

Also, when I remove "include = 'source.c'" from the Lua grammar and make it use injection into the
"meta.embedded.c" scope instead, it seems that the C grammar's $base still refers to the injector grammar (Lua).
Would it be feasible to make $base refer to the first grammar in the inclusion chain, where any injection
breaks such a chain? This would fix C embedding straight away when using injection.
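
A rough sketch of the semantics proposed above, as a toy model only. The `Entry` struct, the `:via` tags and the `proposed_base` helper are hypothetical names for illustration; this is not how TextMate is known to behave.

```ruby
# Hypothetical semantics for the proposal above (an assumption, not
# TextMate's actual behaviour): $base is the first grammar of the current
# inclusion chain, and every injection starts a new chain.
Entry = Struct.new(:grammar, :via)  # via is :root, :include or :injection

def proposed_base(chain)
  # The last injection, if any, marks where the current chain begins.
  start = chain.rindex { |entry| entry.via == :injection } || 0
  chain[start].grammar
end

lua_injects_c  = [Entry.new('source.lua', :root),
                  Entry.new('source.c', :injection)]
cpp_includes_c = [Entry.new('source.c++', :root),
                  Entry.new('source.c', :include)]

puts proposed_base(lua_injects_c)   # source.c
puts proposed_base(cpp_includes_c)  # source.c++
```

Under this rule a C++ file still gets C++ as $base, while C injected into Lua gets C.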

Moreover, if we now use injection-based C/C++/Obj-C support, can we embed (inject) some other grammar into
C/C++? For example:

1. source.lua
2. meta.embedded.c <- C injection
3. string.quoted.double.c
4. meta.embedded.html <- HTML injection?

And then, will the C grammar rules be injected only into scope levels 2 and 3, or into scope level 4 (HTML)
too, where HTML gets injected?

Regards,