Dierk Koenig | 24 Mar 09:40 2005

RE: Re: Strings: Regular Expression Patterns

Hi Alan,

Good idea.

things to consider:
- use of # in shebang (and possibly line comments but this is dropped, I
guess)
- the need to escape # as \# forces \ to be escaped as \\ , or not?

how about
~pattern~
?
Which would nicely reuse ~ for patterns.
I have no idea what other implications this has.

just a thought
Mittie

> -----Original Message-----
> From: news [mailto:news@...]On Behalf Of Alan Green
> Sent: Thursday, 24 March 2005 2:05
> To: groovy-user@...
> Subject: [groovy-user] Re: Strings: Regular Expression Patterns
>
>
> John Rose wrote:
> > I agree that some sort of modified string literal syntax would be handy
> > for regexps.
> >
> > I'm not wild about r"..." itself, since IDENT STRING_LITERAL is already
> > [...]

Jeremy Rayner | 24 Mar 10:11 2005

Re: [groovy-user] Re: Strings: Regular Expression Patterns

Hi all,
I would propose the introduction of the character /
as the first character of an expression to mark a regular expression literal

This would behave in an identical fashion to ", i.e. you
need another / to close the regex.

The only difference between /regex/ and "string" would be in escapes:
/regex/ would swap the usages of single \ escapes and double \\ escapes
in string literals.

to paraphrase an example from "Backslashes, escapes, and quoting" in
http://java.sun.com/j2se/1.4.2/docs/api/java/util/regex/Pattern.html

  The string literal "\b", for example, matches a single backspace
  character when interpreted as a regular expression, while "\\b"
  matches a word boundary.

  The regex literal /\\b/, for example, matches a single backspace
  character when interpreted as a regular expression, while /\b/
  matches a word boundary.
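
A minimal sketch of the escaping difference. It assumes the slashy-string
behaviour that later Groovy releases settled on (backslashes pass through
unchanged), not the exact escape swap proposed above, so the /\\b/ half of
the paraphrase is deliberately left out:

```groovy
// Assumption: slashy strings as found in later Groovy releases.
def quoted = "\\bgroovy\\b"   // backslash doubled to survive the string literal
def slashy = /\bgroovy\b/     // backslash reaches the regex engine untouched

// Both literals denote the same ten-character pattern text.
assert quoted == slashy

// \b is a word boundary in both cases.
assert ('so groovy!' =~ slashy).find()
assert !('ungroovy' =~ slashy).find()
```

The point of the literal is only readability: the pattern text is identical,
but the slashy form matches what one would write in Perl or Ruby.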

examples

  if ( it =~ /gro*vy/ )
equiv
  if ( it =~ "gro*vy" )

  if ( it =~ /foo[ab]/ )
equiv
[...]

Dierk Koenig | 24 Mar 10:25 2005

RE: Re: [groovy-user] Re: Strings: Regular Expression Patterns

This would look nicely like Perl/Ruby.

From the examples I assume it would work on a GString.
How is ${} handled/escaped then to distinguish the
GString meaning from the regex meaning?

cheers
Mittie

> -----Original Message-----
> From: Jeremy Rayner [mailto:jeremy.rayner@...]
> Sent: Thursday, 24 March 2005 10:11
> To: user@...
> Cc: Groovy JSR
> Subject: [groovy-jsr] Re: [groovy-user] Re: Strings: Regular Expression
> Patterns
>
>
> Hi all,
> I would propose the introduction of the character /
> as the first character of an expression to mark a regular
> expression literal
>
> This would behave in an identical fashion to ", i.e. you
> need another / to close the regex.
>
> The only difference would be that escapes in /regex/ vs "string"
> is that /regex/ would swap the usages of single \ escapes and
> double \\ escapes
> in string literals

Guillaume Laforge | 24 Mar 10:48 2005

Re: Re: [groovy-user] Re: Strings: Regular Expression Patterns

On Thu, 24 Mar 2005 09:11:05 +0000, Jeremy Rayner
<jeremy.rayner@...> wrote:
> [...]
> I would propose the introduction of the character /
> as the first character of an expression to mark a regular expression literal
> [...]

I like the idea too.
Could there be some potential problems with the divide operator?
Any possible ambiguities?

-- 
Guillaume Laforge
http://glaforge.free.fr/weblog

John Rose | 25 Mar 01:07 2005

Re: Re: [groovy-user] Re: Strings: Regular Expression Patterns

On Mar 24, 2005, at 1:11, Jeremy Rayner wrote:
> The introduction of a /regex/ literal would simplify the use of regex 
> in groovy.
Yes.

On Mar 24, 2005, at 1:48, Guillaume Laforge wrote:
> Could there be some potential problems with the divide operator?
> Any possible ambiguities?

If we're to support perl- or awk-style regexp literals, it's best to 
define them as tokens, in the lexer.
This definition should be as independent as possible from the syntax.
Syntax decisions are made after lexical decisions are complete, in both 
Java and Groovy.
Most parser generators (ANTLR included) don't allow feedback from the 
grammar to the lexer.
It interferes with token lookahead.

It's kind of gross, but we might recognize the slash '/' as introducing 
a regular expression after certain previous tokens.
The lexer should obviously recognize regexp tokens after:
	~  =~ ==~
(Covers use cases like foo =~ /bar/.)

Also, to allow standalone regexps as arguments and closure result 
values, recognize regexp tokens after:
	( [ { , : ; \n
(Covers use cases like eachMatch(/bar/).)
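
The previous-token rule can be sketched as follows. This is a hypothetical
helper, not the real Groovy or ANTLR lexer; the names regexIntroducers and
slashOpensRegex are made up for illustration, and the token set contains
only the tokens listed above:

```groovy
// Hypothetical helper, not the real Groovy lexer.
// Tokens after which a '/' would open a regexp literal, per the two lists above.
def regexIntroducers = ['~', '=~', '==~', '(', '[', '{', ',', ':', ';', '\n'] as Set

// The decision uses the previous token alone, so no feedback from the
// parser to the lexer is needed.
def slashOpensRegex = { String previousToken ->
    previousToken in regexIntroducers
}

assert slashOpensRegex('=~')    // foo =~ /bar/
assert slashOpensRegex('(')     // eachMatch(/bar/)
assert !slashOpensRegex('y')    // after an identifier, '/' remains division
```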

This works only because all of the above tokens are known never to end 
[...]

Alan Green | 26 Mar 08:17 2005

Re: [groovy-user] Re: Strings: Regular Expression Patterns

John Rose wrote:

> P.S.  If the lexer hack doesn't work, we could invent some sort of new, 
> unambiguous "open regexp quote" like '~/', as in:
> 
>> if ('abc' =~ ~/.../) whatever
>> if ('abc' ==~ ~/.../) whatever
>> 'abc'.eachMatch(~/.../) { whatever }
>> ~/.../.each('abc'){ whatever }
>> ['a','b','c'].grep(~/a/)
>> switch('abc'){
>>     case ~/.../ : whatever
>> }
>>
>> word = ~/\b\w+\b/

~/bar/ and Brian's suggestion of ~"bar" both read better to me than /bar/.
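
For comparison, a short sketch of the two prefixed forms. It assumes the
behaviour of later Groovy releases, where a leading ~ compiles the string
into a java.util.regex.Pattern; the variable names are illustrative only:

```groovy
import java.util.regex.Pattern

// Assumption: '~' on a string yields a compiled Pattern (later Groovy releases).
def word = ~/\b\w+\b/
assert word instanceof Pattern

// The ~"..." form gives the same pattern, at the cost of doubled backslashes.
def viaQuotes = ~"\\b\\w+\\b"
assert viaQuotes.pattern() == word.pattern()

// A Pattern used as a filter keeps the elements it matches completely.
assert ['a', 'b', 'c'].grep(~/a/) == ['a']
```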

Just my 2 cents,

Alan.

John Rose | 5 Apr 03:42 2005

Re: Re: [groovy-user] Re: Strings: Regular Expression Patterns

I implemented a straightforward version of our ideas about regexps.
It's simply a third string syntax, as in 'foo', "foo", /foo/.

The treatment of backslash escapes in /foo/ favors regular expression  
notation, but it's just a string to the language.

It reads reasonably well:

if ('abc' =~ /.../) whatever
if ('abc' ==~ /.../) whatever
'abc'.eachMatch(/.../) { whatever }
['a','b','c'].grep(/a/)
switch('abc'){
	case ~/.../ : whatever
}
assert 'euouae'.matches(/^[aeiou]*$/)
assert 'football'.replaceAll(/foo/, "Bar") == 'Bartball'
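
A small check of the "it's just a string" claim, as a sketch against later
Groovy releases (an assumption; the 2005 implementation may have differed in
detail):

```groovy
// A slashy literal without interpolation is an ordinary String.
def s = /foo/
assert s instanceof String
assert s == 'foo' && s == "foo"

// Backslashes are kept as written, which is what favors regex notation.
def digits = /a\d/
assert digits == 'a\\d'

// Any API that takes a pattern string accepts it unchanged.
assert 'football'.replaceAll(/foo/, "Bar") == 'Bartball'
```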

It turns out that the division operators and regexp literals want to be  
in very different places, so there's no real ambiguity to the eye.   
It's easy enough to get over it in the lexer also.
	x = y / z
	x = /xyzzy/

Also, uses of dollars and braces for GStrings are (luckily) quite  
distinct from uses of the same characters in the regexp language, so  
there seems to be a happy marriage possible between Groovy GStrings and  
regexps.
	x =~ /${word} ${word}/
	x =~ /^true$|^false$/
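
A sketch of how the two uses of the dollar sign coexist, assuming the
behaviour of later Groovy releases; word and twoWords are illustrative
names:

```groovy
def word = /\w+/
def twoWords = /${word} ${word}/     // ${...} interpolates, as in a GString

assert twoWords.toString() == '\\w+ \\w+'
assert 'hello world' ==~ twoWords

// A dollar not followed by an identifier or '{' stays a regex anchor.
assert 'true' ==~ /^true$|^false$/
assert !('maybe' ==~ /^true$|^false$/)
```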

Martin C. Martin | 5 Apr 03:43 2005

Re: Re: [groovy-user] Re: Strings: Regular Expression Patterns

John Rose wrote:

> I implemented a straightforward version of our ideas about regexps.
> It's simply a third string syntax, as in 'foo', "foo", /foo/.

...

> Comments?

Great!

- Martin

jastrachan | 5 Apr 09:08 2005

Re: Re: [groovy-user] Re: Strings: Regular Expression Patterns

I really like it! :)

On 5 Apr 2005, at 02:42, John Rose wrote:
> I implemented a straightforward version of our ideas about regexps.
> It's simply a third string syntax, as in 'foo', "foo", /foo/.
>
> The treatment of backslash escapes in /foo/ favors regular expression  
> notation, but it's just a string to the language.
>
> It reads reasonably well:
>
> if ('abc' =~ /.../) whatever
> if ('abc' ==~ /.../) whatever
> 'abc'.eachMatch(/.../) { whatever }
> ['a','b','c'].grep(/a/)
> switch('abc'){
> 	case ~/.../ : whatever
> }
> assert 'euouae'.matches(/^[aeiou]*$/)
> assert 'football'.replaceAll(/foo/, "Bar") == 'Bartball'
>
> It turns out that the division operators and regexp literals want to  
> be in very different places, so there's no real ambiguity to the eye.   
> It's easy enough to get over it in the lexer also.
> 	x = y / z
> 	x = /xyzzy/
>
> Also, uses of dollars and braces for GStrings are (luckily) quite  
> distinct from uses of the same characters in the regexp language, so  
> there seems to be a happy marriage possible between Groovy GStrings  
> [...]

Dierk Koenig | 5 Apr 09:31 2005

RE: Re: [groovy-user] Re: Strings: Regular Expression Patterns

looks great!

a minor issue with the dollar sign in // regex Strings:

If they behave like GStrings, i.e. "foo" or """foo"""
then I would expect that a literal backslash needs escaping.
Otherwise, we should state the disambiguation rules
(i.e. the differences from GStrings).

cheers
Mittie

> -----Original Message-----
> From: John Rose [mailto:rose00@...]
> Sent: Tuesday, 5 April 2005 3:43
> To: jsr@...
> Subject: Re: [groovy-jsr] Re: [groovy-user] Re: Strings: Regular
> Expression Patterns
> 
> 
> I implemented a straightforward version of our ideas about regexps.
> It's simply a third string syntax, as in 'foo', "foo", /foo/.
> 
> The treatment of backslash escapes in /foo/ favors regular expression  
> notation, but it's just a string to the language.
> 
> It reads reasonably well:
> 
> if ('abc' =~ /.../) whatever
> if ('abc' ==~ /.../) whatever

Dierk Koenig | 5 Apr 10:14 2005

RE: Re: [groovy-user] Re: Strings: Regular Expression Patterns

sorry, just found the respective part in the jsr spec:

In a regular expression literal, if a dollar sign is not followed by an
identifier character or a left curly brace, perhaps after an intervening
star, the dollar is deemed to be escaped.
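
The quoted rule can be illustrated with a short sketch. It is checked
against the behaviour of later Groovy releases, which is an assumption
relative to the 2005 spec text, and it does not cover the "intervening
star" case:

```groovy
// Dollar followed by neither an identifier character nor '{': kept literally.
def plain = /abc$/
assert plain instanceof String
assert plain == 'abc$'

// Dollar followed by '{': interpolated; the trailing dollar is still literal.
def name = 'abc'
def anchored = /^${name}$/
assert anchored.toString() == '^abc$'
assert 'abc' ==~ anchored
```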

cool job, John!

Mittie

> -----Original Message-----
> From: Dierk Koenig [mailto:dierk.koenig@...]
> Sent: Tuesday, 5 April 2005 9:31
> To: jsr@...
> Subject: RE: [groovy-jsr] Re: [groovy-user] Re: Strings: Regular
> Expression Patterns
>
>
> looks great!
>
> a minor issue with the dollar sign in // regex Strings:
>
> If they behave like GStrings, i.e. "foo" or """foo"""
> then I would expect that a literal backslash needs escaping.
> Otherwise, we should state the disambiguation rules
> (i.e. the differences from GStrings).
>
> cheers
> Mittie
>

LARSON, BRIAN (SBCSI) | 24 Mar 14:23 2005

RE: Re: [groovy-user] Re: Strings: Regular Expression Patterns

Agreed.  I like this syntax.  It seems to work well in other languages.

I also wondered about the divide operator, but I couldn't come up with
any ambiguity off the top of my head.  It probably makes parsing a
little harder since / would be overloaded.

Thoughts:
Patterns could not be split onto multiple lines.
What operators would be valid with patterns:  ==~ =~ =
The current regex Groovy page is a little confusing
(http://groovy.codehaus.org/Regular+Expressions)
What is the difference between /pattern/ and the current ~"pattern"
syntax?

Brian Larson
SBC

-----Original Message-----
From: Guillaume Laforge [mailto:glaforge@...] 

On Thu, 24 Mar 2005 09:11:05 +0000, Jeremy Rayner
<jeremy.rayner@...> wrote:
> [...]
> I would propose the introduction of the character /
> as the first character of an expression to mark a regular expression literal
> [...]

I like the idea too.
Could there be some potential problems with the divide operator?

Guillaume Laforge | 24 Mar 14:58 2005

Re: Re: [groovy-user] Re: Strings: Regular Expression Patterns

On Thu, 24 Mar 2005 07:23:18 -0600, LARSON, BRIAN (SBCSI)
<bl7385@...> wrote:
> Agreed.  I like this syntax.  It seems to work well in other languages.
> 
> I also wondered about the divide operator, but I couldn't come up with
> any ambiguity off the top of my head.  It probably makes parsing a
> little harder since / would be overloaded.
> [...]

Hmmm, let's see...

    foo = meth /bar/ 
    func()

How should it be parsed? Especially since top-level statements (like
print "hello") can omit parentheses?

It could be interpreted as being two different statements, with an
assignment and two method calls:

    foo = meth( /bar/ );
    func();

Or it could be interpreted as two divides spanning two lines:

    foo = meth / bar / func()

With meth and bar being some values, and func() a method call
returning some other value.
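
Both readings are valid Groovy once written out explicitly, which is what
makes the bare form ambiguous. A sketch with hypothetical stand-ins (meth,
bar, func and shout are made-up names, and shout plays the role of meth in
the method-call reading):

```groovy
// Reading 1: '/' is division; meth and bar are values, func() returns a value.
def meth = 100
def bar = 5
def func = { 2 }
def quotient = meth / bar / func()
assert quotient == 10

// Reading 2: /bar/ is a literal passed as the single argument of a call.
def shout = { String pattern -> pattern.toUpperCase() }
def foo = shout(/bar/)
assert foo == 'BAR'
```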


Jeremy Rayner | 24 Mar 16:45 2005

Re: Re: [groovy-user] Re: Strings: Regular Expression Patterns

>    foo = meth /bar/
>    func()
> 
It should be lexed as:

IDENT(foo) ASSIGN(=) IDENT(meth) REGEX_LITERAL(bar) 

> How should it be parsed? Especially since top-level statements (like
> print "hello") can omit parentheses?
I'm thinking identical usage for REGEX_LITERAL as STRING_LITERAL
is today.

jez
-- 
http://javanicus.com/blog2

LARSON, BRIAN (SBCSI) | 24 Mar 15:48 2005

RE: Re: [groovy-user] Re: Strings: Regular Expression Patterns

Good find.  I was playing all around the example you provided below, but
I forgot to omit the optional parentheses.  Looks like it might bite us
again -- ducking (smirk).

If /pattern/ can't be used, how is it different from ~"pattern"?  It
seems that the primary objective is to avoid escaping the backslash in a
literal string.  I don't recall whether ~"pattern" syntax does this or
not, but it seems as though it wouldn't have to.

-----Original Message-----
From: Guillaume Laforge [mailto:glaforge@...] 
Sent: Thursday, March 24, 2005 7:58 AM
To: jsr@...
Subject: Re: [groovy-jsr] Re: [groovy-user] Re: Strings: Regular
Expression Patterns

On Thu, 24 Mar 2005 07:23:18 -0600, LARSON, BRIAN (SBCSI)
<bl7385@...> wrote:
> Agreed.  I like this syntax.  It seems to work well in other languages.
> 
> I also wondered about the divide operator, but I couldn't come up with
> any ambiguity off the top of my head.  It probably makes parsing a
> little harder since / would be overloaded.
> [...]

Hmmm, let's see...

    foo = meth /bar/ 
    func()

