José Romildo Malaquias | 18 Aug 17:16 2012

regex-pcre is not working with UTF-8

Hello.

It seems that regex-pcre has a bug when dealing with UTF-8:

   Prelude> :m + Text.Regex.PCRE

   Prelude Text.Regex.PCRE> "país:Brasil" =~ "país:(.*)" :: (String,String,String,[String])
   ("","pa\237s:Brasil","",["rasil"])

Notice the missing 'B' in the result of the regex matching.

With regex-posix this does not happen:

   Prelude> :m + Text.Regex.Posix

   Prelude Text.Regex.Posix> "país:Brasil" =~ "país:(.*)" :: (String,String,String,[String])
   ("","pa\237s:Brasil","",["Brasil"])
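
A plausible explanation of the missing 'B', stated as an assumption rather than something confirmed in the regex-pcre source: PCRE reports match offsets in bytes of the UTF-8-encoded subject, while the String instance applies them as Char indices. The character 'í' (code point 237) takes two bytes in UTF-8, so the prefix "país:" is 6 bytes long but only 5 Chars, and dropping 6 Chars from the subject leaves "rasil". The whole match still looks right because its 12-byte length exceeds the 11-Char string, so taking 12 Chars returns everything. The base-only sketch below reproduces that arithmetic; `utf8Width`, `utf8Length` and `captureWithByteOffset` are hypothetical helpers, not part of any regex package.

```haskell
import Data.Char (ord)

-- Number of bytes a single Char occupies when encoded as UTF-8.
utf8Width :: Char -> Int
utf8Width c
  | n < 0x80    = 1
  | n < 0x800   = 2
  | n < 0x10000 = 3
  | otherwise   = 4
  where
    n = ord c

-- Total UTF-8 byte length of a String.
utf8Length :: String -> Int
utf8Length = sum . map utf8Width

-- Hypothetical model of the suspected bug: a capture offset measured in
-- UTF-8 bytes (as PCRE reports it) is wrongly used as a Char index.
captureWithByteOffset :: String -> String -> String
captureWithByteOffset prefix subject = drop (utf8Length prefix) subject
```

Loaded into GHCi, `captureWithByteOffset "país:" "país:Brasil"` evaluates to `"rasil"`, whereas `drop (length "país:") "país:Brasil"` gives `"Brasil"`, the capture regex-posix reports.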

I hope this bug can be fixed soon.

Is there a bug tracker to report the bug? If so, what is it?

Romildo
Konstantin Litvinenko | 21 Aug 21:25 2012

Re: regex-pcre is not working with UTF-8

On 08/18/2012 06:16 PM, José Romildo Malaquias wrote:
> Hello.
>
> It seems that regex-pcre has a bug when dealing with UTF-8:
>
> I hope this bug can be fixed soon.
>
> Is there a bug tracker to report the bug? If so, what is it?
>
You need something like this:

let pat = makeRegexOpts (compUTF8 .|. defaultCompOpt) defaultExecOpt
("@'(.+?)'@" :: B.ByteString)

and then pat will match correctly.
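
A fuller sketch of this suggestion, assuming the regex-pcre, bytestring and text packages are installed. The names `utf8`, `utf8Regex`, `pat` and `captures` are hypothetical, and using `Data.Text.Encoding` for the UTF-8 conversion is an editorial choice, not part of the original advice. In the one-liner the `B.ByteString` literal needs the OverloadedStrings extension, and ByteString's IsString instance keeps only the low 8 bits of each Char; that is harmless for the ASCII pattern there, but a non-ASCII pattern such as "país:(.*)" should be encoded explicitly, which is what `utf8` does here.

```haskell
import Data.Bits ((.|.))
import qualified Data.ByteString as B
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE
import Text.Regex.PCRE

-- Encode a String as UTF-8 bytes (via the text package).
utf8 :: String -> B.ByteString
utf8 = TE.encodeUtf8 . T.pack

-- Compile a UTF-8 ByteString pattern with PCRE's UTF-8 mode switched on,
-- so that constructs such as '.' consume a whole code point, not one byte.
utf8Regex :: B.ByteString -> Regex
utf8Regex = makeRegexOpts (compUTF8 .|. defaultCompOpt) defaultExecOpt

-- The pattern from the original report, compiled once.
pat :: Regex
pat = utf8Regex (utf8 "país:(.*)")

-- Match a UTF-8 ByteString subject and return the capture groups.
-- The results are ByteString slices, so PCRE's byte offsets line up.
captures :: B.ByteString -> [B.ByteString]
captures subject =
  let (_, _, _, groups) =
        match pat subject
          :: (B.ByteString, B.ByteString, B.ByteString, [B.ByteString])
  in groups
```

With this, `captures (utf8 "país:Brasil")` is expected to yield the UTF-8 bytes of "Brasil" as its single group.
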
José Romildo Malaquias | 21 Aug 23:00 2012

Re: regex-pcre is not working with UTF-8

On Tue, Aug 21, 2012 at 10:25:53PM +0300, Konstantin Litvinenko wrote:
> On 08/18/2012 06:16 PM, José Romildo Malaquias wrote:
> > Hello.
> >
> > It seems that regex-pcre has a bug when dealing with UTF-8:
> >
> > I hope this bug can be fixed soon.
> >
> > Is there a bug tracker to report the bug? If so, what is it?
> >
> You need something like this:
> 
> let pat = makeRegexOpts (compUTF8 .|. defaultCompOpt) defaultExecOpt
> ("@'(.+?)'@" :: B.ByteString)
> 
> and then pat will match correctly.

The bug is related to String (not ByteString) in a UTF-8 locale.

Until it is fixed, I am using the workaround of converting the regular
expression and the text to ByteString, doing the matching, and then
converting the results back to String.
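
The workaround described above can be sketched as a String-level stand-in for `=~` with the `(String,String,String,[String])` result type. This assumes the regex-pcre, bytestring and text packages; `toUtf8`, `fromUtf8` and `matchUtf8` are hypothetical names, and the text package is just one way to do the UTF-8 conversion. Turning on `compUTF8` here matters beyond `.`: in UTF-8 mode PCRE works on whole code points, so the returned slices should not split a multi-byte sequence and `decodeUtf8` should not fail on them (escapes such as `\C`, which match a single byte, are an exception).

```haskell
import Data.Bits ((.|.))
import qualified Data.ByteString as B
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE
import Text.Regex.PCRE

-- String -> UTF-8 bytes.
toUtf8 :: String -> B.ByteString
toUtf8 = TE.encodeUtf8 . T.pack

-- UTF-8 bytes -> String (throws on invalid UTF-8).
fromUtf8 :: B.ByteString -> String
fromUtf8 = T.unpack . TE.decodeUtf8

-- Encode subject and regex as UTF-8, match on ByteStrings, then decode
-- every piece of the (before, match, after, groups) result back to String.
matchUtf8 :: String -> String -> (String, String, String, [String])
matchUtf8 subject regexSrc =
  let regex = makeRegexOpts (compUTF8 .|. defaultCompOpt) defaultExecOpt
                (toUtf8 regexSrc) :: Regex
      (before, matched, after, groups) =
        match regex (toUtf8 subject)
          :: (B.ByteString, B.ByteString, B.ByteString, [B.ByteString])
  in (fromUtf8 before, fromUtf8 matched, fromUtf8 after, map fromUtf8 groups)
```

Called as `matchUtf8 "país:Brasil" "país:(.*)"`, this is expected to return the `["Brasil"]` capture that regex-posix produced.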

Romildo
