Anjaly | 1 Oct 2007 11:02

Re: u32regex_search crashes

I am sorry, the last message had a mistake. I wanted to say that I want
to do a search that treats all the data as though it is UTF-32
rather than UTF-8 (as I incorrectly wrote). I don't know whether I am
making myself clear (I am not very good at expressing myself).

What I really want to do is a Unicode search on the available data.

						Anjaly G S

On Mon, 2007-10-01 at 09:42 +0100, John Maddock wrote:
> Anjaly wrote:
> > The regex documentation says that the size of the data type of the
> > variable passed to make_u32regex determines the character
> > encoding (UTF-8, UTF-16 or UTF-32).
> 
> *For construction of the regex object*.
> 
> The search algorithms operate independently on any of UTF8/16/32.
> 
> > I passed wchar_t (whose size I think
> > is 4) so that u32regex_search would treat the buffer encoding as
> > UTF-8 regardless. Actually, I am trying to do a UTF-8
> > search.
> 
> Except the data file you sent *was not valid UTF8* !
> 
> It looks like it's probably UTF16LE; in that case it's up to you to decode 
> the byte order mark and read the text into something that Boost.Regex can 
> handle (for example platform-native UTF16).  ICU should have some file IO 
> routines for doing that kind of thing: for example for loading a file into a 

John Maddock | 1 Oct 2007 17:10

Re: u32regex_search crashes

Anjaly wrote:
> I am sorry, the last message had a mistake. I wanted to say that I want
> to do a search that treats all the data as though it is UTF-32
> rather than UTF-8 (as I incorrectly wrote). I don't know whether I am
> making myself clear (I am not very good at expressing myself).
>
> What I really want to do is a Unicode search on the available data.

Right, but if that data is in a file then first you need to read it into 
memory so that it's in a well defined "in-memory-encoding".  You didn't say 
how you were reading the file you sent, but ICU has some APIs here: 
http://www.icu-project.org/apiref/icu4c/ustdio_8h.html that assist with 
correctly reading and writing Unicode data to and from files.

John. 

Gmane