Jukka Rahkonen | 3 Jan 09:01 2012

WFS and -where with non-ASCII characters

Hi,

Next trouble with the WFS driver. How am I supposed to find a place named
'Hämeenkylä'?  The following request leads to an error telling me that there are
forbidden characters.

ogr2ogr -f "ESRI Shapefile" -spat 377346 6673628 392155 6684806 test.shp
WFS:http://hip.latuviitta.org/cgi-bin/tinyows lv:pks_tilastoalue_piste -where
name='Hämeenkylä'

I have met this same error with other clients using HTTP GET. I suppose the 'ä'
characters need to be escaped somehow.

-Jukka Rahkonen-
Ari Jolma | 3 Jan 09:45 2012

Re: WFS and -where with non-ASCII characters

Jukka,

I get: ERROR 1: 'Hämeenkylä' not recognised as an available field.

I.e., it does not seem to be confused by 'ä'. I'm using Linux, where
the terminal may handle UTF-8 better. However, the message confused me:
it comes from the OGR SQL engine, and it seems that the engine tries to
parse the constant as a field name.

The fix is to escape ', i.e., \'Hämeenkylä\', which works.

Ari

On 01/03/2012 10:01 AM, Jukka Rahkonen wrote:
> Hi,
>
> Next trouble with the WFS driver. How am I supposed to find a place named
> 'Hämeenkylä'?  The following request leads to an error telling me that there are
> forbidden characters.
>
> ogr2ogr -f "ESRI Shapefile" -spat 377346 6673628 392155 6684806 test.shp
> WFS:http://hip.latuviitta.org/cgi-bin/tinyows  lv:pks_tilastoalue_piste -where
> name='Hämeenkylä'
>
> I have met this same error with other clients using HTTP GET. I suppose the 'ä'
> characters need to be escaped somehow.
>
> -Jukka Rahkonen-
Jukka Rahkonen | 3 Jan 10:01 2012

Re: WFS and -where with non-ASCII characters

Ari Jolma <ari.jolma <at> gmail.com> writes:

> 
> Jukka,
> 
> I get: ERROR 1: 'Hämeenkylä' not recognised as an available field.
> 
> I.e., it seems to not be confused with 'ä' - I'm using a linux, where 
> the terminal may be better with UTF8. However, the message confused me, 
> it is coming from OGR SQL engine, and it seems that the engine tries to 
> parse the constant as a field name.
> 
> The fix is to escape ', i.e., \'Hämeenkylä\', which works.
> 
> Ari

Hi Ari,

It seems to behave differently on Windows. I got your error by using -where
name=Hämeenkylä. -where name='Hämeenkylä' gives me

ERROR 1: Error returned by server : <?xml version='1.0' encoding='UTF-8'?>
<ows:ExceptionReport
 xmlns='http://www.opengis.net/ows'
 xmlns:ows='http://www.opengis.net/ows'
 xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'
 xsi:schemaLocation='http://www.opengis.net/ows http://schemas.opengis.net/ows/1.0.0/owsExceptionReport.xsd'
 version='1.1.0' language='en'>
 <ows:Exception exceptionCode='MissingParameterValue' locator='request'>

Even Rouault | 3 Jan 10:59 2012

Re: WFS and -where with non-ASCII characters

Quoting Ari Jolma <ari.jolma <at> gmail.com>:

> Jukka,
>
> I get: ERROR 1: 'Hämeenkylä' not recognised as an available field.
>
> I.e., it seems to not be confused with 'ä' - I'm using a linux, where
> the terminal may be better with UTF8. However, the message confused me,
> it is coming from OGR SQL engine, and it seems that the engine tries to
> parse the constant as a field name.
>
> The fix is to escape ', i.e., \'Hämeenkylä\', which works.
>

On Linux, I suspect that the single quotes must be removed by your shell (hence
the SQL engine believes that it must test equality between a column and another
column). So the solution is to escape the single-quote, or to
double-quote the whole stuff:

-where "name='Hämeenkylä'".
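
The quoting behaviour described above can be reproduced without a terminal. The following is only a sketch: it uses Python's shlex module, which follows POSIX shell splitting rules, as a stand-in for what bash or a similar shell would hand to ogr2ogr. It does not call GDAL at all.

```python
import shlex

# Bare single quotes: the shell consumes them, so OGR never sees them.
unquoted = shlex.split("-where name='Hämeenkylä'")

# Backslash-escaped single quotes survive as literal characters.
escaped = shlex.split(r"-where name=\'Hämeenkylä\'")

# Double quotes around the whole expression also preserve the single quotes.
wrapped = shlex.split("-where \"name='Hämeenkylä'\"")

for label, argv in (("unquoted", unquoted), ("escaped", escaped), ("wrapped", wrapped)):
    print(label, argv)
```

Only the first form loses the quotes, which would explain why the SQL engine then treats the value as a column name.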

On Windows, there might be a problem with the shell passing strings in
the local code page (CP1252 or whatever variant of it is used in
Finland), whereas the WFS server perhaps expects UTF-8. I'm not sure
which encoding we are supposed to send through Filter Encoding in KVP, and I'm
not sure how it is escaped (or not) currently. But if it works with Linux, it is
likely not an escaping problem, but one related to the encoding. Sending the
Filter Encoding as POST content will likely get rid of the encoding problem
from a theoretical point of view, but there would still be the problem of
Windows passing non-UTF8 strings to the command line utilities
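
To make the suspected mismatch concrete, here is a small sketch of how the same place name looks on the wire in the two encodings. CP1252 is assumed for the Windows console, as guessed above; the percent-encoding is done with Python's urllib and is only an illustration, not what the WFS driver itself does.

```python
from urllib.parse import quote

name = "Hämeenkylä"

# Raw bytes in each encoding: 'ä' is one byte in CP1252, two in UTF-8.
cp1252_bytes = name.encode("cp1252")
utf8_bytes = name.encode("utf-8")

# The same value percent-encoded for use in an HTTP GET query string.
cp1252_escaped = quote(name, encoding="cp1252")
utf8_escaped = quote(name, encoding="utf-8")

print(len(cp1252_bytes), cp1252_escaped)
print(len(utf8_bytes), utf8_escaped)
```

A server that expects UTF-8 cannot decode the lone 0xE4 byte from the CP1252 form, which would be consistent with it rejecting the request.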

Rahkonen Jukka | 3 Jan 11:34 2012

Re: WFS and -where with non-ASCII characters

Even Rouault <even.rouault <at> mines-paris.org> writes:

> On Windows, there might be a problem with the shell passing strings in
> the local code page (CP1252 or whatever variant of it is used in
> Finland), whereas the WFS server perhaps expects UTF-8. I'm not sure
> which encoding we are supposed to send through Filter Encoding in KVP, and I'm
> not sure how it is escaped (or not) currently. But if it works with Linux, it is
> likely not an escaping problem, but one related to the encoding. Sending the
> Filter Encoding as POST content will likely get rid of the encoding problem
> from a theoretical point of view, but there would still be the problem of
> Windows passing non-UTF8 strings to the command line utilities

The POST method has been reliable for me both with GIS clients and when sending
requests through Firefox and Poster.
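
For reference, a minimal sketch of the kind of OGC Filter fragment that a POST request would carry for this query. The element names follow the OGC Filter Encoding namespace; the helper name build_equals_filter is made up for this example, and the enclosing wfs:GetFeature document is deliberately left out. Nothing here is sent to a server.

```python
import xml.etree.ElementTree as ET

OGC_NS = "http://www.opengis.net/ogc"
ET.register_namespace("ogc", OGC_NS)


def build_equals_filter(prop, value):
    """Return a UTF-8 encoded ogc:Filter testing prop == value (hypothetical helper)."""
    flt = ET.Element(f"{{{OGC_NS}}}Filter")
    eq = ET.SubElement(flt, f"{{{OGC_NS}}}PropertyIsEqualTo")
    ET.SubElement(eq, f"{{{OGC_NS}}}PropertyName").text = prop
    ET.SubElement(eq, f"{{{OGC_NS}}}Literal").text = value
    # The XML declaration names the encoding, so the body is self-describing.
    return ET.tostring(flt, encoding="utf-8")


body = build_equals_filter("name", "Hämeenkylä")
print(body.decode("utf-8"))
```

Because the XML body declares its own encoding, the server does not have to guess how the 'ä' was encoded, unlike a value in a GET query string.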

> Jukka, I have no means of checking for now (behind a corporate firewall that
> blocks your server). But perhaps you could try the following OGR Python
> script:
> 
> from osgeo import ogr
> ds = ogr.Open('WFS:http://hip.latuviitta.org/cgi-bin/tinyows')
> lyr = ds.GetLayerByName('lv:pks_tilastoalue_piste')
> lyr.SetAttributeFilter("name='Hämeenkylä'")
> feat = lyr.GetNextFeature()
> feat.DumpReadable()
> 
> The python shell should send UTF8 strings to GDAL.

Error is about the same as with ogr2ogr


Jukka Rahkonen | 3 Jan 12:07 2012

Re: WFS and -where with non-ASCII characters

I took the successful query sent by Ari from the TinyOWS log and copied it
literally into Windows and this way it works:

-where name='Hämeenkylä'

-Jukka-
Even Rouault | 3 Jan 13:20 2012

Re: Re: WFS and -where with non-ASCII characters

Quoting Jukka Rahkonen <jukka.rahkonen <at> mmmtike.fi>:

> I took the successful query sent by Ari from the TinyOWS log and copied it
> literally into Windows and this way it works:

Yes, this confirms that the content provided on the ogr2ogr command line is
passed untransformed to OGR (so in an ANSI code page) whereas, in the context of
the WFS server, UTF-8 is expected. Perhaps we should switch the GDAL/OGR command line
utilities to use the wide-char version of main() instead of the ANSI one in
order to get command line arguments as Unicode strings. But I'm afraid this would
break other parts of GDAL that are not encoding-aware and for which the
translation to Unicode wouldn't be that great.

About Python, I did check and I was wrong. On Windows, the Python interpreter
uses the terminal encoding by default and not UTF-8.

So you could try :

from osgeo import ogr
import sys
ds = ogr.Open('WFS:http://hip.latuviitta.org/cgi-bin/tinyows')
lyr = ds.GetLayerByName('lv:pks_tilastoalue_piste')
lyr.SetAttributeFilter("name='Hämeenkylä'".decode(sys.stdin.encoding).encode('utf-8'))
feat = lyr.GetNextFeature()
feat.DumpReadable()
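
The decode/encode step in the script above relies on Python 2, where a plain string literal is a byte string. As a sketch of just that transcoding step in Python 3 syntax, without GDAL or the network: to_utf8 is a hypothetical helper name, and CP1252 is assumed as the console encoding.

```python
def to_utf8(raw: bytes, source_encoding: str) -> bytes:
    """Re-encode bytes from the console code page to UTF-8 (hypothetical helper)."""
    return raw.decode(source_encoding).encode("utf-8")


# Bytes as a CP1252 console would deliver them (assumed code page).
console_bytes = "name='Hämeenkylä'".encode("cp1252")

filter_utf8 = to_utf8(console_bytes, "cp1252")
print(console_bytes)
print(filter_utf8)
```

In Python 3 itself, str values are already Unicode, so a filter typed as a normal string literal would not need this step.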

Or create a test.py script and make sure to open it with an editor configured for
UTF-8 (typically not Notepad!):

#!/usr/bin/env python

Mateusz Łoskot | 3 Jan 13:45 2012

Re: Re: WFS and -where with non-ASCII characters

On 3 January 2012 11:07, Jukka Rahkonen <jukka.rahkonen <at> mmmtike.fi> wrote:
> I took the successful query sent by Ari from the TinyOWS log and copied it
> literally into Windows and this way it works:
>
> -where name='Hämeenkylä'

Windows Command Prompt can work with UTF-8 characters if you change
codepage to UTF-8:

0) Open new prompt (cmd.exe)
1) Change font to Lucida Console
2) chcp 65001

And OGR can consume filter without problems:

-where "name=\"Hämeenkylä\""

Note, the \"\" is needed so as not to confuse the OGR SQL compiler;
otherwise the value Hämeenkylä
will be parsed as OGR SQL type SNT_COLUMN instead of SNT_CONSTANT for
the field value.

However, I think the problem may be with TinyOWS. It throws this error:

<ows:ExceptionText>QUERY_STRING contains forbidden
characters</ows:ExceptionText>

which is generated by TinyOWS:

http://www.tinyows.org/trac/browser/trunk/src/struct/cgi_request.c?rev=525#L208

Ari Jolma | 3 Jan 14:07 2012

Re: Re: WFS and -where with non-ASCII characters

On 01/03/2012 02:45 PM, Mateusz Łoskot wrote:
> On 3 January 2012 11:07, Jukka Rahkonen<jukka.rahkonen <at> mmmtike.fi>  wrote:
>> I took the successful query sent by Ari from the TinyOWS log and copied it
>> literally into Windows and this way it works:
>>
>> -where name='Hämeenkylä'
> Windows Command Prompt can work with UTF-8 characters if you change
> codepage to UTF-8:
>
> 0) Open new prompt (cmd.exe)
> 1) Change font to Lucida Console
> 2) chcp 65001
>
> And OGR can consume filter without problems:
>
> -where "name=\"Hämeenkylä\""
>
> Note, the \"\" is needed to not to confuse OGR SQL compilers,
> otherwise value Hämeenkylä
> will be parsed as OGR SQL type SNT_COLUMN instead of SNT_CONSTANT for
> field value.

Is that really so? At least in PostgreSQL " and ' have different uses. "
is used for column names that are not all lowercase or that contain
special characters, and ' is used for string constants (as in this case).
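
The convention described here can be checked with any SQL engine at hand. The sketch below uses SQLite through Python's sqlite3 module purely as a convenient stand-in; it says nothing about how OGR SQL or PostgreSQL parse the same text, and the table name places is made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE places (name TEXT)")
conn.execute("INSERT INTO places VALUES ('Hämeenkylä')")

# Double quotes: "name" is read as the column, so the stored value comes back.
as_identifier = conn.execute('SELECT "name" FROM places').fetchone()[0]

# Single quotes: 'name' is a string constant, returned as-is.
as_literal = conn.execute("SELECT 'name' FROM places").fetchone()[0]

# A single-quoted constant on the right-hand side matches the row.
matches = conn.execute(
    "SELECT count(*) FROM places WHERE name = 'Hämeenkylä'"
).fetchone()[0]

print(as_identifier, as_literal, matches)
conn.close()
```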

Ari
Mateusz Łoskot | 3 Jan 14:17 2012

Re: Re: WFS and -where with non-ASCII characters

On 3 January 2012 13:07, Ari Jolma <ari.jolma <at> gmail.com> wrote:
> On 01/03/2012 02:45 PM, Mateusz Łoskot wrote:
>> On 3 January 2012 11:07, Jukka Rahkonen<jukka.rahkonen <at> mmmtike.fi>  wrote:
>>>
>>> I took the successful query sent by Ari from the TinyOWS log and copied
>>> it literally into Windows and this way it works:
>>>
>>> -where name='Hämeenkylä'
>>
>> Windows Command Prompt can work with UTF-8 characters if you change
>> codepage to UTF-8:
>>
>> 0) Open new prompt (cmd.exe)
>> 1) Change font to Lucida Console
>> 2) chcp 65001
>>
>> And OGR can consume filter without problems:
>>
>> -where "name=\"Hämeenkylä\""
>>
>> Note, the \"\" is needed to not to confuse OGR SQL compilers,
>> otherwise value Hämeenkylä
>> will be parsed as OGR SQL type SNT_COLUMN instead of SNT_CONSTANT for
>> field value.
>
>
> Is that really so?

I have checked the two variants under the debugger and that's what I see,
assuming I am looking at the right place.

Even Rouault | 3 Jan 15:31 2012

Re: Re: WFS and -where with non-ASCII characters

Quoting Mateusz Łoskot <mateusz <at> loskot.net>:

> On 3 January 2012 13:07, Ari Jolma <ari.jolma <at> gmail.com> wrote:
> > On 01/03/2012 02:45 PM, Mateusz Łoskot wrote:
> >> On 3 January 2012 11:07, Jukka Rahkonen<jukka.rahkonen <at> mmmtike.fi>
>  wrote:
> >>>
> >>> I took the successful query sent by Ari from the TinyOWS log and copied
> >>> it literally into Windows and this way it works:
> >>>
> >>> -where name='Hämeenkylä'
> >>
> >> Windows Command Prompt can work with UTF-8 characters if you change
> >> codepage to UTF-8:
> >>
> >> 0) Open new prompt (cmd.exe)
> >> 1) Change font to Lucida Console
> >> 2) chcp 65001
> >>
> >> And OGR can consume filter without problems:
> >>
> >> -where "name=\"Hämeenkylä\""
> >>
> >> Note, the \"\" is needed to not to confuse OGR SQL compilers,
> >> otherwise value Hämeenkylä
> >> will be parsed as OGR SQL type SNT_COLUMN instead of SNT_CONSTANT for
> >> field value.
> >
> >
> > Is that really so?

Rahkonen Jukka | 3 Jan 15:37 2012

Re: Re: WFS and -where with non-ASCII characters

 
Mateusz Łoskot wrote:

> Jukka Rahkonen wrote:
> > I took the successful query sent by Ari from the TinyOWS 
> log and copied it
> > literally into Windows and this way it works:
> >
> > -where name='Hämeenkylä'
> 
> Windows Command Prompt can work with UTF-8 characters if you change
> codepage to UTF-8:
> 
> 0) Open new prompt (cmd.exe)
> 1) Change font to Lucida Console
> 2) chcp 65001
> 
> And OGR can consume filter without problems:
> 
> -where "name=\"Hämeenkylä\""
> 
> Note, the \"\" is needed to not to confuse OGR SQL compilers,
> otherwise value Hämeenkylä
> will be parsed as OGR SQL type SNT_COLUMN instead of SNT_CONSTANT for
> field value.
> 
> However, I think the problem may be with TinyOWS. It throws error;
> 
> <ows:ExceptionText>QUERY_STRING contains forbidden
> characters</ows:ExceptionText>

Even Rouault | 3 Jan 15:43 2012

Re: Re: WFS and -where with non-ASCII characters

Quoting Rahkonen Jukka <Jukka.Rahkonen <at> mmmtike.fi>:

>
> Mateusz Łoskot wrote:
>
> > Jukka Rahkonen wrote:
> > > I took the successful query sent by Ari from the TinyOWS
> > log and copied it
> > > literally into Windows and this way it works:
> > >
> > > -where name='Hämeenkylä'
> >
> > Windows Command Prompt can work with UTF-8 characters if you change
> > codepage to UTF-8:
> >
> > 0) Open new prompt (cmd.exe)
> > 1) Change font to Lucida Console
> > 2) chcp 65001
> >
> > And OGR can consume filter without problems:
> >
> > -where "name=\"Hämeenkylä\""
> >
> > Note, the \"\" is needed to not to confuse OGR SQL compilers,
> > otherwise value Hämeenkylä
> > will be parsed as OGR SQL type SNT_COLUMN instead of SNT_CONSTANT for
> > field value.
> >
> > However, I think the problem may be with TinyOWS. It throws error;
> >

Rahkonen Jukka | 3 Jan 15:51 2012

Re: Re: WFS and -where with non-ASCII characters

It starts to be really amusing to see how some characters are changing when they travel a few times between
Finland, France and Poland :)

-Jukka-

> -----Original message-----
> From: Even Rouault [mailto:even.rouault <at> mines-paris.org] 
> Sent: 3 January 2012 16:44
> To: Rahkonen Jukka
> Cc: 'gdal-dev <at> lists.osgeo.org'
> Subject: Re: [gdal-dev] Re: WFS and -where with non-ASCII characters
> 
> Quoting Rahkonen Jukka <Jukka.Rahkonen <at> mmmtike.fi>:
> 
> >
> > Mateusz Łoskot wrote:
> >
> > > Jukka Rahkonen wrote:
> > > > I took the successful query sent by Ari from the TinyOWS
> > > log and copied it
> > > > literally into Windows and this way it works:
> > > >
> > > > -where name='Hämeenkylä'
> > >
> > > Windows Command Prompt can work with UTF-8 characters if 
> you change
> > > codepage to UTF-8:
> > >
> > > 0) Open new prompt (cmd.exe)
> > > 1) Change font to Lucida Console

Mateusz Łoskot | 3 Jan 15:56 2012

Re: Re: WFS and -where with non-ASCII characters

On 3 January 2012 14:43, Even Rouault <even.rouault <at> mines-paris.org> wrote:
> Quoting Rahkonen Jukka <Jukka.Rahkonen <at> mmmtike.fi>:
>>
>> Mapserver behaves also as it did before. My codepage is now 65001 and
>> -where "name=\"Hämeenkylä\"" gives http 500 error while
>> -where name='Hämeenkylä' gives correct result.
>
> Yes, your observation confirms my little testing. Mateusz' trick with chcp
> indeed fixes the display of UTF-8 characters in the console, but when I enter an
> accented character, the command line utilities consume it as Latin1.
> Note: I'm on Windows XP.
>
> I've verified it with a trivial program compiled with MSVC:
>
> #include <stdio.h>
> #include <string.h>
>
> int main(int argc, char* argv[])
> {
>   /* strlen() counts bytes, not characters */
>   printf("%d\n", (int)strlen(argv[1]));
>   return 0;
> }
>
> If I run "test éven", it prints 4, whereas it should print 5 if the argument
> were really UTF-8.

Even,

Your test program works for me as expected for text with
all Polish diacritics included

http://www.flickr.com/photos/mloskot/6628216939/


Even Rouault | 3 Jan 16:22 2012

Re: Re: WFS and -where with non-ASCII characters

> Even,
>
> Your test program works for me as expected for text with
> all Polish diacritics included

As expected, really? I can see on the photo that there are 17 characters in the
string and that it prints 17. But I'd say that is *not* the expected result. If it
were UTF-8, it would be more than 17, because strlen() returns the
number of bytes.
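
The byte-count argument can be checked from Python, where encoding a string and taking len() of the result counts bytes just as strlen() does in C. This is only a sketch using the 'éven' test word from the earlier message; the 17-character string in the photo is not reproduced here.

```python
word = "\u00e9ven"  # 'éven', with a precomposed e-acute

# Latin-1 (and CP1252) store the accented letter in a single byte.
latin1_len = len(word.encode("latin-1"))

# UTF-8 needs two bytes for it, so the byte count exceeds the character count.
utf8_len = len(word.encode("utf-8"))

print(len(word), latin1_len, utf8_len)
```

A byte count equal to the character count for a string containing accented letters is therefore a sign that the program received a single-byte code page, not UTF-8.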

>
> http://www.flickr.com/photos/mloskot/6628216939/
>
> Best regards,
> --
> Mateusz Loskot, http://mateusz.loskot.net
>
Mateusz Łoskot | 3 Jan 16:39 2012

Re: Re: WFS and -where with non-ASCII characters

On 3 January 2012 15:22, Even Rouault <even.rouault <at> mines-paris.org> wrote:
>> Even,
>>
>> Your test program works for me as expected for text with
>> all Polish diacritics included
>
> As expected, really ? I can see on the photo that there are 17 characters in the
> string and that it prints 17. But I'd say it is *not* the expected result. If it
> was UTF-8, it would be more than 17 because strlen() will/should return the
> number of bytes.

Even,

You are right. I wrongly assumed that Polish diacritics would keep their
single-byte extended-ASCII encoding in UTF-8.
They are, of course, 2 bytes long in UTF-8.

Best regards,
-- 
Mateusz Loskot, http://mateusz.loskot.net
