William McKee | 25 Feb 17:32

Re: &nbsp; to Â mystery

On Wed, Feb 25, 2004 at 02:01:56PM +0000, Chris Croome wrote:
> Mail the script  -- I have a UTF-8 env and would be happy to test it.

It's small so I've attached the script and the template. Let me know if
you have any questions about it. I'd love to know if you see the
character in the second example (there's a &nbsp; char between the words
Sticky and Space).

> I don't have any answer but I know that MKDoc has a bug very much like
> this at the moment:
> 
>   Copyright Â© 2001-2002 MKDoc Ltd.

I see it as well. In fact, I went back to look at my test site and am
now seeing that character showing up again despite the meta tag. Dunno
why it went away for me yesterday but it's definitely there again
instead of a sticky space.

I even fired up Windows on my laptop to check; it's also showing the
character in both Firefox & IE6. Are you using Petal::Parser::HTB on the
scripts that produce these pages?

> I haven't been able to work out how to reproduce this but lots of pages
> with a (c) symbol in the rights metadata field end up with Â©
> _sometimes_ ...

The inconsistency is definitely the most annoying aspect of this whole
problem.

William

Chris Croome | 25 Feb 18:14

Re: &nbsp; to Â mystery

Hi

On Wed 25-Feb-2004 at 11:32:05 -0500, William McKee wrote:
> 
> It's small so I've attached the script and the template. Let me know
> if you have any questions about it. I'd love to know if you see the
> character in the second example (there's a &nbsp; char between the
> words Sticky and Space).

UTF-8 env:

  $ printenv | grep LANG
  LANG=en_GB.UTF-8

Running the script I get these two results:

  <title>Petal Test</title>
  <p>Sticky�Space</p>

  <title>Petal Test with Petal::Parser::HTB</title>
  <p>Sticky Space</p>

Is that what you expected?

Chris

-- 
Chris Croome                               <chris@...>
web design                             http://www.webarchitects.co.uk/ 
web content management                               http://mkdoc.com/   

William McKee | 25 Feb 18:39

Re: &nbsp; to Â mystery

On Wed, Feb 25, 2004 at 05:14:35PM +0000, Chris Croome wrote:
> Is that what you expected?

Not really. I'm surprised you get the ??? with straight Petal since I
thought the \240 character was a sticky space in utf8. Does the ??? mean
that the terminal just can't print the character? That could make sense.
I don't understand why the Â is not printed when using
Petal::Parser::HTB in a UTF8 environment. More mysteries!
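
[Editorial sketch, in Python rather than Perl only because it makes the bytes easy to inspect; the variable and function names are illustrative and do not come from the test script.] On the question above: octal \240 is the byte 0xA0, which is the no-break space in ISO-8859-1. In UTF-8 the same character (U+00A0) is the two-byte sequence C2 A0, and a lone 0xA0 is not valid UTF-8. That would be consistent with a UTF-8 terminal printing a replacement mark for it, though the thread does not confirm that this is what the plain Petal run emits.

```python
# U+00A0 NO-BREAK SPACE in the two encodings discussed in this thread.
nbsp = "\u00a0"
latin1_bytes = nbsp.encode("latin-1")  # one byte: 0xA0 (octal 240)
utf8_bytes = nbsp.encode("utf-8")      # two bytes: 0xC2 0xA0


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Assumption: the plain Petal run emitted a bare 0xA0 byte between the words.
lone = b"Sticky\xa0Space"

# A UTF-8 consumer cannot decode the stray byte and substitutes U+FFFD.
shown = lone.decode("utf-8", errors="replace")

print(oct(latin1_bytes[0]), utf8_bytes.hex(), is_valid_utf8(lone))
```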

Here's my results:

$ printenv|grep LANG
LANG=en_US

    <title>Petal Test</title>
    <p>Sticky Space</p>

    ------------------------------------------------------------------------

    <title>Petal Test with Petal::Parser::HTB</title>
    <p>StickyÂ Space</p>

In the second example, I get the space but the Â character precedes it,
much like your problem with the copyright entity. I had originally
thought the Â was replacing the sticky space.
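
[Editorial sketch in Python; names are illustrative.] A Â in front of the space is what appears when the UTF-8 bytes for U+00A0 are read back as ISO-8859-1: the lead byte 0xC2 is Â in Latin-1 and the second byte 0xA0 is still a no-break space. The copyright sign behaves the same way. Whether this is what actually happens inside Petal::Parser::HTB is an assumption, not something the thread establishes.

```python
# UTF-8 output misread as ISO-8859-1 (Latin-1).
text = "Sticky\u00a0Space"

# Encode correctly as UTF-8, then decode with the wrong charset.
misread = text.encode("utf-8").decode("latin-1")

# The same mix-up applied to the copyright sign U+00A9 (UTF-8: C2 A9).
copyright_misread = "\u00a9".encode("utf-8").decode("latin-1")

print(repr(misread))            # 'StickyÂ\xa0Space'
print(repr(copyright_misread))  # 'Â©'
```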

-Wm

-- 
Knowmad Services Inc.

Jean-Michel Hiver | 25 Feb 19:53

Re: &nbsp; to Â mystery

William McKee wrote:

>On Wed, Feb 25, 2004 at 05:14:35PM +0000, Chris Croome wrote:
>  
>
>>Is that what you expected?
>>    
>>
>
>Not really. I'm surprised you get the ??? with straight Petal since I
>thought the \240 character was a sticky space in utf8. Does the ??? mean
>that the terminal just can't print the character? That could make sense.
>I don't understand why the Â is not printed when using
>Petal::Parser::HTB in a UTF8 environment. More mysteries!
>  
>
Just a quick note to tell everybody that I'm not dead, but that I have 
absolutely no idea what's going on. Our software can manage pages and 
pages of Arabic / Urdu / Chinese but the copyright symbol and the 
non-breaking space seem to behave funny sometimes :-/

Maybe a s/Â / / somewhere along the line is what's _really_ needed :-)

William McKee | 25 Feb 21:18

Re: &nbsp; to Â mystery

Hey Jean-Michel,

Good to hear from you! The devil is in the details.

> Maybe a s/Â / / somewhere along the line is what's _really_ needed :-)

I'm all for this kind of solution. However, I suggest you change it to
the following regex or it will not work in Chris' example:

    s/Â//g;

-Wm

-- 
Knowmad Services Inc.
http://www.knowmad.com
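
[Editorial sketch in Python; the helper names `strip_a_circumflex` and `undo_latin1_misread` are hypothetical.] The substitution suggested in this message removes every Â, which hides the symptom but also deletes any Â that legitimately belongs in the text. If the cause is UTF-8 being read as Latin-1 (an assumption), reversing that step restores the original without touching real content:

```python
import re


def strip_a_circumflex(s: str) -> str:
    """Python equivalent of the s/Â//g suggested in the message above."""
    return re.sub("\u00c2", "", s)


def undo_latin1_misread(s: str) -> str:
    """Reverse a UTF-8-read-as-Latin-1 mix-up.

    Raises UnicodeEncodeError or UnicodeDecodeError if the input was not
    produced by that mix-up, so it fails loudly instead of guessing.
    """
    return s.encode("latin-1").decode("utf-8")


mangled = "Sticky\u00c2\u00a0Space"

stripped = strip_a_circumflex(mangled)   # 'Sticky\xa0Space'
repaired = undo_latin1_misread(mangled)  # 'Sticky\xa0Space'

# The blanket strip also eats a legitimate capital A-circumflex.
collateral = strip_a_circumflex("\u00c2ge")  # 'ge'
```

Both approaches give the same result on the test string; they differ only on text that contains a genuine Â.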

William McKee | 26 Feb 03:34

Re: &nbsp; to Â mystery

Hey Chris,

I just read about the `use bytes` pragma on another mailing list and
threw it into my test script just to see what would happen. Here's the
output I get now (notice that the original version now displays the
Â and the broken version still shows the Â and now also has a Ã):

  <title>Petal Test</title>

  <p>Sticky Space</p>

  ------------------------------------------------------------------------

  <title>Petal Test with Petal::Parser::HTB</title>

  <p>StickyàSpace</p>

I wonder what you get when you add that pragma.... At any rate, I
started to do some more reading about this pragma and came across the
following manpages which would probably be of use to us (haven't read
them myself yet, though):

    perllocale
    perluniintro
    perlunicode

William

-- 
Knowmad Services Inc.

Chris Croome | 26 Feb 13:07

Re: &nbsp; to Â mystery

Hi

On Wed 25-Feb-2004 at 09:34:25PM -0500, William McKee wrote:
> 
> I just read about the `use bytes` pragma on another mailing list and
> threw it into my test script just to see what would happen. Here's the
> output I get now (notice that the original version now displays the
> Â and the broken version still shows the Â and now also has a Ã):
> 
> 
>   <title>Petal Test</title>
>   
>   <p>Sticky Space</p>
>   
>   ------------------------------------------------------------------------
>   
>   <title>Petal Test with Petal::Parser::HTB</title>
>   
>   <p>StickyàSpace</p>
> 
> 
> I wonder what you get when you add that pragma.... 

I get this:

  <title>Petal Test</title>

  <p>Sticky Space</p>

  ------------------------------------------------------------------------

William McKee | 26 Feb 15:10

Re: &nbsp; to Â mystery

On Thu, Feb 26, 2004 at 12:07:52PM +0000, Chris Croome wrote:
> I get this:

Interesting. That's what I get with LANG=en_US. The 'use bytes' pragma
acts as though it is converting utf8 to ascii.

Wm

-- 
Knowmad Services Inc.
http://www.knowmad.com
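
[Editorial sketch in Python; names are illustrative.] On the extra Ã reported earlier in the thread once `use bytes` was added: one way such a character can arise is if text that has already been misread as Latin-1 goes through the same UTF-8 encode and Latin-1 decode a second time. This is only a demonstration of that double-encoding pattern and makes no claim about what `use bytes` does internally.

```python
original = "Sticky\u00a0Space"

# First pass: UTF-8 bytes (C2 A0) read as Latin-1 gives Â plus NBSP.
once = original.encode("utf-8").decode("latin-1")

# Second pass: Â (U+00C2) becomes C3 82 and NBSP becomes C2 A0 again.
twice = once.encode("utf-8").decode("latin-1")

# U+0080..U+009F are C1 control characters and normally display as nothing,
# so drop them to approximate what a browser or terminal would show.
visible = "".join(ch for ch in twice if not ("\u0080" <= ch <= "\u009f"))

print(repr(twice))    # 'StickyÃ\x82Â\xa0Space'
print(repr(visible))  # 'StickyÃÂ\xa0Space'
```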

