Chris Croome | 25 Feb 18:14

Re: &nbsp; to Â mystery

Hi

On Wed 25-Feb-2004 at 11:32:05 -0500, William McKee wrote:
> 
> It's small so I've attached the script and the template. Let me know
> if you have any questions about it. I'd love to know if you see the
> character in the second example (there's a &nbsp; char between the
> words Sticky and Space).

UTF-8 env:

  $ printenv | grep LANG
  LANG=en_GB.UTF-8

Running the script I get these two results:

  <title>Petal Test</title>
  <p>Sticky�Space</p>

  <title>Petal Test with Petal::Parser::HTB</title>
  <p>Sticky Space</p>

Is that what you expected?

Chris

-- 
Chris Croome                               <chris@...>
web design                             http://www.webarchitects.co.uk/ 
web content management                               http://mkdoc.com/   

William McKee | 25 Feb 18:39

Re: &nbsp; to Â mystery

On Wed, Feb 25, 2004 at 05:14:35PM +0000, Chris Croome wrote:
> Is that what you expected?

Not really. I'm surprised you get the ??? with straight Petal since I
thought the \240 character was a sticky space in utf8. Does the ??? mean
that the terminal just can't print the character? That could make sense.
I don't understand why the Â is not printed when using
Petal::Parser::HTB in a UTF8 environment. More mysteries!
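
A minimal sketch of what may be behind the symptom, assuming the stray Â comes from UTF-8 bytes being displayed as Latin-1. The character \240 (U+00A0) is the non-breaking space as a code point, but its UTF-8 encoding is two bytes, and the first of those bytes is Â in ISO-8859-1. The variable names are illustrative only; nothing here is taken from the attached test script.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# U+00A0 (octal \240) is NO-BREAK SPACE as a code point.
my $nbsp = "\x{00A0}";

# Encoded as UTF-8 it becomes two bytes: 0xC2 0xA0.
my $utf8_bytes = encode('UTF-8', $nbsp);
printf "UTF-8 bytes: %s\n",
    join ' ', map { sprintf('%02X', ord($_)) } split //, $utf8_bytes;

# Reading those two bytes as ISO-8859-1 gives U+00C2 (Â) then U+00A0.
my $misread = decode('ISO-8859-1', $utf8_bytes);
printf "Misread as Latin-1: %s\n",
    join ' ', map { sprintf('U+%04X', ord($_)) } split //, $misread;
```

If that assumption holds, a Latin-1 terminal (LANG=en_US) would show the Â followed by the space, which matches the second result below.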

Here are my results:

$ printenv|grep LANG
LANG=en_US

    <title>Petal Test</title>
    <p>Sticky Space</p>

    ------------------------------------------------------------------------

    <title>Petal Test with Petal::Parser::HTB</title>
    <p>StickyÂ Space</p>

In the second example, I get the space but the Â character precedes it,
much like your problem with the copyright entity. I had originally
thought the Â was replacing the sticky space.

-Wm

-- 
Knowmad Services Inc.

Jean-Michel Hiver | 25 Feb 19:53

Re: &nbsp; to Â mystery

William McKee wrote:

>On Wed, Feb 25, 2004 at 05:14:35PM +0000, Chris Croome wrote:
>>Is that what you expected?
>
>Not really. I'm surprised you get the ??? with straight Petal since I
>thought the \240 character was a sticky space in utf8. Does the ??? mean
>that the terminal just can't print the character? That could make sense.
>I don't understand why the Â is not printed when using
>Petal::Parser::HTB in a UTF8 environment. More mysteries!

Just a quick note to tell everybody that I'm not dead, but that I have 
absolutely no idea what's going on. Our software can manage pages and 
pages of Arabic / Urdu / Chinese, but the copyright symbol and the
non-breaking space seem to behave funny sometimes :-/

Maybe a s/Â / / somewhere along the line is what's _really_ needed :-)

William McKee | 25 Feb 21:18

Re: &nbsp; to Â mystery

Hey Jean-Michel,

Good to hear from you! The devil is in the details.

> Maybe a s/Â / / somewhere along the line is what's _really_ needed :-)

I'm all for this kind of solution. However, I suggest you change it to
the following regex or it will not work in Chris' example:

    s/Â//g;
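
One caveat with a blanket substitution is that it would also delete any Â that legitimately belongs in the text. As a hedged alternative, here is a sketch that tries to reverse one layer of misreading instead of stripping characters. It assumes the damage is UTF-8 bytes that were read as Latin-1; `undo_mojibake` is a hypothetical helper, not part of Petal.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical helper (not part of Petal): undo one layer of
# "UTF-8 bytes read as Latin-1" damage, leaving other text alone.
sub undo_mojibake {
    my ($text) = @_;

    # Code points above U+00FF cannot come from a Latin-1 misread.
    return $text if $text =~ /[^\x00-\xFF]/;

    # Turn each character back into the single byte it was read from.
    my $bytes = encode('ISO-8859-1', $text);

    # Strict decode: dies on invalid UTF-8, so eval yields undef.
    my $fixed = eval { decode('UTF-8', $bytes, Encode::FB_CROAK) };
    return defined $fixed ? $fixed : $text;
}

my $damaged  = "Sticky\x{C2}\x{A0}Space";   # "StickyÂ" + NBSP + "Space"
my $repaired = undo_mojibake($damaged);      # "Sticky" + NBSP + "Space"
my $latin1   = undo_mojibake("caf\x{E9}");   # unchanged: not valid UTF-8
```

Text that is genuinely Latin-1 fails the strict decode and is returned untouched, so the helper only rewrites strings that look like the misread case.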

-Wm

-- 
Knowmad Services Inc.
http://www.knowmad.com

William McKee | 26 Feb 03:34

Re: &nbsp; to Â mystery

Hey Chris,

I just read about the `use bytes` pragma on another mailing list and
threw it into my test script just to see what would happen. Here's the
output I get now (notice that the original version now displays the
Â and the broken version still shows the Â and now also has a Ã):

  <title>Petal Test</title>

  <p>Sticky Space</p>

  ------------------------------------------------------------------------

  <title>Petal Test with Petal::Parser::HTB</title>

  <p>StickyàSpace</p>

I wonder what you get when you add that pragma.... At any rate, I
started to do some more reading about this pragma and came across the
following manpages which would probably be of use to us (haven't read
them myself yet, though):

    perllocale
    perluniintro
    perlunicode

William

-- 
Knowmad Services Inc.

Chris Croome | 26 Feb 13:07

Re: &nbsp; to Â mystery

Hi

On Wed 25-Feb-2004 at 09:34:25PM -0500, William McKee wrote:
> 
> I just read about the `use bytes` pragma on another mailing list and
> threw it into my test script just to see what would happen. Here's the
> output I get now (notice that the original version now displays the
> Â and the broken version still shows the Â and now also has a Ã):
> 
> 
>   <title>Petal Test</title>
>   
>   <p>Sticky Space</p>
>   
>   ------------------------------------------------------------------------
>   
>   <title>Petal Test with Petal::Parser::HTB</title>
>   
>   <p>StickyàSpace</p>
> 
> 
> I wonder what you get when you add that pragma.... 

I get this:

  <title>Petal Test</title>

  <p>Sticky Space</p>

  ------------------------------------------------------------------------

William McKee | 26 Feb 15:10

Re: &nbsp; to Â mystery

On Thu, Feb 26, 2004 at 12:07:52PM +0000, Chris Croome wrote:
> I get this:

Interesting. That's what I get with LANG=en_US. The 'use bytes' pragma
acts as though it is converting utf8 to ascii.
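
A small sketch of what the pragma does, as far as I understand it: `use bytes` does not convert anything, it makes built-ins such as `length` look at the bytes of Perl's internal representation within its lexical scope. The example assumes a string that Perl holds internally as UTF-8, which is forced here with `utf8::upgrade`; the variable names are illustrative.

```perl
use strict;
use warnings;

# U+00A0 NO-BREAK SPACE, i.e. the \240 character from the thread.
my $nbsp = "\x{00A0}";

# Force Perl's internal UTF-8 representation (two bytes: C2 A0).
utf8::upgrade($nbsp);

# Character semantics: one character.
my $chars = length $nbsp;

# Byte semantics, scoped to this block only: two bytes.
my $bytes = do { use bytes; length $nbsp };

printf "chars=%d bytes=%d\n", $chars, $bytes;
```

Under that reading, a single non-breaking space can surface as two separate bytes once byte semantics are in play.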

Wm

-- 
Knowmad Services Inc.
http://www.knowmad.com

