William McKee | 3 May 20:29

More on entities and Â

Hi all,

Well, the saga continues for me and the capital A circumflex. Most
recently, I am receiving the \302 character on a production server but
not on my test server. Both servers are running Debian Linux with Apache
1.3.29 and mod_perl 1.29.

In this case, I'm not using Petal::HTB which I thought was the culprit
in my previous posts. This time, with the help of Firefox, I was able to
determine that Petal is outputting an nbsp character (\240, A0 in hex) but
is prepending it with a \302 character (the capital A circumflex, C2 in
hex). On my test server, the \302 character is not being output.

Chris had indicated this prepending behavior with his posts regarding the
copyright character and the capital A circumflex. At the time, I did not
realize that the same behavior was occurring with the nbsp entity. So it
appears to be affecting more than just the copyright entity.

None of the previous recommendations such as changing the file encoding
or setting the meta tags are helping. For now, I'm going to do a global
search and replace on the output of the process command to remove this
character. However, this is not a good long-term solution due to the
hackish nature and the performance hit. Any suggestions or advice for
tracking down this bug would be most appreciated.

Regards,
William

-- 
Knowmad Services Inc.

Grant McLean | 3 May 21:09

Re: More on entities and Â

William,

If your output encoding is UTF8 then every character beyond
0x7F will be two or more bytes.  The non-breaking space
character should be A2 A0 (I think).  So as long as you give
the browser the correct charset setting in your headers, it
should do exactly the right thing.
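
A minimal sketch of the byte counts described above, using only the core
Encode module. The helper name utf8_hex is made up for illustration. It
also shows what happens when the UTF-8 bytes of an nbsp are read as
ISO-8859-1: the reader sees two characters, the first being the capital
A circumflex.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical helper: UTF-8 encode a character string and return the
# resulting bytes as uppercase hex, separated by spaces.
sub utf8_hex {
    my ($chars) = @_;
    my $bytes = encode('UTF-8', $chars);
    return join ' ', map { sprintf '%02X', ord($_) } split //, $bytes;
}

print utf8_hex("A"), "\n";          # 41        (ASCII stays one byte)
print utf8_hex("\x{A0}"), "\n";     # C2 A0     (non-breaking space)
print utf8_hex("\x{A9}"), "\n";     # C2 A9     (copyright sign)
print utf8_hex("\x{20AC}"), "\n";   # E2 82 AC  (euro sign)

# Misreading the two UTF-8 bytes of nbsp as ISO-8859-1 gives two
# characters: U+00C2 (capital A circumflex) followed by U+00A0 (nbsp).
my $misread = decode('ISO-8859-1', encode('UTF-8', "\x{A0}"));
printf "%d chars: U+%04X U+%04X\n",
    length($misread), map { ord($_) } split //, $misread;
```

If the browser is told the document is UTF-8, the same two bytes are read
back as a single nbsp and no extra character appears.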

Regards
Grant


William McKee | 3 May 22:01

Re: More on entities and Â

On Tue, May 04, 2004 at 07:09:05AM +1200, Grant McLean wrote:
> If your output encoding is UTF8 then every character beyond
> 0x7F will be two or more bytes.  The non-breaking space
> character should be A2 A0 (I think).  So as long as you give
> the browser the correct charset setting in your headers, it
> should do exactly the right thing.

Hi Grant,

Thanks for the quick response. How do I know what my output encoding is?
I can set the encoding of the file and the meta tag. Should I be
modifying the configuration of my Apache server?

The output I'm getting right now is C2A0. According to this table[1],
nbsp is 00A0 and A2A0 is not defined.
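
The two numbers are not in conflict: 00A0 is the Unicode code point, and
C2A0 is how that code point is written as UTF-8 bytes. A rough sketch of
the two-byte arithmetic follows; the function name two_byte_utf8 is
invented for this example and covers only code points U+0080 to U+07FF.

```perl
use strict;
use warnings;

# Encode a code point in the range U+0080..U+07FF as two UTF-8 bytes.
# Layout: 110xxxxx 10yyyyyy, where xxxxx holds the high 5 bits and
# yyyyyy the low 6 bits of the code point.
sub two_byte_utf8 {
    my ($cp) = @_;
    die "code point outside the two-byte range\n"
        if $cp < 0x80 || $cp > 0x7FF;
    my $lead  = 0xC0 | ($cp >> 6);      # 110xxxxx
    my $trail = 0x80 | ($cp & 0x3F);    # 10yyyyyy
    return sprintf '%02X%02X', $lead, $trail;
}

print two_byte_utf8(0xA0), "\n";   # C2A0 (nbsp)
print two_byte_utf8(0xA9), "\n";   # C2A9 (copyright sign)
```

So seeing C2A0 on the wire is what a UTF-8 encoded nbsp should look like.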

Thanks,
William

[1] http://www.columbia.edu/kermit/utf8-t1.html

-- 
Knowmad Services Inc.
http://www.knowmad.com

Michele Beltrame | 4 May 10:15

Re: More on entities and Â

Hi!

> Thanks for the quick response. How do I know what my output encoding is?

Perl does exactly what it wants, that is to say it sets the encoding
depending on the input: if there are wide characters it goes with UTF-8,
otherwise it stays with ISO-8859-1. At least, this is what happens in
my Slackware 9.1's Perl 5.8.3.

> I can set the encoding of the file and the meta tag. Should I be
> modifying the configuration of my Apache server?

First of all you need to ensure your output is UTF-8:

use Encode;
# Encode the string returned by process() into UTF-8 bytes before output
my $string = $template->process (%stuff);
$string = Encode::encode ('utf8', $string);

If you don't want to do this every time you output a template, you
can subclass Petal and override the process() method. See a recent message
from Jean-Michel Hiver about this.
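
As a lighter alternative to subclassing, here is a sketch of a plain
wrapper function. Note that this swaps the subclass approach for a
helper: process_utf8 is a made-up name, and Demo::Template is only a
stand-in so the example runs without Petal installed. It assumes nothing
about the template object beyond a process() method that returns a
string.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: call process() on any template object and return
# the result encoded as UTF-8 bytes.
sub process_utf8 {
    my ($template, @args) = @_;
    my $string = $template->process(@args);
    return Encode::encode('UTF-8', $string);
}

# Stand-in for a real template object, present only so the sketch runs
# by itself. Its process() returns "a", an nbsp, then "b".
package Demo::Template;
sub new     { return bless {}, shift }
sub process { return "a\x{A0}b" }

package main;

my $bytes = process_utf8(Demo::Template->new);
print join(' ', map { sprintf '%02X', ord($_) } split //, $bytes), "\n";
# 61 C2 A0 62
```

With a real template you would pass the Petal object and its arguments
instead of the stand-in.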

There are two ways to declare the charset to the browser, and you probably
should use both. First of all there's the header:

Content-type: text/html; charset=utf-8

However, the most important thing is the META tag, as it overrides the
header settings. This will seem like a scandal to purists, but that's the
way it goes. ;-) Here's the tag:

Jean-Michel Hiver | 4 May 10:50

Re: More on entities and Â


>There are two ways, and you probably should use both. First of all there's
>the header:
>
>Content-type: text/html; charset=utf-8;
>
>However, the most important this is the META tag, as it overrides the
>header settings. This will seems like scandal to purists, but that's the
>way it goes. ;-) Here's the header:
>
It is an important tag, however a lot of browsers give precedence to the
HTTP headers. Bottom line is, you need to declare your charset in both
your HTTP headers and your HTML template to be on the safe side.
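
A rough CGI-style sketch of declaring the charset in both places and
encoding the body to match. The function name build_response and the
tiny page markup are invented for this example; under mod_perl the
header would be set through the request object instead.

```perl
use strict;
use warnings;
use Encode ();

# Assumed charset. The HTTP header, the meta tag and the encoding step
# all have to agree on it.
my $CHARSET = 'utf-8';

# Hypothetical helper: build a CGI-style response made of the header
# block, a blank line, and the body encoded as bytes.
sub build_response {
    my ($body_chars) = @_;
    my $html = qq{<html><head>\n}
             . qq{<meta http-equiv="Content-Type" content="text/html; charset=$CHARSET">\n}
             . qq{</head><body>$body_chars</body></html>\n};
    return "Content-Type: text/html; charset=$CHARSET\r\n\r\n"
         . Encode::encode($CHARSET, $html);
}

binmode STDOUT;    # emit the encoded bytes untouched
print build_response("one\x{A0}two");
```

Keeping the charset in one variable makes it harder for the header and
the meta tag to drift apart.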

Cheers,
Jean-Michel.

Grant McLean | 3 May 22:35

Re: More on entities and Â

William McKee wrote:

 > On Tue, May 04, 2004 at 07:09:05AM +1200, Grant McLean wrote:
 >
 >>If your output encoding is UTF8 then every character beyond
 >>0x7F will be two or more bytes.  The non-breaking space
 >>character should be A2 A0 (I think).  So as long as you give
 >>the browser the correct charset setting in your headers, it
 >>should do exactly the right thing.
 >
 >
 > Hi Grant,
 >
 > Thanks for the quick response. How do I know what my output
 > encoding is?

It will be UTF8 unless you do something to change it.
For example (assuming Perl 5.8):

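   my $html = $template->process (%args);
   # Lexical filehandle; the :encoding layer converts characters on output
   open(my $fh, '>:encoding(iso-8859-1)', $path) or die "open($path): $!";
   print $fh $html;
   close($fh) or die "close($path): $!";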

 > I can set the encoding of the file and the meta tag.

Yes, this tells the browser how it should interpret the document:

   <meta http-equiv="Content-type" content="text/html; charset=utf-8">

Obviously this needs to match the encoding used to create the file.

Chris Croome | 4 May 15:35

Re: More on entities and Â

Hi

On Tue 04-May-2004 at 08:35:13 +1200, Grant McLean wrote:
> 
> I've heard that not all browsers honour the charset suffix on the
> Content-type header so it might not be worth the effort.  The meta tag
> has the advantage of staying with the document if the user does a
> 'Save-as', whereas the HTTP header would be lost.

IE _never_ takes _any_ notice of the charset in the HTTP headers,
therefore it is essential to use a meta element in the document. (In the
same way, it doesn't really care what MIME type things are served as:
apart from the content of the file, file extensions are the only thing
it looks at.)

However, as far as I'm aware all other browsers do follow the HTTP
specification (where the charset in the headers takes precedence over the
charset set in the document), so it is important to use both and to have
their values the same!

Chris

-- 
Chris Croome                               <chris@...>
web design                             http://www.webarchitects.co.uk/ 
web content management                               http://mkdoc.com/   

