Chris Croome | 4 May 17:59

Re: More on entities and Â

Hi

On Tue 04-May-2004 at 11:50:14AM -0400, William McKee wrote:
> 
> > > The next thing that confuses me is that I have Perl 5.8.3 installed on
> > > both systems. Only one is showing the extra character.
> > 
> > This is, of course, a mystery. ;-)
> 
> Figures... :-/

Perhaps your environment is different?

  $ printenv | grep LANG

?

> My understanding of utf-8 was that it was compatible with latin1.

No, UTF-8 is compatible with US-ASCII, not Latin-1.
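
To illustrate the difference, here is a minimal sketch using the core
Encode module. The helper name hex_of is made up for this example. It
encodes one ASCII character and one non-ASCII character (the no-break
space, U+00A0) both ways, then misreads the UTF-8 bytes as Latin-1:

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical helper: hex-dump a byte string, e.g. "\xC2\xA0" gives "c2a0".
sub hex_of { return unpack('H*', $_[0]) }

my $ascii_a = 'A';        # U+0041, inside the ASCII range
my $nbsp    = "\x{A0}";   # U+00A0 NO-BREAK SPACE, outside ASCII

# ASCII characters are the same single byte in both encodings.
my $a_latin1 = hex_of(encode('ISO-8859-1', $ascii_a));
my $a_utf8   = hex_of(encode('UTF-8',      $ascii_a));

# Above 0x7F the two diverge: one byte in Latin-1, two bytes in UTF-8.
my $nbsp_latin1 = hex_of(encode('ISO-8859-1', $nbsp));
my $nbsp_utf8   = hex_of(encode('UTF-8',      $nbsp));

# Reading the UTF-8 bytes as if they were Latin-1 yields two characters.
my $misread = decode('ISO-8859-1', encode('UTF-8', $nbsp));
my $misread_points = join ' ',
    map sprintf('U+%04X', ord($_)), split(//, $misread);

print "A:    latin1=$a_latin1 utf8=$a_utf8\n";        # 41 and 41
print "nbsp: latin1=$nbsp_latin1 utf8=$nbsp_utf8\n";  # a0 and c2a0
print "misread as latin1: $misread_points\n";         # U+00C2 U+00A0
```

U+00C2 is the letter Â, so a UTF-8 no-break space displayed as Latin-1
shows up as an Â followed by the no-break space.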

> One more point which may be at the root of my problems. I'm trying
> to get Apache to add the Content-Type header using the following
> declaration in my httpd.conf per the Apache docs:
> 
>     AddDefaultCharset utf-8
> 
> No matter if I have this in my main server configuration or the
> virtual host configuration, if I do a `HEAD http://servername`, I
> get back a Content-Type of iso-8859-1. If I view the page in

William McKee | 4 May 18:33

Re: More on entities and Â

On Tue, May 04, 2004 at 04:59:26PM +0100, Chris Croome wrote:
> Perhaps your environment is different?
> 
>   $ printenv | grep LANG

That could be. That command returns nothing for my 'www' accounts on
both systems. However, for the user accounts, it returns EN_us on my
local test server and nothing for the production server.
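
Since the login shell and the web server process can see different
environments, it may be more telling to check from inside a script. A
rough sketch follows. The function name locale_env and the list of
variables are my own choice, not anything Apache or Perl prescribes:

```perl
use strict;
use warnings;

# Hypothetical helper: collect the locale-related environment variables
# visible to the current process, marking the missing ones.
sub locale_env {
    my %seen;
    for my $name (qw(LANG LANGUAGE LC_ALL LC_CTYPE)) {
        $seen{$name} = defined $ENV{$name} ? $ENV{$name} : '(unset)';
    }
    return %seen;
}

my %locale = locale_env();
print "$_=$locale{$_}\n" for sort keys %locale;
```

Run from the command line, this reports the shell's settings. Run as a
CGI script (after printing a Content-type header), it would report what
the server process actually passes on.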

A quick look in my .bashrc and /etc/bash.bashrc and /etc/profile doesn't
turn up where the LANG is being set. Any ideas where this setting would
be?

> > My understanding of utf-8 was that it was compatible with latin1.
> 
> No, UTF-8 is compatible with US-ASCII, not Latin-1.

Oh! Well that's certainly part of my confusion.

> Hmm, that's odd. 
> 
> I usually do it like this:
> 
>   AddType 'text/html; charset=UTF-8' .html

That works for my html files but my cgi scripts do not have extensions.
I tried using this command:

   AddType 'text/html; charset=UTF-8' *


Michele Beltrame | 4 May 19:23

Re: More on entities and Â

Hi!

> I guess I'm stuck with overloading the process function and forcing
> iso-8859-1 mode since I can't seem to convince my Apache server to
> set the Content-Type header to my preference. Perhaps one day
> I'll figure out which command in my settings file was causing the
> different behavior.

You probably need to overload the process() function anyhow to ensure
that the output is UTF-8. Apache can only "tell the browser" that
you are sending UTF-8; it cannot make you really send UTF-8. ;-)
Moreover, the header is easy to set from the script itself rather than
from the Apache configuration:

print "Content-type: text/html; charset=UTF-8\n\n";

or, if you're using CGI.pm:

print $q->header(
    -charset => 'UTF-8'
);
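
To make the "really send UTF-8" part concrete, here is a small sketch
that encodes the body explicitly with the core Encode module, so that
the bytes match what the header claims. The function name utf8_response
is invented for this example and is not part of CGI.pm or
CGI::Application:

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: build a complete CGI response whose body is
# encoded as UTF-8, matching the charset announced in the header.
sub utf8_response {
    my ($html) = @_;
    my $body = encode('UTF-8', $html);   # character string to UTF-8 bytes
    return "Content-type: text/html; charset=UTF-8\n"
         . "Content-Length: " . length($body) . "\n\n"
         . $body;
}

binmode(STDOUT);   # write the bytes as-is, with no extra translation layer
print utf8_response("<p>caf\x{E9}</p>\n");
```

The Content-Length is computed on the encoded bytes, not on the
characters, since the two differ as soon as the text leaves ASCII.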

Hope this helps, somehow. ;)

	Talk to you soon, Michele.

-- 
Michele Beltrame
http://www.italpro.net/mb/
ICQ# 76660101 - e-mail: mb@...

William McKee | 4 May 20:54

Re: More on entities and Â - SOLVED

Hi folks,

OK, I have a resolution regarding the forced setting of the Content-type
header. This behavior was occurring with a cgi script being run via
Apache::Registry. The script uses CGI::Application which sets the
Content-type to text/html. Because this is an A::Registry script, I have
the PerlSendHeader directive on.

The end result is that something was defaulting the charset to
ISO-8859-1 despite my Apache settings, which say otherwise. A
quick "hello world" test shows that CGI.pm does this if no charset
is specified in the type argument to the header() function.

With this script, the header picks up its charset from the
AddDefaultCharset setting:

    #!/usr/bin/perl
    use strict;
    print "Content-type: text/html\n\nHello World!\n";

This one sets it to ISO-8859-1:

    #!/usr/bin/perl
    use strict;
    use CGI;
    my $q = CGI->new;
    # No charset given, so CGI.pm falls back to its own default
    print $q->header( -type => 'text/html' );
    print "Hello World!";

You'll see the header printed as:


Michele Beltrame | 5 May 09:54

Re: More on entities and Â - SOLVED

Hi!

> If I specify the charset in the call to header(), all is well.
> Personally, I think this is a bit heavy-handed of CGI.pm to force the
> content-type. I'm not convinced it is a bug but it's darn close!

CGI.pm documentation reports this:

       The -charset parameter can be used to control the character
       set sent to the browser.  If not provided, defaults to
       ISO-8859-1.  As a side effect, this sets the charset()
       method as well.

I think that's fair enough, as Perl's default charset (when it doesn't
upgrade automatically to UTF-8) is latin1 (ISO-8859-1). Hopefully,
when Perl switches to UTF-8 for good, or provides a clearer way to
choose the charset one wants to use, CGI.pm will just do the same. ;-)

	Talk to you soon, Michele.

-- 
Michele Beltrame
http://www.italpro.net/mb/
ICQ# 76660101 - e-mail: mb@...

William McKee | 5 May 21:42

Re: More on entities and Â - SOLVED

On Wed, May 05, 2004 at 09:54:33AM +0200, Michele Beltrame wrote:
> > If I specify the charset in the call to header(), all is well.
> > Personally, I think this is a bit heavy-handed of CGI.pm to force the
> > content-type. I'm not convinced it is a bug but it's darn close!
> 
> CGI.pm documentation reports this:

Yeah, I came across that bit of documentation after sending the message.
I guess that I've just never really been aware of the fact nor have I
seen it mentioned as a potential problem on any of the mailing lists I
read. Now we know!

It would be nice if CGI.pm checked the default settings in Apache's
httpd.conf or auto-detected the format of the outgoing data but those
capabilities are probably asking a bit too much.
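
For what it's worth, a crude form of that auto-detection can be
sketched with the core Encode module: test whether the outgoing bytes
are well-formed UTF-8 and fall back to ISO-8859-1 otherwise. This is
only a heuristic of my own (the name guess_charset is made up), not
something CGI.pm does:

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: guess a charset label for a string of output bytes.
# Well-formed UTF-8 (which includes plain ASCII) is reported as UTF-8;
# anything else is assumed to be ISO-8859-1.
sub guess_charset {
    my ($bytes) = @_;
    my $copy = $bytes;   # decode() with a CHECK argument may modify its input
    my $ok = eval { decode('UTF-8', $copy, Encode::FB_CROAK); 1 };
    return $ok ? 'UTF-8' : 'ISO-8859-1';
}

print guess_charset("caf\xC3\xA9"), "\n";   # bytes of "café" in UTF-8
print guess_charset("caf\xE9"), "\n";       # bytes of "café" in Latin-1
```

The guess can be wrong: some Latin-1 byte sequences happen to be valid
UTF-8 as well, so declaring the charset explicitly remains the safer
route.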

Thanks,
William

-- 
Knowmad Services Inc.
http://www.knowmad.com

William McKee | 4 May 19:40

Re: More on entities and Â

On Tue, May 04, 2004 at 07:23:34PM +0200, Michele Beltrame wrote:
> You probably need to overload the process() function anyhow to ensure
> that the output is UTF-8.

That's probably a good idea.

On my production server, my output is UTF-8 right now (I can see this by
changing the charset from within Firefox). My problem is that something
in my Apache configuration on this server is causing the following
header to be sent:

    Content-Type: text/html; charset=ISO-8859-1

At this point, I've strayed off topic so will wrap up this thread and
take my inquiries to an Apache list. Thanks so much for everyone's input
on this very complex issue.

Regards,
William

-- 
Knowmad Services Inc.
http://www.knowmad.com

