Duane Raymond | 20 Sep 21:50

Convert Text to UTF-8?

Hi,

I'm sure someone has come across this before - I just can't find an answer...

I've got a list of countries in a relational DB which I am bringing
into Plone in different languages according to the language setting
(e.g. Spain for English, España for Spanish).  However, when it brings
the list in (via a SQL method to get the data and a DTML method to
format it in a drop-down list), the characters with accents are not
displayed properly.

As Plone uses UTF-8 encoding, I've been trying to find a way to
convert the names into UTF-8, but with no luck.  From what I can tell,
it would need to convert something like 'España' into 'EspaÃ±a'.

Anyone know how to convert these text strings or of a better way I
should be approaching this?

Cheers,

Duane
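
A minimal Python 3 sketch of what is probably going wrong here, assuming the database hands back Latin-1 (ISO 8859-1) bytes, which the thread only confirms further down. The sample string and the helper name `is_valid_utf8` are illustrative, not taken from the original setup. It shows that 'ñ' is one byte in Latin-1 but two bytes in UTF-8, so Latin-1 bytes placed on a UTF-8 page do not decode cleanly.

```python
# Python 3 sketch; the sample string is only an illustration.
name = "España"

latin1_bytes = name.encode("latin-1")  # 'ñ' becomes the single byte 0xF1
utf8_bytes = name.encode("utf-8")      # 'ñ' becomes the two bytes 0xC3 0xB1


def is_valid_utf8(raw: bytes) -> bool:
    """Return True if raw decodes cleanly as UTF-8 (hypothetical helper)."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


print(latin1_bytes, is_valid_utf8(latin1_bytes))
print(utf8_bytes, is_valid_utf8(utf8_bytes))
```

Viewed the other way round, the UTF-8 bytes read as Latin-1 show up as 'EspaÃ±a', which is the usual sign of an encoding mismatch.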

Nick Bower | 21 Sep 09:18

Re: Convert Text to UTF-8?

I'm not sure I understand what is happening because you didn't mention
the encoding of the source data, but when migrating an old ISO8859-1
(mysql) database to UTF-8 (pgsql) for use with Zope, I wrote a python
script that converted the dump file before reloading it as UTF8.  It
wasn't a perfect way of doing it, but it worked and Zope/psycopg worked
with UTF8 fine after that.

#!/usr/bin/python
# latin2unicode: re-encode a Latin-1 (ISO 8859-1) dump file as UTF-8.
# Written for Python 2 (print statement, unicode()).
import sys

if len(sys.argv) != 3:
  print """ Usage: latin2unicode <infile> <outfile> """
  sys.exit(1)

try:
  infile = open(sys.argv[1], 'r')
except IOError:
  print 'Cannot open input file ' + sys.argv[1]
  print sys.exc_info()[0]
  sys.exit(1)

try:
  outfile = open(sys.argv[2], 'w')
except IOError:
  print 'Cannot open output file ' + sys.argv[2]
  print sys.exc_info()[0]
  sys.exit(1)

print 'Converting...'
# Decode the whole file from Latin-1 into a unicode object.
unicoderep = unicode(infile.read(), 'latin-1')
[remainder of message truncated in the archive]
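
The script above is cut off after the decode step, so the rest of Nick's code is not known. As a rough sketch of the same idea in Python 3, assuming the remaining step is simply to write the decoded text back out as UTF-8, something like the following should work. The function names `convert_file` and `main` are hypothetical and not from the original script.

```python
"""Re-encode a Latin-1 text file as UTF-8 (Python 3 sketch, not the original script)."""
import sys


def convert_file(in_path: str, out_path: str) -> None:
    # Read the input as Latin-1 text and write the same text as UTF-8.
    # newline="" leaves the line endings of the dump file untouched.
    with open(in_path, "r", encoding="latin-1", newline="") as infile, \
         open(out_path, "w", encoding="utf-8", newline="") as outfile:
        for line in infile:
            outfile.write(line)


def main(argv: list) -> int:
    # Returns a process exit status: 0 on success, 1 on any error.
    if len(argv) != 3:
        print("Usage: latin2unicode <infile> <outfile>")
        return 1
    try:
        convert_file(argv[1], argv[2])
    except OSError as exc:
        print("Cannot convert file:", exc)
        return 1
    return 0
```

To use it as a command-line script, call `sys.exit(main(sys.argv))` at the bottom of the file. Processing line by line avoids loading a large dump into memory at once, which the original `infile.read()` approach does.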

Duane Raymond | 21 Sep 11:32

Re: Convert Text to UTF-8?

> I'm not sure I understand what is happening because you didn't mention
> the encoding of the source data, but when migrating an old ISO8859-1
> (mysql) database to UTF-8 (pgsql) for use with Zope, I wrote a python
> script that converted the dump file before reloading it as UTF8.  It
> wasn't a perfect way of doing it, but it worked and Zope/psycopg worked
> with UTF8 fine after that.

Thank you Nick! A very simple and effective solution.

Not only was I able to convert my data into utf-8 from latin-1 (into a
pgsql db) but I was finally able to identify the python snippet that
would allow me to convert to utf-8 on the fly.

For anyone else finding this,
in Python use: unicode(YOURTEXT,'latin-1').encode('utf-8')
in DTML use: <dtml-var expr="unicode(YOURTEXT,'latin-1').encode('utf-8')">

Cheers,

Duane
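
The `unicode()` built-in used above exists only in Python 2. For readers on Python 3, a rough equivalent of the same on-the-fly conversion is sketched below, assuming the values arrive from the database as Latin-1 byte strings. The function name `latin1_to_utf8` and the sample country list are illustrative only.

```python
def latin1_to_utf8(raw: bytes) -> bytes:
    """Re-encode Latin-1 bytes as UTF-8 bytes (hypothetical helper)."""
    return raw.decode("latin-1").encode("utf-8")


# Sample Latin-1 byte strings, standing in for rows returned by the SQL method.
countries = [b"Espa\xf1a", b"M\xe9xico", b"Per\xfa"]
utf8_countries = [latin1_to_utf8(c) for c in countries]

for item in utf8_countries:
    print(item.decode("utf-8"))
```

One thing to watch for: the conversion should be applied exactly once. Running it on data that is already UTF-8 double-encodes the accented characters and produces output such as 'EspaÃ±a'.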
