13 Oct 2012 16:37
character encoding problems with hxt (I think)
<nadine.and.henry <at> pobox.com>
2012-10-13 14:37:05 GMT
Dear Haskellers,
I'm trying to write some code that grabs countries and provinces from the
iso_3166 files on Linux systems, and I seem to be running into a character
encoding problem. The file utility says iso_3166_2.xml is UTF-8, and
isutf8 agrees, but when I run the following code, it crashes.
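When a valid UTF-8 file leads to a crash, one thing worth ruling out is the handle encoding. Since GHC 6.12, String I/O on a Handle encodes and decodes with the current locale's encoding, so under a non-UTF-8 locale (for example LANG=C) writing accented characters can fail. The sketch below uses only base and assumes that is the culprit. It is an illustration, not part of the program quoted later. The helpers writeUtf8 and readUtf8 are hypothetical names; they pin UTF-8 on the handle with hSetEncoding instead of relying on the locale.

```haskell
import System.IO

-- Hypothetical helper: write a String as UTF-8, whatever the locale says.
writeUtf8 :: FilePath -> String -> IO ()
writeUtf8 path s = withFile path WriteMode $ \h -> do
  hSetEncoding h utf8
  hPutStr h s

-- Hypothetical helper: read a file as UTF-8, whatever the locale says.
-- The whole string is forced before withFile closes the handle, because
-- a lazy hGetContents would otherwise see a closed handle.
readUtf8 :: FilePath -> IO String
readUtf8 path = withFile path ReadMode $ \h -> do
  hSetEncoding h utf8
  s <- hGetContents h
  length s `seq` return s
```

The same idea applies to terminal output: calling hSetEncoding stdout utf8 at the start of main makes printed strings use UTF-8 regardless of the locale. This sketch does not show whether hxt's own reading and writing are affected.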
utf8Copy makes a byte-for-byte copy, as expected.
noCrash reads and writes the document without crashing, but the accented
characters in the strings come out garbled. Just search for "DE" and you'll
see what I mean. crash (on my system, Debian testing) produces the error
message below.
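Garbled accented characters often follow a recognizable pattern: UTF-8 bytes decoded as if each byte were one Latin-1 character. The pure sketch below reproduces that effect so it can be compared with the output. It assumes this is what is happening, which the post does not confirm. utf8Bytes, asLatin1 and mojibake are hypothetical helpers, not hxt functions.

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (chr, ord)
import Data.Word (Word8)

-- Encode one code point as its UTF-8 byte sequence (1 to 4 bytes).
utf8Bytes :: Char -> [Word8]
utf8Bytes c
  | n < 0x80    = [fromIntegral n]
  | n < 0x800   = [0xC0 .|. hi 6, cont 0]
  | n < 0x10000 = [0xE0 .|. hi 12, cont 6, cont 0]
  | otherwise   = [0xF0 .|. hi 18, cont 12, cont 6, cont 0]
  where
    n = ord c
    -- leading bits of the code point, placed in the first byte
    hi k = fromIntegral (n `shiftR` k)
    -- a continuation byte: 10xxxxxx carrying 6 bits of the code point
    cont k = 0x80 .|. (fromIntegral (n `shiftR` k) .&. 0x3F)

-- Decode bytes the wrong way: every byte becomes one Latin-1 character.
asLatin1 :: [Word8] -> String
asLatin1 = map (chr . fromIntegral)

-- Encode as UTF-8, then decode as Latin-1, producing the garbled form.
mojibake :: String -> String
mojibake = asLatin1 . concatMap utf8Bytes
```

For example, mojibake "Baden-Württemberg" gives "Baden-WÃ¼rttemberg": the two-byte UTF-8 sequence for ü (0xC3 0xBC) shows up as the two characters Ã and ¼. If the garbled "DE" entries contain pairs like that, the bytes are probably being decoded as Latin-1 (or written out as raw bytes) somewhere.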
Can anyone enlighten me on what is going on?
Thanks in advance.
Henry Laxen
------------------------------------------------------------------------
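{-# LANGUAGE Arrows #-}
import Text.XML.HXT.Core
import Data.List
import qualified System.IO.UTF8 as U  -- from the utf8-string package

-- ISO 3166-2 subdivision data installed by Debian's iso-codes package
isoFile :: FilePath
isoFile = "/usr/share/xml/iso-codes/iso_3166_2.xml"

-- Count the '0' characters across a list of lines
countZerosInLines :: [String] -> Int
countZerosInLines = length . filter (== '0') . concat
utf8Copy = do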