Ganesh Sittampalam | 24 Nov 23:00 2012

portable encoding/decoding without going via a handle

Hi,

I need to convert directly between different string encodings, rather
than just using a particular encoding when reading from/writing to a Handle.

I'm aware of the following options, but they have a few problems:

- text-icu: not easily usable on Windows as it requires libicu
- text: just handles utf8/16/32
- iconv: POSIX only

It seems like GHC's TextEncoding has the necessary low-level
functionality
(http://hackage.haskell.org/packages/archive/base/latest/doc/html/GHC-IO-Encoding-Types.html#t:BufferCodec),
but I can't find any high-level interface for directly transcoding
between String/ByteString/Text.

Am I missing something, or would this be a useful addition as a separate
library?
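For reference, base's GHC.Foreign module already runs a TextEncoding over plain C buffers (peekCStringLen and withCStringLen), so a Handle-free transcoder between ByteStrings can be sketched on top of it. This is a minimal illustration assuming the bytestring package, not an existing library; decodeWith, encodeWith and transcode are hypothetical names.

```haskell
import qualified Data.ByteString as B
import qualified GHC.Foreign as GHC
import System.IO (TextEncoding, latin1, utf8, utf16le)

-- Decode raw bytes into a String with the given TextEncoding.
-- GHC.Foreign.peekCStringLen runs the encoding's decoder directly
-- over the buffer; no Handle is involved.
decodeWith :: TextEncoding -> B.ByteString -> IO String
decodeWith enc bs = B.useAsCStringLen bs (GHC.peekCStringLen enc)

-- Encode a String into raw bytes with the given TextEncoding.
-- withCStringLen hands us a temporary buffer; packCStringLen copies it.
encodeWith :: TextEncoding -> String -> IO B.ByteString
encodeWith enc s = GHC.withCStringLen enc s B.packCStringLen

-- Transcode by decoding to String and re-encoding.
transcode :: TextEncoding -> TextEncoding -> B.ByteString -> IO B.ByteString
transcode from to bs = decodeWith from bs >>= encodeWith to

main :: IO ()
main = do
  -- "café" in Latin-1 (0xE9 is é)
  let latin1Bytes = B.pack [0x63, 0x61, 0x66, 0xE9]
  utf8Bytes <- transcode latin1 utf8 latin1Bytes
  print (B.unpack utf8Bytes)
  utf16Bytes <- transcode latin1 utf16le latin1Bytes
  print (B.unpack utf16Bytes)
```

Running main should print [99,97,102,195,169] and then [99,0,97,0,102,0,233,0]. Going through String costs a full decode/encode pass, which is fine for file names and console text but isn't tuned for bulk data.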

Cheers,

Ganesh
Herbert Valerio Riedel | 25 Nov 11:51 2012

Re: portable encoding/decoding without going via a handle

Ganesh Sittampalam <ganesh <at> earth.li> writes:

> I need to convert directly between different string encodings, rather
> than just using a particular encoding when reading from/writing to a Handle.
>
> I'm aware of the following options, but they have a few problems:
>
> - text-icu: not easily usable on Windows as it requires libicu
> - text: just handles utf8/16/32
> - iconv: POSIX only
>
> It seems like GHC's TextEncoding has the necessary low-level
> functionality
> (http://hackage.haskell.org/packages/archive/base/latest/doc/html/GHC-IO-Encoding-Types.html#t:BufferCodec),
> but I can't find any high-level interface for directly transcoding
> between String/Bytestring/Text.
>
> Am I missing something, or would this be a useful addition as a separate
> library?

btw, looking at the GHC.IO.Encoding.* modules, it seems to me that
'mkTextEncoding'[1] only supports utf8/16/32 in a system-independent
fashion:

,----
| The set of known encodings is system-dependent, but includes at least:
| 
|  - UTF-8
|  - UTF-16, UTF-16BE, UTF-16LE
|  - UTF-32, UTF-32BE, UTF-32LE 
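A quick way to see which names mkTextEncoding accepts on a given machine is to try them and catch the failure. The sketch assumes an unknown name is reported as an IOException from mkTextEncoding, as current GHC releases do; lookupEncoding is a hypothetical helper, and which of the non-UTF names succeed will vary by platform.

```haskell
import Control.Exception (IOException, try)
import System.IO (TextEncoding, mkTextEncoding)

-- Hypothetical helper: look up an encoding by name, returning Nothing
-- when mkTextEncoding rejects it (assumed to throw an IOException).
lookupEncoding :: String -> IO (Maybe TextEncoding)
lookupEncoding name = do
  r <- try (mkTextEncoding name) :: IO (Either IOException TextEncoding)
  return (either (const Nothing) Just r)

main :: IO ()
main = mapM_ report names
  where
    -- The UTF family should always succeed; the rest depend on what
    -- the platform's backend provides.
    names = ["UTF-8", "UTF-16LE", "UTF-32BE", "CP1252", "KOI8-R", "NO-SUCH-ENCODING"]
    report name = do
      m <- lookupEncoding name
      putStrLn (name ++ ": " ++ maybe "unavailable" show m)
```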

Ganesh Sittampalam | 28 Nov 07:54 2012

Re: portable encoding/decoding without going via a handle

On 25/11/2012 10:51, Herbert Valerio Riedel wrote:

> btw, looking at the GHC.IO.Encoding.* modules, it seems to me that
> 'mkTextEncoding'[1] only supports utf8/16/32 in a system-independent
> fashion:
> 
> ...so does using GHC.IO.Encoding.* actually provide you with more encodings
> than using the other options ('text' et al.) you mentioned? which text
> encodings beyond the UTF-family do you need btw?

I actually only need encodings that exist on the current platform,
because they're the ones GHC is already using when reading from the
filesystem/console.
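Since the encodings in question are the ones GHC itself is using, they can be fetched with getFileSystemEncoding and getLocaleEncoding from GHC.IO.Encoding and fed to the same GHC.Foreign-based decode/encode pair. A rough sketch assuming the bytestring package; fileSystemBytesToUtf8 is a hypothetical helper, not part of base.

```haskell
import qualified Data.ByteString as B
import qualified GHC.Foreign as GHC
import GHC.IO.Encoding (getFileSystemEncoding, getLocaleEncoding)
import System.IO (TextEncoding, utf8)

-- Decode raw bytes with a TextEncoding, no Handle involved.
decodeWith :: TextEncoding -> B.ByteString -> IO String
decodeWith enc bs = B.useAsCStringLen bs (GHC.peekCStringLen enc)

-- Encode a String with a TextEncoding, no Handle involved.
encodeWith :: TextEncoding -> String -> IO B.ByteString
encodeWith enc s = GHC.withCStringLen enc s B.packCStringLen

-- Hypothetical helper: reinterpret bytes in GHC's current file-system
-- encoding (e.g. a raw file name) and re-encode them as UTF-8.
fileSystemBytesToUtf8 :: B.ByteString -> IO B.ByteString
fileSystemBytesToUtf8 raw = do
  fsEnc <- getFileSystemEncoding
  decodeWith fsEnc raw >>= encodeWith utf8

main :: IO ()
main = do
  fsEnc <- getFileSystemEncoding
  locEnc <- getLocaleEncoding
  putStrLn ("file system encoding: " ++ show fsEnc)
  putStrLn ("locale encoding:      " ++ show locEnc)
  -- An ASCII name ("foo") is valid in all common encodings.
  out <- fileSystemBytesToUtf8 (B.pack [102, 111, 111])
  print (B.unpack out)
```

On POSIX the file-system encoding is documented as the locale encoding with //ROUNDTRIP, so undecodable bytes come back as lone surrogates; plain utf8 may refuse to encode those, which is worth checking before relying on this for arbitrary file names.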

In theory I think the need to do the transcoding could be avoided by
just setting those encodings to the right values in the first place, but
in practice it's hard to do that as a purely local change.

Cheers,

Ganesh
