Richard Cobbe | 29 Jul 20:04 2012

GHC rendering of non-ASCII characters configurable?

I'm working on an application that involves processing a lot of Unicode
data, and I'm finding the built-in Show implementation for Char to be
really inconvenient.  Specifically, it renders all characters at U+0080 and
above with decimal escapes:

    Prelude> '\x80'
    '\128'

This is annoying because all of the Unicode charts give the code points in
hex, and indeed the charts are split into different PDFs at numbers that
are nice and round in hex but not in decimal.  So in order to figure out
which character I'm looking at, I have to convert back to hex and then look
it up in the charts.

Is there any way to ask GHC to render super-ASCII characters with their
hexadecimal escapes, instead?  I'm perfectly happy to write my own custom
Show instance, but I don't know how to hook that into ghci's REPL (or, for
that matter, the routines that HUnit uses to generate the messages on
failed tests, etc.).

I'm using GHC 7.4.1 on MacOS 10.7.4.

Thanks,

Richard

Paolo Capriotti | 30 Jul 11:02 2012

Re: GHC rendering of non-ASCII characters configurable?

On Sun, Jul 29, 2012 at 7:04 PM, Richard Cobbe <cobbe <at> ccs.neu.edu> wrote:
> I'm working on an application that involves processing a lot of Unicode
> data, and I'm finding the built-in Show implementation for Char to be
> really inconvenient.  Specifically, it renders all characters at U+0080 and
> above with decimal escapes:
>
>     Prelude> '\x80'
>     '\128'
>
> This is annoying because all of the Unicode charts give the code points in
> hex, and indeed the charts are split into different PDFs at numbers that
> are nice and round in hex but not in decimal.  So in order to figure out
> which character I'm looking at, I have to convert back to hex and then look
> it up in the charts.
>
> Is there any way to ask GHC to render super-ASCII characters with their
> hexadecimal escapes, instead?  I'm perfectly happy to write my own custom
> Show instance, but I don't know how to hook that into ghci's REPL (or, for
> that matter, the routines that HUnit uses to generate the messages on
> failed tests, etc.).
>
> I'm using GHC 7.4.1 on MacOS 10.7.4.

In GHC HEAD there is a new flag, -interactive-print, that lets you
change the function GHCi uses to print values. It will be in
7.6.1. That won't help with HUnit output, though.
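
As a rough sketch of what such a printing function might look like (this is
not part of GHC; the names hexEscapes and hexPrint are made up for
illustration), the following rewrites the decimal escapes in the output of
show as hex escapes. It assumes that backslashes in the shown text only occur
inside character and string literals, which holds for derived Show instances
but not necessarily for hand-written ones.

```haskell
import Data.Char (isDigit, isHexDigit)
import Numeric (showHex)

-- Rewrite decimal escapes such as \128 in the output of 'show' as hex
-- escapes such as \x80. Hypothetical helper, not a GHC or base function.
hexEscapes :: String -> String
hexEscapes ('\\' : c : rest)
  | isDigit c =
      let (digits, rest') = span isDigit (c : rest)
          hex = showHex (read digits :: Int) ""
          -- If the next character could be read as another hex digit,
          -- insert the empty escape \& so the string still parses the same.
          sep = case rest' of
                  (d : _) | isHexDigit d -> "\\&"
                  _                      -> ""
      in "\\x" ++ hex ++ sep ++ hexEscapes rest'
  -- Any other escape (\\, \", \n, \DEL, \&, ...) is copied through unchanged.
  | otherwise = '\\' : c : hexEscapes rest
hexEscapes (c : rest) = c : hexEscapes rest
hexEscapes [] = []

-- Candidate for -interactive-print: it has the required shape
-- (Show a => a -> IO ()).
hexPrint :: Show a => a -> IO ()
hexPrint = putStrLn . hexEscapes . show
```

Assuming these definitions are saved as HexPrint.hs with a
"module HexPrint where" header, starting GHCi 7.6.1 or later with
"ghci -interactive-print=HexPrint.hexPrint HexPrint" should make the REPL use
hexPrint in place of print.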

BR,
Paolo

Max Rabkin | 30 Jul 12:27 2012

Re: GHC rendering of non-ASCII characters configurable?

On Sun, Jul 29, 2012 at 8:04 PM, Richard Cobbe <cobbe <at> ccs.neu.edu> wrote:
> This is annoying because all of the Unicode charts give the code points in
> hex, and indeed the charts are split into different PDFs at numbers that
> are nice and round in hex but not in decimal.  So in order to figure out
> which character I'm looking at, I have to convert back to hex and then look
> it up in the charts.

My reading of the Haskell 98 report is that the Show instance for Char
*could* use hex escapes, so this is a compiler choice. If there isn't
a good reason for this choice, perhaps GHC could change?
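
For what it's worth, the two escape forms are interchangeable on the input
side, so only the output of show is affected. A small check (the names below
are just for illustration):

```haskell
-- Both literals denote the same character, U+0080.
sameChar :: Bool
sameChar = '\x80' == '\128'

-- 'read' accepts either escape form and yields the same Char.
readsBack :: Bool
readsBack = (read "'\\x80'" :: Char) == (read "'\\128'" :: Char)

-- What GHC's Show instance produces: the decimal form.
decimalRendering :: String
decimalRendering = show '\x80'
```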

--Max

Ivan Lazar Miljenovic | 30 Jul 15:45 2012

Re: GHC rendering of non-ASCII characters configurable?

On 30 July 2012 04:04, Richard Cobbe <cobbe <at> ccs.neu.edu> wrote:
> I'm working on an application that involves processing a lot of Unicode
> data, and I'm finding the built-in Show implementation for Char to be
> really inconvenient.  Specifically, it renders all characters at U+0080 and
> above with decimal escapes:
>
>     Prelude> '\x80'
>     '\128'
>
> This is annoying because all of the Unicode charts give the code points in
> hex, and indeed the charts are split into different PDFs at numbers that
> are nice and round in hex but not in decimal.  So in order to figure out
> which character I'm looking at, I have to convert back to hex and then look
> it up in the charts.

Can I ask what you're doing here? Are you printing individual
characters or entire chunks of text?

putStrLn and similar IO-based functions (at least for me) will
un-escape characters if that helps.  Otherwise, are you using Text or
String?

>
> Is there any way to ask GHC to render super-ASCII characters with their
> hexadecimal escapes, instead?  I'm perfectly happy to write my own custom
> Show instance, but I don't know how to hook that into ghci's REPL (or, for
> that matter, the routines that HUnit uses to generate the messages on
> failed tests, etc.).
>
> I'm using GHC 7.4.1 on MacOS 10.7.4.

Richard Cobbe | 31 Jul 13:01 2012

Re: GHC rendering of non-ASCII characters configurable?

On Mon, Jul 30, 2012 at 11:45:38PM +1000, Ivan Lazar Miljenovic wrote:
> On 30 July 2012 04:04, Richard Cobbe <cobbe <at> ccs.neu.edu> wrote:
> > I'm working on an application that involves processing a lot of Unicode
> > data, and I'm finding the built-in Show implementation for Char to be
> > really inconvenient.  Specifically, it renders all characters at U+0080 and
> > above with decimal escapes:
> >
> >     Prelude> '\x80'
> >     '\128'
> >
> > This is annoying because all of the Unicode charts give the code points in
> > hex, and indeed the charts are split into different PDFs at numbers that
> > are nice and round in hex but not in decimal.  So in order to figure out
> > which character I'm looking at, I have to convert back to hex and then look
> > it up in the charts.
>
> Can I ask what you're doing here? Are you printing individual
> characters or entire chunks of text?

Mostly, I'm working with expressions of type String, rather than Text; the
Char above was merely an example to demonstrate the problem.  The two I/O
cases that most concern me are evaluating a String expression at the GHCi
REPL, and working with HUnit test cases built around String expressions.

I suppose I could wrap putStrLn around all string exprs at the repl, but a)
that's a pain; b) it's important for this app that I be able to distinguish
between precomposed characters and combining characters; and c) some of the
characters I'm dealing with are very similar in my terminal fonts, such as
U+1F00 and U+1F01.  It's much nicer to be able to just see the code points.
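
One possible stopgap for seeing the code points, sketched here with made-up
helper names (codePoint and codePoints are not library functions), is to map
each Char to its U+XXXX form by hand:

```haskell
import Data.Char (ord)
import Text.Printf (printf)

-- Render one character in the U+XXXX notation used by the Unicode charts.
codePoint :: Char -> String
codePoint c = printf "U+%04X" (ord c)

-- Render every character of a String, so precomposed and combining
-- sequences can be told apart at a glance.
codePoints :: String -> [String]
codePoints = map codePoint
```

With this, codePoints "\xe9" gives ["U+00E9"], while codePoints "e\x301"
gives ["U+0065","U+0301"]. It still has to be called explicitly, so it does
not address the REPL or HUnit output.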


Ivan Lazar Miljenovic | 31 Jul 13:17 2012

Re: GHC rendering of non-ASCII characters configurable?

On 31 July 2012 21:01, Richard Cobbe <cobbe <at> ccs.neu.edu> wrote:
> On Mon, Jul 30, 2012 at 11:45:38PM +1000, Ivan Lazar Miljenovic wrote:
>> On 30 July 2012 04:04, Richard Cobbe <cobbe <at> ccs.neu.edu> wrote:
>> > I'm working on an application that involves processing a lot of Unicode
>> > data, and I'm finding the built-in Show implementation for Char to be
>> > really inconvenient.  Specifically, it renders all characters at U+0080 and
>> > above with decimal escapes:
>> >
>> >     Prelude> '\x80'
>> >     '\128'
>> >
>> > This is annoying because all of the Unicode charts give the code points in
>> > hex, and indeed the charts are split into different PDFs at numbers that
>> > are nice and round in hex but not in decimal.  So in order to figure out
>> > which character I'm looking at, I have to convert back to hex and then look
>> > it up in the charts.
>>
>> Can I ask what you're doing here? Are you printing individual
>> characters or entire chunks of text?
>
> Mostly, I'm working with expressions of type String, rather than Text;

Any particular reason why?  Using Text will probably solve your
problem and give you a performance improvement at the same time.

-- 
Ivan Lazar Miljenovic
Ivan.Miljenovic <at> gmail.com
http://IvanMiljenovic.wordpress.com

Richard Cobbe | 1 Aug 02:35 2012

Re: GHC rendering of non-ASCII characters configurable?

On Tue, Jul 31, 2012 at 09:17:34PM +1000, Ivan Lazar Miljenovic wrote:
> On 31 July 2012 21:01, Richard Cobbe <cobbe <at> ccs.neu.edu> wrote:
> > On Mon, Jul 30, 2012 at 11:45:38PM +1000, Ivan Lazar Miljenovic wrote:

> >> Can I ask what you're doing here? Are you printing individual
> >> characters or entire chunks of text?
> >
> > Mostly, I'm working with expressions of type String, rather than Text;
>
> Any particular reason why?  Using Text will probably solve your
> problem and give you a performance improvement at the same time.

Well, I initially went with String because I didn't want to clutter up my
code with all of the calls to 'pack', especially around string literals.
I'm open to being convinced that it's worth it to switch, though.

In any case, while Text is undoubtedly faster than String, it unfortunately
doesn't solve my problem with output rendering:

    [vimes:~]$ ghci
    GHCi, version 7.4.1: http://www.haskell.org/ghc/  :? for help
    Loading package ghc-prim ... linking ... done.
    Loading package integer-gmp ... linking ... done.
    Loading package base ... linking ... done.
    Prelude> :m +Data.Text
    Prelude Data.Text> pack "\x1f00"
    Loading package array-0.4.0.0 ... linking ... done.
    Loading package bytestring-0.9.2.1 ... linking ... done.
    Loading package deepseq-1.3.0.0 ... linking ... done.
    Loading package text-0.11.2.0 ... linking ... done.

Erik Hesselink | 1 Aug 10:24 2012

Re: GHC rendering of non-ASCII characters configurable?

On Wed, Aug 1, 2012 at 2:35 AM, Richard Cobbe <cobbe <at> ccs.neu.edu> wrote:
> Well, I initially went with String because I didn't want to clutter up my
> code with all of the calls to 'pack', especially around string literals.
> I'm open to being convinced that it's worth it to switch, though.

For string literals, you can turn on OverloadedStrings to get rid of
the calls to 'pack'.
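
A minimal illustration, assuming the text package is installed (the names
sample and sampleLength are only examples):

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Text (Text)
import qualified Data.Text as T

-- With OverloadedStrings the literal is typed as Text directly,
-- so no explicit call to 'pack' is needed.
sample :: Text
sample = "\x1f00\x1f01"

-- Ordinary Text functions then work on it as usual.
sampleLength :: Int
sampleLength = T.length sample
```

The type signature (or some other context that fixes the type) is still
needed, since a bare literal is now polymorphic in any IsString instance.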

Erik
