Jason Dusek | 14 Nov 00:00

FFI binding -- different behaviour under compilation and interpretation.

  I'm binding to `wcwidth` to determine the column widths of
  various Unicode characters. I noticed a lot of -- in fact all
  -- Chinese characters were being given widths of `-1` when of
  course they should have width `2`. This only showed up when I
  compiled my program though -- within GHCi, it never happened.

  Below my signature is a pared-down example that demonstrates
  the bug. It tries to get the width of only one Chinese character.
  You can see it like this:

   :; ghc --make DemoFailure.hs -o demo && demo
    [1 of 1] Compiling Main             ( DemoFailure.hs, DemoFailure.o )
    Linking demo ...
    0x00005cff  -1  峿
   :; chmod ug+x DemoFailure.hs && DemoFailure.hs
    0x00005cff   2  峿

  Switching between safe/unsafe does not make any difference. This
  was run on a Macintosh.

--
Jason Dusek

#!/usr/bin/env runhaskell

{- DemoFailure.hs -}

{-# LANGUAGE ForeignFunctionInterface
  #-}

import Foreign.C
import Data.Char
import Text.Printf

import qualified  System.IO.UTF8 as UTF8

main                         =  do
  (sequence_ . fmap (UTF8.putStrLn . uncurry fmt)) widths
 where
  widths                     =  [ (c, wcwidth c) | c <- ['\x5cff'] ]
--widths                     =  [ (c, wcwidth c) | c <- [minBound..maxBound] ]
  fmt c cols                 =  printf "0x%08x  %2d  %s" (fromEnum c) cols rep
   where
    rep | ' ' == c           =  "\\SP"
        | isSpace c          =  '\\' : show (fromEnum c)
        | isPrint c          =  [c]
        | otherwise          =  (reverse . drop 1 . reverse . drop 1 . show) c

wcwidth                     ::  Char -> Int
wcwidth                      =  fromEnum . native . toEnum . fromEnum

foreign import ccall unsafe "wchar.h wcwidth" native :: CWchar -> CInt
Daniel Fischer | 14 Nov 01:15

Re: FFI binding -- different behaviour under compilation and interpretation.

Am Samstag 14 November 2009 00:00:36 schrieb Jason Dusek:

> I'm binding to `wcwidth` to determine the column widths of
> various Unicode characters. I noticed a lot of -- in fact all
> -- Chinese characters were being given widths of `-1` when of
> course they should have width `2`. This only showed up when I
> compiled my program though -- within GHCi, it never happened.

It seems that ghci calls setlocale(LC_ALL,"") or similar, while the compiled code doesn't.
I've no idea why that would be, but

dafis@linux-mkk1:~/Haskell/CafeTesting> cat locl.h
void sloc();
dafis@linux-mkk1:~/Haskell/CafeTesting> cat locl.c
#include <locale.h>
#include "locl.h"

void sloc(){
    setlocale(LC_ALL,"");
}


main                         =  do
  setloc
  (sequence_ . fmap (UTF8.putStrLn . uncurry fmt)) widths
 where...

foreign import ccall unsafe "locl.h sloc" setloc :: IO ()

fixes it.
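
For readers who would rather not keep a separate C stub, the same idea can be sketched in Haskell alone. This is only an illustration, not what was posted in the thread: it assumes a GHC recent enough to support the CApiFFI extension, and the names `lcAll`, `c_setlocale`, `c_wcwidth`, `useEnvironmentLocale` and `columnWidth` are made up for this example. `LC_ALL` is read from `locale.h` rather than hard-coded, because its numeric value differs between platforms.

```haskell
{-# LANGUAGE CApiFFI #-}

import Foreign.C.String (CString, withCString)
import Foreign.C.Types  (CInt (..), CWchar (..))
import Foreign.Ptr      (nullPtr)
import Text.Printf      (printf)

-- The numeric value of LC_ALL is platform-specific, so take it from the header.
foreign import capi "locale.h value LC_ALL" lcAll :: CInt

-- char *setlocale(int category, const char *locale);
foreign import capi unsafe "locale.h setlocale"
  c_setlocale :: CInt -> CString -> IO CString

-- int wcwidth(wchar_t wc); imported in IO because the answer depends on
-- the current locale, which is mutable process-wide state.
foreign import ccall unsafe "wchar.h wcwidth"
  c_wcwidth :: CWchar -> IO CInt

-- | Adopt the locale named by the environment (LANG, LC_ALL, ...).
--   Returns False if the C library rejected the request.
useEnvironmentLocale :: IO Bool
useEnvironmentLocale = do
  result <- withCString "" (c_setlocale lcAll)
  return (result /= nullPtr)

-- | Column width of a character according to the current locale
--   (-1 means the C library considers the character non-printable).
columnWidth :: Char -> IO Int
columnWidth c = fromIntegral <$> c_wcwidth (fromIntegral (fromEnum c))

main :: IO ()
main = do
  ok <- useEnvironmentLocale
  w  <- columnWidth '\x5cff'
  printf "locale set: %s, width of U+5CFF: %d\n" (show ok) w
```

Importing `wcwidth` with an `IO` result type, unlike the pure import in the original program, keeps the compiler from assuming that the width of a character is the same before and after the `setlocale` call.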
Jason Dusek | 14 Nov 02:25

Re: FFI binding -- different behaviour under compilation and interpretation.

  Thank you very much!

--
Jason Dusek
Jason Dusek | 14 Nov 06:26

Re: FFI binding -- different behaviour under compilation and interpretation.

  There is a Cabal package for this already:

    http://hackage.haskell.org/package/setlocale

  A call to `setLocale LC_ALL (Just "")` in `main` fixes things.

--
Jason Dusek

