Artyom Kazak | 21 Nov 22:59 2012

isLetter vs. isAlpha

Hello!

I saw a question on StackOverflow about the difference between isAlpha and  
isLetter today. One of the answers stated that the two functions are  
interchangeable, even though they are implemented differently.

I decided to find out whether the difference in implementation influences  
performance, and look what I found:

> import Criterion.Main
> import Data.Char
> fTest name f list = bgroup name $ map (\(n,c) -> bench n $ whnf f c) list
> tests = [("latin", 'e'), ("digit", '8'), ("symbol", '…'), ("greek", 'λ')]
> main = defaultMain [fTest "isAlpha" isAlpha tests,
>                     fTest "isLetter" isLetter tests]

produces this table (times are in nanoseconds):

                  latin digit symbol greek
                  ----- ----- ------ -----
        isAlpha  | 156   212   368    310
        isLetter | 349   344   383    310

isAlpha is twice as fast on latin inputs! Does it mean that isAlpha should  
be preferred? Why isn’t isLetter defined in terms of isAlpha in Data.Char?
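
The interchangeability claim itself can be checked independently of the
timings. What follows is a minimal sketch, assuming only Data.Char from
base: it enumerates every Char and collects those on which the two
predicates disagree. The name `disagreements` is illustrative and not taken
from any library. The base documentation describes isLetter as equivalent to
isAlpha, so the list should come out empty, though this may depend on the
version of base in use.

```haskell
import Data.Char (isAlpha, isLetter)

-- Every Char on which the two predicates give different answers.
-- (Hypothetical helper name, introduced only for this sketch.)
disagreements :: [Char]
disagreements = [c | c <- [minBound .. maxBound], isAlpha c /= isLetter c]

main :: IO ()
main
  | null disagreements =
      putStrLn "isAlpha and isLetter agree on every Char"
  | otherwise =
      putStrLn ("they differ on: " ++ show (take 10 disagreements))
```

If the list is empty, the choice between the two functions is purely a
matter of speed on the inputs you care about.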

_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe <at> haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

wren ng thornton | 24 Nov 05:47 2012

Re: isLetter vs. isAlpha

On 11/21/12 4:59 PM, Artyom Kazak wrote:
> I saw a question on StackOverflow about the difference between isAlpha
> and isLetter today. One of the answers stated that the two functions are
> interchangeable, even though they are implemented differently.
>
> I decided to find out whether the difference in implementation
> influences performance, and look what I found:
>
>> import Criterion.Main
>> import Data.Char
>> fTest name f list = bgroup name $ map (\(n,c) -> bench n $ whnf f c) list
>> tests = [("latin", 'e'), ("digit", '8'), ("symbol", '…'), ("greek", 'λ')]
>> main = defaultMain [fTest "isAlpha" isAlpha tests,
>>                     fTest "isLetter" isLetter tests]
>
> produces this table (times are in nanoseconds):
>
>                   latin digit symbol greek
>                   ----- ----- ------ -----
>         isAlpha  | 156   212   368    310
>         isLetter | 349   344   383    310
>
> isAlpha is twice as fast on latin inputs! Does it mean that isAlpha
> should be preferred? Why isn’t isLetter defined in terms of isAlpha in
> Data.Char?

FWIW, testing on an arbitrary snippet of Japanese yields:

     benchmarking nf (map isAlpha)
     mean: 26.21897 us, lb 26.17674 us, ub 26.27707 us, ci 0.950