Kai Zhang | 29 Mar 02:43 2014

Inline makes program slow?

Hi cafe,

Inline sometimes can cause problems, at least in following case:

import qualified Data.Vector as V
import Data.List

f ∷ String → (String → Int → Char) → [Int] → String
f str g idx = map (g str) idx

h ∷ String → Int → Char
{-# INLINE h #-}
h s i = (V.fromList $ sort s) V.! i

slow ∷ String → [Int] → String
slow str = f str h 

fast ∷ String → [Int] → String
fast str = map ((V.fromList $ sort str) V.!)

main = do
    let testString = replicate 100000 'a'
        iterations = replicate 1000 100
    putStrLn $ fast testString iterations
    putStrLn $ slow testString iterations

Without inline (remove the inline pragma), "slow" would be much faster. I suspect this is because ghc can build a "persistent structure" for the partial applied function. After inline, each call of "g" will try to build a new vector. How can I tell ghc not to inline some specific functions? Or are there other ways to resolve this issue? 
_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe <at> haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe
Ivan Lazar Miljenovic | 29 Mar 03:09 2014
Picon

Re: Inline makes program slow?

On 29 March 2014 12:43, Kai Zhang <kai <at> kzhang.org> wrote:
> Hi cafe,
>
> Inline sometimes can cause problems, at least in following case:
>
> import qualified Data.Vector as V
> import Data.List
>
> f ∷ String → (String → Int → Char) → [Int] → String
> f str g idx = map (g str) idx
>
> h ∷ String → Int → Char
> {-# INLINE h #-}
> h s i = (V.fromList $ sort s) V.! i
>
> slow ∷ String → [Int] → String
> slow str = f str h
>
> fast ∷ String → [Int] → String
> fast str = map ((V.fromList $ sort str) V.!)
>
> main = do
>     let testString = replicate 100000 'a'
>         iterations = replicate 1000 100
>     putStrLn $ fast testString iterations
>     putStrLn $ slow testString iterations
>
> Without inline (remove the inline pragma), "slow" would be much faster. I
> suspect this is because ghc can build a "persistent structure" for the
> partial applied function. After inline, each call of "g" will try to build a
> new vector. How can I tell ghc not to inline some specific functions? Or are
> there other ways to resolve this issue?

There's the NOINLINE and INLINEABLE pragmas.

Though I'm going through some of my own code where I waved the
INLINE-hammer around rather heavily only to find that whilst it
doesn't make much of a difference on my main x86_64 machine, on the
x86 laptop I tested it on it made the code much slower.  So I'm also
interested in finding better ways of determining where and when INLINE
is helpful (rather than blindly removing some INLINEs and re-profiling
to see what difference it makes, as in many cases it *does* require
the INLINE for performance).

>
> _______________________________________________
> Haskell-Cafe mailing list
> Haskell-Cafe <at> haskell.org
> http://www.haskell.org/mailman/listinfo/haskell-cafe
>

--

-- 
Ivan Lazar Miljenovic
Ivan.Miljenovic <at> gmail.com
http://IvanMiljenovic.wordpress.com
_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe <at> haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe
wren romano | 30 Mar 09:56 2014
Picon

Re: Inline makes program slow?

On Fri, Mar 28, 2014 at 9:43 PM, Kai Zhang <kai <at> kzhang.org> wrote:
> Without inline (remove the inline pragma), "slow" would be much faster. I
> suspect this is because ghc can build a "persistent structure" for the
> partial applied function. After inline, each call of "g" will try to build a
> new vector. How can I tell ghc not to inline some specific functions? Or are
> there other ways to resolve this issue?

For what it's worth, I don't think this is an inlining issue, per se;
rather, it's an issue with the fact that eta-conversion does not
preserve performance characteristics. That is, when we inline h and
perform as much beta-reduction as we can, we're left with the lambda
expression:

    \i -> (V.fromList $ sort str) V.! i

Which is not necessarily the same thing, performance-wise, as:

    ((V.fromList $ sort str) V.!)

The problem is that, implicitly, the whole body of the lambda
abstraction (might) depend on the value of i and therefore cannot be
performed until we know what i is. If we wanted to make it explicit
that sorting the string is independent of the value of i, we could
write:

    let s = V.fromList $ sort str in \i -> s V.! i

By using let-binding to lift most of the computation out of the body
of the lambda abstraction, we ensure that the sorting will only be
done once, rather than (possibly) being done every time this function
is called.

The reason I say "might" and "possibly" is because, in theory, the
compiler could choose to perform this transformation for you. And
sometimes it does (as apparently it does in your fast code). The
problem is that, in practice, performing this transformation
everywhere can slow things down horribly by taking too much memory
because you're trying to hold onto too many things. Thus, the compiler
must rely on heuristics to decide when it should float computations
out from under lambdas and when it shouldn't.

--

-- 
Live well,
~wren
Dominick Samperi | 30 Mar 18:57 2014
Picon

Re: Inline makes program slow?

Compiler optimization levels are also important. The attached program compiles
and runs ok using:

ghc -O fibmustopt.hs
./fibmustopt

But if the '-O' option is omitted all of the available memory is used
and it fails.

On Sun, Mar 30, 2014 at 2:56 AM, wren romano <winterkoninkje <at> gmail.com> wrote:
> On Fri, Mar 28, 2014 at 9:43 PM, Kai Zhang <kai <at> kzhang.org> wrote:
>> Without inline (remove the inline pragma), "slow" would be much faster. I
>> suspect this is because ghc can build a "persistent structure" for the
>> partial applied function. After inline, each call of "g" will try to build a
>> new vector. How can I tell ghc not to inline some specific functions? Or are
>> there other ways to resolve this issue?
>
> For what it's worth, I don't think this is an inlining issue, per se;
> rather, it's an issue with the fact that eta-conversion does not
> preserve performance characteristics. That is, when we inline h and
> perform as much beta-reduction as we can, we're left with the lambda
> expression:
>
>     \i -> (V.fromList $ sort str) V.! i
>
> Which is not necessarily the same thing, performance-wise, as:
>
>     ((V.fromList $ sort str) V.!)
>
> The problem is that, implicitly, the whole body of the lambda
> abstraction (might) depend on the value of i and therefore cannot be
> performed until we know what i is. If we wanted to make it explicit
> that sorting the string is independent of the value of i, we could
> write:
>
>     let s = V.fromList $ sort str in \i -> s V.! i
>
> By using let-binding to lift most of the computation out of the body
> of the lambda abstraction, we ensure that the sorting will only be
> done once, rather than (possibly) being done every time this function
> is called.
>
> The reason I say "might" and "possibly" is because, in theory, the
> compiler could choose to perform this transformation for you. And
> sometimes it does (as apparently it does in your fast code). The
> problem is that, in practice, performing this transformation
> everywhere can slow things down horribly by taking too much memory
> because you're trying to hold onto too many things. Thus, the compiler
> must rely on heuristics to decide when it should float computations
> out from under lambdas and when it shouldn't.
>
> --
> Live well,
> ~wren
> _______________________________________________
> Haskell-Cafe mailing list
> Haskell-Cafe <at> haskell.org
> http://www.haskell.org/mailman/listinfo/haskell-cafe
Attachment (fibmustopt.hs): text/x-haskell, 210 bytes
_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe <at> haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe
Simon Peyton Jones | 3 Apr 18:21 2014
Picon

Re: Inline makes program slow?

Dominick's program is very delicate:

	fibonacci :: (Integral a) => [a]
	fibonacci = 0 : 1 : zipWith (+) fibonacci (tail fibonacci)

Remember, the "(Integral a) =>" thing is very like "Integral a ->"; it's an extra value  argument.  Would you
expect this program to be fast?

	foo :: Int -> [Int]
	foo n = 0 : 1 : zipWith (+) (foo n) (tail (foo n))

Perhaps, but it depends on common sub-expression analysis which is on with -O.

Anyway it is not related to INLINE or eta-expansion/contraction.

Simon

| -----Original Message-----
| From: Haskell-Cafe [mailto:haskell-cafe-bounces <at> haskell.org] On Behalf
| Of Dominick Samperi
| Sent: 30 March 2014 17:58
| To: wren romano
| Cc: haskell
| Subject: Re: [Haskell-cafe] Inline makes program slow?
| 
| Compiler optimization levels are also important. The attached program
| compiles and runs ok using:
| 
| ghc -O fibmustopt.hs
| ./fibmustopt
| 
| But if the '-O' option is omitted all of the available memory is used
| and it fails.
| 
| 
| On Sun, Mar 30, 2014 at 2:56 AM, wren romano <winterkoninkje <at> gmail.com>
| wrote:
| > On Fri, Mar 28, 2014 at 9:43 PM, Kai Zhang <kai <at> kzhang.org> wrote:
| >> Without inline (remove the inline pragma), "slow" would be much
| >> faster. I suspect this is because ghc can build a "persistent
| >> structure" for the partial applied function. After inline, each call
| >> of "g" will try to build a new vector. How can I tell ghc not to
| >> inline some specific functions? Or are there other ways to resolve
| this issue?
| >
| > For what it's worth, I don't think this is an inlining issue, per se;
| > rather, it's an issue with the fact that eta-conversion does not
| > preserve performance characteristics. That is, when we inline h and
| > perform as much beta-reduction as we can, we're left with the lambda
| > expression:
| >
| >     \i -> (V.fromList $ sort str) V.! i
| >
| > Which is not necessarily the same thing, performance-wise, as:
| >
| >     ((V.fromList $ sort str) V.!)
| >
| > The problem is that, implicitly, the whole body of the lambda
| > abstraction (might) depend on the value of i and therefore cannot be
| > performed until we know what i is. If we wanted to make it explicit
| > that sorting the string is independent of the value of i, we could
| > write:
| >
| >     let s = V.fromList $ sort str in \i -> s V.! i
| >
| > By using let-binding to lift most of the computation out of the body
| > of the lambda abstraction, we ensure that the sorting will only be
| > done once, rather than (possibly) being done every time this function
| > is called.
| >
| > The reason I say "might" and "possibly" is because, in theory, the
| > compiler could choose to perform this transformation for you. And
| > sometimes it does (as apparently it does in your fast code). The
| > problem is that, in practice, performing this transformation
| > everywhere can slow things down horribly by taking too much memory
| > because you're trying to hold onto too many things. Thus, the
| compiler
| > must rely on heuristics to decide when it should float computations
| > out from under lambdas and when it shouldn't.
| >
| > --
| > Live well,
| > ~wren
| > _______________________________________________
| > Haskell-Cafe mailing list
| > Haskell-Cafe <at> haskell.org
| > http://www.haskell.org/mailman/listinfo/haskell-cafe
Tom Ellis | 3 Apr 18:48 2014
Picon

Re: Inline makes program slow?

On Thu, Apr 03, 2014 at 04:21:38PM +0000, Simon Peyton Jones wrote:
> Dominick's program is very delicate:
> 
> 	fibonacci :: (Integral a) => [a]
> 	fibonacci = 0 : 1 : zipWith (+) fibonacci (tail fibonacci)
> 
> Remember, the "(Integral a) =>" thing is very like "Integral a ->"; it's
> an extra value argument.  Would you expect this program to be fast?
> 
> 	foo :: Int -> [Int]
> 	foo n = 0 : 1 : zipWith (+) (foo n) (tail (foo n))

Indeed.  The curious may be interested to know you can fix this by hand by
sharing a monomorphic bind

    fibonacci = let f = fibonacci in 0 : 1 : zipWith (+) f (tail f)

unless the dreaded monomorphism restriction is off, in which case you have
to find some other way of making 'f' monomorphic, for example with
ScopedTypeVariables:

    fibonacci = let f = fibonacci :: [a] in 0 : 1 : zipWith (+) f (tail f)

(I guess examples like this are exactly the reason the monomorphism
restriction exists).

Tom

Gmane