Kyle Hanson | 7 Aug 19:57 2014

Performance of StateT and best practices for debugging

Hello,

I was looking at cleaning up my refactoring a core loop of template rendering to go from a loop with many parameters

loop :: RenderConfig -> BlockMap -> InputBucket m -> Builder -> [Pieces] -> ExceptT StrapError m Builder

to a looped state monad transformer

loop :: [Pieces] -> RenderT m Builder

newtype RenderT m a = RenderT 
  { runRenderT :: ExceptT StrapError (StateT (RenderState m) m) a 
  } deriving ( Functor, Applicative, Monad, MonadIO )

data RenderState m = RenderState
  { position     :: SourcePos
  , renderConfig :: RenderConfig
  , blocks       :: BlockMap
  , bucket       :: InputBucket m
  }

however, there is a big slow down (about 6-10x) using a StateT. I think it might have something to do with laziness but I am not exactly sure of where to begin in tracking it down. Swapping out the Lazy State to a Strict State helps a little (only a 5x slow down)

You can find some of the processing code here:


With my old loop commented out.

Its messy right now since I am just trying a number of different approaches. I did some more work factoring out the lifts, trying different iterations of foldlM and stuff but that didn't have that much of an effect on performance.

After profiling I see in the StateT, the report has a lot more CAFs and garbage collecting.

Here is the profiling report from my original version w/o StateT

Slow version with StateT

Here is the "makeBucket" function that is referenced (it is the same in both state and nonstate):


Looking at stacked overflow and the official docs I have gotten an idea of what is going on. The heaps generated between them tells me that a lot more memory is being allocated to lists. These heaps were generated running my render function against a template with nested loops and a list of elements.


I am hoping that maybe someone could give me a hint at what to look at next. I've played around with Strictness and refactoring loops to no avail and now am kind of stuck. Any help would be appreciated.

--
Kyle Hanson
_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe <at> haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe
John Lato | 8 Aug 00:39 2014
Picon

Re: Performance of StateT and best practices for debugging

I haven't looked very closely, but I'm suspicious of this code from "instance Block Piece"

  ListLike l -> forM l (\obj -> ...)
                    >>= (return . mconcat)

The "forM" means that "l" will be traversed once and create an output list, which will then be mconcat'd together.  The list has to be created because of the monadic structure imposed by forM, but if the result of the mconcat isn't demanded right away it will be retained as a thunk that references the newly-created list.

I'd suggest that you replace it with something like

  ListLike l -> foldM (\(!acc) obj -> ... >>= return . mappend acc) mempty l

Here I've justed added a bang pattern to the accumulator.  If whatever is being returned has some lazy fields, you may want to change that to use deepseq instead of a bang pattern.

Also, "foo >>= return . bar" is often regarded as a bit of a code smell, it can be replaced with "bar <$> foo" or "bar `liftM` foo", or sometimes something even simpler depending on circumstances (but IMHO sometimes it's more clear to just leave it alone).

The heap profile does look like a space leak.  The line

 <StrappedTemplates-0.1.1.0:Text.Strapped.Render.sat_sc1z>

is a thunk (you can tell because it's in '<>' brackets), so whatever is referencing that is not strict enough.  Sometimes another heap profile report, e.g. "-hc" or maybe "-hy" will give more useful information that lets you identify what exactly "sat_sc1z" is.  You could also try compiling with -ddump-stg, which will dump the intermediate STG output which usually shows those names.  But then you'll probably also need to re-run the profile, since the names change between compilations.  Also IIRC some of values aren't named until the cmm phase, but that's harder to map back to Haskell so if you can identify the code from stg it's simpler.

If you haven't seen http://blog.ezyang.com/2011/06/pinpointing-space-leaks-in-big-programs/, I'd highly recommend it if you need to track down a space leak.

John L.



On Thu, Aug 7, 2014 at 10:57 AM, Kyle Hanson <me <at> khanson.io> wrote:
Hello,

I was looking at cleaning up my refactoring a core loop of template rendering to go from a loop with many parameters

loop :: RenderConfig -> BlockMap -> InputBucket m -> Builder -> [Pieces] -> ExceptT StrapError m Builder

to a looped state monad transformer

loop :: [Pieces] -> RenderT m Builder

newtype RenderT m a = RenderT 
  { runRenderT :: ExceptT StrapError (StateT (RenderState m) m) a 
  } deriving ( Functor, Applicative, Monad, MonadIO )

data RenderState m = RenderState
  { position     :: SourcePos
  , renderConfig :: RenderConfig
  , blocks       :: BlockMap
  , bucket       :: InputBucket m
  }

however, there is a big slow down (about 6-10x) using a StateT. I think it might have something to do with laziness but I am not exactly sure of where to begin in tracking it down. Swapping out the Lazy State to a Strict State helps a little (only a 5x slow down)

You can find some of the processing code here:


With my old loop commented out.

Its messy right now since I am just trying a number of different approaches. I did some more work factoring out the lifts, trying different iterations of foldlM and stuff but that didn't have that much of an effect on performance.

After profiling I see in the StateT, the report has a lot more CAFs and garbage collecting.

Here is the profiling report from my original version w/o StateT

Slow version with StateT

Here is the "makeBucket" function that is referenced (it is the same in both state and nonstate):


Looking at stacked overflow and the official docs I have gotten an idea of what is going on. The heaps generated between them tells me that a lot more memory is being allocated to lists. These heaps were generated running my render function against a template with nested loops and a list of elements.


I am hoping that maybe someone could give me a hint at what to look at next. I've played around with Strictness and refactoring loops to no avail and now am kind of stuck. Any help would be appreciated.

--
Kyle Hanson

_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe <at> haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe <at> haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe
Bardur Arantsson | 8 Aug 06:31 2014
Picon

Re: Performance of StateT and best practices for debugging

On 2014-08-07 19:57, Kyle Hanson wrote:
> Hello,
> 
> Here is the "makeBucket" function that is referenced (it is the same in
> both state and nonstate):
> 
> https://github.com/hansonkd/StrappedTemplates/blob/321a88168d54943fc217553c873f188797c0d4f5/examples/big_example.hs#L24
> 

Just a shot in the dark, but I notice that you're using "modify" and not
"modify'" which was added in a recent version of transformers.

Strict.StateT is not always "strict enough" and you may need to use modify'.

At any rate, it's worth a shot, I think.

Regards,
John Lato | 8 Aug 08:56 2014
Picon

Re: Performance of StateT and best practices for debugging

On Thu, Aug 7, 2014 at 9:31 PM, Bardur Arantsson <spam <at> scientician.net> wrote:
On 2014-08-07 19:57, Kyle Hanson wrote:
> Hello,
>
> Here is the "makeBucket" function that is referenced (it is the same in
> both state and nonstate):
>
> https://github.com/hansonkd/StrappedTemplates/blob/321a88168d54943fc217553c873f188797c0d4f5/examples/big_example.hs#L24
>

Just a shot in the dark, but I notice that you're using "modify" and not
"modify'" which was added in a recent version of transformers.

Strict.StateT is not always "strict enough" and you may need to use modify'.

At any rate, it's worth a shot, I think.

Good point.  I think that even modify' will not be strict enough without adding strictness to RenderState as well.

John L. 
_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe <at> haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Gmane