Chad Scherrer | 22 Aug 23:17

"dataflow rewriting engine"

Hello GHC,

This page
http://hackage.haskell.org/trac/ghc/wiki/Commentary/Compiler/IntegratedCodeGen
mentions a to-be-developed "dataflow rewriting engine". Can someone
please send a description of what this will do?

Thanks!
--

-- 

Chad Scherrer

"Time flies like an arrow; fruit flies like a banana" -- Groucho Marx
Simon Peyton-Jones | 26 Aug 14:31

RE: "dataflow rewriting engine"

Norman, John

Would you care to respond to this? (Perhaps by amplifying the wiki page?)  A good starting point is perhaps
Craig's paper.

Simon

| -----Original Message-----
| From: glasgow-haskell-users-bounces <at> haskell.org
[mailto:glasgow-haskell-users-bounces <at> haskell.org] On
| Behalf Of Chad Scherrer
| Sent: 22 August 2008 22:21
| To: GHC Users
| Subject: "dataflow rewriting engine"
|
| Hello GHC,
|
| This page
| http://hackage.haskell.org/trac/ghc/wiki/Commentary/Compiler/IntegratedCodeGen
| mentions a to-be-developed "dataflow rewriting engine". Can someone
| please send a description of what this will do?
|
| Thanks!
| --
|
| Chad Scherrer
|
| "Time flies like an arrow; fruit flies like a banana" -- Groucho Marx
| _______________________________________________
| Glasgow-haskell-users mailing list
(Continue reading)

John Dias | 26 Aug 15:04

RE: "dataflow rewriting engine"

I've added some text and links to point the reader in the right direction. Here's the new text, cribbed from
the Wiki:

Dataflow optimization: We can define a new optimization simply by defining a lattice of dataflow facts
(akin to a specialized logic) and then writing the dataflow-transfer functions found in compiler
textbooks. Handing these functions to the dataflow engine produces a new optimization that is not only
useful on its own, but that can easily be composed with other optimizations to create an integrated
"superoptimization" that is strictly more powerful than any sequence of individual optimizations, no
matter how many times they are re-run. The dataflow engine is based on (Lerner, Grove, and Chambers 2002
http://citeseer.ist.psu.edu/old/lerner01composing.html); you can find a functional
implementation of the dataflow engine presented in (Ramsey and Dias 2005 http://www.cs.tufts.edu/~nr/pubs/zipcfg-abstract.html).

Let me know how I can further clarify the text,
-j

> -----Original Message-----
> From: Simon Peyton-Jones
> Sent: Tuesday, August 26, 2008 1:32 PM
> To: Norman Ramsey; John Dias
> Cc: Chad Scherrer; GHC Users
> Subject: RE: "dataflow rewriting engine"
>
> Norman, John
>
> Would you care to respond to this? (Perhaps by amplifying the wiki
> page?)  A good starting point is perhaps Craig's paper.
>
> Simon
>
> | -----Original Message-----
(Continue reading)

Don Stewart | 26 Aug 19:57
Gravatar

Re: "dataflow rewriting engine"

I think we're all rather excited about seeing this stuff land.
What's the expected timeline, wrt. ghc 6.10's release?

-- Don

t-jodias:
> I've added some text and links to point the reader in the right
> direction. Here's the new text, cribbed from the Wiki:
> 
> Dataflow optimization: We can define a new optimization simply by
> defining a lattice of dataflow facts (akin to a specialized logic) and
> then writing the dataflow-transfer functions found in compiler
> textbooks. Handing these functions to the dataflow engine produces a
> new optimization that is not only useful on its own, but that can
> easily be composed with other optimizations to create an integrated
> "superoptimization" that is strictly more powerful than any sequence
> of individual optimizations, no matter how many times they are re-run.
> The dataflow engine is based on (Lerner, Grove, and Chambers 2002
> http://citeseer.ist.psu.edu/old/lerner01composing.html); you can find
> a functional implementation of the dataflow engine presented in
> (Ramsey and Dias 2005
> http://www.cs.tufts.edu/~nr/pubs/zipcfg-abstract.html).
> 
> Let me know how I can further clarify the text,
> -j
> 
> > -----Original Message-----
> > From: Simon Peyton-Jones
> > Sent: Tuesday, August 26, 2008 1:32 PM
> > To: Norman Ramsey; John Dias
(Continue reading)

Deborah Goldsmith | 26 Aug 21:26

Re: "dataflow rewriting engine"

Has there been any thought about working with the LLVM project? I  
didn't find anything on the wiki along those lines.

Thanks,
Deborah

On Aug 26, 2008, at 10:57 AM, Don Stewart wrote:

> I think we're all rather excited about seeing this stuff land.
> What's the expected timeline, wrt. ghc 6.10's release?
>
> -- Don
>
> t-jodias:
>> I've added some text and links to point the reader in the right
>> direction. Here's the new text, cribbed from the Wiki:
>>
>> Dataflow optimization: We can define a new optimization simply by
>> defining a lattice of dataflow facts (akin to a specialized logic)  
>> and
>> then writing the dataflow-transfer functions found in compiler
>> textbooks. Handing these functions to the dataflow engine produces a
>> new optimization that is not only useful on its own, but that can
>> easily be composed with other optimizations to create an integrated
>> "superoptimization" that is strictly more powerful than any sequence
>> of individual optimizations, no matter how many times they are re- 
>> run.
>> The dataflow engine is based on (Lerner, Grove, and Chambers 2002
>> http://citeseer.ist.psu.edu/old/lerner01composing.html); you can find
>> a functional implementation of the dataflow engine presented in
(Continue reading)

Austin Seipp | 26 Aug 23:11
Gravatar

Re: "dataflow rewriting engine"

Excerpts from Deborah Goldsmith's message of Tue Aug 26 14:26:33 -0500 2008:
> Has there been any thought about working with the LLVM project? I  
> didn't find anything on the wiki along those lines.

I imagine if tried, haskell as a language on LLVM would quickly end up
a second-class citizen, due to non-strict semantics?

Austin
Favicon
Gravatar

Re: "dataflow rewriting engine"

Deborah Goldsmith:
> Has there been any thought about working with the LLVM project? I  
> didn't find anything on the wiki along those lines.

I have only had a rather brief look at LLVM, but my understanding at  
the moment is that LLVM would not be able to support one of GHC's  
current code layout optimisations.  More precisely, with LLVM, it  
would not be possible to enforce that the meta data for a closure is  
placed right before (in terms of layout in the address space) the code  
executing the "eval" method of that same closure.  GHC uses that to  
have the closure code pointer point directly to the "eval" code (and  
hence also by an appropriate offset) to the various fields of the meta  
data.  If that layout cannot be ensured, GHC needs to take one more  
indirection to execute "evals" (which is a very frequent operation) -  
this is what an unregistered build does btw.

However, I am not convinced that this layout optimisation is really  
gaining that much extra performance these days.  In particular, since  
dynamic pointer tagging, very short running "evals" (for which the  
extra indirection incurs the largest overhead) have become less  
frequent.  Even if there is a slight performance regression, I think,  
it would be worthwhile to consider giving up on the described layout  
constraint.  It is the Last Quirk that keeps GHC from using standard  
compiler back-ends (such as LLVM), and I suspect, it is not worth it  
anymore.

When we discussed this last, Simon Marlow planned to run benchmarks to  
determine how much performance the layout optimisation gains us these  
days.  Simon, did you ever get around to that?

(Continue reading)

Duncan Coutts | 27 Aug 04:17
Gravatar

Re: "dataflow rewriting engine"

On Wed, 2008-08-27 at 10:25 +1000, Manuel M T Chakravarty wrote:

> However, I am not convinced that this layout optimisation is really  
> gaining that much extra performance these days.  In particular, since  
> dynamic pointer tagging, very short running "evals" (for which the  
> extra indirection incurs the largest overhead) have become less  
> frequent.  Even if there is a slight performance regression, I think,  
> it would be worthwhile to consider giving up on the described layout  
> constraint.  It is the Last Quirk that keeps GHC from using standard  
> compiler back-ends (such as LLVM), and I suspect, it is not worth it  
> anymore.

There's also a potential benefit on the other side, that the cpu's
instruction cache is not wasted on non-instruction data. Apparently some
cpus also do instruction read-ahead and suffer slowdown if they
encounter data that does not decode as valid op codes. Obviously it's
not allowed to fault since the instructions are not actually executed
but it can impact on performance according to Intel manuals (presumably
because it ends up flushing some instruction decoding caches or
something). Of course, as you've discussed, the only way to find out is
to benchmark.

Duncan
Simon Peyton-Jones | 27 Aug 10:10

RE: "dataflow rewriting engine"

| > Has there been any thought about working with the LLVM project? I
| > didn't find anything on the wiki along those lines.
|
| I have only had a rather brief look at LLVM, but my understanding at
| the moment is that LLVM would not be able to support one of GHC's
| current code layout optimisations.

With the new, modular code generation framework, described here
http://hackage.haskell.org/trac/ghc/wiki/Commentary/Compiler/NewCodeGen
I think it'll be much easier [for someone else] to consider using LLVM.  I don't see the "data next to code"
thing as a show-stopper; it used to be a performance win of a handful of percent, but I don't think we have
recent data.  And an LLVM back end might be willing to give up that few percent.

There may well be other difficulties, of course.  If anyone is interested, the best thing is to wait for the
new stuff to be committed to the HEAD (which will happen in Sept), and look at the new Cmm data types (which
define the language you'd have to convert to LLVM).

Simon
Simon Marlow | 27 Aug 11:35

Re: "dataflow rewriting engine"

Manuel M T Chakravarty wrote:
> Deborah Goldsmith:
>> Has there been any thought about working with the LLVM project? I 
>> didn't find anything on the wiki along those lines.
> 
> I have only had a rather brief look at LLVM, but my understanding at the 
> moment is that LLVM would not be able to support one of GHC's current 
> code layout optimisations.  More precisely, with LLVM, it would not be 
> possible to enforce that the meta data for a closure is placed right 
> before (in terms of layout in the address space) the code executing the 
> "eval" method of that same closure.  GHC uses that to have the closure 
> code pointer point directly to the "eval" code (and hence also by an 
> appropriate offset) to the various fields of the meta data.  If that 
> layout cannot be ensured, GHC needs to take one more indirection to 
> execute "evals" (which is a very frequent operation) - this is what an 
> unregistered build does btw.
> 
> However, I am not convinced that this layout optimisation is really 
> gaining that much extra performance these days.  In particular, since 
> dynamic pointer tagging, very short running "evals" (for which the extra 
> indirection incurs the largest overhead) have become less frequent.  
> Even if there is a slight performance regression, I think, it would be 
> worthwhile to consider giving up on the described layout constraint.  It 
> is the Last Quirk that keeps GHC from using standard compiler back-ends 
> (such as LLVM), and I suspect, it is not worth it anymore.
> 
> When we discussed this last, Simon Marlow planned to run benchmarks to 
> determine how much performance the layout optimisation gains us these 
> days.  Simon, did you ever get around to that?

(Continue reading)

Simon Peyton-Jones | 27 Aug 14:08

RE: "dataflow rewriting engine"

| I think we're all rather excited about seeing this stuff land.
| What's the expected timeline, wrt. ghc 6.10's release?

Good question.  I've updated the overview here

http://hackage.haskell.org/trac/ghc/wiki/Commentary/Compiler/NewCodeGen

to say what we plan.

Simon
Chad Scherrer | 27 Aug 21:42

Re: "dataflow rewriting engine"

Wow, lots of great information. We'll take a look at the papers and get back if
there's any remaining confusion. Thanks!

Chad

Gmane