From: Loup Vaillant <l <at> loup-vaillant.fr>
To: fonc <at> vpri.org
Sent: Wednesday, February 8, 2012 9:24 AM
Subject: Re: [fonc] Raspberry Pi
Alan Kay wrote:
> Hi Loup
> Actually, your last guess was how we thought most of the optimizations
> would be done (as separate code "guarded" by the meanings). […]
> In practice, the optimizations we did do are done in the translation
> chain and in the run-time, […]
I can't recall the exact reference, but I once read about a middle
ground: mechanical optimization passes that are brittle in the face of
meaning changes. That is, if you change the concise version of your
program, you may have to re-think your optimization passes, but you
don't necessarily have to re-write your optimized version directly.
The guy was at university, and was assigned to write an optimized
multiplication routine for big numbers. Each student would be graded
according to the speed of their program, with no restriction on the
implementation language.
Everyone started coding in C, but _he_ preferred to start with
Scheme. He coded a straightforward version of the algorithm, then
set out to manually (but mostly mechanically) apply a set of
correctness-preserving transformations, most notably a CPS
transformation, followed by a direct translation to C with gotos. His
final program, written in pure C, was the second fastest in his class
(and very close to the first, which used assembly heavily). Looking
back at what he could have done better, he saw that his program spent
most of its time in malloc(). He didn't know how to do it at the time,
but had he managed his memory directly, his program would have been
faster still.
Oh, and of course, he had much less trouble dealing with mistakes
along the way.
So, his conclusion was that speed comes from beautiful programs, not
"prematurely optimized" ones.
About Frank, we may imagine using this method in a more automated way,
for instance by annotating the source and intermediate code with
specialized optimizations that would only work in certain cases. It
could be something roughly like this:
Nile source            <- Annotations to optimize the Nile program
     |
     |  Compiler pass that checks the validity of the
     |  optimizations, then applies them.
     v
Maru intermediate code <- Annotations to optimize that Maru program
     |
     |  Compiler pass like the above
     v
C backend code         <- Annotations (though at this point…)
     |
     |  GCC
     v
Assembly               <- (no, I won't touch that)
(Note that instead of annotating the programs, you could manually
control the compilers.)
Of course, the moment you change the Nile source, your annotations at
every level may stop working. But (i) you would only have to redo your
annotations, and (ii) maybe not even all of them, for there is a
chance that your intermediate representation didn't change too much
when you changed your source code.
I can think of one big caveat, however: if the generated code is too
big or too cryptic, this approach may not be feasible any more. And
don't forget to profile your program first.
> But Dan Amelang did such a great job at the meanings that they ran
> fast enough tempt us to use them directly [so] Cairo never entered
> the picture.
If I had to speculate from an outsider's point of view, I'd say these
good surprises probably apply to almost any domain specific language.
The more specialized a language is, the more domain knowledge you can
embed in the compiler, and the more effective the optimizations can
be. I know it sounds like magic, but I recall a similar example with
Haskell, applied to bioinformatics (again, I can't find the paper).
The goal was to implement a super-fast algorithm. The catch was, the
resulting program had to accept a rather large number of parameters.
Written directly in C, the fact that those parameters could change
meant the main loop had to make several checks, slowing the whole
thing down. So they built an EDSL, based on monads, that generated a
C program once the parameters were read and known, then compiled and
ran it. Not quite Just-In-Time compilation, but close. The result was
of course noticeably faster than the original C program.
Therefore, I'm quite optimistic. My best guess right now is that smart
compilers will be more than enough to make Frank "fast enough", "as
fast as C", or possibly even faster, for two reasons:
1. As far as I know, most languages in Frank are quite specialized.
That prior knowledge can be exploited by compilers.
2. The code volume is sufficiently small that aggressive whole program
optimizations are actually feasible.
Such compilers may cost 10 to 100 thousand lines or more, but at least
those lines would be strictly contained. That way, we wouldn't give up
too much hackability in the name of performance.
fonc mailing list
fonc-uVco7kAcSAQ@public.gmane.org
http://vpri.org/mailman/listinfo/fonc