Ian Zerny | 1 Nov 16:21

OCaml for efficient virtual machines

A friend and I are currently evaluating OCaml as the implementation
language for a virtual machine. The interpreted language is yet
undecided, but we expect it to be some intermediate bytecode for a
simple language such as Scheme or Self.

Currently we have a few issues and would greatly appreciate any
pointers or comments on them.

* Dynamically Compiled Code.
We need to dynamically compile code in the running VM. Our current
attack on this is to use a native ia32 emitter in OCaml and pass
the emitted code to a C function that can then start
executing it. We have looked at arrays and buffers as byte containers,
but the fact that OCaml tags non-pointers in these containers
makes this method cumbersome. We would need to remove the code tags in
C before executing it. Also, if we do not undo the removal of tags we
would hinder the garbage collector from collecting some unused
objects. Now if OCaml has some collection that can contain raw bytes
and is garbage collection aware we could use that. Something like
Self's byte array. We have not been able to find a reference to such a
container in OCaml.

* Threaded Code.
We would like our bytecode interpreter to be as fast as possible in
order to reduce the amount of compiling that needs to be done. A
common technique is to use a ``threaded code'' interpreter. Here every
operation is given by a label and the current operation can jump
directly to the following operation. This can greatly reduce the
amount of miss-predictions in the branch prediction unit.
We can implement this with a (named or unnamed) function for each
opcode and in each of them the invocation of the next opcode is in
tail-position.

A simple interpreter:

let op0 r pc ops = r;;
let op1 r pc ops = ops.(pc) (r+1) (pc+1) ops;;
let op2 r pc ops = ops.(pc) (r-1) (pc+1) ops;;
let interp ops = ops.(0) 0 1 ops;;

We have omitted the appropriate function to create the initial opcode
array and the code requires the -rectypes in order to compile. This compiles
to near optimal code and all parameters reside in registers. However,
the tail calls are indirect jumps that must first go through
`caml_apply3'.

Assembly output for op0 and op1:

op0:
ret
op1:
movl -2(%ecx, %ebx, 2), %edx
addl $2, %ebx
addl $2, %eax
jmp caml_apply3
# movl 8(%edx), %esi
# jmp *%esi

Here we have noted in the comments what we would have liked to see.
Why OCaml does not do this is still not clear to us. In the code for
caml_apply3 there is a check on some property of the invoked
function. It seems that under some situations the tail call can result
in a call instead of a jump. This puzzles us a bit as the fact that
the application is in tail position of the caller should be enough, no?
We would really like to understand why `caml_apply3' works this way.

If we perform our optimization by hand we beat the C version of the
same interpreter compiled with GCC (with labels-as-values extension).

The most efficient interpreter we have been able to produce directly
with OCaml is the defunctionalized version of the above.
Unfortunately, that again causes all operations to jump to the same
`apply' method dispatching over an ADT of operations. This is however
still faster than the double dispatch incurred by `caml_apply3'.

Thank you for taking the time to read this post. We look forward to
any remarks.

Kind regards, Ian and Thomas.

Assembly of caml_apply3:
caml_apply3:
subl $8, %esp
movl 4(%edx), %esi
cmpl $7, %esi # what are we checking?
jne .L447
movl 8(%edx), %esi
addl $8, %esp
jmp *%esi # we always end here
.align 16
.L447:
movl %ecx, 4(%esp)
movl %ebx, 0(%esp)
movl (%edx), %ecx
movl %edx, %ebx
call *%ecx
.L449:
movl %eax, %ebx
movl (%ebx), %ecx
movl 0(%esp), %eax
call *%ecx
.L450:
movl %eax, %ebx
movl (%ebx), %ecx
movl 4(%esp), %eax
addl $8, %esp
jmp *%ecx

__._,_.___
Recent Activity
Visit Your Group
Yahoo! Finance

It's Now Personal

Guides, news,

advice & more.

New web site?

Drive traffic now.

Get your business

on Yahoo! search.

Yahoo! Groups

Real Food Group

Share recipes

and favorite meals.

.

__,_._,___
Jon Harrop | 1 Nov 18:56

Re: OCaml for efficient virtual machines

On Saturday 01 November 2008 15:21:40 Ian Zerny wrote:
> A friend and I are currently evaluating OCaml as the implementation
> language for a virtual machine. The interpreted language is yet
> undecided, but we expect it to be some intermediate bytecode for a
> simple language such as Scheme or Self.

What about an OCaml string or Bigarray for your data structure?

This is perhaps not what you're after but I would recommend using LLVM to
write a JIT in OCaml instead. That is almost as easy, is likely to give much
better performance and, in particular, completely evades these tagging woes.

--
Dr Jon Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?e

__._,_.___
Recent Activity
Visit Your Group
Yahoo! Finance

It's Now Personal

Guides, news,

advice & more.

Y! Groups blog

The place to go

to stay informed

on Groups news!

Yahoo! Groups

w/ John McEnroe

Join the All-Bran

Day 10 Club.

.

__,_._,___
Ian Zerny | 12 Nov 09:30

Re: OCaml for efficient virtual machines

Jon Harrop wrote:
> On Saturday 01 November 2008 15:21:40 Ian Zerny wrote:
>> A friend and I are currently evaluating OCaml as the implementation
>> language for a virtual machine. The interpreted language is yet
>> undecided, but we expect it to be some intermediate bytecode for a
>> simple language such as Scheme or Self.
>
> What about an OCaml string or Bigarray for your data structure?

Of course. I have no idea how we managed not to see that string fulfils
our requirements. Thank you for the pointer.

> This is perhaps not what you're after but I would recommend using LLVM to
> write a JIT in OCaml instead. That is almost as easy, is likely to give much
> better performance and, in particular, completely evades these tagging woes.

Yes. We have looked at the LLVM support in OCaml and it looks splendid.
However, our project is to build the optimizing VM so using LLVM could
be considered cheating.

With respect to our question about tail-calls. We have still not found a
remedy to the problem that can directly be expressed in OCaml. We can
however hack our build system to manipulate the intermediate assembly
output of OCaml to remove the indirect jump in well defined places. We
would still be interested in learning of any other approach to this problem.

Thank you for the quick reply and sorry our response has been delayed.
Kind regards, Ian

__._,_.___
Recent Activity
Visit Your Group
Yahoo! Finance

It's Now Personal

Guides, news,

advice & more.

Ads on Yahoo!

Learn more now.

Reach customers

searching for you.

Special K Group

on Yahoo! Groups

Learn how others

are losing pounds.

.

__,_._,___
remi.vanicat | 1 Nov 20:19

Re: OCaml for efficient virtual machines

Ian Zerny <ian <at> zerny.dk> writes:

> A friend and I are currently evaluating OCaml as the implementation
> language for a virtual machine. The interpreted language is yet
> undecided, but we expect it to be some intermediate bytecode for a
> simple language such as Scheme or Self.

Firstly, you will probably have more precise answer on the main
caml-list, your question are not really beginner level question.

>
> Currently we have a few issues and would greatly appreciate any
> pointers or comments on them.
>
> * Dynamically Compiled Code.

Did you look at metaocaml ? it probably one of the next way to
dynamically generate code for ocaml.

--
Rémi Vanicat

__._,_.___
Recent Activity
Visit Your Group
Give Back

Yahoo! for Good

Get inspired

by a good cause.

Y! Toolbar

Get it Free!

easy 1-click access

to your groups.

Yahoo! Groups

Start a group

in 3 easy steps.

Connect with others.

.

__,_._,___
Jon Harrop | 1 Nov 21:50

Re: Re: OCaml for efficient virtual machines

On Saturday 01 November 2008 19:19:08 remi.vanicat <at> gmail.com wrote:
> Ian Zerny <ian <at> zerny.dk> writes:
> > * Dynamically Compiled Code.
>
> Did you look at metaocaml ? it probably one of the next way to
> dynamically generate code for ocaml.

FWIW, I found MetaOCaml to be fine for staging a term-level interpreter but
failed to get any kind of performance improvement out of it when dealing with
lower-level languages like bytecodes.

--
Dr Jon Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?e

__._,_.___
Recent Activity
Visit Your Group
Yahoo! Finance

It's Now Personal

Guides, news,

advice & more.

Search Ads

Get new customers.

List your web site

in Yahoo! Search.

Yahoo! Groups

Join people over 40

who are finding ways

to stay in shape.

.

__,_._,___
Ian Zerny | 12 Nov 09:34

Re: Re: OCaml for efficient virtual machines

remi.vanicat <at> gmail.com wrote:
> Ian Zerny <ian <at> zerny.dk> writes:
>
>> A friend and I are currently evaluating OCaml as the implementation
>> language for a virtual machine. The interpreted language is yet
>> undecided, but we expect it to be some intermediate bytecode for a
>> simple language such as Scheme or Self.
>
> Firstly, you will probably have more precise answer on the main
> caml-list, your question are not really beginner level question.

Yes. We did not really know where our questions would be appropriate.
This list seemed like the safe choice.

>> Currently we have a few issues and would greatly appreciate any
>> pointers or comments on them.
>>
>> * Dynamically Compiled Code.
>
> Did you look at metaocaml ? it probably one of the next way to
> dynamically generate code for ocaml.

Ok, we will take look at it.

Thank you for you answers.
Kind regards, Ian

__._,_.___
Recent Activity
Visit Your Group
Yahoo! Finance

It's Now Personal

Guides, news,

advice & more.

Yahoo! Groups

Dogs Owners Group

Join Do More For Dogs

pet community

Need traffic?

Drive customers

With search ads

on Yahoo!

.

__,_._,___

Gmane