1 Sep 15:07
asm intrinsics for speed
From: Phil Dawes <phil@...>
Subject: asm intrinsics for speed
Newsgroups: gmane.comp.lang.factor.general
Date: 2008-09-01 13:07:51 GMT
Hi Slava,
I'm building (what amounts to) an OLAP database engine in Factor, and
part of that involves linear searching of mmapped files. Basically, the
faster I can do linear search operations, the less I need to use
sorted indexes and the more data I can fit into memory (especially with
compression). I've currently got a bunch of C code doing searches, but
ideally I'd like not to rely on a separate C DLL in the released code or
require people to have the GCC toolchain to compile it.
I decided to experiment with using asm intrinsics to implement the tight
loops, and took 'count' as a simple test case. I'm searching a 48M file
of 32-bit integers.
For comparison, here is some C code (minus the mmapping stuff):
int count_items(int *from, int *to, int item) {
    int count = 0;
    while (from != to) {
        if (*from == item) count++;
        from++;
    }
    return count;
}
Compiled with -O3, this takes about 80ms to search the file on a
processor locked at 800 MHz with the file buffers running warm. The best I
could do with Factor in this setup was a few seconds (which I think is
still pretty good).
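For readers who want to try the C baseline, here is a minimal sketch of the mmap side that the snippet above leaves out. It is not the code from the original post: it assumes a POSIX system, a file of native-endian 32-bit ints, and a hypothetical helper name, count_in_file. The counting loop is the same as above, with const added to the pointers.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Same loop as in the message: count occurrences of item in [from, to). */
int count_items(const int *from, const int *to, int item) {
    int count = 0;
    while (from != to) {
        if (*from == item) count++;
        from++;
    }
    return count;
}

/* Hypothetical helper (an assumption, not from the original post):
   map the file at path read-only and count the ints equal to item.
   Returns the count, or -1 if the file cannot be opened or mapped. */
long count_in_file(const char *path, int item) {
    int fd = open(path, O_RDONLY);
    if (fd < 0) return -1;

    struct stat st;
    if (fstat(fd, &st) < 0) { close(fd); return -1; }

    size_t bytes = (size_t)st.st_size;
    size_t n = bytes / sizeof(int);          /* trailing partial int is ignored */
    if (n == 0) { close(fd); return 0; }     /* mmap rejects a zero length */

    void *base = mmap(NULL, bytes, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                               /* the mapping outlives the descriptor */
    if (base == MAP_FAILED) return -1;

    const int *data = base;
    long count = count_items(data, data + n, item);
    munmap(base, bytes);
    return count;
}
```

Calling count_in_file("data.bin", 42) on a file of ints (the file name is a placeholder) would scan it in one linear pass. Timing it once with a cold page cache and again warm gives a rough idea of how much of the cost is I/O rather than the loop.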
I'm glad that the compiler internals were easy
enough to understand for someone to be able to pull this off.
> I was able to get down to 338ms for count-items after these mods.
The remaining difference between Factor and C can be attributed to
poor register allocation in the backend.
So you can see the compiler has some fundamental limitations right