Osei Poku | 23 Sep 23:56

Hash Table performance

Hi,

CCL extensively uses the ENUMERATE-HASH-KEYS-AND-VALUES function  
underneath most of hash table functions (MAPHASH, WITH-HASH-TABLE- 
ITERATOR).  This causes a great deal of waste when iterating through a  
very large hash table (like I have now ~10^8 entries).  The way I  
understand the code from briefly looking at l0-hash.lisp and  
hash.lisp, the contents of the hash table are first stored in a pair  
of vectors, then the vectors are iterated through.

WITH-HASH-TABLE-ITERATOR is used everywhere like in the expansion of  
the LOOP macro.  There is simply no standard way in CCL to iterate  
through the contents of a hash table without doing this double work.   
Is there some fundamental limitation of the way hash tables are  
implemented in CCL that prevent iterating through its contents only  
once?

Osei
Chris Curtis | 24 Sep 00:09

Re: Hash Table performance


On Sep 23, 2008, at 5:58 PM, Osei Poku wrote:

> Hi,
>
> CCL extensively uses the ENUMERATE-HASH-KEYS-AND-VALUES function
> underneath most of hash table functions (MAPHASH, WITH-HASH-TABLE-
> ITERATOR).  This causes a great deal of waste when iterating through a
> very large hash table (like I have now ~10^8 entries).  The way I

If you need to iterate through 10^8 entries, a hash table is simply  
not the right data structure. For that matter, if you need to iterate  
through any number of entries, a hashtable is generally a Bad Idea.

> understand the code from briefly looking at l0-hash.lisp and
> hash.lisp, the contents of the hash table are first stored in a pair
> of vectors, then the vectors are iterated through.
>
> WITH-HASH-TABLE-ITERATOR is used everywhere like in the expansion of
> the LOOP macro.  There is simply no standard way in CCL to iterate
> through the contents of a hash table without doing this double work.
> Is there some fundamental limitation of the way hash tables are
> implemented in CCL that prevent iterating through its contents only
> once?

I don't think this has anything to do with CCL. Hashtables get their  
insert and lookup performance at the expense of iterability. Linked  
lists get their iteration performance at the expense of insert and  
lookup performance. Trees of various types spread the expense around  
differently.
(Continue reading)

Osei Poku | 24 Sep 00:34

Re: Hash Table performance

On Sep 23, 2008, at 6:09 PM, Chris Curtis wrote:

>
> On Sep 23, 2008, at 5:58 PM, Osei Poku wrote:
>
>> Hi,
>>
>> CCL extensively uses the ENUMERATE-HASH-KEYS-AND-VALUES function
>> underneath most of hash table functions (MAPHASH, WITH-HASH-TABLE-
>> ITERATOR).  This causes a great deal of waste when iterating  
>> through a
>> very large hash table (like I have now ~10^8 entries).  The way I
>
> If you need to iterate through 10^8 entries, a hash table is simply  
> not the right data structure. For that matter, if you need to  
> iterate through any number of entries, a hashtable is generally a  
> Bad Idea.

This is understood and I am in perfect agreement.  In fact, my  
ultimate intent is quick lookups and not to use the hash table for  
iteration at all.  I was looking into iteration to be able check the  
contents of the hash table for debugging purposes.  I am perfectly  
happy with just being able to grab any key in the hash table and then  
displaying its value.  I guess I got caught up with trying to use  
slime inspector (which uses WITH-HASH-TABLE-ITERATOR) and somewhere  
along the way I lost track of what I actually needed to do in the  
first place.

After utilizing a few more brain cells, I realize I was wasting time  
tackling the wrong problem.  I'm just trying to debug the parser that  
(Continue reading)

Gail Zacharias | 24 Sep 00:34

Re: Hash Table performance

At 9/23/2008 05:58 PM, Osei Poku wrote:
>Hi,
>
>CCL extensively uses the ENUMERATE-HASH-KEYS-AND-VALUES function
>underneath most of hash table functions (MAPHASH, WITH-HASH-TABLE-
>ITERATOR).  This causes a great deal of waste when iterating through a
>very large hash table (like I have now ~10^8 entries).  The way I
>understand the code from briefly looking at l0-hash.lisp and
>hash.lisp, the contents of the hash table are first stored in a pair
>of vectors, then the vectors are iterated through.
>
>WITH-HASH-TABLE-ITERATOR is used everywhere like in the expansion of
>the LOOP macro.  There is simply no standard way in CCL to iterate
>through the contents of a hash table without doing this double work.
>Is there some fundamental limitation of the way hash tables are
>implemented in CCL that prevent iterating through its contents only
>once?

Sort of.  CCL does rehashing in place.  If a hash table were rehashed
while you're iterating over it, the result would be totally random.

However, there are potentially ways around this.  For example, a hash
table could know whether it's being iterated over, and do a copying
rehash if so.  Of course if this does happen (and especially if it
happens repeatedly) you might end up consing as much or more as
the current iterator, but only if your keys are address based and
recently consed, so it's probably a good bet.

The new "nearly lock free" hash tables that are in the trunk always do
a copying rehash (for other reasons), so they especially should be
(Continue reading)


Gmane