Sami Farin | 4 Dec 2010 02:24
Picon

using %= in hash_find_slot, and optimized STRING_HASH_1

Seems like you have power of two's as the size.
The patch to idu-hash.c makes mkid run 15% faster.
Tested with icudt42l_dat.s (25631066 bytes) from Chrome.

%=     
  97,173,499,117 PERF_COUNT_HW_CPU_CYCLES (0.00% scaling, ena=34,788,193,052, run=34,788,193,052)
  17,018,624,785 PERF_COUNT_HW_INSTRUCTIONS (0.00% scaling, ena=34,788,193,052, run=34,788,193,052)

&=
  81,802,113,759 PERF_COUNT_HW_CPU_CYCLES (0.00% scaling, ena=29,281,007,448, run=29,281,007,448)
  18,096,072,339 PERF_COUNT_HW_INSTRUCTIONS (0.00% scaling, ena=29,281,007,448, run=29,281,007,448)

Still pretty slow at 3191 cycles/byte.

I checked out STRING_HASH_1, it's not very clever.
After some twiddling, here's the results:
   9,667,074,280 PERF_COUNT_HW_CPU_CYCLES (0.00% scaling, ena=3,458,109,799, run=3,458,109,799)
   6,793,017,489 PERF_COUNT_HW_INSTRUCTIONS (0.00% scaling, ena=3,458,109,799, run=3,458,109,799)

A little better, 377 cycles/byte.

When there are not that many tokens, it can get slower.
Tested mkid with current git repository:

original STRING_HASH_1
  1,222,419,290 PERF_COUNT_HW_CPU_CYCLES (0.00% scaling, ena=437,088,646, run=437,088,646)
  1,240,023,877 PERF_COUNT_HW_INSTRUCTIONS (0.00% scaling, ena=437,088,646, run=437,088,646)

new STRING_HASH_1
  1,312,536,207 PERF_COUNT_HW_CPU_CYCLES (0.00% scaling, ena=470,344,153, run=470,344,153)
(Continue reading)

Sami Farin | 4 Dec 2010 19:19
Picon

Re: using %= in hash_find_slot, and optimized STRING_HASH_1

jhash seems to be okay with idutils, at least on my CPU.  It's faster than
the hack in my previous post, due to fewer collisions.  It could cache result
of strlen's somewhere, but that might be micro-optimizing.
Note that I have not thought about portability or coding style of this patch,
but it works for me.

I also took the hash_ptr from Linux 2.6.36.

Some test data:
  internet-drafts: internet-drafts rsync mirror, 167948 KiB
    Name=148804, Number=35361, String=0, Literal=184165, Comment=0
    Files=7774, Tokens=11600470, Bytes=98628 Kb, Heap=9472+1188 Kb, Output=6446247 (2008853 tok,
3380399 hit)

  icudt42l_dat.s: asm code from Chrome git repository, 25032 KiB
    Name=12, Number=826176, String=0, Literal=826188, Comment=0
    Files=1, Tokens=2703263, Bytes=25030 Kb, Heap=12936+132 Kb, Output=12555284 (8423546 tok, 826188 hit)

=================================================================================================
This patch:
 - internet-drafts
     Load=184165/2097152=9%, Rehash=0, Collisions=55500/13271621=0%, Freq=11600470/184165=62.99

     23,296,084,400 PERF_COUNT_HW_CPU_CYCLES
     20,430,116,277 PERF_COUNT_HW_INSTRUCTIONS

 - icudt42l_dat.s
     Load=826188/1048576=79%, Rehash=9, Collisions=4488198/3826830=117%, Freq=2703263/826188=3.27

     8,607,590,966 PERF_COUNT_HW_CPU_CYCLES (0.00% scaling, ena=3,085,100,435, run=3,085,100,435)
(Continue reading)

Jim Meyering | 15 Dec 2010 10:27
Gravatar

Re: Re: using %= in hash_find_slot, and optimized STRING_HASH_1

Sami Farin wrote:
> jhash seems to be okay with idutils, at least on my CPU.  It's faster than
> the hack in my previous post, due to fewer collisions.  It could cache result
> of strlen's somewhere, but that might be micro-optimizing.
> Note that I have not thought about portability or coding style of this patch,
> but it works for me.
>
> I also took the hash_ptr from Linux 2.6.36.

Thanks for working on idutils.
Before we can use contributions by you, you'll have to
fill out some paperwork, assuming any school/employer
is ok with that.  Here's the procedure I wrote up for coreutils.
For idutils it's the same:

  http://git.savannah.gnu.org/cgit/coreutils.git/tree/HACKING#n429

There might be copyright complications with adding jhash (unless
it's GPLv3) or code from the kernel.  If you're inclined, I'd like to
see if you get similar gains by switching to the use of the hash_pjw
function from gnulib (lib/hash-pjw.c).  Using that would require changing
hash_init to take hashing functions with different signatures: this new
hash function has to know the size of the hash table.  But that would
be a good thing regardless, since it permits better use of the limited
number of bits in each key.

Gmane