4 Dec 2010 02:24
using %= in hash_find_slot, and optimized STRING_HASH_1
It looks like the hash table sizes are always powers of two. The patch to idu-hash.c makes mkid run 15% faster. Tested with icudt42l_dat.s (25631066 bytes) from Chrome.

%=
  97,173,499,117 PERF_COUNT_HW_CPU_CYCLES (0.00% scaling, ena=34,788,193,052, run=34,788,193,052)
  17,018,624,785 PERF_COUNT_HW_INSTRUCTIONS (0.00% scaling, ena=34,788,193,052, run=34,788,193,052)

&=
  81,802,113,759 PERF_COUNT_HW_CPU_CYCLES (0.00% scaling, ena=29,281,007,448, run=29,281,007,448)
  18,096,072,339 PERF_COUNT_HW_INSTRUCTIONS (0.00% scaling, ena=29,281,007,448, run=29,281,007,448)

Still pretty slow at 3191 cycles/byte. I checked out STRING_HASH_1, and it's not very clever. After some twiddling, here are the results:

  9,667,074,280 PERF_COUNT_HW_CPU_CYCLES (0.00% scaling, ena=3,458,109,799, run=3,458,109,799)
  6,793,017,489 PERF_COUNT_HW_INSTRUCTIONS (0.00% scaling, ena=3,458,109,799, run=3,458,109,799)

A little better, 377 cycles/byte. When there are not many tokens, it can get slower. Tested mkid on the current git repository:

original STRING_HASH_1
  1,222,419,290 PERF_COUNT_HW_CPU_CYCLES (0.00% scaling, ena=437,088,646, run=437,088,646)
  1,240,023,877 PERF_COUNT_HW_INSTRUCTIONS (0.00% scaling, ena=437,088,646, run=437,088,646)

new STRING_HASH_1
  1,312,536,207 PERF_COUNT_HW_CPU_CYCLES (0.00% scaling, ena=470,344,153, run=470,344,153)
[...]
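For readers unfamiliar with the %= versus &= change measured at the top of this message, here is a minimal sketch of why the two reductions are interchangeable when the table size is a power of two. This is not the actual patch to idu-hash.c; the function names (slot_mod, slot_mask, reductions_agree) are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper names; none of these are taken from idu-hash.c. */

/* Nonzero when n is a power of two (exactly one bit set). */
static int is_power_of_two(size_t n)
{
    return n != 0 && (n & (n - 1)) == 0;
}

/* Slot index via modulo: works for any nonzero table size. */
static size_t slot_mod(unsigned long hash, size_t size)
{
    return (size_t)(hash % size);
}

/* Slot index via mask: only valid when size is a power of two. */
static size_t slot_mask(unsigned long hash, size_t size)
{
    assert(is_power_of_two(size));
    return (size_t)(hash & (size - 1));
}

/* Returns 1 if both reductions agree for every hash in 0..limit-1. */
static int reductions_agree(size_t size, unsigned long limit)
{
    unsigned long h;

    for (h = 0; h < limit; h++)
        if (slot_mod(h, size) != slot_mask(h, size))
            return 0;
    return 1;
}
```

The mask form avoids an integer division, which on many CPUs costs far more cycles than a bitwise AND. That would be consistent with the counters above, where the &= build retires slightly more instructions yet uses fewer cycles.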