Re: Re: A suggestion to optimize the for loops
On Sun, Oct 5, 2008 at 3:25 PM, David MacIver <david.maciver <at> gmail.com> wrote:
>> You'll never get the kind of loop performance from a for comprehension that
>> you get from tuned while-loop code. So, improving something 2x when most
>> people either need 10x or nothing at all seems like a waste.
>
> Sorry, I don't know how to respond to an unbacked assertion which
> flies in the face of observed facts. Could you rephrase that as a
> useful sentence?
And in order to do something shocking and back an opinion with actual
evidence, here's some code:
trait IntAction{
def apply(i : Int) : Unit;
}
object IntLoop{
def times(size : Int, action : IntAction) {
var i = 0;
while(i < size){
action(i);
i += 1;
}
}
}
object LoopBenchmark extends scalax.testing.SizedBenchmark{
var sum = 0;
override val sizes = Array.range(0, 22).map(i => 1 << i)
new Benchmark[Int]("while"){
def data(size : Int) = size;
def test(size : Int) = {
var i = 0;
while (i < size){
i += 1;
sum += i;
}
}
}
new Benchmark[Int]("IntLoop"){
def data(size : Int) = size;
def test(size : Int) = {
IntLoop.times(size, new IntAction(){
def apply(i : Int) { sum += i; }
})
}
}
}
And here are some numbers for running it on the Java 6 server VM (the
client VM gives similar results):
scala> LoopBenchmark.main(null)
Size, while, IntLoop
1, 0.9288, 1.4625
2, 1.4897, 2.1465
4, 1.8854, 2.6657
8, 2.5849, 3.7209
16, 4.0271, 5.7881
32, 3.8079, 5.2768
64, 0.0688, 0.0668
128, 0.1212, 0.1197
256, 0.215, 0.2128
512, 0.3988, 0.4338
1024, 0.7729, 0.7717
2048, 1.5298, 1.5148
4096, 3.0078, 3.004
8192, 6.0164, 5.98
16384, 11.9394, 11.9403
32768, 25.1256, 24.3824
65536, 49.7022, 50.5058
131072, 95.8288, 95.803
262144, 194.632, 194.1631
524288, 381.6128, 381.4799
1048576, 762.965, 763.0022
2097152, 1525.4113, 1527.9294
scala> LoopBenchmark.main(null)
Size, while, IntLoop
1, 0.939, 1.4783
2, 1.5019, 2.1634
4, 1.903, 2.6865
8, 2.614, 3.7507
16, 4.063, 5.8215
32, 3.8529, 5.321
64, 0.1372, 0.1328
128, 0.2432, 0.2394
256, 0.4301, 0.4255
512, 0.7974, 0.8506
1024, 1.5462, 1.5431
2048, 3.046, 3.0461
4096, 6.0248, 7.0722
8192, 12.0352, 12.0585
16384, 24.0016, 24.1609
32768, 49.3035, 48.5215
65536, 98.1966, 98.4091
131072, 191.1968, 191.1784
262144, 389.0127, 387.3132
524288, 774.0732, 766.5339
1048576, 1530.3079, 1525.7336
2097152, 3050.924, 3054.0253
scala> LoopBenchmark.main(null)
Size, while, IntLoop
1, 0.9491, 1.4941
2, 1.5141, 2.1804
4, 1.9208, 2.7074
8, 2.6425, 3.782
16, 4.099, 5.8549
32, 3.8983, 5.364
64, 0.2054, 0.1997
128, 0.3644, 0.3592
256, 0.6442, 0.6388
512, 1.1962, 1.3032
1024, 2.3201, 2.3333
2048, 4.6057, 4.6231
4096, 9.0941, 10.0933
8192, 18.034, 18.1189
16384, 36.0821, 36.1985
32768, 73.9898, 73.4623
65536, 147.6996, 148.1828
131072, 287.9733, 290.9254
262144, 579.8721, 578.1591
524288, 1155.842, 1148.2189
1048576, 2293.5976, 2289.0789
2097152, 4584.8553, 4594.8766
The times are execution times in milliseconds. Note how the numbers
are near as dammit identical for non-trivial loops. For small loops
the fact that you have the single extra object creation does make a
difference (although to be honest I wouldn't trust the numbers for
small loops. Too much noise), but hotspot just inlines the
IntAction.apply and cheerfully turns the two loops into almost exactly
the same thing (in the first few runs of this it was a little more
variable with about 30% variation in performance. Amusingly enough, in
one run the IntAction version was twice as fast as the while loop!
Probably a bad optimisation on hotspot's part.
Now, as long as for loops are using type erased forms we obviously
can't get this sort of performance for loops over primitives, but with
a little bit of optimisation from scalac we probably can. And
certainly this should demonstrate to you than when there's no boxing
overhead while loops are *not* faster.