Janek S. | 23 Nov 16:44 2012
Picon

Problem with benchmarking FFI calls with Criterion

I am using Criterion library to benchmark C code called via FFI bindings and I've ran into a 
problem that looks like a bug. 

The first benchmark that uses FFI runs correctly, but subsequent benchmarks run much longer. I 
created demo code (about 50 lines, available at github: https://gist.github.com/4135698 ) in 
which C function copies a vector of doubles. I benchmark that function a couple of times. First 
run results in avarage time of about 17us, subsequent runs take about 45us. In my real code 
additional time was about 15us and it seemed to be a constant factor, not relative to "correct" 
run time. The surprising thing is that if my C function only allocates memory and does no 
copying:

double* c_copy( double* inArr, int arrLen ) {
  double* outArr = malloc( arrLen * sizeof( double ) );

  return outArr;
}

then all is well - all runs take similar amount of time. I also noticed that sometimes in my demo 
code all runs take about 45us, but this does not seem to happen in my real code - first run is 
always shorter.

Does anyone have an idea what is going on?

Janek
Edward Z. Yang | 23 Nov 18:47 2012
Picon

Re: Problem with benchmarking FFI calls with Criterion

Hello Janek,

What happens if you do the benchmark without unsafePerformIO involved?

Edward

Excerpts from Janek S.'s message of Fri Nov 23 10:44:15 -0500 2012:
> I am using Criterion library to benchmark C code called via FFI bindings and I've ran into a 
> problem that looks like a bug. 
> 
> The first benchmark that uses FFI runs correctly, but subsequent benchmarks run much longer. I 
> created demo code (about 50 lines, available at github: https://gist.github.com/4135698 ) in 
> which C function copies a vector of doubles. I benchmark that function a couple of times. First 
> run results in avarage time of about 17us, subsequent runs take about 45us. In my real code 
> additional time was about 15us and it seemed to be a constant factor, not relative to "correct" 
> run time. The surprising thing is that if my C function only allocates memory and does no 
> copying:
> 
> double* c_copy( double* inArr, int arrLen ) {
>   double* outArr = malloc( arrLen * sizeof( double ) );
> 
>   return outArr;
> }
> 
> then all is well - all runs take similar amount of time. I also noticed that sometimes in my demo 
> code all runs take about 45us, but this does not seem to happen in my real code - first run is 
> always shorter.
> 
> Does anyone have an idea what is going on?
> 
(Continue reading)

Janek S. | 23 Nov 19:42 2012
Picon

Re: Problem with benchmarking FFI calls with Criterion

> What happens if you do the benchmark without unsafePerformIO involved?
I removed unsafePerformIO, changed copy to have type Vector Double -> IO (Vector Double) and 
modified benchmarks like this:

bench "C binding" $ whnfIO (copy signal)

I see no difference - one benchmark runs fast, remaining ones run slow.

Janek

>
> Excerpts from Janek S.'s message of Fri Nov 23 10:44:15 -0500 2012:
> > I am using Criterion library to benchmark C code called via FFI bindings
> > and I've ran into a problem that looks like a bug.
> >
> > The first benchmark that uses FFI runs correctly, but subsequent
> > benchmarks run much longer. I created demo code (about 50 lines,
> > available at github: https://gist.github.com/4135698 ) in which C
> > function copies a vector of doubles. I benchmark that function a couple
> > of times. First run results in avarage time of about 17us, subsequent
> > runs take about 45us. In my real code additional time was about 15us and
> > it seemed to be a constant factor, not relative to "correct" run time.
> > The surprising thing is that if my C function only allocates memory and
> > does no copying:
> >
> > double* c_copy( double* inArr, int arrLen ) {
> >   double* outArr = malloc( arrLen * sizeof( double ) );
> >
> >   return outArr;
> > }
(Continue reading)

Edward Z. Yang | 23 Nov 20:28 2012
Picon

Re: Problem with benchmarking FFI calls with Criterion

Running the sample code on GHC 7.4.2, I don't see the "one
fast, rest slow" behavior.  What version of GHC are you running?

Edward

Excerpts from Janek S.'s message of Fri Nov 23 13:42:03 -0500 2012:
> > What happens if you do the benchmark without unsafePerformIO involved?
> I removed unsafePerformIO, changed copy to have type Vector Double -> IO (Vector Double) and 
> modified benchmarks like this:
> 
> bench "C binding" $ whnfIO (copy signal)
> 
> I see no difference - one benchmark runs fast, remaining ones run slow.
> 
> Janek
> 
> >
> > Excerpts from Janek S.'s message of Fri Nov 23 10:44:15 -0500 2012:
> > > I am using Criterion library to benchmark C code called via FFI bindings
> > > and I've ran into a problem that looks like a bug.
> > >
> > > The first benchmark that uses FFI runs correctly, but subsequent
> > > benchmarks run much longer. I created demo code (about 50 lines,
> > > available at github: https://gist.github.com/4135698 ) in which C
> > > function copies a vector of doubles. I benchmark that function a couple
> > > of times. First run results in avarage time of about 17us, subsequent
> > > runs take about 45us. In my real code additional time was about 15us and
> > > it seemed to be a constant factor, not relative to "correct" run time.
> > > The surprising thing is that if my C function only allocates memory and
> > > does no copying:
(Continue reading)

Janek S. | 24 Nov 09:01 2012
Picon

Re: Problem with benchmarking FFI calls with Criterion

I'm using GHC 7.4.2 on x86_64 openSUSE Linux, kernel 2.6.37.6. 

Janek

Dnia piątek, 23 listopada 2012, Edward Z. Yang napisał:
> Running the sample code on GHC 7.4.2, I don't see the "one
> fast, rest slow" behavior.  What version of GHC are you running?
>
> Edward
>
> Excerpts from Janek S.'s message of Fri Nov 23 13:42:03 -0500 2012:
> > > What happens if you do the benchmark without unsafePerformIO involved?
> >
> > I removed unsafePerformIO, changed copy to have type Vector Double -> IO
> > (Vector Double) and modified benchmarks like this:
> >
> > bench "C binding" $ whnfIO (copy signal)
> >
> > I see no difference - one benchmark runs fast, remaining ones run slow.
> >
> > Janek
> >
> > > Excerpts from Janek S.'s message of Fri Nov 23 10:44:15 -0500 2012:
> > > > I am using Criterion library to benchmark C code called via FFI
> > > > bindings and I've ran into a problem that looks like a bug.
> > > >
> > > > The first benchmark that uses FFI runs correctly, but subsequent
> > > > benchmarks run much longer. I created demo code (about 50 lines,
> > > > available at github: https://gist.github.com/4135698 ) in which C
> > > > function copies a vector of doubles. I benchmark that function a
(Continue reading)

Branimir Maksimovic | 24 Nov 19:09 2012
Picon

Re: Problem with benchmarking FFI calls with Criterion


I don't see such behavior neither.
ubuntu 12.10, ghc 7.4.2.

Perhaps this has to do with how malloc allocates /cache
behavior. If you try not to allocate array rather use existing 
one perhaps there would be no inconsistency?
It looks to me that's about CPU cache performance.

Branimir

>
> I'm using GHC 7.4.2 on x86_64 openSUSE Linux, kernel 2.6.37.6.
>
> Janek
>
> Dnia piątek, 23 listopada 2012, Edward Z. Yang napisał:
> > Running the sample code on GHC 7.4.2, I don't see the "one
> > fast, rest slow" behavior. What version of GHC are you running?
> >
> > Edward
> >
> > Excerpts from Janek S.'s message of Fri Nov 23 13:42:03 -0500 2012:
> > > > What happens if you do the benchmark without unsafePerformIO involved?
> > >
> > > I removed unsafePerformIO, changed copy to have type Vector Double -> IO
> > > (Vector Double) and modified benchmarks like this:
> > >
> > > bench "C binding" $ whnfIO (copy signal)
> > >
> > > I see no difference - one benchmark runs fast, remaining ones run slow.
> > >
> > > Janek
> > >
> > > > Excerpts from Janek S.'s message of Fri Nov 23 10:44:15 -0500 2012:
> > > > > I am using Criterion library to benchmark C code called via FFI
> > > > > bindings and I've ran into a problem that looks like a bug.
> > > > >
> > > > > The first benchmark that uses FFI runs correctly, but subsequent
> > > > > benchmarks run much longer. I created demo code (about 50 lines,
> > > > > available at github: https://gist.github.com/4135698 ) in which C
> > > > > function copies a vector of doubles. I benchmark that function a
> > > > > couple of times. First run results in avarage time of about 17us,
> > > > > subsequent runs take about 45us. In my real code additional time was
> > > > > about 15us and it seemed to be a constant factor, not relative to
> > > > > "correct" run time. The surprising thing is that if my C function
> > > > > only allocates memory and does no copying:
> > > > >
> > > > > double* c_copy( double* inArr, int arrLen ) {
> > > > > double* outArr = malloc( arrLen * sizeof( double ) );
> > > > >
> > > > > return outArr;
> > > > > }
> > > > >
> > > > > then all is well - all runs take similar amount of time. I also
> > > > > noticed that sometimes in my demo code all runs take about 45us, but
> > > > > this does not seem to happen in my real code - first run is always
> > > > > shorter.
> > > > >
> > > > > Does anyone have an idea what is going on?
> > > > >
> > > > > Janek
>
>
>
> _______________________________________________
> Haskell-Cafe mailing list
> Haskell-Cafe <at> haskell.org
> http://www.haskell.org/mailman/listinfo/haskell-cafe
_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe <at> haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe
Janek S. | 25 Nov 10:38 2012
Picon

Re: Problem with benchmarking FFI calls with Criterion

Well, it seems that this only happens on my machine. I will try to test this code on different 
computer and see if I can reproduce it.

I don't think using existing vector is a good idea - it would make the code impure.

Janek

Dnia sobota, 24 listopada 2012, Branimir Maksimovic napisał:
> I don't see such behavior neither.ubuntu 12.10, ghc 7.4.2.
> Perhaps this has to do with how malloc allocates /cachebehavior. If you try
> not to allocate array rather use existing one perhaps there would be no
> inconsistency?It looks to me that's about CPU cache performance. Branimir
>
> > I'm using GHC 7.4.2 on x86_64 openSUSE Linux, kernel 2.6.37.6.
> >
> > Janek
> >
> > Dnia piątek, 23 listopada 2012, Edward Z. Yang napisał:
> > > Running the sample code on GHC 7.4.2, I don't see the "one
> > > fast, rest slow" behavior.  What version of GHC are you running?
> > >
> > > Edward
> > >
> > > Excerpts from Janek S.'s message of Fri Nov 23 13:42:03 -0500 2012:
> > > > > What happens if you do the benchmark without unsafePerformIO
> > > > > involved?
> > > >
> > > > I removed unsafePerformIO, changed copy to have type Vector Double ->
> > > > IO (Vector Double) and modified benchmarks like this:
> > > >
> > > > bench "C binding" $ whnfIO (copy signal)
> > > >
> > > > I see no difference - one benchmark runs fast, remaining ones run
> > > > slow.
> > > >
> > > > Janek
> > > >
> > > > > Excerpts from Janek S.'s message of Fri Nov 23 10:44:15 -0500 2012:
> > > > > > I am using Criterion library to benchmark C code called via FFI
> > > > > > bindings and I've ran into a problem that looks like a bug.
> > > > > >
> > > > > > The first benchmark that uses FFI runs correctly, but subsequent
> > > > > > benchmarks run much longer. I created demo code (about 50 lines,
> > > > > > available at github: https://gist.github.com/4135698 ) in which C
> > > > > > function copies a vector of doubles. I benchmark that function a
> > > > > > couple of times. First run results in avarage time of about 17us,
> > > > > > subsequent runs take about 45us. In my real code additional time
> > > > > > was about 15us and it seemed to be a constant factor, not
> > > > > > relative to "correct" run time. The surprising thing is that if
> > > > > > my C function only allocates memory and does no copying:
> > > > > >
> > > > > > double* c_copy( double* inArr, int arrLen ) {
> > > > > >   double* outArr = malloc( arrLen * sizeof( double ) );
> > > > > >
> > > > > >   return outArr;
> > > > > > }
> > > > > >
> > > > > > then all is well - all runs take similar amount of time. I also
> > > > > > noticed that sometimes in my demo code all runs take about 45us,
> > > > > > but this does not seem to happen in my real code - first run is
> > > > > > always shorter.
> > > > > >
> > > > > > Does anyone have an idea what is going on?
> > > > > >
> > > > > > Janek
> >
> > _______________________________________________
> > Haskell-Cafe mailing list
> > Haskell-Cafe <at> haskell.org
> > http://www.haskell.org/mailman/listinfo/haskell-cafe

_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe <at> haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe
Janek S. | 27 Nov 13:21 2012
Picon

Re: Problem with benchmarking FFI calls with Criterion

I tested the same code on my second machine - Debian Squeeze (kernel 2.6.32) with GHC 7.4.1 - and 
the results are extremely surprising. At first I was unable to reproduce the problem and got 
consistent runtimes of about 107us:

benchmarking FFI/C binding
mean: 107.3837 us, lb 107.2013 us, ub 107.5862 us, ci 0.950
std dev: 983.6046 ns, lb 822.6750 ns, ub 1.292724 us, ci 0.950

benchmarking FFI/C binding
mean: 108.1152 us, lb 107.9457 us, ub 108.3052 us, ci 0.950
std dev: 916.2469 ns, lb 793.1004 ns, ub 1.122127 us, ci 0.950

I started experimenting with the vector size and after bumping its size to 32K elements I started 
getting this:

benchmarking FFI/C binding
mean: 38.50100 us, lb 36.71525 us, ub 46.87665 us, ci 0.950
std dev: 16.93131 us, lb 1.033678 us, ub 40.23900 us, ci 0.950
found 6 outliers among 100 samples (6.0%)
  3 (3.0%) low mild
  3 (3.0%) high severe
variance introduced by outliers: 98.921%
variance is severely inflated by outliers

benchmarking FFI/C binding
mean: 209.9733 us, lb 209.5316 us, ub 210.4680 us, ci 0.950
std dev: 2.401398 us, lb 2.052981 us, ub 2.889688 us, ci 0.950

First result is always about 39us (2,5 faster, despite longer signal!) while the remaining 
benchmarks take almost two times longer.

Janek

Dnia niedziela, 25 listopada 2012, Janek S. napisał:
> Well, it seems that this only happens on my machine. I will try to test
> this code on different computer and see if I can reproduce it.
>
> I don't think using existing vector is a good idea - it would make the code
> impure.
>
> Janek
>
> Dnia sobota, 24 listopada 2012, Branimir Maksimovic napisał:
> > I don't see such behavior neither.ubuntu 12.10, ghc 7.4.2.
> > Perhaps this has to do with how malloc allocates /cachebehavior. If you
> > try not to allocate array rather use existing one perhaps there would be
> > no inconsistency?It looks to me that's about CPU cache performance.
> > Branimir
> >
> > > I'm using GHC 7.4.2 on x86_64 openSUSE Linux, kernel 2.6.37.6.
> > >
> > > Janek
> > >
> > > Dnia piątek, 23 listopada 2012, Edward Z. Yang napisał:
> > > > Running the sample code on GHC 7.4.2, I don't see the "one
> > > > fast, rest slow" behavior.  What version of GHC are you running?
> > > >
> > > > Edward
> > > >
> > > > Excerpts from Janek S.'s message of Fri Nov 23 13:42:03 -0500 2012:
> > > > > > What happens if you do the benchmark without unsafePerformIO
> > > > > > involved?
> > > > >
> > > > > I removed unsafePerformIO, changed copy to have type Vector Double
> > > > > -> IO (Vector Double) and modified benchmarks like this:
> > > > >
> > > > > bench "C binding" $ whnfIO (copy signal)
> > > > >
> > > > > I see no difference - one benchmark runs fast, remaining ones run
> > > > > slow.
> > > > >
> > > > > Janek
> > > > >
> > > > > > Excerpts from Janek S.'s message of Fri Nov 23 10:44:15 -0500 2012:
> > > > > > > I am using Criterion library to benchmark C code called via FFI
> > > > > > > bindings and I've ran into a problem that looks like a bug.
> > > > > > >
> > > > > > > The first benchmark that uses FFI runs correctly, but
> > > > > > > subsequent benchmarks run much longer. I created demo code
> > > > > > > (about 50 lines, available at github:
> > > > > > > https://gist.github.com/4135698 ) in which C function copies a
> > > > > > > vector of doubles. I benchmark that function a couple of times.
> > > > > > > First run results in avarage time of about 17us, subsequent
> > > > > > > runs take about 45us. In my real code additional time was about
> > > > > > > 15us and it seemed to be a constant factor, not relative to
> > > > > > > "correct" run time. The surprising thing is that if my C
> > > > > > > function only allocates memory and does no copying:
> > > > > > >
> > > > > > > double* c_copy( double* inArr, int arrLen ) {
> > > > > > >   double* outArr = malloc( arrLen * sizeof( double ) );
> > > > > > >
> > > > > > >   return outArr;
> > > > > > > }
> > > > > > >
> > > > > > > then all is well - all runs take similar amount of time. I also
> > > > > > > noticed that sometimes in my demo code all runs take about
> > > > > > > 45us, but this does not seem to happen in my real code - first
> > > > > > > run is always shorter.
> > > > > > >
> > > > > > > Does anyone have an idea what is going on?
> > > > > > >
> > > > > > > Janek
> > >
> > > _______________________________________________
> > > Haskell-Cafe mailing list
> > > Haskell-Cafe <at> haskell.org
> > > http://www.haskell.org/mailman/listinfo/haskell-cafe
>
> _______________________________________________
> Haskell-Cafe mailing list
> Haskell-Cafe <at> haskell.org
> http://www.haskell.org/mailman/listinfo/haskell-cafe

_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe <at> haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe
Jake McArthur | 27 Nov 14:34 2012
Picon

Re: Problem with benchmarking FFI calls with Criterion

I once had a problem like this. It turned out that my laptop was stepping the cpu clock rate down whenever it got warm. Disabling that feature in my BIOS fixed it. Your problem might be similar.

On Nov 27, 2012 7:23 AM, "Janek S." <fremenzone <at> poczta.onet.pl> wrote:
I tested the same code on my second machine - Debian Squeeze (kernel 2.6.32) with GHC 7.4.1 - and
the results are extremely surprising. At first I was unable to reproduce the problem and got
consistent runtimes of about 107us:

benchmarking FFI/C binding
mean: 107.3837 us, lb 107.2013 us, ub 107.5862 us, ci 0.950
std dev: 983.6046 ns, lb 822.6750 ns, ub 1.292724 us, ci 0.950

benchmarking FFI/C binding
mean: 108.1152 us, lb 107.9457 us, ub 108.3052 us, ci 0.950
std dev: 916.2469 ns, lb 793.1004 ns, ub 1.122127 us, ci 0.950

I started experimenting with the vector size and after bumping its size to 32K elements I started
getting this:

benchmarking FFI/C binding
mean: 38.50100 us, lb 36.71525 us, ub 46.87665 us, ci 0.950
std dev: 16.93131 us, lb 1.033678 us, ub 40.23900 us, ci 0.950
found 6 outliers among 100 samples (6.0%)
  3 (3.0%) low mild
  3 (3.0%) high severe
variance introduced by outliers: 98.921%
variance is severely inflated by outliers

benchmarking FFI/C binding
mean: 209.9733 us, lb 209.5316 us, ub 210.4680 us, ci 0.950
std dev: 2.401398 us, lb 2.052981 us, ub 2.889688 us, ci 0.950

First result is always about 39us (2,5 faster, despite longer signal!) while the remaining
benchmarks take almost two times longer.

Janek


Dnia niedziela, 25 listopada 2012, Janek S. napisał:
> Well, it seems that this only happens on my machine. I will try to test
> this code on different computer and see if I can reproduce it.
>
> I don't think using existing vector is a good idea - it would make the code
> impure.
>
> Janek
>
> Dnia sobota, 24 listopada 2012, Branimir Maksimovic napisał:
> > I don't see such behavior neither.ubuntu 12.10, ghc 7.4.2.
> > Perhaps this has to do with how malloc allocates /cachebehavior. If you
> > try not to allocate array rather use existing one perhaps there would be
> > no inconsistency?It looks to me that's about CPU cache performance.
> > Branimir
> >
> > > I'm using GHC 7.4.2 on x86_64 openSUSE Linux, kernel 2.6.37.6.
> > >
> > > Janek
> > >
> > > Dnia piątek, 23 listopada 2012, Edward Z. Yang napisał:
> > > > Running the sample code on GHC 7.4.2, I don't see the "one
> > > > fast, rest slow" behavior.  What version of GHC are you running?
> > > >
> > > > Edward
> > > >
> > > > Excerpts from Janek S.'s message of Fri Nov 23 13:42:03 -0500 2012:
> > > > > > What happens if you do the benchmark without unsafePerformIO
> > > > > > involved?
> > > > >
> > > > > I removed unsafePerformIO, changed copy to have type Vector Double
> > > > > -> IO (Vector Double) and modified benchmarks like this:
> > > > >
> > > > > bench "C binding" $ whnfIO (copy signal)
> > > > >
> > > > > I see no difference - one benchmark runs fast, remaining ones run
> > > > > slow.
> > > > >
> > > > > Janek
> > > > >
> > > > > > Excerpts from Janek S.'s message of Fri Nov 23 10:44:15 -0500 2012:
> > > > > > > I am using Criterion library to benchmark C code called via FFI
> > > > > > > bindings and I've ran into a problem that looks like a bug.
> > > > > > >
> > > > > > > The first benchmark that uses FFI runs correctly, but
> > > > > > > subsequent benchmarks run much longer. I created demo code
> > > > > > > (about 50 lines, available at github:
> > > > > > > https://gist.github.com/4135698 ) in which C function copies a
> > > > > > > vector of doubles. I benchmark that function a couple of times.
> > > > > > > First run results in avarage time of about 17us, subsequent
> > > > > > > runs take about 45us. In my real code additional time was about
> > > > > > > 15us and it seemed to be a constant factor, not relative to
> > > > > > > "correct" run time. The surprising thing is that if my C
> > > > > > > function only allocates memory and does no copying:
> > > > > > >
> > > > > > > double* c_copy( double* inArr, int arrLen ) {
> > > > > > >   double* outArr = malloc( arrLen * sizeof( double ) );
> > > > > > >
> > > > > > >   return outArr;
> > > > > > > }
> > > > > > >
> > > > > > > then all is well - all runs take similar amount of time. I also
> > > > > > > noticed that sometimes in my demo code all runs take about
> > > > > > > 45us, but this does not seem to happen in my real code - first
> > > > > > > run is always shorter.
> > > > > > >
> > > > > > > Does anyone have an idea what is going on?
> > > > > > >
> > > > > > > Janek
> > >
> > > _______________________________________________
> > > Haskell-Cafe mailing list
> > > Haskell-Cafe <at> haskell.org
> > > http://www.haskell.org/mailman/listinfo/haskell-cafe
>
> _______________________________________________
> Haskell-Cafe mailing list
> Haskell-Cafe <at> haskell.org
> http://www.haskell.org/mailman/listinfo/haskell-cafe



_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe <at> haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe
_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe <at> haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe
Janek S. | 27 Nov 14:41 2012
Picon

Re: Problem with benchmarking FFI calls with Criterion

Dnia wtorek, 27 listopada 2012, Jake McArthur napisał:
> I once had a problem like this. It turned out that my laptop was stepping
> the cpu clock rate down whenever it got warm. Disabling that feature in my
> BIOS fixed it. Your problem might be similar.
I just check - I disabled frequency scaling and results are the same. Actually I doubt that 39us 
of benchmarking would cause CPU overheating with such repeatibility. Besides, this wouldn't 
explain why the first benchmark actually got faster.

Janek

>
> On Nov 27, 2012 7:23 AM, "Janek S." <fremenzone <at> poczta.onet.pl> wrote:
> > I tested the same code on my second machine - Debian Squeeze (kernel
> > 2.6.32) with GHC 7.4.1 - and
> > the results are extremely surprising. At first I was unable to reproduce
> > the problem and got
> > consistent runtimes of about 107us:
> >
> > benchmarking FFI/C binding
> > mean: 107.3837 us, lb 107.2013 us, ub 107.5862 us, ci 0.950
> > std dev: 983.6046 ns, lb 822.6750 ns, ub 1.292724 us, ci 0.950
> >
> > benchmarking FFI/C binding
> > mean: 108.1152 us, lb 107.9457 us, ub 108.3052 us, ci 0.950
> > std dev: 916.2469 ns, lb 793.1004 ns, ub 1.122127 us, ci 0.950
> >
> > I started experimenting with the vector size and after bumping its size
> > to 32K elements I started
> > getting this:
> >
> > benchmarking FFI/C binding
> > mean: 38.50100 us, lb 36.71525 us, ub 46.87665 us, ci 0.950
> > std dev: 16.93131 us, lb 1.033678 us, ub 40.23900 us, ci 0.950
> > found 6 outliers among 100 samples (6.0%)
> >   3 (3.0%) low mild
> >   3 (3.0%) high severe
> > variance introduced by outliers: 98.921%
> > variance is severely inflated by outliers
> >
> > benchmarking FFI/C binding
> > mean: 209.9733 us, lb 209.5316 us, ub 210.4680 us, ci 0.950
> > std dev: 2.401398 us, lb 2.052981 us, ub 2.889688 us, ci 0.950
> >
> > First result is always about 39us (2,5 faster, despite longer signal!)
> > while the remaining
> > benchmarks take almost two times longer.
> >
> > Janek
> >
> > Dnia niedziela, 25 listopada 2012, Janek S. napisał:
> > > Well, it seems that this only happens on my machine. I will try to test
> > > this code on different computer and see if I can reproduce it.
> > >
> > > I don't think using existing vector is a good idea - it would make the
> >
> > code
> >
> > > impure.
> > >
> > > Janek
> > >
> > > Dnia sobota, 24 listopada 2012, Branimir Maksimovic napisał:
> > > > I don't see such behavior neither.ubuntu 12.10, ghc 7.4.2.
> > > > Perhaps this has to do with how malloc allocates /cachebehavior. If
> > > > you try not to allocate array rather use existing one perhaps there
> > > > would
> >
> > be
> >
> > > > no inconsistency?It looks to me that's about CPU cache performance.
> > > > Branimir
> > > >
> > > > > I'm using GHC 7.4.2 on x86_64 openSUSE Linux, kernel 2.6.37.6.
> > > > >
> > > > > Janek
> > > > >
> > > > > Dnia piątek, 23 listopada 2012, Edward Z. Yang napisał:
> > > > > > Running the sample code on GHC 7.4.2, I don't see the "one
> > > > > > fast, rest slow" behavior.  What version of GHC are you running?
> > > > > >
> > > > > > Edward
> > > > > >
> > > > > > Excerpts from Janek S.'s message of Fri Nov 23 13:42:03 -0500 2012:
> > > > > > > > What happens if you do the benchmark without unsafePerformIO
> > > > > > > > involved?
> > > > > > >
> > > > > > > I removed unsafePerformIO, changed copy to have type Vector
> >
> > Double
> >
> > > > > > > -> IO (Vector Double) and modified benchmarks like this:
> > > > > > >
> > > > > > > bench "C binding" $ whnfIO (copy signal)
> > > > > > >
> > > > > > > I see no difference - one benchmark runs fast, remaining ones
> > > > > > > run slow.
> > > > > > >
> > > > > > > Janek
> > > > > > >
> > > > > > > > Excerpts from Janek S.'s message of Fri Nov 23 10:44:15 -0500
> >
> > 2012:
> > > > > > > > > I am using Criterion library to benchmark C code called via
> >
> > FFI
> >
> > > > > > > > > bindings and I've ran into a problem that looks like a bug.
> > > > > > > > >
> > > > > > > > > The first benchmark that uses FFI runs correctly, but
> > > > > > > > > subsequent benchmarks run much longer. I created demo code
> > > > > > > > > (about 50 lines, available at github:
> > > > > > > > > https://gist.github.com/4135698 ) in which C function
> >
> > copies a
> >
> > > > > > > > > vector of doubles. I benchmark that function a couple of
> >
> > times.
> >
> > > > > > > > > First run results in avarage time of about 17us, subsequent
> > > > > > > > > runs take about 45us. In my real code additional time was
> >
> > about
> >
> > > > > > > > > 15us and it seemed to be a constant factor, not relative to
> > > > > > > > > "correct" run time. The surprising thing is that if my C
> > > > > > > > > function only allocates memory and does no copying:
> > > > > > > > >
> > > > > > > > > double* c_copy( double* inArr, int arrLen ) {
> > > > > > > > >   double* outArr = malloc( arrLen * sizeof( double ) );
> > > > > > > > >
> > > > > > > > >   return outArr;
> > > > > > > > > }
> > > > > > > > >
> > > > > > > > > then all is well - all runs take similar amount of time. I
> >
> > also
> >
> > > > > > > > > noticed that sometimes in my demo code all runs take about
> > > > > > > > > 45us, but this does not seem to happen in my real code -
> >
> > first
> >
> > > > > > > > > run is always shorter.
> > > > > > > > >
> > > > > > > > > Does anyone have an idea what is going on?
> > > > > > > > >
> > > > > > > > > Janek
> > > > >
> > > > > _______________________________________________
> > > > > Haskell-Cafe mailing list
> > > > > Haskell-Cafe <at> haskell.org
> > > > > http://www.haskell.org/mailman/listinfo/haskell-cafe
> > >
> > > _______________________________________________
> > > Haskell-Cafe mailing list
> > > Haskell-Cafe <at> haskell.org
> > > http://www.haskell.org/mailman/listinfo/haskell-cafe
> >
> > _______________________________________________
> > Haskell-Cafe mailing list
> > Haskell-Cafe <at> haskell.org
> > http://www.haskell.org/mailman/listinfo/haskell-cafe

_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe <at> haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe
Gregory Collins | 27 Nov 14:47 2012
Picon

Re: Problem with benchmarking FFI calls with Criterion

Did you pass the option to criterion asking it to do a GC between
trials? You might be measuring a GC pause.

On Tue, Nov 27, 2012 at 2:41 PM, Janek S. <fremenzone <at> poczta.onet.pl> wrote:
> Dnia wtorek, 27 listopada 2012, Jake McArthur napisał:
>> I once had a problem like this. It turned out that my laptop was stepping
>> the cpu clock rate down whenever it got warm. Disabling that feature in my
>> BIOS fixed it. Your problem might be similar.
> I just check - I disabled frequency scaling and results are the same. Actually I doubt that 39us
> of benchmarking would cause CPU overheating with such repeatibility. Besides, this wouldn't
> explain why the first benchmark actually got faster.
>
> Janek
>
>>
>> On Nov 27, 2012 7:23 AM, "Janek S." <fremenzone <at> poczta.onet.pl> wrote:
>> > I tested the same code on my second machine - Debian Squeeze (kernel
>> > 2.6.32) with GHC 7.4.1 - and
>> > the results are extremely surprising. At first I was unable to reproduce
>> > the problem and got
>> > consistent runtimes of about 107us:
>> >
>> > benchmarking FFI/C binding
>> > mean: 107.3837 us, lb 107.2013 us, ub 107.5862 us, ci 0.950
>> > std dev: 983.6046 ns, lb 822.6750 ns, ub 1.292724 us, ci 0.950
>> >
>> > benchmarking FFI/C binding
>> > mean: 108.1152 us, lb 107.9457 us, ub 108.3052 us, ci 0.950
>> > std dev: 916.2469 ns, lb 793.1004 ns, ub 1.122127 us, ci 0.950
>> >
>> > I started experimenting with the vector size and after bumping its size
>> > to 32K elements I started
>> > getting this:
>> >
>> > benchmarking FFI/C binding
>> > mean: 38.50100 us, lb 36.71525 us, ub 46.87665 us, ci 0.950
>> > std dev: 16.93131 us, lb 1.033678 us, ub 40.23900 us, ci 0.950
>> > found 6 outliers among 100 samples (6.0%)
>> >   3 (3.0%) low mild
>> >   3 (3.0%) high severe
>> > variance introduced by outliers: 98.921%
>> > variance is severely inflated by outliers
>> >
>> > benchmarking FFI/C binding
>> > mean: 209.9733 us, lb 209.5316 us, ub 210.4680 us, ci 0.950
>> > std dev: 2.401398 us, lb 2.052981 us, ub 2.889688 us, ci 0.950
>> >
>> > First result is always about 39us (2,5 faster, despite longer signal!)
>> > while the remaining
>> > benchmarks take almost two times longer.
>> >
>> > Janek
>> >
>> > Dnia niedziela, 25 listopada 2012, Janek S. napisał:
>> > > Well, it seems that this only happens on my machine. I will try to test
>> > > this code on different computer and see if I can reproduce it.
>> > >
>> > > I don't think using existing vector is a good idea - it would make the
>> >
>> > code
>> >
>> > > impure.
>> > >
>> > > Janek
>> > >
>> > > Dnia sobota, 24 listopada 2012, Branimir Maksimovic napisał:
>> > > > I don't see such behavior neither.ubuntu 12.10, ghc 7.4.2.
>> > > > Perhaps this has to do with how malloc allocates /cachebehavior. If
>> > > > you try not to allocate array rather use existing one perhaps there
>> > > > would
>> >
>> > be
>> >
>> > > > no inconsistency?It looks to me that's about CPU cache performance.
>> > > > Branimir
>> > > >
>> > > > > I'm using GHC 7.4.2 on x86_64 openSUSE Linux, kernel 2.6.37.6.
>> > > > >
>> > > > > Janek
>> > > > >
>> > > > > Dnia piątek, 23 listopada 2012, Edward Z. Yang napisał:
>> > > > > > Running the sample code on GHC 7.4.2, I don't see the "one
>> > > > > > fast, rest slow" behavior.  What version of GHC are you running?
>> > > > > >
>> > > > > > Edward
>> > > > > >
>> > > > > > Excerpts from Janek S.'s message of Fri Nov 23 13:42:03 -0500 2012:
>> > > > > > > > What happens if you do the benchmark without unsafePerformIO
>> > > > > > > > involved?
>> > > > > > >
>> > > > > > > I removed unsafePerformIO, changed copy to have type Vector
>> >
>> > Double
>> >
>> > > > > > > -> IO (Vector Double) and modified benchmarks like this:
>> > > > > > >
>> > > > > > > bench "C binding" $ whnfIO (copy signal)
>> > > > > > >
>> > > > > > > I see no difference - one benchmark runs fast, remaining ones
>> > > > > > > run slow.
>> > > > > > >
>> > > > > > > Janek
>> > > > > > >
>> > > > > > > > Excerpts from Janek S.'s message of Fri Nov 23 10:44:15 -0500
>> >
>> > 2012:
>> > > > > > > > > I am using Criterion library to benchmark C code called via
>> >
>> > FFI
>> >
>> > > > > > > > > bindings and I've ran into a problem that looks like a bug.
>> > > > > > > > >
>> > > > > > > > > The first benchmark that uses FFI runs correctly, but
>> > > > > > > > > subsequent benchmarks run much longer. I created demo code
>> > > > > > > > > (about 50 lines, available at github:
>> > > > > > > > > https://gist.github.com/4135698 ) in which C function
>> >
>> > copies a
>> >
>> > > > > > > > > vector of doubles. I benchmark that function a couple of
>> >
>> > times.
>> >
>> > > > > > > > > First run results in avarage time of about 17us, subsequent
>> > > > > > > > > runs take about 45us. In my real code additional time was
>> >
>> > about
>> >
>> > > > > > > > > 15us and it seemed to be a constant factor, not relative to
>> > > > > > > > > "correct" run time. The surprising thing is that if my C
>> > > > > > > > > function only allocates memory and does no copying:
>> > > > > > > > >
>> > > > > > > > > double* c_copy( double* inArr, int arrLen ) {
>> > > > > > > > >   double* outArr = malloc( arrLen * sizeof( double ) );
>> > > > > > > > >
>> > > > > > > > >   return outArr;
>> > > > > > > > > }
>> > > > > > > > >
>> > > > > > > > > then all is well - all runs take similar amount of time. I
>> >
>> > also
>> >
>> > > > > > > > > noticed that sometimes in my demo code all runs take about
>> > > > > > > > > 45us, but this does not seem to happen in my real code -
>> >
>> > first
>> >
>> > > > > > > > > run is always shorter.
>> > > > > > > > >
>> > > > > > > > > Does anyone have an idea what is going on?
>> > > > > > > > >
>> > > > > > > > > Janek
>> > > > >
>> > > > > _______________________________________________
>> > > > > Haskell-Cafe mailing list
>> > > > > Haskell-Cafe <at> haskell.org
>> > > > > http://www.haskell.org/mailman/listinfo/haskell-cafe
>> > >
>> > > _______________________________________________
>> > > Haskell-Cafe mailing list
>> > > Haskell-Cafe <at> haskell.org
>> > > http://www.haskell.org/mailman/listinfo/haskell-cafe
>> >
>> > _______________________________________________
>> > Haskell-Cafe mailing list
>> > Haskell-Cafe <at> haskell.org
>> > http://www.haskell.org/mailman/listinfo/haskell-cafe
>
>
>
> _______________________________________________
> Haskell-Cafe mailing list
> Haskell-Cafe <at> haskell.org
> http://www.haskell.org/mailman/listinfo/haskell-cafe

--

-- 
Gregory Collins <greg <at> gregorycollins.net>

_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe <at> haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe
Janek S. | 27 Nov 14:56 2012
Picon

Re: Problem with benchmarking FFI calls with Criterion

Dnia wtorek, 27 listopada 2012, Gregory Collins napisał:
> Did you pass the option to criterion asking it to do a GC between
> trials? 
Yes.

> You might be measuring a GC pause. 
>
> On Tue, Nov 27, 2012 at 2:41 PM, Janek S. <fremenzone <at> poczta.onet.pl> wrote:
> > Dnia wtorek, 27 listopada 2012, Jake McArthur napisał:
> >> I once had a problem like this. It turned out that my laptop was
> >> stepping the cpu clock rate down whenever it got warm. Disabling that
> >> feature in my BIOS fixed it. Your problem might be similar.
> >
> > I just check - I disabled frequency scaling and results are the same.
> > Actually I doubt that 39us of benchmarking would cause CPU overheating
> > with such repeatibility. Besides, this wouldn't explain why the first
> > benchmark actually got faster.
> >
> > Janek
> >
> >> On Nov 27, 2012 7:23 AM, "Janek S." <fremenzone <at> poczta.onet.pl> wrote:
> >> > I tested the same code on my second machine - Debian Squeeze (kernel
> >> > 2.6.32) with GHC 7.4.1 - and
> >> > the results are extremely surprising. At first I was unable to
> >> > reproduce the problem and got
> >> > consistent runtimes of about 107us:
> >> >
> >> > benchmarking FFI/C binding
> >> > mean: 107.3837 us, lb 107.2013 us, ub 107.5862 us, ci 0.950
> >> > std dev: 983.6046 ns, lb 822.6750 ns, ub 1.292724 us, ci 0.950
> >> >
> >> > benchmarking FFI/C binding
> >> > mean: 108.1152 us, lb 107.9457 us, ub 108.3052 us, ci 0.950
> >> > std dev: 916.2469 ns, lb 793.1004 ns, ub 1.122127 us, ci 0.950
> >> >
> >> > I started experimenting with the vector size and after bumping its
> >> > size to 32K elements I started
> >> > getting this:
> >> >
> >> > benchmarking FFI/C binding
> >> > mean: 38.50100 us, lb 36.71525 us, ub 46.87665 us, ci 0.950
> >> > std dev: 16.93131 us, lb 1.033678 us, ub 40.23900 us, ci 0.950
> >> > found 6 outliers among 100 samples (6.0%)
> >> >   3 (3.0%) low mild
> >> >   3 (3.0%) high severe
> >> > variance introduced by outliers: 98.921%
> >> > variance is severely inflated by outliers
> >> >
> >> > benchmarking FFI/C binding
> >> > mean: 209.9733 us, lb 209.5316 us, ub 210.4680 us, ci 0.950
> >> > std dev: 2.401398 us, lb 2.052981 us, ub 2.889688 us, ci 0.950
> >> >
> >> > First result is always about 39us (2,5 faster, despite longer signal!)
> >> > while the remaining
> >> > benchmarks take almost two times longer.
> >> >
> >> > Janek
> >> >
> >> > Dnia niedziela, 25 listopada 2012, Janek S. napisał:
> >> > > Well, it seems that this only happens on my machine. I will try to
> >> > > test this code on different computer and see if I can reproduce it.
> >> > >
> >> > > I don't think using existing vector is a good idea - it would make
> >> > > the
> >> >
> >> > code
> >> >
> >> > > impure.
> >> > >
> >> > > Janek
> >> > >
> >> > > Dnia sobota, 24 listopada 2012, Branimir Maksimovic napisał:
> >> > > > I don't see such behavior neither.ubuntu 12.10, ghc 7.4.2.
> >> > > > Perhaps this has to do with how malloc allocates /cachebehavior.
> >> > > > If you try not to allocate array rather use existing one perhaps
> >> > > > there would
> >> >
> >> > be
> >> >
> >> > > > no inconsistency?It looks to me that's about CPU cache
> >> > > > performance. Branimir
> >> > > >
> >> > > > > I'm using GHC 7.4.2 on x86_64 openSUSE Linux, kernel 2.6.37.6.
> >> > > > >
> >> > > > > Janek
> >> > > > >
> >> > > > > Dnia piątek, 23 listopada 2012, Edward Z. Yang napisał:
> >> > > > > > Running the sample code on GHC 7.4.2, I don't see the "one
> >> > > > > > fast, rest slow" behavior.  What version of GHC are you
> >> > > > > > running?
> >> > > > > >
> >> > > > > > Edward
> >> > > > > >
> >> > > > > > Excerpts from Janek S.'s message of Fri Nov 23 13:42:03 -0500 2012:
> >> > > > > > > > What happens if you do the benchmark without
> >> > > > > > > > unsafePerformIO involved?
> >> > > > > > >
> >> > > > > > > I removed unsafePerformIO, changed copy to have type Vector
> >> >
> >> > Double
> >> >
> >> > > > > > > -> IO (Vector Double) and modified benchmarks like this:
> >> > > > > > >
> >> > > > > > > bench "C binding" $ whnfIO (copy signal)
> >> > > > > > >
> >> > > > > > > I see no difference - one benchmark runs fast, remaining
> >> > > > > > > ones run slow.
> >> > > > > > >
> >> > > > > > > Janek
> >> > > > > > >
> >> > > > > > > > Excerpts from Janek S.'s message of Fri Nov 23 10:44:15
> >> > > > > > > > -0500
> >> >
> >> > 2012:
> >> > > > > > > > > I am using Criterion library to benchmark C code called
> >> > > > > > > > > via
> >> >
> >> > FFI
> >> >
> >> > > > > > > > > bindings and I've ran into a problem that looks like a
> >> > > > > > > > > bug.
> >> > > > > > > > >
> >> > > > > > > > > The first benchmark that uses FFI runs correctly, but
> >> > > > > > > > > subsequent benchmarks run much longer. I created demo
> >> > > > > > > > > code (about 50 lines, available at github:
> >> > > > > > > > > https://gist.github.com/4135698 ) in which C function
> >> >
> >> > copies a
> >> >
> >> > > > > > > > > vector of doubles. I benchmark that function a couple of
> >> >
> >> > times.
> >> >
> >> > > > > > > > > First run results in avarage time of about 17us,
> >> > > > > > > > > subsequent runs take about 45us. In my real code
> >> > > > > > > > > additional time was
> >> >
> >> > about
> >> >
> >> > > > > > > > > 15us and it seemed to be a constant factor, not relative
> >> > > > > > > > > to "correct" run time. The surprising thing is that if
> >> > > > > > > > > my C function only allocates memory and does no copying:
> >> > > > > > > > >
> >> > > > > > > > > double* c_copy( double* inArr, int arrLen ) {
> >> > > > > > > > >   double* outArr = malloc( arrLen * sizeof( double ) );
> >> > > > > > > > >
> >> > > > > > > > >   return outArr;
> >> > > > > > > > > }
> >> > > > > > > > >
> >> > > > > > > > > then all is well - all runs take similar amount of time.
> >> > > > > > > > > I
> >> >
> >> > also
> >> >
> >> > > > > > > > > noticed that sometimes in my demo code all runs take
> >> > > > > > > > > about 45us, but this does not seem to happen in my real
> >> > > > > > > > > code -
> >> >
> >> > first
> >> >
> >> > > > > > > > > run is always shorter.
> >> > > > > > > > >
> >> > > > > > > > > Does anyone have an idea what is going on?
> >> > > > > > > > >
> >> > > > > > > > > Janek
> >> > > > >
> >> > > > > _______________________________________________
> >> > > > > Haskell-Cafe mailing list
> >> > > > > Haskell-Cafe <at> haskell.org
> >> > > > > http://www.haskell.org/mailman/listinfo/haskell-cafe
> >> > >
> >> > > _______________________________________________
> >> > > Haskell-Cafe mailing list
> >> > > Haskell-Cafe <at> haskell.org
> >> > > http://www.haskell.org/mailman/listinfo/haskell-cafe
> >> >
> >> > _______________________________________________
> >> > Haskell-Cafe mailing list
> >> > Haskell-Cafe <at> haskell.org
> >> > http://www.haskell.org/mailman/listinfo/haskell-cafe
> >
> > _______________________________________________
> > Haskell-Cafe mailing list
> > Haskell-Cafe <at> haskell.org
> > http://www.haskell.org/mailman/listinfo/haskell-cafe

_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe <at> haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Gmane