Takayuki Muranushi | 5 Aug 16:24 2012

Benchmark of DFT libraries in Haskell

Dear everyone, I'm always grateful for your help.

I have been assigned a complicated and growing task in which I'll
perform a lot of discrete Fourier transforms, so I have measured the
performance of several DFT libraries in Haskell:
http://en.pk.paraiso-lang.org/Hackage/what-is-the-fastest-dft-in-haskell/main

The raw result: http://paraiso-lang.org/html/bench-dft-in-haskell.html

I'm sharing the result in the hope that some of you will also find it
useful. Also, please let me know of any possible flaws or
improvements in the benchmark process!

My observations are as follows:

* vector-fftw with wisdom was more than 1/2 times faster than fftw in
C with wisdom (and with communication overhead.)
* vector-fftw without wisdom was significantly _faster_ than fftw in C
without wisdom. I wonder why.
* vector-fftw over vector was faster than fft over CArray.
* any library that doesn't use fftw is much slower than those that do.

Best,
-- 
Takayuki MURANUSHI
The Hakubi Center for Advanced Research, Kyoto University
http://www.hakubi.kyoto-u.ac.jp/02_mem/h22/muranushi.html
Ertugrul Söylemez | 6 Aug 03:37 2012

Re: Benchmark of DFT libraries in Haskell

Takayuki Muranushi <muranushi <at> gmail.com> wrote:

> * vector-fftw with wisdom was more than 1/2 times faster than fftw in
> C with wisdom (and with communication overhead.)
> * vector-fftw without wisdom was significantly _faster_ than fftw in C
> without wisdom. I wonder why.
> * vector-fftw over vector was faster than fft over CArray.
> * any library that doesn't use fftw is much slower than those that
> does.

I have no experience with FFTW, but in general a result like this often
means that the values themselves were never actually computed.
One easy way to rule this out is to print the whole result.  If
printing takes too much CPU time for a fair comparison, you need to
force the result deeply, for example with deepseq.

Notably, Data.Vector (the boxed variant) is lazy in its elements.  If you
force the vector itself, you are not forcing the individual values.  For
FFT I would assume that the length of the resulting vector does not depend
on any values, so checking only the length would not trigger the
computation.
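
To illustrate the difference between shallow and deep forcing, here is a
minimal sketch.  It uses a plain list rather than Data.Vector so that it
needs only base and deepseq; the names lazyResult and hitsBomb are made up
for this example, and the "bomb" element stands in for a value that a
benchmark might silently never compute.

```haskell
import Control.DeepSeq (force)
import Control.Exception (ErrorCall, evaluate, try)

-- A list whose spine is defined but whose second element is a bomb.
-- It stands in for a lazily computed transform result (an assumption
-- made for illustration, not how any FFT binding actually behaves).
lazyResult :: [Double]
lazyResult = [1.0, error "element was never computed", 3.0]

-- True if running the action raised an ErrorCall, i.e. it hit the bomb.
hitsBomb :: IO a -> IO Bool
hitsBomb act = do
  r <- try act
  pure $ case r of
    Left e  -> const True (e :: ErrorCall)
    Right _ -> False

main :: IO ()
main = do
  -- Weak head normal form: only the first cons cell is evaluated.
  shallow <- hitsBomb (evaluate lazyResult)
  -- length walks the spine but never looks at the elements.
  spine   <- hitsBomb (evaluate (length lazyResult))
  -- force (from deepseq) evaluates every element.
  deep    <- hitsBomb (evaluate (force lazyResult))
  print (shallow, spine, deep)
```

Only the last variant touches the elements, so a benchmark that merely
evaluates the result to WHNF, or takes its length, could be timing almost
nothing.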

Greets,
Ertugrul

-- 
Not to be or to be and (not to be or to be and (not to be or to be and
(not to be or to be and ... that is the list monad.

Scott Michel | 6 Aug 03:59 2012

Re: Benchmark of DFT libraries in Haskell

Ertugrul:

I might be missing something in translation, but if I understand the intent of Takayuki's message, everything needs to be calculated because the C-based FFTW library is called (eventually). Laziness doesn't really have an impact.

The choice of underlying data structure and whether FFTW wisdom is used clearly have a significant impact.

FFTW and Intel's MKL are the acknowledged "state of the art" libraries for performing discrete Fourier transforms. I'm not sure there's anything better or faster for CPU implementations. (I know there's an O(1) implementation for map-reduce systems, and NVIDIA's CUDA-FFT; note that the map-reduce approach has a preprocessing step that isn't O(1).) It is interesting to note that much of the code for FFTW was initially generated using OCaml to find optimal versions of code for particular problem sizes.


-scooter

_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe <at> haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe
Ertugrul Söylemez | 6 Aug 09:34 2012

Re: Benchmark of DFT libraries in Haskell

Scott Michel <scooter.phd <at> gmail.com> wrote:

> I might be missing something in translation, but if I understand
> Takayuki's message's intent, everything needs to be calculated because
> the C-based FFTW library is called (eventually). Laziness doesn't
> really have an impact.
>
> The choice of underlying data structure and whether FFTW wisdom is
> used clearly has a significant impact.

If the Haskell wrapper library is a thick enough, lazy layer around
FFTW, the size of the result vector may not depend on any FFTW
computation at all.

Again, I have no experience at all with FFTW or any Haskell bindings to
it.  This is just a general remark that is worth keeping in mind.

Greets,
Ertugrul

-- 
Key-ID: E5DD8D11 "Ertugrul Soeylemez <es <at> ertes.de>"
FPrint: BD28 3E3F BE63 BADD 4157  9134 D56A 37FA E5DD 8D11
Keysrv: hkp://subkeys.pgp.net/
Erik de Castro Lopo | 6 Aug 10:27 2012

Re: Benchmark of DFT libraries in Haskell

Takayuki Muranushi wrote:

> * vector-fftw with wisdom was more than 1/2 times faster than fftw in
> C with wisdom (and with communication overhead.)

I would be suspicious of that result. Calling a C function from a library
should be slower from Haskell than from C.

Erik
-- 
----------------------------------------------------------------------
Erik de Castro Lopo
http://www.mega-nerd.com/
Takayuki Muranushi | 7 Aug 03:52 2012

Re: Benchmark of DFT libraries in Haskell

Dear Ertugrul, Scott and Erik, thank you for your comments.

With respect to laziness, I make the solvers calculate the amplitude of
the final FFT results (i.e. compute the squared magnitude of the array
elements and sum over them), compare the result with the expected
value, and cause side effects depending on the outcome of the test. This
should cause the FFT chain to be fully evaluated.
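
A check of this kind could look roughly like the sketch below.  It is not
the benchmark's actual code: naiveDft is a slow stand-in for a library
call, power is a hypothetical helper name, and the expected value comes
from Parseval's theorem for an unnormalised DFT (sum of |X_k|^2 equals
n times sum of |x_j|^2).

```haskell
import Data.Complex (Complex (..), cis, magnitude)

-- Naive O(n^2) DFT, used here only as a stand-in for a library call.
naiveDft :: [Complex Double] -> [Complex Double]
naiveDft xs =
  [ sum [ x * cis (-2 * pi * fromIntegral (j * k) / fromIntegral n)
        | (j, x) <- zip [0 :: Int ..] xs ]
  | k <- [0 .. n - 1] ]
  where
    n = length xs

-- Sum of squared magnitudes; computing it demands every element.
power :: [Complex Double] -> Double
power = sum . map (\z -> magnitude z ^ (2 :: Int))

main :: IO ()
main = do
  let input    = [fromIntegral i :+ 0 | i <- [1 .. 8 :: Int]]
      output   = naiveDft input
      -- Parseval: for an unnormalised DFT, power output == n * power input.
      expected = fromIntegral (length input) * power input
      ok       = abs (power output - expected) < 1e-6 * expected
  -- The printed line depends on every output element being computed.
  putStrLn (if ok then "checksum ok" else "checksum MISMATCH")
```

Because the side effect depends on the sum, and the sum depends on every
element, the transform cannot be skipped by lazy evaluation.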

>> * vector-fftw with wisdom was more than 1/2 times faster than fftw in
>> C with wisdom (and with communication overhead.)

> I would be suspicious of that result. Calling a C function from a library
> should be slower from Haskell than from C.

Sorry for the confusion. What I meant is that the vector-fftw version takes
more time than the C version, but less than twice as much. Please compare
the two lines

* "fft/cpp 1 1048576 102"
* "fft/vector-fftw 0 1048576 102"

in http://paraiso-lang.org/html/bench-dft-in-haskell.html .

P.S. including GPU contestants would be interesting!

Best,

-- 
Takayuki MURANUSHI
The Hakubi Center for Advanced Research, Kyoto University
http://www.hakubi.kyoto-u.ac.jp/02_mem/h22/muranushi.html
Erik de Castro Lopo | 7 Aug 11:12 2012

Re: Benchmark of DFT libraries in Haskell

Takayuki Muranushi wrote:

> >> * vector-fftw with wisdom was more than 1/2 times faster than fftw in
> >> C with wisdom (and with communication overhead.)
> 
> > I would be suspicious of that result. Calling a C function from a library
> > should be slower from Haskell than from C.
> 
> Sorry for the confusion, What I meant is that vector-fftw version takes
> more time than C version, but less than twice.

That makes much more sense. Whether you're calling fftw from C or from
Haskell, it's still the fftw library doing most of the work. As you
increase the FFT length, the difference between C and Haskell should
decrease.
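
A toy cost model makes the trend concrete.  It assumes (and this is only
an assumption) that the transform itself costs about perOp * n * log2 n in
either language and that the Haskell side adds a fixed per-call overhead.
The function name slowdown and the constants are invented for
illustration; they are arbitrary units, not measurements.

```haskell
-- Toy cost model: the transform costs perOp * n * log2 n in either
-- language, and the Haskell call adds a fixed per-call overhead on top.
-- Returns the modelled ratio of Haskell time to C time.
slowdown :: Double -> Double -> Int -> Double
slowdown overhead perOp n = (work + overhead) / work
  where
    n'   = fromIntegral n
    work = perOp * n' * logBase 2 n'

main :: IO ()
main =
  -- overhead = 1000 and perOp = 1 are arbitrary units, not measurements.
  mapM_ (\n -> putStrLn (show n ++ "\t" ++ show (slowdown 1000 1 n)))
        [2 ^ k | k <- [4, 8 .. 20 :: Int]]
```

Under this model the ratio tends to 1 as n grows, whatever the constants
are, which matches the expectation above.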

Cheers,
Erik
-- 
----------------------------------------------------------------------
Erik de Castro Lopo
http://www.mega-nerd.com/
