Siarhei Siamashka | 22 Feb 22:23 2011
Picon

[PATCH 3/7] C variant of bilinear scaled 'src_8888_8888' fast path

From: Siarhei Siamashka <siarhei.siamashka@...>

Because of doing scaling in a single pass without temporary buffers, it
is a bit faster than general path on x86 (and provides even better speedup
on MIPS and ARM).

Benchmark on Intel Core i7:
 Using cairo-perf-trace:
  before: image        firefox-planet-gnome   12.566   12.610   0.23%    6/6
  after:  image        firefox-planet-gnome   12.019   12.054   0.15%    5/6

 Microbenchmark (scaling 2000x2000 image with scale factor close to 1x):
  before: op=1, src=20028888, dst=20028888, speed=70.48 MPix/s
  after:  op=1, src=20028888, dst=20028888, speed=82.61 MPix/s

Benchmark on ARM Cortex-A8:
 Microbenchmark (scaling 2000x2000 image with scale factor close to 1x):
  before: op=1, src=20028888, dst=20028888, speed=6.70 MPix/s
  after:  op=1, src=20028888, dst=20028888, speed=10.72 MPix/s

Benchmark on MIPS 24K:
 Microbenchmark (scaling 2000x2000 image with scale factor close to 1x):
  before: op=1, src=20028888, dst=20028888, speed=5.12 MPix/s
  after:  op=1, src=20028888, dst=20028888, speed=6.96 MPix/s

 Microbenchmark (scaling 500x500 image with scale factor close to 1x):
  before: op=1, src=20028888, dst=20028888, speed=5.26 MPix/s
  after:  op=1, src=20028888, dst=20028888, speed=7.00 MPix/s
---
 pixman/pixman-fast-path.c |  144 +++++++++++++++++++++++++++++++++++++++++++++
(Continue reading)

Soeren Sandmann | 26 Feb 13:33 2011
Picon
Picon

Re: [PATCH 3/7] C variant of bilinear scaled 'src_8888_8888' fast path

Hi,

Most of this series looks good to me, except that this:

>  static force_inline uint32_t
> +bilinear_interpolation (uint32_t tl, uint32_t tr,
> +			uint32_t bl, uint32_t br,
> +			int distx, int wt, int wb)
> +{

is a duplication of the code from pixman-bits-image, except that disty
has been split into wt and wb so that it can deal with the top and
bottom edges of the image.

The bits_image_fetch_bilinear_no_repeat_8888() function handles the same
problem by simply passing 0 for tl/tr or bl/br. Could it be done the
same way here? Alternatively, if there is a good reason for the split,
presumably that same reason applies to the code in pixman-bits-image.c.

The pixman-fast-path.h file also contains a duplicated version of
"repeat" where the only difference is that the order of the arguments is
reversed and one uses MOD(), the other while() to do repeating. This
duplication was not introduced by you, but it would be useful to share
that code as well.

Maybe the file pixman-fast-path.h should be renamed to pixman-common.h
or pixman-inlines.h since it would no longer be specific to
pixman-fast-path.c

Soren
(Continue reading)

Siarhei Siamashka | 26 Feb 19:22 2011
Picon

Re: [PATCH 3/7] C variant of bilinear scaled 'src_8888_8888' fast path

On Saturday 26 February 2011 14:33:02 Soeren Sandmann wrote:
> Hi,

Hi,

First of all, I would say that this C fast path was primarily added in order to
benchmark the effect of doing single pass scaling without intermediate buffer
on different processors. With the intention to see whether this single pass
processing has any advantage over just putting efforts into SIMD optimizing
bilinear fetchers and be done with that. I think it proved to be useful,
especially on ARM and MIPS.

But I'm actually in favor of dropping all these bilinear scaling C fast paths
for now, especially considering that the discussion is still ongoing about
how to improve bilinear scaling performance when SIMD extensions are not
available.

Another reason to have bilinear C fast paths in pixman is that they could be
used as a reference C code if/when we get contributors willing to also optimize
bilinear scaling for other architectures  (powerpc VMX, MIPS DSP ASE, ...). So
when we get a question like "Could you let me know the bilinear scaling
interfaces in pixman and where the SIMD optimization will be applied?"
again, we would have some place in the code to show. And even better, we
should proactively prepare some documentation with some basic introductory
information about adding new CPU specific optimizations to pixman, so that
the contributors will be able to save their time and avoid common pitfalls.

> Most of this series looks good to me, except that this:
> >  static force_inline uint32_t
> > 
(Continue reading)

Siarhei Siamashka | 26 Feb 19:29 2011
Picon

Re: [PATCH 3/7] C variant of bilinear scaled 'src_8888_8888' fast path

On Saturday 26 February 2011 14:33:02 Soeren Sandmann wrote:
> The pixman-fast-path.h file also contains a duplicated version of
> "repeat" where the only difference is that the order of the arguments is
> reversed and one uses MOD(), the other while() to do repeating. This
> duplication was not introduced by you, but it would be useful to share
> that code as well.

Yes, there is another one in 'pixman-bits-image' and they indeed could be
combined. But this really is not directly related to this particular patch
series.

Also there is a duplication of functionality between SIMPLE_NEAREST_FAST_PATH
and NEAREST_FAST_PATH. SIMPLE_NEAREST_FAST_PATH is faster and already
provides almost all functionality of NEAREST_FAST_PATH except for
REFLECT repeat. I always considered to do something about REFLECT repeat
in SIMPLE_NEAREST_FAST_PATH, but the fact that it does not seem to be used
anywhere does provide much motivation. But if REFLECT repeat is not so
important, maybe NEAREST_FAST_PATH can be just dropped in order to save
code size?

> Maybe the file pixman-fast-path.h should be renamed to pixman-common.h
> or pixman-inlines.h since it would no longer be specific to
> pixman-fast-path.c

That makes sense.

--

-- 
Best regards,
Siarhei Siamashka
(Continue reading)


Gmane