David Sheffield | 22 Nov 2011 07:35

OMAP4430 ES2.2 Bandwidth Issues

I know OMAP4430 bandwidth issues have been brought up in other posts,
but I was wondering if anyone knows whether the ES2.2 revision has
fixed the poor memory bandwidth (or has the potential to fix it
through a microcode patch)?

If the ES2.2 has not fixed these issues, will future revisions of the
Pandaboard have a different version of the OMAP4 processor with the
memory controller issues fixed?

Does anyone know the root cause of this problem? Is it a patchable
issue or a design flaw? Poor implementation of cache coherence? Not
enough buffering? Non-blocking cache disabled?

--David

Måns Rullgård | 22 Nov 2011 19:03

Re: OMAP4430 ES2.2 Bandwidth Issues

David Sheffield <sheffield.david@...> writes:

> I know OMAP4430 bandwidth issues have been brought up in other posts,
> but I was wondering if anyone knows whether the ES2.2 revision has
> fixed the poor memory bandwidth (or has the potential to fix it
> through a microcode patch)?
>
> If the ES2.2 has not fixed these issues, will future revisions of the
> Pandaboard have a different version of the OMAP4 processor with the
> memory controller issues fixed?

The OMAP4460 as found on the Pandaboard ES has much better memory
bandwidth.

> Does anyone know the root cause of this problem? Is it a patchable
> issue or a design flaw? Poor implementation of cache coherence? Not
> enough buffering? Non-blocking cache disabled?

At least part of the problem is high latency to the DDR interface
combined with a low limit on outstanding requests.  The 4430 ES2.2
exposes a few more tuning knobs on the L2 cache controller which might
help to some extent by enabling more efficient prefetching of data.
I don't have such a chip and thus no benchmarks.

The 4460 has a new path between the CPU and memory controller with lower
latency, which makes a huge difference.
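
The interaction between latency and the outstanding-request limit can be
sketched with Little's law: if at most N cache-line requests can be in
flight and each takes T to return, sustained bandwidth cannot exceed
N * line_size / T. The Python sketch below is only an illustration. The
line size, the request limit, and the latencies are assumed values chosen
for the arithmetic, not measurements of any OMAP4 part, and the function
name is hypothetical.

```python
def peak_bandwidth_gb_s(outstanding_requests, line_bytes, latency_ns):
    """Upper bound on sustained bandwidth from Little's law.

    At most `outstanding_requests` lines of `line_bytes` each can be in
    flight during one round trip of `latency_ns` nanoseconds.
    """
    if latency_ns <= 0:
        raise ValueError("latency_ns must be positive")
    # Bytes per nanosecond is numerically equal to GB/s (1e9 bytes/s).
    return outstanding_requests * line_bytes / latency_ns


# Assumed values for illustration only, not OMAP4 measurements.
LINE_BYTES = 32    # assumed cache line size in bytes
OUTSTANDING = 4    # assumed limit on requests in flight

for latency_ns in (200, 100):  # hypothetical round-trip latencies
    bound = peak_bandwidth_gb_s(OUTSTANDING, LINE_BYTES, latency_ns)
    print(f"latency {latency_ns} ns -> at most {bound:.2f} GB/s")
```

With the request limit fixed, halving the latency doubles the bound, which
is consistent with a lower-latency path making a large difference.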

-- 
Måns Rullgård
mans@...

Binwei Yang | 28 Nov 2011 05:15

Re: OMAP4430 ES2.2 Bandwidth Issues


The OMAP4460 redesigns the memory access logic: one port of the PL310 connects to the MA, which connects directly to the memory controller. A similar design is used in Tegra2 and Exynos4410, and very likely in Apple's A5.

So their memory latency should now be similar. However, one master port of the PL310 supports only 4 outstanding requests, so their memory throughput should be similar as well.

This indicates that the OMAP4430's high latency comes from the local interconnect. But the local interconnect runs at half the MPU frequency, so by itself it shouldn't lead to such high memory latency. Is it because the local interconnect doesn't do address filtering, so some memory requests are routed to the L3 interconnect? Interesting.
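
As a rough check on the half-frequency argument, the arithmetic can be sketched in Python. The 1 GHz MPU clock, the cycle counts, the 100 ns figure, and the function names are all assumptions for illustration; none of them is a measured OMAP4430 value.

```python
def added_latency_ns(cycles, mpu_hz, divider=2):
    """Time to cross `cycles` interconnect clocks at mpu_hz / divider."""
    interconnect_hz = mpu_hz / divider
    return cycles * 1e9 / interconnect_hz


def cycles_for_latency(latency_ns, mpu_hz, divider=2):
    """Interconnect clocks needed to account for `latency_ns` of delay."""
    interconnect_hz = mpu_hz / divider
    return latency_ns * interconnect_hz / 1e9


MPU_HZ = 1_000_000_000  # assumed MPU clock, illustration only

# A hypothetical 10-cycle crossing of the half-speed interconnect.
print(added_latency_ns(10, MPU_HZ), "ns for 10 interconnect cycles")

# Cycles the interconnect would have to spend to explain a
# hypothetical 100 ns of extra latency.
print(cycles_for_latency(100, MPU_HZ), "cycles for 100 ns")
```

Under these assumptions each interconnect cycle costs 2 ns, so the half-speed clock alone would need dozens of cycles per request to explain a latency gap on the order of 100 ns. That is why a routing detour looks like a more plausible suspect than the clock ratio.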




Binwei Yang | 28 Nov 2011 05:31

Re: OMAP4430 ES2.2 Bandwidth Issues


The OMAP4460 bypasses the DMM and connects to the EMIF directly. It's also possible that the low-frequency DMM causes the OMAP4430's long latency.



