Sebastien Bourdeauducq | 23 Apr 12:55 2011

MMU

Hi,

To follow up on the recent discussions about the LM32 MMU: I think we
can all agree to use virtually indexed physically tagged caches that run
in parallel with the CPU caches. Then, to solve the aliasing and/or
context-switch problems, two major options were proposed:
a) Use a process ID tag in the cache. Problems: it limits the number of
processes we can have without flushing the caches at each context
switch, the data cache cannot be used for inter-process communication,
and the OS still has to check for aliasing when the same process maps
the same physical address twice (a rare case, though).
b) Use a simple cache with cache associativity * page size = cache size.
Advantages: simple hardware, and it solves the above problems. Problems:
a bit less flexible, and if we do not want to flush the caches at
context switches, we must ensure that shared memory areas are mapped at
the same virtual addresses in all processes that use them. I'm not
familiar enough with how OSes deal with virtual memory to know whether
this is a strong constraint or not. Comments?
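
To make the arithmetic behind b) concrete, here is a minimal sketch.
The parameters (16k 4-way cache, 4k pages) are only an illustration,
not a decision for the core:

#include <stdio.h>

/* Illustrative parameters only. */
enum {
    PAGE_SIZE  = 4096,   /* 4k pages  */
    CACHE_SIZE = 16384,  /* 16k cache */
    ASSOC      = 4       /* ASSOC * PAGE_SIZE == CACHE_SIZE */
};

int main(void)
{
    /* An N-way cache is indexed by the low log2(way size) address
       bits, where way size = CACHE_SIZE / ASSOC.  Aliasing is
       impossible when all of those bits lie inside the page offset,
       i.e. when the way size does not exceed the page size. */
    if (CACHE_SIZE / ASSOC <= PAGE_SIZE)
        puts("constraint holds: virtual index == physical index");
    else
        puts("constraint violated: one physical page can sit at "
             "several cache indices");
    return 0;
}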

Let's make the MMU design happen :)

S.

Sebastien Bourdeauducq | 23 Apr 12:56 2011

Re: MMU

On Sat, 2011-04-23 at 12:55 +0200, Sebastien Bourdeauducq wrote:
> I think we
> can all agree to use virtually indexed physically tagged caches that run
> in parallel with the CPU caches.

You should read: "with the TLB that runs in parallel..."

Norman Feske | 25 Apr 19:18 2011

Re: MMU

Hi Sebastien,

> Problems: a bit
> less flexible, and if we do not want to flush the caches at context
> switches, we must ensure that shared memory areas are mapped at the
> same virtual addresses in all processes that use them. I'm not
> familiar enough with how OSes deal with virtual memory to know whether
> this is a strong constraint or not. Comments?

In general, this is a rather unwelcome restriction. E.g., look at the
mmap syscall on Linux, which enables each process to map (potentially
different parts of) the same file at (potentially multiple) virtual
address regions. Of course, multiple processes may do so for the same
file. For this horror scenario, your proposed solution would be a bad
fit. Even though I agree that such scenarios are rare, they are a
nightmare to debug once they occur. If the OS API is as flexible as mmap
(as is the case for the OS that we're developing), the OS must somehow
deal with such situations, or at least detect them. That said, I
acknowledge your reservations about a physically indexed cache. So I
think it would be sensible to go for your proposal b) and see how far we
get. At least for our custom OS, I have a few ideas in mind about how to
handle the aliasing problems. For Linux, I don't know.
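
To illustrate the double-mapping case, a minimal sketch (the file name
is made up, and both mappings live in one process here, but the
cross-process case is analogous):

#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    int fd = open("/tmp/shared.bin", O_RDWR);   /* hypothetical file */
    if (fd < 0)
        return 1;

    /* The same 4k of the file, mapped twice.  The kernel is free to
       place the two mappings at virtual addresses with different
       cache colors. */
    char *a = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    char *b = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (a == MAP_FAILED || b == MAP_FAILED)
        return 1;

    a[0] = 42;
    /* On a VIPT cache that violates the size restriction, this read
       may see stale data unless the mappings were colored or flushed. */
    printf("b[0] = %d\n", b[0]);

    close(fd);
    return 0;
}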

In my experience, there are cases where user-level software knows about
the shared-memory nature of memory mappings and can take precautions
against potential aliasing problems by itself. E.g., an application
painting onto the surface of a virtual framebuffer provided by a GUI
service. For such cases, an instruction along the lines of 'clflush' (as
provided by the x86 architecture) would be good to have. It enables the
program to explicitly flush cache lines by specifying a virtual address.
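
On x86, the usage looks roughly like this (a sketch only; the 64-byte
line size is an assumption, real code would query it via CPUID):

#include <emmintrin.h>   /* x86 SSE2: _mm_clflush, _mm_mfence */

/* Write back and invalidate every cache line covering [buf, buf+len),
   e.g. a dirty rectangle of the virtual framebuffer. */
static void flush_range(const void *buf, unsigned long len)
{
    enum { LINE = 64 };  /* assumed line size */
    const char *p   = (const char *)buf;
    const char *end = p + len;

    for (; p < end; p += LINE)
        _mm_clflush(p);
    _mm_mfence();        /* order the flushes against later accesses */
}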

Sebastien Bourdeauducq | 25 Apr 20:39 2011

Re: MMU

On Mon, 2011-04-25 at 19:18 +0200, Norman Feske wrote:
> So I think it would be sensible to go for your proposal b) and see how far we
> get. At least for our custom OS, I have a few ideas in mind about how to
> handle the aliasing problems. For Linux, I don't know.

Ok. Hardware-wise, your solution is a superset of mine anyway, so we
could implement the missing bits later to try yours.

> For such cases, an instruction along the lines of 'clflush' (as
> provided by the x86 architecture) would be good to have. It enables
> the program to explicitly flush cache lines by specifying a virtual
> address.

This shouldn't be too hard to add.

S.

Wesley W. Terpstra | 26 Apr 10:26 2011

Re: MMU

Although I've already mentioned that there are absolutely no problems
with a VIPT MMU/cache design under the cache-size assumption Sebastien
worked with, there is *also* no problem with exceeding this cache-size
restriction for an operating system like Linux. I'll explain why below.

On 25/04/11 07:18 PM, Norman Feske wrote:
> In general, this is a rather unwelcome restriction. E.g., look at the
> mmap syscall on Linux, which enables each process to map (potentially
> different parts of) the same file at (potentially multiple) virtual
> address regions. Of course, multiple processes may do so for the same
> file.

Linux does *not* allow you to map pages wherever you like. The mmap
syscall is free to reject every proposed location you suggest. Just try
any unaligned address; they will all fail. To ensure that file-backed
mmap doesn't lead to inconsistent caching, Linux just has to increase
the required alignment.
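
For instance, a quick sketch (MAP_FIXED with an address that is not
page-aligned fails with EINVAL on Linux):

#define _DEFAULT_SOURCE  /* for MAP_ANONYMOUS */
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

int main(void)
{
    /* A deliberately unaligned fixed address: the kernel rejects it
       instead of creating the mapping. */
    void *p = mmap((void *)0x10000800, 4096, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);

    if (p == MAP_FAILED)
        printf("mmap: %s\n", strerror(errno));  /* "Invalid argument" */
    return 0;
}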

For example, suppose a 64k 4-way associative cache with a 4k page size.
Now, Linux just has to make sure that every mapping of that file-backed
page matches on bits 12 and 13. This can be achieved by mmap requiring
16k alignment of the file offset, a perfectly permissible restriction
according to POSIX.
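
In code, the "color" check this implies could look like the following
sketch (the mask derivation assumes the example numbers above):

#include <stdbool.h>
#include <stdint.h>

/* Example numbers: 64k 4-way cache, 4k pages. */
enum {
    CACHE_SIZE = 64 * 1024,
    ASSOC      = 4,
    PAGE_SIZE  = 4 * 1024,
    WAY_SIZE   = CACHE_SIZE / ASSOC,     /* 16k */
    COLOR_MASK = WAY_SIZE - PAGE_SIZE    /* 0x3000: bits 12 and 13 */
};

/* Two virtual mappings of one physical page are alias-free iff they
   agree on the index bits above the page offset (their "color"). */
static bool same_color(uintptr_t va1, uintptr_t va2)
{
    return ((va1 ^ va2) & COLOR_MASK) == 0;
}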
> I acknowledge your reservations about a physically indexed cache.

A physically indexed cache would require us to add a new pipeline stage
to the LM32. That would require a complete redesign of the CPU.

Wesley W. Terpstra | 26 Apr 10:01 2011

Re: MMU

On 23/04/11 12:55 PM, Sebastien Bourdeauducq wrote:
> b) Use a simple cache with cache associativity * page size = cache size.
> Advantages: simple hardware, and it solves the above problems.

Half the point of the LM32 is the speed/size sweet spot that it hits.
This approach to an MMU lives at a similar design sweet spot. I am for it.

> Problems: a bit
> less flexible, and if we do not want to flush the caches at context
> switches, we must ensure that shared memory areas are mapped at the
> same virtual addresses in all processes that use them.

I don't believe you are correct. As long as:
    [associativity]*[page size] >= [cache size]
there should be no problem.

There are only two possible issues that can happen in the cache:

1) Multiple physical addresses map to one virtual address

This is solved by using physical tagging.

2) Multiple virtual addresses map to one physical address

This is solved by the above restriction on cache size. (The sketch
below makes both cases concrete.)
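
A toy C model of such a VIPT lookup, with illustrative sizes and a
hypothetical tlb_translate() standing in for the real TLB:

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum { LINE = 16, ASSOC = 4, WAY_SIZE = 4096, SETS = WAY_SIZE / LINE };

struct line { bool valid; uint32_t ptag; uint8_t data[LINE]; };
static struct line cache[SETS][ASSOC];

static uint32_t tlb_translate(uint32_t va) { return va; } /* stub */

static struct line *lookup(uint32_t va)
{
    /* Index with the *virtual* address: available immediately, so
       the cache lookup runs in parallel with the TLB.  Because
       WAY_SIZE <= page size, these bits equal the physical ones. */
    uint32_t set = (va / LINE) % SETS;

    /* Tag with the *physical* address: distinct physical pages
       behind one index are never confused (issue 1), and all virtual
       aliases of one physical page hit the same line (issue 2). */
    uint32_t ptag = tlb_translate(va) / WAY_SIZE;

    for (int way = 0; way < ASSOC; way++)
        if (cache[set][way].valid && cache[set][way].ptag == ptag)
            return &cache[set][way];
    return NULL;  /* miss: refill from memory */
}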

To be concrete, imagine a shared page. The operating system will only
ever map a page at a page-aligned address (i.e., a 4k page won't live
at offset 2k). Any access to the shared page will use the same low
index bits into that page (12 bits in our example). Supposing a 16k
cache, that means that the page can only have four aliasing locations
(the highest 2 bits of the index). [...]

Sebastien Bourdeauducq | 27 Apr 19:12 2011

Re: MMU

On Tue, 2011-04-26 at 10:01 +0200, Wesley W. Terpstra wrote:
> I don't believe you are correct. As long as:
>     [associativity]*[page size] >= [cache size]
> there should be no problem.
> 
> There are only two possible issues that can happen in the cache:

You're probably right... I was thinking about a disaster scenario where
a process writes a buffer which is then read by another process that
maps it at a different virtual address (so the TLB is modified for this
buffer but the caches are not flushed), but I can't reconstruct it
now :-P

S.

