Daniel Phillips | 8 Oct 08:31

Re: Encoding of extent information

On Sunday 05 October 2008 23:33, Philip Pokorny wrote:
> I was wondering how you were encoding the length into "unused" bits of the extent pointers.
> 
> I've seen you use high-order bits elsewhere in your design, so I assume you took 6 bits from the top of the
pointer thinking that 2^58 should be big enough for a block pointer.
> 
> I wonder if you have ever heard of this alternate encoding for extents.  It depends on several assumptions:
> 
> 1. The minimum extent size is 2x the addressable unit size.  In the case of a block device with 512 byte
sectors, that would be a 1k minimum extent (2 sectors).
> 
> 2. Extents can only be of size 2^n
> 
> 3. An extent of size 2^n is aligned on a 2^n addressable unit boundry.
> 
> For a filesystem, these would seem to be reasonable assumptions.
> 
> For any given number N (except 0) that describes an extent, the first unit in the extent is given by the
expression "N & (N-1)", the last unit is "N | (N-1)" and the width of the extent is "(N ^ (N-1))+1"
> 
> Forcing extents to be aligned on 2^n boundaries of the underlying block device should improve I/O to
composite devices like RAID arrays (which have 2^n "chunk" sizes).
> 
> This scheme imposes no limit on the size of an extent and with no sacrifice of high order bits.  Further,
using this scheme low-order bits can be used for other purposes at the cost of a larger minimum extent
width.  Current filesystems are generally limited to 4k block sizes which would allow for 2 low-order bits
for "special purposes" which would be masked off before the above expressions.
> 
> To illustrate:
> 
(Continue reading)

Philip Pokorny | 8 Oct 09:26
Favicon

Re: Encoding of extent information

Daniel Phillips wrote:
> On Sunday 05 October 2008 23:33, Philip Pokorny wrote:
>   
>> For any given number N (except 0) that describes an extent, the first unit in the extent is given by the
expression "N & (N-1)", the last unit is "N | (N-1)" and the width of the extent is "(N ^ (N-1))+1"
>>
>> Forcing extents to be aligned on 2^n boundaries of the underlying block device should improve I/O to
composite devices like RAID arrays (which have 2^n "chunk" sizes).
>>
>> This scheme imposes no limit on the size of an extent and with no sacrifice of high order bits.  Further,
using this scheme low-order bits can be used for other purposes at the cost of a larger minimum extent
width.  Current filesystems are generally limited to 4k block sizes which would allow for 2 low-order bits
for "special purposes" which would be masked off before the above expressions.
>>
>> To illustrate:
>>
>> N = 200   is a 16 block extent from 192 to 207
>>
>> it can be devided into two 8 block extents: N=196 (192-199) and N=204 (200-207)
>> or into four 4 block extents: N=194 (192-195), N=198 (196-199), N=202 (200-203), N=206 (204-207)
>> or finally into eight 2 block extents:  N= 193, 195, 197, 199, 201, 203, 205, 207
>>
>> Just a thought...
>>     
>
> Wow, that is diabolical,
Thank you [grin]

>  it is certainly worth thinking about.  There
> may be other benefits of the alignment restriction, such as helping
(Continue reading)

Daniel Phillips | 8 Oct 10:02

Re: Encoding of extent information

On Wednesday 08 October 2008 00:26, Philip Pokorny wrote:
> >  it is certainly worth thinking about.  There
> > may be other benefits of the alignment restriction, such as helping
> > control fragmentation.
> It does make the fragmentation somewhat predictable.  For a given extent 
> of a given size, the extent "next door" must be the same size or smaller 
> if it's a candidate for merging the two into a larger extent.
> 
> > But does it not require more extents to
> > represent odd sized files at the short end of the scale?
> I suppose it requires log2(n) extents at most for any "odd" length of an 
> extent.  Specifically the number of "1" bits in the binary 
> representation is the number of extents required and their position 
> tells you what length of extent is required.  So yes, in the general 
> case, you would probably have more extents.  Given the predictability of 
> the sizes, this might not be as much of an issue.  For a filesystem, the 
> requirement that extents be a power of 2 in length might be a 
> significant down-side to this scheme.

I haven't thought about it enough to know if it is a problem.  We win
5 bits and it costs some extra extent entries where it is necessary to
represent extent sizes exactly.  At worst, this is fun to think about,
and there might well be something useful there.

> > Out of curiosity, where did you run into this?
> >   
> I believe it's my own idea.  It's something I've been thinking about for 
> some time.  The magic properties of the expressions are something I read 
> most recently in Donald Knuth's new "chapters" for his next book.  I had 
> seen them before that, but the large number of combinations and 
(Continue reading)

Daniel Phillips | 11 Oct 00:53

Re: Encoding of extent information

On Wednesday 08 October 2008 01:02, Daniel Phillips wrote:
> On Wednesday 08 October 2008 00:26, Philip Pokorny wrote:
> > >  it is certainly worth thinking about.  There
> > > may be other benefits of the alignment restriction, such as helping
> > > control fragmentation.
> > It does make the fragmentation somewhat predictable.  For a given extent 
> > of a given size, the extent "next door" must be the same size or smaller 
> > if it's a candidate for merging the two into a larger extent.
> > 
> > > But does it not require more extents to
> > > represent odd sized files at the short end of the scale?
> > I suppose it requires log2(n) extents at most for any "odd" length of an 
> > extent.  Specifically the number of "1" bits in the binary 
> > representation is the number of extents required and their position 
> > tells you what length of extent is required.  So yes, in the general 
> > case, you would probably have more extents.  Given the predictability of 
> > the sizes, this might not be as much of an issue.  For a filesystem, the 
> > requirement that extents be a power of 2 in length might be a 
> > significant down-side to this scheme.
> 
> I haven't thought about it enough to know if it is a problem.  We win
> 5 bits and it costs some extra extent entries where it is necessary to
> represent extent sizes exactly.  At worst, this is fun to think about,
> and there might well be something useful there.

Hi Philip,

I thought about your idea some more and I am thinking there could be
some kind of win there.  Not only can we represent extents this way, we
can also do buddy allocation.  Why not manage each 128 MB allocation
(Continue reading)

Philip Pokorny | 8 Oct 10:14
Favicon

Re: Encoding of extent information

Daniel Phillips wrote:
> One thing you might consider: wait a few days for the extent code to
> settle down then try coding it up and running some test cases yourself.
> That is just what the unit test setup is for: trying out an idea in
> isolation.
>
> Notice we have an extent(block, count) function to create and extent
> and an extent_count function to extract the count already, and I can
> easily add an extent_block function.  These functions are there to hide
> the details of packing the extent fields to make it easier to add the
> endian conversionts, but now you can use them to try out your idea.
> There also has to be higher level logical to avoid generating misaligned
> extents, and code to report any that happen to slip through.
>
>   
Fair enough.

I'm not currently using or developing any FUSE filesystems, so I'll have 
to get that development environment configured first.  I was hoping that 
you would have the kernel driver ready soon, but the git tree isn't 
really a filesystem yet.

I'll take a look at the FUSE tree and see if I can get that to compile.

Phil P.

--

-- 
Philip Pokorny, RHCE, Chief Arch. & Sr. Dir. HW & Field Eng.
Tel: 415-954-2823   Cell: 415-370-0835
Fax: 415-954-2899   Toll Free: 888-PENGUIN
(Continue reading)

Daniel Phillips | 8 Oct 10:29

Re: Encoding of extent information

On Wednesday 08 October 2008 01:14, Philip Pokorny wrote:
> I'm not currently using or developing any FUSE filesystems, so I'll have 
> to get that development environment configured first.  I was hoping that 
> you would have the kernel driver ready soon, but the git tree isn't 
> really a filesystem yet.
> 
> I'll take a look at the FUSE tree and see if I can get that to compile.

You don't even have to use fuse.  Notice that you can compile dleaf.c
separately with its own main program and test your extents idea just
like I test extents now: by doing some getblks to create some dirty
buffers, then flushing them to invoke the filemap IO code.

Something like this:

  make dleaf && ./dleaf

Regards,

Daniel
Daniel Phillips | 8 Oct 08:36

Re: Encoding of extent information

On Tuesday 07 October 2008 23:31, Daniel Phillips wrote:
> On Sunday 05 October 2008 23:33, Philip Pokorny wrote:
> > I was wondering how you were encoding the length into "unused" bits of the extent pointers.
> > 
> > I've seen you use high-order bits elsewhere in your design, so I assume you took 6 bits from the top of the
pointer thinking that 2^58 should be big enough for a block pointer.

Yes, 6 bits for the extent count and another 10 for the extent version
(when versioning arrives) leaving 48 bits for the block address, giving
an addressable volume size of 2^60 or one exabyte with 4k blocks.

Regards,

Daniel

Gmane