Dr. Tilmann Bubeck | 25 Jan 09:18 2013
Picon

Use EXT4_BOOT_LOADER_INO for boot loader GRUB?

There is a long discussion about booting GRUB2 from a partition 
containing ext2/3/4 (e.g. 
https://bugzilla.redhat.com/show_bug.cgi?id=872826 and many more on the 
net). This is typically a multiboot scenario with a primary boot loader 
in the MBR loading a second bootloader (here "GRUB2") from a partition.

Currently GRUB2 prints an error message when trying to install to a 
ext2/3/4 filesystem with:

"Attempting to install GRUB to a partition instead of the MBR.  This is 
a BAD idea. Embedding is not possible. GRUB can only be installed in 
this setup by using blocklists. However, blocklists are UNRELIABLE and 
their use is discouraged."

The basic problem is, that GRUB needs a safe place to store (currently 
32k) for its boot loader "core.img". That place should be simple to find 
from the primary boot code ("stage1") and the place should be safe for 
user intervention.

QUESTION:

You have reserved a special inode #5 called "EXT4_BOOT_LOADER_INO". Is 
this inode currently used or supported by kernel or user land? What is 
the idea of this inode?

PROPOSAL:

I can think of using that inode to store the file "core.img" of GRUB. 
That file is used by GRUB to boot and the block list of that file is 
stored in GRUB when using "--force" to override the above error.
(Continue reading)

Theodore Ts'o | 26 Jan 19:49 2013
Picon
Picon

Re: Use EXT4_BOOT_LOADER_INO for boot loader GRUB?

On Fri, Jan 25, 2013 at 09:18:50AM +0100, Dr. Tilmann Bubeck wrote:
> The basic problem is, that GRUB needs a safe place to store
> (currently 32k) for its boot loader "core.img". That place should be
> simple to find from the primary boot code ("stage1") and the place
> should be safe for user intervention.
> 
> QUESTION:
> 
> You have reserved a special inode #5 called "EXT4_BOOT_LOADER_INO".
> Is this inode currently used or supported by kernel or user land?
> What is the idea of this inode?

It was basically for something exactly like this.  :-)

> PROPOSAL:
> 
> I can think of using that inode to store the file "core.img" of
> GRUB. That file is used by GRUB to boot and the block list of that
> file is stored in GRUB when using "--force" to override the above
> error.
> 
> ext2/3/4 must make sure, that the block list of that file never
> changes. I propose an additional EXT4 ioctl to tell ext4, which file
> to store in EXT4_BOOT_LOADER_INO.

What I'm thinking about is a new ioctl that would swap the i_block and
i_blocks array of the BOOT_LOADER_INO and the file descriptor.  That
is, if there were any blocks attached to the boot_loader_ino, they
would become attached to the inode associated with the file
descriptor, and the blocks associated with that inode would be
(Continue reading)

Phillip Susi | 6 Feb 03:15 2013

Re: Use EXT4_BOOT_LOADER_INO for boot loader GRUB?


On 01/26/2013 01:49 PM, Theodore Ts'o wrote:
> What I'm thinking about is a new ioctl that would swap the i_block
> and i_blocks array of the BOOT_LOADER_INO and the file descriptor.
> That is, if there were any blocks attached to the boot_loader_ino,
> they would become attached to the inode associated with the file 
> descriptor, and the blocks associated with that inode would be 
> attached to inode #5.

Isn't that what the new defrag ioctl does already?  So you just need a
way to open the reserved and unnamed inode to call the ioctl on.

Dr. Tilmann Bubeck | 19 Feb 22:15 2013
Picon

Re: Use EXT4_BOOT_LOADER_INO for boot loader GRUB?

Ted,

my previous email contains a first implementation of your idea. Did I 
get it right or do you want me something to change?

I can also offer some shell scripts for mass testing the change to a 
fresh ext4 filesystem to ensure, that it does not break anything. I ran 
the script and indeed it does not break anything (as far as I can see).

Kind regards,
  Tilmann

Am 26.01.2013 19:49, schrieb Theodore Ts'o:
> On Fri, Jan 25, 2013 at 09:18:50AM +0100, Dr. Tilmann Bubeck wrote:
>> The basic problem is, that GRUB needs a safe place to store
>> (currently 32k) for its boot loader "core.img". That place should be
>> simple to find from the primary boot code ("stage1") and the place
>> should be safe for user intervention.
>>
>> QUESTION:
>>
>> You have reserved a special inode #5 called "EXT4_BOOT_LOADER_INO".
>> Is this inode currently used or supported by kernel or user land?
>> What is the idea of this inode?
>
> It was basically for something exactly like this.  :-)
>
>> PROPOSAL:
>>
>> I can think of using that inode to store the file "core.img" of
(Continue reading)

Dr. Tilmann Bubeck | 18 Mar 14:26 2013
Picon

Re: Use EXT4_BOOT_LOADER_INO for boot loader GRUB?

Ted,

I sent a patch implementing your idea from below at 2013-02-23 to the  
mailing list. It has received a

Reviewed-by: Andreas Dilger <adilger <at> dilger.ca>

Is there anything more to do to get this accepted? I am willing to do more  
cleanups, if you request them.

Thanks!
   Tilmann

On Sat, 26 Jan 2013 19:49:27 +0100, Theodore Ts'o <tytso <at> mit.edu> wrote:

> On Fri, Jan 25, 2013 at 09:18:50AM +0100, Dr. Tilmann Bubeck wrote:
>> The basic problem is, that GRUB needs a safe place to store
>> (currently 32k) for its boot loader "core.img". That place should be
>> simple to find from the primary boot code ("stage1") and the place
>> should be safe for user intervention.
>>
>> QUESTION:
>>
>> You have reserved a special inode #5 called "EXT4_BOOT_LOADER_INO".
>> Is this inode currently used or supported by kernel or user land?
>> What is the idea of this inode?
>
> It was basically for something exactly like this.  :-)
>
>> PROPOSAL:
(Continue reading)

Theodore Ts'o | 18 Mar 14:51 2013
Picon
Picon

Re: Use EXT4_BOOT_LOADER_INO for boot loader GRUB?

On Mon, Mar 18, 2013 at 02:26:25PM +0100, Dr. Tilmann Bubeck wrote:
> Ted,
> 
> I sent a patch implementing your idea from below at 2013-02-23 to
> the mailing list. It has received a
> 
> Reviewed-by: Andreas Dilger <adilger <at> dilger.ca>
> 
> Is there anything more to do to get this accepted? I am willing to
> do more cleanups, if you request them.

Sorry, I'm still working on stablizing and fixing bugs/regressions
from the last merge window, so I haven't started looking at patches
which introduce new features.  I'll hopefully get to it in the next
week or two....

							- Ted
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo <at> vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Theodore Ts'o | 7 Apr 04:47 2013
Picon
Picon

[PATCH -v3] ext4: implementation of a new ioctl called EXT4_IOC_SWAP_BOOT

Hi Tilmann,

I had to make a number of changes to your patches:

#1) If the boot inode is uninitialized, i_flags was set to zero, but
the i_data field was initialized via ext4_ext_tree_init().  As a
result, if you ran e2fsck on a freshly created file system after
running ext4-swap-boot-inode, e2fsck would complain about a corrupted
file system.  (Lesson to be learned: always run e2fsck afterwards
testing the code.  The file system _must_ be consistent afterwards).

#2) Unconditionally calling ext4_ext_tree_init() means the inode will
be initialized to use extents even if the file system does not support
extents (for example, if it is an ext2 or ext3 mapped file system).
This will also cause a file system that will fail an e2fsck for a file
system formatted as ext2 or ext3 but mounted using the ext4 file
system driver.

#3) The original version of the code called truncate_inode_pages()
after releasing i_mutex.  This violates a requirement which is very
clearly documented in the comments of that function.  (Lesson to be
learned: always check the implementation of functions that you're not
familiar with.  This is also why defensive programming such as
including mutex_is_locked() assertions in functions is useful, since
developers don't always read function documentation...)

#4) filemap_flush() needs to be called before you swap the inode data.
That's because the page cache is indexed by inode number and page
number.  So if there is dirty page data, it needs to be flushed to the
inode *before* the messing with the inode data.
(Continue reading)

Andreas Dilger | 7 Apr 19:09 2013
Picon

Re: [PATCH -v3] ext4: implementation of a new ioctl called EXT4_IOC_SWAP_BOOT

On 2013-04-06, at 20:47, Theodore Ts'o <tytso <at> mit.edu> wrote:
> 
> I had to make a number of changes to your patches:
> 
> #1) If the boot inode is uninitialized, i_flags was set to zero, but
> the i_data field was initialized via ext4_ext_tree_init().  As a
> result, if you ran e2fsck on a freshly created file system after
> running ext4-swap-boot-inode, e2fsck would complain about a corrupted
> file system.  (Lesson to be learned: always run e2fsck afterwards
> testing the code.  The file system _must_ be consistent afterwards).
> 
> #2) Unconditionally calling ext4_ext_tree_init() means the inode will
> be initialized to use extents even if the file system does not support
> extents (for example, if it is an ext2 or ext3 mapped file system).
> This will also cause a file system that will fail an e2fsck for a file
> system formatted as ext2 or ext3 but mounted using the ext4 file
> system driver.
> 
> #3) The original version of the code called truncate_inode_pages()
> after releasing i_mutex.  This violates a requirement which is very
> clearly documented in the comments of that function.  (Lesson to be
> learned: always check the implementation of functions that you're not
> familiar with.  This is also why defensive programming such as
> including mutex_is_locked() assertions in functions is useful, since
> developers don't always read function documentation...)
> 
> #4) filemap_flush() needs to be called before you swap the inode data.
> That's because the page cache is indexed by inode number and page
> number.  So if there is dirty page data, it needs to be flushed to the
> inode *before* the messing with the inode data.
(Continue reading)

Theodore Ts'o | 7 Apr 21:48 2013
Picon
Picon

Re: [PATCH -v3] ext4: implementation of a new ioctl called EXT4_IOC_SWAP_BOOT

On Sun, Apr 07, 2013 at 11:09:58AM -0600, Andreas Dilger wrote:
> > +    filemap_flush(inode_bl->i_mapping);
> 
> Technically, it shouldn't be possible to have dirty pages on the
> bootloader inode, since any inode swapped there will itself have
> been flushed, and should not be directly accessible after that
> point. That said, I guess this ioctl() won't be called too often so
> the empty flush is mostly harmless.

Yes, technically there should be no mappings for the inode boot
loader.  (Well, unless someone had one of the unofficial open-by-inode
number patches).

> > +    if (inode_bl->i_nlink == 0) {
> > +        /* this inode has never been used as a BOOT_LOADER */
> > +        set_nlink(inode_bl, 1);
> > +        i_uid_write(inode_bl, 0);
> > +        i_gid_write(inode_bl, 0);
> > +        inode_bl->i_flags = 0;
> > +        ei_bl->i_flags = 0;
> > +        inode_bl->i_version = 1;
> > +        i_size_write(inode_bl, 0);
> > +        inode_bl->i_mode = S_IFREG;
> > +        if (EXT4_HAS_INCOMPAT_FEATURE(sb,
> > +                          EXT4_FEATURE_INCOMPAT_EXTENTS)) {
> > +            ext4_set_inode_flag(inode_bl, EXT4_INODE_EXTENTS);
> > +            ext4_ext_tree_init(handle, inode_bl);
> > +        } else
> > +            memset(ei_bl->i_data, 0, sizeof(ei_bl->i_data));
> 
(Continue reading)

Andreas Dilger | 7 Apr 23:29 2013
Picon

Re: [PATCH -v3] ext4: implementation of a new ioctl called EXT4_IOC_SWAP_BOOT

On 2013-04-07, at 13:48, Theodore Ts'o <tytso <at> mit.edu> wrote:
> On Sun, Apr 07, 2013 at 11:09:58AM -0600, Andreas Dilger wrote:
>>> +    if (inode_bl->i_nlink == 0) {
>>> +        /* this inode has never been used as a BOOT_LOADER */
>>> +        set_nlink(inode_bl, 1);
>>> +        i_uid_write(inode_bl, 0);
>>> +        i_gid_write(inode_bl, 0);
>>> +        inode_bl->i_flags = 0;
>>> +        ei_bl->i_flags = 0;
>>> +        inode_bl->i_version = 1;
>>> +        i_size_write(inode_bl, 0);
>>> +        inode_bl->i_mode = S_IFREG;
>>> +        if (EXT4_HAS_INCOMPAT_FEATURE(sb,
>>> +                          EXT4_FEATURE_INCOMPAT_EXTENTS)) {
>>> +            ext4_set_inode_flag(inode_bl, EXT4_INODE_EXTENTS);
>>> +            ext4_ext_tree_init(handle, inode_bl);
>>> +        } else
>>> +            memset(ei_bl->i_data, 0, sizeof(ei_bl->i_data));
>> 
>> I don't understand this. Wouldn't this clobber the block pointers if 
>> an existing boot inode and cause them to leak?  This seems broken to me. 
> 
> If i_nlink is zero, then there is no existing boot loader inode.  In
> theory the rest of the inode is zero, but this is to make sure the
> inode is initialized to something sane before we swap it into the
> user-visible inode.

Oh, my bad. I thought the else was for the nlink != 0 case, not the !extent case (the danger of reading email on
my phone, I guess). Looks fine. 

(Continue reading)

Dr. Tilmann Bubeck | 7 Apr 20:43 2013
Picon

Re: [PATCH -v3] ext4: implementation of a new ioctl called EXT4_IOC_SWAP_BOOT

Hi Ted,

thanks for improving my patch and thanks a lot for the lessons! I 
appreciate this really.

Feel free to improve the patch, it is getting better with every release.

Thanks!
  Tilmann

Am 07.04.2013 04:47, schrieb Theodore Ts'o:
> Hi Tilmann,
>
> I had to make a number of changes to your patches:
>
> #1) If the boot inode is uninitialized, i_flags was set to zero, but
> the i_data field was initialized via ext4_ext_tree_init().  As a
> result, if you ran e2fsck on a freshly created file system after
> running ext4-swap-boot-inode, e2fsck would complain about a corrupted
> file system.  (Lesson to be learned: always run e2fsck afterwards
> testing the code.  The file system _must_ be consistent afterwards).
>
> #2) Unconditionally calling ext4_ext_tree_init() means the inode will
> be initialized to use extents even if the file system does not support
> extents (for example, if it is an ext2 or ext3 mapped file system).
> This will also cause a file system that will fail an e2fsck for a file
> system formatted as ext2 or ext3 but mounted using the ext4 file
> system driver.
>
> #3) The original version of the code called truncate_inode_pages()
(Continue reading)


Gmane