Christian Schaubschläger | 24 Jul 2012 19:49

Problem with kexec on i386, linux-3.5

Hello list,

I'm not sure if this is the correct place to post this; if it's not, I'd like to apologize.

Here's a short description of my problem:

I have a tiny protected-/real mode program, which I start using kexec (kexec-tools 2.0.3 released 05 April
2012). At some point this program makes a call to extended-int13 to read data from the disk. Now starting
with linux-3.5-rc1 (and at least up to linux-3.5) this extended int13 call does not work any more.
Apparently the call returns with error code 0x80, which means "timeout (not ready)".
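
For context, the extended read is INT 13h with AH=42h: DL holds the drive number and DS:SI points to a 16-byte disk address packet; on failure the BIOS sets the carry flag and returns a status code in AH. A minimal sketch of that packet in C is shown here. The struct, the helper names and the use of the GCC/Clang packed attribute are illustrative assumptions, not taken from the program discussed in this thread.

```c
#include <assert.h>
#include <stdint.h>

/* Disk Address Packet passed in DS:SI to INT 13h, AH=42h (extended read).
 * Names are hypothetical; the layout is the standard 16-byte packet. */
struct __attribute__((packed)) int13_dap {
    uint8_t  size;      /* size of this packet, 0x10 */
    uint8_t  reserved;  /* must be 0 */
    uint16_t count;     /* number of sectors to transfer */
    uint16_t buf_off;   /* transfer buffer, real-mode offset */
    uint16_t buf_seg;   /* transfer buffer, real-mode segment */
    uint64_t lba;       /* starting logical block address */
};

_Static_assert(sizeof(struct int13_dap) == 16, "DAP must be 16 bytes");

/* Status values returned in AH when the carry flag is set. */
#define INT13_STATUS_OK      0x00
#define INT13_STATUS_TIMEOUT 0x80  /* "timeout (not ready)" */

/* Fill in a packet for reading `count` sectors starting at `lba`. */
static inline struct int13_dap dap_make(uint64_t lba, uint16_t count,
                                        uint16_t seg, uint16_t off)
{
    assert(count > 0);
    struct int13_dap dap = {
        .size     = sizeof(struct int13_dap),
        .reserved = 0,
        .count    = count,
        .buf_off  = off,
        .buf_seg  = seg,
        .lba      = lba,
    };
    return dap;
}

/* True if the AH status is the timeout code seen in this report. */
static inline int int13_is_timeout(uint8_t ah)
{
    return ah == INT13_STATUS_TIMEOUT;
}
```

The actual BIOS call (loading AH, DL and DS:SI and issuing `int $0x13` in real mode) is left out, since it depends on how the program switches modes.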

I have two machines here, both with Intel chipsets (one CougarPoint, one older ICH7-M), and I see the same
behaviour on both machines.

When I use older kernels (starting from 2.6.something up to 3.4.6), everything works fine.

Now I'm not sure if this is a kernel issue, or a kexec issue, or a mistake by myself. Maybe someone has a hint for me...

If required, of course, I can provide more detailed information about my hardware, kernel config, etc.
(since I'm not sure if this is the correct place, I wanted to keep this message short for now).

Thank you
Christian Schaubschlaeger
Eric W. Biederman | 14 Aug 2012 05:31

Re: Problem with kexec on i386, linux-3.5

Christian Schaubschläger <christian.schaubschlaeger <at> gmx.at> writes:

> Hello list,
>
> I'm not sure if this is the correct place to post this; if it's not,
> I'd like to apologize.
>
> Here's a short description of my problem:
>
> I have a tiny protected-/real mode program, which I start using kexec
> (kexec-tools 2.0.3 released 05 April 2012). At some point this program
> makes a call to extended-int13 to read data from the disk. Now
> starting with linux-3.5-rc1 (and at least up to linux-3.5) this
> extended int13 call does not work any more. Apparently the call
> returns with error code 0x80, which means "timeout (not ready)".
>
> I have two machines here, both with Intel chipsets (one CougarPoint,
> one older ICH7-M), and I see the same behaviour on both machines.
>
> When I use older kernels (starting from 2.6.something up to 3.4.6),
> everything works fine.
>
> Now I'm not sure if this is a kernel issue, or a kexec issue, or a
> mistake by myself. Maybe someone has a hint for me...
>
> If required, of course, I can provide more detailed information about
> my hardware, kernel config, etc. (since I'm not sure if this is the
> correct place, I wanted to keep this message short for now).

That is a tricky issue.  Sometimes the slightest things can set
something like this off.

Christian Schaubschläger | 16 Aug 2012 14:41

Re: Problem with kexec on i386, linux-3.5


> That is a tricky issue.  Sometimes the slightest things can set
> something like this off.
>
> Somewhere someone changed something in one of the drivers that made it
> so that the hardware winds up in a state the int 13 disk driver does not
> like it after kexec.
>
> If you want to track this down I would recommend a bisect between 3.4
> and 3.5-rc1 to see which change breaks your setup.

I bisected that down to this patch:

commit b566a22c23327f18ce941ffad0ca907e50a53d41
Author: Khalid Aziz <khalid.aziz@...>
Date:   Fri Apr 27 13:00:33 2012 -0600

    PCI: disable Bus Master on PCI device shutdown

    Disable Bus Master bit on the device in pci_device_shutdown() to ensure PCI
    devices do not continue to DMA data after shutdown.  This can cause memory
    corruption in case of a kexec where the current kernel shuts down and
    transfers control to a new kernel while a PCI device continues to DMA to
    memory that does not belong to it any more in the new kernel.

    I have tested this code on two laptops, two workstations and a 16-socket
    server.  kexec worked correctly on all of them.

    Signed-off-by: Khalid Aziz <khalid.aziz@...>
    Signed-off-by: Bjorn Helgaas <bhelgaas@...>

Eric W. Biederman | 16 Aug 2012 21:22

Re: Problem with kexec on i386, linux-3.5

Christian Schaubschläger <christian.schaubschlaeger <at> gmx.at> writes:

>> That is a tricky issue.  Sometimes the slightest things can set
>> something like this off.
>>
>> Somewhere someone changed something in one of the drivers that made it
>> so that the hardware winds up in a state the int 13 disk driver does not
>> like it after kexec.
>>
>> If you want to track this down I would recommend a bisect between 3.4
>> and 3.5-rc1 to see which change breaks your setup.
>
> I bisected that down to this patch:
>
> commit b566a22c23327f18ce941ffad0ca907e50a53d41
> Author: Khalid Aziz <khalid.aziz <at> hp.com>
> Date:   Fri Apr 27 13:00:33 2012 -0600
>
>     PCI: disable Bus Master on PCI device shutdown
>    
>     Disable Bus Master bit on the device in pci_device_shutdown() to ensure PCI
>     devices do not continue to DMA data after shutdown.  This can cause memory
>     corruption in case of a kexec where the current kernel shuts down and
>     transfers control to a new kernel while a PCI device continues to DMA to
>     memory that does not belong to it any more in the new kernel.
>    
>     I have tested this code on two laptops, two workstations and a 16-socket
>     server.  kexec worked correctly on all of them.
>    
>     Signed-off-by: Khalid Aziz <khalid.aziz <at> hp.com>

Christian Schaubschläger | 17 Aug 2012 08:58

Re: Problem with kexec on i386, linux-3.5


>> I bisected that down to this patch:
>>
>> commit b566a22c23327f18ce941ffad0ca907e50a53d41
>> Author: Khalid Aziz <khalid.aziz@...>
>> Date:   Fri Apr 27 13:00:33 2012 -0600
>>
>>     PCI: disable Bus Master on PCI device shutdown
>>    
>>     Disable Bus Master bit on the device in pci_device_shutdown() to ensure PCI
>>     devices do not continue to DMA data after shutdown.  This can cause memory
>>     corruption in case of a kexec where the current kernel shuts down and
>>     transfers control to a new kernel while a PCI device continues to DMA to
>>     memory that does not belong to it any more in the new kernel.
>>    
>>     I have tested this code on two laptops, two workstations and a 16-socket
>>     server.  kexec worked correctly on all of them.
>>    
>>     Signed-off-by: Khalid Aziz <khalid.aziz@...>
>>     Signed-off-by: Bjorn Helgaas <bhelgaas@...>
>>
>>
>> Without this patch, int13 works fine here! If anyone needs more
>> information, just let me know!
> Which leads to an interesting conundrum.
>
> kexec appears to be more reliable for booting another kernel with this
> patch applied.  This patch does kill the entire use case of making BIOS
> calls, and I suspect it also does nasty things to alpha bootloaders.
>

Khalid Aziz | 7 Sep 2012 17:16

Re: Problem with kexec on i386, linux-3.5

On Fri, 2012-08-17 at 08:58 +0200, Christian Schaubschläger wrote:
> >> I bisected that down to this patch:
> >>
> >> commit b566a22c23327f18ce941ffad0ca907e50a53d41
> >> Author: Khalid Aziz <khalid.aziz <at> hp.com>
> >> Date:   Fri Apr 27 13:00:33 2012 -0600
> >>
> >>     PCI: disable Bus Master on PCI device shutdown
> >>    
> >>     Disable Bus Master bit on the device in pci_device_shutdown() to ensure PCI
> >>     devices do not continue to DMA data after shutdown.  This can cause memory
> >>     corruption in case of a kexec where the current kernel shuts down and
> >>     transfers control to a new kernel while a PCI device continues to DMA to
> >>     memory that does not belong to it any more in the new kernel.
> >>    
> >>     I have tested this code on two laptops, two workstations and a 16-socket
> >>     server.  kexec worked correctly on all of them.
> >>    
> >>     Signed-off-by: Khalid Aziz <khalid.aziz <at> hp.com>
> >>     Signed-off-by: Bjorn Helgaas <bhelgaas <at> google.com>
> >>
> >>
> >> Without this patch, int13 works fine here! If anyone needs more
> >> information, just let me know!
> > Which leads to an interesting conundrum.
> >
> > kexec appears to be more reliable for booting another kernel with this
> > patch applied.  This patch does kill the entire use case of making BIOS
> > calls, and I suspect it also does nasty things to alpha bootloaders.
> >

Khalid Aziz | 7 Sep 2012 22:29

Re: Problem with kexec on i386, linux-3.5

On 07/24/2012 11:49 AM, Christian Schaubschläger wrote:
> Hello list,
>
> I'm not sure if this is the correct place to post this; if it's not, 
> I'd like to apologize.
>
> Here's a short description of my problem:
>
> I have a tiny protected-/real mode program, which I start using kexec 
> (kexec-tools 2.0.3 released 05 April 2012). At some point this program 
> makes a call to extended-int13 to read data from the disk. Now 
> starting with linux-3.5-rc1 (and at least up to linux-3.5) this 
> extended int13 call does not work any more. Apparently the call 
> returns with error code 0x80, which means "timeout (not ready)".
>
> I have two machines here, both with Intel chipsets (one CougarPoint, 
> one older ICH7-M), and I see the same behaviour on both machines.
>
> When I use older kernels (starting from 2.6.something up to 3.4.6), 
> everything works fine.
>
> Now I'm not sure if this is a kernel issue, or a kexec issue, or a 
> mistake by myself. Maybe someone has a hint for me...
>
> If required, of course, I can provide more detailed information about 
> my hardware, kernel config, etc. (since I'm not sure if this is the 
> correct place, I wanted to keep this message short for now).
>

Hello Christian,

Christian Schaubschläger | 10 Sep 2012 08:00

Re: Problem with kexec on i386, linux-3.5


Hello Khalid,

> Are you not loading the driver for your disk drive controller when the 
> new kernel boots up, even though you are not using the driver for disk 
> I/O? If yes, the driver should have re-enabled Bus Master bit in its 
> init routine. If you are loading the driver, which driver is it? I can 
> take a look at it and see if there is anything missing in the 
> initialization routine. Can you also include output from "lspci -v" from 
> your machine?

I'm not loading any drivers in my new kernel, I'm just doing pure BIOS disk I/O using Int13 calls (the program
I start using kexec I would not actually call a 'kernel', it's just a very tiny piece of software which does
some disk I/O and output on the screen).

Attached is the output of lspci -v.

Best regards
Christian

00:00.0 Host bridge: Intel Corporation Mobile 945GM/PM/GMS, 943/940GML and 945GT Express Memory Controller Hub (rev 03)
	Subsystem: Dell Device 01d4
	Flags: bus master, fast devsel, latency 0
	Capabilities: [e0] Vendor Specific Information: Len=09 <?>

00:02.0 VGA compatible controller: Intel Corporation Mobile 945GM/GMS, 943/940GML Express Integrated Graphics Controller (rev 03) (prog-if 00 [VGA controller])

Khalid Aziz | 10 Sep 2012 18:29

Re: Problem with kexec on i386, linux-3.5

On Mon, 2012-09-10 at 08:00 +0200, Christian Schaubschläger wrote:
> Hello Khalid,
> 
> > Are you not loading the driver for your disk drive controller when the 
> > new kernel boots up, even though you are not using the driver for disk 
> > I/O? If yes, the driver should have re-enabled Bus Master bit in its 
> > init routine. If you are loading the driver, which driver is it? I can 
> > take a look at it and see if there is anything missing in the 
> > initialization routine. Can you also include output from "lspci -v" from 
> > your machine?
> 
> I'm not loading any drivers in my new kernel, I'm just doing pure BIOS disk I/O using Int13 calls (the
> program I start using kexec I would not actually call a 'kernel', it's just a very tiny piece of software
> which does some disk I/O and output on the screen).
> 
> Attached you find the output of lspci -v
> 
> Best regards
> Christian
> 
> 

Hello Christian,

You have a rather esoteric use case. The patch that clears Bus Master
bit relies upon drivers reinitializing the controllers which includes
setting the Bus Master bit as well. Can you access PCI config space in
your program that you kexec? If yes, can you set the Bus Master bit in
your program? You have a pretty standard IDE controller there which does
have Bus Master capability.
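
As a rough sketch of what setting the bit from a small standalone program could look like, assuming the program can issue 32-bit port I/O and the platform supports PCI configuration mechanism #1 (address port 0xCF8, data port 0xCFC): the Command register sits at config offset 0x04 and Bus Master Enable is bit 2. The `port_io` callbacks and all helper names below are hypothetical, and the bus/device/function of the disk controller has to be taken from the lspci output of the machine in question.

```c
#include <assert.h>
#include <stdint.h>

#define PCI_CONFIG_ADDRESS 0xCF8   /* config mechanism #1 address port */
#define PCI_CONFIG_DATA    0xCFC   /* config mechanism #1 data port */
#define PCI_COMMAND        0x04    /* Command register (Status is at 0x06) */
#define PCI_COMMAND_MASTER 0x0004  /* Bus Master Enable bit */

/* Hypothetical port I/O hooks; a real-mode program would implement these
 * with 32-bit `out`/`in` instructions. */
struct port_io {
    void     (*out32)(uint16_t port, uint32_t value);
    uint32_t (*in32)(uint16_t port);
};

/* Build the CONFIG_ADDRESS value for a dword-aligned register. */
static inline uint32_t pci_config_address(uint8_t bus, uint8_t dev,
                                          uint8_t fn, uint8_t reg)
{
    assert(dev < 32 && fn < 8 && (reg & 3) == 0);
    return 0x80000000u | ((uint32_t)bus << 16) | ((uint32_t)dev << 11) |
           ((uint32_t)fn << 8) | reg;
}

/* Given the dword read at offset 0x04 (Status in the high half, Command in
 * the low half), return the dword to write back. The Status half is written
 * as zero because its error bits are write-1-to-clear. */
static inline uint32_t pci_command_with_master(uint32_t cmd_status)
{
    return (cmd_status & 0x0000FFFFu) | PCI_COMMAND_MASTER;
}

/* Read-modify-write the Command register to set Bus Master Enable. */
static inline void pci_set_bus_master(const struct port_io *io,
                                      uint8_t bus, uint8_t dev, uint8_t fn)
{
    uint32_t addr = pci_config_address(bus, dev, fn, PCI_COMMAND);

    io->out32(PCI_CONFIG_ADDRESS, addr);
    uint32_t cmd_status = io->in32(PCI_CONFIG_DATA);

    io->out32(PCI_CONFIG_ADDRESS, addr);
    io->out32(PCI_CONFIG_DATA, pci_command_with_master(cmd_status));
}
```

A caller would invoke `pci_set_bus_master(&io, bus, dev, fn)` once for the disk controller before the first int13 call. Whether that alone is enough for the BIOS disk code to work again is not established in this thread.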