Avi Kivity | 16 Nov 18:59

[PATCH 0/3] KVM: Save/resume support

The following patchset adds the missing bits that allow kvm virtual
machines to be suspended and resumed, with appropriate
userspace support.

The changes are:
 - expose the pending interrupt bitmap to userspace
 - tsc save/restore
 - msr save/restore

--

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.

-------------------------------------------------------------------------
Take Surveys. Earn Cash. Influence the Future of IT
Join SourceForge.net's Techsay panel and you'll get the chance to share your
opinions on IT & business topics through brief surveys - and earn cash
http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV
Avi Kivity | 16 Nov 19:02

[PATCH 1/3] KVM: Expose interrupt bitmap

From: Uri Lublin <uril@...>

Expose all not-yet-delivered interrupts to userspace.  This allows a guest
to be saved and resumed even if some interrupts are yet pending.

Signed-off-by: Uri Lublin <uril@...>
Signed-off-by: Avi Kivity <avi@...>

Index: linux-2.6/drivers/kvm/kvm.h
===================================================================
--- linux-2.6.orig/drivers/kvm/kvm.h
+++ linux-2.6/drivers/kvm/kvm.h
@@ -13,6 +13,7 @@
 #include <linux/mm.h>

 #include "vmx.h"
+#include <linux/kvm.h>

 #define CR0_PE_MASK (1ULL << 0)
 #define CR0_TS_MASK (1ULL << 3)
@@ -164,7 +165,7 @@ struct kvm_vcpu {
 	int   cpu;
 	int   launched;
 	unsigned long irq_summary; /* bit vector: 1 per word in irq_pending */
-#define NR_IRQ_WORDS (256 / BITS_PER_LONG)
+#define NR_IRQ_WORDS KVM_IRQ_BITMAP_SIZE(unsigned long)
 	unsigned long irq_pending[NR_IRQ_WORDS];
 	unsigned long regs[NR_VCPU_REGS]; /* for rsp: vcpu_load_rsp_rip() */
 	unsigned long rip;      /* needs vcpu_load_rsp_rip() */
Index: linux-2.6/drivers/kvm/kvm_main.c
(Continue reading)

Avi Kivity | 16 Nov 19:03

[PATCH 2/3] KVM: Add time stamp counter msr and accessors

From: Uri Lublin <uril@...>

Add a couple of accessors for the time stamp counter, and expose the tsc msr.

Signed-off-by: Uri Lublin <uril@...>
Signed-off-by: Avi Kivity <avi@...>

Index: linux-2.6/drivers/kvm/kvm_main.c
===================================================================
--- linux-2.6.orig/drivers/kvm/kvm_main.c
+++ linux-2.6/drivers/kvm/kvm_main.c
@@ -121,6 +121,7 @@ static const u32 vmx_msr_index[] = {
 #define TSS_REDIRECTION_SIZE (256 / 8)
 #define RMODE_TSS_SIZE (TSS_BASE_SIZE + TSS_REDIRECTION_SIZE + TSS_IOPB_SIZE + 1)

+#define MSR_IA32_TIME_STAMP_COUNTER		0x010
 #define MSR_IA32_FEATURE_CONTROL 		0x03a
 #define MSR_IA32_VMX_BASIC_MSR   		0x480
 #define MSR_IA32_VMX_PINBASED_CTLS_MSR		0x481
@@ -716,6 +717,31 @@ static void inject_gp(struct kvm_vcpu *v
 		     INTR_INFO_VALID_MASK);
 }

+/*
+ * reads and returns guest's timestamp counter "register"
+ * guest_tsc = host_tsc + tsc_offset    -- 21.3
+ */
+static u64 guest_read_tsc(void)
+{
+	u64 host_tsc, tsc_offset;
(Continue reading)

Avi Kivity | 16 Nov 19:04

[PATCH 3/3] KVM: Expose MSRs to userspace

From: Uri Lublin <uril@...>

Expose the model-specific registers to userspace.  Primarily useful for
save/resume.

Signed-off-by: Uri Lublin <uril@...>
Signed-off-by: Avi Kivity <avi@...>

Index: linux-2.6/drivers/kvm/kvm.h
===================================================================
--- linux-2.6.orig/drivers/kvm/kvm.h
+++ linux-2.6/drivers/kvm/kvm.h
@@ -106,11 +106,7 @@ struct vmcs {
 	char data[0];
 };

-struct vmx_msr_entry {
-	u32 index;
-	u32 reserved;
-	u64 data;
-};
+#define vmx_msr_entry kvm_msr_entry

 struct kvm_vcpu;

Index: linux-2.6/drivers/kvm/kvm_main.c
===================================================================
--- linux-2.6.orig/drivers/kvm/kvm_main.c
+++ linux-2.6/drivers/kvm/kvm_main.c
@@ -2298,21 +2298,22 @@ static int handle_cpuid(struct kvm_vcpu 
(Continue reading)

Arnd Bergmann | 16 Nov 20:08

Re: [PATCH 3/3] KVM: Expose MSRs to userspace

On Thursday 16 November 2006 19:04, Avi Kivity wrote:
> +struct kvm_msr_entry {
> +       __u32 index;
> +       __u32 reserved;
> +       __u64 data;
> +};
> +
> +/* for KVM_GET_MSRS and KVM_SET_MSRS */
> +struct kvm_msrs {
> +       __u32 vcpu;
> +       __u32 nmsrs; /* number of msrs in entries */
> +
> +       union {
> +               struct kvm_msr_entry __user *entries;
> +               __u64 padding;
> +       };
> +};

ioctl interfaces with pointers in them are generally a bad idea,
though you handle most of the points against them fine here
(endianess doesn't matter, padding is correct).

Still, it might be better not to set a bad example. Is accessing
the MSRs actually performance critical? If not, you could
define the ioctl to take only a single entry argument.

A possible alternative could also be to have a variable length
argument like below, but that creates other problems:

+struct kvm_msrs {
(Continue reading)

Avi Kivity | 16 Nov 20:17

Re: [kvm-devel] [PATCH 3/3] KVM: Expose MSRs to userspace

Arnd Bergmann wrote:
> On Thursday 16 November 2006 19:04, Avi Kivity wrote:
>   
>> +struct kvm_msr_entry {
>> +       __u32 index;
>> +       __u32 reserved;
>> +       __u64 data;
>> +};
>> +
>> +/* for KVM_GET_MSRS and KVM_SET_MSRS */
>> +struct kvm_msrs {
>> +       __u32 vcpu;
>> +       __u32 nmsrs; /* number of msrs in entries */
>> +
>> +       union {
>> +               struct kvm_msr_entry __user *entries;
>> +               __u64 padding;
>> +       };
>> +};
>>     
>
> ioctl interfaces with pointers in them are generally a bad idea,
> though you handle most of the points against them fine here
> (endianess doesn't matter, padding is correct).
>
> Still, it might be better not to set a bad example. Is accessing
> the MSRs actually performance critical? If not, you could
> define the ioctl to take only a single entry argument.
>
>   
(Continue reading)

Christoph Hellwig | 17 Nov 09:06

Re: [PATCH 3/3] KVM: Expose MSRs to userspace

On Thu, Nov 16, 2006 at 09:17:18PM +0200, Avi Kivity wrote:
> Heh.  That was the original implementation by Uri.  I felt that was 
> wrong because _IOW() encodes the size in the ioctl number, bit the 
> actual size is different.

That really shouldn't be a problem.  After all the pointer approach
doesn't encode the transfered size either. Given that the variable
sized array gives a much cleaner interface you should use it.

-------------------------------------------------------------------------
Take Surveys. Earn Cash. Influence the Future of IT
Join SourceForge.net's Techsay panel and you'll get the chance to share your
opinions on IT & business topics through brief surveys - and earn cash
http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV
Andrew Morton | 17 Nov 02:02

Re: [PATCH 3/3] KVM: Expose MSRs to userspace

On Thu, 16 Nov 2006 18:04:22 -0000
Avi Kivity <avi <at> qumranet.com> wrote:

> +static int kvm_dev_ioctl_set_msrs(struct kvm *kvm, struct kvm_msrs *msrs)
> +{
> +	struct kvm_vcpu *vcpu;
> +	struct kvm_msr_entry *entry, *entries;
> +	int rc;
> +	u32 size, num_entries, i;
> +
> +	if (msrs->vcpu < 0 || msrs->vcpu >= KVM_MAX_VCPUS)
> +		return -EINVAL;
> +
> +	num_entries = ARRAY_SIZE(msrs_to_save);
> +	if (msrs->nmsrs < num_entries) {
> +		msrs->nmsrs = num_entries; /* inform actual size */
> +		return -EINVAL;
> +	}
> +
> +	vcpu = vcpu_load(kvm, msrs->vcpu);
> +	if (!vcpu)
> +		return -ENOENT;
> +
> +	size = msrs->nmsrs * sizeof(struct kvm_msr_entry);
> +	rc = -E2BIG;
> +	if (size > 4096)
> +		goto out_vcpu;

Classic mutiplicative overflow bug.  Only msrs->nmsrs doesn't get used
again, so there is no bug here.  Yet.
(Continue reading)

Avi Kivity | 17 Nov 08:20

Re: [PATCH 3/3] KVM: Expose MSRs to userspace

Andrew Morton wrote:
> On Thu, 16 Nov 2006 18:04:22 -0000
> Avi Kivity <avi@...> wrote:
>
>   
>> +static int kvm_dev_ioctl_set_msrs(struct kvm *kvm, struct kvm_msrs *msrs)
>> +{
>> +	struct kvm_vcpu *vcpu;
>> +	struct kvm_msr_entry *entry, *entries;
>> +	int rc;
>> +	u32 size, num_entries, i;
>> +
>> +	if (msrs->vcpu < 0 || msrs->vcpu >= KVM_MAX_VCPUS)
>> +		return -EINVAL;
>> +
>> +	num_entries = ARRAY_SIZE(msrs_to_save);
>> +	if (msrs->nmsrs < num_entries) {
>> +		msrs->nmsrs = num_entries; /* inform actual size */
>> +		return -EINVAL;
>> +	}
>> +
>> +	vcpu = vcpu_load(kvm, msrs->vcpu);
>> +	if (!vcpu)
>> +		return -ENOENT;
>> +
>> +	size = msrs->nmsrs * sizeof(struct kvm_msr_entry);
>> +	rc = -E2BIG;
>> +	if (size > 4096)
>> +		goto out_vcpu;
>>     
(Continue reading)

Andrew Morton | 17 Nov 09:15

Re: [PATCH 3/3] KVM: Expose MSRs to userspace

On Fri, 17 Nov 2006 09:20:49 +0200
Avi Kivity <avi@...> wrote:

> >> +out_vcpu:
> >> +	vcpu_put(vcpu);
> >> +
> >> +	return rc;
> >> +}
> >>     
> >
> > This function returns no indication of how many msrs it actually did set. 
> > Should it?
> >   
> 
> It can't hurt.  Is returning the number of msrs set in the return code 
> (ala short write) acceptable, or do I need to make this a read/write ioctl?
> 

I'd have thought that you'd just copy the number written into msrs->nmsrs via

	msrs->nmsrs = num_entries;

like kvm_dev_ioctl_set_msrs() does.  Dunno...

-------------------------------------------------------------------------
Take Surveys. Earn Cash. Influence the Future of IT
Join SourceForge.net's Techsay panel and you'll get the chance to share your
opinions on IT & business topics through brief surveys - and earn cash
http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV
(Continue reading)

Avi Kivity | 17 Nov 09:17

Re: [PATCH 3/3] KVM: Expose MSRs to userspace

Andrew Morton wrote:
> On Fri, 17 Nov 2006 09:20:49 +0200
> Avi Kivity <avi@...> wrote:
>
>   
>>>> +out_vcpu:
>>>> +	vcpu_put(vcpu);
>>>> +
>>>> +	return rc;
>>>> +}
>>>>     
>>>>         
>>> This function returns no indication of how many msrs it actually did set. 
>>> Should it?
>>>   
>>>       
>> It can't hurt.  Is returning the number of msrs set in the return code 
>> (ala short write) acceptable, or do I need to make this a read/write ioctl?
>>
>>     
>
> I'd have thought that you'd just copy the number written into msrs->nmsrs via
>
> 	msrs->nmsrs = num_entries;
>
> like kvm_dev_ioctl_set_msrs() does.  Dunno...
>   

It means making it an _IOWR() ioctl, but no matter.

(Continue reading)


Gmane