Paravirtualization in KVM
expose_kvm
How to use it: -cpu host,kvm=on
// target/i386/cpu.c
static Property x86_cpu_properties[] = {
//...
DEFINE_PROP_BOOL("kvm", X86CPU, expose_kvm, true),
//...
};
// target/i386/kvm/kvm.c
int kvm_arch_init_vcpu(CPUState *cs)
{
//...
if (cpu->expose_kvm) {
memcpy(signature, "KVMKVMKVM\0\0\0", 12);
c = &cpuid_data.entries[cpuid_i++];
c->function = KVM_CPUID_SIGNATURE | kvm_base;
c->eax = KVM_CPUID_FEATURES | kvm_base;
c->ebx = signature[0];
c->ecx = signature[1];
c->edx = signature[2];
c = &cpuid_data.entries[cpuid_i++];
c->function = KVM_CPUID_FEATURES | kvm_base;
c->eax = env->features[FEAT_KVM]; // use the KVM para features user want
c->edx = env->features[FEAT_KVM_HINTS];
}
//...
r = kvm_vcpu_ioctl(cs, KVM_SET_CPUID2, &cpuid_data);
//...
}
When we set it in the command line, you can see above code that we will expose the KVM paravirtualization features to the guest, that's the meaning of "expose_kvm": expose so let guest know it is running as a KVM.
Why use this way to expose to guest, rather than KVM_GET_SUPPORTED_CPUID?
kvm_arch_init_vcpu
is executed after
CPUID filter, so actually we have executed KVM_GET_SUPPORTED_CPUID
to drop the CPUID KVM doesn't support. Then pass the filtered CPUID to KVM to set the CPUID: c->eax = env->features[FEAT_KVM];
.
KVM specific/custom MSR (Paravirtualization)
KVM makes use of some custom MSRs to service some requests.
Custom MSRs have a range reserved for them, that goes from 0x4b564d00
to 0x4b564dff
. There are MSRs outside this area, but they are deprecated and their use is discouraged.
For example, for MSR MSR_KVM_POLL_CONTROL
, guest kernel can enable it:
// arch/x86/kernel/kvm.c
void arch_haltpoll_enable(unsigned int cpu)
{
//...
/* Enable guest halt poll disables host halt poll */
// wrmsrl(MSR_KVM_POLL_CONTROL, 0);
smp_call_function_single(cpu, kvm_disable_host_haltpoll, NULL, 1);
}
EXPORT_SYMBOL_GPL(arch_haltpoll_enable);
KVM-specific MSRs — The Linux Kernel documentation
How does guest kernel know it is running as a VM when the KVM is exposed?
Intel and AMD CPUs have reserved bit 31 of ECX of CPUID leaf 0x1 as the hypervisor present bit. This bit allows hypervisors to indicate their presence to the guest operating system.
Mechanisms to determine if software is running in a VMware virtual machine (1009458)
start_kernel
setup_arch
init_hypervisor_platform
detect_hypervisor_vendor
kvm_detect //(*p)->detect();
kvm_cpuid_base
__kvm_cpuid_base
boot_cpu_has(X86_FEATURE_HYPERVISOR)
KVM_CPUID_SIGNATURE: 0x40000000
This CPUID
- returns the signature
"KVMKVMKVM"
inebx
,ecx
, andedx
. - returns the
KVM_CPUID_FEATURES
ineax
.
It should be used to determine that a VM is running under KVM.
KVM_CPUID_FEATURES: 0x40000001
/* This CPUID returns two feature bitmaps in eax, edx. Before enabling
* a particular paravirtualization, the appropriate feature bit should
* be checked in eax. The performance hint feature bit should be checked
* in edx.
*/
#define KVM_CPUID_FEATURES 0x40000001
/* features bits in this CPUID */
#define KVM_FEATURE_CLOCKSOURCE 0
#define KVM_FEATURE_NOP_IO_DELAY 1
#define KVM_FEATURE_MMU_OP 2
#define KVM_FEATURE_CLOCKSOURCE2 3
#define KVM_FEATURE_ASYNC_PF 4
#define KVM_FEATURE_STEAL_TIME 5
#define KVM_FEATURE_PV_EOI 6
#define KVM_FEATURE_PV_UNHALT 7
#define KVM_FEATURE_PV_TLB_FLUSH 9
#define KVM_FEATURE_ASYNC_PF_VMEXIT 10
#define KVM_FEATURE_PV_SEND_IPI 11
#define KVM_FEATURE_POLL_CONTROL 12
#define KVM_FEATURE_PV_SCHED_YIELD 13
#define KVM_FEATURE_ASYNC_PF_INT 14
#define KVM_FEATURE_MSI_EXT_DEST_ID 15
#define KVM_FEATURE_HC_MAP_GPA_RANGE 16
#define KVM_FEATURE_MIGRATION_CONTROL 17
KVM halt polling (KVM_FEATURE_POLL_CONTROL
)
在实际业务中,guest 执行 HLT 指令是导致虚拟化 overhead 的一个重要原因。注意,HLT 只有在一定条件下才会 vm-exit。
KVM halt polling 是为了解决这一个问题被引入的,它在 Linux 4.3-rc1 被合入主干内核。当 guest 执行 HLT 指令时,会发生 vm-exit。host kernel 并不马上让出物理核给调度器,而是 poll 一段时间,若 guest 在这段时间内被唤醒,便可以马上调度回该 vcpu 线程继续运行。
Host side poll / guest side poll
If guest poll is enabled, host poll should be disabled, and vice versa.
Guest side halt polling allows the guest vcpus to poll for a specified amount of time before halting. This provides the following benefits to host side polling:
- The POLL flag is set while polling is performed, which allows a remote vCPU to avoid sending an IPI (and the associated cost of handling the IPI) when performing a wakeup.
- The VM-exit cost can be avoided.
The downside is that polling is performed even with other runnable tasks in the host.
Why enable guest poll disables host poll?
Because no need to poll twice, see the following self-explained code in guest kernel:
// drivers/cpuidle/cpuidle-haltpoll.c
static int haltpoll_cpu_online(unsigned int cpu)
{
//...
arch_haltpoll_enable(cpu);
//...
}
// This function enables guest side halt poll (if it is running as a guest)
void arch_haltpoll_enable(unsigned int cpu)
{
// this feature actually is for **host side polling**
if (!kvm_para_has_feature(KVM_FEATURE_POLL_CONTROL)) {
pr_err_once("host does not support poll control\n");
pr_err_once("host upgrade recommended\n");
return;
}
/* Enable guest halt poll disables host halt poll */
smp_call_function_single(cpu, kvm_disable_host_haltpoll, NULL, 1);
}
EXPORT_SYMBOL_GPL(arch_haltpoll_enable);
Host side poll
How to enable host side poll
KVM should support KVM_FEATURE_POLL_CONTROL
paravirtualization feature.
MSR_KVM_POLL_CONTROL
should be set to 1 to enable host (KVM) side halt polling.
Main process
static int (*kvm_vmx_exit_handlers[])(struct kvm_vcpu *vcpu) = {
//...
[EXIT_REASON_HLT] = kvm_emulate_halt,
//...
}
// vcpu_run -> vcpu_block
void kvm_vcpu_halt(struct kvm_vcpu *vcpu)
{
//...
if (do_halt_poll) {
ktime_t stop = ktime_add_ns(start, vcpu->halt_poll_ns);
do {
/*
* This sets KVM_REQ_UNHALT if an interrupt
* arrives.
*/
if (kvm_vcpu_check_block(vcpu) < 0)
goto out;
cpu_relax();
poll_end = cur = ktime_get();
} while (kvm_vcpu_can_poll(cur, stop));
}
//...
}