Skip to content

Commit

Permalink
KVM: Drop kvm_count_lock and instead protect kvm_usage_count with kvm…
Browse files Browse the repository at this point in the history
…_lock

Drop kvm_count_lock and instead protect kvm_usage_count with kvm_lock now
that KVM hooks CPU hotplug during the ONLINE phase, which can sleep.
Previously, KVM hooked the STARTING phase, which is not allowed to sleep
and thus could not take kvm_lock (a mutex).  This effectively allows the
task that's initiating hardware enabling/disabling to preempted and/or
migrated.

Note, the Documentation/virt/kvm/locking.rst statement that kvm_count_lock
is "raw" because hardware enabling/disabling needs to be atomic with
respect to migration is wrong on multiple fronts.  First, while regular
spinlocks can be preempted, the task holding the lock cannot be migrated.
Second, preventing migration is not required.  on_each_cpu() disables
preemption, which ensures that cpus_hardware_enabled correctly reflects
hardware state.  The task may be preempted/migrated between bumping
kvm_usage_count and invoking on_each_cpu(), but that's perfectly ok as
kvm_usage_count is still protected, e.g. other tasks that call
hardware_enable_all() will be blocked until the preempted/migrated owner
exits its critical section.

KVM does have lockless accesses to kvm_usage_count in the suspend/resume
flows, but those are safe because all tasks must be frozen prior to
suspending CPUs, and a task cannot be frozen while it holds one or more
locks (userspace tasks are frozen via a fake signal).

Preemption doesn't need to be explicitly disabled in the hotplug path.
The hotplug thread is pinned to the CPU that's being hotplugged, and KVM
only cares about having a stable CPU, i.e. to ensure hardware is enabled
on the correct CPU.  Lockep, i.e. check_preemption_disabled(), plays nice
with this state too, as is_percpu_thread() is true for the hotplug thread.

Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Co-developed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20221130230934.1014142-45-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  • Loading branch information
yamahata authored and bonzini committed Dec 29, 2022
1 parent 2c106f2 commit 0bf5049
Show file tree
Hide file tree
Showing 2 changed files with 34 additions and 21 deletions.
19 changes: 10 additions & 9 deletions Documentation/virt/kvm/locking.rst
Original file line number Diff line number Diff line change
Expand Up @@ -9,6 +9,8 @@ KVM Lock Overview

The acquisition orders for mutexes are as follows:

- cpus_read_lock() is taken outside kvm_lock

- kvm->lock is taken outside vcpu->mutex

- kvm->lock is taken outside kvm->slots_lock and kvm->irq_lock
Expand Down Expand Up @@ -225,15 +227,10 @@ time it will be set using the Dirty tracking mechanism described above.
:Type: mutex
:Arch: any
:Protects: - vm_list

``kvm_count_lock``
^^^^^^^^^^^^^^^^^^

:Type: raw_spinlock_t
:Arch: any
:Protects: - hardware virtualization enable/disable
:Comment: 'raw' because hardware enabling/disabling must be atomic /wrt
migration.
- kvm_usage_count
- hardware virtualization enable/disable
:Comment: KVM also disables CPU hotplug via cpus_read_lock() during
enable/disable.

``kvm->mn_invalidate_lock``
^^^^^^^^^^^^^^^^^^^^^^^^^^^
Expand Down Expand Up @@ -297,3 +294,7 @@ time it will be set using the Dirty tracking mechanism described above.
:Type: mutex
:Arch: x86
:Protects: loading a vendor module (kvm_amd or kvm_intel)
:Comment: Exists because using kvm_lock leads to deadlock. cpu_hotplug_lock is
taken outside of kvm_lock, e.g. in KVM's CPU online/offline callbacks, and
many operations need to take cpu_hotplug_lock when loading a vendor module,
e.g. updating static calls.
36 changes: 24 additions & 12 deletions virt/kvm/kvm_main.c
Original file line number Diff line number Diff line change
Expand Up @@ -100,7 +100,6 @@ EXPORT_SYMBOL_GPL(halt_poll_ns_shrink);
*/

DEFINE_MUTEX(kvm_lock);
static DEFINE_RAW_SPINLOCK(kvm_count_lock);
LIST_HEAD(vm_list);

static cpumask_var_t cpus_hardware_enabled;
Expand Down Expand Up @@ -5123,17 +5122,18 @@ static int kvm_online_cpu(unsigned int cpu)
* be enabled. Otherwise running VMs would encounter unrecoverable
* errors when scheduled to this CPU.
*/
raw_spin_lock(&kvm_count_lock);
mutex_lock(&kvm_lock);
if (kvm_usage_count) {
WARN_ON_ONCE(atomic_read(&hardware_enable_failed));

hardware_enable_nolock(NULL);

if (atomic_read(&hardware_enable_failed)) {
atomic_set(&hardware_enable_failed, 0);
ret = -EIO;
}
}
raw_spin_unlock(&kvm_count_lock);
mutex_unlock(&kvm_lock);
return ret;
}

Expand All @@ -5149,10 +5149,10 @@ static void hardware_disable_nolock(void *junk)

static int kvm_offline_cpu(unsigned int cpu)
{
raw_spin_lock(&kvm_count_lock);
mutex_lock(&kvm_lock);
if (kvm_usage_count)
hardware_disable_nolock(NULL);
raw_spin_unlock(&kvm_count_lock);
mutex_unlock(&kvm_lock);
return 0;
}

Expand All @@ -5168,9 +5168,9 @@ static void hardware_disable_all_nolock(void)
static void hardware_disable_all(void)
{
cpus_read_lock();
raw_spin_lock(&kvm_count_lock);
mutex_lock(&kvm_lock);
hardware_disable_all_nolock();
raw_spin_unlock(&kvm_count_lock);
mutex_unlock(&kvm_lock);
cpus_read_unlock();
}

Expand All @@ -5187,7 +5187,7 @@ static int hardware_enable_all(void)
* enable hardware multiple times.
*/
cpus_read_lock();
raw_spin_lock(&kvm_count_lock);
mutex_lock(&kvm_lock);

kvm_usage_count++;
if (kvm_usage_count == 1) {
Expand All @@ -5200,7 +5200,7 @@ static int hardware_enable_all(void)
}
}

raw_spin_unlock(&kvm_count_lock);
mutex_unlock(&kvm_lock);
cpus_read_unlock();

return r;
Expand Down Expand Up @@ -5806,17 +5806,29 @@ static void kvm_init_debug(void)

static int kvm_suspend(void)
{
/*
* Secondary CPUs and CPU hotplug are disabled across the suspend/resume
* callbacks, i.e. no need to acquire kvm_lock to ensure the usage count
* is stable. Assert that kvm_lock is not held to ensure the system
* isn't suspended while KVM is enabling hardware. Hardware enabling
* can be preempted, but the task cannot be frozen until it has dropped
* all locks (userspace tasks are frozen via a fake signal).
*/
lockdep_assert_not_held(&kvm_lock);
lockdep_assert_irqs_disabled();

if (kvm_usage_count)
hardware_disable_nolock(NULL);
return 0;
}

static void kvm_resume(void)
{
if (kvm_usage_count) {
lockdep_assert_not_held(&kvm_count_lock);
lockdep_assert_not_held(&kvm_lock);
lockdep_assert_irqs_disabled();

if (kvm_usage_count)
hardware_enable_nolock(NULL);
}
}

static struct syscore_ops kvm_syscore_ops = {
Expand Down

0 comments on commit 0bf5049

Please sign in to comment.