summaryrefslogtreecommitdiff
path: root/drivers/hv
AgeCommit message (Collapse)Author
2025-12-09Merge tag 'hyperv-next-signed-20251207' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux Pull hyperv updates from Wei Liu: - Enhancements to Linux as the root partition for Microsoft Hypervisor: - Support a new mode called L1VH, which allows Linux to drive the hypervisor running the Azure Host directly - Support for MSHV crash dump collection - Allow Linux's memory management subsystem to better manage guest memory regions - Fix issues that prevented a clean shutdown of the whole system on bare metal and nested configurations - ARM64 support for the MSHV driver - Various other bug fixes and cleanups - Add support for Confidential VMBus for Linux guest on Hyper-V - Secure AVIC support for Linux guests on Hyper-V - Add the mshv_vtl driver to allow Linux to run as the secure kernel in a higher virtual trust level for Hyper-V * tag 'hyperv-next-signed-20251207' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux: (58 commits) mshv: Cleanly shutdown root partition with MSHV mshv: Use reboot notifier to configure sleep state mshv: Add definitions for MSHV sleep state configuration mshv: Add support for movable memory regions mshv: Add refcount and locking to mem regions mshv: Fix huge page handling in memory region traversal mshv: Move region management to mshv_regions.c mshv: Centralize guest memory region destruction mshv: Refactor and rename memory region handling functions mshv: adjust interrupt control structure for ARM64 Drivers: hv: use kmalloc_array() instead of kmalloc() mshv: Add ioctl for self targeted passthrough hvcalls Drivers: hv: Introduce mshv_vtl driver Drivers: hv: Export some symbols for mshv_vtl static_call: allow using STATIC_CALL_TRAMP_STR() from assembly mshv: Extend create partition ioctl to support cpu features mshv: Allow mappings that overlap in uaddr mshv: Fix create memory region overlap check mshv: add WQ_PERCPU to alloc_workqueue users Drivers: hv: Use kmalloc_array() instead of kmalloc() ...
2025-12-05Merge tag 'soc-drivers-6.19' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc Pull SoC driver updates from Arnd Bergmann: "This is the first half of the driver changes: - A treewide interface change to the "syscore" operations for power management, as a preparation for future Tegra specific changes - Reset controller updates with added drivers for LAN969x, eic770 and RZ/G3S SoCs - Protection of system controller registers on Renesas and Google SoCs, to prevent trivially triggering a system crash from e.g. debugfs access - soc_device identification updates on Nvidia, Exynos and Mediatek - debugfs support in the ST STM32 firewall driver - Minor updates for SoC drivers on AMD/Xilinx, Renesas, Allwinner, TI - Cleanups for memory controller support on Nvidia and Renesas" * tag 'soc-drivers-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (114 commits) memory: tegra186-emc: Fix missing put_bpmp Documentation: reset: Remove reset_controller_add_lookup() reset: fix BIT macro reference reset: rzg2l-usbphy-ctrl: Fix a NULL vs IS_ERR() bug in probe reset: th1520: Support reset controllers in more subsystems reset: th1520: Prepare for supporting multiple controllers dt-bindings: reset: thead,th1520-reset: Add controllers for more subsys dt-bindings: reset: thead,th1520-reset: Remove non-VO-subsystem resets reset: remove legacy reset lookup code clk: davinci: psc: drop unused reset lookup reset: rzg2l-usbphy-ctrl: Add support for RZ/G3S SoC reset: rzg2l-usbphy-ctrl: Add support for USB PWRRDY dt-bindings: reset: renesas,rzg2l-usbphy-ctrl: Document RZ/G3S support reset: eswin: Add eic7700 reset driver dt-bindings: reset: eswin: Documentation for eic7700 SoC reset: sparx5: add LAN969x support dt-bindings: reset: microchip: Add LAN969x support soc: rockchip: grf: Add select correct PWM implementation on RK3368 soc/tegra: pmc: Add USB wake events for Tegra234 amba: tegra-ahb: Fix device leak on SMMU enable ...
2025-12-05mshv: Cleanly shutdown root partition with MSHVPraveen K Paladugu
Root partitions running on MSHV currently attempt ACPI power-off, which MSHV intercepts and triggers a Machine Check Exception (MCE), leading to a kernel panic. Root partitions panic with a trace similar to: [ 81.306348] reboot: Power down [ 81.314709] mce: [Hardware Error]: CPU 0: Machine Check Exception: 4 Bank 0: b2000000c0060001 [ 81.314711] mce: [Hardware Error]: TSC 3b8cb60a66 PPIN 11d98332458e4ea9 [ 81.314713] mce: [Hardware Error]: PROCESSOR 0:606a6 TIME 1759339405 SOCKET 0 APIC 0 microcode ffffffff [ 81.314715] mce: [Hardware Error]: Run the above through 'mcelog --ascii' [ 81.314716] mce: [Hardware Error]: Machine check: Processor context corrupt [ 81.314717] Kernel panic - not syncing: Fatal machine check To avoid this, configure the sleep state in the hypervisor and invoke the HVCALL_ENTER_SLEEP_STATE hypercall as the final step in the shutdown sequence. This ensures a clean and safe shutdown of the root partition. Signed-off-by: Praveen K Paladugu <prapal@linux.microsoft.com> Co-developed-by: Anatol Belski <anbelski@linux.microsoft.com> Signed-off-by: Anatol Belski <anbelski@linux.microsoft.com> Reviewed-by: Easwar Hariharan <easwar.hariharan@linux.microsoft.com> Acked-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com> Signed-off-by: Wei Liu <wei.liu@kernel.org>
2025-12-05mshv: Use reboot notifier to configure sleep statePraveen K Paladugu
Configure sleep state information from ACPI in MSHV hypervisor using a reboot notifier. This data allows the hypervisor to correctly power off the host during shutdown. Signed-off-by: Praveen K Paladugu <prapal@linux.microsoft.com> Co-developed-by: Anatol Belski <anbelski@linux.microsoft.com> Signed-off-by: Anatol Belski <anbelski@linux.microsoft.com> Reviewed-by: Easwar Hariharan <easwar.hariharan@linux.microsoft.com> Reviewed-by: Stansialv Kinsburskii <skinsburskii@linux.miscrosoft.com> Acked-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Nuno Das Neves <nunodasneves@linux.microsoft.com> Signed-off-by: Wei Liu <wei.liu@kernel.org>
2025-12-05mshv: Add support for movable memory regionsStanislav Kinsburskii
Introduce support for movable memory regions in the Hyper-V root partition driver to improve memory management flexibility and enable advanced use cases such as dynamic memory remapping. Mirror the address space between the Linux root partition and guest VMs using HMM. The root partition owns the memory, while guest VMs act as devices with page tables managed via hypercalls. MSHV handles VP intercepts by invoking hmm_range_fault() and updating SLAT entries. When memory is reclaimed, HMM invalidates the relevant regions, prompting MSHV to clear SLAT entries; guest VMs will fault again on access. Integrate mmu_interval_notifier for movable regions, implement handlers for HMM faults and memory invalidation, and update memory region mapping logic to support movable regions. While MMU notifiers are commonly used in virtualization drivers, this implementation leverages HMM (Heterogeneous Memory Management) for its specialized functionality. HMM provides a framework for mirroring, invalidation, and fault handling, reducing boilerplate and improving maintainability compared to generic MMU notifiers. Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com> Reviewed-by: Nuno Das Neves <nunodasneves@linux.microsoft.com> Signed-off-by: Wei Liu <wei.liu@kernel.org>
2025-12-05mshv: Add refcount and locking to mem regionsStanislav Kinsburskii
Introduce kref-based reference counting and spinlock protection for memory regions in Hyper-V partition management. This change improves memory region lifecycle management and ensures thread-safe access to the region list. Previously, the regions list was protected by the partition mutex. However, this approach is too heavy for frequent fault and invalidation operations. Finer grained locking is now used to improve efficiency and concurrency. This is a precursor to supporting movable memory regions. Fault and invalidation handling for movable regions will require safe traversal of the region list and holding a region reference while performing invalidation or fault operations. Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com> Signed-off-by: Wei Liu <wei.liu@kernel.org>
2025-12-05mshv: Fix huge page handling in memory region traversalStanislav Kinsburskii
The previous code assumed that if a region's first page was huge, the entire region consisted of huge pages and stored this in a large_pages flag. This premise is incorrect not only for movable regions (where pages can be split and merged on invalidate callbacks or page faults), but even for pinned regions: THPs can be split and merged during allocation, so a large, pinned region may contain a mix of huge and regular pages. This change removes the large_pages flag and replaces region-wide assumptions with per-chunk inspection of the actual page size when mapping, unmapping, sharing, and unsharing. This makes huge page handling correct for mixed-page regions and avoids relying on stale metadata that can easily become invalid as memory is remapped. Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com> Reviewed-by: Anirudh Rayabharam (Microsoft) <anirudh@anirudhrb.com> Reviewed-by: Nuno Das Neves <nunodasneves@linux.microsoft.com> Signed-off-by: Wei Liu <wei.liu@kernel.org>
2025-12-05mshv: Move region management to mshv_regions.cStanislav Kinsburskii
Refactor memory region management functions from mshv_root_main.c into mshv_regions.c for better modularity and code organization. Adjust function calls and headers to use the new implementation. Improve maintainability and separation of concerns in the mshv_root module. Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com> Reviewed-by: Anirudh Rayabharam (Microsoft) <anirudh@anirudhrb.com> Reviewed-by: Nuno Das Neves <nunodasneves@linux.microsoft.com> Signed-off-by: Wei Liu <wei.liu@kernel.org>
2025-12-05mshv: Centralize guest memory region destructionStanislav Kinsburskii
Centralize guest memory region destruction to prevent resource leaks and inconsistent cleanup across unmap and partition destruction paths. Unify region removal, encrypted partition access recovery, and region invalidation to improve maintainability and reliability. Reduce code duplication and make future updates less error-prone by encapsulating cleanup logic in a single helper. Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com> Reviewed-by: Nuno Das Neves <nunodasneves@linux.microsoft.com> Reviewed-by: Anirudh Rayabharam (Microsoft) <anirudh@anirudhrb.com> Signed-off-by: Wei Liu <wei.liu@kernel.org>
2025-12-05mshv: Refactor and rename memory region handling functionsStanislav Kinsburskii
Simplify and unify memory region management to improve code clarity and reliability. Consolidate pinning and invalidation logic, adopt consistent naming, and remove redundant checks to reduce complexity. Enhance documentation and update call sites for maintainability. Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com> Reviewed-by: Nuno Das Neves <nunodasneves@linux.microsoft.com> Reviewed-by: Anirudh Rayabharam (Microsoft) <anirudh@anirudhrb.com> Signed-off-by: Wei Liu <wei.liu@kernel.org>
2025-12-05mshv: adjust interrupt control structure for ARM64Jinank Jain
Interrupt control structure (union hv_interupt_control) has different fields when it comes to x86 vs ARM64. Bring in the correct structure from HyperV header files and adjust the existing interrupt routing code accordingly. Signed-off-by: Jinank Jain <jinankjain@microsoft.com> Signed-off-by: Anirudh Rayabharam (Microsoft) <anirudh@anirudhrb.com> Signed-off-by: Wei Liu <wei.liu@kernel.org>
2025-12-05Drivers: hv: use kmalloc_array() instead of kmalloc()Gongwei Li
Replace kmalloc() with kmalloc_array() to prevent potential overflow, as recommended in Documentation/process/deprecated.rst. Signed-off-by: Gongwei Li <ligongwei@kylinos.cn> Signed-off-by: Wei Liu <wei.liu@kernel.org>
2025-12-05mshv: Add ioctl for self targeted passthrough hvcallsAnirudh Rayabharam (Microsoft)
Allow MSHV_ROOT_HVCALL IOCTL on the /dev/mshv fd. This IOCTL would execute a passthrough hypercall targeting the root/parent partition i.e. HV_PARTITION_ID_SELF. This will be useful for the VMM to query things like supported synthetic processor features, supported VMM capabiliites etc. Since hypercalls targeting the host partition could potentially perform privileged operations, allow only a limited set of hypercalls. To begin with, allow only: HVCALL_GET_PARTITION_PROPERTY HVCALL_GET_PARTITION_PROPERTY_EX Signed-off-by: Anirudh Rayabharam (Microsoft) <anirudh@anirudhrb.com> Reviewed-by: Nuno Das Neves <nunodasneves@linux.microsoft.com> Signed-off-by: Wei Liu <wei.liu@kernel.org>
2025-12-05Drivers: hv: Introduce mshv_vtl driverNaman Jain
Provide an interface for Virtual Machine Monitor like OpenVMM and its use as OpenHCL paravisor to control VTL0 (Virtual trust Level). Expose devices and support IOCTLs for features like VTL creation, VTL0 memory management, context switch, making hypercalls, mapping VTL0 address space to VTL2 userspace, getting new VMBus messages and channel events in VTL2 etc. Co-developed-by: Roman Kisel <romank@linux.microsoft.com> Signed-off-by: Roman Kisel <romank@linux.microsoft.com> Co-developed-by: Saurabh Sengar <ssengar@linux.microsoft.com> Signed-off-by: Saurabh Sengar <ssengar@linux.microsoft.com> Reviewed-by: Michael Kelley <mhklinux@outlook.com> Signed-off-by: Naman Jain <namjain@linux.microsoft.com> Signed-off-by: Wei Liu <wei.liu@kernel.org>
2025-12-02Merge tag 'core-rseq-2025-11-30' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull rseq updates from Thomas Gleixner: "A large overhaul of the restartable sequences and CID management: The recent enablement of RSEQ in glibc resulted in regressions which are caused by the related overhead. It turned out that the decision to invoke the exit to user work was not really a decision. More or less each context switch caused that. There is a long list of small issues which sums up nicely and results in a 3-4% regression in I/O benchmarks. The other detail which caused issues due to extra work in context switch and task migration is the CID (memory context ID) management. It also requires to use a task work to consolidate the CID space, which is executed in the context of an arbitrary task and results in sporadic uncontrolled exit latencies. The rewrite addresses this by: - Removing deprecated and long unsupported functionality - Moving the related data into dedicated data structures which are optimized for fast path processing. - Caching values so actual decisions can be made - Replacing the current implementation with a optimized inlined variant. - Separating fast and slow path for architectures which use the generic entry code, so that only fault and error handling goes into the TIF_NOTIFY_RESUME handler. - Rewriting the CID management so that it becomes mostly invisible in the context switch path. That moves the work of switching modes into the fork/exit path, which is a reasonable tradeoff. That work is only required when a process creates more threads than the cpuset it is allowed to run on or when enough threads exit after that. An artificial thread pool benchmarks which triggers this did not degrade, it actually improved significantly. The main effect in migration heavy scenarios is that runqueue lock held time and therefore contention goes down significantly" * tag 'core-rseq-2025-11-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (54 commits) sched/mmcid: Switch over to the new mechanism sched/mmcid: Implement deferred mode change irqwork: Move data struct to a types header sched/mmcid: Provide CID ownership mode fixup functions sched/mmcid: Provide new scheduler CID mechanism sched/mmcid: Introduce per task/CPU ownership infrastructure sched/mmcid: Serialize sched_mm_cid_fork()/exit() with a mutex sched/mmcid: Provide precomputed maximal value sched/mmcid: Move initialization out of line signal: Move MMCID exit out of sighand lock sched/mmcid: Convert mm CID mask to a bitmap cpumask: Cache num_possible_cpus() sched/mmcid: Use cpumask_weighted_or() cpumask: Introduce cpumask_weighted_or() sched/mmcid: Prevent pointless work in mm_update_cpus_allowed() sched/mmcid: Move scheduler code out of global header sched: Fixup whitespace damage sched/mmcid: Cacheline align MM CID storage sched/mmcid: Use proper data structures sched/mmcid: Revert the complex CID management ...
2025-11-28hv: convert mshv_ioctl_create_partition() to FD_ADD()Christian Brauner
Link: https://patch.msgid.link/20251123-work-fd-prepare-v4-39-b6efa1706cfd@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-15Drivers: hv: Export some symbols for mshv_vtlNaman Jain
MSHV_VTL driver is going to be introduced, which is supposed to provide interface for Virtual Machine Monitors (VMMs) to control Virtual Trust Level (VTL). Export the symbols needed to make it work (vmbus_isr, hv_context and hv_post_message). Co-developed-by: Roman Kisel <romank@linux.microsoft.com> Signed-off-by: Roman Kisel <romank@linux.microsoft.com> Co-developed-by: Saurabh Sengar <ssengar@linux.microsoft.com> Signed-off-by: Saurabh Sengar <ssengar@linux.microsoft.com> Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202506110544.q0NDMQVc-lkp@intel.com/ Reviewed-by: Michael Kelley <mhklinux@outlook.com> Signed-off-by: Naman Jain <namjain@linux.microsoft.com> Signed-off-by: Wei Liu <wei.liu@kernel.org>
2025-11-15mshv: Extend create partition ioctl to support cpu featuresMuminul Islam
The existing mshv create partition ioctl does not provide a way to specify which cpu features are enabled in the guest. Instead, it attempts to enable all features and those that are not supported are silently disabled by the hypervisor. This was done to reduce unnecessary complexity and is sufficient for many cases. However, new scenarios require fine-grained control over these features. Define a new mshv_create_partition_v2 structure which supports passing the disabled processor and xsave feature bits through to the create partition hypercall directly. Introduce a new flag MSHV_PT_BIT_CPU_AND_XSAVE_FEATURES which enables the new structure. If unset, the original mshv_create_partition struct is used, with the old behavior of enabling all features. Co-developed-by: Jinank Jain <jinankjain@microsoft.com> Signed-off-by: Jinank Jain <jinankjain@microsoft.com> Signed-off-by: Muminul Islam <muislam@microsoft.com> Signed-off-by: Nuno Das Neves <nunodasneves@linux.microsoft.com> Reviewed-by: Michael Kelley <mhklinux@outlook.com> Signed-off-by: Wei Liu <wei.liu@kernel.org>
2025-11-15mshv: Allow mappings that overlap in uaddrMagnus Kulke
Currently the MSHV driver rejects mappings that would overlap in userspace. Some VMMs require the same memory to be mapped to different parts of the guest's address space, and so working around this restriction is difficult. The hypervisor itself doesn't prohibit mappings that overlap in uaddr, (really in SPA; system physical addresses), so supporting this in the driver doesn't require any extra work: only the checks need to be removed. Since no userspace code until now has been able to overlap regions in userspace, relaxing this constraint can't break any existing code. Signed-off-by: Magnus Kulke <magnuskulke@linux.microsoft.com> Signed-off-by: Nuno Das Neves <nunodasneves@linux.microsoft.com> Reviewed-by: Michael Kelley <mhklinux@outlook.com> Signed-off-by: Wei Liu <wei.liu@kernel.org>
2025-11-15mshv: Fix create memory region overlap checkNuno Das Neves
The current check is incorrect; it only checks if the beginning or end of a region is within an existing region. This doesn't account for userspace specifying a region that begins before and ends after an existing region. Change the logic to a range intersection check against gfns and uaddrs for each region. Remove mshv_partition_region_by_uaddr() as it is no longer used. Fixes: 621191d709b1 ("Drivers: hv: Introduce mshv_root module to expose /dev/mshv to VMMs") Reported-by: Michael Kelley <mhklinux@outlook.com> Closes: https://lore.kernel.org/linux-hyperv/SN6PR02MB41575BE0406D3AB22E1D7DB5D4C2A@SN6PR02MB4157.namprd02.prod.outlook.com/ Signed-off-by: Nuno Das Neves <nunodasneves@linux.microsoft.com> Reviewed-by: Michael Kelley <mhklinux@outlook.com> Signed-off-by: Wei Liu <wei.liu@kernel.org>
2025-11-15mshv: add WQ_PERCPU to alloc_workqueue usersMarco Crivellari
Currently if a user enqueues a work item using schedule_delayed_work() the used wq is "system_wq" (per-cpu wq) while queue_delayed_work() use WORK_CPU_UNBOUND (used when a cpu is not specified). The same applies to schedule_work() that is using system_wq and queue_work(), that makes use again of WORK_CPU_UNBOUND. This lack of consistency cannot be addressed without refactoring the API. alloc_workqueue() treats all queues as per-CPU by default, while unbound workqueues must opt-in via WQ_UNBOUND. This default is suboptimal: most workloads benefit from unbound queues, allowing the scheduler to place worker threads where they’re needed and reducing noise when CPUs are isolated. This continues the effort to refactor workqueue APIs, which began with the introduction of new workqueues and a new alloc_workqueue flag in: commit 128ea9f6ccfb ("workqueue: Add system_percpu_wq and system_dfl_wq") commit 930c2ea566af ("workqueue: Add new WQ_PERCPU flag") This change adds a new WQ_PERCPU flag to explicitly request alloc_workqueue() to be per-cpu when WQ_UNBOUND has not been specified. With the introduction of the WQ_PERCPU flag (equivalent to !WQ_UNBOUND), any alloc_workqueue() caller that doesn’t explicitly specify WQ_UNBOUND must now use WQ_PERCPU. Once migration is complete, WQ_UNBOUND can be removed and unbound will become the implicit default. Suggested-by: Tejun Heo <tj@kernel.org> Signed-off-by: Marco Crivellari <marco.crivellari@suse.com> Signed-off-by: Wei Liu <wei.liu@kernel.org>
2025-11-15Drivers: hv: Use kmalloc_array() instead of kmalloc()Rahul Kumar
Documentation/process/deprecated.rst recommends against the use of kmalloc with dynamic size calculations due to the risk of overflow and smaller allocation being made than the caller was expecting. Replace kmalloc() with kmalloc_array() in hv_common.c to make the intended allocation size clearer and avoid potential overflow issues. The number of pages (pgcount) is bounded, so overflow is not a practical concern here. However, using kmalloc_array() better reflects the intent to allocate an array and improves consistency with other allocations. No functional change intended. Signed-off-by: Rahul Kumar <rk0006818@gmail.com> Signed-off-by: Wei Liu <wei.liu@kernel.org>
2025-11-15Drivers: hv: Resolve ambiguity in hypervisor version logStanislav Kinsburskii
Update the log message in hv_common_init to explicitly state that the reported version is for the Microsoft Hypervisor, not the host OS. Previously, this message was accurate for guests running on Windows hosts, where the host and hypervisor versions matched. With support for Linux hosts running the Hyper-V hypervisor, the host OS and hypervisor versions may differ. This change avoids confusion by making it clear that the version refers to the Microsoft Hypervisor regardless of the host operating system. Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com> Reviewed-by: Nuno Das Neves <nunodasneves@linux.microsoft.com> Signed-off-by: Wei Liu <wei.liu@kernel.org>
2025-11-15Drivers: hv: fix missing kernel-doc description for 'size' in request_arr_init()Kriish Sharma
Add missing kernel-doc entry for the @size parameter in request_arr_init(), fixing the following documentation warning reported by the kernel test robot and detected via kernel-doc: Warning: drivers/hv/channel.c:595 function parameter 'size' not described in 'request_arr_init' Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202503021934.wH1BERla-lkp@intel.com Signed-off-by: Kriish Sharma <kriish.sharma2006@gmail.com> Reviewed-by: Easwar Hariharan <easwar.hariharan@linux.microsoft.com> Signed-off-by: Wei Liu <wei.liu@kernel.org>
2025-11-15mshv: Introduce new hypercall to map stats page for L1VH partitionsJinank Jain
Introduce HVCALL_MAP_STATS_PAGE2 which provides a map location (GPFN) to map the stats to. This hypercall is required for L1VH partitions, depending on the hypervisor version. This uses the same check as the state page map location; mshv_use_overlay_gpfn(). Add mshv_map_vp_state_page() helpers to use this new hypercall or the old one depending on availability. For unmapping, the original HVCALL_UNMAP_STATS_PAGE works for both cases. Signed-off-by: Jinank Jain <jinankjain@linux.microsoft.com> Signed-off-by: Nuno Das Neves <nunodasneves@linux.microsoft.com> Reviewed-by: Easwar Hariharan <easwar.hariharan@linux.microsoft.com> Signed-off-by: Wei Liu <wei.liu@kernel.org>
2025-11-15mshv: Allocate vp state page for HVCALL_MAP_VP_STATE_PAGE on L1VHJinank Jain
Introduce mshv_use_overlay_gpfn() to check if a page needs to be allocated and passed to the hypervisor to map VP state pages. This is only needed on L1VH, and only on some (newer) versions of the hypervisor, hence the need to check vmm_capabilities. Introduce functions hv_map/unmap_vp_state_page() to handle the allocation and freeing. Signed-off-by: Jinank Jain <jinankjain@linux.microsoft.com> Signed-off-by: Nuno Das Neves <nunodasneves@linux.microsoft.com> Reviewed-by: Praveen K Paladugu <prapal@linux.microsoft.com> Reviewed-by: Easwar Hariharan <easwar.hariharan@linux.microsoft.com> Reviewed-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com> Reviewed-by: Anirudh Rayabharam <anirudh@anirudhrb.com> Reviewed-by: Tianyu Lan <tiala@microsoft.com> Signed-off-by: Wei Liu <wei.liu@kernel.org>
2025-11-15mshv: Get the vmm capabilities offered by the hypervisorPurna Pavan Chandra Aekkaladevi
Some hypervisor APIs are gated by feature bits in the "vmm capabilities" partition property. Store the capabilities on mshv_root module init, using HVCALL_GET_PARTITION_PROPERTY_EX. This is not supported on all hypervisors. In that case, just set the capabilities to 0 and proceed as normal. Signed-off-by: Purna Pavan Chandra Aekkaladevi <paekkaladevi@linux.microsoft.com> Signed-off-by: Nuno Das Neves <nunodasneves@linux.microsoft.com> Reviewed-by: Praveen K Paladugu <prapal@linux.microsoft.com> Reviewed-by: Easwar Hariharan <easwar.hariharan@linux.microsoft.com> Reviewed-by: Tianyu Lan <tiala@microsoft.com> Signed-off-by: Wei Liu <wei.liu@kernel.org>
2025-11-15mshv: Add the HVCALL_GET_PARTITION_PROPERTY_EX hypercallPurna Pavan Chandra Aekkaladevi
This hypercall can be used to fetch extended properties of a partition. Extended properties are properties with values larger than a u64. Some of these also need additional input arguments. Add helper function for using the hypercall in the mshv_root driver. Signed-off-by: Purna Pavan Chandra Aekkaladevi <paekkaladevi@linux.microsoft.com> Signed-off-by: Nuno Das Neves <nunodasneves@linux.microsoft.com> Reviewed-by: Anirudh Rayabharam <anirudh@anirudhrb.com> Reviewed-by: Praveen K Paladugu <prapal@linux.microsoft.com> Reviewed-by: Easwar Hariharan <easwar.hariharan@linux.microsoft.com> Reviewed-by: Tianyu Lan <tiala@microsoft.com> Signed-off-by: Wei Liu <wei.liu@kernel.org>
2025-11-15mshv: Only map vp->vp_stats_pages if on root schedulerNuno Das Neves
This mapping is only used for checking if the dispatch thread is blocked. This is only relevant for the root scheduler, so check the scheduler type to determine whether to map/unmap these pages, instead of the current check, which is incorrect. Signed-off-by: Nuno Das Neves <nunodasneves@linux.microsoft.com> Reviewed-by: Anirudh Rayabharam <anirudh@anirudhrb.com> Reviewed-by: Praveen K Paladugu <prapal@linux.microsoft.com> Reviewed-by: Easwar Hariharan <easwar.hariharan@linux.microsoft.com> Reviewed-by: Tianyu Lan <tiala@microsoft.com> Acked-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com> Reviewed-by: Anirudh Rayabharam (Microsoft) <anirudh@anirudhrb.com> Signed-off-by: Wei Liu <wei.liu@kernel.org>
2025-11-15Drivers: hv: Support establishing the confidential VMBus connectionRoman Kisel
To establish the confidential VMBus connection the CoCo VM, the guest first checks on the confidential VMBus availability, and then proceeds to initializing the communication stack. Implement that in the VMBus driver initialization. Signed-off-by: Roman Kisel <romank@linux.microsoft.com> Reviewed-by: Michael Kelley <mhklinux@outlook.com> Signed-off-by: Wei Liu <wei.liu@kernel.org>
2025-11-15Drivers: hv: Set the default VMBus version to 6.0Roman Kisel
The confidential VMBus is supported by the protocol version 6.0 onwards. Attempt to establish the VMBus 6.0 connection thus enabling the confidential VMBus features when available. Signed-off-by: Roman Kisel <romank@linux.microsoft.com> Reviewed-by: Michael Kelley <mhklinux@outlook.com> Signed-off-by: Wei Liu <wei.liu@kernel.org>
2025-11-15Drivers: hv: Support confidential VMBus channelsRoman Kisel
To make use of Confidential VMBus channels, initialize the co_ring_buffers and co_external_memory fields of the channel structure. Advertise support upon negotiating the version and compute values for those fields and initialize them. Signed-off-by: Roman Kisel <romank@linux.microsoft.com> Reviewed-by: Michael Kelley <mhklinux@outlook.com> Signed-off-by: Wei Liu <wei.liu@kernel.org>
2025-11-15Drivers: hv: Free msginfo when the buffer fails to decryptRoman Kisel
The early failure path in __vmbus_establish_gpadl() doesn't deallocate msginfo if the buffer fails to decrypt. Fix the leak by breaking out the cleanup code into a separate function and calling it where required. Fixes: d4dccf353db80 ("Drivers: hv: vmbus: Mark vmbus ring buffer visible to host in Isolation VM") Reported-by: Michael Kelley <mkhlinux@outlook.com> Closes: https://lore.kernel.org/linux-hyperv/SN6PR02MB41573796F9787F67E0E97049D472A@SN6PR02MB4157.namprd02.prod.outlook.com Signed-off-by: Roman Kisel <romank@linux.microsoft.com> Reviewed-by: Michael Kelley <mhklinux@outlook.com> Signed-off-by: Wei Liu <wei.liu@kernel.org>
2025-11-15Drivers: hv: Allocate encrypted buffers when requestedRoman Kisel
Confidential VMBus is built around using buffers not shared with the host. Support allocating encrypted buffers when requested. Signed-off-by: Roman Kisel <romank@linux.microsoft.com> Reviewed-by: Tianyu Lan <tiala@microsoft.com> Reviewed-by: Michael Kelley <mhklinux@outlook.com> Signed-off-by: Wei Liu <wei.liu@kernel.org>
2025-11-15Drivers: hv: Functions for setting up and tearing down the paravisor SynICRoman Kisel
The confidential VMBus runs with the paravisor SynIC and requires configuring it with the paravisor. Add the functions for configuring the paravisor SynIC. Update overall SynIC initialization logic to initialize the SynIC if it is present. Finally, break out SynIC interrupt enable/disable code into separate functions so that SynIC interrupts can be enabled or disabled via the paravisor instead of the hypervisor if the paravisor SynIC is present. Signed-off-by: Roman Kisel <romank@linux.microsoft.com> Reviewed-by: Michael Kelley <mhklinux@outlook.com> Signed-off-by: Wei Liu <wei.liu@kernel.org>
2025-11-15Drivers: hv: Rename the SynIC enable and disable routinesRoman Kisel
The confidential VMBus requires support for the both hypervisor facing SynIC and the paravisor one. Rename the functions that enable and disable SynIC with the hypervisor. No functional changes. Signed-off-by: Roman Kisel <romank@linux.microsoft.com> Reviewed-by: Tianyu Lan <tiala@microsoft.com> Reviewed-by: Michael Kelley <mhklinux@outlook.com> Signed-off-by: Wei Liu <wei.liu@kernel.org>
2025-11-15Drivers: hv: Check message and event pages for non-NULL before iounmap()Roman Kisel
It might happen that some hyp SynIC pages aren't allocated. Check for that and only then call iounmap(). Signed-off-by: Roman Kisel <romank@linux.microsoft.com> Reviewed-by: Tianyu Lan <tiala@microsoft.com> Reviewed-by: Michael Kelley <mhklinux@outlook.com> Signed-off-by: Wei Liu <wei.liu@kernel.org>
2025-11-15Drivers: hv: remove stale commentRoman Kisel
The comment about the x2v shim is ancient and long since incorrect. Remove the incorrect comment. Signed-off-by: Roman Kisel <romank@linux.microsoft.com> Reviewed-by: Tianyu Lan <tiala@microsoft.com> Reviewed-by: Michael Kelley <mhklinux@outlook.com> Signed-off-by: Wei Liu <wei.liu@kernel.org>
2025-11-15Drivers: hv: Post messages through the confidential VMBus if availableRoman Kisel
When the confidential VMBus is available, the guest should post messages to the paravisor. Update hv_post_message() to post messages to the paravisor rather than through GHCB or TD calls. Signed-off-by: Roman Kisel <romank@linux.microsoft.com> Reviewed-by: Tianyu Lan <tiala@microsoft.com> Reviewed-by: Michael Kelley <mhklinux@outlook.com> Signed-off-by: Wei Liu <wei.liu@kernel.org>
2025-11-15Drivers: hv: Allocate the paravisor SynIC pages when requiredRoman Kisel
Confidential VMBus requires interacting with two SynICs -- one provided by the host hypervisor, and one provided by the paravisor. Each SynIC requires its own message and event pages. Refactor and extend the existing code to add allocating and freeing the message and event pages for the paravisor SynIC when it is present. Signed-off-by: Roman Kisel <romank@linux.microsoft.com> Reviewed-by: Tianyu Lan <tiala@microsoft.com> Reviewed-by: Michael Kelley <mhklinux@outlook.com> Signed-off-by: Wei Liu <wei.liu@kernel.org>
2025-11-15Drivers: hv: Rename fields for SynIC message and event pagesRoman Kisel
Confidential VMBus requires interacting with two SynICs -- one provided by the host hypervisor, and one provided by the paravisor. Each SynIC requires its own message and event pages. Rename the existing host-accessible SynIC message and event pages with the "hyp_" prefix to clearly distinguish them from the paravisor ones. The field name is also changed in mshv_root.* for consistency. No functional changes. Signed-off-by: Roman Kisel <romank@linux.microsoft.com> Reviewed-by: Tianyu Lan <tiala@microsoft.com> Reviewed-by: Michael Kelley <mhklinux@outlook.com> Signed-off-by: Wei Liu <wei.liu@kernel.org>
2025-11-15arch/x86: mshyperv: Trap on access for some synthetic MSRsRoman Kisel
hv_set_non_nested_msr() has special handling for SINT MSRs when a paravisor is present. In addition to updating the MSR on the host, the mirror MSR in the paravisor is updated, including with the proxy bit. But with Confidential VMBus, the proxy bit must not be used, so add a special case to skip it. Signed-off-by: Roman Kisel <romank@linux.microsoft.com> Reviewed-by: Alok Tiwari <alok.a.tiwari@oracle.com> Reviewed-by: Tianyu Lan <tiala@microsoft.com> Reviewed-by: Michael Kelley <mhklinux@outlook.com> Signed-off-by: Wei Liu <wei.liu@kernel.org>
2025-11-15arch: hyperv: Get/set SynIC synth.registers via paravisorRoman Kisel
The existing Hyper-V wrappers for getting and setting MSRs are hv_get/set_msr(). Via hv_get/set_non_nested_msr(), they detect when running in a CoCo VM with a paravisor, and use the TDX or SNP guest-host communication protocol to bypass the paravisor and go directly to the host hypervisor for SynIC MSRs. The "set" function also implements the required special handling for the SINT MSRs. Provide functions that allow manipulating the SynIC registers through the paravisor. Move vmbus_signal_eom() to a more appropriate location (which also avoids breaking KVM). Signed-off-by: Roman Kisel <romank@linux.microsoft.com> Reviewed-by: Alok Tiwari <alok.a.tiwari@oracle.com> Reviewed-by: Michael Kelley <mhklinux@outlook.com> Signed-off-by: Wei Liu <wei.liu@kernel.org>
2025-11-15Drivers: hv: VMBus protocol version 6.0Roman Kisel
The confidential VMBus is supported starting from the protocol version 6.0 onwards. Provide the required definitions. No functional changes. Signed-off-by: Roman Kisel <romank@linux.microsoft.com> Reviewed-by: Alok Tiwari <alok.a.tiwari@oracle.com> Reviewed-by: Michael Kelley <mhklinux@outlook.com> Signed-off-by: Wei Liu <wei.liu@kernel.org>
2025-11-15drivers: hv: Allow vmbus message synic interrupt injected from Hyper-VTianyu Lan
When Secure AVIC is enabled, VMBus driver should call x2apic Secure AVIC interface to allow Hyper-V to inject VMBus message interrupt. Reviewed-by: Michael Kelley <mhklinux@outlook.com> Reviewed-by: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com> Signed-off-by: Tianyu Lan <tiala@microsoft.com> Signed-off-by: Wei Liu <wei.liu@kernel.org>
2025-11-15mshv: Fix deposit memory in MSHV_ROOT_HVCALLNuno Das Neves
When the MSHV_ROOT_HVCALL ioctl is executing a hypercall, and gets HV_STATUS_INSUFFICIENT_MEMORY, it deposits memory and then returns -EAGAIN to userspace. The expectation is that the VMM will retry. However, some VMM code in the wild doesn't do this and simply fails. Rather than force the VMM to retry, change the ioctl to deposit memory on demand and immediately retry the hypercall as is done with all the other hypercall helper functions. In addition to making the ioctl easier to use, removing the need for multiple syscalls improves performance. There is a complication: unlike the other hypercall helper functions, in MSHV_ROOT_HVCALL the input is opaque to the kernel. This is problematic for rep hypercalls, because the next part of the input list can't be copied on each loop after depositing pages (this was the original reason for returning -EAGAIN in this case). Introduce hv_do_rep_hypercall_ex(), which adds a 'rep_start' parameter. This solves the issue, allowing the deposit loop in MSHV_ROOT_HVCALL to restart a rep hypercall after depositing pages partway through. Fixes: 621191d709b1 ("Drivers: hv: Introduce mshv_root module to expose /dev/mshv to VMMs") Signed-off-by: Nuno Das Neves <nunodasneves@linux.microsoft.com> Reviewed-by: Michael Kelley <mhklinux@outlook.com> Signed-off-by: Wei Liu <wei.liu@kernel.org>
2025-11-15mshv: Fix VpRootDispatchThreadBlocked valueNuno Das Neves
This value in the VP stats page is used to track if the VP can be dispatched for execution when there are no fast interrupts injected. The original value of 201 was used in a version of the hypervisor which did not ship. It was subsequently changed to 202 so that is the correct value. Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com> Signed-off-by: Nuno Das Neves <nunodasneves@linux.microsoft.com> Signed-off-by: Wei Liu <wei.liu@kernel.org>
2025-11-14syscore: Pass context data to callbacksThierry Reding
Several drivers can benefit from registering per-instance data along with the syscore operations. To achieve this, move the modifiable fields out of the syscore_ops structure and into a separate struct syscore that can be registered with the framework. Add a void * driver data field for drivers to store contextual data that will be passed to the syscore ops. Acked-by: Rafael J. Wysocki (Intel) <rafael@kernel.org> Signed-off-by: Thierry Reding <treding@nvidia.com>
2025-11-04rseq, virt: Retrigger RSEQ after vcpu_run()Thomas Gleixner
Hypervisors invoke resume_user_mode_work() before entering the guest, which clears TIF_NOTIFY_RESUME. The @regs argument is NULL as there is no user space context available to them, so the rseq notify handler skips inspecting the critical section, but updates the CPU/MM CID values unconditionally so that the eventual pending rseq event is not lost on the way to user space. This is a pointless exercise as the task might be rescheduled before actually returning to user space and it creates unnecessary work in the vcpu_run() loops. It's way more efficient to ignore that invocation based on @regs == NULL and let the hypervisors re-raise TIF_NOTIFY_RESUME after returning from the vcpu_run() loop before returning from the ioctl(). This ensures that a pending RSEQ update is not lost and the IDs are updated before returning to user space. Once the RSEQ handling is decoupled from TIF_NOTIFY_RESUME, this turns into a NOOP. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Acked-by: Sean Christopherson <seanjc@google.com> Link: https://patch.msgid.link/20251027084306.399495855@linutronix.de
2025-10-01Drivers: hv: Make CONFIG_HYPERV boolMukesh Rathor
With CONFIG_HYPERV and CONFIG_HYPERV_VMBUS separated, change CONFIG_HYPERV to bool from tristate. CONFIG_HYPERV now becomes the core Hyper-V hypervisor support, such as hypercalls, clocks/timers, Confidential Computing setup, PCI passthru, etc. that doesn't involve VMBus or VMBus devices. Signed-off-by: Mukesh Rathor <mrathor@linux.microsoft.com> Signed-off-by: Wei Liu <wei.liu@kernel.org>