path: root/arch/x86/kernel
Age | Commit message | Author
2025-11-18  x86/traps: Communicate a LASS violation in #GP message  (Alexander Shishkin)

A LASS violation typically results in a #GP. With LASS active, any invalid access to user memory (including the first page frame) would be reported as a #GP, instead of a #PF. Unfortunately, the #GP error messages provide limited information about the cause of the fault. This could be confusing for kernel developers and users who are accustomed to the friendly #PF messages.

To make the transition easier, enhance the #GP Oops message to include a hint about LASS violations. Also, add a special hint for kernel NULL pointer dereferences to match with the existing #PF message.

Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Sohil Mehta <sohil.mehta@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://patch.msgid.link/20251118182911.2983253-7-sohil.mehta%40intel.com
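A minimal sketch of the kind of hint this adds, assuming a helper called from the #GP handler; the function name, the feature-bit check, and the message strings are illustrative, not the exact upstream code:

    static const char *get_kernel_gp_hint(unsigned long addr)
    {
            /* Mirror the friendly #PF wording for the first page frame. */
            if (addr < PAGE_SIZE)
                    return "kernel NULL pointer dereference";

            /* An address in the lower (user) half points at LASS. */
            if (cpu_feature_enabled(X86_FEATURE_LASS) && addr < BIT_ULL(63))
                    return "LASS prevented access to userspace address";

            return NULL;
    }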
2025-11-18  x86/kexec: Disable LASS during relocate kernel  (Sohil Mehta)

The relocate kernel mechanism uses an identity mapping to copy the new kernel, which leads to a LASS violation when executing from a low address.

LASS must be disabled after the original CR4 value is saved, because kexec paths that preserve context need to restore CR4.LASS. But disabling it along with CET during identity_mapped() is too late. So, disable LASS immediately after saving CR4, along with PGE, and before jumping to the identity-mapped page.

Signed-off-by: Sohil Mehta <sohil.mehta@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://patch.msgid.link/20251118182911.2983253-6-sohil.mehta%40intel.com
2025-11-18  x86/alternatives: Disable LASS when patching kernel code  (Sohil Mehta)

For patching, the kernel initializes a temporary mm area in the lower half of the address range. LASS blocks these accesses because its enforcement relies on bit 63 of the virtual address, as opposed to SMAP, which depends on the _PAGE_BIT_USER bit in the page table.

Disable LASS enforcement by toggling the RFLAGS.AC bit during patching to avoid triggering a #GP fault. Introduce LASS-specific STAC/CLAC helpers to set the AC bit only on platforms that need it. Name the wrappers lass_stac()/_clac() instead of lass_disable()/_enable() because they only control the kernel data access enforcement. The entire LASS mechanism (including instruction fetch enforcement) is controlled by the CR4.LASS bit.

Describe the usage of the new helpers in comparison to the ones used for SMAP. Also, add comments to explain when the existing stac()/clac() should be used. While at it, move the duplicated "barrier" comment to the same block.

The text poking functions use standard memcpy()/memset() while patching kernel code. However, objtool complains about calling such dynamic functions within an AC=1 region. See warning #9, regarding function calls with UACCESS enabled, in tools/objtool/Documentation/objtool.txt.

To pacify objtool, one option is to add memcpy() and memset() to the list of allowed functions. However, that would provide a blanket exemption for all usages of memcpy() and memset(). Instead, replace the standard calls in the text poking functions with their unoptimized, always-inlined versions. Considering that patching is usually small, no performance impact is expected.

Signed-off-by: Sohil Mehta <sohil.mehta@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://patch.msgid.link/20251118182911.2983253-5-sohil.mehta%40intel.com
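A minimal sketch of the wrappers as described, assuming the X86_FEATURE_LASS cpufeature bit and the existing stac()/clac() inlines from <asm/smap.h>; exact placement and form may differ from the patch:

    /* Only toggle AC when LASS is actually enforced on this CPU. */
    static __always_inline void lass_stac(void)
    {
            if (cpu_feature_enabled(X86_FEATURE_LASS))
                    stac();         /* suspend LASS data-access checks */
    }

    static __always_inline void lass_clac(void)
    {
            if (cpu_feature_enabled(X86_FEATURE_LASS))
                    clac();         /* re-enable LASS data-access checks */
    }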
2025-11-18  x86/cpu: Add an LASS dependency on SMAP  (Sohil Mehta)

With LASS enabled, any kernel data access to userspace typically results in a #GP, or a #SS in some stack-related cases. When the kernel needs to access user memory, it can suspend LASS enforcement by toggling the RFLAGS.AC bit. Most of these cases are already covered by the stac()/clac() pairs used to avoid SMAP violations.

Even though LASS could potentially be enabled independently, it would be very painful without SMAP and the related stac()/clac() calls. There is no reason to support such a configuration because all future hardware with LASS is expected to have SMAP as well. Also, the STAC/CLAC instructions are architected to:

  #UD - If CPUID.(EAX=07H, ECX=0H):EBX.SMAP[bit 20] = 0.

So, make LASS depend on SMAP to conveniently reuse the existing AC bit toggling already in place.

Note: Additional STAC/CLAC would still be needed for accesses such as text poking which are not flagged by SMAP. This is because such mappings are in the lower half but do not have the _PAGE_USER bit set, which SMAP uses for enforcement.

Signed-off-by: Sohil Mehta <sohil.mehta@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://patch.msgid.link/20251118182911.2983253-3-sohil.mehta%40intel.com
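The dependency itself is a one-line entry in the existing cpuid_deps[] table in arch/x86/kernel/cpu/cpuid-deps.c; a sketch, with the LASS feature-bit name assumed:

    static const struct cpuid_dep cpuid_deps[] = {
            /* ... existing entries ... */
            { X86_FEATURE_LASS,             X86_FEATURE_SMAP },
            {}
    };

With this in place, clearing SMAP (e.g. via clearcpuid or missing hardware support) automatically clears LASS as well.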
2025-11-16  mm: consistently use current->mm in mm_get_unmapped_area()  (Ryan Roberts)

mm_get_unmapped_area() is a wrapper around arch_get_unmapped_area() / arch_get_unmapped_area_topdown(), both of which search current->mm for some free space. Neither takes an mm_struct - they implicitly operate on current->mm. But the wrapper takes an mm_struct and uses it to decide whether to search bottom up or top down.

All callers pass in current->mm for this, so everything is working consistently. But it feels like an accident waiting to happen; eventually someone will call that function with a different mm, expecting to find free space in it, but what gets returned is free space in the current mm. So let's simplify by removing the parameter and having the wrapper use current->mm to decide which end to start at. Now everything is consistent and self-documenting.

Link: https://lkml.kernel.org/r/20251003155306.2147572-1-ryan.roberts@arm.com
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Reviewed-by: Dev Jain <dev.jain@arm.com>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
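A sketch of the simplified wrapper's shape after the change; the argument list is abbreviated and the MMF_TOPDOWN test is an assumption about how the search direction is recorded:

    unsigned long
    mm_get_unmapped_area(struct file *file, unsigned long addr,
                         unsigned long len, unsigned long pgoff,
                         unsigned long flags)
    {
            /*
             * Both callees implicitly search current->mm, so derive the
             * search direction from current->mm as well - no mm argument.
             */
            if (test_bit(MMF_TOPDOWN, &current->mm->flags))
                    return arch_get_unmapped_area_topdown(file, addr, len,
                                                          pgoff, flags);
            return arch_get_unmapped_area(file, addr, len, pgoff, flags);
    }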
2025-11-15  Merge tag 'x86-urgent-2025-11-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip  (Linus Torvalds)

Pull x86 fixes from Ingo Molnar:

 - Update the list of AMD microcode minimum Entrysign revisions

 - Add additional fixed AMD RDSEED microcode revisions

 - Update the language transliteration for Kiryl Shutsemau's name in the MAINTAINERS entry

* tag 'x86-urgent-2025-11-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/microcode/AMD: Add Zen5 model 0x44, stepping 0x1 minrev
  x86/CPU/AMD: Add additional fixed RDSEED microcode revisions
  MAINTAINERS: Update name spelling
2025-11-15  x86/hyperv: Rename guest crash shutdown function  (Mukesh Rathor)

Rename hv_machine_crash_shutdown to the more appropriate hv_guest_crash_shutdown and make it applicable to guests only. This is in preparation for the subsequent hypervisor root crash support patches.

Signed-off-by: Mukesh Rathor <mrathor@linux.microsoft.com>
Signed-off-by: Wei Liu <wei.liu@kernel.org>
2025-11-15  x86: mshyperv: Remove duplicate asm/msr.h header  (Jiapeng Chong)

./arch/x86/kernel/cpu/mshyperv.c: asm/msr.h is included more than once.

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=26164
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Signed-off-by: Wei Liu <wei.liu@kernel.org>
2025-11-15  arch/x86: mshyperv: Trap on access for some synthetic MSRs  (Roman Kisel)

hv_set_non_nested_msr() has special handling for SINT MSRs when a paravisor is present. In addition to updating the MSR on the host, the mirror MSR in the paravisor is updated, including with the proxy bit. But with Confidential VMBus, the proxy bit must not be used, so add a special case to skip it.

Signed-off-by: Roman Kisel <romank@linux.microsoft.com>
Reviewed-by: Alok Tiwari <alok.a.tiwari@oracle.com>
Reviewed-by: Tianyu Lan <tiala@microsoft.com>
Reviewed-by: Michael Kelley <mhklinux@outlook.com>
Signed-off-by: Wei Liu <wei.liu@kernel.org>
2025-11-15  arch: hyperv: Get/set SynIC synth.registers via paravisor  (Roman Kisel)

The existing Hyper-V wrappers for getting and setting MSRs are hv_get/set_msr(). Via hv_get/set_non_nested_msr(), they detect when running in a CoCo VM with a paravisor, and use the TDX or SNP guest-host communication protocol to bypass the paravisor and go directly to the host hypervisor for SynIC MSRs. The "set" function also implements the required special handling for the SINT MSRs.

Provide functions that allow manipulating the SynIC registers through the paravisor. Move vmbus_signal_eom() to a more appropriate location (which also avoids breaking KVM).

Signed-off-by: Roman Kisel <romank@linux.microsoft.com>
Reviewed-by: Alok Tiwari <alok.a.tiwari@oracle.com>
Reviewed-by: Michael Kelley <mhklinux@outlook.com>
Signed-off-by: Wei Liu <wei.liu@kernel.org>
2025-11-15  arch/x86: mshyperv: Discover Confidential VMBus availability  (Roman Kisel)

Confidential VMBus requires enabling paravisor SynIC, and the x86_64 guest has to inspect the Virtualization Stack (VS) CPUID leaf to see if Confidential VMBus is available. If it is, the guest shall enable the paravisor SynIC.

Read the relevant data from the VS CPUID leaf. Refactor the code to avoid repeating CPUID and add flags to the struct ms_hyperv_info.

For ARM64, the flag for Confidential VMBus is not set, which provides the desired behaviour for now as it is not available on ARM64 just yet. Once ARM64 CCA guests are supported, this flag will be set unconditionally when running such a guest.

Signed-off-by: Roman Kisel <romank@linux.microsoft.com>
Reviewed-by: Michael Kelley <mhklinux@outlook.com>
Signed-off-by: Wei Liu <wei.liu@kernel.org>
2025-11-15  x86/hyperv: Don't use auto-eoi when Secure AVIC is available  (Tianyu Lan)

Hyper-V doesn't support auto-eoi with Secure AVIC. So set the HV_DEPRECATING_AEOI_RECOMMENDED flag to force writing the EOI register after handling an interrupt.

Reviewed-by: Michael Kelley <mhklinux@outlook.com>
Reviewed-by: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com>
Signed-off-by: Tianyu Lan <tiala@microsoft.com>
Signed-off-by: Wei Liu <wei.liu@kernel.org>
2025-11-14  Merge tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf  (Linus Torvalds)

Pull bpf fixes from Alexei Starovoitov:

 - Fix interaction between livepatch and BPF fexit programs (Song Liu)
   With Steven and Masami acks.

 - Fix stack ORC unwind from BPF kprobe_multi (Jiri Olsa)
   With Steven and Masami acks.

 - Fix out of bounds access in widen_imprecise_scalars() in the verifier (Eduard Zingerman)

 - Fix conflicts between MPTCP and BPF sockmap (Jiayuan Chen)

 - Fix net_sched storage collision with BPF data_meta/data_end (Eric Dumazet)

 - Add _impl suffix to BPF kfuncs with implicit args to avoid breaking them in bpf-next when KF_IMPLICIT_ARGS is added (Mykyta Yatsenko)

* tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
  selftests/bpf: Test widen_imprecise_scalars() with different stack depth
  bpf: account for current allocated stack depth in widen_imprecise_scalars()
  bpf: Add bpf_prog_run_data_pointers()
  selftests/bpf: Add mptcp test with sockmap
  mptcp: Fix proto fallback detection with BPF
  mptcp: Disallow MPTCP subflows from sockmap
  selftests/bpf: Add stacktrace ips test for raw_tp
  selftests/bpf: Add stacktrace ips test for kprobe_multi/kretprobe_multi
  x86/fgraph,bpf: Fix stack ORC unwind from kprobe_multi return probe
  Revert "perf/x86: Always store regs->ip in perf_callchain_kernel()"
  bpf: add _impl suffix for bpf_stream_vprintk() kfunc
  bpf: add _impl suffix for bpf_task_work_schedule* kfuncs
  selftests/bpf: Add tests for livepatch + bpf trampoline
  ftrace: bpf: Fix IPMODIFY + DIRECT in modify_ftrace_direct()
  ftrace: Fix BPF fexit with livepatch
2025-11-14  x86/bugs: Get rid of the forward declarations  (Borislav Petkov (AMD))

Get rid of the forward declarations of the mitigation functions by moving their single caller below them.

No functional changes.

Suggested-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: Josh Poimboeuf <jpoimboe@kernel.org>
Link: https://lore.kernel.org/r/20251105200447.GBaQut3w4dLilZrX-z@fat_crate.local
2025-11-14  x86/microcode/AMD: Add Zen5 model 0x44, stepping 0x1 minrev  (Borislav Petkov (AMD))

Add the minimum Entrysign revision for that model+stepping to the list of minimum revisions.

Fixes: 50cef76d5cb0 ("x86/microcode/AMD: Load only SHA256-checksummed patches")
Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: <stable@kernel.org>
Link: https://lore.kernel.org/r/e94dd76b-4911-482f-8500-5c848a3df026@citrix.com
2025-11-14  x86/CPU/AMD: Add additional fixed RDSEED microcode revisions  (Mario Limonciello)

Microcode that resolves the RDSEED failure (SB-7055 [1]) has been released for additional Zen5 models to linux-firmware [2]. Update the zen5_rdseed_microcode array to cover these new models.

Fixes: 607b9fb2ce24 ("x86/CPU/AMD: Add RDSEED fix for Zen5")
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: <stable@kernel.org>
Link: https://www.amd.com/en/resources/product-security/bulletin/amd-sb-7055.html [1]
Link: https://gitlab.com/kernel-firmware/linux-firmware/-/commit/6167e5566900cf236f7a69704e8f4c441bc7212a [2]
Link: https://patch.msgid.link/20251113223608.1495655-1-mario.limonciello@amd.com
2025-11-14  syscore: Pass context data to callbacks  (Thierry Reding)

Several drivers can benefit from registering per-instance data along with the syscore operations. To achieve this, move the modifiable fields out of the syscore_ops structure and into a separate struct syscore that can be registered with the framework. Add a void * driver data field for drivers to store contextual data that will be passed to the syscore ops.

Acked-by: Rafael J. Wysocki (Intel) <rafael@kernel.org>
Signed-off-by: Thierry Reding <treding@nvidia.com>
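A sketch of what the split might look like; the struct layout, field names, and callback signature here are assumptions based on the description, not the actual patch:

    struct syscore {
            struct list_head node;            /* modifiable: list linkage */
            const struct syscore_ops *ops;    /* immutable callback table */
            void *data;                       /* per-instance driver data */
    };

    /*
     * Callbacks then receive the instance rather than nothing, e.g.:
     *   int (*suspend)(struct syscore *sc);
     * and a driver recovers its context via sc->data.
     */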
2025-11-13  Merge branches 'acpi-cppc' and 'acpi-tables'  (Rafael J. Wysocki)

Merge ACPI CPPC library fixes and an ACPI MRRM table parser fix for 6.18-rc6.

* acpi-cppc:
  ACPI: CPPC: Limit perf ctrs in PCC check only to online CPUs
  ACPI: CPPC: Perform fast check switch only for online CPUs
  ACPI: CPPC: Check _CPC validity for only the online CPUs
  ACPI: CPPC: Detect preferred core availability on online CPUs

* acpi-tables:
  ACPI: MRRM: Fix memory leaks and improve error handling
2025-11-13  Merge tag 'v6.18-rc5' into objtool/core, to pick up fixes  (Ingo Molnar)

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2025-11-12  x86: Restrict KVM-induced symbol exports to KVM modules where obvious/possible  (Sean Christopherson)

Extend KVM's export macro framework to provide EXPORT_SYMBOL_FOR_KVM(), and use the helper macro to export symbols for KVM throughout x86 if and only if KVM will build one or more modules, and only for those modules. To avoid unnecessary exports when CONFIG_KVM=m but kvm.ko will not be built (because no vendor modules are selected), let arch code #define EXPORT_SYMBOL_FOR_KVM to suppress/override the exports.

Note, the set of symbols to restrict to KVM was generated by manual search and audit; any "misses" are due to human error, not some grand plan.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Kai Huang <kai.huang@intel.com>
Tested-by: Kai Huang <kai.huang@intel.com>
Link: https://patch.msgid.link/20251112173944.1380633-5-seanjc%40google.com
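A plausible shape for the helper, building on the kernel's existing EXPORT_SYMBOL_GPL_FOR_MODULES() macro; the config gating and module-name list here are illustrative, not the exact definition:

    #if IS_ENABLED(CONFIG_KVM_INTEL) || IS_ENABLED(CONFIG_KVM_AMD)
    #define EXPORT_SYMBOL_FOR_KVM(symbol) \
            EXPORT_SYMBOL_GPL_FOR_MODULES(symbol, "kvm,kvm-intel,kvm-amd")
    #else
    /* No KVM module will be built: don't export anything. */
    #define EXPORT_SYMBOL_FOR_KVM(symbol)
    #endif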
2025-11-12  x86/mtrr: Drop unnecessary export of "mtrr_state"  (Sean Christopherson)

Don't export "mtrr_state" as usage is limited to arch/x86/kernel/cpu/mtrr (and nothing outside of that directory even includes the local mtrr.h).

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://patch.msgid.link/20251112173944.1380633-3-seanjc%40google.com
2025-11-12  x86/bugs: Drop unnecessary export of "x86_spec_ctrl_base"  (Sean Christopherson)

Don't export x86_spec_ctrl_base as it's used only in bugs.c and process.c, neither of which can be built into a module.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://patch.msgid.link/20251112173944.1380633-2-seanjc%40google.com
2025-11-08  Merge tag 'x86-urgent-2025-11-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip  (Linus Torvalds)

Pull x86 fixes from Ingo Molnar:

 - Fix AMD PCI root device caching regression that triggers on certain firmware variants

 - Fix the zen5_rdseed_microcode[] array to be NULL-terminated

 - Add more AMD models to microcode signature checking

* tag 'x86-urgent-2025-11-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/microcode/AMD: Add more known models to entry sign checking
  x86/CPU/AMD: Add missing terminator for zen5_rdseed_microcode
  x86/amd_node: Fix AMD root device caching
2025-11-07  ACPI: CPPC: Detect preferred core availability on online CPUs  (Gautham R. Shenoy)

Commit 279f838a61f9 ("x86/amd: Detect preferred cores in amd_get_boost_ratio_numerator()") introduced the ability to detect the preferred core on AMD platforms by checking if there are at least two distinct highest_perf values. However, it uses for_each_present_cpu() to iterate through all the CPUs in the platform, which is problematic when the kernel is booted with the "nosmt=force" command line option.

Hence limit the search to only the online CPUs.

Fixes: 279f838a61f9 ("x86/amd: Detect preferred cores in amd_get_boost_ratio_numerator()")
Reported-by: Christopher Harris <chris.harris79@gmail.com>
Closes: https://lore.kernel.org/lkml/CAM+eXpdDT7KjLV0AxEwOLkSJ2QtrsvGvjA2cCHvt1d0k2_C4Cw@mail.gmail.com/
Reviewed-by: "Mario Limonciello (AMD) (kernel.org)" <superm1@kernel.org>
Tested-by: Christopher Harris <chris.harris79@gmail.com>
Signed-off-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
Link: https://patch.msgid.link/20251107074145.2340-2-gautham.shenoy@amd.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-11-07  x86/apic: Fix frequency in apic=verbose log output  (Julian Stecklina)

When apic=verbose is specified, the LAPIC timer calibration prints its results to the console. At least while debugging virtualization code, the CPU and bus frequencies are printed incorrectly. Specifically, for a 1.7 GHz CPU with 1 GHz bus frequency and HZ=1000, the log includes a superfluous 0 after the period:

  ..... calibration result: 999978
  ..... CPU clock speed is 1696.0783 MHz.
  ..... host bus clock speed is 999.0978 MHz.

Looking at the code, this only worked as intended for HZ=100. After the fix, the correct frequency is printed:

  ..... calibration result: 999828
  ..... CPU clock speed is 1696.507 MHz.
  ..... host bus clock speed is 999.828 MHz.

There is no functional change to the LAPIC calibration here, beyond the printing format changes.

[ bp: - Massage commit message
      - Figures it should apply this patch about ~4 years later
      - Massage it into the current code ]

Suggested-by: Markus Napierkowski <markus.napierkowski@cyberus-technology.de>
Signed-off-by: Julian Stecklina <julian.stecklina@cyberus-technology.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://patch.msgid.link/20211030142148.143261-1-js@alien8.de
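The arithmetic is easy to reproduce in isolation. A self-contained demonstration of the formatting bug, using the calibration result from the log above (999828 bus clocks per tick at HZ=1000); variable names are illustrative:

    #include <stdio.h>

    int main(void)
    {
            long hz = 1000, result = 999828;   /* bus clocks per timer tick */
            long khz = result * hz / 1000;     /* true frequency in kHz */

            /* Buggy split: only correct when 1000000/hz == 10000 (HZ=100).
             * For HZ=1000 the remainder is printed with a spurious leading 0. */
            printf("buggy:   %ld.%04ld MHz\n",
                   result / (1000000 / hz), result % (1000000 / hz));

            /* Fixed: derive kHz first, then split into MHz and remainder. */
            printf("correct: %ld.%03ld MHz\n", khz / 1000, khz % 1000);
            return 0;
    }

This prints "buggy: 999.0828 MHz" and "correct: 999.828 MHz", matching the before/after log excerpts in the commit message.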
2025-11-07  x86/microcode/AMD: Add more known models to entry sign checking  (Mario Limonciello (AMD))

Two Zen5 systems are missing from need_sha_check(). Add them.

Fixes: 50cef76d5cb0 ("x86/microcode/AMD: Load only SHA256-checksummed patches")
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: <stable@kernel.org>
Link: https://patch.msgid.link/20251106182904.4143757-1-superm1@kernel.org
2025-11-05  x86/fgraph,bpf: Fix stack ORC unwind from kprobe_multi return probe  (Jiri Olsa)

Currently we don't get a stack trace via the ORC unwinder on top of the fgraph exit handler. We can see that when generating a stacktrace from a kretprobe_multi bpf program, which is based on fprobe/fgraph.

The reason is that the ORC unwind code won't get past the return_to_handler callback installed by the fgraph return probe machinery. Solve this by creating, in return_to_handler, the stack frame that the ftrace_graph_ret_addr function expects, to recover the original return address and continue with the unwind.

Also update the pt_regs data with cs/flags/rsp, which are needed for successful stack retrieval from the ebpf bpf_get_stackid helper:

 - in get_perf_callchain we check user_mode(regs), so CS has to be set
 - in perf_callchain_kernel we call perf_hw_regs(regs), so EFLAGS/FIXED has to be unset

Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20251104215405.168643-3-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-11-05  x86/mce/amd: Define threshold restart function for banks  (Yazen Ghannam)

Prepare for CMCI storm support by moving the common bank/block iterator code to a helper function. Include a parameter to switch the interrupt enable. This will be used by the CMCI storm handling function.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Nikolay Borisov <nik.borisov@suse.com>
Link: https://lore.kernel.org/20251104-wip-mca-updates-v8-0-66c8eacf67b9@amd.com
2025-11-05  x86/mce/amd: Remove redundant reset_block()  (Yazen Ghannam)

Many of the checks in reset_block() are done again in the block reset function. So drop the redundant checks.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/20251104-wip-mca-updates-v8-0-66c8eacf67b9@amd.com
2025-11-05  x86/mce/amd: Support SMCA Corrected Error Interrupt  (Yazen Ghannam)

AMD systems optionally support MCA thresholding which provides the ability for hardware to send an interrupt when a set error threshold is reached. This feature counts errors of all severities, but it is commonly used to report correctable errors with an interrupt rather than polling.

Scalable MCA systems allow the platform to take control of this feature. In this case, the OS will not see the feature configuration and control bits in the MCA_MISC* registers. The OS will not receive the MCA thresholding interrupt, and it will need to poll for correctable errors.

A "corrected error interrupt" will be available on Scalable MCA systems. This will be used in the same configuration where the platform controls MCA thresholding. However, the platform will now be able to send the MCA thresholding interrupt to the OS.

Check for, and enable, this feature during per-CPU SMCA init.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Tested-by: Tony Luck <tony.luck@intel.com>
Link: https://lore.kernel.org/20251104-wip-mca-updates-v8-0-66c8eacf67b9@amd.com
2025-11-05  x86/mce/amd: Enable interrupt vectors once per-CPU on SMCA systems  (Yazen Ghannam)

Scalable MCA systems have a per-CPU register that gives the APIC LVT offset for the thresholding and deferred error interrupts. Currently, this register is read once to set up the deferred error interrupt and then read again for each thresholding block. Furthermore, the APIC LVT registers are configured each time, but they only need to be configured once per-CPU.

Move the APIC LVT setup to the early part of CPU init, so that the registers are set up once. Also, this ensures that the kernel is ready to service the interrupts before the individual error sources (each MCA bank) are enabled.

Apply this change only to SMCA systems to avoid breaking any legacy behavior. The deferred error interrupt is technically advertised by the SUCCOR feature. However, this was first made available on SMCA systems. Therefore, only set up the deferred error interrupt on SMCA systems and simplify the code.

Guidance from hardware designers is that the LVT offsets provided from the platform should be used. The kernel should not try to enforce specific values. However, the kernel should check that an LVT offset is not reused for multiple sources. Therefore, remove the extra checking and value enforcement from the MCE code. The "reuse/conflict" case is already handled in setup_APIC_eilvt().

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Tested-by: Tony Luck <tony.luck@intel.com>
Link: https://lore.kernel.org/20251104-wip-mca-updates-v8-0-66c8eacf67b9@amd.com
2025-11-05  x86/mce: Unify AMD DFR handler with MCA Polling  (Yazen Ghannam)

AMD systems optionally support a deferred error interrupt. The interrupt should be used as another signal to trigger MCA polling. This is similar to how other MCA interrupts are handled.

Deferred errors do not require any special handling related to the interrupt, e.g. resetting or rearming the interrupt, etc. However, Scalable MCA systems include a pair of registers, MCA_DESTAT and MCA_DEADDR, that should be checked for valid errors. This check should be done whenever MCA registers are polled. Currently, the deferred error interrupt does this check, but the MCA polling function does not.

Call the MCA polling function when handling the deferred error interrupt. This keeps all "polling" cases in a common function.

Add an SMCA status check helper. This will do the same status check and register clearing that the interrupt handler has done. And it extends the common polling flow to find AMD deferred errors.

Clear the MCA_DESTAT register at the end of the handler rather than the beginning. This maintains the procedure that the 'status' register must be cleared as the final step.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/20251104-wip-mca-updates-v8-0-66c8eacf67b9@amd.com
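A sketch of the status-check helper's role in the polling flow; the helper name and the MSR accessor spelling are assumptions, though MSR_AMD64_SMCA_MCx_DESTAT() and MCI_STATUS_VAL are the stock kernel definitions:

    static u64 smca_destat(unsigned int bank)
    {
            u64 status;

            /* MCA_DESTAT can hold a deferred error that MCA_STATUS misses. */
            rdmsrq(MSR_AMD64_SMCA_MCx_DESTAT(bank), status);
            return (status & MCI_STATUS_VAL) ? status : 0;
    }

    /* ...and only once the error has been logged, as the final step: */
    /* wrmsrq(MSR_AMD64_SMCA_MCx_DESTAT(bank), 0); */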
2025-11-05  x86/mce: Unify AMD THR handler with MCA Polling  (Yazen Ghannam)

AMD systems optionally support an MCA thresholding interrupt. The interrupt should be used as another signal to trigger MCA polling. This is similar to how the Intel Corrected Machine Check interrupt (CMCI) is handled.

AMD MCA thresholding is managed using the MCA_MISC registers within an MCA bank. The OS will need to modify the hardware error count field in order to reset the threshold limit and rearm the interrupt. Management of the MCA_MISC register should be done as a follow up to the basic MCA polling flow. It should not be the main focus of the interrupt handler.

Furthermore, future systems will have the ability to send an MCA thresholding interrupt to the OS even when the OS does not manage the feature, i.e. MCA_MISC registers are Read-as-Zero/Locked.

Call the common MCA polling function when handling the MCA thresholding interrupt. This will allow the OS to find any valid errors whether or not the MCA thresholding feature is OS-managed. Also, this allows the common MCA polling options and kernel parameters to apply to AMD systems.

Add a callback to the MCA polling function to check and reset any threshold blocks that have reached their threshold limit.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/20251104-wip-mca-updates-v8-0-66c8eacf67b9@amd.com
2025-11-05  x86/msr: Add CPU_OUT_OF_SPEC taint name to "unrecognized" pr_warn(msg)  (Marc Herbert)

While restricting access, a7e1f67ed29f ("x86/msr: Filter MSR writes") also added a warning and started tainting the kernel. But the warning message never mentioned tainting.

Moreover, this uses the "CPU_OUT_OF_SPEC" flag, which is not clearly related to MSRs: that flag is overloaded by several, fairly different situations, including some much scarier ones. So, without an expert around (thank you Dave Hansen), it would have been practically impossible to root cause the tainting from just the log file at hand.

So it would be prudent to explicitly mention in the logs when the tainting happens so that debugging crashes can be made easier. Fix this by simply appending the CPU_OUT_OF_SPEC flag to the warning message.

This readability issue happened when staring at logs involving the Intel Memory Latency Checker (among many other things going on in that log). The MLC disables hardware prefetch.

[ bp: Massage and extend commit message. ]

Signed-off-by: Marc Herbert <marc.herbert@linux.intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20251101-tainted-msr-v1-1-e00658ba04d4@linux.intel.com
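The shape of the fix, as a sketch; the exact message wording is per the patch, and the surrounding control flow here is illustrative. add_taint(TAINT_CPU_OUT_OF_SPEC, LOCKDEP_STILL_OK) is the stock kernel call:

    /* In the MSR write filter path, name the taint flag in the log: */
    pr_warn_ratelimited("Write to unrecognized MSR 0x%x by %s (pid: %d); adding CPU_OUT_OF_SPEC taint\n",
                        reg, current->comm, current->pid);
    add_taint(TAINT_CPU_OUT_OF_SPEC, LOCKDEP_STILL_OK);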
2025-11-05  x86: uaccess: don't use runtime-const rewriting in modules  (Linus Torvalds)

The runtime-const infrastructure was never designed to handle the modular case, because the constant fixup is only done at boot time for core kernel code. But by the time I used it for the x86-64 user space limit handling in commit 86e6b1547b3d ("x86: fix user address masking non-canonical speculation issue"), I had completely repressed that fact.

And it all happens to work because the only code that currently actually gets inlined by modules is for the access_ok() limit check, where the default constant value works even when not fixed up. Because at least I had intentionally made it be something that is in the non-canonical address space region.

But it's technically very wrong, and it does mean that at least in theory, the use of 'access_ok()' + '__get_user()' can trigger the same speculation issue with non-canonical addresses that the original commit was all about. The pattern is unusual enough that this probably doesn't matter in practice, but very wrong is still very wrong. Also, let's fix it before the nice optimized scoped user accessor helpers that Thomas Gleixner is working on cause this pseudo-constant to then be more widely used.

This all came up due to an unrelated discussion with Mateusz Guzik about using the runtime const infrastructure for names_cachep accesses too. There the modular case was much more obviously broken, and Mateusz noted it in his 'v2' of the patch series. That then made me notice how broken 'access_ok()' had been in modules all along. Mea culpa, mea maxima culpa.

Fix it by simply not using the runtime-const code in modules, and just using the USER_PTR_MAX variable value instead. This is not performance-critical like the core user accessor functions (get_user() and friends) are.

Also make sure this doesn't get forgotten the next time somebody wants to do runtime constant optimizations by having the x86 runtime-const.h header file error out if included by modules.

Fixes: 86e6b1547b3d ("x86: fix user address masking non-canonical speculation issue")
Acked-by: Borislav Petkov <bp@alien8.de>
Acked-by: Sean Christopherson <seanjc@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Triggered-by: Mateusz Guzik <mjguzik@gmail.com>
Link: https://lore.kernel.org/all/20251030105242.801528-1-mjguzik@gmail.com/
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
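A sketch of the two pieces described above; the wrapper macro name is an assumption for illustration, while runtime_const_ptr() and the USER_PTR_MAX variable are existing x86 definitions:

    /* arch/x86/include/asm/runtime-const.h: modules must never see this. */
    #ifdef MODULE
    #error "runtime constants are only usable in core kernel code"
    #endif

    /* arch/x86/include/asm/uaccess_64.h (sketch): modules fall back to the
     * plain variable; core kernel keeps the boot-time-patched constant. */
    #ifdef MODULE
    #define user_ptr_max()  ((unsigned long)USER_PTR_MAX)
    #else
    #define user_ptr_max()  runtime_const_ptr(USER_PTR_MAX)
    #endif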
2025-11-04  x86/CPU/AMD: Add missing terminator for zen5_rdseed_microcode  (Mario Limonciello)

Running x86_match_min_microcode_rev() on a Zen5 CPU trips up KASAN for an out of bounds access.

Fixes: 607b9fb2ce248 ("x86/CPU/AMD: Add RDSEED fix for Zen5")
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/20251104161007.269885-1-mario.limonciello@amd.com
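The bug class, sketched with illustrative values: x86_match_cpu()-style walkers stop at a zeroed entry, so a table without one runs off the end of the array. The struct x86_cpu_id fields are the stock kernel ones; the family/model/revision numbers here are made up:

    static const struct x86_cpu_id zen5_rdseed_microcode[] = {
            { .vendor = X86_VENDOR_AMD, .family = 0x1a, .model = 0x44,
              .driver_data = 0x0b204031 },
            {}      /* the missing terminator: the table walk stops here */
    };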
2025-11-03  x86/amd_node: Fix AMD root device caching  (Yazen Ghannam)

Recent AMD node rework removed the "search and count" method of caching AMD root devices. This depended on the value from a Data Fabric register that was expected to hold the PCI bus of one of the root devices attached to that fabric.

However, this expectation is incorrect. The register, when read from PCI config space, returns the bitwise-OR of the buses of all attached root devices.

This behavior is benign on AMD reference design boards, since the bus numbers are aligned. This results in a bitwise-OR value matching one of the buses. For example, 0x00 | 0x40 | 0xA0 | 0xE0 = 0xE0.

This behavior breaks on boards where the bus numbers are not exactly aligned. For example, 0x00 | 0x07 | 0x0E | 0x15 = 0x1F.

The examples above are for AMD node 0. The first root device on other nodes will not be 0x00. The first root device for other nodes will depend on the total number of root devices, the system topology, and the specific PCI bus number assignment. For example, a system with 2 AMD nodes could have this:

  Node 0 : 0x00 0x07 0x0e 0x15
  Node 1 : 0x1c 0x23 0x2a 0x31

The bus numbering style in the reference boards is not a requirement. The numbering found in other boards is not incorrect. Therefore, the root device caching method needs to be adjusted.

Go back to the "search and count" method used before the recent rework. Search for root devices using the PCI class code rather than fixed PCI IDs. This keeps the goal of the rework (remove dependency on PCI IDs) while being able to support various board designs. Merge helper functions to reduce code duplication.

[ bp: Reflow comment. ]

Fixes: 40a5f6ffdfc8 ("x86/amd_nb: Simplify root device search")
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/all/20251028-fix-amd-root-v2-1-843e38f8be2c@amd.com
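The arithmetic is easy to check in isolation; a self-contained demonstration of why the OR-of-buses register value only "works" on aligned layouts (bus numbers taken from the examples above):

    #include <stdio.h>

    int main(void)
    {
            /* Reference-board layout: the OR equals the last bus, 0xe0. */
            int aligned[]   = { 0x00, 0x40, 0xa0, 0xe0 };
            /* Unaligned layout: the OR yields 0x1f, which need not match
             * any real root device's bus at all. */
            int unaligned[] = { 0x00, 0x07, 0x0e, 0x15 };
            int i, a = 0, u = 0;

            for (i = 0; i < 4; i++) {
                    a |= aligned[i];
                    u |= unaligned[i];
            }
            printf("aligned OR = 0x%02x, unaligned OR = 0x%02x\n", a, u);
            return 0;
    }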
2025-10-30  x86/smpboot: Mark native_play_dead() as __noreturn  (Thorsten Blum)

native_play_dead() ends by calling the non-returning function hlt_play_dead() and therefore also never returns. The !CONFIG_HOTPLUG_CPU stub version of native_play_dead() unconditionally calls BUG() and does not return either.

Add the __noreturn attribute to both function definitions and their declaration to document this behavior and to potentially improve compiler optimizations. Remove the obsolete comment, and add native_play_dead() to objtool's list of __noreturn functions.

Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Link: https://patch.msgid.link/20251027155107.183136-1-thorsten.blum@linux.dev
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
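A sketch of the annotated definition; the function body is abbreviated relative to the real one:

    void __noreturn native_play_dead(void)
    {
            play_dead_common();
            tboot_shutdown(TB_SHUTDOWN_WFS);
            mwait_play_dead();
            hlt_play_dead();        /* does not return */
    }

With __noreturn on the definition and declaration, the compiler can drop dead code after call sites, and objtool can validate that no path falls off the end.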
2025-10-30  x86/microcode: Mark early_parse_cmdline() as __init  (Yu Peng)

Fix a section mismatch warning reported by modpost:

  .text:early_parse_cmdline() -> .init.data:boot_command_line

The function early_parse_cmdline() is only called during init and accesses init data, so mark it __init to match its usage.

[ bp: This happens only when the toolchain fails to inline the function and I haven't been able to reproduce it with any toolchain I'm using. Patch is obviously correct regardless. ]

Signed-off-by: Yu Peng <pengyu@kylinos.cn>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://patch.msgid.link/all/20251030123757.1410904-1-pengyu@kylinos.cn
2025-10-30  x86/microcode/AMD: Select which microcode patch to load  (Borislav Petkov (AMD))

All microcode patches up to the proper BIOS Entrysign fix are loaded only after the sha256 signature carried in the driver has been verified. Microcode patches released after the Entrysign fix has been applied do not need that signature verification anymore.

In order to not abandon machines which haven't received the BIOS update yet, add the capability to select which microcode patch to load. The corresponding microcode container supplied through linux-firmware has been modified to carry two patches per CPU type (family/model/stepping) so that the proper one gets selected.

Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Tested-by: Waiman Long <longman@redhat.com>
Link: https://patch.msgid.link/20251027133818.4363-1-bp@kernel.org
2025-10-30  x86/CPU/AMD: Extend Zen6 model range  (Borislav Petkov (AMD))

Add some more Zen6 models.

Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://patch.msgid.link/20251029123056.19987-1-bp@kernel.org
2025-10-29  x86/dumpstack: Prevent KASAN false positive warnings in __show_regs()  (Tengda Wu)

When triggering a stack dump via sysrq (echo t > /proc/sysrq-trigger), KASAN may report a false-positive out-of-bounds access:

  BUG: KASAN: out-of-bounds in __show_regs+0x4b/0x340
  Call Trace:
   dump_stack_lvl
   print_address_description.constprop.0
   print_report
   __show_regs
   show_trace_log_lvl
   sched_show_task
   show_state_filter
   sysrq_handle_showstate
   __handle_sysrq
   write_sysrq_trigger
   proc_reg_write
   vfs_write
   ksys_write
   do_syscall_64
   entry_SYSCALL_64_after_hwframe

The issue occurs as follows:

  Task A (walks other tasks' stacks)       Task B (running)

  1. echo t > /proc/sysrq-trigger
     show_trace_log_lvl
       regs = unwind_get_entry_regs()
       show_regs_if_on_stack(regs)
                                           2. The stack values pointed to
                                              by `regs` keep changing, and
                                              so do the tags in its KASAN
                                              shadow region.
       __show_regs(regs)
         regs->ax, regs->bx, ...
  3. hit KASAN redzones, OOB

When task A walks task B's stack without suspending it, the continuous changes in task B's stack (and corresponding KASAN shadow tags) may cause task A to hit KASAN redzones when accessing obsolete values on the stack, resulting in false positive reports.

Simply stopping the task before unwinding is not a viable fix, as it would alter the state intended to inspect. This is especially true for diagnosing misbehaving tasks (e.g., in a hard lockup), where stopping might fail or hide the root cause by changing the call stack.

Therefore, fix this by disabling KASAN checks during asynchronous stack unwinding, which is identified when the unwinding task does not match the current task (task != current).

[ bp: Align arguments on function's opening brace. ]

Fixes: 3b3fa11bc700 ("x86/dumpstack: Print any pt_regs found on the stack")
Signed-off-by: Tengda Wu <wutengda@huaweicloud.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Acked-by: Josh Poimboeuf <jpoimboe@kernel.org>
Link: https://patch.msgid.link/all/20251023090632.269121-1-wutengda@huaweicloud.com
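A sketch of the fix's shape: kasan_disable_current()/kasan_enable_current() are the stock kernel helpers for suppressing KASAN reports in the current task, while the exact placement around the register dump is illustrative:

    /* Unwinding someone else's live stack: its contents (and KASAN
     * shadow tags) can change under us, so suppress KASAN reports. */
    if (task != current)
            kasan_disable_current();

    __show_regs(regs, SHOW_REGS_SHORT, log_lvl);

    if (task != current)
            kasan_enable_current();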
2025-10-29  unwind_user/x86: Teach FP unwind about start of function  (Peter Zijlstra)

When userspace is interrupted at the start of a function, before we get a chance to complete the frame, unwind will miss one caller. X86 has a uprobe-specific fixup for this; add bits to the generic unwinder to support this.

Suggested-by: Jens Remus <jremus@linux.ibm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20251024145156.GM4068168@noisy.programming.kicks-ass.net
2025-10-29  x86: Use physical address for DMA mapping  (Leon Romanovsky)

Perform a mechanical conversion from DMA .map_page to .map_phys.

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Link: https://lore.kernel.org/r/20251015-remove-map-page-v5-12-3bbfe3a25cdf@kernel.org
2025-10-29  Merge branch 'linus/master' into sched/core, to resolve conflict  (Peter Zijlstra)

Conflicts:
  kernel/sched/ext.c

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2025-10-28  x86/fpu: Ensure XFD state on signal delivery  (Chang S. Bae)

Sean reported [1] the following splat when running KVM tests:

  WARNING: CPU: 232 PID: 15391 at xfd_validate_state+0x65/0x70
  Call Trace:
   <TASK>
   fpu__clear_user_states+0x9c/0x100
   arch_do_signal_or_restart+0x142/0x210
   exit_to_user_mode_loop+0x55/0x100
   do_syscall_64+0x205/0x2c0
   entry_SYSCALL_64_after_hwframe+0x4b/0x53

Chao further identified [2] a reproducible scenario involving signal delivery: a non-AMX task is preempted by an AMX-enabled task which modifies the XFD MSR. When the non-AMX task resumes and reloads XSTATE with init values, a warning is triggered due to a mismatch between fpstate::xfd and the CPU's current XFD state. fpu__clear_user_states() does not currently re-synchronize the XFD state after such preemption.

Invoke xfd_update_state() which detects and corrects the mismatch if there is a dynamic feature. This also benefits the sigreturn path, as fpu__restore_sig() may call fpu__clear_user_states() when the sigframe is inaccessible.

[ dhansen: minor changelog munging ]

Closes: https://lore.kernel.org/lkml/aDCo_SczQOUaB2rS@google.com [1]
Fixes: 672365477ae8a ("x86/fpu: Update XFD state where required")
Reported-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Chao Gao <chao.gao@intel.com>
Tested-by: Chao Gao <chao.gao@intel.com>
Link: https://lore.kernel.org/all/aDWbctO%2FRfTGiCg3@intel.com [2]
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/20250610001700.4097-1-chang.seok.bae%40intel.com
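A sketch of the added resync; xfd_update_state() and fpu_state_size_dynamic() are the stock kernel helpers, while the exact position inside fpu__clear_user_states() is per the patch:

    /* In fpu__clear_user_states(): make the CPU's XFD MSR match this
     * task's fpstate before any XSTATE with dynamic features is loaded. */
    if (fpu_state_size_dynamic())
            xfd_update_state(fpu->fpstate);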
2025-10-28  x86/CPU/AMD: Add RDSEED fix for Zen5  (Gregory Price)

There's an issue with RDSEED's 16-bit and 32-bit register output variants on Zen5 which return a random value of 0 "at a rate inconsistent with randomness while incorrectly signaling success (CF=1)". Search the web for AMD-SB-7055 for more detail.

Add fix glue which checks microcode revisions.

[ bp: Add microcode revisions checking, rewrite. ]

Cc: stable@vger.kernel.org
Signed-off-by: Gregory Price <gourry@gourry.net>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20251018024010.4112396-1-gourry@gourry.net
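A self-contained userspace probe of the failure mode described above, assuming an RDSEED-capable CPU; on an affected part the erratum manifests as zero results reported *with* CF=1 (success). A handful of zeros per run is statistically expected (about 1.5 per 100000 for a uniform 16-bit source); a large excess matches the SB-7055 description:

    #include <stdio.h>
    #include <stdint.h>

    /* 16-bit RDSEED; returns the carry flag (1 = success). */
    static int rdseed16(uint16_t *out)
    {
            unsigned char ok;

            asm volatile("rdseed %0; setc %1" : "=r"(*out), "=qm"(ok));
            return ok;
    }

    int main(void)
    {
            uint16_t v;
            int i, zeros = 0;

            for (i = 0; i < 100000; i++)
                    if (rdseed16(&v) && v == 0)
                            zeros++;
            printf("successful zero results: %d / 100000\n", zeros);
            return 0;
    }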
2025-10-27  x86/microcode/AMD: Limit Entrysign signature checking to known generations  (Borislav Petkov (AMD))

Limit Entrysign sha256 signature checking to CPUs in the range Zen1-Zen5. X86_BUG cannot be used here because the loading on the BSP happens way too early, before the cpufeatures machinery has been set up.

Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://patch.msgid.link/all/20251023124629.5385-1-bp@kernel.org
2025-10-27  Merge tag 'x86_urgent_for_v6.18_rc3' into x86/microcode  (Borislav Petkov (AMD))

Pick up the below urgent upstream change in order to base more work on top:

 - Correct the last Zen1 microcode revision for which the Entrysign sha256 check is needed

Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
2025-10-24  x86/bugs: Remove dead code which might prevent from building  (Andy Shevchenko)

Clang, in particular, is not happy about dead code:

  arch/x86/kernel/cpu/bugs.c:1830:20: error: unused function 'match_option' [-Werror,-Wunused-function]
   1830 | static inline bool match_option(const char *arg, int arglen, const char *opt)
        |                    ^~~~~~~~~~~~
  1 error generated.

Remove a leftover from the previous cleanup.

Fixes: 02ac6cc8c5a1 ("x86/bugs: Simplify SSB cmdline parsing")
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://patch.msgid.link/20251024125959.1526277-1-andriy.shevchenko%40linux.intel.com