From 5288176a541215ba48d38fb74bb619e64d4d9bab Mon Sep 17 00:00:00 2001 From: Swaraj Gaikwad Date: Wed, 10 Dec 2025 09:28:14 +0000 Subject: x86/boot/Documentation: Fix htmldocs build warning due to malformed table in boot.rst Sphinx reports htmldocs warnings: Documentation/arch/x86/boot.rst:437: ERROR: Malformed table. Text in column margin in table line 2. The table header defined the first column width as 2 characters ("=="), which is too narrow for entries like "0x10" and "0x13". This caused the text to spill into the margin, triggering a docutils parsing failure. Fix it by extending the first column of assigned boot loader ID to 4 characters ("====") to fit the widest entries. Fixes: 1c3377bee212 ("x86/boot/Documentation: Prefix hexadecimal literals with 0x") Tested-by: Randy Dunlap Signed-off-by: Swaraj Gaikwad Signed-off-by: Ingo Molnar Reviewed-by: Randy Dunlap Reviewed-by: Bagas Sanjaya Link: https://patch.msgid.link/20251210092814.9986-1-swarajgaikwad1925@gmail.com --- Documentation/arch/x86/boot.rst | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/Documentation/arch/x86/boot.rst b/Documentation/arch/x86/boot.rst index 6d36ce86fd8e..18574f010d46 100644 --- a/Documentation/arch/x86/boot.rst +++ b/Documentation/arch/x86/boot.rst @@ -433,7 +433,7 @@ Protocol: 2.00+ Assigned boot loader IDs: - == ======================================= + ==== ======================================= 0x0 LILO (0x00 reserved for pre-2.00 bootloader) 0x1 Loadlin @@ -456,7 +456,7 @@ Protocol: 2.00+ 0x12 OVMF UEFI virtualization stack 0x13 barebox - == ======================================= + ==== ======================================= Please contact if you need a bootloader ID value assigned. -- cgit v1.2.3 From c8161e5304abb26e6c0bec6efc947992500fa6c5 Mon Sep 17 00:00:00 2001 From: Yongxin Liu Date: Wed, 10 Dec 2025 08:02:20 +0800 Subject: x86/fpu: Fix FPU state core dump truncation on CPUs with no extended xfeatures Zero can be a valid value of num_records. For example, on Intel Atom x6425RE, only x87 and SSE are supported (features 0, 1), and fpu_user_cfg.max_features is 3. The for_each_extended_xfeature() loop only iterates feature 2, which is not enabled, so num_records = 0. This is valid and should not cause core dump failure. The issue is that dump_xsave_layout_desc() returns 0 for both genuine errors (dump_emit() failure) and valid cases (no extended features). Use negative return values for errors and only abort on genuine failures. Fixes: ba386777a30b ("x86/elf: Add a new FPU buffer layout info to x86 core files") Signed-off-by: Yongxin Liu Signed-off-by: Ingo Molnar Link: https://patch.msgid.link/20251210000219.4094353-2-yongxin.liu@windriver.com --- arch/x86/kernel/fpu/xstate.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c index 48113c5193aa..76153dfb58c9 100644 --- a/arch/x86/kernel/fpu/xstate.c +++ b/arch/x86/kernel/fpu/xstate.c @@ -1946,7 +1946,7 @@ static int dump_xsave_layout_desc(struct coredump_params *cprm) }; if (!dump_emit(cprm, &xc, sizeof(xc))) - return 0; + return -1; num_records++; } @@ -1984,7 +1984,7 @@ int elf_coredump_extra_notes_write(struct coredump_params *cprm) return 1; num_records = dump_xsave_layout_desc(cprm); - if (!num_records) + if (num_records < 0) return 1; /* Total size should be equal to the number of records */ -- cgit v1.2.3 From ac87efcf9e42f07526438b67405659a8c1d0480e Mon Sep 17 00:00:00 2001 From: Ingo Molnar Date: Wed, 10 Dec 2025 08:36:18 +0100 Subject: x86/boot/Documentation: Fix whitespace noise in boot.rst There's a lot of unnecessary whitespace damage in this file: space before tabs, etc., that has no formatting or readability effect or advantages. Fix them. Signed-off-by: Ingo Molnar Link: https://patch.msgid.link/176535283007.498.16442167388418039352.tip-bot2@tip-bot2 --- Documentation/arch/x86/boot.rst | 194 ++++++++++++++++++++-------------------- 1 file changed, 97 insertions(+), 97 deletions(-) diff --git a/Documentation/arch/x86/boot.rst b/Documentation/arch/x86/boot.rst index 18574f010d46..dca3875a2435 100644 --- a/Documentation/arch/x86/boot.rst +++ b/Documentation/arch/x86/boot.rst @@ -95,26 +95,26 @@ Memory Layout The traditional memory map for the kernel loader, used for Image or zImage kernels, typically looks like:: - | | + | | 0A0000 +------------------------+ - | Reserved for BIOS | Do not use. Reserved for BIOS EBDA. + | Reserved for BIOS | Do not use. Reserved for BIOS EBDA. 09A000 +------------------------+ - | Command line | - | Stack/heap | For use by the kernel real-mode code. + | Command line | + | Stack/heap | For use by the kernel real-mode code. 098000 +------------------------+ - | Kernel setup | The kernel real-mode code. + | Kernel setup | The kernel real-mode code. 090200 +------------------------+ - | Kernel boot sector | The kernel legacy boot sector. + | Kernel boot sector | The kernel legacy boot sector. 090000 +------------------------+ - | Protected-mode kernel | The bulk of the kernel image. + | Protected-mode kernel | The bulk of the kernel image. 010000 +------------------------+ - | Boot loader | <- Boot sector entry point 0000:7C00 + | Boot loader | <- Boot sector entry point 0000:7C00 001000 +------------------------+ - | Reserved for MBR/BIOS | + | Reserved for MBR/BIOS | 000800 +------------------------+ - | Typically used by MBR | + | Typically used by MBR | 000600 +------------------------+ - | BIOS use only | + | BIOS use only | 000000 +------------------------+ When using bzImage, the protected-mode kernel was relocated to @@ -142,27 +142,27 @@ above the 0x9A000 point; too many BIOSes will break above that point. For a modern bzImage kernel with boot protocol version >= 2.02, a memory layout like the following is suggested:: - ~ ~ - | Protected-mode kernel | + ~ ~ + | Protected-mode kernel | 100000 +------------------------+ - | I/O memory hole | + | I/O memory hole | 0A0000 +------------------------+ - | Reserved for BIOS | Leave as much as possible unused - ~ ~ - | Command line | (Can also be below the X+10000 mark) + | Reserved for BIOS | Leave as much as possible unused + ~ ~ + | Command line | (Can also be below the X+10000 mark) X+10000 +------------------------+ - | Stack/heap | For use by the kernel real-mode code. + | Stack/heap | For use by the kernel real-mode code. X+08000 +------------------------+ - | Kernel setup | The kernel real-mode code. - | Kernel boot sector | The kernel legacy boot sector. + | Kernel setup | The kernel real-mode code. + | Kernel boot sector | The kernel legacy boot sector. X +------------------------+ - | Boot loader | <- Boot sector entry point 0000:7C00 + | Boot loader | <- Boot sector entry point 0000:7C00 001000 +------------------------+ - | Reserved for MBR/BIOS | + | Reserved for MBR/BIOS | 000800 +------------------------+ - | Typically used by MBR | + | Typically used by MBR | 000600 +------------------------+ - | BIOS use only | + | BIOS use only | 000000 +------------------------+ ... where the address X is as low as the design of the boot loader permits. @@ -809,12 +809,12 @@ Protocol: 2.09+ as follow:: struct setup_data { - __u64 next; - __u32 type; - __u32 len; - __u8 data[]; + __u64 next; + __u32 type; + __u32 len; + __u8 data[]; } - + Where, the next is a 64-bit physical pointer to the next node of linked list, the next field of the last node is 0; the type is used to identify the contents of data; the len is the length of data @@ -835,10 +835,10 @@ Protocol: 2.09+ protocol 2.15:: struct setup_indirect { - __u32 type; - __u32 reserved; /* Reserved, must be set to zero. */ - __u64 len; - __u64 addr; + __u32 type; + __u32 reserved; /* Reserved, must be set to zero. */ + __u64 len; + __u64 addr; }; The type member is a SETUP_INDIRECT | SETUP_* type. However, it cannot be @@ -850,15 +850,15 @@ Protocol: 2.09+ In this case setup_data and setup_indirect will look like this:: struct setup_data { - .next = 0, /* or */ - .type = SETUP_INDIRECT, - .len = sizeof(setup_indirect), - .data[sizeof(setup_indirect)] = (struct setup_indirect) { - .type = SETUP_INDIRECT | SETUP_E820_EXT, - .reserved = 0, - .len = , - .addr = , - }, + .next = 0, /* or */ + .type = SETUP_INDIRECT, + .len = sizeof(setup_indirect), + .data[sizeof(setup_indirect)] = (struct setup_indirect) { + .type = SETUP_INDIRECT | SETUP_E820_EXT, + .reserved = 0, + .len = , + .addr = , + }, } .. note:: @@ -897,11 +897,11 @@ Offset/size: 0x260/4 The kernel runtime start address is determined by the following algorithm:: if (relocatable_kernel) { - if (load_address < pref_address) - load_address = pref_address; - runtime_start = align_up(load_address, kernel_alignment); + if (load_address < pref_address) + load_address = pref_address; + runtime_start = align_up(load_address, kernel_alignment); } else { - runtime_start = pref_address; + runtime_start = pref_address; } Hence the necessary memory window location and size can be estimated by @@ -975,22 +975,22 @@ after kernel_info_var_len_data label. Each chunk of variable size data has to be prefixed with header/magic and its size, e.g.:: kernel_info: - .ascii "LToP" /* Header, Linux top (structure). */ - .long kernel_info_var_len_data - kernel_info - .long kernel_info_end - kernel_info - .long 0x01234567 /* Some fixed size data for the bootloaders. */ + .ascii "LToP" /* Header, Linux top (structure). */ + .long kernel_info_var_len_data - kernel_info + .long kernel_info_end - kernel_info + .long 0x01234567 /* Some fixed size data for the bootloaders. */ kernel_info_var_len_data: example_struct: /* Some variable size data for the bootloaders. */ - .ascii "0123" /* Header/Magic. */ - .long example_struct_end - example_struct - .ascii "Struct" - .long 0x89012345 + .ascii "0123" /* Header/Magic. */ + .long example_struct_end - example_struct + .ascii "Struct" + .long 0x89012345 example_struct_end: example_strings: /* Some variable size data for the bootloaders. */ - .ascii "ABCD" /* Header/Magic. */ - .long example_strings_end - example_strings - .asciz "String_0" - .asciz "String_1" + .ascii "ABCD" /* Header/Magic. */ + .long example_strings_end - example_strings + .asciz "String_0" + .asciz "String_1" example_strings_end: kernel_info_end: @@ -1132,53 +1132,53 @@ Such a boot loader should enter the following fields in the header:: unsigned long base_ptr; /* base address for real-mode segment */ if (setup_sects == 0) - setup_sects = 4; + setup_sects = 4; if (protocol >= 0x0200) { - type_of_loader = ; - if (loading_initrd) { - ramdisk_image = ; - ramdisk_size = ; - } - - if (protocol >= 0x0202 && loadflags & 0x01) - heap_end = 0xe000; - else - heap_end = 0x9800; - - if (protocol >= 0x0201) { - heap_end_ptr = heap_end - 0x200; - loadflags |= 0x80; /* CAN_USE_HEAP */ - } - - if (protocol >= 0x0202) { - cmd_line_ptr = base_ptr + heap_end; - strcpy(cmd_line_ptr, cmdline); - } else { - cmd_line_magic = 0xA33F; - cmd_line_offset = heap_end; - setup_move_size = heap_end + strlen(cmdline) + 1; - strcpy(base_ptr + cmd_line_offset, cmdline); - } + type_of_loader = ; + if (loading_initrd) { + ramdisk_image = ; + ramdisk_size = ; + } + + if (protocol >= 0x0202 && loadflags & 0x01) + heap_end = 0xe000; + else + heap_end = 0x9800; + + if (protocol >= 0x0201) { + heap_end_ptr = heap_end - 0x200; + loadflags |= 0x80; /* CAN_USE_HEAP */ + } + + if (protocol >= 0x0202) { + cmd_line_ptr = base_ptr + heap_end; + strcpy(cmd_line_ptr, cmdline); + } else { + cmd_line_magic = 0xA33F; + cmd_line_offset = heap_end; + setup_move_size = heap_end + strlen(cmdline) + 1; + strcpy(base_ptr + cmd_line_offset, cmdline); + } } else { - /* Very old kernel */ + /* Very old kernel */ - heap_end = 0x9800; + heap_end = 0x9800; - cmd_line_magic = 0xA33F; - cmd_line_offset = heap_end; + cmd_line_magic = 0xA33F; + cmd_line_offset = heap_end; - /* A very old kernel MUST have its real-mode code loaded at 0x90000 */ - if (base_ptr != 0x90000) { - /* Copy the real-mode kernel */ - memcpy(0x90000, base_ptr, (setup_sects + 1) * 512); - base_ptr = 0x90000; /* Relocated */ - } + /* A very old kernel MUST have its real-mode code loaded at 0x90000 */ + if (base_ptr != 0x90000) { + /* Copy the real-mode kernel */ + memcpy(0x90000, base_ptr, (setup_sects + 1) * 512); + base_ptr = 0x90000; /* Relocated */ + } - strcpy(0x90000 + cmd_line_offset, cmdline); + strcpy(0x90000 + cmd_line_offset, cmdline); - /* It is recommended to clear memory up to the 32K mark */ - memset(0x90000 + (setup_sects + 1) * 512, 0, (64 - (setup_sects + 1)) * 512); + /* It is recommended to clear memory up to the 32K mark */ + memset(0x90000 + (setup_sects + 1) * 512, 0, (64 - (setup_sects + 1)) * 512); } -- cgit v1.2.3 From 043507144ae13d3b882d40495d101bb4c4990d98 Mon Sep 17 00:00:00 2001 From: Thorsten Blum Date: Wed, 10 Dec 2025 13:56:28 +0100 Subject: x86/sgx: Remove unmatched quote in __sgx_encl_extend function comment There is no opening quote. Remove the unmatched closing quote. Signed-off-by: Thorsten Blum Signed-off-by: Ingo Molnar Reviewed-by: Kai Huang Link: https://patch.msgid.link/20251210125628.544916-1-thorsten.blum@linux.dev --- arch/x86/kernel/cpu/sgx/ioctl.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/x86/kernel/cpu/sgx/ioctl.c b/arch/x86/kernel/cpu/sgx/ioctl.c index 66f1efa16fbb..9322a9287dc7 100644 --- a/arch/x86/kernel/cpu/sgx/ioctl.c +++ b/arch/x86/kernel/cpu/sgx/ioctl.c @@ -242,7 +242,7 @@ static int __sgx_encl_add_page(struct sgx_encl *encl, /* * If the caller requires measurement of the page as a proof for the content, * use EEXTEND to add a measurement for 256 bytes of the page. Repeat this - * operation until the entire page is measured." + * operation until the entire page is measured. */ static int __sgx_encl_extend(struct sgx_encl *encl, struct sgx_epc_page *epc_page) -- cgit v1.2.3 From 8b62e64e6d30fa047b3aefb1a36e1f80c8acb3d2 Mon Sep 17 00:00:00 2001 From: Tal Zussman Date: Fri, 12 Dec 2025 04:08:07 -0500 Subject: x86/mm/tlb/trace: Export the TLB_REMOTE_WRONG_CPU enum in When the TLB_REMOTE_WRONG_CPU enum was introduced for the tlb_flush tracepoint, the enum was not exported to user-space. Add it to the appropriate macro definition to enable parsing by userspace tools, as per: Link: https://lore.kernel.org/all/20150403013802.220157513@goodmis.org [ mingo: Capitalize IPI, etc. ] Fixes: 2815a56e4b72 ("x86/mm/tlb: Add tracepoint for TLB flush IPI to stale CPU") Signed-off-by: Tal Zussman Signed-off-by: Ingo Molnar Reviewed-by: Steven Rostedt (Google) Reviewed-by: David Hildenbrand Reviewed-by: Rik van Riel Link: https://patch.msgid.link/20251212-tlb-trace-fix-v2-1-d322e0ad9b69@columbia.edu --- include/trace/events/tlb.h | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/include/trace/events/tlb.h b/include/trace/events/tlb.h index b4d8e7dc38f8..fb8369511685 100644 --- a/include/trace/events/tlb.h +++ b/include/trace/events/tlb.h @@ -12,8 +12,9 @@ EM( TLB_FLUSH_ON_TASK_SWITCH, "flush on task switch" ) \ EM( TLB_REMOTE_SHOOTDOWN, "remote shootdown" ) \ EM( TLB_LOCAL_SHOOTDOWN, "local shootdown" ) \ - EM( TLB_LOCAL_MM_SHOOTDOWN, "local mm shootdown" ) \ - EMe( TLB_REMOTE_SEND_IPI, "remote ipi send" ) + EM( TLB_LOCAL_MM_SHOOTDOWN, "local MM shootdown" ) \ + EM( TLB_REMOTE_SEND_IPI, "remote IPI send" ) \ + EMe( TLB_REMOTE_WRONG_CPU, "remote wrong CPU" ) /* * First define the enums in TLB_FLUSH_REASON to be exported to userspace -- cgit v1.2.3 From 0c01ea92f545ca7fcafdda6a8e29b65ef3a5ec74 Mon Sep 17 00:00:00 2001 From: Tal Zussman Date: Fri, 12 Dec 2025 04:08:08 -0500 Subject: mm: Remove tlb_flush_reason::NR_TLB_FLUSH_REASONS from This has been unused since it was added 11 years ago in: d17d8f9dedb9 ("x86/mm: Add tracepoints for TLB flushes") Signed-off-by: Tal Zussman Signed-off-by: Ingo Molnar Reviewed-by: Rik van Riel Acked-by: David Hildenbrand Link: https://patch.msgid.link/20251212-tlb-trace-fix-v2-2-d322e0ad9b69@columbia.edu --- include/linux/mm_types.h | 1 - 1 file changed, 1 deletion(-) diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index 9f6de068295d..42af2292951d 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -1631,7 +1631,6 @@ enum tlb_flush_reason { TLB_LOCAL_MM_SHOOTDOWN, TLB_REMOTE_SEND_IPI, TLB_REMOTE_WRONG_CPU, - NR_TLB_FLUSH_REASONS, }; /** -- cgit v1.2.3 From 21433d3e3ca14d20f9b0c2237b3d3a1355af7907 Mon Sep 17 00:00:00 2001 From: Kyle Meyer Date: Fri, 12 Dec 2025 12:53:36 -0600 Subject: x86/platform/uv: Fix UBSAN array-index-out-of-bounds When UBSAN is enabled, multiple array-index-out-of-bounds messages are printed: [ 0.000000] [ T0] UBSAN: array-index-out-of-bounds in arch/x86/kernel/apic/x2apic_uv_x.c:276:23 [ 0.000000] [ T0] index 1 is out of range for type ' [1]' ... [ 0.000000] [ T0] UBSAN: array-index-out-of-bounds in arch/x86/kernel/apic/x2apic_uv_x.c:277:32 [ 0.000000] [ T0] index 1 is out of range for type ' [1]' ... [ 0.000000] [ T0] UBSAN: array-index-out-of-bounds in arch/x86/kernel/apic/x2apic_uv_x.c:282:16 [ 0.000000] [ T0] index 1 is out of range for type ' [1]' ... [ 0.515850] [ T1] UBSAN: array-index-out-of-bounds in arch/x86/kernel/apic/x2apic_uv_x.c:1344:23 [ 0.519851] [ T1] index 1 is out of range for type ' [1]' ... [ 0.603850] [ T1] UBSAN: array-index-out-of-bounds in arch/x86/kernel/apic/x2apic_uv_x.c:1345:32 [ 0.607850] [ T1] index 1 is out of range for type ' [1]' ... [ 0.691850] [ T1] UBSAN: array-index-out-of-bounds in arch/x86/kernel/apic/x2apic_uv_x.c:1353:20 [ 0.695850] [ T1] index 1 is out of range for type ' [1]' One-element arrays have been deprecated: https://docs.kernel.org/process/deprecated.html#zero-length-and-one-element-arrays Switch entry in struct uv_systab to a flexible array member to fix UBSAN array-index-out-of-bounds messages. sizeof(struct uv_systab) is passed to early_memremap() and ioremap(). The flexible array member is not accessed until the UV system table size is used to remap the entire UV system table, so changes to sizeof(struct uv_systab) have no impact. Signed-off-by: Kyle Meyer Signed-off-by: Ingo Molnar Link: https://patch.msgid.link/aTxksN-3otY41WvQ@hpe.com --- arch/x86/include/asm/uv/bios.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/x86/include/asm/uv/bios.h b/arch/x86/include/asm/uv/bios.h index 6989b824fd32..d0b62e255290 100644 --- a/arch/x86/include/asm/uv/bios.h +++ b/arch/x86/include/asm/uv/bios.h @@ -122,7 +122,7 @@ struct uv_systab { struct { u32 type:8; /* type of entry */ u32 offset:24; /* byte offset from struct start to entry */ - } entry[1]; /* additional entries follow */ + } entry[]; /* additional entries follow */ }; extern struct uv_systab *uv_systab; -- cgit v1.2.3 From 0edc78b82bea85e1b2165d8e870a5c3535919695 Mon Sep 17 00:00:00 2001 From: Thomas Gleixner Date: Tue, 25 Nov 2025 22:50:45 +0100 Subject: x86/msi: Make irq_retrigger() functional for posted MSI Luigi reported that retriggering a posted MSI interrupt does not work correctly. The reason is that the retrigger happens at the vector domain by sending an IPI to the actual vector on the target CPU. That works correctly exactly once because the posted MSI interrupt chip does not issue an EOI as that's only required for the posted MSI notification vector itself. As a consequence the vector becomes stale in the ISR, which not only affects this vector but also any lower priority vector in the affected APIC because the ISR bit is not cleared. Luigi proposed to set the vector in the remap PIR bitmap and raise the posted MSI notification vector. That works, but that still does not cure a related problem: If there is ever a stray interrupt on such a vector, then the related APIC ISR bit becomes stale due to the lack of EOI as described above. Unlikely to happen, but if it happens it's not debuggable at all. So instead of playing games with the PIR, this can be actually solved for both cases by: 1) Keeping track of the posted interrupt vector handler state 2) Implementing a posted MSI specific irq_ack() callback which checks that state. If the posted vector handler is inactive it issues an EOI, otherwise it delegates that to the posted handler. This is correct versus affinity changes and concurrent events on the posted vector as the actual handler invocation is serialized through the interrupt descriptor lock. Fixes: ed1e48ea4370 ("iommu/vt-d: Enable posted mode for device MSIs") Reported-by: Luigi Rizzo Signed-off-by: Thomas Gleixner Tested-by: Luigi Rizzo Cc: stable@vger.kernel.org Link: https://patch.msgid.link/20251125214631.044440658@linutronix.de Closes: https://lore.kernel.org/lkml/20251124104836.3685533-1-lrizzo@google.com --- arch/x86/include/asm/irq_remapping.h | 7 +++++++ arch/x86/kernel/irq.c | 23 +++++++++++++++++++++++ drivers/iommu/intel/irq_remapping.c | 8 ++++---- 3 files changed, 34 insertions(+), 4 deletions(-) diff --git a/arch/x86/include/asm/irq_remapping.h b/arch/x86/include/asm/irq_remapping.h index 5a0d42464d44..4e55d1755846 100644 --- a/arch/x86/include/asm/irq_remapping.h +++ b/arch/x86/include/asm/irq_remapping.h @@ -87,4 +87,11 @@ static inline void panic_if_irq_remap(const char *msg) } #endif /* CONFIG_IRQ_REMAP */ + +#ifdef CONFIG_X86_POSTED_MSI +void intel_ack_posted_msi_irq(struct irq_data *irqd); +#else +#define intel_ack_posted_msi_irq NULL +#endif + #endif /* __X86_IRQ_REMAPPING_H */ diff --git a/arch/x86/kernel/irq.c b/arch/x86/kernel/irq.c index 86f4e574de02..b2fe6181960c 100644 --- a/arch/x86/kernel/irq.c +++ b/arch/x86/kernel/irq.c @@ -397,6 +397,7 @@ DEFINE_IDTENTRY_SYSVEC_SIMPLE(sysvec_kvm_posted_intr_nested_ipi) /* Posted Interrupt Descriptors for coalesced MSIs to be posted */ DEFINE_PER_CPU_ALIGNED(struct pi_desc, posted_msi_pi_desc); +static DEFINE_PER_CPU_CACHE_HOT(bool, posted_msi_handler_active); void intel_posted_msi_init(void) { @@ -414,6 +415,25 @@ void intel_posted_msi_init(void) this_cpu_write(posted_msi_pi_desc.ndst, destination); } +void intel_ack_posted_msi_irq(struct irq_data *irqd) +{ + irq_move_irq(irqd); + + /* + * Handle the rare case that irq_retrigger() raised the actual + * assigned vector on the target CPU, which means that it was not + * invoked via the posted MSI handler below. In that case APIC EOI + * is required as otherwise the ISR entry becomes stale and lower + * priority interrupts are never going to be delivered after that. + * + * If the posted handler invoked the device interrupt handler then + * the EOI would be premature because it would acknowledge the + * posted vector. + */ + if (unlikely(!__this_cpu_read(posted_msi_handler_active))) + apic_eoi(); +} + static __always_inline bool handle_pending_pir(unsigned long *pir, struct pt_regs *regs) { unsigned long pir_copy[NR_PIR_WORDS]; @@ -446,6 +466,8 @@ DEFINE_IDTENTRY_SYSVEC(sysvec_posted_msi_notification) pid = this_cpu_ptr(&posted_msi_pi_desc); + /* Mark the handler active for intel_ack_posted_msi_irq() */ + __this_cpu_write(posted_msi_handler_active, true); inc_irq_stat(posted_msi_notification_count); irq_enter(); @@ -474,6 +496,7 @@ DEFINE_IDTENTRY_SYSVEC(sysvec_posted_msi_notification) apic_eoi(); irq_exit(); + __this_cpu_write(posted_msi_handler_active, false); set_irq_regs(old_regs); } #endif /* X86_POSTED_MSI */ diff --git a/drivers/iommu/intel/irq_remapping.c b/drivers/iommu/intel/irq_remapping.c index 4f9b01dc91e8..8bcbfe3d9c72 100644 --- a/drivers/iommu/intel/irq_remapping.c +++ b/drivers/iommu/intel/irq_remapping.c @@ -1303,17 +1303,17 @@ static struct irq_chip intel_ir_chip = { * irq_enter(); * handle_edge_irq() * irq_chip_ack_parent() - * irq_move_irq(); // No EOI + * intel_ack_posted_msi_irq(); // No EOI * handle_irq_event() * driver_handler() * handle_edge_irq() * irq_chip_ack_parent() - * irq_move_irq(); // No EOI + * intel_ack_posted_msi_irq(); // No EOI * handle_irq_event() * driver_handler() * handle_edge_irq() * irq_chip_ack_parent() - * irq_move_irq(); // No EOI + * intel_ack_posted_msi_irq(); // No EOI * handle_irq_event() * driver_handler() * apic_eoi() @@ -1322,7 +1322,7 @@ static struct irq_chip intel_ir_chip = { */ static struct irq_chip intel_ir_chip_post_msi = { .name = "INTEL-IR-POST", - .irq_ack = irq_move_irq, + .irq_ack = intel_ack_posted_msi_irq, .irq_set_affinity = intel_ir_set_affinity, .irq_compose_msi_msg = intel_ir_compose_msi_msg, .irq_set_vcpu_affinity = intel_ir_set_vcpu_affinity, -- cgit v1.2.3 From c56a12c71ad38f381105f6e5036dede64ad2dfee Mon Sep 17 00:00:00 2001 From: Peter Zijlstra Date: Thu, 18 Dec 2025 11:47:38 +0100 Subject: x86/bug: Fix old GCC compile fails For some mysterious reasons the GCC 8 and 9 preprocessor manages to sporadically fumble _ASM_BYTES(0x0f, 0x0b): $ grep ".byte[ ]*0x0f" defconfig-build/drivers/net/wireless/realtek/rtlwifi/base.s 1: .byte0x0f,0x0b ; 1: .byte 0x0f,0x0b ; which makes the assembler upset and all that. While there are more _ASM_BYTES() users (notably the NOP instructions), those don't seem affected. Therefore replace the offending ASM_UD2 with one using the ud2 mnemonic. Reported-by: Jean Delvare Suggested-by: Uros Bizjak Fixes: 85a2d4a890dc ("x86,ibt: Use UDB instead of 0xEA") Cc: stable@kernel.org Signed-off-by: Peter Zijlstra (Intel) Link: https://patch.msgid.link/20251218104659.GT3911114@noisy.programming.kicks-ass.net --- arch/x86/include/asm/bug.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/x86/include/asm/bug.h b/arch/x86/include/asm/bug.h index ee23b98353d7..40de5796adb5 100644 --- a/arch/x86/include/asm/bug.h +++ b/arch/x86/include/asm/bug.h @@ -15,7 +15,7 @@ extern void __WARN_trap(struct bug_entry *bug, ...); /* * Despite that some emulators terminate on UD2, we use it for WARN(). */ -#define ASM_UD2 _ASM_BYTES(0x0f, 0x0b) +#define ASM_UD2 __ASM_FORM(ud2) #define INSN_UD2 0x0b0f #define LEN_UD2 2 -- cgit v1.2.3