summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2025-12-05cifs: Add a tracepoint to log EIO errorsDavid Howells
Add a tracepoint to log EIO errors and give it the capacity to convey up to two integers of information. This is then wrapped with three functions: int smb_EIO(enum smb_eio_trace trace) int smb_EIO1(enum smb_eio_trace trace, unsigned long info) int smb_EIO2(enum smb_eio_trace trace, unsigned long info, unsigned long info2) depending on how many bits of info are desired to be logged with any particular trace. The functions all return -EIO and can be used in place of -EIO. The trace argument is an enum value that gets translated to a string when the trace is printed. This makes is easier to log EIO instances when the client is under high load than turning on a printk wrapper such as cifs_dbg(). Granted, EIO could have its own separate EIO printing since EIO shouldn't happen. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Paulo Alcantara (Red Hat) <pc@manguebit.org> cc: linux-cifs@vger.kernel.org cc: linux-fsdevel@vger.kernel.org Signed-off-by: Steve French <stfrench@microsoft.com>
2025-12-05cifs: Don't need state locking in smb2_get_mid_entry()David Howells
There's no need to get ->srv_lock or ->ses_lock in smb2_get_mid_entry() as all that happens of relevance (to the lock) inside the locked sections is the reading of one status value in each. Replace the locking with READ_ONCE() and use a switch instead of a chain of if-statements. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Paulo Alcantara (Red Hat) <pc@manguebit.org> cc: Shyam Prasad N <sprasad@microsoft.com> cc: Tom Talpey <tom@talpey.com> cc: linux-cifs@vger.kernel.org cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org Signed-off-by: Steve French <stfrench@microsoft.com>
2025-12-05cifs: Remove the server pointer from smb_messageDavid Howells
Remove the server pointer from smb_message and instead pass it down to all the things that access it. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Paulo Alcantara (Red Hat) <pc@manguebit.org> cc: Shyam Prasad N <sprasad@microsoft.com> cc: Tom Talpey <tom@talpey.com> (RDMA, smbdirect) cc: linux-cifs@vger.kernel.org cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org Signed-off-by: Steve French <stfrench@microsoft.com>
2025-12-05cifs: Fix specification of function pointersDavid Howells
Change the mid_receive_t, mid_callback_t and mid_handle_t function pointers to have the pointer marker in the typedef. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Paulo Alcantara (Red Hat) <pc@manguebit.org> cc: linux-cifs@vger.kernel.org Signed-off-by: Steve French <stfrench@microsoft.com>
2025-12-05cifs: Replace SendReceiveBlockingLock() with SendReceive() plus flagsDavid Howells
Replace the smb1 transport's SendReceiveBlockingLock() with SendReceive() plus a couple of flags. This will then allow that to pick up the transport changes there. The first flag, CIFS_INTERRUPTIBLE_WAIT, is added to indicate that the wait should be interruptible and the second, CIFS_WINDOWS_LOCK, indicates that we need to send a Lock command with unlock type rather than a Cancel. send_lock_cancel() is then called from cifs_lock_cancel() which is called from the main transport loop in compound_send_recv(). [!] I *think* the error code handling is probably right. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Paulo Alcantara (Red Hat) <pc@manguebit.org> cc: Shyam Prasad N <sprasad@microsoft.com> cc: Tom Talpey <tom@talpey.com> cc: linux-cifs@vger.kernel.org cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org Signed-off-by: Steve French <stfrench@microsoft.com>
2025-12-05cifs: Clean up some places where an extra kvec[] was required for rfc1002David Howells
Clean up some places where previously an extra element in the kvec array was being used to hold an rfc1002 header for SMB1 (a previous patch removed this and generated it on the fly as for SMB2/3). Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Paulo Alcantara (Red Hat) <pc@manguebit.org> cc: Shyam Prasad N <sprasad@microsoft.com> cc: Tom Talpey <tom@talpey.com> cc: linux-cifs@vger.kernel.org cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org Signed-off-by: Steve French <stfrench@microsoft.com>
2025-12-05cifs: Make smb1's SendReceive() wrap cifs_send_recv()David Howells
Make the smb1 transport's SendReceive() simply wrap cifs_send_recv() as does SendReceive2(). This will then allow that to pick up the transport changes there. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Paulo Alcantara (Red Hat) <pc@manguebit.org> cc: Shyam Prasad N <sprasad@microsoft.com> cc: Tom Talpey <tom@talpey.com> cc: linux-cifs@vger.kernel.org cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org Signed-off-by: Steve French <stfrench@microsoft.com>
2025-12-05cifs: Remove the RFC1002 header from smb_hdrDavid Howells
Remove the RFC1002 header from struct smb_hdr as used for SMB-1.0. This simplifies the SMB-1.0 code by simplifying a lot of places that have to add or subtract 4 to work around the fact that the RFC1002 header isn't really part of the message and the base for various offsets within the message is from the base of the smb_hdr, not the RFC1002 header. Further, clean up a bunch of places that require an extra kvec struct specifically pointing to the RFC1002 header, such that kvec[0].iov_base must be exactly 4 bytes before kvec[1].iov_base. This allows the header preamble size stuff to be removed too. The size of the request and response message are then handed around either directly or by summing the size of all the iov_len members in the kvec array for which we have a count. Also, this simplifies and cleans up the common transmission and receive paths for SMB1 and SMB2/3 as there no longer needs to be special handling casing for SMB1 messages as the RFC1002 header is now generated on the fly for SMB1 as it is for SMB2/3. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Tom Talpey <tom@talpey.com> Reviewed-by: Paulo Alcantara (Red Hat) <pc@manguebit.org> cc: Shyam Prasad N <sprasad@microsoft.com> cc: linux-cifs@vger.kernel.org cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org Signed-off-by: Steve French <stfrench@microsoft.com>
2025-12-05cifs: Fix handling of a beyond-EOF DIO/unbuffered read over SMB1David Howells
If a DIO read or an unbuffered read request extends beyond the EOF, the server will return a short read and a status code indicating that EOF was hit, which gets translated to -ENODATA. Note that the client does not cap the request at i_size, but asks for the amount requested in case there's a race on the server with a third party. Now, on the client side, the request will get split into multiple subrequests if rsize is smaller than the full request size. A subrequest that starts before or at the EOF and returns short data up to the EOF will be correctly handled, with the NETFS_SREQ_HIT_EOF flag being set, indicating to netfslib that we can't read more. If a subrequest, however, starts after the EOF and not at it, HIT_EOF will not be flagged, its error will be set to -ENODATA and it will be abandoned. This will cause the request as a whole to fail with -ENODATA. Fix this by setting NETFS_SREQ_HIT_EOF on any subrequest that lies beyond the EOF marker. This can be reproduced by mounting with "cache=none,sign,vers=1.0" and doing a read of a file that's significantly bigger than the size of the file (e.g. attempting to read 64KiB from a 16KiB file). Fixes: a68c74865f51 ("cifs: Fix SMB1 readv/writev callback in the same way as SMB2/3") Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Paulo Alcantara (Red Hat) <pc@manguebit.org> cc: Shyam Prasad N <sprasad@microsoft.com> cc: linux-cifs@vger.kernel.org cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org Signed-off-by: Steve French <stfrench@microsoft.com>
2025-12-05Merge tag 'pull-persistency' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull persistent dentry infrastructure and conversion from Al Viro: "Some filesystems use a kinda-sorta controlled dentry refcount leak to pin dentries of created objects in dcache (and undo it when removing those). A reference is grabbed and not released, but it's not actually _stored_ anywhere. That works, but it's hard to follow and verify; among other things, we have no way to tell _which_ of the increments is intended to be an unpaired one. Worse, on removal we need to decide whether the reference had already been dropped, which can be non-trivial if that removal is on umount and we need to figure out if this dentry is pinned due to e.g. unlink() not done. Usually that is handled by using kill_litter_super() as ->kill_sb(), but there are open-coded special cases of the same (consider e.g. /proc/self). Things get simpler if we introduce a new dentry flag (DCACHE_PERSISTENT) marking those "leaked" dentries. Having it set claims responsibility for +1 in refcount. The end result this series is aiming for: - get these unbalanced dget() and dput() replaced with new primitives that would, in addition to adjusting refcount, set and clear persistency flag. - instead of having kill_litter_super() mess with removing the remaining "leaked" references (e.g. for all tmpfs files that hadn't been removed prior to umount), have the regular shrink_dcache_for_umount() strip DCACHE_PERSISTENT of all dentries, dropping the corresponding reference if it had been set. After that kill_litter_super() becomes an equivalent of kill_anon_super(). Doing that in a single step is not feasible - it would affect too many places in too many filesystems. It has to be split into a series. This work has really started early in 2024; quite a few preliminary pieces have already gone into mainline. This chunk is finally getting to the meat of that stuff - infrastructure and most of the conversions to it. Some pieces are still sitting in the local branches, but the bulk of that stuff is here" * tag 'pull-persistency' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (54 commits) d_make_discardable(): warn if given a non-persistent dentry kill securityfs_recursive_remove() convert securityfs get rid of kill_litter_super() convert rust_binderfs convert nfsctl convert rpc_pipefs convert hypfs hypfs: swich hypfs_create_u64() to returning int hypfs: switch hypfs_create_str() to returning int hypfs: don't pin dentries twice convert gadgetfs gadgetfs: switch to simple_remove_by_name() convert functionfs functionfs: switch to simple_remove_by_name() functionfs: fix the open/removal races functionfs: need to cancel ->reset_work in ->kill_sb() functionfs: don't bother with ffs->ref in ffs_data_{opened,closed}() functionfs: don't abuse ffs_data_closed() on fs shutdown convert selinuxfs ...
2025-12-05Merge tag 'mm-stable-2025-12-03-21-26' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: "__vmalloc()/kvmalloc() and no-block support" (Uladzislau Rezki) Rework the vmalloc() code to support non-blocking allocations (GFP_ATOIC, GFP_NOWAIT) "ksm: fix exec/fork inheritance" (xu xin) Fix a rare case where the KSM MMF_VM_MERGE_ANY prctl state is not inherited across fork/exec "mm/zswap: misc cleanup of code and documentations" (SeongJae Park) Some light maintenance work on the zswap code "mm/page_owner: add debugfs files 'show_handles' and 'show_stacks_handles'" (Mauricio Faria de Oliveira) Enhance the /sys/kernel/debug/page_owner debug feature by adding unique identifiers to differentiate the various stack traces so that userspace monitoring tools can better match stack traces over time "mm/page_alloc: pcp->batch cleanups" (Joshua Hahn) Minor alterations to the page allocator's per-cpu-pages feature "Improve UFFDIO_MOVE scalability by removing anon_vma lock" (Lokesh Gidra) Address a scalability issue in userfaultfd's UFFDIO_MOVE operation "kasan: cleanups for kasan_enabled() checks" (Sabyrzhan Tasbolatov) "drivers/base/node: fold node register and unregister functions" (Donet Tom) Clean up the NUMA node handling code a little "mm: some optimizations for prot numa" (Kefeng Wang) Cleanups and small optimizations to the NUMA allocation hinting code "mm/page_alloc: Batch callers of free_pcppages_bulk" (Joshua Hahn) Address long lock hold times at boot on large machines. These were causing (harmless) softlockup warnings "optimize the logic for handling dirty file folios during reclaim" (Baolin Wang) Remove some now-unnecessary work from page reclaim "mm/damon: allow DAMOS auto-tuned for per-memcg per-node memory usage" (SeongJae Park) Enhance the DAMOS auto-tuning feature "mm/damon: fixes for address alignment issues in DAMON_LRU_SORT and DAMON_RECLAIM" (Quanmin Yan) Fix DAMON_LRU_SORT and DAMON_RECLAIM with certain userspace configuration "expand mmap_prepare functionality, port more users" (Lorenzo Stoakes) Enhance the new(ish) file_operations.mmap_prepare() method and port additional callsites from the old ->mmap() over to ->mmap_prepare() "Fix stale IOTLB entries for kernel address space" (Lu Baolu) Fix a bug (and possible security issue on non-x86) in the IOMMU code. In some situations the IOMMU could be left hanging onto a stale kernel pagetable entry "mm/huge_memory: cleanup __split_unmapped_folio()" (Wei Yang) Clean up and optimize the folio splitting code "mm, swap: misc cleanup and bugfix" (Kairui Song) Some cleanups and a minor fix in the swap discard code "mm/damon: misc documentation fixups" (SeongJae Park) "mm/damon: support pin-point targets removal" (SeongJae Park) Permit userspace to remove a specific monitoring target in the middle of the current targets list "mm: MISC follow-up patches for linux/pgalloc.h" (Harry Yoo) A couple of cleanups related to mm header file inclusion "mm/swapfile.c: select swap devices of default priority round robin" (Baoquan He) improve the selection of swap devices for NUMA machines "mm: Convert memory block states (MEM_*) macros to enums" (Israel Batista) Change the memory block labels from macros to enums so they will appear in kernel debug info "ksm: perform a range-walk to jump over holes in break_ksm" (Pedro Demarchi Gomes) Address an inefficiency when KSM unmerges an address range "mm/damon/tests: fix memory bugs in kunit tests" (SeongJae Park) Fix leaks and unhandled malloc() failures in DAMON userspace unit tests "some cleanups for pageout()" (Baolin Wang) Clean up a couple of minor things in the page scanner's writeback-for-eviction code "mm/hugetlb: refactor sysfs/sysctl interfaces" (Hui Zhu) Move hugetlb's sysfs/sysctl handling code into a new file "introduce VM_MAYBE_GUARD and make it sticky" (Lorenzo Stoakes) Make the VMA guard regions available in /proc/pid/smaps and improves the mergeability of guarded VMAs "mm: perform guard region install/remove under VMA lock" (Lorenzo Stoakes) Reduce mmap lock contention for callers performing VMA guard region operations "vma_start_write_killable" (Matthew Wilcox) Start work on permitting applications to be killed when they are waiting on a read_lock on the VMA lock "mm/damon/tests: add more tests for online parameters commit" (SeongJae Park) Add additional userspace testing of DAMON's "commit" feature "mm/damon: misc cleanups" (SeongJae Park) "make VM_SOFTDIRTY a sticky VMA flag" (Lorenzo Stoakes) Address the possible loss of a VMA's VM_SOFTDIRTY flag when that VMA is merged with another "mm: support device-private THP" (Balbir Singh) Introduce support for Transparent Huge Page (THP) migration in zone device-private memory "Optimize folio split in memory failure" (Zi Yan) "mm/huge_memory: Define split_type and consolidate split support checks" (Wei Yang) Some more cleanups in the folio splitting code "mm: remove is_swap_[pte, pmd]() + non-swap entries, introduce leaf entries" (Lorenzo Stoakes) Clean up our handling of pagetable leaf entries by introducing the concept of 'software leaf entries', of type softleaf_t "reparent the THP split queue" (Muchun Song) Reparent the THP split queue to its parent memcg. This is in preparation for addressing the long-standing "dying memcg" problem, wherein dead memcg's linger for too long, consuming memory resources "unify PMD scan results and remove redundant cleanup" (Wei Yang) A little cleanup in the hugepage collapse code "zram: introduce writeback bio batching" (Sergey Senozhatsky) Improve zram writeback efficiency by introducing batched bio writeback support "memcg: cleanup the memcg stats interfaces" (Shakeel Butt) Clean up our handling of the interrupt safety of some memcg stats "make vmalloc gfp flags usage more apparent" (Vishal Moola) Clean up vmalloc's handling of incoming GFP flags "mm: Add soft-dirty and uffd-wp support for RISC-V" (Chunyan Zhang) Teach soft dirty and userfaultfd write protect tracking to use RISC-V's Svrsw60t59b extension "mm: swap: small fixes and comment cleanups" (Youngjun Park) Fix a small bug and clean up some of the swap code "initial work on making VMA flags a bitmap" (Lorenzo Stoakes) Start work on converting the vma struct's flags to a bitmap, so we stop running out of them, especially on 32-bit "mm/swapfile: fix and cleanup swap list iterations" (Youngjun Park) Address a possible bug in the swap discard code and clean things up a little [ This merge also reverts commit ebb9aeb980e5 ("vfio/nvgrace-gpu: register device memory for poison handling") because it looks broken to me, I've asked for clarification - Linus ] * tag 'mm-stable-2025-12-03-21-26' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (321 commits) mm: fix vma_start_write_killable() signal handling mm/swapfile: use plist_for_each_entry in __folio_throttle_swaprate mm/swapfile: fix list iteration when next node is removed during discard fs/proc/task_mmu.c: fix make_uffd_wp_huge_pte() huge pte handling mm/kfence: add reboot notifier to disable KFENCE on shutdown memcg: remove inc/dec_lruvec_kmem_state helpers selftests/mm/uffd: initialize char variable to Null mm: fix DEBUG_RODATA_TEST indentation in Kconfig mm: introduce VMA flags bitmap type tools/testing/vma: eliminate dependency on vma->__vm_flags mm: simplify and rename mm flags function for clarity mm: declare VMA flags by bit zram: fix a spelling mistake mm/page_alloc: optimize lowmem_reserve max lookup using its semantic monotonicity mm/vmscan: skip increasing kswapd_failures when reclaim was boosted pagemap: update BUDDY flag documentation mm: swap: remove scan_swap_map_slots() references from comments mm: swap: change swap_alloc_slow() to void mm, swap: remove redundant comment for read_swap_cache_async mm, swap: use SWP_SOLIDSTATE to determine if swap is rotational ...
2025-12-05Merge tag 'sysctl-6.19-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/sysctl/sysctl Pull sysctl updates from Joel Granados: - Move jiffies converters out of kernel/sysctl.c Move the jiffies converters into kernel/time/jiffies.c and replace the pipe-max-size proc_handler converter with a macro based version. This is all part of the effort to relocate non-sysctl logic out of kernel/sysctl.c into more relevant subsystems. No functional changes. - Generalize proc handler converter creation Remove duplicated sysctl converter logic by consolidating it in macros. These are used inside sysctl core as well as in pipe.c and jiffies.c. Converter kernel and user space pointer args are now automatically const qualified for the convenience of the caller. No functional changes. - Miscellaneous Fix kernel-doc format warnings, remove unnecessary __user qualifiers, and move the nmi_watchdog sysctl into .rodata. - Testing This series was run through sysctl selftests/kunit test suite in x86_64. It went into linux-next after rc2, giving it a good 4/5 weeks of testing. * tag 'sysctl-6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/sysctl/sysctl: (21 commits) sysctl: Wrap do_proc_douintvec with the public function proc_douintvec_conv sysctl: Create pipe-max-size converter using sysctl UINT macros sysctl: Move proc_doulongvec_ms_jiffies_minmax to kernel/time/jiffies.c sysctl: Move jiffies converters to kernel/time/jiffies.c sysctl: Move UINT converter macros to sysctl header sysctl: Move INT converter macros to sysctl header sysctl: Allow custom converters from outside sysctl sysctl: remove __user qualifier from stack_erasing_sysctl buffer argument sysctl: Create macro for user-to-kernel uint converter sysctl: Add optional range checking to SYSCTL_UINT_CONV_CUSTOM sysctl: Create unsigned int converter using new macro sysctl: Add optional range checking to SYSCTL_INT_CONV_CUSTOM sysctl: Create integer converters with one macro sysctl: Create converter functions with two new macros sysctl: Discriminate between kernel and user converter params sysctl: Indicate the direction of operation with macro names sysctl: Remove superfluous __do_proc_* indirection sysctl: Remove superfluous tbl_data param from "dovec" functions sysctl: Replace void pointer with const pointer to ctl_table sysctl: fix kernel-doc format warning ...
2025-12-05Merge tag 'pstore-v6.19-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull pstore update from Kees Cook: - pstore/ram: Update module parameters from platform data (Tzung-Bi Shih) * tag 'pstore-v6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: pstore/ram: Update module parameters from platform data
2025-12-05Merge tag 'configfs-for-v6.19' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/a.hindborg/linux Pull configfs updates from Andreas Hindborg: "Two commits changing constness of the configfs vtable pointers. We plan to follow up with changes at call sites down the road" * tag 'configfs-for-v6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/a.hindborg/linux: configfs: Constify ct_item_ops in struct config_item_type configfs: Constify ct_group_ops in struct config_item_type
2025-12-059p: fix new mount API cache option handlingEric Sandeen
After commit 4eb3117888a92, 9p needs to be able to accept numerical cache= mount options as well as the string "shortcuts" because the option is printed numerically in /proc/mounts rather than by string. This was missed in the mount API conversion, which used an enum for the shortcuts and therefore could not handle a numeric equivalent as an argument to the cache option. Fix this by removing the enum and reverting to the slightly more open-coded option handling for Opt_cache, with the reinstated get_cache_mode() helper. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Message-ID: <48cdeec9-5bb9-4c7a-a203-39bb8e0ef443@redhat.com> Tested-by: Remi Pommarel <repk@triplefau.lt> Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>
2025-12-059p: fix cache/debug options printing in v9fs_show_optionsEric Sandeen
commit 4eb3117888a92 changed the cache= option to accept either string shortcuts or bitfield values. It also changed /proc/mounts to emit the option as the hexadecimal numeric value rather than the shortcut string. However, by printing "cache=%x" without the leading 0x, shortcuts such as "cache=loose" will emit "cache=f" and 'f' is not a string that is parseable by kstrtoint(), so remounting may fail if a remount with "cache=f" is attempted. debug=%x has had the same problem since options have been displayed in c4fac9100456 ("9p: Implement show_options") Fix these by adding the 0x prefix to the hexadecimal value shown in /proc/mounts. Fixes: 4eb3117888a92 ("fs/9p: Rework cache modes and add new options to Documentation") Signed-off-by: Eric Sandeen <sandeen@redhat.com> Message-ID: <54b93378-dcf1-4b04-922d-c8b4393da299@redhat.com> [Dominique: use %#x at Al Viro's suggestion, also handle debug] Tested-by: Remi Pommarel <repk@triplefau.lt> Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>
2025-12-04autofs: fix per-dentry timeout warningIan Kent
The check that determines if the message that warns about the per-dentry timeout being greater than the super block timeout is not correct. The initial value for this field is -1 and the type of the field is unsigned long. I could change the type to long but the message is in the wrong place too, it should come after the timeout setting. So leave everything else as it is and move the message and check the timeout is actually set as an additional condition on issuing the message. Also fix the timeout comparison. Signed-off-by: Ian Kent <raven@themaw.net> Link: https://patch.msgid.link/20251111060439.19593-2-raven@themaw.net Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-12-03cifs: client: enforce consistent handling of multichannel and max_channelsRajasi Mandal
Previously, the behavior of the multichannel and max_channels mount options was inconsistent and order-dependent. For example, specifying "multichannel,max_channels=1" would result in 2 channels, while "max_channels=1,multichannel" would result in 1 channel. Additionally, conflicting combinations such as "nomultichannel,max_channels=3" or "multichannel,max_channels=1" did not produce errors and could lead to unexpected channel counts. This commit introduces two new fields in smb3_fs_context to explicitly track whether multichannel and max_channels were specified during mount. The option parsing and validation logic is updated to ensure: - The outcome is no longer dependent on the order of options. - Conflicting combinations (e.g., "nomultichannel,max_channels=3" or "multichannel,max_channels=1") are detected and result in an error. - The number of channels created is consistent with the specified options. This improves the reliability and predictability of mount option handling for SMB3 multichannel support. Reviewed-by: Shyam Prasad N <sprasad@microsoft.com> Signed-off-by: Rajasi Mandal <rajasimandal@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2025-12-03Merge tag 'ntfs3_for_6.19' of ↵Linus Torvalds
https://github.com/Paragon-Software-Group/linux-ntfs3 Pull ntfs3 updates from Konstantin Komarov: "New code: - support timestamps prior to epoch - do not overwrite uptodate pages - disable readahead for compressed files - setting of dummy blocksize to read boot_block when mounting - the run_lock initialization when loading $Extend - initialization of allocated memory before use - support for the NTFS3_IOC_SHUTDOWN ioctl - check for minimum alignment when performing direct I/O reads - check for shutdown in fsync Fixes: - mount failure for sparse runs in run_unpack() - use-after-free of sbi->options in cmp_fnames - KMSAN uninit bug after failed mi_read in mi_format_new - uninit error after buffer allocation by __getname() - KMSAN uninit-value in ni_create_attr_list - double free of sbi->options->nls and ownership of fc->fs_private - incorrect vcn adjustments in attr_collapse_range() - mode update when ACL can be reduced to mode - memory leaks in add sub record Changes: - refactor code, updated terminology, spelling - do not kmap pages in (de)compression code - after ntfs_look_free_mft(), code that fails must put mft_inode - default mount options for "acl" and "prealloc" Replaced: - use unsafe_memcpy() to avoid memcpy size warning - ntfs_bio_pages with page cache for compressed files" * tag 'ntfs3_for_6.19' of https://github.com/Paragon-Software-Group/linux-ntfs3: (26 commits) fs/ntfs3: check for shutdown in fsync fs/ntfs3: change the default mount options for "acl" and "prealloc" fs/ntfs3: Prevent memory leaks in add sub record fs/ntfs3: out1 also needs to put mi fs/ntfs3: Fix spelling mistake "recommened" -> "recommended" fs/ntfs3: update mode in xattr when ACL can be reduced to mode fs/ntfs3: check minimum alignment for direct I/O fs/ntfs3: implement NTFS3_IOC_SHUTDOWN ioctl fs/ntfs3: correct attr_collapse_range when file is too fragmented ntfs3: fix double free of sbi->options->nls and clarify ownership of fc->fs_private fs/ntfs3: Initialize allocated memory before use fs/ntfs3: remove ntfs_bio_pages and use page cache for compressed I/O ntfs3: avoid memcpy size warning fs/ntfs3: fix KMSAN uninit-value in ni_create_attr_list ntfs3: init run lock for extend inode ntfs: set dummy blocksize to read boot_block when mounting fs/ntfs3: disable readahead for compressed files ntfs3: Fix uninit buffer allocated by __getname() ntfs3: fix uninit memory after failed mi_read in mi_format_new ntfs3: fix use-after-free of sbi->options in cmp_fnames ...
2025-12-03Merge tag 'ext4_for_linus-6.19-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 updates from Ted Ts'o: "New features and improvements for the ext4 file system: - Optimize online defragmentation by using folios instead of individual buffer heads - Improve error codes stored in the superblock when the journal aborts - Minor cleanups and clarifications in ext4_map_blocks() - Add documentation of the casefold and encrypt flags - Add support for file systems with a blocksize greater than the pagesize - Improve performance by enabling the caching the fact that an inode does not have a Posix ACL Various Bug Fixes: - Fix false positive complaints from smatch - Fix error code which is returned by ext4fs_dirhash() when Siphash is used without the encryption key - Fix races when writing to inline data files which could trigger a BUG - Fix potential NULL dereference when there is an corrupt file system with an extended attribute value stored in a inode - Fix false positive lockdep report when syzbot uses ext4 and ocfs2 together - Fix false positive reported by DEPT by adjusting lock annotation - Avoid a potential BUG_ON in jbd2 when a file system is massively corrupted - Fix a WARN_ON when superblock is corrupted with a non-NULL terminated mount options field - Add check if the userspace passes in a non-NULL terminated mount options field to EXT4_IOC_SET_TUNE_SB_PARAM - Fix a potential journal checksum failure whena file system is copied while it is mounted read-only - Fix a potential potential orphan file tracking error which only showed on 32-bit systems - Fix assertion checks in mballoc (which have to be explicitly enbled by manually enabling AGGRESSIVE_CHECKS and recompiling) - Avoid complaining about overly large orphan files created by mke2fs with with file systems with a 64k block size" * tag 'ext4_for_linus-6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (58 commits) ext4: mark inodes without acls in __ext4_iget() ext4: enable block size larger than page size ext4: add checks for large folio incompatibilities when BS > PS ext4: support verifying data from large folios with fs-verity ext4: make data=journal support large block size ext4: support large block size in __ext4_block_zero_page_range() ext4: support large block size in mpage_prepare_extent_to_map() ext4: support large block size in mpage_map_and_submit_buffers() ext4: support large block size in ext4_block_write_begin() ext4: support large block size in ext4_mpage_readpages() ext4: rename 'page' references to 'folio' in multi-block allocator ext4: prepare buddy cache inode for BS > PS with large folios ext4: support large block size in ext4_mb_init_cache() ext4: support large block size in ext4_mb_get_buddy_page_lock() ext4: support large block size in ext4_mb_load_buddy_gfp() ext4: add EXT4_LBLK_TO_PG and EXT4_PG_TO_LBLK for block/page conversion ext4: add EXT4_LBLK_TO_B macro for logical block to bytes conversion ext4: support large block size in ext4_readdir() ext4: support large block size in ext4_calculate_overhead() ext4: introduce s_min_folio_order for future BS > PS support ...
2025-12-03Merge tag 'gfs2-for-6.19' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2 Pull gfs2 updates from Andreas Gruenbacher: - Major withdraw / error handling overhaul based on dlm's new DLM_RELEASE_RECOVER feature: this allows gfs to treat withdraws like node failures. Make withdraws asynchronous - Fix a bug in commit e4a8b5481c59a that caused 'df' to remain out of sync. ('df' is still allowed to go slightly out of sync for short periods of time) - Prevent recusive memory reclaim in gfs2_unstuff_dinode() - Clean up SDF_JOURNAL_LIVE flag handling - Fix remote evict for read-only filesystems - Fix a misuse of bio_chain() - Various other minor cleanups * tag 'gfs2-for-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: (35 commits) gfs2: Fix use of bio_chain gfs2: Clean up SDF_JOURNAL_LIVE flag handling gfs2: No longer thaw filesystems during a withdraw gfs2: Withdraw immediately in gfs2_trans_add_meta gfs2: New gfs2_withdraw_helper gfs2: Clean up properly during a withdraw gfs2: Rename gfs2_{gl_dq_holders => withdraw_glocks} Revert "gfs2: fix infinite loop when checking ail item count before go_inval" Revert "gfs2: Allow some glocks to be used during withdraw" Revert "gfs2: Check for log write errors before telling dlm to unlock" Revert "gfs2: fix a deadlock on withdraw-during-mount" Revert "gfs2: Force withdraw to replay journals and wait for it to finish" (6/6) Revert "gfs2: Force withdraw to replay journals and wait for it to finish" (5/6) Revert "gfs2: Force withdraw to replay journals and wait for it to finish" (4/6) Revert "gfs2: Force withdraw to replay journals and wait for it to finish" (3/6) Revert "gfs2: Force withdraw to replay journals and wait for it to finish" (2/6) Revert "gfs2: Force withdraw to replay journals and wait for it to finish" (1/6) Revert "gfs2: don't stop reads while withdraw in progress" gfs2: Rename LM_FLAG_{NOEXP -> RECOVER} gfs2: Kill gfs2_io_error_bh_wd ...
2025-12-03Merge tag 'v6.19-rc-smb-fixes' of git://git.samba.org/ksmbdLinus Torvalds
Pull smb client and server updates from Steve French: - server fixes: - IPC use after free locking fix - fix locking bug in delete paths - fix use after free in disconnect - fix underflow in locking check - error mapping improvement - socket listening improvement - return code mapping fixes - crypto improvements (use default libraries) - cleanup patches: - netfs - client checkpatch cleanup - server cleanup - move server/client duplicate code to common code - fix some defines to better match protocol specification - smbdirect (RDMA) fixes - client debugging improvements for leases * tag 'v6.19-rc-smb-fixes' of git://git.samba.org/ksmbd: (44 commits) cifs: Use netfs_alloc/free_folioq_buffer() smb: client: show smb lease key in open_dirs output smb: client: show smb lease key in open_files output ksmbd: ipc: fix use-after-free in ipc_msg_send_request smb: client: relax WARN_ON_ONCE(SMBDIRECT_SOCKET_*) checks in recv_done() and smbd_conn_upcall() smb: server: relax WARN_ON_ONCE(SMBDIRECT_SOCKET_*) checks in recv_done() and smb_direct_cm_handler() smb: smbdirect: introduce SMBDIRECT_CHECK_STATUS_{WARN,DISCONNECT}() smb: smbdirect: introduce SMBDIRECT_DEBUG_ERR_PTR() helper ksmbd: vfs: fix race on m_flags in vfs_cache ksmbd: Replace strcpy + strcat to improve convert_to_nt_pathname smb: move FILE_SYSTEM_ATTRIBUTE_INFO to common/fscc.h ksmbd: implement error handling for STATUS_INFO_LENGTH_MISMATCH in smb server ksmbd: fix use-after-free in ksmbd_tree_connect_put under concurrency ksmbd: server: avoid busy polling in accept loop smb: move create_durable_reconn to common/smb2pdu.h smb: fix some warnings reported by scripts/checkpatch.pl smb: do some cleanups smb: move FILE_SYSTEM_SIZE_INFO to common/fscc.h smb: move some duplicate struct definitions to common/fscc.h smb: move list of FileSystemAttributes to common/fscc.h ...
2025-12-03Merge tag 'xfs-merge-6.19' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds
Pull xfs updates from Carlos Maiolino: "There are no major changes in xfs. This contains mostly some code cleanups, a few bug fixes and documentation update. Highlights are: - Quota locking cleanup - Getting rid of old xlog_in_core_2_t type" * tag 'xfs-merge-6.19' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (33 commits) docs: remove obsolete links in the xfs online repair documentation xfs: move some code out of xfs_iget_recycle xfs: use zi more in xfs_zone_gc_mount xfs: remove the unused bv field in struct xfs_gc_bio xfs: remove xarray mark for reclaimable zones xfs: remove the xlog_in_core_t typedef xfs: remove l_iclog_heads xfs: remove the xlog_rec_header_t typedef xfs: remove xlog_in_core_2_t xfs: remove a very outdated comment from xlog_alloc_log xfs: cleanup xlog_alloc_log a bit xfs: don't use xlog_in_core_2_t in struct xlog_in_core xfs: add a on-disk log header cycle array accessor xfs: add a XLOG_CYCLE_DATA_SIZE constant xfs: reduce ilock roundtrips in xfs_qm_vop_dqalloc xfs: move xfs_dquot_tree calls into xfs_qm_dqget_cache_{lookup,insert} xfs: move quota locking into xrep_quota_item xfs: move quota locking into xqcheck_commit_dquot xfs: move q_qlock locking into xqcheck_compare_dquot xfs: move q_qlock locking into xchk_quota_item ...
2025-12-03Merge tag 'erofs-for-6.19-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs Pull erofs updates from Gao Xiang: - Fix a WARNING caused by a recent FSDAX misdetection regression - Fix the filesystem stacking limit for file-backed mounts - Print more informative diagnostics on decompression errors - Switch the on-disk definition `erofs_fs.h` to the MIT license - Minor cleanups * tag 'erofs-for-6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs: erofs: switch on-disk header `erofs_fs.h` to MIT license erofs: get rid of raw bi_end_io() usage erofs: enable error reporting for z_erofs_fixup_insize() erofs: enable error reporting for z_erofs_stream_switch_bufs() erofs: improve Zstd, LZMA and DEFLATE error strings erofs: improve decompression error reporting erofs: tidy up z_erofs_lz4_handle_overlap() erofs: limit the level of fs stacking for file-backed mounts erofs: correct FSDAX detection
2025-12-03Merge tag 'hfs-v6.19-tag1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vdubeyko/hfs Pull hfs/hfsplus updates from Viacheslav Dubeyko: "Several fixes for syzbot reported issues, HFS/HFS+ fixes of xfstests failures, Kunit-based unit-tests introduction, and code cleanup: - Dan Carpenter fixed a potential use-after-free issue in hfs_correct_next_unused_CNID() method. Tetsuo Handa has made nice fix of syzbot reported issue related to incorrect inode->i_mode management if volume has been corrupted somehow. Yang Chenzhi has made really good fix of potential race condition in __hfs_bnode_create() method for HFS+ file system. - Several fixes to xfstests failures. Particularly, generic/070, generic/073, and generic/101 test-cases finish successfully for the case of HFS+ file system right now. - HFS and HFS+ drivers share multiple structures of on-disk layout declarations. Some structures are used without any change. However, we had two independent declarations of the same structures in HFS and HFS+ drivers. The on-disk layout declarations have been moved into include/linux/hfs_common.h with the goal to exclude the declarations duplication and to keep the HFS/HFS+ on-disk layout declarations in one place. Also, this patch prepares the basis for creating a hfslib that can aggregate common functionality without necessity to duplicate the same code in HFS and HFS+ drivers. - HFS/HFS+ really need unit-tests because of multiple xfstests failures. The first two patches introduce Kunit-based unit-tests for the case string operations in HFS/HFS+ file system drivers" * tag 'hfs-v6.19-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/vdubeyko/hfs: hfs/hfsplus: move on-disk layout declarations into hfs_common.h hfsplus: fix volume corruption issue for generic/101 hfsplus: introduce KUnit tests for HFS+ string operations hfs: introduce KUnit tests for HFS string operations hfsplus: fix volume corruption issue for generic/073 hfsplus: Verify inode mode when loading from disk hfsplus: fix volume corruption issue for generic/070 hfs/hfsplus: prevent getting negative values of offset/length hfsplus: fix missing hfs_bnode_get() in __hfs_bnode_create hfs: fix potential use after free in hfs_correct_next_unused_CNID()
2025-12-03Merge tag 'for-6.19-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs updates from David Sterba: "Features: - shutdown ioctl support (needs CONFIG_BTRFS_EXPERIMENTAL for now): - set filesystem state as being shut down (also named going down in other filesystems), where all active operations return EIO and this cannot be changed until unmount - pending operations are attempted to be finished but error messages may still show up depending on where exactly the shutdown happened - scrub (and device replace) vs suspend/hibernate: - a running scrub will prevent suspend, which can be annoying as suspend is an immediate request and scrub is not critical - filesystem freezing before suspend was not sufficient as the problem was in process freezing - behaviour change: on suspend scrub and device replace are cancelled, where scrub can record the last state and continue from there; the device replace has to be restarted from the beginning - zone stats exported in sysfs, from the perspective of the filesystem this includes active, reclaimable, relocation etc zones Performance: - improvements when processing space reservation tickets by optimizing locking and shrinking critical sections, cumulative improvements in lockstat numbers show +15% Notable fixes: - use vmalloc fallback when allocating bios as high order allocations can happen with wide checksums (like sha256) - scrub will always track the last position of progress so it's not starting from zero after an error Core: - under experimental config, checksum calculations are offloaded to process context, simplifies locking and allows to remove compression write worker kthread(s): - speed improvement in direct IO throughput with buffered IO fallback is +15% when not offloaded but this is more related to internal crypto subsystem improvements - this will be probably default in the future removing the sysfs tunable - (experimental) block size > page size updates: - support more operations when not using large folios (encoded read/write and send) - raid56 - more preparations for fscrypt support Other: - more conversions to auto-cleaned variables - parameter cleanups and removals - extended warning fixes - improved printing of structured values like keys - lots of other cleanups and refactoring" * tag 'for-6.19-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: (147 commits) btrfs: remove unnecessary inode key in btrfs_log_all_parents() btrfs: remove redundant zero/NULL initializations in btrfs_alloc_root() btrfs: remaining BTRFS_PATH_AUTO_FREE conversions btrfs: send: do not allocate memory for xattr data when checking it exists btrfs: send: add unlikely to all unexpected overflow checks btrfs: reduce arguments to btrfs_del_inode_ref_in_log() btrfs: remove root argument from btrfs_del_dir_entries_in_log() btrfs: use test_and_set_bit() in btrfs_delayed_delete_inode_ref() btrfs: don't search back for dir inode item in INO_LOOKUP_USER btrfs: don't rewrite ret from inode_permission btrfs: add orig_logical to btrfs_bio for encryption btrfs: disable verity on encrypted inodes btrfs: disable various operations on encrypted inodes btrfs: remove redundant level reset in btrfs_del_items() btrfs: simplify leaf traversal after path release in btrfs_next_old_leaf() btrfs: optimize balance_level() path reference handling btrfs: factor out root promotion logic into promote_child_to_root() btrfs: raid56: remove the "_step" infix btrfs: raid56: enable bs > ps support btrfs: raid56: prepare finish_parity_scrub() to support bs > ps cases ...
2025-12-03Merge tag 'for-6.19/block-20251201' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux Pull block updates from Jens Axboe: - Fix head insertion for mq-deadline, a regression from when priority support was added - Series simplifying and improving the ublk user copy code - Various ublk related cleanups - Fixup REQ_NOWAIT handling in loop/zloop, clearing NOWAIT when the request is punted to a thread for handling - Merge and then later revert loop dio nowait support, as it ended up causing excessive stack usage for when the inline issue code needs to dip back into the full file system code - Improve auto integrity code, making it less deadlock prone - Speedup polled IO handling, but manually managing the hctx lookups - Fixes for blk-throttle for SSD devices - Small series with fixes for the S390 dasd driver - Add support for caching zones, avoiding unnecessary report zone queries - MD pull requests via Yu: - fix null-ptr-dereference regression for dm-raid0 - fix IO hang for raid5 when array is broken with IO inflight - remove legacy 1s delay to speed up system shutdown - change maintainer's email address - data can be lost if array is created with different lbs devices, fix this problem and record lbs of the array in metadata - fix rcu protection for md_thread - fix mddev kobject lifetime regression - enable atomic writes for md-linear - some cleanups - bcache updates via Coly - remove useless discard and cache device code - improve usage of per-cpu workqueues - Reorganize the IO scheduler switching code, fixing some lockdep reports as well - Improve the block layer P2P DMA support - Add support to the block tracing code for zoned devices - Segment calculation improves, and memory alignment flexibility improvements - Set of prep and cleanups patches for ublk batching support. The actual batching hasn't been added yet, but helps shrink down the workload of getting that patchset ready for 6.20 - Fix for how the ps3 block driver handles segments offsets - Improve how block plugging handles batch tag allocations - nbd fixes for use-after-free of the configuration on device clear/put - Set of improvements and fixes for zloop - Add Damien as maintainer of the block zoned device code handling - Various other fixes and cleanups * tag 'for-6.19/block-20251201' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux: (162 commits) block/rnbd: correct all kernel-doc complaints blk-mq: use queue_hctx in blk_mq_map_queue_type md: remove legacy 1s delay in md_notify_reboot md/raid5: fix IO hang when array is broken with IO inflight md: warn about updating super block failure md/raid0: fix NULL pointer dereference in create_strip_zones() for dm-raid sbitmap: fix all kernel-doc warnings ublk: add helper of __ublk_fetch() ublk: pass const pointer to ublk_queue_is_zoned() ublk: refactor auto buffer register in ublk_dispatch_req() ublk: add `union ublk_io_buf` with improved naming ublk: add parameter `struct io_uring_cmd *` to ublk_prep_auto_buf_reg() kfifo: add kfifo_alloc_node() helper for NUMA awareness blk-mq: fix potential uaf for 'queue_hw_ctx' blk-mq: use array manage hctx map instead of xarray ublk: prevent invalid access with DEBUG s390/dasd: Use scnprintf() instead of sprintf() s390/dasd: Move device name formatting into separate function s390/dasd: Remove unnecessary debugfs_create() return checks s390/dasd: Fix gendisk parent after copy pair swap ...
2025-12-03Merge tag 'for-6.19/io_uring-20251201' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux Pull io_uring updates from Jens Axboe: - Unify how task_work cancelations are detected, placing it in the task_work running state rather than needing to check the task state - Series cleaning up and moving the cancelation code to where it belongs, in cancel.c - Cleanup of waitid and futex argument handling - Add support for mixed sized SQEs. 6.18 added support for mixed sized CQEs, improving flexibility and efficiency of workloads that need big CQEs. This adds similar support for SQEs, where the occasional need for a 128b SQE doesn't necessitate having all SQEs be 128b in size - Introduce zcrx and SQ/CQ layout queries. The former returns what zcrx features are available. And both return the ring size information to help with allocation size calculation for user provided rings like IORING_SETUP_NO_MMAP and IORING_MEM_REGION_TYPE_USER - Zcrx updates for 6.19. It includes a bunch of small patches, IORING_REGISTER_ZCRX_CTRL and RQ flushing and David's work on sharing zcrx b/w multiple io_uring instances - Series cleaning up ring initializations, notable deduplicating ring size and offset calculations. It also moves most of the checking before doing any allocations, making the code simpler - Add support for getsockname and getpeername, which is mostly a trivial hookup after a bit of refactoring on the networking side - Various fixes and cleanups * tag 'for-6.19/io_uring-20251201' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux: (68 commits) io_uring: Introduce getsockname io_uring cmd socket: Split out a getsockname helper for io_uring socket: Unify getsockname and getpeername implementation io_uring/query: drop unused io_handle_query_entry() ctx arg io_uring/kbuf: remove obsolete buf_nr_pages and update comments io_uring/register: use correct location for io_rings_layout io_uring/zcrx: share an ifq between rings io_uring/zcrx: add io_fill_zcrx_offsets() io_uring/zcrx: export zcrx via a file io_uring/zcrx: move io_zcrx_scrub() and dependencies up io_uring/zcrx: count zcrx users io_uring/zcrx: add sync refill queue flushing io_uring/zcrx: introduce IORING_REGISTER_ZCRX_CTRL io_uring/zcrx: elide passing msg flags io_uring/zcrx: use folio_nr_pages() instead of shift operation io_uring/zcrx: convert to use netmem_desc io_uring/query: introduce rings info query io_uring/query: introduce zcrx query io_uring: move cq/sq user offset init around io_uring: pre-calculate scq layout ...
2025-12-04f2fs: ignore discard return valueChaitanya Kulkarni
__blkdev_issue_discard() always returns 0, making the error assignment in __submit_discard_cmd() dead code. Initialize err to 0 and remove the error assignment from the __blkdev_issue_discard() call to err. Move fault injection code into already present if branch where err is set to -EIO. This preserves the fault injection behavior while removing dead error handling. Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Chaitanya Kulkarni <ckulkarnilinux@gmail.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-12-04f2fs: optimize trace_f2fs_write_checkpoint with enumsYH Lin
This patch optimizes the tracepoint by replacing these hardcoded strings with a new enumeration f2fs_cp_phase. 1.Defines enum f2fs_cp_phase with values for each checkpoint phase. 2.Updates trace_f2fs_write_checkpoint to accept a u16 phase argument instead of a string pointer. 3.Uses __print_symbolic in TP_printk to convert the enum values back to their corresponding strings for human-readable trace output. This change reduces the storage overhead for each trace event by replacing a variable-length string with a 2-byte integer, while maintaining the same readable output in ftrace. Signed-off-by: YH Lin <yhli@google.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-12-04f2fs: fix to not account invalid blocks in get_left_section_blocks()Chao Yu
w/ LFS mode, in get_left_section_blocks(), we should not account the blocks which were used before and now are invalided, otherwise those blocks will be counted as freed one in has_curseg_enough_space(), result in missing to trigger GC in time. Cc: stable@kernel.org Fixes: 249ad438e1d9 ("f2fs: add a method for calculating the remaining blocks in the current segment in LFS mode.") Fixes: bf34c93d2645 ("f2fs: check curseg space before foreground GC") Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-12-04f2fs: support to show curseg.next_blkoff in debugfsChao Yu
cat /sys/kernel/debug/f2fs/status Main area: 17 segs, 17 secs 17 zones TYPE blkoff segno secno zoneno dirty_seg full_seg valid_blk - COLD data: 0 4 4 4 0 0 0 - WARM data: 0 7 7 7 0 0 0 - HOT data: 1 5 5 5 2 0 512 - Dir dnode: 3 0 0 0 1 0 2 - File dnode: 0 1 1 1 0 0 0 - Indir nodes: 0 2 2 2 0 0 0 - Pinned file: 0 -1 -1 -1 - ATGC data: 0 -1 -1 -1 Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-12-04f2fs: expand scalability of f2fs mount optionChao Yu
opt field in structure f2fs_mount_info and opt_mask field in structure f2fs_fs_context is 32-bits variable, now we're running out of available bits in them, let's expand them to 64-bits for better scalability. Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-12-04f2fs: change default schedule timeout valueChao Yu
This patch changes default schedule timeout value from 20ms to 1ms, in order to give caller more chances to check whether IO or non-IO congestion condition has already been mitigable. In addition, default interval of periodical discard submission is kept to 20ms. Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-12-04f2fs: introduce f2fs_schedule_timeout()Chao Yu
In f2fs retry logic, we will call f2fs_io_schedule_timeout() to sleep as uninterruptible state (waiting for IO) for a while, however, in several paths below, we are not blocked by IO: - f2fs_write_single_data_page() return -EAGAIN due to racing on cp_rwsem. - f2fs_flush_device_cache() failed to submit preflush command. - __issue_discard_cmd_range() sleeps periodically in between two in batch discard submissions. So, in order to reveal state of task more accurate, let's introduce f2fs_schedule_timeout() and call it in above paths in where we are waiting for non-IO reasons. Then we can get real reason of uninterruptible sleep for a thread in tracepoint, perfetto, etc. Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-12-04f2fs: use memalloc_retry_wait() as much as possibleChao Yu
memalloc_retry_wait() is recommended in memory allocation retry logic, use it as much as possible. Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-12-04f2fs: add a sysfs entry to show max open zonesYongpeng Yang
This patch adds a sysfs entry showing the max zones that F2FS can write concurrently. Signed-off-by: Yongpeng Yang <yangyongpeng@xiaomi.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-12-04f2fs: wrap all unusable_blocks_per_sec code in CONFIG_BLK_DEV_ZONEDYongpeng Yang
The usage of unusable_blocks_per_sec is already wrapped by CONFIG_BLK_DEV_ZONED, except for its declaration and the definitions of CAP_BLKS_PER_SEC and CAP_SEGS_PER_SEC. This patch ensures that all code related to unusable_blocks_per_sec is properly wrapped under the CONFIG_BLK_DEV_ZONED option. Signed-off-by: Yongpeng Yang <yangyongpeng@xiaomi.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-12-04f2fs: simplify list initialization in f2fs_recover_fsync_data()Baolin Liu
In f2fs_recover_fsync_data(),use LIST_HEAD() to declare and initialize the list_head in one step instead of using INIT_LIST_HEAD() separately. No functional change. Signed-off-by: Baolin Liu <liubaolin@kylinos.cn> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-12-04f2fs: revert summary entry count from 2048 to 512 in 16kb block supportDaeho Jeong
The recent increase in the number of Segment Summary Area (SSA) entries from 512 to 2048 was an unintentional change in logic of 16kb block support. This commit corrects the issue. To better utilize the space available from the erroneous 2048-entry calculation, we are implementing a solution to share the currently unused SSA space with neighboring segments. This enhances overall SSA utilization without impacting the established 8MB segment size. Fixes: d7e9a9037de2 ("f2fs: Support Block Size == Page Size") Signed-off-by: Daeho Jeong <daehojeong@google.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-12-04f2fs: fix to detect recoverable inode during dryrun of find_fsync_dnodes()Chao Yu
mkfs.f2fs -f /dev/vdd mount /dev/vdd /mnt/f2fs touch /mnt/f2fs/foo sync # avoid CP_UMOUNT_FLAG in last f2fs_checkpoint.ckpt_flags touch /mnt/f2fs/bar f2fs_io fsync /mnt/f2fs/bar f2fs_io shutdown 2 /mnt/f2fs umount /mnt/f2fs blockdev --setro /dev/vdd mount /dev/vdd /mnt/f2fs mount: /mnt/f2fs: WARNING: source write-protected, mounted read-only. For the case if we create and fsync a new inode before sudden power-cut, without norecovery or disable_roll_forward mount option, the following mount will succeed w/o recovering last fsynced inode. The problem here is that we only check inode_list list after find_fsync_dnodes() in f2fs_recover_fsync_data() to find out whether there is recoverable data in the iamge, but there is a missed case, if last fsynced inode is not existing in last checkpoint, then, we will fail to get its inode due to nat of inode node is not existing in last checkpoint, so the inode won't be linked in inode_list. Let's detect such case in dyrun mode to fix this issue. After this change, mount will fail as expected below: mount: /mnt/f2fs: cannot mount /dev/vdd read-only. dmesg(1) may have more information after failed mount system call. demsg: F2FS-fs (vdd): Need to recover fsync data, but write access unavailable, please try mount w/ disable_roll_forward or norecovery Cc: stable@kernel.org Fixes: 6781eabba1bd ("f2fs: give -EINVAL for norecovery and rw mount") Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-12-04f2fs: fix return value of f2fs_recover_fsync_data()Chao Yu
With below scripts, it will trigger panic in f2fs: mkfs.f2fs -f /dev/vdd mount /dev/vdd /mnt/f2fs touch /mnt/f2fs/foo sync echo 111 >> /mnt/f2fs/foo f2fs_io fsync /mnt/f2fs/foo f2fs_io shutdown 2 /mnt/f2fs umount /mnt/f2fs mount -o ro,norecovery /dev/vdd /mnt/f2fs or mount -o ro,disable_roll_forward /dev/vdd /mnt/f2fs F2FS-fs (vdd): f2fs_recover_fsync_data: recovery fsync data, check_only: 0 F2FS-fs (vdd): Mounted with checkpoint version = 7f5c361f F2FS-fs (vdd): Stopped filesystem due to reason: 0 F2FS-fs (vdd): f2fs_recover_fsync_data: recovery fsync data, check_only: 1 Filesystem f2fs get_tree() didn't set fc->root, returned 1 ------------[ cut here ]------------ kernel BUG at fs/super.c:1761! Oops: invalid opcode: 0000 [#1] SMP PTI CPU: 3 UID: 0 PID: 722 Comm: mount Not tainted 6.18.0-rc2+ #721 PREEMPT(voluntary) Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014 RIP: 0010:vfs_get_tree.cold+0x18/0x1a Call Trace: <TASK> fc_mount+0x13/0xa0 path_mount+0x34e/0xc50 __x64_sys_mount+0x121/0x150 do_syscall_64+0x84/0x800 entry_SYSCALL_64_after_hwframe+0x76/0x7e RIP: 0033:0x7fa6cc126cfe The root cause is we missed to handle error number returned from f2fs_recover_fsync_data() when mounting image w/ ro,norecovery or ro,disable_roll_forward mount option, result in returning a positive error number to vfs_get_tree(), fix it. Cc: stable@kernel.org Fixes: 6781eabba1bd ("f2fs: give -EINVAL for norecovery and rw mount") Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-12-04f2fs: add fadvise tracepointJaegeuk Kim
This adds a tracepoint in the fadvise call path. Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-12-04f2fs: fix age extent cache insertion skip on counter overflowXiaole He
The age extent cache uses last_blocks (derived from allocated_data_blocks) to determine data age. However, there's a conflict between the deletion marker (last_blocks=0) and legitimate last_blocks=0 cases when allocated_data_blocks overflows to 0 after reaching ULLONG_MAX. In this case, valid extents are incorrectly skipped due to the "if (!tei->last_blocks)" check in __update_extent_tree_range(). This patch fixes the issue by: 1. Reserving ULLONG_MAX as an invalid/deletion marker 2. Limiting allocated_data_blocks to range [0, ULLONG_MAX-1] 3. Using F2FS_EXTENT_AGE_INVALID for deletion scenarios 4. Adjusting overflow age calculation from ULLONG_MAX to (ULLONG_MAX-1) Reproducer (using a patched kernel with allocated_data_blocks initialized to ULLONG_MAX - 3 for quick testing): Step 1: Mount and check initial state # dd if=/dev/zero of=/tmp/test.img bs=1M count=100 # mkfs.f2fs -f /tmp/test.img # mkdir -p /mnt/f2fs_test # mount -t f2fs -o loop,age_extent_cache /tmp/test.img /mnt/f2fs_test # cat /sys/kernel/debug/f2fs/status | grep -A 4 "Block Age" Allocated Data Blocks: 18446744073709551612 # ULLONG_MAX - 3 Inner Struct Count: tree: 1(0), node: 0 Step 2: Create files and write data to trigger overflow # touch /mnt/f2fs_test/{1,2,3,4}.txt; sync # cat /sys/kernel/debug/f2fs/status | grep -A 4 "Block Age" Allocated Data Blocks: 18446744073709551613 # ULLONG_MAX - 2 Inner Struct Count: tree: 5(0), node: 1 # dd if=/dev/urandom of=/mnt/f2fs_test/1.txt bs=4K count=1; sync # cat /sys/kernel/debug/f2fs/status | grep -A 4 "Block Age" Allocated Data Blocks: 18446744073709551614 # ULLONG_MAX - 1 Inner Struct Count: tree: 5(0), node: 2 # dd if=/dev/urandom of=/mnt/f2fs_test/2.txt bs=4K count=1; sync # cat /sys/kernel/debug/f2fs/status | grep -A 4 "Block Age" Allocated Data Blocks: 18446744073709551615 # ULLONG_MAX Inner Struct Count: tree: 5(0), node: 3 # dd if=/dev/urandom of=/mnt/f2fs_test/3.txt bs=4K count=1; sync # cat /sys/kernel/debug/f2fs/status | grep -A 4 "Block Age" Allocated Data Blocks: 0 # Counter overflowed! Inner Struct Count: tree: 5(0), node: 4 Step 3: Trigger the bug - next write should create node but gets skipped # dd if=/dev/urandom of=/mnt/f2fs_test/4.txt bs=4K count=1; sync # cat /sys/kernel/debug/f2fs/status | grep -A 4 "Block Age" Allocated Data Blocks: 1 Inner Struct Count: tree: 5(0), node: 4 Expected: node: 5 (new extent node for 4.txt) Actual: node: 4 (extent insertion was incorrectly skipped due to last_blocks = allocated_data_blocks = 0 in __get_new_block_age) After this fix, the extent node is correctly inserted and node count becomes 5 as expected. Fixes: 71644dff4811 ("f2fs: add block_age-based extent cache") Cc: stable@kernel.org Signed-off-by: Xiaole He <hexiaole1994@126.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-12-04f2fs: Add sanity checks before unlinking and loading inodesNikola Z. Ivanov
Add check for inode->i_nlink == 1 for directories during unlink, as their value is decremented twice, which can trigger a warning in drop_nlink. In such case mark the filesystem as corrupted and return from the function call with the relevant failure return value. Additionally add the check for i_nlink == 1 in sanity_check_inode in order to detect on-disk corruption early. Reported-by: syzbot+c07d47c7bc68f47b9083@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=c07d47c7bc68f47b9083 Tested-by: syzbot+c07d47c7bc68f47b9083@syzkaller.appspotmail.com Signed-off-by: Nikola Z. Ivanov <zlatistiv@gmail.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-12-04f2fs: Rename f2fs_unlink exit labelNikola Z. Ivanov
Rename "fail" label to "out" as it's used as a default exit path out of f2fs_unlink as well as error path. Signed-off-by: Nikola Z. Ivanov <zlatistiv@gmail.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-12-04f2fs: ensure minimum trim granularity accounts for all devicesYongpeng Yang
When F2FS uses multiple block devices, each device may have a different discard granularity. The minimum trim granularity must be at least the maximum discard granularity of all devices, excluding zoned devices. Use max_t instead of the max() macro to compute the maximum value. Signed-off-by: Yongpeng Yang <yangyongpeng@xiaomi.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-12-04f2fs: fix uninitialized one_time_gc in victim_sel_policyXiaole He
The one_time_gc field in struct victim_sel_policy is conditionally initialized but unconditionally read, leading to undefined behavior that triggers UBSAN warnings. In f2fs_get_victim() at fs/f2fs/gc.c:774, the victim_sel_policy structure is declared without initialization: struct victim_sel_policy p; The field p.one_time_gc is only assigned when the 'one_time' parameter is true (line 789): if (one_time) { p.one_time_gc = one_time; ... } However, this field is unconditionally read in subsequent get_gc_cost() at line 395: if (p->one_time_gc && (valid_thresh_ratio < 100) && ...) When one_time is false, p.one_time_gc contains uninitialized stack memory. Hence p.one_time_gc is an invalid bool value. UBSAN detects this invalid bool value: UBSAN: invalid-load in fs/f2fs/gc.c:395:7 load of value 77 is not a valid value for type '_Bool' CPU: 3 UID: 0 PID: 1297 Comm: f2fs_gc-252:16 Not tainted 6.18.0-rc3 #5 PREEMPT(voluntary) Hardware name: OpenStack Foundation OpenStack Nova, BIOS 1.13.0-1ubuntu1.1 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0x70/0x90 dump_stack+0x14/0x20 __ubsan_handle_load_invalid_value+0xb3/0xf0 ? dl_server_update+0x2e/0x40 ? update_curr+0x147/0x170 f2fs_get_victim.cold+0x66/0x134 [f2fs] ? sched_balance_newidle+0x2ca/0x470 ? finish_task_switch.isra.0+0x8d/0x2a0 f2fs_gc+0x2ba/0x8e0 [f2fs] ? _raw_spin_unlock_irqrestore+0x12/0x40 ? __timer_delete_sync+0x80/0xe0 ? timer_delete_sync+0x14/0x20 ? schedule_timeout+0x82/0x100 gc_thread_func+0x38b/0x860 [f2fs] ? gc_thread_func+0x38b/0x860 [f2fs] ? __pfx_autoremove_wake_function+0x10/0x10 kthread+0x10b/0x220 ? __pfx_gc_thread_func+0x10/0x10 [f2fs] ? _raw_spin_unlock_irq+0x12/0x40 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x11a/0x160 ? __pfx_kthread+0x10/0x10 ret_from_fork_asm+0x1a/0x30 </TASK> This issue is reliably reproducible with the following steps on a 100GB SSD /dev/vdb: mkfs.f2fs -f /dev/vdb mount /dev/vdb /mnt/f2fs_test fio --name=gc --directory=/mnt/f2fs_test --rw=randwrite \ --bs=4k --size=8G --numjobs=12 --fsync=4 --runtime=10 \ --time_based echo 1 > /sys/fs/f2fs/vdb/gc_urgent The uninitialized value causes incorrect GC victim selection, leading to unpredictable garbage collection behavior. Fix by zero-initializing the entire victim_sel_policy structure to ensure all fields have defined values. Fixes: e791d00bd06c ("f2fs: add valid block ratio not to do excessive GC for one time GC") Cc: stable@kernel.org Signed-off-by: Xiaole He <hexiaole1994@126.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-12-04f2fs: ensure node page reads complete before f2fs_put_super() finishesJan Prusakowski
Xfstests generic/335, generic/336 sometimes crash with the following message: F2FS-fs (dm-0): detect filesystem reference count leak during umount, type: 9, count: 1 ------------[ cut here ]------------ kernel BUG at fs/f2fs/super.c:1939! Oops: invalid opcode: 0000 [#1] SMP NOPTI CPU: 1 UID: 0 PID: 609351 Comm: umount Tainted: G W 6.17.0-rc5-xfstests-g9dd1835ecda5 #1 PREEMPT(none) Tainted: [W]=WARN Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014 RIP: 0010:f2fs_put_super+0x3b3/0x3c0 Call Trace: <TASK> generic_shutdown_super+0x7e/0x190 kill_block_super+0x1a/0x40 kill_f2fs_super+0x9d/0x190 deactivate_locked_super+0x30/0xb0 cleanup_mnt+0xba/0x150 task_work_run+0x5c/0xa0 exit_to_user_mode_loop+0xb7/0xc0 do_syscall_64+0x1ae/0x1c0 entry_SYSCALL_64_after_hwframe+0x76/0x7e </TASK> ---[ end trace 0000000000000000 ]--- It appears that sometimes it is possible that f2fs_put_super() is called before all node page reads are completed. Adding a call to f2fs_wait_on_all_pages() for F2FS_RD_NODE fixes the problem. Cc: stable@kernel.org Fixes: 20872584b8c0b ("f2fs: fix to drop all dirty meta/node pages during umount()") Signed-off-by: Jan Prusakowski <jprusakowski@google.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-12-04f2fs: block cache/dio write during f2fs_enable_checkpoint()Chao Yu
If there are too many background IOs during f2fs_enable_checkpoint(), sync_inodes_sb() may be blocked for long time due to it will loop to write dirty datas which are generated by in parallel write() continuously. Let's change as below to resolve this issue: - hold cp_enable_rwsem write lock to block any cache/dio write - decrease DEF_ENABLE_INTERVAL from 16 to 5 In addition, dump more logs during f2fs_enable_checkpoint(). Testcase: 1. fill data into filesystem until 90% usage. 2. mount -o remount,checkpoint=disable:10% /data 3. fio --rw=randwrite --bs=4kb --size=1GB --numjobs=10 \ --iodepth=64 --ioengine=psync --time_based --runtime=600 \ --directory=/data/fio_dir/ & 4. mount -o remount,checkpoint=enable /data Before: F2FS-fs (dm-51): f2fs_enable_checkpoint() finishes, writeback:7232, sync:39793, cp:457 After: F2FS-fs (dm-51): f2fs_enable_checkpoint end, writeback:5032, lock:0, sync_inode:5552, sync_fs:84 Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>