summaryrefslogtreecommitdiff
path: root/drivers/base/devtmpfs.c
AgeCommit message (Collapse)Author
2025-12-05Merge tag 'pull-persistency' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull persistent dentry infrastructure and conversion from Al Viro: "Some filesystems use a kinda-sorta controlled dentry refcount leak to pin dentries of created objects in dcache (and undo it when removing those). A reference is grabbed and not released, but it's not actually _stored_ anywhere. That works, but it's hard to follow and verify; among other things, we have no way to tell _which_ of the increments is intended to be an unpaired one. Worse, on removal we need to decide whether the reference had already been dropped, which can be non-trivial if that removal is on umount and we need to figure out if this dentry is pinned due to e.g. unlink() not done. Usually that is handled by using kill_litter_super() as ->kill_sb(), but there are open-coded special cases of the same (consider e.g. /proc/self). Things get simpler if we introduce a new dentry flag (DCACHE_PERSISTENT) marking those "leaked" dentries. Having it set claims responsibility for +1 in refcount. The end result this series is aiming for: - get these unbalanced dget() and dput() replaced with new primitives that would, in addition to adjusting refcount, set and clear persistency flag. - instead of having kill_litter_super() mess with removing the remaining "leaked" references (e.g. for all tmpfs files that hadn't been removed prior to umount), have the regular shrink_dcache_for_umount() strip DCACHE_PERSISTENT of all dentries, dropping the corresponding reference if it had been set. After that kill_litter_super() becomes an equivalent of kill_anon_super(). Doing that in a single step is not feasible - it would affect too many places in too many filesystems. It has to be split into a series. This work has really started early in 2024; quite a few preliminary pieces have already gone into mainline. This chunk is finally getting to the meat of that stuff - infrastructure and most of the conversions to it. Some pieces are still sitting in the local branches, but the bulk of that stuff is here" * tag 'pull-persistency' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (54 commits) d_make_discardable(): warn if given a non-persistent dentry kill securityfs_recursive_remove() convert securityfs get rid of kill_litter_super() convert rust_binderfs convert nfsctl convert rpc_pipefs convert hypfs hypfs: swich hypfs_create_u64() to returning int hypfs: switch hypfs_create_str() to returning int hypfs: don't pin dentries twice convert gadgetfs gadgetfs: switch to simple_remove_by_name() convert functionfs functionfs: switch to simple_remove_by_name() functionfs: fix the open/removal races functionfs: need to cancel ->reset_work in ->kill_sb() functionfs: don't bother with ffs->ref in ffs_data_{opened,closed}() functionfs: don't abuse ffs_data_closed() on fs shutdown convert selinuxfs ...
2025-11-16convert ramfs and tmpfsAl Viro
Quite a bit is already done by infrastructure changes (simple_link(), simple_unlink()) - all that is left is replacing d_instantiate() + pinning dget() (in ->symlink() and ->mknod()) with d_make_persistent(), and, in case of shmem, using simple_unlink() and simple_link() in ->unlink() and ->link() resp., instead of open-coding those there. Since d_make_persistent() accepts (and hashes) unhashed ones, shmem situation gets simpler - we no longer care whether ->lookup() has hashed the sucker. With that done, we don't need kill_litter_super() for these filesystems anymore - by the umount time all remaining dentries will be marked persistent and kill_litter_super() will boil down to call of kill_anon_super(). The same goes for devtmpfs and rootfs - they are handled by ramfs or by shmem, depending upon config. NB: strictly speaking, both devtmpfs and rootfs ought to use ramfs_kill_sb() if they end up using ramfs; that's a separate story and the only impact of "just use kill_{litter,anon}_super()" is that we fail to free their sb->s_fs_info... on reboot. That's orthogonal to the changes in this series - kill_litter_super() is identical to kill_anon_super() for those at this point. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-11-12vfs: make vfs_mknod break delegations on parent directoryJeff Layton
In order to add directory delegation support, we need to break delegations on the parent whenever there is going to be a change in the directory. Add a new delegated_inode pointer to vfs_mknod() and have the appropriate callers wait when there is an outstanding delegation. All other callers just set the pointer to NULL. Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: NeilBrown <neil@brown.name> Signed-off-by: Jeff Layton <jlayton@kernel.org> Link: https://patch.msgid.link/20251111-dir-deleg-ro-v6-11-52f3feebb2f2@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-12vfs: allow rmdir to wait for delegation break on parentJeff Layton
In order to add directory delegation support, we need to break delegations on the parent whenever there is going to be a change in the directory. Add a delegated_inode struct to vfs_rmdir() and populate that pointer with the parent inode if it's non-NULL. Most existing in-kernel callers pass in a NULL pointer. Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: NeilBrown <neil@brown.name> Signed-off-by: Jeff Layton <jlayton@kernel.org> Link: https://patch.msgid.link/20251111-dir-deleg-ro-v6-7-52f3feebb2f2@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-12vfs: allow mkdir to wait for delegation break on parentJeff Layton
In order to add directory delegation support, we need to break delegations on the parent whenever there is going to be a change in the directory. Add a new delegated_inode parameter to vfs_mkdir. All of the existing callers set that to NULL for now, except for do_mkdirat which will properly block until the lease is gone. Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: NeilBrown <neil@brown.name> Signed-off-by: Jeff Layton <jlayton@kernel.org> Link: https://patch.msgid.link/20251111-dir-deleg-ro-v6-6-52f3feebb2f2@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-09-23VFS: rename kern_path_locked() and related functions.NeilBrown
kern_path_locked() is now only used to prepare for removing an object from the filesystem (and that is the only credible reason for wanting a positive locked dentry). Thus it corresponds to kern_path_create() and so should have a corresponding name. Unfortunately the name "kern_path_create" is somewhat misleading as it doesn't actually create anything. The recently added simple_start_creating() provides a better pattern I believe. The "start" can be matched with "end" to bracket the creating or removing. So this patch changes names: kern_path_locked -> start_removing_path kern_path_create -> start_creating_path user_path_create -> start_creating_user_path user_path_locked_at -> start_removing_user_path_at done_path_create -> end_creating_path and also introduces end_removing_path() which is identical to end_creating_path(). __start_removing_path (which was __kern_path_locked) is enhanced to call mnt_want_write() for consistency with the start_creating_path(). Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: NeilBrown <neil@brown.name> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-04-25devtmpfs: don't use vfs_getattr_nosec to query i_modeChristoph Hellwig
The recent move of the bdev_statx call to the low-level vfs_getattr_nosec helper caused it being used by devtmpfs, which leads to deadlocks in md teardown due to the block device lookup and put interfering with the unusual lifetime rules in md. But as handle_remove only works on inodes created and owned by devtmpfs itself there is no need to use vfs_getattr_nosec vs simply reading the mode from the inode directly. Switch to that to avoid the bdev lookup or any other unintentional side effect. Reported-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> Reported-by: Xiao Ni <xni@redhat.com> Fixes: 777d0961ff95 ("fs: move the bdex_statx call to vfs_getattr_nosec") Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/20250423045941.1667425-1-hch@lst.de Tested-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> Tested-by: Xiao Ni <xni@redhat.com> Tested-by: Ayush Jain <Ayush.jain3@amd.com> Tested-by: Heiko Carstens <hca@linux.ibm.com> Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-03-24Merge tag 'vfs-6.15-rc1.async.dir' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs async dir updates from Christian Brauner: "This contains cleanups that fell out of the work from async directory handling: - Change kern_path_locked() and user_path_locked_at() to never return a negative dentry. This simplifies the usability of these helpers in various places - Drop d_exact_alias() from the remaining place in NFS where it is still used. This also allows us to drop the d_exact_alias() helper completely - Drop an unnecessary call to fh_update() from nfsd_create_locked() - Change i_op->mkdir() to return a struct dentry Change vfs_mkdir() to return a dentry provided by the filesystems which is hashed and positive. This allows us to reduce the number of cases where the resulting dentry is not positive to very few cases. The code in these places becomes simpler and easier to understand. - Repack DENTRY_* and LOOKUP_* flags" * tag 'vfs-6.15-rc1.async.dir' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: doc: fix inline emphasis warning VFS: Change vfs_mkdir() to return the dentry. nfs: change mkdir inode_operation to return alternate dentry if needed. fuse: return correct dentry for ->mkdir ceph: return the correct dentry on mkdir hostfs: store inode in dentry after mkdir if possible. Change inode_operations.mkdir to return struct dentry * nfsd: drop fh_update() from S_IFDIR branch of nfsd_create_locked() nfs/vfs: discard d_exact_alias() VFS: add common error checks to lookup_one_qstr_excl() VFS: change kern_path_locked() and user_path_locked_at() to never return negative dentry VFS: repack LOOKUP_ bit flags. VFS: repack DENTRY_ flags.
2025-03-05VFS: Change vfs_mkdir() to return the dentry.NeilBrown
vfs_mkdir() does not guarantee to leave the child dentry hashed or make it positive on success, and in many such cases the filesystem had to use a different dentry which it can now return. This patch changes vfs_mkdir() to return the dentry provided by the filesystems which is hashed and positive when provided. This reduces the number of cases where the resulting dentry is not positive to a handful which don't deserve extra efforts. The only callers of vfs_mkdir() which are interested in the resulting inode are in-kernel filesystem clients: cachefiles, nfsd, smb/server. The only filesystems that don't reliably provide the inode are: - kernfs, tracefs which these clients are unlikely to be interested in - cifs in some configurations would need to do a lookup to find the created inode, but doesn't. cifs cannot be exported via NFS, is unlikely to be used by cachefiles, and smb/server only has a soft requirement for the inode, so this is unlikely to be a problem in practice. - hostfs, nfs, cifs may need to do a lookup (rarely for NFS) and it is possible for a race to make that lookup fail. Actual failure is unlikely and providing callers handle negative dentries graceful they will fail-safe. So this patch removes the lookup code in nfsd and smb/server and adjusts them to fail safe if a negative dentry is provided: - cache-files already fails safe by restarting the task from the top - it still does with this change, though it no longer calls cachefiles_put_directory() as that will crash if the dentry is negative. - nfsd reports "Server-fault" which it what it used to do if the lookup failed. This will never happen on any file-systems that it can actually export, so this is of no consequence. I removed the fh_update() call as that is not needed and out-of-place. A subsequent nfsd_create_setattr() call will call fh_update() when needed. - smb/server only wants the inode to call ksmbd_smb_inherit_owner() which updates ->i_uid (without calling notify_change() or similar) which can be safely skipping on cifs (I hope). If a different dentry is returned, the first one is put. If necessary the fact that it is new can be determined by comparing pointers. A new dentry will certainly have a new pointer (as the old is put after the new is obtained). Similarly if an error is returned (via ERR_PTR()) the original dentry is put. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: NeilBrown <neilb@suse.de> Link: https://lore.kernel.org/r/20250227013949.536172-7-neilb@suse.de Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-02-19VFS: change kern_path_locked() and user_path_locked_at() to never return ↵NeilBrown
negative dentry No callers of kern_path_locked() or user_path_locked_at() want a negative dentry. So change them to return -ENOENT instead. This simplifies callers. This results in a subtle change to bcachefs in that an ioctl will now return -ENOENT in preference to -EXDEV. I believe this restores the behaviour to what it was prior to Commit bbe6a7c899e7 ("bch2_ioctl_subvolume_destroy(): fix locking") Signed-off-by: NeilBrown <neilb@suse.de> Link: https://lore.kernel.org/r/20250217003020.3170652-2-neilb@suse.de Acked-by: Paul Moore <paul@paul-moore.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-02-06devtmpfs: replace ->mount with ->get_tree in public instanceEric Sandeen
To finalize mount API conversion, remove the ->mount op from the public instance in favor of ->get_tree etc. Copy most ops from the underlying ops vector (whether it's shmem or ramfs) and substitute our own ->get_tree which simply takes an extra reference on the existing internal mount as before. Thanks to Al for the fs_context_for_reconfigure() idea. Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Neil Brown <neilb@suse.de> Signed-off-by: Eric Sandeen <sandeen@redhat.com> Link: https://lore.kernel.org/r/20250205213931.74614-4-sandeen@redhat.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-03-31driver core: clean up the logic to determine which /sys/dev/ directory to useGreg Kroah-Hartman
When a dev_t is set in a struct device, an symlink in /sys/dev/ is created for it either under /sys/dev/block/ or /sys/dev/char/ depending on the device type. The logic to determine this would trigger off of the class of the object, and the kobj_type set in that location. But it turns out that this deep nesting isn't needed at all, as it's either a choice of block or "everything else" which is a char device. So make the logic a lot more simple and obvious, and remove the incorrect comments in the code that tried to document something that was not happening at all (it is impossible to set class->dev_kobj to NULL as the class core prevented that from happening. This removes the only place that class->dev_kobj was being used, so after this, it can be removed entirely. Acked-by: Rafael J. Wysocki <rafael@kernel.org> Link: https://lore.kernel.org/r/20230331093318.82288-4-gregkh@linuxfoundation.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-02-24Merge tag 'driver-core-6.3-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core updates from Greg KH: "Here is the large set of driver core changes for 6.3-rc1. There's a lot of changes this development cycle, most of the work falls into two different categories: - fw_devlink fixes and updates. This has gone through numerous review cycles and lots of review and testing by lots of different devices. Hopefully all should be good now, and Saravana will be keeping a watch for any potential regression on odd embedded systems. - driver core changes to work to make struct bus_type able to be moved into read-only memory (i.e. const) The recent work with Rust has pointed out a number of areas in the driver core where we are passing around and working with structures that really do not have to be dynamic at all, and they should be able to be read-only making things safer overall. This is the contuation of that work (started last release with kobject changes) in moving struct bus_type to be constant. We didn't quite make it for this release, but the remaining patches will be finished up for the release after this one, but the groundwork has been laid for this effort. Other than that we have in here: - debugfs memory leak fixes in some subsystems - error path cleanups and fixes for some never-able-to-be-hit codepaths. - cacheinfo rework and fixes - Other tiny fixes, full details are in the shortlog All of these have been in linux-next for a while with no reported problems" [ Geert Uytterhoeven points out that that last sentence isn't true, and that there's a pending report that has a fix that is queued up - Linus ] * tag 'driver-core-6.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (124 commits) debugfs: drop inline constant formatting for ERR_PTR(-ERROR) OPP: fix error checking in opp_migrate_dentry() debugfs: update comment of debugfs_rename() i3c: fix device.h kernel-doc warnings dma-mapping: no need to pass a bus_type into get_arch_dma_ops() driver core: class: move EXPORT_SYMBOL_GPL() lines to the correct place Revert "driver core: add error handling for devtmpfs_create_node()" Revert "devtmpfs: add debug info to handle()" Revert "devtmpfs: remove return value of devtmpfs_delete_node()" driver core: cpu: don't hand-override the uevent bus_type callback. devtmpfs: remove return value of devtmpfs_delete_node() devtmpfs: add debug info to handle() driver core: add error handling for devtmpfs_create_node() driver core: bus: update my copyright notice driver core: bus: add bus_get_dev_root() function driver core: bus: constify bus_unregister() driver core: bus: constify some internal functions driver core: bus: constify bus_get_kset() driver core: bus: constify bus_register/unregister_notifier() driver core: remove private pointer from struct bus_type ...
2023-02-14Revert "devtmpfs: add debug info to handle()"Greg Kroah-Hartman
This reverts commit 90a9d5ff225267b3376f73c19f21174e3b6d7746 as it is reported to cause boot regressions. Link: https://lore.kernel.org/r/Y+rSXg14z1Myd8Px@dev-arch.thelio-3990X Reported-by: Nathan Chancellor <nathan@kernel.org> Cc: Longlong Xia <xialonglong1@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-02-14Revert "devtmpfs: remove return value of devtmpfs_delete_node()"Greg Kroah-Hartman
This reverts commit 9d3fe6aa6b9517408064c7c3134187e8ec77dbf7 as it is reported to cause boot regressions. Link: https://lore.kernel.org/r/Y+rSXg14z1Myd8Px@dev-arch.thelio-3990X Reported-by: Nathan Chancellor <nathan@kernel.org> Cc: Longlong Xia <xialonglong1@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-02-11devtmpfs: remove return value of devtmpfs_delete_node()Longlong Xia
The only caller of device_del() does not check the return value. And there's nothing we can do when cleaning things up on a remove path. Let's make it a void function. Signed-off-by: Longlong Xia <xialonglong1@huawei.com> Link: https://lore.kernel.org/r/20230210095444.4067307-4-xialonglong1@huawei.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-02-10devtmpfs: add debug info to handle()Longlong Xia
Because handle() is the core function for processing devtmpfs requests, Let's add some debug info in handle() to help users know why failed. Signed-off-by: Longlong Xia <xialonglong1@huawei.com> Link: https://lore.kernel.org/r/20230210095444.4067307-3-xialonglong1@huawei.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-02-02devtmpfs: convert to pr_fmtLonglong Xia
Use the pr_fmt() macro to prefix all the output with "devtmpfs: ". while at it, convert printk(<LEVEL>) to pr_<level>(). Signed-off-by: Longlong Xia <xialonglong1@huawei.com> Link: https://lore.kernel.org/r/20230202033203.1239239-2-xialonglong1@huawei.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-01-18fs: port vfs_*() helpers to struct mnt_idmapChristian Brauner
Convert to struct mnt_idmap. Last cycle we merged the necessary infrastructure in 256c8aed2b42 ("fs: introduce dedicated idmap type for mounts"). This is just the conversion to struct mnt_idmap. Currently we still pass around the plain namespace that was attached to a mount. This is in general pretty convenient but it makes it easy to conflate namespaces that are relevant on the filesystem with namespaces that are relevent on the mount level. Especially for non-vfs developers without detailed knowledge in this area this can be a potential source for bugs. Once the conversion to struct mnt_idmap is done all helpers down to the really low-level helpers will take a struct mnt_idmap argument instead of two namespace arguments. This way it becomes impossible to conflate the two eliminating the possibility of any bugs. All of the vfs and all filesystems only operate on struct mnt_idmap. Acked-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2022-06-27devtmpfs: fix the dangling pointer of global devtmpfsd threadYangxi Xiang
When the devtmpfs fails to mount, a dangling pointer still remains in global. Specifically, the err variable is passed by a pointer to the devtmpfsd. When the devtmpfsd exits, it sets the error and completes the setup_done. In this situation, the thread pointer is not set to null. After the devtmpfsd exited, the devtmpfs can wakes up the destroyed devtmpfsd thread by wake_up_process if a device change event comes. Signed-off-by: Yangxi Xiang <xyangxi5@gmail.com> Link: https://lore.kernel.org/r/20220627120409.11174-1-xyangxi5@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-03-28Merge tag 'driver-core-5.18-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core updates from Greg KH: "Here is the set of driver core changes for 5.18-rc1. Not much here, primarily it was a bunch of cleanups and small updates: - kobj_type cleanups for default_groups - documentation updates - firmware loader minor changes - component common helper added and take advantage of it in many drivers (the largest part of this pull request). All of these have been in linux-next for a while with no reported problems" * tag 'driver-core-5.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (54 commits) Documentation: update stable review cycle documentation drivers/base/dd.c : Remove the initial value of the global variable Documentation: update stable tree link Documentation: add link to stable release candidate tree devres: fix typos in comments Documentation: add note block surrounding security patch note samples/kobject: Use sysfs_emit instead of sprintf base: soc: Make soc_device_match() simpler and easier to read driver core: dd: fix return value of __setup handler driver core: Refactor sysfs and drv/bus remove hooks driver core: Refactor multiple copies of device cleanup scripts: get_abi.pl: Fix typo in help message kernfs: fix typos in comments kernfs: remove unneeded #if 0 guard ALSA: hda/realtek: Make use of the helper component_compare_dev_name video: omapfb: dss: Make use of the helper component_compare_dev power: supply: ab8500: Make use of the helper component_compare_dev ASoC: codecs: wcd938x: Make use of the helper component_compare/release_of iommu/mediatek: Make use of the helper component_compare/release_of drm: of: Make use of the helper component_release_of ...
2022-02-04devtmpfs: drop redundant fs parameters from internal fsAnthony Iliopoulos
The internal_fs_type is mounted via vfs_kernel_mount() and is never registered as a filesystem, thus specifying the parameters is redundant as those params will not be validated by fs_validate_description(). Both {shmem,ramfs}_fs_parameters are anyway validated when those respective filesystems are first registered, so there is no reason to pass them to devtmpfs too, drop them. Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by: Anthony Iliopoulos <ailiop@suse.com> Link: https://lore.kernel.org/r/20220119220248.32225-1-ailiop@suse.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-02-02block: remove genhd.hChristoph Hellwig
There is no good reason to keep genhd.h separate from the main blkdev.h header that includes it. So fold the contents of genhd.h into blkdev.h and remove genhd.h entirely. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20220124093913.742411-4-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-01-17devtmpfs regression fix: reconfigure on each mountNeilBrown
Prior to Linux v5.4 devtmpfs used mount_single() which treats the given mount options as "remount" options, so it updates the configuration of the single super_block on each mount. Since that was changed, the mount options used for devtmpfs are ignored. This is a regression which affect systemd - which mounts devtmpfs with "-o mode=755,size=4m,nr_inodes=1m". This patch restores the "remount" effect by calling reconfigure_single() Fixes: d401727ea0d7 ("devtmpfs: don't mix {ramfs,shmem}_fill_super() with mount_single()") Acked-by: Christian Brauner <christian.brauner@ubuntu.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-12-30devtmpfs: mount with noexec and nosuidKees Cook
devtmpfs is writable. Add the noexec and nosuid as default mount flags to prevent code execution from /dev. The systems who don't use systemd and who rely on CONFIG_DEVTMPFS_MOUNT=y are the ones to be protected by this patch. Other systems are fine with the udev solution. No sane program should be relying on executing from /dev. So this patch reduces the attack surface. It doesn't prevent any specific attack, but it reduces the possibility that someone can use /dev as a place to put executable code. Chrome OS has been carrying this patch for several years. It seems trivial and simple solution to improve the protection of /dev when CONFIG_DEVTMPFS_MOUNT=y. Original patch: https://lore.kernel.org/lkml/20121120215059.GA1859@www.outflux.net/ Cc: ellyjones@chromium.org Cc: Kay Sievers <kay@vrfy.org> Cc: Roland Eggner <edvx1@systemanalysen.net> Co-developed-by: Muhammad Usama Anjum <usama.anjum@collabora.com> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com> Link: https://lore.kernel.org/r/YcMfDOyrg647RCmd@debian-BULLSEYE-live-builder-AMD64 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-23devtmpfs: actually reclaim some init memoryRasmus Villemoes
Currently gcc seems to inline devtmpfs_setup() into devtmpfsd(), so its memory footprint isn't reclaimed as intended. Mark it noinline to make sure it gets put in .init.text. While here, setup_done can also be put in .init.data: After complete() releases the internal spinlock, the completion object is never touched again by that thread, and the waiting thread doesn't proceed until it observes ->done while holding that spinlock. This is now the same pattern as for kthreadd_done in init/main.c: complete() is done in a __ref function, while the corresponding wait_for_completion() is in an __init function. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Link: https://lore.kernel.org/r/20210312103027.2701413-2-linux@rasmusvillemoes.dk Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-23devtmpfs: fix placement of complete() callRasmus Villemoes
Calling complete() from within the __init function is wrong - theoretically, the init process could proceed all the way to freeing the init mem before the devtmpfsd thread gets to execute the return instruction in devtmpfs_setup(). In practice, it seems to be harmless as gcc inlines devtmpfs_setup() into devtmpfsd(). So the calls of the __init functions init_chdir() etc. actually happen from devtmpfs_setup(), but the __ref on that one silences modpost (it's all right, because those calls happen before the complete()). But it does make the __init annotation of the setup function moot, which we'll fix in a subsequent patch. Fixes: bcbacc4909f1 ("devtmpfs: refactor devtmpfsd()") Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Link: https://lore.kernel.org/r/20210312103027.2701413-1-linux@rasmusvillemoes.dk Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-24namei: prepare for idmapped mountsChristian Brauner
The various vfs_*() helpers are called by filesystems or by the vfs itself to perform core operations such as create, link, mkdir, mknod, rename, rmdir, tmpfile and unlink. Enable them to handle idmapped mounts. If the inode is accessed through an idmapped mount map it into the mount's user namespace and pass it down. Afterwards the checks and operations are identical to non-idmapped mounts. If the initial user namespace is passed nothing changes so non-idmapped mounts will see identical behavior as before. Link: https://lore.kernel.org/r/20210121131959.646623-15-christian.brauner@ubuntu.com Cc: Christoph Hellwig <hch@lst.de> Cc: David Howells <dhowells@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: linux-fsdevel@vger.kernel.org Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2021-01-24attr: handle idmapped mountsChristian Brauner
When file attributes are changed most filesystems rely on the setattr_prepare(), setattr_copy(), and notify_change() helpers for initialization and permission checking. Let them handle idmapped mounts. If the inode is accessed through an idmapped mount map it into the mount's user namespace. Afterwards the checks are identical to non-idmapped mounts. If the initial user namespace is passed nothing changes so non-idmapped mounts will see identical behavior as before. Helpers that perform checks on the ia_uid and ia_gid fields in struct iattr assume that ia_uid and ia_gid are intended values and have already been mapped correctly at the userspace-kernelspace boundary as we already do today. If the initial user namespace is passed nothing changes so non-idmapped mounts will see identical behavior as before. Link: https://lore.kernel.org/r/20210121131959.646623-8-christian.brauner@ubuntu.com Cc: Christoph Hellwig <hch@lst.de> Cc: David Howells <dhowells@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: linux-fsdevel@vger.kernel.org Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2020-07-31init: add an init_chroot helperChristoph Hellwig
Add a simple helper to chroot with a kernel space file name and switch the early init code over to it. Remove the now unused ksys_chroot. Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-07-31init: add an init_chdir helperChristoph Hellwig
Add a simple helper to chdir with a kernel space file name and switch the early init code over to it. Remove the now unused ksys_chdir. Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-07-31init: add an init_mount helperChristoph Hellwig
Like do_mount, but takes a kernel pointer for the destination path. Switch over the mounts in the init code and devtmpfs to it, which just happen to work due to the implicit set_fs(KERNEL_DS) during early init right now. Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-07-31devtmpfs: refactor devtmpfsd()Christoph Hellwig
Split the main worker loop into a separate function. This allows devtmpfsd_setup to be marked __init, which will allows us to call __init routines for the setup work. devtmpfѕ itself needs a __ref marker for that to work, and a comment explaining why it works. Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-02-08Merge branch 'merge.nfs-fs_parse.1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs file system parameter updates from Al Viro: "Saner fs_parser.c guts and data structures. The system-wide registry of syntax types (string/enum/int32/oct32/.../etc.) is gone and so is the horror switch() in fs_parse() that would have to grow another case every time something got added to that system-wide registry. New syntax types can be added by filesystems easily now, and their namespace is that of functions - not of system-wide enum members. IOW, they can be shared or kept private and if some turn out to be widely useful, we can make them common library helpers, etc., without having to do anything whatsoever to fs_parse() itself. And we already get that kind of requests - the thing that finally pushed me into doing that was "oh, and let's add one for timeouts - things like 15s or 2h". If some filesystem really wants that, let them do it. Without somebody having to play gatekeeper for the variants blessed by direct support in fs_parse(), TYVM. Quite a bit of boilerplate is gone. And IMO the data structures make a lot more sense now. -200LoC, while we are at it" * 'merge.nfs-fs_parse.1' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (25 commits) tmpfs: switch to use of invalfc() cgroup1: switch to use of errorfc() et.al. procfs: switch to use of invalfc() hugetlbfs: switch to use of invalfc() cramfs: switch to use of errofc() et.al. gfs2: switch to use of errorfc() et.al. fuse: switch to use errorfc() et.al. ceph: use errorfc() and friends instead of spelling the prefix out prefix-handling analogues of errorf() and friends turn fs_param_is_... into functions fs_parse: handle optional arguments sanely fs_parse: fold fs_parameter_desc/fs_parameter_spec fs_parser: remove fs_parameter_description name field add prefix to fs_context->log ceph_parse_param(), ceph_parse_mon_ips(): switch to passing fc_log new primitive: __fs_parse() switch rbd and libceph to p_log-based primitives struct p_log, variants of warnf() et.al. taking that one instead teach logfc() to handle prefices, give it saner calling conventions get rid of cg_invalf() ...
2020-02-07fs_parse: fold fs_parameter_desc/fs_parameter_specAl Viro
The former contains nothing but a pointer to an array of the latter... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-01-22devtmpfs: factor out common tail of devtmpfs_{create,delete}_nodeRasmus Villemoes
There's some common boilerplate in devtmpfs_{create,delete}_node, put that in a little helper. Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Link: https://lore.kernel.org/r/20200115184154.3492-6-linux@rasmusvillemoes.dk Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-01-22devtmpfs: initify a bitRasmus Villemoes
devtmpfs_mount() is only called from prepare_namespace() in init/do_mounts.c, which is an __init function, so devtmpfs_mount() can also be moved to .init.text. Then the mount_dev static variable is only referenced from __init functions (devtmpfs_mount and its initializer function mount_param). Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Link: https://lore.kernel.org/r/20200115184154.3492-5-linux@rasmusvillemoes.dk Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-01-22devtmpfs: simplify initialization of mount_devRasmus Villemoes
Avoid a bit of ifdeffery by using the IS_ENABLED() helper. Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Link: https://lore.kernel.org/r/20200115184154.3492-4-linux@rasmusvillemoes.dk Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-01-22devtmpfs: factor out setup part of devtmpfsd()Rasmus Villemoes
Factor out the setup part of devtmpfsd() to make it a bit easier to see that we always call setup_done() exactly once (provided of course the kthread is succesfully created). Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Link: https://lore.kernel.org/r/20200115184154.3492-3-linux@rasmusvillemoes.dk Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-01-22devtmpfs: fix theoretical stale pointer deref in devtmpfsd()Rasmus Villemoes
After complete(&setup_done), devtmpfs_init proceeds and may actually return, invalidating the *err pointer, before devtmpfsd() proceeds to reading back *err. This is of course completely theoretical since the error conditions never trigger in practice, and even if they did, nobody cares about the exit value from a kernel thread, so it doesn't matter if we happen to read back some garbage from some other stack frame. Still, this isn't a pattern that should be copy-pasted, so fix it. Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Link: https://lore.kernel.org/r/20200115184154.3492-2-linux@rasmusvillemoes.dk Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-12-12devtmpfs: use do_mount() instead of ksys_mount()Dominik Brodowski
In devtmpfs, do_mount() can be called directly instead of complex wrapping by ksys_mount(): - the first and third arguments are const strings in the kernel, and do not need to be copied over from userspace; - the fifth argument is NULL, and therefore no page needs to be copied over from userspace; - the second and fourth argument are passed through anyway. Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2019-09-12vfs: Convert ramfs, shmem, tmpfs, devtmpfs, rootfs to use the new mount APIDavid Howells
Convert the ramfs, shmem, tmpfs, devtmpfs and rootfs filesystems to the new internal mount API as the old one will be obsoleted and removed. This allows greater flexibility in communication of mount parameters between userspace, the VFS and the filesystem. See Documentation/filesystems/mount_api.txt for more information. Note that tmpfs is slightly tricky as it can contain embedded commas, so it can't be trivially split up using strsep() to break on commas in generic_parse_monolithic(). Instead, tmpfs has to supply its own generic parser. However, if tmpfs changes, then devtmpfs and rootfs, which are wrappers around tmpfs or ramfs, must change too - and thus so must ramfs, so these had to be converted also. [AV: rewritten] Signed-off-by: David Howells <dhowells@redhat.com> cc: Hugh Dickins <hughd@google.com> cc: linux-mm@kvack.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-09-05make shmem_fill_super() staticAl Viro
... have callers use shmem_mount() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-09-05devtmpfs: don't mix {ramfs,shmem}_fill_super() with mount_single()Al Viro
Create an internal-only type matching the current devtmpfs, never register it and have one kernel-internal mount done. That thing gets mounted only once, so it is free to use mount_nodev(). The "public" devtmpfs (the one we do register, and only after the internal mount of the real thing is done) simply gets and returns an extra reference to the internal superblock. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-07-04constify ksys_mount() string argumentsAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-12-20vfs: Suppress MS_* flag defs within the kernel unless explicitly enabledDavid Howells
Only the mount namespace code that implements mount(2) should be using the MS_* flags. Suppress them inside the kernel unless uapi/linux/mount.h is included. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Reviewed-by: David Howells <dhowells@redhat.com>
2018-09-16drivers/base/devtmpfs.c: don't pretend path is const in delete_pathRasmus Villemoes
path is the result of kstrdup, and we repeatedly call strrchr on it, modifying it through the returned pointer. So there's no reason to pretend path is const. Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-04-02kernel: add ksys_unshare() helper; remove in-kernel calls to sys_unshare()Dominik Brodowski
Using this helper allows us to avoid the in-kernel calls to the sys_unshare() syscall. The ksys_ prefix denotes that this function is meant as a drop-in replacement for the syscall. In particular, it uses the same calling convention as sys_unshare(). This patch is part of a series which removes in-kernel calls to syscalls. On this basis, the syscall entry path can be streamlined. For details, see http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Ingo Molnar <mingo@kernel.org> Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2018-04-02fs: add ksys_chdir() helper; remove in-kernel calls to sys_chdir()Dominik Brodowski
Using this helper allows us to avoid the in-kernel calls to the sys_chdir() syscall. The ksys_ prefix denotes that this function is meant as a drop-in replacement for the syscall. In particular, it uses the same calling convention as sys_chdir(). This patch is part of a series which removes in-kernel calls to syscalls. On this basis, the syscall entry path can be streamlined. For details, see http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2018-04-02fs: add ksys_chroot() helper; remove-in kernel calls to sys_chroot()Dominik Brodowski
Using this helper allows us to avoid the in-kernel calls to the sys_chroot() syscall. The ksys_ prefix denotes that this function is meant as a drop-in replacement for the syscall. In particular, it uses the same calling convention as sys_chroot(). In the near future, the fs-external callers of ksys_chroot() should be converted to use kern_path()/set_fs_root() directly. Then ksys_chroot() can be moved within sys_chroot() again. This patch is part of a series which removes in-kernel calls to syscalls. On this basis, the syscall entry path can be streamlined. For details, see http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net Cc: Alexander Viro <viro@zeniv.linux.org.uk> Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>