commit ea86c1430c83aa91f2c4122408922e34a1279775 Author: Greg Kroah-Hartman Date: Sat Jul 2 16:39:25 2022 +0200 Linux 5.10.128 Link: https://lore.kernel.org/r/20220630133230.676254336@linuxfoundation.org Tested-by: Jon Hunter Tested-by: Salvatore Bonaccorso Tested-by: Florian Fainelli Tested-by: Shuah Khan Tested-by: Guenter Roeck Tested-by: Hulk Robot Tested-by: Linux Kernel Functional Testing Tested-by: Sudip Mukherjee Tested-by: Pavel Machek (CIP) Signed-off-by: Greg Kroah-Hartman commit 2d10984d99ac2b652b4f31efd2e059957f1fa51f Author: Vladimir Oltean Date: Tue Jun 28 20:20:14 2022 +0300 net: mscc: ocelot: allow unregistered IP multicast flooding Flooding of unregistered IP multicast has been broken (both to other switch ports and to the CPU) since the ocelot driver introduction, and up until commit 4cf35a2b627a ("net: mscc: ocelot: fix broken IP multicast flooding"), a bug fix for commit 421741ea5672 ("net: mscc: ocelot: offload bridge port flags to device") from v5.12. The driver used to set PGID_MCIPV4 and PGID_MCIPV6 to the empty port mask (0), which made unregistered IPv4/IPv6 multicast go nowhere, and without ever modifying that port mask at runtime. The expectation is that such packets are treated as broadcast, and flooded according to the forwarding domain (to the CPU if the port is standalone, or to the CPU and other bridged ports, if under a bridge). Since the aforementioned commit, the limitation has been lifted by responding to SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS events emitted by the bridge. As for host flooding, DSA synthesizes another call to ocelot_port_bridge_flags() on the NPI port which ensures that the CPU gets the unregistered multicast traffic it might need, for example for smcroute to work between standalone ports. But between v4.18 and v5.12, IP multicast flooding has remained unfixed. Delete the inexplicable premature optimization of clearing PGID_MCIPV4 and PGID_MCIPV6 as part of the init sequence, and allow unregistered IP multicast to be flooded freely according to the forwarding domain established by PGID_SRC, by explicitly programming PGID_MCIPV4 and PGID_MCIPV6 towards all physical ports plus the CPU port module. Fixes: a556c76adc05 ("net: mscc: Add initial Ocelot switch support") Cc: stable@kernel.org Signed-off-by: Vladimir Oltean Signed-off-by: Greg Kroah-Hartman commit 6a656280e775821f75c0b9ca599b10174210fbdb Author: Naveen N. Rao Date: Mon May 16 12:44:22 2022 +0530 powerpc/ftrace: Remove ftrace init tramp once kernel init is complete commit 84ade0a6655bee803d176525ef457175cbf4df22 upstream. Stop using the ftrace trampoline for init section once kernel init is complete. Fixes: 67361cf8071286 ("powerpc/ftrace: Handle large kernel configs") Cc: stable@vger.kernel.org # v4.20+ Signed-off-by: Naveen N. Rao Signed-off-by: Michael Ellerman Link: https://lore.kernel.org/r/20220516071422.463738-1-naveen.n.rao@linux.vnet.ibm.com Signed-off-by: Greg Kroah-Hartman commit 6b734f7b7071859f582b5acb95abb97e1276a030 Author: Dave Chinner Date: Mon Jun 27 09:51:40 2022 +0300 xfs: check sb_meta_uuid for dabuf buffer recovery commit 09654ed8a18cfd45027a67d6cbca45c9ea54feab upstream. Got a report that a repeated crash test of a container host would eventually fail with a log recovery error preventing the system from mounting the root filesystem. It manifested as a directory leaf node corruption on writeback like so: XFS (loop0): Mounting V5 Filesystem XFS (loop0): Starting recovery (logdev: internal) XFS (loop0): Metadata corruption detected at xfs_dir3_leaf_check_int+0x99/0xf0, xfs_dir3_leaf1 block 0x12faa158 XFS (loop0): Unmount and run xfs_repair XFS (loop0): First 128 bytes of corrupted metadata buffer: 00000000: 00 00 00 00 00 00 00 00 3d f1 00 00 e1 9e d5 8b ........=....... 00000010: 00 00 00 00 12 fa a1 58 00 00 00 29 00 00 1b cc .......X...).... 00000020: 91 06 78 ff f7 7e 4a 7d 8d 53 86 f2 ac 47 a8 23 ..x..~J}.S...G.# 00000030: 00 00 00 00 17 e0 00 80 00 43 00 00 00 00 00 00 .........C...... 00000040: 00 00 00 2e 00 00 00 08 00 00 17 2e 00 00 00 0a ................ 00000050: 02 35 79 83 00 00 00 30 04 d3 b4 80 00 00 01 50 .5y....0.......P 00000060: 08 40 95 7f 00 00 02 98 08 41 fe b7 00 00 02 d4 .@.......A...... 00000070: 0d 62 ef a7 00 00 01 f2 14 50 21 41 00 00 00 0c .b.......P!A.... XFS (loop0): Corruption of in-memory data (0x8) detected at xfs_do_force_shutdown+0x1a/0x20 (fs/xfs/xfs_buf.c:1514). Shutting down. XFS (loop0): Please unmount the filesystem and rectify the problem(s) XFS (loop0): log mount/recovery failed: error -117 XFS (loop0): log mount failed Tracing indicated that we were recovering changes from a transaction at LSN 0x29/0x1c16 into a buffer that had an LSN of 0x29/0x1d57. That is, log recovery was overwriting a buffer with newer changes on disk than was in the transaction. Tracing indicated that we were hitting the "recovery immediately" case in xfs_buf_log_recovery_lsn(), and hence it was ignoring the LSN in the buffer. The code was extracting the LSN correctly, then ignoring it because the UUID in the buffer did not match the superblock UUID. The problem arises because the UUID check uses the wrong UUID - it should be checking the sb_meta_uuid, not sb_uuid. This filesystem has sb_uuid != sb_meta_uuid (which is fine), and the buffer has the correct matching sb_meta_uuid in it, it's just the code checked it against the wrong superblock uuid. The is no corruption in the filesystem, and failing to recover the buffer due to a write verifier failure means the recovery bug did not propagate the corruption to disk. Hence there is no corruption before or after this bug has manifested, the impact is limited simply to an unmountable filesystem.... This was missed back in 2015 during an audit of incorrect sb_uuid usage that resulted in commit fcfbe2c4ef42 ("xfs: log recovery needs to validate against sb_meta_uuid") that fixed the magic32 buffers to validate against sb_meta_uuid instead of sb_uuid. It missed the magicda buffers.... Fixes: ce748eaa65f2 ("xfs: create new metadata UUID field and incompat flag") Signed-off-by: Dave Chinner Reviewed-by: Darrick J. Wong Signed-off-by: Darrick J. Wong Signed-off-by: Amir Goldstein Acked-by: Darrick J. Wong Signed-off-by: Greg Kroah-Hartman commit 071e750ffb3dc625cc92826950c26554f161a32c Author: Darrick J. Wong Date: Mon Jun 27 09:51:39 2022 +0300 xfs: remove all COW fork extents when remounting readonly commit 089558bc7ba785c03815a49c89e28ad9b8de51f9 upstream. [backport xfs_icwalk -> xfs_eofblocks for 5.10.y] As part of multiple customer escalations due to file data corruption after copy on write operations, I wrote some fstests that use fsstress to hammer on COW to shake things loose. Regrettably, I caught some filesystem shutdowns due to incorrect rmap operations with the following loop: mount # (0) fsstress & # (1) while true; do fsstress mount -o remount,ro # (2) fsstress mount -o remount,rw # (3) done When (2) happens, notice that (1) is still running. xfs_remount_ro will call xfs_blockgc_stop to walk the inode cache to free all the COW extents, but the blockgc mechanism races with (1)'s reader threads to take IOLOCKs and loses, which means that it doesn't clean them all out. Call such a file (A). When (3) happens, xfs_remount_rw calls xfs_reflink_recover_cow, which walks the ondisk refcount btree and frees any COW extent that it finds. This function does not check the inode cache, which means that incore COW forks of inode (A) is now inconsistent with the ondisk metadata. If one of those former COW extents are allocated and mapped into another file (B) and someone triggers a COW to the stale reservation in (A), A's dirty data will be written into (B) and once that's done, those blocks will be transferred to (A)'s data fork without bumping the refcount. The results are catastrophic -- file (B) and the refcount btree are now corrupt. Solve this race by forcing the xfs_blockgc_free_space to run synchronously, which causes xfs_icwalk to return to inodes that were skipped because the blockgc code couldn't take the IOLOCK. This is safe to do here because the VFS has already prohibited new writer threads. Fixes: 10ddf64e420f ("xfs: remove leftover CoW reservations when remounting ro") Signed-off-by: Darrick J. Wong Reviewed-by: Dave Chinner Reviewed-by: Chandan Babu R Signed-off-by: Amir Goldstein Acked-by: Darrick J. Wong Signed-off-by: Greg Kroah-Hartman commit 1e76bd4c67224a645558314c0097d5b5a338bba9 Author: Yang Xu Date: Mon Jun 27 09:51:38 2022 +0300 xfs: Fix the free logic of state in xfs_attr_node_hasname commit a1de97fe296c52eafc6590a3506f4bbd44ecb19a upstream. When testing xfstests xfs/126 on lastest upstream kernel, it will hang on some machine. Adding a getxattr operation after xattr corrupted, I can reproduce it 100%. The deadlock as below: [983.923403] task:setfattr state:D stack: 0 pid:17639 ppid: 14687 flags:0x00000080 [ 983.923405] Call Trace: [ 983.923410] __schedule+0x2c4/0x700 [ 983.923412] schedule+0x37/0xa0 [ 983.923414] schedule_timeout+0x274/0x300 [ 983.923416] __down+0x9b/0xf0 [ 983.923451] ? xfs_buf_find.isra.29+0x3c8/0x5f0 [xfs] [ 983.923453] down+0x3b/0x50 [ 983.923471] xfs_buf_lock+0x33/0xf0 [xfs] [ 983.923490] xfs_buf_find.isra.29+0x3c8/0x5f0 [xfs] [ 983.923508] xfs_buf_get_map+0x4c/0x320 [xfs] [ 983.923525] xfs_buf_read_map+0x53/0x310 [xfs] [ 983.923541] ? xfs_da_read_buf+0xcf/0x120 [xfs] [ 983.923560] xfs_trans_read_buf_map+0x1cf/0x360 [xfs] [ 983.923575] ? xfs_da_read_buf+0xcf/0x120 [xfs] [ 983.923590] xfs_da_read_buf+0xcf/0x120 [xfs] [ 983.923606] xfs_da3_node_read+0x1f/0x40 [xfs] [ 983.923621] xfs_da3_node_lookup_int+0x69/0x4a0 [xfs] [ 983.923624] ? kmem_cache_alloc+0x12e/0x270 [ 983.923637] xfs_attr_node_hasname+0x6e/0xa0 [xfs] [ 983.923651] xfs_has_attr+0x6e/0xd0 [xfs] [ 983.923664] xfs_attr_set+0x273/0x320 [xfs] [ 983.923683] xfs_xattr_set+0x87/0xd0 [xfs] [ 983.923686] __vfs_removexattr+0x4d/0x60 [ 983.923688] __vfs_removexattr_locked+0xac/0x130 [ 983.923689] vfs_removexattr+0x4e/0xf0 [ 983.923690] removexattr+0x4d/0x80 [ 983.923693] ? __check_object_size+0xa8/0x16b [ 983.923695] ? strncpy_from_user+0x47/0x1a0 [ 983.923696] ? getname_flags+0x6a/0x1e0 [ 983.923697] ? _cond_resched+0x15/0x30 [ 983.923699] ? __sb_start_write+0x1e/0x70 [ 983.923700] ? mnt_want_write+0x28/0x50 [ 983.923701] path_removexattr+0x9b/0xb0 [ 983.923702] __x64_sys_removexattr+0x17/0x20 [ 983.923704] do_syscall_64+0x5b/0x1a0 [ 983.923705] entry_SYSCALL_64_after_hwframe+0x65/0xca [ 983.923707] RIP: 0033:0x7f080f10ee1b When getxattr calls xfs_attr_node_get function, xfs_da3_node_lookup_int fails with EFSCORRUPTED in xfs_attr_node_hasname because we have use blocktrash to random it in xfs/126. So it free state in internal and xfs_attr_node_get doesn't do xfs_buf_trans release job. Then subsequent removexattr will hang because of it. This bug was introduced by kernel commit 07120f1abdff ("xfs: Add xfs_has_attr and subroutines"). It adds xfs_attr_node_hasname helper and said caller will be responsible for freeing the state in this case. But xfs_attr_node_hasname will free state itself instead of caller if xfs_da3_node_lookup_int fails. Fix this bug by moving the step of free state into caller. [amir: this text from original commit is not relevant for 5.10 backport: Also, use "goto error/out" instead of returning error directly in xfs_attr_node_addname_find_attr and xfs_attr_node_removename_setup function because we should free state ourselves. ] Fixes: 07120f1abdff ("xfs: Add xfs_has_attr and subroutines") Signed-off-by: Yang Xu Reviewed-by: Darrick J. Wong Signed-off-by: Darrick J. Wong Signed-off-by: Amir Goldstein Acked-by: Darrick J. Wong Signed-off-by: Greg Kroah-Hartman commit 0cdccc05da76a87a4e04d03eb812bacc33864ad9 Author: Brian Foster Date: Mon Jun 27 09:51:37 2022 +0300 xfs: punch out data fork delalloc blocks on COW writeback failure commit 5ca5916b6bc93577c360c06cb7cdf71adb9b5faf upstream. If writeback I/O to a COW extent fails, the COW fork blocks are punched out and the data fork blocks left alone. It is possible for COW fork blocks to overlap non-shared data fork blocks (due to cowextsz hint prealloc), however, and writeback unconditionally maps to the COW fork whenever blocks exist at the corresponding offset of the page undergoing writeback. This means it's quite possible for a COW fork extent to overlap delalloc data fork blocks, writeback to convert and map to the COW fork blocks, writeback to fail, and finally for ioend completion to cancel the COW fork blocks and leave stale data fork delalloc blocks around in the inode. The blocks are effectively stale because writeback failure also discards dirty page state. If this occurs, it is likely to trigger assert failures, free space accounting corruption and failures in unrelated file operations. For example, a subsequent reflink attempt of the affected file to a new target file will trip over the stale delalloc in the source file and fail. Several of these issues are occasionally reproduced by generic/648, but are reproducible on demand with the right sequence of operations and timely I/O error injection. To fix this problem, update the ioend failure path to also punch out underlying data fork delalloc blocks on I/O error. This is analogous to the writeback submission failure path in xfs_discard_page() where we might fail to map data fork delalloc blocks and consistent with the successful COW writeback completion path, which is responsible for unmapping from the data fork and remapping in COW fork blocks. Fixes: 787eb485509f ("xfs: fix and streamline error handling in xfs_end_io") Signed-off-by: Brian Foster Reviewed-by: Darrick J. Wong Signed-off-by: Darrick J. Wong Signed-off-by: Amir Goldstein Acked-by: Darrick J. Wong Signed-off-by: Greg Kroah-Hartman commit db3f8110c3b0e1185f6288331cb428978708fb79 Author: Rustam Kovhaev Date: Mon Jun 27 09:51:36 2022 +0300 xfs: use kmem_cache_free() for kmem_cache objects commit c30a0cbd07ecc0eec7b3cd568f7b1c7bb7913f93 upstream. For kmalloc() allocations SLOB prepends the blocks with a 4-byte header, and it puts the size of the allocated blocks in that header. Blocks allocated with kmem_cache_alloc() allocations do not have that header. SLOB explodes when you allocate memory with kmem_cache_alloc() and then try to free it with kfree() instead of kmem_cache_free(). SLOB will assume that there is a header when there is none, read some garbage to size variable and corrupt the adjacent objects, which eventually leads to hang or panic. Let's make XFS work with SLOB by using proper free function. Fixes: 9749fee83f38 ("xfs: enable the xfs_defer mechanism to process extents to free") Signed-off-by: Rustam Kovhaev Reviewed-by: Darrick J. Wong Signed-off-by: Darrick J. Wong Signed-off-by: Amir Goldstein Acked-by: Darrick J. Wong Signed-off-by: Greg Kroah-Hartman commit 09c9902cd80a07c2e69024f96f049985047e64b8 Author: Coly Li Date: Fri May 27 23:28:16 2022 +0800 bcache: memset on stack variables in bch_btree_check() and bch_sectors_dirty_init() commit 7d6b902ea0e02b2a25c480edf471cbaa4ebe6b3c upstream. The local variables check_state (in bch_btree_check()) and state (in bch_sectors_dirty_init()) should be fully filled by 0, because before allocating them on stack, they were dynamically allocated by kzalloc(). Signed-off-by: Coly Li Link: https://lore.kernel.org/r/20220527152818.27545-2-colyli@suse.de Signed-off-by: Jens Axboe Signed-off-by: Greg Kroah-Hartman commit c4ff3ffe0138234774602152fe67e3a898c615c6 Author: Masahiro Yamada Date: Mon Jun 27 12:22:09 2022 +0900 tick/nohz: unexport __init-annotated tick_nohz_full_setup() commit 2390095113e98fc52fffe35c5206d30d9efe3f78 upstream. EXPORT_SYMBOL and __init is a bad combination because the .init.text section is freed up after the initialization. Hence, modules cannot use symbols annotated __init. The access to a freed symbol may end up with kernel panic. modpost used to detect it, but it had been broken for a decade. Commit 28438794aba4 ("modpost: fix section mismatch check for exported init/exit sections") fixed it so modpost started to warn it again, then this showed up: MODPOST vmlinux.symvers WARNING: modpost: vmlinux.o(___ksymtab_gpl+tick_nohz_full_setup+0x0): Section mismatch in reference from the variable __ksymtab_tick_nohz_full_setup to the function .init.text:tick_nohz_full_setup() The symbol tick_nohz_full_setup is exported and annotated __init Fix this by removing the __init annotation of tick_nohz_full_setup or drop the export. Drop the export because tick_nohz_full_setup() is only called from the built-in code in kernel/sched/isolation.c. Fixes: ae9e557b5be2 ("time: Export tick start/stop functions for rcutorture") Reported-by: Linus Torvalds Signed-off-by: Masahiro Yamada Tested-by: Paul E. McKenney Signed-off-by: Linus Torvalds Cc: Thomas Backlund Signed-off-by: Greg Kroah-Hartman commit 069fff50d4008970642a5380c3022e76dd8e7336 Author: Christoph Hellwig Date: Tue Feb 2 13:13:23 2021 +0100 drm: remove drm_fb_helper_modinit commit bf22c9ec39da90ce866d5f625d616f28bc733dc1 upstream. drm_fb_helper_modinit has a lot of boilerplate for what is not very simple functionality. Just open code it in the only caller using IS_ENABLED and IS_MODULE, and skip the find_module check as a request_module is harmless if the module is already loaded (and not other caller has this find_module check either). Acked-by: Daniel Vetter Signed-off-by: Christoph Hellwig Signed-off-by: Jessica Yu Signed-off-by: Greg Kroah-Hartman commit 52dc7f3f6fa151f5d64b3e6fe95369c21600e72c Author: Amir Goldstein Date: Thu Jun 30 08:43:21 2022 +0300 MAINTAINERS: add Amir as xfs maintainer for 5.10.y This is an attempt to direct the bots and human that are testing LTS 5.10.y towards the maintainer of xfs in the 5.10.y tree. This is not an upstream MAINTAINERS entry and 5.15.y and 5.4.y will have their own LTS xfs maintainer entries. Update Darrick's email address from upstream and add Amir as xfs maintaier for the 5.10.y tree. Suggested-by: Darrick J. Wong Link: https://lore.kernel.org/linux-xfs/Yrx6%2F0UmYyuBPjEr@magnolia/ Signed-off-by: Amir Goldstein Reviewed-by: Darrick J. Wong Signed-off-by: Greg Kroah-Hartman