commit 8263087bf62739362d50ec965c8c34fe3ee7a7cd
Author: Greg Kroah-Hartman
Date:   Thu Oct 18 09:16:28 2018 +0200

    Linux 4.14.77

commit d0c9f9f9fb446e31b36d76e9a47001188f961a44
Author: Jiri Olsa
Date:   Mon Mar 19 09:29:01 2018 +0100

    perf tools: Fix snprint warnings for gcc 8

    commit 77f18153c080855e1c3fb520ca31a4e61530121d upstream.

    With gcc 8 we get a new set of snprintf() warnings that break the
    compilation, one example:

      tests/mem.c: In function ‘check’:
      tests/mem.c:19:48: error: ‘%s’ directive output may be truncated writing \
            up to 99 bytes into a region of size 89 [-Werror=format-truncation=]
        snprintf(failure, sizeof failure, "unexpected %s", out);

    The gcc docs say:

     To avoid the warning either use a bigger buffer or handle the
     function's return value which indicates whether or not its output
     has been truncated.

    Given that all these warnings are harmless, because the code either
    properly fails due to an incomplete file path or we don't care about
    truncated output at all, I'm changing all those snprintf() calls to
    scnprintf(), which actually 'checks' for the snprintf() return value
    so gcc stays silent.

    Signed-off-by: Jiri Olsa
    Cc: Alexander Shishkin
    Cc: David Ahern
    Cc: Josh Poimboeuf
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Cc: Sergey Senozhatsky
    Link: http://lkml.kernel.org/r/20180319082902.4518-1-jolsa@kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo
    Cc: Ignat Korchagin
    Signed-off-by: Greg Kroah-Hartman

commit 57bff812c4e2ca91a876f759198f2cd9e15967ad
Author: Russell King
Date:   Mon Oct 15 11:32:18 2018 -0400

    ARM: spectre-v1: mitigate user accesses

    Commit a3c0f84765bb429ba0fd23de1c57b5e1591c9389 upstream.

    Spectre variant 1 attacks are about this sequence of pseudo-code:

            index = load(user-manipulated pointer);
            access(base + index * stride);

    In order for the cache side-channel to work, the access() must be made
    to memory which userspace can detect whether cache lines have been
    loaded.
    On 32-bit ARM, this must be either user accessible memory, or a
    kernel mapping of that same user accessible memory.

    The problem occurs when the load() speculatively loads privileged
    data, and the subsequent access() is made to user accessible memory.

    Any load() which makes use of a user-manipulated pointer is a
    potential problem if the data it has loaded is used in a subsequent
    access. This also applies for the access() if the data loaded by that
    access is used by a subsequent access.

    Harden the get_user() accessors against Spectre attacks by forcing
    out of bounds addresses to a NULL pointer. This prevents get_user()
    being used as the load() step above. As a side effect, put_user() will
    also be affected even though it isn't implicated.

    Also harden copy_from_user() by redoing the bounds check within the
    arm_copy_from_user() code, and NULLing the pointer if out of bounds.

    Acked-by: Mark Rutland
    Signed-off-by: Russell King
    Signed-off-by: David A. Long
    Signed-off-by: Greg Kroah-Hartman

commit 4a1948d692f13cafaf2ca5c228f789a7ee74f6c7
Author: Russell King
Date:   Mon Oct 15 11:32:17 2018 -0400

    ARM: spectre-v1: use get_user() for __get_user()

    Commit b1cd0a14806321721aae45f5446ed83a3647c914 upstream.

    Fixing __get_user() for spectre variant 1 is not sane: we would have
    to add address space bounds checking in order to validate that the
    location should be accessed, and then zero the address if found to be
    invalid.

    Since __get_user() is supposed to avoid the bounds check, and this is
    exactly what get_user() does, there's no point having two different
    implementations that are doing the same thing. So, when the Spectre
    workarounds are required, make __get_user() an alias of get_user().

    Acked-by: Mark Rutland
    Signed-off-by: Russell King
    Signed-off-by: David A. Long
    Signed-off-by: Greg Kroah-Hartman

commit f64824a3d475b573cbab5c35942223e0474096be
Author: Russell King
Date:   Mon Oct 15 11:32:16 2018 -0400

    ARM: use __inttype() in get_user()

    Commit d09fbb327d670737ab40fd8bbb0765ae06b8b739 upstream.

    Borrow the x86 implementation of __inttype() to use in get_user() to
    select an integer type suitable to temporarily hold the result value.
    This is necessary to avoid propagating the volatile nature of the
    result argument, which can cause the following warning:

    lib/iov_iter.c:413:5: warning: optimization may eliminate reads and/or writes to register variables [-Wvolatile-register-var]

    Acked-by: Mark Rutland
    Signed-off-by: Russell King
    Signed-off-by: David A. Long
    Signed-off-by: Greg Kroah-Hartman

commit 70b96be10d151cf088c991a018889ee76b1c6c0e
Author: Russell King
Date:   Mon Oct 15 11:32:15 2018 -0400

    ARM: oabi-compat: copy semops using __copy_from_user()

    Commit 8c8484a1c18e3231648f5ba7cc5ffb7fd70b3ca4 upstream.

    __get_user_error() is used as a fast accessor to make copying
    structure members as efficient as possible. However, with software
    PAN and the recent Spectre variant 1, the efficiency is reduced as
    these are no longer fast accessors.

    In the case of software PAN, it has to switch the domain register
    around each access, and with Spectre variant 1, it would have to
    repeat the access_ok() check for each access.

    Rather than using __get_user_error() to copy each semops element
    member, copy each semops element in full using __copy_from_user().

    Acked-by: Mark Rutland
    Signed-off-by: Russell King
    Signed-off-by: David A. Long
    Signed-off-by: Greg Kroah-Hartman

commit 38752f41748728cbd176a50d10f02f1dda1c1a90
Author: Russell King
Date:   Mon Oct 15 11:32:14 2018 -0400

    ARM: vfp: use __copy_from_user() when restoring VFP state

    Commit 42019fc50dfadb219f9e6ddf4c354f3837057d80 upstream.

    __get_user_error() is used as a fast accessor to make copying
    structure members in the signal handling path as efficient as
    possible.
    However, with software PAN and the recent Spectre variant 1, the
    efficiency is reduced as these are no longer fast accessors.

    In the case of software PAN, it has to switch the domain register
    around each access, and with Spectre variant 1, it would have to
    repeat the access_ok() check for each access.

    Use __copy_from_user() rather than __get_user_err() for individual
    members when restoring VFP state.

    Acked-by: Mark Rutland
    Signed-off-by: Russell King
    Signed-off-by: David A. Long
    Signed-off-by: Greg Kroah-Hartman

commit faac72dc91507049c55f3de6731428f50f33fc88
Author: Russell King
Date:   Mon Oct 15 11:32:13 2018 -0400

    ARM: signal: copy registers using __copy_from_user()

    Commit c32cd419d6650e42b9cdebb83c672ec945e6bd7e upstream.

    __get_user_error() is used as a fast accessor to make copying
    structure members in the signal handling path as efficient as
    possible. However, with software PAN and the recent Spectre variant
    1, the efficiency is reduced as these are no longer fast accessors.

    In the case of software PAN, it has to switch the domain register
    around each access, and with Spectre variant 1, it would have to
    repeat the access_ok() check for each access.

    It becomes much more efficient to use __copy_from_user() instead, so
    let's use this for the ARM integer registers.

    Acked-by: Mark Rutland
    Signed-off-by: Russell King
    Signed-off-by: David A. Long
    Signed-off-by: Greg Kroah-Hartman

commit b690ec0dc735545bca2b78bee639d1545bda97e6
Author: Russell King
Date:   Mon Oct 15 11:32:12 2018 -0400

    ARM: spectre-v1: fix syscall entry

    Commit 10573ae547c85b2c61417ff1a106cffbfceada35 upstream.

    Prevent speculation at the syscall table decoding by clamping the
    index used to zero on invalid system call numbers, and using the csdb
    speculative barrier.

    Signed-off-by: Russell King
    Acked-by: Mark Rutland
    Boot-tested-by: Tony Lindgren
    Reviewed-by: Tony Lindgren
    Signed-off-by: David A. Long
    Signed-off-by: Greg Kroah-Hartman

commit 4186f7cfa1d6083e969638c8e98b205163da7e2a
Author: Russell King
Date:   Mon Oct 15 11:32:11 2018 -0400

    ARM: spectre-v1: add array_index_mask_nospec() implementation

    Commit 1d4238c56f9816ce0f9c8dbe42d7f2ad81cb6613 upstream.

    Add an implementation of the array_index_mask_nospec() function for
    mitigating Spectre variant 1 throughout the kernel.

    Signed-off-by: Russell King
    Acked-by: Mark Rutland
    Boot-tested-by: Tony Lindgren
    Reviewed-by: Tony Lindgren
    Signed-off-by: David A. Long
    Signed-off-by: Greg Kroah-Hartman

commit f6909113ad1ff2a168643e7fee3136b8adbb4aa9
Author: Russell King
Date:   Mon Oct 15 11:32:10 2018 -0400

    ARM: spectre-v1: add speculation barrier (csdb) macros

    Commit a78d156587931a2c3b354534aa772febf6c9e855 upstream.

    Add assembly and C macros for the new CSDB instruction.

    Signed-off-by: Russell King
    Acked-by: Mark Rutland
    Boot-tested-by: Tony Lindgren
    Reviewed-by: Tony Lindgren
    Signed-off-by: David A. Long
    Signed-off-by: Greg Kroah-Hartman

commit e7fc401a8800b9d657e2d1ba7188592b449c9c84
Author: Russell King
Date:   Mon Oct 15 11:32:09 2018 -0400

    ARM: KVM: report support for SMCCC_ARCH_WORKAROUND_1

    Commit add5609877c6785cc002c6ed7e008b1d61064439 upstream.

    Report support for SMCCC_ARCH_WORKAROUND_1 to KVM guests for affected
    CPUs.

    Signed-off-by: Russell King
    Boot-tested-by: Tony Lindgren
    Reviewed-by: Tony Lindgren
    Reviewed-by: Marc Zyngier
    Signed-off-by: David A. Long
    Signed-off-by: Greg Kroah-Hartman

commit 8502541ee21650603972834c3560df9329f3a1a4
Author: Russell King
Date:   Mon Oct 15 11:32:08 2018 -0400

    ARM: KVM: Add SMCCC_ARCH_WORKAROUND_1 fast handling

    Commit b800acfc70d9fb81fbd6df70f2cf5e20f70023d0 upstream.

    We want SMCCC_ARCH_WORKAROUND_1 to be fast. As fast as possible.
    So let's intercept it as early as we can by testing for the
    function call number as soon as we've identified a HVC call
    coming from the guest.
    Signed-off-by: Russell King
    Boot-tested-by: Tony Lindgren
    Reviewed-by: Tony Lindgren
    Reviewed-by: Marc Zyngier
    Signed-off-by: David A. Long
    Signed-off-by: Greg Kroah-Hartman

commit ee4e537d3aa18e9a35fef400961ba0ce8edaf7b9
Author: Russell King
Date:   Mon Oct 15 11:32:07 2018 -0400

    ARM: spectre-v2: KVM: invalidate icache on guest exit for Brahma B15

    Commit 3c908e16396d130608e831b7fac4b167a2ede6ba upstream.

    Include Brahma B15 in the Spectre v2 KVM workarounds.

    Signed-off-by: Russell King
    Acked-by: Florian Fainelli
    Boot-tested-by: Tony Lindgren
    Reviewed-by: Tony Lindgren
    Acked-by: Marc Zyngier
    Signed-off-by: David A. Long
    Signed-off-by: Greg Kroah-Hartman

commit 1df9a0a8201b4ba2e58f54c0b88810a8cc4f3fbd
Author: Marc Zyngier
Date:   Mon Oct 15 11:32:06 2018 -0400

    ARM: KVM: invalidate icache on guest exit for Cortex-A15

    Commit 0c47ac8cd157727e7a532d665d6fb1b5fd333977 upstream.

    In order to avoid aliasing attacks against the branch predictor
    on Cortex-A15, let's invalidate the BTB on guest exit, which can
    only be done by invalidating the icache (with ACTLR[0] being set).

    We use the same hack as for A12/A17 to perform the vector decoding.

    Signed-off-by: Marc Zyngier
    Signed-off-by: Russell King
    Boot-tested-by: Tony Lindgren
    Reviewed-by: Tony Lindgren
    Signed-off-by: David A. Long
    Signed-off-by: Greg Kroah-Hartman

commit 75e48eff8aae50cabc9124cb39b31ef94c713c6a
Author: Marc Zyngier
Date:   Mon Oct 15 11:32:05 2018 -0400

    ARM: KVM: invalidate BTB on guest exit for Cortex-A12/A17

    Commit 3f7e8e2e1ebda787f156ce46e3f0a9ce2833fa4f upstream.

    In order to avoid aliasing attacks against the branch predictor,
    let's invalidate the BTB on guest exit. This is made complicated
    by the fact that we cannot take a branch before invalidating the
    BTB.

    We only apply this to A12 and A17, which are the only two ARM
    cores on which this is useful.

    Signed-off-by: Marc Zyngier
    Signed-off-by: Russell King
    Boot-tested-by: Tony Lindgren
    Reviewed-by: Tony Lindgren
    Signed-off-by: David A. Long
    Signed-off-by: Greg Kroah-Hartman

commit 6d75fe7ed2f69f5debc35098281516e6737a8229
Author: Russell King
Date:   Mon Oct 15 11:32:04 2018 -0400

    ARM: spectre-v2: warn about incorrect context switching functions

    Commit c44f366ea7c85e1be27d08f2f0880f4120698125 upstream.

    Warn at error level if the context switching function is not what we
    are expecting. This can happen with big.LITTLE systems, which we
    currently do not support.

    Signed-off-by: Russell King
    Boot-tested-by: Tony Lindgren
    Reviewed-by: Tony Lindgren
    Acked-by: Marc Zyngier
    Signed-off-by: David A. Long
    Signed-off-by: Greg Kroah-Hartman

commit 510155b2d95b3561ec214a16fa9e28ce10d0d9c7
Author: Russell King
Date:   Mon Oct 15 11:32:03 2018 -0400

    ARM: spectre-v2: add firmware based hardening

    Commit 10115105cb3aa17b5da1cb726ae8dd5f6854bd93 upstream.

    Add firmware based hardening for cores that require more complex
    handling in firmware.

    Signed-off-by: Russell King
    Boot-tested-by: Tony Lindgren
    Reviewed-by: Tony Lindgren
    Reviewed-by: Marc Zyngier
    Signed-off-by: David A. Long
    Signed-off-by: Greg Kroah-Hartman

commit 5ab8c6e8879c3eee7d70714b630a4770ff8a2678
Author: Russell King
Date:   Mon Oct 15 11:32:02 2018 -0400

    ARM: spectre-v2: harden user aborts in kernel space

    Commit f5fe12b1eaee220ce62ff9afb8b90929c396595f upstream.

    In order to prevent aliasing attacks on the branch predictor,
    invalidate the BTB or instruction cache on CPUs that are known to be
    affected when taking an abort on an address that is outside of a user
    task limit:

    Cortex A8, A9, A12, A17, A73, A75: flush BTB.
    Cortex A15, Brahma B15: invalidate icache.

    If the IBE bit is not set, then there is little point to enabling the
    workaround.

    Signed-off-by: Russell King
    Boot-tested-by: Tony Lindgren
    Reviewed-by: Tony Lindgren
    Signed-off-by: David A. Long
    Signed-off-by: Greg Kroah-Hartman

commit 81b215a5b80b32020daf481f9f037942e4552aa7
Author: Russell King
Date:   Mon Oct 15 11:32:01 2018 -0400

    ARM: spectre-v2: add Cortex A8 and A15 validation of the IBE bit

    Commit e388b80288aade31135aca23d32eee93dd106795 upstream.

    When the branch predictor hardening is enabled, firmware must have set
    the IBE bit in the auxiliary control register. If this bit has not
    been set, the Spectre workarounds will not be functional.

    Add validation that this bit is set, and print a warning at alert
    level if this is not the case.

    Signed-off-by: Russell King
    Reviewed-by: Florian Fainelli
    Boot-tested-by: Tony Lindgren
    Reviewed-by: Tony Lindgren
    Signed-off-by: David A. Long
    Signed-off-by: Greg Kroah-Hartman

commit 3e52aff79d5eeac0f71acd054997c0cffa8d6c55
Author: Russell King
Date:   Mon Oct 15 11:32:00 2018 -0400

    ARM: spectre-v2: harden branch predictor on context switches

    Commit 06c23f5ffe7ad45b908d0fff604dae08a7e334b9 upstream.

    Required manual merge of arch/arm/mm/proc-v7.S.

    Harden the branch predictor against Spectre v2 attacks on context
    switches for ARMv7 and later CPUs. We do this by:

    Cortex A9, A12, A17, A73, A75: invalidating the BTB.
    Cortex A15, Brahma B15: invalidating the instruction cache.

    Cortex A57 and Cortex A72 are not addressed in this patch.

    Cortex R7 and Cortex R8 are also not addressed as we do not enforce
    memory protection on these cores.

    Signed-off-by: Russell King
    Boot-tested-by: Tony Lindgren
    Reviewed-by: Tony Lindgren
    Acked-by: Marc Zyngier
    Signed-off-by: David A. Long
    Signed-off-by: Greg Kroah-Hartman

commit c0f64070a310c9f9c948841c0ec7a9635ad8c08d
Author: Russell King
Date:   Mon Oct 15 11:31:59 2018 -0400

    ARM: spectre: add Kconfig symbol for CPUs vulnerable to Spectre

    Commit c58d237d0852a57fde9bc2c310972e8f4e3d155d upstream.

    Add a Kconfig symbol for CPUs which are vulnerable to the Spectre
    attacks.
    Signed-off-by: Russell King
    Reviewed-by: Florian Fainelli
    Boot-tested-by: Tony Lindgren
    Reviewed-by: Tony Lindgren
    Acked-by: Marc Zyngier
    Signed-off-by: David A. Long
    Signed-off-by: Greg Kroah-Hartman

commit 0d5360ee15e98cb04ee98b92a90b99cfd3e154a3
Author: Russell King
Date:   Mon Oct 15 11:31:58 2018 -0400

    ARM: bugs: add support for per-processor bug checking

    Commit 9d3a04925deeabb97c8e26d940b501a2873e8af3 upstream.

    Add support for per-processor bug checking - each processor function
    descriptor gains a function pointer for this check, which must not be
    an __init function. If non-NULL, this will be called whenever a CPU
    enters the kernel via whichever path (boot CPU, secondary CPU startup,
    CPU resuming, etc.)

    This allows processor specific bug checks to validate that workaround
    bits are properly enabled by firmware via all entry paths to the
    kernel.

    Signed-off-by: Russell King
    Reviewed-by: Florian Fainelli
    Boot-tested-by: Tony Lindgren
    Reviewed-by: Tony Lindgren
    Acked-by: Marc Zyngier
    Signed-off-by: David A. Long
    Signed-off-by: Greg Kroah-Hartman

commit c7825c277bad3fc28ec29d5d73d54b7f3b12c573
Author: Russell King
Date:   Mon Oct 15 11:31:57 2018 -0400

    ARM: bugs: hook processor bug checking into SMP and suspend paths

    Commit 26602161b5ba795928a5a719fe1d5d9f2ab5c3ef upstream.

    Check for CPU bugs when secondary processors are being brought online,
    and also when CPUs are resuming from a low power mode. This gives an
    opportunity to check that processor specific bug workarounds are
    correctly enabled for all paths that a CPU re-enters the kernel.

    Signed-off-by: Russell King
    Reviewed-by: Florian Fainelli
    Boot-tested-by: Tony Lindgren
    Reviewed-by: Tony Lindgren
    Acked-by: Marc Zyngier
    Signed-off-by: David A. Long
    Signed-off-by: Greg Kroah-Hartman

commit 9a42b70744b1620ddfa8b2373a52e1d0552049da
Author: Russell King
Date:   Mon Oct 15 11:31:56 2018 -0400

    ARM: bugs: prepare processor bug infrastructure

    Commit a5b9177f69329314721aa7022b7e69dab23fa1f0 upstream.
    Prepare the processor bug infrastructure so that it can be expanded to
    check for per-processor bugs.

    Signed-off-by: Russell King
    Reviewed-by: Florian Fainelli
    Boot-tested-by: Tony Lindgren
    Reviewed-by: Tony Lindgren
    Acked-by: Marc Zyngier
    Signed-off-by: David A. Long
    Signed-off-by: Greg Kroah-Hartman

commit 1789de335428b57be254175aa5748a7620e5fb97
Author: Russell King
Date:   Mon Oct 15 11:31:55 2018 -0400

    ARM: add more CPU part numbers for Cortex and Brahma B15 CPUs

    Commit f5683e76f35b4ec5891031b6a29036efe0a1ff84 upstream.

    Add CPU part numbers for Cortex A53, A57, A72, A73, A75 and the
    Broadcom Brahma B15 CPU.

    Signed-off-by: Russell King
    Acked-by: Florian Fainelli
    Boot-tested-by: Tony Lindgren
    Reviewed-by: Tony Lindgren
    Acked-by: Marc Zyngier
    Signed-off-by: David A. Long
    Signed-off-by: Greg Kroah-Hartman

commit d62b8ac8cd540c39a10d94c1fe5389f994a82e1a
Author: Roman Gushchin
Date:   Fri May 11 16:01:53 2018 -0700

    mm: don't show nr_indirectly_reclaimable in /proc/vmstat

    commit 7aaf7727235870f497eb928f728f7773d6df3b40 upstream.

    Don't show nr_indirectly_reclaimable in /proc/vmstat, because there is
    no need to export this vm counter to userspace, and some changes are
    expected in reclaimable object accounting, which can alter this
    counter.

    Link: http://lkml.kernel.org/r/20180425191422.9159-1-guro@fb.com
    Signed-off-by: Roman Gushchin
    Acked-by: Vlastimil Babka
    Reviewed-by: Andrew Morton
    Cc: Matthew Wilcox
    Cc: Alexander Viro
    Cc: Michal Hocko
    Cc: Johannes Weiner
    Cc: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

commit 5de69d648a09acc3c58bcead8cfc9d19356f79e4
Author: Roman Gushchin
Date:   Tue Apr 10 16:27:47 2018 -0700

    mm: treat indirectly reclaimable memory as free in overcommit logic

    commit d79f7aa496fc94d763f67b833a1f36f4c171176f upstream.

    Indirectly reclaimable memory can consume a significant part of total
    memory and it's actually reclaimable (it will be released under actual
    memory pressure).
    So, the overcommit logic should treat it as free.

    Otherwise, it's possible to cause random system-wide memory allocation
    failures by consuming a significant amount of memory by indirectly
    reclaimable memory, e.g. dentry external names.

    If overcommit policy GUESS is used, it might be used for denial of
    service attack under some conditions.

    The following program illustrates the approach. It causes the kernel
    to allocate an unreclaimable kmalloc-256 chunk for each stat() call,
    so that at some point the overcommit logic may start blocking large
    allocation system-wide.

      int main()
      {
      	char buf[256];
      	unsigned long i;
      	struct stat statbuf;

      	buf[0] = '/';
      	for (i = 1; i < sizeof(buf); i++)
      		buf[i] = '_';

      	for (i = 0; 1; i++) {
      		sprintf(&buf[248], "%8lu", i);
      		stat(buf, &statbuf);
      	}

      	return 0;
      }

    This patch in combination with related indirectly reclaimable memory
    patches closes this issue.

    Link: http://lkml.kernel.org/r/20180313130041.8078-1-guro@fb.com
    Signed-off-by: Roman Gushchin
    Reviewed-by: Andrew Morton
    Cc: Alexander Viro
    Cc: Michal Hocko
    Cc: Johannes Weiner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

commit 6d7942377c88ef51783057713f9610aa3e307f24
Author: Roman Gushchin
Date:   Tue Apr 10 16:27:44 2018 -0700

    dcache: account external names as indirectly reclaimable memory

    commit f1782c9bc547754f4bd3043fe8cfda53db85f13f upstream.

    I received a report about suspicious growth of unreclaimable slabs on
    some machines. I've found that it happens on machines with low memory
    pressure, and these unreclaimable slabs are external names attached to
    dentries.

    External names are allocated using generic kmalloc() function, so they
    are accounted as unreclaimable. But they are held by dentries, which
    are reclaimable, and they will be reclaimed under the memory pressure.

    In particular, this breaks MemAvailable calculation, as it doesn't
    take unreclaimable slabs into account.
    This leads to a silly situation, when a machine is almost idle, has no
    memory pressure and therefore has a big dentry cache. And the
    resulting MemAvailable is too low to start a new workload.

    To address the issue, the NR_INDIRECTLY_RECLAIMABLE_BYTES counter is
    used to track the amount of memory, consumed by external names. The
    counter is increased in the dentry allocation path, if an external
    name structure is allocated; and it's decreased in the dentry freeing
    path.

    To reproduce the problem I've used the following Python script:

      import os

      for iter in range (0, 10000000):
          try:
              name = ("/some_long_name_%d" % iter) + "_" * 220
              os.stat(name)
          except Exception:
              pass

    Without this patch:
      $ cat /proc/meminfo | grep MemAvailable
      MemAvailable: 7811688 kB
      $ python indirect.py
      $ cat /proc/meminfo | grep MemAvailable
      MemAvailable: 2753052 kB

    With the patch:
      $ cat /proc/meminfo | grep MemAvailable
      MemAvailable: 7809516 kB
      $ python indirect.py
      $ cat /proc/meminfo | grep MemAvailable
      MemAvailable: 7749144 kB

    [guro@fb.com: fix indirectly reclaimable memory accounting for CONFIG_SLOB]
      Link: http://lkml.kernel.org/r/20180312194140.19517-1-guro@fb.com
    [guro@fb.com: fix indirectly reclaimable memory accounting]
      Link: http://lkml.kernel.org/r/20180313125701.7955-1-guro@fb.com
    Link: http://lkml.kernel.org/r/20180305133743.12746-5-guro@fb.com
    Signed-off-by: Roman Gushchin
    Reviewed-by: Andrew Morton
    Cc: Alexander Viro
    Cc: Michal Hocko
    Cc: Johannes Weiner
    Cc: Mel Gorman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

commit dc09a5b68d830c2d25c4c1321d3db93b03b9be6a
Author: Roman Gushchin
Date:   Tue Apr 10 16:27:40 2018 -0700

    mm: treat indirectly reclaimable memory as available in MemAvailable

    commit 034ebf65c3c21d85b963d39f992258a64a85e3a9 upstream.

    Adjust /proc/meminfo MemAvailable calculation by adding the amount of
    indirectly reclaimable memory (rounded to the PAGE_SIZE).
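The adjustment described above can be sketched in a few lines of C. This is only an illustration, not the kernel's code: the helper name, the fixed 4 KiB page size, and the choice to round down to whole pages are all assumptions made for the example.

```c
#include <assert.h>

/* Assumed page size for this sketch: 4 KiB (shift of 12). */
#define SK_PAGE_SHIFT 12

/*
 * Hypothetical helper: take an estimate of available memory in pages and
 * add the indirectly reclaimable byte counter, converted to whole pages.
 * Partial pages are dropped (rounding down is an assumption here).
 */
unsigned long sk_mem_available(unsigned long available_pages,
			       unsigned long indirectly_reclaimable_bytes)
{
	return available_pages +
	       (indirectly_reclaimable_bytes >> SK_PAGE_SHIFT);
}
```

With these assumptions, `sk_mem_available(100, 8192)` yields 102 pages, while a counter below one page (for example 4095 bytes) leaves the estimate unchanged.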
    Link: http://lkml.kernel.org/r/20180305133743.12746-4-guro@fb.com
    Signed-off-by: Roman Gushchin
    Reviewed-by: Andrew Morton
    Cc: Alexander Viro
    Cc: Michal Hocko
    Cc: Johannes Weiner
    Cc: Mel Gorman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

commit c605894c84b9d26324bdbc31a9526cb0a5c8d48e
Author: Roman Gushchin
Date:   Tue Apr 10 16:27:36 2018 -0700

    mm: introduce NR_INDIRECTLY_RECLAIMABLE_BYTES

    commit eb59254608bc1d42c4c6afdcdce9c0d3ce02b318 upstream.

    Patch series "indirectly reclaimable memory", v2.

    This patchset introduces the concept of indirectly reclaimable memory
    and applies it to fix the issue of when a big number of dentries with
    external names can significantly affect the MemAvailable value.

    This patch (of 3):

    Introduce a concept of indirectly reclaimable memory and add the
    corresponding memory counter and /proc/vmstat item.

    Indirectly reclaimable memory is any sort of memory, used by the
    kernel (except for reclaimable slabs), which is actually reclaimable,
    i.e. will be released under memory pressure.

    The counter is in bytes, as it's not always possible to count such
    objects in pages. The name contains BYTES by analogy to
    NR_KERNEL_STACK_KB.

    Link: http://lkml.kernel.org/r/20180305133743.12746-2-guro@fb.com
    Signed-off-by: Roman Gushchin
    Reviewed-by: Andrew Morton
    Cc: Alexander Viro
    Cc: Michal Hocko
    Cc: Johannes Weiner
    Cc: Mel Gorman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

commit 7a4f9efdb213fc5ae014b3a551a1364b94cd2533
Author: Mathias Nyman
Date:   Mon Feb 12 14:24:47 2018 +0200

    xhci: Don't print a warning when setting link state for disabled ports

    commit 1208d8a84fdcae6b395c57911cdf907450d30e70 upstream.

    When disabling a USB3 port the hub driver will set the port link state
    to U3 to prevent "ejected" or "safely removed" devices that are still
    physically connected from immediately re-enumerating.
    If the device was really unplugged, then error messages were printed
    as the hub tries to set the U3 link state for a port that is no longer
    enabled.

    xhci-hcd ee000000.usb: Cannot set link state.
    usb usb8-port1: cannot disable (err = -32)

    Don't print error message in xhci-hub if hub tries to set port link
    state for a disabled port. Return -ENODEV instead which also silences
    hub driver.

    Signed-off-by: Mathias Nyman
    Tested-by: Yoshihiro Shimoda
    Signed-off-by: Ross Zwisler
    Signed-off-by: Greg Kroah-Hartman

commit 74a960430a8d01c33844884a382264599072d3c8
Author: Edgar Cherkasov
Date:   Thu Sep 27 11:56:03 2018 +0300

    i2c: i2c-scmi: fix for i2c_smbus_write_block_data

    commit 08d9db00fe0e300d6df976e6c294f974988226dd upstream.

    The i2c-scmi driver crashes when the SMBus Write Block transaction is
    executed:

    WARNING: CPU: 9 PID: 2194 at mm/page_alloc.c:3931 __alloc_pages_slowpath+0x9db/0xec0
     Call Trace:
      ? get_page_from_freelist+0x49d/0x11f0
      ? alloc_pages_current+0x6a/0xe0
      ? new_slab+0x499/0x690
      __alloc_pages_nodemask+0x265/0x280
      alloc_pages_current+0x6a/0xe0
      kmalloc_order+0x18/0x40
      kmalloc_order_trace+0x24/0xb0
      ? acpi_ut_allocate_object_desc_dbg+0x62/0x10c
      __kmalloc+0x203/0x220
      acpi_os_allocate_zeroed+0x34/0x36
      acpi_ut_copy_eobject_to_iobject+0x266/0x31e
      acpi_evaluate_object+0x166/0x3b2
      acpi_smbus_cmi_access+0x144/0x530 [i2c_scmi]
      i2c_smbus_xfer+0xda/0x370
      i2cdev_ioctl_smbus+0x1bd/0x270
      i2cdev_ioctl+0xaa/0x250
      do_vfs_ioctl+0xa4/0x600
      SyS_ioctl+0x79/0x90
      do_syscall_64+0x73/0x130
      entry_SYSCALL_64_after_hwframe+0x3d/0xa2
    ACPI Error: Evaluating _SBW: 4 (20170831/smbus_cmi-185)

    This problem occurs because the length of ACPI Buffer object is not
    defined/initialized in the code before a corresponding ACPI method is
    called. The obvious patch below fixes this issue.
    Signed-off-by: Edgar Cherkasov
    Acked-by: Viktor Krasnov
    Acked-by: Michael Brunner
    Signed-off-by: Wolfram Sang
    Signed-off-by: Greg Kroah-Hartman

commit 1b7ff5208d2f154c26b9692ddf1b761c9b7339f2
Author: Jan Kara
Date:   Tue Oct 9 12:19:17 2018 +0200

    mm: Preserve _PAGE_DEVMAP across mprotect() calls

    commit 4628a64591e6cee181237060961e98c615c33966 upstream.

    Currently _PAGE_DEVMAP bit is not preserved in mprotect(2) calls. As a
    result we will see warnings such as:

    BUG: Bad page map in process JobWrk0013 pte:800001803875ea25 pmd:7624381067
    addr:00007f0930720000 vm_flags:280000f9 anon_vma: (null) mapping:ffff97f2384056f0 index:0
    file:457-000000fe00000030-00000009-000000ca-00000001_2001.fileblock fault:xfs_filemap_fault [xfs] mmap:xfs_file_mmap [xfs] readpage: (null)
    CPU: 3 PID: 15848 Comm: JobWrk0013 Tainted: G W 4.12.14-2.g7573215-default #1 SLE12-SP4 (unreleased)
    Hardware name: Intel Corporation S2600WFD/S2600WFD, BIOS SE5C620.86B.01.00.0833.051120182255 05/11/2018
    Call Trace:
     dump_stack+0x5a/0x75
     print_bad_pte+0x217/0x2c0
     ? enqueue_task_fair+0x76/0x9f0
     _vm_normal_page+0xe5/0x100
     zap_pte_range+0x148/0x740
     unmap_page_range+0x39a/0x4b0
     unmap_vmas+0x42/0x90
     unmap_region+0x99/0xf0
     ? vma_gap_callbacks_rotate+0x1a/0x20
     do_munmap+0x255/0x3a0
     vm_munmap+0x54/0x80
     SyS_munmap+0x1d/0x30
     do_syscall_64+0x74/0x150
     entry_SYSCALL_64_after_hwframe+0x3d/0xa2
    ...

    when mprotect(2) gets used on DAX mappings. Also there is a wide
    variety of other failures that can result from the missing
    _PAGE_DEVMAP flag when the area gets used by get_user_pages() later.

    Fix the problem by including _PAGE_DEVMAP in a set of flags that get
    preserved by mprotect(2).
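Why a flag missing from the "preserved" set disappears on a protection change can be shown with a small mask sketch. Everything here is hypothetical: the `SK_` bit positions, the PFN mask, and the helper are chosen for illustration and are not the architecture's real definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative bit layout only; real values are architecture specific. */
#define SK_PAGE_RW     (UINT64_C(1) << 1)
#define SK_PAGE_DEVMAP (UINT64_C(1) << 58)
#define SK_PFN_MASK    UINT64_C(0x000ffffffffff000)

/*
 * Hypothetical protection change: bits covered by chg_mask are carried
 * over from the old entry, every other bit is taken from newprot.
 */
uint64_t sk_pte_modify(uint64_t pte, uint64_t newprot, uint64_t chg_mask)
{
	return (pte & chg_mask) | (newprot & ~chg_mask);
}
```

In this model, changing a writable devmap entry to read-only with a mask of `SK_PFN_MASK` alone drops the devmap bit, whereas a mask of `SK_PFN_MASK | SK_PAGE_DEVMAP` keeps it, which is the behaviour the commit message asks for.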
    Fixes: 69660fd797c3 ("x86, mm: introduce _PAGE_DEVMAP")
    Fixes: ebd31197931d ("powerpc/mm: Add devmap support for ppc64")
    Cc:
    Signed-off-by: Jan Kara
    Acked-by: Michal Hocko
    Reviewed-by: Johannes Thumshirn
    Signed-off-by: Dan Williams
    Signed-off-by: Greg Kroah-Hartman

commit 68ba0bdfe4941a4ef2423fdf900f9b22089ed227
Author: Jérôme Glisse
Date:   Fri Oct 12 21:34:36 2018 -0700

    mm/thp: fix call to mmu_notifier in set_pmd_migration_entry() v2

    commit bfba8e5cf28f413aa05571af493871d74438979f upstream.

    Inside set_pmd_migration_entry() we are holding page table locks and
    thus we can not sleep so we can not call invalidate_range_start/end().

    So remove the call to mmu_notifier_invalidate_range_start/end()
    because they are called inside the function calling
    set_pmd_migration_entry() (see try_to_unmap_one()).

    Link: http://lkml.kernel.org/r/20181012181056.7864-1-jglisse@redhat.com
    Signed-off-by: Jérôme Glisse
    Reported-by: Andrea Arcangeli
    Reviewed-by: Zi Yan
    Acked-by: Michal Hocko
    Cc: Greg Kroah-Hartman
    Cc: Kirill A. Shutemov
    Cc: "H. Peter Anvin"
    Cc: Anshuman Khandual
    Cc: Dave Hansen
    Cc: David Nellans
    Cc: Ingo Molnar
    Cc: Mel Gorman
    Cc: Minchan Kim
    Cc: Naoya Horiguchi
    Cc: Thomas Gleixner
    Cc: Vlastimil Babka
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Greg Kroah-Hartman

commit 3e6275d940a4419083de25de38d0cc9585117d25
Author: Will Deacon
Date:   Fri Oct 5 13:24:36 2018 +0100

    arm64: perf: Reject stand-alone CHAIN events for PMUv3

    commit ca2b497253ad01c80061a1f3ee9eb91b5d54a849 upstream.

    It doesn't make sense for a perf event to be configured as a CHAIN
    event in isolation, so extend the arm_pmu structure with a
    ->filter_match() function to allow the backend PMU implementation to
    reject CHAIN events early.
    Cc:
    Reviewed-by: Suzuki K Poulose
    Signed-off-by: Will Deacon
    Signed-off-by: Greg Kroah-Hartman

commit b3e4b3c70a0a4028d481b8b1e84b06acdc80d1ec
Author: Marco Felsch
Date:   Tue Oct 2 10:06:46 2018 +0200

    pinctrl: mcp23s08: fix irq and irqchip setup order

    commit f259f896f2348f0302f6f88d4382378cf9d23a7e upstream.

    Since 'commit 02e389e63e35 ("pinctrl: mcp23s08: fix irq setup order")'
    the irq request isn't the last devm_* allocation. Without a deeper
    look at the irq and testing this isn't a good solution. Since this
    driver relies on the devm mechanism, requesting an interrupt should be
    the last thing to avoid memory corruptions during unbinding.

    'Commit 02e389e63e35 ("pinctrl: mcp23s08: fix irq setup order")' fixed
    the order for the interrupt-controller use case only. The
    mcp23s08_irq_setup() must be split into two to fix it for the
    interrupt-controller use case and to register the irq last. So the
    irq will be freed first during unbind.

    Cc: stable@vger.kernel.org
    Cc: Jan Kundrát
    Cc: Dmitry Mastykin
    Cc: Sebastian Reichel
    Fixes: 82039d244f87 ("pinctrl: mcp23s08: add pinconf support")
    Fixes: 02e389e63e35 ("pinctrl: mcp23s08: fix irq setup order")
    Signed-off-by: Marco Felsch
    Tested-by: Phil Reid
    Signed-off-by: Linus Walleij
    Signed-off-by: Greg Kroah-Hartman

commit d5833a50c6a323bfc42dffe2e78aa0f06bb8263b
Author: Chris Boot
Date:   Mon Oct 8 17:07:30 2018 +0200

    mmc: block: avoid multiblock reads for the last sector in SPI mode

    commit 41591b38f5f8f78344954b68582b5f00e56ffe61 upstream.

    On some SD cards over SPI, reading with the multiblock read command
    the last sector will leave the card in a bad state.

    Remove last sectors from the multiblock reading cmd.
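The idea of keeping the card's last sector out of a multiblock read can be sketched as below. This is a hedged illustration, not the driver's code: the helper name and parameters are hypothetical, and it assumes the excluded sector is fetched afterwards by a separate single-block read.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper: how many blocks to request in one read command.
 * start and capacity are in blocks. In SPI mode, a multiblock read that
 * would end exactly at the end of the card is shortened by one block.
 */
unsigned int sk_read_blocks(unsigned long long start, unsigned int blocks,
			    unsigned long long capacity, bool spi_mode)
{
	/* The request must lie within the card. */
	assert(start + blocks <= capacity);

	if (spi_mode && blocks > 1 && start + blocks == capacity)
		return blocks - 1; /* leave the last sector for a later read */

	return blocks;
}
```

For a 100-block card in SPI mode, a request for blocks 92 to 99 is shortened to 7 blocks, while a single-block read of block 99, or any read outside SPI mode, is left untouched.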
    Signed-off-by: Chris Boot
    Signed-off-by: Clément Péron
    Cc: stable@vger.kernel.org # v4.10+
    Signed-off-by: Ulf Hansson
    Signed-off-by: Greg Kroah-Hartman

commit bc183079ddfdeda8282b30a7f9a64aaca11c19a1
Author: Tejun Heo
Date:   Thu Oct 4 13:28:08 2018 -0700

    cgroup: Fix dom_cgrp propagation when enabling threaded mode

    commit 479adb89a97b0a33e5a9d702119872cc82ca21aa upstream.

    A cgroup which is already a threaded domain may be converted into a
    threaded cgroup if the prerequisite conditions are met. When this
    happens, all threaded descendants should also have their ->dom_cgrp
    updated to the new threaded domain cgroup. Unfortunately, this
    propagation was missing leading to the following failure.

      # cd /sys/fs/cgroup/unified
      # cat cgroup.subtree_control # show that no controllers are enabled

      # mkdir -p mycgrp/a/b/c
      # echo threaded > mycgrp/a/b/cgroup.type

      At this point, the hierarchy looks as follows:

          mycgrp [d]
              a [dt]
                  b [t]
                      c [inv]

      Now let's make node "a" threaded (and thus "mycgrp" is made "domain
      threaded"):

      # echo threaded > mycgrp/a/cgroup.type

      By this point, we now have a hierarchy that looks as follows:

          mycgrp [dt]
              a [t]
                  b [t]
                      c [inv]

      But, when we try to convert the node "c" from "domain invalid" to
      "threaded", we get ENOTSUP on the write():

      # echo threaded > mycgrp/a/b/c/cgroup.type
      sh: echo: write error: Operation not supported

    This patch fixes the problem by

    * Moving the opencoded ->dom_cgrp save and restoration in
      cgroup_enable_threaded() into cgroup_{save|restore}_control() so
      that multiple cgroups can be handled.

    * Updating all threaded descendants' ->dom_cgrp to point to the new
      dom_cgrp when enabling threaded mode.
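The propagation step in the second bullet can be sketched on a toy tree. The structure and function names below are hypothetical and far simpler than the kernel's; the sketch also assumes that a "domain invalid" cgroup is modelled as its own domain, so that "threaded" simply means "domain pointer refers to another cgroup".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified cgroup node; not the kernel's struct cgroup. */
struct sk_cgroup {
	struct sk_cgroup *dom;     /* domain; points to itself for a domain */
	struct sk_cgroup *child;   /* first child, or NULL */
	struct sk_cgroup *sibling; /* next sibling, or NULL */
};

/* In this sketch a cgroup is threaded when its domain is another cgroup. */
int sk_is_threaded(const struct sk_cgroup *cg)
{
	return cg->dom != cg;
}

/* Point every threaded descendant of cg at new_dom; leave others alone. */
void sk_update_descendants(struct sk_cgroup *cg, struct sk_cgroup *new_dom)
{
	struct sk_cgroup *c;

	for (c = cg->child; c != NULL; c = c->sibling) {
		if (sk_is_threaded(c))
			c->dom = new_dom;
		sk_update_descendants(c, new_dom);
	}
}

/* Make cg threaded under new_dom and propagate to its descendants. */
void sk_enable_threaded(struct sk_cgroup *cg, struct sk_cgroup *new_dom)
{
	assert(cg != NULL && new_dom != NULL);
	cg->dom = new_dom;
	sk_update_descendants(cg, new_dom);
}

/* Rebuild mycgrp/a/b/c from the example; return 1 if propagation worked. */
int sk_demo(void)
{
	struct sk_cgroup mycgrp, a, b, c;

	mycgrp.dom = &mycgrp; mycgrp.child = &a; mycgrp.sibling = NULL;
	a.dom = &a; a.child = &b;   a.sibling = NULL; /* [dt] */
	b.dom = &a; b.child = &c;   b.sibling = NULL; /* [t], domain is a */
	c.dom = &c; c.child = NULL; c.sibling = NULL; /* [inv] */

	sk_enable_threaded(&a, &mycgrp);

	return a.dom == &mycgrp && b.dom == &mycgrp && c.dom == &c;
}
```

After `sk_enable_threaded(&a, &mycgrp)`, node "b" no longer points at the stale domain "a", which is the condition the missing propagation left behind in the failing sequence.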
    Signed-off-by: Tejun Heo
    Reported-and-tested-by: "Michael Kerrisk (man-pages)"
    Reported-by: Amin Jamali
    Reported-by: Joao De Almeida Pereira
    Link: https://lore.kernel.org/r/CAKgNAkhHYCMn74TCNiMJ=ccLd7DcmXSbvw3CbZ1YREeG7iJM5g@mail.gmail.com
    Fixes: 454000adaa2a ("cgroup: introduce cgroup->dom_cgrp and threaded css_set handling")
    Cc: stable@vger.kernel.org # v4.14+
    Signed-off-by: Greg Kroah-Hartman

commit c339fab172a98d0fe8fc7d87ba5d819026ccfa3b
Author: Damien Le Moal
Date:   Thu Oct 11 11:45:30 2018 +0900

    dm linear: fix linear_end_io conditional definition

    commit 118aa47c7072bce05fc39bd40a1c0a90caed72ab upstream.

    The dm-linear target is independent of the dm-zoned target. For code
    requiring support for zoned block devices, use CONFIG_BLK_DEV_ZONED
    instead of CONFIG_DM_ZONED.

    While at it, similarly to dm linear, also enable the DM_TARGET_ZONED_HM
    feature in dm-flakey only if CONFIG_BLK_DEV_ZONED is defined.

    Fixes: beb9caac211c1 ("dm linear: eliminate linear_end_io call if CONFIG_DM_ZONED disabled")
    Fixes: 0be12c1c7fce7 ("dm linear: add support for zoned block devices")
    Cc: stable@vger.kernel.org
    Signed-off-by: Damien Le Moal
    Signed-off-by: Mike Snitzer
    Signed-off-by: Greg Kroah-Hartman

commit efd6537984d5399b9225ac678f9ff5d0001e8f2b
Author: Mike Snitzer
Date:   Wed Oct 10 12:01:55 2018 -0400

    dm linear: eliminate linear_end_io call if CONFIG_DM_ZONED disabled

    commit beb9caac211c1be1bc118bb62d5cf09c4107e6a5 upstream.

    It is best to avoid any extra overhead associated with bio completion.
    DM core will indirectly call a DM target's .end_io if it is defined.
    In the case of DM linear, there is no need to do so (for every bio that
    completes) if CONFIG_DM_ZONED is not enabled.

    Avoiding an extra indirect call for every bio completion is very
    important for ensuring DM linear doesn't incur more overhead that
    further widens the performance gap between dm-linear and raw block
    devices.
    Fixes: 0be12c1c7fce7 ("dm linear: add support for zoned block devices")
    Cc: stable@vger.kernel.org
    Signed-off-by: Mike Snitzer
    Signed-off-by: Greg Kroah-Hartman

commit 261f2cba100bacbfa1e9d3f331461069450fc75d
Author: Damien Le Moal
Date:   Tue Oct 9 14:24:31 2018 +0900

    dm: fix report zone remapping to account for partition offset

    commit 9864cd5dc54cade89fd4b0954c2e522841aa247c upstream.

    If dm-linear or dm-flakey are layered on top of a partition of a zoned
    block device, remapping of the start sector and write pointer position
    of the zones reported by a report zones BIO must be modified to account
    for the target table entry mapping (start offset within the device and
    entry mapping with the dm device). If the target's backing device is a
    partition of a whole disk, the start sector on the physical device of
    the partition must also be accounted for when modifying the zone
    information. However, dm_remap_zone_report() was not considering this
    last case, resulting in incorrect zone information remapping with
    targets using disk partitions.

    Fix this by calculating the target backing device start sector using
    the position of the completed report zones BIO and the unchanged
    position and size of the original report zone BIO. With this value
    calculated, the start sector and write pointer position of the target
    zones can be correctly remapped.

    Fixes: 10999307c14e ("dm: introduce dm_remap_zone_report()")
    Cc: stable@vger.kernel.org
    Signed-off-by: Damien Le Moal
    Signed-off-by: Mike Snitzer
    Signed-off-by: Greg Kroah-Hartman

commit 6c8faa19e9cc5fc0010737a2de38ad6b477bea30
Author: Shenghui Wang
Date:   Sun Oct 7 14:45:41 2018 +0800

    dm cache: destroy migration_cache if cache target registration failed

    commit c7cd55504a5b0fc826a2cd9540845979d24ae542 upstream.

    Commit 7e6358d244e47 ("dm: fix various targets to dm_register_target
    after module __init resources created") inadvertently introduced this
    bug when it moved dm_register_target() after the call to KMEM_CACHE().
    Fixes: 7e6358d244e47 ("dm: fix various targets to dm_register_target after module __init resources created")
    Cc: stable@vger.kernel.org
    Signed-off-by: Shenghui Wang
    Signed-off-by: Mike Snitzer
    Signed-off-by: Greg Kroah-Hartman

commit 8d2f62cb2d463456a4dbc84c59d1d2a05947fe20
Author: Eric Farman
Date:   Tue Oct 2 03:02:35 2018 +0200

    s390/cio: Fix how vfio-ccw checks pinned pages

    commit 24abf2901b18bf941b9f21ea2ce5791f61097ae4 upstream.

    We have two nested loops to check the entries within the
    pfn_array_table arrays. But we mistakenly use the outer array as an
    index in our check, and completely ignore the indexing performed by
    the inner loop.

    Cc: stable@vger.kernel.org
    Signed-off-by: Eric Farman
    Message-Id: <20181002010235.42483-1-farman@linux.ibm.com>
    Signed-off-by: Cornelia Huck
    Signed-off-by: Greg Kroah-Hartman

commit e3f725f5c46aa7e989430b0ecc3463a5c4e4cd49
Author: Adrian Hunter
Date:   Tue Sep 11 14:45:04 2018 +0300

    perf script python: Fix export-to-sqlite.py sample columns

    commit d005efe18db0b4a123dd92ea8e77e27aee8f99fd upstream.

    With the "branches" export option, not all sample columns are exported.
    However the unwanted columns are not at the end of the tuple, as
    assumed by the code. Fix by taking the first 15 and last 3 values,
    instead of the first 18.

    Signed-off-by: Adrian Hunter
    Cc: Jiri Olsa
    Cc: stable@vger.kernel.org
    Link: http://lkml.kernel.org/r/20180911114504.28516-3-adrian.hunter@intel.com
    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: Greg Kroah-Hartman

commit 82ac2740aa74668b694c04de659155571b6514e7
Author: Adrian Hunter
Date:   Tue Sep 11 14:45:03 2018 +0300

    perf script python: Fix export-to-postgresql.py occasional failure

    commit 25e11700b54c7b6b5ebfc4361981dae12299557b upstream.

    Occasional export failures were found to be caused by truncating 64-bit
    pointers to 32-bits. Fix by explicitly setting types for all ctype
    arguments and results.
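Editor's illustration (not part of the upstream commit): the kind of truncation described above can be sketched with Python's ctypes module. By default ctypes assumes a foreign function returns a C int, so a pointer-sized result can lose its upper bits on 64-bit builds unless `restype` is declared. The sketch below is not the code from export-to-postgresql.py; it assumes CPython (for `ctypes.pythonapi`) and uses the C API function `PyLong_AsVoidPtr` purely as a convenient function that returns a `void *`. The helper name `roundtrip` is hypothetical.

```python
import ctypes

# PyLong_AsVoidPtr is a CPython C API function: it converts a Python int
# into a C "void *". It is used here only as a stand-in for any foreign
# function that returns a pointer.
as_void_ptr = ctypes.pythonapi.PyLong_AsVoidPtr

# Declare the argument and result types explicitly. If restype were left
# at its default, ctypes would treat the result as a C int.
as_void_ptr.argtypes = [ctypes.py_object]
as_void_ptr.restype = ctypes.c_void_p


def roundtrip(address):
    """Pass an integer through a void * return value (hypothetical helper)."""
    return as_void_ptr(address)


# Pick a value that needs more than 32 bits when pointers are 64-bit wide.
wide = (1 << 40) + 5 if ctypes.sizeof(ctypes.c_void_p) == 8 else 5
result = roundtrip(wide)
print(hex(result))
```

With both attributes set, the value should come back unchanged; the point of the sketch is only that the declared types, not ctypes defaults, decide how wide the result is.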
    Signed-off-by: Adrian Hunter
    Cc: Jiri Olsa
    Cc: stable@vger.kernel.org
    Link: http://lkml.kernel.org/r/20180911114504.28516-2-adrian.hunter@intel.com
    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: Greg Kroah-Hartman

commit 54886c97839732650f0aabb90e674cad791e82c0
Author: Mike Rapoport
Date:   Sun Oct 7 11:31:51 2018 +0300

    percpu: stop leaking bitmap metadata blocks

    commit 6685b357363bfe295e3ae73665014db4aed62c58 upstream.

    The commit ca460b3c9627 ("percpu: introduce bitmap metadata blocks")
    introduced bitmap metadata blocks. These metadata blocks are allocated
    whenever a new chunk is created, but they are never freed. Fix it.

    Fixes: ca460b3c9627 ("percpu: introduce bitmap metadata blocks")
    Signed-off-by: Mike Rapoport
    Cc: stable@vger.kernel.org
    Signed-off-by: Dennis Zhou
    Signed-off-by: Greg Kroah-Hartman

commit 6c8f4babb57bc9ef0e59aa50d3157610029f01a7
Author: Mikulas Patocka
Date:   Fri Aug 17 15:19:37 2018 -0400

    mach64: detect the dot clock divider correctly on sparc

    commit 76ebebd2464c5c8a4453c98b6dbf9c95a599e810 upstream.

    On Sun Ultra 5, it happens that the dot clock is not set up properly
    for some videomodes. For example, if we set the videomode
    "r1024x768x60" in the firmware, Linux would incorrectly set a videomode
    with refresh rate 180Hz when booting (surprisingly, my LCD monitor can
    display it, although display quality is very low).

    The reason is this: Older mach64 cards set the divider in the register
    VCLK_POST_DIV. The register has four 2-bit fields (the field that is
    actually used is specified in the lowest two bits of the register
    CLOCK_CNTL). The 2 bits select divider "1, 2, 4, 8". On newer mach64
    cards, there's another bit added - the top four bits of PLL_EXT_CNTL
    extend the divider selection, so we have possible dividers "1, 2, 4, 8,
    3, 5, 6, 12". The Linux driver clears the top four bits of
    PLL_EXT_CNTL and never sets them, so it can work regardless if the card
    supports them.
    However, the sparc64 firmware may set these extended dividers during
    boot - and the mach64 driver detects incorrect dot clock in this case.

    This patch makes the driver read the additional divider bit from
    PLL_EXT_CNTL and calculate the initial refresh rate properly.

    Signed-off-by: Mikulas Patocka
    Cc: stable@vger.kernel.org
    Acked-by: David S. Miller
    Reviewed-by: Ville Syrjälä
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

commit 86717a97f9218de798c6d33e0014ab9325e18f56
Author: Paul Burton
Date:   Tue Sep 25 15:51:26 2018 -0700

    MIPS: VDSO: Always map near top of user memory

    commit ea7e0480a4b695d0aa6b3fa99bd658a003122113 upstream.

    When using the legacy mmap layout, for example triggered using ulimit -s
    unlimited, get_unmapped_area() fills memory from bottom to top starting
    from a fairly low address near TASK_UNMAPPED_BASE.

    This placement is suboptimal if the user application wishes to allocate
    large amounts of heap memory using the brk syscall. With the VDSO being
    located low in the user's virtual address space, the amount of space
    available for access using brk is limited much more than it was prior to
    the introduction of the VDSO.

    For example:

      # ulimit -s unlimited; cat /proc/self/maps
      00400000-004ec000 r-xp 00000000 08:00 71436 /usr/bin/coreutils
      004fc000-004fd000 rwxp 000ec000 08:00 71436 /usr/bin/coreutils
      004fd000-0050f000 rwxp 00000000 00:00 0
      00cc3000-00ce4000 rwxp 00000000 00:00 0 [heap]
      2ab96000-2ab98000 r--p 00000000 00:00 0 [vvar]
      2ab98000-2ab99000 r-xp 00000000 00:00 0 [vdso]
      2ab99000-2ab9d000 rwxp 00000000 00:00 0
      ...

    Resolve this by adjusting STACK_TOP to reserve space for the VDSO &
    providing an address hint to get_unmapped_area() causing it to use this
    space even when using the legacy mmap layout.

    We reserve enough space for the VDSO, plus 1MB or 256MB for 32 bit & 64
    bit systems respectively within which we randomize the VDSO base
    address.
    Previously this randomization was taken care of by the mmap base
    address randomization performed by arch_mmap_rnd(). The 1MB & 256MB
    sizes are somewhat arbitrary but chosen such that we have some
    randomization without taking up too much of the user's virtual address
    space, which is often in short supply for 32 bit systems.

    With this the VDSO is always mapped at a high address, leaving lots of
    space for statically linked programs to make use of brk:

      # ulimit -s unlimited; cat /proc/self/maps
      00400000-004ec000 r-xp 00000000 08:00 71436 /usr/bin/coreutils
      004fc000-004fd000 rwxp 000ec000 08:00 71436 /usr/bin/coreutils
      004fd000-0050f000 rwxp 00000000 00:00 0
      00c28000-00c49000 rwxp 00000000 00:00 0 [heap]
      ...
      7f67c000-7f69d000 rwxp 00000000 00:00 0 [stack]
      7f7fc000-7f7fd000 rwxp 00000000 00:00 0
      7fcf1000-7fcf3000 r--p 00000000 00:00 0 [vvar]
      7fcf3000-7fcf4000 r-xp 00000000 00:00 0 [vdso]

    Signed-off-by: Paul Burton
    Reported-by: Huacai Chen
    Fixes: ebb5e78cc634 ("MIPS: Initial implementation of a VDSO")
    Cc: Huacai Chen
    Cc: linux-mips@linux-mips.org
    Cc: stable@vger.kernel.org # v4.4+
    Signed-off-by: Greg Kroah-Hartman

commit 8676e0b4a28fa12200ec742e7d8250b11d52916f
Author: Jann Horn
Date:   Fri Oct 5 15:52:03 2018 -0700

    mm/vmstat.c: fix outdated vmstat_text

    commit 28e2c4bb99aa40f9d5f07ac130cbc4da0ea93079 upstream.

    7a9cdebdcc17 ("mm: get rid of vmacache_flush_all() entirely") removed
    the VMACACHE_FULL_FLUSHES statistics, but didn't remove the
    corresponding entry in vmstat_text. This causes an out-of-bounds access
    in vmstat_show().

    Luckily this only affects kernels with CONFIG_DEBUG_VM_VMACACHE=y,
    which is probably very rare.
    Link: http://lkml.kernel.org/r/20181001143138.95119-1-jannh@google.com
    Fixes: 7a9cdebdcc17 ("mm: get rid of vmacache_flush_all() entirely")
    Signed-off-by: Jann Horn
    Reviewed-by: Kees Cook
    Reviewed-by: Andrew Morton
    Acked-by: Michal Hocko
    Acked-by: Roman Gushchin
    Cc: Davidlohr Bueso
    Cc: Oleg Nesterov
    Cc: Christoph Lameter
    Cc: Kemi Wang
    Cc: Andy Lutomirski
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Greg Kroah-Hartman

commit 059726864271e14a6fbc5f47bb15098a7e1d1d89
Author: Amber Lin
Date:   Wed Sep 12 21:42:18 2018 -0400

    drm/amdgpu: Fix SDMA HQD destroy error on gfx_v7

    [ Upstream commit caaa4c8a6be2a275bd14f2369ee364978ff74704 ]

    A wrong register bit was examined when checking SDMA status, so it
    reports false failures. This typo only appears on gfx_v7. gfx_v8 checks
    the correct bit.

    Acked-by: Alex Deucher
    Signed-off-by: Amber Lin
    Reviewed-by: Felix Kuehling
    Signed-off-by: Felix Kuehling
    Signed-off-by: Alex Deucher
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

commit e4865b46e195e403915e6b630f187f80206b243d
Author: Vitaly Kuznetsov
Date:   Thu Aug 2 17:08:16 2018 +0200

    x86/kvm/lapic: always disable MMIO interface in x2APIC mode

    [ Upstream commit d1766202779e81d0f2a94c4650a6ba31497d369d ]

    When VMX is used with flexpriority disabled (because of no support or
    if disabled with module parameter) MMIO interface to lAPIC is still
    available in x2APIC mode while it shouldn't be (kvm-unit-tests):

      PASS: apic_disable: Local apic enabled in x2APIC mode
      PASS: apic_disable: CPUID.1H:EDX.APIC[bit 9] is set
      FAIL: apic_disable: *0xfee00030: 50014

    The issue appears because we basically do nothing while switching to
    x2APIC mode when APIC access page is not used. apic_mmio_{read,write}
    only check if lAPIC is disabled before proceeding to actual write.

    When APIC access is virtualized we correctly manipulate with VMX
    controls in vmx_set_virtual_apic_mode() and we don't get vmexits from
    memory writes in x2APIC mode so there's no issue.
    Disabling MMIO interface seems to be easy. The question is: what do we
    do with these reads and writes? If we add apic_x2apic_mode() check to
    apic_mmio_in_range() and return -EOPNOTSUPP these reads and writes will
    go to userspace. When lAPIC is in kernel, Qemu uses this interface to
    inject MSIs only (see kvm_apic_mem_write() in hw/i386/kvm/apic.c). This
    somehow works with disabled lAPIC but when we're in xAPIC mode we will
    get a real injected MSI from every write to lAPIC. Not good.

    The simplest solution seems to be to just ignore writes to the region
    and return ~0 for all reads when we're in x2APIC mode. This is what
    this patch does. However, this approach is inconsistent with what
    currently happens when flexpriority is enabled: we allocate APIC access
    page and create KVM memory region so in x2APIC modes all reads and
    writes go to this pre-allocated page which is, btw, the same for all
    vCPUs.

    Signed-off-by: Vitaly Kuznetsov
    Signed-off-by: Paolo Bonzini
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

commit 88659387b9d509ee4bb8f0b0fe9ce2ff00988b46
Author: Hans de Goede
Date:   Wed Sep 12 11:34:56 2018 +0200

    clk: x86: Stop marking clocks as CLK_IS_CRITICAL

    [ Upstream commit 648e921888ad96ea3dc922739e96716ad3225d7f ]

    Commit d31fd43c0f9a ("clk: x86: Do not gate clocks enabled by the
    firmware"), which added the code to mark clocks as CLK_IS_CRITICAL,
    causes all unclaimed PMC clocks on Cherry Trail devices to be on all
    the time, resulting in the device not being able to reach S0i3 when
    suspended.

    The reason for this commit is that on some Bay Trail / Cherry Trail
    devices the r8169 ethernet controller uses pmc_plt_clk_4. Now that the
    clk-pmc-atom driver exports an "ether_clk" alias for pmc_plt_clk_4 and
    the r8169 driver has been modified to get and enable this clock (if
    present) the marking of the clocks as CLK_IS_CRITICAL is no longer
    necessary.
    This commit removes the CLK_IS_CRITICAL marking, fixing Cherry Trail
    devices not being able to reach S0i3 greatly decreasing their battery
    drain when suspended.

    Buglink: https://bugzilla.kernel.org/show_bug.cgi?id=193891#c102
    Buglink: https://bugzilla.kernel.org/show_bug.cgi?id=196861
    Cc: Johannes Stezenbach
    Cc: Carlo Caione
    Reported-by: Johannes Stezenbach
    Reviewed-by: Andy Shevchenko
    Acked-by: Stephen Boyd
    Signed-off-by: Hans de Goede
    Signed-off-by: David S. Miller
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

commit ba54417f8d01b44907464777e5280eb2216d2d2f
Author: Hans de Goede
Date:   Wed Sep 12 11:34:54 2018 +0200

    clk: x86: add "ether_clk" alias for Bay Trail / Cherry Trail

    [ Upstream commit b1e3454d39f992e5409cd19f97782185950df6e7 ]

    Commit d31fd43c0f9a ("clk: x86: Do not gate clocks enabled by the
    firmware") causes all unclaimed PMC clocks on Cherry Trail devices to
    be on all the time, resulting in the device not being able to reach
    S0i2 or S0i3 when suspended.

    The reason for this commit is that on some Bay Trail / Cherry Trail
    devices the ethernet controller uses pmc_plt_clk_4. This commit adds an
    "ether_clk" alias, so that the relevant ethernet drivers can try to
    (optionally) use this, without needing X86 specific code / hacks, thus
    fixing ethernet on these devices without breaking S0i3 support.

    This commit uses clkdev_hw_create() to create the alias, mirroring the
    code for the already existing "mclk" alias for pmc_plt_clk_3.

    Buglink: https://bugzilla.kernel.org/show_bug.cgi?id=193891#c102
    Buglink: https://bugzilla.kernel.org/show_bug.cgi?id=196861
    Cc: Johannes Stezenbach
    Cc: Carlo Caione
    Reported-by: Johannes Stezenbach
    Acked-by: Stephen Boyd
    Reviewed-by: Andy Shevchenko
    Signed-off-by: Hans de Goede
    Signed-off-by: David S. Miller
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

commit cac34c122cf389f9537d20bada82ecf4939d9c4e
Author: Stephen Hemminger
Date:   Fri Sep 14 12:54:56 2018 -0700

    PCI: hv: support reporting serial number as slot information

    [ Upstream commit a15f2c08c70811f120d99288d81f70d7f3d104f1 ]

    The Hyper-V host API for PCI provides a unique "serial number" which
    can be used as basis for sysfs PCI slot table. This can be useful for
    cases where userspace wants to find the PCI device based on serial
    number.

    When an SR-IOV NIC is added, the host sends an attach message with
    serial number. The kernel doesn't use the serial number, but it is
    useful when doing the same thing in a userspace driver such as the
    DPDK. By having /sys/bus/pci/slots/N it provides a direct way to find
    the matching PCI device.

    There may be some cases where the serial number is not unique, such as
    when using GPUs. But the PCI slot infrastructure will handle that.

    This has a side effect which may also be useful. The common udev
    network device naming policy uses the slot information (rather than
    PCI address).

    Signed-off-by: Stephen Hemminger
    Signed-off-by: David S. Miller
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

commit 18918ed70db940b22aad40068a8e20749e5fee74
Author: Nicolas Ferre
Date:   Fri Sep 14 17:48:11 2018 +0200

    ARM: dts: at91: add new compatibility string for macb on sama5d3

    [ Upstream commit 321cc359d899a8e988f3725d87c18a628e1cc624 ]

    We need this new compatibility string as we experienced different
    behavior for this 10/100Mbits/s macb interface on this particular SoC.
    Backward compatibility is preserved as we keep the alternative strings.

    Signed-off-by: Nicolas Ferre
    Signed-off-by: David S. Miller
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

commit c77295d6fa1c93a344dea79e9990351fc9cdffab
Author: Nicolas Ferre
Date:   Fri Sep 14 17:48:10 2018 +0200

    net: macb: disable scatter-gather for macb on sama5d3

    [ Upstream commit eb4ed8e2d7fecb5f40db38e4498b9ee23cddf196 ]

    Create a new configuration for the sama5d3-macb new compatibility
    string. This configuration disables scatter-gather because we
    experienced lock down of the macb interface of this particular SoC
    under very high load.

    Signed-off-by: Nicolas Ferre
    Signed-off-by: David S. Miller
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

commit 3265bda5bd9f1891f29c741a5fae3db73a882fc7
Author: Jongsung Kim
Date:   Thu Sep 13 18:32:21 2018 +0900

    stmmac: fix valid numbers of unicast filter entries

    [ Upstream commit edf2ef7242805e53ec2e0841db26e06d8bc7da70 ]

    Synopsys DWC Ethernet MAC can be configured to have 1..32, 64, or 128
    unicast filter entries. (Table 7-8 MAC Address Registers from databook)
    Fix dwmac1000_validate_ucast_entries() to accept values between 1 and
    32 in addition.

    Signed-off-by: Jongsung Kim
    Signed-off-by: David S. Miller
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

commit 1826e55625164ab51a21e46eacac18b43899b65f
Author: Stephen Hemminger
Date:   Thu Sep 13 08:03:43 2018 -0700

    hv_netvsc: fix schedule in RCU context

    [ Upstream commit 018349d70f28a78d5343b3660cb66e1667005f8a ]

    When netvsc device is removed it can call reschedule in RCU context.
    This happens because canceling the subchannel setup work could (in
    theory) cause a reschedule when manipulating the timer.

    To reproduce, run with lockdep enabled kernel and unbind a network
    device from hv_netvsc (via sysfs).

    [ 160.682011] WARNING: suspicious RCU usage
    [ 160.707466] 4.19.0-rc3-uio+ #2 Not tainted
    [ 160.709937] -----------------------------
    [ 160.712352] ./include/linux/rcupdate.h:302 Illegal context switch in RCU read-side critical section!
    [ 160.723691]
    [ 160.723691] other info that might help us debug this:
    [ 160.723691]
    [ 160.730955]
    [ 160.730955] rcu_scheduler_active = 2, debug_locks = 1
    [ 160.762813] 5 locks held by rebind-eth.sh/1812:
    [ 160.766851] #0: 000000008befa37a (sb_writers#6){.+.+}, at: vfs_write+0x184/0x1b0
    [ 160.773416] #1: 00000000b097f236 (&of->mutex){+.+.}, at: kernfs_fop_write+0xe2/0x1a0
    [ 160.783766] #2: 0000000041ee6889 (kn->count#3){++++}, at: kernfs_fop_write+0xeb/0x1a0
    [ 160.787465] #3: 0000000056d92a74 (&dev->mutex){....}, at: device_release_driver_internal+0x39/0x250
    [ 160.816987] #4: 0000000030f6031e (rcu_read_lock){....}, at: netvsc_remove+0x1e/0x250 [hv_netvsc]
    [ 160.828629]
    [ 160.828629] stack backtrace:
    [ 160.831966] CPU: 1 PID: 1812 Comm: rebind-eth.sh Not tainted 4.19.0-rc3-uio+ #2
    [ 160.832952] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v1.0 11/26/2012
    [ 160.832952] Call Trace:
    [ 160.832952] dump_stack+0x85/0xcb
    [ 160.832952] ___might_sleep+0x1a3/0x240
    [ 160.832952] __flush_work+0x57/0x2e0
    [ 160.832952] ? __mutex_lock+0x83/0x990
    [ 160.832952] ? __kernfs_remove+0x24f/0x2e0
    [ 160.832952] ? __kernfs_remove+0x1b2/0x2e0
    [ 160.832952] ? mark_held_locks+0x50/0x80
    [ 160.832952] ? get_work_pool+0x90/0x90
    [ 160.832952] __cancel_work_timer+0x13c/0x1e0
    [ 160.832952] ? netvsc_remove+0x1e/0x250 [hv_netvsc]
    [ 160.832952] ? __lock_is_held+0x55/0x90
    [ 160.832952] netvsc_remove+0x9a/0x250 [hv_netvsc]
    [ 160.832952] vmbus_remove+0x26/0x30
    [ 160.832952] device_release_driver_internal+0x18a/0x250
    [ 160.832952] unbind_store+0xb4/0x180
    [ 160.832952] kernfs_fop_write+0x113/0x1a0
    [ 160.832952] __vfs_write+0x36/0x1a0
    [ 160.832952] ? rcu_read_lock_sched_held+0x6b/0x80
    [ 160.832952] ? rcu_sync_lockdep_assert+0x2e/0x60
    [ 160.832952] ? __sb_start_write+0x141/0x1a0
    [ 160.832952] ? vfs_write+0x184/0x1b0
    [ 160.832952] vfs_write+0xbe/0x1b0
    [ 160.832952] ksys_write+0x55/0xc0
    [ 160.832952] do_syscall_64+0x60/0x1b0
    [ 160.832952] entry_SYSCALL_64_after_hwframe+0x49/0xbe
    [ 160.832952] RIP: 0033:0x7fe48f4c8154

    Resolve this by getting RTNL earlier. This is safe because the
    subchannel work queue does trylock on RTNL and will detect the race.

    Fixes: 7b2ee50c0cd5 ("hv_netvsc: common detach logic")
    Signed-off-by: Stephen Hemminger
    Reviewed-by: Haiyang Zhang
    Signed-off-by: David S. Miller
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

commit 37ca1cc8d4c021c8fe79e070ca7bc2fd26ed1530
Author: Yu Zhao
Date:   Tue Sep 11 15:15:16 2018 -0600

    sound: don't call skl_init_chip() to reset intel skl soc

    [ Upstream commit 75383f8d39d4c0fb96083dd460b7b139fbdac492 ]

    Internally, skl_init_chip() calls snd_hdac_bus_init_chip() which
    1) sets bus->chip_init to prevent multiple entrances before device
    is stopped; 2) enables interrupt.

    We shouldn't use it for the purpose of resetting device only because
    1) when we really want to initialize device, we won't be able to do
    so; 2) we are not ready to handle interrupts yet, and kernel crashes
    when interrupt comes in.

    Rename azx_reset() to snd_hdac_bus_reset_link(), and use it to reset
    device properly.

    Fixes: 60767abcea3d ("ASoC: Intel: Skylake: Reset the controller in probe")
    Reviewed-by: Takashi Iwai
    Signed-off-by: Yu Zhao
    Signed-off-by: Mark Brown
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

commit 2af2b70c107b8a2e75e3c3351f3777e8cf26229c
Author: Yu Zhao
Date:   Tue Sep 11 15:14:04 2018 -0600

    sound: enable interrupt after dma buffer initialization

    [ Upstream commit b61749a89f826eb61fc59794d9e4697bd246eb61 ]

    In snd_hdac_bus_init_chip(), we enable interrupt before
    snd_hdac_bus_init_cmd_io() initializing dma buffers. If irq has
    been acquired and irq handler uses the dma buffer, kernel may crash
    when interrupt comes in.

    Fix the problem by postponing enabling irq after dma buffer
    initialization.
    And warn once on null dma buffer pointer during the initialization.

    Reviewed-by: Takashi Iwai
    Signed-off-by: Yu Zhao
    Signed-off-by: Mark Brown
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

commit a5733703e38c9a1b79ae6f086d3b427ae8b49e93
Author: Dan Carpenter
Date:   Sat Sep 8 11:42:27 2018 +0300

    scsi: qla2xxx: Fix an endian bug in fcpcmd_is_corrupted()

    [ Upstream commit cbe3fd39d223f14b1c60c80fe9347a3dd08c2edb ]

    We should first do the le16_to_cpu endian conversion and then apply the
    FCP_CMD_LENGTH_MASK mask.

    Fixes: 5f35509db179 ("qla2xxx: Terminate exchange if corrupted")
    Signed-off-by: Dan Carpenter
    Acked-by: Quinn Tran
    Acked-by: Himanshu Madhani
    Signed-off-by: Martin K. Petersen
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

commit 254cc00e53d7621d1d3b138db989bf47879e9ba9
Author: Laura Abbott
Date:   Tue Sep 4 11:47:40 2018 -0700

    scsi: iscsi: target: Don't use stack buffer for scatterlist

    [ Upstream commit 679fcae46c8b2352bba3485d521da070cfbe68e6 ]

    Fedora got a bug report of a crash with iSCSI:

    kernel BUG at include/linux/scatterlist.h:143!
    ...
    RIP: 0010:iscsit_do_crypto_hash_buf+0x154/0x180 [iscsi_target_mod]
    ...
    Call Trace:
     ? iscsi_target_tx_thread+0x200/0x200 [iscsi_target_mod]
     iscsit_get_rx_pdu+0x4cd/0xa90 [iscsi_target_mod]
     ? native_sched_clock+0x3e/0xa0
     ? iscsi_target_tx_thread+0x200/0x200 [iscsi_target_mod]
     iscsi_target_rx_thread+0x81/0xf0 [iscsi_target_mod]
     kthread+0x120/0x140
     ? kthread_create_worker_on_cpu+0x70/0x70
     ret_from_fork+0x3a/0x50

    This is a BUG_ON for using a stack buffer with a scatterlist. There
    are two cases that trigger this bug. Switch to using a dynamically
    allocated buffer for one case and do not assign a NULL buffer in
    another case.

    Signed-off-by: Laura Abbott
    Reviewed-by: Mike Christie
    Signed-off-by: Martin K. Petersen
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

commit 5d53f0d897c37b41a5b8cabc2ebae3bf4ce12eae
Author: Tony Lindgren
Date:   Wed Apr 25 07:29:22 2018 -0700

    mfd: omap-usb-host: Fix dts probe of children

    [ Upstream commit 10492ee8ed9188d6d420e1f79b2b9bdbc0624e65 ]

    It currently only works if the parent bus uses "simple-bus". We
    currently try to probe children with non-existing compatible values.
    And we're missing .probe.

    I noticed this while testing devices configured to probe using ti-sysc
    interconnect target module driver. For that we also may want to rebind
    the driver, so let's remove __init and __exit.

    Signed-off-by: Tony Lindgren
    Acked-by: Roger Quadros
    Signed-off-by: Lee Jones
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

commit e3583d7b1bd9869c371aa1164a085e0966a86391
Author: Hermes Zhang
Date:   Tue Aug 28 09:48:30 2018 +0800

    Bluetooth: hci_ldisc: Free rw_semaphore on close

    [ Upstream commit e6a57d22f787e73635ce0d29eef0abb77928b3e9 ]

    The percpu_rw_semaphore is not currently freed, and this leads to a
    crash when the stale rcu callback is invoked. DEBUG_OBJECTS detects
    this.
     ODEBUG: free active (active state 1) object type: rcu_head hint: (null)
     ------------[ cut here ]------------
     WARNING: CPU: 1 PID: 2024 at debug_print_object+0xac/0xc8
     PC is at debug_print_object+0xac/0xc8
     LR is at debug_print_object+0xac/0xc8
     Call trace:
     [] debug_print_object+0xac/0xc8
     [] debug_check_no_obj_freed+0x1e8/0x228
     [] kfree+0x1cc/0x250
     [] hci_uart_tty_close+0x54/0x108
     [] tty_ldisc_close.isra.1+0x40/0x58
     [] tty_ldisc_kill+0x1c/0x40
     [] tty_ldisc_release+0x94/0x170
     [] tty_release_struct+0x1c/0x58
     [] tty_release+0x3b0/0x490
     [] __fput+0x88/0x1d0
     [] ____fput+0xc/0x18
     [] task_work_run+0x9c/0xc0
     [] do_exit+0x24c/0x8a0
     [] do_group_exit+0x38/0xa0
     [] __wake_up_parent+0x0/0x28
     [] el0_svc_naked+0x34/0x38
     ---[ end trace bfe08cbd89098cdf ]---

    Signed-off-by: Hermes Zhang
    Signed-off-by: Marcel Holtmann
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

commit bac5611371559cd8b83fd6bdcb869b843a342500
Author: Kuninori Morimoto
Date:   Thu Sep 6 03:21:47 2018 +0000

    ASoC: rsnd: don't fallback to PIO mode when -EPROBE_DEFER

    [ Upstream commit 6c92d5a2744e27619a8fcc9d74b91ee9f1cdebd1 ]

    Current rsnd driver will fallback to PIO mode if it can't get DMA
    handler. But, DMA might return -EPROBE_DEFER when probe timing. This
    driver always fallback to PIO mode especially from commit
    ac6bbf0cdf4206c ("iommu: Remove IOMMU_OF_DECLARE") because of this
    reason.

    The DMA driver will be probed later, but sound driver might be probed
    as PIO mode in such case. This patch fixup this issue. Then,
    -EPROBE_DEFER is not error. Thus, let's don't indicate error message in
    such case. And it needs to call rsnd_adg_remove() individually if probe
    failed, because it registers clk which should be unregister.

    Maybe PIO fallback feature itself is not needed, but let's keep it so
    far.
    Signed-off-by: Kuninori Morimoto
    Signed-off-by: Mark Brown
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

commit ad9ad950a37bd6038ff61244d1a3c7c471399335
Author: Kuninori Morimoto
Date:   Thu Sep 6 03:21:33 2018 +0000

    ASoC: rsnd: adg: care clock-frequency size

    [ Upstream commit 69235ccf491d2e26aefd465c0d3ccd1e3b2a9a9c ]

    ADG has buffer over flow bug if DT has more than 3 clock-frequency.
    This patch fixup this issue, and uses first 2 values.

        clock-frequency = ; /* this is OK */
        clock-frequency = ; /* this is NG */

    Signed-off-by: Kuninori Morimoto
    Tested-by: Hiroyuki Yokoyama
    Signed-off-by: Mark Brown
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

commit 6d52f3e1e729116b686d61481f85ff9c204c77a3
Author: Lei Yang
Date:   Wed Sep 5 17:57:15 2018 +0800

    selftests: memory-hotplug: add required configs

    [ Upstream commit 4d85af102a66ee6aeefa596f273169e77fb2b48e ]

    Add CONFIG_MEMORY_HOTREMOVE=y to the config. Without this config,
    /sys/devices/system/memory/memory*/removable always returns 0, and I
    end up getting an early skip during the test.

    Signed-off-by: Lei Yang
    Signed-off-by: Shuah Khan (Samsung OSG)
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

commit e121efd796c9f798a5f6ba4991c628bd23719071
Author: Lei Yang
Date:   Wed Sep 5 11:14:49 2018 +0800

    selftests/efivarfs: add required kernel configs

    [ Upstream commit 53cf59d6c0ad3edc4f4449098706a8f8986258b6 ]

    add config file

    Signed-off-by: Lei Yang
    Signed-off-by: Shuah Khan (Samsung OSG)
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

commit c5f7b0d2ce9e523ff334b9ffee2e1145ac3532b9
Author: Danny Smith
Date:   Thu Aug 23 10:26:20 2018 +0200

    ASoC: sigmadsp: safeload should not have lower byte limit

    [ Upstream commit 5ea752c6efdf5aa8a57aed816d453a8f479f1b0a ]

    Fixed range in safeload conditional to allow safeload to up to 20
    bytes, without a lower limit.
    Signed-off-by: Danny Smith
    Acked-by: Lars-Peter Clausen
    Signed-off-by: Mark Brown
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

commit c08a99325a567d033b3a4549d8637191a135a827
Author: Pierre-Louis Bossart
Date:   Wed Aug 22 22:49:36 2018 -0500

    ASoC: wm8804: Add ACPI support

    [ Upstream commit 960cdd50ca9fdfeb82c2757107bcb7f93c8d7d41 ]

    HID made of either Wolfson/CirrusLogic PCI ID + 8804 identifier.

    This helps enumerate the HifiBerry Digi+ HAT boards on the Up2
    platform.

    The scripts at https://github.com/thesofproject/acpi-scripts can be
    used to add the ACPI initrd overlays.

    Signed-off-by: Pierre-Louis Bossart
    Acked-by: Charles Keepax
    Signed-off-by: Mark Brown
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

commit a15fac93a3e6019dac62b5c38700decd47849bf9
Author: Oder Chiou
Date:   Wed Aug 15 14:47:49 2018 +0800

    ASoC: rt5514: Fix the issue of the delay volume applied again

    [ Upstream commit 6f0a256253f48095ba2e5bcdfbed41f21643c105 ]

    After our evaluation, we need to modify the default values to make sure
    the volume applied immediately.

    Signed-off-by: Oder Chiou
    Signed-off-by: Mark Brown
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

commit c5df58138946fe24d3cb0c99bb6ce04130c657b7
Author: Eric Dumazet
Date:   Tue Oct 2 12:35:05 2018 -0700

    inet: make sure to grab rcu_read_lock before using ireq->ireq_opt

    [ Upstream commit 2ab2ddd301a22ca3c5f0b743593e4ad2953dfa53 ]

    Timer handlers do not imply rcu_read_lock(), so my recent fix
    triggered a LOCKDEP warning when SYNACK is retransmitted.

    Let's add rcu_read_lock()/rcu_read_unlock() pairs around ireq->ireq_opt
    usages instead of guessing what is done by callers, since it is
    not worth the pain.

    Get rid of ireq_opt_deref() helper since it hides the logic
    without real benefit, since it is now a standard rcu_dereference().

    Fixes: 1ad98e9d1bdf ("tcp/dccp: fix lockdep issue when SYN is backlogged")
    Signed-off-by: Eric Dumazet
    Reported-by: Willem de Bruijn
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

commit 17af5475aef350b78875646b2795381f452de756
Author: Eric Dumazet
Date:   Mon Oct 1 15:02:26 2018 -0700

    tcp/dccp: fix lockdep issue when SYN is backlogged

    [ Upstream commit 1ad98e9d1bdf4724c0a8532fabd84bf3c457c2bc ]

    In normal SYN processing, packets are handled without listener
    lock and in RCU protected ingress path.

    But syzkaller is known to be able to trick us and SYN
    packets might be processed in process context, after being
    queued into socket backlog.

    In commit 06f877d613be ("tcp/dccp: fix other lockdep splats
    accessing ireq_opt") I made a very stupid fix, that happened
    to work mostly because of the regular path being RCU protected.

    Really the thing protecting ireq->ireq_opt is RCU read lock,
    and the pseudo request refcnt is not relevant.

    This patch extends what I did in commit 449809a66c1d ("tcp/dccp:
    block BH for SYN processing") by adding an extra rcu_read_{lock|unlock}
    pair in the paths that might be taken when processing SYN from
    socket backlog (thus possibly in process context)

    Fixes: 06f877d613be ("tcp/dccp: fix other lockdep splats accessing ireq_opt")
    Signed-off-by: Eric Dumazet
    Reported-by: syzbot
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

commit 4b7b26024f52a5cdeca20b56657f90d74a0a1995
Author: Maciej Żenczykowski
Date:   Sat Sep 22 01:34:01 2018 -0700

    net-ethtool: ETHTOOL_GUFO did not and should not require CAP_NET_ADMIN

    [ Upstream commit 474ff2600889e16280dbc6ada8bfecb216169a70 ]

    So it should not fail with EPERM even though it is no longer
    implemented...
This is a fix for: (userns)$ egrep ^Cap /proc/self/status CapInh: 0000003fffffffff CapPrm: 0000003fffffffff CapEff: 0000003fffffffff CapBnd: 0000003fffffffff CapAmb: 0000003fffffffff (userns)$ tcpdump -i usb_rndis0 tcpdump: WARNING: usb_rndis0: SIOCETHTOOL(ETHTOOL_GUFO) ioctl failed: Operation not permitted Warning: Kernel filter failed: Bad file descriptor tcpdump: can't remove kernel filter: Bad file descriptor With this change it returns EOPNOTSUPP instead of EPERM. See also https://github.com/the-tcpdump-group/libpcap/issues/689 Fixes: 08a00fea6de2 "net: Remove references to NETIF_F_UFO from ethtool." Cc: David S. Miller Signed-off-by: Maciej Żenczykowski Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 98c77f2eef29ffe9288edd84f8b5bc3fe2a3a614 Author: Davide Caratti Date: Wed Sep 19 19:01:37 2018 +0200 bnxt_en: don't try to offload VLAN 'modify' action [ Upstream commit 8c6ec3613e7b0aade20a3196169c0bab32ed3e3f ] bnxt offload code currently supports only 'push' and 'pop' operation: let .ndo_setup_tc() return -EOPNOTSUPP if VLAN 'modify' action is configured. Fixes: 2ae7408fedfe ("bnxt_en: bnxt: add TC flower filter offload support") Signed-off-by: Davide Caratti Acked-by: Sathya Perla Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit eb79c31aac15aad1ff700f1ac6c8223946b8553e Author: Jakub Kicinski Date: Tue Oct 2 10:10:14 2018 -0700 nfp: avoid soft lockups under control message storm [ Upstream commit ff58e2df62ce29d0552278c290ae494b30fe0c6f ] When FW floods the driver with control messages try to exit the cmsg processing loop every now and then to avoid soft lockups. Cmsg processing is generally very lightweight so 512 seems like a reasonable budget, which should not be exceeded under normal conditions. Fixes: 77ece8d5f196 ("nfp: add control vNIC datapath") Signed-off-by: Jakub Kicinski Reviewed-by: Simon Horman Tested-by: Pieter Jansen van Vuuren Signed-off-by: David S. 
Miller Signed-off-by: Greg Kroah-Hartman commit f578e5b34c388a9281cfc613ed59aa9c414cc83f Author: Mahesh Bandewar Date: Tue Oct 2 12:14:34 2018 -0700 bonding: fix warning message [ Upstream commit 0f3b914c9cfcd7bbedd445dc4ac5dd999fa213c2 ] RX queue config for bonding master could be different from its slave device(s). With the commit 6a9e461f6fe4 ("bonding: pass link-local packets to bonding master also."), the packet is reinjected into stack with skb->dev as bonding master. This potentially triggers the message: "bondX received packet on queue Y, but number of RX queues is Z" whenever the queue that packet is received on is higher than the numrxqueues on bonding master (Y > Z). Fixes: 6a9e461f6fe4 ("bonding: pass link-local packets to bonding master also.") Reported-by: John Sperbeck Signed-off-by: Eric Dumazet Signed-off-by: Mahesh Bandewar Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 90a3d8afe1f41ab7034d91c6550b63485ae5ae1b Author: Mahesh Bandewar Date: Mon Sep 24 14:39:42 2018 -0700 bonding: pass link-local packets to bonding master also. [ Upstream commit 6a9e461f6fe4434e6172304b69774daff9a3ac4c ] Commit b89f04c61efe ("bonding: deliver link-local packets with skb->dev set to link that packets arrived on") changed the behavior of how link-local-multicast packets are processed. The change in the behavior broke some legacy use cases where these packets are expected to arrive on bonding master device also. This patch passes the packet to the stack with the link it arrived on as well as passes to the bonding-master device to preserve the legacy use case. Fixes: b89f04c61efe ("bonding: deliver link-local packets with skb->dev set to link that packets arrived on") Reported-by: Michal Soltys Signed-off-by: Mahesh Bandewar Signed-off-by: David S. 
Miller Signed-off-by: Greg Kroah-Hartman commit 83eb2fdd0483ea9195fa4a46532e57b2df1b9974 Author: Eran Ben Elisha Date: Sun Sep 16 14:45:27 2018 +0300 net/mlx5: E-Switch, Fix out of bound access when setting vport rate [ Upstream commit 11aa5800ed66ed0415b7509f02881c76417d212a ] The code that deals with eswitch vport bw guarantee was going beyond the eswitch vport array limit, fix that. This was pointed out by the kernel address sanitizer (KASAN). The error from KASAN log: [2018-09-15 15:04:45] BUG: KASAN: slab-out-of-bounds in mlx5_eswitch_set_vport_rate+0x8c1/0xae0 [mlx5_core] Fixes: c9497c98901c ("net/mlx5: Add support for setting VF min rate") Signed-off-by: Eran Ben Elisha Reviewed-by: Or Gerlitz Signed-off-by: Saeed Mahameed Signed-off-by: Greg Kroah-Hartman commit 7aa339e9099496a01183daf9b74b77c7cc26ed03 Author: Friedemann Gerold Date: Sat Sep 15 18:03:39 2018 +0300 net: aquantia: memory corruption on jumbo frames [ Upstream commit d26ed6b0e5e23190d43ab34bc69cbecdc464a2cf ] This patch fixes skb_shared area, which will be corrupted upon reception of 4K jumbo packets. Originally build_skb usage purpose was to reuse page for skb to eliminate needs of extra fragments. But that logic does not take into account that skb_shared_info should be reserved at the end of skb data area. In case packet data consumes all the page (4K), skb_shinfo location overflows the page. As a consequence, __build_skb zeroed shinfo data above the allocated page, corrupting next page. The issue is rarely seen in real life because jumbo are normally larger than 4K and that causes another code path to trigger. 
But it is 100% reproducible with a simple scapy packet, like: sendp(IP(dst="192.168.100.3") / TCP(dport=443) \ / Raw(RandString(size=(4096-40))), iface="enp1s0") Fixes: 018423e90bee ("net: ethernet: aquantia: Add ring support code") Reported-by: Friedemann Gerold Reported-by: Michael Rauch Signed-off-by: Friedemann Gerold Tested-by: Nikita Danilov Signed-off-by: Igor Russkikh Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 7ba8867fb3a7fb6ec0b919eb69e539b4bf03b712 Author: Jianbo Liu Date: Sat Aug 25 03:29:58 2018 +0000 net/mlx5e: Set vlan masks for all offloaded TC rules [ Upstream commit cee26487620bc9bc3c7db21b6984d91f7bae12ae ] In flow steering, if asked to, the hardware matches on the first ethertype which is not vlan. It's possible to set a rule as follows, which is meant to match on an untagged packet, but will match on a vlan packet: tc filter add dev eth0 parent ffff: protocol ip flower ... To avoid this for packets with a single tag, we set vlan masks to tell hardware to check the tags for every matched packet. Fixes: 095b6cfd69ce ('net/mlx5e: Add TC vlan match parsing') Signed-off-by: Jianbo Liu Reviewed-by: Or Gerlitz Signed-off-by: Saeed Mahameed Signed-off-by: Greg Kroah-Hartman commit 431a4fee711414dbd14389d62f38adb03dc39e49 Author: Florian Fainelli Date: Tue Oct 9 16:48:57 2018 -0700 net: dsa: bcm_sf2: Fix unbind ordering [ Upstream commit bf3b452b7af787b8bf27de6490dc4eedf6f97599 ] The order in which we release resources is unfortunately leading to bus errors while dismantling the port. This is because we set priv->wol_ports_mask to 0 to tell bcm_sf2_sw_suspend() that it is now permissible to clock gate the switch. Later on, when dsa_slave_destroy() comes in from dsa_unregister_switch() and calls dsa_switch_ops::port_disable, we perform the same dismantling again, and this time we hit registers that are clock gated.
Make sure that dsa_unregister_switch() is the first thing that happens, which takes care of releasing all user visible resources, then proceed with clock gating hardware. We still need to set priv->wol_ports_mask to 0 to make sure that an enabled port properly gets disabled in case it was previously used as part of Wake-on-LAN. Fixes: d9338023fb8e ("net: dsa: bcm_sf2: Make it a real platform device driver") Signed-off-by: Florian Fainelli Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 5150140b4ea7ac8401deaebec03e619988a3804b Author: Jianfeng Tan Date: Sat Sep 29 15:41:27 2018 +0000 net/packet: fix packet drop as of virtio gso [ Upstream commit 9d2f67e43b73e8af7438be219b66a5de0cfa8bd9 ] When we use a raw socket as the vhost backend, a packet from virtio with gso offloading information cannot be sent out: it fails the later validation in the xmit path, as we did not set the correct skb->protocol, which is further used for looking up the gso function. To fix this, we set this field according to the virtio hdr information. Fixes: e858fae2b0b8f4 ("virtio_net: use common code for virtio_net_hdr and skb GSO conversion") Signed-off-by: Jianfeng Tan Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 5e7bb38dc696b672b75de0539a75e776f023fe9b Author: Jose Abreu Date: Mon Sep 17 09:22:57 2018 +0100 net: stmmac: Fixup the tail addr setting in xmit path [ Upstream commit 0431100b3d82c509729ece1ab22ada2484e209c1 ] Currently we are always setting the tail address of the descriptor list to the end of the pre-allocated list. According to the databook this is not correct. The tail address should point to the last available descriptor + 1, which means we have to update the tail address every time we call the xmit function. This should have no impact in older versions of the MAC, but in newer versions there are some DMA features which allow the IP to fetch descriptors in advance and in a non-sequential order, so it's critical that we set the tail address correctly.
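The tail address rule described above can be illustrated with a little arithmetic. This is only a sketch, not the driver's code: the function names, the ring base address, the ring length and the descriptor size are all hypothetical, and next_free_index is assumed to be the index just past the last descriptor prepared for transmission.

```python
def fixed_tail_addr(ring_base: int, ring_len: int, desc_size: int) -> int:
    """Old behaviour as described: always point at the end of the whole ring."""
    return ring_base + ring_len * desc_size


def per_xmit_tail_addr(ring_base: int, next_free_index: int, desc_size: int) -> int:
    """New behaviour as described: point one past the last prepared descriptor.

    next_free_index is a hypothetical counter that the xmit path advances
    each time it fills a descriptor.
    """
    return ring_base + next_free_index * desc_size


# Hypothetical ring: 256 descriptors of 16 bytes starting at 0x1000.
RING_BASE, RING_LEN, DESC_SIZE = 0x1000, 256, 16

# After three descriptors were queued, the tail is three slots in,
# not at the end of the pre-allocated list.
tail_after_three = per_xmit_tail_addr(RING_BASE, 3, DESC_SIZE)
tail_old = fixed_tail_addr(RING_BASE, RING_LEN, DESC_SIZE)
```

With the fixed tail, a DMA engine that prefetches descriptors would be told that every slot up to the end of the ring is valid, which is the situation the commit message calls incorrect.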
Signed-off-by: Jose Abreu Fixes: f748be531d70 ("stmmac: support new GMAC4") Cc: David S. Miller Cc: Joao Pinto Cc: Giuseppe Cavallaro Cc: Alexandre Torgue Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 7976e6b70ecf1c9d592d597a7e9292aa8ad9e3fc Author: Jiri Kosina Date: Thu Oct 4 13:37:32 2018 +0200 udp: Unbreak modules that rely on external __skb_recv_udp() availability [ Upstream commit 7e823644b60555f70f241274b8d0120dd919269a ] Commit 2276f58ac589 ("udp: use a separate rx queue for packet reception") turned static inline __skb_recv_udp() from being a trivial helper around __skb_recv_datagram() into a UDP specific implementation, making it EXPORT_SYMBOL_GPL() at the same time. There are external modules that got broken by __skb_recv_udp() not being visible to them. Let's unbreak them by making __skb_recv_udp EXPORT_SYMBOL(). Rationale (one of those) why this is actually "technically correct" thing to do: __skb_recv_udp() used to be an inline wrapper around __skb_recv_datagram(), which itself (still, and correctly so, I believe) is EXPORT_SYMBOL(). Cc: Paolo Abeni Cc: Eric Dumazet Fixes: 2276f58ac589 ("udp: use a separate rx queue for packet reception") Signed-off-by: Jiri Kosina Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 49984ca4e60ef5a707fd73b9e4bf34c539bbe2bf Author: Parthasarathy Bhuvaragan Date: Tue Sep 25 18:21:58 2018 +0200 tipc: fix flow control accounting for implicit connect [ Upstream commit 92ef12b32feab8f277b69e9fb89ede2796777f4d ] In the case of an implicit connect message with data > 1K, the flow control accounting is incorrect. At this state, the socket does not know the peer node's capability and falls back to legacy flow control by returning 1; however, the receiver of this message will perform the new block accounting. This leads to a slack and eventually traffic disturbance.
In this commit, we perform tipc_node_get_capabilities() at implicit connect and perform accounting based on the peer's capability. Signed-off-by: Parthasarathy Bhuvaragan Signed-off-by: Jon Maloy Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 66c1b9cfa07d6f590c118ff8b548578b5f059187 Author: Ido Schimmel Date: Mon Oct 1 12:21:59 2018 +0300 team: Forbid enslaving team device to itself [ Upstream commit 471b83bd8bbe4e89743683ef8ecb78f7029d8288 ] team's ndo_add_slave() acquires 'team->lock' and later tries to open the newly enslaved device via dev_open(). This emits a 'NETDEV_UP' event that causes the VLAN driver to add VLAN 0 on the team device. team's ndo_vlan_rx_add_vid() will also try to acquire 'team->lock' and deadlock. Fix this by checking early at the enslavement function that a team device is not being enslaved to itself. A similar check was added to the bond driver in commit 09a89c219baf ("bonding: disallow enslaving a bond to itself"). WARNING: possible recursive locking detected 4.18.0-rc7+ #176 Not tainted -------------------------------------------- syz-executor4/6391 is trying to acquire lock: (____ptrval____) (&team->lock){+.+.}, at: team_vlan_rx_add_vid+0x3b/0x1e0 drivers/net/team/team.c:1868 but task is already holding lock: (____ptrval____) (&team->lock){+.+.}, at: team_add_slave+0xdb/0x1c30 drivers/net/team/team.c:1947 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(&team->lock); lock(&team->lock); *** DEADLOCK *** May be due to missing lock nesting notation 2 locks held by syz-executor4/6391: #0: (____ptrval____) (rtnl_mutex){+.+.}, at: rtnl_lock net/core/rtnetlink.c:77 [inline] #0: (____ptrval____) (rtnl_mutex){+.+.}, at: rtnetlink_rcv_msg+0x412/0xc30 net/core/rtnetlink.c:4662 #1: (____ptrval____) (&team->lock){+.+.}, at: team_add_slave+0xdb/0x1c30 drivers/net/team/team.c:1947 stack backtrace: CPU: 1 PID: 6391 Comm: syz-executor4 Not tainted 4.18.0-rc7+ #176 Hardware name: Google 
Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x1c9/0x2b4 lib/dump_stack.c:113 print_deadlock_bug kernel/locking/lockdep.c:1765 [inline] check_deadlock kernel/locking/lockdep.c:1809 [inline] validate_chain kernel/locking/lockdep.c:2405 [inline] __lock_acquire.cold.64+0x1fb/0x486 kernel/locking/lockdep.c:3435 lock_acquire+0x1e4/0x540 kernel/locking/lockdep.c:3924 __mutex_lock_common kernel/locking/mutex.c:757 [inline] __mutex_lock+0x176/0x1820 kernel/locking/mutex.c:894 mutex_lock_nested+0x16/0x20 kernel/locking/mutex.c:909 team_vlan_rx_add_vid+0x3b/0x1e0 drivers/net/team/team.c:1868 vlan_add_rx_filter_info+0x14a/0x1d0 net/8021q/vlan_core.c:210 __vlan_vid_add net/8021q/vlan_core.c:278 [inline] vlan_vid_add+0x63e/0x9d0 net/8021q/vlan_core.c:308 vlan_device_event.cold.12+0x2a/0x2f net/8021q/vlan.c:381 notifier_call_chain+0x180/0x390 kernel/notifier.c:93 __raw_notifier_call_chain kernel/notifier.c:394 [inline] raw_notifier_call_chain+0x2d/0x40 kernel/notifier.c:401 call_netdevice_notifiers_info+0x3f/0x90 net/core/dev.c:1735 call_netdevice_notifiers net/core/dev.c:1753 [inline] dev_open+0x173/0x1b0 net/core/dev.c:1433 team_port_add drivers/net/team/team.c:1219 [inline] team_add_slave+0xa8b/0x1c30 drivers/net/team/team.c:1948 do_set_master+0x1c9/0x220 net/core/rtnetlink.c:2248 do_setlink+0xba4/0x3e10 net/core/rtnetlink.c:2382 rtnl_setlink+0x2a9/0x400 net/core/rtnetlink.c:2636 rtnetlink_rcv_msg+0x46e/0xc30 net/core/rtnetlink.c:4665 netlink_rcv_skb+0x172/0x440 net/netlink/af_netlink.c:2455 rtnetlink_rcv+0x1c/0x20 net/core/rtnetlink.c:4683 netlink_unicast_kernel net/netlink/af_netlink.c:1317 [inline] netlink_unicast+0x5a0/0x760 net/netlink/af_netlink.c:1343 netlink_sendmsg+0xa18/0xfd0 net/netlink/af_netlink.c:1908 sock_sendmsg_nosec net/socket.c:642 [inline] sock_sendmsg+0xd5/0x120 net/socket.c:652 ___sys_sendmsg+0x7fd/0x930 net/socket.c:2126 __sys_sendmsg+0x11d/0x290 net/socket.c:2164 
__do_sys_sendmsg net/socket.c:2173 [inline] __se_sys_sendmsg net/socket.c:2171 [inline] __x64_sys_sendmsg+0x78/0xb0 net/socket.c:2171 do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x456b29 Code: fd b4 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 cb b4 fb ff c3 66 2e 0f 1f 84 00 00 00 00 RSP: 002b:00007f9706bf8c78 EFLAGS: 00000246 ORIG_RAX: 000000000000002e RAX: ffffffffffffffda RBX: 00007f9706bf96d4 RCX: 0000000000456b29 RDX: 0000000000000000 RSI: 0000000020000240 RDI: 0000000000000004 RBP: 00000000009300a0 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff R13: 00000000004d3548 R14: 00000000004c8227 R15: 0000000000000000 Fixes: 87002b03baab ("net: introduce vlan_vid_[add/del] and use them instead of direct [add/kill]_vid ndo calls") Signed-off-by: Ido Schimmel Reported-and-tested-by: syzbot+bd051aba086537515cdb@syzkaller.appspotmail.com Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit d63d3995d7db668a07765fd92cbb66c06968a5b3 Author: Xin Long Date: Thu Sep 20 17:27:28 2018 +0800 sctp: update dst pmtu with the correct daddr [ Upstream commit d7ab5cdce54da631f0c8c11e506c974536a3581e ] When processing pmtu update from an icmp packet, it calls .update_pmtu with sk instead of skb in sctp_transport_update_pmtu. However for sctp, the daddr in the transport might be different from inet_sock->inet_daddr or sk->sk_v6_daddr, which is used to update or create the route cache. The incorrect daddr will cause a different route cache created for the path. So before calling .update_pmtu, inet_sock->inet_daddr/sk->sk_v6_daddr should be updated with the daddr in the transport, and update it back after it's done. The issue has existed since route exceptions introduction. 
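The save, override and restore sequence described above can be sketched as follows. This is an illustration only and not the kernel code: the socket is modelled as a plain dict, and temporary_daddr() and fake_update_pmtu() are hypothetical names that stand in for the real socket fields and the .update_pmtu callback.

```python
from contextlib import contextmanager


@contextmanager
def temporary_daddr(sock: dict, transport_daddr: str):
    """Point the socket at the transport's daddr, then restore the old value."""
    saved = sock["daddr"]
    sock["daddr"] = transport_daddr
    try:
        yield sock
    finally:
        # Restore even if the callback raises, so the socket is never left
        # with the transport's address.
        sock["daddr"] = saved


seen = []


def fake_update_pmtu(sock: dict, mtu: int) -> None:
    """Hypothetical stand-in for .update_pmtu: records which daddr it used."""
    seen.append((sock["daddr"], mtu))


# The socket's primary peer differs from the transport that got the ICMP.
sock = {"daddr": "10.0.0.1"}
with temporary_daddr(sock, "10.0.0.2") as s:
    fake_update_pmtu(s, 1400)
```

The point of the pattern is that the route lookup inside the callback sees the transport's address, while code running afterwards still sees the socket's original destination.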
Fixes: 4895c771c7f0 ("ipv4: Add FIB nexthop exceptions.") Reported-by: ian.periam@dialogic.com Signed-off-by: Xin Long Acked-by: Marcelo Ricardo Leitner Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit a8b0f004eb9022d9150932d94126cab4911a5159 Author: Eric Dumazet Date: Tue Oct 2 15:47:35 2018 -0700 rtnl: limit IFLA_NUM_TX_QUEUES and IFLA_NUM_RX_QUEUES to 4096 [ Upstream commit 0e1d6eca5113858ed2caea61a5adc03c595f6096 ] We have an impressive number of syzkaller bugs that are linked to the fact that syzbot was able to create a networking device with millions of TX (or RX) queues. Let's limit the number of RX/TX queues to 4096, this really should cover all known cases. A separate patch will add various cond_resched() in the loops handling sysfs entries at device creation and dismantle. Tested: lpaa6:~# ip link add gre-4097 numtxqueues 4097 numrxqueues 4097 type ip6gretap RTNETLINK answers: Invalid argument lpaa6:~# time ip link add gre-4096 numtxqueues 4096 numrxqueues 4096 type ip6gretap real 0m0.180s user 0m0.000s sys 0m0.107s Fixes: 76ff5cc91935 ("rtnl: allow to specify number of rx and tx queues on device creation") Signed-off-by: Eric Dumazet Reported-by: syzbot Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 5f999abba33f6788f52cdadae3432f5e731c09bc Author: Mauricio Faria de Oliveira Date: Mon Oct 1 22:46:40 2018 -0300 rtnetlink: fix rtnl_fdb_dump() for ndmsg header [ Upstream commit bd961c9bc66497f0c63f4ba1d02900bb85078366 ] Currently, rtnl_fdb_dump() assumes the family header is 'struct ifinfomsg', which is not always true -- 'struct ndmsg' is used by iproute2 ('ip neigh'). The problem is, the function bails out early if nlmsg_parse() fails, which does occur for iproute2 usage of 'struct ndmsg' because the payload length is shorter than the family header alone (as 'struct ifinfomsg' is assumed). This breaks backward compatibility with userspace -- nothing is sent back. 
Some examples with iproute2 and netlink library for go [1]: 1) $ bridge fdb show 33:33:00:00:00:01 dev ens3 self permanent 01:00:5e:00:00:01 dev ens3 self permanent 33:33:ff:15:98:30 dev ens3 self permanent This one works, as it uses 'struct ifinfomsg'. fdb_show() @ iproute2/bridge/fdb.c """ .n.nlmsg_len = NLMSG_LENGTH(sizeof(struct ifinfomsg)), ... if (rtnl_dump_request(&rth, RTM_GETNEIGH, [...] """ 2) $ ip --family bridge neigh RTNETLINK answers: Invalid argument Dump terminated This one fails, as it uses 'struct ndmsg'. do_show_or_flush() @ iproute2/ip/ipneigh.c """ .n.nlmsg_type = RTM_GETNEIGH, .n.nlmsg_len = NLMSG_LENGTH(sizeof(struct ndmsg)), """ 3) $ ./neighlist < no output > This one fails, as it uses 'struct ndmsg'-based. neighList() @ netlink/neigh_linux.go """ req := h.newNetlinkRequest(unix.RTM_GETNEIGH, [...] msg := Ndmsg{ """ The actual breakage was introduced by commit 0ff50e83b512 ("net: rtnetlink: bail out from rtnl_fdb_dump() on parse error"), because nlmsg_parse() fails if the payload length (with the _actual_ family header) is less than the family header length alone (which is assumed, in parameter 'hdrlen'). This is true in the examples above with struct ndmsg, with size and payload length shorter than struct ifinfomsg. However, that commit just intends to fix something under the assumption the family header is indeed an 'struct ifinfomsg' - by preventing access to the payload as such (via 'ifm' pointer) if the payload length is not sufficient to actually contain it. The assumption was introduced by commit 5e6d24358799 ("bridge: netlink dump interface at par with brctl"), to support iproute2's 'bridge fdb' command (not 'ip neigh') which indeed uses 'struct ifinfomsg', thus is not broken. 
So, in order to unbreak the 'struct ndmsg' family headers and still allow 'struct ifinfomsg' to continue to work, check for the known message sizes used with 'struct ndmsg' in iproute2 (with zero or one attribute which is not used in this function anyway) then do not parse the data as ifinfomsg. Same examples with this patch applied (or revert/before the original fix): $ bridge fdb show 33:33:00:00:00:01 dev ens3 self permanent 01:00:5e:00:00:01 dev ens3 self permanent 33:33:ff:15:98:30 dev ens3 self permanent $ ip --family bridge neigh dev ens3 lladdr 33:33:00:00:00:01 PERMANENT dev ens3 lladdr 01:00:5e:00:00:01 PERMANENT dev ens3 lladdr 33:33:ff:15:98:30 PERMANENT $ ./neighlist netlink.Neigh{LinkIndex:2, Family:7, State:128, Type:0, Flags:2, IP:net.IP(nil), HardwareAddr:net.HardwareAddr{0x33, 0x33, 0x0, 0x0, 0x0, 0x1}, LLIPAddr:net.IP(nil), Vlan:0, VNI:0} netlink.Neigh{LinkIndex:2, Family:7, State:128, Type:0, Flags:2, IP:net.IP(nil), HardwareAddr:net.HardwareAddr{0x1, 0x0, 0x5e, 0x0, 0x0, 0x1}, LLIPAddr:net.IP(nil), Vlan:0, VNI:0} netlink.Neigh{LinkIndex:2, Family:7, State:128, Type:0, Flags:2, IP:net.IP(nil), HardwareAddr:net.HardwareAddr{0x33, 0x33, 0xff, 0x15, 0x98, 0x30}, LLIPAddr:net.IP(nil), Vlan:0, VNI:0} Tested on mainline (v4.19-rc6) and net-next (3bd09b05b068). References: [1] netlink library for go (test-case) https://github.com/vishvananda/netlink $ cat ~/go/src/neighlist/main.go package main import ("fmt"; "syscall"; "github.com/vishvananda/netlink") func main() { neighs, _ := netlink.NeighList(0, syscall.AF_BRIDGE) for _, neigh := range neighs { fmt.Printf("%#v\n", neigh) } } $ export GOPATH=~/go $ go get github.com/vishvananda/netlink $ go build neighlist $ ~/go/src/neighlist/neighlist Thanks to David Ahern for suggestions to improve this patch. 
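The size mismatch at the heart of this fix can be checked with a short sketch. It is not the kernel implementation: the function names are hypothetical, the struct layouts are written out from the uapi headers as I understand them, and the assumption that the single optional attribute carries a 4-byte value is mine rather than something stated in the commit message.

```python
import struct

# Sizes of the fixed headers involved (native-endian, no padding needed).
NLMSG_HDRLEN = struct.calcsize("=IHHII")   # struct nlmsghdr
NDMSG_LEN = struct.calcsize("=BBHiHBB")    # struct ndmsg
IFINFOMSG_LEN = struct.calcsize("=BBHiII")  # struct ifinfomsg
NLA_HDRLEN = 4                              # struct nlattr: u16 len, u16 type


def parse_accepts(nlmsg_len: int, hdrlen: int) -> bool:
    """Mimic the length check that makes parsing fail: the message must be
    at least as long as the netlink header plus the assumed family header."""
    return nlmsg_len >= NLMSG_HDRLEN + hdrlen


def looks_like_ndmsg(nlmsg_len: int) -> bool:
    """Recognise the known ndmsg-based request sizes: the bare header, or the
    header plus one attribute (assumed here to hold a 4-byte value)."""
    bare = NLMSG_HDRLEN + NDMSG_LEN
    with_one_attr = bare + NLA_HDRLEN + 4
    return nlmsg_len in (bare, with_one_attr)


ip_neigh_len = NLMSG_HDRLEN + NDMSG_LEN        # 'ip neigh' style request
bridge_fdb_len = NLMSG_HDRLEN + IFINFOMSG_LEN  # 'bridge fdb' style request
```

Under these assumptions, an 'ip neigh' style request is 4 bytes too short to pass a parse that assumes 'struct ifinfomsg', which is why it has to be recognised by its length and handled without parsing it as ifinfomsg.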
Fixes: 0ff50e83b512 ("net: rtnetlink: bail out from rtnl_fdb_dump() on parse error") Fixes: 5e6d24358799 ("bridge: netlink dump interface at par with brctl") Reported-by: Aidan Obley Signed-off-by: Mauricio Faria de Oliveira Reviewed-by: David Ahern Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 72675512fb1a823326d8c26ffa5a9a59ecb54bbb Author: Giacinto Cifelli Date: Wed Oct 10 20:05:53 2018 +0200 qmi_wwan: Added support for Gemalto's Cinterion ALASxx WWAN interface [ Upstream commit 4f7617705bfff84d756fe4401a1f4f032f374984 ] Added support for Gemalto's Cinterion ALASxx WWAN interfaces by adding QMI_FIXED_INTF with Cinterion's VID and PID. Signed-off-by: Giacinto Cifelli Acked-by: Bjørn Mork Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 0d5b9edea99531145d4cade061ad6c7238654c48 Author: Shahed Shaikh Date: Wed Sep 26 12:41:10 2018 -0700 qlcnic: fix Tx descriptor corruption on 82xx devices [ Upstream commit c333fa0c4f220f8f7ea5acd6b0ebf3bf13fd684d ] In regular NIC transmission flow, driver always configures MAC using Tx queue zero descriptor as a part of MAC learning flow. But with multi Tx queue supported NIC, regular transmission can occur on any non-zero Tx queue and from that context it uses Tx queue zero descriptor to configure MAC, at the same time TX queue zero could be used by another CPU for regular transmission which could lead to Tx queue zero descriptor corruption and cause FW abort. This patch fixes this in such a way that driver always configures learned MAC address from the same Tx queue which is used for regular transmission. Fixes: 7e2cf4feba05 ("qlcnic: change driver hardware interface mechanism") Signed-off-by: Shahed Shaikh Signed-off-by: David S. 
Miller Signed-off-by: Greg Kroah-Hartman commit 133aea0f2108f5c3b68a621f20caf275f7e90e64 Author: Yu Zhao Date: Fri Sep 28 17:04:30 2018 -0600 net/usb: cancel pending work when unbinding smsc75xx [ Upstream commit f7b2a56e1f3dcbdb4cf09b2b63e859ffe0e09df8 ] Cancel pending work before freeing smsc75xx private data structure during binding. This fixes the following crash in the driver: BUG: unable to handle kernel NULL pointer dereference at 0000000000000050 IP: mutex_lock+0x2b/0x3f Workqueue: events smsc75xx_deferred_multicast_write [smsc75xx] task: ffff8caa83e85700 task.stack: ffff948b80518000 RIP: 0010:mutex_lock+0x2b/0x3f Call Trace: smsc75xx_deferred_multicast_write+0x40/0x1af [smsc75xx] process_one_work+0x18d/0x2fc worker_thread+0x1a2/0x269 ? pr_cont_work+0x58/0x58 kthread+0xfa/0x10a ? pr_cont_work+0x58/0x58 ? rcu_read_unlock_sched_notrace+0x48/0x48 ret_from_fork+0x22/0x40 Signed-off-by: Yu Zhao Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 3e80ad8cbf228d74666a862fdf2c38f5b6b3c8aa Author: Florian Fainelli Date: Tue Oct 2 16:52:03 2018 -0700 net: systemport: Fix wake-up interrupt race during resume [ Upstream commit 45ec318578c0c22a11f5b9927d064418e1ab1905 ] The AON_PM_L2 is normally used to trigger and identify the source of a wake-up event. Since the RX_SYS clock is no longer turned off, we also have an interrupt being sent to the SYSTEMPORT INTRL_2_0 controller, and that interrupt remains active up until the magic packet detector is disabled which happens much later during the driver resumption. The race happens if we have a CPU that is entering the SYSTEMPORT INTRL2_0 handler during resume, and another CPU has managed to clear the wake-up interrupt during bcm_sysport_resume_from_wol(). In that case, we have the first CPU stuck in the interrupt handler with an interrupt cause that has been cleared under its feet, and so we keep returning IRQ_NONE and we never make any progress. 
This was not a problem before because we would always turn off the RX_SYS clock during WoL, so the SYSTEMPORT INTRL2_0 would also be turned off as well, thus not latching the interrupt. The fix is to make sure we do not enable either the MPD or BRCM_TAG_MATCH interrupts since those are redundant with what the AON_PM_L2 interrupt controller already processes and they would cause such a race to occur. Fixes: bb9051a2b230 ("net: systemport: Add support for WAKE_FILTER") Fixes: 83e82f4c706b ("net: systemport: add Wake-on-LAN support") Signed-off-by: Florian Fainelli Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit d9057423312e9d12f02290cb97f14300330a729e Author: David Ahern Date: Wed Oct 3 15:05:36 2018 -0700 net: sched: Add policy validation for tc attributes [ Upstream commit 8b4c3cdd9dd8290343ce959a132d3b334062c5b9 ] A number of TC attributes are processed without proper validation (e.g., length checks). Add a tca policy for all input attributes and use when invoking nlmsg_parse. The 2 Fixes tags below cover the latest additions. The other attributes are a string (KIND), nested attribute (OPTIONS which does seem to have validation in most cases), for dumps only or a flag. Fixes: 5bc1701881e39 ("net: sched: introduce multichain support for filters") Fixes: d47a6b0e7c492 ("net: sched: introduce ingress/egress block index attributes for qdisc") Signed-off-by: David Ahern Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 85ebbc5a2543f3ffcc535dbeb9842af542f3edb1 Author: Antoine Tenart Date: Tue Sep 18 16:58:47 2018 +0200 net: mvpp2: fix a txq_done race condition [ Upstream commit 774268f3e51b53ed432a1ec516574fd5ba469398 ] When no Tx IRQ is available, the txq_done() routine (called from tx_done()) shouldn't be called from the polling function, as in that case it is already called in the Tx path thanks to an hrtimer. This mostly occurred when using PPv2.1, as the engine then does not have Tx IRQs.
Fixes: edc660fa09e2 ("net: mvpp2: replace TX coalescing interrupts with hrtimer") Reported-by: Stefan Chulski Signed-off-by: Antoine Tenart Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit d9bf6699aee863044cbb8c2b5d2de5e1db4fd7d8 Author: Maxime Chevallier Date: Fri Oct 5 09:04:40 2018 +0200 net: mvpp2: Extract the correct ethtype from the skb for tx csum offload [ Upstream commit 35f3625c21852ad839f20c91c7d81c4c1101e207 ] When offloading the L3 and L4 csum computation on TX, we need to extract the l3_proto from the ethtype, independently of the presence of a vlan tag. The current driver uses skb->protocol as-is, resulting in packets with the wrong L4 checksum being sent when there's a vlan tag in the packet header and checksum offloading is enabled. This commit makes use of vlan_protocol_get() to get the correct ethtype regardless of the presence of a vlan tag. Fixes: 3f518509dedc ("ethernet: Add new driver for Marvell Armada 375 network unit") Signed-off-by: Maxime Chevallier Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 19c5e73c745cac1ff3c54526335f16c515a1e3a7 Author: Sean Tranchetti Date: Thu Sep 20 14:29:45 2018 -0600 netlabel: check for IPV4MASK in addrinfo_get [ Upstream commit f88b4c01b97e09535505cf3c327fdbce55c27f00 ] netlbl_unlabel_addrinfo_get() assumes that if it finds the NLBL_UNLABEL_A_IPV4ADDR attribute, it must also have the NLBL_UNLABEL_A_IPV4MASK attribute as well. However, this is not necessarily the case as the current checks in netlbl_unlabel_staticadd() and friends are not sufficient to enforce this.
If passed a netlink message with NLBL_UNLABEL_A_IPV4ADDR, NLBL_UNLABEL_A_IPV6ADDR, and NLBL_UNLABEL_A_IPV6MASK attributes, these functions will all call netlbl_unlabel_addrinfo_get() which will then attempt to dereference NULL when fetching the non-existent NLBL_UNLABEL_A_IPV4MASK attribute: Unable to handle kernel NULL pointer dereference at virtual address 0 Process unlab (pid: 31762, stack limit = 0xffffff80502d8000) Call trace: netlbl_unlabel_addrinfo_get+0x44/0xd8 netlbl_unlabel_staticremovedef+0x98/0xe0 genl_rcv_msg+0x354/0x388 netlink_rcv_skb+0xac/0x118 genl_rcv+0x34/0x48 netlink_unicast+0x158/0x1f0 netlink_sendmsg+0x32c/0x338 sock_sendmsg+0x44/0x60 ___sys_sendmsg+0x1d0/0x2a8 __sys_sendmsg+0x64/0xb4 SyS_sendmsg+0x34/0x4c el0_svc_naked+0x34/0x38 Code: 51001149 7100113f 540000a0 f9401508 (79400108) ---[ end trace f6438a488e737143 ]--- Kernel panic - not syncing: Fatal exception Signed-off-by: Sean Tranchetti Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 335c37612f9fec84f8e361e2ce8148b7cd156d22 Author: Jeff Barnhill <0xeffeff@gmail.com> Date: Fri Sep 21 00:45:27 2018 +0000 net/ipv6: Display all addresses in output of /proc/net/if_inet6 [ Upstream commit 86f9bd1ff61c413a2a251fa736463295e4e24733 ] The backend handling for /proc/net/if_inet6 in addrconf.c doesn't properly handle starting/stopping the iteration. The problem is that at some point during the iteration, an overflow is detected and the process is subsequently stopped. The item being shown via seq_printf() when the overflow occurs is not actually shown, though. When start() is subsequently called to resume iterating, it returns the next item, and thus the item that was being processed when the overflow occurred never gets printed. Alter the meaning of the private data member "offset". Currently, when it is not 0 (which only happens at the very beginning), "offset" represents the next hlist item to be printed. After this change, "offset" always represents the current item.
This is also consistent with the private data member "bucket", which represents the current bucket, and also the use of "pos" as defined in seq_file.txt: The pos passed to start() will always be either zero, or the most recent pos used in the previous session. Signed-off-by: Jeff Barnhill <0xeffeff@gmail.com> Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 9b4869cf385aa16f89c0f019eed4ec4e36aa441c Author: Sabrina Dubroca Date: Tue Oct 9 17:48:14 2018 +0200 net: ipv4: update fnhe_pmtu when first hop's MTU changes [ Upstream commit af7d6cce53694a88d6a1bb60c9a239a6a5144459 ] Since commit 5aad1de5ea2c ("ipv4: use separate genid for next hop exceptions"), exceptions get deprecated separately from cached routes. In particular, administrative changes don't clear PMTU anymore. As Stefano described in commit e9fa1495d738 ("ipv6: Reflect MTU changes on PMTU of exceptions for MTU-less routes"), the PMTU discovered before the local MTU change can become stale: - if the local MTU is now lower than the PMTU, that PMTU is now incorrect - if the local MTU was the lowest value in the path, and is increased, we might discover a higher PMTU Similarly to what commit e9fa1495d738 did for IPv6, update PMTU in those cases. If the exception was locked, the discovered PMTU was smaller than the minimal accepted PMTU. In that case, if the new local MTU is smaller than the current PMTU, let PMTU discovery figure out if locking of the exception is still needed. To do this, we need to know the old link MTU in the NETDEV_CHANGEMTU notifier. By the time the notifier is called, dev->mtu has been changed. This patch adds the old MTU as additional information in the notifier structure, and a new call_netdevice_notifiers_u32() function. Fixes: 5aad1de5ea2c ("ipv4: use separate genid for next hop exceptions") Signed-off-by: Sabrina Dubroca Reviewed-by: Stefano Brivio Reviewed-by: David Ahern Signed-off-by: David S. 
Miller Signed-off-by: Greg Kroah-Hartman commit 6c61dae979ae39850a3a9945f93e83e5b5baf31c Author: Yunsheng Lin Date: Tue Sep 25 10:21:55 2018 +0100 net: hns: fix for unmapping problem when SMMU is on [ Upstream commit 2e9361efa707e186d91b938e44f9e326725259f7 ] If SMMU is on, it is more likely that skb_shinfo(skb)->frags[i] cannot be sent by a single BD. When this happens, the hns_nic_net_xmit_hw function maps the whole data in a frag using skb_frag_dma_map, but unmaps each BD's data individually when tx is done, which causes problems when SMMU is on. This patch fixes this problem by unmapping the whole data in a frag when tx is done. Signed-off-by: Yunsheng Lin Signed-off-by: Peng Li Reviewed-by: Yisen Zhuang Signed-off-by: Salil Mehta Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 8193b775247a2c38294fd9d1fee5084b5c1b3de8 Author: Florian Fainelli Date: Tue Oct 9 16:48:58 2018 -0700 net: dsa: bcm_sf2: Call setup during switch resume [ Upstream commit 54baca096386d862d19c10f58f34bf787c6b3cbe ] There is no reason to open code what the switch setup function does. In fact, because we just issued a switch reset, all the registers get their default values again, which for instance re-enables unused ports, wasting power and leading to an inappropriate switch core clock being selected. Fixes: 8cfa94984c9c ("net: dsa: bcm_sf2: add suspend/resume callbacks") Signed-off-by: Florian Fainelli Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 48c14f2ea5c58268f4ea59da6467c74cdec9e6f2 Author: Wei Wang Date: Thu Oct 4 10:12:37 2018 -0700 ipv6: take rcu lock in rawv6_send_hdrinc() [ Upstream commit a688caa34beb2fd2a92f1b6d33e40cde433ba160 ] In rawv6_send_hdrinc(), in order to avoid an extra dst_hold(), we directly assign the dst to skb and set the passed-in dst to NULL to avoid a double free. However, in the error case, we free the skb and then do the stats update with the dst pointer passed in.
This causes use-after-free on the dst. Fix it by taking rcu read lock right before dst could get released to make sure dst does not get freed until the stats update is done. Note: we don't have this issue in ipv4 cause dst is not used for stats update in v4. Syzkaller reported following crash: BUG: KASAN: use-after-free in rawv6_send_hdrinc net/ipv6/raw.c:692 [inline] BUG: KASAN: use-after-free in rawv6_sendmsg+0x4421/0x4630 net/ipv6/raw.c:921 Read of size 8 at addr ffff8801d95ba730 by task syz-executor0/32088 CPU: 1 PID: 32088 Comm: syz-executor0 Not tainted 4.19.0-rc2+ #93 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x1c4/0x2b4 lib/dump_stack.c:113 print_address_description.cold.8+0x9/0x1ff mm/kasan/report.c:256 kasan_report_error mm/kasan/report.c:354 [inline] kasan_report.cold.9+0x242/0x309 mm/kasan/report.c:412 __asan_report_load8_noabort+0x14/0x20 mm/kasan/report.c:433 rawv6_send_hdrinc net/ipv6/raw.c:692 [inline] rawv6_sendmsg+0x4421/0x4630 net/ipv6/raw.c:921 inet_sendmsg+0x1a1/0x690 net/ipv4/af_inet.c:798 sock_sendmsg_nosec net/socket.c:621 [inline] sock_sendmsg+0xd5/0x120 net/socket.c:631 ___sys_sendmsg+0x7fd/0x930 net/socket.c:2114 __sys_sendmsg+0x11d/0x280 net/socket.c:2152 __do_sys_sendmsg net/socket.c:2161 [inline] __se_sys_sendmsg net/socket.c:2159 [inline] __x64_sys_sendmsg+0x78/0xb0 net/socket.c:2159 do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x457099 Code: fd b4 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 cb b4 fb ff c3 66 2e 0f 1f 84 00 00 00 00 RSP: 002b:00007f83756edc78 EFLAGS: 00000246 ORIG_RAX: 000000000000002e RAX: ffffffffffffffda RBX: 00007f83756ee6d4 RCX: 0000000000457099 RDX: 0000000000000000 RSI: 0000000020003840 RDI: 0000000000000004 RBP: 00000000009300a0 R08: 
0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff R13: 00000000004d4b30 R14: 00000000004c90b1 R15: 0000000000000000 Allocated by task 32088: save_stack+0x43/0xd0 mm/kasan/kasan.c:448 set_track mm/kasan/kasan.c:460 [inline] kasan_kmalloc+0xc7/0xe0 mm/kasan/kasan.c:553 kasan_slab_alloc+0x12/0x20 mm/kasan/kasan.c:490 kmem_cache_alloc+0x12e/0x730 mm/slab.c:3554 dst_alloc+0xbb/0x1d0 net/core/dst.c:105 ip6_dst_alloc+0x35/0xa0 net/ipv6/route.c:353 ip6_rt_cache_alloc+0x247/0x7b0 net/ipv6/route.c:1186 ip6_pol_route+0x8f8/0xd90 net/ipv6/route.c:1895 ip6_pol_route_output+0x54/0x70 net/ipv6/route.c:2093 fib6_rule_lookup+0x277/0x860 net/ipv6/fib6_rules.c:122 ip6_route_output_flags+0x2c5/0x350 net/ipv6/route.c:2121 ip6_route_output include/net/ip6_route.h:88 [inline] ip6_dst_lookup_tail+0xe27/0x1d60 net/ipv6/ip6_output.c:951 ip6_dst_lookup_flow+0xc8/0x270 net/ipv6/ip6_output.c:1079 rawv6_sendmsg+0x12d9/0x4630 net/ipv6/raw.c:905 inet_sendmsg+0x1a1/0x690 net/ipv4/af_inet.c:798 sock_sendmsg_nosec net/socket.c:621 [inline] sock_sendmsg+0xd5/0x120 net/socket.c:631 ___sys_sendmsg+0x7fd/0x930 net/socket.c:2114 __sys_sendmsg+0x11d/0x280 net/socket.c:2152 __do_sys_sendmsg net/socket.c:2161 [inline] __se_sys_sendmsg net/socket.c:2159 [inline] __x64_sys_sendmsg+0x78/0xb0 net/socket.c:2159 do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x49/0xbe Freed by task 5356: save_stack+0x43/0xd0 mm/kasan/kasan.c:448 set_track mm/kasan/kasan.c:460 [inline] __kasan_slab_free+0x102/0x150 mm/kasan/kasan.c:521 kasan_slab_free+0xe/0x10 mm/kasan/kasan.c:528 __cache_free mm/slab.c:3498 [inline] kmem_cache_free+0x83/0x290 mm/slab.c:3756 dst_destroy+0x267/0x3c0 net/core/dst.c:141 dst_destroy_rcu+0x16/0x19 net/core/dst.c:154 __rcu_reclaim kernel/rcu/rcu.h:236 [inline] rcu_do_batch kernel/rcu/tree.c:2576 [inline] invoke_rcu_callbacks kernel/rcu/tree.c:2880 [inline] __rcu_process_callbacks kernel/rcu/tree.c:2847 [inline] 
rcu_process_callbacks+0xf23/0x2670 kernel/rcu/tree.c:2864 __do_softirq+0x30b/0xad8 kernel/softirq.c:292 Fixes: 1789a640f556 ("raw: avoid two atomics in xmit") Signed-off-by: Wei Wang Signed-off-by: Eric Dumazet Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 32b193216e185a3ba817a179f29a53a9973665a9 Author: Eric Dumazet Date: Sun Sep 30 11:33:39 2018 -0700 ipv4: fix use-after-free in ip_cmsg_recv_dstaddr() [ Upstream commit 64199fc0a46ba211362472f7f942f900af9492fd ] Caching ip_hdr(skb) before a call to pskb_may_pull() is buggy, do not do it. Fixes: 2efd4fca703a ("ip: in cmsg IP(V6)_ORIGDSTADDR call pskb_may_pull") Signed-off-by: Eric Dumazet Cc: Willem de Bruijn Reported-by: syzbot Acked-by: Willem de Bruijn Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit deb33b68f40e788c974acf61bc220dc077cb8032 Author: Paolo Abeni Date: Mon Sep 24 15:48:19 2018 +0200 ip_tunnel: be careful when accessing the inner header [ Upstream commit ccfec9e5cb2d48df5a955b7bf47f7782157d3bc2] Cong noted that we need the same checks introduced by commit 76c0ddd8c3a6 ("ip6_tunnel: be careful when accessing the inner header") even for ipv4 tunnels. Fixes: c54419321455 ("GRE: Refactor GRE tunneling code.") Suggested-by: Cong Wang Signed-off-by: Paolo Abeni Signed-off-by: David S. 
Miller Signed-off-by: Greg Kroah-Hartman commit 18bf9a724362a3feff649f529178f49b11e877e4 Author: Paolo Abeni Date: Wed Sep 19 15:02:07 2018 +0200 ip6_tunnel: be careful when accessing the inner header [ Upstream commit 76c0ddd8c3a683f6e2c6e60e11dc1a1558caf4bc ] the ip6 tunnel xmit ndo assumes that the processed skb always contains an ip[v6] header, but syzbot has found a way to send frames that fall short of this assumption, leading to the following splat: BUG: KMSAN: uninit-value in ip6ip6_tnl_xmit net/ipv6/ip6_tunnel.c:1307 [inline] BUG: KMSAN: uninit-value in ip6_tnl_start_xmit+0x7d2/0x1ef0 net/ipv6/ip6_tunnel.c:1390 CPU: 0 PID: 4504 Comm: syz-executor558 Not tainted 4.16.0+ #87 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:17 [inline] dump_stack+0x185/0x1d0 lib/dump_stack.c:53 kmsan_report+0x142/0x240 mm/kmsan/kmsan.c:1067 __msan_warning_32+0x6c/0xb0 mm/kmsan/kmsan_instr.c:683 ip6ip6_tnl_xmit net/ipv6/ip6_tunnel.c:1307 [inline] ip6_tnl_start_xmit+0x7d2/0x1ef0 net/ipv6/ip6_tunnel.c:1390 __netdev_start_xmit include/linux/netdevice.h:4066 [inline] netdev_start_xmit include/linux/netdevice.h:4075 [inline] xmit_one net/core/dev.c:3026 [inline] dev_hard_start_xmit+0x5f1/0xc70 net/core/dev.c:3042 __dev_queue_xmit+0x27ee/0x3520 net/core/dev.c:3557 dev_queue_xmit+0x4b/0x60 net/core/dev.c:3590 packet_snd net/packet/af_packet.c:2944 [inline] packet_sendmsg+0x7c70/0x8a30 net/packet/af_packet.c:2969 sock_sendmsg_nosec net/socket.c:630 [inline] sock_sendmsg net/socket.c:640 [inline] ___sys_sendmsg+0xec0/0x1310 net/socket.c:2046 __sys_sendmmsg+0x42d/0x800 net/socket.c:2136 SYSC_sendmmsg+0xc4/0x110 net/socket.c:2167 SyS_sendmmsg+0x63/0x90 net/socket.c:2162 do_syscall_64+0x309/0x430 arch/x86/entry/common.c:287 entry_SYSCALL_64_after_hwframe+0x3d/0xa2 RIP: 0033:0x441819 RSP: 002b:00007ffe58ee8268 EFLAGS: 00000213 ORIG_RAX: 0000000000000133 RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 
0000000000441819 RDX: 0000000000000002 RSI: 0000000020000100 RDI: 0000000000000003 RBP: 00000000006cd018 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000213 R12: 0000000000402510 R13: 00000000004025a0 R14: 0000000000000000 R15: 0000000000000000 Uninit was created at: kmsan_save_stack_with_flags mm/kmsan/kmsan.c:278 [inline] kmsan_internal_poison_shadow+0xb8/0x1b0 mm/kmsan/kmsan.c:188 kmsan_kmalloc+0x94/0x100 mm/kmsan/kmsan.c:314 kmsan_slab_alloc+0x11/0x20 mm/kmsan/kmsan.c:321 slab_post_alloc_hook mm/slab.h:445 [inline] slab_alloc_node mm/slub.c:2737 [inline] __kmalloc_node_track_caller+0xaed/0x11c0 mm/slub.c:4369 __kmalloc_reserve net/core/skbuff.c:138 [inline] __alloc_skb+0x2cf/0x9f0 net/core/skbuff.c:206 alloc_skb include/linux/skbuff.h:984 [inline] alloc_skb_with_frags+0x1d4/0xb20 net/core/skbuff.c:5234 sock_alloc_send_pskb+0xb56/0x1190 net/core/sock.c:2085 packet_alloc_skb net/packet/af_packet.c:2803 [inline] packet_snd net/packet/af_packet.c:2894 [inline] packet_sendmsg+0x6454/0x8a30 net/packet/af_packet.c:2969 sock_sendmsg_nosec net/socket.c:630 [inline] sock_sendmsg net/socket.c:640 [inline] ___sys_sendmsg+0xec0/0x1310 net/socket.c:2046 __sys_sendmmsg+0x42d/0x800 net/socket.c:2136 SYSC_sendmmsg+0xc4/0x110 net/socket.c:2167 SyS_sendmmsg+0x63/0x90 net/socket.c:2162 do_syscall_64+0x309/0x430 arch/x86/entry/common.c:287 entry_SYSCALL_64_after_hwframe+0x3d/0xa2 This change addresses the issue adding the needed check before accessing the inner header. The ipv4 side of the issue is apparently there since the ipv4 over ipv6 initial support, and the ipv6 side predates git history. Fixes: c4d3efafcc93 ("[IPV6] IP6TUNNEL: Add support to IPv4 over IPv6 tunnel.") Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-by: syzbot+3fde91d4d394747d6db4@syzkaller.appspotmail.com Tested-by: Alexander Potapenko Signed-off-by: Paolo Abeni Signed-off-by: David S. 
Miller Signed-off-by: Greg Kroah-Hartman commit 94402f23659fa9005776a1500f61953c825c3420 Author: Mahesh Bandewar Date: Mon Sep 24 14:40:11 2018 -0700 bonding: avoid possible dead-lock [ Upstream commit d4859d749aa7090ffb743d15648adb962a1baeae ] Syzkaller reported this on a slightly older kernel but it's still applicable to the current kernel - ====================================================== WARNING: possible circular locking dependency detected 4.18.0-next-20180823+ #46 Not tainted ------------------------------------------------------ syz-executor4/26841 is trying to acquire lock: 00000000dd41ef48 ((wq_completion)bond_dev->name){+.+.}, at: flush_workqueue+0x2db/0x1e10 kernel/workqueue.c:2652 but task is already holding lock: 00000000768ab431 (rtnl_mutex){+.+.}, at: rtnl_lock net/core/rtnetlink.c:77 [inline] 00000000768ab431 (rtnl_mutex){+.+.}, at: rtnetlink_rcv_msg+0x412/0xc30 net/core/rtnetlink.c:4708 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #2 (rtnl_mutex){+.+.}: __mutex_lock_common kernel/locking/mutex.c:925 [inline] __mutex_lock+0x171/0x1700 kernel/locking/mutex.c:1073 mutex_lock_nested+0x16/0x20 kernel/locking/mutex.c:1088 rtnl_lock+0x17/0x20 net/core/rtnetlink.c:77 bond_netdev_notify drivers/net/bonding/bond_main.c:1310 [inline] bond_netdev_notify_work+0x44/0xd0 drivers/net/bonding/bond_main.c:1320 process_one_work+0xc73/0x1aa0 kernel/workqueue.c:2153 worker_thread+0x189/0x13c0 kernel/workqueue.c:2296 kthread+0x35a/0x420 kernel/kthread.c:246 ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:415 -> #1 ((work_completion)(&(&nnw->work)->work)){+.+.}: process_one_work+0xc0b/0x1aa0 kernel/workqueue.c:2129 worker_thread+0x189/0x13c0 kernel/workqueue.c:2296 kthread+0x35a/0x420 kernel/kthread.c:246 ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:415 -> #0 ((wq_completion)bond_dev->name){+.+.}: lock_acquire+0x1e4/0x4f0 kernel/locking/lockdep.c:3901 flush_workqueue+0x30a/0x1e10 kernel/workqueue.c:2655 
drain_workqueue+0x2a9/0x640 kernel/workqueue.c:2820 destroy_workqueue+0xc6/0x9d0 kernel/workqueue.c:4155 __alloc_workqueue_key+0xef9/0x1190 kernel/workqueue.c:4138 bond_init+0x269/0x940 drivers/net/bonding/bond_main.c:4734 register_netdevice+0x337/0x1100 net/core/dev.c:8410 bond_newlink+0x49/0xa0 drivers/net/bonding/bond_netlink.c:453 rtnl_newlink+0xef4/0x1d50 net/core/rtnetlink.c:3099 rtnetlink_rcv_msg+0x46e/0xc30 net/core/rtnetlink.c:4711 netlink_rcv_skb+0x172/0x440 net/netlink/af_netlink.c:2454 rtnetlink_rcv+0x1c/0x20 net/core/rtnetlink.c:4729 netlink_unicast_kernel net/netlink/af_netlink.c:1317 [inline] netlink_unicast+0x5a0/0x760 net/netlink/af_netlink.c:1343 netlink_sendmsg+0xa18/0xfc0 net/netlink/af_netlink.c:1908 sock_sendmsg_nosec net/socket.c:622 [inline] sock_sendmsg+0xd5/0x120 net/socket.c:632 ___sys_sendmsg+0x7fd/0x930 net/socket.c:2115 __sys_sendmsg+0x11d/0x290 net/socket.c:2153 __do_sys_sendmsg net/socket.c:2162 [inline] __se_sys_sendmsg net/socket.c:2160 [inline] __x64_sys_sendmsg+0x78/0xb0 net/socket.c:2160 do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x49/0xbe other info that might help us debug this: Chain exists of: (wq_completion)bond_dev->name --> (work_completion)(&(&nnw->work)->work) --> rtnl_mutex Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(rtnl_mutex); lock((work_completion)(&(&nnw->work)->work)); lock(rtnl_mutex); lock((wq_completion)bond_dev->name); *** DEADLOCK *** 1 lock held by syz-executor4/26841: stack backtrace: CPU: 1 PID: 26841 Comm: syz-executor4 Not tainted 4.18.0-next-20180823+ #46 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x1c9/0x2b4 lib/dump_stack.c:113 print_circular_bug.isra.34.cold.55+0x1bd/0x27d kernel/locking/lockdep.c:1222 check_prev_add kernel/locking/lockdep.c:1862 [inline] check_prevs_add kernel/locking/lockdep.c:1975 [inline] validate_chain 
kernel/locking/lockdep.c:2416 [inline] __lock_acquire+0x3449/0x5020 kernel/locking/lockdep.c:3412 lock_acquire+0x1e4/0x4f0 kernel/locking/lockdep.c:3901 flush_workqueue+0x30a/0x1e10 kernel/workqueue.c:2655 drain_workqueue+0x2a9/0x640 kernel/workqueue.c:2820 destroy_workqueue+0xc6/0x9d0 kernel/workqueue.c:4155 __alloc_workqueue_key+0xef9/0x1190 kernel/workqueue.c:4138 bond_init+0x269/0x940 drivers/net/bonding/bond_main.c:4734 register_netdevice+0x337/0x1100 net/core/dev.c:8410 bond_newlink+0x49/0xa0 drivers/net/bonding/bond_netlink.c:453 rtnl_newlink+0xef4/0x1d50 net/core/rtnetlink.c:3099 rtnetlink_rcv_msg+0x46e/0xc30 net/core/rtnetlink.c:4711 netlink_rcv_skb+0x172/0x440 net/netlink/af_netlink.c:2454 rtnetlink_rcv+0x1c/0x20 net/core/rtnetlink.c:4729 netlink_unicast_kernel net/netlink/af_netlink.c:1317 [inline] netlink_unicast+0x5a0/0x760 net/netlink/af_netlink.c:1343 netlink_sendmsg+0xa18/0xfc0 net/netlink/af_netlink.c:1908 sock_sendmsg_nosec net/socket.c:622 [inline] sock_sendmsg+0xd5/0x120 net/socket.c:632 ___sys_sendmsg+0x7fd/0x930 net/socket.c:2115 __sys_sendmsg+0x11d/0x290 net/socket.c:2153 __do_sys_sendmsg net/socket.c:2162 [inline] __se_sys_sendmsg net/socket.c:2160 [inline] __x64_sys_sendmsg+0x78/0xb0 net/socket.c:2160 do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x457089 Code: fd b4 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 cb b4 fb ff c3 66 2e 0f 1f 84 00 00 00 00 RSP: 002b:00007f2df20a5c78 EFLAGS: 00000246 ORIG_RAX: 000000000000002e RAX: ffffffffffffffda RBX: 00007f2df20a66d4 RCX: 0000000000457089 RDX: 0000000000000000 RSI: 0000000020000180 RDI: 0000000000000003 RBP: 0000000000930140 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff R13: 00000000004d40b8 R14: 00000000004c8ad8 R15: 0000000000000001 Signed-off-by: Mahesh Bandewar 
Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit e73b51a995ac55aa207b693dc6830784efa6c41c Author: Venkat Duvvuru Date: Fri Oct 5 00:26:02 2018 -0400 bnxt_en: free hwrm resources, if driver probe fails. [ Upstream commit a2bf74f4e1b82395dad2b08d2a911d9151db71c1 ] When the driver probe fails, all the resources that were allocated prior to the failure must be freed. However, hwrm dma response memory is not getting freed. This patch fixes the problem described above. Fixes: c0c050c58d84 ("bnxt_en: New Broadcom ethernet driver.") Signed-off-by: Venkat Duvvuru Signed-off-by: Michael Chan Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 67d1ee6c7b76f1b1de9c6e217df8ee5894902aaf Author: Michael Chan Date: Wed Sep 26 00:41:04 2018 -0400 bnxt_en: Fix TX timeout during netpoll. [ Upstream commit 73f21c653f930f438d53eed29b5e4c65c8a0f906 ] The current netpoll implementation in the bnxt_en driver has problems that may cause TX completion events to be missed. bnxt_poll_work() in effect is only handling at most 1 TX packet before exiting. In addition, there may be in flight TX completions that ->poll() may miss even after we fix bnxt_poll_work() to handle all visible TX completions. netpoll may not call ->poll() again and HW may not generate IRQ because the driver does not ARM the IRQ when the budget (0 for netpoll) is reached. We fix it by handling all TX completions and by always ARMing the IRQ when we exit ->poll() with 0 budget. Also, the logic to ACK the completion ring in case it is almost filled with TX completions needs to be adjusted to take care of the 0 budget case, as discussed with Eric Dumazet. Reported-by: Song Liu Reviewed-by: Song Liu Tested-by: Song Liu Signed-off-by: Michael Chan Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman
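The zero-budget rule described in the bnxt_en netpoll commit above can be illustrated with a toy model. This is only a sketch under stated assumptions, not the driver's code: struct toy_ring and toy_poll() are hypothetical names, and the ring is reduced to two counters and a flag. It shows why a plain "work done < budget" test never re-arms the interrupt when the budget is 0, and how an explicit zero-budget case avoids that.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model of a completion ring; not the bnxt_en structures. */
struct toy_ring {
	int pending_tx;   /* TX completions waiting to be reaped */
	int pending_rx;   /* RX completions waiting to be processed */
	bool irq_armed;   /* whether the device may raise another interrupt */
};

/*
 * Poll the ring with the given budget and return the RX work done.
 * TX completions are reaped in full regardless of budget; only RX
 * work is charged against the budget.
 */
static int toy_poll(struct toy_ring *r, int budget)
{
	int rx_done = 0;

	assert(budget >= 0);

	/* Reap every visible TX completion, even with a zero budget. */
	r->pending_tx = 0;

	/* RX processing stops once the budget is used up. */
	while (rx_done < budget && r->pending_rx > 0) {
		r->pending_rx--;
		rx_done++;
	}

	/*
	 * With budget == 0 the test "rx_done < budget" is never true, so
	 * the zero-budget (netpoll) case is handled explicitly: always
	 * re-arm, since the caller may not poll again.
	 */
	r->irq_armed = (budget == 0) || (rx_done < budget);

	return rx_done;
}
```

Calling toy_poll() with a budget of 0 leaves RX work untouched, clears all pending TX completions, and sets irq_armed, which is the behaviour the commit message asks for on exit from ->poll().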