forked from openbmc/linux
kws 7435 Include codeowners #1
Closed
rajas-axiado wants to merge 10,000 commits into NVIDIA:develop-6.5 from axiado:feature/KWS-7435-codeowners
[ Upstream commit 3cb7cf1 ] Most qdiscs maintain their backlog using qdisc_pkt_len(skb) on the assumption it is invariant between the enqueue() and dequeue() handlers. Unfortunately syzbot can crash a host rather easily using a TBF + SFQ combination, with an STAB on SFQ [1] We can't support TCA_STAB on arbitrary level, this would require to maintain per-qdisc storage. [1] [ 88.796496] BUG: kernel NULL pointer dereference, address: 0000000000000000 [ 88.798611] #PF: supervisor read access in kernel mode [ 88.799014] #PF: error_code(0x0000) - not-present page [ 88.799506] PGD 0 P4D 0 [ 88.799829] Oops: Oops: 0000 [#1] SMP NOPTI [ 88.800569] CPU: 14 UID: 0 PID: 2053 Comm: b371744477 Not tainted 6.12.0-rc1-virtme torvalds#1117 [ 88.801107] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014 [ 88.801779] RIP: 0010:sfq_dequeue (net/sched/sch_sfq.c:272 net/sched/sch_sfq.c:499) sch_sfq [ 88.802544] Code: 0f b7 50 12 48 8d 04 d5 00 00 00 00 48 89 d6 48 29 d0 48 8b 91 c0 01 00 00 48 c1 e0 03 48 01 c2 66 83 7a 1a 00 7e c0 48 8b 3a <4c> 8b 07 4c 89 02 49 89 50 08 48 c7 47 08 00 00 00 00 48 c7 07 00 All code ======== 0: 0f b7 50 12 movzwl 0x12(%rax),%edx 4: 48 8d 04 d5 00 00 00 lea 0x0(,%rdx,8),%rax b: 00 c: 48 89 d6 mov %rdx,%rsi f: 48 29 d0 sub %rdx,%rax 12: 48 8b 91 c0 01 00 00 mov 0x1c0(%rcx),%rdx 19: 48 c1 e0 03 shl $0x3,%rax 1d: 48 01 c2 add %rax,%rdx 20: 66 83 7a 1a 00 cmpw $0x0,0x1a(%rdx) 25: 7e c0 jle 0xffffffffffffffe7 27: 48 8b 3a mov (%rdx),%rdi 2a:* 4c 8b 07 mov (%rdi),%r8 <-- trapping instruction 2d: 4c 89 02 mov %r8,(%rdx) 30: 49 89 50 08 mov %rdx,0x8(%r8) 34: 48 c7 47 08 00 00 00 movq $0x0,0x8(%rdi) 3b: 00 3c: 48 rex.W 3d: c7 .byte 0xc7 3e: 07 (bad) ... 
Code starting with the faulting instruction =========================================== 0: 4c 8b 07 mov (%rdi),%r8 3: 4c 89 02 mov %r8,(%rdx) 6: 49 89 50 08 mov %rdx,0x8(%r8) a: 48 c7 47 08 00 00 00 movq $0x0,0x8(%rdi) 11: 00 12: 48 rex.W 13: c7 .byte 0xc7 14: 07 (bad) ... [ 88.803721] RSP: 0018:ffff9a1f892b7d58 EFLAGS: 00000206 [ 88.804032] RAX: 0000000000000000 RBX: ffff9a1f8420c800 RCX: ffff9a1f8420c800 [ 88.804560] RDX: ffff9a1f81bc1440 RSI: 0000000000000000 RDI: 0000000000000000 [ 88.805056] RBP: ffffffffc04bb0e0 R08: 0000000000000001 R09: 00000000ff7f9a1f [ 88.805473] R10: 000000000001001b R11: 0000000000009a1f R12: 0000000000000140 [ 88.806194] R13: 0000000000000001 R14: ffff9a1f886df400 R15: ffff9a1f886df4ac [ 88.806734] FS: 00007f445601a740(0000) GS:ffff9a2e7fd80000(0000) knlGS:0000000000000000 [ 88.807225] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 88.807672] CR2: 0000000000000000 CR3: 000000050cc46000 CR4: 00000000000006f0 [ 88.808165] Call Trace: [ 88.808459] <TASK> [ 88.808710] ? __die (arch/x86/kernel/dumpstack.c:421 arch/x86/kernel/dumpstack.c:434) [ 88.809261] ? page_fault_oops (arch/x86/mm/fault.c:715) [ 88.809561] ? exc_page_fault (./arch/x86/include/asm/irqflags.h:26 ./arch/x86/include/asm/irqflags.h:87 ./arch/x86/include/asm/irqflags.h:147 arch/x86/mm/fault.c:1489 arch/x86/mm/fault.c:1539) [ 88.809806] ? asm_exc_page_fault (./arch/x86/include/asm/idtentry.h:623) [ 88.810074] ? 
sfq_dequeue (net/sched/sch_sfq.c:272 net/sched/sch_sfq.c:499) sch_sfq [ 88.810411] sfq_reset (net/sched/sch_sfq.c:525) sch_sfq [ 88.810671] qdisc_reset (./include/linux/skbuff.h:2135 ./include/linux/skbuff.h:2441 ./include/linux/skbuff.h:3304 ./include/linux/skbuff.h:3310 net/sched/sch_generic.c:1036) [ 88.810950] tbf_reset (./include/linux/timekeeping.h:169 net/sched/sch_tbf.c:334) sch_tbf [ 88.811208] qdisc_reset (./include/linux/skbuff.h:2135 ./include/linux/skbuff.h:2441 ./include/linux/skbuff.h:3304 ./include/linux/skbuff.h:3310 net/sched/sch_generic.c:1036) [ 88.811484] netif_set_real_num_tx_queues (./include/linux/spinlock.h:396 ./include/net/sch_generic.h:768 net/core/dev.c:2958) [ 88.811870] __tun_detach (drivers/net/tun.c:590 drivers/net/tun.c:673) [ 88.812271] tun_chr_close (drivers/net/tun.c:702 drivers/net/tun.c:3517) [ 88.812505] __fput (fs/file_table.c:432 (discriminator 1)) [ 88.812735] task_work_run (kernel/task_work.c:230) [ 88.813016] do_exit (kernel/exit.c:940) [ 88.813372] ? trace_hardirqs_on (kernel/trace/trace_preemptirq.c:58 (discriminator 4)) [ 88.813639] ? handle_mm_fault (./arch/x86/include/asm/irqflags.h:42 ./arch/x86/include/asm/irqflags.h:97 ./arch/x86/include/asm/irqflags.h:155 ./include/linux/memcontrol.h:1022 ./include/linux/memcontrol.h:1045 ./include/linux/memcontrol.h:1052 mm/memory.c:5928 mm/memory.c:6088) [ 88.813867] do_group_exit (kernel/exit.c:1070) [ 88.814138] __x64_sys_exit_group (kernel/exit.c:1099) [ 88.814490] x64_sys_call (??:?) 
[ 88.814791] do_syscall_64 (arch/x86/entry/common.c:52 (discriminator 1) arch/x86/entry/common.c:83 (discriminator 1)) [ 88.815012] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130) [ 88.815495] RIP: 0033:0x7f44560f1975 Fixes: 175f9c1 ("net_sched: Add size table for qdiscs") Reported-by: syzbot <[email protected]> Signed-off-by: Eric Dumazet <[email protected]> Cc: Daniel Borkmann <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
[ Upstream commit 08c8acc ] dcr_map is called in the previous if and therefore needs to be unmapped. Fixes: 1ff0fcf ("ibm_newemac: Fix new MAL feature handling") Signed-off-by: Rosen Penev <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
…one info [ Upstream commit fe4cd7e ] At btrfs_load_zone_info() we have an error path that is dereferencing the name of a device which is a RCU string but we are not holding a RCU read lock, which is incorrect. Fix this by using btrfs_err_in_rcu() instead of btrfs_err(). The problem is there since commit 08e11a3 ("btrfs: zoned: load zone's allocation offset"), back then at btrfs_load_block_group_zone_info() but then later on that code was factored out into the helper btrfs_load_zone_info() by commit 09a4672 ("btrfs: zoned: factor out per-zone logic from btrfs_load_block_group_zone_info"). Fixes: 08e11a3 ("btrfs: zoned: load zone's allocation offset") Reviewed-by: Johannes Thumshirn <[email protected]> Reviewed-by: Qu Wenruo <[email protected]> Reviewed-by: Naohiro Aota <[email protected]> Signed-off-by: Filipe Manana <[email protected]> Reviewed-by: David Sterba <[email protected]> Signed-off-by: David Sterba <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
…n_start [ Upstream commit 4d5c70e ] If hashing fails in sctp_listen_start(), the socket remains in the LISTENING state, even though it was not added to the hash table. This can lead to a scenario where a socket appears to be listening without actually being accessible. This patch ensures that if the hashing operation fails, the sk_state is set back to CLOSED before returning an error. Note that there is no need to undo the autobind operation if hashing fails, as the bind port can still be used for next listen() call on the same socket. Fixes: 76c6d98 ("sctp: add sock_reuseport for the sock in __sctp_hash_endpoint") Reported-by: Marcelo Ricardo Leitner <[email protected]> Signed-off-by: Xin Long <[email protected]> Acked-by: Marcelo Ricardo Leitner <[email protected]> Signed-off-by: David S. Miller <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
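The rollback described in the commit above can be modelled in userspace C. This is only a sketch under stated assumptions, not the kernel code: `struct fake_sock`, `fake_hash()` and `listen_start()` are hypothetical names, and only the state handling follows the commit message (state goes back to CLOSED on hash failure, the autobound port is kept).

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the socket states named in the commit. */
enum fake_state { FAKE_CLOSED, FAKE_LISTENING };

struct fake_sock {
	enum fake_state state;
	bool hashed;	/* true once the socket is in the hash table */
	bool bound;	/* true once autobind has picked a port */
};

/* Hypothetical hash step; fails with -ENOMEM when 'fail' is set. */
static int fake_hash(struct fake_sock *sk, bool fail)
{
	if (fail)
		return -ENOMEM;
	sk->hashed = true;
	return 0;
}

static int listen_start(struct fake_sock *sk, bool hash_fails)
{
	int err;

	sk->bound = true;		/* autobind */
	sk->state = FAKE_LISTENING;

	err = fake_hash(sk, hash_fails);
	if (err) {
		/* Undo the state change; the bound port is kept on purpose
		 * so a later listen() on the same socket can reuse it. */
		sk->state = FAKE_CLOSED;
		return err;
	}
	return 0;
}
```

After a failed call the model socket is CLOSED, still bound, and not hashed, so it never appears to be listening while unreachable.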
[ Upstream commit 0bfcb7b ] syzbot managed to call xt_cluster match via ebtables: WARNING: CPU: 0 PID: 11 at net/netfilter/xt_cluster.c:72 xt_cluster_mt+0x196/0x780 [..] ebt_do_table+0x174b/0x2a40 Module registers to NFPROTO_UNSPEC, but it assumes ipv4/ipv6 packet processing. As this is only useful to restrict locally terminating TCP/UDP traffic, register this for ipv4 and ipv6 family only. Pablo points out that this is a general issue, direct users of the set/getsockopt interface can call into targets/matches that were only intended for use with ip(6)tables. Check all UNSPEC matches and targets for similar issues: - matches and targets are fine except if they assume skb_network_header() is valid -- this is only true when called from inet layer: ip(6) stack pulls the ip/ipv6 header into linear data area. - targets that return XT_CONTINUE or other xtables verdicts must be restricted too, they are incompatible with the ebtables traverser, e.g. EBT_CONTINUE is a completely different value than XT_CONTINUE. Most matches/targets are changed to register for NFPROTO_IPV4/IPV6, as they are provided for use by ip(6)tables. The MARK target is also used by arptables, so register for NFPROTO_ARP too. While at it, bail out if connbytes fails to enable the corresponding conntrack family. This change passes the selftests in iptables.git. Reported-by: [email protected] Closes: https://lore.kernel.org/netfilter-devel/[email protected]/ Fixes: 0269ea4 ("netfilter: xtables: add cluster match") Signed-off-by: Florian Westphal <[email protected]> Co-developed-by: Pablo Neira Ayuso <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
[ Upstream commit 05ef705 ] We need to init l3mdev unconditionally, else main routing table is searched and incorrect result is returned unless strict (iif keyword) matching is requested. Next patch adds a selftest for this. Fixes: 2a8a7c0 ("netfilter: nft_fib: Fix for rpath check with VRF devices") Closes: https://bugzilla.netfilter.org/show_bug.cgi?id=1761 Signed-off-by: Florian Westphal <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
[ Upstream commit ac888d5 ] dst_entries_add() uses per-cpu data that might be freed at netns dismantle from ip6_route_net_exit() calling dst_entries_destroy() Before ip6_route_net_exit() can be called, we release all the dsts associated with this netns, via calls to dst_release(), which waits an rcu grace period before calling dst_destroy() dst_entries_add() use in dst_destroy() is racy, because dst_entries_destroy() could have been called already. Decrementing the number of dsts must happen sooner. Notes: 1) in CONFIG_XFRM case, dst_destroy() can call dst_release_immediate(child), this might also cause UAF if the child does not have DST_NOCOUNT set. IPSEC maintainers might take a look and see how to address this. 2) There is also discussion about removing this count of dst, which might happen in future kernels. Fixes: f886497 ("ipv4: fix dst race in sk_dst_get()") Closes: https://lore.kernel.org/lkml/CANn89iLCCGsP7SFn9HKpvnKu96Td4KD08xf7aGtiYgZnkjaL=w@mail.gmail.com/T/ Reported-by: Naresh Kamboju <[email protected]> Tested-by: Linux Kernel Functional Testing <[email protected]> Tested-by: Naresh Kamboju <[email protected]> Signed-off-by: Eric Dumazet <[email protected]> Cc: Xin Long <[email protected]> Cc: Steffen Klassert <[email protected]> Reviewed-by: Xin Long <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Paolo Abeni <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
[ Upstream commit 07cc7b0 ] Before commit addf9b9 ("net: rtnetlink: use rcu to free rtnl message handlers"), once rtnl_msg_handlers[protocol] was allocated, the following rtnl_register_module() for the same protocol never failed. However, after the commit, rtnl_msg_handler[protocol][msgtype] needs to be allocated in each rtnl_register_module(), so each call could fail. Many callers of rtnl_register_module() do not handle the returned error, and we need to add many error handlings. To handle that easily, let's add wrapper functions for bulk registration of rtnetlink message handlers. Signed-off-by: Kuniyuki Iwashima <[email protected]> Signed-off-by: Paolo Abeni <[email protected]> Stable-dep-of: 78b7b99 ("vxlan: Handle error of rtnl_register_module().") Signed-off-by: Sasha Levin <[email protected]>
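The idea of a bulk registration wrapper can be sketched as follows. This is a userspace model under stated assumptions, not the kernel's implementation: `struct handler`, `register_one()`, `unregister_one()` and `register_many()` are hypothetical names, and a failure is simulated with an out-of-range message type where the real code would fail on allocation.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_MSGTYPE 16

/* Hypothetical handler descriptor; the real one carries callbacks too. */
struct handler {
	int msgtype;
};

static bool registered[MAX_MSGTYPE];

/* Register a single handler; may fail, like a per-call allocation. */
static int register_one(const struct handler *h)
{
	if (h->msgtype < 0 || h->msgtype >= MAX_MSGTYPE)
		return -EINVAL;
	registered[h->msgtype] = true;
	return 0;
}

static void unregister_one(const struct handler *h)
{
	registered[h->msgtype] = false;
}

/* Register all handlers or none: on failure, undo the ones already done. */
static int register_many(const struct handler *handlers, size_t n)
{
	size_t i;
	int err;

	for (i = 0; i < n; i++) {
		err = register_one(&handlers[i]);
		if (err) {
			while (i > 0)
				unregister_one(&handlers[--i]);
			return err;
		}
	}
	return 0;
}
```

A caller then needs a single error check per table of handlers instead of one per registration, which is the all-or-nothing behaviour the later commits in this series rely on.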
[ Upstream commit 78b7b99 ] Since introduced, vxlan_vnifilter_init() has been ignoring the returned value of rtnl_register_module(), which could fail silently. Handling the error allows users to view a module as an all-or-nothing thing in terms of the rtnetlink functionality. This prevents syzkaller from reporting spurious errors from its tests, where OOM often occurs and module is automatically loaded. Let's handle the errors by rtnl_register_many(). Fixes: f9c4bb0 ("vxlan: vni filtering support on collect metadata device") Signed-off-by: Kuniyuki Iwashima <[email protected]> Reviewed-by: Nikolay Aleksandrov <[email protected]> Signed-off-by: Paolo Abeni <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
[ Upstream commit cba5e43 ] Since introduced, br_vlan_rtnl_init() has been ignoring the returned value of rtnl_register_module(), which could fail silently. Handling the error allows users to view a module as an all-or-nothing thing in terms of the rtnetlink functionality. This prevents syzkaller from reporting spurious errors from its tests, where OOM often occurs and module is automatically loaded. Let's handle the errors by rtnl_register_many(). Fixes: 8dcea18 ("net: bridge: vlan: add rtm definitions and dump support") Fixes: f26b296 ("net: bridge: vlan: add new rtm message support") Fixes: adb3ce9 ("net: bridge: vlan: add del rtm message support") Signed-off-by: Kuniyuki Iwashima <[email protected]> Acked-by: Nikolay Aleksandrov <[email protected]> Signed-off-by: Paolo Abeni <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
[ Upstream commit d517056 ] Since introduced, mctp has been ignoring the returned value of rtnl_register_module(), which could fail silently. Handling the error allows users to view a module as an all-or-nothing thing in terms of the rtnetlink functionality. This prevents syzkaller from reporting spurious errors from its tests, where OOM often occurs and module is automatically loaded. Let's handle the errors by rtnl_register_many(). Fixes: 583be98 ("mctp: Add device handling and netlink interface") Fixes: 831119f ("mctp: Add neighbour netlink interface") Fixes: 06d2f4c ("mctp: Add netlink route management") Signed-off-by: Kuniyuki Iwashima <[email protected]> Reviewed-by: Jeremy Kerr <[email protected]> Signed-off-by: Paolo Abeni <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
[ Upstream commit e39951d ] In commit af65bdf ("[NETLINK]: Switch cb_lock spinlock to mutex and allow to override it"), Patrick McHardy used a common mutex to protect both nlk->cb and the dump() operations. The override is used for rtnl dumps, registered with rtnl_register() and rtnl_register_module(). We want to be able to opt-out some dump() operations to not acquire RTNL, so we need to protect nlk->cb with a per socket mutex. This patch renames nlk->cb_def_mutex to nlk->nl_cb_mutex. The optional pointer to the mutex used to protect dump() call is stored in nlk->dump_cb_mutex. Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]> Stable-dep-of: 5be2062 ("mpls: Handle error of rtnl_register_module().") Signed-off-by: Sasha Levin <[email protected]>
[ Upstream commit 386520e ] Similarly to RTNL_FLAG_DOIT_UNLOCKED, this new flag allows dump operations registered via rtnl_register() or rtnl_register_module() to opt-out from RTNL protection. Signed-off-by: Eric Dumazet <[email protected]> Reviewed-by: Donald Hunter <[email protected]> Signed-off-by: David S. Miller <[email protected]> Stable-dep-of: 5be2062 ("mpls: Handle error of rtnl_register_module().") Signed-off-by: Sasha Levin <[email protected]>
[ Upstream commit e0f89d2 ] - Use for_each_netdev_dump() to no longer rely on net->dev_index_head hash table. - No longer care of net->dev_base_seq - Fix return value at the end of a dump, so that NLMSG_DONE can be appended to current skb, saving one recvmsg() system call. - No longer grab RTNL, RCU protection is enough, after adding one READ_ONCE(mdev->input_enabled) in mpls_netconf_fill_devconf() Signed-off-by: Eric Dumazet <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]> Stable-dep-of: 5be2062 ("mpls: Handle error of rtnl_register_module().") Signed-off-by: Sasha Levin <[email protected]>
[ Upstream commit 5be2062 ] Since introduced, mpls_init() has been ignoring the returned value of rtnl_register_module(), which could fail silently. Handling the error allows users to view a module as an all-or-nothing thing in terms of the rtnetlink functionality. This prevents syzkaller from reporting spurious errors from its tests, where OOM often occurs and module is automatically loaded. Let's handle the errors by rtnl_register_many(). Fixes: 03c0566 ("mpls: Netlink commands to add, remove, and dump routes") Signed-off-by: Kuniyuki Iwashima <[email protected]> Signed-off-by: Paolo Abeni <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
[ Upstream commit 58a4ff5 ] route_dumpit() already relies on RCU, RTNL is not needed. Also change return value at the end of a dump. This allows NLMSG_DONE to be appended to the current skb at the end of a dump, saving a couple of recvmsg() system calls. Signed-off-by: Eric Dumazet <[email protected]> Cc: Remi Denis-Courmont <[email protected]> Reviewed-by: Simon Horman <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]> Stable-dep-of: b5e837c ("phonet: Handle error of rtnl_register_module().") Signed-off-by: Sasha Levin <[email protected]>
[ Upstream commit b5e837c ] Before commit addf9b9 ("net: rtnetlink: use rcu to free rtnl message handlers"), once the first rtnl_register_module() allocated rtnl_msg_handlers[PF_PHONET], the following calls never failed. However, after the commit, rtnl_register_module() could fail silently to allocate rtnl_msg_handlers[PF_PHONET][msgtype] and requires error handling for each call. Handling the error allows users to view a module as an all-or-nothing thing in terms of the rtnetlink functionality. This prevents syzkaller from reporting spurious errors from its tests, where OOM often occurs and module is automatically loaded. Let's use rtnl_register_many() to handle the errors easily. Fixes: addf9b9 ("net: rtnetlink: use rcu to free rtnl message handlers") Signed-off-by: Kuniyuki Iwashima <[email protected]> Acked-by: Rémi Denis-Courmont <[email protected]> Signed-off-by: Paolo Abeni <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
[ Upstream commit 40dddd4 ] syzbot reported an issue in ppp_async_encode() [1] In this case, pppoe_sendmsg() is called with a zero size. Then ppp_async_encode() is called with an empty skb. BUG: KMSAN: uninit-value in ppp_async_encode drivers/net/ppp/ppp_async.c:545 [inline] BUG: KMSAN: uninit-value in ppp_async_push+0xb4f/0x2660 drivers/net/ppp/ppp_async.c:675 ppp_async_encode drivers/net/ppp/ppp_async.c:545 [inline] ppp_async_push+0xb4f/0x2660 drivers/net/ppp/ppp_async.c:675 ppp_async_send+0x130/0x1b0 drivers/net/ppp/ppp_async.c:634 ppp_channel_bridge_input drivers/net/ppp/ppp_generic.c:2280 [inline] ppp_input+0x1f1/0xe60 drivers/net/ppp/ppp_generic.c:2304 pppoe_rcv_core+0x1d3/0x720 drivers/net/ppp/pppoe.c:379 sk_backlog_rcv+0x13b/0x420 include/net/sock.h:1113 __release_sock+0x1da/0x330 net/core/sock.c:3072 release_sock+0x6b/0x250 net/core/sock.c:3626 pppoe_sendmsg+0x2b8/0xb90 drivers/net/ppp/pppoe.c:903 sock_sendmsg_nosec net/socket.c:729 [inline] __sock_sendmsg+0x30f/0x380 net/socket.c:744 ____sys_sendmsg+0x903/0xb60 net/socket.c:2602 ___sys_sendmsg+0x28d/0x3c0 net/socket.c:2656 __sys_sendmmsg+0x3c1/0x960 net/socket.c:2742 __do_sys_sendmmsg net/socket.c:2771 [inline] __se_sys_sendmmsg net/socket.c:2768 [inline] __x64_sys_sendmmsg+0xbc/0x120 net/socket.c:2768 x64_sys_call+0xb6e/0x3ba0 arch/x86/include/generated/asm/syscalls_64.h:308 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xcd/0x1e0 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f Uninit was created at: slab_post_alloc_hook mm/slub.c:4092 [inline] slab_alloc_node mm/slub.c:4135 [inline] kmem_cache_alloc_node_noprof+0x6bf/0xb80 mm/slub.c:4187 kmalloc_reserve+0x13d/0x4a0 net/core/skbuff.c:587 __alloc_skb+0x363/0x7b0 net/core/skbuff.c:678 alloc_skb include/linux/skbuff.h:1322 [inline] sock_wmalloc+0xfe/0x1a0 net/core/sock.c:2732 pppoe_sendmsg+0x3a7/0xb90 drivers/net/ppp/pppoe.c:867 sock_sendmsg_nosec net/socket.c:729 [inline] __sock_sendmsg+0x30f/0x380 net/socket.c:744 
____sys_sendmsg+0x903/0xb60 net/socket.c:2602 ___sys_sendmsg+0x28d/0x3c0 net/socket.c:2656 __sys_sendmmsg+0x3c1/0x960 net/socket.c:2742 __do_sys_sendmmsg net/socket.c:2771 [inline] __se_sys_sendmmsg net/socket.c:2768 [inline] __x64_sys_sendmmsg+0xbc/0x120 net/socket.c:2768 x64_sys_call+0xb6e/0x3ba0 arch/x86/include/generated/asm/syscalls_64.h:308 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xcd/0x1e0 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f CPU: 1 UID: 0 PID: 5411 Comm: syz.1.14 Not tainted 6.12.0-rc1-syzkaller-00165-g360c1f1f24c6 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/13/2024 Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-by: [email protected] Signed-off-by: Eric Dumazet <[email protected]> Reviewed-by: Simon Horman <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
[ Upstream commit 7d3fce8 ] syzbot found that slhc_remember() was missing checks against malicious packets [1]. slhc_remember() only checked the size of the packet was at least 20, which is not good enough. We need to make sure the packet includes the IPv4 and TCP header that are supposed to be carried. Add iph and th pointers to make the code more readable. [1] BUG: KMSAN: uninit-value in slhc_remember+0x2e8/0x7b0 drivers/net/slip/slhc.c:666 slhc_remember+0x2e8/0x7b0 drivers/net/slip/slhc.c:666 ppp_receive_nonmp_frame+0xe45/0x35e0 drivers/net/ppp/ppp_generic.c:2455 ppp_receive_frame drivers/net/ppp/ppp_generic.c:2372 [inline] ppp_do_recv+0x65f/0x40d0 drivers/net/ppp/ppp_generic.c:2212 ppp_input+0x7dc/0xe60 drivers/net/ppp/ppp_generic.c:2327 pppoe_rcv_core+0x1d3/0x720 drivers/net/ppp/pppoe.c:379 sk_backlog_rcv+0x13b/0x420 include/net/sock.h:1113 __release_sock+0x1da/0x330 net/core/sock.c:3072 release_sock+0x6b/0x250 net/core/sock.c:3626 pppoe_sendmsg+0x2b8/0xb90 drivers/net/ppp/pppoe.c:903 sock_sendmsg_nosec net/socket.c:729 [inline] __sock_sendmsg+0x30f/0x380 net/socket.c:744 ____sys_sendmsg+0x903/0xb60 net/socket.c:2602 ___sys_sendmsg+0x28d/0x3c0 net/socket.c:2656 __sys_sendmmsg+0x3c1/0x960 net/socket.c:2742 __do_sys_sendmmsg net/socket.c:2771 [inline] __se_sys_sendmmsg net/socket.c:2768 [inline] __x64_sys_sendmmsg+0xbc/0x120 net/socket.c:2768 x64_sys_call+0xb6e/0x3ba0 arch/x86/include/generated/asm/syscalls_64.h:308 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xcd/0x1e0 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f Uninit was created at: slab_post_alloc_hook mm/slub.c:4091 [inline] slab_alloc_node mm/slub.c:4134 [inline] kmem_cache_alloc_node_noprof+0x6bf/0xb80 mm/slub.c:4186 kmalloc_reserve+0x13d/0x4a0 net/core/skbuff.c:587 __alloc_skb+0x363/0x7b0 net/core/skbuff.c:678 alloc_skb include/linux/skbuff.h:1322 [inline] sock_wmalloc+0xfe/0x1a0 net/core/sock.c:2732 pppoe_sendmsg+0x3a7/0xb90 drivers/net/ppp/pppoe.c:867 
sock_sendmsg_nosec net/socket.c:729 [inline] __sock_sendmsg+0x30f/0x380 net/socket.c:744 ____sys_sendmsg+0x903/0xb60 net/socket.c:2602 ___sys_sendmsg+0x28d/0x3c0 net/socket.c:2656 __sys_sendmmsg+0x3c1/0x960 net/socket.c:2742 __do_sys_sendmmsg net/socket.c:2771 [inline] __se_sys_sendmmsg net/socket.c:2768 [inline] __x64_sys_sendmmsg+0xbc/0x120 net/socket.c:2768 x64_sys_call+0xb6e/0x3ba0 arch/x86/include/generated/asm/syscalls_64.h:308 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xcd/0x1e0 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f CPU: 0 UID: 0 PID: 5460 Comm: syz.2.33 Not tainted 6.12.0-rc2-syzkaller-00006-g87d6aab2389e #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/13/2024 Fixes: b5451d7 ("slip: Move the SLIP drivers") Reported-by: [email protected] Closes: https://lore.kernel.org/netdev/[email protected]/T/#u Signed-off-by: Eric Dumazet <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
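The kind of length validation the slhc_remember() commit describes can be sketched in plain C. This is an illustration under stated assumptions, not the patched driver code: `ipv4_tcp_headers_present()` is a hypothetical helper, and it checks only header lengths on a raw byte buffer (minimum IPv4 and TCP header sizes of 20 bytes each).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Returns true when buf holds a complete IPv4 header followed by a
 * complete TCP header. A bare "len >= 20" test is not sufficient.
 */
static bool ipv4_tcp_headers_present(const uint8_t *buf, size_t len)
{
	size_t ihl, thl;

	if (len < 20)
		return false;

	/* IPv4 header length: low nibble of byte 0, in 32-bit words. */
	ihl = (size_t)(buf[0] & 0x0f) * 4;
	if (ihl < 20 || len < ihl + 20)
		return false;

	/* TCP data offset: high nibble of byte 12 of the TCP header. */
	thl = (size_t)(buf[ihl + 12] >> 4) * 4;
	if (thl < 20 || len < ihl + thl)
		return false;

	return true;
}
```

A 20-byte buffer that claims a 20-byte IPv4 header passes the old size check but is rejected here, because no TCP header can follow it.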
[ Upstream commit b913c3f ] Currently IRQs are disabled on call_rcu() and then depending on the context: * If the CPU is in nocb mode: - If the callback is enqueued in the bypass list, IRQs are re-enabled implicitly by rcu_nocb_try_bypass() - If the callback is enqueued in the normal list, IRQs are re-enabled implicitly by __call_rcu_nocb_wake() * If the CPU is NOT in nocb mode, IRQs are re-enabled explicitly from call_rcu() This makes the code a bit hard to follow, especially as it interleaves with nocb locking. To make the IRQ flags coverage clearer and also in order to prepare for moving all the nocb enqueue code to its own function, always re-enable the IRQ flags explicitly from call_rcu(). Reviewed-by: Neeraj Upadhyay (AMD) <[email protected]> Signed-off-by: Frederic Weisbecker <[email protected]> Reviewed-by: Paul E. McKenney <[email protected]> Signed-off-by: Boqun Feng <[email protected]> Stable-dep-of: f7345cc ("rcu/nocb: Fix rcuog wake-up from offline softirq") Signed-off-by: Sasha Levin <[email protected]>
[ Upstream commit f7345cc ] After a CPU has set itself offline and before it eventually calls rcutree_report_cpu_dead(), there are still opportunities for callbacks to be enqueued, for example from a softirq. When that happens on NOCB, the rcuog wake-up is deferred through an IPI to an online CPU in order not to call into the scheduler and risk arming the RT-bandwidth after hrtimers have been migrated out and disabled. But performing a synchronized IPI from a softirq is buggy as reported in the following scenario: WARNING: CPU: 1 PID: 26 at kernel/smp.c:633 smp_call_function_single Modules linked in: rcutorture torture CPU: 1 UID: 0 PID: 26 Comm: migration/1 Not tainted 6.11.0-rc1-00012-g9139f93209d1 #1 Stopper: multi_cpu_stop+0x0/0x320 <- __stop_cpus+0xd0/0x120 RIP: 0010:smp_call_function_single <IRQ> swake_up_one_online __call_rcu_nocb_wake __call_rcu_common ? rcu_torture_one_read call_timer_fn __run_timers run_timer_softirq handle_softirqs irq_exit_rcu ? tick_handle_periodic sysvec_apic_timer_interrupt </IRQ> Fix this with forcing deferred rcuog wake up through the NOCB timer when the CPU is offline. The actual wake up will happen from rcutree_report_cpu_dead(). Reported-by: kernel test robot <[email protected]> Closes: https://lore.kernel.org/oe-lkp/[email protected] Fixes: 9139f93 ("rcu/nocb: Fix RT throttling hrtimer armed from offline CPU") Reviewed-by: "Joel Fernandes (Google)" <[email protected]> Signed-off-by: Frederic Weisbecker <[email protected]> Signed-off-by: Neeraj Upadhyay <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
[ Upstream commit 740329d ] Added a gpiochip compatible driver to control the 8 GPIOs of the MCP2200 by using the HID interface. Using GPIOs with alternative functions (GP0<->SSPND, GP1<->USBCFG, GP6<->RXLED, GP7<->TXLED) will reset the functions, if set (unset by default). The driver was tested while also using the UART of the chip. Setting and reading the GPIOs has no effect on the UART communication. However, a reset is triggered after the CONFIGURE command. If the GPIO Direction is constantly changed, this will affect the communication at low baud rates. This is a hardware problem of the MCP2200 and is not caused by the driver. Signed-off-by: Johannes Roith <[email protected]> Reviewed-by: Rahul Rameshbabu <[email protected]> Signed-off-by: Jiri Kosina <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
[ Upstream commit bd008ac ] Re-trying the power-on command on failure on all devices should not be a problem, drop the I2C_HID_QUIRK_SET_PWR_WAKEUP_DEV quirk and simply retry power-on on all devices. Reviewed-by: Douglas Anderson <[email protected]> Signed-off-by: Hans de Goede <[email protected]> Signed-off-by: Jiri Kosina <[email protected]> Stable-dep-of: 26dd6a5 ("HID: i2c-hid: Skip SET_POWER SLEEP for Cirque touchpad on system suspend") Signed-off-by: Sasha Levin <[email protected]>
[ Upstream commit 7d7a252 ] The quirks variable and the I2C_HID_QUIRK_ defines are never used / exported outside of the i2c-hid code, so renumber them to start at BIT(0) again. Reviewed-by: Douglas Anderson <[email protected]> Signed-off-by: Hans de Goede <[email protected]> Signed-off-by: Jiri Kosina <[email protected]> Stable-dep-of: 26dd6a5 ("HID: i2c-hid: Skip SET_POWER SLEEP for Cirque touchpad on system suspend") Signed-off-by: Sasha Levin <[email protected]>
[ Upstream commit 26dd6a5 ] There's a Cirque touchpad that wakes the system up without anything touching the touchpad. The input report is empty when this happens. The reason is stated in HID over I2C spec, 7.2.8.2: "If the DEVICE wishes to wake the HOST from its low power state, it can issue a wake by asserting the interrupt." This is fine if the OS can put the system back to suspend by identifying that the input wakeup count stays the same on resume, like Chrome OS Dark Resume [0]. But regular distros lack such a policy. Although the impact of this change on the touchpad's power consumption is minimal, other i2c-hid devices may depend on SLEEP to control power, so use a quirk to limit the scope of the change. [0] https://chromium.googlesource.com/chromiumos/platform2/+/HEAD/power_manager/docs/dark_resume.md Signed-off-by: Kai-Heng Feng <[email protected]> Reviewed-by: Douglas Anderson <[email protected]> Signed-off-by: Jiri Kosina <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
[ Upstream commit 08b50c6 ] A handful of buttons on the ROG Ally are not actually part of the xpad device and are instead keyboard keys (a typical use of the MCU that asus uses). We attach a group of F<num> key codes which aren't used much and which the handheld community has already accepted as defaults here. Signed-off-by: Luke D. Jones <[email protected]> Signed-off-by: Jiri Kosina <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
[ Upstream commit e901f10 ] Add init of the lightbar which is a small panel on the back of the ASUS ROG Z13 and uses the same MCU as keyboards. Signed-off-by: Luke D. Jones <[email protected]> Signed-off-by: Jiri Kosina <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
[ Upstream commit d1aa95e ] The new ASUS ROG Ally X functions almost exactly the same as the previous model, so we can use the same quirks. Signed-off-by: Luke D. Jones <[email protected]> Signed-off-by: Jiri Kosina <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
…dexing [ Upstream commit 2663d04 ] req->n_channels must be set before req->channels[] can be used. This patch fixes one of the issues encountered in [1]. [ 83.964255] UBSAN: array-index-out-of-bounds in net/mac80211/scan.c:364:4 [ 83.964258] index 0 is out of range for type 'struct ieee80211_channel *[]' [...] [ 83.964264] Call Trace: [ 83.964267] <TASK> [ 83.964269] dump_stack_lvl+0x3f/0xc0 [ 83.964274] __ubsan_handle_out_of_bounds+0xec/0x110 [ 83.964278] ieee80211_prep_hw_scan+0x2db/0x4b0 [ 83.964281] __ieee80211_start_scan+0x601/0x990 [ 83.964291] nl80211_trigger_scan+0x874/0x980 [ 83.964295] genl_family_rcv_msg_doit+0xe8/0x160 [ 83.964298] genl_rcv_msg+0x240/0x270 [...] [1] https://bugzilla.kernel.org/show_bug.cgi?id=218810 Co-authored-by: Kees Cook <[email protected]> Signed-off-by: Kees Cook <[email protected]> Signed-off-by: Kenton Groombridge <[email protected]> Link: https://msgid.link/[email protected] Signed-off-by: Johannes Berg <[email protected]> [Xiangyu: Modified to apply on 6.1.y and 6.6.y] Signed-off-by: Xiangyu Chen <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
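The ordering rule in the commit above (set the count before indexing the flexible array) can be illustrated with a small userspace model. This is a sketch under stated assumptions: `struct scan_req` and the `scan_req_*()` helpers are hypothetical, and the explicit bounds check stands in for the check the sanitizer performs against the count field.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical request with a flexible array bounded by n_channels. */
struct scan_req {
	size_t n_channels;	/* number of valid entries in channels[] */
	int channels[];
};

/* Allocate room for up to 'max' channels; n_channels starts at 0. */
static struct scan_req *scan_req_alloc(size_t max)
{
	return calloc(1, sizeof(struct scan_req) + max * sizeof(int));
}

/* Bounds-checked store: any index >= n_channels is rejected. */
static int scan_req_set(struct scan_req *req, size_t idx, int chan)
{
	if (idx >= req->n_channels)
		return -1;
	req->channels[idx] = chan;
	return 0;
}

/* Caller guarantees 'n' does not exceed the allocated capacity. */
static int scan_req_fill(struct scan_req *req, const int *src, size_t n)
{
	size_t i;

	req->n_channels = n;	/* set the count first ... */
	for (i = 0; i < n; i++)	/* ... then index the array */
		if (scan_req_set(req, i, src[i]))
			return -1;
	return 0;
}
```

Writing `channels[0]` while `n_channels` is still 0 fails in this model, which mirrors the "index 0 is out of range" report quoted in the commit.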
[ Upstream commit a017616 ] Consistently use CVL instead of Columbiaville, since CVL is already being used in all other sensor labels for the Intel N6000 card. Fixes: e198322 ("hwmon: intel-m10-bmc-hwmon: Add N6000 sensors") Signed-off-by: Peter Colberg <[email protected]> Reviewed-by: Michael Adler <[email protected]> Message-ID: <[email protected]> Signed-off-by: Guenter Roeck <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Add a configfs option for the USB gadget function: tx_batch_delay. When non-zero, will wait delay milliseconds for further packets to be transmitted, and potentially batch those packets into a single USB transfer. Intended for testing only; helps to exercise the host-side driver's packet parsing. Signed-off-by: Jeremy Kerr <[email protected]>
Fixes the following: * Endian-ness of the MCTP USB binding header (DMTF ID) * Getting the binding header from the URB. Tested: Tested on a hardware that supports MCTP over USB. MCTP Control commands work fine after fixes. Signed-off-by: Santosh Puranik <[email protected]>
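The endianness fix above can be illustrated with a minimal user-space sketch. It assumes the binding header is four bytes (a 16-bit ID sent most-significant byte first, one reserved byte, one length byte); that layout, the struct and the function name are hypothetical and not taken from this driver. The expected ID is passed in by the caller rather than hard-coded.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decoded header; the field layout is an assumption. */
struct usb_binding_hdr {
	uint16_t id;   /* host byte order after decoding */
	uint8_t rsvd;
	uint8_t len;
};

/*
 * Decode the header from the start of a raw transfer buffer.
 * Returns 0 on success, -1 if the buffer is too short or the ID
 * does not match the value the caller expects.
 */
int parse_binding_hdr(const uint8_t *buf, size_t buf_len,
		      uint16_t expected_id, struct usb_binding_hdr *out)
{
	assert(buf != NULL && out != NULL);

	if (buf_len < 4)
		return -1;

	/* Assemble the ID byte by byte so host endianness does not matter. */
	out->id = (uint16_t)(((unsigned int)buf[0] << 8) | buf[1]);
	out->rsvd = buf[2];
	out->len = buf[3];

	return out->id == expected_id ? 0 : -1;
}
```

Decoding byte by byte avoids casting the buffer to a struct pointer, which is where a host-endian read of a wire-order field typically goes wrong.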
Fix statistics to work with old kernels. Allow MCTP sub-class 0x02 to work around FPGA bug. Fixes JIRA https:// Signed-off-by: Santosh Puranik <[email protected]>
Set a Tx queue size of 1100 so that the net core layer can queue packets while our Tx URB is busy. Also remove commented code that implements new style of stats as it does not work with the kernel version we run. Signed-off-by: Santosh Puranik <[email protected]>
Set the MCTP netdev's parent device as the usbdev. This enables userspace to navigate to the USB device via sysfs/udev. Signed-off-by: Santosh Puranik <[email protected]>
This commit enables a second I2C interface between the BF ARM and the DPU BMC in order to enhance system security and serviceability. BMC side: I2C-0@10 : BMC Rx I2C-9@20 : BMC Tx DPU side: I2C-1@11 : DPU Tx I2C-9@30 : DPU Rx Tested: ``` BMC --> DPU root@dpu-bmc:~# ipmitool -I ipmb mc info Device ID : 48 Device Revision : 1 Firmware Revision : 1.00 IPMI Version : 2.0 Manufacturer ID : 33049 Manufacturer Name : NVIDIA Product ID : 5 (0x0005) Product Name : Bluefield3 ARM Device Available : yes Provides Device SDRs : no Additional Device Support : Sensor Device SDR Repository Device SEL Device FRU Inventory Device IPMB Event Receiver Chassis Device Aux Firmware Rev Info : 0x00 0x00 0x00 0x00 DPU --> BMC root@ldev-platform-12-244-oob:/home/ubuntu# sudo ipmitool mc info Device ID : 1 Device Revision : 1 Firmware Revision : 24.07 IPMI Version : 2.0 Manufacturer ID : 33049 Manufacturer Name : Unknown (0x8119) Product ID : 4 (0x0004) Product Name : Unknown (0x4) Device Available : no Provides Device SDRs : yes Additional Device Support : Sensor Device SDR Repository Device SEL Device FRU Inventory Device IPMB Event Receiver Chassis Device Aux Firmware Rev Info : 0x10 0x08 0x00 0x00 ``` Fixes nvbug https://redmine.mellanox.com/issues/3967235
After splitting the IPMB communication into 2 channels, we saw a major slowdown on I2C9 after a BMC reboot, the communication channel from BMC to DPU. Adding the multi-master options fixes the slowdown. Tested: Reboot the BMC, then: root@dpu-bmc:~# i2cdetect -y 9 0 1 2 3 4 5 6 7 8 9 a b c d e f 00: -- -- -- -- -- -- -- -- 10: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- 20: UU -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- 30: 30 -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- 40: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- 50: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- 60: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- 70: -- -- -- -- -- -- -- -- root@dpu-bmc:~# ipmitool -I ipmb fru print FRU Device Description : Builtin FRU Device (ID 0) Unknown FRU header version 0x30 FRU Device Description : Bluefield_MC Unknown FRU header version 0x30 FRU Device Description : update_timer (ID 0) Unknown FRU header version 0x30 FRU Device Description : fw_info (ID 1) Unknown FRU header version 0x42 FRU Device Description : nic_pci_dev_info (ID 2) Unknown FRU header version 0x44 FRU Device Description : cpuinfo (ID 3) Unknown FRU header version 0x41 FRU Device Description : ddr0_0_spd (ID 4) Unknown FRU header version 0x00 FRU Device Description : ddr0_1_spd (ID 5) Unknown FRU header version 0x00 FRU Device Description : ddr1_0_spd (ID 6) Unknown FRU header version 0x00 FRU Device Description : ddr1_1_spd (ID 7) Unknown FRU header version 0x00 FRU Device Description : emmc_info (ID 8) Unknown FRU header version 0x44 FRU Device Description : qsfp0_eeprom (ID 9) FRU Device Description : qsfp1_eeprom (ID 10) Device not present (Unspecified error) FRU Device Description : ip_addresses (ID 11) Unknown FRU header version 0x00 FRU Device Description : dimms_ce_ue (ID 12) Unknown FRU header version 0x00 FRU Device Description : eth0 (ID 13) Unknown FRU header version 0x4c FRU Device Description : eth1 (ID 14) Unknown FRU header version 0x4e FRU Device Description : bf_uid (ID 
15) Unknown FRU header version 0x32 FRU Device Description : eth_hw_counters (ID 16) Unknown FRU header version 0x4c FRU Device Description : oob0 (ID 17) Unknown FRU header version 0x4c FRU Device Description : bf_fru (ID 18) Unknown FRU header version 0x49 FRU Device Description : product_name (ID 19) Unknown FRU header version 0x42 FRU Device Description : dmidecode_info (ID 20) Unknown FRU header version 0x42 root@dpu-bmc:~# Output from IPMB is fast even right after the boot, i2cdetect scan is clean and quick. Fixes nvbug https://redmine.mellanox.com/issues/4082169 Signed-off-by: Alon Lapidus <[email protected]>
This adjustment is necessary to facilitate the bi-directional I2C traffic generated by either the IPMB or NCSI/PLDM protocols. Consequently, I2C buses i2c0 and i2c5 are set to operate in a multi-master mode. Fixes nvbug https://
Set the MCTP SPI netdev's parent to the SPI device. This establishes the relationship between the net device and the SPI device, which allows userspace to navigate to the SPI device via udev/sysfs. Tested on GB200NVL-HMC after enabling FMC SPI0 interface root@gb200nvl-bmc:~# udevadm info --query=property /sys/class/net/mctpspi0 DEVPATH=/devices/platform/ahb/1e620000.spiraw/spi_master/spi0/spi0.2/net/mctpspi0 INTERFACE=mctpspi0 IFINDEX=5 SUBSYSTEM=net USEC_INITIALIZED=32015878 ID_NET_DRIVER=mctpspi ID_PATH=platform-1e620000.spiraw-cs-2 :ID_PATH_TAG=platform-1e620000_spiraw-cs-2 ID_NET_LINK_FILE=/usr/lib/systemd/network/99-default.link ID_NET_NAME=mctpspi0 SYSTEMD_ALIAS=/sys/subsystem/net/devices/mctpspi0 TAGS=:systemd: CURRENT_TAGS=:systemd: Signed-off-by: Faizan Ali <[email protected]>
Change to remove the current limit on Tx URBs. We now allocate as many as needed when Tx'ing, free them when the URB completes and kill all anchored URBs upon disconnect. Fixes JIRA https:// Signed-off-by: Santosh Puranik <[email protected]>
Add polling to wait for the tx to complete to avoid the condition where the tx read/write pointer is equal, and the software tries to trigger the transaction which will lead to the tx trigger bit stuck at 1 and need the software to clear it. Signed-off-by: Billy Tsai <[email protected]> Change-Id: Ic00b0eb0802e98c76aa62236971cecf9e2c43637 (cherry picked from commit 4cf0aea) Co-authored-by: Billy Tsai <[email protected]>
Change: * Handle corner case of double ISR for slave read transaction Reason: * I2C_SLAVE_READ_PROCESSED was occurring before the first I2C_SLAVE_READ_REQUESTED. This was causing slave reads to start at an incorrect i2c_bus->slave[idx]. * Master was receiving data from offset+1 resulting in what appears as data corruption to the master. Fixes NVBUG https://nvbugspro.nvidia.com/bug/5030119 Signed-off-by: Ryan Chen <[email protected]> Signed-off-by: Ryan Russell <[email protected]> Signed-off-by: Chester Lin <[email protected]>
Made changes in the dts as well as in the Makefile to support igx over mgx Fixes nvbug https://nvbugspro.nvidia.com/bug/4898393
We are seeing high latency issues at Google if the packet size is 512 bytes. From our side, we also see many lost packets with the ping command if the packet length is exactly equal to the USB max packet size (512): root@gb200nvl-bmc:~# ping 172.31.13.251 -s 470 -c 6 PING 172.31.13.251 (172.31.13.251) 470(498) bytes of data. 478 bytes from 172.31.13.251: icmp_seq=1 ttl=64 time=2086 ms 478 bytes from 172.31.13.251: icmp_seq=4 ttl=64 time=2080 ms --- 172.31.13.251 ping statistics --- 6 packets transmitted, 2 received, 66.6667% packet loss, time 5206ms rtt min/avg/max/mdev = 2080.441/2083.149/2085.857/2.708 ms, pipe 3 We found that the USB device IN HMC is not sending the Zero Length Packet (ZLP) to notify the USB host (BMC) that the transfer has finished when the packet size is a multiple of the max packet size. The AST2600 datasheet does not mention ZLP support. This commit enables the software ZLP workaround in the u_ether.c driver. Setting the flag "quirk_zlp_not_supp" marks the device as not supporting ZLP, so the packet transmit function in u_ether.c appends one more byte to the end of the buffer in place of a ZLP. Here is the code snippet that adds one more byte to the buffer: https://gitlab-master.nvidia.com/dgx/bmc/linux/-/blob/HGXB-24.10-1_br/drivers/usb/gadget/function/u_ether.c#L572 Test: ping -s 470 -c 100 bmc_ip No packet loss found Signed-off-by: Willie Thai <[email protected]> Fixes nvbug https://nvbugspro.nvidia.com/bug/4959909
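The length adjustment described above can be sketched in plain C. This is an illustration under stated assumptions, not the u_ether.c code itself: the function name is hypothetical, and leaving empty payloads untouched is a choice made for this sketch.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Return the transfer length to submit for a payload of `len` bytes on a
 * controller that cannot emit a zero-length packet. When the payload is an
 * exact multiple of the endpoint's max packet size, one extra byte is added
 * so the final USB packet is short and the host sees the end of the transfer.
 * Empty payloads are left alone in this sketch.
 */
size_t tx_len_without_zlp(size_t len, size_t max_packet)
{
	assert(max_packet > 0);

	if (len > 0 && len % max_packet == 0)
		return len + 1;

	return len;
}
```

With a max packet size of 512, a 512-byte frame would be submitted as 513 bytes, while a 498-byte frame is submitted unchanged.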
Referring to mctp-usb implementation, set addr_len of the net device to 0 since it's a peer-to-peer communication for each mctp-spi interface. Test: 1. Change phy_addlen value in mctp-ctrl to 0. Then build/update the binary. 2. Check if mctp-spi0-ctrl runs correctly. 3. Use mctp-vdm-util to check mctp traffic ``` root@gb200nvl-bmc:~# mctp-vdm-util -t 10 -c background_copy_query_status teid = 10 Test command = background_copy_query_status TX: 00 00 16 47 80 01 09 01 05 RX: 00 00 16 47 00 01 09 01 00 01 ``` Fixes jira https:// Signed-off-by: Ting-Kai Chen <[email protected]>
Following the naming rule of SPI device: spi<bus_num>.<cs>, the interface name of mctp-spi takes the same bus number and chip select of the SPI device -> mctpspi<bus_num>_<cs> Example: SPI device: /sys/bus/spi/devices/spi0.2 -> mctp-spi intf: /sys/class/net/mctpspi0_2 Test: (in QEMU) 1. Check if the net device is registered ``` root@gb200nvl-bmc:~# ls -al /sys/class/net/ ... lrwxrwxrwx 1 root root 0 Mar 13 18:14 mctpspi0_2 -> ../../devices/platform/ahb/1e620000.spiraw/spi_master/spi0/spi0.2/net/mctpspi0_2 ``` 2. Setup mctp-spi ``` root@gb200nvl-bmc:~# mctp addr add 8 dev mctpspi0_2 root@gb200nvl-bmc:~# mctp route add 10 via mctpspi0_2 root@gb200nvl-bmc:~# mctp link set mctpspi0_2 up ``` 3. Check network interface is on ``` root@gb200nvl-bmc:~# mctp link dev lo index 1 address 0x00:00:00:00:00:00 net 1 mtu 65536 up dev mctpspi0_2 index 5 address 0x00 net 1 mtu 68 up root@gb200nvl-bmc:~# ip a ... 5: mctpspi0_2: <UP,LOWER_UP> mtu 68 qdisc pfifo_fast qlen 1100 link/[290] 00 brd 00 family 45 ???/0 scope global dynamic root@gb200nvl-bmc:~# ifconfig ... mctpspi0_2 Link encap:UNSPEC HWaddr 00-00-30-30-30-30-00-30-00-00-00-00-00-00-00-00 UP RUNNING MTU:68 Metric:1 RX packets:0 errors:0 dropped:0 overruns:0 frame:0 TX packets:0 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:1100 RX bytes:0 (0.0 B) TX bytes:0 (0.0 B) ``` Fixes jira https://jirasw.nvidia.com/browse/DGXOPENBMC-16209 Signed-off-by: Ting-Kai Chen <[email protected]>
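The naming rule above (spi<bus_num>.<cs> mapping to mctpspi<bus_num>_<cs>) can be sketched as a small formatting helper. The function name and the error convention are hypothetical; only the name pattern comes from the commit message.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Build the interface name "mctpspi<bus_num>_<cs>" into buf.
 * Returns 0 on success, -1 if the name does not fit in buf_len bytes.
 */
int mctp_spi_ifname(char *buf, size_t buf_len,
		    unsigned int bus_num, unsigned int cs)
{
	int n;

	assert(buf != NULL);

	n = snprintf(buf, buf_len, "mctpspi%u_%u", bus_num, cs);
	if (n < 0 || (size_t)n >= buf_len)
		return -1;  /* formatting error or truncated name */

	return 0;
}
```

For bus 0 and chip select 2 this yields mctpspi0_2, matching the sysfs path shown in the test output above.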
…ime tolerance On some devices/applications we noticed that sometimes their SMBus transactions could violate T-timeout criteria (<=25ms), which eventually breaks data messaging. For non-SMBus I2C traffic, the current maximum time (32ms) could be still not enough for some corner cases. Here we introduce a new DT property "tout-baseclk-div", which allows timeout base-clk adjustment so i2c controllers can configure a longer expiration time. (Suggested by ASPEED) Fixes nvbug https://nvbugspro.nvidia.com/bug/5168121 Signed-off-by: Chester Lin <[email protected]>
We missed some codes that seem caused by merging upstream codes. Add them back to align. The function codes in dev-6.6 are as follows. ``` void io_req_defer_failed(struct io_kiocb *req, s32 res) __must_hold(&ctx->uring_lock) { const struct io_cold_def *def = &io_cold_defs[req->opcode]; lockdep_assert_held(&req->ctx->uring_lock); req_set_fail(req); io_req_set_res(req, res, io_put_kbuf(req, IO_URING_F_UNLOCKED)); if (def->fail) def->fail(req); io_req_complete_defer(req); } ... static void io_queue_sqe_fallback(struct io_kiocb *req) __must_hold(&req->ctx->uring_lock) { if (unlikely(req->flags & REQ_F_FAIL)) { /* * We don't submit, fail them all, for that replace hardlinks * with normal links. Extra REQ_F_LINK is tolerated. */ req->flags &= ~REQ_F_HARDLINK; req->flags |= REQ_F_LINK; io_req_defer_failed(req, req->cqe.res); } else { int ret = io_req_prep_async(req); if (unlikely(ret)) { io_req_defer_failed(req, ret); return; } if (unlikely(req->ctx->drain_active)) io_drain_req(req); else io_queue_iowq(req); } } ``` Fixes nvbug https:// Signed-off-by: Amy Chang <[email protected]>
Issue: mctp-usb-ctrl was not starting when the interface was set to down and then back up. Reason: The mctp-usb driver was not starting the queue when the device was opened. Fix: Start the mctp tx queue in the mctp_usb_open function. mctp_usb_open function is a callback from when the interface is set to up. fixes jira https://jirasw.nvidia.com/browse/DGXOPENBMC-16690 Before fix: ``` root@gb200nvl-hmc:~# systemctl status mctp-usb-ctrl@1-1 * [email protected] - MCTP USB control daemon Loaded: loaded (/usr/lib/systemd/system/[email protected]; disabled; preset: enabled) Drop-In: /usr/lib/systemd/system/[email protected] `-mctp-ctrl-hmc-usb.conf Active: active (running) since Fri 2025-05-02 17:16:56 UTC; 16min ago Main PID: 340 (mctp-usb-ctrl) CPU: 949ms CGroup: /system.slice/system-mctp\x2dusb\x2dctrl.slice/[email protected] |-340 /bin/sh /usr/bin/mctp-usb-ctrl -m 1 -t 3 -d 1 -v 1 -f /usr/share/mctp/mctp_cfg_usb.json -w 1-1 `-345 /usr/bin/mctp-ctrl -m 1 -t 3 -d 1 -v 1 -f /usr/share/mctp/mctp_cfg_usb.json -w 1-1 May 02 17:17:05 gb200nvl-hmc mctp-usb-ctrl[345]: Registering object '/xyz/openbmc_project/mctp/0/26' for UnixSocket: 26 May 02 17:17:05 gb200nvl-hmc mctp-usb-ctrl[345]: Registering object '/xyz/openbmc_project/mctp/0/26' for Binding: 26 May 02 17:17:05 gb200nvl-hmc mctp-usb-ctrl[345]: Registering object '/xyz/openbmc_project/mctp/0/26' for Enable: 26 May 02 17:17:05 gb200nvl-hmc mctp-usb-ctrl[345]: Registering object '/xyz/openbmc_project/mctp/0/27' for Endpoint: 27 May 02 17:17:05 gb200nvl-hmc mctp-usb-ctrl[345]: Registering object '/xyz/openbmc_project/mctp/0/27' for UUID: 27 May 02 17:17:05 gb200nvl-hmc mctp-usb-ctrl[345]: Registering object '/xyz/openbmc_project/mctp/0/27' for UnixSocket: 27 May 02 17:17:05 gb200nvl-hmc mctp-usb-ctrl[345]: Registering object '/xyz/openbmc_project/mctp/0/27' for Binding: 27 May 02 17:17:05 gb200nvl-hmc mctp-usb-ctrl[345]: Registering object '/xyz/openbmc_project/mctp/0/27' for Enable: 27 May 02 17:17:05 gb200nvl-hmc 
mctp-usb-ctrl[345]: Getting D-Bus file descriptors May 02 17:17:05 gb200nvl-hmc mctp-usb-ctrl[345]: mctp_ctrl_sdbus_init: Entering polling loop root@gb200nvl-hmc:~# systemctl stop mctp-usb-ctrl@1-1 root@gb200nvl-hmc:~# mctp-list-eps Transport| EID| UUID|Supported MCTP Types SPI| 10| "f72d6fa0-5675-11ed-9b6a-0242ac120002"| 1 5 126 127 root@gb200nvl-hmc:~# ip link set mctpusb0 down root@gb200nvl-hmc:~# ip link set mctpusb0 up root@gb200nvl-hmc:~# mctp-list-eps Transport| EID| UUID|Supported MCTP Types SPI| 10| "f72d6fa0-5675-11ed-9b6a-0242ac120002"| 1 5 126 127 root@gb200nvl-hmc:~# systemctl status mctp-usb-ctrl@1-1 * [email protected] - MCTP USB control daemon Loaded: loaded (/usr/lib/systemd/system/[email protected]; disabled; preset: enabled) Drop-In: /usr/lib/systemd/system/[email protected] `-mctp-ctrl-hmc-usb.conf Active: inactive (dead) since Fri 2025-05-02 17:33:40 UTC; 56s ago Duration: 16min 44.509s Process: 340 ExecStart=/usr/bin/mctp-usb-ctrl $MCTP_USB_CTRL_OPTS -w 1-1 (code=killed, signal=TERM) Main PID: 340 (code=killed, signal=TERM) CPU: 982ms May 02 17:33:40 gb200nvl-hmc mctp-usb-ctrl[345]: mctp_msg_types_delete_all: Deleting msg type entry: EID[18] May 02 17:33:40 gb200nvl-hmc mctp-usb-ctrl[345]: mctp_msg_types_delete_all: Deleting msg type entry: EID[19] May 02 17:33:40 gb200nvl-hmc mctp-usb-ctrl[345]: mctp_msg_types_delete_all: Deleting msg type entry: EID[20] May 02 17:33:40 gb200nvl-hmc mctp-usb-ctrl[345]: mctp_msg_types_delete_all: Deleting msg type entry: EID[21] May 02 17:33:40 gb200nvl-hmc mctp-usb-ctrl[345]: mctp_msg_types_delete_all: Deleting msg type entry: EID[24] May 02 17:33:40 gb200nvl-hmc mctp-usb-ctrl[345]: mctp_msg_types_delete_all: Deleting msg type entry: EID[26] May 02 17:33:40 gb200nvl-hmc mctp-usb-ctrl[345]: mctp_msg_types_delete_all: Deleting msg type entry: EID[27] May 02 17:33:40 gb200nvl-hmc systemd[1]: Stopping MCTP USB control daemon... May 02 17:33:40 gb200nvl-hmc systemd[1]: [email protected]: Deactivated successfully. 
May 02 17:33:40 gb200nvl-hmc systemd[1]: Stopped MCTP USB control daemon. root@gb200nvl-hmc:~# systemctl start mctp-usb-ctrl@1-1 root@gb200nvl-hmc:~# systemctl status mctp-usb-ctrl@1-1 x [email protected] - MCTP USB control daemon Loaded: loaded (/usr/lib/systemd/system/[email protected]; disabled; preset: enabled) Drop-In: /usr/lib/systemd/system/[email protected] `-mctp-ctrl-hmc-usb.conf Active: failed (Result: exit-code) since Fri 2025-05-02 17:35:00 UTC; 5s ago Duration: 6.053s Process: 19787 ExecStart=/usr/bin/mctp-usb-ctrl $MCTP_USB_CTRL_OPTS -w 1-1 (code=exited, status=1/FAILURE) Main PID: 19787 (code=exited, status=1/FAILURE) CPU: 24ms May 02 17:34:59 gb200nvl-hmc systemd[1]: [email protected]: Main process exited, code=exited, status=1/FAILURE May 02 17:34:59 gb200nvl-hmc systemd[1]: [email protected]: Failed with result 'exit-code'. May 02 17:35:00 gb200nvl-hmc systemd[1]: [email protected]: Scheduled restart job, restart counter is at 2. May 02 17:35:00 gb200nvl-hmc systemd[1]: [email protected]: Start request repeated too quickly. May 02 17:35:00 gb200nvl-hmc systemd[1]: [email protected]: Failed with result 'exit-code'. May 02 17:35:00 gb200nvl-hmc systemd[1]: Failed to start MCTP USB control daemon. 
root@gb200nvl-hmc:~# ``` After fix: ``` root@vhmc:~# systemctl status mctp-usb-ctrl@1-1 * [email protected] - MCTP USB control daemon Loaded: loaded (/usr/lib/systemd/system/[email protected]; enabled; preset: enabled) Drop-In: /usr/lib/systemd/system/[email protected] `-mctp-ctrl-hmc-usb.conf Active: active (running) since Fri 2025-05-02 21:11:30 UTC; 1min 10s ago Process: 266 ExecStartPre=/usr/bin/vhmc-init.sh (code=exited, status=0/SUCCESS) Main PID: 333 (mctp-usb-ctrl) CPU: 608ms CGroup: /system.slice/system-mctp\x2dusb\x2dctrl.slice/[email protected] |-333 /bin/sh /usr/bin/mctp-usb-ctrl -m 1 -t 3 -d 1 -v 1 -f /usr/share/mctp/mctp_cfg_usb.json -w 1-1 `-336 /usr/bin/mctp-ctrl -m 1 -t 3 -d 1 -v 1 -f /usr/share/mctp/mctp_cfg_usb.json -w 1-1 May 02 21:11:39 vhmc mctp-usb-ctrl[336]: Registering object '/xyz/openbmc_project/mctp/0/26' for UnixSocket: 26 May 02 21:11:39 vhmc mctp-usb-ctrl[336]: Registering object '/xyz/openbmc_project/mctp/0/26' for Binding: 26 May 02 21:11:39 vhmc mctp-usb-ctrl[336]: Registering object '/xyz/openbmc_project/mctp/0/26' for Enable: 26 May 02 21:11:39 vhmc mctp-usb-ctrl[336]: Registering object '/xyz/openbmc_project/mctp/0/27' for Endpoint: 27 May 02 21:11:39 vhmc mctp-usb-ctrl[336]: Registering object '/xyz/openbmc_project/mctp/0/27' for UUID: 27 May 02 21:11:39 vhmc mctp-usb-ctrl[336]: Registering object '/xyz/openbmc_project/mctp/0/27' for UnixSocket: 27 May 02 21:11:39 vhmc mctp-usb-ctrl[336]: Registering object '/xyz/openbmc_project/mctp/0/27' for Binding: 27 May 02 21:11:39 vhmc mctp-usb-ctrl[336]: Registering object '/xyz/openbmc_project/mctp/0/27' for Enable: 27 May 02 21:11:39 vhmc mctp-usb-ctrl[336]: Getting D-Bus file descriptors May 02 21:11:39 vhmc mctp-usb-ctrl[336]: mctp_ctrl_sdbus_init: Entering polling loop root@vhmc:~# systemctl stop mctp-usb-ctrl@1-1 root@vhmc:~# ip link set mctpusb0 down root@vhmc:~# ip link set mctpusb0 up root@vhmc:~# systemctl start mctp-usb-ctrl@1-1 root@vhmc:~# systemctl status mctp-usb-ctrl@1-1 
* [email protected] - MCTP USB control daemon Loaded: loaded (/usr/lib/systemd/system/[email protected]; enabled; preset: enabled) Drop-In: /usr/lib/systemd/system/[email protected] `-mctp-ctrl-hmc-usb.conf Active: active (running) since Fri 2025-05-02 21:18:02 UTC; 16s ago Main PID: 15105 (mctp-usb-ctrl) CPU: 356ms CGroup: /system.slice/system-mctp\x2dusb\x2dctrl.slice/[email protected] |-15105 /bin/sh /usr/bin/mctp-usb-ctrl -m 1 -t 3 -d 1 -v 1 -f /usr/share/mctp/mctp_cfg_usb.json -w 1-1 `-15106 /usr/bin/mctp-ctrl -m 1 -t 3 -d 1 -v 1 -f /usr/share/mctp/mctp_cfg_usb.json -w 1-1 May 02 21:18:10 vhmc mctp-usb-ctrl[15106]: Registering object '/xyz/openbmc_project/mctp/0/26' for UnixSocket: 26 May 02 21:18:10 vhmc mctp-usb-ctrl[15106]: Registering object '/xyz/openbmc_project/mctp/0/26' for Binding: 26 May 02 21:18:10 vhmc mctp-usb-ctrl[15106]: Registering object '/xyz/openbmc_project/mctp/0/26' for Enable: 26 May 02 21:18:10 vhmc mctp-usb-ctrl[15106]: Registering object '/xyz/openbmc_project/mctp/0/27' for Endpoint: 27 May 02 21:18:10 vhmc mctp-usb-ctrl[15106]: Registering object '/xyz/openbmc_project/mctp/0/27' for UUID: 27 May 02 21:18:10 vhmc mctp-usb-ctrl[15106]: Registering object '/xyz/openbmc_project/mctp/0/27' for UnixSocket: 27 May 02 21:18:10 vhmc mctp-usb-ctrl[15106]: Registering object '/xyz/openbmc_project/mctp/0/27' for Binding: 27 May 02 21:18:10 vhmc mctp-usb-ctrl[15106]: Registering object '/xyz/openbmc_project/mctp/0/27' for Enable: 27 May 02 21:18:10 vhmc mctp-usb-ctrl[15106]: Getting D-Bus file descriptors May 02 21:18:10 vhmc mctp-usb-ctrl[15106]: mctp_ctrl_sdbus_init: Entering polling loop root@vhmc:~# ```
Fix an issue that timeout timer could only be triggered once. When a slave timeout interrupt triggered, the irq handler must reload the number of timer ticks/cycles in ICC04[28:24] register to re-activate the timeout timer for the next timeout event. Since the commit 0d1fdaa, the i2c_bus->timeout stores a real time value (unit: us) rather than number of timer cycles (unit: 1 cycle of Timeout Base Clock), which can not be used by AST2600_I2CC_TTIMEOUT() directly. Otherwise it can fill a wrong number or 0 (= timer disabled) in ICC04[28:24] and cause timeout timer not functioning. In driver init, the number of cycles is calculated in tout_ticks so let's use this variable as a struct member in i2c_bus rather than a local variable so the irq handler doesn't have to re-calculate the number while reloading ICC04[28:24]. Fixes 0d1fdaa ("i2c-ast2600: a DT property to adjust tout base-clk ...") Fixes nvbug https://nvbugspro.nvidia.com/bug/5302873 Signed-off-by: Chester Lin <[email protected]>
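The unit mismatch described above can be sketched as follows. Only the field position ICC04[28:24] comes from the commit message; the helper names, the cycle period parameter, the round-up policy and the masking behaviour of the field macro are assumptions for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define TOUT_FIELD_SHIFT 24
#define TOUT_FIELD_MAX   0x1fu  /* five bits: [28:24] */

/*
 * Convert a timeout in microseconds into timer cycles, rounding up and
 * clamping to the largest value the five-bit field can hold.
 * cycle_us is the (assumed) period of one Timeout Base Clock cycle.
 */
uint32_t tout_us_to_ticks(uint32_t timeout_us, uint32_t cycle_us)
{
	uint32_t ticks;

	assert(cycle_us > 0);

	ticks = timeout_us / cycle_us + (timeout_us % cycle_us != 0);
	return ticks > TOUT_FIELD_MAX ? TOUT_FIELD_MAX : ticks;
}

/* Place a cycle count into bits [28:24] of a register value. */
uint32_t tout_field(uint32_t ticks)
{
	return (ticks & TOUT_FIELD_MAX) << TOUT_FIELD_SHIFT;
}
```

If a microsecond value were passed to tout_field() directly, only its low five bits would survive: 32000 (0x7D00) would become 0, which the commit message describes as "timer disabled". Computing the cycle count once at init and reusing it in the irq handler avoids that.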
Fixes jira https:// Signed-off-by: Ting-Kai Chen <[email protected]>
…lock read During ipmi ssif test, some i2c-tegra driver warnings can be observed in BaseOS dmesg, which causes ipmi-ssif not working sometimes. This was caused by wrong data delivery in incorrect SMBus block w/r transactions, which simply filled the byte count field with a "zero". To follow SMBus spec [the byte-count cannot be 0], we should still return at least 1 but not 0 in error cases since the byte is still transferred by the i2c bus driver. dmesg ===== [47210.237047] ipmi-ssif-host 0-0010: Warn: Unknown SMBus write command=0x3 [47210.245648] ipmi-ssif-host 0-0010: Warn: on_stop_event unexpected SLAVE STOP in state=SSIF_ABORTING [47210.259838] ipmi-ssif-host 0-0010: Warn: on_stop_event unexpected SLAVE STOP in state=SSIF_ABORTING i2c_slave event trace ===================== ... snip ... dbus-broker-262 [000] d.h.. 47210.214994: i2c_slave: i2c-0 a=010 ret=0 WR_RCV [01] dbus-broker-262 [000] d.h.. 47210.215008: i2c_slave: i2c-0 a=010 ret=0 WR_RCV [00] dbus-broker-262 [000] d.h.. 47210.215015: i2c_slave: i2c-0 a=010 ret=0 WR_RCV [00] dbus-broker-262 [000] d.h.. 47210.215029: i2c_slave: i2c-0 a=010 ret=0 WR_RCV [d5] dbus-broker-262 [000] dnh.. 47210.215390: i2c_slave: i2c-0 a=010 ret=0 STOP [] dbus-broker-262 [000] dnh.. 47210.215397: i2c_slave: i2c-0 a=010 ret=0 WR_REQ [] <idle>-0 [000] d.h.. 47210.236845: i2c_slave: i2c-0 a=010 ret=0 WR_RCV [03] <== SSIF read (0x3) <idle>-0 [000] d.h.. 47210.244582: i2c_slave: i2c-0 a=010 ret=0 WR_RCV [03] <== !! duplicated // !! A duplicated WR_RCV [03] with SSIF-READ command 0x3 arrived. // When a SSIF-READ command (0x3) is received, the next packet should be RD_REQ for starting read transaction. // However, a second WR_REV [] arrived so SSIF state mismatched and turned into SSIF_ABORTING <idle>-0 [000] d.h.. 
47210.244584: i2c_slave: i2c-0 a=010 ret=0 RD_REQ [00] <= returned byte-length = 0 // Since ssif-state was in SSIF_ABORTING, ssif_bmc driver returned the "smbus length" byte with // 0 to Host/i2c-tegra, and then i2c-tegra crashed with lots of "I2C transfer timed out" // warnings in BaseOS since all the follow RD_PRO bytes were all 0. rcu_sched-16 [000] d.h.. 47210.244714: i2c_slave: i2c-0 a=010 ret=0 RD_PRO [00] systemd-journal-179 [000] d.h.. 47210.244830: i2c_slave: i2c-0 a=010 ret=0 RD_PRO [00] systemd-journal-179 [000] d.h.. 47210.244947: i2c_slave: i2c-0 a=010 ret=0 RD_PRO [00] systemd-journal-179 [000] d.h.. 47210.245062: i2c_slave: i2c-0 a=010 ret=0 RD_PRO [00] systemd-journal-179 [000] d.h.. 47210.245176: i2c_slave: i2c-0 a=010 ret=0 RD_PRO [00] ipmid-687 [000] d.h.. 47210.245291: i2c_slave: i2c-0 a=010 ret=0 RD_PRO [00] ... snip ... Fixes nvbug https://nvbugspro.nvidia.com/bug/5341543 Fixes nvbug https://nvbugspro.nvidia.com/bug/5168121 Signed-off-by: Chester Lin <[email protected]>
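The byte-count rule from the commit above can be sketched as a tiny helper. The function name and the `aborting` flag are hypothetical stand-ins for the driver's state; only the rule that the count must never be 0 comes from the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Choose the byte-count value returned as the first byte of an SMBus
 * block read. In error cases, or when there is no payload, report 1
 * rather than 0, since a byte is still transferred on the bus.
 */
uint8_t block_read_count(bool aborting, uint8_t payload_len)
{
	uint8_t count = (aborting || payload_len == 0) ? 1 : payload_len;

	assert(count != 0);  /* invariant: the count byte is never zero */
	return count;
}
```

In the trace above, the RD_REQ reply carried [00]; with this rule the same reply would carry 01, so the host-side controller is never handed a zero-length block.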
fix for miss handle case * code[1] revised by Chester Lin to be compatible with NvBMC kernel code base. Signed-off-by: Ryan Chen <[email protected]> Change-Id: I351ee136d0b45095006dacc16b82370cbbb6f334 Link: AspeedTech-BMC@4dc1c9a [1] Acked-by: Chester Lin <[email protected]>
In a previous patch, we introduced two new cases to handle the updated i2c slave behavior on the AST2600. However, it turns out that both cases should be handled using the same logic. Additionally, we discovered that one of the cases was incorrectly invoking the virtual slave API. This patch corrects it and merges the handling into a unified case. * code[1] revised by Chester Lin to be compatible with NvBMC kernel code base. Signed-off-by: Tommy Huang <[email protected]> Change-Id: I1111be0b89f4836697c45f41c3cdcff5313d55b2 Link: AspeedTech-BMC@c9e404a [1] Signed-off-by: Chester Lin <[email protected]>
Combine the handling of SLAVE_PENDING, TX_NAK, and STOP into a single IRQ case. This change separates the original logic into two stages: 1. The STOP stage is handled in the current IRQ. 2. The next packet stage will be handled in the following IRQ. This refactoring improves clarity and reduces redundancy in the interrupt handling logic. * code[1] revised by Chester Lin to be compatible with NvBMC kernel code base. Signed-off-by: Tommy Huang <[email protected]> Change-Id: Iecafffd1907c8c3a160d70fb3a38de2dbf90562e Link: AspeedTech-BMC@bd4e6bd [1] Acked-by: Chester Lin <[email protected]>
In a multi-slave scenario, when SLAVE_PENDING and STOP condition occur, the driver needs to issue a STOP event for the previous slave device. To support this, add logic to record the previous slave index so that the STOP event can be delivered correctly to the appropriate slave context. * code[1] revised by Chester Lin to be compatible with NvBMC kernel code base. Signed-off-by: Tommy Huang <[email protected]> Change-Id: Ide44721dc9f4763304584eb8a90ba53e994763be Link: AspeedTech-BMC@34244dc [1] Acked-by: Chester Lin <[email protected]>
Include codeowners file with reviewers.