Topics

[PATCH v5 0/5] io_uring cmd for tx timestamps (13 replies)
[PATCH v5 1/5] net: timestamp: add helper returning skb's tx tstamp (6 replies)
[PATCH v5 2/5] io_uring/poll: introduce io_arm_apoll() (0 replies)
[PATCH v5 3/5] io_uring/cmd: allow multishot polled commands (0 replies)
[PATCH v5 4/5] io_uring: add mshot helper for posting CQE32 (0 replies)
[PATCH v5 5/5] io_uring/netcmd: add tx timestamping cmd support (0 replies)

Top-posted messages (sorry this section looks like a mess)

Jens Axboe <axboe@kernel.dk>
>>> On Tue, 17 Jun 2025 16:33:20 -0600 Jens Axboe wrote:
>>>> Can we put it in a separate branch and merge it into both? Otherwise
>>>> my branch will get a bunch of unrelated commits, and pulling an
>>>> unnamed sha is pretty iffy.
>>>
>>> Like this?
>>>
>>> https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git timestamp-for-jens
Jens Axboe <axboe@kernel.dk>
>>>> Sounds like we're good to queue this up for 6.17?
>>>
>>> I think so. Can I apply patch 1 off v6.16-rc1 and merge it into
>>> net-next? Hash will be 2410251cde0bac9f6, you can pull that across.
>>> LMK if that works.
Jens Axboe <axboe@kernel.dk>
>>> Vadim Fedorenko suggested adding an alternative API for receiving
>>> tx timestamps through io_uring. The series introduces an io_uring socket
>>> cmd for fetching tx timestamps, which is a polled multishot request,
>>> i.e. it internally polls the socket for POLLERR and posts timestamps
>>> when they arrive. For the API description see Patch 5.
>>>
>>> It reuses the existing timestamp infra and takes timestamps from the
>>> socket's error queue. For networking people the important parts are
>>> Patch 1, and io_uring_cmd_timestamp() from Patch 5 walking the error
>>> queue.
>>>
>>> It should be reasonable to take it through the io_uring tree once
>>> we have consensus, but let me know if there are any concerns.

Actual thread

Pavel Begunkov <asml.silence@gmail.com>
Vadim Fedorenko suggested adding an alternative API for receiving tx timestamps through io_uring. The series introduces an io_uring socket cmd for fetching tx timestamps, which is a polled multishot request, i.e. it internally polls the socket for POLLERR and posts timestamps when they arrive. For the API description see Patch 5. It reuses the existing timestamp infra and takes timestamps from the socket's error queue. For networking people the important parts are Patch 1, and io_uring_cmd_timestamp() from Patch 5 walking the error queue.
Jens Axboe <axboe@kernel.dk>
> [...]

Applied, thanks!

[2/5] io_uring/poll: introduce io_arm_apoll()
      commit: 162151889267089bb920609830c35f9272087c3f
[3/5] io_uring/cmd: allow multishot polled commands
      commit: b95575495948a81ac9b0110aa721ea061dd850d9
[4/5] io_uring: add mshot helper for posting CQE32
      commit: ac479eac22e81c0ff56c6bdb93fad787015149cc
[5/5] io_uring/netcmd: add tx timestamping cmd support
      commit: 9e4ed359b8efad0e8ad4510d8ad22bf0b060526a
Jens Axboe <axboe@kernel.dk>
Pavel, can you send in the liburing PR for these, please?
Pavel Begunkov <asml.silence@gmail.com>
It needs a minor clean-up; I'll send it by Monday -- Pavel Begunkov
It should be reasonable to take it through the io_uring tree once we have consensus, but let me know if there are any concerns.
Jens Axboe <axboe@kernel.dk>
Sounds like we're good to queue this up for 6.17?
Jakub Kicinski <kuba@kernel.org>
I think so. Can I apply patch 1 off v6.16-rc1 and merge it into net-next? Hash will be 2410251cde0bac9f6, you can pull that across. LMK if that works.
Jens Axboe <axboe@kernel.dk>
Can we put it in a separate branch and merge it into both? Otherwise my branch will get a bunch of unrelated commits, and pulling an unnamed sha is pretty iffy.
Jakub Kicinski <kuba@kernel.org>
Like this? https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git timestamp-for-jens
Jens Axboe <axboe@kernel.dk>
Perfect, thanks Jakub! -- Jens Axboe
Jens Axboe <axboe@kernel.dk>
Branch seems to be gone?
Pavel Begunkov <asml.silence@gmail.com>
Anything against just taking the hash from net-next, turning it into a branch and then merging that? E.g. https://github.com/isilence/linux.git io_uring-tx-timestamp-net-helpers -- Pavel Begunkov
Jakub Kicinski <kuba@kernel.org>
Ah, I deleted it when I was forwarding to Linus yesterday. I figured you had already pulled, sorry about that. I've pushed it out again now.
Jens Axboe <axboe@kernel.dk>
Thanks! Usually I would have, but I'm waiting on -rc3 for that branch. I'll reply here when I've pulled, should be on Sunday. -- Jens Axboe
Jens Axboe <axboe@kernel.dk>
I've pulled it now, thanks Jakub. -- Jens Axboe
v5: return SOF_TIMESTAMPING_TX_* from net helpers
v4: rename uapi flags, etc.
v3: Add a flag to distinguish sw vs hw timestamps. skb_get_tx_timestamp()
    from Patch 1 now returns the indication of that, and in Patch 5 it's
    converted into an io_uring CQE bit flag.
v2: remove (rx) false timestamp handling
    fix skipping already queued events on request submission
    constantize socket in a helper

Pavel Begunkov (5):
  net: timestamp: add helper returning skb's tx tstamp
  io_uring/poll: introduce io_arm_apoll()
  io_uring/cmd: allow multishot polled commands
  io_uring: add mshot helper for posting CQE32
  io_uring/netcmd: add tx timestamping cmd support

 include/net/sock.h            |  4 ++
 include/uapi/linux/io_uring.h | 16 +++++++
 io_uring/cmd_net.c            | 82 +++++++++++++++++++++++++++++++++++
 io_uring/io_uring.c           | 40 +++++++++++++++++
 io_uring/io_uring.h           |  1 +
 io_uring/poll.c               | 44 +++++++++++--------
 io_uring/poll.h               |  1 +
 io_uring/uring_cmd.c          | 34 +++++++++++++++
 io_uring/uring_cmd.h          |  7 +++
 net/socket.c                  | 46 ++++++++++++++++++++
 10 files changed, 258 insertions(+), 17 deletions(-)

-- 
2.49.0


Actual thread

Pavel Begunkov <asml.silence@gmail.com>
Add a helper function skb_get_tx_timestamp() that returns a tx timestamp associated with an error queue skb. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Willem de Bruijn <willemdebruijn.kernel@gmail.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Jason Xing <kerneljasonxing@gmail.com>
Reviewed-by: Jason Xing <kerneljasonxing@gmail.com> Thanks!
---
 include/net/sock.h |  4 ++++
 net/socket.c       | 46 ++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 50 insertions(+)

diff --git a/include/net/sock.h b/include/net/sock.h
index 92e7c1aae3cc..f5f5a9ad290b 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -2677,6 +2677,10 @@ void __sock_recv_timestamp(struct msghdr *msg, struct sock *sk,
 void __sock_recv_wifi_status(struct msghdr *msg, struct sock *sk,
 			     struct sk_buff *skb);
 
+bool skb_has_tx_timestamp(struct sk_buff *skb, const struct sock *sk);
+int skb_get_tx_timestamp(struct sk_buff *skb, struct sock *sk,
+			 struct timespec64 *ts);
+
 static inline void sock_recv_timestamp(struct msghdr *msg, struct sock *sk,
 				       struct sk_buff *skb)
 {
diff --git a/net/socket.c b/net/socket.c
index 9a0e720f0859..2cab805943c0 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -843,6 +843,52 @@ static void put_ts_pktinfo(struct msghdr *msg, struct sk_buff *skb,
 		 sizeof(ts_pktinfo), &ts_pktinfo);
 }
 
+bool skb_has_tx_timestamp(struct sk_buff *skb, const struct sock *sk)
+{
Willem de Bruijn <willemdebruijn.kernel@gmail.com>
I forgot to ask earlier, and it's not a reason for a respin. Is the only reason that skb is not const here skb_hwtstamps?
Pavel Begunkov <asml.silence@gmail.com>
Yes, and also get_timestamp() for skb_get_tx_timestamp(). It's easy to patch, but I was hoping we could merge it through the io_uring tree without deps on net-next and add const to the new helpers afterwards. It's definitely less trouble than orchestrating a separate branch otherwise.
Willem de Bruijn <willemdebruijn.kernel@gmail.com>
> than orchestrating a separate branch otherwise.

Makes sense. FWIW, it'd be fine to add const to the existing helpers in the meantime as long as the new functions stay non-const for now. Hope that works.
Willem de Bruijn <willemdebruijn.kernel@gmail.com>
Yeah, agreed no need to add such dependencies.
I can send a patch to make that container_of_const
Willem de Bruijn <willemdebruijn.kernel@gmail.com>
Just to follow up: container_of_const is not applicable here, as skb_shared_info is reached through a (cast) pointer into the skb linear area. So it's even simpler: the skb can be const even if what its member points to is not. This works fine:

-static inline struct skb_shared_hwtstamps *skb_hwtstamps(struct sk_buff *skb)
+static inline struct skb_shared_hwtstamps *skb_hwtstamps(const struct sk_buff *skb)
 {
 	return &skb_shinfo(skb)->hwtstamps;
 }

And the same for skb_zcopy, skb_zcopy_init, skb_zcopy_set, skb_zcopy_set_nouarg, skb_zcopy_is_nouarg, skb_zcopy_get_nouarg, skb_zcopy_clear, __skb_zcopy_downgrade_managed, skb_zcopy_downgrade_managed, skb_frag_ref and the ubuf_info_ops complete and link callbacks. But that's a lot of churn, especially if including ubuf_info implementations like io_uring. Not sure it's worth that.
+	const struct sock_exterr_skb *serr = SKB_EXT_ERR(skb);
+	u32 tsflags = READ_ONCE(sk->sk_tsflags);
+
+	if (serr->ee.ee_errno != ENOMSG ||
+	    serr->ee.ee_origin != SO_EE_ORIGIN_TIMESTAMPING)
+		return false;
+
+	/* software time stamp available and wanted */
+	if ((tsflags & SOF_TIMESTAMPING_SOFTWARE) && skb->tstamp)
+		return true;
+	/* hardware time stamps available and wanted */
+	return (tsflags & SOF_TIMESTAMPING_RAW_HARDWARE) &&
+	       skb_hwtstamps(skb)->hwtstamp;
+}
+
+int skb_get_tx_timestamp(struct sk_buff *skb, struct sock *sk,
+			 struct timespec64 *ts)
+{
+	u32 tsflags = READ_ONCE(sk->sk_tsflags);
+	ktime_t hwtstamp;
+	int if_index = 0;
+
+	if ((tsflags & SOF_TIMESTAMPING_SOFTWARE) &&
+	    ktime_to_timespec64_cond(skb->tstamp, ts))
+		return SOF_TIMESTAMPING_TX_SOFTWARE;
+
+	if (!(tsflags & SOF_TIMESTAMPING_RAW_HARDWARE) ||
+	    skb_is_swtx_tstamp(skb, false))
+		return -ENOENT;
+
+	if (skb_shinfo(skb)->tx_flags & SKBTX_HW_TSTAMP_NETDEV)
+		hwtstamp = get_timestamp(sk, skb, &if_index);
+	else
+		hwtstamp = skb_hwtstamps(skb)->hwtstamp;
+
+	if (tsflags & SOF_TIMESTAMPING_BIND_PHC)
+		hwtstamp = ptp_convert_timestamp(&hwtstamp,
+						 READ_ONCE(sk->sk_bind_phc));
+	if (!ktime_to_timespec64_cond(hwtstamp, ts))
+		return -ENOENT;
+
+	return SOF_TIMESTAMPING_TX_HARDWARE;
+}
+
 /*
  * called from sock_recv_timestamp() if sock_flag(sk, SOCK_RCVTSTAMP)
  */

-- 
2.49.0

Actual thread

Pavel Begunkov <asml.silence@gmail.com>
In preparation for allowing commands to do file polling, add a helper that takes the desired poll event mask and arms it for polling. We won't be able to use io_arm_poll_handler() with IORING_OP_URING_CMD as it tries to infer the mask from the opcode data, and we can't unify it across all commands.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 io_uring/poll.c | 44 +++++++++++++++++++++++++++-----------------
 io_uring/poll.h |  1 +
 2 files changed, 28 insertions(+), 17 deletions(-)

diff --git a/io_uring/poll.c b/io_uring/poll.c
index 0526062e2f81..c7e9fb34563d 100644
--- a/io_uring/poll.c
+++ b/io_uring/poll.c
@@ -669,33 +669,18 @@ static struct async_poll *io_req_alloc_apoll(struct io_kiocb *req,
 	return apoll;
 }
 
-int io_arm_poll_handler(struct io_kiocb *req, unsigned issue_flags)
+int io_arm_apoll(struct io_kiocb *req, unsigned issue_flags, __poll_t mask)
 {
-	const struct io_issue_def *def = &io_issue_defs[req->opcode];
 	struct async_poll *apoll;
 	struct io_poll_table ipt;
-	__poll_t mask = POLLPRI | POLLERR | EPOLLET;
 	int ret;
 
-	if (!def->pollin && !def->pollout)
-		return IO_APOLL_ABORTED;
+	mask |= EPOLLET;
 	if (!io_file_can_poll(req))
 		return IO_APOLL_ABORTED;
 	if (!(req->flags & REQ_F_APOLL_MULTISHOT))
 		mask |= EPOLLONESHOT;
 
-	if (def->pollin) {
-		mask |= EPOLLIN | EPOLLRDNORM;
-
-		/* If reading from MSG_ERRQUEUE using recvmsg, ignore POLLIN */
-		if (req->flags & REQ_F_CLEAR_POLLIN)
-			mask &= ~EPOLLIN;
-	} else {
-		mask |= EPOLLOUT | EPOLLWRNORM;
-	}
-	if (def->poll_exclusive)
-		mask |= EPOLLEXCLUSIVE;
-
 	apoll = io_req_alloc_apoll(req, issue_flags);
 	if (!apoll)
 		return IO_APOLL_ABORTED;
@@ -712,6 +697,31 @@ int io_arm_poll_handler(struct io_kiocb *req, unsigned issue_flags)
 	return IO_APOLL_OK;
 }
 
+int io_arm_poll_handler(struct io_kiocb *req, unsigned issue_flags)
+{
+	const struct io_issue_def *def = &io_issue_defs[req->opcode];
+	__poll_t mask = POLLPRI | POLLERR;
+
+	if (!def->pollin && !def->pollout)
+		return IO_APOLL_ABORTED;
+	if (!io_file_can_poll(req))
+		return IO_APOLL_ABORTED;
+
+	if (def->pollin) {
+		mask |= EPOLLIN | EPOLLRDNORM;
+
+		/* If reading from MSG_ERRQUEUE using recvmsg, ignore POLLIN */
+		if (req->flags & REQ_F_CLEAR_POLLIN)
+			mask &= ~EPOLLIN;
+	} else {
+		mask |= EPOLLOUT | EPOLLWRNORM;
+	}
+	if (def->poll_exclusive)
+		mask |= EPOLLEXCLUSIVE;
+
+	return io_arm_apoll(req, issue_flags, mask);
+}
+
 /*
  * Returns true if we found and killed one or more poll requests
  */
diff --git a/io_uring/poll.h b/io_uring/poll.h
index 27e2db2ed4ae..c8438286dfa0 100644
--- a/io_uring/poll.h
+++ b/io_uring/poll.h
@@ -41,6 +41,7 @@ int io_poll_remove(struct io_kiocb *req, unsigned int issue_flags);
 struct io_cancel_data;
 int io_poll_cancel(struct io_ring_ctx *ctx, struct io_cancel_data *cd,
 		   unsigned issue_flags);
+int io_arm_apoll(struct io_kiocb *req, unsigned issue_flags, __poll_t mask);
 int io_arm_poll_handler(struct io_kiocb *req, unsigned issue_flags);
 bool io_poll_remove_all(struct io_ring_ctx *ctx, struct io_uring_task *tctx,
 			bool cancel_all);

-- 
2.49.0

Actual thread

Pavel Begunkov <asml.silence@gmail.com>
Some commands, like timestamping in the next patch, can make use of multishot polling, i.e. REQ_F_APOLL_MULTISHOT. Add support for that, condensed into a single helper called io_cmd_poll_multishot(). A user who wants to continue with a request in multishot mode must call the function, and only if it returns 0 is the user free to proceed. Apart from normal terminal errors, it can also end up with -EIOCBQUEUED, in which case the user must forward it to the core io_uring. It's forbidden to use task work while the request is executing in multishot mode. The API is not foolproof, hence it's not exported to modules nor exposed in public headers.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 io_uring/uring_cmd.c | 23 +++++++++++++++++++++++
 io_uring/uring_cmd.h |  3 +++
 2 files changed, 26 insertions(+)

diff --git a/io_uring/uring_cmd.c b/io_uring/uring_cmd.c
index 9ad0ea5398c2..02cec6231831 100644
--- a/io_uring/uring_cmd.c
+++ b/io_uring/uring_cmd.c
@@ -12,6 +12,7 @@
 #include "alloc_cache.h"
 #include "rsrc.h"
 #include "uring_cmd.h"
+#include "poll.h"
 
 void io_cmd_cache_free(const void *entry)
 {
@@ -136,6 +137,9 @@ void __io_uring_cmd_do_in_task(struct io_uring_cmd *ioucmd,
 {
 	struct io_kiocb *req = cmd_to_io_kiocb(ioucmd);
 
+	if (WARN_ON_ONCE(req->flags & REQ_F_APOLL_MULTISHOT))
+		return;
+
 	ioucmd->task_work_cb = task_work_cb;
 	req->io_task_work.func = io_uring_cmd_work;
 	__io_req_task_work_add(req, flags);
@@ -158,6 +162,9 @@ void io_uring_cmd_done(struct io_uring_cmd *ioucmd, ssize_t ret, u64 res2,
 {
 	struct io_kiocb *req = cmd_to_io_kiocb(ioucmd);
 
+	if (WARN_ON_ONCE(req->flags & REQ_F_APOLL_MULTISHOT))
+		return;
+
 	io_uring_cmd_del_cancelable(ioucmd, issue_flags);
 
 	if (ret < 0)
@@ -305,3 +312,19 @@ void io_uring_cmd_issue_blocking(struct io_uring_cmd *ioucmd)
 
 	io_req_queue_iowq(req);
 }
+
+int io_cmd_poll_multishot(struct io_uring_cmd *cmd,
+			  unsigned int issue_flags, __poll_t mask)
+{
+	struct io_kiocb *req = cmd_to_io_kiocb(cmd);
+	int ret;
+
+	if (likely(req->flags & REQ_F_APOLL_MULTISHOT))
+		return 0;
+
+	req->flags |= REQ_F_APOLL_MULTISHOT;
+	mask &= ~EPOLLONESHOT;
+
+	ret = io_arm_apoll(req, issue_flags, mask);
+	return ret == IO_APOLL_OK ? -EIOCBQUEUED : -ECANCELED;
+}
diff --git a/io_uring/uring_cmd.h b/io_uring/uring_cmd.h
index a6dad47afc6b..50a6ccb831df 100644
--- a/io_uring/uring_cmd.h
+++ b/io_uring/uring_cmd.h
@@ -18,3 +18,6 @@ bool io_uring_try_cancel_uring_cmd(struct io_ring_ctx *ctx,
 				   struct io_uring_task *tctx, bool cancel_all);
 
 void io_cmd_cache_free(const void *entry);
+
+int io_cmd_poll_multishot(struct io_uring_cmd *cmd,
+			  unsigned int issue_flags, __poll_t mask);

-- 
2.49.0

Actual thread

Pavel Begunkov <asml.silence@gmail.com>
Add a helper for posting 32 byte CQEs in multishot mode and add a cmd helper on top. As it specifically works with requests, the helper ignores the passed-in cqe->user_data and sets it to the one stored in the request. The command helper is only valid with multishot requests.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 io_uring/io_uring.c  | 40 ++++++++++++++++++++++++++++++++++++++++
 io_uring/io_uring.h  |  1 +
 io_uring/uring_cmd.c | 11 +++++++++++
 io_uring/uring_cmd.h |  4 ++++
 4 files changed, 56 insertions(+)

diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 98a701fc56cc..4352cf209450 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -793,6 +793,21 @@ bool io_cqe_cache_refill(struct io_ring_ctx *ctx, bool overflow)
 	return true;
 }
 
+static bool io_fill_cqe_aux32(struct io_ring_ctx *ctx,
+			      struct io_uring_cqe src_cqe[2])
+{
+	struct io_uring_cqe *cqe;
+
+	if (WARN_ON_ONCE(!(ctx->flags & IORING_SETUP_CQE32)))
+		return false;
+	if (unlikely(!io_get_cqe(ctx, &cqe)))
+		return false;
+
+	memcpy(cqe, src_cqe, 2 * sizeof(*cqe));
+	trace_io_uring_complete(ctx, NULL, cqe);
+	return true;
+}
+
 static bool io_fill_cqe_aux(struct io_ring_ctx *ctx, u64 user_data, s32 res,
 			    u32 cflags)
 {
@@ -904,6 +919,31 @@ bool io_req_post_cqe(struct io_kiocb *req, s32 res, u32 cflags)
 	return posted;
 }
 
+/*
+ * A helper for multishot requests posting additional CQEs.
+ * Should only be used from a task_work including IO_URING_F_MULTISHOT.
+ */
+bool io_req_post_cqe32(struct io_kiocb *req, struct io_uring_cqe cqe[2])
+{
+	struct io_ring_ctx *ctx = req->ctx;
+	bool posted;
+
+	lockdep_assert(!io_wq_current_is_worker());
+	lockdep_assert_held(&ctx->uring_lock);
+
+	cqe[0].user_data = req->cqe.user_data;
+	if (!ctx->lockless_cq) {
+		spin_lock(&ctx->completion_lock);
+		posted = io_fill_cqe_aux32(ctx, cqe);
+		spin_unlock(&ctx->completion_lock);
+	} else {
+		posted = io_fill_cqe_aux32(ctx, cqe);
+	}
+
+	ctx->submit_state.cq_flush = true;
+	return posted;
+}
+
 static void io_req_complete_post(struct io_kiocb *req, unsigned issue_flags)
 {
 	struct io_ring_ctx *ctx = req->ctx;
diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h
index d59c12277d58..1263af818c47 100644
--- a/io_uring/io_uring.h
+++ b/io_uring/io_uring.h
@@ -81,6 +81,7 @@ void io_req_defer_failed(struct io_kiocb *req, s32 res);
 bool io_post_aux_cqe(struct io_ring_ctx *ctx, u64 user_data, s32 res, u32 cflags);
 void io_add_aux_cqe(struct io_ring_ctx *ctx, u64 user_data, s32 res, u32 cflags);
 bool io_req_post_cqe(struct io_kiocb *req, s32 res, u32 cflags);
+bool io_req_post_cqe32(struct io_kiocb *req, struct io_uring_cqe src_cqe[2]);
 void __io_commit_cqring_flush(struct io_ring_ctx *ctx);
 
 void io_req_track_inflight(struct io_kiocb *req);
diff --git a/io_uring/uring_cmd.c b/io_uring/uring_cmd.c
index 02cec6231831..b228b84a510f 100644
--- a/io_uring/uring_cmd.c
+++ b/io_uring/uring_cmd.c
@@ -328,3 +328,14 @@ int io_cmd_poll_multishot(struct io_uring_cmd *cmd,
 	ret = io_arm_apoll(req, issue_flags, mask);
 	return ret == IO_APOLL_OK ? -EIOCBQUEUED : -ECANCELED;
 }
+
+bool io_uring_cmd_post_mshot_cqe32(struct io_uring_cmd *cmd,
+				   unsigned int issue_flags,
+				   struct io_uring_cqe cqe[2])
+{
+	struct io_kiocb *req = cmd_to_io_kiocb(cmd);
+
+	if (WARN_ON_ONCE(!(issue_flags & IO_URING_F_MULTISHOT)))
+		return false;
+	return io_req_post_cqe32(req, cqe);
+}
diff --git a/io_uring/uring_cmd.h b/io_uring/uring_cmd.h
index 50a6ccb831df..9e11da10ecab 100644
--- a/io_uring/uring_cmd.h
+++ b/io_uring/uring_cmd.h
@@ -17,6 +17,10 @@ void io_uring_cmd_cleanup(struct io_kiocb *req);
 bool io_uring_try_cancel_uring_cmd(struct io_ring_ctx *ctx,
 				   struct io_uring_task *tctx, bool cancel_all);
 
+bool io_uring_cmd_post_mshot_cqe32(struct io_uring_cmd *cmd,
+				   unsigned int issue_flags,
+				   struct io_uring_cqe cqe[2]);
+
 void io_cmd_cache_free(const void *entry);
 
 int io_cmd_poll_multishot(struct io_uring_cmd *cmd,
 			  unsigned int issue_flags, __poll_t mask);

-- 
2.49.0

Actual thread

Pavel Begunkov <asml.silence@gmail.com>
Add a new socket command which returns tx timestamps to the user. It provides an alternative to the existing error queue recvmsg interface. The command works in a polled multishot mode, which means io_uring will poll the socket and keep posting timestamps until the request is cancelled or fails in any other way (e.g. with no space in the CQ). It reuses the net infra and grabs timestamps from the socket's error queue.

The command requires IORING_SETUP_CQE32. All non-final CQEs (marked with IORING_CQE_F_MORE) have cqe->res set to the tskey, and the upper 16 bits of cqe->flags keep tstype (i.e. offset by IORING_CQE_BUFFER_SHIFT). The time value is stored in the upper part of the extended CQE. The final completion won't have IORING_CQE_F_MORE and will have cqe->res storing 0/error.

Suggested-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 include/uapi/linux/io_uring.h | 16 +++++++
 io_uring/cmd_net.c            | 82 +++++++++++++++++++++++++++++++++++
 2 files changed, 98 insertions(+)

diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index cfd17e382082..dcadf709bfc4 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -968,6 +968,22 @@ enum io_uring_socket_op {
 	SOCKET_URING_OP_SIOCOUTQ,
 	SOCKET_URING_OP_GETSOCKOPT,
 	SOCKET_URING_OP_SETSOCKOPT,
+	SOCKET_URING_OP_TX_TIMESTAMP,
+};
+
+/*
+ * SOCKET_URING_OP_TX_TIMESTAMP definitions
+ */
+
+#define IORING_TIMESTAMP_HW_SHIFT	16
+/* The cqe->flags bit from which the timestamp type is stored */
+#define IORING_TIMESTAMP_TYPE_SHIFT	(IORING_TIMESTAMP_HW_SHIFT + 1)
+/* The cqe->flags flag signifying whether it's a hardware timestamp */
+#define IORING_CQE_F_TSTAMP_HW		((__u32)1 << IORING_TIMESTAMP_HW_SHIFT)
+
+struct io_timespec {
+	__u64 tv_sec;
+	__u64 tv_nsec;
 };
 
 /* Zero copy receive refill queue entry */
diff --git a/io_uring/cmd_net.c b/io_uring/cmd_net.c
index e99170c7d41a..3866fe6ff541 100644
--- a/io_uring/cmd_net.c
+++ b/io_uring/cmd_net.c
@@ -1,5 +1,6 @@
 #include <asm/ioctls.h>
 #include <linux/io_uring/net.h>
+#include <linux/errqueue.h>
 #include <net/sock.h>
 
 #include "uring_cmd.h"
@@ -51,6 +52,85 @@ static inline int io_uring_cmd_setsockopt(struct socket *sock,
 					  optlen);
 }
 
+static bool io_process_timestamp_skb(struct io_uring_cmd *cmd, struct sock *sk,
+				     struct sk_buff *skb, unsigned issue_flags)
+{
+	struct sock_exterr_skb *serr = SKB_EXT_ERR(skb);
+	struct io_uring_cqe cqe[2];
+	struct io_timespec *iots;
+	struct timespec64 ts;
+	u32 tstype, tskey;
+	int ret;
+
+	BUILD_BUG_ON(sizeof(struct io_uring_cqe) != sizeof(struct io_timespec));
+
+	ret = skb_get_tx_timestamp(skb, sk, &ts);
+	if (ret < 0)
+		return false;
+
+	tskey = serr->ee.ee_data;
+	tstype = serr->ee.ee_info;
+
+	cqe->user_data = 0;
+	cqe->res = tskey;
+	cqe->flags = IORING_CQE_F_MORE;
+	cqe->flags |= tstype << IORING_TIMESTAMP_TYPE_SHIFT;
+	if (ret == SOF_TIMESTAMPING_TX_HARDWARE)
+		cqe->flags |= IORING_CQE_F_TSTAMP_HW;
+
+	iots = (struct io_timespec *)&cqe[1];
+	iots->tv_sec = ts.tv_sec;
+	iots->tv_nsec = ts.tv_nsec;
+	return io_uring_cmd_post_mshot_cqe32(cmd, issue_flags, cqe);
+}
+
+static int io_uring_cmd_timestamp(struct socket *sock,
+				  struct io_uring_cmd *cmd,
+				  unsigned int issue_flags)
+{
+	struct sock *sk = sock->sk;
+	struct sk_buff_head *q = &sk->sk_error_queue;
+	struct sk_buff *skb, *tmp;
+	struct sk_buff_head list;
+	int ret;
+
+	if (!(issue_flags & IO_URING_F_CQE32))
+		return -EINVAL;
+	ret = io_cmd_poll_multishot(cmd, issue_flags, EPOLLERR);
+	if (unlikely(ret))
+		return ret;
+
+	if (skb_queue_empty_lockless(q))
+		return -EAGAIN;
+	__skb_queue_head_init(&list);
+
+	scoped_guard(spinlock_irq, &q->lock) {
+		skb_queue_walk_safe(q, skb, tmp) {
+			/* don't support skbs with payload */
+			if (!skb_has_tx_timestamp(skb, sk) || skb->len)
+				continue;
+			__skb_unlink(skb, q);
+			__skb_queue_tail(&list, skb);
+		}
+	}
+
+	while (1) {
+		skb = skb_peek(&list);
+		if (!skb)
+			break;
+		if (!io_process_timestamp_skb(cmd, sk, skb, issue_flags))
+			break;
+		__skb_dequeue(&list);
+		consume_skb(skb);
+	}
+
+	if (unlikely(!skb_queue_empty(&list))) {
+		scoped_guard(spinlock_irqsave, &q->lock)
+			skb_queue_splice(&list, q);
+	}
+	return -EAGAIN;
+}
+
 int io_uring_cmd_sock(struct io_uring_cmd *cmd, unsigned int issue_flags)
 {
 	struct socket *sock = cmd->file->private_data;
@@ -76,6 +156,8 @@ int io_uring_cmd_sock(struct io_uring_cmd *cmd, unsigned int issue_flags)
 		return io_uring_cmd_getsockopt(sock, cmd, issue_flags);
 	case SOCKET_URING_OP_SETSOCKOPT:
 		return io_uring_cmd_setsockopt(sock, cmd, issue_flags);
+	case SOCKET_URING_OP_TX_TIMESTAMP:
+		return io_uring_cmd_timestamp(sock, cmd, issue_flags);
 	default:
 		return -EOPNOTSUPP;
 	}

-- 
2.49.0