@mox692
Member

@mox692 mox692 commented May 8, 2025

Overview

This PR implements an io_uring version of the open system call.

Since common file operations such as fs::read and fs::write use open under the hood, I think supporting the open syscall first is a good starting point.

Also, since this is the first operation we actually ship, I've added:

  • Some uring fs tests
  • Runtime flag for enabling uring (enable_uring)

Implementation

  • When constructing OpenOptions, the implementation is chosen based on the presence of cfg flags.
  • The only difference between the two implementations is the open() method, which uses either io_uring or the thread pool.
  • Even when calling open() on UringOpenOptions, if the machine does not support io_uring, it will fall back to using the standard OpenOptions.
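The fallback described in the bullets above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `Backend`, `open_with_uring`, `open_with_thread_pool`, and the `kernel_has_uring` parameter are hypothetical names, the cfg-based type selection is replaced by a plain boolean, and both paths simply call `std::fs::File::open`.

```rust
use std::fs::File;
use std::io;
use std::path::Path;

/// Which backend ended up serving the request (illustrative only).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Backend {
    Uring,
    ThreadPool,
}

/// Hypothetical stand-in for the io_uring path. A real implementation would
/// submit an open request to the ring; here we only model "kernel lacks
/// io_uring" by returning an `Unsupported` error.
fn open_with_uring(path: &Path, kernel_has_uring: bool) -> io::Result<File> {
    if !kernel_has_uring {
        return Err(io::Error::from(io::ErrorKind::Unsupported));
    }
    File::open(path)
}

/// Hypothetical stand-in for the thread-pool path (a blocking open).
fn open_with_thread_pool(path: &Path) -> io::Result<File> {
    File::open(path)
}

/// Try io_uring first; fall back only when io_uring itself is unsupported.
/// Any other error (for example, a missing file) is returned unchanged.
fn open(path: &Path, kernel_has_uring: bool) -> io::Result<(File, Backend)> {
    match open_with_uring(path, kernel_has_uring) {
        Ok(file) => Ok((file, Backend::Uring)),
        Err(err) if err.kind() == io::ErrorKind::Unsupported => {
            open_with_thread_pool(path).map(|file| (file, Backend::ThreadPool))
        }
        Err(err) => Err(err),
    }
}

fn main() -> io::Result<()> {
    let path = std::env::temp_dir().join("uring_open_fallback_sketch.txt");
    std::fs::write(&path, b"hello")?;
    // Pretend the kernel does not support io_uring.
    let (file, backend) = open(&path, false)?;
    println!("served by {:?}", backend);
    drop(file);
    std::fs::remove_file(&path)?;
    Ok(())
}
```

The point of matching on the error kind is that only "io_uring is unavailable" triggers the fallback; ordinary open failures are not retried on the other backend.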

Much of the code in this PR is ported from the tokio-uring crate.

Why do we need our own UringOpenOptions?

The standard library’s OpenOptions does not expose accessors for its internal flags. However, when using io_uring to perform the open system call, it's necessary to access these internal flags. Since this is not possible with the standard implementation, we define our own UringOpenOptions type.

This limitation is tracked upstream in std (rust-lang/rust#76801), but it has not been resolved yet.
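To illustrate why a custom options type is needed, here is a rough sketch of a builder that keeps its own flags and can turn them into the integer an open request expects. `SketchOpenOptions` and its methods are hypothetical and are not the `UringOpenOptions` from this PR. The constants are the Linux x86_64 values written out by hand so the example needs no external crate (real code would take them from `libc`), and the validation that std performs on invalid combinations is omitted.

```rust
// Linux x86_64 values of the open(2) flags (assumption: other targets may differ).
const O_RDONLY: i32 = 0o0;
const O_WRONLY: i32 = 0o1;
const O_RDWR: i32 = 0o2;
const O_CREAT: i32 = 0o100;
const O_EXCL: i32 = 0o200;
const O_TRUNC: i32 = 0o1000;
const O_APPEND: i32 = 0o2000;

/// Hypothetical options type that, unlike std's `OpenOptions`, can report
/// the flags it has accumulated.
#[derive(Debug, Clone, Default)]
struct SketchOpenOptions {
    read: bool,
    write: bool,
    append: bool,
    truncate: bool,
    create: bool,
    create_new: bool,
}

impl SketchOpenOptions {
    fn new() -> Self { Self::default() }
    fn read(&mut self, v: bool) -> &mut Self { self.read = v; self }
    fn write(&mut self, v: bool) -> &mut Self { self.write = v; self }
    fn append(&mut self, v: bool) -> &mut Self { self.append = v; self }
    fn truncate(&mut self, v: bool) -> &mut Self { self.truncate = v; self }
    fn create(&mut self, v: bool) -> &mut Self { self.create = v; self }
    fn create_new(&mut self, v: bool) -> &mut Self { self.create_new = v; self }

    /// Access mode bits, or `None` if no access mode was requested.
    fn access_mode(&self) -> Option<i32> {
        match (self.read, self.write, self.append) {
            (true, false, false) => Some(O_RDONLY),
            (false, true, false) => Some(O_WRONLY),
            (true, true, false) => Some(O_RDWR),
            (false, _, true) => Some(O_WRONLY | O_APPEND),
            (true, _, true) => Some(O_RDWR | O_APPEND),
            (false, false, false) => None,
        }
    }

    /// Creation bits; `create_new` takes precedence over `create`/`truncate`.
    fn creation_mode(&self) -> i32 {
        match (self.create, self.truncate, self.create_new) {
            (_, _, true) => O_CREAT | O_EXCL,
            (true, true, false) => O_CREAT | O_TRUNC,
            (true, false, false) => O_CREAT,
            (false, true, false) => O_TRUNC,
            (false, false, false) => 0,
        }
    }

    /// The combined flags an io_uring open request would need.
    fn flags(&self) -> Option<i32> {
        Some(self.access_mode()? | self.creation_mode())
    }
}

fn main() {
    let mut opts = SketchOpenOptions::new();
    opts.write(true).create(true).truncate(true);
    println!("flags = {:?}", opts.flags().map(|f| format!("{:#o}", f)));
}
```

With std's `OpenOptions` there is no stable equivalent of `flags()`, which is the gap the paragraph above describes.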

@mox692 mox692 added A-tokio Area: The main tokio crate M-fs Module: tokio/fs labels May 8, 2025
@mox692
Member Author

mox692 commented May 11, 2025

I ran a benchmark in commit d0dc6b2.

settings
$ uname -a
Linux mox692-ThinkPad-P51 6.8.0-59-generic 61-Ubuntu SMP PREEMPT_DYNAMIC Fri Apr 11 23:16:11 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux

# On commit `d0dc6b2`,

# uring bench
$ RUSTFLAGS="--cfg tokio_unstable_uring" cargo bench open_many_files

# thread pool bench
$ cargo bench open_many_files

Single-threaded runtime

| Benchmark Name | Thread Pool | Uring |
|---|---|---|
| open_many_files | [5.0981 µs 5.1226 µs 5.1520 µs] | [4.1279 µs 4.1374 µs 4.1479 µs] |

Multi-threaded runtime

| Benchmark Name | Thread Pool | Uring |
|---|---|---|
| open_many_files/2 threads | [3.5071 µs 3.5448 µs 3.5911 µs] | [4.0503 µs 4.1464 µs 4.2529 µs] |
| open_many_files/4 threads | [3.2188 µs 3.2382 µs 3.2582 µs] | [5.0549 µs 5.1624 µs 5.2627 µs] |
| open_many_files/8 threads | [3.2208 µs 3.2434 µs 3.2663 µs] | [6.6902 µs 6.7288 µs 6.7705 µs] |
| open_many_files/16 threads | [3.4278 µs 3.4474 µs 3.4675 µs] | [7.4527 µs 7.5142 µs 7.5674 µs] |
| open_many_files/32 threads | [3.9230 µs 3.9604 µs 3.9977 µs] | [6.8199 µs 6.8953 µs 6.9600 µs] |

While performance improves on the current-thread runtime, it regresses on the multi-threaded runtime as the number of threads increases. There is currently only one global ring in the runtime, so adding threads may be causing lock contention on it.

I'll look into whether I can reduce the lock contention.

@Darksonn
Contributor

The main feedback I have here is that although it's ok to use #[cfg()] temporarily, eventually we will want --cfg tokio_uring to result in a binary that supports both io_uring and spawn_blocking. I.e. it should fall back to spawn_blocking if the Linux kernel it runs on does not support io_uring.
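One way to picture a binary that supports both backends is a runtime check whose result is cached, so the kernel is probed at most once. The sketch below is only an assumption about how such a check could look; `probe_kernel`, `uring_supported`, and `choose_backend` are hypothetical names, and the probe is hard-coded to report "unsupported" instead of talking to the kernel.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

/// Counts how often the (pretend) probe runs, to show that the result is cached.
static PROBE_CALLS: AtomicUsize = AtomicUsize::new(0);

/// Cached answer to "does this kernel support io_uring?".
static URING_SUPPORTED: OnceLock<bool> = OnceLock::new();

/// Hypothetical probe. A real one would try to set up a ring and inspect the
/// error; this one always pretends the kernel is too old.
fn probe_kernel() -> bool {
    PROBE_CALLS.fetch_add(1, Ordering::Relaxed);
    false
}

/// Runs the probe on first use and reuses the cached answer afterwards.
fn uring_supported() -> bool {
    *URING_SUPPORTED.get_or_init(probe_kernel)
}

/// Picks the backend for one operation based on the cached probe result.
fn choose_backend() -> &'static str {
    if uring_supported() {
        "io_uring"
    } else {
        "spawn_blocking"
    }
}

fn main() {
    for _ in 0..3 {
        println!("backend: {}", choose_backend());
    }
    println!("probe ran {} time(s)", PROBE_CALLS.load(Ordering::Relaxed));
}
```

Because the decision is made at run time, the same binary can run on kernels with and without io_uring, which is the property requested in the comment above.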

Base automatically changed from mox692/iouring_initial_infra to master May 20, 2025 17:36
@mox692 mox692 changed the base branch from master to mox692/uring_dynamic_check May 24, 2025 09:38
@mox692
Member Author

mox692 commented May 24, 2025

Update: Synced with master; the base branch is now #7357.

Base automatically changed from mox692/uring_dynamic_check to master June 10, 2025 18:26
@mox692 mox692 marked this pull request as ready for review June 15, 2025 16:10
@mox692
Member Author

mox692 commented Jun 15, 2025

This is ready for review.

Contributor

@Darksonn Darksonn left a comment

Overall looks reasonable to me. Have we verified whether CI runs a recent enough Linux to actually run the io_uring code?

Member

@ADD-SP ADD-SP left a comment

https://github.com/tokio-rs/tokio/actions/runs/15971607056/job/45043945544?pr=7321

SLOW [>21480.000s] tokio::fs_uring shutdown_runtime_while_performing_io_uring_ops

This job ran for six hours and was canceled due to a timeout. Were we just unlucky, or is something else slowing down the test?

@mox692
Member Author

mox692 commented Jul 1, 2025

This doesn't look like a random occurrence. I'll look into it.

@mox692
Member Author

mox692 commented Jul 1, 2025

To fix the issue raised in #7321 (review), I opened a separate PR, #7436, since it is somewhat unrelated to open support itself.

@ADD-SP
Member

ADD-SP commented Jul 22, 2025

I'd like to go over the uring implementation (#7320) again before reviewing this PR, because it took me too long to understand the fix in #7436.

After that, I think I can review this PR with a deeper understanding.

Contributor

@Darksonn Darksonn left a comment

Overall looks reasonable to me.

Comment on lines 12 to 16
  #[allow(dead_code)]
  #[derive(Debug)]
- pub(crate) enum CancelData {}
+ pub(crate) enum CancelData {
+     Open(Open),
+ }
Contributor

Why the dead code annotation?

Member Author

I removed some #[allow(dead_code)] annotations, but a few of them are still required, so I added comments explaining why.

1405ee8

Comment on lines 61 to 62
// Avoid busy looping.
tokio::time::sleep(std::time::Duration::from_millis(10)).await;
Contributor

Could we use yield_now() here? Sleeping is best avoided in tests if possible.

Member Author

@mox692 mox692 Jul 26, 2025

Using yield_now should work, but I'm now seeing a test failure after this change (6dd50a3). Let me figure out what's going on.

Member Author

It turns out there was a flaw in the Drop implementation of UringContext, so I pushed a commit (76d1537) to fix it.

The previous implementation used the wrong key when removing the slab entry:

// commit: 1405ee87b11c6bbb558d2f411002fd587524cd88
fn drop(&mut self) {

    ...

    let mut cancel_ops = Slab::new();
    let mut keys_to_move = Vec::new();

    ... 

    for key in keys_to_move {
        let lifecycle = self.remove_op(key);
        // Here, `cancel_ops` assigns its own keys, independent of the keys in `self.ops`.
        cancel_ops.insert(lifecycle);
    }

    while !cancel_ops.is_empty() {
        ...

        for cqe in self.ring_mut().completion() {
            let idx = cqe.user_data() as usize;
            
            // Bug: we should not use the `idx` obtained from `cqe.user_data()` here,
            // because it does not correspond to the key assigned by `cancel_ops`.
            cancel_ops.remove(idx);
        }
    }
}
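The key mismatch can be reproduced in isolation. The sketch below is a simplified model and not the fix from 76d1537: `MiniSlab` is a tiny hand-written stand-in for the slab crate that hands out the first free index, and `drain_with_fresh_slab` / `drain_with_original_keys` are hypothetical helpers that return how many operations are still left after processing the completions.

```rust
use std::collections::HashMap;

/// Tiny stand-in for a slab: `insert` returns the first free index.
struct MiniSlab<T> {
    entries: Vec<Option<T>>,
}

impl<T> MiniSlab<T> {
    fn new() -> Self {
        MiniSlab { entries: Vec::new() }
    }

    fn insert(&mut self, value: T) -> usize {
        if let Some(idx) = self.entries.iter().position(|e| e.is_none()) {
            self.entries[idx] = Some(value);
            idx
        } else {
            self.entries.push(Some(value));
            self.entries.len() - 1
        }
    }

    fn remove(&mut self, key: usize) -> Option<T> {
        self.entries.get_mut(key).and_then(|e| e.take())
    }
}

/// Buggy variant: re-inserting into a fresh slab hands out new keys (0, 1, ...),
/// so a completion that carries the original key does not find its entry.
fn drain_with_fresh_slab(in_flight: &[(usize, &str)], completions: &[usize]) -> usize {
    let mut cancel_ops = MiniSlab::new();
    let mut remaining = 0;
    for (_original_key, op) in in_flight {
        cancel_ops.insert(*op);
        remaining += 1;
    }
    for &user_data in completions {
        if cancel_ops.remove(user_data).is_some() {
            remaining -= 1;
        }
    }
    remaining
}

/// One possible fix: keep the original key, so `user_data` still matches.
fn drain_with_original_keys(in_flight: &[(usize, &str)], completions: &[usize]) -> usize {
    let mut cancel_ops: HashMap<usize, &str> = in_flight.iter().copied().collect();
    for &user_data in completions {
        cancel_ops.remove(&user_data);
    }
    cancel_ops.len()
}

fn main() {
    // Two in-flight ops whose original keys (their user_data) are 5 and 6.
    let in_flight = [(5, "open a"), (6, "open b")];
    let completions = [5, 6];
    println!("fresh slab, still pending: {}", drain_with_fresh_slab(&in_flight, &completions));
    println!("original keys, still pending: {}", drain_with_original_keys(&in_flight, &completions));
}
```

In the buggy variant the two operations get keys 0 and 1 in the new container, so completions carrying 5 and 6 never match them and the pending set never empties.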

@mox692
Member Author

mox692 commented Jul 25, 2025

I'll be back soon and address the comments.

@ADD-SP
Member

ADD-SP commented Jul 29, 2025

I'm reading this branch locally. Would you mind fixing the clippy warnings?

RUSTFLAGS="--cfg tokio_uring" cargo +1.88 clippy --all --tests --all-features

@Darksonn
Contributor

Does the clippy job need to be modified?

@ADD-SP
Member

ADD-SP commented Jul 30, 2025

> Does the clippy job need to be modified?

I think we need to improve the clippy job to cover different combinations of feature flags and cfgs; cargo-hack may help a lot here.

Currently, it covers only one combination:

run: cargo clippy --all --tests --all-features --no-deps

@Darksonn
Contributor

Darksonn commented Jul 30, 2025

I probably would only consider two combinations for clippy:

  • Full without any cfg flags.
  • Full with all cfg flags.

It's not critical that every combination is clippy-free.

Member Author

@mox692 mox692 left a comment

The clippy issues are resolved, and I made a small fix to the test.

Comment on lines +122 to +124
// If io_uring is enabled (and not falling back to the thread pool),
// the first poll should return Pending.
let _pending = Box::pin(fut).poll_unpin(cx);
Member Author

In da0175b, I removed the assertion assert!(res.is_pending(), "Expected the open to be pending"); because the future can be ready on the first poll when the fallback logic is used. (For example, the old-kernel CI check could fail here.)
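The reason the assertion cannot hold in general can be shown with a manual single poll. This is a standalone sketch using only std, not the test from this PR: `poll_once`, `NoopWake`, and `PendingOnce` are hypothetical helpers, where `PendingOnce` plays the role of an operation that has to wait for a completion and `std::future::ready` plays the role of a path that finishes immediately.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Waker that does nothing; enough for polling a future exactly once.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Hypothetical future that is Pending on the first poll and Ready afterwards.
struct PendingOnce {
    polled: bool,
}

impl Future for PendingOnce {
    type Output = u32;

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<u32> {
        if self.polled {
            Poll::Ready(7)
        } else {
            self.polled = true;
            // Ask to be polled again, as a well-behaved future should.
            cx.waker().wake_by_ref();
            Poll::Pending
        }
    }
}

/// Polls `fut` a single time and returns whatever that first poll produced.
fn poll_once<F: Future>(fut: F) -> Poll<F::Output> {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(fut);
    fut.as_mut().poll(&mut cx)
}

fn main() {
    // A future that must wait reports Pending on the first poll.
    println!("{:?}", poll_once(PendingOnce { polled: false }));
    // A future that completes immediately reports Ready on the first poll,
    // so a test that handles both paths cannot assert `is_pending()`.
    println!("{:?}", poll_once(std::future::ready(1)));
}
```

A test that has to pass on both paths can therefore only poll once and discard the result, as the `_pending` binding in the quoted lines does.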

Member

@ADD-SP ADD-SP left a comment

Looks good to me.

@Darksonn Darksonn merged commit 3e84a19 into master Aug 8, 2025
92 checks passed
@Darksonn Darksonn deleted the mox692/iouring_add_open branch August 8, 2025 07:51
@mox692 mox692 mentioned this pull request Aug 10, 2025
9 tasks
@Darksonn Darksonn mentioned this pull request Oct 14, 2025