Provide interfaces to prepare SQE in place #116
Comments
I think the memory copy is not a big overhead, because an sqe is only 64 bytes. See the nop bench: the io_uring crate is not at a performance disadvantage because of this. But I don't reject this idea; it requires some refactoring of the opcode mod.
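To make the size argument concrete, here is a minimal sketch of the build-then-copy flow being discussed. `MockSqe` is a stand-in defined locally, not a type from the io-uring crate; it only mirrors the fact that a real `io_uring_sqe` is a fixed 64-byte structure, so the "extra" copy moves exactly one cache line.

```rust
// Hypothetical stand-in for io_uring_sqe: 64 bytes, like the real thing.
#[repr(C, align(64))]
#[derive(Clone, Copy, Default)]
struct MockSqe {
    opcode: u8,
    flags: u8,
    ioprio: u16,
    fd: i32,
    off: u64,
    addr: u64,
    len: u32,
    rw_flags: u32,
    user_data: u64,
    _pad: [u64; 3], // pad the struct out to exactly 64 bytes
}

fn main() {
    // Build the entry in a local variable, then copy it into a ring slot,
    // mirroring the build-then-push flow the comment describes.
    let mut ring = [MockSqe::default(); 8];
    let entry = MockSqe { opcode: 0 /* NOP */, user_data: 42, ..Default::default() };
    ring[0] = entry; // the 64-byte memcpy in question

    assert_eq!(std::mem::size_of::<MockSqe>(), 64);
    assert_eq!(ring[0].user_data, 42);
    println!("sqe size = {}", std::mem::size_of::<MockSqe>());
}
```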
It would be great if we could also introduce And I will give it a try to prepare some patches :)
This is not enough; I hope to be able to specify some initialization parameters at the same time. I would consider adding a

```rust
impl SubmissionQueue<'_> {
    fn prepare(&mut self) -> PrepareEntry<'_>;
}

struct PrepareEntry<'a> {
    ptr: *mut io_uring_sqe,
    tail: &'a mut u32,
}

struct InitializedEntry<'a, OPCODE> {
    ptr: *mut OPCODE,
    tail: &'a mut u32,
}

// auto commit on drop
impl<OPCODE> Drop for InitializedEntry<'_, OPCODE> { /* ... */ }

impl<OPCODE> Deref for InitializedEntry<'_, OPCODE> {
    type Target = OPCODE;
    // ...
}

impl<OPCODE> DerefMut for InitializedEntry<'_, OPCODE> { /* ... */ }

#[repr(transparent)]
struct Timeout(io_uring_sqe);

impl Timeout {
    fn init<'a>(entry: PrepareEntry<'a>, timespec: *const types::Timespec)
        -> InitializedEntry<'a, Timeout>;
}
```

The current opcode always constructs a separate struct and then constructs the sqe type; we need to make it directly modify the sqe type.
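A minimal runnable sketch of the commit-on-drop idea above, using a plain `u64` slot instead of a real `io_uring_sqe` and safe references instead of raw pointers. All names here are illustrative, not the crate's actual API:

```rust
use std::ops::{Deref, DerefMut};

// Handle to a ring slot; dropping it "commits" the entry by bumping the tail.
struct InitializedEntry<'a> {
    slot: &'a mut u64,
    tail: &'a mut u32,
}

impl Deref for InitializedEntry<'_> {
    type Target = u64;
    fn deref(&self) -> &u64 { self.slot }
}

impl DerefMut for InitializedEntry<'_> {
    fn deref_mut(&mut self) -> &mut u64 { self.slot }
}

// "auto commit": advancing the tail on drop makes the entry visible
// without an explicit push call.
impl Drop for InitializedEntry<'_> {
    fn drop(&mut self) { *self.tail += 1; }
}

struct SubmissionQueue {
    ring: [u64; 8],
    tail: u32,
}

impl SubmissionQueue {
    fn prepare(&mut self) -> InitializedEntry<'_> {
        let idx = (self.tail as usize) % self.ring.len();
        // Split borrows so the handle owns both the slot and the tail.
        let SubmissionQueue { ring, tail } = self;
        InitializedEntry { slot: &mut ring[idx], tail }
    }
}

fn main() {
    let mut sq = SubmissionQueue { ring: [0; 8], tail: 0 };
    {
        let mut e = sq.prepare();
        *e = 0xdead_beef; // write the entry in place, no intermediate struct
    } // dropped here: tail advances, entry is committed
    assert_eq!(sq.tail, 1);
    assert_eq!(sq.ring[0], 0xdead_beef);
    println!("tail = {}", sq.tail);
}
```

The key design point is the same as in the proposal: the caller never builds a standalone entry to copy in; it writes the ring memory directly through the handle, and the commit happens implicitly when the handle goes out of scope.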
I started to make the opcodes support in-place init in the branch. This seems to cause some performance degradation for a ZST like
I have given it a second try during the new year holiday and implemented a mechanism to prepare SQEs in place without generating the intermediate OpCode structures. But the nop benchmark result is not so optimistic. With more investigation, it turns out that the nop benchmarks have been heavily optimized by the compiler.
At first glance,
So the compiler has heavily optimized these three functions, and the final asm code is almost the same.
They are almost the same, but there's still a minor difference in the core part. Let's get rid of the prologue and epilogue, which are the same. Then we get
Logically the two versions of asm code above have the same effect, but the normal() version performs slightly better. So when dealing with more complex opcodes, where the compiler can't heavily optimize normal(), prepare_sqe() should have performance benefits.
Hello, I observed that ublk-null [1] built over tokio-rs/io-uring performs ~10% worse than the C version's miniublk [2] (1.7M vs. 1.9M IOPS) when running:

fio/t/io_uring /dev/ublkb0

/dev/ublkb0 is created by ublk-null [1] or miniublk [2]. ublk-null basically just takes io_uring cmd(16) for communicating with the driver, and the logic is translated

And I guess the gap might be from the SQE copy.

[1] https://github.com/ming1/libublk-rs
@ming1 I'm a bit skeptical about this; your code has a lot of
Hello @quininer, thanks for the response! All borrow() and borrow_mut() calls are done in the same single-thread context on the fast IO path. As a Rust beginner, I understand the two just add runtime borrow checking, which shouldn't be expensive; it is just for interior mutability. Otherwise, q.ops (the UblkQueueImpl trait object) would have to be passed as a function parameter to most methods of UblkQueue. The heap allocation in reap_events_uring() can be removed easily; it is still there for handling the borrow check. I just tried

Thanks,
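For readers unfamiliar with the cost being debated: `RefCell`'s `borrow()`/`borrow_mut()` perform a runtime borrow-flag check on every call. This is cheap (a counter check and update) but not free, and it is a real check that can fail, as this small standalone example shows:

```rust
use std::cell::RefCell;

fn main() {
    let cell = RefCell::new(0u64);

    // Each borrow_mut() checks and updates a borrow flag at runtime
    // before handing out mutable access.
    for _ in 0..1000 {
        *cell.borrow_mut() += 1;
    }
    assert_eq!(*cell.borrow(), 1000);

    // try_borrow_mut() exposes the check directly: it fails while a
    // shared borrow is still alive, and succeeds once it is dropped.
    let shared = cell.borrow();
    assert!(cell.try_borrow_mut().is_err());
    drop(shared);
    assert!(cell.try_borrow_mut().is_ok());

    println!("final = {}", cell.borrow());
}
```

In a single-threaded fast path this check is typically a few instructions per call, which supports the point that it is unlikely to account for a 10% IOPS gap on its own.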
It's hard to determine the problem by just guessing; I'm just pointing out some differences between your code and the C version.
Yeah, I agree. Today I wrote one small test code against the io-uring crate to

So I started to investigate from the ublk side, and it turns out the gap

After addressing these two things, libublk can now perform

Thanks
Currently the way to submit an SQE is:

```rust
op
```

It would be great to provide interfaces to prepare an SQE in place, reducing one memory copy, by:

```rust
op
let sqe: &mut Entry = sq.prepare();
op.prepare(sqe);
```
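A sketch contrasting the two flows, using stand-in types defined here rather than the io-uring crate's real API (`Entry`, `Sq`, and `Nop` below are all illustrative): the current flow builds a finished entry and copies it into the ring, while the proposed flow hands out the ring slot and writes it directly.

```rust
#[derive(Clone, Copy, Default, PartialEq, Debug)]
struct Entry { opcode: u8, user_data: u64 }

struct Sq { ring: [Entry; 8], tail: usize }

impl Sq {
    // Current flow: the caller builds an Entry, push() copies it in.
    fn push(&mut self, e: &Entry) {
        self.ring[self.tail % 8] = *e; // the extra copy
        self.tail += 1;
    }

    // Proposed flow: hand out the next ring slot to be filled in place.
    fn prepare(&mut self) -> &mut Entry {
        let idx = self.tail % 8;
        self.tail += 1;
        let slot = &mut self.ring[idx];
        *slot = Entry::default();
        slot
    }
}

struct Nop;
impl Nop {
    fn build(&self) -> Entry {
        Entry { opcode: 0, user_data: 7 }
    }
    fn prepare(&self, sqe: &mut Entry) {
        *sqe = Entry { opcode: 0, user_data: 7 };
    }
}

fn main() {
    let op = Nop;

    // Build then copy.
    let mut a = Sq { ring: [Entry::default(); 8], tail: 0 };
    a.push(&op.build());

    // Write in place, no intermediate copy.
    let mut b = Sq { ring: [Entry::default(); 8], tail: 0 };
    op.prepare(b.prepare());

    // Both flows leave the ring in the same state.
    assert_eq!(a.ring[0], b.ring[0]);
    println!("equal = {}", a.ring[0] == b.ring[0]);
}
```

The observable result is identical either way; the difference is purely where the entry's bytes are first written, which is what the benchmarking discussion above is measuring.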