There's a workaround, but it's unidiomatic, requires more traits, and requires inefficient copying of data if you want to adapt from one to the other.

However, I wouldn't call this a problem with a polling-based model.

At least part of the goal here must be to avoid allocations and reference counting. If you don't care about that, then the design could have been to 'just' pass around atomically-reference-counted buffers everywhere, including as the buffer arguments to AsyncRead/AsyncWrite. That would avoid the need for AsyncBufRead to be separate from AsyncRead. It wouldn't eliminate the awkwardness entirely – you still couldn't, say, have an async function do a read into a Vec, because a Vec is not reference counted – but if the entire async ecosystem used reference-counted buffers, the ergonomics would be pretty decent.
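For illustration only, a minimal sketch of what that buffer argument could look like; SharedBuf and ArcBufRead are made-up names, not anything from an existing crate:

    use std::io;
    use std::pin::Pin;
    use std::sync::{Arc, Mutex};
    use std::task::{Context, Poll};

    // Hypothetical shared buffer: cloning it just bumps a refcount, so the
    // reactor/kernel side could keep it alive even if the caller's future
    // is dropped mid-operation.
    #[derive(Clone)]
    struct SharedBuf(Arc<Mutex<Vec<u8>>>);

    // Hypothetical read trait for a "reference-counted buffers everywhere"
    // world: the buffer is passed by value (a cheap clone), not as &mut [u8].
    trait ArcBufRead {
        fn poll_read(
            self: Pin<&mut Self>,
            cx: &mut Context<'_>,
            buf: SharedBuf,
        ) -> Poll<io::Result<usize>>;
    }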

But we do care about avoiding allocations and reference counting, resulting in this problem. However, that means a completion-based model wouldn't really help, because a completion-based model essentially requires allocations and reference counting for the futures themselves.

To me, the question is whether Rust could have avoided this with a different polling-based model. It definitely could have avoided it with a model where the allocations for async functions are always managed by the system, just like the stacks used for regular functions are. But that would lose the elegance of async fns being 'just' a wrapper over a state machine. Perhaps, though, Rust could also have avoided it with just some tweaks to how Pin works [1]… but I am not sure whether this is actually viable. If it is, then that might be one motivation for eventually replacing Pin with a different construct, albeit a weak motivation by itself.

[1] https://www.reddit.com/r/rust/comments/dtfgsw/iou_rust_bindi...



> I am not sure whether this is actually viable.

Having investigated this myself, I would be very surprised to discover that it is.

The only viable solution to make AsyncRead zero-cost for io-uring would have been to require futures to be polled to completion before they are dropped. That means giving up select and most of the other concurrency primitives we need; you really do want to be able to stop running futures you no longer need, after all.

If you want the kernel to own the buffer, you should just let the kernel own the buffer. Therefore, AsyncBufRead. This will require the ecosystem to shift where the buffer is owned, of course, and that's a cost of moving to io-uring. Tough, but those are the cards we were dealt.
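For reference, here are the two trait shapes, simplified from the futures-io definitions (with the lifetime in poll_fill_buf written out explicitly to make the ownership point visible):

    use std::io;
    use std::pin::Pin;
    use std::task::{Context, Poll};

    trait AsyncRead {
        // The caller owns `buf` and only lends it for a single poll_read
        // call; an io-uring backend can't hand that borrow to the kernel
        // without either copying or risking a use-after-free on early drop.
        fn poll_read(
            self: Pin<&mut Self>,
            cx: &mut Context<'_>,
            buf: &mut [u8],
        ) -> Poll<io::Result<usize>>;
    }

    trait AsyncBufRead: AsyncRead {
        // The IO object owns the buffer and lends out a filled slice; the
        // buffer's lifetime is tied to the IO object, not to a single call.
        fn poll_fill_buf<'a>(
            self: Pin<&'a mut Self>,
            cx: &mut Context<'_>,
        ) -> Poll<io::Result<&'a [u8]>>;

        fn consume(self: Pin<&mut Self>, amt: usize);
    }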


Well, you can still have select; it "just" has to react to one of the futures becoming ready by cancelling all the other ones and waiting (asynchronously) for the cancellation to be complete. Future doesn't currently have a "cancel" method, but I guess it would just be represented as async drop. So this requires some way of enforcing that async drop is called, which is hard, but I believe it's equally hard as enforcing that futures are polled to completion: either way you're requiring that some method on the future be called, and polled on, before the memory the future refers to can be reused. For the sake of this post I'll assume it's somehow possible.
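To make the contract concrete, here is an entirely hypothetical sketch; nothing like CancellableFuture exists today, and the racing logic is elided, since the point is only the shape of the guarantee:

    use std::future::Future;

    // Hypothetical: a future that can be asked to stop and then awaited until
    // it really has stopped (e.g. until the kernel confirms it will no longer
    // write into any borrowed buffers). Async drop would presumably play this
    // role; it's spelled as an explicit method here.
    trait CancellableFuture: Future {
        fn cancel(self) -> impl Future<Output = ()>;
    }

    enum Winner<A, B> {
        Left(A),
        Right(B),
    }

    // The shape a cancellation-aware select would take: the losing future is
    // not silently dropped, its cancellation is awaited before we return, so
    // the caller knows its borrows are over.
    async fn select2<A, B>(a: A, b: B) -> Winner<A::Output, B::Output>
    where
        A: CancellableFuture,
        B: CancellableFuture,
    {
        // 1. Race `a` and `b` until one completes (elided).
        // 2. Call `.cancel()` on the loser and await that cancellation.
        // 3. Only then hand back the winner's output.
        let _ = (a, b);
        unimplemented!("sketch only")
    }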

Having to wait for cancellation does sound expensive, especially if the end goal is to pervasively use APIs like io_uring where cancellation can be slow.

But then, in a typical use of select, you don't actually want to cancel the I/O operations represented by the other futures. Rather, you're running select in a loop in order to handle each completed operation as it comes.

So I think the endgame of this hypothetical world is to encourage having the actual I/O be initiated by a Future or Stream created outside the loop. Then within the loop you would poll on `&mut future` or `stream.next()`. This already exists and is already cheaper in some cases, but it would be significantly cheaper when the backend is io_uring.
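Concretely, the pattern looks something like this (using tokio's select! and pin! purely for illustration; `incoming`, `shutdown`, and `handle` are placeholders):

    use futures::StreamExt;

    async fn run(
        mut incoming: impl futures::Stream<Item = Vec<u8>> + Unpin,
        shutdown: impl std::future::Future<Output = ()>,
    ) {
        // The long-lived future is created and pinned once, outside the loop...
        tokio::pin!(shutdown);

        loop {
            tokio::select! {
                // ...and each iteration only polls a *borrow* of it, so losing
                // a round of select! drops the borrow, not the operation.
                _ = &mut shutdown => break,
                msg = incoming.next() => match msg {
                    Some(buf) => handle(buf),
                    None => break,
                },
            }
        }
    }

    fn handle(_buf: Vec<u8>) {
        // process one completed read
    }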


> But then, in a typical use of select, you don't actually want to cancel the I/O operations represented by the other futures. Rather, you're running select in a loop in order to handle each completed operation as it comes.

You often do want to cancel them in some branches of the code that handles the result (for example, if they error). And it may indeed be prohibitively expensive to wait until cancellation is complete: io-uring cancellation requires a full round trip through the interface. The IORING_OP_ASYNC_CANCEL op is just a hint to the kernel to cancel any blocking work; you still have to wait for a completion to come back before you know the kernel will not touch the buffer you passed in.

And this doesn't even get into the much better buffer management strategies io-uring has baked into it, like registered buffers and buffer pre-allocation. I'm really skeptical of making those work with AsyncRead (now you need to define buffer types that deref to slices while tracking these things independently of the IO object), but since AsyncBufRead lets the IO object own the buffer, it is trivial.
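(To gesture at what that would mean, here is a made-up buffer type of the kind AsyncRead would force on you: something that derefs to a byte slice while also dragging io-uring registration bookkeeping around with it, independently of the IO object.)

    use std::ops::{Deref, DerefMut};

    // Made-up sketch; field layout and naming are illustrative only.
    struct RegisteredBuf {
        buf_index: u16,  // index into the ring's table of registered buffers
        data: Box<[u8]>, // memory that was pre-registered with the kernel
    }

    impl Deref for RegisteredBuf {
        type Target = [u8];
        fn deref(&self) -> &[u8] {
            &self.data
        }
    }

    impl DerefMut for RegisteredBuf {
        fn deref_mut(&mut self) -> &mut [u8] {
            &mut self.data
        }
    }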

Moving the ecosystem that cares about io-uring to AsyncBufRead (a trait that already exists) and letting the low-level IO code handle the buffer is a strictly better solution than requiring futures to run until they're fully, truly cancelled. Protocol libraries should already expose the ability to parse the protocol from an arbitrary stream of buffers, instead of directly owning an IO handle. I'm sure some libraries don't, but that's a mistake that this will course-correct.
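(As a concrete picture of "parse the protocol from an arbitrary stream of buffers": a parser along these lines never touches an IO handle; whoever owns the buffer, whether an AsyncBufRead implementation or an io-uring completion handler, just feeds it. The newline-delimited framing is made up for brevity.)

    // Illustrative sans-IO-style parser: it is handed byte slices and returns
    // events; it neither owns nor knows about the IO handle or the buffer.
    struct Parser {
        pending: Vec<u8>, // accumulated partial frame, if any
    }

    enum Event {
        Frame(Vec<u8>),
        NeedMoreData,
    }

    impl Parser {
        fn new() -> Self {
            Parser { pending: Vec::new() }
        }

        // Feed the next chunk, wherever it lives, and get back any complete
        // frames. A "frame" here is just a newline-terminated chunk.
        fn feed(&mut self, input: &[u8]) -> Vec<Event> {
            let mut events = Vec::new();
            self.pending.extend_from_slice(input);
            while let Some(pos) = self.pending.iter().position(|&b| b == b'\n') {
                let frame: Vec<u8> = self.pending.drain(..=pos).collect();
                events.push(Event::Frame(frame));
            }
            if events.is_empty() {
                events.push(Event::NeedMoreData);
            }
            events
        }
    }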


> Well, you can still have select; it "just" has to react to one of the futures becoming ready by cancelling all the other ones and waiting (asynchronously) for the cancellation to be complete.

Right. Which is more or less what the structured concurrency primitives in Kotlin, Trio, and soon Swift are doing.


Wouldn't a more 'correct' implementation be moving the buffer into the thing that initiates the future (and thus, abstractly, into the future), rather than refcounting? At least with IOCP you aren't really supposed to even touch the memory region given to the completion port until it's signaled completion iirc.

I.e., to me, an implementation of read() that would work for a completion model could be basically:

    async fn read<T: IntoSystemBufferSomehow>(&self, buf: T) -> Result<T, Error>
I recognize this doesn't resolve the early-drop issues outlined, and it obviously does require copying to adapt it to the existing AsyncRead trait, or if you want to, say, update a buffer in an already-allocated object. It's just what I would expect an API working against IOCP to look like, and I feel like it avoids many of the issues you're talking about.
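Spelled out slightly more (still with made-up names, and with a byte count added to the return type so the usage example works), the ownership-passing version might look like:

    use std::io;

    // Purely illustrative stand-ins; the trait name comes from the sketch
    // above, while the impl and the File type here are made up.
    trait IntoSystemBufferSomehow {}
    impl IntoSystemBufferSomehow for Vec<u8> {}

    struct File;

    impl File {
        // The buffer moves *into* the operation and only moves back out once
        // the kernel is done with it, so there's no window where the caller
        // can touch memory the kernel still owns.
        async fn read<T: IntoSystemBufferSomehow>(&self, buf: T) -> io::Result<(usize, T)> {
            // Submit, await the completion, return the filled buffer.
            // Elided: this sketch is only about the signature.
            let _ = buf;
            unimplemented!("sketch only")
        }
    }

    // Usage: the caller doesn't lend the buffer; it gives it away and gets it back.
    async fn example(file: &File) -> io::Result<Vec<u8>> {
        let (n, mut buf) = file.read(vec![0u8; 4096]).await?;
        buf.truncate(n);
        Ok(buf)
    }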


I'm not a Rust expert, so I'm not sure how close this proposal is to Composita:

http://concurrency.ch/Content/publications/Blaeser_Component...

Essentially each component has a buffered interface (an interface message queue), which static analysis sizes at compile time. This buffer can act as a daemon, ref counter, offline dropbox, cache, cancellation check, and can probably help with cycle checking.

Is this the sort of model which would be useful here?



