To block_on or await?

A problem

I recently faced a conundrum in some Rust at the boundary between sync and async code. The library in question uses a persistent tokio executor internally and you interact with it via a C-style API. Since there is no such thing as an async C function, these entry points to the library are all implemented as sync Rust. If they need to do something async on the internal runtime before they return, they run a future with the runtime’s block_on method, which blocks the current thread until the result is ready.

So far there is no issue in what I’ve described. Provided the caller expects the functions to block for as long as they do, everything is working as intended.

This library also does something else. You can register callbacks such that it will spontaneously call some function pointer that you provided when certain events occur. These callbacks have to be delivered on somebody’s thread, and in this case the library will use one of its own. Since the triggers for these callbacks are often async I/O or timers, it’s often a tokio worker thread that ends up invoking the callback. This is allowed because the C callback is a sync function, and async functions can call sync functions.

In isolation these two aspects are valid but when you combine them a problem arises.

1. A callback is triggered, which the library delivers on a tokio worker thread by calling the user’s function.
2. Inside the callback, the user code calls back into the library in direct response to the event.
3. This “tromboned” library function tries to use block_on.
4. The app panics because you can’t use block_on in the context of a tokio runtime.

This is a tricky situation. The function could be called from either a sync or async context. Worse, you might not even notice these combinations of callback-and-call-in unless you tried to do something specific. It’s just a crash lurking in the dark for someone to bump into it.

A wrong turn

Like others before me, my first thought was this: what if these functions could detect whether they’re in the tokio runtime and convert the block_on call to something that “awaits”? I put “awaits” in quotes because the basic idea is clearly impossible—a sync function like this cannot .await on anything unless it was compiled as a future in the first place. However if we take a step back from reality you could imagine some sort of mechanism where the thread talks to the runtime and between them they agree to park this particular future for a while until some other future completes; a dynamically-added await point if you will.

It was at this point I found an old discussion on the Rust users forum, “Accessing the current tokio runtime to call block_on”. In this exciting thread, Alice Ryhl of tokio states unequivocally that you shouldn’t try to do this, and 2e71828 provides a brilliant/slightly cursed workaround involving crossbeam. I’m going to reproduce a slightly-fixed version here for the sake of discussion:

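let future_result = match tokio::runtime::Handle::try_current() {
    // No runtime on this thread: build a throwaway one (overkill, but allowed).
    Err(_) => tokio::runtime::Runtime::new().unwrap().block_on(the_future),
    // Already inside a runtime: block_on would panic here, so use a helper thread.
    Ok(handle) => crossbeam::scope(|scope| {
        scope
            .spawn(move |_| handle.block_on(the_future))
            .join()
            .unwrap()
    })
    .unwrap(),
};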

Alice said:

In principle, the crossbeam::scope thing will “work” […]

How would it work then? Well first we try to get the Handle for the current async context. If it fails (because we’re on a regular sync thread) then we spin up a whole new tokio Runtime to execute our future, which is overkill but allowed. If getting the handle succeeds, we know we can use Handle::block_on to run a future to completion, but only if we’re starting from a sync thread. So we use crossbeam to spawn a separate thread from which we can safely make that call. (I guess using a scoped thread means the future doesn’t have to be 'static?) Kind of genius but not very efficient in either situation.

As far as I know there’s no “good” way to do this because Rust’s async model doesn’t allow you to yield from a future in between await points. It just doesn’t make sense. Even so, suppose I was especially bored and I wanted to hack support for this into tokio in some way. What could I feasibly do?

2e71828’s solution points to one approach: even if we block the tokio worker (which is usually verboten) you can kind of get away with it if you spin up an additional thread to compensate. Creating a new thread for every function is pretty inefficient though. If you had some sort of pool of them… wait, this sounds familiar—ah yes, tokio’s blocking threads. I propose that you could have a separate threadpool that expands up to the same size as the number of regular workers. Then you could have handle.block_on_or_await(future) which acquires a thread from that pool and uses it to poll the future, while allowing the original thread, which has a sync function on top of its stack, to sleep until woken with the result. By reusing these threads you can save the cost of creating them repeatedly.

What if we didn’t want to create extra threads though? We don’t want to starve the executor by taking the current worker thread out of service. The solution: spawn the inner future, then while waiting for its result, take some other future that’s ready to poll from the executor and poll it on top of the stack of the currently-polled future. Then we can check whether the result came through in between polls of other futures, allowing us to reduce the stack by a layer. Normally spawned futures have to be 'static… I feel like you could unsafely assert that references are stable in this case since it’s buried in the stack of another thread, but I would leave that decision to somebody who knows things about unsafe.

With either approach, I don’t like my chances of getting that PR merged.

A little common sense

In reality the problem with the design is at the opposite end—invoking the callbacks. If there’s any possibility that a sync function will block for a significant period of time then it should not be called directly from a worker thread since it could starve the executor. The user callback is a perfect example of an unpredictable sync function. They might not allow it to return for ten seconds and that’s my problem to deal with.

Therefore the solution is to ensure that all callbacks are delivered on non-worker threads, either via tokio’s spawn_blocking API or some other std::threads created for the purpose. This solves the original problem neatly: if the user never has the opportunity to run their own code on threads belonging to the internal tokio runtime then they can always safely call a function that performs block_on. Unfortunately this isn’t the sort of problem that the compiler can catch but it’s something you can look out for in review.

If you want an additional incentive not to expose your workers, consider this hair-raising scenario: if the user code operating the C API is also Rust, it’s possible for them to call tokio::spawn inside the sync function (using their version of the tokio library) to inject a future into your runtime (using your version of the tokio library). This often works because thread local storage is great like that, and again it’s not something where the compiler will help. It’s a surprisingly easy mistake to make if both sides are heavily async code.