This repository was archived by the owner on Apr 1, 2025. It is now read-only.

Conversation

@patrickt
Contributor

@patrickt patrickt commented Mar 12, 2020

Do not merge this until we’re confident it won’t cause performance regressions in clients.

This removes the Distribute effect, which was problematic. Read on to find out why. You can also read @lexi-lambda's blog post, which explains this problem in the context of the MonadBaseControl ecosystem (which I won't discuss further here).


Consider this age-old situation: given a function fn and a container list, we want to apply this function to every element in the container in such a way that it uses all available computational resources. Put another way, given

fn `parallelMapM` list

we expect its evaluation time to be bounded by whichever element e in list takes the longest to compute when fn is applied to it. Formally stated, we anticipate the existence of a sequential-map function whose running time is the sum of the costs of applying fn to each element, and a parallel-map function whose running time is bounded (modulo scheduling overhead) by the cost of the single most expensive application of fn.

This is a pressing need for any real-world programming language, and indeed it is satisfied in Haskell IO, with traverse and mapConcurrently. Evaluating this code will print each character in 'abcdef' in arbitrary order:

Control.Concurrent.Async.mapConcurrently print "abcdef"

The concurrency here is observable thanks to the print statement, and powered by the special support GHC has for its IO type. Once we get outside of IO, things start getting a little more interesting.

It's worth thinking about the cases when parallelism isn't observable. Because Haskell is a lazy language, the statement map (+1) [1,2,3], in isolation, doesn't produce any computation, much less any concurrency: to get concurrency, we need to invoke functions on these values in a context (like IO or STM) where we can observe, thanks to side effects, that a given action has been invoked.

However, if our program behavior is unaffected by whether a given map operation is parallel or serial, the parallel library can come in handy. It provides combinators that use behind-the-scenes magic to tell GHC "hey, I've mapped fn over list: if you access any such application inside list, you might want to start evaluating the others in parallel, because I expect you to compute them ASAP."
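The idea behind those combinators can be sketched with just the `par` and `pseq` primitives that GHC ships in base (the parallel library's strategies are built on the same machinery). This is a minimal illustration, not the parallel library's actual implementation; the name parMapSketch is mine. Note that the result is the same whether or not the sparks actually run in parallel, which is exactly the "unobservable parallelism" property described above.

```haskell
import GHC.Conc (par, pseq)

-- A base-only sketch of a parallel map: `par` sparks evaluation
-- of the head in parallel, while `pseq` forces the tail first,
-- so the runtime is free to evaluate elements concurrently.
-- The result is identical to `map f xs` either way.
parMapSketch :: (a -> b) -> [a] -> [b]
parMapSketch _ []     = []
parMapSketch f (x:xs) =
  let y  = f x
      ys = parMapSketch f xs
  in y `par` (ys `pseq` (y : ys))
```

Without the threaded runtime the sparks simply fizzle and evaluation proceeds sequentially, which is why the program's observable behavior cannot depend on them.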

It is atop these combinators (specifically parTraversable) that we built a Distribute effect, providing distributeFor and distributeFoldMap functions:

distributeFor :: (Has Distribute sig m, Traversable t) => t a -> (a -> m output) -> m (t output)
distributeFoldMap :: (Has Distribute sig m, Monoid output, Traversable t) => (a -> m output) -> t a -> m output

(Side note: distributeFoldMap is worth mentioning, because it's a long name for a storied concept: map-reduce. Indeed, foldMap is so universal—it's one of the fundamental methods on Foldable, from which all other Foldable methods descend—because it describes the tremendously-common pattern of map-reduce with a monoidal, associative function.)
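To make that map-reduce shape concrete, here is a tiny base-only example (the name sumOfSquares is mine, not part of this PR): each element is mapped into a monoid, and the monoid's associative <> does the reduce step.

```haskell
import Data.Monoid (Sum (..))

-- foldMap is map-reduce in miniature: map each element into a
-- monoid (here Sum, i.e. Int under addition), then reduce the
-- results with the associative (<>).
sumOfSquares :: [Int] -> Int
sumOfSquares = getSum . foldMap (\x -> Sum (x * x))
```

Because (<>) is associative, nothing in this definition depends on the order in which the partial results are combined, which is what makes the pattern amenable to distribution in the first place.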

So, ideally, we should be able to just use this effect and everything should be hunky-dory, right? Wrong. I said earlier that the combinators provided by parallel only take effect in cases where the parallelism isn't observable. Thus, the following code using Distribute's combinators doesn't do what you think it might:

runM (withDistribute (distributeFor "abcdef" (sendM . print)))

This function constructs the computation over "abcdef" in parallel, but evaluates it sequentially; as a result, we get the characters printed in the same order every time, in contrast to the mapConcurrently example above.

Clearly, this is a bug: if distributeFor can't actually distribute its computations, then it's not useful for anything. We do save a little time in the cases where constructing the computations in parallel is itself expensive, but evaluation remains sequential, so it's not real concurrency: the same project, evaluated in Semantic, will produce its observable results in the same order, every time.


This raises the question: why not just use a fused-effects style wrapper for mapConcurrently? The Lift effect provides, with the liftWith function, an API that on its surface is powerful enough to do the lifting/unlifting required to promote mapConcurrently to this:

mapConcurrently :: (Has (Lift IO) sig m, Traversable t) => (a -> m b) -> t a -> m (t b)

In practice, this is not the case, because liftWith makes us handle our monadic state explicitly. Each invocation of liftWith requires that we tell fused-effects how to propagate updates to monadic state, even if the lifting function does something weird like call catch or throw. Other languages don't have to worry about this, because they're always stuck in IO; in Haskell, any given effect might be in a pure or impure context, and since we might not be in IO, we can't rely on global mutable state to save and update monadic state as necessary.

As such, we simulate stateful computations by using carrier monads of the form s -> m (s, a): functions that take a state and return a monadic pair of newly-updated state and return value. The accumulated set of these stateful variables is called a "context", and to use liftWith we have to indicate, before and after a lifting process, what we are doing with the context. This poses a problem for each call to mapConcurrently: if we map a monadic function over each element of a given list, how do we ensure that the state changes produced by that function are visible to future invocations?
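The carrier shape described above can be spelled out in a few lines of base-only Haskell (a simplified sketch, not fused-effects' actual StateC; the names here are mine). Each bind threads the updated state into the next computation, which is precisely the threading that a concurrent map has no single obvious way to perform.

```haskell
import Data.Functor.Identity (Identity (..))

-- A state-passing carrier of the shape s -> m (s, a): a function
-- from an incoming state to a monadic pair of updated state and
-- result. Sequential (>>=) threads the state; a concurrent map
-- would have no single "next" state to thread.
newtype StateC s m a = StateC { runStateC :: s -> m (s, a) }

instance Functor m => Functor (StateC s m) where
  fmap f (StateC run) = StateC $ \s -> fmap (fmap f) (run s)

instance Monad m => Applicative (StateC s m) where
  pure a = StateC $ \s -> pure (s, a)
  StateC mf <*> StateC ma = StateC $ \s -> do
    (s', f)  <- mf s
    (s'', a) <- ma s'
    pure (s'', f a)

instance Monad m => Monad (StateC s m) where
  StateC ma >>= f = StateC $ \s -> do
    (s', a) <- ma s
    runStateC (f a) s'

get' :: Applicative m => StateC s m s
get' = StateC $ \s -> pure (s, s)

put' :: Applicative m => s -> StateC s m ()
put' s = StateC $ \_ -> pure (s, ())
```

For example, runIdentity (runStateC (put' 5 >> get') 0) threads the put through to the get, yielding (5, 5).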

Because fused-effects does not allow us to constrain the type of mapConcurrently in a way that would provide vocabulary for this use case, we can't express it with just liftWith. This has been a point of investigation for some time. We are, however, able to write a more type-restricted version:

mapConcurrently :: (Has (Lift IO) sig m, Traversable t) => (a -> IO b) -> t a -> m (t b)
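One way this restricted version could be implemented (a sketch under my own naming, mapConcurrentlyIO, assuming the async package and fused-effects' Lift machinery): because the per-element action already lives in IO, the entire concurrent traversal happens inside IO, and only the finished result needs to be lifted back into m with sendM, so no question of threading monadic state ever arises.

```haskell
import Control.Algebra (Has)
import Control.Effect.Lift (Lift, sendM)
import Control.Carrier.Lift (runM)  -- runM used only in the example below
import qualified Control.Concurrent.Async as Async

-- Hypothetical helper: the worker is already an IO action, so the
-- whole concurrent traversal runs in IO via mapConcurrently, and
-- sendM lifts only the finished result into the ambient monad m.
mapConcurrentlyIO :: (Has (Lift IO) sig m, Traversable t)
                  => (a -> IO b) -> t a -> m (t b)
mapConcurrentlyIO f t = sendM (Async.mapConcurrently f t)
```

With this, runM (mapConcurrentlyIO print "abcdef") really does print the characters in arbitrary order, recovering the behavior of the bare mapConcurrently example at the top.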

UNDER CONSTRUCTION

@patrickt patrickt changed the base branch from master to this-carrier-is-in-time-out March 12, 2020 00:57
@patrickt patrickt changed the title Remove Distribute effect. [Do not merge] Remove Distribute effect. Mar 12, 2020
@patrickt patrickt changed the title [Do not merge] Remove Distribute effect. Remove Distribute effect. Mar 12, 2020
@patrickt patrickt changed the title Remove Distribute effect. [WIP] Remove Distribute effect. Mar 31, 2020
@patrickt patrickt changed the base branch from this-carrier-is-in-time-out to master July 27, 2020 19:53
@patrickt patrickt closed this Jul 27, 2020
@patrickt patrickt deleted the sane-concurrency branch July 27, 2020 19:55