Let’s Reify Effects

This is my attempt at demystifying effect systems of pure functional languages and how to implement them in impure imperative languages.

Function Purity

To tell whether a function written in an impure language is pure, just ask yourself whether it can be replaced by a (possibly infinite) mapping from its arguments to its results – nice talk here. This corresponds to the mathematical definition of a function. Effects are what make functions impure:

Input/Output actions.
Never returning.
Crashing the program.
Throwing an exception.
In memory mutation.

Pros of pure functions:

Easier to test – eg: no mock db no file setup.
Type signature offers a better documentation.
No race condition means easier concurrency.
Can be memoized.
Easier to reason about (somehow subjective).

Pros of impure functions:

Can actually do something useful by performing IO actions.

So we want to write program with pure functions but we also want to produce programs that have effects on the real world. Because effects are contagious, a function becomes impure when it calls an impure functions. That means that if we want to maximize the amount of logic implemented by pure functions we need to reject effects to the edge of our programs. This architecture has many names: functional core and imperative shel, port and adapters, or hexagonal – nice talk here.

Supporting Output Actions

One approach to move logic inside the functional core is to reify effects. That is that impure functions are turned into pure functions by making them compute description of their effects rather than letting them directly carrying them. These reified effects still need to be interpreted to do anything useful.

// Runtime [impure] //
import { writeFile } from "node:fs/promises";
const run = async (effect) => {
  if (effect.type === "writeFile") {
    await writeFile(effect.path, effect.content, effect.encoding);
  } else {
    throw new Error("unknown effect");
  }
};

// Prelude [pure] //
const writeFileEffect = (path, content, encoding) => ({
  type: "writeFile", path, content, encoding,
});

// Main [pure] //
const main = writeFileEffect("file.txt", "foo", "utf8");

// Runtime [impure] //
await run(main);

At the extreme, every single effect (beside running forever or crashing) carried by the program is reified that way. We then speak about effect system. Effect systems can be made generic and reusable. This is what pure functional language usually do to support IO actions:

In haskell, it is called IO.
In elm, it is called task.
In purescript, it is called Effect.

Combining Output Actions

We can add both concurrent and sequential composition to our effect system:

// Runtime [impure] //
import { writeFile } from "node:fs/promises";
const run = async (effect) => {
  if (effect.type === "writeFile") {
    await writeFile(effect.path, effect.content, effect.encoding);
  } else if (effect.type === "concurrent") {
    await Promise.all([run(effect.left), run(effect.right)]);
  } else if (effect.type === "sequence") {
    await run(effect.first);
    await run(effect.second);
  } else {
    throw new Error("unknown effect");
  }
};

// Prelude [pure] //
const writeFileEffect = (path, content, encoding) => ({
  type: "writeFile", path, content, encoding,
});
const concurrentEffect = (left, right) => ({
  type: "concurrent", left, right,
});
const sequenceEffect = (first, second) => ({
  type: "sequence", first, second,
});

// Main [pure] //
const main = sequenceEffect(
  writeFileEffect("file1.txt", "foo", "utf8"),
  concurrentEffect(
    writeFileEffect("file2.txt", "bar", "utf8"),
    writeFileEffect("file3.txt", "qux", "utf8"),
  ),
);

// Runtime [impure] //
await run(main);

Supporting Input Actions with Abstract Machines

Up until now, our effect system could only express entirely static programs that do the same thing every-time they are executed. To make our program more dynamic we need to retrieve the values returned by our effects and somehow plug them back to the functional core.

One idea is to turn our main into a function that accept an input and returns an effect. Then we repeatedly call this function with whatever input is available. To help our main function to make sense of the input we should also bookkeep a state.

// Runtime [impure] //
import { readFile, writeFile } from "node:fs/promises";
const run = async (effect) => {
  if (effect.type === "writeFile") {
    return await writeFile(effect.path, effect.content, effect.encoding);
  } else if (effect.type === "readFile") {
    return await readFile(effect.path, effect.content, effect.encoding);
  } else if (effect.type === "concurrent") {
    return await Promise.all([run(effect.left), run(effect.right)]);
  } else {
    throw new Error("unknown effect");
  }
};

// Prelude [pure] //
const writeFileEffect = (path, content, encoding) => ({
  type: "writeFile", path, content, encoding,
});
const readFileEffect = (path, encoding) => ({
  type: "readFile", path, encoding,
});
const concurrentEffect = (left, right) => ({
  type: "concurrent", left, right,
});

// Main [pure] //
const main = {
  initial: {
    state: "reading",
    effect: readFileEffect("file.txt", "utf8"),
  },
  step: ({state, input}) => {
    if (state === "reading") {
      return {
        state: "writing",
        effect: concurrentEffect(
          writeFileEffect("copy1.txt", input, "utf8"),
          writeFileEffect("copy2.txt", input, "utf8"),
        ),
      };
    } else if (state === "writing") {
      return {
        state: "final",
        effect: null,
      };
    } else {
      throw new Error("invalid state");
    }
  },
};

// Runtime [impure] //
{
  const { step } = main;
  let { initial: { state, effect } } = main;
  while (effect !== null) {
    ({ state, effect } = step({ state, input: await run(effect) }));
  }
}

Our main effectively encodes an abstract machine: an initial state, a final state and a state transition function. By also reifying the state, abstract machines facilitate formally reasoning about programs and enable powerful introspection techniques – eg: time travel debugging. However it doesn’t scale well with complexity and expressing real-world software requirements as an abstract machine seems like a nightmare. I might be wrong. Maybe, they will be the way to write programs in the future. With the advent of AI, who can tell?

reading
  |
  | -> readFile
  | <- "content"
  |
writing
  |
  | -> (writeFile, writeFile)
  | <- [undefined, undefined]
  |
final

It’s worth noting that there are actually two nested abstract machines at play here. What we discussed above corresponds to the external machine that describes the transition of the program between effects. But these outer transitions are themselves composed by many inner transitions dependent of the programming language at hand. A good formalism to express these inner transitions is the CESK machine.

reading
  |
  | -> readFile
  | <- "content"
  |
  writeFileEffect()
  |
  writeFileEffect()
  |
  concurrentEffect()
  |
writing

Supporting Input Actions with Callbacks

Another, more practical way to support input actions is to allow pure functions inside effects. Let’s tweak our effect system to handle these callbacks and use some of the big-brain Haskell names.

// Runtime [impure] //
import { readFile, writeFile } from "node:fs/promises";
const run = async (effect) => {
  if (effect.type === "writeFile") {
    return await writeFile(effect.path, effect.content, effect.encoding);
  } else if (effect.type === "readFile") {
    return await readFile(effect.path, effect.content, effect.encoding);
  } else if (effect.type === "fmap") {
    return effect.mapping(effect.child);
  } else if (effect.type === "liftA2") {
    return effect.combine(... await Promise.all([run(effect.left), run(effect.right)]));
  } else if (effect.type === "bind") {
    return await run(effect.makeSecond(await run(effect.first)));
  } else if (effect.type === "return") {
    return await effect.makeSecond(await run(effect.first));
  } else {
    throw new Error("unknown effect");
  }
};

// Prelude [pure] //
const readFileEffect = (path, encoding) => ({
  type: "readFile", path, encoding,
});
const writeFileEffect = (path, content, encoding) => ({
  type: "writeFile", path, content, encoding,
});
// Effects are functors:
const fmapEffect = (mapping, child) => ({
  type: "fmap", mapping, child,
});
// Effects are applicatives:
const liftA2Effect = (combine, left, right) => ({
  type: "liftA2", left, right, combine,
});
// Effects are monads:
const bindEffect = (first, makeSecond) => ({
  type: "bind", first, makeSecond,
});
const returnEffect = (result) => ({
  type: "return", result,
});

// Main [pure] //
const main = bindEffect(
  readFileEffect("file.txt", "utf8"),
  (content) => liftA2Effect(
    (_write1, _write2) => null,
    writeFileEffect("copy1.txt", content, "utf8"),
    writeFileEffect("copy2.txt", content, "utf8"),
  ),
);

// Runtime [impure] //
await run(main);

The code is more concise and readable but states are no longer explicit. Indeed, states have been encoded inside the effects. As a result, the two nested abstract machines are no longer cleanly separated.

Also, by polluting effects with functions they are no longer pure data and cannot be expressed inside a DSL.

Supporting In-Memory Mutations

From a functional point of view, memory mutations are no different from IO actions. They are just faster. Let’s use the same system to support them.

// Runtime [impure] //
const run = (effect) => {
  if (effect.type === "get") {
    return effect.map.get(effect.key);
  } else if (effect.type === "set") {
    return effect.map.set(effect.key, effect.val);
  } else if (effect.type === "log") {
    return console.log(effect.message);
  } else if (effect.type === "bind") {
    return run(effect.makeSecond(run(effect.first)));
  } else {
    throw new Error("unknown effect");
  }
};

// Prelude [pure] //
const getEffect = (map, key) => ({
  type: "get", map, key,
});
const setEffect = (map, key, val) => ({
  type: "set", map, key, val,
});
const logEffect = (message) => ({
  type: "log", message,
});
const bindEffect = (first, makeSecond) => ({
  type: "bind", first, makeSecond,
});

// Main [pure] //
let map = new Map();
const main = bindEffect(
  setEffect(map, "foo", "bar"),
  (_) => bindEffect(
    getEffect(map, "foo"),
    logEffect,
  ),
);

// Runtime [impure] //
run(main); // logs bar

Supporting Timeouts

Up until now, effects had a clear before and after which only made it necessary to add callbacks in the bind effect combinator. This is no longer the case with timers which are indeed effects as they depend on the global state of the JavaScript VM. For instance, setTimeout is not referentially transparent because setTimeout(() => {}, 1000) !== setTimeout(() => {}, 1000).

// Runtime [impure] //
const run = (effect) => {
  if (effect.type === "setTimeout") {
    return setTimeout(() => run(effect.callback()), effect.time);
  } else if (effect.type === "clearTimeout") {
    return clearTimeout(effect.timer);
  } else if (effect.type === "log") {
    return console.log(effect.message);
  } else if (effect.type === "bind") {
    return run(effect.makeSecond(run(effect.first)));
  } else {
    throw new Error("unknown effect");
  }
};

// Prelude [pure] //
const setTimeoutEffect = (callback, time) => ({
  type: "setTimeout", callback, time,
});
const clearTimeoutEffect = (timer) => ({
  type: "clearTimeout", timer,
});
const logEffect = (message) => ({
  type: "log", message,
});
const bindEffect = (first, makeSecond) => ({
  type: "bind", first, makeSecond,
});

// Main [pure] //
const main = bindEffect(
  setTimeoutEffect(
    () => log("this should not be printed"),
    2000,
  ),
  (timer) => setTimeoutEffect(
    () => clearTimeoutEffect(timer),
    1000,
  ),
);

// Runtime [impure] //
run(main);

Many other effects also require a callback directly inside the effect rather than inside the bind combinator: listening to http requests, listening to user click on a button, etc… Note that callbacks inside the bind combinator can be nicely chained whereas callbacks directly inside the effects requires manual nesting to be composed. That is the reason why you should prefer promises wherever they make sense despite what Mikeal Rogers says here.

Takeaway

If you have the opportunity to work with a pure functional language, great. But let’s be real, most of us are stuck with imperative languages. Nonetheless, I found out that reasoning in terms of functional core and imperative shell was helpful. And, rejecting the effects to the border of the program is generally a good idea. I sometimes use effect reification as a design pattern to achieve this. Even if this is not a full-blown generic effect system, it is already beneficial.

I had to instrument JavaScript files fo my work. This required fetching the associated source map file and source files. I ended up rejecting readFile effects by reifying them into url requests:

import { readFile } from "node:fs/promises";

// https://github.com/getappmap/appmap-agent-js/blob/055d138abb9dba260db8fc95ad412dcf339be3e4/components/instrumentation/default/codebase.mjs#L21
export const extractMissingUrlArrayPure = (url, files) => {
  // may return:
  // - the file url to be instrumented
  // - the url of the source map file
  // - the url of the source files
};

// https://github.com/getappmap/appmap-agent-js/blob/055d138abb9dba260db8fc95ad412dcf339be3e4/components/instrumentation/default/index.mjs#L21
export const instrumentPure = (url, files) => {
  // return: the instrumented content of the file.
};

// https://github.com/getappmap/appmap-agent-js/blob/055d138abb9dba260db8fc95ad412dcf339be3e4/components/agent/default/index.mjs#L77
export const instrumentImpure = async (url, content) => {
  const files = new Map([url, content]);
  while (true) {
    const urls = extractMissingUrlArrayPure(url, file);
    if (urls.length === 0) {
      return instrumentPure(url, file);
    }
    for (const url of urls) {
      files.set(url, await readFile(url, "utf8"));
    }
  }
};