2024-08-02

Reversing normalization for errors

Normalization is a technique used to ensure that any two objects that represent the same thing have the same form aka the normal form, a unique form of an expression.

What is the problem?

Normalization is one of the core concepts in programming languages, but it leads to not great errors and normalization is everywhere, but hopefully reversing normalization may save us.

Normalization is evaluation

Normalization is useful whenever we want to do pretty much anything with an object, if I want to (print 3 + 2 times the message Hello) then I first normalize 3 + 2 to 5 and then I will (print 5 times the message Hello).

The normal form of many expressions are known as a value, so the act of producing a value is also known as evaluation. In fact “running” a program is no different from normalizing it.

Normalization for type checking

Normalization is also useful whenever we want to check if two objects are equal, such as during type checking.

How does TypeScript knows that the following code should be allowed?

1 2	type Id<A> = A; const f = (x: number): Id<number> => x;

Well the function wants to return a Id<number> but it is actually returning a number, but this works, because the normal form of Id<number> is actually number and number == number is trivially true.

So by first normalizing the types we reduced the entire checking process to a simple equality. This is the gist of pretty much all fancy’ish type checking.

But what happens when it fails?

Let’s say you wrote a bug and you’re trying to debug it, you have a hunch of where the issue should be and as a professional programmer your first reflex is to add a debug message to check.

const f = (x: number) => {
  // checking if f is called with the right value
  console.log(x);
};

After running the program, you find out that x is actually wrong, it should be 13 but it’s 12, the main issue is that f is being called in the entire codebase, you can check stack traces, but those are hard to read, such if you have bad async code, it would be really nice if you could just regex search the codebase and find the desired call site or even better just ask the language where this value was produced.

Sadly this doesn’t work, maybe on the call site it was actually called with f(5 + 7) and you only know 12, also, sadly the language doesn’t know where a value is created, as only the actual value is known but not how it was created, as such, good luck.

This actually happens, because types

Another case, likely more realistic and painful are those on the type level. Let’s check the previous example, but now with a twist, the type constructor is wrong.

// far.away.lib.ts
type Id<A> = string; // wrong

// your.crud.ts
const f = (x: number): Id<number> => x;

The typer will try to do the same thing, it is expecting Id<number> and you returned number, but this time normalizing Id<number> actually produces string and string == number is false.

This would not be a major problem, but when reporting the error it reports string is not a number, which is a reasonable error, except that nowhere on your code there is any mention of string. The issue is that the typer is giving you the diff over the normal form, which may look unrelated to the actual annotations.

Solving the problem

While there is a lot of partial solutions for both issues, I’ve been thinking about it for a while, trying to find a principled solution that always works, hopefully one that can be implemented without too much overhead for a debugging build and that works nicely for the small interpreters inside of typers.

Originally my hope was to just preserve the original form as much as possible, this is okay(ish) for type checking because the forms are small and you don’t need to hold intermediary values, but still, that implies on preserving a lot of versions of the same type, which is not very efficient.

Reductions as equivalences

Recently I was toying with an idea about rewriting systems, making reductions into actual equivalences, to make any reduction into an equivalence, so I played with adding the reverse reduction on the system, but most reductions actually erase information, so you don’t know what was the “original” term.

The reductions as equivalences thing, didn’t go anywhere yet, but the idea of reversing reductions is quite interesting.

Finally, reversing reductions

The key insight here, is that if you can reverse all the reductions, then you can just go back to the original term, producing wherever error message you want and showing wherever intermediary step you want.

The only requirement is to track which reductions happened, I will be calling this the reduction’s receipt. Here it will be a special node on my tree, but in a efficient implementation it is likely tracked in a different way, I have a couple ideas, but this is for another post.

Here I will focus on evaluation of the lambda calculus, while this may look unrelated to typing, the idea is the same, in fact you can think about type constructors as lambda’s on the type level. As such the original example could still be produced.

A reversible LC

Let’s imagine the Lambda Calculus extended with let.

Term (M, N) ::=
  | x        // var
  | x => M   // lambda
  | M N      // apply
  | x = N; M // let

It has two reductions, beta and subst.

1 2	(x => M) N \|-> x = N; M // beta x = N; M \|-> M{x := N} // subst

Beta will produce let and subst will eliminate them, this is a more or less traditional system. Here you can notice that both reductions actually loose information, this can be verified because both the following examples will produce the same term and as such you cannot dinstinguish between them after normalized.

(x = 2; f x) // you wrote this
(f 2) // output

(x => f x) 2 // but what if you wrote this?
(x = 2; f x)
(f 2) // output

Of course, this is intended as they share the same normal form, but here the goal is to be able to actually figure out which one you wrote, so let’s extend the system and tweak the reduction rules such that they preserve the receipts.

Receipt (R) ::=
  | [beta]
  | [x = N] // subst
  | [x] // var
Term (M, N) ::=
  | #R; M // receipt

(x => M) N |-> #[beta]; x = N; M  // beta
x = N; M |-> #[x = N]; M{x := #[x]; N} // subst

Now let’s check the original examples using those modified rules.

(x = 2; f x)
(#[x = 2]; f (#[x]; 2)) // actual output
(f 2) // prune receipt

(x => f x) 2 // but what if you wrote this?
(#[beta]; x = 2; f x)
(#[beta]; #[x = 2]; f (#[x]; 2)) // actual output
(f 2) // prune receipt

Notice that they actually produce different results, so normalization is broken. And yes technically it is, but by prunning those they actually go back to producing the same normal form. But now all the original information is preserved and as such those rules can be reversed.

1 2	#[beta]; x = N; M \|-> (x => M) N // rev-beta #[x = N]; M{x := #[x]; N} \|-> x = N; M // rev-subst

Mostly the final goal is to have the following

eval : (initial : Term) -> (normal : Term);
rev : (normal : Term) -> (initial : Term);

eval_rev_is_id : (initial : Term) -> term == rev (eval term);

Typers are interpreters and I gotta go fast

Another idea that I’ve been playing for a while is that typers are just interpreters, this allows all the techniques developed for evaluating code to be used while typing, such as abstract machines, closure conversion, defunctionalization, bump allocation and JITs, to be used here.

This is especially important when thinking about languages that have a lot of computing on equality, such as dependently typed languages or TypeScript. But under the right circunstances it may also be quite interesting to speeding up elaboration.

This is half the problem

Reversing reductions doesn’t solve the error messages problem, but it unlocks the best possible results, by having the full term, we can then start producing better tools for fancier type systems. Maybe more important than that, it also allows for much faster development of programming languages, hopefully type errors don’t require almost complete rewrites of codebases, as long as you can add receipts back into the system, you’re good.

I also need to think in a slightly more formal way about the receipts and try further to find prior research on them, the general trick seems to work but as always, it’s easy to screw it up. Follow me on Twitter(not X) and Twitch for that.

https://x.com/TheEduardoRFS/
https://www.twitch.tv/EduardoRFS