In my thesis, I exclusively used Agda for decidability analysis and and Coq for algorithmic analysis. This decision is based on an unobvious technical observation. This blog explains this technical observation, and why it would have been extremely difficult to work out the same undecidability proof in Coq.

## Summary

To give a high-level summary, the main difficulties to establish the same proof in Coq as in Agda are:

- inference rules live in Prop instead of Type,
- names are represented by locally nameless and cofinite representation in formalization in Coq, and
- termination checking in Coq requires a specified decreasing structure.

This difficulties are mainly encountered when attempting to show transitivity of \(D_{<:}\) normal form (Theorem 3.20). These points requires more careful technical setup to be fully clear.

`Type` and `Prop`

Though the majority of my work is in Coq, the undecidability proofs discussed in Chapter 3, were done in Agda. This is not simply for reasons like “wow, it would be cool to use two proof assistants”, but for something more fundamental.

The first difficulty is about proof relevant universes versus proof irrelevant
universes. Up until Agda 2.5.4.2, Agda has no universe for propositions (but there is
one now in Agda 2.6+), which means the entire language resides in a predicative
universe hierarchy (called `Set`s). On the other hand, Coq is way more
complicated. It has not only the predicative universe hierarchy as in Agda (called
`Type`s), but also other two universes, `Set` and `Prop`. It is very
misleading that `Set` in Coq is not the same as `Set`s in Agda. Specifically,
`Set` in Coq is a small universe which can only contain data. `Set`s in Agda
corresponds to `Type`s in Coq.

The problem here is that `Prop` in Coq is a proof irrelevant universe. Proof
irrelevance is a concept stating that terms inhabited in types that are `Prop` are
purely logical and any two proofs of the same proposition are considered identical. A
`Prop` in Coq can only be eliminated into data (`Set` and `Type`) when the
definition of the `Prop` satisfies *singleton elimination principle*.

In programming languages, it is a convention to put inference rules in
`Prop`. Therefore, libraries designed for programming languages researchers assumes
it is the case.

First, let me explain what `Prop` cannot do. Consider the following two types.

The first one is existential quantification, which is typical in logic. It simply confirms the existence of something but gives away no precise information of the witness. The second one is called a \(\Sigma\) type and represents a pair, where the type of the second element depends on the value of the first element. For example, let us assume

and \(P\) is meal. If \(x\) is morning, then \(P(x)\) is breakfast, and cereal is one of the values. If \(x\) is evening, then \(P(x)\) is dinner, and steak is one of the values. Having a \(\Sigma\) type, I can know the time, together with a concrete proof that there will be a meal at that time. This is because \(\Sigma\) type stores data. In general,

This is because one can always forget about the information. Instead of saying the information is about the dinner in the evening, I can always forget about the time and only be concerned about the fact that there will be food for the day without even caring about when it is. Clearly, I have lost information by going through this direction.

Though consistent with the theory, the other way is not generally provable.

This direction requires an axiom, called the *indefinite description* or *Hilbert’s*
\(\epsilon\). This axiom turns out to be very strong as it can be used to prove
the axiom of choice. In short, the proof irrelevant nature of universe `Prop`
forgets the internal structures of derivations of inference rules and prevents
their treatment as pieces of data, because in `Prop`, any two proofs of the same
conclusion are considered the same.

However, when considering undecidability proof, I have to consider interactions between features in a judgment derivation using Gentzen's style of proof induction. This, in turn, is to treat proofs as prices of structural data. This view naturally contradicts to the philosophy of proof irrelevance.

This problem is fundamental. Technically, the barrier between `Prop` and `Type` is
firm in Coq. Since there is virtually no way to extract data information from `Prop`
(except when the definition satisfies singleton elimination which is very often not
the case), if a proof step requires induction on a definition in `Prop`, then it
forces all other dependent definitions into `Prop` as well. An alternative solution
is to turn this definition a `Type`, which is not always possible.

In contrast, Agda has no proof irrelevant universe, so such problem simply does not occur. Even in Agda 2.6, the majority of concepts are defined in Sets, which limits the troubles created by proof irrelevance.

In my opinion, most of the proof techniques in programming languages are representable
in first order fragment of intuitionistic type theory, so requirement of `Prop` does
not look justified. For other proofs that cannot be shown in the first order fragment,
it is likely that having `Prop` is not going to resolve this issue. It appears
`Prop` does not give many advantages from this perspective. Therefore, working with
proof relevant universes had been a much better choice.

It might still have been still possible to establish transitivity of \(D_{<:}\) normal form. However, a particular name representation had introduced significant difficulties to the problem which makes establishment of the property very unlikely.

## Name Representations

In informal presentation of programming languages, we often use Barendregt's convention of \(\alpha\) conversions. That is, all terms and types are considered the same if free variables are the permuted. In formal analysis, however, such equivalence is too abstract to represent. Very often, other name representations are used instead. Among these representation, de Bruijn indices are one choice. In de Bruijn indices, each name is presented as a natural number, indexing the telescope. Intuitively, de Bruijn indices take a quotient of the set of terms and types over all names.

Another possibility is locally nameless representation with cofinite quantification (or LN-cofinite in short). In this representation, a name has dual representation: a string when bound in the telescope, or a de Bruijn index when is closed in a term or a type.

There are many advantages of LN-cofinite in the context of soundness proofs. However,
this name representation does not have all advantages over other name representations,
and there is one particular issue with LN-cofinite which gives substantial trouble
with interacting with `Prop`.

### Cofinite quantification

[1] introduced LN-cofinite as an answer to the POPLmark challenges. In LN-cofinite,
when a fresh name is needed, such free name is specified by a universal
quantification, with a condition of not being contained in a specific finite name
store, hence *cofinite*. For ease of presentation, I will use \(F_{<:}^-\) has an
example. Note that there are other ways to resolve the situation in \(F_{<:}^-\),
because it is not as complicated as \(D_{<:}\). I use \(F_{<:}^-\) only
because it has less rules and it is exposed to the same problem.

Notice that this definition of \(F_{<:}^-\) uses Barendregt's convention as it implicitly requires \(\alpha\) conversion in the All rule when drawing a fresh \(X\) as a type variable. A formal definition using LN-cofinite would define the All rule alternatively as follows:

I use \(\Pi\) to denote universal quantification in type theory, in order to distinguish the symbol \(\forall\) which has been used as a part of the syntax in \(F_{<:}^-\).

Notice that, in this formal definition, \(L\) is a finite name store. The second premise in All' holds for any \(X\) that is not contained in \(L\). That is, in this judgment, it remembers a finite set of names \(L\), and the fresh names drawn in the second premise must avoid \(L\). Since the set of all names are countably infinite, excluding a finite number of them will not exclude all names. In a universal type, e.g. \(\forall X <:S.U\), \(X\) on the other hand, is presented by a de Bruijn index, and it is 0 in this case.

### Noncanonical derivation of cofinite quantification

LN-cofinite captures much intuition, but there are gaps between it and Barendregt's
convention. In particular, with \(\alpha\) conversion, we usually consider a
given derivation has fixed subderivations; namely, the shape of any subderivation is
fixed up to permutations of names. This assumption, unfortunately, is *false* with
LN-cofinite.

Let us consider the All' rule again:

In this rule, the second premise resides in a universal quantification, which is a function space in the type theory. Since it is a \(\Pi\) type, the proof term is some function. Consider the following implementation:

```
fun X (X_notin_L : X `notin` L) =>
if X == Y then (* ... a proof term specialized for Y *)
else if X == Z then (* ... a proof term specialize for Z *)
else (* ... other proof terms *)
```

In particular, proof terms when \(Y\) is picked and \(Z\) is picked do not have to look the same. This is even worse when transitivity and reflexivity are a part of the definition. For any derivation \(D\) which witnesses \(\Gamma \vdash S <: U\), following derivation is also a witness of the same conclusion:

Define this trivial expansion of derivation tree \(f(D)\). Since a set of names excluding any finite set remains countably infinite, there exists an isomorphism between this set and natural numbers. Call this isomorphism \(I\). Consider a proof \(H\) witnessing \(\Pi X, X \notin L \rightarrow \Gamma; X <: S' \vdash U_X <: U'_X\). Then one can derive a proof term, so that the forms of the subtyping derivations between return types generated by different type variables are completely “distorted”.

The idea is that \(H'\) expands an existing proof \(H\) using the expansion function above so that a pair of derivations generated by two difference choices of names are not going to be the same.

In the construction of \(H'\), I first extract an existing proof from \(H\). Then for any given valid choice of name \(X\), the isomorphism \(I\) between the set of names and natural numbers converts \(X\) to a natural number. This natural number is used to count the number of interations of applications of \(f\) to the existing proof. Since any two distinct names are not mapped to the same natural numbers, and there are infinite number of names, it is clear that sizes of derivations in \(F_{<:}^-\) are in generally not measurable by natural numbers.

On the other hand, we assume derivations are trees and therefore their depths can be measured by natural numbers. This distinction is not quite compatible with the classical view of Gentzen's style proof induction.

## Termination Checking

Before turning our attention to the problem with LN-cofinite, it is worth briefly mentioning the strategies of termination checking in Coq and Agda.

Though very often inferred, termination checking in Coq indeed requires a specified
`struct` parameter so that the termination checker knows which structure the
fixpoint decreases on. This strategy works similar in both `Prop` and `Type` in
Coq.

On the other hand, in Agda, the strategy is slightly different. In Agda, the termination checker directly checks each call site of the function being defined (or functions in the context of mutual definitions) to ensure the parameters are decreasing in a lexicographical order. Note that this method is weaker than the one used in Coq, though ultimately equivalent as it can be done manually in Coq by using some well-founded induction.

That being said, sometimes this distinction can induce significant differences of
engineering effort. The problem is much easier to surface when considering
well-founded induction of some `Prop`, e.g. well-founded induction on the depth of
the derivations.

## Interacting with `Prop`

There are various proofs in my thesis requiring well-founded induction on the number of steps in the derivation of a judgment. On paper, this is usually not a problem; as indicated earlier, we often consider a derivation is a (finite) tree and therefore there is always a maximum depth of a derivation, following Gentzen's style. However, in a previous section, I showed that this view is not well compatible with LN-cofinite, as there are subderivations living in function spaces. Note that I will be using \(F_{<:}^-\) as an example here for ease of presentation. However, there are workarounds for the problem presented here, so \(F_{<:}^-\) itself does not suffer much from it. My point is that, in more sophisticated proofs like transitivity of \(D_{<:}\), due to the intervene of complex inductive scheme, a workaround does not seem to exist, and therefore the problem about to be presented here is more significant for those calculi.

Let us consider how to measure the size of a derivation if it is defind in
`Type`. As shown previously, natural numbers are not big enough, so the following
definition is used instead.

```
Inductive card : Type :=
| base : card
| rec : card -> card
| clsr : forall L, card -> (forall x, x `notin` L -> card) -> card.
```

In this definition, `card` stands for cardinal. If the subtyping relation is defined
in `Type`, then a function can be defined to convert the derivation to `card`. One
can easily define a well-founded relation on `card`, and therefore it is possible to
do well-founded induction with LN-cofinite. However, if the subtyping relation is
defined in `Prop`, as it is usually done, then `card` does not measure the size of
a derivation anymore.

The first trouble is, one cannot define a function converting a derivation to a
`card`, because it would require elimination from `Prop` to `Type`, which is in
general not possible except when the `Prop` satisfies singleton elimination
principle.

A possible solution to this, is to use a relation to associate every derivation with a measure. That is the following equivalence:

The equivalence shows an equivalence between the original subtyping relation and an
enriched subtyping relation, denoted by \(\Gamma \vdash [ c ] S <: U\). In this
relation, \(\Gamma\), \(S\) and \(U\) represent precisely what
\(\Gamma \vdash S <: U\) means. In addition to that, a fourth parameter \(c\)
represents the cardinal of \(\Gamma \vdash S <: U\), defined by `card`
aboved. That is, \(c\) is used to remember the cardinal of a given subtyping
derivation. For completeness, the full definition of this four-place relation is
defined as follows:

Reviewing these rules, one can realize that indeed the cardinal measures a derivation. It is easy to show that

This can be proved by forgetting \(c\).

However, the other direction is not quite easy.

This is because the subtyping relation lives in `Prop` and that the All' rule uses a
function space for its subderivation. Consider a proof attempt by induction. The All'
rule generates the following antecedents in the proof context:

- \(\Gamma \vdash S_2 <: S_1\),
- \(\Pi X, X \notin L \to \Gamma ; X <: S_2 \vdash U_1 <: U_2\),
- \(\Gamma \vdash [\texttt{c}] S_2 <: S_1\),
- \(\Pi X, X \notin L \to \exists d, \Gamma ; X <: S_2 \vdash [\texttt{d}] U_1 <: U_2\),

The goal is

According to the C-All rule, this \(e\) is constructed by `clsr`, and c is
already known. The trouble is \(d\), which has type \(\Pi X, X \notin L \to
\texttt{card}\). This ought to be constructed from the fourth antecedent. However,
this cannot be done. Notice that this target type is in `Type`, while the fourth
antecedent lives in `Prop`. Due to the isolation between `Prop` and `Type`, one
cannot construct a `Type` out of `Prop` (clearly this function space does not
satisfy singleton elimination).

Looking at the form of the fourth antecedent, to derive \(\Pi X, X \notin L \to \texttt{card}\), at least the axiom of Choice seems necessary. Surely this can be done by postulating it, as Choice is compatible with Coq. If one resists postulating axioms in type theory, it has come to a deadend. Yet, if de Bruijn indices are used, there is no need to use other axioms. Namely, \(\Gamma \vdash S <: U \Rightarrow \exists c, \Gamma \vdash [ c ] S <: U\) can already be proven purely constructively with de Bruijn indices.

## Conclusion

In this blog, I showed that how LN-cofinite interacts with `Prop` and introduces the
need of some external axiom. This is quite unexpected, as it has nothing to do with
Barendregt's convention at all, but is a very technical detail of the choice of name
representation. With Gentzen's style proof induction, one treats proof derivations as
trees which directly conflicts with the nature of LN-cofinite. It means that the
choice of name representation is highly non-trivial depending on the problems and the
use of `Prop` universe needs to be carefully reviewed.

[1] | The Locally Nameless Representation, https://www.chargueraud.org/research/2009/ln/main.pdf |