Sunday, November 14, 2010

Totally nameless representation

[Note: I am unaware of any truly original content in this post; when I say "we picked this up from Dan" it doesn't imply Dan worked it out himself; when I say "we worked this out" I don't have any reason to believe we were the first. Dan Licata in a comment lists some of the references to things we generally absorbed from him.]

This post is about two things. The first topic is a style of formalization in Agda that Bernardo Toninho and I picked up on from Dan Licata. The second topic is a way of establishing termination arguments that Bernardo and I worked out and used extensively in the Agda formalization of several variants of Constructive Provability Logic, which I've mentioned before. Because I'm a bit perverse, I'll be using my own Agda Standard Library as the basis for this post; the code for this post can be found here - since it's a demo bound together with my library, technically I guess I'm supposed to keep the file at that link (though maybe not this post) in sync with the library as the library evolves.

Starting point: the simply-typed lambda calculus

This is a pretty standard Agda formalization of "Church-style" simply-typed lambda calculus terms. "Church-style," as opposed to "Curry-style," encodings only allow for the existence of well-typed terms.*
   open import Prelude

infixr 5 _⊃_
data Type : Set where
  con : String → Type
  _⊃_ : Type → Type → Type

Ctx = List Type

infixl 5 _·_
data Term (Γ : Ctx) : Type → Set where
  var : ∀{A} → A ∈ Γ → Term Γ A
  _·_ : ∀{A B} → Term Γ (A ⊃ B) → Term Γ A → Term Γ B
  Λ : ∀{A B} → Term (A :: Γ) B → Term Γ (A ⊃ B)
The only non-obvious part of this encoding is that variables are represented by "dependent de Bruijn indices" - instead of a de Bruijn index being just a natural number, it's a pointer to its type in the context. The nice part about doing it this way is that you can define a very powerful notion of "generalized weakening" that is based on the "subset" partial order on contexts (Γ ⊆ Δ is a function mapping all the indices A ∈ Γ to indices A ∈ Δ).
   _⊆_ : Ctx → Ctx → Set
_⊆_ = LIST.SET.Sub

wk : ∀{Γ Δ A} → Γ ⊆ Δ → Term Γ A → Term Δ A
wk θ (var n) = var (θ n)
wk θ (n1 · n2) = wk θ n1 · wk θ n2
wk θ (Λ n) = Λ (wk (LIST.SET.sub-cons-congr θ) n)
The nice part about this definition, and one of the things I picked up on from Dan, is that we can then define the usual weakening, exchange, and contraction properties directly by using generalized weakening and the corresponding facts about the subset relation on lists (which are defined in my standard library).
   wken : ∀{Γ A C} → Term Γ C → Term (A :: Γ) C
wken = wk LIST.SET.sub-wken

exch : ∀{Γ A B C} → Term (A :: B :: Γ) C → Term (B :: A :: Γ) C
exch = wk LIST.SET.sub-exch

cntr : ∀{Γ A C} → Term (A :: A :: Γ) C → Term (A :: Γ) C
cntr = wk LIST.SET.sub-cntr
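The same pattern can be sketched without dependent types at all. The following Python sketch (my own illustration, not part of the post's library; all names are made up) treats a renaming as a plain function on de Bruijn indices, shifted when it passes under a binder; weakening, exchange, and contraction then fall out as instances, just as they do above.

```python
# Untyped sketch of generalized weakening (illustration only).
# Terms are tagged tuples with plain de Bruijn indices; a "renaming"
# is any function on indices.

def var(n): return ("var", n)
def app(f, a): return ("app", f, a)
def lam(b): return ("lam", b)

def wk(theta, t):
    """Apply the index renaming theta throughout t."""
    tag = t[0]
    if tag == "var":
        return var(theta(t[1]))
    if tag == "app":
        return app(wk(theta, t[1]), wk(theta, t[2]))
    # Under a binder, index 0 is the new bound variable, so theta is
    # shifted past it (the analogue of sub-cons-congr).
    return lam(wk(lambda n: 0 if n == 0 else theta(n - 1) + 1, t[1]))

# The structural lemmas are one-liners, as in the post:
wken = lambda t: wk(lambda n: n + 1, t)                   # weakening
exch = lambda t: wk(lambda n: {0: 1, 1: 0}.get(n, n), t)  # exchange
cntr = lambda t: wk(lambda n: max(n - 1, 0), t)           # contraction
```

The payoff is the same as in the Agda version: one traversal (wk) serves every structural lemma.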
With that out of the way, we can talk about substitution, both the "obvious way" and the metric-indexed way Bernardo and I have used extensively.

Straightforward substitution

The obvious (for some value of obvious) and direct definition of substitution involves 1) generalizing the induction hypothesis to allow for additions to the context (tm-subst) and 2) writing a variable substitution lemma that handles the case where the substituted-into term is a variable, in which we either perform the substitution, lower the de Bruijn index, or leave the de Bruijn index alone (var-subst). The main theorem is inductive on the structure of the second term.
   subst : ∀{Γ A C} → Term Γ A → Term (A :: Γ) C → Term Γ C
subst = tm-subst [] _
 where
  var-subst : ∀{A C} (Γ Γ' : Ctx)
    → Term Γ' A
    → C ∈ (Γ ++ [ A ] ++ Γ')
    → Term (Γ ++ Γ') C
  var-subst [] Γ' n1 Z = n1
  var-subst [] Γ' n1 (S n) = var n
  var-subst (_ :: Γ) Γ' n1 Z = var Z
  var-subst (_ :: Γ) Γ' n1 (S n) = wken (var-subst Γ Γ' n1 n)

  tm-subst : ∀{A C} (Γ Γ' : Ctx)
    → Term Γ' A
    → Term (Γ ++ [ A ] ++ Γ') C
    → Term (Γ ++ Γ') C
  tm-subst Γ Γ' n1 (var n) = var-subst Γ Γ' n1 n
  tm-subst Γ Γ' n1 (n · n') = tm-subst Γ Γ' n1 n · tm-subst Γ Γ' n1 n'
  tm-subst Γ Γ' n1 (Λ n) = Λ (tm-subst (_ :: Γ) Γ' n1 n)
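Here is the shape of that generalized induction hypothesis in an untyped Python sketch (again my own illustration with made-up names): subst_at k substitutes below k enclosing binders, playing the role that the split context Γ ++ [ A ] ++ Γ' plays in tm-subst.

```python
# Untyped sketch of the context-splitting substitution (illustration
# only). subst_at(k, s, t) substitutes s for index k in t, where k
# counts the binders we have passed under.

def var(n): return ("var", n)
def app(f, a): return ("app", f, a)
def lam(b): return ("lam", b)

def shift(k, t, depth=0):
    """Add k to every free index of t (iterated weakening)."""
    tag = t[0]
    if tag == "var":
        return var(t[1] + k) if t[1] >= depth else t
    if tag == "app":
        return app(shift(k, t[1], depth), shift(k, t[2], depth))
    return lam(shift(k, t[1], depth + 1))

def subst_at(k, s, t):
    tag = t[0]
    if tag == "var":
        n = t[1]
        if n == k:
            return shift(k, s)            # perform the substitution
        return var(n - 1) if n > k else t  # lower or keep the index
    if tag == "app":
        return app(subst_at(k, s, t[1]), subst_at(k, s, t[2]))
    return lam(subst_at(k + 1, s, t[1]))   # generalize under the binder

def subst(s, t):
    """The main theorem: substitute for the outermost variable."""
    return subst_at(0, s, t)
```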
To motivate the next step, I'll point out that the only reason we had to generalize the induction hypothesis is because of the lambda case - if we try to prove the theorem directly, we find ourselves in the position of having a Term Γ A and a Term (B :: A :: Γ) C; we can't apply the induction hypothesis directly on these two terms. We can apply wken to the first subterm and apply exch to the second subterm and call subst recursively, but now we can't establish termination on the structure of the second term, since we've passed it to the exch function, which changed its structure by replacing Zs with (S Z)s and vice versa.

Setting up a metric

The alternate solution I intend to present involves setting up a metric; this metric expresses the "shape" of a term, disregarding all binding. This is a three-step process: first we define a generic "tree" piece of data that can capture the shape of a term (disregarding binding), and second we define a copy of the Term datatype, called MTerm, that has a Metric as one of its indices.
   data Metric : Set where
  ○ : Metric
  _●_ : Metric → Metric → Metric
  > : Metric → Metric

data MTerm (Γ : Ctx) : Type → Metric → Set where
  var : ∀{A} → A ∈ Γ → MTerm Γ A ○
  _·_ : ∀{A B m m'} → MTerm Γ (A ⊃ B) m → MTerm Γ A m' → MTerm Γ B (m ● m')
  Λ : ∀{A B m} → MTerm (A :: Γ) B m → MTerm Γ (A ⊃ B) (> m)
The third step is to show that we can freely get into and out of the metric; potentially we should also show that these two actions are an isomorphism, but I haven't run across the need to actually prove that. It really turns out that Agda does most of the work here...
   tm→ : ∀{Γ A m} → MTerm Γ A m → Term Γ A
tm→ (var n) = var n
tm→ (n1 · n2) = tm→ n1 · tm→ n2
tm→ (Λ n) = Λ (tm→ n)

→tm : ∀{Γ A} → Term Γ A → ∃ λ m → MTerm Γ A m
→tm (var n) = , var n
→tm (n1 · n2) = , snd (→tm n1) · snd (→tm n2)
→tm (Λ n) = , Λ (snd (→tm n))
Now that we have our metric, we can prove a stronger and more useful version of the generalized weakening lemma: not only can we apply generalized weakening to terms, but the resulting term has exactly the same shape (as shown by the fact that the same metric "m" appears as an index both on the function's argument and on its conclusion).
   wkM : ∀{Γ Δ A m} → Γ ⊆ Δ → MTerm Γ A m → MTerm Δ A m
wkM θ (var n) = var (θ n)
wkM θ (n · n') = wkM θ n · wkM θ n'
wkM θ (Λ n) = Λ (wkM (LIST.SET.sub-cons-congr θ) n)
We don't need to prove weakening twice; our previous weakening lemma is just a simple consequence of this new, stronger one:
   wk : ∀{Γ Δ A} → Γ ⊆ Δ → Term Γ A → Term Δ A
wk θ n = tm→ (wkM θ (snd (→tm n)))


Substitution with the metric

With the metric in tow, substitution is straightforward. Termination is established by induction on the metric of the second term. Note the use of both wken (weakening outside of the metric) and exchM (exchange inside the metric) in the lambda case.
   subst : ∀{Γ A C} → Term Γ A → Term (A :: Γ) C → Term Γ C
subst n1 n2 = substM n1 (snd (→tm n2))
 where
  exchM : ∀{Γ A B C m} → MTerm (A :: B :: Γ) C m → MTerm (B :: A :: Γ) C m
  exchM = wkM LIST.SET.sub-exch

  substM : ∀{Γ A C m} → Term Γ A → MTerm (A :: Γ) C m → Term Γ C
  substM n1 (var Z) = n1
  substM n1 (var (S n)) = var n
  substM n1 (n · n') = substM n1 n · substM n1 n'
  substM n1 (Λ n) = Λ (substM (wken n1) (exchM n))
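An untyped Python sketch of the same recursion pattern (my own illustration; names are made up) may help: compute the shape of the term once, then recurse on the shape while the term underneath it gets exchanged. Python does no termination checking, so the point is only to make visible that the metric stays in lockstep with the term.

```python
# Untyped sketch of metric-indexed substitution (illustration only).
# shape(t) forgets all indices, keeping just the tree structure;
# renamings like exch preserve it, so recursing on the shape is safe.

def var(n): return ("var", n)
def app(f, a): return ("app", f, a)
def lam(b): return ("lam", b)

def wk(theta, t):
    tag = t[0]
    if tag == "var":
        return var(theta(t[1]))
    if tag == "app":
        return app(wk(theta, t[1]), wk(theta, t[2]))
    return lam(wk(lambda n: 0 if n == 0 else theta(n - 1) + 1, t[1]))

wken = lambda t: wk(lambda n: n + 1, t)
exch = lambda t: wk(lambda n: {0: 1, 1: 0}.get(n, n), t)

def shape(t):
    """The metric: ("leaf",) / ("node", ...) / ("bind", ...)."""
    tag = t[0]
    if tag == "var":
        return ("leaf",)
    if tag == "app":
        return ("node", shape(t[1]), shape(t[2]))
    return ("bind", shape(t[1]))

def subst(s, t):
    return substM(s, t, shape(t))

def substM(s, t, m):
    tag = t[0]
    if tag == "var":
        return s if t[1] == 0 else var(t[1] - 1)
    if tag == "app":
        return app(substM(s, t[1], m[1]), substM(s, t[2], m[2]))
    # exch changes the term but not its shape, so m[1] still measures
    # the exchanged body: the recursion is on a strictly smaller metric.
    return lam(substM(wken(s), exch(t[1]), m[1]))
```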
In this example, it's not clear that we've saved much work: setting up the metric was a lot of boilerplate, though it is straightforward; the benefit that we gained in terms of proof simplicity was primarily an artifact of Agda's pattern matching: either a variable was the first variable in the context (and therefore represented by the de Bruijn index Z), or it was later in the context. So this example isn't the world's best argument for this style, but in the more complex logics that Bernardo and I dealt with, we found that this technique seemed, of all the approaches we could think of, to be the one least likely to fail on us.

Comments

Dan's said on a couple of occasions that he thinks there are less painful ways of doing this sort of thing, existing Agda techniques that avoid the annoying duplication of data types. My understanding is also that there is a not-totally-integrated notion of "sized types" in Agda that could maybe handle this sort of thing, but that the way it works is by defining metrics that are only natural numbers. It seems like a step backwards, somehow, if we go from structural induction to natural number induction. Furthermore, sized types based on natural numbers would almost certainly fail to handle the vaguely insane termination arguments that come up in constructive provability logic.

Totally nameless representations?

The germ of this post comes from a conversation with Chris Casinghino at UPenn back in May. He was working on extending this sort of simple representation in a way that was similarly-insane but mostly unrelated to constructive provability logic. Our problems were different; he placed the blame for some of the difficulties he encountered on the "silliness of using all de Bruijn indices instead of a locally nameless encoding," which was a response that I initially found puzzling. I've since decided that my puzzlement was the result of the fact that he was thinking of the specific thing he was doing in terms of programming languages, and I was thinking about the thing I was doing (constructive provability logic) in terms of logic - my guess is that we're both so used to working on both sides of the Curry-Howard correspondence that it wouldn't have occurred to us that this might make a difference!

From a programming languages perspective, we have programs and typing derivations that show programs to be well typed; the locally nameless encoding gives you the ability to apply the exchange or weakening lemma and get a new typing derivation that says that the same term is still well typed. That wasn't the case for our original "Church-style" lambda calculus terms. In my current view, that's because a "Church-style" lambda calculus term is more like the typing derivation than it is like a term. It's a typing derivation without the term it's typing; in other words, it's a proof in natural deduction form! And what Bernardo and I were using was something that, for our purposes, was just as good as a derivation that the same term was well typed in the exchanged context - we got that the same metric was associated with the exchanged proof term.

In other words, it seems that, for the purposes of termination arguments, the metric serves the same purpose a locally nameless proof term does. Certainly it avoids the "silliness of using all de Bruijn indices," as it uses none whatsoever, hence the mostly-joking title of this post. However, it's quite possible I'm off base; I'd be interested in what people more versed in locally nameless representation thought. I should add that I certainly think it is possible to do an intrinsically typed locally nameless representation; you'd just have to have (at minimum) the free variables be aware of their type. If you were interested primarily in representing a programming language that might even be the right thing to do; but the totally nameless metrics seem to be an easier fit for developments where the primary interest is in the logic side of the correspondence.

* Shameless plug for my advisor Frank Pfenning's excellent essay, "Church and Curry, Combining Intrinsic and Extrinsic Typing" which talks about properties of Church-style versus Curry-style encodings and relates them to subtyping and work by William Lovas.

P.S. This blog post has been about Agda; the first time I used this kind of tree structural metric on terms was in Twelf, when (on a challenge from John Boyland) I showed a bijection between HOAS and de Bruijn form representations of the terms of an untyped (or, if you prefer, "Curry-style") lambda calculus.

5 comments:

  1. Nice post!

    I'm curious what you mean about being able to do an intrinsically typed locally nameless: do you mean just intrinsically typed, or also intrinsically scoped? The problem with intrinsically scoped locally nameless is that you need proofs that the names are in the context---at which point you're back to de Bruijn indices! You really need some proof irrelevance for it not to degenerate to de Bruijn.

    Regarding "Dan's said on a couple of occasions that he thinks there are less painful ways of doing this sort of thing, existing Agda techniques that avoid the annoying duplication of data types." I think what I had in mind is defining
    Metric : (Γ : Ctx)(A : Tp) → Term Γ A → Set
    recursively: for each constructor, you have recursive Metrics wherever you need recursive calls. Then you prove that Metric e is the same as Metric (exchange e), etc. I've used this once, but I'm not sure what the tradeoffs are in your setting.

    Also, the right citations for this representation are Bellegarde and Hook, 94; Altenkirch and Reus, 99; Bird and Paterson, 99---see the biblio of basically any paper I've written in the past 3 years for the deets.

  2. I just mean intrinsically typed; your data type would have to have a "bvar : ∀{A} → A ∈ Γ → Term Γ A" as well as a "fvar : ∀{A} → A → Name → Term Γ A". I can imagine writing "open" and "close" functions for such a type; I don't know how you'd get the right induction principles out of that, though.

  3. Fun post! As far as sized types go, it just so happens that they are strictly -more- powerful than structural induction: the latter only allows recursive calls on subterms, whereas size based termination allows recursive calls on -all- smaller terms (as far as the type system can tell that they are smaller). Just to clear things up, the notion of size is actually the abstraction of an ordinal, to allow recursive calls on inductive terms of higher type.

    Your first presentation of substitution is essentially the algebraic presentation commonly referred to as the "presheaf approach" which is itself related to the "monadic approach". In these cases, weakening is the action of the term functor on context renamings (exactly your wk function), and substitution is actually -more- general: all variables are substituted in one go. I wonder if that more general definition allows for an even simpler definition of tm-subst, which would suggest that this is the natural presentation of substitution in this context...

  4. Do sized types in Agda scale past omega? More generally: was your comment about sized types generally, or about their current state of implementation in Agda (which I am unclear about)?

    As for what I would call simultaneous substitutions: that works too; I've added the example to the bottom of the code example for this post. It uses some library functions; "LIST.All P xs" just says "P holds for every x in xs", which translates to "there is a term in the new context for every type in the old context." Indeed, the result is slicker than either of the approaches I originally presented.

    Our experience with the metric technique I demonstrated here is primarily cut elimination arguments, not substitution - substitution was just used as an example. However, if substitution is what you're actually hoping to get at, then I think the presheaf approach/simultaneous substitution definition looks pretty good. Thanks thecod!
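    (For concreteness, here is an untyped Python sketch of simultaneous substitution, my own illustration rather than the library code mentioned above: an environment maps every free index to a term, and all variables are replaced in one pass.)

```python
# Untyped sketch of simultaneous substitution (illustration only).
# An environment env is a function from de Bruijn indices to terms.

def var(n): return ("var", n)
def app(f, a): return ("app", f, a)
def lam(b): return ("lam", b)

def wken(t, depth=0):
    """Shift every free index of t up by one."""
    tag = t[0]
    if tag == "var":
        return var(t[1] + 1) if t[1] >= depth else t
    if tag == "app":
        return app(wken(t[1], depth), wken(t[2], depth))
    return lam(wken(t[1], depth + 1))

def subst_all(env, t):
    """Replace every free variable of t according to env, in one go."""
    tag = t[0]
    if tag == "var":
        return env(t[1])
    if tag == "app":
        return app(subst_all(env, t[1]), subst_all(env, t[2]))
    # Under a binder: index 0 stays bound, and the rest of env is
    # weakened past the new binder.
    return lam(subst_all(lambda n: var(0) if n == 0 else wken(env(n - 1)), t[1]))

def subst(s, t):
    """Single substitution as the special case of subst_all."""
    return subst_all(lambda n: s if n == 0 else var(n - 1), t)
```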

  5. I'm not sure about what exactly has been implemented in Agda. I believe Andreas Abel has done most of the work himself (though I could be wrong), so there is a high chance that the system is similar to that of his PhD thesis (which describes Fomega actually but the ideas are similar, you can also see the work on MiniAgda).

    The language of sizes only contains the symbols $ (for successor) and # (for infinity), with 0 and max depending on the version. This means that you cannot explicitly write a size that is larger than omega. However, and this is the important part, you can express structural decrease (and other kinds of decrease) on inductive types as the -difference- of sizes is always far less than omega (usually less than 3). You can have a look at http://perso.ens-lyon.fr/colin.riba/papers/tutorial-tbt.pdf for some more explanations.

    Thanks for the update!
