CFML: Characteristic Formulae for ML

CFML can be used to verify Caml programs using the Coq proof assistant. It is based on Characteristic Formulae, which I have developed during my thesis. CFML consists of:
  • a generator that parses Caml code and produces characteristic formulae expressed as Coq axioms (the generator itself is implemented in Caml),
  • a Coq library that provides tactics for manipulating characteristic formulae interactively.

Download

  • The stable source files can be obtained by running:
    svn checkout svn://scm.gforge.inria.fr/svn/cfml/branches/v1 cfml
  • The more recent developments can be found in the trunk folder.
  • The compilation requires Coq v8.4 and OCaml (>= 10.1).
  • The developments rely on my Coq library TLC, which comes with CFML.
  • All the files are distributed under the GNU-LGPL license.
  • The documentation contains the details for the installation procedure and an overview of the tactics available.

Related publications

Characteristic Formulae for the Verification of Imperative Programs
To appear in HOSC, October 2012 (submission date)
Characteristic Formulae for the Verification of Imperative Programs
International Conference on Functional Programming (ICFP), September 2011
Program Verification Through Characteristic Formulae
International Conference on Functional Programming (ICFP), September 2010
Characteristic Formulae for Mechanized Program Verification
PhD thesis, December 2010

Verification of imperative programs, including cost analysis

Formalization of the correctness and cost analysis of bootstrapped chunked sequences.
This development is presented in slides from the talk given at the IHP.
Chunked sequences
Bootstrapped chunked sequences

Verification of imperative programs

Dijkstra's shortest path algorithm
Sparse arrays (from the Vacid challenge)
Mutable list: length, destructive append, cps-append
Counter generator, and call to iter on a list of counters
Union-Find (without path compression nor ranks)
Factorial with recursion, for loop, and while loop
The composition function
The swap function for two references
Landin's knot (recursion through the store)

Data structures from Okasaki's book

Batched queue (page 43)
Bankers queue (page 65)
Physicists queue (page 73)
Real-time queue (page 83)
Implicit queue (page 174)
Bootstrapped queue (page 149)
Hood-Melville queue (page 105)
Leftist heap (page 20)
Pairing heap (page 54)
Lazy pairing heap (page 80)
Splay heap (page 50)
Binominal heap (page 24)
Unbalanced set (page 14)
Red-black set (page 28)
Bottom-up merge sort (page 77)
Catenable lists (page 156)
Binary random-access lists (page 123)

Signatures from Okasaki's book

Queues (page 42)
Double-ended queues (page 45)
Ordered types (page 14)
Finite sets (page 12)
Heaps (page 18)
Sortable collections (page 74)
Random-access lists (page 120)
Catenable lists (page 153)

CFML library

Formalization of heaps and separation logic tactics
Definitions and lemmas related to specification
Notation for displaying formulae and specifications
Specification of primitive functions
Tactics for manipulating characteristic formulae
Bundle exporting all the library

Models

Realization of AppReturns based on the deep embedding of a pure language
Proof of soundness and completness of characteristic formulae for a basic imperative language

Program verification using characteristic formulae

In my thesis, I have developed a new approach to program verification based on the notion of characteristic formula. This approach is implemented in a tool, called CFML, which can be used to establish in Coq the full functional correctness of arbitrarily complex Caml programs.

The characteristic formula of a piece of code is a logical formula that describes the semantics of this code (it is a form of strongest post-condition). This characteristic formula satisfies the following property:

  • it is built compositionally and automatically from the source code alone,
  • it has a size linear in that of the source code,
  • it can be displayed in the logic in a way that closely resembles source code (using Coq notation system),
  • it can be manipulated through a small set of specialized tactics that make it unnecessary to know how the formula is constructed
  • it is not just a sound, but also a complete description of the code semantics, meaning that characteristic formulae do not restrict in any way the ability to reason about the code;
  • it supports modular verification, and it integrates the frame rule, which enables local reasoning.

The concept of characteristic formula

The general concept of characteristic formula is not new: it originates in work on process calculi from the 80's (Park, 1981). In this setting, every syntactic process definition is mapped to a formula of Hennessy-Milner's logic. This mapping is such that two syntactic processes are behaviorally equivalent if and only if their associated formulae are logically equivalent. More recently, these results on process calculi were adapted to lambda-calculi by Honda, Berger and Yoshida, who derived a sound and complete Hoare logic for PCF (2006). I have turned the concept of characteristic formula into an effective approach to program verification my building formulae of linear size, expressed in a standard higher-order logic, associated with a pretty-printer, and including support for local reasoning.

It is useful to understand what the characteristic formula approach is not:

  • it is not a verification condition generator (VCG): the source code needs not be annotated with invariants; instead, invariants are provided in interactive proofs;
  • it is not a deep embedding: there is no inductive data type used to represent code syntax, so we avoid all technical issues related to the representation and manipulation of program syntax;
  • it is not a shallow embedding: Caml functions are not represented as Coq functions, so there is no need to resort to monadic programming for avoiding the mismatch between program functions, which can be partial, and logical functions, which must be total;
  • it is not a dynamic logic: characteristic formulae are not expressed in ad-hoc logic but instead in a standard higher-order logic, making it possible to leverage on existing theorem provers.

Earlier work: the deep embedding approach

Before developing characteristic formulae, I worked on a deep embedding approach to reasoning about Caml programs using Coq. I showed that, by introducing the right definition, notation and tactics, it is actually feasible to scale up the deep embedding approach to a realistic programming language, more precisely to the pure fragment of Caml.
Interactive Verification of Call-by-Value Functional Programs Through a Deep Embedding
Report, March 2009
Nevertheless, specifications were polluted with the need to relate embedded Caml values with the corresponding Coq values, and proofs involved heavy custom tactics whose use resulted in big proof terms. Characteristic formulae appeared as a way to avoid all of the issues of the deep embedding, while retaining all of the expressivity of this approach.