TP 1 - Denotational semantics

The goal of this session is to program an interpreter to compute the denotational semantics of a simple language. We will use OCaml.

Language

Start by downloading the package here.

The package contains:

a parser for the language, programmed in OCamlLex and Menhir: lexer.mll and parser.mly;
the type of abstract syntax trees output by the parser, defined in abstractSyntax.ml;
a pretty-printer, to print back an abstract syntax tree into the original language, in abstractSyntaxPrinter.ml;
in interpreter.ml, a simple driver that takes a file name passed as argument, parses it, and prints it back. Typing make should compile an executable that runs the simple driver.

Make sure that you are using OCaml 5.1 or later, and that dune, menhir and zarith are installed.

You can build the project using dune build, and you can test your interpreter using dune exec -- interpreter/interpreter.exe examples/loop.c.

Syntax

The language is a very simple “curly brackets” C-like language. A program is composed of a sequence of statements of the form:

assignment: var = expr;
tests: if (expr) stmt-1; or if (expr) stmt-1; else stmt-2;
while loops: while (expr) stmt;
blocks in curly brackets: { stmt-1; ... stmt-n; }

Non-standard statements include:

assertions of boolean expressions: assert (expr);
variable printing: print (var-1,...,var-n);
failure: halt;, which stops the program immediately

Expressions include:

integer arithmetic operators: +, -, *, /, % (modulo);
boolean operators: && (and), || (or), ! (negation);
integer comparisons <, <=, >, >=;
equality == and disequality !=, that can be used to compare either two integers or two boolean values;
constants, including integers, and the boolean constants true and false;
the special expression rand(l,h) that denotes the non-deterministic interval of integers between the constant l and the constant h.

The operators have their usual precedence, and you can group expressions using parentheses.

You can use /* ... */ and // comments.

Unlike C, variables do not need to be declared; they start existing when first assigned a value, and keep existing until the end of the program. There are no local variables, and no functions.

Deterministic semantics

We first consider the deterministic subset of the language, i.e., we ignore the Erand expression node for now.

Write an interpreter that executes the program by induction on the syntax of statements and expressions; it returns either an environment mapping variables to values, or an error.

You can use the following steps:

Define the type value of values. It should contain integers and booleans. Also define the derived type value_err which represents either a correct value, of type value, or an error. You can use a string representation for errors, which will give the user some information on the location and cause of the error. The value_err type will be useful to propagate errors during the evaluation of expressions.
Define a type env for environments. You can use the Map functor from the standard OCaml library to represent mappings from variables to (non-erroneous) values. Likewise, the env_err type shall denote either an environment or an error.
Write an expression evaluation function eval_expr: env -> expr ext -> value_err by induction on the syntax of expressions.
Write a statement evaluation function eval_stmt: env_err -> stmt ext -> env_err. When should the function return an error environment?
Test your interpreter on the programs from the examples directory. Can you detect infinite loops in loop.c, loop2.c, and loop3.c? Under which condition does your interpreter terminate?
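
The steps above can be sketched as follows. This is only a minimal illustration under several assumptions, not the expected solution: the expr and stmt types below are hypothetical stand-ins for the ones defined in abstractSyntax.ml (they cover only a few constructs and carry no location information), the constructor names are invented for the example, and native int replaces the unbounded integers of the actual language.

```ocaml
(* Hypothetical miniature AST; the real one lives in abstractSyntax.ml. *)
type expr =
  | Int of int
  | Bool of bool
  | Var of string
  | Add of expr * expr
  | Div of expr * expr
  | Lt of expr * expr

type stmt =
  | Assign of string * expr
  | If of expr * stmt * stmt
  | While of expr * stmt
  | Block of stmt list

type value = VInt of int | VBool of bool
type value_err = (value, string) result   (* a value, or an error message *)

module Env = Map.Make (String)
type env = value Env.t
type env_err = (env, string) result

let rec eval_expr (env : env) (e : expr) : value_err =
  (* Evaluate both operands, then apply [f] if both are integers. *)
  let int_op f e1 e2 =
    match eval_expr env e1, eval_expr env e2 with
    | Error m, _ | _, Error m -> Error m
    | Ok (VInt a), Ok (VInt b) -> f a b
    | _ -> Error "type error: integer expected"
  in
  match e with
  | Int n -> Ok (VInt n)
  | Bool b -> Ok (VBool b)
  | Var x ->
    (match Env.find_opt x env with
     | Some v -> Ok v
     | None -> Error ("undefined variable " ^ x))
  | Add (e1, e2) -> int_op (fun a b -> Ok (VInt (a + b))) e1 e2
  | Div (e1, e2) ->
    int_op
      (fun a b -> if b = 0 then Error "division by zero" else Ok (VInt (a / b)))
      e1 e2
  | Lt (e1, e2) -> int_op (fun a b -> Ok (VBool (a < b))) e1 e2

let rec eval_stmt (r : env_err) (s : stmt) : env_err =
  match r with
  | Error _ -> r   (* an incoming error propagates unchanged *)
  | Ok env ->
    (* Evaluate a condition, then continue with [k] on its boolean value. *)
    let cond e k =
      match eval_expr env e with
      | Ok (VBool b) -> k b
      | Ok (VInt _) -> Error "type error: boolean expected"
      | Error m -> Error m
    in
    match s with
    | Assign (x, e) ->
      (match eval_expr env e with
       | Ok v -> Ok (Env.add x v env)
       | Error m -> Error m)
    | If (e, s1, s2) -> cond e (fun b -> eval_stmt r (if b then s1 else s2))
    | While (e, body) ->
      (* Unfold the loop one iteration at a time; this diverges
         whenever the interpreted program does. *)
      cond e (fun b -> if b then eval_stmt (eval_stmt r body) s else r)
    | Block ss -> List.fold_left eval_stmt r ss

(* x = 0; while (x < 3) x = x + 1; *)
let prog =
  Block
    [ Assign ("x", Int 0);
      While (Lt (Var "x", Int 3), Assign ("x", Add (Var "x", Int 1))) ]

let result = eval_stmt (Ok Env.empty) prog
```

Taking an env_err as input, rather than an env, lets eval_stmt be folded directly over the statements of a block: once an error occurs, every subsequent statement simply passes it along.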

Non-deterministic semantics

We now consider the full language including the non-deterministic expression node rand(l,h).

Write an interpreter for this language that outputs the set of all possible environments at the end of the program as well as the set of all errors that can be encountered.

The structure of the interpreter will be similar to the one in the previous question. You can use the following steps:

Define a type value_err_set to represent sets of value_err objects, i.e., sets containing values and errors. You can use OCaml’s standard Set functor.
Define a type env_err_set to represent sets of environments and errors.
Program a function eval_expr: env -> expr ext -> value_err_set to evaluate an expression in a single environment and return the set of its possible values (and errors) in that environment. When encountering a unary node, the operator must be applied to each possible value of its argument expression; you can use iterators such as fold. Binary nodes require nested folds.
Program a filter function filter: env_err_set -> expr ext -> env_err_set that returns the subset of its env_err_set argument that can satisfy the expression, enriched with the errors encountered during the expression evaluation. This function will be useful to model loops, tests and assertions. Remember that an environment can satisfy both an expression and its negation!
Program a generic fixpoint operator fix: ('a -> 'a) -> 'a -> 'a that iterates a function from a base element until it reaches a fixpoint. Then use it in the semantics of loops.
Test your interpreter on the examples directory, including non-deterministic programs such as gcd_nd.c and loop4.c.
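
Two of the ingredients above, the nested folds for binary nodes and the fixpoint iteration, can be sketched in isolation. The code below is an illustration under assumptions, not the expected solution: it uses native int instead of unbounded integers, and the names VSet, rand, lift2, add, fix_eq, ISet and step are hypothetical. Note that fix_eq takes an explicit equality test, unlike the fix signature given above; this is a deliberate variation, because OCaml's structural equality is not a reliable test on values built with the Set functor (two equal sets may be represented by differently shaped trees).

```ocaml
type value = VInt of int | VBool of bool
type value_err = (value, string) result

(* Sets of value_err; the polymorphic comparison is enough for these
   first-order types. *)
module VSet = Set.Make (struct
  type t = value_err
  let compare = Stdlib.compare
end)
type value_err_set = VSet.t

(* rand(l,h): every integer in the interval [l,h] (empty if l > h). *)
let rand (l : int) (h : int) : value_err_set =
  let rec go i acc =
    if i > h then acc else go (i + 1) (VSet.add (Ok (VInt i)) acc)
  in
  go l VSet.empty

(* Lift a binary operator to sets with two nested folds: [op] is
   applied to every pair of possible operand values. *)
let lift2 (op : value_err -> value_err -> value_err)
    (s1 : value_err_set) (s2 : value_err_set) : value_err_set =
  VSet.fold
    (fun v1 acc -> VSet.fold (fun v2 acc -> VSet.add (op v1 v2) acc) s2 acc)
    s1 VSet.empty

let add (v1 : value_err) (v2 : value_err) : value_err =
  match v1, v2 with
  | Error m, _ | _, Error m -> Error m
  | Ok (VInt a), Ok (VInt b) -> Ok (VInt (a + b))
  | _ -> Error "type error: integer expected"

(* Iterate [f] from [x] until the result stops changing according to
   [equal]. Does not terminate if no fixpoint is ever reached. *)
let rec fix_eq (equal : 'a -> 'a -> bool) (f : 'a -> 'a) (x : 'a) : 'a =
  let y = f x in
  if equal x y then x else fix_eq equal f y

(* rand(0,1) + rand(10,11) can evaluate to 10, 11 or 12. *)
let sums = lift2 add (rand 0 1) (rand 10 11)

(* Toy fixpoint: start from the empty set, always add 0, and add n+1
   for every element n < 3. The iteration stabilizes on {0,1,2,3}. *)
module ISet = Set.Make (Int)

let step (s : ISet.t) : ISet.t =
  ISet.fold
    (fun n acc -> if n < 3 then ISet.add (n + 1) acc else acc)
    s (ISet.add 0 s)

let reach = fix_eq ISet.equal step ISet.empty
```

For loops, the iterated function would typically accumulate the environments reachable at the loop head, in the same way that step accumulates integers here; the iteration terminates only if that set stops growing.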

Extensions

Here are a few possible extensions you can implement in the language:

Uninitialized. This extension adds a notion of “uninitialized” value. If a variable is used before it is assigned a value, the execution continues by returning the “uninitialized” value. The “uninitialized” value is propagated by all operations (“uninitialized”+1 equals “uninitialized”).
Machine integers. This extension changes the semantics of the integer data-type so that 32-bit machine integers are used instead of unbounded integers. You can design a version where overflows result in a wrap-around, following two’s complement arithmetic, or a version where overflows cause run-time errors.
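
As a rough illustration of both extensions, the sketch below shows a wrap-around and a checked variant of 32-bit arithmetic, together with an addition that propagates an “uninitialized” value. It relies on assumptions: values are stored in OCaml's native int, which must be wider than 32 bits (true on 64-bit platforms), whereas an interpreter based on unbounded integers would use the corresponding operations of its integer library; the names min_int32, max_int32, wrap32, check32, VUninit and add are hypothetical.

```ocaml
(* Bounds of 32-bit two's complement integers. *)
let min_int32 = -0x8000_0000
let max_int32 = 0x7FFF_FFFF

(* Wrap-around version: reduce modulo 2^32 into [-2^31, 2^31 - 1]. *)
let wrap32 (x : int) : int =
  ((x - min_int32) land 0xFFFF_FFFF) + min_int32

(* Checked version: an out-of-range result is a run-time error. *)
let check32 (x : int) : (int, string) result =
  if x < min_int32 || x > max_int32 then Error "integer overflow" else Ok x

(* Values extended with an "uninitialized" element. *)
type value = VInt of int | VBool of bool | VUninit

(* Addition propagates "uninitialized" and wraps integer results. *)
let add (v1 : value) (v2 : value) : (value, string) result =
  match v1, v2 with
  | VUninit, _ | _, VUninit -> Ok VUninit
  | VInt a, VInt b -> Ok (VInt (wrap32 (a + b)))
  | _ -> Error "type error: integer expected"
```

With the wrap-around version, add (VInt max_int32) (VInt 1) yields VInt min_int32; replacing wrap32 by check32 in add would turn the same computation into an error.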