The point of the graph transformation is that you don’t need to handle the
language defined in abstractSyntax.ml
, which is complex. The graph in
controlFlowGraph.ml
is much simpler, it is comprised of:
node
objects, that denote (roughly) program locations:
type node = {
node_id: int; (* unique identifier *)
node_pos: position; (* position in the source *)
node_out: arc list; (* arcs with this node as source *)
node_in: arc list; (* arcs with this node as destination *)
}
arc
objects, that link a source node to a destination node and are labelled with an instruction:
type arc = {
arc_id: int; (* unique identifier *)
arc_src: node; (* source node *)
arc_dst: node; (* destination node *)
arc_inst: inst; (* instruction *)
}
To ease graph traversal, as standard, each node contains a list of out-going
arcs and in-coming arcs. Each arc and each node in the graph is given a unique
integer identifier. For ease of programming, Set
, Map
, and Hashtbl
collection
modules over these objects are also provided.
Each arc is labelled with an instruction to execute when going from its source node to its destination node:
-
CFG_skip
does nothing. It is useful to model jumps or returns without a value. -
CFG_assign (var,expr)
assigns the value of the integer expressionexpr
to the variablevar
. The expressions in the graph are slightly simpler than those in the abstract syntax tree because function calls have been extracted to separate instructions. In place of the call, we put in the expression a variable where the function’s return value has been stored by the function call. -
CFG_guard expr
takes the transition to the destination node only if the boolean expressionexpr
evaluates to true. Generally, a single node will feature two out-going arcs with aCFG_guard
instruction: one withCFG_guard expr
, and the other one withCFG_guard !expr
, corresponding to the two branches ofif
conditionals or loops. Note that our expressions feature non-determinism (both as random integer intervals and as random boolean choices); hence, it is possible for a program environment to be propagated along both theexpr
and the!expr
branches. -
CFG_assert expr
is similar toCFG_guard expr
, but executing the instruction causes a program execution error ifexpr
is nottrue
: i.e., the analyzer displays an assertion violation message and continues the analysis propagating only the memory environments that satisfy the condition. -
CFG_call f
indicates a call to the function f. Note that all the information about the function arguments and return values are omitted in the instruction. The graph translation takes care of generatingCFG_assign
instructions storing the actual arguments into the formal arguments prior to theCFG_call
instruction. The function itself, if it returns a value, takes care of storing this value into a special variable before returning.
Each program variable is associated a var structure:
type var = {
var_id: int; (* unique variable identifier *)
var_name: id; (* original name, in the program *)
var_type: typ; (* variable type *)
var_pos: extent; (* position of the variable declaration *)
}
Note that the program can declare several different variables with the same name, with different scopes:
void f(int x)
{
x++; // first x
{
int x;
{
int x;
x++; // third x
}
x++; // second x
}
}
In that case, each version x
has its own structure with a different unique
identifier. The graph transformation thus resolves scoping. Note that by
variable, we actually denote the set of all global variables, local variables,
and formal function arguments. Additionally, the translation to a graph may add
temporary variables (e.g. to hold the return value of functions), which also
have a structure. In your analyzer, you don’t have to distinguish different
kinds of variables: you can just assume that each var structure denotes a
different global variable (to simplify, all global, local, formal argument and
temporary variables are live at all program points).
To each function in the program source is associated a func
structure:
type func = {
func_id: int; (* unique function identifier *)
func_name: string; (* function name *)
func_pos: extent; (* function position in the source *)
func_entry: node; (* entry node *)
func_exit: node; (* exit node *)
func_args: var list; (* list of formal arguments *)
func_ret: var option; (* variable used to store the return value *)
func_calls: arc list; (* list of calls to the function *)
}
The execution of the function starts at its func_entry
node and finishes at
its func_exit
node: i.e., both the normal function return
and every return
instruction will jump to the exit node. For each function returning an integer,
a synthetic variable is created (func_rec
field). It is used to store the return
value: i.e., return expr;
is modeled as storing the value of expr
into the
variable and then jumping to the exit node. To ease inter-procedural analysis,
we remember all the instructions, in other functions, that call this function.
The result of the translation is a prog
structure that contains the whole
control-flow graph for all the functions in the program.
type cfg = {
cfg_vars: var list; (* list of all the variables *)
cfg_funcs: func list; (* list of all the functions *)
cfg_nodes: node list; (* list of all the nodes in the program *)
cfg_arcs: arc list; (* list of all the arcs in the program *)
cfg_init_entry: node; (* first node of code initializing global variables *)
cfg_init_exit: node; (* last node of initialization code *)
}
As expected, the structure contains the list of all variables and function
structures, as well as the nodes and arcs of the whole program. The control-flow
graph is actually composed of several unconnected parts: one subgraph for each
function, plus an initialization subgraph. This initialization subgraph handles
the initialization of global variables; its entry point is indicated by the
cfg_init_entry
field. The initialization part must thus be analyzed prior to
analyzing any program function.