Semantics and application to program verification

Static analysis project

Control-flow graph

The point of the graph transformation is that you don't need to handle the language defined in abstract_syntax_tree.ml, which is complex. The graph in cfg.ml is much simpler, it is comprised of:

To ease graph traversal, as standard, each node contains a list of out-going arcs and in-coming arcs. Each arc and each node in the graph is given a unique integer identifier. For ease of programming, Set, Map, and Hashtbl collection modules over these objects are also provided.

Each arc is labelled with an instruction to execute when going from its source node to its destination node:

Each program variable is associated a var structure:

type var = {
   var_id: int; (* unique variable identifier *)
   var_name: id; (* original name, in the program *)
   var_type: typ; (* variable type *)
   var_pos: extent; (* position of the variable declaration *)
}

Note that the program can declare several different variables with the same name, with different scopes:

void f(int x)
{
  x++; // first x
  {
     int x;
     {
       int x;
       x++; // third x
     }
     x++; // second x
  }
}

In that case, each version x has its own structure with a different unique identifier. The graph transformation thus resolves scoping. Note that by variable, we actually denote the set of all global variables, local variables, and formal function arguments. Additionally, the translation to a graph may add temporary variables (e.g. to hold the return value of functions), which also have a structure. In your analyzer, you don't have to distinguish different kinds of variables: you can just assume that each var structure denotes a different global variable (to simplify, all global, local, formal argument and temporary variables are live at all program points)..

To each function in the program source is associated a func structure:

type func = {
   func_id: int; (* unique function identifier *)
   func_name: string; (* function name *)
   func_pos: extent; (* function position in the source *)
   func_entry: node; (* entry node *)
   func_exit: node; (* exit node *)
   func_args: var list; (* list of formal arguments *)
   func_ret: var option; (* variable used to store the return value *)
   func_calls: arc list; (* list of calls to the function *)
}

The execution of the function starts at its entry node and finishes at its exit node: i.e., both the normal function return and every return instruction will jump to the exit node. For each function returning an integer, a synthetic variable is created (func_rec field). It is used to store the return value: i.e., return expr; is modeled as storing the value of expr into the variable and then jumping to the exit node. To ease inter-procedural analysis, we remember all the instructions, in other functions, that call this function.

The result of the translation is a prog structure that contains the whole control-flow graph for all the functions in the program.

type cfg = {
   cfg_vars: var list; (* list of all the variables *)
   cfg_funcs: func list; (* list of all the functions *)
   cfg_nodes: node list; (* list of all the nodes in the program *)
   cfg_arcs: arc list; (* list of all the arcs in the program *)
   cfg_init_entry: node; (* first node of code initializing global variables *)
   cfg_init_exit: node; (* last node of initialization code *)
}

As expected, the structure contains the list of all variables and function structures, as well as the nodes and arcs of the whole program. The control-flow graph is actually composed of several unconnected parts: one subgraph for each function, plus an initialization subgraph. This initialization subgraph handles the initialization of global variables; its entry point is indicated by the cfg_init_entry field. The initialization part must thus be analyzed prior to analyzing any program function.



Author: Antoine Miné