Next: Dataflow analysis: copy propagation.
Up: Compiler Theory: Code Optimization
Previous: Dataflow analysis: identifying loops
Recall that we consider a TAC program .
We assume that it is generated from a high level
language where statements are
 assignments,
 alternatives,
 while loops.
So our high level program may be generated by a grammar of the form
S 

id := E 
S 

S ; S 
S 

if E then S else S 
S 

while E do S 
E 

E + E 
E 

E  E 
E 

E = E 
E 

E < E 
E 

id 
E 

num 
E 

bool 
Hence we can assume that the control flow graph G of is reducible.
Moreover we assume that
 the assignments of our TAC are of one of the forms
 x := y,
 x := unop y where unop is a unary arithmetic operation,
 x := y binop z where binop is a binary arithmetic operation.
 and all other statements are conditional or unconditional jumps.
(Hence we do not consider pointers nor function calls.)
Finally we assume that the alternatives and while loops
are translated using the translation schemes of the previous
chapter. This implies that whatever is E the control
will flow to the same point after executing each of the statements.
 if E then S_{1} else S_{2}
 while E do S
POINTS. Within a basic block B, a point denotes
 either the beginning of B (before its first statement)
 or the end of a statement.
Hence each basic block consisting of statements possesses
+ 1 points. We will use the following notations.
 The first point of B is denoted by first(B).
 The last point of B is denoted by last(B).
PATH. A sequence of points
p_{1},..., p_{n} is a path if for
every
i = 1,..., n  1
 either p_{i} and p_{i+1} are in the same basic bloc
and p_{i} immediately precedes p_{i+1}.
 or p_{i} is the last point of a block B and p_{i+1}
is the first point a block B' such that there is an edge
from B to B' in G.
VARIABLE DEFINITION. A definition of a variable x is a statement that assigns
(or may assign) x.
Observations.
 Because of our assumptions regarding TAC a statement assign or does
not assign a variable.
 In a more general theory, with procedure calls or pointers,
a statement may assign a variable. In this case we would
consider the notions of unambiguous definitions and ambiguous definitions.
A VARIABLE DEFINITION D REACHES A POINT P if there is a path from the point immediately following d to p
such that d is not modified (killed) along this path.
From now
 we number each TAC statement and
 we will refer to a variable definition by the number of the statement
where this definition is made.
REGION. A subgraph R of the control flow graph G is a region of G
if there exists a unique node h of R which dominates
all other nodes of R.
Observations.
 This unique point h is called the header of the region.
 A loop is a special case of a region that is strongly
connected.
Theorem 1
The subgraph of the CFG corresponding to the
translation of a statement
S of our high level
language is a region denoted by
region(
S).
Moreover, by introducing empty blocks, one
may assume that
for any statement S of our high level language
control can flow to only
one outside block when it leaves region(S).
A STATEMENT REGION is a subgraph of the CFG of the form region(S).
Because of the previous definition and theorem,
it makes sense to talk about the first point
and the last point of a region,
denoted respectively by
entry(S) and exit(S).
DATAFLOW SETS FOR STATEMENT REGIONS. For each statement S of our high level program,
we associate four sets to region(S).
Each of them
 is a subset of the variable definitions of the
TAC program and
 can be viewed as an attribute of the statement S.
These sets are defined as follows.
 in(S):
 A definition d (from anywhere in ) belongs to in(S)
if d is active at
entry(S).
 out(S):
 A definition d (from anywhere in ) belongs to out(S)
if d is active at exit(S).
 gen(S):
 A definition d belongs to gen(S) if d is made
in S and there exists a path from to d to exit(S)
which uses points all inside S.
 kill (S):
 A definition d (from anywhere in )
belongs to kill (S) if for every path from
entry(S) to exit(S)
it is killed by definition d' with d d'.
Observe that these definitions extend naturally to basic blocks
since a basic block can be viewed as a sequence of statements of the
high level language.
DATAFLOW EQUATIONS. We give below equations relating the dataflow sets.
 It is important to realize that these equations are an approximation
of what will really happen when the programs runs.
Indeed some killing statements may in fact
not kill like in
x := 0
y := y + x
 A central principle of code improving transformations
is that we should make conservative decisions
 if in doubt we should add a statement to a generated set
 if in doubt, we should not add any definition to a killed set.
DATAFLOW EQUATIONS FOR AN ASSIGNMENT.
gen(S) 
= 
{d} 
kill (S) 
= 
D_{a} {d} 
out(S) 
= 
gen(S) in(S) kill (S) 

(1) 
where D_{a} denotes the set of all definitions of a in the TAC program .
Figure 6 illustrates the fact that gen(S), kill (S), out(S)
can be computed from in(S).
Figure 6:
Dataflow equations for an assignment.

DATAFLOW EQUATIONS FOR A SEQUENCE OF STATEMENTS.
gen(S) 
= 
gen(S_{2}) gen(S_{1}) kill (S_{2}) 
kill (S) 
= 
kill (S_{2}) kill (S_{1}) gen(S_{2}) 
in(S_{1}) 
= 
in(S) 
in(S_{2}) 
= 
out(S_{1}) 
out(S) 
= 
out(S_{2}) 

(2) 
Figure 7 illustrates the fact that gen(S), kill (S), out(S)
are synthesized attributes whereas in(S) is an inherited attribute.
Figure 7:
Dataflow equations for a sequence of statements.

DATAFLOW EQUATIONS FOR AN ALTERNATIVE.
gen(S) 
= 
gen(S_{1}) gen(S_{2}) 
kill (S) 
= 
kill (S_{1}) kill (S_{2}) 
in(S_{1}) 
= 
in(S) 
in(S_{2}) 
= 
in(S) 
out(S) 
= 
out(S_{2}) out(S_{1}) 

(3) 
Figure 8:
Dataflow equations for an alternative.

DATAFLOW EQUATIONS FOR A WHILE LOOP.
gen(S) 
= 
gen(S_{1}) 
kill (S) 
= 
kill (S_{1}) 
in(S_{1}) 
= 
in(S) gen(S_{1}) 
out(S) 
= 
out(S_{1}) 

(4) 
Observe that syntaxdirected definition associated
with the above rules is not Lattributed.
Indeed, the inherited attribute in(S_{1}) depends on the
synthesized attribute gen(S_{1}).
Figure 9:
Dataflow equations for a while loop.

COMPUTING THE DATAFLOW SETS.
 The synthesized attributes gen(S) and kill (S) can be computed
during a bottomup parsing
 However the inherited attributes in(S) cannot be computed
by a depthfirst traversal of the parse tree because of the rule
in(S_{1}) = in(S) gen(S_{1}) 
(5) 
as mentioned above.
 To compute the inherited attributes in(S) we use a completion
algorithm as for the FOLLOW sets or the dominator sets.
 This is done by Algorithm 4 which assumes
that the sets kill (B) and gen(B) are known for every basic bloc B.
Algorithm 4
Observations.
 Since each
 out(B)  is bounded (by the number of lines of the TAC program)
and since there is a finite number of basic blocks,
the algorithm has to stop.
 When it stops, the sets in(B) cannot further increase since
the sets out(B) are themselves saturated.
 One can show (not really difficult) that this algorithm has a quadratic time
complexity in the number of lines of the program.
 This algorithm can be implemented efficiently by coding
each dataflow set by an inetger (regarded as a bit vector)
and using bit vector operation from the machine processor.
 Algorithm 5 detects loop invariant computations.
Algorithm 5
Note that a loop invariant is not necessarily a TAC which can be moved
into a preheader. Indeed, in a given loop we may have two statemens with constant operands
that define the same variable. For instance, x := 2 and x:= 5.
Next: Dataflow analysis: copy propagation.
Up: Compiler Theory: Code Optimization
Previous: Dataflow analysis: identifying loops
Marc Moreno Maza
20041202