next up previous
Next: Data-flow analysis: copy propagation. Up: Compiler Theory: Code Optimization Previous: Data-flow analysis: identifying loops

Data-flow analysis: reaching definitions

Recall that we consider a TAC program $ \cal {P}$. We assume that it is generated from a high level language where statements are So our high level program may be generated by a grammar of the form
S $ \longmapsto$ id := E
S $ \longmapsto$ S ; S
S $ \longmapsto$ if E then S else S
S $ \longmapsto$ while E do S
E $ \longmapsto$ E + E
E $ \longmapsto$ E - E
E $ \longmapsto$ E = E
E $ \longmapsto$ E < E
E $ \longmapsto$ id
E $ \longmapsto$ num
E $ \longmapsto$ bool
Hence we can assume that the control flow graph G of $ \cal {P}$ is reducible. Moreover we assume that Finally we assume that the alternatives and while loops are translated using the translation schemes of the previous chapter. This implies that whatever is E the control will flow to the same point after executing each of the statements.


POINTS. Within a basic block B, a point denotes

Hence each basic block consisting of $ \ell$ statements possesses $ \ell$ + 1 points. We will use the following notations.


PATH. A sequence of points p1,..., pn is a path if for every i = 1,..., n - 1


VARIABLE DEFINITION. A definition of a variable x is a statement that assigns (or may assign) x. Observations.


A VARIABLE DEFINITION D REACHES A POINT P if there is a path from the point immediately following d to p such that d is not modified (killed) along this path.

From now


REGION. A subgraph R of the control flow graph G is a region of G if there exists a unique node h of R which dominates all other nodes of R. Observations.

Theorem 1   The subgraph of the CFG corresponding to the translation of a statement S of our high level language is a region denoted by region(S).

Moreover, by introducing empty blocks, one may assume that for any statement S of our high level language control can flow to only one outside block when it leaves region(S).


A STATEMENT REGION is a subgraph of the CFG of the form region(S). Because of the previous definition and theorem, it makes sense to talk about the first point and the last point of a region, denoted respectively by entry(S) and exit(S).


DATA-FLOW SETS FOR STATEMENT REGIONS. For each statement S of our high level program, we associate four sets to region(S). Each of them

These sets are defined as follows.
in(S):
A definition d (from anywhere in $ \cal {P}$) belongs to in(S) if d is active at entry(S).
out(S):
A definition d (from anywhere in $ \cal {P}$) belongs to out(S) if d is active at exit(S).
gen(S):
A definition d belongs to gen(S) if d is made in S and there exists a path from to d to exit(S) which uses points all inside S.
kill (S):
A definition d (from anywhere in $ \cal {P}$) belongs to kill (S) if for every path from entry(S) to exit(S) it is killed by definition d' with d $ \neq$ d'.
Observe that these definitions extend naturally to basic blocks since a basic block can be viewed as a sequence of statements of the high level language.


DATA-FLOW EQUATIONS. We give below equations relating the data-flow sets.


DATA-FLOW EQUATIONS FOR AN ASSIGNMENT.

gen(S) = {d}
kill (S) = Da $\displaystyle \setminus$ {d}
out(S) = gen(S$\displaystyle \bigcup$ $\displaystyle \left(\vphantom{ in(S) \setminus kill(S) }\right.$in(S) $\displaystyle \setminus$ kill (S)$\displaystyle \left.\vphantom{ in(S) \setminus kill(S) }\right)$
(1)

where Da denotes the set of all definitions of a in the TAC program $ \cal {P}$. Figure 6 illustrates the fact that gen(S), kill (S), out(S) can be computed from in(S).
Figure 6: Data-flow equations for an assignment.
\begin{figure}\htmlimage
\centering\includegraphics[scale=.45]{assignmentDataFlowEq.eps}
\end{figure}


DATA-FLOW EQUATIONS FOR A SEQUENCE OF STATEMENTS.

gen(S) = gen(S2$\displaystyle \bigcup$ $\displaystyle \left(\vphantom{ gen(S_1) \setminus kill(S_2) }\right.$gen(S1) $\displaystyle \setminus$ kill (S2)$\displaystyle \left.\vphantom{ gen(S_1) \setminus kill(S_2) }\right)$
kill (S) = kill (S2$\displaystyle \bigcup$ $\displaystyle \left(\vphantom{ kill(S_1) \setminus gen(S_2) }\right.$kill (S1) $\displaystyle \setminus$ gen(S2)$\displaystyle \left.\vphantom{ kill(S_1) \setminus gen(S_2) }\right)$
in(S1) = in(S)
in(S2) = out(S1)
out(S) = out(S2)
(2)

Figure 7 illustrates the fact that gen(S), kill (S), out(S) are synthesized attributes whereas in(S) is an inherited attribute.
Figure 7: Data-flow equations for a sequence of statements.
\begin{figure}\htmlimage
\centering\includegraphics[scale=.45]{statementsDataFlowEq.eps}
\end{figure}


DATA-FLOW EQUATIONS FOR AN ALTERNATIVE.

gen(S) = gen(S1$\displaystyle \bigcup$ gen(S2)
kill (S) = kill (S1$\displaystyle \bigcap$ kill (S2)
in(S1) = in(S)
in(S2) = in(S)
out(S) = out(S2$\displaystyle \bigcup$ out(S1)
(3)

Figure 8: Data-flow equations for an alternative.
\begin{figure}\htmlimage
\centering\includegraphics[scale=.45]{alternativeDataFlowEq.eps}
\end{figure}


DATA-FLOW EQUATIONS FOR A WHILE LOOP.

gen(S) = gen(S1)
kill (S) = kill (S1)
in(S1) = in(S$\displaystyle \bigcup$ gen(S1)
out(S) = out(S1)
(4)

Observe that syntax-directed definition associated with the above rules is not L-attributed. Indeed, the inherited attribute in(S1) depends on the synthesized attribute gen(S1).
Figure 9: Data-flow equations for a while loop.
\begin{figure}\htmlimage
\centering\includegraphics[scale=.45]{whileDataFlowEq.eps}
\end{figure}


COMPUTING THE DATA-FLOW SETS.

Algorithm 4  

\fbox{
\begin{minipage}{10 cm}
\begin{description}
\item[{\bf Input:}] $kill(B)$...
...cup} \ \left( in(B) \setminus kill(B) \right)$\ \\
\end{tabbing}\end{minipage}}

Observations.

Algorithm 5  

\fbox{
\begin{minipage}{13 cm}
\begin{description}
\item[{\bf Input:}] A loop $L...
...\\
\> \> \> \> which is marked {\sc INVARIANT} \\
\end{tabbing}\end{minipage}}

Note that a loop invariant is not necessarily a TAC which can be moved into a pre-header. Indeed, in a given loop we may have two statemens with constant operands that define the same variable. For instance, x := 2 and x:= 5.


next up previous
Next: Data-flow analysis: copy propagation. Up: Compiler Theory: Code Optimization Previous: Data-flow analysis: identifying loops
Marc Moreno Maza
2004-12-02