Dataflow and Max-Plus Algebra

by Marc Geilen (m.c.w.geilen@tue.nl); exercises by Twan Basten

June, 2022

Monotonicity

The models that we have made and the examples that we have shown all have fixed, constant durations associated to the executed actions or to the time constraints. In reality, not all systems have this property. It often happens that there is variation in such durations. Explicitly incorporating such variation in our models may complicate modelling and/or analysis. Very often we take an alternative approach, where we assume in our models a constant worst-case duration that abstracts such variations. Max-plus-linear models have an important property that supports such a worst-case modelling approach, which is called monotonicity.

In terms of the elementary operators, this can be formulated as follows.

Monotonicity For all \(x, y, z\in{}\mathbb{T}\):

Intuitively, for example, whenever a dataflow actor waits for the arrival of two input tokens and one of them is delayed due to our worst-case abstraction, then also the start of the actor firing (captured through a \(\oplus{}\) operation) is a worst-case abstraction of the actual start of the actor. Similarly, monotonicity of the \(\otimes{}\) operator tells us that if the firing duration of an actor firing is abstracted to a worst case, the actor completion time computed with max-plus algebra results in a valid worst-case abstraction.

Because both operators are monotone, the monotonicity result extends straightforwardly to all operations on vectors and matrices. We define less-than-or-equal for vectors and matrices by element-wise comparison.

If \({\textrm{\bf{}{v}}}\) and \({\textrm{\bf{}{w}}}\) are vectors of the same size \(n\) then \[ {\textrm{\bf{}{v}}}\leq{}{\textrm{\bf{}{w}}}\Leftrightarrow{} \forall_{1\leq{}k\leq{}n} {\textrm{\bf{}{v}}}[k] \leq{} {\textrm{\bf{}{w}}}[k] \]

Analogously, we define the relation for matrices of the same size \(m\) by \(n\): \[ {\textrm{\bf{}{A}}}\leq{}{\textrm{\bf{}{B}}}\Leftrightarrow{} \forall_{1\leq{}k\leq{}m,1\leq{}l\leq{}n} {\textrm{\bf{}{A}}}[k,l] \leq{} {\textrm{\bf{}{B}}}[k,l]\]

If \(c\leq{} c'\), \({\textrm{\bf{}{x}}}\leq{}{\textrm{\bf{}{x}}}'\), \({\textrm{\bf{}{y}}}\leq{}{\textrm{\bf{}{y}}}'\), \({\textrm{\bf{}{A}}}\leq{}{\textrm{\bf{}{A}}}'\) , \({\textrm{\bf{}{B}}}\leq{}{\textrm{\bf{}{B}}}'\) we have all of the following results. \[ \begin{aligned} c\otimes{}{\textrm{\bf{}{x}}} &\leq{} c'\otimes{}{\textrm{\bf{}{x}}}' \\ {\textrm{\bf{}{x}}}\oplus{}{\textrm{\bf{}{y}}} &\leq{} {\textrm{\bf{}{x}}}'\oplus{}{\textrm{\bf{}{y}}}' \\ {\textrm{\bf{}{A}}}{\textrm{\bf{}{x}}} &\leq{} {\textrm{\bf{}{A}}}'{\textrm{\bf{}{x}}}' \\ {\textrm{\bf{}{A}}}{\textrm{\bf{}{B}}} &\leq{} {\textrm{\bf{}{A}}}'{\textrm{\bf{}{B}}}' \\ |{\textrm{\bf{}{x}}}| &\leq{} |{\textrm{\bf{}{x}}}'|\\ ({\textrm{\bf{}{x}}},{\textrm{\bf{}{y}}}) &\leq{} ({\textrm{\bf{}{x}}}',{\textrm{\bf{}{y}}}') \\ \end{aligned} \]

We will see later how we can use max-plus algebra to compute performance metrics such as throughput or latency. The monotonicity property can be used to prove that such metrics computed on worst-case abstractions of a system results in conservative estimates of the actual system metrics.

Example 15 (Variable Response Times)

Assume that we want to model tasks executing on a processor that employs round-robin arbitration between three applications. The arbitration works as follows. It offers service of the processor to the three applications in a cyclic fashion. When the application has workload to perform, it can use the processor until all its work is done, or until it has used the processor for the maximum period \(P\), whichever comes first. After that, or immediately, in case the application did not have any work to perform, the arbitration turns to the following application in the cycle and offers service to that application. This process continues. When none of the applications has any workload, the processor idles until one of the applications has some workload again.

From the perspective of one of the applications, the system is not deterministic. The behavior may vary with the workload of other applications. Assume the period of the round-robin arbitration is \(P=4\) and the application needs to execute tasks which need the processor for \(10\) time units. Then, in the best case, it takes exactly \(10\) time units to complete if the other applications do not require the processor service. In the worst case, it may take \(34\) time units. The processor may be allocated to both other applications first for two periods (8 time units) Then the 10 time units are interrupted twice, after 4 and again after 8 time units with two more interruptions of two periods each. Any duration in between \(10\) and \(34\) is also possible.

This means we could model the task execution with the following dataflow model.

Note that the self-edge prevents multiple instances of the task from executing simultaneously, which is not possible as they would need the same processor. Hence, a deterministic worst-case model exists of the processor which is itself not deterministic. Monotonicity of the dataflow model ensures that any performance metrics we derive using the dataflow model are conservative with respect to the actual performance with the real round-robin arbitration.

The simple model we derived above is, however, more pessimistic than necessary. Imagine for example that two task instances arrive together. They need a total of 20 time units of the processor. With similar reasoning as applied above, we can conclude that this requires in the worst case, 60 time units, but the dataflow model above predicts 68 time units.

The reason is that the worst-case conditions that determine the response time of 34 for a single task cannot occur also for the second task if the processing of the second task immediately follows the processing of the first.

We often use the following type of dataflow model instead, which is more accurate. It is called a latency-rate dataflow model since it has two actors, the first models the average processing rate, which is \(\frac{1}{30}\) since the task is 10 time units and we get on average in the worst case a third of the processor. The second actor, notably without a self-edge, then accounts for the worst-case latency. For the example, it requires a delay of 4 for the latency actor to be conservative, for instance, to account for the response time of 34 for a single task.

Note, however, that this model is still pessimistic. It has, in contrast to the first model, the correct worst-case throughput, but it predicts a response time for two task executions of 64, while we have seen that it is 60. An exact worst-case model exists in this particular case, shown below, but it need not exist in general. Close approximations can be very large, so they are not commonly applied.

More details about accurate response-time modelling, especially for TDM (Time-Division Multiplex) arbiters, can be found in (Lele, Moreira, and Cuijpers 2012). The key message of this example is that worst-case max-plus-linear models, because of monotonicity, can be used to reason about the performance metrics of a system under worst-case timing assumptions. These metrics then provide bounds on the actual metrics.

In the same spirit, of earlier-time-stamps-are-better, we define the following relation on event sequences that we will use in Section 12.

Definition 29 (Ordering Event Sequences)

Let \(s_1\) and \(s_2\) be event sequences. \(s_1\leq{}s_2\) if and only if for all \(k\in{}\mathbb{N}\) such that \(s_2(k)\) is defined, \(s_1(k)\) is also defined and \(s_1(k)\leq{}s_2(k)\)

Observe that in the definition of \(s_1\leq{}s_2\) we allow the event sequence \(s_1\) to be longer (have more events) than the event sequence \(s_2\). This aligns well with our notion of worst-case modeling: a system that can produce more output events is better than a system that does not produce those output events at all. Hence, \(\epsilon{}\leq{}s\) for any event sequence \(s\).

Convolution is monotone. If \(s\), \(s'\), \(t\) and \(t'\) are event sequences such that \(s\leq{}s'\) and \(t\leq{}t'\), then \[ s\otimes{}t \leq{} s'\otimes{}t' \]

Therefore, max-plus-linear and index-invariant event systems are monotone. If \(s\mathbin{\stackrel{S}{\longrightarrow}}o\) and \(s'\mathbin{\stackrel{S}{\longrightarrow}}o'\), then \(o\leq{}o'\). And if system \(S\) has impulse response \(h\) and system \(S'\) has impulse response \(h'\) such that \(h\leq{}h'\), \(s\mathbin{\stackrel{S}{\longrightarrow}}o\) and \(s\mathbin{\stackrel{S'}{\longrightarrow}}o'\), then \(o\leq{}o'\).

Equations \({\textrm{\bf{}{A}}}{\textrm{\bf{}{x}}}={\textrm{\bf{}{b}}}\) in max-plus algebra often do not have exact solutions. Instead, one can ask the question, what is the largest vector \({\textrm{\bf{}{x}}}\) such that \({\textrm{\bf{}{A}}}{\textrm{\bf{}{x}}}\leq{}{\textrm{\bf{}{b}}}\) for a given matrix \({\textrm{\bf{}{A}}}\) and vector \({\textrm{\bf{}{b}}}\).

For example, consider the following equation. \[ \begin{bmatrix}{1} & {2}\\{3} & {4}\\\end{bmatrix} \left[\begin{array}{c}{x_1}\\{x_2}\\\end{array}\right] \leq{} \left[\begin{array}{c}{7}\\{5}\\\end{array}\right] \]

We can derive the largest possible values for which the equation holds. If we write out the equation per row we obtain the following two equations. \[ \begin{aligned} 1\otimes{}x_1\oplus{} 2\otimes{}x_2 & \leq{} 7\\ 3\otimes{}x_1\oplus{} 4\otimes{}x_2 & \leq{} 5\\ \end{aligned} \]

If we consider that \(\max(a,b)\leq{} c\) is equivalent to \(a\leq{}c \wedge b\leq{}c\), we obtain four conditions. \[ \begin{aligned} 1\otimes{}x_1 & \leq{} 7 , & 2\otimes{}x_2 & \leq{} 7\\ 3\otimes{}x_1 & \leq{} 5 , & 4\otimes{}x_2 & \leq{} 5 \end{aligned} \]

This means for \(x_1\) that \[ x_1 \leq{} 7-1 ~~\text{and}~~ x_1 \leq{} 5-3 \]

and for \(x_2\) that \[ x_2 \leq{} 7-2 ~~\text{and}~~ x_2 \leq{} 5-4 \]

Then the largest values for \(x_1\) and \(x_2\) that satisfy the constraint can easily be found to be. \[ x_1 = 2 ~~\text{and}~~ x_2 = 1\\ \] \[ \begin{bmatrix}{1} & {2}\\{3} & {4}\\\end{bmatrix} \left[\begin{array}{c}{2}\\{1}\\\end{array}\right] = \left[\begin{array}{c}{3}\\{5}\\\end{array}\right] \leq{} \left[\begin{array}{c}{7}\\{5}\\\end{array}\right] \]

We can now generalize the method to the general case of an equation \({\textrm{\bf{}{A}}}{\textrm{\bf{}{x}}}\leq{}{\textrm{\bf{}{b}}}\). The row equation for row \(k\) is \[ \bigoplus_{m} {\textrm{\bf{}{A}}}[k,m]\otimes{}{\textrm{\bf{}{x}}}[m]\leq{} {\textrm{\bf{}{b}}}[k] \]

Thus we have for all \(k,m\): \[ {\textrm{\bf{}{A}}}[k,m]\otimes{}{\textrm{\bf{}{x}}}[m]\leq{} {\textrm{\bf{}{b}}}[k] \]

Note the the equation is trivially satisfied if \({\textrm{\bf{}{A}}}[k,m]=-\infty{}\). Therefore, we get the equations for all \(k\), \(m\) such that \({\textrm{\bf{}{A}}}[k,m]\neq{}-\infty{}\): \[ {\textrm{\bf{}{x}}}[m]\leq{} {\textrm{\bf{}{b}}}[k] - {\textrm{\bf{}{A}}}[k,m] \]

Note that in case \({\textrm{\bf{}{A}}}[k,m]=-\infty{}\) for all \(m\), i.e., an entire column of \({\textrm{\bf{}{A}}}\) is equal to \(-\infty{}\), then \({\textrm{\bf{}{x}}}[k]\) can be arbitrarily large and such a specific largest vector \({\textrm{\bf{}{x}}}\) does not exist. In case all columns of \({\textrm{\bf{}{A}}}\) contain at least one value different from \(-\infty{}\), we find the elements of the largest vector with the following equation. \[ {\textrm{\bf{}{x}}}[m] = \min_{k~\text{s.t.}~{\textrm{\bf{}{A}}}[k,m]\neq{}-\infty{}} {\textrm{\bf{}{b}}}[k] - {\textrm{\bf{}{A}}}[k,m] \]

Exercise 16 (A producer-consumer pipeline -- worst-case abstractions)

Reconsider the producer-consumer pipeline of Exercise 2, with function \(f_2\) for the filter actor \(F\) that takes 3 time units to process once on a given processor. Assume Actor \(F\) is sharing its processor with another application. Further, assume that the processor uses TDM scheduling with a period of 4 time units, in which each application gets two of the four time slots. It is unknown which time slots are given to each of the applications, and the time slots are dedicated to the application even if it does not have any workload).

  1. Provide a three-actor conservative dataflow model of the producer-consumer pipeline following the structure of the model given in Exercise 2.
  2. Adapt the model of the first item by including a conservative latency-rate abstraction for the processing of \(f_2\) in the model.
  3. What are the first three completion times of Actor \(C\) under the true worst-case conditions for \(F\)? What are the first three completion times of \(C\) predicted by the models of the previous two items? Assume in all cases that initial tokens are available at time 0.

See answer