1 Introduction

Task models based upon directed acyclic graphs (DAGs) are widely used for representing recurrent real-time processes in a manner that exposes their internal parallelism, thereby enabling the exploitation of such parallelism upon multiprocessor and multicore platforms. These task models typically represent pieces of sequential (i.e., non-parallelizable) computation via vertices and their dependencies as edges between vertices; hence constructing such a model for a recurrent process requires detailed knowledge of the internal control-flow structure of the process.

Such knowledge is not always available. Furthermore, even when available, conservative estimates of the computational demands of individual vertices, e.g., via worst-case execution time (WCET) parameters, can result in severe under-utilization of computational resources during run-time. To ameliorate these problems, a measurement-based model was recently proposed (Agrawal and Baruah 2018). This model deals with the lack of knowledge of the internal structure by representing the computation of a DAG with just the two parameters \(work\) (the cumulative computation of all the vertices in the DAG) and \(span\) (the maximum cumulative computation of any precedence-constrained sequence of vertices). It deals with the potential pessimism by requiring that two estimates be provided for each parameter: \(work _{{ O}}\) and \(span _{{ O}}\) are very conservative upper bounds (safe even under overload conditions), while \(work _{{ N}}\) and \(span _{{ N}}\) are nominal upper bounds (i.e., upper bounds under “typical” circumstances) on the values of the work and span parameters respectively. It is assumed that \(work _{{ N}}\le work _{{ O}}\) and \(span _{{ N}}\le span _{{ O}}\).

Definition 1

(The scheduling problem) Suppose we are given a task represented by the four parameters \(work _{{ N}}\), \(span _{{ N}}\), \(work _{{ O}}\) and \(span _{{ O}}\), a deadline D, and two processor counts \(m_{{ N}}\) and \(m_{{ O}}\), where \(m_{{ N}}\le m_{{ O}}\). The scheduling problem is to finish the task with a makespan (response time) no larger than the deadline D, using at most \(m_{{ N}}\) processors, unless it is observed during the execution that at least one of the nominal parameters \(work _{{ N}}\) and \(span _{{ N}}\) does not provide a valid upper bound for the current invocation of the task. If this is observed, we may switch to using up to \(m_{{ O}}\) processors for the remainder of the execution, but we must still meet the original deadline D even if the computational demands of the task invocation turn out to be as high as \(work _{{ O}}\) and \(span _{{ O}}\). The scheduler does not know anything more about the internal details of the task than what can be deduced from the given parameters. \(\square\)

The approach presented by Agrawal and Baruah (2018) is a scheduling strategy that precomputes an upper bound \(D_{{ N}}\) on the maximum makespan that is possible when executing a task with a total work at most \(work _{{ N}}\) and a span at most \(span _{{ N}}\) upon \(m_{{ N}}\) processors using any greedy (work-conserving) scheduling (Graham 1969). It then starts to execute the given task upon \(m_{{ N}}\) processors greedily, and after \(D_{{ N}}\) time units checks whether the task has completed. If not, then at least one of \(work _{{ N}}\) and \(span _{{ N}}\) must have been exceeded, and so it activates the additional \((m_{{ O}}- m_{{ N}})\) processors and continues the greedy execution until completion.

The new approach in this paper is also to begin executing the task greedily upon \(m_{{ N}}\) processors, but rather than checking the progress of the task at a precomputed time point \(D_{{ N}}\), it instead monitors the total amount of execution occurring across all the \(m_{{ N}}\) processors. If the invocation does not complete before the execution equals the nominal work parameter \(work _{{ N}}\), then it activates the additional \((m_{{ O}}- m_{{ N}})\) processors and continues executing the task greedily until completion.
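The runtime rule just described can be sketched as follows; the function name and the work-monitoring interface are illustrative assumptions of this sketch, not artifacts from the paper:

```python
def processors_allowed(executed_work, work_N, m_N, m_O):
    """Runtime rule of the proposed strategy (sketch): execute
    greedily on m_N processors until the cumulative execution
    observed across all processors reaches work_N; from that
    point on, all m_O processors may be used."""
    return m_N if executed_work < work_N else m_O
```

Note that, in contrast to the strategy of Agrawal and Baruah (2018), the trigger here is the observed work, not a precomputed time point \(D_{{ N}}\).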

Contributions and comparisons The approach of Agrawal and Baruah (2018) only requires that the runtime detect whether the task has completed by time \(D_{{ N}}\). In contrast, our approach requires the capability to monitor the total progress on the work—that is, the amount of execution done across the processors. Assuming this capability is available, we will show below that our approach is, in fact, optimal—no other scheduler can guarantee to meet the deadline D under the constraints of the scheduling problem specified above if this approach cannot also do so. Note that our approach also has the advantage of needing only three parameters, \(work _{{ N}}\), \(work _{{ O}}\) and \(span _{{ O}}\), since it does not need to monitor whether the span exceeds \(span _{{ N}}\). In contrast, the approach of Agrawal and Baruah (2018) needs \(span _{{ N}}\) to calculate the intermediate deadline \(D_{{ N}}\).

Furthermore, Expression (1) of Theorem 2 is a tight bound on the maximum makespan under this new scheduling approach. In addition to its use as a schedulability test, this expression can be used to, e.g., minimize the processor counts \(m_{{ N}}\) and \(m_{{ O}}\) needed to meet the deadline. Note that this is exactly what we want to do if the task is periodically or sporadically activated and we want to schedule it in a federated manner similar to Li et al. (2016).
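Such a minimization could be done by brute force, as in the following sketch; the function name, the objective (minimizing \(m_{{ N}}+ m_{{ O}}\)), and the search strategy are all assumptions made for illustration:

```python
def min_processors(work_N, work_O, span_O, D, m_max):
    """Smallest pair (m_N, m_O) with m_N <= m_O <= m_max, minimizing
    m_N + m_O, whose makespan bound (Expression (1) of Theorem 2)
    is at most the deadline D; returns None if no pair suffices."""
    def bound(m_N, m_O):
        # Expression (1): two cases depending on whether the nominal
        # work can be exhausted before the final sequential part.
        if work_N > work_O - span_O:
            return (work_O - span_O) / m_N + span_O
        return work_N / m_N + (work_O - work_N - span_O) / m_O + span_O

    # Enumerate pairs in order of increasing total processor count.
    for total in range(2, 2 * m_max + 1):
        for m_N in range(1, m_max + 1):
            m_O = total - m_N
            if m_N <= m_O <= m_max and bound(m_N, m_O) <= D:
                return m_N, m_O
    return None
```

Other objectives (e.g., lexicographic minimization of \(m_{{ O}}\) first) only require changing the enumeration order.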

2 Schedulability conditions

We use a well-known result about scheduling DAG tasks characterized by single \(work\) and \(span\) parameters (i.e., where we don’t separate nominal and overload scenarios).

Theorem 1

(Graham 1969) The maximum makespan of a given DAG executed on \(m\) processors by a greedy (work-conserving) scheduler is no larger than \(M = (\frac{ work - span }{m} + span )\). \(\square\)
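Theorem 1 translates directly into a one-line computation; a minimal sketch (the function name is ours):

```python
def graham_bound(work, span, m):
    """Upper bound on the makespan of any greedy (work-conserving)
    schedule of a DAG with the given work and span on m processors
    (Graham 1969): (work - span)/m + span."""
    return (work - span) / m + span
```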

In the following, we derive a tight bound on the makespan for our new scheduling approach for DAG tasks that are characterized by parameters \(work _{{ N}}\), \(span _{{ N}}\), \(work _{{ O}}\) and \(span _{{ O}}\) for nominal and overload scenarios. Comparing this bound with a deadline is a sufficient schedulability condition for our proposed strategy and also a necessary condition for any scheduler following the rules of the scheduling problem described in Definition 1.

Theorem 2

Our proposed scheduling strategy will execute a task with a makespan that is no larger than

$$\begin{aligned} M = {\left\{ \begin{array}{ll} \frac{ work _{{ O}}- span _{{ O}}}{m_{{ N}}} + span _{{ O}}, &{} \quad \text {if } work _{{ N}}> work _{{ O}}- span _{{ O}}\\ \frac{ work _{{ N}}}{m_{{ N}}} + \frac{ work _{{ O}}- work _{{ N}}- span _{{ O}}}{m_{{ O}}} + span _{{ O}}, &{} \quad \text {if } work _{{ N}}\le work _{{ O}}- span _{{ O}}. \end{array}\right. } \end{aligned}$$
(1)

In addition, no scheduler can guarantee a smaller makespan. \(\square\)
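Expression (1) can be evaluated directly as a schedulability test for our strategy; a sketch with an illustrative function name:

```python
def makespan_bound(work_N, work_O, span_O, m_N, m_O):
    """Tight makespan bound of Theorem 2 (Expression (1)).
    Note that span_N is not needed by the proposed strategy."""
    if work_N > work_O - span_O:
        # Extra processors could only be activated during the final
        # sequential part, where they cannot help.
        return (work_O - span_O) / m_N + span_O
    return work_N / m_N + (work_O - work_N - span_O) / m_O + span_O
```

The task is deemed schedulable by our strategy whenever this bound is at most the deadline D.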

Theorem 2 follows directly from Lemmas 1 to 4, proven below. We start with Lemmas 1 and 2, which demonstrate that no scheduler can guarantee a smaller makespan bound. Recall from Definition 1 that schedulers are assumed not to know the internal structure of the DAG, except for what can be deduced from the four parameters \(work _{{ N}}\), \(span _{{ N}}\), \(work _{{ O}}\) and \(span _{{ O}}\). The actual structure of the DAG may be anything consistent with those parameters.

Lemma 1

If \(work _{{ N}}> work _{{ O}}- span _{{ O}}\), then no scheduler can guarantee to complete the task with a makespan smaller than \(\frac{ work _{{ O}}- span _{{ O}}}{m_{{ N}}} + span _{{ O}}\).

Proof

Consider a task invocation where the first \(work _{{ O}}- span _{{ O}}\) units of work that can be executed are fully parallel (i.e., not on the critical path of the DAG) and the remaining \(span _{{ O}}\) units of work are sequential. Because \(work _{{ N}}> work _{{ O}}- span _{{ O}}\), no scheduler may activate the extra \(m_{{ O}}- m_{{ N}}\) processors until some time after finishing the first \(work _{{ O}}- span _{{ O}}\) units of work. This initial work cannot be finished in less than \(( work _{{ O}}- span _{{ O}}) / m_{{ N}}\) time units. After finishing these \(work _{{ O}}- span _{{ O}}\) units of work, the task invocation is left with the sequential workload, which takes \(span _{{ O}}\) time units to finish no matter how many processors are available. Therefore, the task can finish no earlier than after \(( work _{{ O}}- span _{{ O}}) / m_{{ N}}+ span _{{ O}}\) time units. \(\square\)

Lemma 2

If \(work _{{ N}}\le work _{{ O}}- span _{{ O}}\), then no scheduler can guarantee to complete the task with a makespan smaller than \(\frac{ work _{{ N}}}{m_{{ N}}} + \frac{ work _{{ O}}- work _{{ N}}- span _{{ O}}}{m_{{ O}}} + span _{{ O}}\).

Proof

Let the task invocation be such that the first \(work _{{ N}}\) units of work executed are fully parallel, which is possible since \(work _{{ N}}\le work _{{ O}}- span _{{ O}}\). Then, no scheduler may activate the extra processors before finishing a total of \(work _{{ N}}\) units of work, which can happen no earlier than after \(work _{{ N}}/ m_{{ N}}\) time units. After the first \(work _{{ N}}\) units of work are finished and all \(m_{{ O}}\) processors may be used, the task invocation still has \(work _{{ O}}- work _{{ N}}- span _{{ O}}\) units of work that are fully parallel, which take at least \(\frac{ work _{{ O}}- work _{{ N}}- span _{{ O}}}{m_{{ O}}}\) time units to finish. Lastly, the task invocation is left with an entirely sequential part that cannot be finished in less than \(span _{{ O}}\) time units. The total time to completion is then at least \(\frac{ work _{{ N}}}{m_{{ N}}} + \frac{ work _{{ O}}- work _{{ N}}- span _{{ O}}}{m_{{ O}}} + span _{{ O}}\). \(\square\)
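As a concrete instance of this construction, consider the illustrative values \(work _{{ N}}= 4\), \(work _{{ O}}= 10\), \(span _{{ O}}= 2\), \(m_{{ N}}= 2\) and \(m_{{ O}}= 4\) (chosen for this sketch, not taken from the paper); the condition \(work _{{ N}}\le work _{{ O}}- span _{{ O}}\) holds since \(4 \le 8\), and the lower bound evaluates to

$$\begin{aligned} \frac{ work _{{ N}}}{m_{{ N}}} + \frac{ work _{{ O}}- work _{{ N}}- span _{{ O}}}{m_{{ O}}} + span _{{ O}}\;=\; \frac{4}{2} + \frac{10 - 4 - 2}{4} + 2 \;=\; 2 + 1 + 2 \;=\; 5. \end{aligned}$$

That is, the adversarial invocation spends 2 time units on the fully parallel nominal work, 1 time unit on the remaining parallel work once all four processors are active, and 2 final time units on the sequential part.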

We now show with Lemmas 3 and 4 that our proposed scheduling strategy can finish within a makespan no larger than the one specified in Theorem 2.

Lemma 3

If \(work _{{ N}}> work _{{ O}}- span _{{ O}}\), then our proposed scheduling strategy will complete the task with a makespan no larger than \(\frac{ work _{{ O}}- span _{{ O}}}{m_{{ N}}} + span _{{ O}}\).

Proof

Follows from using Theorem 1 with the more conservative task parameters \(work _{{ O}}\) and \(span _{{ O}}\) and the smaller number of processors \(m_{{ N}}\) that we are always guaranteed. \(\square\)

Lemma 4

If \(work _{{ N}}\le work _{{ O}}- span _{{ O}}\), then our proposed scheduling strategy will complete the task with a makespan no larger than \(\frac{ work _{{ N}}}{m_{{ N}}} + \frac{ work _{{ O}}- work _{{ N}}- span _{{ O}}}{m_{{ O}}} + span _{{ O}}\).

Proof

We separately consider the cases where the nominal parameter \(work _{{ N}}\) holds or not during the execution of the task invocation.

Case 1 (The total workload of the current invocation is no larger than \(work _{{ N}}\)): In this case, the extra processors will never be activated. By Theorem 1, the makespan is no larger than \(\frac{ work _{{ N}}- span _{{ O}}}{m_{{ N}}} + span _{{ O}}\), and using the assumption \(0 \le work _{{ O}}- work _{{ N}}- span _{{ O}}\) we have

$$\begin{aligned} \frac{ work _{{ N}}- span _{{ O}}}{m_{{ N}}} + span _{{ O}}\quad \le \quad \frac{ work _{{ N}}}{m_{{ N}}} + \frac{ work _{{ O}}- work _{{ N}}- span _{{ O}}}{m_{{ O}}} + \, span _{{ O}}. \end{aligned}$$

Case 2 (The total workload of the current invocation is larger than \(work _{{ N}}\)): In this case, the extra \(m_{{ O}}- m_{{ N}}\) processors will get activated by our proposed approach, say after \(t\) time units. Let \(t_{\text {busy}}\) denote the total amount of time before \(t\) where all \(m_{{ N}}\) processors are busy, and let \(t_{\text {idle}}= t-t_{\text {busy}}\) denote the total time during which at least one processor is idling. Let \(work '\) and \(span '\) denote the actual remaining work and span after the first \(t\) time units and note that \(work ' \le work _{{ O}}- work _{{ N}}\) and \(span ' \le span _{{ O}}\).

Because a greedy scheduler never idles all processors while the invocation is incomplete, and exactly \(work _{{ N}}\) units of execution have been completed after the first \(t\) time units, we have \(work _{{ N}}\ge t_{\text {busy}}\times m_{{ N}}+ t_{\text {idle}}\), which implies that \(t_{\text {busy}}\le \frac{ work _{{ N}}- t_{\text {idle}}}{m_{{ N}}}\). Note that the first unexecuted vertex on any path is always available for execution, and so if any processor is idle, then all critical paths must currently be executing and the remaining span is therefore being shortened. We must then have \(span ' \le span _{{ O}}- t_{\text {idle}}\), which implies \(t_{\text {idle}}\le span _{{ O}}- span '\). Thus,

$$\begin{aligned} t \;=\; \bigl (t_{\text {busy}}+ t_{\text {idle}}\bigr ) \;\le \; \frac{ work _{{ N}}-t_{\text {idle}}}{m_{{ N}}} + t_{\text {idle}}\;\le \; \frac{ work _{{ N}}}{m_{{ N}}} + ( span _{{ O}}- span ')\left( 1-\frac{1}{m_{{ N}}}\right) . \end{aligned}$$
(2)

Using Eq. (2) and Theorem 1 we see that the total makespan cannot be larger than

$$\begin{aligned} t + \frac{ work ' - span '}{m_{{ O}}} + span ' \; \le \;&\frac{ work _{{ N}}}{m_{{ N}}} + ( span _{{ O}}- span ')\left( 1-\frac{1}{m_{{ N}}}\right) + \frac{ work ' - span '}{m_{{ O}}} + span '\\ =\;&\frac{ work _{{ N}}}{m_{{ N}}} + \frac{ work '}{m_{{ O}}} + span _{{ O}}- \frac{ span _{{ O}}}{m_{{ N}}} +\frac{ span '}{m_{{ N}}} - \frac{ span '}{m_{{ O}}}\\ \le \;&\frac{ work _{{ N}}}{m_{{ N}}} + \frac{ work '}{m_{{ O}}} + span _{{ O}}- \frac{ span _{{ O}}}{m_{{ N}}}+ \left( \frac{1}{m_{{ N}}} - \frac{1}{m_{{ O}}} \right) span _{{ O}}\\ \le \;&\frac{ work _{{ N}}}{m_{{ N}}} + \frac{ work _{{ O}}- work _{{ N}}- span _{{ O}}}{m_{{ O}}} + span _{{ O}}, \end{aligned}$$

which finishes the proof.\(\square\)