Real-Time Scheduling for Safe Autonomous Driving
“Scheduling” refers to the arrangement of tasks in a certain sequence. According to the casual definition by Mindtools, which considers scheduling in the context of daily life, scheduling is the art of planning your activities so that you can achieve your goals and priorities in the time available to you. Put another way, scheduling ensures that there is enough time for the most essential tasks by prioritizing those that are the most urgent. Translated to the world of hard real-time systems: The key function of scheduling is to guarantee the execution of safety-critical tasks. What matters is not only that tasks are executed, but that they are performed accurately and on time.
We have already discussed hard real-time systems and the requirements they must meet in our previous article ‘Real-Time Automotive-Safe Systems for the Future’. To introduce the notion, Prof. Kopetz describes the scheduling problem as follows in his classic book Real-Time Systems: “A hard real-time system must execute a set of concurrent real-time tasks in such a way that all time-critical tasks meet their specified deadlines. Every task needs computational and data resources to proceed. The scheduling problem is concerned with the allocation of these resources to satisfy all timing requirements.”
Let’s take this to the context of our industry, where automotive E/E systems are evolving towards a consolidation of traditionally separated domains. With the aggregation of car functions onto common hardware, the importance of a system’s accurate time behavior is on the rise. This is particularly true for autonomous driving (AD) systems, which combine complex functions with strong safety requirements. Their software functions must co-exist and share resources with other software-powered car functions without sacrificing their real-time safety requirements, thereby raising the bar for car operating systems. Furthermore, AD not only brings hard real-time requirements for individual functions, but also additional prioritization dependencies between networks of sensors, control software, and actuators. TTTech Labs scientist Silviu S. Craciunas and researchers of the Technical University of Denmark in Kongens Lyngby find that these so-called mixed-criticality systems require higher standards in terms of safety-critical compliance and temporal and spatial isolation, resulting in a need for new scheduling capabilities at runtime and new configuration tools at design time.
These challenges grow along with the increasing amount of digital functions found in cars, driven by the advent of autonomous, connected, electrical and shared (ACES) mobility. To respond adequately, modern vehicles need not only additional computational performance to run increasingly complex software, but mechanisms that actually guarantee the individual and collective safety levels and priorities of all car functions. This is why safe real-time scheduling is central to the future of the automobile.
What is real-time scheduling?
A distributed real-time computer system executes multiple system functions simultaneously across its nodes. To ensure consistent execution of functions throughout the distributed system, all incoming events must be processed by their respective nodes in the temporal order in which they occur.
Therefore, all interconnected components of a distributed system must be properly synchronized relative to one another to ensure the reliable behavior of the overall system. For real-time distributed systems, it is furthermore necessary that the local clocks of all nodes are properly synchronized with a reference clock that follows the metrics of the physical world by aligning to International Atomic Time (TAI).
Nodes of a real-time distributed system must operate under the same global time
To define “proper” synchronization, we can imagine that each node ticks with its local clock, and we will call these ticks microticks. The subset of microticks that is aligned to the ticks of global time within an acceptable tolerance will be called macroticks (or simply ticks) of global time. A global notion of time is thus introduced to the system when these macroticks reflect properly selected microticks from the synchronized local physical clocks of the ensemble of nodes under consideration.
In addition to the synchronous execution of tasks across nodes, coordinated communication between nodes is necessary to enable the successful operation of the entire system. Each task is built up of multiple individual actions or steps. During the execution of each task, a sequential program reads the input data and the internal state of the node, determines and delivers output results, and updates the internal state of the node.
All this can be measured in time: The actual duration of a task is the time interval between the start and the termination of a task. The maximum possible duration for the execution of a task, under all possible input data and execution scenarios, is called the worst-case execution time (WCET). The closer the WCET estimate is to the runtime observed in real-life scenarios, the more efficiently the schedule uses the available resources. Jitter is the difference between the WCET and the minimum duration of a task (the smallest time interval for task execution).
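As a small worked illustration of these definitions, the sketch below computes the jitter of a task from a WCET bound and a minimum duration. The class and function names and the sample values (in microseconds) are assumptions made for this example only. Note that a measured maximum is merely a lower bound on the true WCET, which is normally established by analysis rather than by measurement alone.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskTiming:
    """Timing bounds of one task, in microseconds (illustrative values)."""
    wcet_us: int  # worst-case execution time (upper bound)
    bcet_us: int  # minimum duration of the task

    @property
    def jitter_us(self) -> int:
        # Jitter as defined in the text: WCET minus minimum duration.
        return self.wcet_us - self.bcet_us


def observed_timing(durations_us: list[int]) -> TaskTiming:
    """Summarize measured durations.

    The observed maximum is only a lower bound on the real WCET, so a
    safety argument should not rely on measurements alone.
    """
    if not durations_us:
        raise ValueError("need at least one measurement")
    return TaskTiming(wcet_us=max(durations_us), bcet_us=min(durations_us))


samples = [820, 790, 905, 760, 880]  # hypothetical measurements
timing = observed_timing(samples)
print(timing.jitter_us)  # 905 - 760 = 145
```

A smaller jitter means that the reserved time is closer to what the task typically needs, which is the efficiency argument made above.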
In hard real-time systems, all critical tasks must meet their deadlines
In hard real-time computer systems, safety-critical tasks must be executed under strict timing constraints. These tasks must meet all their deadlines and be executed within very tight time intervals to achieve “global” real-time end-to-end guarantees. Scheduling is the method by which the resources of a computer system are allocated to tasks to ensure their accurate and timely execution. Building on this, real-time scheduling is the method by which tasks are distributed across the nodes of a system in such a way that the maximum allowable jitter of any task on any node is never exceeded, thereby ensuring a predictable, and thus safe, behavior of the overall system.
Achieving hard real-time scheduling for autonomous vehicles
Real-time scheduling can be achieved in different ways. Let’s consider the options available as represented here:
Real-time scheduling taxonomy
Starting from the top, the main difference between dynamic and static hard real-time scheduling is that dynamic scheduling is performed online, during runtime, while static scheduling is performed offline, before runtime. Dynamic schedulers are flexible and make scheduling decisions “on the go”. This means that they adapt to the evolving task flow and select the task to be scheduled from the current set of ready tasks. The overhead of finding such a schedule during runtime can be significant. While dynamic schedulers assign the task start times at runtime, static schedulers determine these times at design time. The static scheduler generates, offline, a dispatching table containing the complete set of information about the tasks, which are then dispatched at runtime according to that table.
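The contrast between the two approaches can be sketched in a few lines. In this illustration, the table contents, the cycle length and all names are hypothetical, and earliest deadline first (EDF) is used only as a familiar example of a dynamic policy; the text above does not prescribe a particular one.

```python
from typing import NamedTuple, Optional


class Job(NamedTuple):
    name: str
    deadline: int  # absolute deadline in ticks


# Static: the table is computed offline and only looked up at runtime.
# Keys are start ticks within one 10-tick cycle (hypothetical values).
DISPATCH_TABLE = {0: "sense", 3: "plan", 7: "actuate"}
CYCLE_LENGTH = 10


def static_dispatch(tick: int) -> Optional[str]:
    """Return the task that starts at this tick, or None."""
    return DISPATCH_TABLE.get(tick % CYCLE_LENGTH)


def dynamic_dispatch(ready: list[Job]) -> Optional[Job]:
    """Pick the ready job with the earliest deadline (EDF), decided online."""
    return min(ready, key=lambda job: job.deadline) if ready else None


print(static_dispatch(13))                               # plan
print(dynamic_dispatch([Job("a", 9), Job("b", 4)]).name)  # b
```

The static lookup costs one dictionary access per tick, whereas the dynamic decision has to inspect the ready set every time it is made, which is the runtime overhead mentioned above.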
Continuing in our classification diagram, we can see two different approaches to both static and dynamic scheduling: preemptive and non-preemptive scheduling. In preemptive scheduling, some tasks can be interrupted (preempted) by other, more urgent tasks. In non-preemptive scheduling, tasks can decide when to release resources (usually upon completion) to the other tasks. It makes sense to schedule tasks in a non-preemptive way if many short tasks are to be executed.
Finding a feasible schedule is a complex process that leads to a higher efficiency of the entire distributed system. There are certain task characteristics that can be helpful in this search. One quantity that is particularly useful for the purpose of schedulability is the task request time, the time at which a request is made for task execution. Based on the task request time, we can distinguish two types of tasks: periodic and sporadic. After a periodic task is requested for the first time, all future request times are known by adding multiples of a known period to the initial request time. For sporadic tasks, the request times are not known prior to their activation. In this case, the schedulability criterion is the existence of a minimum interval between any two request times of a sporadic task. Finally, if there is no such request time constraint, we call the task an aperiodic task.
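The two task types can be illustrated with a short sketch. The function names and the numbers are assumptions chosen for this example; the first function derives the request times of a periodic task, and the second checks whether an observed trace of requests respects the minimum interval required of a sporadic task.

```python
def periodic_request_times(first_request: int, period: int, count: int) -> list[int]:
    """Request times of a periodic task: first request plus multiples of the period."""
    if period <= 0:
        raise ValueError("period must be positive")
    return [first_request + k * period for k in range(count)]


def respects_min_interarrival(request_times: list[int], min_interval: int) -> bool:
    """Check the sporadic-task constraint on an observed trace of requests."""
    ordered = sorted(request_times)
    return all(later - earlier >= min_interval
               for earlier, later in zip(ordered, ordered[1:]))


print(periodic_request_times(5, 20, 4))           # [5, 25, 45, 65]
print(respects_min_interarrival([0, 12, 30], 10))  # True
print(respects_min_interarrival([0, 7, 30], 10))   # False
```

A trace that fails the second check behaves like an aperiodic task from the scheduler's point of view, because no minimum interval can be assumed for it.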
Considering the options available, what is the right approach for real-time scheduling for AD systems? We already pointed out that AD is critically dependent on hard real-time behavior. Looking “inside the box” of an AD system, we would see many concurrent tasks sharing common resources and exchanging data. In real-world scenarios, finding a suitable schedule involves the consideration of many dependencies, such as the precedence of higher over lower priority tasks and mutual exclusion constraints between tasks. Faced with these challenges, dynamic scheduling techniques can hardly guarantee the tight deadlines available for such tasks, especially taking into account the need to manage the communication between the nodes of an E/E architecture. In other words, the dynamic models are not expressive enough to capture all those real-life requirements efficiently, so it becomes nearly impossible to provide guarantees once the system goes beyond the (extremely oversimplified) assumptions of the dynamic models. Thus, to achieve the level of predictability that actually ensures safety for the autonomous driving function, we need static pre-runtime scheduling instead.
A static schedule is a schedule determined during design-time based on the totality of events and execution activities, applied to systems of different levels of complexity, including distributed real-time systems. It handles both the preplanned resource usage and the preplanned access to a communication medium in the distributed system. Although the computer system cannot control external interruptions, the points in time when these incoming events will be serviced can be defined a priori, based on the assumptions for each class of events. We can see static scheduling as a search for a feasible schedule. The goal of this search is to find a complete schedule which considers all precedence relations and mutual exclusion constraints between tasks and ensures that all tasks finish before their deadlines. To improve and optimize the results, a heuristic function can be applied to the search. Such a search optimization strategy has been proposed by the award-winning paper Mapping and Scheduling Automotive Applications on ADAS (Advanced driver-assistance systems) Platforms using Metaheuristics.
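To make the idea of scheduling as a search more concrete, here is a minimal sketch of a depth-first search with backtracking. It is not the metaheuristic from the cited paper. It assumes a single processor and non-preemptive execution (so mutual exclusion holds trivially), and the task set, the time values and the "most urgent deadline first" ordering heuristic are all hypothetical.

```python
from typing import Optional

# name -> (wcet, deadline); hypothetical task set in milliseconds
TASKS = {"camera": (2, 4), "fusion": (3, 8), "plan": (2, 12), "log": (1, 12)}
# (before, after) precedence relations
PRECEDENCE = [("camera", "fusion"), ("fusion", "plan")]


def find_schedule(tasks, precedence) -> Optional[list[tuple[str, int]]]:
    """Depth-first search for start times that meet all deadlines.

    Returns a list of (task, start_time), or None if no feasible order exists.
    """
    predecessors = {name: set() for name in tasks}
    for before, after in precedence:
        predecessors[after].add(before)

    def search(done: list[tuple[str, int]], now: int):
        if len(done) == len(tasks):
            return done
        finished = {name for name, _ in done}
        candidates = [n for n in tasks
                      if n not in finished and predecessors[n] <= finished]
        # Heuristic: try the most urgent candidate first.
        candidates.sort(key=lambda n: tasks[n][1])
        for name in candidates:
            wcet, deadline = tasks[name]
            if now + wcet > deadline:
                continue  # this choice would miss its deadline
            result = search(done + [(name, now)], now + wcet)
            if result is not None:
                return result
        return None  # no candidate worked: backtrack

    return search([], 0)


print(find_schedule(TASKS, PRECEDENCE))
# [('camera', 0), ('fusion', 2), ('plan', 5), ('log', 7)]
```

The returned list plays the role of the dispatching table. A result of None corresponds to the case in which no feasible schedule exists for the given constraints.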
Although static scheduling serves as a method for achieving a high level of system predictability, it can only handle cases where the dependencies among tasks are already known at design time. To address variable real-life use cases, certain methods are introduced to increase the flexibility of static scheduling.
One of the efficiency measures of this kind is mode switching. ADAS applications operate in different modes of operation, such as the parking mode or the highway mode. The same services and functions are not required for these two modes. If the tasks are scheduled separately, depending on the active mode, better resource utilization can be achieved. Resources are then allocated when and where they are needed. System designers must identify all operational and emergency modes and calculate static schedules for each mode offline. When mode switching is requested at runtime, the mode switching schedule must be activated accordingly.
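A minimal sketch of mode switching between precomputed tables follows. The mode names, task names and cycle length are hypothetical, and the sketch makes a simplifying assumption that is not taken from the text: a requested switch takes effect only at the next cycle boundary, without a dedicated transition schedule.

```python
# One table per mode, computed offline: start tick -> task name (hypothetical).
TABLES = {
    "highway": {0: "lane_keeping", 5: "adaptive_cruise"},
    "parking": {0: "ultrasonic_scan", 5: "path_planner"},
}


class ModeSwitchingDispatcher:
    """Switches between schedule tables that were computed offline."""

    def __init__(self, tables: dict[str, dict[int, str]],
                 cycle_length: int, initial_mode: str):
        self._tables = tables
        self._cycle_length = cycle_length
        self._active = initial_mode
        self._pending = None

    def request_mode(self, mode: str) -> None:
        if mode not in self._tables:
            raise KeyError(f"no precomputed schedule for mode {mode!r}")
        self._pending = mode

    def dispatch(self, tick: int):
        # Assumption: a switch takes effect only at a cycle boundary,
        # so a running cycle is never cut short.
        if self._pending is not None and tick % self._cycle_length == 0:
            self._active, self._pending = self._pending, None
        return self._tables[self._active].get(tick % self._cycle_length)


dispatcher = ModeSwitchingDispatcher(TABLES, cycle_length=10, initial_mode="highway")
print(dispatcher.dispatch(0))    # lane_keeping
dispatcher.request_mode("parking")
print(dispatcher.dispatch(5))    # adaptive_cruise (old cycle still completes)
print(dispatcher.dispatch(10))   # ultrasonic_scan (new mode is active)
```

Because every table is computed before runtime, the switch itself is only a change of the active table, so the predictability of the static approach is preserved.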
These are the ingredients static real-time scheduling brings to the table to ensure safe AD system behavior along with the optimal usage of computational resources under the wide range of possible scenarios and their individual requirements. They must ensure the correctness and responsiveness of the system.
Safe real-time scheduling in practice with MotionWise
The already mentioned ACES developments are pushing the automotive industry towards software-defined and centralized E/E architectures. With this trend, vehicle operating systems are expected to evolve in the direction of supporting all vehicle functions on a few central compute hubs, while at the same time ensuring the availability and safety of the overall system. The need for testing and validation is increasing, as is the need to simulate real-world scenarios for accurate and trustworthy system behavior. This consolidation of domains, along with different applications of mixed criticality integrated into one system, requires a system design that supports the guaranteed prioritization of critical software functions and their freedom from possible interference. Without these capabilities, the automotive industry cannot ensure the responsiveness of AD systems and, consequently, cannot guarantee the safety of this service.
Enter MotionWise, the software safety platform that addresses these problems with a combination of runtime services and design-time tools, providing end-to-end guarantees for software behavior. By defining and enforcing predefined execution boundaries for each application and implementing pre-runtime static scheduling, MotionWise orchestrates mechanisms provided by lower-level operating systems to ensure the real-time behavior of the overall system. With its tooling, MotionWise enables the development of car software along the entire development lifecycle, including design, integration, testing and validation, in a coordinated and seamless manner.
Let’s have a look under the hood to understand the technologies at work here: MotionWise provides an execution manager that deterministically orchestrates the schedules of multiple applications across all hosts in a system. All communication between hosts is implemented over a deterministic network and orchestrated by the MotionWise Communication Manager Stack. The global scheduling concept ensures that all the task schedules of the hosts are aligned with the network schedule of the backbone communication network connecting these hosts. Because global scheduling requires the same concept of time on all hosts, it must be supported by dedicated MotionWise features for reliable time synchronization. Moreover, all this is abstracted from the user’s perspective by the MotionWise Planning Tools and the Execution In-Vehicle Software Stack.
The complexity of modern distributed systems, combined with the extremely complicated and unpredictable real-life driving conditions, creates a challenging starting point for finding a proper schedule that can meet the safety requirements of highly automated, software-defined vehicles. Thus, creating a valid schedule requires a global view: system-wide planning and global time consideration. This often implies architectural work on a multi-host, sometimes even a multi-ECU, environment. The global scheduler creates a schedule for the entire heterogeneous, multi-ECU, multi-SoC, multi-core system. This is achieved by the MotionWise Creator, which generates an efficient schedule based on the constraints defined by the user. The tool also reports when a feasible schedule cannot be found.
MotionWise Global Scheduler Capabilities
As already mentioned, real-life situations often introduce a highly complex task flow in which numerous interdependent, mutually exclusive tasks need to be properly managed. To make sure that all those tasks are well organized and scheduled based on their interdependencies and latency requirements, MotionWise groups these tasks into so-called computational chains (CC). Moreover, these tasks can be scheduled on any host and can have different periods. At runtime, the deterministic execution of a CC is ideally ensured by the WCET of its tasks and message transfers. If the WCET cannot be accurately estimated, there are no guarantees that a particular task will not overrun its scheduled time. In practice, runtime budgets are estimated iteratively to minimize the probability of an overrun. In the rare case that an overrun still occurs, MotionWise detects and reports it and triggers an appropriate error reaction. Incoming tasks can be classified as event-driven, data-driven, or time-driven. For example, planning and control are time-driven, while multiple asynchronous sensor inputs are event-driven. An example of data-driven tasks is a perception layer, which consists of a sequence of processes with data dependencies between them.
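The relationship between budgets, chains and overruns can be sketched as follows. This is a generic illustration and not the MotionWise API: the Step type, the function names and all values are hypothetical, and the latency bound assumes that the steps of a chain run back to back, which ignores any waiting time between scheduled slots.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One task or message transfer in a chain (hypothetical model)."""
    name: str
    budget_us: int  # runtime budget derived from the WCET estimate


def chain_latency_bound(chain: list[Step]) -> int:
    """Upper bound on end-to-end latency if the steps run back to back."""
    return sum(step.budget_us for step in chain)


def find_overruns(chain: list[Step], measured_us: dict[str, int]) -> list[str]:
    """Names of steps whose measured runtime exceeded their budget."""
    return [step.name for step in chain
            if measured_us.get(step.name, 0) > step.budget_us]


chain = [Step("detect", 4000), Step("transfer", 500), Step("track", 2500)]
print(chain_latency_bound(chain))  # 7000
print(find_overruns(chain, {"detect": 3900, "transfer": 520, "track": 2400}))
# ['transfer']
```

In an iterative budgeting process, a reported overrun such as the one above would lead either to a larger budget for that step or to a closer look at why the runtime varied.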
As different applications run on different hosts based on their requirements and priorities, the ADAS system architect decides which policies to apply to which application in order to meet all the functional and safety requirements.
A time-triggered scheduling policy enables highly deterministic execution of the relevant time-driven tasks. These tasks are activated and executed during their respective fixed-length time slots and occur periodically at predetermined points in time. The scheduling tables are generated offline and deployed statically. This approach allows for a high degree of freedom from interference. However, tasks may not need their entire allocated time budget (especially those with highly varying runtimes), so such scenarios call for a more efficient use of resources. Since even safety-critical tasks can have variable runtimes, it is important to use resources efficiently on each host. Furthermore, in the overall processing activity of our hard real-time system, not only periodic but also sporadic tasks need to be considered. This seems to be the weak spot of static scheduling.
Clearly, the modern automotive industry requires flexible scheduling mechanisms to cover the variety of possible use cases. In response to real-world driving conditions, MotionWise scheduling capabilities combine flexibility with safety. Our time-aware solution architecture enables the execution of diverse applications by allowing both event-driven and data-driven tasks to be completed while still providing time guarantees. For example, periodic sensor inputs (camera image frames) are inherently asynchronous and therefore must be handled in an event-driven manner. The MotionWise Scheduling Service takes care of both time-driven and event-driven tasks by integrating event-driven tasks (sensor inputs) into the time-aware architecture. Unlike the predefined time slots for time-driven tasks, the time slots for events are calculated based on the guarantees that must be met for the events.
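A time-triggered table of this kind can be sketched by expanding the periodic slots over one hyperperiod, that is, the least common multiple of all periods. The function name, the task parameters and the tick unit are assumptions for illustration, and the sketch assumes that no slot wraps around the end of the hyperperiod.

```python
from math import lcm


def build_time_triggered_table(
        tasks: dict[str, tuple[int, int, int]]) -> list[tuple[int, int, str]]:
    """Expand periodic slots over one hyperperiod.

    tasks maps a name to (period, offset, slot_length), all in ticks.
    Returns sorted (start, end, name) slots; raises if two slots overlap.
    """
    hyperperiod = lcm(*(period for period, _, _ in tasks.values()))
    slots = []
    for name, (period, offset, length) in tasks.items():
        for start in range(offset, hyperperiod, period):
            slots.append((start, start + length, name))
    slots.sort()
    for (_, end, first), (start, _, second) in zip(slots, slots[1:]):
        if start < end:
            raise ValueError(f"slots of {first} and {second} overlap")
    return slots


table = build_time_triggered_table({"control": (5, 0, 2), "planning": (10, 2, 3)})
print(table)  # [(0, 2, 'control'), (2, 5, 'planning'), (5, 7, 'control')]

# Ticks of the 10-tick hyperperiod that are not reserved by any slot.
idle_ticks = 10 - sum(end - start for start, end, _ in table)
print(idle_ticks)  # 3
```

Each slot is reserved for its full length whether or not the task needs it, which illustrates the trade-off described above: strong freedom from interference at the price of unused time when runtimes vary.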
Event-driven tasks in our time-aware architecture
MotionWise also allows for data-driven scheduling, in which a user can define a group of tasks that are data-driven. A use case for the data-driven approach is the dynamic sequential execution of tasks in a perception layer, where a data-driven task can start as soon as its preconditions are met (input data is ready). Although the execution of individual tasks is not essentially conditioned by the completion of previous tasks, it is crucial that the end-to-end latency does not exceed the upper bound.
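The data-driven idea can be sketched as a small dependency graph in which each task starts as soon as all of its inputs are available. The task names, runtimes and the latency bound are hypothetical. The sketch also assumes an acyclic graph and that every task has its own core, so only data dependencies delay a start.

```python
def data_driven_finish_times(runtimes: dict[str, int],
                             inputs: dict[str, list[str]],
                             source_ready_at: int = 0) -> dict[str, int]:
    """Finish time of each task when it starts as soon as its inputs exist.

    runtimes: task -> runtime; inputs: task -> list of producer tasks.
    Assumes an acyclic graph and one core per task.
    """
    finish: dict[str, int] = {}

    def finish_of(task: str) -> int:
        if task not in finish:
            # A task without producers starts when the source data is ready.
            start = max((finish_of(p) for p in inputs.get(task, [])),
                        default=source_ready_at)
            finish[task] = start + runtimes[task]
        return finish[task]

    for task in runtimes:
        finish_of(task)
    return finish


runtimes = {"preprocess": 3, "detect": 6, "segment": 4, "fuse": 2}
inputs = {"detect": ["preprocess"], "segment": ["preprocess"],
          "fuse": ["detect", "segment"]}
finish = data_driven_finish_times(runtimes, inputs)
print(finish["fuse"])  # 11

LATENCY_BOUND = 15  # hypothetical end-to-end requirement
print(max(finish.values()) <= LATENCY_BOUND)  # True
```

Here "segment" finishes at tick 7 and "detect" at tick 9, so "fuse" starts at tick 9 instead of waiting for a fixed slot. The end-to-end check in the last line is what must still hold in the worst case.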
Optimizing resources with data-driven scheduling
Being flexible and accurate at the same time, MotionWise offers different mechanisms that support this ultimate safety goal. We have already mentioned mode switching as a very useful method for increasing the efficiency of static scheduling. In order to allocate resources efficiently, it is important to have a separate schedule for each operating mode of a system. In other words, a single scheduling table leads to non-optimal resource usage and to a longer schedule generation time. The MotionWise Scheduling Service supports switching between multiple pre-configured scheduling tables at runtime for mutually exclusive applications such as highway pilot and parking assistance.
As has been argued in this article, hard real-time scheduling is critical to ADAS and the overall AD evolution. The MotionWise platform provides all the needed capabilities today to plan, build and run safe real-time E/E architectures. As ACES mobility becomes an incremental reality in our world, technological advancements must be made while ensuring that human lives are never put at risk. At TTTech Auto, we understand the implications of this journey and work passionately every day to deliver unconditional solutions for safety, the requirement for truly prosperous modern mobility.
About the Author
Marija Sokcevic is a Technical Content Manager at TTTech Auto. She holds a master’s degree in Physics from the University of Zagreb. One of her greatest passions is to link creativity and technology by expressing the most recent technological findings through different media.