Design Considerations of a Distributed Task Scheduler
Queueing
A distributed queue is a fundamental building block of a scheduler. The simplest approach is first come, first served (FCFS): the scheduler dequeues tasks in arrival order and assigns each one to the next available node. However, when every node is busy with a long-running task, short tasks queued behind them must wait until a node frees up.
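The effect can be seen in a small simulation. The sketch below is illustrative only: the function name fcfs_wait_times, the task names, and the durations are all hypothetical, and every task is assumed to arrive at time 0.

```python
import heapq
from collections import deque


def fcfs_wait_times(tasks, num_nodes):
    """Return {task_name: wait_time} for tasks dispatched in arrival order.

    tasks: list of (name, duration) tuples, all assumed to arrive at time 0.
    """
    queue = deque(tasks)
    # Min-heap of the times at which each node next becomes free.
    free_at = [0] * num_nodes
    heapq.heapify(free_at)
    waits = {}
    while queue:
        name, duration = queue.popleft()   # strict arrival order
        start = heapq.heappop(free_at)     # earliest available node
        waits[name] = start                # waited from time 0 until start
        heapq.heappush(free_at, start + duration)
    return waits


# Hypothetical workload: two long tasks arrive just before a short one.
waits = fcfs_wait_times(
    [("backup", 600), ("reindex", 600), ("security_alert", 1)],
    num_nodes=2,
)
print(waits)
```

With two nodes, both long tasks start immediately and the one-second task does not start until a node frees up at time 600.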
This head-of-line blocking degrades system reliability and availability, so a pure FCFS policy cannot guarantee low-latency handling of urgent tasks such as security notifications. Instead, tasks are classified into priority tiers:
Urgent: Tasks that cannot be delayed.
Delayable: Tasks that can wait for resources.
Periodic: Tasks executed on a schedule (e.g., every hour).
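A tiered queue of this kind could be sketched with a binary heap keyed on (tier, arrival sequence), so that tasks within a tier still run in FCFS order. This is a minimal single-process illustration, not a distributed implementation: the names Tier and TieredQueue are hypothetical, and ranking Periodic below Delayable is an assumption made only to give the example a total order.

```python
import heapq
import itertools
from enum import IntEnum


class Tier(IntEnum):
    URGENT = 0
    DELAYABLE = 1
    PERIODIC = 2  # assumption: the relative rank of this tier is not given


class TieredQueue:
    """Dequeues the lowest-numbered tier first, FCFS within a tier."""

    def __init__(self):
        self._heap = []
        self._seq = itertools.count()  # tie-breaker that preserves arrival order

    def enqueue(self, name, tier):
        heapq.heappush(self._heap, (tier, next(self._seq), name))

    def dequeue(self):
        tier, _, name = heapq.heappop(self._heap)
        return name, tier

    def __len__(self):
        return len(self._heap)


q = TieredQueue()
q.enqueue("backup", Tier.DELAYABLE)
q.enqueue("hourly_report", Tier.PERIODIC)
q.enqueue("security_alert", Tier.URGENT)
q.enqueue("cleanup", Tier.DELAYABLE)

order = [q.dequeue()[0] for _ in range(len(q))]
print(order)
```

The urgent task is dispatched first even though it arrived third, while the two delayable tasks keep their arrival order. Strict tier ordering on its own lets a steady stream of urgent tasks hold back lower tiers indefinitely.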
To prevent starvation, the system monitors non-urgent queues. If a task approaches its delay limit, the ...