Vishesh Goyal, Dr. Pavithra N.

Every large-scale computing cluster, from cloud platforms to supercomputers, runs on a job scheduler. Most of those schedulers still rely on decades-old algorithms with no view of what is happening in the system right now. PRIORIS is an IEEE-published adaptive job scheduling framework for High Performance Computing (HPC) environments. It replaces static scheduling and failure prediction with real-time resource awareness, dependency-driven job promotion, and starvation prevention. Evaluated on 5,000 synthetic jobs, it reduced makespan by 24.7% and average wait time by 31.5% compared with the standard First-Come-First-Served (FCFS) baseline.
The problem with most HPC job schedulers isn't that they're bad; it's that they're static in a dynamic world. First-Come-First-Served doesn't know that a high-priority job is waiting behind a resource hog. Shortest Job First can't handle job dependencies. Failure prediction models require historical training data and break down in new environments. None of them adapt to what's actually happening in the system right now.
PRIORIS takes a different approach entirely. Instead of predicting failures, it avoids them by checking real-time resource availability before every job dispatch and dynamically reordering the queue based on current system state.
The core of the algorithm is a Calculated Priority Metric that combines base priority, estimated runtime, and resource cost into a single scheduling score. Short, lightweight jobs get promoted; heavy resource consumers are penalised accordingly. This isn't round-robin or FCFS: every position in the queue is earned.
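The post does not spell out how the three inputs are combined, so here is a minimal Python sketch that assumes a simple multiplicative form: the score rises with base priority and falls as estimated runtime and resource cost grow. The `Job` fields, the hours unit, the [0, 1] resource-cost scale, and the `calculated_priority` name are all hypothetical; this is not the paper's formula.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    base_priority: float   # higher means more important
    est_runtime_h: float   # estimated runtime in hours (hypothetical unit)
    resource_cost: float   # hypothetical normalised resource demand in [0, 1]


def calculated_priority(job: Job) -> float:
    """Hypothetical Calculated Priority Metric (not the paper's exact formula).

    The score rises with base priority and falls as estimated runtime and
    resource cost grow, so short, lightweight jobs move up the queue.
    """
    return job.base_priority / ((1.0 + job.est_runtime_h) * (1.0 + job.resource_cost))


jobs = [
    Job("render", base_priority=5, est_runtime_h=4.0, resource_cost=0.9),
    Job("preprocess", base_priority=5, est_runtime_h=0.5, resource_cost=0.2),
    Job("backup", base_priority=2, est_runtime_h=1.0, resource_cost=0.1),
]

# Highest score first: this ordering replaces plain arrival order.
queue = sorted(jobs, key=calculated_priority, reverse=True)
print([j.name for j in queue])  # ['preprocess', 'backup', 'render']
```

With equal base priority, `preprocess` outranks `render` because it is shorter and lighter, and even the low-priority `backup` beats the heavy `render` job. A weighted sum with tunable coefficients would be an equally plausible design; the multiplicative form is used here only because it needs no tuning weights.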
Four mechanisms make PRIORIS distinct from prior work:
— Dependency-Driven Promotion: If Job A depends on Job B, Job B is automatically promoted by a distance proportional to Job A's runtime. This prevents high-priority jobs from stalling on unresolved dependencies, a problem that SLURM and PBS Pro don't handle without external configuration.
— Dynamic Resource Checking: Before any job executes, the scheduler verifies that enough CPU, memory, disk I/O, and network bandwidth are available at that moment. Jobs that can't run yet don't block the queue; they're pushed down and reconsidered later.
— Waiting Queue with Anti-Starvation: Jobs pushed down more than 3 times enter a dedicated waiting queue that is prioritised over the main queue every 4th scheduling cycle. No job waits forever.
— Limited Parallelism: Up to 2 jobs with adjacent priority scores may execute simultaneously, maximising utilisation without oversubscription.
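Three of these mechanisms interact inside each scheduling cycle. The following is a minimal Python sketch under stated assumptions, not the paper's implementation. The thresholds (more than 3 push-downs, every 4th cycle) come from the list above. `PROMOTION_FACTOR`, counting a push-down each time a later job is dispatched ahead of a queued job, serving the waiting queue in other cycles only when nothing in the main queue fits, and all function names are hypothetical. Resources are a plain dictionary rather than live measurements, promotion only reorders the queue (nothing stops a dependent from running before its prerequisite), and limited parallelism is left out to keep the sketch short.

```python
import math
from dataclasses import dataclass, field
from typing import Optional

# Thresholds stated in the post.
PUSH_DOWN_LIMIT = 3       # more than 3 push-downs sends a job to the waiting queue
WAITING_QUEUE_PERIOD = 4  # the waiting queue goes first every 4th cycle
# Hypothetical: queue positions a prerequisite gains per unit of its dependent's runtime.
PROMOTION_FACTOR = 0.5


@dataclass(eq=False)  # identity equality, so list.remove() targets this exact job
class Job:
    name: str
    runtime: float                # estimated runtime, arbitrary units
    demand: dict[str, float]      # e.g. {"cpu": 4, "mem_gb": 16}
    depends_on: list[str] = field(default_factory=list)
    push_downs: int = 0


def fits(job: Job, available: dict[str, float]) -> bool:
    """Resource check: every requested resource must be free right now."""
    return all(available.get(r, 0.0) >= amount for r, amount in job.demand.items())


def promote_dependencies(queue: list[Job]) -> None:
    """Move each prerequisite up by a distance proportional to its dependent's runtime."""
    for dependent in list(queue):
        for dep_name in dependent.depends_on:
            idx = next((i for i, j in enumerate(queue) if j.name == dep_name), None)
            if idx is None:  # prerequisite not queued (already dispatched or unknown)
                continue
            shift = math.ceil(PROMOTION_FACTOR * dependent.runtime)
            queue.insert(max(0, idx - shift), queue.pop(idx))


def take_from_main(main: list[Job], waiting: list[Job],
                   available: dict[str, float]) -> Optional[Job]:
    """Dispatch the first job that fits; each job it overtakes is pushed down once."""
    idx = next((i for i, j in enumerate(main) if fits(j, available)), None)
    if idx is None:
        return None
    chosen = main.pop(idx)
    for job in main[:idx]:  # the jobs that were ahead of `chosen`
        job.push_downs += 1
        if job.push_downs > PUSH_DOWN_LIMIT:  # anti-starvation hand-off
            main.remove(job)
            waiting.append(job)
    return chosen


def schedule_cycle(cycle: int, main: list[Job], waiting: list[Job],
                   available: dict[str, float]) -> Optional[Job]:
    """Run one scheduling cycle (numbered from 1) and return the dispatched job, if any."""
    # Every 4th cycle the head of the waiting queue is tried before the main queue.
    if cycle % WAITING_QUEUE_PERIOD == 0 and waiting and fits(waiting[0], available):
        return waiting.pop(0)
    job = take_from_main(main, waiting, available)
    # Assumption: in other cycles the waiting queue is served only if nothing in main fits.
    if job is None and waiting and fits(waiting[0], available):
        job = waiting.pop(0)
    return job


# Dependency promotion, applied once when the queue is built (an assumption).
q = [Job("x", 1, {}), Job("y", 1, {}), Job("z", 1, {}), Job("prep", 1, {}),
     Job("train", 4, {}, depends_on=["prep"])]
promote_dependencies(q)
print([j.name for j in q])  # ['x', 'prep', 'y', 'z', 'train']

# Starvation scenario. Simplification: each job finishes within its cycle,
# so capacity is not decremented after a dispatch.
main = [Job("big", 5, {"cpu": 16})] + [Job(f"s{i}", 1, {"cpu": 2}) for i in range(1, 9)]
waiting: list[Job] = []
order = []
for cycle in range(1, 10):
    free = {"cpu": 8} if cycle <= 4 else {"cpu": 16}  # more capacity from cycle 5
    job = schedule_cycle(cycle, main, waiting, free)
    order.append(job.name if job else None)
print(order)  # ['s1', 's2', 's3', 's4', 's5', 's6', 's7', 'big', 's8']
```

Here `big` is overtaken four times, lands in the waiting queue during cycle 4, and is dispatched in cycle 8, the next waiting-queue-first cycle, ahead of the still-queued `s8`. A fuller version would also subtract each running job's demand from the free capacity and let a second, adjacently ranked job run in parallel.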
PRIORIS outperformed every baseline it was compared against, including a failure prediction model that requires historical training data, without needing any prior data itself. The 500 dependency violations that remain in the evaluation are a known limitation acknowledged in the paper and the primary target for the next iteration.
The key philosophical shift: most schedulers ask "will this job fail?", while PRIORIS asks "do we have the resources to run this job right now?" Moving from prediction to prevention makes the scheduler simpler, more interpretable, and more robust.
Published at the 2025 IEEE International Conference on Automatic Control and Intelligent Systems (I2CACIS), Kuala Lumpur. DOI: 10.1109/I2CACIS65476.2025.11101086