Portland State University. Department of Computer Science
Term of Graduation
Date of Publication
Doctor of Philosophy (Ph.D.) in Computer Science
Parallel processing (Electronic computers)
1 online resource (2, ix, 140 pages)
System administrators for parallel computers face many difficulties when managing job scheduling systems. First, current production job schedulers use many parameters, which seem flexible but it is highly challenging to configure and tune these parameters. Second, fair share is an important scheduling goal, but it is not clear what kind of fair share can be expected under current schedulers and how fair share impacts scheduling performance. Third, several job runtime prediction methods were proposed to improve inaccurate user-estimated runtimes, but these methods could under-estimate runtimes by a large amount and it is not clear whether they are practical for use on real systems. To address these issues, we study existing scheduling policies and design new policies. We evaluate policy performance by event-driven simulation, using real job traces.
To simplify the system administration task, we propose a new scheduling framework, which allows the system administrators to specify only high-level objectives, while the scheduler automatically decides the schedules according to the given objectives and adapts to workload changes. We investigate several design and implementation choices of the goal-oriented policies. We show that by optimizing performance for objectives, goal-oriented policies have the potential to considerably improve the performance.
To provide a better understanding of fair share policies supported by current production schedulers and their impact on scheduling performance, we evaluate two classes of fair share policies using a wide range of performance measures and several fair share measures proposed in this thesis. Our evaluation results show that fair share indeed reduces heavy-demand users from dominating system resources. However, our detailed per-user performance results show that some types of users may suffer unfairness under fair share, possibly due to priority mechanisms used by the current schedulers.
As for runtime predictions, we find that using previous methods results in poor performance and unfairness problems, because of under-estimated runtimes induced by predictions. To reduce the problems, we investigate several alternative methods, including inflated each initial prediction by half of the requested runtime and two-class runtime estimates. We find that these alternative methods can outperform previous methods in most cases.
In Copyright. URI: http://rightsstatements.org/vocab/InC/1.0/
Item in any way that is permitted by the copyright and related rights legislation that applies to your use. For other uses you need to obtain permission from the rights-holder(s).
Vasupongayya, Sangsuree, "Goal-Oriented Job Scheduling for Parallel Computer Systems" (2007). Dissertations and Theses. Paper 6068.