Java Runtime System for Parallel Execution of Multithreaded Java Application
Funded by the NSF



In this project we will design a Java framework for high performance computing (HPC) on PC clusters (Beowulf clusters). We will be using a new lottery-based job-stealing algorithm for efficient scheduling of large-scale multithreaded computations on PC clusters. In the proposed algorithm, idle processors actively search out new jobs rather than wait for jobs to be assigned. In the lottery game, each processor is equipped with a set of tickets and the number of tickets is proportional to the age of the oldest thread in the ready pool of the processor. A winning ticket is drawn at random and the processor with the winning ticket becomes the victim from which the idle processor steals jobs. The proposed selection procedure serves two purpose. First, we try to lower communication costs by stealing a large amount of jobs by a lottery-based job-stealing algorithm, with the logic behind it being that old-aged computations are likely to spawn more jobs than relatively young computations. Second, we would like to bias the search to obtain favorable results while at the same time avoiding system bottlenecks. Our approach will be implemented and tested on the Beowulf architecture running under Windows/NT OS.

Another objective of this research is to establish whether the proposed lottery-based job-stealing algorithm is efficient across the board, or whether more efficient job-scheduling algorithms exist for scheduling across a small number of processors, as it is often the case with the Beowulf cluster. This will be achieved by implementing both the lottery-based job-stealing algorithm and a centralized job-scheduling algorithm (with and without load balancing). A centralized algorithm is chosen because it exhibits far less communication overhead during scheduling than does the lottery-based job-stealing algorithm. The centralized algorithm directs participating processors to a single controller processor to collect jobs. The lottery-based job-stealing algorithm requires participating processors to query each other to solicit jobs. This practice can often result in inefficiency where a processor wanting jobs contacts another processor without available jobs, requiring the process to be repeated until a job is located.

Finally, as part of the design and implementation of the Java runtime system for HPC we will explore different client-server design alternatives and concurrent design patterns. The performance and correctness of the server programs is of crucial importance to the success of many network applications. We set out to explore and experimentally contrast and compare different client-server design alternatives implemented in the Java programming language. Today, many new technologies, like the Java Virtual Machine (JVM), have matured. In JVM, for example, the time required to spawn a new thread or to obtain an object’s lock in most implementations is negligible. The use of multiprocessors is a norm. Mapping of threads to processors has been optimized tremendously. As a result, designs that have been impossible until recently have become viable propositions.