First Advisor

Michael A. Driscoll

Date of Publication


Document Type


Degree Name

Master of Science (M.S.) in Electrical and Computer Engineering


Electrical and Computer Engineering


Parallel processing (Electronic computers), Multiprocessors, Sorting (Electronic computers)



Physical Description

1 online resource (2, v, 63 p.)


The effective use of computational resources requires a good understanding of parallel architectures and algorithms. The effect of the parallel architecture and also the parallel application on the performance of the parallel systems becomes more complex with increasing numbers of processors. We will address this issue in this thesis, and develop a methodology to predict the overall execution time of a parallel application as a function of the system and problem size by combining simple analysis with a few experimental results. We show that runtimes and speedup can be predicted more accurately by analyzing the functional forms of the sequential and parallel times of critical code segments of a parallel application that affect the speedup of a parallel program. We then combine the functional forms to model the runtime of a parallel application. A small set of experiments are sufficient to get a good estimate of the coefficients for the runtime models obtained. Speedup can then be derived for any case from the runtime model. We also analyze the effect of the 1/0 on runtimes in memory bounded parallel systems, and how speedup is affected by communication and 1/0. Throughout the thesis we use the bitonic merge sort as a typical realistic parallel application to illustrate our methodology. Several variations of the sorting algorithm (such as problem size greater than or equal to the system size, unlimited or limited buffer size) suitable for a wide range of problem sizes are implemented in two parallel environments and the speedups for them are measured and compared with different speedup predictions. We have conducted numerous experiments using Multi and PVM to empirically study speedup for the different realistic implementations of the bitonic merge sort. The results show how well the various models predicted speedup, and that our methodology can predict speedup accurately for a given parallel application. One interesting value from a speedup curve is the roll-off point - the system size beyond which speedup actually decreases when the number of processors is increased. Results show that simple theoretic models predicted roll-off point to be higher than the actual values, where as our methodology predicted it to be less than and closer to the actual values. The predictions by our methodology also compare well with the speedup estimates provided by the Multi tool.


If you are the rightful copyright holder of this dissertation or thesis and wish to have it removed from the Open Access Collection, please submit a request to and include clear identification of the work, preferably with URL

Persistent Identifier