Exploring the Performance Limits of Simultaneous Multithreading for Scientific Codes
Abstract
Simultaneous multithreading (SMT) has been proposed to improve system throughput by overlapping instructions from multiple threads on a single wide-issue processor. The speedup of a single application that is parallelized into multiple threads is often sensitive to its inherent instruction-level parallelism (ILP), as well as to the efficiency of the synchronization and communication mechanisms between its separate, but possibly dependent, threads. In this paper, we evaluate and contrast software prefetching and thread-level parallelism (TLP) techniques for a series of scientific codes executed on an SMT processor. We explore the performance limits by evaluating the tradeoffs between ILP and TLP for various kinds of instruction streams. By learning how such streams interact when executed simultaneously on the processor, and by quantifying their presence within each application's threads, we try to interpret the observed performance of each application when parallelized according to the aforementioned techniques. To strengthen this evaluation, we also present results gathered from the processor's performance monitoring hardware.
Related Papers
- Converting thread-level parallelism to instruction-level parallelism via simultaneous multithreading (1997), 232 cited
- The MAJC architecture: a synthesis of parallelism and scalability (2000), 92 cited
- A feasibility study of hierarchical multithreading (2002), 1 cited
- Improved model based on speculative multithreading (2007)
- Quantifying the benefits of SPECint distant parallelism in simultaneous multithreading architectures (2003)