Cache performance in multithreaded processor architectures
Thesis posted on 2023-05-26, 17:51, authored by Neumeyer, Paul Grant
Multithreading techniques in computer processors aim to let the computer system tolerate long-latency operations and to convert variable software concurrency dynamically into the maximum parallelism in hardware. To meet this challenge, resources must be allocated to threads. The cache in the memory hierarchy poses a resource management problem in a multithreaded architecture because it is not an allocated resource: it does not distinguish between accesses from different threads and is thus transparent to the memory model. As a result, interference in the cache between coactive threads can lead to poor performance compared with the same threads executing without interference on similar hardware. Short-lived or intermittent threads can displace cache lines still in active use by other threads, which voids the benefit of the cache to those threads on their next access to a displaced line. Techniques that reduce this interference will increase the performance achieved from multithreaded hardware. We propose analytical models for the instantaneous miss rate of the cache that enable the interference between threads to be determined as a function of their dynamic interaction. Inputs to the models are the cache design parameters and the memory behaviour of each thread. Sharing, latency effects, cache design, and memory usage are investigated with the models. A type of miss that has not been described previously, the latency miss, is also investigated; it occurs between threads with shared memory in non-blocking multithreaded architectures. Finally, we propose thread pulling, a scheduling scheme aimed at reducing interference by increasing the locality of reference of concurrent threads.
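The displacement effect described above can be illustrated with a toy simulation. The following is a minimal sketch, not the thesis's analytical model: it assumes a direct-mapped cache indexed by block address modulo the number of lines, two hypothetical threads "A" and "B" that each loop over a small footprint, and a simple round-robin interleaving. All function names and parameters are illustrative assumptions.

```python
def simulate(trace, num_lines):
    """Replay (thread_id, block) accesses on a direct-mapped cache.

    Returns a dict mapping each thread id to its miss count.
    The cache is shared and does not track which thread owns a line.
    """
    cache = [None] * num_lines
    misses = {}
    for tid, block in trace:
        idx = block % num_lines          # direct-mapped placement (assumed)
        misses.setdefault(tid, 0)
        if cache[idx] != block:          # miss: fetch and displace old line
            misses[tid] += 1
            cache[idx] = block
    return misses


def looping_thread(tid, base, footprint, iterations):
    """Hypothetical thread that sweeps blocks base..base+footprint-1 repeatedly."""
    return [(tid, base + i) for _ in range(iterations) for i in range(footprint)]


def interleave(a, b, burst):
    """Alternate bursts of `burst` accesses from traces a and b."""
    out = []
    i = j = 0
    while i < len(a) or j < len(b):
        out.extend(a[i:i + burst])
        i += burst
        out.extend(b[j:j + burst])
        j += burst
    return out


NUM_LINES = 8
thread_a = looping_thread("A", base=0, footprint=6, iterations=10)
thread_b = looping_thread("B", base=8, footprint=6, iterations=10)  # maps onto A's lines

alone = simulate(thread_a, NUM_LINES)
shared = simulate(interleave(thread_a, thread_b, burst=6), NUM_LINES)
print("A alone: ", alone)
print("A with B:", shared)
```

In this deliberately adversarial setup, thread A incurs only its six compulsory misses when run alone, but misses on every access once thread B's blocks map onto the same lines. Real workloads fall between these extremes, which is the gap an interference model aims to quantify.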
New methods were developed that produce mathematically tractable measurements of memory behaviour for the analytical cache miss rate models. These methods minimise the dependence on arbitrary factors used while making the measurements. Using them, we develop a calculation of the working set of a thread at an instant that does not require an arbitrary choice of the working set parameter. Two cache miss models are proposed, validated, and used to analyse the performance of the cache under a range of multithreaded loads: first, a single-threaded analytical cache miss model at a point in time; second, a multithreaded analytical cache miss rate model using the working sets of the active threads at a point in time. Significant agreement was found between the predictions of the models and measurements of the real cache performance. Our analytical cache miss rate models give computer architects and compiler writers insight into performance-related optimisations.
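For context on the working set parameter mentioned above, the sketch below computes the classical windowed working set: the distinct blocks referenced in the last `tau` references. This is the textbook definition, shown only to illustrate why the choice of window matters; it is not the parameter-free calculation developed in the thesis, and the trace and names are made up for illustration.

```python
def working_set(trace, t, tau):
    """Distinct blocks in the window of `tau` references ending at index t (inclusive)."""
    start = max(0, t - tau + 1)
    return set(trace[start:t + 1])


# Hypothetical block-address trace for a single thread.
trace = [0, 1, 2, 0, 1, 2, 7, 8, 0, 1]

# The measured working set size at the same instant varies with the window.
sizes = [len(working_set(trace, t=9, tau=tau)) for tau in (2, 4, 10)]
print(sizes)
```

The same instant yields different working set sizes for different windows, which is the kind of arbitrary dependence the measurement methods aim to minimise.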
Rights statement: Copyright 1999 the author - The University is continuing to endeavour to trace the copyright owner(s) and in the meantime this item has been reproduced here in good faith. We would be pleased to hear from the copyright owner(s). Thesis (Ph.D.)--University of Tasmania, 2000. Includes bibliographical references.