As we reach the limits of single-core computing, we are promised more and more cores in our systems. Modern architectures include many performance counters per core, but few or no inter-core counters. In fact, performance counters were not originally designed to be exploited by users as they now are, but were used simply as aids for hardware debugging and testing during system creation. As such, they tend to be an "afterthought" in the design, with no standardization across or within platforms. Nonetheless, given access to these counters, researchers are using them to great advantage. At the same time, evaluating counters for multicore systems has become a complex and resource-consuming task. This dissertation explores a Performance Monitoring System consisting of a specialized CPU core designed to allow efficient collection and evaluation of performance data for both static and dynamic optimizations. A synthesizable hardware implementation is created and compared to modern-day processors. In addition, each component of the system and its ISA are thoroughly explored. This system provides a transparent mechanism to dynamically change how architectural features inform the operating system of process behavior, and to assist in profiling and debugging. For instance, a piece of hardware watching snoop packets can determine when a write-update cache coherence protocol would be helpful or detrimental to the currently running program. The Performance Monitoring System is designed to let the hardware feed performance statistics back to the software, allowing dynamic architectural adjustments at runtime. SPLASH2 benchmarks are evaluated for cache coherency policy and task scheduling. Using these two examples, this dissertation shows how the Performance Monitoring System is programmed to find performance improvements. A 16% average performance improvement was found for cache coherency and a 17% improvement was found for task scheduling.
A Dissertation Submitted to the Department of Computer Science in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy.
Bibliography Note
Includes bibliographical references.
Publisher
Florida State University
Identifier
FSU_migr_etd-1150
Use and Reproduction
This Item is protected by copyright and/or related rights. You are free to use this Item in any way that is permitted by the copyright and related rights legislation that applies to your use. For other uses you need to obtain permission from the rights-holder(s). The copyright in theses and dissertations completed at Florida State University is held by the students who author them.