Synthetics for Multicore Machines:


Modern workloads such as the SPEC CPU2006 benchmarks execute trillions of instructions and take weeks to months to run on cycle-accurate simulators. RTL-level simulation is slower still, which makes it almost impossible to use such workloads for pre-silicon design analysis. Computer architects also often lack access to target applications, because vendors hesitate to share proprietary code. We address these pre-silicon design challenges with miniaturized synthetic benchmark clones, built from multithreaded synthetic loops, for modern multithreaded workloads. Using this framework, vendors can generate miniaturized synthetic clones of their applications and share them with processor designers.

In this project, we first characterize the target benchmark suites using microarchitecture-independent metrics. Based on this characterization, we generate miniaturized synthetic benchmark clones for these workloads to accelerate simulation-based architecture research. We use multiple loops to model a synthetic benchmark and show that this methodology yields more accurate synthetic clones than previous approaches. We show the efficacy of the generated synthetic benchmarks by comparing their performance and power characteristics with those of their original counterparts. We also evaluate the effectiveness of the synthetic clones in assessing the change in performance under various design changes. For the single-threaded CPU2006 workloads, the average absolute error in IPC is around 3.66%, with an average correlation coefficient of 0.93 across the design configurations evaluated; the corresponding figure for power per cycle is around 0.98. For the embedded single-threaded ImplantBench suite, the average absolute error is 3.1% in IPC and 2.5% in power per cycle. Because we limit the length of each synthetic benchmark to 1 million instructions, we achieve a simulation speedup of roughly 1 million to 10 million times for various SPEC CPU2006 benchmarks.
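The accuracy figures above (average absolute error and correlation coefficient) and the speedup estimate can be illustrated with a minimal Python sketch. It is not the project's evaluation code. The IPC values, the instruction counts, and the function names mean_abs_pct_error and pearson are hypothetical, chosen only for illustration. The speedup line further assumes that simulation time scales linearly with the number of simulated instructions.

```python
from math import sqrt


def mean_abs_pct_error(original, synthetic):
    """Average of |synthetic - original| / original, expressed in percent."""
    errors = [abs(s - o) / o for o, s in zip(original, synthetic)]
    return 100.0 * sum(errors) / len(errors)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical IPC of one benchmark and its clone on four design configurations.
original_ipc = [1.00, 1.20, 1.50, 1.80]
synthetic_ipc = [1.05, 1.14, 1.56, 1.71]

# How far the clone's IPC is from the original, averaged over configurations.
ipc_error = mean_abs_pct_error(original_ipc, synthetic_ipc)

# How well the clone tracks the original's trend across configurations.
ipc_corr = pearson(original_ipc, synthetic_ipc)

# Assumed instruction counts: a trillion-instruction original versus a
# 1-million-instruction clone. The ratio approximates the simulation speedup
# only if simulation time is proportional to instruction count.
original_instructions = 10**12
synthetic_instructions = 10**6
speedup = original_instructions / synthetic_instructions

print(f"IPC error: {ipc_error:.2f}%  correlation: {ipc_corr:.3f}  speedup: {speedup:.0e}")
```

A low error says the clone matches the original at each design point, while a high correlation says the clone ranks design changes the same way the original does. The second property matters most when the clone is used to compare design alternatives.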

Ongoing work characterizes multithreaded workloads based on microarchitecture-independent characteristics in order to provide multithreaded clones. We use the full-system simulator Virtutech Simics as our simulation environment. To simulate multicore processors, shared caches, the interconnection network, and main memory, we use the Wisconsin GEMS and DRAMsim simulators.