Research Statement

Technology scaling has continuously increased integration density at the pace promised by Moore's Law; processor performance, however, has not scaled nearly as fast, limited by the power wall, memory bottlenecks, and programming paradigms. Harnessing the abundant on-chip resources and translating them into performance in an energy-efficient way is the fundamental challenge confronting computer architects.

Replicating multiple copies of the processor core on a chip seems to be a straightforward way to utilize the ever-increasing number of on-chip transistors. However, this homogeneous multicore design paradigm cannot provide an energy-efficient execution platform for diverse, heterogeneous workloads. As an example, imagine a workload that contains one program with high ILP and another with low ILP: a homogeneous multicore ends up either sacrificing the achievable performance of the high-ILP program or undermining the power efficiency of the low-ILP program. One attractive way out of this dilemma is hardware specialization.

A heterogeneous multicore is one form of hardware specialization. By integrating cores with different configurations on a single chip, it provides a hardware substrate that efficiently meets the diverse demands of the workloads. To unleash this potential for efficient computing, programs have to be scheduled onto the appropriate cores; we argue that program-inherent characteristics can be leveraged for this purpose.
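As a concrete illustration of characteristic-driven scheduling, the sketch below maps programs onto big (wide, out-of-order) or little (narrow, in-order) cores according to a profiled ILP estimate. The core types, threshold, and metric names are hypothetical assumptions for illustration, not the specific mechanism of this research.

```python
# Hypothetical sketch: schedule programs onto heterogeneous cores by their
# profiled instruction-level parallelism (ILP). Core types, the threshold,
# and the ILP metric are illustrative assumptions, not the actual proposal.

from dataclasses import dataclass

@dataclass
class Program:
    name: str
    ilp: float          # profiled average number of instructions ready per cycle

@dataclass
class Core:
    kind: str           # "big" (wide out-of-order) or "little" (narrow in-order)
    busy: bool = False

def schedule(programs, cores, ilp_threshold=2.0):
    """Greedy mapping: high-ILP programs go to big cores, low-ILP to little cores."""
    mapping = {}
    for prog in sorted(programs, key=lambda p: p.ilp, reverse=True):
        preferred = "big" if prog.ilp >= ilp_threshold else "little"
        # Prefer an idle core of the preferred kind; fall back to any idle core.
        candidates = ([c for c in cores if not c.busy and c.kind == preferred]
                      or [c for c in cores if not c.busy])
        if candidates:
            core = candidates[0]
            core.busy = True
            mapping[prog.name] = core.kind
    return mapping

if __name__ == "__main__":
    progs = [Program("fft", ilp=3.1), Program("pointer_chase", ilp=0.8)]
    cores = [Core("big"), Core("little")]
    print(schedule(progs, cores))   # {'fft': 'big', 'pointer_chase': 'little'}
```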

Hardware adaptation is another form of hardware specialization. In a multicore context, this technique essentially yields a dynamically reconfigurable heterogeneous multicore. While many approaches have been proposed for adapting the hardware to the needs of the workload, most of them operate in a reactive, trial-and-error manner, which incurs significant overhead in power and energy consumption. We propose a performance model that captures the performance trend across different configurations without any partial simulation and can therefore proactively tailor the hardware resources to the needs of the workload. The model also offers an attractive way to convert a performance target into the corresponding resource usage, paving the way for QoS in CMP systems.
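To make the idea of proactive, model-driven adaptation concrete, the minimal sketch below uses a toy analytical CPI model to estimate each configuration's performance and to invert a performance target into the cheapest configuration that meets it. The model form, coefficients, and resource knobs are illustrative assumptions and do not reproduce the proposed performance model.

```python
# Hypothetical sketch of a proactive, model-based resource allocator.
# The CPI model and its coefficients are illustrative assumptions only.

def predicted_cpi(issue_width, l2_kb, base_cpi=0.6, mem_misses_per_kinst=12.0):
    """Toy analytical model: CPI falls with wider issue and larger L2 capacity."""
    ilp_term = 1.0 / issue_width                        # fewer stalls with wider issue
    miss_rate = mem_misses_per_kinst / (l2_kb ** 0.5)   # crude capacity effect
    mem_term = miss_rate * 0.2                          # penalty per miss, in CPI
    return base_cpi + ilp_term + mem_term

def smallest_config_meeting_target(target_cpi,
                                   widths=(1, 2, 4),
                                   l2_sizes=(256, 512, 1024, 2048)):
    """Convert a performance target into the cheapest configuration predicted to
    meet it, without trial-and-error execution on each configuration."""
    for w in widths:                  # ordered from cheapest to most expensive
        for l2 in l2_sizes:
            if predicted_cpi(w, l2) <= target_cpi:
                return {"issue_width": w, "l2_kb": l2}
    return None                       # target not reachable with these resources

if __name__ == "__main__":
    print(smallest_config_meeting_target(target_cpi=1.2))
    # -> {'issue_width': 2, 'l2_kb': 1024} under the toy coefficients above
```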

Accelerator-based architectures sit at the far end of the spectrum of hardware specialization. They are probably the most energy-efficient execution platform for diverse workloads, yet they also bring issues such as hardware complexity and software compatibility.

All in all, to achieve high-performance and power-efficient computing, the hardware has to be well aware of, and further take advantage of, workload behavior. Online profiling is one way to understand and predict workload behavior. A more fundamental, and probably more effective, way is cross-layer software/hardware interaction. We attempt to exploit some of the features embedded in object-oriented (OOP) languages and extract valuable information to assist run-time decisions in hardware.
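One hypothetical illustration of such cross-layer interaction: information that an object-oriented runtime already holds, such as an object's size and typical access pattern, could be conveyed to the hardware as a hint for prefetching or data placement. The hint interface and its fields below are purely illustrative assumptions, not an existing API or the specific mechanism pursued here.

```python
# Purely illustrative sketch of cross-layer interaction: an object-oriented
# runtime exposes per-type information (object size, access pattern) that a
# hardware prefetcher or data-placement policy could consume as a hint.
# The hint interface and its fields are hypothetical.

from dataclasses import dataclass

@dataclass
class TypeHint:
    type_name: str
    object_bytes: int        # size of one instance, known from the class layout
    sequential_fields: bool  # whether fields are typically traversed in order

def emit_hardware_hint(hint: TypeHint):
    """Stand-in for writing a hint to a (hypothetical) hardware hint register."""
    stride = "linear" if hint.sequential_fields else "irregular"
    print(f"HINT type={hint.type_name} size={hint.object_bytes} stride={stride}")

# Example: the runtime knows a Matrix object is large and traversed linearly,
# while a TreeNode is small and accessed by pointer chasing.
emit_hardware_hint(TypeHint("Matrix", object_bytes=4096, sequential_fields=True))
emit_hardware_hint(TypeHint("TreeNode", object_bytes=48, sequential_fields=False))
```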

- As of Dec.2009