Since the beginning of the new millennium, the world of digital system design has witnessed an unprecedented phenomenon: a rapid and persistent decrease in feature sizes well into the nanoscale realm. Advancements in device technology and fabrication techniques have enabled designers to tread previously uncharted territory; the integration of billions of transistors on a single die is now a reality.
The Diminishing Returns of Instruction-Level Parallelism
This newly developed ability to integrate previously unimaginable amounts of logic on a single die has naturally led to a paradigm shift in computer architecture. Instead of concentrating on exploiting the saturating Instruction-Level Parallelism (ILP) through complex superscalar cores, architects are now targeting Thread-Level Parallelism (TLP) by using multiple simple Central Processing Unit (CPU) cores. The embodiment of this transition is the advent of Chip Multiprocessors (CMPs) as a viable alternative to complex superscalar architectures. CMPs integrate multiple simple, compact processing cores into a decentralized microarchitecture that scales more efficiently with increasing integration densities.
The irreversible march toward manycore systems with tens, or even hundreds, of processing elements is exemplified by the recent proliferation of commercial multicore processors. A fundamental research goal at multiCAL is the design of microprocessor architectures for future tera-scale platforms, i.e., CMPs “capable of performing trillions of calculations per second (teraflops) on trillions of bytes of data (terabytes)” [as stated in Intel’s Tera-Scale Computing Research Vision].
The Dawn of the Communication-Centric Revolution
The advent of multicore chips has signaled the beginning of communication-centric, rather than computation-centric, systems. Large, sophisticated monolithic modules are giving way to many smaller, simpler processing elements working in tandem, so overall system performance increasingly hinges on how efficiently these elements communicate with one another.
This profound realization constitutes another primary research driver of our efforts at multiCAL: a communication-centric focus on multicore architectures. The goal is to devise scalable hardware/software mechanisms that can efficiently utilize the abundance of interconnected processing elements found in these new architectures.
As a side effect, the new multicore design paradigm has placed enormous strain on the interconnection backbone, which must now take on a more prominent and sophisticated role. Owing to their scalability, packet-based Networks-on-Chip (NoCs) are widely considered the most viable interconnect solution for future manycore chips. Nevertheless, the design of efficient on-chip networks is impeded by inherently conflicting requirements: the NoC is expected to provide ultra-low latency while occupying as little silicon real estate and consuming as little energy as possible. These three design objectives pull against one another in an elaborate tug-of-war, requiring extensive design-space exploration to strike a delicate balance among them. This intricate interplay is compounded even further by reliability concerns and process-variability effects, which grow increasingly severe as technology feature sizes shrink.
Owing to their rapidly growing importance in the multicore domain, NoCs figure prominently within multiCAL’s research repertoire. Our investigation of next-generation on-chip interconnection fabrics employs a holistic, multi-pronged approach that encompasses all of the critical design metrics outlined above: performance, area, energy, and reliability. Designing for performance alone is no longer viable in the modern deep sub-micron era. Technological advances have brought not only new opportunities but also new challenges that cannot be ignored.