Event
Remote Ph.D. Defense: Candace Walden
Thursday, July 15, 2021
10:30 a.m.
https://umd.webex.com/umd/j.php?MTID=ma7e9a4181a2a031ac90db6b2227599aa Password: 9cfXAp32iNN
Maria Hoo
301 405 3681
mch@umd.edu
Date/Time: Thursday, July 15, 2021 at 10:30 AM
Location: https://umd.webex.com/umd/j.php?MTID=ma7e9a4181a2a031ac90db6b2227599aa
The access circuitry for ReRAM subarrays requires only a small percentage of the area beneath the array. Still, this dense circuitry and wiring disrupts the layouts of irregular logic like CPUs. Caches are very regular and composed of smaller subarrays, making them a better candidate to place beneath crosspoint subarrays. By co-designing the cache subarrays and ReRAM crosspoint subarrays, minimal disruption to the cache logic can be achieved while still covering the bulk of the last-level cache area in ReRAM. Using a modified version of Cacti, we are able to explore the design trade-offs when integrating ReRAM and cache and quantify the impact the ReRAM has on the last-level cache. We also examine how the physical integration presents opportunities for logical integration of the last-level cache and main memory.
Additionally, this work explores one architectural style which can balance the monolithic memory system and a general-purpose compute system---a tiled multicore with wide SIMD and multi-threading. We develop a simulator for this architecture capable of simulating a wide variety of system parameters. Through a design space exploration of many of the parameters across sparse, irregular graph kernels and dense, streaming computations, we find monolithic ReRAM exceeds the performance of a state-of-the-art DRAM system for memory intensive workloads given enough parallelism. We further develop an analytic model to describe our system and highlight the important performance characteristics for a monolithic CPU-main memory chip. The analytic model is validated against our simulation data. Using this model, we examine the architectural balance of the systems we simulated.
Finally, we develop an RTL model of the combined cache--main memory interface. This gives a more accurate model for the increase in resources required for the combined controller. We additionally develop a system-on-a-chip with an RTL model that alters requests to the FPGA's main memory to be at the speed of ReRAM requests. This model is used to show the performance of more computationally intensive benchmarks. It also is the first step toward creating a test chip for a monolithically integrated ReRAM main memory.
