# Computer Architecture: Why Is It So Important and Exciting Today?

Onur Mutlu
omutlu@gmail.com
https://people.inf.ethz.ch/omutlu

16 October 2020

**IYTE Guest Lecture**

#### Brief Self Introduction

**Onur Mutlu**

- Full Professor @ ETH Zurich ITET (INFK), since September 2015
- Strecker Professor @ Carnegie Mellon University ECE/CS, 2009-2016, 2016-...
- PhD from UT-Austin, worked at Google, VMware, Microsoft Research, Intel, AMD
- https://people.inf.ethz.ch/omutlu/
- omutlu@gmail.com (Best way to reach me)
- https://people.inf.ethz.ch/omutlu/projects.htm

#### Research and Teaching in:

- Computer architecture, computer systems, hardware security, bioinformatics
- Memory and storage systems
- Hardware security, safety, predictability
- Fault tolerance
- Hardware/software cooperation
- Architectures for bioinformatics, health, medicine
- ...

#### Current Research Mission

Computer architecture, HW/SW, systems, bioinformatics, security

**Build fundamentally better architectures**

#### Four Key Current Directions

- Fundamentally Secure/Reliable/Safe Architectures
- Fundamentally Energy-Efficient Architectures
  - Memory-centric (Data-centric) Architectures
- Fundamentally Low-Latency and Predictable Architectures
- Architectures for AI/ML, Genomics, Medicine, Health

#### The Transformation Hierarchy

[Figure: The transformation hierarchy, contrasting Computer Architecture (expanded view) with Computer Architecture (narrow view)]

#### Axiom

To achieve the highest energy efficiency and performance, **we must take the expanded view of computer architecture**:

- Co-design across the hierarchy: Algorithms to devices
- Specialize as much as possible within the design goals

#### Current Research Mission & Major Topics

**Build fundamentally better architectures**

Broad research spanning apps, systems, logic with architecture at the center:

- Data-centric arch. for low energy & high perf.
  - Proc. in Mem/DRAM, NVM, unified mem/storage
- Low-latency & predictable architectures
  - Low-latency, low-energy yet low-cost memory
  - QoS-aware and predictable memory systems
- Fundamentally secure/reliable/safe arch.
  - Tolerating all bit flips; patchable HW; secure mem
- Architectures for ML/AI/Genomics/Graph/Med
  - Algorithm/arch./logic co-design; full heterogeneity
- Data-driven and data-aware architectures
  - ML/AI-driven architectural controllers and design
  - Expressive memory and expressive systems

#### Onur Mutlu's SAFARI Research Group

Computer architecture, HW/SW, systems, bioinformatics, security, memory

https://safari.ethz.ch/safari-newsletter-april-2020/

Think BIG, Aim HIGH!

SAFARI: https://safari.ethz.ch

#### Principle: Teaching and Research

- Teaching drives Research
- Research drives Teaching

**Focus on Insight. Encourage New Ideas.**

#### Research & Teaching: Some Overview Talks

https://www.youtube.com/onurmutlulectures

- Future Computing Architectures
  - https://www.youtube.com/watch?v=kgiZISOcGFM&list=PL5Q2soXY2Zi8D_5MGV6EnXEJHnV2YFBJl&index=1
- Enabling In-Memory Computation
  - https://www.youtube.com/watch?v=njX_14584Jw&list=PL5Q2soXY2Zi8D_5MGV6EnXEJHnV2YFBJl&index=16
- Accelerating Genome Analysis
  - https://www.youtube.com/watch?v=hPnSmfwu2-A&list=PL5Q2soXY2Zi8D_5MGV6EnXEJHnV2YFBJl&index=9
- Rethinking Memory System Design
  - https://www.youtube.com/watch?v=F7xZLNMIY1E&list=PL5Q2soXY2Zi8D_5MGV6EnXEJHnV2YFBJl&index=3
- Intelligent Architectures for Intelligent Machines
  - https://www.youtube.com/watch?v=n8Aj_A0WSq8&list=PL5Q2soXY2Zi8D_5MGV6EnXEJHnV2YFBJl&index=22

#### An Interview on Research and Education

- Computing Research and Education (@ ISCA 2019)
  - https://www.youtube.com/watch?v=8ffSEKZhmvo&list=PL5Q2soXY2Zi_4oP9LdL3cc8G6NIjD2Ydz
- Maurice Wilkes Award Speech (10 minutes)
  - https://www.youtube.com/watch?v=tcQ3zZ3JpuA&list=PL5Q2soXY2Zi8D_5MGV6EnXEJHnV2YFBJl&index=15

#### More Thoughts and Suggestions

Onur Mutlu,
"Some Reflections (on DRAM)"
Award Speech for ACM SIGARCH Maurice Wilkes Award, at the **ISCA** Awards Ceremony, Phoenix, AZ, USA, 25 June 2019.
[Slides (pptx) (pdf)] [Video of Award Acceptance Speech (Youtube; 10 minutes) (Youku; 13 minutes)] [Video of Interview after Award Acceptance (Youtube; 1 hour 6 minutes) (Youku; 1 hour 6 minutes)] [News Article on "ACM SIGARCH Maurice Wilkes Award goes to Prof. Onur Mutlu"]

Onur Mutlu,
"How to Build an Impactful Research Group"
57th Design Automation Conference Early Career Workshop (**DAC**), Virtual, 19 July 2020.
[Slides (pptx) (pdf)]

## Why Study Computer Architecture?

#### Computer Architecture

- is the science and art of designing computing platforms (hardware, interface, system SW, and programming model)
- to achieve a set of design goals
  - E.g., highest performance on earth on workloads X, Y, Z
  - E.g., longest battery life at a form factor that fits in your pocket with cost < \$\$\$ CHF
  - E.g., best average performance across all known workloads at the best performance/cost ratio
  - ...
- Designing a supercomputer is different from designing a smartphone
  - But many fundamental principles are similar

[Figures: example computing platforms, including a Google TPU board (Jouppi et al., ISCA 2017) and the Tesla Full Self-Driving computer]

#### What is Computer Architecture?
The science and art of designing, selecting, and interconnecting hardware components and designing the hardware/software interface to create a computing system that meets functional, performance, energy consumption, cost, and other specific goals.

#### The Transformation Hierarchy

[Figure: The transformation hierarchy, contrasting Computer Architecture (expanded view) with Computer Architecture (narrow view)]

#### Why Study Computer Architecture?

- Enable better systems: make computers faster, cheaper, smaller, more reliable, ...
  - By exploiting advances and changes in underlying technology/circuits
- Enable new applications
  - Life-like 3D visualization 20 years ago? Virtual reality?
  - Self-driving cars?
  - Personalized genomics? Personalized medicine?
- Enable better solutions to problems
  - Software innovation is built on trends and changes in computer architecture
    - Performance improvements of more than 50% per year have enabled this innovation
- Understand why computers work the way they do

#### Computer Architecture Today (I)

- Today is a very exciting time to study computer architecture
- Industry is in a large paradigm shift (to novel architectures)
  - Many different potential system designs are possible
- Many difficult problems motivate and are caused by the shift
  - Huge hunger for data and new data-intensive applications
  - Power/energy/thermal constraints
  - Complexity of design
  - Difficulties in technology scaling
  - Memory bottleneck
  - Reliability problems
  - Programmability problems
  - Security and privacy issues
- No clear, definitive answers to these problems

#### Computer Architecture Today (II)

- These problems affect all parts of the computing stack, if we do not change the way we design systems
  - Many new demands from the top (Look Up)
  - Fast-changing demands and personalities of users (Look Up)
  - Many new issues at the bottom (Look Down)
- No clear, definitive answers to these problems

#### Computer Architecture Today (III)

- Computing landscape is very different from 10-20 years ago
  - Both UP (software and humanity trends) and DOWN
(technologies and their issues), FORWARD and BACKWARD, and the resulting requirements and constraints

#### Axiom

To achieve the highest energy efficiency and performance, **we must take the expanded view of computer architecture**:

- Co-design across the hierarchy: Algorithms to devices
- Specialize as much as possible within the design goals

#### Historical: Opportunities at the Bottom

**There's Plenty of Room at the Bottom** (from Wikipedia, the free encyclopedia)

"There's Plenty of Room at the Bottom: An Invitation to Enter a New Field of Physics" was a lecture given by physicist Richard Feynman at the annual American Physical Society meeting at Caltech on December 29, 1959. Feynman considered the possibility of direct manipulation of individual atoms as a more powerful form of synthetic chemistry than those used at the time. Although versions of the talk were reprinted in a few popular magazines, it went largely unnoticed and did not inspire the conceptual beginnings of the field. Beginning in the 1980s, nanotechnology advocates cited it to establish the scientific credibility of their work.

#### Historical: Opportunities at the Bottom (II)

Feynman considered some ramifications of a general ability to manipulate matter on an atomic scale. He was particularly interested in the possibilities of denser computer circuitry, and microscopes that could see things much smaller than is possible with scanning electron microscopes. These ideas were later realized by the use of the scanning tunneling microscope, the atomic force microscope and other examples of scanning probe microscopy and storage systems such as Millipede, created by researchers at IBM. Feynman also suggested that it should be possible, in principle, to make nanoscale machines that "arrange the atoms the way we want", and do chemical synthesis by mechanical manipulation.
He also presented the possibility of "swallowing the doctor", an idea that he credited in the essay to his friend and graduate student Albert Hibbs. This concept involved building a tiny, swallowable surgical robot.

#### Historical: Opportunities at the Top

**Review:** *There's plenty of room at the Top: What will drive computer performance after Moore's law?*

Charles E. Leiserson, Neil C. Thompson, Joel S. Emer, Bradley C. Kuszmaul, Butler W. Lampson, ...
Science, 05 Jun 2020: Vol. 368, Issue 6495, eaam9744. DOI: 10.1126/science.aam9744

Much of the improvement in computer performance comes from decades of miniaturization of computer components, a trend that was foreseen by the Nobel Prize–winning physicist Richard Feynman in his 1959 address, "There's Plenty of Room at the Bottom," to the American Physical Society. In 1975, Intel founder Gordon Moore predicted the regularity of this miniaturization trend, now called Moore's law, which, until recently, doubled the number of transistors on computer chips every 2 years. Unfortunately, semiconductor miniaturization is running out of steam as a viable way to grow computer performance—there isn't much more room at the "Bottom." If growth in computing power stalls, practically all industries will face challenges to their productivity. Nevertheless, opportunities for growth in computing performance will still be available, especially at the "Top" of the computing-technology stack: software, algorithms, and hardware architecture.

#### Axiom, Revisited

There is plenty of room both at the top and at the bottom, but much more so when you communicate well between, and optimize across, the top and the bottom.
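The review above describes Moore's law as a doubling of transistor count every 2 years. A small Python sketch of how such a doubling compounds helps show what is at stake when it slows. It assumes an idealized, perfectly regular doubling period; the function names are hypothetical and the printed figures are model outputs, not measured data.

```python
# Idealized compounding of a fixed doubling period (illustration only).
# Assumption: a perfectly regular doubling every 2 years, as in the quoted
# statement of Moore's law; real scaling was never this smooth.


def growth_factor(years: float, doubling_period_years: float = 2.0) -> float:
    """Multiplicative growth after `years` if a quantity doubles every period."""
    return 2.0 ** (years / doubling_period_years)


def projected_count(initial_count: float, years: float,
                    doubling_period_years: float = 2.0) -> float:
    """Projected count (e.g., transistors per chip) under the idealized model."""
    return initial_count * growth_factor(years, doubling_period_years)


if __name__ == "__main__":
    for span in (2, 10, 20, 40):
        print(f"{span:>2} years -> x{growth_factor(span):,.0f}")
    # Expected:
    #  2 years -> x2
    # 10 years -> x32
    # 20 years -> x1,024
    # 40 years -> x1,048,576
```

Under this idealized model, 20 years of doubling yields roughly a 1,000x increase; losing that multiplier is what pushes the search for further gains toward the "Top" of the stack.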
#### Hence the Expanded View

[Figure: The transformation hierarchy, with Computer Architecture (expanded view) highlighted]

# Some Cross-Layer Design Examples (Foreshadowing)

### Expressive (Memory) Interfaces

Nandita Vijaykumar, Abhilasha Jain, Diptesh Majumdar, Kevin Hsieh, Gennady Pekhimenko, Eiman Ebrahimi, Nastaran Hajinazar, Phillip B. Gibbons and Onur Mutlu,
"A Case for Richer Cross-layer Abstractions: Bridging the Semantic Gap with Expressive Memory"
Proceedings of the 45th International Symposium on Computer Architecture (**ISCA**), Los Angeles, CA, USA, June 2018.
[Slides (pptx) (pdf)] [Lightning Talk Slides (pptx) (pdf)] [Lightning Talk Video]

**A Case for Richer Cross-layer Abstractions: Bridging the Semantic Gap with Expressive Memory**

Nandita Vijaykumar<sup>†§</sup> Abhilasha Jain<sup>†</sup> Diptesh Majumdar<sup>†</sup> Kevin Hsieh<sup>†</sup> Gennady Pekhimenko<sup>‡</sup> Eiman Ebrahimi<sup>ℵ</sup> Nastaran Hajinazar<sup>‡</sup> Phillip B. Gibbons<sup>†</sup> Onur Mutlu<sup>§†</sup>

## XMem Aids Many Optimizations

| Memory optimization | Example semantics provided by XMem (described in §3.3) | Example benefits of XMem |
|---|---|---|
| Cache management | (i) Distinguishing between data structures or pools of similar data; (ii) Working set size; (iii) Data reuse | Enables: (i) applying different caching policies to different data structures or pools of data; (ii) avoiding cache thrashing by *knowing* the active working set size; (iii) bypassing/prioritizing data that has no/high reuse. (§5) |
| Page placement in DRAM, e.g., [23, 24] | (i) Distinguishing between data structures; (ii) Access pattern; (iii) Access intensity | Enables page placement at the *data structure* granularity to (i) isolate data structures that have high row buffer locality and (ii) spread out concurrently-accessed irregular data structures across banks and channels to improve parallelism. (§6) |
| Cache/memory compression, e.g., [25–32] | (i) Data type: integer, float, char; (ii) Data properties: sparse, pointer, data index | Enables using a *different compression algorithm* for each data structure based on data type and data properties, e.g., sparse data encodings, FP-specific compression, delta-based compression for pointers [27]. |
| Data prefetching, e.g., [33–36] | (i) Access pattern: strided, irregular, irregular but repeated (e.g., graphs), access stride; (ii) Data type: index, pointer | Enables (i) highly accurate software-driven prefetching while leveraging the benefits of hardware prefetching (e.g., by being memory bandwidth-aware, avoiding cache thrashing); (ii) using different prefetcher *types* for different data structures: e.g., stride [33], tile-based [20], pattern-based [34–37], data-based for indices/pointers [38, 39], etc. |
| DRAM cache management, e.g., [40–46] | (i) Access intensity; (ii) Data reuse; (iii) Working set size | (i) Helps avoid cache thrashing by knowing working set size [44]; (ii) Better DRAM cache management via reuse behavior and access intensity information. |
| Approximation in memory, e.g., [47–53] | (i) Distinguishing between pools of similar data; (ii) Data properties: tolerance towards approximation | Enables (i) each memory component to track how approximable data is (at a fine granularity) to inform approximation techniques; (ii) data placement in heterogeneous reliability memories [54]. |
| Data placement: NUMA systems, e.g., [55, 56] | (i) Data partitioning across threads (i.e., relating data to threads that access it); (ii) Read-Write properties | Reduces the need for profiling or data migration (i) to co-locate data with threads that access it and (ii) to identify Read-Only data, thereby enabling techniques such as replication. |
| Data placement: hybrid memories, e.g., [16, 57, 58] | (i) Read-Write properties (Read-Only/Read-Write); (ii) Access intensity; (iii) Data structure size; (iv) Access pattern | Avoids the need for profiling/migration of data in hybrid memories to (i) effectively manage the asymmetric read-write properties in NVM (e.g., placing Read-Only data in the NVM) [16, 57]; (ii) make tradeoffs between data structure "hotness" and size to allocate fast/high bandwidth memory [14]; and (iii) leverage row-buffer locality in placement based on access pattern [45]. |
| Managing NUCA systems, e.g., [15, 59] | (i) Distinguishing pools of similar data; (ii) Access intensity; (iii) Read-Write or Private-Shared properties | (i) Enables using different cache policies for different data pools (similar to [15]); (ii) Reduces the need for reactive mechanisms that detect sharing and read-write characteristics to inform cache policies. |

### Expressive (Memory) Interfaces for GPUs

Nandita Vijaykumar, Eiman Ebrahimi, Kevin Hsieh, Phillip B. Gibbons and Onur Mutlu,
"The Locality Descriptor: A Holistic Cross-Layer Abstraction to Express Data Locality in GPUs"
Proceedings of the 45th International Symposium on Computer Architecture (**ISCA**), Los Angeles, CA, USA, June 2018.
[Slides (pptx) (pdf)] [Lightning Talk Slides (pptx) (pdf)] [Lightning Talk Video]

**The Locality Descriptor: A Holistic Cross-Layer Abstraction to Express Data Locality in GPUs**

Nandita Vijaykumar<sup>†§</sup> Eiman Ebrahimi<sup>‡</sup> Kevin Hsieh<sup>†</sup> Phillip B. Gibbons<sup>†</sup> Onur Mutlu<sup>§†</sup>

<sup>†</sup>Carnegie Mellon University <sup>‡</sup>NVIDIA <sup>§</sup>ETH Zürich

#### Heterogeneous-Reliability Memory

Yixin Luo, Sriram Govindan, Bikash Sharma, Mark Santaniello, Justin Meza, Aman Kansal, Jie Liu, Badriddine Khessib, Kushagra Vaid, and Onur Mutlu,
"Characterizing Application Memory Error Vulnerability to Optimize Data Center Cost via Heterogeneous-Reliability Memory"
Proceedings of the 44th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (**DSN**), Atlanta, GA, June 2014.
[Summary] [Slides (pptx) (pdf)] [Coverage on ZDNet]

**Characterizing Application Memory Error Vulnerability to Optimize Datacenter Cost via Heterogeneous-Reliability Memory**

Yixin Luo Sriram Govindan\* Bikash Sharma\* Mark Santaniello\* Justin Meza Aman Kansal\* Jie Liu\* Badriddine Khessib\* Kushagra Vaid\* Onur Mutlu

Carnegie Mellon University, yixinluo@cs.cmu.edu, {meza, onur}@cmu.edu

\*Microsoft Corporation, {srgovin, bsharma, marksan, kansal, jie.liu, bkhessib, kvaid}@microsoft.com

#### EDEN: Data-Aware Efficient DNN Inference

Skanda Koppula, Lois Orosa, A. Giray Yaglikci, Roknoddin Azizi, Taha Shahroodi, Konstantinos Kanellopoulos, and Onur Mutlu,
"EDEN: Enabling Energy-Efficient, High-Performance Deep Neural Network Inference Using Approximate DRAM"
Proceedings of the 52nd International Symposium on Microarchitecture (**MICRO**), Columbus, OH, USA, October 2019.
[Slides (pptx) (pdf)] [Lightning Talk Slides (pptx) (pdf)] [Poster (pptx) (pdf)] [Lightning Talk Video (90 seconds)] [Full Talk Lecture (38 minutes)]

**EDEN: Enabling Energy-Efficient, High-Performance Deep Neural Network Inference Using Approximate DRAM**

Skanda Koppula Lois Orosa A.
Giray Yağlıkçı Roknoddin Azizi Taha Shahroodi Konstantinos Kanellopoulos Onur Mutlu

ETH Zürich

## SMASH: SW/HW Indexing Acceleration

Konstantinos Kanellopoulos, Nandita Vijaykumar, Christina Giannoula, Roknoddin Azizi, Skanda Koppula, Nika Mansouri Ghiasi, Taha Shahroodi, Juan Gomez-Luna, and Onur Mutlu,
"SMASH: Co-designing Software Compression and Hardware-Accelerated Indexing for Efficient Sparse Matrix Operations"
Proceedings of the 52nd International Symposium on Microarchitecture (**MICRO**), Columbus, OH, USA, October 2019.
[Slides (pptx) (pdf)] [Lightning Talk Slides (pptx) (pdf)] [Poster (pptx) (pdf)] [Lightning Talk Video (90 seconds)] [Full Talk Lecture (30 minutes)]

**SMASH: Co-designing Software Compression and Hardware-Accelerated Indexing for Efficient Sparse Matrix Operations**

Konstantinos Kanellopoulos<sup>1</sup> Nandita Vijaykumar<sup>2,1</sup> Christina Giannoula<sup>1,3</sup> Roknoddin Azizi<sup>1</sup> Skanda Koppula<sup>1</sup> Nika Mansouri Ghiasi<sup>1</sup> Taha Shahroodi<sup>1</sup> Juan Gomez-Luna<sup>1</sup> Onur Mutlu<sup>1,2</sup>

### Rethinking Virtual Memory

Nastaran Hajinazar, Pratyush Patel, Minesh Patel, Konstantinos Kanellopoulos, Saugata Ghose, Rachata Ausavarungnirun, Geraldo Francisco de Oliveira Jr., Jonathan Appavoo, Vivek Seshadri, and Onur Mutlu,
"The Virtual Block Interface: A Flexible Alternative to the Conventional Virtual Memory Framework"
Proceedings of the 47th International Symposium on Computer Architecture (**ISCA**), Valencia, Spain, June 2020.
[Slides (pptx) (pdf)] [Lightning Talk Slides (pptx) (pdf)] [ARM Research Summit Poster (pptx) (pdf)] [Talk Video (26 minutes)] [Lightning Talk Video (3 minutes)]

**The Virtual Block Interface: A Flexible Alternative to the Conventional Virtual Memory Framework**

Nastaran Hajinazar\*<sup>†</sup> Pratyush Patel<sup>™</sup> Minesh Patel<sup>\*</sup> Konstantinos Kanellopoulos<sup>\*</sup> Saugata Ghose<sup>‡</sup> Rachata Ausavarungnirun<sup>⊙</sup> Geraldo F. Oliveira<sup>\*</sup> Jonathan Appavoo<sup>†</sup> Vivek Seshadri<sup>▽</sup> Onur Mutlu<sup>\*‡</sup>

\*ETH Zürich †Simon Fraser University ™University of Washington ‡Carnegie Mellon University <sup>⊙</sup>King Mongkut's University of Technology North Bangkok <sup>◇</sup>Boston University <sup>▽</sup>Microsoft Research India

# Many Interesting Things Are Happening Today in Computer Architecture

# Performance and Energy Efficiency

## Intel Optane Persistent Memory (2019)

- Non-volatile main memory
- Based on 3D-XPoint Technology

#### PCM as Main Memory: Idea in 2009

Benjamin C. Lee, Engin Ipek, Onur Mutlu, and Doug Burger,
"Architecting Phase Change Memory as a Scalable DRAM Alternative"
Proceedings of the 36th International Symposium on Computer Architecture (**ISCA**), pages 2-13, Austin, TX, June 2009.
[Slides (pdf)]

**Architecting Phase Change Memory as a Scalable DRAM Alternative**

Benjamin C. Lee† Engin Ipek† Onur Mutlu‡ Doug Burger†

†Computer Architecture Group, Microsoft Research, Redmond, WA, {blee, ipek, dburger}@microsoft.com

‡Computer Architecture Laboratory, Carnegie Mellon University, Pittsburgh, PA, onur@cmu.edu

### PCM as Main Memory: Idea in 2009

Benjamin C. Lee, Ping Zhou, Jun Yang, Youtao Zhang, Bo Zhao, Engin Ipek, Onur Mutlu, and Doug Burger,
"Phase Change Technology and the Future of Main Memory"
IEEE Micro, Special Issue: Micro's Top Picks from 2009 Computer Architecture Conferences (MICRO TOP PICKS), Vol.
30, No. 1, pages 60-70, January/February 2010.

**Phase-Change Technology and the Future of Main Memory**

## Cerebras's Wafer Scale Engine (2019)

- The largest ML accelerator chip
- 400,000 cores

| | Cerebras WSE | Largest GPU (NVIDIA TITAN V) |
|---|---|---|
| Transistors | 1.2 trillion | 21.1 billion |
| Area | 46,225 mm<sup>2</sup> | 815 mm<sup>2</sup> |

https://www.anandtech.com/show/14758/hot-chips-31-live-blogs-cerebras-wafer-scale-deep-learning
https://www.cerebras.net/cerebras-wafer-scale-engine-why-we-need-big-chips-for-deep-learning

#### UPMEM Processing-in-DRAM Engine (2019)

- Processing in DRAM Engine
- Includes standard DIMM modules, with a large number of DPU processors combined with DRAM chips
- Replaces standard DIMMs
  - DDR4 R-DIMM modules
  - 8 GB + 128 DPUs (16 PIM chips)
  - Standard 2x-nm DRAM process
- Large amounts of compute & memory bandwidth

### More on Processing in Memory (I)

Vivek Seshadri and Onur Mutlu,
"In-DRAM Bulk Bitwise Execution Engine"
Invited Book Chapter in Advances in Computers, to appear in 2020.
[Preliminary arXiv version]

**In-DRAM Bulk Bitwise Execution Engine**

Vivek Seshadri, Microsoft Research India, visesha@microsoft.com

Onur Mutlu, ETH Zürich, onur.mutlu@inf.ethz.ch

## More on Processing in Memory (II)

Junwhan Ahn, Sungpack Hong, Sungjoo Yoo, Onur Mutlu, and Kiyoung Choi,
"A Scalable Processing-in-Memory Accelerator for Parallel Graph Processing"
Proceedings of the 42nd International Symposium on Computer Architecture (**ISCA**), Portland, OR, June 2015.
[Slides (pdf)] [Lightning Session Slides (pdf)]

**A Scalable Processing-in-Memory Accelerator for Parallel Graph Processing**

Junwhan Ahn Sungpack Hong<sup>§</sup> Sungjoo Yoo Onur Mutlu<sup>†</sup> Kiyoung Choi

junwhan@snu.ac.kr, sungpack.hong@oracle.com, sungjoo.yoo@gmail.com, onur@cmu.edu, kchoi@snu.ac.kr

Seoul National University <sup>§</sup>Oracle Labs <sup>†</sup>Carnegie Mellon University

## More on Processing in Memory (III)

Amirali Boroumand, Saugata Ghose, Youngsok Kim, Rachata Ausavarungnirun, Eric Shiu, Rahul Thakur, Daehyun Kim, Aki Kuusela, Allan Knies, Parthasarathy Ranganathan, and Onur Mutlu,
"Google Workloads for Consumer Devices: Mitigating Data Movement Bottlenecks"
Proceedings of the 23rd International Conference on Architectural Support for Programming Languages and Operating Systems (**ASPLOS**), Williamsburg, VA, USA, March 2018.

**Google Workloads for Consumer Devices: Mitigating Data Movement Bottlenecks**

Amirali Boroumand<sup>1</sup> Saugata Ghose<sup>1</sup> Youngsok Kim<sup>2</sup> Rachata Ausavarungnirun<sup>1</sup> Eric Shiu<sup>3</sup> Rahul Thakur<sup>3</sup> Daehyun Kim<sup>4,3</sup> Aki Kuusela<sup>3</sup> Allan Knies<sup>3</sup> Parthasarathy Ranganathan<sup>3</sup> Onur Mutlu<sup>5,1</sup>

## More on Processing in Memory (IV)

Junwhan Ahn, Sungjoo Yoo, Onur Mutlu, and Kiyoung Choi,
"PIM-Enabled Instructions: A Low-Overhead, Locality-Aware Processing-in-Memory Architecture"
Proceedings of the 42nd International Symposium on Computer Architecture (**ISCA**), Portland, OR, June 2015.
[Slides (pdf)] [Lightning Session Slides (pdf)]

**PIM-Enabled Instructions: A Low-Overhead, Locality-Aware Processing-in-Memory Architecture**

Junwhan Ahn Sungjoo Yoo Onur Mutlu<sup>†</sup> Kiyoung Choi

junwhan@snu.ac.kr, sungjoo.yoo@gmail.com, onur@cmu.edu, kchoi@snu.ac.kr

Seoul National University <sup>†</sup>Carnegie Mellon University

#### PIM Review and Open Problems (I)

**Processing Data Where It Makes Sense: Enabling In-Memory Computation**

Onur Mutlu<sup>a,b</sup>, Saugata Ghose<sup>b</sup>, Juan Gómez-Luna<sup>a</sup>, Rachata Ausavarungnirun<sup>b,c</sup>

<sup>a</sup>ETH Zürich <sup>b</sup>Carnegie Mellon University <sup>c</sup>King Mongkut's University of Technology North Bangkok

Onur Mutlu, Saugata Ghose, Juan Gomez-Luna, and Rachata Ausavarungnirun,
"Processing Data Where It Makes Sense: Enabling In-Memory Computation"
Invited paper in Microprocessors and Microsystems (**MICPRO**), June 2019.
[arXiv version]

### PIM Review and Open Problems (II)

**A Workload and Programming Ease Driven Perspective of Processing-in-Memory**

Saugata Ghose<sup>†</sup> Amirali Boroumand<sup>†</sup> Jeremie S. Kim<sup>†§</sup> Juan Gómez-Luna<sup>§</sup> Onur Mutlu<sup>§†</sup>

<sup>†</sup>Carnegie Mellon University <sup>§</sup>ETH Zürich

Saugata Ghose, Amirali Boroumand, Jeremie S. Kim, Juan Gomez-Luna, and Onur Mutlu,
"Processing-in-Memory: A Workload-Driven Perspective"
Invited Article in IBM Journal of Research & Development, Special Issue on Hardware for Artificial Intelligence, to appear in November 2019.
[Preliminary arXiv version]

# Tesla Full Self-Driving Computer (2019)

- ML accelerator: 260 mm<sup>2</sup>, 6 billion transistors, 600 GFLOPS GPU, 12 ARM 2.2 GHz CPUs.
- Two redundant chips for better safety.

#### Google TPU Generation I (~2016)

**Figure 3.** TPU Printed Circuit Board.
It can be inserted in the slot for an SATA disk in a server, but the card uses PCIe Gen3 x16.

**Figure 4.** Systolic data flow of the Matrix Multiply Unit. Software has the illusion that each 256B input is read at once, and they instantly update one location of each of 256 accumulator RAMs.

Jouppi et al., "In-Datacenter Performance Analysis of a Tensor Processing Unit", ISCA 2017.

### Google TPU Generation II (2017)

https://www.nextplatform.com/2017/05/17/first-depth-look-googles-new-second-generation-tpu/

TPU2 compared with TPU1:

- 4 TPU chips vs. 1 chip in TPU1
- High Bandwidth Memory vs. DDR3
- Floating point operations vs. FP16
- 45 TFLOPS per chip vs. 23 TOPS
- Designed for training and inference vs. only inference

### An Example Modern Systolic Array: TPU (II)

As reading a large SRAM uses much more power than arithmetic, the matrix unit uses systolic execution to save energy by reducing reads and writes of the Unified Buffer [Kun80][Ram91][Ovt15b]. Figure 4 shows that data flows in from the left, and the weights are loaded from the top. A given 256-element multiply-accumulate operation moves through the matrix as a diagonal wavefront. The weights are preloaded, and take effect with the advancing wave alongside the first data of a new block. Control and data are pipelined to give the illusion that the 256 inputs are read at once, and that they instantly update one location of each of 256 accumulators. From a correctness perspective, software is unaware of the systolic nature of the matrix unit, but for performance, it does worry about the latency of the unit.

Jouppi et al., "In-Datacenter Performance Analysis of a Tensor Processing Unit", ISCA 2017.

#### An Example Modern Systolic Array: TPU (III)

**Figure 1.** TPU Block Diagram. The main computation part is the yellow Matrix Multiply unit in the upper right hand corner. Its inputs are the blue Weight FIFO and the blue Unified Buffer (UB) and its output is the blue Accumulators (Acc).
The yellow Activation Unit performs the nonlinear functions on the Acc, which go to the UB.

## Many (Other) AI/ML Chips

- Alibaba
- Amazon
- Facebook
- Google
- Huawei
- Intel
- Microsoft
- NVIDIA
- Tesla
- Many Others and Many Startups...
- Many More to Come...

# Many Interesting Things Are Happening Today in Computer Architecture

# Reliability and Security

# Security: RowHammer (2014)

#### The Story of RowHammer

- One can predictably induce bit flips in commodity DRAM chips
  - More than 80% of the tested DRAM chips are vulnerable
- First example of how a simple hardware failure mechanism can create a widespread system security vulnerability

[News coverage: "Forget Software—Now Hackers Are Exploiting Physics", Andy Greenberg, Security, 08.31.16]

#### Modern DRAM is Prone to Disturbance Errors

Repeatedly reading a row enough times (before memory gets refreshed) induces disturbance errors in adjacent rows in most real DRAM chips you can buy today.

#### Most DRAM Modules Are Vulnerable

| DRAM manufacturer | Errors observed |
|---|---|
| A company | Up to 1.0×10<sup>7</sup> errors |
| B company | Up to 2.7×10<sup>6</sup> errors |
| C company | Up to 3.3×10<sup>5</sup> errors |

#### One Can Take Over an Otherwise-Secure System

**Flipping Bits in Memory Without Accessing Them: An Experimental Study of DRAM Disturbance Errors**

Abstract. Memory isolation is a key property of a reliable and secure computing system — an access to one memory address should not have unintended side effects on data stored in other addresses.
However, as DRAM process technology …

- Flipping Bits in Memory Without Accessing Them: An Experimental Study of DRAM Disturbance Errors (Kim et al., ISCA 2014)
- Google Project Zero blog (News and updates from the Project Zero team at Google), Monday, March 9, 2015: "Exploiting the DRAM rowhammer bug to gain kernel privileges" (Seaborn, 2015)

#### Security: RowHammer (2014)

It's like breaking into an apartment by repeatedly slamming a neighbor's door until the vibrations open the door you were after.

#### RowHammer: Five Years Ago...

Yoongu Kim, Ross Daly, Jeremie Kim, Chris Fallin, Ji Hye Lee, Donghyuk Lee, Chris Wilkerson, Konrad Lai, and Onur Mutlu,
"Flipping Bits in Memory Without Accessing Them: An Experimental Study of DRAM Disturbance Errors"
Proceedings of the 41st International Symposium on Computer Architecture (**ISCA**), Minneapolis, MN, June 2014.
[Slides (pptx) (pdf)] [Lightning Session Slides (pptx) (pdf)] [Source Code and Data]

**Flipping Bits in Memory Without Accessing Them: An Experimental Study of DRAM Disturbance Errors**

Yoongu Kim<sup>1</sup> Ross Daly\* Jeremie Kim<sup>1</sup> Chris Fallin\* Ji Hye Lee<sup>1</sup> Donghyuk Lee<sup>1</sup> Chris Wilkerson<sup>2</sup> Konrad Lai Onur Mutlu<sup>1</sup>

<sup>1</sup>Carnegie Mellon University <sup>2</sup>Intel Labs

## RowHammer: Now and Beyond...

Onur Mutlu and Jeremie Kim,
"RowHammer: A Retrospective"
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (**TCAD**) Special Issue on Top Picks in Hardware and Embedded Security, 2019.
[Preliminary arXiv version]

**RowHammer: A Retrospective**

Onur Mutlu<sup>§‡</sup> Jeremie S. Kim<sup>‡§</sup>

<sup>§</sup>ETH Zürich <sup>‡</sup>Carnegie Mellon University

### RowHammer in 2020 (I)

Jeremie S. Kim, Minesh Patel, A.
Giray Yaglikci, Hasan Hassan, Roknoddin Azizi, Lois Orosa, and Onur Mutlu,
"Revisiting RowHammer: An Experimental Analysis of Modern DRAM Devices and Mitigation Techniques"
Proceedings of the 47th International Symposium on Computer Architecture (ISCA), Valencia, Spain, June 2020.
[Slides (pptx) (pdf)] [Lightning Talk Slides (pptx) (pdf)] [Talk Video (20 minutes)] [Lightning Talk Video (3 minutes)]

## Revisiting RowHammer: An Experimental Analysis of Modern DRAM Devices and Mitigation Techniques
Jeremie S. Kim<sup>§†</sup> Minesh Patel<sup>§</sup> A. Giray Yağlıkçı<sup>§</sup> Hasan Hassan<sup>§</sup> Roknoddin Azizi<sup>§</sup> Lois Orosa<sup>§</sup> Onur Mutlu<sup>§†</sup>
<sup>§</sup>ETH Zürich <sup>†</sup>Carnegie Mellon University

### RowHammer in 2020 (II)
Pietro Frigo, Emanuele Vannacci, Hasan Hassan, Victor van der Veen, Onur Mutlu, Cristiano Giuffrida, Herbert Bos, and Kaveh Razavi,
"TRRespass: Exploiting the Many Sides of Target Row Refresh"
Proceedings of the 41st IEEE Symposium on Security and Privacy (S&P), San Francisco, CA, USA, May 2020.
[Slides (pptx) (pdf)] [Talk Video (17 minutes)] [Source Code] [Web Article]
Best paper award.

## TRRespass: Exploiting the Many Sides of Target Row Refresh
Pietro Frigo<sup>\*†</sup> Emanuele Vannacci<sup>\*†</sup> Hasan Hassan<sup>§</sup> Victor van der Veen<sup>¶</sup> Onur Mutlu<sup>§</sup> Cristiano Giuffrida<sup>\*</sup> Herbert Bos<sup>\*</sup> Kaveh Razavi<sup>\*</sup>
<sup>\*</sup>Vrije Universiteit Amsterdam <sup>§</sup>ETH Zürich <sup>¶</sup>Qualcomm Technologies Inc.

### RowHammer in 2020 (III)
Lucian Cojocar, Jeremie Kim, Minesh Patel, Lillian Tsai, Stefan Saroiu, Alec Wolman, and Onur Mutlu,
"Are We Susceptible to Rowhammer? An End-to-End Methodology for Cloud Providers"
Proceedings of the 41st IEEE Symposium on Security and Privacy (S&P), San Francisco, CA, USA, May 2020.
[Slides (pptx) (pdf)] [Talk Video (17 minutes)]

## Are We Susceptible to Rowhammer?
An End-to-End Methodology for Cloud Providers
Lucian Cojocar, Jeremie Kim<sup>§†</sup>, Minesh Patel<sup>§</sup>, Lillian Tsai<sup>‡</sup>, Stefan Saroiu, Alec Wolman, and Onur Mutlu<sup>§†</sup>
Microsoft Research, <sup>§</sup>ETH Zürich, <sup>†</sup>CMU, <sup>‡</sup>MIT

## Security: Meltdown and Spectre (2018)

## Meltdown and Spectre
- Someone can steal secret data from the system even though
  - your program and data are perfectly correct and
  - your hardware behaves according to the specification and
  - there are no software vulnerabilities/bugs

#### Why?
- Speculative execution leaves traces of secret data in the processor's cache (internal storage)
  - It brings in data that would not have been brought in or accessed without speculative execution
- A malicious program can inspect the contents of the cache to "infer" secret data that it is not supposed to access
- A malicious program can actually force another program to speculatively execute code that leaves traces of secret data

## More on Meltdown/Spectre Vulnerabilities
Google Project Zero blog, Wednesday, January 3, 2018

#### Reading privileged memory with a side-channel
Posted by Jann Horn, Project Zero

We have discovered that CPU data cache timing can be abused to efficiently leak information out of misspeculated execution, leading to (at worst) arbitrary virtual memory read vulnerabilities across local security boundaries in various contexts.
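The cache side channel described above can be illustrated with a toy simulation in the spirit of Spectre variant 1 (bounds check bypass). This is a sketch under loudly stated assumptions, not an exploit: the cache, its latencies, the memory layout, and the "speculation" (modeled simply as running the loads before the bounds check resolves) are all hypothetical, and the names `ToyCache`, `ToyVictim`, and `leak_byte` are invented for illustration. It shows the core point of the slide: the squashed speculative path never returns the secret architecturally, yet it leaves a secret-dependent line in the cache, and timing accesses to a probe array recovers it.

```python
LINE_SIZE = 64        # assumed cache-line size in bytes
HIT_CYCLES = 40       # hypothetical hit latency
MISS_CYCLES = 300     # hypothetical miss latency
PROBE_BASE = 0x10000  # hypothetical address of the attacker-visible probe array


class ToyCache:
    """Unbounded, fully associative cache that only tracks which lines are present."""

    def __init__(self):
        self.lines = set()

    def access(self, addr):
        line = addr // LINE_SIZE
        latency = HIT_CYCLES if line in self.lines else MISS_CYCLES
        self.lines.add(line)  # every access fills the line
        return latency

    def flush(self):
        self.lines.clear()


class ToyVictim:
    """Victim with a bounds-checked read; secret bytes sit right after the public array."""

    def __init__(self, public, secret, cache):
        self.memory = list(public) + list(secret)
        self.public_len = len(public)
        self.cache = cache

    def read(self, index):
        # Toy speculation model: the bounds check is predicted "in bounds", so the
        # load and the dependent probe access run before the check resolves.
        value = self.memory[index]
        self.cache.access(PROBE_BASE + value * LINE_SIZE)  # secret-dependent footprint
        if index >= self.public_len:
            return None  # squashed: no architectural result, but the cache keeps the line
        return value


def leak_byte(victim, cache, index):
    """Recover memory[index] purely from probe-array access timing."""
    cache.flush()
    victim.read(index)
    timings = [cache.access(PROBE_BASE + b * LINE_SIZE) for b in range(256)]
    return min(range(256), key=timings.__getitem__)  # the one fast line reveals the byte


cache = ToyCache()
victim = ToyVictim(public=b"public", secret=b"Hi", cache=cache)
print(victim.read(6))  # None: architecturally, the out-of-bounds read returns nothing
print(bytes(leak_byte(victim, cache, 6 + i) for i in range(2)))  # b'Hi'
```

The probe array uses one cache line per possible byte value, so each value maps to a distinct line and exactly one line is a hit after the victim runs.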
# Many Interesting Things Are Happening Today in Computer Architecture

## **More Demanding Workloads**

## New Genome Sequencing Technologies

## Nanopore sequencing technology and tools for genome assembly: computational analysis of the current state, bottlenecks and future directions
Damla Senol Cali, Jeremie S. Kim, Saugata Ghose, Can Alkan, Onur Mutlu
Briefings in Bioinformatics, bby017, https://doi.org/10.1093/bib/bby017
Published: 02 April 2018

[Figure: Oxford Nanopore MinION]

## Data → performance & energy bottleneck

## Why Do We Care? An Example
**200 Oxford Nanopore sequencers have left UK for China, to support rapid, near-sample coronavirus sequencing for outbreak surveillance** (Fri 31st January 2020)

Following extensive support of, and collaboration with, public health professionals in China, Oxford Nanopore has shipped an additional 200 MinION sequencers and related consumables to China. These will be used to support the ongoing surveillance of the current coronavirus outbreak, adding to a large number of the devices already installed in the country. Each MinION sequencer is approximately the size of a stapler, and can provide rapid sequence information about the coronavirus.

700 kg of Oxford Nanopore sequencers and consumables are on their way for use by Chinese scientists in understanding the current coronavirus outbreak.

[Figure: Genome analysis pipeline: Sequencing → Read Mapping → Variant Calling → Scientific Discovery; example reads: read4: CGCTTCCAT, read5: CCATGACGC, read6: TTCCATGAC]

## Future of Genome Sequencing & Analysis

## GateKeeper: FPGA-Based Alignment Filtering
Mohammed Alser, Hasan Hassan, Hongyi Xin, Oguz Ergin, Onur Mutlu, and Can Alkan,
"GateKeeper: A New Hardware Architecture for Accelerating Pre-Alignment in DNA Short Read Mapping"
Bioinformatics, [published online, May 31], 2017.
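Pre-alignment filters such as GateKeeper exist because full alignment (edit-distance computation) is expensive, while most candidate mapping locations turn out to be dissimilar. GateKeeper is based on the Shifted Hamming Distance (SHD) idea: compare the read against the reference under every shift from −e to +e, and reject the pair if too many read positions mismatch under all shifts. The sketch below is a simplified, software-only version of that idea; it omits GateKeeper's amendment of short match streaks and its FPGA bit-vector design, and the function names are hypothetical.

```python
def hamming_mask(read, ref, shift):
    """Per-position mismatch mask of `read` against `ref` shifted by `shift`.

    Entry i is 0 if read[i] == ref[i + shift], else 1. Positions that fall
    outside `ref` count as mismatches.
    """
    mask = []
    for i, base in enumerate(read):
        j = i + shift
        mask.append(0 if 0 <= j < len(ref) and ref[j] == base else 1)
    return mask


def shd_filter(read, ref, max_edits):
    """Simplified SHD-style pre-alignment filter (no streak amendment).

    Returns True (pass the pair on to full alignment) if at most `max_edits`
    read positions mismatch under every shift in [-max_edits, +max_edits];
    returns False to reject the pair cheaply.
    """
    combined = [1] * len(read)
    for shift in range(-max_edits, max_edits + 1):
        # AND: a position stays 1 only if it mismatches under every shift so far.
        combined = [a & b for a, b in zip(combined, hamming_mask(read, ref, shift))]
    return sum(combined) <= max_edits


ref = "ACGTTGCA"
print(shd_filter("ACGATGCA", ref, 1))  # True: one substitution
print(shd_filter("ACGGTTGC", ref, 1))  # True: one inserted G shifts the rest by one
print(shd_filter("TTTTTTTT", ref, 1))  # False: too dissimilar
```

Each mask is a set of independent per-position comparisons combined with bitwise ANDs, which is why this style of filter maps naturally onto wide bit-parallel hardware such as an FPGA.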
[Source Code] [Online link at Bioinformatics Journal]

## GateKeeper: a new hardware architecture for accelerating pre-alignment in DNA short read mapping
Mohammed Alser, Hasan Hassan, Hongyi Xin, Oğuz Ergin, Onur Mutlu, Can Alkan
Bioinformatics, Volume 33, Issue 21, 1 November 2017, Pages 3355–3363, https://doi.org/10.1093/bioinformatics/btx342
Published: 31 May 2017

## Shouji: Faster Algorithm-Hardware Co-Design
Mohammed Alser, Hasan Hassan, Akash Kumar, Onur Mutlu, and Can Alkan,
"Shouji: A Fast and Efficient Pre-Alignment Filter for Sequence Alignment"
Bioinformatics, [published online, March 28], 2019.
[Source Code] [Online link at Bioinformatics Journal]

Bioinformatics, 2019, 1–9; doi: 10.1093/bioinformatics/btz234; Advance Access Publication Date: 28 March 2019

## Shouji: a fast and efficient pre-alignment filter for sequence alignment
Mohammed Alser<sup>1,2,3,\*</sup>, Hasan Hassan<sup>1</sup>, Akash Kumar<sup>2</sup>, Onur Mutlu<sup>1,3,\*</sup> and Can Alkan<sup>3,\*</sup>
<sup>1</sup>Computer Science Department, ETH Zürich, Zürich 8092, Switzerland, <sup>2</sup>Chair for Processor Design, Center For Advancing Electronics Dresden, Institute of Computer Engineering, Technische Universität Dresden, 01062 Dresden, Germany and <sup>3</sup>Computer Engineering Department, Bilkent University, 06800 Ankara, Turkey
<sup>\*</sup>To whom correspondence should be addressed.
Associate Editor: Inanc Birol

## In-Memory DNA Sequence Analysis
Jeremie S. Kim, Damla Senol Cali, Hongyi Xin, Donghyuk Lee, Saugata Ghose, Mohammed Alser, Hasan Hassan, Oguz Ergin, Can Alkan, and Onur Mutlu,
"GRIM-Filter: Fast Seed Location Filtering in DNA Read Mapping Using Processing-in-Memory Technologies"
BMC Genomics, 2018.
Proceedings of the 16th Asia Pacific Bioinformatics Conference (APBC), Yokohama, Japan, January 2018.
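GRIM-Filter's core idea is to split the reference into bins, record which k-mers occur in each bin, and skip any bin that contains too few of a read's k-mers before running expensive alignment there. The sketch below is a simplified software stand-in under stated assumptions: Python sets replace the per-bin k-mer existence bitvectors, nothing runs in memory, and bins overlap by an assumed `overlap` so that a read's true location is not cut across a bin boundary. The threshold follows the q-gram lemma: a read of length m aligned with at most e edits keeps at least m − k + 1 − k·e of its k-mers intact, since each edit can destroy at most k of them. The function names, the reference string, and all parameter values are hypothetical.

```python
def build_bins(reference, bin_size, overlap, k):
    """Split `reference` into bins and record the set of k-mers present in each.

    Each bin covers reference[start : start + bin_size + overlap], so consecutive
    bins overlap; a set stands in for a per-bin k-mer existence bitvector.
    """
    bins = []
    for start in range(0, len(reference), bin_size):
        segment = reference[start:start + bin_size + overlap]
        kmers = {segment[i:i + k] for i in range(len(segment) - k + 1)}
        bins.append((start, kmers))
    return bins


def candidate_bins(read, bins, k, max_edits):
    """Return the start offsets of bins that may contain an alignment of `read`."""
    read_kmers = [read[i:i + k] for i in range(len(read) - k + 1)]
    # q-gram lemma: at most k * max_edits of the read's k-mers can be destroyed.
    threshold = len(read) - k + 1 - k * max_edits
    return [start for start, kmers in bins
            if sum(kmer in kmers for kmer in read_kmers) >= threshold]


REF = "ACGTACGGATCCGATAGCGCAAGCTAGCATGCACAGTCAG"  # made-up 40-base reference
bins = build_bins(REF, bin_size=10, overlap=11, k=3)
print(candidate_bins("GCTAGCATGC", bins, k=3, max_edits=0))  # [20]: exact copy of REF[22:32]
print(candidate_bins("GCTAGAATGC", bins, k=3, max_edits=1))  # [20]: one substitution
print(candidate_bins("TTTTTTTTTT", bins, k=3, max_edits=1))  # []: no bin survives
```

A surviving bin is only a candidate; the read still has to be aligned there. The filter pays off by discarding most bins with cheap membership checks.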
[arxiv.org Version (pdf)]

## GRIM-Filter: Fast seed location filtering in DNA read mapping using processing-in-memory technologies
Jeremie S. Kim<sup>1,6\*</sup>, Damla Senol Cali<sup>1</sup>, Hongyi Xin<sup>2</sup>, Donghyuk Lee<sup>3</sup>, Saugata Ghose<sup>1</sup>, Mohammed Alser<sup>4</sup>, Hasan Hassan<sup>6</sup>, Oguz Ergin<sup>5</sup>, Can Alkan<sup>4\*</sup> and Onur Mutlu<sup>6,1\*</sup>
From The Sixteenth Asia Pacific Bioinformatics Conference 2018, Yokohama, Japan. 15-17 January 2018

## GenASM: Fast Approximate String Matching
Damla Senol Cali, Gurpreet S. Kalsi, Zulal Bingol, Can Firtina, Lavanya Subramanian, Jeremie S. Kim, Rachata Ausavarungnirun, Mohammed Alser, Juan Gomez-Luna, Amirali Boroumand, Anant Nori, Allison Scibisz, Sreenivas Subramoney, Can Alkan, Saugata Ghose, and Onur Mutlu,
"GenASM: A High-Performance, Low-Power Approximate String Matching Acceleration Framework for Genome Sequence Analysis"
Proceedings of the 53rd International Symposium on Microarchitecture (MICRO), Virtual, October 2020.
[Lightning Talk Video (1.5 minutes)] [Lightning Talk Slides (pptx) (pdf)] [Talk Video (18 minutes)] [Slides (pptx) (pdf)]

#### GenASM: A High-Performance, Low-Power Approximate String Matching Acceleration Framework for Genome Sequence Analysis
Damla Senol Cali<sup>†™</sup> Gurpreet S. Kalsi<sup>™</sup> Zülal Bingöl<sup>▽</sup> Can Firtina<sup>⋄</sup> Lavanya Subramanian<sup>‡</sup> Jeremie S.
Kim<sup>⋄†</sup> Rachata Ausavarungnirun<sup>⊙</sup> Mohammed Alser<sup>⋄</sup> Juan Gomez-Luna<sup>⋄</sup> Amirali Boroumand<sup>†</sup> Anant Nori<sup>™</sup> Allison Scibisz<sup>†</sup> Sreenivas Subramoney<sup>™</sup> Can Alkan<sup>▽</sup> Saugata Ghose<sup>\*†</sup> Onur Mutlu<sup>⋄†▽</sup>
<sup>†</sup>Carnegie Mellon University <sup>™</sup>Processor Architecture Research Lab, Intel Labs <sup>▽</sup>Bilkent University <sup>⋄</sup>ETH Zürich <sup>‡</sup>Facebook <sup>⊙</sup>King Mongkut's University of Technology North Bangkok <sup>\*</sup>University of Illinois at Urbana–Champaign

## More on Genome Analysis: A Survey
Mohammed Alser, Zulal Bingol, Damla Senol Cali, Jeremie Kim, Saugata Ghose, Can Alkan, and Onur Mutlu,
"Accelerating Genome Analysis: A Primer on an Ongoing Journey"
IEEE Micro (IEEE MICRO), Vol. 40, No. 5, pages 65-75, September/October 2020.
[Slides (pptx)(pdf)] [Talk Video (1 hour 2 minutes)]

Accelerating Genome Analysis: A Primer on an Ongoing Journey
- Mohammed Alser, ETH Zürich
- Zülal Bingöl, Bilkent University
- Damla Senol Cali, Carnegie Mellon University
- Jeremie Kim, ETH Zurich and Carnegie Mellon University
- Saugata Ghose, University of Illinois at Urbana–Champaign and Carnegie Mellon University
- Can Alkan, Bilkent University
- Onur Mutlu, ETH Zurich, Carnegie Mellon University, and Bilkent University

## More on Genome Analysis: Another Lecture
Onur Mutlu,
"Accelerating Genome Analysis: A Primer on an Ongoing Journey"
Keynote talk at 2nd Workshop on Accelerator Architecture in Computational Biology and Bioinformatics (AACBB), Washington, DC, USA, February 2019.
[Slides (pptx)(pdf)] [Video]

Accelerating Genome Analysis: A Primer on an Ongoing Journey. Onur Mutlu, omutlu@gmail.com, https://people.inf.ethz.ch/omutlu, 16 February 2019, AACBB Keynote Talk

## More on Genome Analysis: Another Lecture
Computer Arch.
\- Lecture 3a: Introduction to Genome Sequence Analysis (ETH Zürich, Spring 2020)

#### Data Overwhelms Modern Machines
- In-memory Databases
- Graph/Tree Processing
- In-Memory Data Analytics [Clapp+ (Intel), IISWC'15; Awan+, BDCloud'15]
- Datacenter Workloads [Kanev+ (Google), ISCA'15]

## Data → performance & energy bottleneck

#### Data Overwhelms Modern Machines
- TensorFlow Mobile
- VP9 (Google's video codec)
- YouTube Video Playback

Data → performance & energy bottleneck

#### Data Movement Overwhelms Modern Machines
Amirali Boroumand, Saugata Ghose, Youngsok Kim, Rachata Ausavarungnirun, Eric Shiu, Rahul Thakur, Daehyun Kim, Aki Kuusela, Allan Knies, Parthasarathy Ranganathan, and Onur Mutlu,
"Google Workloads for Consumer Devices: Mitigating Data Movement Bottlenecks"
Proceedings of the 23rd International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), Williamsburg, VA, USA, March 2018.

### 62.7% of the total system energy is spent on data movement

### Google Workloads for Consumer Devices: Mitigating Data Movement Bottlenecks
Amirali Boroumand<sup>1</sup> Rachata Ausavarungnirun<sup>1</sup> Aki Kuusela<sup>3</sup> Allan Knies<sup>3</sup> Saugata Ghose<sup>1</sup> Youngsok Kim<sup>2</sup> Eric Shiu<sup>3</sup> Rahul Thakur<sup>3</sup> Daehyun Kim<sup>4,3</sup> Parthasarathy Ranganathan<sup>3</sup> Onur Mutlu<sup>5,1</sup>

# Many Interesting Things Are Happening Today in Computer Architecture

## Many Novel Concepts Investigated Today
- New Computing Paradigms (Rethinking the Full Stack)
  - Processing in Memory, Processing Near Data
  - Neuromorphic Computing
  - Fundamentally Secure and Dependable Computers
- New Accelerators (Algorithm-Hardware Co-Designs)
  - Artificial Intelligence & Machine Learning
  - Graph Analytics
  - Genome Analysis
- New Memories and Storage Systems
  - Non-Volatile Main Memory
  - Intelligent Memory

## Increasingly Demanding Applications
Dream, and they will come
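Why data movement can dominate system energy follows from simple arithmetic once a memory access costs far more than a computation. The sketch below assumes, hypothetically, that one main-memory access costs `mem_energy_ratio` times the energy of one arithmetic operation (1000 by default, an illustrative value) and ignores every other energy component; the function name is invented. It is a back-of-the-envelope model, not the methodology of the ASPLOS 2018 study cited above.

```python
def data_movement_fraction(num_ops, num_mem_accesses, mem_energy_ratio=1000.0):
    """Fraction of total energy spent on memory accesses in a two-term model.

    Energy is measured in units of one arithmetic operation; each memory
    access is assumed to cost `mem_energy_ratio` such units.
    """
    compute_energy = num_ops * 1.0
    movement_energy = num_mem_accesses * mem_energy_ratio
    return movement_energy / (compute_energy + movement_energy)


# One main-memory access per 1000 operations already splits energy 50/50.
print(data_movement_fraction(num_ops=1000, num_mem_accesses=1))  # 0.5
# One access per 100 operations: about 91% of the energy is data movement.
print(round(data_movement_fraction(num_ops=100, num_mem_accesses=1), 3))  # 0.909
```

Under this model, reducing how often data crosses the memory interface (for example, by computing where the data resides) pays off far more than making the arithmetic itself cheaper.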
## Increasingly Diverging/Complex Tradeoffs
A memory access consumes ~1000X the energy of a complex addition

## Increasingly Complex Systems
[Figure: Past systems]

## Computer Architecture Today
- Computing landscape is very different from 10-20 years ago
- Applications and technology both demand novel architectures

## Computer Architecture Today (II)
- You can revolutionize the way computers are built, if you understand both the hardware and the software (and change each accordingly)
- You can invent new paradigms for computation, communication, and storage
- Recommended book: Thomas Kuhn, "The Structure of Scientific Revolutions" (1962)
  - Pre-paradigm science: no clear consensus in the field
  - Normal science: dominant theory used to explain/improve things (business as usual); exceptions considered anomalies
  - Revolutionary science: underlying assumptions re-examined

## Takeaways
- It is an exciting time to be understanding and designing computing architectures
- Many challenging and exciting problems in platform design
  - That no one has tackled (or thought about) before
  - That can have huge impact on the world's future
- Driven by huge hunger for data (Big Data), new applications (ML/AI, graph analytics, genomics), ever-greater realism, ...
  - We can easily collect more data than we can analyze/understand
- Driven by significant difficulties in keeping up with that hunger at the technology layer
  - Five walls: Energy, reliability, complexity, security, scalability

## You Can Watch the Remaining Part of This Lecture Here:
https://www.youtube.com/watch?v=3q8MnVenMMQ

#### Let's Start with Some Fundamentals

# Question: What Is This?

#### Answer: The First Major Piece of a Famous Architect
- Bahnhof Stadelhofen: "The train station has several of the features that became signatures of his work; straight lines and right angles are rare."
- ETH Alumnus, PhD in Civil Engineering

Santiago Calatrava Valls (born 28 July 1951) is a Spanish architect, structural engineer, sculptor and painter, particularly known for his bridges supported by single leaning pylons, and his railway stations, stadiums, and museums, whose sculptural forms often resemble living organisms.<sup>[1]</sup> His best-known works include the Milwaukee Art Museum, the Turning Torso tower in Malmö, Sweden, the Margaret Hunt Hill Bridge in Dallas, Texas, and the Museum of Tomorrow in Rio de Janeiro, ...

# Compare To This

# Question 2: What Is This?

# Answer: Masterpiece of a Famous Architect

#### Design
Calatrava said that the Oculus resembles a bird being released from a child's hand. The roof was originally designed to mechanically open to increase light and ventilation to the enclosed space. Herbert Muschamp, architecture critic of *The New York Times*, compared the design to the Bethesda Terrace and Fountain in Central Park, and wrote in 2004:

# Strengths and Praise
Santiago Calatrava's design for the World Trade Center PATH station should satisfy those who believe that buildings planned for ground zero must aspire to a spiritual dimension.
Over the years, many people have discerned a metaphysical element in Mr. Calatrava's work. I hope New Yorkers will detect its presence, too. With deep appreciation, I congratulate the Port Authority for commissioning Mr. Calatrava, the great Spanish architect and engineer, to design a building with the power to shape the future of New York. It is a pleasure to report, for once, that public officials are not overstating the case when they describe a design as breathtaking.<sup>[43]</sup>

#### Design Constraints and Criticism
However, Calatrava's original soaring spike design was scaled back because of security issues. The *New York Times* observed in 2005:

In the name of security, Santiago Calatrava's bird has grown a beak. Its ribs have doubled in number and its wings have lost their interstices of glass.... [T]he main transit hall, between Church and Greenwich Streets, will almost certainly lose some of its delicate quality, while gaining structural expressiveness. It may now evoke a slender stegosaurus more than it does a bird.<sup>[45]</sup>

#### Stegosaurus
From Wikipedia, the free encyclopedia

Stegosaurus (/stɛgəˈsɔːrəs/<sup>[1]</sup>) is a genus of armored dinosaur. Fossils of this genus date to the Late Jurassic period, where they are found in Kimmeridgian to early Tithonian aged strata, between 155 and 150 million years ago, in the western United States and Portugal. Several ...

#### Design Constraints: No One is Immune
The design was further modified in 2008 to eliminate the opening and closing roof mechanism because of budget and space constraints.<sup>[46]</sup> The Transportation Hub has been dubbed "the world's most expensive transportation hub" for its massive cost for reconstruction: \$3.74 billion.<sup>[48][58]</sup> By contrast, the proposed two-mile PATH extension ...

# Question: What Is This?

#### Answer: Masterpiece of Another Famous Architect

# Fallingwater
From Wikipedia, the free encyclopedia

Fallingwater or Kaufmann Residence is a house designed by architect Frank Lloyd Wright in 1935 in rural southwestern Pennsylvania, 43 miles (69 km) southeast of Pittsburgh.<sup>[4]</sup> The home was built partly over a waterfall on Bear Run in the Mill Run section of Stewart Township, Fayette County, Pennsylvania, in the Laurel Highlands of the Allegheny Mountains. *Time* cited it after its completion as Wright's "most beautiful job";<sup>[5]</sup> it is listed among Smithsonian's Life List of 28 places "to visit before you die."<sup>[6]</sup> It was designated a National Historic Landmark in 1966.<sup>[3]</sup> In 1991, members of the American Institute of Architects named the house the "best all-time work of American architecture" and in 2007, it was ranked twenty-ninth on the list of America's Favorite Architecture according to the AIA.
#### Your First Comp Arch Assignment
- Go and visit Bahnhof Stadelhofen
  - Extra credit: Repeat for Oculus
  - Extra+ credit: Repeat for Fallingwater
- Appreciate the beauty & out-of-the-box and creative thinking
- Think about tradeoffs in the design of the Bahnhof
  - Strengths, weaknesses, goals of design
- Derive principles on your own for good design and innovation
- Due date: Any time during this course
  - Later during the course is better
  - Apply what you have learned in this course
  - Think out-of-the-box

# But First, Today's First Assignment
Find The Differences Of This and That

# Find The Differences of This and That
[Figure: two designs, labeled "This" and "That"]

# Many Tradeoffs Between Two Designs
You can list them after you complete the first assignment...

# Aside: Evaluation Criteria for the Designs
- Functionality (Does it meet the specification?)
- Reliability
- Space requirement
- Cost
- Expandability
- Comfort level of users
- Happiness level of users
- Aesthetics
- ...
- How to evaluate goodness of design is always a critical question.

#### A Key Question
- How was Calatrava able to design his buildings, especially his key ones?
- Can have many guesses
  - (Ultra) hard work, perseverance, dedication (over decades)
  - Experience
  - Creativity, Out-of-the-box thinking
  - A good understanding of past designs
  - Good judgment and intuition
  - Strong skill combination (math, architecture, art, engineering, ...)
  - Funding (\$\$\$\$), luck, initiative, entrepreneurialism
  - Strong understanding of and commitment to fundamentals
  - Principled design
  - ...
- (You will be exposed to and hopefully develop/enhance many of these skills in this course)

# Principled Design
- "To me, there are two overriding principles to be found in nature which are most appropriate for building:
  - one is the optimal use of material,
  - the other the capacity of organisms to change shape, to grow, and to move."
- Santiago Calatrava

"Calatrava's constructions are inspired by natural forms like plants, bird wings, and the human body."

#### Gare do Oriente, Lisbon, Revisited

#### A Principled Design

#### Zoomorphic architecture
From Wikipedia, the free encyclopedia

**Zoomorphic architecture** is the practice of using animal forms as the inspirational basis and blueprint for architectural design. "While animal forms have always played a role adding some of the deepest layers of meaning in architecture, it is now becoming evident that a new strand of biomorphism is emerging where the meaning derives not from any specific representation but from a more general allusion to biological processes."<sup>[1]</sup> Some well-known examples of Zoomorphic architecture can be found in the TWA Flight Center building in New York City, by Eero Saarinen, or the Milwaukee Art Museum by Santiago Calatrava, both inspired by the form of a bird's wings.<sup>[3]</sup>

#### What Does This Remind You Of?

#### What About This?

#### A Quote from The Other Famous Architect
"architecture [...] based upon principle, and not upon precedent" (Frank Lloyd Wright)
Source: http://www.fallingwater.org/

# A Principled Design

# Organic architecture
From Wikipedia, the free encyclopedia

Organic architecture is a philosophy of architecture which promotes harmony between human habitation and the natural world through design approaches so sympathetic and well integrated with its site, that buildings, furnishings, and surroundings become part of a unified, interrelated composition. A well-known example of organic architecture is Fallingwater, the residence Frank Lloyd Wright designed for the Kaufmann family in rural Pennsylvania. Wright had many choices to locate a home on this large site, but chose to place the home directly over the waterfall and creek creating a close, yet noisy dialog with the rushing water and the steep site.
The horizontal striations of stone masonry with daring cantilevers of colored beige concrete blend with native rock outcroppings and the wooded environment.

#### Another View

#### Yet Another View

#### Major High-Level Goals of This Course
- Understand the principles
- Understand the precedents
- Based on such understanding:
  - Enable you to evaluate tradeoffs of different designs and ideas
  - Enable you to develop principled designs
  - Enable you to develop novel, out-of-the-box designs
- The focus is on:
  - Principles, precedents, and how to use them for new designs
  - In Computer Architecture

# Role of the (Computer) Architect

#### Role of the Architect
- Look Backward (Examine old code)
- Look forward (Listen to the dreamers)
- Look Up (Nature of the problems)
- Look Down (Predict the future of technology)

#### Role of The (Computer) Architect
- Look backward (to the past)
  - Understand tradeoffs and designs, upsides/downsides, past workloads. Analyze and evaluate the past.
- Look forward (to the future)
  - Be the dreamer and create new designs. Listen to dreamers.
  - Push the state of the art. Evaluate new design choices.
- Look up (towards problems in the computing stack)
  - Understand important problems and their nature.
  - Develop architectures and ideas to solve important problems.
- Look down (towards device/circuit technology)
  - Understand the capabilities of the underlying technology.
  - Predict and adapt to the future of technology (you are designing for N years ahead). Enable the future technology.
#### Takeaways
- Being an architect is not easy
- You need to consider many things in designing a new system + have good intuition/insight into ideas/tradeoffs
- But, it is fun and can be very rewarding
- And, enables a great future
  - E.g., many scientific and everyday-life innovations would not have been possible without architectural innovation that enabled very high performance systems
  - E.g., your mobile phones
  - E.g., self-driving vehicles
- This course will enable you to become a good computer architect

#### So, I Hope You Are Here for This
- How does an assembly program end up executing as digital logic?
- What happens in-between?
- How is a computer designed using logic gates and wires to satisfy specific goals?

[Figure: Comp. Systems: "C" as a model of computation (programmer's view of how a computer system works). Architect/microarchitect's view: how to design a computer that meets system design goals; choices critically affect both the SW programmer and the HW designer. Digital Design: digital logic as a model of computation (HW designer's view of how a computer system works).]

#### Levels of Transformation
"The purpose of computing is [to gain] insight" (*Richard Hamming*)
We gain and generate insight by solving problems.
How do we ensure problems are solved by electrons?

Problem → Algorithm → Program/Language → Runtime System (VM, OS, MM) → ISA (Architecture) → Microarchitecture → Logic → Devices → Electrons

- **Algorithm**: Step-by-step procedure that is guaranteed to terminate, where each step is precisely stated and can be carried out by a computer
  - Finiteness
  - Definiteness
  - Effective computability
  - Many algorithms for the same problem
- **ISA (Instruction Set Architecture)**: Interface/contract between SW and HW. What the programmer assumes hardware will satisfy.
- **Microarchitecture**: An implementation of the ISA
- **Digital logic circuits**: Building blocks of micro-arch (e.g., gates)

# Aside: A Famous Work By Hamming
- Hamming, "Error Detecting and Error Correcting Codes," Bell System Technical Journal 1950.
- Introduced the concept of Hamming distance
  - number of locations in which the corresponding symbols of two equal-length strings are different
- Developed a theory of codes used for error detection and correction
- Also see:
  - Hamming, "You and Your Research," Talk at Bell Labs, 1986.
  - http://www.cs.virginia.edu/~robins/YouAndYourResearch.html

#### Levels of Transformation, Revisited
A user-centric view: computer designed for users
The entire stack should be optimized for the user

#### The Power of Abstraction
Levels of transformation create abstractions
- Abstraction: A higher level only needs to know about the interface to the lower level, not how the lower level is implemented
  - E.g., high-level language programmer does not really need to know what the ISA is and how a computer executes instructions
- Abstraction improves productivity
  - No need to worry about decisions made in underlying levels
  - E.g., programming in Java vs. C vs. assembly vs. binary vs. by specifying control signals of each transistor every cycle
- Then, why would you want to know what goes on underneath or above?

#### Crossing the Abstraction Layers
As long as everything goes well, not knowing what happens underneath (or above) is not a problem.

#### What if
- The program you wrote is running slow?
- The program you wrote does not run correctly?
- The program you wrote consumes too much energy?
- Your system just shut down and you have no idea why?
- Someone just compromised your system and you have no idea how?

#### What if
- The hardware you designed is too hard to program?
- The hardware you designed is too slow because it does not provide the right primitives to the software?

#### What if
- You want to design a much more efficient and higher-performance system?
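The Hamming distance defined in the Hamming aside is straightforward to compute, and the classic Hamming(7,4) code shows how keeping codewords far apart enables correction: any two codewords differ in at least 3 bits, so a single flipped bit leaves the received word closer to the original than to any other codeword. The sketch below is an illustrative Python version; the function names and the bit layout (parity bits at positions 1, 2, and 4) follow the usual textbook convention and are not taken from the lecture.

```python
def hamming_distance(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance is defined only for equal-length inputs")
    return sum(x != y for x, y in zip(a, b))


def hamming74_encode(data):
    """Encode 4 data bits as 7 bits laid out [p1, p2, d1, p3, d2, d3, d4] (positions 1..7)."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # even parity over positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # even parity over positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # even parity over positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(code):
    """Correct at most one flipped bit and return the 4 data bits."""
    code = list(code)
    syndrome = 0
    for position in range(1, 8):
        if code[position - 1]:
            syndrome ^= position  # XOR of the positions that hold a 1
    if syndrome:  # a non-zero syndrome is the position of the flipped bit
        code[syndrome - 1] ^= 1
    return [code[2], code[4], code[5], code[6]]


print(hamming_distance("karolin", "kathrin"))  # 3

word = hamming74_encode([1, 0, 1, 1])
print(word)  # [0, 1, 1, 0, 0, 1, 1]
corrupted = word.copy()
corrupted[4] ^= 1  # flip one bit in transit
print(hamming74_decode(corrupted))  # [1, 0, 1, 1]
```

For a valid codeword the XOR of the positions holding a 1 is zero, so flipping the bit at position p makes the syndrome exactly p, which is what lets the decoder locate and undo the error.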
#### Crossing the Abstraction Layers
- Two key goals of this course are
  - to understand how a processor works underneath the software layer and how decisions made in hardware affect the software/programmer
  - to enable you to be comfortable in making design and optimization decisions that cross the boundaries of different layers and system components

#### An Example: Multi-Core Systems
[Figure: Multi-Core Chip]
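A small, hypothetical illustration of why crossing the abstraction layers matters: the `matrix_sum` function below is written only against a `read(addr)` interface and returns the same answer for either traversal order, but the cost hidden below the interface differs sharply. `ToyDRAM` models a memory with a single open row (a row buffer), where an access to the already-open row is cheaper than one that must open a new row; the costs, the row size, and all names are invented for illustration.

```python
from abc import ABC, abstractmethod


class Memory(ABC):
    """The interface the higher layer sees: nothing but read(addr)."""

    @abstractmethod
    def read(self, addr):
        """Return the value stored at `addr`."""


class ToyDRAM(Memory):
    """Hypothetical memory with one open row (a row buffer) and made-up costs."""

    ROW_HIT_COST = 1    # access to the row that is already open
    ROW_MISS_COST = 10  # access that must first open a different row

    def __init__(self, data, row_size):
        self.data = data
        self.row_size = row_size
        self.open_row = None
        self.cycles = 0

    def read(self, addr):
        row = addr // self.row_size
        self.cycles += self.ROW_HIT_COST if row == self.open_row else self.ROW_MISS_COST
        self.open_row = row
        return self.data[addr]


def matrix_sum(mem, n, column_major=False):
    """Sum an n x n matrix stored row by row in `mem`, using only the Memory interface."""
    total = 0
    for i in range(n):
        for j in range(n):
            row, col = (j, i) if column_major else (i, j)
            total += mem.read(row * n + col)
    return total


row_major = ToyDRAM(list(range(64)), row_size=8)
col_major = ToyDRAM(list(range(64)), row_size=8)
print(matrix_sum(row_major, 8), row_major.cycles)                     # 2016 136
print(matrix_sum(col_major, 8, column_major=True), col_major.cycles)  # 2016 640
```

Nothing in `matrix_sum`'s interface reveals the difference; only knowing how the layer below behaves (here, that consecutive accesses to the same row are cheaper) explains why one loop order is over four times costlier in this model.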