
Kunle Olukotun
Cadence Design Systems Professor and Professor of Electrical Engineering
Bio
Kunle Olukotun is the Cadence Design Systems Professor in the School of Engineering and Professor of Electrical Engineering and Computer Science at Stanford University. Olukotun is well known as a pioneer in multicore processor design and as the leader of the Stanford Hydra chip multiprocessor (CMP) research project. Olukotun founded Afara Websystems to develop high-throughput, low-power multicore processors for server systems. The Afara multicore processor, called Niagara, was acquired by Sun Microsystems. Niagara-derived processors now power all Oracle SPARC-based servers. Olukotun currently directs the Stanford Pervasive Parallelism Lab (PPL), which seeks to proliferate the use of heterogeneous parallelism in all application areas using domain-specific languages (DSLs).
Academic Appointments
- Professor, Electrical Engineering
- Professor, Computer Science
- Faculty Affiliate, Institute for Human-Centered Artificial Intelligence (HAI)
- Member, Wu Tsai Neurosciences Institute
Honors & Awards
- Fellow, ACM (2007)
- Fellow, IEEE (2007)
Professional Education
- PhD, Michigan (1991)
2020-21 Courses
- Digital Systems Design Lab
EE 109 (Spr)
- Parallel Computing
CS 149 (Aut)
Independent Studies (21)
- Advanced Reading and Research
CS 499 (Aut, Win, Spr, Sum)
- Advanced Reading and Research
CS 499P (Aut, Win, Spr, Sum)
- Computer Laboratory
CS 393 (Aut, Win, Spr, Sum)
- Curricular Practical Training
CS 390A (Aut, Win, Spr, Sum)
- Curricular Practical Training
CS 390B (Aut, Win, Spr, Sum)
- Curricular Practical Training
CS 390C (Aut, Win, Spr, Sum)
- Independent Database Project
CS 395 (Aut, Win, Spr, Sum)
- Independent Project
CS 399 (Aut, Win, Spr, Sum)
- Independent Project
CS 399P (Aut, Win, Spr, Sum)
- Independent Work
CS 199 (Aut, Win, Spr, Sum)
- Independent Work
CS 199P (Aut, Win, Spr, Sum)
- Master's Thesis and Thesis Research
EE 300 (Aut, Win, Spr, Sum)
- Part-time Curricular Practical Training
CS 390D (Aut, Win)
- Programming Service Project
CS 192 (Aut, Win, Spr, Sum)
- Senior Project
CS 191 (Aut, Win, Spr, Sum)
- Special Studies and Reports in Electrical Engineering
EE 191 (Aut, Win, Spr)
- Special Studies and Reports in Electrical Engineering
EE 391 (Aut, Win, Spr, Sum)
- Special Studies and Reports in Electrical Engineering (WIM)
EE 191W (Aut, Win, Spr)
- Special Studies or Projects in Electrical Engineering
EE 190 (Aut, Win, Spr)
- Special Studies or Projects in Electrical Engineering
EE 390 (Aut, Win, Spr, Sum)
- Writing Intensive Senior Project (WIM)
CS 191W (Aut, Win, Spr)
Prior Year Courses
2019-20 Courses
- Digital Systems Design Lab
EE 109 (Spr)
- Hardware Accelerators for Machine Learning
CS 217 (Win)
- Parallel Computing
CS 149 (Aut)
2018-19 Courses
- Digital Systems Design Lab
EE 109 (Spr)
- Hardware Accelerators for Machine Learning
CS 217 (Aut)
- Parallel Computing
CS 149 (Win)
2017-18 Courses
- Digital Systems Design Lab
EE 109 (Spr)
- Parallel Computing
CS 149 (Win)
Stanford Advisees
- Doctoral Dissertation Reader (AC)
Elaina Chai, Robert Radway, Eshan Singh, Sahaana Suri
- Doctoral Dissertation Advisor (AC)
Anand Atreya, Matthew Feldman, Stefan Hadjis, Tushar Swamy, Matthew Vilim, Tian Zhao
- Master's Program Advisor
Andrew Benson, Nick Comly, Gerardo Gomez Martinez, Xuyi Guo, Yao Hsiao, Abisola Olawale, Taresh Sethi, Marc Vaz, Megan Zhang
- Doctoral (Program)
Anand Atreya, Sneha Goenka, Stefan Hadjis, Olivia Hsu, Taeyoung Kong, Alex Rucker, Nathan Zhang, Tian Zhao
All Publications
-
Plasticine: A Reconfigurable Architecture For Parallel Patterns
ISCA '17: 44th International Symposium on Computer Architecture, June 2017
2017
View details for DOI 10.1145/3079856.3080256
-
Beyond Parallel Programming with Domain Specific Languages
ACM SIGPLAN NOTICES
2014; 49 (8): 179-179
View details for DOI 10.1145/2555243.2557966
View details for Web of Science ID 000349142100016
-
Delite: A Compiler Architecture for Performance-Oriented Embedded Domain-Specific Languages
ACM TRANSACTIONS ON EMBEDDED COMPUTING SYSTEMS
2014; 13
View details for DOI 10.1145/2584665
View details for Web of Science ID 000341390100017
-
Surgical Precision JIT Compilers
35th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI)
ASSOC COMPUTING MACHINERY. 2014: 41–52
View details for DOI 10.1145/2594291.2594316
View details for Web of Science ID 000344455800008
-
Forge: Generating a High Performance DSL Implementation from a Declarative Specification
ACM SIGPLAN NOTICES
2014; 49 (3): 145-154
View details for DOI 10.1145/2517208.2517220
View details for Web of Science ID 000338625500017
-
Optimizing Data Structures in High-Level Programs: New Directions for Extensible Compilers based on Staging
ACM SIGPLAN NOTICES
2013; 48 (1): 497-510
View details for DOI 10.1145/2480359.2429128
View details for Web of Science ID 000318629900042
-
High Performance Embedded Domain Specific Languages
ACM SIGPLAN NOTICES
2012; 47 (9): 139-139
View details for DOI 10.1145/2398856.2364548
View details for Web of Science ID 000311296000014
-
Green-Marl: A DSL for Easy and Efficient Graph Analysis
ACM SIGPLAN NOTICES
2012; 47 (4): 349-362
View details for Web of Science ID 000209339300029
-
IMPLEMENTING DOMAIN-SPECIFIC LANGUAGES FOR HETEROGENEOUS PARALLEL COMPUTING
IEEE MICRO
2011; 31 (5): 42-52
View details for Web of Science ID 000295883700006
-
Accelerating CUDA Graph Algorithms at Maximum Warp
ACM SIGPLAN NOTICES
2011; 46 (8): 267-276
View details for Web of Science ID 000296264900027
-
A Domain-Specific Approach To Heterogeneous Parallelism
ACM SIGPLAN NOTICES
2011; 46 (8): 35-45
View details for Web of Science ID 000296264900005
-
Hardware Acceleration of Transactional Memory on Commodity Systems
ACM SIGPLAN NOTICES
2011; 46 (3): 27-38
View details for DOI 10.1145/1961296.1950372
View details for Web of Science ID 000290854400004
- Implementing Domain-Specific Languages for Heterogeneous Parallel Computing IEEE Micro: Special Issue on CPU, GPU, and Hybrid Computing 2011
- Building-Blocks for Performance Oriented DSLs 2011
- OptiML: An Implicitly Parallel Domain-Specific Language for Machine Learning 2011
- Efficient Parallel Graph Exploration on Multi-Core CPU and GPU 2011
- A Heterogeneous Parallel Framework for Domain-Specific Languages 2011
-
Language Virtualization for Heterogeneous Parallel Computing
Conference on Object Oriented Programming Systems, Languages and Applications/SPLASH 2010
ASSOC COMPUTING MACHINERY. 2010: 835–47
View details for DOI 10.1145/1932682.1869527
View details for Web of Science ID 000286595800051
-
A Practical Concurrent Binary Search Tree
ACM SIGPLAN NOTICES
2010; 45 (5): 257-268
View details for Web of Science ID 000280548100024
-
UBIQUITOUS PARALLEL COMPUTING FROM BERKELEY, ILLINOIS, AND STANFORD
IEEE MICRO
2010; 30 (2): 41-55
View details for Web of Science ID 000276473900006
- A Large-scale Architecture for Restricted Boltzmann Machines 2010
- FARM: A Prototyping Environment for Tightly-Coupled, Heterogeneous Architectures 2010
- Implementing and Evaluating Nested Parallel Transactions in Software Transactional Memory 2010
- Transactional Predication: High-Performance Concurrent Sets and Maps for STM 2010
- EigenBench: A Simple Exploration Tool for Orthogonal TM Characteristics 2010
- CCSTM: A Library-Based STM for Scala 2010
- Making Nested Parallel Transactions Practical using Lightweight Hardware Support 2010
- Implementing and Evaluating a Model Checker for Transactional Memory Systems 2010
- A Highly Scalable Restricted Boltzmann Machine FPGA Implementation 2009
-
Feedback-Directed Barrier Optimization in a Strongly Isolated STM
ACM SIGPLAN NOTICES
2009; 44 (1): 213-225
View details for Web of Science ID 000272013800020
-
Improving Software Concurrency with Hardware-assisted Memory Snapshot
20th ACM Symposium on Parallelism in Algorithms and Architectures
ASSOC COMPUTING MACHINERY. 2008: 363–363
View details for Web of Science ID 000266217200050
-
STAMP: Stanford Transactional Applications for Multi-Processing
IEEE International Symposium on Workload Characterization
IEEE. 2008: 31–42
View details for Web of Science ID 000263063500004
-
ASeD: Availability, Security, and Debugging Support using Transactional Memory
20th ACM Symposium on Parallelism in Algorithms and Architectures
ASSOC COMPUTING MACHINERY. 2008: 366–366
View details for Web of Science ID 000266217200053
-
Transactional memory: The hardware-software interface
IEEE MICRO
2007; 27 (1): 67-76
View details for Web of Science ID 000246455000009
-
An Effective Hybrid Transactional Memory System with Strong Isolation Guarantees
34th Annual International Symposium on Computer Architecture
ASSOC COMPUTING MACHINERY. 2007: 69–80
View details for Web of Science ID 000265786200007
-
Transactional Collection Classes
ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
ASSOC COMPUTING MACHINERY. 2007: 56–67
View details for Web of Science ID 000266870900006
-
A Practical FPGA-based Framework for Novel CMP Research
15th ACM/SIGDA International Symposium on Field-Programmable Gate Arrays
ASSOC COMPUTING MACHINERY. 2007: 116–125
View details for Web of Science ID 000268330100013
-
Towards Soft Optimization Techniques for Parallel Cognitive Applications
19th Annual Symposium on Parallelism in Algorithms and Architectures
ASSOC COMPUTING MACHINERY. 2007: 59–60
View details for Web of Science ID 000266371200009
-
A scalable, non-blocking approach to transactional memory
13th International Symposium on High-Performance Computer Architecture
IEEE COMPUTER SOC. 2007: 97–108
View details for Web of Science ID 000245463100010
-
ATLAS: A chip-multiprocessor with Transactional Memory support
Design, Automation and Test in Europe Conference and Exhibition (DATE 07)
IEEE. 2007: 3–8
View details for Web of Science ID 000252175700001
-
Executing Java programs with transactional memory
OOPSLA Workshop on Synchronization and Concurrency in Object-Oriented Languages
ELSEVIER SCIENCE BV. 2006: 111–29
View details for DOI 10.1016/j.scico.2006.05.006
View details for Web of Science ID 000241921200002
-
Tradeoffs in transactional memory virtualization
ACM SIGPLAN NOTICES
2006; 41 (11): 371-381
View details for Web of Science ID 000202972600035
-
The Atomos transactional programming language
ACM SIGPLAN NOTICES
2006; 41 (6): 1-13
View details for Web of Science ID 000202972100001
- The Atomos Transactional Programming Language 2006
-
Architectural semantics for practical Transactional Memory
33rd International Symposium on Computer Architecture
IEEE COMPUTER SOC. 2006: 53–64
View details for Web of Science ID 000238976500005
-
The common case transactional behavior of multithreaded programs
12th International Symposium on High-Performance Computer Architecture
IEEE COMPUTER SOC. 2006: 271–282
View details for Web of Science ID 000237200400026
- The Software Stack for Transactional Memory: Challenges and Opportunities 2006
- Tradeoffs in Transactional Memory Virtualizations 2006
-
Niagara: A 32-way multithreaded SPARC processor
IEEE MICRO
2005; 25 (2): 21-29
View details for Web of Science ID 000228487000004
- The Future of Microprocessors ACM QUEUE Magazine 2005
-
Maximizing CMP throughput with mediocre cores
PACT 2005: 14TH INTERNATIONAL CONFERENCE ON PARALLEL ARCHITECTURES AND COMPILATION TECHNIQUES
2005: 51-62
View details for Web of Science ID 000233637100005
-
A new approach to programming and prototyping parallel systems
HIGH PERFORMANCE COMPUTING - HIPC 2005, PROCEEDINGS
2005; 3769: 4-4
View details for Web of Science ID 000235801700003
-
Characterization of TCC on chip-multiprocessors
14th International Conference on Parallel Architectures and Compilation Techniques
IEEE COMPUTER SOC. 2005: 63–74
View details for Web of Science ID 000233637100006
- TAPE: A Transactional Application Profiling Environment 2005
- Article about Kunle Olukotun's Niagara processor: Sun's Big Splash IEEE Spectrum Magazine 2005
- Transactional Execution of Java Programs 2005
- Exposing Speculative Thread Parallelism in SPEC2000 2005
-
Transactional coherence and consistency: Simplifying parallel hardware and software
IEEE MICRO
2004; 24 (6): 92-103
View details for Web of Science ID 000226365900013
-
Programming with transactional coherence and consistency (TCC)
11th International Conference on Architectural Support for Programming Languages and Operating Systems
ASSOC COMPUTING MACHINERY. 2004: 1–13
View details for Web of Science ID 000228341700003
- Transactional Coherence and Consistency: Simplifying Parallel Hardware and Software Micro's Top Picks, IEEE Micro 2004; 24 (6)
-
Transactional memory coherence and consistency
31st Annual International Symposium on Computer Architecture
IEEE COMPUTER SOC. 2004: 102–113
View details for Web of Science ID 000222915900009
- Niagara: A 32-Way Multithreaded SPARC Processor IEEE MICRO Magazine, March-April 2005, and presented at Hot Chips 2004
-
The Jrpm system for dynamically parallelizing sequential Java programs
IEEE MICRO
2003; 23 (6): 26-35
View details for Web of Science ID 000188257700006
-
Using thread-level speculation to simplify manual parallelization
9th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
ASSOC COMPUTING MACHINERY. 2003: 1–12
View details for Web of Science ID 000187366900001
-
The Jrpm system for dynamically parallelizing Java programs
30TH ANNUAL INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE, PROCEEDINGS
2003: 434-445
View details for Web of Science ID 000183763700037
-
TEST: A tracer for extracting speculative threads
CGO 2003: INTERNATIONAL SYMPOSIUM ON CODE GENERATION AND OPTIMIZATION
2003: 301-312
View details for Web of Science ID 000182316800026
-
Targeting dynamic compilation for embedded environments
USENIX ASSOCIATION PROCEEDINGS OF THE 2ND JAVA(TM) VIRTUAL MACHINE RESEARCH AND TECHNOLOGY SYMPOSIUM
2002: 151-164
View details for Web of Science ID 000178400500013
-
Efficient state representation for symbolic simulation
39TH DESIGN AUTOMATION CONFERENCE, PROCEEDINGS 2002
2002: 99-104
View details for Web of Science ID 000177213300018
-
High bandwidth on-chip cache design
IEEE TRANSACTIONS ON COMPUTERS
2001; 50 (4): 292-307
View details for Web of Science ID 000168145500002
-
The Stanford Hydra CMP
IEEE MICRO
2000; 20 (2): 71-84
View details for Web of Science ID 000086194900013
-
A single chip multiprocessor integrated with high density DRAM
IEICE TRANSACTIONS ON ELECTRONICS
1999; E82C (8): 1567-1577
View details for Web of Science ID 000082243400030
-
REMARC: Reconfigurable multimedia array coprocessor
IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS
1999; E82D (2): 389-397
View details for Web of Science ID 000079040600006
- The Stanford Hydra CMP IEEE MICRO Magazine, March-April 2000, and presented at Hot Chips 1999
- Improving the Performance of Speculatively Parallel Applications on the Hydra CMP 1999
-
Data speculation support for a chip multiprocessor
ACM SIGPLAN NOTICES
1998; 33 (11): 58-69
View details for Web of Science ID 000076778700008
- Considerations in the Design of Hydra: A Multiprocessor-on-a-Chip Microarchitecture Stanford University Computer Systems Lab Technical Report CSL-TR-98-749 1998
-
Digital system simulation: Methodologies and examples
35th Design Automation Conference
ASSOC COMPUTING MACHINERY. 1998: 658–663
View details for Web of Science ID 000077273700118
-
Exploiting method-level parallelism in single-threaded Java programs
International Conference on Parallel Architectures and Compilation Techniques
IEEE COMPUTER SOC. 1998: 176–184
View details for Web of Science ID 000076611700022
-
DCP: an algorithm for datapath/control partitioning of synthesizable RTL models
International Conference on Computer Design: VLSI in Computers and Processors
IEEE COMPUTER SOC PRESS. 1998: 442–449
View details for Web of Science ID 000076796900070
-
Multilevel optimization of pipelined caches
IEEE TRANSACTIONS ON COMPUTERS
1997; 46 (10): 1093-1102
View details for Web of Science ID A1997YB64800004
-
A single-chip multiprocessor
COMPUTER
1997; 30 (9): 79-?
View details for Web of Science ID A1997XU01900018
- A Single Chip Multiprocessor Integrated with DRAM 1997
-
Java as a specification language for hardware-software systems
1997 IEEE/ACM International Conference on Computer-Aided Design (ICCAD 97)
IEEE COMPUTER SOC PRESS. 1997: 690–697
View details for Web of Science ID A1997BK01U00099
-
Verifying correct pipeline implementation for microprocessors
1997 IEEE/ACM International Conference on Computer-Aided Design (ICCAD 97)
IEEE COMPUTER SOC PRESS. 1997: 162–169
View details for Web of Science ID A1997BK01U00026
-
Designing high bandwidth on-chip caches
24th Annual International Symposium on Computer Architecture
ASSOC COMPUTING MACHINERY. 1997: 121–132
View details for Web of Science ID A1997BH95B00011
- A Single-Chip Multiprocessor IEEE Computer Special Issue on "Billion-Transistor Processors" 1997
- Software and Hardware for Exploiting Speculative Parallelism with a Multiprocessor Stanford University Computer Systems Lab Technical Report CSL-TR-97-715 1997
-
The case for a single-chip multiprocessor
ACM SIGPLAN NOTICES
1996; 31 (9): 2-11
View details for Web of Science ID A1996VM12800003
-
A scalable formal verification methodology for pipelined microprocessors
33rd Design Automation Conference
ASSOC COMPUTING MACHINERY. 1996: 558–563
View details for Web of Science ID A1996BF92A00111
-
The impact of shared-cache clustering in small-scale shared-memory multiprocessors
2nd International Symposium on High-Performance Computer Architecture (HPCA-2)
IEEE COMPUTER SOC PRESS. 1996: 74–84
View details for Web of Science ID A1996BF28H00007
-
Evaluation of design alternatives for a multiprocessor microprocessor
23rd Annual International Symposium on Computer Architecture
ASSOC COMPUTING MACHINERY. 1996: 67–77
View details for Web of Science ID A1996BF68U00007
-
Emulation and prototyping of digital systems
NATO Advanced Study Institute on Hardware/Software Co-Design
SPRINGER. 1996: 339–366
View details for Web of Science ID A1996BF04R00014
-
Increasing cache port efficiency for dynamic superscalar microprocessors
23rd Annual International Symposium on Computer Architecture
ASSOC COMPUTING MACHINERY. 1996: 147–157
View details for Web of Science ID A1996BF68U00014
-
The benefits of clustering in shared address space multiprocessors: An applications-driven investigation
1995 ACM/IEEE Supercomputing Conference (SC 95)
ASSOC COMPUTING MACHINERY. 1995: 1674–1704
View details for Web of Science ID A1995BH56H00055
-
A general method for compiling event driven simulations
32nd Design Automation Conference
ASSOC COMPUTING MACHINERY. 1995: 151–156
View details for Web of Science ID A1995BD41Y00026
-
A SOFTWARE-HARDWARE COSYNTHESIS APPROACH TO DIGITAL SYSTEM SIMULATION
IEEE MICRO
1994; 14 (4): 48-58
View details for Web of Science ID A1994NZ48900009
- Rationale and Design of the Hydra Multiprocessor Stanford University Computer Systems Lab Technical Report CSL-TR-94-645 1994
-
EXPLORING THE DESIGN SPACE FOR A SHARED-CACHE MULTIPROCESSOR
21st Annual International Symposium on Computer Architecture
IEEE COMPUTER SOC PRESS. 1994: 166–175
View details for Web of Science ID A1994BA93B00015
-
ANALYSIS AND DESIGN OF LATCH-CONTROLLED SYNCHRONOUS DIGITAL CIRCUITS
IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS
1992; 11 (3): 322-333
View details for Web of Science ID A1992HB92700004
-
THE DESIGN OF A MICROSUPERCOMPUTER
COMPUTER
1991; 24 (1): 57-64
View details for Web of Science ID A1991ER66000009
-
HIERARCHICAL GATE-ARRAY ROUTING ON A HYPERCUBE MULTIPROCESSOR
JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING
1990; 8 (4): 313-324
View details for Web of Science ID A1990CY87900003
-
INTERCONNECTING OFF-THE-SHELF MICROPROCESSORS
AFIPS CONFERENCE PROCEEDINGS
1985; 54: 175-?
View details for Web of Science ID A1985ANT7800024