Bio


Dally develops efficient hardware for demanding information processing problems and sustainable energy systems. His current projects include domain-specific accelerators for deep learning, bioinformatics, and SAT solving; redesigning memory systems for the data center; developing efficient methods for video perception; and developing efficient sustainable energy systems. His research involves demonstrating novel concepts with working systems. Previous systems include the MARS Hardware Accelerator, the Torus Routing Chip, the J-Machine, the M-Machine, the Reliable Router, the Imagine signal and image processor, the Merrimac supercomputer, and the ELM embedded processor. His work on stream processing led to GPU computing. His group has pioneered techniques including fast capability-based addressing, processor coupling, virtual channel flow control, wormhole routing, link-level retry, message-driven processing, deadlock-free routing, pruning neural networks, and quantizing neural networks.

Boards, Advisory Committees, Professional Organizations


  • Member, American Academy of Arts and Sciences (2009 - Present)
  • Member, National Academy of Engineering (2010 - Present)

Professional Education


  • PhD, Caltech (1986)

Stanford Advisees


  • Doctoral Dissertation Reader (AC)
    Sneha Goenka
  • Doctoral Dissertation Advisor (AC)
    Francis Chen, Huizi Mao, Chenzhuo Zhu
  • Doctoral (Program)
    Huizi Mao, Chenzhuo Zhu

All Publications


  • INVITED: Bandwidth-Efficient Deep Learning Han, S., Dally, W. J. IEEE. 2018
  • CG-OoO: Energy-Efficient Coarse-Grain Out-of-Order Execution Near In-Order Energy with Near Out-of-Order Performance ACM TRANSACTIONS ON ARCHITECTURE AND CODE OPTIMIZATION Mohammadi, M., Aamodt, T. M., Dally, W. J. 2017; 14 (4)

    View details for DOI 10.1145/3151034

    View details for Web of Science ID 000423277300008

  • The Reconfigurable Arithmetic Processor Fiske, S., Dally, W. J.
  • Logic Simulation Algorithms for Pipelined Hardware Architectures Hardware Accelerators for Electrical CAD Agrawal, P., Dally, W. J., Tutundjian, R. edited by Ambler, T., Agrawal, P. 1988.
  • Conference Author/Panelist Index Dally, W. J., Aoki, N., Bai, X., Banerjee, K., Benini, L., Bergamaschi, R.
  • 2010 Reviewers List Dally, W. J., Acacio, M. E., Agrawal, N., Altman, E., Alur, R., Baas, B.
  • Stanford University Concurrent VLSI Architecture Memo 124 Elastic Buffer Networks-on-Chip Michelogiannakis, G., Balfour, J., Dally, W. J.
  • SSCS Members Honored as 2002 IEEE Fellows Banu, M., Burghartz, J. N., Dally, W. J., Dean, M. E., Gielen, G. G., Griffin, E. L.
  • Spills, Fills, and Kills Erez, M., Towles, B. P., Dally, W. J.
  • Program Chair’s Message Dally, W. J.
  • ISSCC 2007/SESSION 24/MULTI-GB/s TRANSCEIVERS/24.3 Palmer, R., Poulton, J., Dally, W. J., Eyles, J., Fuller, A. M., Greer, T.
  • IEEE MICRO 1998 ANNUAL INDEX, VOL. 18 Burns Dally, W. J., Adams, J., Alt, P. M., Arai, T., Arakawa, F., Avresky, D. R. ; 66: 79
  • IEEE Fellows Lead the Engineering Profession Dally, W. J., Agha, G. A., Babic, H. I., Basu, S., Beausoleil, W. F., Bertino, E.
  • Message-Driven Processor Architecture: Version 11 Dally, W. J., Chien, A., Fiske, S., Horwat, W., Keen, J., Nuth, P.
  • ISSCC 2004/SESSION 7/TD: SCALING TRENDS/7.1 Horowitz, M., Dally, W.
  • Globally Adaptive Load-Balanced Routing on k-ary n-cubes Singh, A., Dally, W. J., Towles, B., Gupta, A. K.
  • ARVLSI’97 Committees Dally, W. J., Brown, R. B., Ishii, A. T., Papaefthymiou, M. C., Mudge, T. N., June, C. S.
  • CIMI FÍITIIt Dally, W. J., Balfour, J., Black-Schaffer, D., Chen, J., Harting, R. C., Parikh, V.
  • AI Memo No. 1272 April 26, 1994 Spertus, E., Dally, W. J.
  • 5 Guest Editors’ Introduction: Hot Chips 21 Krste Asanovic and Ralph Wittig 7 Power7: IBM’s Next-Generation Server Processor Dally, W. J., Kalla, R., Sinharoy, B., Starke, W. J., Floyd, M., Conway, P.
  • 31st Annual International Symposium on Computer Architecture ISCA 2004 Dally, W. J., Agerwala, T., Taylor, M., Lee, W., Miller, J., Wentzlaff, D.
  • 1987 INDEX, VOLUME 4 Dally, W. J., Agrawal, P.
  • 6 Guest Editors’ Introduction: Top Picks from the 2008 Computer Architecture Conferences Joel Emer and Dean Tullsen 10 Larrabee: A Many-Core x86 Architecture Dally, W. J., Seiler, L., Carmean, D., Sprangle, E., Forsyth, T., Abrash, M.
  • FPGAS VERSUS GPUS IN DATACENTERS IEEE MICRO Falsafi, B., Dally, B., Singh, D., Chiou, D., Yi, J. J., Sendag, R. 2017; 37 (1): 60-72
  • Exploring the Granularity of Sparsity in Convolutional Neural Networks Mao, H., Han, S., Pool, J., Li, W., Liu, X., Wang, Y., Dally, W. J. IEEE. 2017: 1927–34
  • SCNN: An Accelerator for Compressed-sparse Convolutional Neural Networks Parashar, A., Rhu, M., Mukkara, A., Puglielli, A., Venkatesan, R., Khailany, B., Emer, J., Keckler, S. W., Dally, W. J. ASSOC COMPUTING MACHINERY. 2017: 27–40
  • Reuse Distance-Based Probabilistic Cache Replacement ACM TRANSACTIONS ON ARCHITECTURE AND CODE OPTIMIZATION Das, S., Aamodt, T. M., Dally, W. J. 2016; 12 (4)

    View details for DOI 10.1145/2818374

    View details for Web of Science ID 000367950500001

  • On-Chip Active Messages for Speed, Scalability, and Efficiency IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS Harting, R. C., Dally, W. J. 2015; 26 (2): 507-515
  • On-Demand Dynamic Branch Prediction IEEE COMPUTER ARCHITECTURE LETTERS Mohammadi, M., Han, S., Aamodt, T. M., Dally, W. J. 2015; 14 (1): 50-53
  • A 0.54 pJ/b 20 Gb/s Ground-Referenced Single-Ended Short-Reach Serial Link in 28 nm CMOS for Advanced Packaging Applications IEEE JOURNAL OF SOLID-STATE CIRCUITS Poulton, J. W., Dally, W. J., Chen, X., Eyles, J. G., Greer, T. H., Tell, S. G., Wilson, J. M., Gray, C. T. 2013; 48 (12): 3206-3218
  • Elastic Buffer Flow Control for On-Chip Networks IEEE TRANSACTIONS ON COMPUTERS Michelogiannakis, G., Dally, W. J. 2013; 62 (2): 295-309
  • A 0.54 pJ/b 20Gb/s ground-referenced single-ended short-haul serial link in 28nm CMOS for advanced packaging applications Solid-State Circuits Conference Digest of Technical Papers (ISSCC) Poulton, J. W., Dally, W. J., Chen, X., Eyles, J. G., Greer, T. H., Tell, S. G. 2013
  • Optimizing data structures in high-level programs: new directions for extensible compilers based on staging Rompf, T., Sujeeth, A. K., Amin, N., Brown, K. J., Jovanovic, V., Lee, H. 2013
  • Composition and reuse with compiled domain-specific languages Dally, W. J., Sujeeth, A. K., Rompf, T., Brown, K. J., Lee, H., Chafi, H. 2013
  • Channel reservation protocol for over-subscribed channels and destinations Michelogiannakis, G., Jiang, N., Becker, D., Dally, W. J. 2013
  • 21st century digital design tools Dally, W. J., Malachowsky, C., Keckler, S. W. 2013
  • A detailed and flexible cycle-accurate network-on-chip simulator Performance Analysis of Systems and Software (ISPASS) Jiang, N., Becker, D. U., Michelogiannakis, G., Balfour, J., Towles, B., Shaw, D. E. 2013
  • A Hierarchical Thread Scheduler and Register File for Energy-Efficient Throughput Processors ACM TRANSACTIONS ON COMPUTER SYSTEMS Gebhart, M., Johnson, D. R., Tarjan, D., Keckler, S. W., Dally, W. J., Lindholm, E., Skadron, K. 2012; 30 (2)
  • Green-Marl: A DSL for Easy and Efficient Graph Analysis ACM SIGPLAN NOTICES Hong, S., Chafi, H., Sedlar, E., Olukotun, K. 2012; 47 (4): 349-362
  • Adaptive Backpressure: Efficient Buffer Management for On-Chip Networks 30th IEEE International Conference on Computer Design (ICCD) Becker, D. U., Jiang, N., Michelogiannakis, G., Dally, W. J. IEEE. 2012: 419–426
  • Unifying primary cache, scratch, and register file memories in a throughput processor Gebhart, M., Keckler, S. W., Khailany, B., Krashinsky, R., Dally, W. J. 2012
  • A case of system-level hardware/software co-design and co-verification of a commodity multi-processor system with custom hardware Dally, W. J., Hong, S., Oguntebi, T., Casper, J., Bronson, N., Kozyrakis, C. 2012
  • Digital Design: A Systems Approach Dally, W. J., Harting, R. C. Cambridge University Press. 2012
  • Article 8-A Hierarchical Thread Scheduler and Register File for Energy-Efficient Throughput Processors ACM Transactions on Computer Systems-TOCS Gebhart, M., Johnson, D. R., Tarjan, D., Keckler, S. W., Dally, W. J., Lindholm, E. 2012; 30 (2): 38
  • It's about the Power: An Architect's View of Interconnect IEEE International Interconnect Technology Conference (IITC) Dally, B. IEEE. 2012
  • Network Congestion Avoidance Through Speculative Reservation 18th IEEE International Symposium on High-Performance Computer Architecture (HPCA) Jiang, N., Becker, D. U., Michelogiannakis, G., Dally, W. J. IEEE. 2012: 443–454
  • Packet Chaining: Efficient Single-Cycle Allocation for On-Chip Networks IEEE COMPUTER ARCHITECTURE LETTERS Michelogiannakis, G., Jiang, N., Becker, D. U., Dally, W. J. 2011; 10 (2): 33-36
  • Evaluating Elastic Buffer and Wormhole Flow Control IEEE TRANSACTIONS ON COMPUTERS Michelogiannakis, G., Becker, D. U., Dally, W. J. 2011; 60 (6): 896-903
  • 2011 Index IEEE Computer Architecture Letters Vol. 10 Computer Architecture Letters Becker, D., Choi, I., Cooper-Balis, E., Dally, W. J., Devadas, S., Duato, J. 2011; 53: 56
  • Liszt: a domain specific language for building portable mesh-based PDE solvers DeVito, Z., Joubert, N., Palacios, F., Oakley, S., Medina, M., Barrientos, M. 2011
  • Circuit challenges for future computing systems Dally, W. J. 2011
  • A compile-time managed multi-level register file hierarchy Gebhart, M., Keckler, S. W., Dally, W. J. 2011
  • Guaranteeing forward progress of unified register allocation and instruction scheduling Technical Report Concurrent VLSI Architecture Group Memo 127, Stanford Park, J., Dally, W. J. 2011
  • GPUs and the future of parallel computing Micro, IEEE Keckler, S. W., Dally, W. J., Khailany, B., Garland, M., Glasco, D. 2011; 31 (5): 7-17
  • Energy-efficient mechanisms for managing thread context in throughput processors ACM SIGARCH Computer Architecture News Gebhart, M., Johnson, D. R., Tarjan, D., Keckler, S. W., Dally, W. J., Lindholm, E. 2011; 39 (3): 235-246
  • 4 Guest Editor’s Introduction: CPUs, GPUs, and Hybrid Computing David Brooks 7 GPUs and the Future of Parallel Computing Keckler, S. W., Dally, W. J., Khailany, B., Garland, M., Glasco, D., Rohr, D. 2011
  • Efficient Topologies for Large-scale Cluster Networks Conference on Optical Fiber Communication (OFC)/Collocated National Fiber Optic Engineers (NFOEC) Kim, J., Dally, W. J., Abts, D. IEEE. 2010
  • The end of denial architecture and the rise of throughput computing Dally, W. J. 2010
  • Throughput computing Dally, W. J. 2010
  • Fine-grain dynamic instruction placement for L0 scratch-pad memory Park, J., Balfour, J., Dally, W. J. 2010
  • Block-Parallel Programming for Real-time Embedded Applications Dally, W. J. 2010
  • Evaluating bufferless flow control for on-chip networks Michelogiannakis, G., Sanchez, D., Dally, W. J., Kozyrakis, C. 2010
  • The GPU computing era Micro, IEEE Nickolls, J., Dally, W. J. 2010; 30 (2): 56-69
  • The end of denial architecture and the rise of throughput computing Keynote speech at Design Automation Conference Dally, W. J. 2010
  • The even/odd synchronizer: A fast, all-digital, periodic synchronizer Asynchronous Circuits and Systems (ASYNC), 2010 IEEE Symposium on Dally, W. J., Tell, S. G. 2010: 75-84
  • Moving the needle, computer architecture research in academe and industry ACM SIGARCH Computer Architecture News Dally, W. J. 2010; 38 (3): 1-1
  • Apparatus and method for packet scheduling US Patent 7,760,747 Dally, W. J., Carvey, P. P., Beliveau, P. A., Mann, W. F., Dennison, L. R. 2010
  • Booksim 2.0 User’s Guide Stanford University Jiang, N., Michelogiannakis, G., Becker, D., Towles, B., Dally, W. J. 2010
  • 2010 IEEE Symposium on Asynchronous Circuits and Systems Dally, W. J., Tell, S. G. 2010
  • Buffer-space Efficient and Deadlock-free Scheduling of Stream Applications on Multi-core Architectures 22nd ACM Symposium on Parallelism in Algorithms and Architectures Park, J., Dally, W. J. ASSOC COMPUTING MACHINERY. 2010: 1–10
  • Operand Registers and Explicit Operand Forwarding IEEE COMPUTER ARCHITECTURE LETTERS Balfour, J., Harting, R. C., Dally, W. J. 2009; 8 (2): 60-63
  • COST-EFFICIENT DRAGONFLY TOPOLOGY FOR LARGE-SCALE SYSTEMS IEEE MICRO Kim, J., Dally, W., Scott, S., Abts, D. 2009; 29 (1): 33-40
  • Router designs for elastic buffer on-chip networks Michelogiannakis, G., Dally, W. J. 2009
  • Embracing heterogeneity–parallel programming for changing hardware Linderman, M. D., Balfour, J., Meng, T. H., Dally, W. J. 2009
  • Allocator implementations for network-on-chip routers Becker, D. U., Dally, W. J. 2009
  • Stream Processors Multicore Processors and Systems Erez, M., Dally, W. J. 2009: 231-270
  • Power efficient supercomputing Accelerator-based Computing and Manycore Workshop (presentation) Dally, W. J. 2009; 1
  • Maximizing the Filter Rate of L0 Compiler-Managed Instruction Stores by Pinning Technical Report 126, Concurrent VLSI Architecture Group, Stanford University Park, J., Balfour, J., Dally, W. J. 2009
  • Load-balanced routing US Patent 7,633,940 Singh, A., Dally, W. J. 2009
  • Indirect adaptive routing on large scale interconnection networks ACM SIGARCH Computer Architecture News Jiang, N., Kim, J., Dally, W. J. 2009; 37 (3): 220-231
  • Exascale software study: Software challenges in extreme scale systems DARPA IPTO, Air Force Research Labs Amarasinghe, S., Campbell, D., Carlson, W., Chien, A., Dally, W., Elnozahy, E. 2009
  • Elastic-buffer flow control for on-chip networks High Performance Computer Architecture Michelogiannakis, G., Balfour, J., Dally, W. J. 2009
  • Cost-Efficient Dragonfly Topology for Large-Scale Systems Conference on Optical Fiber Communication (OFC 2009) Kim, J., Dally, W. J., Scott, S., Abts, D. IEEE. 2009: 2174–2176
  • Opportunities Beyond Single-Core Microprocessors 15th International Symposium on High-Performance Computer Architecture Hill, M. D., Adve, S. V., Bader, D. A., Dally, W., Harrod, W., Sarkar, V. IEEE COMPUTER SOC. 2009: 143–143
  • Elastic-Buffer Flow Control for On-Chip Networks 15th International Symposium on High-Performance Computer Architecture Michelogiannakis, G., Balfour, J., Dally, W. J. IEEE COMPUTER SOC. 2009: 151–162
  • Efficient embedded computing COMPUTER Dally, W. J., Balfour, J., Black-Schaffer, D., Chen, J., Harting, R. C., Parikh, V., Park, J., Sheffield, D. 2008; 41 (7): 27-?
  • Technology-driven, highly-scalable dragonfly topology 35th Annual International Symposium on Computer Architecture Kim, J., Dally, W. J., Scott, S., Abts, D. IEEE COMPUTER SOC. 2008: 77–88
  • Exascale computing study: Technology challenges in achieving exascale systems Bergman, K., Borkar, S., Campbell, D., Carlson, W., Dally, W., Denneau, M. 2008
  • A tuning framework for software-managed memory hierarchies Ren, M., Park, J. Y., Houston, M., Aiken, A., Dally, W. J. 2008
  • Structured Application-Specific Integrated Circuit (ASIC) Study STANFORD UNIV CA COMPUTER SYSTEMS LAB Dally, W., Balfour, J., Black-Schaffer, D., Hartke, P. 2008
  • Hierarchical instruction register organization Computer Architecture Letters Black-Schaffer, D., Balfour, J., Dally, W., Parikh, V., Park, J. S. 2008; 7 (2): 41-44
  • Exascale computing study: Technology challenges in achieving exascale systems Kogge, P., Bergman, K., Borkar, S., Campbell, D., Carlson, W., Dally, W. 2008
  • An energy-efficient processor architecture for embedded systems Computer Architecture Letters Balfour, J., Dally, W. J., Black-Schaffer, D., Parikh, V., Park, J. S. 2008; 7 (1): 29-32
  • A programmable 512 GOPS stream processor for signal, image, and video processing Solid-State Circuits, IEEE Journal Khailany, B. K., Williams, T., Lin, J., Long, E. P., Rygh, M., Tovey, D. F., Dally, B. 2008; 43 (1): 202-213
  • Stream scheduling: A framework to manage bulk operations in memory hierarchies 14th International Euro-Par Conference on Parallel Computing Das, A., Dally, W. J. SPRINGER-VERLAG BERLIN. 2008: 337–349
  • A Portable Runtime Interface For Multi-Level Memory Hierarchies ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP 08) Houston, M., Park, J., Ren, M., Knight, T., Fatahalian, K., Aiken, A., Dally, W. J., Hanrahan, P. ASSOC COMPUTING MACHINERY. 2008: 143–152
  • A 14-mW 6.25-Gb/s transceiver in 90-nm CMOS IEEE International Solid-State Circuits Conference (ISSCC) Poulton, J., Palmer, R., Fuller, A. M., Greer, T., Eyles, J., Dally, W. J., Horowitz, M. IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC. 2007: 2745–57
  • Research challenges for on-chip interconnection networks IEEE MICRO Owens, J. D., Dally, W. J., Ho, R., Jayasimha, D. N., Keckler, S. W., Peh, L. 2007; 27 (5): 96-108
  • Flattened Butterfly : A Cost-Efficient Topology for High-Radix Networks 34th Annual International Symposium on Computer Architecture Kim, J., Dally, W. J., Abts, D. ASSOC COMPUTING MACHINERY. 2007: 126–137
  • Tradeoff between data-, instruction-, and thread-level parallelism in stream processors Ahn, J., Erez, M., Dally, W. J. 2007
  • Executing irregular scientific applications on stream architectures Erez, M., Ahn, J. H., Gummaraju, J., Rosenblum, M., Dally, W. J. 2007
  • Architectural support for the stream execution model on general-purpose processors Gummaraju, J., Erez, M., Coburn, J., Rosenblum, M., Dally, W. J. 2007
  • A 14mW 6.25 Gb/s transceiver in 90nm CMOS for serial chip-to-chip communications Palmer, R., Poulton, J., Dally, W. J., Eyles, J., Fuller, A. M., Greer, T. 2007
  • Stream Scheduling: A Framework to Manage Bulk Operations in a Memory Hierarchy Parallel Architecture and Compilation Techniques Das, A., Dally, W. J. 2007
  • Interconnect-Centric Computing. HPCA Keynote Dally, W. J. 2007
  • Flattened butterfly: a cost-efficient topology for high-radix networks ACM SIGARCH Computer Architecture News Kim, J., Dally, W. J., Abts, D. 2007; 35 (2): 126-137
  • Computer architecture in the many-core era 24th International Conference on Computer Design Dally, B. IEEE. 2007: 1–1
  • Flattened butterfly topology for on-chip networks 40th Annual IEEE/ACM International Symposium on Microarchitecture Kim, J., Balfour, J., Dally, W. J. IEEE COMPUTER SOC. 2007: 172–182
  • Compilation for Explicitly Managed Memory Hierarchies ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming Knight, T. J., Park, J. Y., Ren, M., Houston, M., Erez, M., Fatahalian, K., Aiken, A., Dally, W. J., Hanrahan, P. ASSOC COMPUTING MACHINERY. 2007: 226–236
  • Register pointer architecture for efficient embedded processors Design, Automation and Test in Europe Conference and Exhibition (DATE 07) Park, J., Park, S., Balfour, J. D., Black-Schaffer, D., Kozyrakis, C., Dally, W. J. IEEE. 2007: 600–605
  • The BlackWidow high-radix Clos network 33rd International Symposium on Computer Architecture Scott, S., Abts, D., Kim, J., Dally, W. J. IEEE COMPUTER SOC. 2006: 16–27
  • Pulsenet - A parallel flash sampler and digital processor IC for optical SETI PROCEEDINGS OF THE IEEE 2006 CUSTOM INTEGRATED CIRCUITS CONFERENCE Howard, A. W., Wei, G., Dally, W. J., Horowitz, P. 2006: 261-264
  • Sequoia: programming the memory hierarchy Fatahalian, K., Horn, D., Knight, T., Leem, L., Houston, M., Park, J., Dally, B. 2006
  • Multi-Core for HPC: Breakthrough or Breakdown? Sterling, T., Kogge, P., Dally, W., Scott, S., Gropp, W., Keyes, D. 2006
  • The design space of data-parallel memory systems Ahn, J. H., Erez, M., Dally, W. J. 2006
  • Design tradeoffs for tiled CMP on-chip networks Balfour, J., Dally, W. J. 2006
  • Compiling for stream processing Das, A., Dally, W. J., Mattson, P. 2006
  • Adaptive routing in high-radix Clos network Kim, J., Dally, W. J., Abts, D. 2006
  • Topology optimization of interconnection networks Computer Architecture Letters Gupta, A. K., Dally, W. J. 2006; 5 (1): 10-13
  • Prefix search method US Patent 7,130,847 Waters, G. M., Dennison, L. R., Carvey, P. P., Dally, W. J., Mann, W. F. 2006
  • Pulsenet-A Parallel Flash Sampler and Digital Processor IC for Optical SETI Custom Integrated Circuits Conference, 2006. CICC'06. IEEE Howard, A. W., Wei, G. Y., Dally, W. J., Horowitz, P. 2006: 261-264
  • Future directions for on-chip interconnection networks OCIN Workshop Dally, W. J. 2006
  • DRAFT Final Report: Workshop on On- and Off-Chip Networks for Multi-Core Systems Retrieved from: http://www.ece.ucdavis.edu/~ocin06 Dally, W. 2006
  • Data parallel address architecture Computer Architecture Letters Ahn, J. H., Dally, W. J. 2006; 5 (1): 30-33
  • A 20-Gb/s 0.13-μm CMOS serial link transmitter using an LC-PLL to directly drive the output multiplexer Symposium on VLSI Circuits Chiang, P., Dally, W. J., Lee, M. J., Senthinathan, R., Oh, Y., Horowitz, M. A. IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC. 2005: 1004–11
  • Scatter-add in data parallel architectures 11th International Symposium on High-Performance Computer Architecture Ahn, J. H., Erez, M., Dally, W. J. IEEE COMPUTER SOC. 2005: 132–142
  • Fault tolerance techniques for the Merrimac streaming supercomputer Erez, M., Jayasena, N., Knight, T. J., Dally, W. J. 2005
  • 11th International Symposium on High-Performance Computer Architecture (HPCA'05) Ahn, J. H., Erez, M., Dally, W. J. 2005
  • Explaining the gap between ASIC and custom power: A custom perspective 42nd Design Automation Conference Chang, A., Dally, W. J. IEEE COMPUTER SOC. 2005: 281–284
  • Microarchitecture of a high-radix router 32nd International Symposium on Computer Architecture Kim, J., Dally, W. J., Towles, B., Gupta, A. K. IEEE COMPUTER SOC. 2005: 420–431
  • A 33-mW 8-Gb/s CMOS clock multiplier and CDR for highly integrated I/Os IEEE Custom Integrated Circuits Conference Farjad-Rad, R., Nguyen, A., Tran, J. M., Greer, T., Poulton, J., Dally, W. J., Edmondson, J. H., Senthinathan, R., Rathi, R., Lee, M. J., Ng, H. T. IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC. 2004: 1553–61
  • A 20Gb/s 0.13um CMOS serial link transmitter using an LC-PLL to directly drive the output multiplexer Symposium on VLSI Circuits Chiang, P., Dally, W. J., Lee, M. J., Senthinathan, R., Oh, Y., Horowitz, M. IEEE. 2004: 272–275
  • The case for broader computer architecture education: keynote address Dally, W. J. 2004
  • Analysis and performance results of a molecular modeling application on Merrimac Erez, M., Ahn, J. H., Garg, A., Dally, W. J., Darve, E. 2004
  • Adaptive channel queue routing on k-ary n-cubes Singh, A., Dally, W. J., Gupta, A., Towles, B. 2004
  • Stream processors: Programmability and efficiency Queue Dally, W. J., Kapasi, U. J., Khailany, B., Ahn, J. H., Das, A. 2004; 2 (1): 52
  • Streams and vectors: A memory system perspective 6th WorkShop on Media and Streaming Processors Jayasena, N., Dally, W. J. 2004
  • Space-efficient source routing Carvey, P., Dally, W., Dennison, L., King, P., Mann, W. 2004
  • Principles and practices of interconnection networks Access Online via Elsevier Dally, W. J., Towles, B. P. 2004
  • High-Speed Logic, Circuits, Libraries and Layout Closing the Gap Between ASIC & Custom Chang, A., Dally, W. J., Chinnery, D., Keutzer, K., Zlatanovici, R. 2004: 101-144
  • How scaling will change processor architecture Solid-State Circuits Conference, 2004. Digest of Technical Papers. Horowitz, M., Dally, W. 2004
  • Globally adaptive load-balanced routing on tori Computer Architecture Letters Singh, A., Dally, W. J., Towles, B., Gupta, A. K. 2004; 3 (1): 2-2
  • Exploiting Structure and Managing Wires to Increase Density and Performance Closing the Gap Between ASIC & Custom Chang, A., Dally, W. J. 2004: 269-287
  • Buffer and delay bounds in high radix interconnection networks Computer Architecture Letters Singh, A., Dally, W. J. 2004; 3 (1): 8-8
  • Evaluating the Imagine stream architecture 31st Annual International Symposium on Computer Architecture Ahn, J. H., Dally, W. J., Khailany, B., Kapasi, U. J., Das, A. IEEE COMPUTER SOC. 2004: 14–25
  • Stream register files with indexed access 10th International Symposium on High-Performance Computer Architecture Jayasena, N., Erez, M., Ahn, J. H., Dally, W. J. IEEE COMPUTER SOC. 2004: 60–72
  • A second-order semidigital clock recovery circuit based on injection locking IEEE International Solid-State Circuits Conference Ng, H. T., Farjad-Rad, R., Lee, M. J., Dally, W. J., Greer, T., Poulton, J., Edmondson, J. H., Rathi, R., Senthinathan, R. IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC. 2003: 2101–10
  • Guaranteed scheduling for switches with configuration overhead IEEE-ACM TRANSACTIONS ON NETWORKING Towles, B., Dally, W. J. 2003; 11 (5): 835-847
  • Programmable stream processors COMPUTER Kapasi, U. J., Rixner, S., Dally, W. J., Khailany, B., Ahn, J. H., Mattson, P., Owens, J. D. 2003; 36 (8): 54-?
  • Jitter transfer characteristics of delay-locked loops - Theories and design techniques IEEE JOURNAL OF SOLID-STATE CIRCUITS Lee, M. J., Dally, W. J., Greer, T., Ng, H. T., Farjad-Rad, R., Poulton, J., Senthinathan, R. 2003; 38 (4): 614-621
  • GOAL: A load-balanced adaptive routing algorithm for torus networks 30th Annual International Symposium on Computer Architecture Singh, A., Dally, W. J., Gupta, A. K., Towles, B. IEEE COMPUTER SOC. 2003: 194–205
  • Throughput-centric routing algorithm design Towles, B., Dally, W. J., Boyd, S. 2003
  • Merrimac: Supercomputing with streams Dally, W. J., Labonte, F., Das, A., Hanrahan, P., Ahn, J. H., Gummaraju, J. 2003
  • CMOS high-speed I/Os-present and future Lee, M. J., Dally, W. J., Farjad-Rad, R., Ng, H. T., Senthinathan, R., Edmondson, J. 2003
  • A 33mW 8Gb/s CMOS clock multiplier and CDR for highly integrated I/Os Ng, H. T., Lee, M. J., Farjad-Rad, R., Senthinathan, R., Dally, W. J., Nguyen, A. 2003
  • The Ninth International Symposium on High-Performance Computer Architecture (HPCA'03) Khailany, B., Dally, W. J., Rixner, S., Kapasi, U. J., Owens, J. D., Towles, B. 2003
  • Prefix search method Carvey, P., Dennison, L., Mann, W., Waters, G. 2003
  • Methods and apparatus for event-driven routing Carvey, P., Dally, W., Dennison, L., King, P. 2003
  • A second-order semi-digital clock recovery circuit based on injection locking Solid-State Circuits Conference, 2003. Digest of Technical Papers. ISSCC Lee, M. J., Dally, W. J., Poulton, J., Greer, T., Edmondson, J., Farjad-Rad, R. 2003
  • 0.622-8.0 Gbps 150 mW serial IO macrocell with fully flexible preemphasis and equalization VLSI Circuits, 2003. Digest of Technical Papers. 2003 Symposium on Farjad-Rad, R., Ng, H. T., Lee, M. J., Senthinathan, R., Dally, W. J., Nguyen, A. 2003: 63-66
  • Exploring the VLSI scalability of stream processors 9th International Symposium on High-Performance Computer Architecture Khailany, B., Dally, W. J., Rixner, S., Kapasi, U. J., Owens, J. D., Towles, B. IEEE COMPUTER SOC. 2003: 153–164
  • A low-power multiplying DLL for low-jitter multigigahertz clock generation in highly integrated digital chips IEEE International Solid-State Circuits Conference (ISSCC 2001) Farjad-Rad, R., Dally, W., Ng, H. T., Senthinathan, R., Lee, M. J., Rathi, R., Poulton, J. IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC. 2002: 1804–12
  • Scalable opto-electronic network (SOENet) 10th Symposium on High Performance Interconnects Gupta, A. K., Dally, W. J., Singh, A., Towles, B. IEEE COMPUTER SOC. 2002: 71–76
  • Worst-case traffic for oblivious routing functions Towles, B., Dally, W. J. 2002
  • Locality-preserving randomized oblivious routing on torus networks Singh, A., Dally, W. J., Towles, B., Gupta, A. K. 2002
  • Comparing Reyes and OpenGL on a stream architecture Owens, J. D., Khailany, B., Towles, B., Dally, W. J. 2002
  • A 0.2-2 GHz 12 mW multiplying DLL for low-jitter clock synthesis in highly-integrated data communication chips Farjad-Rad, R., Dally, W., Ng, H. T., Poulton, J., Stone, T., Rathi, R. 2002
  • Prefix search circuitry and method Carvey, P., Dally, W., Dennison, L., Mann, W., Waters, G. 2002
  • Stream Processing for High-Performance Embedded Systems Defense Technical Information Center Dally, W. J. 2002
  • Migration in single chip multiprocessors Computer Architecture Letters Shaw, K. A., Dally, W. J. 2002; 1 (1): 12-12
  • Method and system for guaranteeing quality of service in large capacity input output buffered cell switch based on minimum bandwidth guarantees and weighted fair share of unused bandwidth Dally, W., Meempat, G., Ramamurthy, G. 2002
  • Internet switch router Carvey, P., Dennison, L., King, P. 2002
  • Computer architecture is all about interconnect High-Perf. Comp. Architecture Dally, W. J. 2002
  • VLSI design and verification of the Imagine processor 20th IEEE International Conference on Computer Design Khailany, B., Dally, W. J., Chang, A., Kapasi, U. J., Namkoong, J., Towles, B. IEEE COMPUTER SOC. 2002: 289–294
  • Guaranteed scheduling for switches with configuration overhead 21st Annual Joint Conference of the IEEE-Computer-and-Communications-Societies Towles, B., Dally, W. J. IEEE. 2002: 342–351
  • Media processing applications on the Imagine stream processor 20th IEEE International Conference on Computer Design Owens, J. D., Rixner, S., Kapasi, U. J., Mattson, P., Towles, B., Serebrin, B., Dally, W. J. IEEE COMPUTER SOC. 2002: 295–302
  • A stream processor development platform 20th IEEE International Conference on Computer Design Serebrin, B., Owens, J. D., Chen, C. H., Crago, S. P., Kapasi, U. J., Khailany, B., Mattson, P., Namkoong, J., Rixner, S., Dally, W. J. IEEE COMPUTER SOC. 2002: 303–308
  • The Imagine stream processor 20th IEEE International Conference on Computer Design Kapasi, U. J., Dally, W. J., Rixner, S., Owens, J. D., Khailany, B. IEEE COMPUTER SOC. 2002: 282–288
  • Imagine: Media processing with streams IEEE MICRO Khailany, B., Dally, W. J., Kapasi, U. J., Mattson, P., Namkoong, J., Owens, J. D., Towles, B., Chang, A., Rixner, S. 2001; 21 (2): 35-46
  • Hot chips 12 IEEE MICRO Dally, W. J., Tremblay, M., Baum, A. J. 2001; 21 (2): 13-15
  • A delay model for router microarchitectures IEEE MICRO Peh, L. S., Dally, W. J. 2001; 21 (1): 26-34
  • Elastic interconnects: Repeater-inserted long wiring capable of compressing and decompressing data Mizuno, M., Dally, W., Onishi, H. 2001
  • Scalable switching fabrics for Internet routers White paper, Avici Systems Inc Dally, W. J. 2001
  • Monolithic chaotic communications system Circuits and Systems, 2001. ISCAS 2001. The 2001 IEEE International Chiang, P., Dally, W., Lee, E. 2001
  • Guest Editors' Introduction: Hot Chips 12 IEEE MICRO Baum, A. J., Dally, W. J., Tremblay, M. 2001; 21 (2): 13-15
  • A streaming supercomputer Whitepaper Dally, W. J., Hanrahan, P., Fedkiw, R. 2001
  • A single-chip terabit switch Hot Chips Dally, W. J., Dettloff, W., Eyles, J., Greer, T., Poulton, J., Stone, T. 2001; 13
  • An 84-mW 4-Gb/s clock and data recovery circuit for serial link applications Symposium on VLSI Circuits Lee, M. J., Dally, W. J., Poulton, J. W., Chiang, P., Greenwood, S. F. JAPAN SOCIETY APPLIED ELECTROMAGNETICS & MECHANICS. 2001: 149–152
  • A delay model and speculative architecture for pipelined routers 7th International Symposium on High-Performance Computer Architecture Peh, L. S., Dally, W. J. IEEE COMPUTER SOC. 2001: 255–266
  • Communication scheduling 9th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-IX) Mattson, P., Dally, W. J., Rixner, S., Kapasi, U. J., Owens, J. D. ASSOC COMPUTING MACHINERY. 2000: 82–92
  • Low-power area-efficient high-speed I/O circuit techniques International Solid-State Circuits Conference Lee, M. J., Dally, W. J., Chiang, P. IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC. 2000: 1591–99
  • Processor mechanisms for software shared memory 3rd International Symposium on High Performance Computing Carter, N. P., Dally, W. J., Lee, W. S., Keckler, S. W., Chang, A. SPRINGER-VERLAG BERLIN. 2000: 120–133
  • 10 Subspace Optimizations Knobe, K., Dally, W. J. edited by Kessler, C. W. 2000
  • Flit-reservation flow control Peh, L. S., Dally, W. J. 2000
  • Register organization for media processing Rixner, S., Dally, W. J., Khailany, B., Mattson, P., Kapasi, U. J., Owens, J. 2000
  • Polygon rendering on a stream architecture Owens, J. D., Dally, W. J., Kapasi, U. J., Rixner, S., Mattson, P., Mowery, B. 2000
  • A 90 mW 4 Gb/s equalized I/O circuit with input offset cancellation Lee, M. J., Dally, W., Chiang, P. 2000
  • Stream scheduling STANFORD UNIV CA COMPUTER SYSTEMS LAB Kapasi, U. J., Mattson, P., Dally, W. J., Owens, J. D., Towles, B. 2000
  • Stream Scheduling STANFORD UNIV CA COMPUTER SYSTEMS LAB Dally, W. J., Mattson, P., Kapasi, U. J., Owens, J. D., Towles, B. 2000
  • Smart memories: A modular reconfigurable architecture ACM SIGARCH Computer Architecture News Mai, K., Paaske, T., Jayasena, N., Ho, R., Dally, W. J., Horowitz, M. 2000; 28 (2): 161-171
  • Sixth International Symposium on High-Performance Computer Architecture Rixner, S., Dally, W. J., Khailany, B., Mattson, P., Kapasi, U. J., Owens, J. D. 2000
  • Sixth International Symposium on High-Performance Computer Architecture Peh, L. S., Dally, W. J. 2000
  • Memory access scheduling ISCA Owens, J. D., Mattson, P., Kapasi, U. J., Dally, W. J., Rixner, S. 2000; 128
  • The role of custom design in ASIC chips 37th Annual Design Automation Conference (DAC) Dally, W. J., Chang, A. ASSOC COMPUTING MACHINERY. 2000: 643–647
  • Efficient conditional operations for data-parallel architectures 33rd Annual International Symposium on Microarchitecture (MICRO-33) Kapasi, U. J., Dally, W. J., Rixner, S., Mattson, P. R., Owens, J. D., Khailany, B. IEEE COMPUTER SOC. 2000: 159–170
  • Concurrent event handling through multithreading IEEE TRANSACTIONS ON COMPUTERS Keckler, S. W., Chang, A., Lee, W. S., Chatterjee, S., Dally, W. J. 1999; 48 (9): 903-916
  • VLSI architecture: Past, present, and future 20th Anniversary Conference on Advanced Research in VLSI Dally, W. J., Lacy, S. IEEE COMPUTER SOC. 1999: 232–241
  • GAD: A 12-GS/s CMOS 4-bit A/D converter for an equalized multi-level link Ellersick, W., Yang, C. K., Horowitz, M., Dally, W. J. 1999
  • Interconnect-limited VLSI architecture IEEE International Interconnect Technology Conference, 1999 Dally, W. J. 1999: 15-17
  • Computer Architecture for the Next Millennium Dally, W. J. 1999
  • 20th Anniversary Conference on Advanced Research in VLSI Dally, W. J., Lacy, S. 1999
  • A tracking clock recovery receiver for 4-Gbps signaling IEEE MICRO Poulton, J., Dally, W. J., Tell, S. 1998; 18 (1): 25-27
  • Retrospective: the J-machine Dally, W. J., Chien, A., Fiske, S., Horwat, W., Lethin, R., Noakes, M. 1998
  • Invited Talks Coldren, L. A., Dally, W. J. 1998
  • Architecture of the Avici terabit switch/router Dally, W., Carvey, P., Dennison, L. 1998
  • VLSI datapath choices: Cell-based versus full-custom Massachusetts Institute of Technology Chang, A. L. 1998
  • Tomorrow’s Computing Engines keynote speech, Fourth Int’l Symp. High-Performance Computer Architecture Dally, W. 1998
  • Digital Systems Engineering Poulton, J. W., Dally, W. J. Cambridge University Press. 1998
  • The J-Machine ANNUAL INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE Dally, W. J., Chang, A., Chien, A., Fiske, S., Horwat, W., Keen, J. 1998; 25: 54-58
  • The Fifth International Conference on Massively Parallel Processing Using Optical Interconnections Dally, W. J., Lee, M. J., An, F. T., Poulton, J., Tell, S. 1998
  • The J-Machine: A retrospective Retrospective in Dally, W. J., Chang, A., Chien, A., Fiske, S., Horwat, W., Keen, J. 1998: 54-58
  • Point sample rendering Massachusetts Institute of Technology Dally, W. J., Grossman, J. P. 1998
  • Point sample rendering Rendering Techniques Grossman, J. P., Dally, W. J. 1998; 98: 181-192
  • Media Processors 1999 (Proceedings Volume) Dally, W. J., Fritts, J. E., Wolf, W. H., Liu, B., Bove Jr, V. M., Lee, M. 1998
  • Media processing using streams Electronic Imaging Rixner, S., Dally, W. J., Kapasi, U. J., Khailany, B., Lopez-Lagunas, A., Mattson, P. R. 1998: 122-134
  • Efficient, protected message interface in the MIT M-Machine IEEE Computer Special Issue on Design Challenges for High-Performance Lee, W. S., Dally, W. J., Keckler, S. W., Carter, N. P., Chang, A. 1998
  • Digital systems engineering Cambridge University Press Dally, W. J., Poulton, J. W. 1998
  • An instruction scheduling algorithm for communication-constrained microprocessors Massachusetts Institute of Technology Dally, W. J., Buehler, C. J. 1998
  • Architecture of a message-driven processor 25 years of the international symposia on Computer architecture (selected papers) Dally, W. J., Chao, L., Chien, A., Hassoun, S., Horwat, W., Kaplan, J. 1998
  • An efficient, protected message interface Computer Lee, W. S., Dally, W. J., Keckler, S. W., Carter, N. P., Chang, A. 1998; 31 (11): 69-75
  • Guest editors' introduction: The bleeding edge IEEE MICRO Rettberg, R., Dally, W. J., Culler, D. E. 1998; 18 (1): 10-11
  • High-performance electrical signaling 5th International Conference on Massively Parallel Processing Dally, W. J., Lee, M. J., An, F. T., Poulton, J., Tell, S. IEEE COMPUTER SOC PRESS. 1998: 11–16
  • Media processors using streams Conference on Media Processors 1999 Rixner, S., Dally, W. J., Kapasi, U. J., Khailany, B., Lopez-Lagunas, A., Mattson, P. R., Owens, J. D. SPIE-INT SOC OPTICAL ENGINEERING. 1998: 122–134
  • A bandwidth-efficient architecture for media processing 31st Annual ACM/IEEE International Symposium on Microarchitecture (MICRO31) Rixner, S., Dally, W. J., Kapasi, U. J., Khailany, B., Lopez-Lagunas, A., Mattson, P. R., Owens, J. D. IEEE COMPUTER SOC PRESS. 1998: 3–13
  • The effects of explicitly parallel mechanisms on the Multi-ALU Processor cluster pipeline International Conference on Computer Design: VLSI in Computers and Processors Chang, A., Dally, W. J., Keckler, S. W., Carter, N. P., Lee, W. S. IEEE COMPUTER SOC PRESS. 1998: 474–481
  • Communication-oriented computer architecture: Data choreography abstract International Workshop on Innovative Architecture for Future Generation High-Performance Processors and Systems Dally, W. J. IEEE COMPUTER SOC PRESS. 1998: 93–93
  • 1997 Annual Index, Vol. 17 development [single chip microprocessors] Dally, W. J., Adams, L., Anderson, T., Bilas, A., Biswas, B. B., Burger, D. 1997; 2000: 28-36
  • Transmitter equalization for 4-Gbps signaling IEEE Micro Dally, W. J., Poulton, J. 1997; 17 (1): 48-56
  • TPDS Now Online! Special Issue Editors Old and New IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS Dally, W. J., Fortes, J. A. 1997; 8 (3): 225
  • The m-machine multicomputer International Journal of Parallel Programming Fillo, M., Keckler, S. W., Dally, W. J., Carter, N. P., Chang, A., Gurevich, Y. 1997; 25 (3): 183-212
  • The delta tree: An object-centered approach to image-based rendering Dally, W. J., McMillan, L., Bishop, G., Fuchs, H. 1997
  • Message-driven dynamics Massachusetts Institute of Technology Dally, W. J., Lethin, R. A. 1997
  • Extended ephemeral logging: log storage management for applications with long lived transactions ACM Transactions on Database Systems (TODS) Keen, J. S., Dally, W. J. 1997; 22 (1): 1-42
  • Design of the Configuration and Diagnostic Units of the MAP Chip Massachusetts Institute of Technology Dally, W. J., Klayman, K. 1997
  • Circuit designs for the MAP chip Massachusetts Institute of Technology Dally, W. J., Chen, A. R. 1997
  • Asynchronous event handling Massachusetts Institute of Technology Dally, W. J., Chatterjee, S. 1997
  • An I/O port controller for the MAP chip Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science Dally, W. J., Ma, A. 1997
  • Advances in the M-machine runtime system Massachusetts Institute of Technology Dally, W. J., Shultz, A. 1997
  • Architects Look to Processors of Future MICROPROCESSOR REPORT, MICRODESIGN RESOURCES Bell, G., Sites, R., Dally, W., Ditzel, D., Patt, Y. 1996; 10 (10)
  • Bandwidth, Granularity, and Mechanisms: Key Issues in the Design of Parallel Computers Dally, W. J. 1996
  • A data-driven IDCT architecture for low power video applications Xanthopoulos, T., Chandrakasan, A. P., Sodini, C. G., Dally, W. J. 1996
  • The subspace model: Shape-based compilation for parallel systems Massachusetts Institute of Technology Dally, W. J., Knobe, K. B. 1996
  • Multiprocessor coupling system with integrated compile and run time scheduling for parallelism US Patent 5,574,939 Keckler, S. W., Dally, W. J. 1996
  • Flexible Memory Systems (AASERT Fellowship) MASSACHUSETTS INST OF TECH CAMBRIDGE Carter, N., Dally, W. J. 1996
  • Flexible Memory Systems (AASERT Fellowship) MASSACHUSETTS INST OF TECH CAMBRIDGE Dally, W. J., Carter, N. 1996
  • 1st IEEE Symposium on High-Performance Computer Architecture Nuth, P. R., Dally, W. J. 1995
  • The subspace model: A theory of shapes for parallel systems Knobe, K., Dally, W. J. 1995
  • The named-state register file: Implementation and performance Nuth, P. R., Dally, W. J. 1995
  • Proceedings Dally, W. J., Poulton, J. W., Ishii, A. T. 1995
  • Low-latency plesiochronous data retiming Dennison, L. R., Dally, W. J., Xanthopoulos, D. 1995
  • Implementation of atomic primitives on distributed shared memory multiprocessors Dally, W. J., Michael, M. M., Scott, M. L. 1995
  • Thread prioritization: A thread scheduling mechanism for multiple-context parallel processors Future Generation Computer Systems Fiske, S., Dally, W. J. 1995; 11 (6): 503-518
  • The M-Machine operating system Massachusetts Institute of Technology Dally, W. J., Gurevich, Y. 1995
  • The M-Machine Multicomputer MASSACHUSETTS INST OF TECH CAMBRIDGE ARTIFICIAL INTELLIGENCE LAB Dally, W. J., Keckler, S. W., Fillo, M., Carter, N. P., Chang, A. 1995
  • Fault tolerant adaptive routing in multicomputer networks Massachusetts Institute of Technology Xanthopoulos, T. 1995
  • Evaluating the locality benefits of active messages ACM SIGPLAN Notices Spertus, E., Dally, W. J. 1995; 30 (8): 189-198
  • 1st IEEE Symposium on High-Performance Computer Architecture Fiske, S., Dally, W. J. 1995
  • A numerical engine for distributed sparse matrices Massachusetts Institute of Technology Dally, W. J., Telichevesky, R. 1994
  • XEL: extended ephemeral logging for log storage management Keen, J. S., Dally, W. J. 1994
  • Architecture and implementation of the Reliable Router Dally, W. J., Dennison, L. R., Harris, D., Kan, K., Xanthopoulos, T. 1994
  • VLSI design for freshmen and sophomores Massachusetts Institute of Technology Dally, W. J., Harris, D. 1994
  • The reliable router: A reliable and high-performance communication substrate for parallel computers Parallel Computer Routing and Communication Dally, W. J., Dennison, L. R., Harris, D., Kan, K., Xanthopoulos, T. 1994: 241-255
  • The design and implementation of an actor language based on linear logic Massachusetts Institute of Technology Dally, W. J., Tse, C. S. 1994
  • Subspace optimizations Automatic Parallelization Knobe, K., Dally, W. J. 1994: 153-176
  • The implementation of a reliable router chip Massachusetts Institute of Technology Dally, W. J., Kan, K. H. 1994
  • The design of a high performance SPARC bus interface Massachusetts Institute of Technology Dally, W. J., Wong, D. F. 1994
  • Named state and efficient context switching Multithreaded Computer Architecture Nuth, P. R., Dally, W. J. 1994: 201-212
  • Multithreaded computer architecture Boston: Kluwer Academic Publishers Dennis, J. B., Gao, G. R., Iannucci, R. A., Dally, W. J. 1994
  • M-Machine Microarchitecture v1.11 Dally, W. J., Keckler, S. W., Carter, N., Chang, A., Fillo, M., Lee, W. S. 1994
  • Logging and recovery in a highly concurrent database Dally, W. J., Keen, J. S. 1994
  • Issues in the Design and Implementation of Instruction Processors for Multicomputers (Position Statement) Multithreaded Computer Architecture Dally, W. J. 1994: 79-82
  • How to Choose the Grain Size of a Parallel Computer MIT/LCS Technical Report Yeung, D., Dally, W. J., Agarwal, A. 1994: MIT-LCS-TR-739
  • Hardware support for fast capability-based addressing ACM SIGPLAN Notices Carter, N. P., Keckler, S. W., Dally, W. J. 1994; 29 (11): 319-327
  • Efficient message subsystem design Massachusetts Institute of Technology Dally, W. J., Lee, W. S. 1994
  • A subspace optimizing data parallel compiler Massachusetts Institute of Technology Dally, W. J., Dampier, T. O. 1994
  • A universal parallel computer architecture New Generation Computing Dally, W. J. 1993; 11 (3-4): 227-249
  • High-performance bidirectional signalling in VLSI systems Dennison, L. R., Lee, W. S., Dally, W. J. 1993
  • The J-Machine architecture and evaluation Compcon Spring'93, Digest of Papers. Dally, W. J., Keen, J. S., Noakes, M. D. 1993: 183-188
  • The Future of Computing is Parallel Computer Science Department Dally, W. J. 1993
  • The J-machine multicomputer: an architectural evaluation ACM SIGARCH Computer Architecture News Noakes, M. D., Wallach, D. A., Dally, W. J. 1993; 21 (2): 224-235
  • Performance evaluation of ephemeral logging ACM SIGMOD Record Keen, J. S., Dally, W. J. 1993; 22 (2): 187-196
  • Message-driven processor in a concurrent computer US Patent 5,212,778 Dally, W. J., Chien, A. A., Horwat, W. P., Fiske, S. 1993
  • Mechanisms for parallel computers Parallel Computing on Distributed Memory Multiprocessors Dally, W. J., Wills, D. S., Lethin, R. 1993: 3-25
  • Evaluation of mechanisms for fine-grained parallel programs in the J-machine and the CM-5 ACM SIGARCH Computer Architecture News Spertus, E., Goldstein, S. C., Schauser, K. E., Eicken, T. V., Culler, D. E., Dally, W. J. 1993; 21 (3): 302-313
  • Deadlock-free adaptive routing in multicomputer networks using virtual channels IEEE Transactions on Parallel and Distributed Systems Dally, W. J., Aoki, H. 1993; 4 (4): 466-475
  • A Video Controller and Distributed Frame Buffer for the J-Machine Dally, W. J., McDonald, E. 1993
  • COSMOS: An operating system for a fine-grain concurrent computer Research directions in concurrent object-oriented programming Horwat, W., Totty, B., Dally, W. J. 1993: 452-476
  • A fast translation method for paging on top of segmentation IEEE Transactions on Computers Dally, W. J. 1992; 41 (2): 247-250
  • Design and implementation of the Message-Driven Processor Dally, W. J., Ahmed, S., Carrick, P., Chien, A., Davison, R., Fiske, J. 1992
  • Virtual-Channel Flow Control Dally, W. J. 1992
  • Virtual-channel flow control IEEE Transactions on Parallel and Distributed Systems Dally, W. J. 1992; 3 (2): 194-205
  • The J-machine: a fine-grain parallel computer Computing Systems in Engineering Dally, W. J., Chien, A., Davison, R., Fiske, J. A., Furman, S., Fyler, G. 1992; 3 (1): 7-15
  • The J-machine network Computer Design: VLSI in Computers and Processors Nuth, P. R., Dally, W. J. 1992
  • The message-driven processor: A multicomputer processing node with efficient mechanisms IEEE Micro Dally, W. J., Fiske, J. A., Keen, J. S., Lethin, R. A., Noakes, M. D., Nuth, P. R. 1992; 12 (2): 23-39
  • The message driven processor: An integrated multicomputer processing element Computer Design: VLSI in Computers and Processors Dally, W. J., Chien, A., Fiske, J. A., Fyler, G., Horwat, W., Keen, J. S. 1992
  • Processor coupling: Integrating compile time and runtime scheduling for parallelism ACM SIGARCH Computer Architecture News Keckler, S. W., Dally, W. J. 1992; 20 (2): 202-213
  • Pi: a parallel architecture interface Frontiers of Massively Parallel Computation, 1992., Fourth Symposium on the… Wills, D. S., Dally, W. J. 1992
  • MDP design tools and methods Computer Design: VLSI in Computers and Processors Lethin, R. A., Dally, W. J. 1992: ICCD'92
  • INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE 1992 Scientific information bulletin Keckler, S. W., Dally, W. J. 1992; 17 (4): 35
  • Custom integrated circuits Custom Integrated Circuits Dally, W. J., Allen, J., Wyatt Jr, J. L., White, J. K., Devadas, S., Armstrong, R. C. 1992
  • A mechanism for efficient context switching Computer Design: VLSI in Computers and Processors Nuth, P. R., Dally, W. J. 1991: ICCD'91
  • Express cubes: improving the performance of k-ary n-cube interconnection networks IEEE Transactions on Computers Dally, W. J. 1991; 40 (9): 1016-1023
  • Experiments with Dataflow on a General-Purpose Parallel Computer. MASSACHUSETTS INST OF TECH CAMBRIDGE ARTIFICIAL INTELLIGENCE LAB Spertus, E., Dally, W. J. 1991
  • Experiments with Dataflow on a General-Purpose Parallel Computer MASSACHUSETTS INST OF TECH CAMBRIDGE ARTIFICIAL INTELLIGENCE LAB Dally, W. J., Spertus, E. 1991
  • Experiments with data flow on a general-purpose parallel computer. Memorandum report Massachusetts Inst. of Tech., Cambridge, MA (United States). Artificial Spertus, E., Dally, W. J. 1991
  • Experiences Implementing Dataflow on a General-Purpose Parallel Computer. ICPP Spertus, E., Dally, W. J. 1991; 2: 231-235
  • A hardware logic simulation system IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems Agrawal, P., Dally, W. J. 1990
  • Virtual-channel flow control Dally, W. J. 1990
  • System design of the J-Machine Noakes, M., Dally, W. J. 1990
  • Proceedings of the sixth MIT conference on Advanced research in VLSI Dally, W. J. 1990
  • Experience with concurrent aggregates (CA): Implementation and programming Chien, A. A., Dally, W. J. 1990
  • Advanced Research in VLSI: Proceedings of the Sixth MIT Conference;[papers Presented at the Sixth MIT Conference on Advanced Research in VLSI, Held in Cambridge, Mass., in 1990] Dally, W. J. 1990
  • The Message-Driven Processor: A Multicomputer Processing Node with Efficient Mechanisms Dally, W. J., Davison, R., Fiske, J. A., Fyler, G., Keen, J. S., Lethin, R. A. 1990
  • Simultaneous bidirectional signalling for IC systems Computer Design: VLSI in Computers and Processors Lam, K., Dennison, L. R., Dally, W. J. 1990: ICCD'90
  • Performance analysis of k-ary n-cube interconnection networks IEEE Transactions on Computers Dally, W. J. 1990; 39 (6): 775-785
  • Network and processor architecture for message-driven computers VLSI and Parallel Computation Dally, W. 1990: 140-222
  • Critical Problems in Very Large Scale Computer Systems KURTZ LABS YELLOW SPRINGS OH Leighton, F. T., Knight, T. F., Agarwal, A., Dally, W. J., Devadas, S. 1990
  • Critical Problems in Very Large Scale Computer Systems MASSACHUSETTS INST OF TECH CAMBRIDGE Agarwal, A., Dally, W. J., Devadas, S., Knight Jr, T. F., Leighton, F. T., Nabors, K. 1990
  • Concurrent aggregates (CA) ACM SIGPLAN Notices Chien, A. A., Dally, W. J. 1990; 25 (3): 187-196
  • A fine-grain, message-passing processing node Concurrent Computations Dally, W. J. 1989: 375-389
  • Algorithms for accuracy enhancement in a hardware logic simulator Agrawal, P., Tutundjian, R., Dally, W. 1989
  • Universal mechanisms for concurrency PARLE'89 Parallel Architectures and Languages Europe Dally, W. J., Wills, D. S. 1989: 19-33
  • The J-machine: a fine grain concurrent computer MASSACHUSETTS INST OF TECH CAMBRIDGE MICROSYSTEMS RESEARCH CENTER Dally, W. J., Chien, A., Fiske, S., Horwat, W., Keen, J. 1989
  • Micro-optimization of floating-point operations ACM SIGARCH Computer Architecture News Dally, W. J. 1989; 17 (2): 283-289
  • Express cubes: Improving the performance of k-ary n-cube interconnection networks MASSACHUSETTS INST OF TECH CAMBRIDGE LAB FOR COMPUTER SCIENCE Dally, W. J. 1989
  • Experience with CST: Programming and implementation ACM SIGPLAN Notices Horwat, W., Chien, A. A., Dally, W. J. 1989; 24 (7): 101-109
  • Experience with CST: Programming and Implementation MASSACHUSETTS INST OF TECH CAMBRIDGE MICROSYSTEMS RESEARCH CENTER Chien, A. A., Dally, W. J., Horwat, W. 1989
  • A network element based fault tolerant processor Massachusetts Institute of Technology Abler, T. A. 1988
  • Object-oriented concurrent programming in CST Dally, W. J., Chien, A. A. 1988
  • Fine-grain message passing concurrent computers Dally, W. 1988
  • The Reconfigurable Arithmetic Processor MASSACHUSETTS INST OF TECH CAMBRIDGE MICROSYSTEMS RESEARCH CENTER Dally, W. J., Fiske, S. 1988
  • The reconfigurable arithmetic processor ACM SIGARCH Computer Architecture News Fiske, S., Dally, W. J. 1988; 16 (2): 30-36
  • The J-machine: System support for Actors MASSACHUSETTS INST OF TECH CAMBRIDGE MICROSYSTEMS RESEARCH CENTER Dally, W. J. 1988
  • Mechanisms for Concurrent Computing FGCS Dally, W. J. 1988: 154-156
  • ON FIFTH GENERATION COMPUTER SYSTEMS 1988, edited by ICOT.© ICOT, 1988 Dally, W. J. 1988; 3 (FGCS'88): 154
  • Object-Oriented Concurrent Programming in CST MASSACHUSETTS INST OF TECH CAMBRIDGE MICROSYSTEMS RESEARCH CENTER Chien, A. A., Dally, W. J. 1988
  • Message-Driven Processor architecture, Version 11. Artificial intelligence memo Massachusetts Inst. of Tech., Cambridge (USA). Artificial Intelligence Lab. Dally, W., Chien, A., Fiske, S., Horwat, W., Keen, J. 1988
  • Message-Driven Processor Architecture MASSACHUSETTS INST OF TECH CAMBRIDGE MICROSYSTEMS RESEARCH CENTER Dally, W., Chien, A., Fiske, S., Horwat, W., Keen, J. 1988
  • Critical Problems in Very Large Scale Computer Systems MASSACHUSETTS INST OF TECH CAMBRIDGE MICROSYSTEMS RESEARCH CENTER Knight, T. F., Penfield, P., Glasser, L. A., Agarwal, A., Dally, W. J. 1988
  • Critical problems in very-large-scale computer systems. Semiannual technical report, 1 April-30 September 1988 Massachusetts Inst. of Tech., Cambridge (USA). Microsystems Research Center Penfield, P., Agarwal, A., Dally, W. J., Devadas, S., Knight, T. F. 1988
  • A Coherent VLSI Design Environment MASSACHUSETTS INST OF TECH CAMBRIDGE Abelson, H., Penfield, P., Antoniadis, D. A., Dally, W. J., Fonstad, C. G. 1987
  • Architecture and design of the MARS hardware accelerator Agrawal, P., Dally, W. J., Ezzat, A. K., Fischer, W. C., Jagadish, H. V., Krishnakumar, A. 1987
  • The Balanced Cube A VLSI Architecture for Concurrent Data Structures Dally, W. J. 1987: 27-73
  • Performance analysis of k-ary n-cube interconnection networks NASA STI/Recon Technical Report N Dally, W. J. 1987; 88: 30010
  • MARS: A multiprocessor-based programmable accelerator IEEE Design & Test of Computers Agrawal, P., Dally, W. J., Fischer, W. C., Jagadish, H. V., Krishnakumar, A. S., Tutundjian, R. 1987; 4 (5): 28-36
  • Graph Algorithms A VLSI Architecture for Concurrent Data Structures Dally, W. J. 1987: 75-132
  • Deadlock-free message routing in multiprocessor interconnection networks IEEE Transactions on Computers Dally, W. J., Seitz, C. L. 1987; C-36 (5): 547-553
  • Design of a self-timed VLSI multicomputer communication controller NASA STI/Recon Technical Report Dally, W. J., Song, P. 1987; 88: 30014
  • Concurrent Smalltalk A VLSI Architecture for Concurrent Data Structures Dally, W. J. 1987: 13-25
  • Concurrent computer architecture Massachusetts Inst. of Tech., Cambridge (USA). Artificial Intelligence Lab. Dally, W. J. 1987
  • Coherent VLSI environment. Semiannual technical report, 1 October 1986-31 March 1987 Massachusetts Inst. of Tech., Cambridge (USA). Microsystems Research Center Penfield, P., Dally, W. J., Glasser, L. A., Knight, T. F., Leighton, F. T. 1987
  • Architecture of a Message-Driven Processor MASSACHUSETTS INST OF TECH CAMBRIDGE MICROSYSTEMS RESEARCH CENTER Chao, L., Dally, W. J., Chien, A., Hassoun, S., Horwat, W. 1987
  • A coherent VLSI environment Massachusetts Inst. of Tech. Report Penfield Jr, P., Dally, W. J., Glasser, L. A., Knight Jr, T. F., Leighton, F. T. 1987
  • A message passing system for a fault tolerant parallel processor Massachusetts Institute of Technology Dally, W. J., Heyda, R. L. 1987
  • A Coherent VLSI Design Environment MASSACHUSETTS INST OF TECH CAMBRIDGE MICROSYSTEMS RESEARCH CENTER Leighton, F. T., Penfield, P., Glasser, L. A., Knight, T. F., Dally, W. J. 1987
  • A coherent VLSI design environment MASSACHUSETTS INST OF TECH CAMBRIDGE MICROSYSTEMS RESEARCH CENTER Penfield Jr, P., Dally, W. J., Glasser, L. A., Knight Jr, T. F., Leighton, F. T., Wyatt Jr, J. L. 1987
  • 5208:TR:86 Dally, W. J. 1986
  • Directions in concurrent computing Dally, W. J. 1986
  • Wire-efficient VLSI multiprocessor communication networks Massachusetts Institute of Technology, Microsystems Program Office Dally, W. J. 1986
  • VLSI architecture for concurrent data structures California Inst. of Tech. Dally, W. J. 1986
  • The torus routing chip Distributed Computing Dally, W. J., Seitz, C. L. 1986; 1 (4): 187-196
  • The torus routing chip Dally, W. J., Seitz, C. L. 1986
  • On the Performance of k-ary n-cube Interconnection Networks California Institute of Technology Dally, W. J. 1986
  • A High-performance VLSI Quaternary Serial Multiplier Dally, W. J. 1986
  • A Coherent VLSI Design Environment MASSACHUSETTS INST OF TECH CAMBRIDGE MICROSYSTEMS RESEARCH CENTER Leiserson, C. E., Penfield, P., Glasser, L. A., Knight, T. F., Dally, W. J. 1986
  • A hardware architecture for switch-level simulation IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems Dally, W. J., Bryant, R. E. 1985
  • The balanced cube: a concurrent data structure California Institute of Technology Dally, W. J., Seitz, C. L. 1985
  • Fungicides for Crop Protection: Invited papers International Specialized Book Service Incorporated Dally, W. J., Smith, I. M. 1985
  • Concurrent Algorithms for the Max-Flow Problem California Institute of Technology Dally, W. J. 1985
  • An object oriented architecture ACM SIGARCH Computer Architecture News Dally, W. J., Kajiya, J. T. 1985; 13 (3): 154-161
  • A Special Purpose Processor for Switch-Level Simulation International Conference on Computer Aided Design Dally, W. J., Bryant, R. E. 1984
  • The MOSSIM Simulation Engine Architecture and Design California Institute of Technology Dally, W. J. 1984