Bio


Dally investigates methods for applying VLSI technology to solve information processing problems. His current projects include network architecture, multicomputer architecture, media-processor architecture, and high-speed (4Gb/s) CMOS signaling. His research involves demonstrating novel concepts with working systems. Previous systems include the MARS Hardware Accelerator, the Torus Routing Chip, the J-Machine, M-Machine, and the Reliable Router. His group has pioneered techniques including fast capability-based addressing, processor coupling, virtual channel flow control, wormhole routing, link-level retry, message-driven processing, and deadlock-free routing.

Boards, Advisory Committees, Professional Organizations


  • Member, National Academy of Engineering (2013 - Present)
  • Member, American Academy of Arts and Sciences (2013 - Present)

Professional Education


  • PhD, Caltech (1986)

All Publications


  • Conference Author/Panelist Index Dally, W. J., Aoki, N., Bai, X., Banerjee, K., Benini, L., Bergamaschi, R.
  • Logic Simulation Algorithms for Pipelined Hardware Architectures Hardware Accelerators for Electrical CAD Agrawal, P., Dally, W. J., Tutundjian, R. edited by Ambler, T., Agrawal, P. 1988.
  • The Reconfigurable Arithmetic Processor Fiske, S., Dally, W. J.
  • IEEE Fellows Lead the Engineering Profession Dally, W. J., Agha, G. A., Babic, H. I., Basu, S., Beausoleil, W. F., Bertino, E.
  • Stanford University Concurrent VLSI Architecture Memo 124 Elastic Buffer Networks-on-Chip Michelogiannakis, G., Balfour, J., Dally, W. J.
  • Spills, Fills, and Kills Erez, M., Towles, B. P., Dally, W. J.
  • Message-Driven Processor Architecture: Version 11 Dally, W. J., Chien, A., Fiske, S., Horwat, W., Keen, J., Nuth, P.
  • ISSCC 2004/SESSION 7/TD: SCALING TRENDS/7.1 Horowitz, M., Dally, W.
  • CIMI FÍITIIt Dally, W. J., Balfour, J., Black-Schaffer, D., Chen, J., Harting, R. C., Parikh, V.
  • AI Memo No. 1272 April 26, 1994 Spertus, E., Dally, W. J.
  • ARVLSI’97 Committees Dally, W. J., Brown, R. B., Ishii, A. T., Papaefthymiou, M. C., Mudge, T. N., June, C. S.
  • SSCS Members Honored as 2002 IEEE Fellows Banu, M., Burghartz, J. N., Dally, W. J., Dean, M. E., Gielen, G. G., Griffin, E. L.
  • ISSCC 2007/SESSION 24/MULTI-GB/s TRANSCEIVERS/24.3 Palmer, R., Poulton, J., Dally, W. J., Eyles, J., Fuller, A. M., Greer, T.
  • Globally Adaptive Load-Balanced Routing on k-ary n-cubes Singh, A., Dally, W. J., Towles, B., Gupta, A. K.
  • Program Chair’s Message Dally, W. J.
  • IEEE MICRO 1998 ANNUAL INDEX, VOL. 18 Burns Dally, W. J., Adams, J., Alt, P. M., Arai, T., Arakawa, F., Avresky, D. R. ; 66: 79
  • 1987 INDEX, VOLUME 4 Dally, W. J., Agrawal, P.
  • 6 Guest Editors’ Introduction: Top Picks from the 2008 Computer Architecture Conferences Joel Emer and Dean Tullsen 10 Larrabee: A Many-Core x86 Architecture Dally, W. J., Seiler, L., Carmean, D., Sprangle, E., Forsyth, T., Abrash, M.
  • 2010 Reviewers List Dally, W. J., Acacio, M. E., Agrawal, N., Altman, E., Alur, R., Baas, B.
  • 5 Guest Editors’ Introduction: Hot Chips 21 Krste Asanovic and Ralph Wittig 7 Power7: IBM’s Next-Generation Server Processor Dally, W. J., Kalla, R., Sinharoy, B., Starke, W. J., Floyd, M., Conway, P.
  • 31st Annual International Symposium on Computer Architecture ISCA 2004 Dally, W. J., Agerwala, T., Taylor, M., Lee, W., Miller, J., Wentzlaff, D.
  • Elastic Buffer Flow Control for On-Chip Networks IEEE TRANSACTIONS ON COMPUTERS Michelogiannakis, G., Dally, W. J. 2013; 62 (2): 295-309
  • 21st century digital design tools Dally, W. J., Malachowsky, C., Keckler, S. W. 2013
  • Channel reservation protocol for over-subscribed channels and destinations Michelogiannakis, G., Jiang, N., Becker, D., Dally, W. J. 2013
  • A 0.54 pJ/b 20Gb/s ground-referenced single-ended short-haul serial link in 28nm CMOS for advanced packaging applications Solid-State Circuits Conference Digest of Technical Papers (ISSCC) Poulton, J. W., Dally, W. J., Chen, X., Eyles, J. G., Greer, T. H., Tell, S. G. 2013
  • A detailed and flexible cycle-accurate network-on-chip simulator Performance Analysis of Systems and Software (ISPASS) Jiang, N., Becker, D. U., Michelogiannakis, G., Balfour, J., Towles, B., Shaw, D. E. 2013
  • A 0.54 pJ/b 20 Gb/s Ground-Referenced Single-Ended Short-Reach Serial Link in 28 nm CMOS for Advanced Packaging Applications IEEE Poulton, J. W., Dally, W. J., Chen, X., Eyles, J. G., Greer, T. H., Tell, S. G. 2013
  • Composition and reuse with compiled domain-specific languages Dally, W. J., Sujeeth, A. K., Rompf, T., Brown, K. J., Lee, H., Chafi, H. 2013
  • Optimizing data structures in high-level programs: new directions for extensible compilers based on staging Rompf, T., Sujeeth, A. K., Amin, N., Brown, K. J., Jovanovic, V., Lee, H. 2013
  • A Hierarchical Thread Scheduler and Register File for Energy-Efficient Throughput Processors ACM TRANSACTIONS ON COMPUTER SYSTEMS Gebhart, M., Johnson, D. R., Tarjan, D., Keckler, S. W., Dally, W. J., Lindholm, E., Skadron, K. 2012; 30 (2)
  • Network Congestion Avoidance Through Speculative Reservation 2012 IEEE 18TH INTERNATIONAL SYMPOSIUM ON HIGH PERFORMANCE COMPUTER ARCHITECTURE (HPCA) Jiang, N., Becker, D. U., Michelogiannakis, G., Dally, W. J. 2012: 443-454
  • Adaptive Backpressure: Efficient Buffer Management for On-Chip Networks 2012 IEEE 30TH INTERNATIONAL CONFERENCE ON COMPUTER DESIGN (ICCD) Becker, D. U., Jiang, N., Michelogiannakis, G., Dally, W. J. 2012: 419-426
  • A case of system-level hardware/software co-design and co-verification of a commodity multi-processor system with custom hardware Dally, W. J., Hong, S., Oguntebi, T., Casper, J., Bronson, N., Kozyrakis, C. 2012
  • Digital Design: A Systems Approach Dally, W. J., Harting, R. C. Cambridge University Press. 2012
  • Green-Marl: A DSL for Easy and Efficient Graph Analysis ASPLOS XVII: SEVENTEENTH INTERNATIONAL CONFERENCE ON ARCHITECTURAL SUPPORT FOR PROGRAMMING LANGUAGES AND OPERATING SYSTEMS Hong, S., Chafi, H., Sedlar, E., Olukotun, K. 2012: 349-362
  • Unifying primary cache, scratch, and register file memories in a throughput processor Gebhart, M., Keckler, S. W., Khailany, B., Krashinsky, R., Dally, W. J. 2012
  • Article 8: A Hierarchical Thread Scheduler and Register File for Energy-Efficient Throughput Processors ACM Transactions on Computer Systems-TOCS Gebhart, M., Johnson, D. R., Tarjan, D., Keckler, S. W., Dally, W. J., Lindholm, E. 2012; 30 (2): 38
  • It's about the Power: An Architect's View of Interconnect 2012 IEEE INTERNATIONAL INTERCONNECT TECHNOLOGY CONFERENCE (IITC) Dally, B. 2012
  • Packet Chaining: Efficient Single-Cycle Allocation for On-Chip Networks IEEE COMPUTER ARCHITECTURE LETTERS Michelogiannakis, G., Jiang, N., Becker, D. U., Dally, W. J. 2011; 10 (2): 33-36
  • Evaluating Elastic Buffer and Wormhole Flow Control IEEE TRANSACTIONS ON COMPUTERS Michelogiannakis, G., Becker, D. U., Dally, W. J. 2011; 60 (6): 896-903
  • 4 Guest Editor’s Introduction: CPUs, GPUs, and Hybrid Computing David Brooks 7 GPUs and the Future of Parallel Computing Keckler, S. W., Dally, W. J., Khailany, B., Garland, M., Glasco, D., Rohr, D. 2011
  • Liszt: a domain specific language for building portable mesh-based PDE solvers DeVito, Z., Joubert, N., Palacios, F., Oakley, S., Medina, M., Barrientos, M. 2011
  • Guaranteeing forward progress of unified register allocation and instruction scheduling Technical Report Concurrent VLSI Architecture Group Memo 127, Stanford Park, J., Dally, W. J. 2011
  • GPUs and the future of parallel computing Micro, IEEE Keckler, S. W., Dally, W. J., Khailany, B., Garland, M., Glasco, D. 2011; 31 (5): 7-17
  • Energy-efficient mechanisms for managing thread context in throughput processors ACM SIGARCH Computer Architecture News Gebhart, M., Johnson, D. R., Tarjan, D., Keckler, S. W., Dally, W. J., Lindholm, E. 2011; 39 (3): 235-246
  • 2011 Index IEEE Computer Architecture Letters Vol. 10 Computer Architecture Letters Becker, D., Choi, I., Cooper-Balis, E., Dally, W. J., Devadas, S., Duato, J. 2011; 53: 56
  • Circuit challenges for future computing systems Dally, W. J. 2011
  • A compile-time managed multi-level register file hierarchy Gebhart, M., Keckler, S. W., Dally, W. J. 2011
  • Efficient Topologies for Large-scale Cluster Networks 2010 CONFERENCE ON OPTICAL FIBER COMMUNICATION OFC COLLOCATED NATIONAL FIBER OPTIC ENGINEERS CONFERENCE OFC-NFOEC Kim, J., Dally, W. J., Abts, D. 2010
  • Throughput computing Dally, W. J. 2010
  • Evaluating bufferless flow control for on-chip networks Michelogiannakis, G., Sanchez, D., Dally, W. J., Kozyrakis, C. 2010
  • The even/odd synchronizer: A fast, all-digital, periodic synchronizer Asynchronous Circuits and Systems (ASYNC), 2010 IEEE Symposium on Dally, W. J., Tell, S. G. 2010: 75-84
  • Moving the needle, computer architecture research in academe and industry ACM SIGARCH Computer Architecture News Dally, W. J. 2010; 38 (3): 1-1
  • Booksim 2.0 User’s Guide Stanford University Jiang, N., Michelogiannakis, G., Becker, D., Towles, B., Dally, W. J. 2010
  • The end of denial architecture and the rise of throughput computing Dally, W. J. 2010
  • The GPU Computing Era Nickolls, J., Dally, W. J. 2010
  • Fine-grain dynamic instruction placement for L0 scratch-pad memory Park, J., Balfour, J., Dally, W. J. 2010
  • Block-Parallel Programming for Real-time Embedded Applications WJ 2010
  • Apparatus and method for packet scheduling US Patent 7,760,747 Dally, W. J., Carvey, P. P., Beliveau, P. A., Mann, W. F., Dennison, L. R. 2010
  • 2010 IEEE Symposium on Asynchronous Circuits and Systems Dally, W. J., Tell, S. G. 2010
  • The GPU computing era Micro, IEEE Nickolls, J., Dally, W. J. 2010; 30 (2): 56-69
  • The end of denial architecture and the rise of throughput computing Keynote speech at Design Automation Conference Dally, W. J. 2010
  • Buffer-space Efficient and Deadlock-free Scheduling of Stream Applications on Multi-core Architectures SPAA '10: PROCEEDINGS OF THE TWENTY-SECOND ANNUAL SYMPOSIUM ON PARALLELISM IN ALGORITHMS AND ARCHITECTURES Park, J., Dally, W. J. 2010: 1-10
  • Operand Registers and Explicit Operand Forwarding IEEE COMPUTER ARCHITECTURE LETTERS Balfour, J., Harting, R. C., Dally, W. J. 2009; 8 (2): 60-63
  • COST-EFFICIENT DRAGONFLY TOPOLOGY FOR LARGE-SCALE SYSTEMS IEEE MICRO Kim, J., Dally, W., Scott, S., Abts, D. 2009; 29 (1): 33-40
  • Indirect adaptive routing on large scale interconnection networks ACM SIGARCH Computer Architecture News Jiang, N., Kim, J., Dally, W. J. 2009; 37 (3): 220-231
  • Router designs for elastic buffer on-chip networks Michelogiannakis, G., Dally, W. J. 2009
  • Embracing heterogeneity–parallel programming for changing hardware Linderman, M. D., Balfour, J., Meng, T. H., Dally, W. J. 2009
  • Power efficient supercomputing Accelerator-based Computing and Manycore Workshop (presentation) Dally, W. J. 2009; 1
  • Elastic-buffer flow control for on-chip networks High Performance Computer Architecture Michelogiannakis, G., Balfour, J., Dally, W. J. 2009
  • Allocator implementations for network-on-chip routers Becker, D. U., Dally, W. J. 2009
  • Maximizing the Filter Rate of L0 Compiler-Managed Instruction Stores by Pinning Technical Report 126, Concurrent VLSI Architecture Group, Stanford University Park, J., Balfour, J., Dally, W. J. 2009
  • Stream Processors Multicore Processors and Systems Erez, M., Dally, W. J. 2009: 231-270
  • Load-balanced routing US Patent 7,633,940 Singh, A., Dally, W. J. 2009
  • Exascale software study: Software challenges in extreme scale systems DARPA IPTO, Air Force Research Labs Amarasinghe, S., Campbell, D., Carlson, W., Chien, A., Dally, W., Elnozahy, E. 2009
  • Elastic-Buffer Flow Control for On-Chip Networks HPCA-15 2009: FIFTEENTH INTERNATIONAL SYMPOSIUM ON HIGH-PERFORMANCE COMPUTER ARCHITECTURE, PROCEEDINGS Michelogiannakis, G., Balfour, J., Dally, W. J. 2009: 151-162
  • Opportunities Beyond Single-Core Microprocessors HPCA-15 2009: FIFTEENTH INTERNATIONAL SYMPOSIUM ON HIGH-PERFORMANCE COMPUTER ARCHITECTURE, PROCEEDINGS Hill, M. D., Adve, S. V., Bader, D. A., Dally, W., Harrod, W., Sarkar, V. 2009: 143-143
  • Cost-Efficient Dragonfly Topology for Large-Scale Systems OFC: 2009 CONFERENCE ON OPTICAL FIBER COMMUNICATION, VOLS 1-5 Kim, J., Dally, W. J., Scott, S., Abts, D. 2009: 2174-2176
  • Efficient embedded computing COMPUTER Dally, W. J., Balfour, J., Black-Schaffer, D., Chen, J., Harting, R. C., Parikh, V., Park, J., Sheffield, D. 2008; 41 (7): 27-?
  • Stream scheduling: A framework to manage bulk operations in memory hierarchies EURO-PAR 2008 PARALLEL PROCESSING, PROCEEDINGS Das, A., Dally, W. J. 2008; 5168: 337-349
  • A Portable Runtime Interface For Multi-Level Memory Hierarchies PPOPP'08: PROCEEDINGS OF THE 2008 ACM SIGPLAN SYMPOSIUM ON PRINCIPLES AND PRACTICE OF PARALLEL PROGRAMMING Houston, M., Park, J., Ren, M., Knight, T., Fatahalian, K., Aiken, A., Dally, W. J., Hanrahan, P. 2008: 143-152
  • A tuning framework for software-managed memory hierarchies Ren, M., Park, J. Y., Houston, M., Aiken, A., Dally, W. J. 2008
  • An energy-efficient processor architecture for embedded systems Computer Architecture Letters Balfour, J., Dally, W. J., Black-Schaffer, D., Parikh, V., Park, J. S. 2008; 7 (1): 29-32
  • Exascale computing study: Technology challenges in achieving exascale systems Kogge, P., Bergman, K., Borkar, S., Campbell, D., Carlson, W., Dally, W. 2008
  • A programmable 512 GOPS stream processor for signal, image, and video processing Solid-State Circuits, IEEE Journal Khailany, B. K., Williams, T., Lin, J., Long, E. P., Rygh, M., Tovey, D. F., Dally, B. 2008; 43 (1): 202-213
  • Structured Application-Specific Integrated Circuit (ASIC) Study STANFORD UNIV CA COMPUTER SYSTEMS LAB Dally, W., Balfour, J., Black-Schaffer, D., Hartke, P. 2008
  • Exascale computing study: Technology challenges in achieving exascale systems Bergman, K., Borkar, S., Campbell, D., Carlson, W., Dally, W., Denneau, M. 2008
  • Hierarchical instruction register organization Computer Architecture Letters Black-Schaffer, D., Balfour, J., Dally, W., Parikh, V., Park, J. S. 2008; 7 (2): 41-44
  • Technology-driven, highly-scalable dragonfly topology ISCA 2008 PROCEEDINGS: 35TH INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE Kim, J., Dally, W. J., Scott, S., Abts, D. 2008: 77-88
  • A 14-mW 6.25-Gb/s transceiver in 90-nm CMOS Poulton, J., Palmer, R., Fuller, A. M., Greer, T., Eyles, J., Dally, W. J., Horowitz, M. IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC. 2007: 2745-2757
  • Research challenges for on-chip interconnection networks IEEE MICRO Owens, J. D., Dally, W. J., Ho, R., Jayasimha, D. N., Keckler, S. W., Peh, L. 2007; 27 (5): 96-108
  • Register pointer architecture for efficient embedded processors 2007 DESIGN, AUTOMATION & TEST IN EUROPE CONFERENCE & EXHIBITION, VOLS 1-3 Park, J., Park, S., Balfour, J. D., Black-Schaffer, D., Kozyrakis, C., Dally, W. J. 2007: 600-605
  • Computer architecture in the many-core era PROCEEDINGS 2006 INTERNATIONAL CONFERENCE ON COMPUTER DESIGN Dally, B. 2007: 1-1
  • Compilation for Explicitly Managed Memory Hierarchies PROCEEDINGS OF THE 2007 ACM SIGPLAN SYMPOSIUM ON PRINCIPLES AND PRACTICE OF PARALLEL PROGRAMMING PPOPP'07 Knight, T. J., Park, J. Y., Ren, M., Houston, M., Erez, M., Fatahalian, K., Aiken, A., Dally, W. J., Hanrahan, P. 2007: 226-236
  • Research Challenges for On-Chip Interconnection Networks Owens, J. D., Dally, W. J., Ho, R., Jayasimha, D. N., Keckler, S. W., Peh, L. S. 2007
  • Executing irregular scientific applications on stream architectures Erez, M., Ahn, J. H., Gummaraju, J., Rosenblum, M., Dally, W. J. 2007
  • A 14mW 6.25 Gb/s transceiver in 90nm CMOS for serial chip-to-chip communications Palmer, R., Poulton, J., Dally, W. J., Eyles, J., Fuller, A. M., Greer, T. 2007
  • Architectural support for the stream execution model on general-purpose processors Gummaraju, J., Erez, M., Coburn, J., Rosenblum, M., Dally, W. J. 2007
  • Stream Scheduling: A Framework to Manage Bulk Operations in a Memory Hierarchy Parallel Architecture and Compilation Techniques Das, A., Dally, W. J. 2007
  • Interconnect-Centric Computing. HPCA Keynote Dally, W. J. 2007; 1
  • Tradeoff between data-, instruction-, and thread-level parallelism in stream processors Ahn, J., Erez, M., Dally, W. J. 2007
  • Flattened butterfly: a cost-efficient topology for high-radix networks ACM SIGARCH Computer Architecture News Kim, J., Dally, W. J., Abts, D. 2007; 35 (2): 126-137
  • Flattened Butterfly: A Cost-Efficient Topology for High-Radix Networks ISCA'07: 34TH ANNUAL INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE, CONFERENCE PROCEEDINGS Kim, J., Dally, W. J., Abts, D. 2007: 126-137
  • Flattened butterfly topology for on-chip networks MICRO-40: PROCEEDINGS OF THE 40TH ANNUAL IEEE/ACM INTERNATIONAL SYMPOSIUM ON MICROARCHITECTURE Kim, J., Balfour, J., Dally, W. J. 2007: 172-182
  • Future directions for on-chip interconnection networks OCIN Workshop Dally, W. J. 2006
  • The BlackWidow high-radix Clos network 33RD INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE, PROCEEDINGS Scott, S., Abts, D., Kim, J., Dally, W. J. 2006: 16-27
  • Sequoia: programming the memory hierarchy Fatahalian, K., Horn, D., Knight, T., Leem, L., Houston, M., Park, J., Dally, B. 2006
  • Multi-Core for HPC: Breakthrough or Breakdown? Sterling, T., Kogge, P., Dally, W., Scott, S., Gropp, W., Keyes, D. 2006
  • Topology optimization of interconnection networks Computer Architecture Letters Gupta, A. K., Dally, W. J. 2006; 5 (1): 10-13
  • Prefix search method US Patent 7,130,847 Waters, G. M., Dennison, L. R., Carvey, P. P., Dally, W. J., Mann, W. F. 2006
  • DRAFT Final Report: Workshop on On- and Off-Chip Networks for Multi-Core Systems Retrieved from: http://www.ece.ucdavis.edu/~ocin06 Dally, W. 2006
  • Compiling for stream processing Das, A., Dally, W. J., Mattson, P. 2006
  • Data parallel address architecture Computer Architecture Letters Ahn, J. H., Dally, W. J. 2006; 5 (1): 30-33
  • Adaptive routing in high-radix clos network Kim, J., Dally, W. J., Abts, D. 2006
  • Pulsenet-A Parallel Flash Sampler and Digital Processor IC for Optical SETI Custom Integrated Circuits Conference, 2006. CICC'06. IEEE Howard, A. W., Wei, G. Y., Dally, W. J., Horowitz, P. 2006: 261-264
  • The design space of data-parallel memory systems Ahn, J. H., Erez, M., Dally, W. J. 2006
  • Design tradeoffs for tiled CMP on-chip networks Balfour, J., Dally, W. J. 2006
  • A 20-Gb/s 0.13-µm CMOS serial link transmitter using an LC-PLL to directly drive the output multiplexer Chiang, P., Dally, W. J., Lee, M. J., Senthinathan, R., Oh, Y., Horowitz, M. A. IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC. 2005: 1004-1011
  • Scatter-add in data parallel architectures 11TH INTERNATIONAL SYMPOSIUM ON HIGH-PERFORMANCE COMPUTER ARCHITECTURE, PROCEEDINGS Ahn, J. H., Erez, M., Dally, W. J. 2005: 132-142
  • 11th International Symposium on High-Performance Computer Architecture (HPCA'05) Ahn, J. H., Erez, M., Dally, W. J. 2005
  • Fault tolerance techniques for the Merrimac streaming supercomputer Erez, M., Jayasena, N., Knight, T. J., Dally, W. J. 2005
  • Microarchitecture of a high-radix router 32ND INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE, PROCEEDINGS Kim, J., Dally, W. J., Towles, B., Gupta, A. K. 2005: 420-431
  • Explaining the gap between ASIC and custom power: A custom perspective 42ND DESIGN AUTOMATION CONFERENCE, PROCEEDINGS 2005 Chang, A., Dally, W. J. 2005: 281-284
  • A 33-mW 8-Gb/s CMOS clock multiplier and CDR for highly integrated I/Os Farjad-Rad, R., Nguyen, A., Tran, J. M., Greer, T., Poulton, J., Dally, W. J., Edmondson, J. H., Senthinathan, R., Rathi, R., Lee, M. J., Ng, H. T. IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC. 2004: 1553-1561
  • Stream register files with indexed access 10TH INTERNATIONAL SYMPOSIUM ON HIGH PERFORMANCE COMPUTER ARCHITECTURE, PROCEEDINGS Jayasena, N., Erez, M., Ahn, J. H., Dally, W. J. 2004: 60-72
  • Streams and vectors: A memory system perspective 6th WorkShop on Media and Streaming Processors Jayasena, N., Dally, W. J. 2004
  • High-Speed Logic, Circuits, Libraries and Layout Closing the Gap Between ASIC & Custom Chang, A., Dally, W. J., Chinnery, D., Keutzer, K., Zlatanovici, R. 2004: 101-144
  • The case for broader computer architecture education: keynote address Dally, W. J. 2004
  • Buffer and delay bounds in high radix interconnection networks Computer Architecture Letters Singh, A., Dally, W. J. 2004; 3 (1): 8-8
  • Adaptive channel queue routing on k-ary n-cubes Singh, A., Dally, W. J., Gupta, A., Towles, B. 2004
  • Stream processors: Programmability and efficiency Queue Dally, W. J., Kapasi, U. J., Khailany, B., Ahn, J. H., Das, A. 2004; 2 (1): 52
  • Principles and practices of interconnection networks Access Online via Elsevier Dally, W. J., Towles, B. P. 2004
  • How scaling will change processor architecture Solid-State Circuits Conference, 2004. Digest of Technical Papers. Horowitz, M., Dally, W. 2004
  • Exploiting Structure and Managing Wires to Increase Density and Performance Closing the Gap Between ASIC & Custom Chang, A., Dally, W. J. 2004: 269-287
  • Analysis and performance results of a molecular modeling application on Merrimac Erez, M., Ahn, J. H., Garg, A., Dally, W. J., Darve, E. 2004
  • Space-efficient source routing Carvey, P., Dally, W., Dennison, L., King, P., Mann, W. 2004
  • Globally adaptive load-balanced routing on tori Computer Architecture Letters Singh, A., Dally, W. J., Towles, B., Gupta, A. K. 2004; 3 (1): 2-2
  • Evaluating the Imagine stream architecture 31ST ANNUAL INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE, PROCEEDINGS Ahn, J. H., Dally, W. J., Khailany, B., Kapasi, U. J., Das, A. 2004: 14-25
  • A 20Gb/s 0.13um CMOS serial link transmitter using an LC-PLL to directly drive the output multiplexer 2004 SYMPOSIUM ON VLSI CIRCUITS, DIGEST OF TECHNICAL PAPERS Chiang, P., Dally, W. J., Lee, M. J., Senthinathan, R., Oh, Y., Horowitz, M. 2004: 272-275
  • A second-order semidigital clock recovery circuit based on injection locking Ng, H. T., Farjad-Rad, R., Lee, M. J., Dally, W. J., Greer, T., Poulton, J., Edmondson, J. H., Rathi, R., Senthinathan, R. IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC. 2003: 2101-2110
  • Guaranteed scheduling for switches with configuration overhead IEEE-ACM TRANSACTIONS ON NETWORKING Towles, B., Dally, W. J. 2003; 11 (5): 835-847
  • Programmable stream processors COMPUTER Kapasi, U. J., Rixner, S., Dally, W. J., Khailany, B., Ahn, J. H., Mattson, P., Owens, J. D. 2003; 36 (8): 54-?
  • Jitter transfer characteristics of delay-locked loops - Theories and design techniques IEEE JOURNAL OF SOLID-STATE CIRCUITS Lee, M. J., Dally, W. J., Greer, T., Ng, H. T., Farjad-Rad, R., Poulton, J., Senthinathan, R. 2003; 38 (4): 614-621
  • Exploring the VLSI scalability of stream processors NINTH INTERNATIONAL SYMPOSIUM ON HIGH-PERFORMANCE COMPUTER ARCHITECTURE, PROCEEDINGS Khailany, B., Dally, W. J., Rixner, S., Kapasi, U. J., Owens, J. D., Towles, B. 2003: 153-164
  • 0.622-8.0 Gbps 150 mW serial IO macrocell with fully flexible preemphasis and equalization VLSI Circuits, 2003. Digest of Technical Papers. 2003 Symposium on Farjad-Rad, R., Ng, H. T., Lee, M. J., Senthinathan, R., Dally, W. J., Nguyen, A. 2003: 63-66
  • Merrimac: Supercomputing with streams Dally, W. J., Labonte, F., Das, A., Hanrahan, P., Ahn, J. H., Gummaraju, J. 2003
  • Prefix search method Carvey, P., Carvey, P., Dennison, L., Mann, W., Waters, G. 2003
  • A second-order semi-digital clock recovery circuit based on injection locking Solid-State Circuits Conference, 2003. Digest of Technical Papers. ISSCC Lee, M. J., Dally, W. J., Poulton, J., Greer, T., Edmondson, J., Farjad-Rad, R. 2003
  • A 33mW 8Gb/s CMOS clock multiplier and CDR for highly integrated I/Os Ng, H. T., Lee, M. J., Farjad-Rad, R., Senthinathan, R., Dally, W. J., Nguyen, A. 2003
  • Methods and apparatus for event-driven routing Carvey, P., Dally, W., Dennison, L., King, P. 2003
  • Throughput-centric routing algorithm design Towles, B., Dally, W. J., Boyd, S. 2003
  • CMOS high-speed I/Os-present and future Lee, M. J., Dally, W. J., Farjad-Rad, R., Ng, H. T., Senthinathan, R., Edmondson, J. 2003
  • The Ninth International Symposium on High-Performance Computer Architecture (HPCA'03) Khailany, B., Dally, W. J., Rixner, S., Kapasi, U. J., Owens, J. D., Towles, B. 2003
  • GOAL: A load-balanced adaptive routing algorithm for torus networks 30TH ANNUAL INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE, PROCEEDINGS Singh, A., Dally, W. J., Gupta, A. K., Towles, B. 2003: 194-205
  • A low-power multiplying DLL for low-jitter multigigahertz clock generation in highly integrated digital chips Farjad-Rad, R., Dally, W., Ng, H. T., Senthinathan, R., Lee, M. J., Rathi, R., Poulton, J. IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC. 2002: 1804-1812
  • A stream processor development platform ICCD'2002: IEEE INTERNATIONAL CONFERENCE ON COMPUTER DESIGN: VLSI IN COMPUTERS AND PROCESSORS, PROCEEDINGS Serebrin, B., Owens, J. D., Chen, C. H., Crago, S. P., Kapasi, U. J., Khailany, B., Mattson, P., Namkoong, J., Rixner, S., Dally, W. J. 2002: 303-308
  • Locality-preserving randomized oblivious routing on torus networks Singh, A., Dally, W. J., Towles, B., Gupta, A. K. 2002
  • Comparing Reyes and OpenGL on a stream architecture Owens, J. D., Khailany, B., Towles, B., Dally, W. J. 2002
  • Prefix search circuitry and method Carvey, P., Dally, W., Dennison, L., Mann, W., Waters, G. 2002
  • Internet switch router Carvey, P., Carvey, P., Dennison, L., King, P. 2002
  • Computer architecture is all about interconnect High-Perf. Comp. Architecture Dally, W. J. 2002
  • Worst-case traffic for oblivious routing functions Towles, B., Dally, W. J. 2002
  • Stream Processing for High-Performance Embedded Systems Defense Technical Information Center Dally, W. J. 2002
  • Method and system for guaranteeing quality of service in large capacity input output buffered cell switch based on minimum bandwidth guarantees and weighted fair share of unused bandwidth Dally, W., Meempat, G., Ramamurthy, G. 2002
  • Worst-case Traffic for Oblivious Routing Functions Towles, B., Dally, W. J. 2002
  • A 0.2-2 GHz 12 mW multiplying DLL for low-jitter clock synthesis in highly-integrated data communication chips Farjad-Rad, R., Dally, W., Ng, H. T., Poulton, J., Stone, T., Rathi, R. 2002
  • Migration in single chip multiprocessors Computer Architecture Letters Shaw, K. A., Dally, W. J. 2002; 1 (1): 12-12
  • Media processing applications on the Imagine stream processor ICCD'2002: IEEE INTERNATIONAL CONFERENCE ON COMPUTER DESIGN: VLSI IN COMPUTERS AND PROCESSORS, PROCEEDINGS Owens, J. D., Rixner, S., Kapasi, U. J., Mattson, P., Towles, B., Serebrin, B., Dally, W. J. 2002: 295-302
  • Guaranteed scheduling for switches with configuration overhead IEEE INFOCOM 2002: THE CONFERENCE ON COMPUTER COMMUNICATIONS, VOLS 1-3, PROCEEDINGS Towles, B., Dally, W. J. 2002: 342-351
  • Scalable opto-electronic network (SOENet) HOT INTERCONNECTS 10 Gupta, A. K., Dally, W. J., Singh, A., Towles, B. 2002: 71-76
  • The Imagine stream processor ICCD'2002: IEEE INTERNATIONAL CONFERENCE ON COMPUTER DESIGN: VLSI IN COMPUTERS AND PROCESSORS, PROCEEDINGS Kapasi, U. J., Dally, W. J., Rixner, S., Owens, J. D., Khailany, B. 2002: 282-288
  • VLSI design and verification of the Imagine processor ICCD'2002: IEEE INTERNATIONAL CONFERENCE ON COMPUTER DESIGN: VLSI IN COMPUTERS AND PROCESSORS, PROCEEDINGS Khailany, B., Dally, W. J., Chang, A., Kapasi, U. J., Namkoong, J., Towles, B. 2002: 289-294
  • Hot chips 12 IEEE MICRO Dally, W. J., Tremblay, M., Baum, A. J. 2001; 21 (2): 13-15
  • Imagine: Media processing with streams IEEE MICRO Khailany, B., Dally, W. J., Kapasi, U. J., Mattson, P., Namkoong, J., Owens, J. D., Towles, B., Chang, A., Rixner, S. 2001; 21 (2): 35-46
  • A delay model for router microarchitectures IEEE MICRO Peh, L. S., Dally, W. J. 2001; 21 (1): 26-34
  • A Delay Model for Router Microarchitectures Peh, L. S., Dally, W. J. 2001
  • Elastic interconnects: Repeater-inserted long wiring capable of compressing and decompressing data Mizuno, M., Dally, W., Onishi, H. 2001
  • Scalable switching fabrics for Internet routers White paper, Avici Systems Inc Dally, W. J. 2001
  • Monolithic chaotic communications system Circuits and Systems, 2001. ISCAS 2001. The 2001 IEEE International Chiang, P., Dally, W., Lee, E. 2001
  • Guest Editors' Introduction: Hot Chips 12 IEEE MICRO Baum, A. J., Dally, W. J., Tremblay, M. 2001; 21 (2): 13-15
  • A streaming supercomputer Whitepaper Dally, W. J., Hanrahan, P., Fedkiw, R. 2001
  • A single-chip terabit switch Hot Chips Dally, W. J., Dettloff, W., Eyles, J., Greer, T., Poulton, J., Stone, T. 2001; 13
  • Guest Editors' Introduction: Hot Chips 12 Dally, W. J., Tremblay, M., Baum, A. J. 2001
  • Route packets, not wires: On-chip interconnection networks 38TH DESIGN AUTOMATION CONFERENCE PROCEEDINGS 2001 Dally, W. J., Towles, B. 2001: 684-689
  • A delay model and speculative architecture for pipelined routers HPCA: SEVENTH INTERNATIONAL SYMPOSIUM ON HIGH-PERFORMANCE COMPUTER ARCHITECTURE, PROCEEDINGS Peh, L. S., Dally, W. J. 2001: 255-266
  • An 84-mW 4-Gb/s clock and data recovery circuit for serial link applications 2001 SYMPOSIUM ON VLSI CIRCUITS, DIGEST OF TECHNICAL PAPERS Lee, M. J., Dally, W. J., Poulton, J. W., Chiang, P., Greenwood, S. F. 2001: 149-152
  • Low-power area-efficient high-speed I/O circuit techniques Lee, M. J., Dally, W. J., Chiang, P. IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC. 2000: 1591-1599
  • Communication scheduling Mattson, P., Dally, W. J., Rixner, S., Kapasi, U. J., Owens, J. D. ASSOC COMPUTING MACHINERY. 2000: 82-92
  • Memory access scheduling PROCEEDING OF THE 27TH INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE Rixner, S., Dally, W. J., Kapasi, U. J., Mattson, P., Owens, J. D. 2000: 128-138
  • Efficient conditional operations for data-parallel architectures 33RD ANNUAL IEEE/ACM INTERNATIONAL SYMPOSIUM ON MICROARCHITECTURE: MICRO-33 2000, PROCEEDINGS Kapasi, U. J., Dally, W. J., Rixner, S., Mattson, P. R., Owens, J. D., Khailany, B. 2000: 159-170
  • 10 Subspace Optimizations Knobe, K., Dally, W. J. edited by Kessler, Christoph, W. 2000
  • Flit-reservation flow control Peh, L. S., Dally, W. J. 2000
  • Stream Scheduling STANFORD UNIV CA COMPUTER SYSTEMS LAB Dally, W. J., Mattson, P., Kapasi, U. J., Owens, J. D., Towles, B. 2000
  • Stream scheduling STANFORD UNIV CA COMPUTER SYSTEMS LAB Kapasi, U. J., Mattson, P., Dally, W. J., Owens, J. D., Towles, B. 2000
  • Sixth International Symposium on High-Performance Computer Architecture Rixner, S., Dally, W. J., Khailany, B., Mattson, P., Kapasi, U. J., Owens, J. D. 2000
  • Sixth International Symposium on High-Performance Computer Architecture Peh, L. S., Dally, W. J. 2000
  • Memory access scheduling isca Owens, J. D., Mattson, P., Kapasi, U. J., Dally, W. J., Rixner, S. 2000; 128
  • Register organization for media processing Rixner, S., Dally, W. J., Khailany, B., Mattson, P., Kapasi, U. J., Owens, J. 2000
  • Polygon rendering on a stream architecture Owens, J. D., Dally, W. J., Kapasi, U. J., Rixner, S., Mattson, P., Mowery, B. 2000
  • A 90 mW 4 Gb/s equalized I/O circuit with input offset cancellation Lee, M. J., Dally, W., Chiang, P. 2000
  • Smart memories: A modular reconfigurable architecture ACM SIGARCH Computer Architecture News Mai, K., Paaske, T., Jayasena, N., Ho, R., Dally, W. J., Horowitz, M. 2000; 28 (2): 161-171
  • Processor mechanisms for software shared memory HIGH PERFORMANCE COMPUTING, PROCEEDINGS Carter, N. P., Dally, W. J., Lee, W. S., Keckler, S. W., Chang, A. 2000; 1940: 120-133
  • The role of custom design in ASIC chips 37TH DESIGN AUTOMATION CONFERENCE, PROCEEDINGS 2000 Dally, W. J., Chang, A. 2000: 643-647
  • Concurrent event handling through multithreading IEEE TRANSACTIONS ON COMPUTERS Keckler, S. W., Chang, A., Lee, W. S., Chatterjee, S., Dally, W. J. 1999; 48 (9): 903-916
  • VLSI architecture: Past, present, and future 20TH ANNIVERSARY CONFERENCE ON ADVANCED RESEARCH IN VLSI, PROCEEDINGS Dally, W. J., Lacy, S. 1999: 232-241
  • GAD: A 12-GS/s CMOS 4-bit A/D converter for an equalized multi-level link Ellersick, W., Yang, C. K., Horowitz, M., Dally, W. J. 1999
  • Interconnect-limited VLSI architecture Interconnect Technology, 1999. IEEE International Conference Dally, W. J. 1999: 15-17
  • Computer Architecture for the Next Millennium Dally, W. J. 1999
  • 20th Anniversary Conference on Advanced Research in VLSI Dally, W. J., Lacy, S. 1999
  • Guest editors' introduction: The bleeding edge IEEE MICRO Rettberg, R., Dally, W. J., Culler, D. E. 1998; 18 (1): 10-11
  • Tomorrow’s Computing Engines keynote speech, Fourth Int’l Symp. High-Performance Computer Architecture Dally, W. 1998
  • The J-Machine: A retrospective Retrospective in Dally, W. J., Chang, A., Chien, A., Fiske, S., Horwat, W., Keen, J. 1998: 54-58
  • Retrospective: the J-machine Dally, W. J., Chien, A., Fiske, S., Horwat, W., Lethin, R., Noakes, M. 1998
  • VLSI datapath choices: Cell-based versus full-custom Massachusetts Institute of Technology Chang, A. L. 1998
  • Architecture of a message-driven processor 25 years of the international symposia on Computer architecture (selected Dally, W. J., Chao, L., Chien, A., Hassoun, S., Horwat, W., Kaplan, J. 1998
  • An efficient, protected message interface Computer Lee, W. S., Dally, W. J., Keckler, S. W., Carter, N. P., Chang, A. 1998; 11 (31): 69-75
  • Digital systems engineering Cambridge University Press Dally, W. J., Poulton, J. W. 1998
  • Architecture of the Avici terabit switch/router Dally, W., Carvey, P., Dennison, L. 1998
  • Digital Systems Engineering Poulton, J. W., Dally, W. J. Cambridge University Press. 1998
  • Efficient, protected message interface in the MIT M-Machine IEEE Computer Special Issue on Design Challenges for High-Performance Lee, W. S., Dally, W. J., Keckler, S. W., Carter, N. P., Chang, A. 1998
  • An instruction scheduling algorithm for communication-constrained microprocessors Massachusetts Institute of Technology Dally, W. J., Buehler, C. J. 1998
  • The Fifth International Conference on Massively Parallel Processing Using Optical Interconnections Dally, W. J., Lee, M. J., An, F. T., Poulton, J., Tell, S. 1998
  • Point sample rendering Rendering Techniques Grossman, J. P., Dally, W. J. 1998; 98: 181-192
  • Media Processors 1999 (Proceedings Volume) Dally, W. J., Fritts, J. E., Wolf, W. H., Liu, B., Bove Jr, V. M., Lee, M. 1998
  • Media processing using streams Electronic Imaging Rixner, S., Dally, W. J., Kapasi, U. J., Khailany, B., Lopez-Lagunas, A., Mattson, P. R. 1998: 122-134
  • The J-Machine ANNUAL INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE Dally, W. J., Chang, A., Chien, A., Fiske, S., Horwat, W., Keen, J. 1998; 25: 54-58
  • Invited Talks Coldren, L. A., Dally, W. J. 1998
  • Point sample rendering Massachusetts Institute of Technology Dally, W. J., Grossman, J. P. 1998
  • A tracking clock recovery receiver for 4-Gbps signaling IEEE MICRO Poulton, J., Dally, W. J., Tell, S. 1998; 18 (1): 25-27
  • A bandwidth-efficient architecture for media processing 31ST ANNUAL ACM/IEEE INTERNATIONAL SYMPOSIUM ON MICROARCHITECTURE, PROCEEDINGS Rixner, S., Dally, W. J., Kapasi, U. J., Khailany, B., Lopez-Lagunas, A., Mattson, P. R., Owens, J. D. 1998: 3-13
  • Communication-oriented computer architecture: Data choreography abstract INNOVATIVE ARCHITECTURE FOR FUTURE GENERATION HIGH-PERFORMANCE PROCESSORS AND SYSTEMS, PROCEEDINGS Dally, W. J. 1998: 93-93
  • Exploiting fine-grain thread level parallelism on the MIT Multi-ALU Processor 25TH ANNUAL INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE, PROCEEDINGS Keckler, S. W., Dally, W. J., Maskit, D., Carter, N. P., Chang, A., Lee, W. S. 1998: 306-317
  • The effects of explicitly parallel mechanisms on the Multi-ALU Processor cluster pipeline INTERNATIONAL CONFERENCE ON COMPUTER DESIGN: VLSI IN COMPUTERS AND PROCESSORS, PROCEEDINGS Chang, A., Dally, W. J., Keckler, S. W., Carter, N. P., Lee, W. S. 1998: 474-481
  • High-performance electrical signaling FIFTH INTERNATIONAL CONFERENCE ON MASSIVELY PARALLEL PROCESSING, PROCEEDINGS Dally, W. J., Lee, M. J., An, F. T., Poulton, J., Tell, S. 1998: 11-16
  • Media processing using streams MEDIA PROCESSORS 1999 Rixner, S., Dally, W. J., Kapasi, U. J., Khailany, B., Lopez-Lagunas, A., Mattson, P. R., Owens, J. D. 1998; 3655: 122-134
  • Message-driven dynamics Massachusetts Institute of Technology Dally, W. J., Lethin, R. A. 1997
  • An I/O port controller for the MAP chip Massachusetts Institute of Technology, Dept. of Electrical Engineering and Dally, W. J., Ma, A. 1997
  • Transmitter equalization for 4-Gbps signaling Micro, IEEE Dally, W. J., Poulton, J. 1997; 1 (17): 48-56
  • The M-Machine multicomputer International Journal of Parallel Programming Fillo, M., Keckler, S. W., Dally, W. J., Carter, N. P., Chang, A., Gurevich, Y. 1997; 3 (25): 183-212
  • The delta tree: An object-centered approach to image-based rendering Dally, W. J., McMillan, L., Bishop, G., Fuchs, H. 1997
  • Extended ephemeral logging: log storage management for applications with long lived transactions ACM Transactions on Database Systems (TODS) Keen, J. S., Dally, W. J. 1997; 1 (22): 1-42
  • Design of the Configuration and Diagnostic Units of the MAP Chip Massachusetts Institute of Technology Dally, W. J., Klayman, K. 1997
  • Asynchronous event handling Massachusetts Institute of Technology Dally, W. J., Chatterjee, S. 1997
  • Advances in the M-machine runtime system Massachusetts Institute of Technology Dally, W. J., Shultz, A. 1997
  • TPDS Now Online! Special Issue Editors Old and New IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS Dally, W. J., Fortes, J. A. 1997; 3 (8): 225
  • Circuit designs for the MAP chip Massachusetts Institute of Technology Dally, W. J., Chen, A. R. 1997
  • 1997 Annual Index, Vol. 17 development [single chip microprocessors] Dally, W. J., Adams, L., Anderson, T., Bilas, A., Biswas, B. B., Burger, D. 1997; 2000: 28-36
  • Flexible Memory Systems (AASERT Fellowship) MASSACHUSETTS INST OF TECH CAMBRIDGE Carter, N., Dally, W. J. 1996
  • The subspace model: Shape-based compilation for parallel systems Massachusetts Institute of Technology Dally, W. J., Knobe, K. B. 1996
  • Architects Look to Processors of Future MICROPROCESSOR REPORT, MICRODESIGN RESOURCES Bell, G., Sites, R., Dally, W., Ditzel, D., Patt, Y. 1996; 10 (10)
  • Multiprocessor coupling system with integrated compile and run time scheduling for parallelism US Patent 5,574,939 Keckler, S. W., Dally, W. J. 1996
  • Bandwidth, Granularity, and Mechanisms: Key Issues in the Design of Parallel Computers Dally, W. J. 1996
  • Flexible Memory Systems (AASERT Fellowship) MASSACHUSETTS INST OF TECH CAMBRIDGE Dally, W. J., Carter, N. 1996
  • A data-driven IDCT architecture for low power video applications Xanthopoulos, T., Chandrakasan, A. P., Sodini, C. G., Dally, W. J. 1996
  • 1st IEEE Symposium on High-Performance Computer Architecture Fiske, S., Dally, W. J. 1995
  • Thread prioritization: A thread scheduling mechanism for multiple-context parallel processors Future Generation Computer Systems Fiske, S., Dally, W. J. 1995; 6 (11): 503-518
  • Low-latency plesiochronous data retiming Dennison, L. R., Dally, W. J., Xanthopoulos, D. 1995
  • Proceedings Dally, W. J., Poulton, J. W., Ishii, A. T. 1995
  • The M-Machine Multicomputer MASSACHUSETTS INST OF TECH CAMBRIDGE ARTIFICIAL INTELLIGENCE LAB Dally, W. J., Keckler, S. W., Fillo, M., Carter, N. P., Chang, A. 1995
  • 1st IEEE Symposium on High-Performance Computer Architecture Nuth, P. R., Dally, W. J. 1995
  • Implementation of atomic primitives on distributed shared memory multiprocessors Dally, W. J., Michael, M. M., Scott, M. L. 1995
  • Evaluating the locality benefits of active messages ACM SIGPLAN Notices Spertus, E., Dally, W. J. 1995; 8 (30): 189-198
  • The M-Machine operating system Massachusetts Institute of Technology Dally, W. J., Gurevich, Y. 1995
  • The subspace model: A theory of shapes for parallel systems Knobe, K., Dally, W. J. 1995
  • Fault tolerant adaptive routing in multicomputer networks Massachusetts Institute of Technology Xanthopoulos, T. 1995
  • The named-state register file: Implementation and performance Nuth, P. R., Dally, W. J. 1995
  • Hardware support for fast capability-based addressing ACM SIGPLAN Notices Carter, N. P., Keckler, S. W., Dally, W. J. 1994; 11 (29): 319-327
  • Efficient message subsystem design Massachusetts Institute of Technology Dally, W. J., Lee, W. S. 1994
  • The implementation of a reliable router chip Massachusetts Institute of Technology Dally, W. J., Kan, K. H. 1994
  • The design of a high performance SPARC bus interface Massachusetts Institute of Technology Dally, W. J., Wong, D. F. 1994
  • VLSI design for freshmen and sophomores Massachusetts Institute of Technology Dally, W. J., Harris, D. 1994
  • Subspace optimizations Automatic Parallelization Knobe, K., Dally, W. J. 1994: 153-176
  • M-Machine Microarchitecture v1.11 Dally, W. J., Keckler, S. W., Carter, N., Chang, A., Fillo, M., Lee, W. S. 1994
  • Logging and recovery in a highly concurrent database Dally, W. J., Keen, J. S. 1994
  • XEL: extended ephemeral logging for log storage management Keen, J. S., Dally, W. J. 1994
  • The reliable router: A reliable and high-performance communication substrate for parallel computers Parallel Computer Routing and Communication Dally, W. J., Dennison, L. R., Harris, D., Kan, K., Xanthopoulos, T. 1994: 241-255
  • Named state and efficient context switching Multithreaded Computer Architecture Nuth, P. R., Dally, W. J. 1994: 201-212
  • Multithreaded computer architecture Boston: Kluwer Academic Publishers Dennis, J. B., Gao, G. R., Iannucci, R. A., Dally, W. J. 1994
  • Architecture and implementation of the Reliable Router Dally, W. J., Dennison, L. R., Harris, D., Kan, K., Xanthopoulos, T. 1994
  • A subspace optimizing data parallel compiler Massachusetts Institute of Technology Dally, W. J., Dampier, T. O. 1994
  • A numerical engine for distributed sparse matrices Massachusetts Institute of Technology Dally, W. J., Telichevesky, R. 1994
  • The design and implementation of an actor language based on linear logic Massachusetts Institute of Technology Dally, W. J., Tse, C. S. 1994
  • Issues in the Design and Implementation of Instruction Processors for Multicomputers (Position Statement) Multithreaded Computer Architecture Dally, W. J. 1994: 79-82
  • How to Choose the Grain Size of a Parallel Computer MIT/LCS Technical Report Yeung, D., Dally, W. J., Agarwal, A. 1994: MIT-LCS-TR-739
  • Deadlock-free adaptive routing in multicomputer networks using virtual channels Parallel and Distributed Systems, IEEE Transactions Dally, W. J., Aoki, H. 1993; 4 (4): 466-475
  • High-performance bidirectional signalling in VLSI systems Dennison, L. R., Lee, W. S., Dally, W. J. 1993
  • The J-machine multicomputer: an architectural evaluation ACM SIGARCH Computer Architecture News Noakes, M. D., Wallach, D. A., Dally, W. J. 1993; 2 (21): 224-235
  • Performance evaluation of ephemeral logging ACM SIGMOD Record Keen, J. S., Dally, W. J. 1993; 2 (22): 187-196
  • Evaluation of mechanisms for fine-grained parallel programs in the J-machine and the CM-5 ACM SIGARCH Computer Architecture News Spertus, E., Goldstein, S. C., Schauser, K. E., Eicken, T. V., Culler, D. E., Dally, W. J. 1993; 3 (21): 302-313
  • COSMOS: An operating system for a fine-grain concurrent computer Research directions in concurrent object-oriented programming Horwat, W., Totty, B., Dally, W. J. 1993: 452-476
  • The J-Machine architecture and evaluation Compcon Spring'93, Digest of Papers. Dally, W. J., Keen, J. S., Noakes, M. D. 1993: 183-188
  • Message-driven processor in a concurrent computer US Patent 5,212,778 Dally, W. J., Chien, A. A., Horwat, W. P., Fiske, S. 1993
  • A Video Controller and Distributed Frame Buffer for the J-Machine Dally, W. J., McDonald, E. 1993
  • A universal parallel computer architecture New Generation Computing Dally, W. J. 1993; 3-4 (11): 227-249
  • Mechanisms for parallel computers Parallel Computing on Distributed Memory Multiprocessors Dally, W. J., Wills, D. S., Lethin, R. 1993: 3-25
  • The Future of Computing is Parallel Computer Science Department Dally, W. J. 1993
  • The J-machine: a fine-grain parallel computer Computing Systems in Engineering Dally, W. J., Chien, A., Davison, R., Fiske, J. A., Furman, S., Fyler, G. 1992; 1 (3): 7-15
  • Design and implementation of the Message-Driven Processor Dally, W. J., Ahmed, S., Carrick, P., Chien, A., Davison, R., Fiske, J. 1992
  • The message-driven processor: A multicomputer processing node with efficient mechanisms Micro, IEEE Dally, W. J., Fiske, J. A., Keen, J. S., Lethin, R. A., Noakes, M. D., Nuth, P. R. 1992; 2 (12): 23-39
  • The message driven processor: An integrated multicomputer processing element Computer Design: VLSI in Computers and Processors Dally, W. J., Chien, A., Fiske, J. A., Fyler, G., Horwat, W., Keen, J. S. 1992
  • Processor coupling: Integrating compile time and runtime scheduling for parallelism ACM SIGARCH Computer Architecture News Keckler, S. W., Dally, W. J. 1992; 2 (20): 202-213
  • MDP design tools and methods Computer Design: VLSI in Computers and Processors Lethin, R. A., Dally, W. J. 1992: ICCD'92
  • INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE 1992 Scientific information bulletin Keckler, S. W., Dally, W. J. 1992; 4 (17): 35
  • Custom integrated circuits Custom Integrated Circuits Dally, W. J., Allen, J., Wyatt Jr, J. L., White, J. K., Devadas, S., Armstrong, R. C. 1992
  • A fast translation method for paging on top of segmentation Computers, IEEE Transactions Dally, W. J. 1992; 2 (41): 247-250
  • Virtual-Channel Flow Control Dally, W. J. 1992
  • The J-machine network Computer Design: VLSI in Computers and Processors Nuth, P. R., Dally, W. J. 1992
  • Pi: a parallel architecture interface Frontiers of Massively Parallel Computation, 1992., Fourth Symposium on the… Wills, D. S., Dally, W. J. 1992
  • Virtual-channel flow control Parallel and Distributed Systems, IEEE Transactions Dally, W. J. 1992; 2 (3): 194-205
  • Experiences Implementing Dataflow on a General-Purpose Parallel Computer. ICPP Spertus, E., Dally, W. J. 1991; 2: 231-235
  • A mechanism for efficient context switching Computer Design: VLSI in Computers and Processors Nuth, P. R., Dally, W. J. 1991: ICCD'91
  • Express cubes: improving the performance of k-ary n-cube interconnection networks Computers, IEEE Transactions Dally, W. J. 1991; 9 (40): 1016-1023
  • Experiments with data flow on a general-purpose parallel computer. Memorandum report Massachusetts Inst. of Tech., Cambridge, MA (United States). Artificial Spertus, E., Dally, W. J. 1991
  • Experiments with Dataflow on a General-Purpose Parallel Computer MASSACHUSETTS INST OF TECH CAMBRIDGE ARTIFICIAL INTELLIGENCE LAB Dally, W. J., Spertus, E. 1991
  • Experiments with Dataflow on a General-Purpose Parallel Computer. MASSACHUSETTS INST OF TECH CAMBRIDGE ARTIFICIAL INTELLIGENCE LAB Spertus, E., Dally, W. J. 1991
  • A hardware logic simulation system Computer-Aided Design of Integrated Circuits and Systems, IEEE Transactions Agrawal, P., Dally, W. J. 1990
  • Proceedings of the sixth MIT conference on Advanced research in VLSI Dally, W. J. 1990
  • Experience with concurrent aggregates (CA): Implementation and programming Chien, A. A., Dally, W. J. 1990
  • Advanced Research in VLSI: Proceedings of the Sixth MIT Conference;[papers Presented at the Sixth MIT Conference on Advanced Research in VLSI, Held in Cambridge, Mass., in 1990] Dally, W. J. 1990
  • The Message-Driven Processor: A Multicomputer Processing Node with Efficient Mechanisms Dally, W. J., Davison, R., Fiske, J. A., Fyler, G., Keen, J. S., Lethin, R. A. 1990
  • Performance analysis of k-ary n-cube interconnection networks Computers, IEEE Transactions Dally, W. J. 1990; 6 (39): 775-785
  • Network and processor architecture for message-driven computers VLSI and Parallel Computation Dally, W. 1990: 140-222
  • Critical Problems in Very Large Scale Computer Systems MASSACHUSETTS INST OF TECH CAMBRIDGE Agarwal, A., Dally, W. J., Devadas, S., Knight Jr, T. F., Leighton, F. T., Nabors, K. 1990
  • Concurrent aggregates (CA) ACM Sigplan Notices Chien, A. A., Dally, W. J. 1990; 3 (25): 187-196
  • Virtual-channel flow control Dally, W. J. 1990
  • Simultaneous bidirectional signalling for IC systems Computer Design: VLSI in Computers and Processors Lam, K., Dennison, L. R., Dally, W. J. 1990: ICCD'90
  • System design of the J-Machine Noakes, M., Dally, W. J. 1990
  • Critical Problems in Very Large Scale Computer Systems KURTZ LABS YELLOW SPRINGS OH Leighton, F. T., Knight, T. F., Agarwal, A., Dally, W. J., Devadas, S. 1990
  • Experience with CST: Programming and implementation ACM SIGPLAN Notices Horwat, W., Chien, A. A., Dally, W. J. 1989; 7 (24): 101-109
  • Express cubes: Improving the performance of k-ary n-cube interconnection networks MASSACHUSETTS INST OF TECH CAMBRIDGE LAB FOR COMPUTER SCIENCE Dally, W. J. 1989
  • Algorithms for accuracy enhancement in a hardware logic simulator Agrawal, P., Tutundjian, R., Dally, W. 1989
  • Universal mechanisms for concurrency PARLE'89 Parallel Architectures and Languages Europe Dally, W. J., Wills, D. S. 1989: 19-33
  • Experience with CST: Programming and Implementation MASSACHUSETTS INST OF TECH CAMBRIDGE MICROSYSTEMS RESEARCH CENTER Chien, A. A., Dally, W. J., Horwat, W. 1989
  • A fine-grain, message-passing processing node Concurrent Computations Dally, W. J. 1989: 375-389
  • The J-machine: a fine grain concurrent computer MASSACHUSETTS INST OF TECH CAMBRIDGE MICROSYSTEMS RESEARCH CENTER Dally, W. J., Chien, A., Fiske, S., Horwat, W., Keen, J. 1989
  • Micro-optimization of floating-point operations ACM SIGARCH Computer Architecture News Dally, W. J. 1989; 2 (17): 283-289
  • The Reconfigurable Arithmetic Processor MASSACHUSETTS INST OF TECH CAMBRIDGE MICROSYSTEMS RESEARCH CENTER Dally, W. J., Fiske, S. 1988
  • The J-machine: System support for Actors MASSACHUSETTS INST OF TECH CAMBRIDGE MICROSYSTEMS RESEARCH CENTER Dally, W. J. 1988
  • Fine-grain message passing concurrent computers Dally, W. 1988
  • ON FIFTH GENERATION COMPUTER SYSTEMS 1988, edited by ICOT.© ICOT, 1988 Dally, W. J. 1988; 3 (FGCS'88): 154
  • Object-Oriented Concurrent Programming in CST MASSACHUSETTS INST OF TECH CAMBRIDGE MICROSYSTEMS RESEARCH CENTER Chien, A. A., Dally, W. J. 1988
  • Message-Driven Processor architecture, Version 11. Artificial intelligence memo Massachusetts Inst. of Tech., Cambridge (USA). Artificial Intelligence Lab. Dally, W., Chien, A., Fiske, S., Horwat, W., Keen, J. 1988
  • Message-Driven Processor Architecture MASSACHUSETTS INST OF TECH CAMBRIDGE MICROSYSTEMS RESEARCH CENTER Dally, W., Chien, A., Fiske, S., Horwat, W., Keen, J. 1988
  • Critical Problems in Very Large Scale Computer Systems MASSACHUSETTS INST OF TECH CAMBRIDGE MICROSYSTEMS RESEARCH CENTER Knight, T. F., Penfield, P., Glasser, L. A., Agarwal, A., Dally, W. J. 1988
  • Critical problems in very-large-scale computer systems. Semiannual technical report, 1 April-30 September 1988 Massachusetts Inst. of Tech., Cambridge (USA). Microsystems Research Center Penfield, P., Agarwal, A., Dally, W. J., Devadas, S., Knight, T. F. 1988
  • A network element based fault tolerant processor Massachusetts Institute of Technology Abler, T. A. 1988
  • Object-oriented concurrent programming in CST Dally, W. J., Chien, A. A. 1988
  • The reconfigurable arithmetic processor ACM SIGARCH Computer Architecture News Fiske, S., Dally, W. J. 1988; 2 (16): 30-36
  • Mechanisms for Concurrent Computing FGCS Dally, W. J. 1988: 154-156
  • The Balanced Cube A VLSI Architecture for Concurrent Data Structures Dally, W. J. 1987: 27-73
  • A Coherent VLSI Design Environment MASSACHUSETTS INST OF TECH CAMBRIDGE MICROSYSTEMS RESEARCH CENTER Leighton, F. T., Penfield, P., Glasser, L. A., Knight, T. F., Dally, W. J. 1987
  • Architecture of a Message-Driven Processor MASSACHUSETTS INST OF TECH CAMBRIDGE MICROSYSTEMS RESEARCH CENTER Chao, L., Dally, W. J., Chien, A., Hassoun, S., Horwat, W. 1987
  • Architecture and design of the MARS hardware accelerator Agrawal, P., Dally, W. J., Ezzat, A. K., Fischer, W. C., Jagadish, H. V., Krishnakumar, A. 1987
  • Performance analysis of k-ary n-cube interconnection networks NASA STI/Recon Technical Report N Dally, W. J. 1987; 88: 30010
  • MARS: A multiprocessor-based programmable accelerator Design & Test of Computers, IEEE Agrawal, P., Dally, W. J., Fischer, W. C., Jagadish, H. V., Krishnakumar, A. S., Tutundjian, R. 1987; 5 (4): 28-36
  • Graph Algorithms A VLSI Architecture for Concurrent Data Structures Dally, W. J. 1987: 75-132
  • Deadlock-free message routing in multiprocessor interconnection networks Computers, IEEE Transactions Dally, W. J., Seitz, C. L. 1987; 5 (C-36): 547-553
  • A coherent VLSI environment Massachusetts Inst. of Tech. Report Penfield Jr, P., Dally, W. J., Glasser, L. A., Knight Jr, T. F., Leighton, F. T. 1987
  • Concurrent Smalltalk A VLSI Architecture for Concurrent Data Structures Dally, W. J. 1987: 13-25
  • A message passing system for a fault tolerant parallel processor Massachusetts Institute of Technology Dally, W. J., Heyda, R. L. 1987
  • A Coherent VLSI Design Environment MASSACHUSETTS INST OF TECH CAMBRIDGE Abelson, H., Penfield, P., Antoniadis, D. A., Dally, W. J., Fonstad, C. G. 1987
  • Design of a self-timed VLSI multicomputer communication controller NASA STI/Recon Technical Report Dally, W. J., Song, P. 1987; 88: 30014
  • Concurrent computer architecture Massachusetts Inst. of Tech., Cambridge (USA). Artificial Intelligence Lab. Dally, W. J. 1987
  • Coherent VLSI environment. Semiannual technical report, 1 October 1986-31 March 1987 Massachusetts Inst. of Tech., Cambridge (USA). Microsystems Research Center Penfield, P., Dally, W. J., Glasser, L. A., Knight, T. F., Leighton, F. T. 1987
  • A coherent VLSI design environment MASSACHUSETTS INST OF TECH CAMBRIDGE MICROSYSTEMS RESEARCH CENTER Penfield Jr, P., Dally, W. J., Glasser, L. A., Knight Jr, T. F., Leighton, F. T., Wyatt Jr, J. L. 1987
  • VLSI architecture for concurrent data structures California Inst. of Tech. Dally, W. J. 1986
  • On the Performance of k-ary n-cube Interconnection Networks California Institute of Technology Dally, W. J. 1986
  • 5208:TR:86 Dally, W. J. 1986
  • The torus routing chip Dally, W. J., Seitz, C. L. 1986
  • A High-performance VLSI Quaternary Serial Multiplier Dally, W. J. 1986
  • Wire-efficient VLSI multiprocessor communication networks Massachusetts Institute of Technology, Microsystems Program Office Dally, W. J. 1986
  • Directions in concurrent computing Dally, W. J. 1986
  • The torus routing chip Distributed computing Dally, W. J., Seitz, C. L. 1986; 4 (1): 187-196
  • A Coherent VLSI Design Environment MASSACHUSETTS INST OF TECH CAMBRIDGE MICROSYSTEMS RESEARCH CENTER Leiserson, C. E., Penfield, P., Glasser, L. A., Knight, T. F., Dally, W. J. 1986
  • An object oriented architecture ACM SIGARCH Computer Architecture News Dally, W. J., Kajiya, J. T. 1985; 3 (13): 154-161
  • The balanced cube: a concurrent data structure California Institute of Technology Dally, W. J., Seitz, C. L. 1985
  • Fungicides for Crop Protection: Invited papers International Specialized Book Service Incorporated Dally, W. J., Smith, I. M. 1985
  • Concurrent Algorithms for the Max-Flow Problem California Institute of Technology Dally, W. J. 1985
  • A hardware architecture for switch-level simulation Computer-Aided Design of Integrated Circuits and Systems, IEEE Transactions Dally, W. J., Bryant, R. E. 1985
  • The MOSSIM Simulation Engine Architecture and Design California Institute of Technology Dally, W. J. 1984
  • A Special Purpose Processor for Switch-Level Simulation International Conference on Computer Aided Design Dally, W. J., Bryant, R. E. 1984