Bio


Dally investigates methods for applying VLSI technology to solve information processing problems. His current projects include network architecture, multicomputer architecture, media-processor architecture, and high-speed (4 Gb/s) CMOS signaling. His research involves demonstrating novel concepts with working systems. Previous systems include the MARS Hardware Accelerator, the Torus Routing Chip, the J-Machine, the M-Machine, and the Reliable Router. His group has pioneered techniques including fast capability-based addressing, processor coupling, virtual channel flow control, wormhole routing, link-level retry, message-driven processing, and deadlock-free routing.

Academic Appointments


Boards, Advisory Committees, Professional Organizations


  • Member, National Academy of Engineering (2013 - Present)
  • Member, American Academy of Arts and Sciences (2013 - Present)

Professional Education


  • PhD, Caltech (1986)

2017-18 Courses


All Publications


  • Conference Author/Panelist Index Dally, W. J., Aoki, N., Bai, X., Banerjee, K., Benini, L., Bergamaschi, R.
  • The Reconfigurable Arithmetic Processor Fiske, S., Dally, W. J.
  • Logic Simulation Algorithms for Pipelined Hardware Architectures Hardware Accelerators for Electrical CAD Agrawal, P., Dally, W. J., Tutundjian, R. edited by Ambler, T., Agrawal, P. 1988.
  • IEEE MICRO 1998 ANNUAL INDEX, VOL. 18 Burns Dally, W. J., Adams, J., Alt, P. M., Arai, T., Arakawa, F., Avresky, D. R. ; 66: 79
  • ISSCC 2004/SESSION 7/TD: SCALING TRENDS/7.1 Horowitz, M., Dally, W.
  • Message-Driven Processor Architecture: Version 11 Dally, W. J., Chien, A., Fiske, S., Horwat, W., Keen, J., Nuth, P.
  • CIMI FÍITIIt Dally, W. J., Balfour, J., Black-Schaffer, D., Chen, J., Harting, R. C., Parikh, V.
  • AI Memo No. 1272 April 26, 1994 Spertus, E., Dally, W. J.
  • 6 Guest Editors’ Introduction: Top Picks from the 2008 Computer Architecture Conferences Joel Emer and Dean Tullsen 10 Larrabee: A Many-Core x86 Architecture Dally, W. J., Seiler, L., Carmean, D., Sprangle, E., Forsyth, T., Abrash, M.
  • 2010 Reviewers List Dally, W. J., Acacio, M. E., Agrawal, N., Altman, E., Alur, R., Baas, B.
  • Stanford University Concurrent VLSI Architecture Memo 124 Elastic Buffer Networks-on-Chip Michelogiannakis, G., Balfour, J., Dally, W. J.
  • Spills, Fills, and Kills Erez, M., Towles, B. P., Dally, W. J.
  • 5 Guest Editors’ Introduction: Hot Chips 21 Krste Asanovic and Ralph Wittig 7 Power7: IBM’s Next-Generation Server Processor Dally, W. J., Kalla, R., Sinharoy, B., Starke, W. J., Floyd, M., Conway, P.
  • 31st Annual International Symposium on Computer Architecture ISCA 2004 Dally, W. J., Agerwala, T., Taylor, M., Lee, W., Miller, J., Wentzlaff, D.
  • SSCS Members Honored as 2002 IEEE Fellows Banu, M., Burghartz, J. N., Dally, W. J., Dean, M. E., Gielen, G. G., Griffin, E. L.
  • ISSCC 2007/SESSION 24/MULTI-GB/s TRANSCEIVERS/24.3 Palmer, R., Poulton, J., Dally, W. J., Eyles, J., Fuller, A. M., Greer, T.
  • Globally Adaptive Load-Balanced Routing on k-ary n-cubes Singh, A., Dally, W. J., Towles, B., Gupta, A. K.
  • 1987 INDEX, VOLUME 4 Dally, W. J., Agrawal, P.
  • IEEE Fellows Lead the Engineering Profession Dally, W. J., Agha, G. A., Babic, H. I., Basu, S., Beausoleil, W. F., Bertino, E.
  • ARVLSI’97 Committees Dally, W. J., Brown, R. B., Ishii, A. T., Papaefthymiou, M. C., Mudge, T. N., June, C. S.
  • Program Chair’s Message Dally, W. J.
  • Elastic Buffer Flow Control for On-Chip Networks IEEE TRANSACTIONS ON COMPUTERS Michelogiannakis, G., Dally, W. J. 2013; 62 (2): 295-309
  • Channel reservation protocol for over-subscribed channels and destinations Michelogiannakis, G., Jiang, N., Becker, D., Dally, W. J. 2013
  • A 0.54 pJ/b 20Gb/s ground-referenced single-ended short-haul serial link in 28nm CMOS for advanced packaging applications Solid-State Circuits Conference Digest of Technical Papers (ISSCC) Poulton, J. W., Dally, W. J., Chen, X., Eyles, J. G., Greer, T. H., Tell, S. G. 2013
  • A detailed and flexible cycle-accurate network-on-chip simulator Performance Analysis of Systems and Software (ISPASS) Jiang, N., Becker, D. U., Michelogiannakis, G., Balfour, J., Towles, B., Shaw, D. E. 2013
  • A 0.54 pJ/b 20 Gb/s Ground-Referenced Single-Ended Short-Reach Serial Link in 28 nm CMOS for Advanced Packaging Applications IEEE Poulton, J. W., Dally, W. J., Chen, X., Eyles, J. G., Greer, T. H., Tell, S. G. 2013
  • Optimizing data structures in high-level programs: new directions for extensible compilers based on staging Rompf, T., Sujeeth, A. K., Amin, N., Brown, K. J., Jovanovic, V., Lee, H. 2013
  • 21st century digital design tools Dally, W. J., Malachowsky, C., Keckler, S. W. 2013
  • Composition and reuse with compiled domain-specific languages Dally, W. J., Sujeeth, A. K., Rompf, T., Brown, K. J., Lee, H., Chafi, H. 2013
  • A Hierarchical Thread Scheduler and Register File for Energy-Efficient Throughput Processors ACM TRANSACTIONS ON COMPUTER SYSTEMS Gebhart, M., Johnson, D. R., Tarjan, D., Keckler, S. W., Dally, W. J., Lindholm, E., Skadron, K. 2012; 30 (2)
  • Green-Marl: A DSL for Easy and Efficient Graph Analysis ACM SIGPLAN NOTICES Hong, S., Chafi, H., Sedlar, E., Olukotun, K. 2012; 47 (4): 349-362
  • Network Congestion Avoidance Through Speculative Reservation 18th IEEE International Symposium on High-Performance Computer Architecture (HPCA) Jiang, N., Becker, D. U., Michelogiannakis, G., Dally, W. J. IEEE. 2012: 443–454
  • Digital Design: A Systems Approach Dally, W. J., Harting, R. C. Cambridge University Press. 2012
  • A case of system-level hardware/software co-design and co-verification of a commodity multi-processor system with custom hardware Dally, W. J., Hong, S., Oguntebi, T., Casper, J., Bronson, N., Kozyrakis, C. 2012
  • Unifying primary cache, scratch, and register file memories in a throughput processor Gebhart, M., Keckler, S. W., Khailany, B., Krashinsky, R., Dally, W. J. 2012
  • It's about the Power: An Architect's View of Interconnect IEEE International Interconnect Technology Conference (IITC) Dally, B. IEEE. 2012
  • Article 8-A Hierarchical Thread Scheduler and Register File for Energy-Efficient Throughput Processors ACM Transactions on Computer Systems-TOCS Gebhart, M., Johnson, D. R., Tarjan, D., Keckler, S. W., Dally, W. J., Lindholm, E. 2012; 2 (30): 38
  • Adaptive Backpressure: Efficient Buffer Management for On-Chip Networks 30th IEEE International Conference on Computer Design (ICCD) Becker, D. U., Jiang, N., Michelogiannakis, G., Dally, W. J. IEEE. 2012: 419–426
  • Packet Chaining: Efficient Single-Cycle Allocation for On-Chip Networks IEEE COMPUTER ARCHITECTURE LETTERS Michelogiannakis, G., Jiang, N., Becker, D. U., Dally, W. J. 2011; 10 (2): 33-36
  • Evaluating Elastic Buffer and Wormhole Flow Control IEEE TRANSACTIONS ON COMPUTERS Michelogiannakis, G., Becker, D. U., Dally, W. J. 2011; 60 (6): 896-903
  • Circuit challenges for future computing systems Dally, W. J. 2011
  • Guaranteeing forward progress of unified register allocation and instruction scheduling Technical Report Concurrent VLSI Architecture Group Memo 127, Stanford Park, J., Dally, W. J. 2011
  • GPUs and the future of parallel computing Micro, IEEE Keckler, S. W., Dally, W. J., Khailany, B., Garland, M., Glasco, D. 2011; 5 (31): 7-17
  • Energy-efficient mechanisms for managing thread context in throughput processors ACM SIGARCH Computer Architecture News Gebhart, M., Johnson, D. R., Tarjan, D., Keckler, S. W., Dally, W. J., Lindholm, E. 2011; 3 (39): 235-246
  • 2011 Index IEEE Computer Architecture Letters Vol. 10 Computer Architecture Letters Becker, D., Choi, I., Cooper-Balis, E., Dally, W. J., Devadas, S., Duato, J. 2011; 53: 56
  • Liszt: a domain specific language for building portable mesh-based PDE solvers DeVito, Z., Joubert, N., Palacios, F., Oakley, S., Medina, M., Barrientos, M. 2011
  • A compile-time managed multi-level register file hierarchy Gebhart, M., Keckler, S. W., Dally, W. J. 2011
  • 4 Guest Editor’s Introduction: CPUs, GPUs, and Hybrid Computing David Brooks 7 GPUs and the Future of Parallel Computing Keckler, S. W., Dally, W. J., Khailany, B., Garland, M., Glasco, D., Rohr, D. 2011
  • Efficient Topologies for Large-scale Cluster Networks Conference on Optical Fiber Communication (OFC)/Collocated National Fiber Optic Engineers (NFOEC) Kim, J., Dally, W. J., Abts, D. IEEE. 2010
  • Throughput computing Dally, W. J. 2010
  • Evaluating bufferless flow control for on-chip networks Michelogiannakis, G., Sanchez, D., Dally, W. J., Kozyrakis, C. 2010
  • The even/odd synchronizer: A fast, all-digital, periodic synchronizer Asynchronous Circuits and Systems (ASYNC), 2010 IEEE Symposium on Dally, W. J., Tell, S. G. 2010: 75-84
  • Moving the needle, computer architecture research in academe and industry ACM SIGARCH Computer Architecture News Dally, W. J. 2010; 3 (38): 1-1
  • The end of denial architecture and the rise of throughput computing Dally, W. J. 2010
  • Apparatus and method for packet scheduling US Patent Dally, W. J., Carvey, P. P., Beliveau, P. A., Mann, W. F., Dennison, L. R. 2010; 760 (7): 747
  • 2010 IEEE Symposium on Asynchronous Circuits and Systems Dally, W. J., Tell, S. G. 2010
  • The GPU Computing Era (HTML) Nickolls, J., Dally, W. J. 2010
  • The GPU computing era Micro, IEEE Nickolls, J., Dally, W. J. 2010; 2 (30): 56-69
  • The end of denial architecture and the rise of throughput computing Keynote speech at Design Automation Conference Dally, W. J. 2010
  • Booksim 2.0 User’s Guide Stanford University Jiang, N., Michelogiannakis, G., Becker, D., Towles, B., Dally, W. J. 2010
  • Fine-grain dynamic instruction placement for L0 scratch-pad memory Park, J., Balfour, J., Dally, W. J. 2010
  • Block-Parallel Programming for Real-time Embedded Applications WJ 2010

  • Buffer-space Efficient and Deadlock-free Scheduling of Stream Applications on Multi-core Architectures 22nd ACM Symposium on Parallelism in Algorithms and Architectures Park, J., Dally, W. J. ASSOC COMPUTING MACHINERY. 2010: 1–10
  • Operand Registers and Explicit Operand Forwarding IEEE COMPUTER ARCHITECTURE LETTERS Balfour, J., Harting, R. C., Dally, W. J. 2009; 8 (2): 60-63
  • COST-EFFICIENT DRAGONFLY TOPOLOGY FOR LARGE-SCALE SYSTEMS IEEE MICRO Kim, J., Dally, W., Scott, S., Abts, D. 2009; 29 (1): 33-40
  • Indirect adaptive routing on large scale interconnection networks ACM SIGARCH Computer Architecture News Jiang, N., Kim, J., Dally, W. J. 2009; 3 (37): 220-231
  • Router designs for elastic buffer on-chip networks Michelogiannakis, G., Dally, W. J. 2009
  • Power efficient supercomputing Accelerator-based Computing and Manycore Workshop (presentation) Dally, W. J. 2009; 1
  • Maximizing the Filter Rate of L0 Compiler-Managed Instruction Stores by Pinning Technical Report 126, Concurrent VLSI Architecture Group, Stanford University Park, J., Balfour, J., Dally, W. J. 2009
  • Stream Processors Multicore Processors and Systems Erez, M., Dally, W. J. 2009: 231-270
  • Exascale software study: Software challenges in extreme scale systems DARPA IPTO, Air Force Research Labs Amarasinghe, S., Campbell, D., Carlson, W., Chien, A., Dally, W., Elnozahy, E. 2009
  • Embracing heterogeneity–parallel programming for changing hardware Linderman, M. D., Balfour, J., Meng, T. H., Dally, W. J. 2009
  • Elastic-buffer flow control for on-chip networks High Performance Computer Architecture Michelogiannakis, G., Balfour, J., Dally, W. J. 2009
  • Allocator implementations for network-on-chip routers Becker, D. U., Dally, W. J. 2009
  • Load-balanced routing US Patent Singh, A., Dally, W. J. 2009; 633 (7): 940
  • Elastic-Buffer Flow Control for On-Chip Networks 15th International Symposium on High-Performance Computer Architecture Michelogiannakis, G., Balfour, J., Dally, W. J. IEEE COMPUTER SOC. 2009: 151–162
  • Opportunities Beyond Single-Core Microprocessors 15th International Symposium on High-Performance Computer Architecture Hill, M. D., Adve, S. V., Bader, D. A., Dally, W., Harrod, W., Sarkar, V. IEEE COMPUTER SOC. 2009: 143–143
  • Cost-Efficient Dragonfly Topology for Large-Scale Systems Conference on Optical Fiber Communication (OFC 2009) Kim, J., Dally, W. J., Scott, S., Abts, D. IEEE. 2009: 2174–2176
  • Efficient embedded computing COMPUTER Dally, W. J., Balfour, J., Black-Schaffer, D., Chen, J., Harting, R. C., Parikh, V., Park, J., Sheffield, D. 2008; 41 (7): 27-?
  • Stream scheduling: A framework to manage bulk operations in memory hierarchies 14th International Euro-Par Conference on Parallel Computing Das, A., Dally, W. J. SPRINGER-VERLAG BERLIN. 2008: 337–349
  • A tuning framework for software-managed memory hierarchies Ren, M., Park, J. Y., Houston, M., Aiken, A., Dally, W. J. 2008
  • An energy-efficient processor architecture for embedded systems Computer Architecture Letters Balfour, J., Dally, W. J., Black-Schaffer, D., Parikh, V., Park, J. S. 2008; 1 (7): 29-32
  • Exascale computing study: Technology challenges in achieving exascale systems Kogge, P., Bergman, K., Borkar, S., Campbell, D., Carlson, W., Dally, W. 2008
  • Structured Application-Specific Integrated Circuit (ASIC) Study STANFORD UNIV CA COMPUTER SYSTEMS LAB Dally, W., Balfour, J., Black-Schaffer, D., Hartke, P. 2008
  • Hierarchical instruction register organization Computer Architecture Letters Black-Schaffer, D., Balfour, J., Dally, W., Parikh, V., Park, J. S. 2008; 2 (7): 41-44
  • Technology-driven, highly-scalable dragonfly topology 35th Annual International Symposium on Computer Architecture Kim, J., Dally, W. J., Scott, S., Abts, D. IEEE COMPUTER SOC. 2008: 77–88
  • A programmable 512 GOPS stream processor for signal, image, and video processing Solid-State Circuits, IEEE Journal Khailany, B. K., Williams, T., Lin, J., Long, E. P., Rygh, M., Tovey, D. F., Dally, B. 2008; 1 (43): 202-213
  • Exascale computing study: Technology challenges in achieving exascale systems Bergman, K., Borkar, S., Campbell, D., Carlson, W., Dally, W., Denneau, M. 2008
  • A Portable Runtime Interface For Multi-Level Memory Hierarchies ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP 08) Houston, M., Park, J., Ren, M., Knight, T., Fatahalian, K., Aiken, A., Dally, W. J., Hanrahan, P. ASSOC COMPUTING MACHINERY. 2008: 143–152
  • A 14-mW 6.25-Gb/s transceiver in 90-nm CMOS IEEE International Solid-State Circuits Conference (ISSCC) Poulton, J., Palmer, R., Fuller, A. M., Greer, T., Eyles, J., Dally, W. J., Horowitz, M. IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC. 2007: 2745–57
  • Research challenges for on-chip interconnection networks IEEE MICRO Owens, J. D., Dally, W. J., Ho, R., Jayasimha, D. N., Keckler, S. W., Peh, L. 2007; 27 (5): 96-108
  • Register pointer architecture for efficient embedded processors Design, Automation and Test in Europe Conference and Exhibition (DATE 07) Park, J., Park, S., Balfour, J. D., Black-Schaffer, D., Kozyrakis, C., Dally, W. J. IEEE. 2007: 600–605
  • Research Challenges for On-Chip Interconnection Networks (HTML) Owens, J. D., Dally, W. J., Ho, R., Jayasimha, D. N., Keckler, S. W., Peh, L. S. 2007
  • A 14mW 6.25 Gb/s transceiver in 90nm CMOS for serial chip-to-chip communications Palmer, R., Poulton, J., Dally, W. J., Eyles, J., Fuller, A. M., Greer, T. 2007
  • Stream Scheduling: A Framework to Manage Bulk Operations in a Memory Hierarchy Parallel Architecture and Compilation Techniques Das, A., Dally, W. J. 2007
  • Interconnect-Centric Computing. HPCA Keynote Dally, W. J. 2007; 1
  • Tradeoff between data-, instruction-, and thread-level parallelism in stream processors Ahn, J., Erez, M., Dally, W. J. 2007
  • Flattened butterfly: a cost-efficient topology for high-radix networks ACM SIGARCH Computer Architecture News Kim, J., Dally, W. J., Abts, D. 2007; 2 (35): 126-137
  • Executing irregular scientific applications on stream architectures Erez, M., Ahn, J. H., Gummaraju, J., Rosenblum, M., Dally, W. J. 2007
  • Architectural support for the stream execution model on general-purpose processors Gummaraju, J., Erez, M., Coburn, J., Rosenblum, M., Dally, W. J. 2007
  • Computer architecture in the many-core era 24th International Conference on Computer Design Dally, B. IEEE. 2007: 1–1
  • Compilation for Explicitly Managed Memory Hierarchies ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming Knight, T. J., Park, J. Y., Ren, M., Houston, M., Erez, M., Fatahalian, K., Aiken, A., Dally, W. J., Hanrahan, P. ASSOC COMPUTING MACHINERY. 2007: 226–236
  • Flattened Butterfly : A Cost-Efficient Topology for High-Radix Networks 34th Annual International Symposium on Computer Architecture Kim, J., Dally, W. J., Abts, D. ASSOC COMPUTING MACHINERY. 2007: 126–137
  • Flattened butterfly topology for on-chip networks 40th Annual IEEE/ACM International Symposium on Microarchitecture Kim, J., Balfour, J., Dally, W. J. IEEE COMPUTER SOC. 2007: 172–182
  • The BlackWidow high-radix Clos network 33rd International Symposium on Computer Architecture Scott, S., Abts, D., Kim, J., Dally, W. J. IEEE COMPUTER SOC. 2006: 16–27
  • Sequoia: programming the memory hierarchy Fatahalian, K., Horn, D., Knight, T., Leem, L., Houston, M., Park, J., Dally, B. 2006
  • Multi-Core for HPC: Breakthrough or Breakdown? Sterling, T., Kogge, P., Dally, W., Scott, S., Gropp, W., Keyes, D. 2006
  • Topology optimization of interconnection networks Computer Architecture Letters Gupta, A. K., Dally, W. J. 2006; 1 (5): 10-13
  • Prefix search method US Patent Waters, G. M., Dennison, L. R., Carvey, P. P., Dally, W. J., Mann, W. F. 2006; 130 (7): 847
  • DRAFT Final Report: Workshop on On- and Off-Chip Networks for Multi-Core Systems Retrieved from: http://www.ece.ucdavis.edu/~ocin06 Dally, W. 2006
  • Compiling for stream processing Das, A., Dally, W. J., Mattson, P. 2006
  • Data parallel address architecture Computer Architecture Letters Ahn, J. H., Dally, W. J. 2006; 1 (5): 30-33
  • Pulsenet-A Parallel Flash Sampler and Digital Processor IC for Optical SETI Custom Integrated Circuits Conference, 2006. CICC'06. IEEE Howard, A. W., Wei, G. Y., Dally, W. J., Horowitz, P. 2006: 261-264
  • Design tradeoffs for tiled CMP on-chip networks Balfour, J., Dally, W. J. 2006
  • The design space of data-parallel memory systems Ahn, J. H., Erez, M., Dally, W. J. 2006
  • Adaptive routing in high-radix Clos network Kim, J., Dally, W. J., Abts, D. 2006
  • Future directions for on-chip interconnection networks OCIN Workshop Dally, W. J. 2006
  • A 20-Gb/s 0.13-μm CMOS serial link transmitter using an LC-PLL to directly drive the output multiplexer Symposium on VLSI Circuits Chiang, P., Dally, W. J., Lee, M. J., Senthinathan, R., Oh, Y., Horowitz, M. A. IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC. 2005: 1004–11
  • Explaining the gap between ASIC and custom power: A custom perspective 42nd Design Automation Conference Chang, A., Dally, W. J. IEEE COMPUTER SOC. 2005: 281–284
  • 11th International Symposium on High-Performance Computer Architecture (HPCA'05) Ahn, J. H., Erez, M., Dally, W. J. 2005
  • Fault tolerance techniques for the Merrimac streaming supercomputer Erez, M., Jayasena, N., Knight, T. J., Dally, W. J. 2005
  • Scatter-add in data parallel architectures 11th International Symposium on High-Performance Computer Architecture Ahn, J. H., Erez, M., Dally, W. J. IEEE COMPUTER SOC. 2005: 132–142
  • Microarchitecture of a high-radix router 32nd International Symposium on Computer Architecture Kim, J., Dally, W. J., Towles, B., Gupta, A. K. IEEE COMPUTER SOC. 2005: 420–431
  • A 33-mW 8-Gb/s CMOS clock multiplier and CDR for highly integrated I/Os IEEE Custom Integrated Circuits Conference Farjad-Rad, R., Nguyen, A., Tran, J. M., Greer, T., Poulton, J., Dally, W. J., Edmondson, J. H., Senthinathan, R., Rathi, R., Lee, M. J., Ng, H. T. IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC. 2004: 1553–61
  • Stream register files with indexed access 10th International Symposium on High-Performance Computer Architecture Jayasena, N., Erez, M., Ahn, J. H., Dally, W. J. IEEE COMPUTER SOC. 2004: 60–72
  • Streams and vectors: A memory system perspective 6th Workshop on Media and Streaming Processors Jayasena, N., Dally, W. J. 2004
  • High-Speed Logic, Circuits, Libraries and Layout Closing the Gap Between ASIC & Custom Chang, A., Dally, W. J., Chinnery, D., Keutzer, K., Zlatanovici, R. 2004: 101-144
  • Adaptive channel queue routing on k-ary n-cubes Singh, A., Dally, W. J., Gupta, A., Towles, B. 2004
  • Stream processors: Programmability and efficiency Queue Dally, W. J., Kapasi, U. J., Khailany, B., Ahn, J. H., Das, A. 2004; 1 (2): 52
  • Principles and practices of interconnection networks Access Online via Elsevier Dally, W. J., Towles, B. P. 2004
  • How scaling will change processor architecture Solid-State Circuits Conference, 2004. Digest of Technical Papers. Horowitz, M., Dally, W. 2004
  • Exploiting Structure and Managing Wires to Increase Density and Performance Closing the Gap Between ASIC & Custom Chang, A., Dally, W. J. 2004: 269-287
  • Analysis and performance results of a molecular modeling application on Merrimac Erez, M., Ahn, J. H., Garg, A., Dally, W. J., Darve, E. 2004
  • A 20Gb/s 0.13um CMOS serial link transmitter using an LC-PLL to directly drive the output multiplexer Symposium on VLSI Circuits Chiang, P., Dally, W. J., Lee, M. J., Senthinathan, R., Oh, Y., Horowitz, M. IEEE. 2004: 272–275
  • The case for broader computer architecture education: keynote address Dally, W. J. 2004
  • Buffer and delay bounds in high radix interconnection networks Computer Architecture Letters Singh, A., Dally, W. J. 2004; 1 (3): 8-8
  • Space-efficient source routing Carvey, P., Dally, W., Dennison, L., King, P., Mann, W. 2004
  • Globally adaptive load-balanced routing on tori Computer Architecture Letters Singh, A., Dally, W. J., Towles, B., Gupta, A. K. 2004; 1 (3): 2-2
  • Evaluating the Imagine stream architecture 31st Annual International Symposium on Computer Architecture Ahn, J. H., Dally, W. J., Khailany, B., Kapasi, U. J., Das, A. IEEE COMPUTER SOC. 2004: 14–25
  • A second-order semidigital clock recovery circuit based on injection locking IEEE International Solid-State Circuits Conference Ng, H. T., Farjad-Rad, R., Lee, M. J., Dally, W. J., Greer, T., Poulton, J., Edmondson, J. H., Rathi, R., Senthinathan, R. IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC. 2003: 2101–10
  • Guaranteed scheduling for switches with configuration overhead IEEE-ACM TRANSACTIONS ON NETWORKING Towles, B., Dally, W. J. 2003; 11 (5): 835-847
  • Programmable stream processors COMPUTER Kapasi, U. J., Rixner, S., Dally, W. J., Khailany, B., Ahn, J. H., Mattson, P., Owens, J. D. 2003; 36 (8): 54-?
  • Jitter transfer characteristics of delay-locked loops - Theories and design techniques IEEE JOURNAL OF SOLID-STATE CIRCUITS Lee, M. J., Dally, W. J., Greer, T., Ng, H. T., Farjad-Rad, R., Poulton, J., Senthinathan, R. 2003; 38 (4): 614-621
  • Exploring the VLSI scalability of stream processors 9th International Symposium on High-Performance Computer Architecture Khailany, B., Dally, W. J., Rixner, S., Kapasi, U. J., Owens, J. D., Towles, B. IEEE COMPUTER SOC. 2003: 153–164
  • Prefix search method Carvey, P., Dally, W., Dennison, L., Mann, W., Waters, G. 2003
  • Merrimac: Supercomputing with streams Dally, W. J., Labonte, F., Das, A., Hanrahan, P., Ahn, J. H., Gummaraju, J. 2003
  • A second-order semi-digital clock recovery circuit based on injection locking Solid-State Circuits Conference, 2003. Digest of Technical Papers. ISSCC Lee, M. J., Dally, W. J., Poulton, J., Greer, T., Edmondson, J., Farjad-Rad, R. 2003
  • A 33mW 8Gb/s CMOS clock multiplier and CDR for highly integrated I/Os Ng, H. T., Lee, M. J., Farjad-Rad, R., Senthinathan, R., Dally, W. J., Nguyen, A. 2003
  • 0.622-8.0 Gbps 150 mW serial IO macrocell with fully flexible preemphasis and equalization VLSI Circuits, 2003. Digest of Technical Papers. 2003 Symposium on Farjad-Rad, R., Ng, H. T., Lee, M. J., Senthinathan, R., Dally, W. J., Nguyen, A. 2003: 63-66
  • CMOS high-speed I/Os-present and future Lee, M. J., Dally, W. J., Farjad-Rad, R., Ng, H. T., Senthinathan, R., Edmondson, J. 2003
  • The Ninth International Symposium on High-Performance Computer Architecture (HPCA'03) Khailany, B., Dally, W. J., Rixner, S., Kapasi, U. J., Owens, J. D., Towles, B. 2003
  • Methods and apparatus for event-driven routing Carvey, P., Dally, W., Dennison, L., King, P. 2003
  • Throughput-centric routing algorithm design Towles, B., Dally, W. J., Boyd, S. 2003
  • GOAL: A load-balanced adaptive routing algorithm for torus networks 30th Annual International Symposium on Computer Architecture Singh, A., Dally, W. J., Gupta, A. K., Towles, B. IEEE COMPUTER SOC. 2003: 194–205
  • A low-power multiplying DLL for low-jitter multigigahertz clock generation in highly integrated digital chips IEEE International Solid-State Circuits Conference (ISSCC 2001) Farjad-Rad, R., Dally, W., Ng, H. T., Senthinathan, R., Lee, M. J., Rathi, R., Poulton, J. IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC. 2002: 1804–12
  • A stream processor development platform 20th IEEE International Conference on Computer Design Serebrin, B., Owens, J. D., Chen, C. H., Crago, S. P., Kapasi, U. J., Khailany, B., Mattson, P., Namkoong, J., Rixner, S., Dally, W. J. IEEE COMPUTER SOC. 2002: 303–308
  • Locality-preserving randomized oblivious routing on torus networks Singh, A., Dally, W. J., Towles, B., Gupta, A. K. 2002
  • Comparing Reyes and OpenGL on a stream architecture Owens, J. D., Khailany, B., Towles, B., Dally, W. J. 2002
  • Internet switch router Carvey, P., Dally, W., Dennison, L., King, P. 2002
  • Prefix search circuitry and method Carvey, P., Dally, W., Dennison, L., Mann, W., Waters, G. 2002
  • Stream Processing for High-Performance Embedded Systems Defense Technical Information Center Dally, W. J. 2002
  • A 0.2-2 GHz 12 mW multiplying DLL for low-jitter clock synthesis in highly-integrated data communication chips Farjad-Rad, R., Dally, W., Ng, H. T., Poulton, J., Stone, T., Rathi, R. 2002
  • Media processing applications on the imagine stream processor 20th IEEE International Conference on Computer Design Owens, J. D., Rixner, S., Kapasi, U. J., Mattson, P., Towles, B., Serebrin, B., Dally, W. J. IEEE COMPUTER SOC. 2002: 295–302
  • Scalable opto-electronic network (SOENet) 10th Symposium on High Performance Interconnects Gupta, A. K., Dally, W. J., Singh, A., Towles, B. IEEE COMPUTER SOC. 2002: 71–76
  • The imagine stream processor 20th IEEE International Conference on Computer Design Kapasi, U. J., Dally, W. J., Rixner, S., Owens, J. D., Khailany, B. IEEE COMPUTER SOC. 2002: 282–288
  • Computer architecture is all about interconnect High-Perf. Comp. Architecture Dally, W. J. 2002
  • Worst-case traffic for oblivious routing functions Towles, B., Dally, W. J. 2002
  • Method and system for guaranteeing quality of service in large capacity input output buffered cell switch based on minimum bandwidth guarantees and weighted fair share of unused bandwidth Dally, W., Meempat, G., Ramamurthy, G. 2002
  • Worst-case Traffic for Oblivious Routing Functions (PDF) Towles, B., Dally, W. J. 2002
  • Migration in single chip multiprocessors Computer Architecture Letters Shaw, K. A., Dally, W. J. 2002; 1 (1): 12-12
  • Guaranteed scheduling for switches with configuration overhead 21st Annual Joint Conference of the IEEE-Computer-and-Communications-Societies Towles, B., Dally, W. J. IEEE. 2002: 342–351
  • VLSI design and verification of the imagine processor 20th IEEE International Conference on Computer Design Khailany, B., Dally, W. J., Chang, A., Kapasi, U. J., Namkoong, J., Towles, B. IEEE COMPUTER SOC. 2002: 289–294
  • Hot chips 12 IEEE MICRO Dally, W. J., Tremblay, M., Baum, A. J. 2001; 21 (2): 13-15
  • Imagine: Media processing with streams IEEE MICRO Khailany, B., Dally, W. J., Kapasi, U. J., Mattson, P., Namkoong, J., Owens, J. D., Towles, B., Chang, A., Rixner, S. 2001; 21 (2): 35-46
  • A delay model for router microarchitectures IEEE MICRO Peh, L. S., Dally, W. J. 2001; 21 (1): 26-34
  • Elastic interconnects: Repeater-inserted long wiring capable of compressing and decompressing data Mizuno, M., Dally, W., Onishi, H. 2001
  • Guest Editors' Introduction: Hot Chips 12 IEEE MICRO Baum, A. J., Dally, W. J., Tremblay, M. 2001; 2 (21): 0013-15
  • Scalable switching fabrics for Internet routers White paper, Avici Systems Inc Dally, W. J. 2001
  • Monolithic chaotic communications system Circuits and Systems, 2001. ISCAS 2001. The 2001 IEEE International Chiang, P., Dally, W., Lee, E. 2001
  • A streaming supercomputer Whitepaper Dally, W. J., Hanrahan, P., Fedkiw, R. 2001
  • A single-chip terabit switch Hot Chips Dally, W. J., Dettloff, W., Eyles, J., Greer, T., Poulton, J., Stone, T. 2001; 13
  • A Delay Model for Router Microarchitectures (HTML) Peh, L. S., Dally, W. J. 2001
  • Guest Editors' Introduction: Hot Chips 12 (HTML) Dally, W. J., Tremblay, M., Baum, A. J. 2001
  • An 84-mW 4-Gb/s clock and data recovery circuit for serial link applications Symposium on VLSI Circuits Lee, M. J., Dally, W. J., Poulton, J. W., Chiang, P., Greenwood, S. F. JAPAN SOCIETY APPLIED ELECTROMAGNETICS & MECHANICS. 2001: 149–152
  • A delay model and speculative architecture for pipelined routers 7th International Symposium on High-Performance Computer Architecture Peh, L. S., Dally, W. J. IEEE COMPUTER SOC. 2001: 255–266
  • Low-power area-efficient high-speed I/O circuit techniques International Solid-State Circuits Conference Lee, M. J., Dally, W. J., Chiang, P. IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC. 2000: 1591–99
  • Communication scheduling 9th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-IX) Mattson, P., Dally, W. J., Rixner, S., Kapasi, U. J., Owens, J. D. ASSOC COMPUTING MACHINERY. 2000: 82–92
  • The role of custom design in ASIC chips 37th Annual Design Automation Conference (DAC) Dally, W. J., Chang, A. ASSOC COMPUTING MACHINERY. 2000: 643–647
  • 10 Subspace Optimizations Knobe, K., Dally, W. J. edited by Kessler, Christoph, W. 2000
  • Flit-reservation flow control Peh, L. S., Dally, W. J. 2000
  • Stream Scheduling STANFORD UNIV CA COMPUTER SYSTEMS LAB Dally, W. J., Mattson, P., Kapasi, U. J., Owens, J. D., Towles, B. 2000
  • Stream scheduling STANFORD UNIV CA COMPUTER SYSTEMS LAB Kapasi, U. J., Mattson, P., Dally, W. J., Owens, J. D., Towles, B. 2000
  • Sixth International Symposium on High-Performance Computer Architecture Peh, L. S., Dally, W. J. 2000
  • Polygon rendering on a stream architecture Owens, J. D., Dally, W. J., Kapasi, U. J., Rixner, S., Mattson, P., Mowery, B. 2000
  • A 90 mW 4 Gb/s equalized I/O circuit with input offset cancellation Lee, M. J., Dally, W., Chiang, P. 2000
  • Sixth International Symposium on High-Performance Computer Architecture Rixner, S., Dally, W. J., Khailany, B., Mattson, P., Kapasi, U. J., Owens, J. D. 2000
  • Efficient conditional operations for data-parallel architectures 33rd Annual International Symposium on Microarchitecture (MICRO-33) Kapasi, U. J., Dally, W. J., Rixner, S., Mattson, P. R., Owens, J. D., Khailany, B. IEEE COMPUTER SOC. 2000: 159–170
  • Memory access scheduling ISCA Owens, J. D., Mattson, P., Kapasi, U. J., Dally, W. J., Rixner, S. 2000; 128
  • Register organization for media processing Rixner, S., Dally, W. J., Khailany, B., Mattson, P., Kapasi, U. J., Owens, J. 2000
  • Smart memories: A modular reconfigurable architecture ACM SIGARCH Computer Architecture News Mai, K., Paaske, T., Jayasena, N., Ho, R., Dally, W. J., Horowitz, M. 2000; 2 (28): 161-171
  • Processor mechanisms for software shared memory 3rd International Symposium on High Performance Computing Carter, N. P., Dally, W. J., Lee, W. S., Keckler, S. W., Chang, A. SPRINGER-VERLAG BERLIN. 2000: 120–133
  • Concurrent event handling through multithreading IEEE TRANSACTIONS ON COMPUTERS Keckler, S. W., Chang, A., Lee, W. S., Chatterjee, S., Dally, W. J. 1999; 48 (9): 903-916
  • VLSI architecture: Past, present, and future 20th Anniversary Conference on Advanced Research in VLSI Dally, W. J., Lacy, S. IEEE COMPUTER SOC. 1999: 232–241
  • GAD: A 12-GS/s CMOS 4-bit A/D converter for an equalized multi-level link Ellersick, W., Yang, C. K., Horowitz, M., Dally, W. J. 1999
  • Interconnect-limited VLSI architecture Interconnect Technology, 1999. IEEE International Conference Dally, W. J. 1999: 15-17
  • Computer Architecture for the Next Millenium Dally, W. J. 1999
  • 20th Anniversary Conference on Advanced Research in VLSI Dally, W. J., Lacy, S. 1999
  • Guest editors' introduction: The bleeding edge IEEE MICRO Rettberg, R., Dally, W. J., Culler, D. E. 1998; 18 (1): 10-11
  • Tomorrow’s Computing Engines keynote speech, Fourth Int’l Symp. High-Performance Computer Architecture Dally, W. 1998
  • VLSI datapath choices: Cell-based versus full-custom Massachusetts Institute of Technology Chang, A. L. 1998
  • The j-machine: A retrospective Retrospective in Dally, W. J., Chang, A., Chien, A., Fiske, S., Horwat, W., Keen, J. 1998: 54-58
  • Architecture of a message-driven processor 25 years of the international symposia on Computer architecture (selected Dally, W. J., Chao, L., Chien, A., Hassoun, S., Horwat, W., Kaplan, J. 1998
  • An efficient, protected message interface Computer Lee, W. S., Dally, W. J., Keckler, S. W., Carter, N. P., Chang, A. 1998; 11 (31): 69-75
  • Digital systems engineering Cambridge University Press Dally, W. J., Poulton, J. W. 1998
  • Architecture of the Avici terabit switch/router Dally, W., Carvey, P., Dennison, L. 1998
  • Efficient, protected message interface in the MIT M-Machine IEEE Computer Special Issue on Design Challenges for High-Performance Lee, W. S., Dally, W. J., Keckler, S. W., Carter, N. P., Chang, A. 1998
  • An instruction scheduling algorithm for communication-constrained microprocessors Massachusetts Institute of Technology Dally, W. J., Buehler, C. J. 1998
  • Point sample rendering Rendering Techniques Grossman, J. P., Dally, W. J. 1998; 98: 181-192
  • Media Processors 1999 (Proceedings Volume) Dally, W. J., Fritts, J. E., Wolf, W. H., Liu, B., Bove Jr, V. M., Lee, M. 1998
  • Media processing using streams Electronic Imaging Rixner, S., Dally, W. J., Kapasi, U. J., Khailany, B., Lopez-Lagunas, A., Mattson, P. R. 1998: 122-134
  • The J-Machine ANNUAL INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE Dally, W. J., Chang, A., Chien, A., Fiske, S., Horwat, W., Keen, J. 1998; 25: 54-58
  • A bandwidth-efficient architecture for media processing 31st Annual ACM/IEEE International Symposium on Microarchitecture (MICRO31) Rixner, S., Dally, W. J., Kapasi, U. J., Khailany, B., Lopez-Lagunas, A., Mattson, P. R., Owens, J. D. I E E E, COMPUTER SOC PRESS. 1998: 3–13
  • Digital Systems Engineering Dally, W. J., Poulton, J. W. Cambridge University Press. 1998
  • The Fifth International Conference on Massively Parallel Processing Using Optical Interconnections Dally, W. J., Lee, M. J., An, F. T., Poulton, J., Tell, S. 1998
  • Retrospective: the J-machine Dally, W. J., Chien, A., Fiske, S., Horwat, W., Lethin, R., Noakes, M. 1998
  • Invited Talks Coldren, L. A., Dally, W. J. 1998
  • Point sample rendering Massachusetts Institute of Technology Dally, W. J., Grossman, J. P. 1998
  • A tracking clock recovery receiver for 4-Gbps signaling IEEE MICRO Poulton, J., Dally, W. J., Tell, S. 1998; 18 (1): 25-27
  • Communication-oriented computer architecture: Data choreography abstract International Workshop on Innovative Architecture for Future Generation High-Performance Processors and Systems Dally, W. J. I E E E, COMPUTER SOC PRESS. 1998: 93–93
  • The effects of explicitly parallel mechanisms on the Multi-ALU Processor cluster pipeline International Conference on Computer Design: VLSI in Computers and Processors Chang, A., Dally, W. J., Keckler, S. W., Carter, N. P., Lee, W. S. I E E E, COMPUTER SOC PRESS. 1998: 474–481
  • High-performance electrical signaling 5th International Conference on Massively Parallel Processing Dally, W. J., Lee, M. J., An, F. T., Poulton, J., Tell, S. I E E E, COMPUTER SOC PRESS. 1998: 11–16
  • Media processors using streams Conference on Media Processors 1999 Rixner, S., Dally, W. J., Kapasi, U. J., Khailany, B., Lopez-Lagunas, A., Mattson, P. R., Owens, J. D. SPIE-INT SOC OPTICAL ENGINEERING. 1998: 122–134
  • Message-driven dynamics Massachusetts Institute of Technology Dally, W. J., Lethin, R. A. 1997
  • The m-machine multicomputer International Journal of Parallel Programming Fillo, M., Keckler, S. W., Dally, W. J., Carter, N. P., Chang, A., Gurevich, Y. 1997; 3 (25): 183-212
  • The delta tree: An object-centered approach to image-based rendering Dally, W. J., McMillan, L., Bishop, G., Fuchs, H. 1997
  • Extended ephemeral logging: log storage management for applications with long lived transactions ACM Transactions on Database Systems (TODS) Keen, J. S., Dally, W. J. 1997; 1 (22): 1-42
  • Transmitter equalization for 4-Gbps signaling Micro, IEEE Dally, W. J., Poulton, J. 1997; 1 (17): 48-56
  • Asynchronous event handling Massachusetts Institute of Technology Dally, W. J., Chatterjee, S. 1997
  • Advances in the M-machine runtime system Massachusetts Institute of Technology Dally, W. J., Shultz, A. 1997
  • TPDS Now Online! Special Issue Editors Old and New IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS Dally, W. J., Fortes, J. A. 1997; 3 (8): 225
  • Circuit designs for the MAP chip Massachusetts Institute of Technology Dally, W. J., Chen, A. R. 1997
  • 1997 Annual Index, Vol. 17 development [single chip microprocessors] Dally, W. J., Adams, L., Anderson, T., Bilas, A., Biswas, B. B., Burger, D. 1997; 2000: 28-36
  • Design of the Configuration and Diagnostic Units of the MAP Chip Massachusetts Institute of Technology Dally, W. J., Klayman, K. 1997
  • An I/O port controller for the MAP chip Massachusetts Institute of Technology, Dept. of Electrical Engineering and Dally, W. J., Ma, A. 1997
  • A data-driven IDCT architecture for low power video applications Xanthopoulos, T., Chandrakasan, A. P., Sodini, C. G., Dally, W. J. 1996
  • The subspace model: Shape-based compilation for parallel systems Massachusetts Institute of Technology Dally, W. J., Knobe, K. B. 1996
  • Bandwidth, Granularity, and Mechanisms: Key Issues in the Design of Parallel Computers Dally, W. J. 1996
  • Flexible Memory Systems.(AASERT Fellowship) MASSACHUSETTS INST OF TECH CAMBRIDGE Dally, W. J., Carter, N. 1996
  • Flexible Memory Systems.(AASERT Fellowship). MASSACHUSETTS INST OF TECH CAMBRIDGE Carter, N., Dally, W. J. 1996
  • Architects Look to Processors of Future MICROPROCESSOR REPORT, MICRODESIGN RESOURCES Bell, G., Sites, R., Dally, W., Ditzel, D., Patt, Y. 1996; 10 (10)
  • Multiprocessor coupling system with integrated compile and run time scheduling for parallelism US Patent Keckler, S. W., Dally, W. J. 1996; 574 (5): 939
  • 1st IEEE Symposium on High-Performance Computer Architecture Fiske, S., Dally, W. J. 1995
  • Thread prioritization: A thread scheduling mechanism for multiple-context parallel processors Future Generation Computer Systems Fiske, S., Dally, W. J. 1995; 6 (11): 503-518
  • The M-Machine Multicomputer MASSACHUSETTS INST OF TECH CAMBRIDGE ARTIFICIAL INTELLIGENCE LAB Dally, W. J., Keckler, S. W., Fillo, M., Carter, N. P., Chang, A. 1995
  • 1st IEEE Symposium on High-Performance Computer Architecture Nuth, P. R., Dally, W. J. 1995
  • Implementation of atomic primitives on distributed shared memory multiprocessors Dally, W. J., Michael, M. M., Scott, M. L. 1995
  • The subspace model: A theory of shapes for parallel systems Knobe, K., Dally, W. J. 1995
  • The named-state register file: Implementation and performance Nuth, P. R., Dally, W. J. 1995
  • Low-latency plesiochronous data retiming Dennison, L. R., Dally, W. J., Xanthopoulos, D. 1995
  • Evaluating the locality benefits of active messages ACM SIGPLAN Notices Spertus, E., Dally, W. J. 1995; 8 (30): 189-198
  • The M-Machine operating system Massachusetts Institute of Technology Dally, W. J., Gurevich, Y. 1995
  • Fault tolerant adaptive routing in multicomputer networks Massachusetts Institute of Technology Xanthopoulos, T. 1995
  • Proceedings Dally, W. J., Poulton, J. W., Ishii, A. T. 1995
  • Hardware support for fast capability-based addressing ACM SIGPLAN Notices Carter, N. P., Keckler, S. W., Dally, W. J. 1994; 11 (29): 319-327
  • The implementation of a reliable router chip Massachusetts Institute of Technology Dally, W. J., Kan, K. H. 1994
  • The design of a high performance SPARC bus interface Massachusetts Institute of Technology Dally, W. J., Wong, D. F. 1994
  • Efficient message subsystem design Massachusetts Institute of Technology Dally, W. J., Lee, W. S. 1994
  • Subspace optimizations Automatic Parallelization Knobe, K., Dally, W. J. 1994: 153-176
  • M-Machine Microarchitecture v1. 11 Dally, W. J., Keckler, S. W., Carter, N., Chang, A., Fillo, M., Lee, W. S. 1994
  • The reliable router: A reliable and high-performance communication substrate for parallel computers Parallel Computer Routing and Communication Dally, W. J., Dennison, L. R., Harris, D., Kan, K., Xanthopoulos, T. 1994: 241-255
  • Named state and efficient context switching Multithreaded Computer Architecture Nuth, P. R., Dally, W. J. 1994: 201-212
  • Multithreaded computer architecture Boston: Kluwer Academic Publishers Dennis, J. B., Gao, G. R., Iannucci, R. A., Dally, W. J. 1994
  • Architecture and implementation of the Reliable Router Dally, W. J., Dennison, L. R., Harris, D., Kan, K., Xanthopoulos, T. 1994
  • A numerical engine for distributed sparse matrices Massachusetts Institute of Technology Dally, W. J., Telichevesky, R. 1994
  • The design and implementation of an actor language based on linear logic Massachusetts Institute of Technology Dally, W. J., Tse, C. S. 1994
  • XEL: extended ephemeral logging for log storage management Keen, J. S., Dally, W. J. 1994
  • VLSI design for freshmen and sophomores Massachusetts Institute of Technology Dally, W. J., Harris, D. 1994
  • Logging and recovery in a highly concurrent database Dally, W. J., Keen, J. S. 1994
  • A subspace optimizing data parallel compiler Massachusetts Institute of Technology Dally, W. J., Dampier, T. O. 1994
  • Issues in the Design and Implementation of Instruction Processors for Multicomputers (Position Statement) Multithreaded Computer Architecture Dally, W. J. 1994: 79-82
  • How to Choose the Grain Size of a Parallel Computer MIT/LCS Technical Report Yeung, D., Dally, W. J., Agarwal, A. 1994: MIT-LCS-TR-739
  • Deadlock-free adaptive routing in multicomputer networks using virtual channels Parallel and Distributed Systems, IEEE Transactions Dally, W. J., Aoki, H. 1993; 4 (4): 466-475
  • The J-machine multicomputer: an architectural evaluation ACM SIGARCH Computer Architecture News Noakes, M. D., Wallach, D. A., Dally, W. J. 1993; 2 (21): 224-235
  • Performance evaluation of ephemeral logging ACM SIGMOD Record Keen, J. S., Dally, W. J. 1993; 2 (22): 187-196
  • Evaluation of mechanisms for fine-grained parallel programs in the J-machine and the CM-5 ACM SIGARCH Computer Architecture News Spertus, E., Goldstein, S. C., Schauser, K. E., Eicken, T. V., Culler, D. E., Dally, W. J. 1993; 3 (21): 302-313
  • A Video Controller and Distributed Frame Buffer for the J-Machine Dally, W. J., McDonald, E. 1993
  • A universal parallel computer architecture New Generation Computing Dally, W. J. 1993; 3-4 (11): 227-249
  • High-performance bidirectional signalling in VLSI systems Dennison, L. R., Lee, W. S., Dally, W. J. 1993
  • Mechanisms for parallel computers Parallel Computing on Distributed Memory Multiprocessors Dally, W. J., Wills, D. S., Lethin, R. 1993: 3-25
  • COSMOS: An operating system for a fine-grain concurrent computer Research directions in concurrent object-oriented programming Horwat, W., Totty, B., Dally, W. J. 1993: 452-476
  • The J-Machine architecture and evaluation Compcon Spring'93, Digest of Papers. Dally, W. J., Keen, J. S., Noakes, M. D. 1993: 183-188
  • Message-driven processor in a concurrent computer US Patent Dally, W. J., Chien, A. A., Horwat, W. P., Fiske, S. 1993; 212 (5): 778
  • The Future of Computing is Parallel Computer Science Department Dally, W. J. 1993
  • Virtual-channel flow control Parallel and Distributed Systems, IEEE Transactions Dally, W. J. 1992; 2 (3): 194-205
  • Design and implementation of the Message-Driven Processor Dally, W. J., Ahmed, S., Carrick, P., Chien, A., Davison, R., Fiske, J. 1992
  • The message-driven processor: A multicomputer processing node with efficient mechanisms Micro, IEEE Dally, W. J., Fiske, J. A., Keen, J. S., Lethin, R. A., Noakes, M. D., Nuth, P. R. 1992; 2 (12): 23-39
  • The message driven processor: An integrated multicomputer processing element Computer Design: VLSI in Computers and Processor Dally, W. J., Chien, A., Fiske, J. A., Fyler, G., Horwat, W., Keen, J. S. 1992
  • MDP design tools and methods Computer Design: VLSI in Computers and Processors Lethin, R. A., Dally, W. J. 1992: ICCD'92
  • INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE 1992 Scientific information bulletin Keckler, S. W., Dally, W. J. 1992; 4 (17): 35
  • Processor coupling: Integrating compile time and runtime scheduling for parallelism ACM SIGARCH Computer Architecture News Keckler, S. W., Dally, W. J. 1992; 2 (20): 202-213
  • Custom integrated circuits Custom Integrated Circuits Dally, W. J., Allen, J., Wyatt Jr, J. L., White, J. K., Devadas, S., Armstrong, R. C. 1992
  • A fast translation method for paging on top of segmentation Computers, IEEE Transactions Dally, W. J. 1992; 2 (41): 247-250
  • Pi: a parallel architecture interface Frontiers of Massively Parallel Computation, 1992., Fourth Symposium on the… Wills, D. S., Dally, W. J. 1992
  • The J-machine: a fine-grain parallel computer Computing Systems in Engineering Dally, W. J., Chien, A., Davison, R., Fiske, J. A., Furman, S., Fyler, G. 1992; 1 (3): 7-15
  • Virtual-Channel Flow Control (PDF) Dally, W. J. 1992
  • The J-machine network Computer Design: VLSI in Computers and Processors Nuth, P. R., Dally, W. J. 1992
  • Experiences Implementing Dataflow on a General-Purpose Parallel Computer. ICPP Spertus, E., Dally, W. J. 1991; 2: 231-235
  • A mechanism for efficient context switching Computer Design: VLSI in Computers and Processors Nuth, P. R., Dally, W. J. 1991: ICCD'91
  • Experiments with Dataflow on a General-Purpose Parallel Computer. MASSACHUSETTS INST OF TECH CAMBRIDGE ARTIFICIAL INTELLIGENCE LAB Spertus, E., Dally, W. J. 1991
  • Experiments with Dataflow on a General-Purpose Parallel Computer MASSACHUSETTS INST OF TECH CAMBRIDGE ARTIFICIAL INTELLIGENCE LAB Dally, W. J., Spertus, E. 1991
  • Express cubes: improving the performance of k-ary n-cube interconnection networks Computers, IEEE Transactions Dally, W. J. 1991; 9 (40): 1016-1023
  • Experiments with data flow on a general-purpose parallel computer. Memorandum report Massachusetts Inst. of Tech., Cambridge, MA (United States). Artificial Spertus, E., Dally, W. J. 1991
  • Simultaneous bidirectional signalling for IC systems Computer Design: VLSI in Computers and Processors Lam, K., Dennison, L. R., Dally, W. J. 1990: ICCD'90
  • Experience with concurrent aggregates (CA): Implementation and programming Chien, A. A., Dally, W. J. 1990
  • Advanced Research in VLSI: Proceedings of the Sixth MIT Conference; [papers Presented at the Sixth MIT Conference on Advanced Research in VLSI, Held in Cambridge, Mass., in 1990] Dally, W. J. 1990
  • Performance analysis of k-ary n-cube interconnection networks Computers, IEEE Transactions Dally, W. J. 1990; 6 (39): 775-785
  • Network and processor architecture for message-driven computers VLSI and Parallel Computation Dally, W. 1990: 140-222
  • The Message-Driven Processor: A Multicomputer Processing Node with Efficient Mechanisms Dally, W. J., Davison, R., Fiske, J. A., Fyler, G., Keen, J. S., Lethin, R. A. 1990
  • Critical Problems in Very Large Scale Computer Systems MASSACHUSETTS INST OF TECH CAMBRIDGE Agarwal, A., Dally, W. J., Devadas, S., Knight Jr, T. F., Leighton, F. T., Nabors, K. 1990
  • Concurrent aggregates (CA) ACM Sigplan Notices Chien, A. A., Dally, W. J. 1990; 3 (25): 187-196
  • System design of the J-Machine Noakes, M., Dally, W. J. 1990
  • Critical Problems in Very Large Scale Computer Systems KURTZ LABS YELLOW SPRINGS OH Leighton, F. T., Knight, T. F., Agarwal, A., Dally, W. J., Devadas, S. 1990
  • A hardware logic simulation system Computer-Aided Design of Integrated Circuits and Systems, IEEE Transactions Agrawal, P., Dally, W. J. 1990
  • Virtual-channel flow control Dally, W. J. 1990
  • Proceedings of the sixth MIT conference on Advanced research in VLSI Dally, W. J. 1990
  • Express cubes: Improving the performance of k-ary n-cube interconnection networks MASSACHUSETTS INST OF TECH CAMBRIDGE LAB FOR COMPUTER SCIENCE Dally, W. J. 1989
  • Experience with CST: Programming and Implementation MASSACHUSETTS INST OF TECH CAMBRIDGE MICROSYSTEMS RESEARCH CENTER Chien, A. A., Dally, W. J., Horwat, W. 1989
  • Algorithms for accuracy enhancement in a hardware logic simulator Agrawal, P., Tutundjian, R., Dally, W. 1989
  • Universal mechanisms for concurrency PARLE'89 Parallel Architectures and Languages Europe Dally, W. J., Wills, D. S. 1989: 19-33
  • A fine-grain, message-passing processing node Concurrent Computations Dally, W. J. 1989: 375-389
  • The J-machine: a fine grain concurrent computer MASSACHUSETTS INST OF TECH CAMBRIDGE MICROSYSTEMS RESEARCH CENTER Dally, W. J., Chien, A., Fiske, S., Horwat, W., Keen, J. 1989
  • Micro-optimization of floating-point operations ACM SIGARCH Computer Architecture News Dally, W. J. 1989; 2 (17): 283-289
  • Experience with CST: Programming and implementation ACM SIGPLAN Notices Horwat, W., Chien, A. A., Dally, W. J. 1989; 7 (24): 101-109
  • The Reconfigurable Arithmetic Processor MASSACHUSETTS INST OF TECH CAMBRIDGE MICROSYSTEMS RESEARCH CENTER Dally, W. J., Fiske, S. 1988
  • Fine-grain message passing concurrent computers Dally, W. 1988
  • Message-Driven Processor architecture, Version 11. Artificial intelligence memo Massachusetts Inst. of Tech., Cambridge (USA). Artificial Intelligence Lab. Dally, W., Chien, A., Fiske, S., Horwat, W., Keen, J. 1988
  • The J-machine: System support for Actors MASSACHUSETTS INST OF TECH CAMBRIDGE MICROSYSTEMS RESEARCH CENTER Dally, W. J. 1988
  • ON FIFTH GENERATION COMPUTER SYSTEMS 1988, edited by ICOT. © ICOT, 1988 Dally, W. J. 1988; 3 (FGCS'88): 154
  • Object-Oriented Concurrent Programming in CST MASSACHUSETTS INST OF TECH CAMBRIDGE MICROSYSTEMS RESEARCH CENTER Chien, A. A., Dally, W. J. 1988
  • Message-Driven Processor Architecture MASSACHUSETTS INST OF TECH CAMBRIDGE MICROSYSTEMS RESEARCH CENTER Dally, W., Chien, A., Fiske, S., Horwat, W., Keen, J. 1988
  • Critical Problems in Very Large Scale Computer Systems MASSACHUSETTS INST OF TECH CAMBRIDGE MICROSYSTEMS RESEARCH CENTER Knight, T. F., Penfield, P., Glasser, L. A., Agarwal, A., Dally, W. J. 1988
  • Critical problems in very-large-scale computer systems. Semiannual technical report, 1 April-30 September 1988 Massachusetts Inst. of Tech., Cambridge (USA). Microsystems Research Center Penfield, P., Agarwal, A., Dally, W. J., Devadas, S., Knight, T. F. 1988
  • A network element based fault tolerant processor Massachusetts Institute of Technology Abler, T. A. 1988
  • The reconfigurable arithmetic processor ACM SIGARCH Computer Architecture News Fiske, S., Dally, W. J. 1988; 2 (16): 30-36
  • Mechanisms for Concurrent Computing FGCS Dally, W. J. 1988: 154-156
  • Object-oriented concurrent programming in CST Dally, W. J., Chien, A. A. 1988
  • The Balanced Cube A VLSI Architecture for Concurrent Data Structures Dally, W. J. 1987: 27-73
  • Architecture and design of the MARS hardware accelerator Agrawal, P., Dally, W. J., Ezzat, A. K., Fischer, W. C., Jagadish, H. V., Krishnakumar, A. 1987
  • Performance analysis of k-ary n-cube interconnection networks NASA STI/Recon Technical Report N Dally, W. J. 1987; 88: 30010
  • Graph Algorithms A VLSI Architecture for Concurrent Data Structures Dally, W. J. 1987: 75-132
  • MARS: A multiprocessor-based programmable accelerator Design & Test of Computers, IEEE Agrawal, P., Dally, W. J., Fischer, W. C., Jagadish, H. V., Krishnakumar, A. S., Tutundjian, R. 1987; 5 (4): 28-36
  • Deadlock-free message routing in multiprocessor interconnection networks Computers, IEEE Transactions Dally, W. J., Seitz, C. L. 1987; 5 (100): 547-553
  • A coherent VLSI environment Massachusetts Inst. of Tech. Report Penfield Jr, P., Dally, W. J., Glasser, L. A., Knight Jr, T. F., Leighton, F. T. 1987
  • Concurrent Smalltalk A VLSI Architecture for Concurrent Data Structures Dally, W. J. 1987: 13-25
  • A Coherent VLSI Design Environment MASSACHUSETTS INST OF TECH CAMBRIDGE MICROSYSTEMS RESEARCH CENTER Leighton, F. T., Penfield, P., Glasser, L. A., Knight, T. F., Dally, W. J. 1987
  • Design of a self-timed VLSI multicomputer communication controller NASA STI/Recon Technical Report Dally, W. J., Song, P. 1987; 88: 30014
  • Coherent VLSI environment. Semiannual technical report, 1 October 1986-31 March 1987 Massachusetts Inst. of Tech., Cambridge (USA). Microsystems Research Center Penfield, P., Dally, W. J., Glasser, L. A., Knight, T. F., Leighton, F. T. 1987
  • Architecture of a Message-Driven Processor MASSACHUSETTS INST OF TECH CAMBRIDGE MICROSYSTEMS RESEARCH CENTER Chao, L., Dally, W. J., Chien, A., Hassoun, S., Horwat, W. 1987
  • A message passing system for a fault tolerant parallel processor Massachusetts Institute of Technology Dally, W. J., Heyda, R. L. 1987
  • A Coherent VLSI Design Environment MASSACHUSETTS INST OF TECH CAMBRIDGE Abelson, H., Penfield, P., Antoniadis, D. A., Dally, W. J., Fonstad, C. G. 1987
  • Concurrent computer architecture Massachusetts Inst. of Tech., Cambridge (USA). Artificial Intelligence Lab. Dally, W. J. 1987
  • A coherent VLSI design environment MASSACHUSETTS INST OF TECH CAMBRIDGE MICROSYSTEMS RESEARCH CENTER Penfield Jr, P., Dally, W. J., Glasser, L. A., Knight Jr, T. F., Leighton, F. T., Wyatt Jr, J. L. 1987
  • A Coherent VLSI Design Environment MASSACHUSETTS INST OF TECH CAMBRIDGE MICROSYSTEMS RESEARCH CENTER Leiserson, C. E., Penfield, P., Glasser, L. A., Knight, T. F., Dally, W. J. 1986
  • On the Performance of k-ary n-cube Interconnection Networks California Institute of Technology Dally, W. J. 1986
  • 5208: TR: _86 Dally, W. J. 1986
  • The torus routing chip Dally, W. J., Seitz, C. L. 1986
  • A High-performance VLSI Quaternary Serial Multiplier Dally, W. J. 1986
  • Wire-efficient VLSI multiprocessor communication networks Massachusetts Institute of Technology, Microsystems Program Office Dally, W. J. 1986
  • Directions in concurrent computing Dally, W. J. 1986
  • The torus routing chip Distributed computing Dally, W. J., Seitz, C. L. 1986; 4 (1): 187-196
  • VLSI architecture for concurrent data structures California Inst. of Tech. Dally, W. J. 1986
  • An object oriented architecture ACM SIGARCH Computer Architecture News Dally, W. J., Kajiya, J. T. 1985; 3 (13): 154-161
  • A hardware architecture for switch-level simulation Computer-Aided Design of Integrated Circuits and Systems, IEEE Transactions Dally, W. J., Bryant, R. E. 1985
  • Fungicides for Crop Protection: Invited papers International Specialized Book Service Incorporated Dally, W. J., Smith, I. M. 1985
  • The balanced cube: a concurrent data structure California Institute of Technology Dally, W. J., Seitz, C. L. 1985
  • Concurrent Algorithms for the Max-Flow Problem California Institute of Technology Dally, W. J. 1985
  • The MOSSIM Simulation Engine Architecture and Design California Institute of Technology Dally, W. J. 1984
  • A Special Purpose Processor for Switch-Level Simulation International Conference on Computer Aided Design Dally, W. J., Bryant, R. E. 1984