Academic Appointments

2019-20 Courses

Stanford Advisees

  • Doctoral Dissertation Reader (AC)
    Sadjad Fouladi, Yawen Wang
  • Postdoctoral Faculty Sponsor
    Fiodar Kazhamiaka
  • Doctoral Dissertation Advisor (AC)
    Firas Abuzaid
  • Orals Evaluator
    Zhihao Jia
  • Master's Program Advisor
    Nikhil Athreya, Advay Pal, Julius Stener, Rui Ueyama
  • Doctoral Dissertation Co-Advisor (AC)
    Zhihao Jia, Daniel Kang, James Thomas
  • Doctoral (Program)
    Cody Coleman, Peter Kraft, Deepak Narayanan, Deepti Raghavan, Pratiksha Thaker

All Publications

  • Predicting Sudden Cardiac Death by Machine Learning of Ventricular Action Potentials. Selvalingam, A., Alhusseini, M., Rogers, A. J., Krummen, D., Abuzaid, F. M., Baykaner, T., Clopton, P., Bailis, P., Zaharia, M., Wang, P., Narayan, S. ELSEVIER SCIENCE INC. 2020: 427
  • From Laptop to Lambda: Outsourcing Everyday Jobs to Thousands of Transient Functional Containers. Fouladi, S., Romero, F., Iter, D., Li, Q., Chatterjee, S., Kozyrakis, C., Zaharia, M., Winstein, K. USENIX ASSOC. 2019: 475–88
  • PipeDream: Generalized Pipeline Parallelism for DNN Training. Narayanan, D., Harlap, A., Phanishayee, A., Seshadri, V., Devanur, N. R., Ganger, G. R., Gibbons, P. B., Zaharia, M. ASSOC COMPUTING MACHINERY. 2019: 1–15
  • To Index or Not to Index: Optimizing Exact Maximum Inner Product Search. Abuzaid, F., Sethi, G., Bailis, P., Zaharia, M. IEEE. 2019: 1250–61
  • Optimizing Data-Intensive Computations in Existing Libraries with Split Annotations. Palkar, S., Zaharia, M. ASSOC COMPUTING MACHINERY. 2019: 291–305
  • TASO: Optimizing Deep Learning Computation with Automatic Generation of Graph Substitutions. Jia, Z., Padon, O., Thomas, J., Warszawski, T., Zaharia, M., Aiken, A. ASSOC COMPUTING MACHINERY. 2019: 47–62
  • DIFF: A Relational Interface for Large-Scale Data Explanation. PROCEEDINGS OF THE VLDB ENDOWMENT. Abuzaid, F., Kraft, P., Suri, S., Gan, E., Xu, E., Shenoy, A., Ananthanarayan, A., Sheu, J., Meijer, E., Wu, X., Naughton, J., Bailis, P., Zaharia, M. 2018; 12 (4): 419–32
  • Structured Streaming: A Declarative API for Real-Time Applications in Apache Spark. Armbrust, M., Das, T., Torres, J., Yavuz, B., Zhu, S., Xin, R., Ghodsi, A., Stoica, I., Zaharia, M., Das, G., Jermaine, C., Bernstein, P., Eldawy, A. ASSOC COMPUTING MACHINERY. 2018: 601–13
  • MISTIQUE: A System to Store and Query Model Intermediates for Model Diagnosis. Vartak, M., da Trindade, J. F., Madden, S., Zaharia, M., Das, G., Jermaine, C., Bernstein, P., Eldawy, A. ASSOC COMPUTING MACHINERY. 2018: 1285–1300
  • NoScope: Optimizing Neural Network Queries over Video at Scale. PROCEEDINGS OF THE VLDB ENDOWMENT. Kang, D., Emmons, J., Abuzaid, F., Bailis, P., Zaharia, M. 2017; 10 (11): 1586–97
  • Splinter: Practical Private Queries on Public Data. Wang, F., Yun, C., Goldwasser, S., Vaikuntanathan, V., Zaharia, M. USENIX ASSOC. 2017: 299–313
  • DIY Hosting for Online Privacy. Palkar, S., Zaharia, M. ASSOC COMPUTING MACHINERY. 2017: 1–7
  • Making Caches Work for Graph Analytics. Zhang, Y., Kiriansky, V., Mendis, C., Amarasinghe, S., Zaharia, M., Nie, J. Y., Obradovic, Z., Suzumura, T., Ghosh, R., Nambiar, R., Wang, C., Zang, H., Baeza-Yates, R., Hu, Kepner, J., Cuzzocrea, A., Tang, J., Toyoda, M. IEEE. 2017: 293–302
  • Apache Spark: A Unified Engine for Big Data Processing. COMMUNICATIONS OF THE ACM. Zaharia, M., Xin, R. S., Wendell, P., Das, T., Armbrust, M., Dave, A., Meng, X., Rosen, J., Venkataraman, S., Franklin, M. J., Ghodsi, A., Gonzalez, J., Shenker, S., Stoica, I. 2016; 59 (11): 56–65

    DOI: 10.1145/2934664
    Web of Science ID: 000387897700022

  • Voodoo - A Vector Algebra for Portable Database Performance on Modern Hardware. PROCEEDINGS OF THE VLDB ENDOWMENT. Pirk, H., Moll, O., Zaharia, M., Madden, S. 2016; 9 (14): 1707–18
  • MLlib: Machine Learning in Apache Spark. JOURNAL OF MACHINE LEARNING RESEARCH. Meng, X., Bradley, J., Yavuz, B., Sparks, E., Venkataraman, S., Liu, D., Freeman, J., Tsai, D. B., Amde, M., Owen, S., Xin, D., Xin, R., Franklin, M. J., Zadeh, R., Zaharia, M., Talwalkar, A. 2016; 17
  • Matrix Computations and Optimization in Apache Spark. Zadeh, R., Meng, X., Ulanov, A., Yavuz, B., Pu, L., Venkataraman, S., Sparks, E., Staple, A., Zaharia, M. ASSOC COMPUTING MACHINERY. 2016: 31–38
  • Yggdrasil: An Optimized System for Training Deep Decision Trees at Scale. Abuzaid, F., Bradley, J., Liang, F., Feng, A., Yang, L., Zaharia, M., Talwalkar, A., Lee, D. D., Sugiyama, M., Luxburg, U. V., Guyon, Garnett, R. NEURAL INFORMATION PROCESSING SYSTEMS (NIPS). 2016
  • Introduction to Spark 2.0 for Database Researchers. Armbrust, M., Bateman, D., Xin, R., Zaharia, M. ACM SIGMOD, ASSOC COMPUTING MACHINERY. 2016: 2193–94
  • SparkR: Scaling R Programs with Spark. Venkataraman, S., Yang, Z., Liu, D., Liang, E., Falaki, H., Meng, X., Xin, R., Ghodsi, A., Franklin, M., Stoica, I., Zaharia, M. ACM SIGMOD, ASSOC COMPUTING MACHINERY. 2016: 1099–1104
  • FairRide: Near-Optimal, Fair Cache Sharing. Pu, Q., Li, H., Zaharia, M., Ghodsi, A., Stoica, I. USENIX ASSOC. 2016: 393–406
  • GraphFrames: An Integrated API for Mixing Graph and Relational Queries. Dave, A., Jindal, A., Li, L., Xin, R., Gonzalez, J., Zaharia, M. ASSOC COMPUTING MACHINERY. 2016
  • Scaling Spark in the Real World: Performance and Usability. PROCEEDINGS OF THE VLDB ENDOWMENT. Armbrust, M., Das, T., Davidson, A., Ghodsi, A., Or, A., Rosen, J., Stoica, I., Wendell, P., Xin, R., Zaharia, M. 2015; 8 (12): 1840–43
  • Spark SQL: Relational Data Processing in Spark. Armbrust, M., Xin, R. S., Lian, C., Huai, Y., Liu, D., Bradley, J. K., Meng, X., Kaftan, T., Franklin, M. J., Ghodsi, A., Zaharia, M. ACM SIGMOD, ASSOC COMPUTING MACHINERY. 2015: 1383–94
  • Vuvuzela: Scalable Private Messaging Resistant to Traffic Analysis. van den Hooff, J., Lazar, D., Zaharia, M., Zeldovich, N. ASSOC COMPUTING MACHINERY. 2015: 137–52
  • Optimally designing games for behavioural research. PROCEEDINGS OF THE ROYAL SOCIETY A-MATHEMATICAL PHYSICAL AND ENGINEERING SCIENCES. Rafferty, A. N., Zaharia, M., Griffiths, T. L. 2014; 470 (2167): 20130828


    Computer games can be motivating and engaging experiences that facilitate learning, leading to their increasing use in education and behavioural experiments. For these applications, it is often important to make inferences about the knowledge and cognitive processes of players based on their behaviour. However, designing games that provide useful behavioural data is a difficult task that typically requires significant trial and error. We address this issue by creating a new formal framework that extends optimal experiment design, used in statistics, to apply to game design. In this framework, we use Markov decision processes to model players' actions within a game, and then make inferences about the parameters of a cognitive model from these actions. Using a variety of concept learning games, we show that in practice, this method can predict which games will result in better estimates of the parameters of interest. The best games require only half as many players to attain the same level of precision.

    DOI: 10.1098/rspa.2013.0828
    Web of Science ID: 000336184600004
    PubMed ID: 25002821
    PubMed Central ID: PMC4032552

  • A cloud-compatible bioinformatics pipeline for ultrarapid pathogen identification from next-generation sequencing of clinical samples. GENOME RESEARCH. Naccache, S. N., Federman, S., Veeraraghavan, N., Zaharia, M., Lee, D., Samayoa, E., Bouquet, J., Greninger, A. L., Luk, K., Enge, B., Wadford, D. A., Messenger, S. L., Genrich, G. L., Pellegrino, K., Grard, G., Leroy, E., Schneider, B. S., Fair, J. N., Martinez, M. A., Isa, P., Crump, J. A., DeRisi, J. L., Sittler, T., Hackett, J., Miller, S., Chiu, C. Y. 2014; 24 (7): 1180–92


    Unbiased next-generation sequencing (NGS) approaches enable comprehensive pathogen detection in the clinical microbiology laboratory and have numerous applications for public health surveillance, outbreak investigation, and the diagnosis of infectious diseases. However, practical deployment of the technology is hindered by the bioinformatics challenge of analyzing results accurately and in a clinically relevant timeframe. Here we describe SURPI ("sequence-based ultrarapid pathogen identification"), a computational pipeline for pathogen identification from complex metagenomic NGS data generated from clinical samples, and demonstrate use of the pipeline in the analysis of 237 clinical samples comprising more than 1.1 billion sequences. Deployable on both cloud-based and standalone servers, SURPI leverages two state-of-the-art aligners for accelerated analyses, SNAP and RAPSearch, which are as accurate as existing bioinformatics tools but orders of magnitude faster in performance. In fast mode, SURPI detects viruses and bacteria by scanning data sets of 7-500 million reads in 11 min to 5 h, while in comprehensive mode, all known microorganisms are identified, followed by de novo assembly and protein homology searches for divergent viruses in 50 min to 16 h. SURPI has also directly contributed to real-time microbial diagnosis in acutely ill patients, underscoring its potential key role in the development of unbiased NGS-based clinical assays in infectious diseases that demand rapid turnaround times.

    DOI: 10.1101/gr.171934.113
    Web of Science ID: 000338185000012
    PubMed ID: 24899342
    PubMed Central ID: PMC4079973

  • Multi-Resource Fair Queueing for Packet Processing. ACM SIGCOMM COMPUTER COMMUNICATION REVIEW. Ghodsi, A., Sekar, V., Zaharia, M., Stoica, I. 2012; 42 (4): 1–12
  • Managing Data Transfers in Computer Clusters with Orchestra. ACM SIGCOMM COMPUTER COMMUNICATION REVIEW. Chowdhury, M., Zaharia, M., Ma, J., Jordan, M. I., Stoica, I. 2011; 41 (4): 98–109