Emily Fox is a Professor in the Department of Statistics and, by courtesy, Computer Science at Stanford University. Prior to Stanford, she was the Amazon Professor of Machine Learning in the Paul G. Allen School of Computer Science & Engineering and Department of Statistics at the University of Washington. From 2018 to 2021, Emily led the Health AI team at Apple, where she was a Distinguished Engineer. Before joining UW, Emily was an Assistant Professor at the Wharton School Department of Statistics at the University of Pennsylvania. She earned her doctorate in Electrical Engineering and Computer Science (EECS) at MIT, where her thesis was recognized with EECS' Jin-Au Kong Outstanding Doctoral Thesis Prize and the Leonard J. Savage Award for Best Thesis in Applied Methodology.

Emily has been awarded a CZ Biohub Investigator Award, a Presidential Early Career Award for Scientists and Engineers (PECASE), a Sloan Research Fellowship, an ONR Young Investigator Award, and an NSF CAREER Award. Her research interests are in large-scale Bayesian dynamic modeling, interpretability and computations, with applications in health and computational neuroscience.

Honors & Awards

  • Investigator Award, CZ Biohub (2022)
  • AWS Machine Learning Research Award, Amazon (2018)
  • Presidential Early Career Award for Scientists and Engineers (PECASE), National Science Foundation (2017)
  • Sloan Research Fellowship, Alfred P. Sloan Foundation (2015)
  • Young Investigator Award, Office of Naval Research (2015)
  • CAREER Award, National Science Foundation (2014)
  • Jin-Au Kong Outstanding Doctoral Thesis Prize, MIT EECS (2009)
  • Leonard J. Savage Award for Best Thesis in Applied Methodology, International Society for Bayesian Analysis (2009)

Professional Education

  • Ph.D., Massachusetts Institute of Technology (MIT), Electrical Engineering and Computer Science (2009)
  • E.E., Massachusetts Institute of Technology (MIT), Electrical Engineering and Computer Science (2008)
  • M.Eng., Massachusetts Institute of Technology (MIT), Electrical Engineering and Computer Science (2005)
  • S.B., Massachusetts Institute of Technology (MIT), Electrical Science and Engineering (2004)

All Publications

  • Granger Causality: A Review and Recent Advances ANNUAL REVIEW OF STATISTICS AND ITS APPLICATION Shojaie, A., Fox, E. B. 2022; 9: 289-319
  • Neural Granger Causality IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE Tank, A., Covert, I., Foti, N., Shojaie, A., Fox, E. B. 2021; 44 (8): 4267-4279


    While most classical approaches to Granger causality detection assume linear dynamics, many interactions in real-world applications, like neuroscience and genomics, are inherently nonlinear. In these cases, using linear models may lead to inconsistent estimation of Granger causal interactions. We propose a class of nonlinear methods by applying structured multilayer perceptrons (MLPs) or recurrent neural networks (RNNs) combined with sparsity-inducing penalties on the weights. By encouraging specific sets of weights to be zero (in particular, through the use of convex group-lasso penalties), we can extract the Granger causal structure. To further contrast with traditional approaches, our framework naturally enables us to efficiently capture long-range dependencies between series either via our RNNs or through an automatic lag selection in the MLP. We show that our neural Granger causality methods outperform state-of-the-art nonlinear Granger causality methods on the DREAM3 challenge data. This data consists of nonlinear gene expression and regulation time courses with only a limited number of time points. The successes we show in this challenging dataset provide a powerful example of how deep learning can be useful in cases that go beyond prediction on large datasets. We likewise illustrate our methods in detecting nonlinear interactions in a human motion capture dataset.

    View details for DOI 10.1109/TPAMI.2021.3065601

    View details for Web of Science ID 000820522700002

    View details for PubMedID 33705309

  • Improving Reproducibility in Machine Learning Research (A Report from the NeurIPS 2019 Reproducibility Program) JOURNAL OF MACHINE LEARNING RESEARCH Pineau, J., Vincent-Lamarre, P., Sinha, K., Lariviere, V., Beygelzimer, A., d'Alche-Buc, F., Fox, E., Larochelle, H. 2021; 22
  • The Convex Mixture Distribution: Granger Causality for Categorical Time Series SIAM JOURNAL ON MATHEMATICS OF DATA SCIENCE Tank, A., Li, X., Fox, E. B., Shojaie, A. 2021; 3 (1): 83-112

    View details for DOI 10.1137/20M133097X

    View details for Web of Science ID 000646591200004

  • Adaptively Truncating Backpropagation Through Time to Control Gradient Bias Aicher, C., Foti, N. J., Fox, E. B., Adams, R. P., Gogate JMLR-JOURNAL MACHINE LEARNING RESEARCH. 2020: 799-808
  • sgmcmc: An R Package for Stochastic Gradient Markov Chain Monte Carlo JOURNAL OF STATISTICAL SOFTWARE Baker, J., Fearnhead, P., Fox, E. B., Nemeth, C. 2019; 91 (3): 1-27
  • Control variates for stochastic gradient MCMC STATISTICS AND COMPUTING Baker, J., Fearnhead, P., Fox, E. B., Nemeth, C. 2019; 29 (3): 599-615
  • Statistical model-based approaches for functional connectivity analysis of neuroimaging data CURRENT OPINION IN NEUROBIOLOGY Foti, N. J., Fox, E. B. 2019; 55: 48-54


    We present recent literature on model-based approaches to estimating functional connectivity from neuroimaging data. In contrast to the typical focus on a particular scientific question, we reframe a wider literature in terms of the underlying statistical model used. We distinguish between directed versus undirected and static versus time-varying connectivity. There are numerous advantages to a model-based approach, including easily specified inductive bias, handling limited data scenarios, and building complex models from simpler building blocks.

    View details for DOI 10.1016/j.conb.2019.01.009

    View details for Web of Science ID 000472127600008

    View details for PubMedID 30739880

  • Stochastic Gradient MCMC for State Space Models SIAM JOURNAL ON MATHEMATICS OF DATA SCIENCE Aicher, C., Ma, Y., Foti, N. J., Fox, E. B. 2019; 1 (3): 555-587

    View details for DOI 10.1137/18M1214780

    View details for Web of Science ID 000646580300008

  • Irreversible samplers from jump and continuous Markov processes STATISTICS AND COMPUTING Ma, Y., Fox, E. B., Chen, T., Wu, L. 2019; 29 (1): 177-202
  • A Simple Adaptive Tracker with Reminiscences Xie, C., Fox, E., Harchaoui, Z., IEEE, Howard, A., Althoefer, K., Arai, F., Arrichiello, F., Caputo, B., Castellanos, J., Hauser, K., Isler, Kim, J., Liu, H., Oh, P., Santos, Scaramuzza, D., Ude, A., Voyles, R., Yamane, K., Okamura, A. IEEE. 2019: 6596-6603
  • oi-VAE: Output Interpretable VAEs for Nonlinear Group Factor Analysis Ainsworth, S. K., Foti, N. J., Lee, A. C., Fox, E. B., Dy, J., Krause, A. JMLR-JOURNAL MACHINE LEARNING RESEARCH. 2018
  • Large-Scale Stochastic Sampling from the Probability Simplex Baker, J., Fearnhead, P., Fox, E. B., Nemeth, C., Bengio, S., Wallach, H., Larochelle, H., Grauman, K., CesaBianchi, N., Garnett, R. NEURAL INFORMATION PROCESSING SYSTEMS (NIPS). 2018
  • Sparse graphs using exchangeable random measures JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES B-STATISTICAL METHODOLOGY Caron, F., Fox, E. B. 2017; 79 (5): 1295-1366


    Statistical network modelling has focused on representing the graph as a discrete structure, namely the adjacency matrix. When assuming exchangeability of this array (which can aid in modelling, computations and theoretical analysis), the Aldous-Hoover theorem informs us that the graph is necessarily either dense or empty. We instead consider representing the graph as an exchangeable random measure and appeal to the Kallenberg representation theorem for this object. We explore using completely random measures (CRMs) to define the exchangeable random measure, and we show how our CRM construction enables us to achieve sparse graphs while maintaining the attractive properties of exchangeability. We relate the sparsity of the graph to the Lévy measure defining the CRM. For a specific choice of CRM, our graphs can be tuned from dense to sparse on the basis of a single parameter. We present a scalable Hamiltonian Monte Carlo algorithm for posterior inference, which we use to analyse network properties in a range of real data sets, including networks with hundreds of thousands of nodes and millions of edges.

    View details for DOI 10.1111/rssb.12233

    View details for Web of Science ID 000413946300001

    View details for PubMedID 29200934

    View details for PubMedCentralID PMC5699441

  • Stochastic Gradient MCMC Methods for Hidden Markov Models Ma, Y., Foti, N. J., Fox, E. B., Precup, D., Teh, Y. W. JMLR-JOURNAL MACHINE LEARNING RESEARCH. 2017
  • Comment: Nonparametric Bayes Modeling of Populations of Networks JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION Foti, N. J., Fox, E. B. 2017; 112 (520): 1539-1543
  • Temporal behavior of seizures and interictal bursts in prolonged intracranial recordings from epileptic canines EPILEPSIA Ung, H., Davis, K. A., Wulsin, D., Wagenaar, J., Fox, E., McDonnell, J. J., Patterson, N., Vite, C. H., Worrell, G., Litt, B. 2016; 57 (12): 1949-1957


    Epilepsy is a chronic disorder, but seizure recordings are usually obtained in the acute setting. The chronic behavior of seizures and the interictal bursts that sometimes initiate them is unknown. We investigate the variability of these electrographic patterns over an extended period of time using chronic intracranial recordings in canine epilepsy. Continuous, yearlong intracranial electroencephalography (iEEG) recordings from four dogs with naturally occurring epilepsy were analyzed for seizures and interictal bursts. Following automated detection and clinician verification of interictal bursts and seizures, temporal trends of seizures, burst count, and burst-burst similarities were determined. One dog developed status epilepticus, the recordings of which were also investigated. Multiple seizure types, determined by onset channels, were observed in each dog, with significant temporal variation between types. The first 14 days of invasive recording, analogous to the average duration of clinical invasive recordings in humans, did not capture the entirety of seizure types. Seizures typically occurred in clusters, and isolated seizures were rare. The count and dynamics of interictal bursts form distinct groups and do not stabilize until several weeks after implantation. There is significant temporal variability in seizures and interictal bursts after electrode implantation that requires several weeks to reach steady state. These findings, comparable to those reported in humans implanted with the NeuroPace Responsive Neurostimulator System (RNS) device, suggest that transient network changes following electrode implantation may need to be taken into account when interpreting or analyzing iEEG during evaluation for epilepsy surgery. Chronic, ambulatory iEEG may be better suited to accurately map epileptic networks in appropriate individuals.

    View details for DOI 10.1111/epi.13591

    View details for Web of Science ID 000390353100002

    View details for PubMedID 27807850

    View details for PubMedCentralID PMC5241889

  • A novel seizure detection algorithm informed by hidden Markov model event states JOURNAL OF NEURAL ENGINEERING Baldassano, S., Wulsin, D., Ung, H., Blevins, T., Brown, M., Fox, E., Litt, B. 2016; 13 (3): 036011


    Recently the FDA approved the first responsive, closed-loop intracranial device to treat epilepsy. Because these devices must respond within seconds of seizure onset and not miss events, they are tuned to have high sensitivity, leading to frequent false positive stimulations and decreased battery life. In this work, we propose a more robust seizure detection model. We use a Bayesian nonparametric Markov switching process to parse intracranial EEG (iEEG) data into distinct dynamic event states. Each event state is then modeled as a multidimensional Gaussian distribution to allow for predictive state assignment. By detecting event states highly specific for seizure onset zones, the method can identify precise regions of iEEG data associated with the transition to seizure activity, reducing false positive detections associated with interictal bursts. The seizure detection algorithm was translated to a real-time application and validated in a small pilot study using 391 days of continuous iEEG data from two dogs with naturally occurring, multifocal epilepsy. A feature-based seizure detector modeled after the NeuroPace RNS System was developed as a control. Our novel seizure detection method demonstrated an improvement in false negative rate (0/55 seizures missed versus 2/55 seizures missed) as well as a significantly reduced false positive rate (0.0012 h(-1) versus 0.058 h(-1)). All seizures were detected an average of 12.1 ± 6.9 s before the onset of unequivocal epileptic activity (unequivocal epileptic onset, UEO). This algorithm represents a computationally inexpensive, individualized, real-time detection method suitable for implantable antiepileptic devices that may considerably reduce false positive rate relative to current industry standards.

    View details for DOI 10.1088/1741-2560/13/3/036011

    View details for Web of Science ID 000375701200015

    View details for PubMedID 27098152

    View details for PubMedCentralID PMC4888894

  • Mining continuous intracranial EEG in focal canine epilepsy: Relating interictal bursts to seizure onsets EPILEPSIA Davis, K. A., Ung, H., Wulsin, D., Wagenaar, J., Fox, E., Patterson, N., Vite, C., Worrell, G., Litt, B. 2016; 57 (1): 89-98


    Brain regions are localized for resection during epilepsy surgery based on rare seizures observed during a short period of intracranial electroencephalography (iEEG) monitoring. Interictal epileptiform bursts, which are more prevalent than seizures, may provide complementary information to aid in epilepsy evaluation. In this study, we leverage a long-term iEEG dataset from canines with naturally occurring epilepsy to investigate interictal bursts and their electrographic relationship to seizures. Four dogs were included in this study, each monitored previously with continuous iEEG for periods of 475.7, 329.9, 45.8, and 451.8 days, respectively, for a total of >11,000 h. Seizures and bursts were detected and validated by two board-certified epileptologists. A published Bayesian model was applied to analyze the dynamics of interictal epileptic bursts on EEG and compare them to seizures. In three dogs, bursts were stereotyped and found to be statistically similar to periods before or near seizure onsets. Seizures from one dog during status epilepticus were markedly different from other seizures in terms of burst similarity. Shorter epileptic bursts explored in this work have the potential to yield significant information about the distribution of epileptic events. In our data, bursts are at least an order of magnitude more prevalent than seizures and occur much more regularly. Our finding that bursts often display pronounced similarity to seizure onsets suggests that they contain relevant information about the epileptic networks from which they arise and may aid in the clinical evaluation of epilepsy in patients.

    View details for DOI 10.1111/epi.13249

    View details for Web of Science ID 000368132100016

    View details for PubMedID 26608448

    View details for PubMedCentralID PMC4770560

  • Bayesian Nonparametric Covariance Regression JOURNAL OF MACHINE LEARNING RESEARCH Fox, E. B., Dunson, D. B. 2015; 16: 2501-2542
  • Guest Editors' Introduction to the Special Issue on Bayesian Nonparametrics IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE Adams, R. P., Fox, E. B., Sudderth, E. B., Teh, Y. 2015; 37 (2): 209-211

    View details for DOI 10.1109/TPAMI.2014.2380478

    View details for Web of Science ID 000349625500001

    View details for PubMedID 26598765

  • A Complete Recipe for Stochastic Gradient MCMC Ma, Y., Chen, T., Fox, E. B., Cortes, C., Lawrence, N. D., Lee, D. D., Sugiyama, M., Garnett, R. NEURAL INFORMATION PROCESSING SYSTEMS (NIPS). 2015
  • Streaming Variational Inference for Bayesian Nonparametric Mixture Models Tank, A., Foti, N. J., Fox, E. B., Lebanon, G., Vishwanathan, S. V. MICROTOME PUBLISHING. 2015: 968-976
  • Bayesian Structure Learning for Stationary Time Series Tank, A., Foti, N. J., Fox, E. B., Meila, M., Heskes, T. AUAI PRESS. 2015: 872-881
  • Modeling the complex dynamics and changing correlations of epileptic events ARTIFICIAL INTELLIGENCE Wulsin, D. F., Fox, E. B., Litt, B. 2014; 216: 55-75


    Patients with epilepsy can manifest short, sub-clinical epileptic "bursts" in addition to full-blown clinical seizures. We believe the relationship between these two classes of events (something not previously studied quantitatively) could yield important insights into the nature and intrinsic dynamics of seizures. A goal of our work is to parse these complex epileptic events into distinct dynamic regimes. A challenge posed by the intracranial EEG (iEEG) data we study is the fact that the number and placement of electrodes can vary between patients. We develop a Bayesian nonparametric Markov switching process that allows for (i) shared dynamic regimes between a variable number of channels, (ii) asynchronous regime-switching, and (iii) an unknown dictionary of dynamic regimes. We encode a sparse and changing set of dependencies between the channels using a Markov-switching Gaussian graphical model for the innovations process driving the channel dynamics and demonstrate the importance of this model in parsing and out-of-sample predictions of iEEG data. We show that our model produces intuitive state assignments that can help automate clinical analysis of seizures and enable the comparison of sub-clinical bursts and full clinical seizures.

    View details for DOI 10.1016/j.artint.2014.05.006

    View details for Web of Science ID 000342253000003

    View details for PubMedID 25284825

    View details for PubMedCentralID PMC4180222


    View details for DOI 10.1214/14-AOAS741

    View details for Web of Science ID 000347529300013


    View details for DOI 10.1214/14-AOAS742

    View details for Web of Science ID 000347529300001

  • Expectation-Maximization for Learning Determinantal Point Processes Gillenwater, J., Kulesza, A., Fox, E., Taskar, B., Ghahramani, Z., Welling, M., Cortes, C., Lawrence, N. D., Weinberger, K. Q. NEURAL INFORMATION PROCESSING SYSTEMS (NIPS). 2014
  • Stochastic Variational Inference for Hidden Markov Models Foti, N. J., Xu, J., Laird, D., Fox, E. B., Ghahramani, Z., Welling, M., Cortes, C., Lawrence, N. D., Weinberger, K. Q. NEURAL INFORMATION PROCESSING SYSTEMS (NIPS). 2014
  • Learning the Parameters of Determinantal Point Process Kernels Affandi, R., Fox, E. B., Adams, R. P., Taskar, B., Xing, E. P., Jebara, T. JMLR-JOURNAL MACHINE LEARNING RESEARCH. 2014: 1224-1232
  • Stochastic Gradient Hamiltonian Monte Carlo Chen, T., Fox, E. B., Guestrin, C., Xing, E. P., Jebara, T. JMLR-JOURNAL MACHINE LEARNING RESEARCH. 2014: 1683-1691
  • Representing Documents Through Their Readers El-Arini, K., Xu, M., Fox, E. B., Guestrin, C., ACM ASSOC COMPUTING MACHINERY. 2013: 14-22
  • A STICKY HDP-HMM WITH APPLICATION TO SPEAKER DIARIZATION ANNALS OF APPLIED STATISTICS Fox, E. B., Sudderth, E. B., Jordan, M. I., Willsky, A. S. 2011; 5 (2A): 1020-1056

    View details for DOI 10.1214/10-AOAS395

    View details for Web of Science ID 000295453300019

  • Bayesian Nonparametric Inference of Switching Dynamic Linear Models IEEE TRANSACTIONS ON SIGNAL PROCESSING Fox, E., Sudderth, E. B., Jordan, M. I., Willsky, A. S. 2011; 59 (4): 1569-1585
  • Bayesian Nonparametric Methods for Learning Markov Switching Processes IEEE SIGNAL PROCESSING MAGAZINE Fox, E. B., Sudderth, E. B., Jordan, M. I., Willsky, A. S. 2010; 27 (6): 43-54