All Publications


  • Super-resolved spatial transcriptomics by deep data fusion. Nature biotechnology Bergenstråhle, L., He, B., Bergenstråhle, J., Abalo, X., Mirzazadeh, R., Thrane, K., Ji, A. L., Andersson, A., Larsson, L., Stakenborg, N., Boeckxstaens, G., Khavari, P., Zou, J., Lundeberg, J., Maaskola, J. 2021

    Abstract

    Current methods for spatial transcriptomics are limited by low spatial resolution. Here we introduce a method that integrates spatial gene expression data with histological image data from the same tissue section to infer higher-resolution expression maps. Using a deep generative model, our method characterizes the transcriptome of micrometer-scale anatomical features and can predict spatial gene expression from histology images alone.

    View details for DOI 10.1038/s41587-021-01075-3

    View details for PubMedID 34845373

  • Deep learning evaluation of biomarkers from echocardiogram videos. EBioMedicine Hughes, J. W., Yuan, N., He, B., Ouyang, J., Ebinger, J., Botting, P., Lee, J., Theurer, J., Tooley, J. E., Nieman, K., Lungren, M. P., Liang, D. H., Schnittger, I., Chen, J. H., Ashley, E. A., Cheng, S., Ouyang, D., Zou, J. Y. 2021; 73: 103613

    Abstract

    BACKGROUND: Laboratory testing is routinely used to assay blood biomarkers to provide information on physiologic state beyond what clinicians can evaluate from interpreting medical imaging. We hypothesized that deep learning interpretation of echocardiogram videos can provide additional value in understanding disease states and can evaluate common biomarker results. METHODS: We developed EchoNet-Labs, a video-based deep learning algorithm to detect evidence of anemia, elevated B-type natriuretic peptide (BNP), troponin I, and blood urea nitrogen (BUN), as well as values of ten additional lab tests directly from echocardiograms. We included patients (n=39,460) aged 18 years or older with one or more apical-4-chamber echocardiogram videos (n=70,066) from Stanford Healthcare for training and internal testing of EchoNet-Labs' performance in estimating the most proximal biomarker result. Without fine-tuning, the performance of EchoNet-Labs was further evaluated on an additional external test dataset (n=1,301) from Cedars-Sinai Medical Center. We calculated the area under the curve (AUC) of the receiver operating characteristic curve for the internal and external test datasets. FINDINGS: On the held-out test set of Stanford patients not previously seen during model training, EchoNet-Labs achieved an AUC of 0.80 (0.79-0.81) in detecting anemia (low hemoglobin), 0.86 (0.85-0.88) in detecting elevated BNP, 0.75 (0.73-0.78) in detecting elevated troponin I, and 0.74 (0.72-0.76) in detecting elevated BUN. On the external test dataset from Cedars-Sinai, EchoNet-Labs achieved an AUC of 0.80 (0.77-0.82) in detecting anemia, of 0.82 (0.79-0.84) in detecting elevated BNP, of 0.75 (0.72-0.78) in detecting elevated troponin I, and of 0.69 (0.66-0.71) in detecting elevated BUN. We further demonstrate the utility of the model in detecting abnormalities in 10 additional lab tests. We investigate the features necessary for EchoNet-Labs to make successful detection and identify potential mechanisms for each biomarker using well-known and novel explainability techniques. INTERPRETATION: These results show that deep learning applied to diagnostic imaging can provide additional clinical value and identify phenotypic information beyond current imaging interpretation methods. FUNDING: J.W.H. and B.H. are supported by the NSF Graduate Research Fellowship. D.O. is supported by NIH K99 HL157421-01. J.Y.Z. is supported by NSF CAREER 1942926, NIH R21 MD012867-01, NIH P30AG059307 and by a Chan-Zuckerberg Biohub Fellowship.

    View details for DOI 10.1016/j.ebiom.2021.103613

    View details for PubMedID 34656880
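
    The AUC values reported above are rank statistics and can be computed without any modeling library. Below is a minimal sketch of the Mann-Whitney formulation (the fraction of positive/negative pairs in which the positive case gets the higher score); the scores and labels are hypothetical, not data from the paper:

```python
def auc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney statistic:
    the fraction of positive/negative pairs in which the positive
    case receives the higher score (ties count as half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical model scores for a binary "elevated biomarker" label
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0, 0]
print(auc(scores, labels))  # 8/9 = 0.888...
```

    An AUC of 0.80, as reported for anemia detection, means a randomly chosen patient with the condition outranks a randomly chosen patient without it 80% of the time.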

  • How to evaluate deep learning for cancer diagnostics - factors and recommendations. Biochimica et biophysica acta. Reviews on cancer Daneshjou, R., He, B., Ouyang, D., Zou, J. 2021: 188515

    Abstract

    The large volume of data used in cancer diagnosis presents a unique opportunity for deep learning algorithms, which improve in predictive performance with increasing data. When applying deep learning to cancer diagnosis, the goal is often to learn how to classify an input sample (such as images or biomarkers) into predefined categories (such as benign or cancerous). In this article, we examine examples of how deep learning algorithms have been implemented to make predictions related to cancer diagnosis using clinical, radiological, and pathological image data. We present a systematic approach for evaluating the development and application of clinical deep learning algorithms. Based on these examples and the current state of deep learning in medicine, we discuss the future possibilities in this space and outline a roadmap for implementations of deep learning in cancer diagnosis.

    View details for DOI 10.1016/j.bbcan.2021.188515

    View details for PubMedID 33513392

  • Integrating spatial gene expression and breast tumour morphology via deep learning. Nature biomedical engineering He, B., Bergenstråhle, L., Stenbeck, L., Abid, A., Andersson, A., Borg, A., Maaskola, J., Lundeberg, J., Zou, J. 2020

    Abstract

    Spatial transcriptomics allows for the measurement of RNA abundance at a high spatial resolution, making it possible to systematically link the morphology of cellular neighbourhoods and spatially localized gene expression. Here, we report the development of a deep learning algorithm for the prediction of local gene expression from haematoxylin-and-eosin-stained histopathology images using a new dataset of 30,612 spatially resolved gene expression data matched to histopathology images from 23 patients with breast cancer. We identified over 100 genes, including known breast cancer biomarkers of intratumoral heterogeneity and the co-localization of tumour growth and immune activation, the expression of which can be predicted from the histopathology images at a resolution of 100 µm. We also show that the algorithm generalizes well to The Cancer Genome Atlas and to other breast cancer gene expression datasets without the need for re-training. Predicting the spatially resolved transcriptome of a tissue directly from tissue images may enable image-based screening for molecular biomarkers with spatial variation.

    View details for DOI 10.1038/s41551-020-0578-x

    View details for PubMedID 32572199

  • The Diversity-Innovation Paradox in Science. Proceedings of the National Academy of Sciences of the United States of America Hofstra, B., Kulkarni, V. V., Munoz-Najar Galvez, S., He, B., Jurafsky, D., McFarland, D. A. 2020

    Abstract

    Prior work finds a diversity paradox: Diversity breeds innovation, yet underrepresented groups that diversify organizations have less successful careers within them. Does the diversity paradox hold for scientists as well? We study this by utilizing a near-complete population of 1.2 million US doctoral recipients from 1977 to 2015 and following their careers into publishing and faculty positions. We use text analysis and machine learning to answer a series of questions: How do we detect scientific innovations? Are underrepresented groups more likely to generate scientific innovations? And are the innovations of underrepresented groups adopted and rewarded? Our analyses show that underrepresented groups produce higher rates of scientific novelty. However, their novel contributions are devalued and discounted: For example, novel contributions by gender and racial minorities are taken up by other scholars at lower rates than novel contributions by gender and racial majorities, and equally impactful contributions of gender and racial minorities are less likely to result in successful scientific careers than for majority groups. These results suggest there may be unwarranted reproduction of stratification in academic careers that discounts diversity's role in innovation and partly explains the underrepresentation of some groups in academia.

    View details for DOI 10.1073/pnas.1915378117

    View details for PubMedID 32291335

  • Video-based AI for beat-to-beat assessment of cardiac function. Nature Ouyang, D., He, B., Ghorbani, A., Yuan, N., Ebinger, J., Langlotz, C. P., Heidenreich, P. A., Harrington, R. A., Liang, D. H., Ashley, E. A., Zou, J. Y. 2020; 580 (7802): 252-256

    Abstract

    Accurate assessment of cardiac function is crucial for the diagnosis of cardiovascular disease1, screening for cardiotoxicity2 and decisions regarding the clinical management of patients with a critical illness3. However, human assessment of cardiac function focuses on a limited sampling of cardiac cycles and has considerable inter-observer variability despite years of training4,5. Here, to overcome this challenge, we present a video-based deep learning algorithm, EchoNet-Dynamic, that surpasses the performance of human experts in the critical tasks of segmenting the left ventricle, estimating ejection fraction and assessing cardiomyopathy. Trained on echocardiogram videos, our model accurately segments the left ventricle with a Dice similarity coefficient of 0.92, predicts ejection fraction with a mean absolute error of 4.1% and reliably classifies heart failure with reduced ejection fraction (area under the curve of 0.97). In an external dataset from another healthcare system, EchoNet-Dynamic predicts the ejection fraction with a mean absolute error of 6.0% and classifies heart failure with reduced ejection fraction with an area under the curve of 0.96. Prospective evaluation with repeated human measurements confirms that the model has variance that is comparable to or less than that of human experts. By leveraging information across multiple cardiac cycles, our model can rapidly identify subtle changes in ejection fraction, is more reproducible than human evaluation and lays the foundation for precise diagnosis of cardiovascular disease in real time. As a resource to promote further innovation, we also make publicly available a large dataset of 10,030 annotated echocardiogram videos.

    View details for DOI 10.1038/s41586-020-2145-8

    View details for PubMedID 32269341
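
    The two headline metrics above, the Dice similarity coefficient for segmentation and mean absolute error for ejection fraction, have simple closed forms. A minimal sketch with toy masks and hypothetical ejection-fraction values (illustrative only, not the paper's code or data):

```python
import numpy as np

def dice(pred, truth):
    """Dice similarity coefficient for binary masks: 2|A∩B| / (|A| + |B|)."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    denom = pred.sum() + truth.sum()
    return 2.0 * np.logical_and(pred, truth).sum() / denom if denom else 1.0

def mae(pred, truth):
    """Mean absolute error, e.g. between estimated and reported EF (%)."""
    return float(np.mean(np.abs(np.asarray(pred, float) - np.asarray(truth, float))))

# Toy 4x4 left-ventricle masks (hypothetical)
a = np.array([[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]])
b = np.array([[1, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]])
print(dice(a, b))               # 2*3 / (4+3) = 0.857...
print(mae([60, 55], [56, 50]))  # 4.5
```

    A Dice coefficient of 0.92, as reported, means the predicted and expert left-ventricle masks overlap in 92% of their combined area by this measure.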

  • Deep learning interpretation of echocardiograms. NPJ digital medicine Ghorbani, A., Ouyang, D., Abid, A., He, B., Chen, J. H., Harrington, R. A., Liang, D. H., Ashley, E. A., Zou, J. Y. 2020; 3: 10

    Abstract

    Echocardiography uses ultrasound technology to capture high temporal and spatial resolution images of the heart and surrounding structures, and is the most common imaging modality in cardiovascular medicine. Using convolutional neural networks on a large new dataset, we show that deep learning applied to echocardiography can identify local cardiac structures, estimate cardiac function, and predict systemic phenotypes that modify cardiovascular risk but are not readily identifiable to human interpretation. Our deep learning model, EchoNet, accurately identified the presence of pacemaker leads (AUC = 0.89), enlarged left atrium (AUC = 0.86), left ventricular hypertrophy (AUC = 0.75), left ventricular end systolic and diastolic volumes (R² = 0.74 and R² = 0.70), and ejection fraction (R² = 0.50), as well as predicted systemic phenotypes of age (R² = 0.46), sex (AUC = 0.88), weight (R² = 0.56), and height (R² = 0.33). Interpretation analysis validates that EchoNet shows appropriate attention to key cardiac structures when performing human-explainable tasks and highlights hypothesis-generating regions of interest when predicting systemic phenotypes difficult for human interpretation. Machine learning on echocardiography images can streamline repetitive tasks in the clinical workflow, provide preliminary interpretation in areas with insufficient qualified cardiologists, and predict phenotypes challenging for human evaluation.

    View details for DOI 10.1038/s41746-019-0216-8

    View details for PubMedID 31993508

  • Accelerated Stochastic Power Iteration. Proceedings of machine learning research De Sa, C., He, B., Mitliagkas, I., Ré, C., Xu, P. 2018; 84: 58–67

    Abstract

    Principal component analysis (PCA) is one of the most powerful tools in machine learning. The simplest method for PCA, the power iteration, requires O(1/Δ) full-data passes to recover the principal component of a matrix with eigen-gap Δ. Lanczos, a significantly more complex method, achieves an accelerated rate of O(1/√Δ) passes. Modern applications, however, motivate methods that only ingest a subset of available data, known as the stochastic setting. In the online stochastic setting, simple algorithms like Oja's iteration achieve the optimal sample complexity O(σ²/Δ²). Unfortunately, they are fully sequential, and also require O(σ²/Δ²) iterations, far from the O(1/√Δ) rate of Lanczos. We propose a simple variant of the power iteration with an added momentum term that achieves both the optimal sample and iteration complexity. In the full-pass setting, standard analysis shows that momentum achieves the accelerated rate, O(1/√Δ). We demonstrate empirically that naively applying momentum to a stochastic method does not result in acceleration. We perform a novel, tight variance analysis that reveals the "breaking-point variance" beyond which this acceleration does not occur. By combining this insight with modern variance reduction techniques, we construct stochastic PCA algorithms, for the online and offline setting, that achieve an accelerated iteration complexity O(1/√Δ). Due to the embarrassingly parallel nature of our methods, this acceleration translates directly to wall-clock time if deployed in a parallel environment. Our approach is very general and applies to many non-convex optimization problems that can now be accelerated using the same technique.

    View details for PubMedID 31187095
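
    The core idea above, power iteration with an added momentum term, fits in a few lines. This is an illustrative sketch, not the paper's code: the update w_{t+1} = A·w_t − β·w_{t−1} with momentum coefficient β near λ₂²/4, applied to a hypothetical diagonal matrix whose top eigenvector is known:

```python
import numpy as np

def power_momentum(A, beta, iters=200, seed=0):
    """Power iteration with a heavy-ball momentum term:
    w_{t+1} = A @ w_t - beta * w_{t-1}, renormalized each step
    (both iterates are scaled by the same factor to preserve the recurrence)."""
    rng = np.random.default_rng(seed)
    w_prev = np.zeros(A.shape[0])
    w = rng.standard_normal(A.shape[0])
    w /= np.linalg.norm(w)
    for _ in range(iters):
        w_next = A @ w - beta * w_prev
        norm = np.linalg.norm(w_next)
        w_prev, w = w / norm, w_next / norm
    return w

# Toy symmetric matrix with eigen-gap 0.1; the top eigenvector is e1.
A = np.diag([1.0, 0.9, 0.5])
w = power_momentum(A, beta=0.9**2 / 4)  # beta = lambda_2^2 / 4
print(abs(w[0]))  # near 1: aligned with the top eigenvector
```

    With this β, the momentum recurrence damps every eigen-component below λ₂ while the top component's dominant root stays strictly larger, which is the mechanism behind the accelerated rate.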

  • Inferring Generative Model Structure with Static Analysis. Advances in neural information processing systems Varma, P., He, B., Bajaj, P., Banerjee, I., Khandwala, N., Rubin, D. L., Ré, C. 2017; 30: 239–49

    Abstract

    Obtaining enough labeled data to robustly train complex discriminative models is a major bottleneck in the machine learning pipeline. A popular solution is combining multiple sources of weak supervision using generative models. The structure of these models affects training label quality, but is difficult to learn without any ground truth labels. We instead rely on these weak supervision sources having some structure by virtue of being encoded programmatically. We present Coral, a paradigm that infers generative model structure by statically analyzing the code for these heuristics, thus reducing the data required to learn structure significantly. We prove that Coral's sample complexity scales quasilinearly with the number of heuristics and number of relations found, improving over the standard sample complexity, which is exponential in n for identifying nth degree relations. Experimentally, Coral matches or outperforms traditional structure learning approaches by up to 3.81 F1 points. Using Coral to model dependencies instead of assuming independence results in better performance than a fully supervised model by 3.07 accuracy points when heuristics are used to label radiology data without ground truth labels.

    View details for PubMedID 29391769
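
    For context on the weak-supervision setting above: the simplest way to combine programmatic labeling heuristics is an unweighted majority vote, which is the baseline that generative models like Coral improve on by learning per-heuristic accuracies and dependencies. The sketch below is illustrative only (not Coral), and all heuristics and inputs are hypothetical:

```python
def majority_vote(heuristics, x):
    """Combine programmatic labeling heuristics, each returning
    +1, -1, or 0 (abstain), by unweighted majority vote."""
    total = sum(h(x) for h in heuristics)
    return 1 if total > 0 else -1 if total < 0 else 0

# Hypothetical heuristics over a radiology-report snippet
h1 = lambda x: 1 if "malignant" in x else 0
h2 = lambda x: -1 if "benign" in x else 0
h3 = lambda x: 1 if "mass" in x else 0

print(majority_vote([h1, h2, h3], "irregular mass, likely malignant"))  # 1
```

    Majority vote treats the heuristics as independent and equally accurate; the point of the paper is that static analysis of the heuristics' code can recover their dependency structure, so correlated heuristics are not double-counted.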

  • Learning the Structure of Generative Models without Labeled Data. Proceedings of machine learning research Bach, S. H., He, B., Ratner, A., Ré, C. 2017; 70: 273–82

    Abstract

    Curating labeled training data has become the primary bottleneck in machine learning. Recent frameworks address this bottleneck with generative models to synthesize labels at scale from weak supervision sources. The generative model's dependency structure directly affects the quality of the estimated labels, but selecting a structure automatically without any labeled data is a distinct challenge. We propose a structure estimation method that maximizes the ℓ1-regularized marginal pseudolikelihood of the observed data. Our analysis shows that the amount of unlabeled data required to identify the true structure scales sublinearly in the number of possible dependencies for a broad class of models. Simulations show that our method is 100× faster than a maximum likelihood approach and selects 1/4 as many extraneous dependencies. We also show that our method provides an average of 1.5 F1 points of improvement over existing, user-developed information extraction applications on real-world data such as PubMed journal abstracts.

    View details for PubMedID 30882087

  • Signal quality of endovascular electroencephalography. Journal of neural engineering He, B. D., Ebrahimi, M., Palafox, L., Srinivasan, L. 2016; 13 (1): 016016

    Abstract

    Objective, Approach. A growing number of prototypes for diagnosing and treating neurological and psychiatric diseases are predicated on access to high-quality brain signals, which typically requires surgically opening the skull. Where endovascular navigation previously transformed the treatment of cerebral vascular malformations, we now show that it can provide access to brain signals with substantially higher signal quality than scalp recordings. While endovascular signals were known to be larger in amplitude than scalp signals, our analysis in rabbits borrows a standard technique from communication theory to show endovascular signals also have up to 100× better signal-to-noise ratio. With a viable minimally-invasive path to high-quality brain signals, patients with brain diseases could one day receive potent electroceuticals through the bloodstream, in the course of a brief outpatient procedure.

    View details for DOI 10.1088/1741-2560/13/1/016016

    View details for PubMedID 26735327
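
    For scale: the "up to 100× better signal-to-noise ratio" above, read as a power ratio, corresponds to a 20 dB improvement. A one-line check (illustrative arithmetic only, not the paper's analysis):

```python
import math

def snr_db(p_signal, p_noise):
    """Signal-to-noise ratio in decibels from signal and noise power."""
    return 10.0 * math.log10(p_signal / p_noise)

# A 100x improvement in the SNR power ratio is +20 dB:
print(snr_db(100.0, 1.0) - snr_db(1.0, 1.0))  # 20.0
```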

  • Scan Order in Gibbs Sampling: Models in Which it Matters and Bounds on How Much. Advances in neural information processing systems He, B., De Sa, C., Mitliagkas, I., Ré, C. 2016; 29

    Abstract

    Gibbs sampling is a Markov Chain Monte Carlo sampling technique that iteratively samples variables from their conditional distributions. There are two common scan orders for the variables: random scan and systematic scan. Due to the benefits of locality in hardware, systematic scan is commonly used, even though most statistical guarantees are only for random scan. While it has been conjectured that the mixing times of random scan and systematic scan do not differ by more than a logarithmic factor, we show by counterexample that this is not the case, and we prove that the mixing times do not differ by more than a polynomial factor under mild conditions. To prove these relative bounds, we introduce a method of augmenting the state space to study systematic scan using conductance.

    View details for PubMedID 28344429
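
    A minimal illustration of the two scan orders (a toy model, not tied to the paper's counterexample): a Gibbs sampler for the two-variable distribution p(x1, x2) ∝ exp(J·x1·x2) with x_i in {−1, +1}, where both scan orders converge to the same expectation E[x1·x2] = tanh(J):

```python
import math
import random

def gibbs(J, steps, scan, seed=0):
    """Gibbs sampler for p(x1, x2) ∝ exp(J*x1*x2), x_i in {-1, +1}.
    Conditional: P(x_i = +1 | x_j) = 1 / (1 + exp(-2*J*x_j)).
    Returns the Monte Carlo estimate of E[x1*x2] (true value: tanh(J))."""
    rng = random.Random(seed)
    x = [1, 1]
    total = 0.0
    for t in range(steps):
        # systematic scan alternates deterministically; random scan picks uniformly
        i = (t % 2) if scan == "systematic" else rng.randrange(2)
        p = 1.0 / (1.0 + math.exp(-2.0 * J * x[1 - i]))
        x[i] = 1 if rng.random() < p else -1
        total += x[0] * x[1]
    return total / steps

J = 0.5
print(gibbs(J, 200_000, "systematic"))  # close to tanh(0.5) ≈ 0.462
print(gibbs(J, 200_000, "random"))      # close to tanh(0.5) ≈ 0.462
```

    On this easy model both orders agree; the paper's contribution is showing there exist models where their mixing times differ by more than a logarithmic factor, while bounding that gap polynomially under mild conditions.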

  • Generalized analog thresholding for spike acquisition at ultralow sampling rates. Journal of neurophysiology He, B. D., Wein, A., Varshney, L. R., Kusuma, J., Richardson, A. G., Srinivasan, L. 2015; 114 (1): 746-760

    Abstract

    Efficient spike acquisition techniques are needed to bridge the divide from creating large multi-electrode arrays (MEA) to achieving whole-cortex electrophysiology. In this paper, we introduce generalized analog thresholding (gAT), which achieves millisecond temporal resolution with sampling rates as low as 10 Hz. Consider the torrent of data from a single 1000-channel MEA, which would generate more than 3 GB per minute using standard 30 kHz Nyquist sampling. Recent neural signal processing methods based on compressive sensing (CS) still require Nyquist sampling as a first step and use iterative methods to reconstruct spikes. Analog thresholding (AT) remains the best existing alternative, where spike waveforms are passed through an analog comparator and sampled at 1 kHz, with instant spike reconstruction. By generalizing AT, the new method reduces sampling rates another order of magnitude, detects more than one spike per interval, and reconstructs spike width. Unlike CS, the new method reveals a simple closed-form solution to achieve instant (non-iterative) spike reconstruction. The base method is already robust to hardware non-idealities including realistic quantization error and integration noise. Because it achieves these considerable specifications using hardware-friendly components like integrators and comparators, gAT could translate large-scale MEAs into implantable devices for scientific investigation and medical technology.

    View details for DOI 10.1152/jn.00623.2014

    View details for Web of Science ID 000360556600030

    View details for PubMedID 25904712
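
    The data-rate arithmetic in the abstract is easy to reproduce. Assuming 16-bit samples (an assumption; the abstract does not state a bit depth), 1,000 channels at 30 kHz generate 3.6 GB per minute, consistent with "more than 3 GB per minute":

```python
def gb_per_minute(channels, sample_rate_hz, bits_per_sample=16):
    """Raw acquisition rate in gigabytes per minute.
    bits_per_sample=16 is an assumed ADC resolution."""
    return channels * sample_rate_hz * (bits_per_sample / 8) * 60 / 1e9

nyquist = gb_per_minute(1000, 30_000)  # standard 30 kHz Nyquist sampling
gat = gb_per_minute(1000, 10)          # gAT sampling at 10 Hz
print(nyquist)        # 3.6 GB/min ("more than 3 GB per minute")
print(nyquist / gat)  # 3000x fewer raw samples
```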

  • Smooth Interactive Submodular Set Cover. Advances in neural information processing systems He, B., Yue, Y. 2015
  • Feasibility of FRI-based square-wave reconstruction with quantization error and integrator noise. IEEE He, B., Wein, A., Srinivasan, L. 2015: 5952-5956