Honors & Awards


  • Stanford Graduate Student Fellowship, Albion Walter Hewlett Fellow, Stanford University
  • National Science Foundation (NSF) Graduate Research Fellow, National Science Foundation
  • Microsoft Research Graduate Women’s Scholar, Microsoft (2012)

Education & Certifications


  • BA, Duke University, Psychology & Neuroscience (2009)
  • PhD, Stanford University, Biomedical Informatics (2016)

All Publications


  • The Scientific Filesystem. GigaScience Sochat, V. 2018; 7 (5)

    Abstract

    Background: Here, we present the Scientific Filesystem (SCIF), an organizational format that supports exposure of executables and metadata for discoverability of scientific applications. The format includes a known filesystem structure, a definition for a set of environment variables describing it, and functions for generation of the variables and interaction with the libraries, metadata, and executables located within. SCIF makes it easy to expose metadata, multiple environments, installation steps, files, and entry points to render scientific applications consistent, modular, and discoverable. A SCIF can be installed on a traditional host or in a container technology such as Docker or Singularity. We start by reviewing the background and rationale for the SCIF, followed by an overview of the specification and the different levels of internal modules ("apps") that the organizational format affords. Finally, we demonstrate that SCIF is useful by implementing and discussing several use cases that improve user interaction and understanding of scientific applications. SCIF is released along with a client and integration in the Singularity 2.4 software to quickly install and interact with SCIF. When used inside of a reproducible container, a SCIF is a recipe for reproducibility and introspection of the functions and users that it serves. Results: We use SCIF to evaluate container software, provide metrics, serve scientific workflows, and execute a primary function under different contexts. To encourage collaboration and sharing of applications, we developed tools along with an open source, version-controlled, tested, and programmatically accessible web infrastructure. SCIF and associated resources are available at https://sci-f.github.io. The ease of using SCIF, especially in the context of containers, offers promise for scientists' work to be self-documenting and programmatically parseable for maximum reproducibility. SCIF opens up an abstraction from underlying programming languages and packaging logic to work with scientific applications, opening up new opportunities for scientific software development.

    View details for DOI 10.1093/gigascience/giy023

    View details for PubMedID 29718213

  • Enhancing reproducibility in scientific computing: Metrics and registry for Singularity containers PLOS ONE Sochat, V. V., Prybol, C. J., Kurtzer, G. M. 2017; 12 (11): e0188511

    Abstract

    Here we present Singularity Hub, a framework to build and deploy Singularity containers for mobility of compute, and the singularity-python software with novel metrics for assessing reproducibility of such containers. Singularity containers make it possible for scientists and developers to package reproducible software, and Singularity Hub adds automation to this workflow by building, capturing metadata for, visualizing, and serving containers programmatically. Our novel metrics, based on custom filters of content hashes of container contents, allow for comparison of an entire container, including operating system, custom software, and metadata. First we will review Singularity Hub's primary use cases and how the infrastructure has been designed to support modern, common workflows. Next, we conduct three analyses to demonstrate build consistency, reproducibility metric and performance and interpretability, and potential for discovery. This is the first effort to demonstrate a rigorous assessment of measurable similarity between containers and operating systems. We provide these capabilities within Singularity Hub, as well as the source software singularity-python that provides the underlying functionality. Singularity Hub is available at https://singularity-hub.org, and we are excited to provide it as an openly available platform for building and deploying scientific containers.

    View details for DOI 10.1371/journal.pone.0188511

    View details for Web of Science ID 000416484900041

    View details for PubMedID 29186161

    View details for PubMedCentralID PMC5706697

  • Predicting Prostate Cancer Recurrence After Radical Prostatectomy PROSTATE Jeffers, A., Sochat, V., Kattan, M. W., Yu, C., Melcon, E., Yamoah, K., Rebbeck, T. R., Whittemore, A. S. 2017; 77 (3): 291-298

    View details for DOI 10.1002/pros.23268

    View details for Web of Science ID 000393893600006

  • Singularity: Scientific containers for mobility of compute. PloS one Kurtzer, G. M., Sochat, V., Bauer, M. W. 2017; 12 (5)

    Abstract

    Here we present Singularity, software developed to bring containers and reproducibility to scientific computing. Using Singularity containers, developers can work in reproducible environments of their choosing and design, and these complete environments can easily be copied and executed on other platforms. Singularity is an open source initiative that harnesses the expertise of system and software engineers and researchers alike, and integrates seamlessly into common workflows for both of these groups. As its primary use case, Singularity brings mobility of computing to both users and HPC centers, providing a secure means to capture and distribute software and compute environments. This ability to create and deploy reproducible environments across these centers, a previously unmet need, makes Singularity a game changing development for computational science.

    View details for DOI 10.1371/journal.pone.0177459

    View details for PubMedID 28494014

  • Sharing brain mapping statistical results with the neuroimaging data model SCIENTIFIC DATA Maumet, C., Auer, T., Bowring, A., Chen, G., Das, S., Flandin, G., Ghosh, S., Glatard, T., Gorgolewski, K. J., Helmer, K. G., Jenkinson, M., Keator, D. B., Nichols, B. N., Poline, J., Reynolds, R., Sochat, V., Turner, J., Nichols, T. E. 2016; 3

    Abstract

    Only a tiny fraction of the data and metadata produced by an fMRI study is finally conveyed to the community. This lack of transparency not only hinders the reproducibility of neuroimaging results but also impairs future meta-analyses. In this work we introduce NIDM-Results, a format specification providing a machine-readable description of neuroimaging statistical results along with key image data summarising the experiment. NIDM-Results provides a unified representation of mass univariate analyses including a level of detail consistent with available best practices. This standardized representation allows authors to relay methods and results in a platform-independent regularized format that is not tied to a particular neuroimaging software package. Tools are available to export NIDM-Result graphs and associated files from the widely used SPM and FSL software packages, and the NeuroVault repository can import NIDM-Results archives. The specification is publicly available at: http://nidm.nidash.org/specs/nidm-results.html.

    View details for DOI 10.1038/sdata.2016.102

    View details for Web of Science ID 000390238300001

    View details for PubMedID 27922621

    View details for PubMedCentralID PMC5139675

  • Predicting Prostate Cancer Recurrence After Radical Prostatectomy. Prostate Jeffers, A., Sochat, V., Kattan, M. W., Yu, C., Melcon, E., Yamoah, K., Rebbeck, T. R., Whittemore, A. S. 2016

    Abstract

    Prostate cancer prognosis is variable, and management decisions involve balancing patients' risks of recurrence and recurrence-free death. Moreover, the roles of body mass index (BMI) and race in risk of recurrence are controversial [1,2]. To address these issues, we developed and cross-validated RAPS (Risks After Prostate Surgery), a personal prediction model for biochemical recurrence (BCR) within 10 years of radical prostatectomy (RP) that includes BMI and race as possible predictors, and recurrence-free death as a competing risk. RAPS uses a patient's risk factors at surgery to assign him a recurrence probability based on statistical learning methods applied to a cohort of 1,276 patients undergoing RP at the University of Pennsylvania. We compared the performance of RAPS to that of an existing model with respect to calibration (by comparing observed and predicted outcomes), and discrimination (using the area under the receiver operating characteristic curve (AUC)). RAPS' cross-validated BCR predictions provided better calibration than those of an existing model that underestimated patients' risks. Discrimination was similar for the two models, with BCR AUCs of 0.793, 95% confidence interval (0.766-0.820) for RAPS, and 0.780 (0.745-0.815) for the existing model. RAPS' most important BCR predictors were tumor grade, preoperative prostate-specific antigen (PSA) level and BMI; race was less important [3]. RAPS' predictions can be obtained online at https://predict.shinyapps.io/raps. RAPS' cross-validated BCR predictions were better calibrated than those of an existing model, and BMI information contributed substantially to these predictions. RAPS predictions for recurrence-free death were limited by lack of co-morbidity data; however, the model provides a simple framework for extension to include such data. Its use and extension should facilitate decision strategies for post-RP prostate cancer management. Prostate 77:291-298, 2017. © 2016 Wiley Periodicals, Inc.

    View details for DOI 10.1002/pros.23268

    View details for PubMedID 27775165

  • The Experiment Factory: Standardizing Behavioral Experiments FRONTIERS IN PSYCHOLOGY Sochat, V. V., Eisenberg, I. W., Enkavi, A. Z., Li, J., Bissett, P. G., Poldrack, R. A. 2016; 7

    Abstract

    The administration of behavioral and experimental paradigms for psychology research is hindered by lack of a coordinated effort to develop and deploy standardized paradigms. While several frameworks (Mason and Suri, 2011; McDonnell et al., 2012; de Leeuw, 2015; Lange et al., 2015) have provided infrastructure and methods for individual research groups to develop paradigms, missing is a coordinated effort to develop paradigms linked with a system to easily deploy them. This disorganization leads to redundancy in development, divergent implementations of conceptually identical tasks, disorganized and error-prone code lacking documentation, and difficulty in replication. The ongoing reproducibility crisis in psychology and neuroscience research (Baker, 2015; Open Science Collaboration, 2015) highlights the urgency of this challenge: reproducible research in behavioral psychology is conditional on deployment of equivalent experiments. A large, accessible repository of experiments for researchers to develop collaboratively is most efficiently accomplished through an open source framework. Here we present the Experiment Factory, an open source framework for the development and deployment of web-based experiments. The modular infrastructure includes experiments, virtual machines for local or cloud deployment, and an application to drive these components and provide developers with functions and tools for further extension. We release this infrastructure with a deployment (http://www.expfactory.org) that researchers are currently using to run a set of over 80 standardized web-based experiments on Amazon Mechanical Turk. By providing open source tools for both deployment and development, this novel infrastructure holds promise to bring reproducibility to the administration of experiments, and accelerate scientific progress by providing a shared community resource of psychological paradigms.

    View details for DOI 10.3389/fpsyg.2016.00610

    View details for Web of Science ID 000374735800001

    View details for PubMedID 27199843

  • NeuroVault.org: A repository for sharing unthresholded statistical maps, parcellations, and atlases of the human brain. NeuroImage Gorgolewski, K. J., Varoquaux, G., Rivera, G., Schwartz, Y., Sochat, V. V., Ghosh, S. S., Maumet, C., Nichols, T. E., Poline, J., Yarkoni, T., Margulies, D. S., Poldrack, R. A. 2016; 124: 1242-1244

    Abstract

    NeuroVault.org is dedicated to storing outputs of analyses in the form of statistical maps, parcellations and atlases, a unique strategy that contrasts with most neuroimaging repositories that store raw acquisition data or stereotaxic coordinates. Such maps are indispensable for performing meta-analyses, validating novel methodology, and deciding on precise outlines for regions of interest (ROIs). NeuroVault is open to maps derived from both healthy and clinical populations, as well as from various imaging modalities (sMRI, fMRI, EEG, MEG, PET, etc.). The repository uses modern web technologies such as interactive web-based visualization, cognitive decoding, and comparison with other maps to provide researchers with efficient, intuitive tools to improve the understanding of their results. Each dataset and map is assigned a permanent Universal Resource Locator (URL), and all of the data is accessible through a REST Application Programming Interface (API). Additionally, the repository supports the NIDM-Results standard and has the ability to parse outputs from popular FSL and SPM software packages to automatically extract relevant metadata. This ease of use, modern web-integration, and pioneering functionality holds promise to improve the workflow for making inferences about and sharing whole-brain statistical maps.

    View details for DOI 10.1016/j.neuroimage.2015.04.016

    View details for PubMedID 25869863

  • The brain imaging data structure, a format for organizing and describing outputs of neuroimaging experiments. Scientific data Gorgolewski, K. J., Auer, T., Calhoun, V. D., Craddock, R. C., Das, S., Duff, E. P., Flandin, G., Ghosh, S. S., Glatard, T., Halchenko, Y. O., Handwerker, D. A., Hanke, M., Keator, D., Li, X., Michael, Z., Maumet, C., Nichols, B. N., Nichols, T. E., Pellman, J., Poline, J., Rokem, A., Schaefer, G., Sochat, V., Triplett, W., Turner, J. A., Varoquaux, G., Poldrack, R. A. 2016; 3: 160044

    Abstract

    The development of magnetic resonance imaging (MRI) techniques has defined modern neuroimaging. Since its inception, tens of thousands of studies using techniques such as functional MRI and diffusion weighted imaging have allowed for the non-invasive study of the brain. Despite the fact that MRI is routinely used to obtain data for neuroscience research, there has been no widely adopted standard for organizing and describing the data collected in an imaging experiment. This renders sharing and reusing data (within or between labs) difficult if not impossible and unnecessarily complicates the application of automatic pipelines and quality assurance protocols. To solve this problem, we have developed the Brain Imaging Data Structure (BIDS), a standard for organizing and describing MRI datasets. The BIDS standard uses file formats compatible with existing software, unifies the majority of practices already common in the field, and captures the metadata necessary for most common data processing operations.

    View details for DOI 10.1038/sdata.2016.44

    View details for PubMedID 27326542

    View details for PubMedCentralID PMC4978148

  • Long-term neural and physiological phenotyping of a single human NATURE COMMUNICATIONS Poldrack, R. A., Laumann, T. O., Koyejo, O., Gregory, B., Hover, A., Chen, M., Gorgolewski, K. J., Luci, J., Joo, S. J., Boyd, R. L., Hunicke-Smith, S., Simpson, Z. B., Caven, T., Sochat, V., Shine, J. M., Gordon, E., Snyder, A. Z., Adeyemo, B., Petersen, S. E., Glahn, D. C., McKay, D. R., Curran, J. E., Goering, H. H., Carless, M. A., Blangero, J., Dougherty, R., Leemans, A., Handwerker, D. A., Frick, L., Marcotte, E. M., Mumford, J. A. 2015; 6

    View details for DOI 10.1038/ncomms9885

    View details for Web of Science ID 000367577400002

  • Effects of thresholding on correlation-based image similarity metrics FRONTIERS IN NEUROSCIENCE Sochat, V. V., Gorgolewski, K. J., Koyejo, O., Durnez, J., Poldrack, R. A. 2015; 9

    View details for DOI 10.3389/fnins.2015.00418

    View details for Web of Science ID 000366713100001

    View details for PubMedID 26578875

  • Searching for a minimal set of behaviors for autism detection through feature selection-based machine learning TRANSLATIONAL PSYCHIATRY Kosmicki, J. A., Sochat, V., Duda, M., Wall, D. P. 2015; 5

    Abstract

    Although the prevalence of autism spectrum disorder (ASD) has risen sharply in the last few years reaching 1 in 68, the average age of diagnosis in the United States remains close to 4, well past the developmental window when early intervention has the largest gains. This emphasizes the importance of developing accurate methods to detect risk faster than the current standards of care. In the present study, we used machine learning to evaluate one of the best and most widely used instruments for clinical assessment of ASD, the Autism Diagnostic Observation Schedule (ADOS) to test whether only a subset of behaviors can differentiate between children on and off the autism spectrum. ADOS relies on behavioral observation in a clinical setting and consists of four modules, with module 2 reserved for individuals with some vocabulary and module 3 for higher levels of cognitive functioning. We ran eight machine learning algorithms using stepwise backward feature selection on score sheets from modules 2 and 3 from 4540 individuals. We found that 9 of the 28 behaviors captured by items from module 2, and 12 of the 28 behaviors captured by module 3 are sufficient to detect ASD risk with 98.27% and 97.66% accuracy, respectively. A greater than 55% reduction in the number of behaviors with negligible loss of accuracy across both modules suggests a role for computational and statistical methods to streamline ASD risk detection and screening. These results may help enable development of mobile and parent-directed methods for preliminary risk evaluation and/or clinical triage that reach a larger percentage of the population and help to lower the average age of detection and diagnosis.

    View details for DOI 10.1038/tp.2015.7

    View details for Web of Science ID 000367652200002

    View details for PubMedID 25710120

  • Translational Meta-analytical Methods to Localize the Regulatory Patterns of Neurological Disorders in the Human Brain. AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium Sochat, V., David, M., Wall, D. P. 2015; 2015: 2073-2082

    Abstract

    The task of mapping neurological disorders in the human brain must be informed by multiple measurements of an individual's phenotype - neuroimaging, genomics, and behavior. We developed a novel meta-analytical approach to integrate disparate resources and generated transcriptional maps of neurological disorders in the human brain yielding a purely computational procedure to pinpoint the brain location of transcribed genes likely to be involved in either onset or maintenance of the neurological condition.

    View details for PubMedID 26958307

    View details for PubMedCentralID PMC4765688

  • AuthorSynth: a collaboration network and behaviorally-based visualization tool of activation reports from the neuroscience literature. Frontiers in neuroinformatics Sochat, V. V. 2015; 9: 6

    Abstract

    Targeted collaboration is becoming more challenging with the ever-increasing number of publications, conferences, and academic responsibilities that the modern-day researcher must synthesize. Specifically, the field of neuroimaging had roughly 10,000 new papers in PubMed for the year 2013, presenting tens of thousands of international authors, each a potential collaborator working on some sub-domain in the field. To remove the burden of synthesizing an entire corpus of publications, talks, and conference interactions to find and assess collaborations, we combine meta-analytical neuroimaging informatics methods with machine learning and network analysis toward this goal. We present "AuthorSynth," a novel application prototype that includes (1) a collaboration network to identify researchers with similar results reported in the literature; and (2) a 2D plot ("brain lattice") to visually summarize a single author's contribution to the field, and allow for searching of authors based on behavioral terms. This method capitalizes on intelligent synthesis of the neuroimaging literature, and demonstrates that data-driven approaches can be used to confirm existing collaborations, reveal potential ones, and identify gaps in published knowledge. We believe this tool exemplifies how methods from neuroimaging informatics can better inform researchers about progress and knowledge in the field, and enhance the modern workflow of finding collaborations.

    View details for DOI 10.3389/fninf.2015.00006

    View details for PubMedID 25859214

  • NeuroVault.org: a web-based repository for collecting and sharing unthresholded statistical maps of the human brain. Frontiers in neuroinformatics Gorgolewski, K. J., Varoquaux, G., Rivera, G., Schwarz, Y., Ghosh, S. S., Maumet, C., Sochat, V. V., Nichols, T. E., Poldrack, R. A., Poline, J., Yarkoni, T., Margulies, D. S. 2015; 9: 8

    Abstract

    Here we present NeuroVault, a web-based repository that allows researchers to store, share, visualize, and decode statistical maps of the human brain. NeuroVault is easy to use and employs modern web technologies to provide informative visualization of data without the need to install additional software. In addition, it leverages the power of the Neurosynth database to provide cognitive decoding of deposited maps. The data are exposed through a public REST API enabling other services and tools to take advantage of it. NeuroVault is a new resource for researchers interested in conducting meta- and coactivation analyses.

    View details for DOI 10.3389/fninf.2015.00008

    View details for PubMedID 25914639

    View details for PubMedCentralID PMC4392315

  • A Robust Classifier to Distinguish Noise from fMRI Independent Components PLOS ONE Sochat, V., Supekar, K., Bustillo, J., Calhoun, V., Turner, J. A., Rubin, D. L. 2014; 9 (4)

    View details for DOI 10.1371/journal.pone.0095493

    View details for Web of Science ID 000335226500115

    View details for PubMedID 24748378